Data Seminar

Data Science Seminar

Welcome to the Data Science Seminar. The topics of the talks will vary among multiple topics in applied analysis, probability, applied mathematics related to data and dynamical systems, statistical and machine learning, signal processing, and computation.
To the attending graduate students: together with each talk we include a list of papers that are relevant to that talk, and we strongly encourage the graduate students to read those papers both in advance of the talk and after it.

Click here to subscribe to the calendar for the seminar series.

Spring 2024
Fall 2022
Spring 2022
Fall 2021
Spring 2020
Fall 2019
Spring 2019
Fall 2018
Spring 2018
Fall 2017
Spring 2017
Fall 2016

Organizers: Xiong Wang, Mauro Maggioni, Fei Lu. Contact us if you would like to meet with the speakers.

Spring 2024: The Data Science Seminar will be held in person Krieger 411, at 3:00-4:00pm EST on Wednesdays .

April 10

In person, Krieger 411

Inbar Serrousi
Department of Mathematics,

From Stochastic to Deterministic: SGD dynamics of non-convex models in high dimensions

Stochastic gradient descent (SGD) stands as a cornerstone of optimization and modern machine learning. However, understanding why SGD performs so well remains a major challenge. In this talk, I will present a theory for SGD in high dimensions when the number of samples and problem dimensions are large. We show that the dynamics of one-pass SGD applied to generalized linear models and multi-index problems with data possessing a general covariance matrix become deterministic in the large sample and dimensional limit. In particular, the limiting dynamics are governed by a set of low-dimensional ordinary differential equations (ODEs). Our setup encompasses many optimization problems, including linear regression, logistic regression, and two-layer neural networks. In addition, it unveils the implicit bias inherent in SGD. For each of these problems, the deterministic equivalent of SGD enables us to derive a close approximation of the generalization error (with explicit and vanishing error bounds). Furthermore, we leverage our theorem to establish explicit conditions on the step size, ensuring the convergence and stability of SGD within high-dimensional settings. This is joint work with Elizabeth-Collins-Woodfin, Courtney Paquette, and Elliot Paquette, for more information see arXiv2308.08977.

May 1, 3:00-4:00pm EST

In person, Krieger 413

Richard Zhang
Department of Electrical and Computer Engineering, UIUC

Rank Overparameterization and Chordal Conversion for Large-Scale Low-rank Optimization

Optimization problems over low-rank matrices are ubiquitous in the real-world. In principle, these can be reliably solved to global optimality via convex relaxation, but the computational costs can become prohibitive on a large scale. In practice, it is much more common to optimize over the low-rank matrices directly, as in the Burer-Monteiro approach, but their nonconvexity can cause failure by getting stuck at a spurious local minimum. For safety-critical societal applications, such as the operation and planning of an electricity grid, our inability to reliably achieve global optimality can have significant real-world consequences.
In the first part of this talk, we investigate global optimality guarantees by overparameterizing the low-rank factorization. In the smooth and strongly-convex setting, we rigorously show that, as the rank is increased, spurious local minima become increasingly rare in a step-wise fashion. In other words, rank-2 has fewer spurious local minima than rank-1, and rank-3 has fewer than rank-2, etc. Once the rank exceeds an O(1) threshold, every remaining local minimum is a global minimum, and every saddle point can be escaped.
In the second part of this talk, we revisit convex relaxations. Here, chordal conversion is a heuristic widely used to reduce the per-iteration cost of interior-point methods to as low as linear time, but a theoretical explanation for this speedup was previously unknown. We give a sufficient condition for the speedup, which covers many successful applications of chordal conversion, including the MAX-k-CUT relaxation, the Lovasz Theta problem, sensor network localization, polynomial optimization, and the AC optimal power flow relaxation, thus allowing theory to match practical experience. Main related papers: https://arxiv.org/abs/2207.01789 and https://arxiv.org/abs/2306.15288 .

May 3, 3:00-4:00pm EST

In person, Krieger 413

Zhenfu Whang
Beijing International Center for Mathematical Research (BICMR), Peking University

Uniform-in-time propagation of chaos for second order interacting particle systems

We study the long time behavior of second order particle systems interacting through global Lipschitz kernels. Combining the hypocoercivity method by Villani and the relative entropy method by Jabin and Wang, we are able to overcome the degeneracy of diffusion in position direction by controlling the relative entropy and relative Fisher information together. This implies the uniform-in-time propagation of chaos through the strong convergence of all marginals. Our method works at the level of Liouville equation and relies on the Log Sobolev inequality of equilibrium of Vlasov-Fokker-Planck equation. This is based on a joint work with Yun Gong (Peking University) and Pengzhi Xie (Fudan University).

Organizers: Mauro Maggioni, Fei Lu. Contacts us if you would like to meet with the speakers.

The Data Science Seminar will be held on ZOOM and/or in-person (see location for each talk) at 3:00-4:00pm EST on Wednesdays. Meeting ID: 936 4120 5752 Passcode: 507989

Sep.7

In person, Krieger 411

Weiqi Chu
Department of Mathematics, UCLA

Opinion dynamics models on networks and their mean-field description

The perspectives and opinions of people change and spread through social interactions on a daily basis. In the study of opinion dynamics on networks, one often models entities as nodes and their social relationships as edges, and examines how opinions evolve as dynamical processes on networks, including graphs, hypergraphs, multi-layer networks, etc. In this talk, I will review some agent-based opinion dynamics models on graphs and extend one type of models to hypergraphs, whose edges can connect an arbitrary number of nodes and encode interactions that involve three or more nodes. I will also derive a mean-field model using the density description, whose governing equation follows a kinetic equation of Kac type. We prove that the opinion density converges to a sum of Dirac delta measures as time goes to infinity, which means the opinions will always form isolated opinion clusters at equilibrium.

Oct.12

ZOOM

Hoang Tran
Oak Ridge National Laboratory

Black-Box Optimization with a Novel Nonlocal Gradient and Its Applications to Deep Learning

The problem of minimizing multi-modal loss functions with a large number of local optima frequently arises in machine learning and model calibration problems. Since the local gradient points to the direction of the steepest slope in an infinitesimal neighborhood, an optimizer guided by the local gradient is often trapped in a local minimum. To address this issue, we develop a novel nonlocal gradient to skip small local minima by capturing major structures of the loss's landscape in black-box optimization. The nonlocal gradient is defined by a directional Gaussian smoothing (DGS) approach. The key idea of DGS is to conducts 1D long-range exploration with a large smoothing radius along d orthogonal directions in R^d, each of which defines a nonlocal directional derivative as a 1D integral. Such long-range exploration enables the nonlocal gradient to skip small local minima. The d directional derivatives are then assembled to form the nonlocal gradient. We use the Gauss-Hermite quadrature rule to approximate the d 1D integrals to obtain an accurate estimator. The superior performance of our method is demonstrated in several benchmark tests and machine learning problems, including reinforcement learning and exploring the latent space of deep networks.

Enabling long-range exploration in minimization of multimodal functions
AdaDGS: An adaptive black-box optimization method with a nonlocal directional Gaussian smoothing gradient
Model Calibration of the Liquid Mercury Spallation Target using Evolutionary Neural Networks and Sparse Polynomial Expansions

Oct.19

Postponed

Roy Lederman
Department of Statistics and Data Science, Yale University

The Geometry of Molecular Conformations in Cryo-EM

Cryo-Electron Microscopy (cryo-EM) is an imaging technology that is revolutionizing structural biology. Cryo-electron microscopes produce many very noisy two-dimensional projection images of individual frozen molecules; unlike related methods, such as computed tomography (CT), the viewing direction of each particle image is unknown. The unknown directions and extreme noise make the determination of the structure of molecules challenging. While other methods for structure determination, such as x-ray crystallography and NMR, measure ensembles of molecules, cryo-electron microscopes produce images of individual particles. Therefore, cryo-EM could potentially be used to study mixtures of conformations of molecules. We will discuss a range of recent methods for analyzing the geometry of molecular conformations using cryo-EM data and some new issues that arise.

Oct.26

ZOOM

Christos Mavridis
Department of Electrical & Computer Engineering, University of Maryland

Online Deterministic Annealing: Progressive Learning for Cyber-Physical Systems

The continuously increasing interest in intelligent autonomous systems is accentuating the need for new developments on cyber-physical systems that can learn, adapt, and reason. Towards this direction, we will formally analyze the properties of learning as a continuous, dynamic, and adaptive process of such systems, in applications where computational resources are limited, and robustness and interpretability are prioritized. We will focus on the notion of progressive learning: the ability to hierarchically approximate the solution of an optimal decision-making problem given real-time observations from the system and its environment. We will introduce the Online Deterministic Annealing (ODA) algorithm as a gradient-free stochastic optimization method to construct a learning model that progressively increases its complexity as needed, through an intuitive bifurcation phenomenon. We will discuss the properties of robustness and interpretability, and the importance of being able to control the performance-complexity trade-off. Finally, we will showcase its use in constructing supervised, unsupervised, and reinforcement learning algorithms with applications in robotics, multi-agent systems, and communication networks.

Nov.2

Krieger 413

Enrique Mallada
ECE@JHU

Model-Free Analysis of Dynamical Systems Using Recurrent Sets

In this talk, we develop model-free methods for the analysis of dynamical systems using data. Our key insight is to replace the notion of invariance, a core concept in Lyapunov Theory, with the more relaxed notion of recurrence. A set is τ-recurrent (resp. k-recurrent) if every trajectory that starts within the set returns to it after at most τ seconds (resp. k steps). We then leverage the notion of recurrence to develop several analysis tools and algorithms to study dynamical systems. We first consider the problem of learning an inner approximation of the region of attraction (ROA) of an asymptotically stable equilibrium point without an explicit model of the dynamics. We show that a τ-recurrent set containing a stable equilibrium must be a subset of its ROA under mild assumptions. We then leverage this property to develop algorithms that compute inner approximations of the ROA using counter-examples of recurrence that are obtained by sampling finite-length trajectories. Our algorithms process samples sequentially, which allows them to continue being executed even after an initial offline training stage. We will finalize by providing some recent extensions of this work that generalizes Lyapunov's Direct Method to allow for non-decreasing functions to certify stability, and illustrate future directions of research.

Model-Free Analysis of Dynamical Systems Using Recurrent Sets

Nov.30

ZOOM

Ruoyu Wu
Department of Mathematics, Iowa State University

Graphon mean field systems: large population and long-time asymptotics

We consider heterogeneously interacting diffusive particle systems and their large population limit. The interaction is of mean field type with (random) weights characterized by an underlying graphon. The limit is given by a graphon particle system consisting of independent but heterogeneous nonlinear diffusions whose probability distributions are fully coupled. A law of large numbers result is established as the system size increases and the underlying graphons converge. Under suitable convexity assumptions, we show the exponential ergodicity for the system, establish the uniform in time law of large numbers and concentration bounds, and analyze the uniform in time Euler approximation. The precise rate of convergence of the Euler approximation is provided. Based on joint works with Erhan Bayraktar and Suman Chakraborty.

Organizers: Mauro Maggioni, Fei Lu, Marie-Jose Kuffner, and Christian Kümmerle. Contacts us if you would like to meet with the speakers.

The Data Science seminar will be held on ZOOM and/or in-person at at 3:00-4:00pm EST on Wednesdays.
Meeting ID: 936 4120 5752 Passcode: 507989

Feb.23

Zoom

Evelyn Lunasin
United States Naval Academy

An Efficient Continuous Data Assimilation Algorithm for the Sabra Shell Model of Turbulence

Complex nonlinear turbulent dynamical systems are ubiquitous in many areas of research. Recovering unobserved state variables is an important topic for the data assimilation of turbulent systems. In this talk I will present an efficient continuous in time data assimilation scheme which exploits closed analytic formulae for updating the unobserved state variables. It is computationally efficient and accurate. The new data assimilation scheme is combined with a simple reduced order modeling technique that involves a cheap closure approximation and a noise inflation. In such a way, many complicated turbulent dynamical systems can satisfy the requirements of the mathematical structures for the proposed efficient data assimilation scheme. The new data assimilation scheme is then applied to the Sabra shell model, which is a conceptual model for nonlinear turbulence. The goal is to recover the unobserved shell velocities across different spatial scales. It is shown that the new data assimilation scheme is skillful in capturing the nonlinear features of turbulence including the intermittency and extreme events in both the chaotic and the turbulent dynamical regimes. It is also shown that the new data assimilation scheme is more accurate and computationally cheaper than the standard ensemble Kalman filter and nudging data assimilation schemes for assimilating the Sabra shell model with partial observations. This is joint work with Nan Chen and Yuchen Li. .

An Efficient Continuous Data Assimilation Algorithm...

March 2

Zoom

Katy Craig
University of California, Santa Barbara

Graph Clustering Dynamics: From Spectral to Mean Shift

Clustering algorithms based on mean shift or spectral methods on graphs are ubiquitous in data analysis. However, in practice, these two types of algorithms are treated as conceptually disjoint: mean shift clusters based on the density of a dataset, while spectral methods allow for clustering based on geometry. In joint work with Nicolás García Trillos and Dejan Slepcev, we define a new notion of Fokker-Planck equation on graph and use this to introduce an algorithm that interpolates between mean shift and spectral approaches, enabling it to cluster based on both the density and geometry of a dataset. We illustrate the benefits of this approach in numerical examples and contrast it with Coifman and Lafon’s well-known method of diffusion maps, which can also be thought of as a Fokker-Planck equation on a graph, though one that degenerates in the zero diffusion limit.

Graph Clustering Dynamics: From Spectral to Mean Shift

March 16

Hybrid in Shaffer 301 and on Zoom

Clément Royer
University Paris-Dauphine

Numerical optimization methods with complexity guarantees for nonconvex data science

Nonconvex optimization problems have attracted significant attention from the data science community. Those formulations arise not only in deep learning, but also in data analysis and robust statistics. As a result, these instances come with a great amount of structure, that can be leveraged for optimization purposes. For such problems, the optimization landscape is often completely characterized by means of the first- and second-order derivatives. Consequently, the development of optimization routines based on those derivatives has become an important area of research, wherein the theoretical efficiency of a method is typically measured by deriving complexity bounds, or convergence rates. Such guarantees quantify the effort required to reach an approximate solution, and are often used to compare optimization schemes. On the other hand, approaches that achieve the best theoretical results tend to depart from standard optimization frameworks, and may even by outperformed by textbook algorithms in practice.

In this talk, we revisit popular numerical optimization frameworks to equip them with complexity guarantees while maintaining their practical appeal, as evidenced on certain data science problems. The first part of the talk concerns nonlinear conjugate gradient methods. Through a careful complexity analysis, we are able to identify a regime in which nonlinear conjugate gradient achieves a better complexity than gradient descent on nonconvex problems. Our experiments on nonconvex robust regression suggest that this regime is actually met in practice, thus partially explaining the superior performance of nonlinear conjugate gradient on these instances. In the second part of the talk, we study Newton-Conjugate Gradient techniques, and describe a framework equipped with the best-known complexity guarantees for nonconvex optimization that hews close to the classical algorithm, and illustrate the practical impact of enforcing these guarantees. We then propose a variant of this method tailored to specific nonconvex optimization landscapes, for which even stronger results can be derived. Finally, if time permits, we will also discuss nonconvex problems arising in the development of neural network architectures, and the guarantees that can be sought for on such instances.

April 6 Recording pwd: DSS@JHU0406

Hybrid in Shaffer 301 and on Zoom

Matthew Levine
Caltech

A Framework for Machine Learning of Model Error in Dynamical Systems

The development of data-informed predictive models for dynamical systems is of widespread interest in many disciplines. Here, we present a unifying framework for blending mechanistic and machine-learning approaches for identifying dynamical systems from data. This framework is agnostic to the chosen machine learning model parameterization, and casts the problem in both continuous- and discrete-time. We will also show recent developments that allow these methods to learn from noisy, partial observations. We first study model error from the learning theory perspective, defining the excess risk and generalization error. For a linear model of the error used to learn about ergodic dynamical systems, both excess risk and generalization error are bounded by terms that diminish with the square-root of T (the length of the training trajectory data). In our numerical examples, we first study an idealized, fully-observed Lorenz system with model error, and demonstrate that hybrid methods substantially outperform solely data-driven and solely mechanistic-approaches. Then, we present recent results for modeling partially observed Lorenz dynamics that leverages both data assimilation and neural differential equations. Joint work with Andrew Stuart.

April 13 Recording pwd: DSS@JHU0413

Hybrid in Shaffer 301 and on Zoom

Salar Fattahi
University of Michigan

Convergence of Subgradient Method in Factorized Models: Small Initialization, Noisy Measurements, and Over-parameterization

Factorized models, from low-rank matrix recovery to deep neural networks, play a central role in many modern machine learning problems. Despite their widespread applications, problems based factorized models are difficult to solve in their worst case due to their inherent nonconvexity--a fact noted as early as the 1980s. Our talk is inspired by the recent observations in the optimization and machine learning community that, despite their nonconvexity, many realistic and practical instances of factorized models are far from their worst case scenarios.

We study a natural nonconvex and nonsmooth formulation of two prototypical factorized models, namely low-rank matrix factorization and deep linear regression, where the goal is to recover a low-dimensional model from a limited number of measurements, a subset of which may be grossly corrupted with noise. We study a scenario where the dimension of the true solution is unknown and over-estimated instead. The over-estimation of the dimension gives rise to an over-parameterized model in which there are more degrees of freedom than needed. Such over-parameterization may lead to overfitting, or adversely affect the performance of the algorithm. We prove that a simple subgradient method (SubGM) with small initialization is agnostic to both over-parameterization and noise in the measurements. In particular, we provide the first unifying framework for analyzing the behavior of subgradient method under different noise models, showing that it converges to the true solution, even under arbitrarily large and arbitrarily dense noise values, and--perhaps surprisingly--even if the globally optimal solutions do not correspond to the ground truth. Finally, we study the effect of depth on the local landscape of the factorized models, showing that deeper models enjoy a flatter landscape around ground truth.

Preprint

April 20

Hybrid in Shaffer 1 and on Zoom

Terry Lyons
University of Oxford

Rough paths and streamed data - current challenges

The mathematics of rough paths began as an approach to extending calculus to address the interactions between highly oscillatory streams. The methods come from geometry of Chen and the analysis of Young. A key step involved is the representation of unparameterized paths as elements of the tensor algebra. Today this representation or feature set is having a significant impact on the data science of streamed data.

April 27 Recording pwd: DSS@JHU0427

Hybrid in Shaffer 301 and on Zoom

Amarjit Budhiraja
University of North Carolina at Chapel Hill

Numerical Approaches for Computing Quasi-stationary Distributions

Markov processes with absorbing states occur frequently in epidemiology, statistical physics, population biology, and other areas. Quasi-stationary distributions (QSD) are the basic mathematical object used to describe the long-time behavior of such Markov processes on non-absorption events. Just as stationary distributions of ergodic Markov processes make the law of the Markov process, initialized at that distribution, invariant at all times, quasi-stationary distributions are probability measures that leave the conditional law of the Markov process, on the event of non-absorption, invariant. In this talk I will present a couple of numerical approaches for approximating QSD for Markov processes. These approaches use ideas from reinforced random walks, interacting particle systems, and stochastic approximations. The talk is based on joint work with Adam Waterbury.

Approximating Quasi-Stationary Distributions with Interacting Reinforced Random Walks

Organizers: Mauro Maggioni, Fei Lu, Marie-Jose Kuffner, and Christian Kümmerle. Contacts us if you would like to meet with the speakers.

The Data Science seminar will be held on ZOOM at at 3:00-4:00pm EST on Wednesdays.
Meeting ID: 936 4120 5752 Passcode: 507989

September 15

Shiqian Ma
UC Davis

Riemannian Optimization for Projection Robust Optimal Transport

The optimal transport problem is known to suffer the curse of dimensionality. A recently proposed approach to mitigate the curse of dimensionality is to project the sampled data from the high dimensional probability distribution onto a lower-dimensional subspace, and then compute the optimal transport between the projected data. However, this approach requires solving a max-min problem over the Stiefel manifold, which is very challenging in practice. In this talk, we propose a Riemannian block coordinate descent (RBCD) method to solve this problem. We analyze the complexity of arithmetic operations for RBCD to obtain an $\epsilon$-stationary point, and show that it significantly improves the corresponding complexity of existing methods. Numerical results on both synthetic and real datasets demonstrate that our method is more efficient than existing methods, especially when the number of sampled data is very large. If time permits, we will also talk about the projection robust Wasserstein barycenter problem.

Related papers:
A Riemannian Block Coordinate Descent Method for Computing the Projection Robust Wasserstein Distance (ICML 2021)
Projection Robust Wasserstein Barycenters (ICML 2021)

$Shiqian Ma$

October 13

Yuxin Chen
Princeton University

Inference and Uncertainty Quantification for Low-Rank Models

Many high-dimensional problems involve reconstruction of a low-rank matrix from incomplete and corrupted observations. Despite substantial progress in designing efficient estimation algorithms, it remains largely unclear how to assess the uncertainty of the obtained low-rank estimates, and how to construct valid yet short confidence intervals for the unknown low-rank matrix.
In this talk, I will discuss how to perform inference and uncertainty quantification for two examples of low-rank models: (1) noisy matrix completion, and (2) heteroskedastic PCA with Missing Data. For both problems, we identify statistically efficient estimators that admit non-asymptotic distributional characterizations, which in turn enable optimal construction of confidence intervals for, say, the unseen entries of the low-rank matrix of interest. Our inferential procedures do not rely on sample splitting, thus avoiding unnecessary loss of data efficiency. All this is accomplished by a powerful leave-one-out analysis framework that originated from probability and random matrix theory.
The first part of my talk is based on joint work with Cong Ma, Yuling Yan and Jianqing Fan, while the second part is based on joint work with Yuling Yan and Jianqing Fan.

Related papers:
Inference and uncertainty quantification for noisy matrix completion
Inference for Heteroskedastic PCA with Missing Data

October 20

Grigorios A. Pavliotis
Imperial College University

Optimal Langevin Samplers

Sampling from a probability distribution in a high dimensional spaces is a standard problem in computational statistical mechanics, Bayesian statistics and other applications. A standard approach for doing this is by constructing an appropriate Markov process that is ergodic with respect to the measure from which we wish to sample. In this talk we will present a class of sampling schemes based on Langevin-type stochastic differential equations. We will show, in particular, nonreversible Langevin samplers, i.e. stochastic dynamics that do not satisfy detailed balance, have, in general, better properties than their reversible counterparts, in the sense of accelerating convergence to equilibrium and of reducing the asymptotic variance. Numerical schemes for such nonreversible samplers will be discussed and the connection with nonequilibrium statistical mechanics will be made.

Nonreversible Langevin Samplers: Splitting Schemes, Analysis and Implementation
Optimizing interacting Langevin dynamics using spectral gaps

October 27, Hybrid: Hodson 316

Thomas Fai
Brandeis University

Coarse-grained stochastic model of myosin-driven vesicles into dendritic spines

We model vesicle transport into dendritic spines, which are micron-sized structures located at the postsynapses of neurons characterized by their thin necks and bulbous heads. Recent high-resolution 3D images show that spine morphologies are highly diverse. To study the influence of geometry on transport, our model reduces the fluid dynamics of vesicle motion to two essential parameters representing the system geometry and elasticity. Upon including competing molecular motor species that push and pull on vesicles, the model exhibits multiple steady states that neurons could exploit in order to control the strength of synapses. Moreover, the small numbers of motors lead to random switching between these steady states. We describe a method that incorporates stochasticity into the model to predict the probability and mean time of translocation as a function of spine geometry.

Coarse-grained stochastic model of myosin-driven vesicles into dendritic spines

Nov. 3

Arnaud Doucet
University of Oxford & DeepMind

Diffusion Schrodinger Bridge with Applications to Score-Based Generative Modeling

Progressively applying Gaussian noise transforms complex data distributions to approximately Gaussian. Reversing this dynamic defines a generative model. When the forward noising process is given by a Stochastic Differential Equation (SDE), Song et al. (2021) demonstrate how the time inhomogeneous drift of the associated reverse-time SDE may be estimated using score-matching. A limitation of this approach is that the forward-time SDE must be run for a sufficiently long time for the final distribution to be approximately Gaussian. In contrast, solving the Schrodinger Bridge problem (SB), i.e. an entropy-regularized optimal transport problem on path spaces, yields diffusions which generate samples from the data distribution in finite time. We present Diffusion SB (DSB), an original approximation of the Iterative Proportional Fitting (IPF) procedure to solve the SB problem, and provide theoretical analysis along with generative modeling experiments. The first DSB iteration recovers the methodology proposed by Song et al. (2021), with the flexibility of using shorter time intervals, as subsequent DSB iterations reduce the discrepancy between the final-time marginal of the forward (resp. backward) SDE with respect to the prior (resp. data) distribution. Beyond generative modeling, DSB offers a widely applicable computational optimal transport tool as the continuous state-space analogue of the popular Sinkhorn algorithm (Cuturi, 2013).
This is joint work with Valentin De Bortoli, James Thornton and Jeremy Heng.

https://arxiv.org/abs/2106.01357

Nov 10, Hybrid: Hodson 316

Paris Perdikaris
University of Pennsylvania

Learning to solve parametric PDEs with deep operator networks

Partial differential equations (PDEs) play a central role in the mathematical analysis and modeling of complex dynamic processes across all corners of science and engineering. Their solution often requires laborious analytical or computational tools, associated with a cost that is dramatically amplified when different scenarios need to be investigated, for example corresponding to different initial or boundary conditions, different inputs, etc. In this work we introduce physics-informed DeepONets; a deep learning framework for learning the solution operator of arbitrary PDEs, even in the absence of any paired input-output training data. We illustrate the effectiveness of the proposed framework in rapidly predicting the solution of various types of parametric PDEs up to three orders of magnitude faster compared to conventional PDE solvers, setting a new paradigm for modeling and simulation of non-linear and non-equilibrium processes in science and engineering.

https://arxiv.org/abs/2103.10974
https://arxiv.org/abs/2106.05384

Special time 4pm, Dec. 1

Madeleine Udell
Cornell University

Detecting equivalence between iterative algorithms for optimization

When are two algorithms the same? How can we be sure a recently proposed algorithm is novel, and not a minor twist on an existing method? In this talk, we present a framework for reasoning about equivalence between a broad class of iterative algorithms, with a focus on algorithms designed for convex optimization. We propose several notions of what it means for two algorithms to be equivalent, and provide computationally tractable means to detect equivalence. Our main definition, oracle equivalence, states that two algorithms are equivalent if they result in the same sequence of calls to the function oracles (for suitable initialization). Borrowing from control theory, we use state-space realizations to represent algorithms and characterize algorithm equivalence via transfer functions. Our framework can also identify and characterize some algorithm transformations including permutations of the update equations, repetition of the iteration, and conjugation of some of the function oracles in the algorithm. A software package named Linnaeus implements the framework and makes it easy to find other iterative algorithms that are equivalent to an input algorithm. More broadly, this framework and software advances the goal of making mathematics searchable.
This is based on joint work with Shipu Zhao and Laurent Lessard.

An automatic system to detect equivalence between iterative algorithms

Organizers: Mauro Maggioni, Fei Lu, Sui Tang, Felix (Xiaofeng) Ye, and Ming Zhong. Contacts us if you would like to meet with the speakers.

The Data Science seminar will be held at Maryland Hall 309 at 3:00-4:00pm on Wednesdays, unless otherwise specified.

Postponed, date TBD

David F. Anderson
University of Wisconsin

Network structure and dynamics for biochemical reaction networks

Models of cellular processes are often represented with networks that describe the interactions between the constituent molecules. The mathematical study of how dynamical properties of a system relate to graphical properties of its associated network often goes by the name "reaction network theory". In this talk, I will connect some of the classical results of reaction network theory, including the Deficiency Zero Theorem and its analog in the stochastic setting, with some current results at the interface of synthetic biology and mathematics. Topics will include: (i) a characterization of when a stochastic model will admit a time-dependent distribution that is a product of Poissons and (ii) designing reaction networks that admit a particular (marginal) stationary distribution.

$David Anderson$

Cancelled due to COVID 19.

Eitan Tadmor
University of Maryland

Emergent Behavior in Collective Dynamics
A fascinating aspect of collective dynamics is the self-organization of small-scales and their emergence as higher-order patterns -- clusters, flocks, tissues, parties. The emergence of different patterns can be described in terms of few fundamental ''rules of interactions''. I will discuss recent results of the large-time, large-crowd dynamics, driven by anticipation that tend to align the crowd, while other pairwise interactions keep the crowd together and prevent over-crowding. In particular, I address the question how short-range interactions lead to the emergence of long-range patterns, comparing different rules of interactions based on geometric vs. topological neighborhoods.

Cancelled due to COVID 19.

April 8

Paris Perdikaris
University of Pennsylvania

Cancelled due to COVID 19.

April 22

Evelyn Lunasin
U.S. Naval Academy

TBD

TBD.

Organizers: Mauro Maggioni, Fei Lu, Sui Tang, Felix (Xiaofeng) Ye, and Ming Zhong. Contacts us if you would like to meet with the speakers.

The Data Science seminar will be held at Maryland Hall 109 at 3:00-4:00pm on Wednesdays, unless otherwise specified.

Sep. 11th

Tingran Gao
University of Chicago

Multi-Representation Manifold Learning on Fibre Bundles

Fibre bundles serve as a natural geometric setting for many learning problems involving non-scalar pairwise interactions. Modeled on a fixed principal bundle, different irreducible representations of the structural group induce many associated vector bundles, encoding rich geometric information for the fibre bundle as well as the underlying base manifold. An intuitive example for such a learning paradigm is phase synchronization---the problem of recovering angles from noisy pairwise relative phase measurements---which is prototypical for a large class of imaging problems in cryogenic electron microscopy (cryo-EM) image analysis. We propose a novel nonconvex optimization formulation for this problem, and develop a simple yet efficient two-stage algorithm that, for the first time, guarantees strong recovery for the phases with high probability. We demonstrate applications of this multi-representation methodology that improve denoising and clustering results for cryo-EM images. This algorithmic framework also extends naturally to general synchronization problems over other compact Lie groups, with a wide spectrum of potential applications.

This talk is based on three joint work with Prof. Zhizhen Zhao from UIUC:
Tingran Gao and Zhizhen Zhao, Multi-Frequency Phase Synchronization, Proceedings of the 36th International Conference on Machine Learning, PMLR 97:2132 - 2141, 2019
Tingran Gao, Yifeng Fan, and Zhizhen Zhao, Representation Theoretic Patterns in Multi-Frequency Class Averaging for Three-Dimensional Cryo-Electron Microscopy, arXiv:1906.01082. (2019)
Yifeng Fan, Tingran Gao, and Zhizhen Zhao, Unsupervised Co-Learning on G-Manifolds Across Irreducible Representations, arXiv:1906.02707. (2019)

Sep. 25th, 2019

Anna Little
Michigan State University

Robust Statistical Procedures for Clustering in High Dimensions

This talk addresses multiple topics related to robust statistical procedures for clustering in high dimensions, including path-based spectral clustering (a new method), classical multidimensional scaling (an old method), and clustering in signal processing. Path-based spectral clustering is a novel approach which combines a data driven metric with graph-based clustering. Using a data driven metric allows for fast algorithms and strong theoretical guarantees when clusters concentrate around low-dimensional sets. Another approach to high-dimensional clustering is classical multidimensional scaling (CMDS), a dimension reduction technique widely popular across disciplines due to its simplicity and generality. CMDS followed by a simple clustering algorithm can exactly recover all cluster labels with high probability when the signal to noise ratio is high enough. However, scaling conditions become increasingly restrictive as the ambient dimension increases, illustrating the need for robust unbiasing procedures in high dimensions. Clustering in signal processing is the final topic; in this context each data point corresponds to a corrupted signal. The classic multireference alignment problem is generalized to include random dilation in addition to random translation and additive noise, and a wavelet based approach is used to define an unbiased representation of the target signal(s) which is robust to high frequency perturbations.

Relevant papers
Anna Little, James Murphy, and Mauro Maggioni, Path-Based Spectral Clustering: Guarantees, Robustness to Outliers, and Fast Algorithms, arXiv:1712.06206, 2017.
Anna Little, Yuying Xie, and Qiang Sun, An Analysis of Classical Multidimensional Scaling, arXiv:1812.11954, 2018.

Oct. 2nd, 2019

Jinqiao Duan
Illinois Institute of Technology

Data Science Plus Dynamical Systems: What Can We Learn?

Observational datasets are abundant. Dynamical systems are mathematical models in engineering, medicine and science. Data are noisy and dynamical systems are often under random fluctuations (either Gaussian or non-Gaussian noise). The interactions between data science and dynamical systems are becoming exciting. On the one hand, dynamical systems tools are valuable to extract information from datasets. On the other hand, data science techniques are indispensable for understanding dynamical behaviors with observational data.
I will present recent progress on extracting information like the most probable transition pathways, mean residence time, and escape probability from datasets, and on estimating system states and parameters with help of datasets. In addition to highlighting the underlying dynamical systems structures, such as stochastic flows, slow manifolds and dimension reduction, I will outline several mathematical issues at the foundation of relevant machine learning approaches.

Relevant papers
The Onsager-Machlup function as Lagrangian for the most probable path of a jump-diffusion process.
Discovering mean residence time and escape probability from data of stochastic dynamical systems.
Effective Filtering Analysis for Non-Gaussian Dynamic Systems.
Dynamical inference for transitions in stochastic systems with alpha-stable Levy noise.
More on arXiv.

Oct. 9th, 2019

Guang Lin
Purdue University

Uncertainty Quantification and Machine Learning of the Physical Laws Hidden Behind the Noisy Data

In this talk, we will present three new data-driven, machine-learning based methods for Learning of the physical laws Hidden Behind the noisy data and quantifying the uncertainties in machine learning. First, I will present a new data-driven paradigm on how to quantify the structural uncertainty (model-form uncertainty) and learn the physical laws hidden behind the noisy data in the complex systems governed by partial differential equations. The key idea is to identify the terms in the underlying equations and to approximate the coefficients of the terms with error bars using Bayesian machine learning algorithms on the available noisy measurement. In particular, Bayesian sparse feature selection and parameter estimation are performed. Numerical experiments show the robustness of the learning algorithms with respect to noisy data and size, and its ability to learn various candidate equations with error bars to represent the quantified uncertainty. Second, I will introduce a fast-probabilistic convolutional encoder-decoder network named ConvPDE-UQ for predicting the solutions of heterogeneous elliptic partial differential equations on varied domains. Unlike other approaches, ConvPDE-UQ can quantify the uncertainties in deep-learning-based prediction and allow training to be scaled to the large data sets. Finally, to predict material failure and fracture propagation, we will present a new deep neural network named Peri-Net.

Relevant papers
Robust data-driven discovery of governing physical laws with error bars.
Convolutional neural networks with quantified uncertainty for heterogeneous elliptic partial differential equations on varied domains.
Peri-Net: Analysis of Crack Patterns Using Deep Neural Networks.

$Guang Lin$

Note date and place: Nov. 7th, 1:30-2:30pm, Whitehead 304 AMS seminar

Jonathan Mattingly
Duke University

Computational methods for Quantifying Gerrymandering and other computational statistical mechanics problems

I will describe some of the interesting problems which have arisen around the problem of understanding Gerrymandering. It is a high sampling dimensional problem. I will talk about some basic MCMC schemes and some extensions to both interesting global moves as well as some other generalizations. I will also take a moment to frame the problem and state some open questions.

Relevant papers
Quantifying Gerrymandering in North Carolina.
Evaluating Partisan Gerrymandering in Wisconsin.

Nov. 13th, 2019

Le Song
Georgia Institute of Technology

Learning Augmented Design of Algorithms

Algorithms are step-by-step instructions designed by humans to solve a problem. However, many data analytics problems are intrinsically hard and complex, making the design of effective algorithms very challenging. Domain experts have to perform extensive research, experiment with many trial-and-errors, and carefully craft some approximation or heuristic schemes in order to design effective algorithms. Previous algorithm design paradigms seldom systematically exploit a common trait of real-world algorithms: instances of the same type of problem are solved again and again on a regular basis using the algorithm, maintaining the same problem structure, but differing mainly in their data. In this talk, I will explain two examples of using learning to augment algorithm design (combinatorial optimization and sparse recovery). The new design paradigm will delegate some difficult design choices in an algorithm to a nonlinear learning model, and use data to figure out the best model/algorithm. I will show that the learned augmented design leads to interesting and effective new algorithms.

Dec. 4th, 2019

Matthew Zahr
University of Notre Dame

Integrating computational physics and numerical optimization to address challenges in computational science, engineering, and medicine

Optimization problems governed by partial differential equations are ubiquitous in modern science, engineering, and mathematics. They play a central role in optimal design and control of multiphysics systems, data assimilation, and inverse problems. However, as the complexity of the underlying PDE increases, efficient and robust methods to accurately compute the objective function and its gradient become paramount. To this end, I will present a globally high-order discretization of PDEs and their quantities of interest and the corresponding fully discrete adjoint method for use in a gradient-based PDE-constrained optimization setting. The framework is applied to solve a several of optimization problems including the design of energetically optimal flapping motions, the design of energy harvesting mechanisms, and data assimilation to dramatically enhance the resolution of magnetic resonance images. In addition, I will demonstrate that the role of optimization in computational physics extends well beyond these traditional design and control problems. I will introduce a new method for the discovery and high-order accurate resolution of shock waves in compressible flows using PDE-constrained optimization techniques. The key feature of this method is an optimization formulation that aims to align discontinuous features of the solution basis with discontinuities in the solution. The method is demonstrated on a number of one- and two-dimensional transonic and supersonic flow problems. In all cases, the framework tracks the discontinuity closely with curved mesh elements and provides accurate solutions on extremely coarse meshes.

References
An adjoint method for a high-order discretization of deforming domain conservation laws for optimization of flow problems.
An optimization-based approach for high-order accurate discretization of conservation laws with discontinuous solutions.
High-order, linearly stable, partitioned solvers for general multiphysics problems based on implicit--explicit Runge--Kutta schemes.

Organizers: Mauro Maggioni, Fei Lu, Sui Tang, Felix (Xiaofeng) Ye, and Ming Zhong. Contacts us if you would like to meet with the speakers.

The Data Science seminar is merging with the CIS seminar , and it will be in either Mergenthaler 11 or in Clark Hall 316, throughout the semester, on Johns Hopkins Homewood campus this map). As the merging of the two seminar series takes place in Spring 2019, the the schedule of the CIS seminar will remain available and takes precedence over the one below for CIS speakers.

Jan. 29th

Hamed Pirsiavash
UMBC

Clark Hall Room 12:15-1:15pm

Feb. 5th, 2019

Ted Satterthwaite
UPenn

Clark Hall Room 12:15-1:15pm

Feb. 12th, 2019

Yanxun Xu
Johns Hopkins University

Clark Hall Room 110, 1pm-2pm (light lunch served at 12:30pm)

Bayesian Estimation of Sparse Spiked Covariance Matrices in High Dimensions

Abstract:
We propose a Bayesian methodology for estimating spiked covariance matrices with jointly sparse structure in high dimensions. The spiked covariance matrix is reparametrized in terms of the latent factor model, where the loading matrix is equipped with a novel matrix spike-and-slab LASSO prior, which is a continuous shrinkage prior for modeling jointly sparse matrices. We establish the rate-optimal posterior contraction for the covariance matrix with respect to the operator norm as well as that for the principal subspace with respect to the projection operator norm loss. We also study the posterior contraction rate of the principal subspace with respect to the two-to-infinity norm loss, a novel loss function measuring the distance between subspaces that is able to capture element-wise eigenvector perturbations. We show that the posterior contraction rate with respect to the two-to-infinity norm loss is tighter than that with respect to the routinely used projection operator norm loss under certain low-rank and bounded coherence conditions. In addition, a point estimator for the principal subspace is proposed with the rate-optimal risk bound with respect to the projection operator norm loss. These results are based on a collection of concentration and large deviation inequalities for the matrix spike-and-slab LASSO prior. The numerical performance of the proposed methodology is assessed through synthetic examples and the analysis of a real-world face data example.

April 3, Special time: 3:00-4:00, location: Shaffer 100

Andrew Sommese
University of Notre Dame

Title: A Brief Survey of Numerical Algebraic Geometry

Abstract: This talk will survey numerical algebraic geometry and the changes of viewpoint as the area has progressed. There will be a discussion of some of the motivating problems from kinematics and differential equations

Related papers:
Numerical Decomposition of Real Algebraic Curves and Surfaces, ACM Transactions of Mathematical Software, 44 (2017)
Homotopy techniques for tensor decomposition and perfect identifiability, J. Reine Angew.\ Math. (2016)
On the completeness of solutions of Bethe's equations , Physical Review E, 88, 052113 (2013)
Cell cycle control and bifurcation for a free boundary problem modeling tissue growth, JSC (2013)
A homotopy method based on WENO schemes for solving steady state problems of hyperbolic conservation laws, JCP (2013)
Numerically solving polynomial systems with Bertini, SIAM (2013)
Numerical Algebraic Geometry and Algebraic Kinematics, Acta Numerica (2011)
Numerical solution of systems of polynomials arising in engineering and science,World Scientific Press. (2005)

April 10, Special time: 3:00-4:00, location: Shaffer 100

Nan Chen
University of Wisconsin-Madison

Title: A Conditional Gaussian Framework for Uncertainty Quantification, Data Assimilation and Prediction of Nonlinear Turbulent Dynamical Systems

Abstract: A conditional Gaussian framework for uncertainty quantification, data assimilation and prediction of nonlinear turbulent dynamical systems will be introduced in this talk. Despite the conditional Gaussianity, the dynamics remain highly nonlinear and are able to capture strongly non-Gaussian features such as intermittency and extreme events. The conditional Gaussian structure allows efficient and analytically solvable conditional statistics that facilitates the real-time data assimilation and prediction. This talk will include three applications of such conditional Gaussian framework. The first part regards the state estimation and data assimilation of multiscale and turbulent ocean flows using noisy Lagrangian tracers. Rigorous analysis shows that an exponential increase in the number of tracers is required for reducing the uncertainty by a fixed amount. This indicates a practical information barrier. In the second part, an efficient statistically accurate algorithm is developed that is able to solve a rich class of high-dimensional Fokker-Planck equation with strong non-Gaussian features and beat the curse of dimensions. In the last part of this talk, a physics-constrained nonlinear stochastic model is developed, and is applied to predicting the Madden-Julian oscillation indices with strongly non-Gaussian intermittent features.

Related papers:

Organizers: Mauro Maggioni, Fei Lu. Contacts us if you would like to meet with the speakers.

The data seminar will take place on Wednesdays at 3pm, in Shaeffer 304, during the semester, on Johns Hopkins Homewood campus this map).

October 10

Jianfeng Lu
Duke

Title: Solving large-scale leading eigenvalue problem

Abstract: The leading eigenvalue problems arise in many applications. When the dimension of the matrix is super huge, such as for applications in quantum many-body problems, conventional algorithms become impractical due to computational and memory complexity.
In this talk, we will describe some recent works on new algorithms for the leading eigenvalue problems based on randomized and coordinate-wise methods (joint work with Yingzhou Li and Zhe Wang).

Related papers:
Coordinate-wise descent methods for leading eigenvalue problem
The full configuration interaction quantum Monte Carlo method in the lens of inexact power iteration

$Jianfeng Lu$

November 6, Special time: 4:00-5:00, location: Gilman 277

Mohammad Farazmand
Department of Mechanical Engineering, MIT

Title: Extreme Events: Dynamics, Prediction and Mitigation

Abstract: A wide range of natural and engineering systems exhibit extreme events, i.e., spontaneous intermittent behavior manifested through sporadic bursts in the time series of their observables. Examples include ocean rogue waves, intermittency in turbulence and extreme weather patterns. Because of their undesirable impact on the system or the surrounding environment, the real-time prediction and mitigation of extreme events is of great interest. In this talk, I discuss some recent advances in the quantification and prediction of extreme events. In particular, I introduce a variational method that disentangles the mechanisms underpinning the formation of extreme events. This in turn enables the data-driven, real-time prediction of the extreme events. I demonstrate the application of this method with several examples including the prediction of ocean rogue waves and the intermittent energy dissipation bursts in turbulent fluid flows.

Related papers:
Extreme Events: Mechanisms and Prediction
A variational approach to probing extreme events in turbulent dynamical systems
Reduced-order prediction of extreme events

November 7

Kevin Lin
School of Mathematical Science, University of Arizona

Title: Mori-Zwanzig formalism and discrete-time modeling of chaotic dynamics

Abstract: Nonlinear dynamic phenomena often require a large number of dynamical variables to model, only a small fraction of which are of direct interest. Reduced models that use only the relevant dynamical variables can be very useful in such situations, both for computational efficiency and insights into the dynamics. Recent work has shown that the NARMAX (Nonlinear Auto-Regressive Moving-Average with eXogenous inputs) representation of stochastic processes provides an effective basis for parametric model reduction in a number of concrete settings [Chorin-Lu PNAS 2015]. In this talk, I will review some of these developments, then explain how the NARMAX method can be seen as a special case of a general theoretical framework for model reduction due to Mori and Zwanzig. These ideas will be illustrated on a prototypical model of spatiotemporal chaos.

Related papers:
Preprint available upon request (please write to feilu@math.jhu.edu)
Data-based stochastic model reduction for the Kuramoto--Sivashinsky equation
Comparison of continuous and discrete-time data-based modeling for hypoelliptic systems

$KLin$

December 5

Clarence Rowley
Princeton University

Title: Structure, stability, and simplicity in complex fluid flows

Abstract: Fluid flows can be extraordinarily complex, and even turbulent, yet often there is structure lying within the apparent complexity. Understanding this structure can help explain observed physical phenomena, and can help with the design of control strategies in situations where one would like to change the natural state of a flow. This talk addresses techniques for obtaining simple, approximate models for fluid flows, using data from simulations or experiments. We discuss a number of methods, including balanced truncation, linear stability theory, and dynamic mode decomposition, and apply them to several flows with complex behavior, including a transitional channel flow, a jet in crossflow, and a T-junction in a pipe.

Related papers:
Model Reduction for Flow Analysis and Control

December 10, Special time: 12:00-1:00, location: Shaffer 304

Marina Meila
Department of Statistics, University of Washington

Title: Unsupervised Validation for Unsupervised Learning

Abstract: Scientific research involves finding patterns in data, formulating hypotheses, and validating them with new observations. Machine learning is many times faster than humans at finding patterns, yet the task of validating these as "significant" is still left to the human expert or to further experiment. In this talk I will present a few instances in which unsupervised machine learning tasks can be augmented with data driven validation. In the case of clustering, I will demonstrate a new framework of "proving" that a clustering is approximately correct, that does not require a user to know anything about the data distribution. This framework has some similarities to PAC bounds in supervised learning; unlike PAC bounds, the bounds for clustering can be calculated exactly and can be of direct practical utility. In the case of non-linear dimension reduction by manifold learning, I will present a way around the following problem. It is widely recognized that the low dimensional embeddings obtained with manifold learning algorithms distort the geometric properties of the original data, like distances and angles. These algorithm dependent distortions make it unsafe to pipeline the output of a manifold learning algorithm into other data analysis algorithms, limiting the use of these techniques in engineering and the sciences. Our contribution is a statistically founded methodology to estimate and then cancel out the distortions introduced by any embedding algorithm, thus effectively preserving the distances in the original data. This method is based on augmenting the output of a manifold learning algorithm with "the pushforward Riemannian metric", i.e. with additional metric information that allows it to reconstruct the original geometry. Joint work with Dominique Perrault-Joncas, James McQueen, Jacob VanderPlas, Grace Telford, Yu-chia Chen, Samson Koelle

Organizers: Mauro Maggioni, Fei Lu. Contacts us if you would like to meet with the speakers.

The data seminar will take place on Wednesdays at 3pm, in Shaeffer 304, during the semester, on Johns Hopkins Homewood campus this map).

Special Time: 11am and Location: Krieger 413; January 25th

Xiaofeng (Felix) Ye
University of Washington

Title: Stochastic dynamics: Markov chains and random transformations

Abstract: The theory of stochastic dynamics has two different mathematical representations: stochastic processes and random dynamical system (RDS). RDS actually is a more refined mathematical description of the reality; it provides not only stochastic trajectories following one initial condition, but also describes how the entire phase space, with all initial conditions, changes with time. Stochastic processes represent the stochastic movements of individual systems. RDS, however, describes the motions of many systems that experience a common deterministic law that is changing with time due to extrinsic noises, which represent a fluctuating environment or complex external singles. The RDS is often a good framework to study a quite counterintuitive phenomenon called noise-induced synchronization: the stochastic motions of noninteracting systems under a common noise synchronize; their trajectories become close to each other, while individual one remains stochastic. I first established some elementary contradistinctions between Markov chain theory and RDS descriptions of a stochastic dynamical system under discrete time, discrete state (dtds) setting. It was shown that a given Markov chain is compatible with many possible RDS, and I particularly studied the corresponding RDS with maximum metric entropy. I then proved the sufficient and necessary conditions for synchronization in general dtds-RDS and in dtds-RDS with maximum metric entropy. The work is based on the observation that under certain mild conditions, the forward probability in a hidden Markov model exhibits synchronization, which yields an efficient estimation with subsequences. Here I developed a mini-batch gradient descent algorithm for parameter inference in the hidden Markov model. I first efficiently estimated the rate of synchronization, which was proven as the gap of top Lyapunov exponents, and then fully utilized it to approximate the length of subsequences in the mini-batch algorithm. I theoretically validated the algorithm and numerically demonstrated the effectiveness.

January 31st

Tom Goldstein
University of Maryland

Title: Principled non-convex optimization for deep learning and phase retrieval

Abstract: This talk looks at two classes of non-convex problems. First, we discuss phase retrieval problems, and present a new formulation, called PhaseMax, that reduces this class of non-convex problems into a convex linear program. Then, we turn our attention to more complex non-convex problems that arise in deep learning. We'll explore the non-convex structure of deep networks using a range of visualization methods. Finally, we discuss a class of principled algorithms for training "binarized" neural networks, and show that these algorithms have theoretical properties that enable them to overcome the non-convexities present in neural loss functions.

Related papers:
PhaseMax: Convex Phase Retrieval via Basis Pursuit
Visualizing the Loss Landscape of Neural Nets
Training Quantized Nets: A Deeper Understanding

February 21st

Dimitris Giannakis
NYU

Title:Data-driven modeling of vector fields and differential forms by spectral exterior calculus

Abstract: We discuss a data-driven framework for exterior calculus on manifolds. This framework is based on a representations of vector fields, differential forms, and operators acting on these objects in frames (overcomplete bases) for L^2 and higher-order Sobolev spaces built entirely from the eigenvalues and eigenfunctions of the Laplacian of functions. Using this approach, we represent vector fields either as linear combinations of frame elements, or as operators on functions via matrices. In addition, we construct a Galerkin approximation scheme for the eigenvalue problem for the Laplace-de-Rham operator on 1-forms, and establish its spectral convergence. We present applications of this scheme to a variety of examples involving data sampled on smooth manifolds and the Lorenz 63 fractal attractor. This work is in collaboration with Tyrus Berry.

Related papers:
Spectral exterior calculus

March 14th

Edriss Titi
Texas A&M University, and The Weizmann Institute of Science

Title: Data Assimilation and Feedback Control Algorithm for Dissipative Evolution Models Employing Coarse Mesh Observables

Abstract: One of the main characteristics of infinite-dimensional dissipative evolution equations, such as the Navier-Stokes equations and reaction-diffusion systems, is that their long-time dynamics is determined by finitely many parameters -- finite number of determining modes, nodes, volume elements and other determining interpolants. In this talk I will show how to explore this finite-dimensional feature of the long-time behavior of infinite-dimensional dissipative systems to design finite-dimensional feedback control for stabilizing their solutions. Notably, it is observed that this very same approach can be implemented for designing data assimilation algorithms of weather prediction based on discrete measurements. In addition, and if time allows, I will also show that the long-time dynamics of the Navier-Stokes equations can be imbedded in an infinite-dimensional dynamical system that is induced by an ordinary differential equations, named {\it determining form}, which is governed by a globally Lipschitz vector field.Remarkably, as a result of this machinery I will eventually show that the global dynamics of the Navier-Stokes equations is be determining by only one parameter that is governed by an ODE. The Navier-Stokes equations are used as an illustrative example, and all the above mentioned results equally hold to other dissipative evolution PDEs, in particular to various dissipative reaction-diffusion systems and geophysical models.
This is a joint work with A. Azouani, H. Bessaih, A. Farhat, C. Foias, M. Jolly, R. Kravchenko, E. Lunasin and E. Olson

Related papers:
Data assimilation algorithm for 3D Benard convection in porous media employing only temperature measurements
A Discrete Data Assimilation Scheme for the Solutions of the Two-Dimensional Navier--Stokes Equations and Their Statistics,

$Titi$

March 28th

Rama Chellappa
University of Maryland, College Park

Title: Learning Along the Edge of Deep Networks

Abstract: While Deep Convolutional Neural Networks (DCNNs) have achieved impressive results on many detection and classification tasks (for example, unconstrained face detection, verification and recognition), it is still unclear why they perform so well and how to properly design them. It is widely recognized that while training deep networks, an abundance of training samples is required. These training samples need to be lossless, perfectly labeled, and spanning various classes in a balanced way. The generalization performance of designed networks and their robustness to adversarial examples needs to be improved too. In this talk, we analyze each of these individual conditions to understand their effects on the performance of deep networks and present mitigation strategies when the ideal conditions are not met. First, we investigate the relationship between the performance of a convolutional neural network (CNN), its depth, and the size of its training set and present performance bounds on CNNs with respect to the network parameters and the size of the available training dataset. Next, we consider the task of adaptively finding optimal training subsets which will be iteratively presented to the DCNN. We present convex optimization methods, based on an objective criterion and a quantitative measure of the current performance of the classifier, to efficiently identify informative samples to train on. Then we present Defense-GAN, a new strategy that leverages the expressive capability of generative models to defend DCNNs against adversarial attacks. The Defense-GAN can be used with any classification model and does not modify the classifier structure or training procedure. It can also be used as a defense against any attack as it does not assume knowledge of the process for generating the adversarial examples. An approach for training a DCNN using compressed data will also be presented by employing the GAN framework. Finally, to address generalization to unlabeled test data and robustness to adversarial samples, we propose an approach that leverages unsupervised data to bring the source and target distributions closer in a learned joint feature space. This is accomplished by inducing a symbiotic relationship between the learned embedding and a generative adversarial network. We demonstrate the impact of the analyses discussed above on a variety of reconstruction and classification problems.

Related papers:
Defense-GAN: Protecting Classifiers Against Adversarial Attacks Using Generative Models
On the Size of Convolutional Neural Networks and Generalization Performance
Task-Aware Compressed Sensing with Generative Adversarial Networks

April 18th

Pierre-Emmanuel Jabin
University of Maryland, College Park

Title: Complexity of some models of interacting biological neurons

Abstract: We study some models of for the dynamics of large groups of biological neurons: Those typically consist of large coupled systems of ODE's or SDE's, usually implementing some simple form of integrate and fire. The main question that we wish to address concerns the behavior of such networks as the number of neurons increases. Many particle systems (as they are used in physics, or multi-agent systems in general) naturally come to mind, as it is well known in such cases that propagation of chaos (i.e. the almost independence of each agent) can lead to a reduction in complexity through the direct calculation of various macroscopic densities. However the system under consideration here can be seen as multi-agent system with positive reinforcement so that correlations between neurons never vanish. In an ongoing work with D. Poyato, we first study the case where neurons are essentially fully connected. We show that in spite of this simple topology, the networks may exhibit different measures of complexity which can be characterized through the type of initial connections between neurons.

Related papers:
On the simulation of large populations of neurons
Dynamics of sparsely connected networks of excitatory and inhibitory spiking networks
Mean-field theory of irregularly spiking neuronal populations and working memory in recurrent cortical networks
The mean field equation for the Kuramoto model on graph sequences with non-Lipschitz limit

Organizers: Mauro Maggioni, Fei Lu, Stefano Vigogna. Contacts us if you would like to meet with the speakers.

The data seminar will take place on Wednesdays at 3pm, in Hodson Hall 203, during the semester, on Johns Hopkins Homewood campus this map).

September 13th

Yannis Kevrekidis
Bloomberg distinguished Professor, ChemBE, Johns Hopkins University

Title: No equations, no variables, no parameters, no space, no time: Data and the computational modeling of complex/multiscale systems

Abstract: Obtaining predictive dynamical equations from data lies at the heart of science and engineering modeling, and is the linchpin of our technology. In mathematical modeling one typically progresses from observations of the world (and some serious thinking!) first to equations for a model, and then to the analysis of the model to make predictions. Good mathematical models give good predictions (and inaccurate ones do not) - but the computational tools for analyzing them are the same: algorithms that are typically based on closed form equations. While the skeleton of the process remains the same, today we witness the development of mathematical techniques that operate directly on observations -data-, and appear to circumvent the serious thinking that goes into selecting variables and parameters and deriving accurate equations. The process then may appear to the user a little like making predictions by "looking in a crystal ball". Yet the "serious thinking" is still there and uses the same -and some new- mathematics: it goes into building algorithms that "jump directly" from data to the analysis of the model (which is now not available in closed form) so as to make predictions. Our work here presents a couple of efforts that illustrate this ``new path from data to predictions. It really is the same old path, but it is travelled by new means.

Related papers:
Parsimonious Representation of Nonlinear Dynamical Systems through Manifold Learning: a Chemotaxis Case Study
An Equal Space for Complex Data with Unknown Internal Order: Observability, Gauge Invariance and Manifold Learning

September 20th

Carey Priebe
Professor, Applied Mathematics and Statistics, Johns Hopkins University

Semiparametric spectral modeling of the Drosophila connectome

We present semiparametric spectral modeling of the complete larval Drosophila mushroom body connectome. Motivated by a thorough exploratory data analysis of the network via Gaussian mixture modeling (GMM) in the adjacency spectral embedding (ASE) representation space, we introduce the latent structure model (LSM) for network modeling and inference. LSM is a generalization of the stochastic block model (SBM) and a special case of the random dot product graph (RDPG) latent position model, and is amenable to semiparametric GMM in the ASE representation space. The resulting connectome code derived via semiparametric GMM composed with ASE captures latent connectome structure and elucidates biologically relevant neuronal properties.

Related papers:
The complete connectome of a learning and memory center in an insect brain
A consistent adjacency spectral embedding for stochastic blockmodel graphs
A limit theorem for scaled eigenvectors of random dot product graphs
Limit theorems for eigenvectors of the normalized Laplacian for random graphs

September 27

John Benedetto
Professor, Department of Mathematics, University of Maryland, College Park and Norbert Wiener Center

Frames -- two case studies: ambiguity and uncertainty

The theory of frames is an essential concept for dealing with signal representation in noisy environments. We shall examine the theory in the settings of the narrow band ambiguity function and of quantum information theory. For the ambiguity function, best possible estimates are derived for applicable constant amplitude zero autocorrelation (CAZAC) sequences using Weil's solution of the Riemann hypothesis for finite fields. In extending the theory to the vector-valued case modelling multi-sensor environments, the definition of the ambiguity function is characterized by means of group frames. For the uncertainty principle, Andrew Gleason's measure theoretic theorem, establishing the transition from the lattice interpretation of quantum mechanics to Born's probabilistic interpretation, is generalized in terms of frames to deal with uncertainty principle inequalities beyond Heisenberg's. My collaborators are Travis Andrews, Robert Benedetto, Jeffrey Donatelli, Paul Koprowski, and Joseph Woodworth.

Related papers:
Super-resolution by means of Beurling minimal extrapolation
Generalized Fourier frames in terms of balayage
Uncertainty principles and weighted norm inequalities
A frame reconstruction algorithm with applications to magnetric resonance imaging
Frame multiplication theory and a vector-valued DFT and ambiguity functions

October 4th

Nathan Kutz
Robert Bolles and Yasuko Endo Professor, Applied Mathematics, University of Washington

Data-driven discovery of governing equations and physical laws

The emergence of data methods for the sciences in the last decade has been enabled by the plummeting costs of sensors, computational power, and data storage. Such vast quantities of data afford us new opportunities for data-driven discovery, which has been referred to as the 4th paradigm of scientific discovery. We demonstrate that we can use emerging, large-scale time-series data from modern sensors to directly construct, in an adaptive manner, governing equations, even nonlinear dynamics, that best model the system measured using modern regression techniques. Recent innovations also allow for handling multi-scale physics phenomenon and control protocols in an adaptive and robust way. The overall architecture is equation-free in that the dynamics and control protocols are discovered directly from data acquired from sensors. The theory developed is demonstrated on a number of canonical example problems from physics, biology and engineering.

Related papers:
Discovering governing equations from data by sparse identification of nonlinear dynamical systems
Data-driven discovery of partial differential equations
Chaos as an intermittently forced linear system

October 11th

Fei Lu
Assistant Professor, Department of Mathematics, Johns Hopkins University

Data assimilation with stochastic model reduction

In weather and climate prediction, data assimilation combines data with dynamical models to make prediction, using ensemble of solutions to represent the uncertainty. Due to limited computational resources, reduced models are needed and coarse-grid models are often used, and the effects of the subgrid scales are left to be taken into account. A major challenge is to account for the memory effects due to coarse graining while capturing the key statistical-dynamical properties. We propose to use nonlinear autoregression moving average (NARMA) type models to account for the memory effects, and demonstrate by examples that the resulting NARMA type stochastic reduced models can capture the key statistical and dynamical properties and therefore can improve the performance of ensemble prediction in data assimilation. The examples include the Lorenz 96 system (which is a simplified model of the atmosphere) and the Kuramoto-Sivashinsky equation of spatiotemporally chaotic dynamics.

Related papers:
Discrete approach to stochastic parametrization and dimension reduction in nonlinear dynamics
Data-based stochastic model reduction for the Kuramoto-Sivashinsky equation
Accounting for model error from unresolved scales in ensemble Kalman filters by stochastic parametrization

$FeiLu$

October 18th

Roy Lederman
Postdoc, Program in Applied and Computational Mathematics, Princeton University

Hyper-Molecules in Cryo-Electron Microscopy (cryo-EM)

Cryo-EM is an imaging technology that is revolutionizing structural biology; the Nobel Prize in Chemistry 2017 was recently awarded to Jacques Dubochet, Joachim Frank and Richard Henderson for developing cryo-electron microscopy for the high-resolution structure determination of biomolecules in solution". Cryo-electron microscopes produce a large number of very noisy two-dimensional projection images of individual frozen molecules. Unlike related tomography methods, such as computed tomography (CT), the viewing direction of each image is unknown. The unknown directions, together with extreme levels of noise and additional technical factors, make the determination of the structure of molecules challenging. Unlike other structure determination methods, such as x-ray crystallography and nuclear magnetic resonance (NMR), cryo-EM produces measurements of individual molecules and not ensembles of molecules. Therefore, cryo-EM could potentially be used to study mixtures of different conformations of molecules. While current algorithms have been very successful at analyzing homogeneous samples, and can recover some distinct conformations mixed in solutions, the determination of multiple conformations, and in particular, continuums of similar conformations (continuous heterogeneity), remains one of the open problems in cryo-EM. I will discuss the hyper-molecules approach to continuous heterogeneity, and the numerical tools and analysis methods that we are developing in order to recover such hyper-molecules.

Related papers:
Continuously heterogeneous hyper-objects in cryo-EM and 3-D movies of many temporal dimensions Numerical Algorithms for the Computation of Generalized Prolate Spheroidal Functions

October 25th

John Harlim
Professor, Department of Mathematics, The Pennsylvania State University

Nonparametric modeling for prediction and data assimilation

I will discuss a nonparametric modeling approach for forecasting stochastic dynamical systems on smooth manifolds embedded in Euclidean space. This approach allows one to evolve the probability distribution of non-trivial dynamical systems with an equation-free modeling. In the second part of this talk, I will discuss a nonparametric estimation of likelihood functions using data-driven basis functions and the theory of kernel embeddings of conditional distributions developed in the machine learning community. I will demonstrate how to use this likelihood function to estimate biased modeling error in assimilating cloudy satellite brightness temperature-like quantities.

Related papers: Nonparametric forecasting of low-dimensional dynamical systems
Correcting Biased Observation Model Error in Data Assimilation

November 8th

Eitan Tadmor
Distinguished University Professor, Department of Mathematics, Institute for Physical Science & Technology, Center for Scientific Computation and Mathematical Modeling (CSCAMM), University of Maryland

Title: Emergent behavior in self-organized dynamics: from consensus to hydrodynamic flocking

Abstract: We discuss several first- and second-order models encountered in opinion and flocking dynamics. The models are driven by different rules of engagement, which quantify how each member interacts with its immediate neighbors. We highlight the role of geometric vs. topological neighborhoods and distinguish between local and global interactions, while addressing the following two related questions. (i) How local rules of interaction lead, over time, to the emergence of consensus; and (ii) How the flocking behavior of large crowds captured by their hydrodynamic description.

Related papers:
Kinetic descriptions a mathematical bridge to better understand the world
Mathematical aspects of self-organized dynamics: consensus, emergence of leaders, and social hydrodynamics
Heterophilious dynamics enhances consensus
From particle to kinetic and hydrodynamic descriptions of flocking

November 15th

Tyrus Berry
Research Associate, Department of Mathematical Science, George Mason University

What geometries can we learn from data?

In the field of manifold learning, the foundational theoretical results of Coifman and Lafon (Diffusion Maps, 2006) showed that for data sampled near an embedded manifold, certain graph Laplacian constructions are consistent estimators of the Laplace-Beltrami operator on the underlying manifold. Since these operators determine the Riemannian metric, they completely describe the geometry of the manifold (as inherited from the embedding). It was later shown that different kernel functions could be used to recover any desired geometry, at least in terms of pointwise estimation of the associated Laplace-Beltrami operator. In this talk I will first briefly review the above results and then introduce new results on the spectral convergence of these graph Laplacians. These results reveal that not all geometries are accessible in the stronger spectral sense. However, when the data set is sampled from a smooth density, there is a natural conformally invariant geometry which is accessible on all compact manifolds, and even on a large class of non-compact manifolds. Moreover, the kernel which estimates this geometry has a very natural construction which we call Continuous k-Nearest Neighbors (CkNN).

Related papers:
Consistent Manifold Representation for Topological Data Analysis
Iterated diffusion maps for feature identification

November 29th

Yingzhou Li
Phillip Griffiths Research Assistant Professor, Department of Mathematics, Duke University

Kernel functions and their fast evaluations

Kernel matrices are popular in machine learning and scientific computing, but they are limited by their quadratic complexity in both construction and storage. It is well-known that as one varies the kernel parameter, e.g., the width parameter in radial basis function kernels, the kernel matrix changes from a smooth low-rank kernel to a diagonally-dominant and then fully-diagonal kernel. Low-rank approximation methods have been widely-studied, mostly in the first case, to reduce the memory storage and the cost of computing matrix-vector products. Here, we use ideas from scientific computing to propose an extension of these methods to situations where the matrix is not well-approximated by a low-rank matrix. In particular, we construct an efficient block low-rank approximation method -- which we call the Block Basis Factorization -- and we show that it has O(n) complexity in both time and memory. Our method works for a wide range of kernel parameters, extending the domain of applicability of low-rank approximation methods, and our empirical results demonstrate the stability (small standard deviation in error) and superiority over current state-of-art kernel approximation algorithms.

Related papers:
Structured Block Basis Factorization for Scalable Kernel Matrix
On the numerical rank of radial basis function kernel matrices in high dimension

December 6th

Hau-Tieng Wu
Associate Professor, Department of Mathematics, Duke University

Some Data Analysis Tools Inspired by Medical Challenges Fetal ECG as an example

We discuss a particular interest in medicine extracting hidden dynamics from a single observed time series composed of multiple oscillatory signals, which could be viewed as a single-channel blind source separation problem. This problem is common nowadays due to the popular mobile health monitoring devices, and is made challenging by the structure of the signal which consists of non-sinusoidal oscillations with time varying amplitude/frequency, and by the heteroscedastic nature of the noise. Inspired by the fetal electrocardiogram (ECG) signal analysis from the single lead maternal abdominal ECG signal, in this talk I will discuss some new data analysis tools, including the cepstrum-based nonlinear-type time-frequency analysis and fiber-bundle based manifold learning technique. In addition to showing the results in fetal ECG analysis, I will also show how the approach could be applied to simultaneously extract the instantaneous heart/respiratory rate from a PPG signal during exercise. If time permits, the clinical trial results will be discussed.Abstract: Some Data Analysis Tools Inspired by Medical Challenges Fetal ECG as an example

Related papers:

December 20th

Valeriya Naumova
Senior research scientist, Section for Computing and Software, Simula Research Laboratory (Simula)

Multi-parameter regularisation for solving unmixing problems in signal processing: theoretical and practical aspects

Motivated by real-life applications in signal processing and image analysis, where the quantity of interest is generated by several sources to be accurately modelled and separated, as well as by recent advances in sparse regularisation theory and optimisation, we present a theoretical and algorithmic framework for optimal support recovery in inverse problems of unmixing type by means of multi-penalty regularisation. While multi-penalty regularisation is not a novel technique [1], we aim at providing precise reconstruction guarantees and methods for adaptive regularisation parameter choice. We consider and analyse a regularisation functional composed of a data-fidelity term, where signal and noise are additively mixed, a non-smooth, convex, sparsity promoting term, and a convex penalty term to model the noise. We prove not only that the well-established theory for sparse recovery in the single parameter case can be translated to the multi-penalty settings, but we also demonstrate the enhanced properties of multi-penalty regularisation in terms of support identification compared to sole $\ell^1$-minimisation. Extending the notion of Lasso path algorithm, we additionally propose an efficient procedure for an adaptive parameter choice in multi-penalty regularisation, focusing on the recovery of the correct support of the solution. The approach essentially enables a fast construction of a tiling over the parameter space in such a way that each tile corresponds to a different sparsity pattern of the solution. Finally, we propose an iterative alternating algorithm based on simple iterative thresholding steps to perform the minimisation of the extended multi-penalty functional, containing non-smooth and non-convex sparsity promoting term. To exemplify the robustness and effectiveness of the multi-penalty framework, we provide an extensive numerical analysis of our method and compare it with state-of-the-art single-penalty algorithms for compressed sensing problems. This is joint work with Markus Grasmair [3, 4], Norwegian University of Science and Technology; Timo Klock [4], Simula Research Laboratory, and Johannes Maly and Steffen Peter [2], Technical University of Munich.

Related papers:
Y. Meyer, Oscillating patterns in image processing and nonlinear evolution equations
Minimization of multi-penalty functionals by alternating iterative thresholding and optimal parameter choices
Conditions on optimal support recovery in unmixing problems by means of multi-penalty regularization
Multiple parameter learning with regularization path algorithms

Organizers: Mauro Maggioni, Wenjing Liao, Stefano Vigogna. Contacts us if you would like to meet with the speakers.

The data seminar will take place on Wednesdays at 3pm, in Whitehead Hall 304, during the semester, on Johns Hopkins Homewood campus this map).

February 8th

Kasso Okoudjou
Professor and Associate Chair, Department of Mathematics, University of Maryland, College Park
https://www.math.umd.edu/~okoudjou/

Inductive and numerical approaches to the HRT conjecture

Given a non-zero square integrable function $g$ and $\Lambda=\{(a_k, b_k)\}_{k=1}^N \subset \Br^2$ let $G(g, \Lambda)=\{e^{2\pi i b_k \cdot}g(\cdot - a_k)\}_{k=1}^N.$ The Heil-Ramanathan-Topiwala (HRT) Conjecture is the question of whether $G(g, \Lambda)$ is linearly independent. For the last two decades, very little progress has been made in settling the conjecture. In the first part of the talk, I will give an overview of the state of the conjecture. I will then describe recent inductive and numerical approaches to attack the conjecture. If time permits, I will present some new positive results in the special case where $g$ is real-valued.

Related papers:
Linear Independence of Finite Gabor System
The HRT Conjecture and the Zero Divisor Conjecture for the Heisenberg Group
Extension and restriction principles for the HRT conjecture

$Kasso$

February 15th

Jerome Darbon
Assistant Professor, Applied Mathematics, Brown University
https://www.brown.edu/academics/applied-mathematics/jerome-darbon

On convex finite-dimensional variational methods in imaging sciences, and Hamilton-Jacobi equations

We consider standard finite-dimensional variational models used in signal/image processing that consist in minimizing an energy involving a data fidelity term and a regularization term. We propose new remarks from a theoretical perspective which give a precise description on how the solutions of the optimization problem depend on the amount of smoothing effects and the data itself. The dependence of the minimal values of the energy is shown to be ruled by Hamilton-Jacobi equations, while the minimizers $u(x,t)$ for the observed images x and smoothing parameters $t$ are given by $ u(x,t)=x -\nabla H(\nabla E(x,t))$ where $E(x,t)$ is the minimal value of the energy and $H$ is a Hamiltonian related to the data fidelity term. Various vanishing smoothing parameter results are derived illustrating the role played by the prior in such limits. Finally, we briefly present an efficient numerical numerical method for solving certain Hamilton-Jacobi equations in high dimension and some applications in optimal control.

Related papers:
Linear Independence of Finite Gabor SysteOn Convex Finite-Dimensional Variational Methods in Imaging Sciences and Hamilton--Jacobi Equations
Algorithms for overcoming the curse of dimensionality for certain HamiltonJacobi equations arising in control theory and elsewhere

March 8th

Matthew Hirn
Assistant Professor, Department of Mathematics, Michigan State University
https://matthewhirn.wordpress.com/

Learning many body physics via multiscale, multilayer machine learnining architectures

Deep learning algorithms are making their mark in machine learning, obtaining state of the art results across computer vision, natural language processing, auditory signal processing and more. A wavelet scattering transform has the general architecture of a convolutional neural network, but leverages structure within data by encoding multiscale, stable invariants relevant to the task at hand. This approach is particularly relevant to data generated by physical systems, as such data must respect underlying physical laws. We illustrate this point through many body physics, in which scattering transforms can be loosely interpreted as the machine learning version of a fast multipole method (FMM). Unlike FMMs, which efficiently simulate the physical system, the scattering transform learns the underlying physical kernel from given states of the system. The resulting learning algorithm obtains state of the art numerical results for the regression of molecular energies in quantum chemistry, obtaining errors on the order of more costly quantum mechanical approaches.

Related papers:
Wavelet Scattering Regression of Quantum Chemical Energies

Solid Harmonic Wavelet Scattering for Molecular Energy Regression

Group Invariant Scattering

March 29th

Jason Eisner
Professor, Department of Computer Science, Johns Hopkins University
http://www.cs.jhu.edu/~jason/

Probabilistic Modeling of Natural Language

Natural language is a particular kind of time-series data. By way of introduction, I will informally sketch some of the phenomena that occur in natural language data and the kinds of probabilistic models that are traditionally used to describe them (e.g., context free grammars, Chinese restaurant processes, graphical models, finite-state transducers, recurrent neural networks augmented with memory). Many of these are covered in my JHU fall course, EN.600.465 Natural Language Processing. As an illustrative example, I will then describe a new conditional probability model that combines LSTMs with finite-state transducers to predict one string from another. For example, such a model can convert between the past and present tenses of an unfamiliar verb. Such pairwise conditional distributions can be combined into graphical models that model the relationships among many strings.

Related papers:
Weighting finite-state transductions with neural context

Modeling word forms using latent underlying morphs and phonology

Robust entity clustering via phylogenetic inference

April 5th

Afonso Bandeira
Assistant Professor, Courant Institute of Mathematical Sciences, NYU
http://www.cims.nyu.edu/~bandeira/

On Phase Transitions for Spiked Random Matrix and Tensor Models

A central problem of random matrix theory is to understand the eigenvalues of spiked random matrix models, in which a prominent eigenvector (or low rank structure) is planted into a random matrix. These distributions form natural statistical models for principal component analysis (PCA) problems throughout the sciences, where the goal is often to recover or detect the planted low rank structured. In this talk we discuss fundamental limitations of statistical methods to perform these tasks and methods that outperform PCA at it. Emphasis will be given to low rank structures arising in Synchronization problems. Time permitting, analogous results for spiked tensor models will also be discussed. Joint work with: Amelia Perry, Alex Wein, and Ankur Moitra.

Related papers:
Statistical limits of spiked tensor models

Message-passing algorithms for synchronization problems over compact groups

Optimality and Sub-optimality of PCA for Spiked Random Matrices and Synchronization

April 12th

Andrew Christlieb
MSU Foundation Professor (Department of Mathematics) and Department Chair (Department of Computational Mathematics, Science and Engineering), Michigan State University
https://cmse.msu.edu/directory/faculty/andrew-christlieb/

A sub-linear deterministic FFT for sparse high dimensional signals

In this talk we investigate the problems of efficient recover of sparse signals (sparsity=k) in a high dimensional setting. In particular, we are going to investigate efficient recovery of the k largest Fourier modes of a signal of size N^d, where N is the bandwidth and d is the dimension. Our objective is the development of a high dimensional sub-linear FFT, d=100 or 1000, that can recover the signal in O(d k log k) time. The methodology is based on our one dimensional deterministic sparse FFT that is O(k log k). The original method is recursive and based on ratios of short FFTs of pares of sub-sampled signals. The same ratio test allows us to identify when there is a collision due to aliasing the sub-sampled signals. The recursive nature allows us to separate and identify frequencies that have collided. Key in the high dimensional setting is the introduction of a partial unwrapping method and a tilting method that can ensure that we avoid collisions in the high dimensional setting on sub-sampled grids. We present the method, some analysis and results for a range of tests in both the noisy and noiseless cases.

Related papers:
Adaptive Sub-Linear Time Fourier Algorithms

A Multiscale Sub-linear Time Fourier Algorithm for Noisy Data

Multi-dimensional Sublinear Sparse Fourier Algorithm

April 26th

Wojciech Czaja
Professor, Department of Mathematics, University of Maryland, College Park
https://www.math.umd.edu/~wojtek/

Solving Fredholm integrals from incomplete measurements

We present an algorithm to solve Fredholm integrals of the first kind with tensor product structures, from a limited number of measurements with the goal of using this method to accelerate Nuclear Magnetic Resonance (NMR) acquisition. This is done by incorporating compressive sampling type arguments to fill in the missing measurements using a priori knowledge of the structure of the data. In the first step, we recover a compressed data matrix from measurements that form a tight frame, and establish that these measurements satisfy the restricted isometry property (RIP). In the second step, we solve the zeroth-order regularization minimization problem using the Venkataramanan-Song-Huerlimann algorithm. We demonstrate the performance of this algorithm on simulated and real data and we compare it with other sampling techniques. Our theory applied to both 2D and multidimensional NMR.

Related papers:
Efficient 2D MRI relaxometry using compressed sensing

Solving 2D Fredholm Integral from Incomplete Measurements Using Compressive Sensing

2D sparse sampling algorithm for ND Fredholm equations with applications to NMR relaxometry

Organizers: Mauro Maggioni, Wenjing Liao, Stefano Vigogna. Contacts us if you would like to meet with the speakers.

The data seminar will take place on Wednesdays at 3pm, in Krieger Hall 309, during the semester, on Johns Hopkins Homewood campus (building 39 at location F2/3 on this map).

September 7th

Radu Balan
Professor, Department of Mathematics, University of Maryland, College Park
http://www.math.umd.edu/~rvbalan/

Statistics of the Stability Bounds in the Phase Retrieval Problem

In this talk we present a local-global Lipschitz analysis of the phase retrieval problem. Additionally we present tentative estimates of the tail-bound for the distribution of the global Lipschitz constants. Specifically it is known that if the frame $\{f_1,\ldots,f_m\}$ for $C^n$ is phase retrievable then there are constants $a_0$ and $b_0$ so that for every $x,y\in C^n$: $a_0 ||xx^*-yy^*||_1^2 \leq \sum_{k=1}^m ||\langle x,f_k\rangle|^2-|\langle y,f_k\rangle|^2|^2 \leq b_0 ||xx^*-yy^*||_1^2$. Assume $f_1,\ldots,f_m$ are independent realizations with entries from $CN(0,1)$. In this talk we establish estimates for the probability $P(a_0>a)$.

Related papers:
On Lipschitz Analysis and Lipschitz Synthesis for the Phase Retrieval Problem
Frames and Phaseless Reconstruction
An Introduction to Matrix Concentration Inequalities
Introduction to the non-asymptotic analysis of random matrices

$Balan$

September 14th

Nate Strawn
Assistant Professor, Department of Mathematics and Statistics, Georgetown University

Finite-Sample Bounds for Geometric Multiresolution Analysis

Geometric Multiresolution Analysis takes a multiscale partition of a Euclidean dataset and fits affine approximations to data inside of each partition element at every scale. We prove finite-sample approximation error bounds for Geometric Multiresolution Analysis of "noisy" manifold models. First, we develop bounds on approximation error based on compatibility conditions between a multiscale partitions and the underlying generating distribution of the dataset. We then show partitions obtained from Cover Trees of random data drawn from "noisy" manifolds will satisfy these compatibility conditions with high probability. We conclude with open questions and future directions. This is joint work with Mauro Maggioni (Johns Hopkins University) and Stanislav Minsker (University of Southern California).

Related papers:
Multiscale Dictionary Learning: Non-Asymptotic Bounds and Robustness
Cover trees for nearest neighbor
Multi-scale geometric methods for data sets II: Geometric Multi-Resolution Analysis

September 21st

Charles Meneveau
Louis M. Sardella Professor of Mechanical Engineering, Johns Hopkins University
http://pages.jh.edu/~cmeneve1/

Hydrodynamic turbulence in the era of big data: simulation, data, and analysis

In this talk, we review the classic problem of Navier-Stokes turbulence, the role numerical simulations have played in advancing the field, and the data challenges posed by these simulations. We describe the Johns Hopkins Turbulence Databases (JHTDB) and present some sample applications from the areas of velocity increment statistics and finite time Lyapunov exponents in isotropic turbulence and wall modeling for Large Eddy Simulations of wall-bounded flows.

Related papers:
A Web services accessible database of turbulent channel flow and its use for testing a new integral wall model for LES
Large-deviation statistics of vorticity stretching in isotropic turbulence
A public turbulence database cluster and applications to study Lagrangian evolution of velocity increments in turbulence

Note date and place: September 23rd - 2pm - Gilman 132

Ben Leimkuhler
Professor of Mathematics, University of Edinburgh
http://kac.maths.ed.ac.uk/~bl/

From Molecular Dynamics to Large Scale Inference

Molecular models and data analytics problems give rise to very large systems of stochastic differential equations (SDEs) whose paths are designed to ergodically sample multimodal probability distributions. An important challenge for the numerical analyst (or the data scientist, for that matter) is the design of numerical procedures to generate these paths. One of the interesting ideas is to construct stochastic numerical methods with close attention to the error in the invariant measure. Another is to redesign the underlying stochastic dynamics to reduce bias or locally transform variables to enhance sampling efficiency. I will illustrate these ideas with various examples, including a geodesic integrator for constrained Langevin dynamics [1] and an ensemble sampling strategy for distributed inference [2].

Related papers:
[1] B. Leimkuhler and C. Matthews, Efficient molecular dynamics using geodesic integration and solvent-solute splitting, Proceedings of the Royal Society A, 2016.
[2] B. Leimkuhler, C. Matthews and J. Weare, Ensemble preconditioning for Markov chain Monte Carlo simulation.

September 28th

Rene Vidal
Professor of Biomedical Engineering, Computer Science, Mechanical Engineering, and Electrical and Computer Engineering, Johns Hopkins University
http://cis.jhu.edu/~rvidal/

Global Optimality in Matrix and Tensor Factorization, Deep Learning, and Beyond

Matrix, tensor, and other factorization techniques are used in many applications and have enjoyed significant empirical success in many fields. However, common to a vast majority of these problems is the significant disadvantage that the associated optimization problems are typically non-convex due to a multilinear form or other convexity destroying transformation. Building on ideas from convex relaxations of matrix factorizations, in this talk I will present a very general framework which allows for the analysis of a wide range of non-convex factorization problems - including matrix factorization, tensor factorization, and deep neural network training formulations. In particular, I will present sufficient conditions under which a local minimum of the non-convex optimization problem is a global minimum and show that if the size of the factorized variables is large enough then from any initialization it is possible to find a global minimizer using a local descent algorithm.

Related papers:
Global Optimality in Tensor Factorization, Deep Learning, and Beyond

October 5th

Mauro Maggioni
Bloomberg Distinguished Professor of Mathematics and Applied Mathematics and Statistics, Johns Hopkins University

Geometric Methods for the Approximation of High-dimensional Dynamical Systems

I will discuss a geometry-based statistical learning framework for performing model reduction and modeling of stochastic high-dimensional dynamical systems. I will consider two complementary settings. In the first one, I am given long trajectories of a system, e.g. from molecular dynamics, and I discuss techniques for estimating, in a robust fashion, an effective number of degrees of freedom of the system, which may vary in the state space of then system, and a local scale where the dynamics is well-approximated by a reduced dynamics with a small number of degrees of freedom. I will then use these ideas to produce an approximation to the generator of the system and obtain, via eigenfunctions of an empirical Fokker-Planck question, reaction coordinates for the system that capture the large time behavior of the dynamics. I will present various examples from molecular dynamics illustrating these ideas. In the second setting I assume I only have access to a (large number of expensive) simulators that can return short simulations of high-dimensional stochastic system, and introduce a novel statistical learning framework for learning automatically a family of local approximations to the system, that can be (automatically) pieced together to form a fast global reduced model for the system, called ATLAS. ATLAS is guaranteed to be accurate (in the sense of producing stochastic paths whose distribution is close to that of paths generated by the original system) not only at small time scales, but also at large time scales, under suitable assumptions on the dynamics. I discuss applications to homogenization of rough diffusions in low and high dimensions, as well as relatively simple systems with separations of time scales, and deterministic chaotic systems in high-dimensions, that are well-approximated by stochastic differential equations.

No knowledge of molecular dynamics is required, and the techniques above are quite universal. Ideas in the first part of the talk are based on what is called Diffusion Geometry, and have been used widely in data analysis; ideas in the second part are applicable to MCMC. The talk will be accessible to students with a wide variety of backgrounds and interests.

Related papers:
My page on Analysis of Molecular Dynamics data.
ATLAS: A geometric approach to learning high-dimensional stochastic systems near manifolds

October 12th

Maria Cameron
Assistant Professor, Department of Mathematics, University of Maryland College Park

Modeling the dynamics of interacting particles by means of stochastic networks

Material science have been rapidly developing in recent years. A variety of particles interacting according to different kinds of pair potentials has been produced in experimental works. Looking into the future, one can imagine controlled self-assembly of particles into clusters of desired structures leading to the creation of new types of materials. Analytical studies of the self-assembly involve coping with difficulties associated with the huge numbers configurations, high dimensionality, complex geometry, and unacceptably large CPU times. A feasible approach to the study of self-assembly consists of mapping the collections of clusters onto stochastic networks (continuous-time Markov chains) and analyzing their dynamics. Vertices of the networks represent local minima of the potential energy of the clusters, while arcs connect only those pairs of vertices that correspond to local minima between which direct transitions are physically possible. Transition rates along the arcs are the transition rates between the corresponding pairs of local minima. Such networks are mathematically tractable and, at the same time, preserve important features of the underlying dynamics. Nevertheless, their huge size and complexity render their analysis challenging and invoke the development of new mathematical techniques. I will discuss some approaches to construction and analysis of such networks.

Related papers:
Spectral analysis and clustering of large stochastic networks. Application to the Lennard-Jones-75 cluster
Metastability, spectrum, and eigencurrents of the Lennard-Jones-38 network

$Radu$

October 26th

Robert Pego
Professor, Department of Mathematical Sciences, Carnegie Mellon University
http://www.math.cmu.edu/~bobpego/

Microdroplet instablity for incompressible distance between shapes

AbstractThe least-action problem for geodesic distance on the 'manifold' of fluid-blob shapes exhibits instability due to microdroplet formation.This reflects a striking connection between Arnold's least-action principle for incompressible Euler flows and geodesic paths for Wasserstein distance. A connection with fluid mixture models via a variant of Brenier's relaxed least-action principle for generalized Euler flows will be outlined also. This is joint work with Jian-Guo Liu and Dejan Slepcev.

Related papers:
Euler sprays and Wasserstein geometry of the space of shapes

$Pego$

November 2nd

Dejan Slepcev
Associate Professor, Department of Mathematical Sciences, Carnegie Mellon University
http://www.math.cmu.edu/~slepcev/

Variational problems on graphs and their continuum limits

We will discuss variational problems arising in machine learning and their limits as the number of data points goes to infinity. Consider point clouds obtained as random samples of an underlying "ground-truth" measure. Graph representing the point cloud is obtained by assigning weights to edges based on the distance between the points. Many machine learning tasks, such as clustering and classification, can be posed as minimizing functionals on such graphs. We consider functionals involving graph cuts and graph laplacians and their limits as the number of data points goes to infinity. In particular we establish under what conditions the minimizers of discrete problems have a well defined continuum limit, and characterize the limit. The talk is primarily based on joint work with Nicolas Garcia Trillos, as well as on works with Xavier Bresson, Moritz Gerlach, Matthias Hein, Thomas Laurent, James von Brecht and Matt Thorpe.

Related papers:
A variational approach to the consistency of spectral clustering

$Slepcev$

November 9th

Markos Katsoulakis
Professor, Department of Mathematics and Statistics
http://people.math.umass.edu/~markos/

Scalable Information Inequalities for Uncertainty Quantification in high dimensional probabilistic models

In this this talk we discuss new scalable information bounds for quantities of interest of complex stochastic models. The scalability of inequalities allows us to (a) obtain uncertainty quantification bounds for quantities of interest in high-dimensional systems and/or for long time stochastic dynamics; (b) assess the impact of large model perturbations such as in nonlinear response regimes in statistical mechanics; (c) address model-form uncertainty, i.e. compare different extended probabilistic models and corresponding quantities of interest. We demonstrate these tools in fast sensitivity screening of chemical reaction networks with a very large number of parameters, and towards obtaining robust and tight uncertainty quantification bounds for phase diagrams in statistical mechanics models.

Related papers:
Scalable Information Inequalities for Uncertainty Quantification
Accelerated Sensitivity Analysis in High- Dimensional Stochastic Reaction Networks
Path-Space Information Bounds for Uncertainty Quantification and Sensitivity Analysis of Stochastic Dynamics
Path-space variational inference for non-equilibrium coarse-grained systems
Effects of correlated parameters and uncertainty in electronic-structure-based chemical kinetic modelling

$Katsoulakis$

November 30th

Youssef Marzouk
Associate Professor of Aeronautics and Astronautics, Massachusetts Institute of Technology
http://aeroastro.mit.edu/faculty-research/faculty-list/youssef-m-marzouk

Measure transport approaches for Bayesian computation

We will discuss how transport maps, i.e., deterministic couplings between probability measures, can enable useful new approaches to Bayesian computation. A first use involves a combination of optimal transport and Metropolis correction; here, we use continuous transportation to transform typical MCMC proposals into adapted non-Gaussian proposals, both local and global. Second, we discuss a variational approach to Bayesian inference that constructs a deterministic transport map from a reference distribution to the posterior, without resorting to MCMC. Independent and unweighted samples can then be obtained by pushing forward reference samples through the map. Making either approach efficient in high dimensions, however, requires identifying and exploiting low-dimensional structure. We present new results relating the sparsity and decomposability of transports to the conditional independence structure of the target distribution. We also describe conditions, common in inverse problems, under which transport maps have a particular low-rank or near-identity structure. In general, these properties of transports can yield more efficient algorithms. As a particular example, we derive new deterministic "online" algorithms for Bayesian inference in nonlinear and non-Gaussian state-space models with static parameters. This is joint work with Daniele Bigoni, Matthew Parno, and Alessio Spantini.

Related papers:
Bayesian Inference with Optimal Maps
Transport map accelerated Markov chain Monte Carlo
A multiscale strategy for Bayesian inference using transport maps
An introduction to sampling via measure transport

Note date and place: December 2nd - 10am - Gilman 132

Alex Cloninger
Gibbs Assistant Professor and NSF Postdoctoral Fellow, Yale University
http://users.math.yale.edu/~ac2528

Incorporation of Geometry into Learning Algorithms and Medicine

This talk focuses on two instances in which scientific fields outside mathematics benefit from incorporating the geometry of the data. In each instance, the applications area motivates the need for new mathematical approaches and algorithms, and leads to interesting new questions. (1) A method to determine and predict drug treatment effectiveness for patients based off their baseline information. This motivates building a function adapted diffusion operator high dimensional data X when the function F can only be evaluated on large subsets of X, and defining a localized filtration of F and estimation values of F at a finer scale than it is reliable naively. (2) The current empirical success of deep learning in imaging and medical applications, in which theory and understanding is lagging far behind.. By assuming the data lies near low dimensional manifolds and building local wavelet frames, we improve on existing theory that breaks down when the ambient dimension is large (the regime in which deep learning has seen the most success)

Related papers:
Provable approximation properties for deep neural nets
Function weighted metric discovery for unreliable functions

$Marzouk$

December 7th

Ben Adcock
Assistant Professor, Simons Fraser University
http://benadcock.org/

Sparse polynomial approximation of high-dimensional functions

Many problems in scientific computing require the approximation of smooth, high-dimensional functions from limited amounts of data. For instance, a common problem in uncertainty quantification involves identifying the parameter dependence of the output of a computational model. Complex physical systems require computational models with many parameters, resulting in multivariate functions of many variables. Although the amount of data may be large, the curse of dimensionality essentially prohibits collecting or processing enough data to reconstruct the unknown function using classical approximation techniques. In this talk, I will give an overview of the approximation of smooth, high-dimensional functions by sparse polynomial expansions. I will focus on the recent application of techniques from compressed sensing to this problem, and demonstrate how such approaches theoretically overcome the curse of dimensionality. If time, I will also discuss a number of extensions, including dealing with corrupted and/or unstructured data, the effect of model error and incorporating additional information such as gradient data. I will also highlight several challenges and open problems. This is joint work with Casie Bao, Simone Brugiapaglia and Yi Sui (SFU).

Related papers:
Infinite-dimensional l1 minimization and function approximation from pointwise data
Infinite-dimensional compressed sensing and function interpolation