Welcome to the data seminar. The talks will cover a range of topics in applied analysis, probability, applied mathematics related to data and dynamical systems, statistical and machine learning, signal processing, and computation.
To the attending graduate students: each talk comes with a list of relevant papers, and we strongly encourage you to read them both before and after the talk. You may view and subscribe to the calendar for the seminar series (this is being tested; please let us know if it does not work for you).
Organizers: Mauro Maggioni, Fei Lu. Contact us if you would like to meet with the speakers.

The data seminar will take place on Wednesdays at 3pm, in Shaffer 304, during the semester, on the Johns Hopkins Homewood campus.
October 10
Jianfeng Lu
Duke
Title: Solving large-scale leading eigenvalue problem
Abstract: Leading eigenvalue problems arise in many applications. When the dimension of the matrix is extremely large, as in applications to quantum many-body problems, conventional algorithms become impractical due to computational and memory complexity. In this talk, we will describe some recent work on new algorithms for leading eigenvalue problems based on randomized and coordinate-wise methods (joint work with Yingzhou Li and Zhe Wang).
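For readers new to the topic, the classical baseline that such methods are measured against is power iteration. The sketch below is only an illustration and is not the randomized or coordinate-wise algorithm of the talk; the function name `power_iteration` and its parameters are hypothetical, and the matrix is assumed symmetric with a dominant positive eigenvalue.

```python
import numpy as np

def power_iteration(A, num_iters=500, tol=1e-10, seed=0):
    """Estimate the leading eigenpair of a symmetric matrix A (illustrative baseline)."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(A.shape[0])
    v /= np.linalg.norm(v)
    lam = 0.0
    for _ in range(num_iters):
        w = A @ v                      # one full matrix-vector product per step
        lam_new = v @ w                # Rayleigh quotient with the current unit vector
        v = w / np.linalg.norm(w)      # renormalize to avoid overflow
        converged = abs(lam_new - lam) < tol
        lam = lam_new
        if converged:
            break
    return lam, v

if __name__ == "__main__":
    A = np.array([[2.0, 1.0], [1.0, 3.0]])
    lam, v = power_iteration(A)
    print(lam, v)
```

Each step touches the whole matrix, which is exactly the cost that becomes prohibitive in the very high-dimensional settings the abstract mentions.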

November 6, Special time: 4:00-5:00, location: Gilman 277
Mohammad Farazmand
Department of Mechanical Engineering, MIT
Title: Extreme Events: Dynamics, Prediction and Mitigation
Abstract: A wide range of natural and engineering systems exhibit extreme events, i.e., spontaneous intermittent behavior manifested through sporadic bursts in the time series of their observables. Examples include ocean rogue waves, intermittency in turbulence and extreme weather patterns. Because of their undesirable impact on the system or the surrounding environment, the real-time prediction and mitigation of extreme events is of great interest. In this talk, I discuss some recent advances in the quantification and prediction of extreme events. In particular, I introduce a variational method that disentangles the mechanisms underpinning the formation of extreme events. This in turn enables the data-driven, real-time prediction of the extreme events. I demonstrate the application of this method with several examples including the prediction of ocean rogue waves and the intermittent energy dissipation bursts in turbulent fluid flows.

November 7
Kevin Lin
School of Mathematical Science, University of Arizona
Title: Mori-Zwanzig formalism and discrete-time modeling of chaotic dynamics
Abstract: Nonlinear dynamic phenomena often require a large number of dynamical variables to model, only a small fraction of which are of direct interest. Reduced models that use only the relevant dynamical variables can be very useful in such situations, both for computational efficiency and insights into the dynamics. Recent work has shown that the NARMAX (Nonlinear Auto-Regressive Moving-Average with eXogenous inputs) representation of stochastic processes provides an effective basis for parametric model reduction in a number of concrete settings [Chorin-Lu PNAS 2015]. In this talk, I will review some of these developments, then explain how the NARMAX method can be seen as a special case of a general theoretical framework for model reduction due to Mori and Zwanzig. These ideas will be illustrated on a prototypical model of spatiotemporal chaos.
Related papers:
Preprint available upon request (please write to feilu@math.jhu.edu)
Data-based stochastic model reduction for the Kuramoto-Sivashinsky equation
Comparison of continuous and discrete-time data-based modeling for hypoelliptic systems

December 5
Clarence Rowley
Princeton University
Title: Structure, stability, and simplicity in complex fluid flows
Abstract: Fluid flows can be extraordinarily complex, and even turbulent, yet often there is structure lying within the apparent complexity. Understanding this structure can help explain observed physical phenomena, and can help with the design of control strategies in situations where one would like to change the natural state of a flow. This talk addresses techniques for obtaining simple, approximate models for fluid flows, using data from simulations or experiments. We discuss a number of methods, including balanced truncation, linear stability theory, and dynamic mode decomposition, and apply them to several flows with complex behavior, including a transitional channel flow, a jet in crossflow, and a T-junction in a pipe.
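As background for dynamic mode decomposition (DMD), one of the methods named in the abstract above, here is a minimal sketch of the commonly used "exact DMD" variant on snapshot data. It is an illustration under stated assumptions, not the speaker's code: the function name `dmd`, the rank parameter `r`, and the toy linear system are all hypothetical.

```python
import numpy as np

def dmd(X, Y, r):
    """Exact DMD sketch: fit Y ~ A X and return eigenvalues and modes of a rank-r A.

    X, Y: arrays of shape (n, m) whose columns are snapshots, with Y one step after X.
    """
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    U, s, Vh = U[:, :r], s[:r], Vh[:r]          # truncate to rank r
    V = Vh.conj().T
    A_tilde = (U.conj().T @ Y @ V) / s          # U* Y V Sigma^{-1}, an r x r matrix
    eigvals, W = np.linalg.eig(A_tilde)
    modes = ((Y @ V) / s) @ W                   # exact DMD modes
    return eigvals, modes

if __name__ == "__main__":
    # Toy data (assumption): snapshots of the linear map x_{k+1} = A x_k.
    A = np.array([[0.9, 0.2], [0.0, 0.5]])
    snaps = [np.array([1.0, 1.0])]
    for _ in range(9):
        snaps.append(A @ snaps[-1])
    S = np.column_stack(snaps)
    eigvals, modes = dmd(S[:, :-1], S[:, 1:], r=2)
    print(np.sort(eigvals.real))
```

For data generated by a linear map with full-rank snapshots, the DMD eigenvalues coincide with the eigenvalues of that map, which is what the toy example checks.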

December 10, Special time: 12:00-1:00, location: Shaffer 304
Marina Meila
Department of Statistics, University of Washington
Title: Unsupervised Validation for Unsupervised Learning
Abstract: Scientific research involves finding patterns in data, formulating hypotheses, and validating them with new observations. Machine learning is many times faster than humans at finding patterns, yet the task of validating these as "significant" is still left to the human expert or to further experiment. In this talk I will present a few instances in which unsupervised machine learning tasks can be augmented with data-driven validation. In the case of clustering, I will demonstrate a new framework of "proving" that a clustering is approximately correct, that does not require a user to know anything about the data distribution. This framework has some similarities to PAC bounds in supervised learning; unlike PAC bounds, the bounds for clustering can be calculated exactly and can be of direct practical utility. In the case of nonlinear dimension reduction by manifold learning, I will present a way around the following problem. It is widely recognized that the low-dimensional embeddings obtained with manifold learning algorithms distort the geometric properties of the original data, like distances and angles. These algorithm-dependent distortions make it unsafe to pipeline the output of a manifold learning algorithm into other data analysis algorithms, limiting the use of these techniques in engineering and the sciences. Our contribution is a statistically founded methodology to estimate and then cancel out the distortions introduced by any embedding algorithm, thus effectively preserving the distances in the original data. This method is based on augmenting the output of a manifold learning algorithm with "the pushforward Riemannian metric", i.e. with additional metric information that allows it to reconstruct the original geometry. Joint work with Dominique Perrault-Joncas, James McQueen, Jacob VanderPlas, Grace Telford, Yu-Chia Chen, Samson Koelle.

The data seminar will take place on Wednesdays at 3pm, in Shaffer 304, during the semester, on the Johns Hopkins Homewood campus.
January 25th, Special time: 11am, location: Krieger 413
Xiaofeng (Felix) Ye
University of Washington
Title: Stochastic dynamics: Markov chains and random transformations
Abstract: The theory of stochastic dynamics has two different mathematical representations: stochastic processes and random dynamical systems (RDS). RDS is actually a more refined mathematical description of reality; it not only provides stochastic trajectories following one initial condition, but also describes how the entire phase space, with all initial conditions, changes with time. Stochastic processes represent the stochastic movements of individual systems. RDS, however, describes the motions of many systems that experience a common deterministic law that is changing with time due to extrinsic noises, which represent a fluctuating environment or complex external signals. RDS is often a good framework to study a quite counterintuitive phenomenon called noise-induced synchronization: the stochastic motions of noninteracting systems under a common noise synchronize; their trajectories become close to each other, while each individual trajectory remains stochastic. I first established some elementary contradistinctions between Markov chain theory and RDS descriptions of a stochastic dynamical system under the discrete-time, discrete-state (dtds) setting. It was shown that a given Markov chain is compatible with many possible RDS, and I particularly studied the corresponding RDS with maximum metric entropy. I then proved the necessary and sufficient conditions for synchronization in general dtds-RDS and in dtds-RDS with maximum metric entropy. The work is based on the observation that under certain mild conditions, the forward probability in a hidden Markov model exhibits synchronization, which yields an efficient estimation with subsequences. Here I developed a minibatch gradient descent algorithm for parameter inference in the hidden Markov model. I first efficiently estimated the rate of synchronization, which was proven to be the gap of the top Lyapunov exponents, and then fully utilized it to approximate the length of subsequences in the minibatch algorithm. I theoretically validated the algorithm and numerically demonstrated its effectiveness.
January 31st
Tom Goldstein
University of Maryland
Title: Principled nonconvex optimization for deep learning and phase retrieval
Abstract: This talk looks at two classes of nonconvex problems. First, we discuss phase retrieval problems, and present a new formulation, called PhaseMax, that reduces this class of nonconvex problems into a convex linear program. Then, we turn our attention to more complex nonconvex problems that arise in deep learning. We'll explore the nonconvex structure of deep networks using a range of visualization methods. Finally, we discuss a class of principled algorithms for training "binarized" neural networks, and show that these algorithms have theoretical properties that enable them to overcome the nonconvexities present in neural loss functions. 
February 21st
Dimitris Giannakis
NYU
Title: Data-driven modeling of vector fields and differential forms by spectral exterior calculus
Abstract: We discuss a data-driven framework for exterior calculus on manifolds. This framework is based on a representation of vector fields, differential forms, and operators acting on these objects in frames (overcomplete bases) for $L^2$ and higher-order Sobolev spaces built entirely from the eigenvalues and eigenfunctions of the Laplacian of functions. Using this approach, we represent vector fields either as linear combinations of frame elements, or as operators on functions via matrices. In addition, we construct a Galerkin approximation scheme for the eigenvalue problem for the Laplace-de Rham operator on 1-forms, and establish its spectral convergence. We present applications of this scheme to a variety of examples involving data sampled on smooth manifolds and the Lorenz 63 fractal attractor. This work is in collaboration with Tyrus Berry.
March 14th
Edriss Titi
Texas A&M University, and The Weizmann Institute of Science
Title: Data Assimilation and Feedback Control Algorithm for Dissipative Evolution Models Employing Coarse Mesh Observables
Abstract: One of the main characteristics of infinite-dimensional dissipative evolution equations, such as the Navier-Stokes equations and reaction-diffusion systems, is that their long-time dynamics is determined by finitely many parameters: a finite number of determining modes, nodes, volume elements and other determining interpolants. In this talk I will show how to exploit this finite-dimensional feature of the long-time behavior of infinite-dimensional dissipative systems to design finite-dimensional feedback control for stabilizing their solutions. Notably, it is observed that this very same approach can be implemented for designing data assimilation algorithms of weather prediction based on discrete measurements. In addition, and if time allows, I will also show that the long-time dynamics of the Navier-Stokes equations can be embedded in an infinite-dimensional dynamical system that is induced by an ordinary differential equation, named the determining form, which is governed by a globally Lipschitz vector field. Remarkably, as a result of this machinery I will eventually show that the global dynamics of the Navier-Stokes equations is determined by only one parameter that is governed by an ODE. The Navier-Stokes equations are used as an illustrative example, and all the above-mentioned results hold equally for other dissipative evolution PDEs, in particular for various dissipative reaction-diffusion systems and geophysical models. This is joint work with A. Azouani, H. Bessaih, A. Farhat, C. Foias, M. Jolly, R. Kravchenko, E. Lunasin and E. Olson.
March 28th
Rama Chellappa
University of Maryland, College Park
Title: Learning Along the Edge of Deep Networks
Abstract: While Deep Convolutional Neural Networks (DCNNs) have achieved impressive results on many detection and classification tasks (for example, unconstrained face detection, verification and recognition), it is still unclear why they perform so well and how to properly design them. It is widely recognized that while training deep networks, an abundance of training samples is required. These training samples need to be lossless, perfectly labeled, and spanning various classes in a balanced way. The generalization performance of designed networks and their robustness to adversarial examples need to be improved too. In this talk, we analyze each of these individual conditions to understand their effects on the performance of deep networks and present mitigation strategies when the ideal conditions are not met. First, we investigate the relationship between the performance of a convolutional neural network (CNN), its depth, and the size of its training set and present performance bounds on CNNs with respect to the network parameters and the size of the available training dataset. Next, we consider the task of adaptively finding optimal training subsets which will be iteratively presented to the DCNN. We present convex optimization methods, based on an objective criterion and a quantitative measure of the current performance of the classifier, to efficiently identify informative samples to train on. Then we present Defense-GAN, a new strategy that leverages the expressive capability of generative models to defend DCNNs against adversarial attacks. Defense-GAN can be used with any classification model and does not modify the classifier structure or training procedure. It can also be used as a defense against any attack as it does not assume knowledge of the process for generating the adversarial examples. An approach for training a DCNN using compressed data will also be presented by employing the GAN framework.
Finally, to address generalization to unlabeled test data and robustness to adversarial samples, we propose an approach that leverages unsupervised data to bring the source and target distributions closer in a learned joint feature space. This is accomplished by inducing a symbiotic relationship between the learned embedding and a generative adversarial network. We demonstrate the impact of the analyses discussed above on a variety of reconstruction and classification problems. 
April 18th
Pierre-Emmanuel Jabin
University of Maryland, College Park
Title: Complexity of some models of interacting biological neurons
Abstract: We study some models for the dynamics of large groups of biological neurons. Those typically consist of large coupled systems of ODEs or SDEs, usually implementing some simple form of integrate-and-fire. The main question that we wish to address concerns the behavior of such networks as the number of neurons increases. Many-particle systems (as they are used in physics, or multi-agent systems in general) naturally come to mind, as it is well known in such cases that propagation of chaos (i.e. the almost independence of each agent) can lead to a reduction in complexity through the direct calculation of various macroscopic densities. However, the system under consideration here can be seen as a multi-agent system with positive reinforcement, so that correlations between neurons never vanish. In ongoing work with D. Poyato, we first study the case where neurons are essentially fully connected. We show that in spite of this simple topology, the networks may exhibit different measures of complexity which can be characterized through the type of initial connections between neurons.
Related papers:
On the simulation of large populations of neurons
Dynamics of sparsely connected networks of excitatory and inhibitory spiking networks
Mean-field theory of irregularly spiking neuronal populations and working memory in recurrent cortical networks
The mean field equation for the Kuramoto model on graph sequences with non-Lipschitz limit
The data seminar will take place on Wednesdays at 3pm, in Hodson Hall 203, during the semester, on the Johns Hopkins Homewood campus.
September 13th
Yannis Kevrekidis
Bloomberg Distinguished Professor, ChemBE, Johns Hopkins University
Title: No equations, no variables, no parameters, no space, no time: Data and the computational modeling of complex/multiscale systems
Abstract: Obtaining predictive dynamical equations from data lies at the heart of science and engineering modeling, and is the linchpin of our technology. In mathematical modeling one typically progresses from observations of the world (and some serious thinking!) first to equations for a model, and then to the analysis of the model to make predictions. Good mathematical models give good predictions (and inaccurate ones do not) - but the computational tools for analyzing them are the same: algorithms that are typically based on closed-form equations. While the skeleton of the process remains the same, today we witness the development of mathematical techniques that operate directly on observational data, and appear to circumvent the serious thinking that goes into selecting variables and parameters and deriving accurate equations. The process then may appear to the user a little like making predictions by "looking in a crystal ball". Yet the "serious thinking" is still there and uses the same and some new mathematics: it goes into building algorithms that "jump directly" from data to the analysis of the model (which is now not available in closed form) so as to make predictions. Our work here presents a couple of efforts that illustrate this "new" path from data to predictions. It really is the same old path, but it is travelled by new means.
September 20th
Carey Priebe
Professor, Applied Mathematics and Statistics, Johns Hopkins University
Title: Semiparametric spectral modeling of the Drosophila connectome
Abstract: We present semiparametric spectral modeling of the complete larval Drosophila mushroom body connectome. Motivated by a thorough exploratory data analysis of the network via Gaussian mixture modeling (GMM) in the adjacency spectral embedding (ASE) representation space, we introduce the latent structure model (LSM) for network modeling and inference. LSM is a generalization of the stochastic block model (SBM) and a special case of the random dot product graph (RDPG) latent position model, and is amenable to semiparametric GMM in the ASE representation space. The resulting connectome code derived via semiparametric GMM composed with ASE captures latent connectome structure and elucidates biologically relevant neuronal properties.
Related papers:
The complete connectome of a learning and memory center in an insect brain
A consistent adjacency spectral embedding for stochastic blockmodel graphs
A limit theorem for scaled eigenvectors of random dot product graphs
Limit theorems for eigenvectors of the normalized Laplacian for random graphs
September 27
John Benedetto
Professor, Department of Mathematics, University of Maryland, College Park and Norbert Wiener Center
Title: Frames - two case studies: ambiguity and uncertainty
Abstract: The theory of frames is an essential concept for dealing with signal representation in noisy environments. We shall examine the theory in the settings of the narrow band ambiguity function and of quantum information theory. For the ambiguity function, best possible estimates are derived for applicable constant amplitude zero autocorrelation (CAZAC) sequences using Weil's solution of the Riemann hypothesis for finite fields. In extending the theory to the vector-valued case modelling multi-sensor environments, the definition of the ambiguity function is characterized by means of group frames. For the uncertainty principle, Andrew Gleason's measure-theoretic theorem, establishing the transition from the lattice interpretation of quantum mechanics to Born's probabilistic interpretation, is generalized in terms of frames to deal with uncertainty principle inequalities beyond Heisenberg's. My collaborators are Travis Andrews, Robert Benedetto, Jeffrey Donatelli, Paul Koprowski, and Joseph Woodworth.
Related papers:
Super-resolution by means of Beurling minimal extrapolation
Generalized Fourier frames in terms of balayage
Uncertainty principles and weighted norm inequalities
A frame reconstruction algorithm with applications to magnetic resonance imaging
Frame multiplication theory and a vector-valued DFT and ambiguity functions
October 4th
Nathan Kutz
Robert Bolles and Yasuko Endo Professor, Applied Mathematics, University of Washington
Title: Data-driven discovery of governing equations and physical laws
Abstract: The emergence of data methods for the sciences in the last decade has been enabled by the plummeting costs of sensors, computational power, and data storage. Such vast quantities of data afford us new opportunities for data-driven discovery, which has been referred to as the 4th paradigm of scientific discovery. We demonstrate that we can use emerging, large-scale time-series data from modern sensors to directly construct, in an adaptive manner, governing equations, even nonlinear dynamics, that best model the system measured using modern regression techniques. Recent innovations also allow for handling multiscale physics phenomena and control protocols in an adaptive and robust way. The overall architecture is equation-free in that the dynamics and control protocols are discovered directly from data acquired from sensors. The theory developed is demonstrated on a number of canonical example problems from physics, biology and engineering.
October 11th
Fei Lu
Assistant Professor, Department of Mathematics, Johns Hopkins University
Title: Data assimilation with stochastic model reduction
Abstract: In weather and climate prediction, data assimilation combines data with dynamical models to make predictions, using an ensemble of solutions to represent the uncertainty. Due to limited computational resources, reduced models are needed and coarse-grid models are often used, and the effects of the subgrid scales are left to be taken into account. A major challenge is to account for the memory effects due to coarse graining while capturing the key statistical-dynamical properties. We propose to use nonlinear autoregression moving average (NARMA) type models to account for the memory effects, and demonstrate by examples that the resulting NARMA type stochastic reduced models can capture the key statistical and dynamical properties and therefore can improve the performance of ensemble prediction in data assimilation. The examples include the Lorenz 96 system (which is a simplified model of the atmosphere) and the Kuramoto-Sivashinsky equation of spatiotemporally chaotic dynamics.
October 18th
Roy Lederman
Postdoc, Program in Applied and Computational Mathematics, Princeton University
Title: “Hyper-Molecules” in Cryo-Electron Microscopy (cryo-EM)
Abstract: Cryo-EM is an imaging technology that is revolutionizing structural biology; the Nobel Prize in Chemistry 2017 was recently awarded to Jacques Dubochet, Joachim Frank and Richard Henderson “for developing cryo-electron microscopy for the high-resolution structure determination of biomolecules in solution”. Cryo-electron microscopes produce a large number of very noisy two-dimensional projection images of individual frozen molecules. Unlike related tomography methods, such as computed tomography (CT), the viewing direction of each image is unknown. The unknown directions, together with extreme levels of noise and additional technical factors, make the determination of the structure of molecules challenging. Unlike other structure determination methods, such as X-ray crystallography and nuclear magnetic resonance (NMR), cryo-EM produces measurements of individual molecules and not ensembles of molecules. Therefore, cryo-EM could potentially be used to study mixtures of different conformations of molecules. While current algorithms have been very successful at analyzing homogeneous samples, and can recover some distinct conformations mixed in solutions, the determination of multiple conformations, and in particular, continuums of similar conformations (continuous heterogeneity), remains one of the open problems in cryo-EM. I will discuss the “hyper-molecules” approach to continuous heterogeneity, and the numerical tools and analysis methods that we are developing in order to recover such hyper-molecules.
October 25th
John Harlim
Professor, Department of Mathematics, The Pennsylvania State University
Title: Nonparametric modeling for prediction and data assimilation
Abstract: I will discuss a nonparametric modeling approach for forecasting stochastic dynamical systems on smooth manifolds embedded in Euclidean space. This approach allows one to evolve the probability distribution of nontrivial dynamical systems with equation-free modeling. In the second part of this talk, I will discuss a nonparametric estimation of likelihood functions using data-driven basis functions and the theory of kernel embeddings of conditional distributions developed in the machine learning community. I will demonstrate how to use this likelihood function to estimate biased modeling error in assimilating cloudy satellite brightness temperature-like quantities.
November 8th
Eitan Tadmor
Distinguished University Professor, Department of Mathematics, Institute for Physical Science & Technology, Center for Scientific Computation and Mathematical Modeling (CSCAMM), University of Maryland
Title: Emergent behavior in self-organized dynamics: from consensus to hydrodynamic flocking
Abstract: We discuss several first- and second-order models encountered in opinion and flocking dynamics. The models are driven by different “rules of engagement”, which quantify how each member interacts with its immediate neighbors. We highlight the role of geometric vs. topological neighborhoods and distinguish between local and global interactions, while addressing the following two related questions: (i) how local rules of interaction lead, over time, to the emergence of consensus; and (ii) how the flocking behavior of large crowds is captured by their hydrodynamic description.
Related papers:
Kinetic descriptions – a mathematical bridge to better understand the world
Mathematical aspects of self-organized dynamics: consensus, emergence of leaders, and social hydrodynamics
Heterophilious dynamics enhances consensus
From particle to kinetic and hydrodynamic descriptions of flocking
November 15th
Tyrus Berry
Research Associate, Department of Mathematical Science, George Mason University
Title: What geometries can we learn from data?
Abstract: In the field of manifold learning, the foundational theoretical results of Coifman and Lafon (Diffusion Maps, 2006) showed that for data sampled near an embedded manifold, certain graph Laplacian constructions are consistent estimators of the Laplace-Beltrami operator on the underlying manifold. Since these operators determine the Riemannian metric, they completely describe the geometry of the manifold (as inherited from the embedding). It was later shown that different kernel functions could be used to recover any desired geometry, at least in terms of pointwise estimation of the associated Laplace-Beltrami operator. In this talk I will first briefly review the above results and then introduce new results on the spectral convergence of these graph Laplacians. These results reveal that not all geometries are accessible in the stronger spectral sense. However, when the data set is sampled from a smooth density, there is a natural conformally invariant geometry which is accessible on all compact manifolds, and even on a large class of non-compact manifolds. Moreover, the kernel which estimates this geometry has a very natural construction which we call Continuous k-Nearest Neighbors (CkNN).
November 29th
Yingzhou Li
Phillip Griffiths Research Assistant Professor, Department of Mathematics, Duke University
Title: Kernel functions and their fast evaluations
Abstract: Kernel matrices are popular in machine learning and scientific computing, but they are limited by their quadratic complexity in both construction and storage. It is well-known that as one varies the kernel parameter, e.g., the width parameter in radial basis function kernels, the kernel matrix changes from a smooth low-rank kernel to a diagonally-dominant and then fully-diagonal kernel. Low-rank approximation methods have been widely studied, mostly in the first case, to reduce the memory storage and the cost of computing matrix-vector products. Here, we use ideas from scientific computing to propose an extension of these methods to situations where the matrix is not well-approximated by a low-rank matrix. In particular, we construct an efficient block low-rank approximation method, which we call the Block Basis Factorization, and we show that it has O(n) complexity in both time and memory. Our method works for a wide range of kernel parameters, extending the domain of applicability of low-rank approximation methods, and our empirical results demonstrate the stability (small standard deviation in error) and superiority over current state-of-art kernel approximation algorithms.
Related papers:
Structured Block Basis Factorization for Scalable Kernel Matrix
On the numerical rank of radial basis function kernel matrices in high dimension
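The dependence of the kernel matrix on the width parameter, as described in the abstract above, can be observed numerically. The sketch below is only an illustration and is not the Block Basis Factorization: it builds a Gaussian kernel matrix on an evenly spaced 1D grid and counts singular values above a relative threshold. The helper names, the grid, the widths and the threshold are all hypothetical choices.

```python
import numpy as np

def gaussian_kernel_matrix(x, h):
    """Dense Gaussian (RBF) kernel matrix K_ij = exp(-|x_i - x_j|^2 / (2 h^2))."""
    d = x[:, None] - x[None, :]
    return np.exp(-d**2 / (2.0 * h**2))

def numerical_rank(K, rel_tol=1e-10):
    """Number of singular values larger than rel_tol times the largest one."""
    s = np.linalg.svd(K, compute_uv=False)
    return int(np.sum(s > rel_tol * s[0]))

if __name__ == "__main__":
    x = np.linspace(0.0, 1.0, 200)   # assumed toy data: 200 points on [0, 1]
    for h in (1.0, 0.1, 0.01, 0.001):
        print(f"width h = {h}: numerical rank = {numerical_rank(gaussian_kernel_matrix(x, h))}")
```

Wide kernels give a matrix of very low numerical rank, while very narrow kernels give a matrix close to the identity, which is the regime where plain low-rank approximation stops helping.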
December 6th
Hau-Tieng Wu
Associate Professor, Department of Mathematics, Duke University
Title: Some Data Analysis Tools Inspired by Medical Challenges - Fetal ECG as an example
Abstract: We discuss a particular interest in medicine: extracting hidden dynamics from a single observed time series composed of multiple oscillatory signals, which could be viewed as a single-channel blind source separation problem. This problem is common nowadays due to the popular mobile health monitoring devices, and is made challenging by the structure of the signal, which consists of non-sinusoidal oscillations with time-varying amplitude/frequency, and by the heteroscedastic nature of the noise. Inspired by the fetal electrocardiogram (ECG) signal analysis from the single-lead maternal abdominal ECG signal, in this talk I will discuss some new data analysis tools, including the cepstrum-based nonlinear-type time-frequency analysis and fiber-bundle-based manifold learning technique. In addition to showing the results in fetal ECG analysis, I will also show how the approach could be applied to simultaneously extract the instantaneous heart/respiratory rate from a PPG signal during exercise. If time permits, the clinical trial results will be discussed.

December 20th
Valeriya Naumova
Senior research scientist, Section for Computing and Software, Simula Research Laboratory (Simula) Multiparameter regularisation for solving unmixing problems in signal
processing: theoretical and practical aspects
Motivated by reallife applications in signal processing and image analysis, where the quantity of interest is generated by several sources to be accurately modelled and separated, as well as by recent advances in sparse regularisation theory and optimisation, we present a theoretical and algorithmic framework for optimal support recovery in inverse problems of unmixing type by means of multipenalty regularisation. While multipenalty regularisation is not a novel technique [1], we aim at providing precise reconstruction guarantees and methods for adaptive regularisation parameter choice. We consider and analyse a regularisation functional composed of a datafidelity term, where signal and noise are additively mixed, a nonsmooth, convex, sparsity promoting term, and a convex penalty term to model the noise. We prove not only that the wellestablished theory for sparse recovery in the single parameter case can be translated to the multipenalty settings, but we also demonstrate the enhanced properties of multipenalty regularisation in terms of support identification compared to sole $\ell^1$minimisation. Extending the notion of Lasso path algorithm, we additionally propose an efficient procedure for an adaptive parameter choice in multipenalty regularisation, focusing on the recovery of the correct support of the solution. The approach essentially enables a fast construction of a tiling over the parameter space in such a way that each tile corresponds to a different sparsity pattern of the solution. Finally, we propose an iterative alternating algorithm based on simple iterative thresholding steps to perform the minimisation of the extended multipenalty functional, containing nonsmooth and nonconvex sparsity promoting term. To exemplify the robustness and effectiveness of the multipenalty framework, we provide an extensive numerical analysis of our method and compare it with stateoftheart singlepenalty algorithms for compressed sensing problems. 
This is joint work with Markus Grasmair [3, 4], Norwegian University of Science and Technology; Timo Klock [4], Simula Research Laboratory, and Johannes Maly and Steffen Peter [2], Technical University of Munich. Related papers:
[1] Y. Meyer, Oscillating patterns in image processing and nonlinear evolution equations
[2] Minimization of multi-penalty functionals by alternating iterative thresholding and optimal parameter choices
[3] Conditions on optimal support recovery in unmixing problems by means of multi-penalty regularization
[4] Multiple parameter learning with regularization path algorithms
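For readers new to the iterative thresholding steps mentioned in the abstract, the following is a minimal sketch of the soft-thresholding operator and a plain single-penalty ISTA loop for $\min_u \frac12\|Au-y\|_2^2+\lambda\|u\|_1$. It is not the speakers' multi-penalty algorithm; the function names, the toy matrix, the step size, and the value of $\lambda$ are all illustrative assumptions.

```python
def soft_threshold(v, tau):
    """Componentwise soft thresholding, the proximal map of tau * ||.||_1."""
    return [(abs(x) - tau) * (1 if x > 0 else -1) if abs(x) > tau else 0.0
            for x in v]


def matvec(A, u):
    """Dense matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, u)) for row in A]


def ista(A, y, lam, step, iters=200):
    """Plain ISTA for 0.5 * ||A u - y||^2 + lam * ||u||_1 (single penalty only).

    `step` is assumed to be at most 1 / ||A||^2 so that the iteration converges.
    """
    At = [list(col) for col in zip(*A)]            # transpose of A
    u = [0.0] * len(A[0])
    for _ in range(iters):
        residual = [r - b for r, b in zip(matvec(A, u), y)]
        grad = matvec(At, residual)                # gradient of the fidelity term
        u = soft_threshold([ui - step * gi for ui, gi in zip(u, grad)],
                           step * lam)
    return u


# Hypothetical toy problem: the data come from the 1-sparse vector (1, 0).
A = [[1.0, 0.2], [0.2, 1.0]]
y = matvec(A, [1.0, 0.0])
u_hat = ista(A, y, lam=0.1, step=0.5)
print(u_hat)
```

On this toy problem the thresholding step sets the second coordinate exactly to zero, so the support of the generating vector is identified, while the first coordinate is shrunk slightly by the penalty.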
The data seminar will take place on Wednesdays at 3pm, in Whitehead Hall 304, during the semester, on the Johns Hopkins Homewood campus (see this map).
February 8th
Kasso Okoudjou
Professor and Associate Chair, Department of Mathematics, University of Maryland, College Park https://www.math.umd.edu/~okoudjou/ Inductive and numerical approaches to the HRT conjecture
Given a nonzero square integrable function $g$ and $\Lambda=\{(a_k, b_k)\}_{k=1}^N \subset \mathbb{R}^2$, let $G(g, \Lambda)=\{e^{2\pi i b_k \cdot}g(\cdot - a_k)\}_{k=1}^N.$ The Heil-Ramanathan-Topiwala (HRT) Conjecture is the question of whether $G(g, \Lambda)$ is linearly independent. For the last two decades, very little progress has been made in settling the conjecture. In the first part of the talk, I will give an overview of the state of the conjecture. I will then describe recent inductive and numerical approaches to attack the conjecture. If time permits, I will present some new positive results in the special case where $g$ is real-valued.
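To make the linear independence question concrete, here is a small numerical sketch, not the speaker's method. It assumes the Gaussian window $g(t)=2^{1/4}e^{-\pi t^2}$, for which the inner products of time-frequency shifts have the closed form $\langle e^{2\pi i b\cdot}g(\cdot-a),\, e^{2\pi i b'\cdot}g(\cdot-a')\rangle = e^{\pi i (b-b')(a+a')}e^{-\frac{\pi}{2}((a-a')^2+(b-b')^2)}$, and it checks that the Gram matrix of $G(g,\Lambda)$ has a positive determinant for one hypothetical choice of $\Lambda$. All function names are illustrative.

```python
import cmath
import math


def gram_entry(p, q):
    """Inner product of the time-frequency shifts of the Gaussian window at p and q."""
    (a1, b1), (a2, b2) = p, q
    decay = math.exp(-0.5 * math.pi * ((a1 - a2) ** 2 + (b1 - b2) ** 2))
    phase = cmath.exp(1j * math.pi * (b1 - b2) * (a1 + a2))
    return phase * decay


def gram(points):
    """Gram matrix of G(g, Lambda) for Lambda = points (a list of (a, b) pairs)."""
    return [[gram_entry(p, q) for q in points] for p in points]


def determinant(M):
    """Determinant by Gaussian elimination with partial pivoting (complex entries)."""
    M = [row[:] for row in M]
    n = len(M)
    det = 1.0 + 0.0j
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(M[r][c]))
        if abs(M[pivot][c]) == 0.0:
            return 0.0 + 0.0j
        if pivot != c:
            M[c], M[pivot] = M[pivot], M[c]
            det = -det
        det *= M[c][c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n):
                M[r][k] -= factor * M[c][k]
    return det


# Hypothetical configuration of four points, one of them off the integer lattice.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (math.sqrt(2), math.sqrt(3))]
print(determinant(gram(points)))
```

A Gram determinant bounded away from zero certifies linear independence only for this particular window and this particular $\Lambda$; it says nothing about other choices of $g$.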
February 15th
Jerome Darbon
Assistant Professor, Applied Mathematics, Brown University https://www.brown.edu/academics/appliedmathematics/jeromedarbon On convex finite-dimensional variational methods in imaging sciences, and Hamilton-Jacobi equations
We consider standard finite-dimensional variational models used in signal/image processing that consist in minimizing an energy involving a data fidelity term and a regularization term. We propose new remarks from a theoretical perspective which give a precise description of how the solutions of the optimization problem depend on the amount of smoothing effects and the data itself. The dependence of the minimal values of the energy is shown to be ruled by Hamilton-Jacobi equations, while the minimizers $u(x,t)$ for the observed images $x$ and smoothing parameters $t$ are given by $u(x,t)=x - \nabla H(\nabla E(x,t))$ where $E(x,t)$ is the minimal value of the energy and $H$ is a Hamiltonian related to the data fidelity term. Various vanishing smoothing parameter results are derived illustrating the role played by the prior in such limits. Finally, we briefly present an efficient numerical method for solving certain Hamilton-Jacobi equations in high dimension and some applications in optimal control.
March 8th
Matthew Hirn
Assistant Professor, Department of Mathematics, Michigan State University https://matthewhirn.wordpress.com/ Learning many body physics via multiscale, multilayer machine learning architectures
Deep learning algorithms are making their mark in machine learning, obtaining state of the art results across computer vision, natural language processing, auditory signal processing and more. A wavelet scattering transform has the general architecture of a convolutional neural network, but leverages structure within data by encoding multiscale, stable invariants relevant to the task at hand. This approach is particularly relevant to data generated by physical systems, as such data must respect underlying physical laws. We illustrate this point through many body physics, in which scattering transforms can be loosely interpreted as the machine learning version of a fast multipole method (FMM). Unlike FMMs, which efficiently simulate the physical system, the scattering transform learns the underlying physical kernel from given states of the system. The resulting learning algorithm obtains state of the art numerical results for the regression of molecular energies in quantum chemistry, obtaining errors on the order of more costly quantum mechanical approaches. 
March 29th
Jason Eisner
Professor, Department of Computer Science, Johns Hopkins University http://www.cs.jhu.edu/~jason/ Probabilistic Modeling of Natural Language
Natural language is a particular kind of time-series data. By way of introduction, I will informally sketch some of the phenomena that occur in natural language data and the kinds of probabilistic models that are traditionally used to describe them (e.g., context-free grammars, Chinese restaurant processes, graphical models, finite-state transducers, recurrent neural networks augmented with memory). Many of these are covered in my JHU fall course, EN.600.465 Natural Language Processing. As an illustrative example, I will then describe a new conditional probability model that combines LSTMs with finite-state transducers to predict one string from another. For example, such a model can convert between the past and present tenses of an unfamiliar verb. Such pairwise conditional distributions can be combined into graphical models that model the relationships among many strings.
April 5th
Afonso Bandeira
Assistant Professor, Courant Institute of Mathematical Sciences, NYU http://www.cims.nyu.edu/~bandeira/ On Phase Transitions for Spiked Random Matrix and Tensor Models
A central problem of random matrix theory is to understand the eigenvalues of spiked random matrix models, in which a prominent eigenvector (or low rank structure) is planted into a random matrix. These distributions form natural statistical models for principal component analysis (PCA) problems throughout the sciences, where the goal is often to recover or detect the planted low rank structure. In this talk we discuss fundamental limitations of statistical methods to perform these tasks and methods that outperform PCA at them. Emphasis will be given to low rank structures arising in synchronization problems. Time permitting, analogous results for spiked tensor models will also be discussed. Joint work with Amelia Perry, Alex Wein, and Ankur Moitra.
April 12th
Andrew Christlieb
MSU Foundation Professor (Department of Mathematics) and Department Chair (Department of Computational Mathematics, Science and Engineering), Michigan State University https://cmse.msu.edu/directory/faculty/andrewchristlieb/ A sublinear deterministic FFT for sparse high dimensional signals
In this talk we investigate the problem of efficient recovery of sparse signals (sparsity $k$) in a high dimensional setting. In particular, we investigate efficient recovery of the $k$ largest Fourier modes of a signal of size $N^d$, where $N$ is the bandwidth and $d$ is the dimension. Our objective is the development of a high dimensional sublinear FFT, for $d=100$ or $1000$, that can recover the signal in $O(d k \log k)$ time. The methodology is based on our one dimensional deterministic sparse FFT that is $O(k \log k)$. The original method is recursive and based on ratios of short FFTs of pairs of subsampled signals. The same ratio test allows us to identify when there is a collision due to aliasing of the subsampled signals. The recursive nature allows us to separate and identify frequencies that have collided. Key in the high dimensional setting is the introduction of a partial unwrapping method and a tilting method that can ensure that we avoid collisions in the high dimensional setting on subsampled grids. We present the method, some analysis and results for a range of tests in both the noisy and noiseless cases.
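The idea of using ratios of short FFTs of pairs of subsampled signals can be illustrated on the simplest possible case, a single pure tone. The sketch below is a toy under stated assumptions, not the speaker's algorithm: the signal is given as a hypothetical callable, the short transform is a naive DFT rather than an FFT, and there is no noise and no collision handling. Subsampling by `stride` aliases the frequency $\omega$ to the bin $\omega \bmod (N/\text{stride})$, and the ratio of the two short transforms at that bin equals $e^{2\pi i\omega/N}$, whose phase reveals $\omega$ itself.

```python
import cmath
import math


def dft(samples):
    """Naive O(M^2) DFT, adequate for the short transforms in this toy."""
    m = len(samples)
    return [sum(s * cmath.exp(-2j * math.pi * k * j / m)
                for j, s in enumerate(samples))
            for k in range(m)]


def recover_single_frequency(signal, N, stride):
    """Recover the frequency of a one-tone signal of length N.

    Uses two subsampled copies of the signal, offset by one sample.
    Assumes `stride` divides N and the signal is a single noiseless tone.
    """
    m = N // stride
    base = [signal(j * stride) for j in range(m)]
    shifted = [signal(j * stride + 1) for j in range(m)]
    B, S = dft(base), dft(shifted)
    k = max(range(m), key=lambda i: abs(B[i]))     # aliased bin: omega mod m
    ratio = S[k] / B[k]                            # exp(2 pi i omega / N) for a pure tone
    omega = round(cmath.phase(ratio) * N / (2 * math.pi)) % N
    return omega, k


# Hypothetical example: a tone at frequency 389 in a length-1024 signal.
N = 1024
tone = lambda n: 3.0 * cmath.exp(2j * math.pi * 389 * n / N)
print(recover_single_frequency(tone, N, stride=16))
```

Only 128 samples of the length-1024 signal are read here, which is the sense in which such schemes can run in time sublinear in $N$.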
April 26th
Wojciech Czaja
Professor, Department of Mathematics, University of Maryland, College Park https://www.math.umd.edu/~wojtek/ Solving Fredholm integrals from incomplete measurements
We present an algorithm to solve Fredholm integrals of the first kind with tensor product structures, from a limited number of measurements, with the goal of using this method to accelerate Nuclear Magnetic Resonance (NMR) acquisition. This is done by incorporating compressive sampling type arguments to fill in the missing measurements using a priori knowledge of the structure of the data. In the first step, we recover a compressed data matrix from measurements that form a tight frame, and establish that these measurements satisfy the restricted isometry property (RIP). In the second step, we solve the zeroth-order regularization minimization problem using the Venkataramanan-Song-Huerlimann algorithm. We demonstrate the performance of this algorithm on simulated and real data and we compare it with other sampling techniques. Our theory applies to both 2D and multidimensional NMR.
The data seminar will take place on Wednesdays at 3pm, in Krieger Hall 309, during the semester, on Johns Hopkins Homewood campus (building 39 at location F2/3 on this map). 
September 7th
Radu Balan
Professor, Department of Mathematics, University of Maryland, College Park http://www.math.umd.edu/~rvbalan/ Statistics of the Stability Bounds in the Phase Retrieval Problem
In this talk we present a local-global Lipschitz analysis of the phase retrieval problem. Additionally we present tentative estimates of the tail bound for the distribution of the global Lipschitz constants. Specifically it is known that if the frame $\{f_1,\ldots,f_m\}$ for $C^n$ is phase retrievable then there are constants $a_0$ and $b_0$ so that for every $x,y\in C^n$: $a_0 \|xx^*-yy^*\|_1^2 \leq \sum_{k=1}^m \left| |\langle x,f_k\rangle|^2-|\langle y,f_k\rangle|^2 \right|^2 \leq b_0 \|xx^*-yy^*\|_1^2$. Assume $f_1,\ldots,f_m$ are independent realizations with entries from $CN(0,1)$. In this talk we establish estimates for the probability $P(a_0>a)$.
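The two-sided bound above can be explored numerically. The following sketch, which is not taken from the talk, draws one random frame with $CN(0,1)$ entries and evaluates the ratio $\sum_k \left||\langle x,f_k\rangle|^2-|\langle y,f_k\rangle|^2\right|^2 / \|xx^*-yy^*\|_1^2$ on randomly sampled pairs $(x,y)$. It assumes that $\|\cdot\|_1$ denotes the nuclear norm and uses the fact that $xx^*-yy^*$ has rank at most two with eigenvalues of opposite sign, which gives $\|xx^*-yy^*\|_1=\sqrt{(\|x\|^2+\|y\|^2)^2-4|\langle x,y\rangle|^2}$. All names, sizes, and the seed are illustrative.

```python
import math
import random


def inner(x, y):
    """<x, y> = sum_j x_j * conj(y_j)."""
    return sum(a * b.conjugate() for a, b in zip(x, y))


def nuclear_norm_rank2(x, y):
    """Nuclear norm of x x^* - y y^* via the closed form for this rank-two matrix."""
    a = inner(x, x).real
    b = inner(y, y).real
    c = abs(inner(x, y)) ** 2
    return math.sqrt(max((a + b) ** 2 - 4.0 * c, 0.0))


def complex_gaussian_vector(n, rng):
    """Vector with CN(0,1) entries: real and imaginary parts are N(0, 1/2)."""
    s = math.sqrt(0.5)
    return [complex(rng.gauss(0.0, s), rng.gauss(0.0, s)) for _ in range(n)]


def stability_ratio(frame, x, y):
    """Middle term of the inequality divided by the squared nuclear norm."""
    num = sum((abs(inner(x, f)) ** 2 - abs(inner(y, f)) ** 2) ** 2 for f in frame)
    return num / nuclear_norm_rank2(x, y) ** 2


def sampled_bounds(n, m, pairs, seed=0):
    """Smallest and largest ratio over randomly sampled pairs, for one random frame."""
    rng = random.Random(seed)
    frame = [complex_gaussian_vector(n, rng) for _ in range(m)]
    ratios = [stability_ratio(frame,
                              complex_gaussian_vector(n, rng),
                              complex_gaussian_vector(n, rng))
              for _ in range(pairs)]
    return min(ratios), max(ratios)


print(sampled_bounds(n=3, m=12, pairs=200))
```

Because only finitely many pairs are sampled, the smallest observed ratio can only overestimate the optimal $a_0$ and the largest can only underestimate the optimal $b_0$; repeating the experiment over many frames gives a crude empirical picture of the distribution discussed in the abstract.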
September 21st
Charles Meneveau
Louis M. Sardella Professor of Mechanical Engineering, Johns Hopkins University http://pages.jh.edu/~cmeneve1/ Hydrodynamic turbulence in the era of big data: simulation, data, and analysis
In this talk, we review the classic problem of Navier-Stokes turbulence, the role numerical simulations have played in advancing the field, and the data challenges posed by these simulations. We describe the Johns Hopkins Turbulence Databases (JHTDB) and present some sample applications from the areas of velocity increment statistics and finite time Lyapunov exponents in isotropic turbulence and wall modeling for Large Eddy Simulations of wall-bounded flows. Related papers:
A Web services accessible database of turbulent channel flow and its use for testing a new integral wall model for LES
Large-deviation statistics of vorticity stretching in isotropic turbulence
A public turbulence database cluster and applications to study Lagrangian evolution of velocity increments in turbulence
Note date and place: September 23rd, 2pm, Gilman 132
Ben Leimkuhler
Professor of Mathematics, University of Edinburgh http://kac.maths.ed.ac.uk/~bl/ From Molecular Dynamics to Large Scale Inference
Molecular models and data analytics problems give rise to very large systems of stochastic differential equations (SDEs) whose paths are designed to ergodically sample multimodal probability distributions. An important challenge for the numerical analyst (or the data scientist, for that matter) is the design of numerical procedures to generate these paths. One of the interesting ideas is to construct stochastic numerical methods with close attention to the error in the invariant measure. Another is to redesign the underlying stochastic dynamics to reduce bias or locally transform variables to enhance sampling efficiency. I will illustrate these ideas with various examples, including a geodesic integrator for constrained Langevin dynamics [1] and an ensemble sampling strategy for distributed inference [2]. 
September 28th
Rene Vidal
Professor of Biomedical Engineering, Computer Science, Mechanical Engineering, and Electrical and Computer Engineering, Johns Hopkins University http://cis.jhu.edu/~rvidal/ Global Optimality in Matrix and Tensor Factorization, Deep Learning, and Beyond
Matrix, tensor, and other factorization techniques are used in many applications and have enjoyed significant empirical success in many fields. However, common to a vast majority of these problems is the significant disadvantage that the associated optimization problems are typically nonconvex due to a multilinear form or other convexity-destroying transformation. Building on ideas from convex relaxations of matrix factorizations, in this talk I will present a very general framework which allows for the analysis of a wide range of nonconvex factorization problems, including matrix factorization, tensor factorization, and deep neural network training formulations. In particular, I will present sufficient conditions under which a local minimum of the nonconvex optimization problem is a global minimum and show that if the size of the factorized variables is large enough then from any initialization it is possible to find a global minimizer using a local descent algorithm.
October 26th
Robert Pego
Professor, Department of Mathematical Sciences, Carnegie Mellon University http://www.math.cmu.edu/~bobpego/ Microdroplet instability for incompressible distance between shapes
The least-action problem for geodesic distance on the 'manifold' of fluid-blob shapes exhibits instability due to microdroplet formation. This reflects a striking connection between Arnold's least-action principle for incompressible Euler flows and geodesic paths for Wasserstein distance. A connection with fluid mixture models via a variant of Brenier's relaxed least-action principle for generalized Euler flows will also be outlined. This is joint work with Jian-Guo Liu and Dejan Slepcev.
November 2nd
Dejan Slepcev
Associate Professor, Department of Mathematical Sciences, Carnegie Mellon University http://www.math.cmu.edu/~slepcev/ Variational problems on graphs and their continuum limits
We will discuss variational problems arising in machine learning and their limits as the number of data points goes to infinity. Consider point clouds obtained as random samples of an underlying "ground-truth" measure. A graph representing the point cloud is obtained by assigning weights to edges based on the distance between the points. Many machine learning tasks, such as clustering and classification, can be posed as minimizing functionals on such graphs. We consider functionals involving graph cuts and graph Laplacians and their limits as the number of data points goes to infinity. In particular we establish under what conditions the minimizers of discrete problems have a well-defined continuum limit, and characterize the limit. The talk is primarily based on joint work with Nicolas Garcia Trillos, as well as on works with Xavier Bresson, Moritz Gerlach, Matthias Hein, Thomas Laurent, James von Brecht and Matt Thorpe.
November 9th
Markos Katsoulakis
Professor, Department of Mathematics and Statistics, University of Massachusetts Amherst http://people.math.umass.edu/~markos/ Scalable Information Inequalities for Uncertainty Quantification in high dimensional probabilistic models
In this talk we discuss new scalable information bounds for quantities of interest of complex stochastic models. The scalability of inequalities allows us to (a) obtain uncertainty quantification bounds for quantities of interest in high-dimensional systems and/or for long time stochastic dynamics; (b) assess the impact of large model perturbations such as in nonlinear response regimes in statistical mechanics; (c) address model-form uncertainty, i.e. compare different extended probabilistic models and corresponding quantities of interest. We demonstrate these tools in fast sensitivity screening of chemical reaction networks with a very large number of parameters, and towards obtaining robust and tight uncertainty quantification bounds for phase diagrams in statistical mechanics models. Related papers:
Scalable Information Inequalities for Uncertainty Quantification
Accelerated Sensitivity Analysis in High-Dimensional Stochastic Reaction Networks
Path-Space Information Bounds for Uncertainty Quantification and Sensitivity Analysis of Stochastic Dynamics
Path-space variational inference for non-equilibrium coarse-grained systems
Effects of correlated parameters and uncertainty in electronic-structure-based chemical kinetic modelling
November 30th
Youssef Marzouk
Associate Professor of Aeronautics and Astronautics, Massachusetts Institute of Technology http://aeroastro.mit.edu/facultyresearch/facultylist/youssefmmarzouk Measure transport approaches for Bayesian computation
We will discuss how transport maps, i.e., deterministic couplings between probability measures, can enable useful new approaches to Bayesian computation. A first use involves a combination of optimal transport and Metropolis correction; here, we use continuous transportation to transform typical MCMC proposals into adapted non-Gaussian proposals, both local and global. Second, we discuss a variational approach to Bayesian inference that constructs a deterministic transport map from a reference distribution to the posterior, without resorting to MCMC. Independent and unweighted samples can then be obtained by pushing forward reference samples through the map. Making either approach efficient in high dimensions, however, requires identifying and exploiting low-dimensional structure. We present new results relating the sparsity and decomposability of transports to the conditional independence structure of the target distribution. We also describe conditions, common in inverse problems, under which transport maps have a particular low-rank or near-identity structure. In general, these properties of transports can yield more efficient algorithms. As a particular example, we derive new deterministic "online" algorithms for Bayesian inference in nonlinear and non-Gaussian state-space models with static parameters. This is joint work with Daniele Bigoni, Matthew Parno, and Alessio Spantini.
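The statement that independent, unweighted samples are obtained by pushing reference samples through a map can be seen in a one-dimensional toy case. The sketch below is not the construction from the talk: it assumes a standard normal reference and an exponential target (a stand-in for a posterior), for which the monotone transport map is available in closed form as $T = F_{\text{target}}^{-1}\circ F_{\text{ref}}$. The function names and the choice of target are illustrative.

```python
import math
import random


def transport_to_exponential(z, rate=1.0):
    """Monotone map T = F_target^{-1} o F_ref from N(0,1) to Exp(rate).

    Uses 1 - Phi(z) = 0.5 * erfc(z / sqrt(2)), which stays accurate in the upper tail.
    """
    survival = 0.5 * math.erfc(z / math.sqrt(2.0))
    return -math.log(survival) / rate


def push_forward_samples(count, rate=1.0, seed=0):
    """Draw reference samples and push them through the map."""
    rng = random.Random(seed)
    return [transport_to_exponential(rng.gauss(0.0, 1.0), rate)
            for _ in range(count)]


samples = push_forward_samples(20000, rate=2.0)
print(sum(samples) / len(samples))   # sample mean, to compare with 1 / rate
```

Each output is an independent draw from the target, with no weights and no Markov chain. In higher dimensions no such closed form exists in general, which is where the variational construction and the structural results described in the abstract come in.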
Note date and place: December 2nd, 10am, Gilman 132
Alex Cloninger
Gibbs Assistant Professor and NSF Postdoctoral Fellow, Yale University http://users.math.yale.edu/~ac2528 Incorporation of Geometry into Learning Algorithms and Medicine
This talk focuses on two instances in which scientific fields outside mathematics benefit from incorporating the geometry of the data. In each instance, the application area motivates the need for new mathematical approaches and algorithms, and leads to interesting new questions. (1) A method to determine and predict drug treatment effectiveness for patients based on their baseline information. This motivates building a function-adapted diffusion operator on high dimensional data X when the function F can only be evaluated on large subsets of X, defining a localized filtration of F, and estimating values of F at a finer scale than is naively reliable. (2) The current empirical success of deep learning in imaging and medical applications, in which theory and understanding are lagging far behind. By assuming the data lies near low dimensional manifolds and building local wavelet frames, we improve on existing theory that breaks down when the ambient dimension is large (the regime in which deep learning has seen the most success).
December 7th
Ben Adcock
Assistant Professor, Simon Fraser University http://benadcock.org/ Sparse polynomial approximation of high-dimensional functions
Many problems in scientific computing require the approximation of smooth, high-dimensional functions from limited amounts of data. For instance, a common problem in uncertainty quantification involves identifying the parameter dependence of the output of a computational model. Complex physical systems require computational models with many parameters, resulting in multivariate functions of many variables. Although the amount of data may be large, the curse of dimensionality essentially prohibits collecting or processing enough data to reconstruct the unknown function using classical approximation techniques. In this talk, I will give an overview of the approximation of smooth, high-dimensional functions by sparse polynomial expansions. I will focus on the recent application of techniques from compressed sensing to this problem, and demonstrate how such approaches theoretically overcome the curse of dimensionality. If time permits, I will also discuss a number of extensions, including dealing with corrupted and/or unstructured data, the effect of model error and incorporating additional information such as gradient data. I will also highlight several challenges and open problems. This is joint work with Casie Bao, Simone Brugiapaglia and Yi Sui (SFU).