These are the slides behind my major presentations and posters. They are all released with a CC-BY license. You can subscribe to these posts as an RSS feed here. The source code for this website is available in this repo. It uses Jekyll to take the list of presentations and statically generate this page along with individual linkable pages for them.


  1. 2024 Differentiable optimization and robotics
  2. 2024 Amortized optimization for OT and LLMs
  3. 2024 Amortized optimization and AI
  4. 2024 Lagrangian OT Poster
  5. 2024 End-to-end learning geometries for graphs, dynamical systems, and regression
  6. 2023 On amortizing convex conjugates for optimal transport
  7. 2023 TaskMet Poster
  8. 2023 Meta Optimal Transport
  9. 2023 Learning with differentiable and amortized optimization
  10. 2023 On optimal control and machine learning
  11. 2023 Continuous optimal transport
  12. 2023 Amortized optimization
  13. 2023 Amortized optimization for optimal transport
  14. 2022 Differentiable optimization
  15. 2022 Differentiable control
  16. 2022 Amortized optimization
  17. 2022 Amortized optimization for computing optimal transport maps
  18. 2021 On the model-based stochastic value gradient for continuous RL
  19. 2021 Riemannian Convex Potential Maps
  20. 2020 The differentiable cross-entropy method
  21. 2019 Differentiable optimization-based modeling for machine learning (PhD Thesis)
  22. 2018 PyTorch libraries for linear algebra, optimization, and control
  23. 2018 OptNet, end-to-end task-based learning, and control (ISMP)
  24. 2018 Differentiable MPC
  25. 2018 Differentiable MPC Poster
  26. 2017 OptNet: Differentiable Optimization as a Layer in Neural Networks
  27. 2017 Input Convex Neural Networks

Differentiable optimization and robotics

2024 | Powerpoint | PDF

Optimization is a crucial technology for robotics and provides functionality such as optimal control, motion planning, state estimation, alignment, manipulation, tactile sensing, pose tracking, and safety mechanisms. These solvers are often integrated with learned models that estimate and predict non-trivial parts of the world. *Differentiable optimization* enables the learned model to receive a learning signal from these downstream optimization problems. This signal encourages the model to improve in the regions that matter for the optimization problem to work well, rather than only making accurate predictions under a supervised loss. This talk will overview the foundations, applications, and recent advancements in these topics, with a focus on continuous optimal control (MPC) and non-linear least squares.
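As a minimal sketch of that learning signal, assume a scalar inner problem chosen only for illustration, y*(x) = argmin_y ½(y - x)² + log(1 + e^y), where x stands in for a learned model's prediction. Applying the implicit function theorem to the optimality condition gives dy*/dx = -(∂²f/∂y²)⁻¹ ∂²f/∂y∂x without unrolling the solver, and the outer loop uses it to move x until the solver's output hits a target. The function names are hypothetical.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def solve_inner(x, y=0.0, iters=50):
    """Newton's method for y*(x) = argmin_y 0.5*(y - x)**2 + log(1 + exp(y)).

    The optimality condition is g(y) = (y - x) + sigmoid(y) = 0.
    """
    for _ in range(iters):
        s = sigmoid(y)
        y -= ((y - x) + s) / (1.0 + s * (1.0 - s))
    return y


def implicit_grad(y):
    """dy*/dx = -(d2f/dy2)^-1 * d2f/dydx at the solution y, where d2f/dydx = -1."""
    s = sigmoid(y)
    return 1.0 / (1.0 + s * (1.0 - s))


# Outer (learning) problem: adjust x so that the solver's output y*(x) reaches a target.
target, x = 0.5, 0.0
for _ in range(100):
    y = solve_inner(x)
    x -= 0.5 * 2.0 * (y - target) * implicit_grad(y)  # chain rule through the argmin
print(x, solve_inner(x))
```

Only the solution y* is needed for the backward pass, which is why the same idea applies to solvers whose internal iterations would be expensive to unroll.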



Amortized optimization for OT and LLMs

2024 | Powerpoint | PDF

Amortized optimization methods provide fast solvers by predicting approximate solutions to optimization problems. This talk covers two recent advancements that use amortization to significantly speed up solvers for non-trivial optimization problems arising in optimal transport (OT) and in large language model (LLM) attacks. Computational optimal transport problems may involve solving three nested optimization problems, each of which amortization can help with: 1) the solution map from the measures to the primal/dual OT solution ([Meta OT](https://arxiv.org/abs/2206.05262)), 2) the computation of the c-transform or Fenchel conjugate ([amortized conjugates](https://arxiv.org/abs/2210.12153)), and 3) the computation of geodesics and Lagrangian (minimum-action) paths/costs ([Lagrangian OT](https://openreview.net/pdf?id=myb0FKB8C9)). Adding amortization to the standard solvers in these OT settings significantly improves the runtime and deployment time of the methods. These faster amortized solutions to the Fenchel conjugate and geodesic/Lagrangian paths are potentially of more general interest in other settings that are bottlenecked by computing them numerically. Beyond these optimal transport applications, we will also discuss the prompt optimization problems arising in adversarial attacks on LLMs ([AdvPrompter](https://arxiv.org/abs/2404.16873)). Here, amortization enables us to attain state-of-the-art results on the standard AdvBench dataset, which also transfer to closed-source black-box LLM APIs. The fast amortized predictions then enable us to generate a synthetic dataset of adversarial examples on which an LLM can be fine-tuned to make it more robust against jailbreaking attacks while maintaining performance.



Amortized optimization and AI

2024 | Powerpoint | PDF

AI and optimization systems are widely deployed in today's computing landscape. AI systems have a remarkable capacity to make abstractions and predictions about the world, while optimization systems drive decision-making, control, and robotic systems that reason about and interact with the world. These technologies are already intertwined and overlapping, and optimization-based reasoning systems will continue playing a crucial role in AI systems as they advance towards general intelligence. Connecting to Kahneman's modes of thought, explicitly forming and solving an optimization problem is akin to "System 2" (i.e., slow thinking), while rapidly predicting a solution to the problem can be seen as "System 1" (i.e., fast thinking). AI systems can interact with optimization solvers via a "System 2" approach by using optimization as a tool, where humans can also inject domain knowledge or safety constraints and guardrails, or via "System 1" by learning to rapidly predict (or [amortize](https://arxiv.org/abs/2202.00665)) solutions to the optimization problems. This talk focuses on the amortization process of distilling the solutions to optimization problems into a fast, predictive model. We highlight a few recent developments in: 1) amortizing transportation between measures ([Meta Optimal Transport](https://arxiv.org/abs/2206.05262) and [Meta Flow Matching](https://openreview.net/forum?id=f9GsKvLdzs)), which have applications in computational biology for predicting how a population of cells will be transported given an initial population and treatment; 2) amortizing [convex conjugates](https://arxiv.org/abs/2210.12153) and [Lagrangian paths](https://arxiv.org/abs/2406.00288), including geodesic computations, which significantly improves neural optimal transport methods that repeatedly solve these subproblems and is of broader interest in any setting that repeatedly computes conjugates or solves path-planning problems; and 3) [amortizing language model prompt optimization and adversarial attacks](https://arxiv.org/abs/2404.16873). This last setting involves repeatedly searching over the prompt space for every new prompt to jailbreak a target model, and amortization involves learning a language model that generates prompt-conditional suffixes that solve this optimization problem. Amortizing these problems attains state-of-the-art results and human-interpretable prompt modifications on the standard AdvBench settings, which also transfer to closed-source black-box LLM APIs.
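A minimal sketch of the amortization step, under illustrative assumptions: the problem family is min_y cosh(y) - x·y (whose solution is asinh(x)), the "System 2" solver is Newton's method, and the "System 1" model is a cubic polynomial fitted by least squares to solved instances. This is regression-based amortization on a hypothetical toy problem, not the models used in the linked papers.

```python
import math
import random


def newton_solve(x, y0=0.0, tol=1e-12, max_iter=100):
    """Solve min_y cosh(y) - x*y, whose optimality condition is sinh(y) = x.

    Returns the solution and the number of Newton iterations used.
    """
    y = y0
    for it in range(max_iter):
        g = math.sinh(y) - x
        if abs(g) < tol:
            return y, it
        y -= g / math.cosh(y)
    return y, max_iter


# "System 2": solve sampled problem instances from scratch to build a training set.
random.seed(0)
train_x = [random.uniform(-2.0, 2.0) for _ in range(200)]
train_y = [newton_solve(x)[0] for x in train_x]

# Regression-based amortization: fit y ~ a*x + b*x**3 via the 2x2 normal equations.
s2 = sum(x**2 for x in train_x)
s4 = sum(x**4 for x in train_x)
s6 = sum(x**6 for x in train_x)
t1 = sum(x * y for x, y in zip(train_x, train_y))
t3 = sum(x**3 * y for x, y in zip(train_x, train_y))
det = s2 * s6 - s4 * s4
a = (t1 * s6 - s4 * t3) / det
b = (s2 * t3 - s4 * t1) / det


def predict(x):
    """Fast "System 1" prediction: an amortized guess of the solution."""
    return a * x + b * x**3


# Use the prediction directly, or as a warm start that the solver refines.
test_x = [random.uniform(-2.0, 2.0) for _ in range(50)]
cold = sum(newton_solve(x)[1] for x in test_x)
warm = sum(newton_solve(x, predict(x))[1] for x in test_x)
print(f"Newton iterations: cold start {cold}, amortized warm start {warm}")
```

The prediction can be used on its own when speed matters most, or as a warm start that the solver then refines to full accuracy.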



Lagrangian OT Poster

2024 | Powerpoint | PDF | Paper

We investigate the optimal transport problem between probability measures when the underlying cost function is understood to satisfy a least action principle, also known as a Lagrangian cost. These generalizations are useful when connecting observations from a physical system where the transport dynamics are influenced by the geometry of the system, such as obstacles (e.g., incorporating barrier functions in the Lagrangian), and allow practitioners to incorporate a priori knowledge of the underlying system such as non-Euclidean geometries (e.g., paths must be circular). Our contributions are of computational interest: we demonstrate the ability to efficiently compute geodesics and amortize spline-based paths, which has not been done before, even in low-dimensional problems. Unlike prior work, we also output the resulting Lagrangian optimal transport map without requiring an ODE solver. We demonstrate the effectiveness of our formulation on low-dimensional examples taken from prior work.
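To make the least-action idea concrete, here is a minimal sketch that directly minimizes a discretized action between two fixed endpoints, assuming a Lagrangian of the form ½|ẋ|² + U(x), where U is a hypothetical Gaussian barrier around an obstacle. It runs plain gradient descent on the interior points of a piecewise-linear path; it does not amortize anything and is not the spline-based method from the paper.

```python
import math

CENTER, WEIGHT, WIDTH = (0.0, -0.2), 1.0, 0.3  # hypothetical soft obstacle


def potential(p):
    """Gaussian bump U(p) acting as a soft barrier around CENTER."""
    dx, dy = p[0] - CENTER[0], p[1] - CENTER[1]
    return WEIGHT * math.exp(-(dx * dx + dy * dy) / WIDTH**2)


def potential_grad(p):
    dx, dy = p[0] - CENTER[0], p[1] - CENTER[1]
    scale = -2.0 * potential(p) / WIDTH**2
    return scale * dx, scale * dy


def action(path, h):
    """Discretized action: sum of |dx|^2 / (2h) plus h * U at the interior points."""
    kinetic = sum(((q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2) / (2 * h)
                  for p, q in zip(path, path[1:]))
    return kinetic + h * sum(potential(p) for p in path[1:-1])


def least_action_path(start, end, n_segments=20, step=0.01, iters=3000):
    """Gradient descent on the interior points of a path with fixed endpoints."""
    h = 1.0 / n_segments
    path = [(start[0] + (end[0] - start[0]) * k / n_segments,
             start[1] + (end[1] - start[1]) * k / n_segments)
            for k in range(n_segments + 1)]  # initialize with the straight line
    for _ in range(iters):
        new_path = list(path)
        for k in range(1, n_segments):
            gx, gy = potential_grad(path[k])
            kx = (2 * path[k][0] - path[k - 1][0] - path[k + 1][0]) / h
            ky = (2 * path[k][1] - path[k - 1][1] - path[k + 1][1]) / h
            new_path[k] = (path[k][0] - step * (kx + h * gx),
                           path[k][1] - step * (ky + h * gy))
        path = new_path
    return path


start, end = (-1.0, 0.0), (1.0, 0.0)
straight = least_action_path(start, end, iters=0)
path = least_action_path(start, end)
print(action(straight, 0.05), action(path, 0.05))
print(path[10])  # the midpoint bends away from the obstacle below the x-axis
```

The step size is kept below the stability limit of the discrete kinetic term (roughly 2h/4), which is the usual constraint when optimizing finely discretized paths with plain gradient descent.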



End-to-end learning geometries for graphs, dynamical systems, and regression

2024 | Powerpoint | PDF

Every machine learning setting has an underlying geometry in which the data is represented and the predictions are made. While defaulting to a Euclidean or known manifold geometry is capable of building powerful models, *learning* a non-trivial geometry from data is useful for improving the overall performance and estimating unobserved structures. This talk focuses on learning geometries for: 1) *graph embeddings*, where the geometry of the embedding (e.g., Euclidean, spherical, or hyperbolic) heavily influences the accuracy and distortion of the embedding depending on the graph's structure; 2) *dynamical systems*, where the geometry of the state space can uncover unobserved properties of the underlying systems, e.g., geographic information such as obstacles or terrains; and 3) *regression*, where the geometry of the prediction space influences where the model should be accurate or inaccurate for some downstream task. We will focus on *latent* geometries in these settings that are not directly observable from the data, i.e., the geometry cannot be estimated as a submanifold of the Euclidean space the data is observed in. Instead, in these settings the geometry can be shaped via a downstream signal that propagates through differentiable operations such as the geodesic distance and the log/exp maps on Riemannian manifolds. The talk covers the foundational tools for making these operations differentiable (in general via the envelope and implicit function theorems, but potentially more simply when closed-form operations are available) and demonstrates where the end-to-end learned geometry is effective.
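A deliberately small illustration of shaping a geometry through a differentiable geodesic distance, assuming the latent geometry is a sphere of unknown radius r (a hypothetical setup) and that only pairwise distances are observed. The great-circle distance r·arccos(⟨u, v⟩) is available in closed form, so its derivative with respect to r is simply the angle, and gradient descent recovers the radius that generated the observations.

```python
import math
import random


def geodesic(u, v, radius):
    """Great-circle distance between unit vectors u, v on a sphere of the given radius."""
    cos_angle = max(-1.0, min(1.0, sum(a * b for a, b in zip(u, v))))
    return radius * math.acos(cos_angle)


def random_direction(rng):
    p = [rng.gauss(0.0, 1.0) for _ in range(3)]
    norm = math.sqrt(sum(t * t for t in p))
    return [t / norm for t in p]


rng = random.Random(0)
pairs = [(random_direction(rng), random_direction(rng)) for _ in range(20)]
true_radius = 2.5  # hypothetical geometry that generated the observed distances
observed = [geodesic(u, v, true_radius) for u, v in pairs]

# Learn the radius end-to-end: the loss gradient flows through the geodesic distance,
# using d/dr [r * angle] = angle (the distance on the unit sphere).
radius, lr = 1.0, 0.05
for _ in range(500):
    grad = sum(2.0 * (geodesic(u, v, radius) - d) * geodesic(u, v, 1.0)
               for (u, v), d in zip(pairs, observed)) / len(pairs)
    radius -= lr * grad
print(radius)
```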



On amortizing convex conjugates for optimal transport

2023 | Powerpoint | PDF | Paper

This paper focuses on computing the convex conjugate operation that arises when solving Euclidean Wasserstein-2 optimal transport problems. This conjugation, which is also referred to as the Legendre-Fenchel conjugate or c-transform, is considered difficult to compute, and in practice, Wasserstein-2 methods are limited by not being able to exactly conjugate the dual potentials in continuous space. To overcome this, the computation of the conjugate can be approximated with amortized optimization, which learns a model to predict the conjugate. I show that combining amortized approximations to the conjugate with a solver for fine-tuning significantly improves the quality of transport maps learned for the Wasserstein-2 benchmark by Korotin et al. (2021a) and is able to model many 2-dimensional couplings and flows considered in the literature.
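A one-dimensional sketch of the conjugate as an inner optimization problem, assuming the hypothetical potential f(x) = x⁴/4 + x²/2. The conjugate f*(y) = sup_x x·y - f(x) is computed by Newton's method on f'(x) = y, and `amortized_guess` is a placeholder for a learned conjugate model whose prediction the solver then fine-tunes. Newton's method is an assumption chosen because the 1-D problem makes it simple.

```python
import math


def f(x):
    """Hypothetical strongly convex potential f(x) = x^4/4 + x^2/2 (1-D for illustration)."""
    return 0.25 * x**4 + 0.5 * x**2


def conjugate(y, x_init=0.0, tol=1e-12, max_iter=100):
    """f*(y) = sup_x x*y - f(x), solved by Newton's method on f'(x) = y from x_init."""
    x = x_init
    for _ in range(max_iter):
        g = x**3 + x - y             # f'(x) - y
        if abs(g) < tol:
            break
        x -= g / (3.0 * x**2 + 1.0)  # f''(x) >= 1, so the Newton step is well defined
    return x * y - f(x), x


def amortized_guess(y):
    """Hypothetical stand-in for a learned conjugate model: the real root of x^3 = y."""
    return math.copysign(abs(y) ** (1.0 / 3.0), y)


for y in (-3.0, 0.5, 2.0):
    value, x_star = conjugate(y, amortized_guess(y))  # predict, then fine-tune
    print(y, value, x_star)
```

The returned maximizer also equals the derivative of f* at y, so the same solve provides both the conjugate value and its gradient.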



TaskMet Poster

2023 | Powerpoint | PDF | Paper

Deep learning models are often deployed in downstream tasks that the training procedure may not be aware of. For example, models solely trained to achieve accurate predictions may struggle to perform well on downstream tasks because seemingly small prediction errors may incur drastic task errors. The standard end-to-end learning approach is to make the task loss differentiable or to introduce a differentiable surrogate that the model can be trained on. In these settings, the task loss needs to be carefully balanced with the prediction loss because they may have conflicting objectives. We propose to take the task loss signal one level deeper than the parameters of the model and use it to learn the parameters of the loss function the model is trained on, which can be done by learning a metric in the prediction space. This approach does not alter the optimal prediction model itself, but rather changes the model learning to emphasize the information important for the downstream task. This enables us to achieve the best of both worlds: a prediction model trained in the original prediction space while also being valuable for the desired downstream task. We validate our approach through experiments conducted in two main settings: 1) decision-focused model learning scenarios involving portfolio optimization and budget allocation, and 2) reinforcement learning in noisy environments with distracting states.
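A toy sketch of learning a metric through the prediction loss, with hypothetical values throughout: the model is a single scalar θ that must predict a 2-D target y = (1, 3), so it cannot fit both coordinates; the metric is a diagonal weighting parameterized by φ; and the downstream task only cares about the second coordinate. Because the inner weighted least-squares problem has a closed form (a weighted mean), the outer gradient with respect to φ is just the chain rule; in general it would require differentiating through model training. This illustrates the idea and is not the paper's algorithm.

```python
import math

TARGETS = (1.0, 3.0)  # hypothetical 2-D target; a single shared scalar cannot fit both entries


def metric_weights(phi):
    """Diagonal metric (w1, w2) = (1 - sigmoid(phi), sigmoid(phi)), parameterized by phi."""
    s = 1.0 / (1.0 + math.exp(-phi))
    return 1.0 - s, s


def fit_model(phi):
    """Inner problem: theta* = argmin_theta sum_i w_i (theta - y_i)^2, i.e. a weighted mean."""
    w1, w2 = metric_weights(phi)
    return w1 * TARGETS[0] + w2 * TARGETS[1]


def task_loss(theta):
    """Hypothetical downstream task that only depends on the second output."""
    return (theta - TARGETS[1]) ** 2


# Outer problem: learn the metric so that the model trained under it does well on the task.
phi, lr = 0.0, 1.0
for _ in range(200):
    s = metric_weights(phi)[1]
    theta = fit_model(phi)
    # Chain rule: dL/dphi = dL/dtheta * dtheta/dw2 * dw2/dphi, with dtheta/dw2 = y2 - y1.
    grad = 2.0 * (theta - TARGETS[1]) * (TARGETS[1] - TARGETS[0]) * s * (1.0 - s)
    phi -= lr * grad
print(metric_weights(phi), fit_model(phi), task_loss(fit_model(phi)))
```

The model is still trained with an ordinary (weighted) prediction loss; only the weighting changes, which shifts the limited capacity of θ towards the coordinate the task depends on.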



Meta Optimal Transport

2023 | Powerpoint | PDF | Paper

We study the use of amortized optimization to predict optimal transport (OT) maps from the input measures, which we call Meta OT. This helps repeatedly solve similar OT problems between different measures by leveraging the knowledge and information present from past problems to rapidly predict and solve new problems. Otherwise, standard methods ignore the knowledge of the past solutions and suboptimally re-solve each problem from scratch. We instantiate Meta OT models in discrete and continuous settings between grayscale images, spherical data, classification labels, and color palettes and use them to improve the computational time of standard OT solvers.
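A minimal sketch of how a predicted solution can speed up a standard solver, assuming a discrete entropic OT problem solved with Sinkhorn's algorithm. Here the dual scaling from a previously solved, similar problem stands in for the output of a learned Meta OT model (an assumption for illustration), and the histograms and cost are hypothetical. The warm start typically cuts the number of Sinkhorn iterations while converging to the same plan.

```python
import math


def sinkhorn(a, b, cost, eps=0.1, v_init=None, tol=1e-9, max_iter=10000):
    """Entropic OT between histograms a and b via Sinkhorn scaling.

    v_init optionally warm-starts the column scaling (e.g., from a predicted dual).
    Returns the transport plan, the final column scaling, and the iteration count.
    """
    n, m = len(a), len(b)
    K = [[math.exp(-c / eps) for c in row] for row in cost]
    v = list(v_init) if v_init is not None else [1.0] * m
    for it in range(1, max_iter + 1):
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
        # Column marginals are exact after the v-update, so check the row marginals.
        row_err = max(abs(u[i] * sum(K[i][j] * v[j] for j in range(m)) - a[i])
                      for i in range(n))
        if row_err < tol:
            break
    plan = [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]
    return plan, v, it


xs = [i / 4 for i in range(5)]
cost = [[(x - y) ** 2 for y in xs] for x in xs]
a = [0.2] * 5
b_old = [0.1, 0.15, 0.2, 0.25, 0.3]
b_new = [0.11, 0.15, 0.2, 0.25, 0.29]  # a new, similar problem

_, v_old, _ = sinkhorn(a, b_old, cost)
plan_cold, _, cold_iters = sinkhorn(a, b_new, cost)
plan_warm, _, warm_iters = sinkhorn(a, b_new, cost, v_init=v_old)
print(cold_iters, warm_iters)
```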



Learning with differentiable and amortized optimization

2023 | Powerpoint | PDF

Optimization has been a transformative modeling and decision-making paradigm over the past century that computationally encodes non-trivial reasoning operations. Developments in optimization foundations, combined with input from domain experts, have resulted in breakthroughs for 1) controlling robotic, autonomous, mechanical, and multi-agent systems, 2) making operational decisions based on future predictions, 3) efficiently transporting or matching resources, information, and measures, 4) allocating budgets and portfolios, 5) designing materials, molecules, and other structures, 6) solving inverse problems to infer underlying hidden costs, incentives, geometries, terrains, and other structures, and 7) learning and meta-learning the parameters of predictive and statistical models. These settings often analytically specify the relevant models of the world along with an explicit objective to optimize for. Once these are specified, computational optimization solvers are able to search over the space of possible solutions or configurations and return the best one. The magic of optimization stops when 1) the relevant models of the world are too difficult or impossible to specify, leading to inaccurate or incomplete representations of the true setting, and 2) solving the optimization problem is computationally challenging and takes too long to return a solution on today's hardware. Machine learning methods help overcome both of these by providing fast predictive models and powerful latent abstractions of the world. In this talk, I will cover two ways of tightly integrating optimization and machine learning methods:

1. *Differentiable optimization* characterizes how the solution to an optimization problem changes as the inputs change. In machine learning settings, differentiable optimization provides an implicit layer that integrates optimization-based domain knowledge into the model and enables unknown parts of the optimization problem to be learned. I will cover the foundations of learning these layers with implicit differentiation and highlight applications in robotics and control settings.
2. *Amortized optimization* rapidly predicts approximate solutions to optimization problems and is useful when repeatedly solving optimization problems. Traditional optimization methods typically solve every new problem instance from scratch, ignoring shared structures and information when solving a new instance. In contrast, a solver augmented with amortized optimization learns the shared structure present in the solution mappings and searches the domain more effectively. I will cover the foundations of amortized optimization and highlight new applications in control and optimal transport.



On optimal control and machine learning

2023 | Powerpoint | PDF



Continuous optimal transport

2023 | Powerpoint | PDF



Amortized optimization

2023 | Powerpoint | PDF | Paper



Amortized optimization for optimal transport

2023 | Powerpoint | PDF



Differentiable optimization

2022 | Powerpoint | PDF



Differentiable control

2022 | Powerpoint | PDF



Amortized optimization

2022 | Powerpoint | PDF



Amortized optimization for computing optimal transport maps

2022 | Powerpoint | PDF



On the model-based stochastic value gradient for continuous RL

2021 | Powerpoint | PDF | Paper

For over a decade, model-based reinforcement learning has been seen as a way to leverage control-based domain knowledge to improve the sample-efficiency of reinforcement learning agents. While model-based agents are conceptually appealing, their policies tend to lag behind those of model-free agents in terms of final reward, especially in non-trivial environments. In response, researchers have proposed model-based agents with increasingly complex components, from ensembles of probabilistic dynamics models, to heuristics for mitigating model error. In a reversal of this trend, we show that simple model-based agents can be derived from existing ideas that not only match, but outperform state-of-the-art model-free agents in terms of both sample-efficiency and final reward. We find that a model-free soft value estimate for policy evaluation and a model-based stochastic value gradient for policy improvement is an effective combination, achieving state-of-the-art results on a high-dimensional humanoid control task, which most model-based agents are unable to solve. Our findings suggest that model-based policy evaluation deserves closer attention.
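A minimal sketch of a reparameterized (pathwise) value gradient through a model rollout, under hypothetical assumptions: a known 1-D linear model s' = s + a, a linear Gaussian policy a = θs + σε, a quadratic reward, and a 5-step horizon. The derivative with respect to θ is propagated forward through the model alongside the state. In the approach described above the model is learned and a model-free soft value estimate is used for policy evaluation; both are omitted here.

```python
import random


def rollout_value_and_grad(theta, s0, noises, sigma=0.1, ctrl_cost=0.1):
    """H-step return under a known model and its pathwise derivative w.r.t. theta.

    Hypothetical setup: dynamics s' = s + a, policy a = theta * s + sigma * eps,
    reward r = -s'^2 - ctrl_cost * a^2, with the noises eps held fixed (reparameterization).
    """
    s, ds = s0, 0.0          # state and d(state)/d(theta), propagated through the model
    ret, dret = 0.0, 0.0
    for eps in noises:
        a = theta * s + sigma * eps
        da = s + theta * ds  # product rule: theta enters directly and through the state
        s_next, ds_next = s + a, ds + da
        ret += -(s_next ** 2) - ctrl_cost * a ** 2
        dret += -2.0 * s_next * ds_next - 2.0 * ctrl_cost * a * da
        s, ds = s_next, ds_next
    return ret, dret


def batch_value_and_grad(theta, batch, s0=1.0):
    """Monte Carlo estimates of the expected return and its gradient."""
    results = [rollout_value_and_grad(theta, s0, noises) for noises in batch]
    return (sum(v for v, _ in results) / len(batch),
            sum(g for _, g in results) / len(batch))


random.seed(0)
batch = [[random.gauss(0.0, 1.0) for _ in range(5)] for _ in range(64)]
theta = -0.5
initial_value, _ = batch_value_and_grad(theta, batch)
for _ in range(100):
    _, grad = batch_value_and_grad(theta, batch)
    theta += 0.05 * grad  # gradient ascent: policy improvement with the model-based gradient
final_value, _ = batch_value_and_grad(theta, batch)
print(theta, initial_value, final_value)
```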



Riemannian Convex Potential Maps

2021 | Keynote | PDF | Paper

Modeling distributions on Riemannian manifolds is a crucial component in understanding non-Euclidean data that arises, e.g., in physics and geology. The budding approaches in this space are limited by representational and computational tradeoffs. We propose and study a class of flows that uses convex potentials from Riemannian optimal transport. These are universal and can model distributions on any compact Riemannian manifold without requiring domain knowledge of the manifold to be integrated into the architecture. We demonstrate that these flows can model standard distributions on spheres and tori, on synthetic and geological data.
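Flows in this line of work are built from maps of the form x ↦ exp_x(∇φ(x)) for a suitably (c-)convex potential φ (sign conventions vary). The sketch below only shows the mechanics on the unit sphere S²: project a Euclidean gradient onto the tangent space, then apply the closed-form exponential map, so the image always stays on the manifold. The linear potential φ(x) = 0.5·⟨anchor, x⟩ is hypothetical and is not a c-convex potential of the kind used in the paper.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def riemannian_grad(x, egrad):
    """Project a Euclidean gradient onto the tangent space of the unit sphere at x."""
    c = dot(x, egrad)
    return [g - c * xi for g, xi in zip(egrad, x)]


def exp_map(x, v):
    """Exponential map on the unit sphere: move from x along the great circle in direction v."""
    norm = math.sqrt(dot(v, v))
    if norm < 1e-12:
        return list(x)
    return [math.cos(norm) * xi + math.sin(norm) * vi / norm for xi, vi in zip(x, v)]


def potential_map(x, anchor=(0.0, 0.0, 1.0), scale=0.5):
    """x -> exp_x(grad phi(x)) for the hypothetical potential phi(x) = scale * <anchor, x>."""
    egrad = [scale * a for a in anchor]
    return exp_map(x, riemannian_grad(x, egrad))


rng = random.Random(0)
for _ in range(3):
    p = [rng.gauss(0.0, 1.0) for _ in range(3)]
    norm = math.sqrt(dot(p, p))
    x = [t / norm for t in p]
    y = potential_map(x)
    print(x, y, dot(y, y))  # the image stays on the unit sphere
```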



The differentiable cross-entropy method

2020 | Powerpoint | PDF | Paper



Differentiable optimization-based modeling for machine learning (PhD Thesis)

2019 | Powerpoint | PDF



PyTorch libraries for linear algebra, optimization, and control

2018 | Powerpoint | PDF



OptNet, end-to-end task-based learning, and control (ISMP)

2018 | Powerpoint | PDF

Deep learning and end-to-end architectures provide a general and powerful way of implementing most modern machine learning tasks with a relatively small set of differentiable operations. These operations are usually simple affine operations composed with pointwise nonlinearities like the ReLU or sigmoid function. While general and successful, the drawbacks of these operations are plentiful, as the resulting learned modules can be uninterpretable and difficult to inject domain-specific knowledge into. This talk presents OptNet, a new paradigm for deep learning that integrates the solution of optimization problems "into the loop." OptNet allows domain knowledge in the form of learnable constrained optimization problems to be integrated into larger end-to-end architectures. We will first discuss the new OptNet primitive operations in the form of learning the parameters of a constrained convex quadratic program from data. Then we will show applications of applying these primitive operations in non-convex stochastic optimization and control.



Differentiable MPC

2018 | Powerpoint | PDF | Paper

We present foundations for using Model Predictive Control (MPC) as a differentiable policy class for reinforcement learning in continuous state and action spaces. This provides one way of leveraging and combining the advantages of model-free and model-based approaches. Specifically, we differentiate through MPC by using the KKT conditions of the convex approximation at a fixed point of the controller. Using this strategy, we are able to learn the cost and dynamics of a controller via end-to-end learning. Our experiments focus on imitation learning in the pendulum and cartpole domains, where we learn the cost and dynamics terms of an MPC policy class. We show that our MPC policies are significantly more data-efficient than a generic neural network and that our method is superior to traditional system identification in a setting where the expert is unrealizable.
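A minimal sketch of differentiating through an MPC solve, assuming a scalar linear system x_{t+1} = a·x_t + b·u_t with quadratic cost Σ q·x_{t+1}² + r·u_t² (all parameter values hypothetical). The states are eliminated (a condensed formulation rather than the KKT system described above), so the controller solves H(q)u = -h(q); differentiating that optimality condition gives du*/dq with one more linear solve, which is the signal an imitation loss on actions would backpropagate into the cost weight q.

```python
def solve_linear(A, rhs):
    """Gaussian elimination with partial pivoting for a small dense system A z = rhs."""
    n = len(rhs)
    M = [list(row) + [rhs[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (M[i][n] - sum(M[i][j] * z[j] for j in range(i + 1, n))) / M[i][i]
    return z


def mpc(x0, q, a=1.2, b=1.0, r=1.0, horizon=5):
    """Solve the condensed LQR-MPC problem and return (u*, du*/dq)."""
    f = [a ** (t + 1) * x0 for t in range(horizon)]  # free response: x_{t+1} with u = 0
    G = [[a ** (t - s) * b if s <= t else 0.0 for s in range(horizon)]
         for t in range(horizon)]                    # x_{t+1} = f_t + sum_s G[t][s] * u_s
    # Cost J(u) = q |f + G u|^2 + r |u|^2 has gradient 2 (H u + h) with:
    H = [[q * sum(G[t][i] * G[t][j] for t in range(horizon)) + (r if i == j else 0.0)
          for j in range(horizon)] for i in range(horizon)]
    h = [q * sum(G[t][i] * f[t] for t in range(horizon)) for i in range(horizon)]
    u = solve_linear(H, [-hi for hi in h])
    # Differentiating H(q) u* + h(q) = 0 w.r.t. q gives H du/dq = -G^T x*,
    # where x* = f + G u* is the optimal state trajectory.
    x = [f[t] + sum(G[t][s] * u[s] for s in range(horizon)) for t in range(horizon)]
    du_dq = solve_linear(H, [-sum(G[t][i] * x[t] for t in range(horizon))
                             for i in range(horizon)])
    return u, du_dq


u, du_dq = mpc(x0=1.0, q=2.0)
print(u[0], du_dq[0])  # first MPC action and its sensitivity to the state-cost weight
```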



Differentiable MPC Poster

2018 | Powerpoint | PDF



OptNet: Differentiable Optimization as a Layer in Neural Networks

2017 | Powerpoint | PDF | Paper

This paper presents OptNet, a network architecture that integrates optimization problems (here, specifically in the form of quadratic programs) as individual layers in larger end-to-end trainable deep networks. These layers encode constraints and complex dependencies between the hidden states that traditional convolutional and fully-connected layers often cannot capture. We explore the foundations for such an architecture: we show how techniques from sensitivity analysis, bilevel optimization, and implicit differentiation can be used to exactly differentiate through these layers and with respect to layer parameters; we develop a highly efficient solver for these layers that exploits fast GPU-based batch solves within a primal-dual interior point method, and which provides backpropagation gradients with virtually no additional cost on top of the solve; and we highlight the application of these approaches in several problems. In one notable example, the method learns to play mini-Sudoku (4x4) given just input and output games, with no a priori information about the rules of the game; this highlights the ability of OptNet to learn hard constraints better than other neural architectures.
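A minimal sketch of a QP layer, restricted to equality constraints so that the KKT conditions form a single linear system (OptNet handles inequality constraints with a primal-dual interior point method). The forward pass solves the KKT system, and differentiating it with respect to b reuses the same KKT matrix with a different right-hand side. The helper names and the small instance are hypothetical.

```python
def solve_linear(A, rhs):
    """Gaussian elimination with partial pivoting for a small dense system A z = rhs."""
    n = len(rhs)
    M = [list(row) + [rhs[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (M[i][n] - sum(M[i][j] * z[j] for j in range(i + 1, n))) / M[i][i]
    return z


def qp_layer(Q, p, A, b):
    """Forward pass: solve min 0.5 z^T Q z + p^T z s.t. A z = b via its KKT system.

    KKT: [Q A^T; A 0] [z; nu] = [-p; b]. Returns z* and the KKT matrix for the backward pass.
    """
    n, m = len(p), len(b)
    K = ([list(Q[i]) + [A[k][i] for k in range(m)] for i in range(n)]
         + [list(A[k]) + [0.0] * m for k in range(m)])
    sol = solve_linear(K, [-pi for pi in p] + list(b))
    return sol[:n], K


def qp_jacobian_wrt_b(K, n, m):
    """dz*/db: differentiating the linear KKT system gives K [dz; dnu] = [0; e_k]."""
    columns = []
    for k in range(m):
        rhs = [0.0] * n + [1.0 if j == k else 0.0 for j in range(m)]
        columns.append(solve_linear(K, rhs)[:n])
    return [[columns[k][i] for k in range(m)] for i in range(n)]  # n x m Jacobian


Q = [[2.0, 0.0], [0.0, 1.0]]
p = [1.0, -1.0]
A = [[1.0, 1.0]]
b = [1.0]
z, K = qp_layer(Q, p, A, b)
J = qp_jacobian_wrt_b(K, n=2, m=1)
print(z, J)
```

For this instance, working through the KKT conditions by hand gives z* = (-1/3, 4/3) and dz*/db = (1/3, 2/3).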



Input Convex Neural Networks

2017 | Powerpoint | PDF | Paper

This paper presents the input convex neural network architecture. These are scalar-valued (potentially deep) neural networks with constraints on the network parameters such that the output of the network is a convex function of (some of) the inputs. The networks allow for efficient inference via optimization over some inputs to the network given others, and can be applied to settings including structured prediction, data imputation, reinforcement learning, and others. In this paper we lay the basic groundwork for these models, proposing methods for inference, optimization and learning, and analyze their representational power. We show that many existing neural network architectures can be made input-convex with a minor modification, and develop specialized optimization algorithms tailored to this setting. Finally, we highlight the performance of the methods on multi-label prediction, image completion, and reinforcement learning problems, where we show improvement over the existing state of the art in many cases.
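A minimal sketch of a fully input convex forward pass, assuming ReLU activations and using abs() to enforce the non-negativity constraints on the hidden-to-hidden and output weights (one simple choice; clipping or other reparameterizations would also work). The input passthrough weights may take any sign, and the output is convex in x because non-negative combinations and convex, non-decreasing activations preserve convexity. Layer sizes and random weights are hypothetical.

```python
import random


def relu(v):
    return [max(0.0, t) for t in v]


def matvec(W, v):
    return [sum(w * t for w, t in zip(row, v)) for row in W]


def make_icnn(dim_in, hidden=8, depth=3, seed=0):
    """Random weights for a fully input convex network (hypothetical sizes).

    The z-to-z weights and the output weights on z are made non-negative with abs(),
    which, together with the convex non-decreasing ReLU, keeps the output convex in x.
    """
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

    layers = []
    for k in range(depth):
        layers.append({
            "Wx": mat(hidden, dim_in),  # passthrough from the input: any sign is allowed
            "Wz": None if k == 0 else [[abs(w) for w in row] for row in mat(hidden, hidden)],
            "b": [rng.gauss(0.0, 1.0) for _ in range(hidden)],
        })
    out = {"wz": [abs(rng.gauss(0.0, 1.0)) for _ in range(hidden)],
           "wx": [rng.gauss(0.0, 1.0) for _ in range(dim_in)],
           "b": 0.0}
    return layers, out


def icnn(x, params):
    """Scalar output f(x) that is convex in x by construction."""
    layers, out = params
    z = None
    for layer in layers:
        pre = [p + c for p, c in zip(matvec(layer["Wx"], x), layer["b"])]
        if layer["Wz"] is not None:
            pre = [p + q for p, q in zip(pre, matvec(layer["Wz"], z))]
        z = relu(pre)
    return (sum(w * t for w, t in zip(out["wz"], z))
            + sum(w * t for w, t in zip(out["wx"], x)) + out["b"])


params = make_icnn(dim_in=3)
print(icnn([0.1, -0.2, 0.3], params))
```

Because the output is convex in x, inference that optimizes over some of the inputs (as described above) becomes a convex problem in those inputs.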