Tag: AI&Physics
-
Flow-Matching Objectives (27 May 2024)
In the previous blog, I walked through the simulation-based approaches to training neural ODE/continuous normalizing flow models. Those approaches are mathematically elegant, but they remain expensive and hard to scale in practice. Flow-matching objectives are designed to make the training more affordable and scalable. In this blog, I will review the derivations behind flow-matching models. -
Training Neural ODE with three different loss types (13 May 2024)
The recently popular flow-matching models are built on another interesting model family, the Neural ODE/continuous normalizing flow. While the main idea behind flow-matching models is to find a practical and affordable way to train the neural ODE, the original adjoint sensitivity method is intellectually interesting in its own right and full of meaningful details. So, in this blog, I'll review the derivations behind the adjoint method before diving into the flow-matching objective in the next one. In the end, both are good candidate protocols for making observables from MD trajectories differentiable. -
Implicit Reparameterization Gradients (12 Sep 2023)
This note delves into a paper recommended by Kevin, which focuses on the challenges of obtaining low-variance gradients for continuous random variables, particularly those pesky distributions we often encounter (yes, the Rice distribution). The key takeaway: you can build unbiased estimators for pathwise gradients of continuous distributions with numerically tractable CDFs, such as the gamma, truncated, or mixture distributions. -
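The takeaway above can be sketched in a few lines. For a distribution with a tractable CDF F(x; θ), implicit differentiation of F gives the pathwise gradient dx/dθ = −(∂F/∂θ)/(∂F/∂x) without inverting the CDF. A minimal sketch for the exponential distribution (my illustrative choice here, since its CDF is closed-form, not an example from the paper), where the implicit gradient can be checked against the explicit inverse-CDF one:

```python
import math

def implicit_reparam_grad(x, lam):
    """Implicit reparameterization gradient dx/dlam for an
    Exponential(lam) sample x: dx/dlam = -(dF/dlam) / (dF/dx),
    where F(x; lam) = 1 - exp(-lam * x) is the CDF."""
    dF_dlam = x * math.exp(-lam * x)    # partial of the CDF w.r.t. the parameter
    dF_dx = lam * math.exp(-lam * x)    # partial w.r.t. x, i.e. the density p(x; lam)
    return -dF_dlam / dF_dx

def explicit_reparam_grad(u, lam):
    """Explicit pathwise gradient via inverse-CDF sampling:
    x = -log(1 - u) / lam, so dx/dlam = log(1 - u) / lam**2."""
    return math.log(1.0 - u) / lam**2

lam, u = 2.0, 0.3
x = -math.log(1.0 - u) / lam            # inverse-CDF sample from Exponential(lam)
print(implicit_reparam_grad(x, lam))    # agrees with the explicit gradient
print(explicit_reparam_grad(u, lam))
```

The point of the implicit form is that it only needs the CDF and the density at the sample, which is exactly why it extends to gamma, truncated, or mixture distributions whose inverse CDFs are not available in closed form.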
Early Implementation of Attention Mechanism (27 Dec 2021)
We have witnessed the popularity and fast development of the Attention Mechanism in the deep learning community in recent years. It serves as a pivotal component in most state-of-the-art models for NLP tasks, and continues to be a rapidly evolving research topic in the CV field. Besides, in recent AI-related scientific breakthroughs, like AlphaFold 2, the Attention Mechanism looks like an omnipresent component of the models. That is why we (Kevin and I) decided to start a journal club to read and discuss seminal papers about how attention was introduced and further developed. We hope this discussion can bring us more intuition about this fancy name, so that we can apply it to problems we are interested in with more confidence.
This blog is a note on the first discussion, about the paper Bahdanau, et al. (2014) Neural machine translation by jointly learning to align and translate1. As an early (or the first) implementation of the “Attention Mechanism” for the translation task, it helped a lot, at least for me, to understand what attention is, although the attention here is a little different from that in the later Transformer model.
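The core of the paper's attention is an additive alignment model: each encoder annotation h_j is scored against the previous decoder state s_{i-1}, the scores are softmax-normalized into weights, and the context vector is the weighted sum of annotations. A minimal NumPy sketch (the dimensions and random weights here are placeholders, not values from the paper):

```python
import numpy as np

def softmax(z):
    z = z - z.max()                     # numerical stability
    e = np.exp(z)
    return e / e.sum()

def additive_attention(s_prev, H, W_a, U_a, v_a):
    """Bahdanau-style additive attention: score each encoder annotation
    h_j against the previous decoder state s_{i-1}, then build a context
    vector as the attention-weighted sum of annotations."""
    # e_j = v_a^T tanh(W_a s_{i-1} + U_a h_j) for every source position j
    scores = np.tanh(s_prev @ W_a.T + H @ U_a.T) @ v_a
    alpha = softmax(scores)             # alignment weights over source positions
    context = alpha @ H                 # weighted sum of annotations, shape (hidden,)
    return context, alpha

rng = np.random.default_rng(0)
T, hidden, attn = 6, 8, 4               # source length, state size, attention size
H = rng.normal(size=(T, hidden))        # encoder annotations h_1 .. h_T
s_prev = rng.normal(size=hidden)        # previous decoder hidden state s_{i-1}
W_a = rng.normal(size=(attn, hidden))
U_a = rng.normal(size=(attn, hidden))
v_a = rng.normal(size=attn)
context, alpha = additive_attention(s_prev, H, W_a, U_a, v_a)
print(alpha.sum())                      # the alignment weights sum to 1
```

The learned weights alpha are exactly the "soft alignment" between a target word and the source words, which is what makes this model interpretable compared to a plain encoder-decoder.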
-
Bahdanau, Dzmitry, Kyunghyun Cho, and Yoshua Bengio. “Neural machine translation by jointly learning to align and translate.” arXiv preprint arXiv:1409.0473 (2014).
-
Spectral Bias and Positional Encoding (26 Jul 2021)
These days (maybe it is already out of date when you read this blog), we are seeing a “renaissance” of classic multilayer perceptron (MLP) models in the machine learning field. The logic behind this trend is instructive for researchers: by understanding how a complex black box works, we can naturally make reasonable modifications to improve it, instead of shooting blindly. The majority of this blog is based on the paper Tancik, Matthew, et al. (2020) Fourier features let networks learn high frequency functions in low dimensional domains.
The basic takeaway is that a standard MLP fails to learn high frequencies both in theory and in practice, which is called Spectral Bias. Based on this finding, with a simple Fourier feature mapping (Positional Encoding), the performance of MLPs can be greatly improved, especially for low-dimensional regression tasks, e.g., when your inputs are atom coordinates.
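The mapping itself is one line: project the input through a random Gaussian matrix B and take sines and cosines, γ(x) = [cos(2πBx), sin(2πBx)], feeding the result to the MLP instead of raw coordinates. A minimal NumPy sketch (the bandwidth scale 10 and the 256 features are illustrative choices; the paper tunes the scale of B per task):

```python
import numpy as np

def fourier_features(x, B):
    """Random Fourier feature mapping gamma(x) = [cos(2*pi*Bx), sin(2*pi*Bx)]
    applied to low-dimensional inputs x of shape (n, d)."""
    proj = 2.0 * np.pi * x @ B.T        # shape (n, m): m random projections
    return np.concatenate([np.cos(proj), np.sin(proj)], axis=-1)

rng = np.random.default_rng(0)
B = rng.normal(scale=10.0, size=(256, 3))  # the scale of B sets the frequency bandwidth
coords = rng.uniform(size=(5, 3))          # e.g. 3D atom coordinates in [0, 1)
features = fourier_features(coords, B)
print(features.shape)                      # (5, 512): 256 cosines + 256 sines
```

The scale of B is the knob that fights spectral bias: a larger scale injects higher frequencies into the features, letting the downstream MLP fit sharper functions at the risk of overfitting.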
-
The Thermodynamic Variational Objective and more (20 Jan 2021)
This is a reading note centering around the paper Vaden Masrani, et al. (2019) The Thermodynamic Variational Objective. Based on an intuition from statistical physics, this paper provides a way to construct a tighter bound, called the TVO, on the observed likelihood in VAEs than the ELBO (my note about ELBO). There are two main points I want to make here:
- Modern statistical machine learning could borrow a lot from statistical physics, as the two fields generally share many similar questions;
- A tighter bound on the learning objective does not always guarantee better performance. Yes, a beautiful heuristic piece of math sometimes does not produce better results.
This is also what we chose to study for the final project of Harvard's Fall 2020 AM207 course, where we tested this novel TVO on some specially-designed toy datasets. You can find our report here
-
A Theoretical Connection Between Statistical Physics and Reinforcement Learning (20 Jan 2021)
This is a reading note centering around the paper Jad Rahme and Ryan Adams (2019) A Theoretical Connection Between Statistical Physics and Reinforcement Learning. I would rather call it a framework for understanding RL centered on the partition function. It is a very heuristic work (though, again, it seems not very useful at the moment): we have long had the feeling that RL and statistical mechanics deal with a similar issue of integrating over the whole phase space, and this paper provides a good example of how we can start from that intuition and build further. Similarly, I could imagine there is a way to understand RL in a path integral framework.
TBC