What?
A survey of simulation-based inference (SBI), identifying what can improve the field.
Why?
Simulators are omnipresent in science; with progress in ML, we have new tools to improve inference with them → keep the science going.
How?
[Figure: overview of simulation-based inference workflows (panels A-H); source: original paper]
What is a simulator?
- A computer program;
- Takes parameters $\theta$ as input;
- Samples latent variables (internal states) sequentially, $z_i \sim p_i(z_i \mid \theta, z_{<i})$;
- the parameters affect these transition probabilities.
- latents heavily depend on the problem;
- can be discrete or continuous;
- have some semantics;
- dimensions can vary greatly;
- some simulators provide access to latents (others are black boxes)
- Produces a data vector $x \sim p(x \mid \theta, z)$ as the output.
- $x$ corresponds to the observations (a minimal sketch of such a program follows below).
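To make this concrete, here is a minimal Python sketch of such a program; the latent chain, the Gaussian noise model, and all names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def simulator(theta, rng=None):
    """Toy simulator: parameters theta -> latent states z -> data vector x."""
    rng = np.random.default_rng() if rng is None else rng
    z = []
    z.append(rng.normal(loc=theta, scale=1.0))        # z_1 ~ p_1(z_1 | theta)
    z.append(rng.normal(loc=0.5 * z[0], scale=1.0))   # z_2 ~ p_2(z_2 | theta, z_1)
    x = rng.normal(loc=z[-1], scale=0.1, size=5)      # x ~ p(x | theta, z)
    return x, z

x_obs, _ = simulator(theta=1.0)   # an "observed" data vector, reused in later sketches
```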
Inference
- Given observed data $x$, what are the parameters $\theta$ or latents $z$?
- Calculate the posterior $p(\theta \mid x) = \frac{p(x \mid \theta)\, p(\theta)}{\int p(x \mid \theta')\, p(\theta')\, d\theta'}$.
- The likelihood $p(x \mid \theta)$ is intractable! (it requires integrating over all the latents, written out below)
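In the notation above, the troublesome integral runs over every possible latent execution path of the simulator, which is why it can typically only be sampled from, not evaluated:

$$
p(x \mid \theta) = \int p(x, z \mid \theta)\, dz = \int p(x \mid \theta, z) \prod_i p_i(z_i \mid \theta, z_{<i})\, dz
$$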
Traditional methods
- Approximate Bayesian Computation (ABC):
- Draw $\theta$ from the prior;
- Simulate with $\theta$ to get $x_\text{sim}$;
- Reject $\theta$ if $\rho(x_\text{sim}, x_\text{obs}) > \epsilon$, where $\rho$ is some distance;
- The accepted $\theta$'s are your (approximate) posterior samples.
- Problems:
- Needs a lot of samples;
- Poor scaling to high-dimensional data (here you go, Jeremy Howard);
- For each new observation, the whole procedure has to be rerun (no amortization).
- Amortized likelihood (1E):
- Similar in spirit to ABC; can call it "approximate frequentist computation": the likelihood is estimated from simulations, typically via histograms or kernel density estimates of summary statistics;
- Amortized! New data points can be efficiently evaluated.
- Curse of dimensionality again! (Oh no =()
- Both approaches require hand-crafted low-dimensional summary statistics (need feature extractors!); toy sketches of both follow below.
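A minimal rejection-ABC sketch, reusing the toy `simulator` and `x_obs` from the earlier sketch; the Gaussian prior, the mean as summary statistic, and the tolerance are illustrative assumptions.

```python
import numpy as np

def rejection_abc(x_obs, n_draws=100_000, eps=0.1, rng=None):
    """Rejection ABC: keep the prior draws whose simulated data lands close to x_obs."""
    rng = np.random.default_rng() if rng is None else rng
    summary = lambda x: x.mean()        # hand-crafted low-dimensional summary statistic
    accepted = []
    for _ in range(n_draws):
        theta = rng.normal(0.0, 2.0)                      # 1. draw theta from the prior
        x_sim, _ = simulator(theta, rng)                  # 2. simulate with theta
        if abs(summary(x_sim) - summary(x_obs)) < eps:    # 3. keep only close matches
            accepted.append(theta)
    return np.array(accepted)           # accepted thetas = approximate posterior samples

posterior_samples = rejection_abc(x_obs)
```

Note how the cost scales: a tiny $\epsilon$ or a higher-dimensional summary makes acceptances vanishingly rare, which is exactly the sample-efficiency problem listed above.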
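A rough sketch of the amortized, frequentist-style counterpart: estimate the density of the summary statistic once for each parameter value on a grid, then reuse those estimates for every new observation. The grid, the summary, and the Gaussian KDE are again illustrative choices, with `simulator` and `x_obs` taken from the earlier sketch.

```python
import numpy as np
from scipy.stats import gaussian_kde

def fit_amortized_likelihood(theta_grid, n_sims=1_000, rng=None):
    """For each theta on a grid, KDE-estimate the density of the simulated summary."""
    rng = np.random.default_rng() if rng is None else rng
    summary = lambda x: x.mean()
    return {
        float(theta): gaussian_kde([summary(simulator(theta, rng)[0]) for _ in range(n_sims)])
        for theta in theta_grid
    }

theta_grid = np.linspace(-3.0, 3.0, 25)
likelihoods = fit_amortized_likelihood(theta_grid)   # expensive, but done only once

# Amortization: each new observation only evaluates the stored density estimates.
loglike_curve = {t: kde.logpdf(x_obs.mean())[0] for t, kde in likelihoods.items()}
```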
Frontiers
- Shortcomings of traditional approaches:
- sample efficiency;
- quality of inference;
- amortization;
- Main directions of progress:
- Machine Learning
- Neural Networks for the win;
- Replace hand-crafted feature extractors (summary statistics) with learned networks;
- Density estimation in high dimensions (see the sketch after this list);
- Normalizing flows;
- GANs;
- Active Learning
- Sample parameters $\theta$ that increase our knowledge the most;
- Automatic Differentiation
- Treat the simulation as a white box;
- Probabilistic programming;
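As a deliberately simplified illustration of the ML direction, here is a sketch of amortized neural posterior estimation that uses a conditional Gaussian in place of a normalizing flow; the toy simulator, architecture, prior, and training setup are all assumptions for illustration, not the paper's recipe.

```python
import torch
import torch.nn as nn

def simulate(theta):
    """Toy simulator (tensor version): theta -> latent z -> 5-dim data vector."""
    z = theta + torch.randn_like(theta)
    return z.unsqueeze(-1) + 0.1 * torch.randn(theta.shape[0], 5)

class PosteriorNet(nn.Module):
    """Maps an observation x to the mean and log-std of a Gaussian over theta."""
    def __init__(self, x_dim=5, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(x_dim, hidden), nn.ReLU(), nn.Linear(hidden, 2))

    def forward(self, x):
        mean, log_std = self.net(x).chunk(2, dim=-1)
        return mean.squeeze(-1), log_std.squeeze(-1)

net = PosteriorNet()
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for step in range(2_000):
    theta = 2.0 * torch.randn(256)            # 1. draw parameters from the prior
    x = simulate(theta)                       # 2. simulate data for each theta
    mean, log_std = net(x)                    # 3. predict q(theta | x)
    nll = (log_std + 0.5 * ((theta - mean) / log_std.exp()) ** 2).mean()
    opt.zero_grad()
    nll.backward()                            # 4. minimize the Gaussian negative log-likelihood
    opt.step()

# Amortized: a single forward pass gives an approximate posterior for any new observation.
with torch.no_grad():
    post_mean, post_log_std = net(simulate(torch.tensor([1.0])))
```

Swapping the Gaussian head for a conditional normalizing flow (or training a likelihood/ratio estimator instead) gives the flexible-density variants discussed in the survey; the network also doubles as the learned feature extractor.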
Using the simulator directly during inference (A-D on the figure)
- Run the simulator with the parameters that are expected to improve the knowledge of the posterior the most.
- Inference controls all steps of the program execution and can bias each draw of latents to improve the match with the observed data (toy illustration after this list).
- Probabilistic programming for the win;
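A toy numpy illustration of why controlling the latent draws helps; the Gaussian model and the hand-picked proposal are illustrative assumptions (in practice a probabilistic programming system learns such proposals). Both estimators target the same intractable likelihood, but the one whose latents are steered toward the observation needs far fewer simulator runs for the same accuracy.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
theta, x_obs_val, n = 0.0, 2.0, 1_000

# Toy model: z ~ N(theta, 1), x ~ N(z, 0.1);  p(x_obs | theta) = E_z[ N(x_obs; z, 0.1) ].

# (a) Uncontrolled: latents drawn blindly from p(z | theta).
z = rng.normal(theta, 1.0, n)
naive = norm.pdf(x_obs_val, loc=z, scale=0.1)

# (b) Controlled: latents biased toward the observation, reweighted by p(z | theta) / q(z).
z_q = rng.normal(x_obs_val, 0.3, n)
weights = norm.pdf(z_q, loc=theta, scale=1.0) / norm.pdf(z_q, loc=x_obs_val, scale=0.3)
controlled = weights * norm.pdf(x_obs_val, loc=z_q, scale=0.1)

# Same target, very different Monte Carlo error.
print(naive.mean(), naive.std() / np.sqrt(n))
print(controlled.mean(), controlled.std() / np.sqrt(n))
```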
Surrogate models (E-H on the figure)