What?
A survey of simulation-based inference (SBI), identifying what can improve the field.
Why?
Simulators are omnipresent in science; with progress in ML, we have new tools to improve inference with them → keep the science going.
How?
[Figure: overview of simulation-based inference workflows (panels A-H); source: original paper]
What is a simulator?
- A computer program;
- Takes parameters $\theta$ as input;
- Samples latent variables (internal states) sequentially, $z_i \sim p_i(z_i \mid \theta, z_{<i})$;
- the parameters affect these transition probabilities.
- latents heavily depend on the problem;
- can be discrete or continuous;
- have some semantics;
- dimensions can vary greatly;
- some simulators provide access to latents (others are black boxes)
- Produces a data vector $x \sim p(x \mid \theta, z)$ as the output.
- $x$ corresponds to the observations (a minimal sketch of such a program follows below).
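To make this concrete, here is a minimal Python sketch of such a program; the latent chain, the Gaussian noise model, and all names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def simulator(theta, rng=None):
    """Toy simulator: parameters theta -> latent states z -> data vector x."""
    rng = np.random.default_rng() if rng is None else rng
    z = []
    z.append(rng.normal(loc=theta, scale=1.0))        # z_1 ~ p_1(z_1 | theta)
    z.append(rng.normal(loc=0.5 * z[0], scale=1.0))   # z_2 ~ p_2(z_2 | theta, z_1)
    x = rng.normal(loc=z[-1], scale=0.1, size=5)      # x ~ p(x | theta, z)
    return x, z

x_obs, _ = simulator(theta=1.0)   # an "observed" data vector, reused in later sketches
```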
Inference
- Given observed data $x$, what are the parameters $\theta$ or latents $z$?
- Calculate the posterior $p(\theta \mid x) = \frac{p(x \mid \theta)\, p(\theta)}{\int p(x \mid \theta')\, p(\theta')\, d\theta'}$.
- The likelihood $p(x \mid \theta)$ is intractable! (it requires integrating over all the latents, written out below)
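In the notation above, the troublesome integral runs over every possible latent execution path of the simulator, which is why it can typically only be sampled from, not evaluated:

$$
p(x \mid \theta) = \int p(x, z \mid \theta)\, dz = \int p(x \mid \theta, z) \prod_i p_i(z_i \mid \theta, z_{<i})\, dz
$$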
Traditional methods
- Approximate Bayesian Computation (ABC):
- Draw $\theta$ from the prior;
- Simulate with $\theta$ to get $x_\text{sim}$;
- Reject $\theta$ if $\rho(x_\text{sim}, x_\text{obs}) > \epsilon$, where $\rho$ is some distance;
- The accepted $\theta$'s are your (approximate) posterior samples.
- Problems:
- Needs a lot of samples;
- Poor scaling to high-dimensional data (here you go, Jeremy Howard);
- For each new observation, the whole procedure has to be rerun (no amortization).
- Amortized likelihood (1E):
- Similar in spirit to ABC; can call it "approximate frequentist computation": the likelihood is estimated from simulations, typically via histograms or kernel density estimates of summary statistics;
- Amortized! New data points can be efficiently evaluated.
- Curse of dimensionality again! (Oh no =()
- Both approaches require hand-crafted low-dimensional summary statistics (need feature extractors!); toy sketches of both follow below.
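A minimal rejection-ABC sketch, reusing the toy `simulator` and `x_obs` from the earlier sketch; the Gaussian prior, the mean as summary statistic, and the tolerance are illustrative assumptions.

```python
import numpy as np

def rejection_abc(x_obs, n_draws=100_000, eps=0.1, rng=None):
    """Rejection ABC: keep the prior draws whose simulated data lands close to x_obs."""
    rng = np.random.default_rng() if rng is None else rng
    summary = lambda x: x.mean()        # hand-crafted low-dimensional summary statistic
    accepted = []
    for _ in range(n_draws):
        theta = rng.normal(0.0, 2.0)                      # 1. draw theta from the prior
        x_sim, _ = simulator(theta, rng)                  # 2. simulate with theta
        if abs(summary(x_sim) - summary(x_obs)) < eps:    # 3. keep only close matches
            accepted.append(theta)
    return np.array(accepted)           # accepted thetas = approximate posterior samples

posterior_samples = rejection_abc(x_obs)
```

Note how the cost scales: a tiny $\epsilon$ or a higher-dimensional summary makes acceptances vanishingly rare, which is exactly the sample-efficiency problem listed above.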
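A rough sketch of the amortized, frequentist-style counterpart: estimate the density of the summary statistic once for each parameter value on a grid, then reuse those estimates for every new observation. The grid, the summary, and the Gaussian KDE are again illustrative choices, with `simulator` and `x_obs` taken from the earlier sketch.

```python
import numpy as np
from scipy.stats import gaussian_kde

def fit_amortized_likelihood(theta_grid, n_sims=1_000, rng=None):
    """For each theta on a grid, KDE-estimate the density of the simulated summary."""
    rng = np.random.default_rng() if rng is None else rng
    summary = lambda x: x.mean()
    return {
        float(theta): gaussian_kde([summary(simulator(theta, rng)[0]) for _ in range(n_sims)])
        for theta in theta_grid
    }

theta_grid = np.linspace(-3.0, 3.0, 25)
likelihoods = fit_amortized_likelihood(theta_grid)   # expensive, but done only once

# Amortization: each new observation only evaluates the stored density estimates.
loglike_curve = {t: kde.logpdf(x_obs.mean())[0] for t, kde in likelihoods.items()}
```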
Frontiers
- Shortcomings of traditional approaches:
- sample efficiency;
- quality of inference;
- amortization;
- Main directions of progress:
- Machine Learning
- Neural Networks for the win;
- Replace hand-crafted feature extractors (summary statistics) with learned networks;
- Density estimation in high dimensions (see the sketch after this list);
- Normalizing flows;
- GANs;
- Active Learning
- Sample parameters $\theta$ that increase our knowledge the most;
- Automatic Differentiation
- Treat the simulation as a white box;
- Probabilistic programming;
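As a deliberately simplified illustration of the ML direction, here is a sketch of amortized neural posterior estimation that uses a conditional Gaussian in place of a normalizing flow; the toy simulator, architecture, prior, and training setup are all assumptions for illustration, not the paper's recipe.

```python
import torch
import torch.nn as nn

def simulate(theta):
    """Toy simulator (tensor version): theta -> latent z -> 5-dim data vector."""
    z = theta + torch.randn_like(theta)
    return z.unsqueeze(-1) + 0.1 * torch.randn(theta.shape[0], 5)

class PosteriorNet(nn.Module):
    """Maps an observation x to the mean and log-std of a Gaussian over theta."""
    def __init__(self, x_dim=5, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(x_dim, hidden), nn.ReLU(), nn.Linear(hidden, 2))

    def forward(self, x):
        mean, log_std = self.net(x).chunk(2, dim=-1)
        return mean.squeeze(-1), log_std.squeeze(-1)

net = PosteriorNet()
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for step in range(2_000):
    theta = 2.0 * torch.randn(256)            # 1. draw parameters from the prior
    x = simulate(theta)                       # 2. simulate data for each theta
    mean, log_std = net(x)                    # 3. predict q(theta | x)
    nll = (log_std + 0.5 * ((theta - mean) / log_std.exp()) ** 2).mean()
    opt.zero_grad()
    nll.backward()                            # 4. minimize the Gaussian negative log-likelihood
    opt.step()

# Amortized: a single forward pass gives an approximate posterior for any new observation.
with torch.no_grad():
    post_mean, post_log_std = net(simulate(torch.tensor([1.0])))
```

Swapping the Gaussian head for a conditional normalizing flow (or training a likelihood/ratio estimator instead) gives the flexible-density variants discussed in the survey; the network also doubles as the learned feature extractor.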
Using the simulator directly during inference (A-D on the figure)
- Run the simulator with the parameters that are expected to improve the knowledge of the posterior the most.
- Inference controls all steps of the program execution and can bias each draw of latents to improve the match with the observed data (toy illustration after this list).
- Probabilistic programming for the win;
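A toy numpy illustration of why controlling the latent draws helps; the Gaussian model and the hand-picked proposal are illustrative assumptions (in practice a probabilistic programming system learns such proposals). Both estimators target the same intractable likelihood, but the one whose latents are steered toward the observation needs far fewer simulator runs for the same accuracy.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
theta, x_obs_val, n = 0.0, 2.0, 1_000

# Toy model: z ~ N(theta, 1), x ~ N(z, 0.1);  p(x_obs | theta) = E_z[ N(x_obs; z, 0.1) ].

# (a) Uncontrolled: latents drawn blindly from p(z | theta).
z = rng.normal(theta, 1.0, n)
naive = norm.pdf(x_obs_val, loc=z, scale=0.1)

# (b) Controlled: latents biased toward the observation, reweighted by p(z | theta) / q(z).
z_q = rng.normal(x_obs_val, 0.3, n)
weights = norm.pdf(z_q, loc=theta, scale=1.0) / norm.pdf(z_q, loc=x_obs_val, scale=0.3)
controlled = weights * norm.pdf(x_obs_val, loc=z_q, scale=0.1)

# Same target, very different Monte Carlo error.
print(naive.mean(), naive.std() / np.sqrt(n))
print(controlled.mean(), controlled.std() / np.sqrt(n))
```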
Surrogate models (E-H on the figure)