What?
A position paper on using neural nets to extend existing algorithms to inputs previously considered inaccessible to them.
Why?
Algorithms are extremely effective but expect input prepared by humans in a specific form. Can we use neural networks to circumvent the need for highly pre-specified inputs?
How?
(Figure from the original paper.)
- Algorithms & Deep Learning
- Algorithms are amazingly generalizable
- However, the input should satisfy some conditions (pre-conditions)
- e.g. non-negative weights in Dijkstra's algorithm
- Algorithms: solve a specific problem with guarantees.
- DL: adapts to new problems (but with weak guarantees).
- A single network might learn multiple algorithms.
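As a toy illustration of such a pre-condition (a minimal sketch, not code from the paper), Dijkstra's algorithm can check non-negativity of the weights before running, since its correctness guarantee only holds under that assumption:

```python
import heapq

def dijkstra(adj, src):
    """Shortest-path distances from src; adj: {node: [(neighbor, weight), ...]}."""
    # Pre-condition: Dijkstra's guarantees only hold for non-negative weights.
    assert all(w >= 0 for nbrs in adj.values() for _, w in nbrs), \
        "pre-condition violated: negative edge weight"
    dist = {src: 0}
    pq = [(0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, w in adj.get(u, []):
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(pq, (d + w, v))
    return dist

adj = {"a": [("b", 1), ("c", 4)], "b": [("c", 2)], "c": []}
print(dijkstra(adj, "a"))  # {'a': 0, 'b': 1, 'c': 3}
```

With a negative weight the assertion fires immediately, making the pre-condition explicit rather than silently returning wrong distances.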
- Algorithms in the real world
- Stringent pre-conditions & post-conditions
- Performance guarantees
- Elegant and interpretable pseudocode
- Clear connections between seemingly different problems
- Algorithmic Bottleneck and Neural Algorithm Execution
- Features:
- Neural Networks should align with the target algorithm.
- Backprop all the way down (or back?).
- Operates over high-dimensional latent spaces
- In contrast, traditional algorithms operate on scalars, e.g. edge labels in Dijkstra's algorithm.
- High-dimensional latents help avoid this bottleneck.
- The blueprint:
- The idea is quite simple & elegant
- steps:
- Train three blocks (encoder $f$, processor (usually a GNN), and decoder $g$) on synthetic inputs (e.g. randomly generated MDPs). One can use supervision at this stage to bring the processor closer to the original algorithm (e.g. train on execution traces, using the algorithm as the target generator).
- Throw $f$ and $g$ away. Freeze the processor.
- Train $\tilde{f}$ and $\tilde{g}$, an encoder and a decoder that work on natural inputs (as opposed to the synthetic scalar inputs, e.g. path costs).
- So, we learn how to modulate & readout the results of a processor for the new data modalities.
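The three steps above can be sketched in a deliberately linear toy (everything here is invented for illustration: the "algorithm" is $\text{out}(x) = 2x$, the processor is a fixed random matrix standing in for a frozen GNN, and least-squares replaces gradient training):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4  # latent dimension

# Frozen "processor": one step in latent space (stands in for a GNN step).
P = rng.normal(size=(d, d))

# Stage 1: train on synthetic scalar inputs x; target algorithm out(x) = 2*x.
e = rng.normal(size=d)                # encoder f: x -> x * e
xs = rng.normal(size=200)
H = np.outer(xs, e) @ P.T             # latents after the processor, shape (200, d)
g, *_ = np.linalg.lstsq(H, 2 * xs, rcond=None)  # linear decoder g

# Stage 2: throw f away, freeze P (g is reused here for simplicity).
# Natural inputs are 2-D features u whose underlying scalar is x = u[0] + u[1].
U = rng.normal(size=(300, 2))
x_true = U.sum(axis=1)
# Fit the new encoder f~ so that f~(u) mimics the synthetic latents x * e
# (a simplification: the paper trains f~, g~ end-to-end through the frozen P).
A, *_ = np.linalg.lstsq(U, np.outer(x_true, e), rcond=None)

# Run the stitched pipeline g(P(f~(u))) on a fresh natural input.
u = np.array([0.3, 0.7])
out = (u @ A) @ P.T @ g
print(out)  # ≈ 2 * (0.3 + 0.7) = 2.0
```

The point of the sketch is the stitching itself: the processor trained on clean synthetic scalars is reused untouched, and only the interfaces to the new modality are relearned.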
And?
- I really like this paper; still, I believe it could have been even better:
- I buy the bottleneck argument, but it would be great if the authors made an attempt to formalise this a little bit. How can we falsify the bottleneck hypothesis?
- In XLVIN, the authors compare themselves to TreeQN, and TreeQN performs worse, but is that because of the bottleneck?
- I like how clear the stitching blueprint story is, but I believe algorithmic reasoning includes another research direction omitted here: heuristics.
- Heuristics effectively modulate the algorithm and can be learned from data. Hence, heuristics are another contact point between algorithms and machine learning models.
- There's a lot of work on learning heuristics and using them as a part of a well-established algorithm. [shameless self-promotion]
- Extending the blueprint: in addition to the encoder $f$ and decoder $g$, can we have another block $h$ (a heuristic) that feeds into the processor itself? What would this neural heuristic be?
- Thinking about pre-conditions and post-conditions is useful. However, I believe DL models also have pre-conditions that were not discussed in the paper. For instance, the feature dimensions must always be the same, and feature inputs are real-valued vectors. These are constraints on simply being able to run the model at all.
- Related to this, I think it's a useful intuition to treat natural inputs as observations, whereas a traditional algorithm expects a state. However, here the bottleneck argument kicks in again.
- This paper is an extension of Section 3.3 of this amazing survey. I highly recommend reading it.
- We can extend the application range of ML models using the proposed blueprint. What benchmarks would be good proxies for tracking progress in this area?
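To make the heuristic-block idea from the bullets above a bit more concrete, here is a purely hypothetical sketch (all weights and the gating form are my invention, not anything from the paper): $h$ produces a gate from the natural input that modulates the frozen processor's latent update.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4
W_f = rng.normal(size=(2, d))  # encoder f~ weights (hypothetical)
W_h = rng.normal(size=(2, d))  # heuristic block h weights (hypothetical)
P = rng.normal(size=(d, d))    # frozen processor, as in the blueprint
w_g = rng.normal(size=d)       # decoder g~ weights (hypothetical)

def step(u):
    z = np.tanh(u @ W_f)                  # encode the natural input into the latent space
    gate = 1 / (1 + np.exp(-(u @ W_h)))   # h: an input-dependent gate in [0, 1]^d
    z = gate * (z @ P.T)                  # the heuristic modulates the processor's update
    return z @ w_g                        # decode a scalar readout

print(step(np.array([0.5, -0.2])))
```

The design question is exactly the one raised above: unlike $f$ and $g$, which only translate at the boundary, $h$ reaches inside the processor, so it trades some of the frozen algorithm's guarantees for adaptivity.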
This note is a part of my paper notes series. You can find more here or on Twitter. I also have a blog.