What?

An attention-based model where an element of a set can be updated with a different set of weights.

Why?

There is declarative and procedural knowledge in the world. The paper gets inspired by cognitive science and wants to show that the declarative/procedural knowledge is a useful split for us.

How?

source: original paper. I was not brave enough to retype that.

source: original paper. I was not brave enough to retype that.

I believe, the figure is self explanatory, the authors did an amazing job in making the pseudocode crystal clear. Some comments:

And?