Hopfield Neural Network for Pattern Memory

Designed and programmed by Robert Goldstone (rgoldsto@iu.edu). Percepts and Concepts Laboratory, Cognitive Science Program & Department of Psychological and Brain Sciences, Indiana University.
This simulation implements a neural network based on John Hopfield's proposal, recognized with the 2024 Nobel Prize in Physics, for how patterns can be memorized and retrieved in a massively distributed neural network.
- Create patterns to train the neural network: paint your own patterns in the pattern boxes with the draw and erase modes, drag patterns from one location to another, or press "Draw Letters" to copy whatever letters are contained in the Letter boxes into the training pattern boxes.
- Memory: The neural network's memory consists of connection weights from every grid cell to every other grid cell. Since each grid has 21 × 21 (= 441) cells, there are 441² connection weights between cells (with each cell's self-connection fixed at 0). Weights determine whether one active cell will excite (turn on, weight = +1) or inhibit (weight = -1) the connected cell. When you click on a cell in the memory, you will see all weights to that cell from every other cell: black = -1, gray = 0, white = +1. Weights are learned through a variant of Hebbian learning: as the training patterns are presented to the network one at a time, the connection weight between two cells increases if they are in the same state (both white or both black) and decreases if they are in opposite states. Mathematically speaking:
Let \(s \in \{-1,+1\}^N\) be the currently presented training pattern (white \(=+1\), black \(=-1\)). Weights \(W_{ij}\) encode the connection from cell \(j\) to cell \(i\). For \(i\neq j\), one online update step is:
\[ W_{ij}^{(t+1)} \;=\; \mathrm{clip}\!\left( W_{ij}^{(t)} \;+\; \eta\,\big(s_i\,s_j \;-\; W_{ij}^{(t)}\big),\; -1,\; +1 \right), \qquad W_{ii}^{(t+1)} \;=\; 0. \]
- \(\eta\): learning rate (small, e.g., \( \eta \ll 1 \)).
- \(\mathrm{clip}(x,-1,+1)\): clamps values to \([-1,+1]\).
- Intuition: if \(s_i\) and \(s_j\) match, \(s_i s_j=+1\) and \(W_{ij}\) is pulled toward \(+1\); if they differ, \(s_i s_j=-1\) and \(W_{ij}\) is pulled toward \(-1\).
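The update rule above can be sketched in a few lines of NumPy. This is a minimal illustration, not the simulation's actual code; the learning rate `eta=0.05` is an assumed value, since the simulation does not state its setting.

```python
import numpy as np

def hebbian_step(W, s, eta=0.05):
    """One online Hebbian update for a training pattern s in {-1, +1}^N.

    W   : (N, N) weight matrix, entries kept in [-1, +1]
    s   : (N,) training pattern, +1 = white, -1 = black
    eta : learning rate (illustrative value, not the simulation's setting)
    """
    outer = np.outer(s, s)                    # s_i * s_j for every pair of cells
    W = np.clip(W + eta * (outer - W), -1.0, 1.0)
    np.fill_diagonal(W, 0.0)                  # no self-connections: W_ii = 0
    return W
```

Repeated presentations of the same pattern pull each off-diagonal weight toward \(s_i s_j\), which is why cells that always match end up with weight +1 and cells that always differ end up with weight -1.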
- Probe: Once the neural network has been trained, draw a probe pattern, or drag one of the training patterns into the probe box, and add noise to it if you want. Press "Process probe" to see how the memory (the learned connection weights) transforms the probe. During processing, the next time step's activation for cell \(i\) in the Result is its previous activation plus the sum, over every other cell \(j\), of cell \(j\)'s activation times the weight from \(j\) to \(i\):
\[
r_i^{(t+1)} \;=\; \mathrm{clip}\!\left(
r_i^{(t)} \;+\; \gamma \sum_{j \ne i} W_{ij}\, x_j^{(t)}
,\,-1,\,+1
\right)
\]
- \(r_i^{(t)}\): current activation of cell \(i\) in the Result grid at step \(t\)
- \(x_j^{(t)}\): driving activation of cell \(j\) (from the current Result)
- \(W_{ij}\): weight from cell \(j\) to cell \(i\)
- \(\gamma\): step size (small positive constant)
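One step of this probe dynamics can be sketched as follows. This is an illustrative NumPy implementation, not the simulation's source; it assumes the driving activations \(x_j\) are just the current Result activations (\(x = r\)), and the step size `gamma=0.1` is an assumed value.

```python
import numpy as np

def probe_step(r, W, gamma=0.1):
    """One synchronous update of the Result grid.

    r     : (N,) current Result activations, each in [-1, +1]
    W     : (N, N) learned weights, W[i, j] = weight from cell j to cell i
    gamma : step size (illustrative value, not the simulation's setting)
    """
    # Each cell moves by the weighted sum of the other cells' activations.
    # W has a zero diagonal, so the matrix product already excludes j == i.
    return np.clip(r + gamma * (W @ r), -1.0, 1.0)
```

Iterating `probe_step` drives a weak or noisy probe toward a stored pattern: cells whose weighted input agrees with the pattern are pushed to +1 or -1 and then held there by the clip.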
- Things to see and do:
- If you only train the memory on a single pattern that fills about half of the grid, then drawing a small part of the pattern as a probe will always converge to the full trained pattern. Drawing a part of the background that is not part of the pattern will produce a negative image of the trained pattern. Why?
- After the neural network has been trained on the letters A, X, H, O, and V, notice that if two cells are always in the same state (both black or both white) in every letter then the connection weight between them will be +1 (white). If two cells are always in opposite states, the connection weight between them will be -1 (black). Why? Generally, would you say that different regions of the memory are dedicated to different letters (a "localist" representation) or all regions of the memory are involved in remembering all letters (a "distributed representation")?
- After training on A, X, H, O, and V, how good a job does the network do at recovering the letters when probed with noisy, warped, and simplified versions of the letters? If the memory for the patterns is completely distributed, how does the network end up cleanly recovering perfect copies of the trained patterns? Does it do equally well when you train on different sets of letters?
- If you train the network on "O", "C", and "D" (more similar patterns than A, X, H, O, and V), what happens when you process "C" or "D"? How and why does the network mis-perceive these letters? In what ways is this a good/bad model of human perception/memory? How might you change the learning or probing process to avoid these "hallucinations"?
- After you train the memory on some patterns, stop all training, and then modify the memory by drawing and erasing weights between cells, does that have much effect on the memory's ability to process probes? Why? What does this suggest about how resilient neural networks are to random perturbations?
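Putting the learning and probe rules together gives a small end-to-end sketch of the behavior explored above: train on a few patterns, corrupt one, and watch the dynamics restore it. All constants here (number of patterns, noise level, learning rate, step size, iteration counts) are illustrative assumptions, and random ±1 patterns stand in for the drawn letters.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 441                                    # 21 x 21 grid, flattened

# Three random "letters" to memorize (stand-ins for drawn patterns).
patterns = rng.choice([-1.0, 1.0], size=(3, N))

# Train: repeated Hebbian presentations (eta and epoch count are assumed).
W = np.zeros((N, N))
for _ in range(100):
    for s in patterns:
        W = np.clip(W + 0.05 * (np.outer(s, s) - W), -1.0, 1.0)
        np.fill_diagonal(W, 0.0)           # keep self-connections at 0

# Probe: flip 15% of the cells of pattern 0, then iterate the update rule.
probe = patterns[0].copy()
flip = rng.choice(N, size=int(0.15 * N), replace=False)
probe[flip] *= -1.0
r = probe
for _ in range(100):
    r = np.clip(r + 0.1 * (W @ r), -1.0, 1.0)

# Fraction of cells that match the originally stored pattern.
recovery = float(np.mean(r == patterns[0]))
print(recovery)
```

With only three patterns stored across 441 cells, the cross-talk between memories is small, so the corrupted probe is cleaned up to the stored pattern; storing many similar patterns (like "O", "C", and "D") makes the cross-talk large and produces the mis-perceptions discussed above.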