A Three-Phase Factual Recall Circuit in Gemma-2B and Gemma-12B-IT
This is an empirical mechanistic interpretability study of factual recall in Gemma-2B and Gemma-12B-IT using activation patching. The research identifies a consistent three-phase circuit. Phase 1 (Storage) encodes facts as directions in the residual stream at the entity token position in early-to-mid layers; patching the residual stream there has 40× the causal effect of patching attention outputs. Phase 2 (Routing) moves the signal to the final token position through distributed attention heads, with no single dominant head. Phase 3 (Readout) retrieves the answer in late layers without additional computation. The pattern replicates proportionally at the 12B scale, with routing becoming even more distributed. The study also highlights tokenizer-induced dataset drift as a methodological concern for cross-model comparisons, and proposes path patching and SAE analysis as natural next steps.
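To make the activation-patching measurement concrete, here is a minimal sketch on a hand-built toy model, not the study's code and not a Gemma model. It assumes a scalar residual stream per token position, three hard-coded layers that mimic storage, routing, and readout, and a normalized recovery metric, (patched − corrupted) / (clean − corrupted), which is a common choice but is an assumption here since the summary does not name the metric. All names (`run_toy_model`, `patch_effect`, `ENTITY_POS`, `FINAL_POS`) are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

Resid = List[float]                      # toy residual stream: one scalar per position
Hook = Callable[[int, Resid], Resid]     # (layer index, residual) -> edited residual

ENTITY_POS, FINAL_POS = 0, 2             # hypothetical token positions
N_LAYERS = 3


def run_toy_model(tokens: Resid, hook: Optional[Hook] = None) -> Tuple[float, Dict[int, Resid]]:
    """Run the toy model; return (answer logit difference, residual cache per layer)."""
    resid = list(tokens)
    cache: Dict[int, Resid] = {}
    for layer in range(N_LAYERS):
        if layer == 0:
            # "Storage": the fact is written at the entity position.
            resid[ENTITY_POS] = 2.0 * resid[ENTITY_POS]
        elif layer == 1:
            # "Routing": the entity signal is copied to the final position.
            resid[FINAL_POS] = resid[FINAL_POS] + resid[ENTITY_POS]
        # Layer 2 is "readout": no further computation in this toy.
        if hook is not None:
            resid = hook(layer, list(resid))
        cache[layer] = list(resid)
    return resid[FINAL_POS], cache


def patch_effect(clean: Resid, corrupt: Resid, layer: int, pos: int) -> float:
    """Fraction of the clean-vs-corrupted gap recovered by patching one (layer, pos)."""
    clean_out, clean_cache = run_toy_model(clean)
    corrupt_out, _ = run_toy_model(corrupt)

    def hook(current_layer: int, resid: Resid) -> Resid:
        # Overwrite a single residual entry with its value from the clean run.
        if current_layer == layer:
            resid[pos] = clean_cache[current_layer][pos]
        return resid

    patched_out, _ = run_toy_model(corrupt, hook)
    return (patched_out - corrupt_out) / (clean_out - corrupt_out)


CLEAN = [1.0, 0.0, 0.0]      # prompt with the correct entity
CORRUPT = [-1.0, 0.0, 0.0]   # prompt with a swapped entity

# Rows are layers, columns are positions.
grid = [[patch_effect(CLEAN, CORRUPT, layer, pos) for pos in range(3)]
        for layer in range(N_LAYERS)]
```

In this toy, `grid` shows full recovery at the entity position in layer 0 and at the final position in layers 1 and 2, which is the qualitative storage-then-routing-then-readout signature described above. A real experiment would replace `run_toy_model` with hooked forward passes over the actual model and average the metric over many prompts.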