Anthropic’s “AI Microscope” Explores the Inner Workings of Large Language Models

Two recent papers from Anthropic attempt to shed light on the processes that take place within a large language model, exploring how to locate interpretable concepts and link them to the computational “circuits” that translate them into language, and how to characterize crucial behaviors of Claude Haiku 3.5, including hallucinations, planning, and other key traits. … Read more