Why do LLMs make stuff up? New research peers under the hood.

Fine-tuning helps mitigate this problem, guiding the model to act as a helpful assistant and to refuse to complete a prompt when its related training data is sparse. That fine-tuning process creates distinct sets of artificial neurons that researchers can see activating when Claude encounters the name of a “known entity” (e.g., “Michael Jordan”) or an “unfamiliar name” (e.g., “Michael Batkin”) in a prompt.



A simplified graph showing how various features and circuits interact in prompts about sports stars, real and fake.
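The described behavior can be sketched as a toy circuit: a "can't answer" feature that is on by default and is inhibited when a "known entity" feature fires. This is a conceptual illustration only, not Anthropic's actual model internals; the entity set, thresholds, and function names below are all hypothetical.

```python
# Toy sketch (hypothetical, not the real circuit): a default-on
# "refusal" feature that a "known entity" feature inhibits.

KNOWN_ENTITIES = {"Michael Jordan"}  # stand-in for names well covered in training data


def known_entity_activation(name: str) -> float:
    """Fires (1.0) when the name matches a well-covered entity, else 0.0."""
    return 1.0 if name in KNOWN_ENTITIES else 0.0


def decide(name: str) -> str:
    """Refuse by default; a recognized entity suppresses the refusal feature."""
    refusal = 1.0  # "can't answer" feature is active by default
    refusal -= known_entity_activation(name)  # known entity inhibits refusal
    return "refuse" if refusal > 0.5 else "answer"


print(decide("Michael Jordan"))  # known entity suppresses refusal
print(decide("Michael Batkin"))  # unfamiliar name leaves refusal active
```

In this toy framing, hallucination corresponds to the inhibition firing for a name the model doesn't actually have reliable data about.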
