Mapping the Mind of a Large Language Model \ Anthropic

We were able to measure a kind of “distance” between features based on which neurons appeared in their activation patterns. This allowed us to look for features that are “close” to each other. Looking near a “Golden Gate Bridge” feature, we found features for Alcatraz Island, Ghirardelli Square, the Golden State Warriors, California Governor Gavin Newsom, the 1906 earthquake, and the San Francisco-set Alfred Hitchcock film Vertigo.
This holds at a higher level of conceptual abstraction: looking near a feature related to the concept of “inner conflict”, we find features related to relationship breakups, conflicting allegiances, logical inconsistencies, as well as the phrase “catch-22”. This shows that the internal organization of concepts in the AI model corresponds, at least somewhat, to our human notions of similarity.
— Read on www.anthropic.com/research/mapping-mind-language-model
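The "distance" idea in the excerpt can be illustrated with a small nearest-neighbour sketch. This assumes each feature is represented by a vector of weights over the same set of neurons and that closeness is measured with cosine similarity. The excerpt does not say which metric was used, so cosine similarity is an assumption here. The feature names, the four-neuron vectors, and the helpers `cosine_similarity` and `nearest_features` are all hypothetical toy values chosen for illustration. They are not data from the model.

```python
import math

# Hypothetical toy data: each feature is a vector of weights over the same
# four neurons. Real models have far more neurons and features; these
# numbers are invented purely to illustrate the idea.
FEATURES = {
    "golden_gate_bridge":   [0.9, 0.8, 0.1, 0.0],
    "alcatraz_island":      [0.8, 0.7, 0.2, 0.1],
    "vertigo_film":         [0.6, 0.9, 0.0, 0.3],
    "inner_conflict":       [0.0, 0.1, 0.9, 0.8],
    "relationship_breakup": [0.0, 0.2, 0.7, 0.6],
    "catch_22":             [0.1, 0.0, 0.8, 0.9],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_features(name, features, k=2):
    """Return the k features most similar to `name`, excluding itself."""
    query = features[name]
    scored = [
        (other, cosine_similarity(query, vec))
        for other, vec in features.items()
        if other != name
    ]
    # Highest similarity first, i.e. the "closest" features.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [other for other, _ in scored[:k]]


print(nearest_features("golden_gate_bridge", FEATURES))
# ['alcatraz_island', 'vertigo_film']
print(nearest_features("inner_conflict", FEATURES))
# ['relationship_breakup', 'catch_22']
```

In this toy setup, features that share active neurons end up close together, so the bridge's neighbours are the other San Francisco features and "inner conflict" sits near "breakup" and "catch-22". Only the shape of the computation is meant to carry over.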