Constellations

LLM interpretability

Apr 30, 2024, 1 min read

Towards Monosemanticity: Decomposing Language Models With Dictionary Learning

https://transformer-circuits.pub/2023/monosemantic-features/index.html

Recommended to me twice, by Zac and by Spencer.
How does a paper like this relate to the internal structures revealed by a behavioral/interactional analysis?

Reading notes

go here

Wolfram on ChatGPT: https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-doing-and-why-does-it-work/


