Towards Monosemanticity: Decomposing Language Models With Dictionary Learning
https://transformer-circuits.pub/2023/monosemantic-features/index.html
- recommended to me twice: by Zac & Spencer
- how to relate a paper like this to the internal structures revealed by a behavioral/interactional analysis?
Reading notes
go here
Wolfram on ChatGPT https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-doing-and-why-does-it-work/