Researchers have released two mechanistic interpretability libraries. Prisma, incubated at @tyrell_turing's lab in collaboration with Neel Nanda, targets multimodal models such as CLIP and ViTs, extending techniques that have so far focused on language models. Pyvene, a separate library, supports performing interventions on the internal states of neural models and makes it easy to add support for new architectures. Together, the announcements below highlight that mechanistic interpretability is expanding beyond language models, with vision-language interpretability work expected to build on both tools.
Pyvene is awesome! Expect to see some cool vision-language interpretability work leveraging it soon 😉 https://t.co/Y4j7ndUaZW
Great work from @soniajoseph_! Frontier models are multimodal, and it's increasingly clear that mechanistic interpretability can't only study language models. Good tooling is unglamorous to work on, but essential for good research. I'm excited to see what work Prisma enables! https://t.co/bhpw1AuDao
Pyvene has been really useful for easily running intervention experiments in my workflow! In particular, it's super easy to add support for new architectures compared to other interp libraries. Come try it out! https://t.co/Y5ac1vJMPB https://t.co/fXvI5Uh84h
New paper and library! 🫡 Intervening on internal states has emerged as a fundamental operation for analyzing and improving neural models. We release pyvene, a library for performing interventions and sharing intervened models. 👉Code & Paper: https://t.co/wV5L9NExft https://t.co/hq8RSfWLwE
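The core operation the pyvene announcement describes — overwriting an internal state mid-forward-pass and observing the effect on the output — can be sketched without the library itself. The toy two-layer network, weights, and `patch_hidden` argument below are all hypothetical illustrations, not pyvene's actual API:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-layer network: hidden = relu(x @ W1); out = hidden @ W2
W1 = rng.normal(size=(4, 8))
W2 = rng.normal(size=(8, 3))

def forward(x, patch_hidden=None):
    """Run the toy model, optionally overwriting the hidden state
    with a cached activation -- the basic 'intervention' operation."""
    hidden = np.maximum(x @ W1, 0.0)
    if patch_hidden is not None:
        hidden = patch_hidden  # intervene: swap in another run's state
    return hidden, hidden @ W2

x_clean = rng.normal(size=4)
x_corrupt = rng.normal(size=4)

clean_hidden, clean_out = forward(x_clean)
_, corrupt_out = forward(x_corrupt)

# Patch the clean run's hidden state into the corrupted run:
_, patched_out = forward(x_corrupt, patch_hidden=clean_hidden)

# Because the output depends only on the hidden state, fully replacing
# it makes the patched output match the clean run exactly.
assert np.allclose(patched_out, clean_out)
```

Libraries like pyvene generalize this idea: they locate the right internal site in a real transformer and handle caching and swapping activations across runs, so experiments like this scale beyond toy models.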
I'm excited to release Prisma, a mechanistic interpretability library for multimodal models like CLIP and ViTs. Incubated at @tyrell_turing's lab & in collab with @NeelNanda5. Recent mech interp work has focused on language, but many techniques transfer. Behold, the dogit lens: https://t.co/gs2wCFIGAa
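The "dogit lens" mentioned above is a logit-lens-style analysis applied to vision models: project intermediate activations through the model's output projection to see what it "believes" at each depth. As a hedged, self-contained sketch with toy random weights (not Prisma's API; in a CLIP/ViT setting the projection would map into the shared image-text embedding space):

```python
import numpy as np

rng = np.random.default_rng(1)
d_model, n_classes, n_layers = 16, 5, 3

# Hypothetical unembedding / output projection matrix.
W_U = rng.normal(size=(d_model, n_classes))

# Stand-in residual-stream states after each layer for one patch token.
residuals = [rng.normal(size=d_model) for _ in range(n_layers)]

def logit_lens(residual, W_U):
    """Project an intermediate residual state through the output
    projection to read off the model's current best guess."""
    return residual @ W_U

# One predicted class per layer, showing how the prediction evolves
# with depth -- the core of a logit-lens analysis.
per_layer_preds = [int(np.argmax(logit_lens(r, W_U))) for r in residuals]
print(per_layer_preds)
```

With random weights the per-layer predictions are meaningless; the point is the mechanic of reading out intermediate layers, which transfers from language models to ViTs largely unchanged.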