Interpreting the Second-Order Effects of Neurons in CLIP
CoRR (2024)
Abstract
We interpret the function of individual neurons in CLIP by automatically
describing them using text. Analyzing the direct effects (i.e. the flow from a
neuron through the residual stream to the output) or the indirect effects
(overall contribution) fails to capture the neurons' function in CLIP.
Therefore, we present the "second-order lens", analyzing the effect flowing
from a neuron through the later attention heads, directly to the output. We
find that these effects are highly selective: for each neuron, the effect is
significant for <2% of the images. Moreover, each effect can be approximated by
a single direction in the text-image space of CLIP. We describe neurons by
decomposing these directions into sparse sets of text representations. The sets
reveal polysemantic behavior - each neuron corresponds to multiple, often
unrelated, concepts (e.g. ships and cars). Exploiting this neuron polysemy, we
mass-produce "semantic" adversarial examples by generating images with concepts
spuriously correlated to the incorrect class. Additionally, we use the
second-order effects for zero-shot segmentation and attribute discovery in
images. Our results indicate that a scalable understanding of neurons can be
used for model deception and for introducing new model capabilities.
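The decomposition of a neuron's direction into a sparse set of text representations can be sketched with greedy matching pursuit over a dictionary of text embeddings. This is a hypothetical illustration of sparse decomposition in general, not the authors' exact procedure; `D` and `v` below are toy stand-ins for CLIP text embeddings and a neuron's direction:

```python
import numpy as np

def sparse_decompose(direction, dictionary, k=3):
    # Greedy matching pursuit: repeatedly pick the dictionary row most
    # aligned with the residual and subtract its contribution.
    residual = np.asarray(direction, dtype=float).copy()
    indices, coeffs = [], []
    for _ in range(k):
        scores = dictionary @ residual
        i = int(np.argmax(np.abs(scores)))
        indices.append(i)
        coeffs.append(float(scores[i]))
        residual -= scores[i] * dictionary[i]
    return indices, coeffs

# Toy "text embedding" dictionary: four orthonormal rows plus one mixed
# unit-norm row, and a synthetic "neuron direction" built from two of them.
D = np.vstack([np.eye(4), np.full(4, 0.5)])
v = 2.0 * D[1] + 0.5 * D[3]
idxs, cs = sparse_decompose(v, D, k=2)
print(idxs, cs)  # [1, 3] [2.0, 0.5]
```

The recovered indices name the (here synthetic) text entries whose combination approximates the direction, mirroring how a sparse set of concepts can describe one polysemantic neuron.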