Interpretability in Action: Exploratory Analysis of VPT, a Minecraft Agent

arXiv:2407.12161v1 Announce Type: new
Abstract: Understanding the mechanisms behind decisions taken by large foundation models in sequential decision-making tasks is critical to ensuring that such systems operate transparently and safely. In this work, we perform exploratory analysis on the Video PreTraining (VPT) Minecraft-playing agent, one of the largest open-source vision-based agents. We aim to illuminate its reasoning mechanisms by applying various interpretability techniques. First, we analyze the attention mechanism while the agent solves its training task: crafting a diamond pickaxe. The agent pays attention to the last four frames and to several key-frames further back in its six-second memory. This is a possible mechanism for maintaining coherence in a task that takes 3-10 minutes, despite the short memory span. Second, we perform various interventions, which help us uncover a worrying case of goal misgeneralization: VPT mistakes a villager wearing brown clothes for a tree trunk when the villager stands still under green tree leaves, and punches it to death.
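
The attention finding above (most weight on the last four frames, plus a few key-frames further back) can be illustrated with a minimal sketch. This is not the authors' analysis code and does not use the VPT codebase. The function name `summarize_frame_attention`, the input layout (one row of attention weights per head, ordered oldest frame to newest), and the default values of `recent` and `top_k` are all assumptions made for illustration.

```python
def summarize_frame_attention(attn, recent=4, top_k=3):
    """Summarize how attention is spread over frames held in memory.

    attn: list of rows, one per attention head (hypothetical layout).
          Each row holds non-negative weights over the frames in memory,
          ordered from oldest (index 0) to most recent (last index).
    Returns (recent_mass, key_frames):
      recent_mass - share of head-averaged attention on the last `recent` frames
      key_frames  - indices of the `top_k` most-attended older frames
    """
    n_heads = len(attn)
    memory_len = len(attn[0])
    if any(len(row) != memory_len for row in attn):
        raise ValueError("every head must cover the same number of frames")

    # Average over heads, then renormalize so the weights sum to 1.
    per_frame = [sum(row[t] for row in attn) / n_heads for t in range(memory_len)]
    total = sum(per_frame)
    per_frame = [w / total for w in per_frame]

    # Split memory into "older" frames and the most recent ones.
    split = max(memory_len - recent, 0)
    recent_mass = sum(per_frame[split:])

    # Candidate key-frames: the older frames with the largest weights.
    ranked = sorted(range(split), key=lambda t: per_frame[t], reverse=True)
    key_frames = sorted(ranked[:top_k])
    return recent_mass, key_frames


# Synthetic example (made-up numbers, not data from the paper):
# two identical heads over a 10-frame memory, with one older frame standing out.
weights = [0.02, 0.02, 0.10, 0.02, 0.02, 0.02, 0.20, 0.20, 0.20, 0.20]
recent_mass, key_frames = summarize_frame_attention([weights, weights], top_k=1)
```

On real data, the rows would be the attention weights extracted from the agent for a single time step. Averaging over heads is one simple choice; inspecting heads individually may reveal patterns that the average hides.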
