Interpretability in Action: Exploratory Analysis of VPT, a Minecraft Agent

Date:

arXiv:2407.12161v1 Announce Type: new
Abstract: Understanding the mechanisms behind decisions taken by large foundation models in sequential decision making tasks is critical to ensuring that such systems operate transparently and safely. In this work, we perform exploratory analysis on the Video PreTraining (VPT) Minecraft playing agent, one of the largest open-source vision-based agents. We aim to illuminate its reasoning mechanisms by applying various interpretability techniques. First, we analyze the attention mechanism while the agent solves its training task – crafting a diamond pickaxe. The agent pays attention to the last four frames and several key-frames further back in its six-second memory. This is a possible mechanism for maintaining coherence in a task that takes 3-10 minutes, despite the short memory span. Secondly, we perform various interventions, which help us uncover a worrying case of goal misgeneralization: VPT mistakenly identifies a villager wearing brown clothes as a tree trunk when the villager is positioned stationary under green tree leaves, and punches it to death.

Share post:

Subscribe

Popular

More like this
Related

12월23일 정부지원사업 신규 공고 리스트 (12건) _ (파일 재가공/재배포 가능)

12월 23일 12건<12/23지원사업 신규 공고 목록> *전 영업일인 12/20에 올라온...

Waste Robotics와 Greyparrot가 분류 로봇을 강화하는 방법

Waste Robotics는 FANUC 로봇 팔을 사용하여 안정적이고 정확한 피킹을...

2024년 상위 10가지 생물의학 이야기

2024년에는 생체 의학 기술이 실제로 우리 머리, 더 구체적으로...

Sora AI 리뷰: AI가 영상 제작자를 영원히 대체할 수 있을까요?

말로만 고품질 비디오를 만들고 싶었던 적이 있습니까?2024년 2월 OpenAI...