Interpretability in Action: Exploratory Analysis of VPT, a Minecraft Agent

arXiv:2407.12161v1 Announce Type: new
Abstract: Understanding the mechanisms behind decisions taken by large foundation models in sequential decision-making tasks is critical to ensuring that such systems operate transparently and safely. In this work, we perform exploratory analysis on the Video PreTraining (VPT) Minecraft-playing agent, one of the largest open-source vision-based agents. We aim to illuminate its reasoning mechanisms by applying various interpretability techniques. First, we analyze the attention mechanism while the agent solves its training task, crafting a diamond pickaxe. The agent pays attention to the last four frames and to several key-frames further back in its six-second memory. This is a possible mechanism for maintaining coherence in a task that takes 3-10 minutes, despite the short memory span. Second, we perform various interventions, which help us uncover a worrying case of goal misgeneralization: VPT mistakenly identifies a villager wearing brown clothes as a tree trunk when the villager is standing still under green tree leaves, and punches it to death.
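To make the attention finding concrete, here is a minimal sketch of how attention over a frame memory could be summarized. It is an illustration only, not the authors' analysis code: the function name `summarize_frame_attention`, the `keyframe_threshold` value, and the toy attention weights are all assumptions made for this example, and the memory is shortened to ten frames.

```python
import numpy as np


def summarize_frame_attention(attn, recent=4, keyframe_threshold=0.05):
    """Summarize where the current frame attends within the frame memory.

    attn: array of shape (heads, memory_len). Each row holds one head's
          attention weights from the current frame over the frames in
          memory, ordered oldest first, and sums to 1.
    recent: how many of the newest frames count as "recent".
    keyframe_threshold: hypothetical cutoff above which an older frame is
          treated as a key-frame.
    Returns (recent_mass, keyframe_indices).
    """
    attn = np.asarray(attn, dtype=float)
    if attn.ndim != 2:
        raise ValueError("attn must have shape (heads, memory_len)")
    if not 1 <= recent <= attn.shape[1]:
        raise ValueError("recent must be between 1 and memory_len")

    # Average over heads to get one weight per remembered frame.
    per_frame = attn.mean(axis=0)

    # Total attention placed on the newest `recent` frames.
    recent_mass = float(per_frame[-recent:].sum())

    # Older frames that still receive noticeable attention.
    older = per_frame[:-recent]
    keyframes = [int(i) for i in np.flatnonzero(older >= keyframe_threshold)]
    return recent_mass, keyframes


# Toy weights (made up): two heads over a ten-frame memory.
toy = np.array([
    [0.0, 0.0, 0.2, 0.0, 0.0, 0.0, 0.2, 0.2, 0.2, 0.2],
    [0.0, 0.0, 0.1, 0.0, 0.0, 0.0, 0.1, 0.2, 0.3, 0.3],
])
recent_mass, keyframes = summarize_frame_attention(toy)
print(f"attention on last 4 frames: {recent_mass:.2f}")
print(f"key-frame indices further back: {keyframes}")
```

A pattern like the one the abstract describes would show up here as a large `recent_mass` together with a short list of older key-frame indices.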
