Interpretability in Action: Exploratory Analysis of VPT, a Minecraft Agent

arXiv:2407.12161v1
Abstract: Understanding the mechanisms behind the decisions that large foundation models make in sequential decision-making tasks is critical to ensuring that such systems operate transparently and safely. In this work, we perform an exploratory analysis of the Video PreTraining (VPT) Minecraft-playing agent, one of the largest open-source vision-based agents. We aim to illuminate its reasoning mechanisms by applying various interpretability techniques. First, we analyze the attention mechanism while the agent solves its training task, crafting a diamond pickaxe. The agent attends to the last four frames and to several key frames further back in its six-second memory. This is a possible mechanism for maintaining coherence in a task that takes 3-10 minutes, despite the short memory span. Second, we perform various interventions, which help us uncover a worrying case of goal misgeneralization: VPT mistakes a villager wearing brown clothes for a tree trunk when the villager stands still under green tree leaves, and punches the villager to death.
