
Interpretability in Action: Exploratory Analysis of VPT, a Minecraft Agent

July 19, 2024



arXiv:2407.12161v1 Announce Type: new
Abstract: Understanding the mechanisms behind decisions taken by large foundation models in sequential decision-making tasks is critical to ensuring that such systems operate transparently and safely. In this work, we perform an exploratory analysis of the Video PreTraining (VPT) Minecraft-playing agent, one of the largest open-source vision-based agents. We aim to illuminate its reasoning mechanisms by applying various interpretability techniques. First, we analyze the attention mechanism while the agent solves its training task: crafting a diamond pickaxe. The agent pays attention to the last four frames and to several key-frames further back in its six-second memory. This is a possible mechanism for maintaining coherence in a task that takes 3 to 10 minutes, despite the short memory span. Second, we perform various interventions, which help us uncover a worrying case of goal misgeneralization: VPT mistakenly identifies a villager wearing brown clothes as a tree trunk when the villager stands stationary under green tree leaves, and punches it to death.
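To make the attention finding more concrete, here is a minimal sketch of one way to summarize a single query's attention over a memory of past frames: how much mass falls on the most recent few frames, and which older frames receive the most attention (candidate "key-frames"). This is not the authors' method. The function `summarize_attention`, its parameters, the 128-frame memory length, and the synthetic weights are all hypothetical, chosen only for illustration.

```python
from typing import List, Sequence, Tuple


def summarize_attention(
    weights: Sequence[float],
    recent_window: int = 4,
    top_k: int = 3,
) -> Tuple[float, List[int]]:
    """Split one query's attention over past frames into recent vs. older frames.

    Hypothetical helper: weights[i] is the attention paid to the frame
    i steps in the past (index 0 = the current frame). Returns the share of
    attention on the `recent_window` most recent frames, plus the offsets of
    the `top_k` most-attended older frames (key-frame candidates).
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("attention weights must have positive total mass")
    # Normalize so the recent-frame share is a fraction of all attention.
    normalized = [w / total for w in weights]
    recent_mass = sum(normalized[:recent_window])
    # Rank only frames outside the recent window by attention received.
    older = range(recent_window, len(normalized))
    key_frames = sorted(older, key=lambda i: normalized[i], reverse=True)[:top_k]
    return recent_mass, sorted(key_frames)


# Synthetic example (assumed 128-frame memory, made-up weights).
weights = [0.0] * 128
for i in range(4):
    weights[i] = 0.15  # strong attention on the four most recent frames
weights[40] = 0.2      # a few older frames also stand out
weights[90] = 0.1
weights[100] = 0.1

recent, keys = summarize_attention(weights)
print(round(recent, 2), keys)  # 0.6 [40, 90, 100]
```

In a real analysis the weights would come from the agent's attention layers and would typically be averaged over heads and time steps before a summary like this is computed; how to aggregate them is a design choice this sketch leaves open.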
