Autonomous vehicle (AV) research is undergoing a rapid shift, reshaped by the emergence of reasoning-based vision–language–action (VLA) models that bring human-like reasoning to AV decision-making.
These models can be viewed as implicit world models operating in a semantic space, allowing AVs to solve complex problems step-by-step and to generate reasoning traces that mirror human thought processes.
This shift extends beyond the models themselves: traditional open-loop evaluation is no longer sufficient to rigorously assess such models, and new evaluation tools are required.
Recently, NVIDIA introduced Alpamayo, an ecosystem of models, simulation tools, and datasets that enables the development of reasoning-based AV architectures.
Our goal is to provide researchers and developers with a flexible, fast, and scalable platform for evaluating, and ultimately training, modern reasoning-based AV architectures in realistic closed-loop settings.
In this blog, we introduce Alpamayo and how to get up and running with reasoning-based AV development:

- Part 1: Introducing NVIDIA Alpamayo 1, an open 10B-parameter reasoning VLA model, and showing how to use it to generate trajectory predictions and inspect the corresponding reasoning traces.
- Part 2: Introducing the Physical AI dataset, one of the largest and most geographically diverse open AV datasets available, which enables training and evaluating these models.
- Part 3: Introducing NVIDIA AlpaSim, an open-source simulation tool designed for evaluating end-to-end models.
- Part 4: Bringing the ecosystem together to drive Alpamayo 1 closed-loop on reconstructed data within AlpaSim.
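Before diving in, it may help to fix intuitions about what a reasoning VLA emits: a free-text reasoning trace paired with a short horizon of trajectory waypoints. The toy sketch below illustrates only that output *shape*; every name in it (`Waypoint`, `VLAPlan`, the example trace) is hypothetical and is not the Alpamayo 1 interface.

```python
from dataclasses import dataclass, field

@dataclass
class Waypoint:
    """A single future pose in the ego vehicle's frame (meters, radians)."""
    x: float
    y: float
    heading: float

@dataclass
class VLAPlan:
    """Toy container for a reasoning-VLA output: a chain-of-thought-style
    trace plus the trajectory it justifies. Illustrative only; not the
    Alpamayo 1 API."""
    reasoning_trace: str
    trajectory: list[Waypoint] = field(default_factory=list)

    def horizon_length(self) -> float:
        """Straight-line distance covered by the planned waypoints."""
        dist = 0.0
        for a, b in zip(self.trajectory, self.trajectory[1:]):
            dist += ((b.x - a.x) ** 2 + (b.y - a.y) ** 2) ** 0.5
        return dist

plan = VLAPlan(
    reasoning_trace=(
        "Pedestrian at the crosswalk ahead; yield, then proceed straight."
    ),
    trajectory=[Waypoint(0.0, 0.0, 0.0), Waypoint(2.0, 0.0, 0.0),
                Waypoint(5.0, 0.0, 0.0)],
)
print(plan.horizon_length())  # 5.0
```

Pairing the trace with the trajectory is what makes these models reviewable: a human can check whether the stated reasoning actually supports the planned motion.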
These three key components provide the essential pieces needed to start building reasoning-based VLA models: a base model, large-scale data for training, and a simulator for testing and evaluation.
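To make the open-loop versus closed-loop distinction concrete, here is a deliberately tiny 1-D sketch (not AlpaSim; all logic is illustrative). Open-loop scoring restarts each step from the logged state, so errors are measured in isolation; a closed-loop rollout feeds the policy's own actions back into the simulated state, so early mistakes shift later inputs and the drift compounds.

```python
# Toy 1-D driving world: the state is a scalar position along a lane.
# The "policy" has a position-dependent bias, so its error grows with
# distance traveled. Illustrative only; this is not the AlpaSim API.

def policy(state: float) -> float:
    """Predict the next position: aims for +1.0 m/step, with a small
    bias that grows the farther we are from the lane start."""
    return state + 1.0 + 0.1 * state

def open_loop_errors(log: list[float]) -> list[float]:
    """Open-loop: every step restarts from the *logged* state, so each
    error is scored in isolation and never compounds."""
    return [abs(policy(s) - nxt) for s, nxt in zip(log, log[1:])]

def closed_loop_errors(log: list[float]) -> list[float]:
    """Closed-loop: the policy acts on its *own* previous output, so
    early mistakes shift later states and the error compounds."""
    errors, state = [], log[0]
    for target in log[1:]:
        state = policy(state)          # feed the action back in
        errors.append(abs(state - target))
    return errors

log = [0.0, 1.0, 2.0, 3.0, 4.0]        # logged ground truth, 1 m/step
print(max(open_loop_errors(log)))      # worst single-step error: ~0.3
print(max(closed_loop_errors(log)))    # compounded drift: ~0.64
```

The same policy looks twice as bad under closed-loop rollout as under open-loop scoring, which is exactly why open-loop evaluation alone is no longer sufficient for these models.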