Autonomous vehicle (AV) research is undergoing a rapid shift, reshaped by the emergence of reasoning-based vision–language–action (VLA) models that bring human-like reasoning to AV decision-making.
These models can be viewed as implicit world models operating in a semantic space, allowing AVs to solve complex problems step-by-step and to generate reasoning traces that mirror human thought processes.
This shift extends beyond the models themselves: traditional open-loop evaluation is no longer sufficient to rigorously assess such models, and new evaluation tools are required.
Recently, NVIDIA introduced Alpamayo, an ecosystem of models, simulation tools, and datasets that enables the development of reasoning-based AV architectures.
Our goal is to provide researchers and developers with a flexible, fast, and scalable platform for evaluating, and ultimately training, modern reasoning-based AV architectures in realistic closed-loop settings.
In this blog, we introduce Alpamayo and how to get up and running with reasoning-based AV development:

- Part 1: Introducing NVIDIA Alpamayo 1, an open 10B-parameter reasoning VLA model, and showing how to use it to generate trajectory predictions and inspect the corresponding reasoning traces.
- Part 2: Introducing the Physical AI dataset, one of the largest and most geographically diverse open AV datasets available, which enables training and evaluating these models.
- Part 3: Introducing NVIDIA AlpaSim, an open-source simulation tool designed for evaluating end-to-end models.
- Part 4: Bringing the ecosystem together to drive Alpamayo 1 closed-loop on reconstructed data within AlpaSim.
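Before diving in, it may help to fix intuitions about what a reasoning VLA emits: a free-text reasoning trace paired with a short horizon of trajectory waypoints. The toy sketch below illustrates only that output *shape*; every name in it (`Waypoint`, `VLAPlan`, the example trace) is hypothetical and is not the Alpamayo 1 interface.

```python
from dataclasses import dataclass, field

@dataclass
class Waypoint:
    """A single future pose in the ego vehicle's frame (meters, radians)."""
    x: float
    y: float
    heading: float

@dataclass
class VLAPlan:
    """Toy container for a reasoning-VLA output: a chain-of-thought-style
    trace plus the trajectory it justifies. Illustrative only; not the
    Alpamayo 1 API."""
    reasoning_trace: str
    trajectory: list[Waypoint] = field(default_factory=list)

    def horizon_length(self) -> float:
        """Straight-line distance covered by the planned waypoints."""
        dist = 0.0
        for a, b in zip(self.trajectory, self.trajectory[1:]):
            dist += ((b.x - a.x) ** 2 + (b.y - a.y) ** 2) ** 0.5
        return dist

plan = VLAPlan(
    reasoning_trace=(
        "Pedestrian at the crosswalk ahead; yield, then proceed straight."
    ),
    trajectory=[Waypoint(0.0, 0.0, 0.0), Waypoint(2.0, 0.0, 0.0),
                Waypoint(5.0, 0.0, 0.0)],
)
print(plan.horizon_length())  # 5.0
```

Pairing the trace with the trajectory is what makes these models reviewable: a human can check whether the stated reasoning actually supports the planned motion.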
These three key components provide the essential pieces needed to start building reasoning-based VLA models: a base model, large-scale data for training, and a simulator for testing and evaluation.
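To make the open-loop versus closed-loop distinction concrete, here is a deliberately tiny 1-D sketch (not AlpaSim; all logic is illustrative). Open-loop scoring restarts each step from the logged state, so errors are measured in isolation; a closed-loop rollout feeds the policy's own actions back into the simulated state, so early mistakes shift later inputs and the drift compounds.

```python
# Toy 1-D driving world: the state is a scalar position along a lane.
# The "policy" has a position-dependent bias, so its error grows with
# distance traveled. Illustrative only; this is not the AlpaSim API.

def policy(state: float) -> float:
    """Predict the next position: aims for +1.0 m/step, with a small
    bias that grows the farther we are from the lane start."""
    return state + 1.0 + 0.1 * state

def open_loop_errors(log: list[float]) -> list[float]:
    """Open-loop: every step restarts from the *logged* state, so each
    error is scored in isolation and never compounds."""
    return [abs(policy(s) - nxt) for s, nxt in zip(log, log[1:])]

def closed_loop_errors(log: list[float]) -> list[float]:
    """Closed-loop: the policy acts on its *own* previous output, so
    early mistakes shift later states and the error compounds."""
    errors, state = [], log[0]
    for target in log[1:]:
        state = policy(state)          # feed the action back in
        errors.append(abs(state - target))
    return errors

log = [0.0, 1.0, 2.0, 3.0, 4.0]        # logged ground truth, 1 m/step
print(max(open_loop_errors(log)))      # worst single-step error: ~0.3
print(max(closed_loop_errors(log)))    # compounded drift: ~0.64
```

The same policy looks twice as bad under closed-loop rollout as under open-loop scoring, which is exactly why open-loop evaluation alone is no longer sufficient for these models.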