1 Matching Annotations
  1. Last 7 days
    1. Gemini Robotics-ER 1.6 achieves its highly accurate instrument readings by using agentic vision, which combines visual reasoning with code execution. The model takes intermediate steps: first zooming into an image to get a better read of small details in a gauge, then using pointing and code execution to estimate proportions and intervals and get an accurate reading.

      这一描述揭示了AI如何通过多步骤推理解决复杂问题,展示了模型在处理精细视觉任务时的创新方法。将视觉推理与代码执行相结合的能力代表了AI系统向更接近人类认知方式的方向发展,这种混合方法可能成为未来AI解决复杂物理任务的标准范式。