Develop applications with strong audio and visual understanding, for rich multimodal support.
令人意外的架构决策:音频输入能力是 E2B/E4B 专属的,反而是更大的 26B 和 31B 模型不支持音频。这意味着 Google 刻意把语音能力部署在边缘端——暗示他们对端侧语音助手场景的押注,而非将音频作为云端大模型的特权能力。小模型反而是音频 AI 的「第一公民」。
Develop applications with strong audio and visual understanding, for rich multimodal support.
令人意外的架构决策:音频输入能力是 E2B/E4B 专属的,反而是更大的 26B 和 31B 模型不支持音频。这意味着 Google 刻意把语音能力部署在边缘端——暗示他们对端侧语音助手场景的押注,而非将音频作为云端大模型的特权能力。小模型反而是音频 AI 的「第一公民」。
what's more important from the perspective of a software architect is why a particular implementation or approach was chosen over its alternatives. A common way to document decisions like this is to use architecture decision records, ideally stored in source control with or near the application(s) impacted by the decision.
Kozyreva, A., Lewandowsky, S., & Hertwig, R. (2019, December 4). Citizens Versus the Internet: Confronting Digital Challenges With Cognitive Tools. https://doi.org/10.31234/osf.io/ky4x8