The Next Web · 5 min read

Why the next leap in AI video is teaching avatars to see and listen

AI video is shifting from a fidelity race to an interactivity race. A three-level framework classifies interactive avatar models: Level 1 avatars can only talk (one-way generation), Level 2 avatars can talk and listen (reacting to user audio in real time with nods, expressions, and vocal cues), and Level 3 avatars can talk, listen, and see (responding to posture, gesture, and facial expression via a camera feed). The critical leap is from Level 1 to Level 2, because an avatar that talks without listening feels uncanny and comes across worse than an audio-only system. Convincing listening requires modeling audio and motion jointly rather than stacking separate systems. Level 3 aims to replicate full human-to-human interaction, including contextual cues such as a person standing up to end a conversation.

#conversational-ai #multimodal

Source: https://thenextweb.com/news/interactive-avatar-models-three-levels-interactivity. 8sync News only summarizes and links to the article; copyright in the content belongs to the author and the original source.
