Native multimodal: Unified handling of text, images, audio, and video; understanding of multiple simultaneous inputs; cross-modal generation (one input, many output types); spatial and temporal understanding of video and motion.
Agentic: Multi-step planning for complex tasks; integration with Google and third-party tools; learning and adapting from interactions; safe, human-supervised operation.
