Google Releases LiteRT-LM Framework Powering On-Device AI in Chrome, Chromebook Plus, and Pixel Watch

  • Sep 26, 2025
  • 1 min read

Google has released LiteRT-LM, the production-ready inference framework that enables on-device deployment of large language models like Gemini Nano across hundreds of millions of devices including Chrome browsers, Chromebooks, and Pixel Watches.


Technical Architecture:

  1. Engine/Session Design: A singleton Engine manages shared resources while individual Sessions handle stateful conversations, letting multiple AI features share a single foundation model with task-specific LoRA adapters.

  2. Cross-Platform Support: Deploys across Android, Linux, macOS, Windows, and Raspberry Pi with CPU, GPU, and NPU hardware acceleration through the underlying LiteRT runtime.

  3. Memory Optimization: Features session cloning, copy-on-write KV-cache, and context switching to minimize memory footprint while enabling sub-second time-to-first-token latency.

  4. Modular Components: The open-source framework allows custom pipelines to be assembled from core modules such as the executor, tokenizer, and sampler for resource-constrained deployments.
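The Engine/Session split and copy-on-write KV-cache described above can be sketched in a few dozen lines of C++. This is a conceptual illustration only; the class names, methods, and the byte-sized "cache" stand-in are assumptions, not the real LiteRT-LM API.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Shared, immutable resources owned by the singleton Engine.
struct FoundationModel {
    std::string name;
};

// Stand-in for cached key/value tensors. Held behind a shared_ptr so
// cloned Sessions can share it copy-on-write style.
struct KvCache {
    std::vector<int> tokens;
};

class Session {
public:
    Session(std::shared_ptr<const FoundationModel> model, std::string lora)
        : model_(std::move(model)), lora_adapter_(std::move(lora)),
          cache_(std::make_shared<KvCache>()) {}

    // Cloning is cheap: the KV-cache pointer is shared, not deep-copied.
    Session Clone() const { return *this; }

    // Appending triggers a deep copy only if the cache is still shared.
    void Append(int token) {
        if (cache_.use_count() > 1) {
            cache_ = std::make_shared<KvCache>(*cache_);  // copy-on-write
        }
        cache_->tokens.push_back(token);
    }

    size_t CachedTokens() const { return cache_->tokens.size(); }
    const std::string& LoraAdapter() const { return lora_adapter_; }

private:
    std::shared_ptr<const FoundationModel> model_;
    std::string lora_adapter_;  // task-specific adapter, per Session
    std::shared_ptr<KvCache> cache_;
};

// One Engine per process owns the model; Sessions are cheap and stateful.
class Engine {
public:
    static Engine& Instance() {
        static Engine engine;
        return engine;
    }
    Session NewSession(const std::string& lora_adapter) {
        return Session(model_, lora_adapter);
    }

private:
    Engine()
        : model_(std::make_shared<FoundationModel>(
              FoundationModel{"foundation-model"})) {}
    std::shared_ptr<const FoundationModel> model_;
};
```

Under this pattern, cloning a session to fork a conversation costs almost nothing up front; the memory price is paid only when one of the clones diverges by generating new tokens.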


Production Deployments: The framework currently powers Web AI in Chrome through its built-in AI APIs, AI capabilities for tab management and text analysis on Chromebook Plus, and the Smart Replies feature on Pixel Watch. Each deployment demonstrates the system's scalability, from high-performance multi-task environments to severely resource-constrained wearables.


Developer Access: For the first time, Google provides direct access to LiteRT-LM's C++ interface, enabling developers to build custom high-performance AI pipelines. The framework complements existing high-level APIs, including the MediaPipe LLM Inference API and the Chrome Built-in AI APIs.
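A custom pipeline built from the modular components mentioned earlier (tokenizer, executor, sampler) might look like the following. This is a hedged sketch of the composition pattern, not LiteRT-LM's actual C++ API; the interfaces, the toy byte-level tokenizer, and the zero-logit executor are all illustrative assumptions.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy byte-level tokenizer: one token per byte. Illustrative only.
struct Tokenizer {
    std::vector<int> Encode(const std::string& text) const {
        return std::vector<int>(text.begin(), text.end());
    }
    std::string Decode(const std::vector<int>& ids) const {
        return std::string(ids.begin(), ids.end());
    }
};

// Stand-in for the model forward pass; returns uniform logits.
struct Executor {
    std::vector<float> Logits(const std::vector<int>& /*context*/) const {
        return std::vector<float>(256, 0.0f);
    }
};

// Greedy sampler: picks the highest-scoring token id.
struct GreedySampler {
    int Pick(const std::vector<float>& logits) const {
        int best = 0;
        for (int i = 1; i < static_cast<int>(logits.size()); ++i) {
            if (logits[i] > logits[best]) best = i;
        }
        return best;
    }
};

// A custom pipeline is just the three modules composed in a loop:
// encode the prompt, run the executor, sample, repeat, then decode.
std::string Generate(const Tokenizer& tok, const Executor& exec,
                     const GreedySampler& sampler,
                     const std::string& prompt, int max_tokens) {
    std::vector<int> ids = tok.Encode(prompt);
    for (int i = 0; i < max_tokens; ++i) {
        ids.push_back(sampler.Pick(exec.Logits(ids)));
    }
    return tok.Decode(ids);
}
```

The value of this decomposition for constrained devices is that each module can be swapped independently: a smaller tokenizer, an NPU-backed executor, or a cheaper sampler, without rewriting the generation loop.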


Market Impact: LiteRT-LM addresses the fundamental challenge of deploying gigabyte-scale models across diverse edge hardware while maintaining offline availability and eliminating per-API-call costs for high-frequency AI tasks.

