Amazon and Cerebras Strike AI Inference Deal for AWS

Amazon Web Services announced on Friday that it is deploying Cerebras Systems' CS-3 chips in AWS data centers to deliver what it calls the fastest AI inference available in cloud computing.
The partnership brings Cerebras's Wafer-Scale Engine, which has powered models for OpenAI, Cognition, and Meta at speeds reaching 3,000 tokens per second, to AWS's global customer base through Amazon Bedrock. The service will support leading open-source language models and Amazon's Nova models.
AWS and Cerebras are also developing a new disaggregated architecture that pairs AWS Trainium chips with Cerebras's WSE to deliver five times the high-speed token capacity in the same hardware footprint. In this configuration, Trainium handles the prefill phase, processing the input prompt to compute the key-value cache, which it then sends to the Cerebras WSE over AWS's Elastic Fabric Adapter (EFA) interconnect. The WSE performs only the decode phase, generating thousands of output tokens per second versus hundreds on GPUs.
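To make that division of labor concrete, here is a minimal, runnable sketch of a prefill/decode split using a toy single-layer attention model. Everything in it (the dimensions, function names, and weight shapes) is a hypothetical illustration; neither AWS nor Cerebras has published the actual interfaces.

```python
# Toy disaggregated inference: prefill builds the KV cache in one dense
# pass; decode extends it one token at a time. All names are hypothetical.
import numpy as np

D, VOCAB = 16, 100                     # toy hidden size and vocabulary
rng = np.random.default_rng(0)
W_qkv = rng.normal(size=(D, 3 * D)) / np.sqrt(D)
W_out = rng.normal(size=(D, VOCAB)) / np.sqrt(D)
embed = rng.normal(size=(VOCAB, D))

def prefill(prompt_ids):
    """'Trainium' phase: one batched pass over the whole prompt,
    producing the key/value cache that decode will extend."""
    x = embed[prompt_ids]                        # (T, D)
    q, k, v = np.split(x @ W_qkv, 3, axis=1)
    return {"k": k, "v": v, "last_q": q[-1]}

def decode(kv, n_tokens):
    """'Wafer-Scale Engine' phase: sequential generation, appending
    each new token's keys/values to the cache received from prefill."""
    out, q = [], kv["last_q"]
    for _ in range(n_tokens):
        s = q @ kv["k"].T                        # attention scores
        attn = np.exp(s - s.max())
        attn /= attn.sum()
        ctx = attn @ kv["v"]                     # (D,)
        next_id = int(np.argmax(ctx @ W_out))    # greedy next token
        out.append(next_id)
        q, k, v = np.split(embed[next_id] @ W_qkv, 3)
        kv["k"] = np.vstack([kv["k"], k])
        kv["v"] = np.vstack([kv["v"], v])
    return out

kv_cache = prefill([1, 2, 3])    # prefill: compute-bound, whole prompt at once
# In the deployed system, the KV cache would cross the EFA interconnect here.
print(decode(kv_cache, 5))       # decode: sequential, one token per step
```

The point of the split is visible even in the toy: prefill batches the entire prompt through dense matrix multiplies, which suits compute-heavy silicon, while decode is an inherently sequential loop whose speed depends on how fast weights can be read, which is where on-chip memory bandwidth pays off.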
Cerebras has established itself as the market leader in high-speed AI inference by storing all model weights on-chip in SRAM, which delivers thousands of times the memory bandwidth of the fastest GPU. The company brings a decade of wafer-scale system design, along with model and inference-serving expertise, to the collaboration.
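A rough roofline-style calculation shows why on-chip weights matter at decode time: generating each token requires streaming essentially all model weights through the compute units, so single-stream throughput is capped by memory bandwidth divided by model size. The figures below are illustrative assumptions, not measured or vendor-confirmed numbers.

```python
# Back-of-the-envelope bound on single-stream decode throughput.
# Assumed figures, for illustration only.
hbm_bandwidth  = 3.4e12     # bytes/s, roughly a current HBM-based GPU
sram_bandwidth = 2.1e16     # bytes/s, roughly an on-wafer SRAM figure
weight_bytes   = 70e9 * 2   # a 70B-parameter model at 16-bit precision

# tokens/second <= bandwidth / bytes read per generated token
print(f"HBM-class bound:  {hbm_bandwidth / weight_bytes:,.0f} tok/s")
print(f"SRAM-class bound: {sram_bandwidth / weight_bytes:,.0f} tok/s")
```

Real GPU deployments raise effective throughput well above this naive single-stream bound through batching and speculative decoding, which is why the gap observed in practice is hundreds versus thousands of tokens per second rather than the raw bandwidth ratio.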
Amazon's Trainium is a purpose-built AI chip designed for scalable performance and cost efficiency across a broad range of generative AI workloads, with dense compute cores especially suited for the prefill phase. AWS brings expertise in custom silicon, networking, and distributed computing to the partnership.
The companies indicated that the service will become available to customers in the coming months.