Why AI Is So Forgetful
- Nikita Silaech
- Dec 15, 2025
- 3 min read

We've built AI systems that can reason across entire books, code across millions of lines, and converse in ways that feel almost human. Except they can't remember you from one conversation to the next.
Every time you start a new chat, the model forgets everything about you, everything you've told it, everything you've established in previous conversations. This isn't a limitation we're working around anymore. It's the fundamental constraint that's been holding AI back from anything resembling real intelligence.
Large language models rely entirely on a fixed "context window," a token capacity that determines how much information they can process at once. Expand the window from 1,000 tokens to 1 million, and the computational cost doesn't increase linearly. It explodes.
You're computing attention across every pair of tokens, so memory use and compute grow roughly quadratically with context length, inference slows down, and costs become astronomical for anyone trying to deploy these systems at scale. So models are capped. They forget. They start fresh every time.
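To make the scaling concrete, here's a rough back-of-the-envelope sketch. It ignores the many optimizations real systems use (FlashAttention, sparsity, caching) and just shows the core trend: vanilla self-attention scores every pair of tokens, so a 1,000× longer context means roughly 1,000,000× more pairwise work.

```python
# Back-of-the-envelope only: vanilla self-attention compares every token
# with every other token, so the number of score computations grows with
# the square of the context length.
def attention_pairs(context_tokens: int) -> int:
    return context_tokens * context_tokens

small = attention_pairs(1_000)      # 1,000,000 pairs
large = attention_pairs(1_000_000)  # 1,000,000,000,000 pairs
print(f"{large // small:,}x more pairwise work")  # -> 1,000,000x more pairwise work
```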
What began as a technical limitation has become a design constraint. AI companies have worked around it in various ways, with retrieval systems, summarization, and prompt-engineering tricks, but none of them solves the underlying problem.
The model itself remains stateless. It has no persistent memory. A financial advisor AI can't remember your portfolio from yesterday. A therapist-like chatbot can't track your progress over months. A coding assistant can't maintain awareness of your codebase's evolution.
EverMind just released EverMemOS, an open-source Memory Operating System that treats memory as a first-class resource rather than an afterthought. Instead of cramming everything into a limited context window, the system separates memory into three types: parametric memory (knowledge baked into the model weights), activation memory (temporary internal states used during inference), and plaintext memory (external data such as documents and conversations) (Laotian Times, 2025).
These interact through a unified framework called MemCube, which handles storage, versioning, access control, and intelligent recall. The architecture enables models to adapt continuously, internalize user preferences, and maintain behavioral consistency across interactions.
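EverMemOS's real interfaces aren't reproduced here, but a toy sketch helps show what "memory as a first-class resource" means in practice. Every name below (MemoryKind, MemoryEntry, MemoryStore) is invented for illustration; the only thing it mirrors from the description above is the idea of a single unified store that tracks memory type, ownership, versioning, and recall.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum

# Hypothetical illustration only -- these are NOT EverMemOS's real APIs.
class MemoryKind(Enum):
    PARAMETRIC = "parametric"   # knowledge baked into model weights
    ACTIVATION = "activation"   # transient internal states during inference
    PLAINTEXT = "plaintext"     # external documents, chat history, notes

@dataclass
class MemoryEntry:
    kind: MemoryKind
    content: str
    owner: str                  # access control: who may read this entry
    version: int = 1
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

class MemoryStore:
    """Toy unified store: one interface over all three memory kinds."""
    def __init__(self) -> None:
        self._entries: list[MemoryEntry] = []

    def write(self, entry: MemoryEntry) -> None:
        self._entries.append(entry)

    def recall(self, owner: str, query: str) -> list[MemoryEntry]:
        # Naive keyword recall; a real system would use embeddings and ranking.
        return [e for e in self._entries
                if e.owner == owner and query.lower() in e.content.lower()]

store = MemoryStore()
store.write(MemoryEntry(MemoryKind.PLAINTEXT, "User prefers concise answers", owner="alice"))
print(store.recall("alice", "concise"))
```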
Once you can build a system where AI actually remembers users, remembers context, and remembers goals, you're no longer building simple tools. You're building something closer to companionship.
That also brings obligations. Who owns the memory? Who can access it? What happens when the AI system uses memory to manipulate or exploit? The technical problem of how to enable long-term memory was hard. The governance problem of what to do once you've solved it might be harder.
Google Research also released an architecture called Titans with MIRAS that tackles the problem differently, using surprise-based memory updates to selectively store only novel, context-breaking information (Google Research, 2025). If a new input is expected given what the model already knows, don't memorize it. If it's surprising or anomalous, prioritize it for long-term storage. This lets the system scale memory without proportionally scaling compute.
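Titans and MIRAS define surprise in terms of the memory module's own learning signal, which is more involved than anything shown here. The toy sketch below, with all names invented for illustration, captures only the gating idea: measure how novel an input is relative to what's already stored and keep it only when it crosses a threshold.

```python
# Toy analogue of surprise-gated memory, not the Titans/MIRAS mechanism itself:
# keep an input only when it is sufficiently different from what is already stored.
from difflib import SequenceMatcher

class SurpriseGatedMemory:
    def __init__(self, threshold: float = 0.5) -> None:
        self.threshold = threshold
        self.items: list[str] = []

    def _novelty(self, text: str) -> float:
        # 1.0 = completely novel, 0.0 = identical to something remembered.
        if not self.items:
            return 1.0
        best = max(SequenceMatcher(None, text, m).ratio() for m in self.items)
        return 1.0 - best

    def observe(self, text: str) -> bool:
        """Store only surprising inputs; return True if stored."""
        if self._novelty(text) >= self.threshold:
            self.items.append(text)
            return True
        return False
```

Fed a stream of near-duplicate inputs, a gate like this skips almost all of them and stores only the occasional outlier, which is why memory can grow much more slowly than compute.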
The irony is that the memory bottleneck has been around for so long that the entire AI industry has organized itself around the absence of memory. We've built whole product categories, such as retrieval-augmented generation, vector databases, and prompt-engineering frameworks, to work around the fact that models can't remember. Now that real memory is within reach, those categories might become obsolete or transform into something unrecognizable.
While the technical solutions exist now, integrating a persistent memory system into deployed AI infrastructure is not trivial. It requires retraining workflows, redesigned interfaces, new privacy considerations, and some important conversations about what it means to build AI that remembers. The fact that these systems are open-source is important, but open-source doesn't automatically mean widespread adoption.
The memory constraint has limited what AI can become. We're about to find out whether lifting it reveals something remarkable, or whether it just enables us to deploy systems we weren't prepared to deploy.




