Cross-posting from the community sync (Aug 6).

Memory Types

There are multiple differences between Sensory Memory, Short-Term Memory, and Long-Term Memory. Which should be the dominant differentiator? For example, what if the user needs something with the access pattern of sensory memory but the retention of short-term memory?

Hybrid State Backend

Modifying Flink's State Backend, or creating a new one, can be complicated. Instead, I would suggest designing our own Memory Component, which may leverage Flink's State Backend together with whatever other storage we need. Flink's checkpointing mechanism provides callbacks that let us run custom logic while a checkpoint is taken and while the job restores from one, so at a high level this achieves the same effect as a hybrid state backend (see the checkpoint-callback sketch at the end of this comment).

Redistribution of Memory

One thing the current design does not mention is the redistribution of memory. Imagine we stop the Flink job, change the parallelism, and resume the job from a checkpoint. In that case, the distribution of the keyed data over the parallel tasks changes, and we would need to redistribute the memory along with the keyed data. Flink's MapState already supports this. We need to figure out how the vector store, and the history store if we don't use Flink's existing state for it, should be redistributed in such cases.
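For reference, Flink exposes exactly these hooks through the CheckpointedFunction interface. The sketch below is a minimal, hypothetical illustration of a memory component flushing and restoring an external store in lockstep with checkpoints; the MemoryAwareFunction name and the flush/restore methods are made up for illustration and are not part of any existing flink-agents API.

```java
import org.apache.flink.runtime.state.FunctionInitializationContext;
import org.apache.flink.runtime.state.FunctionSnapshotContext;
import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;

// Hypothetical sketch: an operator that keeps a non-Flink memory store
// (e.g. a local vector index) consistent with Flink checkpoints.
public abstract class MemoryAwareFunction<K, I, O>
        extends KeyedProcessFunction<K, I, O>
        implements CheckpointedFunction {

    @Override
    public void snapshotState(FunctionSnapshotContext context) throws Exception {
        // Called while a checkpoint is being taken: persist the external
        // memory component (e.g. flush the vector index) here.
        flushMemoryStores(context.getCheckpointId());
    }

    @Override
    public void initializeState(FunctionInitializationContext context) throws Exception {
        // Called on (re)start, including restore from a checkpoint:
        // reload or rebuild the external memory component here.
        if (context.isRestored()) {
            restoreMemoryStores();
        }
    }

    protected abstract void flushMemoryStores(long checkpointId) throws Exception;

    protected abstract void restoreMemoryStores() throws Exception;
}
```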
Introduction
Memory is fundamental to human intelligence—it shapes identity, guides decisions, and enables learning, adaptation, and meaningful relationships. In communication, memory allows us to recall past interactions, infer preferences, and maintain coherent, context-rich exchanges over long periods. In contrast, current AI agents powered by large language models (LLMs) are limited by fixed context windows and lack persistent memory, leading to forgetfulness, contradictions, and a diminished user experience. Even as LLMs’ context windows grow, they cannot match the human ability to retain and retrieve relevant information across sessions and topics. This limitation is especially problematic in domains requiring continuity and trust, such as healthcare and education. To overcome these challenges, AI agents need robust memory systems that can selectively store, consolidate, and retrieve important information—mirroring human cognition. Such systems will enable AI agents to maintain consistent personas, track evolving user preferences, and build upon prior exchanges, transforming them into reliable, long-term collaborators.
Proposal
To address the critical limitations of current AI agents, namely the lack of persistent memory and poor long-term relationship building, we propose a distributed memory system for flink-agents that mirrors human cognitive processes.
The solution implements three memory types: Sensory Memory (internal events), Short-term Memory (recent history), and Long-term Memory (semantic retrieval). It leverages Flink's distributed state management with a hybrid backend that combines a history store and a vector store, directly addressing LLM context window limitations through persistent, searchable memory.
Additionally, Knowledge provides shared long-term memory accessible across all agent instances, stored externally without Flink checkpointing, enabling domain-specific expertise and consistent collaboration in trust-critical domains.
Memory
High-Level Architecture
The Flink-Agents Memory System introduces a hybrid state backend that combines two specialized state backends: a history store (RocksDB/ForSt) for recent interaction history, and a vector store (an embedded Lucene index) for embedding-based semantic search.
This hybrid approach optimizes both storage efficiency and search performance, unlike traditional single-backend solutions.
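The split of responsibilities could be expressed roughly as follows. This is a shape sketch only: the HistoryStore, VectorStore, HybridMemoryBackend, and MemoryItem types below are illustrative assumptions, not existing code; they only show how the two specialized stores sit behind a single facade.

```java
import java.util.List;

// Stand-in for whatever the memory entries end up looking like (illustrative).
record MemoryItem(String content, long timestamp) {}

// Recent interaction history, optimized for ordered writes and range reads
// (e.g. backed by RocksDB/ForSt keyed state).
interface HistoryStore {
    void append(String key, MemoryItem item);
    List<MemoryItem> latest(String key, int limit);
}

// Embedding index, optimized for similarity search (e.g. an embedded Lucene index).
interface VectorStore {
    void index(String key, MemoryItem item, float[] embedding);
    List<MemoryItem> search(String key, float[] queryEmbedding, int topK);
}

// The hybrid backend keeps both stores behind one facade, so callers
// never interact with them separately.
final class HybridMemoryBackend {
    private final HistoryStore historyStore;
    private final VectorStore vectorStore;

    HybridMemoryBackend(HistoryStore historyStore, VectorStore vectorStore) {
        this.historyStore = historyStore;
        this.vectorStore = vectorStore;
    }

    void add(String key, MemoryItem item, float[] embedding) {
        historyStore.append(key, item);          // short-term / history path
        vectorStore.index(key, item, embedding); // long-term / semantic path
    }
}
```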
Memory Types with Hybrid Storage
The system supports different types of memories, each with specialized storage in our hybrid state backend. Sensory memory operates internally and is not exposed to users.
Memory Types and Implementation
The system supports three distinct types of memories, each serving different purposes and using different storage mechanisms:
Sensory Memory (Event Processing)
Sensory memory captures real-time events from agents and stores them in Flink's keyed state using MapState. This represents the immediate sensory input that agents receive from their environment. Sensory memory is completely invisible to users and operates automatically in the background.
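As a rough illustration of this MapState-backed layer, the sketch below records each incoming event into keyed state inside a KeyedProcessFunction. The operator name and the use of String events are illustrative, not a defined flink-agents API, and the open(OpenContext) signature assumes Flink 1.19 or newer.

```java
import org.apache.flink.api.common.functions.OpenContext;
import org.apache.flink.api.common.state.MapState;
import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

// Illustrative only: record every incoming agent event into keyed MapState,
// keyed by the processing time at which it was observed.
public class SensoryMemoryFunction extends KeyedProcessFunction<String, String, String> {

    private transient MapState<Long, String> sensoryMemory;

    @Override
    public void open(OpenContext openContext) {
        MapStateDescriptor<Long, String> descriptor =
                new MapStateDescriptor<>("sensory-memory", Types.LONG, Types.STRING);
        sensoryMemory = getRuntimeContext().getMapState(descriptor);
    }

    @Override
    public void processElement(String event, Context ctx, Collector<String> out) throws Exception {
        // Keep the raw event; promotion to short-term memory and compaction happen elsewhere.
        sensoryMemory.put(ctx.timerService().currentProcessingTime(), event);
        out.collect(event);
    }
}
```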
Characteristics:
Short-term Memory (Recent History)
Short-term memory maintains a configurable history of recent agent interactions and experiences. It uses history storage for persistence and provides fast access to recent memories.
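To make the retention semantics concrete, here is a minimal, heap-only sketch of a bounded recent-history buffer. In the actual design the entries would be persisted in the history store; the ShortTermMemory name and its methods are illustrative assumptions.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch of short-term memory semantics: a bounded, ordered
// buffer of recent interactions with oldest-first eviction.
public final class ShortTermMemory {

    private final int capacity;                       // configurable history size
    private final Deque<String> recent = new ArrayDeque<>();

    public ShortTermMemory(int capacity) {
        this.capacity = capacity;
    }

    public void add(String interaction) {
        if (recent.size() == capacity) {
            recent.removeFirst();                     // evict the oldest entry
        }
        recent.addLast(interaction);
    }

    /** The last {@code limit} entries, oldest first. */
    public List<String> recall(int limit) {
        return recent.stream().skip(Math.max(0, recent.size() - limit)).toList();
    }
}
```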
Characteristics:
Long-term Memory (Semantic Search)
Long-term memory provides semantic search capabilities using a vector storage backend. It stores memories with embeddings for similarity-based retrieval.
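A brute-force sketch of the retrieval semantics is shown below, using cosine similarity over stored embeddings. A real vector store (for example a Lucene index) would replace the linear scan; the LongTermMemory type is an illustrative assumption rather than an existing class.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative only: similarity-based retrieval over stored embeddings.
public final class LongTermMemory {

    private record Entry(String memory, float[] embedding) {}

    private final List<Entry> entries = new ArrayList<>();

    public void add(String memory, float[] embedding) {
        entries.add(new Entry(memory, embedding));
    }

    /** Return the {@code topK} memories most similar to the query embedding. */
    public List<String> search(float[] queryEmbedding, int topK) {
        return entries.stream()
                .sorted(Comparator.comparingDouble(
                        (Entry e) -> -cosine(e.embedding(), queryEmbedding)))
                .limit(topK)
                .map(Entry::memory)
                .toList();
    }

    private static double cosine(float[] a, float[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb) + 1e-12);
    }
}
```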
Characteristics:
Memory API Design
The Memory API provides a unified interface for different types of memories. Sensory memory is not exposed through the API and operates automatically in the background.
Memory inherits from Flink's State API. It behaves like other Flink KeyedState implementations and can be saved to StateBackends and checkpointed for fault tolerance and recovery.
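To make this concrete, a hypothetical shape for such an interface is sketched below. It extends Flink's org.apache.flink.api.common.state.State marker interface so a state backend can snapshot it like other keyed state, but the method names are illustrative and not an agreed-upon API.

```java
import java.util.List;
import org.apache.flink.api.common.state.State;

// Hypothetical API shape only: a per-key Memory primitive that plugs into
// Flink's State hierarchy so it can be checkpointed like any other keyed state.
public interface Memory extends State {

    /** Append an interaction to short-term memory for the current key. */
    void add(String memory) throws Exception;

    /** Store a memory with its embedding for long-term semantic retrieval. */
    void add(String memory, float[] embedding) throws Exception;

    /** The most recent {@code limit} short-term memories for the current key. */
    List<String> recent(int limit) throws Exception;

    /** The {@code topK} long-term memories most similar to the query embedding. */
    List<String> search(float[] queryEmbedding, int topK) throws Exception;
}
```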
Knowledge
Knowledge is a special type of long-term memory that is shared across all Agent instances. It provides access to external, pre-built knowledge sources through remote vector database connections, offering domain-specific knowledge that can enhance agent responses. Unlike individual agent memories, knowledge is globally accessible and persistent across the entire Flink cluster. Since knowledge is independent of any particular agent, it does not require Flink state checkpointing.
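For illustration, the read-only contract of such a shared knowledge source might look like the sketch below; the KnowledgeBase name and its methods are assumptions, not part of the proposal's API.

```java
import java.util.List;

// Hypothetical shape of a shared knowledge source: read-only semantic lookups
// against an external, pre-built vector database. Because the data lives
// outside Flink, nothing here participates in checkpointing.
public interface KnowledgeBase {

    /** Retrieve the {@code topK} knowledge snippets most relevant to the query embedding. */
    List<String> retrieve(float[] queryEmbedding, int topK);

    /** Human-readable identifier of the backing collection or index (illustrative). */
    String name();
}
```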
Knowledge Architecture
Knowledge Base Implementation
Memory Compaction
Memory compaction addresses a fundamental limitation of Large Language Models (LLMs): their fixed context windows. While the Memory API can store a virtually unbounded number of memories, limited only by available storage capacity, LLM context windows range from a few thousand to a few million tokens.
Compaction Strategies
Memory compaction reduces memory volume while preserving essential information through two main approaches:
1. Scoring-Based Compaction
Scoring-based compaction evaluates the relevance and importance of each memory item using signals such as recency, access frequency, semantic similarity to the current query, and user-defined importance markers. By ranking memories on these criteria, the system retains only the most significant ones.
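For example, a scoring function could combine these signals with fixed weights and keep only the top-scoring items. The weights, field names, and ScoringCompactor type in the sketch below are illustrative assumptions, not part of the design.

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: combine recency, access frequency, query similarity,
// and a user-supplied importance marker into one score, then keep the top-k.
public final class ScoringCompactor {

    public record ScoredMemory(
            String text, long lastAccessMillis, int accessCount,
            double querySimilarity, double userImportance) {}

    /** Keep the {@code keep} highest-scoring memories. */
    public List<ScoredMemory> compact(List<ScoredMemory> memories, int keep, long nowMillis) {
        return memories.stream()
                .sorted(Comparator.comparingDouble((ScoredMemory m) -> -score(m, nowMillis)))
                .limit(keep)
                .toList();
    }

    private static double score(ScoredMemory m, long nowMillis) {
        double ageHours = (nowMillis - m.lastAccessMillis()) / 3_600_000.0;
        double recency = Math.exp(-ageHours / 24.0);        // decays over roughly a day
        double frequency = Math.log1p(m.accessCount());     // diminishing returns on repeat access
        double relevance = m.querySimilarity();             // e.g. cosine similarity in [0, 1]
        double importance = m.userImportance();             // user-defined marker in [0, 1]
        return 0.25 * recency + 0.15 * frequency + 0.4 * relevance + 0.2 * importance;
    }
}
```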
2. Summarization-Based Compaction
Summarization-based compaction uses LLMs to create intelligent summaries of memory groups. This approach groups similar memories together and generates concise summaries that capture essential information from multiple related memories.
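A possible flow, sketched below under the assumption of a generic Llm abstraction (not an existing flink-agents interface), is to take pre-grouped related memories and replace each group with a single LLM-generated summary.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical flow only: ask an LLM to collapse each group of related
// memories into one concise entry. Grouping (e.g. by embedding similarity)
// is assumed to have happened upstream.
public final class SummarizingCompactor {

    public interface Llm {
        String complete(String prompt);
    }

    private final Llm llm;

    public SummarizingCompactor(Llm llm) {
        this.llm = llm;
    }

    /** Replace each group of related memories with a single LLM-written summary. */
    public List<String> compact(List<List<String>> groupedMemories) {
        List<String> summaries = new ArrayList<>();
        for (List<String> group : groupedMemories) {
            String prompt = "Summarize the following related memories into one concise entry, "
                    + "preserving names, preferences, and decisions:\n- "
                    + String.join("\n- ", group);
            summaries.add(llm.complete(prompt));
        }
        return summaries;
    }
}
```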
Integration
The compaction functionality is integrated into the Agent API so that agents can retrieve relevant memory without exceeding the model's context window. This reduces prompt size, speeds up LLM inference, and lowers token costs.
Data Flow
Memory Addition Flow
Memory Search Flow
Integration in Agents
Execution Plan
The implementation will be executed in three phases:
Phase 1: Core Memory Foundation
Phase 2: Flink State Integration
Phase 3: Hybrid State Backend
Conclusion
The Flink-Agents Memory System introduces a revolutionary hybrid state backend that combines RocksDB/ForSt and Lucene instances within a unified state management framework. This dual-engine approach eliminates the need for external vector databases while providing optimal performance for both historical data storage and semantic search operations.
Key Innovations
This design provides a solid foundation for building sophisticated, memory-aware agents in Apache Flink environments, with the hybrid state backend serving as the cornerstone of the architecture.