Long-horizon LLM agents finally get a lifecycle-aware memory primitive
Post author: @githubprojects

If you've ever tried building an LLM agent that needs to remember state across hours or days of interaction, you know the pain. Context windows fill up. Old memories get evicted. The agent forgets it already solved a problem, or worse, hallucinates a new one.

Most attempts at long-term memory for LLMs are either too brittle (just dump everything into a vector store) or too manual (hand-write summarization logic). The PaperGuru Benchmark project on GitHub takes a different approach: a lifecycle-aware memory primitive that treats memory as something that lives, grows, and eventually expires.

It's not a hype-driven framework. It's a practical, well-thought-out primitive for agents that need to keep state across long horizons.

What It Does

The PaperGuru Benchmark is a testing ground for evaluating how well LLM agents handle long-horizon tasks with a structured memory system. At its core, it provides a memory primitive that tracks the lifecycle of information:

  • Birth: New information arrives and gets stored with metadata (timestamp, relevance score, source)
  • Life: The memory can be read, updated, or merged with other memories as the agent learns more
  • Death: Old, irrelevant, or contradictory memories are flagged for eviction or compression
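
The three stages above can be sketched as a small record type. This is a minimal illustration, not the project's implementation: MemoryRecord, Stage, and every field and method name here are hypothetical, chosen only to mirror the metadata the list mentions (timestamp, relevance score, source).

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    ACTIVE = "active"
    FLAGGED = "flagged"  # marked for eviction or compression


@dataclass
class MemoryRecord:
    """Hypothetical record type; not taken from the PaperGuru codebase."""
    key: str
    value: str
    timestamp: float   # seconds since the start of the session
    relevance: float   # score in [0, 1]
    source: str
    stage: Stage = Stage.ACTIVE
    history: list = field(default_factory=list)

    def update(self, value: str, timestamp: float) -> None:
        """Life: replace the value, keeping the old one in history."""
        self.history.append((self.timestamp, self.value))
        self.value = value
        self.timestamp = timestamp

    def flag(self) -> None:
        """Death: mark the record; actual eviction is left to the caller."""
        self.stage = Stage.FLAGGED


# Birth: a new fact arrives with its metadata.
record = MemoryRecord("deadline", "Friday", timestamp=0.0,
                      relevance=0.9, source="kickoff call")
# Life: the agent learns the deadline moved.
record.update("Monday", timestamp=3600.0)
# Death: the record is flagged rather than deleted outright.
record.flag()
```

Keeping superseded values in a history list is one design choice among several; it makes the "same information changes meaning over time" case inspectable after the fact.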

This isn't just a vector store with timestamps. The primitive understands that some memories are more important than others, and that the same piece of information can change meaning over time.

Why It's Cool

The design is refreshingly developer-friendly. Instead of forcing you into a specific agent architecture, it gives you a memory interface you can plug into any LLM workflow. A few highlights:

  • Lifecycle hooks let you define custom policies for when memories should be promoted, merged, or deleted
  • Conflict resolution is built in: if two memories contradict each other (e.g., "project deadline is Friday" vs "deadline is Monday"), the system can flag the conflict rather than silently overwriting one of them
  • Benchmarking is first-class: the repo includes a set of long-horizon tasks specifically designed to test memory fidelity over dozens of turns
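
To make the hook and conflict-flagging ideas concrete, here is a minimal sketch of a store that refuses to overwrite a contradicting value and calls a user-supplied hook instead. The ConflictAwareStore class, the on_conflict parameter, and the "keep the old value" policy are all assumptions made for illustration; the project's real interface may look quite different.

```python
from typing import Callable, Dict, List, Optional, Tuple

# (key, value already stored, value that tried to replace it)
Conflict = Tuple[str, str, str]


class ConflictAwareStore:
    """Hypothetical store; not the project's API."""

    def __init__(self, on_conflict: Optional[Callable[[str, str, str], None]] = None):
        self._data: Dict[str, str] = {}
        self.conflicts: List[Conflict] = []
        self._on_conflict = on_conflict

    def store(self, key: str, value: str) -> bool:
        """Store a value; return False if it contradicts an existing one."""
        existing = self._data.get(key)
        if existing is not None and existing != value:
            # Contradiction: record it and leave the old value untouched.
            self.conflicts.append((key, existing, value))
            if self._on_conflict is not None:
                self._on_conflict(key, existing, value)
            return False
        self._data[key] = value
        return True

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)


seen = []  # keys the hook was called for
store = ConflictAwareStore(on_conflict=lambda key, old, new: seen.append(key))
first = store.store("project_deadline", "Friday")   # accepted
second = store.store("project_deadline", "Monday")  # flagged, not applied
```

A real policy would likely weigh recency or source reliability before deciding which value wins; the sketch only shows the flag-instead-of-overwrite behavior.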

What makes this particularly clever is the decay model. Memories don't just sit forever. A memory's relevance falls as it ages, as it goes unaccessed, and once newer information supersedes it. This loosely mirrors how human memory works: recent, frequently used information stays sharp, while old, unused information fades.
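
One way to combine those three signals into a single score is sketched below. The formula is an assumption, not the project's: it applies exponential decay per hour of age, a logarithmic boost for repeated access, and a hard zero for superseded memories. The relevance function and its parameters are hypothetical, and decay_rate here is only guessed to play the same role as the constructor argument of that name.

```python
import math


def relevance(age_hours: float, access_count: int, superseded: bool,
              decay_rate: float = 0.95) -> float:
    """Hypothetical relevance score; higher means more worth keeping."""
    if superseded:
        # Newer information replaced this memory, so it no longer competes.
        return 0.0
    aged = decay_rate ** age_hours             # shrinks with every hour of age
    boost = 1.0 + math.log1p(access_count)     # grows slowly with each access
    return aged * boost


fresh = relevance(age_hours=0, access_count=0, superseded=False)
stale = relevance(age_hours=48, access_count=0, superseded=False)
stale_but_used = relevance(age_hours=48, access_count=20, superseded=False)
replaced = relevance(age_hours=1, access_count=100, superseded=True)
```

With this shape, a frequently read memory outlives an untouched one of the same age, and an eviction policy reduces to comparing the score against a threshold.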

How to Try It

Getting started is straightforward. The repo is self-contained with clear instructions:

git clone https://github.com/PaperGuru-AI/PaperGuru-Benchmark.git
cd PaperGuru-Benchmark
pip install -r requirements.txt
python run_benchmark.py --agent your_model_name

You can also dive straight into the API docs in the docs/ folder. The memory interface is clean and unopinionated:

from paperguru.memory import LifecycleMemory

memory = LifecycleMemory(decay_rate=0.95)
memory.store("user_name", "Alice", context={"source": "greeting"})
# Later...
result = memory.retrieve("user_name")  # Returns "Alice" with freshness metadata

For a quick demo, check out the notebook in examples/long_horizon_chat.ipynb. It shows a 50-turn conversation in which the agent recalls context from the earliest turns without hitting context limits.

Final Thoughts

This project scratches an itch that's been nagging at me for a while. Most LLM memory solutions are either too simple (just append to context) or too complex (build a whole graph database). PaperGuru Benchmark finds a nice middle ground: a primitive that's smart enough to handle real-world use cases but simple enough that you can drop it into your project in an afternoon.

If you're building agents that need to work over hours, days, or across multiple sessions (think customer support bots, research assistants, or even games), this is worth your time. The benchmark tasks alone are valuable for stress-testing your own memory approaches.

The long tail of agent applications is going to need memory that doesn't just vomit all previous context into the prompt. Lifecycle-aware primitives like this are a step in the right direction.



Last updated: June 8, 2026 at 03:31 AM