EMem is a structured memory system designed to enable LLM-based conversational agents to maintain coherent, personalized interactions over extended multi-session dialogues. Unlike approaches that compress or chunk conversation history, EMem preserves information in a non-lossy form by decomposing each session into enriched Elementary Discourse Units (EDUs) — self-contained propositions with normalized entities and source turn attributions.
Key features:
- Fine-grained, proposition-level representation: Conversational history is decomposed into EDUs — atomic statements that bundle participants, temporal cues, and minimal local context (inspired by neo-Davidsonian semantics, where propositions capture facts, states, and relations); see the illustrative sketch after this list
- Heterogeneous graph organization (optional): Sessions, EDUs, and their arguments are organized in a graph that supports associative recall
- Two retrieval variants:
  - EMem: A simple "dense similarity search + LLM filtering" over EDUs
  - EMem-G: Adds graph-based Personalized PageRank (PPR) propagation to connect and aggregate evidence across related EDUs
- Efficient QA context: Achieves strong performance with significantly shorter QA contexts compared to baselines
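To make the EDU representation concrete, here is a minimal, hypothetical sketch of what one enriched EDU might carry. The class and field names are assumptions for illustration only and do not reflect EMem's actual data schema.

```python
from dataclasses import dataclass, field

# Illustrative only: a minimal container for one enriched EDU.
# Field names are assumptions for this sketch, not EMem's actual schema.
@dataclass
class EDU:
    proposition: str              # self-contained, atomic statement
    entities: list[str]           # normalized participant/entity mentions
    temporal_cue: str | None      # e.g., a date or relative time, resolved if possible
    session_id: str               # which session the EDU came from
    source_turns: list[int] = field(default_factory=list)  # source turn attribution

# One dialogue turn may yield several EDUs, for example:
edus = [
    EDU("Alice adopted a golden retriever named Max.",
        ["Alice", "Max"], "2023-05-14", "session_3", [12]),
    EDU("Alice plans to take Max hiking next month.",
        ["Alice", "Max"], "2023-06", "session_3", [12, 13]),
]
```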
- [2025.12] We open-source the EMem codebase, with support for the LoCoMo and LongMemEval benchmarks.
- Further refine the codebase and improve its efficiency
- Add support for additional benchmarks
- Add documentation for custom dataset integration
- Python 3.10+
- uv - Fast Python package installer and resolver
If you don't have uv installed, you can install it via:
```bash
# On macOS and Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

# Or using pip
pip install uv
```

```bash
# Clone the repository
git clone https://github.com/KevinSRR/EMem.git
cd EMem

# Install all dependencies using uv
make install
```

This will install all required packages and download the necessary NLTK data.
Set your OpenAI API key before running experiments:
```bash
export OPENAI_API_KEY="your-openai-api-key"
```

Before running experiments, download and place the evaluation datasets in the appropriate directories.
Download locomo10.json from the official LoCoMo repository:
```bash
# Create data directory if it doesn't exist
mkdir -p data

# Download LoCoMo dataset
wget -O data/locomo10.json \
  https://raw.githubusercontent.com/snap-research/locomo/main/data/locomo10.json
```

Source: [snap-research/locomo](https://github.com/snap-research/locomo)
Download longmemeval_s_cleaned.json from Hugging Face:
```bash
# Create directory structure
mkdir -p data/longmemeval/preprocess

# Download LongMemEval dataset (requires huggingface-hub or manual download)
# Option 1: Using huggingface-cli (--repo-type dataset is needed because this is a dataset repo)
huggingface-cli download xiaowu0162/longmemeval-cleaned longmemeval_s_cleaned.json \
  --repo-type dataset --local-dir data/longmemeval/preprocess

# Option 2: Manual download from browser
# Visit: https://huggingface.co/datasets/xiaowu0162/longmemeval-cleaned/tree/main
# Download longmemeval_s_cleaned.json and place it in data/longmemeval/preprocess/
```

Source: [xiaowu0162/longmemeval-cleaned](https://huggingface.co/datasets/xiaowu0162/longmemeval-cleaned)
After downloading, your directory layout should look like this:

```
EMem/
├── data/
│   ├── locomo10.json
│   └── longmemeval/
│       └── preprocess/
│           └── longmemeval_s_cleaned.json
```
EMem provides two model variants:
| Variant | Description | Command Flag |
|---|---|---|
| EMem-G (default) | Uses PPR graph propagation for retrieval | (none) |
| EMem | Skips the PPR step; retrieves filtered EDUs directly | `--skip_retrieval_ppr` |

Note: with `--skip_retrieval_ppr`, the same graph is still built before retrieval even though it is redundant for EMem. We are working to remove this dependency.
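For intuition about the EMem-G retrieval step, the sketch below runs Personalized PageRank over a toy graph with networkx. The node/edge construction and the seed weights are simplified assumptions for illustration; they are not the repository's actual graph schema or scoring.

```python
import networkx as nx

# Toy heterogeneous graph: EDU nodes linked through shared (normalized) entities.
# This wiring is an assumption for illustration, not EMem-G's actual schema.
G = nx.Graph()
G.add_edges_from([
    ("edu1", "Alice"), ("edu1", "Max"),      # "Alice adopted a dog named Max."
    ("edu2", "Alice"), ("edu2", "hiking"),   # "Alice plans to go hiking."
    ("edu3", "Max"),   ("edu3", "vet"),      # "Max had a vet appointment."
])

# Seed weights, e.g. dense-similarity scores of EDUs that survived LLM filtering.
seeds = {"edu1": 0.7, "edu3": 0.3}

# Personalized PageRank spreads seed mass to related nodes along graph edges.
scores = nx.pagerank(G, alpha=0.85, personalization=seeds)

# Keep only EDU nodes and rank them as retrieval evidence.
ranked_edus = sorted(
    ((n, s) for n, s in scores.items() if n.startswith("edu")),
    key=lambda x: x[1], reverse=True,
)
print(ranked_edus)
```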
Run evaluation on the LoCoMo benchmark:
```bash
# EMem-G (with PPR) - default
make eval-locomo

# EMem (without PPR)
make eval-locomo-emem
```

Optional parameters:
```bash
# Use a different LLM model
make eval-locomo LLM_MODEL=gpt-4o

# Use a different embedding model
make eval-locomo EMBEDDING_MODEL=text-embedding-3-large

# Custom save frequency
make eval-locomo LOCOMO_SAVE_FREQUENCY=50
```

Final prediction results (pickle files) will be saved to `results/`. Intermediate artifacts (LLM cache, graph indices, traces) will be saved to `outputs/emem_locomo_eval/`.
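If you want to inspect the saved predictions yourself, the pickle files can be loaded directly. The snippet below uses a placeholder file name and only loads the object and prints its top-level structure; the exact internal layout is not assumed here.

```python
import pickle
from pathlib import Path

# Placeholder path: substitute the actual file produced by your run.
result_path = Path("results/emem_locomo_results_xxx.pkl")

with result_path.open("rb") as f:
    results = pickle.load(f)

# The internal layout is not assumed here; start by inspecting the top level.
print(type(results))
if isinstance(results, dict):
    print(list(results.keys())[:10])
elif isinstance(results, (list, tuple)):
    print(len(results), type(results[0]) if results else None)
```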
Run evaluation on the LongMemEval benchmark:
```bash
# EMem-G (with PPR) - default
make eval-longmemeval

# EMem (without PPR)
make eval-longmemeval-emem
```

Advanced options:
The LongMemEval dataset is large, so we provide a pre-caching option that batch-calls the LLM APIs for EDU extraction and event extraction across all test samples before running the full evaluation, which speeds up the overall run (see the concurrency sketch below).
```bash
# Only pre-cache API calls
make eval-longmemeval-precache

# Skip pre-caching (use existing cache)
make eval-longmemeval-skip-precache
```

Final prediction results (pickle files) will be saved to `results/`. Intermediate artifacts (LLM cache, graph indices, traces) will be saved to `outputs/emem_longmemeval_eval/`.
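Conceptually, pre-caching is a bounded-concurrency batch of LLM calls. As a rough illustration of that pattern only (not the repository's actual extraction code or prompts), one might cap in-flight requests with an asyncio semaphore:

```python
import asyncio
from openai import AsyncOpenAI  # pip install openai

client = AsyncOpenAI()   # reads OPENAI_API_KEY from the environment
MAX_IN_FLIGHT = 50       # cap on concurrent API requests

async def extract_edus(session_text: str, sem: asyncio.Semaphore) -> str:
    """Hypothetical single extraction call; the prompt is a stand-in."""
    async with sem:
        resp = await client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user",
                       "content": f"List the atomic propositions in:\n{session_text}"}],
        )
        return resp.choices[0].message.content

async def precache(sessions: list[str]) -> list[str]:
    # Fire all requests, but let the semaphore keep only MAX_IN_FLIGHT active at once.
    sem = asyncio.Semaphore(MAX_IN_FLIGHT)
    return await asyncio.gather(*(extract_edus(s, sem) for s in sessions))

# results = asyncio.run(precache(["Alice: I adopted a dog named Max. ..."]))
```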
After running the main evaluation workflows, use the final evaluation commands to compute standardized metrics for paper reporting. This includes:
- LLM Judge scores (multiple runs for mean ± std)
- BLEU-1, F1, and Exact Match (EM) scores (see the F1/EM sketch at the end of this section)
```bash
# For LoCoMo results
make final-eval-locomo RESULT_FILE=results/emem_locomo_results_xxx.pkl NUM_RUNS=3

# For LongMemEval results
make final-eval-longmemeval RESULT_FILE=results/emem_longmemeval_results_xxx.pkl NUM_RUNS=3
```

Parameters:
- `RESULT_FILE`: Path to the pickle file containing model predictions and ground-truth references (required)
- `NUM_RUNS`: Number of independent LLM judge runs for computing mean and std (default: 3)
- `MAX_CONCURRENT`: Maximum concurrent API requests (default: 50)
Final evaluation results (JSON with detailed metrics) will be saved to the same directory as the input RESULT_FILE.
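For reference, token-level F1 and Exact Match are standard string-overlap metrics; the sketch below shows that computation in isolation, as a simplified stand-in rather than the repository's exact normalization, and it omits the LLM-judge and BLEU-1 parts.

```python
from collections import Counter
import re
import string

def normalize(text: str) -> str:
    """Lowercase, strip punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in set(string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(pred: str, ref: str) -> float:
    return float(normalize(pred) == normalize(ref))

def token_f1(pred: str, ref: str) -> float:
    pred_toks, ref_toks = normalize(pred).split(), normalize(ref).split()
    common = Counter(pred_toks) & Counter(ref_toks)   # multiset overlap
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_toks)
    recall = overlap / len(ref_toks)
    return 2 * precision * recall / (precision + recall)

print(token_f1("Max the golden retriever", "a golden retriever named Max"))  # ~0.86
```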
If you find this work useful, please consider citing our paper:
```bibtex
@article{zhou2025simple,
  title={A Simple Yet Strong Baseline for Long-Term Conversational Memory of LLM Agents},
  author={Zhou, Sizhe},
  journal={arXiv preprint arXiv:2511.17208},
  year={2025}
}
```

Please feel free to open an issue or PR if you have any questions or suggestions.
