Giving your AI agents persistent memory
Store and retrieve conversation context across sessions using Azure Cosmos DB
One of the first complaints you hear from enterprise users of an AI assistant is that it forgets everything between sessions: they have to repeat context every single time they start a new conversation. That is because most agent implementations are stateless - once the conversation ends, the context is gone.
Adding persistent memory to your agents is not complicated and it makes a massive difference to the user experience. I use Azure Cosmos DB for this because it scales well, supports flexible document schemas and integrates cleanly with the rest of the Azure stack.
Set up your environment:

```shell
pip install azure-cosmos openai python-dotenv
```

Then create a `.env` file with your Azure OpenAI and Cosmos DB settings:

```
AZURE_OPENAI_ENDPOINT=https://yourresource.openai.azure.com/
AZURE_OPENAI_API_KEY=<your api key>
COSMOS_ENDPOINT=https://yourcosmosaccount.documents.azure.com:443/
COSMOS_KEY=<your cosmos key>
COSMOS_DATABASE=agentdb
COSMOS_CONTAINER=conversations
```

The memory manager handles saving and loading conversation history:
```python
from azure.cosmos import CosmosClient
from azure.cosmos.exceptions import CosmosResourceNotFoundError
import os
from dotenv import load_dotenv

load_dotenv()


class AgentMemory:
    def __init__(self, user_id: str):
        self.user_id = user_id
        client = CosmosClient(
            url=os.getenv("COSMOS_ENDPOINT"),
            credential=os.getenv("COSMOS_KEY"),
        )
        db = client.get_database_client(os.getenv("COSMOS_DATABASE"))
        self.container = db.get_container_client(os.getenv("COSMOS_CONTAINER"))

    def load_history(self) -> list:
        # Return the stored messages, or an empty list for a first-time user.
        # This assumes the container's partition key is /id.
        try:
            item = self.container.read_item(
                item=self.user_id,
                partition_key=self.user_id,
            )
            return item.get("messages", [])
        except CosmosResourceNotFoundError:
            return []

    def save_history(self, messages: list):
        self.container.upsert_item({
            "id": self.user_id,
            "messages": messages[-20:],  # keep the last 20 messages
        })
```

Now you can build an agent that loads the user's history at the start of each session:
```python
from openai import AzureOpenAI


def chat_with_memory(user_id: str, user_input: str) -> str:
    memory = AgentMemory(user_id=user_id)
    messages = memory.load_history()
    if not messages:
        messages = [
            {
                "role": "system",
                "content": (
                    "You are a helpful enterprise assistant. Remember context "
                    "from previous conversations with this user."
                ),
            }
        ]
    messages.append({"role": "user", "content": user_input})

    openai_client = AzureOpenAI(
        azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
        api_key=os.getenv("AZURE_OPENAI_API_KEY"),
        api_version="2024-12-01-preview",
    )
    response = openai_client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
    )
    assistant_message = response.choices[0].message.content

    messages.append({"role": "assistant", "content": assistant_message})
    memory.save_history(messages)
    return assistant_message
```

A few things to consider
Storing full conversation histories gets expensive, both in storage and in token costs. I limit the history to the last 20 messages, which is enough to maintain useful context without bloating your prompts.
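Message count is a crude proxy for prompt size, since messages vary a lot in length. If you would rather trim on tokens, something like the sketch below works; the four-characters-per-token heuristic and the function names are illustrative assumptions, not part of any library, and you could swap in a real tokeniser for exact counts.

```python
def estimate_tokens(message: dict) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    # Replace with a real tokeniser if you need exact counts.
    return max(1, len(message.get("content", "")) // 4)


def trim_to_budget(messages: list, max_tokens: int = 2000) -> list:
    # Walk backwards from the newest message, keeping whatever fits the budget.
    kept, used = [], 0
    for message in reversed(messages):
        cost = estimate_tokens(message)
        if used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    return list(reversed(kept))
```

You would call `trim_to_budget` just before the completion request, instead of (or in addition to) the fixed `messages[-20:]` slice in `save_history`.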
You also want to think about what you store. If users are discussing sensitive information, make sure your Cosmos DB instance has the right encryption and access controls in place. In an enterprise setting this is usually straightforward as you are already within your Azure tenant.
Another option is to use a summarisation step instead of storing raw messages. Every 10 messages, you call the model to produce a summary of the conversation so far and store that instead. This keeps your token usage down while still maintaining continuity.
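A minimal sketch of that rollup is below. The `compact_history` name and the threshold parameter are illustrative, and the summarisation call is injected as a plain callable so the folding logic stays testable; in production it would call the model with a "summarise this conversation" prompt.

```python
from typing import Callable


def compact_history(messages: list,
                    summarize: Callable[[list], str],
                    threshold: int = 10) -> list:
    # Once the raw history exceeds the threshold, fold the older messages
    # into a single summary message and keep the latest exchange verbatim.
    if len(messages) <= threshold:
        return messages
    older, recent = messages[:-2], messages[-2:]
    summary = summarize(older)
    return [{
        "role": "system",
        "content": f"Summary of earlier conversation: {summary}",
    }] + recent
```

Run it on the history before saving; the stored document then holds one summary message plus the most recent turns instead of the full transcript.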
Persistent memory is one of those features that turns an AI assistant from a novelty into something people genuinely rely on. It's worth the extra setup.