A Brief Discussion on Agent Memory Systems
(After finishing this article and looking back over it, I realized I’ve used the word “memory” so many times that I’m starting to doubt my own grasp of it. Please give it a like; writing tutorials really drains your lifespan… (。 ́︿ ̀。))
Since the moment GPT-3 broke into mainstream awareness, opinions on the memory systems of Large Language Models (LLMs) have been varied. Technically speaking, on one hand, some technologies derived from the Transformer architecture have facilitated the development of memory systems (such as Embedding/Vector Databases), while on the other hand, demand has spurred additional technologies (like Graph Databases). However, from 2023 to the present, the progress of Agent memory systems has been relatively slow.
With the help of these technologies and concepts, I’ve designed a memory system tailored for Cyber Waifus, characterized by automatic storage and retrieval while ensuring tiered persistent memory.
Classification of Memory Concepts
I categorize the memory system of Cyber Waifus into three types: short-term memory, medium-term memory, and long-term memory. Let’s explain these concepts one by one:
Short-term Memory
How short is short-term memory? I consider short-term memory to hold the richest and most primal information. Much like human memory, people forget many details daily yet can recall recent events with great specificity. I treat all content that has not yet been summarized or simplified by the LLM as short-term memory. In simpler terms, for a Cyber Waifu, the context of the current conversation and the memories of today (or yesterday) count as short-term memory. This portion contains the raw records of your interactions with the Cyber Waifu. I also extend the definition to include daily summaries and per-conversation summaries, because in practice recent memories are invoked most frequently and need relatively fine detail, so they are generally loaded unconditionally together. That is, contextual memory is assumed to be essential, and past conversations, or what was discussed yesterday, are the most likely to come up again; their loading frequency is similarly high.
Conversation: You interact with your Cyber Waifu, chat for a bit, and then move on to other tasks, like having a meal. By the time you finish, the conversation may have exceeded its timeout, and you don’t wish to continue the previous topic but rather talk about something else, so the Cyber Waifu opens a new conversation. Segmenting dialogue this way benefits entity extraction and the construction of memory networks in the structured memories built later.
Medium-term Memory
Medium-term memory differs from short-term memory: it consists of simplified and summarized content, meaning some details have already been discarded. But why call it medium-term memory? This again mimics human memory, where an ongoing event is temporarily stored with its contents simplified. Once the event concludes with a result, most of it will be forgotten, often leaving only a general impression, such as “I once did something, and the result was…”, while the specific process and details are forgotten entirely. Medium-term memory of this kind also does not significantly influence the permanent memory network.
As an example: you purchase a brand-new graphics card, the RTX 9090 Ti Super Plus Max Ultra 1024TB, from an online shopping platform. You pay, and the seller dispatches the package. Being keen on this super-powerful graphics card, you keep checking the shipping status. This is a prime example of medium-term memory during an “event.” Several days later the package finally arrives; elated, you can’t wait to unbox and install it, and at that moment you may already have forgotten how many times you tracked the shipment or how many days it took, because you are focused on installing it to see its performance. This exemplifies the forgetting characteristic of medium-term memory: achieving a stage goal may cause many details to drop away, yet the event spans a cycle far exceeding “short-term memory.” You install it in your computer, only to find that running LLMs on this card is painfully slow, and you’re shocked to discover that, thanks to a certain company’s precision knife-work in market segmentation, the RTX 9090 Ti Super Plus Max Ultra 1024TB, despite its massive video memory, has a dismal 128-bit memory bus, even worse than the next generation 10070 Ti Super Plus Max Ultra Gaming’s upgraded 150 bits! While marveling at the company’s craftsmanship, you can’t help but curse: “Fxxk you Nvxdxa!!” Disheartened, you sell it at a loss on the “seafood market” (a second-hand platform), occasionally checking the listing, worried a buyer might return it arbitrarily… Finally, you manage to sell the card, closing the chapter on this saga. Years later, you might remember only the unfortunate specs and that phrase, “Fxxk you Nvxdxa!!”
This illustrates the characteristics of medium-term memory; not all memories become medium-term memories, only certain specific events that fall within a moderate time frame qualify. Additionally, the example provided above is fictional and not adapted from real events!!!
Long-term Memory
At last, we’ve come to this point. Let’s accelerate.
The long-term memory of the human brain is an extremely complex system. Our aim for Cyber Waifus is to try to balance “conciseness” and “functionality.” However, Cyber Waifus are not like humans; when you need them to recall important details, they can’t just be cute and gloss over it. The long-term memory of Cyber Waifus needs to be able to retrace to the original conversation memory or relevant documents. In most cases, it only requires key information prompts. This places high demands on the indexing and querying systems; the key to designing a long-term memory system lies in how to efficiently and accurately automate storage and retrieval, especially the retrieval process.
I categorize long-term memory queries and indexing into two modes: “fuzzy search” and “relational search,” corresponding to Retrieval-Augmented Generation (RAG) and Graph Query respectively. These are just two different retrieval methods; in practice, they are often used in combination. Why do this? Let’s look at a practical example.
Fuzzy search: Imagine you need the Cyber Waifu to recall the days when you spent more than 500 yuan. “More than 500 yuan” is a semantic expression about value/expense, so the memory performs a semantic range retrieval, which is relatively fuzzy. (Since the memory database is general-purpose, we have no dedicated ledger database; otherwise a direct expression of >500 would suffice.) Here semantic fuzzy retrieval predominates, while relational search plays a minor role, merely serving as a supplementary query for entities or further details.
Relational search: Suppose you ask your Cyber Waifu to analyze your current social network and find potential connections among its members. This is where relational search comes into play. Relational search depends on graph databases, which convert memories into entities and establish connections between them. When we perform entity relation-chain retrieval, the query’s triples activate several initial nodes representing the queried entities; the system then retrieves the other entities connected to these primary entities and returns the results. Anything associated with the query, whether temporally, spatially, semantically, or in any other dimension, can be returned. Subsequent hops through the knowledge network can be repeated multiple times, grouped and reordered, or narrowed down to a small cluster of nodes for fuzzy retrieval, and so on.
In the examples provided, it’s easy to see that the two major types of retrieval methods are frequently used in combination. This is also the current trend. In practice, short, medium, and long-term memories are likewise mixed; however, the focus or intensity may differ each time they are invoked.
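To make the combination concrete, here is a minimal, hypothetical sketch in plain Python: vector_search handles the fuzzy side (ranking memories by embedding similarity), graph_neighbors handles the relational side (walking entity links), and hybrid_recall chains the two. All names and data structures here are invented for illustration and stand in for a real vector database and graph database.

# Hypothetical sketch: combining fuzzy (vector) and relational (graph) retrieval
def vector_search(query_embedding, memories, top_k=3):
    """Fuzzy search: rank stored memories by cosine similarity."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
        return dot / norm if norm else 0.0
    scored = sorted(memories, key=lambda m: cosine(query_embedding, m["embedding"]), reverse=True)
    return scored[:top_k]

def graph_neighbors(graph, entity, hops=1):
    """Relational search: expand outward from an entity node, hop by hop."""
    frontier, seen = {entity}, {entity}
    for _ in range(hops):
        frontier = {n for e in frontier for n in graph.get(e, [])} - seen
        seen |= frontier
    return seen - {entity}

def hybrid_recall(query_embedding, memories, graph):
    """Fuzzy-retrieve candidate memories, then enrich each hit with related entities."""
    hits = vector_search(query_embedding, memories)
    return [(m["text"], sorted(graph_neighbors(graph, m["entity"]))) for m in hits]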
Relevant Technologies for Memory Systems
Non-relational Database (NoSQL)
For example, MongoDB serves as a useful non-relational database for storing multimedia data such as text, audio, and images. All of our raw data lives here; the higher-tier stores trace back to it, keeping raw data easily accessible, and subsequent modification, insertion, backup, and maintenance are simple with MongoDB. Its Python SDK is quite robust, and deployment options are solid.
Embedding
A technique for encoding data as numeric vectors. Working with graph and vector databases, it captures multimedia data from multiple perspectives (semantics, acoustic features, visual features, etc.), making the data tractable for databases and LLMs.
Vector Database
Stores vectorized embedding data alongside the corresponding summarized secondary data, enabling semantic search, RAG, and hybrid reranking. Vector databases excel at similarity search across images, text, audio, and more.
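As a toy illustration of what a vector database does under the hood, the sketch below embeds two memories via an OpenAI-compatible embeddings endpoint and compares them by cosine similarity. The model name text-embedding-3-small is just one common choice, and the base_url/key are placeholders for your own backend.

from openai import OpenAI

client = OpenAI(base_url="https://api.openai.com/v1/", api_key="sk-your-key-here")

# Embed two semantically related memories
texts = ["We tracked the package every day.", "I checked the shipping status again."]
resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
a, b = (d.embedding for d in resp.data)

# Cosine similarity: closer to 1.0 means more semantically similar
dot = sum(x * y for x, y in zip(a, b))
norm = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
print(f"cosine similarity: {dot / norm:.3f}")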
Graph Database
Graph databases are adept at handling entity relationships and networks. Through graph databases, LLMs can efficiently store and process relationship networks among entities, which is particularly useful in applications that require understanding and navigating complex associations. Another example is knowledge-graph construction, where agents store knowledge points and their interrelations in a graph database, aided by internet searches, to help you discover potential new content or even anticipate upcoming events.
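The underlying idea can be sketched without a real graph database: reduce memories to (subject, relation, object) triples, then query by entity and walk the links hop by hop. This is purely illustrative; the names and data are invented, and a real deployment would use a dedicated graph store.

# Toy illustration: memories reduced to (subject, relation, object) triples
triples = [
    ("Alice", "friend_of", "Bob"),
    ("Bob", "colleague_of", "Carol"),
    ("Carol", "member_of", "Hiking Club"),
]

def related(entity, triples):
    """Return every triple that touches the queried entity."""
    return [t for t in triples if entity in (t[0], t[2])]

# One hop from "Bob" surfaces both Alice and Carol; chaining hops
# walks the relation network the way a graph query would.
for s, r, o in related("Bob", triples):
    print(f"{s} --{r}--> {o}")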
Other Relevant Technologies
There are also various details in this project that may utilize techniques such as clustering, regression, Q-learning, decision trees, dimensionality reduction, and so forth. As this article is relatively introductory, these topics will be discussed in corresponding advanced tutorials later. (I’ve dug countless deep pits (; ̄O ̄))
Short-term Memory Practice
Having discussed so much, let’s start practicing with the relatively simple short-term memory.
Three Types of Contextual Memory
These three types of memory essentially inherit the earlier methodologies of LangChain; however, once we understand the methods, we can arrange and combine them ourselves, choosing based on the scenario.
Contextual Memory
We already introduced the concept of contextual memory in the second article, so let’s discuss it systematically here. The first type of contextual memory is unrestricted: as long as your program is running and the backend isn’t shut down, the list of memories stays in RAM. If the program closes or the machine loses power, the contextual memory is lost, so it is not persistent memory. Moreover, this first type of context is unlimited in length: the more you chat, the more context accumulates, with no unloading mechanism, so there are many potential pitfalls. Here’s the modified code:
It simply changes this line:
# Change this
chat_history_window = "\n".join([f"{role}: {content}" for role, content in chat_history[-2*4:-1]])
# To this
chat_history_window = "\n".join([f"{role}: {content}" for role, content in chat_history])
# Spot the difference
Complete code is as follows:
from openai import OpenAI

chat_model = OpenAI(
    # Replace this with your backend API address
    base_url="https://api.openai.com/v1/",
    # API Key for authentication
    api_key="sk-SbmHyhKJHt3378h9dn1145141919810D1Fbcd12d"
)

chat_history = []

def get_response_from_llm(question):
    print(chat_history)
    # No window: unlimited context, but exceeding the LLM's maximum context length will cause errors
    chat_history_window = "\n".join([f"{role}: {content}" for role, content in chat_history])
    chat_history_prompt = f"Here is the chat history:\n {chat_history_window}"
    message = [
        {"role": "system", "content": "You are a catgirl! Output in Chinese."},
        {"role": "assistant", "content": chat_history_prompt},
        {"role": "user", "content": question},
    ]
    print(message)
    response = chat_model.chat.completions.create(
        model='gpt-4o-mini',
        messages=message,
        temperature=0.7,
    )
    response_str = response.choices[0].message.content
    return response_str

if __name__ == "__main__":
    while True:
        user_input = input("\nEnter a question or type 'exit' to quit:")
        if user_input.lower() == 'exit':
            print("Goodbye!")
            break
        chat_history.append(('human', user_input))
        response = get_response_from_llm(user_input)
        chat_history.append(('ai', response))
        print(response)
Contextual Window
# Change this to create a window that extracts only the most recent portion of the context from the list, with a fixed size determined by the expression [-2*n:-1]
chat_history_window = "\n".join([f"{role}: {content}" for role, content in chat_history[-2*4:-1]])
By changing n, you control how many conversational rounds are retained. For instance, if you want the AI’s memory to cover three rounds of conversation, change it to [-7:-1]: one round consists of a user message and a bot reply, so three rounds means six entries, and the final :-1 drops the current question, which was already appended to the history.
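A quick sanity check of the slice arithmetic on a toy history (illustrative data only): the main loop has already appended the current question as the last element, so [-7:-1] keeps the six entries, i.e. three full rounds, that precede it.

# Toy history: three finished rounds plus the current question
chat_history = [
    ("human", "q1"), ("ai", "a1"),
    ("human", "q2"), ("ai", "a2"),
    ("human", "q3"), ("ai", "a3"),
    ("human", "q4"),  # current question, passed to the LLM separately
]
print(chat_history[-7:-1])  # six entries: q1..a3, current question excluded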
Contextual Summary
Simply put, once a certain number of rounds are reached, extract the most recent entries, then summarize them to reduce token usage, or prepare for medium and long-term memory. The LLM receives several summaries and continues the conversation based on the summarized context. This method will sacrifice some information and should be used in specific situations.
We first add a new function to summarize chat records:
def summarize_chat_history(chat_history_window):
    """
    Summarize recent chat records.
    :param chat_history_window: Recent chat records.
    :return: Summary string.
    """
    # Create the summary prompt
    summary_prompt = f"Please summarize the following conversation content:\n{chat_history_window}"
    print(f"Generating summary for the following content: {summary_prompt}")
    # Call the LLM to generate the summary
    summary_response = chat_model.chat.completions.create(
        model='gpt-4o-mini',
        messages=[{"role": "user", "content": summary_prompt}],
        temperature=0.7,
    )
    # Get the summary content
    summary_str = summary_response.choices[0].message.content
    return summary_str
Next, we need to add some counting and checking in the earlier LLM response function so that the summarized content can be included in the prompt message:
# If the chat history reaches the summary threshold
if len(chat_history) >= SUMMARY_THRESHOLD:
    # Retrieve the most recent chat window
    chat_history_window = "\n".join([f"{role}: {content}" for role, content in chat_history[-SUMMARY_THRESHOLD*2-1:-1]])
    # Generate the summary
    summary = summarize_chat_history(chat_history_window)
    # Add the summary to the summary record
    summary_history.append(summary)
    # Keep only the last chat record
    chat_history = chat_history[-1:]
Thus, the LLM response function will transform to:
def get_response_from_llm(question):
    """
    Get the response from the LLM.
    :param question: User's question.
    :return: LLM's response.
    """
    global chat_history, summary_history
    # If the chat history reaches the summary threshold
    if len(chat_history) >= SUMMARY_THRESHOLD:
        # Retrieve the most recent chat window
        chat_history_window = "\n".join([f"{role}: {content}" for role, content in chat_history[-SUMMARY_THRESHOLD*2-1:-1]])
        # Generate the summary
        summary = summarize_chat_history(chat_history_window)
        # Add the summary to the summary record
        summary_history.append(summary)
        # Keep only the last chat record
        chat_history = chat_history[-1:]
    # Get the recent chat history window
    chat_history_window = "\n".join([f"{role}: {content}" for role, content in chat_history[-2*SUMMARY_THRESHOLD:-1]])
    chat_history_prompt = f"Here is the chat history:\n {chat_history_window}"
    # Create the message list
    message = [
        {"role": "system", "content": "You are a catgirl! Output in Chinese."},
        {"role": "assistant", "content": chat_history_prompt},
        {"role": "user", "content": question},
    ]
    # If there are summary records, add them to the message list
    if summary_history:
        summary_prompt = "\n".join(summary_history)
        message.insert(1, {"role": "assistant", "content": f"Summary of previous conversations:\n{summary_prompt}"})
    # Call the LLM to get the response
    response = chat_model.chat.completions.create(
        model='gpt-4o',
        messages=message,
        temperature=0.7,
    )
    print(f"message: {message}")
    # Get the response content
    response_str = response.choices[0].message.content
    return response_str
The final complete code looks like this:
from openai import OpenAI

# Initialize the OpenAI client
chat_model = OpenAI(
    # Replace this with your backend API address
    base_url="https://api.openai.com/v1/",
    # API Key for authentication
    api_key="sk-SbmHyhKJHt3378h9dn1145141919810D1Fbcd12d"
)

# Lists for chat records and summary records
chat_history = []
summary_history = []
SUMMARY_THRESHOLD = 4  # Summarize once the history holds this many entries (two entries per round)

def summarize_chat_history(chat_history_window):
    """
    Summarize recent chat records.
    :param chat_history_window: Recent chat records.
    :return: Summary string.
    """
    # Create the summary prompt
    summary_prompt = f"Please summarize the following conversation content:\n{chat_history_window}"
    print(f"Generating summary for the following content: {summary_prompt}")
    # Call the LLM to generate the summary
    summary_response = chat_model.chat.completions.create(
        model='gpt-4o-mini',
        messages=[{"role": "user", "content": summary_prompt}],
        temperature=0.7,
    )
    # Get the summary content
    summary_str = summary_response.choices[0].message.content
    return summary_str

def get_response_from_llm(question):
    """
    Get the response from the LLM.
    :param question: User's question.
    :return: LLM's response.
    """
    global chat_history, summary_history
    # If the chat history reaches the summary threshold
    if len(chat_history) >= SUMMARY_THRESHOLD:
        # Retrieve the most recent chat window
        chat_history_window = "\n".join([f"{role}: {content}" for role, content in chat_history[-SUMMARY_THRESHOLD*2-1:-1]])
        # Generate the summary
        summary = summarize_chat_history(chat_history_window)
        # Add the summary to the summary record
        summary_history.append(summary)
        # Keep only the last chat record
        chat_history = chat_history[-1:]
    # Get the recent chat history window
    chat_history_window = "\n".join([f"{role}: {content}" for role, content in chat_history[-2*SUMMARY_THRESHOLD:-1]])
    chat_history_prompt = f"Here is the chat history:\n {chat_history_window}"
    # Create the message list
    message = [
        {"role": "system", "content": "You are a catgirl! Output in Chinese."},
        {"role": "assistant", "content": chat_history_prompt},
        {"role": "user", "content": question},
    ]
    # If there are summary records, add them to the message list
    if summary_history:
        summary_prompt = "\n".join(summary_history)
        message.insert(1, {"role": "assistant", "content": f"Summary of previous conversations:\n{summary_prompt}"})
    # Call the LLM to get the response
    response = chat_model.chat.completions.create(
        model='gpt-4o',
        messages=message,
        temperature=0.7,
    )
    print(f"message: {message}")
    # Get the response content
    response_str = response.choices[0].message.content
    return response_str

if __name__ == "__main__":
    while True:
        user_input = input("\nEnter a question or type 'exit' to quit:")
        if user_input.lower() == 'exit':
            print("Goodbye!")
            break
        # Add user input to the chat history
        chat_history.append(('human', user_input))
        # Get the LLM response
        response = get_response_from_llm(user_input)
        # Add the LLM response to the chat history
        chat_history.append(('ai', response))
        # Print the LLM response
        print(response)
Database and Memory Persistence
Deploying MongoDB
We will deploy MongoDB using Docker; feel free to choose any deployment method. Here we install Docker Engine; if you prefer the desktop version, please configure that yourself. Most commands below involve Docker Engine and Docker Compose.
First, here’s the official installation guide: Install Docker
Using the command line, input each command in order:
# Add Docker's official GPG key:
sudo apt-get update
sudo apt-get install ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/debian/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc
# Add the repository to Apt sources:
echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/debian \
$(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
If everything goes as expected, apt will print messages indicating that Docker installed successfully.
After the installation, run this command to test if Docker is functioning correctly:
sudo docker run hello-world
If Docker is working normally, this command prints a confirmation message and exits. Then check whether Compose was installed along with Docker; nowadays Docker ships with Compose by default, but if not, please install it yourself.
sudo docker compose version
Create a directory in any path you can access and name it cyberai; inside it we will create the docker-compose.yaml file to manage containers conveniently.
sudo mkdir cyberai
cd cyberai
Then, use nano, vim, or vscode to create a docker-compose.yaml file. I’ll use vim, but whichever method you prefer is fine. If you’re a beginner or haven’t installed vim, you might want to stick with vscode for ease of use.
sudo vim docker-compose.yaml
Next, copy the configuration below and paste it into the editor (if you cannot paste immediately, press the “i” key first to enter insert mode, then paste).
version: '3.8' # Specify the version of the Docker Compose file
services:
  mongodb:
    image: mongo:latest # Use the latest MongoDB image
    volumes:
      - ./data/db:/data/db # Bind ./data/db to /data/db in the container for database data persistence
      - ./data/backup:/data/backup # Bind ./data/backup to /data/backup in the container for backups
    environment:
      - MONGO_INITDB_ROOT_USERNAME=admin # Root username for MongoDB; change as needed
      - MONGO_INITDB_ROOT_PASSWORD=secret # Root password for MongoDB; change as needed
    networks:
      - internal_network # Connect the service to an internal network named internal_network
    restart: always # Automatically restart the container after failure
    ports:
      - "127.0.0.1:27017:27017" # Map the container's port 27017 to the host's 127.0.0.1:27017, exposing it only locally
networks:
  internal_network:
    driver: bridge # Define the internal network, using the bridge driver
After pasting, press the Esc key to exit insert mode, then press Shift+; to type a colon and enter :wq to save and quit. In short: press Esc, type :wq, and press Enter to return to the command line.
Whenever you start or shut down this stack, either pass the absolute path of docker-compose.yaml or cd into its folder first; otherwise, Compose cannot find the YAML file. Inside the folder, run this command:
docker compose up -d
You will see it download and unpack the image; once done, it creates and starts the container.
You can use these commands to observe logs or manage containers:
# List all containers
docker ps -a
# Display real-time logs for the container, copy ID or name from the list returned earlier
docker logs -f <Container-ID-or-Name>
# Shut down this compose
docker compose down
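If you want to confirm the database is reachable before writing any Python, you can open a MongoDB shell inside the container. This assumes the mongo:latest image ships the mongosh client (recent images do) and uses the credentials from the compose file above:

# Open a MongoDB shell inside the running container
docker exec -it <Container-ID-or-Name> mongosh -u admin -p secret --authenticationDatabase admin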
We also need to install the corresponding Python library, pymongo:
pip install pymongo
Next, we’ll create a new Python file; let’s call it pymongo_test.py:
# pip install pymongo
import os

from pymongo import MongoClient

def test_mongodb_connection():
    # Retrieve MongoDB configuration from environment variables
    mongo_host = os.getenv('MONGO_HOST', 'localhost')
    mongo_port = int(os.getenv('MONGO_PORT', 27017))
    mongo_user = os.getenv('MONGO_USER', 'admin')
    mongo_password = os.getenv('MONGO_PASSWORD', 'secret')
    # Use (and implicitly create) a MongoDB database named chat_history
    mongo_db_name = os.getenv('MONGO_DB_NAME', 'chat_history')
    # Create the MongoDB client
    client = MongoClient(
        host=mongo_host,
        port=mongo_port,
        username=mongo_user,
        password=mongo_password
    )
    try:
        # Connect to the specified database
        db = client[mongo_db_name]
        # Insert a document into a test collection
        test_collection = db['daily']
        test_document = {"name": "test", "value": 123}
        test_collection.insert_one(test_document)
        # Retrieve the list of collections to verify the connection and the insert
        collections = db.list_collection_names()
        print(f"Successfully connected to database '{mongo_db_name}', collection list: {collections}")
    except Exception as e:
        print(f"Failed to connect to MongoDB database: {e}")
    finally:
        # Close the MongoDB client
        client.close()

if __name__ == "__main__":
    test_mongodb_connection()
Running this script will print several lines of output; if there are no errors, the connection to the database succeeded.
Short-term Memory and Raw Data
We are now equipped with the external database and LLM. Next, we will prepare to process data with LLM and store it in the database. Some of the stored data will serve as short-term memory, while another portion will be used for higher-tier databases for retrospection and refinement.
My plan involves employing a session management mechanism, where each session has a timeout—for instance, we may set a session duration of 30 minutes; if there is no dialogue beyond this time, the backend will archive the expired session. Of course, this timeout can be freely modified. This approach benefits the independent management of sessions, facilitating retrospection and subsequent retrieval/organization.
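As a minimal sketch of that timeout idea (names like SESSION_TIMEOUT and current_session_id are illustrative, not the final project code), each incoming message reuses the session unless the gap since the last message exceeds the limit:

from datetime import datetime, timedelta
import uuid

SESSION_TIMEOUT = timedelta(minutes=30)  # freely adjustable

session_id = str(uuid.uuid4())
last_message_time = datetime.now()

def current_session_id():
    """Reuse the session if the gap is short; otherwise start a new one."""
    global session_id, last_message_time
    now = datetime.now()
    if now - last_message_time > SESSION_TIMEOUT:
        # In the real system the expired session would be archived here
        session_id = str(uuid.uuid4())
    last_message_time = now
    return session_id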
As the functional modules grow in complexity, we will now use custom Python modules and classes to separate different pieces of functionality and group related ones together. This eases future maintenance, feature additions, and testing.
Custom Modules and Classes
Since this is a “from scratch” tutorial, I’ll briefly introduce custom modules and classes—experienced users can skip this.
Creating a Custom Module
In Python, a custom module is essentially a file containing Python code. By creating custom modules, you can organize related functional code together for reusability and maintainability.
- Create a Python file: First, create a new Python file in your project directory. The file name will serve as the module’s name. For example, create a file named mymodule.py.
- Write module code: Define functions, variables, and classes in that file. For example:

# mymodule.py
def add(a, b):
    return a + b

def subtract(a, b):
    return a - b

class Calculator:
    def __init__(self):
        self.value = 0

    def add(self, amount):
        self.value += amount

    def subtract(self, amount):
        self.value -= amount

    def get_value(self):
        return self.value
Using a Module
To use the custom module, import it in the other files of your project. Assume you have a file named main.py and want to use functionality from mymodule:
# main.py
# Note: if your custom module and main script are not in the same folder, make sure the module is on Python's import path; Python searches many paths automatically, but keeping an accurate relative layout is advisable.
import mymodule

# Use functions
result = mymodule.add(5, 3)
print(f"Addition: {result}")

# Use the class: create an instance named calc (the name is arbitrary) of the Calculator class from our module
calc = mymodule.Calculator()
# Subsequent calls use the instance name plus the method name defined in the class
calc.add(10)
calc.subtract(3)
print(f"Calculator Value: {calc.get_value()}")
Creating and Using Classes
Classes form the basis of object-oriented programming in Python. Using classes, you can create objects to encapsulate data and functionality.
Defining a Class
In Python, classes are defined using the class keyword. Here’s a simple example:
class Dog:
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def bark(self):
        print(f"{self.name} says woof!")

    def get_age(self):
        return self.age
Using a Class
Once a class is defined, you can create instances (objects) of that class and utilize their methods and properties:
# Create an instance of the Dog class
my_dog = Dog("Buddy", 3)
# Call an instance method
my_dog.bark()  # Output: Buddy says woof!
# Retrieve a property
print(f"My dog is {my_dog.get_age()} years old.")
Organizing Modules and Classes
Combining modules and classes can provide a clear structure for your project. Each module can contain one or more related classes, making code management easier. For the next steps, we will create a custom module to manage MongoDB database operations; within this module, we will define a class with functions for writing or querying the database.
Creating a MongoDB Custom Module
We will start organizing our project folder, so be sure to back it up—whether using Git or traditional duplicate copies, do remember to back up.
In the root directory of your project, create a new folder called cyberaimodules. This is where we will create custom modules and import them into the main script.
This is what our directory structure will look like:
your_project/
│
├── main.py
└── cyberaimodules/
├── __init__.py
├── cyberaimongo.py # The custom module we will create for handling database operations
└── module2.py
- your_project/ is your project root directory.
- main.py is the main program file.
- cyberaimodules/ is the directory for our custom modules.
- __init__.py is a special file that marks the cyberaimodules directory as a Python package. Although it’s currently not needed for functionality, it’s still better to include it; it should be an empty file.
- cyberaimongo.py and module2.py are the Python module files created within the cyberaimodules directory.
We will create a cyberaimongo.py module in the cyberaimodules folder. This module will manage MongoDB, and it must cover three essentials: initialization, writing, and querying. Start with the necessary imports:
import json
from pymongo import MongoClient
import uuid
from datetime import datetime
Next, we define the module’s initialization:
class MongoManager:
    def __init__(self, host='mongodb', port=27017, username='admin', password='secret', db_name='chat_history'):
        self.client = MongoClient(f'mongodb://{username}:{password}@{host}:{port}/')
        self.db = self.client[db_name]
The Role of __init__
- Automatic Initialization: When you create a MongoManager object, say manager = MongoManager(), Python automatically calls the __init__ function. It’s like getting into a car and turning on the engine, preparing everything.
- Setting Initial State: The __init__ function sets the initial state of the newly created object. It stores the parameters passed in (database address, username, password, etc.) for later use.
- Default Parameter Values: The function defines some defaults, such as host='mongodb' and port=27017. If you do not specify these parameters, the defaults are used; it’s like a television that comes with default channels and volume, so it works without any adjustment.
The Role of self
- Referring to the Current Object: self is a special variable that always points to the current instance of the object. It allows class methods to access and modify the object’s properties.
- Property Storage: When you write self.client = ... in __init__, you’re adding a property named client to the object and storing the MongoDB connection there. You can access this property through the object at any time (for example, manager.client).
- Consistency: self ensures that you access the same data across the object’s different methods. For example, the self.db created in __init__ can be used in other methods for database operations.
__init__ and self are like installing the engine, wheels, and steering wheel in a new object (like a toy car): __init__ assembles everything in place, while self makes sure you can find and use those components later. Each MongoManager object you create is independent and fully equipped for use.
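A short hypothetical usage of those defaults, assuming the MongoManager class sketched above is importable: the first call relies entirely on the default parameters, while the second overrides two of them.

# Uses the defaults: host='mongodb', port=27017, username='admin', ...
manager_a = MongoManager()
# Overrides the host and password, keeps the rest of the defaults
manager_b = MongoManager(host='localhost', password='my-secret')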
Currently, I use JSON to store chat records, but feel free to use other types as you see fit.
We will create a function to generate JSON for data that will exist in the database:
def generate_chat_json(self, ai_reply, human_message, session_id, timestamp=None):
    """
    Generate a JSON string containing chat information.
    Parameters:
    - ai_reply (str): AI's reply.
    - human_message (str): Human's message.
    - session_id (str): Session's unique identifier.
    - timestamp (datetime, optional): Timestamp of the message. Defaults to the current time if not provided.
    Returns:
    - str: JSON string containing the chat information.
    """
    # Use the current time if no timestamp is provided
    if timestamp is None:
        timestamp = datetime.now()
    # Format the timestamp as an ISO 8601 string (without seconds)
    timestamp_str = timestamp.strftime("%Y-%m-%dT%H:%M")
    # Create a dictionary holding the chat data
    chat_data = {
        'message_id': str(uuid.uuid4()),  # Generate a unique message ID
        'ai_reply': ai_reply,  # Store the AI's reply
        'human_message': human_message,  # Store the human's message
        'timestamp': timestamp_str,  # Store the formatted timestamp
        'session_id': session_id  # Store the session ID
    }
    # Convert the dictionary to a JSON string and return it
    # ensure_ascii=False keeps non-ASCII characters (like Chinese) readable
    # indent=4 makes the generated JSON easier to read
    return json.dumps(chat_data, ensure_ascii=False, indent=4)
Next, we’ll create a function to write a single dialog record to the database:
def insert_chat(self, collection_name, ai_reply, human_message, session_id, timestamp=None):
    """
    Insert a chat record into the specified MongoDB collection.
    Parameters:
    - collection_name (str): The name of the collection in MongoDB.
    - ai_reply (str): AI's reply.
    - human_message (str): Human's message.
    - session_id (str): Session's unique identifier.
    - timestamp (datetime, optional): Timestamp of the message. Defaults to the current time if not provided.
    Returns:
    - ObjectId: The unique ID of the inserted document.
    """
    # Generate the chat record's JSON data (called on self, so self is passed implicitly)
    json_data = self.generate_chat_json(ai_reply, human_message, session_id, timestamp)
    # Get the specified collection object
    collection = self.db[collection_name]
    # Ensure json_data is a dict for MongoDB insertion
    if isinstance(json_data, str):
        json_data = json.loads(json_data)
    # Insert the chat record into the collection and return the inserted document's ID
    return collection.insert_one(json_data).inserted_id
We should also add some functions for basic queries:
# Retrieve all data within a specified time range
def get_mem_in_time_range(self, collection_name, start_time, end_time):
    """
    Retrieve all data within a specified time range from the designated collection.
    Parameters:
    - collection_name (str): Name of the collection.
    - start_time (datetime): Start time.
    - end_time (datetime): End time.
    Returns:
    - list: List containing the query results.
    """
    collection = self.db[collection_name]
    # Use find to query documents in the range; format the bounds the same
    # way the stored timestamps are formatted so string comparison stays consistent
    cursor = collection.find({
        'timestamp': {
            '$gte': start_time.strftime("%Y-%m-%dT%H:%M"),
            '$lte': end_time.strftime("%Y-%m-%dT%H:%M")
        }
    })
    # Convert the query results into a list and return it
    return list(cursor)
# Retrieve data by ID
def get_data_by_id(self, collection_name, id):
    """
    Retrieve data from the specified collection by document ID.
    Parameters:
    - collection_name (str): Name of the collection.
    - id (ObjectId): Unique ID of the document.
    Returns:
    - dict or None: The queried document; None if not found.
    """
    collection = self.db[collection_name]
    # Use find_one to fetch the document with the specified ID
    data = collection.find_one({'_id': id})
    return data
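Before assembling the full module, here is a quick hypothetical usage of the two query helpers, pulling everything the daily collection recorded in the last 24 hours and then re-fetching one document by its ID (assumes a reachable MongoDB and the class above):

from datetime import datetime, timedelta

manager = MongoManager(host='localhost')
now = datetime.now()
# All chat records from the past day
recent = manager.get_mem_in_time_range('daily', now - timedelta(days=1), now)
print(f"{len(recent)} chat records in the last day")
if recent:
    # Trace one record back by its MongoDB _id
    print(manager.get_data_by_id('daily', recent[0]['_id']))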
The overall code now looks like this:
import json
from pymongo import MongoClient
import uuid
from datetime import datetime

class MongoManager:
    def __init__(self, host='mongodb', port=27017, username='admin', password='secret', db_name='chat_history'):
        self.client = MongoClient(f'mongodb://{username}:{password}@{host}:{port}/')
        self.db = self.client[db_name]

    def generate_chat_json(self, ai_reply, human_message, session_id, timestamp=None):
        if timestamp is None:
            timestamp = datetime.now()
        timestamp_str = timestamp.strftime("%Y-%m-%dT%H:%M")
        chat_data = {
            'message_id': str(uuid.uuid4()),
            'ai_reply': ai_reply,
            'human_message': human_message,
            'timestamp': timestamp_str,
            'session_id': session_id
        }
        return json.dumps(chat_data, ensure_ascii=False, indent=4)

    def insert_chat(self, collection_name, ai_reply, human_message, session_id, timestamp=None):
        # generate_chat_json is called on self, so self is passed implicitly
        json_data = self.generate_chat_json(ai_reply, human_message, session_id, timestamp)
        collection = self.db[collection_name]
        if isinstance(json_data, str):
            json_data = json.loads(json_data)
        return collection.insert_one(json_data).inserted_id

    def get_mem_in_time_range(self, collection_name, start_time, end_time):
        collection = self.db[collection_name]
        # Format the bounds the same way stored timestamps are formatted
        cursor = collection.find({
            'timestamp': {
                '$gte': start_time.strftime("%Y-%m-%dT%H:%M"),
                '$lte': end_time.strftime("%Y-%m-%dT%H:%M")
            }
        })
        return list(cursor)

    def get_data_by_id(self, collection_name, id):
        collection = self.db[collection_name]
        data = collection.find_one({'_id': id})
        return data
At this point, we have temporarily concluded the custom module section; next, we will import and use them in the main function.
import os
import uuid
# Import the custom module we created earlier
from cyberaimodules import cyberaimongo
For now, we use temporary environment variables for testing; it’s best to securely store sensitive data in a real production environment to avoid embedding credentials directly into the code.
# Read MongoDB settings from environment variables, with defaults for testing
mongo_host = os.getenv('MONGO_HOST', 'localhost')
mongo_port = int(os.getenv('MONGO_PORT', 27017))
mongo_user = os.getenv('MONGO_USER', 'admin')
mongo_password = os.getenv('MONGO_PASSWORD', 'secret')
# The name of the database will be chat_history
mongo_db_name = os.getenv('MONGO_DB_NAME', 'chat_history')
# Initialize MongoDB client
mongo_manager = cyberaimongo.MongoManager(
host=mongo_host,
port=mongo_port,
username=mongo_user,
password=mongo_password,
db_name=mongo_db_name,
)
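If you would rather not rely on the hard-coded defaults at all, one option is to set the variables in your shell before launching the script (illustrative values matching the compose file earlier):

export MONGO_HOST=localhost
export MONGO_PORT=27017
export MONGO_USER=admin
export MONGO_PASSWORD=secret
export MONGO_DB_NAME=chat_history
python main.py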
Next, we will briefly test the database writing operation for a single dialogue during the chat, while more complex conversation and summarization functionalities will wait for the next post.
We will add some content to the get_response_from_llm function to test whether the module we just created works correctly (make sure your MongoDB accepts connections on the local port):
# Attempt to insert this chat record after the LLM responds
chat_id = mongo_manager.insert_chat(
    collection_name='daily',
    ai_reply=response_str,
    human_message=question,
    # Temporarily generate a session ID for testing; timeout-based session management comes later
    session_id=str(uuid.uuid4()),
)
# Use the returned document ID to query the record back, checking that it can be traced
query = mongo_manager.get_data_by_id(
    collection_name='daily',
    id=chat_id
)
# If found, print it
if query:
    print(f"Chat data: {query}")
The complete code now looks like this:
# main.py
import os
import uuid

from openai import OpenAI

# Import the custom module we created earlier
from cyberaimodules import cyberaimongo

mongo_host = os.getenv('MONGO_HOST', 'localhost')
mongo_port = int(os.getenv('MONGO_PORT', 27017))
mongo_user = os.getenv('MONGO_USER', 'admin')
mongo_password = os.getenv('MONGO_PASSWORD', 'secret')
# The database will be named chat_history
mongo_db_name = os.getenv('MONGO_DB_NAME', 'chat_history')

# Initialize the MongoDB manager
mongo_manager = cyberaimongo.MongoManager(
    host=mongo_host,
    port=mongo_port,
    username=mongo_user,
    password=mongo_password,
    db_name=mongo_db_name,
)

# Initialize the OpenAI client
chat_model = OpenAI(
    # Replace this with your backend API address
    base_url="https://api.openai.com/v1/",
    # API Key for authentication
    api_key="sk-SbmHyhKJHt3378h9dn1145141919810D1Fbcd12d"
)

# Lists for chat records and summary records
chat_history = []
summary_history = []
SUMMARY_THRESHOLD = 4  # Summarize once the history holds this many entries (two entries per round)

def summarize_chat_history(chat_history_window):
    """
    Summarize recent chat records.
    :param chat_history_window: Recent chat records.
    :return: Summary string.
    """
    # Create the summary prompt
    summary_prompt = f"Please summarize the following conversation content:\n{chat_history_window}"
    print(f"Generating summary for the following content: {summary_prompt}")
    # Call the LLM to generate the summary
    summary_response = chat_model.chat.completions.create(
        model='gpt-4o-mini',
        messages=[{"role": "user", "content": summary_prompt}],
        temperature=0.7,
    )
    # Get the summary content
    summary_str = summary_response.choices[0].message.content
    return summary_str

def get_response_from_llm(question):
    """
    Get the response from the LLM.
    :param question: User's question.
    :return: LLM's response.
    """
    global chat_history, summary_history
    # If the chat history reaches the summary threshold
    if len(chat_history) >= SUMMARY_THRESHOLD:
        # Retrieve the most recent chat window
        chat_history_window = "\n".join([f"{role}: {content}" for role, content in chat_history[-SUMMARY_THRESHOLD*2-1:-1]])
        # Generate the summary
        summary = summarize_chat_history(chat_history_window)
        # Add the summary to the summary record
        summary_history.append(summary)
        # Keep only the last chat record
        chat_history = chat_history[-1:]
    # Get the recent chat history window
    chat_history_window = "\n".join([f"{role}: {content}" for role, content in chat_history[-2*SUMMARY_THRESHOLD:-1]])
    chat_history_prompt = f"Here is the chat history:\n {chat_history_window}"
    # Create the message list
    message = [
        {"role": "system", "content": "You are a catgirl! Output in Chinese."},
        {"role": "assistant", "content": chat_history_prompt},
        {"role": "user", "content": question},
    ]
    # If there are summary records, add them to the message list
    if summary_history:
        summary_prompt = "\n".join(summary_history)
        message.insert(1, {"role": "assistant", "content": f"Summary of previous conversations:\n{summary_prompt}"})
    # Call the LLM to get the response
    response = chat_model.chat.completions.create(
        model='gpt-4o',
        messages=message,
        temperature=0.7,
    )
    print(f"message: {message}")
    # Get the response content
    response_str = response.choices[0].message.content
    # Insert this chat record into MongoDB
    chat_id = mongo_manager.insert_chat(
        collection_name='daily',
        ai_reply=response_str,
        human_message=question,
        # Temporarily generate a session ID for testing
        session_id=str(uuid.uuid4()),
    )
    # Query the record back by its ID to check that it can be traced
    query = mongo_manager.get_data_by_id(
        collection_name='daily',
        id=chat_id
    )
    if query:
        print(f"Chat data: {query}")
    return response_str

if __name__ == "__main__":
    while True:
        user_input = input("\nEnter a question or type 'exit' to quit:")
        if user_input.lower() == 'exit':
            print("Goodbye!")
            break
        # Add user input to the chat history
        chat_history.append(('human', user_input))
        # Get the LLM response
        response = get_response_from_llm(user_input)
        # Add the LLM response to the chat history
        chat_history.append(('ai', response))
        # Print the LLM response
        print(response)
Our preliminary discussion of memory has now roughly come to a close. In the next installment, we will continue to build on this foundation and fully implement short-term and regular memory. Essentially, every time the LLM converses, it will be able to recall what happened yesterday or just moments ago, so that our Cyber Waifus can gradually develop a memory. Wishing you all an enjoyable time in the Cyberpunk world!