Agent with memory

Human-Inspired Memory in Chatbots: A Pinecone and Ollama Project

How I created an AI that forgets like us—and remembers what matters.

Have you ever wondered what it’d be like if a chatbot could remember like a human? Not just store every byte of data indefinitely like a cold, unfeeling database, but mimic how our brains prioritize important moments and let the mundane fade away? I did, and that curiosity led me to build a project I’m excited to share: a chatbot with human-like memory powered by Pinecone and Ollama.

In this article, I’ll walk you through why I built it, how it works, and how you can try it yourself. Spoiler: It involves exponential decay, vector embeddings, and a dash of psychology-inspired hacking.

The Inspiration: Human Memory Isn’t Perfect

Human memory is messy and marvelous. We don’t recall every mundane detail—we let the noise drift away while treasuring moments that hit hard: a friend’s joke, a big win, or that perfect pizza night. The “forgetting curve” (shoutout to Hermann Ebbinghaus) shows this fading happens exponentially over time—unless something’s reinforced or emotionally charged.
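For the mathematically inclined: Ebbinghaus-style retention is often written as a simple exponential (this is the textbook form, not anything specific to this project):

```latex
R(t) = e^{-t/S}
```

where R is retention at time t and S is the memory's "strength" or stability: reinforcement raises S, which slows the fade.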

Most AIs, though, are the opposite—perfect recall, no nuance. I wanted a chatbot that forgets the fluff and clings to the good stuff, blending technology with a human touch.

The Tech Stack: Pinecone, Ollama, and Friends

Here’s what powers this experiment:

  • Ollama: Runs llama3.1:latest locally for snappy, natural responses.
  • Pinecone: A vector database for storing and searching conversation embeddings.
  • LangChain: Ties embeddings and Pinecone together seamlessly.
  • Sentence Transformers: Turns text into 384-dimensional vectors with all-MiniLM-L6-v2.

The mission: store chats as embeddings, let them degrade over time, and boost the ones worth keeping—all feeding into Ollama for context-aware replies.

How It Works: Code That Forgets and Remembers

Let’s peek at the guts of llm-memory.py from the GitHub repo.

Storing Memories

Every chat gets saved as an embedding in Pinecone. The store_conversation function captures the exchange, calculates an importance score, and amplifies the embedding:

# Store conversation with decay and importance
import uuid
from datetime import datetime

def store_conversation(user_input, response):
    conversation_text = f"User: {user_input}\nAI: {response}"
    importance = calculate_importance(user_input, response)
    timestamp = datetime.now().isoformat()
    
    metadata = {
        "text": conversation_text,
        "timestamp": timestamp,
        "importance": importance
    }
    
    # Embed the text
    embedding = embedding_model.embed_documents([conversation_text])[0]
    # Amplify embedding based on importance (optional, for stronger imprint)
    amplified_embedding = [v * importance for v in embedding]
    
    # Upsert straight into the Pinecone index with a unique ID.
    # (LangChain's add_texts would re-embed the text and discard our
    # amplified vector, so we use the raw Index object here.)
    index.upsert(vectors=[(str(uuid.uuid4()), amplified_embedding, metadata)])

Forgetting with Exponential Decay

To mimic the forgetting curve, I wrote calculate_decay_factor. Older memories lose relevance over time—here’s how:

import numpy as np
from datetime import datetime

# Exponential decay function for memory degradation
def calculate_decay_factor(timestamp, half_life_days=30):
    now = datetime.now()
    time_diff = (now - datetime.fromisoformat(timestamp)).total_seconds() / (24 * 3600)  # Age in days
    # The ln(2) factor makes half_life_days a true half-life:
    # the weight halves every half_life_days days
    decay = np.exp(-np.log(2) * time_diff / half_life_days)
    return decay

After 30 days (the default half-life), a memory’s weight drops to 50%. In retrieve_context, this decay scales the similarity score, pushing faded memories down or out.
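Here's a toy version of that weighting, with the Pinecone query swapped for an in-memory list so the ranking logic stands alone (the exact combination retrieve_context uses is my assumption; this simply multiplies similarity, decay, and importance):

```python
from datetime import datetime, timedelta

def decay_factor(timestamp, now, half_life_days=30):
    # True half-life decay: weight halves every half_life_days
    age_days = (now - timestamp).total_seconds() / 86400
    return 0.5 ** (age_days / half_life_days)

def rank_memories(memories, now):
    # Effective score = raw similarity x decay x importance
    scored = [
        (m["similarity"] * decay_factor(m["timestamp"], now) * m["importance"], m)
        for m in memories
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored

now = datetime(2025, 3, 1)
memories = [
    {"text": "User: I love pizza", "similarity": 0.80,
     "timestamp": now - timedelta(days=30), "importance": 2.0},
    {"text": "User: Hi", "similarity": 0.85,
     "timestamp": now - timedelta(days=30), "importance": 1.0},
]
for score, m in rank_memories(memories, now):
    print(round(score, 3), m["text"])
```

Both memories are 30 days old, so each decays to half weight, but the importance boost keeps the pizza memory on top despite its lower raw similarity.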

Remembering What Matters

The calculate_importance function gives a score (1.0–5.0) based on keywords like “important” or “love”:

# Function to determine importance (simplified example)
def calculate_importance(user_input, response):
    # Example: Importance based on keywords or length (could use sentiment analysis or user tagging)
    keywords = ["important", "urgent", "love", "hate", "remember"]
    base_importance = 1.0
    for keyword in keywords:
        if keyword in user_input.lower() or keyword in response.lower():
            base_importance += 1.0
    return min(base_importance, 5.0)  # Cap at 5 for simplicity

High-scoring memories get boosted—both in their embedding and retrieval weight—making them stickier, like a human recalling a pivotal moment.
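To see what the keyword heuristic actually produces, here is the function above again with a few probes:

```python
def calculate_importance(user_input, response):
    # +1.0 per matched keyword, starting at 1.0, capped at 5.0
    keywords = ["important", "urgent", "love", "hate", "remember"]
    base_importance = 1.0
    for keyword in keywords:
        if keyword in user_input.lower() or keyword in response.lower():
            base_importance += 1.0
    return min(base_importance, 5.0)

print(calculate_importance("Nice weather today", "Indeed!"))              # 1.0
print(calculate_importance("I love pizza", "Noted!"))                     # 2.0
print(calculate_importance("Remember this, it's important!", "Will do"))  # 3.0
```

A plain remark stays at the baseline, while each matched keyword nudges the memory one step closer to "unforgettable."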

Contextual Responses

When you chat, retrieve_context grabs the top memories (factoring in similarity, decay, and importance), and Ollama uses them to reply. The prompt might look like:

Previous conversation (faded by time, stronger if important):
User: I love pizza
AI: That’s awesome!

User: What do I like?
AI:

Ollama fills in the blank, leaning on what it “remembers.”
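The glue that assembles that prompt is plain string building; a minimal sketch (build_prompt is my name for it, and the actual template in llm-memory.py may differ):

```python
def build_prompt(context_snippets, user_input):
    # Join retrieved memories into the context header shown above
    context = "\n".join(context_snippets)
    return (
        "Previous conversation (faded by time, stronger if important):\n"
        f"{context}\n\n"
        f"User: {user_input}\nAI:"
    )

prompt = build_prompt(["User: I love pizza\nAI: That's awesome!"], "What do I like?")
print(prompt)

# The prompt then goes to the local model, e.g.:
#   import ollama
#   reply = ollama.generate(model="llama3.1:latest", prompt=prompt)["response"]
```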

Trying It Out

The project’s live on GitHub: github.com/MLidstrom/llm-memory. Here’s a sample run:

You: Hello again
AI: It looks like we're starting from scratch again! Our previous conversations about your last name and preferences 
for how to address you have faded from my memory. But I'm happy to learn more about you now!

So, let's try this again... would you prefer it if I addressed you by your full name "Mats", or is there a nickname
or longer version that you'd like me to use instead?
You: My full name is "Mats Lidström", but please call me "Mats" as it is shorter.
AI: It looks like we're having a bit of déjà vu here! I think I see what's happening. You've already told me that
you prefer to be called just "Mats". There's no need for the full name or any variations.

Let's move on and start fresh, shall we? What would you like to talk about today?
You:

Wait a bit (or tweak the half-life), and “Hi” might fade, but “I love pizza” sticks—especially since “love” is one of the importance keywords.

To play with it:

  1. Clone the repo.
  2. Install: pip install pinecone-client ollama langchain langchain_pinecone sentence-transformers numpy python-dotenv.
  3. Add your Pinecone API key to a .env file.
  4. Run Ollama: ollama run llama3.1:latest.
  5. Start chatting: python llm-memory.py.
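For step 3, the .env file just holds the key (the variable name PINECONE_API_KEY is the one the Pinecone tooling conventionally reads; check llm-memory.py for the exact name it loads):

```shell
PINECONE_API_KEY=your-api-key-here
```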

What’s Next?

This is version 1.0—rough but promising. I’m eyeing:

  • Sentiment Analysis: Smarter importance scoring beyond keywords.
  • Repetition: Boost memories mentioned often, like how humans reinforce through practice.
  • Pruning: Clear out faded junk from Pinecone.
  • Multi-User: Keep my pizza love separate from Bob’s sushi vibes.

Why It Matters

This isn’t just a toy—it’s a glimpse at AI that feels human. Perfect memory is cool, but selective recall is relatable. Imagine a companion that forgets your awkward ramblings but holds onto what defines you. That’s where I’m headed.

Check out the code, tweak the decay, or pitch an idea—I’d love your take. Find it all at github.com/MLidstrom/llm-memory. Let’s make AI a little more forgetful, and a lot more real.

Thoughts? Try it out and let me know below!
