A Next.js demo showcasing persistent memory for voice AI agents using Gemini Live and MongoDB.
Based on the article: Building Persistent Memory for Voice AI Agents with MongoDB
- 🎙️ Real-time Voice Interaction - WebSocket-based voice communication with Gemini Live
- 🧠 Persistent Memory - Store and retrieve memories across sessions using MongoDB
- 🔧 Memory as Tool - AI decides when to store/retrieve information (not hardcoded rules)
- 🔒 User Isolation - Each browser gets a unique ID for privacy
- 🌐 Global vs Private - Gemini classifies memories and obfuscates PII in shared data
┌─────────────────────┐     ┌──────────────────────┐
│  Browser            │     │  Gemini Live API     │
│  - Mic capture      │◄───►│  WebSocket           │
│  - Audio playback   │     │  gemini-live-2.5-    │
│  - UI               │     │  flash-native-audio  │
└─────────────────────┘     └──────────────────────┘
          │
          │ Tool Calls
          ▼
┌─────────────────────┐     ┌──────────────────────┐
│  Next.js API        │────►│  MongoDB Atlas       │
│  /api/memory        │     │  memories collection │
└─────────────────────┘     └──────────────────────┘
cd voice-memory-demo
npm install
cp .env.local.example .env.local
Edit .env.local with your credentials:
GOOGLE_API_KEY=your_gemini_api_key
MONGODB_URI=mongodb+srv://...
MONGODB_DB=voice_memory_demo
VOYAGE_AI_API_KEY=your_voyage_ai_key
npm run dev
Open http://localhost:3000 in your browser.
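A missing variable in .env.local usually shows up later as a confusing connection error, so it can help to validate the environment once at startup. The following is a minimal sketch, not code from this repo: `AppConfig` and `readConfig` are hypothetical names, and treating `VOYAGE_AI_API_KEY` as optional is an assumption based on embeddings being described as "if enabled".

```typescript
// Hypothetical config shape; only the variable names come from .env.local.
interface AppConfig {
  googleApiKey: string;
  mongodbUri: string;
  mongodbDb: string;
  voyageApiKey?: string; // assumption: optional, embeddings are "if enabled"
}

// Reads the variables from a plain object so the function is easy to test.
function readConfig(env: Record<string, string | undefined>): AppConfig {
  const required = ["GOOGLE_API_KEY", "MONGODB_URI", "MONGODB_DB"];
  const missing = required.filter((name) => !env[name]);
  if (missing.length > 0) {
    // Fail early with every missing name listed at once.
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  return {
    googleApiKey: env["GOOGLE_API_KEY"] as string,
    mongodbUri: env["MONGODB_URI"] as string,
    mongodbDb: env["MONGODB_DB"] as string,
    voyageApiKey: env["VOYAGE_AI_API_KEY"] || undefined,
  };
}
```

On the server this would be called as `readConfig(process.env)`.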
For hybrid search (vector + text) with $rankFusion (requires a recent Atlas cluster version), the code creates two Atlas Search indexes on the memories collection at startup:
Index Name: memory_vector_index
Optional: In Atlas UI: Search Indexes → Create Search Index → Atlas Vector Search
{
  "fields": [
    {
      "type": "vector",
      "path": "embedding",
      "numDimensions": 1024,
      "similarity": "cosine"
    },
    {
      "type": "filter",
      "path": "deploymentId"
    },
    {
      "type": "filter",
      "path": "userCookie"
    },
    {
      "type": "filter",
      "path": "isGlobal"
    }
  ]
}
Index Name: memory_text_index
Optional: In Atlas UI: Search Indexes → Create Search Index → Atlas Search
{
  "mappings": {
    "dynamic": false,
    "fields": {
      "key": {
        "type": "string",
        "analyzer": "lucene.standard"
      },
      "value": {
        "type": "string",
        "analyzer": "lucene.standard"
      },
      "deploymentId": {
        "type": "string"
      },
      "userCookie": {
        "type": "string"
      },
      "isGlobal": {
        "type": "boolean"
      }
    }
  }
}
Note: If you don't set up these indexes, the app will fall back to regex-based search which is less accurate for semantic queries.
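To make the hybrid search concrete, here is a rough sketch of how a `$rankFusion` pipeline over the two indexes above could be assembled. It is not taken from this repo's memory-service.ts: `buildHybridPipeline`, `HybridQuery`, the equal weights, and the `numCandidates` multiplier are all illustrative assumptions. Only the index names, field paths, and the `$rankFusion`, `$vectorSearch`, and `$search` stages come from the README and MongoDB itself.

```typescript
type Stage = Record<string, unknown>;

// Hypothetical input shape for one memory query.
interface HybridQuery {
  queryText: string;
  queryVector: number[]; // embedding of queryText (1024 dimensions per the index)
  deploymentId: string;
  userCookie: string;
  limit?: number;
}

function buildHybridPipeline(q: HybridQuery): Stage[] {
  const limit = q.limit ?? 5;
  // Visible memories: this user's private ones plus all global ones.
  const scope = {
    deploymentId: q.deploymentId,
    $or: [{ userCookie: q.userCookie }, { isGlobal: true }],
  };
  return [
    {
      $rankFusion: {
        input: {
          pipelines: {
            // Semantic branch, pre-filtered on the "filter" fields of the vector index.
            vector: [
              {
                $vectorSearch: {
                  index: "memory_vector_index",
                  path: "embedding",
                  queryVector: q.queryVector,
                  numCandidates: limit * 20, // assumption: a common oversampling ratio
                  limit,
                  filter: scope,
                },
              },
            ],
            // Keyword branch over the key and value fields.
            text: [
              {
                $search: {
                  index: "memory_text_index",
                  text: { query: q.queryText, path: ["key", "value"] },
                },
              },
              { $match: scope },
              { $limit: limit },
            ],
          },
        },
        // Assumption: both branches weighted equally.
        combination: { weights: { vector: 1, text: 1 } },
      },
    },
    { $limit: limit },
    { $project: { embedding: 0 } }, // do not return the raw vectors
  ];
}
```

With the official Node.js driver, the result would be passed to `collection.aggregate(...)` on the memories collection.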
The AI agent has access to an agentMemory tool with four operations:
| Operation | Description | Example |
|---|---|---|
| set | Store a key-value pair | {op: "set", key: "user_name", value: "Pavel"} |
| get | Retrieve by key | {op: "get", key: "user_name"} |
| delete | Remove a memory | {op: "delete", key: "user_name"} |
| query | Search memories | {op: "query", query: "user preferences"} |
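The four operations map naturally onto a discriminated union. The sketch below dispatches tool calls against an in-memory `Map` purely for illustration: `MemoryOp`, `MemoryResult`, and `handleMemoryOp` are hypothetical names, and the real app stores memories in MongoDB and searches them with the indexes described above, not with a substring match.

```typescript
// The argument shapes follow the Example column of the table.
type MemoryOp =
  | { op: "set"; key: string; value: string }
  | { op: "get"; key: string }
  | { op: "delete"; key: string }
  | { op: "query"; query: string };

type MemoryResult =
  | { ok: true; value?: string; matches?: Array<{ key: string; value: string }> }
  | { ok: false; error: string };

function handleMemoryOp(store: Map<string, string>, call: MemoryOp): MemoryResult {
  switch (call.op) {
    case "set":
      store.set(call.key, call.value);
      return { ok: true };
    case "get": {
      const value = store.get(call.key);
      return value === undefined ? { ok: false, error: "not found" } : { ok: true, value };
    }
    case "delete":
      return store.delete(call.key) ? { ok: true } : { ok: false, error: "not found" };
    case "query": {
      // Stand-in for hybrid search: case-insensitive substring match.
      const needle = call.query.toLowerCase();
      const matches: Array<{ key: string; value: string }> = [];
      store.forEach((value, key) => {
        if (`${key} ${value}`.toLowerCase().includes(needle)) matches.push({ key, value });
      });
      return { ok: true, matches };
    }
  }
}
```

Because `op` is the discriminant, TypeScript checks that each branch only reads the fields that exist for that operation.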
When storing a memory, Gemini classifies it:
- Private: User-specific data (name, preferences, contact info)
- Global: Shared facts (business hours, product info, policies)
Global memories have PII obfuscated (emails → [EMAIL], phones → [PHONE]).
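In this demo the classification and obfuscation are described as Gemini's job. For readers who want a deterministic fallback, the effect on a global memory can be approximated with regular expressions. This is a swapped-in technique, not the author's method, and the patterns are deliberately simple: `obfuscatePII`, `EMAIL_RE`, and `PHONE_RE` are hypothetical names.

```typescript
// Simplified patterns; real-world email and phone formats are more varied.
const EMAIL_RE = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;
// A digit, then at least seven digits or separators, then a closing digit.
const PHONE_RE = /\+?\d[\d\s().-]{7,}\d/g;

// Replaces emails and phone numbers with the placeholders used in the README.
function obfuscatePII(text: string): string {
  return text.replace(EMAIL_RE, "[EMAIL]").replace(PHONE_RE, "[PHONE]");
}

console.log(obfuscatePII("Call +1 555-123-4567 or mail sales@example.com."));
// prints "Call [PHONE] or mail [EMAIL]."
```

Short numbers such as opening hours ("Open 9 to 5") are left alone because the phone pattern needs at least nine digit or separator characters in a row.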
Each browser generates a UUID stored in localStorage:
- Private memories are scoped to this ID
- Global memories are accessible to everyone
- Users can reset their identity from the UI
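The identity handling above can be sketched as a get-or-create helper. The storage and the ID generator are passed in so the logic also runs outside a browser. `STORAGE_KEY`, `getOrCreateUserId`, and `resetUserId` are hypothetical names, and the actual hook in useUserCookie.ts may be structured differently.

```typescript
// Minimal subset of the Web Storage API that the helper needs.
interface KeyValueStorage {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

const STORAGE_KEY = "voice_memory_user_id"; // hypothetical key name

// Returns the stored ID, creating and persisting one on first use.
function getOrCreateUserId(storage: KeyValueStorage, newId: () => string): string {
  const existing = storage.getItem(STORAGE_KEY);
  if (existing) return existing;
  const id = newId();
  storage.setItem(STORAGE_KEY, id);
  return id;
}

// Discards the current identity; private memories under the old ID become unreachable.
function resetUserId(storage: KeyValueStorage, newId: () => string): string {
  storage.removeItem(STORAGE_KEY);
  return getOrCreateUserId(storage, newId);
}
```

In the browser this would be called as `getOrCreateUserId(window.localStorage, () => crypto.randomUUID())`.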
voice-memory-demo/
├── src/
│ ├── app/
│ │ ├── api/memory/route.ts # Memory API endpoint
│ │ ├── page.tsx # Main page
│ │ ├── layout.tsx # App layout
│ │ └── globals.css # Styles
│ ├── components/
│ │ ├── VoiceAgent.tsx # Main voice interface
│ │ └── MemoryPanel.tsx # Memory debug panel
│ ├── hooks/
│ │ ├── useGeminiLive.ts # Gemini WebSocket hook
│ │ └── useUserCookie.ts # User ID management
│ └── lib/
│ ├── mongodb.ts # MongoDB connection
│ ├── memory-service.ts # Memory CRUD operations
│ └── gemini-tools.ts # Tool definitions
├── public/
│ └── audio-processor.js # AudioWorklet for mic capture
└── .env.local.example # Environment template
After connecting, try saying:
- "My name is [your name]"
- "I live in [city]"
- "I prefer email over phone calls"
- "What do you remember about me?"
- "What's my name?"
- "Forget my name"
- Input: PCM 16-bit, 16kHz, mono
- Output: PCM 16-bit, 24kHz, mono
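The Web Audio API works with 32-bit float samples in the range [-1, 1], so microphone data has to be converted to 16-bit PCM before it is sent, and received PCM converted back for playback. The helpers below are a minimal sketch of that conversion; `floatTo16BitPCM` and `pcm16ToFloat` are hypothetical names, and resampling to 16 kHz (input) or from 24 kHz (output) is not shown.

```typescript
// Converts Web Audio float samples to signed 16-bit PCM, clamping out-of-range values.
function floatTo16BitPCM(samples: Float32Array): Int16Array {
  const pcm = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));
    // Negative values scale to -32768, positive values to 32767.
    pcm[i] = Math.round(s < 0 ? s * 0x8000 : s * 0x7fff);
  }
  return pcm;
}

// Converts signed 16-bit PCM back to floats for playback.
function pcm16ToFloat(pcm: Int16Array): Float32Array {
  const out = new Float32Array(pcm.length);
  for (let i = 0; i < pcm.length; i++) {
    out[i] = pcm[i] / 0x8000;
  }
  return out;
}
```

At 16 kHz mono with 2 bytes per sample, the input stream amounts to 32,000 bytes per second; the 24 kHz output is 48,000 bytes per second.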
The demo uses gemini-2.5-flash-native-audio-preview-12-2025 for real-time voice-to-voice interaction with tool calling support.
{
deploymentId: "voice-memory-demo",
key: "user_name",
value: "Pavel",
userCookie: "uuid-xxx" | "global",
isGlobal: false,
embedding: [0.123, -0.456, ...], // 1024-dim VoyageAI vector (if enabled)
createdAt: ISODate(),
updatedAt: ISODate()
}
- Check your GOOGLE_API_KEY is valid
- Ensure you have access to the Gemini Live API
- Check browser permissions for audio
- Try clicking the page first (browsers require user interaction)
- Verify MONGODB_URI is correct
- Check MongoDB network access (IP whitelist)
MIT