KOS Chatbot Deep Dive: Explorer, Memory, Models

In the first post, I explained how I structured my entire business into YAML files and built a chatbot to query it. That post covered the philosophy and the architecture. This one goes deeper into the features I have built since.

The chatbot is no longer just a chat window. Here is what it includes today:

Feature	Category	Status
File Explorer	Document search UI	Live
M ou F?	Mini app (French gender)	Live
French Tutor	Chat mode	Live
Persistent Memory	Two-tier memory system	Live
Multi-Model Switching	AI model selector	Live
Conversation Logging	Session archive + recall	Live
Voice Mode	Speech input	In progress
Daily Briefing	Morning summary	In progress
Smart Search	Semantic search	In progress
Workflows	n8n integration	In progress

The Document Search UI

The first thing I needed beyond chat was a way to see my data. Not just ask about it. See it.

File Explorer. An IDE-style panel built into the chatbot UI. Click “File Explorer” in the sidebar and the interface shifts. The warm accent tones swap to a cooler blue palette. The layout splits into two panels.

┌─────────────────────────────────────────────────────┐
│  Sidebar  │  Tree Panel (260px)  │  Content Panel   │
│           │                      │                   │
│  KOS      │  data/               │  [View] [Edit]    │
│  French   │  ├── contacts/       │  [AI Changes]     │
│  M ou F?  │  ├── businesses/     │  [Raw]            │
│  Explorer │  ├── projects/       │                   │
│           │  ├── services/       │  Parsed YAML card │
│  ───────  │  meta/               │  or editor        │
│  Coming   │  config/             │  or diff view     │
│  Soon     │  schemas/            │  or raw content   │
│           │  00-inbox/           │                   │
│  [Model]  │                      │                   │
│  [Memory] │  [Filter: ______]    │                   │
└─────────────────────────────────────────────────────┘

The Tree Panel

Key features:

Directory browsing — shows data/, meta/, config/, schemas/, and 00-inbox/ with expand/collapse
Color-coded file types — yellow for YAML, green for Markdown, gray for everything else
Count badges — each folder shows how many items it contains
Real-time search filter — type “george” and only matching files remain visible
Live refresh — tree rebuilds from the API on every load, always reflects current state

Four View Modes

Click any file and the content panel offers four modes:

Mode	What it shows	Use case
View	Parsed YAML as formatted cards with `_meta` badges	Human-friendly reading
Edit	Raw file in monospace editor with save button	Direct file modification
AI Changes	Line-by-line diff (green = added, red = removed)	See what the chatbot changed
Raw	Unprocessed YAML or Markdown	Exact file content on disk

How each mode works:

View renders YAML records as cards. A contact file shows the person’s name, title, tags, and all fields laid out cleanly. The _meta block appears as badges: record type, version number, last modified date.
Edit opens a full-height textarea with monospace font. When you save, the backend creates a timestamped backup before overwriting. Every edit is recoverable.
AI Changes computes a diff between the current file and its most recent backup. This is how you see exactly what the chatbot changed when it wrote to a file through the chat interface.
Raw displays the file content exactly as it exists on disk. No formatting, no cards.

Key takeaway: The file explorer turns the chatbot from a question-answering tool into a knowledge management interface. Ask the chatbot about a project, then flip to the explorer and read the full YAML file. Edit a record, save it, immediately ask questions about the updated data. Chat and explorer share the same backend, the same files, the same source of truth.

Mini Apps: M ou F?

Not every interaction with the knowledge base is a conversation. Some things are better as standalone tools.

“M ou F?” is a French gender reference tool built directly into the chatbot UI. I am learning French, and one of the most frustrating parts is memorizing whether a word is masculine or feminine. Every noun. Every time.

How It Works

User types "maison"
      ↓
Sends to current AI model (e.g. GPT-4o)
      ↓
Model returns structured JSON
      ↓
UI renders as formatted card

The card contains:

Gender badge — masculine (blue) or feminine (pink)
The word with its article — “la maison” or “le livre”
Articles grid — all forms in a 2-column layout:

Article type	Example (feminine)
Definite	la
Indefinite	une
Partitive	de la
Contracted (a)	a la
Contracted (de)	de la

Adjective chips — 3-4 adjectives agreeing with the word’s gender, each with:
- French adjective
- English translation
- Example phrase using word + adjective
Example sentences — 3-4 natural sentences in three languages:
- French (target language)
- English (translation)
- Farsi (my native language, right-to-left)
Mnemonic tip — a rule or trick for remembering the gender

Previous lookups appear as clickable pills below the input. Blue for masculine, pink for feminine. Click any pill to see that word’s card again.

Chat vs Mini App

Approach	Speed	Structure	Overhead
Ask chatbot “Is maison feminine?”	Slow (tool loop, multiple rounds)	Unstructured prose answer	Conversation context required
M ou F? mini app	Fast (single API call)	Structured card with all forms	Zero overhead, purpose-built prompt

The pattern: Chat for exploration, mini apps for utilities. The chatbot handles open-ended questions. Mini apps handle repetitive, structured lookups.

French Tutor Mode

Beyond the gender tool, there is a full French Tutor mode:

Same chat interface — the message input and response stream stay identical
Same AI models — swap between GPT-4o, Claude, Gemini
Same tool-calling — can still reference your knowledge base for personal context
Different system prompt — the chatbot becomes a conversational French teacher
Focus areas — grammar, vocabulary, practice conversations, corrections

The Memory System

Every chatbot forgets. Close the tab, start a new session, and it has no idea who you are. The KOS chatbot does not forget.

Two-Tier Architecture

┌─────────────────────────────────────────────┐
│              SYSTEM PROMPT                   │
│                                             │
│  ┌─────────────────────────────────────┐    │
│  │  CORE MEMORY (always loaded)        │    │
│  │  - Preferences                      │    │
│  │  - Corrections                      │    │
│  │  - Cap: 15 entries, ~375 tokens     │    │
│  └─────────────────────────────────────┘    │
│                                             │
│  ┌─────────────────────────────────────┐    │
│  │  EPISODIC MEMORY (query-relevant)   │    │
│  │  - Facts, decisions, patterns       │    │
│  │  - Retrieved by cosine similarity   │    │
│  │  - Top 7 per query, ~175 tokens     │    │
│  └─────────────────────────────────────┘    │
│                                             │
│  ┌─────────────────────────────────────┐    │
│  │  KOS CONTEXT                        │    │
│  │  - Directory map                    │    │
│  │  - File registry                    │    │
│  └─────────────────────────────────────┘    │
└─────────────────────────────────────────────┘

Tier	What it stores	Retrieval	Cap	Token budget
Core	Preferences, corrections	Always loaded into every prompt	15 entries	~375 tokens
Episodic	Facts, decisions, patterns	Semantic similarity to user query	No cap	~175 tokens/query

How Memories Are Created

Method	Trigger	What happens
Explicit save	Chatbot calls `save_memory` tool	Routes by category: preference/correction → core, fact/decision/pattern → episodic
Auto-extraction	After every conversation with 3+ turns	Background model reads full conversation, extracts memories aggressively
Manual save	User clicks in memory panel UI	View, search, delete individual memories. “Tidy Up” triggers consolidation

Memory Categories

Category	Tier	What it captures	Example
`preference`	Core	How you like things done	”Prefers bullet-point summaries over paragraphs”
`correction`	Core	Things you corrected	”George’s company is Quantum Club, not Quantum Group”
`fact`	Episodic	Important information	”Colin’s contract ends March 2026”
`decision`	Episodic	Conclusions reached	”Using Stripe for Quantum Club payments”
`pattern`	Episodic	Recurring themes	”Frequently asks about content pipeline status”

Auto-Consolidation

When core memory exceeds 30 entries, the system triggers cleanup automatically:

A background model reads all core entries
Merges duplicate or near-duplicate memories into one
Removes outdated entries superseded by newer corrections
Keeps language concise (1-2 sentences per entry)
Prioritizes preferences and corrections over general facts
Writes the cleaned result back to disk

You never have to think about it.

The Memory Panel UI

The memory panel slides in from the right as a drawer overlay. Each entry shows:

Tier badge — gold for core, gray for episodic
Category badge — color-coded (purple for preference, green for fact, orange for correction, pink for decision, blue for pattern)
Content text — the memory itself
Timestamp — when it was saved
Delete button — remove anything wrong or outdated
Search — filter memories by keyword
Tidy Up button — trigger manual consolidation

Key takeaway: The memory system makes the chatbot compound over time. It learns how you like information presented, what corrections you have made, what decisions you have reached. Next session, it already knows.

Multi-Model AI Switching

The chatbot is not locked to one AI provider. A dropdown in the sidebar lets you switch at any time.

Available Models

Model	Provider	Best for	Speed	Cost
GPT-4o	OpenAI	Structured tool use, general reasoning	Fast	$$
GPT-4o Mini	OpenAI	Quick lookups, cost-sensitive queries	Very fast	$
Claude Sonnet 4	Anthropic	Nuanced writing, careful analysis	Medium	$$
Gemini 2.5 Flash	Google	Speed with good accuracy	Very fast	$
Gemini 2.5 Pro	Google	Deep reasoning, long context	Medium	$$

How It Works

litellm handles the routing. The architecture is model-agnostic:

User selects model from dropdown
      ↓
Same tool schemas (read_file, search_content, etc.)
      ↓
Same system prompt (KOS context + memory)
      ↓
Same session state (conversation history preserved)
      ↓
Different LLM brain processes the request
      ↓
Same streaming UI renders the response

Key behaviors:

Mid-conversation switching — change from GPT-4o to Claude and the chatbot keeps going. Session state stays the same. Only the brain changes.
Mini apps use the same selector — check a French word with GPT-4o, then switch to Claude and check again. Different examples, different mnemonics, different adjective choices.
No code changes needed — litellm normalizes all provider differences. Add a new model by adding one line to the config.

Key takeaway: Running multiple models against the same structured prompt and the same knowledge base is one of the fastest ways to learn where each model excels. When one model gives a weak answer, switch and ask again. Same data, same tools, different perspective.

Conversation Logging and Recall

Every conversation is saved to disk automatically.

Storage Structure

conversations/
├── 2026/
│   ├── 01/
│   │   ├── 2026-01-15_a3f8b2c1.json
│   │   └── 2026-01-15_d7e4c9a2.json
│   └── 02/
│       ├── 2026-02-20_b1c3d5e7.json
│       ├── 2026-02-22_f9a1b3c5.json
│       └── 2026-02-23_e2d4f6a8.json

Recall Tool

Tool	Input	Output
`recall_conversation`	Date in YYYY-MM-DD format	Full conversation transcripts from that date

Ask “What did we talk about on February 20th?” and the chatbot pulls the conversation transcripts from that date.

How it pairs with memory:

Memory captures the insights (preferences, facts, decisions)
Conversation logs capture the full context (everything said, in order)
Together they give the chatbot both quick-access knowledge and deep recall

What Is Coming Next

The sidebar shows four features marked “in progress”:

Feature	What it will do	Pattern
Voice Mode	Speak to the knowledge base instead of typing	New input method → same tool loop
Daily Briefing	Morning summary of tasks, projects, calendar	New UI panel → reads existing YAML data
Smart Search	Semantic search across the entire knowledge base	New tool → embedding-based file retrieval
Workflows	Trigger n8n automations from the chatbot	New tool → calls n8n webhook endpoints

Each one follows the same architecture pattern. Adding a new feature means:

Add a new tool schema (JSON function definition)
Add a new API endpoint (FastAPI route)
Add a new panel in the UI (HTML + JS in the single-page app)

The data is structured. The interface is modular. The tools are composable.

The Architecture Pattern

Every feature in the chatbot follows the same formula:

YAML Files (source of truth)
      ↓
Tools (read, search, write, move)
      ↓
LLM (processes with tools)
      ↓
UI Shell (chat, explorer, mini apps)
      ↓
Memory (compounds value over time)

The four principles:

Data lives in YAML files. The knowledge base is the source of truth.
Tools expose the data. Read, search, write, move. The LLM uses tools to interact with the data.
The UI is a shell. Chat, explorer, mini apps. Each is a different view into the same backend.
Memory compounds value. The more you use it, the more it knows. The more it knows, the better it answers.

Property	Value
Total Python backend	~1,100 lines across 4 files
Frontend	1 HTML file
Database	None (JSON + YAML on disk)
Monthly hosting cost	$0
Models supported	5 (any litellm-compatible model)
Vendor lock-in	Zero

The data is the product. Everything else is a view.

If you want to build something like this for your business, reach out. This is what I do at ConnectMyTech. Observe the bottleneck, design the system, build it. My name is Shahab Papoon, and I help businesses turn scattered knowledge into structured systems that AI can actually use.