Interactive Explainer
Understanding Vector Databases
Learn how vector databases power semantic search, from parsing your documents to retrieving relevant information for AI agents.
WAIT — how much content are we talking?
If your total content is < 25,000 tokens, you probably don't need a vector database at all.
Vector searches often include 10 results each with 512 tokens. Instead, just have an LLM compress your content first and serve that to your agent. Way simpler.
Did you know? A well-configured LLM compression prompt can result in 50% fewer tokens without sacrificing accuracy. Always evaluate performance against full document prompts to ensure quality.
Not sure how many tokens you have? Check with OpenAI's tokenizer
Creating the search index
Using a retrieval pipeline
Runs everytime a file changes
Runs for every agent query
Parse Documents into Markdown
Extract clean, structured text from documents while preserving key information.
Converting to markdown alone can reduce file size by more than 5x compared to the original document.
Source Document

Parsed Markdown
Table broken by footer# MediCare Plus Benefits 2024 ## Preventive Care Annual wellness visits: 100% covered, no copay (in-network). Includes physicals, immunizations, and cancer screenings. ## Prescription Drugs Generic: $10 | Preferred brand: $35 | Non-preferred: $60 Specialty: 20% coinsurance (max $250) ## Mental Health Therapy: $25/visit | Psychiatry: $40 | Telehealth: $15 ## Coverage Table | Service | In-Network | Out-of-Network | |---------|------------|----------------| | Preventive | 100% | 60% after ded. | --- *MediCare Plus | Page 1 of 12 | Effective Jan 1, 2024* --- | Primary Care | $25 copay | 50% after ded. | | Specialist | $40 copay | 50% after ded. | --- *MediCare Plus | Page 2 of 12 | Effective Jan 1, 2024*