Alright, imagine trying to find resumes that *"feel similar"* to a job description — not just based on keywords like “Python” or “ETL” but based on meaning.
That’s where a vector database comes in. It stores content as embeddings: high-dimensional vectors (numbers!) that represent the meaning of the text.
We’ll use sentence-transformers to convert text into embeddings:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer('all-MiniLM-L6-v2')
text = "Built scalable data pipelines using Apache Airflow"
embedding = model.encode(text)
print(embedding[:5]) # Just showing the first 5 numbers
This returns a long vector that represents the meaning of the text. For all-MiniLM-L6-v2 it has 384 dimensions; other embedding models commonly produce 768 or more.
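So how do you tell whether two of these vectors "mean" something similar? One common measure is cosine similarity, which looks at the angle between the vectors. Here's a minimal sketch using only the standard library. The three 4-dimensional vectors and their names are made-up toy values for illustration, not real model output.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Hypothetical toy "embeddings" (real ones have hundreds of dimensions).
airflow_pipelines = [0.9, 0.1, 0.0, 0.2]
data_engineering = [0.8, 0.2, 0.1, 0.3]
pastry_chef = [0.0, 0.9, 0.8, 0.1]

sim_related = cosine_similarity(airflow_pipelines, data_engineering)
sim_unrelated = cosine_similarity(airflow_pipelines, pastry_chef)
print(f"related:   {sim_related:.3f}")
print(f"unrelated: {sim_unrelated:.3f}")
```

Vectors that point in nearly the same direction score close to 1, while unrelated ones score closer to 0. To try it on real embeddings, you could pass in the output of `model.encode(...)` converted with `.tolist()`.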
Now let’s store our resume embedding in a vector database. We’ll use ChromaDB, which is easy to use and runs locally.
import chromadb
client = chromadb.Client()
collection = client.create_collection("resumes")
collection.add(
    documents=["Built scalable data pipelines using Apache Airflow"],
    embeddings=[embedding.tolist()],  # convert the NumPy array to a plain list
    ids=["resume_1"]
)
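Curious what a vector store does with those three parallel lists? Here's a rough toy sketch of the idea: keep each document next to its vector, then score every stored vector against the query. `TinyVectorStore` and its methods are made up for illustration and are not ChromaDB's API or implementation. Real vector databases typically use approximate nearest-neighbor indexes instead of scanning everything, and the vectors and the second resume line below are hand-picked toy values.

```python
import math

class TinyVectorStore:
    """Hypothetical in-memory store that scans every vector on each query."""

    def __init__(self):
        self._rows = {}   # id -> (document, unit-length embedding)
        self._dim = None  # dimensionality fixed by the first vector added

    @staticmethod
    def _normalize(vector):
        norm = math.sqrt(sum(x * x for x in vector))
        if norm == 0:
            raise ValueError("cannot normalize a zero vector")
        return [x / norm for x in vector]

    def _check_dim(self, vector):
        if self._dim is None:
            self._dim = len(vector)
        elif len(vector) != self._dim:
            raise ValueError(f"expected {self._dim} dimensions, got {len(vector)}")

    def add(self, documents, embeddings, ids):
        # Parallel lists: one document, one embedding and one id per item.
        if not (len(documents) == len(embeddings) == len(ids)):
            raise ValueError("documents, embeddings and ids must have equal length")
        for doc, emb, id_ in zip(documents, embeddings, ids):
            if id_ in self._rows:
                raise ValueError(f"duplicate id: {id_}")
            self._check_dim(emb)
            self._rows[id_] = (doc, self._normalize(emb))

    def query(self, query_embedding, n_results=1):
        # On unit-length vectors, the dot product equals cosine similarity.
        self._check_dim(query_embedding)
        q = self._normalize(query_embedding)
        scored = [
            (sum(a * b for a, b in zip(q, emb)), id_, doc)
            for id_, (doc, emb) in self._rows.items()
        ]
        scored.sort(key=lambda row: row[0], reverse=True)
        return scored[:n_results]

store = TinyVectorStore()
store.add(
    documents=[
        "Built scalable data pipelines using Apache Airflow",
        "Designed seasonal dessert menus",  # made-up second resume
    ],
    embeddings=[[0.9, 0.1, 0.0, 0.2], [0.0, 0.9, 0.8, 0.1]],  # toy vectors
    ids=["resume_1", "resume_2"],
)
top = store.query([0.8, 0.2, 0.1, 0.3], n_results=1)
print(top[0][1], "->", top[0][2])
```

Normalizing once at insert time keeps each query down to a plain dot product per stored vector, which is fine for a handful of resumes but is exactly the work a real index is built to avoid at scale.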
Say a recruiter types this:
job_desc = "Looking for someone experienced in data engineering and Airflow"
job_embedding = model.encode(job_desc)
results = collection.query(
    query_embeddings=[job_embedding.tolist()],  # convert the NumPy array to a plain list
    n_results=1
)
print(results["documents"])
✨ Boom! You just performed a semantic search. With only one resume stored, you get that one back, but as you add more, the closest matches rank first, even if they don’t use the exact words from the job description.
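One detail that can trip you up: a query accepts a batch of embeddings, so each field in the result is a list of lists, with one inner list per query. The sketch below walks through that shape using a hand-written dictionary. The helper name `matches_for_query` is hypothetical and the distance is a made-up number, not output from a real run.

```python
# Hand-written stand-in for a query result with one query and one match.
# The distance value is invented for illustration only.
sample_results = {
    "ids": [["resume_1"]],
    "documents": [["Built scalable data pipelines using Apache Airflow"]],
    "distances": [[0.42]],
}

def matches_for_query(results, query_index=0):
    """Pair each id and document with its distance for one query in the batch."""
    return list(zip(
        results["ids"][query_index],
        results["documents"][query_index],
        results["distances"][query_index],
    ))

first_query_matches = matches_for_query(sample_results)
for doc_id, document, distance in first_query_matches:
    print(f"{doc_id}: {document} (distance {distance})")
```

Smaller distances mean closer matches, so the first entry in each inner list is the best hit for that query.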
✅ Vector DBs store embeddings that capture **meaning**, not just keywords.
✅ Great for building **AI-driven search, Q&A, and recommendations**.
✅ Try it out with open tools like ChromaDB, FAISS, or Weaviate.
Want a full tutorial on this? I’ve got a resume matcher project using this exact setup. Let me know and I’ll write a step-by-step build guide!