Alright, imagine trying to find resumes that *"feel similar"* to a job description — not just based on keywords like “Python” or “ETL” but based on meaning.
That’s where a vector database comes in. It stores content as embeddings: high-dimensional vectors (numbers!) that capture meaning, so texts that mean similar things end up close together.
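To make "close in meaning" a bit more concrete, here is a minimal sketch of cosine similarity, one common way to compare two vectors. The three-number vectors are made-up stand-ins for real embeddings (which have hundreds of dimensions), and the variable names are just for illustration.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical 3-dimensional "embeddings", invented for this example only.
airflow_resume = [0.9, 0.1, 0.2]
data_eng_job = [0.8, 0.2, 0.3]
pastry_chef_resume = [0.1, 0.9, 0.1]

sim_match = cosine_similarity(airflow_resume, data_eng_job)
sim_mismatch = cosine_similarity(airflow_resume, pastry_chef_resume)

# The related pair should score noticeably higher than the unrelated pair.
print(sim_match > sim_mismatch)
```

A score near 1 means the vectors point in almost the same direction, while a score near 0 means they have little in common.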
We’ll use sentence-transformers to convert text into embeddings:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-MiniLM-L6-v2')
text = "Built scalable data pipelines using Apache Airflow"
embedding = model.encode(text)
print(embedding[:5])  # Just showing the first 5 numbers
```
This returns a long vector that represents the meaning of the text. With all-MiniLM-L6-v2 it has 384 dimensions; other models commonly produce 768 or more.
Now let’s store our resume or job description vector in a vector store. We’ll use ChromaDB — easy to use and runs locally.
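```python
import chromadb

# In-memory client: the data disappears when the process exits
client = chromadb.Client()
collection = client.create_collection("resumes")

collection.add(
    documents=["Built scalable data pipelines using Apache Airflow"],
    embeddings=[embedding.tolist()],  # convert the NumPy array to a plain list
    ids=["resume_1"],
)
```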
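In practice you will have more than one resume. The add call takes parallel lists (documents, embeddings, ids), so a small helper can build them in one go. This is only a sketch: `build_batch` and `fake_embed` are hypothetical names invented here, and `fake_embed` is a toy stand-in so the snippet runs without downloading a model. With the real model you would pass `model.encode` instead.

```python
def build_batch(resume_texts, embed, id_prefix="resume"):
    """Build the parallel lists a vector store expects: documents, embeddings, ids."""
    documents = list(resume_texts)
    embeddings = []
    for text in documents:
        vec = embed(text)
        # NumPy arrays expose .tolist(); plain sequences are copied with list().
        embeddings.append(vec.tolist() if hasattr(vec, "tolist") else list(vec))
    ids = [f"{id_prefix}_{i}" for i in range(1, len(documents) + 1)]
    return {"documents": documents, "embeddings": embeddings, "ids": ids}

def fake_embed(text):
    # Toy stand-in for a real embedding model (assumption for this example only).
    return [float(len(text)), float(text.count(" "))]

batch = build_batch(
    ["Built Airflow pipelines", "Designed REST APIs in Go"],
    fake_embed,
)
print(batch["ids"])

# With a real collection, this would be: collection.add(**batch)
```

Keeping the ids unique matters, because the id is how you look a document up again later.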
Say a recruiter types this:
```python
job_desc = "Looking for someone experienced in data engineering and Airflow"
job_embedding = model.encode(job_desc)

results = collection.query(
    query_embeddings=[job_embedding.tolist()],  # one query vector, as a plain list
    n_results=1,
)
print(results["documents"])  # nested list: one list of matches per query
```
✨ Boom! You just performed a semantic search. The closest resume comes back first, even if it doesn’t use the exact words from the job description. Raise n_results to see more matches.
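Curious what a query does under the hood? Conceptually, it ranks stored vectors by distance to the query vector and returns the closest ones. Here is a brute-force sketch of that idea using squared Euclidean distance (which, as far as I know, is ChromaDB's default metric). Real vector stores use approximate indexes to do this quickly at scale, and the toy vectors and the `top_k` helper below are made up for illustration.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def top_k(query, store, k=1):
    """Return the ids of the k stored vectors closest to the query."""
    ranked = sorted(store.items(), key=lambda item: squared_l2(query, item[1]))
    return [doc_id for doc_id, _ in ranked[:k]]

# Hypothetical 3-dimensional "embeddings", invented for this example only.
toy_store = {
    "resume_1": [0.9, 0.1, 0.2],
    "resume_2": [0.1, 0.9, 0.1],
    "resume_3": [0.6, 0.4, 0.5],
}
toy_query = [0.8, 0.2, 0.3]

best = top_k(toy_query, toy_store, k=2)
print(best)
```

Smaller distance means a closer match, so the ranking sorts in ascending order.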
✅ Vector DBs store **meaning**, not keywords.
✅ Great for building **AI-driven search, Q&A, and recommendations**.
✅ Try it out with open tools like ChromaDB, FAISS, or Weaviate.
Want a full tutorial on this? I’ve got a resume matcher project using this exact setup. Let me know and I’ll write a step-by-step build guide!