ai-resume-shortlisting-engine

🎯 AI Resume Shortlisting & Evaluation Engine

An AI-powered recruitment system that evaluates resumes against Job Descriptions using multi-dimensional scoring, semantic matching, and explainable AI.

Built for the Internship Take-Home Assignment, Option A: Evaluation & Scoring Engine (Depth over Breadth)

🌐 Live Demo →


🏗️ Architecture Overview

┌──────────────┐     ┌───────────────────┐     ┌──────────────────┐
│  Streamlit   │────▶│   FastAPI API     │────▶│  Gemini 2.5 Flash│
│  Frontend    │◀────│   /evaluate       │◀────│  (via LangChain) │
│  :8501       │     │   :8000           │     └──────────────────┘
└──────────────┘     └─────────┬─────────┘
                               │
                     ┌─────────▼─────────┐
                     │    ChromaDB       │
                     │  (Embeddings +    │
                     │   Vector Search)  │
                     └───────────────────┘

🧠 How It Works

  1. PDF Ingestion → PyPDF2 extracts raw text from the resume
  2. LLM Parsing → Gemini 2.5 Flash transforms raw text into structured JSON (Education with Tiering, Experience with Impact, Skills)
  3. Semantic Matching → ChromaDB embeds resume chunks and finds conceptual overlaps with the JD (e.g., “AWS Kinesis” ≈ “Kafka”)
  4. Multi-Dimensional Scoring → Gemini evaluates the candidate on 4 axes (0–100 each):
    • Exact Match: Direct keyword overlap
    • Semantic Similarity: Conceptual alignment via embeddings
    • Impact/Achievements: Quantified results (e.g., “Reduced latency by 20%”)
    • Ownership: Leadership and autonomy evidence
  5. Explainability → Every score includes a “Why” explanation
  6. Tiering → Candidates classified into Tier A (Fast-track), Tier B (Technical Screen), or Tier C (Needs Evaluation)
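The scoring and tiering steps above can be sketched in a few lines. This is an illustration under stated assumptions, not the project's actual logic: the README does not say how the overall score is derived or where the tier cut-offs sit. The equal-weight mean is assumed because it reproduces the sample response in the API Reference ((85 + 78 + 90 + 75) / 4 = 82), and the thresholds `tier_a_min` and `tier_b_min` are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class AxisScore:
    """One scoring axis: a 0-100 score plus its "Why" explanation."""
    score: int
    explanation: str

    def __post_init__(self) -> None:
        if not 0 <= self.score <= 100:
            raise ValueError(f"score must be in 0-100, got {self.score}")


def overall_score(axes: dict[str, AxisScore]) -> int:
    """Equal-weight mean of the axis scores (an assumed aggregation rule)."""
    return round(mean(axis.score for axis in axes.values()))


def assign_tier(overall: int, tier_a_min: int = 80, tier_b_min: int = 60) -> str:
    """Map an overall score to a tier. Both cut-offs are hypothetical."""
    if overall >= tier_a_min:
        return "Tier A"
    if overall >= tier_b_min:
        return "Tier B"
    return "Tier C"


# Axis values taken from the sample response in the API Reference.
axes = {
    "exact_match": AxisScore(85, "Candidate has Python, FastAPI..."),
    "semantic_similarity": AxisScore(78, "Strong conceptual match..."),
    "impact": AxisScore(90, "Reduced latency by 20%..."),
    "ownership": AxisScore(75, "Led development of analytics..."),
}
total = overall_score(axes)
print(total, assign_tier(total))
```

In the real pipeline the overall score and tier may instead be produced directly by the LLM; computing them deterministically from the four axes, as here, would make the tier reproducible for a given set of axis scores.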

🛠️ Tech Stack

| Component | Technology |
|---|---|
| Language | Python 3.10+ |
| Backend API | FastAPI + Uvicorn |
| LLM | Google Gemini 2.5 Flash |
| Reasoning Layer | LangChain |
| Vector Store | ChromaDB (in-memory) |
| Embeddings | Sentence-Transformers (all-MiniLM-L6-v2) |
| PDF Parsing | PyPDF2 |
| Data Validation | Pydantic v2 |
| Frontend UI | Streamlit |
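The Semantic Matching step depends on comparing embedding vectors, which is the job of the Embeddings and Vector Store rows. The sketch below shows the idea with cosine similarity in plain Python. Everything in it is illustrative: the three-dimensional vectors are made up (all-MiniLM-L6-v2 produces 384-dimensional vectors), the chunk texts are hypothetical, and whether this project configures ChromaDB for cosine distance is not stated in the source.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction; treat it as no match
    return dot / (norm_a * norm_b)


def best_match(query: Sequence[float],
               chunks: dict[str, Sequence[float]]) -> tuple[str, float]:
    """Return the chunk text whose vector is most similar to the query."""
    scored = ((text, cosine_similarity(query, vec)) for text, vec in chunks.items())
    return max(scored, key=lambda pair: pair[1])


# Toy vectors standing in for real embeddings (values are invented).
jd_requirement = [0.9, 0.1, 0.0]  # e.g. "experience with Kafka"
resume_chunks = {
    "Built streaming pipelines on AWS Kinesis": [0.8, 0.2, 0.1],
    "Organised quarterly team offsites": [0.0, 0.1, 0.9],
}
text, similarity = best_match(jd_requirement, resume_chunks)
print(f"{text!r} scored {similarity:.2f}")
```

A vector store such as ChromaDB performs the same nearest-neighbour lookup over many chunks at once, so the application code only supplies texts and a query.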

📁 Project Structure

resume-shortlisting-app/
├── app/
│   ├── __init__.py
│   ├── main.py                    # FastAPI app + /evaluate endpoint
│   ├── models/
│   │   ├── __init__.py
│   │   └── schemas.py             # Pydantic models (ResumeData, EvaluationOutput)
│   ├── core/
│   │   ├── __init__.py
│   │   └── prompts.py             # LangChain prompt templates
│   ├── services/
│   │   ├── __init__.py
│   │   ├── evaluator.py           # LLM parsing + scoring orchestration
│   │   └── chroma_service.py      # ChromaDB semantic matching
│   └── api/
│       └── __init__.py
├── streamlit_app.py               # Streamlit frontend UI
├── requirements.txt
├── SYSTEM_DESIGN.md               # Architecture & design document
├── .env                           # API keys (not committed)
└── README.md                      # This file

🚀 Quick Start

Prerequisites

    • Python 3.10+
    • A Google Gemini API key

1. Clone & Install

git clone https://github.com/mridulnehra/ai-resume-shortlisting-engine.git
cd ai-resume-shortlisting-engine

python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

2. Configure API Key

Create a .env file in the project root:

GOOGLE_API_KEY=your_gemini_api_key_here
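A missing or placeholder key usually surfaces only at the first LLM call, so a fail-fast check at startup can save a debugging round trip. The helper below is a hypothetical sketch, not code from this repository; `require_api_key` is an invented name. It assumes the `.env` values have already been loaded into the process environment (for example with python-dotenv's `load_dotenv()`, if the project uses it, or by exporting the variable in the shell).

```python
import os
from typing import Mapping

PLACEHOLDER = "your_gemini_api_key_here"  # the example value shown above


def require_api_key(env: Mapping[str, str] = os.environ,
                    name: str = "GOOGLE_API_KEY") -> str:
    """Return the API key from env, or raise if it is missing or a placeholder."""
    key = env.get(name, "").strip()
    if not key or key == PLACEHOLDER:
        raise RuntimeError(f"{name} is not set; add it to .env or export it")
    return key
```

Calling `require_api_key()` once when the FastAPI app starts would turn a confusing runtime failure into an immediate, explicit error.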

3. Start the Backend

source venv/bin/activate
python -m uvicorn app.main:app --host 0.0.0.0 --port 8000

The API will be live at http://localhost:8000, with Swagger docs at http://localhost:8000/docs.

4. Start the Frontend (Streamlit UI)

In a separate terminal:

source venv/bin/activate
streamlit run streamlit_app.py --server.port 8501

Open http://localhost:8501 in your browser.

5. Test It

  1. Upload a PDF resume via the sidebar
  2. Paste a job description
  3. Click 🚀 Evaluate Candidate
  4. View scores, tier classification, and explanations

📖 API Reference

POST /evaluate

Content-Type: multipart/form-data

| Parameter | Type | Description |
|---|---|---|
| job_description | string (form field) | The full job description text |
| resume_pdf | file (PDF) | The candidate’s resume |

Response (200 OK):

{
  "exact_match": { "score": 85, "explanation": "Candidate has Python, FastAPI..." },
  "semantic_similarity": { "score": 78, "explanation": "Strong conceptual match..." },
  "impact": { "score": 90, "explanation": "Reduced latency by 20%..." },
  "ownership": { "score": 75, "explanation": "Led development of analytics..." },
  "overall_score": 82,
  "tier": "Tier A",
  "final_recommendation": "Fast-track. Strong technical fit."
}
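For scripted use, the endpoint can be called without the Streamlit UI. The sketch below uses the third-party `requests` library to send the two documented form parts and then prints a short summary of the response. The function names, the 120-second timeout, and the file name `resume.pdf` are assumptions for illustration; only the field names and response keys come from the reference above.

```python
from pathlib import Path
from typing import Any

AXES = ("exact_match", "semantic_similarity", "impact", "ownership")


def evaluate_resume(pdf_path: str, job_description: str,
                    base_url: str = "http://localhost:8000") -> dict[str, Any]:
    """POST a resume and job description to /evaluate; return the JSON body."""
    import requests  # third-party; imported lazily so summarise() works without it

    path = Path(pdf_path)
    with path.open("rb") as fh:
        response = requests.post(
            f"{base_url}/evaluate",
            data={"job_description": job_description},           # form field
            files={"resume_pdf": (path.name, fh, "application/pdf")},  # file part
            timeout=120,  # assumed; LLM-backed calls can be slow
        )
    response.raise_for_status()  # surface 4xx/5xx as exceptions
    return response.json()


def summarise(result: dict[str, Any]) -> str:
    """Render the tier, overall score, and per-axis scores as plain text."""
    lines = [f"{result['tier']} (overall {result['overall_score']})"]
    for axis in AXES:
        lines.append(f"  {axis}: {result[axis]['score']}")
    return "\n".join(lines)


# Usage, with the backend running locally and a hypothetical resume.pdf:
#   result = evaluate_resume("resume.pdf", "Senior Python engineer ...")
#   print(summarise(result))
```

Passing the job description through `data` and the PDF through `files` makes `requests` build the multipart/form-data body and set the Content-Type header itself.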

GET /health

Returns {"status": "healthy"} - used for monitoring and load balancer health checks.
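A monitoring script or deployment check can poll this endpoint before sending traffic. The following is a minimal standard-library sketch; the helper names `parse_health` and `is_healthy` and the 2-second timeout are assumptions, and only the response body shape comes from the line above.

```python
import json
from urllib.request import urlopen


def parse_health(body: bytes) -> bool:
    """True only if the body is JSON of the form {"status": "healthy"}."""
    try:
        payload = json.loads(body)
    except ValueError:  # covers malformed JSON and undecodable bytes
        return False
    return isinstance(payload, dict) and payload.get("status") == "healthy"


def is_healthy(base_url: str = "http://localhost:8000", timeout: float = 2.0) -> bool:
    """GET /health and report whether the service answered as healthy."""
    try:
        with urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200 and parse_health(resp.read())
    except OSError:  # connection refused, timeout, HTTP error, and similar
        return False


# Usage, with the backend running locally:
#   print(is_healthy())
```

Treating every network failure as "unhealthy" keeps the caller simple: it gets a single boolean to act on.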

🔮 Scalability Considerations (10,000 resumes/day)