
RAG-Based Agentic Assistant

A production-grade retrieval-augmented generation system supporting 1,000+ heterogeneous documents with agentic and non-agentic pipelines, built on LlamaIndex and multiple LLMs.


Overview

A research and production project exploring the frontier of retrieval-augmented generation. The system ingests heterogeneous document collections (PDF, PPT, TXT, JSON, DOCX) and answers questions with citations, using both agentic (multi-step reasoning) and non-agentic (single-pass) pipelines.
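A minimal sketch of how a multi-format ingestion step can route files to per-format parsers before chunking. The extractor functions and the `PARSERS` table below are illustrative placeholders, not the project's actual LlamaIndex readers:

```python
from pathlib import Path

# Placeholder extractors standing in for real format-specific parsers
# (e.g. a PDF text extractor); only the plain-text one does real IO.
def extract_pdf(path: Path) -> str:
    return f"<pdf text of {path.name}>"

def extract_office(path: Path) -> str:
    return f"<office text of {path.name}>"

def extract_plain(path: Path) -> str:
    return path.read_text(encoding="utf-8")

# Route each supported extension to its parser.
PARSERS = {
    ".pdf": extract_pdf,
    ".ppt": extract_office,
    ".pptx": extract_office,
    ".docx": extract_office,
    ".txt": extract_plain,
    ".json": extract_plain,
}

def ingest(paths):
    """Yield (filename, extracted text) for every file with a known parser;
    silently skip unsupported extensions."""
    for p in map(Path, paths):
        parser = PARSERS.get(p.suffix.lower())
        if parser is not None:
            yield p.name, parser(p)
```

The dispatch-table shape keeps adding a new format to a one-line change, which matters when the corpus is heterogeneous.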

Why RAG?

Fine-tuning LLMs for every new document set is expensive and slow. RAG instead gives the model access to up-to-date, domain-specific knowledge at inference time, with no retraining required. The hard part is doing retrieval well: if irrelevant chunks are ranked above relevant ones, answer quality collapses.
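To make the ranking point concrete, here is a toy retriever that scores chunks by cosine similarity and returns the top-k. It is a pure-Python sketch with hand-written vectors; in the real system the embeddings come from an embedding model and the search runs inside the vector database:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def top_k(query_vec, chunks, k=2):
    """chunks: list of (text, embedding) pairs.
    Return the k chunk texts most similar to the query vector."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]
```

If the scoring function misranks, the generator is handed the wrong evidence and no amount of prompt engineering recovers the answer, which is why the retrieval layer gets its own service below.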

Pipeline Design

Document Ingestion

Heterogeneous sources (PDF, PPT, TXT, JSON, DOCX) are parsed, chunked, and embedded for indexing.

Retrieval

Chunks are embedded with Gemini embeddings and retrieved from ChromaDB by semantic similarity.

LLM Layer

Multiple LLMs (Gemini 2.5 Pro/Flash, LLaMA 3.2, DeepSeek) were experimented with and benchmarked across both pipelines.
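A hedged sketch of what such a benchmark harness can look like. `models` maps names to callables standing in for real LLM clients (the actual project used LlamaIndex-wrapped models); only mean latency is measured here, not answer quality:

```python
import time

def benchmark(models, prompts):
    """models: dict of name -> callable(prompt) -> answer.
    Returns mean per-prompt latency in seconds for each model."""
    results = {}
    for name, call in models.items():
        start = time.perf_counter()
        for prompt in prompts:
            call(prompt)                     # fire the (stubbed) model
        results[name] = (time.perf_counter() - start) / len(prompts)
    return results
```

A real harness would also score answers against a labeled evaluation set; latency alone cannot distinguish a fast wrong model from a fast right one.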

Agentic Pipeline

The agentic mode uses LlamaIndex’s agent framework:

  1. Query decomposition → sub-questions
  2. Tool calls to the retrieval system for each sub-question
  3. Synthesis pass to combine evidence into a final answer with citations
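The three steps above can be sketched end-to-end with stubbed components. `decompose`, `retrieve`, and `answer` here are illustrative stand-ins (keyword matching in place of vector retrieval, string joining in place of an LLM synthesis pass), not LlamaIndex's actual agent API:

```python
def decompose(question):
    """Step 1: split a compound question into sub-questions (stubbed:
    a real agent would ask an LLM to do this)."""
    return [q.strip() + "?" for q in question.rstrip("?").split(" and ")]

def retrieve(sub_question, corpus):
    """Step 2: the 'tool call' -- keyword overlap standing in for
    vector-similarity retrieval."""
    words = set(sub_question.lower().rstrip("?").split())
    return [doc for doc in corpus if words & set(doc.lower().split())]

def answer(question, corpus):
    """Step 3: synthesize retrieved evidence into a cited answer."""
    evidence = []
    for sq in decompose(question):
        for i, doc in enumerate(retrieve(sq, corpus)):
            evidence.append(f"{doc} [{i + 1}]")
    return " ".join(evidence) if evidence else "No evidence found."
```

The control flow is the point: each sub-question triggers its own retrieval call, and only the synthesis step sees all the evidence at once.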

Microservice Architecture

┌──────────────┐    ┌───────────────┐    ┌──────────────┐
│  FastAPI GW  │───▶│  Retrieval    │───▶│  ChromaDB    │
│  (REST API)  │    │  Service      │    │  Vector DB   │
└──────────────┘    └───────────────┘    └──────────────┘
        │                   │
        ▼                   ▼
┌──────────────┐    ┌───────────────┐
│  Generation  │    │  PostgreSQL   │
│  Service     │    │  (metadata)   │
└──────────────┘    └───────────────┘

Separating retrieval and generation services allows independent scaling — generation is GPU-bound; retrieval is IO-bound.
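A toy asyncio model of that split, assuming a simulated IO-bound retrieval call and a synchronous generation stub (not the project's real FastAPI services):

```python
import asyncio

async def retrieval_service(query):
    """IO-bound: awaits a simulated vector-DB round trip, so many
    requests can overlap on one worker."""
    await asyncio.sleep(0.01)
    return [f"chunk for {query}"]

def generation_service(query, chunks):
    """Compute-bound in reality (GPU); a string stub here."""
    return f"answer({query}) from {len(chunks)} chunks"

async def gateway(query):
    chunks = await retrieval_service(query)
    return generation_service(query, chunks)

async def handle_batch(queries):
    # Retrieval for the whole batch overlaps: total IO wait is roughly
    # one round trip, not one per request.
    return await asyncio.gather(*(gateway(q) for q in queries))
```

Because the two workloads saturate different resources, splitting them into services lets each be replicated on the hardware it actually needs.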

Results

Metric                               Value
Documents supported                  1,000+ heterogeneous
Retrieval precision improvement      +30%
Response latency reduction           35% (microservices)
Peak retrieval requests handled      10,000+

Tech Stack

Framework: LlamaIndex 3.2 · FastAPI
LLMs: Gemini 2.5 Pro/Flash · LLaMA 3.2 · DeepSeek
Vector DB: ChromaDB · Gemini Embeddings
Storage: PostgreSQL
Language: Python
