Senior LLM Engineer Needed — Build a Private AI Assistant (RAG, FastAPI, Streamlit, ChromaDB, OpenAI)
Project Overview
I’m looking for a senior-level AI engineer with real experience designing and implementing LLM-powered applications, especially those involving Retrieval-Augmented Generation (RAG), vector databases, multi-prompt agent behavior, and clean production-grade Python architectures.
The goal is to build my internal private AI assistant (“TomGPT”) that will run locally and serve as:
• A Tax Planning Advisor
• A Profitability & Business Advisory Assistant
• A Content Creation Assistant for my CPA practice
This project requires someone who understands how to build modular LLM systems, not someone who glues together LangChain tutorials.
________________________________________
What I Need Built
A complete private AI system with:
1. Backend (FastAPI)
• A /chat endpoint that (see the sketch after this list):
  o Loads mode-specific system prompts
  o Performs vector retrieval (Chroma)
  o Constructs messages for the LLM
  o Calls OpenAI models (GPT-4.x / 5.x class)
  o Returns assistant responses
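To calibrate the level of structure I expect (not a spec to copy), here is a rough sketch of that /chat flow. The helper names, route, and model ID are placeholders; load_system_prompt and retrieve_context are sketched under items 4 and 3 below.

```python
# Rough sketch only: helper names, route, and model ID are placeholders, not decisions.
from fastapi import FastAPI
from openai import OpenAI
from pydantic import BaseModel

app = FastAPI()
client = OpenAI()  # reads OPENAI_API_KEY from the environment

class ChatRequest(BaseModel):
    mode: str                  # "tax" | "profit" | "content" | "general"
    message: str
    history: list[dict] = []   # prior turns, sent back by the frontend

@app.post("/chat")
def chat(req: ChatRequest):
    system_prompt = load_system_prompt(req.mode)   # mode-specific prompt file, see item 4
    context = retrieve_context(req.message, k=5)   # top-k Chroma search, see item 3
    messages = (
        [{"role": "system", "content": f"{system_prompt}\n\nRelevant excerpts:\n{context}"}]
        + req.history
        + [{"role": "user", "content": req.message}]
    )
    completion = client.chat.completions.create(model="gpt-4o", messages=messages)
    return {"reply": completion.choices[0].message.content}
```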
2. Frontend (Streamlit)
• Password-gated access
• Mode selector (Tax / Profit / Content / General)
• Full chat interface with history in session state
• Fast, responsive UI (a rough sketch of this flow follows)
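For the UI, a similarly rough sketch of what I mean by password gate, mode selector, and session-state chat history (the backend URL, APP_PASSWORD variable, and mode labels are placeholders):

```python
# Illustrative only: backend URL, password variable, and mode labels are placeholders.
import os

import requests
import streamlit as st
from dotenv import load_dotenv

load_dotenv()

# Password gate: nothing below runs until the correct password is entered.
if not st.session_state.get("authed"):
    pw = st.text_input("Password", type="password")
    if pw and pw == os.environ.get("APP_PASSWORD"):
        st.session_state["authed"] = True
    else:
        st.stop()

mode = st.sidebar.selectbox("Mode", ["Tax", "Profit", "Content", "General"])

if "history" not in st.session_state:
    st.session_state["history"] = []

# Replay the conversation kept in session state.
for msg in st.session_state["history"]:
    st.chat_message(msg["role"]).write(msg["content"])

if prompt := st.chat_input("Ask TomGPT..."):
    st.chat_message("user").write(prompt)
    resp = requests.post(
        "http://localhost:8000/chat",
        json={"mode": mode.lower(), "message": prompt,
              "history": st.session_state["history"]},
        timeout=60,
    )
    reply = resp.json()["reply"]
    st.chat_message("assistant").write(reply)
    st.session_state["history"] += [
        {"role": "user", "content": prompt},
        {"role": "assistant", "content": reply},
    ]
```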
3. Document Knowledge Base (RAG)
• Document ingestion pipeline:
  o PDF/DOCX text extraction
  o Chunking (configurable)
  o Embedding (OpenAI)
  o Storage in ChromaDB with metadata
• Runtime retrieval (ingestion and retrieval are both sketched after this list):
  o Query embedding
  o Top-k similarity search
  o Automatic context injection
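Roughly, those two pipelines reduce to the sketch below. Chunk size, overlap, embedding model, and collection name are assumptions to refine, and real PDF/DOCX extraction would feed the ingest function; retrieve_context here is what the /chat sketch in item 1 calls.

```python
# Sketch only: chunking, embedding model, and collection name are assumptions to refine.
import chromadb
from openai import OpenAI

client = OpenAI()
chroma = chromadb.PersistentClient(path="./chroma_db")
collection = chroma.get_or_create_collection("tomgpt_docs")

def embed(texts: list[str]) -> list[list[float]]:
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return [d.embedding for d in resp.data]

def ingest(doc_id: str, text: str, source: str, chunk_size: int = 1000, overlap: int = 200):
    # Naive fixed-size chunking with overlap; the real pipeline may split on headings/pages.
    step = chunk_size - overlap
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), step)]
    collection.add(
        ids=[f"{doc_id}-{i}" for i in range(len(chunks))],
        documents=chunks,
        embeddings=embed(chunks),
        metadatas=[{"source": source, "chunk": i} for i in range(len(chunks))],
    )

def retrieve_context(query: str, k: int = 5) -> str:
    # Top-k similarity search; the joined text is injected into the system prompt.
    results = collection.query(query_embeddings=embed([query]), n_results=k)
    return "\n\n---\n\n".join(results["documents"][0])
```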
4. Mode-Based Agent Behavior
Load prompts from external files (4 modes):
• Tax Planner
• Profitability Coach
• Content Writer
• General Advisor
The backend should orchestrate prompts cleanly, not hard-code them.
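By "orchestrate cleanly" I mean something like a small mode-to-file registry, so adding a fifth mode is one new prompt file plus one mapping entry. File names and mode keys below are placeholders:

```python
# Illustrative layout only: file names and mode keys are placeholders.
from pathlib import Path

PROMPT_DIR = Path("prompts")
MODES = {
    "tax": "tax_planner.md",
    "profit": "profitability_coach.md",
    "content": "content_writer.md",
    "general": "general_advisor.md",
}

def load_system_prompt(mode: str) -> str:
    if mode not in MODES:
        raise ValueError(f"Unknown mode: {mode!r}")
    return (PROMPT_DIR / MODES[mode]).read_text(encoding="utf-8")
```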
5. Security & Config
• Password protection for the UI
• .env for secrets
• No API keys exposed to the frontend (a minimal config sketch follows)
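Concretely, I expect the usual server-side .env pattern: the OpenAI key is read only by the backend process and never shipped to the browser. Variable names below are examples, not requirements.

```python
# Example only: variable names are placeholders; the API key never leaves the backend.
import os

from dotenv import load_dotenv

load_dotenv()  # reads .env from the project root

OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]          # backend only
APP_PASSWORD = os.environ["APP_PASSWORD"]              # checked by the Streamlit password gate
CHROMA_PATH = os.getenv("CHROMA_PATH", "./chroma_db")  # vector store location
```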
6. Documentation
A professional-quality README explaining:
• How to run the system
• How to add documents
• How to create new modes
• How to change models
• Optional: how to run everything via Docker
________________________________________
Tech Stack Requirements
Required Experience
You must be strong in:
• Python (senior-level)
• FastAPI (production-quality routes & architecture)
• Streamlit (clean user interface)
• OpenAI API (chat + embeddings)
• Vector DBs (Chroma, Pinecone, Qdrant, etc.)
• RAG design patterns:
  o chunking strategies
  o embedding management
  o context window optimization
  o metadata filters
• Prompt architecture & multi-agent patterns
Strongly Preferred
• Experience with Ollama or other local models
• Docker
• Building similar “private GPT” solutions
• Understanding of tax or financial domain (not required, but helpful)
Not Interested In
• Beginners
• People who only use LangChain without understanding what happens under the hood
• No-code tools (e.g., Bubble, WordPress plugins)
• “Chatbot builders” with no real backend knowledge
If you cannot explain embeddings, chunking, and RAG tradeoffs clearly, please do not apply.
________________________________________
Deliverables
• Fully working FastAPI backend
• Fully working Streamlit frontend
• Ingestion script
• Vector DB setup (Chroma)
• Mode-based prompt system
• Clean, simple project structure (folders provided upon hire)
• Excellent documentation
________________________________________
Budget & Timeline
• Budget: $2,000–$3,500 (fixed price or milestone-based)
• Timeline: 2–3 weeks
I’m willing to pay at the top of this range for senior talent who can build this cleanly, modularly, and efficiently.
________________________________________
To Apply (Important)
Please include the following in your proposal:
1. A short summary of your experience building LLM/RAG systems.
2. One example of an LLM app you built (no NDAs needed—just describe architecture & decisions).
3. Confirmation that you are comfortable with:
  o FastAPI
  o Streamlit
  o Chroma or similar
  o RAG design
4. Your estimated timeline and approach to this project.
Shortlisted candidates will be asked one technical question about embeddings and chunking to verify expertise.