Hi, I'm Pankaj 👋
I build intelligent systems as a
Machine Learning Engineer | Agentic AI Systems Builder
Agentic AI systems that turn unstructured documents into structured, reliable data. Currently building multi-agent Document AI systems powered by Claude Sonnet 4.5 at KlearNow.AI.
About Me
Turning unstructured data into structured intelligence
I'm a Machine Learning Engineer specializing in Document AI, LLM-powered agentic systems, and large-scale data extraction pipelines. At KlearNow.AI, I design multi-agent systems built on Claude Sonnet 4.5 that process 3,000+ documents a day at 99% reliability, replacing legacy model architectures with flexible, prompt-driven LLM workflows.
Designed multi-agent Document AI systems processing 3,000+ documents/day at 99% reliability
Migrated legacy LayoutLM/BERT pipelines to LLM-powered architectures supporting 1,000+ templates
Built validation agents enforcing 14+ schema rules, cutting manual correction effort by 85%
Fine-tuned LLaMA & Qwen with LoRA/QLoRA, reducing GPU memory usage by 60%
Technical Skills
My Technical Arsenal
A blend of machine learning expertise, generative AI engineering, and production-grade backend systems.
Programming
Machine Learning & NLP
LLM & Generative AI
Backend & Distributed Systems
Cloud & DevOps
Work Experience
Where I've Made an Impact
From scaling production document pipelines to fine-tuning LLMs for real-world extraction tasks.
Machine Learning Engineer
June 2025 – Present | KlearNow.AI
- Designed and deployed an AGNO-based multi-agent Document AI system using Claude Sonnet 4.5, coordinating 6 microservices for document classification, OCR, and structured data extraction, processing 3,000+ documents/day with 99% reliability.
- Modernized document extraction workflows by migrating from LayoutLM and BERT models to an LLM-powered architecture, enabling adaptation to 1,000+ document templates without template-specific retraining.
- Built a Validation Agent enforcing 14+ schema rules on LLM outputs, mitigating hallucinations, reducing downstream error rates by 30%, and cutting manual correction effort by 85%.
- Built event-driven pipelines using Apache Kafka and gRPC across 6 microservices for end-to-end processing; implemented an AWS SQS-based architecture that reduced queue backlog by 60% at peak load.
- Designed prompt-based extraction workflows generating structured JSON outputs from OCR text, achieving 92%+ field-level accuracy across invoice and customs formats.
Machine Learning Engineer Intern
March 2025 – May 2025 | KlearNow.AI
- Fine-tuned Qwen and LLaMA for structured document extraction using LoRA/QLoRA on 4-bit quantized models, reducing GPU memory by 60% and improving field-level accuracy by 18% over zero-shot baselines across 5+ document categories.
- Designed one-shot and few-shot prompting strategies for document extraction, reducing prompt token usage by 30% while maintaining extraction accuracy across 5+ document categories.
- Developed image preprocessing pipelines including orientation correction, denoising, and OCR optimization, reducing document processing latency by 35%.
Software Engineering Intern
June 2023 – August 2023 | Beans.ai
- Created indoor maps for 10+ facilities using ArcGIS and geospatial datasets, improving delivery routing accuracy.
- Developed a CNN-based malaria detection app with Grad-CAM visualization, achieving 94% accuracy on 27K+ blood smear images.
Projects
Selected Work & Experiments
A mix of production systems and research projects spanning Document AI, RAG, computer vision, and NLP.
RAG-based Document QA System
A production-ready Retrieval-Augmented Generation system for document question answering, combining FAISS vector search with Claude and LLaMA LLMs.
Invoice & Non-Invoice Information Extraction
Fine-tuned GeoLayoutLM for high-accuracy key-value extraction from invoices and customs documents across 135 classes.
Document Classification System
A multi-model ensemble pipeline classifying 10 document classes using both textual and visual features at scale.
NER-Based Key-Value Extraction
A BERT-based Named Entity Recognition pipeline for OCR documents using BIO tagging, with extended context length for long documents.
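To illustrate the BIO tagging idea, here is a minimal sketch of how per-token tags can be grouped back into key-value spans. It is not the project's actual code: the function name `bio_to_spans` and the label names (`INVOICE_NUMBER`, `TOTAL`) are hypothetical, and the tags are assumed to come from some upstream token classifier.

```python
from typing import List, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (label, text) pairs.

    B-X opens a span of type X, I-X continues an open span of the same
    type, and anything else (O, or an I- tag with no matching open span)
    closes the current span and is itself dropped.
    """
    spans: List[Tuple[str, str]] = []
    current_label = None
    current_tokens: List[str] = []

    def flush() -> None:
        # Emit the open span, if any, joined with single spaces.
        if current_label is not None:
            spans.append((current_label, " ".join(current_tokens)))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            flush()
            current_label = tag[2:]
            current_tokens = [token]
        elif tag.startswith("I-") and current_label == tag[2:]:
            current_tokens.append(token)
        else:
            flush()
            current_label = None
            current_tokens = []
    flush()
    return spans


# Hypothetical OCR tokens and tags for a single invoice line.
tokens = ["Invoice", "No", ":", "INV", "-", "1042", "Total", "450.00"]
tags = ["O", "O", "O", "B-INVOICE_NUMBER", "I-INVOICE_NUMBER",
        "I-INVOICE_NUMBER", "O", "B-TOTAL"]
print(bio_to_spans(tokens, tags))
```

Dropping stray `I-` tags is one possible policy; a more lenient decoder could treat them as the start of a new span instead.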
Education
Academic Background & Leadership
Building a strong foundation in computer science while leading technical communities on campus.
Education
B.Tech, Computer Science Engineering
SRM University, Chennai
2022 – 2026
Positions of Responsibility
President, CS Club
Led 3+ technical events and managed a 50+ member team.
May 2022 – May 2025
Event Coordinator, Annual Tech Fest
Coordinated 8+ sessions with 200+ student participation.
Feb 2023 – May 2023
Achievements
Milestones & Recognition
Competitive programming achievements and contributions to the technical community.
Solved 500+ DSA problems across LeetCode and GeeksforGeeks
Ranked in the Top 20% among 50,000+ participants in LeetCode Weekly Contests
Authored technical articles on BERT and Kimi-VL, covering architecture, fine-tuning strategies, quantization, and cost-efficient inference on NVIDIA A10/L4 GPUs
Contact
Let's Build Something Great Together
Have a project in mind, an opportunity to discuss, or just want to connect? My inbox is always open.