Reusable document-AI platform
// AI / Machine Learning Engineer
I design and ship generative-AI and ML systems end to end — extraction, RAG, agents, and classification — right-sized for cost, latency, and reliability, and held to hard accuracy metrics.
Hi, I'm Gabriel, an AI/ML engineer who designs and ships production generative-AI and machine-learning systems end to end. My focus is document intelligence: using extraction, RAG, agents, and classification to turn unstructured documents into structured, verified data that teams can act on.
What I care about most is rigor over buzzwords. I'd rather right-size a model — classical ML or an LLM — to the cost, latency, and reliability a problem actually needs, then prove it works with real evaluation: benchmarking against human-verified files, LLM-as-judge, and post-generation citation checks. Most of my work runs on AWS (Bedrock, AgentCore, SageMaker, Redshift) behind REST APIs and event-driven pipelines.
If you're building something where the answer has to be right, not just plausible, I'd love to talk.
Production generative-AI and ML systems I designed, shipped, and own — each held to a measured result.
Architected the end-to-end pipeline — OCR → Bedrock/Claude via AgentCore → Strands structured output → human-review APIs — now powering three production workflows across the business.
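The hand-off between structured output and human review can be sketched as a validation gate. This is a minimal, stdlib-only illustration, not the production code: the field names (`account_name`, `period_end`, `total_amount`), the `REQUIRED_FIELDS` tuple, and the `validate_model_output` and `route` helpers are hypothetical, and the model's reply is assumed to arrive as a JSON string rather than through Strands' own structured-output API.

```python
import json
from dataclasses import dataclass, field

# Hypothetical required fields; the real extraction schema is not described here.
REQUIRED_FIELDS = ("account_name", "period_end", "total_amount")


@dataclass
class ExtractionResult:
    fields: dict
    missing: list = field(default_factory=list)

    @property
    def needs_review(self) -> bool:
        # Any missing or empty required field sends the document to a reviewer.
        return bool(self.missing)


def validate_model_output(raw_json: str) -> ExtractionResult:
    """Parse the model's JSON reply and list required fields that are absent or empty."""
    try:
        data = json.loads(raw_json)
    except json.JSONDecodeError:
        data = None
    if not isinstance(data, dict):
        # Unparseable or non-object output counts as a miss on every required field.
        return ExtractionResult(fields={}, missing=list(REQUIRED_FIELDS))
    missing = [name for name in REQUIRED_FIELDS if data.get(name) in (None, "")]
    return ExtractionResult(fields=data, missing=missing)


def route(result: ExtractionResult) -> str:
    """Hypothetical routing label: auto-accept clean output, queue the rest for review."""
    return "human_review" if result.needs_review else "auto_accept"
```

For example, `route(validate_model_output('{"account_name": "Acme"}'))` returns `"human_review"` because two required fields are missing.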
Captures key line items at 98% field-level accuracy, tracked in production against underwriter corrections. Scaled throughput from ~700 to ~1,200 statements/month; per the company's annual report, the platform drove an 85% reduction in manual data entry and 50% growth in instant endorsements.
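Field-level accuracy measured against reviewer corrections comes down to counting matching fields. A minimal sketch, assuming each statement is a dict of field name to value, that the corrected record lists every field being scored, and that strings are compared after trimming and case-folding; the function names and the normalization rule are illustrative assumptions, not the production metric.

```python
def _normalize(value):
    # Trim and case-fold strings so "ACME " and "acme" count as the same value.
    return value.strip().casefold() if isinstance(value, str) else value


def field_level_accuracy(predicted, corrected):
    """Fraction of scored fields where the extracted value matches the corrected value."""
    if len(predicted) != len(corrected):
        raise ValueError("predicted and corrected must describe the same documents")
    total = correct = 0
    for pred, gold in zip(predicted, corrected):
        for name, gold_value in gold.items():
            total += 1
            if _normalize(pred.get(name)) == _normalize(gold_value):
                correct += 1
    return correct / total if total else 0.0
```

With two statements where a reviewer changed one of four fields, the function returns 0.75.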
Automated summarization of 400+ sixty-page contracts a month, with page-and-quote citations grounded by post-generation string-match verification against the source PDF.
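A post-generation citation check can be approximated with a normalized substring test. A minimal sketch, assuming page texts have already been extracted from the PDF (one string per page, with 1-based page numbers in the citations) and that whitespace, case, and curly quotes are the only differences worth normalizing; `verify_citations` is a hypothetical name, and real PDF text may also need hyphenation and ligature handling.

```python
import re

# Map typographic quotes to plain ASCII so PDF typography does not cause false misses.
_QUOTES = str.maketrans({"\u201c": '"', "\u201d": '"', "\u2018": "'", "\u2019": "'"})


def normalize(text: str) -> str:
    """Unify quotes, collapse runs of whitespace (including PDF line breaks), and case-fold."""
    return re.sub(r"\s+", " ", text.translate(_QUOTES)).strip().casefold()


def verify_citations(citations, pages):
    """Return the (page, quote) pairs whose quote is not found on the cited page."""
    failures = []
    for page_number, quote in citations:
        in_range = 1 <= page_number <= len(pages)
        # Out-of-range pages fail immediately; otherwise the quote must appear on that page.
        if not in_range or normalize(quote) not in normalize(pages[page_number - 1]):
            failures.append((page_number, quote))
    return failures
```

Any returned pair is a citation that the source does not support at the stated location.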
Built an 8-class classifier (TF-IDF features, a one-class SVM for out-of-distribution detection, and logistic regression) that routes 2,500+ files a month. Chose classical ML over an LLM for cost, latency, and interpretability.
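The three stages map onto scikit-learn components. A minimal sketch, assuming a single one-class SVM fitted on all in-distribution training documents and acting as a gate in front of the multiclass model; the `DocumentRouter` class, the `"manual_review"` fallback label, and the hyperparameters (`nu=0.05`, `sublinear_tf=True`, `max_iter=1000`) are illustrative assumptions, not the production settings.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.svm import OneClassSVM


class DocumentRouter:
    """TF-IDF features, a one-class SVM out-of-distribution gate, then logistic regression."""

    def __init__(self, nu: float = 0.05):
        self.vectorizer = TfidfVectorizer(sublinear_tf=True)
        self.ood = OneClassSVM(nu=nu)  # learns the region occupied by known document types
        self.clf = LogisticRegression(max_iter=1000)

    def fit(self, texts, labels):
        X = self.vectorizer.fit_transform(texts)
        self.ood.fit(X)  # unsupervised: sees only in-distribution documents
        self.clf.fit(X, labels)
        return self

    def route(self, texts):
        X = self.vectorizer.transform(texts)
        inlier = self.ood.predict(X) == 1  # OneClassSVM returns +1 for inliers, -1 for outliers
        preds = self.clf.predict(X)
        # Outliers get a fallback label instead of a forced guess from the classifier.
        return [label if ok else "manual_review" for label, ok in zip(preds, inlier)]
```

The gate is the point of the design: a file unlike anything in training is sent aside rather than confidently assigned to one of the known classes.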
Built a system matching 3,200+ underwriter-opinion documents to database records, scoring section similarity and flagging discrepancies (red/yellow/green) to establish a single source of truth.
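Section-level scoring with traffic-light flags can be sketched with the standard library. This assumes documents and records have already been split into comparable sections keyed by name, uses `difflib.SequenceMatcher` as a stand-in similarity measure, and picks hypothetical cut-offs (0.90 for green, 0.70 for yellow); the actual measure and thresholds are not specified here.

```python
from difflib import SequenceMatcher

# Hypothetical cut-offs for the traffic-light flags.
GREEN_AT = 0.90
YELLOW_AT = 0.70


def flag(similarity: float) -> str:
    """Map a 0-1 similarity score to a red/yellow/green flag."""
    if similarity >= GREEN_AT:
        return "green"
    if similarity >= YELLOW_AT:
        return "yellow"
    return "red"


def compare_sections(document: dict, record: dict) -> dict:
    """Score every section present in either source; a section missing on one side scores 0."""
    results = {}
    for section in sorted(document.keys() | record.keys()):
        doc_text = document.get(section, "").casefold()
        rec_text = record.get(section, "").casefold()
        score = SequenceMatcher(None, doc_text, rec_text).ratio()
        results[section] = (round(score, 2), flag(score))
    return results
```

An identical section comes back green, a changed dollar amount drops it to yellow, and a section present in only one source is red.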
Built a RAG + tool-calling assistant over dashboards and live database metrics, letting staff find dashboards and pull numbers in plain English.
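The tool-calling half of such an assistant reduces to a registry of functions plus a dispatcher for the model's tool requests. A minimal sketch, assuming the model emits a JSON object of the form `{"tool": ..., "args": {...}}` (a hypothetical format, not Bedrock's tool-use schema); the dashboard catalog, the `open_claims` metric, and the tool names are placeholders, and the keyword-overlap search stands in for an embedding-based retrieval step.

```python
import json

# Hypothetical in-memory stand-ins for the dashboard catalog and the live metrics store.
DASHBOARDS = {
    "Monthly Submissions": "submission volume by month and line of business",
    "Claims Overview": "open and closed claims, loss ratio trends",
}
METRICS = {"open_claims": 128}


def search_dashboards(query: str) -> list:
    """Rank dashboards by naive keyword overlap with the query."""
    terms = set(query.lower().split())
    scored = []
    for name, description in DASHBOARDS.items():
        overlap = len(terms & set(f"{name} {description}".lower().split()))
        if overlap:
            scored.append((overlap, name))
    return [name for _, name in sorted(scored, reverse=True)]


def get_metric(name: str):
    """Return the current value of a named metric, or None if it is unknown."""
    return METRICS.get(name)


TOOLS = {"search_dashboards": search_dashboards, "get_metric": get_metric}


def dispatch(tool_call_json: str):
    """Execute one model-emitted tool call, rejecting tools outside the registry."""
    call = json.loads(tool_call_json)
    tool = TOOLS.get(call.get("tool"))
    if tool is None:
        raise ValueError(f"unknown tool: {call.get('tool')!r}")
    return tool(**call.get("args", {}))
```

Restricting execution to the registry keeps the model from invoking anything the assistant was not built to expose.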
Presented analysis to VPs showing no correlation between financial data and claims, driving the pivot from a predictive model to an LLM-with-rubric system applying underwriter guidelines.
Mentored a developer through redesigning a quarterly-analysis LLM workflow into parallel, task-specific agents, cutting token cost by ~70% and eliminating threshold hallucinations.
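One way to picture the parallel design: give each task-specific agent only the slice of input it needs and run the agents concurrently. A minimal asyncio sketch under that assumption; `run_agents` and `fake_agent` are hypothetical, and the fake agent returns a word count in place of a real model call.

```python
import asyncio
from typing import Awaitable, Callable, Dict

# A task-specific agent takes its own input slice and returns its finding.
Agent = Callable[[str], Awaitable[str]]


async def run_agents(sections: Dict[str, str], agents: Dict[str, Agent]) -> Dict[str, str]:
    """Send each agent only the section with its own name and await all of them concurrently."""
    names = list(agents)
    results = await asyncio.gather(*(agents[name](sections[name]) for name in names))
    return dict(zip(names, results))


async def fake_agent(text: str) -> str:
    # Stand-in for a narrow LLM call so the sketch runs offline.
    await asyncio.sleep(0)
    return f"{len(text.split())} words reviewed"
```

Calling `asyncio.run(run_agents(sections, agents))` returns one finding per agent, keyed by section name, in the order the agents were registered.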
Open-source work. Replace each placeholder below with a real project.
[ 2–3 sentences: what you built, the key technical decision, and the result or metric. ]
repo → demo →
[ 2–3 sentences: what you built, the key technical decision, and the result or metric. ]
repo → demo →
[ 2–3 sentences: what you built, the key technical decision, and the result or metric. ]
repo → demo →