A tool that ingests employee benefits PDFs from two organizations, extracts and normalizes the content, and exposes it through a report or conversational agent — so you can ask plain-language questions and get direct comparisons across plans.
The problem
Comparing employee benefits across organizations is harder than it should be. The information exists, but it's buried in dense PDFs with inconsistent formatting, plan-specific terminology, and no standard schema. The result is that most people either spend hours reading documents or make decisions with incomplete information.
Approach
The tool will work in two stages. First, an extraction and normalization step that parses the PDFs, identifies plan sections (medical, dental, vision, FSA/HSA, etc.), and maps them to a common schema. Second, a query layer — either a structured comparison report or a conversational agent — that lets you ask questions like "which plan has lower out-of-pocket maximums for a family?" and get a direct answer with source references.
Stack
Python for the core pipeline. PDF extraction via pdfplumber or pymupdf,
with an LLM-assisted normalization step to handle the structural variation across documents.
The conversational layer will use the Claude API with tool use for structured plan lookups.
Status
In planning. Repo and write-up will be linked here once the initial version is working.