Milovan Dekic
Blog 002 · Case Study

Building an MDA Trends Platform — A Case Study

June 20, 2026 · see it live at MDA Tracker

An end-to-end demonstration of a production AI stack: open-world extraction, semantic clustering, and a two-tier LLM design that keeps a real product under $0.50/month in steady state.

Every public company has to file periodic reports with the SEC (the annual 10-K and the quarterly 10-Q) explaining what's going on in its business. Inside each report there's a section called MD&A, short for Management's Discussion and Analysis, where executives write in plain English what's changing, what worries them, and where they see things heading. It's the most human part of an otherwise very dry document.

mda-trends asks one question: what are companies starting to say that they weren't saying last quarter? That's where the early signal is.

How it works

Every day the system pulls the latest filings from SEC EDGAR and compares each company's MD&A against its previous one, looking only at what's new. Boilerplate legal language gets filtered out. What's left are the paragraphs where something actually changed.
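The diff-and-filter step can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual code: the function names, the exact-match comparison, and the two boilerplate patterns are all assumptions made for the example.

```python
import re

# Hypothetical boilerplate markers; the post does not describe the real filter.
BOILERPLATE_PATTERNS = [
    re.compile(r"forward-looking statements", re.IGNORECASE),
    re.compile(r"should be read in conjunction with", re.IGNORECASE),
]


def normalize(paragraph: str) -> str:
    """Lowercase and collapse whitespace so trivial reflows do not count as changes."""
    return re.sub(r"\s+", " ", paragraph).strip().lower()


def split_paragraphs(text: str) -> list[str]:
    """Split on blank lines and drop empty chunks."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def is_boilerplate(paragraph: str) -> bool:
    """True if the paragraph matches any of the (assumed) boilerplate patterns."""
    return any(pattern.search(paragraph) for pattern in BOILERPLATE_PATTERNS)


def new_paragraphs(previous_mda: str, current_mda: str) -> list[str]:
    """Paragraphs of the current MD&A with no normalized match in the previous one,
    minus anything that looks like boilerplate."""
    seen = {normalize(p) for p in split_paragraphs(previous_mda)}
    return [
        p
        for p in split_paragraphs(current_mda)
        if normalize(p) not in seen and not is_boilerplate(p)
    ]


previous = (
    "Revenue grew 4% year over year.\n\n"
    "This report contains forward-looking statements."
)
current = (
    "Revenue grew 4% year over year.\n\n"
    "This report contains forward-looking statements that involve risk.\n\n"
    "We began shifting component sourcing away from a single supplier."
)
changed = new_paragraphs(previous, current)
```

Here only the sourcing paragraph survives: the revenue sentence is unchanged, and the reworded legal paragraph is caught by the boilerplate filter. Exact matching is the crudest possible comparison; a real pipeline would likely need fuzzy matching to ignore light rewording.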

Those paragraphs go to an AI model that extracts structured signals: what the topic is and which direction it's moving. A clustering algorithm then groups similar signals across companies, looking for themes that are spreading across industries rather than one sector having a bad quarter. A second, more capable model names each cluster and checks whether it actually holds together as a real theme.
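To make the clustering step concrete, here is a dependency-free sketch of agglomerative clustering over cosine distance, followed by a check that a cluster spans more than one sector. The average linkage, the 0.2 distance threshold, the toy 3-dimensional vectors, and the field names are assumptions for illustration; the post does not say which implementation or parameters the pipeline uses.

```python
import math
from itertools import combinations


def cosine_distance(a, b):
    """1 minus cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def agglomerative(vectors, threshold):
    """Average-linkage agglomerative clustering.

    Repeatedly merges the two closest clusters and stops once the closest
    pair is farther apart than `threshold`. Returns lists of input indices.
    No randomness is involved, so the same input gives the same output.
    """
    clusters = [[i] for i in range(len(vectors))]

    def linkage(c1, c2):
        total = sum(cosine_distance(vectors[i], vectors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # min() keeps the first minimal pair, so ties resolve in a fixed order.
        (a, b), dist = min(
            (
                ((i, j), linkage(clusters[i], clusters[j]))
                for i, j in combinations(range(len(clusters)), 2)
            ),
            key=lambda item: item[1],
        )
        if dist > threshold:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe because a < b
    return clusters


# Toy signals; real embeddings would be 384-dimensional MiniLM vectors.
signals = [
    {"sector": "Industrials", "topic": "tariff costs", "vec": [0.9, 0.1, 0.0]},
    {"sector": "Retail", "topic": "import duties", "vec": [0.8, 0.2, 0.0]},
    {"sector": "Tech", "topic": "data center demand", "vec": [0.0, 0.1, 0.9]},
    {"sector": "Tech", "topic": "AI capacity", "vec": [0.1, 0.0, 0.9]},
]
clusters = agglomerative([s["vec"] for s in signals], threshold=0.2)

# Keep only clusters whose signals come from at least two sectors.
cross_sector = [c for c in clusters if len({signals[i]["sector"] for i in c}) >= 2]
```

In this toy run the two trade-related signals merge and the two Tech signals merge, but only the first cluster passes the cross-sector check, which mirrors the goal of separating a spreading theme from one sector's bad quarter.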

The output is a two-lane feed: confirmed trends that have hit a threshold of evidence, and emerging signals that are still forming.
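The lane assignment can be pictured as a simple threshold rule. The sketch below assumes the "threshold of evidence" is a minimum number of distinct companies in a cluster; both that definition and the value 5 are hypothetical, since the post does not spell out the actual rule.

```python
from dataclasses import dataclass

# Hypothetical evidence threshold: distinct companies needed to confirm a trend.
CONFIRMED_MIN_COMPANIES = 5


@dataclass
class Cluster:
    name: str
    companies: set[str]


def lane(cluster: Cluster, min_companies: int = CONFIRMED_MIN_COMPANIES) -> str:
    """Return "confirmed" once enough distinct companies back the cluster, else "emerging"."""
    return "confirmed" if len(cluster.companies) >= min_companies else "emerging"


def build_feed(clusters: list[Cluster]) -> dict[str, list[str]]:
    """Group cluster names into the two lanes, preserving input order."""
    feed: dict[str, list[str]] = {"confirmed": [], "emerging": []}
    for cluster in clusters:
        feed[lane(cluster)].append(cluster.name)
    return feed


feed = build_feed(
    [
        Cluster("Tariff cost pressure", {"AAA", "BBB", "CCC", "DDD", "EEE"}),
        Cluster("Supplier consolidation", {"AAA", "FFF"}),
    ]
)
```

With these made-up clusters, the first lands in the confirmed lane and the second stays in the emerging lane until more companies mention it.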

[Figure: mda-trends architecture]
SEC EDGAR: full-text filings API · free · ~8 req/s
Pipeline: GitHub Actions cron (daily 06:10 UTC)
  Segment & diff: MD&A · self-diff · boilerplate
  Claude Haiku 4.5: Batch API · tool-forced JSON · 50% off
  Local embeddings: MiniLM-L6-v2 · 384-dim · $0
  Clustering: Agglomerative · cosine · deterministic
  Claude Sonnet 4.6: Name + coherence gate · 1 call per cluster · snap to ontology or → candidates
Supabase: Postgres + pgvector · free tier
Next.js 14 / Vercel: Two-lane feed · ISR 30s
Trend cards → no LLM prose
Newsletter: Python → static HTML digest
Cost: Backfill ~$31 one-time · steady-state ~$0.50/mo · ceiling <$50/mo
Data flow — from SEC EDGAR through the pipeline to web and newsletter.
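The "tool-forced JSON" label in the diagram refers to making the extraction model answer through a tool call whose input must match a JSON schema. A minimal sketch of what such a definition could look like follows, using the tool shape (name, description, input_schema) and the forced tool_choice form that the Anthropic Messages API documents. The tool name, the fields, and the direction values are hypothetical, and no API call is made here.

```python
# Hypothetical tool definition; field names and enum values are illustrative only.
EXTRACT_SIGNAL_TOOL = {
    "name": "record_signal",
    "description": "Record one structured signal found in a changed MD&A paragraph.",
    "input_schema": {
        "type": "object",
        "properties": {
            "topic": {"type": "string"},
            "direction": {
                "type": "string",
                "enum": ["increasing", "decreasing", "new", "resolved"],
            },
        },
        "required": ["topic", "direction"],
    },
}

# Forcing this tool means the model has to respond by calling it.
TOOL_CHOICE = {"type": "tool", "name": EXTRACT_SIGNAL_TOOL["name"]}


def validate_signal(payload: dict) -> dict:
    """Defensive check on a tool-call payload before it is stored.

    Raises ValueError if a required field is missing or the direction is
    outside the allowed set; returns a lightly normalized copy otherwise.
    """
    schema = EXTRACT_SIGNAL_TOOL["input_schema"]
    missing = [key for key in schema["required"] if key not in payload]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    allowed = schema["properties"]["direction"]["enum"]
    if payload["direction"] not in allowed:
        raise ValueError(f"unexpected direction: {payload['direction']!r}")
    return {
        "topic": payload["topic"].strip().lower(),
        "direction": payload["direction"],
    }


signal = validate_signal({"topic": " Tariff Costs ", "direction": "increasing"})
```

Validating on the way in is a belt-and-braces choice: the schema constrains what the model is asked to produce, and the check guards the database against anything that slips through.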

The stack

Data: SEC EDGAR full-text API (free, no auth required)
Orchestration: GitHub Actions cron job, runs daily at 06:10 UTC
Extraction: Claude Haiku 4.5 via Batch API (50% cheaper, schema-enforced output)
Embeddings: MiniLM-L6-v2 running locally on CPU, no external API
Clustering: Agglomerative clustering with cosine distance, fully deterministic
Naming & quality gate: Claude Sonnet 4.6, one call per cluster
Database: Supabase (Postgres + pgvector, free tier)
Frontend: Next.js 14 on Vercel
Cost: ~$0.50/month in steady state.