Applied AI / 8 min read

Applied AI in the Wild: Lessons From Building a Real-Estate AI Product (Tribunus Labs)

By WaferZero | Published June 16, 2026
TL;DR
  • In applied AI the demo is the easy 10%; trust on real, messy inputs is the hard 90%, and in a regulated domain a wrong answer is liability, not just a bad vibe.
  • The design rule that earns trust: ground every answer in the source documents, make every claim traceable, and prefer “I am not sure, check here” over confident fabrication.
  • Most of the work is unglamorous data handling (extraction, chunking, retrieval), not prompt cleverness.
  • A live product has a real recurring bill: routing by difficulty, right-sizing context, and caching are what keep it affordable.
  • Understanding cost from the hardware up is what let us cut the bill without cutting quality, the same capability we now sell as research.

Research is one half of what we do; the other is shipping AI into real, messy production. Tribunus Labs is an applied-AI product built for a regulated, document-heavy corner of real estate, and building it taught us lessons that no amount of benchmark-watching can teach. The short version: the demo is the easy 10%, trust is the hard 90%, and understanding cost from the hardware up is what keeps a live product economical.

The gap between a demo and a product

A demo has to impress for five minutes on clean inputs. A product has to be right on the five-hundredth document at two in the morning, on a scanned PDF that is crooked, half-handwritten, and missing a page. That gap (the edge cases, the weird inputs, the reliability, latency, and error handling) is where almost all the real engineering lives. In a regulated, document-heavy domain a wrong answer is not a bad vibe; it is potential liability, and users will not adopt a tool they cannot trust on their own documents.

Grounding, and the cost of being wrong

The central design rule was simple: never let the model free-associate. Every answer is grounded in the source documents through retrieval, and every claim is traceable back to the passage it came from, so a user can verify it in one click. Because the cost of being wrong is high, the system is built to prefer “I am not sure, check here” over a confident fabrication.
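To make the grounding rule concrete, here is a minimal Python sketch of retrieval with citations and abstention. It is not Tribunus Labs' implementation: the word-overlap scorer is a toy stand-in for a real retriever (embeddings or BM25), and the `Passage` type, the `min_score` threshold of 0.5, and the return format are all illustrative assumptions. When no passage clears the threshold, the function returns "I am not sure" and points at the closest candidate instead of answering.

```python
import re
from dataclasses import dataclass


@dataclass
class Passage:
    doc_id: str  # hypothetical source-document identifier
    page: int    # page number, so a citation can link straight to the passage
    text: str


def _terms(text: str) -> set[str]:
    """Lowercased word set: a toy stand-in for an embedding or BM25 retriever."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, passages: list[Passage], k: int = 3) -> list[tuple[float, Passage]]:
    """Score each passage by the fraction of query terms it contains; return the top k."""
    q = _terms(query)
    scored = [(len(q & _terms(p.text)) / max(len(q), 1), p) for p in passages]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def grounded_answer(query: str, passages: list[Passage], min_score: float = 0.5) -> dict:
    """Return cited evidence for the model to answer from, or abstain with a pointer."""
    hits = retrieve(query, passages)
    if not hits or hits[0][0] < min_score:
        # Prefer "I am not sure, check here" over a confident fabrication.
        best = hits[0][1] if hits and hits[0][0] > 0 else None
        where = f"{best.doc_id} p.{best.page}" if best else "the source documents"
        return {"status": "unsure", "message": f"I am not sure; check {where}.", "citations": []}
    evidence = [p for score, p in hits if score >= min_score]
    return {
        "status": "grounded",
        # A real system would prompt the model with only this evidence and require citations.
        "evidence": [p.text for p in evidence],
        "citations": [f"{p.doc_id} p.{p.page}" for p in evidence],
    }
```

The threshold is the design choice that matters: raising it trades coverage for fewer confident mistakes, which is the right direction when a wrong answer is a liability.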

Most of the actual work was unglamorous and lived before the model ever ran: extracting clean text from messy PDFs and scans, handling tables and forms, chunking documents sensibly, and retrieving the right passages. Hallucination is the enemy of trust, and the antidote is mostly retrieval, citation, and guardrails, not a cleverer prompt.
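As one illustration of "chunking documents sensibly", the sketch below splits extracted page text into overlapping word windows and keeps the page number and word offset on every chunk, so a later citation can point back to the exact page. It assumes text has already been extracted page by page; the `Chunk` type and the defaults of 120 words with a 20-word overlap are arbitrary choices for illustration, and documents with tables, forms, or section headings usually want structure-aware splitting rather than fixed windows.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    page: int        # source page, kept so every answer can cite where it came from
    start_word: int  # offset of the chunk's first word within its page
    text: str


def chunk_pages(pages: list[str], max_words: int = 120, overlap: int = 20) -> list[Chunk]:
    """Split each page into overlapping word windows that never cross a page boundary."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be in [0, max_words)")
    step = max_words - overlap
    chunks: list[Chunk] = []
    for page_no, page_text in enumerate(pages, start=1):
        words = page_text.split()
        for start in range(0, len(words), step):
            chunks.append(Chunk(page_no, start, " ".join(words[start:start + max_words])))
            if start + max_words >= len(words):
                break  # this window reached the end of the page; skip redundant tail windows
    return chunks
```

The overlap keeps a sentence that straddles a window boundary retrievable from at least one chunk, at the cost of indexing a few words twice.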

Model routing and cost control in a live product

A live product has a real, recurring bill, and that bill follows exactly the inference economics we write about (see Training vs Inference Economics). The levers that kept it affordable are the same ones the research points to:

Lever | In practice
--- | ---
Route by difficulty | A small, cheap model for extraction and classification; a frontier model only for hard reasoning
Right-size context | Send the retrieved passage, not the whole document, so the KV-cache and token bill stay small
Cache repeated work | Reuse results across the many near-identical requests a workflow generates
Measure per-request cost | Watch the unit economics in production, not just average latency
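The four levers can be sketched in a few dozen lines of Python. Everything here is hypothetical: the two model tiers, their per-million-token prices (placeholders, not real vendor pricing), the word count used as a token proxy, and the `call_model` callback that stands in for whatever inference client is in use. The cache only normalizes case and whitespace; catching looser near-duplicates would need a smarter key.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical tiers and USD prices per million tokens: placeholders, not real pricing.
PRICES = {
    "small": {"in": 0.10, "out": 0.40},
    "frontier": {"in": 3.00, "out": 15.00},
}
CHEAP_TASKS = {"extract", "classify"}

# (model, prompt) -> (answer, tokens_in, tokens_out); a hypothetical client interface.
ModelFn = Callable[[str, str], tuple[str, int, int]]


def route(task: str) -> str:
    """Route by difficulty: the small tier for extraction/classification, frontier otherwise."""
    return "small" if task in CHEAP_TASKS else "frontier"


def fit_context(ranked_passages: list[str], budget_tokens: int) -> list[str]:
    """Right-size context: keep top-ranked passages until a rough token budget is reached."""
    kept: list[str] = []
    used = 0
    for passage in ranked_passages:
        cost = len(passage.split())  # crude proxy: one token per whitespace-separated word
        if used + cost > budget_tokens:
            break
        kept.append(passage)
        used += cost
    return kept


@dataclass
class CostLedger:
    """Measure per-request cost: one USD figure per model call, plus a count of cache hits."""
    per_request: list[float] = field(default_factory=list)
    cache_hits: int = 0

    def record(self, model: str, tokens_in: int, tokens_out: int) -> float:
        price = PRICES[model]
        usd = (tokens_in * price["in"] + tokens_out * price["out"]) / 1_000_000
        self.per_request.append(usd)
        return usd


_cache: dict[str, str] = {}


def cached_call(task: str, prompt: str, ledger: CostLedger, call_model: ModelFn) -> str:
    """Cache repeated work: requests differing only in case or whitespace share one key."""
    normalized = " ".join(prompt.lower().split())
    key = hashlib.sha256(f"{task}|{normalized}".encode()).hexdigest()
    if key in _cache:
        ledger.cache_hits += 1
        return _cache[key]
    model = route(task)
    answer, tokens_in, tokens_out = call_model(model, prompt)
    ledger.record(model, tokens_in, tokens_out)
    _cache[key] = answer
    return answer
```

With a stub `call_model`, a second "find the deposit" request that differs only in capitalization is served from the cache and never reaches the ledger, while a reasoning task is routed to the frontier tier and recorded at its higher price.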

What the silicon-up view bought us

When the bill arrived, knowing the cost structure from the metal up is what let us cut it without cutting quality. We could reason about why long context was expensive (the KV-cache, not the compute), when a smaller model was genuinely good enough for a task, and where tokens were being spent on output nobody needed. That is the difference between managing AI cost as an engineering quantity and guessing at it from an invoice, and it is the same capability we now offer as research.
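The KV-cache point can be checked with simple arithmetic. A transformer keeps one key and one value vector per layer, per KV head, per token, so cache size grows linearly with context length. The sketch below uses a hypothetical configuration (32 layers, 8 KV heads, head dimension 128, fp16 cache) chosen for illustration, not any particular model we ran.

```python
def kv_cache_bytes(seq_len: int, n_layers: int, n_kv_heads: int, head_dim: int,
                   bytes_per_elem: int = 2, batch: int = 1) -> int:
    """Bytes of KV-cache: one K and one V vector per layer, per KV head, per token."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * seq_len * batch


# Hypothetical mid-size config with grouped-query attention and an fp16 cache.
CFG = dict(n_layers=32, n_kv_heads=8, head_dim=128, bytes_per_elem=2)

whole_doc = kv_cache_bytes(100_000, **CFG)  # stuffing a long document into context
passage = kv_cache_bytes(2_000, **CFG)      # sending only the retrieved passage
print(f"whole document: {whole_doc / 2**30:.1f} GiB, passage: {passage / 2**20:.0f} MiB")
# prints: whole document: 12.2 GiB, passage: 250 MiB
```

The 50x gap is just the ratio of token counts, but it is memory held for the whole request: a larger per-request cache means fewer concurrent requests fit on a GPU, which is one way long context shows up on the bill.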

The takeaway

Shipping applied AI well is mostly discipline, not magic: ground every answer and make it checkable, sweat the unglamorous data-handling, and treat cost as a first-class engineering metric you can reason about rather than a surprise at month-end. Tribunus Labs is where we learned these lessons by living them, and it is why our research is written by people who have had to make AI work, and pay, in production.

Sources
  1. Lewis et al., "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks", NeurIPS 2020
  2. Tribunus Labs
