NimbleNova Blog

We built our own AI tool — here’s what we learned along the way

Written by Sophia Ward | June 2025

TL;DR (1-minute read)

We couldn’t find an AI tool we trusted — so we built one ourselves.

As consultants, we needed something grounded, explainable, and consistent enough to use in real client work. Off-the-shelf tools like ChatGPT helped in small ways, but broke down when it came to traceability, reliability, and control.

This post shares what we learned over a year of building Altea — our own AI assistant for knowledge workers. From technical architecture to UX design, every decision was shaped by real needs: clearer reasoning, structured outputs, consistent results, and the ability to see exactly where an answer came from.

Here’s what we built into Altea:

  • A graph-based retrieval engine that replaced black-box vector search
  • Layered, document-aware ontologies to handle inconsistent language across domains
  • Modular AI agents that each solve a distinct part of the workflow
  • Template-driven outputs for clarity and consistency
  • UX features that make reasoning visible and trustworthy

Why we built Altea

Like many, we’ve spent the past few years watching the rise of AI — especially large language models — and wondering how they might reshape the way we work.

But for us, it couldn’t stay hypothetical. As consultants helping clients navigate digital transformation, we couldn’t just advise from the sidelines — we had to explore these tools ourselves and understand what actually worked in practice.

So we moved beyond curiosity. We began testing AI in our own workflows: reviewing dense documents, writing reports, and piecing together insights from chaotic folders full of PowerPoints, spreadsheets, PDFs, and meeting notes. Could AI help us do this faster, with less repetition and more consistency?

We work with documents all day — complex, fragmented, often confidential.
Making sense of that material is at the core of our work. So when we started using AI, we needed it to help us navigate that complexity without losing trust, traceability, or control.

We tried tools like ChatGPT. Sometimes they helped — summarising snippets, rewording text. But they quickly hit limits. Answers changed from run to run. We couldn’t trace where the information came from. And there was no guarantee it was grounded in our own work — let alone a client’s.

We now use ChatGPT for Business — mainly to protect client confidentiality and benefit from more persistent context. But even then, the core issues remain: it struggles to ground its answers in the documents we provide and often hallucinates information. For the kind of work we do, that makes it hard to trust — and impossible to reuse.

So we decided to build something better.

We brought in an AI expert — first for client projects, and then to lead our internal journey. We didn’t start with a product vision. We started with a problem: how to speed up our daily work without sacrificing accuracy or trust.

We also considered training our own model — but quickly ruled it out. The costs were high, the benefits marginal, and we’d be constantly chasing fast-evolving foundation models.

So instead of competing, we built around them — adding structure, explainability, and practical constraints to make AI usable in real knowledge work.

That meant defining a few core principles early:

  • Truth should come from bounded sources — our actual files, not general training data
  • Outputs had to be explainable and controllable — we needed to see where answers came from
  • Results had to be consistent — same input, same result
  • Value had to be practical — reducing effort in reporting, templating, handovers, and information retrieval

These principles became the foundation for what we built: Altea.

Altea is our AI assistant for knowledge workers in small and medium enterprises — people who work with complex information, produce reports, and need clarity from chaos. It doesn’t replace your tools. It brings structure, transparency, and confidence to the knowledge you already have.

Rooted in the Greek concept of Aletheia (ἀλήθεια) — “truth” as an act of unveiling — Altea does more than retrieve data. It uncovers, connects, and explains. It transforms scattered, evolving content into structured knowledge — and keeps you in control.

What follows are the lessons we learned while building it: the things that worked, the paths that didn’t, and the decisions that shaped what Altea is today.

From technical architecture to design decisions

We started with vector search — and ended up building a knowledge graph

Our first approach used a vector database to retrieve content by semantic similarity and fed it to the model as extra context in the prompt, the pattern known as retrieval-augmented generation (RAG).

Back when we began, RAG wasn’t yet widely available in open-source form — so our AI developers built a custom version from scratch. It worked well for small tests, but quickly broke down in real-world use. We couldn’t see why certain results were selected, or what parts of the source they matched. We realised accuracy didn’t come from similarity alone — it came from structure and context.

So we switched to a graph-based approach. By breaking documents into nodes and linking related concepts, we preserved relationships, improved traceability, and made results explainable.
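
To make that concrete, here’s a minimal sketch of the idea in Python with networkx. The schema is our simplification for this post, not Altea’s actual one: document chunks and the concepts they mention become nodes, and retrieval walks from a matched concept back to its source chunks, so every result carries an explicit provenance trail.

    # A minimal sketch of graph-based retrieval (illustrative schema, not Altea's).
    import networkx as nx

    G = nx.Graph()

    # Document chunks become nodes, tagged with their source file.
    G.add_node("chunk:report-p3", kind="chunk", source="q3_report.pdf")
    G.add_node("chunk:notes-item7", kind="chunk", source="meeting_notes.docx")

    # Concepts mentioned in those chunks become nodes, linked to where they appear.
    G.add_node("concept:churn-rate", kind="concept")
    G.add_edge("concept:churn-rate", "chunk:report-p3", relation="mentioned_in")
    G.add_edge("concept:churn-rate", "chunk:notes-item7", relation="mentioned_in")

    def retrieve(graph, concept):
        """Return the source chunks for a concept, with provenance attached."""
        hits = []
        for neighbor in graph.neighbors(concept):
            node = graph.nodes[neighbor]
            if node.get("kind") == "chunk":
                hits.append({"chunk": neighbor, "source": node["source"]})
        return hits

    print(retrieve(G, "concept:churn-rate"))
    # Each hit points back to the exact file it came from.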

Graphs gave us clarity — until they got too complex

As we scaled up, our graphs became enormous. Everything was connected — but not everything was useful. Relationships multiplied until the signal got lost in the noise.

We tried ranking nodes with PageRank (the link-analysis algorithm behind Google’s original search ranking) to highlight the most connected concepts. It worked well on static data. But our documents were messy, inconsistent, and constantly changing — and high connectivity didn’t always mean high relevance.

Lately, we’ve been testing community detection — clustering nodes based on their relationships — and it’s helped us surface meaningful structure without overwhelming noise.
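
As a rough illustration of the difference, here’s a toy comparison of the two techniques (assuming the networkx library; Altea’s actual ranking logic is more involved than this):

    # Toy comparison: PageRank vs. community detection on a small graph.
    import networkx as nx
    from networkx.algorithms.community import greedy_modularity_communities

    G = nx.karate_club_graph()  # a classic toy network, standing in for a concept graph

    # PageRank rewards high connectivity, which isn't always high relevance.
    ranks = nx.pagerank(G)
    print("Most connected:", sorted(ranks, key=ranks.get, reverse=True)[:5])

    # Community detection instead groups related nodes into clusters,
    # surfacing structure without ranking everything globally.
    for i, community in enumerate(greedy_modularity_communities(G)):
        print(f"Cluster {i}: {sorted(community)}")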

Building reliable, repeatable outputs

We had to choose accuracy over creativity

In early versions, we let the model do what it does best: generate. It could rephrase, fill in gaps, even infer meaning. But the more creative it got, the more hallucinations we saw. It would invent facts or present shaky interpretations with full confidence.

We couldn’t rely on that. So we constrained the system. We introduced templates, bounded sources, and scoring logic to enforce consistency and traceability — even if it meant sacrificing some of the “magic” LLMs are known for.

That trade-off — reliability over fluency — made Altea something we could actually use.

Why we built parameter controls and scoring logic

In early tests, the same question could yield different answers. That unpredictability made it hard to trust — or reuse — anything the model produced.

We fixed that by locking in parameters, adding scoring rules, and enforcing consistent formatting. Now, same input means same output — every time. No surprises.

It’s one of the most practical features we built, and one we hadn’t planned for at the start.
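
For the curious, “locking in parameters” mostly means pinning everything that can introduce randomness. Here’s a minimal sketch assuming an OpenAI-style chat API; the model name and prompts are illustrative, and this isn’t a claim about Altea’s exact stack:

    # A sketch of locked generation parameters (illustrative, not Altea's stack).
    from openai import OpenAI

    client = OpenAI()

    FIXED_PARAMS = {
        "model": "gpt-4o-2024-08-06",  # a pinned snapshot, never a "latest" alias
        "temperature": 0,              # no sampling randomness
        "seed": 42,                    # best-effort reproducibility across runs
    }

    def ask(question: str, context: str) -> str:
        response = client.chat.completions.create(
            messages=[
                {"role": "system", "content": "Answer only from the provided context."},
                {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
            ],
            **FIXED_PARAMS,
        )
        return response.choices[0].message.content

Temperature zero and a fixed seed get you most of the way; pinning the model snapshot matters just as much, since silent model upgrades change outputs too.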

Shifting the AI system architecture

We stopped chasing a single smart agent — and built a system of specialists

At one point, we thought one intelligent AI agent could do it all: retrieve documents, understand context, answer questions, format outputs.

But that model quickly became too complex. Each function had different needs — and debugging a monolithic system was a nightmare.

So we split it up. We built modular AI agents, each with a single role. For example:

  • One chooses the ontology
  • One parses the question
  • One scores the content
  • One formats the answer

They operate independently but inside a shared pipeline. This modular setup made debugging easier, upgrades safer, and iteration faster.
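
In code, the shape is simpler than it sounds: a shared context object and a list of single-purpose steps. The sketch below is a toy version written for this post; the agent logic is placeholder, but the pipeline structure is the point.

    # Single-purpose agents sharing one pipeline (toy logic, real structure).
    from dataclasses import dataclass, field

    @dataclass
    class Context:
        """Shared state passed along the pipeline."""
        question: str
        ontology: str = ""
        parsed: dict = field(default_factory=dict)
        scored: list = field(default_factory=list)
        answer: str = ""

    def choose_ontology(ctx: Context) -> Context:
        ctx.ontology = "finance" if "revenue" in ctx.question.lower() else "general"
        return ctx

    def parse_question(ctx: Context) -> Context:
        ctx.parsed = {"intent": "lookup", "terms": ctx.question.lower().split()}
        return ctx

    def score_content(ctx: Context) -> Context:
        ctx.scored = [("chunk:report-p3", 0.91)]  # placeholder scoring
        return ctx

    def format_answer(ctx: Context) -> Context:
        ctx.answer = f"[{ctx.ontology}] top source: {ctx.scored[0][0]}"
        return ctx

    PIPELINE = [choose_ontology, parse_question, score_content, format_answer]

    def run(question: str) -> str:
        ctx = Context(question=question)
        for agent in PIPELINE:  # each agent does one job; easy to swap or test alone
            ctx = agent(ctx)
        return ctx.answer

    print(run("What was Q3 revenue?"))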

Evolving the product through real use

We thought answering questions was enough — but we needed structure

At first, Altea was designed to surface relevant content. But we found ourselves wanting more than answers — we needed structured outputs: summaries, comparison tables, decision notes, reports.

So we built a templating engine. It defines the shape and structure of responses, so every output is consistent, reusable, and aligned with how we actually work.
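
Here’s the principle in miniature, using Python’s built-in string.Template (Altea’s engine is richer, but the idea is the same): the template fixes the shape, and the model only fills the slots.

    # Templates fix the output shape; the model only fills the slots.
    from string import Template

    HANDOVER = Template(
        "Handover note: $project\n\n"
        "Status:\n$status\n\n"
        "Open decisions:\n$decisions\n\n"
        "Sources:\n$sources\n"
    )

    # In practice each slot would be filled by a source-bounded model call;
    # here we use placeholder strings.
    note = HANDOVER.substitute(
        project="Altea rollout",
        status="Pilot complete; two clients onboarding.",
        decisions="- Pricing tier for SMEs\n- Hosting region",
        sources="- q3_report.pdf, p. 3\n- meeting_notes.docx, item 7",
    )
    print(note)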

The template library — now covering everything from handovers to AI risk frameworks — saves us hundreds of hours each month. It’s one of the most pragmatic parts of the system.

Understanding the question wasn’t the only problem; we also had to trust the answer

In one early test, we asked ChatGPT a question about the NIST AI Risk Management Framework and gave it the most recent source file. But ChatGPT ignored that and used outdated definitions from its training data.

We realised RAG wasn’t enough if the model didn’t anchor to our actual sources. And we saw how even small changes in question phrasing could lead to huge swings in output.

So we tackled both: we built better pre-processing of questions, clearer mappings to the graph, and strict source-bounding. That’s how we moved from plausible-sounding answers to dependable ones.
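
Strict source-bounding boils down to a contract like the one sketched below. The wording is our illustration rather than Altea’s actual prompt: the model may only cite the excerpts it is handed, and must say so when they don’t contain the answer.

    # A sketch of strict source-bounding (illustrative prompt, not Altea's).
    SYSTEM_PROMPT = (
        "Answer using ONLY the numbered excerpts below.\n"
        "Cite each claim as [1], [2], etc.\n"
        'If the excerpts do not contain the answer, reply exactly: '
        '"Not found in the provided sources."\n'
        "Do not use any other knowledge."
    )

    def build_prompt(question: str, excerpts: list[str]) -> str:
        numbered = "\n".join(f"[{i + 1}] {text}" for i, text in enumerate(excerpts))
        return f"{SYSTEM_PROMPT}\nExcerpts:\n{numbered}\n\nQuestion: {question}"

    print(build_prompt(
        "How does the framework define AI risk?",
        ["(excerpt from the latest framework document would go here)"],
    ))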

UX, continuity, and trust

We needed visible reasoning — not just results

Showing the answer wasn’t enough. We needed to see why it showed up, where it came from, and how confident the system was.

So we built reliability scores, surfaced source links, and made the graph explorable. Trust didn’t come from backend logic — it came from front-end clarity.
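
In data terms, that means every answer ships with its sources and a score the interface can display. The payload below is a sketch, and the scoring is a naive stand-in for Altea’s actual reliability logic:

    # Every answer carries its evidence: a sketch of the payload the UI renders.
    from dataclasses import dataclass

    @dataclass
    class SourcedAnswer:
        text: str
        sources: list[dict]  # file, location, and retrieval score per source

        @property
        def reliability(self) -> float:
            """Naive aggregate: mean of per-source retrieval scores."""
            if not self.sources:
                return 0.0
            return sum(s["score"] for s in self.sources) / len(self.sources)

    answer = SourcedAnswer(
        text="Churn rose two points in Q3.",  # placeholder answer text
        sources=[
            {"file": "q3_report.pdf", "page": 3, "score": 0.91},
            {"file": "meeting_notes.docx", "item": 7, "score": 0.78},
        ],
    )
    print(f"{answer.text} (reliability {answer.reliability:.2f})")
    for s in answer.sources:
        print("  source:", s["file"])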

Tip: To speed up those iterations, we used Cursor, an AI coding assistant that helped us experiment with frontend tweaks without deep React expertise. It made it easier for our technical team to translate feedback into interface improvements quickly.

So where are we now?

We’ve made mistakes, changed direction more than once, and learned a lot along the way — but we’ve built something we’re proud of.

We built Altea for ourselves — but once we started using it in real projects, the response from our clients was clear:

“Can we use it ourselves?”

So now we’re getting ready to share it. We are launching Altea as a standalone product in October 2025.

More soon.
Sophia Ward from the NimbleNova team