How to build a business case for AI before writing a line of code
Most AI projects do not fail because the technology stops working. They fail because nobody agreed, before a single line of code was written, on what "working" actually meant.
This is a structural problem, not a technical one. Engineering teams begin evaluating models. Product teams scope features. Leadership approves a budget. And somewhere in that sequence, the foundational question gets deferred: what does this need to achieve, at what cost, to justify the investment?
The consequences of skipping that question surface late and at great expense. A system reaches 94% accuracy and the team celebrates, until someone realizes the target was always 99%, because anything below that still requires the same manual review process it was supposed to replace. An AI agent successfully downloads utility bills from hundreds of provider portals, but the cost per transaction turns out to be three times what the existing supplier charges. The technology worked. The business case did not.
Building a rigorous business case before development begins is not a finance exercise bolted onto an engineering project. When done correctly, it becomes the architectural brief. It determines which model is good enough, which infrastructure is acceptable, and which constraints are non-negotiable.
The business case and the technical design are the same document.
What a pre-development business case actually requires
A useful AI business case answers three questions before any technical evaluation begins.
What existing spend does this replace?
Every AI initiative worth building displaces something: a third-party vendor, a manual process, a team performing repetitive work, or some combination of all three. Identifying and quantifying that displacement is the starting point. If the AI solution costs more than what it replaces, the investment logic collapses regardless of how impressive the technology is.
What accuracy does the replacement need for the economics to hold?
This is the question most teams skip, and it is the most consequential one. An AI system operating at 90% accuracy may still require significant human review, which means the labour cost it was supposed to eliminate persists. The accuracy target is not a technical aspiration; it is a business threshold. Below a certain point, the model does not replace the process; it adds a layer to it.
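That threshold can be computed directly rather than intuited. A minimal sketch, using entirely hypothetical figures (the per-document labour cost avoided and the cost of correcting one error are assumptions for illustration, not numbers from any real engagement):

```python
def breakeven_accuracy(labour_saved_per_doc, correction_cost_per_error):
    """Accuracy below which the cost of correcting errors wipes out
    the per-document labour the system was meant to eliminate.

    Net saving per document = labour_saved - (1 - accuracy) * correction_cost.
    Setting that net saving to zero gives the break-even accuracy.
    """
    return max(0.0, 1.0 - labour_saved_per_doc / correction_cost_per_error)

# Hypothetical: $2 of manual handling avoided per document, but each
# missed field costs $40 of review time to find and fix.
threshold = breakeven_accuracy(2.0, 40.0)
print(f"break-even accuracy: {threshold:.0%}")  # 95%
```

Anything below that figure means the system is, on net, adding cost rather than removing it, which is exactly the business threshold the text describes.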
What is the maximum allowable cost per unit of output?
Whether the unit is a processed document, a transaction, a classification decision, or a customer interaction, there is a price point above which the economics do not work. Defining that ceiling before development begins means every subsequent architectural decision (which model, which infrastructure, which hosting arrangement) is evaluated against a concrete constraint rather than a vague preference for efficiency.
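The ceiling itself can be derived mechanically from the spend being displaced. A sketch with hypothetical inputs (the vendor spend, annual volume, and required saving are all assumptions):

```python
def cost_per_unit_ceiling(annual_displaced_spend, annual_volume,
                          required_saving=0.0):
    """Maximum the new system may spend per unit of output.

    required_saving is the fraction of the displaced spend that must
    actually be saved for the project to be worth doing (0.3 = 30%).
    """
    return annual_displaced_spend * (1.0 - required_saving) / annual_volume

# Hypothetical: a $300k/year vendor, 100k documents/year, and a demand
# that the replacement come in at least 30% cheaper than the status quo.
ceiling = cost_per_unit_ceiling(300_000, 100_000, required_saving=0.3)
print(f"${ceiling:.2f} per document")  # $2.10
```

Any candidate architecture whose projected inference, hosting, and maintenance cost per unit lands above that number is out of bounds before benchmarking even starts.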
The relationship between the business case and the technical design
When these three questions are answered clearly, something important happens: the scope of the technical problem becomes specific. This is not incidental. It fundamentally changes how an engineering team approaches the build.
Consider the difference between "build an AI that extracts data from invoices accurately" and "build an AI that extracts data from invoices at 99% accuracy, at a cost of under $X per invoice, running entirely within our own infrastructure." The first framing produces a feature. The second produces a system with defined success criteria, testable benchmarks, and a cost model that can be validated before deployment.
In practice, the cost-per-unit ceiling often drives infrastructure decisions that would not otherwise be obvious. A model that achieves high accuracy using a commercial API may be technically impressive but commercially unviable at scale. That constraint pushes the engineering team toward self-hosted open-source models, a different set of trade-offs, different fine-tuning requirements, and a different deployment architecture. None of that becomes clear without the business case establishing the ceiling first.
Similarly, the accuracy target shapes the evaluation methodology. If the acceptable error rate is 1% across 100 extracted fields, teams need benchmark datasets, structured evaluation runs, and a rigorous understanding of where the model fails before committing to a particular approach. Without that target, benchmarking becomes qualitative and the scale-to-production decision becomes a guess.
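When the target is a defined error rate across extracted fields, the core of the benchmark harness can be as simple as counting exact field matches against a gold dataset. A minimal sketch (the dict-of-fields representation of an extraction is an assumption about how results are stored):

```python
def field_accuracy(predictions, gold):
    """Fraction of gold fields the model reproduced exactly.

    predictions and gold are parallel lists of dicts, one per document,
    mapping field name -> extracted value.
    """
    total = correct = 0
    for pred, true in zip(predictions, gold):
        for field, value in true.items():
            total += 1
            correct += int(pred.get(field) == value)
    return correct / total if total else 0.0

gold = [{"amount": "412.50", "due_date": "2024-07-01"}]
pred = [{"amount": "412.50", "due_date": "2024-07-10"}]
print(field_accuracy(pred, gold))  # 0.5
```

Scoring per field rather than per document is what turns "the model sometimes fails" into a structured picture of where it fails, which is the understanding the scale-to-production decision depends on.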
Case study: U.S.-based energy intelligence platform
To see what this looks like in practice, consider a project we worked on that shows how directly a business case can shape technical architecture.
A U.S.-based energy intelligence platform was managing utility data for a large portfolio of commercial and industrial clients. Each month, utility invoices arrived from hundreds of providers across North America, each with a different format, field structure, and layout. The client had been using a third-party service to collect and extract this data, but accuracy had degraded, and the manual correction overhead had grown to the point where the economics of the arrangement no longer held.
The brief, at its simplest, was to replace that third-party service with an AI-powered pipeline. Before any technical evaluation began, we worked with the client to define the business case precisely. What did the existing provider cost annually? What did the internal team spend correcting errors it introduced? What headcount was involved, and what would need to change for those costs to be reduced? The answers to those questions established a baseline: the total cost of the current state, across all its components.
From that baseline, two constraints followed. The first was a cost-per-invoice ceiling, the maximum the new system could spend per bill processed for the investment to generate a positive return over a three-year horizon. The second was an accuracy floor of 97%, the point below which the volume of errors would still require a level of manual review that made the cost savings disappear.
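With hypothetical figures in place of the client's confidential ones, the ceiling calculation looks like this (the annual baseline cost, build cost, and invoice volume below are invented for illustration; only the three-year horizon comes from the engagement):

```python
def cost_per_invoice_ceiling(annual_baseline_cost, build_cost,
                             annual_volume, years=3):
    """Max cost per invoice for the replacement to break even
    over the given horizon.

    Break-even condition:
        years * volume * ceiling + build_cost = years * annual_baseline_cost
    """
    return (years * annual_baseline_cost - build_cost) / (years * annual_volume)

# Hypothetical: $400k/year current state (vendor fees plus internal
# correction labour), $300k to build, 120k invoices per year.
ceiling = cost_per_invoice_ceiling(400_000, 300_000, 120_000)
print(f"${ceiling:.2f} per invoice")  # $2.50
```

The output is a single number every subsequent model and hosting decision can be tested against.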
Those two numbers changed the architecture before the architecture existed. When we began evaluating language models for the extraction task, the cost-per-unit ceiling immediately narrowed the field. Several capable hosted models were ruled out, not because they performed poorly, but because their inference costs at the expected processing volumes could not meet the ceiling. The evaluation shifted to open-source models, including Mistral and Qwen, that could be self-hosted on hardware priced to meet the constraint.
The accuracy target shaped the development process in different ways. Achieving 93% to 94% accuracy through initial fine-tuning was relatively quick. Reaching 97% required a more layered approach: splitting invoices into segments before extraction, running multiple model passes and combining the outputs, and ultimately building a mechanism to retrieve historical invoices from the same provider and use that context to inform the current extraction. Each of those steps was a direct response to a defined target, not an open-ended effort to improve the model. The business case gave the team a finish line, and the engineering work was organized around reaching it.
Why the business case also reveals what "done" looks like
One of the less obvious functions of a pre-development business case is that it defines the endpoint. Without it, AI projects tend to expand into indefinite optimization cycles, because there is no agreed standard for when the system is good enough to replace what it was meant to replace.
When the business case specifies that the existing third-party provider costs a defined amount annually, that a manual review team can be reduced by a specific headcount, and that the new system needs to hit a defined cost and accuracy threshold to make those reductions possible, the team has a finish line. Progress is measurable. The decision to move to production, or to keep iterating, is grounded in evidence rather than engineering preference.
This also changes how teams handle the distance between early performance and the target. A system reaching 97% accuracy when the target is 99% is not nearly done; it is at a meaningful decision point. The team must evaluate whether the remaining gap can be closed through additional fine-tuning, architectural changes such as splitting the document into segments before extraction, or contextual enrichment from historical data. Each of those approaches has a cost and a timeline. The business case gives the team a framework for making that evaluation rather than simply continuing to iterate.
Where business cases most commonly fail
Even when teams attempt a pre-development business case, several failure modes recur.
The displacement calculation is optimistic. Teams calculate the cost of the vendor or process being replaced, but underestimate the transition costs, integration overhead, and ongoing maintenance required to sustain the AI system. The net saving looks attractive in the model and disappoints in the first year of operation.
The accuracy target is set by intuition rather than process analysis. A target of 97% accuracy sounds rigorous, but it may be too low if the downstream process requires manual correction of every error, or unnecessarily high if errors are caught cheaply by a downstream validation step. The right accuracy target comes from mapping the process, not from picking a number that feels ambitious.
The cost-per-unit ceiling ignores infrastructure at scale. A system that meets its cost target during testing, when processing hundreds of documents, may blow past it at production volumes of hundreds of thousands. Architectural decisions that are cheap at small scale (model size, API calls, redundancy requirements) behave differently at volume. The business case needs to model the cost curve, not just the unit cost at the expected initial throughput.
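Modelling the curve rather than a single unit cost takes only a few lines. In this sketch the API price per document and the self-hosting fixed and marginal costs are invented numbers; the point is that the cheaper option flips as volume grows, not the specific figures:

```python
def annual_cost(volume, fixed, per_doc):
    """Total yearly cost as a function of processing volume."""
    return fixed + per_doc * volume

# Hypothetical cost structures: a commercial API with no fixed cost but
# a high per-document price, versus self-hosting with a large fixed
# hardware-and-ops bill and a small marginal cost.
hosted_api = lambda v: annual_cost(v, fixed=0, per_doc=0.25)
self_hosted = lambda v: annual_cost(v, fixed=60_000, per_doc=0.01)

for volume in (1_000, 50_000, 250_000, 1_000_000):
    cheaper = "API" if hosted_api(volume) < self_hosted(volume) else "self-hosted"
    print(f"{volume:>9,} docs/yr -> {cheaper}")
```

A proof of concept run at the low end of that table can comfortably meet its cost target while the production system, several orders of magnitude along the curve, cannot.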
The business case treats compliance and security as outside scope. In regulated environments or where data sensitivity is a constraint, the business case must include the cost of operating within those constraints. Deploying a self-hosted model to avoid sending credentials or sensitive data to a third-party API is a business requirement, not a technical preference. The cost of that requirement should be included in the model from the beginning.
Building the case before the code
The practical implication of all of this is straightforward: the business case should be completed, challenged, and agreed upon before the technical evaluation begins, not in parallel with it.
That sequencing is harder than it sounds. There is always pressure to start building something while the business case is still being developed. Proof-of-concept work can feel productive and create momentum. But a proof-of-concept that has not been scoped against a cost ceiling or an accuracy threshold is likely testing the wrong thing. It validates technical feasibility without establishing commercial viability, and those are different questions.
The teams that manage AI investments well tend to treat the business case as the first engineering artifact of the project. It constrains the solution space before the solution is designed. It sets the evaluation criteria before the evaluation begins. And it defines what the project needs to deliver before anyone decides how to deliver it.
That discipline is what separates AI projects that compound value from those that compound cost.
Navigate AI adoption with our assistance
If you want to understand whether AI can strengthen your architecture or whether it would amplify existing issues, we can help you assess that.