Why AI Can't Do CRE Math (And Why That's Good News)

An acquisitions analyst pastes a T12 into an AI chat and asks for the levered IRR at a 5.5% exit cap. The model returns 22.4%, with a short narrative explaining the assumptions. The number looks plausible. It is also wrong by 340 basis points. The analyst does not catch it. The IRR ends up in the IC memo.

This is not a hypothetical. It happens every week in firms that have wired a frontier LLM directly to their deal data without architectural discipline. The mistake is not using AI. The mistake is asking AI to do the math.

The thesis of this piece is simple. AI is the most important technology shift in a generation. It also cannot do arithmetic reliably or consistently. Frontier models get better at it with every release, but the gap is structural, not just a matter of model size. Once you internalize both facts, the architecture for using AI inside an institutional CRE firm gets clear, and the upside gets enormous.

Why this happens, and why “better prompts” won’t fix it

Large language models are statistical pattern matchers. They predict the next token in a sequence based on probability distributions learned from training data. When you ask one to compute an IRR, it is not running a discounted cash flow. It is predicting what an IRR answer would look like, token by token, based on every IRR-shaped string of text it has ever seen.

Sometimes that prediction is close. Sometimes it is not. The model has no internal sense of which case it is in. It outputs both with the same confident syntax. This is structural to how transformers work. It is not a bug that fine-tuning fixes. It is not a problem that better prompts solve. It is the engine.

You can ask a frontier model to use a code interpreter, and the math itself becomes deterministic inside the sandbox. But the AI is still deciding which numbers to pass in, which formula to use, and how to interpret the output. The orchestration is still fallible. The seam between the language layer and the compute layer still leaks.

What CRE math actually requires

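Institutional CRE math has a specific set of requirements that are non-negotiable. A levered IRR is not a creative answer. It is a deterministic function of the equity cash-flow stream: the equity invested, the levered cash flows over the hold, and the net sale proceeds implied by the exit assumption. No discount rate goes in, because the IRR is the discount rate that sets the net present value of that stream to zero. The same inputs must produce the same output every time. The formula has to be inspectable. The lineage has to go back to source.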
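To make "deterministic" concrete, here is a minimal Python sketch of a levered IRR. It is an illustration, not a production underwriting model. The function names (npv, irr, net_sale_proceeds), the annual timing, the bisection solver, and every dollar figure are assumptions made for this example. The exit cap enters only through the sale proceeds added to the final year.

```python
from typing import List, Sequence


def npv(rate: float, cash_flows: Sequence[float]) -> float:
    """Net present value of annual cash flows; cash_flows[0] occurs at t=0."""
    return sum(cf / (1.0 + rate) ** t for t, cf in enumerate(cash_flows))


def irr(cash_flows: Sequence[float], lo: float = -0.99, hi: float = 10.0,
        tol: float = 1e-10, max_iter: int = 500) -> float:
    """Rate at which NPV equals zero, found by bisection.

    Assumes NPV changes sign exactly once on [lo, hi], which holds for a
    conventional deal: one equity outflow followed by net inflows.
    """
    f_lo, f_hi = npv(lo, cash_flows), npv(hi, cash_flows)
    if f_lo * f_hi > 0:
        raise ValueError("NPV does not change sign on the bracket")
    for _ in range(max_iter):
        mid = (lo + hi) / 2.0
        f_mid = npv(mid, cash_flows)
        if abs(f_mid) < tol or (hi - lo) / 2.0 < tol:
            return mid
        if f_lo * f_mid < 0:
            hi = mid               # root lies in the lower half
        else:
            lo, f_lo = mid, f_mid  # root lies in the upper half
    return (lo + hi) / 2.0


def net_sale_proceeds(forward_noi: float, exit_cap: float,
                      loan_payoff: float, selling_cost_pct: float) -> float:
    """Gross value is forward NOI / exit cap, less selling costs and loan payoff."""
    gross_value = forward_noi / exit_cap
    return gross_value * (1.0 - selling_cost_pct) - loan_payoff


# Hypothetical deal, invented for illustration only.
flows: List[float] = [-4_000_000.0, 250_000.0, 265_000.0, 280_000.0, 295_000.0]
flows[-1] += net_sale_proceeds(forward_noi=550_000.0, exit_cap=0.055,
                               loan_payoff=5_500_000.0, selling_cost_pct=0.02)

print(f"Levered IRR: {irr(flows):.4%}")
```

Run it twice and the result is identical to the last digit. Change one input and the difference traces to that input. That is the property the IC memo needs.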

Why does this matter? Because when an LP asks where a rent growth assumption came from, you need to point to a comp set, a regression on historical submarket data, and a model that ran on a specific date. When an auditor reviews a fund report, the same calculation has to reproduce two years later. When an IC challenges an underwriting number, you have to show the code path that produced it.

A language model on its own satisfies none of these requirements. Its output is sampled from a probability distribution, so the same prompt can return different numbers on different days. There is no inspectable formula, only that distribution over tokens. The lineage stops at the model weights, which are opaque.

The wrong solutions

The natural reaction, once a firm realizes AI is unreliable on math, is to try to harden the AI. None of these approaches work for institutional use.

Buy a bigger model. Frontier models hallucinate less than older ones, but they still hallucinate. The error rate trends down but never reaches zero, and the errors that remain arrive with the same confident syntax as the correct answers, so nothing flags them for review. There is no model size that clears the bar for IC defensibility.

Prompt engineer harder. Prompt engineering is real, but it is a band-aid on a structural issue. You can reduce certain error modes. You cannot change the underlying behavior: the model predicts numbers rather than computing them.

Add human review. Better, but it does not scale, and humans miss subtle errors in plausible-looking numbers. An analyst eyeballing an IRR that looks reasonable will often sign off on a wrong answer. That is the failure mode to worry about: wrong answers that pass casual inspection.

Limit AI to “low stakes” work only. Reasonable in theory, but the line between low-stakes and high-stakes shifts and blurs. An IC memo summary feels low-stakes until the partner reads the summary instead of the numbers. A reporting comment feels low-stakes until the LP quotes it back. There is no stable boundary.

The right architecture: separate the layers

The solution is not to make AI better at math. The solution is to stop asking AI to do math. Build three layers, and let each do what it is good at.

Layer 1: a clean data foundation. Sources flow into a governed warehouse inside your tenant. Every fact has lineage. Every transformation is logged. Every refresh is validated.

Layer 2: deterministic models on top of the data. Regression. Forecasting. Scoring. Variance attribution. Monte Carlo. All of it is coded once, runs reproducibly, and returns the same answer next week and next year on the same inputs. Every output is inspectable. Every formula is auditable.
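Monte Carlo is the item on that list that sounds least deterministic, so it is worth a sketch. The version below assumes a simple setup invented for this example: sale value is forward NOI divided by an exit cap, and the cap is drawn from a normal distribution. The function name, the parameters, and the figures are all hypothetical. The point is the fixed seed and the private generator, which make the run repeatable.

```python
import random
import statistics
from typing import List


def simulate_exit_values(forward_noi: float, cap_mean: float, cap_sd: float,
                         n_trials: int, seed: int) -> List[float]:
    """Simulate sale values as forward NOI / exit cap.

    The seed is an explicit input, so it can be logged with the result
    and the run can be repeated exactly.
    """
    rng = random.Random(seed)  # private generator, no shared global state
    values = []
    for _ in range(n_trials):
        # Floor the draw at 1% so the cap rate stays positive.
        cap = max(rng.gauss(cap_mean, cap_sd), 0.01)
        values.append(forward_noi / cap)
    return values


# Hypothetical inputs, invented for illustration only.
run_a = simulate_exit_values(550_000.0, 0.055, 0.005, n_trials=10_000, seed=20240101)
run_b = simulate_exit_values(550_000.0, 0.055, 0.005, n_trials=10_000, seed=20240101)

print("identical runs:", run_a == run_b)
print(f"median simulated value: {statistics.median(run_a):,.0f}")
```

One caveat: Python does not promise that every random-number method produces the same stream across interpreter versions. Reproducing a run "next year" therefore also means pinning the runtime, not just the seed.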

Layer 3: AI as the natural-language interface and orchestrator. The AI’s job is to understand the question, identify the right deterministic model to call, pass the right inputs, and present the cited result in plain English. The AI’s non-job is computing anything itself. Numbers come from Layer 2 or they do not appear in the output.

This is the modern grounded-LLM pattern with a CRE twist: every model call returns numbers with lineage back to source data, and every AI response cites which deterministic model produced which number. The harness around the AI enforces this. If the model tries to invent a value, the harness blocks it.
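The blocking step can be sketched in a few lines of Python. This is a deliberately naive version, written under stated assumptions: the deterministic layer hands the harness a map of rendered numbers to citation IDs, and the harness rejects any draft containing a numeric token that is not in that map. The names (check_draft, UngroundedNumberError) and the citation IDs are hypothetical.

```python
import re
from typing import Dict, List

# A digit run with optional thousands separators and decimals, not preceded
# by a letter, digit, or dot. Labels such as "Q1" are therefore ignored.
NUMBER_PATTERN = re.compile(r"(?<![A-Za-z\d.])\d[\d,]*(?:\.\d+)?")


class UngroundedNumberError(ValueError):
    """Raised when a draft contains a number no deterministic model produced."""


def extract_numbers(text: str) -> List[str]:
    """Return numeric tokens in order of appearance, thousands separators removed."""
    return [token.replace(",", "") for token in NUMBER_PATTERN.findall(text)]


def check_draft(draft: str, cited: Dict[str, str]) -> str:
    """Return the draft unchanged if every number in it has a citation.

    `cited` maps a number, as rendered in text, to the ID of the model
    output that produced it.
    """
    missing = [n for n in extract_numbers(draft) if n not in cited]
    if missing:
        raise UngroundedNumberError(f"numbers without a citation: {missing}")
    return draft


# Hypothetical citation IDs, invented for illustration only.
cited_values = {
    "3.2": "occupancy_variance",
    "42": "turnover_overrun",
    "2.1": "concession_trend",
}

ok_draft = ("NOI miss driven by 3.2 pp occupancy decline and $42K turnover "
            "overrun. Submarket concessions up 2.1% since October.")
bad_draft = ok_draft + " Rents are likely to fall another 4.7% next quarter."

check_draft(ok_draft, cited_values)  # passes silently
try:
    check_draft(bad_draft, cited_values)
except UngroundedNumberError as err:
    print("blocked:", err)
```

String matching is the crudest possible check. A real harness would also have to handle units, rounding, and numbers written out as words. The design choice carries over regardless: the check runs in code, outside the model, and fails closed.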

What this looks like in practice

An asset manager opens the platform and asks, in plain English: Why did Cypress Creek NOI miss budget in Q1?

The AI parses the intent. It identifies three deterministic queries to run. The first hits the Yardi rent roll and calculates the occupancy variance. The second hits the expense GL and identifies a turnover-cost overrun. The third hits CoStar submarket data and surfaces a concession trend. Each query runs in code. Each returns a cited result.

The AI then composes the answer: “NOI miss driven by 3.2 pp occupancy decline and $42K turnover overrun. Submarket concessions up 2.1% since October suggest pricing pressure on renewals.”

Every number in that sentence cites which deterministic model produced it, which data source, which timestamp. The AI did the language. The math was already done. The asset manager can click any number and see the underlying query, the source data, and the formula. The IC can defend the analysis because the analysis is defensible.
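A rough Python sketch of that flow follows, using two of the three queries. Everything in it is an assumption made for illustration: the CitedResult record, the model registry, the function names, the snapshot date, and the input figures, which were chosen so the outputs match the example sentence. In a real system the plan would come from the AI and the inputs from the warehouse. Here both are hard-coded.

```python
from dataclasses import dataclass
from datetime import date
from typing import Any, Callable, Dict, List, Tuple


@dataclass(frozen=True)
class CitedResult:
    label: str
    value: float
    unit: str
    model: str    # which deterministic function produced the value
    source: str   # which dataset it read
    as_of: date   # data snapshot date
    formula: str  # human-readable formula for audit


def occupancy_variance(actual_pct: float, budget_pct: float, as_of: date) -> CitedResult:
    return CitedResult("occupancy variance", round(actual_pct - budget_pct, 1), "pp",
                       "occupancy_variance", "Yardi rent roll", as_of,
                       "actual occupancy % - budget occupancy %")


def turnover_overrun(actual_usd: float, budget_usd: float, as_of: date) -> CitedResult:
    return CitedResult("turnover cost overrun", actual_usd - budget_usd, "USD",
                       "turnover_overrun", "expense GL", as_of,
                       "actual turnover cost - budget turnover cost")


# Registry of callable models. The AI may only pick names from this table.
MODELS: Dict[str, Callable[..., CitedResult]] = {
    "occupancy_variance": occupancy_variance,
    "turnover_overrun": turnover_overrun,
}


def run_plan(plan: List[Tuple[str, Dict[str, Any]]]) -> List[CitedResult]:
    """Execute each requested model in code; reject names not in the registry."""
    results = []
    for name, kwargs in plan:
        if name not in MODELS:
            raise KeyError(f"unknown model requested: {name}")
        results.append(MODELS[name](**kwargs))
    return results


def render(result: CitedResult) -> str:
    """Format a value together with its citation."""
    return (f"{result.label}: {result.value:g} {result.unit} "
            f"[{result.model} on {result.source}, as of {result.as_of.isoformat()}]")


snapshot = date(2025, 4, 1)  # hypothetical snapshot date
plan = [
    ("occupancy_variance", {"actual_pct": 91.8, "budget_pct": 95.0, "as_of": snapshot}),
    ("turnover_overrun", {"actual_usd": 142_000.0, "budget_usd": 100_000.0, "as_of": snapshot}),
]
results = run_plan(plan)
for r in results:
    print(render(r))
```

The AI never touches the arithmetic. It chooses which registered model to call and how to phrase the result, and every value it phrases arrives with its model, source, date, and formula attached.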

Why this is the good news

Once the architecture is right, the upside is enormous, because AI and deterministic systems have nearly complementary strengths.

AI is remarkable at the work deterministic systems are bad at: synthesizing across messy documents, drafting in natural language, reasoning about ambiguity, surfacing patterns in unstructured text, translating between domain vocabularies. The 200-page OM, the broker email thread, the property manager’s note in the rent comment field: AI eats this and produces structured output that flows into your data layer.

Deterministic systems are remarkable at the work AI is bad at: arithmetic, reproducibility, auditability, regulatory compliance, formula transparency. Pipeline scores, underwriting calibration, hold-period optimization, variance attribution: deterministic models do this and produce answers your IC can defend.

Together they cover the full surface area of institutional CRE workflow. Neither alone gets you there. The architecture matters more than the model. Firms that get the architecture right will compound their advantage every quarter for years. Firms that do not will either limit AI to drafting cover letters and squander the upside, or accept hallucinated numbers in IC packages and create real downside.

The architectural unlock

This is why we build the stack the way we do. A clean data foundation in your Microsoft Azure tenant. The deterministic quant models on top of it, every output traced to source. AI as the natural-language interface above that, guardrailed and cited, never the calculator.

Three layers. Each does what it is good at. Each holds the others accountable. The math is yours. The audit trail is yours. The AI is the assistant your IC can actually defend.

When the IC asks where a number came from, the answer should never be “the AI said so.” It should be “the variance model ran this query on the Yardi GL as of last night, the comp model pulled CoStar on Tuesday, here is the formula and here is the lineage.” That is the whole game.

The firms that win the next decade in institutional CRE will not be the ones with the most AI. They will be the ones whose AI does the right things, on top of the right foundation, with math underneath that ties out.
