The problem: leadership demand for reproducibility
Imagine the risk committee chair of a bank. On Monday morning, the KDM dashboard shows "internal audit compliance score: 78". On Tuesday, with the same data, it shows 81. Who added new information? Which rule changed? If the answer is "the model might output a slightly different value each time", you've lost the room's trust. For governance AI, that's not acceptable.
That's why in KDM (and in fact every risk/compliance/regulation product of ours), we run a two-layer architecture backed by an audit log. The score itself is fully deterministic; only the explanation text is LLM-generated. This separation answers the customer's "same input, same output" demand while keeping reports readable.
Architecture summary
- Layer 1 — Score engine: pure functional Go code. Data input + rule-weight snapshot + calculation version (semver); the same triple always returns the same float64 (see the sketch after this list).
- Layer 2 — Explanation generator: score + rule breakdown → LLM (temperature=0.2, structured output), rendered as natural language into a fixed slot template.
- Layer 3 — Audit log: for every calculation we record `calculation_run_id`, `factor_snapshot_hash`, `template_version`, and `raw_inputs_hash`, with 7-year retention (KVKK + CMB).
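A minimal Go sketch of how Layers 1 and 3 fit together. Everything here is illustrative rather than production KDM code: the names (FactorInput, WeightSnapshot, ComputeScore, NewAuditRecord) and the SHA-256-over-canonical-JSON hashing are assumptions; only the four audit fields come from the list above.

```go
package kdm

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"sort"
)

// FactorInput is one data point entering the score.
// Illustrative shape; the real KDM schema is not shown in this post.
type FactorInput struct {
	RuleID string  `json:"rule_id"`
	Value  float64 `json:"value"`
}

// WeightSnapshot pins the rule weights and calculation version for a run.
// Reusing the same snapshot is what makes Monday's 78 reproducible on Tuesday.
type WeightSnapshot struct {
	Version string             `json:"version"` // semver of the calculation
	Weights map[string]float64 `json:"weights"` // rule_id -> weight
}

// canonical returns the inputs in a stable order so that neither the
// score nor the audit hash depends on arrival order.
func canonical(inputs []FactorInput) []FactorInput {
	out := make([]FactorInput, len(inputs))
	copy(out, inputs)
	sort.Slice(out, func(i, j int) bool { return out[i].RuleID < out[j].RuleID })
	return out
}

// ComputeScore is the Layer 1 entry point: a pure function of
// (inputs, snapshot). No clock, no randomness, no I/O; the same
// data + weights + version triple always returns the same float64.
func ComputeScore(inputs []FactorInput, snap WeightSnapshot) float64 {
	var score float64
	for _, in := range canonical(inputs) {
		score += in.Value * snap.Weights[in.RuleID]
	}
	return score
}

// AuditRecord is the Layer 3 row kept for 7 years. Field names match
// the list above; the hashing scheme is a sketch, not the production one.
type AuditRecord struct {
	CalculationRunID   string `json:"calculation_run_id"`
	FactorSnapshotHash string `json:"factor_snapshot_hash"`
	TemplateVersion    string `json:"template_version"`
	RawInputsHash      string `json:"raw_inputs_hash"`
}

// hashJSON canonicalizes via JSON (encoding/json sorts map keys) and
// hashes with SHA-256, so equal logical inputs yield equal hashes.
func hashJSON(v any) string {
	b, _ := json.Marshal(v) // cannot fail for these plain types
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:])
}

// NewAuditRecord ties one calculation run to exactly what went into it,
// which is what lets us answer "why is this score 78?" months later.
func NewAuditRecord(runID, templateVersion string, inputs []FactorInput, snap WeightSnapshot) AuditRecord {
	return AuditRecord{
		CalculationRunID:   runID,
		FactorSnapshotHash: hashJSON(snap),
		TemplateVersion:    templateVersion,
		RawInputsHash:      hashJSON(canonical(inputs)),
	}
}
```

The sort in canonical is not cosmetic: floating-point addition is not associative, so a stable summation order is part of the determinism guarantee.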
Why pure LLM didn't work
We tried a pure-LLM approach in the pilot. Three problems:
- Reproducibility: the same prompt on the same data at a different hour drifted slightly. Customers don't want to see a panel value fluctuate (see the test sketch after this list).
- Audit trail: when a regulator asks "why is this score 78?", an LLM answering "I weighed everything" isn't enough. We must know which rule contributed how much.
- Cost: at 50k+ measurements/day, pure LLM isn't economical. The deterministic engine runs in microseconds; the LLM takes seconds.
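That reproducibility demand is cheap to enforce once the engine is pure. A sketch of the regression test that pins it down, reusing the illustrative ComputeScore and WeightSnapshot from the architecture sketch above (the weights and rule IDs are made up):

```go
package kdm

import "testing"

// TestComputeScoreIsDeterministic encodes the "same input, same output"
// demand as a test: two runs over the same data and snapshot, even with
// the inputs shuffled, must agree bit-for-bit.
func TestComputeScoreIsDeterministic(t *testing.T) {
	snap := WeightSnapshot{
		Version: "1.4.0", // hypothetical calculation version
		Weights: map[string]float64{"R1": 0.5, "R2": 0.3, "R3": 0.2},
	}
	a := []FactorInput{{"R1", 80}, {"R2", 70}, {"R3", 90}}
	b := []FactorInput{{"R3", 90}, {"R1", 80}, {"R2", 70}} // same data, shuffled

	if got, want := ComputeScore(a, snap), ComputeScore(b, snap); got != want {
		t.Fatalf("score drifted: %v vs %v", got, want)
	}
}
```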
Pattern: "Numbers from rules, narrative from AI"
We now use this pattern in all regulation-touching products: U2 Carbon (CBAM calculation), Adaletopia (case-law score), AI Finance IQ (KAP filter score). We don't outsource number generation to AI. We use AI to tell people the story of the number.
The right boundary: "Don't compute with the LLM. Explain with the LLM."
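In code, the boundary can be as small as one interface. A sketch with illustrative names: the LLM sits behind NarrativeModel (temperature=0.2 and structured output would live in its implementation) and fills exactly one slot of a fixed template; every number in the report is injected from the deterministic engine, so re-running the model cannot move the score.

```go
package kdm

import (
	"context"
	"fmt"
	"sort"
	"strings"
	"text/template"
)

// NarrativeModel abstracts the LLM call. Only free-form narrative text
// comes out of it; numbers never do. Illustrative interface.
type NarrativeModel interface {
	Narrate(ctx context.Context, score float64, breakdown map[string]float64) (string, error)
}

// slotTemplate is the fixed slot template: the layout is versioned
// (template_version in the audit log) and the LLM fills one slot.
var slotTemplate = template.Must(template.New("explanation").Parse(
	`Compliance score: {{printf "%.0f" .Score}} (calculation {{.Version}})
Top factors: {{.Factors}}
Narrative: {{.Narrative}}
`))

// Explain renders the report text. The score and factor contributions
// are passed in from Layer 1, never recomputed or restated by the LLM.
func Explain(ctx context.Context, m NarrativeModel, score float64, snap WeightSnapshot, breakdown map[string]float64) (string, error) {
	narrative, err := m.Narrate(ctx, score, breakdown)
	if err != nil {
		return "", err
	}
	factors := make([]string, 0, len(breakdown))
	for rule, contribution := range breakdown {
		factors = append(factors, fmt.Sprintf("%s=%.1f", rule, contribution))
	}
	sort.Strings(factors) // stable order, same reasoning as the score engine
	var out strings.Builder
	err = slotTemplate.Execute(&out, map[string]any{
		"Score":     score,
		"Version":   snap.Version,
		"Factors":   strings.Join(factors, ", "),
		"Narrative": narrative,
	})
	return out.String(), err
}
```

Because the template version is part of the audit record, the non-narrative portion of any past report can be reconstructed exactly.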
When do we use pure LLM?
A pure generative approach is fine for:
- Content summarization (newsletters, case-law summary)
- Customer support (FAQ, intent classification)
- Creative content (marketing copy, drafts)
In these places, "variation" is a feature, not a bug.
Takeaway
The first question we now ask when designing enterprise AI: "Will this output be audited?" If yes, the deterministic layer is mandatory. AI is just the visible surface. Building the architecture in this order makes it easier to talk to both the regulator and the CFO.