When Machines Stop Being Predictable
Anyone who has written software over the past few decades understands a fundamental property of traditional programs: they are deterministic. Given the same input, a program will always produce the same output. This blog post explores how probabilistic AI challenges that mental model and what engineers and organizations must do about it.
Consider a simple banking system. A customer has ₹5,000 in their account and initiates a transfer of ₹1,000. The system checks the balance, verifies it is sufficient, deducts the amount, and confirms the transaction. Run this operation a thousand times under the same conditions, and the outcome will be identical every single time. The balance will always be ₹4,000. The confirmation will always appear. There is no ambiguity.
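To make the contrast concrete, the transfer above can be sketched in a few lines of Python. The function and the in-memory balance are illustrative, not a real banking API; the point is simply that identical inputs always produce identical outputs.

```python
def transfer(balance: int, amount: int) -> tuple[int, str]:
    """Check funds, deduct, confirm -- same inputs, same output, every time."""
    if amount > balance:
        return balance, "declined: insufficient funds"
    return balance - amount, "confirmed"

# Run it a thousand times under the same conditions: exactly one distinct outcome.
results = {transfer(5000, 1000) for _ in range(1000)}
assert results == {(4000, "confirmed")}
```

There is no randomness anywhere in the path from input to output, which is precisely what makes the behavior auditable.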
This property shaped not only how we build software but how we think about systems. Over time, determinism became more than a technical characteristic. It became a mental model. Programmers built systems around it. Businesses built products around it. Entire industries came to rely on it. Policies, institutions, regulatory frameworks — all of them were architected on one foundational assumption: machines behave predictably.
This assumption became essential in industries where decisions carry real consequences. Consider finance, healthcare, or legal services. These sectors operate under strict regulatory environments where every decision must be traceable and explainable. If a system rejects a loan application or flags a transaction as fraudulent, regulators will ask a simple question: Why?
In deterministic systems, the answer is straightforward. There is a defined chain of logic — a rule, a condition, a piece of code — that produced the result. The system behaves the same way every time under the same conditions. This made auditing possible. It made compliance manageable. And it deeply reinforced our belief that machines should behave with perfect consistency.
In truth, that consistency was never absolute. Anyone who has spent time in production engineering knows that traditional software has always had pockets of non-determinism — race conditions, floating-point edge cases, distributed systems behaving unpredictably under load. But we invested enormous effort in taming those inconsistencies. We built layers of abstraction, testing, and monitoring to ensure that at the surface, where humans interact with machines, everything appeared orderly and predictable. The illusion of determinism was engineered, and it was convincing.
That illusion held for decades. But quietly, a different kind of system was emerging underneath.
The Quiet Rise of Probabilistic Systems
Machine learning systems do not follow hard-coded rules in the traditional sense. They learn patterns from data and make predictions based on statistical likelihoods. And for years, they were already making consequential decisions — most of us just never saw it happening.
Consider fraud detection. When you swipe your credit card at a store, a machine learning model evaluates the transaction in milliseconds. It considers hundreds of signals — your location, the merchant type, the amount, the time of day, your spending history — and estimates a probability that the transaction is fraudulent. If the probability is high enough, the transaction is blocked. If not, it goes through. You experience this as a simple binary outcome: approved or declined. But behind that clean, decisive moment, a probabilistic model just made an educated guess about your behavior. You never saw the probability. You only saw the outcome.
This is the pattern that held for years. Businesses wrapped probabilistic models inside deterministic decision systems. We never interacted with these models directly. The algorithms operated in the background, processing signals, adjusting estimates, updating predictions — while our interaction remained familiar and deterministic. Approved. Declined. Flagged. Cleared. Simple, binary, predictable.
Enterprises worked hard to preserve this arrangement. They added a layer of deterministic logic on top of every probabilistic model.
Here is how that worked in practice. Suppose a bank uses a machine learning model to assess credit risk. For a given applicant, the model examines income, employment history, existing debt, repayment patterns, and dozens of other variables. It outputs a probability — say, a 73% likelihood that this applicant will repay the loan on time.
But the bank cannot act on a probability alone. Regulators and internal policy demand a clear outcome: approve or reject. So the institution defines a threshold. If the model's confidence in repayment falls below a certain level, the application is declined. If it meets the bar, the application proceeds to the next stage of review.
The threshold converts a probabilistic estimate into a deterministic decision. The model guesses. The rule decides. And the rule is what gets audited.
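That threshold pattern is simple enough to sketch directly. The 0.70 cutoff and the function names here are illustrative assumptions, not any bank's actual policy:

```python
APPROVAL_THRESHOLD = 0.70  # illustrative cutoff, set by policy -- not by the model

def decide(repayment_probability: float) -> str:
    """Convert a probabilistic estimate into a deterministic, auditable decision.

    The model guesses (the probability); this rule decides (the outcome).
    The rule, not the model, is what gets audited.
    """
    if repayment_probability >= APPROVAL_THRESHOLD:
        return "proceed_to_review"
    return "declined"

assert decide(0.73) == "proceed_to_review"
assert decide(0.55) == "declined"
```

Note that the deterministic layer is trivially testable and explainable even though the number feeding into it came from a statistical model.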
For years, this approach worked remarkably well.
Then Came Large Language Models
The world was then introduced to a class of systems that changed the equation entirely: Large Language Models. Models like those powering ChatGPT, Claude, Gemini, and similar systems operate on a deceptively simple underlying principle. At their core, they predict the next token in a sequence.
To understand what this means in practice, consider what happens when you give one of these models a simple instruction: "Write a professional email declining a meeting request."
The model does not retrieve a template. It does not follow a script. It begins generating the response one token at a time — each word selected based on the statistical probability of what should come next, given everything that came before it. The opening might be "Thank you for the invitation" or "I appreciate you reaching out" or "Unfortunately, I won't be able to attend." Each choice is plausible. Each is different. And each one shapes what follows. A response that begins with "Thank you" will unfold differently than one that begins with "Unfortunately" — different tone, different structure, different word choices cascading through the entire message.
A single email might involve hundreds of these probabilistic selections, each one influencing the next. This is not a lookup. It is not retrieval. It is generation — construction through sequential estimation. And it is why the same prompt, given twice, can produce two entirely different emails that are both perfectly reasonable.
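A toy model makes this concrete. The vocabulary and probabilities below are invented purely for illustration; a real LLM does the same thing over a vocabulary of tens of thousands of tokens, with learned probabilities and far richer context. But the mechanism — weighted sampling, one token at a time, each choice conditioning the next — is the same.

```python
import random

# Invented next-token distributions for a tiny vocabulary.
NEXT_TOKEN = {
    "<start>":        [("Thank", 0.5), ("Unfortunately,", 0.5)],
    "Thank":          [("you", 1.0)],
    "you":            [("for", 0.6), ("again", 0.4)],
    "for":            [("the", 1.0)],
    "the":            [("invitation.", 0.7), ("kind", 0.3)],
    "kind":           [("invitation.", 1.0)],
    "again":          [("for", 1.0)],
    "Unfortunately,": [("I", 1.0)],
    "I":              [("can't", 0.5), ("won't", 0.5)],
    "can't":          [("attend.", 1.0)],
    "won't":          [("attend.", 1.0)],
}

def generate(max_tokens: int = 8) -> str:
    """Build a sentence one token at a time, each pick weighted by probability."""
    token, out = "<start>", []
    for _ in range(max_tokens):
        if token not in NEXT_TOKEN:
            break
        choices, weights = zip(*NEXT_TOKEN[token])
        token = random.choices(choices, weights=weights)[0]
        out.append(token)
        if token.endswith("."):  # end of sentence
            break
    return " ".join(out)

# The same "prompt" yields different, equally plausible outputs on each run.
print(generate())
print(generate())
```

Run it a few times and you will see "Thank you for the invitation." on one run and "Unfortunately, I won't attend." on another — both reasonable, neither reproducible on demand.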
Unlike earlier machine learning systems that operated invisibly in the background, LLMs interact directly with humans through natural language. They hold conversations. They draft documents. They answer questions. They generate code. And they do all of this in real time, producing different outputs even when given identical inputs.
This is where the deterministic mental model begins to break down.
Why This Matters for Enterprises
This shift in how AI systems behave challenges the very foundations on which enterprise systems are built.
Consider a hospital system that uses an AI tool to help physicians generate discharge summaries — the documents that tell patients what happened during their stay, what medications to take, and what warning signs to watch for after they go home.
A patient is treated for a mild cardiac event. The physician asks the AI to draft a discharge summary. The system produces a clear, well-organized document — medication instructions, follow-up appointment details, and a list of symptoms that should prompt an immediate return to the emergency room.
The following week, a different physician treats a nearly identical case — same diagnosis, same treatment protocol, same medications — and asks the system to generate the same type of summary. This time, the document is subtly different. The medication dosage instructions are phrased as "take as needed" rather than "take daily." One warning sign from the first summary is absent entirely — not because the system evaluated it as irrelevant, but because the probabilistic generation simply took a different path through the text and never produced that particular detail.
Both documents look professional. Both read like they were written by a competent physician. But the difference between "take daily" and "take as needed" for a cardiac medication is not a matter of style. It is a matter of patient safety.
I have worked on a similar problem in a previous organization, and what I can tell you is this: the dangerous thing about these systems is not that they produce obvious errors. Obvious errors get caught. The dangerous thing is that they produce plausible variations — text that reads fluently, sounds authoritative, and happens to be subtly, consequentially different from what it should say. On a busy shift, when a physician is reviewing an AI-generated draft that looks polished and complete, that subtle difference is exactly the kind of thing that slips through.
And the problems extend beyond inconsistency. Consider a legal research assistant powered by an LLM. An attorney asks it to find case precedents supporting a particular argument. The model returns three citations — two are real, one is entirely fabricated. The case name sounds plausible. The citation format is correct. But the case does not exist. The model did not "look up" cases and make an error. It generated a probable-sounding citation because, statistically, that is what belonged in that position in the text. This has already happened in practice — attorneys have submitted AI-generated briefs containing fictitious case citations to federal courts.
These are not bugs in the traditional sense. There is no line of code to fix. No rule that misfired. The systems were doing exactly what they were designed to do — estimating the most probable next token — and the result was still wrong.
In deterministic systems, when something breaks, you can reproduce the failure, trace the root cause, and deploy a fix. With probabilistic systems, the same failure may never reproduce. The conditions that led to it — the exact sequence of tokens, the context window, the sampling parameters — may never align in quite the same way again.
For organizations built on deterministic assumptions — where every outcome must be traceable, reproducible, and explainable — this is a profound shift. It is not just a technical problem. It is a structural one. It touches compliance frameworks, liability models, quality assurance processes, and institutional trust.
And it raises a question that no one has fully answered yet: when a probabilistic system makes a consequential error, who is responsible? The training data? The model provider? The engineer who wrote the prompt? The architect who designed the system? The executive who approved the deployment? In deterministic systems, the causal chain was clear. In probabilistic ones, it becomes diffuse. And our institutions — legal, regulatory, organizational — were not built for diffuse causality.
The Emerging Discipline of AI Systems Engineering
This is where my work lives. As an AI engineer, my role is no longer limited to training models. Increasingly, it involves designing systems that constrain and guide probabilistic models so they behave reliably enough for real-world applications.
This includes approaches like:
- Controlling how models generate outputs — adjusting sampling strategies to reduce randomness where consistency is needed
- Adding guardrails and validation layers — catching outputs that fall outside acceptable boundaries before they reach a user
- Enforcing structured outputs — constraining the model to return data in predefined formats rather than freeform text
- Using models as reasoning engines, not decision makers — letting the model propose, while deterministic systems verify and execute
- Building evaluation and monitoring pipelines — because when you cannot write a simple pass/fail test for natural language, you need new frameworks for measuring quality
- Combining LLMs with deterministic tools and workflows — grounding probabilistic intelligence in systems that enforce rules, validate facts, and maintain audit trails
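To give a flavor of the guardrail and structured-output ideas, here is a minimal sketch. Everything in it is hypothetical — the JSON schema, the required fields, and the approved dosage phrases — and the raw string stands in for whatever an LLM actually returned. The point is the shape: the model proposes, and a deterministic layer decides what is allowed to reach a user.

```python
import json

REQUIRED_FIELDS = {"medication", "dosage_instruction", "warning_signs"}
# Illustrative whitelist -- a real system would derive this from clinical policy.
ALLOWED_DOSAGE_PHRASES = {"take daily", "take twice daily"}

def validate_summary(raw_model_output: str) -> dict:
    """Deterministic guardrail over a probabilistic draft.

    Rejects anything that is not valid JSON, is missing required fields,
    or uses a dosage phrasing outside the approved vocabulary.
    """
    try:
        data = json.loads(raw_model_output)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    missing = REQUIRED_FIELDS - data.keys()
    if missing:
        raise ValueError(f"missing required fields: {sorted(missing)}")
    if data["dosage_instruction"] not in ALLOWED_DOSAGE_PHRASES:
        raise ValueError(f"unapproved dosage phrasing: {data['dosage_instruction']!r}")
    return data  # only now does the output pass through

good = '{"medication": "metoprolol", "dosage_instruction": "take daily", "warning_signs": ["chest pain"]}'
bad  = '{"medication": "metoprolol", "dosage_instruction": "take as needed", "warning_signs": []}'

validate_summary(good)      # passes the guardrail
try:
    validate_summary(bad)   # the "take as needed" variation is caught, not shipped
except ValueError as e:
    print("blocked:", e)
```

This is exactly the discharge-summary failure from earlier: the model is free to phrase things however its sampling path leads, but the deterministic boundary refuses to pass along a dosage instruction the policy does not recognize.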
The objective is not to eliminate probability — that would be both impossible and counterproductive. The goal is to build systems where probabilistic intelligence operates within deterministic boundaries. The model reasons. The system decides. The boundary between the two is carefully engineered.
In many ways, this is the defining engineering challenge of the current era of AI.
The Harder Shift Is Psychological
Beyond the technical challenges, there is a deeper shift taking place — one that is psychological and cultural.
Humans prefer certainty. Predictable systems reduce anxiety and build trust. For decades, software reinforced this expectation. Machines behaved like precise instruments. They executed instructions exactly as written. When you pressed a button, the outcome was never in doubt.
AI systems behave differently. They do not calculate answers in the traditional sense. They estimate them. And this introduces a kind of uncertainty into our relationship with machines that we are not accustomed to.
This discomfort is not limited to individual users. It runs through entire organizations. The cultures inside enterprises — the QA processes, the change management workflows, the incident response playbooks, the SLAs — all of these were built for a world where you could reproduce a problem, find its source, and fix it. When the "problem" is a statistical artifact that may never occur again in the same way, those workflows stop making sense. The organizational muscle memory developed over decades of deterministic systems does not transfer cleanly to probabilistic ones.
Adjusting to this reality requires more than new tools. It requires new ways of thinking — about reliability, about accountability, about what it means for a system to be "correct."
Where This Is Heading
The near-term future of AI systems will not be purely probabilistic or purely deterministic. What I see emerging are hybrid architectures — systems where each layer does what it is best suited for.
Probabilistic models handle tasks like language understanding, pattern recognition, reasoning, and generation. Deterministic systems enforce rules, validate outputs, execute decisions, and maintain audit trails. Intelligence becomes probabilistic. Decision enforcement remains deterministic.
This layered approach is, in my view, the most promising path forward. It allows organizations to harness the remarkable capabilities of modern AI while preserving the reliability, traceability, and accountability that real-world systems demand.
But getting this right is not just an engineering problem. It requires rethinking compliance frameworks, liability models, evaluation standards, and institutional expectations. The organizations that adapt — that update their mental models as quickly as they update their technology — will have a significant advantage. Those that cling to purely deterministic assumptions will find themselves unable to adopt the most powerful tools of this era. And those that embrace probability without building adequate structure around it will learn painful lessons about what happens when estimation meets accountability.
For decades, software trained us to expect certainty from machines. AI is asking us to develop a more sophisticated relationship with uncertainty — not to abandon rigor, but to evolve what rigor means in a world where intelligence itself is probabilistic. We are only beginning to figure out how to do that well.