In lesson 2, we studied goal trees: how to decompose one complex decision into a tree of smaller sub-decisions. That gave us a structure for reasoning, but it still left open an important practical question: where does the actual knowledge come from? In this lesson, that structure gets filled with explicit rules, confidence values, and recommendations.
Rule-based expert systems were the first major proof that AI could be genuinely useful in the real world. They kept goal trees as the underlying reasoning structure, but added a layer on top: explicit IF/THEN rules paired with confidence factors. They did not learn from data, and they did not look like modern neural networks. Instead, they stored expert knowledge explicitly and applied it through logical rules with weights attached to reflect uncertainty.
This lesson transforms lesson 2’s question about structure into a question about knowledge: how do we encode what we know, and how do we apply that knowledge when facts are uncertain? The answer is expert systems.
That may sound old-fashioned, but the core idea is still everywhere: compliance systems, medical checkers, fraud rules, underwriting systems, industrial control logic, and many modern agent workflows all still rely on explicit rule application.
Core learnings about expert systems
- How do expert systems store knowledge in a form a machine can execute?
- What does an inference engine actually do when it applies rules?
- Why were certainty factors introduced instead of using only clean true/false logic?
- Why did expert systems work so well in narrow domains and then struggle to scale?
What an expert system is
In lesson 1, we described AI as a mapping from inputs to outputs. Expert systems still fit that same frame. The difference is that the mapping is not hidden inside learned parameters. It is written down as explicit rules.
One rule can be written as

Rᵢ: C₁ ∧ C₂ ⇒ H (CF = 0.7)

If the symbolic rule notation is unclear, read Certainty factors in expert systems first.
You can read this line as follows:
- Rᵢ means rule number i
- C₁, C₂ are the conditions that must hold
- ∧ means logical “and”
- ⇒ means “implies” or “leads to”
- H is the conclusion or hypothesis produced by the rule
- CF stands for certainty factor
- 0.7 is the confidence attached to the rule
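As a concrete illustration, a rule of this shape can be held in a small data structure. This is a minimal sketch with invented names, not MYCIN’s actual encoding:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    """One IF/THEN rule: conditions imply a conclusion, with a certainty factor."""
    number: int                 # the rule number (the i in R_i)
    conditions: frozenset[str]  # C_1 ... C_n, all of which must hold (logical "and")
    conclusion: str             # H, the hypothesis the rule produces
    cf: float                   # CF, the confidence attached to the rule

# R1: fever AND stiff neck AND severe headache => possible meningitis (CF = 0.7)
r1 = Rule(1, frozenset({"fever", "stiff neck", "severe headache"}),
          "possible meningitis", 0.7)

facts = {"fever", "stiff neck", "severe headache"}
print(r1.conditions <= facts)  # True: every condition of R1 is satisfied
```

The subset test `<=` is all the “matching” a rule needs: a rule is applicable exactly when its condition set is contained in the known facts.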
This is one of the main advantages of rule-based AI: the system’s reasoning is visible.
The three moving parts
Every expert system is built from three connected pieces.
Knowledge base
The knowledge base stores domain knowledge.
- Facts describe the current case, for example high fever or stiff neck.
- Rules describe expert knowledge in IF/THEN form.
In our triage example, a rule might say that fever, neck stiffness, and severe headache together support a meningitis hypothesis.
Inference engine
The inference engine decides how rules are applied.
It follows a repeated cycle:
- Match rules whose conditions are currently satisfied.
- Select one rule to fire.
- Execute the rule and add its conclusion to working memory.
If we call the current set of known facts Fₜ, then firing a rule that concludes H produces

Fₜ₊₁ = Fₜ ∪ {H}
For a quick notation refresh on expressions like this, see Function notation for AI.
The new symbol here is ∪, which means “union” or, more simply, “add this new item to the set.”
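The match–select–execute cycle can be sketched as a small forward-chaining loop. The rule contents below are illustrative stand-ins for the triage example, not the simulator’s actual rule base:

```python
# Each rule: (conditions that must all hold, conclusion to add).
RULES = [
    ({"fever", "stiff neck", "severe headache"}, "possible meningitis"),
    ({"possible meningitis", "positive culture"}, "confirmed meningitis"),
    ({"confirmed meningitis", "no allergy"}, "recommend ceftriaxone"),
]

def forward_chain(initial_facts):
    """Repeatedly fire applicable rules: each firing performs F_{t+1} = F_t U {H}."""
    facts = set(initial_facts)  # working memory, F_t
    trace = []                  # inference trace: which rule fired, and why
    fired = True
    while fired:
        fired = False
        for conditions, conclusion in RULES:
            # Match: all conditions satisfied, conclusion not yet derived.
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)  # Execute: add H to working memory
                trace.append((sorted(conditions), conclusion))
                fired = True
    return facts, trace

facts, trace = forward_chain({"fever", "stiff neck", "severe headache",
                              "positive culture", "no allergy"})
print("recommend ceftriaxone" in facts)  # True
```

The `trace` list is the seed of an explanation facility: because every derived fact records the rule that produced it, the system can later answer “why was this concluded?” by replaying those entries.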
Explanation facility
Expert systems can usually explain their own reasoning, because each conclusion came from an explicit rule firing.
That means the system can answer questions like:
- why was this diagnosis proposed?
- which rules fired?
- which facts were required?
This built-in traceability is one reason expert systems are still attractive in regulated environments.
Why MYCIN mattered
MYCIN is the classic expert-system case study. It was designed to diagnose bacterial infections and recommend antibiotic treatments.
Why it mattered:
- it showed that a rule-based system could perform at or above specialist level in a narrow task
- it used confidence-weighted rules rather than pretending all medical evidence was certain
- it could explain the chain of reasoning behind a recommendation
The key lesson is not just that MYCIN performed well. It is that expert knowledge, when encoded clearly enough, could become executable.
Why certainty factors were needed
Medical reasoning is rarely perfectly certain. Symptoms suggest possibilities; they do not guarantee them.
That is why systems like MYCIN attached a certainty factor to each rule. A certainty factor is not exactly a probability. It is better understood as a practical confidence score used to rank or strengthen conclusions.
For a dedicated explanation, see Certainty factors in expert systems.
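As a brief taste of how such scores behave, here is the classic MYCIN combination rule for the case where two positive certainty factors support the same hypothesis. The numeric values are illustrative:

```python
def combine_cf(cf1: float, cf2: float) -> float:
    """Combine two positive certainty factors for the same hypothesis
    (MYCIN's combination rule for the both-positive case)."""
    return cf1 + cf2 * (1.0 - cf1)

# Two independent rules each support "possible meningitis":
print(combine_cf(0.7, 0.5))  # 0.85: combined confidence grows but stays below 1
```

Note how this differs from probability arithmetic: evidence accumulates monotonically toward 1 without ever reaching it, which matches the ranking role certainty factors play.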
The triage rule base in action
Our triage example turns the abstract architecture into something concrete.
- some facts are directly observed, like fever or stiff neck
- some facts are derived, like possible meningitis
- some conclusions are recommendations, like prescribing ceftriaxone
What you are looking for is not only the final recommendation. You are looking at the path the system takes to get there.
For example:
- if strong symptoms are present, one rule may derive a meningitis hypothesis
- if test evidence is also present, another rule may strengthen that conclusion
- if the patient has no allergy, a treatment recommendation rule can fire
So the general topic of expert systems becomes concrete here: explicit knowledge, explicit updates, explicit explanations.
Explore the expert system simulator
The interactive simulator below shows a simplified MYCIN-style reasoning loop for hospital triage.
As you use it, focus on four panels:
- Observable facts: what the system knows at the start
- Rule base: what knowledge has been encoded
- Working memory: what new facts the system derives while it runs
- Inference trace: why each step happened
The preset scenarios are meant to show different kinds of behavior:
- Meningitis presentation: symptom-based reasoning
- Lab confirmed: stronger evidence and stronger downstream conclusions
- Viral pattern: a different branch of the rule base
- Allergy contraindication: same diagnosis, different treatment outcome because of a safety constraint
Walkthrough: Allergy contraindication
Run this one once in this exact order:
- Choose Allergy contraindication.
- Keep Backward mode and click Run.
- In the trace, observe that meningitis-supporting rules still fire from symptoms plus culture evidence.
- Then follow the treatment split: the “no allergy” condition is absent, so the ceftriaxone rule cannot fire.
- The allergy-specific rule can fire instead, producing the chloramphenicol recommendation.
What this means: one changed fact (allergy status) can redirect treatment while keeping the diagnosis reasoning transparent.
These results can be interpreted quite directly: the system does not recommend ceftriaxone because the patient is allergic to penicillin, which leaves the safety precondition for ceftriaxone unsatisfied. It therefore redirects treatment to the allergy-safe alternative. In practice, that is exactly the kind of transparent safety behavior rule systems are good at.
If you switch to Forward mode, the clinical outcome should stay consistent, but the reasoning flow is different:
- Forward mode starts from observed facts and derives all reachable consequences.
- Backward mode starts from a target recommendation and asks what must be true to justify it.
- In this allergy scenario, Forward mode shows both treatment branches as consequences of available facts, while Backward mode more directly highlights why the ceftriaxone branch fails.
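Backward chaining can be sketched as a recursive goal test: prove a target by finding a rule that concludes it and proving each of that rule’s conditions in turn. The rule contents below are illustrative, not the simulator’s actual rule base:

```python
# Map each conclusion to the condition sets of rules that can derive it.
RULES = {
    "possible meningitis": [{"fever", "stiff neck", "severe headache"}],
    "confirmed meningitis": [{"possible meningitis", "positive culture"}],
    "recommend ceftriaxone": [{"confirmed meningitis", "no allergy"}],
    "recommend chloramphenicol": [{"confirmed meningitis", "penicillin allergy"}],
}

def provable(goal, facts):
    """Backward chaining: a goal holds if it is an observed fact, or if some
    rule concludes it and every condition of that rule is itself provable."""
    if goal in facts:
        return True
    for conditions in RULES.get(goal, []):
        if all(provable(c, facts) for c in conditions):
            return True
    return False

observed = {"fever", "stiff neck", "severe headache",
            "positive culture", "penicillin allergy"}
print(provable("recommend ceftriaxone", observed))      # False: "no allergy" fails
print(provable("recommend chloramphenicol", observed))  # True
```

This mirrors the allergy walkthrough: the ceftriaxone goal fails precisely at the missing “no allergy” condition, while the allergy-specific branch succeeds.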
Expert-style reasoning in modern LLM systems
Rule-based reasoning is still used in many modern LLM stacks, usually as a control layer around neural generation:
- Policy and safety rules: deterministic checks enforce contraindications, compliance, and guardrails.
- Routing rules: IF/THEN logic selects tools, workflows, or prompt templates by request type.
- Verification rules: symbolic validators check whether generated outputs satisfy domain constraints.
- Hybrid loops: the LLM proposes outputs, while explicit rules accept, reject, revise, or escalate.
So expert-system methods did not disappear; they often provide reliability and governance around probabilistic models.
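A hybrid loop of this kind can be sketched in a few lines. Both the generator and the rule names below are hypothetical stand-ins, not a real library API:

```python
def generate_recommendation(case):
    """Stand-in for an LLM call; returns a canned proposal for this sketch."""
    return "ceftriaxone"

# Deterministic safety rules applied to whatever the model proposes.
GUARDRAILS = [
    # Contraindication check: block ceftriaxone for penicillin-allergic patients.
    lambda case, drug: not (drug == "ceftriaxone" and "penicillin allergy" in case),
]

def recommend(case):
    proposal = generate_recommendation(case)
    if all(rule(case, proposal) for rule in GUARDRAILS):
        return proposal                  # accept the model's output
    return "escalate to clinician"       # reject and escalate

print(recommend({"fever"}))              # proposal passes the rules
print(recommend({"penicillin allergy"})) # proposal rejected, case escalated
```

The division of labor is the point: the probabilistic component proposes, and the explicit rules retain veto power, exactly as in the safety and verification layers listed above.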
If the simulator works well, you should come away with two intuitions. First, the system is easy to inspect because every step is explicit. Second, adding more rules quickly increases maintenance complexity.
You can think of this lesson as the next step after lesson 2. The tree structure is still there in the background, but now the reasoning feels more concrete because reusable rules are doing the work and confidence values can influence what gets recommended.
What comes next
In lesson 4, we move away from a fixed rule base and start searching through possible states directly. That shift matters because not every problem can be solved by storing enough rules in advance.
Why expert systems struggled to scale
The central problem was the knowledge acquisition bottleneck.
That means useful knowledge had to be extracted from experts and written down manually. This is harder than it sounds for three reasons:
- experts often know more than they can clearly articulate
- rules interact with one another in ways that become difficult to manage at scale
- real domains change, so the rule base has to be updated continuously by hand
This is where systems like XCON become important. They proved expert systems could generate real business value, but they also showed that maintaining thousands of rules is expensive and fragile.
Why this still matters today
Expert systems are not dead. In many industries they were absorbed into production software and rebranded as decision engines, compliance logic, or workflow rules.
Their core strengths still matter:
- transparent reasoning
- fast execution
- explicit control over decisions
- easy auditing
Their core weakness also still matters: they do not learn their own knowledge.
Notation quick reference
| Symbol | Meaning | Detailed Explanation |
|---|---|---|
| Rᵢ | rule number | What an expert system is |
| Cⱼ | condition number | What an expert system is |
| ∧ | logical and | What an expert system is |
| ⇒ | implies / leads to | What an expert system is |
| H | conclusion or hypothesis | What an expert system is |
| CF | certainty factor | Why certainty factors were needed |
| 0.7 | confidence attached to a rule | What an expert system is |
| Fₜ | fact set at step t | The three moving parts |
| Fₜ₊₁ | updated fact set after one firing step | The three moving parts |
| ∪ | add to / union of sets | The three moving parts |
For standalone math deep dives, see Function notation for AI and Certainty factors in expert systems.
References and Further Reading
- Shortliffe, E.H. Computer-Based Medical Consultations: MYCIN. Elsevier, 1976.
- Winston, Patrick H. Artificial Intelligence, 3rd ed. Addison-Wesley, 1992. Chapter 5.
- McDermott, J. “R1: A Rule-Based Configurer of Computer Systems.” Artificial Intelligence 19(1), 1982.
- Hayes-Roth, F., Waterman, D.A., Lenat, D.B. (eds.) Building Expert Systems. Addison-Wesley, 1983.
- Buchanan, B.G. and Shortliffe, E.H. (eds.) Rule-Based Expert Systems. Addison-Wesley, 1984.
This is Lesson 3 of 18 in the AI Starter Course.