CPIA Module 2, Section 4: Ontology Management and Semantic Normalization
MODULE 2: DATA STRUCTURES AND HEALTHCARE TERMINOLOGIES

Section 2.4: Ontology Management and Semantic Normalization

We explore how computers can understand that “heart attack” and “myocardial infarction” are the same concept. This section covers the creation of concept “family trees” (ontologies) to enable advanced analytics and decision support.

Teaching Computers to Think Like a Clinician.

2.4.1 The “Why”: Beyond the Dictionary to the Thesaurus and Encyclopedia

In the previous sections, we’ve established the critical role of standardized terminologies. We learned that RxNorm is a comprehensive dictionary of medications, and SNOMED CT is a dictionary of clinical ideas. These “dictionaries” are essential for ensuring that when two systems use the same code, they mean the same thing. This is a huge leap forward from the ambiguity of free text. However, a dictionary alone is not enough to create a truly intelligent system. A dictionary can tell you what a word means, but it can’t tell you how that word relates to other words. For a computer to reason, to infer, and to provide meaningful clinical decision support, it needs more than a dictionary; it needs a thesaurus and an encyclopedia combined. This is the role of an ontology.

Consider this clinical problem: A computer system knows that a patient is taking “Atorvastatin 20mg Tablet” (RxCUI: 617310) and also has a new order for “Rosuvastatin 10mg Tablet” (RxCUI: 859747). To a simple system looking at codes, these are just two different numbers, as distinct as apples and oranges. The system has no inherent understanding that both of these drugs belong to the “Statin” class and that ordering them together represents a serious therapeutic duplication. How do we teach the computer this relationship? How do we build the “family tree” of medications so the system knows that atorvastatin and rosuvastatin are “siblings” in the “Statin” family?

This is the core challenge that ontology management solves. An ontology provides the rich, hierarchical structure and the web of relationships that connect individual concepts. It is the framework that allows a system to understand that “heart attack” is a synonym for “myocardial infarction,” that a “penicillin allergy” is a type of “drug hypersensitivity,” and that “atorvastatin” is a member of the “HMG-CoA Reductase Inhibitor” class. Semantic normalization is the practical process of taking the messy, varied language of a clinical encounter—the words a doctor types, the names a patient uses—and accurately pinning it to a single, precise concept within this rich ontological framework. Mastering this concept is the gateway to building truly advanced clinical tools, moving beyond simple data storage to genuine knowledge management.

Retail Pharmacist Analogy: The “Mental Ontology” of Drug Classes

As an experienced pharmacist, you possess a deeply sophisticated and instantly accessible ontology in your brain. You don’t just know tens of thousands of individual drug facts; you understand the intricate relationships between them. This “mental ontology” is what allows you to practice safely and efficiently.

A patient comes to your counter with a new prescription for celecoxib and asks, “Is it safe to take this with the Aleve I bought here yesterday?”

Your brain does not perform a simple, one-to-one check between “celecoxib” and “Aleve.” It executes a rapid, multi-step process using its internal ontology:

  1. Semantic Normalization: Your brain first normalizes the input. It knows “Aleve” is a brand name for the ingredient Naproxen. It knows Celecoxib is an ingredient.
  2. Hierarchical Lookup (The “is_a” Relationship): Your mental ontology immediately classifies these ingredients. You know that Naproxen is a “Non-selective NSAID.” You know that Celecoxib is a “COX-2 Selective NSAID.” You also know that both of these sub-classes are, in turn, members of the broader parent class “NSAID” (Nonsteroidal Anti-inflammatory Drug). You have navigated up the “family tree.”
  3. Rule Application: Your clinical knowledge base contains a high-priority rule linked to the parent “NSAID” class: “Concurrent use of multiple agents from the NSAID class significantly increases the risk of gastrointestinal bleeding and acute kidney injury.”
  4. Actionable Advice (The Output): Based on the inference that both drugs are NSAIDs, you advise the patient: “Actually, Aleve and your new prescription, celecoxib, are in the same family of anti-inflammatory medications. You should not take them together. Please only take the celecoxib as your doctor prescribed and stop taking the Aleve for now.”

Ontology management is the discipline of taking this incredibly powerful, nuanced “mental map” that exists in the heads of clinicians and formally codifying it into a structure that a computer can use. The goal is to build a system that can perform this exact same reasoning process automatically, at scale, for every patient and every medication order, 24 hours a day.
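To make the four steps concrete, here is a minimal Python sketch of the same reasoning. Everything in it is an assumption for illustration: the lookup tables are hand-written stand-ins for a real terminology service, and the function names (normalize, ancestors, check_pair) are hypothetical.

```python
# Hypothetical, hand-written lookup tables. A real system would query a
# terminology service instead of hard-coding these.
BRAND_TO_INGREDIENT = {"aleve": "naproxen"}
IS_A = {
    "naproxen": "Non-selective NSAID",
    "celecoxib": "COX-2 Selective NSAID",
    "Non-selective NSAID": "NSAID",
    "COX-2 Selective NSAID": "NSAID",
}
CLASS_RULES = {
    "NSAID": "Concurrent use of multiple NSAIDs increases the risk of "
             "gastrointestinal bleeding and acute kidney injury.",
}


def normalize(name: str) -> str:
    """Step 1: map a brand name to its ingredient (ingredients pass through)."""
    key = name.strip().lower()
    return BRAND_TO_INGREDIENT.get(key, key)


def ancestors(concept: str) -> list[str]:
    """Step 2: walk the is_a links upward and collect every parent class."""
    found = []
    while concept in IS_A:
        concept = IS_A[concept]
        found.append(concept)
    return found


def check_pair(drug_a: str, drug_b: str) -> list[str]:
    """Steps 3 and 4: return the rule text for each ruled class both drugs share."""
    a, b = normalize(drug_a), normalize(drug_b)
    shared = set(ancestors(a)) & set(ancestors(b))
    return [CLASS_RULES[c] for c in sorted(shared) if c in CLASS_RULES]


advice = check_pair("Aleve", "celecoxib")
print(advice)
```

Notice that the rule is attached to the parent class, not to either drug. The pair is flagged only because both ingredients climb the tree to “NSAID.”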

2.4.2 Deconstructing an Ontology: The Components of a Knowledge Graph

So, what actually is an ontology? While the term can sound academic and abstract, an ontology is simply a formal, machine-readable model of a particular domain of knowledge. Think of it as a highly structured “knowledge graph.” It’s composed of a few key building blocks that, when combined, can represent incredibly complex information.

The Core Components of an Ontology

Concepts (or Classes)

These are the fundamental “nouns” or categories of things in our domain. They represent abstract groupings. Examples in a clinical ontology would include: Drug, Disease, Surgical Procedure, Allergic Reaction, Statin, ACE Inhibitor.

Individuals (or Instances)

These are the specific, concrete examples of a concept. If Statin is the concept (the category), then Atorvastatin, Rosuvastatin, and Simvastatin are the individuals (the members of that category).

Attributes (or Properties)

These are the characteristics or data points that describe a concept. For example, the Drug concept might have attributes like has_mechanism_of_action, has_dose_form, or has_half_life.

Relationships (or Predicates)

These are the “verbs” that define how concepts and individuals are connected to one another. They are the most powerful part of the ontology. The most common and important relationship is is_a, which creates the hierarchy.
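One possible way to picture these four building blocks in code is shown below. This is only a sketch: the class names, field names, and sample values are illustrative assumptions, not the data model of any particular ontology tool.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A category of things in the domain, e.g. Drug or Statin."""
    name: str
    attributes: dict[str, str] = field(default_factory=dict)


@dataclass
class Individual:
    """A specific member of a concept, e.g. Atorvastatin."""
    name: str
    attributes: dict[str, str] = field(default_factory=dict)


@dataclass(frozen=True)
class Relationship:
    """A directed, named link: subject --predicate--> target."""
    subject: str
    predicate: str
    target: str


# Illustrative sample data only.
statin = Concept("Statin", {"has_mechanism_of_action": "HMG-CoA reductase inhibition"})
atorvastatin = Individual("Atorvastatin", {"has_dose_form": "Oral Tablet"})
links = [
    Relationship("Statin", "is_a", "Drug"),
    Relationship("Statin", "has_instance", "Atorvastatin"),
]

for link in links:
    print(f"{link.subject} --{link.predicate}--> {link.target}")
```

Concepts and individuals carry the attributes, while the relationships live in their own list. Keeping the “verbs” separate from the “nouns” is what later lets a program walk from one concept to another.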

Visualizing a Simple Drug Ontology

Let’s build a small piece of a drug ontology to see how these components work together. Each line below is one arrow in the graph: the concept it starts from, the relationship, and the concept or individual it points to.

  Concept: Statin  --is_a (subclass of)-->  Concept: HMG-CoA Reductase Inhibitor
  Concept: Statin  --treats-->  Concept: Hypercholesterolemia
  Concept: Statin  --has_instance-->  Individual: Atorvastatin
  Concept: Statin  --has_instance-->  Individual: Rosuvastatin

From this simple graph, a computer can understand that Atorvastatin is a Statin, that Statins are a type of HMG-CoA Reductase Inhibitor, and that Statins treat Hypercholesterolemia. The system has moved from simple data to actual knowledge.
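The same small graph can be written as a list of subject-relationship-object triples and queried with a few lines of Python. This is a sketch under stated assumptions: the triple list and the helper names (objects, classes_of, treats) are hypothetical, and production systems typically use a dedicated graph store or terminology server.

```python
# The small statin graph from this section, as (subject, relationship, object).
TRIPLES = [
    ("Statin", "is_a", "HMG-CoA Reductase Inhibitor"),
    ("Statin", "treats", "Hypercholesterolemia"),
    ("Statin", "has_instance", "Atorvastatin"),
    ("Statin", "has_instance", "Rosuvastatin"),
]


def objects(subject, predicate):
    """Everything the subject points to through the given relationship."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def classes_of(individual):
    """Concepts that list the individual directly, plus their is_a ancestors."""
    result = [s for s, p, o in TRIPLES if p == "has_instance" and o == individual]
    for concept in result:  # the list grows while we climb the is_a links
        for parent in objects(concept, "is_a"):
            if parent not in result:
                result.append(parent)
    return result


def treats(individual):
    """Conditions treated by any class the individual belongs to."""
    return sorted({c for cls in classes_of(individual) for c in objects(cls, "treats")})


print(classes_of("Atorvastatin"))
print(treats("Atorvastatin"))
```

Nothing in the data says directly that atorvastatin treats hypercholesterolemia. The answer comes from following has_instance to “Statin” and then reading the treats link on that class.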

2.4.3 Semantic Normalization in Practice: From Messy Text to Clean Concepts

Now that we understand the “map” (the ontology), we can explore the process of “getting directions”—taking real-world clinical language and finding its precise location on that map. This is semantic normalization. It is an active, computational process, often leveraging a technology called Natural Language Processing (NLP).

NLP is a field of artificial intelligence that gives computers the ability to read, understand, and derive meaning from human language. In healthcare, NLP engines are trained to read a doctor’s note or a discharge summary, identify the key clinical concepts within the text (a process called Named Entity Recognition), and then link those recognized entities to standard codes in an ontology.
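As a rough illustration of Named Entity Recognition followed by concept linking, the sketch below uses a plain dictionary lookup. That is a deliberate simplification: real clinical NLP engines rely on trained models and much larger lexicons, and the LEXICON table and recognize_entities function here are hypothetical.

```python
import re

# Hypothetical mini-lexicon: surface phrase -> (entity type, linked concept).
LEXICON = {
    "heart attack": ("finding", "Myocardial infarction"),
    "myocardial infarction": ("finding", "Myocardial infarction"),
    "lipitor": ("medication", "Atorvastatin"),
}


def recognize_entities(text: str) -> list[dict]:
    """Find lexicon phrases in the text and link each one to its concept."""
    hits = []
    for phrase, (entity_type, concept) in LEXICON.items():
        pattern = r"\b" + re.escape(phrase) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            hits.append({
                "text": match.group(0),   # the words as the author typed them
                "type": entity_type,
                "concept": concept,       # the normalized concept name
                "start": match.start(),
            })
    return sorted(hits, key=lambda hit: hit["start"])


note = "Pt has hx of heart attack in 2019. Now on Lipitor 20."
entities = recognize_entities(note)
for entity in entities:
    print(entity["text"], "->", entity["concept"])
```

Both “heart attack” and “myocardial infarction” link to the same concept, which is the whole point of normalization: different words, one meaning.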

Masterclass Table: The Semantic Normalization Workflow

Example 1
Raw Clinical Input: Physician’s Note: “Pt has hx of heart attack in 2019. Now on Lipitor 20.”
NLP / Normalization Engine Steps:
  1. Named Entity Recognition: Identifies “heart attack” as a potential clinical finding and “Lipitor 20” as a potential medication.
  2. Concept Mapping: The engine queries its SNOMED CT ontology. It finds that “heart attack” is a known synonym for the concept “Myocardial infarction.”
  3. Drug Normalization: The engine queries its RxNorm ontology. It recognizes “Lipitor” as a brand name and parses “20” as the strength. It maps this to the appropriate RxCUI.
Output (Structured, Normalized Concepts):
  • Finding: SNOMED CT ID: 22298006 (Myocardial infarction)
  • Medication: RxCUI (SBD): 617318 (Lipitor 20 MG Oral Tablet)

Example 2
Raw Clinical Input: Patient Portal Message: “I get a bad rash and feel short of breath when I take Keflex.”
NLP / Normalization Engine Steps:
  1. Named Entity Recognition: Identifies “rash,” “short of breath,” and “Keflex” as key entities.
  2. Drug Normalization: Maps “Keflex” to its ingredient concept in RxNorm: Cephalexin.
  3. Reaction Mapping: Maps “rash” to the SNOMED concept for “Skin rash” and “short of breath” to the concept for “Dyspnea.”
  4. Relationship Inference: The system creates a structured allergy entry linking the substance (Cephalexin) to the reactions (Rash, Dyspnea).
Output (Structured, Normalized Concepts): Structured Allergy Entry:
  • Substance: RxCUI (IN): 2231 (Cephalexin)
  • Reactions: [ SNOMED CT: 271807003 (Skin rash), SNOMED CT: 267036007 (Dyspnea) ]
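The patient portal example can be sketched in Python as follows. Treat this as an assumption-laden illustration: the two lookup tables stand in for RxNorm and SNOMED CT queries, the substance is carried by name only, and the simple substring matching would be far too crude for production use.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical lookup tables standing in for RxNorm and SNOMED CT queries.
BRAND_TO_INGREDIENT = {"keflex": "Cephalexin"}
REACTION_CONCEPTS = {
    "rash": ("271807003", "Skin rash"),
    "short of breath": ("267036007", "Dyspnea"),
}


@dataclass
class AllergyEntry:
    """Links one substance to the coded reactions it caused."""
    substance: str
    reactions: list = field(default_factory=list)


def build_allergy_entry(message: str) -> Optional[AllergyEntry]:
    """Return a structured entry, or None when no known drug is mentioned."""
    text = message.lower()
    substance = next(
        (ingredient for brand, ingredient in BRAND_TO_INGREDIENT.items() if brand in text),
        None,
    )
    if substance is None:
        return None
    reactions = [concept for phrase, concept in REACTION_CONCEPTS.items() if phrase in text]
    return AllergyEntry(substance=substance, reactions=reactions)


message = "I get a bad rash and feel short of breath when I take Keflex."
entry = build_allergy_entry(message)
print(entry)
```

The output is no longer a sentence. It is a record in which the substance and each reaction are separate, coded fields that downstream allergy checking can use.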

2.4.4 The Power of Inference: How Ontologies Enable System “Thinking”

This is the most powerful—and perhaps most mind-bending—aspect of a well-constructed ontology. Because the relationships between concepts are formally defined and machine-readable, a specialized program called a reasoner or inference engine can navigate the ontology to discover new, unstated knowledge. It can make logical deductions based on the information it has been given. This is the difference between a database, which can only tell you what you’ve explicitly told it, and a knowledge base, which can tell you things you haven’t.

Explicit vs. Inferred Knowledge: The Foundation of Smart Systems

Let’s make this concrete.

Explicit Knowledge (What we tell the system):

  1. We define in the ontology that Atorvastatin is_a Statin.
  2. We define that Statin is_a HMG-CoA Reductase Inhibitor.
  3. We create a clinical decision support rule: “Alert if a patient is prescribed any drug from the class HMG-CoA Reductase Inhibitor and the patient has a SNOMED diagnosis of Rhabdomyolysis.”

Inferred Knowledge (What the system “figures out” on its own):

A physician orders Atorvastatin for a patient who has a history of Rhabdomyolysis. The reasoner starts at “Atorvastatin,” follows the is_a link up to “Statin,” follows the next is_a link up to “HMG-CoA Reductase Inhibitor,” and finds that this drug is a member of the class named in the CDS rule. It therefore concludes, correctly, that it must fire the alert.
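The walk up the hierarchy can be sketched in a few lines of Python. This is a simplified illustration, assuming each concept has a single parent and using hypothetical names (IS_A, RULES, lineage, alerts_for); real reasoners handle multiple parents and far richer logic.

```python
# Explicit knowledge: two is_a links and one class-level rule.
IS_A = {
    "Atorvastatin": "Statin",
    "Statin": "HMG-CoA Reductase Inhibitor",
}

# Each rule: (drug class, diagnosis that triggers it, alert text).
RULES = [
    ("HMG-CoA Reductase Inhibitor", "Rhabdomyolysis",
     "HMG-CoA Reductase Inhibitor ordered for a patient with Rhabdomyolysis."),
]


def lineage(concept):
    """Return the concept followed by every ancestor reached via is_a."""
    chain = [concept]
    while chain[-1] in IS_A:
        chain.append(IS_A[chain[-1]])
    return chain


def alerts_for(ordered_drug, diagnoses):
    """Fire each rule whose class appears in the drug's lineage."""
    classes = set(lineage(ordered_drug))
    return [text for drug_class, diagnosis, text in RULES
            if drug_class in classes and diagnosis in diagnoses]


print(lineage("Atorvastatin"))
print(alerts_for("Atorvastatin", {"Rhabdomyolysis"}))
```

The rule never mentions atorvastatin. The alert fires only because the lineage of the ordered drug contains the class the rule was written for.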

This is a revolutionary concept. We never created a specific rule for atorvastatin. We created a general rule for the entire class. The ontology provided the “scaffolding” that allowed the system to apply that general knowledge to a specific case. This makes our clinical systems far more powerful, more scalable, and easier to maintain.

Informatics Use Case: Future-Proofing Your Clinical Decision Support

The Challenge: A brand new statin, “Novastatin,” is approved by the FDA and added to your hospital’s formulary. You have hundreds of CDS rules related to statins: therapeutic duplication, drug-disease contraindications (liver disease), dose range checks, etc. In a traditional, non-ontological system, an informaticist would have to manually find and update every single one of those hundreds of rules to add “Novastatin” to the list. This is time-consuming, error-prone, and unsustainable.

The Ontological Solution: The pharmacy informaticist receives the request to add Novastatin. They perform one single, critical action: they open the ontology management tool and add one new relationship: Novastatin is_a Statin.

The “Magic”: Instantly, and without any other human intervention, the inference engine ensures that every one of the hundreds of existing statin rules now applies to Novastatin. The system has been “future-proofed.” The knowledge is managed centrally in the ontology, not hard-coded into hundreds of individual rules. This is the essence of building an intelligent, scalable, and maintainable clinical system.
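Here is a small Python sketch of that single curation step. The rule names and the RULES_BY_CLASS table are hypothetical placeholders; the point is only that the rules are keyed to the class, so one new is_a link is enough.

```python
# Existing ontology: two statins already on formulary.
IS_A = {"Atorvastatin": "Statin", "Rosuvastatin": "Statin"}

# Rules reference the class only. No individual drug is named anywhere.
STATIN_RULES = [
    "therapeutic duplication",
    "liver disease contraindication",
    "dose range check",
]
RULES_BY_CLASS = {"Statin": STATIN_RULES}


def applicable_rules(drug):
    """Collect the rules attached to every class above the drug."""
    rules, concept = [], drug
    while concept in IS_A:
        concept = IS_A[concept]
        rules.extend(RULES_BY_CLASS.get(concept, []))
    return rules


before = applicable_rules("Novastatin")  # not in the ontology yet
IS_A["Novastatin"] = "Statin"            # the single curation step
after = applicable_rules("Novastatin")

print(before)
print(after)
```

Before the link exists, no rule applies to Novastatin. After it, Novastatin inherits exactly the same rules as atorvastatin, with no edits to the rules themselves.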

2.4.5 The Informatics Angle: Curating the “Tree of Knowledge”

While national and international standards like SNOMED CT and RxNorm provide a vast, foundational ontology, no single terminology can perfectly meet every unique need of a local health system. Therefore, a core function of the health informatics team is to act as the curators and managers of the organization’s knowledge base. This involves extending the standard ontologies, creating custom groupings of concepts, and ensuring the knowledge remains up-to-date.

Masterclass Table: Core Ontology Management Tasks for Pharmacists

Task: Value Set Management
Description: A value set is a curated, finite list of codes from one or more terminologies, created for a specific purpose. It’s a “playlist” of concepts.
Pharmacist-Led Example: You are building a report to measure adherence to a new heart failure guideline. You create the “Guideline-Directed Beta Blockers” value set, which contains the RxNorm SCDs for Carvedilol, Metoprolol Succinate, and Bisoprolol. This value set is now the single source of truth for any rule or report related to this guideline.

Task: Local Ontology Extension
Description: This involves creating new, local concepts and relationships that are specific to your institution and don’t exist in the national standards.
Pharmacist-Led Example: Your hospital has a “Pharmacy to Dose” protocol for vancomycin. This protocol is not a standard drug. You create a local concept for “Vancomycin Per Protocol” and define its relationships: it is_a type of “Protocol Order,” and it involves_substance “Vancomycin.” This allows you to track and report on protocol usage.

Task: Cross-Ontology Mapping
Description: This is the process of creating relationships that span different terminologies. This is essential for building powerful CDS.
Pharmacist-Led Example: To build a drug-indication alert system, you must create relationships between the drug ontology (RxNorm) and the disease ontology (SNOMED). You would create a link asserting that the RxNorm Ingredient class “ACE Inhibitors” has_indication the SNOMED disease class “Hypertension.”

Task: Synonym Management
Description: This involves adding local slang, abbreviations, or common misspellings to the descriptions of a concept to improve search and NLP accuracy.
Pharmacist-Led Example: You notice that many physicians in your ED refer to piperacillin-tazobactam as “Zos.” While not a formal name, it’s common local parlance. You add “Zos” as a synonym for the “Piperacillin-Tazobactam” concept in your terminology server. Now, the system’s search function will correctly find the drug when a user types “Zos.”
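Two of these tasks, value set management and synonym management, are easy to sketch in Python. The structures below are illustrative assumptions: a real value set would hold RxNorm codes where this one holds drug names, and the function names (in_value_set, add_synonym, search) are hypothetical and do not belong to any terminology server's API.

```python
# A value set is a named, finite collection of members. Drug names are used
# here as placeholders for the RxNorm codes a real value set would contain.
VALUE_SETS = {
    "Guideline-Directed Beta Blockers": {
        "Carvedilol", "Metoprolol Succinate", "Bisoprolol",
    },
}

# Search terms (lower-cased) mapped to the concept they should find.
SYNONYMS = {"piperacillin-tazobactam": "Piperacillin-Tazobactam"}


def in_value_set(value_set, drug):
    """True when the drug is a member of the named value set."""
    return drug in VALUE_SETS[value_set]


def add_synonym(term, concept):
    """Register a local term so that searches resolve to the concept."""
    SYNONYMS[term.strip().lower()] = concept


def search(term):
    """Return the concept for a search term, or None if it is unknown."""
    return SYNONYMS.get(term.strip().lower())


add_synonym("Zos", "Piperacillin-Tazobactam")

print(in_value_set("Guideline-Directed Beta Blockers", "Carvedilol"))
print(search("zos"))
```

Any report or rule that needs the guideline beta blockers asks the value set, so updating the list in one place updates every consumer. The same holds for the synonym table: one added entry and “Zos” becomes searchable everywhere.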

By performing these knowledge curation tasks, the pharmacy informaticist transitions from being a simple consumer of data to being a true knowledge engineer. You are not just using the system as it is; you are actively making it smarter, safer, and more aligned with the clinical needs of your organization. This is a fundamental and powerful shift in professional responsibility, enabled by a deep understanding of how clinical knowledge is formally structured and managed.