CPIA Module 11, Section 4: Data Quality Checks and Reconciliation
MODULE 11: ANALYTICS & DATA EXTRACTION

Section 11.4: Data Quality Checks and Reconciliation

Master the art of “data forensics.” Learn how to develop and implement data quality audits to identify inconsistencies, reconcile discrepancies between systems (e.g., billing vs. eMAR data), and ensure the integrity and reliability of your analytical outputs.


The Pharmacist as a Data Detective: Ensuring Trust in Every Number.

11.4.1 The “Why”: Garbage In, Gospel Out

In the world of data, there is a foundational principle known as “Garbage In, Garbage Out” (GIGO). It states that the quality of the output is determined by the quality of the input. If you feed a system flawed, inaccurate, or incomplete data, the results it produces will be equally flawed, inaccurate, and incomplete. In healthcare analytics, however, we face a far more dangerous phenomenon: “Garbage In, Gospel Out.”

This is the critical difference. When a beautiful, authoritative-looking dashboard is presented to a C-suite executive or a quality committee, it is often treated as “gospel.” The charts are so clean and the numbers so precise that they are rarely questioned. No one sees the messy, inconsistent, and sometimes erroneous raw data that lives underneath. They see the polished final product and assume it is the absolute truth. If that “truth” is built on a foundation of poor-quality data, the organization will make confident, data-driven, and catastrophically wrong decisions.

Consider the clinical stakes. A dashboard shows that your hospital’s sepsis mortality is trending up. The administration launches a multi-million dollar initiative to retrain providers. But what if the real problem was a change in the billing department’s coding practices that incorrectly attributed deaths to sepsis? A report identifies a physician as a high-prescriber of opioids, triggering a formal review. But what if the data failed to account for their role as the primary attending for a hospice unit? A financial analysis concludes that a new, expensive antibiotic is not cost-effective. But what if the ETL process failed to correctly import the data showing that this drug significantly reduced patient length of stay?

Data quality is not an IT problem; it is a patient safety and operational imperative. As a pharmacy informaticist, you are uniquely positioned to be the guardian of medication data integrity. Your clinical expertise allows you to spot errors that a purely technical analyst would never recognize. Your role is to be the ultimate skeptic—to treat every number with professional distrust until you have validated its source and its journey. This section is designed to give you the forensic tools and the investigative mindset to ensure that when you present data, it isn’t just a pretty picture; it’s a trustworthy foundation for critical decision-making.

Retail Pharmacist Analogy: The C-II Biennial Inventory

Imagine it is time for your pharmacy’s biennial controlled substance inventory, one of the most high-stakes reconciliations you perform. You cannot simply trust the number in the computer. You must assume it is wrong and prove it is right. This is the data quality mindset.

Your “computer on-hand quantity” is your data warehouse record. Your “physical count of tablets in the bottle” is your source of truth. The process of comparing these two is data reconciliation. When you find a discrepancy—the computer says you should have 500 tablets of oxycodone, but you only count 498—you become a data detective. You don’t just change the number. You launch a forensic investigation:

  • Audit the “Extract” process: “Let’s check the delivery log. Did we receive a bottle of 500 and someone incorrectly entered 502 when checking it in?” (This is a data entry error at the source).
  • Audit the “Transform” process: “I’ll pull the dispensing log. Is it possible a prescription for #120 was dispensed, but the system incorrectly decremented the inventory by #118?” (This is a business logic error).
  • Audit for “Completeness”: “Did a technician dispense a prescription and forget to complete the final step in the computer, so the patient got the drug but the inventory was never updated?” (This is a process/workflow error leading to missing data).
  • Reconcile against another system: “Let me check the point-of-sale cash register records. I see a co-pay was collected for this prescription, which confirms it was dispensed. Now I know the error is in the pharmacy system’s inventory logic.” (This is reconciling against a secondary source).

The meticulous, step-by-step, assumption-free process you use to resolve a C-II count discrepancy is the exact same intellectual process required for ensuring clinical data quality. You must trace the data’s journey, validate each step, and prove its integrity before you can certify it as correct.
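The perpetual-inventory arithmetic behind this analogy can be sketched in a few lines of Python. This is an illustrative sketch only: the function names and the starting/receipt/dispense figures (beyond the 500-vs-498 count from the text) are invented for demonstration.

```python
# Minimal sketch of the C-II reconciliation arithmetic described above.
# Function names and figures are hypothetical, not a real pharmacy system.

def expected_on_hand(starting_count, receipts, dispenses):
    """Perpetual ('computer') on-hand = start + receipts - dispenses."""
    return starting_count + sum(receipts) - sum(dispenses)

def reconcile(starting_count, receipts, dispenses, physical_count):
    """Compare the perpetual balance to the physical shelf count."""
    expected = expected_on_hand(starting_count, receipts, dispenses)
    return {
        "expected": expected,
        "counted": physical_count,
        "discrepancy": physical_count - expected,  # negative = shortage
        "balanced": physical_count == expected,
    }

# A shortage like the oxycodone example: expect 500, count 498.
result = reconcile(starting_count=100,
                   receipts=[500],        # one bottle of 500 checked in
                   dispenses=[100],       # one Rx for #100 dispensed
                   physical_count=498)    # shelf count comes up 2 short
```

Any nonzero `discrepancy` is what triggers the forensic steps described above: audit the delivery log, the dispensing log, the workflow completeness, and the point-of-sale cross-check.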

11.4.2 The Sources of “Data Rot”: A Taxonomy of Errors

Data errors are not random. They tend to fall into predictable categories. Understanding this taxonomy is the first step in developing a systematic approach to finding and fixing them. As a data detective, you need to know what kind of “crimes” to look for. The International Organization for Standardization (ISO) outlines several dimensions of data quality, which we can adapt for a clinical pharmacy context.

Masterclass Table: The Six Dimensions of Medication Data Quality
Completeness
  • Core Question: Is all the necessary data present?
  • High-Stakes Pharmacy Example: A patient’s weight is not documented (`NULL`), making it impossible to safely calculate their weight-based vancomycin dose. An allergy field is blank, leading to the administration of a drug to which the patient has a known anaphylactic allergy.
  • Common Causes: Non-mandatory fields in the EHR; workflow interruptions; data not sent from an upstream system (e.g., lab interface is down).

Uniqueness
  • Core Question: Is this record a one-of-a-kind entity?
  • High-Stakes Pharmacy Example: During a chaotic ED registration, a patient is accidentally registered twice, creating two different Medical Record Numbers (MRNs). Their home medications are documented under MRN-A, but all their inpatient orders are placed under MRN-B. The clinical team is unaware of the patient’s home anticoagulants, leading to a massive safety risk.
  • Common Causes: Human data entry errors during registration; failures in the Master Patient Index (MPI) matching logic.

Timeliness
  • Core Question: Is the data available when it is needed?
  • High-Stakes Pharmacy Example: A patient’s morning lab results, including a critically high potassium level, are not loaded into the data warehouse until the next day. An automated report designed to flag these patients for intervention fails to fire, and the patient receives another dose of an ACE inhibitor.
  • Common Causes: Failures or delays in the nightly ETL process; interface engine delays; system performance issues.

Validity
  • Core Question: Does the data conform to the required format and rules?
  • High-Stakes Pharmacy Example: A nurse manually enters a heparin drip rate as “500” but the field was expecting units/hr, not mL/hr, resulting in a 5x overdose. A drug concentration is entered as “250mg/5ml” in a free-text field, which is unusable for automated calculations.
  • Common Causes: Poorly designed user interfaces; lack of input masks or data type constraints; use of free-text fields instead of structured data.

Accuracy
  • Core Question: Is the data factually correct and reflective of the real world?
  • High-Stakes Pharmacy Example: A nurse accidentally documents the administration of a patient’s morning medications on the wrong patient’s chart. The data is complete and valid (it’s in the right format), but it is dangerously inaccurate. The first patient misses their doses while the second is double-dosed.
  • Common Causes: Human error; misidentification of patients; incorrect selection from a dropdown list; typos.

Consistency
  • Core Question: Is the same piece of information the same across all systems?
  • High-Stakes Pharmacy Example: A patient’s allergy to penicillin is correctly documented in the Pharmacy system but was never updated in the core EHR allergy module. The physician places an order for Zosyn in the EHR, the system fails to fire an allergy warning, and the pharmacist must rely on manual checks to prevent a severe reaction.
  • Common Causes: Lack of bidirectional interfaces; systems updated at different times; different data standards between departments or vendors.
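As a concrete illustration of a validity check, the sketch below parses a free-text concentration string such as “250mg/5ml” into structured numbers. The regex pattern and function name are assumptions for demonstration, not a production-grade drug-concentration parser.

```python
import re

# Hypothetical validity check: accept only strings of the form
# "<number>mg/<number>ml" (case-insensitive, optional spaces).
CONC_PATTERN = re.compile(
    r"^\s*(?P<amount>\d+(?:\.\d+)?)\s*mg\s*/\s*(?P<volume>\d+(?:\.\d+)?)\s*ml\s*$",
    re.IGNORECASE,
)

def parse_concentration(text):
    """Return (mg, ml) as floats if the string is valid, else None."""
    m = CONC_PATTERN.match(text)
    if not m:
        return None
    return float(m.group("amount")), float(m.group("volume"))

# "250mg/5ml" parses into (250.0, 5.0); arbitrary free text does not.
```

A field protected by a check like this (or, better, by separate structured amount/volume fields) stays usable for automated dose calculations.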

11.4.3 The Data Quality Toolkit: Your Forensic Instruments

Identifying data quality issues requires a deliberate, systematic approach. You cannot simply look at a billion-row table and spot errors. You must run specific tests designed to uncover the types of problems we’ve just defined. This process is called data profiling. It is the initial act of investigation where you use a set of tools—primarily SQL queries—to analyze your data and understand its structure, content, and quality. Think of this as the initial “lab work” you run on your data to diagnose its health.

Level 1: Basic Profiling with Aggregate Functions

Your first pass at any new dataset should involve a series of simple but powerful SQL queries that give you a high-level overview of its quality. These queries are fast, easy to write, and can immediately flag major problems.

The Pharmacist’s Data Profiling “Lab Panel”

For any new table (e.g., `dbo.Medication_Orders`), run this standard panel of tests to get a baseline understanding of its health.

  1. The “Record Count” Test (Sanity Check): Does the number of rows make sense?
    SELECT COUNT(*) FROM dbo.Medication_Orders;

    What it tells you: If you expect about 1 million orders per year and the count is 50,000, your ETL process may have failed. If it’s 20 million, you may have duplicate records.

  2. The “Completeness” Test (Checking for NULLs): How many missing values are in your critical columns?
    SELECT COUNT(*) FROM dbo.Medication_Orders WHERE Patient_Weight_kg IS NULL;

    What it tells you: A high number of NULLs in a field required for clinical decision support (like weight or creatinine) is a major quality and safety issue that must be investigated.

  3. The “Range” Test (Finding Outliers): What are the minimum and maximum values for your numeric fields?
    SELECT MIN(Order_Date), MAX(Order_Date), MIN(Patient_Age), MAX(Patient_Age) FROM dbo.Orders_Joined_With_Patients;

    What it tells you: This quickly finds validity errors. If `MAX(Patient_Age)` is 250, you have bad data. If `MIN(Order_Date)` is from 1985, you have a data import error.

  4. The “Frequency” Test (Finding Inconsistent Categories): What are all the distinct values in a text field, and how often do they appear?
    SELECT Drug_Name, COUNT(*)
    FROM dbo.Medication_Orders
    GROUP BY Drug_Name
    ORDER BY Drug_Name;

    What it tells you: This is the single best way to find standardization problems. The results will show you “Warfarin”, “warfarin”, “Warfarin Sodium”, and “Coumadin” as separate entries that all need to be mapped to a single standard concept.
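The four-test panel above can be run end-to-end against a toy table. The sketch below uses an in-memory SQLite database; the schema mirrors the column names used in the examples, but both the schema and the sample rows are fabricated assumptions, not a real EHR extract.

```python
import sqlite3

# Runnable sketch of the profiling "lab panel" on fabricated data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Medication_Orders (
    Order_ID          INTEGER PRIMARY KEY,
    Drug_Name         TEXT,
    Patient_Weight_kg REAL,
    Patient_Age       INTEGER
);
INSERT INTO Medication_Orders (Drug_Name, Patient_Weight_kg, Patient_Age) VALUES
    ('Warfarin',        70.0, 64),
    ('warfarin',        NULL, 58),   -- missing weight: a completeness error
    ('Warfarin Sodium', 82.5, 250),  -- age 250: a validity error
    ('Coumadin',        65.0, 71);
""")

# 1. Record count (sanity check)
row_count = conn.execute("SELECT COUNT(*) FROM Medication_Orders").fetchone()[0]

# 2. Completeness: NULLs in a critical column
null_weights = conn.execute(
    "SELECT COUNT(*) FROM Medication_Orders WHERE Patient_Weight_kg IS NULL"
).fetchone()[0]

# 3. Range: MIN/MAX expose outliers such as the impossible age
min_age, max_age = conn.execute(
    "SELECT MIN(Patient_Age), MAX(Patient_Age) FROM Medication_Orders"
).fetchone()

# 4. Frequency: distinct values reveal standardization problems
#    ('Coumadin', 'Warfarin', 'Warfarin Sodium', 'warfarin' all separate)
freq = conn.execute(
    "SELECT Drug_Name, COUNT(*) FROM Medication_Orders "
    "GROUP BY Drug_Name ORDER BY Drug_Name"
).fetchall()
```

Even on four rows, the panel surfaces one missing weight, one impossible age, and four “different” drug names that a pharmacist instantly recognizes as a single concept.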

Level 2: Advanced Audits with Business Rules

Once you’ve profiled the data, the next step is to apply your clinical expertise. A business rule audit is a query you write that checks for violations of a known clinical or operational rule. This is where you, as a pharmacist, provide immense value. You know the rules of safe medication use, and you can translate those rules into SQL code to proactively hunt for deviations and potential errors in the data.

Playbook: Building Clinical Business Rule Audits
  • Clinical Rule: “A heparin drip order must have a corresponding indication (e.g., DVT, PE, ACS) documented.”
    Dimension Tested: Completeness
    Simplified SQL Logic:
      SELECT Order_ID FROM dbo.Medication_Orders WHERE Drug_Name LIKE 'Heparin%Drip%' AND Indication_Text IS NULL;

  • Clinical Rule: “The documented administration time cannot be earlier than the order time.”
    Dimension Tested: Accuracy / Consistency
    Simplified SQL Logic:
      SELECT * FROM dbo.MAR_Data WHERE Admin_Timestamp < Order_Timestamp;

  • Clinical Rule: “Patients with a documented severe penicillin allergy should not have active orders for beta-lactam antibiotics.”
    Dimension Tested: Consistency
    Simplified SQL Logic:
      SELECT P.Patient_ID FROM dbo.Patients P JOIN dbo.Allergies A ON P.Patient_ID = A.Patient_ID JOIN dbo.Orders O ON P.Patient_ID = O.Patient_ID WHERE A.Allergen = 'Penicillin' AND A.Severity = 'Severe' AND O.Drug_Class = 'Beta-Lactam';
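The third audit in the playbook can be exercised end-to-end on a toy dataset. In the sketch below, the schema and sample rows are fabricated for illustration (an in-memory SQLite database standing in for the warehouse).

```python
import sqlite3

# Runnable sketch of the allergy-vs-beta-lactam consistency audit.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Patients  (Patient_ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Allergies (Patient_ID INTEGER, Allergen TEXT, Severity TEXT);
CREATE TABLE Orders    (Patient_ID INTEGER, Drug_Name TEXT, Drug_Class TEXT);
INSERT INTO Patients  VALUES (1, 'Patient A'), (2, 'Patient B');
INSERT INTO Allergies VALUES (1, 'Penicillin', 'Severe'),
                             (2, 'Latex',      'Mild');
INSERT INTO Orders    VALUES (1, 'Zosyn',      'Beta-Lactam'),
                             (2, 'Vancomycin', 'Glycopeptide');
""")

# Patient 1 should be flagged: severe penicillin allergy + Zosyn order.
violations = conn.execute("""
    SELECT DISTINCT P.Patient_ID
    FROM Patients P
    JOIN Allergies A ON P.Patient_ID = A.Patient_ID
    JOIN Orders    O ON P.Patient_ID = O.Patient_ID
    WHERE A.Allergen   = 'Penicillin'
      AND A.Severity   = 'Severe'
      AND O.Drug_Class = 'Beta-Lactam'
""").fetchall()
```

In production, an audit like this would run on a schedule and feed a worklist, so a pharmacist reviews every hit rather than relying on the audit firing at order entry.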

11.4.4 Masterclass Deep Dive: Reconciling Billing vs. eMAR Data

One of the most valuable and frequently performed data quality investigations in hospital pharmacy is the charge reconciliation process. This involves comparing the data from two different “sources of truth”—the pharmacy/billing system, which records what was charged for, and the electronic Medication Administration Record (eMAR), which records what was clinically documented as administered to the patient. The goal is to find discrepancies, which can represent significant patient safety risks, compliance issues, or lost revenue.

The Billing System View

“What we charged for.”

  • Source of Truth: The charge master, pharmacy information system, and automated dispensing cabinet logs.
  • Event Trigger: A charge is typically generated when a drug is dispensed from the central pharmacy, pulled from an ADC, or wasted.
  • Primary Purpose: Revenue cycle management. To ensure the hospital is properly reimbursed for the medications it provides.

The eMAR System View

“What the patient received.”

  • Source of Truth: The nurse’s or provider’s clinical documentation.
  • Event Trigger: A record is created when a nurse scans a barcode and signs off on the administration in the EHR.
  • Primary Purpose: Patient care and legal documentation. To create a permanent record of the patient’s treatment.

The Reconciliation Process: A Conceptual Walkthrough

The core of reconciliation is to join the two datasets together and look for mismatches. A `FULL OUTER JOIN` is often the perfect tool for this, as it will show you records that exist in one table but not the other.

-- This is a conceptual query; actual table/column names will vary greatly.
SELECT
    COALESCE(C.Patient_ID, A.Patient_ID) AS Patient_ID,
    COALESCE(C.Drug_ID, A.Drug_ID) AS Drug_ID,
    C.Charged_Quantity,
    A.Administered_Quantity
FROM dbo.Billing_Charges C
FULL OUTER JOIN dbo.eMAR_Admins A
    ON C.Patient_ID = A.Patient_ID    -- Match on patient
    AND C.Drug_ID = A.Drug_ID         -- Match on drug
    AND C.Charge_Date = A.Admin_Date  -- Match on date
WHERE
    C.Charge_ID IS NULL               -- Administered but not charged
    OR A.Admin_ID IS NULL             -- Charged but not administered
    OR C.Charged_Quantity <> A.Administered_Quantity; -- Mismatched quantities
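For readers who want the matching logic spelled out without a database, the same full-outer-join reconciliation can be mimicked in plain Python. The record keys, quantities, and category labels below are illustrative assumptions, not real extract formats.

```python
# Pure-Python sketch of the charge-vs-eMAR reconciliation.
# Keys are (patient_id, drug_id, date); values are quantities.

charges = {  # what was charged for
    (101, 'HEPARIN', '2024-03-01'): 2,
    (102, 'VANCO',   '2024-03-01'): 1,   # charged but never administered
}
admins = {   # what was documented as administered
    (101, 'HEPARIN', '2024-03-01'): 1,   # quantity mismatch (waste?)
    (103, 'ZOSYN',   '2024-03-01'): 1,   # administered but never charged
}

discrepancies = []
for key in sorted(set(charges) | set(admins)):  # full outer join on the key
    charged = charges.get(key)
    given = admins.get(key)
    if charged is None:
        discrepancies.append((key, 'ADMINISTERED_NOT_CHARGED'))
    elif given is None:
        discrepancies.append((key, 'CHARGED_NOT_ADMINISTERED'))
    elif charged != given:
        discrepancies.append((key, 'QUANTITY_MISMATCH'))
```

Each category then maps to a different investigative workflow, exactly as the playbook below describes for the SQL version.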

Forensic Playbook: Investigating Common Discrepancies

Discrepancy: Charged but Not Administered (a charge record exists, but the eMAR record is NULL)
  Potential Causes (The “Differential Diagnosis”):
    • Patient Refusal: Dose was dispensed but the patient refused it.
    • Dose Held/Discontinued: Order was changed after the dose was prepared.
    • Diversion: A staff member removed the drug but did not administer it. (Highest risk!)
    • Process Failure: The dose was properly returned to pharmacy/ADC, but a credit was never issued.
  Investigative Action: This requires immediate follow-up. Start by reviewing the patient’s chart for notes about held/refused doses. Check ADC return records. If no clear reason is found, this must be escalated to management as a potential diversion event.

Discrepancy: Administered but Not Charged (an eMAR record exists, but the charge record is NULL)
  Potential Causes (The “Differential Diagnosis”):
    • Lost Revenue: A dose was administered from floor stock, but a manual charge was never entered.
    • Wrong Patient Charged: The dose was charged to Patient A but administered to Patient B.
    • System Interface Issue: The charge from the pharmacy system failed to transmit to the billing system.
  Investigative Action: This is primarily a revenue integrity issue. The analyst should work with the pharmacy billing team to identify the source of the missing charge and, if possible, submit a late charge to capture the lost revenue.

Discrepancy: Mismatched Quantities (e.g., Charged Qty = 2, Admin Qty = 1)
  Potential Causes (The “Differential Diagnosis”):
    • Partial Dose/Waste: Most common cause. A nurse wastes part of a dose but a credit is not issued for the wasted amount.
    • Data Entry Error: A charge for a full vial was generated, but only half the vial was used.
    • Dose Titration: An IV drip was charged for a 250mL bag, but the infusion was stopped after 100mL was administered.
  Investigative Action: Review the eMAR for any documentation of waste. These discrepancies often highlight the need for better nursing education on the process for documenting waste in a way that automatically generates a credit in the billing system.

11.4.5 Building a Data Quality Dashboard

Data quality cannot be a one-time project. It is an ongoing process of monitoring, maintenance, and continuous improvement. The most effective way to manage this is by building a Data Quality Dashboard. Unlike a clinical dashboard that tells you about patient care, this dashboard tells you about the health of your data itself. It is a tool for you, your fellow analysts, and the IT department to monitor the performance of your ETL processes and proactively identify emerging quality issues before they impact downstream reports.

Medication Data Quality Dashboard

Monitoring the Health of the Clinical Data Warehouse

Nightly ETL Status

Success (Last Run: 02:15 AM)

New Duplicate MRNs

3

vs. weekly avg.

% Orders Missing Weight

1.8%

vs. prior month

New Charge Discrepancies

1,204

Flagged for Review

Trend of NULL Values in Critical Fields
[Visual of a Line Chart showing trends for NULL Patient_Weight, NULL Allergies, etc.]
Unmapped Drug Name Frequency

New drug names that need to be mapped to a standard concept.

[Visual of a Horizontal Bar Chart showing names like “Tylenol #3” or “Aspirin EC 81mg” that need standardization]
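Two of the tiles above can be computed with straightforward logic. In this sketch, the sample records, the `mapped_drug_names` reference set, and the field names are fabricated for illustration; a real dashboard would read these from the warehouse.

```python
# Hypothetical computation of two data-quality dashboard tiles.

orders = [
    {"order_id": 1, "patient_weight_kg": 70.0},
    {"order_id": 2, "patient_weight_kg": None},   # missing weight
    {"order_id": 3, "patient_weight_kg": 82.5},
    {"order_id": 4, "patient_weight_kg": 65.0},
]
mapped_drug_names = {"warfarin", "heparin"}       # already-standardized concepts
incoming_names = ["warfarin", "Tylenol #3", "Aspirin EC 81mg", "heparin"]

# Tile: % of orders missing a documented weight
missing = sum(1 for o in orders if o["patient_weight_kg"] is None)
pct_missing_weight = round(100 * missing / len(orders), 1)

# Tile: new drug names that still need mapping to a standard concept
unmapped = sorted(n for n in incoming_names
                  if n.lower() not in mapped_drug_names)
```

Tracking these numbers nightly, rather than computing them ad hoc, is what turns data quality from a one-time project into the monitoring process this section describes.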

11.4.6 The Pharmacist as Data Governor

Mastering the tools and techniques of data quality elevates your role from a simple report writer to a true Data Steward and a key player in your organization’s Data Governance program. Data Governance is the formal process of managing data as a strategic enterprise asset. It establishes the policies, standards, and roles responsible for ensuring that data is accurate, consistent, secure, and trustworthy across the entire organization.

As the designated steward for medication data, you are the acknowledged subject matter expert. You are the bridge between the technical teams who manage the databases and the clinical/operational teams who use the data. Your voice in the Data Governance committee is essential because you provide the clinical context that turns abstract quality metrics into tangible patient safety initiatives.

Core Responsibilities of a Pharmacy Data Steward
  • Defining the “Single Source of Truth”: You are responsible for leading the effort to answer questions like, “Which system holds the official record for a patient’s allergies?” or “How do we define a ‘Day of Therapy’ for our entire hospital?” You help create the official data dictionary for all medication-related terms.
  • Approving ETL Logic: You review and approve the business rules embedded in the ETL pipeline for medication data. You are the one who signs off on the logic for calculating creatinine clearance or flagging high-risk drugs.
  • Investigating and Triaging Issues: When a clinician questions a number on a dashboard, the request comes to you. You perform the initial forensic investigation to determine the root cause and triage the issue to the appropriate team (e.g., ETL developers, the application team, clinical education).
  • Championing Data Literacy: You are a key educator, helping clinicians and leaders understand the limitations of the data, how metrics are defined, and how to interpret reports correctly. You are the advocate for making data-driven decisions, but only with data that has been proven to be trustworthy.

Ultimately, your work in data quality and reconciliation is about building trust. When you can stand before a committee and say, “I have personally validated the integrity of this data from the bedside to this dashboard,” you provide the foundation of confidence that allows your organization to move forward with critical initiatives that improve patient care, enhance safety, and optimize financial performance.