Section 4: Bug Tracking and Root Cause Analysis
Transform from a bug finder to a problem solver. Learn to use tools like Jira for defect management and master the “Five Whys” and other root cause analysis techniques to move beyond fixing symptoms and start addressing the underlying systemic issues.
From Medication Error Reporting to Systemic Process Improvement: The Pharmacist as a Safety Engineer.
10.4.1 The “Why”: The Evolution from Error Reporting to Error Prevention
As a practicing pharmacist, you are intimately familiar with the process of medication error reporting. When a near-miss or an actual error occurs, you are ethically and professionally bound to document it, report it through systems like VERIP, and participate in discussions to understand what happened. This process is a cornerstone of a just culture and a foundational element of patient safety. At its core, this workflow involves three key activities: identifying a deviation from expected outcomes, documenting it with objective evidence, and analyzing the immediate factors that contributed to the event.
Bug tracking and root cause analysis are the direct informatics equivalent of this process, elevated to a science. A “bug” or “defect” is simply a deviation from expected system behavior—an electronic medication error. The process of tracking these bugs is the formal, structured, and auditable method of documenting these deviations. But the true value, and the focus of this section, lies in moving beyond simply documenting the bug. It is about making the crucial leap from identifying the symptom (the bug itself) to diagnosing and curing the underlying disease (the flawed process, knowledge gap, or technical weakness that allowed the bug to exist). This is the discipline of Root Cause Analysis (RCA).
Fixing a bug is like correcting a single dispensing error; it resolves the immediate problem for one patient at one point in time. Performing a thorough root cause analysis is like redesigning the pharmacy workflow to make that entire class of error impossible. It is the difference between reactive problem-solving and proactive safety engineering. As a pharmacy informatics analyst, you are uniquely positioned to excel at this. Your clinical training has equipped you with a powerful ability to perform differential diagnoses—to look at a set of symptoms and work backward to find the single unifying cause. This section will teach you how to apply that diagnostic mindset to the world of software and systems. You will learn to use structured tools like Jira for defect management, master powerful RCA techniques like the “Five Whys” and Fishbone Diagrams, and ultimately transform yourself from a finder of problems into a designer of resilient, error-resistant systems.
Pharmacist Analogy: The Recurring “Wrong SIG” Investigation
Imagine you work in a busy retail pharmacy. For the third time this month, you receive an e-prescription for Ozempic with the SIG: “Inject 1 pen subcutaneously once weekly.” This is a dangerously wrong SIG; the patient should inject a specific dose from the pen, not the entire pen. Each time, you have caught the error, called the prescriber’s office to clarify the correct dose (e.g., 0.25 mg), corrected the prescription, and dispensed it safely.
Level 1 Response (Fixing the Symptom): You have successfully fixed the immediate problem. This is equivalent to a developer fixing a single bug that a tester reported. The immediate danger is averted.
Level 2 Response (Bug Tracking): After the second time, you start a log. You create a formal record of each event: the date, patient, prescriber, and the specific error. You attach a copy of the erroneous e-prescription. This is bug tracking. You are no longer treating these as isolated incidents but as data points in a larger pattern of failure.
Level 3 Response (Root Cause Analysis): After the third incident, you escalate. Fixing each Rx is not enough; the problem is systemic. You perform a Root Cause Analysis using the “Five Whys”:
- Why did the prescription have the wrong SIG? Because the medical assistant (MA) selected the wrong default option in their EHR.
- Why did the MA select the wrong option? Because the default options are confusingly named. Option A is “Ozempic Pen” and Option B is “Ozempic Dose.”
- Why are the options named so poorly? Because the EHR’s medication database (formulary build) was configured by an IT analyst without any clinical pharmacy input.
- Why was a pharmacist not involved in the formulary build? Because the hospital’s policy for EHR configuration doesn’t require pharmacist sign-off on medication build files.
- Why doesn’t the policy require pharmacist sign-off? Because the P&T and IT Governance committees have historically operated in silos, without a clear policy for interdisciplinary clinical system validation.
The root cause is not a careless MA; it’s a flawed institutional governance process. The ultimate solution is not to retrain the MA, but to work with the other institution to fix their EHR drug record and, more broadly, to advocate for a policy change that requires pharmacy review of all medication-related system configurations. This is the power of moving from bug fixing to root cause analysis. You have scaled your intervention from protecting one patient to protecting every future patient who will receive a prescription from that clinic.
10.4.2 Masterclass on Bug Tracking: The Art of the Perfect Defect Report
Effective bug tracking is the foundation of a healthy software development lifecycle. A bug that is poorly documented, ambiguous, or impossible to reproduce is a bug that will never be fixed. The goal of a defect report is to provide a developer with all the information they need to find, understand, and fix the problem without having to engage in a lengthy back-and-forth clarification process. As a clinical tester, your ability to write a “perfect” bug report is a force multiplier for the entire development team. It replaces confusion with clarity and accelerates the resolution of patient safety issues.
The industry-standard tool for this process is Jira (or similar platforms like Azure DevOps). These tools provide a structured framework for capturing, triaging, and managing the lifecycle of every bug, feature request, and project task. Your mastery of this tool is a non-negotiable core competency for a pharmacy informatics analyst.
The Anatomy of a World-Class Jira Ticket
A Jira ticket is not just a free-text comment box. It is a structured data record with specific fields, each serving a critical purpose. Let’s dissect the components of a perfect, clinically focused bug report.
| Jira Field | Purpose | Pharmacist-Centric Best Practices & Examples |
|---|---|---|
| Project | The specific project or system the bug relates to (e.g., “EHR Upgrade 2025,” “Pyxis Optimization”). | Ensure you are logging the bug in the correct project so it is routed to the right development team. |
| Issue Type | The nature of the ticket. Most commonly Bug, but could also be Story (a new feature request), Task (a to-do item), or Epic (a large collection of stories). | Be precise. If the system is crashing, that’s a Bug. If you think the system *should* have a new type of alert, that’s a Story (a feature request), not a bug. |
| Summary | A clear, concise, one-line title for the bug. This is what appears in lists and reports. | Use the format “[Feature Area]: [Brief description of what is wrong]”. Bad: “Order error” Good: “CPOE – Renal Dosing: Vancomycin alert does not fire for patients on dialysis.” |
| Description | The most critical part of the ticket. This is where you provide the detailed, step-by-step instructions to reproduce the bug. | Always use the Gherkin “Given-When-Then” format. |
| Severity | The degree of impact the bug has on the system and patient safety. This is an objective measure. | Use a standard scale. |
| Priority | The degree of urgency with which the bug needs to be fixed. This is a subjective business decision. | Severity and Priority are not the same! High Severity, High Priority: The chemo dose calculation is wrong. Fix it NOW. Low Severity, High Priority: A typo on the main login screen. It doesn’t harm anyone, but it’s embarrassing and the CIO wants it fixed before the board meeting tomorrow. High Severity, Low Priority: A rare bug that corrupts data only when a specific report is run on the last day of February during a leap year. It’s a severe data issue, but because it’s so rare, it’s a lower priority to fix. |
| Environment | The specific system where you found the bug (e.g., TEST, TRAINING, PRODUCTION). | Crucial for developers. A bug that only exists in the TEST environment is less urgent than one that is actively impacting patients in PRODUCTION. Always include the URL or server name if you know it. |
| Attachments | Objective evidence. Screenshots, screen recordings, log files, or copies of erroneous reports. | A screenshot is worth a thousand words. Always take a screenshot of the error message and the screen state leading up to it. Use a tool to draw a red box around the specific part of the screen that is wrong. If possible, a short screen recording of you reproducing the bug is invaluable. |
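To make the table concrete, here is a minimal Python sketch of a defect report as a structured record. It is an illustration only, not Jira's data model or API: the `DefectReport` class, the four-level `Severity` scale, the `Priority` levels, the patient scenario, and the attachment file name are all hypothetical. The point it shows is that Severity and Priority are separate fields, and that the Description can be assembled from explicit Given, When, and Then parts.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Severity(Enum):
    # Hypothetical scale; substitute the scale your organization defines.
    CRITICAL = 1
    MAJOR = 2
    MINOR = 3
    TRIVIAL = 4


class Priority(Enum):
    # Hypothetical levels; priority is set independently of severity.
    HIGH = 1
    MEDIUM = 2
    LOW = 3


@dataclass
class DefectReport:
    project: str
    summary: str
    given: List[str]          # preconditions
    when: List[str]           # actions taken by the tester
    then_expected: str        # what should have happened
    then_actual: str          # what actually happened
    severity: Severity
    priority: Priority
    environment: str
    attachments: List[str] = field(default_factory=list)

    def description(self) -> str:
        """Render the reproduction steps in Given-When-Then form."""
        lines = []
        for i, step in enumerate(self.given):
            lines.append(("Given " if i == 0 else "And ") + step)
        for i, step in enumerate(self.when):
            lines.append(("When " if i == 0 else "And ") + step)
        lines.append("Then " + self.then_expected)
        lines.append("Actual: " + self.then_actual)
        return "\n".join(lines)


# Illustrative ticket based on the "Good" summary in the table above.
report = DefectReport(
    project="EHR Upgrade 2025",
    summary="CPOE – Renal Dosing: Vancomycin alert does not fire for patients on dialysis.",
    given=["a test patient with an active dialysis order"],
    when=["a prescriber enters a vancomycin order in CPOE"],
    then_expected="the renal dosing alert is displayed",
    then_actual="no alert is displayed and the order can be signed",
    severity=Severity.CRITICAL,
    priority=Priority.HIGH,
    environment="TEST",
    attachments=["vanc_order_no_alert.png"],
)
print(report.description())
```

Keeping the expected and actual results in separate fields forces the reporter to state both, which is usually the first thing a developer needs in order to reproduce the problem.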
10.4.3 The Defect Management Lifecycle: From Discovery to Resolution
A bug doesn’t just get reported and then magically fixed. It moves through a formal, well-defined lifecycle within the tracking system. Each step in this lifecycle represents a change in the bug’s status and a handoff of responsibility. As a clinical analyst, you are a key player at the beginning and end of this process, but understanding the entire flow is crucial for effective collaboration with the IT and development teams.
The Journey of a Bug
1. New / Open
A tester or end-user discovers a bug and submits a new ticket in Jira. At this point, it is an unconfirmed issue waiting for review.
Your Role: Author a perfect, detailed bug report as described above.
2. Triage / In Review
A project manager or team lead reviews the ticket. They check for completeness, attempt to reproduce the bug, set its priority and severity, and assign it to a specific developer to fix.
Branch: The ticket might be Rejected here if it’s a duplicate, not a bug, or cannot be reproduced.
3. In Progress
The assigned developer is actively working on the bug. They are writing code to fix the issue and creating a unit test that proves the fix works.
Your Role: Be available to answer any clinical questions the developer may have about the expected behavior.
4. Ready for QA / In Testing
The developer has finished their fix and deployed the new code to the TEST environment. The ticket is assigned back to the clinical testing team (you!).
5. Verification
You execute the exact same steps from your original bug report in the TEST environment to confirm the bug is gone. You also perform regression testing on related functions to ensure the fix didn’t break anything else.
Your Role: This is your final gatekeeping step. You must be rigorous. If the fix works, you move the ticket to “Closed.” If it doesn’t work or introduced a new problem, you move it back to “New / Open” with detailed comments and new screenshots.
6. Closed / Done
The bug has been fixed, the fix has been verified by the clinical tester, and the code is now ready to be deployed to the production environment in the next scheduled release.
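The lifecycle above can be read as a small state machine: each status allows only certain next statuses. The sketch below encodes that idea in Python. The status names are taken from the steps above, but the `ALLOWED_TRANSITIONS` table and the `move_ticket` function are hypothetical. Real Jira workflows are configured by an administrator and will differ between organizations.

```python
# Allowed status changes, mirroring the six steps above plus the
# "Rejected" branch from triage. This is an assumed workflow, not Jira's.
ALLOWED_TRANSITIONS = {
    "New / Open": {"Triage / In Review"},
    "Triage / In Review": {"In Progress", "Rejected"},
    "In Progress": {"Ready for QA / In Testing"},
    "Ready for QA / In Testing": {"Verification"},
    # A failed verification sends the ticket back to the start.
    "Verification": {"Closed / Done", "New / Open"},
    "Closed / Done": set(),
    "Rejected": set(),
}


def move_ticket(current: str, target: str) -> str:
    """Return the new status, or raise if the move skips a step."""
    if target not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"Cannot move ticket from {current!r} to {target!r}")
    return target


status = "New / Open"
for step in ["Triage / In Review", "In Progress",
             "Ready for QA / In Testing", "Verification", "Closed / Done"]:
    status = move_ticket(status, step)
print(status)
```

The useful property is what the table forbids: a ticket cannot jump from "In Progress" straight to "Closed / Done", so a fix cannot be closed without passing through clinical verification.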
10.4.4 Masterclass on Root Cause Analysis: From “What” to “Why”
The defect management lifecycle is excellent for fixing individual bugs. But a mature, safety-conscious organization doesn’t stop there. For any significant or recurring bug, especially one with patient safety implications, a formal Root Cause Analysis (RCA) must be performed. RCA is a structured, team-based investigation that aims to move beyond the immediate, technical cause of a problem to uncover the fundamental process, system, or human factor failures that were the true origin of the defect. Fixing the code prevents the bug from happening again in the same way; a successful RCA prevents entire classes of similar bugs from ever being created in the first place.
The “Five Whys”: The Simplest and Most Powerful RCA Tool
The “Five Whys” is a deceptively simple but powerful technique for drilling down past superficial symptoms to find the root cause. Popularized by Toyota as part of the Toyota Production System, the method involves taking a problem statement and repeatedly asking “Why?” until you arrive at a fundamental process failure. The number five is a guideline, not a rule: it may take fewer or more “whys” to get there, but the principle remains the same. This is a natural tool for a pharmacist to facilitate, as it mirrors the process of a clinical differential diagnosis.
Playbook for Facilitating a “Five Whys” RCA Session
As an analyst, you will often be asked to lead RCA sessions. Here is a step-by-step guide to facilitating an effective meeting.
- Assemble the Right Team: The session must include representatives from every stage of the process. For a CPOE bug, this means the original reporter (nurse/pharmacist), the informatics analyst (you), the developer who fixed the bug, and a QA engineer.
- Define a Clear, Factual Problem Statement: Agree on a single, unambiguous statement of what happened. It must be a factual observation, not a conclusion.
Bad: “The developer made a mistake in the dose calculation.” (This is a conclusion and assigns blame).
Good: “When a CPOE order was placed for Drug X for a pediatric patient weighing 12kg, the calculated dose was 10 times the recommended maximum.”
- Ask the First “Why”: Ask the group, “Why did the system calculate a dose that was 10 times too high?” Focus on the technical, system-level reason. Let the developer lead this part.
- Continue Asking “Why” to Each Answer: Take the answer from the previous question and use it as the basis for the next “Why.” Each “why” should peel back another layer of the process. If the group gets stuck, rephrase the question: “What process failure allowed this to happen?”
- Identify the Root Cause: You have likely reached the root cause when the answer is a process that is broken or non-existent (e.g., “We don’t have a formal peer review process for high-risk code changes”) or when you can no longer ask a meaningful “why.”
- Brainstorm Countermeasures: Once the root cause is identified, the focus shifts to solutions. For each root cause, brainstorm specific, actionable countermeasures that would prevent that cause from recurring.
- Assign Ownership and Timelines: A countermeasure without an owner is just a suggestion. Assign every action item to a specific person with a specific deadline. Track these in your project management tool (like Jira).
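The playbook above can be captured as a simple record so that the chain of questions, the root cause, and the action items live in one place. The following Python sketch is one possible shape, not a standard RCA format: the `FiveWhysSession` and `Countermeasure` classes are hypothetical, the questions and answers are abbreviated from the Ozempic analogy earlier in this section, and the owner name and due date are invented for illustration.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional, Tuple


@dataclass
class Countermeasure:
    action: str
    owner: Optional[str] = None   # a specific person or role
    due: Optional[date] = None    # a specific deadline


@dataclass
class FiveWhysSession:
    problem_statement: str
    whys: List[Tuple[str, str]] = field(default_factory=list)
    countermeasures: List[Countermeasure] = field(default_factory=list)

    def ask(self, question: str, answer: str) -> None:
        """Record one 'why' and the group's answer, in order."""
        self.whys.append((question, answer))

    def root_cause(self) -> Optional[str]:
        """Treat the answer to the last 'why' as the working root cause."""
        return self.whys[-1][1] if self.whys else None

    def unassigned(self) -> List[Countermeasure]:
        """Countermeasures still missing an owner or a deadline."""
        return [c for c in self.countermeasures
                if c.owner is None or c.due is None]


session = FiveWhysSession(
    "Three e-prescriptions for Ozempic arrived with the SIG "
    "'Inject 1 pen subcutaneously once weekly'."
)
session.ask("Why did the prescription have the wrong SIG?",
            "The MA selected the wrong default option in the EHR.")
session.ask("Why did the MA select the wrong option?",
            "The default options are confusingly named.")
session.ask("Why are the options named so poorly?",
            "The medication build was configured without pharmacy input.")
session.ask("Why was a pharmacist not involved?",
            "Policy does not require pharmacist sign-off on build files.")
session.ask("Why doesn't the policy require sign-off?",
            "P&T and IT Governance operate in silos.")

session.countermeasures.append(Countermeasure(
    "Correct the Ozempic drug record in the clinic's EHR",
    owner="Clinic EHR analyst",       # hypothetical owner
    due=date(2025, 1, 31),            # hypothetical deadline
))
session.countermeasures.append(Countermeasure(
    "Require pharmacy review of medication-related system configurations"
))

print(session.root_cause())
for item in session.unassigned():
    print("Needs owner and deadline:", item.action)
```

The `unassigned` check reflects the last step of the playbook: an action item with no owner or no deadline is flagged before the meeting ends.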
Deep Dive Example: A “Five Whys” RCA in Action
Problem Statement: During user acceptance testing, a nurse discovered that scanning a 2D barcode on a Tylenol suppository from a new manufacturer was causing the BCMA system to crash, requiring a full workstation reboot.
1. Why did the BCMA system crash when the barcode was scanned?
Answer: The developer explains, “The new barcode contains an embedded lot number with a special character (a tilde ‘~’) that our barcode parsing library was not programmed to handle, causing an unhandled exception that crashed the application.” (Technical Cause)
2. Why was the parsing library not able to handle that special character?
Answer: “The original technical specifications for the parser were based on the GS1 barcode standard, which doesn’t typically use that character. We only built it to handle the characters defined in the spec.”
3. Why was the new manufacturer’s barcode not tested against the spec before being introduced to the hospital?
Answer: The pharmacy buyer says, “Our process for adding a new drug to the formulary involves clinical review and pricing analysis, but there is no mandatory step for the informatics team to test the new product’s barcode against our systems before it is purchased.” (Process Failure)
4. Why is there no mandatory informatics testing step in the new drug procurement process?
Answer: The Director of Pharmacy explains, “Historically, barcodes were simple and standardized. The complexity of 2D barcodes is new, and we haven’t updated our 10-year-old P&T formulary request policy to reflect this new technological dependency.” (Policy/Governance Failure)
5. Why hasn’t the policy been updated?
Answer: “There is no formal, scheduled review process for our departmental policies. We tend to only update them when something breaks.” (Systemic/Cultural Failure)
From Technical Fix to Systemic Solution
The RCA has transformed the problem and the solution.
- The Symptom: A BCMA crash.
- The Technical Cause: An unhandled character in the parser.
- The Root Cause: An outdated governance model that fails to integrate technological validation into the clinical procurement process.
The Countermeasures:
- Corrective Action (Short-term): The developer will update the parsing library to handle the ‘~’ character. (Owner: Dev Team Lead, Due: 3 days).
- Preventative Action (Long-term): The Pharmacy Informatics Analyst will work with the Director of Pharmacy to formally revise the “New Formulary Request” policy to include a mandatory “Barcode and Systems Integration Validation” step, which must be signed off by the informatics team before a drug can be purchased. (Owner: CPIA Analyst, Due: End of Quarter).
- Systemic Action (Cultural): The Director of Pharmacy will propose to the Quality Council a new hospital-wide policy for annual review of all clinical-administrative policies to ensure they are keeping pace with technological changes. (Owner: Director of Pharmacy, Due: 6 months).
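For readers curious what the short-term corrective action might look like in code, here is a deliberately simplified Python sketch. It is not the hospital's parser and does not implement the GS1 specification: the function names, the `BarcodeParseError` exception, and the allowed character set are all assumptions made for illustration. It shows two ideas from the RCA: accepting the ‘~’ character, and turning any remaining unexpected input into a handled error instead of a crash.

```python
import string

# Hypothetical allowed characters for a lot number field. This set is an
# assumption for the example and is NOT taken from the GS1 specification.
# The tilde is included to accept the new manufacturer's lot numbers.
ALLOWED_LOT_CHARS = set(string.ascii_letters + string.digits + "-~")


class BarcodeParseError(Exception):
    """Raised when a scanned value cannot be parsed safely."""


def parse_lot_number(raw: str) -> str:
    """Validate a lot number string and return it with whitespace trimmed."""
    lot = raw.strip()
    if not lot:
        raise BarcodeParseError("Lot number is empty")
    bad = sorted(set(lot) - ALLOWED_LOT_CHARS)
    if bad:
        raise BarcodeParseError(f"Unsupported character(s) in lot number: {bad}")
    return lot


def handle_scan(raw: str) -> str:
    """Return a status message instead of letting a bad scan crash the app."""
    try:
        return f"OK: lot {parse_lot_number(raw)}"
    except BarcodeParseError as exc:
        return f"Scan rejected: {exc}"


print(handle_scan("AB~123"))   # lot number containing the tilde
print(handle_scan("AB#123"))   # a character still outside the allowed set
```

Note that this only addresses the technical cause. The next unexpected character from the next manufacturer would be rejected gracefully rather than crashing the workstation, but only the preventative and systemic actions above stop such barcodes from reaching the bedside untested.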