Section 3: Downtime Simulation and Disaster Recovery
A critical masterclass in preparing for the inevitable. Learn to plan and execute controlled downtime drills, test backup and recovery procedures, and ensure pharmacy can provide safe patient care when technology fails.
Transforming IT Failures into Clinical Drills: The Pharmacist’s Role in Building Operational Resilience.
10.3.1 The “Why”: It’s Not a Matter of If, But When
In a modern hospital, the electronic health record and its web of interconnected systems are as vital to patient care as electricity and running water. This absolute dependence creates a paradox: the more advanced and integrated our technology becomes, the more catastrophic the impact when it inevitably fails. The question for a healthcare organization is not if it will experience a system downtime, but when, for how long, and how prepared it will be. An EHR outage is not an IT problem; it is a patient safety emergency of the highest order. The sudden loss of CPOE, eMAR, clinical decision support, and patient data transforms a 21st-century hospital back to the 1980s in an instant, but without the muscle memory and paper-based systems that made care possible in that era.
As a pharmacist, your entire practice is built upon a foundation of risk mitigation and contingency planning. You instinctively understand that you cannot dispense a critical medication without a backup plan. What if the patient is allergic? What if it’s not on formulary? What if there’s a drug interaction? Downtime planning is this same clinical mindset applied at a macro scale. It is the institution’s formal strategy for mitigating the catastrophic risk of technological failure. Your role as a pharmacy informatics analyst is to be a primary architect and guardian of this strategy for the entire medication-use process.
This section is a deep dive into the practical, operational realities of planning for, executing during, and recovering from a system downtime. We will treat downtime not as a technical abstraction, but as a hands-on clinical drill. You will learn to move beyond the theoretical concept of a “disaster recovery plan” and into the granular details of creating a functional, battle-tested Downtime Playbook. This includes designing paper forms that work under pressure, defining clear communication channels that don’t rely on the network, and, most importantly, planning and executing realistic downtime simulations. Just as a hospital runs “Code Blue” drills to prepare for cardiac arrests, it must run downtime drills to prepare for system arrests. Your leadership in this area is a direct extension of your core duty as a pharmacist: to ensure the safe and effective use of medications, even—and especially—when the systems designed to support that process have failed.
Pharmacist Analogy: The Catastrophic Wholesaler Outage
Imagine you are the pharmacy manager. You arrive on a Monday morning to a notification from your primary drug wholesaler: a major warehouse fire has destroyed their regional distribution center. Their ordering portal is offline, and all deliveries are suspended indefinitely. This is a supply chain “disaster.” Your pharmacy’s lifeblood has been cut off.
What do you do? You don’t simply throw up your hands. You activate your downtime plan.
- Initial Triage & Communication: Your first call is to your leadership. You activate the pharmacy’s emergency preparedness group. You immediately send a clear, concise communication to all pharmacy staff and nursing leadership: “Cardinal Health is down. All deliveries are on hold. Conserve existing stock. Emergency procedures are now in effect.”
- Activate Backup Systems: You immediately contact your secondary and tertiary suppliers (e.g., McKesson, AmerisourceBergen). You dust off the login credentials and ordering processes for these alternate vendors, which you wisely established and tested months ago.
- Manual Processes & Workarounds: Orders that were once electronic must now be placed via phone or fax. Inventory management, normally automated, becomes a manual process of walking the shelves and using spreadsheets. You re-assign staff roles, dedicating two pharmacists to just managing the new ordering process and three technicians to continuous inventory counts.
- Clinical Triage: You work with medical staff to immediately implement therapeutic interchange protocols. You identify which medications are at critical stock levels and proactively switch patients to therapeutic alternatives that you can procure from your backup suppliers. This is clinical risk mitigation in real-time.
- Recovery & Reconciliation: When the primary wholesaler is back online a week later, the work isn’t over. You must now perform a massive reconciliation: comparing what you ordered from backup suppliers against your inventory to prevent massive overstocking, and ensuring all billing and purchasing records are corrected.
This entire scenario—the sudden failure of a critical system, the activation of pre-planned manual procedures, the reliance on backup systems, the triage of critical needs, and the painful reconciliation process—is a perfect parallel to an EHR downtime. Your experience in managing supply chain disruptions has already given you the exact mental framework required to lead during a technological disaster.
10.3.2 Defining the Unthinkable: A Spectrum of System Failures
The word “downtime” is often used as a monolith, but in reality, system failures exist on a wide spectrum of severity and scope. A nuanced understanding of these different failure states is critical for developing a proportional and effective response. A plan designed for a 30-minute server reboot is wholly inadequate for a 24-hour ransomware attack. As an analyst, your first step in downtime planning is to categorize the potential disasters you are planning for. This allows you to create a tiered response plan rather than a single, one-size-fits-all procedure.
Masterclass Table: The Tiers of Downtime
| Downtime Tier | Description & Examples | Typical Duration | Primary Impact on Pharmacy | Pharmacist/Analyst Role |
|---|---|---|---|---|
| Tier 1: Planned Maintenance | A scheduled, controlled outage required for system upgrades, hardware replacement, or patching. The timing is known weeks or months in advance. Ex: Monthly Windows server patching, EHR version upgrade. | 2 – 8 hours (Typically overnight or weekend) | Minimal. Workflows are adjusted temporarily. Focus is on smooth transition and post-upgrade validation. | |
| Tier 2: Unplanned, Localized Failure | An unexpected failure of a single component or interface. The core EHR may be up, but a critical ancillary function is lost. Ex: The ADT interface fails (no patient movement data), a network switch on a nursing unit goes down, the ADC server for the ICUs crashes. | 30 minutes – 4 hours | Highly disruptive but contained. Requires rapid activation of specific workarounds. Causes significant frustration and potential for localized errors. | |
| Tier 3: Unplanned, Major System Failure | A major, unexpected outage of a core clinical system. Ex: The primary EHR application server crashes. The pharmacy management system database becomes corrupted. CPOE is unavailable hospital-wide. | 4 – 12 hours | Severe, hospital-wide disruption. Full downtime procedures must be activated immediately. High risk of medication errors due to chaos and reliance on unfamiliar paper workflows. | |
| Tier 4: Catastrophic Disaster | The complete and potentially long-term loss of the primary data center and its systems. Ex: A natural disaster (hurricane, flood), a major fire, a debilitating ransomware attack that encrypts all primary servers and backups. | Days to weeks | Existential threat to hospital operations. Standard downtime procedures may be insufficient. The focus shifts from “recovery” to “business continuity.” | |
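One way to make a tiered response plan concrete is to encode the tiers as data. The sketch below is illustrative only: the `Tier`, `TierProfile`, and `outgrown` names are hypothetical, and the hour limits are simply the upper ends of the typical durations in the table above, not an industry standard.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Tier(IntEnum):
    PLANNED_MAINTENANCE = 1
    LOCALIZED_FAILURE = 2
    MAJOR_SYSTEM_FAILURE = 3
    CATASTROPHIC_DISASTER = 4


@dataclass(frozen=True)
class TierProfile:
    label: str
    planned: bool
    # Upper end of the typical duration from the table; None means open-ended.
    typical_max_hours: Optional[float]


TIER_PROFILES = {
    Tier.PLANNED_MAINTENANCE: TierProfile("Planned Maintenance", True, 8),
    Tier.LOCALIZED_FAILURE: TierProfile("Unplanned, Localized Failure", False, 4),
    Tier.MAJOR_SYSTEM_FAILURE: TierProfile("Unplanned, Major System Failure", False, 12),
    Tier.CATASTROPHIC_DISASTER: TierProfile("Catastrophic Disaster", False, None),
}


def outgrown(tier: Tier, elapsed_hours: float) -> bool:
    """Return True when an outage has run past the typical duration for its tier.

    Assumption: this is a prompt to reassess the response, not an automatic rule.
    """
    limit = TIER_PROFILES[tier].typical_max_hours
    return limit is not None and elapsed_hours > limit
```

A localized failure that is still unresolved after five hours would return `True` here, which could prompt the command center to ask whether the Tier 3 procedures should be activated.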
10.3.3 The Downtime Playbook: Your Master Procedure
A downtime procedure is not a single document; it is a comprehensive playbook that contains all the communication plans, paper forms, clinical protocols, and technical recovery steps needed to navigate a system failure. As an informatics analyst, you will be a lead author and custodian of this playbook for the medication-use process. It must be detailed enough that a pharmacist who has never seen it before could pick it up in the middle of a crisis and understand their role. It must be reviewed and updated at least annually and after every downtime event or drill.
Component 1: The Activation & Communication Plan
The first 15 minutes of an unplanned downtime are the most critical. A clear, rapid, and multi-modal communication strategy is essential to prevent chaos from turning into catastrophe. The plan must answer: Who has the authority to declare a downtime? Who do they notify first? How is the message cascaded to the entire organization?
Sample Downtime Communication Cascade
- Initial Detection (Time 0): The IT Help Desk receives multiple calls about EHR slowness/unavailability. The on-call IT Director confirms a Tier 3 outage.
- Executive Notification (Time +5 min): The IT Director notifies the hospital’s Administrator On-Call (AOC) via a dedicated emergency phone line. The AOC authorizes the declaration of a formal “Code Downtime.”
- Mass Notification (Time +10 min): The IT Director and AOC use a mass-notification system (e.g., Everbridge) to send a text message to a pre-defined “Downtime Leadership” group, which includes the Director of Pharmacy, CNO, CMO, and department managers. The message is simple: “URGENT: Unplanned EHR Downtime is in effect. Activate your departmental procedures. Report to the command center.”
- Overhead Announcement (Time +12 min): The hospital operator makes a plain-language overhead announcement: “Attention all staff, attention all staff. We are experiencing a system-wide computer downtime. Please activate your departmental downtime procedures at this time.” This is repeated every 15 minutes.
- Local Activation (Time +15 min): The pharmacy department manager, having received the text, directs staff to immediately begin using paper forms, unlock ADC overrides, and prepare downtime kits. “Runners” are dispatched to clinical units to provide support and gather paper orders.
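The sample cascade can be expressed as a simple timeline, which is handy when scripting a drill or reviewing a real event afterwards. This is a minimal sketch: the minute offsets come from the sample cascade above, while the function names `cascade_schedule` and `overdue_steps` are hypothetical.

```python
from datetime import datetime, timedelta

# (minutes after detection, step name), taken from the sample cascade.
CASCADE = [
    (0, "Initial Detection"),
    (5, "Executive Notification"),
    (10, "Mass Notification"),
    (12, "Overhead Announcement"),
    (15, "Local Activation"),
]


def cascade_schedule(detected_at: datetime):
    """Return the clock time by which each cascade step is due."""
    return [(detected_at + timedelta(minutes=m), step) for m, step in CASCADE]


def overdue_steps(detected_at: datetime, now: datetime, completed: set):
    """List steps whose due time has passed but which are not yet marked complete."""
    return [
        step
        for due, step in cascade_schedule(detected_at)
        if due <= now and step not in completed
    ]
```

For example, if detection was at 02:00 and by 02:11 only the first two steps are complete, `overdue_steps` reports that the mass notification is late.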
Component 2: The Paper Arsenal – Downtime Forms
When the EHR goes dark, paper becomes your only source of truth. The design and availability of these forms are critical. They must be intuitive, comprehensive, and readily accessible in “Downtime Boxes” on every unit and in the pharmacy. Your role is to design and maintain these medication-specific forms.
| Form Name | Purpose | Key Design Elements (from a Pharmacist’s perspective) |
|---|---|---|
| Downtime Medication Administration Record (MAR) | The temporary, paper-based record of all scheduled and PRN medications for a patient. It is the primary communication tool between pharmacy and nursing. | |
| Downtime Physician Order Form | The replacement for CPOE. A multi-part (carbon copy) form used by providers to write new medication, lab, and diet orders. | |
| Pharmacy Order Intake Form | Used by the pharmacist to document the verification of a paper order and to track its progress (e.g., “Dispensed from ADC,” “IV Sent”). | |
Component 3: The Clinical Workflows
This is the heart of the playbook. It must detail, step-by-step, how the entire medication use process will function without technology. A visual flowchart is often the most effective way to communicate this.
Downtime Medication Order Workflow
1. Order Written
Provider writes a new medication order on a multi-part carbon copy Downtime Physician Order Form.
2. Order Transmission
The unit secretary or nurse separates the copies. The top copy is placed in the patient’s physical chart. The yellow pharmacy copy is sent to the pharmacy via pneumatic tube or a designated “runner.”
3. Pharmacy Triage & Verification
The pharmacist receives the paper order and reviews it for appropriateness against the patient’s last known Downtime MAR and any available (read-only) data. Any necessary clarifications are made by phone and documented on the order itself. The verified order is timestamped.
4. Dispensing & Documentation
If the drug is in the ADC, the pharmacist calls the nurse to notify them the order is approved for override. If it is a first dose or not in the ADC, the pharmacy dispenses it. The pharmacist hand-writes the new medication onto the master Downtime MAR and sends the updated MAR back to the unit.
5. Administration
The nurse retrieves the medication (often via override). A manual, independent double-check is performed with a second nurse. The nurse administers the medication and documents it immediately on the patient’s paper Downtime MAR by initialing in the correct time slot.
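The five-step workflow above behaves like a linear state machine: each paper order moves through the stages in order, and each hand-off is timestamped. The sketch below models that for a drill tracker or a post-event audit. The `Stage` and `PaperOrder` names are hypothetical, and treating the stages as strictly sequential is a simplifying assumption.

```python
from enum import Enum


class Stage(Enum):
    WRITTEN = "1. Order Written"
    TRANSMITTED = "2. Order Transmission"
    VERIFIED = "3. Pharmacy Triage & Verification"
    DISPENSED = "4. Dispensing & Documentation"
    ADMINISTERED = "5. Administration"


ORDERED_STAGES = list(Stage)  # Enum members iterate in definition order


class PaperOrder:
    """Tracks one paper order through the downtime workflow."""

    def __init__(self, description: str, written_at: str):
        self.description = description
        # Each entry is (stage reached, timestamp recorded by hand).
        self.history = [(Stage.WRITTEN, written_at)]

    @property
    def stage(self) -> Stage:
        return self.history[-1][0]

    def advance(self, timestamp: str) -> Stage:
        """Move to the next stage; stages cannot be skipped or repeated."""
        idx = ORDERED_STAGES.index(self.stage)
        if idx == len(ORDERED_STAGES) - 1:
            raise ValueError("order has already been administered")
        self.history.append((ORDERED_STAGES[idx + 1], timestamp))
        return self.stage
```

Because `advance` only ever moves one stage forward, the history doubles as a timeline showing where an order spent its time.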
10.3.4 The Disaster Recovery Plan (DRP): Your Technical Blueprint
While the Downtime Playbook focuses on clinical operations, the Disaster Recovery Plan (DRP) is its technical counterpart. This is a highly detailed, formal document maintained by the IT department that specifies the exact procedures for recovering and restoring the technological infrastructure after a failure. While you will not write the DRP, you must understand its key concepts and provide clinical input into its priorities.
Key DRP Concepts for the Pharmacist
You need to be able to speak the language of IT during a disaster to effectively advocate for your department’s needs.
| Concept | Definition | Why It Matters to a Pharmacist |
|---|---|---|
| Recovery Time Objective (RTO) | The maximum amount of time a system is allowed to be unavailable after a disaster. | This is the most important metric. An RTO of 4 hours for the EHR means IT has committed to a target of restoring service within that window. You can plan your clinical procedures around this target. If the RTO is 24 hours, your downtime procedures need to be far more robust. |
| Recovery Point Objective (RPO) | The maximum amount of data loss that is acceptable, measured in time. | An RPO of 15 minutes means that in a worst-case scenario, you could lose up to 15 minutes of data entered right before the crash. This tells you how much data you will need to manually re-enter after recovery. An RPO of 24 hours would be clinically unacceptable. |
| Backup Strategies | The method for copying data for recovery. Full: Copies everything. Slow, takes lots of space. Incremental: Copies only what’s changed since the last backup of any kind. Fast, but slow to restore. Differential: Copies what’s changed since the last FULL backup. | You need to know how frequently backups are taken. Are they taken nightly? Hourly? This directly impacts your RPO. You also need to know if the backups themselves are tested regularly. An untested backup is no backup at all. |
| Failover / High Availability (HA) | An automated process where a redundant, standby server or system immediately takes over if the primary system fails. | Systems with HA may have a downtime of only a few seconds or minutes, which is an inconvenience rather than a disaster. Your most critical systems, like the core EHR database and interface engine, should have HA capabilities. |
| Failover Sites | A secondary data center for use in a catastrophic disaster at the primary site. Hot Site: A fully equipped, mirrored data center that can take over almost instantly. Very expensive. Warm Site: Has hardware and connectivity, but requires configuration and data restoration. Cold Site: Just a room with power and cooling. Everything must be brought in. | You need to know what kind of site your hospital has, as this determines your RTO in a Tier 4 disaster. A Hot Site might mean an RTO of 2-4 hours. A Cold Site might mean an RTO of weeks. |
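Two of these concepts lend themselves to a quick worked example: how backup frequency bounds the RPO, and why incremental backups are slower to restore than differential ones. The sketch below is a simplification under stated assumptions: it treats worst-case data loss as equal to the interval between backups (ignoring replication and backup run time), and the function names are hypothetical.

```python
from typing import List, Tuple

# (label, kind); kind is "full", "incremental", or "differential"
Backup = Tuple[str, str]


def meets_rpo(backup_interval_min: float, rpo_min: float) -> bool:
    """Assumption: worst-case data loss equals the gap between backups."""
    return backup_interval_min <= rpo_min


def restore_chain(backups: List[Backup]) -> List[Backup]:
    """Return the backups needed, oldest first, to restore to the latest one.

    Uses the definitions in the table: a differential holds everything since
    the last full backup, an incremental holds changes since the last backup
    of any kind.
    """
    full_positions = [i for i, (_, kind) in enumerate(backups) if kind == "full"]
    if not full_positions:
        raise ValueError("no full backup to restore from")
    start = full_positions[-1]
    chain = [backups[start]]
    later = backups[start + 1:]
    diff_positions = [i for i, (_, kind) in enumerate(later) if kind == "differential"]
    if diff_positions:
        # The latest differential supersedes everything between it and the full.
        chain.append(later[diff_positions[-1]])
        later = later[diff_positions[-1] + 1:]
    chain.extend(b for b in later if b[1] == "incremental")
    return chain
```

With a Sunday full backup followed by three nightly incrementals, the restore chain has four pieces; with three nightly differentials it has only two (Sunday's full and the latest differential). Likewise, hourly backups cannot satisfy a 15-minute RPO under this simplified model.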
10.3.5 The Downtime Drill: Practice Makes Resilient
A downtime procedure that has never been tested is not a plan; it is a theory. Just as pilots spend hours in a flight simulator practicing for engine failures, a hospital must regularly simulate system failures to build muscle memory and expose flaws in its procedures. The downtime drill is a planned, controlled exercise designed to mimic the chaos of a real event in a safe environment. As a pharmacy informatics analyst, you will be a key planner, facilitator, and evaluator for these drills.
The Goal of a Drill is to Find Failure
This is a critical mindset shift. A “successful” downtime drill is not one where everything goes perfectly. A perfect drill means your simulation was too easy. The most successful drills are the ones that create controlled chaos and expose the weak points in your plan. Every problem you find during a 2-hour drill on a Saturday morning is a potential patient safety catastrophe you have prevented during a real, 8-hour outage on a busy Tuesday afternoon. Embrace the failures found in a drill; they are your greatest learning opportunities.
Planning and Executing a Pharmacy Downtime Drill
- Define Scope and Objectives: Start small. Your first drill might only involve the pharmacy department and one medical/surgical nursing unit. The objective might be simple: “Test the process for communicating, verifying, and documenting three new paper-based medication orders within a 30-minute timeframe.”
- Develop the Scenario: Create realistic clinical scenarios. Don’t just test a simple lisinopril order. Create a complex, high-risk scenario that will stress-test the paper process. Example Scenario: A patient arrives from the OR post-op. The surgeon writes paper orders for:
  - Morphine PCA with a basal rate, demand dose, and 4-hour limit.
  - A weight-based enoxaparin dose for DVT prophylaxis.
  - A STAT dose of IV ondansetron for nausea.
- Prepare the Environment: Schedule the drill for a non-peak time. Ensure the “Downtime Box” on the unit is fully stocked. Have “mock” patient charts and paper forms ready. Brief all participants—the pharmacists, technicians, and nurses—that this is a drill, but they should act with the urgency of a real event. Designate several analysts as “observers” with checklists to record what happens.
- Initiate the Drill: Announce the start: “Downtime Drill has now begun.” Have your mock surgeon hand the paper orders to the mock unit secretary. The observers start their stopwatches and begin taking notes.
- Observe, Don’t Intervene (Unless Necessary for Safety): As a facilitator, your job is to watch the process unfold. How long does it take for the order to get to the pharmacy? Does the pharmacist have all the information they need (like the patient’s weight)? How do they communicate the verification back to the nurse? Do the nurses know the override procedure for the ADC? Document every deviation from the plan, every point of confusion, and every success.
- The Hot Wash (Debrief): Immediately after the drill ends, gather all participants for a debrief. This is the most important step. Go through the timeline of events.
  - What went well? (“The runner system worked great.”)
  - What went wrong? (“The pharmacist didn’t have the patient’s weight for the enoxaparin.” “The PCA order form was confusing.” “We couldn’t find the key for the ADC override.”)
  - What can we improve? (“We need to add the patient’s weight to the header of the MAR.” “Let’s redesign the PCA form.” “The ADC key location needs to be on our downtime checklist.”)
- Follow-Up and Action Items: Document the findings from the debrief. Assign action items with deadlines (e.g., “Pharmacy Informatics to redesign the PCA downtime order form by November 1st.”). This formal follow-up is what turns the drill from a simple exercise into a true quality improvement cycle.
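The observers' stopwatch notes can be turned into numbers for the hot wash. The sketch below assumes each observation is recorded as a label plus minutes since the drill began. The helper names are hypothetical, and the 30-minute default simply mirrors the sample objective given earlier in this list.

```python
from typing import List, Tuple

# (what the observer saw, minutes since the drill started)
Event = Tuple[str, float]


def step_durations(events: List[Event]) -> List[Tuple[str, float]]:
    """Elapsed minutes between each pair of consecutive observations."""
    return [
        (f"{a} -> {b}", tb - ta)
        for (a, ta), (b, tb) in zip(events, events[1:])
    ]


def slowest_step(events: List[Event]) -> Tuple[str, float]:
    """The hand-off that consumed the most time, a natural debrief topic."""
    return max(step_durations(events), key=lambda s: s[1])


def met_objective(events: List[Event], limit_min: float = 30.0) -> bool:
    """True if the first-to-last elapsed time is within the drill objective."""
    return events[-1][1] - events[0][1] <= limit_min
```

Feeding in the observers' timeline shows at a glance whether the objective was met and which hand-off deserves the most attention in the debrief.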
10.3.6 The Recovery: The Most Dangerous Phase
The moment the IT department announces “The system is back online” is not a moment of relief; it is the beginning of the most chaotic and high-risk phase of the entire downtime event. The “recovery” or “reconciliation” phase is a frantic, all-hands-on-deck effort to manually enter all the clinical activity that occurred on paper back into the electronic health record. This process is fraught with peril. The risk of creating duplicate orders, missing new orders, or mis-documenting administrations is extraordinarily high. A disciplined, methodical, and well-communicated recovery process is the final, critical component of your downtime playbook.
The Reconciliation Command Center
Just as a command center is needed to manage the downtime, one is needed to manage the recovery. A team of pharmacists, nurses, and providers, led by informatics analysts, must coordinate this effort. The goal is to bring the EHR back to a state of being the “source of truth” as quickly and safely as possible.
The Reconciliation Playbook: A Step-by-Step Guide
- Step 0: System Validation. Before anyone touches the system, a small team of informatics analysts and super-users must perform a rapid regression test to confirm the system is truly stable and functional.
- Step 1: Announce the “Reconciliation Period” and Freeze New Electronic Orders. A clear announcement must be made: “The EHR is online for viewing and reconciliation only. DO NOT place any new orders in the EHR. ALL new orders must continue on paper until further notice.” This prevents a chaotic mix of paper and electronic orders.
- Step 2: Reconcile Patient Movement (ADT). The first data to be entered are the admissions, discharges, and transfers that occurred during the downtime. The accuracy of the medication record depends on patients being in the right location in the system.
- Step 3: Enter All New Medication Orders. A dedicated team of pharmacists and technicians begins the methodical process of entering every single medication order from the yellow paper copies received during the downtime. Every order must be back-timed to reflect when it was actually written.
- Step 4: Reconcile the eMAR against the Paper MAR. This is the most labor-intensive step. Nurses on the units, often assisted by pharmacists or analysts, must go through every patient’s paper Downtime MAR, line by line, and document every single administration that occurred during the outage into the eMAR. This is where missed or duplicate doses can easily occur if not done carefully.
- Step 5: Reconcile All ADC Transactions. All overrides and discrepancies from the ADCs must be accounted for and resolved. This is particularly critical for controlled substances.
- Step 6: Lift the Freeze. Only when the Reconciliation Command Center lead confirms that all orders and administrations have been entered and the eMAR is accurate is the announcement made: “Reconciliation is complete. Paper-based ordering is now discontinued. Please resume normal CPOE and eMAR workflows.”
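The playbook above is effectively a gated checklist: the freeze is lifted only once every earlier step is confirmed. A minimal sketch of that gate follows. The step labels paraphrase the list above, the function names are hypothetical, and requiring the steps in strict order is an assumption (a real command center might run some in parallel).

```python
from typing import Optional, Set

# Steps 0 through 5 must all be confirmed before Step 6 (lifting the freeze).
RECONCILIATION_STEPS = [
    "Step 0: System validation",
    "Step 1: Announce reconciliation period",
    "Step 2: Reconcile ADT",
    "Step 3: Enter new medication orders",
    "Step 4: Reconcile eMAR against paper MAR",
    "Step 5: Reconcile ADC transactions",
]


def next_step(completed: Set[str]) -> Optional[str]:
    """Return the first step not yet confirmed, or None when all are done."""
    for step in RECONCILIATION_STEPS:
        if step not in completed:
            return step
    return None


def can_lift_freeze(completed: Set[str]) -> bool:
    """The freeze may be lifted only when no step remains outstanding."""
    return next_step(completed) is None
```

The command center lead could use `next_step` to see what is outstanding and `can_lift_freeze` as the final check before making the Step 6 announcement.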
The “Duplicate Order” Catastrophe
The single greatest risk during recovery is the duplicate order. A provider writes an order for Morphine 4mg IV on paper. The pharmacist verifies it, and the nurse removes the dose from the ADC on override, administers it, and documents it on the paper MAR. When the system comes back up, the same provider, forgetting they already wrote it on paper, enters the exact same order into the now-functional CPOE. If the reconciliation process is not perfect, the pharmacy might verify this new electronic order, and the nurse, seeing it fresh on the eMAR, could administer a second, unintended dose.
This is why the “freeze” on new electronic orders (Step 1) is so critical. It creates a clear cut-off, allowing for a methodical reconciliation of the paper world before re-introducing the complexity of new electronic orders. Your communication and enforcement of this freeze are paramount to patient safety.
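As a backstop to the freeze, a reconciliation team could screen each newly entered electronic order against the paper orders written during the downtime. The sketch below shows the idea with a naive exact match on patient, drug, dose, and route. The `Order` fields, the 24-hour window, and the matching rule are all assumptions for illustration and not a feature of any particular EHR.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass(frozen=True)
class Order:
    patient_id: str
    drug: str
    dose: str
    route: str
    written_at: datetime


def _key(o: Order):
    # Normalize case and spacing so "4mg" and "4 mg" compare equal.
    return (
        o.patient_id,
        o.drug.strip().lower(),
        o.dose.replace(" ", "").lower(),
        o.route.strip().lower(),
    )


def possible_duplicates(
    new_order: Order,
    paper_orders: List[Order],
    window: timedelta = timedelta(hours=24),
) -> List[Order]:
    """Return paper orders that match the new order within the time window."""
    return [
        p
        for p in paper_orders
        if _key(p) == _key(new_order)
        and abs(new_order.written_at - p.written_at) <= window
    ]
```

Any match would be a prompt for a pharmacist to call the provider before verifying, not an automatic rejection, since a repeat order may be intentional.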