CPAP Module 25, Section 2: Evidence-Based Medicine in Policy
MODULE 25: CLINICAL CRITERIA DEVELOPMENT & MAINTENANCE

Learn the process of critically evaluating primary literature from journals like NEJM and JAMA to establish defensible criteria for efficacy, safety, and place in therapy.

SECTION 25.2

Evidence-Based Medicine in Policy

From Journal Club to Population-Level Impact: The Pharmacist as a Critical Appraiser.

25.2.1 The “Why”: From Evidence Consumer to Evidence Architect

Throughout your pharmacy education and career, you have been trained as a sophisticated consumer of clinical evidence. You have participated in journal clubs, answered complex drug information questions, and counseled patients based on the latest clinical trial data published in premier journals. You know how to read a study and understand its conclusions. This skill forms the foundation of your clinical expertise and professional judgment.

In the world of managed care pharmacy, this foundational skill is transformed and amplified. You are no longer just a consumer of evidence; you are tasked with becoming an architect of policy based on that evidence. The stakes are profoundly different. In a community or hospital setting, your critical appraisal of a study might influence the care of a single patient or a small group. Within a PBM or health plan, your interpretation and application of that same study will form the basis of a clinical policy that impacts the care of tens of thousands, or even millions, of members. The responsibility is immense, and it requires a level of rigor, skepticism, and systematic thinking that goes far beyond a typical journal club discussion.

When a PBM develops a new PA guideline or reviews an existing one, its clinical pharmacists are not simply asking, “Did the drug work?” They are asking a series of much deeper, more consequential questions:

  • How well did it work, and in what specific patient population?
  • Was the benefit clinically meaningful, or just statistically significant?
  • Was it compared to a placebo, or to the current, accepted standard of care?
  • What was the safety profile, and are there specific risks for our member population?
  • How does the evidence for this drug compare to the evidence for other, less expensive alternatives?

This section is designed to be your masterclass in answering these questions. We will equip you with a systematic framework for dissecting primary literature—not as an academic exercise, but as a practical tool for building robust, defensible, and clinically sound coverage policies. You will learn to move beyond the abstract and the headline conclusions to scrutinize the methodology, to question the endpoints, and to translate complex statistical data into actionable policy criteria. Mastering this skill is the single greatest differentiator between a PA pharmacist who applies rules and a clinical leader who writes them.

Retail Pharmacist Analogy: The New Generic NTI Drug Dilemma

Imagine the first generic version of a complex, narrow therapeutic index (NTI) drug you dispense frequently—perhaps an anti-epileptic or an immunosuppressant—is released. The computer system automatically flags it as the preferred product, and the price is significantly lower. A less experienced professional might simply accept the system’s choice and begin automatically substituting for all patients.

Your expert training, however, prompts a more rigorous process. You don’t just accept the change; you critically appraise the evidence of equivalence. This is your literature review.

  • Checking the Primary “Label” (The Orange Book): Your first step is to consult the FDA’s Orange Book. You’re looking for an “AB” rating, which is the FDA’s seal of therapeutic equivalence based on bioequivalence studies. This is analogous to checking a drug’s FDA-approved indication—it’s the primary, most important piece of evidence.
  • Reviewing the “Methods” (Bioequivalence Data): If you’re particularly cautious, you might look up the bioequivalence data. Did the 90% confidence intervals for the generic-to-brand ratios of Cmax and AUC fall entirely within the 80-125% acceptance limits? Were the studies done in healthy volunteers or in the target patient population? This is you digging into the study’s methodology.
  • Consulting the “Systematic Reviews” (Professional Guidelines): You check for position statements from organizations like the American Academy of Neurology or the American Epilepsy Society. Do they recommend caution when switching between manufacturers for this specific drug? These guidelines are your “meta-analyses,” summarizing the expert consensus on the topic.
  • Identifying the “Exclusion Criteria” (High-Risk Patients): Based on your review and clinical judgment, you might decide that while the generic is appropriate for most new-start patients, you will be extremely cautious with specific high-risk individuals—like a patient whose seizures have been perfectly controlled for 10 years on the brand product. You create a mental “policy” to discuss the switch with the prescriber for this specific sub-population before proceeding.

This entire process (looking beyond the surface, questioning the data, consulting expert guidelines, and creating a nuanced plan based on patient-specific factors) is a microcosm of evidence-based policy development. A PBM pharmacist does the same thing when a new multi-million-dollar specialty drug is launched. They dissect the evidence to determine not just IF the drug should be covered, but FOR WHOM, AFTER WHAT, and UNDER WHAT specific clinical circumstances.

25.2.2 The Hierarchy of Evidence: A Managed Care Perspective

In academic settings, the “pyramid of evidence” is a familiar concept. It ranks study designs based on their ability to minimize bias. In a managed care setting, this pyramid is not just a theoretical model; it is the practical framework for all clinical decision-making. Every piece of data is weighed and valued according to its position in this hierarchy. Understanding this is crucial, because a high volume of low-quality evidence will never outweigh a single, well-designed, high-quality study.

The pyramid of evidence, from strongest (top) to weakest (bottom):
  • Systematic Reviews & Meta-Analyses
  • Randomized Controlled Trials (RCTs)
  • Cohort Studies
  • Case-Control Studies
  • Case Series / Case Reports
  • Expert Opinion / Editorials

Let’s deconstruct each level from the practical perspective of a pharmacist building a PA policy.

Systematic Reviews & Meta-Analyses
Description: A structured review that collects and critically analyzes multiple research studies (ideally RCTs) to synthesize the overall evidence on a topic. A meta-analysis goes a step further by using statistical methods to combine the results of multiple studies.
How It’s Used in Policy Development: Primary Tool for Guideline Development. These are used to establish the overall efficacy and safety of a drug class and to inform the recommendations of major clinical practice guidelines (e.g., from the ACC/AHA, ADA). A PBM will lean heavily on a Cochrane review or a major professional society guideline to define the overall “place in therapy” for a drug.
Pharmacist’s Critical Questions:
  • Were the inclusion criteria for studies appropriate? Did they only include high-quality RCTs?
  • Is there significant heterogeneity? (Were the studies too different to be meaningfully combined?) Look for the I² statistic.
  • Is there evidence of publication bias? (Are negative or null studies missing because they were never published?) Look for a funnel plot analysis.

Randomized Controlled Trials (RCTs)
Description: The gold standard. Participants are randomly assigned to an intervention group or a control group. This design minimizes selection bias and is the most reliable way to determine cause-and-effect (i.e., did the drug cause the outcome?).
How It’s Used in Policy Development: The Bedrock of Individual Drug Criteria. The pivotal Phase III RCTs are the source material for nearly all specific PA criteria. The study’s inclusion criteria become the policy’s diagnostic requirements. The study’s primary endpoint becomes the measure of success. The comparator drug becomes the basis for step therapy.
Pharmacist’s Critical Questions:
  • (This is the focus of the rest of the section) Who was studied? What was the comparator? What were the endpoints? Was the analysis appropriate?

Observational Studies (Cohort, Case-Control)
Description: Studies where researchers observe outcomes without manipulation. Cohort studies follow groups over time to see who develops a disease. Case-control studies look backward from a disease to identify risk factors.
How It’s Used in Policy Development: Primarily for Safety & “Real-World” Data. RCTs are often too small or too short to detect rare or long-term side effects. Large cohort studies are critical for post-marketing safety surveillance. A PA policy might add a warning or exclusion based on safety signals from a large observational study, even if the signal wasn’t seen in the initial RCTs. Observational studies can also help confirm whether the efficacy seen in a pristine RCT population holds up in a messier, real-world patient population.
Pharmacist’s Critical Questions:
  • What was the potential for confounding? (Were the groups truly similar except for the drug exposure?)
  • How was the data collected? (From reliable EMRs or from less reliable patient recall?)
  • How large was the effect size? A small effect in an observational study is more likely to be due to bias than a large one.

Case Series / Case Reports
Description: A simple description of a group of patients (series) or a single patient (report). There is no control group.
How It’s Used in Policy Development: Used Almost Exclusively for Hypothesis Generation & Safety Signals. These are never used to establish efficacy criteria. However, a series of case reports describing a novel, serious adverse event can trigger a safety review and may lead to a new warning or exclusion in a policy. For example, after an early signal in the VIGOR randomized trial, observational data helped characterize the cardiovascular risk of rofecoxib (Vioxx).
Pharmacist’s Critical Questions:
  • Is this a plausible adverse drug reaction?
  • Are there alternative explanations for the outcome?
  • Is this a single, isolated event or part of a growing pattern?

Expert Opinion / Editorials
Description: The personal viewpoint or interpretation of a respected leader in the field. Not based on original research.
How It’s Used in Policy Development: Used for Context and Nuance, NEVER for Primary Criteria. An editorial in the NEJM by a thought leader might help a policy committee understand the clinical context or potential future impact of a new drug. However, a PA criterion will never state “Covered because Dr. Smith recommends it.” Policy must be based on data, not opinion.
Pharmacist’s Critical Questions:
  • Does the expert have any financial conflicts of interest?
  • Is their opinion based on the available data, or is it speculative?
  • Does their opinion align with or contradict major professional society guidelines?
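
The I² statistic mentioned under systematic reviews can be made concrete with a short calculation. The sketch below is illustrative only: it assumes fixed-effect inverse-variance weights, and the function name `i_squared` and the study values are hypothetical, not taken from any real meta-analysis. It computes Cochran's Q and then I² = max(0, (Q - df) / Q) × 100%.

```python
def i_squared(effects, variances):
    """Return (Cochran's Q, I² as a percentage) for a set of studies.

    effects:   per-study effect estimates on an additive scale
               (for example, log risk ratios)
    variances: the matching within-study variances
    """
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two studies with matching variances")
    # Inverse-variance weights: more precise studies count for more.
    weights = [1 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    # Q sums the weighted squared deviations from the pooled estimate.
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    # I² is the share of total variation beyond what chance alone would produce.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, i2


# Hypothetical log risk ratios from four equally precise studies.
q, i2 = i_squared([-0.5, 0.0, -0.1, -0.6], [0.01, 0.01, 0.01, 0.01])
print(f"Q = {q:.1f}, I² = {i2:.0f}%")
```

A higher I² means that more of the spread between studies reflects genuine differences rather than sampling error, which is the cue to ask whether the studies should have been pooled at all.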

25.2.3 Masterclass Deep Dive: Critically Appraising a Randomized Controlled Trial (RCT)

The pivotal, registration-enabling, double-blind, randomized, controlled trial is the single most important piece of evidence in drug evaluation for policy development. Your ability to dissect these studies with surgical precision is paramount. We will use the PICO framework (Patient, Intervention, Comparison, Outcome) as our guide to deconstruct a landmark study from start to finish.

The PICO Framework: Your Starting Questions

Before you even read the first line of the methods section, you must frame the study using PICO. This is your mental scaffolding for the entire appraisal.

  • Patient/Population: Who was included in this study? (e.g., Adults with HFrEF, NYHA Class II-IV, EF ≤ 40%)
  • Intervention: What was the new treatment being tested? (e.g., Sacubitril/valsartan 97/103 mg BID)
  • Comparison: What was the new treatment compared against? (e.g., Enalapril 10 mg BID)
  • Outcome: What was the primary goal or endpoint measured? (e.g., Composite of cardiovascular death or hospitalization for heart failure)

If you cannot clearly define these four elements, you cannot properly evaluate the study.
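
One way to enforce the four-element rule is to record the PICO frame as a small data structure before starting the appraisal. This is a minimal sketch, not a prescribed tool: the class name `PICO` and the method `missing_elements` are hypothetical, and the example values are the ones given in the list above.

```python
from dataclasses import dataclass, fields


@dataclass
class PICO:
    """Hypothetical container for the four PICO elements of a study."""
    population: str
    intervention: str
    comparison: str
    outcome: str

    def missing_elements(self):
        """Return the names of any PICO elements left blank."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


study = PICO(
    population="Adults with HFrEF, NYHA Class II-IV, EF <= 40%",
    intervention="Sacubitril/valsartan 97/103 mg BID",
    comparison="Enalapril 10 mg BID",
    outcome="Composite of cardiovascular death or hospitalization for heart failure",
)
print(study.missing_elements())  # an empty list means the frame is complete
```

If `missing_elements()` returns anything, the appraisal should pause until that element can be stated clearly.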

The following table is the most detailed component of this module. It provides a systematic checklist for every section of a published RCT, explaining what to look for and, most importantly, how it translates into a PA policy decision.

Masterclass Table: Deconstructing an RCT for Policy Development

THE METHODS SECTION: The Heart of the Appraisal

Patient Population: Inclusion & Exclusion Criteria
Key Question: “Who, exactly, was in this trial?”
What to Look For: This is the most critical element for defining the “on-label” patient. You must scrutinize these criteria line by line.
  • Inclusion Criteria: Look for specifics. Age range, disease definition (e.g., diagnostic criteria, severity scores like NYHA class), baseline lab values (e.g., eGFR, EF), and prior treatment requirements.
  • Exclusion Criteria: This is just as important! Who was left out? Common exclusions are severe renal or hepatic impairment, specific comorbidities, contraindicated medications, or pregnancy. This defines the population in which the drug has not been proven safe or effective.
Translation to PA Policy Criteria: Direct Translation.
  • Inclusion Criteria ➔ Initial PA Criteria. (e.g., “Patient must have a diagnosis of HFrEF with LVEF ≤ 40%”).
  • Exclusion Criteria ➔ Policy Exclusions. (e.g., “Not covered for patients with severe hepatic impairment (Child-Pugh Class C)”).
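
The direct translation from trial criteria to PA criteria can be sketched as a simple rule check. The example below encodes only the two sample criteria quoted above (HFrEF with LVEF ≤ 40%, and the Child-Pugh Class C exclusion). The names `PARequest` and `review_request` are hypothetical, and a real policy would carry many more criteria and documentation requirements.

```python
from dataclasses import dataclass


@dataclass
class PARequest:
    """Hypothetical, simplified prior authorization request."""
    diagnosis: str
    lvef_percent: float
    child_pugh_class: str  # "A", "B", "C", or "" if no hepatic impairment


def review_request(req):
    """Return (approved, reasons) using the two illustrative criteria."""
    reasons = []
    # Inclusion criteria from the trial become the initial PA criteria.
    if req.diagnosis != "HFrEF":
        reasons.append("diagnosis is not HFrEF")
    if req.lvef_percent > 40:
        reasons.append("LVEF is above 40%")
    # Exclusion criteria from the trial become policy exclusions.
    if req.child_pugh_class == "C":
        reasons.append("severe hepatic impairment (Child-Pugh Class C)")
    return (not reasons, reasons)


approved, reasons = review_request(PARequest("HFrEF", 35.0, ""))
print(approved, reasons)
```

Returning the reasons alongside the decision mirrors how a denial should cite the specific unmet criterion, not just the outcome.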

Comparison / Control Group
Key Question: “What are we comparing this new drug against?”
What to Look For: The choice of comparator determines the drug’s place in therapy.
  • Placebo-Controlled: Shows that the drug is better than no active treatment. This is acceptable for conditions with no existing treatment but is a lower bar of evidence otherwise.
  • Active Comparator: This is the gold standard. It shows that the drug is better than the current standard of care (superiority trial) or not worse than it by more than a prespecified margin (non-inferiority trial).
The “Straw Man” Comparator: Be skeptical. Was the active comparator the right drug at the right dose? A new drug might look impressive when compared to an older drug at a sub-optimal dose. Your clinical expertise is required to judge if the comparison was fair.
Translation to PA Policy Criteria: Basis for Step Therapy.
  • If the drug was superior to the standard of care (e.g., Drug X > Metformin), it may become a first-line option or require a step through metformin.
  • If it was only proven superior to placebo, it will almost certainly be placed after all other established, active therapies.

Outcomes / Endpoints
Key Question: “What did they measure, and does it matter?”
What to Look For: This is where clinical significance is determined.
  • Primary vs. Secondary: The trial is powered to answer the question of the primary endpoint. Results for secondary endpoints are exploratory and must be interpreted with caution. A policy should rarely be based on a secondary endpoint alone.
  • Surrogate vs. Clinical: A surrogate endpoint is a lab value or physical sign that is thought to predict a real clinical outcome (e.g., HbA1c, blood pressure). A hard clinical endpoint is a direct measure of how a patient feels, functions, or survives (e.g., MI, stroke, death). Payers strongly prefer policies based on hard clinical endpoints.
  • Composite Endpoints: Be careful! An endpoint that combines “CV death, MI, stroke, or urgent revascularization” can be driven by the “softest” component (revascularization). Always check the results for each component of the composite.
Translation to PA Policy Criteria: Defines “Success” for Reauthorization.
  • The primary endpoint often informs the continuation criteria. If a drug was approved based on reducing hospitalizations, the reauthorization criteria might require documentation that the patient has remained hospitalization-free.
THE STATISTICS SECTION: Quantifying the Benefit

Risk Reduction & NNT
Key Question: “How big was the benefit, really?”
What to Look For: This is where you translate statistics into clinical impact.
  • Relative Risk Reduction (RRR): Often sounds impressive (e.g., “50% reduction in risk!”), but can be misleading if the baseline risk is very low. Formula: $$RRR = 1 - RR$$ where RR is the relative risk (risk in the treatment group divided by risk in the control group).
  • Absolute Risk Reduction (ARR): The true difference in risk between groups. This is what payers care about. Formula: $$ARR = \text{Risk}_{\text{control}} - \text{Risk}_{\text{treatment}}$$
  • Number Needed to Treat (NNT): The number of patients you would need to treat with the new drug to prevent one additional bad outcome. This is the most intuitive measure of clinical impact. Formula: $$NNT = \frac{1}{ARR}$$ with ARR expressed as a proportion and the result conventionally rounded up to the next whole number.
Example: The Power of NNT. A study shows a new drug reduces the risk of an event from 2% in the control group to 1% in the treatment group. The RRR is 50% ([2-1]/2), which sounds amazing. But the ARR is only 1 percentage point (2% - 1%). The NNT is 100 (1/0.01). You have to treat 100 patients to prevent one event. This context is critical for a PBM deciding whether the drug’s cost is justified by its benefit.
Translation to PA Policy Criteria: Foundation of Value Assessment. While not written directly into a PA criterion, NNT is a core metric used in Pharmacy & Therapeutics (P&T) committee discussions to determine a drug’s formulary placement and overall value. A drug with an NNT of 5 will be viewed much more favorably than one with an NNT of 500.
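
The risk-reduction arithmetic in this entry can be checked with a few lines of code. The sketch below reuses the 2% versus 1% example; the function name `risk_metrics` is hypothetical, and rounding the NNT up to the next whole patient is a common convention assumed here, not a rule stated in this module.

```python
import math


def risk_metrics(control_risk, treatment_risk):
    """Return RR, RRR, ARR, and NNT for two event risks given as proportions."""
    if not 0 < control_risk <= 1 or not 0 <= treatment_risk <= 1:
        raise ValueError("risks must be proportions, with a non-zero control risk")
    rr = treatment_risk / control_risk    # relative risk
    rrr = 1 - rr                          # relative risk reduction
    arr = control_risk - treatment_risk   # absolute risk reduction
    # NNT only makes sense when the treatment lowers risk (ARR > 0).
    # The inner round() guards against floating-point noise before rounding up.
    nnt = math.ceil(round(1 / arr, 6)) if arr > 0 else None
    return {"rr": rr, "rrr": rrr, "arr": arr, "nnt": nnt}


metrics = risk_metrics(control_risk=0.02, treatment_risk=0.01)
print(metrics)
```

Running the same function with a higher baseline risk (for example, 20% versus 10%) gives the same RRR but a much smaller NNT, which is why the baseline risk matters so much to a value assessment.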

Statistical Significance
Key Question: “Is the result real or just due to chance?”
What to Look For:
  • P-value: The probability of observing a result at least as extreme as the one seen in the trial if the drug truly had no effect. The conventional cutoff for “statistical significance” is p < 0.05. A p-value below this cutoff means such a result would be unlikely under chance alone. It is not the probability that the finding is a fluke, and it does not tell you the size or importance of the effect.
  • Confidence Interval (CI): This provides a range of plausible values for the true effect size. A 95% CI is constructed so that, across many repeated trials, about 95% of such intervals would contain the true value. For a difference to be statistically significant at the 0.05 level, the 95% CI cannot include the value of “no effect” (0 for a difference, or 1.0 for a ratio such as a relative risk, odds ratio, or hazard ratio). CIs are more informative than p-values because they also show the size and precision of the effect.
Translation to PA Policy Criteria: A “Go/No-Go” Check. A policy will generally only be created for an indication where the drug has demonstrated a statistically significant benefit on a primary endpoint. If the p-value is ≥ 0.05 or the 95% CI includes the no-effect value (1.0 for a ratio, 0 for a difference), the drug is typically considered to have failed to prove its benefit for that outcome.
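
The “go/no-go” check can be illustrated by computing a confidence interval for a risk ratio and testing whether it includes 1.0. This is a sketch under stated assumptions: it uses the standard large-sample log method for the interval, the event counts are invented for illustration, and the function names `risk_ratio_ci` and `excludes_no_effect` are hypothetical. Published trials often report hazard ratios from time-to-event models instead, which this example does not cover.

```python
import math


def risk_ratio_ci(events_trt, n_trt, events_ctl, n_ctl, z=1.96):
    """Risk ratio with an approximate 95% CI (log method, z = 1.96)."""
    rr = (events_trt / n_trt) / (events_ctl / n_ctl)
    # Standard error of log(RR) from the four cell counts.
    se_log_rr = math.sqrt(1 / events_trt - 1 / n_trt + 1 / events_ctl - 1 / n_ctl)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


def excludes_no_effect(lower, upper, null_value=1.0):
    """True when the interval does not contain the no-effect value."""
    return not (lower <= null_value <= upper)


# Hypothetical large trial: 100/5000 events on drug vs 150/5000 on control.
rr, lower, upper = risk_ratio_ci(100, 5000, 150, 5000)
print(f"RR = {rr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
print("Go" if excludes_no_effect(lower, upper) else "No-go")

# Hypothetical small trial: 10/1000 vs 12/1000 gives a much wider interval.
rr_s, lower_s, upper_s = risk_ratio_ci(10, 1000, 12, 1000)
print("Go" if excludes_no_effect(lower_s, upper_s) else "No-go")
```

The two hypothetical trials show why the interval is more informative than the point estimate: both ratios are below 1.0, but only the larger trial is precise enough to rule out no effect.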