MetaReview

Psychiatry Meta-Analysis Guide

From rating scale harmonization to network meta-analysis. Everything you need to synthesize evidence from depression, anxiety, schizophrenia, and psychotherapy trials.

Table of Contents

  1. Why Meta-Analysis Is Critical in Psychiatry
  2. Defining Your Psychiatric Research Question (PICO)
  3. Choosing the Right Effect Size
  4. Handling Rating Scale Differences
  5. The Placebo Effect Problem
  6. Data Extraction Checklist for Psychiatric Studies
  7. Handling Heterogeneity in Psychiatric Trials
  8. Subgroup and Network Meta-Analysis
  9. Step-by-Step: Psychiatry Meta-Analysis in MetaReview
  10. Common Pitfalls in Psychiatric Meta-Analysis

1. Why Meta-Analysis Is Critical in Psychiatry

Psychiatric disorders affect hundreds of millions of people worldwide, yet individual trials frequently produce conflicting results. Effect sizes are typically small to moderate (SMD 0.2–0.5 for antidepressants), placebo response rates are high, and dozens of competing treatments exist without head-to-head comparisons. Meta-analysis addresses these challenges by pooling results across trials and, through network methods, by comparing treatments that were never tested head to head.

Key psychiatric meta-analyses that shaped clinical practice: Cipriani 2018 (21 antidepressants NMA, Lancet), Leucht 2013 (antipsychotic efficacy, Lancet), Turner 2008 (publication bias, NEJM), Cuijpers 2019 (psychotherapy vs pharmacotherapy).

2. Defining Your Psychiatric Research Question (PICO)

A precise PICO framework prevents scope creep and unmanageable heterogeneity. Psychiatric populations are highly variable, so specificity at this stage saves enormous effort later.

Element | Description | Psychiatry Examples
P (Population) | Disorder, diagnostic criteria, severity | Adults with MDD (DSM-5), moderate-to-severe; Adolescents with GAD; Treatment-resistant schizophrenia (DSM-5); PTSD in veterans
I (Intervention) | Treatment being evaluated | SSRIs (escitalopram, sertraline); SNRIs (venlafaxine, duloxetine); CBT; Combination (SSRI + CBT); Aripiprazole augmentation
C (Comparator) | Control condition | Placebo (pill placebo); Active comparator (fluoxetine); Waitlist; TAU (treatment as usual)
O (Outcomes) | Primary and secondary endpoints | Mean change on HAMD-17; Response rate (≥50% reduction); Remission (HAMD ≤7); Dropout due to AE; CGI-I responder rate

Disorder-Specific Considerations

Disorder | Common Scales | Typical Comparators | Key Challenges
MDD | HAMD-17, MADRS, PHQ-9, BDI-II | Placebo, active drug | High placebo response, scale heterogeneity
GAD | HAM-A, GAD-7 | Placebo, active drug | High comorbidity with MDD
Schizophrenia | PANSS, BPRS, CGI-S | Placebo, haloperidol, active drug | Dose equivalence, chronicity
Bipolar | YMRS, MADRS, CGI-BP | Placebo, lithium | Separate mania vs depression phases
PTSD | CAPS-5, PCL-5 | Placebo, waitlist | Trauma type variability
OCD | Y-BOCS | Placebo, active drug | Higher SSRI doses needed
ADHD | ADHD-RS, CGI-S | Placebo | Stimulant vs non-stimulant classes

Scope warning: Combining different disorders (e.g., MDD + GAD + PTSD) in a single pooled analysis is almost never appropriate. Even within MDD, mixing treatment-resistant depression with first-episode depression introduces extreme heterogeneity.

3. Choosing the Right Effect Size

This is the most critical decision in psychiatric meta-analysis. Because studies use different rating scales for the same construct, Standardized Mean Difference (SMD) is the default choice — unlike most other medical fields where Mean Difference (MD) is common.

What is your outcome type?
├─ Continuous (symptom severity scores)
│  ├─ All studies use the SAME scale and version
│  │  └─ Use Mean Difference (MD)
│  └─ Studies use DIFFERENT scales (HAMD vs MADRS vs PHQ-9)
│     └─ Use Standardized Mean Difference (SMD / Hedges' g)
├─ Binary (response, remission, dropout)
│  ├─ Response (≥50% score reduction) → OR or RR
│  ├─ Remission (below threshold, e.g., HAMD ≤7) → OR or RR
│  └─ Dropout (all-cause or AE-related) → OR or RR
└─ Time-to-event (relapse prevention)
   └─ Use Hazard Ratio (HR)

Outcome | Effect Size | Null Value | Interpretation Example
Symptom reduction (mixed scales) | SMD (Hedges' g) | 0 | SMD = −0.30 means drug reduces symptoms by 0.3 SD more than placebo
Symptom reduction (same scale) | MD | 0 | MD = −3.0 on HAMD-17 means 3-point greater improvement with drug
Response rate (≥50% reduction) | OR or RR | 1.0 | OR = 1.5 means 50% higher odds of response with drug
Remission rate | OR or RR | 1.0 | RR = 1.4 means 40% more likely to remit with drug
Dropout due to adverse events | OR or RR | 1.0 | OR = 2.0 means twice the odds of discontinuing due to side effects
Time to relapse | HR | 1.0 | HR = 0.60 means drug reduces relapse hazard by 40%

SMD (Hedges' g) = (M_treatment − M_control) / SD_pooled × J(df)

J(df) is the small-sample correction factor, approximately 1 − 3/(4df − 1)
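
The formula above can be sketched in a few lines of Python. This is a minimal illustration rather than MetaReview's internal implementation: the function name `hedges_g` and the input numbers are hypothetical, and the variance formula is the common large-sample approximation for a standardized mean difference.

```python
import math

def hedges_g(m_t, sd_t, n_t, m_c, sd_c, n_c):
    """Return (g, variance of g) for two independent arms."""
    df = n_t + n_c - 2
    # Pooled standard deviation across both arms
    sd_pooled = math.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / df)
    d = (m_t - m_c) / sd_pooled          # uncorrected standardized difference
    j = 1 - 3 / (4 * df - 1)             # small-sample correction J(df)
    g = j * d
    # Large-sample approximation to the variance of d, scaled by J^2
    var_d = (n_t + n_c) / (n_t * n_c) + d**2 / (2 * (n_t + n_c))
    return g, j**2 * var_d

# Hypothetical trial: mean change from baseline on a depression scale
g, var_g = hedges_g(m_t=-10.5, sd_t=7.0, n_t=100, m_c=-8.0, sd_c=7.5, n_c=100)
print(f"g = {g:.3f}, SE = {math.sqrt(var_g):.3f}")
```

With 100 patients per arm the correction factor is very close to 1; it matters mainly for small trials.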

MetaReview supports all psychiatric effect sizes. Select SMD from the dropdown, enter mean, SD, and N for each arm. The tool calculates Hedges' g, pools using random effects, and generates the forest plot automatically.

4. Handling Rating Scale Differences

Rating scale heterogeneity is the defining challenge of psychiatric meta-analysis. Unlike blood pressure or tumor size, psychiatric symptoms are measured through subjective scales with different items, scoring ranges, and rater perspectives.

Major Depression Scales

Scale | Type | Items | Score Range | Response Threshold | Remission Threshold
HAMD-17 | Clinician-rated | 17 | 0–52 | ≥50% reduction | ≤7
HAMD-21 | Clinician-rated | 21 | 0–66 | ≥50% reduction | ≤7 (first 17 items)
MADRS | Clinician-rated | 10 | 0–60 | ≥50% reduction | ≤10
PHQ-9 | Self-rated | 9 | 0–27 | ≥50% reduction | ≤4
BDI-II | Self-rated | 21 | 0–63 | ≥50% reduction | ≤12

Other Key Psychiatric Scales

Disorder | Scale | Type | Score Range
Schizophrenia | PANSS | Clinician-rated | 30–210
Schizophrenia | BPRS | Clinician-rated | 18–126
Anxiety (GAD) | GAD-7 | Self-rated | 0–21
OCD | Y-BOCS | Clinician-rated | 0–40

Critical rule: Do not mix clinician-rated and self-rated scales without sensitivity analysis. Clinician-rated scales (HAMD, MADRS) tend to show larger drug-placebo differences than self-rated scales (PHQ-9, BDI-II). If you pool both types, conduct a subgroup analysis by rater type and report results separately.

Practical Recommendations

5. The Placebo Effect Problem

Few medical fields face the placebo challenge as acutely as psychiatry. The placebo response rate in antidepressant trials typically ranges from 30% to 50%, which leaves a comparatively small margin for the drug-specific effect and creates unique analytical difficulties.

Key Facts About the Psychiatric Placebo Response

How the Placebo Effect Impacts Your Meta-Analysis

Impact | Mechanism | Mitigation Strategy
Compressed effect sizes | High placebo response leaves little room for drug superiority | Report both absolute and relative effect sizes; use NNT for clinical interpretation
Heterogeneity inflation | Variable placebo response across trials increases I² | Meta-regression with placebo response rate as covariate
Publication year confound | Newer trials have higher placebo rates and smaller effects | Subgroup by decade or meta-regression on publication year
Selection bias | Failed trials less likely to be published | Include FDA review data and ClinicalTrials.gov results

Severity matters: Kirsch et al. (2008) argued that antidepressant effects were clinically significant only in severely depressed patients (baseline HAMD ≥28). Fournier et al. (2010) confirmed a severity-response gradient. Consider subgroup analysis by baseline severity.
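
To make the meta-regression strategy concrete, here is a simplified sketch of a single-covariate, inverse-variance weighted regression of study effect sizes on placebo response rate. It assumes the between-study variance `tau2` is supplied by the caller (0 gives a fixed-effect regression); a full mixed-effects meta-regression would estimate it from the data. The function name and the data are hypothetical.

```python
import math

def weighted_meta_regression(effects, variances, covariate, tau2=0.0):
    """Weighted least squares of effect size on one study-level covariate.

    Weights are 1 / (within-study variance + tau2).
    Returns (intercept, slope, standard error of the slope).
    """
    w = [1.0 / (v + tau2) for v in variances]
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, covariate)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, covariate))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar)
              for wi, xi, yi in zip(w, covariate, effects))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    se_slope = math.sqrt(1.0 / sxx)  # valid when the variances are treated as known
    return intercept, slope, se_slope

# Hypothetical data: SMD (negative favors drug) vs placebo response rate
smd = [-0.42, -0.35, -0.30, -0.22, -0.18]
var = [0.010, 0.020, 0.015, 0.010, 0.020]
placebo_rate = [0.30, 0.35, 0.40, 0.45, 0.50]

b0, b1, se_b1 = weighted_meta_regression(smd, var, placebo_rate)
print(f"slope = {b1:.2f} per unit of placebo response rate (SE {se_b1:.2f})")
```

A positive slope here would mean that the drug-placebo difference shrinks toward zero as the placebo response rate rises.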

6. Data Extraction Checklist for Psychiatric Studies

Psychiatric trials have unique reporting patterns. Use this checklist to ensure complete, consistent extraction.

Study Characteristics

Patient Population

Intervention Details

Outcomes Data

LOCF vs MMRM matters enormously. LOCF (Last Observation Carried Forward) can bias results in either direction, depending on dropout patterns. MMRM (Mixed-Model Repeated Measures) has been preferred by the FDA and EMA since ~2008. Record which method each study used and plan a sensitivity analysis.

7. Handling Heterogeneity in Psychiatric Trials

Psychiatric meta-analyses almost always show moderate to high heterogeneity (I² 40%–80%). Understanding and accounting for the sources is essential for credible results.

Sources of Heterogeneity Specific to Psychiatry

Source | Examples | Impact on Effect Size
Diagnostic criteria | DSM-IV vs DSM-5 vs ICD-10/11 | Different patient populations enrolled
Baseline severity | Mild (HAMD 8–13) vs moderate (14–18) vs severe (19–22) vs very severe (≥23) | Larger drug-placebo difference in severe patients
Rating scale | HAMD-17 vs MADRS vs PHQ-9 | Different sensitivity to change; managed by SMD but adds noise
Drug dose | Sub-therapeutic vs therapeutic vs supra-therapeutic | Dose-response relationship varies by drug
Trial duration | 4 weeks vs 8 weeks vs 12 weeks | Longer trials: larger placebo response, potentially larger drug effect
Publication year | 1990s vs 2000s vs 2010s vs 2020s | Secular increase in placebo response; changes in trial methodology
Comorbidity | MDD alone vs MDD + anxiety vs MDD + substance use | Comorbid patients may respond differently
Sponsor type | Industry vs academic vs government | Industry trials may have allegiance effects

Recommended Strategies

  1. Always use random-effects models — Fixed-effect is almost never appropriate in psychiatry given the clinical diversity
  2. Pre-specify subgroup analyses by drug class, severity, scale type, and trial duration
  3. Meta-regression on baseline severity, publication year, and placebo response rate (if ≥10 studies)
  4. Sensitivity analysis — Leave-one-out; restrict to HAMD-17 only; exclude studies with high risk of bias; MMRM-only vs all studies

In MetaReview: Heterogeneity statistics (I², Q, τ²) are automatically calculated. Use the subgroup column and meta-regression feature to explore sources of heterogeneity.
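
For readers who want to see what these statistics mean numerically, the sketch below pools study effects with the DerSimonian-Laird random-effects estimator and reports Q, I², and τ². DerSimonian-Laird is used here as a common default; this guide does not state which estimator MetaReview applies, so treat the choice, the function name, and the example data as assumptions.

```python
import math

def random_effects_dl(effects, variances):
    """DerSimonian-Laird random-effects pooling with heterogeneity statistics."""
    k = len(effects)
    w = [1.0 / v for v in variances]                 # fixed-effect weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    df = k - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)                    # between-study variance
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    w_re = [1.0 / (v + tau2) for v in variances]     # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return {
        "pooled": pooled,
        "se": se,
        "ci": (pooled - 1.96 * se, pooled + 1.96 * se),
        "Q": q,
        "I2": i2,
        "tau2": tau2,
    }

# Hypothetical SMDs and variances from five trials
result = random_effects_dl(
    effects=[-0.45, -0.20, -0.35, -0.10, -0.55],
    variances=[0.02, 0.01, 0.03, 0.015, 0.04],
)
print({name: value for name, value in result.items() if name != "ci"})
```

The same function can serve a leave-one-out sensitivity analysis by calling it repeatedly with one study removed each time.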

8. Subgroup and Network Meta-Analysis

Subgroup Analysis

Pre-specified subgroup analyses are essential for interpreting psychiatric meta-analyses. Common subgroup variables include:

Network Meta-Analysis (NMA)

NMA is arguably the most impactful analytical method in psychiatric research. Because head-to-head trials between active drugs are rare, NMA uses indirect comparisons through a common comparator (usually placebo) to rank multiple treatments simultaneously.
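
The idea of an indirect comparison through a common comparator can be illustrated with the Bucher adjusted indirect comparison, the simplest building block behind NMA. A full network model estimates all comparisons jointly; the sketch below handles only one pair of drugs that were each compared with placebo, and the function name and numbers are hypothetical.

```python
import math

def indirect_comparison(d_a_vs_p, se_a, d_b_vs_p, se_b):
    """Bucher indirect estimate of drug A vs drug B via placebo.

    Inputs are pooled effects (e.g., SMD or log OR) of each drug vs placebo
    with their standard errors. Returns (estimate, SE, 95% CI).
    """
    d_ab = d_a_vs_p - d_b_vs_p
    se_ab = math.sqrt(se_a**2 + se_b**2)  # variances add, so precision is lost
    return d_ab, se_ab, (d_ab - 1.96 * se_ab, d_ab + 1.96 * se_ab)

# Hypothetical pooled SMDs vs placebo (negative favors the drug)
d_ab, se_ab, ci = indirect_comparison(-0.40, 0.03, -0.30, 0.04)
print(f"A vs B: {d_ab:.2f} (95% CI {ci[0]:.2f} to {ci[1]:.2f})")
```

Because the two variances add, an indirect estimate is always less precise than either of the direct estimates it is built from, and it is only valid if the two sets of trials are similar enough to be compared.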

The Cipriani 2018 Landmark Study

Cipriani et al. (Lancet, 2018) conducted the largest NMA of antidepressants ever performed: 522 double-blind RCTs, 116,477 patients, 21 antidepressants. Key findings:

NMA Requirements and Assumptions

Requirement | Description | How to Check
Connected network | Every treatment must link to at least one other via direct or indirect evidence | Draw the network graph; check for disconnected nodes
Transitivity | Study characteristics must be similar across comparisons | Compare baseline severity, duration, year across different comparison sets
Consistency | Direct and indirect estimates must agree | Node-splitting test; design-by-treatment interaction test

Antipsychotic NMA: Leucht et al. (Lancet, 2013) performed a similar NMA for 15 antipsychotics in schizophrenia, comparing efficacy (PANSS/BPRS reduction), all-cause discontinuation, and specific side effects (weight gain, sedation, EPS). Clozapine was most effective; aripiprazole had the most favorable side-effect profile.

9. Step-by-Step: Psychiatry Meta-Analysis in MetaReview

Step 1: Select Effect Measure

Open MetaReview and choose the appropriate measure:

Step 2: Enter Study Data

For SMD analysis: enter Study name, Year, Mean, SD, and N for both Treatment and Control arms.

For OR analysis: enter Study name, Year, Events and Total for both arms.
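
As a rough illustration of what happens to the Events and Total columns, the sketch below converts one study's counts into a log odds ratio and its standard error, the quantities that are then pooled. The 0.5 continuity correction for zero cells is a common convention and an assumption here, not a documented MetaReview behavior; the function name and counts are hypothetical.

```python
import math

def log_odds_ratio(events_t, total_t, events_c, total_c):
    """Return (log OR, SE) from a 2x2 table of events and totals."""
    a, b = events_t, total_t - events_t      # treatment: events, non-events
    c, d = events_c, total_c - events_c      # control: events, non-events
    if 0 in (a, b, c, d):
        # Continuity correction so the ratio and its SE stay defined
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or, se

# Hypothetical trial: 60/100 responders on drug vs 40/100 on placebo
log_or, se = log_odds_ratio(60, 100, 40, 100)
low, high = math.exp(log_or - 1.96 * se), math.exp(log_or + 1.96 * se)
print(f"OR = {math.exp(log_or):.2f} (95% CI {low:.2f} to {high:.2f})")
```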

Batch entry: Organize your data in Excel or Google Sheets, then copy and paste directly into MetaReview. The tool auto-detects tabular data.

Step 3: Assign Subgroups

Use the Subgroup column to label studies by drug class (SSRI, SNRI, atypical), severity level, or scale type. This enables stratified forest plots with Q-between tests.
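
The Q-between test compares pooled estimates across subgroups. The sketch below is a simplified version that pools each subgroup with fixed-effect (inverse-variance) weights and then measures how far the subgroup estimates sit from their weighted mean; the resulting Q is compared against a chi-square distribution with (number of subgroups − 1) degrees of freedom. The function name, labels, and data are hypothetical, and a random-effects variant would add τ² to each study variance first.

```python
from collections import defaultdict

def q_between(effects, variances, labels):
    """Return (Q_between, df, {label: (pooled estimate, variance)})."""
    groups = defaultdict(list)
    for y, v, label in zip(effects, variances, labels):
        groups[label].append((y, v))

    pooled = {}
    for label, rows in groups.items():
        weights = [1.0 / v for _, v in rows]
        estimate = sum(w * y for w, (y, _) in zip(weights, rows)) / sum(weights)
        pooled[label] = (estimate, 1.0 / sum(weights))

    # Treat each subgroup estimate as one "study" and compute Q across them
    w_g = {label: 1.0 / var for label, (_, var) in pooled.items()}
    overall = sum(w_g[l] * pooled[l][0] for l in pooled) / sum(w_g.values())
    q = sum(w_g[l] * (pooled[l][0] - overall) ** 2 for l in pooled)
    return q, len(pooled) - 1, pooled

# Hypothetical SMDs labelled by drug class
q, df, by_group = q_between(
    effects=[-0.30, -0.25, -0.45, -0.50],
    variances=[0.02, 0.02, 0.03, 0.02],
    labels=["SSRI", "SSRI", "SNRI", "SNRI"],
)
print(f"Q_between = {q:.2f} on {df} df")
```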

Step 4: Run the Analysis

Click "Run Meta-Analysis". Results appear within seconds:

Step 5: Advanced Diagnostics

Step 6: Export Report

Generate an HTML or DOCX report with all figures, tables, auto-generated Methods paragraph (PRISMA 2020 format), and narrative interpretation including SMD-to-clinical-meaning conversion.

10. Common Pitfalls in Psychiatric Meta-Analysis

Pitfall 1: Ignoring Publication Bias

Turner et al. (2008, NEJM) demonstrated that publication bias inflates antidepressant effect sizes by approximately 32%. Relying only on published literature produces an overly optimistic estimate of drug efficacy.

Solution: Search ClinicalTrials.gov and FDA medical reviews for unpublished data. Use funnel plots, Egger's test, and Trim-and-Fill. Always report both corrected and uncorrected pooled estimates.
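
Egger's test can be sketched as an ordinary regression of each study's standardized effect (effect / SE) on its precision (1 / SE); an intercept far from zero suggests funnel-plot asymmetry. The version below returns the intercept, its standard error, and the t statistic, which is compared against a t distribution with k − 2 degrees of freedom. The function name and data are hypothetical, and the test has little power with fewer than about 10 studies.

```python
import math

def egger_test(effects, standard_errors):
    """Egger's regression: standardized effect on precision (unweighted OLS)."""
    z = [y / s for y, s in zip(effects, standard_errors)]
    precision = [1.0 / s for s in standard_errors]
    n = len(z)
    x_bar = sum(precision) / n
    z_bar = sum(z) / n
    sxx = sum((x - x_bar) ** 2 for x in precision)
    sxy = sum((x - x_bar) * (y - z_bar) for x, y in zip(precision, z))
    slope = sxy / sxx
    intercept = z_bar - slope * x_bar
    residuals = [y - (intercept + slope * x) for x, y in zip(precision, z)]
    s2 = sum(r ** 2 for r in residuals) / (n - 2)        # residual variance
    se_intercept = math.sqrt(s2 * (1.0 / n + x_bar ** 2 / sxx))
    t_stat = intercept / se_intercept if se_intercept > 0 else float("nan")
    return {"intercept": intercept, "se": se_intercept, "t": t_stat, "df": n - 2}

# Hypothetical data in which smaller (less precise) trials show larger effects
result = egger_test(
    effects=[-0.22, -0.28, -0.41, -0.47, -0.63],
    standard_errors=[0.05, 0.10, 0.15, 0.20, 0.25],
)
print(result)
```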

Pitfall 2: The Kirsch Debate — Clinical vs Statistical Significance

Kirsch et al. (2008) reported a pooled SMD of −0.32 for new-generation antidepressants vs placebo, arguing that its magnitude fell below the NICE threshold of 0.50 for clinical significance. Others argued the 0.50 threshold was arbitrary and that even an SMD of 0.30 in magnitude translates to meaningful symptom relief for many patients.

Solution: Report SMD alongside clinically interpretable metrics: NNT (number needed to treat), HAMD-point equivalents (SMD × SD_placebo), and response/remission rate differences.
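
Two of these conversions are simple enough to sketch directly. The helper names and input values below are hypothetical; the reference SD used for the point conversion is an assumption that should be taken from the included trials (for example, a typical placebo-arm SD, as suggested above).

```python
def nnt_from_rates(rate_treatment, rate_control):
    """Number needed to treat = 1 / absolute difference in response rates.

    Returned unrounded; by convention it is reported rounded up to a whole patient.
    """
    difference = rate_treatment - rate_control
    if difference == 0:
        raise ValueError("NNT is undefined when the two rates are equal")
    return 1.0 / abs(difference)

def smd_to_scale_points(smd, sd_reference):
    """Re-express an SMD in raw scale points using a reference SD."""
    return smd * sd_reference

# Hypothetical inputs: 50% vs 40% response; SMD of -0.32 with an SD of 8 HAMD points
print(f"NNT is about {nnt_from_rates(0.50, 0.40):.1f}")
print(f"Difference is about {smd_to_scale_points(-0.32, 8.0):.2f} HAMD points")
```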

Pitfall 3: Mixing Scales Without Using SMD

Pooling raw mean differences from HAMD-17 (range 0–52) and MADRS (range 0–60) produces nonsensical results because a 3-point change means entirely different things on each scale.

Solution: Always use SMD (Hedges' g) when combining data from different rating scales. This is not optional — it is a fundamental methodological requirement.

Pitfall 4: LOCF vs MMRM Confusion

Many older trials report only LOCF results. LOCF carries forward the last observed score for patients who drop out, which can bias results in either direction. MMRM uses all available data without imputation and is now the regulatory standard.

Solution: Prefer MMRM data when available. Record the analysis method for each study. Run a sensitivity analysis excluding LOCF-only studies.

Pitfall 5: Not Accounting for Active Placebo

Standard pill placebos may be "unblinded" because patients can tell they are not experiencing side effects. Studies using active placebos (substances that mimic side effects without therapeutic action) show smaller drug-placebo differences.

Solution: Record whether the trial used an active or inert placebo. Consider subgroup analysis or at minimum discuss as a limitation.

Pitfall 6: Allegiance Effects

The researcher's theoretical allegiance (favoring psychotherapy vs pharmacotherapy, or one drug over another) can influence trial outcomes through subtle methodological choices. Industry-sponsored trials tend to favor the sponsor's drug.

Solution: Record sponsor type and declared conflicts of interest. Run subgroup analysis by sponsor (industry vs academic vs government). Use GRADE to downgrade certainty if most evidence comes from a single sponsor.

Start Your Psychiatry Meta-Analysis Now

Enter mean, SD, and N for each arm (or response/remission events). MetaReview computes SMD, generates forest plots, and detects publication bias. Free, no coding required.
