From rating scale harmonization to network meta-analysis. Everything you need to synthesize evidence from depression, anxiety, schizophrenia, and psychotherapy trials.
Psychiatric disorders affect hundreds of millions of people worldwide, yet individual trials frequently produce conflicting results. Effect sizes are typically small to moderate (SMD 0.2–0.5 for antidepressants), placebo response rates are high, and dozens of competing treatments exist without head-to-head comparisons. Meta-analysis addresses these challenges by pooling effect estimates across trials, quantifying heterogeneity, detecting publication bias, and ranking multiple treatments through network meta-analysis.
A precise PICO framework prevents scope creep and unmanageable heterogeneity. Psychiatric populations are highly variable, so specificity at this stage saves enormous effort later.
| Element | Description | Psychiatry Examples |
|---|---|---|
| P (Population) | Disorder, diagnostic criteria, severity | Adults with MDD (DSM-5), moderate-to-severe; Adolescents with GAD; Treatment-resistant schizophrenia (DSM-5); PTSD in veterans |
| I (Intervention) | Treatment being evaluated | SSRIs (escitalopram, sertraline); SNRIs (venlafaxine, duloxetine); CBT; Combination (SSRI + CBT); Aripiprazole augmentation |
| C (Comparator) | Control condition | Placebo (pill placebo); Active comparator (fluoxetine); Waitlist; TAU (treatment as usual) |
| O (Outcomes) | Primary and secondary endpoints | Mean change on HAMD-17; Response rate (≥50% reduction); Remission (HAMD ≤7); Dropout due to AE; CGI-I responder rate |
| Disorder | Common Scales | Typical Comparators | Key Challenges |
|---|---|---|---|
| MDD | HAMD-17, MADRS, PHQ-9, BDI-II | Placebo, active drug | High placebo response, scale heterogeneity |
| GAD | HAM-A, GAD-7 | Placebo, active drug | High comorbidity with MDD |
| Schizophrenia | PANSS, BPRS, CGI-S | Placebo, haloperidol, active drug | Dose equivalence, chronicity |
| Bipolar | YMRS, MADRS, CGI-BP | Placebo, lithium | Separate mania vs depression phases |
| PTSD | CAPS-5, PCL-5 | Placebo, waitlist | Trauma type variability |
| OCD | Y-BOCS | Placebo, active drug | Higher SSRI doses needed |
| ADHD | ADHD-RS, CGI-S | Placebo | Stimulant vs non-stimulant classes |
This is the most critical decision in psychiatric meta-analysis. Because studies use different rating scales for the same construct, Standardized Mean Difference (SMD) is the default choice — unlike most other medical fields where Mean Difference (MD) is common.
| Outcome | Effect Size | Null Value | Interpretation Example |
|---|---|---|---|
| Symptom reduction (mixed scales) | SMD (Hedges' g) | 0 | SMD = −0.30 means drug reduces symptoms by 0.3 SD more than placebo |
| Symptom reduction (same scale) | MD | 0 | MD = −3.0 on HAMD-17 means 3-point greater improvement with drug |
| Response rate (≥50% reduction) | OR or RR | 1.0 | OR = 1.5 means 50% higher odds of response with drug |
| Remission rate | OR or RR | 1.0 | RR = 1.4 means 40% more likely to remit with drug |
| Dropout due to adverse events | OR or RR | 1.0 | OR = 2.0 means twice the odds of discontinuing due to side effects |
| Time to relapse | HR | 1.0 | HR = 0.60 means drug reduces relapse hazard by 40% |
Hedges' g is Cohen's d multiplied by the small-sample correction factor J(df), approximately 1 − 3/(4df − 1), where df = n₁ + n₂ − 2.
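The Hedges' g calculation can be sketched in a few lines of Python. The arm-level numbers below are hypothetical, not from any real trial, and the variance uses one standard large-sample approximation:

```python
import math

def hedges_g(m1, sd1, n1, m2, sd2, n2):
    """Hedges' g and its variance from two-arm summary statistics.

    Arm 1 = treatment, arm 2 = control; negative g favors treatment
    when lower scores (or larger reductions) mean fewer symptoms.
    """
    df = n1 + n2 - 2
    sd_pooled = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df)
    d = (m1 - m2) / sd_pooled            # Cohen's d
    j = 1 - 3 / (4 * df - 1)             # small-sample correction J(df)
    g = j * d
    # One common large-sample approximation for the variance of g
    var_g = (n1 + n2) / (n1 * n2) + g**2 / (2 * (n1 + n2))
    return g, var_g

# Hypothetical change-from-baseline data: drug -10.2 (SD 7.5, n = 120)
# vs placebo -7.4 (SD 7.8, n = 118)
g, var_g = hedges_g(-10.2, 7.5, 120, -7.4, 7.8, 118)  # g is about -0.36
```

With roughly 120 patients per arm the correction factor J(df) is close to 1; it matters most in small pilot trials.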
Rating scale heterogeneity is the defining challenge of psychiatric meta-analysis. Unlike blood pressure or tumor size, psychiatric symptoms are measured through subjective scales with different items, scoring ranges, and rater perspectives.
| Scale | Type | Items | Score Range | Response Threshold | Remission Threshold |
|---|---|---|---|---|---|
| HAMD-17 | Clinician-rated | 17 | 0–52 | ≥50% reduction | ≤7 |
| HAMD-21 | Clinician-rated | 21 | 0–66 | ≥50% reduction | ≤7 (first 17 items) |
| MADRS | Clinician-rated | 10 | 0–60 | ≥50% reduction | ≤10 |
| PHQ-9 | Self-rated | 9 | 0–27 | ≥50% reduction | ≤4 |
| BDI-II | Self-rated | 21 | 0–63 | ≥50% reduction | ≤12 |
| Disorder | Scale | Type | Score Range |
|---|---|---|---|
| Schizophrenia | PANSS | Clinician-rated | 30–210 |
| Schizophrenia | BPRS | Clinician-rated | 18–126 |
| Anxiety (GAD) | GAD-7 | Self-rated | 0–21 |
| OCD | Y-BOCS | Clinician-rated | 0–40 |
No other medical field faces the placebo challenge quite like psychiatry. The placebo response in antidepressant trials typically ranges from 30% to 50%, dwarfing the drug-specific effect and creating unique analytical difficulties.
| Impact | Mechanism | Mitigation Strategy |
|---|---|---|
| Compressed effect sizes | High placebo response leaves little room for drug superiority | Report both absolute and relative effect sizes; use NNT for clinical interpretation |
| Heterogeneity inflation | Variable placebo response across trials increases I² | Meta-regression with placebo response rate as covariate |
| Publication year confound | Newer trials have higher placebo rates and smaller effects | Subgroup by decade or meta-regression on publication year |
| Selection bias | Failed trials less likely to be published | Include FDA review data and ClinicalTrials.gov results |
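The meta-regression strategy in the table above can be sketched as an inverse-variance weighted least-squares fit. The study data below are hypothetical, and a real analysis would use a random-effects meta-regression with a between-study variance (τ²) estimate:

```python
import numpy as np

# Hypothetical per-study inputs: Hedges' g, its variance, placebo response rate
g = np.array([-0.45, -0.38, -0.22, -0.15, -0.30])
var_g = np.array([0.020, 0.015, 0.025, 0.030, 0.018])
placebo_rate = np.array([0.28, 0.33, 0.41, 0.47, 0.36])

# Inverse-variance weighted least squares: g_i = b0 + b1 * placebo_rate_i
w = 1.0 / var_g
X = np.column_stack([np.ones_like(placebo_rate), placebo_rate])
beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * g))
b0, b1 = beta
# A positive b1 means higher placebo response is associated with a smaller
# (less negative) drug-placebo difference
```

A slope estimate like this quantifies how much of the between-trial variation in effect size is explained by differences in placebo response.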
Psychiatric trials have unique reporting patterns. Use this checklist to ensure complete, consistent extraction.
Psychiatric meta-analyses almost always show moderate to high heterogeneity (I² 40%–80%). Understanding and accounting for the sources is essential for credible results.
| Source | Examples | Impact on Effect Size |
|---|---|---|
| Diagnostic criteria | DSM-IV vs DSM-5 vs ICD-10/11 | Different patient populations enrolled |
| Baseline severity | Mild (HAMD 8–13) vs moderate (14–18) vs severe (19–22) vs very severe (≥23) | Larger drug-placebo difference in severe patients |
| Rating scale | HAMD-17 vs MADRS vs PHQ-9 | Different sensitivity to change; managed by SMD but adds noise |
| Drug dose | Sub-therapeutic vs therapeutic vs supra-therapeutic | Dose-response relationship varies by drug |
| Trial duration | 4 weeks vs 8 weeks vs 12 weeks | Longer trials: larger placebo response, potentially larger drug effect |
| Publication year | 1990s vs 2000s vs 2010s vs 2020s | Secular increase in placebo response; changes in trial methodology |
| Comorbidity | MDD alone vs MDD + anxiety vs MDD + substance use | Comorbid patients may respond differently |
| Sponsor type | Industry vs academic vs government | Industry trials may have allegiance effects |
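The I² statistic quantifying this heterogeneity is derived from Cochran's Q under inverse-variance weighting. A minimal sketch, using hypothetical SMD-scale effect sizes:

```python
import numpy as np

def cochran_q_i2(effects, variances):
    """Cochran's Q and I² under inverse-variance (fixed-effect) weighting."""
    e = np.asarray(effects, dtype=float)
    w = 1.0 / np.asarray(variances, dtype=float)
    pooled = np.sum(w * e) / np.sum(w)        # fixed-effect pooled estimate
    q = np.sum(w * (e - pooled) ** 2)         # Cochran's Q (chi-square, k - 1 df)
    df = len(e) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, i2

# Five hypothetical studies with widely scattered effects
q, i2 = cochran_q_i2([-0.60, -0.10, -0.40, -0.55, -0.05], [0.01] * 5)
```

With effects this scattered, I² lands in the high range (above 80%), signalling that a random-effects model and subgroup or meta-regression analyses are needed.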
Pre-specified subgroup analyses are essential for interpreting psychiatric meta-analyses. Common subgroup variables include drug class (SSRI vs SNRI vs atypical), baseline severity, rating scale type (clinician-rated vs self-rated), trial duration, and sponsor type.
NMA is arguably the most impactful analytical method in psychiatric research. Because head-to-head trials between active drugs are rare, NMA uses indirect comparisons through a common comparator (usually placebo) to rank multiple treatments simultaneously.
Cipriani et al. (Lancet, 2018) conducted the largest NMA of antidepressants ever performed: 522 double-blind RCTs, 116,477 patients, 21 antidepressants. All 21 drugs were more efficacious than placebo, but they differed meaningfully in both efficacy and acceptability (dropout rates).
| Requirement | Description | How to Check |
|---|---|---|
| Connected network | Every treatment must link to at least one other via direct or indirect evidence | Draw the network graph; check for disconnected nodes |
| Transitivity | Study characteristics must be similar across comparisons | Compare baseline severity, duration, year across different comparison sets |
| Consistency | Direct and indirect estimates must agree | Node-splitting test; design-by-treatment interaction test |
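The connectivity check in the table above amounts to a graph traversal over the trial network. A minimal sketch with hypothetical treatment names:

```python
from collections import defaultdict, deque

def is_connected(comparisons):
    """Check that every treatment in an NMA network is reachable from any other.

    comparisons: list of (treatment_a, treatment_b) pairs, one per trial comparison.
    """
    graph = defaultdict(set)
    for a, b in comparisons:
        graph[a].add(b)
        graph[b].add(a)
    nodes = list(graph)
    if not nodes:
        return True
    seen = {nodes[0]}
    queue = deque([nodes[0]])
    while queue:                      # breadth-first search from one node
        for nbr in graph[queue.popleft()]:
            if nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return len(seen) == len(nodes)

# Hypothetical star-shaped network: most drugs connect only through placebo
trials = [("placebo", "escitalopram"), ("placebo", "sertraline"),
          ("placebo", "venlafaxine"), ("escitalopram", "sertraline")]
```

If `is_connected` returns `False`, the disconnected treatments must be excluded or linked through additional trials before the NMA can be run.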
Open MetaReview and choose the appropriate measure:
For SMD analysis: enter Study name, Year, Mean, SD, and N for both Treatment and Control arms.
For OR analysis: enter Study name, Year, Events and Total for both arms.
Use the Subgroup column to label studies by drug class (SSRI, SNRI, atypical), severity level, or scale type. This enables stratified forest plots with Q-between tests.
Click "Run Meta-Analysis". Results appear within seconds.
Generate an HTML or DOCX report with all figures, tables, auto-generated Methods paragraph (PRISMA 2020 format), and narrative interpretation including SMD-to-clinical-meaning conversion.
Turner et al. (2008, NEJM) demonstrated that publication bias inflates antidepressant effect sizes by approximately 32%. Relying only on published literature produces an overly optimistic estimate of drug efficacy.
Solution: Search ClinicalTrials.gov and FDA medical reviews for unpublished data. Use funnel plots, Egger's test, and Trim-and-Fill. Always report both corrected and uncorrected pooled estimates.
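Egger's test is an ordinary regression of the standardized effect on precision; a non-zero intercept suggests funnel-plot asymmetry. A minimal sketch (the t statistic would be compared against a t distribution with k − 2 degrees of freedom):

```python
import numpy as np

def eggers_test(effects, ses):
    """Egger's regression asymmetry test: regress effect/SE on 1/SE.

    An intercept far from zero (large |t|) suggests small-study
    effects consistent with publication bias.
    """
    e = np.asarray(effects, dtype=float)
    se = np.asarray(ses, dtype=float)
    z, prec = e / se, 1.0 / se                  # standardized effect, precision
    X = np.column_stack([np.ones_like(prec), prec])
    beta = np.linalg.lstsq(X, z, rcond=None)[0]
    resid = z - X @ beta
    df = len(e) - 2
    cov = (resid @ resid / df) * np.linalg.inv(X.T @ X)
    return beta[0], beta[0] / np.sqrt(cov[0, 0])  # intercept, t statistic
```

With typically 10+ studies required for adequate power, Egger's test should be reported alongside the funnel plot rather than instead of it.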
Kirsch et al. (2008) reported a pooled SMD of −0.32 for SSRIs vs placebo, arguing this was below the NICE threshold of 0.50 for clinical significance. Others argued the 0.50 threshold was arbitrary and that even SMD = 0.30 translates to meaningful symptom relief for many patients.
Solution: Report SMD alongside clinically interpretable metrics: NNT (number needed to treat), HAMD-point equivalents (SMD × SD_placebo), and response/remission rate differences.
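These conversions can be sketched directly. The NNT formula below is the Kraemer & Kupfer (2006) AUC conversion, which assumes approximately normal outcomes; Furukawa's method, which uses the control-group event rate, gives somewhat different values:

```python
import math

def smd_to_nnt(smd):
    """Kraemer & Kupfer (2006) conversion: NNT = 1 / (2 * phi(d/sqrt(2)) - 1)."""
    d = abs(smd)
    auc = 0.5 * (1 + math.erf(d / 2))   # phi(d/sqrt(2)), via phi(x) = (1 + erf(x/sqrt(2)))/2
    return 1 / (2 * auc - 1)

def smd_to_scale_points(smd, sd_control):
    """Re-express an SMD in rating-scale points (e.g. HAMD-17)."""
    return smd * sd_control

# SMD -0.30 with a placebo-arm SD of 7 points is about a 2-point HAMD-17 difference
# SMD 0.32 (the Kirsch estimate) converts to an NNT of about 6
```

Both translations make the same pooled estimate far easier for clinicians to weigh against side effects and cost.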
Pooling raw mean differences from HAMD-17 (range 0–52) and MADRS (range 0–60) produces nonsensical results because a 3-point change means entirely different things on each scale.
Solution: Always use SMD (Hedges' g) when combining data from different rating scales. This is not optional — it is a fundamental methodological requirement.
Many older trials report only LOCF results. LOCF carries forward the last observed score for patients who drop out, which can bias results in either direction. MMRM uses all available data without imputation and is now the regulatory standard.
Solution: Prefer MMRM data when available. Record the analysis method for each study. Run a sensitivity analysis excluding LOCF-only studies.
Standard pill placebos may be "unblinded" because patients can tell they are not experiencing side effects. Studies using active placebos (substances that mimic side effects without therapeutic action) show smaller drug-placebo differences.
Solution: Record whether the trial used an active or inert placebo. Consider subgroup analysis or at minimum discuss as a limitation.
The researcher's theoretical allegiance (favoring psychotherapy vs pharmacotherapy, or one drug over another) can influence trial outcomes through subtle methodological choices. Industry-sponsored trials tend to favor the sponsor's drug.
Solution: Record sponsor type and declared conflicts of interest. Run subgroup analysis by sponsor (industry vs academic vs government). Use GRADE to downgrade certainty if most evidence comes from a single sponsor.
Enter mean, SD, and N for each arm (or response/remission events). MetaReview computes SMD, generates forest plots, and detects publication bias. Free, no coding required.