Vector Quality Sciences
Case Study

Data Quality Initiative at Memorial Sloan Kettering

Improving clinical trial data quality through EMR integration and automated reconciliation

Memorial Sloan Kettering Cancer Center
12 Investigator-Initiated Trials
2014-2016

Key Results

  • 18% Data Quality Improvement: Query rate reduction
  • 40% Faster Data Entry: EMR pre-population
  • 12 Trials Standardized: Unified data flow

The Challenge

Memorial Sloan Kettering (MSK) is a world-renowned cancer center conducting dozens of investigator-initiated trials. Unlike industry-sponsored trials, these trials had limited budgets and relied heavily on clinical staff to perform both patient care and research data entry.

The Data Quality Problem

  • Dual Data Entry: Clinical staff entered data into the EMR (Epic) for patient care, then re-entered the same data into the EDC (Medidata Rave) for research, creating discrepancies between the two systems
  • High Query Rates: Average query rate was 12.3% across trials, with most queries related to missing or inconsistent lab values, vital signs, and adverse events
  • Delayed Data Entry: Research coordinators entered data 2-4 weeks after patient visits due to workload, making real-time monitoring impossible
  • No Standardization: Each trial had its own data flow, making it impossible to scale quality improvements across the portfolio
  • Limited IT Resources: Small clinical research IT team couldn't support custom integrations for every trial

The Director of Clinical Research Operations needed a scalable solution to improve data quality without increasing coordinator workload or requiring massive IT investment.

The Solution

I was promoted from Clinical Research Specialist to System Analyst to design and implement a standardized data quality framework. The approach focused on reducing dual data entry and automating data quality checks.

Phase 1: EMR-to-EDC Mapping (Months 1-4)

  • Data Flow Analysis: Mapped data flow for 12 trials, identifying 85 common data points collected in both EMR and EDC (labs, vitals, AEs, concomitant meds)
  • Epic-to-Rave Integration: Worked with IT to build HL7 interface extracting data from Epic and pre-populating Rave forms
  • Data Transformation Rules: Created mapping rules to transform Epic data formats (LOINC codes, SNOMED terms) to Rave-compatible formats
  • Validation Logic: Implemented validation rules to flag discrepancies between EMR and EDC data for coordinator review
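
The mapping and validation steps above can be sketched in a few lines of Python. This is an illustration only, not the production rules: the EDC field names (`PLT`, `HGB`), the unit conversion factors, and the 1% tolerance are all hypothetical. The LOINC codes shown (777-3 for platelets, 718-7 for hemoglobin) are standard codes, but which codes the real mapping covered is not stated here.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical mapping table: LOINC code -> (EDC field name, unit conversion factor).
LOINC_TO_EDC = {
    "777-3": ("PLT", 1000.0),  # platelets: assume EMR reports 10^3/uL, EDC expects /uL
    "718-7": ("HGB", 1.0),     # hemoglobin: assume g/dL in both systems
}

@dataclass
class EdcValue:
    field: str
    value: float

def map_emr_result(loinc: str, value: float) -> Optional[EdcValue]:
    """Translate one EMR lab result into an EDC field, or None if no rule exists."""
    rule = LOINC_TO_EDC.get(loinc)
    if rule is None:
        return None
    field, factor = rule
    return EdcValue(field, value * factor)

def is_discrepant(emr: float, edc: float, rel_tol: float = 0.01) -> bool:
    """Flag an EDC value that differs from the EMR value by more than rel_tol."""
    if emr == 0:
        return edc != 0
    return abs(emr - edc) / abs(emr) > rel_tol
```

Keeping the rules in a lookup table rather than in code is one way to let a quarterly review add data points without touching the transformation logic.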

Phase 2: Automated Reconciliation (Months 5-8)

  • SQL-Based Reconciliation Scripts: Built SQL scripts to compare EMR data against EDC data, generating discrepancy reports
  • Weekly Reconciliation Reports: Automated weekly reports sent to coordinators highlighting missing or inconsistent data
  • Priority Scoring: Ranked discrepancies by impact (critical safety data vs. non-critical demographics) to focus coordinator attention
  • Closed-Loop Workflow: Coordinators reviewed reports, resolved discrepancies, and marked items as complete in tracking database
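
To show the shape of the reconciliation query and priority ranking described above, here is a minimal sketch using Python's built-in `sqlite3` module in place of SQL Server. The table names, column names, and priority weights are assumptions made for illustration; the real staging schema is not documented in this case study.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE emr (patient_id TEXT, field TEXT, value REAL);
CREATE TABLE edc (patient_id TEXT, field TEXT, value REAL);
-- Hypothetical priority weights: higher means more urgent (safety data first).
CREATE TABLE field_priority (field TEXT PRIMARY KEY, priority INTEGER);
INSERT INTO field_priority VALUES ('PLT', 3), ('SBP', 2), ('HEIGHT', 1);
INSERT INTO emr VALUES ('042', 'PLT', 45000), ('042', 'HEIGHT', 170), ('043', 'SBP', 120);
INSERT INTO edc VALUES ('042', 'PLT', 145000), ('042', 'HEIGHT', 170);
""")

# Every EMR value with no EDC counterpart ('missing') or a different
# EDC value ('mismatch'), most urgent fields first.
RECONCILE_SQL = """
SELECT e.patient_id, e.field, e.value AS emr_value, d.value AS edc_value,
       CASE WHEN d.value IS NULL THEN 'missing' ELSE 'mismatch' END AS issue,
       COALESCE(p.priority, 0) AS priority
FROM emr e
LEFT JOIN edc d ON d.patient_id = e.patient_id AND d.field = e.field
LEFT JOIN field_priority p ON p.field = e.field
WHERE d.value IS NULL OR d.value <> e.value
ORDER BY priority DESC, e.patient_id
"""

def discrepancy_report(connection):
    """Return discrepancy rows ordered by priority, highest first."""
    return connection.execute(RECONCILE_SQL).fetchall()

report = discrepancy_report(conn)
for row in report:
    print(row)
```

The `LEFT JOIN` is what surfaces missing EDC entries as well as mismatched ones; an inner join would silently drop them.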

Phase 3: Standardization & Training (Months 9-12)

  • Standard Data Flow Template: Created reusable template for future trials, reducing setup time from 3 months to 2 weeks
  • Coordinator Training: Trained 25 research coordinators on new workflows, emphasizing how integration reduced their workload
  • Documentation: Created SOPs, data flow diagrams, and troubleshooting guides for IT and coordinators
  • Continuous Improvement: Established quarterly review process to refine mapping rules and add new data points

The Results

Quantitative Outcomes

  • 18% Data Quality Improvement: Query rate decreased from 12.3% to 10.1% across 12 trials (an 18% relative reduction)
  • 40% Faster Data Entry: EMR pre-population reduced data entry time from 45 min to 27 min per patient visit
  • 85% Data Points Automated: 85 of 100 common data points now pre-populated from EMR
  • 2 Weeks Faster Data Availability: Data available in EDC within 48 hours of visit (down from 2-4 weeks)
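
The headline percentages are relative reductions, which can be re-derived from the before and after figures. The helper below is a hypothetical name used only for this check:

```python
def relative_reduction(before: float, after: float) -> float:
    """Fractional decrease from `before` to `after`."""
    return (before - after) / before

query_rate_drop = relative_reduction(12.3, 10.1)  # query rate, in percent
entry_time_drop = relative_reduction(45, 27)      # minutes per patient visit

print(f"Query rate: {query_rate_drop:.1%} lower")
print(f"Entry time: {entry_time_drop:.1%} lower")
```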

Qualitative Outcomes

  • Coordinator Satisfaction: Post-implementation survey showed 92% of coordinators felt the integration reduced their workload
  • Scalability: Template approach allowed 8 new trials to adopt the integration in Year 2 with minimal IT effort
  • Near-Real-Time Monitoring: PIs and study teams could now monitor trial progress within 48 hours of a visit instead of waiting for quarterly reports
  • Audit Readiness: Sponsor audits found "well-documented data flow with appropriate quality controls"

Real-World Example: Catching Data Discrepancies Early

In Month 10, the weekly reconciliation report flagged a discrepancy for Patient 042: the EMR showed a platelet count of 45,000/µL (Grade 3 thrombocytopenia), but the EDC showed 145,000/µL (normal).

The coordinator investigated and discovered a transcription error during manual data entry. The correct value was 45,000, which should have triggered a dose modification per protocol.

Impact: The error was caught within 48 hours of the lab result, allowing the PI to adjust treatment before the next dose. Without automated reconciliation, it would not have been discovered until the next quarterly monitoring visit (8 weeks later), potentially compromising patient safety.
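
One reason this discrepancy ranks as critical is that the two values fall on opposite sides of a clinical threshold. A minimal sketch of that check follows, assuming the CTCAE Grade 3 boundary for decreased platelet count (below 50,000/µL). The actual dose-modification rule is defined by the trial protocol, which is not quoted here, so the constant and function name are illustrative.

```python
# Assumed threshold: CTCAE Grade 3 thrombocytopenia begins below 50,000/uL.
GRADE3_PLATELET_THRESHOLD = 50_000

def crosses_threshold(emr_value: float, edc_value: float,
                      threshold: float = GRADE3_PLATELET_THRESHOLD) -> bool:
    """True when exactly one of the two values falls below the threshold."""
    return (emr_value < threshold) != (edc_value < threshold)

# Patient 042: EMR says 45,000, EDC says 145,000.
print(crosses_threshold(45_000, 145_000))
```

A mismatch such as 45,000 vs. 40,000 would still appear on the report, but it would not change the clinical picture the way this one does.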

Technical Implementation Details

For those interested in the technical approach:

  • Integration Architecture: HL7 interface extracted data from Epic (ADT, ORU, DFT messages), transformed via Mirth Connect, loaded into Rave via API
  • Database Design: Created staging database (SQL Server) to store Epic data, apply transformation rules, and track reconciliation status
  • Reconciliation Logic: SQL stored procedures compared Epic vs. Rave data, calculated discrepancy scores, generated coordinator reports
  • Reporting: SSRS reports emailed to coordinators weekly, with drill-down capability to patient-level discrepancies
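
For readers unfamiliar with HL7 v2, the sketch below shows roughly what extracting a lab result from an ORU message involves. It is written in plain Python for illustration; the production transformation ran in Mirth Connect, and the sample message is hypothetical and far shorter than a real Epic feed. It handles only the pipe and caret delimiters and ignores escape sequences and repeating fields.

```python
from typing import Dict, List

# Hypothetical ORU^R01 message; segments are separated by carriage returns.
SAMPLE_ORU = "\r".join([
    r"MSH|^~\&|EPIC|MSK|MIRTH|RESEARCH|201510011200||ORU^R01|MSG0001|P|2.3",
    "PID|1||042",
    "OBX|1|NM|777-3^Platelets^LN||45|10*3/uL|150-400|L|||F",
])

def parse_oru(message: str) -> List[Dict[str, object]]:
    """Return one dict per OBX segment, tagged with the patient ID from PID-3."""
    patient_id = ""
    results = []
    for segment in message.split("\r"):
        fields = segment.split("|")
        if fields[0] == "PID":
            patient_id = fields[3].split("^")[0]
        elif fields[0] == "OBX":
            components = fields[3].split("^")  # OBX-3: code^text^coding system
            raw = fields[5]                    # OBX-5: observation value
            results.append({
                "patient_id": patient_id,
                "loinc": components[0],
                "name": components[1] if len(components) > 1 else "",
                # OBX-2 of "NM" marks a numeric value
                "value": float(raw) if fields[2] == "NM" else raw,
                "units": fields[6],            # OBX-6
            })
    return results

results = parse_oru(SAMPLE_ORU)
print(results)
```

Rows in this shape are what a staging database would hold before transformation rules and reconciliation are applied.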

Lessons Learned

  1. Start with Common Data Points: Focusing on the 85 common data points (vs. trying to automate everything) delivered 80% of the value with 20% of the effort
  2. Coordinator Buy-In is Critical: Coordinators were initially skeptical that integration would work. Pilot success on 2 trials convinced them to adopt.
  3. Template Approach Scales: Creating a reusable template allowed the solution to scale from 12 trials to 20+ trials without proportional IT effort
  4. Automated Reconciliation > Manual Checks: Weekly automated reports were far more effective than quarterly manual reconciliation, catching issues before they became problems

Need Help with EMR-to-EDC Integration?

I've designed and implemented data integration solutions at academic medical centers and pharmaceutical sponsors. Let's discuss how to improve your data quality through automation.