Statlog - Privacy preserving methods: sharing data without sharing data

Methods development


  • There are several multicentre distributed-healthcare-data networks in the U.S. that enable large-scale epidemiologic studies.
  • However, privacy constraints generally prevent individual-level data from being pooled centrally, as is done in conventional studies.
  • These constraints pose major methodological and statistical challenges because only summary-level information is available from individual centres.

Our approach

  • We used electronic health data from 34 healthcare institutions in the National Patient-Centered Clinical Research Network to develop and implement distributed linear regression methods based only on summary-level data.
  • We fitted 12 multivariable-adjusted linear regression models to assess the association between antibiotic use and weight in infants.
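
The idea of fitting a linear regression from summary-level data alone can be sketched as follows. This is a minimal illustration, assuming ordinary least squares and assuming each centre shares only its cross-product aggregates; it is not necessarily the exact procedure used in the study. The function names `site_summaries` and `combine`, and the simulated data, are hypothetical and exist only for this example.

```python
import numpy as np

def site_summaries(X, y):
    """Run locally at one centre; only these aggregates leave the site."""
    X1 = np.column_stack([np.ones(len(X)), X])  # add an intercept column
    return X1.T @ X1, X1.T @ y, float(y @ y), len(y)

def combine(summaries):
    """Run at the coordinating centre; sees summaries, never patient rows."""
    XtX = sum(s[0] for s in summaries)
    Xty = sum(s[1] for s in summaries)
    yty = sum(s[2] for s in summaries)
    n = sum(s[3] for s in summaries)
    beta = np.linalg.solve(XtX, Xty)            # OLS coefficients
    p = XtX.shape[0]
    rss = yty - beta @ Xty                      # residual sum of squares at the OLS solution
    sigma2 = rss / (n - p)                      # residual variance estimate
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(XtX)))
    return beta, se

# Simulated data for three hypothetical centres (illustration only).
rng = np.random.default_rng(0)
sites = []
for n_i in (40, 60, 50):
    X = rng.normal(size=(n_i, 2))
    y = 1.0 + X @ np.array([0.5, -0.3]) + rng.normal(scale=0.1, size=n_i)
    sites.append((X, y))

beta_dist, se_dist = combine([site_summaries(X, y) for X, y in sites])

# Benchmark: the same model fitted on pooled individual-level data.
X_all = np.vstack([X for X, _ in sites])
y_all = np.concatenate([y for _, y in sites])
X1_all = np.column_stack([np.ones(len(X_all)), X_all])
beta_pooled, *_ = np.linalg.lstsq(X1_all, y_all, rcond=None)

print(np.allclose(beta_dist, beta_pooled))
```

In this sketch each centre discloses a small matrix, a vector, a scalar and a count, none of which contains a patient-level row. Whether such aggregates are acceptable to share in practice depends on each institution's governance rules.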

Our results

We benchmarked the distributed regressions against regressions on individual-level data and showed that both approaches yielded identical results.
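
One standard way to see why such agreement is possible, assuming ordinary least squares and K centres where centre k holds design matrix X_k and outcome vector y_k (notation introduced here for illustration, not taken from the study):

\[
X^\top X = \sum_{k=1}^{K} X_k^\top X_k,
\qquad
X^\top y = \sum_{k=1}^{K} X_k^\top y_k,
\qquad
\hat{\beta} = \left(X^\top X\right)^{-1} X^\top y
= \Bigl(\sum_{k=1}^{K} X_k^\top X_k\Bigr)^{-1} \sum_{k=1}^{K} X_k^\top y_k .
\]

Because the pooled cross-products are exact sums of the per-centre cross-products, the estimate computed from summaries is algebraically the same as the pooled estimate, not an approximation to it.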

In a privacy-preserving setting, the same regression results can be obtained from summary-level data as from individual-level data.

This opens the door to collaborative analysis across institutions in sectors such as healthcare and banking without exchanging individual-level records.