By Ben Holmes, senior clinical data analyst, Syapse.
When it comes to getting a clear picture from real-world data, breadth of view and careful analysis matter equally.
Interpreting data is always a challenge: the problem space has high dimensionality and deeply interrelated variables, and data completeness can be defined in countless ways. Separating actionable insights from mountains of data requires rigorous statistical validation, thoughtful modeling, and a variety of analytic approaches. Biostatisticians take these steps to avoid biasing results and to make sure that samples are truly representative and that relationships between variables are accounted for.
But even with all care and due diligence taken, results can still be skewed if the data sources included offer only a limited view because of their inherent biases. For example, mortality is an important data element in oncology research that helps oncologists communicate chances of remission to their patients. Yet in the real-world setting, there isn’t a single complete source for mortality data that can be used to better understand remission and survival rates.
This is partly because many traditional mortality data sources cover only certain groups of patients. For example, death data from hospital registries is available only for patients who are included in a registry. Additionally, registries tend to rely on electronic health record (EHR) and obituary data to capture deceased status, and these sources do not account for all patients equally; for example, women and minorities are less likely to have obituaries. With that in mind, datasets that rely heavily on obituary data alone are going to under-represent deaths among women and minorities and, as a result, distort the overall survival curves for those groups. This is consistent with recently published studies of digitized obituaries, which showed that women received significantly fewer obituaries than men.
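To make the effect of incomplete death capture concrete, here is a minimal sketch in Python. Every number in it is hypothetical and chosen only for illustration; the cohort size, the death count, the capture rates, and the `apparent_survival` helper are assumptions, not figures or methods from any real dataset. It shows how two groups with identical true survival can look different when deaths in one group are recorded less often.

```python
def apparent_survival(n_patients, true_deaths, capture_rate):
    """Share of patients who appear alive when only a fraction
    (capture_rate) of the true deaths is recorded in the data source."""
    recorded_deaths = true_deaths * capture_rate
    return (n_patients - recorded_deaths) / n_patients


# Hypothetical cohort: both groups have the same true outcome.
N_PATIENTS = 1000
TRUE_DEATHS = 400

# Hypothetical capture rates: deaths in group B are recorded less often,
# for example because fewer obituaries are published for that group.
capture_rates = {"group A": 0.9, "group B": 0.6}

true_rate = apparent_survival(N_PATIENTS, TRUE_DEATHS, 1.0)
print(f"true survival for both groups: {true_rate:.2f}")

for group, rate in capture_rates.items():
    observed = apparent_survival(N_PATIENTS, TRUE_DEATHS, rate)
    print(f"{group}: apparent survival {observed:.2f} "
          f"(overstated by {observed - true_rate:.2f})")
```

Under these assumed numbers, true survival is 0.60 for both groups, yet group A appears to have survival of 0.64 and group B 0.76. The gap between the groups comes entirely from how deaths were captured, not from any difference in outcomes.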