The Detail is in the Data

This article explores how data and technology are being applied within clinical trials.

Clinical research is one of the most consequential fields in medicine. Before any drug reaches a patient, it must pass through a rigorous human testing process known as the clinical trial phase. This is where we determine not only whether a drug works, but also what it does to the human body, who it helps most, and who it may harm. Understanding those answers requires collecting, managing, and analyzing vast amounts of data, and that is precisely where the intersection of big data and clinical research becomes so important.

This article explores how data and technology are being applied within clinical trials today, with a particular focus on adverse event surveillance, pharmacovigilance, and post-marketing drug safety monitoring. It also examines where the field is heading and why researchers who can bridge biology and computational science are increasingly essential.

The Clinical Trial: A Data-Generating Machine

To appreciate the role of big data in clinical research, it helps to understand what a clinical trial actually involves from the inside. I spent years coordinating clinical trials at a university hospital. From that experience, I can tell you that a clinical trial is not simply a "human lab rat" test. It is a highly structured, meticulous, data-intensive process that generates enormous volumes of information at every patient visit.

When a patient participates in a clinical trial, the clinical research team, including the study coordinators and the principal investigator, collects vital information on adverse events (AEs), which are any side effects the patient experiences during the trial. For each adverse event, the team records the date and time of the reaction, the medical term with its applicable diagnosis code, the severity of the event, whether the event resolved, and whether the event is considered related to the investigational drug, among many other data points. These are examples of structured data fields that relate to AEs. In addition to structured data fields, clinicians also document unstructured information in the form of clinical notes. These notes may contain nuanced observations that do not fit neatly into a data table but are nonetheless critical for understanding the patient experience.
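To make the structured fields above concrete, here is a minimal sketch of how a single adverse event record might be represented. The class name, field names, severity scale, and the placeholder code value are all illustrative assumptions, not the schema of any particular EDC system.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional


class Severity(Enum):
    # Assumed three-level scale; real protocols define their own grading.
    MILD = 1
    MODERATE = 2
    SEVERE = 3


@dataclass
class AdverseEvent:
    """One adverse event record (hypothetical field names)."""
    patient_id: str
    onset: datetime                 # date and time of the reaction
    term: str                       # medical term for the event
    code: str                       # diagnosis code (placeholder value below)
    severity: Severity
    resolved: bool                  # whether the event resolved
    related_to_study_drug: bool     # investigator's relatedness assessment
    clinical_note: Optional[str] = None  # unstructured free-text observation


# Invented example record, for illustration only.
event = AdverseEvent(
    patient_id="P-001",
    onset=datetime(2024, 3, 14, 9, 30),
    term="Headache",
    code="EXAMPLE-0001",
    severity=Severity.MILD,
    resolved=True,
    related_to_study_drug=False,
    clinical_note="Dull headache after morning dose; gone by evening.",
)
print(event.term, event.severity.name)
```

The structured fields are typed so they can be aggregated later, while the free-text note travels with the record as an optional string.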

Alongside adverse events, a comprehensive medication list is collected for each patient, beginning at the screening visit and updated continuously throughout the trial if the patient is enrolled. This meticulous approach is intentional. The comprehensive medication list exists precisely because understanding drug-to-drug interactions is one of the most important safety objectives of a clinical trial.

The sources of this data are varied. Electronic Medical Records (EMRs) such as Epic capture clinical notes and patient history. Systems such as REDCap serve as the site-level source for recording data from patient visits. The study sponsor typically maintains an Electronic Data Capture (EDC) system where trial data is entered, reviewed, and monitored across all participating sites. Together, these systems generate data streams that, when analyzed at scale, reveal critical patterns for pharmacovigilance. 

What Big Data Offers Clinical Research

The application of big data methodologies to clinical research has expanded substantially in recent years. As Luo and colleagues noted in a 2016 review published in Biomedical Informatics Insights, the role of big data techniques in bioinformatics applications is to provide data repositories, computing infrastructure, and efficient data manipulation tools for investigators to gather and analyze biological information (Luo et al., 2016). That framework applies directly to clinical trials.

When adverse event data is aggregated across thousands of patients and multiple trial sites, patterns emerge that would otherwise go undetected. Some of the most meaningful insights include the following.

Identifying Drug-Related Side Effects in Specific Populations

Big data analysis allows researchers to identify which populations are disproportionately affected by certain adverse events. For example, in trials involving pulmonary arterial hypertension related to connective tissue disease, such as the PAH-scleroderma population, data analysis can reveal elevated rates of specific events like gastrointestinal bleeding or telangiectasia. These population-level insights are critical for refining prescribing guidelines and informing patient selection criteria in future trials.
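As a rough sketch of the kind of subgroup comparison described above, the snippet below computes the proportion of patients in each subgroup who reported a given event at least once. The subgroup labels, patient IDs, event name, and counts are invented toy values, and the function name is hypothetical; a real analysis would also need statistical testing and adjustment for exposure time.

```python
from collections import defaultdict

# Toy enrollment table (invented): patient_id -> subgroup label
enrolled = {
    "P1": "subgroup_A", "P2": "subgroup_A",
    "P3": "subgroup_B", "P4": "subgroup_B",
    "P5": "subgroup_B", "P6": "subgroup_B",
}

# Toy adverse event reports (invented): (patient_id, event term)
reports = [
    ("P1", "event_X"), ("P1", "event_X"),  # repeat reports count once per patient
    ("P2", "event_X"),
    ("P3", "event_X"),
    ("P4", "event_Y"),
]


def event_rate_by_subgroup(enrolled, reports, term):
    """Proportion of patients per subgroup with at least one report of `term`."""
    affected = defaultdict(set)
    for patient_id, reported_term in reports:
        if reported_term == term:
            affected[enrolled[patient_id]].add(patient_id)

    sizes = defaultdict(int)
    for subgroup in enrolled.values():
        sizes[subgroup] += 1

    return {group: len(affected[group]) / n for group, n in sizes.items()}


rates = event_rate_by_subgroup(enrolled, reports, "event_X")
print(rates)
```

Counting patients rather than reports keeps one patient with many repeat events from inflating a subgroup's rate.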

Uncovering Drug-to-Drug Interactions

One of the reasons clinical trials collect comprehensive medication lists is to detect potential interactions between the investigational drug and other medications a patient may be taking. Laboratory-phase testing is valuable, but it is only one component of the full evaluation required to understand drug-to-drug interactions. Big data analysis of combined medication and adverse event records, along with diagnostic and laboratory tests, can help uncover these interactions.
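One very simple way to screen combined medication and adverse event records is to compare how often an event occurs among patients taking a given concomitant medication versus those who are not. The sketch below assumes every patient is on the investigational drug; the medication names, patient IDs, and function name are hypothetical, and a gap between the two rates is only a prompt for further review, not evidence of an interaction.

```python
# Toy concomitant medication lists (invented): patient_id -> set of medications
med_lists = {
    "P1": {"med_A"},
    "P2": {"med_A", "med_B"},
    "P3": {"med_B"},
    "P4": set(),
}

# Toy set of patients who reported the adverse event of interest (invented)
patients_with_event = {"P1", "P2"}


def event_rate_by_exposure(med_lists, patients_with_event, medication):
    """Return (rate among patients taking `medication`, rate among the rest)."""
    exposed = [p for p, meds in med_lists.items() if medication in meds]
    unexposed = [p for p, meds in med_lists.items() if medication not in meds]

    def rate(group):
        # None signals an empty group rather than a zero rate.
        if not group:
            return None
        return sum(p in patients_with_event for p in group) / len(group)

    return rate(exposed), rate(unexposed)


on_med_a, off_med_a = event_rate_by_exposure(med_lists, patients_with_event, "med_A")
print(on_med_a, off_med_a)
```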

Connecting Adverse Events to Laboratory Results

Clinical trials also generate laboratory data with every patient visit, and correlating lab results with adverse events has become an important area of analysis. For instance, in trials involving drugs like Winrevair (sotatercept), researchers have identified associations between elevated hemoglobin or decreased platelet counts and specific adverse events reported by patients. These correlations help clinicians anticipate complications and adjust monitoring protocols accordingly (Merck & Co., Inc., 2024).
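A first pass at relating lab results to adverse events can be as simple as comparing the average lab value at visits where an event was reported with the average at visits where none was. The values below are invented toy numbers, not trial data, and the function name is hypothetical.

```python
from statistics import mean

# Toy per-visit records (invented): (lab value, adverse event reported at this visit?)
visits = [
    (250, False),
    (240, False),
    (130, True),
    (150, True),
    (260, False),
]


def mean_lab_by_event(visits):
    """Return (mean lab value at visits with an event, mean at visits without)."""
    with_event = [value for value, had_event in visits if had_event]
    without_event = [value for value, had_event in visits if not had_event]
    return mean(with_event), mean(without_event)


mean_with, mean_without = mean_lab_by_event(visits)
print(mean_with, mean_without)
```

A difference in means like this only flags a possible association; a real analysis would account for repeated measures per patient and baseline values.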

Quantifying the Frequency and Distribution of Adverse Events

At the most fundamental level, big data enables researchers to understand how often adverse events occur, how severe they tend to be, and how their frequency changes over time during a trial. This information forms the foundation of the drug label that is eventually approved by the FDA and the basis for the side effect disclosures that patients hear in pharmaceutical advertisements.
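The frequency, severity, and timing summaries described above can be sketched with simple counting. The event list, week numbers, severity labels, and enrollment size below are invented for illustration.

```python
from collections import Counter

# Toy adverse event log (invented): (study week, event term, severity)
events = [
    (1, "headache", "mild"),
    (1, "nausea", "mild"),
    (2, "headache", "moderate"),
    (4, "headache", "mild"),
    (4, "epistaxis", "severe"),
]
n_patients = 10  # assumed number of enrolled patients

by_term = Counter(term for _, term, _ in events)          # how often each event occurs
by_severity = Counter(sev for _, _, sev in events)        # how severe events tend to be
by_week = Counter(week for week, _, _ in events)          # how frequency changes over time

# Reports per enrolled patient for each term
reports_per_patient = {term: count / n_patients for term, count in by_term.items()}

print(by_term.most_common())
print(dict(by_severity))
print(sorted(by_week.items()))
```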

These are some examples of the detail that big data offers clinical research. In the next article, we will look at a case study of a drug and how big data was essential to its post-marketing surveillance.

References

Luo, J., Wu, M., Gopukumar, D., & Zhao, Y. (2016). Big data application in biomedical research and health care: A literature review. Biomedical Informatics Insights, 8, 1-10. https://doi.org/10.4137/BII.S31559

Merck & Co., Inc. (2024, November 10). Winrevair side effects. https://www.winrevair.com/side-effects/