The Detail is in the Data: Case Study
In the previous article in the "The Detail is in the Data" series, we looked at some of the meaningful insights big data provides for clinical trials. In this article, we explore a case study of one drug and how big data proved essential to its post-marketing surveillance.
A Case Study: Niraparib and Pharmacovigilance
One of the most instructive examples of big data applied to adverse event surveillance involves niraparib, a PARP inhibitor used in the treatment of ovarian cancer. The clinical development of niraparib illustrates both the power and the limits of traditional trial-based data collection, and why post-marketing surveillance powered by big data has become so essential.
During the Phase II and Phase III clinical trials of niraparib, the most commonly identified adverse drug reactions included hematological toxicities such as decreased white blood cell counts, decreased neutrophil counts, decreased platelet counts, and anemia, as well as fatigue, insomnia, hypertension, constipation, nausea, vomiting, and diarrhea. These adverse events were considered manageable within the controlled clinical trial setting. A study of patients with recurrent ovarian cancer treated with niraparib found that approximately 14.7% of patients discontinued treatment due to treatment-emergent adverse events (Guo et al., 2022).
However, clinical trials, no matter how carefully designed, have limitations. They enroll a defined patient population over a fixed period of time. They cannot capture everything that happens when a drug enters broad clinical use across diverse populations, with varying co-morbidities and medication regimens. That is where post-marketing pharmacovigilance comes in.
The FDA Adverse Event Reporting System (FAERS)
The FDA Adverse Event Reporting System, known as FAERS, is a publicly accessible post-marketing database of spontaneous reports of adverse events, medication errors, and product quality complaints involving FDA-approved therapies. Reports are submitted voluntarily by healthcare professionals and consumers and are required from manufacturers. Guo et al. (2022) applied data mining algorithms to FAERS reports to search for potential adverse drug reaction signals for niraparib beyond those captured in clinical trials.
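A common way to mine spontaneous reports for signals is disproportionality analysis: asking whether a given event is reported more often with one drug than with all other drugs in the database. The sketch below computes the reporting odds ratio (ROR), one widely used disproportionality measure, from a 2x2 table of report counts. It is a minimal illustration, not a reproduction of the Guo et al. (2022) pipeline: the counts are hypothetical, and the signal rule (at least three cases and a 95% confidence interval lower bound above 1) is one common convention assumed here rather than taken from the study. The names `ContingencyTable`, `reporting_odds_ratio`, and `is_signal` are illustrative.

```python
import math
from dataclasses import dataclass


@dataclass
class ContingencyTable:
    """2x2 report counts for one drug-event pair (hypothetical example)."""
    a: int  # reports with the drug of interest AND the event of interest
    b: int  # reports with the drug of interest and any other event
    c: int  # reports with any other drug and the event of interest
    d: int  # reports with any other drug and any other event


def reporting_odds_ratio(t: ContingencyTable, z: float = 1.96) -> tuple[float, float, float]:
    """Return (ROR, lower, upper) using the usual log-normal approximation for the CI."""
    if min(t.a, t.b, t.c, t.d) == 0:
        raise ValueError("all cells must be non-zero for this simple estimator")
    ror = (t.a * t.d) / (t.b * t.c)
    se = math.sqrt(1 / t.a + 1 / t.b + 1 / t.c + 1 / t.d)  # standard error of ln(ROR)
    lower = math.exp(math.log(ror) - z * se)
    upper = math.exp(math.log(ror) + z * se)
    return ror, lower, upper


def is_signal(t: ContingencyTable, min_reports: int = 3) -> bool:
    """Assumed convention: at least `min_reports` cases and a CI lower bound above 1."""
    _, lower, _ = reporting_odds_ratio(t)
    return t.a >= min_reports and lower > 1.0


# Illustrative counts only; these are not FAERS figures.
example = ContingencyTable(a=40, b=960, c=2000, d=97000)
ror, lower, upper = reporting_odds_ratio(example)
print(f"ROR = {ror:.2f} (95% CI {lower:.2f} to {upper:.2f}), signal: {is_signal(example)}")
```

Because FAERS holds reports rather than a count of everyone who took a drug, measures like the ROR compare reporting proportions, not incidence rates. A high value flags a drug-event pair for closer clinical review; it does not by itself establish causation.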
The results were telling. The analysis confirmed known adverse events that were consistent with the existing niraparib drug label, including fatigue, nausea, vomiting, constipation, anemia, thrombocytopenia, headache, hypertension, and insomnia. Recovering these already-known reactions lends credibility to the analytical approach.
More significantly, the analysis also uncovered previously unlabeled adverse events, including peripheral neuropathy, photosensitivity reaction, gastroesophageal reflux disease, emotional distress, decreased blood potassium, increased heart rate, balance disorder, and memory impairment. These potential signals were not listed in the niraparib drug label at the time of the analysis. Surfacing them from real-world reports is exactly the kind of insight that big data adds to what clinical trials alone can provide.
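The labeled-versus-unlabeled distinction comes from checking each detected signal against the reactions already listed on the drug label. A minimal sketch of that step follows, assuming both lists have already been normalized to a shared vocabulary (FAERS reactions are coded as MedDRA Preferred Terms, while label text may need mapping first). Case-insensitive exact matching here is a simplification of that mapping, and the function name `split_by_label` is hypothetical. The term lists reuse reactions named above and are small illustrative subsets, not the study's full output.

```python
def split_by_label(signal_terms: list[str], label_terms: list[str]) -> tuple[list[str], list[str]]:
    """Partition detected signal terms into (already on the label, not on the label)."""
    # Case-insensitive exact matching stands in for proper MedDRA term mapping.
    on_label = {term.casefold() for term in label_terms}
    labeled = [t for t in signal_terms if t.casefold() in on_label]
    unlabeled = [t for t in signal_terms if t.casefold() not in on_label]
    return labeled, unlabeled


# Illustrative subsets of the reactions discussed in the text.
label = ["Fatigue", "Nausea", "Anemia", "Thrombocytopenia", "Hypertension", "Insomnia"]
signals = ["fatigue", "Thrombocytopenia", "Peripheral neuropathy",
           "Photosensitivity reaction", "Memory impairment"]

known, new = split_by_label(signals, label)
print("Consistent with label:", known)
print("Not on label:", new)
```

In a real analysis, each term in the "not on label" group would then be reviewed clinically before being treated as a genuine new adverse reaction.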
Hematological Risk Quantification
The study found strong disproportionality signals in the blood and lymphatic system disorders category. The authors also noted that, among niraparib recipients, Grade 3 or higher adverse events included anemia in 31.0% of patients, thrombocytopenia in 28.7%, decreased platelet count in 13.0%, and neutropenia in 12.8%. On the basis of these findings, the researchers recommended careful monitoring and caution when prescribing antiplatelet or anticoagulant agents alongside niraparib. That is a concrete, actionable clinical recommendation that emerged directly from big data analysis.
Time-to-Onset Analysis
Perhaps one of the most clinically practical findings from the niraparib pharmacovigilance study was the time-to-onset analysis. The researchers calculated time to onset for 4,323 reports by subtracting the drug start date from the adverse event date. The median onset was 18 days, with 62.92% of adverse events occurring within the first month of treatment and the majority occurring within the first three months of niraparib initiation. This kind of temporal analysis tells clinicians when monitoring should be most intensive, which can meaningfully improve patient safety in practice.
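The time-to-onset calculation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the report records and dates are hypothetical, reports with a missing date or with an event dated before the start date are simply dropped (a common cleaning step, assumed here rather than taken from the study), and "within the first month" is approximated as 30 days or fewer. The helper names `onset_days` and `summarize_onset` are illustrative.

```python
from datetime import date
from statistics import median
from typing import Iterable, Optional

# Each hypothetical report is (drug start date, adverse event date); either may be missing.
Report = tuple[Optional[date], Optional[date]]


def onset_days(reports: Iterable[Report]) -> list[int]:
    """Days from drug start to event onset, skipping unusable reports."""
    days = []
    for start, event in reports:
        if start is None or event is None:
            continue  # onset cannot be computed without both dates
        delta = (event - start).days
        if delta < 0:
            continue  # event dated before drug start: treated as a data-entry error
        days.append(delta)
    return days


def summarize_onset(days: list[int], window_days: int = 30) -> dict[str, float]:
    """Median onset and the share of events occurring within `window_days` of starting."""
    if not days:
        raise ValueError("no usable reports")
    within = sum(1 for d in days if d <= window_days)
    return {
        "n": len(days),
        "median_days": median(days),
        "share_within_window": within / len(days),
    }


# Hypothetical reports, not FAERS data.
reports = [
    (date(2021, 1, 1), date(2021, 1, 10)),
    (date(2021, 1, 1), date(2021, 1, 19)),
    (date(2021, 1, 1), date(2021, 2, 15)),
    (date(2021, 3, 1), date(2021, 3, 6)),
    (date(2021, 3, 1), date(2021, 2, 20)),  # dropped: event precedes start
    (date(2021, 3, 1), None),               # dropped: missing event date
]
print(summarize_onset(onset_days(reports)))
```

Passing `window_days=90` to `summarize_onset` gives the corresponding three-month share. The median is used rather than the mean because onset times in spontaneous reports are typically right-skewed by a small number of late events.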
The Researcher at the Intersection
What this example illustrates is that the future of clinical research does not belong solely to biologists or solely to data scientists. It belongs to those who understand both. Over the course of my career, I have worked on both sides of this equation. I have sat across from patients and collected adverse event data with the care and precision that their safety demands. The ability to move between those two worlds is increasingly valuable. A researcher who understands the clinical context of adverse event data brings something different to a big data analysis than one who only sees the numbers. Knowing what it means for a patient to report fatigue during week three of a trial, understanding the clinical significance of a Grade 3 hematological event, and appreciating why a clinician's note might contain information that does not appear in any structured field: these forms of knowledge shape better research questions and more meaningful analyses.
At the same time, clinical researchers who lack computational skills are increasingly limited in what they can contribute to a field that is generating data at unprecedented scale. The integration of programming, data analysis, and computational modeling is no longer an optional enhancement to clinical research training. It is becoming a core competency.
Communicating the Science
There is one more dimension of this work that I believe deserves attention: communication. Big data can generate powerful insights, but those insights are only valuable if they are understood and acted upon. That means communicating findings clearly to clinicians, regulatory bodies, patients, and the broader public.
The niraparib pharmacovigilance findings, for example, are only as useful as the degree to which they reach and inform the oncologists and patients who need them. As data in clinical research becomes more complex and more abundant, the ability to translate that complexity into clear, accessible, and accurate communication becomes more important.
Conclusion
Clinical research has always been a discipline built on careful observation and rigorous documentation. What has changed is the scale at which observation is now possible and the sophistication of the tools available to find meaning in what is observed. From the adverse event records collected during patient visits to the post-marketing surveillance data that flows into FAERS, the clinical research enterprise generates a continuous stream of information about how drugs behave in real human bodies.
Big data methodologies are making it possible to extract insights from that stream that were previously out of reach: unlabeled adverse events, population-specific risk profiles, drug interaction signals, and temporal patterns that guide clinical monitoring. These are not abstract technological achievements. They are advances that directly shape the safety and efficacy of the medicines that patients rely on.
The researchers who will do this work most effectively are those who bring both biological understanding and computational capability to the table, and who can communicate what they find to the people who need to know it.
References
Guo, M., Shu, Y., Chen, G., et al. (2022). A real-world pharmacovigilance study of FDA adverse event reporting system (FAERS) events for niraparib. Scientific Reports, 12, 20601. https://doi.org/10.1038/s41598-022-23726-4
Luo, J., Wu, M., Gopukumar, D., & Zhao, Y. (2016). Big data application in biomedical research and health care: A literature review. Biomedical Informatics Insights, 8, 1-10. https://doi.org/10.4137/BII.S31559