Signal detection and risk management are central to how pharmaceutical companies conduct safety surveillance and benefit-risk monitoring of approved products. Marketing Authorization Holders (MAHs) use various signal detection methods and systematic empirical assessments (i.e. quantitative methods) to identify signals. However, accurate, timely and evidence-based signal detection remains a challenge, which results in missed or delayed identification of potential signals and late implementation of the risk mitigation plans needed to safeguard patients. Quantitative methods are based on statistical algorithms and disregard the clinical relevance of the source data when generating signals of disproportionate reporting (SDRs). This produces a high number of false positive signals, which require significant human effort to assess^{1}. This white paper discusses the need for artificial intelligence in quantitative methods so that signals are identified from a combination of clinical relevance and statistical thresholds.

### 1. INTRODUCTION

Routine safety surveillance of approved drugs is mandatory for any MAH to ensure that the benefit-risk profile of marketed products remains favourable for the patient population, i.e. that the benefits of the products outweigh their risks. One of the most important aspects of marketed-drug safety monitoring is the identification and analysis of new, medically important findings, called ‘signals’, that might influence the use of a medicinal product.

As per the World Health Organization (WHO), a safety signal is defined as reported information on a possible causal relationship between an adverse event and a drug, the relationship being unknown or incompletely documented previously. A more recent definition was given by the Council for International Organizations of Medical Sciences (CIOMS): ‘information that arises from one or multiple sources (including observations and experiments), which suggests a new potentially causal association, or a new aspect of a known association, between an intervention and an event or set of related events, either adverse or beneficial, that is judged to be of sufficient likelihood to justify verificatory action’^{2}.

Signals are identified from source data through qualitative or quantitative methods. Qualitative methods include the analysis of individual case safety reports (ICSRs), case series and published scientific information from clinical and pharmacoepidemiology studies on the usage of a drug and the occurrence of adverse drug reactions (ADRs). Quantitative methods produce SDRs, which represent a statistical association between the use of a drug and the occurrence of an event, using algorithmic principles and pre-defined thresholds. MAHs have adopted a wide variety of tools for signal identification.

Quantitative methods, or computational analysis, constitute an important tool for signal detection^{3}; the most commonly used quantitative analysis tool, Empirica, was developed by Oracle Health Sciences. The regulatory authorities European Medicines Agency (EMA), WHO and United States Food and Drug Administration (USFDA) perform quantitative analysis of spontaneous data using the EudraVigilance Data Analysis System (EVDAS), vigiRank and the FDA Adverse Event Reporting System (FAERS), respectively, each with different data mining algorithms.

MAHs use various advanced quantitative methods and tools for signal detection, based on common data models, reference datasets for evaluation, new analysis methods and systematic empirical assessments. Even so, accurate, timely and evidence-based signal detection remains a challenge, which results in missed or delayed identification of potential signals and late implementation of the risk mitigation plans needed to safeguard patients’ health^{4}. It is also important to avoid the high number of false positive signals, which require great human effort to assess. To overcome these challenges, Artificial Intelligence (AI) or machine learning would be required so that quantitative methods identify accurate signals based on the clinical relevance of the source data.

This white paper gives a brief overview of quantitative signal-detection methods and of the strengths and limitations of quantitative methods and spontaneous data, with a focus on the need for AI to improve conventional quantitative methods so that more accurate signals are identified based on clinical evidence. The objectives of this white paper are as follows:

- To develop robust quantitative methods that identify true signals based on clinical evidence from source data, using AI/machine learning.
- To identify signals early, so that risk minimisation measures can be implemented sooner for better patient safety.
- To reduce the high number of false positive signals and the significant human effort spent assessing them.
- To better monitor a drug’s benefit-risk profile and safeguard patients from potential drug usage risks, in line with the MAHs’ legal and ethical obligations for patient safety.

### 2. DATA MINING METHODS

Signal detection can be performed through qualitative and quantitative methods. Qualitative signal detection is based on the routine review of spontaneous data, i.e. ICSRs, and of scientific literature from clinical and epidemiology studies. However, this approach is best suited to a small number of reported cases and publications: routine qualitative review of ICSRs for signal detection is a painstaking and time-consuming process for many MAHs, given their large product portfolios and the high number of reported cases. As an alternative, quantitative methods can identify the risks associated with the use of drugs from large volumes of spontaneous data using algorithmic principles.

Quantitative signal detection can be performed through automated procedures that support the clinical evaluation of spontaneous reports, commonly called the ‘data mining approach’. Data mining methods examine a large dataset of adverse event report records using statistical or mathematical tools, often known as data mining algorithms. These methods are based on the concept of disproportionality. The basic idea of Disproportionality Analysis (DPA) rests on the assumption that combinations of a drug and a clinical/adverse event that are disproportionately highly represented in the database may indicate an important finding (signal), based upon a difference from the background frequency^{1}.

DPA methods are useful for first-pass screening of large collections of ICSRs quantitatively, in which the observed rate of a drug and adverse drug reaction (ADR) reported together is compared with an expected value based on their relative frequencies reported individually in the spontaneous reporting database^{5}. It is important to note that, although these approaches are known as ‘quantitative’ signal detection methodologies, no risk quantification can be assessed. Moreover, the presence of a statistically significant result does not necessarily imply an actual causal relationship between the ADR and the drug, nor does the absence of a statistically significant result necessarily disprove the possible relationship^{6}. Hence, this association is simply referred to as a signal of disproportionate reporting (SDR), which requires an application of clinical knowledge to identify the relevant risk with the drug of interest^{6}.

Several statistical algorithms have been developed for DPA, of which 2×2 contingency table-based disproportionality analysis is the most commonly used among contemporary statistical methods in pharmacovigilance^{3}. A variety of multivariate and sequential methods are increasingly being tested and applied for signal detection and preliminary evaluation. DPA methods can be classified into frequentist and Bayesian approaches. The frequentist methods are the Proportional Reporting Ratio (PRR), Relative Reporting Ratio (RRR) and Reporting Odds Ratio (ROR), which provide a statistical association for a reported ADR using the safety database background as the expected ratio. These frequentist methods suffer from the sampling variance issue, meaning that a signal cannot be generated when the number of reported ADRs is low (≤3), which may result in missing potential signals for drug-event pairs that have a strong temporal relationship.^{1,7}
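As a minimal sketch of how the three frequentist measures relate to one another, the function below computes point estimates from a single 2×2 table. The cell names `a`, `b`, `c`, `d` and the example counts are illustrative assumptions, not data from any real database.

```python
def disproportionality(a, b, c, d):
    """Point estimates of PRR, ROR and RRR from a 2x2 contingency table.

    a: reports with the drug of interest and the event of interest
    b: reports with the drug of interest and any other event
    c: reports with any other drug and the event of interest
    d: reports with any other drug and any other event
    """
    n = a + b + c + d
    # PRR: event proportion for the drug vs. event proportion for other drugs.
    prr = (a / (a + b)) / (c / (c + d))
    # ROR: odds of the event for the drug vs. odds for other drugs.
    ror = (a * d) / (b * c)
    # RRR: observed count vs. the count expected if drug and event were
    # reported independently across the whole database.
    expected = (a + b) * (a + c) / n
    rrr = a / expected
    return {"PRR": prr, "ROR": ror, "RRR": rrr}


# Hypothetical counts, chosen only to show that the measures differ.
scores = disproportionality(a=20, b=80, c=100, d=9800)
print(scores)
```

All three measures divide by counts that may be zero in sparse tables; a production implementation would need to handle those cells explicitly.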

Bayesian methods, such as the Multi-Item Gamma Poisson Shrinker (MGPS) and the Bayesian Confidence Propagation Neural Network (BCPNN), were proposed to address the sampling variance issue. They are based on Bayes’ law and estimate the probability (posterior probability) that the suspected event occurs given the use of a suspect drug. Both approaches handle sampling variance by shrinking the relative reporting ratio or information component (IC) towards a prior when less data concerning the drug-ADR pair is available. MGPS in particular is considered reliable and is in routine use by the FDA.^{1}

#### 2.1. Proportional Reporting Ratio (PRR) Method

The PRR is a measure of disproportionality of reporting used to detect SDRs in pharmacovigilance databases such as EudraVigilance. This method assumes that when an SDR (involving a particular adverse event ‘Y’) is identified for a medicinal product ‘X’, this adverse event is reported relatively more frequently in association with medicinal product ‘X’ than with other medicinal products. This relative increase in adverse event reporting for medicinal product ‘X’ is reflected in a 2×2 contingency table (Table 1) based on the total number of individual cases contained in a pharmacovigilance database, as follows^{8}:

Table 1: 2×2 contingency table

| | Event Y | All other events | Total |
|---|---|---|---|
| Medicinal product X | A | B | A+B |
| All other medicinal products | C | D | C+D |
| Total | A+C | B+D | A+B+C+D |

The PRR is computed as follows:

$$\mathrm{PRR} = \frac{A/(A+B)}{C/(C+D)}$$

For example, the proportion of individual cases of angioedema (event ‘Y’) amongst all the reports for ibuprofen (product ‘X’) is equal to 5% (e.g. 5 reports of angioedema amongst a total of 100 reports with ibuprofen; in that case A = 5, B = 95, A+B = 100). The proportion of reports of angioedema amongst all the reports involving all the other medicinal products of the database (but not ibuprofen) is also equal to 5% (e.g. 5,000 reports of angioedema amongst 100,000 reports with all other medicinal products; in this case, C = 5,000, D = 95,000, C+D = 100,000). In this example, the PRR is equal to 1 (i.e. 0.05/0.05).
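The worked example can be reproduced in a few lines. The sketch below also attaches an approximate 95% confidence interval using the usual log-scale standard error for a ratio of proportions; the interval method and the function name `prr_with_ci` are assumptions made for illustration, not something prescribed by the guideline cited above.

```python
import math


def prr_with_ci(a, b, c, d, z=1.96):
    """PRR and an approximate confidence interval from a 2x2 table.

    a, b: reports for product X with / without event Y
    c, d: reports for all other products with / without event Y
    """
    prr = (a / (a + b)) / (c / (c + d))
    # Standard error of log(PRR), as for a ratio of two proportions.
    se_log = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    lower = math.exp(math.log(prr) - z * se_log)
    upper = math.exp(math.log(prr) + z * se_log)
    return prr, lower, upper


# Ibuprofen / angioedema counts from the example in the text.
prr, lower, upper = prr_with_ci(5, 95, 5000, 95000)
print(f"PRR = {prr:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

With only five reports for the product, the interval is wide and straddles 1, which is the numerical face of the small-count instability discussed in this section.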

The PRR involves calculating the rate of reporting of one specific event among all events for a given drug; the comparator is the same reporting rate for all other drugs in the database (excluding the drug of interest, as in the example above). If the ratio A/(A+B) is greater than the ratio C/(C+D), then Event Y is ‘disproportionately reported’ for Product X, with the rest of the database serving as the ‘expected’ background. The PRR does not adjust for multiplicity, for small counts, or for the underlying fact that every report represents only a suspicion of an adverse event related to a product, so it may generate a high number of false positive signals, particularly when the number of reports is low. Therefore, case count thresholds (number of reports > 3) are also used in association with the PRR and chi-square statistics to reduce the number of false positives.
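A hedged sketch of such a combined screen is shown below. The default cut-offs (PRR of at least 2, chi-square of at least 4, at least 3 cases) follow a commonly cited rule of thumb and are assumptions here; they are parameters precisely because no universal threshold exists. The chi-square statistic is the plain Pearson form without continuity correction, which is also an assumption.

```python
def chi_square(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) for a 2x2 table."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def flags_sdr(a, b, c, d, min_cases=3, min_prr=2.0, min_chi2=4.0):
    """Return True when the drug-event pair passes all three screening thresholds."""
    if a < min_cases:
        return False  # too few reports to screen at all
    prr = (a / (a + b)) / (c / (c + d))
    return prr >= min_prr and chi_square(a, b, c, d) >= min_chi2


print(flags_sdr(5, 95, 5000, 95000))  # counts from the text's example -> False
print(flags_sdr(20, 80, 100, 9800))   # hypothetical counts -> True
```

A pair that passes this screen is only a candidate for medical review; the thresholds trade sensitivity against the number of false positives and say nothing about causality.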

There is currently no ‘gold standard’ that establishes universal thresholds for signals of disproportionate reporting^{8}. According to EMA guidelines for statistical signal detection methods, a signal is considered when the PRR is displayed with its 95% confidence interval:

The most commonly used statistical methods for disproportionate reporting have been included below in Table 2. Some of these statistics (e.g., RR and IC) have been integrated into Bayesian approaches MGPS and BCPNN which have been developed in part to account for the variability associated with small numbers of reports.^{7}

#### 2.2. Bayesian Methods

The frequentist methods (PRR and ROR) suffer from the sampling variance issue, i.e. they do not adjust for small observed or expected numbers of reports of the product-event pair of interest. To address this issue, the Bayesian methods MGPS and BCPNN were proposed. Both approaches handle sampling variance by shrinking the relative reporting ratio or information component (IC) towards a prior when less data concerning the drug-ADR pair is available; MGPS produces Empirical Bayes Geometric Mean (EBGM) scores^{1}.

The EBGM calculation is conceptually similar to the PRR but incorporates Bayesian ‘shrinkage’ and stratification to pull disproportionality scores toward the null, especially when there are limited data and relatively small numbers of cases. One important difference between the PRR and EBGM estimates is that, for the PRR, the adverse events from the product in question do not contribute to the number of ‘expected’ cases, whereas all adverse events from the product contribute to the expectation when using EBGM. The statistical modifications used in the EBGM methodology diminish the effect of spuriously high PRR values, thus reducing the number of false-positive safety signals. EBGM values provide a more stable estimate of the relative reporting rate of an event for a particular product relative to all other events and products in the database being analysed. The lower and upper 90% confidence limits for the EBGM values are denoted EB05 and EB95, respectively^{3}.
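To illustrate the stratification step, the sketch below computes an expected count by summing, over strata, the count expected under independence within each stratum. The two age strata and all counts are hypothetical, and the function name is illustrative.

```python
def stratified_expected(strata):
    """Expected drug-event count summed over strata.

    strata: iterable of (n_drug, n_event, n_total) tuples, where n_drug and
    n_event are the numbers of reports mentioning the drug and the event
    within a stratum, and n_total is the stratum size.
    """
    return sum(n_drug * n_event / n_total for n_drug, n_event, n_total in strata)


# Hypothetical strata: the event is far more common in the second stratum,
# while the drug is used comparatively more in the first.
strata = [
    (40, 50, 1000),   # e.g. an "under 18" stratum
    (60, 450, 3000),  # e.g. an "18 and over" stratum
]
expected_stratified = stratified_expected(strata)
expected_crude = (40 + 60) * (50 + 450) / (1000 + 3000)
print(expected_stratified, expected_crude)
```

Here the stratified expectation is lower than the crude one, so the same observed count would look more disproportionate after stratification; with other data the adjustment can go the opposite way.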

##### 2.2.1. Bayesian Confidence Propagation Neural Network (BCPNN) Method

The BCPNN method measures the association between a drug and an adverse event (AE) by the information component (IC), defined as the logarithm (conventionally to base 2) of the ratio of the observed rate of a specific drug-AE combination to the rate expected under the null hypothesis of no association between the drug and the event. Thus, when a drug-AE pair is reported more often than expected, relative to general reporting of the drug and the AE, the IC takes a positive value. The crude estimate of the IC, IC_{0}, computed without a Bayesian framework, is defined as:

$$IC_{0} = \log_{2}\frac{n_{ij}\,N}{n_{i}\,n_{j}}$$

where n_{ij} is the number of reports mentioning both event i and drug j, n_{i} and n_{j} are the numbers of reports mentioning event i and drug j respectively, and N is the total number of reports. Therefore, for any event (i) – drug (j) combination, the IC is defined as:

$$IC_{ij} = \log_{2}\frac{P(\text{drug is } j,\ \text{event is } i)}{P(\text{drug is } j)\,P(\text{event is } i)}$$

The Bayesian framework uses the following property, which holds under the null hypothesis of no association between drug j and adverse event i:

$$P(\text{drug is } j,\ \text{event is } i) = P(\text{drug is } j)\,P(\text{event is } i)$$

It has been assumed that P(drug is j, event is i), P(drug is j) and P(event is i) each follow a beta-binomial distribution, and the IC is estimated using the posterior mean and variance from a fully Bayesian model specification. Unlike in the frequentist approach, in the Bayesian framework population parameters such as the number of AE reports are assumed to have intrinsic probability distributions, reflecting the uncertainty in their values. A signal is generated if the lower limit of the 95% credible interval is greater than zero^{9}.
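The effect of shrinkage on the IC can be seen with a deliberately simplified stand-in for the full Bayesian model: the sketch below compares the crude base-2 IC with a version that adds a small constant to both the observed and the expected count. This additive form is a known shrinkage shortcut, not the beta-binomial specification described above; the constant `k = 0.5` and the counts are assumptions.

```python
import math


def crude_ic(n_ij, n_i, n_j, n_total):
    """Crude information component: log2 of observed over expected count."""
    expected = n_i * n_j / n_total
    return math.log2(n_ij / expected)


def shrunk_ic(n_ij, n_i, n_j, n_total, k=0.5):
    """IC with additive shrinkage toward 0 (simplified, not the full BCPNN)."""
    expected = n_i * n_j / n_total
    return math.log2((n_ij + k) / (expected + k))


# Hypothetical low-count pair: 3 joint reports where 1.2 would be expected.
crude = crude_ic(n_ij=3, n_i=120, n_j=100, n_total=10000)
shrunk = shrunk_ic(n_ij=3, n_i=120, n_j=100, n_total=10000)
print(f"crude IC = {crude:.2f}, shrunk IC = {shrunk:.2f}")
```

Both values are positive, but the shrunk estimate sits closer to zero, and the gap widens as counts get smaller. Note that the crude IC is undefined when the joint count is zero, whereas the shrunk version is not.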

##### 2.2.2. The Multi-Item Gamma Poisson Shrinker (MGPS) Method

The Multi-Item Gamma Poisson Shrinker (MGPS) developed by DuMouchel aims at finding the ‘interestingly large’ cell counts in a large frequency table of drugs and events for possible further evaluation. This method computes the logarithm of the ratio of the observed number of counts for the drug-AE pair over the expected number of events^{10}.

This method assumes that the observed count ‘a’ for each drug-event pair is drawn from a Poisson distribution with an unknown mean, μ_{ij}. The ratio μ_{ij}/E_{ij}, where E_{ij} is the expected count, is in turn assumed to be drawn from a prior distribution. The EBGM, an empirical Bayes estimate of the RR, is obtained from this model. It is used to rank all cell counts and to determine which cells have unusually large observed counts compared to the expected counts. The EBGM can also be seen as a ‘shrinkage’ estimate of the true relative risk for a particular drug-event pair: it will be close to the crude RR if the observed or expected number of events for drug j is large; otherwise, it will be shrunk towards the null value of 1. If the lower 95% credible limit of the posterior distribution is greater than 2, a signal is generated. This threshold ensures that a particular drug-AE pair is being reported at least twice as often as would be expected if there were no association between the drug and the AE (the adjusted ratio of observed to expected counts). It is only one possible choice for defining a signal: there is not, nor should there be, a single, fixed definition of a signal threshold when using MGPS; rather, it is important to consider the severity of the drug-event pair and the severity of the condition being treated^{9}.
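The shrinkage behaviour described here can be sketched with a much simpler model than MGPS itself. The example below assumes a single Gamma(alpha, beta) prior on the ratio of the Poisson mean to the expected count, with fixed hyperparameters; MGPS as published uses a mixture prior whose parameters are estimated from the data, so this is an illustration of the idea only. The `digamma` helper is included so that the block needs only the standard library.

```python
import math


def digamma(x):
    """Digamma function for x > 0 (recurrence plus asymptotic series)."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    series = inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))
    return result + math.log(x) - 0.5 * inv - series


def ebgm_single_gamma(observed, expected, alpha=1.0, beta=1.0):
    """Geometric-mean shrinkage estimate under one Gamma(alpha, beta) prior.

    With n ~ Poisson(lam * expected) and lam ~ Gamma(alpha, rate=beta), the
    posterior is Gamma(alpha + n, rate=beta + expected), and
    exp(E[log lam | n]) = exp(digamma(alpha + n) - log(beta + expected)).
    """
    return math.exp(digamma(alpha + observed) - math.log(beta + expected))


# Both hypothetical pairs have a crude observed/expected ratio of 10.
ebgm_small = ebgm_single_gamma(observed=3, expected=0.3)
ebgm_large = ebgm_single_gamma(observed=300, expected=30)
print(f"small counts: {ebgm_small:.2f}, large counts: {ebgm_large:.2f}")
```

The low-count pair is pulled strongly toward 1 while the high-count pair stays near its crude ratio of 10, which is the qualitative behaviour attributed to the EBGM in the text.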

However, it is important to know that a high relative reporting rate does not necessarily indicate a high incidence of the event or suggest a causal relationship between the drug and the AE. There is scientific consensus that signals of disproportionate reporting identified with statistical methods must be considered with medical judgement. SDRs that are considered to warrant further evaluation should always be systematically medically assessed^{8}.

### 3. LIMITATIONS OF QUANTITATIVE METHODS AND SPONTANEOUS DATA FOR SIGNAL IDENTIFICATION

The widely used frequentist methods PRR, RRR and ROR produce spurious safety signals, i.e. false positives, because of sampling variance at small sample sizes. The underlying model assumptions of these methods fail for low-count drug-adverse event pairs, which represent the majority of Drug Event Combinations (DECs) in spontaneous reporting system (SRS) databases. Under these conditions, frequentist signal detection methods can become unstable (i.e. increased detection of signals is accompanied by increased detection of false positives) and unreliable. The more advanced MGPS and BCPNN methods address sampling variance by shrinking the relative reporting ratio or information component towards a prior when less data concerning the drug-ADR pair is available, but they shrink the estimate more harshly as the expected value of a drug-event pair decreases. As a result, these methods are less sensitive and may fail to detect signals for drugs with low population exposure. Confounding induced by concomitant drugs also remains and can cause inaccurate detections through over-shrinkage. Bayesian approaches are also less intuitive and more computationally intensive than frequentist approaches^{11}.

Quantitative methods also fail to adjust statistical thresholds for the quality and confounding factors of spontaneous data. Confounding factors such as concomitant medication, disease comorbidity or underlying condition, and patient medical history play an important role in producing a high number of false positive signals. Moreover, potentially confounding information such as smoking behaviour or alcohol consumption is often missing from spontaneous data. Reporting quality and bias also matter: known sources of error include incorrect associations between drugs and events, over-reporting (multiple reports for the same incident), under-reporting (events that are never reported), and cases with missing, incorrect or vague information. All of these distort the background expected ratio and therefore produce false thresholds. It is important to ensure that confounding factors and the quality of spontaneous data do not produce spurious safety signals that warn of non-existent hazards.

SRS databases also lack exposure information, meaning that they hold no information on the population that used the drug without an associated ADR; event reporting rates derived from these databases can therefore only be considered relative. In addition, SRS data are more akin to a census than to an unbiased sample from an underlying true population of adverse event reports. Therefore, ‘estimates’ and corresponding ‘confidence intervals’ of SDRs stemming from any data-mining algorithm (DMA) should really be viewed as ‘pseudo-estimates’ and ‘pseudo-intervals’^{11}.

However, the frequentist and Bayesian approaches each have distinct advantages. The primary advantages of frequentist approaches are that they are simple to compute, easy to interpret and, when common implementations are compared, more sensitive than current Bayesian methods. Bayesian methods, in contrast, attempt to stabilize the resulting ratio metrics for low-count drug-event pairs via shrinkage. All these methods are very useful for analysing large volumes of spontaneous data for potential signals, for understanding drug usage patterns in the wider population, and for supporting early risk minimisation measures and better monitoring of a drug’s benefit-risk profile, thus providing better patient safety.

### 4. ARTIFICIAL INTELLIGENCE FOR QUANTITATIVE METHODS FOR TRUE SIGNALS

To overcome the limitations of quantitative methods, artificial intelligence is required to adjust the disproportionality scores (observed/expected ratios) for the strength of evidence available from spontaneous case reports. Examples of such evidence include index information from narratives, time to onset (latency), temporal or dose-response relationship, pharmacological or biological plausibility, and de-challenge and re-challenge information. The calculation of the observed/expected ratio should also account for confounding factors such as concomitant medication, disease comorbidity and the patient’s medical history, so that the SDRs generated correspond to true positive signals.

However, few published studies combine clinical evidence with the alleviation of confounding information to adjust disproportionality scores and generate true-positive SDRs. This white paper discusses frameworks for accurate signal detection proposed by different clinical researchers in published studies. The authors of these studies propose frameworks that complement current quantitative methods by adjusting disproportionality scores for clinical variables and confounding factors, by restricting the analysis to a drug’s therapeutic area (TA), and by combining spontaneous data with other observational healthcare data for real-world evidence. The results, discussed below, suggest that the proposed predictive models could identify more accurate signals than quantitative methods alone, and they point to a beneficial role for AI or machine learning and to scope for improving current quantitative methods so that true signals are identified and false positives avoided.

Caster et al^{12} (2014) developed vigiRank, a predictive model based on VigiBase safety data at the Uppsala Monitoring Centre, to detect emerging signals of suspected ADRs in large collections of individual case reports. The model accounts for a broad variety of aspects of the strength of evidence, including the clinical relevance, quality and content of individual reports as well as disproportionality analysis. The method provides empirical support that simultaneously accounting for multiple strength‐of‐evidence aspects offers higher real‐world signal detection performance than disproportionality analysis alone. The authors considered variables that measure the quality and clinical content of ICSRs, as well as more quantitative aspects of a reporting pattern such as trends in time and geographic spread. Each variable is defined at the level of a drug–ADR pair and is based on the reporting pattern of that pair in a specific collection of individual case reports. Informative reports, with full information on the type of report, type of notifier, time to onset, country of origin, patient age and sex, indication for treatment, dosage, outcome and free-text description, are considered as variables. The predictive method was applied to spontaneous reports in 2014, and its effectiveness was compared with historical data from 2009 to 2013^{13}. Overall, 194 drug‐ADR pairs highlighted by vigiRank were subjected to initial assessment during 2014, resulting in 6 signals (3.1%) following the in‐depth assessments. The observed performance for vigiRank is over 2.5‐fold better than that seen historically for disproportionality‐based signal detection (from 2009 to 2013), with 19 signals out of 1592 initial assessments (1.2%; P < .05 using Fisher’s exact test). The 6 vigiRank signals came out of 18 in‐depth assessments; these 18 correspond to 9.3% of the initial assessments. The corresponding proportion for disproportionality analysis was 215 of 1592 (14%; P = .17). In conclusion, combining multiple strength‐of‐evidence aspects as in vigiRank significantly outperforms disproportionality analysis alone in real‐world pharmacovigilance signal detection in VigiBase. The study results are presented in Figure 1.
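The headline figures above can be checked with a few lines of arithmetic. The sketch below recomputes the two signal yields and their ratio from the reported counts, and includes a small standard-library implementation of a two-sided Fisher's exact test for the same 2×2 table. The helper name and the choice of the "sum of probabilities no larger than the observed one" definition of two-sidedness are assumptions; the cited study may have used different software or conventions.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k events in the first row.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    return sum(p for p in (prob(k) for k in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))


# Counts reported in the text: signals / initial assessments.
vigirank_rate = 6 / 194
historical_rate = 19 / 1592
fold_improvement = vigirank_rate / historical_rate
p_value = fisher_exact_two_sided(6, 194 - 6, 19, 1592 - 19)

print(f"vigiRank yield:   {vigirank_rate:.1%}")
print(f"historical yield: {historical_rate:.1%}")
print(f"fold improvement: {fold_improvement:.2f}")
print(f"Fisher exact p:   {p_value:.3f}")
```

The p-value printed by this sketch can be compared directly with the reported P < .05; with so few signals in either arm, the comparison rests on small numbers.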

Xiao et al^{1} developed a framework that detects improved drug safety signals from multiple and heterogeneous data sources via Monte Carlo Expectation-Maximization (MCEM) and signal combination. The MCEM procedure was designed to explicitly handle concomitant confounders in SRS data. It filters out concomitant confounders in each case report via sampling procedures that assign each ADR to its major associated drug, determined by each drug’s contribution to that ADR in the case report (e.g. measured by normalized MGPS). This is achieved iteratively: in each iteration, the sampled drug is added to the report saved from previous iterations, and MGPS scores are re-calculated for the current report. The procedure can generate more accurate SRS signals for each ADR by iteratively down-weighting its associations with irrelevant drugs in case reports. In the signal combination step, the authors adopted a Bayesian hierarchical model and proposed a new summary statistic so that SRS signals can be combined with signals derived from other observational health data, allowing related signals to borrow statistical support with adjustment for data reliability. This combined approach effectively alleviates the issues of concomitant confounders, data bias, rare ADRs and under-reporting.
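The sampling idea at the core of this procedure can be illustrated in isolation. The toy sketch below normalises per-drug scores within one case report and samples a single "major" drug for an ADR with probability proportional to those scores. It is not the authors' MCEM algorithm: the iterative re-estimation of scores is omitted, and the drug names and score values are invented for illustration.

```python
import random


def normalise(scores):
    """Scale non-negative scores so that they sum to 1."""
    total = sum(scores.values())
    return {drug: score / total for drug, score in scores.items()}


def sample_major_drug(scores, rng):
    """Pick one drug for the ADR, with probability proportional to its score."""
    weights = normalise(scores)
    drugs = list(weights)
    return rng.choices(drugs, weights=[weights[d] for d in drugs], k=1)[0]


# Hypothetical disproportionality scores of three co-reported drugs for one ADR.
report_scores = {"drug_a": 6.0, "drug_b": 1.5, "drug_c": 0.5}
rng = random.Random(0)  # fixed seed so that the draw is reproducible

weights = normalise(report_scores)
assigned = sample_major_drug(report_scores, rng)
print(weights, assigned)
```

Repeating such draws across reports, and recomputing scores from the resulting assignments, is what progressively down-weights drugs that merely co-occur with the ADR.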

The study results suggest that the proposed MCEM step generates more accurate signals and thus brings significant gains in area under the curve (AUC) for the overall data (Tables 3 and 4). The exception is acute liver injury (ALI), which could be due to the unusual data distribution of ALI, e.g. more cases than controls. The results also show a trend of better prediction performance over the years using the MCEM procedure (Table 5). The study demonstrated that the proposed framework outperformed state-of-the-art baselines and detected many true signals that the baseline methods could not detect.

Grundmark et al^{14} conducted a study to explore the possibility of decreasing false-positive SDRs by calculating the PRR by therapeutic area (TA), while still maintaining the ability to detect relevant SDRs. The PRR-TA’s ability to detect true-positive SDRs was increased compared to the conventional PRR, and it performed 8–31% better than the EU-SDR definition. The method removed up to 63% of false SDRs confounded by disease or disease spill-over, while retaining or increasing the number of unclassified SDRs (possible signals) relevant for manual validation, thereby improving the ratio between confounded SDRs (i.e. false positives) and unclassified SDRs for all investigated drugs. The study results suggest that the performance of the PRR was improved by background restriction with the PRR-TA method: the number of false-positive SDRs decreased and the ability to detect true-positive SDRs increased, therefore improving the signal-to-noise ratio.
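Background restriction is straightforward to express in code. In the sketch below, the same PRR function is evaluated once against the whole database and once against a comparator limited to drugs from the same therapeutic area. The report data, drug labels and the `background_drugs` parameter are hypothetical, constructed so that a disease-related event appears disproportionate only against the unrestricted background.

```python
def prr_from_reports(reports, drug, event, background_drugs=None):
    """PRR for (drug, event) from (drug, event) report pairs.

    background_drugs: optional set of comparator drugs; when given, reports
    for any other drug are ignored (therapeutic-area restriction).
    """
    a = b = c = d = 0
    for r_drug, r_event in reports:
        if r_drug == drug:
            if r_event == event:
                a += 1
            else:
                b += 1
        elif background_drugs is None or r_drug in background_drugs:
            if r_event == event:
                c += 1
            else:
                d += 1
    return (a / (a + b)) / (c / (c + d))


# Hypothetical database: X and Y treat the same disease, Z a different one.
# Event "E" is a complication of the disease that X and Y treat.
reports = (
    [("X", "E")] * 10 + [("X", "other")] * 90
    + [("Y", "E")] * 9 + [("Y", "other")] * 81
    + [("Z", "E")] * 1 + [("Z", "other")] * 799
)

prr_all = prr_from_reports(reports, "X", "E")
prr_ta = prr_from_reports(reports, "X", "E", background_drugs={"Y"})
print(f"PRR vs whole database: {prr_all:.1f}")
print(f"PRR vs same TA only:   {prr_ta:.1f}")
```

Against the whole database the pair looks strongly disproportionate, while within the therapeutic area it does not, which is the confounding-by-disease pattern that the PRR-TA approach is designed to remove.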

The framework proposed by Caster et al^{13} (2017) improved signal detection performance more than 2.5-fold over disproportionality analysis, with 3.1% of initial assessments resulting in signals compared with 1.2% historically. Xiao et al^{1} discussed a signal combination method to alleviate the issues of confounding factors, under-reporting and data bias; it increased average AUC scores for MCEM MGPS over MGPS alone for all ADRs within the FDA’s FAERS and Canada’s MedEffect spontaneous data, meaning that it could detect more relevant possible SDRs than the baseline method for further evaluation. The method developed by Grundmark et al^{14} removed up to 63% of false-positive SDRs that were confounded by disease or comorbid conditions and improved the detection of true positives by 8–31%. The results from these studies indicate better performance of the proposed predictive models for signal detection and suggest that conventional quantitative methods need to be improved so that true signals are identified based on clinical evidence and real-world experience.

### 5. CONCLUSION

The strength of clinical evidence from spontaneous reports, such as temporal or dose-response relationship, time to onset, de-challenge or re-challenge and pharmacological/biological plausibility information, plays an important role in causality assessment for any ADR. Quantitative methods should be adjusted for this clinical evidence when calculating disproportionality scores (observed/expected ratios) and generating statistical thresholds for true-positive SDRs. Confounding factors recorded in spontaneous data (i.e. concomitant medications, disease comorbidity or concurrent conditions, and patient medical history) provide an alternative explanation for the occurrence of an ADR. Because confounding factors result in spurious signals and are challenging for quantitative methods, disproportionality scores need to be adjusted for confounding information to generate true-positive SDRs. With regard to the overall quality of spontaneous reports and confounding factors, there is still room for improvement in quantitative methods because of the vast amount of missing data, especially on clinically relevant information such as time to onset, de-challenge or re-challenge, disease morbidity and medical history. There should also be a commitment to improving the quality of spontaneous data, which is ultimately the rate-limiting step.

Furthermore, spontaneous report databases cover a range of products aimed at diverse medical conditions and used across a broad range of patient populations. This diversity matters: vaccines, for example, are given to healthy subjects, often children, who are likely to have fewer underlying medical conditions and consequently different reported background adverse events than the main population of patients that use other medicines. Many quantitative signal detection algorithms disregard this diversity and give equal weight to information from all products and all patients when computing the expected number of reports for a particular drug-event pair, which may result in signals being masked or in false associations being flagged as potential signals. Stratification and subgroup analyses, for example by sex and/or age group, are generally useful to reduce confounded SDRs^{5}. Further development of quantitative methods and AI technology would require analysing large amounts of spontaneous data based on clinical evidence, together with data quality, to identify true positive signals and avoid large numbers of false positives. These measures would provide effective monitoring of a drug’s benefit-risk profile and early risk minimisation measures for overall better patient safety.

### 6. REFERENCES

1. Xiao C, Li Y, Baytas IM, Zhou J, Wang F. An MCEM Framework for Drug Safety Signal Detection and Combination from Heterogeneous Real-World Evidence. Sci Rep. 2018; 8:1806.
2. CIOMS. Practical Aspects of Signal Detection in Pharmacovigilance: Report of CIOMS Working Group VIII. 1st edition. Geneva: Council for International Organizations of Medical Sciences. 2010; 143.
3. Data Mining at FDA – White Paper. https://www.fda.gov/scienceresearch/dataminingatfda/ucm446239.htm
4. Koutkias VG, Jaulent MC. Computational Approaches for Pharmacovigilance Signal Detection: Toward Integrated and Semantically-Enriched Frameworks. Drug Saf. 2015; 38(3):219–232.
5. Wisniewski AFZ, Bate A, Bousquet C, Brueckner A, Candore G, Juhlin K, Macia-Martinez MA, Manlik K, Quarcoo N, Seabroke S, Slattery J, Southworth H, Thakrar B, Tregunno P, Van Holle L, Kayser M, Norén GN. Good Signal Detection Practices: Evidence from IMI PROTECT. Drug Saf. 2016; 39:469–490.
6. Poluzzi E, Raschi E, Piccinni C, De Ponti F. Data Mining Techniques in Pharmacovigilance: Analysis of the Publicly Accessible FDA Adverse Event Reporting System (AERS). Intech. 2012; 260–302.
7. Shibata A, Hauben M. Pharmacovigilance, Signal Detection and Signal Intelligence Overview. 14th International Conference on Information Fusion. Chicago, Illinois, USA, July 5–8, 2011.
8. EMEA. Guidelines on the use of statistical signal detection methods in the EudraVigilance data analysis system. 2006.
9. Kajungu D, Speybroeck N. Implementation of Signal Detection Methods in Pharmacovigilance – A Case for their Application to Safety Data from Developing Countries. Ann Biom Biostat. 2015; 2(2):1017.
10. DuMouchel W. Bayesian data mining in large frequency tables, with an application to the FDA spontaneous reporting system. Am Stat. 1999; 53:177–190.
11. Johnson K, Guo C, Gosink M, Wang V, Hauben M. Multinomial modeling and an evaluation of common data-mining algorithms for identifying signals of disproportionate reporting in pharmacovigilance databases. Bioinformatics. 2012; 28(23):3123–3130.
12. Caster O, Juhlin K, Watson S, Norén GN. Improved Statistical Signal Detection in Pharmacovigilance by Combining Multiple Strength-of-Evidence Aspects in vigiRank. Drug Saf. 2014; 37:617–628.
13. Caster O, Sandberg L, Bergvall T, Watson S, Norén GN. VigiRank for statistical signal detection in pharmacovigilance: First results from prospective real‐world use. Pharmacoepidemiol Drug Saf. 2017; 26:1006–1010.
14. Grundmark B, Holmberg L, Garmo H, Zethelius B. Reducing the noise in signal detection of adverse drug reactions by standardizing the background: a pilot study on analyses of proportional reporting ratios-by-therapeutic area. Eur J Clin Pharmacol. 2014; 70:627–635.

*White paper prepared by Thirupathi Ponaganti, MS, PharmD – Pharmacovigilance*

*DISCLAIMER: The information provided in this White Paper is strictly the perspectives and opinions of individual authors and does not represent the opinions and statements of the iMEDGlobal (an FMD K&L Company).*