Data analytics is an essential part of every profession. Collected data and information are used to forecast an unknown future. Big data and data analytics are vital to any corporation, business or organization, whether it operates in the financial sector, in business markets or in healthcare. In healthcare in particular, data must be understood in the light of ever-changing trends, including new research findings, emergencies and disease outbreaks. This paper therefore discusses big data and the effective use of analytics in the healthcare industry. Analytics can improve current healthcare and, more importantly, can also facilitate preventive care.
What is Big Data?
Big data refers to a volume of data so large that it cannot be stored or processed with traditional approaches within a given time period. In the healthcare industry, “big data” is the sum of all information related to patients’ health and well-being. It includes everything from social media posts and web pages to emergency correspondence, news feeds and articles in medical journals. Understanding and using this vast amount of data could improve care and save lives. However, this potential is lost unless there is a way to connect the data and recognize the relevant patterns and trends. Data analytics is therefore required to turn this explosion of data into understanding and actionable conclusions.
What is Data Analytics?
Data analytics is the examination of data with the intent of drawing conclusions from it. It is used in every industry, from finding the most popular burger at dinner time in a restaurant to predicting which direction a baseball player is most likely to hit a ball off a right-handed pitcher. In emergency care, data analytics helps teams quickly sort through raw data, message traffic and Internet news feeds to determine the “where” and “when” of an event. In preventive care, it helps predict outbreaks and trends and prepares healthcare specialists for the challenges they are likely to face. Like other industries, the healthcare sector benefits greatly from data analytics. It can be used to collect research papers, search for specific findings and obtain the latest research-based best practices, which in turn helps medical practitioners make diagnoses. Data analytics involves more than analyzing data and drawing conclusions about the current situation. It also predicts future events from current facts and trends.
Data analytics has two parts: exploratory data analysis and confirmatory data analysis. Exploratory data analysis (EDA) summarizes the main characteristics of a data set, often with visual methods. Confirmatory data analysis evaluates evidence using traditional statistical tests. It involves hypothesis testing, analysis of variance, significance levels, p-values, confidence intervals and similar tools.
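The following minimal Python sketch uses only the standard library to make the two parts concrete. The blood-pressure readings and function names are hypothetical. `summarize` plays the exploratory role with a numeric summary; a real EDA would usually add plots. `exact_permutation_test` stands in for the confirmatory step. A permutation test is only one of many possible confirmatory tests; it is used here because it needs no distribution tables.

```python
import statistics
from itertools import combinations


def summarize(values):
    """Exploratory step: a numeric summary of one variable."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
    }


def exact_permutation_test(group_a, group_b):
    """Confirmatory step: two-sided exact permutation test for a difference in means.

    H0: both groups come from the same distribution, so every split of the
    pooled observations into groups of the original sizes is equally likely.
    """
    pooled = group_a + group_b
    observed = abs(statistics.mean(group_a) - statistics.mean(group_b))
    extreme = total = 0
    for idx in combinations(range(len(pooled)), len(group_a)):
        chosen = set(idx)
        a = [pooled[i] for i in idx]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        if abs(statistics.mean(a) - statistics.mean(b)) >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total  # p-value


# Hypothetical systolic blood pressure (mmHg) in two small groups.
treated = [118, 121, 115, 119, 117]
control = [135, 129, 138, 131, 133]
print(summarize(treated))
print(exact_permutation_test(treated, control))
```

Every treated reading is lower than every control reading. Only 2 of the 252 possible splits are therefore as extreme as the observed one, which gives a two-sided p-value of 2/252, about 0.008.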
Before modern medical science began to take shape in the 19th century, most healthcare practitioners relied on personal knowledge, skill and a little luck when diagnosing people. Little was known about most common illnesses, let alone what caused them. Practitioners knew the symptoms and possibly the name of a disease, but beyond that they could do little to treat it. The cure often seemed far more drastic than the condition itself. Had healthcare analytics existed 100 years ago, it might have helped contain plague epidemics and save millions of lives. Proper data analysis might also have helped medical professionals find the real cause of the disease.
Sadly, many in the medical community remain in the dark because they do not properly use healthcare analytics to determine the needs of patients and their communities. This limits many current medical facilities and results in poor treatment options for local patients. Data obtained directly from a community can identify the key needs of local residents: their common ailments, the services they require and almost any other information needed. It is essential for a medical service provider to tailor its services to the community. The effective use of analytics turns big data into actionable information for local care providers [6].
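The sketch below is a minimal illustration of identifying a community's common ailments. It counts diagnoses per community from visit records. The community names, diagnoses and `(community, diagnosis)` record shape are all hypothetical. Real records would first need to be de-identified and coded consistently.

```python
from collections import Counter, defaultdict


def top_ailments_by_community(visits, k=3):
    """Return the k most frequent diagnoses recorded for each community.

    visits: iterable of (community, diagnosis) pairs (hypothetical format).
    """
    counts = defaultdict(Counter)
    for community, diagnosis in visits:
        counts[community][diagnosis] += 1
    return {community: c.most_common(k) for community, c in counts.items()}


visits = [
    ("Northside", "asthma"), ("Northside", "asthma"), ("Northside", "diabetes"),
    ("Northside", "hypertension"), ("Northside", "asthma"),
    ("Riverside", "diabetes"), ("Riverside", "diabetes"), ("Riverside", "hypertension"),
]
print(top_ailments_by_community(visits, k=2))
```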
Before the advent of modern analytics, researchers and analysts had to work through huge amounts of data, spending thousands of hours of labor to reach even a simple conclusion from the combined data. Information was often lost as data was transferred between the systems in which it was analyzed. A system is therefore needed that can create understanding from multiple data sets. Such a platform must be flexible enough to accept multiple data sources and should have the following features:
- Ability to explore information through relationships between entities
- Ability to customize the environment to individual needs and data sets
- Tunable search algorithms for significance, relevance, temporal decay and geo-spatial decay
- Simple third-party integration using JSON, XML, RSS and/or KML
- A visual interface
- Natural language processing (NLP)
- Ability to preprocess structured and unstructured data [6]
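The temporal and geo-spatial decay in the tunable search feature can be sketched as multiplicative weights applied to a precomputed relevance score. This is an assumed design, not the cited platform's algorithm. `half_life_days` and `distance_scale_km` are hypothetical tuning knobs, relevance is assumed to already include significance, and distance uses the haversine formula.

```python
import math
from dataclasses import dataclass


@dataclass
class Document:
    title: str
    relevance: float  # assumed precomputed text-match score in [0, 1]
    age_days: float   # time since the item was published
    lat: float
    lon: float


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(min(1.0, math.sqrt(a)))


def score(doc, query_lat, query_lon, half_life_days=7.0, distance_scale_km=100.0):
    """Relevance damped by age (half-life decay) and by distance (exponential decay)."""
    temporal = 0.5 ** (doc.age_days / half_life_days)
    distance = haversine_km(query_lat, query_lon, doc.lat, doc.lon)
    spatial = math.exp(-distance / distance_scale_km)
    return doc.relevance * temporal * spatial


docs = [
    Document("Local flu cluster report", 0.8, 1.0, 40.71, -74.01),
    Document("Older distant outbreak note", 0.9, 60.0, 34.05, -118.24),
]
ranked = sorted(docs, key=lambda d: score(d, 40.73, -73.99), reverse=True)
print([d.title for d in ranked])
```

Raising `half_life_days` or `distance_scale_km` weakens the corresponding decay, which is one way such a search could be made "tunable".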
Modern Healthcare Analytics Data
The financial sector often employs advanced predictive models. Many credit card companies use real-time analytics to monitor the flow of transaction data. This helps them detect when a card may have been stolen or unauthorized purchases are being made, which helps prevent fraud and identity theft. Health providers can now do the same, using data analytics to predict when diseases will appear and how severe they will be. In healthcare, providers must understand which services are best for an individual with a specific need. By analyzing the data of past patients, it is possible to determine which course of action will best help future patients. Without proper data analysis, treatment is decided on an ad-hoc basis, often based on outdated research. A direct, knowledge-based approach makes it far more likely that the right treatment is used the first time. This reduces the total time a patient needs with a doctor, the number of hospital visits and other medical needs [6].
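For a stream of patient vital signs, the real-time monitoring described above can be approximated with a rolling z-score. Each new reading is compared with the mean and standard deviation of the last few readings. This is a simple stand-in technique with hypothetical heart-rate values and thresholds. It is not the method used by any particular card issuer or hospital.

```python
import statistics
from collections import deque


def flag_anomalies(readings, window=5, threshold=3.0):
    """Yield (index, value) for readings more than `threshold` standard deviations
    away from the mean of the previous `window` readings."""
    recent = deque(maxlen=window)
    for i, x in enumerate(readings):
        if len(recent) == window:
            baseline = list(recent)
            mu = statistics.mean(baseline)
            sd = statistics.stdev(baseline)
            if sd > 0 and abs(x - mu) / sd > threshold:
                yield i, x
        recent.append(x)  # the reading joins the baseline whether or not it was flagged


heart_rate = [72, 74, 73, 75, 72, 74, 73, 140, 74, 72]  # hypothetical beats per minute
print(list(flag_anomalies(heart_rate)))
```

Only the spike at index 7 is flagged. Because the sketch lets the spike enter the baseline, the window is briefly noisier afterwards. A production system might exclude flagged readings or use robust statistics instead.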
Collecting Healthcare Analytics Data
Data collection is very important in healthcare analytics. Data can be collected from patients, medical journals, social media and physicians online. Most medical facilities collect introductory data, such as name, weight, age, medical history and family history, before even seeing a patient. This data is important because it can be compared with other data streams containing related information. The comparison shows whether the patient shares causes and symptoms with others. But data collection should not stop there, because this is merely the tip of the iceberg. Records must be kept throughout the entire treatment process. They track how the disease and its symptoms develop and how they compare with the outcomes of other patients in the community.
Detailed records compared with historical databases help the current patient and also inform future preventive practices for the same illness. The ability to collect and use data makes it possible to find similarities between patients. A data analytics program can use these similarities to create understanding and point to other causes and better forms of treatment. Seemingly unrelated information may uncover correlations or treatments for an illness. Combining medical research with local findings also makes localized and more effective treatment possible. Because it is impossible to know in advance which information may prove essential, collecting and understanding big data can help establish new practices in the healthcare domain [6].
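One simple way to find similar patients is to compare their sets of recorded findings with the Jaccard similarity, which divides the size of the intersection by the size of the union. The sketch assumes each patient's findings are already coded as strings. The patient IDs and findings are hypothetical. Real systems would typically weight features and draw on much richer data.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def most_similar(target, history, k=2):
    """Return the k (patient_id, similarity) pairs most similar to `target`."""
    scored = [(pid, jaccard(target, findings)) for pid, findings in history.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


history = {
    "P001": {"fever", "cough", "fatigue"},
    "P002": {"fever", "rash"},
    "P003": {"cough", "fatigue", "shortness_of_breath"},
}
new_patient = {"fever", "cough", "fatigue", "headache"}
print(most_similar(new_patient, history))
```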
Challenges in Healthcare Data Analytics
Healthcare data is particularly rich and is collected from a wide variety of sources, such as images, medical literature, clinical notes and traditional electronic records. This heterogeneity in data collection and representation creates numerous challenges in both pre-processing and analysis. Each form of data requires its own techniques, and their diversity naturally creates data integration and analysis problems. Researchers and healthcare professionals regard this heterogeneity as a major challenge of the healthcare domain, where information comes from databases, medical researchers, social media and healthcare practitioners. These diverse sources add richness to the field, but they also make significant advances harder. In addition, computer scientists and statisticians are usually not trained in medical concepts. Medical practitioners and researchers, for their part, often have a limited understanding of the mathematics and statistics that data analytics requires. These difficulties have hindered the creation of a coherent body of work in this domain, even though much of the available data could clearly benefit from advanced analysis techniques.
Another big challenge in the healthcare domain is the “data privacy gap” between medical researchers and data scientists/statisticians. Healthcare data is very sensitive because it can reveal a great deal about individuals. Laws in several countries strictly restrict the disclosure of identifiable medical information about individuals. One example is the Health Insurance Portability and Accountability Act (HIPAA) in the United States. Medical practitioners have natural access to healthcare data because their research is often based on actual medical practice. Collecting data is much harder for data scientists and statisticians without proper collaboration with medical practitioners, and even then there are barriers to learning from the data. Many of these challenges can be mitigated if accepted protocols, privacy-preserving technologies and safeguards are in place.
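One common safeguard is pseudonymization, which replaces a direct identifier with a keyed hash so that records can still be linked without exposing the identifier. The sketch below uses Python's `hmac` and `hashlib` modules. The field names and key are hypothetical. Pseudonymization alone does not guarantee that data are de-identified in the legal sense, because remaining fields such as dates or postcodes can still identify people.

```python
import hashlib
import hmac

DIRECT_IDENTIFIERS = ("name", "address", "phone")  # hypothetical field names


def pseudonymize(record, secret_key, id_field="patient_id"):
    """Drop direct identifiers and replace the patient ID with a keyed SHA-256 hash.

    The same ID and key always give the same pseudonym, so records can still be
    linked. Without the key, the pseudonym cannot feasibly be computed from the ID.
    """
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS and k != id_field}
    digest = hmac.new(secret_key, str(record[id_field]).encode("utf-8"), hashlib.sha256)
    out["pseudo_id"] = digest.hexdigest()
    return out


key = b"replace-with-a-securely-stored-secret"  # hypothetical; keep real keys out of code
record = {"patient_id": "MRN-001", "name": "Jane Doe", "phone": "555-0100",
          "age": 54, "diagnosis": "type 2 diabetes"}
print(pseudonymize(record, key))
```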
Healthcare Data Sources and Basic Analytics
This part briefly discusses various healthcare data sources and the basic analytical methods widely used to process and analyze them. It considers the different forms of subject data currently collected in both clinical and non-clinical environments. Clinical data include structured electronic health records (EHRs) and medical images. Sensor data have gained a lot of attention recently. Personalized medicine has also grown in importance thanks to advances in genomic data, whose analysis involves several statistical techniques. A patient’s in-hospital clinical data also include a large amount of unstructured data in the form of clinical notes. Fundamental data mining, machine learning, information retrieval and natural language processing (NLP) techniques are used extensively to process these data types.
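The sketch below is a small example of the information retrieval techniques mentioned above. It weights the terms of unstructured clinical notes with TF-IDF (term frequency times the log of the inverse document frequency) and ranks notes for a query. The notes are invented and tokenization is a naive lowercase regex. The sketch also does not handle negation or medical synonyms, both of which real clinical NLP needs.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Naive tokenizer: lowercase alphabetic runs only."""
    return re.findall(r"[a-z]+", text.lower())


def tf_idf(notes):
    """One {term: weight} dict per note, weight = raw count * log(N / document frequency)."""
    counts = [Counter(tokenize(note)) for note in notes]
    n_docs = len(counts)
    df = Counter(term for c in counts for term in c)  # each note counts a term once
    return [{t: n * math.log(n_docs / df[t]) for t, n in c.items()} for c in counts]


def search(query, notes):
    """Indices of notes ordered by the summed TF-IDF weight of the query terms."""
    weights = tf_idf(notes)
    terms = tokenize(query)
    scores = [sum(w.get(t, 0.0) for t in terms) for w in weights]
    return sorted(range(len(notes)), key=lambda i: scores[i], reverse=True)


notes = [
    "Patient reports chest pain and shortness of breath.",
    "Patient reports mild headache.",
    "Chest pain resolved after medication.",
]
print(search("headache", notes))
```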
Advanced Data Analytics for Healthcare
This part deals with advanced analytical methods that can be used in healthcare, including clinical prediction models, temporal data mining and visual analytics. Combining clinical and genomic data is essential for improving predictive power. Information retrieval techniques can also enhance the quality of medical search. The sections below discuss several advanced data analytics methods. These include various data mining and machine learning models that need to be adapted to the healthcare domain.
1. Clinical Prediction Models
Clinical prediction is a critical component of modern healthcare. Several prediction models have been extensively investigated and successfully implemented in clinical practice [5]. These statistical models have had a great impact on the detection and treatment of diseases and disorders. Many successful supervised learning methods used for clinical prediction fall into three categories:
(i) statistical techniques such as linear regression, logistic regression and Bayesian models;
(ii) more sophisticated machine learning and data mining methods such as decision trees, random forests and neural networks;
(iii) survival models, which aim to predict survival outcomes.
All of these techniques focus on discovering the relationship between covariates (also known as attributes or features) and a dependent outcome variable. The choice of model for a healthcare problem depends mainly on the outcome to be predicted, and the literature proposes different kinds of predictive models for this diverse variety of outcomes. The most common outcomes are binary and continuous. Less common outcomes are categorical and ordinal. Survival models handle a further kind of outcome, in which the aim is to predict the time until an event of interest occurs. Several methods exist for evaluating and validating the performance of these prediction models.
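The code below is a minimal sketch of category (i) together with model evaluation. It fits a logistic regression for a binary outcome by batch gradient descent and scores the fit with the area under the ROC curve (AUC). The two standardized features and the labels are hypothetical. AUC is computed here on the training data only. In practice it should be measured on held-out data or with cross-validation.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def auc(labels, scores):
    """Chance that a random positive outranks a random negative (ties count half)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical standardized features (e.g. age, systolic BP) and a binary outcome.
X = [[-1.5, -1.0], [-1.0, -0.5], [-0.5, -1.2], [-0.8, 0.1],
     [0.6, 0.9], [1.2, 0.4], [0.9, 1.5], [1.4, 1.1]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
w, b = fit_logistic(X, y)
probs = [predict_proba(w, b, x) for x in X]
print(round(auc(y, probs), 3))
```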
2. Temporal Data Mining
Healthcare data contain time information, and it is inconceivable to reason about and mine these data without using the temporal dimension. There are two main sources of temporal data in the healthcare domain: electronic health record (EHR) data and sensor data. Mining the temporal dimension of EHR data is extremely promising. It may reveal hidden patterns that enable a more precise understanding of disease manifestation, progression and response to treatment. However, some characteristics of EHR data (heterogeneity, high dimensionality and irregular time intervals) make conventional methods ineffective. Unlike EHR data, sensor data usually take the form of numeric time series measured regularly at high frequency. Examples include physiological data obtained by monitoring patients on a regular basis and recordings of electrical activity such as the electrocardiogram (ECG) and electroencephalogram (EEG). Sensor data for a given subject are measured over a much shorter period than longitudinal EHR data. Because EHR data and sensor data differ in nature, the appropriate temporal data mining methods often differ too. EHR data are usually mined with temporal pattern mining methods. These methods represent data instances (e.g., patients’ records) as sequences of discrete events (e.g., diagnosis codes and procedures), then find and enumerate statistically relevant patterns embedded in the data. Sensor data, on the other hand, are often analyzed with signal processing and time-series analysis techniques (e.g., wavelet transform and independent component analysis) [4, 5].
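Temporal pattern mining can be illustrated with its simplest case. The sketch counts, across patients, how often diagnosis code A is followed later by code B, optionally within a maximum gap. This is a simplified two-event version of sequential pattern mining, not any specific published algorithm. The codes, days and thresholds are hypothetical.

```python
from collections import Counter


def frequent_ordered_pairs(sequences, min_support=0.5, max_gap_days=None):
    """Find (A, B) pairs where event A is followed later by event B in enough patients.

    sequences: {patient_id: [(day, code), ...]}. Support is the fraction of
    patients whose record contains the pair at least once.
    """
    n_patients = len(sequences)
    counts = Counter()
    for events in sequences.values():
        events = sorted(events)  # order each patient's events by day
        pairs = set()            # count each pair at most once per patient
        for i, (t_a, a) in enumerate(events):
            for t_b, b in events[i + 1:]:
                if t_b == t_a:
                    continue  # same-day events are not treated as ordered
                if max_gap_days is not None and t_b - t_a > max_gap_days:
                    break     # events are sorted, so later ones are even further away
                pairs.add((a, b))
        counts.update(pairs)
    return {pair: c / n_patients for pair, c in counts.items()
            if c / n_patients >= min_support}


sequences = {
    "p1": [(0, "HTN"), (30, "CKD"), (90, "DIALYSIS")],
    "p2": [(5, "HTN"), (200, "CKD")],
    "p3": [(10, "DM"), (40, "HTN")],
}
print(frequent_ordered_pairs(sequences))
```

Without a gap limit, "HTN followed by CKD" appears in two of the three patients. With `max_gap_days=100`, the 195-day gap for `p2` no longer counts.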
3. Visual Analytics
Visual analytics combines the strengths of human perception with interactive interfaces and data analytics to facilitate the exploration of complex datasets. It is a science that integrates interactive visual interfaces with analytical techniques to build systems that make complex data easier to interpret [3]. Visual analytics is popular in every area of healthcare data analysis because of the wide variety of insights it provides. Health-related information is increasing rapidly. It has therefore become critical to build effective ways of analyzing huge amounts of data by leveraging human-computer interaction and graphical interfaces. In general, visual analytics summarizes complex data in a way that anyone can understand and use to gain novel insights.
Computer-aided diagnosis/detection (CAD) is a procedure in radiology that supports radiologists in reading medical images [1, 2]. CAD tools are generally fully automated and designed to assist the radiologist in detecting lesions. Clinical specialists agree that using CAD tools can improve the radiologist’s performance. CAD algorithms run in the background or are applied in advance. They identify structures that are then highlighted as regions of interest for the radiologist to interpret.
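A toy pipeline can illustrate how regions of interest are highlighted. It thresholds the pixel intensities, groups bright pixels into 4-connected regions and reports each region's bounding box. Real CAD systems rely on trained detection models and validated preprocessing. The image, `threshold` and `min_size` below are hypothetical.

```python
from collections import deque


def regions_of_interest(image, threshold, min_size=2):
    """Return bounding boxes (row0, col0, row1, col1) of 4-connected regions
    whose pixels are >= threshold and that contain at least min_size pixels."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or image[r][c] < threshold:
                continue
            # Breadth-first flood fill over bright neighbours.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and image[ny][nx] >= threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(pixels) >= min_size:  # ignore isolated specks
                ys = [p[0] for p in pixels]
                xs = [p[1] for p in pixels]
                boxes.append((min(ys), min(xs), max(ys), max(xs)))
    return boxes


image = [
    [0, 0, 0, 0, 0],
    [0, 9, 8, 0, 0],
    [0, 7, 0, 0, 0],
    [0, 0, 0, 0, 6],
    [0, 0, 0, 0, 0],
]
print(regions_of_interest(image, threshold=5))
```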
There is great promise for healthcare driven by data analytics. The growing quantity of clinical and research data, along with methods to analyze and use it, can improve personal health, healthcare delivery and medical research. However, there is a continued need to improve the completeness and quality of data. Research is also needed to demonstrate how best to apply data to real-world problems. In addition, human expertise in the healthcare domain, including informatics, will be required to carry out such work optimally.
Goutam Hanje, Executive, Clinical Safety at FMD K&L
Goutam is a trained pharmacologist with more than 3 years of experience in conducting, monitoring and reviewing data for phase 1 clinical trials and BA/BE (bioavailability and bioequivalence) clinical studies. His areas of work include clinical safety, data management and claims review. He is an expert in reviewing marketing collateral, brochures, promotional ads and videos to ensure the clinical, medical and scientific accuracy of consumer, cosmetic and OTC products.
Sudhama Shetty, Executive, Statistician at FMD K&L
Sudhama Shetty is a statistician with more than 3 years of experience and holds a Master’s degree in Biostatistics from Manipal University. He has experience in building statistical models and in designing and implementing adaptive clinical trials.
DISCLAIMER: The information provided in this White Paper reflects strictly the perspectives and opinions of the individual authors and does not represent the opinions or statements of iMEDGlobal (an FMD K&L Company).