“If you think about a doctor’s job,” says Ziad Obermeyer, MD ’08, “making a decision for even one patient is a big data challenge.” Doctors must process an enormous flow of information, he explains, beginning with “the patient and her prior care—and all the data that accompany that, while also incorporating the research literature that is growing every day.”
For centuries, a physician’s first source of data has come from the clinical conversations that form the heart of the doctor-patient relationship. Think of this as the fine art of listening. The information from that conversation is documented in medical records as notes, forming the basis of medical reasoning and clinical decision making.
With the volumes of data being captured in biomedical laboratories and through electronic health records, making well-informed clinical decisions is becoming increasingly challenging.
“One way to let humans play to their strengths,” says Obermeyer, “is to let computers help us process some of that information and turn it into more precise probability predictions.” Obermeyer, who is an acting associate professor of health policy and management at the University of California-Berkeley’s School of Public Health and a researcher in the Department of Emergency Medicine at Brigham and Women’s Hospital, adds that “a doctor’s job is to take in and process a ton of information and turn it into a probability judgment—about the likelihood of a disease or the likelihood that a potential treatment will benefit the patient. A lot of what we do in routine medical care is solve problems that computers are really good at solving.”
That’s where machine learning comes in—the tool kit of algorithms and statistical techniques that, combined with twenty-first century computing power, can analyze the immense amounts of data produced while caring for patients. These computational tools have the potential to transform how doctors use data to make clinical decisions for their patients and are increasingly empowering precision medicine and personalized care. Machine learning, and the big data sets it requires, is transforming how physicians approach patient care and clinical and translational research.
The soul of the machine
In a now-famous paper published in 1950, British mathematician and logician Alan Turing asked, “Can machines think?” His question planted the seed of an idea: artificial intelligence. The 1940s and 1950s saw the development of artificial neural network algorithms, which were modeled on the way the brain’s neurons respond iteratively to stimuli and which are the origins of today’s deep learning and artificial intelligence applications and expert systems.
When it comes to AI applications in health care, says John Halamka, the International Healthcare Innovation Professor of Emergency Medicine at HMS and chief information officer at Beth Israel Deaconess Medical Center, the spark came from the Obama administration, which used the 2009 American Recovery and Reinvestment Act to provide incentives for the adoption of electronic health records (EHRs).
Among the first clinical decision-support tools were paper flowcharts and checklists; these capacities are now built into EHRs to help intelligently filter information. Real-time electronic alerts and reminders help ensure preventive care such as cancer screenings and management of chronic diseases such as diabetes. They can also provide guidance to physicians on drug selection, dosage decisions, drug-interaction screens, and disease-specific orders that reflect best practices. Machine learning can also improve health care delivery. Beth Israel Deaconess, for instance, uses machine learning to predict operating-room time needed for each patient and to flag patients who are unlikely to show up for appointments.
One of the obstacles to capitalizing on the potential of EHRs has been their proprietary formats. That’s changing, however, thanks to Fast Healthcare Interoperability Resources (FHIR), a data-exchange standard that is helping developers more easily create apps and tools from EHR data. Mobility also matters: the online SMART App Gallery of software includes patient- and clinician-facing apps that allow for data sharing among patients and provide clinicians with app-based diagnostic tools.
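To give a sense of why a shared format matters to developers, here is a minimal sketch of reading one record in the standard's JSON form. The patient is fictional and hand-written for illustration, the function name `summarize_patient` is an assumption of this sketch, and a real app would fetch the record from a hospital's server rather than from a string.

```python
import json

# A fictional, hand-written Patient resource in FHIR's JSON form.
# Field names (resourceType, name, gender, birthDate) follow the FHIR
# Patient resource; the values are invented for illustration.
PATIENT_JSON = """
{
  "resourceType": "Patient",
  "id": "example-001",
  "name": [{"family": "Rivera", "given": ["Ana", "Maria"]}],
  "gender": "female",
  "birthDate": "1961-04-12"
}
"""


def summarize_patient(raw: str) -> dict:
    """Pull a few demographic fields out of a FHIR Patient resource."""
    resource = json.loads(raw)
    if resource.get("resourceType") != "Patient":
        raise ValueError("expected a Patient resource")
    # A patient may carry several names; this sketch uses the first one.
    names = resource.get("name") or [{}]
    first = names[0]
    full_name = " ".join(first.get("given", []) + [first.get("family", "")]).strip()
    return {
        "id": resource.get("id"),
        "name": full_name,
        "gender": resource.get("gender"),
        "birth_date": resource.get("birthDate"),
    }


if __name__ == "__main__":
    print(summarize_patient(PATIENT_JSON))
```

Because every conforming system describes a patient with the same field names, the same few lines can in principle read records from different vendors' systems.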
Unlocking the potential of EHR-based big data for clinical research depends on machine learning, the algorithmic and statistical tools that excel at identifying patterns and applying a learned pattern to new data in order to make predictions. Because these algorithms are designed to solve these kinds of problems, they make what Obermeyer calls “good thinking partners” for doctors.
In the mid-1980s, when Isaac Kohane, the School’s Marion V. Nelson Professor of Biomedical Informatics and chair of the Department of Biomedical Informatics in the Blavatnik Institute at HMS, interrupted medical school to pursue a doctorate in computer science, medicine, he says, “was already so overwhelmed with information that it was becoming challenging to turn facts into knowledge.” Although the tools of artificial intelligence promised a way to learn from patients, there first needed to be a data infrastructure to work from. Kohane and colleagues wondered whether de-identified, privacy-protected clinical data from EHRs could be combined with genomic data to form a single database capable of providing the large data sets needed to advance research on genetic diseases.
This need led to Informatics for Integrating Biology and the Bedside, or i2b2, a research platform for mining EHR data. i2b2 was first funded by the National Institutes of Health in 2004, with Kohane as the project’s principal investigator. Released in 2007, the software behind i2b2 allows for information held in clinical records systems to be combined with genomic data in a secure HIPAA-compliant database. The platform is free, scalable, and shareable, and now widely used at NIH’s Clinical and Translational Science Award sites.
The platform is also a promising precision-medicine tool. Consider pharmacogenetics, a field that investigates the genetic basis of a patient’s response to drugs. Many drug protocols for diseases from cancer to diabetes have been one size fits all. Yet, says Kohane, with access to large sets of genomic data as well as data on patients’ responsiveness to drug treatments, clinicians can “rationalize picking the right drug for you.” The tool will also enable researchers to develop clinical trials of drug protocols based on a patient’s genetics and the genetic variants of their disease. This, in turn, will lead to genetically informed, trial-tested personalized drug treatments.
Today, EHR-driven genomic research is a field of its own. But it’s taken new tools to extract information from EHRs because they include many different types of information. That’s where natural language processing, a subset of machine learning that uses algorithms to turn text into data, comes in. Researchers are using such algorithms to more accurately identify patients with the phenotypes they hope to study. Better data are driving discovery.
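What "turning text into data" means can be shown with a deliberately simple sketch. It is a rule-based stand-in, far cruder than the natural language processing systems researchers actually use: it counts mentions of a few concepts in a clinical note and skips mentions that follow a negation cue in the same sentence. The vocabulary, the cue list, the sample note, and the function name `note_to_features` are all assumptions made for illustration.

```python
import re

# Hypothetical vocabulary: concept name -> phrases that signal it in a note.
CONCEPTS = {
    "rheumatoid_arthritis": ["rheumatoid arthritis", "seropositive ra"],
    "methotrexate": ["methotrexate", "mtx"],
    "joint_erosion": ["erosion", "erosions"],
}

# Cue words that negate a mention appearing later in the same sentence.
NEGATION_CUES = re.compile(r"\b(no|not|denies|without|negative for)\b")


def note_to_features(note: str) -> dict[str, int]:
    """Count affirmed (non-negated) mentions of each concept in a note."""
    counts = {name: 0 for name in CONCEPTS}
    # Split into rough sentences so a negation does not leak across them.
    for sentence in re.split(r"[.;\n]+", note.lower()):
        for name, phrases in CONCEPTS.items():
            for phrase in phrases:
                pattern = r"\b" + re.escape(phrase) + r"\b"
                for match in re.finditer(pattern, sentence):
                    preceding = sentence[: match.start()]
                    if NEGATION_CUES.search(preceding):
                        continue  # e.g. "no erosions" is not evidence
                    counts[name] += 1
    return counts


if __name__ == "__main__":
    note = (
        "Patient with rheumatoid arthritis on methotrexate. "
        "X-ray shows no erosions. Continue MTX."
    )
    print(note_to_features(note))
```

The output is a row of numbers per note, the kind of structured input a statistical model can use alongside billing codes and lab values.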
Building better care
When it comes to research, says Katherine Liao, “I always start with patients. My questions come from the unanswered questions that come up all the time as part of clinical care.” Liao, an HMS assistant professor of biomedical informatics and an assistant professor of medicine at Brigham and Women’s Hospital, began thinking about the “kinds of questions that could be asked of EHR data that we probably couldn’t ask before” while a rheumatology fellow at the hospital and involved in the i2b2 project.
Liao’s clinical practice and research focus on rheumatoid arthritis, an autoimmune disease in which the immune system attacks the lining of joints, causing inflammation and damage to the surrounding cartilage and bone. Recently, studies have found that other tissues and body organs can also be affected by the inflammation. When diagnosing the disease, physicians use a patient’s medical history, physical exam, and diagnostic tests.
Liao’s research questions focus on genetic risk factors for the disease, factors that could allow for earlier diagnosis, better disease management, and better treatment decisions. Although rheumatoid arthritis is the most common inflammatory autoimmune joint disease, it is relatively uncommon statistically, affecting just 1 percent of the population worldwide. Such a small patient population means that it can be hard to enroll enough people to conduct robust studies.
As part of the i2b2 project, Liao and her team wanted to tap EHR data but first needed to devise an approach that would accurately identify patients with rheumatoid arthritis. She and her colleagues found that machine learning could help them design a highly accurate classification algorithm using coded data from EHRs along with data extracted from narrative clinical notes using natural language processing. After running the algorithm, they had a data set of 4,500 patients and they had it in 18 months rather than in the decades such recruitment would usually take. Even better: the same algorithm worked just as accurately on EHR data from other institutions.
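The general idea of such a classifier can be sketched in a few lines. This is not the team's algorithm: it is a plain logistic regression, trained by gradient descent, on a handful of made-up patients. The three features (a count of coded diagnoses, a count of affirmed mentions in notes, and a flag for a relevant prescription), the training values, and the function names are all assumptions chosen for illustration.

```python
import math


def sigmoid(z: float) -> float:
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit logistic regression weights by batch gradient descent."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            score = bias + sum(w * v for w, v in zip(weights, x))
            error = sigmoid(score) - y
            for j, v in enumerate(x):
                grad_w[j] += error * v
            grad_b += error
        # Step against the average gradient over all patients.
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias


def predict_probability(weights, bias, x):
    """Estimated probability that a patient has the condition."""
    return sigmoid(bias + sum(w * v for w, v in zip(weights, x)))


# Made-up training data, NOT real patients. Each row is
# [coded diagnosis count, affirmed note mentions, on relevant drug (0/1)].
ROWS = [
    [5, 4, 1], [3, 6, 1], [4, 3, 1], [6, 5, 0],  # chart-confirmed cases
    [1, 0, 0], [0, 0, 0], [2, 0, 0], [1, 1, 0],  # chart-confirmed non-cases
]
LABELS = [1, 1, 1, 1, 0, 0, 0, 0]

if __name__ == "__main__":
    weights, bias = train_logistic(ROWS, LABELS)
    for patient in ([4, 4, 1], [1, 0, 0]):
        print(patient, round(predict_probability(weights, bias, patient), 3))
```

A single stray billing code is weak evidence on its own; the model's estimate rises when codes, note mentions, and prescriptions point the same way, which is the intuition behind combining coded and narrative data.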
Machine learning, coupled with the ability to more fully mine EHR data, has changed the way researchers approach studies. Prospective cohort studies have traditionally been designed to investigate specific outcomes and test specific hypotheses; researchers need to decide ahead of time what data they want to collect. Since 1948, the Framingham Heart Study, for example, has been used to elucidate the causes of heart disease. Patients have been followed for years with periodic interviews used to gather information on behaviors and how they influence heart disease. In such studies, says Liao, “questions or topic areas that were not considered at the outset can prove difficult to study because the appropriate data would not be available to analyze.”
By contrast, in research driven by EHR data and machine learning, algorithms can analyze health and genomic data and identify meaningful patterns previously unknown to researchers and clinicians. “Now we’re able to let the patterns in the data help us study relationships between genes and many diseases—and not just prespecified diseases,” Liao says.
Studying EHR data has informed Liao’s clinical practice, including how and when she screens patients for lipids, her choice of treatments, and how she addresses the topic of genetic risks with patients. These days, Liao and her team are studying whether they can develop algorithms “that can tell us, in real time, what is the probability of having or developing a certain condition or an unwanted outcome.” Such real-time analysis would transform prevention and early diagnosis efforts, including informing how doctors manage conditions such as rheumatoid arthritis in clinical settings.
The future of data-driven medicine will focus on prediction and personalization. “Machine learning,” Halamka says, “is not about replacing clinicians; it is about enabling clinicians to practice at the top of their license, delivering the right care in the right setting.” Clinical care is also becoming more participatory and patient-centered, easing the flow of doctor-patient information.
Halamka says that the EHR is already getting smarter as medical care is increasingly mobile, interactive, and participatory. Patients at Beth Israel Deaconess are using BIDMC@home, a mobile app that allows them to send health data to their EHR, making them more active partners in providing the data that doctors can use to improve their care, especially for chronic conditions. A sense of what’s on the clinical care horizon can be found in a few of the projects underway in the hospital’s Health Technology Exploration Center: health care features incorporated into common household appliances, apps to help manage anxiety and depression, and telemedicine apps to provide specialty care globally.
Sharing data is as important as the tools used to analyze it. Several of Kohane’s current projects rely on data sharing to help solve otherwise intractable problems. The Undiagnosed Diseases Network distributes data among twelve centers to help diagnose and treat rare, undiagnosed genetic diseases. Kohane’s Network of Enigmatic Exceptional Responders Study investigates the genetics and immunology behind why some patients have a much better than average response to drug treatment. Together with Paul Avillach, an assistant professor in biomedical informatics, Kohane is developing a patient-centered data commons that will allow patients options on how they share their health data.
For Obermeyer, the power of machine learning lies in its capacity for making health policy smarter and informing decisions on how to manage care for high-risk patients. “If we look at our health care system,” he says, “it’s clear we’re not making great decisions. Algorithms can help.”
Paradoxically, new computational tools, and the data-driven infrastructures they draw from, are leading physicians back to what Kohane calls “small data”—and the fine art of listening to patients.
“I give much more weight to what patients tell me in all those small conversations,” he says.
With the rise of tools that transform the notes from doctor-patient conversations into data, two of the physician’s oldest clinical tools—listening and note-taking—are well on their way to becoming medicine’s newest tools for discovery.
Andrea Volpe is a Massachusetts-based writer.
Images: Mattias Paludi (top illustration); John Soares