This is how AI could save us from the coronavirus crisis

By Ariana Eunjung Cha

Early this spring, as the pandemic began accelerating, AJ Venkatakrishnan took genetic data from 10,967 samples of the novel coronavirus and fed it into a machine. The Stanford-trained data scientist did not have a particular hypothesis, but he was hoping the artificial intelligence would pinpoint possible weaknesses that could be exploited to develop therapies.

He was awed when the program reported back that the new virus appeared to carry a short stretch of protein sequence - "RRARSAS" - not found in its predecessor coronaviruses. This sequence, he learned, mimics part of a protein that helps the human body regulate salt and fluid balance.

Venkatakrishnan, director of scientific research and partnerships at AI start-up Nference, wondered whether this change might allow the virus to act as a kind of Trojan horse. Could this explain its high infection and transmission rates? And perhaps even why people with cardiovascular disease were experiencing more-severe cases, since sodium can impact blood pressure?

"It was a surprise, completely accidental," he recalled. "The machine just spotted that."

Millions of gigabytes of data are being generated by the pandemic each day in medical records and other information on infected patients. Blood results. Age, race. Genetic testing. Interventions attempted. Outcomes. Now, nearly 10 months into the outbreak, scientists are starting to make connections in this jumble of letters and numbers with the help of artificial intelligence, leading to new theories about the virus and how to stop it.

While the human brain can process only so much information at a time, machines are whizzes at finding subtle patterns in huge amounts of data, and they are being deployed against covid-19 - the disease caused by the coronavirus - in ways only imagined in the past. Data scientists are aiming AI at some of the coronavirus's biggest mysteries - why the disease looks so different in children vs. adults, what makes some people "superspreaders" while others don't transmit the virus at all - and at other, less prominent questions on which little headway has been made.

At Northwestern University, a modeling lab is running large-scale simulations on the effects of travel restrictions and social distancing on infection rates. The U.S. Energy Department's Argonne National Laboratory is using AI to home in on the most promising molecules to test in the lab as possible treatments. In Egypt, AI is helping counter coronavirus misinformation in Arabic.

Jason Moore, director of the Penn Institute for Biomedical Informatics at the University of Pennsylvania, who is helping put together an international covid-19 data consortium, said that if the virus had hit 20 years ago, the world might have been doomed.

"But I think we have a fighting chance today because of AI and machine learning," he said.

In April, a computer sorting through medical records confirmed that a loss of smell and taste, which had been reported mostly anecdotally, was one of the earliest symptoms of infection - a discovery that helped persuade the Centers for Disease Control and Prevention to add anosmia, or loss of smell, to its list of symptoms. In June, a deep dive into the records of nearly 8,000 patients found that while only a small fraction had obvious and catastrophic blood clots, nearly all had worrisome changes in their blood coagulation.

Other researchers piggybacked on Venkatakrishnan's finding of the aberrant genetic sequence to understand how the virus binds to cells, and to use that knowledge to develop drugs that aim to reduce transmission.

In a follow-up paper published in September, Venkatakrishnan and his colleagues reported that a computer analysis showed this "evolutionary tinkering" by the coronavirus, which appears to make the virus look like a friend rather than a foe to the human immune system, mostly affects the lungs and blood vessels - a finding that offers new insight into the clinical symptoms doctors are seeing in hospitals.

The early progress in AI has been promising, but critics worry that efforts to harness covid-19 data have been disjointed and frustratingly slow. Others are concerned that analyses based on faulty or biased algorithms could exacerbate existing racial gaps and other disparities in health care.

One of the biggest challenges has been that much data remains siloed inside incompatible computer systems, hoarded by business interests and tangled in geopolitics. Academic researchers, medical societies and private companies have launched a number of efforts to try to overcome those barriers by creating their own giant databases of health records and other data - but progress has been slow.

The largest - a $20 million, four-year project by the National Institutes of Health led by scientist Bill Kapogiannis - is not expected to yield results until December at the earliest. But Kapogiannis said he is optimistic the pace of science will accelerate with computing power behind it.

"The human brain becomes pretty quickly overwhelmed by the permutations and combinations of these things," he said. "But when you put AI into it, it can run countless simulations and can home in on important stuff very quickly and effectively."

Yet with the stakes so enormous, Isaac Kohane, a Harvard bioinformatics researcher, said the world is not moving fast enough to tap into the power of electronic medical records and other data. He argues that "parochial interests have slowed our national response."

For example, reviewing data from 96 hospitals in several countries from Jan. 1 to April 11, scientists found that many patients had "really off-the-charts" readings of blood clotting, he said. But because of the difficulty of consolidating all that information, the analysis was not conducted until early summer - delaying the use of blood thinners at some institutions. He worries the wait cost thousands of lives.

"It's not that we are failing to learn from our data," Kohane said. "We are not learning fast enough."

- - -

The dawn of AI in medicine was supposed to have come and gone.

In 2008, Google rolled out a flu tracker it said would revolutionize our public health response to infectious diseases by predicting outbreaks before they occurred. In 2014 and 2015, IBM made headlines when it brought Watson - its "Jeopardy!"-winning, recipe-making, call-center-innovating computer brain - into cancer care and promised it would upend treatment by having a computer recommend personalized plans for every patient based on their histories, genetic data and other information.

But those efforts wildly overpromised, and the second machine age, as scholars called it, failed to materialize amid technical failures, skepticism from some doctors and a clash among scientists over what is perhaps the most fundamental part of the scientific process.

The traditionalists believed science should begin with hypotheses, which are then systematically tested. What AI researchers were doing - starting with the data and then looking for correlations - was derisively dismissed by critics as "p-hacking." Popularized in an article in Nature News, the term refers to manipulating data, for instance by running many analyses and reporting only the ones that pan out, until a result crosses the threshold for statistical significance, which is measured by what's known as a "p-value."

It was not until relatively recently that AI became more accepted as a tool for pinpointing "signals" to guide researchers, rather than as a method for generating definitive conclusions. Covid-19 has been a big part of that change.

One of the most ambitious new data projects is led by Maryellen Giger, a radiology professor at the University of Chicago. Giger is working with the three major medical-imaging societies to create an open-source repository of 60,000 covid-19 images - with a focus on chest radiographs.

In the United States, the dominant method of diagnosing coronavirus illness has been the polymerase chain reaction test, which detects the virus's genetic material (RNA) in swabs taken from the nose or throat. The shortcomings of that strategy have been well documented: the tests can miss infections, and in some areas results have taken so long to come back that they were useless in controlling transmission.

During the crisis in Wuhan, in contrast, some Chinese doctors used chest images as part of their diagnoses and found that even those without symptoms sometimes had the telltale "ground-glass opacities" that showed their infection.

Giger wants to test those anecdotal reports. She said early AI analyses, which examine the images pixel by pixel, suggest that the images might be useful not only as a diagnostic tool but also as a way of monitoring the progression of illness.

"You can see the creep of the disease as you look at these images," she said.

In another prominent effort, this one based at Harvard, Kohane and his collaborators have put together a data-sharing consortium of nearly 200 medical institutions, with records on 50,000 patients. Many of the insights have come only in hindsight. Signs of kidney damage, for instance, could be seen in the data as far back as February, yet when the pandemic hit New York state in March and April, doctors were taken by surprise at the number of patients who needed dialysis.

Perhaps the most eye-opening finding for him, Kohane said, is that the course of the disease appears remarkably similar across countries despite huge differences in death rates - suggesting that treatment, age of the population and preexisting conditions may be responsible for varying mortality, rather than something about the virus itself.

"If you had asked me in April, 'Is there something different about the virus across countries?' I would have said yes. But what came out of our analysis is remarkable consistency," he said.

Kohane said the coronavirus has underscored the need for health systems to pivot to a more predictive approach to medicine.

"We don't wait for the hurricane to hit Florida before we start preparing," he said. A scientist might look at storms forming in the Sahara, which can turn into tropical cyclones and march across the Atlantic, to figure out what is coming next in the Caribbean and the United States.

"This is an opportunity to realize how seriously we need to take this function of monitoring our own health-care systems with the same responsibility a meteorologist feels for their city," Kohane said.

Adrienne Randolph, a critical-care specialist at Boston Children's Hospital, is helping coordinate an effort across more than 70 hospitals to look at why children appear to be affected by the virus so differently than adults are.

Their database includes more than 1,000 children who were hospitalized for covid-19 or the related multisystem inflammatory syndrome in children (MIS-C). With no obvious clue to what might predict MIS-C, the researchers are conducting genetic sequencing and regularly testing blood samples for antibodies and other changes to try to better define the course of the illness.

One critical and time-sensitive question they are exploring is whether a subgroup of children may be more vulnerable to adverse effects from vaccines. One theory holds that MIS-C is driven by the body's antibody response, which throws the immune system off balance and brings on the illness.

"We want to make sure to anticipate what we can," she said, "such as, could a vaccine trigger MIS-C?"

- - -

In April, Leo Anthony Celi, a critical-care physician at Beth Israel Deaconess Medical Center in Boston and a Massachusetts Institute of Technology AI researcher, watched in horror as the pandemic unfolded in New York City - "when we were really in hell," he said. He saw the bodies in trailers and monitored the count of ventilators and beds.

He was especially concerned about rationing protocols that could determine who lived and who died. The algorithms were built on data showing Black and Hispanic patients dying at disproportionate rates, so he worried they would flag such patients as being at higher risk of death. That might make doctors more likely to withhold resources from them - sealing their fate.

"Algorithms are not perfect scorers," he said. "They make mistakes all the time. If you apply these to covid, you might identify someone as being likely to die but they will actually live."

Celi said that an AI analysis of covid-19 mortality data, under review at a journal, shows that ventilator and bed allocation plans in many states do not accurately predict who might benefit most from treatment. He is urging states and hospitals to rework those plans for a second wave of the pandemic, with an eye toward minimizing biases.

Moore, the Penn bioinformatics expert, has similar concerns about analyses on the efficacy of therapies.

"If you're only studying primarily Caucasian populations and want to apply that nationally, that may not work as well on a more diverse population," he said. "AI algorithms themselves can be biased, and can pick and inflate biases in the data. Those are the things I worry about."

- - -

Cambridge, Mass.-based Nference is made up of 250 computer programmers, PhDs in medical or biological sciences, and other specialists. Before the pandemic, the company, which raised nearly $145 million from venture capital funds and other investors, had secured partnerships with several prestigious institutions - most prominently the Mayo Clinic and Janssen Pharmaceuticals - to help them manage and analyze their medical data.

The company's previous focus was cancer. But since April, it has made headlines for its work on covid-19.

In peer-reviewed publications, the team has confirmed reports out of Britain that steroids could be effective in treating severely ill covid-19 patients experiencing respiratory distress. It found that a small percentage of people might be long-term "shedders" of the virus - for up to 22 days. And it identified existing childhood vaccines that may provide some protection against covid-19 infection.

It has also partnered with the state of Minnesota to develop a way to predict covid-19 hot spots so that public health resources such as test sites could be better deployed.

In addition to clinical data, the company is analyzing 50,000 public documents from academic journals, filings with the Securities and Exchange Commission, and other public sources of data on covid-19 - many times more than the average researcher can follow and digest.

Venky Soundararajan, co-founder of Nference and a biological engineer, said that seeing the scope of the information gathered on covid-19 makes him hopeful and appreciative that so many minds around the world - both human and artificial - are working on the problem.

"It makes you very humble very quickly," he said. "What you know is only an atom in the universe of what's out there."

The Washington Post