How Virologists in China Worked to Warn the World About COVID-19

On the evening of December 30, Chinese virologist Zhengli Shi was in Shanghai attending a conference when her boss in Wuhan, the director of the Wuhan Institute of Virology, reached her by cell phone. “Around ten o’clock in the evening,” Shi told me later. “I have never heard about an unknown pneumonia before that.” Now she heard. There was an atypical form of pneumonia, manifest in a smattering of cases across the city, its cause so far unknown, with preliminary lab results indicating that a coronavirus could be involved. Some samples from patients had just arrived at the WIV and the director wanted Shi’s lab to work on them.


“I was asked to take action, yeah,” she said. “To do the detection”—to identify the virus more conclusively. Shanghai time is thirteen hours ahead of New York’s, so Pollack’s first posting on ProMED hadn’t yet gone up; most of the world was still oblivious, except those, like Henry Li at the Susan Weiss lab in Philadelphia, connected to Wuhan by WeChat or other social media. In Wuhan itself, word was out, but only to a select few.

Shi immediately called her own lab, found that three night-owl students were still there, and asked them not to go home. Instead, despite the late hour, they should wait for an extract of viral RNA, drawn from hospital samples, that was due to arrive from another laboratory at any moment. She instructed them to begin work, using two methods, to identify what sort of virus it was. The first method was a broad PCR test that would detect any form of coronavirus. The second PCR method was more specific for detecting SARS-related coronaviruses.


Shi herself had a meeting the next morning in Shanghai, but as soon as that ended, on December 31, she grabbed a train home to Wuhan. She went straight to her lab and saw the PCR results, which the students had gotten that morning. “The machines read the data,” she said, and from that “we know that it’s a SARS-related coronavirus.” Shi’s group hadn’t yet sequenced the full genome themselves, but they possessed partial sequence data from another lab.

“My first reaction is we need to compare the sequence,” she said—compare the new virus’s genome, that is, with the genomes of bat coronaviruses detected from samples in her own lab, to see whether, by some horrible mischance, there was a match. “It’s normal!” she said to me with some vehemence, reacting against criticisms that have come at her since. If she had “frantically” gone to her own data, as reported, didn’t that imply a guilty awareness that the new virus had likely leaked from her lab? No, she said, it did not. What it implied was normal diligence on an important point. And the new virus did not match anything from her sequence records, if she can be believed (and I think she can, though I can’t prove it).

“So, the afternoon of December 31, I already know it’s nothing related to what we have done in our laboratory.” She felt great relief. That evening she met with local officials, from the Wuhan Municipal Health Commission, and reported her lab results.

Then her team plunged back to work. Within two days, they had a provisional draft of almost the full genome sequence. They may not have been the first, but they were among the earliest, to sequence a near-complete genome. Why didn’t they publish it at once? Because of concern for accuracy over speed. The first SARS genome as published, back in 2003, had contained mistakes; sequencing technology was less precise and reliable then, and haste had preempted confirmation. This time should be different.

The Health Commission asked two other institutions besides Shi’s lab to produce sequences, all working independently, and then they compared versions and resolved technical disparities. By January 6, she had a complete genome, correct and confirmed. Out of continuing caution, though, she still didn’t release it. And so the Zhang version, released by Eddie Holmes through the Virological website early on January 11 (UTC), and the sequences submitted by George Gao’s team to GISAID late on January 9 (UTC), were the first SARS-CoV-2 genomes made widely available.

The loss of priority on that point does not seem to have bothered her. Neither did the early questioning about whether this novel virus might have leaked from her laboratory. “I think it’s normal,” she said of such questioning. People would speculate, would make accusations, but that was because they didn’t know the intricacies of coronaviruses. She could see differences, a complex of features, distinguishing this one from any bat virus she had sequenced, let alone anything she had grown. “But at the beginning, I think, Okay, it’s not necessary to explain too much.” Maybe. If there was any lull in the demand for explanations, though, it wouldn’t last long.

Over the next couple of weeks, Shi and her group were busy in the lab. Besides assembling a full genome sequence of the virus from the fragments they had detected by PCR, they compared that sequence to the sequence of SARS-CoV and found them to be 79.5 percent identical. So this was another SARS-like virus, but it wasn’t the original SARS virus, because a 20 percent difference implies many decades of divergent evolution. They assembled four more complete genomes, from four other patients, and each of those was an almost exact match to their first. That helped confirm what they were seeing.

They gave their virus the tentative label nCoV-2019, a slight variant of what the WHO had begun calling it; but the time was still early, and naming was in flux. By way of colleagues at Wuhan Jinyintan Hospital, which treated many of the first few dozen cases, they obtained a sample from the deep airways of one patient, and from that they grew live virus. They tested that virus against cells in culture, finding that it could use the same receptor for cell entry as SARS-CoV, the ACE2 receptor. Furthermore, it could use the ACE2 of a horseshoe bat, of a civet, and of a pig, as well as the ACE2 of a human—so this virus was already broadly adapted, it seemed, for infecting a variety of hosts.


One further bit of Shi’s lab work, from these early weeks of 2020, would draw continuing attention. That’s putting it mildly. In fact, this discovery would become like a Rorschach inkblot, susceptible to drastically different and subjective and, in some cases, impassioned interpretations. (Is it coincidence that Hermann Rorschach’s Card 5 looks so much like a bat—or is that just me?) Having noticed a strong similarity between one region of the new genome and something naggingly familiar, they gave the similarity a closer look: they retrieved their full genome sequence of a bat virus from the Mojiang mine, the one they had labeled RaTG13, and compared that with their genome from Jinyintan Hospital. The similarity was 96.2 percent. That made RaTG13, at least for the moment, the closest known relative of the pandemic virus.

On January 23, 2020, Zhengli Shi and her colleagues announced these findings to the world. They did it in the form of a preprint (a draft paper, made available on a website, not yet peer-reviewed and published in a journal) posted through the preprint repository bioRxiv (pronounced “bio-archive”), which is hosted by the Cold Spring Harbor Laboratory, an august institution on Long Island. They also submitted the paper to Nature, where it was peer-reviewed quickly and published on February 3.

In the meantime, case numbers rose quickly in China—from forty-one lab-confirmed cases at Jinyintan Hospital on January 2, then outbreaks in other parts of the country by January 19, exploding to 11,791 Chinese cases by January 31—and the virus escaped, by way of travelers, beyond national boundaries. Thailand reported a confirmed case on January 13, in a woman from Wuhan who had come to Bangkok on a visit. Japan confirmed a case two days later and then, on January 20, both South Korea and the United States reported their first recognized cases. Early reporting from the Wuhan Municipal Health Commission linked many of the cases to the Huanan Seafood Wholesale Market, as mentioned earlier.

That connection, to a market including wildlife, drove the provisional narrative about how and from where this novel virus might have gotten into humans. But because the market had been closed by Wuhan authorities, and the place cleaned, on January 1, its potential role was never thoroughly investigated. Concern grew around the world, as people gradually realized that this tiny microbe could become a global problem. Fragmentary data and anecdotal scraps fed speculation, venturesome hypothesizing, hasty conclusions, and confusion, especially regarding the origin of the virus. Where had it come from, how had it taken shape, and how had it gotten into people? January was a feverish month.

Two early research papers exemplify the dizzy eagerness among some scientists to tell an arresting story. The first, from a Chinese team affiliated with Peking University, Guangxi University of Chinese Medicine, and other institutions, noted certain parallels between the genome of the new virus and the genomes of snakes. Those parallels involved something called codon usage, referring to the various ways by which the letters of a genome, in three-letter clusters (called codons), can specify that a given amino acid be inserted as the next element in a protein being built. In other words, codon usage is spelling.

All you need to know about it, for purposes here, is that there are alternate possible ways to spell the coding for each amino acid—just as there are alternate possible ways to spell the English word “color.” If you see it spelled “colour,” that delivers a hint: British. Similarly, these Chinese researchers claimed to see a hint in the codon usage of the new coronavirus: snake. Since the codon usage in the virus seemed to resemble the codon usage in some snakes, could that mean the virus had been a longtime infection of snakes? It was tenuous.

The scientists looked at two kinds of snake, both native to Hubei province around Wuhan: the many-banded krait and the Chinese cobra. Both snakes spelled some of their amino acids with codon usage resembling the usage in the new coronavirus—more similar than that seen in birds, hedgehogs, marmots, humans, or bats. “Snakes were also sold at the Huanan Seafood Wholesale Market,” the authors noted, although they don’t seem to have known which kinds. Many-banded krait and Chinese cobra might well have been market-available, because they are favored for that ancient Cantonese delicacy, snake soup.

But the researchers were cautious, merely suggesting that their codon-usage analysis “provides some insights to the question of wildlife animal reservoir” of the virus, “although it requires further validation by experimental studies in animals.” For instance, experiments to test whether the new virus could even survive in snakes—experiments that these researchers didn’t do. This study appeared in a peer-reviewed monthly, the Journal of Medical Virology, but it wasn’t warmly embraced by the scientific community, to say the least. It made headlines in tabloids, it got its moment on CNN, it appealed to nonscientists with a certain taste for the lurid, but the hypothesis came and went quickly. Other scientists looked at the evidence, such as it was, and essentially said: phooey.


The second incendiary paper that month came from a group of scientists in New Delhi, who posted it as a preprint on bioRxiv on January 31. These authors purported to have discovered four “unique” stretches of amino acids in the spike protein of the new coronavirus, each stretch six to twelve aminos long, that bore an “uncanny similarity” to amino acid placements in corresponding proteins in HIV-1, which includes the pandemic subtype of the AIDS virus.

Such similarity, they claimed, was “unlikely to be fortuitous in nature.” They called these stretches “insertions” into the coronavirus, implying that it had been assembled in a lab, possibly using portions of the HIV-1 genome to make it more infectious to human cells. But as expert critics soon pointed out, the “uncanny” coincidence was not uncanny at all. The “insertions” were not insertions; they were commonplace, resembling stretches seen in many other creatures (including the bat virus RaTG13).

The whole paper was a bollix, trumpeting a coincidence about as improbable and suspicious as finding (as you can do) the words “mischievous,” “players,” “overcharged,” and “countrymen” within the complete works of Shakespeare. Was the clever playwright from Stratford-upon-Avon slyly bragging that his mischievous players had overcharged their countrymen? Doubtful. Equally doubtful that the new virus had grabbed bits of its genome, by some form of implausible recombination, from HIV-1.

Very quickly, this paper was taken down, and if you find it online now, you’ll see a large gray stamp across each page: WITHDRAWN. The authors issued a statement saying, “To avoid further misinterpretation and confusions world-over, we have decided to withdraw the current version of the preprint and will get back with a revised version after reanalysis, addressing the comments and concerns.” But it seems they never have.

“I was very angry,” Zhengli Shi told me. The snake story and the “uncanny similarity” paper are just a sampling of the distractions, false leads, and misapprehensions that crackled across the internet in early 2020. Some of them targeted her lab, implicitly or explicitly. “So I tried to isolate this information, misinformation,” she said—to block it out, to quiet herself, to focus. “And continue to work.”
