March 2010

I. Genome, Diversity, and Origin(s)

A new (and mysterious) lethal infectious disease, eventually named AIDS, was recognized in the U.S. in 1981. Within a year, it was clear that the causative agent was a previously unknown virus. The virus was isolated (first in France and then in the U.S.) in 1983 and eventually designated HIV. In the years since, more work has been done on HIV than on any other virus. This intense effort to characterize HIV disease reflects the combination of essentially 100% lethality and a long asymptomatic period during which transmission via blood and sexual fluids can occur. As of 2006, the cumulative world death toll from AIDS had surpassed 20 million, and over 40 million people worldwide were infected with HIV.

A review article by Rambaut et al. in Nature Reviews Genetics (2004) is titled "The Causes and Consequences of HIV Evolution". Print the summary, which gives short statements about much of what we will discuss this week.

We will start our investigation of HIV epidemiology and evolution by looking today at the specifics of the HIV genome that distinguish it from other retroviruses, the level of diversity of the worldwide population of HIV, and the extent to which we understand the historical origin(s) of the HIV pandemic.


1. What are the specific properties of the HIV genome and its expression that set it apart from other retroviruses?

The handout shows and lists the additional genes of HIV (i.e., those other than gag, pol, and env). These genes (vif, vpr, tat, rev, nef, vpu) are expressed from alternatively spliced products of the primary transcript. The six proteins they encode are involved in various aspects of virus production, as described in the handout figure.


2. How do we characterize the diversity and relationships of the various strains of HIV?

We classify the diversity of HIV at several "levels".

First, we recognize two distinct HIVs, HIV-1 and HIV-2. Both have essentially the same virion structure, genome organization, and general replication process, and both cause AIDS. However, HIV-2 is much less prevalent and somewhat less pathogenic than HIV-1. Almost all of the AIDS cases we discuss are caused by HIV-1.

HIV-1 strains can be broken into three groups, designated M, N, and O, which differ by over 40% in nucleotide sequence. About 99% of AIDS cases are due to group M strains.

Among the M group HIVs, there are sub-types designated A through K, which differ by up to about 35% in nucleotide sequence from each other. Most of the HIV in North America is sub-type B.
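
The percentages quoted above (over 40% between groups, up to about 35% between subtypes) are pairwise differences between aligned nucleotide sequences. As a minimal sketch of how such a figure is computed, the Python below counts mismatched sites between two sequences that are assumed to be already aligned. The function name percent_difference and the two short sequences are made up for illustration; they are not real HIV data, and real comparisons use full gene or genome alignments and often a correction for multiple substitutions at the same site.

```python
def percent_difference(seq_a: str, seq_b: str) -> float:
    """Percent of aligned, ungapped sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # skip alignment gaps rather than count them as differences
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no ungapped sites to compare")
    return 100.0 * mismatches / compared


# Toy aligned fragments (hypothetical, NOT real HIV sequence)
strain_1 = "ATGGGTGCGAGAGCGTCAGT"
strain_2 = "ATGGGAGCGAGAGCCTCAAT"
print(percent_difference(strain_1, strain_2))
```

Skipping gapped sites is one common convention; counting a gap as a difference would give a larger number, so the convention should always be stated alongside the percentage.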


3. How were the basic historical origin(s) of HIV figured out about ten years ago?

A major advance was published in Nature in 1999 (from five research labs, in USA, UK, and France) titled "Origin of HIV-1 in the chimpanzee Pan troglodytes troglodytes". Based on comprehensive nucleotide sequencing studies on HIV-1 strains and strains of Simian Immunodeficiency Virus (SIV), the authors concluded the following:

"All HIV-1 strains known to infect man, including HIV-1 groups M, N, and O, are closely related to just one of the SIVcpz lineages, that found in P. t. troglodytes. ... These results, together with the observation that the natural range of P. t. troglodytes coincides uniquely with areas of HIV-1 group M, N, and O endemicity, indicate that P. t. troglodytes is the primary reservoir for HIV-1 and has been the source of at least three independent introductions of SIVcpz into the human population."

Then, in mid-2000, another large collaborative study (from five major labs) was published in Science titled "Timing the ancestor of the HIV-1 pandemic strains." This study concluded the following:

"Using parallel supercomputers and assuming a constant rate of evolution, (we constructed) a comprehensive full-length env sequence alignment, and (from this) estimated the date of the last common ancestor of the main (M) group of HIV-1 to be (approximately) 1931. Analysis of gag gene alignment ... supported these results." A summary diagram is shown in Figure 4 from this paper.

In late 2000 came an article in Journal of Virology titled "Unprecedented degree of HIV-1 group M genetic diversity in the Democratic Republic of Congo suggests that the HIV-1 pandemic originated in Central Africa". This study concluded the following:

"Overall, the high number of HIV-1 subtypes cocirculating, the high intrasubtype diversity, and the high numbers of possible recombinant viruses as well as different unclassified strains are all in agreement with an old and mature epidemic in the DRC, suggesting that this region is the epicenter of HIV-1 group M."

So, by the end of 2000 we finally had a fairly solid understanding of the basic what, when, and where questions pertaining to the origins of HIV. These results have held up well, as can be seen in an article in Nature in November 2006 titled "Human immunodeficiency viruses: SIV infection in wild gorillas".