February 2010


Retroviruses cause various cancers, leukemias, and immunodeficiencies in a wide variety of animals. The most widespread retrovirus in humans is HIV. Retroviruses differ from other RNA viruses in several important ways. One oddity is that the +ssRNA genome is present in two copies in the virion. Also, retrovirus virions contain a novel polymerase, commonly called reverse transcriptase, that uses the viral RNA as a template to synthesize DNA (so it is an "RNA-dependent DNA polymerase"). After virion entry, this enzyme produces a dsDNA copy of the viral genome, and this dsDNA is then incorporated into a chromosomal site in the host cell nucleus by another virion enzyme, called integrase (whether that site is chosen at random is discussed below). All viral gene expression, as well as production of full-length RNA genomes for assembly of new virus particles, occurs from this integrated viral DNA.

All retroviruses have a broadly similar genome organization, "gag--pol--env", which is expressed to produce the eight or so virion proteins. For now we will look at the features of replication that are common to all retroviruses. Later in the course, we will look at more particular details of the genome structure and replication of HIV.


1. How does a dsDNA copy of the retroviral RNA genome get synthesized and then integrated into cellular chromosomal DNA?

Virion entry via fusion at the plasma membrane releases the virion core into the cytoplasm. The core partially disassembles, allowing the reverse transcriptase carried in the virion to start using the RNA genome as a template to synthesize a complementary strand of DNA. Reverse transcriptase also has an "RNase H" activity, by which it digests the RNA strand of the resulting RNA-DNA hybrid. The ssDNA is then converted to dsDNA, also by reverse transcriptase. To ensure that no sequence is lost from the ends of the genome because of the need for primers, the overall process is quite complex, involving "jumps" and duplicate synthesis of the end regions of the genome. The net result is a dsDNA molecule that is actually longer than the RNA genome. Both ends of the DNA contain the complete LTR ("long terminal repeat") sequence, whereas the RNA genome had part of this sequence at its 5' end and part at its 3' end.
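The end result of reverse transcription (not the jumps themselves) can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the segment labels R, U5, and U3 are the conventional names for the pieces of the LTR and do not appear in the notes above, and the placeholder "sequences" and their lengths are made up purely for illustration.

```python
# Toy model of the END RESULT of reverse transcription.
# Assumptions (not from the notes): the conventional segment labels
# R, U5, U3, and the made-up placeholder sequences below.
R = "RRRR"            # repeat found at both ends of the RNA genome
U5 = "55555"          # unique region near the 5' end of the RNA
U3 = "333333"         # unique region near the 3' end of the RNA
BODY = "gag-pol-env"  # stand-in for the coding region

def rna_genome():
    """RNA genome, 5' to 3': part of the LTR at each end."""
    return R + U5 + BODY + U3 + R

def proviral_dna():
    """dsDNA product (one strand shown): a complete LTR at each end."""
    ltr = U3 + R + U5
    return ltr + BODY + ltr

rna = rna_genome()
dna = proviral_dna()
extra = len(dna) - len(rna)  # how much longer the DNA is than the RNA
print(len(rna), len(dna), extra)
```

In this sketch the DNA is longer than the RNA by exactly one U3 plus one U5, which is the duplicated end material referred to above.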

While this DNA synthesis is proceeding, it is likely that the overall complex is being transported towards the nucleus. Regardless of exactly where the dsDNA is when it is completed, it is then transported into the nucleus, still in a complex with part of the material from the original viral core (in particular, the enzyme integrase). The mechanism of transport into the nucleus is known to differ among retroviruses: some, such as HIV, enter through a nuclear pore, while for others nuclear entry requires the breakdown of the nuclear envelope that accompanies cell division.

Integrase becomes bound to both of the LTRs (thus holding the dsDNA in a circle) and also interacts with a site in the cellular chromosomal DNA. The enzymatic activity of integrase then physically incorporates the viral DNA into the chromosomal site. Early experiments suggested that the selection of the chromosomal site for integration was random. However, more recent studies have shown that the selection may not be completely random. For example, an article by Liu et al. in the Journal of Virology (August 2006) is titled "Integration of Human Immunodeficiency Virus Type 1 in Untreated Infection Occurs Preferentially within Genes."

So, all of these events result in an approximately 10 kbp dsDNA version of the retrovirus genome becoming present "somewhere" in the cell's genome. Up to this point, no viral gene expression has occurred in the infected cell.



2. How does the retroviral genome get expressed, using just one promoter, to produce lots of the "gag" and "env" proteins and small amounts of the "pol" proteins, as needed for assembly of new virions?

My overview figure shows how retroviral genome expression occurs. In the nucleus, transcription by cellular RNA polymerase starting in the left LTR gives rise to full-length (~10 kb) primary transcripts. Some of these remain intact and some get spliced. Following transport to the cytoplasm, the unspliced and spliced RNAs are translated to give, respectively, lots of copies of the gag and env polyproteins. During translation of the full-length RNA, ribosomal frameshifting (for HIV and HTLV) occurs approximately 10% of the time to produce small amounts of gag-pol polyprotein (by avoiding the translational stop codon at the end of gag).
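How a frameshift lets a ribosome avoid a stop codon can be illustrated with a short Python sketch. Everything here is a simplification for illustration: the toy mRNA is invented (it is not a real gag-pol sequence), the function name and its slip_after parameter are hypothetical, and the slip is modeled simply as backing up one nucleotide (a -1 shift) after a chosen codon.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}

def read_codons(mrna, slip_after=None):
    """Read codons from position 0 until a stop codon or the end of the RNA.

    slip_after is the index of the last nucleotide of the codon after
    which the ribosome backs up one nucleotide (None means no slip).
    """
    codons = []
    pos = 0
    while pos + 3 <= len(mrna):
        codon = mrna[pos:pos + 3]
        if codon in STOP_CODONS:
            break  # translation terminates here
        codons.append(codon)
        pos += 3
        if slip_after is not None and pos - 1 == slip_after:
            pos -= 1  # re-read one nucleotide: now in the -1 frame
    return codons

# Invented toy message: in the starting frame, UAA at index 15 is a stop.
TOY_MRNA = "AUGGCUUUUUUAGGGUAAGCCGAUUGUAA"

no_shift = read_codons(TOY_MRNA)               # stops at the first UAA
shifted = read_codons(TOY_MRNA, slip_after=11) # slips after the codon UUA

print(no_shift)
print(shifted)
```

Without the slip, reading ends after five codons at the in-frame stop. With the slip, the ribosome is in a different frame, never "sees" that stop as a codon, and continues to a later one, giving a longer product. This is the sense in which gag-pol is made by avoiding the stop codon at the end of gag.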

Note that if RNA splicing did not occur for some fraction of the transcripts, there would be no way to express the env gene. It is only via splicing that the AUG translational start codon at the beginning of env ever gets "seen" as a start codon by ribosomes. Also, if ribosomal frameshifting did not occur at least once in a while during translation of the full-length transcripts, no pol proteins would ever get made. So, it is very clear that both of these "partial processes" (splicing some but not all of the transcripts, and ribosomal frameshifting during a few, but not most, of the passages of ribosomes down the full-length transcript) are not mistakes, but rather part of the evolved pattern of expression of retrovirus genomes. The signals for these partial processes are somehow encoded in the nucleotide sequence.
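The bookkeeping behind this argument can be written out as a small Python sketch. It rests on loud simplifications that are not in the notes: each transcript is translated exactly once, every unspliced transcript is translated rather than packaged as a genome, and the spliced fraction of 0.5 is an arbitrary illustrative number. Only the frameshift rate of about 10% comes from the text above. The function name is hypothetical.

```python
def expected_products(n_transcripts, spliced_fraction, frameshift_rate):
    """Expected polyprotein counts, assuming one translation per transcript."""
    spliced = n_transcripts * spliced_fraction  # these yield env
    unspliced = n_transcripts - spliced         # these yield gag or gag-pol
    gag_pol = unspliced * frameshift_rate       # ribosome shifted frame
    gag = unspliced - gag_pol                   # ribosome stopped at end of gag
    return {"env": spliced, "gag": gag, "gag-pol": gag_pol}

# 0.5 is an arbitrary assumption; 0.10 is the rate quoted in the notes.
normal = expected_products(1000, spliced_fraction=0.5, frameshift_rate=0.10)
no_splicing = expected_products(1000, spliced_fraction=0.0, frameshift_rate=0.10)
no_frameshift = expected_products(1000, spliced_fraction=0.5, frameshift_rate=0.0)

print(normal)
print(no_splicing)
print(no_frameshift)
```

Setting the spliced fraction to zero gives no env at all, and setting the frameshift rate to zero gives no gag-pol at all, which is the point made above: both partial processes are required for a complete set of virion proteins.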

As the various polyproteins are being made, specific proteolytic cleavages give rise to the various structural proteins of both the capsid and the envelope of the assembling virions. Virion core assembly in the cytoplasm involves the aggregation of gag and gag-pol proteins with full-length (genomic) RNA molecules. Budding of cores through patches of plasma membrane containing the envelope glycoproteins is coupled with proper assembly and proteolytic cutting of the gag and gag-pol polyproteins. For HIV, the requirement for proteolytic cuts for proper assembly and release of virions is the basis of some of the most powerful anti-HIV drugs, the protease inhibitors. Inhibiting the action of the viral protease prevents the final assembly and release of infectious virions.