INTRANUCLEAR TRAFFICKING OF FLUORESCENT HIV-1 PARTICLES
Alberto Albanese
Ph.D. Thesis in Molecular Biology
Supervisor: Dott.ssa Anna Cereseto
Scuola Normale Superiore
TABLE OF CONTENTS
Introduction
    The Retroviridae Family of Retroviruses
    HIV-1 Structure
    Viral Genome
    HIV-1 Proteins
    HIV-1 Replication cycle
        Viral Entry
        Retrotranscription
        Nuclear translocation
        Integration
        Viral Transcription
        Viral Assembly
    Integrase
        Integrase Structure
        Enzymatic Activity
    Nuclear Structure
        The Nucleus
        Chromatin Organization
        Interphase chromatin
        Epigenetic
        Histone Post-Translational Modifications
        Acetylation
        Phosphorylation
        Methylation
        Ubiquitylation and Sumoylation
        DNA Methylation
        Spatial Organization of Genomes
    Relationships between viral integration and chromatin organization
    Interaction of HIV-1 integrase with host cellular proteins
    PICs nuclear import
    Fluorophores and Their Application
        Fluorescent Proteins
        Confocal microscope
        Fluorescent viruses
        Imaging viral entry
        Following viral cytoplasmic trafficking
        Visualizing viral assembly
        Fluorescent HIV-1 viruses
        Imaging HIV-1 entry
        Following HIV-1 cytoplasmic trafficking
        Monitoring interactions with cellular restriction factors
        Visualizing HIV-1 assembly and budding
        Visualization of viral synapses
Material and Methods
    Cells and antibodies
    Expression plasmids
    Recombinant proteins
    IN activity assay
    Virus Production and Infection
    Immunofluorescence and NERT fluorescence labeling
    Image acquisition and analysis
Results
    Molecular engineering of fluorescent HIV-1 particles
        Construction of fluorescently labeled IN proteins
        IN-EGFP is trans-incorporated into the viral particles
        Visualized IN-EGFP particles are virions
    Visualization of HIV-IN-EGFP virions in the cytoplasm
        IN-EGFP dots in the cell are viral particles
        Visualization of IN-EGFP retrotranscription complexes: IN-EGFP PICs
    Functional IN-EGFP PICs translocate in the nucleus
        Are IN-EGFP viral particles really visualized in the nucleus?
        Translocation of HIV-IN-EGFP in the nucleus follows CA disassembly
        Nuclear IN-EGFP particles bind viral cDNA
        HIV-IN-EGFP nuclear translocation kinetic
    Distribution of IN-EGFP particles in the nucleus
        Peripheral distribution of HIV-IN-EGFP
        Characterization of heterochromatin in HeLa H2B-EYFP cells
        IN-EGFP viral particles localize in less condensed chromatin
        IN-EGFP viral particles selectively target euchromatin
    HIV-IN-EGFP as nuclear import assay
        Influences of drugs in nuclear import
        Transportin SR2 imports PICs into the nucleus
    Molecular engineering of fluorescent HIV-1 particles
    Visualization of HIV-IN-EGFP virions in the cytoplasm
    Functional IN-EGFP PICs translocate in the nucleus
    Distribution of IN-EGFP particles in the nucleus
    HIV-IN-EGFP as nuclear import assay
    Future perspectives
References
CHAPTER 1
THE RETROVIRIDAE FAMILY OF RETROVIRUSES
The human immunodeficiency virus type 1 (HIV-1) is an enveloped virus characterized by a cone-shaped capsid containing the viral genome, which consists of two copies of positive-sense single-stranded RNA. HIV-1 is classified as a retrovirus and, like all retroviruses, belongs to the Retroviridae family (International Committee on Taxonomy of Viruses (6th) et al., 1995). The distinguishing feature of retroviruses, as suggested by the term “retro”, is their ability to reverse transcribe their genome from RNA into DNA, which can then be integrated into the host cell genome.
This feature places the Retroviridae family in Group VI of the Baltimore classification of viruses. This classification is based on the genetic system of the viruses and describes the obligatory relationship between the viral genome and its mRNA. By convention, mRNA is defined as the positive strand, because it contains the immediately translatable information. In the Baltimore classification, a strand of DNA of equivalent polarity is also designated as a positive strand. The RNA and DNA complements of positive strands are designated as negative strands. According to the Baltimore classification, viruses are divided into the following seven classes: (I) dsDNA viruses, (II) ssDNA viruses, (III) dsRNA viruses, (IV) (+)-sense ssRNA viruses, (V) (-)-sense ssRNA viruses, (VI) RNA reverse transcribing viruses, and (VII) DNA reverse transcribing viruses.
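As an illustration only, the seven Baltimore classes can be written as a small lookup table. The sketch below is in Python; the names `BALTIMORE_GROUPS`, `REVERSE_TRANSCRIBING_GROUPS`, `HIV1_GROUP` and `uses_reverse_transcription` are hypothetical and introduced here purely for illustration.

```python
# Baltimore classes as listed in the text (group numeral -> genome type).
BALTIMORE_GROUPS = {
    "I": "dsDNA viruses",
    "II": "ssDNA viruses",
    "III": "dsRNA viruses",
    "IV": "(+)-sense ssRNA viruses",
    "V": "(-)-sense ssRNA viruses",
    "VI": "RNA reverse transcribing viruses",
    "VII": "DNA reverse transcribing viruses",
}

# Groups whose replication involves reverse transcription.
REVERSE_TRANSCRIBING_GROUPS = {"VI", "VII"}

# Retroviruses such as HIV-1 belong to Group VI.
HIV1_GROUP = "VI"


def uses_reverse_transcription(group: str) -> bool:
    """Return True if the given Baltimore group reverse transcribes."""
    if group not in BALTIMORE_GROUPS:
        raise KeyError(f"unknown Baltimore group: {group}")
    return group in REVERSE_TRANSCRIBING_GROUPS


if __name__ == "__main__":
    print(HIV1_GROUP, "->", BALTIMORE_GROUPS[HIV1_GROUP])
```

A lookup of this kind only restates the classification; it does not capture the biology behind each class.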
Retroviruses used to be taxonomically divided into three subfamilies: the Oncovirinae, which includes those with oncogenic potential; the Lentivirinae or slow viruses, including HIV; and the Spumavirinae or foamy viruses, which have not been shown to be pathogenic (Coffin et al., 1997).
This taxonomic classification is no longer used, and the Retroviridae family has been reclassified into seven distinct genera, largely on the basis of sequence similarity within the pol gene (Coffin et al., 1997): mammalian C-type viruses (prototype MLV), avian C-type viruses (the ASLV, prototype RSV), B-type viruses (prototype MMTV), D-type viruses (prototype M-PMV), viruses of the HTLV/BLV group (prototype HTLV-1), lentiviruses (prototype HIV-1), and spumaviruses (prototype HFV) (Coffin et al., 1997; Zuckerman et al., 2004).
Genus | Example | Virion morphology
1. Avian sarcoma and leukosis viral group | Rous Sarcoma Virus | central, spherical core “C particles”
2. Mammalian B-type viral group | Mouse Mammary Tumor Virus | eccentric, spherical core “B particles”
3. Murine leukemia-related viral group | Moloney Murine Leukemia Virus | central, spherical core “C particles”
4. Human T-cell leukemia–bovine leukemia viral group | Human T-Cell Leukemia Virus | central, spherical core
5. D-type viral group | Mason-Pfizer Monkey Virus | cylindrical core “D particles”
6. Lentiviruses | Human Immunodeficiency Virus | cone-shaped core
7. Spumaviruses | Human Foamy Virus | central, spherical core
Table 1-1. Classification of Retroviruses. (Coffin et al., 1997).
HIV-1 STRUCTURE
The HIV-1 virion is about 120 nm in diameter and roughly spherical. It is surrounded by an envelope composed of a plasma membrane of host-cell origin and the viral proteins gp120 and gp41. Immediately beneath the envelope, the matrix protein lines the inner surface of the viral particle. Deeper inside is a cone-shaped viral core, which contains two molecules of positive-sense RNA, the viral proteins involved in the
and reverse transcriptase), and three of the viral accessory proteins (Vif, Vpr, and Nef) (Coffin et al., 1997; Frankel and Young, 1998).
VIRAL GENOME
Retroviral genomic RNA is a product of the host RNA synthesis machinery and, as such, has the structural features of a cellular messenger RNA, including a methylated cap ribonucleotide at the 5’ end and a polyadenylated 3’ end. Two direct repeats, termed R for repeated sequences, lie at the 5’ and 3’ ends of the genomic viral RNA, adjacent to the 5’ cap and the 3’ poly(A) tail, respectively. Immediately adjacent and internal to the R sequences are two unique sequences, known as U5 at the 5’ end and U3 at the 3’ end of the viral RNA genome. Following U5 is the primer binding site (PBS), a region to which a tRNA anneals to serve as the primer for reverse transcriptase to initiate synthesis of the minus strand of DNA. Adjacent to the PBS is the Ψ sequence, which is involved in packaging the genomic RNA into assembling virions. Next come the genes encoding structural, functional and accessory proteins. Finally, just upstream of U3 is the polypurine tract, a purine-rich sequence that is cleaved during reverse transcription to produce the RNA primer for synthesis of the plus strand of viral DNA.
Figure 1-3. RNA viral genome. (Coffin et al., 1997).
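The 5’ to 3’ order of the elements of the genomic RNA can be summarized as an ordered list. The following Python sketch is only a schematic restatement of that order; the names `GENOMIC_RNA_LAYOUT` and `element_names` are hypothetical, and lengths and sequences are deliberately omitted.

```python
# Elements of the retroviral genomic RNA in 5' -> 3' order, as described
# in the text. Each entry is (element, role). Schematic only.
GENOMIC_RNA_LAYOUT = [
    ("5' cap", "methylated cap ribonucleotide"),
    ("R", "direct repeat at the 5' end"),
    ("U5", "unique sequence at the 5' end"),
    ("PBS", "primer binding site; tRNA primer for minus-strand DNA"),
    ("Psi", "packaging signal for genomic RNA"),
    ("coding region", "structural, functional and accessory genes"),
    ("PPT", "polypurine tract; RNA primer for plus-strand DNA"),
    ("U3", "unique sequence at the 3' end"),
    ("R", "direct repeat at the 3' end"),
    ("poly(A)", "polyadenylated 3' end"),
]


def element_names() -> list[str]:
    """Return the element names in 5' -> 3' order."""
    return [name for name, _role in GENOMIC_RNA_LAYOUT]


if __name__ == "__main__":
    print(" - ".join(element_names()))
```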
HIV-1 PROTEINS
The HIV-1 genome is about 9 kb of RNA and encodes nine open reading frames. Three of these encode the Gag, Pol, and Env polyproteins, which are subsequently proteolyzed into individual proteins common to all retroviruses. The four Gag proteins, matrix (MA or p17), capsid (CA or p24), nucleocapsid (NC or p7), and p6, and the two Env proteins, gp120 and gp41, are structural components that make up the core of the virion and the outer membrane envelope, respectively. The three Pol proteins, protease (PR), reverse transcriptase (RT), and integrase (IN), provide essential enzymatic functions and are also encapsulated within the viral particle, as mentioned above. HIV-1 encodes six additional proteins, often called accessory proteins, three of which (Vif, Vpr, and Nef) are found in the viral particle (Coffin et al., 1997; Frankel and Young, 1998).
Figure 1-4. Organization of the HIV-1 genome. The location of the long terminal repeats (LTRs) and the genes encoded by HIV-1 are indicated. Gag, Pol and Env proteins are initially synthesized as polyprotein precursors. The Gag precursor is cleaved by the viral protease (PR) into the mature Gag proteins: matrix (MA), capsid (CA), nucleocapsid (NC) and p6. The GagPol precursor undergoes PR-mediated processing to generate the Gag proteins and the Pol enzymes: PR, reverse transcriptase (RT) and integrase (IN). The Env glycoprotein precursor gp160 is cleaved by a cellular protease during transport to the cell surface to generate the mature surface glycoprotein gp120 and the trans-membrane glycoprotein gp41. The sizes of the genes and encoded proteins are not to scale. (Freed, 2004).
The core and matrix proteins are encoded by Gag. The Gag proteins of the mature virus are p17, p24, p7 and p6, which are produced by cleavage of the p55 precursor protein by the viral protease (Coffin et al., 1997; Zuckerman et al., 2004). MA is the N-terminal component of the Gag polyprotein and is important for targeting the Gag and Gag-Pol precursor polyproteins to the plasma membrane prior to viral assembly. Indeed, the MA protein contains a bipartite membrane-binding signal: one part is the 14-carbon saturated fatty acid myristate, covalently linked to the N-terminal glycine residue; the other is a largely basic sequence located a short distance downstream. The two signals mediate high-affinity binding of MA and of the Gag polyprotein to the lipid bilayer by hydrophobic interactions with membrane lipids and by ionic bonds with the negatively charged head groups of membrane phospholipids, respectively (Flint et al., 2004). As a consequence of this interaction, MA lines the inner surface of the mature viral particle. In addition, this protein appears to help incorporate Env glycoproteins into viral particles (Mammano et al., 1995). Furthermore, MA is part of the pre-integration complexes (PICs) (Bukrinsky et al., 1993b), and its two nuclear localization signals (NLSs) (Haffar et al., 2000) may facilitate nuclear import (Gallay et al., 1995).
CA is the second component of the Gag polyprotein and forms the core shell of the HIV-1 viral particle, with about 2000 molecules per virion (Scarlata and Carter, 2003). This protein is often used in enzyme-linked immunosorbent assays (ELISA) to quantify the amount of virus. The C-terminal domain functions primarily in assembly and is important for CA dimerization and Gag oligomerization. Capsid is important for infectivity, as it participates in viral uncoating through its interaction with cyclophilin A (CypA) (Kootstra et al., 2003; Saphire et al., 2002; Towers et al., 2002; Towers, 2007), and it is the major determinant of infection in growth-arrested cells (Yamashita and Emerman, 2004). Cyclophilin A is a cytosolic cellular protein that belongs to the peptidyl-prolyl isomerase family and performs the cis-trans isomerization of proline peptide bonds in sensitive proteins (Towers, 2007). By interacting with Gag in infected cells, CypA is incorporated into nascent HIV-1 virions (Franke et al., 1994; Thali et al., 1994). CypA performs cis-trans isomerization at CA Gly89-Pro90 on the outer surface of the capsid (Bosco et al., 2002; Bosco and Kern, 2004), leading to increased infectivity. The research group led by Luban (Sayah et al., 2004) showed that CypA is an important cellular factor, since it is the target of the recently discovered Old World monkey TRIM5α restriction factor (Song et al., 2005; Stremlau et al., 2004; Yap et al., 2004), which has been demonstrated to greatly decrease HIV-1 infectivity (Berthoux et al., 2005; Keckesova et al., 2006; Stremlau et al., 2006).
Nucleocapsid protein is the third component of the Gag polyprotein and coats the genomic RNA inside the virion core. The primary function of NC is to bind specifically, through its two zinc finger domains, to the packaging signal (Ψ) and deliver full-length viral RNAs into the assembling virion. The packaging signal is composed of three RNA hairpins located around the major splice donor site, the first of which contains the kissing loop involved in RNA dimerization. NC is a basic protein that also binds single-stranded nucleic acids nonspecifically, leading to coating of the genomic RNA that presumably protects it from nucleases and compacts it within the core (Frankel and Young, 1998).
Protein p6 comprises the C-terminal 51 amino acids of Gag and is important for the incorporation of Vpr during viral assembly (Cohen et al., 1990). In addition, p6 is required for efficient viral particle release (Demirov et al., 2002; Huang et al., 1995). The pol gene encodes the enzymes protease, reverse transcriptase and integrase. When the virus buds from the membrane surface, it is released as an immature, noninfectious particle. PR mediates the cleavage of the Gag and Pol polyproteins (Coffin et al., 1997).
The RT protein catalyzes both RNA-dependent and DNA-dependent DNA polymerization reactions and contains an RNase H domain that cleaves the RNA portion of the RNA-DNA hybrids generated during the reaction (Coffin et al., 1997). RT is a heterodimer containing a 560-residue subunit (p66) and a 440-residue subunit (p51), both derived from the Pol polyprotein (Flint et al., 2004). Each subunit contains a polymerase domain composed of four subdomains called fingers, palm, thumb and connection, and p66 contains an additional RNase H domain (Frankel and Young, 1998). Even though their amino acid sequences are identical, the polymerase subdomains are arranged differently in the two subunits, with p66 forming a large active-site cleft and p51 forming an inactive closed structure. RT is characterized by a high error rate when transcribing RNA into DNA, since it lacks a proofreading function (Coffin et al., 1997).
Following reverse transcription, IN catalyzes a series of reactions to integrate the viral genome into a host chromosome. IN together with other viral and cellular proteins forms the pre-integration complex (PIC) and binds specific sequences located at the ends of the viral cDNA (att sites) (Coffin et al., 1997). This protein will be described in more detail in the subsequent sections.
The env gene encodes the gp41 and gp120 envelope glycoproteins, cleaved by cellular enzymes (furins) from the gp160 precursor (Zuckerman et al., 2004). The proteins gp120 and gp41 are located on the viral membrane surface and their function is to bind the CD4 receptor of the target cells and mediate fusion between viral and cellular membranes, respectively (Frankel and Young, 1998).
In addition to gag, pol and env, HIV-1 carries six regulatory and accessory genes. The tat gene encodes a small protein that is essential for efficient transcription of viral genes and for viral replication (Kessler and Mathews, 1992; Marcello et al., 2001), resulting in a remarkable increase in viral gene expression (Ratnasabapathy et al., 1990; Zhou and Sharp, 1995). Tat binds to a structured RNA element (TAR, transactivation-responsive region) present at the 5’ end of the viral leader mRNA, with cyclin T bridging the activation domain of Tat and the TAR loop (Wei et al., 1998). Through this interaction, Tat recruits a series of transcriptional complexes, including histone acetyl transferases, which modify chromatin at the proviral integration site and make it more permissive for transcription, and P-TEFb (Positive Transcription Elongation Factor b), which stimulates RNA polymerase II phosphorylation by Cdk9, increasing the processivity of the enzyme complex (Bieniasz et al., 1998; Shilatifard et al., 2003; Wei et al., 1998).
Rev is a sequence-specific RNA-binding phosphoprotein that is expressed during the early stages of HIV-1 replication (Malim et al., 1989). Rev transports to the cytoplasm the singly spliced and unspliced viral mRNAs that are required for expression of the HIV structural proteins and production of genomic RNA. Eukaryotes have evolved a special mechanism to retain incompletely spliced RNAs in the nucleus. Since HIV-1 has only one LTR promoter, it encodes a single genome-length primary transcript. In order to express the various incompletely spliced viral transcripts, some HIV-1 transcripts must be transported out of the nucleus without splicing. Rev fulfills this function (Malim et al., 1989).
Nef is a 27 kDa myristoylated protein that is abundantly produced during the early phase of the viral replication cycle. It is highly conserved in all primate lentiviruses, suggesting that its function is essential for the survival of these pathogens. Nef has different roles in HIV-1 replication and disease pathogenesis. It down-regulates CD4 (Garcia and Miller, 1991), which limits the adhesion of a Nef-expressing T cell to the antigen-presenting cell, thus promoting the movement of HIV-infected cells into the circulation and the spread of the virus. Nef down-modulates cell surface expression of MHC I (Schwartz et al., 1996), protecting HIV-infected cells from the host CTL response. In addition, it interferes with cellular signal transduction pathways and enhances virion infectivity and viral replication, since it induces actin remodeling and facilitates the movement of the viral core past a potentially obstructive cortical actin barrier (Campbell et al., 2004; Chowers et al., 1994).
Vpr is a small basic protein of 96 aa. Despite its small size, Vpr has been shown to have multiple activities during viral replication. Vpr appears to participate in anchoring the PICs to the nuclear envelope and to be involved in the nuclear translocation of the viral DNA (Heinzinger et al., 1994). In addition, Vpr induces cell cycle arrest in the G2 phase (Bartz et al., 1996; Di Marzio et al., 1995). The biological significance of Vpr-induced arrest during viral infection is not well understood. However, the HIV-1 LTR seems to be more active in the G2 phase, implying that Vpr-induced G2 arrest may provide a favorable cellular environment for efficient transcription of HIV-1 (Goh et al., 1998).
Vpu is a 9 kDa membrane protein that induces the degradation of the CD4 receptor. Vpu interacts with a membrane-proximal domain of the cytoplasmic tail of CD4 and links it to h-βTrCP (Margottin et al., 1998). The CD4-Vpu-h-βTrCP ternary complex then recruits SKP1, a member of the ubiquitination machinery (West, 2003). As a result, CD4 is ubiquitylated and targeted to proteasomes for degradation. In addition, Vpu increases the secretion of progeny virus from infected cells. This function is related to the ability of Vpu to self-assemble into homo-oligomeric complexes that function in vitro as ion-conductive membrane pores (Bour and Strebel, 2003). The requirement of Vpu for efficient virus release is host cell-dependent (Varthakavi et al., 2003), suggesting that Vpu may counteract a cellular factor that, in the absence of Vpu, inhibits virus release. A recent report showed that this factor is TASK-1, an acid-sensitive K+ channel (Hsu et al., 2004). TASK-1 is structurally homologous to Vpu, suggesting oligomerization as a possible mechanism of inactivation of the ion channel activity of these proteins (Hsu et al., 2004). However, the mechanism by which TASK-1 inhibits virion release is still unclear (Li et al., 2005).
Vif is a 192 aa protein that is expressed at high levels in the cytoplasm of infected cells. Vif was thought to be important because it is essential for the replication of HIV-1 in peripheral blood lymphocytes, macrophages, and certain cell lines known as “nonpermissive” cells (Strebel et al., 1987). The molecular basis of this permissivity is a host cellular protein known as APOBEC3G (apolipoprotein B mRNA-editing enzyme catalytic polypeptide-like 3G), a potent inhibitor of HIV infection in nonpermissive cells (Harris et al., 2002; Jarmuz et al., 2002). APOBEC3G is a member of the cytidine deaminase family, which prevents viral cDNA synthesis by deaminating deoxycytidines in the minus-strand retroviral cDNA replication intermediate (Harris et al., 2003; Yu et al., 2004). As a result, it creates stop codons or G-to-A transitions in the newly synthesized viral cDNA, which is then subjected to elimination by the host DNA repair machinery (Zhang et al., 2003). Thus, APOBEC3G represents an innate host defense mechanism against HIV infection. However, the virus has developed a strategy to suppress the antiviral effect of APOBEC3G through Vif. Vif binds directly to APOBEC3G and counteracts its anti-HIV activity by promoting its degradation (Li et al., 2005). In addition, Vif is specifically packaged into virions, where it is processed by the protease (Khan et al., 2002). Vif also stabilizes the viral nucleoprotein complex through direct interaction with the 5’ region of HIV-1 genomic RNA (Simon and Malim, 1996).
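The link between deoxycytidine deamination on the minus strand and G-to-A transitions on the plus strand can be illustrated with a toy example. The Python sketch below assumes, for simplicity, that every deoxycytidine in the minus strand is deaminated to deoxyuridine, which is an exaggeration made only for clarity; the function names are hypothetical.

```python
# Base pairing used when copying a DNA strand; deoxyuridine (U),
# the product of cytidine deamination, pairs with adenine.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G", "U": "A"}


def deaminate_minus_strand(minus_strand: str) -> str:
    """Toy model: convert every deoxycytidine (C) to deoxyuridine (U)."""
    return minus_strand.replace("C", "U")


def synthesize_plus_strand(minus_strand: str) -> str:
    """Copy a minus strand (5'->3') into its plus strand (5'->3')."""
    return "".join(COMPLEMENT[base] for base in reversed(minus_strand))


if __name__ == "__main__":
    minus = "CCA"  # minus-strand complement of the plus-strand codon TGG
    print(synthesize_plus_strand(minus))                          # TGG
    print(synthesize_plus_strand(deaminate_minus_strand(minus)))  # TAA
```

In this example the tryptophan codon TGG on the plus strand becomes the stop codon TAA, showing how G-to-A transitions can introduce premature stop codons.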
HIV-1 REPLICATION CYCLE
Viral Entry
The tropism of HIV-1 for the target cells is governed by the presence of both the cellular receptor CD4 and a coreceptor on the plasma membrane of target cells (Dalgleish et al., 1984). Two types of coreceptor were identified: the chemokine receptors CCR5 [chemokine (C-C motif) receptor 5] and CXCR4 [chemokine (CXC motif) receptor 4] (Choe et al., 1996; Deng et al., 1996). The distribution of these coreceptors permits infection not only of CD4+ T cells, but also macrophages and
Figure 1-5. The HIV-1 replication cycle. (Pommier et al., 2005).
In order to enter the target cell, the HIV-1 gp120 protein binds the CD4 receptor, inducing a conformational change and promoting the binding of the chemokine receptor CCR5 or CXCR4. It is noteworthy that individuals homozygous for a defective CCR5 allele are highly resistant to HIV-1 infection. The interaction between gp120, CD4 and the coreceptor induces a conformational change in gp41 that exposes a hydrophobic glycine-rich “fusion” peptide, which initiates the fusion of the viral envelope with the plasma membrane in specific cholesterol-rich membrane microdomains known as lipid rafts (Manes et al., 2000).
Retrotranscription
Following entry, the viral RNA genome resides in the cell cytoplasm as part of a nucleoprotein complex, which associates with microtubules before the loss of the capsid structure (McDonald et al., 2002). The function of this microtubule-based mobility is to transport the HIV-1 viral complex from the cell periphery to the nucleus. The next step of viral infection is the synthesis of a DNA copy of the viral RNA genome by the viral enzyme RT, a process that has been shown to start in the intact capsid structure (McDonald et al., 2002).
Reverse transcription is an essential step in the HIV-1 life cycle, since it converts the genomic RNA into DNA. It has been proposed that retroviruses copackage two copies of positive single-stranded RNA to increase the probability of successful DNA synthesis (Coffin, 1979). During initiation of reverse transcription, a cellular tRNA primer (tRNALys) is placed onto a complementary sequence in the viral genome, called the primer binding site (PBS). The reverse transcriptase recognizes this RNA-RNA complex and catalyzes the synthesis of minus-strand DNA starting from the 3’ end of the tRNA primer, with the viral RNA acting as template. The synthesis of minus-strand strong-stop DNA (-sssDNA) extends up to the 5’ cap of the genomic RNA template, and the RNase H domain of RT cleaves the RNA portion of the RNA-DNA hybrid. Continued minus-strand DNA synthesis requires a strand-transfer reaction that allows the 3’ end of the genomic RNA to serve as a template. Once the first jump has occurred, the 3’ end of the minus strand is extended up to the PBS of the viral RNA genome. Plus-strand DNA synthesis is initiated at the polypurine tract, where the RNase H domain of RT cleaves the RNA to generate an RNA primer. RT catalyzes the synthesis of the plus-strand strong-stop DNA (+sssDNA) up to a portion of the tRNA. The 3’ end of the +sssDNA is complementary to the PBS of the -sssDNA and is required as the complementary region for the second strand transfer. Once the second jump has occurred, elongation of the plus and minus strands can be completed. The final product is a blunt-ended linear duplex DNA (Coffin et al., 1997).
Figure 1-6. Schematic representation of HIV-1 reverse transcription. (Coffin et al., 1997).
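The template relationships in reverse transcription can be illustrated with a deliberately simplified sketch. The Python code below only models base pairing: it copies an RNA template into minus-strand DNA and then copies that into plus-strand DNA. It ignores tRNA priming, RNase H cleavage and both strand transfers, and the function names are hypothetical.

```python
# Base pairing for copying an RNA template into DNA, and DNA into DNA.
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}
DNA_TO_DNA = {"A": "T", "T": "A", "G": "C", "C": "G"}


def minus_strand_dna(rna_template: str) -> str:
    """Return the minus-strand DNA (5'->3') copied from a plus-strand RNA."""
    return "".join(RNA_TO_DNA[base] for base in reversed(rna_template))


def plus_strand_dna(minus_strand: str) -> str:
    """Return the plus-strand DNA (5'->3') copied from the minus strand."""
    return "".join(DNA_TO_DNA[base] for base in reversed(minus_strand))


if __name__ == "__main__":
    rna = "AUGC"
    minus = minus_strand_dna(rna)
    plus = plus_strand_dna(minus)
    print(rna, minus, plus)
```

As expected, the plus-strand DNA has the same sequence as the RNA template with U replaced by T.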
Unlike many other DNA polymerases, RT lacks a 3’ exonuclease activity capable of excising mispaired nucleotides, which makes it a more error-prone enzyme. This feature allows HIV-1 to adapt to environmental changes, helping it to escape the defensive mechanisms of the immune system and even drug treatment (Coffin et al., 1997).
Nuclear translocation
Once reverse transcription is completed, the newly synthesized viral DNA remains associated with a high-molecular-weight complex composed of both viral and cellular proteins, known as the preintegration complex (PIC), which is described in more detail in the subsequent sections.
Whereas most retroviruses require disassembly of the nuclear membrane during mitosis for the retrotranscribed viral complexes to access the host genome, lentiviruses have evolved a mechanism whereby the PIC is actively transported across the nuclear envelope through the nuclear pores. Several viral determinants of nuclear import have been proposed, including MA (Bukrinsky et al., 1993a), Vpr (Heinzinger et al., 1994), the IN enzyme (Gallay et al., 1997) and an unusual triple-stranded fragment of lentiviral DNA referred to as the DNA ‘flap’ (De Rijck and Debyser, 2006). Although no consensus has emerged so far regarding the mechanism by which the PIC is imported into the nucleus, this unique property enables lentiviruses to infect non-dividing cells.
Integration
Once inside the nucleus, IN catalyzes the integration of the viral DNA into the host cell chromosome, which will be discussed in more detail in the subsequent sections. IN, together with the other viral and cellular proteins that form the PIC, binds specific sequences located at the ends of the viral cDNA (att sites) (Coffin et al., 1997). So far no primary sequence in the cellular genome has been identified as the preferential binding site for IN, and integration seems to occur at random on DNA molecules (Carteau et al., 1998; Stevens and Griffith, 1996). Only recently has it been shown that transcriptionally active genes are strongly favored as integration target sites (Barr et al., 2006; Barr et al., 2005; Carteau et al., 1998; Ciuffi et al., 2005; Lewinski et al., 2005; Lewinski et al., 2006; Mitchell et al., 2004; Schroder et al., 2002; Wu et al., 2003).
Alternatively, the viral DNA may follow three different fates, none of which leads to the formation of a functional provirus. The ends of the viral DNA may join to form a 2-LTR circle, or the viral genome may undergo homologous recombination, producing a 1-LTR circle. Finally, the viral DNA may integrate into itself (autointegration), leading to the formation of a rearranged circular structure (Coffin et al., 1997). None of these circular forms serves as a precursor to the integrated provirus, and none appears to contribute significantly to viral replication. Rather, they all appear to be dead-end by-products of aborted infections (Coffin et al., 1997).
Viral Transcription
In the integrated provirus, the 5’ LTR acts as the viral promoter. It contains several binding sites for positive transcription factors, although in the absence of the viral Tat protein the binding of these factors is not sufficient to activate transcription of the viral genes. However, the presence of these promoter elements results in the correct positioning of RNA polymerase II at the site of transcription initiation and in the assembly of the pre-initiation complex. At this point transcription starts, but the polymerase produces predominantly short, non-polyadenylated RNAs that include a hairpin structure at the 5’ end of the nascent viral RNA, named the trans-activation-responsive region (TAR) (Peterlin and Trono, 2003). Tat acts as a very powerful transcriptional activator of the integrated provirus by interacting with TAR and promoting the production of polyadenylated full-length viral RNA genomes.
Figure 1-7. Unintegrated viral DNA structures. (A) The linear product of viral DNA synthesis is the precursor to the integrated provirus. (B) 1-LTR circle. This structure is consistent with one that could be formed by homologous recombination between the LTRs of the linear DNA molecule. (C) 2-LTR circle. This structure is consistent with one that could be formed by simply joining the ends of the linear DNA molecule, although there are often bases inserted or deleted at the "circle junction". (D and E) Autointegration products. These circular molecules are apparently formed by the suicidal integration of the viral DNA ends using the viral DNA itself as the target, instead of cellular DNA. Their structures depend on the site of integration (which determines the spacing between the two LTRs in the full-length circular products or the sizes of the two subgenomic circular products), and on the path of the DNA between the ends and the target site (which determines whether the product is a single full-length circle [D] or two smaller circles [E]). Dots indicate the sites of joining; arrows show orientation of the DNA sequence relative to the RNA genome. (Coffin et al., 1997).
Tat-activated transcription gives rise to different transcripts derived by splicing of the full-length viral RNA. The first viral transcripts to appear after infection are completely spliced and are rapidly transported into the cytoplasm, following the same pathway as cellular mRNAs (Cullen, 1998).
Figure 1-8. Mechanism of Tat transactivation. Activators that bind the promoter recruit RNA
polymerase II (RNAPII) to the long terminal repeat (LTR). In the pre-initiation complex, the unphosphorylated carboxy-terminal domain (CTD) of RNAPII, which is shown as a yellow coil, binds mediators. Together with the general transcription factor TFIIH, which contains DNA-helicase and CTD-kinase activities, RNAPII clears the promoter and starts copying the viral genome. Cyclin-dependent kinase 7 (CDK7) in TFIIH is shown as a grey changing to red ball, indicating its activation as a kinase. The partially phosphorylated RNAPII arrests at or near the transactivation response element (TAR), synthesizing TAR and/or an alternative paused hairpin. 5,6-dichloro-1-β-D-ribofuranosylbenzimidazole (DRB)-sensitivity-inducing factor (DSIF) and negative elongation factor (NELF) then ensure that RNAPII does not elongate. RD — so named for its many repeats of arginine and glutamate residues — in NELF contains an RNA-recognition motif that binds the stem in TAR. For formation of the tripartite complex with transcriptional transactivator (Tat) and TAR, positive transcription elongation factor b (P-TEFb), which contains cyclin T1 (CYCT1) and CDK9, must be free of 7SK RNA, and CDK9 must be autophosphorylated. After its recruitment to TAR, P-TEFb phosphorylates suppressor of Ty 5 (SPT5) in DSIF and RD in NELF, and completes the phosphorylation of the CTD of RNAPII, thereby modifying RNAPII for efficient elongation. The phosphorylated CTD now binds elongators, which consist of capping enzymes, splicing apparatus and polyadenylation factors. Efficient elongation of transcription and viral replication ensue. The change in color of the CTD from yellow to red and its increased thickness indicate increased levels of phosphorylation. (Peterlin and Trono, 2003).
Incompletely spliced RNAs are retained in the nucleus by the cellular machinery that controls the integrity of the splicing process; the singly spliced and unspliced transcripts persist in the nucleus because of defective donor and acceptor splice sites and the inhibitory effect of Rev on splicing (Luo et al., 1994; Powell et al., 1997). The translocation of these transcripts into the cytoplasm depends on the expression of the Rev protein (Pomerantz et al., 1992). Rev shuttles between the nucleus and the cytoplasm and binds the viral transcripts through its interaction with an RNA stem-loop structure named the Rev responsive element (RRE), located in the env gene (Malim et al., 1990).
Viral Assembly
Once translated, all the viral proteins necessary for virion assembly, together with the RNA genomes, are transported to the plasma membrane, to cholesterol-rich lipid domains known as lipid rafts, where the building of new virions begins. The gp120/gp41 complex is transported via the endoplasmic reticulum-Golgi pathway, whereas the Gag-Pol polyproteins are targeted to the plasma membrane after the myristoylation of Gag (Gottlinger et al., 1989). The resulting particles bud from the plasma membrane as immature virions. Their maturation is accomplished by the viral protease, which first cleaves Gag-Pol and then, starting from the separated Gag and Pol precursors, generates the individual core proteins, the matrix protein and the viral enzymes. The proteolytic activity ends when the virion is already detached from the host plasma membrane and results in the formation of mature infectious viruses (Coffin et al., 1997).
Figure 1-9. The HIV genome, transcripts and proteins. HIV transcripts. Integrated into the host chromosome, the 10-kb viral genome contains open reading frames for 16 proteins that are synthesized from at least ten transcripts. Black lines denote unspliced and spliced transcripts, above which coding sequences are given, with the start codons indicated. Of these transcripts, all singly spliced and unspliced transcripts shown above those encoding the transcriptional transactivator (Tat) require regulator of virion gene expression (Rev) for their export from the nucleus to the cytoplasm. The RNA target for Rev, the Rev response element (RRE), is contained in the gene encoding envelope protein (Env). (Peterlin and Trono, 2003).
INTEGRASE
The name of the key enzyme mediating retroviral integration has evolved through several stages in the past 25 years, reflecting incremental progress in our understanding of its role and activities. Indeed, this enzyme was initially identified by its apparent molecular weight, then labeled “endonuclease”, in recognition of the relatively nonspecific endonuclease activity observed in assays using unnatural DNA substrates, and finally “integrase” (IN), when it became clear that it was the enzyme that actually catalyzed the key chemical steps in integration (Coffin et al., 1997).
The integrase protein is encoded by sequences at the 3’ end of the pol gene, immediately downstream from the sequences encoding reverse transcriptase. Once the viral protease has cleaved the polyprotein, the stoichiometry of integrase protomers in the virion is 1:1 with reverse transcriptase protomers, or approximately 50-100 protomers per viral particle (Coffin et al., 1997).
Integrase catalyzes the integration of the viral DNA into the host cell genome, via a two-step process: the 3’-end processing and the 3’-end joining (Coffin et al., 1997).
Integrase Structure
HIV-1 integrase is a protein of 288 amino acids (about 32 kDa) that shares similar structural domains with the other retroviral integrases. These are an N-terminal domain of about 50 amino acids, a central domain of about 160 amino acids, and a less conserved C-terminal domain of about 80 amino acids.
Figure 1-10. Schematic of the domain structure of retroviral integrases. The three domains appear to be stably folded when prepared separately. The amino-terminal-most (HHCC) domain is characterized by pairs of histidine and cysteine residues that are universally conserved among retroviral integrases. The central domain contains the catalytic site. It is characterized by a triad of universally conserved and essential residues, an aspartate, followed at some distance by an aspartate and glutamic acid that are always separated by 35 amino acids. The carboxy-terminal domain is sometimes called the DNA-binding domain, a bit of a misnomer since the core domain also binds DNA, but nevertheless an accurate reflection of its one known activity. (Coffin et al., 1997).
Within the N-terminal domain of IN is a putative zinc finger of the HHCC type. Using a zinc binding assay, Bushman et al. (1993) reported that wild-type HIV IN binds zinc. More recently, a solution structure of the N-terminal domain was determined and revealed a dimeric structure with an HHCC motif that coordinates zinc. The folds of the N termini are similar to those of other DNA binding proteins in having a helix-turn-helix structural motif (Hindmarsh and Leis, 1999). The N terminus influences the catalytic activity of IN but does not contain its catalytic core and does not seem to be involved in multimerization (Hindmarsh and Leis, 1999).
The central core domain comprises residues 50 to 212 and is the catalytic domain of the enzyme. It is characterized by a catalytic triad of three highly conserved residues, D,D(35)E. Substitution of any of these residues abolishes the end-processing and/or joining reactions. Crystal structures of the HIV-1 catalytic core domain coordinating a divalent cation have been determined using Mg²⁺; the cation was found to be coordinated by the two conserved aspartic acid residues of the catalytic triad (Hindmarsh and Leis, 1999).
The C terminus of IN is required for both 3’ end processing and integration activity (Coffin et al., 1997). An HIV-1 IN fragment representing residues 235 to 288 binds nonspecifically to DNA (Hindmarsh and Leis, 1999). Interpretation of the DNA binding activity of the carboxy-terminal region is complicated by the fact that integration involves two different DNA substrates with different structural requirements: the viral cDNA and the host genomic DNA. The isolated carboxy-terminal region binds well to simple linear double-stranded DNA oligonucleotides (Engelman et al., 1994; Lutzke et al., 1994; Vink et al., 1993), suggesting that it may contribute to binding the viral cDNA ends (att sites) (Coffin et al., 1997). In addition, the C terminus seems to be involved in the multimerization of IN (Asante-Appiah and Skalka, 1999; Engelman, 1999; Hindmarsh and Leis, 1999).
Enzymatic Activity
Integration occurs in two well-characterized catalytic steps, referred to as end processing and end joining (Coffin et al., 1997; Hindmarsh and Leis, 1999).
End processing involves removal of a dinucleotide, adjacent to a highly conserved CA dinucleotide, from the 3’ end of each viral DNA strand at the U3 and U5 LTR termini, in a reaction involving a water molecule or another nucleophile (Engelman et al., 1991). This exposes a 3’ hydroxyl group, whose oxygen acts as the attacking nucleophile on the target DNA during the joining reaction, in which the viral DNA is inserted into the cellular DNA. It is believed that a Mg²⁺ ion coordinated in the active site of IN facilitates the deprotonation of the water molecule, activating it as a nucleophile.
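The end-processing step described above can be illustrated at the sequence level with a minimal Python sketch. The function name and the example sequence are hypothetical and chosen only for illustration; the sketch models nothing of the chemistry and simply removes the two nucleotides that follow the conserved CA at the 3’ end of a strand written 5’ to 3’.

```python
def process_3prime_end(strand: str) -> tuple[str, str]:
    """Return (processed_strand, released_dinucleotide).

    `strand` is written 5'->3'. The conserved CA dinucleotide must be
    followed by exactly two 3'-terminal nucleotides, which are released.
    """
    strand = strand.upper()
    if len(strand) < 4 or strand[-4:-2] != "CA":
        raise ValueError("expected ...CA followed by two 3'-terminal nucleotides")
    # The processed strand now ends in CA, exposing the 3' hydroxyl group.
    return strand[:-2], strand[-2:]


# Hypothetical viral DNA end, for illustration only.
processed, released = process_3prime_end("ACTGGAAGGGCAGT")
print(processed)  # ACTGGAAGGGCA
print(released)   # GT
```

A strand that does not end in CA plus two nucleotides is rejected, which mirrors the requirement for the conserved CA dinucleotide stated in the text.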
Figure 1-11. Schematic outline of the principal steps in retroviral DNA integration. (Coffin et al., 1997).
The DNA-joining step of integration, which involves the formation of new phosphodiester bonds joining the viral and host DNAs, proceeds without an extrinsic source of chemical energy. This suggests that the energy of the target DNA bonds that are broken in this step is used to form the new bonds that join the viral and target DNAs. This cleavage-ligation reaction proceeds via a transesterification reaction and not via a covalent intermediate between IN and DNA (Engelman et al., 1991), as happens, for example, between topoisomerases and DNA (Champoux, 1977).
Integration is accompanied by duplication of a short sequence from the target site, which flanks the integrated provirus as a direct repeat of 4-6 bp (Coffin et al., 1997).
The 5’ ends of the viral DNA and the 3’ ends of the host DNA remain unjoined. It is thought that repair of this integration intermediate is carried out by cellular enzymes, generating the integrated provirus (Coffin et al., 1997; Hindmarsh and Leis, 1999).
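The origin of the flanking direct repeat can be sketched in a few lines of Python. This is a simplified model under stated assumptions: only the top strand is tracked, the two viral ends are assumed to join the two target strands at positions staggered by `stagger` base pairs, and gap repair is assumed to copy the intervening target sequence on both sides. The function name, the example target and the `[provirus]` placeholder are hypothetical; the default stagger of 5 bp is an illustrative choice within the 4-6 bp range given in the text.

```python
def integrate(target: str, provirus: str, site: int, stagger: int = 5) -> str:
    """Top-strand sequence after integration and gap repair (simplified model)."""
    if not 4 <= stagger <= 6:
        raise ValueError("stagger is expected to be 4-6 bp")
    if site < 0 or site + stagger > len(target):
        raise ValueError("integration site lies outside the target")
    # The target bases between the two staggered joining sites end up
    # on both sides of the provirus once the gaps are filled in.
    left_flank = target[:site + stagger]
    right_flank = target[site:]
    return left_flank + provirus + right_flank


# Hypothetical target sequence; "[provirus]" stands in for the viral DNA.
print(integrate("GATTACAGGCTTAACG", "[provirus]", site=4))
# GATTACAGG[provirus]ACAGGCTTAACG
```

In the printed result the five bases ACAGG appear immediately before and immediately after the provirus, which is the direct repeat described above.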
NUCLEAR STRUCTURE
The DNA of eukaryotic cells is sequestered from the cytoplasm in the nucleus and it is complexed with cellular proteins. The result is a very complex structure, with different levels of organization. Since retroviral integration takes place in the cellular genome, it is important to understand the organization of the host genome.
The Nucleus
The nuclear compartment is delimited from the cytoplasm by the nuclear envelope, which consists of two concentric lipid bilayer membranes. The inner membrane contacts the nuclear lamina, a thin sheetlike meshwork that gives mechanical support to the nuclear envelope. The nuclear lamina is a layer of intermediate filament proteins. The α-helical heptad repeats of lamins form coiled-coil dimers, which associate head-to-tail in filaments that span from pore to pore (Akhtar and Gasser, 2007). The outer membrane is continuous with the endoplasmic reticulum, and the space between the two membranes is continuous with the lumen of the endoplasmic reticulum (Alberts et al., 2002). To allow the trafficking of molecules between the nuclear compartment and the cytosol, the two membranes come into contact at openings called nuclear pore complexes. There are more than 3,000 pore complexes on the nuclear envelope of an animal cell. Each complex is composed of more than 50 different proteins, the nucleoporins, which are arranged with a striking octagonal symmetry. The nuclear pores are used both for the import of molecules, such as proteins synthesized in the cytosol, and for the export of molecules, such as mRNAs transcribed in the nucleus.
Figure 1-12. A cross-sectional view of a typical cell nucleus. The nuclear envelope consists of two membranes, the outer one being continuous with the endoplasmic reticulum membrane. The space inside the endoplasmic reticulum (the ER lumen) is colored yellow; it is continuous with the space between the two nuclear membranes. The lipid bilayers of the inner and outer nuclear membranes are connected at each nuclear pore. Two networks of intermediate filaments (green) provide mechanical support for the nuclear envelope; the intermediate filaments inside the nucleus form a special supporting structure called the nuclear lamina. (Alberts et al., 2002).
Figure 1-13. The arrangement of nuclear pore complexes in the nuclear envelope. (A) A small region
of the nuclear envelope. In cross section, a nuclear pore complex seems to have four structural building blocks; column subunits, which form the bulk of the pore wall; annular subunits, which extend "spokes" (not shown) toward the center of the pore; luminal subunits, which contain transmembrane proteins that anchor the complex to the nuclear membrane; and ring subunits, which form the cytosolic and nuclear faces of the complex. In addition, fibrils protrude from both the cytosolic and the nuclear sides of the complex. On the nuclear side, the fibrils converge to form basketlike structures. Localization studies using immunoelectron microscopy techniques showed that the proteins that make up the core of the nuclear pore complex are symmetrically distributed across the nuclear envelope so that the nuclear and cytosolic sides look identical. This is in contrast to proteins that make up the fibrils, which are different on each side of the cytosolic or the nuclear side. (B) A scanning electron micrograph of the nuclear side of the nuclear envelope of an oocyte. (C) The continuity of the inner and outer nuclear membrane at the pore is apparent in this thin section electron micrograph, showing a side view of two nuclear pore complexes (brackets). (D) This electron micrograph shows face-on views of negatively stained nuclear pore complexes from which the membrane has been removed by detergent extraction. (Alberts et al., 2002).
Chromatin Organization
The DNA of human cells is made up of approximately 7 × 10⁹ nucleotides, divided among a set of 46 chromosomes: 22 pairs common to both males and females, plus two sex chromosomes (X and Y in males, two Xs in females). Stretched out end to end, the human DNA would extend for a total length of about 1.8 meters. Since the average diameter of a nucleus is around 6 µm, the DNA must be tightly packaged to fit inside it. This packaging is performed by proteins, which successively coil and fold the DNA into higher levels of organization, the highest of which is the mitotic chromosome. The high overall packing ratio of the genetic material suggests that DNA cannot be packaged directly into the final structure of chromatin. Instead, there are hierarchies of organization.
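The scale of the packaging problem can be made concrete with a short calculation. The Python sketch below uses only the two figures quoted above (about 1.8 meters of DNA and a nuclear diameter of about 6 µm); the function and constant names are hypothetical. It is a rough one-dimensional comparison of lengths, not a volume calculation.

```python
def linear_compaction(dna_length_m: float, nucleus_diameter_m: float) -> float:
    """Ratio of stretched-out DNA length to nuclear diameter (dimensionless)."""
    if dna_length_m <= 0 or nucleus_diameter_m <= 0:
        raise ValueError("lengths must be positive")
    return dna_length_m / nucleus_diameter_m


DNA_LENGTH_M = 1.8         # about 1.8 meters, from the text
NUCLEUS_DIAMETER_M = 6e-6  # about 6 micrometers, from the text

ratio = linear_compaction(DNA_LENGTH_M, NUCLEUS_DIAMETER_M)
print(f"The DNA is roughly {ratio:.0e} times longer than the nucleus is wide")
```

With these figures the DNA is about 300,000 times longer than the nucleus is wide, which is why several successive levels of folding are needed.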
The proteins that bind the DNA to form the eukaryotic chromosome are divided into two classes: the histones and the nonhistone chromosomal proteins. The complex of both classes of proteins with the nuclear DNA is called chromatin. The first and most basic level of chromatin organization is the nucleosome. At this level the double-stranded DNA is wrapped around a complex of eight histone proteins, called the histone core. This histone octamer consists of two copies each of H2A, H2B, H3 and H4. The organization of DNA with proteins to form nucleosomes reduces the length of the chromatin to about one-third of the initial length of the DNA. At this stage the chromatin is a continuous array of nucleosomes and resembles a series of beads on a string. It is still unclear whether this structure, called the 10 nm fiber, exists in vivo or is only an artifact resulting from unfolding during extraction in vitro.
Figure 1-14. Structural organization of the nucleosome. A nucleosome contains a protein core made of eight histone molecules. As indicated, the nucleosome core particle is released from chromatin by digestion of the linker DNA with a nuclease, an enzyme that breaks DNA. (The nuclease can degrade the exposed linker DNA but cannot attack the DNA wound tightly around the nucleosome core.) After dissociation of the isolated nucleosome into its protein core and DNA, the length of the DNA that was wound around the core can be determined. This length of 146 nucleotide pairs is sufficient to wrap 1.65 times around the histone core. (Alberts et al., 2002).
Figure 1-15. Nucleosomes as seen in the electron microscope. This electron micrograph shows a length of chromatin that has been experimentally unpacked, or decondensed, after isolation to show the nucleosomes (Alberts et al., 2002).
The 10 nm fiber condenses into a more compact form, named the 30 nm fiber. The presence of histone H1 is required to form the 30 nm fiber. Histone H1 condenses the 10 nm fiber through its interaction with nucleosomes, changing the path of the DNA and leading to a fiber resembling a solenoid (Schalch et al., 2005). The fiber has about 6 nucleosomes per turn, which corresponds to a packing ratio of 40 (1 µm of this fiber contains 40 µm of DNA).
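The packing ratio quoted above can be checked with a back-of-the-envelope estimate. The Python sketch below takes the 6 nucleosomes per turn from the text; the other three values are assumptions not stated in this chapter (commonly cited textbook figures of about 200 bp of DNA per nucleosome repeat, a rise of 0.34 nm per base pair, and a solenoid pitch of about 11 nm per turn). All names are hypothetical.

```python
BP_RISE_NM = 0.34          # assumed: axial rise per base pair of B-form DNA
BP_PER_NUCLEOSOME = 200    # assumed: core plus linker DNA per nucleosome repeat
SOLENOID_PITCH_NM = 11.0   # assumed: fiber length advanced per solenoid turn


def fiber_packing_ratio(nucleosomes_per_turn: float = 6.0) -> float:
    """Length of DNA contained per unit length of solenoid fiber."""
    dna_per_turn_nm = nucleosomes_per_turn * BP_PER_NUCLEOSOME * BP_RISE_NM
    return dna_per_turn_nm / SOLENOID_PITCH_NM


print(round(fiber_packing_ratio(), 1))
```

Under these assumptions one turn holds about 408 nm of DNA in about 11 nm of fiber, a ratio of roughly 37, which is consistent with the value of about 40 given in the text.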
These still extended structures, present in interphase chromosomes, condense further during mitosis to form highly condensed structures called mitotic chromosomes. At this stage each chromosome consists of two daughter DNA molecules produced by DNA replication, which are folded separately to produce two sister chromatids held together at their centromeres. Several mechanisms of chromatin condensation that lead to the formation of the highly condensed chromosome structure have been proposed (Belmont, 2002; Belmont and Bruce, 1994; Poirier and Marko, 2002; Strukov et al., 2003; Swedlow and Hirano, 2003), but the best known is the “radial loop model” (Coelho et al., 2004; Maeshima and Laemmli, 2003; Swedlow and Hirano, 2003). In this model the 30 nm fiber forms loops, which in turn coil to form the mitotic chromosome.
Figure 1-16. Chromatin packing. This model shows some of the many levels of chromatin packing postulated to give rise to the highly condensed mitotic chromosome. (Alberts et al., 2002).
Interphase chromatin
Highly condensed chromosomes are present in the eukaryotic cell only for a brief period, during cell division. During most of the life cycle of the cell, however, its genetic material occupies an area of the nucleus in which individual chromosomes cannot be distinguished. The structure of interphase chromatin does not change visibly between one division and the next. The characteristics of chromatin in interphase nuclei have been studied by light microscopy since the 1930s, distinguishing two types of material: a highly condensed and a less condensed form, called heterochromatin and euchromatin, respectively.
Euchromatin is composed of chromosomal structures such as the 30 nm fiber and looped domains. It has a relatively dispersed appearance in the nucleus and occupies most of the nuclear region. Heterochromatin, in contrast, is characterized by regions very densely packed with fibers, displaying a condition comparable to that of the chromosome at mitosis. It includes additional proteins and, although present in many locations along chromosomes, is concentrated in specific regions, including the centromeres and the telomeres (Alberts et al., 2002; Lewin, 2004). The amount and distribution of condensed chromatin is similar in terminally differentiated cells of the same lineage, but it varies in the nuclei of different cell types, indicating that nuclear organization may be cell-type specific (Francastel et al., 2000).
The same fibers run continuously between euchromatin and heterochromatin, which implies that these states represent different degrees of condensation of the genetic material. In the same way, euchromatic regions exist in different states of condensation during interphase and during mitosis. So the genetic material is organized in a manner that permits alternative states to be maintained side by side in chromatin, and allows cyclical changes to occur in the packaging of euchromatin between interphase and division (Lewin, 2004).
The structural condition of the genetic material is correlated with its activity. Indeed, active genes are contained within euchromatin, but only a small minority of the sequences in euchromatin are transcribed at any time. Location in euchromatin is therefore necessary for gene expression, but not sufficient for it. Heterochromatin is not transcribed and replicates late in the S phase of the cell cycle (Lewin, 2004; Wu et al., 2005). This suggests that condensation of the genetic material is associated with (and perhaps responsible for) its inactivity. Heterochromatin can be divided into facultative and constitutive heterochromatin. The former is the fraction of chromatin that is condensed and inactive in a given cell lineage but may be decondensed and active in another. The latter is the fraction of heterochromatin that stays compact throughout the cell cycle. It is mainly composed of repetitive sequences (e.g. satellite DNA) and, as mentioned above, is concentrated at centromeres and telomeres (Francastel et al., 2000).
Figure 1-17. Electron micrograph of a cell nucleus. A thin section through a nucleus stained with Feulgen shows heterochromatin (H) as compact regions clustered near the nucleolus (Nu) and the nuclear membrane. Euchromatin (E) appears as less condensed regions. (Lewin, 2004).
Epigenetics
Historically, the word “epigenetics” was used to describe events that could not be explained by genetic principles. Conrad Waddington, who is given credit for coining the term, defined epigenetics as “the branch of biology which studies the causal interactions between genes and their products, which bring the phenotype into being” (Goldberg et al., 2007).
Epigenetics, in a broad sense, is a bridge between genotype and phenotype: a phenomenon that changes the final outcome of a locus or chromosome without changing the underlying DNA sequence. More specifically, epigenetics may be defined as the study of any potentially stable and, ideally, heritable change in gene expression or cellular phenotype that occurs without changes in Watson-Crick base pairing of DNA.
Much of today’s epigenetic research is converging on the study of covalent and noncovalent modifications of histone proteins and DNA and the mechanisms by which such modifications influence overall chromatin structure (Goldberg et al., 2007). The efforts in studying these chromatin modifications have clearly shown that epigenetics contributes to the regulation of chromatin structure and DNA accessibility; moreover, it is part of the core mechanism for regulating the transcriptional status of a genetic locus, whether a small element within an individual gene, a chromosomal domain, or even an entire chromosome (Bernstein et al., 2007). Epigenetic modifications fall mainly into two categories: histone posttranslational modifications and DNA methylation. All these chromatin modifications influence how the genome is made manifest across a diverse array of developmental stages, tissue types and even disease states (Margueron et al., 2005).
Figure 1-18. Epigenetics. In 1957, Conrad Waddington proposed the concept of an epigenetic landscape to represent the process of cellular decision-making during development. At various points in this dynamic visual metaphor, the cell (represented by a ball) can take specific permitted trajectories, leading to different outcomes or cell fates. Figure reprinted from Waddington, 1957. (Goldberg et al., 2007).
Histone Post-Translational Modifications
The binding of a chemical group to one or more amino acid residues of histones is known as histone posttranslational modification (HPTM). There are a large number of HPTMs, and they fall into two groups (Allis et al., 2007): (1) modifications by small chemical groups, including acetylation, phosphorylation and methylation; (2) modifications by much larger peptides, including ubiquitylation and sumoylation.
The mechanism through which HPTMs may affect chromatin structure and/or gene transcription is still poorly understood. Three mechanisms are commonly considered (Allis et al., 2007). In the first, the binding of chemical groups may change the charge of the amino acids, altering the organization of the chromatin and leading to a more or less condensed structure. The other two mechanisms propose that a structural change of the amino acids, as a consequence of the HPTMs, may favor or block the binding of specific proteins, such as chromatin remodeling proteins, chromatin modifying complexes, and transcription factors.
Histones may be modified at many sites. To date, more than 60 different modified residues have been identified, either with specific antibodies or by mass spectrometry (Kouzarides, 2007; Macek et al., 2006). This large number of histone posttranslational modifications and their various combinations have led to the idea that they act through combinatorial patterns and in temporal sequences, and that they can be established over short- and long-range distances.
Figure 1-18. Models showing how histone posttranslational modifications affect the chromatin template. Model 1 proposes that changes to chromatin structure are mediated by the cis effects of covalent histone modifications, such as histone acetylation or phosphorylation. Model 2 illustrates the inhibitory effect of an HPTM for the binding of a chromatin-associated factor (CF). In model 3, an HPTM may provide binding specificity for a chromatin-associated factor. (Allis et al., 2007).
Acetylation
Less condensed chromatin regions are transcriptionally active (Felsenfeld and Groudine, 2003; Weintraub and Groudine, 1976). Indeed, these regions are characterized by an “open” chromatin configuration, which is more accessible to enzymes involved in DNA regulatory processes, such as transcription (Allis et al., 2007). In addition, these regions were shown to correlate closely with acetylated histones (Hebbes et al., 1994), revealing a role for acetylation in chromatin condensation and gene regulation. Acetylation is a histone posttranslational modification mediated by a family of proteins called histone acetyltransferases (HATs). These enzymes transfer an acetyl group from acetyl-coenzyme A (acetyl-CoA) to the ε-amino group of specific lysine residues within the basic N-terminal tail region of the histones (Roth et al., 2001). HAT proteins can acetylate lysines on all four core histones, but different enzymes possess distinct substrate specificities.
To date, three families of HATs have been described. One major HAT family, GNAT (for GCN5-related acetyltransferase), targets histone H3 as its major substrate. A second family, MYST, targets histone H4 as its main substrate. A third major family, CBP/p300, targets both H3 and H4 and is the most promiscuous. Each of these acetyltransferase families is also able to acetylate non-histone substrates (Allis et al., 2007; Glozak et al., 2005).
Figure 1-19. Characterized sites of histone acetylation. Histones are mostly acetylated at lysine residues located in the amino termini of H3 and H4, with the exception of H3K56, localized in the globular domain. The proteins that express binding specificity to acetylated histones are shown. (Allis et al., 2007).
The acetylation of histones may regulate chromatin structure through different mechanisms. It neutralizes the positively charged lysines, reducing the strength of binding of the strongly basic histones or histone tails to the negatively charged DNA and opening chromatin for gene activation (Vettese-Dadey et al., 1996). A second mechanism involves a specialized protein domain, the bromodomain, which specifically binds to acetylated lysines. Bromodomains are found in many HATs, such as GCN5 and CBP/p300, and in other chromatin-associated proteins (Allis et al., 2007; Dhalluin et al., 1999). Proteins containing this motif bind to acetylated histones and thus associate with chromatin (Hassan et al., 2002).
Histone acetylated site    Role in transcription
H3K9                       Activation
H3K14                      Activation
H3K18                      Activation
H3K56                      Activation
H4K5                       Activation
H4K8                       Activation
H4K12                      Activation
H4K16                      Activation
H2A                        Activation
H2BK6                      Activation
H2BK7                      Activation
H2BK16                     Activation
H2BK17                     Activation
Table 1-2. Role of different histone acetylated sites on transcription.
Acetyl groups may be removed from acetylated histones by histone deacetylase enzymes (HDACs) (Kurdistani and Grunstein, 2003; Yang and Seto, 2003). There are numerous HDAC enzymes, which fall into three catalytic groups. Type I and type II enzymes share a related mechanism of deacetylation that does not involve a cofactor, whereas type III enzymes (Sir2-related) require the cofactor NAD. Many HDACs are found within large multisubunit complexes, components of which serve to target the enzymes to genes, leading to transcriptional repression (Kurdistani and Grunstein, 2003; Yang and Seto, 2003).
Phosphorylation
Phosphorylation is a histone posttranslational modification associated with active transcription. Indeed, when immediate-early genes are induced to become transcriptionally active, a strong correlation is found with H3 phosphorylation (Allis et al., 2007; Mahadevan et al., 1991). Serine 10 of histone H3 has turned out to be an important phosphorylation site for transcription (Nowak and Corces, 2004). The precise mechanistic role of histone phosphorylation is still not known, but the collective negative charge resulting from the phosphorylation of clusters of nearby residues affects the affinity of histone H1 for DNA, increasing the transcriptional potential (Dou and Gorovsky, 2002). Alternatively, phosphorylated histone residues may dislodge proteins bound to chromatin (Fischle et al., 2005; Hirota et al., 2005) or may be bound by phospho-binding proteins (Macdonald et al., 2005) that modify chromatin structure or regulate transcriptional activity (Allis et al., 2007).
Methylation
Methylation is another histone posttranslational modification. It occurs on either lysines or arginines. Furthermore, each residue can exist in multiple methylated states, resulting in a higher level of complexity than for the other HPTMs. Indeed, lysines can be mono- (me1), di- (me2) or tri- (me3) methylated, whereas arginines can be mono- or di-methylated. The consequence of methylation for transcription can be either positive or negative, depending on the position of the residue within the histone.
Given that there are at least 24 identified sites of lysine and arginine methylation on H3, H4, H2A and H2B, the number of distinct methylation states of a nucleosome is enormous. Such combinatorial potential of methylated nucleosomes may be necessary, at least partly, to allow the regulation of complex and dynamic processes such as transcription and replication, which require sequential and precisely timed events (Allis et al., 2007; Dimitrova and Gilbert, 1999; Wu et al., 2005).
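The size of this combinatorial space can be bounded with a short calculation. The Python sketch below is only an order-of-magnitude illustration under stated assumptions: the 24 sites are treated as independent, a single copy of each site is counted, and, because the text does not say how many of the 24 sites are lysines and how many are arginines, the result is given as a range. Treating every site as an arginine gives three states per site (unmodified, mono- or di-methylated); treating every site as a lysine gives four (unmodified, me1, me2 or me3). The function name is hypothetical.

```python
def methylation_state_bounds(n_sites: int = 24) -> tuple[int, int]:
    """Lower and upper bounds on the number of combined methylation states."""
    lower = 3 ** n_sites  # all sites arginine-like: unmodified, mono, di
    upper = 4 ** n_sites  # all sites lysine-like: unmodified, me1, me2, me3
    return lower, upper


lower, upper = methylation_state_bounds()
print(f"{lower:,}")  # 282,429,536,481
print(f"{upper:,}")  # 281,474,976,710,656
```

Even the lower bound exceeds 10¹¹ combinations, which illustrates why the number of possible methylation patterns is described as enormous.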
Histone lysine methyltransferases (HKMTs) have been identified and their sites of modification on histones have been defined. All of these enzymes, except Dot1, share the SET domain, which contains the catalytically active site and allows binding to the S-adenosyl-L-methionine cofactor. Of the many known methylated sites, six have been well characterized to date: five on H3 (K4, K9, K27, K36 and K79) and one on H4 (K20). The role of these modifications in transcription is reported in Table 1-3. Specific protein binders recognize each of the six characterized methylation sites and, in turn, regulate chromatin condensation and gene expression. These proteins have one of three distinct types of methyl-lysine recognition domains: the chromo, tudor and PHD repeat domains. For example, the methylation of lysine 9 of histone H3 by the methyltransferase SUV39H creates a binding site for HP1 (Nishigaki et al., 2000).
Histone methylated site    Role in transcription
H3K4                       Activation
H3K9                       Repression
H3K27                      Repression
H3K36                      Activation
H3K79                      Activation
H4K20                      Repression
Table 1-3. Role of different histone methylated sites on transcription.
Figure 1-20. Sites of histone methylation, their protein binders, and functional role in genomic processes. Methylation of histones occurs at lysine residues in histones H3 and H4. Certain methylated lysine residues are associated with activating transcription (green Me flag), whereas others are involved in repressive processes (red Me flag). Proteins that bind particular methylated lysine residues are indicated. (Allis et al., 2007).
Once HP1 binds through