Secondary literature sources for T5orf172
The following references were automatically generated.
- Bahrami S, Ehsani R, Drablos F
- A property-based analysis of human transcription factors.
- BMC Res Notes. 2015; 8: 82
BACKGROUND: Transcription factors are essential proteins for regulating gene expression. This regulation depends upon specific features of the transcription factors, including how they interact with DNA, how they interact with each other, and how they are post-translationally modified. Reliable information about key properties associated with transcription factors will therefore be useful for data analysis, in particular of data from high-throughput experiments. RESULTS: We have used an existing list of 1978 human proteins described as transcription factors to make a well-annotated data set, which includes information on Pfam domains, DNA-binding domains, post-translational modifications and protein-protein interactions. We have then used this data set for enrichment analysis. We have investigated correlations within this set of features, and between the features and more general protein properties. We have also used the data set to analyze previously published gene lists associated with cell differentiation, cancer, and tissue distribution. CONCLUSIONS: The study shows that a well-annotated feature list for transcription factors is a useful resource for extensive data analysis, both of transcription factor properties in general and of properties associated with specific processes. However, the study also shows that such analyses are easily biased by incomplete coverage in experimental data, and by how gene sets are defined.
- NandyMazumdar M, Artsimovitch I
- Ubiquitous transcription factors display structural plasticity and diverse functions: NusG proteins - Shifting shapes and paradigms.
- Bioessays. 2015; 37: 324-34
Numerous accessory factors modulate RNA polymerase response to regulatory signals and cellular cues and establish communications with co-transcriptional RNA processing. Transcription regulators are astonishingly diverse, with similar mechanisms arising via convergent evolution. NusG/Spt5 elongation factors comprise the only universally conserved and ancient family of regulators. They bind to the conserved clamp helices domain of RNA polymerase, which also interacts with non-homologous initiation factors in all domains of life, and reach across the DNA channel to form processivity clamps that enable uninterrupted RNA chain synthesis. In addition to this ubiquitous function, NusG homologs exert diverse, and sometimes opposite, effects on gene expression by competing with each other and other regulators for binding to the clamp helices and by recruiting auxiliary factors that facilitate termination, antitermination, splicing, translation, etc. This surprisingly diverse range of activities and the underlying unprecedented structural changes make studies of these "transformer" proteins both challenging and rewarding.
- Kaushik S, Sowdhamini R
- Distribution, classification, domain architectures and evolution of prolyl oligopeptidases in prokaryotic lineages.
- BMC Genomics. 2014; 15: 985
BACKGROUND: Prolyl oligopeptidases (POPs) are proteolytic enzymes, widely distributed in all the kingdoms of life. Bacterial POPs are pharmaceutically important enzymes, yet their functional and evolutionary details are not fully explored. Therefore, current analysis is aimed at understanding the distribution, domain architecture, probable biological functions and gene family expansion of POPs in bacterial and archaeal lineages. RESULTS: Exhaustive sequence analysis of 1,202 bacterial and 91 archaeal genomes revealed ~3,000 POP homologs, with only 638 annotated POPs. We observed wide distribution of POPs in all the analysed bacterial lineages. Phylogenetic analysis and co-clustering of POPs of different phyla suggested their common functions in all the prokaryotic species. Further, on the basis of unique sequence motifs we could classify bacterial POPs into eight subtypes. Analysis of coexisting domains in POPs highlighted their involvement in protein-protein interactions and cellular signaling. We proposed significant extension of this gene family by characterizing 39 new POPs and 158 new alpha/beta hydrolase members. CONCLUSIONS: Our study reflects diversity and functional importance of POPs in bacterial species. Many genomes with multiple POPs were identified with high sequence variations and different cellular localizations. Such anomalous distribution of POP genes in different bacterial genomes shows differential expansion of POP gene family primarily by multiple horizontal gene transfer events.
- Gonzalez MW, Spouge JL
- Domain analysis of symbionts and hosts (DASH) in a genome-wide survey of pathogenic human viruses.
- BMC Res Notes. 2013; 6: 209
BACKGROUND: In the coevolution of viruses and their hosts, viruses often capture host genes, gaining advantageous functions (e.g. immune system control). Identifying functional similarities shared by viruses and their hosts can help decipher mechanisms of pathogenesis and accelerate virus-targeted drug and vaccine development. Cellular homologs in viruses are usually documented using pairwise-sequence comparison methods. Yet, pairwise-sequence searches have limited sensitivity resulting in poor identification of divergent homologies. RESULTS: Methods based on profiles from multiple sequences provide a more sensitive alternative to identify similarities in host-pathogen systems. The present work describes a profile-based bioinformatics pipeline that we call the Domain Analysis of Symbionts and Hosts (DASH). DASH provides a web platform for the functional analysis of viral and host genomes. This study uses Human Herpesvirus 8 (HHV-8) as a model to validate the methodology. Our results indicate that HHV-8 shares at least 29% of its genes with humans (fourteen immunomodulatory and ten metabolic genes). DASH also suggests functions for fifty-one additional HHV-8 structural and metabolic proteins. We also perform two other comparative genomics studies of human viruses: (1) a broad survey of eleven viruses of disparate sizes and transcription strategies; and (2) a closer examination of forty-one viruses of the order Mononegavirales. In the survey, DASH detects human homologs in 4/5 DNA viruses. None of the non-retro-transcribing RNA viruses in the survey showed evidence of homology to humans. The order Mononegavirales are also non-retro-transcribing RNA viruses, however, and DASH found homology in 39/41 of them. Mononegaviruses display larger fractions of human similarities (up to 75%) than any of the other RNA or DNA viruses (up to 55% and 29% respectively). 
CONCLUSIONS: We conclude that gene sharing probably occurs between humans and both DNA and RNA viruses, in viral genomes of differing sizes, regardless of transcription strategies. Our method (DASH) simultaneously analyzes the genomes of two interacting species thereby mining functional information to identify shared as well as exclusive domains to each organism. Our results validate our approach, showing that DASH has potential as a pipeline for making therapeutic discoveries in other host-symbiont systems. DASH results are available at http://tinyurl.com/spouge-dash.
- Iyer LM, Aravind L
- ALOG domains: provenance of plant homeotic and developmental regulators from the DNA-binding domain of a novel class of DIRS1-type retroposons.
- Biol Direct. 2012; 7: 39
Members of the Arabidopsis LSH1 and Oryza G1 (ALOG) family of proteins have been shown to function as key developmental regulators in land plants. However, their precise mode of action remains unclear. Using sensitive sequence and structure analysis, we show that the ALOG domains are a distinct version of the N-terminal DNA-binding domain shared by the XerC/D-like, protelomerase, topoisomerase-IA, and Flp tyrosine recombinases. ALOG domains are distinguished by the insertion of an additional zinc ribbon into this DNA-binding domain. In particular, we show that the ALOG domain is derived from the XerC/D-like recombinases of a novel class of DIRS-1-like retroposons. Copies of this element, which have been recently inactivated, are present in several marine metazoan lineages, whereas the stramenopile Ectocarpus, retains an active copy of the same. Thus, we predict that ALOG domains help establish organ identity and differentiation by binding specific DNA sequences and acting as transcription factors or recruiters of repressive chromatin. They are also found in certain plant defense proteins, where they are predicted to function as DNA sensors. The evolutionary history of the ALOG domain represents a unique instance of a domain, otherwise exclusively found in retroelements, being recruited as a specific transcription factor in the streptophyte lineage of plants. Hence, they add to the growing evidence for derivation of DNA-binding domains of eukaryotic specific TFs from mobile and selfish elements.
- Fischer MG, Suttle CA
- A virophage at the origin of large DNA transposons.
- Science. 2011; 332: 231-4
DNA transposons are mobile genetic elements that have shaped the genomes of eukaryotes for millions of years, yet their origins remain obscure. We discovered a virophage that, on the basis of genetic homology, likely represents an evolutionary link between double-stranded DNA viruses and Maverick/Polinton eukaryotic DNA transposons. The Mavirus virophage parasitizes the giant Cafeteria roenbergensis virus and encodes 20 predicted proteins, including a retroviral integrase and a protein-primed DNA polymerase B. On the basis of our data, we conclude that Maverick/Polinton transposons may have originated from ancient relatives of Mavirus, and thereby influenced the evolution of eukaryotic genomes, although we cannot rule out alternative evolutionary scenarios.
- Makarova KS, Wolf YI, Snir S, Koonin EV
- Defense islands in bacterial and archaeal genomes and prediction of novel defense systems.
- J Bacteriol. 2011; 193: 6039-56
The arms race between cellular life forms and viruses is a major driving force of evolution. A substantial fraction of bacterial and archaeal genomes is dedicated to antivirus defense. We analyzed the distribution of defense genes and typical mobilome components (such as viral and transposon genes) in bacterial and archaeal genomes and demonstrated statistically significant clustering of antivirus defense systems and mobile genes and elements in genomic islands. The defense islands are enriched in putative operons and contain numerous overrepresented gene families. A detailed sequence analysis of the proteins encoded by genes in these families shows that many of them are diverged variants of known defense system components, whereas others show features, such as characteristic operonic organization, that are suggestive of novel defense systems. Thus, genomic islands provide abundant material for the experimental study of bacterial and archaeal antivirus defense. Except for the CRISPR-Cas systems, different classes of defense systems, in particular toxin-antitoxin and restriction-modification systems, show nonrandom clustering in defense islands. It remains unclear to what extent these associations reflect functional cooperation between different defense systems and to what extent the islands are genomic "sinks" that accumulate diverse nonessential genes, particularly those acquired via horizontal gene transfer. The characteristics of defense islands resemble those of mobilome islands. Defense and mobilome genes are nonrandomly associated in islands, suggesting nonadaptive evolution of the islands via a preferential attachment-like mechanism underpinned by the addictive properties of defense systems such as toxins-antitoxins and an important role of horizontal mobility in the evolution of these islands.
- Aravind L, Abhiman S, Iyer LM
- Natural history of the eukaryotic chromatin protein methylation system.
- Prog Mol Biol Transl Sci. 2011; 101: 105-76
In eukaryotes, methylation of nucleosomal histones and other nuclear proteins is a central aspect of chromatin structure and dynamics. The past 15 years have seen an enormous advance in our understanding of the biochemistry of these modifications, and of their role in establishing the epigenetic code. We provide a synthetic overview, from an evolutionary perspective, of the main players in the eukaryotic chromatin protein methylation system, with an emphasis on catalytic domains. Several components of the eukaryotic protein methylation system had their origins in bacteria. In particular, the Rossmann fold protein methylases (PRMTs and DOT1), and the LSD1 and jumonji-related demethylases and oxidases, appear to have emerged in the context of bacterial peptide methylation and hydroxylation systems. These systems were originally involved in synthesis of peptide secondary metabolites, such as antibiotics, toxins, and siderophores. The peptidylarginine deiminases appear to have been acquired by animals from bacterial enzymes that modify cell-surface proteins. SET domain methylases, which display the beta-clip fold, apparently first emerged in prokaryotes from the SAF superfamily of carbohydrate-binding domains. However, even in bacteria, a subset of the SET domains might have evolved a chromatin-related role in conjunction with a BAF60a/b-like SWIB domain protein and topoisomerases. By the time of the last eukaryotic common ancestor, multiple SET and PRMT methylases were already in place and are likely to have mediated methylation at the H3K4, H3K9, H3K36, and H4K20 positions, and carried out both asymmetric and symmetric arginine dimethylation. Inference of H3K27 methylation in the ancestral eukaryote appears uncertain, though it was certainly in place a little later in eukaryotic evolution. Current data suggest that unlike SET methylases, which are universally present in eukaryotes, demethylases are not. 
They appear to be absent in the earliest-branching eukaryotic lineages, and emerged later along with several other chromatin proteins, such as the Dot1-methylase, prior to divergence of the kinetoplastid-heterolobosean lineage from the remaining eukaryotes. This period also corresponds to the point of origin of DNA cytosine methylation by DNMT1. Origin of major lineages of SET domains such as the Trithorax, Su(var)3-9, Ash1, SMYD, and TTLL12 and E(Z) might have played the initial role in the establishment of multiple distinct heterochromatic and euchromatic states that are likely to have been present, in some form, through much of eukaryotic evolution. Elaboration of these chromatin states might have gone hand-in-hand with acquisition of multiple jumonji-related and LSD1-like demethylases, and functional linkages with the DNA methylation and RNAi systems. Throughout eukaryotic evolution, there were several lineage-specific expansions of SET domain proteins, which might be related to a special transcription regulation process in trypanosomes, acquisition of new meiotic recombination hotspots in animals, and methylation and associated modifications of the diatom silaffin proteins involved in silica biomineralization. The use of specific domains to "read" the methylation marks appears to have been present in the ancestral eukaryote itself. Of these the chromo-like domains appear to have been acquired from bacterial secreted proteins that might have a role in binding cell-surface peptides or peptidoglycan. Domain architectures of the primary enzymes involved in the eukaryotic protein methylation system indicate key features relating to interactions with each other and other modifications in chromatin, such as acetylation. They also emphasize the profound functional distinction between the role of demethylation and deacetylation in regulation of chromatin dynamics.
- Bilewitch JP, Degnan SM
- A unique horizontal gene transfer event has provided the octocoral mitochondrial genome with an active mismatch repair gene that has potential for an unusual self-contained function.
- BMC Evol Biol. 2011; 11: 228
BACKGROUND: The mitochondrial genome of the Octocorallia has several characteristics atypical for metazoans, including a novel gene suggested to function in DNA repair. This mtMutS gene is favored for octocoral molecular systematics, due to its high information content. Several hypotheses concerning the origins of mtMutS have been proposed, and remain equivocal, although current weight of support is for a horizontal gene transfer from either an epsilonproteobacterium or a large DNA virus. Here we present new and compelling evidence on the evolutionary origin of mtMutS, and provide the very first data on its activity, functional capacity and stability within the octocoral mitochondrial genome. RESULTS: The mtMutS gene has the expected conserved amino acids, protein domains and predicted tertiary protein structure. Phylogenetic analysis indicates that mtMutS is not a member of the MSH family and therefore not of eukaryotic origin. MtMutS clusters closely with representatives of the MutS7 lineage; further support for this relationship derives from the sharing of a C-terminal endonuclease domain that confers a self-contained mismatch repair function. Gene expression analyses confirm that mtMutS is actively transcribed in octocorals. Rates of mitochondrial gene evolution in mtMutS-containing octocorals are lower than in their hexacoral sister-group, which lacks the gene, although paradoxically the mtMutS gene itself has higher rates of mutation than other octocoral mitochondrial genes. CONCLUSIONS: The octocoral mtMutS gene is active and codes for a protein with all the necessary components for DNA mismatch repair. A lower rate of mitochondrial evolution, and the presence of a nicking endonuclease domain, both indirectly support a theory of self-sufficient DNA mismatch repair within the octocoral mitochondrion. The ancestral affinity of mtMutS to non-eukaryotic MutS7 provides compelling support for an origin by horizontal gene transfer. 
The immediate vector of transmission into octocorals can be attributed to either an epsilonproteobacterium in an endosymbiotic association or to a viral infection, although DNA viruses are not currently known to infect both bacteria and eukaryotes, nor mitochondria in particular. In consolidating the first known case of HGT into an animal mitochondrial genome, these findings suggest the need for reconsideration of the means by which metazoan mitochondrial genomes evolve.
- Gonzalez JM, Esteban M
- A poxvirus Bcl-2-like gene family involved in regulation of host immune response: sequence similarity and evolutionary history.
- Virol J. 2010; 7: 59
BACKGROUND: Poxviruses evade the immune system of the host through the action of viral encoded inhibitors that block various signalling pathways. The exact number of viral inhibitors is not yet known. Several members of the vaccinia virus A46 and N1 families, with a Bcl-2-like structure, are involved in the regulation of the host innate immune response where they act non-redundantly at different levels of the Toll-like receptor signalling pathway. N1 also maintains an anti-apoptotic effect by acting similarly to cellular Bcl-2 proteins. Whether there are related families that could have similar functions is the main subject of this investigation. RESULTS: We describe the sequence similarity existing among poxvirus A46, N1, N2 and C1 protein families, which share a common domain of approximately 110-140 amino acids at their C-termini that spans the entire N1 sequence. Secondary structure and fold recognition predictions suggest that this domain presents an all-alpha-helical fold compatible with the Bcl-2-like structures of vaccinia virus proteins N1, A52, B15 and K7. We propose that these protein families should be merged into a single one. We describe the phylogenetic distribution of this family and reconstruct its evolutionary history, which indicates an extensive gene gain in ancestral viruses and a further stabilization of its gene content. CONCLUSIONS: Based on the sequence/structure similarity, we propose that other members with unknown function, like vaccinia virus N2, C1, C6 and C16/B22, might have a similar role in the suppression of host immune response as A46, A52, B15 and K7, by antagonizing at different levels with the TLR signalling pathways.
- Sorokin V, Severinov K, Gelfand MS
- Large-scale identification and analysis of C-proteins.
- Methods Mol Biol. 2010; 674: 269-82
The restriction-modification system is a toxin-antitoxin mechanism of bacterial cells to resist phage attacks. High efficiency comes at a price of high maintenance costs: (1) a host cell dies whenever it loses restriction-modification genes and (2) whenever a plasmid with restriction-modification genes enters a naive cell, modification enzyme (methylase) has to be expressed prior to the synthesis of the restriction enzyme (restrictase) or the cell dies. These phenomena imply a sophisticated regulatory mechanism. During the evolution several such mechanisms were developed, of which one relies on a special C(control)-protein, a short autoregulatory protein containing an HTH-domain. Given the extreme diversity among restriction-modification systems, one could expect that C-proteins had evolved into several groups that might differ in autoregulatory binding sites architecture. However, only a few C-proteins (and the corresponding binding sites) were known before this study. Bioinformatics studies applied to C-proteins and their binding sites were limited to groups of well-known C-proteins and lacked systematic analysis. In this work, the authors use bioinformatics techniques to discover 201 C-protein genes with predicted autoregulatory binding sites. The systematic analysis of the predicted sites allowed for the discovery of 10 structural classes of binding sites.
- Carland TM, Gerwick L
- The C1q domain containing proteins: Where do they come from and what do they do?
- Dev Comp Immunol. 2010; 34: 785-90
The gene sequence encoding an N-terminal collagen stalk followed by a globular complement 1q domain (gC1q), an architecture that characterizes the C1q A, B and C chains of the first complement component (C1), did not become prevalent until the cephalochordates and urochordates. However, genes encoding only the globular complement 1q domain (ghC1q) are more ancient as they exist within many lower vertebrate and invertebrate genomes, and are even present in the prokaryotes. These genes can be divided into two groups, the first, which appears to be the more ancient form, encodes proteins that are not secreted (cghC1q). The second group encodes proteins in which the globular domain is preceded by a signal peptide indicating secretion (sghC1q). In this review we examine bioinformatic evidence for C1q domain containing (C1qDC) genes in many organisms and integrate these observations with research performed and published on the biochemistry and functions of this fascinating set of proteins.
- Christie GE et al.
- The complete genomes of Staphylococcus aureus bacteriophages 80 and 80alpha--implications for the specificity of SaPI mobilization.
- Virology. 2010; 407: 381-90
Staphylococcus aureus pathogenicity islands (SaPIs) are mobile elements that are induced by a helper bacteriophage to excise and replicate and to be encapsidated in phage-like particles smaller than those of the helper, leading to high-frequency transfer. SaPI mobilization is helper phage specific; only certain SaPIs can be mobilized by a particular helper phage. Staphylococcal phage 80alpha can mobilize every SaPI tested thus far, including SaPI1, SaPI2 and SaPIbov1. Phage 80, on the other hand, cannot mobilize SaPI1, and φ11 mobilizes only SaPIbov1. In order to better understand the relationship between SaPIs and their helper phages, the genomes of phages 80 and 80alpha were sequenced, compared with other staphylococcal phage genomes, and analyzed for unique features that may be involved in SaPI mobilization.
- Lohse MB, Zordan RE, Cain CW, Johnson AD
- Distinct class of DNA-binding domains is exemplified by a master regulator of phenotypic switching in Candida albicans.
- Proc Natl Acad Sci U S A. 2010; 107: 14105-10
Among the most important classes of regulatory proteins are the sequence-specific DNA-binding proteins that control transcription through the occupancy of discrete DNA sequences within genomes. Currently, this class of proteins encompasses at least 37 distinct structural superfamilies and more than 100 distinct structural motifs. In this paper, we examine the transcriptional regulator Wor1, a master regulator of white-opaque switching in the human fungal pathogen Candida albicans. As assessed by a variety of algorithms, this protein has no sequence or structural similarity to any known DNA-binding protein. It is, however, conserved across the vast fungal lineage, with a 300aa region of sequence conservation. Here, we show that this 300aa region of Wor1 exhibits sequence-specific DNA binding and therefore represents a new superfamily of DNA-binding proteins. We identify the 14-nucleotide-pair DNA sequence recognized by Wor1, characterize the site through mutational analysis, and demonstrate that this sequence is sufficient for the Wor1-dependent activation of transcription in vivo. Within the 300aa DNA-binding conserved region, which we have termed the WOPR box, are two domains (WOPRa and WOPRb), dissimilar to each other but especially well-conserved across the fungal lineage. We show that the WOPR box binds DNA as a monomer and that neither domain, when expressed and purified separately, exhibits sequence-specific binding. DNA binding is restored, however, when the two isolated domains are added together. These results indicate that the WOPR family of DNA-binding proteins involves an unusual coupling between two dissimilar, covalently linked domains.
- Apic G, Russell RB
- Domain recombination: a workhorse for evolutionary innovation.
- Sci Signal. 2010; 3: 30
Although the combination of modular domains within proteins has been proposed as a determining feature of evolutionary innovation and flexibility, direct evidence for this mechanism of evolution has been sketchy. Two papers, one creating new domain combinations in the yeast mating pathway and another involving a comprehensive analysis of protein function and domain architecture across major organisms, have provided firm evidence that the recombining of domains can lead to evolutionary innovation. The results will guide future studies in synthetic and evolutionary biology.
- Nacher JC, Hayashida M, Akutsu T
- The role of internal duplication in the evolution of multi-domain proteins.
- Biosystems. 2010; 101: 127-35
Many proteins consist of several structural domains. These multi-domain proteins have likely been generated by selective genome growth dynamics during evolution to perform new functions as well as to create structures that fold on a biologically feasible time scale. Domain units frequently evolved through a variety of genetic shuffling mechanisms. Here we examine the protein domain statistics of more than 1000 organisms including eukaryotic, archaeal and bacterial species. The analysis extends earlier findings on asymmetric statistical laws for proteome to a wider variety of species. While proteins are composed of a wide range of domains, displaying a power-law decay, the computation of domain families for each protein reveals an exponential distribution, characterizing a protein universe composed of a thin number of unique families. Structural studies in proteomics have shown that domain repeats, or internal duplicated domains, represent a small but significant fraction of genome. In spite of its importance, this observation has been largely overlooked until recently. We model the evolutionary dynamics of proteome and demonstrate that these distinct distributions are in fact rooted in an internal duplication mechanism. This process generates the contemporary protein structural domain universe, determines its reduced thickness, and tames its growth. These findings have important implications, ranging from protein interaction network modeling to evolutionary studies based on fundamental mechanisms governing genome expansion.
- Boyer M, Gimenez G, Suzan-Monti M, Raoult D
- Classification and determination of possible origins of ORFans through analysis of nucleocytoplasmic large DNA viruses.
- Intervirology. 2010; 53: 310-20
OBJECTIVE: An important proportion of coding sequences in genomes, notably in viruses, do not match any sequences in databases and are assigned as ORFan sequences. Nucleocytoplasmic large DNA viruses (NCLDVs) harbor great numbers of ORFs with a high number consisting of ORFans. Thus, we decided to decipher the nature of ORFans in the NCLDVs. METHODS: A genome-wide study was carried out to estimate the ORFan proportion in NCLDV genomes and to analyze their general features compared with non-ORFan. RESULTS: The ORFan percentages comprised between 2.8 and 75.2% of the ORF content according to the virus lineage. We propose to classify ORFans in four categories according to their possible match with metagenomic sequences and their prevalence at different taxonomic ranks. Our results indicate that NCLDV ORFans have overall similar features with non-ORFans, except they are shorter. CONCLUSIONS: An ORFan classification scheme was proposed to decipher their origin and evolution. Most ORFans were likely labeled ORFan owing to the gap of knowledge of the sequence space. ORFans might be true functional genes with likely the same expression potential as non-ORFan genes. Part of them may also correspond to new genes formed de novo through the diverse mechanisms of gene evolution.
- Colson P, Raoult D
- Gene repertoire of amoeba-associated giant viruses.
- Intervirology. 2010; 53: 330-43
Acanthamoeba polyphaga mimivirus, Marseillevirus, and Sputnik, a virophage, are intra-amoebal viruses that have been isolated from water collected in cooling towers. They have provided fascinating data and have raised exciting questions about virus definition and evolution. Mimivirus and Marseillevirus have been classified in the nucleo-cytoplasmic large DNA viruses (NCLDVs) class. Their genomes are the largest and fifth largest viral genomes sequenced so far. The gene repertoire of these amoeba-associated viruses can be divided into four groups: the core genome, genes acquired by lateral gene transfer, duplicated genes, and ORFans. Open reading frames (ORFs) that have homologs in the NCLDVs core gene set represent 2.9 and 6.1% of the Mimivirus and Marseillevirus gene contents, respectively. A substantial proportion of the Mimivirus, Marseillevirus and Sputnik ORFs exhibit sequence similarities to homologs found in bacteria, archaea, eukaryotes or viruses. The large number of chimeric genes in these viral genomes might have resulted from acquisitions by lateral gene transfers, implicating sympatric bacteria and viruses with an intra-amoebal lifestyle. In addition, lineage-specific gene expansion may have played a major role in the genome shaping. Altogether, the data so far accumulated on amoeba-associated giant viruses are a powerful incentive to isolate and study additional strains to gain better understanding of their pangenome.
- Yutin N, Wolf YI, Raoult D, Koonin EV
- Eukaryotic large nucleo-cytoplasmic DNA viruses: clusters of orthologous genes and reconstruction of viral genome evolution.
- Virol J. 2009; 6: 223
BACKGROUND: The Nucleo-Cytoplasmic Large DNA Viruses (NCLDV) comprise an apparently monophyletic class of viruses that infect a broad variety of eukaryotic hosts. Recent progress in the isolation of new viruses and in genome sequencing has substantially expanded the known NCLDV diversity, creating additional opportunities for comparative genomic analysis and a demand for a comprehensive classification of viral genes. RESULTS: A comprehensive comparison of the protein sequences encoded in the genomes of 45 NCLDV belonging to 6 families was performed in order to delineate clusters of orthologous viral genes. Using previously developed computational methods for orthology identification, 1445 Nucleo-Cytoplasmic Virus Orthologous Groups (NCVOGs) were identified, of which 177 are represented in more than one NCLDV family. The NCVOGs were manually curated and annotated and can be used as a computational platform for functional annotation and evolutionary analysis of new NCLDV genomes. A maximum-likelihood reconstruction of the NCLDV evolution yielded a set of 47 conserved genes that were probably present in the genome of the common ancestor of this class of eukaryotic viruses. This reconstructed ancestral gene set is robust to the parameters of the reconstruction procedure and so is likely to accurately reflect the gene core of the ancestral NCLDV, indicating that this virus encoded a complex machinery of replication, expression and morphogenesis that made it relatively independent from host cell functions. CONCLUSIONS: The NCVOGs are a flexible and expandable platform for genome analysis and functional annotation of newly characterized NCLDV. Evolutionary reconstructions employing NCVOGs point to complex ancestral viruses.
- Makarova KS, Wolf YI, van der Oost J, Koonin EV
- Prokaryotic homologs of Argonaute proteins are predicted to function as key components of a novel system of defense against mobile genetic elements.
- Biol Direct. 2009; 4: 29
BACKGROUND: In eukaryotes, RNA interference (RNAi) is a major mechanism of defense against viruses and transposable elements, as well as of regulating translation of endogenous mRNAs. The RNAi systems recognize the target RNA molecules via small guide RNAs that are completely or partially complementary to a region of the target. Key components of the RNAi systems are proteins of the Argonaute-PIWI family, some of which function as slicers, the nucleases that cleave the target RNA that is base-paired to a guide RNA. Numerous prokaryotes possess the CRISPR-associated system (CASS) of defense against phages and plasmids that is, in part, mechanistically analogous but not homologous to eukaryotic RNAi systems. Many prokaryotes also encode homologs of Argonaute-PIWI proteins but their functions remain unknown. RESULTS: We present a detailed analysis of Argonaute-PIWI protein sequences and the genomic neighborhoods of the respective genes in prokaryotes. Whereas eukaryotic Ago/PIWI proteins always contain PAZ (oligonucleotide binding) and PIWI (active or inactivated nuclease) domains, the prokaryotic Argonaute homologs (pAgos) fall into two major groups in which the PAZ domain is either present or absent. The monophyly of each group is supported by a phylogenetic analysis of the conserved PIWI domains. Almost all pAgos that lack a PAZ domain appear to be inactivated, and the respective genes are associated with a variety of predicted nucleases in putative operons. An additional, uncharacterized domain that is fused to various nucleases appears to be a unique signature of operons encoding the short (lacking PAZ) pAgo form. By contrast, almost all PAZ-domain containing pAgos are predicted to be active nucleases. Some proteins of this group (e.g., that from Aquifex aeolicus) have been experimentally shown to possess nuclease activity, and are not typically associated with genes for other (putative) nucleases. 
Given these observations, the apparent extensive horizontal transfer of pAgo genes, and their common, statistically significant over-representation in genomic neighborhoods enriched in genes encoding proteins involved in the defense against phages and/or plasmids, we hypothesize that pAgos are key components of a novel class of defense systems. The PAZ-domain containing pAgos are predicted to directly destroy virus or plasmid nucleic acids via their nuclease activity, whereas the apparently inactivated, PAZ-lacking pAgos could be structural subunits of protein complexes that contain, as active moieties, the putative nucleases that we predict to be co-expressed with these pAgos. All these nucleases are predicted to be DNA endonucleases, so it seems most probable that the putative novel phage/plasmid-defense system targets phage DNA rather than mRNAs. Given that in eukaryotic RNAi systems, the PAZ domain binds a guide RNA and positions it on the complementary region of the target, we further speculate that pAgos function on a similar principle (the guide being either DNA or RNA), and that the uncharacterized domain found in putative operons with the short forms of pAgos is a functional substitute for the PAZ domain. CONCLUSION: The hypothesis that pAgos are key components of a novel prokaryotic immune system that employs guide RNA or DNA molecules to degrade nucleic acids of invading mobile elements implies a functional analogy with the prokaryotic CASS and a direct evolutionary connection with eukaryotic RNAi. The predictions of the hypothesis including both the activities of pAgos and those of the associated endonucleases are readily amenable to experimental tests.
- Ye Y, Scheel H, Hofmann K, Komander D
- Dissection of USP catalytic domains reveals five common insertion points.
- Mol Biosyst. 2009; 5: 1797-808
Ubiquitin specific proteases (USPs) are the largest family of deubiquitinating enzymes with approximately 56 members in humans. USPs regulate a wide variety of cellular processes by their ability to remove (poly)ubiquitin from target proteins. Their enzymatic activity is encoded in a common catalytic core of approximately 350 amino acids; however, many USPs show significantly larger catalytic domains. Here we have analysed human and yeast USP domains, combining bioinformatics with structural information. We reveal that all USP domains can be divided into six conserved boxes, and we map the conserved boxes onto the USP domain core structure. The boxes are interspersed by insertions, some of which are as large as the catalytic core. The two most common insertion points place inserts near the distal ubiquitin binding site, and in many cases ubiquitin binding domains or ubiquitin-like folds are found in these insertions, potentially directly affecting catalytic function. Other inserted sequences are unstructured, and removal of these might aid future structural and functional analysis. Yeast USP domains have a different pattern of inserted sequences, suggesting that the insertions are hotspots for evolutionary diversity to expand USP functionality.
- Yutin N, Koonin EV
- Evolution of DNA ligases of nucleo-cytoplasmic large DNA viruses of eukaryotes: a case of hidden complexity.
- Biol Direct. 2009; 4: 51
BACKGROUND: Eukaryotic Nucleo-Cytoplasmic Large DNA Viruses (NCLDV) encode most if not all of the enzymes involved in their DNA replication. It has been inferred that genes for these enzymes were already present in the last common ancestor of the NCLDV. However, the details of the evolution of these genes that bear on the complexity of the putative ancestral NCLDV and on the evolutionary relationships between viruses and their hosts are not well understood. RESULTS: Phylogenetic analysis of the ATP-dependent and NAD-dependent DNA ligases encoded by the NCLDV reveals an unexpectedly complex evolutionary history. The NAD-dependent ligases are encoded only by a minority of NCLDV (including mimiviruses, some iridoviruses and entomopoxviruses) but phylogenetic analysis clearly indicated that all viral NAD-dependent ligases are monophyletic. Combined with the topology of the NCLDV tree derived by consensus of trees for universally conserved genes, this suggests that this enzyme was represented in the ancestral NCLDV. Phylogenetic analysis of ATP-dependent ligases that are encoded by chordopoxviruses, most of the phycodnaviruses and Marseillevirus failed to demonstrate monophyly and instead revealed an unexpectedly complex evolutionary trajectory. The ligases of the majority of phycodnaviruses and Marseillevirus seem to have evolved from bacteriophage or bacterial homologs; the ligase of one phycodnavirus, Emiliania huxleyi virus, belongs to the eukaryotic DNA ligase I branch; and ligases of chordopoxviruses unequivocally cluster with eukaryotic DNA ligase III. CONCLUSIONS: Examination of phyletic patterns and phylogenetic analysis of DNA ligases of the NCLDV suggest that the common ancestor of the extant NCLDV encoded an NAD-dependent ligase that most likely was acquired from a bacteriophage at the early stages of evolution of eukaryotes. 
By contrast, ATP-dependent ligases from different prokaryotic and eukaryotic sources displaced the ancestral NAD-dependent ligase at different stages of subsequent evolution. These findings emphasize complex routes of viral evolution that become apparent through detailed phylogenomic analysis but not necessarily in reconstructions based on phyletic patterns of genes. REVIEWERS: This article was reviewed by: Patrick Forterre, George V. Shpakovski, and Igor B. Zhulin.
- Song N, Joseph JM, Davis GB, Durand D
- Sequence similarity network reveals common ancestry of multidomain proteins.
- PLoS Comput Biol. 2008; 4: 1000063
We address the problem of homology identification in complex multidomain families with varied domain architectures. The challenge is to distinguish sequence pairs that share common ancestry from pairs that share an inserted domain but are otherwise unrelated. This distinction is essential for accuracy in gene annotation, function prediction, and comparative genomics. There are two major obstacles to multidomain homology identification: lack of a formal definition and lack of curated benchmarks for evaluating the performance of new methods. We offer preliminary solutions to both problems: 1) an extension of the traditional model of homology to include domain insertions; and 2) a manually curated benchmark of well-studied families in mouse and human. We further present Neighborhood Correlation, a novel method that exploits the local structure of the sequence similarity network to identify homologs with great accuracy based on the observation that gene duplication and domain shuffling leave distinct patterns in the sequence similarity network. In a rigorous, empirical comparison using our curated data, Neighborhood Correlation outperforms sequence similarity, alignment length, and domain architecture comparison. Neighborhood Correlation is well suited for automated, genome-scale analyses. It is easy to compute, does not require explicit knowledge of domain architecture, and classifies both single and multidomain homologs with high accuracy. Homolog predictions obtained with our method, as well as our manually curated benchmark and a web-based visualization tool for exploratory analysis of the network neighborhood structure, are available at http://www.neighborhoodcorrelation.org. Our work represents a departure from the prevailing view that the concept of homology cannot be applied to genes that have undergone domain shuffling. 
In contrast to current approaches that either focus on the homology of individual domains or consider only families with identical domain architectures, we show that homology can be rationally defined for multidomain families with diverse architectures by considering the genomic context of the genes that encode them. Our study demonstrates the utility of mining network structure for evolutionary information, suggesting this is a fertile approach for investigating evolutionary processes in the post-genomic era.
- Park J et al.
- FTFD: an informatics pipeline supporting phylogenomic analysis of fungal transcription factors.
- Bioinformatics. 2008; 24: 1024-5
SUMMARY: Genomes of more than 60 fungal species have been sequenced to date, yet there has been no systematic approach to analyzing fungal transcription factors (TFs) across the kingdom. We developed a standardized pipeline for annotating TFs in fungal genomes. The resulting data have been archived in a new database termed the Fungal Transcription Factor Database (FTFD). In FTFD, 31,832 putative fungal TFs, identified from 62 fungal and 3 Oomycete species, were classified into 61 families and phylogenetically analyzed. The FTFD will serve as a community resource supporting comparative analyses of the distribution and domain structure of TFs within and across species. AVAILABILITY: All data described in this study can be browsed through the FTFD web site at http://ftfd.snu.ac.kr/.
- Kumar A, Joo WS, Meinke G, Moine S, Naumova EN, Bullock PA
- Evidence for a structural relationship between BRCT domains and the helicase domains of the replication initiators encoded by the Polyomaviridae and Papillomaviridae families of DNA tumor viruses.
- J Virol. 2008; 82: 8849-62
Studies of DNA tumor viruses have provided important insights into fundamental cellular processes and oncogenic transformation. They have revealed, for example, that upon expression of virally encoded proteins, cellular pathways involved in DNA repair and cell cycle control are disrupted. Herein, evidence is presented that BRCT-related regions are present in the helicase domains of the viral initiators encoded by the Polyomaviridae and Papillomaviridae viral families. Of interest, BRCT domains in cellular proteins recruit factors involved in diverse pathways, including DNA repair and the regulation of cell cycle progression. Therefore, the viral BRCT-related regions may compete with host BRCT domains for particular cellular ligands, a process that would help to explain the pleiotropic effects associated with infections with many DNA tumor viruses.
- Prevorovsky M, Puta F, Folk P
- Fungal CSL transcription factors.
- BMC Genomics. 2007; 8: 233
BACKGROUND: The CSL (CBF1/RBP-Jkappa/Suppressor of Hairless/LAG-1) transcription factor family members are well-known components of the transmembrane receptor Notch signaling pathway, which plays a critical role in metazoan development. They function as context-dependent activators or repressors of transcription of their responsive genes, the promoters of which harbor the GTG(G/A)GAA consensus elements. Recently, several studies described Notch-independent activities of the CSL proteins. RESULTS: We have identified putative CSL genes in several fungal species, showing that this family is not confined to metazoans. We have analyzed their sequence conservation and identified the presence of well-defined domains typical of genuine CSL proteins. Furthermore, we have shown that the candidate fungal protein sequences contain highly conserved regions known to be required for sequence-specific DNA binding in their metazoan counterparts. The phylogenetic analysis of the newly identified fungal CSL proteins revealed the existence of two distinct classes, both of which are present in all the species studied. CONCLUSION: Our findings support the evolutionary origin of the CSL transcription factor family in the last common ancestor of fungi and metazoans. We hypothesize that the ancestral CSL function involved DNA binding and Notch-independent regulation of transcription and that this function may still be shared, to a certain degree, by the present CSL family members from both fungi and metazoans.
- Kopp JL, Wilder PJ, Desler M, Kinarsky L, Rizzino A
- Different domains of the transcription factor ELF3 are required in a promoter-specific manner and multiple domains control its binding to DNA.
- J Biol Chem. 2007; 282: 3027-41
Elf3 is an epithelially restricted member of the ETS transcription factor family, which is involved in a wide range of normal cellular processes. Elf3 is also aberrantly expressed in several cancers, including breast cancer. To better understand the molecular mechanisms by which Elf3 regulates these processes, we created a large series of Elf3 mutant proteins with specific domains deleted or targeted by point mutations. The modified forms of Elf3 were used to analyze the contribution of each domain to DNA binding and the activation of gene expression. Our work demonstrates that three regions of Elf3, in addition to its DNA binding domain (ETS domain), influence Elf3 binding to DNA, including the transactivation domain that behaves as an autoinhibitory domain. Interestingly, disruption of the transactivation domain relieves the autoinhibition of Elf3 and enhances Elf3 binding to DNA. On the basis of these studies, we suggest a model for autoinhibition of Elf3 involving intramolecular interactions. Importantly, this model is consistent with our finding that the N-terminal region of Elf3, which contains the transactivation domain, interacts with its C terminus, which contains the ETS domain. In parallel studies, we demonstrate that residues flanking the N- and C-terminal sides of the ETS domain of Elf3 are crucial for its binding to DNA. Our studies also show that an AT-hook domain, as well as the serine- and aspartic acid-rich domain but not the pointed domain, is necessary for Elf3 activation of promoter activity. Unexpectedly, we determined that one of the AT-hook domains is required in a promoter-specific manner.
- Iyer LM, Balaji S, Koonin EV, Aravind L
- Evolutionary genomics of nucleo-cytoplasmic large DNA viruses.
- Virus Res. 2006; 117: 156-84
A previous comparative-genomic study of large nuclear and cytoplasmic DNA viruses (NCLDVs) of eukaryotes revealed the monophyletic origin of four viral families: poxviruses, asfarviruses, iridoviruses, and phycodnaviruses [Iyer, L.M., Aravind, L., Koonin, E.V., 2001. Common origin of four diverse families of large eukaryotic DNA viruses. J. Virol. 75 (23), 11720-11734]. Here we update this analysis by including the recently sequenced giant genome of the mimiviruses and several additional genomes of iridoviruses, phycodnaviruses, and poxviruses. The parsimonious reconstruction of the gene complement of the ancestral NCLDV shows that it was a complex virus with at least 41 genes that encoded the replication machinery, up to four RNA polymerase subunits, at least three transcription factors, capping and polyadenylation enzymes, the DNA packaging apparatus, and structural components of an icosahedral capsid and the viral membrane. The phylogeny of the NCLDVs is reconstructed by cladistic analysis of the viral gene complements, and it is shown that the two principal lineages of NCLDVs are comprised of poxviruses grouped with asfarviruses and iridoviruses grouped with phycodnaviruses-mimiviruses. The phycodna-mimivirus grouping was strongly supported by several derived shared characters, which seemed to rule out the previously suggested basal position of the mimivirus [Raoult, D., Audic, S., Robert, C., Abergel, C., Renesto, P., Ogata, H., La Scola, B., Suzan, M., Claverie, J.M. 2004. The 1.2-megabase genome sequence of Mimivirus. Science 306 (5700), 1344-1350]. These results indicate that the divergence of the major NCLDV families occurred at an early stage of evolution, prior to the divergence of the major eukaryotic lineages. 
It is shown that subsequent evolution of the NCLDV genomes involved lineage-specific expansion of paralogous gene families and acquisition of numerous genes via horizontal gene transfer from the eukaryotic hosts, other viruses, and bacteria (primarily, endosymbionts and parasites). Amongst the expansions, there are multiple families of predicted virus-specific signaling and regulatory domains. Most NCLDVs have also acquired large arrays of genes related to ubiquitin signaling, and the animal viruses in particular have independently evolved several defenses against apoptosis and immune response, including growth factors and potential inhibitors of cytokine signaling. The mimivirus displays an enormous array of genes of bacterial provenance, including a representative of a new class of predicted papain-like peptidases. It is further demonstrated that a significant number of genes found in NCLDVs also have homologs in bacteriophages, although a vertical relationship between the NCLDVs and a particular bacteriophage group could not be established. On the basis of these observations, two alternative scenarios for the origin of the NCLDVs and other groups of large DNA viruses of eukaryotes are considered. One of these scenarios posits an early assembly of an already large DNA virus precursor from which various large DNA viruses diverged through an ongoing process of displacement of the original genes by xenologous or non-orthologous genes from various sources. The second scenario posits convergent emergence, on multiple occasions, of large DNA viruses from small plasmid-like precursors through independent accretion of similar sets of genes due to strong selective pressures imposed by their life cycles and hosts.
- Pal LR, Guda C
- Tracing the origin of functional and conserved domains in the human proteome: implications for protein evolution at the modular level.
- BMC Evol Biol. 2006; 6: 91
BACKGROUND: The functional repertoire of the human proteome is an incremental collection of functions accomplished by protein domains evolved along the Homo sapiens lineage. Therefore, knowledge on the origin of these functionalities provides a better understanding of the domain and protein evolution in human. The lack of proper comprehension about such origin has impelled us to study the evolutionary origin of human proteome in a unique way as detailed in this study. RESULTS: This study reports a unique approach for understanding the evolution of human proteome by tracing the origin of its constituting domains hierarchically, along the Homo sapiens lineage. The uniqueness of this method lies in subtractive searching of functional and conserved domains in the human proteome resulting in higher efficiency of detecting their origins. From these analyses the nature of protein evolution and trends in domain evolution can be observed in the context of the entire human proteome data. The method adopted here also helps delineate the degree of divergence of functional families occurred during the course of evolution. CONCLUSION: This approach to trace the evolutionary origin of functional domains in the human proteome facilitates better understanding of their functional versatility as well as provides insights into the functionality of hypothetical proteins present in the human proteome. This work elucidates the origin of functional and conserved domains in human proteins, their distribution along the Homo sapiens lineage, occurrence frequency of different domain combinations and proteome-wide patterns of their distribution, providing insights into the evolutionary solution to the increased complexity of the human proteome.
- Zhang D, Martyniuk CJ, Trudeau VL
- SANTA domain: a novel conserved protein module in Eukaryota with potential involvement in chromatin regulation.
- Bioinformatics. 2006; 22: 2459-62
Since packaging of DNA in the chromatin structure restricts the accessibility for regulatory factors, chromatin remodeling is required to facilitate nuclear processes such as gene transcription, replication, and genome recombination. Many conserved non-enzymatic protein domains have been identified that contribute to the activities of multiprotein remodeling complexes. Here we identified a novel conserved protein domain in Eukaryota whose putative function may be in regulating chromatin remodeling. Since this domain is associated with a known SANT domain in several vertebrate proteins, we named it the SANTA (SANT Associated) domain. Sequence analysis showed that the SANTA domain is approximately a 90 amino acid module and likely composed of four central beta-sheets and three flanking alpha-helices. Many hydrophobic residues exhibited high conservation along the domain, implying a possible function in protein-protein interactions. The SANTA domain was identified in mammals, chicken, frog, fish, sea squirt, sea urchin, worms and plants. Furthermore, a phylogenetic tree of SANTA domains showed that one plant-specific duplication event happened in the Viridiplantae lineage.
- Katoh Y, Katoh M
- Comparative genomics on SOX2 orthologs.
- Oncol Rep. 2005; 14: 797-800
SOX2 and POU5F1 (OCT3 or OCT4) transcription factors are implicated in FGF4 expression in embryonic stem (ES) cells. SOX2, POU5F1, and FGF4 are key molecules for the integrome network in oncology and stem cell biology. The SOX2 gene at human chromosome 3q26.33, the SOX1 gene at 13q34, and the SOX3 gene at Xq27.1 constitute a subfamily within the SOX gene family. Here, rat Sox2 and Xenopus sox2 genes were identified and characterized by using bioinformatics for comparative genomics and comparative proteomics analyses. The rat Sox2 gene, encoding a 319-aa protein, was located around nucleotide position 73213-75621 of rat genome sequence AC123231.4. The Xenopus tropicalis sox2 complete coding sequence, encoding a 311-aa protein, was derived from CR760314.1 cDNA. Rat Sox2 showed 98.4%, 97.8%, 92.2%, 88.1% and 86.8% total amino-acid identity with mouse Sox2, human SOX2, chicken sox2, Xenopus sox2 and zebrafish sox2, respectively. The SOX123C domain was identified as a novel domain corresponding to the C-terminal region conserved among SOX1, SOX2 and SOX3 orthologs. Vertebrate SOX1, SOX2 and SOX3 orthologs were found to consist of an HMG box and a SOX123C domain. SOX9, TCF/LEF, POU2F1 and COMP1 binding sites were conserved among the human SOX2 promoter, rat Sox2 promoter, and mouse Sox2 promoter. SOX2 mRNA was expressed in ES cells, fetal brain, anaplastic oligodendroglioma, rhabdomyosarcoma, and small cell lung carcinoma. Owing to the pivotal role of SOX2 in early embryogenesis, the SOX2 promoter and SOX2 protein were well conserved during vertebrate evolution. This is the first report on comparative integromics analyses of the SOX2 orthologs.
- Suhre K
- Gene and genome duplication in Acanthamoeba polyphaga Mimivirus.
- J Virol. 2005; 79: 14095-101
Gene duplication is key to molecular evolution in all three domains of life and may be the first step in the emergence of new gene function. It is a well-recognized feature in large DNA viruses but has not been studied extensively in the largest known virus to date, the recently discovered Acanthamoeba polyphaga Mimivirus. Here, I present a systematic analysis of gene and genome duplication events in the mimivirus genome. I found that one-third of the mimivirus genes are related to at least one other gene in the mimivirus genome, either through a large segmental genome duplication event that occurred in the more remote past or through more recent gene duplication events, which often occur in tandem. This shows that gene and genome duplication played a major role in shaping the mimivirus genome. Using multiple alignments, together with remote-homology detection methods based on Hidden Markov Model comparison, I assign putative functions to some of the paralogous gene families. I suggest that a large part of the duplicated mimivirus gene families are likely to interfere with important host cell processes, such as transcription control, protein degradation, and cell regulatory processes. My findings support the view that large DNA viruses are complex evolving organisms, possibly deeply rooted within the tree of life, and oppose the paradigm that viral evolution is dominated by lateral gene acquisition, at least in regard to large DNA viruses.
- Atchley WR, Fernandes AD
- Sequence signatures and the probabilistic identification of proteins in the Myc-Max-Mad network.
- Proc Natl Acad Sci U S A. 2005; 102: 6401-6
Accurate identification of specific groups of proteins by their amino acid sequence is an important goal in genome research. Here we combine information theory with fuzzy logic search procedures to identify sequence signatures or predictive motifs for members of the Myc-Max-Mad transcription factor network. Myc is a well known oncoprotein, and this family is involved in cell proliferation, apoptosis, and differentiation. We describe a small set of amino acid sites from the N-terminal portion of the basic helix-loop-helix (bHLH) domain that provide very accurate sequence signatures for the Myc-Max-Mad transcription factor network and three of its member proteins. A predictive motif involving 28 contiguous bHLH sequence elements found 337 network proteins in the GenBank NR database with no mismatches or misidentifications. This motif also identifies at least one previously unknown fungal protein with strong affinity to the Myc-Max-Mad network. Another motif found 96% of known Myc protein sequences with only a single mismatch, including sequences from genomes previously not thought to contain Myc proteins. The predictive motif for Myc is very similar to the ancestral sequence for the Myc group estimated from phylogenetic analyses. Based on available crystal structure studies, this motif is discussed in terms of its functional consequences. Our results provide insight into evolutionary diversification of DNA binding and dimerization in a well characterized family of regulatory proteins and provide a method of identifying signature motifs in protein families.
- Balaji S, Babu MM, Iyer LM, Aravind L
- Discovery of the principal specific transcription factors of Apicomplexa and their implication for the evolution of the AP2-integrase DNA binding domains.
- Nucleic Acids Res. 2005; 33: 3994-4006
The comparative genomics of apicomplexans, such as the malarial parasite Plasmodium, the cattle parasite Theileria and the emerging human parasite Cryptosporidium, have suggested an unexpected paucity of specific transcription factors (TFs) with DNA binding domains that are closely related to those found in the major families of TFs from other eukaryotes. This apparent lack of specific TFs is paradoxical, given that the apicomplexans show a complex developmental cycle in one or more hosts and a reproducible pattern of differential gene expression in the course of this cycle. Using sensitive sequence profile searches, we show that the apicomplexans possess a lineage-specific expansion of a novel family of proteins with a version of the AP2 (Apetala2)-integrase DNA binding domain, which is present in numerous plant TFs. About 20-27 members of this apicomplexan AP2 (ApiAP2) family are encoded in different apicomplexan genomes, with each protein containing one to four copies of the AP2 DNA binding domain. Using gene expression data from Plasmodium falciparum, we show that guilds of ApiAP2 genes are expressed in different stages of intraerythrocytic development. By analogy to the plant AP2 proteins and based on the expression patterns, we predict that the ApiAP2 proteins are likely to function as previously unknown specific TFs in the apicomplexans and regulate the progression of their developmental cycle. In addition to the ApiAP2 family, we also identified two other novel families of AP2 DNA binding domains in bacteria and transposons. Using structure similarity searches, we also identified divergent versions of the AP2-integrase DNA binding domain fold in the DNA binding region of the PI-SceI homing endonuclease and the C-terminal domain of the pleckstrin homology (PH) domain-like modules of eukaryotes. 
Integrating these findings, we present a reconstruction of the evolutionary scenario of the AP2-integrase DNA binding domain fold, which suggests that it underwent multiple independent combinations with different types of mobile endonucleases or recombinases. It appears that the eukaryotic versions have emerged from versions of the domain associated with mobile elements, followed by independent lineage-specific expansions, which accompanied their recruitment to transcription regulation functions.
- Schmitt EK, Hoff B, Kuck U
- AcFKH1, a novel member of the forkhead family, associates with the RFX transcription factor CPCR1 in the cephalosporin C-producing fungus Acremonium chrysogenum.
- Gene. 2004; 342: 269-81
In the filamentous fungus Acremonium chrysogenum, a complex regulatory network of transcription factors controls the expression of at least seven cephalosporin C biosynthesis genes. The RFX transcription factor CPCR1 binds to regulatory sequences in the promoter region of cephalosporin C biosynthesis genes, and is involved in the transcriptional regulation of the pcbC gene which encodes isopenicillin N synthase. In this study, we used CPCR1 in a yeast two-hybrid screen to identify potential protein interaction partners. A cDNA was identified, encoding the C-terminal part (pos. 438-665) of the novel forkhead protein, AcFKH1. The full-length AcFKH1 amino acid sequence is 665 residues and shares between 31% and 60% identity with forkhead protein sequences in the genomes of Aspergillus nidulans, Fusarium graminearum, and Neurospora crassa. AcFKH1 is characterized by two conserved domains, the N-terminal forkhead-associated domain (FHA), which might be involved in phospho-protein interactions, and the C-terminal DNA-binding domain (FKH) of the winged helix/forkhead type. The two-hybrid system was also used to map the protein domains required for the interaction of transcription factors CPCR1 and AcFKH1. The observed interaction between CPCR1 and the C-terminus of AcFKH1 in the yeast system was verified in vitro in a GST pulldown assay. Using gel retardation analysis, the DNA-binding properties of the fungal forkhead protein AcFKH1 were investigated. AcFKH1 recognizes two forkhead consensus binding sites within the 1.2 kb promoter region of the divergently oriented cephalosporin biosynthesis gene pair pcbAB-pcbC from A. chrysogenum. Additionally, AcFKH1 is able to bind with high affinity to the SWI5-binding site of the yeast FKH2 protein.
- Leipe DD, Koonin EV, Aravind L
- STAND, a class of P-loop NTPases including animal and plant regulators of programmed cell death: multiple, complex domain architectures, unusual phyletic patterns, and evolution by horizontal gene transfer.
- J Mol Biol. 2004; 343: 1-28
Using sequence profile analysis and sequence-based structure predictions, we define a previously unrecognized, widespread class of P-loop NTPases. The signal transduction ATPases with numerous domains (STAND) class includes the AP-ATPases (animal apoptosis regulators CED4/Apaf-1, plant disease resistance proteins, and bacterial AfsR-like transcription regulators) and NACHT NTPases (e.g. NAIP, TLP1, Het-E-1) that have been studied extensively in the context of apoptosis, pathogen response in animals and plants, and transcriptional regulation in bacteria. We show that, in addition to these well-characterized protein families, the STAND class includes several other groups of (predicted) NTPase domains from diverse signaling and transcription regulatory proteins from bacteria and eukaryotes, and three Archaea-specific families. We identified the STAND domain in several biologically well-characterized proteins that have not been suspected to have NTPase activity, including soluble adenylyl cyclases, nephrocystin 3 (implicated in polycystic kidney disease), and Rolling pebble (a regulator of muscle development); these findings are expected to facilitate elucidation of the functions of these proteins. The STAND class belongs to the additional strand, catalytic E division of P-loop NTPases together with the AAA+ ATPases, RecA/helicase-related ATPases, ABC-ATPases, and VirD4/PilT-like ATPases. The STAND proteins are distinguished from other P-loop NTPases by the presence of unique sequence motifs associated with the N-terminal helix and the core strand-4, as well as a C-terminal helical bundle that is fused to the NTPase domain. This helical module contains a signature GxP motif in the loop between the two distal helices. With the exception of the archaeal families, almost all STAND NTPases are multidomain proteins containing three or more domains. 
In addition to the NTPase domain, these proteins typically contain DNA-binding or protein-binding domains, superstructure-forming repeats, such as WD40 and TPR, and enzymatic domains involved in signal transduction, including adenylate cyclases and kinases. By analogy to the AAA+ ATPases, it can be predicted that STAND NTPases use the C-terminal helical bundle as a "lever" to transmit the conformational changes brought about by NTP hydrolysis to effector domains. STAND NTPases represent a novel paradigm in signal transduction, whereby adaptor, regulatory switch, scaffolding, and, in some cases, signal-generating moieties are combined into a single polypeptide. The STAND class consists of 14 distinct families, and the evolutionary history of most of these families is riddled with dramatic instances of lineage-specific expansion and apparent horizontal gene transfer. The STAND NTPases are most abundant in developmentally and organizationally complex prokaryotes and eukaryotes. Transfer of genes for STAND NTPases from bacteria to eukaryotes on several occasions might have played a significant role in the evolution of eukaryotic signaling systems.
- Kim S, Zhang Z, Upchurch S, Isern N, Chen Y
- Structure and DNA-binding sites of the SWI1 AT-rich interaction domain (ARID) suggest determinants for sequence-specific DNA recognition.
- J Biol Chem. 2004; 279: 16670-6
ARID (AT-rich interaction domain) is a homologous family of DNA-binding domains that occur in DNA-binding proteins from a wide variety of species, ranging from yeast to nematodes, insects, mammals, and plants. SWI1, a member of the SWI/SNF protein complex that is involved in chromatin remodeling during transcription, contains the ARID motif. The ARID domain of human SWI1 (also known as p270) does not select for a specific DNA sequence from a random sequence pool. The lack of sequence specificity shown by the SWI1 ARID domain stands in contrast to the other characterized ARID domains, which recognize specific AT-rich sequences. We have solved the three-dimensional structure of human SWI1 ARID using solution NMR methods. In addition, we have characterized nonspecific DNA binding by the SWI1 ARID domain. Results from this study indicate that a flexible, long, internal loop in the ARID motif is likely to be important for sequence-specific DNA recognition. The structure of the human SWI1 ARID domain also represents a distinct structural subfamily. Studies of ARID indicate that the boundary of DNA binding structural and functional domains can extend beyond the sequence homologous region in a homologous family of proteins. Structural studies of homologous domains such as the ARID family of DNA-binding domains should provide information to better predict the boundary of structural and functional domains in structural genomic studies.
- Anantharaman V, Aravind L
- Novel conserved domains in proteins with predicted roles in eukaryotic cell-cycle regulation, decapping and RNA stability.
- BMC Genomics. 2004; 5: 45-45
BACKGROUND: The emergence of eukaryotes was characterized by the expansion and diversification of several ancient RNA-binding domains and the apparent de novo innovation of new RNA-binding domains. The identification of these RNA-binding domains may throw light on the emergence of eukaryote-specific systems of RNA metabolism. RESULTS: Using sensitive sequence profile searches, homology-based fold recognition and sequence-structure superpositions, we identified novel, divergent versions of the Sm domain in the Scd6p family of proteins. This family of Sm-related domains shares certain features of conventional Sm domains, which are required for binding RNA, in addition to possessing some unique conserved features. We also show that these proteins contain a second previously uncharacterized C-terminal domain, termed the FDF domain (after a conserved sequence motif in this domain). The FDF domain is also found in the fungal Dcp3p-like and the animal FLJ22128-like proteins, where it fused to a C-terminal domain of the YjeF-N domain family. In addition to the FDF domains, the FLJ22128-like proteins contain yet another divergent version of the Sm domain at their extreme N-terminus. We show that the YjeF-N domains represent a novel version of the Rossmann fold that has acquired a set of catalytic residues and structural features that distinguish them from the conventional dehydrogenases. CONCLUSIONS: Several lines of contextual information suggest that the Scd6p family and the Dcp3p-like proteins are conserved components of the eukaryotic RNA metabolism system. We propose that the novel domains reported here, namely the divergent versions of the Sm domain and the FDF domain may mediate specific RNA-protein and protein-protein interactions in cytoplasmic ribonucleoprotein complexes. 
More specifically, the protein complexes containing Sm-like domains of the Scd6p family are predicted to regulate the stability of mRNA encoding proteins involved in cell cycle progression and vesicular assembly. The Dcp3p and FLJ22128 proteins may localize to the cytoplasmic processing bodies and possibly catalyze a specific processing step in the decapping pathway. The explosive diversification of Sm domains appears to have played a role in the emergence of several uniquely eukaryotic ribonucleoprotein complexes, including those involved in decapping and mRNA stability.
- Wang Z, Peters B, Klussmann S, Bender H, Herb A, Krieglstein K
- Gene structure and evolution of Tieg3, a new member of the Tieg family of proteins.
- Gene. 2004; 325: 25-34
TGF beta-inducible immediate early gene, Tieg, belongs to the superfamily of Sp1-like transcription factors containing three C(2)H(2)-zinc finger DNA binding motifs close to the C-terminus. So far, Tieg1 and Tieg2 have been identified in human and mouse. We identified Tieg3, a new member of the Tieg protein family by screening a mouse cDNA library. Tieg3 has almost all the known features of the Tieg protein family: it shares a highly conserved C(2)H(2) zinc finger DNA binding domain and is 96% identical to Tieg2 and 86% to Tieg1, respectively. In addition, the three repression domains at the N-terminus, R1, R2 and R3 are conserved in all the Tiegs. Similar to Tieg1 and Tieg2, Tieg3 mRNA is up-regulated in response to TGF beta 1 treatment and can perform the Sp1 sites mediated repression of transcription. A 4 kilobase (kb) long transcript of mouse Tieg3 can be detected using Northern-blot analysis. The gene of mouse Tieg3 contains four exons. Due to the amino acid sequence similarity, mouse Tieg2 is regarded as an orthologue of human Tieg2. However, the mouse Tieg3 gene is localized in a conserved segment on mouse chromosome 12 corresponding to human Tieg2 on chromosome 2 with the same gene order. An interesting explanation for this apparent contradiction might be a homologous recombination leading to loci exchange between the mouse Tieg3 and Tieg2.
- Glusman G, Kaur A, Hood L, Rowen L
- An enigmatic fourth runt domain gene in the fugu genome: ancestral gene loss versus accelerated evolution.
- BMC Evol Biol. 2004; 4: 43-43
BACKGROUND: The runt domain transcription factors are key regulators of developmental processes in bilaterians, involved both in cell proliferation and differentiation, and their disruption usually leads to disease. Three runt domain genes have been described in each vertebrate genome (the RUNX gene family), but only one in other chordates. Therefore, the common ancestor of vertebrates has been thought to have had a single runt domain gene. RESULTS: Analysis of the genome draft of the fugu pufferfish (Takifugu rubripes) reveals the existence of a fourth runt domain gene, FrRUNT, in addition to the orthologs of human RUNX1, RUNX2 and RUNX3. The tiny FrRUNT packs six exons and two putative promoters in just 3 kb of genomic sequence. The first exon is located within an intron of FrSUPT3H, the ortholog of human SUPT3H, and the first exon of FrSUPT3H resides within the first intron of FrRUNT. The two gene structures are therefore "interlocked". In the human genome, SUPT3H is instead interlocked with RUNX2. FrRUNT has no detectable ortholog in the genomes of mammals, birds or amphibians. We consider alternative explanations for an apparent contradiction between the phylogenetic data and the comparison of the genomic neighborhoods of human and fugu runt domain genes. We hypothesize that an ancient RUNT locus was lost in the tetrapod lineage, together with FrFSTL6, a member of a novel family of follistatin-like genes. CONCLUSIONS: Our results suggest that the runt domain family may have started expanding in chordates much earlier than previously thought, and exemplify the importance of detailed analysis of whole-genome draft sequence to provide new insights into gene evolution.
- Aravind L, Anantharaman V
- HutC/FarR-like bacterial transcription factors of the GntR family contain a small molecule-binding domain of the chorismate lyase fold.
- FEMS Microbiol Lett. 2003; 222: 17-23
Numerous bacterial transcription factors contain a DNA-binding helix-turn-helix domain and a signaling domain, linked together in a single polypeptide. Typically, this signaling domain is a small-molecule-binding domain that undergoes a conformational change upon recognizing a specific ligand. The HutC/FarR-like transcription factors of the GntR family are one of the largest groups of transcription factors in the proteomes of most free-living bacteria. Using sensitive sequence profile analysis we show that the HutC/FarR-like transcription factors contain a conserved ligand-binding domain, which possesses the same fold as chorismate lyase (Escherichia coli UbiC gene product). This relationship suggests that the C-terminal domain of the HutC/FarR-like transcription factors binds small molecules in a cleft similar to the substrate-binding site of the chorismate lyases. The sequence diversity within the predicted binding cleft of the HutC/FarR ligand-binding domains is consistent with the ability of these transcription factors to respond to diverse small molecules, such as histidine (HutC), fatty acids (FarR), sugars (TreR) and alkylphosphonate (PhnF). UbiC-like chorismate lyases function in the ubiquinone biosynthesis pathway, and have characteristic charged, catalytic residues. Genome comparisons reveal that chorismate lyase orthologs are found in several bacteria, chloroplasts of eukaryotic algae and euryarchaea. In contrast, the GntR transcription regulators lack the conserved catalytic residues of the chorismate lyases, and have so far been detected only in bacteria. An ancestral, generic small-molecule-binding domain appears to have given rise to the enzymatic and non-catalytic ligand-binding versions of the same fold under the influence of different selective pressures.
- Aravind L, Anantharaman V, Koonin EV
- Monophyly of class I aminoacyl tRNA synthetase, USPA, ETFP, photolyase, and PP-ATPase nucleotide-binding domains: implications for protein evolution in the RNA world.
- Proteins. 2002; 48: 1-14
Protein sequence and structure comparisons show that the catalytic domains of Class I aminoacyl-tRNA synthetases, a related family of nucleotidyltransferases involved primarily in coenzyme biosynthesis, nucleotide-binding domains related to the UspA protein (USPA domains), photolyases, electron transport flavoproteins, and PP-loop-containing ATPases together comprise a distinct class of alpha/beta domains designated the HUP domain after HIGH-signature proteins, UspA, and PP-ATPase. Several lines of evidence are presented to support the monophyly of the HUP domains, to the exclusion of other three-layered alpha/beta folds with the generic "Rossmann-like" topology. Cladistic analysis, with patterns of structural and sequence similarity used as discrete characters, identified three major evolutionary lineages within the HUP domain class: the PP-ATPases; the HIGH superfamily, which includes class I aaRS and related nucleotidyltransferases containing the HIGH signature in their nucleotide-binding loop; and a previously unrecognized USPA-like group, which includes USPA domains, electron transport flavoproteins, and photolyases. Examination of the patterns of phyletic distribution of distinct families within these three major lineages suggests that the Last Universal Common Ancestor of all modern life forms encoded 15-18 distinct alpha/beta ATPases and nucleotide-binding proteins of the HUP class. This points to an extensive radiation of HUP domains before the last universal common ancestor (LUCA), during which the multiple class I aminoacyl-tRNA synthetases emerged only at a late stage. Thus, substantial evolutionary diversification of protein domains occurred well before the modern version of the protein-dependent translation machinery was established, i.e., still in the RNA world.
- Uchida A et al.
- The carboxy-terminal domain of the XPC protein plays a crucial role in nucleotide excision repair through interactions with transcription factor IIH.
- DNA Repair (Amst). 2002; 1: 449-61
The xeroderma pigmentosum group C (XPC) protein specifically involved in genome-wide damage recognition for nucleotide excision repair (NER) was purified as a tight complex with HR23B, one of the two mammalian homologs of RAD23 in budding yeast. This XPC-HR23B complex exhibits strong binding affinity for single-stranded DNA, as well as preferential binding to various types of damaged DNA. To examine the structure-function relationship of XPC, a series of truncated mutant proteins were generated and assayed for various binding activities. The two domains participating in binding to HR23B and damaged DNA, respectively, were mapped within the carboxy-terminal half of XPC, which also contains an evolutionary conserved amino acid sequence homologous to the yeast RAD4 protein. We established that the carboxy-terminal 125 amino acids are dispensable for both HR23B and damaged DNA binding, while interactions with transcription factor IIH (TFIIH) are significantly impaired by truncation of this domain. Furthermore, deletion of the extreme carboxy-terminal domain totally abolished XPC activity in the cell-free NER reaction. These results suggest that following initial damage recognition, the carboxy terminus of XPC may be essential for the recruitment of TFIIH, and that most truncation mutations identified in XP-C patients result in non-functional proteins.
- Peng H, Zheng L, Lee WH, Rux JJ, Rauscher FJ 3rd
- A common DNA-binding site for SZF1 and the BRCA1-associated zinc finger protein, ZBRK1.
- Cancer Res. 2002; 62: 3773-81
More than 220 Kruppel-associated box-zinc finger protein (KRAB-ZFP) genes are encoded in the human genome. KRAB-ZFPs function as transcriptional repressors by binding DNA through their tandem zinc finger motifs. Gene silencing is mediated by the highly conserved KRAB domain, which recruits histone deacetylase complexes, histone methylases, and heterochromatin proteins. However, little is known of the biological programs regulated by KRAB-ZFPs, in large part because of the difficulty in identifying DNA-binding sites recognized by long arrays of zinc fingers. In an attempt to identify the natural target genes for a KRAB-ZFP, we chose SZF1, a hematopoietic progenitor-restricted, KRAB-ZFP that contains only four C(2)H(2) zinc finger motifs. Using recombinant SZF1 protein and a PCR-based binding site selection strategy, we identified a 15-bp consensus DNA sequence recognized by SZF1. Remarkably, this sequence is similar to the core DNA-binding site described recently for ZBRK1, a KRAB-ZFP that binds to BRCA1 and is involved in coordinating the cellular DNA damage response. The SZF1 and ZBRK1 proteins bind to both the experimentally derived SZF1 site and the canonical ZBRK1 site. The KRAB domain from SZF1 bound directly to the KAP-1 corepressor and displayed intrinsic silencing activity. Moreover, full-length SZF1 repressed a promoter containing ZBRK1 recognition sequences. Thus, SZF1 and ZBRK1 may regulate a common set of target genes in vivo.
- Boucher L, Ouzounis CA, Enright AJ, Blencowe BJ
- A genome-wide survey of RS domain proteins.
- RNA. 2001; 7: 1693-701
Domains rich in alternating arginine and serine residues (RS domains) are frequently found in metazoan proteins involved in pre-mRNA splicing. The RS domains of splicing factors associate with each other and are important for the formation of protein-protein interactions required for both constitutive and regulated splicing. The prevalence of the RS domain in splicing factors suggests that it might serve as a useful signature for the identification of new proteins that function in pre-mRNA processing, although it remains to be determined whether RS domains also participate in other cellular functions. Using database search and sequence clustering methods, we have identified and categorized RS domain proteins encoded within the entire genomes of Homo sapiens, Drosophila melanogaster, Caenorhabditis elegans, and Saccharomyces cerevisiae. This genome-wide survey revealed a surprising complexity of RS domain proteins in metazoans with functions associated with chromatin structure, transcription by RNA polymerase II, cell cycle, and cell structure, as well as pre-mRNA processing. Also identified were RS domain proteins in S. cerevisiae with functions associated with cell structure, osmotic regulation, and cell cycle progression. The results thus demonstrate an effective strategy for the genomic mining of RS domain proteins. The identification of many new proteins using this strategy has provided a database of factors that are candidates for forming RS domain-mediated interactions associated with different steps in pre-mRNA processing, in addition to other cellular functions.
- Staub E, Dahl E, Rosenthal A
- The DAPIN family: a novel domain links apoptotic and interferon response proteins.
- Trends Biochem Sci. 2001; 26: 83-5
We report the discovery of a protein domain, hereafter referred to as DAPIN, in diverse vertebrate and viral proteins that is associated with tumor biology, apoptosis and inflammation. Based on a secondary structure prediction, we suggest an all-alpha fold for DAPIN, which is also adopted by apoptotic protein domains of the CARD, death domain and death effector domain type.
- Torshin IY
- Clustering amino acid contents of protein domains: biochemical functions of proteins and implications for origin of biological macromolecules.
- Front Biosci. 2001; 6: 112-112
Structural classes of protein domains correlate with their amino acid compositions. Several successful algorithms (that use only amino acid composition) have been elaborated for the prediction of structural class or potential biochemical significance. This work deals with dynamic classification (clustering) of the domains on the basis of their amino acid composition. Amino acid contents of domains from a non-redundant PDB set were clustered in 20-dimensional space of amino acid contents. Despite the variations of an empirical parameter and non-redundancy of the set, only one large cluster (tens-hundreds of proteins) surrounded by hundreds of small clusters (1-5 proteins), was identified. The core of the largest cluster contains at least 64% DNA (nucleotide)-interacting protein domains from various sources. About 90% of the proteins of the core are intracellular proteins. 83% of the DNA/nucleotide interacting domains in the core belong to the mixed alpha-beta folds (a+b, a/b), 14% are all-alpha (mostly helices) and all-beta (mostly beta-strands) proteins. At the same time, when core domains that belong to one organism (E.coli) are considered, over 80% of them prove to be DNA/nucleotide interacting proteins. The core is compact: amino acid contents of domains from the core lie in relatively narrow and specific ranges. The core also contains several Fe-S cluster-binding domains, amino acid contents of the core overlap with ferredoxin and CO-dehydrogenase clusters, the oldest known proteins. As Fe-S clusters are thought to be the first biocatalysts, the results are discussed in relation to contemporary experiments and models dealing with the origin of biological macromolecules. The origin of most primordial proteins is considered here to be a result of co-adsorption of nucleotides and amino acids on specific clays, followed by en-block polymerization of the adsorbed mixtures of amino acids.
- Zhang X, Xu X, Hew CL
- The structure and function of a gene encoding a basic peptide from prawn white spot syndrome virus.
- Virus Res. 2001; 79: 137-44
Prawn white spot syndrome virus (WSSV) is the major pathogen responsible for the high mortality of cultured prawns. A gene (termed as p6.8) encoding a basic peptide was found by screening the cDNA and DNA libraries of WSSV. The peptide was highly homologous with proteins rich in arginine and lysine. A fusion protein containing the p6.8 and green fluorescent protein (GFP) genes was cloned into pBV220 and expressed in E. coli. Gel mobility shift assay indicated that the peptide encoded by p6.8 had the capability of binding DNA and might be involved in DNA packaging.
- Aravind L, Koonin EV
- Prokaryotic homologs of the eukaryotic DNA-end-binding protein Ku, novel domains in the Ku protein and prediction of a prokaryotic double-strand break repair system.
- Genome Res. 2001; 11: 1365-74
Homologs of the eukaryotic DNA-end-binding protein Ku were identified in several bacterial and one archeal genome using iterative database searches with sequence profiles. Identification of prokaryotic Ku homologs allowed the dissection of the Ku protein sequences into three distinct domains, the Ku core that is conserved in eukaryotes and prokaryotes, a derived von Willebrand A domain that is fused to the amino terminus of the core in eukaryotic Ku proteins, and the newly recognized helix-extension-helix (HEH) domain that is fused to the carboxyl terminus of the core in eukaryotes and in one of the Ku homologs from the Actinomycete Streptomyces coelicolor. The version of the HEH domain present in eukaryotic Ku proteins represents the previously described DNA-binding domain called SAP. The Ku homolog from S. coelicolor contains a distinct version of the HEH domain that belongs to a previously unnoticed family of nucleic-acid-binding domains, which also includes HEH domains from the bacterial transcription termination factor Rho, bacterial and eukaryotic lysyl-tRNA synthetases, bacteriophage T4 endonuclease VII, and several uncharacterized proteins. The distribution of the Ku homologs in bacteria coincides with that of the archeal-eukaryotic-type DNA primase and genes for prokaryotic Ku homologs form predicted operons with genes coding for an ATP-dependent DNA ligase and/or archeal-eukaryotic-type DNA primase. Some of these operons additionally encode an uncharacterized protein that may function as nuclease or an Slx1p-like predicted nuclease containing a URI domain. A hypothesis is proposed that the Ku homolog, together with the associated gene products, comprise a previously unrecognized prokaryotic system for repair of double-strand breaks in DNA.
- Demidov VV
- DNA dumbbells: decoys for gene-regulatory proteins.
- Drug Discov Today. 2001; 6: 749-750
- Anantharaman V, Koonin EV, Aravind L
- Regulatory potential, phyletic distribution and evolution of ancient, intracellular small-molecule-binding domains.
- J Mol Biol. 2001; 307: 1271-92
Central cellular functions such as metabolism, solute transport and signal transduction are regulated, in part, via binding of small molecules by specialized domains. Using sensitive methods for sequence profile analysis and protein structure comparison, we exhaustively surveyed the protein sets from completely sequenced genomes for all occurrences of 21 intracellular small-molecule-binding domains (SMBDs) that are represented in at least two of the three major divisions of life (bacteria, archaea and eukaryotes). These included previously characterized domains such as PAS, GAF, ACT and ferredoxins, as well as three newly predicted SMBDs, namely the 4-vinyl reductase (4VR) domain, the NIFX domain and the 3-histidines (3H) domain. Although there are only a limited number of different superfamilies of these ancient SMBDs, they are present in numerous distinct proteins combined with various enzymatic, transport and signal-transducing domains. Most of the SMBDs show considerable evolutionary mobility and are involved in the generation of many lineage-specific domain architectures. Frequent re-invention of analogous architectures involving functionally related, but not homologous, domains was detected, such as, fusion of different SMBDs to several types of DNA-binding domains to form diverse transcription regulators in prokaryotes and eukaryotes. This is suggestive of similar selective forces affecting the diverse SMBDs and resulting in the formation of multidomain proteins that fit a limited number of functional stereotypes. Using the "guilt by association approach", the identification of SMBDs allowed prediction of functions and mode of regulation for a variety of previously uncharacterized proteins.
- Henderson CE, Bromek K, Mullin NP, Smith BO, Uhrin D, Barlow PN
- Solution structure and dynamics of the central CCP module pair of a poxvirus complement control protein.
- J Mol Biol. 2001; 307: 323-39
The complement control protein (CCP) module (also known as SCR, CCP or sushi domain) is prevalent amongst proteins that regulate complement activation. Functional and mutagenesis studies have shown that in most cases two or more neighbouring CCP modules form specific binding sites for other molecules. Hence the orientation in space of a CCP module with respect to its neighbours and the flexibility of the intermodular junction are likely to be critical for function. Vaccinia virus complement control protein (VCP) is a complement regulatory protein composed of four tandemly arranged CCP modules. The solution structure of the carboxy-terminal half of this protein (CCP modules 3 and 4) has been solved previously. The structure of the central portion (modules 2 and 3, VCP~2,3) has now also been solved using NMR spectroscopy at 37 degrees C. In addition, the backbone dynamics of VCP~2,3 have been characterised by analysis of its (15)N relaxation parameters. Module 2 has a typical CCP module structure while module 3 in the context of VCP~2,3 has some modest but significant differences in structure and dynamics to module 3 within the 3,4 pair. Modules 2 and 3 do not share an extensive interface, unlike modules 3 and 4. Only two possible NOEs were identified between the bodies of the modules, but a total of 40 NOEs between the short intermodular linker of VCP~2,3 and the bodies of the two modules determines a preferred, elongated, orientation of the two modules in the calculated structures. The anisotropy of rotational diffusion has been characterised from (15)N relaxation data, and this indicates that the time-averaged structure is more compact than suggested by (1)H-(1)H NOEs. The data are consistent with the presence of many intermodular orientations, some of which are kinked, undergoing interconversion on a 10(-8)-10(-6) second time-scale. 
A reconstructed representation of modules 2-4 allows visualisation of the spatial arrangement of the 11 substitutions that occur in the more potent complement inhibitor from Variola (small pox) virus.
- Lowry JA, Atchley WR
- Molecular evolution of the GATA family of transcription factors: conservation within the DNA-binding domain.
- J Mol Evol. 2000; 50: 103-15
The GATA-binding transcription factors comprise a protein family whose members contain either one or two highly conserved zinc finger DNA-binding domains. Members of this group have been identified in organisms ranging from cellular slime mold to vertebrates, including plants, fungi, nematodes, insects, and echinoderms. While much work has been done describing the expression patterns, functional aspects, and target genes for many of these proteins, an evolutionary analysis of the entire family has been lacking. Herein we show that only the C-terminal zinc finger (Cf) and basic domain, which together constitute the GATA-binding domain, are conserved throughout this protein family. Phylogenetic analyses of amino acid sequences demonstrate distinct evolutionary pathways. Analysis of GATA factors isolated from vertebrates suggests that the six distinct vertebrate GATAs are descended from a common ancestral sequence, while those isolated from nonvertebrates (with the exception of the fungal AREA orthologues and Arabidopsis paralogues) appear to be related only within the DNA-binding domain and otherwise provide little insight into their evolutionary history. These results suggest multiple modes of evolution, including gene duplication and modular evolution of GATA factors based upon inclusion of a class IV zinc finger motif. As such, GATA transcription factors represent a group of proteins related solely by their homologous DNA-binding domains. Further analysis of this domain examines the degree of conservation at each amino acid site using the Boltzmann entropy measure, thereby identifying residues critical to preservation of structure and function. Finally, we construct a predictive motif that can accurately identify potential GATA proteins.
- Zemskov EA, Kang W, Maeda S
- Evidence for nucleic acid binding ability and nucleosome association of Bombyx mori nucleopolyhedrovirus BRO proteins.
- J Virol. 2000; 74: 6784-9
The Bombyx mori nucleopolyhedrovirus (BmNPV) genome contains five related members of the bro gene family, all of which are actively expressed in infected BmN cells. Although their functions are unknown, their amino acid sequences contain a motif found in all known viral and prokaryotic single-stranded DNA binding proteins. To determine if they bind to nucleic acids, we fractionated the nuclei of BmNPV-infected BmN cells using a histone extraction protocol. We detected BRO-A, BRO-C, and BRO-D in the histone H1 fraction using anti-BRO antibodies. Micrococcal nuclease treatment released these BRO proteins from the chromatin fraction, suggesting their involvement in nucleosome structures. Chromatographic fractionation showed that BRO-A and/or BRO-C interacted with core histones. Expression of partial sequences of BRO-A proved that the N-terminal 80 amino acid residues were required for DNA binding activity. We also demonstrated that BmNPV BRO proteins underwent phosphorylation and ubiquitination followed by proteasome degradation, which may explain their distribution in the cytoplasm as well as the nucleus. We propose that BRO-A and BRO-C may function as DNA binding proteins that influence host DNA replication and/or transcription.
- Xie G, Bonner CA, Jensen RA
- A probable mixed-function supraoperon in Pseudomonas exhibits gene organization features of both intergenomic conservation and gene shuffling.
- J Mol Evol. 2000; 50: 202-202
- Villarreal LP, DeFilippis VR
- A hypothesis for DNA viruses as the origin of eukaryotic replication proteins.
- J Virol. 2000; 74: 7079-84
The eukaryotic replicative DNA polymerases are similar to those of large DNA viruses of eukaryotes and of bacterial T4 phages but not to those of eubacteria. We develop and examine the hypothesis that DNA virus replication proteins gave rise to those of eukaryotes during evolution. We chose the DNA polymerase from phycodnavirus (which infects microalgae) as the basis of this analysis, as it represents a virus of a primitive eukaryote. We show that it has significant similarity with replicative DNA polymerases of eukaryotes and certain of their large DNA viruses. Sequence alignment confirms this similarity and establishes the presence of highly conserved domains in the polymerase amino terminus. Subsequent reconstruction of a phylogenetic tree indicates that these algal viral DNA polymerases are near the root of the clade containing all eukaryotic DNA polymerase delta members but that this clade does not contain the polymerases of other DNA viruses. We consider arguments for the polarity of this relationship and present the hypothesis that the replication genes of DNA viruses gave rise to those of eukaryotes and not the reverse direction.
- Rosinski JA, Atchley WR
- Molecular evolution of helix-turn-helix proteins.
- J Mol Evol. 1999; 49: 301-9
The helix-turn-helix domain-containing family of transcriptional regulators is of ancient origin and has been incorporated into numerous disparate biological processes. As a consequence, the forces shaping its early evolution have been difficult to reconstruct. Herein, we analyze this large and diverse family with a combination of traditional phylogenetic techniques and newer sequence analysis tools to determine whether the helix-turn-helix family arose from a single common ancestor. Our analyses of the DNA-binding domain show that amino acid chemistry is conserved at many sites in the first helix and the turn. The high level of divergence combined with the short length of the domain hinders robust reconstruction of the entire phylogeny, but some level of deep node inference is possible. All analyses point to a predominantly monophyletic origin for the helix-turn-helix domain. The consequences of such an origin for a diverse group of proteins, and guidelines for the identification of future members of the HTH family are discussed.
- Kang W, Suzuki M, Zemskov E, Okano K, Maeda S
- Characterization of baculovirus repeated open reading frames (bro) in Bombyx mori nucleopolyhedrovirus.
- J Virol. 1999; 73: 10339-45
The baculovirus Bombyx mori nucleopolyhedrovirus (BmNPV) contains five related open reading frames (ORFs). Recent sequence analyses of several other baculovirus genomes reveal that these ORFs belong to a unique multigene family called the baculovirus repeated ORFs (bro) family. Here we have characterized these five genes from BmNPV at the transcriptional and translational levels. Reverse transcription-PCR and primer extension analyses indicated that transcription of all bro genes occurs by 2 to 4 h postinfection (p.i.) and reaches maximal levels between 8 and 12 h p.i. Transcription of all genes is initiated between 50 and 70 nucleotides upstream of the start codon, at a characteristic C(T)AGT motif. Expression of a cat reporter gene under the control of each bro promoter provides evidence that a viral factor(s) is required for the transcription of all bro genes. Immunoblot analysis indicated that a population of BRO proteins is produced vigorously between 8 and 14 h p.i. Immunohistochemical analysis by confocal microscopy showed that BRO proteins are localized in both the nucleus and the cytoplasm at 8 h p.i. Four BmNPV mutants, in which the bro-a, bro-b, bro-c, and bro-e genes were individually inactivated, were successfully isolated. However, exhaustive efforts failed to isolate a bro-d-deficient mutant. Similarly, it was not possible to isolate a double-deletion bro-a bro-c mutant. The bro-d gene may play an irreplaceable functional role(s) during viral infection, while bro-a and bro-c may functionally complement each other.
- Kim SK, Wang JC
- Gene silencing via protein-mediated subcellular localization of DNA.
- Proc Natl Acad Sci U S A. 1999; 96: 8557-61
We previously reported that overexpression of SopB, an Escherichia coli F plasmid-encoded partition protein, led to silencing of genes linked to, but well-separated from, a cluster of SopB-binding sites termed sopC. We show here that in this SopB-mediated repression of sopC-linked genes, all but the N-terminal 82 amino acids of SopB can be replaced by the DNA-binding domain of a sequence-specific DNA-binding protein, provided that the sopC locus is also replaced by the recognition sequence of the DNA-binding domain. These results, together with our previous finding that the N-terminal fragment of SopB is responsible for its polar localization in cells, suggest a mechanism of gene silencing: patches of closely packed DNA-binding domains are formed if a sequence-specific DNA-binding protein is localized to specific cellular sites; such a patch can capture a DNA carrying the recognition site of the DNA-binding domain and sequestrate genes adjacent to the recognition site through nonspecific binding of DNA. The generalization of this model to gene silencing in eukaryotes is discussed.
- Morgenstern B, Atchley WR
- Evolution of bHLH transcription factors: modular evolution by domain shuffling?
- Mol Biol Evol. 1999; 16: 1654-63
Multidomain proteins usually contain several conserved and apparently independently evolved domains. As a result, classifications based on only a single small domain may obscure the true evolutionary relationships of the proteins. The current classification of basic helix-loop-helix (bHLH) domain-containing proteins is based on the conserved bHLH domain alone. Herein, we explore whether sequence homology and, therefore, evolutionary relationships can be detected among the flanking or non-bHLH components of the amino acid sequences of 122 bHLH proteins. These 122 proteins were the same proteins previously used to construct the existing classification of the bHLH-domain-containing proteins. Several possible scenarios are examined in order to explain the observed patterns of sequence divergence, including (1) monophyly, (2) convergent evolution, (3) addition of functional components to the bHLH domain, and (4) modular evolution with domain shuffling. Drawing on several lines of evidence, we suggest that modular evolution by domain shuffling may have played an important role in the evolution of this large group of transcriptional regulators.
- Aravind L, Subramanian G
- Origin of multicellular eukaryotes - insights from proteome comparisons.
- Curr Opin Genet Dev. 1999; 9: 688-94
The complete genomes of the yeast Saccharomyces cerevisiae and the nematode worm Caenorhabditis elegans have recently become available allowing the comparison of the complete protein sets of a unicellular and multicellular eukaryote for the first time. These comparisons reveal some striking trends in terms of expansions or extensive shuffling of specific domains that are involved in regulatory functions and signaling. Similar comparisons with the available sequence data from the plant Arabidopsis thaliana produce consistent results. These observations have provided useful insights regarding the origin of multicellular organisms.
- Brick DJ, Burke RD, Schiff L, Upton C
- Shope fibroma virus RING finger protein N1R binds DNA and inhibits apoptosis.
- Virology. 1998; 249: 42-51
Shope fibroma virus (SFV) N1R gene encodes a RING finger protein that localizes to virus factories within the cytoplasm of infected cells. Altered proteins, with deletions and site-specific mutations, were transiently expressed in vaccinia virus-infected cells to discern regions of the protein that are required for localization. We have determined that at least part of the RING finger region is necessary for localization but that the RING motif alone is not sufficient. A chimeric protein, however, in which the RING finger region of the herpes simplex virus-1 ICP0 protein replaces the SFV N1R RING motif does localize to virus factories. A region of five highly conserved amino acids at the amino terminus of SFV N1R is also critical for localization. We report that the SFV N1R protein binds double- and single-stranded DNA, suggesting a mechanism for localization, and that overexpression of this protein in vaccinia virus-infected cells reduces apoptosis-associated fragmentation of nuclear DNA.
- Maul GG
- Nuclear domain 10, the site of DNA virus transcription and replication.
- Bioessays. 1998; 20: 660-7
Within the highly organized nuclear structure, specific nuclear domains (ND10) are defined by accumulations of proteins that can be interferon-upregulated, implicating ND10 as sites of a nuclear defense mechanism. Compatible with such a mechanism is the deposition of herpesvirus, adenovirus, and papovavirus genomes at the periphery of ND10. However, these DNA viruses begin their transcription at ND10 and consequently initiate replication at these sites, suggesting that viruses have evolved ways to circumvent this potential cellular defense and exploit it. Other ND10-associated proteins belong to ubiquitin-related pathways. These findings, together with the accumulation of various overexpressed cellular and viral proteins, suggest that ND10 function as nuclear dumps or as nuclear depots. Consistent with the recruitment or deposition of various proteins and viral genomes adjacent to ND10, ND10 themselves may only be protein accumulations at specific but as yet undefined nuclear deposition sites. The concept of specific nuclear deposition sites may explain the juxtaposition of various nuclear bodies and allows testable predictions about a potential supramolecular regulatory mechanism whereby proteins are selectively segregated or released by global changes induced in nuclear functions such as viral infections, stress, or hormonal induction.
- Li H, Trotta CR, Abelson J
- Crystal structure and evolution of a transfer RNA splicing enzyme.
- Science. 1998; 280: 279-84
The splicing of transfer RNA precursors is similar in Eucarya and Archaea. In both kingdoms an endonuclease recognizes the splice sites and releases the intron, but the mechanism of splice site recognition is different in each kingdom. The crystal structure of the endonuclease from the archaeon Methanococcus jannaschii was determined to a resolution of 2.3 angstroms. The structure indicates that the cleavage reaction is similar to that of ribonuclease A and the arrangement of the active sites is conserved between the archaeal and eucaryal enzymes. These results suggest an evolutionary pathway for splice site recognition.
- Chervitz SA et al.
- Comparison of the complete protein sets of worm and yeast: orthology and divergence.
- Science. 1998; 282: 2022-8
Comparative analysis of predicted protein sequences encoded by the genomes of Caenorhabditis elegans and Saccharomyces cerevisiae suggests that most of the core biological functions are carried out by orthologous proteins (proteins of different species that can be traced back to a common ancestor) that occur in comparable numbers. The specialized processes of signal transduction and regulatory control that are unique to the multicellular worm appear to use novel proteins, many of which re-use conserved domains. Major expansion of the number of some of these domains seen in the worm may have contributed to the advent of multicellularity. The proteins conserved in yeast and worm are likely to have orthologs throughout eukaryotes; in contrast, the proteins unique to the worm may well define metazoans.
- Jurica MS, Monnat RJ Jr, Stoddard BL
- DNA recognition and cleavage by the LAGLIDADG homing endonuclease I-CreI.
- Mol Cell. 1998; 2: 469-76
The structure of the LAGLIDADG intron-encoded homing endonuclease I-CreI bound to homing site DNA has been determined. The interface is formed by an extended, concave beta sheet from each enzyme monomer that contacts each DNA half-site, resulting in direct side-chain contacts to 18 of the 24 base pairs across the full-length homing site. The structure indicates that I-CreI is optimized to its role in genetic transposition by exhibiting long site-recognition while being able to cleave many closely related target sequences. DNA cleavage is mediated by a compact pair of active sites in the I-CreI homodimer, each of which contains a separate bound divalent cation.
- Altschul SF et al.
- Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.
- Nucleic Acids Res. 1997; 25: 3389-402
The BLAST programs are widely used tools for searching protein and DNA databases for sequence similarities. For protein comparisons, a variety of definitional, algorithmic and statistical refinements described here permits the execution time of the BLAST programs to be decreased substantially while enhancing their sensitivity to weak similarities. A new criterion for triggering the extension of word hits, combined with a new heuristic for generating gapped alignments, yields a gapped BLAST program that runs at approximately three times the speed of the original. In addition, a method is introduced for automatically combining statistically significant alignments produced by BLAST into a position-specific score matrix, and searching the database using this matrix. The resulting Position-Specific Iterated BLAST (PSI-BLAST) program runs at approximately the same speed per iteration as gapped BLAST, but in many cases is much more sensitive to weak but biologically relevant sequence similarities. PSI-BLAST is used to uncover several new and interesting members of the BRCT superfamily.
- Hurme R, Berndt KD, Namork E, Rhen M
- Additions and Corrections to DNA binding exerted by a bacterial gene regulator with an extensive coiled-coil domain.
- J Biol Chem. 1996; 271: 17547-17547
- Mushegian AR, Koonin EV
- Sequence analysis of eukaryotic developmental proteins: ancient and novel domains.
- Genetics. 1996; 144: 817-28
Most of the genes involved in the development of multicellular eukaryotes encode large, multidomain proteins. To decipher the major trends in the evolution of these proteins and make functional predictions for uncharacterized domains, we applied a strategy of sequence database search that includes construction of specialized data sets and iterative subsequence masking. This computational approach allowed us to detect previously unnoticed but potentially important sequence similarities. Developmental gene products are enriched in predicted nonglobular regions as compared to unbiased sets of eukaryotic and bacterial proteins. Developmental genes that act intracellularly, primarily at the level of transcription regulation, typically code for proteins containing highly conserved DNA-binding domains, most of which appear to have evolved before the radiation of bacteria and eukaryotes. We identified bacterial homologues, namely a protein family that includes the Escherichia coli universal stress protein UspA, for the MADS-box transcription regulators previously described only in eukaryotes. We also show that the FUS6 family of eukaryotic proteins contains a putative DNA-binding domain related to bacterial helix-turn-helix transcription regulators. Developmental proteins that act extracellularly are less conserved and often do not have bacterial homologues. Nevertheless, several provocative similarities between different groups of such proteins were detected.
- Emery P, Durand B, Mach B, Reith W
- RFX proteins, a novel family of DNA binding proteins conserved in the eukaryotic kingdom.
- Nucleic Acids Res. 1996; 24: 803-7
Until recently, the RFX family of DNA binding proteins consisted exclusively of four mammalian members (RFX1-RFX4) characterized by a novel highly conserved DNA binding domain. Strong conservation of this DNA binding domain precluded a precise definition of the motif required for DNA binding. In addition, the biological systems in which these RFX proteins are implicated remained obscure. The recent identification of four new RFX genes has now shed light on the evolutionary conservation of the RFX family, contributed greatly to a detailed characterization of the RFX DNA binding motif, and provided clear evidence for the function of some of the RFX proteins. RFX proteins have been conserved throughout evolution in a wide variety of species, including Saccharomyces cerevisiae, Schizosaccharomyces pombe, Caenorhabditis elegans, mouse and man. The characteristic RFX DNA binding motif has been recruited into otherwise very divergent regulatory factors functioning in a diverse spectrum of unrelated systems, including regulation of the mitotic cell cycle in fission yeast, the control of the immune response in mammals, and infection by human hepatitis B virus.
- Belfort M, Reaban ME, Coetzee T, Dalgaard JZ
- Prokaryotic introns and inteins: a panoply of form and function.
- J Bacteriol. 1995; 177: 3897-903
- Kumar A, Paietta JV
- The sulfur controller-2 negative regulatory gene of Neurospora crassa encodes a protein with beta-transducin repeats.
- Proc Natl Acad Sci U S A. 1995; 92: 3343-7
The sulfur regulatory system of Neurospora crassa is composed of a set of structural genes involved in sulfur catabolism controlled by a genetically defined set of trans-acting regulatory genes. These sulfur regulatory genes include cys-3+, which encodes a basic region-leucine zipper transcriptional activator, and the negative regulatory gene scon-2+. We report here that the scon-2+ gene encodes a polypeptide of 650 amino acids belonging to the expanding beta-transducin family of eukaryotic regulatory proteins. Specifically, SCON2 protein contains six repeated G beta-homologous domains spanning the C-terminal half of the protein. SCON2 represents the initial filamentous fungal protein identified in the beta-transducin group. Additionally, SCON2 exhibits a specific amino-terminal domain that potentially defines another subfamily of beta-transducin homologs. Expression of the scon-2+ gene has been examined using RNA hybridization and gel mobility-shift analysis. The dependence of scon-2+ expression on CYS3 function and the binding of CYS3 to the scon-2+ promoter indicate the presence of an important control loop within the N. crassa sulfur regulatory circuit involving CYS3 activation of scon-2+ expression. On the basis of the presence of beta-transducin repeats, the crucial role of SCON2 in the signal-response pathway triggered by sulfur limitation may be mediated by protein-protein interactions.
- Henthorn KS, Friedman DI
- Identification of related genes in phages phi 80 and P22 whose products are inhibitory for phage growth in Escherichia coli IHF mutants.
- J Bacteriol. 1995; 177: 3185-90
Bacteriophage lambda grows in both IHF+ and IHF- host strains, but the lambdoid phage phi 80 and hybrid phage lambda(QSRrha+)80 fail to grow in IHF- host strains. We have identified a gene, rha, in the phi 80 region of the lambda(QSRrha+)80 genome whose product, Rha, inhibits phage growth in an IHF- host. A search of the GenBank database identified a homolog of rha, ORF201, a previously identified gene in phage P22, which similarly inhibits phage growth in IHF- hosts. Both rha and ORF201 contain two possible translation start sites and two IHF binding site consensus sequences flanking the translation start sites. Mutations allowing lambda(QSRrha+)80 and P22 to grow in IHF- hosts map in rha and ORF201, respectively. We present evidence suggesting that, in an IHF+ host, lambda(QSRrha+)80 expresses Rha only late in infection but in an IHF- host the phage expresses Rha at low levels early in infection and at levels higher than those in an IHF+ host late in infection. We suspect that the deregulation of rha expression and, by analogy, ORF201 expression, is responsible for the failure of phi 80, lambda(QSRrha+)80, and P22 to grow in IHF mutants.
- Shadan FF, Villarreal LP
- The evolution of small DNA viruses of eukaryotes: past and present considerations.
- Virus Genes. 1995; 11: 239-57
Historically, viral evolution has often been considered from the perspective of the ability of the virus to maintain viral pathogenic fitness by causing disease. A predator-prey model has been successfully applied to explain genetically variable quasi-species of viruses, such as influenza virus and human immunodeficiency virus (HIV), which evolve at much faster rates than the host. In contrast, small DNA viruses (polyomaviruses, papillomaviruses, and parvoviruses) are species specific but are stable genetically, and appear to have co-evolved with their host species. Genetic stability is attributable primarily to the ability to establish and maintain a benign persistent state in vivo and not to the host DNA proofreading mechanisms. The persistent state often involves a cell cycle-regulated episomal state and a tight linkage of DNA amplification mechanisms to cellular differentiation. This linkage requires conserved features among viral regulatory proteins, with characteristic host-interactive domains needed to recruit and utilize host machinery, thus imposing mechanistic constraints on possible evolutionary options. Sequence similarities within these domains are seen amongst all small mammalian DNA viruses and most of the parvo-like viruses, including those that replicate via a rolling circle-like mechanism in organisms spanning the entire spectrum of evolution from E. coli to humans. To achieve benign inapparent viral persistence, small DNA viruses are proposed to circumvent the host acute phase reaction (characterized by minimal inflammation) by mechanisms that are evolutionarily adapted to the immune system and the related cytokine communication networks. A striking example of this is the relationship of hymenoptera to polydnaviruses, in which the virus is crucial to the recognition of self, development, and maintenance of genetic identity of both the host and virus. These observations in aggregate suggest that viral replicons are not recent "escapees" of host replication, but rather provide relentless pressure in driving the evolution of the host through cospeciation.
- Raumann BE, Rould MA, Pabo CO, Sauer RT
- DNA recognition by beta-sheets in the Arc repressor-operator crystal structure.
- Nature. 1994; 367: 754-7
Transcription of the ant gene during lytic growth of bacteriophage P22 (ref. 1) is regulated by the cooperative binding of two Arc repressor dimers to a 21-base-pair operator site. Here we report the co-crystal structure of this Arc tetramer-operator complex at 2.6 Å resolution. As expected from genetic and structural studies and from the co-crystal structure of the homologous Escherichia coli MetJ repressor, each Arc dimer uses an antiparallel beta-sheet to recognize bases in the major groove. However, the Arc and MetJ complexes differ in several important ways: the beta-sheet-DNA interactions of Arc are far less symmetrical; DNA binding by Arc is accompanied by important conformational changes in the beta-sheet; and Arc uses a different part of its protein surface for dimer-dimer interactions.
- Farmer AA et al.
- Extreme evolutionary conservation of QM, a novel c-Jun associated transcription factor.
- Hum Mol Genet. 1994; 3: 723-8
QM is a 214 amino acid polypeptide, encoded by a gene (DXS648) in Xq28, that contains a high percentage of charged amino acids and has been found to bind c-Jun and DNA. Searches of the GenBank database revealed no matches between QM and any other known transcription factors. However, we and others have isolated QM homologs from a diverse array of eukaryotes. Alignment of these sequences indicated a high degree of conservation throughout the first 175 residues of the protein and revealed several interesting features. Most notable is the considerable conservation of charged amino acids within specific regions of the protein. Secondary structure analysis suggests that two of these regions form amphipathic alpha-helices, one basic and one acidic. A third conserved charged domain, comprising the N-terminal 30 amino acids, is both basic and proline rich. The rate of sequence divergence of the various homologs was found to be slow (of the order of 1% change every 22 million years), consistent with a critical role for QM in eukaryotic cells. A role for QM as a novel class of transcription regulatory protein is suggested.
- Wagner S, Green MR
- DNA-binding domains: targets for viral and cellular regulators.
- Curr Opin Cell Biol. 1994; 6: 410-4
Specific DNA binding by eukaryotic transcription factors is conferred by several types of sequence motif. These domains have been extensively studied with regard to their precise interaction with DNA and the basis of sequence specificity. Evidence is accumulating that DNA-binding domains serve functions in addition to binding DNA: they are also targets of viral and cellular regulatory proteins.
- Shadan FF, Villarreal LP
- Coevolution of persistently infecting small DNA viruses and their hosts linked to host-interactive regulatory domains.
- Proc Natl Acad Sci U S A. 1993; 90: 4117-21
Although most RNA viral genomes (and related cellular retroposons) can evolve at rates a millionfold greater than that of their host genomes, some of the small DNA viruses (polyomaviruses and papillomaviruses) appear to evolve at much slower rates. These DNA viruses generally cause host species-specific inapparent primary infections followed by life-long, benign persistent infections. Using global progressive sequence alignments for kidney-specific Polyomaviridae (mouse, hamster, primate, human), we have constructed parsimonious evolutionary trees for the viral capsid proteins (VP1, VP2/VP3) and the large tumor (T) antigen. We show that these three coding sequences can yield phylogenetic trees similar to each other and to that of their host species. Such virus-host "co-speciation" appears incongruent with some prevailing views of viral evolution, and we suggest that inapparent persistent infections may link virus and host evolution. Similarity analysis identified three specific regions of polyoma regulatory gene products (T antigens) as highly conserved, and two of these regions correspond to binding sites for host regulatory proteins (p53, the retinoblastoma gene product p105, and the related protein p107). The p53 site overlaps with a conserved ATPase domain and the retinoblastoma site corresponds to conserved region 1 of E1A protein of adenovirus type 5. We examined the local conservation of these binding sequences and show that the conserved retinoblastoma binding domain is characteristic and inclusive of the entire polyomavirus family, but the conserved p53-like binding domain is characteristic and inclusive of three entire families of small DNA viruses: polyomaviruses, papillomaviruses, and parvoviruses. The evolution of small-DNA-virus families may thus be tightly linked to host evolution and speciation by interaction with a subset of host regulatory proteins.
- Koonin EV
- A common set of conserved motifs in a vast variety of putative nucleic acid-dependent ATPases including MCM proteins involved in the initiation of eukaryotic DNA replication.
- Nucleic Acids Res. 1993; 21: 2541-7
A new superfamily of (putative) DNA-dependent ATPases is described that includes the ATPase domains of prokaryotic NtrC-related transcription regulators, MCM proteins involved in the initiation of eukaryotic DNA replication, and a group of uncharacterized bacterial and chloroplast proteins. MCM proteins are shown to contain a modified form of the ATP-binding motif and are predicted to mediate ATP-dependent opening of double-stranded DNA in the replication origins. In a second line of investigation, it is demonstrated that the products of unidentified open reading frames from Marchantia mitochondria and from yeast, and a domain of a baculovirus protein involved in viral DNA replication are related to the superfamily III of DNA and RNA helicases that previously has been known to include only proteins of small viruses. Comparison of the multiple alignments showed that the proteins of the NtrC superfamily and the helicases of superfamily III share three related sequence motifs tightly packed in the ATPase domain that consists of 100-150 amino acid residues. A similar array of conserved motifs is found in the family of DnaA-related ATPases. It is hypothesized that the three large groups of nucleic acid-dependent ATPases have similar structure of the core ATPase domain and have evolved from a common ancestor.
- Krajewska WM
- Regulation of transcription in eukaryotes by DNA-binding proteins.
- Int J Biochem. 1992; 24: 1885-98
1. The recognition of DNA by gene regulatory proteins is often mediated by structural motifs that comprise a protein DNA-binding domain. 2. Although binding of these proteins to DNA is not itself sufficient to affect transcription it is a necessary prerequisite. 3. This review summarizes recent studies that define structural motifs for DNA binding function of eukaryotic transcription factors.
- Jones NC, Rigby PW, Ziff EB
- Trans-acting protein factors and the regulation of eukaryotic transcription: lessons from studies on DNA tumor viruses.
- Genes Dev. 1988; 2: 267-81
- Bodnar JW, Ward DC
- Highly recurring sequence elements identified in eukaryotic DNAs by computer analysis are often homologous to regulatory sequences or protein binding sites.
- Nucleic Acids Res. 1987; 15: 1835-51
We have used computer assisted dot matrix and oligonucleotide frequency analyses to identify highly recurring sequence elements of 7-11 base pairs in eukaryotic genes and viral DNAs. Such elements are found much more frequently than expected, often with an average spacing of a few hundred base pairs. Furthermore, the most abundant repetitive elements observed in the ovalbumin locus, the beta-globin gene cluster, the metallothionein gene and the viral genomes of SV40, polyoma, Herpes simplex-1 and Mouse Mammary Tumor Virus were sequences shown previously to be protein binding sites or sequences important for regulating gene expression. These sequences were present in both exons and introns as well as promoter regions. These observations suggest that such sequences are often highly overrepresented within the specific gene segments with which they are associated. Computer analysis of other genetic units, including viral genomes and oncogenes, has identified a number of highly recurring sequence elements that could serve similar regulatory or protein-binding functions. A model for the role of such reiterated sequence elements in DNA organization and function is presented.
- Binns MM, Stenzler L, Tomley FM, Campbell J, Boursnell ME
- Identification by a random sequencing strategy of the fowlpoxvirus DNA polymerase gene, its nucleotide sequence and comparison with other viral DNA polymerases.
- Nucleic Acids Res. 1987; 15: 6563-73
The nucleotide sequence of the DNA polymerase gene of the avipoxvirus fowlpox is presented and the predicted amino acid sequence compared with that of the orthopoxvirus vaccinia. The results have brought to light an error in the vaccinia sequence which has resulted in the omission of 44 amino acids from the carboxy-terminus of the vaccinia DNA polymerase. There has been extensive conservation of amino acids throughout the enzymes, and regions identified as being present in DNA polymerases from a wide range of viruses are again present here. The method used to identify the fowlpoxvirus gene could have applications towards defining genomic organisations in other viral systems.
- Ptashne M
- Gene regulation by proteins acting nearby and at a distance.
- Nature. 1986; 322: 697-701
Transcription of genes can be controlled by regulatory proteins that bind to sites on the DNA either nearby or at a considerable distance. Recent experiments suggest a unified view of these apparently disparate types of gene regulation.
- Greene JR et al.
- Sequence of the bacteriophage SP01 gene coding for transcription factor 1, a viral homologue of the bacterial type II DNA-binding proteins.
- Proc Natl Acad Sci U S A. 1984; 81: 7031-5
The Bacillus subtilis phage SP01, whose DNA contains 5-hydroxymethyluracil (hmUra) in place of thymine, codes for an abundant, small, basic protein called TF1. TF1 binds preferentially to hydroxymethyluracil-containing DNA and thereby selectively inhibits transcription of such DNA in vitro. The gene for TF1 has been sequenced. We find that this viral protein is a homologue of the ubiquitous bacterial type II DNA-binding proteins. The three-dimensional structure of one of these bacterial proteins has recently been determined. We are able to discern common as well as distinctive features in the amino acid sequence and the three-dimensional structure of the homologous viral protein.
- Sauer RT, Krovatin W, DeAnda J, Youderian P, Susskind MM
- Primary structure of the immI immunity region of bacteriophage P22.
- J Mol Biol. 1983; 168: 699-713
The DNA sequence of the immI immunity region of bacteriophage P22 has been determined. This region includes the ant gene, which encodes the P22 antirepressor protein, and the mnt and arc genes, which encode proteins that negatively regulate antirepressor synthesis. We have purified antirepressor protein and selected tryptic peptides of antirepressor, and have determined the amino terminal sequences and amino acid composition of these molecules. These data, in combination with the DNA sequence, locate the ant gene and define the complete amino acid sequence of antirepressor (300 residues). The mnt and arc genes have been located by sequencing the mnt-am343 and arc-amH1605 mutations. The Mnt and Arc proteins are predicted to be small, basic polypeptides that are homologous in amino acid sequence. The Mnt protein also shows significant sequence homology with the lambda Cro protein. The arc and ant genes are transcribed rightward from the Pant promoter, while mnt is transcribed leftward from a promoter that may overlap Pant. The Mnt protein apparently acts by binding to an operator site located immediately adjacent to the startpoint of Pant transcription.
- Ohlendorf DH, Anderson WF, Matthews BW
- Many gene-regulatory proteins appear to have a similar alpha-helical fold that binds DNA and evolved from a common precursor.
- J Mol Evol. 1983; 19: 109-14
Amino acid and DNA sequence comparisons suggest that many sequence-specific DNA-binding proteins have in common an homologous region of about 22 amino acids. This region corresponds to two consecutive alpha-helices that occur in both Cro and cI repressor proteins of bacteriophage lambda and in catabolite gene activator protein of Escherichia coli and are presumed to interact with DNA. The results obtained here suggest that this alpha-helical DNA-binding fold occurs in many proteins that regulate gene expression. It also appears that this DNA-binding unit evolved from a common evolutionary precursor.
- Mayor HD, Melnick JL
- Small deoxyribonucleic acid-containing viruses (picodnavirus group).
- Nature. 1966; 210: 331-2