Secondary literature sources for Cache_2
The following references were automatically generated.
- Biswas KH, Badireddy S, Rajendran A, Anand GS, Visweswariah SS
- Cyclic nucleotide binding and structural changes in the isolated GAF domain of Anabaena adenylyl cyclase, CyaB2.
- PeerJ. 2015; 3: 882
GAF domains are a large family of regulatory domains, and a subset are found associated with enzymes involved in cyclic nucleotide (cNMP) metabolism such as adenylyl cyclases and phosphodiesterases. CyaB2, an adenylyl cyclase from Anabaena, contains two GAF domains in tandem at the N-terminus and an adenylyl cyclase domain at the C-terminus. Cyclic AMP, but not cGMP, binding to the GAF domains of CyaB2 increases the activity of the cyclase domain leading to enhanced synthesis of cAMP. Here we show that the isolated GAFb domain of CyaB2 can bind both cAMP and cGMP, and enhanced specificity for cAMP is observed only when both the GAFa and the GAFb domains are present in tandem (GAFab domain). In silico docking and mutational analysis identified distinct residues important for interaction with either cAMP or cGMP in the GAFb domain. Structural changes associated with ligand binding to the GAF domains could not be detected by bioluminescence resonance energy transfer (BRET) experiments. However, amide hydrogen-deuterium exchange mass spectrometry (HDXMS) experiments provided insights into the structural basis for cAMP-induced allosteric regulation of the GAF domains, and differences in the changes induced by cAMP and cGMP binding to the GAF domain. Thus, our findings could allow the development of molecules that modulate the allosteric regulation by GAF domains present in pharmacologically relevant proteins.
- Kodavali PK, Dudkiewicz M, Pikula S, Pawlowski K
- Bioinformatics analysis of bacterial annexins: putative ancestral relatives of eukaryotic annexins.
- PLoS One. 2014; 9: 85428
Annexins are Ca2+-binding, membrane-interacting proteins, widespread among eukaryotes, consisting usually of four structurally similar repeated domains. It is accepted that vertebrate annexins derive from a double genome duplication event. It has been postulated that a single-domain annexin, if found, might represent a molecule related to the hypothetical ancestral annexin. The recent discovery of a single-domain annexin in a bacterium, Cytophaga hutchinsonii, apparently confirmed this hypothesis. Here, we present a more complex picture. Using remote sequence similarity detection tools, a survey of bacterial genomes was performed in search of annexin-like proteins. In total, we identified about thirty annexin homologues, including single-domain and multi-domain annexins, in seventeen bacterial species. The thorough search yielded, besides the known annexin homologue from C. hutchinsonii, homologues from the Bacteroidetes/Chlorobi phylum, from Gemmatimonadetes, from beta- and delta-Proteobacteria, and from Actinobacteria. The sequences of bacterial annexins exhibited remote but statistically significant similarity to sequence profiles built from the eukaryotic ones. Some bacterial annexins are equipped with additional, different domains, for example those characteristic of toxins. The variation in bacterial annexin sequences, much wider than that observed in eukaryotes, and different domain architectures suggest that annexins found in bacteria may actually descend from an ancestral bacterial annexin, from which eukaryotic annexins also originate. The hypothesis of an ancient origin of bacterial annexins has to be reconciled with the fact that remarkably few bacterial strains possess annexin genes compared to the thousands of known bacterial genomes and with the patchy, anomalous phylogenetic distribution of bacterial annexins. Thus, a massive annexin gene loss in several bacterial lineages or very divergent evolution would appear to be a likely explanation.
Alternative evolutionary scenarios, involving horizontal gene transfer between bacteria and protozoan eukaryotes, in either direction, appear much less likely. Altogether, current evidence does not allow unequivocal judgement as to the origin of bacterial annexins.
- Neely A, Hidalgo P
- Structure-function of proteins interacting with the alpha1 pore-forming subunit of high-voltage-activated calcium channels.
- Front Physiol. 2014; 5: 209
Openings of high-voltage-activated (HVA) calcium channels lead to a transient increase in calcium concentration that in turn activates a plethora of cellular functions, including muscle contraction, secretion and gene transcription. To coordinate all these responses, calcium channels form supramolecular assemblies containing effectors and regulatory proteins that couple calcium influx to the downstream signal cascades and to feedback elements. According to the original biochemical characterization of skeletal muscle dihydropyridine receptors, HVA calcium channels are multi-subunit protein complexes consisting of a pore-forming subunit (alpha1) associated with four additional polypeptide chains beta, alpha2, delta, and gamma, often referred to as accessory subunits. Twenty-five years after the first purification of a high-voltage calcium channel, the concept of a flexible stoichiometry to expand the repertoire of mechanisms that regulate calcium channel influx has emerged. Several other proteins have been identified that associate directly with the alpha1-subunit, including calmodulin and multiple members of the small and large GTPase family. Some of these proteins only interact with a subset of alpha1-subunits and during specific stages of biogenesis. More strikingly, most of the alpha1-subunit interacting proteins, such as the beta-subunit and small GTPases, regulate both gating and trafficking through a variety of mechanisms. Modulation of channel activity covers almost all biophysical properties of the channel. Likewise, regulation of the number of channels in the plasma membrane is performed by altering the release of the alpha1-subunit from the endoplasmic reticulum, by reducing its degradation or enhancing its recycling back to the cell surface. In this review, we discuss the structural basis, interplay and functional role of selected proteins that interact with the central pore-forming subunit of HVA calcium channels.
- LaRonde NA
- The ancient microbial RIO kinases.
- J Biol Chem. 2014; 289: 9488-92
The RIO kinases existed before the split between Archaea and Eubacteria and are essential in eukaryotes. Although much has been elucidated in the past few years regarding the function of these proteins in eukaryotes, questions remain about their role in prokaryotes. Comparison of structure and sequence suggests that the ancient RIO kinases may have similar functional properties in prokaryotes as they do in eukaryotes. The conservation of charge distribution, functional residues, and overall structure supports a role for these proteins in ribosome interactions, as is their purpose in eukaryotes. However, a lack of study in this area has left little direct evidence in support of this function.
- Dunin-Horkawicz S, Kopec KO, Lupas AN
- Prokaryotic ancestry of eukaryotic protein networks mediating innate immunity and apoptosis.
- J Mol Biol. 2014; 426: 1568-82
Protein domains characteristic of eukaryotic innate immunity and apoptosis have many prokaryotic counterparts of unknown function. By reconstructing interactomes computationally, we found that bacterial proteins containing these domains are part of a network that also includes other domains not hitherto associated with immunity. This network is connected to the network of prokaryotic signal transduction proteins, such as histidine kinases and chemoreceptors. The network varies considerably in domain composition and degree of paralogy, even between strains of the same species, and its repetitive domains have often been amplified recently, with individual repeats sharing up to 100% sequence identity. Both phenomena are evidence of considerable evolutionary pressure and thus compatible with a role in the "arms race" between host and pathogen. In order to investigate the relationship of this network to its eukaryotic counterparts, we performed a cluster analysis of organisms based on a census of its constituent domains across all fully sequenced genomes. We obtained a large central cluster of mainly unicellular organisms, from which multicellular organisms radiate out in two main directions. One is taken by multicellular bacteria, primarily cyanobacteria and actinomycetes, and plants form an extension of this direction, connected via the basal, unicellular cyanobacteria. The second main direction is taken by animals and fungi, which form separate branches with a common root in the alpha-proteobacteria of the central cluster. This analysis supports the notion that the innate immunity networks of eukaryotes originated from their endosymbionts and that increases in the complexity of these networks accompanied the emergence of multicellularity.
- Khafif M, Cottret L, Balague C, Raffaele S
- Identification and phylogenetic analyses of VASt, an uncharacterized protein domain associated with lipid-binding domains in Eukaryotes.
- BMC Bioinformatics. 2014; 15: 222
BACKGROUND: Several regulators of programmed cell death (PCD) in plants encode proteins with putative lipid-binding domains. Among them, VAD1 is a regulator of PCD propagation harboring a GRAM putative lipid-binding domain. However, the function of VAD1 at the subcellular level is unknown and the domain architecture of VAD1 has not been analyzed in detail. RESULTS: We analyzed sequence conservation across the plant kingdom in the VAD1 protein and identified an uncharacterized VASt (VAD1 Analog of StAR-related lipid transfer) domain. Using profile hidden Markov models (profile HMMs) and phylogenetic analysis we found that this domain is conserved among eukaryotes and generally associates with various lipid-binding domains. Proteins containing both a GRAM and a VASt domain include notably the yeast Ysp2 cell death regulator and numerous uncharacterized proteins. Using structure-based phylogeny, we found that the VASt domain is structurally related to Bet v1-like domains. CONCLUSION: We identified a novel protein domain ubiquitous in Eukaryotic genomes and belonging to the Bet v1-like superfamily. Our findings open perspectives for the functional analysis of VASt-containing proteins and the characterization of novel mechanisms regulating PCD.
- Wu R et al.
- Insight into the sporulation phosphorelay: crystal structure of the sensor domain of Bacillus subtilis histidine kinase, KinD.
- Protein Sci. 2013; 22: 564-76
The Bacillus subtilis KinD signal-transducing histidine kinase is a part of the sporulation phosphorelay known to regulate important developmental decisions such as sporulation and biofilm formation. We have determined crystal structures of the extracytoplasmic sensing domain of KinD, which was copurified and crystallized with a pyruvate ligand. The structure of a ligand-binding site mutant was also determined; it was copurified and crystallized with an acetate ligand. The structure of the KinD extracytoplasmic segment is similar to that of several other sensing domains of signal transduction proteins and is composed of tandem Per-Arnt-Sim (PAS)-like domains. The KinD ligand-binding site is located on the membrane distal PAS-like domain and appears to be highly selective; a single mutation, R131A, abolishes pyruvate binding and the mutant binds acetate instead. Differential scanning fluorimetry, using a variety of monocarboxylic and dicarboxylic acids, identified pyruvate, propionate, and butyrate but not lactate, acetate, or malate as KinD ligands. A recent report found that malate induces biofilm formation in a KinD-dependent manner. It was suggested that malate might induce a metabolic shift and increased secretion of the KinD ligand of unknown identity. The structure and binding assays now suggest that this ligand is pyruvate and/or other small monocarboxylic acids. In summary, this study gives a first insight into the identity of a molecular ligand for one of the five phosphorelay kinases of B. subtilis.
- Lluis MW, Godfroy JI 3rd, Yin H
- Protein engineering methods applied to membrane protein targets.
- Protein Eng Des Sel. 2013; 26: 91-100
Genes encoding membrane proteins have been estimated to comprise as much as 30% of the human genome. Among these membrane proteins are a large number of signaling receptors, transporters, ion channels and enzymes that are vital to cellular regulation, metabolism and homeostasis. While many membrane proteins are considered high-priority targets for drug design, there is a dearth of structural and biochemical information on them. This lack of information stems from the inherent insolubility and instability of transmembrane domains, which makes it difficult to obtain high-resolution crystals to specifically study structure-function relationships. In part, this lack of structures has greatly impeded our understanding in the field of membrane proteins. One method that can be used to enhance our understanding is directed evolution, a molecular biology method that mimics natural selection to engineer proteins that have specific phenotypes. It is a powerful technique that has had considerable success with globular proteins, notably the engineering of protein therapeutics. With respect to transmembrane protein targets, this tool may be underutilized. Another powerful tool to investigate membrane protein structure-function relationships is computational modeling. This review will discuss these protein engineering methods and their tremendous potential in the study of membrane proteins.
- Goncearenco A, Berezovsky IN
- Exploring the evolution of protein function in Archaea.
- BMC Evol Biol. 2012; 12: 75
BACKGROUND: Despite recent progress in studies of the evolution of protein function, the questions of what the first functional protein domains were and what their basic building blocks were remain unresolved. Previously, we introduced the concept of elementary functional loops (EFLs), which are the functional units of enzymes that provide elementary reactions in biochemical transformations. They are presumably descendants of primordial catalytic peptides. RESULTS: We analyzed distant evolutionary connections between protein functions in Archaea based on the EFLs comprising them. We show examples of the involvement of EFLs in new functional domains, as well as reutilization of EFLs and functional domains in building multidomain structures and protein complexes. CONCLUSIONS: Our analysis of the archaeal superkingdom yields the dominating mechanisms in different periods of protein evolution, which resulted in several levels of the organization of biochemical function. First, functional domains emerged as combinations of prebiotic peptides with the very basic functions, such as nucleotide/phosphate and metal cofactor binding. Second, domain recombination brought to the evolutionary scene the multidomain proteins and complexes. Later, reutilization and de novo design of functional domains and elementary functional loops complemented evolution of protein function.
- Suen S, Lu HH, Yeang CH
- Evolution of domain architectures and catalytic functions of enzymes in metabolic systems.
- Genome Biol Evol. 2012; 4: 976-93
Domain architectures and catalytic functions of enzymes constitute the centerpieces of a metabolic network. These types of information are formulated as a two-layered network consisting of domains, proteins, and reactions: a domain-protein-reaction (DPR) network. We propose an algorithm to reconstruct the evolutionary history of DPR networks across multiple species and categorize the mechanisms of metabolic systems evolution in terms of network changes. The reconstructed history reveals distinct patterns of evolutionary mechanisms between prokaryotic and eukaryotic networks. Although the evolutionary mechanisms in early ancestors of prokaryotes and eukaryotes are quite similar, more novel and duplicated domain compositions with identical catalytic functions arise along the eukaryotic lineage. In contrast, prokaryotic enzymes become more versatile by catalyzing multiple reactions with similar chemical operations. Moreover, different metabolic pathways are enriched with distinct network evolution mechanisms. For instance, although the pathways of steroid biosynthesis, protein kinases, and glycosaminoglycan biosynthesis all constitute prominent features of animal-specific physiology, their evolution of domain architectures and catalytic functions follows distinct patterns. Steroid biosynthesis is enriched with reaction creations but retains a relatively conserved repertoire of domain compositions and proteins. Protein kinases retain conserved reactions but possess many novel domains and proteins. In contrast, glycosaminoglycan biosynthesis has high rates of reaction/protein creations and domain recruitments. Finally, we elicit and validate two general principles underlying the evolution of DPR networks: 1) duplicated enzyme proteins possess similar catalytic functions and 2) the majority of novel domains arise to catalyze novel reactions. These results shed new light on the evolution of metabolic systems.
- Ueki T, Leang C, Inoue K, Lovley DR
- Identification of multicomponent histidine-aspartate phosphorelay system controlling flagellar and motility gene expression in Geobacter species.
- J Biol Chem. 2012; 287: 10958-66
Geobacter species play an important role in the natural biogeochemical cycles of aquatic sediments and subsurface environments as well as in subsurface bioremediation by oxidizing organic compounds with the reduction of insoluble Fe(III) oxides. Flagellum-based motility is considered to be critical for Geobacter species to locate fresh sources of Fe(III) oxides. Functional and comparative genomic approaches, coupled with genetic and biochemical methods, identified key regulators for flagellar gene expression in Geobacter species. A master transcriptional regulator, designated FgrM, is a member of the enhancer-binding protein family. The fgrM gene in the most studied strain of Geobacter species, Geobacter sulfurreducens strain DL-1, is truncated by a transposase gene, preventing flagellar biosynthesis. Integrating a functional FgrM homolog restored flagellar biosynthesis and motility in G. sulfurreducens DL-1 and enhanced the ability to reduce insoluble Fe(III) oxide. Interrupting the fgrM gene in G. sulfurreducens strain KN400, which is motile, removed the capacity for flagellar production and inhibited Fe(III) oxide reduction. FgrM, which is also a response regulator of the two-component His-Asp phosphorelay system, was phosphorylated by histidine kinase GHK4, which was essential for flagellar production and motility. GHK4, which is a hybrid kinase with a receiver domain at the N terminus, was phosphorylated by another histidine kinase, GHK3. Therefore, the multicomponent His-Asp phosphorelay system appears to control flagellar gene expression in Geobacter species.
- Eisenbeis S et al.
- Potential of fragment recombination for rational design of proteins.
- J Am Chem Soc. 2012; 134: 4019-22
It is hypothesized that protein domains evolved from smaller intrinsically stable subunits via combinatorial assembly. Illegitimate recombination of fragments that encode protein subunits could have quickly led to diversification of protein folds and their functionality. This evolutionary concept presents an attractive strategy to protein engineering, e.g., to create new scaffolds for enzyme design. We previously combined structurally similar parts from two ancient protein folds, the (βα)8-barrel and the flavodoxin-like fold. The resulting "hopeful monster" differed significantly from the intended (βα)8-barrel fold by an extra beta-strand in the core. In this study, we ask what modifications are necessary to form the intended structure and what potential this approach has for the rational design of functional proteins. Guided by computational design, we optimized the interface between the fragments with five targeted mutations yielding a stable, monomeric protein whose predicted structure was verified experimentally. We further tested binding of a phosphorylated compound and detected that some affinity was already present due to an intact phosphate-binding site provided by one fragment. The affinity could be improved quickly to the level of natural proteins by introducing two additional mutations. The study illustrates the potential of recombining protein fragments with unique properties to design new and functional proteins, offering both a possible pathway of protein evolution and a protocol to rapidly engineer proteins for new applications.
- Pancsa R, Tompa P
- Structural disorder in eukaryotes.
- PLoS One. 2012; 7: 34687
Based on early bioinformatic studies on a handful of species, the frequency of structural disorder of proteins is generally thought to be much higher in eukaryotes than in prokaryotes. To refine this view, we present here a comparative prediction study and analysis of 194 fully described eukaryotic proteomes and 87 reference prokaryotes for structural disorder. We found that structural disorder does distinguish eukaryotes from prokaryotes, but its frequency spans a very wide range in the two superkingdoms that largely overlap. The number of disordered binding regions and different Pfam domain types also contribute to distinguish eukaryotes from prokaryotes. Unexpectedly, the highest levels, and highest variability, of predicted disorder are found in protists, i.e. single-celled eukaryotes, often surpassing more complex eukaryote organisms, plants and animals. This trend contrasts with that of the number of domain types, which increases rather monotonically toward more complex organisms. The level of structural disorder appears to be strongly correlated with lifestyle, because some obligate intracellular parasites and endosymbionts have the lowest levels, whereas host-changing parasites have the highest level of predicted disorder. We conclude that protists have been the evolutionary hot-bed of experimentation with structural disorder, in a period when structural disorder was actively invented and the major functional classes of disordered proteins established.
- Chen Y et al.
- A Bacillus subtilis sensor kinase involved in triggering biofilm formation on the roots of tomato plants.
- Mol Microbiol. 2012; 85: 418-30
The soil bacterium Bacillus subtilis is widely used in agriculture as a biocontrol agent able to protect plants from a variety of pathogens. Protection is thought to involve the formation of bacterial communities - biofilms - on the roots of the plants. Here we used confocal microscopy to visualize biofilms on the surface of the roots of tomato seedlings and demonstrated that biofilm formation requires genes governing the production of the extracellular matrix that holds cells together. We further show that biofilm formation was dependent on the sensor histidine kinase KinD and in particular on an extracellular CACHE domain implicated in small molecule sensing. Finally, we report that exudates of tomato roots strongly stimulated biofilm formation ex planta and that an abundant small molecule in the exudates, L-malic acid, was able to stimulate biofilm formation at high concentrations in a manner that depended on the KinD CACHE domain. We propose that small signalling molecules released by the roots of tomato plants are directly or indirectly recognized by KinD, triggering biofilm formation.
- Li Q, Cheng T, Wang Y, Bryant SH
- Characterizing protein domain associations by small-molecule ligand binding.
- J Proteome Sci Comput Biol. 2012; 1
Protein domains are evolutionarily conserved building blocks for protein structure and function, which are conventionally identified based on protein sequence or structure similarity. Small molecule binding domains are of great importance for the recognition of small molecules in biological systems and drug development. Many small molecules, including drugs, have increasingly been found to bind to multiple targets, leading to promiscuous interactions with protein domains. Thus, a large scale characterization of the protein domains and their associations with respect to small-molecule binding is of particular interest to systems biology research, drug target identification, as well as drug repurposing. We compiled a collection of 13,822 physical interactions of small molecules and protein domains derived from the Protein Data Bank (PDB) structures. Based on the chemical similarity of these small molecules, we characterized pairwise associations of the protein domains and further investigated their global associations from a network point of view. We found that protein domains, despite lack of similarity in sequence and structure, were comprehensively associated through binding the same or similar small-molecule ligands. Moreover, we identified modules in the domain network that consisted of closely related protein domains by sharing similar biochemical mechanisms, being involved in relevant biological pathways, or being regulated by the same cognate cofactors. A novel protein domain relationship was identified in the context of small-molecule binding, which is complementary to those identified by traditional sequence-based or structure-based approaches. The protein domain network constructed in the present study provides a novel perspective for chemogenomic study and network pharmacology, as well as target identification for drug repurposing.
- Leclere L, Rentzsch F
- Repeated evolution of identical domain architecture in metazoan netrin domain-containing proteins.
- Genome Biol Evol. 2012; 4: 883-99
The majority of proteins in eukaryotes are composed of multiple domains, and the number and order of these domains is an important determinant of protein function. Although multidomain proteins with a particular domain architecture were initially considered to have a common evolutionary origin, recent comparative studies of protein families or whole genomes have reported that a minority of multidomain proteins could have appeared multiple times independently. Here, we test this scenario in detail for the signaling molecules netrin and secreted frizzled-related proteins (sFRPs), two groups of netrin domain-containing proteins with essential roles in animal development. Our primary phylogenetic analyses suggest that the particular domain architectures of each of these proteins were present in the eumetazoan ancestor and evolved a second time independently within the metazoan lineage from laminin and frizzled proteins, respectively. Using an array of phylogenetic methods, statistical tests, and character sorting analyses, we show that the polyphyly of netrin and sFRP is well supported and cannot be explained by classical phylogenetic reconstruction artifacts. Despite their independent origins, the two groups of netrins and of sFRPs have the same protein interaction partners (Deleted in Colorectal Cancer/neogenin and Unc5 for netrins and Wnts for sFRPs) and similar developmental functions. Thus, these cases of convergent evolution emphasize the importance of domain architecture for protein function by uncoupling shared domain architecture from shared evolutionary history. Therefore, we propose the term merology to describe the repeated evolution of proteins with similar domain architecture and discuss the potential of merologous proteins to help in understanding protein evolution.
- Ghosh A, Albers SV
- Assembly and function of the archaeal flagellum.
- Biochem Soc Trans. 2011; 39: 64-9
Motility is a common behaviour in prokaryotes. Both bacteria and archaea use flagella for swimming motility, but it has been well documented that structures of the flagellum from these two domains of life are completely different, although they contribute to a similar function. Interestingly, information available to date has revealed that structurally archaeal flagella are more similar to bacterial type IV pili than to bacterial flagella. With the increasing genome sequence information and advancement in genetic tools for archaea, identification of the components involved in the assembly of the archaeal flagellum is possible. A subset of these components shows similarities to components from type IV pilus-assembly systems. Whereas the molecular players involved in assembly of the archaeal flagellum are being identified, the mechanics and dynamics of the assembly of the archaeal flagellum have yet to be established. Recent computational analysis in our laboratory has identified conserved highly charged loop regions within one of the core proteins of the flagellum, the membrane integral protein FlaJ, and predicted that these are involved in the interaction with the assembly ATPase FlaI. Interestingly, considerable variation was found among the loops of FlaJ from the two major subkingdoms of archaea, the Euryarchaeota and the Crenarchaeota. Understanding the assembly pathway and creating an interaction map of the molecular players in the archaeal flagellum will shed light on the details of the assembly and also the evolutionary relationship to the bacterial type IV pili-assembly systems.
- Janecek S, Svensson B, MacGregor EA
- Structural and evolutionary aspects of two families of non-catalytic domains present in starch and glycogen binding proteins from microbes, plants and animals.
- Enzyme Microb Technol. 2011; 49: 429-40
Starch-binding domains (SBDs) comprise distinct protein modules that bind starch, glycogen or related carbohydrates and have been classified into different families of carbohydrate-binding modules (CBMs). The present review focuses on SBDs of CBM20 and CBM48 found in amylolytic enzymes from several glycoside hydrolase (GH) families GH13, GH14, GH15, GH31, GH57 and GH77, as well as in a number of regulatory enzymes, e.g., phosphoglucan, water dikinase-3, genethonin-1, laforin, starch-excess protein-4, the beta-subunit of AMP-activated protein kinase and its homologues from sucrose non-fermenting-1 protein kinase SNF1 complex, and an adaptor-regulator related to the SNF1/AMPK family, AKINβγ. CBM20s and CBM48s of amylolytic enzymes occur predominantly in the microbial world, whereas the non-amylolytic proteins containing these modules are mostly of plant and animal origin. Comparison of amino acid sequences and tertiary structures of CBM20 and CBM48 reveals the close relatedness of these SBDs and, in some cases, glycogen-binding domains (GBDs). The families CBM20 and CBM48 share both an ancestral form and the mode of starch/glycogen binding at one or two binding sites. Phylogenetic analyses demonstrate that they exhibit independent behaviour, i.e. each family forms its own part in an evolutionary tree, with enzyme specificity (protein function) being well represented within each family. The distinction between CBM20 and CBM48 families is not sharp since there are representatives in both CBM families that possess an intermediate character. These are, for example, CBM20s from hypothetical GH57 amylopullulanase (probably lacking the starch-binding site 2) and CBM48s from the GH13 pullulanase subfamily (probably lacking the starch/glycogen-binding site 1). The knowledge gained concerning the occurrence of these SBDs and GBDs through the range of taxonomy will support future experimental research.
- Arcus VL, McKenzie JL, Robson J, Cook GM
- The PIN-domain ribonucleases and the prokaryotic VapBC toxin-antitoxin array.
- Protein Eng Des Sel. 2011; 24: 33-40
The PIN-domains are small proteins of ~130 amino acids that are found in bacteria, archaea and eukaryotes and are defined by a group of three strictly conserved acidic amino acids. The conserved three-dimensional structures of the PIN-domains cluster these acidic residues in an enzymatic active site. PIN-domains cleave single-stranded RNA in a sequence-specific, Mg2+- or Mn2+-dependent manner. These ribonucleases are toxic to the cells which express them and to offset this toxicity, they are co-expressed with tight binding protein inhibitors. The genes encoding these two proteins are adjacent in the genome of all prokaryotic organisms where they are found. This sequential arrangement of inhibitor-RNase genes conforms to that of the so-called toxin-antitoxin (TA) modules and the PIN-domain TAs have been named VapBC TAs (virulence associated proteins, VapB is the inhibitor which contains a transcription factor domain and VapC is the PIN-domain ribonuclease). The presence of large numbers of vapBC loci in disparate prokaryotes has motivated many researchers to investigate their biochemical and biological functions. For example, the devastating human pathogen Mycobacterium tuberculosis has 45 vapBC loci encoded in its genome whereas its non-pathogenic relative, Mycobacterium smegmatis has just one vapBC operon. On another branch of the prokaryotic tree, the nitrogen-fixing symbiont of legumes, Sinorhizobium meliloti has 21 vapBC loci and at least one of these loci has been implicated in the regulation of growth in the plant nodule. A range of biological functions has been suggested for these operons and this review sets out to survey the PIN-domains and summarise the current knowledge about the vapBC TA systems and their roles in diverse bacteria.
- Atkinson GC, Tenson T, Hauryliuk V
- The RelA/SpoT homolog (RSH) superfamily: distribution and functional evolution of ppGpp synthetases and hydrolases across the tree of life.
- PLoS One. 2011; 6: 23479-23479
RelA/SpoT Homologue (RSH) proteins, named for their sequence similarity to the RelA and SpoT enzymes of Escherichia coli, comprise a superfamily of enzymes that synthesize and/or hydrolyze the alarmone ppGpp, activator of the "stringent" response and regulator of cellular metabolism. The classical "long" RSHs Rel, RelA and SpoT with the ppGpp hydrolase, synthetase, TGS and ACT domain architecture have been found across diverse bacteria and plant chloroplasts, while dedicated single domain ppGpp-synthesizing and -hydrolyzing RSHs have also been discovered in disparate bacteria and animals respectively. However, there is considerable confusion in terms of nomenclature and no comprehensive phylogenetic and sequence analyses have previously been carried out to classify RSHs on a genomic scale. We have performed high-throughput sensitive sequence searching of over 1000 genomes from across the tree of life, in combination with phylogenetic analyses to consolidate previous ad hoc identification of diverse RSHs in different organisms and provide a much-needed unifying terminology for the field. We classify RSHs into 30 subgroups comprising three groups: long RSHs, small alarmone synthetases (SASs), and small alarmone hydrolases (SAHs). Members of nineteen previously unidentified RSH subgroups can now be studied experimentally, including previously unknown RSHs in archaea, expanding the "stringent response" to this domain of life. We have analyzed possible combinations of RSH proteins and their domains in bacterial genomes and compared RSH content with available RSH knock-out data for various organisms to determine the rules of combining RSHs. Through comparative sequence analysis of long and small RSHs, we find exposed sites limited in conservation to the long RSHs that we propose are involved in transmitting regulatory signals. 
Such signals may be transmitted via NTD to CTD intra-molecular interactions, or inter-molecular interactions either among individual RSH molecules or among long RSHs and other binding partners such as the ribosome.
- Frank V, Koler M, Furst S, Vaknin A
- The physical and functional thermal sensitivity of bacterial chemoreceptors.
- J Mol Biol. 2011; 411: 554-66
The bacterium Escherichia coli exhibits chemotactic behavior at temperatures ranging from approximately 20 degrees C to at least 42 degrees C. This behavior is controlled by clusters of transmembrane chemoreceptors made from trimers of dimers that are linked together by cross-binding to cytoplasmic components. By detecting fluorescence energy transfer between various components of this system, we studied the underlying molecular behavior of these receptors in vivo and throughout their operating temperature range. We reveal a sharp modulation in the conformation of unclustered and clustered receptor trimers and, consequently, in kinase activity output. These modulations occurred at a characteristic temperature that depended on clustering and were lower for receptors at lower adaptational states. However, in the presence of dynamic adaptation, the response of kinase activity to a stimulus was sustained up to 45 degrees C, but sensitivity notably decreased. Thus, this molecular system exhibits a clear thermal sensitivity that emerges at the level of receptor trimers, but both receptor clustering and adaptation support the overall robust operation of the system at elevated temperatures.
- Sjolander K, Datta RS, Shen Y, Shoffner GM
- Ortholog identification in the presence of domain architecture rearrangement.
- Brief Bioinform. 2011; 12: 413-22
Ortholog identification is used in gene functional annotation, species phylogeny estimation, phylogenetic profile construction and many other analyses. Bioinformatics methods for ortholog identification are commonly based on pairwise protein sequence comparisons between whole genomes. Phylogenetic methods of ortholog identification have also been developed; these methods can be applied to protein data sets sharing a common domain architecture or which share a single functional domain but differ outside this region of homology. While promiscuous domains represent a challenge to all orthology prediction methods, overall structural similarity is highly correlated with proximity in a phylogenetic tree, conferring a degree of robustness to phylogenetic methods. In this article, we review the issues involved in orthology prediction when data sets include sequences with structurally heterogeneous domain architectures, with particular attention to automated methods designed for high-throughput application, and present a case study to illustrate the challenges in this area.
- Francke C et al.
- Comparative analyses imply that the enigmatic Sigma factor 54 is a central controller of the bacterial exterior.
- BMC Genomics. 2011; 12: 385-385
BACKGROUND: Sigma-54 is a central regulator in many pathogenic bacteria and has been linked to a multitude of cellular processes like nitrogen assimilation and important functional traits such as motility, virulence, and biofilm formation. Until now it has remained obscure whether these phenomena and the control by Sigma-54 share an underlying theme. RESULTS: We have uncovered the commonality by performing a range of comparative genome analyses. A) The presence of Sigma-54 and its associated activators was determined for all sequenced prokaryotes. We observed a phylum-dependent distribution that is suggestive of an evolutionary relationship between Sigma-54 and lipopolysaccharide and flagellar biosynthesis. B) All Sigma-54 activators were identified and annotated. The relation with phosphotransfer-mediated signaling (TCS and PTS) and the transport and assimilation of carboxylates and nitrogen containing metabolites was substantiated. C) The function annotations, that were represented within the genomic context of all genes encoding Sigma-54, its activators and its promoters, were analyzed for intra-phylum representation and inter-phylum conservation. Promoters were localized using a straightforward scoring strategy that was formulated to identify similar motifs. We found clear highly-represented and conserved genetic associations with genes that concern the transport and biosynthesis of the metabolic intermediates of exopolysaccharides, flagella, lipids, lipopolysaccharides, lipoproteins and peptidoglycan. CONCLUSION: Our analyses directly implicate Sigma-54 as a central player in the control over the processes that involve the physical interaction of an organism with its environment like in the colonization of a host (virulence) or the formation of biofilm.
- Cohen-Gihon I, Fong JH, Sharan R, Nussinov R, Przytycka TM, Panchenko AR
- Evolution of domain promiscuity in eukaryotic genomes--a perspective from the inferred ancestral domain architectures.
- Mol Biosyst. 2011; 7: 784-92
Most eukaryotic proteins are composed of two or more domains. These assemble in a modular manner to create new proteins usually by the acquisition of one or more domains to an existing protein. Promiscuous domains which are found embedded in a variety of proteins and co-exist with many other domains are of particular interest and were shown to have roles in signaling pathways and mediating network communication. The evolution of domain promiscuity is still an open problem, mostly due to the lack of sequenced ancestral genomes. Here we use inferred domain architectures of ancestral genomes to trace the evolution of domain promiscuity in eukaryotic genomes. We find an increase in average promiscuity along many branches of the eukaryotic tree. Moreover, domain promiscuity can proceed at almost a steady rate over long evolutionary time or exhibit lineage-specific acceleration. We also observe that many signaling and regulatory domains gained domain promiscuity around the Bilateria divergence. In addition we show that those domains that played a role in the creation of two body axes and existed before the divergence of the bilaterians from fungi/metazoan achieve a boost in their promiscuities during the bilaterian evolution.
- Nasir A, Naeem A, Khan MJ, Nicora HD, Caetano-Anolles G
- Annotation of Protein Domains Reveals Remarkable Conservation in the Functional Make up of Proteomes Across Superkingdoms.
- Genes (Basel). 2011; 2: 869-911
The functional repertoire of a cell is largely embodied in its proteome, the collection of proteins encoded in the genome of an organism. The molecular functions of proteins are the direct consequence of their structure and structure can be inferred from sequence using hidden Markov models of structural recognition. Here we analyze the functional annotation of protein domain structures in almost a thousand sequenced genomes, exploring the functional and structural diversity of proteomes. We find there is a remarkable conservation in the distribution of domains with respect to the molecular functions they perform in the three superkingdoms of life. In general, most of the protein repertoire is spent in functions related to metabolic processes but there are significant differences in the usage of domains for regulatory and extra-cellular processes both within and between superkingdoms. Our results support the hypotheses that the proteomes of superkingdom Eukarya evolved via genome expansion mechanisms that were directed towards innovating new domain architectures for regulatory and extra/intracellular process functions needed for example to maintain the integrity of multicellular structure or to interact with environmental biotic and abiotic factors (e.g., cell signaling and adhesion, immune responses, and toxin production). Proteomes of microbial superkingdoms Archaea and Bacteria retained fewer domains and maintained simple and smaller protein repertoires. Viruses appear to play an important role in the evolution of superkingdoms. We finally identify a few genomic outliers that deviate significantly from the conserved functional design. These include Nanoarchaeum equitans, proteobacterial symbionts of insects with extremely reduced genomes, Tenericutes and Guillardia theta. 
These organisms spend most of their domains on information functions, including translation and transcription, rather than on metabolism and harbor a domain repertoire characteristic of parasitic organisms. In contrast, the functional repertoire of the proteomes of the Planctomycetes-Verrucomicrobia-Chlamydiae superphylum was no different than the rest of bacteria, failing to support claims of them representing a separate superkingdom. In turn, Protista and Bacteria shared similar functional distribution patterns suggesting an ancestral evolutionary link between these groups.
- Wiedenhoeft J, Krause R, Eulenstein O
- The plexus model for the inference of ancestral multidomain proteins.
- IEEE/ACM Trans Comput Biol Bioinform. 2011; 8: 890-901
Interactions of protein domains control essential cellular processes. Thus, inferring the evolutionary histories of multidomain proteins in the context of their families can provide rewarding insights into protein function. However, methods to infer these histories are challenged by the complexity of macroevolutionary events. Here, we address this challenge by describing an algorithm that computes a novel network-like structure, called plexus, which represents the evolution of domains and their combinations. Finally, we demonstrate the performance of this algorithm with empirical data sets.
- Patil A, Kinoshita K, Nakamura H
- Domain distribution and intrinsic disorder in hubs in the human protein-protein interaction network.
- Protein Sci. 2010; 19: 1461-8
Intrinsic disorder and distributed surface charge have been previously identified as some of the characteristics that differentiate hubs (proteins with a large number of interactions) from non-hubs in protein-protein interaction networks. In this study, we investigated the differences in the quantity, diversity, and functional nature of Pfam domains, and their relationship with intrinsic disorder, in hubs and non-hubs. We found that proteins with a more diverse domain composition were over-represented in hubs when compared with non-hubs, with the number of interactions in hubs increasing with domain diversity. Conversely, the fraction of intrinsic disorder in hubs decreased with increasing number of ordered domains. The difference in the levels of disorder was more prominent in hubs and non-hubs with fewer domains. Functional analysis showed that hubs were enriched in kinase and adaptor domains acting primarily in signal transduction and transcription regulation, whereas non-hubs had more DNA-binding domains and were involved in catalytic activity. Consistent with the differences in the functional nature of their domains, hubs with two or more domains were more likely to connect distinct functional modules in the interaction network when compared with single domain hubs. We conclude that the availability of greater number and diversity of ordered domains, in addition to the tendency to have promiscuous domains, differentiates hubs from non-hubs and provides an additional means of achieving interaction promiscuity. Further, hubs with fewer domains use greater levels of intrinsic disorder to facilitate interaction promiscuity with the prevalence of disorder decreasing with increasing number of ordered domains.
- Ueki T, Lovley DR
- Novel regulatory cascades controlling expression of nitrogen-fixation genes in Geobacter sulfurreducens.
- Nucleic Acids Res. 2010; 38: 7485-99
Geobacter species often play an important role in bioremediation of environments contaminated with metals or organics and show promise for harvesting electricity from waste organic matter in microbial fuel cells. The ability of Geobacter species to fix atmospheric nitrogen is an important metabolic feature for these applications. We identified novel regulatory cascades controlling nitrogen-fixation gene expression in Geobacter sulfurreducens. Unlike the regulatory mechanisms known in other nitrogen-fixing microorganisms, nitrogen-fixation gene regulation in G. sulfurreducens is controlled by two two-component His-Asp phosphorelay systems. One of these systems appears to be the master regulatory system that activates transcription of the majority of nitrogen-fixation genes and represses a gene encoding glutamate dehydrogenase during nitrogen fixation. The other system whose expression is directly activated by the master regulatory system appears to control by antitermination the expression of a subset of the nitrogen-fixation genes whose transcription is activated by the master regulatory system and whose promoter contains transcription termination signals. This study provides a new paradigm for nitrogen-fixation gene regulation.
- Al-Khodor S, Price CT, Kalia A, Abu Kwaik Y
- Functional diversity of ankyrin repeats in microbial proteins.
- Trends Microbiol. 2010; 18: 132-9
The ankyrin repeat (ANK) is the most common protein-protein interaction motif in nature, and is predominantly found in eukaryotic proteins. Genome sequencing of various pathogenic or symbiotic bacteria and eukaryotic viruses has identified numerous genes encoding ANK-containing proteins that are proposed to have been acquired from eukaryotes by horizontal gene transfer. However, the recent discovery of additional ANK-containing proteins encoded in the genomes of archaea and free-living bacteria suggests either a more ancient origin of the ANK motif or multiple convergent evolution events. Many bacterial pathogens employ various types of secretion systems to deliver ANK-containing proteins into eukaryotic cells, where they mimic or manipulate various host functions. Studying the molecular and biochemical functions of this family of proteins will enhance our understanding of important host-microbe interactions.
- Bourret RB
- Receiver domain structure and function in response regulator proteins.
- Curr Opin Microbiol. 2010; 13: 142-9
During signal transduction by two-component regulatory systems, sensor kinases detect and encode input information while response regulators (RRs) control output. Most receiver domains function as phosphorylation-mediated switches within RRs, but some transfer phosphoryl groups in multistep phosphorelays. Conserved features of receiver domain amino acid sequence correlate with structure and hence function. Receiver domains catalyze their own phosphorylation and dephosphorylation in reactions requiring a divalent cation. Molecular dynamics simulations are supplementing structural investigation of the conformational changes that underlie receiver domain switch function. As understanding of features shared by all receiver domains matures, factors conferring differences (e.g. in reaction rate or specificity) are receiving increased attention. Numerous examples of atypical receiver or pseudo-receiver domains that function without phosphorylation have recently been characterized.
- Bashton M, Thornton JM
- Domain-ligand mapping for enzymes.
- J Mol Recognit. 2010; 23: 194-208
In this paper we provide an overview of our current knowledge of the mapping between small molecule ligands and protein domains. We give an overview of the present data resources available on the Web, which provide information about protein-ligand interactions, as well as discussing our own PROCOGNATE database. We present an update of ligand binding in large protein superfamilies and identify those ligands most frequently utilized by nature. Finally we discuss potential uses for this type of data.
- Carland TM, Gerwick L
- The C1q domain containing proteins: Where do they come from and what do they do?
- Dev Comp Immunol. 2010; 34: 785-90
The gene sequence encoding an N-terminal collagen stalk followed by a globular complement 1q domain (gC1q), an architecture that characterizes the C1q A, B and C chains of the first complement component (C1), did not become prevalent until the cephalochordates and urochordates. However, genes encoding only the globular complement 1q domain (ghC1q) are more ancient as they exist within many lower vertebrate and invertebrate genomes, and are even present in the prokaryotes. These genes can be divided into two groups, the first, which appears to be the more ancient form, encodes proteins that are not secreted (cghC1q). The second group encodes proteins in which the globular domain is preceded by a signal peptide indicating secretion (sghC1q). In this review we examine bioinformatic evidence for C1q domain containing (C1qDC) genes in many organisms and integrate these observations with research performed and published on the biochemistry and functions of this fascinating set of proteins.
- Berntsson RP, Smits SH, Schmitt L, Slotboom DJ, Poolman B
- A structural classification of substrate-binding proteins.
- FEBS Lett. 2010; 584: 2606-17
Substrate-binding proteins (SBP) are associated with a wide variety of protein complexes. The proteins are part of ATP-binding cassette transporters for substrate uptake, ion gradient driven transporters, DNA-binding proteins, as well as channels and receptors from both pro- and eukaryotes. A wealth of structural and functional data is available on SBPs, with over 120 unique entries in the Protein Data Bank (PDB). Over a decade ago these proteins were divided into three structural classes, but based on the currently available wealth of structural data, we propose a new classification into six clusters, based on features of their three-dimensional structure.
- Nacher JC, Hayashida M, Akutsu T
- The role of internal duplication in the evolution of multi-domain proteins.
- Biosystems. 2010; 101: 127-35
Many proteins consist of several structural domains. These multi-domain proteins have likely been generated by selective genome growth dynamics during evolution to perform new functions as well as to create structures that fold on a biologically feasible time scale. Domain units frequently evolved through a variety of genetic shuffling mechanisms. Here we examine the protein domain statistics of more than 1000 organisms including eukaryotic, archaeal and bacterial species. The analysis extends earlier findings on asymmetric statistical laws for proteome to a wider variety of species. While proteins are composed of a wide range of domains, displaying a power-law decay, the computation of domain families for each protein reveals an exponential distribution, characterizing a protein universe composed of a thin number of unique families. Structural studies in proteomics have shown that domain repeats, or internal duplicated domains, represent a small but significant fraction of genome. In spite of its importance, this observation has been largely overlooked until recently. We model the evolutionary dynamics of proteome and demonstrate that these distinct distributions are in fact rooted in an internal duplication mechanism. This process generates the contemporary protein structural domain universe, determines its reduced thickness, and tames its growth. These findings have important implications, ranging from protein interaction network modeling to evolutionary studies based on fundamental mechanisms governing genome expansion.
- Zhang D, Aravind L
- Identification of novel families and classification of the C2 domain superfamily elucidate the origin and evolution of membrane targeting activities in eukaryotes.
- Gene. 2010; 469: 18-30
Eukaryotes contain an elaborate membrane system, which bounds the cell itself, nuclei, organelles and transient intracellular structures, such as vesicles. The emergence of this system was marked by an expansion of a number of structurally distinct classes of lipid-binding domains that could throw light on the early evolution of eukaryotic membranes. The C2 domain is a useful model to understand these events because it is one of the most prevalent eukaryotic lipid-binding domains deployed in diverse functional contexts. Most studies have concentrated on C2 domains prototyped by those in protein kinase C (PKC-C2) isoforms that bind lipid in a calcium-dependent manner. While two other distinct families of C2 domains, namely those in PI3K-C2 and PTEN-C2 are also recognized, a complete picture of evolutionary relationships within the C2 domain superfamily is lacking. We systematically studied this superfamily using sequence profile searches, phylogenetic and phyletic-pattern analysis and structure-prediction. Consequently, we identified several distinct families of C2 domains including those respectively typified by C2 domains in the Aida (axin interactor, dorsalization associated) proteins, B9 proteins (e.g. Mks1 (Xbx-7), Stumpy (Tza-1) and Tza-2) involved in centrosome migration and ciliogenesis, Dock180/Zizimin proteins which are Rac/CDC42 GDP exchange factors, the EEIG1/Sym-3, EHBP1 and plant RPG/PMI1 proteins involved in endocytotic recycling and organellar positioning and an apicomplexan family. We present evidence that the last eukaryotic common ancestor (LECA) contained at least 10 C2 domains belonging to 6 well-defined families. 
Further, we suggest that this pre-LECA diversification was linked to the emergence of several quintessentially eukaryotic structures, such as membrane repair and vesicular trafficking system, anchoring of the actin and tubulin cytoskeleton to the plasma and vesicular membranes, localization of small GTPases to membranes and lipid-based signal transduction. Subsequent lineage-specific expansions of Zizimin-type C2 domains and functionally linked CDC42/Rac GTPases occurred independently in eukaryotes that evolved active amoeboid motility. While two lipid-binding regions are likely to be shared by the majority of C2 domains, the actual constellation of lipid-binding residues (predominantly basic) is distinct in each family, potentially reflecting the functional and biochemical diversity of these domains. Importantly, we show that the calcium-dependent membrane interaction is a derived feature limited to the PKC-C2 domains. Our identification of novel C2 domains offers new insights into interaction between both the microtubular and microfilament cytoskeleton and cellular membranes.
- Friedberg F
- Singlet CH domain containing human multidomain proteins: an inventory.
- Mol Biol Rep. 2010; 37: 1531-9
The actin cytoskeleton presents the basic force in processes such as cytokinesis, endocytosis, vesicular trafficking and cell migration. Here, we list 30 human singlet CH (calponin homology/actin binding) containing multidomain molecules, each encoded by one gene. We show the domain distributions as given by the SMART program. These mosaic proteins organize geographically the placement of selected proteins in proximity within the cell. In most instances, their precise location, their actin binding capacity by way of the singlet CH (or by other domains?) and their physiological functions need further elucidation. A dendrogram based solely on the relationship for the human singlet CH domains (in terms of AA sequences) for the various molecules that possess the domain, implies that the singlet descended from a common ancestor which in turn sprouted three main branches of protein products. Each branch bifurcated multiple times thus accounting for a cornucopia of products. Wherever additional (unassigned), highly homologous regions exist in related proteins (e.g., in LIM and LMO7 or in Tangerin and EH/BP1), these unrecognized domain regions await assignment as specific functional domains. Frequently genes coding multidomain proteins duplicated. The varying modular nature within multidomain proteins should have accelerated evolutionary changes to a degree not feasible to achieve by means of mere post-duplication mutational changes.
- Apic G, Russell RB
- Domain recombination: a workhorse for evolutionary innovation.
- Sci Signal. 2010; 3: 30-30
Although the combination of modular domains within proteins has been proposed as a determining feature of evolutionary innovation and flexibility, direct evidence for this mechanism of evolution has been sketchy. Two papers, one creating new domain combinations in the yeast mating pathway and another involving a comprehensive analysis of protein function and domain architecture across major organisms, have provided firm evidence that the recombining of domains can lead to evolutionary innovation. The results will guide future studies in synthetic and evolutionary biology.
- Pawlowski K, Muszewska A, Lenart A, Szczepinska T, Godzik A, Grynberg M
- A widespread peroxiredoxin-like domain present in tumor suppression- and progression-implicated proteins.
- BMC Genomics. 2010; 11: 590-590
BACKGROUND: Peroxide turnover and signalling are involved in many biological phenomena relevant to human diseases. Yet, all the players and mechanisms involved in peroxide perception are not known. Elucidating very remote evolutionary relationships between proteins is an approach that allows the discovery of novel protein functions. Here, we start with three human proteins, SRPX, SRPX2 and CCDC80, involved in tumor suppression and progression, which possess a conserved region of similarity. Structure and function prediction allowed the definition of P-DUDES, a phylogenetically widespread, possibly ancient protein structural domain, common to vertebrates and many bacterial species. RESULTS: We show, using bioinformatics approaches, that the P-DUDES domain, surprisingly, adopts the thioredoxin-like (Thx-like) fold. A tentative, more detailed prediction of function is made, namely, that of a 2-Cys peroxiredoxin. Incidentally, consistent overexpression of all three human P-DUDES genes in two public glioblastoma microarray gene expression datasets was discovered. This finding is discussed in the context of the tumor suppressor role that has been ascribed to P-DUDES proteins in several studies. The majority of non-redundant P-DUDES proteins are found in marine metagenomes, and among the bacterial species possessing this domain a trend for a higher proportion of aquatic species is observed. CONCLUSIONS: The new protein structural domain, now with a broad enzymatic function predicted, may become a drug target once its molecular mechanism of action is understood in detail.
- Makarova KS, Wolf YI, van der Oost J, Koonin EV
- Prokaryotic homologs of Argonaute proteins are predicted to function as key components of a novel system of defense against mobile genetic elements.
- Biol Direct. 2009; 4: 29-29
BACKGROUND: In eukaryotes, RNA interference (RNAi) is a major mechanism of defense against viruses and transposable elements as well as of regulating translation of endogenous mRNAs. The RNAi systems recognize the target RNA molecules via small guide RNAs that are completely or partially complementary to a region of the target. Key components of the RNAi systems are proteins of the Argonaute-PIWI family, some of which function as slicers, the nucleases that cleave the target RNA that is base-paired to a guide RNA. Numerous prokaryotes possess the CRISPR-associated system (CASS) of defense against phages and plasmids that is, in part, mechanistically analogous but not homologous to eukaryotic RNAi systems. Many prokaryotes also encode homologs of Argonaute-PIWI proteins but their functions remain unknown. RESULTS: We present a detailed analysis of Argonaute-PIWI protein sequences and the genomic neighborhoods of the respective genes in prokaryotes. Whereas eukaryotic Ago/PIWI proteins always contain PAZ (oligonucleotide binding) and PIWI (active or inactivated nuclease) domains, the prokaryotic Argonaute homologs (pAgos) fall into two major groups in which the PAZ domain is either present or absent. The monophyly of each group is supported by a phylogenetic analysis of the conserved PIWI-domains. Almost all pAgos that lack a PAZ domain appear to be inactivated, and the respective genes are associated with a variety of predicted nucleases in putative operons. An additional, uncharacterized domain that is fused to various nucleases appears to be a unique signature of operons encoding the short (lacking PAZ) pAgo form. By contrast, almost all PAZ-domain containing pAgos are predicted to be active nucleases. Some proteins of this group (e.g., that from Aquifex aeolicus) have been experimentally shown to possess nuclease activity, and are not typically associated with genes for other (putative) nucleases. 
Given these observations, the apparent extensive horizontal transfer of pAgo genes, and their common, statistically significant over-representation in genomic neighborhoods enriched in genes encoding proteins involved in the defense against phages and/or plasmids, we hypothesize that pAgos are key components of a novel class of defense systems. The PAZ-domain containing pAgos are predicted to directly destroy virus or plasmid nucleic acids via their nuclease activity, whereas the apparently inactivated, PAZ-lacking pAgos could be structural subunits of protein complexes that contain, as active moieties, the putative nucleases that we predict to be co-expressed with these pAgos. All these nucleases are predicted to be DNA endonucleases, so it seems most probable that the putative novel phage/plasmid-defense system targets phage DNA rather than mRNAs. Given that in eukaryotic RNAi systems, the PAZ domain binds a guide RNA and positions it on the complementary region of the target, we further speculate that pAgos function on a similar principle (the guide being either DNA or RNA), and that the uncharacterized domain found in putative operons with the short forms of pAgos is a functional substitute for the PAZ domain. CONCLUSION: The hypothesis that pAgos are key components of a novel prokaryotic immune system that employs guide RNA or DNA molecules to degrade nucleic acids of invading mobile elements implies a functional analogy with the prokaryotic CASS and a direct evolutionary connection with eukaryotic RNAi. The predictions of the hypothesis including both the activities of pAgos and those of the associated endonucleases are readily amenable to experimental tests.
- Lenaerts T, Schymkowitz J, Rousseau F
- Protein domains as information processing units.
- Curr Protein Pept Sci. 2009; 10: 133-45
Transducing environmental signals from the cell surface to the nucleus to evoke an appropriate gene regulatory response requires an accurate and robust medium to propagate biological information. The structure of proteins, and especially the dynamic properties of these structures, allow for the uptake and restitution of biological information from and to the environment. To understand the functioning and regulation of signalling pathways we therefore have to understand how protein structures encode biological information. Towards this goal, several computational methods have been developed in recent years. First, we provide an overview of these in silico approaches. Next, using the well known SH2 domain as a case study, we describe two specific approaches in more detail to illustrate the similarities and differences between sequence-based and structure-based methods for the analysis of protein communication. Both methods address the same question yet from a different level of description. As a consequence both have their limits and a number of pros and cons that are discussed here. Together all the methods discussed here provide an arsenal of in silico approaches that may be used to understand how information content is maintained through protein structural dynamics, elucidating explicitly information transfer in signalling networks.
- Rigden DJ, Liu H, Hayes SD, Urbe S, Clague MJ
- Ab initio protein modelling reveals novel human MIT domains.
- FEBS Lett. 2009; 583: 872-8
Database searches can fail to detect all truly homologous sequences, particularly when dealing with short, highly sequence diverse protein families. Here, using microtubule interacting and transport (MIT) domains as an example, we have applied an approach of profile-profile matching followed by ab initio structure modelling to the detection of true homologues in the borderline significant zone of database searches. Novel MIT domains were confidently identified in USP54, containing an apparently inactive ubiquitin carboxyl-terminal hydrolase domain, a katanin-like ATPase KATNAL1, and an uncharacterized protein containing a VPS9 domain. As a proof of principle, we have confirmed the novel MIT annotation for USP54 by in vitro profiling of binding to CHMP proteins.
- Green JB, Lower RP, Young JP
- The NfeD protein family and its conserved gene neighbours throughout prokaryotes: functional implications for stomatin-like proteins.
- J Mol Evol. 2009; 69: 657-67
NfeD-like proteins are widely distributed throughout prokaryotes and are frequently associated with genes encoding stomatin-like proteins (slipins). Here, we reveal that the NfeD family is ancient and comprises three major groups: NfeD1a, NfeD1b and truncated NfeD1b. Members of each group are associated with one of four conserved gene partners, three of which have eukaryotic homologues that are membrane raft associated, namely stomatin, paraslipin (previously SLP-2) and flotillin. The first NfeD group (NfeD1b) comprises proteins approximately 460 aa long that have three functional domains: an N-terminal protease, a middle membrane-spanning region and a soluble C-terminal region rich in beta-strands. The nfeD1b gene is adjacent to eoslipin in prokaryotic genomes except in Firmicutes and Deinococci, where yqfA replaces eoslipin. Proteins in the second major group (NfeD1a) are homologous to the C-terminus of NfeD1b which forms a beta-barrel-like domain, and their genes are associated with paraslipin. Using OrthoMCL clustering, we show that nfeD1b genes have become truncated on many independent occasions giving rise to the third major group. These short NfeD homologues frequently remain associated with their ancestral gene neighbour, resembling NfeD1a in structure, yet are much more related to full-length NfeD1b; we term these "truncated NfeD1b". These conserved associations suggest that NfeD proteins are dependent on gene partners for their function and that the site of interaction may lie within the C-terminal portion that is common to all NfeD homologues. Although NfeD homologues are confined to prokaryotes, this conserved association could represent an excellent system to study slipin and flotillin proteins.
- Seshasayee AS, Fraser GM, Babu MM, Luscombe NM
- Principles of transcriptional regulation and evolution of the metabolic system in E. coli.
- Genome Res. 2009; 19: 79-91
Organisms must adapt to make optimal use of the metabolic system in response to environmental changes. In the long-term, this involves evolution of the genomic repertoire of enzymes; in the short-term, transcriptional control ensures that appropriate enzymes are expressed in response to transitory extracellular conditions. Unicellular organisms are particularly susceptible to environmental changes; however, genome-scale impact of these modulatory effects has not been explored so far in bacteria. Here, we integrate genome-scale data to investigate the evolutionary trends and transcriptional control of metabolism in Escherichia coli K12. Globally, the regulatory system is organized in a clear hierarchy of general and specific transcription factors (TFs) that control differing ranges of metabolic functions. Further, catabolic, anabolic, and central metabolic pathways are targeted by distinct combinations of these TFs. Locally, enzymes catalyzing sequential reactions in a metabolic pathway are co-regulated by the same TFs. Regulation is more complex at junctions: General TFs control the overall activity of all connecting reactions, whereas specific TFs control individual enzymes. Divergent junctions play a special role in delineating metabolic pathways and decouple the regulation of incoming and outgoing reactions. We find little evidence for differential usage of isozymes, which are generally co-expressed in similar conditions, and thus are likely to reinforce the metabolic system through redundancy. Finally, we show that enzymes controlled by the same TFs have a strong tendency to co-evolve, suggesting a significant constraint to maintain similar regulatory regimes during evolution. Catabolic, anabolic, and central energy pathways evolve differently, emphasizing the role of the environment in shaping the metabolic system. Many of the observations also occur in yeast, and our findings may apply across large evolutionary distances.
- Rajagopalan L, Pereira FA, Lichtarge O, Brownell WE
- Identification of functionally important residues/domains in membrane proteins using an evolutionary approach coupled with systematic mutational analysis.
- Methods Mol Biol. 2009; 493: 287-97
Structure-function studies of membrane proteins present a unique challenge to researchers due to the numerous technical difficulties associated with their expression, purification and structural characterization. In the absence of structural information, rational identification of putative functionally important residues/regions is difficult. Phylogenetic relationships could provide valuable information about the functional significance of a particular residue or region of a membrane protein. Evolutionary Trace (ET) analysis is a method developed to utilize this phylogenetic information to predict functional sites in proteins. In this method, residues are ranked according to conservation or divergence through evolution, based on the hypothesis that mutations at key positions should coincide with functional evolutionary divergences. This information can be used as the basis for a systematic mutational analysis of identified residues, leading to the identification of functionally important residues and/or domains in membrane proteins, in the absence of structural information apart from the primary amino acid sequence. This approach is potentially useful in the context of the auditory system, as several key processes in audition involve the action of membrane proteins, many of which are novel and not well characterized structurally or functionally to date.
- Martinez SE, Heikaus CC, Klevit RE, Beavo JA
- The structure of the GAF A domain from phosphodiesterase 6C reveals determinants of cGMP binding, a conserved binding surface, and a large cGMP-dependent conformational change.
- J Biol Chem. 2008; 283: 25913-9
The photoreceptor phosphodiesterase (PDE6) regulates the intracellular levels of the second messenger cGMP in the outer segments of cone and rod photoreceptor cells. PDE6 contains two regulatory GAF domains, of which one (GAF A) binds cGMP and regulates the activity of the PDE6 holoenzyme. To increase our understanding of this allosteric regulation mechanism, we present the 2.6 Å crystal structure of the cGMP-bound GAF A domain of chicken cone PDE6. Nucleotide specificity appears to be provided in part by the orientation of Asn-116, which makes two hydrogen bonds to the guanine ring of cGMP but is not strictly conserved among PDE6 isoforms. The isolated PDE6C GAF A domain is monomeric and does not contain sufficient structural determinants to form a homodimer as found in full-length PDE6C. A highly conserved surface patch on GAF A indicates a potential binding site for the inhibitory subunit Pgamma. NMR studies reveal that the apo-PDE6C GAF A domain is structured but adopts a significantly altered structural state indicating a large conformational change with rearrangement of secondary structure elements upon cGMP binding. The presented crystal structure will help to define the cGMP-dependent regulation mechanism of the PDE6 holoenzyme and its inhibition through Pgamma binding.
- Wang F et al.
- A systematic survey of mini-proteins in bacteria and archaea.
- PLoS One. 2008; 3: 4027-4027
BACKGROUND: Mini-proteins, defined as polypeptides containing no more than 100 amino acids, are ubiquitous in prokaryotes and eukaryotes. They play significant roles in various biological processes, and their regulatory functions are gradually attracting the attention of scientists. However, the functions of the majority of mini-proteins are still largely unknown due to the constraints of experimental methods and bioinformatic analysis. METHODOLOGY/PRINCIPAL FINDINGS: In this article, we extracted a total of 180,879 mini-proteins from the annotations of 532 sequenced genomes, including 491 strains of Bacteria and 41 strains of Archaea. The average proportion of mini-proteins among all genomic proteins is approximately 10.99%, but different strains exhibit remarkable fluctuations. These mini-proteins display two notable characteristics. First, the majority are species-specific proteins with an average proportion of 58.79% among six representative phyla. Second, an even larger proportion (70.03% among all strains) are hypothetical proteins. However, a fraction of highly conserved hypothetical proteins potentially play crucial roles in organisms. Among mini-proteins with known functions, it seems that regulatory and metabolic proteins are more abundant than essential structural proteins. Furthermore, domains in mini-proteins seem to be more widely distributed in Bacteria than in Eukarya. Analysis of the evolutionary progression of these domains reveals that they have diverged to new patterns from a single ancestor. CONCLUSIONS/SIGNIFICANCE: Mini-proteins are ubiquitous in bacterial and archaeal species and play significant roles in various functions. The number of mini-proteins in each genome displays remarkable fluctuation, likely resulting from the differential selective pressures that reflect the respective life-styles of the organisms. The answers to many questions surrounding mini-proteins remain elusive and need to be resolved experimentally.
- Bailly X, Vanin S, Chabasse C, Mizuguchi K, Vinogradov SN
- A phylogenomic profile of hemerythrins, the nonheme diiron binding respiratory proteins.
- BMC Evol Biol. 2008; 8: 244-244
BACKGROUND: Hemerythrins are the non-heme, diiron binding respiratory proteins of brachiopods, priapulids and sipunculans; they are also found in annelids and bacteria, where their functions have not been fully elucidated. RESULTS: A search for putative Hrs in the genomes of 43 archaea, 444 bacteria and 135 eukaryotes revealed their presence in 3 archaea, 118 bacteria, several fungi, one apicomplexan, a heterolobosan, a cnidarian and several annelids. About a fourth of the Hr sequences were identified as N- or C-terminal domains of chimeric, chemotactic gene regulators. The function of the remaining single domain bacterial Hrs remains to be determined. In addition to oxygen transport, the possible functions in annelids have been proposed to include cadmium-binding, antibacterial action and immunoprotection. A Bayesian phylogenetic tree revealed a split into two clades, one encompassing archaea, bacteria and fungi, and the other comprising the remaining eukaryotes. The annelid and sipunculan Hrs share the same intron-exon structure, different from that of the cnidarian Hr. CONCLUSION: The phylogenomic profile of Hrs demonstrated a limited occurrence in bacteria and archaea and a marked absence in the vast majority of multicellular organisms. Among the metazoa, Hrs have survived in a cnidarian and in a few protostome groups; hence, it appears that in metazoans the Hr gene was lost in deuterostome ancestor(s) after the radiata/bilateria split. Signal peptide sequences in several Hirudinea Hrs suggest, for the first time, the possibility of extracellular localization. Since the alpha-helical bundle is likely to have been among the earliest protein folds, Hrs represent an ancient family of iron-binding proteins, whose primary function in bacteria may have been that of an oxygen sensor, enabling aerophilic or aerophobic responses.
Although Hrs evolved to function as O2 transporters in brachiopods, priapulids and sipunculans, their function in annelids remains to be elucidated. Overall Hrs exhibit a considerable lack of evolutionary success in metazoans.
- Ly CV, Yao CK, Verstreken P, Ohyama T, Bellen HJ
- straightjacket is required for the synaptic stabilization of cacophony, a voltage-gated calcium channel alpha1 subunit.
- J Cell Biol. 2008; 181: 157-70
In a screen to identify genes involved in synaptic function, we isolated mutations in Drosophila melanogaster straightjacket (stj), an alpha(2)delta subunit of the voltage-gated calcium channel. stj mutant photoreceptors develop normal synaptic connections but display reduced "on-off" transients in electroretinogram recordings, indicating a failure to evoke postsynaptic responses and, thus, a defect in neurotransmission. stj is expressed in neurons but excluded from glia. Mutants exhibit endogenous seizure-like activity, indicating altered neuronal excitability. However, at the synaptic level, stj larval neuromuscular junctions exhibit approximately fourfold reduction in synaptic release compared with controls stemming from a reduced release probability at these synapses. These defects likely stem from destabilization of Cacophony (Cac), the primary presynaptic alpha(1) subunit in D. melanogaster. Interestingly, neuronal overexpression of cac partially rescues the viability and physiological defects in stj mutants, indicating a role for the alpha(2)delta Ca(2+) channel subunit in mediating the proper localization of an alpha(1) subunit at synapses.
- Burroughs AM, Balaji S, Iyer LM, Aravind L
- Small but versatile: the extraordinary functional and structural diversity of the beta-grasp fold.
- Biol Direct. 2007; 2: 18-18
BACKGROUND: The beta-grasp fold (beta-GF), prototyped by ubiquitin (UB), has been recruited for a strikingly diverse range of biochemical functions. These functions include providing a scaffold for different enzymatic active sites (e.g. NUDIX phosphohydrolases) and iron-sulfur clusters, RNA-soluble-ligand and co-factor-binding, sulfur transfer, adaptor functions in signaling, assembly of macromolecular complexes and post-translational protein modification. To understand the basis for the functional versatility of this small fold we undertook a comprehensive sequence-structure analysis of the fold and developed a natural classification for its members. RESULTS: As a result we were able to define the core distinguishing features of the fold and numerous elaborations, including several previously unrecognized variants. Systematic analysis of all known interactions of the fold showed that its manifold functional abilities arise primarily from the prominent beta-sheet, which provides an exposed surface for diverse interactions or additionally, by forming open barrel-like structures. We show that in the beta-GF both enzymatic activities and the binding of diverse co-factors (e.g. molybdopterin) have independently evolved on at least three occasions each, and iron-sulfur-cluster-binding on at least two independent occasions. Our analysis identified multiple previously unknown large monophyletic assemblages within the beta-GF, including one which unifies versions found in the fasciclin-1 superfamily, the ribosomal protein L25, the phosphoribosyl AMP cyclohydrolase (HisI) and glutamine synthetase. We also uncovered several new groups of beta-GF domains including a domain found in bacterial flagellar and fimbrial assembly components, and 5 new UB-like domains in the eukaryotes. 
CONCLUSION: Evolutionary reconstruction indicates that the beta-GF had differentiated into at least 7 distinct lineages by the time of the last universal common ancestor of all extant organisms, encompassing much of the structural diversity observed in extant versions of the fold. The earliest beta-GF members were probably involved in RNA metabolism and subsequently radiated into various functional niches. Most of the structural diversification occurred in the prokaryotes, whereas the eukaryotic phase was mainly marked by a specific expansion of the ubiquitin-like beta-GF members. The eukaryotic UB superfamily diversified into at least 67 distinct families, of which at least 19-20 families were already present in the eukaryotic common ancestor, including several protein and one lipid conjugated forms. Another key aspect of the eukaryotic phase of evolution of the beta-GF was the dramatic increase in domain architectural complexity of proteins related to the expansion of UB-like domains in numerous adaptor roles.
- Janga SC, Salgado H, Martinez-Antonio A, Collado-Vides J
- Coordination logic of the sensing machinery in the transcriptional regulatory network of Escherichia coli.
- Nucleic Acids Res. 2007; 35: 6963-72
The active and inactive state of transcription factors in growing cells is usually directed by allosteric physicochemical signals or metabolites, which are in turn either produced in the cell or obtained from the environment by the activity of the products of effector genes. To understand the regulatory dynamics and to improve our knowledge about how transcription factors (TFs) respond to endogenous and exogenous signals in the bacterial model, Escherichia coli, we previously proposed to classify TFs into external, internal and hybrid sensing classes depending on the source of their allosteric or equivalent metabolite. Here we analyze how a cell uses its topological structures in the context of sensing machinery and show that, while feed forward loops (FFLs) tightly integrate internal and external sensing TFs connecting TFs from different layers of the hierarchical transcriptional regulatory network (TRN), bifan motifs frequently connect TFs belonging to the same sensing class and could act as a bridge between TFs originating from the same level in the hierarchy. We observe that modules identified in the regulatory network of E. coli are heterogeneous in sensing context with a clear combination of internal and external sensing categories depending on the physiological role played by the module. We also note that the propensity of two-component response regulators increases at promoters as the number of TFs regulating a target operon increases. Finally we show that evolutionary families of TFs do not show a tendency to preserve their sensing abilities. Our results provide a detailed panorama of the topological structures of the E. coli TRN and of the way the TFs that compose them sense their surroundings by coordinating responses.
- Miller WG et al.
- The complete genome sequence and analysis of the epsilonproteobacterium Arcobacter butzleri.
- PLoS One. 2007; 2: 1358-1358
BACKGROUND: Arcobacter butzleri is a member of the epsilon subdivision of the Proteobacteria and a close taxonomic relative of established pathogens, such as Campylobacter jejuni and Helicobacter pylori. Here we present the complete genome sequence of the human clinical isolate, A. butzleri strain RM4018. METHODOLOGY/PRINCIPAL FINDINGS: Arcobacter butzleri is a member of the Campylobacteraceae, but the majority of its proteome is most similar to those of Sulfuromonas denitrificans and Wolinella succinogenes, both members of the Helicobacteraceae, and those of the deep-sea vent Epsilonproteobacteria Sulfurovum and Nitratiruptor. In addition, many of the genes and pathways described here, e.g. those involved in signal transduction and sulfur metabolism, have been identified previously within the epsilon subdivision only in S. denitrificans, W. succinogenes, Sulfurovum, and/or Nitratiruptor, or are unique to the subdivision. In addition, the analyses indicated also that a substantial proportion of the A. butzleri genome is devoted to growth and survival under diverse environmental conditions, with a large number of respiration-associated proteins, signal transduction and chemotaxis proteins and proteins involved in DNA repair and adaptation. To investigate the genomic diversity of A. butzleri strains, we constructed an A. butzleri DNA microarray comprising 2238 genes from strain RM4018. Comparative genomic indexing analysis of 12 additional A. butzleri strains identified both the core genes of A. butzleri and intraspecies hypervariable regions, where <70% of the genes were present in at least two strains. CONCLUSION/SIGNIFICANCE: The presence of pathways and loci associated often with non-host-associated organisms, as well as genes associated with virulence, suggests that A. butzleri is a free-living, water-borne organism that might be classified rightfully as an emerging pathogen. 
The genome sequence and analyses presented in this study are an important first step in understanding the physiology and genetics of this organism, which constitutes a bridge between the environment and mammalian hosts.
- Meier VM, Muschler P, Scharf BE
- Functional analysis of nine putative chemoreceptor proteins in Sinorhizobium meliloti.
- J Bacteriol. 2007; 189: 1816-26
The genome of the symbiotic soil bacterium Sinorhizobium meliloti contains eight genes coding for methyl-accepting chemotaxis proteins (MCPs) McpS to McpZ and one gene coding for a transducer-like protein, IcpA. Seven of the MCPs are localized in the cytoplasmic membrane via two membrane-spanning regions, whereas McpY and IcpA lack such hydrophobic regions. The periplasmic regions of McpU, McpV, and McpX contain the small-ligand-binding domain Cache. In addition, McpU possesses the ligand-binding domain TarH. By probing gene expression with lacZ fusions, we have identified mcpU and mcpX as being highly expressed. Deletion of any one of the receptor genes caused impairments in the chemotactic response toward most organic acids, amino acids, and sugars in a swarm plate assay. The data imply that chemoreceptor proteins in S. meliloti can sense more than one class of carbon source and suggest that many or all receptors work as an ensemble. Tactic responses were virtually eliminated for a strain lacking all nine receptor genes. Capillary assays revealed three important sensors for the strong attractant proline: McpU, McpX, and McpY. Receptor deletions variously affected free-swimming speed and attractant-induced chemokinesis. Noticeably, cells lacking mcpU were swimming 9% slower than the wild-type control. We infer that McpU inhibits the kinase activity of CheA in the absence of an attractant. Cells lacking one of the two soluble receptors were impaired in chemokinetic proficiency by more than 50%. We propose that the internal sensors, IcpA and the PAS domain containing McpY, monitor the metabolic state of S. meliloti.
- Ashby MK
- Distribution, structure and diversity of "bacterial" genes encoding two-component proteins in the Euryarchaeota.
- Archaea. 2006; 2: 11-30
The publicly available annotated archaeal genome sequences (23 complete and three partial annotations, October 2005) were searched for the presence of potential two-component open reading frames (ORFs) using gene category lists and BLASTP. A total of 489 potential two-component genes were identified from the gene category lists and BLASTP. Two-component genes were found in 14 of the 21 Euryarchaeal sequences (October 2005) and in neither the Crenarchaeota nor the Nanoarchaeota. A total of 20 predicted protein domains were identified in the putative two-component ORFs that, in addition to the histidine kinase and receiver domains, also includes sensor and signalling domains. The detailed structure of these putative proteins is shown, as is the distribution of each class of two-component genes in each species. Potential members of orthologous groups have been identified, as have any potential operons containing two or more two-component genes. The number of two-component genes in those Euryarchaeal species which have them seems to be linked more to lifestyle and habitat than to genome complexity, with most examples being found in Methanospirillum hungatei, Haloarcula marismortui, Methanococcoides burtonii and the mesophilic Methanosarcinales group. The large numbers of two-component genes in these species may reflect a greater requirement for internal regulation. Phylogenetic analysis of orthologous groups of five different protein classes, three probably involved in regulating taxis, suggests that most of these ORFs have been inherited vertically from an ancestral Euryarchaeal species and point to a limited number of key horizontal gene transfer events.
- Kinch LN, Grishin NV
- Longin-like folds identified in CHiPS and DUF254 proteins: vesicle trafficking complexes conserved in eukaryotic evolution.
- Protein Sci. 2006; 15: 2669-74
Eukaryotic protein trafficking pathways require specific transfer of cargo vesicles to different target organelles. A number of vesicle trafficking and membrane fusion components participate in this process, including various tethering factor complexes that interact with small GTPases prior to SNARE-mediated vesicle fusion. In Saccharomyces cerevisiae a protein complex of Mon1 and Ccz1 functions with the small GTPase Ypt7 to mediate vesicle trafficking to the vacuole. Mon1 belongs to DUF254 found in a diverse range of eukaryotic genomes, while Ccz1 includes a CHiPS domain that is also present in a known human protein trafficking disorder gene (HPS-4). The present work identifies the CHiPS domain and a sequence region from another trafficking disorder gene (HPS-1) as homologs of an N-terminal domain from DUF254. This link establishes the evolutionary conservation of a protein complex (HPS-1/HPS-4) that functions similarly to Mon1/Ccz1 in vesicle trafficking to lysosome-related organelles of diverse eukaryotic species. Furthermore, the newly identified DUF254 domain is a distant homolog of the mu-adaptin longin domain found in clathrin adapter protein (AP) complexes of known structure that function to localize cargo protein to specific organelles. In support of this fold assignment, known longin domains such as the AP complex sigma-adaptin, the synaptobrevin N-terminal domains sec22 and Ykt6, and the srx domain of the signal recognition particle receptor also regulate vesicle trafficking pathways by mediating SNARE fusion, recognizing specialized compartments, and interacting with small GTPases that resemble Ypt7.
- Evans D, Marquez SM, Pace NR
- RNase P: interface of the RNA and protein worlds.
- Trends Biochem Sci. 2006; 31: 333-41
Ribonuclease P (RNase P) is an endonuclease involved in processing tRNA. It contains both RNA and protein subunits and occurs in all three domains of life: namely, Archaea, Bacteria and Eukarya. The RNase P RNA subunits from bacteria and some archaea are catalytically active in vitro, whereas those from eukaryotes and most archaea require protein subunits for activity. RNase P has been characterized biochemically and genetically in several systems, and detailed structural information is emerging for both RNA and protein subunits from phylogenetically diverse organisms. In vitro reconstitution of activity is providing insight into the role of proteins in the RNase P holoenzyme. Together, these findings are beginning to impart an understanding of the coevolution of the RNA and protein worlds.
- Iyer LM, Balaji S, Koonin EV, Aravind L
- Evolutionary genomics of nucleo-cytoplasmic large DNA viruses.
- Virus Res. 2006; 117: 156-84
A previous comparative-genomic study of large nuclear and cytoplasmic DNA viruses (NCLDVs) of eukaryotes revealed the monophyletic origin of four viral families: poxviruses, asfarviruses, iridoviruses, and phycodnaviruses [Iyer, L.M., Aravind, L., Koonin, E.V., 2001. Common origin of four diverse families of large eukaryotic DNA viruses. J. Virol. 75 (23), 11720-11734]. Here we update this analysis by including the recently sequenced giant genome of the mimiviruses and several additional genomes of iridoviruses, phycodnaviruses, and poxviruses. The parsimonious reconstruction of the gene complement of the ancestral NCLDV shows that it was a complex virus with at least 41 genes that encoded the replication machinery, up to four RNA polymerase subunits, at least three transcription factors, capping and polyadenylation enzymes, the DNA packaging apparatus, and structural components of an icosahedral capsid and the viral membrane. The phylogeny of the NCLDVs is reconstructed by cladistic analysis of the viral gene complements, and it is shown that the two principal lineages of NCLDVs are comprised of poxviruses grouped with asfarviruses and iridoviruses grouped with phycodnaviruses-mimiviruses. The phycodna-mimivirus grouping was strongly supported by several derived shared characters, which seemed to rule out the previously suggested basal position of the mimivirus [Raoult, D., Audic, S., Robert, C., Abergel, C., Renesto, P., Ogata, H., La Scola, B., Suzan, M., Claverie, J.M. 2004. The 1.2-megabase genome sequence of Mimivirus. Science 306 (5700), 1344-1350]. These results indicate that the divergence of the major NCLDV families occurred at an early stage of evolution, prior to the divergence of the major eukaryotic lineages. 
It is shown that subsequent evolution of the NCLDV genomes involved lineage-specific expansion of paralogous gene families and acquisition of numerous genes via horizontal gene transfer from the eukaryotic hosts, other viruses, and bacteria (primarily, endosymbionts and parasites). Amongst the expansions, there are multiple families of predicted virus-specific signaling and regulatory domains. Most NCLDVs have also acquired large arrays of genes related to ubiquitin signaling, and the animal viruses in particular have independently evolved several defenses against apoptosis and immune response, including growth factors and potential inhibitors of cytokine signaling. The mimivirus displays an enormous array of genes of bacterial provenance, including a representative of a new class of predicted papain-like peptidases. It is further demonstrated that a significant number of genes found in NCLDVs also have homologs in bacteriophages, although a vertical relationship between the NCLDVs and a particular bacteriophage group could not be established. On the basis of these observations, two alternative scenarios for the origin of the NCLDVs and other groups of large DNA viruses of eukaryotes are considered. One of these scenarios posits an early assembly of an already large DNA virus precursor from which various large DNA viruses diverged through an ongoing process of displacement of the original genes by xenologous or non-orthologous genes from various sources. The second scenario posits convergent emergence, on multiple occasions, of large DNA viruses from small plasmid-like precursors through independent accretion of similar sets of genes due to strong selective pressures imposed by their life cycles and hosts.
- Snell EA, Brooke NM, Taylor WR, Casane D, Philippe H, Holland PW
- An unusual choanoflagellate protein released by Hedgehog autocatalytic processing.
- Proc Biol Sci. 2006; 273: 401-7
Hedgehog proteins are important cell-cell signalling proteins utilized during the development of multicellular animals. Members of the hedgehog gene family have not been detected outside the Metazoa, raising unanswered questions about their evolutionary origin. Here we report a highly unusual hedgehog-related gene from a choanoflagellate, a close unicellular relative of the animals. The deduced C-terminal domain, Hoglet-C, is homologous to the autocatalytic domain of Hedgehog proteins and is predicted to function in autocatalytic cleavage of the precursor peptide. In contrast, the N-terminal Hoglet-N peptide has no similarity to the signalling peptide of Hedgehog (Hh-N). Instead, Hoglet-N is deduced to be a secreted protein with an enormous threonine-rich domain of unprecedented size and purity (over 200 threonine residues) and two polysaccharide-binding domains. Structural modelling reveals that these domains have a novel combination of features found in cellulose-binding domains (CBD) of types IIa and IIb, and are expected to bind cellulose. We propose that the two CBD domains enable Hoglet-N to bind to plant matter, tethering an amorphous nucleophilic anchor, facilitating transient adhesion of the choanoflagellate cell. Since Hh-C and Hoglet-C are homologous, but Hh-N and Hoglet-N are not, we argue that metazoan hedgehog genes evolved by fusion of two distinct genes.
- Ashby MK, Houmard J
- Cyanobacterial two-component proteins: structure, diversity, distribution, and evolution.
- Microbiol Mol Biol Rev. 2006; 70: 472-509
A survey of the already characterized and potential two-component protein sequences that exist in the nine complete and seven partially annotated cyanobacterial genome sequences available (as of May 2005) showed that the cyanobacteria possess a much larger repertoire of such proteins than most other bacteria. By analysis of the domain structure of the 1,171 potential histidine kinases, response regulators, and hybrid kinases, many various arrangements of about thirty different modules could be distinguished. The number of two-component proteins is related in part to genome size but also to the variety of physiological properties and ecophysiologies of the different strains. Groups of orthologues were defined, only a few of which have representatives with known physiological functions. Based on comparisons with the proposed phylogenetic relationships between the strains, the orthology groups show that (i) a few genes, some of them clustered on the genome, have been conserved by all species, suggesting their very ancient origin and an essential role for the corresponding proteins, and (ii) duplications, fusions, gene losses, insertions, and deletions, as well as domain shuffling, occurred during evolution, leading to the extant repertoire. These mechanisms are put in perspective with the different genetic properties that cyanobacteria have to achieve genome plasticity. This review is designed to serve as a basis for orienting further research aimed at defining the most ancient regulatory mechanisms and understanding how evolution worked to select and keep the most appropriate systems for cyanobacteria to develop in the quite different environments that they have successfully colonized.
- Dreyfus DH, Nagasawa M, Gelfand EW, Ghoda LY
- Modulation of p53 activity by IkappaBalpha: evidence suggesting a common phylogeny between NF-kappaB and p53 transcription factors.
- BMC Immunol. 2005; 6: 12-12
BACKGROUND: In this work we present evidence that the p53 tumor suppressor protein and NF-kappaB transcription factors could be related through common descent from a family of ancestral transcription factors regulating cellular proliferation and apoptosis. p53 is a homotetrameric transcription factor known to interact with the ankyrin protein 53BP2 (a fragment of the ASPP2 protein). NF-kappaB is also regulated by ankyrin proteins, the prototype of which is the IkappaB family. The DNA binding sequences of the two transcription factors are similar, sharing 8 out of 10 nucleotides. Interactions between the two proteins, both direct and indirect, have been noted previously, and the two proteins play central roles in the control of proliferation and apoptosis. RESULTS: Using previously published structure data, we noted a significant degree of structural alignment between p53 and NF-kappaB p65. We also determined that IkappaBalpha and p53 bind in vitro through a specific interaction involving, in part, the DNA binding region of p53 (or a region proximal to it) and the amino terminus of IkappaBalpha, independently or cooperatively with the ankyrin 3 domain of IkappaBalpha. In cotransfection experiments, IkappaBalpha could significantly inhibit the transcriptional activity of p53. Inhibition of p53-mediated transcription was increased by deletion of the ankyrin 2, 4, or 5 domains of IkappaBalpha. Co-precipitation experiments using the stably transfected ankyrin 5 deletion mutant of IkappaBalpha and endogenous wild-type p53 further support the hypothesis that p53 and IkappaBalpha can physically interact in vivo. CONCLUSION: The aggregate results obtained using bacterially produced IkappaBalpha and p53 as well as reticulocyte lysate produced proteins suggest a correlation between in vitro co-precipitation in at least one of the systems and in vivo p53 inhibitory activity. 
These observations argue for a mechanism involving direct binding of IkappaBalpha to p53 in the inhibition of p53 transcriptional activity, analogous to the inhibition of NF-kappaB by IkappaBalpha and of p53 by 53BP2/ASPP2. These data furthermore suggest a role for ankyrin proteins in the regulation of p53 activity. Taken together, the NF-kappaB and p53 proteins share similarities in structure, DNA binding sites, and binding and regulation by ankyrin proteins, in support of our hypothesis that the two proteins share common descent from an ancestral transcription factor.
- Rossolillo P, Marinoni I, Galli E, Colosimo A, Albertini AM
- YrxA is the transcriptional regulator that represses de novo NAD biosynthesis in Bacillus subtilis.
- J Bacteriol. 2005; 187: 7155-60
The first genetic, in vivo, and in vitro evidence that YrxA is the regulator of NAD de novo biosynthesis in Bacillus subtilis is reported here. The protein is essential for transcriptional repression of the divergent operons nadBCA and nifS-yrxA in the presence of nicotinic acid and binds to their shared operator-promoter region.
- Novatchkova M, Wildpaner M, Schweizer D, Eisenhaber F
- PhyloDome--visualization of taxonomic distributions of domains occurring in eukaryote protein sequence sets.
- Nucleic Acids Res. 2005; 33: 1215-1215
The analysis of taxonomic distribution and lineage-specific variation of domains and domain combinations is an important step in the assessment of their functional roles and potential interoperability. In the study of eukaryote sequence sets with many multi-domain proteins, it can become laborious to evaluate the phylogenetic context of the many occurring domains and their mutual relationships. PhyloDome is an answer to that problem. It provides a fast overview of the taxonomic spread and potential interrelation of domains that are either given as a list of names and PFAM/SMART accessions or derived from a user-defined set of sequences. This taxonomic distribution analysis can be helpful in protein function and interaction assignment, as the comparative study of potential Hedgehog pathway members in C. elegans shows. An implementation of PhyloDome is accessible for public use as a WWW-Service at http://mendel.imp.univie.ac.at/phylodome/. Software components are available on request.
- Theobald DL, Wuttke DS
- Divergent evolution within protein superfolds inferred from profile-based phylogenetics.
- J Mol Biol. 2005; 354: 722-37
Many dissimilar protein sequences fold into similar structures. A central and persistent challenge facing protein structural analysis is the discrimination between homology and convergence for structurally similar domains that lack significant sequence similarity. Classic examples are the OB-fold and SH3 domains, both small, modular beta-barrel protein superfolds. The similarities among these domains have variously been attributed to common descent or to convergent evolution. Using a sequence profile-based phylogenetic technique, we analyzed all structurally characterized OB-fold, SH3, and PDZ domains with less than 40% mutual sequence identity. An all-against-all, profile-versus-profile analysis of these domains revealed many previously undetectable significant interrelationships. The matrices of scores were used to infer phylogenies based on our derivation of the relationships between sequence similarity E-values and evolutionary distances. The resulting clades of domains correlate remarkably well with biological function, as opposed to structural similarity, indicating that the functionally distinct sub-families within these superfolds are homologous. This method extends phylogenetics into the challenging "twilight zone" of sequence similarity, providing the first objective resolution of deep evolutionary relationships among distant protein families.
- Lai WC, Hazelbauer GL
- Carboxyl-terminal extensions beyond the conserved pentapeptide reduce rates of chemoreceptor adaptational modification.
- J Bacteriol. 2005; 187: 5115-21
Sensory adaptation in bacterial chemotaxis is mediated by covalent modification of chemoreceptors. Specific glutamyl residues are methylated and demethylated in reactions catalyzed by methyltransferase CheR and methylesterase CheB. In the well-characterized chemosensory systems of Escherichia coli and Salmonella spp., efficient modification by either enzyme is dependent on a conserved pentapeptide sequence, NWETF or NWESF, present at the extreme carboxyl terminus of high-abundance chemoreceptors. To what extent is position at the extreme carboxyl terminus important for pentapeptide-mediated enhancement of adaptational modification? Is this position equally important for enhancement of both enzyme activities? To address these questions, we created forms of high-abundance receptor Tsr or Tar carrying one, six, or eight additional amino acids extending beyond the pentapeptide at their carboxyl termini and assayed methylation, demethylation, deamidation, and ability to mediate chemotaxis. In vitro and in vivo, all three carboxyl-terminal extensions reduced pentapeptide-mediated enhancement of rates of adaptational modification. CheB-catalyzed reactions were more affected than CheR-catalyzed reactions. Effects were less severe for the complete sensory system in vivo than for the minimal system of receptor and modification enzymes in vitro. Notably, extended receptors mediated chemotaxis as efficiently as wild-type receptors, providing a striking example of robustness in chemotactic systems. This could reflect compensatory reductions of rates for both modification reactions, mitigation of effects of slower reactions by the intertwined circuitry of signaling and adaptation, or tolerance of a range of reaction rates for adaptational modification. Whatever the mechanism, the observations provide a challenging test for mathematical models of chemotaxis.
- Matsushita M, Matsui H
- Protein transduction technology.
- J Mol Med (Berl). 2005; 83: 324-8
With the elucidation of the human genome, exhaustive analysis of genomic data related to gene transcription and the structure and function of translated protein products has progressed rapidly. Delivery of proteins and their functional domains or inhibitory peptides directly into the cell is ideal to use this protein information and analyze associated physiological functions. Protein transduction technology, which controls cell function via direct delivery of a desired protein into the cell, involves fusing the protein with a special peptide sequence consisting of 10-20 amino acids, referred to as the protein transduction domain. The recent discovery that the protein transduction domain can also be inserted into various macromolecules heightens expectations in terms of development of novel advanced experimental tools and clinical reagents.
- Balaji S, Babu MM, Iyer LM, Aravind L
- Discovery of the principal specific transcription factors of Apicomplexa and their implication for the evolution of the AP2-integrase DNA binding domains.
- Nucleic Acids Res. 2005; 33: 3994-4006
The comparative genomics of apicomplexans, such as the malarial parasite Plasmodium, the cattle parasite Theileria and the emerging human parasite Cryptosporidium, has suggested an unexpected paucity of specific transcription factors (TFs) with DNA binding domains that are closely related to those found in the major families of TFs from other eukaryotes. This apparent lack of specific TFs is paradoxical, given that the apicomplexans show a complex developmental cycle in one or more hosts and a reproducible pattern of differential gene expression in the course of this cycle. Using sensitive sequence profile searches, we show that the apicomplexans possess a lineage-specific expansion of a novel family of proteins with a version of the AP2 (Apetala2)-integrase DNA binding domain, which is present in numerous plant TFs. About 20-27 members of this apicomplexan AP2 (ApiAP2) family are encoded in different apicomplexan genomes, with each protein containing one to four copies of the AP2 DNA binding domain. Using gene expression data from Plasmodium falciparum, we show that guilds of ApiAP2 genes are expressed in different stages of intraerythrocytic development. By analogy to the plant AP2 proteins and based on the expression patterns, we predict that the ApiAP2 proteins are likely to function as previously unknown specific TFs in the apicomplexans and regulate the progression of their developmental cycle. In addition to the ApiAP2 family, we also identified two other novel families of AP2 DNA binding domains in bacteria and transposons. Using structure similarity searches, we also identified divergent versions of the AP2-integrase DNA binding domain fold in the DNA binding region of the PI-SceI homing endonuclease and the C-terminal domain of the pleckstrin homology (PH) domain-like modules of eukaryotes. 
Integrating these findings, we present a reconstruction of the evolutionary scenario of the AP2-integrase DNA binding domain fold, which suggests that it underwent multiple independent combinations with different types of mobile endonucleases or recombinases. It appears that the eukaryotic versions have emerged from versions of the domain associated with mobile elements, followed by independent lineage-specific expansions, which accompanied their recruitment to transcription regulation functions.
- Leipe DD, Koonin EV, Aravind L
- STAND, a class of P-loop NTPases including animal and plant regulators of programmed cell death: multiple, complex domain architectures, unusual phyletic patterns, and evolution by horizontal gene transfer.
- J Mol Biol. 2004; 343: 1-28
Using sequence profile analysis and sequence-based structure predictions, we define a previously unrecognized, widespread class of P-loop NTPases. The signal transduction ATPases with numerous domains (STAND) class includes the AP-ATPases (animal apoptosis regulators CED4/Apaf-1, plant disease resistance proteins, and bacterial AfsR-like transcription regulators) and NACHT NTPases (e.g. NAIP, TLP1, Het-E-1) that have been studied extensively in the context of apoptosis, pathogen response in animals and plants, and transcriptional regulation in bacteria. We show that, in addition to these well-characterized protein families, the STAND class includes several other groups of (predicted) NTPase domains from diverse signaling and transcription regulatory proteins from bacteria and eukaryotes, and three Archaea-specific families. We identified the STAND domain in several biologically well-characterized proteins that have not been suspected to have NTPase activity, including soluble adenylyl cyclases, nephrocystin 3 (implicated in polycystic kidney disease), and Rolling pebble (a regulator of muscle development); these findings are expected to facilitate elucidation of the functions of these proteins. The STAND class belongs to the additional strand, catalytic E division of P-loop NTPases together with the AAA+ ATPases, RecA/helicase-related ATPases, ABC-ATPases, and VirD4/PilT-like ATPases. The STAND proteins are distinguished from other P-loop NTPases by the presence of unique sequence motifs associated with the N-terminal helix and the core strand-4, as well as a C-terminal helical bundle that is fused to the NTPase domain. This helical module contains a signature GxP motif in the loop between the two distal helices. With the exception of the archaeal families, almost all STAND NTPases are multidomain proteins containing three or more domains. 
In addition to the NTPase domain, these proteins typically contain DNA-binding or protein-binding domains, superstructure-forming repeats, such as WD40 and TPR, and enzymatic domains involved in signal transduction, including adenylate cyclases and kinases. By analogy to the AAA+ ATPases, it can be predicted that STAND NTPases use the C-terminal helical bundle as a "lever" to transmit the conformational changes brought about by NTP hydrolysis to effector domains. STAND NTPases represent a novel paradigm in signal transduction, whereby adaptor, regulatory switch, scaffolding, and, in some cases, signal-generating moieties are combined into a single polypeptide. The STAND class consists of 14 distinct families, and the evolutionary history of most of these families is riddled with dramatic instances of lineage-specific expansion and apparent horizontal gene transfer. The STAND NTPases are most abundant in developmentally and organizationally complex prokaryotes and eukaryotes. Transfer of genes for STAND NTPases from bacteria to eukaryotes on several occasions might have played a significant role in the evolution of eukaryotic signaling systems.
- Bonneau R, Baliga NS, Deutsch EW, Shannon P, Hood L
- Comprehensive de novo structure prediction in a systems-biology context for the archaea Halobacterium sp. NRC-1.
- Genome Biol. 2004; 5: 52-52
BACKGROUND: Large fractions of all fully sequenced genomes code for proteins of unknown function. Annotating these proteins of unknown function remains a critical bottleneck for systems biology and is crucial to understanding the biological relevance of genome-wide changes in mRNA and protein expression, protein-protein and protein-DNA interactions. The work reported here demonstrates that de novo structure prediction is now a viable option for providing general function information for many proteins of unknown function. RESULTS: We have used Rosetta de novo structure prediction to predict three-dimensional structures for 1,185 proteins and protein domains (<150 residues in length) found in Halobacterium NRC-1, a widely studied halophilic archaeon. Predicted structures were searched against the Protein Data Bank to identify fold similarities and extrapolate putative functions. They were analyzed in the context of a predicted association network composed of several sources of functional associations such as: predicted protein interactions, predicted operons, phylogenetic profile similarity and domain fusion. To illustrate this approach, we highlight three cases where our combined procedure has provided novel insights into our understanding of chemotaxis, possible prophage remnants in Halobacterium NRC-1 and archaeal transcriptional regulators. CONCLUSIONS: Simultaneous analysis of the association network, coordinated mRNA level changes in microarray experiments and genome-wide structure prediction has allowed us to glean significant biological insights into the roles of several Halobacterium NRC-1 proteins of previously unknown function, and significantly reduce the number of proteins encoded in the genome of this haloarchaeon for which no annotation is available.
- Liu Y, Gerstein M, Engelman DM
- Transmembrane protein domains rarely use covalent domain recombination as an evolutionary mechanism.
- Proc Natl Acad Sci U S A. 2004; 101: 3495-7
Recombination of evolutionarily unrelated domains is a mechanism often used by evolution to produce variety in soluble proteins. By using a classification of polytopic transmembrane domains into families, we examined integral membrane proteins for evidence of this mechanism. Surprisingly, we found that domain recombination is not common for the transmembrane regions of membrane proteins, with a majority of integral membrane proteins containing only a single transmembrane domain. We suggest that noncovalent oligomeric associations, which are common in membrane proteins, may provide an alternative source of evolutionary diversity.
- Anantharaman V, Aravind L
- Novel conserved domains in proteins with predicted roles in eukaryotic cell-cycle regulation, decapping and RNA stability.
- BMC Genomics. 2004; 5: 45-45
BACKGROUND: The emergence of eukaryotes was characterized by the expansion and diversification of several ancient RNA-binding domains and the apparent de novo innovation of new RNA-binding domains. The identification of these RNA-binding domains may throw light on the emergence of eukaryote-specific systems of RNA metabolism. RESULTS: Using sensitive sequence profile searches, homology-based fold recognition and sequence-structure superpositions, we identified novel, divergent versions of the Sm domain in the Scd6p family of proteins. This family of Sm-related domains shares certain features of conventional Sm domains, which are required for binding RNA, in addition to possessing some unique conserved features. We also show that these proteins contain a second previously uncharacterized C-terminal domain, termed the FDF domain (after a conserved sequence motif in this domain). The FDF domain is also found in the fungal Dcp3p-like and the animal FLJ22128-like proteins, where it is fused to a C-terminal domain of the YjeF-N domain family. In addition to the FDF domains, the FLJ22128-like proteins contain yet another divergent version of the Sm domain at their extreme N-terminus. We show that the YjeF-N domains represent a novel version of the Rossmann fold that has acquired a set of catalytic residues and structural features that distinguish them from the conventional dehydrogenases. CONCLUSIONS: Several lines of contextual information suggest that the Scd6p family and the Dcp3p-like proteins are conserved components of the eukaryotic RNA metabolism system. We propose that the novel domains reported here, namely the divergent versions of the Sm domain and the FDF domain, may mediate specific RNA-protein and protein-protein interactions in cytoplasmic ribonucleoprotein complexes. 
More specifically, the protein complexes containing Sm-like domains of the Scd6p family are predicted to regulate the stability of mRNA encoding proteins involved in cell cycle progression and vesicular assembly. The Dcp3p and FLJ22128 proteins may localize to the cytoplasmic processing bodies and possibly catalyze a specific processing step in the decapping pathway. The explosive diversification of Sm domains appears to have played a role in the emergence of several uniquely eukaryotic ribonucleoprotein complexes, including those involved in decapping and mRNA stability.
- Catlett NL, Yoder OC, Turgeon BG
- Whole-genome analysis of two-component signal transduction genes in fungal pathogens.
- Eukaryot Cell. 2003; 2: 1151-61
Two-component phosphorelay systems are minimally composed of a histidine kinase (HK) component, which autophosphorylates in response to an environmental stimulus, and a response regulator (RR) component, which transmits the signal, resulting in an output such as activation of transcription, or of a mitogen-activated protein kinase cascade. The genomes of the yeasts Saccharomyces cerevisiae, Schizosaccharomyces pombe, and Candida albicans encode one, three, and three HKs, respectively. In contrast, the genome sequences of the filamentous ascomycetes Neurospora crassa, Cochliobolus heterostrophus (Bipolaris maydis), Gibberella moniliformis (Fusarium verticillioides), and Botryotinia fuckeliana (Botrytis cinerea) encode an extensive family of two-component signaling proteins. The putative HKs fall into 11 classes. Most of these classes are represented in each filamentous ascomycete species examined. A few of these classes are significantly more prevalent in the fungal pathogens than in the saprobe N. crassa, suggesting that these groups contain paralogs required for virulence. Despite the larger numbers of HKs in filamentous ascomycetes than in yeasts, all of the ascomycetes contain virtually the same downstream histidine phosphotransfer proteins and RR proteins, suggesting extensive cross talk or redundancy among HKs.
- Koonin EV, Makarova KS, Rogozin IB, Davidovic L, Letellier MC, Pellegrini L
- The rhomboids: a nearly ubiquitous family of intramembrane serine proteases that probably evolved by multiple ancient horizontal gene transfers.
- Genome Biol. 2003; 4: 19-19
BACKGROUND: The rhomboid family of polytopic membrane proteins shows a level of evolutionary conservation unique among membrane proteins. They are present in nearly all the sequenced genomes of archaea, bacteria and eukaryotes, with the exception of several species with small genomes. On the basis of experimental studies with the developmental regulator rhomboid from Drosophila and the AarA protein from the bacterium Providencia stuartii, the rhomboids are thought to be intramembrane serine proteases whose signaling function is conserved in eukaryotes and prokaryotes. RESULTS: Phylogenetic tree analysis carried out using several independent methods for tree constructions and the corresponding statistical tests suggests that, despite its broad distribution in all three superkingdoms, the rhomboid family was not present in the last universal common ancestor of extant life forms. Instead, we propose that rhomboids evolved in bacteria and have been acquired by archaea and eukaryotes through several independent horizontal gene transfers. In eukaryotes, two distinct, ancient acquisitions apparently gave rise to the two major subfamilies, typified by rhomboid and PARL (presenilins-associated rhomboid-like protein), respectively. Subsequent evolution of the rhomboid family in eukaryotes proceeded by multiple duplications and functional diversification through the addition of extra transmembrane helices and other domains in different orientations relative to the conserved core that harbors the protease activity. CONCLUSIONS: Although the near-universal presence of the rhomboid family in bacteria, archaea and eukaryotes appears to suggest that this protein is part of the heritage of the last universal common ancestor, phylogenetic tree analysis indicates a likely bacterial origin with subsequent dissemination by horizontal gene transfer. This emphasizes the importance of explicit phylogenetic analysis for the reconstruction of ancestral life forms. 
A hypothetical scenario for the origin of intracellular membrane proteases from membrane transporters is proposed.
- Grimm C, Kraft R, Sauerbruch S, Schultz G, Harteneck C
- Molecular and functional characterization of the melastatin-related cation channel TRPM3.
- J Biol Chem. 2003; 278: 21493-501
Proteins of the mammalian TRP (transient receptor potential) family form a heterogeneous group of cation channels important for cellular Ca2+ signaling and homeostasis. Here we present the full-length sequence of TRPM3, a member of the melastatin-like subfamily (TRPM) of TRP channels. TRPM3 expression was found in human kidney and brain. HEK293 cells transiently transfected with TRPM3 showed a constitutive Ca2+ and Mn2+ entry. Whole-cell patch clamp experiments confirmed the spontaneous activity of TRPM3 and revealed permeability ratios PCa/PNa of 1.57 and PNa/PCs of 0.75. In cell-attached patches, spontaneous inward and outward currents were observed. At negative membrane potentials and in the presence of either 140 mM Cs+, 140 mM Na+, or 100 mM Ca2+ in the pipette solution, the single channel conductance levels were 133, 83, and 65 pS, respectively. The Ca2+ entry in TRPM3-expressing HEK293 cells increased during treatment with hypotonic extracellular solution. The reduction of extracellular osmolarity was accompanied by cell swelling, suggesting volume-regulated activity of TRPM3. From its function and expression in human kidney, we propose a role for TRPM3 in renal Ca2+ homeostasis.
- Apic G, Huber W, Teichmann SA
- Multi-domain protein families and domain pairs: comparison with known structures and a random model of domain recombination.
- J Struct Funct Genomics. 2003; 4: 67-78
There is a limited repertoire of domain families in nature that are duplicated and combined in different ways to form the set of proteins in a genome. Most proteins in both prokaryote and eukaryote genomes consist of two or more domains, and we show that the family size distribution of multi-domain protein families follows a power law like that of individual families. Most domain pairs occur in four to six different domain architectures: in isolation and in combinations with different partners. We showed previously that within the set of all pairwise domain combinations, most small and medium-sized families are observed in combination with one or two other families, while a few large families are very versatile and combine with many different partners. Though this may appear to be a stochastic pattern, in which large families have more combination partners by virtue of their size, we establish here that all the domain families with more than three members in genomes are duplicated more frequently than would be expected by chance considering their number of neighbouring domains. This duplication of domain pairs is statistically significant for between one and three quarters of all families with seven or more members. For the majority of pairwise domain combinations, there is no known three-dimensional structure of the two domains together, and we term these novel combinations. Novel domain combinations are interesting and important targets for structural elucidation, as the geometry and interaction between the domains will help understand the function and evolution of multi-domain proteins. Of particular interest are those combinations that occur in the largest number of multi-domain proteins, and several of these frequent novel combinations contain DNA-binding domains.
- Matsuda K, Nishioka T, Kinoshita K, Kawabata T, Go N
- Finding evolutionary relations beyond superfamilies: fold-based superfamilies.
- Protein Sci. 2003; 12: 2239-51
Superfamily classifications are based variably on similarity of sequences, global folds, local structures, or functions. We have examined the possibility of defining superfamilies purely from the viewpoint of the global fold/function relationship. For this purpose, we first classified protein domains according to the beta-sheet topology. We then introduced the concept of kinship relations among the classified beta-sheet topology by assuming that the major elementary event leading to creation of a new beta-sheet topology is either an addition or deletion of one beta-strand at the edge of an existing beta-sheet during the molecular evolution. Based on this kinship relation, a network of protein domains was constructed so that the distance between a pair of domains represents the number of evolutionary events that lead from one domain to the other. We then mapped on it all known domains with a specific core chemical function (here taken, as an example, that involving ATP or its analogs). Careful analyses revealed that the domains are found distributed on the network as >20 mutually disjoint clusters. The proteins in each cluster are defined to form a fold-based superfamily. The results indicate that >20 ATP-binding protein superfamilies have been invented independently in the process of molecular evolution, and the conservative evolutionary diffusion of global folds and functions is the origin of the relationship between them.
- Arinaminpathy Y, Biggin PC, Shrivastava IH, Sansom MS
- A prokaryotic glutamate receptor: homology modelling and molecular dynamics simulations of GluR0.
- FEBS Lett. 2003; 553: 321-7
GluR0 is a prokaryotic homologue of mammalian glutamate receptors that forms glutamate-activated, potassium-selective ion channels. The topology of its transmembrane (TM) domain is similar to that of simple potassium channels such as KcsA. Two plausible alignments of the sequence of the TM domain of GluR0 with KcsA are possible, differing in the region of the P helix. We have constructed homology models based on both alignments and evaluated them using 6 ns duration molecular dynamics simulations in a membrane-mimetic environment. One model, in which an insertion in GluR0 relative to KcsA is located in the loop between the M1 and P helices, is preferred on the basis of lower structural drift and maintenance of the P helix conformation during simulation. This model also exhibits inter-subunit salt bridges that help to stabilise the TM domain tetramer. During the simulation, concerted K(+) ion-water movement along the selectivity filter is observed, as is the case in simulations of KcsA. K(+) ion exit from the central cavity is associated with opening of the hydrophobic gate formed by the C-termini of the M2 helices. In the intact receptor the opening of this gate will be controlled by interactions with the extramembranous ligand-binding domains.
- Anantharaman V, Aravind L
- Application of comparative genomics in the identification and analysis of novel families of membrane-associated receptors in bacteria.
- BMC Genomics. 2003; 4: 34-34
BACKGROUND: A great diversity of multi-pass membrane receptors, typically with 7 transmembrane (TM) helices, is observed in the eukaryote crown group. So far, they are relatively rare in the prokaryotes, and are restricted to the well-characterized sensory rhodopsins of various phototropic prokaryotes. RESULTS: Utilizing the currently available wealth of prokaryotic genomic sequences, we set up a computational screen to identify putative 7 TM and other multi-pass membrane receptors in prokaryotes. As a result of this procedure, we were able to recover two widespread families of 7 TM receptors in bacteria that are distantly related to the eukaryotic 7 TM receptors and prokaryotic rhodopsins. Using sequence profile analysis, we were able to establish that members of the first of these receptor families contain one of two distinct N-terminal extracellular globular domains, which are predicted to bind ligands such as carbohydrates. In their intracellular portions they contain fusions to a variety of signaling domains, which suggest that they are likely to transduce signals via cyclic AMP, cyclic diguanylate, histidine phosphorylation, dephosphorylation, and through direct interactions with DNA. The second family of bacterial 7 TM receptors possesses an alpha-helical extracellular domain, and is predicted to transduce a signal via an intracellular HD hydrolase domain. Based on comparative analysis of gene neighborhoods, this receptor is predicted to function as a regulator of the diacylglycerol-kinase-dependent glycerolipid pathway. Additionally, our procedure also recovered other types of putative prokaryotic multi-pass membrane associated receptor domains. Of these, we characterized two widespread, evolutionarily mobile multi-TM domains that are fused to a variety of C-terminal intracellular signaling domains. One of these, typified by the Gram-positive LytS protein, is predicted to be a potential sensor of murein derivatives, whereas the other, typified by the Escherichia coli UhpB protein, is predicted to function as a sensor of conformational changes occurring in associated membrane proteins. CONCLUSIONS: We present evidence for considerable variety in the types of uncharacterized surface receptors in bacteria, and reconstruct the evolutionary processes that model their diversity. The identification of novel receptor families in prokaryotes is likely to aid in the experimental analysis of signal transduction and environmental responses of several bacteria, including pathogens such as Leptospira, Treponema, Corynebacterium, Coxiella, Bacillus anthracis and Cytophaga.
- Aravind L, Anantharaman V
- HutC/FarR-like bacterial transcription factors of the GntR family contain a small molecule-binding domain of the chorismate lyase fold.
- FEMS Microbiol Lett. 2003; 222: 17-23
Numerous bacterial transcription factors contain a DNA-binding helix-turn-helix domain and a signaling domain, linked together in a single polypeptide. Typically, this signaling domain is a small-molecule-binding domain that undergoes a conformational change upon recognizing a specific ligand. The HutC/FarR-like transcription factors of the GntR family are one of the largest groups of transcription factors in the proteomes of most free-living bacteria. Using sensitive sequence profile analysis we show that the HutC/FarR-like transcription factors contain a conserved ligand-binding domain, which possesses the same fold as chorismate lyase (Escherichia coli UbiC gene product). This relationship suggests that the C-terminal domain of the HutC/FarR-like transcription factors binds small molecules in a cleft similar to the substrate-binding site of the chorismate lyases. The sequence diversity within the predicted binding cleft of the HutC/FarR ligand-binding domains is consistent with the ability of these transcription factors to respond to diverse small molecules, such as histidine (HutC), fatty acids (FarR), sugars (TreR) and alkylphosphonate (PhnF). UbiC-like chorismate lyases function in the ubiquinone biosynthesis pathway, and have characteristic charged, catalytic residues. Genome comparisons reveal that chorismate lyase orthologs are found in several bacteria, chloroplasts of eukaryotic algae and euryarchaea. In contrast, the GntR transcription regulators lack the conserved catalytic residues of the chorismate lyases, and have so far been detected only in bacteria. An ancestral, generic small-molecule-binding domain appears to have given rise to the enzymatic and non-catalytic ligand-binding versions of the same fold under the influence of different selective pressures.
- Makarova KS, Aravind L, Koonin EV
- SWIM, a novel Zn-chelating domain present in bacteria, archaea and eukaryotes.
- Trends Biochem Sci. 2002; 27: 384-6
A previously undetected domain with a CxCx(n)CxH pattern of predicted zinc-chelating residues was identified in a variety of prokaryotic and eukaryotic proteins. These include bacterial ATPases of the SWI2/SNF2 family, plant MuDR transposases and transposase-derived Far1 nuclear proteins, and vertebrate MEK kinase-1. This domain was designated SWIM after SWI2/SNF2 and MuDR, and is predicted to have DNA-binding and protein-protein interaction functions in different contexts.
- Whittaker CA, Hynes RO
- Distribution and evolution of von Willebrand/integrin A domains: widely dispersed domains with roles in cell adhesion and elsewhere.
- Mol Biol Cell. 2002; 13: 3369-87
The von Willebrand A (VWA) domain is a well-studied domain involved in cell adhesion, in extracellular matrix proteins, and in integrin receptors. A number of human diseases arise from mutations in VWA domains. We have analyzed the phylogenetic distribution of this domain and the relationships among approximately 500 proteins containing this domain. Although the majority of VWA-containing proteins are extracellular, the most ancient ones, present in all eukaryotes, are all intracellular proteins involved in functions such as transcription, DNA repair, ribosomal and membrane transport, and the proteasome. A common feature seems to be involvement in multiprotein complexes. Subsequent evolution involved deployment of VWA domains by Metazoa in extracellular proteins involved in cell adhesion such as integrin beta subunits (all Metazoa). Nematodes and chordates separately expanded their complements of extracellular matrix proteins containing VWA domains, whereas plants expanded their intracellular complement. Chordates developed VWA-containing integrin alpha subunits, collagens, and other extracellular matrix proteins (e.g., matrilins, cochlin/vitrin, and von Willebrand factor). Consideration of the known properties of VWA domains in integrins and extracellular matrix proteins allows insights into their involvement in protein-protein interactions and the roles of bound divalent cations and conformational changes. These allow inferences about similar functions in novel situations such as protease regulators (e.g., complement factors and trypsin inhibitors) and intracellular proteins (e.g., helicases, chelatases, and copines).
- Makarova KS, Aravind L, Grishin NV, Rogozin IB, Koonin EV
- A DNA repair system specific for thermophilic Archaea and bacteria predicted by genomic context analysis.
- Nucleic Acids Res. 2002; 30: 482-96
During a systematic analysis of conserved gene context in prokaryotic genomes, a previously undetected, complex, partially conserved neighborhood consisting of more than 20 genes was discovered in most Archaea (with the exception of Thermoplasma acidophilum and Halobacterium NRC-1) and some bacteria, including the hyperthermophiles Thermotoga maritima and Aquifex aeolicus. The gene composition and gene order in this neighborhood vary greatly between species, but all versions have a stable, conserved core that consists of five genes. One of the core genes encodes a predicted DNA helicase, often fused to a predicted HD-superfamily hydrolase, and another encodes a RecB family exonuclease; three core genes remain uncharacterized, but one of these might encode a nuclease of a new family. Two more genes that belong to this neighborhood and are present in most of the genomes in which the neighborhood was detected encode, respectively, a predicted HD-superfamily hydrolase (possibly a nuclease) of a distinct family and a predicted, novel DNA polymerase. Another characteristic feature of this neighborhood is the expansion of a superfamily of paralogous, uncharacterized proteins, which are encoded by at least 20-30% of the genes in the neighborhood. The functional features of the proteins encoded in this neighborhood suggest that they comprise a previously undetected DNA repair system, which, to our knowledge, is the first repair system largely specific for thermophiles to be identified. This hypothetical repair system might be functionally analogous to the bacterial-eukaryotic system of translesion, mutagenic repair whose central components are DNA polymerases of the UmuC-DinB-Rad30-Rev1 superfamily, which typically are missing in thermophiles.
- Jogl G et al.
- Crystal structure of the BEACH domain reveals an unusual fold and extensive association with a novel PH domain.
- EMBO J. 2002; 21: 4785-95
The BEACH domain is highly conserved in a large family of eukaryotic proteins, and is crucial for their functions in vesicle trafficking, membrane dynamics and receptor signaling. However, it does not share any sequence homology with other proteins. Here we report the crystal structure at 2.9 A resolution of the BEACH domain of human neurobeachin. It shows that the BEACH domain has a new and unusual polypeptide backbone fold, as the peptide segments in its core do not assume regular secondary structures. Unexpectedly, the structure also reveals that the BEACH domain is in extensive association with a novel, weakly conserved pleckstrin-homology (PH) domain. Consistent with the structural analysis, biochemical studies show that the PH and BEACH domains have strong interactions, suggesting they may function as a single unit. Functional studies in intact cells demonstrate the requirement of both the PH and the BEACH domains for activity. A prominent groove at the interface between the two domains may be used to recruit their binding partners.
- Francis NR, Levit MN, Shaikh TR, Melanson LA, Stock JB, DeRosier DJ
- Subunit organization in a soluble complex of tar, CheW, and CheA by electron microscopy.
- J Biol Chem. 2002; 277: 36755-9
The Salmonella and Escherichia coli aspartate receptor, Tar, is representative of a large class of membrane receptors that generate chemotaxis responses by regulating the activity of an associated histidine protein kinase, CheA. Tar is composed of an NH(2)-terminal periplasmic ligand-binding domain linked through a transmembrane sequence to a COOH-terminal coiled-coil signaling domain in the cytoplasm. The isolated cytoplasmic domain of Tar fused to a leucine zipper sequence forms a soluble complex with CheA and the Src homology 3-like kinase activator, CheW. Activity of the CheA kinase in the soluble complex is essentially the same as in fully active complexes with the intact receptor in the membrane. The soluble complex is composed of approximately 28 receptor cytoplasmic domain chains, 6 CheW chains, and 4 CheA chains. It has a molecular weight of 1,400,000 (Liu, I., Levit, M., Lurz, R., Surette, M.G., and Stock, J.B. (1997) EMBO J. 16, 7231-7240). Electron microscopy reveals an elongated barrel-like structure with a largely hollow center. Immunoelectron microscopy has provided a general picture of the subunit and domain organization of the complex. CheA and CheW appear to be in the middle of the complex with the leucine zippers of the receptor construct at the ends. These findings show that the receptor signaling complex forms higher ordered structures with defined geometric architectures. Coupled with atomic models of the subunits, our results provide insights into the functional architecture by which the receptor regulates CheA kinase activity during bacterial chemotaxis.
- Aravind L, Anantharaman V, Koonin EV
- Monophyly of class I aminoacyl tRNA synthetase, USPA, ETFP, photolyase, and PP-ATPase nucleotide-binding domains: implications for protein evolution in the RNA world.
- Proteins. 2002; 48: 1-14
Protein sequence and structure comparisons show that the catalytic domains of Class I aminoacyl-tRNA synthetases, a related family of nucleotidyltransferases involved primarily in coenzyme biosynthesis, nucleotide-binding domains related to the UspA protein (USPA domains), photolyases, electron transport flavoproteins, and PP-loop-containing ATPases together comprise a distinct class of alpha/beta domains designated the HUP domain after HIGH-signature proteins, UspA, and PP-ATPase. Several lines of evidence are presented to support the monophyly of the HUP domains, to the exclusion of other three-layered alpha/beta folds with the generic "Rossmann-like" topology. Cladistic analysis, with patterns of structural and sequence similarity used as discrete characters, identified three major evolutionary lineages within the HUP domain class: the PP-ATPases; the HIGH superfamily, which includes class I aaRS and related nucleotidyltransferases containing the HIGH signature in their nucleotide-binding loop; and a previously unrecognized USPA-like group, which includes USPA domains, electron transport flavoproteins, and photolyases. Examination of the patterns of phyletic distribution of distinct families within these three major lineages suggests that the Last Universal Common Ancestor of all modern life forms encoded 15-18 distinct alpha/beta ATPases and nucleotide-binding proteins of the HUP class. This points to an extensive radiation of HUP domains before the last universal common ancestor (LUCA), during which the multiple class I aminoacyl-tRNA synthetases emerged only at a late stage. Thus, substantial evolutionary diversification of protein domains occurred well before the modern version of the protein-dependent translation machinery was established, i.e., still in the RNA world.
- Vannini A et al.
- The crystal structure of the quorum sensing protein TraR bound to its autoinducer and target DNA.
- EMBO J. 2002; 21: 4393-401
The quorum sensing system allows bacteria to sense their cell density and initiate an altered pattern of gene expression after a sufficient quorum of cells has accumulated. In Agrobacterium tumefaciens, quorum sensing controls conjugal transfer of the tumour-inducing plasmid, responsible for plant crown gall disease. The core components of this system are the transcriptional regulator TraR and its inducing ligand N-(3-oxo-octanoyl)-L-homoserine lactone. This complex binds DNA and activates gene expression. We have determined the crystal structure of TraR in complex with its autoinducer and target DNA (PDB code 1h0m). The protein is dimeric, with each monomer composed of an N-terminal domain, which binds the ligand in an enclosed cavity far from the dimerization region, and a C-terminal domain, which binds DNA via a helix-turn-helix motif. The structure reveals an asymmetric homodimer, with one monomer longer than the other. The N-terminal domain resembles GAF/PAS domains, normally fused to catalytic signalling domains. In TraR, the gene fusion is between a GAF/PAS domain and a DNA-binding domain, resulting in a specific transcriptional regulator involved in quorum sensing.
- Shiomi D, Zhulin IB, Homma M, Kawagishi I
- Dual recognition of the bacterial chemoreceptor by chemotaxis-specific domains of the CheR methyltransferase.
- J Biol Chem. 2002; 277: 42325-33
Adaptation to persisting stimulation is required for highly sensitive detection of temporal changes of stimuli, and often involves covalent modification of receptors. Therefore, it is of vital importance to understand how a receptor and its cognate modifying enzyme(s) modulate each other through specific protein-protein interactions. In the chemotaxis of Escherichia coli, adaptation requires methylation of chemoreceptors (e.g. Tar) catalyzed by the CheR methyltransferase. CheR binds to the C-terminal NWETF sequence of a chemoreceptor that is distinct from the methylation sites. However, little is known about how CheR recognizes its methylation sites or how it is distributed in a cell. In this study, we used comparative genomics to demonstrate that the CheR chemotaxis methyltransferase contains three structurally and functionally distinct modules: (i) the catalytic domain common to a methyltransferase superfamily; (ii) the N-terminal domain; and (iii) the beta-subdomain of the catalytic domain, both of which are found exclusively in chemotaxis methyltransferases. The only evolutionarily conserved motif specific to CheR is the positively charged face of helix alpha2 in the N-terminal domain. Disulfide cross-linking analysis suggested that this face interacts with the methylation helix of Tar. We also demonstrated that CheR localizes to receptor clusters at cell poles via interaction of the beta-subdomain with the NWETF sequence. Thus, the two chemotaxis-specific modules of CheR interact with distinct regions of the chemoreceptor for targeting to the receptor cluster and for recognition of the substrate sites, respectively.
- Aravind L, Koonin EV
- Prokaryotic homologs of the eukaryotic DNA-end-binding protein Ku, novel domains in the Ku protein and prediction of a prokaryotic double-strand break repair system.
- Genome Res. 2001; 11: 1365-74
Homologs of the eukaryotic DNA-end-binding protein Ku were identified in several bacterial and one archeal genome using iterative database searches with sequence profiles. Identification of prokaryotic Ku homologs allowed the dissection of the Ku protein sequences into three distinct domains, the Ku core that is conserved in eukaryotes and prokaryotes, a derived von Willebrand A domain that is fused to the amino terminus of the core in eukaryotic Ku proteins, and the newly recognized helix-extension-helix (HEH) domain that is fused to the carboxyl terminus of the core in eukaryotes and in one of the Ku homologs from the Actinomycete Streptomyces coelicolor. The version of the HEH domain present in eukaryotic Ku proteins represents the previously described DNA-binding domain called SAP. The Ku homolog from S. coelicolor contains a distinct version of the HEH domain that belongs to a previously unnoticed family of nucleic-acid-binding domains, which also includes HEH domains from the bacterial transcription termination factor Rho, bacterial and eukaryotic lysyl-tRNA synthetases, bacteriophage T4 endonuclease VII, and several uncharacterized proteins. The distribution of the Ku homologs in bacteria coincides with that of the archeal-eukaryotic-type DNA primase and genes for prokaryotic Ku homologs form predicted operons with genes coding for an ATP-dependent DNA ligase and/or archeal-eukaryotic-type DNA primase. Some of these operons additionally encode an uncharacterized protein that may function as nuclease or an Slx1p-like predicted nuclease containing a URI domain. A hypothesis is proposed that the Ku homolog, together with the associated gene products, comprise a previously unrecognized prokaryotic system for repair of double-strand breaks in DNA.
- Ren D, Navarro B, Xu H, Yue L, Shi Q, Clapham DE
- A prokaryotic voltage-gated sodium channel.
- Science. 2001; 294: 2372-5
The pore-forming subunits of canonical voltage-gated sodium and calcium channels are encoded by four repeated domains of six-transmembrane (6TM) segments. We expressed and characterized a bacterial ion channel (NaChBac) from Bacillus halodurans that is encoded by one 6TM segment. The sequence, especially in the pore region, is similar to that of voltage-gated calcium channels. The expressed channel was activated by voltage and was blocked by calcium channel blockers. However, the channel was selective for sodium. The identification of NaChBac as a functionally expressed bacterial voltage-sensitive ion-selective channel provides insight into both voltage-dependent activation and divalent cation selectivity.
- Catterall WA
- Physiology. A one-domain voltage-gated sodium channel in bacteria.
- Science. 2001; 294: 2306-8
- Chung YJ, Krueger C, Metzgar D, Saier MH Jr
- Size comparisons among integral membrane transport protein homologues in bacteria, Archaea, and Eucarya.
- J Bacteriol. 2001; 183: 1012-21
Integral membrane proteins from over 20 ubiquitous families of channels, secondary carriers, and primary active transporters were analyzed for average size differences between homologues from the three domains of life: Bacteria, Archaea, and Eucarya. The results showed that while eucaryotic homologues are consistently larger than their bacterial counterparts, archaeal homologues are significantly smaller. These size differences proved to be due primarily to variations in the sizes of hydrophilic domains localized to the N termini, the C termini, or specific loops between transmembrane alpha-helical spanners, depending on the family. Within the Eucarya domain, plant homologues proved to be substantially smaller than their animal and fungal counterparts. By contrast, extracytoplasmic receptors of ABC-type uptake systems in Archaea proved to be larger on average than those of their bacterial homologues, while cytoplasmic enzymes from different organisms exhibited little or no significant size differences. These observations presumably reflect evolutionary pressure and molecular mechanisms that must have been operative since these groups of organisms diverged from each other.
- Charest A, Lane K, McMahon K, Housman DE
- Association of a novel PDZ domain-containing peripheral Golgi protein with the Q-SNARE (Q-soluble N-ethylmaleimide-sensitive fusion protein (NSF) attachment protein receptor) protein syntaxin 6.
- J Biol Chem. 2001; 276: 29456-65
PDZ domains are involved in the scaffolding and assembly of multi-protein complexes at various subcellular sites. We describe here the isolation and characterization of a novel PDZ domain-containing protein that localizes to the Golgi apparatus. Using an in silico cloning approach, we have identified and isolated a cDNA encoding a ubiquitously expressed 59-kDa protein that we call FIG. It is composed of two coiled coil regions, a leucine zipper, and a single PDZ domain. Cytological studies using indirect immunofluorescence microscopy revealed that FIG is a peripheral protein that uses one of its coiled coil domains to localize to the Golgi apparatus. To ascertain the modalities of this Golgi localization, the same coiled coil region was tested for its ability to interact with a panel of coiled coil domain-containing integral membrane Golgi proteins. Using a series of GST fusion protein binding assays, co-immunofluorescence and co-immunoprecipitation experiments, we show that FIG specifically binds to the coiled coil domain-containing Q-SNARE (Q-soluble NSF attachment protein receptor) protein syntaxin 6 both in vitro and in vivo. The structural features of FIG and its interaction with a SNARE protein suggest that FIG may play a role in membrane vesicle trafficking. This is the first example of a PDZ domain-containing peripheral protein that localizes to the Golgi through a coiled coil-mediated interaction with a resident membrane protein. Our results broaden the scope of PDZ domain-mediated functions.
- Schwarz G, Schrader N, Mendel RR, Hecht HJ, Schindelin H
- Crystal structures of human gephyrin and plant Cnx1 G domains: comparative analysis and functional implications.
- J Mol Biol. 2001; 312: 405-18
The molybdenum cofactor (Moco) consists of a unique and conserved pterin derivative, usually referred to as molybdopterin (MPT), which coordinates the essential transition metal molybdenum (Mo). Moco is required for the enzymatic activities of all Mo-enzymes, with the exception of nitrogenase, and is synthesized by an evolutionarily old multi-step pathway that is dependent on the activities of at least six gene products. In eukaryotes, the final step of Moco biosynthesis, i.e. transfer and insertion of Mo into MPT, is catalyzed by the two-domain proteins Cnx1 in plants and gephyrin in mammals. Gephyrin is ubiquitously expressed, and was initially found in the central nervous system, where it is essential for clustering of inhibitory neuroreceptors in the postsynaptic membrane. Gephyrin and Cnx1 contain at least two functional domains (E and G) that are homologous to the Escherichia coli proteins MoeA and MogA, the atomic structures of which have been solved recently. Here, we present the crystal structures of the N-terminal human gephyrin G domain (Geph-G) and the C-terminal Arabidopsis thaliana Cnx1 G domain (Cnx1-G) at 1.7 and 2.6 A resolution, respectively. These structures are highly similar and compared to MogA reveal four major differences in their three-dimensional structures: (1) In Geph-G and Cnx1-G an additional alpha-helix is present between the first beta-strand and alpha-helix of MogA. (2) The loop between alpha 2 and beta 2 undergoes conformational changes in all three structures. (3) A beta-hairpin loop found in MogA is absent from Geph-G and Cnx1-G. (4) The C terminus of Geph-G follows a different path from that in MogA. Based on the structures of the eukaryotic proteins and their comparisons with E. coli MogA, the predicted binding site for MPT has been further refined. In addition, the characterized alternative splice variants of gephyrin are analyzed in the context of the three-dimensional structure of Geph-G.
- Iyer LM, Koonin EV, Aravind L
- Adaptations of the helix-grip fold for ligand binding and catalysis in the START domain superfamily.
- Proteins. 2001; 43: 134-44
With a protein structure comparison, an iterative database search with sequence profiles, and a multiple-alignment analysis, we show that two domains with the helix-grip fold, the star-related lipid-transfer (START) domain of the MLN64 protein and the birch allergen, are homologous. They define a large, previously underappreciated superfamily that we call the START superfamily. In addition to the classical START domains that are primarily involved in eukaryotic signaling mediated by lipid binding and the birch antigen family that consists of plant proteins implicated in stress/pathogen response, the START superfamily includes bacterial polyketide cyclases/aromatases (e.g., TcmN and WhiE VI) and two families of previously uncharacterized proteins. The identification of this domain provides a structural prediction of an important class of enzymes involved in polyketide antibiotic synthesis and allows the prediction of their active site. It is predicted that all START domains contain a similar ligand-binding pocket. Modifications of this pocket determine the ligand-binding specificity and may also be the basis for at least two distinct enzymatic activities, those of a cyclase/aromatase and an RNase. Thus, the START domain superfamily is a rare case of the adaptation of a protein fold with a conserved ligand-binding mode for both a broad variety of catalytic activities and noncatalytic regulatory functions.
- Alexandre G, Zhulin IB
- More than one way to sense chemicals.
- J Bacteriol. 2001; 183: 4681-6
- Park J, Lappe M, Teichmann SA
- Mapping protein family interactions: intramolecular and intermolecular protein family interaction repertoires in the PDB and yeast.
- J Mol Biol. 2001; 307: 929-38
In the postgenomic era, one of the most interesting and important challenges is to understand protein interactions on a large scale. The physical interactions between protein domains are fundamental to the workings of a cell: in multi-domain polypeptide chains, in multi-subunit proteins and in transient complexes between proteins that also exist independently. To study the large-scale patterns and evolution of interactions between protein domains, we view interactions between protein domains in terms of the interactions between structural families of evolutionarily related domains. This allows us to classify 8151 interactions between individual domains in the Protein Data Bank and the yeast Saccharomyces cerevisiae in terms of 664 types of interactions, between protein families. At least 51 interactions do not occur in the Protein Data Bank and can only be derived from the yeast data. The map of interactions between protein families has the form of a scale-free network, meaning that most protein families only interact with one or two other families, while a few families are extremely versatile in their interactions and are connected to many families. We observe that almost half of all known families engage in interactions with domains from their own family. We also see that the repertoires of interactions of domains within and between polypeptide chains overlap mostly for two specific types of protein families: enzymes and same-family interactions. This suggests that different types of protein interaction repertoires exist for structural, functional and regulatory reasons.
- Copley RR, Bork P
- Homology among (betaalpha)(8) barrels: implications for the evolution of metabolic pathways.
- J Mol Biol. 2000; 303: 627-41
We provide statistically reliable sequence evidence indicating that at least 12 of 23 SCOP (betaalpha)(8) (TIM) barrel superfamilies share a common origin. This includes all but one of the known and predicted TIM barrels found in central metabolism. The statistical evidence is complemented by an examination of the details of protein structure, with certain structural locations favouring catalytic residues even though the nature of their molecular function may change. The combined analysis of sequence, structure and function also enables us to propose a phylogeny of TIM barrels. Based on these data, we are able to examine differing theories of pathway and enzyme evolution, by mapping known TIM barrel folds to the pathways of central metabolism. The results favour widespread recruitment of enzymes between pathways, rather than a "backwards evolution" model, and support the idea that modern proteins may have arisen from common ancestors that bound key metabolites.
- Koonin EV, Wolf YI, Aravind L
- Protein fold recognition using sequence profiles and its application in structural genomics.
- Adv Protein Chem. 2000; 54: 245-75
- Koonin EV, Aravind L
- Dynein light chains of the Roadblock/LC7 group belong to an ancient protein superfamily implicated in NTPase regulation.
- Curr Biol. 2000; 10: R774-6
- Koretke KK, Lupas AN, Warren PV, Rosenberg M, Brown JR
- Evolution of two-component signal transduction.
- Mol Biol Evol. 2000; 17: 1956-70
Two-component signal transduction (TCST) systems are the principal means for coordinating responses to environmental changes in bacteria as well as some plants, fungi, protozoa, and archaea. These systems typically consist of a receptor histidine kinase, which reacts to an extracellular signal by phosphorylating a cytoplasmic response regulator, causing a change in cellular behavior. Although several model systems, including sporulation and chemotaxis, have been extensively studied, the evolutionary relationships between specific TCST systems are not well understood, and the ancestry of the signal transduction components is unclear. Phylogenetic trees of TCST components from 14 complete and 6 partial genomes, containing 183 histidine kinases and 220 response regulators, were constructed using distance methods. The trees showed extensive congruence in the positions of 11 recognizable phylogenetic clusters. Eukaryotic sequences were found almost exclusively in one cluster, which also showed the greatest extent of domain variability in its component proteins, and archaeal sequences mainly formed species-specific clusters. Three clusters in different parts of the kinase tree contained proteins with serine-phosphorylating activity. All kinases were found to be monophyletic with respect to other members of their superfamily, such as type II topoisomerases and Hsp90. Structural analysis further revealed significant similarity to the ATP-binding domain of eukaryotic protein kinases. TCST systems are of bacterial origin and radiated into archaea and eukaryotes by lateral gene transfer. Their components show extensive coevolution, suggesting that recombination has not been a major factor in their differentiation. Although histidine kinase activity is prevalent, serine kinases have evolved multiple times independently within this family, accompanied by a loss of the cognate response regulator(s). 
The structural and functional similarity between TCST kinases and eukaryotic protein kinases raises the possibility of a distant evolutionary relationship.
- McCue LA, McDonough KA, Lawrence CE
- Functional classification of cNMP-binding proteins and nucleotide cyclases with implications for novel regulatory pathways in Mycobacterium tuberculosis.
- Genome Res. 2000; 10: 204-19
We have analyzed the cyclic nucleotide (cNMP)-binding protein and nucleotide cyclase superfamilies using Bayesian computational methods of protein family identification and classification. In addition to the known cNMP-binding proteins (cNMP-dependent kinases, cNMP-gated channels, cAMP-guanine nucleotide exchange factors, and bacterial cAMP-dependent transcription factors), new functional groups of cNMP-binding proteins were identified, including putative ABC-transporter subunits, translocases, and esterases. Classification of the nucleotide cyclases revealed subtle differences in sequence conservation of the active site that distinguish the five classes of cyclases: the multicellular eukaryotic adenylyl cyclases, the eukaryotic receptor-type guanylyl cyclases, the eukaryotic soluble guanylyl cyclases, the unicellular eukaryotic and prokaryotic adenylyl cyclases, and the putative prokaryotic guanylyl cyclases. Phylogenetic distribution of the cNMP-binding proteins and cyclases was analyzed, with particular attention to the 22 complete archaeal and eubacterial genome sequences. Mycobacterium tuberculosis H37Rv and Synechocystis PCC6803 were each found to encode several more putative cNMP-binding proteins than other prokaryotes; many of these proteins are of unknown function. M. tuberculosis also encodes several more putative nucleotide cyclases than other prokaryotic species.
- Stanford DR, Martin NC, Hopper AK
- ADEPTs: information necessary for subcellular distribution of eukaryotic sorting isozymes resides in domains missing from eubacterial and archaeal counterparts.
- Nucleic Acids Res. 2000; 28: 383-92
Sorting isozymes are encoded by single genes, but the encoded proteins are distributed to multiple subcellular compartments. We surveyed the predicted protein sequences of several nucleic acid interacting sorting isozymes from the eukaryotic taxonomic domain and compared them with their homologs in the archaeal and eubacterial domains. Here, we summarize the data showing that the eukaryotic sorting isozymes often possess sequences not present in the archaeal and eubacterial counterparts and that the additional sequences can act to target the eukaryotic proteins to their appropriate subcellular locations. Therefore, we have named these protein domains ADEPTs (Additional Domains for Eukaryotic Protein Targeting). Identification of additional domains by phylogenetic comparisons should be generally useful for locating candidate sequences important for subcellular distribution of eukaryotic proteins.
- Aravind L
- Guilt by association: contextual information in genome analysis.
- Genome Res. 2000; 10: 1074-7
- Wright PE, Dyson HJ
- Intrinsically unstructured proteins: re-assessing the protein structure-function paradigm.
- J Mol Biol. 1999; 293: 321-31
A major challenge in the post-genome era will be determination of the functions of the encoded protein sequences. Since it is generally assumed that the function of a protein is closely linked to its three-dimensional structure, prediction or experimental determination of the library of protein structures is a matter of high priority. However, a large proportion of gene sequences appear to code not for folded, globular proteins, but for long stretches of amino acids that are likely to be either unfolded in solution or adopt non-globular structures of unknown conformation. Characterization of the conformational propensities and function of the non-globular protein sequences represents a major challenge. The high proportion of these sequences in the genomes of all organisms studied to date argues for important, as yet unknown functions, since there could be no other reason for their persistence throughout evolution. Clearly the assumption that a folded three-dimensional structure is necessary for function needs to be re-examined. Although the functions of many proteins are directly related to their three-dimensional structures, numerous proteins that lack intrinsic globular structure under physiological conditions have now been recognized. Such proteins are frequently involved in some of the most important regulatory functions in the cell, and the lack of intrinsic structure in many cases is relieved when the protein binds to its target molecule. The intrinsic lack of structure can confer functional advantages on a protein, including the ability to bind to several different targets. It also allows precise control over the thermodynamics of the binding process and provides a simple mechanism for inducibility by phosphorylation or through interaction with other components of the cellular machinery. 
Numerous examples of domains that are unstructured in solution but which become structured upon binding to the target have been noted in the areas of cell cycle control and both transcriptional and translational regulation, and unstructured domains are present in proteins that are targeted for rapid destruction. Since such proteins participate in critical cellular control mechanisms, it appears likely that their rapid turnover, aided by their unstructured nature in the unbound state, provides a level of control that allows rapid and accurate responses of the cell to changing environmental conditions.
- Ponting CP, Aravind L, Schultz J, Bork P, Koonin EV
- Eukaryotic signalling domain homologues in archaea and bacteria. Ancient ancestry and horizontal gene transfer.
- J Mol Biol. 1999; 289: 729-45
Phyletic distributions of eukaryotic signalling domains were studied using recently developed sensitive methods for protein sequence analysis, with an emphasis on the detection and accurate enumeration of homologues in bacteria and archaea. A major difference was found between the distributions of enzyme families that are typically found in all three divisions of cellular life and non-enzymatic domain families that are usually eukaryote-specific. Previously undetected bacterial homologues were identified for plant pathogenesis-related proteins, Pad1, von Willebrand factor type A, src homology 3 and YWTD repeat-containing domains. Comparisons of the domain distributions in eukaryotes and prokaryotes enabled distinctions to be made between the domains originating prior to the last common ancestor of all known life forms and those apparently originating as consequences of horizontal gene transfer events. A number of transfers of signalling domains from eukaryotes to bacteria were confidently identified, in contrast to only a single case of apparent transfer from eukaryotes to archaea.
- Makarova KS et al.
- Comparative genomics of the Archaea (Euryarchaeota): evolution of conserved protein families, the stable core, and the variable shell.
- Genome Res. 1999; 9: 608-28
Comparative analysis of the protein sequences encoded in the four euryarchaeal species whose genomes have been sequenced completely (Methanococcus jannaschii, Methanobacterium thermoautotrophicum, Archaeoglobus fulgidus, and Pyrococcus horikoshii) revealed 1326 orthologous sets, of which 543 are represented in all four species. The proteins that belong to these conserved euryarchaeal families comprise 31%-35% of the gene complement and may be considered the evolutionarily stable core of the archaeal genomes. The core gene set includes the great majority of genes coding for proteins involved in genome replication and expression, but only a relatively small subset of metabolic functions. For many gene families that are conserved in all euryarchaea, previously undetected orthologs in bacteria and eukaryotes were identified. A number of euryarchaeal synapomorphies (unique shared characters) were identified; these are protein families that possess sequence signatures or domain architectures that are conserved in all euryarchaea but are not found in bacteria or eukaryotes. In addition, euryarchaea-specific expansions of several protein and domain families were detected. In terms of their apparent phylogenetic affinities, the archaeal protein families split into bacterial and eukaryotic families. The majority of the proteins that have only eukaryotic orthologs or show the greatest similarity to their eukaryotic counterparts belong to the core set. The families of euryarchaeal genes that are conserved in only two or three species constitute a relatively mobile component of the genomes whose evolution should have involved multiple events of lineage-specific gene loss and horizontal gene transfer. Frequently these proteins have detectable orthologs only in bacteria or show the greatest similarity to the bacterial homologs, which might suggest a significant role of horizontal gene transfer from bacteria in the evolution of the euryarchaeota.
- Wolf YI, Brenner SE, Bash PA, Koonin EV
- Distribution of protein folds in the three superkingdoms of life.
- Genome Res. 1999; 9: 17-26
A sensitive protein-fold recognition procedure was developed on the basis of iterative database search using the PSI-BLAST program. A collection of 1193 position-dependent weight matrices that can be used as fold identifiers (FIDs) was produced. In the completely sequenced genomes, folds could be automatically identified for 20%-30% of the proteins, with 3%-6% more detectable by additional analysis of conserved motifs. The distribution of the most common folds is very similar in bacteria and archaea but distinct in eukaryotes. Within the bacteria, this distribution differs between parasitic and free-living species. In all analyzed genomes, the P-loop NTPases are the most abundant fold. In bacteria and archaea, the next most common folds are ferredoxin-like domains, TIM-barrels, and methyltransferases, whereas in eukaryotes, the second to fourth places belong to protein kinases, beta-propellers and TIM-barrels. The observed diversity of protein folds in different proteomes is approximately twice as high as would be expected from a simple stochastic model describing a proteome as a finite sample from an infinite pool of proteins with an exponential distribution of the fold fractions. Distribution of the number of domains with different folds in one protein fits the geometric model, which is compatible with the evolution of multidomain proteins by random combination of domains. [Fold predictions for proteins from 14 proteomes are available on the World Wide Web at. The FIDs are available by anonymous ftp at the same location.]
- Aravind L, Subramanian G
- Origin of multicellular eukaryotes - insights from proteome comparisons.
- Curr Opin Genet Dev. 1999; 9: 688-94
The complete genomes of the yeast Saccharomyces cerevisiae and the nematode worm Caenorhabditis elegans have recently become available allowing the comparison of the complete protein sets of a unicellular and multicellular eukaryote for the first time. These comparisons reveal some striking trends in terms of expansions or extensive shuffling of specific domains that are involved in regulatory functions and signaling. Similar comparisons with the available sequence data from the plant Arabidopsis thaliana produce consistent results. These observations have provided useful insights regarding the origin of multicellular organisms.
- Kay LE
- Protein dynamics from NMR.
- Biochem Cell Biol. 1998; 76: 145-52
The past several years have seen the development of a significant number of new multidimensional NMR methods for the study of molecular dynamics spanning a wide range of time scales. Applications involving a large number of different biological systems have emerged and correlations with function have been established. Unique insights are obtained that are not available from structure alone, indicating the importance of dynamics studies for understanding function.
- Mowbray SL, Sandgren MO
- Chemotaxis receptors: a progress report on structure and function.
- J Struct Biol. 1998; 124: 257-75
Recent biochemical and structural studies have provided many new insights into the structure and function of bacterial chemoreceptors. Aspects of their ligand binding, conformational changes, and interactions with other members of the signaling pathway are being defined at the structural level. It is anticipated that the combined effort will soon provide a detailed, unified view of an entire response system.
- Gupta RS
- Protein phylogenies and signature sequences: A reappraisal of evolutionary relationships among archaebacteria, eubacteria, and eukaryotes.
- Microbiol Mol Biol Rev. 1998; 62: 1435-91
The presence of shared conserved insertions or deletions (indels) in protein sequences is a special type of signature sequence that shows considerable promise for phylogenetic inference. An alternative model of microbial evolution based on the use of indels of conserved proteins and the morphological features of prokaryotic organisms is proposed. In this model, extant archaebacteria and gram-positive bacteria, which have a simple, single-layered cell wall structure, are termed monoderm prokaryotes. They are believed to be descended from the most primitive organisms. Evidence from indels supports the view that the archaebacteria probably evolved from gram-positive bacteria, and I suggest that this evolution occurred in response to antibiotic selection pressures. Evidence is presented that diderm prokaryotes (i.e., gram-negative bacteria), which have a bilayered cell wall, are derived from monoderm prokaryotes. Signature sequences in different proteins provide a means to define a number of different taxa within prokaryotes (namely, low G+C and high G+C gram-positive, Deinococcus-Thermus, cyanobacteria, chlamydia-cytophaga related, and two different groups of Proteobacteria) and to indicate how they evolved from a common ancestor. Based on phylogenetic information from indels in different protein sequences, it is hypothesized that all eukaryotes, including amitochondriate and aplastidic organisms, received major gene contributions from both an archaebacterium and a gram-negative eubacterium. In this model, the ancestral eukaryotic cell is a chimera that resulted from a unique fusion event between the two separate groups of prokaryotes followed by integration of their genomes.
- Liepinsh E, Kitamura M, Murakami T, Nakaya T, Otting G
- Common ancestor of serine proteases and flavin-binding domains.
- Nat Struct Biol. 1998; 5: 102-3
- Mian IS, Moser MJ, Holley WR, Chatterjee A
- Statistical modelling and phylogenetic analysis of a deaminase domain.
- J Comput Biol. 1998; 5: 57-72
Deamination reactions are catalyzed by a variety of enzymes including those involved in nucleoside/nucleotide metabolism and cytosine to uracil (C-->U) and adenosine to inosine (A-->I) mRNA editing. The active site of the deaminase (DM) domain in these enzymes contains a conserved histidine (or rarely cysteine), two cysteines and a glutamate proposed to act as a proton shuttle during deamination. Here, a statistical model, a hidden Markov model (HMM), of the DM domain has been created which identifies currently known DM domains and suggests new DM domains in viral, bacterial and eucaryotic proteins. However, no DM domains were identified in the currently predicted proteins from the archaeon Methanococcus jannaschii and possible causes for, and a potential means to ameliorate this situation are discussed. In some of the newly identified DM domains, the glutamate is changed to a residue that could not function as a proton shuttle and in one instance (Mus musculus spermatid protein TENR) the cysteines are also changed to lysine and serine. These may be non-competent DM domains able to bind but not act upon their substrate. Phylogenetic analysis using an HMM-generated alignment of DM domains reveals three branches with clear substructure in each branch. The results suggest DM domains that are candidates for yeast, platyhelminth, plant and mammalian C-->U and A-->I mRNA editing enzymes. Some bacterial and eucaryotic DM domains form distinct branches in the phylogenetic tree suggesting the existence of common, novel substrates.
- Zhulin IB, Taylor BL
- Correlation of PAS domains with electron transport-associated proteins in completely sequenced microbial genomes.
- Mol Microbiol. 1998; 29: 1522-3
- Reizer J, Saier MH Jr
- Modular multidomain phosphoryl transfer proteins of bacteria.
- Curr Opin Struct Biol. 1997; 7: 407-15
Recent phylogenetic and structural analyses of multidomain phosphoryl transfer proteins of bacteria have revealed that interdomain (but not intradomain) splicing and fusion, as well as domain duplication and deletion, have occurred frequently during evolution. These events have been found to be exceedingly rare in certain other protein families. Domain-shuffling events are illustrated by examples from the superfamilies of phosphoenolpyruvate-dependent sugar phosphotransferase systems, their transcriptional regulatory protein targets of phosphorylation, sensor autokinase/response regulator signal transduction systems, and permeases of the ATP-binding-cassette type.
- Wilkinson AJ
- Accommodating structurally diverse peptides in proteins.
- Chem Biol. 1996; 3: 519-24
Many peptide-binding proteins must bind numerous ligands that differ in size, sequence and sometimes orientation. A variety of strategies for coping with structurally diverse peptide ligands have been revealed by biochemical and structural studies of proteins with roles in immunity, transport and signal transduction.
- Stock J
- Receptor signaling: dimerization and beyond.
- Curr Biol. 1996; 6: 825-7
It has been proposed that hormone receptors which have only a single transmembrane sequence mediate signaling via hormone-induced monomer-to-dimer transitions. But recent studies of analogous receptors in bacteria indicate that dimerization may be only a prerequisite for signaling.
- Ramasarma T
- Transmembrane domains participate in functions of integral membrane proteins.
- Indian J Biochem Biophys. 1996; 33: 20-9
Integral membrane proteins have one or more transmembrane alpha-helical domains and carry out a variety of functions such as enzyme catalysis, transport across membranes, transducing signals as receptors of hormones and growth factors, and energy transfer in ATP synthesis. These transmembrane domains are not mere structural units anchoring the protein to the lipid bilayer but seem to contribute to the overall activity. Recent findings in support of this are described using some typical examples: LDL receptor, growth factor receptor tyrosine kinase, HMG-CoA reductase, F0-ATPase and adrenergic receptors. The trends in research indicate that these transmembrane domains participate in a variety of ways such as a linker, a transducer or an exchanger in the overall functions of these proteins in transfer of materials, energy and signals.
- Harrison SC
- Peptide-surface association: the case of PDZ and PTB domains.
- Cell. 1996; 86: 341-3
- Mushegian AR, Koonin EV
- Sequence analysis of eukaryotic developmental proteins: ancient and novel domains.
- Genetics. 1996; 144: 817-28
Most of the genes involved in the development of multicellular eukaryotes encode large, multidomain proteins. To decipher the major trends in the evolution of these proteins and make functional predictions for uncharacterized domains, we applied a strategy of sequence database search that includes construction of specialized data sets and iterative subsequence masking. This computational approach allowed us to detect previously unnoticed but potentially important sequence similarities. Developmental gene products are enriched in predicted nonglobular regions as compared to unbiased sets of eukaryotic and bacterial proteins. Developmental genes that act intracellularly, primarily at the level of transcription regulation, typically code for proteins containing highly conserved DNA-binding domains, most of which appear to have evolved before the radiation of bacteria and eukaryotes. We identified bacterial homologues, namely a protein family that includes the Escherichia coli universal stress protein UspA, for the MADS-box transcription regulators previously described only in eukaryotes. We also show that the FUS6 family of eukaryotic proteins contains a putative DNA-binding domain related to bacterial helix-turn-helix transcription regulators. Developmental proteins that act extracellularly are less conserved and often do not have bacterial homologues. Nevertheless, several provocative similarities between different groups of such proteins were detected.
- Jackson CS et al.
- Dynamic protein acylation and the regulation of localization and function of signal-transducing proteins.
- Biochem Soc Trans. 1995; 23: 568-71
- Saier MH Jr
- Computer-aided analyses of transport protein sequences: gleaning evidence concerning function, structure, biogenesis, and evolution.
- Microbiol Rev. 1994; 58: 71-93
Three-dimensional structures have been elucidated for very few integral membrane proteins. Computer methods can be used as guides for estimation of solute transport protein structure, function, biogenesis, and evolution. In this paper the application of currently available computer programs to over a dozen distinct families of transport proteins is reviewed. The reliability of sequence-based topological and localization analyses and the importance of sequence and residue conservation to structure and function are evaluated. Evidence concerning the nature and frequency of occurrence of domain shuffling, splicing, fusion, deletion, and duplication during evolution of specific transport protein families is also evaluated. Channel proteins are proposed to be functionally related to carriers. It is argued that energy coupling to transport was a late occurrence, superimposed on preexisting mechanisms of solute facilitation. It is shown that several transport protein families have evolved independently of each other, employing different routes, at different times in evolutionary history, to give topologically similar transmembrane protein complexes. The possible significance of this apparent topological convergence is discussed.
- Stewart CB
- Comparative method in study of protein structure and function: enzyme specificity as an example.
- Methods Enzymol. 1993; 224: 591-603
- Tam R, Saier MH Jr
- Structural, functional, and evolutionary relationships among extracellular solute-binding receptors of bacteria.
- Microbiol Rev. 1993; 57: 320-46
Extracellular solute-binding proteins of bacteria serve as chemoreceptors, recognition constituents of transport systems, and initiators of signal transduction pathways. Over 50 sequenced periplasmic solute-binding proteins of gram-negative bacteria and homologous extracytoplasmic lipoproteins of gram-positive bacteria have been analyzed for sequence similarities, and their degrees of relatedness have been determined. Some of these proteins are homologous to cytoplasmic transcriptional regulatory proteins of bacteria; however, with the sole exception of the vitamin B12-binding protein of Escherichia coli, which is homologous to human glutathione peroxidase, they are not demonstrably homologous to any of the several thousand sequenced eukaryotic proteins. Most of these proteins fall into eight distinct clusters as follows. Cluster 1 solute-binding proteins are specific for malto-oligosaccharides, multiple oligosaccharides, glycerol 3-phosphate, and iron. Cluster 2 proteins are specific for galactose, ribose, arabinose, and multiple monosaccharides, and they are homologous to a number of transcriptional regulatory proteins including the lactose, galactose, and fructose repressors of E. coli. Cluster 3 proteins are specific for histidine, lysine-arginine-ornithine, glutamine, octopine, nopaline, and basic amino acids. Cluster 4 proteins are specific for leucine and leucine-isoleucine-valine, and they are homologous to the aliphatic amidase transcriptional repressor, AmiC, of Pseudomonas aeruginosa. Cluster 5 proteins are specific for dipeptides and oligopeptides as well as nickel. Cluster 6 proteins are specific for sulfate, thiosulfate, and possibly phosphate. Cluster 7 proteins are specific for dicarboxylates and tricarboxylates, but these two proteins exhibit insufficient sequence similarity to establish homology. Finally, cluster 8 proteins are specific for iron complexes and possibly vitamin B12. 
Members of each cluster of binding proteins exhibit greater sequence conservation in their N-terminal domains than in their C-terminal domains. Signature sequences for these eight protein families are presented. The results reveal that binding proteins specific for the same solute from different bacteria are generally more closely related to each other than are binding proteins specific for different solutes from the same organism, although exceptions exist. They also suggest that a requirement for high-affinity solute binding imposes severe structural constraints on a protein. The occurrence of two distinct classes of bacterial cytoplasmic repressor proteins which are homologous to two different clusters of periplasmic binding proteins suggests that the gene-splicing events which allowed functional conversion of these proteins with retention of domain structure have occurred repeatedly during evolutionary history.(ABSTRACT TRUNCATED AT 400 WORDS)