Secondary literature sources for ASCH

The following references were automatically generated.

Makarova KS, Wolf YI, van der Oost J, Koonin EV
Prokaryotic homologs of Argonaute proteins are predicted to function as key components of a novel system of defense against mobile genetic elements.
Biol Direct. 2009; 4: 29-29
BACKGROUND: In eukaryotes, RNA interference (RNAi) is a major mechanism of defense against viruses and transposable elements as well as of regulating translation of endogenous mRNAs. The RNAi systems recognize the target RNA molecules via small guide RNAs that are completely or partially complementary to a region of the target. Key components of the RNAi systems are proteins of the Argonaute-PIWI family some of which function as slicers, the nucleases that cleave the target RNA that is base-paired to a guide RNA. Numerous prokaryotes possess the CRISPR-associated system (CASS) of defense against phages and plasmids that is, in part, mechanistically analogous but not homologous to eukaryotic RNAi systems. Many prokaryotes also encode homologs of Argonaute-PIWI proteins but their functions remain unknown. RESULTS: We present a detailed analysis of Argonaute-PIWI protein sequences and the genomic neighborhoods of the respective genes in prokaryotes. Whereas eukaryotic Ago/PIWI proteins always contain PAZ (oligonucleotide binding) and PIWI (active or inactivated nuclease) domains, the prokaryotic Argonaute homologs (pAgos) fall into two major groups in which the PAZ domain is either present or absent. The monophyly of each group is supported by a phylogenetic analysis of the conserved PIWI-domains. Almost all pAgos that lack a PAZ domain appear to be inactivated, and the respective genes are associated with a variety of predicted nucleases in putative operons. An additional, uncharacterized domain that is fused to various nucleases appears to be a unique signature of operons encoding the short (lacking PAZ) pAgo form. By contrast, almost all PAZ-domain containing pAgos are predicted to be active nucleases. Some proteins of this group (e.g., that from Aquifex aeolicus) have been experimentally shown to possess nuclease activity, and are not typically associated with genes for other (putative) nucleases.
Given these observations, the apparent extensive horizontal transfer of pAgo genes, and their common, statistically significant over-representation in genomic neighborhoods enriched in genes encoding proteins involved in the defense against phages and/or plasmids, we hypothesize that pAgos are key components of a novel class of defense systems. The PAZ-domain containing pAgos are predicted to directly destroy virus or plasmid nucleic acids via their nuclease activity, whereas the apparently inactivated, PAZ-lacking pAgos could be structural subunits of protein complexes that contain, as active moieties, the putative nucleases that we predict to be co-expressed with these pAgos. All these nucleases are predicted to be DNA endonucleases, so it seems most probable that the putative novel phage/plasmid-defense system targets phage DNA rather than mRNAs. Given that in eukaryotic RNAi systems, the PAZ domain binds a guide RNA and positions it on the complementary region of the target, we further speculate that pAgos function on a similar principle (the guide being either DNA or RNA), and that the uncharacterized domain found in putative operons with the short forms of pAgos is a functional substitute for the PAZ domain. CONCLUSION: The hypothesis that pAgos are key components of a novel prokaryotic immune system that employs guide RNA or DNA molecules to degrade nucleic acids of invading mobile elements implies a functional analogy with the prokaryotic CASS and a direct evolutionary connection with eukaryotic RNAi. The predictions of the hypothesis including both the activities of pAgos and those of the associated endonucleases are readily amenable to experimental tests.

Kerner P, Hung J, Behague J, Le Gouar M, Balavoine G, Vervoort M
Insights into the evolution of the snail superfamily from metazoan wide molecular phylogenies and expression data in annelids.
BMC Evol Biol. 2009; 9: 94-94
BACKGROUND: An important issue concerning the evolution of duplicated genes is to understand why paralogous genes are retained in a genome even though the most likely fate for a redundant duplicated gene is nonfunctionalization and thereby its elimination. Here we study a complex superfamily generated by gene duplications, the snail related genes that play key roles during animal development. We investigate the evolutionary history of these genes by genomic, phylogenetic, and expression data studies. RESULTS: We systematically retrieved the full complement of snail related genes in several sequenced genomes. Through phylogenetic analysis, we found that the snail superfamily is composed of three ancestral families, snail, scratchA and scratchB. Analyses of the organization of the encoded proteins point out specific molecular signatures, indicative of functional specificities for Snail, ScratchA and ScratchB proteins. We also report the presence of two snail genes in the annelid Platynereis dumerilii, which have distinct expression patterns in the developing mesoderm, nervous system, and foregut. The combined expression of these two genes is identical to that of two independently duplicated snail genes in another annelid, Capitella sp. I, but different aspects of the expression patterns are differentially shared among paralogs of Platynereis and Capitella. CONCLUSION: Our study indicates that the snail and scratchB families have expanded through multiple independent gene duplications in the different bilaterian lineages, and highlights potential functional diversifications of Snail and ScratchB proteins following duplications, as, in several instances, paralogous proteins in a given species show different domain organizations. Comparisons of the expression pattern domains of the two Platynereis and Capitella snail paralogs provide evidence for independent subfunctionalization events which have occurred in these two species.
We propose that the snail related genes may be especially prone to subfunctionalization, and this would explain why the snail superfamily underwent so many independent duplications leading to maintenance of functional paralogs.

McGuire AT, Keates RA, Cook S, Mangroo D
Structural modeling identified the tRNA-binding domain of Utp8p, an essential nucleolar component of the nuclear tRNA export machinery of Saccharomyces cerevisiae.
Biochem Cell Biol. 2009; 87: 431-43
Utp8p is an essential 80 kDa intranuclear tRNA chaperone that transports tRNAs from the nucleolus to the nuclear tRNA export receptors in Saccharomyces cerevisiae. To help understand the mechanism of Utp8p function, predictive tools were used to derive a partial model of the tertiary structure of Utp8p. Secondary structure prediction, supported by circular dichroism measurements, indicated that Utp8p is divided into 2 domains: the N-terminal beta sheet and the C-terminal alpha helical domain. Tertiary structure prediction was more challenging, because the amino acid sequence of Utp8p is not directly homologous to any known protein structure. The tertiary structures predicted by threading and fold recognition had generally modest scores, but for the C-terminal domain, threading and fold recognition consistently pointed to an alpha-alpha superhelix. Because of the sequence diversity of this fold type, no single structural template was an ideal fit to the Utp8p sequence. Instead, a composite template was constructed from 3 different alpha-alpha superhelix structures that gave the best matches to different portions of the C-terminal domain sequence. In the resulting model, the most conserved sequences grouped in a tight cluster of positive charges on a protein that is otherwise predominantly negative, suggesting that the positive-charge cleft may be the tRNA-binding site. Mutations of conserved positive residues in the proposed binding site resulted in a reduction in the affinity of Utp8p for tRNA both in vivo and in vitro. Models were also derived for the 10 fungal homologues of Utp8p, and the localization of the positive charges on the conserved surface was found in all cases. Taken together, these data suggest that the positive-charge cleft of the C-terminal domain of Utp8p is involved in tRNA-binding.

Vangelatos I, Vlachakis D, Sophianopoulou V, Diallinas G
Modelling and mutational evidence identify the substrate binding site and functional elements in APC amino acid transporters.
Mol Membr Biol. 2009; 26: 356-70
The Amino acid-Polyamine-Organocation (APC) superfamily is the main family of amino acid transporters found in all domains of life and one of the largest families of secondary transporters. Here, using a sensitive homology threading approach and modelling we show that the predicted structure of APC members is extremely similar to the crystal structures of several prokaryotic transporters belonging to evolutionary distinct protein families with different substrate specificities. All of these proteins, despite having no primary amino acid sequence similarity, share a similar structural core, consisting of two V-shaped domains of five transmembrane domains each, intertwined in an antiparallel topology. Based on this model, we reviewed available data on functional mutations in bacterial, fungal and mammalian APCs and obtained novel mutational data, which provide compelling evidence that the amino acid binding pocket is located in the vicinity of the unwound part of two broken helices, in a nearly identical position to the structures of similar transporters. Our analysis is fully supported by the evolutionary conservation and specific amino acid substitutions in the proposed substrate binding domains. Furthermore, it allows predictions concerning residues that might be crucial in determining the specificity profile of APC members. Finally, we show that two cytoplasmic loops constitute important functional elements in APCs. Our work along with different kinetic and specificity profiles of APC members in easily manipulated bacterial and fungal model systems could form a unique framework for combining genetic, in-silico and structural studies, for understanding the function of one of the most important transporter families.

Sikder AR, Zomaya AY
Inferring boundary information of discontinuous-domain proteins.
IEEE Trans Nanobioscience. 2008; 7: 200-5
Wetlaufer introduced the classification of domains into continuous and discontinuous. Continuous domains form from a single-chain segment and discontinuous domains are composed of two or more chain segments. Richardson identified approximately 100 domains in her review. Her assignment was based on the concepts that the domain would be independently stable and/or could undergo rigid-body-like movements with respect to the entire protein. There are now several instances where structurally similar domains occur in different proteins in the absence of noticeable sequence similarity. Possibly, the most notable of such domains is the triose-phosphate isomerase (TIM) barrel. With the increase in the number of known sequences, computer algorithms are required to identify the discontinuous domain of an unknown protein chain in order to determine its structure and function. We have developed a novel algorithm for discontinuous-domain boundary prediction based on a machine learning algorithm and interresidue contact interaction values. We have used 415 proteins, including 100 discontinuous-domain chains for training. There is no method available that is designed solely on a sequence basis for the prediction of discontinuous domains. DomainDiscovery performed significantly well compared to the structure-based methods like structural classification of proteins (SCOP), class, architecture, topology and homologous superfamily (CATH), and DOMain MAKer (DOMAK).

Nikolakaki E et al.
RNA association or phosphorylation of the RS domain prevents aggregation of RS domain-containing proteins.
Biochim Biophys Acta. 2008; 1780: 214-25
Domains rich in alternating arginine and serine residues (RS domains) are found in a large number of eukaryotic proteins involved in several cellular processes. According to the prevailing view RS domains function as protein interaction domains, thereby promoting the assembly of higher-order cellular structures. Furthermore, recent data demonstrated that the RS regions of several SR splicing factors directly contact the pre-mRNA in a nonsequence specific but functionally important fashion. Using a variety of biochemical approaches, we now demonstrate that the RS domains of three proteins, not directly associated with the splicing reaction, such as lamin B receptor, acinus and peroxisome proliferator-activated receptor gamma coactivator-1 alpha, associate mainly with nuclear RNA and that this association is conducive to retaining the proteins in a soluble form. Phosphorylation by SRPK1 prevents RNA association, yet it greatly increases the fraction of the proteins recovered in soluble form, thereby mimicking the RNA effect. Based on these results we propose that the tendency to self-associate and form aggregates is a general property of RS domain-containing proteins and could be attributed to their disordered structure. RNA binding or SRPK1-mediated phosphorylation prevents aggregation and may serve to modulate the RS domain interaction modes.

Sandholzer J, Hoeth M, Piskacek M, Mayer H, de Martin R
A novel 9-amino-acid transactivation domain in the C-terminal part of Sox18.
Biochem Biophys Res Commun. 2007; 360: 370-4
Sox transcription factors are members of the Sry-related protein family that play multiple roles mainly during development. Sox18 has been implicated in the development of hair follicles as well as the blood and lymphatic vasculature, due to the identification of mutations that result in the ragged phenotype in mice, and in the hypotrichosis lymphedema telangiectasia syndrome in humans. Sox18 consists of an N-terminal high-mobility group DNA binding and a central transactivation domain, followed by a C-terminal region of unknown function. We show here that this C-terminal domain consists of three blocks that are highly conserved within a subgroup of the Sox family, and that the central so-called charged block comprises an additional strong transactivating domain. This activity can be pinpointed to a recently described 9aa transactivation motif that mediates the interaction with the transcriptional cofactor TAF9. These results can explain previously controversial data on the functional consequences of Sox18 mutations in mice and humans.

Siddiqui N, Osborne MJ, Gallie DR, Gehring K
Solution structure of the PABC domain from wheat poly (A)-binding protein: an insight into RNA metabolic and translational control in plants.
Biochemistry. 2007; 46: 4221-31
In animals, the PABC domain from poly (A)-binding protein recruits proteins containing a specific interacting motif (PAM-2) to the mRNP complex. These proteins include Paip1, Paip2, and eukaryotic release factor 3 (eRF3), all of which regulate PABP function in translation. The following reports the solution structure of PABC from Triticum aestivum (wheat) poly (A)-binding protein determined by NMR spectroscopy. Wheat PABC (wPABC) is an alpha-helical protein domain, which displays a fold highly similar to the human PABC domain and contains a PAM-2 peptide binding site. Through a bioinformatics search, several plant proteins containing a PAM-2 site were identified including the early response to dehydration protein (ERD-15), which was previously shown to regulate PABP-dependent translation. The plant PAM-2 proteins contain a variety of conserved sequences including a PABP-interacting 1 motif (PAM-1), RNA binding domains, an SMR endonuclease domain, and a poly (A)-nuclease regulatory domain, all of which suggest a function in either translation or mRNA metabolism. The proteins identified are well conserved throughout plant species but have no sequence homologues in metazoans. We show that wPABC binds to the plant PAM-2 motif with high affinity through a conserved mechanism. Overall, our results suggest that plant species have evolved a distinct regulatory mechanism involving novel PABP binding partners.

Ohtomo T, Horii T, Nomizu M, Suga T, Yamada J
Molecular cloning of a structural homolog of YY1AP, a coactivator of the multifunctional transcription factor YY1.
Amino Acids. 2007; 33: 645-52
YY1 is a multifunctional transcription factor that activates or represses gene transcription depending on interactions with other regulatory proteins that include coactivator YY1AP. Here, we describe the cloning of a novel homolog of YY1AP, referred to as YARP, from the human neuroblastoma cell line SK-N-SH. The cloned cDNA encoded a 2240 amino acid protein that contained a domain which was 97% homologous to an entire YY1AP sequence of 739 amino acids. Two splice variants, YARP2 and YARP3, were also cloned. Northern blotting demonstrated the YARP mRNA (approximately 10 kb), which was increased 1.7-fold after dibutyryl cAMP-induced neural differentiation of the cells. Presence of YARP mRNA was also confirmed in human tissues such as the heart, brain and placenta. Bioinformatic analysis predicted various functional motifs in the YARP structure, including nuclear localization signals and domains associated with protein-protein interactions (PAH2), DNA-binding (SANT), and chromatin assembly (nucleoplasmin-like), outside the YY1AP-homology domain. Thus, we propose that YARP is multifunctional and plays not only a role analogous to YY1AP, but also its own specific roles in DNA-utilizing processes such as transcription.

Ishikawa H, Nakagawa N, Kuramitsu S, Masui R
Crystal structure of TTHA0252 from Thermus thermophilus HB8, a RNA degradation protein of the metallo-beta-lactamase superfamily.
J Biochem. 2006; 140: 535-42
In bacterial RNA metabolism, mRNA degradation is an important process for gene expression. Recently, a novel ribonuclease (RNase), belonging to the beta-CASP family within the metallo-beta-lactamase superfamily, was identified as a functional homologue of RNase E, a major component for mRNA degradation in Escherichia coli. Here, we have determined the crystal structure of TTHA0252 from Thermus thermophilus HB8, which represents the first report of the tertiary structure of a beta-CASP family protein. TTHA0252 comprises two separate domains: a metallo-beta-lactamase domain and a "clamp" domain. The active site of the enzyme is located in a cleft between the two domains, which includes two zinc ions coordinated by seven conserved residues. Although this configuration is similar to those of other beta-lactamases, TTHA0252 has one conserved His residue characteristic of the beta-CASP family as a ligand. We also detected nuclease activity of TTHA0252 against rRNAs of T. thermophilus. Our results reveal structural and functional aspects of novel RNase E-like enzymes with a beta-CASP fold.

Rodenburg KW, Smolenaars MM, Van Hoof D, Van der Horst DJ
Sequence analysis of the non-recurring C-terminal domains shows that insect lipoprotein receptors constitute a distinct group of LDL receptor family members.
Insect Biochem Mol Biol. 2006; 36: 250-63
Lipoprotein-mediated delivery of lipids in mammals involves endocytic receptors of the low density lipoprotein (LDL) receptor (LDLR) family. In contrast, in insects, the lipoprotein, lipophorin (Lp), functions as a reusable lipid shuttle in lipid delivery, and these animals, therefore, were not supposed to use endocytic receptors. However, recent data indicate additional endocytic uptake of Lp, mediated by a Lp receptor (LpR) of the LDLR family. The two N-terminal domains of LDLR family members are involved in ligand binding and dissociation, respectively, and are composed of a mosaic of multiple repeats. The three C-terminal domains, viz., the optional O-linked glycosylation domain, the transmembrane domain, and the intracellular domain, are of a non-repetitive sequence. The present classification of newly discovered LDLR family members, including the LpRs, bears no relevance to physiological function. Therefore, as a novel approach, the C-terminal domains of LDLR family members across the entire animal kingdom were used to perform a sequence comparison analysis in combination with a phylogenetic tree analysis. The LpRs appeared to segregate into a specific group distinct from the groups encompassing the other family members, and each of the three C-terminal domains of the insect receptors is composed of a unique set of sequence motifs. Based on conservation of sequence motifs and organization of these motifs in the domains, LpR resembles most the groups of the LDLRs, very low density lipoprotein (VLDL) receptors, and vitellogenin receptors. However, in sequence aspects in which LpR deviates from these three receptor groups, it most notably resembles LDLR-related protein-2, or megalin. These features might explain the functional differences disclosed between insect and mammalian lipoprotein receptors.

Zhang D, Martyniuk CJ, Trudeau VL
SANTA domain: a novel conserved protein module in Eukaryota with potential involvement in chromatin regulation.
Bioinformatics. 2006; 22: 2459-62
Since packaging of DNA in the chromatin structure restricts the accessibility for regulatory factors, chromatin remodeling is required to facilitate nuclear processes such as gene transcription, replication, and genome recombination. Many conserved non-enzymatic protein domains have been identified that contribute to the activities of multiprotein remodeling complexes. Here we identified a novel conserved protein domain in Eukaryota whose putative function may be in regulating chromatin remodeling. Since this domain is associated with a known SANT domain in several vertebrate proteins, we named it the SANTA (SANT Associated) domain. Sequence analysis showed that the SANTA domain is approximately a 90 amino acid module and likely composed of four central beta-sheets and three flanking alpha-helices. Many hydrophobic residues exhibited high conservation along the domain, implying a possible function in protein-protein interactions. The SANTA domain was identified in mammals, chicken, frog, fish, sea squirt, sea urchin, worms and plants. Furthermore, a phylogenetic tree of SANTA domains showed that one plant-specific duplication event happened in the Viridiplantae lineage.

Coles M et al.
AbrB-like transcription factors assume a swapped hairpin fold that is evolutionarily related to double-psi beta barrels.
Structure. 2005; 13: 919-28
AbrB is a key transition-state regulator of Bacillus subtilis. Based on the conservation of a beta-alpha-beta structural unit, we proposed a beta barrel fold for its DNA binding domain, similar to, but topologically distinct from, double-psi beta barrels. However, the NMR structure revealed a novel fold, the "looped-hinge helix." To understand this discrepancy, we undertook a bioinformatics study of AbrB and its homologs; these form a large superfamily, which includes SpoVT, PrlF, MraZ, addiction module antidotes (PemI, MazE), plasmid maintenance proteins (VagC, VapB), and archaeal PhoU homologs. MazE and MraZ form swapped-hairpin beta barrels. We therefore reexamined the fold of AbrB by NMR spectroscopy and found that it also forms a swapped-hairpin barrel. The conservation of the core beta-alpha-beta element supports a common evolutionary origin for swapped-hairpin and double-psi barrels, which we group into a higher-order class, the cradle-loop barrels, based on the peculiar shape of their ligand binding site.

Gough J
Convergent evolution of domain architectures (is rare).
Bioinformatics. 2005; 21: 1464-71
MOTIVATION: In this paper, we shall examine the evolution of domain architectures across 62 genomes of known phylogeny including all kingdoms of life. We look in particular at the possibility of convergent evolution, with a view to determining the extent to which the architectures observed in the genomes are due to functional necessity or evolutionary descent. We used domains of known structure, because from this and other information we know their evolutionary relationships. We use a range of methods including phylogenetic grouping, sequence similarity/alignment, mutation rates and comparative genomics to approach this difficult problem from several angles. RESULTS: Although we do not claim an exhaustive analysis, we conclude that between 0.4 and 4% of sequences are involved in convergent evolution of domain architectures, and expect the actual number to be close to the lower bound. We also made two incidental observations, albeit on a small sample: the events leading to convergent evolution appear to be random with no functional or structural preferences, and changes in the number of tandem repeat domains occur more readily than changes which alter the domain composition. CONCLUSION: The principal conclusion is that the observed domain architectures of the sequences in the genomes are driven by evolutionary descent rather than functional necessity. CONTACT: gough@supfam.org.

Anantharaman V, Aravind L
Novel conserved domains in proteins with predicted roles in eukaryotic cell-cycle regulation, decapping and RNA stability.
BMC Genomics. 2004; 5: 45-45
BACKGROUND: The emergence of eukaryotes was characterized by the expansion and diversification of several ancient RNA-binding domains and the apparent de novo innovation of new RNA-binding domains. The identification of these RNA-binding domains may throw light on the emergence of eukaryote-specific systems of RNA metabolism. RESULTS: Using sensitive sequence profile searches, homology-based fold recognition and sequence-structure superpositions, we identified novel, divergent versions of the Sm domain in the Scd6p family of proteins. This family of Sm-related domains shares certain features of conventional Sm domains, which are required for binding RNA, in addition to possessing some unique conserved features. We also show that these proteins contain a second previously uncharacterized C-terminal domain, termed the FDF domain (after a conserved sequence motif in this domain). The FDF domain is also found in the fungal Dcp3p-like and the animal FLJ22128-like proteins, where it is fused to a C-terminal domain of the YjeF-N domain family. In addition to the FDF domains, the FLJ22128-like proteins contain yet another divergent version of the Sm domain at their extreme N-terminus. We show that the YjeF-N domains represent a novel version of the Rossmann fold that has acquired a set of catalytic residues and structural features that distinguish them from the conventional dehydrogenases. CONCLUSIONS: Several lines of contextual information suggest that the Scd6p family and the Dcp3p-like proteins are conserved components of the eukaryotic RNA metabolism system. We propose that the novel domains reported here, namely the divergent versions of the Sm domain and the FDF domain may mediate specific RNA-protein and protein-protein interactions in cytoplasmic ribonucleoprotein complexes.
More specifically, the protein complexes containing Sm-like domains of the Scd6p family are predicted to regulate the stability of mRNA encoding proteins involved in cell cycle progression and vesicular assembly. The Dcp3p and FLJ22128 proteins may localize to the cytoplasmic processing bodies and possibly catalyze a specific processing step in the decapping pathway. The explosive diversification of Sm domains appears to have played a role in the emergence of several uniquely eukaryotic ribonucleoprotein complexes, including those involved in decapping and mRNA stability.

Musunuru K, Darnell RB
Determination and augmentation of RNA sequence specificity of the Nova K-homology domains.
Nucleic Acids Res. 2004; 32: 4852-61
The Nova onconeural antigens are implicated in the pathogenesis of paraneoplastic opsoclonus-myoclonus-ataxia (POMA). The Nova antigens are neuron-specific RNA-binding proteins harboring three repeats of the K-homology (KH) motif; they have been implicated in the regulation of alternative splicing of a host of genes involved in inhibitory synaptic transmission. Although the third Nova KH domain (KH3) has been extensively characterized using biochemical and crystallographic techniques, the roles of the KH1 and KH2 domains remain unclear. Furthermore, the specificity determinants that distinguish the Nova KH domains from those of the closely related hnRNP E and hnRNP K proteins are undefined. We demonstrate through the use of RNA selection and biochemical analysis that the sequence specificity of the Nova KH1/2 domains is similar to that of Nova KH3. We also show that the mutagenesis of a Nova KH domain to render it similar to the KH domains of the heterogeneous nuclear ribonucleoprotein E (hnRNP E) and hnRNP K allows it to recognize longer RNA sequences. These data yield important insights into KH domain function and suggest a strategy by which to engineer KH domains with novel sequence preferences.

Ebner S, Sharon N, Ben-Tal N
Evolutionary analysis reveals collective properties and specificity in the C-type lectin and lectin-like domain superfamily.
Proteins. 2003; 53: 44-55
Members of the C-type lectin/C-type lectin-like domain (CTL/CTLD) superfamily share a common fold and are involved in a variety of functions, such as generalized defense mechanisms against foreign agents, discrimination between healthy and pathogen-infected cells, and endocytosis and blood coagulation. In this work we used ConSurf, a computer program recently developed in our lab, to perform an evolutionary analysis of this superfamily in order to further identify characteristics of all or part of its members. Given a set of homologous proteins in the form of multiple sequence alignment (MSA) and an inferred phylogenetic tree, ConSurf calculates the conservation score in every alignment position, taking into account the relationships between the sequences and the physicochemical similarity between the amino acids. The scores are then color-coded onto the three-dimensional structure of one of the homologous proteins. We provide here and at http://ashtoret.tau.ac.il/~sharon a detailed analysis of the conservation pattern obtained for the entire superfamily and for two subgroups of proteins: (a) 21 CTLs and (b) 11 heterodimeric CTLD toxins. We show that, in general, proteins of the superfamily have one face that is constructed mostly of conserved residues and another that is not, and we suggest that the former face is involved in binding to other proteins or domains. In the CTLs examined we detected a region of highly conserved residues, corresponding to the known calcium- and carbohydrate-binding site of the family, which is not conserved throughout the entire superfamily, and in the CTLD toxins we found a patch of highly conserved residues, corresponding to the known dimerization region of these proteins. Our analysis also detected patches of conserved residues with yet unknown function(s).

Aravind L, Anantharaman V, Koonin EV
Monophyly of class I aminoacyl tRNA synthetase, USPA, ETFP, photolyase, and PP-ATPase nucleotide-binding domains: implications for protein evolution in the RNA.
Proteins. 2002; 48: 1-14
Protein sequence and structure comparisons show that the catalytic domains of Class I aminoacyl-tRNA synthetases, a related family of nucleotidyltransferases involved primarily in coenzyme biosynthesis, nucleotide-binding domains related to the UspA protein (USPA domains), photolyases, electron transport flavoproteins, and PP-loop-containing ATPases together comprise a distinct class of alpha/beta domains designated the HUP domain after HIGH-signature proteins, UspA, and PP-ATPase. Several lines of evidence are presented to support the monophyly of the HUP domains, to the exclusion of other three-layered alpha/beta folds with the generic "Rossmann-like" topology. Cladistic analysis, with patterns of structural and sequence similarity used as discrete characters, identified three major evolutionary lineages within the HUP domain class: the PP-ATPases; the HIGH superfamily, which includes class I aaRS and related nucleotidyltransferases containing the HIGH signature in their nucleotide-binding loop; and a previously unrecognized USPA-like group, which includes USPA domains, electron transport flavoproteins, and photolyases. Examination of the patterns of phyletic distribution of distinct families within these three major lineages suggests that the Last Universal Common Ancestor of all modern life forms encoded 15-18 distinct alpha/beta ATPases and nucleotide-binding proteins of the HUP class. This points to an extensive radiation of HUP domains before the last universal common ancestor (LUCA), during which the multiple class I aminoacyl-tRNA synthetases emerged only at a late stage. Thus, substantial evolutionary diversification of protein domains occurred well before the modern version of the protein-dependent translation machinery was established, i.e., still in the RNA world.

Williams AF
A year in the life of the immunoglobulin superfamily.
Immunol Today. 1987; 8: 298-303
The superfamily of molecules with immunoglobulin-like domains has recently been gaining new members, largely on the basis of sequence homology. Here Alan Williams reviews this new work and reveals how the comparison of sequence patterns enables decisions on membership to be made. Accommodation of the new structures demands the provision of new categories, and forces the abandonment of the conserved disulphide bond as the last invariant characteristic of an immunoglobulin-type domain. They may, however, provide more clues to the origins and evolution of the immunoglobulin superfamily.