Comp Funct GenomicsComparative and Functional Genomics1531-69121532-6268Hindawi Publishing Corporation186289142447211 S153169120100033X 10.1002/cfg.93 Research ArticleSAND, a New Protein Family: From Nucleic Acid to Protein Structure and Function Prediction CottageAmanda EdwardsYvonne J. K. ElgarGreg
gelgar@hgmp.mrc.ac.uk
82001242262351520012162001Copyright © 2001 Hindawi Publishing Corporation.2001Hindawi Publishing Corporation

This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

As a result of genome, EST and cDNA sequencing projects, there are huge numbers of predicted and/or partially characterised protein sequences compared with a relatively small number of proteins with experimentally determined function and structure. Thus, there is a considerable attention focused on the accurate prediction of gene function and structure from sequence by using bioinformatics. In the course of our analysis of genomic sequence from Fugu rubripes, we identified a novel gene, SAND, with significant sequence identity to hypothetical proteins predicted in Saccharomyces cerevisiae, Schizosaccharomyces pombe, Caenorhabditis elegans, a Drosophila melanogaster gene, and mouse and human cDNAs. Here we identify a further SAND homologue in human and Arabidopsis thaliana by use of standard computational tools. We describe the genomic organisation of SAND in these evolutionarily divergent species and identify sequence homologues from EST database searches confirming the expression of SAND in over 20 different eukaryotes. We confirm the expression of two different SAND paralogues in mammals and determine expression of one SAND in other vertebrates and eukaryotes. Furthermore, we predict structural properties of SAND, and characterise conserved sequence motifs in this protein family.