PROTOCOL

Protocol of Gene Homology Pipeline

Species

Molgula oculata; Molgula occidentalis; Botryllus schlosseri; Halocynthia roretzi; Halocynthia aurantium; Ciona intestinalis; Ciona savignyi; Phallusia mammillata; Phallusia fumigata; Latimeria chalumnae; Callorhinchus milii; Pelodiscus sinensis; Gallus gallus; Homo sapiens; Mus musculus; Branchiostoma belcheri; Saccoglossus kowalevskii; Strongylocentrotus purpuratus

Authors

Paul Simion

Céline Scornavacca

Frédéric Delsuc

Emmanuel J.P. Douzery

Gene Homology Pipeline Description
We downloaded proteomic data for 12 species spanning the diversity of chordates and including outgroups, to which we added data produced here for 6 new tunicate species. These 18 datasets were dereplicated (i.e. only the longest transcript for each gene was kept) and then used as input for the clustering software package SiLiX (Miele et al. 2012). Two rounds of clustering were run; the second round was designed to break apart and recluster the roughly 30% of all sequences that had been assigned to a single "mega-cluster". We then kept only the clusters that contained either at least two tunicate sequences, or at least one tunicate and one vertebrate sequence. This resulted in 12,885 clusters of homologous sequences, containing both orthologs and paralogs.
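
The dereplication and cluster-selection rules described above can be sketched in a few lines of Python. This is only an illustration and not the code used in the pipeline: the function names, the record layout and the assignment of species to the tunicate and vertebrate groups are assumptions made for the example (the grouping follows standard taxonomy for the species listed on this page).

```python
# Illustrative sketch only; names and data layout are hypothetical.

# Assumed grouping of the species listed on this page.
TUNICATES = {
    "Molgula oculata", "Molgula occidentalis", "Botryllus schlosseri",
    "Halocynthia roretzi", "Halocynthia aurantium", "Ciona intestinalis",
    "Ciona savignyi", "Phallusia mammillata", "Phallusia fumigata",
}
VERTEBRATES = {
    "Latimeria chalumnae", "Callorhinchus milii", "Pelodiscus sinensis",
    "Gallus gallus", "Homo sapiens", "Mus musculus",
}


def keep_longest_per_gene(records):
    """Dereplicate: keep the longest transcript for each gene.

    records is an iterable of (gene_id, transcript_id, sequence) tuples.
    Returns a dict mapping gene_id to (transcript_id, sequence).
    On ties, the first transcript encountered is kept.
    """
    best = {}
    for gene_id, transcript_id, sequence in records:
        current = best.get(gene_id)
        if current is None or len(sequence) > len(current[1]):
            best[gene_id] = (transcript_id, sequence)
    return best


def keep_cluster(member_species):
    """Return True if a cluster passes the selection rule.

    member_species holds one species name per sequence in the cluster.
    A cluster is kept if it has at least two tunicate sequences, or at
    least one tunicate and one vertebrate sequence.
    """
    n_tunicate = sum(1 for s in member_species if s in TUNICATES)
    n_vertebrate = sum(1 for s in member_species if s in VERTEBRATES)
    return n_tunicate >= 2 or (n_tunicate >= 1 and n_vertebrate >= 1)
```

Under this reading of the rule, a cluster holding only one tunicate sequence and outgroup sequences (for example from Branchiostoma belcheri) is discarded, as is a cluster made up of vertebrate sequences alone.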

These clusters were then aligned using MAFFT (Katoh & Standley 2013), and fragmented sequences (i.e. sequences that were short both in absolute length and relative to the rest of the alignment) were discarded. For each of these alignments, we computed a phylogenetic tree, together with 100 bootstrap replicates to estimate node support, using the LG+G4+F evolutionary model in RAxML (Stamatakis 2014). These trees of homologous sequences were then analyzed with a custom program (written in C++) in order to detect the vertebrate orthologs of each tunicate sequence. This phylogeny-based orthology information was subsequently used to create a definitive name for each tunicate gene, following the recently published recommendations for tunicate gene nomenclature (Stolfi et al. 2015).
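
The removal of fragmented sequences could look roughly like the Python sketch below. The thresholds actually used are not given on this page, so `min_abs_len` and `min_rel_len` are placeholder values, and using the median ungapped length as the reference for "the rest of the alignment" is also an assumption. A sequence is dropped only when it fails both criteria, matching the "both absolute and relative" wording above.

```python
# Illustrative sketch only; thresholds and the median reference are assumed.
from statistics import median


def ungapped_length(aligned_seq):
    """Count residues in an aligned sequence, ignoring gap characters."""
    return sum(1 for c in aligned_seq if c not in "-.")


def drop_fragments(alignment, min_abs_len=100, min_rel_len=0.5):
    """Discard sequences that are short in absolute AND relative terms.

    alignment maps sequence names to aligned (gapped) sequences.
    min_abs_len: hypothetical minimum number of residues.
    min_rel_len: hypothetical minimum fraction of the median length.
    """
    if not alignment:
        return {}
    lengths = {name: ungapped_length(seq) for name, seq in alignment.items()}
    typical = median(lengths.values())
    kept = {}
    for name, seq in alignment.items():
        too_short_abs = lengths[name] < min_abs_len
        too_short_rel = lengths[name] < min_rel_len * typical
        if not (too_short_abs and too_short_rel):
            kept[name] = seq
    return kept
```

Requiring both conditions keeps genuinely short gene families intact: if every sequence in an alignment is short, none of them is short relative to the others, so none is removed.
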
References
Miele et al. Ultra-fast sequence clustering from similarity networks with SiLiX. BMC Bioinformatics.

Kazutaka Katoh, Daron M. Standley. (2013). MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability. Mol Biol Evol 30(4):772-780.

Alexandros Stamatakis. (2014). RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30(9):1312-1313.

Stolfi A et al. (2015). Guidelines for the Nomenclature of Genetic Elements in Tunicate Genomes. Genesis 53(1):1-14.