Aniseed Release Notes
New data:
Gene expression data:
Thanks to a collaborative agreement with Yutaka Satou, the
system now includes the annotated ISH data from the screen of transcription
factor and signalling molecule genes carried out in Yutaka Satou and Nori Satoh's
lab. Pictures are not yet available, but will be in the coming months.
For now, if you want to see the pictures from this screen, click on the
link in the result page that will take you to the home page of the
Ghost database, where you will find the original images. When referring to
any gene from this screen, please cite the original publication in
Development, NOT Aniseed. The same applies when referring to the
Halocynthia data found on Aniseed: cite the initial paper, not
Aniseed. In collaboration with Yutaka, we have also started to reannotate
the expression profiles more precisely, making use of the more detailed
Aniseed anatomical dictionaries. This reannotation will be put online
progressively.
Gene annotation:
Most gene annotation, in particular Gene Ontology terms
and gene names, was derived from orthology relationships, which exist for
only about 50% of gene models. We have now added GO terms (localisation and
molecular function) and names ("highly similar to...", "similar to...") on
the basis of the best BLAST hit.
Anatomy:
During the past months, we have corrected some earlier
problems in the Ciona anatomical dictionaries and linked them
horizontally by introducing lineage information. You can now trace the lineage
of a structure and use this to query for genes expressed in a precise
lineage across developmental stages. We have also defined 17 keywords
describing the final fates of given blastomeres or structures of Ciona
intestinalis. Thanks to the 3D data we obtained and quantified using the
3D Embryo Handler software and 3D embryo reconstructions, we provide
some information on neighbourhood relationships, blastomere shape, etc.
This is currently only available for two stages of
Ciona embryos, the 16- and 32-cell stages, but we are now building up a collection
of reconstructed embryos whose biometrical analysis will gradually
be put online. Some embryological information (inductions, competence of
cells) is available for the late 32-cell stage. This is only very
preliminary and subject to change without notice, so use it with
caution. It is just there to give a flavour of what we would like to
develop in the future.
New tools:
Anatomy interface:
Upon selection of a dictionary, you can ask the system to display
the structures that will eventually adopt one or several fates
of interest. They will be marked by a green dot on the results page. At
the top of the results page, all buttons are now active. Upon selection
of a structure:
. The "In situ data" button gives access to all genes expressed in this structure.
. The "lineage" button shows all progenitors and progeny (the interface will improve soon).
. The "biometry" button only works with cells, and gives access to the volume, surface, shape and neighbours of the selected cell, with a quantitation of the surface of contact (only for the 16- and 32-cell stages so far).
. The "fates" button gives the fates of the selected structure (works only for cells).
. The "induction" button indicates whether the cell is subject to an induction process at this stage. This is still work in progress and very few data have been entered, but please give us feedback on what you like or dislike there.
Fates interface:
Select one or several stages, one or several fates, and the
Boolean syntax you want to use, and the results page will indicate all
relevant cells and structures. The format of the results page is usable but
not yet great; it will improve with time (and with the help of your
suggestions).
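To illustrate what the Boolean choice changes, here is a minimal Python sketch for a single stage. It assumes fate annotations are stored as a set of fate keywords per cell; the cell names, fate assignments and the `cells_with_fates` function are hypothetical and do not reflect Aniseed's internal implementation.

```python
from typing import Dict, Iterable, Set

# Hypothetical fate table for one stage: cell -> set of final fates.
# Names and assignments are invented for illustration only.
FATES: Dict[str, Set[str]] = {
    "cell_1": {"notochord", "nerve cord"},
    "cell_2": {"notochord", "endoderm"},
    "cell_3": {"muscle", "mesenchyme"},
    "cell_4": {"epidermis"},
}


def cells_with_fates(fates: Iterable[str], mode: str = "OR") -> Set[str]:
    """Return the cells whose fate set matches the requested fates.

    mode="AND": the cell must give rise to every requested fate.
    mode="OR":  the cell must give rise to at least one requested fate.
    """
    wanted = set(fates)
    if mode == "AND":
        return {cell for cell, f in FATES.items() if wanted <= f}
    if mode == "OR":
        return {cell for cell, f in FATES.items() if wanted & f}
    raise ValueError(f"unknown mode: {mode!r}")


print(sorted(cells_with_fates(["notochord", "endoderm"], "AND")))  # ['cell_2']
print(sorted(cells_with_fates(["notochord", "muscle"], "OR")))     # ['cell_1', 'cell_2', 'cell_3']
```

AND narrows the answer to cells sharing all selected fates, while OR widens it to cells with any of them.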
Neighbours interface:
Select your cells of interest within the anatomical dictionary
(beware: this only works with cells, not higher-order structures), a
maximal (or minimal) distance for cells of interest, and a minimal or
maximal surface of contact, and the system will retrieve a list of
relevant cells. Clicking on a cell opens a new page showing its position
in the anatomical dictionary.
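The text does not define how distance between cells is measured. The sketch below assumes it is the number of contact steps between cells (a breadth-first search over a contact graph) and separately filters direct neighbours by a minimal contact surface. The contact table, its units and the function names are hypothetical.

```python
from collections import deque
from typing import Dict, Tuple

# Hypothetical contact table for one stage:
# (cell, cell) -> surface of contact, in arbitrary units.
CONTACTS: Dict[Tuple[str, str], float] = {
    ("c1", "c2"): 120.0,
    ("c1", "c3"): 15.0,
    ("c2", "c4"): 80.0,
    ("c3", "c4"): 40.0,
}

Graph = Dict[str, Dict[str, float]]


def build_graph(contacts: Dict[Tuple[str, str], float]) -> Graph:
    """Symmetric adjacency map: cell -> {neighbour: contact surface}."""
    graph: Graph = {}
    for (a, b), surface in contacts.items():
        graph.setdefault(a, {})[b] = surface
        graph.setdefault(b, {})[a] = surface
    return graph


def cells_within(graph: Graph, start: str, max_distance: int) -> Dict[str, int]:
    """Breadth-first search: cells at most `max_distance` contacts from `start`,
    mapped to their distance (the start cell itself is excluded)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if dist[cell] == max_distance:
            continue
        for neighbour in graph.get(cell, {}):
            if neighbour not in dist:
                dist[neighbour] = dist[cell] + 1
                queue.append(neighbour)
    del dist[start]
    return dist


def contacts_above(graph: Graph, cell: str, min_surface: float) -> Dict[str, float]:
    """Direct neighbours of `cell` sharing at least `min_surface` of contact."""
    return {nb: s for nb, s in graph.get(cell, {}).items() if s >= min_surface}


graph = build_graph(CONTACTS)
print(cells_within(graph, "c1", 2))       # {'c2': 1, 'c3': 1, 'c4': 2}
print(contacts_above(graph, "c1", 50.0))  # {'c2': 120.0}
```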
Biometry interface:
Select a species and a stage (this works only for the 16-cell and late
32-cell stages of Ciona intestinalis), and some properties of interest for your
cell (entropy, meaning compactness; sphericity; elongation; flatness;
large or small volume; etc.). The results page lists all cells
satisfying the criteria, together with all their shape descriptor values.
Click on a cell and you will see its position in the anatomical tree.
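As an example of a shape descriptor, sphericity is commonly defined (Wadell's definition) as the surface area of a sphere with the same volume as the cell divided by the cell's actual surface area, pi^(1/3) * (6V)^(2/3) / A, which equals 1 for a perfect sphere and decreases for flattened or elongated shapes. Whether Aniseed computes sphericity exactly this way is an assumption; the cell names and measurements below are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def sphericity(volume: float, surface: float) -> float:
    """Wadell sphericity: pi^(1/3) * (6V)^(2/3) / A (1.0 for a perfect sphere)."""
    return math.pi ** (1 / 3) * (6 * volume) ** (2 / 3) / surface


# Hypothetical measurements for one stage: cell -> (volume, surface area).
CELLS: Dict[str, Tuple[float, float]] = {
    "round_cell": (4 / 3 * math.pi * 10 ** 3, 4 * math.pi * 10 ** 2),  # sphere, r = 10
    "cube_cell": (10.0 ** 3, 6 * 10.0 ** 2),                          # cube, side 10
    "flat_cell": (20 * 20 * 2.0, 2 * (20 * 20 + 20 * 2 + 20 * 2.0)),  # 20 x 20 x 2 slab
}


def select_cells(cells: Dict[str, Tuple[float, float]], min_sphericity: float,
                 min_volume: float = 0.0) -> List[Tuple[str, float]]:
    """Cells at least as spherical and as large as requested, with their sphericity."""
    hits = []
    for name, (volume, surface) in cells.items():
        psi = sphericity(volume, surface)
        if psi >= min_sphericity and volume >= min_volume:
            hits.append((name, round(psi, 3)))
    return sorted(hits)


print(select_cells(CELLS, min_sphericity=0.8))
```

The cube scores about 0.81 and the thin slab about 0.43, so a sphericity threshold of 0.8 keeps only the two compact cells.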
Lineage interface:
Select a species, a stage and a structure, then tick the boxes
according to whether you want to see the progenitors, the progeny or both of
the selected cells. The system will return a list of cells with an
indication of the stages at which they are present. A tree-based representation
will soon replace the current interface.
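Progenitor and progeny queries amount to walking a cell-division tree upwards or downwards. The minimal sketch below assumes each cell records its mother cell; the division table is hypothetical and uses names in the style of ascidian blastomere nomenclature purely for illustration.

```python
from typing import Dict, List, Optional

# Hypothetical division table: cell -> mother cell (None for the root of
# this fragment). Illustrative only, not data taken from Aniseed.
MOTHER: Dict[str, Optional[str]] = {
    "A4.1": None,
    "A5.1": "A4.1", "A5.2": "A4.1",
    "A6.1": "A5.1", "A6.2": "A5.1",
    "A6.3": "A5.2", "A6.4": "A5.2",
}


def progenitors(cell: str) -> List[str]:
    """Ancestors of `cell`, from its mother back to the root."""
    chain = []
    mother = MOTHER.get(cell)
    while mother is not None:
        chain.append(mother)
        mother = MOTHER.get(mother)
    return chain


def progeny(cell: str) -> List[str]:
    """All descendants of `cell`, found by a depth-first walk."""
    daughters: Dict[str, List[str]] = {}
    for child, mother in MOTHER.items():
        if mother is not None:
            daughters.setdefault(mother, []).append(child)
    found = []
    stack = [cell]
    while stack:
        for child in daughters.get(stack.pop(), []):
            found.append(child)
            stack.append(child)
    return sorted(found)


print(progenitors("A6.3"))  # ['A5.2', 'A4.1']
print(progeny("A5.1"))      # ['A6.1', 'A6.2']
```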
DDD (Digital Differential Display):
This interface allows you to find genes that have differential EST
representation between sequenced cDNA libraries. It relies on the idea (put
forward in Ciona by Yutaka Satou and N. Satoh) that, as the libraries
were not normalised, the number of ESTs for a gene in a library reflects the
abundance of the transcript in the starting population. The DDD
interface allows you to find genes expressed at statistically higher
levels in one set of libraries than in another. This is a very powerful tool, with ONE
MAJOR LIMITATION: each request mobilises a large proportion of the
resources of our current server. So please, for the sake of others, do
not play with it; only submit requests that make sense for your
research. If the use of this interface significantly reduces the speed
of the system, we will remove the interface until we have migrated to a
new, more powerful server (see below).
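The text does not say which statistical test the DDD interface applies. One standard choice for comparing counts from two pools of non-normalised libraries is a one-sided Fisher's exact test on the 2x2 table (ESTs of the gene vs all other ESTs, in set 1 vs set 2). The sketch below implements that test with the Python standard library; the library contents and function names are hypothetical.

```python
from math import comb
from typing import Dict, List

Library = Dict[str, int]  # gene model -> EST count in one cDNA library


def fisher_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test on the 2x2 table [[a, b], [c, d]].

    Returns the probability, under the hypergeometric null, of observing
    a or more gene ESTs in set 1 given the table margins.
    """
    n1 = a + b          # ESTs sequenced in set 1
    gene_total = a + c  # ESTs of the gene in both sets
    n = a + b + c + d   # all ESTs
    upper = min(gene_total, n1)
    tail = sum(
        comb(gene_total, k) * comb(n - gene_total, n1 - k)
        for k in range(a, upper + 1)
    )
    return tail / comb(n, n1)


def ddd(set1: List[Library], set2: List[Library], gene: str) -> float:
    """P-value that `gene` is over-represented in set1 relative to set2."""
    g1 = sum(lib.get(gene, 0) for lib in set1)
    t1 = sum(sum(lib.values()) for lib in set1)
    g2 = sum(lib.get(gene, 0) for lib in set2)
    t2 = sum(sum(lib.values()) for lib in set2)
    return fisher_greater(g1, t1 - g1, g2, t2 - g2)


# Hypothetical libraries of 1000 ESTs each.
egg = {"geneX": 12, "geneY": 3, "other": 985}
tailbud = {"geneX": 1, "geneY": 4, "other": 995}
print(ddd([egg], [tailbud], "geneX"))  # small p-value: enriched in egg
print(ddd([egg], [tailbud], "geneY"))  # large p-value: no enrichment
```

Pooling libraries by summing their counts treats each set as one large library; a real implementation might instead weight libraries or correct for testing many genes at once.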
Refine mode:
The last tool is the tiny "refine" button that appears at the
bottom of most results pages (except the ISH results pages). By selecting
another interface and pressing refine, you tell the system that you want to
use the results on your page as the search space for the next query. In
short, it allows you to chain queries, for example to find the zinc
finger genes (InterPro search) that are annotated as transcription factors
(refine with a Gene Ontology search). Play with it and you will see
how much more powerful this small addition makes the system.
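Conceptually, refine restricts the next search to the gene set returned by the previous one, which is a set intersection. The sketch below mimics the zinc finger / transcription factor example with two hypothetical annotation indexes; it is not Aniseed's code.

```python
from typing import Dict, Optional, Set

# Hypothetical annotation indexes: gene model -> annotation terms.
INTERPRO: Dict[str, Set[str]] = {
    "gm1": {"zinc finger"},
    "gm2": {"zinc finger"},
    "gm3": {"protein kinase"},
}
GENE_ONTOLOGY: Dict[str, Set[str]] = {
    "gm1": {"transcription factor activity"},
    "gm3": {"transcription factor activity"},
}


def search(index: Dict[str, Set[str]], term: str,
           space: Optional[Set[str]] = None) -> Set[str]:
    """Genes annotated with `term`, optionally restricted to `space`
    (the results of a previous query, as with the refine button)."""
    hits = {gene for gene, terms in index.items() if term in terms}
    return hits if space is None else hits & space


zinc_fingers = search(INTERPRO, "zinc finger")
tf_zinc_fingers = search(GENE_ONTOLOGY, "transcription factor activity",
                         space=zinc_fingers)
print(sorted(tf_zinc_fingers))  # ['gm1']
```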
Technical aspects:
From Oracle to PostgreSQL:
Aniseed version 1.0 used a commercial SQL database
system, Oracle, as does the released version 2.0. However, in the next few
weeks, version 2.0 will migrate to a different, non-commercial SQL
engine: PostgreSQL. The advantage is that it will then be possible to
install mirror sites of Aniseed away from Marseille, making the system
less sensitive to server problems. It will also make the system truly
generic and easier to adapt to other model systems.
Migration to a new server:
While PostgreSQL opens the way to mirror sites, we will also be
migrating within a month to a more powerful independent server within an
IBM cluster. This should allow faster processing of more complex
queries. No change of URL address is anticipated.
Future prospects:
At present, we expect that the development of new interfaces,
beyond the ones mentioned above, will slow down and that the focus will
be placed on:
Getting as much data into the system as possible.
This includes more ISH data (currently a little over 10,000 for Ciona and
Halocynthia combined) and promoter expression data (currently fewer than
10), more anatomical data (for Halocynthia, for instance), more
embryological data (inductions, competence, etc.) and more 3D data and
reconstructed embryos. We would like the whole community to contribute
to this aim and will therefore release the loader software for remote
submission of ISH and cis-regulatory element data, probably during the
Santa Barbara meeting. We will make every effort to ensure that submitted data are
traceable and properly attributed to their contributing authors. We
are currently reconstructing a larger set of embryos in 3D, and
expect to release the 3D Embryo Handler software at the beginning of
autumn, together with the associated reconstructed embryos. Again, this should
allow other labs in the community to contribute additional models.
Making all data downloadable as flat or XML files.
We have started to put some flat files in the download section, but this
is still incomplete. We plan to release the rest of the data by summer,
so that people can download them and use them for large- or small-scale
bioinformatics analyses.
Community decisions over the future of the system.
The system is now getting rather large and it may be time for it to
be steered by the community rather than by my lab. Maybe it would
be a good idea to include a discussion of its future during the round table
on the evening of Monday July 11 in Santa Barbara? For instance, should we
nominate a steering committee to oversee future developments?
The Aniseed (Ascidian Network of In Situ Expression and Embryological
Data) system is a community resource for ascidian developmental studies.
It allows one to mine and download available embryological, anatomical,
genomic and gene expression data.
It is an Oracle database organised in six main parts:
Anatomy:
For each of the 22 stages we defined, the anatomical field describes,
using a controlled hierarchical dictionary, the different biological
structures and blastomeres present in each ascidian species. This
ontology is represented as a directed graph, which allows one to
organise terms as the nodes of a tree and to link them according to the
characteristics they share. We thus described each organ and structure
hierarchically: terms at the top of the hierarchy represent
a global structure (e.g. Mesoderm), while child terms correspond to more
precise parts (e.g. Secondary notochord lineage). This description has
single-cell resolution up to the beginning of gastrulation, the
stage up to which the lineage has been completely worked out. After this
stage, the ontology follows the germ layers and was designed to be as
compatible as possible with vertebrate ontologies.
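Because terms are organised as a directed graph, a query on a broad term such as Mesoderm can also retrieve genes annotated to its more precise child terms. A minimal sketch of that traversal, assuming a hypothetical child-term table and hypothetical expression annotations (not Aniseed's actual schema):

```python
from typing import Dict, List, Set

# Hypothetical fragment of an anatomical ontology: term -> child terms.
CHILDREN: Dict[str, List[str]] = {
    "Mesoderm": ["Notochord lineage", "Muscle lineage"],
    "Notochord lineage": ["Primary notochord lineage",
                          "Secondary notochord lineage"],
}
# Hypothetical expression annotations: gene -> terms it is expressed in.
EXPRESSED_IN: Dict[str, Set[str]] = {
    "geneA": {"Secondary notochord lineage"},
    "geneB": {"Muscle lineage"},
    "geneC": {"Epidermis"},
}


def descendants(term: str) -> Set[str]:
    """The term itself plus every term reachable through child links.
    The `seen` set prevents revisiting terms shared by several parents."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(CHILDREN.get(current, []))
    return seen


def genes_expressed_in(term: str) -> Set[str]:
    """Genes annotated to `term` or to any of its descendant terms."""
    terms = descendants(term)
    return {gene for gene, where in EXPRESSED_IN.items() if where & terms}


print(sorted(genes_expressed_in("Mesoderm")))  # ['geneA', 'geneB']
```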
Molecules:
The system hosts all available ESTs, cDNAs and gene models for the two
species Ciona intestinalis (mainly the very large Kyoto set and the small
Marseille set) and Halocynthia roretzi (the set contributed by the
Halocynthia consortium via Kaz Makabe and Takeshi Kawashima). 400,000
Ciona EST and cDNA clones are clustered according to the predicted gene
model they correspond to. The remaining 80,000 clones could not be
matched, either because of the draft quality of the assembly (5% of genes
are estimated to be missing) or because of the quality of the gene predictions (these
predictions do not at present take the EST data into account and
frequently miss the 5' and 3' ends of genes). The proportion of
correctly clustered clones will increase with the accuracy of the assembled,
annotated genome.
Functional annotation of proteins:
Functional annotation of the predicted proteins was achieved by three
methods. We first ran a program, Inparanoid (Remm et al., 2001), which
identifies orthologues by comparing proteomes from completely sequenced
organisms in a pairwise fashion. Clear fly, human or mouse orthologues
for approximately 50% of predicted Ciona genes (8119/15592) are detected
this way. The relatively small percentage of detected orthologues is again
probably due to the incompleteness of the JGI gene models. The
orthologues are then used to name the Ciona gene and to attribute a Gene Ontology
classification to it. In parallel, we ran InterProScan
(Zdobnov et al., 2001) on each Ciona protein and deduced the presence
of functional motifs. These were in turn used to attribute GO terms to
proteins without clear orthologues. Finally, a BlastP search against
TrEMBL and Swiss-Prot with an E-value cut-off of 1e-06 will soon be used to
complement the GO information for proteins without clear orthologues or
motifs, but with similarity to proteins previously assigned a function.
The identification of orthologues also opens the way to a comparison of
expression profiles among metazoans.
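The three annotation methods form a fallback cascade: orthologue-derived GO terms first, then InterPro-derived terms for proteins without clear orthologues, then the best BlastP hit under the 1e-06 cut-off for proteins with neither. A minimal sketch of that ordering; the input structures, the placeholder term names and the choice of the lowest E-value as the "best" hit are assumptions.

```python
from typing import List, Set, Tuple

E_VALUE_CUTOFF = 1e-06  # BlastP cut-off quoted in the text

BlastHit = Tuple[float, Set[str]]  # (E-value, GO terms of the hit protein)


def assign_go(orthologue_go: Set[str], interpro_go: Set[str],
              blast_hits: List[BlastHit]) -> Tuple[str, Set[str]]:
    """Return (source, GO terms) from the first method that yields terms."""
    if orthologue_go:                      # 1. Inparanoid orthologue
        return "orthologue", set(orthologue_go)
    if interpro_go:                        # 2. InterPro motifs
        return "interpro", set(interpro_go)
    significant = [hit for hit in blast_hits if hit[0] <= E_VALUE_CUTOFF]
    if significant:                        # 3. best BlastP hit (lowest E-value)
        best = min(significant, key=lambda hit: hit[0])
        return "blastp", set(best[1])
    return "unannotated", set()


# Placeholder term names, not real GO identifiers.
print(assign_go(set(), set(), [(1e-3, {"term_x"}), (1e-20, {"term_y"})]))
# ('blastp', {'term_y'})
```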
Additional tables were included in the design of the database to more precisely characterise the function of proteins. These tables include for example protein interaction data, and DNA binding specificity of transcription factors. At present, however, they remain empty.
Expression data:
Two types of expression data are currently supported.
The ESTs generated in the Ciona genome projects originate from a
collection of non-normalised cDNA libraries from different stages and
adult tissues. Clustering of the ESTs on the basis of their
correspondence to a given gene model allows one to calculate the
abundance of the clones corresponding to this gene in the different
sequenced libraries. This EST count proves to be a reliable measure of
the level of expression of a gene at a given time or in a given tissue
(Satou et al., 2003).
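Clustering ESTs by gene model reduces to counting, per library, the ESTs assigned to each model. The sketch below assumes each EST record carries its library name and the gene model it was clustered to (or None if unmatched), and expresses abundance as a fraction of all ESTs sequenced from that library; the records and this normalisation are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical EST records: (library, gene model it was clustered to, or
# None when it could not be matched to any gene model).
ESTS: List[Tuple[str, Optional[str]]] = [
    ("egg", "gm1"), ("egg", "gm1"), ("egg", "gm2"), ("egg", None),
    ("tailbud", "gm2"), ("tailbud", "gm2"), ("tailbud", "gm2"), ("tailbud", "gm1"),
]


def est_counts(ests: List[Tuple[str, Optional[str]]]):
    """Count ESTs per gene model and library, plus ESTs per library."""
    per_gene: Dict[str, Counter] = defaultdict(Counter)
    per_library: Counter = Counter()
    for library, gene in ests:
        per_library[library] += 1          # every sequenced EST counts
        if gene is not None:
            per_gene[gene][library] += 1   # only clustered ESTs count
    return per_gene, per_library


def relative_abundance(per_gene, per_library, gene: str, library: str) -> float:
    """Fraction of the library's ESTs that belong to `gene`."""
    return per_gene[gene][library] / per_library[library]


per_gene, per_library = est_counts(ESTS)
print(relative_abundance(per_gene, per_library, "gm1", "egg"))      # 0.5
print(relative_abundance(per_gene, per_library, "gm2", "tailbud"))  # 0.75
```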
In addition, Aniseed currently hosts In situ hybridisation data for
around 200 Ciona intestinalis genes with a restricted expression
pattern, mainly coming from the in situ screen carried out in the
Lemaire lab (Marseille, France). In situ hybridisation patterns are
illustrated by standardised pictures (orientation, format) and described
using the controlled vocabulary anatomical dictionary for the relevant
stage. In addition to In situ data, Aniseed supports the description of
promoter analyses and immunohistochemistry.
A unique feature of Aniseed is that it supports both wild-type
expression patterns and expression patterns in manipulated
embryos. Supported manipulations include both embryological (blastomere
explantation or ablation) and genetic (over-expression, Morpholino
knock-down, treatment with pharmacological inhibitors or recombinant
signalling proteins) treatments. This type of information is of crucial
importance for the reconstruction of genetic cascades.
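One way to model wild-type and manipulated expression data side by side is to attach an optional list of manipulations to each expression record, so that a manipulated pattern can be compared with the wild-type pattern for the same gene and stage. The class and field names below are hypothetical and do not describe Aniseed's actual tables.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Manipulation:
    kind: str    # e.g. "ablation", "explant", "morpholino", "overexpression"
    target: str  # blastomere, gene or compound concerned


@dataclass(frozen=True)
class ExpressionRecord:
    gene: str
    stage: str
    territories: FrozenSet[str]
    manipulations: Tuple[Manipulation, ...] = ()  # empty tuple = wild type

    @property
    def wild_type(self) -> bool:
        return not self.manipulations


def compare_to_wild_type(records: List[ExpressionRecord], gene: str, stage: str):
    """For each manipulated record, list the territories lost and gained
    relative to the wild-type record for the same gene and stage."""
    relevant = [r for r in records if r.gene == gene and r.stage == stage]
    wild = next(r for r in relevant if r.wild_type)
    return [
        (r.manipulations, wild.territories - r.territories, r.territories - wild.territories)
        for r in relevant
        if not r.wild_type
    ]


records = [
    ExpressionRecord("geneX", "tailbud", frozenset({"notochord", "muscle"})),
    ExpressionRecord("geneX", "tailbud", frozenset({"muscle", "mesenchyme"}),
                     (Manipulation("morpholino", "geneY"),)),
]
for manips, lost, gained in compare_to_wild_type(records, "geneX", "tailbud"):
    print([m.kind for m in manips], sorted(lost), sorted(gained))
```

In this toy example the morpholino record loses the notochord territory and gains mesenchyme relative to the wild-type pattern.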
Literature:
This part describes the source of the data, whether published or
unpublished.
How to query Aniseed:
The web interface allows one to search Aniseed by
. In situ data
. Anatomy
. Molecules
. BLAST
. Gene ontology
. InterPro domains
In situ data:
Following the selection of a species and a developmental stage, this
page allows one to search for genes that are expressed in individual or
multiple structures from the anatomical dictionary. Conversely, the
expression data for a given gene model can be obtained. In addition to
wild type embryos, it is possible to search for expression patterns in
deregulated contexts. It is also possible to search for pictures showing
co-expression of two genes.
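A minimal sketch of the two searches described above (genes stained in all or any of several structures at a given stage, and pictures showing two genes together), assuming each annotated picture records its stage, the gene(s) revealed and the stained structures. Picture identifiers, gene names and the data layout are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical annotated ISH pictures:
# picture id -> (stage, genes revealed, structures stained)
PICTURES: Dict[str, Tuple[str, Set[str], Set[str]]] = {
    "pic1": ("tailbud", {"geneA"}, {"notochord", "muscle"}),
    "pic2": ("tailbud", {"geneB"}, {"notochord"}),
    "pic3": ("tailbud", {"geneA", "geneB"}, {"notochord"}),
    "pic4": ("gastrula", {"geneC"}, {"notochord"}),
}


def genes_expressed_in(stage: str, structures: Iterable[str],
                       require_all: bool = True) -> Set[str]:
    """Genes stained in all (require_all=True) or any of the structures."""
    wanted = set(structures)
    stained_by_gene: Dict[str, Set[str]] = {}
    for pic_stage, genes, stained in PICTURES.values():
        if pic_stage != stage:
            continue
        for gene in genes:
            stained_by_gene.setdefault(gene, set()).update(stained)
    if require_all:
        return {g for g, where in stained_by_gene.items() if wanted <= where}
    return {g for g, where in stained_by_gene.items() if wanted & where}


def coexpression_pictures(gene1: str, gene2: str) -> List[str]:
    """Pictures in which both genes were revealed together."""
    return sorted(pid for pid, (_stage, genes, _stained) in PICTURES.items()
                  if {gene1, gene2} <= genes)


print(sorted(genes_expressed_in("tailbud", ["notochord", "muscle"])))  # ['geneA']
print(coexpression_pictures("geneA", "geneB"))                          # ['pic3']
```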
The result page displays the species and stage, thumbnails of the
relevant in situ pictures, a brief description of the staining and the
identity of the stained molecule according to the controlled dictionary,
corresponding gene model(s) and the labelled territories. All these
fields can be clicked for further information. Upon clicking on
"more" a second page appears showing a larger picture, a recap
of all expression domains at this stage, the name of the annotator with
direct e-mail link, the experimental conditions, and references.
This page also makes it possible to search for other genes expressed
in the same territories, to perform expression clustering analysis and
to search for expression data for the same gene in deregulated
contexts (e.g. overexpression, morpholino injections, mutant background,
ablation of embryo parts, explants, etc.).
Anatomy:
This page displays the anatomical dictionary at a given stage in a given
species. The displayed anatomical dictionary can be used as an
alternative interface to look for genes expressed in selected
structures. The "get lineage", "get position" and
"get fates" buttons are currently being implemented.
Molecules:
This page allows one to search for molecules by species, clone name,
clone sequence name (GenBank accession number), and also gene name
(biological name). The results window gives access to all genes matching
the query. Selection of one gene leads to a detailed description of its
features: link to the JGI genome project page, display of Interpro
domains, link to EST counts and in situ data, prediction of orthologues
in other complete genomes (Inparanoid predictions) or of paralogues in
Ciona, and best BlastP hits in the Swissprot database. These pieces of
information form the basis for the Gene Ontology classifications of the
genes.
Blast search:
This function allows one to search for Ciona molecules showing
similarity to a sequence of interest.
Gene Ontology/InterPro search pages:
These pages allow one to search for genes according to their associated Gene
Ontology or InterPro terms (e.g. search for proteins involved in a given
process, with a given molecular function or subcellular localisation, or with given
protein motifs).
How can you contribute to Aniseed?
The aim of Aniseed is to form a community tool that will help us all in our research, but that may also, in the future, allow one to start some modelling work on Ciona embryogenesis. The more labs that participate in the project, the more satisfying the tool will be for all. Key in our mind is that the future of the tool will be determined by the participating labs. You can participate at many levels.
Entering your expression data:
This is the simplest way to participate, and a very important one. You
can already request the loader from us as a beta tester. You will see
that entering data is rather simple, and we welcome your views on how to
make the process even simpler. There are several types of data you
can enter:
1) published expression data on your favourite gene(s). These are
usually very high-quality data and the most sought after.
2) expression data on genes you are not very interested in and do not
want to take time to publish. These data are usually of lesser quality
but are still very valuable to the community as they can guide other
people's steps. You will see that entering the data in Aniseed is much
simpler than publishing them, and your name (and e-mail) will remain
attached to the data.
3) large-scale in situ screens. These data are usually of lesser
quality, but they are again invaluable as a guide for others.
Communicating embryo models:
Making embryo models is a rate-limiting step. If you are interested in
participating in this task, let us know. We will then give you all
the information about formats, etc., so that your work is compatible
with the system.
Expressing your wishes:
Once you have tried Aniseed, you will probably have comments and
suggestions for new pages allowing new searches, for types of data that
are not yet supported, etc. You are most welcome to communicate them
to us and we will try and see what we can do, especially for repeated
requests.
Developing software:
If you are interested in developing new tools, we are more than happy to help
you do so. Just let us know so that we can organise this.
We hope you will have fun with Aniseed, and look forward to your participation,
All the best,
Olivier Tassy and Patrick Lemaire