Using BioMart

BioMart is a query-oriented data management system developed jointly by the Ontario Institute for Cancer Research (OICR) and the European Bioinformatics Institute (EBI). It can be used to run complex data-mining queries that extract only the information relevant to the researcher.


1. Choosing a database and dataset

The available databases correspond to the EST collections for S. salar and O. mykiss, and the dataset for each database is that species' Unigene assembly. More options will be added in the future as other data (e.g. expression data) become available.

2. Choosing filters

Filters define the criteria that restrict the search. Click 'Filters' in the left-hand panel to display the available filter categories.

As an example, we will search the Salmo salar EST database for all contigs that contain a putative SNP, have an InterPro domain assigned, and have one or more orthologs in Danio rerio. After selecting the Salmo salar database and the Unigene dataset, we click 'Filters'.

In the UNIGENE section, we pick the option “limit to unigenes” with SNPs prediction only.

Next, we expand the multispecies comparison track, click on the homolog filters and select only unigenes orthologous to Danio rerio. Finally, under the Protein domains track, we select “limit to unigenes…” with Interpro Only. As the figure shows, the selected filters are then listed in the panel on the left.
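The same search can also be written as a BioMart XML query and sent to a server's martservice interface instead of being built by clicking through the panels. The sketch below, using only the Python standard library, builds the filter part of such a query for our example. The dataset name and the three filter names are hypothetical placeholders; the real internal names used by this mart may differ. In BioMart's XML format a boolean filter with `excluded="0"` keeps only matching entries; the attributes chosen in the next step would be added as `<Attribute>` elements inside the same `<Dataset>`.

```python
import xml.etree.ElementTree as ET

# HYPOTHETICAL names: the salmonid mart's real dataset and filter names
# may differ; check them in MartView before sending a query.
DATASET = "ssalar_unigene"
BOOLEAN_FILTERS = ["with_snp", "with_drerio_ortholog", "with_interpro"]


def build_filter_query(dataset, boolean_filters):
    """Build a BioMart XML query keeping only entries that pass every boolean filter."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",
        "header": "1",
        "uniqueRows": "0",
        "count": "",
        "datasetConfigVersion": "0.6",
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    for name in boolean_filters:
        # excluded="0" keeps only entries that HAVE the property ("Only");
        # excluded="1" would keep only entries that lack it ("Excluded").
        ET.SubElement(ds, "Filter", {"name": name, "excluded": "0"})
    return ET.tostring(query, encoding="unicode")


if __name__ == "__main__":
    print(build_filter_query(DATASET, BOOLEAN_FILTERS))
```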

3. Choosing attributes

Next, we want to retrieve information for all unigenes that match the selected filters. Click 'Attributes' in the left-hand panel. Four types of data can be selected, and each can be customized to return only the information you require. Following our example, we select the data type “ATTRIBUTES” and, in the FEATURES section, all Unigene information. This includes fields such as the Unigene ID and the CDS start and stop positions within the contig. We also want to know whether our unigenes participate in a pathway, so we select the EC number prediction in the external references section. Since we filtered for unigenes with an InterPro domain, we also select that attribute in the PROTEIN DOMAINS section.
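In a BioMart XML query, each attribute selected here becomes an `<Attribute>` element inside the `<Dataset>` element, next to the filters. The minimal sketch below appends attributes to an existing query string. The attribute names, standing in for the Unigene information, EC number and InterPro selections, and the example query are hypothetical; the mart's real internal names may differ.

```python
import xml.etree.ElementTree as ET

# HYPOTHETICAL attribute names standing in for the Unigene information,
# EC number prediction and InterPro domain attributes chosen above.
ATTRIBUTES = ["unigene_id", "cds_start", "cds_end", "ec_number", "interpro_id"]

# HYPOTHETICAL query that already carries one filter.
EXAMPLE_QUERY = (
    '<Query virtualSchemaName="default" formatter="TSV" header="1">'
    '<Dataset name="ssalar_unigene" interface="default">'
    '<Filter name="with_interpro" excluded="0" />'
    "</Dataset></Query>"
)


def add_attributes(query_xml, attributes):
    """Append one <Attribute> per name to the first <Dataset> of a query."""
    root = ET.fromstring(query_xml)
    dataset = root.find("Dataset")
    if dataset is None:
        raise ValueError("query has no <Dataset> element")
    for name in attributes:
        ET.SubElement(dataset, "Attribute", {"name": name})
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(add_attributes(EXAMPLE_QUERY, ATTRIBUTES))
```

Existing filters are left untouched, so the same function can be reused whatever filters were chosen.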

4. Count

Clicking 'Count' shows the number of entries that match the selected filters, together with the total number of entries in the dataset. In our example, 3285 of the 59336 unigenes matched the selected filters.
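Outside the web interface, BioMart's XML format asks for the same number by setting `count="1"` on the `<Query>` element, so the server returns the number of matching entries instead of the rows themselves. The sketch below turns a query into a GET request URL for the martservice interface and works out what share of the dataset matched. The server address is a hypothetical placeholder, not the address of the salmonid mart.

```python
import urllib.parse
import xml.etree.ElementTree as ET

# HYPOTHETICAL server address: replace with the martservice URL of the
# BioMart installation you are actually using.
MARTSERVICE_URL = "http://example.org/biomart/martservice"


def count_request_url(query_xml, base_url=MARTSERVICE_URL):
    """Turn a results query into a GET URL that asks only for the hit count."""
    root = ET.fromstring(query_xml)
    root.set("count", "1")  # count="1": return the number of matches, not rows
    query = ET.tostring(root, encoding="unicode")
    return base_url + "?" + urllib.parse.urlencode({"query": query})


def hit_percentage(hits, total):
    """Share of the dataset that passed the filters, as a percentage."""
    return 100.0 * hits / total


if __name__ == "__main__":
    q = ('<Query virtualSchemaName="default">'
         '<Dataset name="ssalar_unigene" interface="default" /></Query>')
    print(count_request_url(q))
    print(f"{hit_percentage(3285, 59336):.1f}% of unigenes matched")
```

With the figures above, 3285 of 59336 unigenes is about 5.5% of the dataset.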

A new search can be started by clicking 'New', but this discards the current query, so it is best to go to Results and download the data first.

5. Results

To save the information from your search, click 'Results' in the upper-left panel. The matching entries are displayed as a table that can be downloaded from the server.

Each type of data (attributes, sequences, variations or homologs) must be retrieved separately: click 'Results' and download a separate data set for each one. In our example, we also want information about the Danio rerio orthologs, so we select the HOMOLOGS data group and then the attributes we want (e.g. all Danio rerio information).
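Because each data group is downloaded as a separate file, the tables have to be joined afterwards to see, for example, EC numbers and zebrafish orthologs side by side. The minimal sketch below assumes both files were exported as TSV with a header row and share a Unigene ID column. The column names and sample rows are hypothetical and only illustrate the join.

```python
import csv
import io

# HYPOTHETICAL column name: use the header your downloaded files actually
# carry for the Unigene identifier.
KEY = "Unigene ID"

# HYPOTHETICAL sample downloads (features table and homologs table).
FEATURES_TSV = "Unigene ID\tEC number\nU1\t1.1.1.1\nU2\t\n"
HOMOLOGS_TSV = "Unigene ID\tDanio rerio gene\nU1\tZ1\nU1\tZ2\n"


def read_tsv(text):
    """Parse a tab-separated download with a header row into dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def join_on_unigene(features, homologs, key=KEY):
    """Attach homolog rows to the feature row with the same unigene ID.

    Unigenes without a homolog row are kept (a left join); a unigene with
    several orthologs yields one merged row per ortholog.
    """
    by_id = {}
    for row in homologs:
        by_id.setdefault(row[key], []).append(row)
    merged = []
    for row in features:
        for match in by_id.get(row[key]) or [{}]:
            merged.append({**row, **match})
    return merged


if __name__ == "__main__":
    for row in join_on_unigene(read_tsv(FEATURES_TSV), read_tsv(HOMOLOGS_TSV)):
        print(row)
```

A left join is used so that no unigene from the features table is silently dropped if its ortholog file is incomplete.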

The output table is shown once you click 'Results' in the upper panel.

The results can be exported to a file in four formats: CSV (comma-separated), TSV (tab-separated), HTML (web page) or XLS (Excel file). Selecting “unique results only” removes duplicate entries.
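If a file was downloaded without “unique results only”, the duplicates can still be removed afterwards. (In a BioMart XML query, the same choices correspond to the `formatter` attribute, e.g. `TSV` or `CSV`, and to `uniqueRows="1"`.) The minimal sketch below deduplicates a TSV export while keeping the first occurrence of each row, then re-saves it as CSV. The sample rows are hypothetical.

```python
import csv
import io

# HYPOTHETICAL TSV export containing one duplicated row.
SAMPLE_TSV = "Unigene ID\tInterpro ID\nU1\tIPR_X\nU1\tIPR_X\nU2\tIPR_Y\n"


def unique_tsv_rows(tsv_text):
    """Return header + data rows with repeated data rows removed (first kept)."""
    rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    if not rows:
        return []
    header, seen, unique = rows[0], set(), []
    for row in rows[1:]:
        key = tuple(row)  # rows are lists; tuples are hashable for the set
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return [header] + unique


def rows_to_csv(rows):
    """Serialize rows as comma-separated text with Unix line endings."""
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    print(rows_to_csv(unique_tsv_rows(SAMPLE_TSV)), end="")
```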

6. New

To start a new search, click 'New' in the upper-left panel. This discards the information from your previous search, so go to Results and download the data first.