New IDs for Anopheles


Old identifiers are Ensembl style: ENSANGG + 11 digits for gene, ENSANGT for transcript and ENSANGP for protein. The numerical parts of the identifiers do not necessarily match (in fact they usually do not).

New identifiers are VectorBase style: AGAP + 6 digits for gene, gene identifier plus -RA and -PA for first transcript and its translation, -RB and -PB for an alternative transcript, etc. Hence a gene, its transcripts and its protein products all have the same AGAP + numeric value. This style is already in use for Aedes within VB, and it matches the familiar Drosophila usage of -RA and -PA suffixes.
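The two naming schemes described above can be told apart with simple pattern matching. The following is a minimal Python sketch, not an official VectorBase tool. The function names classify and gene_of are made up for this illustration, and the patterns assume that the transcript or protein suffix is always a single capital letter (-RA, -RB, ...), which the text above implies but does not state.

```python
import re

# Old Ensembl-style: ENSANGG / ENSANGT / ENSANGP followed by 11 digits.
OLD_ID = re.compile(r"ENSANG([GTP])(\d{11})")
# New VectorBase-style: AGAP + 6 digits, optionally -R<letter> or -P<letter>.
# Assumption: the suffix is a single capital letter.
NEW_ID = re.compile(r"(AGAP\d{6})(?:-([RP])([A-Z]))?")

OLD_KINDS = {"G": "gene", "T": "transcript", "P": "protein"}
NEW_KINDS = {"R": "transcript", "P": "protein"}


def classify(identifier):
    """Return (style, kind) for an identifier, or None if it fits neither scheme."""
    match = OLD_ID.fullmatch(identifier)
    if match:
        return ("old", OLD_KINDS[match.group(1)])
    match = NEW_ID.fullmatch(identifier)
    if match:
        suffix_type = match.group(2)  # None when there is no -Rx / -Px suffix
        return ("new", NEW_KINDS[suffix_type] if suffix_type else "gene")
    return None


def gene_of(new_identifier):
    """Return the AGAP gene identifier shared by a new-style gene, transcript or protein."""
    match = NEW_ID.fullmatch(new_identifier)
    return match.group(1) if match else None


print(classify("ENSANGG00000020866"))  # ('old', 'gene')
print(classify("AGAP006864-RA"))       # ('new', 'transcript')
print(gene_of("AGAP006864-PA"))        # AGAP006864
```

Because a new-style gene, its transcripts and its proteins share the same AGAP stem, gene_of only has to strip the suffix. No equivalent shortcut exists for the old identifiers, whose numerical parts usually differ.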

Relationship between old and new identifiers

There is no relationship whatsoever between the numerical part of an old Ensembl-style identifier and the numerical part of the corresponding new VB identifier. The new identifiers were assigned in numerical order along the chromosome arms but this is not a permanent feature and should not be relied on. When a new identifier is needed in the future it will take the next numerical value, regardless of position.

Simple example: Gene CPR34 has the VB identifier AGAP006864, with transcript AGAP006864-RA and protein AGAP006864-PA. The corresponding old identifiers were gene ENSANGG00000020866, transcript ENSANGT00000023320 and protein ENSANGP00000024283.

Because the move to new identifiers happened at the same time as the release of a revised gene set (AgamP3.4), there is not always a one-to-one relationship between old and new identifiers. But we have mapped Ensembl-style identifiers in the previous gene set to the new gene set identifiers wherever possible.

Help with mapping old identifiers to the new ones

Here's how you can see what has been mapped to what:

The genome browser page "ID History" summarises relationships between old and new identifiers. This page can be reached by clicking the "ID history" link at the left of each Gene page (for genes) or Transcript page (for transcripts and proteins), or from the search results for an old-style identifier. It includes a map showing relationships between identifiers. Lines joining nodes on the graph are colour-coded by score (higher score means better mapping - but see warning below). If you click on a line joining 2 nodes on the map, you get a pop-up that includes a score for the mapping. The numbers on the nodes are sequence versions (all the new identifiers are currently .1).

You can also use "ID History Converter" to enter a list of old-style identifiers and find their current equivalents. This tool needs your list of genes as a plain text file (*.txt), and the output will be in HTML or text.

In some cases, one old gene maps to two or more new genes (a 'split'), or two or more old genes map to a single new gene (a 'merge'). Here are examples of what a 'merge' and a 'split' look like in the ID History page:

Here are some numbers for mappings of gene identifiers:

Mapping old -> new identifiers          No.
Old ids mapped 1 -> 1 to new ids        12,343
Old ids mapped 1 -> many new ids        182
Old ids not mapped to a new id          1,221

Mapping new -> old identifiers          No.
New ids mapped 1 -> 1 to old ids        11,348
New ids mapped 1 -> many old ids        638
New ids not mapped from old ids         959
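To make the categories in these tables concrete, here is a small Python sketch that sorts a set of mappings into "1 -> 1", "1 -> many" and "not mapped", and inverts the mapping to look at it from the other direction. It illustrates the idea only and is not the method used to produce the numbers above, so it may count edge cases differently. The identifiers (OLD1, NEW_A, ...) are placeholders, and the function names summarise and invert are made up for this example.

```python
from collections import defaultdict


def summarise(mapping):
    """Count keys that map to exactly one, to several, or to no identifiers."""
    counts = {"one_to_one": 0, "one_to_many": 0, "unmapped": 0}
    for targets in mapping.values():
        if not targets:
            counts["unmapped"] += 1
        elif len(targets) == 1:
            counts["one_to_one"] += 1
        else:
            counts["one_to_many"] += 1
    return counts


def invert(mapping):
    """Turn an old -> new mapping into a new -> old mapping (or vice versa)."""
    inverted = defaultdict(list)
    for source, targets in mapping.items():
        for target in targets:
            inverted[target].append(source)
    return dict(inverted)


# Placeholder data: OLD2 is a 'split', OLD3 and OLD4 are a 'merge' into NEW_D.
old_to_new = {
    "OLD1": ["NEW_A"],
    "OLD2": ["NEW_B", "NEW_C"],
    "OLD3": ["NEW_D"],
    "OLD4": ["NEW_D"],
    "OLD5": [],
}

print(summarise(old_to_new))          # old -> new view
print(summarise(invert(old_to_new)))  # new -> old view
```

Note that inverting can only find new identifiers that appear in at least one mapping. New identifiers with no old counterpart would have to come from the full list of new genes.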

Although the mappings should all be valid, the 'score' values attached to the mappings are sometimes misleading. A score of 1 generally indicates identity, and very low-scoring mappings usually represent just a small region of overlap. For genes with multiple mappings, the highest score will nearly always be the best mapping. But not all the good 1-to-1 mappings have a score of 1, and you shouldn't try to do anything too clever based on the scores.
We provide files showing all the mappings that we consider useful: download files.

  • New IDs to Old IDs - Genes
  • New IDs to Old IDs - Transcripts
  • New IDs to Old IDs - Translations
  • Old IDs to New IDs - Genes
  • Old IDs to New IDs - Transcripts
  • Old IDs to New IDs - Translations

Format is:

New to Old files: new_stable_id, then the space-separated old_stable_id(s) in score order

Old to New files: old_stable_id, then the space-separated new_stable_id(s) in score order
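A line in this format can be read by splitting on whitespace: the first field is the identifier being looked up and the remaining fields are its mappings. The Python sketch below shows one way to do this. The function names are made up for this illustration. The first sample line uses the CPR34 gene identifiers given earlier; the second line is an invented 'split' included only to show a multi-valued row. Treating the first listed mapping as the best one assumes that "score order" means highest score first.

```python
def parse_mapping_line(line):
    """Split one mapping line into (key_id, [mapped_ids]); return None for blank lines."""
    fields = line.split()  # splits on any run of spaces or tabs
    if not fields:
        return None
    return fields[0], fields[1:]


def load_mapping(lines):
    """Build a dict of key_id -> list of mapped ids from an iterable of lines."""
    mapping = {}
    for line in lines:
        parsed = parse_mapping_line(line)
        if parsed is not None:
            key_id, mapped_ids = parsed
            mapping[key_id] = mapped_ids
    return mapping


sample = [
    "ENSANGG00000020866 AGAP006864",             # CPR34, from the example above
    "ENSANGG00000000001 AGAP000001 AGAP000002",  # invented 'split', for illustration only
    "",
]

mapping = load_mapping(sample)
for old_id, new_ids in mapping.items():
    # Assumption: ids are listed best-scoring first.
    best = new_ids[0] if new_ids else "N/A"
    print(old_id, "->", best, "(of", len(new_ids), "mapping(s))")
```

To read a real file, pass an open file handle to load_mapping in place of the sample list.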

Download the mapping files

You can download all of the mapping files here. You will need to use gunzip and tar (or one of various multifunctional file decompressors - e.g. 7-Zip) to unpack this file. Please refer to this FAQ or contact the VectorBase help desk if you need any assistance.

Microsoft Excel conversion

You can convert an arbitrary list of ENSANG ids using any of the "Old to New" files as follows (instructions given for transcript ID conversion, adapt as necessary for the other files):

  1. open the old to new transcript .tsv file in Excel
  2. create a new worksheet alongside it
  3. paste in your ENSANG IDs in column A of the new worksheet
  4. enter the following formula in cell B1: =VLOOKUP(A1, 'Old IDs to New IDs - Transcript'!$A$1:$B$12571, 2, FALSE)
  5. copy the B1 formula down the rest of the column to look up all your IDs
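If you prefer not to use Excel, the same exact-match lookup can be sketched in a few lines of Python. This is an illustration under assumptions, not a supported tool. The function names build_lookup and convert are made up here, and the code assumes that the first two whitespace-separated columns of an "Old to New" file hold the old identifier and its first-listed new identifier, mirroring columns A and B in the VLOOKUP formula. Identifiers with no match are reported as "N/A", where Excel would show #N/A.

```python
def build_lookup(table_lines):
    """Map column 1 to column 2, keeping the first row for each key (as VLOOKUP does)."""
    lookup = {}
    for line in table_lines:
        fields = line.split()
        if len(fields) >= 2:
            lookup.setdefault(fields[0], fields[1])
    return lookup


def convert(ids, lookup, missing="N/A"):
    """Return (old_id, new_id) pairs, using `missing` where there is no exact match."""
    results = []
    for raw_id in ids:
        old_id = raw_id.strip()
        if old_id:  # ignore blank lines in the pasted list
            results.append((old_id, lookup.get(old_id, missing)))
    return results


# The single table row uses the CPR34 transcript identifiers from the example above.
table = ["ENSANGT00000023320\tAGAP006864-RA"]
my_ids = ["ENSANGT00000023320", "ENSANGT99999999999", ""]

for old_id, new_id in convert(my_ids, build_lookup(table)):
    print(old_id + "\t" + new_id)
```

In practice the table lines would come from the unpacked "Old IDs to New IDs" file, and my_ids from your own list, for example by passing open file handles instead of the lists shown here.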

Programmatic mapping

If you already have experience with the Ensembl Perl API, it is relatively simple to get it to map old↔new identifiers. Please contact the VectorBase help desk for further information.
Here is some sample code that will output the current IDs when given an input file of old IDs.

DISCLAIMER: please note that it is not guaranteed fit for purpose. Check that it does what you expect/want. It seems to work for gene IDs and transcript IDs so will probably also work for protein IDs.

#!/usr/bin/perl -w

use strict;
use lib '/EDIT/THIS/PATH/TO/ensembl45/ensembl/modules';
use Bio::EnsEMBL::DBSQL::DBAdaptor;

my $dbname = 'anopheles_gambiae_core_45_3h';
my $ensdb = Bio::EnsEMBL::DBSQL::DBAdaptor->new(
    -user   => 'anonymous',
    -host   => '',          # EDIT THIS: database host (left blank here)
    -driver => 'mysql',
    -dbname => $dbname,
);

# read old IDs from the file(s) named on the command line, or from standard input
while (<>) {
    next if (/^#/);        # skip any comment lines beginning with '#'
    next unless (/\S/);    # skip blank lines
    my ($oldid) = split;   # take old id from first column of tab or space delimited file
    my $currentid = old2current($oldid) || "N/A";
    print "$oldid\t$currentid\n";
}

# return the current ID for an old ID, or undef if there is no unique current ID
sub old2current {
    my $geneid = shift;
    my @current_ids;

    my $arch_adaptor = $ensdb->get_ArchiveStableIdAdaptor();
    my $arch_id = $arch_adaptor->fetch_by_stable_id($geneid);

    if (defined $arch_id) {
        my $history = $arch_id->get_history_tree;
        # collect every ID in the history tree that belongs to the current release
        foreach my $a_id (@{ $history->get_all_ArchiveStableIds }) {
            if ($a_id->release == $arch_adaptor->get_current_release()) {
                push @current_ids, $a_id->stable_id;
            }
        }
    }

    # only return an ID if it's unique
    return @current_ids == 1 ? $current_ids[0] : undef;
}