What data is available for download?
The data in VectorBase is available for download in a variety of formats.
Genomic sequences
FASTA files are available for contigs, scaffolds, and (where they exist) chromosomes, for all assembled genomes in VectorBase. All sequences are soft-masked, meaning masked regions are written in lowercase letters rather than replaced with N. The assemblies are described by AGP (v2.0) files.
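As an illustration of what soft-masking means in practice, the following minimal Python sketch reads FASTA records and reports the fraction of lowercase (masked) bases in each. The function names and the example record are made up for this illustration and do not correspond to any VectorBase file or tool.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def masked_fraction(sequence):
    """Return the fraction of bases written in lowercase (soft-masked)."""
    if not sequence:
        return 0.0
    return sum(1 for base in sequence if base.islower()) / len(sequence)


# Hypothetical record: 14 bases, of which 4 are lowercase.
example = io.StringIO(">scaffold_1\nACGTacgtAC\nGTNN\n")
for name, seq in read_fasta(example):
    print(name, len(seq), round(masked_fraction(seq), 3))
```

For a real download, replace the in-memory example with `open(path)` (or `gzip.open(path, "rt")` if the file is compressed).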
Repeat data
Repeat features annotated on assembled genomes are available in GFF3 format. For many species the custom repeat library used with RepeatMasker is also available.
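A GFF3 file has nine tab-separated columns, with the last column holding `key=value` attributes separated by semicolons. The sketch below, which is only one possible approach, parses such lines and totals the bases covered by each feature type. The source name, feature type, and identifiers in the example are assumptions chosen for illustration, not values taken from a VectorBase file.

```python
import io
from collections import defaultdict
from urllib.parse import unquote

GFF3_COLUMNS = 9


def parse_gff3_line(line):
    """Parse one GFF3 feature line; return None for comments and blank lines."""
    line = line.rstrip("\n")
    if not line or line.startswith("#"):
        return None
    fields = line.split("\t")
    if len(fields) != GFF3_COLUMNS:
        raise ValueError(f"expected {GFF3_COLUMNS} columns, got {len(fields)}")
    seqid, _source, ftype, start, end, _score, strand, _phase, attr_text = fields
    attributes = {}
    if attr_text != ".":
        for pair in attr_text.split(";"):
            if "=" in pair:
                key, value = pair.split("=", 1)
                # GFF3 percent-encodes reserved characters in attributes.
                attributes[unquote(key)] = unquote(value)
    return {
        "seqid": seqid,
        "type": ftype,
        "start": int(start),
        "end": int(end),
        "strand": strand,
        "attributes": attributes,
    }


def bases_per_type(handle):
    """Sum feature lengths per type (coordinates are 1-based and inclusive)."""
    totals = defaultdict(int)
    for line in handle:
        feature = parse_gff3_line(line)
        if feature is not None:
            totals[feature["type"]] += feature["end"] - feature["start"] + 1
    return dict(totals)


# Hypothetical two-feature file.
example = io.StringIO(
    "##gff-version 3\n"
    "scaffold_1\tRepeatMasker\trepeat_region\t101\t200\t.\t+\t.\tID=rep1;Name=example%20repeat\n"
    "scaffold_1\tRepeatMasker\trepeat_region\t501\t550\t.\t-\t.\tID=rep2\n"
)
print(bases_per_type(example))
```

Note that overlapping features are counted twice by this simple sum; merging intervals first would be needed for an exact count of masked bases.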
Gene annotations
Gene sets for assembled genomes are provided in GFF3 and GTF (v2.2) formats. The transcript and protein sequences are available as FASTA files.
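GTF shares its first eight columns with GFF3 but writes the attribute column differently, as `tag "value";` pairs that include `gene_id` and `transcript_id`. As a rough sketch, the Python below groups transcript IDs by gene from GTF lines. The identifiers and source name in the example are hypothetical.

```python
import re

# A GTF attribute is a tag followed by a double-quoted value.
GTF_ATTRIBUTE = re.compile(r'([^\s;]+)\s+"([^"]*)"')


def parse_gtf_attributes(text):
    """Return GTF attributes as a dict (a repeated tag keeps its last value)."""
    return dict(GTF_ATTRIBUTE.findall(text))


def transcripts_by_gene(lines):
    """Map each gene_id to the set of transcript_ids seen for it."""
    genes = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            continue  # skip malformed lines in this sketch
        attrs = parse_gtf_attributes(fields[8])
        gene_id = attrs.get("gene_id")
        transcript_id = attrs.get("transcript_id")
        if gene_id and transcript_id:
            genes.setdefault(gene_id, set()).add(transcript_id)
    return genes


# Hypothetical GTF lines for one gene with two transcripts.
example = [
    'scaffold_1\texample\texon\t100\t200\t.\t+\t.\tgene_id "GENE0001"; transcript_id "GENE0001-RA";\n',
    'scaffold_1\texample\texon\t100\t250\t.\t+\t.\tgene_id "GENE0001"; transcript_id "GENE0001-RB";\n',
]
print(transcripts_by_gene(example))
```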
Transcriptomes and proteomes
Transcriptomes and proteomes are available in FASTA format for species that do not have a genome assembly, and also for some species that do.
Projection data
When an assembly is updated, genes are projected from the old assembly to the new one. This process generates a range of files (in text, FASTA, and GFF3 formats), described in detail in another FAQ.
Microarray data
Tab-delimited files of gene-averaged expression summary data are available for Anopheles gambiae and Aedes aegypti. They contain the per-gene p-values and text summaries that are displayed in the Expression Browser.
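Tab-delimited files like these can be read with Python's standard `csv` module. The sketch below filters genes by p-value; the column names `gene_id` and `p_value`, the threshold, and the example rows are all assumptions made for illustration, so check the header of the actual file and adjust the names accordingly.

```python
import csv
import io


def significant_genes(handle, threshold=0.05, id_column="gene_id", p_column="p_value"):
    """Return (gene, p-value) pairs with p <= threshold, smallest p first.

    The column names are hypothetical; rows with a missing or
    non-numeric p-value are skipped.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    hits = []
    for row in reader:
        try:
            gene = row[id_column]
            p_value = float(row[p_column])
        except (KeyError, TypeError, ValueError):
            continue
        if p_value <= threshold:
            hits.append((gene, p_value))
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical file contents.
example = io.StringIO(
    "gene_id\tp_value\tsummary\n"
    "GENE_A\t0.001\texample summary text\n"
    "GENE_B\t0.2\texample summary text\n"
    "GENE_C\tNA\texample summary text\n"
)
print(significant_genes(example))
```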
Ontologies
Ontologies used in VectorBase are provided in OBO format.
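An OBO file is a plain-text series of stanzas such as `[Term]`, each holding `tag: value` lines. As a minimal sketch (dedicated OBO parsers handle many more details, such as trailing `!` comments and escaped characters), the Python below collects the `id` and `name` of every term. The term IDs and names in the example are invented.

```python
import io


def iter_stanzas(handle):
    """Yield (stanza type, [(tag, value), ...]) for each stanza in an OBO file."""
    stanza_type, tags = None, []
    for raw in handle:
        line = raw.strip()
        if not line or line.startswith("!"):
            continue  # skip blank lines and whole-line comments
        if line.startswith("[") and line.endswith("]"):
            if stanza_type is not None:
                yield stanza_type, tags
            stanza_type, tags = line[1:-1], []
        elif stanza_type is not None and ":" in line:
            # Lines before the first stanza belong to the file header and are ignored.
            tag, value = line.split(":", 1)
            tags.append((tag.strip(), value.strip()))
    if stanza_type is not None:
        yield stanza_type, tags


def term_names(handle):
    """Return {term id: term name} for all [Term] stanzas."""
    names = {}
    for stanza_type, tags in iter_stanzas(handle):
        if stanza_type != "Term":
            continue
        fields = dict(tags)  # a repeated tag keeps its last value
        if "id" in fields:
            names[fields["id"]] = fields.get("name", "")
    return names


# Hypothetical ontology fragment.
example = io.StringIO(
    "format-version: 1.2\n"
    "\n"
    "[Term]\n"
    "id: EX:0000001\n"
    "name: example root term\n"
    "\n"
    "[Term]\n"
    "id: EX:0000002\n"
    "name: example child term\n"
    "is_a: EX:0000001 ! example root term\n"
    "\n"
    "[Typedef]\n"
    "id: part_of\n"
    "name: part of\n"
)
print(term_names(example))
```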
Comparative data
Files relating to comparative analyses are described in detail in another FAQ.