I find unexpected duplications in the Anopheles assembly: are they real?


We are aware that the Anopheles gambiae genome assembly contains several regions with an artefactual duplication, though in individual cases it is hard to be sure whether a duplication is artefactual or real and very recent. To establish the real situation, one would ideally examine a BAC that spans the region.

It was noted in the original publication of the genome assembly that some scaffolds have regions with anomalies that suggest possible mis-assembly. These are thought to arise because some regions of the PEST strain genome are polymorphic, and the assembly algorithm sometimes produced two versions of such regions instead of collapsing them into one (even after fine-tuning to minimise this problem; see the supporting text in the paper). The two versions may sometimes appear next to each other in a single scaffold as artefactual duplications, in either tandem or inverted orientation.

The current assembly, AgamP3, designates some entire scaffolds as probable alternative assemblies of other, chromosomally-placed scaffolds, and also designates some adjacent ends of two chromosomally-placed scaffolds as probable alternative assemblies. However, there has not yet been a systematic attempt to identify all scaffolds with assembly problems, and short duplications within a scaffold would be very hard to identify automatically.
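To inspect a particular scaffold yourself, a rough first pass is to look for stretches of sequence that occur twice, in either direct or inverted orientation. The sketch below is only an illustration and is not the method used to annotate AgamP3; the function names and the default k-mer length are assumptions made for this example. It reports exact repeated k-mers only, so it will miss copies that differ by polymorphisms (the very situation thought to cause the artefacts), and any hit still needs confirmation, for example by alignment or against a spanning BAC.

```python
# Hypothetical helper names; exact k-mer matching only (an assumption of this sketch).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_repeated_kmers(seq, k=20):
    """List (first_pos, later_pos, orientation) for k-mers seen more than once.

    Positions are 0-based offsets within the scaffold sequence.
    orientation is "direct" for a same-strand copy (as in a tandem
    duplication) or "inverted" for a reverse-complement copy.
    """
    seq = seq.upper()
    first_seen = {}  # k-mer -> position of its first occurrence
    hits = []
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if "N" in kmer:
            continue  # skip windows that overlap assembly gaps
        rc = reverse_complement(kmer)
        if kmer in first_seen:
            hits.append((first_seen[kmer], i, "direct"))
        elif rc in first_seen:
            hits.append((first_seen[rc], i, "inverted"))
        else:
            first_seen[kmer] = i
    return hits


# Toy example: a 10 bp unit, a 4 bp spacer, then the same unit again.
unit = "AAGCTTGACC"
print(find_repeated_kmers(unit + "TTTT" + unit, k=10))
```

Runs of consecutive hits with the same offset and orientation point to a longer candidate duplication; isolated hits at small k are more likely to be ordinary repeats.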

An informal list of some scaffolds where a problem may exist can be found here.

Please add extra examples if you find them.

You may also want to use A. gambiae S and A. gambiae M to resolve this issue.