PhD studentship, deep learning of mosquito immunity

PhD studentship in Deep Learning of disease-vector biology: hacking the
mechanism of mosquito adaptation and pathogenic immune evasion.

Thanks to a grant from the National Productivity and Investment Fund and the
BBSRC, the University of Cambridge is able to offer four fully funded PhDs in
the area of artificial intelligence and the Data-Driven Economy. All projects
have been created jointly by a University of Cambridge Department or Partner
Institute and an Industrial Partner Company.

Successful candidates must be able to start their PhD before the end of
December 2018.

Students will become part of the BBSRC DTP cohort, which offers an extra level
of support and training opportunities.

Candidates are asked to apply directly to Dr Monique Gangloff (e-mail by the 30th June, before 12 noon to be considered for the

Short Description

This project will apply deep learning and Bayesian approaches to
characterizing functional elements in the mosquito genome. The predictions
will be used to gain an understanding of how the malaria parasite evades the
host immune system, and will be validated experimentally using structural,
molecular and cell biological techniques.


Malaria is the most severe mosquito-borne disease with over a million deaths
annually. Infectious diseases are now spreading from the tropics to European
countries, likely as a result of climate change. The economic cost of these
diseases can be mitigated using data-driven approaches in a variety of ways.
One of these is tracking disease outbreaks and the geographical spread of the
pathogen. Another complementary approach is to apply machine learning
techniques to next-generation sequencing data in order to identify
polymorphisms that impact disease progression via, for example, the host
immune response to the pathogen. These are areas of interest for,
which is developing a novel blockchain-based system for delivering
distributed machine learning algorithms across a decentralized network of
processing nodes.

The objective of this proposal is to use VectorBase, a curated database of
genomic, phenotypic and population-centric data, to perform a machine learning
and bioinformatic analysis of the Toll immune signalling pathway.
Preliminary, unpublished data suggest that this pathway plays a role in the
malaria parasite’s evasion of the host immune system. In addition to
potentially providing avenues for halting the spread of the disease, this
project will also serve as a test case for developing’s distributed ML system.

Technical summary

One issue with performing genomic analysis on disease vectors is that far
less proteomic and transcriptomic data is available for functionally
annotating their genomes. As a result, less training data is available for
the bioinformatics algorithms that predict transcribed genes, protein-coding
sequences and other functional elements.

The goal of this project is to further develop hidden Markov model (HMM) and
deep recurrent neural network (RNN)-based predictors of functional elements
such as transcribed genes, signal peptides and post-translational
modifications. This builds on established gene-annotation algorithms but
focuses on tuning model parameters to improve performance on poorly annotated
genomes. The approach draws on Prof. Hain’s experience of applying HMMs and
RNNs in speech recognition and Dr. Ward’s experience in sequence analysis. It
is intended that a limited subset of the most confident and biologically
important predictions from the system will be verified experimentally by
Dr. Gangloff’s group.
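The HMM side of this can be made concrete with Viterbi decoding, the standard way an HMM-based gene finder turns per-state probabilities into an annotation. The sketch below is a toy, not the project's model: it assumes a hypothetical two-state HMM (non-coding `N` versus coding `C`) whose only signal is GC content, with made-up start, transition and emission probabilities. Real gene finders add states for codon position, splice sites and start/stop codons, and these are the parameters that become hard to estimate for a poorly annotated genome.

```python
import math

# Toy two-state HMM: "N" = non-coding, "C" = coding.
# All probabilities are hypothetical, chosen only for illustration.
STATES = ("N", "C")
START = {"N": 0.9, "C": 0.1}
TRANS = {
    "N": {"N": 0.95, "C": 0.05},
    "C": {"N": 0.05, "C": 0.95},
}
EMIT = {
    "N": {"A": 0.35, "C": 0.15, "G": 0.15, "T": 0.35},  # AT-rich
    "C": {"A": 0.15, "C": 0.35, "G": 0.35, "T": 0.15},  # GC-rich
}


def viterbi(seq, states=STATES, start=START, trans=TRANS, emit=EMIT):
    """Return the most probable state path for seq and its log-probability."""
    log = math.log
    # score[s] = best log-probability of any path ending in state s so far
    score = {s: log(start[s]) + log(emit[s][seq[0]]) for s in states}
    backptrs = []
    for base in seq[1:]:
        prev_score = score
        score, ptr = {}, {}
        for s in states:
            # best predecessor state for s at this position
            best_prev = max(states, key=lambda p: prev_score[p] + log(trans[p][s]))
            score[s] = prev_score[best_prev] + log(trans[best_prev][s]) + log(emit[s][base])
            ptr[s] = best_prev
        backptrs.append(ptr)
    # trace back from the best final state
    last = max(states, key=lambda s: score[s])
    path = [last]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    path.reverse()
    return "".join(path), score[last]


if __name__ == "__main__":
    seq = "ATATTAAT" + "GCGCCGGC" + "TTAATA"
    path, logp = viterbi(seq)
    print(seq)
    print(path)
```

With these toy numbers, the 8-base GC-rich run gains more emission log-probability than the two `N`/`C` switches cost, so it is labelled `C`; a GC run of six bases or fewer between AT-rich flanks would stay `N`. Such trade-offs are set entirely by the transition and emission parameters, which is why tuning them matters when little training data exists.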

Experimental validation will use synthetic genes suitable for structural and
functional studies. Doctoral training in molecular, cellular and structural
biology will be provided at the Department of Biochemistry by Dr. Gangloff
and facility managers, Dr. Chirgadze and Dr. Stott, respectively. Structural
techniques (X-ray crystallography and cryo-electron microscopy) will be
combined with biophysical tools to understand the mechanism of mosquito
adaptation and/or pathogenic immune evasion in atomic detail.
Mosquito cell-based signalling assays and confocal fluorescence microscopy
will determine the cellular localization of the gene products and the
co-localization of adaptor molecules. Structure-function relationships will
provide insight into the mechanism of protein network evolution.

Tuning RNN and HMM annotation models is typically complex. It involves
multiple parameters, including the DNA and protein substitution matrices for
the target genome, and existing models are poorly tuned for less-studied model
organisms. This high-dimensional hyperparameter space cannot easily be searched
with standard approaches such as grid search, so more sophisticated techniques
such as Gaussian process models are required. These algorithms will be
developed in the course of the project to run across the’s distributed
processing network, thereby providing valuable use cases for the system.
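One common way Gaussian process models are used for this kind of tuning is Bayesian optimisation: a GP is fitted to the hyperparameter settings tried so far, and the next setting is chosen by an acquisition function such as expected improvement. The sketch below is a minimal, dependency-free illustration under stated assumptions, not the project's algorithm: it assumes hyperparameters rescaled to [0, 1], an RBF kernel with a hypothetical fixed length-scale, a crude random-candidate search for the acquisition step, and a toy quadratic standing in for an annotation model's validation error. It does not model the distributed network.

```python
import math
import random

# Hypothetical settings for illustration only (not taken from the project).
LENGTH_SCALE = 0.2   # RBF kernel length-scale on [0, 1]-scaled hyperparameters
JITTER = 1e-6        # diagonal jitter keeps the kernel matrix positive definite


def rbf(a, b):
    """Squared-exponential (RBF) kernel with unit prior variance."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-0.5 * d2 / LENGTH_SCALE ** 2)


def cholesky(K):
    """Return lower-triangular L with L L^T = K for symmetric positive-definite K."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def solve_lower(L, b):
    """Solve L x = b by forward substitution."""
    x = []
    for i in range(len(b)):
        x.append((b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x


def solve_upper_t(L, b):
    """Solve L^T x = b by back substitution (L lower-triangular)."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_fit(X, y):
    """Factorise K + jitter*I once; alpha = K^{-1} y is reused for every prediction."""
    K = [[rbf(a, b) + (JITTER if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    L = cholesky(K)
    return L, solve_upper_t(L, solve_lower(L, y))


def gp_predict(X, L, alpha, x):
    """Posterior mean and variance of the zero-mean GP at point x."""
    k = [rbf(a, x) for a in X]
    mean = sum(ki * ai for ki, ai in zip(k, alpha))
    v = solve_lower(L, k)
    var = max(rbf(x, x) - sum(t * t for t in v), 1e-12)
    return mean, var


def expected_improvement(mean, var, best):
    """Expected improvement below the current best value (minimisation)."""
    sd = math.sqrt(var)
    z = (best - mean) / sd
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (best - mean) * cdf + sd * pdf


def bayes_opt(objective, dim, n_init=3, n_iter=15, n_candidates=500, seed=0):
    """Minimise objective over [0, 1]^dim using a GP surrogate and EI."""
    rng = random.Random(seed)

    def sample():
        return tuple(rng.random() for _ in range(dim))

    X = [sample() for _ in range(n_init)]
    y = [objective(x) for x in X]
    for _ in range(n_iter):
        L, alpha = gp_fit(X, y)
        best = min(y)
        # Crude acquisition search: score random candidates, keep the best.
        cands = [sample() for _ in range(n_candidates)]
        x_next = max(cands, key=lambda c: expected_improvement(
            *gp_predict(X, L, alpha, c), best))
        X.append(x_next)
        y.append(objective(x_next))
    i = min(range(len(y)), key=y.__getitem__)
    return X[i], y[i]


if __name__ == "__main__":
    # Toy stand-in for "validation error as a function of two scaled hyperparameters".
    toy_loss = lambda h: (h[0] - 0.3) ** 2 + (h[1] - 0.7) ** 2
    print(bayes_opt(toy_loss, dim=2))
```

Each iteration refits the GP once and reuses the Cholesky factor for every candidate. Scoring random candidates is the simplest possible acquisition optimiser and is used here only to keep the sketch self-contained; a practical system would also fit the kernel hyperparameters rather than fixing them.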

Academic Supervisors - Dr Monique Gangloff and Professor Nick Gay (University
of Cambridge)

Industry Supervisors - Dr Jonathan Ward and Professor Thomas Hain (

Entry requirements

Applicants must have obtained a First or Upper Second Class UK honours
degree, or equivalent qualifications gained outside the UK, in an appropriate
area of science or technology, including Mathematics, Computer Science,
Physics or the Biological Sciences. Previous experience in programming,
mathematics and data analysis would be advantageous.

Fixed-term: The funds for this post are available for 4 years in the first

Please contact Dr Monique Gangloff ( with your CV and two
references as soon as possible to discuss your application. More information
can be found at

Please quote reference PH15820 on your application and in any correspondence
about this vacancy. The University values diversity and is committed to
equality of opportunity.
The University has a responsibility to ensure that all employees are eligible
to live and work in the UK.

University of Cambridge (UK)