Collaboration and data sources
We received 212 samples of RNA extracted from mosquito pools that tested positive for West Nile virus from Alex Westrich, Claudia Blanco, and Raed Mansour of the City of Chicago Department of Public Health. We selected samples for sequencing from each trap location within the city for which we had samples. Ultimately, 26 samples were selected and had suitable genome coverage after sequencing for phylogenetic analysis. All samples came from a single year, 2018. In this update, we highlight the sequencing results and a few conclusions drawn from a maximum likelihood phylogenetic analysis.
The sequencing data was generated using an amplicon-based sequencing scheme, PrimalSeq (Grubaugh et al., Genome Biology, 2019). Our full protocol is available online here. Sequencing reads were aligned using bwa and processed using iVar.
Consensus sequences and BAM files along with associated metadata are available on Google Cloud. Alignment statistics are shown in Table 1 below.
Table 1. Alignment statistics
|Sample|Percent Genome Coverage|Mean Coverage per Nucleotide|
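For readers who want to recompute the two statistics in Table 1 from the released BAM files, here is a minimal sketch. It assumes per-position read depths in the three-column, tab-separated format written by `samtools depth -a` (reference, position, depth). The function names and the `min_depth=10` threshold for counting a position as covered are illustrative assumptions, not the project's actual settings.

```python
def parse_depth_lines(lines):
    """Extract the depth column from tab-separated lines: reference, position, depth."""
    depths = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        _ref, _pos, depth = line.split("\t")[:3]
        depths.append(int(depth))
    return depths


def coverage_stats(depths, min_depth=10):
    """Return (percent genome coverage, mean coverage per nucleotide).

    min_depth is a hypothetical cutoff for calling a position "covered".
    """
    if not depths:
        raise ValueError("depths must not be empty")
    covered = sum(1 for d in depths if d >= min_depth)
    percent_covered = 100.0 * covered / len(depths)
    mean_depth = sum(depths) / len(depths)
    return percent_covered, mean_depth


# Toy 10-position "genome"; these depths are invented for illustration.
toy_depths = [0, 0, 5, 12, 40, 40, 38, 15, 9, 25]
pct, mean = coverage_stats(toy_depths)
print(f"{pct:.1f}% covered, mean depth {mean:.1f}")

# The phylogeny described in this update used samples with genome coverage above 70%.
keep_for_phylogeny = pct > 70.0
```

The percentage depends on the depth cutoff chosen, so a value computed this way will only match the table if the same cutoff is used.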
We constructed a maximum likelihood (ML) phylogeny using 1927 West Nile virus genomes from the USA, including the 26 provided samples that had genome coverage greater than 70% (highlighted in light blue in the tree). The tree and root-to-tip regression plot are shown in Figure 1.
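As background on the root-to-tip plot: it regresses each tip's genetic distance from the root against its sampling date. The slope estimates the evolutionary rate, and the x-intercept estimates the date of the root. The sketch below is a generic ordinary least squares version, assuming dates expressed as decimal years and distances in substitutions per site. The function name and toy values are hypothetical and are not taken from our tree.

```python
def root_to_tip_regression(dates, distances):
    """Least-squares fit of root-to-tip distance against sampling date.

    Returns (slope, intercept, r_squared, root_date), where slope is the
    estimated rate and root_date is the x-intercept (None if slope is 0).
    """
    n = len(dates)
    if n != len(distances) or n < 2:
        raise ValueError("need two or more paired observations")
    mean_x = sum(dates) / n
    mean_y = sum(distances) / n
    sxx = sum((x - mean_x) ** 2 for x in dates)
    if sxx == 0:
        # With a single sampling date there is no temporal signal to fit.
        raise ValueError("need at least two distinct sampling dates")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dates, distances))
    syy = sum((y - mean_y) ** 2 for y in distances)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 0.0
    root_date = -intercept / slope if slope != 0 else None
    return slope, intercept, r_squared, root_date


# Invented toy data lying exactly on a line with rate 5e-4 and root date 1998.
toy_dates = [2000.0, 2005.0, 2010.0, 2018.0]
toy_distances = [0.001, 0.0035, 0.006, 0.010]
rate, _, r2, root = root_to_tip_regression(toy_dates, toy_distances)
print(f"rate={rate:.2e} subs/site/year, R^2={r2:.3f}, root date={root:.1f}")
```

The fit needs tips from more than one sampling date, which is why the regression is run on the full tree and not only on the 2018 Chicago samples.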
Illinois sequences are spread throughout the tree in clades containing mostly Eastern United States sequences. In general, these sequences do not cluster together within the tree; instead, they form multiple separate clades with sequences from a range of states, including Louisiana, New Hampshire, New York, and Ohio. There are two likely explanations. First, this could be evidence of more extensive mixing of transmission chains in the Eastern United States than we have seen in the West (see California and Washington). Under this explanation, environmental and/or host conditions prevent the virus from overwintering in Illinois, and it is readily replaced each year by another introduced virus. However, long branch lengths between neighboring sequences in these areas of the tree, together with having sequencing information from only a single year, make it hard to pin down exactly where and how often these introductions occur. Sampling from neighboring states and previous years is necessary to confirm this. Second, because this is preliminary data, an alternative explanation is that these sequences contain contamination, since most of the sequences that the Illinois genomes cluster with were processed and sequenced in our lab. We are exploring both possibilities and will report our findings when we have something conclusive.
Please note that this data is released as a work in progress by the WestNile 4K Project and should be considered preliminary. If you intend to include any of this data in publications, please let us know; otherwise, please feel free to download and use it without restrictions. We have shared this data in the hope that people will download and use it, and scrutinize it so that we can improve our methods and analyses. Please contact us if you have any questions or comments.