Collaboration and Data Sources
The samples from San Diego County were provided by Nikos Garfield and Saran Grewal of the San Diego County Vector Control Program. The samples from all other counties in California, including Sacramento-Yolo and Kern, were provided by Ying Fang and Chris Barker of the Barker Lab, University of California, Davis, and by Sarah Wheeler of the Sacramento-Yolo Mosquito and Vector Control Program.
The samples from Washington (WA) were provided by Krisztian Magori of Eastern Washington University and by Amy Salamone, Wayne Clifford and David Kangiser of the Washington State Department of Health.
Data Generation
The sequencing data were generated using PrimalSeq (Grubaugh et al., Genome Biology, 2019). Our full protocol is available online here. Sequence reads were aligned using bwa and processed using iVar.
Raw Data
The consensus sequences and associated metadata are available on GitHub. The alignment files in BAM format are available on Google Cloud: California and Washington.
Percent coverage and depth per nucleotide for all alignments are available in a TSV file on GitHub.
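As a rough illustration of how percent coverage and mean depth relate to per-position depth, here is a minimal sketch. It assumes a hypothetical TSV layout with columns `sample`, `position` and `depth`, and a hypothetical minimum depth of 10 for a position to count as covered; neither the column names nor the threshold are taken from the file on GitHub, so adjust them to match the actual data.

```python
import csv
import io


def coverage_summary(tsv_text, min_depth=10):
    """Return {sample: (percent_covered, mean_depth)} from per-position depth rows.

    Assumed (hypothetical) columns: sample, position, depth.
    A position counts as covered when its depth is >= min_depth.
    """
    totals = {}  # sample -> (positions seen, positions covered, summed depth)
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        depth = int(row["depth"])
        n, covered, depth_sum = totals.get(row["sample"], (0, 0, 0))
        totals[row["sample"]] = (
            n + 1,
            covered + (1 if depth >= min_depth else 0),
            depth_sum + depth,
        )
    return {s: (100.0 * c / n, d / n) for s, (n, c, d) in totals.items()}


# Toy input with made-up sample names and depths, for illustration only.
example_tsv = (
    "sample\tposition\tdepth\n"
    "W1\t1\t12\n"
    "W1\t2\t0\n"
    "W1\t3\t30\n"
    "W1\t4\t10\n"
    "W2\t1\t5\n"
    "W2\t2\t5\n"
)

summary = coverage_summary(example_tsv)
for sample, (pct, mean_depth) in sorted(summary.items()):
    print(f"{sample}\t{pct:.1f}% covered\tmean depth {mean_depth:.1f}")
```

If the published file already reports one row per alignment, the same two quantities can be read directly without any aggregation.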
Preliminary Analysis
A total of 615 samples were used in this analysis, of which 7 were from Washington and the remainder from California. The sampling scheme in California, shown in Fig 3B, was designed specifically to explore the longitudinal transmission dynamics of West Nile virus (WNV).
We constructed a maximum likelihood (ML) phylogeny using 1167 sequences from the USA. Fig 2 illustrates the spread of the virus from east to west. From the ML tree in Fig 2, we extracted all clades in which >= 90% of sequences are from CA; these clades are shown in Fig 3.
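The clade selection described above can be sketched as follows. This is an illustration, not the code used for the analysis. It assumes a toy tree encoded as nested tuples, with tip names that start with a state code followed by an underscore (for example `CA_1`). The tip naming, the `min_tips` parameter and the choice to report only the largest qualifying clades are all assumptions made for this example.

```python
def tip_states(node):
    """Return the state label of every tip under a node."""
    if isinstance(node, str):
        # Hypothetical tip naming: "<STATE>_<id>", e.g. "CA_1".
        return [node.split("_")[0]]
    states = []
    for child in node:
        states.extend(tip_states(child))
    return states


def state_clades(node, state="CA", threshold=0.9, min_tips=2):
    """Return the largest clades in which >= threshold of tips are from state.

    The tree is searched from the root down; once a clade qualifies it is
    reported and its subclades are not examined further.
    """
    states = tip_states(node)
    if len(states) >= min_tips and states.count(state) / len(states) >= threshold:
        return [node]
    if isinstance(node, str):
        return []
    found = []
    for child in node:
        found.extend(state_clades(child, state, threshold, min_tips))
    return found


# Toy tree with made-up tips, for illustration only.
toy_tree = (
    ("TX_1", "AZ_1"),
    (("CA_1", "CA_2"), ("CA_3", ("CA_4", "CA_5"))),
    ("NY_1", ("CA_6", "WA_1")),
)

clades = state_clades(toy_tree)
for clade in clades:
    print(len(tip_states(clade)), "tips:", clade)
```

On a real tree, the same traversal could be applied after parsing the Newick file with a phylogenetics library and reading the state of each tip from the metadata instead of from its name.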
In the uppermost clades in the ML tree (shown above), sequences from San Diego (red), Kern (orange) and Sacramento-Yolo counties (shades of green) cluster together. This indicates movement of the virus between the north and the south of the state. In addition, sequences from San Diego cluster directly with sequences from Sacramento, indicating that this may be a transmission chain covering a large distance. This could suggest transmission via migratory birds, but further analysis using time-resolved phylogenies is essential to investigate this hypothesis.
In the lower clades of the ML tree (also shown below), the sequences from WA (labelled WA) diverge from a clade composed predominantly of sequences from Kern. However, because of the long genetic distance between the two clades and the low support value of the basal node, it is difficult to point to evidence of transmission between the two states. There is also transmission of the virus between San Diego and Kern, and a large San Diego clade diverges from sequences from Texas and Arizona. This lends further support to the hypothesis that the virus spread to CA from the southern states, with geographic barriers such as the Rocky Mountains curtailing direct spread from east to west.
Disclaimer
Please note that these data are released as work in progress by the WestNile 4K Project and should be considered preliminary. If you intend to include any of these data in publications, please let us know; otherwise, please feel free to download and use them without restrictions. We have shared these data in the hope that people will download and use them, and scrutinize them so that we can improve our methods and analyses. Please contact us if you have any questions or comments.