GLOBE Projects


Predicting West Nile Virus Mosquito Positivity Rates and Abundance: A Comparative Evaluation of Machine Learning Methods for Epidemiological Applications

Student(s): Alexander Greco, Jillian Chang, Julianna Schneider, Luke Shao, Maria Molchanova
Grade Level: Secondary School (grades 9-12, ages 14-18)
GLOBE Educator(s): Cassie Soeffing
Contributors: Dr. Rusty Low, scientist, IGES; Peder Nelson, scientist, OSU; Dr. Erika Podest, scientist, NASA JP
Report Type(s): International Virtual Science Symposium Report, Mission Mosquito Report
Protocols: Land Cover Classification, Mosquitoes
Language(s): English
Date Submitted: 02/10/2022
Mosquitoes are a public health concern because they are major vectors of disease. Some cities run programs that track mosquito abundance from traps, but this fieldwork is expensive and time-consuming. Our project presents a comparative analysis of two machine-learning-based regression techniques for predicting the rate at which mosquito abundance changes and the rate at which mosquitoes test positive for West Nile Virus. Our methodology takes in climatic data obtained through remote sensing and outputs, at each time step, the derivative of the number of mosquitoes in Chicago and of the proportion of them that are active vectors for West Nile Virus. We narrowed down the many climatic variables identified in our literature review by using p-values to assess their statistical significance in predicting our desired outputs. We ran ordinary least squares regression on each input individually and then in groups to determine which groupings best indicate mosquito abundance or disease positivity rate. Using these groups, we then trained four machine learning models using two types of regression: a Random Forest Regressor and Backward Elimination Linear Regression. We tuned our Random Forest with a Randomized Search Cross Validation and ran our Backward Elimination model with a p-value threshold of 0.05. Our climatic input variables were statistically significant predictors of how quickly the mosquito population in Chicago changed and how quickly mosquitoes acquired West Nile Virus. Both the Random Forest Regressor and Backward Elimination Linear Regression accurately predicted the rate at which the mosquito population increased or decreased and the rate at which mosquitoes became West Nile Virus vectors. The Random Forest proved to be the better machine learning model for predicting these mosquito outputs from climatic, land cover, and ecological factors. Consequently, our methodology and results hold potential for valuable applications in public health programs.
Moreover, the limitations of this study demonstrate the importance of reliable, frequently recorded data for improving machine learning models and for combating the spread of mosquito-borne diseases. Citizen science sources such as the German study and GLOBE Observers supply high volumes of data that scientists and machine learning models can process, providing an optimistic outlook for future epidemiological applications.
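
The backward elimination step described above can be illustrated with a minimal sketch. It is not the project's actual code: the data are synthetic, the names `temperature`, `precipitation`, and `noise_feature` are hypothetical, the weekly first difference stands in for the derivative of trap counts, and p-values use a normal approximation to the t distribution (an assumption that is reasonable only when there are many more observations than predictors).

```python
import math

import numpy as np


def ols_p_values(X, y):
    """Fit OLS with an intercept and return a two-sided p-value per column of X.

    Assumption: p-values come from a normal approximation to the
    t distribution, so this is only suitable for large samples.
    """
    n = X.shape[0]
    design = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    dof = n - design.shape[1]
    sigma2 = (residuals @ residuals) / dof
    cov = sigma2 * np.linalg.inv(design.T @ design)
    z = coef / np.sqrt(np.diag(cov))
    p = np.array([math.erfc(abs(v) / math.sqrt(2.0)) for v in z])
    return p[1:]  # drop the intercept's p-value


def backward_eliminate(X, y, names, alpha=0.05):
    """Drop the least significant predictor until every p-value is below alpha."""
    keep = list(range(X.shape[1]))
    while keep:
        p = ols_p_values(X[:, keep], y)
        worst = int(np.argmax(p))
        if p[worst] < alpha:
            break
        keep.pop(worst)
    return [names[i] for i in keep]


# Synthetic weekly data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
weeks = 200
temperature = rng.normal(20.0, 5.0, weeks)
precipitation = rng.normal(3.0, 1.0, weeks)
noise_feature = rng.normal(0.0, 1.0, weeks)

# Build trap counts whose weekly change depends on temperature and precipitation.
change = (
    2.0 * (temperature[:-1] - 20.0)
    + 5.0 * (precipitation[:-1] - 3.0)
    + rng.normal(0.0, 1.0, weeks - 1)
)
counts = 1000.0 + np.concatenate([[0.0], np.cumsum(change)])

# Target: the rate of change at each time step, taken as a first difference.
rate = np.diff(counts)

X = np.column_stack([temperature[:-1], precipitation[:-1], noise_feature[:-1]])
names = ["temperature", "precipitation", "noise_feature"]
selected = backward_eliminate(X, rate, names)
print(selected)
```

Because the synthetic target is constructed from `temperature` and `precipitation`, both should survive elimination. Whether `noise_feature` is dropped depends on the random draw, since an irrelevant predictor still falls below the 0.05 threshold about one time in twenty.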