Student Research Reports
USING FIRE DATA TO IMPROVE ABUNDANCE MODELS
Country: United States of America
Student(s): Raymond Lin, Haley Oba, Krish Desai, Ashmit Dewan, Zarar Haider
Grade Level: Secondary School (grades 9-12, ages 14-18)
GLOBE Educator(s): Cassie Soeffing
Contributors: Dr. Rusty Low, IGES, scientist
Peder Nelson, OSU, SME
Dr. Erika Podest, NASA JPL, scientist
Andrew Clark, IGES, EO Researcher and Data Analyst
Report Type(s): International Virtual Science Symposium Report, Mission Mosquito Report
Protocols: Earth As a System, Mosquitoes
Language(s): English
Date Submitted: 01/24/2023
Mosquitoes have been a major health concern for decades, and as climate change expands their range, their threat to public health is increasing. In response, machine learning models that predict mosquito abundance have been studied in numerous locations. Our research builds on this work and explores novel methods: using natural disaster data, optimizing hyperparameters through Bayesian Search, and inspecting models with Partial Dependence (PDP) and Individual Conditional Expectation (ICE) plots. Based on previous work, we selected four base ecological variables. We then acquired variations of these base variables and assessed their effectiveness by training Random Forest Regressors (RFR) with each variation in place of its base variable. Of all the variations, only minimum daily temperature performed better than its base variable (mean daily temperature). Our final model used the best variable variations and our custom forest fire index. We optimized all our models using Bayesian Search, which we found to be more effective than Grid Search. Our final RFR model had a root mean squared error (RMSE) of 3.94 on the test set. To see whether the forest fire index had any impact on accuracy, we used drop-column variable importance, which measures a variable's importance directly by retraining the model without it. We found that the forest fire index marginally increased accuracy, which we consider the best-case scenario for rare-occurrence data, where most of the values are 0. Using PDP and ICE plots, we found that our model learned relationships between variables such as temperature and mosquito abundance that are consistent with field and lab findings. Further research should be done on machine learning model inspection and its use cases. Within mosquito research, further work can explore other novel datasets, like forest fires, to form a more comprehensive understanding of mosquito abundance.
Keywords: machine learning, mosquito abundance, forest fire, model inspection, feature
optimization
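
As a rough illustration of the drop-column variable importance mentioned in the abstract, the sketch below retrains a model once per variable with that variable removed and records how much the test RMSE rises. It is not the authors' code: the data are synthetic, the names `temp`, `fire`, `knn_fit`, and `drop_column_importance` are hypothetical, and a small k-nearest-neighbours regressor stands in for the Random Forest Regressor so the example needs no external libraries.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def knn_fit(X, y, k=3):
    """Stand-in model (assumption): k-nearest-neighbours regression.

    Returns a predict(rows) function. Any fitting callable with the same
    shape, such as a Random Forest wrapper, could be passed instead.
    """
    def predict(rows):
        preds = []
        for row in rows:
            # Rank training rows by squared Euclidean distance to this row.
            order = sorted(
                range(len(X)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(X[i], row)),
            )
            nearest = order[:k]
            preds.append(sum(y[i] for i in nearest) / len(nearest))
        return preds
    return predict


def drop_column(X, j):
    """Return a copy of X with column j removed."""
    return [row[:j] + row[j + 1:] for row in X]


def drop_column_importance(fit, X_train, y_train, X_test, y_test):
    """Importance of column j = test RMSE without column j minus baseline RMSE."""
    baseline = rmse(y_test, fit(X_train, y_train)(X_test))
    importances = []
    for j in range(len(X_train[0])):
        predict = fit(drop_column(X_train, j), y_train)
        score = rmse(y_test, predict(drop_column(X_test, j)))
        importances.append(score - baseline)
    return baseline, importances


# Synthetic data (assumption): abundance depends strongly on temperature and
# weakly on a rare-occurrence fire index that is 0 for most rows.
rows = []
for i in range(120):
    temp = 10 + (i * 7 % 120) * 0.2
    fire = 1.0 if i % 8 == 0 else 0.0
    abundance = 2 * temp + 5 * fire
    rows.append(([temp, fire], abundance))

X = [features for features, _ in rows]
y = [target for _, target in rows]
X_train, y_train = X[:90], y[:90]
X_test, y_test = X[90:], y[90:]

baseline, importances = drop_column_importance(knn_fit, X_train, y_train, X_test, y_test)
for name, gain in zip(["temp", "fire"], importances):
    print(f"{name}: RMSE change when dropped = {gain:+.3f}")
```

A positive value means the test RMSE rose when the variable was removed, so the variable was helping the model. Values near zero suggest the variable contributes little, which is the pattern one would expect from a rare-occurrence variable such as a fire index.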