File(s) under embargo
Reason: Three papers from the thesis are in preparation for submission in the next few months.
MULTI-TEMPORAL MULTI-MODAL PREDICTIVE MODELLING OF PLANT PHENOTYPES
High-throughput phenotyping using remote sensing (RS) data with high spatial, spectral, and temporal resolution has become a critical part of the plant breeding chain, where the focus is on reducing the time and cost of selecting the “best” genotypes with respect to the trait(s) of interest. This study investigates the potential for accurate and reliable prediction of sorghum biomass using hyperspectral and LiDAR data acquired by sensors mounted on UAV platforms. The experiments comprised multiple varieties of grain and forage sorghum, including some photoperiod-sensitive varieties, providing an opportunity to evaluate a wide range of genotypes and phenotypes.
Feature extraction is investigated: various novel features, as well as traditional features, are extracted directly from the hyperspectral imagery and LiDAR point cloud data and used as inputs to classical machine learning (ML) regression-based models. Predictive models are developed for multiple experiments conducted during the 2017, 2018, and 2019 growing seasons at the Agronomy Center for Research and Education (ACRE) at Purdue University. The impact of the regression method, data source, timing of RS and field-based biomass reference data acquisition, and number of samples on the prediction results is investigated. R² values for end-of-season biomass ranged from 0.64 to 0.89 across experiments when features from all the data sources were included. Using geometry-based features derived from the LiDAR point cloud and chemistry-based features extracted from the hyperspectral data provided the most accurate predictions. An analysis of variance (ANOVA) of the accuracies of the predictive models showed that both the data source and the regression method are important factors for reliable prediction; however, the data source was more important, with 69% significance versus 28% for the regression method. The characteristics of the experiments, including the number of samples and the type of sorghum genotypes, also impacted prediction accuracy.
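The abstract does not list the specific features or regression methods, so the following is only a minimal sketch of the general workflow under stated assumptions: per-plot LiDAR height percentiles stand in for the geometry-based features, a plot-mean NDVI stands in for a traditional spectral feature, ordinary least squares stands in for the regression method, and the data are synthetic. All function names are hypothetical.

```python
import math
import random


def height_percentile(heights, q):
    """Percentile q (0-100) of LiDAR point heights, linearly interpolated."""
    s = sorted(heights)
    pos = (len(s) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def ndvi(nir, red):
    """NDVI from plot-mean NIR and red reflectance (a traditional spectral feature)."""
    return (nir - red) / (nir + red)


def fit_ols(X, y):
    """Ordinary least squares with an intercept, solving the normal equations
    (A^T A) b = A^T y by Gauss-Jordan elimination with partial pivoting."""
    A = [[1.0] + list(row) for row in X]
    p = len(A[0])
    # Augmented matrix [A^T A | A^T y].
    M = [[sum(a[i] * a[j] for a in A) for j in range(p)]
         + [sum(a[i] * t for a, t in zip(A, y))] for i in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(p):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [m - f * n for m, n in zip(M[r], M[c])]
    return [M[i][p] / M[i][i] for i in range(p)]  # [intercept, w_1, ..., w_k]


def predict(b, X):
    """Apply fitted coefficients [intercept, w_1, ..., w_k] to each feature row."""
    return [b[0] + sum(w * x for w, x in zip(b[1:], row)) for row in X]


def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = sum(y) / len(y)
    ss_res = sum((t - yh) ** 2 for t, yh in zip(y, y_hat))
    ss_tot = sum((t - y_bar) ** 2 for t in y)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    rng = random.Random(0)
    X, y = [], []
    for _ in range(40):  # 40 synthetic plots
        top = rng.uniform(1.0, 4.0)  # canopy height (m)
        heights = [rng.uniform(0.0, top) for _ in range(200)]
        nir, red = rng.uniform(0.3, 0.6), rng.uniform(0.03, 0.10)
        feats = [height_percentile(heights, 90), ndvi(nir, red)]
        X.append(feats)
        # Synthetic "biomass" generated from the features plus noise.
        y.append(5.0 * feats[0] + 8.0 * feats[1] + rng.gauss(0.0, 0.5))
    b = fit_ols(X, y)
    print("training R^2:", round(r_squared(y, predict(b, X)), 3))
```

In an actual evaluation, R² would be computed on held-out plots or experiments rather than on the training data shown in this demo.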
The inclusion of genomic information and weather data in “multi-year” predictive models of end-of-season biomass is also investigated. Models based on one and two years of data are used to predict biomass yield in future years. The results show the high potential of these models for predicting both biomass and biomass rank. While models developed using one year of data are able to predict biomass rank, using two years of data resulted in more accurate models, especially when RS data, which encode the environmental variation, are included. The possibility of developing predictive models using RS data collected only until mid-season, rather than over the full season, is also investigated. The results show that models using RS data up to 60 days after sowing (DAS) can predict biomass rank with R² values of approximately 0.65 to 0.70. This not only reduces the time required for phenotyping by avoiding the manual sampling process, but also decreases the time and cost of RS data collection and the associated challenges of time-consuming processing and analysis of large data sets, particularly for hyperspectral imaging data.
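The abstract reports rank predictions as R² values without giving the formula. One plausible interpretation, assumed here, is the squared correlation between the observed and predicted biomass rankings (the square of Spearman's rho). The sketch also shows the mid-season restriction as a simple filter on days after sowing. Function names and sample values are hypothetical.

```python
import math


def truncate_to_das(series, max_das=60):
    """Keep only (DAS, value) observations acquired on or before max_das."""
    return [(das, v) for das, v in series if das <= max_das]


def ranks(values):
    """Rank values from 1 (smallest) upward; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return out


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def rank_r_squared(observed, predicted):
    """Squared Pearson correlation between the two rankings (squared Spearman rho)."""
    return pearson(ranks(observed), ranks(predicted)) ** 2


if __name__ == "__main__":
    season = [(25, 0.4), (45, 1.1), (60, 1.9), (80, 2.7), (110, 3.2)]  # hypothetical (DAS, height m)
    print(truncate_to_das(season))  # only flights up to 60 DAS would feed the model
    observed = [12.0, 18.5, 9.3, 22.1, 15.0]   # hypothetical end-of-season biomass
    predicted = [11.0, 16.0, 10.5, 19.0, 17.2]  # hypothetical model output
    print("rank R^2:", round(rank_r_squared(observed, predicted), 3))
```

A rank-based score rewards a model that orders genotypes correctly even when its absolute biomass values are biased, which matches the breeding goal of selecting the best genotypes.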
In addition to extracting features from the hyperspectral and LiDAR data and developing classical ML-based predictive models, supervised and unsupervised feature learning based on fully connected, convolutional, and recurrent neural networks is also investigated. For hyperspectral data, supervised feature extraction provides more accurate predictions, whereas for LiDAR data, features learned through unsupervised training yield more accurate predictions.
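The abstract does not specify the network architectures, so the following is only a minimal sketch of the unsupervised case using a small fully connected autoencoder: the encoder compresses a per-plot feature vector (for example, LiDAR-derived descriptors) into a short code by learning to reconstruct its own input, without using biomass labels, and the code can then be passed to a regression model. In the supervised case, the encoder would instead be trained against the biomass target. The layer sizes, learning rate, and synthetic data are assumptions, and the class and function names are hypothetical.

```python
import math
import random


class TinyAutoencoder:
    """One-hidden-layer autoencoder: z = tanh(W x), x_hat = V z (biases omitted)."""

    def __init__(self, n_in, n_code, seed=0):
        rng = random.Random(seed)
        self.W = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_code)]
        self.V = [[rng.gauss(0.0, 0.1) for _ in range(n_code)] for _ in range(n_in)]

    def encode(self, x):
        """Learned low-dimensional features for one input vector."""
        return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in self.W]

    def decode(self, z):
        return [sum(v * zj for v, zj in zip(row, z)) for row in self.V]

    def mse(self, data):
        """Mean squared reconstruction error over a data set."""
        total = 0.0
        for x in data:
            x_hat = self.decode(self.encode(x))
            total += sum((a - b) ** 2 for a, b in zip(x_hat, x)) / len(x)
        return total / len(data)

    def sgd_step(self, x, lr=0.05):
        """One gradient step on 0.5 * ||x_hat - x||^2 (no labels are used)."""
        z = self.encode(x)
        err = [a - b for a, b in zip(self.decode(z), x)]
        # Error signal at the code layer, computed before V is updated.
        dz = [sum(err[i] * self.V[i][j] for i in range(len(x))) for j in range(len(z))]
        for i in range(len(x)):
            for j in range(len(z)):
                self.V[i][j] -= lr * err[i] * z[j]
        for j in range(len(z)):
            g = dz[j] * (1.0 - z[j] ** 2)  # derivative of tanh
            for k in range(len(x)):
                self.W[j][k] -= lr * g * x[k]


def synthetic_plots(n, n_in=8, seed=1):
    """Hypothetical 8-D plot descriptors driven by two hidden factors."""
    rng = random.Random(seed)
    loadings = [[rng.gauss(0.0, 0.5) for _ in range(2)] for _ in range(n_in)]
    data = []
    for _ in range(n):
        f = [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
        data.append([sum(l * fi for l, fi in zip(row, f)) for row in loadings])
    return data


if __name__ == "__main__":
    data = synthetic_plots(50)
    ae = TinyAutoencoder(n_in=8, n_code=2)
    print("MSE before:", round(ae.mse(data), 4))
    for _ in range(200):
        for x in data:
            ae.sgd_step(x)
    print("MSE after:", round(ae.mse(data), 4))
    print("learned features of first plot:", ae.encode(data[0]))
```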
Predictive models based on Recurrent Neural Networks (RNNs) are designed and implemented to accommodate high-dimensional, multi-modal, multi-temporal data. RS data and weather data are incorporated in the RNN models. Results from multiple experiments focused on high-throughput phenotyping of sorghum for biomass prediction are provided and evaluated. Training the proposed RNNs on one experiment and predicting biomass for other experiments with different types of sorghum varieties illustrates both the potential of the network for biomass prediction and the challenges related to small sample sizes, weather, and sensitivity to the associated ground reference information.
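The abstract does not describe the RNN architecture, so the following is only a minimal sketch of the general idea, assuming a single-layer Elman RNN: at each acquisition date, the RS features and the weather summary for that interval are concatenated into one input vector, a hidden state carries information across dates, and a linear head maps the final state to biomass. Training (backpropagation through time) is omitted, the weights are random, and every name and value is hypothetical.

```python
import math
import random


def init_rnn(n_in, n_hidden, seed=0):
    """Small random weights for an Elman RNN with a scalar regression head."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        "W_x": mat(n_hidden, n_in),      # input-to-hidden
        "W_h": mat(n_hidden, n_hidden),  # hidden-to-hidden (recurrence)
        "b": [0.0] * n_hidden,
        "w_out": [rng.gauss(0.0, 0.1) for _ in range(n_hidden)],
        "b_out": 0.0,
    }


def rnn_predict(params, rs_seq, weather_seq):
    """Forward pass over one plot's acquisitions:
    h_t = tanh(W_x x_t + W_h h_{t-1} + b), biomass = w_out . h_T + b_out,
    where x_t concatenates the RS and weather features for acquisition t."""
    h = [0.0] * len(params["b"])
    for rs, wx in zip(rs_seq, weather_seq):
        x = list(rs) + list(wx)  # multi-modal fusion by concatenation (assumed)
        h = [
            math.tanh(
                sum(w * xi for w, xi in zip(params["W_x"][j], x))
                + sum(u * hk for u, hk in zip(params["W_h"][j], h))
                + params["b"][j]
            )
            for j in range(len(h))
        ]
    return sum(w * hj for w, hj in zip(params["w_out"], h)) + params["b_out"]


if __name__ == "__main__":
    params = init_rnn(n_in=3 + 2, n_hidden=4)
    # Hypothetical normalized RS features for three flights (e.g. height, NDVI, cover).
    rs = [[0.2, 0.5, 0.3], [0.4, 0.6, 0.5], [0.6, 0.7, 0.8]]
    # Hypothetical normalized weather summaries between flights (e.g. GDD, rainfall).
    weather = [[0.3, 0.1], [0.6, 0.4], [0.9, 0.5]]
    print("untrained output:", rnn_predict(params, rs, weather))
    # A shorter, mid-season sequence runs through the same network unchanged.
    print("first two flights only:", rnn_predict(params, rs[:2], weather[:2]))
```

Because the same weights are applied at every time step, sequences with different numbers of flights, such as those from different experiments or a mid-season cutoff, can be processed without changing the model.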