Four representatives of CIAT’s team win the 2018 Syngenta Crop Challenge in Analytics.
From left to right: Nicolas Martin (INFORMS, chair of the prize), Hugo Andrés Dorado (CIAT), Andrés Aguilar (CIAT), Daniel Jiménez (CIAT), Sylvain Delerce (CIAT), and Daniel Dyer (Syngenta)
Predicting is a tough task. Even with a state-of-the-art workflow, algorithms never perform as well as we would like them to. But predicting is about reducing uncertainty, not about always getting it right, and there is value in reducing uncertainty.
CIAT’s team took part this year in the Syngenta Crop Challenge in Analytics. After intense work preparing our submission, we could not lower the error of our model any further. But when the team submitted its proposal to the Challenge, back in March this year, we did not really know what to expect, as we had no idea of the real potential of the datasets we had worked on.
Nonetheless, our solution was picked as one of the five finalists! So we traveled to Baltimore to present it, and there we discovered that we stood a good chance.
“I pushed my team hard to participate when I saw the call, as I was convinced we had the cards to play. We spent time on it, and it was totally worth it.” Daniel Jiménez
The 2018 Syngenta Challenge
For this 2018 edition of the Syngenta Crop Challenge in Analytics, participants were asked to predict the performance of maize hybrids for 2017 using three datasets holding information up to 2016: genetic markers of the maize hybrids, and soil and weather conditions at the test locations. More details can be found at https://www.ideaconnection.com/syngenta-crop-challenge/challenge.php
The main challenges were the following:
- The teams had to generate a weather forecast for 2017 as the conditions of the coming season were unknown (no data).
- The teams had to figure out how to handle the huge genetic dataset and its missing values.
- The teams had to manage the computational load of putting three datasets together and running a model in order to finish on time!
Cooking the solution
Overview of the workflow.
Our team addressed all of these aspects with specific tools in order to eventually build a robust workflow.
To generate a solid weather forecast, it is important to avoid mixed signals in the data. That is why we first grouped the locations using hierarchical clustering based on a dynamic time warping distance. We obtained 26 groups of locations with notable spatial coherence (see map). Then, for each cluster, we generated a weather forecast using a long short-term memory (LSTM) recurrent neural network in Google’s TensorFlow environment. The ease of using the GPU under TensorFlow saved us valuable time.
Classification of the locations according to the clustering analysis.
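To make the clustering step more concrete, here is a minimal sketch of hierarchical clustering on a dynamic time warping (DTW) distance. It is not the team's actual code: the function names, the average-linkage choice, and the toy temperature series are all assumptions made for illustration, and it relies on NumPy and SciPy.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def dtw_distance(a, b):
    """Classic dynamic time warping distance between two 1-D series."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible warping moves.
            cost[i, j] = step + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]


def cluster_locations(series, n_clusters):
    """Group series by hierarchical clustering on pairwise DTW distances.

    Average linkage is an assumption; the article does not name the linkage.
    """
    k = len(series)
    dist = np.zeros((k, k))
    for i in range(k):
        for j in range(i + 1, k):
            dist[i, j] = dist[j, i] = dtw_distance(series[i], series[j])
    # linkage() expects the condensed (upper-triangle) form of the matrix.
    tree = linkage(squareform(dist), method="average")
    return fcluster(tree, t=n_clusters, criterion="maxclust")


# Hypothetical toy data: monthly temperatures at two warm and two cool sites.
t = np.linspace(0, 2 * np.pi, 12)
series = [
    20 + 5 * np.sin(t),
    20 + 5 * np.sin(t + 0.3),
    8 + 5 * np.sin(t),
    8 + 5 * np.sin(t + 0.3),
]
labels = cluster_locations(series, n_clusters=2)
print(labels)
```

DTW tolerates small shifts in timing between two series, which is why locations with similar but slightly offset seasonal patterns can end up in the same group.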
“The Challenge was a great opportunity to test new techniques such as LSTM and the TensorFlow environment. The progress we made on this will serve for future work.” Andrés Aguilar
The huge genetic dataset was first filtered to remove markers with more than 30% missing values. For the remaining markers, we used a twofold strategy to estimate missing values. The hybrids with well-represented parental lines were filled using a pattern replication technique, while the others were filled using a Multiple Component Analysis. Finally, we calculated the percentage of heterozygosis by counting the zeros in each line, and we removed variables with near-zero variance.
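The filtering steps described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the team's pipeline: markers are assumed to be coded -1/0/1 with 0 marking a heterozygous call and NaN marking a missing value, the function names are hypothetical, the imputation step is left out, and a plain variance threshold stands in for whatever near-zero variance criterion was actually used.

```python
import numpy as np


def filter_markers(geno, max_missing=0.30):
    """Keep marker columns whose share of missing (NaN) values is at most max_missing."""
    missing_share = np.isnan(geno).mean(axis=0)
    keep = missing_share <= max_missing
    return geno[:, keep], keep


def heterozygosity_pct(geno):
    """Percentage of observed markers coded 0 in each hybrid (row).

    Assumes 0 encodes a heterozygous call; rows with no observed marker get NaN.
    """
    observed = (~np.isnan(geno)).sum(axis=1)
    zeros = (geno == 0).sum(axis=1)
    out = np.full(geno.shape[0], np.nan)
    return 100.0 * np.divide(zeros, observed, out=out, where=observed > 0)


def drop_near_zero_variance(x, threshold=1e-8):
    """Drop columns whose variance (ignoring NaN) does not exceed threshold."""
    variances = np.nanvar(x, axis=0)
    keep = variances > threshold
    return x[:, keep], keep


# Hypothetical toy matrix: 4 hybrids (rows) by 4 markers (columns).
geno = np.array([
    [1.0, 0.0, np.nan, 1.0],
    [-1.0, 0.0, np.nan, 1.0],
    [1.0, np.nan, np.nan, 1.0],
    [0.0, 1.0, -1.0, 1.0],
])

filtered, kept = filter_markers(geno)            # third marker is 75% missing
het = heterozygosity_pct(filtered)               # one value per hybrid
reduced, informative = drop_near_zero_variance(filtered)  # constant marker dropped
print(filtered.shape, het, reduced.shape)
```

In a real run the imputation would sit between the missing-value filter and the heterozygosity count, so the percentage would be computed on a complete matrix.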
We performed a feature selection and a dimension reduction to obtain a final dataset with 90 predictors. Then we trained a random forest model to predict the performance of the maize hybrids for 2017.
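As a rough illustration of this last step, the sketch below ranks predictors, keeps a subset, and trains a random forest on it with scikit-learn. The article does not say how the 90 predictors were chosen, so ranking by random forest importance is an assumption here, as are the synthetic data, the function name, and every parameter value.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def select_top_features(X, y, k, seed=0):
    """Return the column indices of the k predictors with the highest importance.

    Importance-based ranking is an assumed stand-in for the team's selection method.
    """
    forest = RandomForestRegressor(n_estimators=100, random_state=seed)
    forest.fit(X, y)
    order = np.argsort(forest.feature_importances_)[::-1]
    return np.sort(order[:k])


# Synthetic data: 20 candidate predictors, only the first two drive the response.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 20))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=300)

# Hold out the last 100 rows to estimate the prediction error.
X_train, X_test = X[:200], X[200:]
y_train, y_test = y[:200], y[200:]

selected = select_top_features(X_train, y_train, k=5)

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train[:, selected], y_train)
pred = model.predict(X_test[:, selected])
rmse = float(np.sqrt(np.mean((pred - y_test) ** 2)))
print(selected, rmse)
```

Selecting the features on the training rows only, as done here, keeps the held-out error estimate honest.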
On the human side, working as a team really helped us to address new challenges such as the huge genetic dataset. The group included agronomists, biologists, bioinformaticians, data scientists, and statisticians. At every point of the preparation, the pursuit of excellence and clarity was the strong driver for the team.
“Counting with a rich and diverse team really helped to address that diversity of data and challenges. And, I have to say, enjoyable to cheer us up when doubt was showing up.” Sylvain Delerce
Before the final, we had no way of knowing what the other finalists had. This is when you keep moving forward in a thick fog, knowing that this is the way forward even though you are totally blind.
On Monday, 16 April, the four representatives of the team were ready at the INFORMS conference in Baltimore to present our solution to the jury.
Although the temperature was well below the survival zone of our Colombian team members, we were all excited about taking part in this final round and convinced that we had something good to share with the other groups and the jury.
As presentations of the other finalists proved their high-quality work, one could hardly say who was up for a trip to the podium. We were also surprised by the diversity of the approaches developed by the other teams and the fact that, somehow, all converged to similar levels of performance. This cheered us up for our presentation.
Sylvain Delerce and Andrés Aguilar shared the stage to present the team’s work while Hugo Andrés Dorado and Daniel Jiménez took part in the questions segment. You can follow the presentation here: https://bit.ly/2HsOLhg.
After one day of suspense, the winners were finally announced on Tuesday. Saeed Khaki, Hans Mueller, and Lizhi Wang from Iowa State University took third place; Johnathan Pedroso Rigal dos Santos from Brazil won second place; and we were thrilled to be announced as the winners of the Challenge.
The success of the team confirms that CIAT and the research for development sector have significant capacity in applying analytics to real-world challenges such as those faced in agriculture.
The team acknowledges the high quality of the work presented by the other finalists and thanks Syngenta and INFORMS for the opportunity to compete on rich datasets and for their recognition of our work. We look forward to setting up new collaborations to speed up the development of analytics for agriculture and offer high-quality tools to end-users.
“Dealing with loads of data and numerous variables, and realizing that there are many ways to address the challenge, was an exciting experience. We are really glad that our idea was recognized as one of the best.” Hugo Andrés Dorado