The Cassava Genome Hub: Terabytes of tuberous tropical root research set to revolutionize big data for agriculture

We live in a golden age of information, with data readily available and accessible at our fingertips. Some argue we have more information than we know what to do with.

“When it comes to cassava, we are in the midst of a genomic revolution that is producing enormous amounts of information. CIAT’s goal is to develop the tools and skills needed to analyze all this data, and in turn accelerate and enhance the impact of international agricultural research,” explains Dr. Luis Augusto Becerra, Cassava Program Leader at CIAT.

The Cassava Genome Hub, an online platform that produces and stores more than 15 terabytes of genetic data on cassava, is pioneering a new approach to big data management and analysis.

Launched publicly in late 2015, and accessible from anywhere in the world, the Hub allows researchers to manage and mine this huge amount of data themselves, using graphical and analytical tools to conduct complex analyses in a user-friendly way.

It is likely the single largest collection of genomic data for cassava in the world.

The basic process involves taking cassava samples, genetically sequencing them, and uploading that data to the site. Sequencing happens at two companies in China – Beijing Genomics Institute (BGI) and Novogene.

Scientists can then compare wild with domesticated cassava plants, or land races with elite lines. This enables them to identify differences and, in turn, pinpoint desirable traits and genes.

The ultimate goal is to understand the genetic mechanisms behind those traits, so that researchers can develop selection tools for cassava breeders. The Hub also features a discussion forum, for users to share results, ask questions, and facilitate collaborative research.

“This site allows for molecular biologists, plant physiologists, breeders, and students with a basic understanding of biology, who wouldn’t otherwise have the means, particularly in rural and developing areas, to access these data and tools,” explains Manuel Ruiz, CIRAD Researcher and Head of the Bioinformatics team on Data Integration.

“It’s incredibly useful for scientists around the world to be using the same tools to analyze data, to have an easy, standardized approach, and a common baseline for comparison.”

Taking a ‘byte’ out of blight

Data from the Hub is already being put into practice.

Researchers have found that SNPs – Single Nucleotide Polymorphisms, each representing a different genetic variation in a single DNA building block – may help predict susceptibility of cassava varieties to heat, drought, pests, and diseases.

For example, data from the platform has been used to screen for and identify genetic resistance to Cassava Bacterial Blight (CBB), a destructive disease that causes yield losses of up to 75% in some African countries. Once identified, these sources of resistance could be used to develop blight-resistant commercial cassava varieties.

Whiteflies, another threat to cassava, are considered one of the world’s major agricultural pests, attacking a wide range of crops and causing considerable losses. CIAT scientists have mapped genes for whitefly resistance in cassava varieties in the CIAT germplasm collection. The data gathered was made publicly available through the Hub, and has helped researchers breed new, resistant cassava strains.

What’s next? Scaling up and out

According to Dr. Becerra, the Cassava Genome Hub “can contribute far beyond the scope of what we thought possible when we began. There’s huge and unknown potential that we haven’t tapped into yet.”

Dr. Becerra foresees sequencing all 6,000+ cassava accessions in the CIAT genebank in the next couple of years. The Hub’s technologies are already expanding to other tropical crops, including cocoa, coffee, banana, and sugarcane.

CGIAR Platform for Big Data in Agriculture

The Cassava Genome Hub is just one way in which CIAT is working to transform rural livelihoods through the power of information. CIAT will also jointly lead the CGIAR Platform for Big Data in Agriculture, launching in January 2017, which aims to provide global leadership to help organize, convene, and inspire partners to use open data in innovative ways.

The Cassava Genome Hub is jointly managed by CIAT, The French Agricultural Research Center for International Development (CIRAD), and The Research Institute for Development (IRD), with participation from the CGIAR Research Program on Roots, Tubers, and Bananas, South Green, Agropolis Foundation, BGI, the National University of Colombia, the University of London, and a network of collaborators worldwide.

The Cassava Genome Hub is supported by the CGIAR Fund Donors through the Roots, Tubers, and Bananas Research Program, and by the Agropolis Foundation.