Collecting, organizing, and centralizing data is a fundamental component of scientific research. However, making agricultural research data widely available and re-usable is a rather recent initiative, not just for CIAT but also for the CGIAR as a whole. As a result, an Open Access and Data Management policy is in the process of being implemented, thus creating the need for collecting and organizing historical data.
As a direct response to the increasing demand for data collection, organization and systematization, CIAT recently created a new data and information management team led by the Center’s Program Coordination Unit. This team includes five interns. The purpose of this blog post is to highlight the importance of their contributions in meeting the data needs of the various research programs.
Thanks to their work thus far, significant progress has been made in collecting materials that were not only scattered across different computers and servers but also stored in different data and file formats. This work began with the Cassava program (where Lizbeth Pino plays a mediating role, facilitating the requirements of the program and those of the data management team) and continued with tropical forages, rice, beans, and soils. It also includes already coded information from projects and trials in Africa.
The current interns (Carlos Medina, Derlyn Lourido, Luis Gonzales, Andrea Mora, and Kenji Tanaka) have helped centralize, standardize, and organize research data from the 1970s to the present.
How do they do it?
An important part of this process of data standardization is ontology. Ontology allows us to define how to measure common characteristics (of different crops) in method and scale, thereby enabling data comparison among different research Centers.
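To make the idea concrete, here is a minimal sketch in Python of what a single ontology entry could look like: a trait, the method used to measure it, and the scale on which it is recorded. The class, the field names, and the plant height entry are hypothetical illustrations and are not taken from the ontology that CIAT or BMS actually uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variable:
    """One illustrative ontology entry: what is measured, how, and on which scale."""
    trait: str      # the characteristic being measured
    method: str     # how the measurement is taken
    unit: str       # the unit of the scale
    minimum: float  # lowest value the scale accepts
    maximum: float  # highest value the scale accepts

    def accepts(self, value: float) -> bool:
        """Return True when a value falls inside the agreed scale."""
        return self.minimum <= value <= self.maximum


# Hypothetical entry; the method text and the limits are assumptions.
PLANT_HEIGHT = Variable(
    trait="plant height",
    method="ruler, from soil surface to highest point",
    unit="cm",
    minimum=0.0,
    maximum=500.0,
)


def to_cm(value: float, unit: str) -> float:
    """Convert a length recorded in mm, cm, or m to the shared unit (cm)."""
    factors = {"mm": 0.1, "cm": 1.0, "m": 100.0}
    return value * factors[unit]
```

With a shared definition like this, a height recorded in metres at one Center and in centimetres at another can be converted with `to_cm` and checked with `PLANT_HEIGHT.accepts` before the two datasets are compared.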
The interns also contribute to the Integrated Breeding Platform and its Breeding Management System (BMS). The BMS has been specifically designed to help breeders manage their day-to-day activities through all phases of their breeding programs. With BMS, all CGIAR Centers can share the same ontology. This provides an opportunity to integrate information and thus take advantage of the data available from research conducted over the years.
For example, in their work with the Cassava program, besides including all crosses and pedigrees of materials, our interns were vital in enabling us to add the results of all trials from 1978 to 2012 to the system. An intern is currently developing software that allows us to import data from Excel or Oracle into BMS in batches, without having to enter records one by one.
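The general idea behind a batch import can be sketched in a few lines of Python. This is not the intern's tool and it does not use any real BMS interface: `send_batch` is a hypothetical stand-in for whatever call delivers records to the target system, and the sample rows are made up for illustration, with a CSV export standing in for an Excel sheet or an Oracle query result.

```python
import csv
import io
from itertools import islice
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, str]


def batched(rows: Iterable[Row], size: int) -> Iterator[List[Row]]:
    """Yield successive lists of at most `size` rows."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def import_in_batches(rows: Iterable[Row],
                      send_batch: Callable[[List[Row]], None],
                      size: int = 500) -> int:
    """Hand rows to `send_batch` one group at a time; return the row count."""
    total = 0
    for batch in batched(rows, size):
        send_batch(batch)  # hypothetical delivery step, not a BMS call
        total += len(batch)
    return total


# Made-up sample rows, for illustration only.
sample = "trial,year,yield_t_ha\nA,1978,21.4\nB,1979,19.8\nC,1980,23.1\n"
received: List[List[Row]] = []
count = import_in_batches(csv.DictReader(io.StringIO(sample)),
                          received.append, size=2)
```

Here the three sample rows are delivered in two groups instead of three separate entries; with real trial data the same loop would apply to thousands of records.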
Over the next year, all CIAT crop data are expected to become available through BMS, and researchers using the BMS software should be able to enter trial results directly into these databases.
Besides working on the storage and dissemination of research results and projects developed at CIAT, the interns also develop, maintain, and improve software solutions adapted to the needs of research areas. For example, to address the issue of transfer of materials, a software tool was designed some time ago for the Standard Material Transfer Agreement (SMTA), which allows traceability of any material that has been transferred from one country to another. Another example is the development of a new version of a software package, as part of one intern’s thesis requirements, which enables the management of all aspects related to cassava genetics.
Towards an open data culture
These students have contributed to strengthening CIAT’s research processes and facilitated the Center’s progress towards an open data culture. In this regard, there has been an interesting evolution in researchers’ initial resistance to the publication of their data.
Historically, scientists have been protective of their data, even going so far as to say, “my data are my data,” and perhaps even withholding them from their own team. They have also required certain conditions when adding data to databases, so that the data would not be surrendered to anyone in particular.
Now, however, with this institutional move towards an open access policy, not only is there greater awareness but also stronger leadership that emphasizes the importance of ensuring that data is available and accessible. The creation of this new team, with its greater proximity to researchers and visibility institution-wide, has therefore been very timely.
The interns and their role in this process will remain essential: their enthusiasm, fresh ideas, and drive to innovate continuously will bring CIAT closer to maximizing the accessibility of the data collected and thus reaching a greater number of beneficiaries.
For further information regarding the Data and Information Management group, please contact:
Leroy Mwanzia, data and information manager (firstname.lastname@example.org)
Arturo Franco, coordinator of database systems (email@example.com)
Carolina Garcia, systems analyst (firstname.lastname@example.org)
Paola Cruz, systems analyst (email@example.com)