Introduction. A large portion of the biodiversity data in natural history collections is still not available digitally. Innovative high-throughput methods are increasingly being applied to digitize this backlog in bulk, generating large amounts of data. In parallel, natural history museums are becoming more involved in generating large volumes of molecular biodiversity data using new massively parallel sequencing platforms. Against this backdrop, the goal of EU BON Task 1.4 has been to support data mobilization efforts targeting collection-based and molecular data, mainly through the development and integration of innovative open-source tools and services.
Progress towards objectives. The activities have involved work within the context of three major projects: i) DINA, an open-source, modular, web-based collection management system for natural history specimen data; ii) JACQ, an open-access system for botanical (herbarium) data; and iii) PlutoF, a web platform for working with traditional and molecular biodiversity research data. The task has also involved work on a number of other EU BON partner systems and services, as well as integration across internal EU BON and external biodiversity informatics resources. Finally, these systems have been used for targeted data mobilization efforts.
Achievements and current status. Within DINA, the focus has been on supporting the engineering of sophisticated biodiversity information systems through the exploration of tools supporting distributed development and a modular plug-and-play design based on service-oriented architectures. This has involved the testing and adoption of tools like Apiary for the design of Application Programming Interfaces (APIs) and Docker for systems integration and deployment tasks. A Python library for data migration to DINA was also developed and tested. Within JACQ, a number of tools were developed to facilitate deployment and data migration to the system, and the AnnoSys tool for annotation of data was integrated. Within PlutoF, EU BON efforts focused on the development of a citizen-science module and improved functionality for the mobilization of collection (living) specimen data. A number of innovative tools were developed by Pensoft to help mobilize biodiversity data published in the scientific literature, including semantic mark-up of species conservation papers, direct import of data from a range of biodiversity platforms into manuscripts, and a mechanism for providing stable links from publications to global biodiversity repositories. Plazi implemented an automated workflow that mines published scientific papers for taxonomic data, currently mobilizing 25% of all published new names as they become available. GlueCad developed apps that allow citizen scientists reporting spontaneous observations or systematic inventory data to select target taxa and their preferred data mobilization platform. IBSAS and UCPH have focused on national data mobilization efforts targeting Slovakia and Denmark, respectively.
Future developments. Although the trend is clearly towards increased integration of biodiversity informatics tools into larger and more sophisticated systems, there is no one-size-fits-all solution. Nevertheless, the increasingly widespread adoption of community standards, open-source development practices and service-oriented architectures is pushing the capability of current systems forward and facilitating tighter integration across systems. This trend is supported by the appearance of sophisticated tools enabling the design and deployment of complex modular systems. The adoption of the Docker approach is one example of how the biodiversity informatics community may benefit from such tools.
2016, p. 34