
Signed Biodiversity Data Packages: A Method to Cite, Verify, Mobilize, and Future Proof, Large Image Corpora. hash://sha256/0154b9ddce4d2e280e627a08d1a2d42884201af6ac1ec19606e393deda57f4bb hash://md5/bae7f441cdd2648d2356b2330e4b71e8

Authors :
Poelen, Jorrit H.
Best, Jason
Publication Year :
2023
Publisher :
Zenodo, 2023.

Abstract

Access to Natural History Collections helps researchers to better understand the natural world. Millions of digital images of herbarium specimens are openly available via the Internet. However, using these images in a data-intensive research project raises basic questions like: "How do I efficiently access, and verify, hundreds of thousands of images?", and, "How do I cite a version of a large image corpus?" Here, we present a method to cite, verify and mobilize such image corpora across different locations and medium types. We demonstrate our method with >100k images made available through the Botanical Research Institute of Texas using available tools (e.g., rsync, Preston) and technologies (e.g., internet, postal service). Our results show that our packaging method allows the US Postal Service to transfer a packaged corpus at about 3 images/s, whereas retrieving individual images via HTTP achieved a transfer rate of about 0.2 images/s. Our results support that signed digital packaging of image corpora enables distributed storage using readily available transfer and storage methods. In addition, our method is future-proof because it can be used with any digital media, including those that are not yet available.

Included files:
00_Poelen_DD2023.mp4 - recorded presentation (see also https://vimeo.com/832006741)
00_Poelen_DD2023.pdf - presentation slides
00_Poelen_DD2023.pptx - presentation slides (powerpoint)
00_Poelen_DD2023_Abstract.pdf - presentation abstract

Part of: Digital Data in Biodiversity Data Conference 2023 @ Arizona State University 5-7 June 2023.
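To put the two transfer rates reported in the abstract in perspective, the sketch below converts them into total transfer times. It assumes a corpus of exactly 100000 images (the abstract only states ">100k") and treats both rates as constant; the function name seconds_needed is a hypothetical helper introduced for illustration.

```shell
#!/bin/sh
# Assumption: corpus size of exactly 100000 images (the abstract says ">100k").
images=100000

# Hypothetical helper: seconds needed to move N images at RATE images/s,
# rounded to the nearest whole second.
seconds_needed() {
  awk -v n="$1" -v rate="$2" 'BEGIN { printf "%.0f\n", n / rate }'
}

# Rates as reported in the abstract.
http_seconds=$(seconds_needed "$images" 0.2)   # individual images via HTTP
package_seconds=$(seconds_needed "$images" 3)  # packaged corpus via postal service

echo "HTTP:     $http_seconds s"
echo "Packaged: $package_seconds s"
```

Under these assumptions, HTTP retrieval takes about 500000 s (roughly 5.8 days), while the packaged transfer takes about 33333 s (roughly 9.3 hours), a 15-fold difference.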
Provenance:

preston head \
  --remote https://linker.bio \
  --anchor hash://sha256/0154b9ddce4d2e280e627a08d1a2d42884201af6ac1ec19606e393deda57f4bb
hash://sha256/0154b9ddce4d2e280e627a08d1a2d42884201af6ac1ec19606e393deda57f4bb

preston head \
  --remote https://linker.bio \
  --anchor hash://sha256/0154b9ddce4d2e280e627a08d1a2d42884201af6ac1ec19606e393deda57f4bb \
  | preston cat \
  --remote https://linker.bio \
  --anchor hash://sha256/0154b9ddce4d2e280e627a08d1a2d42884201af6ac1ec19606e393deda57f4bb \
  | md5sum \
  | sed 's+^+hash://md5/+g' \
  | cut -d ' ' -f1
hash://md5/bae7f441cdd2648d2356b2330e4b71e8

preston alias \
  --remote https://linker.bio \
  --anchor hash://sha256/0154b9ddce4d2e280e627a08d1a2d42884201af6ac1ec19606e393deda57f4bb
. . . . . . . . . . . . . .

Funded in part by the National Science Foundation (OAC 1839201).
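The provenance commands derive an MD5 hash URI by piping content through md5sum, sed, and cut. The same pattern can be applied to a local file to check it against a cited hash URI. This is a minimal sketch, assuming GNU coreutils (sha256sum, md5sum) are installed; the function names sha256_uri, md5_uri, and verify_sha256 are hypothetical helpers and are not part of Preston.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Preston); assume GNU coreutils.

# Print the hash://sha256/ URI of a file's content.
sha256_uri() {
  sha256sum < "$1" | cut -d ' ' -f1 | sed 's+^+hash://sha256/+'
}

# Print the hash://md5/ URI of a file's content.
md5_uri() {
  md5sum < "$1" | cut -d ' ' -f1 | sed 's+^+hash://md5/+'
}

# Succeed (exit status 0) only if the file's content matches the expected
# hash://sha256/ URI.
verify_sha256() {
  [ "$(sha256_uri "$1")" = "$2" ]
}
```

For example, `verify_sha256 local-copy.dat hash://sha256/0154b9ddce4d2e280e627a08d1a2d42884201af6ac1ec19606e393deda57f4bb && echo verified` would print "verified" only if local-copy.dat (a placeholder file name) holds exactly the content identified by the anchor hash.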

Details

Database :
OpenAIRE
Accession number :
edsair.doi.dedup.....1af6d8f3221d35b399878d37a06964f6
Full Text :
https://doi.org/10.5281/zenodo.7990926