The MUSCIMA++ Dataset for Handwritten Optical Music Recognition
- Source :
- ICDAR
- Publication Year :
- 2017
- Publisher :
- IEEE, 2017.
Abstract
- Optical Music Recognition (OMR) promises to make accessible the content of large amounts of musical documents, an important component of cultural heritage. However, the field does not have an adequate dataset and ground truth for benchmarking OMR systems, which has been a major obstacle to measurable progress. Furthermore, machine learning methods for OMR require training data. We design and collect MUSCIMA++, a new dataset for OMR. Ground truth in MUSCIMA++ is a notation graph, which our analysis shows to be a necessary and sufficient representation of music notation. Building on the CVC-MUSCIMA dataset for staffline removal, the MUSCIMA++ dataset v1.0 consists of 140 pages of handwritten music, with 91254 manually annotated notation symbols and 82247 explicitly marked relationships between symbol pairs. The dataset allows training and directly evaluating models for symbol classification, symbol localization, and notation graph assembly, and indirectly musical content extraction, both in isolation and jointly. Open-source tools are provided for manipulating the dataset, visualizing the data and annotating further, and the data is made available under an open license.
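The abstract describes the ground truth as a notation graph: annotated symbols as nodes and explicitly marked relationships between symbol pairs as edges. A minimal sketch of such a structure follows. It is an illustration only, not the dataset's file format or the API of the tools mentioned above; the names `Symbol`, `NotationGraph`, `add_relationship`, and `attached_to`, the bounding-box layout, and the class labels are all assumptions made for this example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Symbol:
    """One annotated notation symbol (hypothetical schema)."""
    symbol_id: int
    class_name: str                    # illustrative label, e.g. "notehead"
    bbox: tuple[int, int, int, int]    # assumed (top, left, bottom, right) in pixels


@dataclass
class NotationGraph:
    """Symbols as nodes, directed symbol-pair relationships as edges."""
    symbols: dict[int, Symbol] = field(default_factory=dict)
    edges: set[tuple[int, int]] = field(default_factory=set)

    def add_symbol(self, symbol: Symbol) -> None:
        self.symbols[symbol.symbol_id] = symbol

    def add_relationship(self, source_id: int, target_id: int) -> None:
        # Only allow edges between symbols that have been annotated.
        if source_id not in self.symbols or target_id not in self.symbols:
            raise KeyError("both endpoints must be annotated symbols")
        self.edges.add((source_id, target_id))

    def attached_to(self, source_id: int) -> list[Symbol]:
        # Symbols reachable from source_id by one outgoing edge, in id order.
        return [self.symbols[t] for s, t in sorted(self.edges) if s == source_id]


# Toy example: one notehead linked to its stem and to a beam.
graph = NotationGraph()
graph.add_symbol(Symbol(0, "notehead", (120, 40, 132, 54)))
graph.add_symbol(Symbol(1, "stem", (80, 52, 126, 55)))
graph.add_symbol(Symbol(2, "beam", (78, 52, 90, 110)))
graph.add_relationship(0, 1)
graph.add_relationship(0, 2)

for symbol in graph.attached_to(0):
    print(symbol.class_name)
```

In this sketch, symbol classification and localization correspond to predicting `class_name` and `bbox` for each node, and notation graph assembly corresponds to predicting the contents of `edges`.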
- Subjects :
- Musical notation
Optical music recognition
business.industry
Computer science
06 humanities and the arts
02 engineering and technology
Optical character recognition
Musical
computer.software_genre
Notation
060404 music
Data visualization
Handwriting recognition
0202 electrical engineering, electronic engineering, information engineering
Graph (abstract data type)
020201 artificial intelligence & image processing
Artificial intelligence
business
computer
0604 arts
Natural language processing
- Database :
- OpenAIRE
- Journal :
- 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)
- Accession number :
- edsair.doi...........2d9902878b09099678035609c87a7bba
- Full Text :
- https://doi.org/10.1109/icdar.2017.16