Notated Music Extraction into Machine Readable Data

by Erik Radio

Iterative processes constitute a significant part of any machine learning (ML) project. For my project, I found they were necessary even before I wrote any code. Simply engaging with the concept and telos of ML changed the way I thought about a particular problem. ML let me home in on larger problems that needed addressing before I could approach what I had previously thought of as a starting point.
My initial project was concerned with affect analysis in Western music. Specifically, I was interested in the music of the French Baroque, because music of this era was composed with very specific ideas about the kinds of meaning music could signify, ideas that are documented in treatises of the time. Some oversimplified examples: drone-like patterns are associated with pastoral settings, while large intervallic leaps often indicate joy. I was especially interested in examining these motifs in François Couperin's harpsichord music, in the hope that indexing the affects associated with particular pieces could enable a new type of music retrieval. More broadly, I figured that similar processes could be extended to other types of digital collections to enhance search opportunities for users.
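The two motifs mentioned above lend themselves to simple symbolic heuristics. The following is a hypothetical illustration, not part of the project described here: it assumes a melodic or bass line is already available as a list of MIDI note numbers (60 = middle C), and the thresholds `LEAP_THRESHOLD` and `DRONE_MIN_RUN` are arbitrary placeholders rather than values drawn from any Baroque treatise.

```python
from typing import List, Tuple

# Hypothetical thresholds for illustration only; not taken from any treatise.
LEAP_THRESHOLD = 7  # semitones; anything wider than a perfect fifth counts as a "large leap"
DRONE_MIN_RUN = 4   # back-to-back repetitions of one pitch treated as "drone-like"


def large_leaps(pitches: List[int], threshold: int = LEAP_THRESHOLD) -> List[Tuple[int, int]]:
    """Return (index, signed interval) pairs where consecutive pitches differ
    by more than `threshold` semitones."""
    leaps = []
    for i in range(1, len(pitches)):
        interval = pitches[i] - pitches[i - 1]
        if abs(interval) > threshold:
            leaps.append((i, interval))
    return leaps


def longest_repeated_run(pitches: List[int]) -> int:
    """Length of the longest run of a single pitch repeated back to back."""
    if not pitches:
        return 0
    best = current = 1
    for prev, cur in zip(pitches, pitches[1:]):
        current = current + 1 if cur == prev else 1
        best = max(best, current)
    return best


def is_drone_like(pitches: List[int], min_run: int = DRONE_MIN_RUN) -> bool:
    """Crude drone heuristic: some pitch is repeated at least `min_run` times in a row."""
    return longest_repeated_run(pitches) >= min_run


if __name__ == "__main__":
    melody = [60, 62, 72, 71, 59]
    bass = [48, 48, 48, 48, 43]
    print(large_leaps(melody))  # [(2, 10), (4, -12)]
    print(is_drone_like(bass))  # True
```

Real affect labels would of course depend on far more context (rhythm, harmony, register, genre conventions) than these two counts capture.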
The importance of the availability and quality of data was a theme throughout the 2022 Idea Institute. Initially I thought of my project as having two types of available data with which to work. First, music as performed sound: audio recordings, or the MIDI files frequently used in ML projects, which record performance events rather than an audio signal. Second, notated music encoded into a machine-readable format. I decided to focus on the latter, as the availability (!) of enough sound files posed a problem I didn't have the means to mitigate. Notated music was also attractive because it could be encoded in the Music Encoding Initiative (MEI) schema, which is serialized in XML, and I have a background in writing code to parse and manipulate XML. Some readers may be familiar with the Text Encoding Initiative (TEI), an analogous standard for texts; like MEI, it can be quite granular but also deeply labor-intensive to create. I was sure there would be enough MEI files to get my analysis under way.
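To give a sense of what parsing MEI involves, here is a minimal sketch using Python's standard-library `xml.etree.ElementTree`. It assumes the MEI namespace `http://www.music-encoding.org/ns/mei` and common-music-notation `<note>` elements carrying `pname`, `oct`, and optionally `accid` or `accid.ges` attributes. The function names and the sample document are hypothetical; the sample is trimmed to the elements the code reads and is not a complete, schema-valid MEI file. The sketch ignores key signatures and `<accid>` child elements, and it reads every note in a staff in document order, so chords and multiple layers are simply flattened.

```python
import xml.etree.ElementTree as ET
from typing import List

MEI_NS = {"mei": "http://www.music-encoding.org/ns/mei"}
# Semitone offset of each pitch name above C.
STEP = {"c": 0, "d": 2, "e": 4, "f": 5, "g": 7, "a": 9, "b": 11}
# A subset of MEI accidental values mapped to semitone adjustments.
ACCID = {"s": 1, "f": -1, "n": 0, "ss": 2, "x": 2, "ff": -2}


def note_to_midi(note: ET.Element) -> int:
    """Convert an MEI <note> to a MIDI number (C4 = 60)."""
    octave = int(note.get("oct"))
    # Prefer the sounding (gestural) accidental if present, else the written one.
    accid = note.get("accid.ges") or note.get("accid") or "n"
    return (octave + 1) * 12 + STEP[note.get("pname")] + ACCID.get(accid, 0)


def mei_staff_pitches(mei_xml: str, staff_n: str = "1") -> List[int]:
    """Collect MIDI numbers for all pitched notes on one staff, measure by measure."""
    root = ET.fromstring(mei_xml)
    pitches = []
    for staff in root.iterfind(".//mei:staff", MEI_NS):
        if staff.get("n") != staff_n:
            continue
        for note in staff.iterfind(".//mei:note", MEI_NS):
            if note.get("pname") and note.get("oct"):  # skip notes without pitch data
                pitches.append(note_to_midi(note))
    return pitches


# Hypothetical, hand-written sample (rests are ignored because only <note> is read).
SAMPLE = """<mei xmlns="http://www.music-encoding.org/ns/mei">
  <music><body><mdiv><score><section>
    <measure n="1">
      <staff n="1"><layer n="1">
        <note pname="c" oct="4" dur="4"/>
        <note pname="e" oct="4" dur="4" accid.ges="f"/>
        <rest dur="4"/>
        <note pname="c" oct="5" dur="4"/>
      </layer></staff>
      <staff n="2"><layer n="1">
        <note pname="c" oct="3" dur="1"/>
      </layer></staff>
    </measure>
  </section></score></mdiv></body></music>
</mei>"""

if __name__ == "__main__":
    print(mei_staff_pitches(SAMPLE))       # [60, 63, 72]
    print(mei_staff_pitches(SAMPLE, "2"))  # [48]
```

Flattening notation into a pitch list like this discards much of what makes MEI valuable (durations, voices, editorial markup), but it is the kind of intermediate representation that pattern-based analysis would start from.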
There were not. In fact, there were not even enough Baroque pieces encoded in MEI (at least not publicly available ones) from which to create a training data set. Again, availability (!) of data posed a significant challenge to the ML project I wanted to undertake. At this point the affordances of ML, or rather its need for adequate data, showed me that the problem I had identified was not the one that needed to be fixed first. While I couldn't use ML to undertake a sentiment analysis (yet), I could use it to address the availability of such data: specifically, I could use ML to take digitized scores and encode them in MEI.
With this new goal in mind, I examined the literature on optical music recognition (OMR) and ML, and was weirdly encouraged by research indicating that there is still a lot of room for improvement in these processes. (I guess it's nice to know you aren't unknowingly trying to solve a solved problem.) My general sense from the literature is that OMR, like any ML task, requires very deliberate parameters around the type of data you are using, and that a process that worked for one corpus may not work at all for another with different typesetting. While there is still preparatory work to be done before I can run larger-scale OMR on my own corpus of music (let alone encode it into MEI), the necessary steps have come into focus, and I now have the vocabulary to do further research.
Returning to iteration, the institute was invaluable in putting me into a different headspace for thinking about ML problems. In contrast to other programming work I do in my job, ML has complicated the way I think about problems, in a positive way. The problem you want to address may be only one side of it.