Roman Numeral Analysis with Graph Neural Networks | by Emmanouil Karystinaios

An Introductory Guide

In this article, I want to explain my journey in developing a model for automatic harmonic analysis. Personally, I'm interested in understanding music deeply. Questions like "Why are things structured the way they are?" and "What was the composer or artist thinking when writing the piece?" are important to me. Naturally, the way for me to start was to analyse the underlying harmony of a piece.

Scavenging through my old notebooks from the conservatory, I stumbled upon the method we used to annotate and analyze small musical excerpts. It's called Roman Numeral analysis. The idea can be a bit complicated if you have never heard of it before, but please bear with me.

My goal is to build a system that can automatically analyze musical scores. Given a score, the system returns the same score with an extra staff containing the chords in Roman numeral notation. This should work mainly for classical tonal music but isn't necessarily limited to that.

In the rest of this article, I'll introduce the concepts of Roman Numerals and Graph Neural Networks, and discuss some details about the model I developed and its results. I hope you enjoy it!

Introduction to Roman Numerals

Roman Numeral analysis is a method used to understand and analyze the chords and harmonic progressions in music, particularly in Western classical music and popular music. Chords are represented using Roman numerals instead of traditional musical notation.

In Roman Numeral analysis, each chord is assigned a Roman numeral based on its position and function within a given key. The Roman numerals represent the scale degrees of the key, with uppercase numerals representing major chords and lowercase numerals representing minor chords.

For example, in the key of C major, the C major chord is represented by the Roman numeral "I" (uppercase denotes a major chord). The D minor chord is represented by "ii" (lowercase denotes a minor chord). The G major chord is represented by "V" (again uppercase, for a major chord) because it is built on the fifth scale degree of C major.
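To make the mapping concrete, here is a minimal sketch that derives the Roman numeral of every diatonic triad in C major from the sizes of its third and fifth. The function name and the "o" suffix for the diminished chord are my own illustrative choices, not part of any library.

```python
# Diatonic triads of C major, labelled with Roman numerals.
NOTE_NAMES = ["C", "D", "E", "F", "G", "A", "B"]
SEMITONES = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}
NUMERALS = ["I", "II", "III", "IV", "V", "VI", "VII"]


def diatonic_triads_c_major():
    """Return (numeral, root, quality) for each scale degree of C major."""
    triads = []
    for degree in range(7):
        # Stack two thirds on the scale: root, third, fifth.
        root, third, fifth = (NOTE_NAMES[(degree + step) % 7] for step in (0, 2, 4))
        third_size = (SEMITONES[third] - SEMITONES[root]) % 12
        fifth_size = (SEMITONES[fifth] - SEMITONES[root]) % 12
        if (third_size, fifth_size) == (4, 7):
            quality, numeral = "major", NUMERALS[degree]
        elif (third_size, fifth_size) == (3, 7):
            quality, numeral = "minor", NUMERALS[degree].lower()
        else:  # minor third plus diminished fifth
            quality, numeral = "diminished", NUMERALS[degree].lower() + "o"
        triads.append((numeral, root, quality))
    return triads


for numeral, root, quality in diatonic_triads_c_major():
    print(f"{numeral:5s} {root} {quality}")
```

The uppercase and lowercase numerals fall out of the chord quality alone, which is why the same numeral means a different chord in a different key.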

A Roman Numeral analysis example of two bars of four-part harmony in C major.

Roman numerals are always relative to a key. So if the key is C major, the Roman numeral "V" is the dominant, i.e. the G major chord. But chords also have different qualities, for example minor or major. In Roman numerals, uppercase letters stand for major quality and lowercase for minor quality.

In music analysis, the bass note is usually a point of reference for the character of a chord. Roman numerals are able to convey this information too. In the score example above, the bass (lowest chord note) of the second chord is F sharp, but the root of the chord is D; therefore the chord is in first inversion, indicated with the number 6.
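The reasoning for the second chord can be sketched in a few lines. This is an illustrative helper of my own, restricted to triads and assuming notes are given as pitch classes (C=0, C#=1, ..., B=11).

```python
# Figured-bass symbols for triads, indexed by which chord tone is in the bass.
TRIAD_FIGURES = {0: "", 1: "6", 2: "64"}
INVERSION_NAMES = {0: "root position", 1: "first inversion", 2: "second inversion"}


def triad_inversion(chord_pcs, bass_pc):
    """chord_pcs lists the triad as (root, third, fifth) pitch classes."""
    position = list(chord_pcs).index(bass_pc % 12)
    return INVERSION_NAMES[position], TRIAD_FIGURES[position]


d_major = (2, 6, 9)  # D, F#, A
print(triad_inversion(d_major, bass_pc=6))  # ('first inversion', '6')
```

With F sharp (pitch class 6) in the bass, the third of the D major triad is the lowest note, which is exactly what the figure 6 records.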

Another interesting notation capability of Roman numerals relates to borrowed chords. This effect is called the secondary degree. Implicitly, every Roman numeral (the primary degree) has a secondary degree of the tonic (i.e. I or i); however, when the secondary degree is annotated, we are told which scale degree is momentarily acting as the tonic. The third chord in the score example above has a dominant seventh as its primary degree and the dominant of C major as its secondary degree. The V65 indicates a major chord with a seventh in first inversion.

Roman Numeral analysis helps musicians and music theorists understand the structure and relationships between chords in a piece of music. It allows them to identify common chord progressions, analyze harmonic patterns, and make comparisons between different musical compositions. It is a useful tool for composers, arrangers, and performers to understand the underlying harmony and make musical decisions based on that knowledge.

Automatic Roman Numeral Analysis

Now that we have a basis for what Roman Numeral analysis looks like in practice, we can discuss how to automate it. In this article, we'll cover a method to predict Roman Numerals from symbolic music, i.e. digital scores (MusicXML, MIDI, MEI, Kern, MuseScore, etc.). Note that you can obtain some of these formats from any score editor such as Finale, Sibelius, or MuseScore. Usually, the software allows exporting to an uncompressed MusicXML format. If you don't have any of these editors, I suggest using MuseScore.

Let's now discuss the representations in more depth. In contrast to audio representations, where music can be seen as a digital sequence at the waveform level or a 2-D spectrogram in the frequency domain, the symbolic representation has individual note events carrying information such as onset time, duration, and pitch spelling (note names). Symbolic representations have often been treated as a pseudo-audio representation by dividing the score into quantized time frames, for example a piano roll (like the figure shown below). Recently, however, some works have proposed a graph representation of a score where every note is a vertex in the graph and edges represent relations between notes. Scores can be transformed into this graph structure, which is particularly useful when a Machine Learning model is involved.

Different representations of the score excerpt shown in the middle. Top: quantized time frame representation; bottom: graph representation.

So, given a symbolic score, the graph is constructed by modelling three relationships between notes:

Notes starting at the same time, i.e. same onset.
A note starting when another ends, i.e. consecutive notes.
A note starting while another is sounding, i.e. a "during" connection.
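The three relationships above can be sketched as a small edge-building routine. This is a simplified illustration under my own assumptions: notes are given as (onset, duration) pairs in beats, the function name is hypothetical, and a quadratic loop is used for readability rather than speed.

```python
def build_score_edges(notes):
    """Build typed edges between notes.

    notes: list of (onset, duration) pairs, in beats.
    Returns a dict mapping each relation name to a list of (i, j) index pairs.
    Exact equality is used on times, so quantized (e.g. integer or
    fraction) values are assumed.
    """
    edges = {"onset": [], "consecutive": [], "during": []}
    for i, (onset_i, duration_i) in enumerate(notes):
        offset_i = onset_i + duration_i
        for j, (onset_j, _) in enumerate(notes):
            if i == j:
                continue
            if onset_i == onset_j:
                edges["onset"].append((i, j))        # start together
            elif offset_i == onset_j:
                edges["consecutive"].append((i, j))  # j starts when i ends
            elif onset_i < onset_j < offset_i:
                edges["during"].append((i, j))       # j starts while i sounds
    return edges


# A half note, a quarter note on the same beat, then two more quarter notes.
notes = [(0, 2), (0, 1), (1, 1), (2, 1)]
print(build_score_edges(notes))
# {'onset': [(0, 1), (1, 0)], 'consecutive': [(0, 3), (1, 2), (2, 3)], 'during': [(0, 2)]}
```

Each relation ends up as its own edge list, so a model can treat the three connection types differently or merge them into one graph.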

The graph of the score can be used as input to a Graph Neural Network, which implicitly learns by propagating information along the edges of the graph. But before we explain how a model works on scores, let's first briefly explain how Graph Neural Networks work.

So, what exactly are Graph Neural Networks? At their core, GNNs are a class of deep learning models designed to handle data represented as graphs. Just like real-world networks, graphs consist of interconnected nodes or vertices, each with its own features. GNNs leverage this interconnectedness to capture rich relationships and dependencies, enabling them to perform analysis and prediction tasks.

But how do GNNs work? Imagine a musical score where each note is a node, and note relations represent the connections between them. Traditional models would treat each note event individually, ignoring the musical context. GNNs, however, embrace this context by considering both the individual features (e.g., pitch spelling, duration) and their relationships (same onset, consecutive) simultaneously. By aggregating information from neighbouring nodes, GNNs let us understand not only individual notes but also the dynamics and patterns within the entire network.

To achieve this, GNNs employ a series of iterative message-passing steps. During each step, nodes gather information from their neighbours, update their own representations, and propagate these updated features further through the network. This iterative process allows GNNs to capture and refine information from nearby nodes, progressively building a comprehensive understanding of the entire graph.
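A single message-passing step can be sketched without any deep learning library. The version below is a deliberately simplified stand-in: it averages the neighbours' features and then averages that with the node's own features, whereas a real GNN layer would apply learned weights and a non-linearity. The function name is hypothetical.

```python
def message_passing_step(features, edges):
    """One round of mean-aggregation message passing.

    features: list of equal-length float lists, one per node.
    edges: list of (src, dst) pairs; information flows from src to dst.
    """
    dim = len(features[0])
    sums = [[0.0] * dim for _ in features]
    counts = [0] * len(features)
    # Gather: every node collects the features of its incoming neighbours.
    for src, dst in edges:
        for k in range(dim):
            sums[dst][k] += features[src][k]
        counts[dst] += 1
    # Update: mix own features with the neighbourhood mean
    # (an untrained placeholder for the learned update function).
    updated = []
    for node, own in enumerate(features):
        if counts[node] == 0:
            updated.append(list(own))  # isolated node keeps its features
            continue
        mean = [s / counts[node] for s in sums[node]]
        updated.append([(o + m) / 2 for o, m in zip(own, mean)])
    return updated


# Three notes in a chain: 0 <-> 1 <-> 2.
features = [[1.0], [3.0], [5.0]]
edges = [(0, 1), (1, 0), (1, 2), (2, 1)]
step1 = message_passing_step(features, edges)
step2 = message_passing_step(step1, edges)
print(step1)  # [[2.0], [3.0], [4.0]]
print(step2)  # [[2.5], [3.0], [3.5]]
```

After the first step, node 0 has only heard from node 1; after the second step, its value also reflects node 2, which shows how stacking steps widens the context each note sees.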

The message-passing process, when done iteratively in the network, is sometimes called graph convolution. A popular graph convolution block that we also used in our music analysis model is called SageConv, from the well-known GraphSAGE paper. We won't cover the details here, but there are many resources covering the functionality of GraphSAGE.

The beauty of GNNs lies in their ability to extract meaningful representations from graph data. By learning from the local context and combining it with global information, GNNs can uncover hidden patterns, make accurate predictions, and even generate new insights. This makes them invaluable in a wide range of domains, from social network analysis to drug discovery, traffic prediction to fraud detection, and now to music analysis.

The model used for Roman Numeral analysis is called ChordGNN. As the name suggests, ChordGNN is a model for automatic Roman Numeral analysis based on Graph Neural Networks. A particularity of this model is that it leverages note-wise information but produces onset-wise predictions, i.e. a Roman Numeral is predicted for each unique onset event of the score. That means that multiple notes at the same onset share the same Roman Numeral, just like when annotating a musical score. However, by using graph convolution, information from every note is propagated through the neighboring notes and onsets.

ChordGNN model architecture illustration.

ChordGNN is based on a Graph Convolutional Recurrent Neural Network architecture and is composed of stacked GraphSAGE convolutional blocks that operate at the note level.

The graph convolution is followed by an Onset-Pooling layer that contracts the note representations to the onset level, resulting in a vector embedding for each unique onset of the score. This is an important step since it moves the representation from a graph to a sequence.
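The idea behind onset pooling can be sketched as grouping note embeddings by onset time and reducing each group to one vector. Averaging is my assumption here for illustration; the text above does not say which reduction the actual layer uses, and the function name is hypothetical.

```python
def onset_pool(note_embeddings, onsets):
    """Reduce note-level embeddings to one embedding per unique onset.

    note_embeddings: list of equal-length float lists, one per note.
    onsets: list of onset times, aligned with note_embeddings.
    Returns (unique_onsets, pooled), both ordered by time.
    """
    groups = {}
    for embedding, onset in zip(note_embeddings, onsets):
        groups.setdefault(onset, []).append(embedding)
    unique_onsets = sorted(groups)
    pooled = []
    for onset in unique_onsets:
        members = groups[onset]
        # Column-wise mean over all notes sharing this onset (assumed reduction).
        pooled.append([sum(column) / len(members) for column in zip(*members)])
    return unique_onsets, pooled


embeddings = [[1.0, 0.0], [3.0, 2.0], [5.0, 5.0]]
onsets = [0, 0, 1]
print(onset_pool(embeddings, onsets))
# ([0, 1], [[2.0, 1.0], [5.0, 5.0]])
```

Three notes become two time-ordered vectors, which is the graph-to-sequence move: the output can be handed to any model that expects a sequence.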

The embeddings obtained by the Onset-Pooling, which are also ordered by time, are then fed to a sequential model, such as a GRU stack. Finally, simple Multi-Layer Perceptron classifiers are added for each of the attributes that describe a Roman Numeral. Therefore, ChordGNN is also a Multi-Task model.

ChordGNN doesn't directly predict the Roman numeral for every position of the score; instead, it predicts the degree, local key, quality, inversion and root. The predictions of each attribute task are then combined into a single Roman Numeral prediction by analyzing the predictions for each of the tasks. Let's see what the output predictions look like.
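As a rough illustration of the multi-task idea, the sketch below picks the highest-scoring label for each attribute at one onset and assembles a readable label from them. Everything here is hypothetical: the vocabularies, the scores, the function names, and the simple string format are placeholders of mine, not the actual combination rule used by ChordGNN.

```python
def decode_onset(task_scores, vocabularies):
    """Pick the highest-scoring label for every task at a single onset."""
    decoded = {}
    for task, scores in task_scores.items():
        best = max(range(len(scores)), key=scores.__getitem__)
        decoded[task] = vocabularies[task][best]
    return decoded


def format_label(decoded):
    """Assemble a simple 'key: degree+inversion' string (illustrative format)."""
    return f"{decoded['localkey']}: {decoded['degree']}{decoded['inversion']}"


# Hypothetical label sets and classifier outputs for one onset.
vocabularies = {
    "localkey": ["C", "G", "a"],
    "degree": ["I", "ii", "V"],
    "quality": ["major", "minor"],
    "inversion": ["", "6", "64"],
    "root": ["C", "D", "G"],
}
task_scores = {
    "localkey": [2.1, 0.3, -1.0],
    "degree": [0.1, 0.2, 1.5],
    "quality": [1.2, 0.1],
    "inversion": [0.0, 0.9, 0.1],
    "root": [0.0, 0.1, 2.0],
}
decoded = decode_onset(task_scores, vocabularies)
print(format_label(decoded))  # C: V6
```

Predicting attributes separately keeps each classifier's label set small; redundant attributes such as root and quality can then serve as a consistency check on the assembled numeral.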

In this section, we'll look at some of ChordGNN's predictions and even compare them with an analysis done by a human. Below is an example of the first bars from Haydn's string quartet op.20 №3 movement 4.

A comparison between the human annotation and ChordGNN on a passage of Haydn's string quartet op.20 №3 movement 4.

In this example, we can observe several things. In measure 2, the human annotation marks a tonic in first inversion; however, the viola at that point is lower than the cello, and therefore the chord is actually in root position. ChordGNN is able to predict this correctly. Subsequently, ChordGNN predicts a harmonic rhythm of eighth notes, which disagrees with the annotator's half-note marking. Analyzing the underlying harmony in that passage, we can justify ChordGNN's choices.

The human annotation suggests that the entire second half of the 2nd measure represents a viio chord. However, it should not be in first inversion, as the cello plays an F# as the lowest note (which is the root of viio). Still, there are two conflicting interpretations of the segment. In the first, the viio on the third beat is seen as a passing chord between the surrounding tonic chords, leading to a dominant chord in the next measure. Alternatively, the viio could already be part of a prolonged dominant harmony (with passing chords on the offbeats) leading to the V7. The ChordGNN solution accommodates both interpretations, as it does not attempt to group chords at a higher level, treating each eighth note as an individual chord rather than a passing event.

A comparison between the human annotation and ChordGNN on a passage of Mozart's Piano Sonata K279 movement 1. Image by the author.

Above is another example comparing the predictions of ChordGNN with the original analysis of a Mozart Piano Sonata. In this case, ChordGNN's analysis is a bit more simplistic, choosing to omit some chords. This happens on two different occasions with the dominant seventh in third inversion (V2). This is a reasonable assumption for ChordGNN since the bass is missing. Another disagreement between the annotation and the prediction occurs at the half cadence towards the end. ChordGNN treats the C# of the melody as a passing note, whereas the annotator chooses to specify the extension of #11.

In this article, we discussed a new method for automating Roman Numeral analysis using Graph Neural Networks. We discussed how the ChordGNN model works and showcased some of its predictions.

E. Karystinaios, G. Widmer. Roman Numeral Analysis with Graph Neural Networks: Onset-wise Predictions from Note-wise Features. Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), 2023.

All images and graphics in this article are created by the author.