A Framework for Interactive Generation of Music for Games
Kristopher Reese, Roman Yampolskiy, Adel Elmaghraby
Computer Engineering and Computer Science, University of Louisville, Louisville, KY 40202
{kwrees02, roman.yampolskiy, adel}@louisville.edu
Abstract
Tonal music has had a rich history in video games and movies, and music generation has in fact played a minor role in the history of game music as well. Recent developments in music theory have derived representations of chord progressions using geometric topologies. Unlike prior generative music, the framework proposed in this paper approaches tonal music generation by building networks of chords using these geometric topologies. This geometric network of chords can then be used inside reinforcement learning models to learn the best motions in the progression. The proposed method uses Q-Learning models, rewarding acceptable chords such as major and minor chords. Rewards are also given to chords within a specified scale. The proposed framework approaches tonal chord progressions by keeping a tonal center in the progression. Methods for creating interactive and unique music for video games are also discussed.
Index Terms
Q-Learning, Music, Generative, Interactive
I. INTRODUCTION
Music has long been a part of the video game industry, and much of that music has become part of mainstream culture. Concerts dedicated to music from video games have now become a common event in concert halls across the world. However, the production of music for video games is usually left to composers. For composers to reach the full gamut of emotions, many different pieces have to be composed, which can be very expensive. For smaller gaming companies or independent game programmers, a music system capable of using the full spectrum of chords and emotions would greatly benefit game design.

Music is known to affect one's emotional and physiological state, and recent psychological studies have shown the effect of music on various affective states. Scherer and Zentner [1] lay out how strongly various features of music affect states including emotion and mood. There is an argument as to whether music evokes a genuine emotion or whether the listener merely perceives emotions in compositions [2]. This debate, however, is unnecessary for understanding the effects that music can have in video games. The field of Music Therapy plays a role in helping to ease various disorders; it has been applied to dementia care [3], schizophrenia [4], and autism [5]. These studies show that music can have a strong therapeutic effect, which makes it well suited for implementation in therapeutic games.

Music has other effects on humans as well. We often use music, or the lack thereof, to warn us of danger or to signal change for better or for worse. Recent studies have shown that the stress levels of video game play are also affected by music in a physiological way [6]. These physiological effects are caused by the secretion of various hormones in response to stimulation of the auditory system.

It is both these physiological and these therapeutic effects of music that can be of interest to serious and therapeutic game developers.
With this understanding, a more adaptive type of music may become of interest to these developers, so that moods and emotions can be influenced without the need for large amounts of human-composed music.

This paper proposes a system that attempts to mimic a human composer's ability to write music, but does so on the fly, as needed by the game. Recent developments in music theory have provided a means for systems that use stochastic decision making to generate chords, and it is this concept that this paper explores further.

This method differs from grammatical structures in that it allows the generated music to use the full spectrum of possible chords: from traditional chords, to chromatic chords, to potentially non-harmonic chords. The method discussed here is an experiment into the feasibility of this approach, and its output is not expected to approach or exceed the output of grammatical methods.

The second and third sections of this paper discuss and develop the framework for generating music in a chord progression. Section II focuses on the geometrical topology of chord theory proposed by Tymoczko. It concentrates on motion in the model based on two-dimensional topologies, but a generalization to any N-dimensional space is discussed as well. Section III discusses the reinforcement learning model, Q-Learning, initially used in this framework, as well as its extension for generating chords and the reward system used by the framework. A method for resolving voice leading issues is also discussed. Section IV examines the measure of tonality used for the generated music and looks at a passage generated by the framework. Section V proposes ideas for uses in video games and other interactive methods for music generation. Section VI concludes the paper. Future work and developments are discussed throughout Sections IV-VI.

Fig. 1. The two-dimensional orbifold described by Dmitri Tymoczko [7]–[9], showing the movements of the passage in Figure 3 on the two-dimensional space. Blue lines are the first to second interval; green is second to third; red is third to fourth; black is fourth to fifth.

II. GEOMETRIC MUSIC THEORY
Music theory is full of discussions about the language of music and what makes music sound good to listeners. Because of this, tonal music theory has become more of a grammatical language than a mathematical study. However, recent developments in music theory have begun to address its mathematics. Tymoczko determined that there is a latent model that can represent tonal harmonic movement in any n-note chord using an n-dimensional orbifold topology [7]–[9]. This model plays a significant role in the development of the proposed framework and is discussed in this section.
A. One- and Two-dimensional Spaces
Chords in music generally consist of three or more pitch classes. However, Tymoczko's model also generalizes well to non-chord spaces in music. Tymoczko discusses a single-pitch model, which maps to a one-dimensional topology that he calls a "circular pitch-class space" [7]. To think about these single pitch classes, we can represent each pitch class on a line [7], [9]. As on a keyboard, once the last pitch has been reached, the pitch classes repeat: once we go past G, we return to a pitch class of A.

We can discuss this more precisely using a formal notation. $E \xrightarrow{-12} E$ represents a movement from the pitch class E to the same pitch class, and is topologically equivalent to the movement $E \xrightarrow{+0} E$, though in practice the two carry different information. We read the first notation as: "E moves down 12 semitones (a musical octave)." The circle therefore also captures information about the movement of the notes. A negative motion represents a downward movement of the note's frequency (making the note sound lower), and a positive motion represents an increase in the note's frequency. Thus, although the two representations are topologically the same, we know that in the first notation the starting E is an octave higher than the ending E.
Fig. 2. Movements possible in the two dimensional interval space.
Tymoczko places more emphasis on understanding the two-dimensional musical space, likely because it is much simpler to understand than musical spaces of three or four dimensions. By understanding the two-dimensional space, one can begin to understand the higher-dimensional topologies as well.

Figure 1 shows a visual representation of the two-dimensional intervallic space. This model can be built by combining the circular one-dimensional spaces of each single note. The first and second notes of the interval determine how movement is plotted onto a topological mesh.

Tymoczko also describes motions of pitch classes in the model. Contrary motion in music is described as the notes of the interval moving in different directions. In this model, it is described by vertical movement from one interval to another, where upward movement represents the notes moving toward one another and downward movement represents the notes moving away from one another. Parallel motion in music is described as both notes moving in the same direction, which is represented as movement to the right or left in the mesh. Moving to the right results in parallel movement upward, and moving to the left is parallel movement downward [7], [9]. The possible movements in this two-dimensional space are shown in Figure 2.

We also need to understand what happens when we reach the edge of the graph. Tymoczko describes this two-dimensional space as repeating on the right and left in the same way a Möbius strip does. The right and left sides of the plot are brought back around and twisted so that the [F, F] pairs match up and the [C, C] pairs match up [7], [9].

Using the same formal language introduced for the one-dimensional space, we can understand movements in this space as well. The musical passage taken from [10] in Figure 3 contains a two-voice intervallic passage, and we can represent the movement of the two voices on the graph. The movement from the first interval to the second is in parallel motion, but the voices do not move the same distance. Therefore, we move in the parallel direction until one of the voices reaches its note, and then move the other voice's note to the proper note along the diagonal. The second to third interval is in contrary motion, so we move vertically and then fix the note movement. The next interval is parallel with some voice fixing, and lastly the fourth to fifth interval is a simple movement of the top voice. These movements along the diagram are shown in Figure 1.

Fig. 3. A simple two-voice passage containing various intervallic jumps. The time signature is chosen so that the music fits into a single measure.

We can also formally define these movements as
$(C, E) \xrightarrow{+2,\,+3} (D, G)$ for the first to second interval, $(D, G) \xrightarrow{-1,\,+2} (C, A)$ for the second to third, and so on. In this instance, we simply extend the formal language to include two notes and two values of change. These concepts are further generalized to any n-dimensional chord space by Tymoczko in [9].
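The formal notation extends mechanically to any number of voices. The sketch below, again assuming MIDI-style note numbers and a fixed voice order from lowest to highest, computes the signed displacement of each voice and labels two-voice motion with the terms used in this section. The function names, and the labels for static and single-voice motion, are our own and not Tymoczko's.

```python
from typing import Sequence, Tuple

NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def motion(chord_a: Sequence[int], chord_b: Sequence[int]) -> Tuple[int, ...]:
    """Signed semitone displacement of each voice (voices listed in the same order)."""
    if len(chord_a) != len(chord_b):
        raise ValueError("both chords need the same number of voices")
    return tuple(b - a for a, b in zip(chord_a, chord_b))


def classify_two_voice(delta: Tuple[int, int]) -> str:
    """Label two-voice motion with the terms used in this section."""
    low, high = delta
    if low == 0 and high == 0:
        return "static"
    if low == 0 or high == 0:
        return "one voice moves"
    if (low > 0) == (high > 0):
        return "parallel"  # same direction: drawn to the left or right in the mesh
    return "contrary"      # opposite directions: drawn vertically in the mesh


def notation(chord_a: Sequence[int], chord_b: Sequence[int]) -> str:
    """Render a move as (notes) --(changes)--> (notes)."""
    before = ",".join(NAMES[p % 12] for p in chord_a)
    after = ",".join(NAMES[p % 12] for p in chord_b)
    changes = ",".join(f"{d:+d}" for d in motion(chord_a, chord_b))
    return f"({before}) --({changes})--> ({after})"


print(notation((60, 64), (62, 67)))                    # (C,E) --(+2,+3)--> (D,G)
print(classify_two_voice(motion((60, 64), (62, 67))))  # parallel
print(notation((55, 60, 64), (55, 59, 62)))            # (G,C,E) --(+0,-1,-2)--> (G,B,D)
```

The three-voice call shows the same notation applied to a triad, which is the case the generation framework works with.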
B. N-dimensional Space
It is in the three- and four-dimensional topologies that we begin to see what is traditionally understood in music to be a chord. These higher-dimensional spaces become harder to visualize; however, Tymoczko's model generalizes to any dimensional space.

The three-dimensional space is shown in Figure 4. This space is very similar to the two-dimensional space described in the previous section, except that it contains a third note. Because of this, Tymoczko concludes that the shape of the model is a triangular prism [7]–[9]. The model contains two folds that connect the edges of the prism together. In Figure 4, the (C, C, C) points match up, as do the (E, E, E) and (G, G, G) points [9].

[10] concludes that higher-dimensional chords would exist in more modern music, especially in jazz, where five- and six-note chords are not uncommon. In these spaces, we would need to visualize five- or six-dimensional topologies. Although these are hard to visualize, they could be implemented in this framework. Because of the difficulty of explaining these spaces, they are not discussed in this paper; a further discussion can be found in [9] and [10].

Fig. 4. The three-dimensional orbifold described by Dmitri Tymoczko [7]–[9].

III. GENERATION USING Q-LEARNING
Using the models developed above, a framework can begin to take shape that uses these principles to create tonal music. Instead of creating a grammatical structure that the music has to follow, a purely mathematical approach can be built using concepts from dynamic programming, reinforcement learning, and topological geometry. In this section, we discuss the Q-Learning model and modify it to generate chords in a progression.
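As an illustration of what a network of chords can look like in code, the sketch below enumerates every three-note pitch-class multiset (ignoring voice order, as points of the three-note orbifold do) and links chords that differ by one voice moving a single semitone. Restricting edges to single-semitone moves is an assumption made for brevity; the actions developed in this section allow larger and multi-voice moves.

```python
from itertools import combinations_with_replacement
from typing import Dict, Set, Tuple

Chord = Tuple[int, ...]  # sorted pitch classes (C=0); voice order is ignored


def all_triads() -> Set[Chord]:
    """Every three-note pitch-class multiset, i.e. every point of the 3-note space."""
    return set(combinations_with_replacement(range(12), 3))


def neighbours(chord: Chord) -> Set[Chord]:
    """Chords reachable by moving exactly one voice up or down one semitone."""
    result = set()
    for voice in range(len(chord)):
        for step in (-1, 1):
            moved = list(chord)
            moved[voice] = (moved[voice] + step) % 12  # wrap around the pitch circle
            result.add(tuple(sorted(moved)))
    result.discard(chord)
    return result


def build_network() -> Dict[Chord, Set[Chord]]:
    """Adjacency map that could serve as the state space for Q-Learning."""
    return {chord: neighbours(chord) for chord in all_triads()}


network = build_network()
print(len(network))                # 364
print(sorted(network[(0, 4, 7)]))  # [(0, 3, 7), (0, 4, 6), (0, 4, 8), (0, 5, 7), (1, 4, 7), (4, 7, 11)]
```

Under this assumption, C major (0, 4, 7) is linked to six neighbours, among them C minor (0, 3, 7) and the augmented triad (0, 4, 8).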
A. Q-Learning
Q-Learning is a reinforcement learning technique developed by Watkins [11]. It works by learning action-value functions that give the expected utility of taking a given action in a given state. Unlike solving a Markov Decision Process directly, which requires a model of the transitions, Q-Learning approximates the solution of the Markov Decision Process from sampled experience, which speeds up running time. Unlike standard path-planning algorithms, it allows for random movement, with some probability that we will not reach the state we intended.

The Q-Learning algorithm uses a Bellman update equation as part of the algorithm itself. This allows the algorithm to implicitly encode transitions and utilities into the Q matrix that it builds. Later developments extended this model with a learning rate, creating a delayed Q-Learning model; this addition gives the Q-Learning model Probably Approximately Correct (PAC) learning guarantees [12].

Sutton et al. simplified the Q-Learning update with PAC to the form shown in equation 1. In this equation, $\alpha(s,a)$ represents the learning rate for a state-action pair, $R(s)$ represents the reward of a given state, $\gamma$ represents the discount factor, and $Q(s,a)$ represents the current Q matrix value of a state-action pair. $s'$ represents the state that the algorithm proposes to move to.
$$Q(s,a) \leftarrow Q(s,a)\bigl(1-\alpha(s,a)\bigr) + \alpha(s,a)\Bigl[R(s) + \gamma \max_{a'} Q(s',a')\Bigr] \tag{1}$$

This learning algorithm takes in a list of states, a list of actions for each state, and a list of rewards for each state. We iterate the algorithm for a specified number of episodes to train it sufficiently. During each episode, we choose a state at random from the state list. Next, we take a random action from that state's action list, and with that state-action pair we evaluate equation 1. We continue taking random actions until a terminal state is reached, at which point we begin another episode from a new, randomly chosen starting state. Like its counterpart, the MDP, this algorithm converges [13], provided every state-action pair continues to be visited and the learning rate decays appropriately.

After this learning phase is complete, a Q matrix is returned that contains estimated utilities for each state-action pair. With these utilities, a traversal from any location can choose the actions that maximize the estimated utility for the state: the action with the highest utility value is the action we take.

If we run the algorithm on the 4x4 world shown in Figure 5, we obtain the action policy that is also shown in the figure. The Q matrix can be computed for any number of episodes; the more episodes that are run, the more accurate the policy. In a small world, the policy tends to converge within a small number of episodes.
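The training loop described above can be sketched in Python for a 4x4 world like the one in Figure 5. The positions of the terminal states, the 20% chance that a random action replaces the chosen one, the discount factor, and the visit-count learning rate alpha(s, a) = N(s, a)^-0.6 are all assumptions for illustration, not values from this paper.

```python
import random
from collections import defaultdict

ROWS, COLS = 4, 4
TERMINALS = {(0, 3): 1.0, (1, 3): -1.0}  # assumed positions of the +1 / -1 states
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
ACTION_NAMES = list(ACTIONS)


def reward(state):
    """R(s): +1 or -1 on the terminal states, 0 everywhere else."""
    return TERMINALS.get(state, 0.0)


def step(state, action, rng, slip=0.2):
    """Move one cell; with probability `slip` a random action replaces the chosen one."""
    if rng.random() < slip:
        action = rng.choice(ACTION_NAMES)
    dr, dc = ACTIONS[action]
    r, c = state[0] + dr, state[1] + dc
    if 0 <= r < ROWS and 0 <= c < COLS:
        return (r, c)
    return state  # bumping into the edge leaves the state unchanged


def q_learning(episodes=3000, gamma=0.9, max_steps=200, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)     # Q[(state, action)], initialised to 0
    visits = defaultdict(int)  # N(s, a), drives the learning rate alpha(s, a)
    starts = [(r, c) for r in range(ROWS) for c in range(COLS) if (r, c) not in TERMINALS]
    for _ in range(episodes):
        s = rng.choice(starts)  # each episode starts in a random non-terminal state
        for _ in range(max_steps):
            a = rng.choice(ACTION_NAMES)  # random exploratory action
            s_next = step(s, a, rng)
            visits[(s, a)] += 1
            alpha = 1.0 / visits[(s, a)] ** 0.6
            if s_next in TERMINALS:
                future = reward(s_next)  # a terminal state's utility is its reward
            else:
                future = max(q[(s_next, b)] for b in ACTION_NAMES)
            # Equation 1: Q(s,a) <- Q(s,a)(1 - alpha) + alpha [R(s) + gamma max_a' Q(s',a')]
            q[(s, a)] = q[(s, a)] * (1 - alpha) + alpha * (reward(s) + gamma * future)
            if s_next in TERMINALS:
                break
            s = s_next
    return q


def greedy_policy(q):
    """Pick the action with the highest estimated utility in every non-terminal state."""
    return {
        (r, c): max(ACTION_NAMES, key=lambda a: q[((r, c), a)])
        for r in range(ROWS)
        for c in range(COLS)
        if (r, c) not in TERMINALS
    }


policy = greedy_policy(q_learning())
for r in range(ROWS):
    print(" ".join(policy.get((r, c), "END").ljust(5) for c in range(COLS)))
```

Because behaviour is purely random during training, the learned Q values still describe the greedy policy; the printed grid plays the role of the arrows in Figure 5.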
B. Q-Learning for chord progressions
Methods for creating actions in the topological mapping described by Tymoczko in [9] were discussed in previous work [10]. This modification relies heavily on modifying the transition matrix of a Markov Decision Process as defined by Bellman [14], [15], as well as in [16], [17]. In this model, the transition matrix is defined as shown on the left side of equation 2. This transition matrix is derived so that it works well for partially observable worlds, known as Partially Observable Markov Decision Processes (POMDPs). However, since we know the structure of the world in its entirety, the chord topology is a fully observable world, and the transition matrix therefore reduces to the probability of an action, as shown in equation 2. These derivations are explained further in [10].
$$P(s' \mid s, a) = P(a) \tag{2}$$

Since our chords will be moving, we can define our action as a velocity vector, $v_x$, containing both directional and "speed" information. The speed is simply how far a state will travel in the world at the next step. We can therefore define a chord transition by replacing the action with the velocity vector, as shown in equation 3.

Fig. 5. A 4x4 world with two terminal states, one positive and one negative. All other states have a reward of 0, though this need not be the case. The arrows represent the policy at each state, which tries to minimize the chance of entering the negative terminal state and maximize the chance of reaching the positive terminal state.
$$P(s' \mid s, v_x) = P(v_x) \tag{3}$$

Since each note can be considered separately, we can split the vector into its components, so that $v_x$ becomes $v_{ijk}$, as shown in equation 4.
$$P(s' \mid s, v_{ijk}) = P(v_i, v_j, v_k) \tag{4}$$

Using the chain rule on the right-hand side, we can further expand the equation as shown in equation 5. Since movement often depends on where another note moved, the initial note $i$ remains independent, the second note $j$ depends only on $i$, and the third note $k$ depends on both $i$ and $j$. Removing the independent components from equation 5 leaves equation 6.
$$P(v_i, v_j, v_k) = P(v_i \mid v_j, v_k)\, P(v_j \mid v_i, v_k)\, P(v_k \mid v_i, v_j) \tag{5}$$
$$P(v_i, v_j, v_k) = P(v_i)\, P(v_j \mid v_i)\, P(v_k \mid v_i, v_j) \tag{6}$$

As mentioned, each vector contains both directional and speed information about the action. We can extract each of these components, so that the probability of each velocity vector is as shown in equation 7.
$$P(v_x) = P(d_x, sp_x) = P(d_x)\, P(sp_x) \tag{7}$$

Replacing each vector with its direction-speed pair leaves the vector probability shown in equation 8. Since we assume that only the direction of each note depends on the previous directions, and that all speeds are independent of both direction and other speeds, we are left with the vector probability shown in equation 9.
$$P(v_{ijk}) = P(d_i, sp_i)\, P(d_j, sp_j \mid d_i, sp_i)\, P(d_k, sp_k \mid d_i, sp_i, d_j, sp_j) \tag{8}$$
$$P(v_{ijk}) = P(d_k \mid d_i, d_j)\, P(d_j \mid d_i)\, P(d_i)\, P(sp_i)\, P(sp_j)\, P(sp_k) \tag{9}$$

Knowing this, we can change the Q-Learning model by exchanging the actions for the velocity vector $v_{ijk}$. This results in the change to the Q matrix shown in equation 10. The Q matrix can remain a two-dimensional matrix by writing a function that maps each combination of velocity vectors to a unique identifier and using that identifier in place of the velocity vector. The Q-Learning method then captures the derivations of the transition matrix implicitly through this combination function.
$$Q(s, v_{ijk}) \leftarrow Q(s, v_{ijk})(1-\alpha) + \alpha\Bigl[R(s) + \gamma \max_{v'_{ijk}} Q(s', v'_{ijk})\Bigr] \tag{10}$$
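A minimal sketch of the combination function and of equation 9 follows, assuming three voices, directions in {-1, 0, +1}, speeds of 1 or 2 semitones, and made-up probability tables in which every conditional distribution sums to 1. The mixed-radix encoding is one possible way to give every velocity vector a unique column in a two-dimensional Q matrix; it is not necessarily the function used in [10], and all names here are hypothetical.

```python
from itertools import product

DIRECTIONS = (-1, 0, 1)  # assumed: down, hold, up
SPEEDS = (1, 2)          # assumed: step size in semitones
VOICES = 3
BASE = len(DIRECTIONS) * len(SPEEDS)  # distinct (direction, speed) pairs per voice

# Every velocity vector v_ijk: one (direction, speed) pair per voice.
ALL_ACTIONS = list(product(product(DIRECTIONS, SPEEDS), repeat=VOICES))

# Hypothetical tables for equation 9; each conditional distribution sums to 1.
P_D_I = {-1: 0.3, 0: 0.4, 1: 0.3}
P_SP = {1: 0.7, 2: 0.3}


def p_dj_given_di(dj, di):
    if di == 0:
        return 1 / 3
    return {-di: 0.5, 0: 0.3, di: 0.2}[dj]  # favour contrary motion against voice i


def p_dk_given_di_dj(dk, di, dj):
    if di == dj and di != 0:
        return {-di: 0.6, 0: 0.3, di: 0.1}[dk]  # balance two voices moving together
    return 1 / 3


def action_probability(v):
    """Equation 9: P(d_k|d_i,d_j) P(d_j|d_i) P(d_i) P(sp_i) P(sp_j) P(sp_k)."""
    (di, si), (dj, sj), (dk, sk) = v
    return (p_dk_given_di_dj(dk, di, dj) * p_dj_given_di(dj, di) * P_D_I[di]
            * P_SP[si] * P_SP[sj] * P_SP[sk])


def encode(v):
    """Combination function: map v_ijk to a unique integer column id."""
    action_id = 0
    for voice, (d, sp) in enumerate(v):
        digit = DIRECTIONS.index(d) * len(SPEEDS) + SPEEDS.index(sp)
        action_id += digit * BASE ** voice
    return action_id


def decode(action_id):
    """Inverse of encode()."""
    v = []
    for _ in range(VOICES):
        action_id, digit = divmod(action_id, BASE)
        d_index, sp_index = divmod(digit, len(SPEEDS))
        v.append((DIRECTIONS[d_index], SPEEDS[sp_index]))
    return tuple(v)


def q_update(q, s, v, reward_s, s_next, alpha=0.1, gamma=0.9):
    """Equation 10, with the Q matrix stored as q[state][action_id]."""
    a = encode(v)
    q[s][a] = q[s][a] * (1 - alpha) + alpha * (reward_s + gamma * max(q[s_next]))


q = [[0.0] * len(ALL_ACTIONS) for _ in range(2)]  # two placeholder states
v = ((1, 1), (-1, 2), (0, 1))
q_update(q, 0, v, reward_s=1500.0, s_next=1)
print(encode(v), decode(encode(v)) == v)  # 82 True
```

Because equation 9 factors the joint probability into conditionals that each sum to 1, the probabilities of all 216 velocity vectors in this sketch also sum to 1.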
C. Rewarding the System
The initial development of this algorithm used some basic assumptions, together with trial and error, to develop a reward system; the values were chosen empirically based on the qualitative sound of the results. To reward the system, we choose specific chords that follow tonal theory. All major chords are rewarded with 1500 points and all minor chords with 150. Augmented and diminished chords are extremely rare in tonal music and are rewarded with a value of only 8. Diatonic chords, those found in a specific musical scale, are rewarded with an extra 1000 points. Other chords, which do not fall into these categories, should be purely accidental, and in this initial development they were given a negative reward.

Terminal states in the system follow tonal music theory's concept of cadences. These chords are diatonic and generally consist of a chord on the fifth note of the scale (V), a chord on the fourth note of the scale (IV), or a chord on the seventh note of the scale (diminished VII). These cadences are most likely to return to a chord built on the first pitch of the scale (I). These chords were taken from music theory as defined by Kostka et al. [18].
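The reward scheme above can be sketched as a function over three-note pitch-class sets (C = 0). The point values come from the text; the specific negative value for other chords, the use of a major scale as the "specified scale", and the choice to award the diatonic bonus only to chords that are also major, minor, augmented, or diminished are assumptions.

```python
MAJOR_SCALE = (0, 2, 4, 5, 7, 9, 11)  # scale degrees above the tonic, in semitones

REWARDS = {"major": 1500, "minor": 150, "augmented": 8, "diminished": 8}
DIATONIC_BONUS = 1000
OTHER_CHORD_REWARD = -100  # hypothetical value; the text only says "negative"

# Interval patterns measured from the root of each triad type.
TRIAD_SHAPES = {
    (0, 4, 7): "major",
    (0, 3, 7): "minor",
    (0, 4, 8): "augmented",
    (0, 3, 6): "diminished",
}


def triad_quality(pitch_classes):
    """Return the triad type of a three-note pitch-class set, or None."""
    notes = sorted({p % 12 for p in pitch_classes})
    if len(notes) != 3:
        return None
    for root in notes:  # try each note as the root, which covers inversions
        shape = tuple(sorted((p - root) % 12 for p in notes))
        if shape in TRIAD_SHAPES:
            return TRIAD_SHAPES[shape]
    return None


def is_diatonic(pitch_classes, tonic=0):
    """True if every pitch class lies in the major scale built on `tonic`."""
    return all((p - tonic) % 12 in MAJOR_SCALE for p in pitch_classes)


def chord_reward(pitch_classes, tonic=0):
    quality = triad_quality(pitch_classes)
    if quality is None:
        return OTHER_CHORD_REWARD
    reward = REWARDS[quality]
    if is_diatonic(pitch_classes, tonic):
        reward += DIATONIC_BONUS
    return reward


print(chord_reward((0, 4, 7)))   # C major in C: 2500
print(chord_reward((2, 5, 9)))   # D minor in C: 1150
print(chord_reward((3, 7, 10)))  # E-flat major, not diatonic in C: 1500
print(chord_reward((0, 1, 2)))   # chromatic cluster: -100
```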
D. Resolving Voice Leading Issues
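For reference while reading this subsection, the greedy voice-leading method from [10] can be sketched as follows. Pitch classes are integers (C = 0), the current chord is listed from lowest to highest voice, and ties are broken toward upward motion and toward the earlier target pitch class; these details and the function name are assumptions rather than specifics from [10].

```python
NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def greedy_voice_leading(current, target):
    """Greedily assign each voice of `current` to a pitch class of `target`.

    `current` lists pitch classes from lowest to highest voice. Returns a
    (from_pc, to_pc, signed_motion) tuple for each voice.
    """
    if len(target) < len(current):
        raise ValueError("target chord needs at least as many pitch classes as voices")
    remaining = list(target)
    moves = []
    for voice in current:
        best = None  # (distance, signed motion, target pitch class)
        for pc in remaining:
            up = (pc - voice) % 12    # modulo-12 distance moving up
            down = (voice - pc) % 12  # modulo-12 distance moving down
            candidate = (up, up, pc) if up <= down else (down, -down, pc)
            if best is None or candidate[0] < best[0]:
                best = candidate
        _, motion, pc = best
        remaining.remove(pc)  # each pitch class of the new chord is used once
        moves.append((voice, pc, motion))
    return moves


G, C, E, B, D = 7, 0, 4, 11, 2
for src, dst, m in greedy_voice_leading([G, C, E], [G, B, D]):
    print(f"{NAMES[src]} -> {NAMES[dst]} ({m:+d})")
```

For the C major to G major example discussed in this subsection, the sketch keeps G, moves C down to B, and moves E down to D.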
With the definitions of actions and tonality given in this section, an issue arises that can make tonal music, even when written by humans, sound qualitatively "bad". Voice leading, the arrangement of notes chosen when moving from one chord to another in a passage, becomes an issue that must be taken into account [18]. This does not imply that our assumptions about music are incorrect; humans also have to consider proper voice leading when writing music.

A greedy voice leading approach is discussed in [10]. In this method, we take a single note from the chord currently being played and calculate the modulo-12 distance in both directions ("Up" and "Down") to every note in the chord we are moving to. This modular arithmetic is important so that a distance of 12 is represented as 0 in the table. Therefore, if we want to move from a C major chord [C, E, G] to a G major chord [G, B, D], we take the voicing of the first chord, which we will assume is [G, C, E] from lowest to highest, and calculate the motion to the new chord. The lowest note, G, moves up 0 pitch classes to G, 4 pitch classes to B, and 7 pitch classes to D. We continue this calculation for all voices in the chord, in both upward and downward directions. To use these values, we loop over each voice and choose the smallest movement among the distances to the next chord. Once a movement has been used, we remove that pitch class of the new chord from the list we are searching [10].

Although in many cases this does not give the optimal solution, composers rarely present the optimal voice leading either, so the greedy approach is acceptable. A more optimal approach is also presented in [10]; however, as chords become larger, the time it takes to calculate the voice leading becomes a problem of tetration (super-exponential growth).

IV. DISCUSSING TONALITY
Measuring the quality of music is a difficult task. Even if we were to measure how much someone liked the music, the definition of tonality does not always include music that people find aesthetically pleasing, and different listeners have different tastes that one has to account for. For the music created here, the measure of tonality was not whether the music turned out to be aesthetically pleasing, but whether it revolved around a single tone.

Figure 6 shows a short 16-measure composition created by the framework. One can initially see that the music is very chromatic, as one would expect from other non-tonal generative methods. There are many sharps and flats in the music, which might lead one to believe that it was generated by a computer, and in fact they would be right. The music starts on the C major chord [C, E, G] and moves about the world. At the end of the 16th measure, the music ends on a C major chord again.

However, a significant improvement over purely stochastic methods of composition is that the music continues to revolve around, and in fact remains on, the pitch class C in the tenor voice (the second voice from the bottom). This suggests that other parameters may need to be taken into consideration to improve the results of the framework, but that the experiment was a relative success: the goal of this framework was to remain around the key of C and to end on the chord on which the music started.

One thing that was not taken into consideration is that chord progressions are not a single-time task. Chord progressions happen over a series of beats in the measure (or time increments), so the Q-Learning model may need to be extended to include this temporal dimension.

There are many algorithms that could be used to experiment further with this framework. The temporal difference learning algorithm [19] is a common method for solving the reinforcement learning problem with time-delayed rewards.
This could help create a system that takes any number of specified or learned steps to reach the goal. Another approach is to learn the reward systems using Bayesian networks. Music from famous composers could be run through a system that learns the chord progressions and begins to reward certain chords more prominently based on what is learned. This would likely make the music much more pleasing while still remaining relatively stochastic. The goal is not to imitate

V. INTERACTIVE GENERATION FOR GAMES
One of the benefits of using this reward system approach to generative chord progressions is that any number of reward systems can be used to learn the rewards for specific chords. A video game could provide auditory music feedback on how well a user is doing in the system. As the user begins