A Comprehensive Game Design Methodology
From First Ideas to Spectacular Pitches and Proposals

The content of this website is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0). You can freely use and share everything non-commercially and under the same license as long as you attribute it to and link to:

J. Martin


Level Three: Plurimediality

Process Phase Level Three

Beat 3. Sound

From Score to Speech

As a reminder, the Plurimediality territory—the intersection between Ludology and Cinematology—is concerned with the game’s overall characteristics in terms of usability and aesthetics, or efficacy and enchantment. Therefore, sound is discussed in this beat with regard to design decisions that apply to the game as a whole. The question of how sound elements can trigger and support emotions in individual gameplay moments, in contrast, belongs to the Narrativity territory and is discussed in Level Four: Narrativity.

First, we will break “sound” down into its components, which we call types. For each type, then, we will assign the roles and tasks it has to fulfill in a game, which we call functions. Next, each type comes in different forms and can appear in different modes to fulfill its functions. That way, we will have three basic types of sound, each of which fulfills certain functions in a variety of forms and modes:

  • Type: Music, Foley, and Speech
  • Function: Mood, Feedback, and Information
  • Form: Forms of Music, Forms of Foley, and Forms of Speech
  • Mode: Diegetic, Meta-Diegetic, and Non-Diegetic
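If you like to keep track of such decisions in a spreadsheet or a script, the four dimensions above can be written down as a small data model. The following is only a minimal sketch, assuming one record per sound element; the names `SoundType`, `Function`, `Mode`, `SoundDecision`, and `without_function` are hypothetical and not part of the methodology itself.

```python
from dataclasses import dataclass, field
from enum import Enum


class SoundType(Enum):
    MUSIC = "music"
    FOLEY = "foley"
    SPEECH = "speech"


class Function(Enum):
    MOOD = "mood"
    FEEDBACK = "feedback"
    INFORMATION = "information"


class Mode(Enum):
    DIEGETIC = "diegetic"
    META_DIEGETIC = "meta-diegetic"
    NON_DIEGETIC = "non-diegetic"


@dataclass
class SoundDecision:
    """One planned sound element (hypothetical record layout)."""
    sound_type: SoundType
    form: str                    # free text, e.g. "string quartet"
    mode: Mode
    functions: set = field(default_factory=set)


def without_function(decisions):
    """Return the entries that do not name any function yet."""
    return [d for d in decisions if not d.functions]


plan = [
    SoundDecision(SoundType.MUSIC, "string quartet", Mode.NON_DIEGETIC,
                  {Function.MOOD}),
    SoundDecision(SoundType.SPEECH, "dialogue", Mode.DIEGETIC),
]
```

Calling `without_function(plan)` returns only the speech entry, which is a prompt to either give it a function or reconsider it.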

There are two distinct sets of design decisions you have to deal with. The first set is about the type or types of sound that you want to use in your game in principle. Should there be music? Should there be foley? Should there be speech? These decisions depend on the general functions you have in mind. The second set of decisions is about the different forms of music, foley, or speech that you want to use, what type-specific functions they should fulfill, and how they relate to the game world in terms of diegesis, that is, the difference between what is part of the game world and what isn’t, which we will discuss in more depth later.

What makes these design decisions tricky is that they appear much more obvious than they actually are. You have to think very carefully even about the most basic choices.

Let’s start with the first set, the type or types of sound you want to use in your game in principle, contingent upon the general functions they should fulfill in your game. Remember, it’s “efficacy first” not because efficacy is more important than enchantment, far from it. It’s efficacy first because everything has to have a function, which can also be an aesthetic function. Everything without a clear-cut function will violate the directive of skill, style, and subject matter.

Now, each of the three basic types, music, foley, and speech, can serve any or all of the three basic functions. They’re all capable of setting the mood, giving feedback, or providing information. Let’s take a closer look at these functions to get a better picture.

Mood affects or even controls player emotion. Feedback on player actions includes menu feedback and input feedback in all three interfaces, as discussed in Beat 1. Style. Information includes meaning and cues for interpreting any given element in the game world. Each function, moreover, can be achieved by its negation: silence. The absence of music, foley, or speech can convey all kinds of mood, feedback, and information as impressively and efficiently as their respective presence. And between presence and absence, between full throttle and silence, there’s the whole world of intensity to play with, from the grandiose to the mesmerizing to the minimalist.

Accordingly, the questions you have to ask yourself are:

  • Will your game need music (score/soundtrack), and which function or functions will it serve?
  • Will your game need foley (sound effect/ambient sound), and which function or functions will it serve?
  • Will your game need speech (monologue/dialogue), and which function or functions will it serve?

Let’s examine a few aspects of music, foley, and speech in more detail to demonstrate that you shouldn’t take any answer to these questions for granted.

1999’s Aliens versus Predator is a good example for music. You could play the game with the soundtrack CD mounted or without, and that made a huge difference, especially when playing the colonial marine levels. The music is great. If it is on, the player feels propelled forward with a sense of mounting suspense and epic adventure. Without the soundtrack, though, the player becomes much more aware of their actual isolation and loneliness, and the horror of being at the mercy of their environment and their ability to read environmental cues. Both playing experiences are breathtaking. But they are different in essential ways. It’s a rather extreme example, but whether or not to use music is almost always a decision that will change the character of your game, sometimes profoundly. In a dialogue between Ennio Morricone and Sergio Miceli in Composing for the Cinema, the metaphor of film music as a “guest” is introduced. Such a guest should not be invited out of habit, just because most films (or games, in our case) happen to have music. That guest should be invited or not invited after careful consideration. As Morricone puts it, music should be present for poetic reasons. Otherwise, this guest would be useless and could even make a bad impression. Also, music shouldn’t be a surprise guest. If it isn’t carefully introduced and listeners are not prepared for it, it might not be able to fulfill its functions.

In terms of foley, in contrast, it’s hard to come up with a game that would be able to manage without. Acoustic feedback is so vital, from menus to player actions to environmental cues, that a game without any foley is almost inconceivable. Even grunts, cries, and emotive sounds of any kind that are not speech are essentially foley. And there are still games released like Zelda: Breath of the Wild that—with the exception of cutscenes—convey what non-player characters have to say in written form while using foley to indicate speech.

Which brings us to speech. Especially in dramatically complete games, discussed in Level One: Integral Perspectives I, speech seems inevitable. It isn’t. Speech can be conveyed through written text, or, as in Zelda: Breath of the Wild and many other games before it, a combination of written text and foley. Speech can be conveyed by symbols; think The Sims. Then, information can be conveyed by other means than speech, text, or symbols altogether. Naturally, it’s much harder to convey information visually or audiovisually, such as combining visual representation with music or foley, than through speech or text. Speech is overrated because it’s so much easier to put something into words than it is to think about alternatives. What’s more, speech can make things worse! Case in point, Uncharted 4: A Thief’s End and Uncharted: The Lost Legacy. Like the whole series, these are hugely enjoyable games. But their disposable enemies have a rather limited set of dialogues in the manner of “Got anything?” “Nothing.” “Check over there. I’ll go this way.” “Sure thing, mate.” And so on, over and over. Rarely are these dialogues warranted, if at all. The mercenaries should have followed their routes and procedures and used hand and arm signals to communicate observations and orders, if necessary, as professionals are wont to do. In addition, at no point are any of these dialogues needed to alert the player to enemy presence or reveal cues on how to engage or evade them. And because putting something into words is so easy, it obscures the fact that speech never comes cheap. You need great dialogue writers, great voice talent, and you need the budget to pay them. In terms of quality, it is never cheaper to use speech instead of other means to convey mood, feedback, or information. Yet, to convey mood, feedback, and information, the use of speech is so natural as to be perceived as “free,” so all this is easily forgotten.

Silence, finally, needs the contrast to non-silence to work. In other words, silence will only be able to set the mood, give feedback, or convey information if it temporarily replaces either music, foley, or speech. Thus, if you need silence, you also need music, foley, or speech.

That should suffice to get you on track to decide if you want to use music, foley, or speech in your game. When you have decided which type or types you want to use, and what functions they need to fulfill, you have completed your first set of design decisions!

Now let’s consider the second set of design decisions. Here, you will learn about the different forms you can choose for music, foley, and speech, and which forms are better suited for which functions, including type-specific functions in two cases. Mode will also be discussed for each type, that is, how music, foley, and speech work as part of the game world and outside the game world.

Again, let’s start with music. If you decided that you want to use music in your game, you will have to make decisions according to form, its type-specific functions, and mode.

The form decision refers to the kind of music you want to employ—everything from a vast voice & orchestra apparatus that would put Gustav Mahler to shame to rock and rap music and synth landscapes to sparsely scattered notes from a lonely blues harp. The basic categories we will work with are art music, popular music, and traditional music. Certainly, these categories have fuzzy borders, and each category contains a whole universe of forms in turn. But in our context of designing games, these three categories are the most practical, and they serve our purposes well.

Your form decisions should be based on the following criteria, in order of importance: your theme, your value set, your primary target audience’s preferences and their expectations. The criteria are not: personal taste, current hotness, easy availability. Furthermore, your budget shouldn’t drive your principal decision, only how you apply that principal decision. If art music (not a perfect designation, but vastly preferable to “classical” or “serious” music) is the best match for your theme, value set, and target audience, a full-scale orchestra isn’t your only choice! What about a chamber orchestra? a woodwind ensemble? a string quartet? a solo artist of any kind, like voice, piano, violin, or concert flute? an a cappella chamber choir? There are plenty of options to choose from, and you will certainly find something that vibes with your criteria and your budget as well. Popular music (again, not a perfect designation, but at least everybody knows what it refers to) also has a wide price range, most obviously in terms of whom you want to hire or what you want to commission or license, and how many songs or ambient pieces you will need for your game. The third principal form, traditional music, also opens up a wide range of possibilities, and we will get back to this form when discussing its strengths in terms of function.

In all cases, keep the question of licensing in mind. For art music from composers that have been dead for a sufficient amount of time, you don’t have to pay license fees for the music itself, but in almost all cases you do for recorded performances. Unless, of course, you either perform it yourself or have it performed by musicians hired and paid by you. (And then, don’t copy any published sheet music your musicians might need, but buy copies from the publishers.) Yet, be careful: a huge amount of terrific art music comes from composers who are alive, or from composers whose copyrights haven’t expired. For popular music, you almost always have to pay license fees, and if you pay a cover band to perform somebody else’s music, that applies too. And if you hire musicians who play their own music or indeed write music or songs for your game, yes, you will have to pay license fees! For traditional music, things are again a bit different. Traditional music is often license-free, but specific arrangements are usually not. So beware. Also, be both sensitive and sensible with respect to cultural appropriation. Borrowing expressions from a cultural group that is not yours without any involvement from members of that group is never a good idea. And be generous in giving back! Which could be granting your own arrangement back into the Public Domain instead of nailing it shut with a copyright claim.

The one thing you want to avoid at all costs is any kind of contract where music licenses will expire after a number of years. You may think, well, in such-and-such a number of years, I won’t care anymore. Don’t. It’s a pest for everyone involved. Maybe your game has become a classic by then. In that case, you will be forced to renegotiate license fees to keep selling your game, or let others sell your game, and you’re not exactly in a position to bargain. Alternatively, if you go and take the music with expired licenses out of your game and replace it with something else, everybody will be rightfully salty and call you names. Don’t attach yourself so much to a certain idea that others can take your game hostage. Instead, look around. There is so much talent out there, and independent talent at that, for art and popular and traditional music. For decent pay and with a decent contract, all this can be avoided and your game will live and shine forever.

Next, let’s drill down to music’s specific function palette. These are functions that music can serve natively, beyond the three basic functions of mood, feedback, and information that all three types of sound have in common. This palette has four sections that work like primary colors, so to speak: evoke, illustrate, identify, and mesmerize.

  • Evoke has everything to do with setting the mood and presenting and eliciting emotions.
  • Illustrate involves accompanying, intensifying, reacting to, and advancing dramatic developments.
  • Identify refers to presenting characters, cultures, locales, and motifs and making them recognizable.
  • Mesmerize contributes to keeping the player spellbound, engaged, and focused.

In the “evoke” section, you will need to develop a good understanding about mood and emotions as such, the specific mood and emotions you want to establish, and the musical techniques needed to evoke precisely the mood and the emotions you have in mind.

In the “illustrate” section, you must be able to carry out what is known as “spotting sessions.” In a spotting session, you figure out together with the composer and/or the musicians where the music should start in a level, where it should stop, what it should do in between and why, including doing nothing, and a host of dramatic details, all without wasting their time and/or embarrassing yourself. Moreover, in case your game isn’t scripted to hell and back, you need the means to adapt your music in real-time to what’s actually going on during play. Long ago, there were engines and container formats like iMuse or MOD, now almost forgotten, that contained a number of layers of the same track in different keys, tempi, types of instrumentation, and so on, plus a handful of transitional sections, that could adapt to actual player behavior. Fortunately, there’s been a resurging interest in dynamic game audio, and a slew of freshly-baked dynamic music systems have popped up that you can choose from. But it’s not an easy task—if you want to read up on this, Tim van Geelen’s “Realizing Groundbreaking Adaptive Music” is a good place to start.
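To make the idea of layers plus transitional sections a bit more tangible, here is a minimal sketch of how a game might choose what to queue next from a single "intensity" value. It is not how iMuse, MOD, or any current middleware works internally; the class names, the intensity scale from 0.0 to 1.0, and the segment names are all assumptions made up for illustration, and a real system would also have to deal with beat-synchronized switching, crossfades, and key or tempo changes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    name: str
    min_intensity: float  # layer becomes eligible at or above this value


class AdaptiveTrack:
    """Chooses a layer of one track from a gameplay intensity value."""

    def __init__(self, layers, transitions):
        # Sorted ascending so that the highest eligible threshold wins.
        self.layers = sorted(layers, key=lambda layer: layer.min_intensity)
        # Maps (from_layer, to_layer) to the name of a transitional section.
        self.transitions = transitions
        self.current = self.layers[0]

    def update(self, intensity):
        """Return the segment names to queue next for this intensity."""
        target = self.layers[0]
        for layer in self.layers:
            if intensity >= layer.min_intensity:
                target = layer
        if target == self.current:
            return [self.current.name]
        bridge = self.transitions.get((self.current.name, target.name))
        self.current = target
        # Queue the transitional section first, if one was composed.
        return [bridge, target.name] if bridge else [target.name]


track = AdaptiveTrack(
    layers=[Layer("explore", 0.0), Layer("tension", 0.4), Layer("combat", 0.8)],
    transitions={("explore", "tension"): "riser",
                 ("combat", "explore"): "release"},
)
```

Where no transitional section exists for a pair of layers, the sketch simply cuts to the new layer. In a spotting session, that gap would be exactly the kind of detail to raise with your composer.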

In the “identify” section, think of musical motifs as RFIDs that you can attach to everything that the player has to identify in a “near field” or “contactless” manner, if you will. (Something very similar can be done with foley, but for different ends, to be discussed later.) The most well-known technique is the so-called “leitmotif” technique, of which the best-known examples in turn are Richard Wagner’s operas and Ennio Morricone’s movie scores. Leitmotif means that every character has their own musical motif, and when it is played, you know that the character has appeared, has been there, or will appear shortly; has been talked about or thought about by other characters; or that this character’s personal interests are affected in some way. There’s more, but it should suffice to show how versatile leitmotifs can be. And character leitmotifs aren’t the only game in town! You can attach leitmotifs to different cultures and different species. You can attach leitmotifs to relevant locations, even to ideas a.k.a. motifs! And if you have a talented composer, which you should, they can weave such leitmotifs together into a soundtrack that not only “evokes” and “illustrates” and “identifies,” but tells a deep and rich story about how characters, cultures, locales, and motifs relate to each other. This works for drama as well as for comedy. Just check Richard Wagner’s Meistersinger, where wit and humor and innuendo and outright punchlines are delivered through interwoven leitmotifs.

What Morricone or Wagner can also teach you is that leitmotifs are not the exclusive domain of melody. For the composition of leitmotifs for characters, cultures, locales, and motifs, you can choose from five different domains: melody, harmony, instrumentation, orchestration, and timing. Most of the time, leitmotifs combine several of these domains, but there’s usually one or two domains that stick out.

Here’s the rough guide. Melody is a string of notes that form a recognizable tune. Harmony is what happens when you play different notes at the same time. Instrumentation is the kind of instrument (including human voice) that you choose for performing the notes you have in mind. Orchestration is how you combine different instruments for a particular chord or a particular passage. Timing, finally, is the way tones and chords form patterns over time: tempo, repetition, periodicity, rest (intervals of silence); the length of those notes, chords, and rests; and, in short, everything that is associated with beat or rhythm.

For simplicity’s sake, some prominent sound properties are associated with these domains instead of having their own domains. Included in melody are musical scales with their different melodic characteristics, which is called “mode,” and the frequency range that we experience as higher or lower, which is called “pitch.” Included in harmony are the tonal or atonal relationships and hierarchies between different notes, which is called “tonality.” Included in instrumentation is the tone color/sound quality that depends on the specific instrument or human voice as the source of that sound, which is called “timbre.” So there’s a veritable rabbit hole that you can follow down, a whole world of techniques and opportunities waiting for you to make your leitmotifs an unforgettable experience.

Fig.4.31 Leitmotif Techniques

Finally, the “mesmerize” section. Whenever the player has to focus on a repetitive task, that’s where mesmerizing music should be considered. Being mesmerizing isn’t equivalent to being repetitive, but there’s a strong correlation. Among all the spiritual endeavors in the history of humankind that require prolonged engagement and focus, it is hard to find one that doesn’t involve some form of repetition by the individual or the group as an essential component. Dancing, singing, chanting, shuckling, you name it. For games, the most famous example that comes to mind is the A-type Tetris theme, the “Korobeyniki” folk song arrangement by Tanaka Hirokazu. Traditional or folk music in general, all over the world, is strong in repetition, not only because repeating patterns are easily memorized, but because they are linked to repetitive actions in the context of work or spiritual endeavors or, often enough, both. Certainly, art music and popular music can be mesmerizing as well, from Johann Sebastian Bach’s endlessly rising canon to Steve Reich’s “Clapping Music” to Iron Butterfly’s “In-A-Gadda-Da-Vida,” to give you a taste.

Finally, there’s mode—when and how you should use diegetic music, non-diegetic music, or both. These terms warrant an introduction.

A lot has been written about diegetic and non-diegetic music. On the surface it’s fairly simple: diegetic music belongs to your game world in such a way that the characters in that world can hear it, or would in principle be able to hear it, while non-diegetic music can only be heard by someone who isn’t part of that game world, which is the player or players. If you have a Washington DC level and your player character walks through a subway station and there’s a street musician performing Bach’s chaconne for solo violin, so that your character and all the non-player characters in that metro station can hear it, that’s diegetic. If your player character experiences rugged loneliness on a mountain top to that very same music, but there’s neither a hidden violinist on that mountain, nor a radio or other replay device, so only the player can hear this music, that’s non-diegetic. Especially games that work with period music, like Mafia III or Fallout 4, will have licensed music that plays as diegetic music through radio stations and car radios and such. Most of the time, though, such games will feature non-diegetic music as well, often in the form of ambient soundtracks that kick in when the player switches off the radio, leaves the car, and so on. You can also play with cross-diegetic effects, when originally non-diegetic music becomes diegetic, or vice versa. An unconventional example is Robert Altman’s 1973 movie The Long Goodbye, where John Williams & Johnny Mercer’s title song “The Long Goodbye” pops up again and again during the movie in different forms as supermarket muzak, hippy singalong, a Mexican brass band march, and whatnot, which is at the same time diegetic and non-diegetic, and eerily effective. (On a related note, the appearance of “The Long Goodbye” in Star Wars: The Last Jedi as casino muzak on Canto Bight is baffling.)

Artistic decisions need to be made, but there’s really no hard and fast rule to rely on, so it’s basically up to you.

Yet, there’s more. Between diegetic and non-diegetic, there exists another category that we will call meta-diegetic (it’s one of several available terms, but we will address terminology later). It comprises music that either only the player character can hear (hallucinations, dreams, visions, and so on), or that can be heard by other characters in the game world, but the player character hears it in a markedly different way (transformed, distorted, and so on). We will come back to this and add a few more aspects when we discuss foley and mode, and again when we discuss speech and mode.
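If you keep a sound list for your game, the three modes can be told apart by asking who can hear a given sound. The following is a rough sketch of that question as a decision rule; the function `classify_mode` and its parameter names are made up for illustration, and borderline or cross-diegetic cases will still need your own artistic judgment.

```python
def classify_mode(pc_hears, others_can_hear, altered_for_pc=False):
    """Classify a sound by who, inside the game world, can hear it.

    pc_hears: the player character hears the sound.
    others_can_hear: other characters hear it, or could in principle.
    altered_for_pc: the player character hears it in a markedly
        different way than other characters (distorted, transformed).
    """
    if not pc_hears and not others_can_hear:
        # Nobody in the game world hears it, only the player does.
        return "non-diegetic"
    if pc_hears and (altered_for_pc or not others_can_hear):
        return "meta-diegetic"
    return "diegetic"


street_violinist = classify_mode(pc_hears=True, others_can_hear=True)
mountain_top_score = classify_mode(pc_hears=False, others_can_hear=False)
hallucinated_tune = classify_mode(pc_hears=True, others_can_hear=False)
distorted_radio = classify_mode(pc_hears=True, others_can_hear=True,
                                altered_for_pc=True)
```

The first two calls mirror the subway station and mountain top examples above; the last two cover the two meta-diegetic cases.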

Now that we have looked at the different forms, the functional categories, and the different modes of music, we will do the same thing for foley.

For foley, we can differentiate three general form categories:

  • Natural sounds. These are the sounds of nature and of living beings, humans included, that do not fall into the category of speech or emotive foley (more on that below); sounds that are created by the movement of living beings and natural phenomena; and sounds that are created or modified by extraordinary spaces and environments. A common feature of natural sounds is that instances of the same sound are variable, fluctuating, and unpredictable as to their exact appearance.
  • Artifactual sounds. These are the sounds that emanate from artifacts or the use of artifacts, from tools and equipment and guns to machinery and technology in general. A common feature of artifactual sounds is that instances of the same sound remain constant under similar circumstances and their exact appearance is highly predictable.
  • Emotive sounds. These are the sounds from living beings that communicate internal states like emotions or wants and needs but are not speech.

That’s very coarse-grained, again. It has to be, because categorizing all the possible forms of foley is a hopeless endeavor in any case. For our purposes, these three distinctions suffice, and they are important. The term “artifactual” roughly corresponds to how the term “artifact” is used by archaeologists for something made or modified and used by humans. (For us, though, it will also incorporate what counts as structures and features in archaeological parlance, i.e., artifacts that are not or not easily movable, and we’re not restricted to humans.) The decisive point is that artifactual sounds are predictable and natural sounds are not. Neither are emotive sounds. If your tool or gun or generator or iron tower always makes the same sound or sounds in the same acoustic context, that’s perfectly okay. If your wind or bird or human always makes the same sound or sounds in the same acoustic context, that’s weird. It will grate and attract attention. So if you deal with natural and emotive sounds, provide variation! Don’t pepper your game with Wilhelm Screams unless all your enemies are perfect clones with the narrowest of emotive registers.
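One common way to provide such variation is to record several takes of a natural or emotive sound, never play the same take twice in a row, and nudge the pitch a little on each playback. Here is a minimal sketch of that approach; the class `FoleyEvent`, the sample names, and the pitch range of plus or minus five percent are assumptions for illustration, not values from any particular engine.

```python
import random


class FoleyEvent:
    """Selects the sample and pitch factor for the next playback."""

    def __init__(self, variants, natural, rng=None):
        if not variants:
            raise ValueError("need at least one sample")
        self.variants = list(variants)
        self.natural = natural          # True for natural or emotive sounds
        self.rng = rng or random.Random()
        self.last = None

    def next_sample(self):
        """Return (sample_name, pitch_factor)."""
        if not self.natural:
            # Artifactual sounds may stay constant in the same context.
            return self.variants[0], 1.0
        # Exclude the previous take when alternatives exist.
        pool = [v for v in self.variants if v != self.last] or self.variants
        self.last = self.rng.choice(pool)
        # Small random pitch shift; the range is an arbitrary assumption.
        return self.last, self.rng.uniform(0.95, 1.05)


bird = FoleyEvent(["chirp_a", "chirp_b", "chirp_c"], natural=True)
generator = FoleyEvent(["hum"], natural=False)
```

Each call to `bird.next_sample()` yields a different take than the one before, while `generator.next_sample()` always returns the same sample at its original pitch.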

In contrast to music, and later also speech, foley does not have its own native function palette. That way, we can simply apply our three basic categories mood, feedback, and information: the sounds of a summer forest after a brief rain shower; the sound when you hit an enemy, pick up an object, or select a menu option; the sound a creature or enemy makes habitually before you can see it, or before it can see you. Let’s have a look at these in more detail.

  • The Mood Function of Foley. Using foley, the mood function isn’t just about communicating emotions from the emotional landscape like joy, anger, or triumph. Primarily, it’s about establishing a sense of place, time, and context. You can use foley to evoke a certain historical epoch or a certain geographical region. You can use foley to convey a gigantic, commanding environment or a crushingly claustrophobic place, a mounting sense of urgency or feelings of peace and relaxation.
  • The Feedback Function of Foley. Using foley, the feedback function provides feedback on player action either within or outside the game world. Within the game world, for example, foley should indicate that the player has successfully dropped or picked up an item. Outside the game world, for example, foley should indicate that the player has added, removed, or rearranged an item in their inventory. As will be seen below, foley’s feedback function is strongly connected to its mode.
  • The Information Function of Foley. Using foley, the information function has many different duties. One of the most prominent is to provide cues: distinctive and characteristic sound markers that are attached to anything that has the potential to be dangerous. Like the leitmotif technique discussed above, these markers work like RFID tags, and the player can recognize and distinguish enemies and dangerous situations instead of bumping into them unprepared. Moreover, these markers, or cues, should always be designed in such a way that they not only inform the player about the presence or approach of a certain creature or dangerous situation, but also about its type and its character.

In terms of mode, finally, foley can also be diegetic, meta-diegetic, or non-diegetic. On the whole, being diegetic or non-diegetic depends on the interface type, as discussed in Beat 1. Style. Almost always, foley sounds from the preference interface, the inventory interface, and in some cases from a traditional overlay gameplay interface, will be non-diegetic. In transparent or skeuomorphic gameplay interfaces, foley sounds will almost always be diegetic. But that’s just a guidepost. Like music, foley sounds from the gameplay interface can also be meta-diegetic when only the player character can hear them, or the player character experiences them in a substantially different way than other characters.

In “A Conceptual Framework for the Analysis of First-Person Shooter Audio and its Potential Use for Game Engines,” Mark Grimshaw and Gareth Scott differentiate diegetic sound events even further for multiplayer action along specified functions like who triggers these sounds or who can hear them, adding the terms ideodiegetic, telediegetic, kinediegetic, and exodiegetic to the pool. If your game includes multiplayer action, you might want to check this out. Here, we will limit ourselves to the terms diegetic, non-diegetic, and meta-diegetic.

Let’s proceed to our third and last type, speech. As certain schools of linguists have put it, language is a tool to manipulate one’s environment, with and through other humans. This “bio tool,” together with opposable thumbs, has been essential for the enormous success humans have enjoyed as a species not only by adapting to their environments, but by adapting environments to their needs. Indeed, to claim that language exists so that we can communicate with each other doesn’t really mean anything—it’s like saying that we are able to communicate in order to communicate. Moreover, animals can communicate just fine without being able to use language in the very specific and qualitatively different ways human animals can. Why is that important? Because if you think about language differently, it will open your mind for the versatility of speech and the numerous possibilities to utilize speech for different functions in your game. Or, on the contrary, it will open your mind to drop speech and use something else entirely, as discussed above!

For our purposes, speech comes in just two forms: as monologue or as dialogue. (Special cases are covered by mode, discussed further below.) This sounds simple but isn’t. Whatever you want to express through speech in your game, you should think about what you can accomplish with dialogue instead of monologue. Dialogue is much more natural and much more interesting, not least because dialogue is social. Monologues can also be interesting, but they should be used sparingly, and then to maximum effect.

Sadly, that’s not what we see in games, most of the time. If you think about it, most speech we find in games that we think of as dialogue is actually monologue. Why? A dialogue is a conversational exchange, a mutual exploration of observations, ideas, and intents, and it’s about relationships. Speech that merely lectures or informs isn’t dialogue, and even trading information doesn’t constitute a dialogue, but rather a sequence of monologues. Naturally, it’s easier to write great dialogues if the characters, including the player character, have emotional bonds and shared interests or opposing interests that connect them in one way or another. And the reason is very simple—to write interesting dialogue, you need to have interesting characters and interesting relationships between these characters! So when your dialogues fall flat, it could be an indicator for a completely different problem, namely that your characters and the relationships between your characters need a substantial, and most likely structural, rewriting.

Like music, speech has its own specialized palette of functions beyond mood, feedback, and information. Without diving too deep into linguistic waters, which is a polite way of saying that we will once more cut down and adapt a complex scientific domain with reckless abandon for our purpose of designing games, this palette has five sections that cover everything we need:

  • Inform. Describe external or internal states including intent; teach, educate, etc.
  • Inquire. Ask to be informed in order to learn something, be educated, develop, etc.
  • Influence. Request, persuade, order, command, convince, scare, inspire, etc.
  • Cultivate. Socialize, chat, assure, ascertain, introduce, etc.
  • Entertain. Aesthetic and poetic speech.

Let’s turn once more to the above-mentioned “dialogues” from Uncharted 4: A Thief’s End and Uncharted: The Lost Legacy and check them against this palette. Looking closely, you will notice something odd: they do not serve any of these five functions! They do not inform: nothing that is exchanged exceeds anyone’s previous knowledge. They don’t inquire: any fresh information in these situations would have been obvious and not in need of being communicated. They don’t influence anyone: “this way” or “that way” is utterly random and devoid of directions. They don’t cultivate relationships, unless these are the most ineptly socializing mercs the world has ever seen. And they don’t entertain anyone, least of all the player. For the player, as has been said before, these dialogues serve no function either: they provide no cues for enemy presence or tactical options.

A great counter-example is Oxenfree. Not only is this game full of functional speech that informs, inquires, influences, cultivates, and entertains; techniques like dialogue time limits and interruptions (an ingeniously designed game mechanic, actually) make these dialogues appear exceptionally natural.

That’s not too hard, but it gets harder. If your piece of speech clearly serves one or more functions, then you have to ask yourself what function it serves with respect to the characters in your game world, and what function it serves with respect to your player. Answering both questions in a satisfactory manner is deeply important if you don’t want to litter your game with dialogues that make players cringe in their seats. Here’s why:

  • The Player Function of Speech. If your dialogue serves one or several functions only for the player, and not also for the characters within the game, chances are that these dialogues will sound like terrible movie exposition, the kind of screamingly unnatural informational exchange between non-player characters that would never happen that way within the game world—employed by the shovelful, for example, in games where the player character approaches guards in stealth mode. (A more recent example, regrettably, was Rise of the Tomb Raider.)
  • The Character Function of Speech. If your dialogue serves one or several functions only for the characters within the game, and not also for the player, these dialogues aren’t doing what they’re supposed to be doing. What they should do, as discussed in more detail in Level Five: Architectonics, is to advance the plot, portray a character, communicate an insight into the game world, highlight an aspect related to the theme or its motifs, elicit an emotion, advance player proficiency, or push the player toward the goal. All of that can be accomplished by one or more of the five functions inform, inquire, influence, cultivate, or entertain.

In a nutshell, every dialogue should serve one or more functions for the characters in your game world and for the player. These functions don’t have to match! For example, characters within the game world can entertain other characters in the game world, which will provide the player with information about local lore. Guards that are bored out of their wits can socialize in a manner that may be highly entertaining for the player, and perhaps influence the player to try a different approach than to kill them. Use your imagination!
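
One way to keep this double requirement in view while writing is to annotate each piece of speech with the functions it serves on both sides and to flag the gaps. The following Python snippet is only a minimal sketch of that idea, not a tool described in this book: the names SpeechFunction, DialogueLine, and audit, as well as the two sample lines, are hypothetical and introduced purely for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class SpeechFunction(Enum):
    """The five-part palette of speech functions discussed in the text."""
    INFORM = "inform"
    INQUIRE = "inquire"
    INFLUENCE = "influence"
    CULTIVATE = "cultivate"
    ENTERTAIN = "entertain"


@dataclass
class DialogueLine:
    """Hypothetical annotation record for one piece of speech."""
    text: str
    # Functions the speech serves for characters inside the game world.
    character_functions: set = field(default_factory=set)
    # Functions the speech serves for the player.
    player_functions: set = field(default_factory=set)


def audit(line: DialogueLine) -> list:
    """Return warnings; an empty list means both sides are served."""
    warnings = []
    if not line.character_functions:
        warnings.append("no character function: risks sounding like exposition")
    if not line.player_functions:
        warnings.append("no player function: does nothing for the player")
    return warnings


# Bored guards entertain each other; the player picks up local lore.
banter = DialogueLine(
    text="Guards swap ghost stories about the old lighthouse.",
    character_functions={SpeechFunction.CULTIVATE, SpeechFunction.ENTERTAIN},
    player_functions={SpeechFunction.INFORM},
)
# A line annotated with no function on either side.
filler = DialogueLine(text="This way! No, that way!")

for line in (banter, filler):
    print(line.text, "->", audit(line) or "ok")
```

Note that the two sets are deliberately independent, because the functions for the characters and for the player don’t have to match. The check only insists that neither set is empty.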

On the player character’s side, try to cut down on the inquire function, unless your game is a murder mystery or something along that line, and the player character is officially supposed to grill everyone on every occasion. Try to combine the inquire function with at least one other speech function. That makes it more palatable, realistic, and probable that non-player characters will open up. Combinations with the cultivate or entertain function work great, as does a tit-for-tat with the inform function. And it works together with the influence function as well: to persuade, convince, or scare someone into delivering a piece of information is much more plausible, and much more interesting, than clicking systematically through predictable questions on a dialogue wheel.

Finally, mode. Almost all speech that is used within the game world is diegetic: speech that the game world’s characters are either able to hear, or would be able to hear in principle. This covers regular monologue, dialogue, and every kind of recording, like voice diaries. It also covers every kind of remote voice command delivered to the player character, from radio to telepathy, because all that is part of the world, and other radio operators or other telepaths would be able to listen in, at least in principle.

But there is meta-diegetic speech in games as well, most frequently in the form of reflective speech. Its major forms are voice-over narration and soliloquies, each with a different time relationship to the matter on which they reflect. Voice-over narration might be best known from film noir and soliloquies from Hamlet. But reflective speech has been successfully employed in many other genres as well, including comedy. This technique is generally regarded as meta-diegetic in the original sense of that term, sometimes alternatively called hypodiegetic or extradiegetic. All three terms refer to embedded stories and frame narratives including voice-over narration. Lately, though, their meaning has expanded to include the techniques discussed in the context of music: treating dreams, hallucinations, distortions, and similar experiences as if they were embedded narratives, as stories within a story. Hellblade: Senua’s Sacrifice, again, is a notable example of this technique.

Voice-over narration, in turn, can again be differentiated for the purposes of narrative design: whether the narrator is the protagonist, as is often the case; or another character, which is rather rare; or someone who is not part of the fiction, which is again more common. Following Gérard Genette’s terminology from Narrative Discourse, these meta-diegetic events are called autodiegetic, homodiegetic, and heterodiegetic, respectively.
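
For narrative design documents, this three-way split can be written down as a small decision rule. The following Python snippet is merely an illustrative sketch that follows the distinction as used in this section; the names NarratorType and classify_narrator and the two boolean parameters are assumptions made for this example, not established terminology.

```python
from enum import Enum


class NarratorType(Enum):
    AUTODIEGETIC = "autodiegetic"      # narrator is the protagonist
    HOMODIEGETIC = "homodiegetic"      # narrator is another character in the fiction
    HETERODIEGETIC = "heterodiegetic"  # narrator is not part of the fiction


def classify_narrator(in_fiction: bool, is_protagonist: bool) -> NarratorType:
    """Map two yes/no questions about a voice-over narrator to a type."""
    if not in_fiction:
        if is_protagonist:
            # A protagonist is by definition part of the fiction.
            raise ValueError("a protagonist cannot be outside the fiction")
        return NarratorType.HETERODIEGETIC
    if is_protagonist:
        return NarratorType.AUTODIEGETIC
    return NarratorType.HOMODIEGETIC


print(classify_narrator(in_fiction=True, is_protagonist=True).value)
print(classify_narrator(in_fiction=True, is_protagonist=False).value)
print(classify_narrator(in_fiction=False, is_protagonist=False).value)
```

The three calls cover the protagonist as narrator, another character as narrator, and a narrator outside the fiction, in that order.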

Playing with meta-diegetic speech is versatile and effective as long as it isn’t used to force-feed information to the player. If you use voice-over narration or soliloquies, they should serve dramatic purposes within your game that aren’t just exposition and information. As a cautionary tale, consider the theatrical versions of Blade Runner and Dark City, where studios forced expository voice-over narration upon the directors to deleterious effect. Plus, as meta-diegetic speech changes the character of a game in substantial ways, it must be intimately connected to your theme.

Finally, is there non-diegetic speech in games? Yes, but it’s very rare. There exist voice-over narrators, especially in so-called serious games, who inform the player about possible moves, tasks, and so on, or explain menu choices, as a kind of talking manual beyond regular accessibility features. Some voice-over narrators taunt the player on certain actions, e.g., when the player wants to save, access the menu, or quit. And some voice-over narrators even comment on the story throughout, like in Bastion or in Edna & Harvey: Harvey’s New Eyes.

These examples do not exhaust all possible forms of diegetic, meta-diegetic, and non-diegetic speech events in games. It’s a rewarding terrain for creative and imaginative game design.

To conclude this section on speech, some advice with regard to voice talent. If you want to make a professional game, you will need professional voice talent, and if you want to make a professional pitch, and that pitch features a teaser or trailer or prototype that includes speech, you need professional voice talent for that too.

This should be obvious. Yet, it often isn’t. After all, can’t we all speak, and haven’t we even done it for years? No. Like writing, drawing, dancing, running, throwing, cooking, and numerous other things we might do on a daily basis, there’s a huge difference between an activity as such and that same activity as an artistic or athletic expression. And so it is with speaking. Voice actors are professionals who have trained and honed their craft for countless hours, year after year. That’s what you pay them for. You’re not paying them for reading lines from a screen.

When you choose and brief the voice talent that fits your game, be aware of the fact that games are a very intimate medium. Game designers have a habit of pushing their voice actors into overly dramatic poses to sculpt every line, remark, warning, or even throwaway gag into forceful theatrical expressions of utmost urgency. Which is completely at odds with how people act and speak even in critical situations. And never is more than one character allowed to speak at the same time! (Again, a great counterexample to all of this is Oxenfree.) Behind that lurks the idea that every single line throughout the whole game is immensely relevant and must be received and understood by the player at all costs. Which is often true for speech functions like inform and influence, but not for speech functions like cultivate or entertain. Bantering, for example, falls into the cultivate category, and bantering among combat troops falls terribly flat in games when every line is conceived and performed like a rare gem worthy of the most prestigious comedy award. Just see to it that the important bits are understood by the player, and that the less important bits can be understood by the player.

Or, actually, see to it that they cannot be understood! A good example is the breakfast scene in Alien after the ship’s computer pulled the crew from cryosleep. The camera pans around the table, the crew members chatter away, somewhat relaxed, but there’s also palpable tension. And much of what they say isn’t fully intelligible. As Ridley Scott later revealed in an audio commentary, that was on purpose. Being unable to fully follow a conversation makes people uneasy, anxious, and even fearful, which was exactly what Scott intended. Games aren’t movies, granted. But games are far closer to movies than to plays with stage writing and theatrical expressions and articulations that are supposed to reach even the cheapest seats on the balcony with stupendous clarity.

There’s one last sound element we haven’t mentioned yet: soundscape. Soundscapes do not fall into any of our three basic types (music, foley, or speech) because they encompass all three. In your game, the soundscape is everything your player hears while playing that is related to your game. (With the possible exception of screaming cooling fans.)

If you are able to weave all your music, foley, and speech events together into a distinct and recognizable soundscape, that’s a strong differentiator. It doesn’t have to be pleasing to the ear, but it does have to be good. It should draw out emotions and memories of these emotions when listened to.

Moreover, you can use the soundscape to fine-tune the playing experience and even affect player actions by modulating it in ways that make it more pleasing or displeasing. You can subtly manipulate the player into staying at certain places longer than others, or leaving certain places earlier than others (real-life venues like stores or restaurants have done this for ages). You can influence the player’s mood in ways that stimulate certain actions or decisions not only with music, foley, or speech, but with your soundscape as a whole.

If you want to develop a better sense of how soundscapes work and what they can or cannot do, try to listen to soundscapes in public spaces, like restaurants, stores, banks, offices, and so on, and try to relate these to your affective attitudes toward these places. The soundscape is a powerful tool to work with that is often overlooked.

Fig.4.32 Game Sound Model

With style, space, and sound, this level covered the general design decisions that you will have to make in the Plurimediality territory toward functional aesthetics, thematic unity, and a holistic user experience. You won’t have to design everything right away. But you will have to create at least some of these elements in an inspiring and convincing manner (which is to say, in a professional manner) for your pitch and maybe your prototype, to be discussed in the Proposition phase’s Level Two: Polishing. Some of these elements you can certainly create yourself. But don’t fool yourself with regard to those you cannot.

Ask yourself: do you have a good idea, preferably a creative vision, about your game’s functional aesthetics in terms of style, menus, spaces and passages, and a great soundscape composed of foley and possibly music, speech, and silence that rocks? Is this vision compatible with your USP and your value set? Does it support our motivational building in terms of mastery/performance on the ludological side? And for compelling aesthetics that advance and encourage relatedness and the building of communities on the cinematological side? And would you be able to present and communicate this vision in your pitch, and maybe in your prototype, with the help of professional artists?

If your answer to all these questions is yes, congratulations! You made a great leap forward by beating this level.