A Rant on Dialogue in Games
Video game storytelling is a nascent field. If we’re honest about it, we don’t really know how to do it. At least not well. Most of the time we just give up and copy film.
Because it’s comfortable to wrap ourselves in the cozy vernacular of the Hero’s Journey, three-act structure, and cinematography in general. It feels much safer. When we look at the interactivity of our medium and how it’s going against the grain of the stuff we’ve borrowed we are forced to excise the interactivity violently because it risks disrupting our carefully constructed “cinematic” experiences.
One of the worst offenders is dialogue. Something screenwriters have basically perfected for their media, but something we struggle with to such an extent that we still can’t agree whether we should even give our protagonists a voice or not.
So let’s talk about talk. In video games.
In parser-based adventure games like Leisure Suit Larry, and in many text adventures, a significant part of the attraction is to experiment and explore. To find your way through the game one misspelled sentence at a time and see what happens when you decide to say stupid or offensive things, or to come ever closer to the story’s conclusion.
With the ELIZA Effect in mind, there may even be a certain level of emotional connection involved. But beyond even that, a parser-based game will often seem larger (or smaller) than it really is, based on the types of answers you get.
At their best, you’ll feel that anything is possible. This was certainly my own experience playing Starship Titanic. Clever or hilarious responses that made me feel like I was interacting with the game world. Like the designers had anticipated every clever profanity I could conjure up from my teenage mind.
At their worst, it’s the same as the trial-and-error segments in many adventure games where you try to figure out where to use the odd things in your inventory and you’ll have to bash your head against the keyboard until the right sentence falls out.
But one thing to be said about this style of dialogue is that it was a ton of fun, and though it requires typing – which is somewhat awkward in modern gameplay – it was a very direct form of communication. Certainly more direct than what we’ve gotten used to since.
“It is important to remember that your story is working in unison with gameplay. The more your story can be told through gameplay, the better. Much like the film axiom ‘Don’t say it, show it,’ you should be thinking in a similar fashion for the game: ‘Don’t show it, play it.’” (Flint Dille and John Zuur-Platten)
Including many of the games that did parser-based dialogue, games have always done a lot of exposition for some reason. You know what I mean:
In the world of Fantasyland, the slime elves come from a 1,000-year lineage. Before they arrived in the land of Snowplace in the southern parts of Fantasyland, they spent several centuries Frowning and Despairing onboard Seaweed Ships – boats made not of wood but of seaweed from the bottom of the Dark Evil Ocean – sailing across stormy seas and sometimes resorting to piracy out of boredom. Then the mysterious Crown of Mystery suddenly shattered into three fragments for reasons, and you – our only hero and hope – must venture forth to find those fragments before it’s too late. Also, if you want that broadsword, that’ll be 500 gold pieces please.
It can be called lore or background or even be confused for narrative depth, but it’s really just information dumps and they rarely serve any purpose beyond feeding you the stream of words many narrative designers somehow insist on having. They’re there partly because they’re insanely cheap to produce when all the systems are in place, and partly (presumably) because many writers simply enjoy writing the stuff.
Of all the things we’ve taken from Hollywood, the ideal to “show, don’t tell” is somehow a thing we skipped. For all that is holy, we should stop doing infodumps.
Basically, if something needs to be stated directly and explicitly, it’s most likely too convoluted to be worth keeping.
“Us game designers are envious about movies for some reason – but film and cinema, they can’t do a lot of emotions because it’s simply an empathetic thing, theater, it’s technology. The format doesn’t allow certain emotions nearly as well as games.” (Nicole Lazzaro)
Film excels at empathy. When the monster in a horror movie shows its ugly mug, we don’t get to see it – we get to see the reactions of the protagonists. But not before the monster’s been hyped up by the characters’ mounting distress. The suspenseful music helps too, of course.
Characters can be sad, happy, angry, or they can display a wide range of other emotions, but you – the viewer – will only ever display empathy. Good actors will make you feel, but you’re not an active participant. You’re not there in any active sense.
This is the style of media we grow up on, making it more intuitive to go straight to our cinematic inspirations when we want to tell stories in games. And we don’t just emulate it conceptually, but concretely. We have long since introduced the idea of a separate dialogue state where the game pauses, potentially zooms in, and the camera work can then be directly influenced by Hollywood cinematography (in 3D). Or, at the very least, the game is set into a restricted state where we can maintain directorial control.
The button-to-action immediacy of real gameplay is turned off, and the camera snaps to a view of whoever is talking. Usually kicked off with an impersonal greeting or a reminder of central story beats to make sure that the player is on the same page as the character about to speak.
You know the ones. “Have you done the thing I asked you?”
Mass Effect, which is the game in this screenshot, does address one issue that most state-based dialogue suffers from, but it doesn’t solve it. The problem of repetition, where you have to convey the same dialogue line multiple times even though it’s only said once in the presented fiction.
Typical state-based dialogue interaction looks something like this:
1. A Non-Player Character (NPC) starts the conversation. Usually prompted by player activation, and sometimes by an initial Player Character (PC) dialogue line, but it’s often an NPC that starts a conversation from the perspective of the content being displayed.
2. The NPC speaks a line.
3. A number of alternative dialogue responses appear. The player must read each alternative to understand what can be said.
4. The player selects one of the possible alternatives.
5. The PC speaks the chosen line, repeating the same content that the player has already read and selected.
6. The NPC responds, either taking the player back to step 2 or ending the dialogue state.
It needs five separate steps for a single conversational exchange. It’s as if we had to watch actors quietly reading their script before saying their lines. Not very cinematic at all.
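To make the repetition concrete, here is a minimal Python sketch of that loop. The `DialogueNode` class, the `run_dialogue` function, and the sample lines are all made up for illustration and not taken from any particular game or engine.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DialogueNode:
    """One NPC line plus the responses offered to the player (hypothetical structure)."""
    npc_line: str
    # Maps the full response text to the id of the next node; None ends the dialogue.
    responses: dict[str, Optional[str]] = field(default_factory=dict)


def run_dialogue(nodes, start, choose):
    """Walk the dialogue graph and return a transcript of everything shown to the player."""
    transcript = []
    node_id = start                                  # the conversation starts
    while node_id is not None:
        node = nodes[node_id]
        transcript.append(("npc", node.npc_line))    # the NPC speaks a line
        if not node.responses:
            break                                    # nothing to answer: dialogue state ends
        options = list(node.responses)
        transcript.append(("options", options))      # the player reads every alternative
        picked = choose(options)                     # the player selects one
        transcript.append(("pc", picked))            # the PC repeats the selected line
        node_id = node.responses[picked]             # next NPC line, or None to end
    return transcript


nodes = {
    "greet": DialogueNode(
        "Have you done the thing I asked you?",
        {"Yes, it is done.": "thanks", "Not yet.": None},
    ),
    "thanks": DialogueNode("Good. Take this."),
}

for speaker, content in run_dialogue(nodes, "greet", lambda options: options[0]):
    print(speaker, content)
```

Run it on even this two-node conversation and the chosen line shows up twice in the transcript: once in the list of options and once when the PC says it. That duplication is the repetition problem in miniature.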
Mass Effect addresses #3 by stating the intent of each option instead of reprinting the whole prompt. This eliminates part of the repetition, but not the requirement of having to first read and then choose before a line is spoken. But it can sometimes cause frustration as well, when the shorthand doesn’t reflect what the player actually intended to say.
If you watch the dialogue state without reading the text, these steps invariably make state-based dialogue look more like drawn-out staredowns than conversations. But the alternatives aren’t necessarily better. Not as long as we have very specific stories to tell.
Please register for the court, evidence exhibit #2: a screenshot from The Witcher 3: Wild Hunt. The cinematic heritage is probably never clearer than in The Witcher 3. What makes it different from Mass Effect is that it embraces it more fully. You’ll often have several characters take part in the conversation, and you only choose the lines for the usually brief and very direct Geralt of Rivia, The Butcher of Blaviken. It’s a much more passive experience, but actually benefits from this since it leans into its cinematography. Embraces it. It’s using the game’s systems for cutscenes wholesale and therefore blurring the lines between the different types of narrative content the game offers. You can’t draw a clear line between cutscenes and dialogue, and this increases the value of both.
It’s much more fluid, but through production value. It’s more like a movie, and therefore its “movie parts” feel better. The dialogue is still experienced in a separate state and still suffers from the same problems of any other state-based system.
Though the production values have improved dramatically, this style of dialogue is what we’ve been stuck with. Those five steps haven’t changed at all between 1992 and 2022.
The very worst that dialogue states have to offer is where you need to say very specific things because the writer or designer demands it. The whole game won’t progress until you have eliminated all the wrong choices and been forced to make the right one.
In some types of interactive fiction, this is perfectly fine, because your interaction prompt serves mostly to parcel out text into more easily acceptable pieces.
It also makes sense from the perspective of cinematic inspiration, of course, since the intentions of the writers and designers take precedence over those of the player playing the game. In film, the frames will always be served in the right order. If you look at it like that, it’s the logical conclusion to a decades-long battle between systemic interaction and cinematic gameplay; the two arch-rivals of video game direction.
Sometimes we do have dialogue that’s allowed to stay free from the state-based restraints. But we do tend to use this very irresponsibly. A nag line is a style of repetitive and often extremely obtuse calls to action from the game system, communicated using lines of dialogue.
Once “I think we need to open the red door” has played, you’ll soon hear, “maybe the red door should be opened,” followed by “the red door must have been placed here for a reason,” and finally, “open the red door damnit before I quit this stupid voice acting job!”
For an absolute master class in both satirizing and perfectly gamifying this style of voice over, you should do yourself a favor and play The Stanley Parable.
If you’re playing any other game and you hear nag lines you should back away slowly from your gaming hardware and call the cops on yourself before you are forced to do something drastic.
“The guiding principle behind combat banter in FEAR is that whenever possible, AI characters should talk to each other about what they are doing. Rather than having an individual character react with a bark, multiple characters carry on a short dialogue that broadcasts mental state and intentions to the player.” (Jeff Orkin)
For years, if you talked to anyone about Artificial Intelligence (AI) in video games, they’d toot F.E.A.R’s horn. An action/horror first-person shooter, F.E.A.R (whose ridiculous acronym I refuse to spell out) did many interesting things. But it’s maybe even more interesting for what it didn’t do.
Many of the people who praised the AI were saying that it did such smart things and seemed to really understand what it was doing. But as the lead AI programmer – Jeff Orkin – explained, all they really did was tell you what they were doing.
As an example that may or may not exist in the game’s content, imagine two enemies approaching the player at roughly the same time. They’re on opposite sides of the player and one of them starts shooting. Since the AI can use perfect information, it can know the situation and respond to it. Maybe one plays the dialogue line “I’m going in,” and the one shooting responds “I’ll cover you!”
This isn’t because they’re smart, but because those lines are triggered by what gets collected in the game’s current state space. State triggers the dialogue and not the other way around. This type of dialogue is dynamic, interesting, and can make the situation seem more human. For example when an AI can’t reach a certain area and simply says “hell no!”
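Here is a rough Python sketch of what “state triggers the dialogue” could look like. This is my own simplification and not F.E.A.R’s actual implementation: the `Enemy` fields and the `combat_dialogue` function are hypothetical, and the lines are the example ones from above.

```python
from dataclasses import dataclass


@dataclass
class Enemy:
    name: str
    side: str                      # hypothetical: which side of the player, "left" or "right"
    is_shooting: bool
    can_reach_player: bool = True


def combat_dialogue(enemies):
    """Return (speaker, line) pairs triggered purely by the current state.

    Nothing here decides what the enemies do. It only describes what they
    are already doing, which is the whole trick.
    """
    lines = []
    for enemy in enemies:
        if not enemy.can_reach_player:
            lines.append((enemy.name, "Hell no!"))

    shooters = [e for e in enemies if e.is_shooting and e.can_reach_player]
    movers = [e for e in enemies if not e.is_shooting and e.can_reach_player]
    for mover in movers:
        # The exchange only plays if someone on the opposite side is already firing.
        cover = next((s for s in shooters if s.side != mover.side), None)
        if cover is not None:
            lines.append((mover.name, "I'm going in."))
            lines.append((cover.name, "I'll cover you!"))
            break  # one exchange per update is plenty
    return lines


squad = [Enemy("Alpha", "left", is_shooting=False), Enemy("Bravo", "right", is_shooting=True)]
for speaker, line in combat_dialogue(squad):
    print(f"{speaker}: {line}")
```

The enemies would behave exactly the same with the function removed. All it adds is the impression that they planned it.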
One tier below combat dialogue you find what’s often called “barks.” Things AIs decide to say as immediate responses to what they’re doing or experiencing. Things like “grenade!” or “reloading!” or other things that advertise changes in their local state.
This is something many games do really well, and the perfect stepping stone towards what modern video game dialogue could be exploring instead of staredowns.
Like Nicole Lazzaro points out, empathy is the thing that film does best. When we borrow from cinema it’s natural to think that we also need to do empathy. But when we try to use the empathetic tools employed in this other medium, it falls apart.
This happens in many games, but I’ll use Fallout 4 as an example. As the game begins and you make your character, you’re introduced to your spouse. Once the introduction reaches a close, this spouse is killed brutally in front of your eyes.
In a film, you’d see the sobbing, shaking form of the protagonist as the tragedy of the event sinks in. You’d feel for that character. Understand some of what that character goes through, especially if you have a partner of your own.
But in Fallout 4, it’s forced to the point where you’re hammering ESC because you just want to play. You don’t care about this polygonal 3D model since it’s not your partner – you’ve just been told that it is.
All we have to do to see how empathy can become player motivation is go back to the earlier instalments of the same game series. In the first two Fallout games, the village where you begin the game is your home. Full of people you care about and who care about you. Later in the story, when those villages are attacked, this becomes personal. A thing you really don’t want to happen.
Please do borrow from cinema. But don’t try to borrow the one thing cinema will always do better than games.
Payload vs. Delivery
I love well-written games with good stories. Unsurprisingly, I’m not alone. People praise the thematic delivery of God of War, the depressive angst and fanaticism of Ellie in The Last of Us Part 2, and how their choices in Mass Effect affect the cultural complexities between Mordin and Wrex. There are many game stories that we remember just as fondly as those from Hollywood or our favorite authors.
But we also often confuse the payload, meaning the content, with the delivery; the state-based dialogue and cutscenes.
We didn’t have the strong experiences in these games that we had because they had cutscenes in them or a clever director saying how things should be. We had them because we were there – we made the choices. If there were no choices to make, then we were at least present enough to not turn off our consoles.
This is the trickiest part of the entire conversation, as most of the praised story games do in fact deliver their content using state-based dialogue. Passive observation. This often makes us equate the delivery format in the games we like with the payload of the delivery. In other words, if we like God of War, our first instinct is to tell our game stories in exactly the same way. Maybe even with the same plot beats or motivations. For example, making a game about delivering someone’s ashes.
It goes the other way too. Where you can fail to see clever features in a game you don’t like because you conflate payload and delivery.
I want to argue that this conflation is the reason we get so many games with state-based dialogue in the first place. When we sit down to make our dialogue-heavy games, we think about the stories we remember, and rather than considering how to invite the player into the experience we assume that our game has to use the same delivery if we want to get the same results.
We keep the staredowns because our favorite games had them too. It seems intuitive that, if you want to make a game that captures the emotional payload of God of War, you copy the delivery. Or even parts of the payload. But that’s really not how storytelling works, or how video games work.
“[G]ood games writing does three things at the same time: 1. characterizes the speaker or the world 2. provides mechanical information and 3. does this as succinctly as possible.” (Anna Anthropy)
When it comes to dialogue, there are some outstanding attempts to make dialogue more interesting that I want to highlight. It’s not an exhaustive list by any means, but it shows that new thinking both isn’t new and doesn’t require a whole lot of thinking. There are countless small experiments we can do before we decide to stick to our beloved staredowns.
Left 4 Dead
It surprises some, but L4D has a fantastic dialogue system. The techniques used are not unique – and they have been around in some form since the early days of immersive sims – but they are unusually well-documented thanks to Elan Ruskin’s excellent GDC talk from 2012. (It’s probably the link I’ve shared the most in my entire career.)
I consider this dialogue interactive because it responds to what’s happening as it happens rather than requiring the five-step process described earlier. It serves as contextual feedback to things in real-time, just like how Jeff Orkin describes F.E.A.R.
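For a feel of how contextual lines can be picked without any dialogue state, here’s a small Python sketch. It assumes a list of hypothetical rules, each with criteria that must all match the current facts about the world, and it prefers the rule with the most criteria. Treat the rule format, the facts, and the lines as my own illustration, not as Left 4 Dead’s actual implementation.

```python
def pick_line(rules, facts):
    """Return the line of the most specific rule whose criteria all match the facts.

    `rules` is a list of (criteria, line) pairs where criteria is a dict.
    Returns None when no rule matches, meaning the character stays quiet.
    """
    best = None
    for criteria, line in rules:
        matches = all(facts.get(key) == value for key, value in criteria.items())
        if matches and (best is None or len(criteria) > len(best[0])):
            best = (criteria, line)
    return best[1] if best else None


# Hypothetical rules: the more criteria a rule has, the more specific the line.
RULES = [
    ({"event": "saw_enemy"}, "Look out!"),
    ({"event": "saw_enemy", "health": "low"}, "I can't take much more of this!"),
    ({"event": "picked_up_item", "item": "medkit"}, "Grabbing a medkit."),
]

print(pick_line(RULES, {"event": "saw_enemy", "health": "low"}))
print(pick_line(RULES, {"event": "saw_enemy", "health": "high"}))
```

The generic rule acts as a fallback, so writers can keep adding more specific lines for rarer situations without touching any code.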
Kingpin: Life of Crime
No one remembers this feature when they think back to Kingpin, but it had what may be the most dynamic dialogue system in video game history.
Here’s what it did. The Y and X keys were mapped to context-sensitive positive and negative responses that you could press whenever you wanted while playing the game and looking at an NPC. It could lead to fights, it could disarm situations; even trigger questlines (such as getting a drunk a bottle of alcohol).
It didn’t work flawlessly, but it promised something that games still haven’t delivered on 23 years later: dynamic freeform dialogue. Something that can stay interactive every step of the way and completely avoids the staredown.
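As a sketch of how little it takes to get started with two-button dialogue, here’s a hypothetical version in Python. The `Npc` fields, the mood thresholds, and the outcomes are assumptions of mine; Kingpin’s real logic was surely different.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Npc:
    name: str
    mood: int = 0                  # hypothetical scale: negative is hostile, positive is friendly
    wants: Optional[str] = None    # something the NPC might ask for if treated well


def respond(npc, tone):
    """Apply a 'positive' or 'negative' response to the NPC the player is looking at."""
    if tone not in ("positive", "negative"):
        raise ValueError(f"unknown tone: {tone}")
    npc.mood += 1 if tone == "positive" else -1
    if npc.mood <= -2:
        return "fight"                              # pushed too far
    if tone == "positive" and npc.wants is not None:
        return f"quest: bring {npc.wants}"          # being nice opens up a request
    return "calm" if npc.mood >= 0 else "tense"


drunk = Npc("drunk", wants="a bottle of alcohol")
print(respond(drunk, "positive"))

thug = Npc("thug")
print(respond(thug, "negative"))
print(respond(thug, "negative"))
```

The important part is that `respond` can be called at any moment during play. There is no state to enter and nothing to read first.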
Cyberpunk 2077
Whenever I write something positive about Cyberpunk 2077, people like telling me I can’t have played many games. But I have, surprisingly, and still think there are tricks used so well in CP’77 that they speak to what games may be able to achieve with dialogue in the future. It’s a crucial first stepping stone leading us beyond the staredowns.
Ironically, it’s because it leans into its cinematography. Something I’ve already used too many words complaining about. But the difference from other state-based dialogue systems is that it does so carefully and – most importantly – with a great deal of subtlety.
In the scene pictured above, you are introduced to Evelyn for the first time. A character that plays an important role in what’s to come, though maybe not in the way you think. She won’t greet you immediately. She’ll simply sit there and watch, waiting to see how you make your move, and blending into the noisy nightclub background. Instead, you speak to the bartender and ask for Evelyn. Once he hands the conversation over to her, he takes a step back and lights a cigarette as Evelyn leans forward.
This is carefully directed every step of the way, and the lighting and the way the characters signal who you should pay attention to using body language and staging is part of how cinematographers make a living. It’s a bit too locked down and restricted to truly come into its own, but the promise of these types of stage-directed set pieces is that we can finally find a form language informed by how games are played.
Because, what if the stage-direction could be systemic? What if the careful lighting and the conscious choices of idle states tailored for 1v1, 1v2, 2v2, and other dialogue dynamics could become part of our own ludography, so we can finally leave our obsession with film by the wayside? This is what CP’77 promises, by taking a first tentative step.
The other thing that the game does is that it carries its factions, missions, and major plot beats through characters and not through exposition. It borrows from how TV shows can go back and forth between plots and weave a coherent tale through the whole rather than as a linear constant. There’s so much subtlety in writing and presentation at work here that it’s a shame that the game’s other flaws have probably scarred its reputation permanently.
Citizen Sleeper
Taking a cue from the use of clocks in tabletop role-playing games like Blades in the Dark, this title often both communicates things that wait for you around the corner and things that you want to prioritize. It’ll give you a clock and say that it provides a bonus at the end, but it can also simply leave a cryptic clue about something bad that will occur.
Once you reach the conclusion of a clock, usually based on the number of work cycles you’ve completed on the game’s broken down space station, there’s always a tricky tradeoff or some consequences to deal with based on how you handled the clock along the way.
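A clock is simple enough to sketch in a few lines of Python. Everything here is hypothetical, including the `Clock` class, the `"helped"` choice label, and the rule for deciding the outcome. It only illustrates the idea of a countdown whose payoff depends on what you did while it ticked.

```python
from dataclasses import dataclass, field


@dataclass
class Clock:
    label: str
    segments: int
    filled: int = 0
    choices: list = field(default_factory=list)   # how the player handled it along the way

    def tick(self, choice=None):
        """Advance one cycle and optionally record a choice. Returns True once complete."""
        if self.filled < self.segments:
            self.filled += 1
        if choice is not None:
            self.choices.append(choice)
        return self.filled >= self.segments

    def outcome(self):
        """Resolve a completed clock based on the sum of recorded choices."""
        if self.filled < self.segments:
            return None                            # still counting down
        helped = self.choices.count("helped")
        # Assumed rule: helping on at least half of the segments earns the bonus.
        return "bonus" if helped * 2 >= self.segments else "tradeoff"


clock = Clock("The ship leaves", segments=3)
for choice in ("helped", "ignored", "helped"):
    clock.tick(choice)
print(clock.label, "->", clock.outcome())
```

Showing `filled` and `segments` to the player is what communicates that something is waiting around the corner, while keeping the outcome rule hidden preserves the cryptic part.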
By making choices about tiny interpersonal decisions across multiple situations, and letting the consequences follow from the sum total, this game manages to tell a very different kind of story, and it does so almost entirely through conversations with the game world’s various NPCs.
Telltale Games
Beyond the sometimes excellent writing, there are actually few things I like about the Telltale games. It always felt like a style of game that didn’t evolve but simply stuck to a formula that gradually lost its charm.
With the Game of Thrones outing, and the first episode’s disappointing lack of choices that actually mattered to the story, I stopped playing these games. They were narrative games of the sort where a writer has a story to tell and acts like your interaction makes a difference. I lost the illusion, and couldn’t get it back.
But two things I adore about the Telltale games are that they tell you what matters and they compare your choices to those of other people playing the game.
It’s been said many times that the “Clementine will remember that” cues are not always true, but it doesn’t really matter, because it shows that the characters in the game’s simulation feel how you treat them. The ELIZA Effect again – we care that they care.
It demonstrates that we don’t need to make complex systems to turn dialogue into something more interactive. Sometimes it’s enough just to give clear and timely feedback.
Until Dawn
My favorite out of all the Telltalelikes is Until Dawn. A skillful tribute to the college slasher genre, and one that tells a tight and interesting story about teenagers going to an isolated cabin. The story develops in exactly the ways you’d expect, but what matters is that the ending will vary greatly depending on how you play.
The spectrum goes almost all the way between no one surviving and everyone surviving, all based on how you manage the relationships in the group and whose sides you pick in the group’s many conversations. Clever use of stereotypes, great writing, and dialogue that branches in exactly the right ways. It’s funny, scary, and carries its tropes extremely well.
Structurally, it’s maybe nothing special. It makes use of many passive observation techniques, including quick-time events. But it also shows you how far you can push dialogue as a game mechanic when you respect both your inspiration and the strengths of video games as a medium.
Red Dead Redemption 2
RDR2 is primarily a cinematic experience in its storytelling (just like CP’77 is). But the context-sensitive interactions that you can access during sandbox play make both for interesting situations and for a sense that you’re part of the world. If it wasn’t for the separate mode that requires you to hold a button to access it, it would conjure memories of Kingpin: Life of Crime!
As with R* games since time immemorial, the controls never quite sit right, and you’ll still accidentally rob people when you just wanted to say hi, even after several hours of play. But as has also always been the case, this is a big part of the core experience. The many times things snowballed out of control and I tried to make them right again are exactly where the usefulness of dynamic dialogue in a sandbox truly shines.
Disco Elysium
The clever thing about Disco Elysium, or at least one of the many clever things, is that it lets the world and your own mind talk to you. Much of the game’s dialogue is informing you of things about the world, but filtered through the different, often untrustworthy parts of your character’s own conflicted mind. You’re literally arguing with yourself.
It gives the game a somewhat dreamy atmosphere, where it can be hard to separate voices in the world from the voices in your head. Not to mention separating truth from fabrication. Furthermore, the whole game is focused around dialogue. You can talk yourself through every scene and feel clever doing it. A confident modern take on what Planescape: Torment did at the close of the 90s.
The lesson to learn from Disco Elysium – beyond making sure to have great writers – is to ask who each voice in your game belongs to. What kinds of conversations can be had beyond nag lines, combat dialogue, and state-based staredowns?
Oxenfree
There are so many neat tricks in Oxenfree that deserve mentioning that it feels hard to cover them all. The greatest thing it achieves is to make the ongoing conversation feel thoroughly natural. Characters can switch subjects, interrupt each other, stay quiet instead of responding, and just keep the stream of words flowing in a way that sounds and feels like real conversations.
As it does this, you’re also exploring an island and getting to know the cast of characters. It’s a great game in many ways, maybe mostly because it blends real-time interaction with an ongoing conversation. Much like an Aaron Sorkin show or The Gilmore Girls, but with interaction.
Interstate ’76
A bit tongue in cheek, but if I didn’t take this moment to praise Interstate ’76’s fantastic poem button (yes!) I’d be committing a crime.
Stampede, the main character’s good friend and radio operator, responds to your requests for poems with surprisingly deep and thoughtful pieces. They always felt like a clever way to emulate how talkative cars can make you.
As we approach an era where we can offload an increasing part of the content workload onto systems like JALI, Altered, and every experimental thing Embark is working on, it’s time to start looking into truly interactive ways to handle dialogue.
Games are not movies, and they should stop trying to be. But we should keep borrowing the good parts so we can make them better suited for our own work. We just have to remember that our medium is interactive and experiment more, so we may see where it leads us.
Let’s just agree that, 30 years from now, we’re not still using the same state-based dialogue as we’ve been using for 30+ years already. But let’s also agree that you don’t have to do all of that innovation in a single stride. It’s possible to take small steps forward, all the time. We just need to take more of them.