Speaking Out for Change: The Next Evolution of the “Talkie”

It's been two decades since the emergence of the video game "talkie", and the inescapable truth is that there's been very little evolution in adventures ever since.  But as the saying goes, the more things change the more they stay the same, and the future of the genre may very well be the talkie all over again.  A brand new type of talkie, of course, but a talkie nonetheless. 

It's all very well and good for a game to talk to you, with pre-scripted dialogue lines recorded by voice actors in a studio long before you ever hear them.  But that's not really talking, just playback. So what about a game where you talk to it?  Impossible?  Not at all!  I've spoken to automated routing services that understood my commands on the phone before, so why not a game? 

One of the unfortunate byproducts of the move from text to graphic adventures was a significant loss of interactive freedom.  Even a basic text adventure can offer so much more personal control (however illusory it may be) than its modern day point-and-click counterparts.  Encountering a small mailbox in front of a white house, you could try opening it, looking at it, depositing something, emptying it, kicking it, kissing it, pushing it over, covering it with graffiti, or talking to it (hey, you never know).  True, far too often in the genre's early days the parser didn't understand what you were trying to tell it, but that was a technical limitation of the time that no longer applies.  That said, staring at a blank screen and typing just isn't going to cut it for most people today, so more text adventures really isn't the answer. 

Early SCUMM games like Maniac Mansion offered many more interactive options than modern adventures

SCUMM-era graphic adventures scaled back the interactivity, but still allowed a wide variety of verbs to play with.  With each hotspot giving you as many as fifteen generic options (not including inventory), there was still plenty of choice available.  And yet it was a nuisance to continually drag your cursor back and forth between action commands and the environment.  Before long, experimenting became more of a pain than pleasure.  Sierra refined the process further, reducing interactive options yet again and allowing right-clicks to cycle commands, but this too grew tedious over time.  The "verb coin" provided an elegant solution, eliminating additional mouse clicks at the expense of still more possibilities.  But even click-hold-slide-select is a hassle when multiplied by hundreds or thousands over the course of a game.

Most graphic adventures nowadays dispense with all semblance of individual control in favour of purely linear scripting.  Sure, you can click what you want (so long as it's a hotspot), but your only input is to click, guess what might happen, and hope it's what you seek to accomplish.  (Often getting nothing more than a "that won't work" for your troubles.)  Many games add a "look" option as well, and the verb coin is still around, occasionally offering another choice or two.  But for the most part, we've been reduced to one-click-fits-all interaction.  It's very restrictive, but simple, streamlined, and fast. 

I'm thankful for the "fast" part.  My time is limited, and I have no desire to spend literally hours of it "playing the interface" rather than the game itself.  But I do lament the loss of personal involvement in my adventures, and I wish there was an alternative to typing or a tedious series of mouse clicks to accomplish what text adventures could (at least theoretically) do from the start.  If only we could sit back and simply TELL the game what we wanted to do!

Well, why can't we? 

Of all video game genres, arguably none are less tactile than adventures (excluding direct control titles like Dreamfall and Sherlock Holmes, which remain few and far between).  You simply couldn't take hands-on control away from a shooter, RPG, or strategy game, but there's a reason the term "point-and-click" is largely referred to (outside genre circles) with derision: the act itself is boring. There is nothing intrinsically fun or inspiring about sweeping the screen, clicking hotspots, watching a character plod around, then be force-fed whatever scripted action (or response) the developer saw fit to provide.  That's one second of active engagement for many times that of passive spectacle.  Yawn. 

No, the appeal of adventures comes from the thinking, not the doing, and with most adventures resigned to click-and-pray mechanics these days, not only isn't there all that much thinking involved anymore, we spend far more time watching than acting. We accept it because it's so efficient, and because we'd never sacrifice the pretty pictures that come with modern games, but as entertainment it's a far cry from the more rewarding means of interaction we once enjoyed. By eliminating the mouse in favour of speech, could we finally have both?  

Retail products like Dragon NaturallySpeaking make speech recognition easily accessible

We've all seen futuristic sci-fi where everything is controlled by voice commands, and it's an appealing prospect.  Usually things go disastrously wrong (HAL says hi), but that's only when computers are so smart they're able to think and talk back.  I'm not asking them to do that much, merely listen and respond as they've been programmed.  And that's not science fiction, merely science.  In fact, speech recognition programs have been around for quite a while, but like all new technological breakthroughs, it's taken until now for them to reach a reliably functional level.  They may still largely be an automated annoyance on the phone, but voice-to-text programs are currently in use in many professional fields, from healthcare to law to education.  If it's good enough for "important" jobs, is it not good enough for a game? 

And you know what?  It's pretty cheap.  I was under the mistaken impression that such an option would be cost-prohibitive.  It probably was in years past, but now there are highly respected retail programs like Dragon NaturallySpeaking for only $200.  Surely that's well within reach for an enterprising developer looking to forge a new path.  And if not... well, coughKickstartercough.  This is the sort of tangible, justifiable expense I'd gladly contribute to if necessary.  There are even open source options to explore for those more technically than financially inclined.

Sep 21, 2012

There is an adventure game on steam greenlight called “In Verbis Virtus” in which the user uses voice commands to cast spells. Not quite as neat as having a conversation with a character but still a step in this direction. Could you look into reviewing it?


Jackal
Sep 21, 2012

I don’t know about reviewing it, but it certainly looks interesting.  Definitely a step in the right direction!

Kasper F. Nielsen
Sep 21, 2012

I’ve never thought about this before, but now I can’t stop wanting it /now/.

Sep 21, 2012

Fascinating article. I used to report for a Radiology magazine about seven years ago, and radiologists were definitely using speech recognition to dictate reports. The issues back then were all the ones you’d expect, including difficulties with non-native language speakers or even speakers with heavy regional accents. I’m sure they’ve made great strides since then, but the thought of having the software misread my voice and having to repeat the word would be frustrating (just as clicking on what I think is a hot spot and getting no response is frustrating). However, designers can take some, ah, tips from veteran physician users of the technology:

Kurufinwe
Sep 21, 2012

Ugh, no. Siri: The Game is really not something I’d look forward to playing.

OK, actually I could see it working as a gimmick for a game built specifically around the idea that you’re talking to someone. Something like The Experiment, but where you speak to the character instead of playing with the lights and cameras, and she replies to you and follows your instructions (or not). That could actually be fun if done well.

But I certainly wouldn’t want that as a standard interface. Beyond the technical difficulties and the utter retardedness of talking to a machine, what really bugs me is that it draws attention to the question of who you’re talking to. One of the reasons why the point & click interfaces became more and more transparent was to give a stronger sense of controlling your character instead of talking to some kind of invisible gamemaster (or not so invisible when Roberta Williams started sticking her face on the death messages). I don’t ever want that to return. I don’t want some narrator/gamemaster putting himself between me and my character.

But if I’m not talking to a gamemaster, then I must be talking to the character. And that’s a beautiful can of worms. Who am I? God? Some voice in the character’s head? (Is he crazy?) Is the character going to follow all my commands, and is he going to talk back to me at some point? Unless you establish a story framework where you’re an actual person in the story talking to the protagonist (as I outlined above), voice control just makes things extremely weird.

Interfaces have been evolving towards increasing the illusion that the player is controlling the character rather than communicating with the game, whether that’s through unobtrusive point & click or so-called “direct control”. Voice control is the opposite of direct control, putting something back between the player and the character. It’s not something I want.

Jackal
Sep 21, 2012

Facetiousness aside, Siri has absolutely nothing to do with this suggestion. 

What an odd notion that P&C gives a greater sense of control.  I couldn’t disagree more.  Perhaps because it happens to involve your hand it feels like a more intuitive extension of yourself, but in terms of actual player control, P&C has pretty much stripped all sense of it away.  Now the game does all the work for you.  You just click.

I certainly can’t argue if you say clicking a mouse makes you feel more like you’re the character. But I certainly can’t agree that it adds something back that isn’t there now.  It’s simply a different means of inputting the same thing, just with far more options.

MoonBird
Sep 22, 2012

Hmm.. I’m not so sure I like the idea… this feature’s been in smart phones for some time now, and most of the time it simply Does. Not. Work. If I mean “read” - “did you mean reed?” - Oh, for pity’s sake, I write it in!

Sep 22, 2012

I’m sorry, but most of your arguments struck me as rather contrived.
“And yet it was a nuisance to continually drag your cursor back and forth between action commands and the environment.”
Using the keyboard to select arrived shortly after MM, I believe. I think the original MI1 already had it. As for pointing on hotspots, it’s the same effort required for aiming in an FPS.

“I have no desire to spend literally hours of it “playing the interface” rather than the game itself”
Are you serious here? When playing old-school games, do you really feel you’re gaming the interface because you have to choose a verb to go with your hotspot?

“There is nothing intrinsically fun or inspiring about sweeping the screen, clicking hotspots, watching a character plod around, then be force-fed whatever scripted action (or response) the developer saw fit to provide.  That’s one second of active engagement for many times that of passive spectacle.  Yawn.”
If this were true, there would be nobody in cinemas or in front of the TV. Same amount of script, even less amount of interactivity.

As for speech recognition as an interface, that would restrict my playing time even more - if anyone’s asleep, headphones are no longer enough. Even if they’re not, it’s a lot more disturbing to hear someone yakking to a computer, instead of just hearing mouse clicks.
Also, when talking for extended periods of time, my mouth runs dry, so now I have to wander around and get a glass of water every once in a while, disturbing the immersion.
Finally, it just sounds inconvenient. My car has voice recognition. I never use it. In part because it was calibrated for Americans, and my accent isn’t, and in part because it just takes longer to say “car, volume up” than it does to move my finger and press a small area on the steering wheel. Multiply THAT by the amount of hotspots, and I fear I’ll lose patience with the game much sooner.

Jackal
Sep 22, 2012

I’m not sure which part you’re calling contrived, Antrax. Of course the old SCUMM-style mechanics are tedious. They were innovative and great for their time, but there’s a reason they’re never used anymore. 

Your comparison to shooters is poor. Sure, you point at things and click in both genres, but in a shooter it’s all based on accuracy, speed, reflexes, even tactics. Not many moving hotspots in adventure games that are shooting back. That’s why this is an idea that really only applies to adventures and not other genres. (Although any genre could benefit: I’d love a shooter that would let me voice-activate weapon changes, ammo reloads, etc. in the heat of a battle instead of fumbling with F1 keys.)

As for movies, you’re seriously comparing 100% scripted cinema to clicking an object, watching a character walk over to it, and offer a comment? The two mediums are entirely different. I love movies; watching characters act out mundane commands (or should I say “clicks”, since you can’t be sure what you’re actually commanding anymore) is like watching paint dry.

Sure, dry/sore throats are a potential problem. Good point. But then, some people have arthritis and carpal tunnel syndrome, too.  It’s not like P&C is without its physical drawbacks.

I sympathize with the complaints about current implementation in other areas. But judging an idea based on the worst examples does the concept no justice. There are too many respected fields using this technology successfully to believe it can’t be done right.

Another point I probably should have highlighted in the article (getting back to Kurufinwe’s complaints about the connection between player and character) is that voice-activation would completely do away with the need for a cursor. I can’t think of a more blatantly artificial reminder that you’re playing a game than some ever-present, magical pointer on screen at all times.

Sep 22, 2012

Correct me if I’m wrong, but it seems your arguments against point and click are as follows:
a) When complex (many actions), it’s inconvenient as an interface because you have to make a lot of effort to get the character to do what you want.
b) When simple, it’s uninteresting as an interface because it requires too little player involvement.

(a) was, as mentioned, relevant around 1990. I agree that interfaces got “simpler”, but I think this is a by-product of the fact most verbs were typically useless - it’s not more rewarding to click on “open” and then “door” than it is to just click on a door and go through it. You inevitably had the single-use verb that was like 99% red herring until that one time it was required. I’m not sure why voice would help here - the actual action of selecting the verbs wasn’t the problem, so it doesn’t matter if the game recognizes clicks or the word “wear”.

As for (b), here I couldn’t agree less. I played Sam & Max mostly to click on everything multiple times. Other people might’ve played it to try and beat the game as quickly as possible. The interface let us both enjoy “our” kind of “passive” experience. If the game is good, there’s nothing upsetting in the characters walking to a hotspot and commenting on it. It’s only when the game is annoying (grating voice acting, stupid animations you can’t skip, generic “I can’t do that” endlessly) that avoiding clicks that don’t advance the game becomes a goal. And some games are more involved: on the Wii it’s common for games to require you to act out on-screen actions with the remote, which I guess some people like (I don’t). In any case, I don’t think voice commands are the solution to that, either - if you take a point-and-click game and just replace all clicks with verbal commands, you still do stuff like say “open door” and then watch the walking and door-opening animation.

I agree that it would be great if games were more open-ended and could understand us, but I don’t think voice recognition is essential to this, nor do I think point and click games are so inherently flawed.

Jackal
Sep 22, 2012

Those are pretty much the two correct arguments, yes.

I have no idea why you’re starting your discussion at 1990.  Whooooole lot of games came before that. But the year is irrelevant. The point is, once upon a time, games gave us greater interactive freedom, but because the interfaces were too damn clunky, they were abandoned.  I’m looking for a way to reclaim what we once had without a screen full of verbs and tiresome mouse-scrolling for even the simplest actions.

Obviously this idea isn’t “essential”. P&C isn’t broken. We hardcore adventure gamers could still be playing them 20 years from now. But that’s not really the issue. This is a way for the genre to innovate, and who knows—maybe even be technologically relevant again, like it once was. Will it work? Who knows. Will it be fun? Beats me. But I’d sure like someone to try.

Sep 22, 2012

@Kurufinwe: “Something like The Experiment, but where you speak to the character instead of playing with the lights and cameras, and she replies to you and follows your instructions (or not).”

They tried that, it was called “Lifeline,” and it was hilariously awful.

“Imagine playing a game like Eric the Unready with hi-res graphics and full speech recognition”
I am imagining a game that is uglier, and far slower and less fun to play. If you want a game with all of the freedom of a text parser, a text parser is really the best way to go. Even when fully functional, which it never is, voice recognition is not just slower than a mouse click, but slower than most typed commands.

Would bolting voice recognition onto games in other genres add anything? Kinect’s use of voice in a number of games has not been well received, nor was the above-mentioned “Lifeline.” Obviously it would be ludicrous to have to yell at Mario when to run and jump. Even in low-key games with no need for reflexes, like Animal Crossing (the two most recent installments were both on systems with microphones) the designers didn’t see fit to implement voice recognition, and I can’t see how it would have been an improvement at all. The only games that seem to lend themselves to it, or have met with any success with it, are pet simulators like Seaman or Nintendogs.

Essentially, if you want lots of verbs, use a text parser. If you want speed and fluidity, use mouse clicks. The conceptual space between them is peppered with UI solutions like verb coins and so on. If you want something that is slower than all of those choices, and offers unlimited verbs, but has even less chance of understanding them than a text parser,  and which can only be played in very limited situations, and makes you look and feel ridiculous, go for speech recognition.

Jackal
Sep 22, 2012

Dismissing an entire idea because it worked poorly once with ten-year-old technology is using your imagination?  Wow. I wonder how many technologies we’d actually use today if they were summarily rejected for not working perfectly the very first time. 

And who cares about “bolting” the technology on other genres? I already gave one good example of how it could be, but regardless, the whole point is for adventures to lead the way.

orient
Sep 23, 2012

I don’t think the extra immersion that should theoretically be felt through voice control would exist until the computer could respond in a realistic way e.g. having emergent conversations with NPCs, characters understanding your tone of voice and reacting appropriately etc.

As it stands, voice recognition is pretty much a substitute for typing, and I see little value in that when it comes to games.

I’m all for freshening up the adventure genre, so I would at least like to see a developer’s attempt at a voice-enabled game. However, my initial thought after reading the article is that it would simply create more problems than it solves—but hey, it could be interesting. We’ll never know until someone tries.

MoonBird
Sep 23, 2012

I like point & click games, because they are point & click games. I like to search the screen for hotspots and I like to hear my character’s opinion about them. As Antrax mentioned, today’s gamers try to beat the game as quickly as possible. I simply don’t understand this rush. I want to explore and do some AD-VEN-TU-RING. Not to feel like I would be taking part in a running competition. If I ever want to play adventures, they must have a cursor. Period.

tnzk
Sep 23, 2012

Perhaps the most feasible option is not just point-and-click, nor just text parsing, nor just voice command. The most feasible, and objectively the most engaging, is actually a combination of all three.

Mass Effect 3 on a Kinect-equipped Xbox has voice control options, and the reality is, it’s not any more engaging than clicking buttons. Speaking out the dialogues instead of choosing them via controller feels more tacky than emotionally investing. However, it worked an absolute charm in the heat of battle, when my hands were engaged in second-to-second combat, but my voice could shout commands to team members/unleash auxiliary powers.

In a traditionally conceived adventure game, I reckon the normal point and click interface is pretty efficient for most things. It’s probably the best for navigation, both 3D and 2D. Where voice command *could* work, however, is in actions which require multiple inputs. Walking through an area, but want some exposition on that fountain in the middle of the street?

“Check out the fountain”, you tell your character. And while you’ve already clicked his destination, your character starts talking about the object of interest. No need to stop navigation, right click on fountain, and click on a magnifying glass. You’ve streamlined the process here.

Or for a meatier role in the gameplay, what if you saw a bare plumbing pipe on the other side of a screen, and have a valve in your knapsack?

“Valve on pipe”, you say. And bam! Your character walks over and screws it in. No messy clicking or inventory shuffling. Done in a second.

I feel that with games, it’s not about “immersion” in the way we naturally feel it to be. With games (and its namesake is a giveaway), it’s all about efficiency in achieving objectives which provides all the immersion one needs.

A game like Lexis Numerique’s The Experiment most definitely would have benefitted from voice controls. The multiple windows, typing, and clicking would have been made more efficient with some easy voice cues to make the adventure so much more amazing!

In time, Jack. Your site is huge, and if you keep up with these features, a developer is bound to get inspired!

Jackal
Sep 23, 2012

Orient, you’re more or less right. Voice activation is really just a replacement for typing, but a far more convenient one. Accessibility is huge in today’s games, so typing just isn’t a viable option for most people anymore. Voice might be. 

Moonbird, your complaint doesn’t apply here. Have you never played an old Legend game? There was no rushing through those. The kind of game I’m suggesting could and should add MORE adventuring, not less. That’s one of the main reasons for doing it.

tnzk, absolutely, P&C is “efficient”, precisely because it restricts interaction to one-(maybe two)-click-fits-all. The point of moving to voice isn’t to streamline P&C, but to streamline old text-style games, where players had far more freedom to experiment. If all it did was replace “click hotspot” with “tell computer to click hotspot”, it would be a colossal waste.

Sep 24, 2012

Nice idea, but one thing I feel that would be strange about this as a general game mechanic is that it would break the immersion a little bit. If I’m shouting commands at the main character (“pick up that apple”, “try throwing that apple at the witch”) then I feel like that would make me (the player) an additional character, and the main character is someone who’s in a dialog with me. I think that could work in a certain kind of game - for example, if you are actually playing a general who’s on radio communication with a soldier trying to infiltrate an enemy base, and you’re giving them commands over the radio. However, in a regular game, shouting commands at my character would feel strange.

Another issue, already raised by some people, is the lack of accessibility for non-native speakers (or maybe even just non-American accents?). I’m finding it pretty hard speaking with current state of the art voice recognition - I usually have to say everything 3-4 times until the system “gets me”. However, that’s just a problem with the current state of the art, and it would probably get much better over the years.

Sep 24, 2012

Ugh, no thank you. This is entirely gimmicky. What does it add to a game that I have to try re-wording a sentence 10 times for it to get accepted versus just clicking on dialogue options? I’d rather go back to asking questions via text parser.

Jackal
Sep 24, 2012

This has nothing to do with dialogue, zane.  Not sure where you got that idea.

Matan, where do you currently use speech recognition? Language/accent barriers are indeed an issue, as the article itself points out. But I think phone systems are a poor reflection of the potential quality. If the technology were consistently that bad, it wouldn’t be trusted in fields like law, healthcare, and education. And that’s with (theoretically) unlimited vocabulary, which a game could restrict.

Still can’t wrap my head around why speaking into a headphone mike is any less an immersion breaker than sliding your hand around on the desk and staring at a magical, godlike pointer on the screen, but maybe that’s just me.

Tramboi
Sep 25, 2012

Do people really dream of playing Eric the Unready with a vocal interface?
Textual interface is just perfect - it does convey the right information to you and doesn’t make you speak loudly in your lounge while your girlfriend is watching TV.
Next time somebody will want us to make awkward gestures to play games.

Wait… they tried with Zack and Wiki!

Hint: Keep focused on better stories and better writing, we’ll survive pointing and clicking for a few more years.

Jackal
Sep 25, 2012

I wasn’t dreaming of playing games with a mouse while playing the early text adventures, either, but whaddaya know, it caught on!

Sep 26, 2012

Yeah, my experience with voice recognition is mostly from cell phones. However, since the recognition itself is done server side, it can be computed by very strong machines, so I think the only real limiting factor is the audio quality passed to the server which is perhaps not perfect (both because of imperfect recording hardware on the phone and because of the need to save bandwidth when sending the voice over the network). I agree that this could be done much better if the voice recognition is made specifically for the game with the vocabulary limited, for example, only to very common words and to words describing things that are actually in your vicinity.
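The restricted-vocabulary idea above can be sketched in a few lines. This is only a minimal illustration under stated assumptions, not anything taken from a real game or speech engine: it assumes some recognizer has already turned the player's speech into a text transcript, and the verb list, the `interpret` function, and the 0.6 similarity cutoff are all hypothetical. It uses Python's standard `difflib` module to snap each (possibly misheard) word to the nearest allowed verb or nearby object.

```python
from difflib import get_close_matches

# Hypothetical verb list; a real game would choose its own.
VERBS = ["open", "look", "take", "push", "use", "kick"]

def interpret(transcript, nearby_objects, cutoff=0.6):
    """Map a possibly misheard transcript to a (verb, object) pair, or None.

    Only VERBS and the objects currently in the scene are candidates,
    so unrelated words are simply ignored.
    """
    verb = None
    obj = None
    for word in transcript.lower().split():
        if verb is None:
            match = get_close_matches(word, VERBS, n=1, cutoff=cutoff)
            if match:
                verb = match[0]
                continue  # this word is used up as the verb
        if obj is None:
            match = get_close_matches(word, nearby_objects, n=1, cutoff=cutoff)
            if match:
                obj = match[0]
    if verb and obj:
        return (verb, obj)
    return None  # nothing usable; the game could ask the player to repeat

print(interpret("opan the malebox", ["mailbox", "door"]))
```

Because the candidates are limited to a handful of verbs and the objects actually in the scene, a garbled transcript such as "opan the malebox" can still resolve to the intended command, while a phrase that matches nothing returns `None` rather than a wrong guess.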

Well, I agree point and clicking with a cursor is not very immersive. I actually like the move of adventure games towards a more direct control kind of interface. When you’re in direct control of the character, and if the interface is well designed, you very soon just feel one with the character. The interface just becomes very natural and you stop noticing it.
Maybe that could happen with voice commands as well, but I kinda doubt it. Just seems unnatural to speak to my computer and tell it what I want my character to do (again, unless I’m actually talking to another character and the interface is somehow blended into the game in a logical way).

Lee in Limbo
Oct 7, 2012

I think the first time I ever saw a text recognition computer in a piece of science fiction that felt believable to me was in Blade Runner. Oh sure, I saw HAL in 2001, and even Doctor Who dabbled in that level of Sci Fi before Blade Runner reached the screen, but watching Deckard command a computer to zoom left and deeper into a photograph to pick up minute details that simply looking at a flat photograph under a loupe would not yield was remarkable for me. THAT is science I can believe will be available in my lifetime. The text recognition software protocols for this exist. Siri is actually a really good example. You train Siri to recognize your voice before you start using it.

A gaming company could easily adapt Siri-type software and simply make a player recite a few lines of this and that until the software is ready to work. Personally, it’s no more complex than tweaking video and audio settings, really. Testing a microphone would be about as much hassle, and it can all be done at once. I like the idea. A lot.

