Speech Recognition and Artificial Intelligence

Every so often, I’ll get a little pissed off and start wondering aloud, “where the hell are my talking computers?”

Seriously, though – it’s 2008. Ten years ago, we were sure that by now, speech recognition would have surpassed the keyboard as the primary means of input. Hell, we’ve been predicting it for so long, it’s become something of a hollow prediction – a lot like the “flying cars” argument.

But really, why aren’t we all talking to our computers? The answer, in my opinion, is that we haven’t developed artificial intelligence enough yet.

Why is artificial intelligence important for speech recognition, you ask? Let me explain.

We’ve had “basic” speech recognition for some time now. I have personally heard of “Dragon NaturallySpeaking” as the be-all, end-all of speech recognition software since somewhere around 1998 – and I’m still not using it. Nor is anyone else – at least not on any large scale. And there’s a very, very good reason for that – it’s simply not good enough.

Now, I’m not saying that speech recognition isn’t getting better at recognizing words and so forth, but at this point, using your computer via voice commands is a bit like trying to operate your computer through the same interface as the original Altair 8800. Oh sure, each individual switch works quite well – but try teaching your grandmother to check her email by just flicking 8 switches on the front of a panel with a few lights on it. That’s about where voice recognition is right now.

You see, there’s a very important “missing piece,” which is context. Or, to put it another way, consciousness.

In order for a speech recognition system to understand instructions given by a human being in plain speech, that system needs to be able to understand plain human speech – which, more often than not, requires a lot of understanding of the context in which it’s used. And to understand context like that, you need a rudimentary consciousness – something that has awareness – not necessarily of itself, but of what it’s working with. And we simply don’t have that yet.

Take an example.

Imagine you’re composing a message. You’re going to send it to your friend, “Bob.” Here’s how you’d use voice commands today:

Command mode. Open Email. Compose message. Dictation mode. “Hi Bob comma how are you doing today question mark capital I am doing just fine comma we enjoyed dinner with you last week period command mode backspace word backspace word command mode” Alt, File, S, Tab, Tab, Tab, Enter. Close Program.

And that’s with minimal errors – in reality, you’d be using the “backspace” or “undo” command quite often. And because speech recognition has no context, no consciousness, you need to tell it explicitly when you move from giving commands about what to do with the computer (basically, using voice commands as a slow and unreliable mouse pointer) to “dictation mode,” where it just writes what you say – basically acting like a bad transcriber. It’s slow, cumbersome, and unreliable. And until it becomes faster and easier (and, to a certain extent, cheaper) than using a keyboard and a mouse, it will remain a fringe method of input.
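To make the mode problem concrete, here is a minimal sketch in Python of how a context-free voice interface ends up behaving. Everything in it is an assumption made up for illustration (the `run_session` function, the phrase list, the tiny punctuation table); it is not how any real product is built. The point is only that the program cannot tell a command from dictation unless you say so out loud.

```python
# Hypothetical sketch of a modal voice interface. The recognizer hands us a
# flat list of recognized phrases, and only the explicit phrases
# "command mode" / "dictation mode" tell us how to interpret the rest.

# Spoken punctuation names and the symbols they stand for (illustrative only).
PUNCTUATION = {"comma": ",", "period": ".", "question mark": "?"}


def run_session(phrases):
    """Process recognized phrases in order.

    Returns (commands, text): the list of commands that were issued and
    the dictated text as a single string.
    """
    mode = "command"  # the session starts out expecting commands
    commands = []
    words = []
    for phrase in phrases:
        lowered = phrase.lower()
        if lowered == "command mode":
            mode = "command"
        elif lowered == "dictation mode":
            mode = "dictation"
        elif mode == "command":
            if lowered == "backspace word":
                if words:
                    words.pop()  # delete the last dictated word
            else:
                commands.append(phrase)
        else:
            # Dictation mode: everything is text, even if it was meant
            # as a command.
            if lowered in PUNCTUATION and words:
                words[-1] += PUNCTUATION[lowered]
            else:
                words.append(phrase)
    return commands, " ".join(words)
```

Feed it `["Hi", "Bob"]` without first saying "dictation mode" and both words come back as commands, with no text at all. That is the whole problem in two words.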

Contrast this with a voice command session with a computer equipped with speech recognition and a rudimentary AI:

Computer, begin new email to Bob. “Hey Bob, how are you doing today? I am doing just fine, we enjoyed dinner with you last week.” Send message.

Which one do you think most people could adapt to more quickly – the first one, or the second one?

Remember also that we haven’t even touched upon corrections. With AI, you could say “no, wait, make that ‘I’m doing just fine’” and the computer would know (based partly on your emphasis on “I’m,” and partly due to its awareness of the sentence structure itself and the context in which it was used) which phrase to replace. Just you try that with today’s speech recognition!
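For a sense of how far plain string matching gets you, here is a rough Python sketch that guesses which phrase a spoken correction is meant to replace. It is a crude stand-in, not AI: it knows nothing about emphasis, grammar, or context, and only looks for the run of words that most resembles the correction, using `difflib.SequenceMatcher` from the standard library. The `apply_correction` name and the window-size heuristic are assumptions made up for this example.

```python
from difflib import SequenceMatcher

TRAILING_PUNCTUATION = ",.?!;:"


def apply_correction(text, correction):
    """Replace the run of words in `text` that looks most like `correction`.

    Purely a similarity heuristic: no understanding of meaning is involved.
    Returns `text` unchanged if nothing resembles the correction at all.
    """
    words = text.split()
    target = correction.split()
    best_score, best_span = 0.0, None
    # Try windows one word shorter, equal to, and one word longer than the
    # correction, since contractions change the word count.
    for size in range(max(1, len(target) - 1), len(target) + 2):
        for start in range(len(words) - size + 1):
            window = " ".join(words[start:start + size])
            score = SequenceMatcher(
                None, window.lower(), correction.lower()
            ).ratio()
            if score > best_score:
                best_score, best_span = score, (start, start + size)
    if best_span is None:
        return text
    start, end = best_span
    # Keep punctuation that followed the replaced phrase, if the
    # correction did not supply its own.
    last_char = words[end - 1][-1]
    if last_char in TRAILING_PUNCTUATION and target[-1][-1] not in TRAILING_PUNCTUATION:
        target = target[:-1] + [target[-1] + last_char]
    return " ".join(words[:start] + target + words[end:])
```

Called with the email text and the correction “I’m doing just fine,” this should pick out “I am doing just fine,” as the closest match and swap it. It works here only because the two phrases happen to look alike; a correction such as “make that Tuesday” gives it nothing similar to latch onto, which is exactly the gap that context would have to fill.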

I’m not sure if AI research is being pursued as much as it should be – I have a sneaking suspicion it’s not (probably due to fear of runaway AI and other ethical concerns). And maybe that’s a good thing, in the long run. But I’d like to see this sort of thing happen, and happen soon. Because I’m tired of typing – I want to talk to my computer.

I mean, seriously… it’s 2008! Wasn’t this sort of thing supposed to happen like 7 years ago, at least? Whatever happened to “life imitating art”?

I’m waiting…

By Keith Survell

Geek, professional programmer, amateur photographer, crazy rabbit guy, only slightly obsessed with cute things.

4 comments

  1. I can totally understand your deep desire to talk to your pootur, I really do! And think of how many would be saved the pain of carpal tunnel! My husband has been watching The Office a lot lately, though, and I have to laugh when I think of so many people all trying to talk to their computers all at once within close proximity to each other! And what happens when you get interrupted? What if you’re working on something private or confidential? So much potential for humor, no? Suddenly I can’t wait, either. But don’t worry, I saw a commercial during TMNT (yes I’m a 34 year old toonie!) for a “Girl Talk Journal”. It uses voice recognition as a security device to keep spying little brothers out of it. It’s a start! Sort of… I guess…

    Oh yeah, that’s a Critter Castle from bunnyluv.com. Glad you appreciate my book choices, too. 🙂

  2. Very good points. I guess I’m used to working in a quiet office all the time.

    Critter Castle, eh? I’ll have to check that out! Thanks!

  3. Though I suppose another advantage of basic AI would be its ability to pick your voice out of a crowd… and to sense context for when you stop talking to it and start talking to your cube mate. Although I guess in a crowded office, the keyboard would still be the best means of interaction. That, or a touch-screen of some sort.

  4. I’m still looking forward to the user interface that was used in Minority Report. Touch screen nothing! I’ve got a touch space-in-the-middle-of-my-room!
