Digital assistants: intelligent or just good at voice recognition?
Smart devices and digital assistants are becoming the norm in technology thanks to Siri, Google Now, and Cortana, but how useful are they, and what does the future hold for these innovations? We investigate the rise in popularity of talking machines, and look at where it could all end up.
Intelligent computers have been the dream of technologists for many years. The idea has also proven a popular one in books and films, but the end results are often rather different. Whereas scientists and programmers see machines that can help cure diseases, enrich the human experience by eradicating menial tasks, and at the furthest extreme actually usher in a new form of intelligent being, Hollywood usually portrays them as destroyers of worlds that seek to overthrow and extinguish their cruel masters. Films such as The Terminator, WarGames, and Transcendence offer a future where flesh-based life is something of an irritant that needs upgrading to a purer state, or merely point to the dangers of connecting computers that control a nuclear arsenal to the internet.
Reality has a habit of being slightly less dramatic, and the revolution of artificial intelligence has quietly gone about its business with nary a global annihilation in sight. An invasion, though, is most certainly underway. In recent years our computers and mobile devices have gone from passive units with little agency of their own, except for scheduled updates or calendar alarms, to ones that constantly monitor our conversations, awaiting the magic words that mean we need their assistance.
They silently gather information on our likes, use of language, whereabouts, habits and routines, all with the aim of being able to understand us better. (See also: No-one cares about privacy any more.) As these systems grow in sophistication, and our interaction with them becomes more effortless, we could see a future where – just as many of us can’t remember more than a few phone numbers now – our devices begin to know more about our lives than we do ourselves. But just how intelligent do we want these devices to become, and how much can they already accomplish?
Open the pod bay doors, HAL…
When Apple launched the iPhone 4S back at the end of 2011, the company didn’t focus on the excellent camera, enviable security, or high-quality construction; instead it spent its copious advertising budget making a beta product the entire centre of its campaign. Siri became a star overnight. Images of celebrities like John Malkovich and Zooey Deschanel conversing with their new iPhones about soup, their schedule, or the finer arts of comedy filled TV and computer screens the world over. But the endorsements weren’t the factor that really caught the public’s imagination – after all, Samsung and Microsoft have used the same tactics with differing levels of success – instead it was the effortless way that they were interacting with their phones.
Of course, as any honest Apple user will tell you, Siri was far from the efficient PA that the adverts sold. Depending on your accent, and internet connection, the digital assistant could be far harder work. Misinterpreted commands, frustrating randomness, and complete paralysis when web access was absent, meant that it was often quicker to type things in yourself. But…in those moments when it worked, it seemed like magic. The future had arrived.
Two years on, all of the major platforms have their own variant of digital assistant. Google Now sits at the heart of the newest Android devices and Chromebooks, Samsung has S Voice, while Microsoft recently announced the newest addition, Cortana. Named after an artificial intelligence character in the hugely popular Xbox series Halo, Cortana boasts an impressive set of features that could elevate the Windows Phone 8.1 OS to the head of the pack, and it could possibly emerge as a new control interface for Windows as a whole.
‘One of the first things we did when we conceptualised Cortana,’ states Joe Belfiore, Head of the Windows Phone programme at Microsoft, ‘was to chat with real world, human assistants to learn how they actually worked. One technique real world assistants spoke about was the idea of a notebook, tracking all the interests and likes of their clients. Cortana also keeps a notebook…where she stores what she’s learned about me, and I can view or edit what she knows about me whenever I want.’
This information gathering is an essential element of an efficient, intelligent system, but in an age where privacy concerns are headline news this can leave users with important choices to make about how much they share with their assistants. Google Now monitors your internet search history, location, and general usage habits to collate a profile that allows it to suggest things that might be helpful. It can be incredibly useful, but some people find it a little unnerving that their devices are paying such close attention to their activity. There is the option to disable access to much of this data if you want more privacy, but this substantially reduces the capabilities of the service.
Siri is even more elusive, with the control options open to the user scaling down to pretty much which language you want to use. A nice feature though is that you can teach the assistant who your mother, father, sister, partner or other close relations are, which can then be used in features such as Find My Friends. Otherwise it’s not entirely clear what the service knows about you. Cortana’s notebook feature is admittedly still quite basic, but the fact that there’s one at all is at least a step in the right direction.
Say what you see
Of course the most obvious area of intelligence in a computer assistant is the voice interface. There’s nothing quite like telling a program what you want and then seeing it do exactly that. Nuance, creators of Dragon Dictate, have pioneered this technology for many years, and know what it takes to make our devices seem truly smart.
‘The personality of a system will be that instant thing that you latch onto about whether something is intelligent,’ explains John West, Principal Solution Architect at Nuance. ‘Taking you from your question to your answer in the quickest way possible. That’s different to the way you and I would interact as humans to a certain degree, because we could go off on tangents and come back. An intelligent agent in the future will possibly have the capability to do that, but at the moment it’s very much about you asking the intelligent agent to do something and the intelligent agent being able to fulfil that in the quickest possible way.’
Voice control has certainly developed to a point now where it’s a usable interface rather than just a party trick. Google’s voice search has also extended beyond mobile devices to the desktop Chrome browser. Opening a new tab on www.google.com and clicking on the microphone icon at the right of the search bar enables you to speak your search queries (so long as you have a built-in microphone), and in many cases Google will read the results back to you. In recent versions there is also the ability to ask related questions, for example – ‘When was Jaws released?’, then when the answer is given you can say ‘Who starred in it?’ and the system will know what you mean.
Thanks to Google Maps you can also ask for directions to a location and the system will work them out and display them. You might not realise it, but Windows has, for some time now, also had a comprehensive suite of voice control tools built into the OS. If you search for Speech Recognition in versions as far back as Vista, you’ll find a program that enables you to navigate, dictate, and generally control your PC, all via spoken commands.
While voice control may yet catch on with desktop users, the personal nature of a mobile device seems to make it easier for users to accept. After all, we already talk into our phones, so transitioning to speech recognition is a much smaller mental step. Then there’s the interoperability of the apps and information that these devices bring. In Cortana there is a novel feature which allows you to tell the handset ‘remind me about borrowing the suitcase next time I talk to Jim’, and Cortana will monitor your contacts until Jim is next on the phone, then pop up the note. Very simple, very clever.
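For the curious, the contact-triggered reminder described above can be sketched in a few lines of Python. This is only an illustration of the general idea, not how Cortana is actually built: the class name `ContactReminders` and its methods are invented for this example, and a real assistant would hook into the phone's call and messaging events rather than being called by hand.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ContactReminders:
    """Hypothetical store of notes that surface the next time a contact comes up."""

    # Maps a lower-cased contact name to the notes waiting for that person.
    pending: Dict[str, List[str]] = field(default_factory=dict)

    def remind_on_contact(self, contact: str, note: str) -> None:
        """Save a note to show at the next interaction with `contact`."""
        self.pending.setdefault(contact.lower(), []).append(note)

    def on_interaction(self, contact: str) -> List[str]:
        """Return, and clear, any notes waiting for `contact`."""
        return self.pending.pop(contact.lower(), [])


reminders = ContactReminders()
reminders.remind_on_contact("Jim", "Ask about borrowing the suitcase")

# A call with someone else triggers nothing; a call with Jim surfaces the note once.
notes_for_anna = reminders.on_interaction("Anna")
notes_for_jim = reminders.on_interaction("Jim")
notes_for_jim_again = reminders.on_interaction("jim")
```

The note is removed once it has been shown, which is why the second call with Jim returns nothing.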
A major drawback of intelligent assistants is their need to be constantly online. Put your mobile device in airplane mode and you’re back to typing in calendar entries and reminders yourself. This is because most of the translation and processing is done on large, powerful servers in the cloud that can handle the number crunching required. Depending on the handset you can have basic elements of control completed locally – say ‘Launch Music Player’ – but the majority will likely be online for the foreseeable future. This isn’t surprising, as so much of the information an assistant needs to be intelligent is based in the cloud anyway. This is particularly true in one of the fastest developing areas in the technology at the moment – cars.
‘You can imagine a mapping database sitting in your car,’ explains John West, ‘it could be a TomTom, whatever is embedded in your vehicle, and you say to it “Find me the nearest petrol station”. It looks at where you are, it will know all the petrol stations around that area and can display them. However, if you’re connected to the cloud, it could actually say, ah, John collects Tesco points for example, and he’s travelling North on the M1, and I know that, so I can provide John with the petrol stations going North on the M1 that are Tescos, but I also know that he’s only got fifty miles left of diesel, so I’ll only provide him the ones within the area that he’s going to or has the capability of reaching. Likewise it could say ah, there’s congestion here, so he possibly won’t make that, I’ll show the ones closer.
‘With Dragon Drive we’ve got partnerships with a number of companies so that a user can say “Take me to a hotel in London” and it finds the hotel you want to go to, again this is a hybrid between the handset and the vehicle, so you can arrange this in the house on your handset, and then when you get in the vehicle it’s aware that you’ve already set these destinations. Or it could look at your calendar and say oh, you’re going to this address and put it in. But then as you go along you could say “find me the nearest parking spot”, and you could then use services such as Parkopedia or Parkaround to be able to go out there and actively find you a parking spot. These services already exist, it’s just the method of bringing them together in a way that’s usable. If you wanted to do that right now on your PC you’d go to a mapping software, then a parking piece of software, but you couldn’t do it all in one piece of information. The personal assistant approach is very much a proactive system…if it knew your preferences, it could do those things for you.’
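West’s petrol station scenario boils down to a filtering problem, and a rough Python sketch shows the reasoning. Everything here is an assumption made for illustration: the `Station` type, the `suggest_stations` function, the station names and distances, and the idea of modelling congestion as a simple reduction in usable range. It is not a description of Dragon Drive or any real in-car system.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Station:
    name: str
    brand: str
    miles_ahead: float  # distance along the current route; negative means behind the car


def suggest_stations(
    stations: List[Station],
    range_miles: float,
    preferred_brand: Optional[str] = None,
    congestion_margin: float = 0.0,
) -> List[Station]:
    """Return reachable stations ahead of the driver, nearest first.

    If any reachable station matches the preferred brand, only those are
    returned; otherwise every reachable station is offered as a fallback.
    """
    # Assumption: congestion is modelled as a flat reduction in usable range.
    usable_range = range_miles - congestion_margin
    reachable = [s for s in stations if 0 <= s.miles_ahead <= usable_range]
    reachable.sort(key=lambda s: s.miles_ahead)
    preferred = [s for s in reachable if s.brand == preferred_brand]
    return preferred or reachable


# Made-up stations along a northbound route.
stations = [
    Station("Tesco A", "Tesco", 30.0),
    Station("Shell B", "Shell", 20.0),
    Station("Tesco C", "Tesco", 80.0),   # beyond fifty miles of fuel
    Station("Tesco D", "Tesco", -15.0),  # already passed
]

clear_road = [s.name for s in suggest_stations(stations, 50, "Tesco")]
congested = [s.name for s in suggest_stations(stations, 50, "Tesco", congestion_margin=25)]
```

With fifty miles of fuel and a clear road, only the Tesco thirty miles ahead is offered. Once congestion cuts the usable range to twenty-five miles, no Tesco is reachable, so the sketch falls back to the nearer Shell.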
In many ways the car is the most obvious place for this technology to find its home. Being able to converse with an assistant verbally, organise your appointments, send email and text messages, have new arrivals read out to you, and conduct internet searches, all without the need for a screen or physical interaction, would be a godsend for many. Google, BlackBerry, Apple and several other giants have already outlined their in-car assistant programs, some already available in newer models, and it can only be a matter of time before the kids watch Netflix in the back while you get some work done on the long drive to the campsite for a summer break.
With the Internet of Things becoming a tangible reality in the near future, the possibilities exist for a convergence of intelligent agents that could see your fridge knowing its contents and referencing your calendar to see if you have any dinner recipes planned for which you require ingredients. Our televisions will use biometric technology to recognise our voices and display the programs we prefer to watch, while GPS sensors in our cars and devices will tell our automated home systems when we’ll be home and set the temperature accordingly. If bathroom technology continues to advance then it might tap into this to run a warm bath when it knows we’ve been caught in the rain. But rather than a single, unified experience, it does seem more likely that companies will want you to sign up to their particular flavour of this brave new world. The future won’t be quite as egalitarian as we might have hoped.
‘From their perspective it gives you an allegiance to that brand or company,’ says John West, ‘so when you replace, say, your television set or set-top box or whatever, you’ll go with the company that you’re already set up with on your mobile phone…At the moment we’re spending a lot of time with organisations changing the [interface] voices for them to differentiate their service.’
Nuance is in a good position to watch these developments, as the company currently licenses technology to some of the largest brands in the world, including Apple, Microsoft, and Samsung, among many others.
‘It’s going to be difficult,’ West concludes, ‘at the moment it’s very much different people want to do different things. I see it really being…having this information sitting in the middle and people being able to access it. I don’t see any time in the future where we’re going to get a common set of APIs that will allow us access…I don’t think any of us would trust anybody with [sole control of] that information sitting in the centre. Who is going to own that? Who would we trust to own that?
‘We’ve been looking at the Internet of Things for example, about the amount of data that we’ve got, and the amount we’re willing to share. Who owns it? Who aggregates it? And such things. We’re very suspicious, and even more so recently, because of what’s gone on about data being leaked, the NSA, and all these sorts of things. The governments can’t be trusted with this information in the view of most people, so who would we trust? Would you trust Apple? It depends on your allegiance.’