[repost ]How Siri, Cortana, and Google Now are replacing our brains


Digital assistants: intelligent or just good at voice recognition?

Smart devices and digital assistants are becoming the norm in technology thanks to Siri, Google Now, and Cortana, but how useful are they, and what does the future hold for these innovations? We investigate the rise in popularity of talking machines, and look to where it can all end up.

Intelligent computers have been the dream of technologists for many years. The idea has also proven a popular one in books and films, but the end results are often rather different. Whereas scientists and programmers see machines that can help cure diseases, enrich the human experience by eradicating menial tasks, and at the furthest extreme actually usher in a new form of intelligent being, Hollywood usually portrays them as destroyers of worlds that seek to overthrow and extinguish their cruel masters. Films such as Terminator, WarGames, and Transcendence offer a future where flesh-based life is something of an irritant that needs upgrading to a purer state – or merely point to the dangers of connecting computers that control a nuclear arsenal up to the internet.

Reality has a habit of being slightly less dramatic, and the revolution of artificial intelligence has quietly gone about its business with nary a global annihilation in sight. An invasion, though, is most certainly underway. In recent years our computers and mobile devices have gone from passive units with little agency of their own, except for scheduled updates or calendar alarms, to ones that constantly monitor our conversations, awaiting the magic words that mean we need their assistance.

They silently gather information on our likes, use of language, whereabouts, habits and routines, all with the aim of being able to understand us better. (See also: No-one cares about privacy any more.) As these systems grow in sophistication, and our interaction with them becomes more effortless, we could see a future where – just as many of us can’t remember more than a few phone numbers now – our devices begin to know more about our lives than we do ourselves. But just how intelligent do we want these devices to become, and how much can they already accomplish?

Open the pod bay doors HAL…

When Apple launched the iPhone 4S back at the end of 2011, the company didn’t focus on the excellent camera, enviable security, or high quality construction; instead, it spent its copious advertising budget making a beta product the entire centre of its campaign. Siri became a star overnight. Images of celebrities like John Malkovich and Zooey Deschanel conversing with their new iPhones about soup, their schedule, or the finer arts of comedy, filled TV and computer screens the world over. But the endorsements weren’t the factor that really caught the public’s imagination – after all, Samsung and Microsoft have used the same tactics with differing levels of success – instead it was the effortless way that they were interacting with their phones.

Siri on iPhone 5c

Of course, as any honest Apple user will tell you, Siri was far from the efficient PA that the adverts sold. Depending on your accent, and internet connection, the digital assistant could be far harder work. Misinterpreted commands, frustrating randomness, and complete paralysis when web access was absent, meant that it was often quicker to type things in yourself. But…in those moments when it worked, it seemed like magic. The future had arrived.

Two years on, all of the major platforms have their own variant of digital assistant. Google Now sits at the heart of the newest Android devices and Chromebooks, Samsung has S-Voice, while Microsoft recently announced the newest addition, Cortana. Named after an Artificial Intelligence character in the hugely popular Xbox series Halo, Cortana boasts an impressive set of features that could elevate the Windows Phone 8.1 OS to the head of the pack and possibly emerge as a new control interface for Windows as a whole.

‘One of the first things we did when we conceptualised Cortana,’ states Joe Belfiore, Head of the Windows Phone programme at Microsoft, ‘was to chat with real world, human assistants to learn how they actually worked. One technique real world assistants spoke about was the idea of a notebook, tracking all the interests and likes of their clients. Cortana also keeps a notebook…where she stores what she’s learned about me, and I can view or edit what she knows about me whenever I want.’

Cortana Windows Phone 8.1

This information gathering is an essential element of an efficient, intelligent system, but in an age where privacy concerns are headline news this can leave users with important choices to make about how much they share with their assistants. Google Now monitors your internet search history, location, and general usage habits to collate a profile that allows it to suggest things that might be helpful. It can be incredibly useful, but some people find it a little unnerving that their devices are paying such close attention to their activity. There is the option to disable access to much of this data if you want more privacy, but this substantially reduces the capabilities of the service.

Siri is even more elusive, with the control options open to the user scaling down to pretty much which language you want to use. A nice feature though is that you can teach the assistant who your mother, father, sister, partner or other close relations are, which can then be used in features such as Find My Friends. Otherwise it’s not entirely clear what the service knows about you. Cortana’s notebook feature is admittedly still quite basic, but the fact that there’s one at all is at least a step in the right direction.

Say what you see

Of course the most obvious area of intelligence in a computer assistant is the voice interface. There’s nothing quite like telling a program what you want and then seeing it do exactly that. Nuance, creators of Dragon Dictate, have pioneered this technology for many years, and know what it takes to make our devices seem truly smart.

Dragon Dictate

‘The personality of a system will be that instant thing that you latch onto about whether something is intelligent,’ explains John West, Principal Solution Architect at Nuance. ‘Taking you from your question to your answer in the quickest way possible. That’s different to the way you and I would interact as humans to a certain degree, because we could go off on tangents and come back. An intelligent agent in the future will possibly have the capability to do that, but at the moment it’s very much about you asking the intelligent agent to do something and the intelligent agent being able to fulfill that in the quickest possible way.’

Voice control has certainly developed to a point now where it’s a usable interface rather than just a party trick. Google’s voice search has also extended beyond mobile devices to the desktop Chrome browser. Opening a new tab on www.google.com and clicking on the microphone icon at the right of the search bar enables you to speak your search queries (so long as you have a built-in microphone), and in many cases Google will read the results back to you. In recent versions there is also the ability to ask related questions, for example – ‘When was Jaws released?’, then when the answer is given you can say ‘Who starred in it?’ and the system will know what you mean.

Google Chrome Google Now

Thanks to Google Maps you can also ask for directions to a location and the system will work them out and display them. You might not realise it but Windows has, for some time now, also had a comprehensive suite of voice control tools built into the OS. If you search for Speech Recognition in versions as far back as Vista, you’ll find a program that enables you to navigate, dictate, and generally control your PC all via spoken commands.

While voice control may still catch on when using the desktop, the personal nature of a mobile device seems to make it easier for users to accept. After all, we already talk into our phones, so transitioning to speech recognition is a much smaller mental step. Then there’s the interoperability of the apps and information that these devices bring. In Cortana there is a novel feature which allows you to tell the handset ‘remind me about borrowing the suitcase next time I talk to Jim’, and Cortana will monitor your contacts until Jim is next on the phone, then pop up the note. Very simple, very clever.

A major drawback of intelligent assistants is their need to be constantly online. Put your mobile device in airplane mode and you’re back to typing in calendar entries and reminders yourself. This is because most of the translation and processing is done on large, powerful servers in the cloud that can handle the number crunching required. Depending on the handset you can have basic elements of control completed locally – say ‘Launch Music Player’ – but the majority will likely be online for the foreseeable future. This isn’t surprising, as so much of the information an assistant needs to be intelligent is based in the cloud anyway. This is particularly true in one of the fastest developing areas in the technology at the moment – cars.

KITT cars

‘You can imagine a mapping database sitting in your car,’ explains John West, ‘it could be a TomTom, whatever is embedded in your vehicle, and you say to it ‘Find me the nearest petrol station’. It looks at where you are, it will know all the petrol stations around that area and can display them. However if you’re connected to the cloud, it could actually say, ah, John collects Tesco points for example, and he’s traveling North on the M1, and I know that, so I can provide John with the petrol stations going North on the M1 that are Tescos, but I also know that he’s only got fifty miles left of diesel, so I’ll only provide him the ones within the area that he’s going to or has the capability of reaching. Likewise it could say ah, there’s congestion here, so he possibly won’t make that, I’ll show the ones closer.

Apple CarPlay

‘With Dragon Drive we’ve got partnerships with a number of companies so that a user can say ‘Take me to a hotel in London’ and it finds the hotel you want to go to, again this is a hybrid between the handset and the vehicle, so you can arrange this in the house on your handset, and then when you get in the vehicle it’s aware that you’ve already set these destinations. Or it could look at your calendar and say oh, you’re going to this address and put it in. But then as you go along you could say ‘find me the nearest parking spot’, and you could then use services such as Parkopedia or Parkaround to be able to go out there and actively find you a parking spot. These services already exist, it’s just the method of bringing them together in a way that’s useable. If you wanted to do that right now on your PC you’d go to a mapping software, then a parking piece of software, but you couldn’t do it all in one piece of information. The personal assistant approach is very much a proactive system…if it knew your preferences, it could do those things for you.’
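West’s petrol station scenario boils down to a filter: keep the stations of the brand the driver prefers, that lie ahead on the route, and that fall within the remaining range, then shrink that range if there is congestion. A minimal sketch of the idea follows. The `Station` record, the `reachable_stations` function and the 0.7 congestion factor are all hypothetical names and values chosen for illustration, not anything Nuance has described.

```python
from dataclasses import dataclass


@dataclass
class Station:
    name: str
    brand: str
    miles_ahead: float  # distance along the current route; negative = already passed


def reachable_stations(stations, preferred_brand, range_miles, congestion_ahead=False):
    """Return preferred-brand stations ahead of the car and within fuel range.

    The 0.7 factor is an assumed penalty: in congestion the car is treated
    as able to cover only 70% of its nominal remaining range.
    """
    usable_range = range_miles * (0.7 if congestion_ahead else 1.0)
    matches = [
        s for s in stations
        if s.brand == preferred_brand and 0 <= s.miles_ahead <= usable_range
    ]
    # Nearest first, so the driver sees the safest option at the top.
    return sorted(matches, key=lambda s: s.miles_ahead)


stations = [
    Station("A", "Tesco", 12),
    Station("B", "Shell", 5),
    Station("C", "Tesco", 45),
    Station("D", "Tesco", -3),   # behind the car, heading the wrong way
    Station("E", "Tesco", 60),   # beyond the fifty miles of fuel left
]
clear_road = [s.name for s in reachable_stations(stations, "Tesco", 50)]
jammed_road = [s.name for s in reachable_stations(stations, "Tesco", 50, congestion_ahead=True)]
```

The locally stored map can answer the "nearest station" question on its own; the brand preference, fuel level and traffic state are the cloud-supplied signals that turn it into a personalised answer.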

In many ways the car is the most obvious place for this technology to find its home. Being able to converse with an assistant verbally, organise your appointments, send email and text messages, have new arrivals read out to you, and conduct internet searches all without the need for a screen or physical interaction, would be a god-send for many. Google, BlackBerry, Apple and several other giants have already outlined their in-car assistant programs, some already available in newer models, and it can only be a matter of time before the kids watch Netflix in the back while you get some work done on the long drive to the campsite for a summer break.

With the Internet of Things becoming a tangible reality in the near future, the possibilities exist for a convergence of intelligent agents that could see your fridge knowing its contents and referencing your calendar to see if you have any dinner recipes planned for which you require ingredients. Our televisions will use biometric technology to recognise our voices and display the programs we prefer to watch, while GPS sensors in our cars and devices will tell our automated home systems when we’ll be home and set the temperature accordingly. If bathroom technology continues to advance then it might tap into this to run a warm bath when it knows we’ve been caught in the rain. But rather than a single, unified experience, it does seem more likely that companies will want you to sign up to their particular flavour of this brave new world. The future won’t be quite as egalitarian as we might have hoped.

‘From their perspective it gives you an allegiance to that brand or company,’ says John West, ‘so when you replace, say, your television set or set-top box or whatever, you’ll go with the company that [you’re] already set up with on your mobile phone…At the moment we’re spending a lot of time with organisations changing the [interface] voices for them to differentiate their service’.

Nuance is in a good position to watch these developments, as the company currently licenses technology to some of the largest brands in the world, including Apple, Microsoft, and Samsung, among many others.

‘It’s going to be difficult,’ West concludes, ‘at the moment it’s very much different people want to do different things. I see it really being…having this information sitting in the middle and people being able to access it. I don’t see any time in the future where we’re going to get a common set of APIs that will allow us access…I don’t think any of us would trust anybody with [sole control of] that information sitting in the centre. Who is going to own that? Who would we trust to own that?

‘We’ve been looking at the Internet of Things for example, about the amount of data that we’ve got, and the amount we’re willing to share. Who owns it? Who aggregates it? And such things. We’re very suspicious, and even more so recently, because of what’s gone on about data being leaked, the NSA, and all these sorts of things. The governments can’t be trusted with this information in the view of most people, so who would we trust? Would you trust Apple? It depends on your allegiance.’

[repost ]Cortana: Is Microsoft’s voice assistant better than Siri?


The forthcoming Windows Phone 8.1’s voice assistant combines Siri’s personality with Google Now’s knack for anticipation

A command prompt from Cortana, Windows phone software’s virtual assistant. With Cortana, Windows catches up with Apple’s iOS and Google’s Android in a major way. Photograph: AP

“Yay, it’s Nick! How can I help?”

Thanks for asking, Cortana. And thanks for making the Windows phone software better, Microsoft.

With the new Cortana virtual assistant, Windows catches up with Apple’s iOS and Google’s Android in a major way. Microsoft takes some of the best parts of Apple’s and Google’s virtual assistants and adds a few useful tools of its own. The result is Cortana, named after an artificial intelligence character in Microsoft’s “Halo” video games.

The new Windows system, Windows Phone 8.1, has several other new features, which I’ll review separately later this week.

The update, including Cortana, will come with new phones starting next month, while existing phones will be able to download it for free in the coming months. On Monday, Microsoft made a preview version available to software developers. I was able to test that version over the past week.

Apple’s Siri virtual assistant on iPhones and iPads has a feisty personality. She has good comebacks for such questions as, “What is the meaning of life?” She’s also helpful with directions, restaurant recommendations and appointment reminders. Google Now on Android phones is boring by comparison, but better at anticipating your needs and giving you information before you even ask.

Cortana combines Siri’s personality with Google Now’s knack for anticipation.

Cortana also incorporates a feature for blocking calls, texts and notifications during times of your choosing, while letting you set exceptions for specific people or emergencies – defined as someone trying to call again within three minutes. That feature is separate on iPhones (though you can turn on Do Not Disturb, as it’s called, via Siri or directly from its “Control Centre”) and Samsung’s Android phones. Cortana will also identify the names of songs heard in a retail store or bar, while you need separate apps such as SoundHound or Shazam on other phones (though the iPhone and iPad versions of Shazam can listen continuously in the background; that isn’t yet in the Android version).
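The call-blocking rule described above is simple enough to sketch: a call rings through if the caller is on the exception list, or if the same caller tries again within three minutes. The `QuietHours` class and its method names below are hypothetical, and this is only an illustration of the rule as described, not Microsoft’s implementation.

```python
from datetime import datetime, timedelta

REPEAT_WINDOW = timedelta(minutes=3)  # the "emergency" window described in the review


class QuietHours:
    def __init__(self, allowed_callers):
        self.allowed = set(allowed_callers)
        self.last_attempt = {}  # caller -> time of their most recent call

    def should_ring(self, caller, now):
        """Decide whether a call arriving during quiet hours breaks through."""
        previous = self.last_attempt.get(caller)
        self.last_attempt[caller] = now
        if caller in self.allowed:
            return True
        # A repeat call inside the window is treated as an emergency.
        return previous is not None and now - previous <= REPEAT_WINDOW


quiet = QuietHours({"Mum"})
start = datetime(2014, 4, 14, 23, 0)
mum_rings = quiet.should_ring("Mum", start)
bob_first = quiet.should_ring("Bob", start)
bob_again = quiet.should_ring("Bob", start + timedelta(minutes=2))
```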

Other differences include:

• Cortana asks rather than assumes

When you first use Cortana, she guides you through a brief questionnaire to gauge your interests. You can tap an icon on the top right to pull down a notebook and change your preferences.

Cortana also offers to scan your email for flights and other events to remind you about. Unlike Google Now, Cortana asks whether you’d like that flight tracked. Google Now does that automatically, and erroneously picks up itineraries that your travel companions send you. Also, Google Now works only with Gmail, while Cortana works with all major services except Yahoo, which Microsoft says it couldn’t track because of Yahoo’s terms of service.

Although Cortana avoids mistakes by asking, she requires slightly more work on your part. After asking Cortana about the latest Mets game, I had to tap a link to get future updates automatically. I didn’t have to do anything with Google Now. But Google Now also assumes that just because I search for a company once, I want its stock quote every day.

Cortana shows you game scores – but might forget what you asked. Photograph: AP

• Cortana lets you ask follow-up questions

After asking Cortana for Mexican restaurants, I asked for ones that are open and got my list narrowed. I then asked for the ones that are good. Cortana responded with the Mexican restaurants that are both open and have at least four stars on Yelp.

Siri and Google Now tend to treat each request as new, although asking Siri “Which are the good ones?” will re-sort the list by rating rather than distance. Both will let you make reservations through OpenTable, whereas Cortana tells you only that a place takes reservations.

• Cortana offers more ways to set reminders

Like Siri and Google Now, Cortana lets you set reminders based on the time or location. When you arrive at work, for instance, she can remind you to mail a package.

Unlike the others, Cortana also lets you set people-based reminders. Let’s say your friend Mary just had a baby. You can ask Cortana to remind you to mention that the next time you call, text or email Mary.

Some of the interactions got frustrating until I manually added my work and home addresses to Cortana’s notebook.

Me: “Remind me to turn off the stove when I get home.”

Cortana: “All right, where should I remind you?”

Me: “Home.”

Cortana: “OK, what should I remind you about?”

Me: “Turn off the stove.”

Cortana: “When would you like to be reminded?”

Me: “When I get home.”

Cortana: “Sure, remind you when you get to home. Is this the one you want?”

The suggestion wasn’t for my home but “Home Restaurant.”

Microsoft says Cortana will figure out where you work and live over time.

Beta, could be better

Cortana is still in a “beta” test mode, so these kinks are to be expected. It’s fine for basic queries, though sometimes you have to ask a few times. In requesting directions, I sometimes got a simple web search for my destination address. But repeating the address then got me actual step-by-step directions.

I asked all three virtual assistants on Friday whether I needed an umbrella. Siri and Google Now both told me I didn’t, based on the fact that it wasn’t raining. Cortana answered, “I’m not entirely certain.” All three then presented a forecast.

By Sunday, Cortana seemed to have figured out I didn’t need an umbrella. But I asked the same question 10 minutes later and got web results for “Do I need an umbrella?”

Both Siri and Google gave me movie times when I asked, “When is ‘Frozen’ showing?” Clicking on a time took me to a ticketing service. Cortana simply conducted a web search.

Cortana warns me of conflicts when adding a calendar event, but the warning comes after the fact as a “by the way.” Siri warns me ahead of time, while Google Now offers no warning at all.

Ask Cortana to “tell me a joke,” and she tries to text “a joke” to my cellphone, or “Me” in the address book. Cortana also won’t compose email. Siri and Google Now do both email and texts.

These are all small points that I’m sure Microsoft will address over time. The company plans to keep Cortana in beta and limited to the US until the second half of the year, when the assistant will also debut in the UK and China.

Perhaps by then, Microsoft will offer a male voice, as Apple now does with Siri. In the meantime, enjoy interacting with Cortana. You can ask her to sing a song.

Overall, Cortana’s improvements over Siri and Google Now aren’t enough to compel a switch from an iPhone or Android phone – but Cortana does address an omission in Windows Phone for those already thinking of getting one.

Can Cortana and other features turn Windows Phone around?

[repost ]How Microsoft Cortana Improves Upon Siri and Google Now


Behind the scenes and hands on with Microsoft’s personal digital assistant.

While it is tempting to dismiss Cortana, the new personal digital assistant for Windows Phone 8.1, as Microsoft’s answer to Apple’s Siri for the iPhone or Google Now for Android, as yet another spin on the company’s old tactic of being the fast follower and not the innovator, keep in mind that in the hotly contested mobile arena, imitation is a considerable art form. See: Apple vs. Samsung, rounds one and two; Microsoft’s initial dismissive stance on tablets; Apple’s early dismissive stance on small tablets; Apple Maps; Google Play Music; Apple iTunes Radio. Etc.

Cortana will ship as part of Windows Phone 8.1 in the coming months, which is about as precise as Microsoft will get on the timing. Microsoft announced and demonstrated the new technology, emerging from the company’s Bing platform division, at Build, the company’s annual developer conference, where attendees had an opportunity to get more of a hands-on experience with Cortana, and listen to Microsoft take a few swings at Apple and Google.

Read more: Windows Phone 8.1 Introduces Cortana

Most technology derivations and imitations either attempt differentiation or build on the prior work, but differentiation doesn’t necessarily beget improvement. What sounds good in focus groups, on marketing slides, or upon carefully staged reveals frequently fails to reach the cold ear of reality.

It is wise, then, to be skeptical of Cortana, a product that won’t ship for several weeks, and even then will likely maintain its beta status, much as Siri did when it arrived in 2011. While my own brief experience with Cortana was hardly flawless, and I witnessed Microsoft’s own team fail to get the results from Cortana they expected, this product does appear to both differentiate and build upon prior efforts. Cortana, and the claims Microsoft made regarding it, even pushed me to revisit both Siri and Google Now with a renewed purpose.

Granted, Windows Phone enjoys only 3.3% worldwide market share (now ahead of BlackBerry, which sits at 1.9%, but below iOS at 15.2% and Android at 78.6%), according to IDC’s 2013 estimates, but that market share is growing (Windows Phone shipments increased 90.9% in 2013, 46.7% in Q4). Cortana may well give the Windows Phone converts bragging rights on the personal assistant front. And while it’s difficult to say whether a successful Cortana will be enough to earn Microsoft new converts, it’s certainly not out of the question.

In addition to making Cortana available for some limited hands-on testing, Microsoft hosted a small Bing platform break-out session for the press and analysts, during which Bing executives held forth solely on Cortana. Afterward, I sat down with Stefan Weitz, director of search for Microsoft Bing to learn more about what powers Cortana, and to dive into the areas that make the technology sound so promising.

How Bing Powers Cortana
Cortana began 18 months ago as a collaboration between Microsoft’s Windows Phone and Bing groups, when Michael Calcagno, previously in Microsoft’s natural language group, took on the role of architect for the Bing Information Platform. The Bing service already contained all of Cortana’s key pieces, like visual recognition (recognition of physical objects) and the ability to make inferences. Microsoft just needed to build a platform to bring it together.

Microsoft had previously built an entity database, the technology that understands people, places, and things, and their relationships to other entities. The company dubs that technology “Satori,” and it’s what powers a search result that provides not just the simple answer to the question you asked, but all related information around it.

Microsoft has also been working on speech recognition, using deep neural networks (DNNs) to accomplish pattern recognition based on the way human brains process information. Waveforms are translated into bits and given to a speech recognition system, where natural language processing starts to make inferences about the user’s intent.

Apple and Google each perform natural language processing too, although each company applies its own intellectual property to the process.

With Cortana, most of the processing happens in the cloud, but there is also some on-device speech recognition and information processing. That is, a Cortana query gets distributed to both the device and the cloud, with the results coalescing back on the device. Some of Cortana’s functions can take place entirely offline, Weitz said.

The combination of speech recognition, the use of DNNs, entity understanding, and inference comes together in a powerful way. Cortana parses what the user is talking about into a particular domain: Is it a device function? A reminder? A calendar entry? And within that domain, Cortana determines the intent of the utterance: What does the user want to do?
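The two-stage parse described above, first the domain and then the intent within it, can be illustrated with a toy example. Cortana reportedly relies on DNN-based speech recognition and the Satori entity database; the sketch below swaps all of that for hand-written keyword rules, and every name in it (`DOMAINS`, `INTENTS`, `parse`, the intent labels) is hypothetical.

```python
import re

# Stage 1: which domain is the user talking about? Order matters, so that
# "remind me to turn off the stove" is a reminder, not a device command.
DOMAINS = {
    "reminder": re.compile(r"\bremind me\b"),
    "calendar": re.compile(r"\b(schedule|appointment|meeting)\b"),
    "device": re.compile(r"\b(turn (on|off)|launch|open)\b"),
}

# Stage 2: within that domain, what does the user want to do?
INTENTS = {
    "reminder": [("create_reminder", re.compile(r"\bremind me to (?P<task>.+)"))],
    "calendar": [
        ("create_event", re.compile(r"\b(schedule|add|book)\b")),
        ("query_events", re.compile(r"\b(what|when|do i have)\b")),
    ],
    "device": [
        ("toggle_setting", re.compile(r"\bturn (?P<state>on|off) (?P<setting>.+)")),
        ("launch_app", re.compile(r"\b(launch|open) (?P<app>.+)")),
    ],
}


def parse(utterance):
    """Return (domain, intent, slots) for a spoken request already turned into text."""
    text = utterance.lower().strip()
    for domain, domain_pattern in DOMAINS.items():
        if domain_pattern.search(text):
            for intent, intent_pattern in INTENTS[domain]:
                match = intent_pattern.search(text)
                if match:
                    return domain, intent, match.groupdict()
            return domain, "unknown", {}
    # Nothing matched: fall back to a plain web search.
    return "web_search", "search", {"query": text}


stove = parse("Remind me to turn off the stove")
music = parse("Launch Music Player")
film = parse("When is Frozen showing?")
```

The fallback branch mirrors the behaviour reviewers noticed in the beta: when a request fits no known domain, the assistant simply conducts a web search.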

If one key differentiating theme emerged from Microsoft’s breakout session it was that Cortana was built around the concept of task completion, rather than knowledge derived from search or voice-assisted search. “We’ve become beaten down by the search model,” Weitz said, “so task completion has been lost.” Search, he added, has “evolved to be a noun-based retrieval system for [web] pages.”

The notion of task completion, or getting things done, was also the basis for Apple Siri when it first launched.

Cortana appears on the home screen as a Windows Live Tile, but like Google Now, it is also powered through the device search function. It contains what Microsoft terms a proactive canvas, which is the information it gives you based on what it infers, and the reactive canvas, which responds to queries.

One important distinction the Microsoft team pointed out was that Cortana gives the user confirmation that it understood the request, and that it was finding the information; that is, not just a simple “OK” but a contextual, confirmative prompt.

Cortana frequently delivers query results by voice, but also in rich data presentations, rendered from the cloud. You can address Cortana by voice, but also from the keyboard, another difference from Siri. However, like Siri, Cortana has something of a personality (based on the personal assistant in Halo, voiced by Jen Taylor), with built-in, snarky answers to silly questions and a similar conversation-oriented approach. Let’s not mince words: Microsoft has borrowed liberally from the 2011 Siri playbook here.

5 Key Cortana Differences
1.) Context. One of the key things Microsoft’s Cortana brings to task completion is context, meaning that you can get a result from a query and ask further questions pertaining to that result.

When Weitz demonstrated with a search for good restaurants nearby, Cortana triangulated on ratings and proximity. He then asked if any in the result set were vegetarian and Cortana returned a subset of the first list. From that result, he could ask for information on one of them (“how far to the first one?” or “make a reservation with the second one”), meaning that Cortana understood that it had provided a list, and the request being made was on that list. Cortana adjusts the request vocabulary to what’s on the result page.
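What makes this kind of follow-up possible is that the assistant remembers the last result list and interprets the next request against it. A rough sketch of that idea, with a hypothetical `Conversation` class and made-up restaurant data, might look like this; it is not a description of how Cortana is built.

```python
import re
from dataclasses import dataclass, field

ORDINALS = {"first": 0, "second": 1, "third": 2}


@dataclass
class Restaurant:
    name: str
    rating: float
    vegetarian: bool
    miles: float


@dataclass
class Conversation:
    results: list = field(default_factory=list)  # the list currently "on screen"

    def search(self, candidates):
        # Rank by rating first, then proximity, as in the demo described above.
        self.results = sorted(candidates, key=lambda r: (-r.rating, r.miles))
        return self.results

    def narrow(self, predicate):
        # A follow-up filters the remembered list rather than starting over.
        self.results = [r for r in self.results if predicate(r)]
        return self.results

    def resolve(self, utterance):
        # Map "the first one" / "the second one" onto the remembered list.
        text = utterance.lower()
        for word, index in ORDINALS.items():
            if re.search(rf"\b{word}\b", text) and index < len(self.results):
                return self.results[index]
        return None


chat = Conversation()
chat.search([
    Restaurant("Agave", 4.5, True, 0.4),
    Restaurant("Brasa", 4.0, False, 0.2),
    Restaurant("Cilantro", 4.5, True, 1.1),
    Restaurant("Dosa", 3.5, True, 0.3),
])
veggie = [r.name for r in chat.narrow(lambda r: r.vegetarian)]
first = chat.resolve("how far to the first one?")
second = chat.resolve("make a reservation with the second one")
```

An assistant that treats each request as new has no `results` to consult, which is why a follow-up such as "vegetarian" can end up as a dictionary lookup.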

When I tried this with Apple’s Siri, it got stuck on the second step, looking up the word “vegetarian” instead of finding a vegetarian restaurant, let alone one from the list of nearby restaurants.

That doesn’t mean Siri lacks contextual awareness. When I asked Siri which restaurant was closest, it did re-sort the list by distance. When I asked Siri to make a reservation at one of them (I called it by name), Siri was able to determine that the restaurant didn’t take reservations and gave me the information to call on my own. Upon finding a list of restaurants, you can ask Siri “is it OK for kids?” When finding out what movies are playing, you can follow that up by saying “with Russell Crowe” or “buy tickets,” and Siri will follow the right path.

When I asked Siri what the weather was, got my reply, and then asked “what about New York,” Siri knew I was still talking about the weather and provided it. When I asked Siri “what about this weekend?” she gave me the upcoming weekend forecast. But when I asked Siri “what about Big Sur?” she reverted to providing me with web pages about Big Sur. Siri is being tuned for what it hears the most.

Similarly, Google Now also provides context for some things that it already knows. For example, if you ask it for pictures of the Space Needle, and then ask “how tall is it?” Google Now knows that “it” is the Space Needle. However, when I asked Google Now to show me pictures of the Hollywood sign, and then asked “where is it?” I got directions to the Space Needle. When I asked Google Now for a list of nearby restaurants, and then asked “what about Italian?” it gave me a list of nearby Italian restaurants. For some restaurants, you can even ask “show me the menu” and it will do so.

Without a longer list of examples, it’s difficult to determine just yet how much more powerful Cortana will be, but you can start to see some of the promising subtleties here if everything works as promised.

2.) Inference. Cortana is like Google Now in that it mines device signals so that it can do a better job of understanding your habits, interests and priorities. At the hardware level, it’s looking at location, battery state, movement (or lack of movement), and from those it might infer where home or work is. It tracks search history, looks at your calendar, and even into your e-mail. For example, it might notice flight information within an e-mail, and ask you if you want to track that flight.

Obtaining these signals requires the user to grant Cortana permission, which Microsoft believes is an important distinction.

Siri adds some of this inference-based personalization also, but it’s a bit more limited. It figures out where home and work are, or you can tell it. If you ask Siri to call your brother, it will prompt you to tell it who your brother is.

Cortana promises to take things one step further. Ask it to remind you to ask your brother if you can borrow his truck, and it will provide a reminder prompt on your next interaction with him, no matter what form that interaction takes. You have to tell Siri to send you a specific reminder during a specific event (as in, “remind me to call my brother when I get home”).
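The difference from a time- or place-based reminder is the trigger: the note is keyed to a person and fires on the next interaction with them, whatever the channel. A minimal sketch under that assumption follows; the `PersonReminders` class and its methods are hypothetical names, not part of any Windows Phone API.

```python
from collections import defaultdict


class PersonReminders:
    def __init__(self):
        self._pending = defaultdict(list)  # contact name -> notes waiting for them

    def add(self, contact, note):
        self._pending[contact.lower()].append(note)

    def on_interaction(self, contact, channel):
        """Called whenever a call, text or email involves `contact`.

        Returns the prompts to show and clears them, so each reminder
        fires once, on whichever channel happens first.
        """
        notes = self._pending.pop(contact.lower(), [])
        return [f"{channel} with {contact}: {note}" for note in notes]


reminders = PersonReminders()
reminders.add("Brother", "ask to borrow his truck")
nothing = reminders.on_interaction("Mary", "call")
prompt = reminders.on_interaction("Brother", "text")
already_shown = reminders.on_interaction("Brother", "call")
```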

Google Now uses a combination of customizable cards and inference. There are now dozens of cards, including commute, flight delays, reservations, travel helper cards for things like translation and currency conversion, and smart reminders tied to store chains, so you can set a reminder to buy an item when you enter a particular store. The inference is based on signals and information gleaned from the Google services (calendar, mail) running on your device. This is a very powerful combination, but while the cards are pretty powerful, the Google Now inferences don’t seem to go as far as Cortana’s.

Cortana will also work across services, even non-Microsoft ones. For example, it can read what’s in your Google Mail. “[Google Now] is magic if you’re all in on Google,” Weitz said. If something syncs to your Windows Phone device (meaning the information is on the device), Cortana can use it, he added.

3.) Transparency & Customization. Transparency is one of the key hallmarks of Cortana, and an area Microsoft emphasized heavily. Executives seemed determined not to allow Cortana’s omniscience to be mistaken for creepy. If Cortana infers that a location is home, it will ask you to confirm it. Because the devices Microsoft made available weren’t meant for us to personalize, it’s difficult to say how far these confirmations reach, but Microsoft implied that all inferences require user acceptance.

What’s more, Cortana includes a “notebook” on the device user, essentially a collection of the things it learns, infers and tracks. This concept stemmed from Microsoft personnel going out and talking to real-life personal assistants and asking them what made them good at their jobs. One key finding: the assistants kept all sorts of information about their client in a notebook.

In Windows Phone 8.1, you can actually go into that notebook and edit or add information. This includes information about your interests, places of importance, music preferences, reminders, settings and even your “inner circle.”

Your Inner Circle are those people with whom you have some heightened relationship, say a close colleague, a sibling, a friend. This Inner Circle function pulls information from what’s on your phone, from the People app or Microsoft Lync or even from Facebook. You can go into the Inner Circle entry in the Notebook and assign relationships or nicknames (up to three), and you can even tell the phone’s “quiet hours” mode that some of those people are allowed in (this doesn’t extend to those you’ve pulled into the Inner Circle from Facebook, Weitz said).

Google Now has a similar concept to Cortana’s Notebook. Your personal settings, or what Google determines about your actions with the device and its services, are easily accessible. While you can make some alterations and additions, those are fairly limited compared to what we saw with Cortana. For example, in Google Now, there are two Places that matter: Home and Work. That’s it. In Cortana, you can add favorite places manually.
In Google Now you can add sports teams, and you can tell it your preferred mode of transportation, but you have to pick only one. You can give it stocks to follow, and what TV and video streaming service you prefer (Hulu, Amazon Prime, Netflix, etc.). There’s also a bucket for everything else, a little hodgepodge of what it infers your interests are, but you can’t manually add to that inferred list, as we’re promised you can do in Cortana.

4.) Self-tuning. There’s a great deal of behind-the-scenes work constantly going on in Cortana, especially getting it to understand a user’s intent and self-tuning based on user behavior. For instance, if you do a voice-based search and it provides the wrong results, which it did during some of our brief testing, it recognizes that it has made a mistake when you ask the question a second time. On the back end, the platform then adjusts.

Or when a query ends in a web results page, that might be a signal to Cortana that it has failed to provide a more precise result, and it learns and adjusts, listening more intently to the next query to see what you meant, assuming you ask the question in a slightly different way. Weitz said that the Cortana/Bing service combines a certain level of human, or manual, modeling when mistakes reach a particular threshold, in addition to the more automated, machine-based learning.

One example Weitz used to illustrate Cortana’s self-tuning nature was a request for the location of a good “BBQ joint,” which Cortana didn’t initially understand as a request for a restaurant. In quick time, based on his followup question, Cortana learned and added that term to its vernacular.

As a simple test, I asked Siri: “how chilly is it?” Siri gave me the weather. Google Now didn’t even understand the question. When I asked Siri and Google Now if I should wear a jacket, both services gave me the weather. In other words, all of these services are working on this at some level. What’s more, both Apple and Google have had years and hundreds of millions of customers using the services to help tune the services.

Apple has also fine-tuned its algorithms to understand regional accents, both in the U.S., and for a variety of other languages (and the subsequent dialects). That’s right: Siri can alter its language model based on whether you have a Boston or a Texas accent. You can also teach Siri how to pronounce names. And just hit the little “?” upon entering Siri mode and there’s a pretty incredible list and drill-down into the many things you can extract from Siri. Ask Siri what planes are above you, to find gas stations along your route, or to dictate a message.

One thing that would be useful for all of these services is the ability to watch the decay of your interest — say in basketball once March Madness ends — and start to move that out of your information stream. With Cortana and Google Now, you can manually tell the system you’re no longer interested in a topic.

5.) Cortana APIs. Microsoft is also providing Cortana APIs, so developers can give the digital assistant direct access to the databases and processes within an application. Weitz said that if the app’s web service can support a deep call, there’s an almost limitless amount of access Cortana can be granted. Cortana can check on your Facebook status, but there’s also no reason you couldn’t ask Cortana to search for a particular Facebook post.

There’s a new version of Skype in Windows Phone 8.1, and Cortana can access it if you ask it to “get me [person's name].” You can add content to a Hulu queue. Only this small handful of apps will be part of the Cortana rollout (those just mentioned, plus Flixster and Twitter), but other apps could easily start to take advantage of this as well with some simple API calls.

Google has not provided API access to Google Now.

Apple doesn’t provide APIs for Siri, either, but the list of applications and services that Siri supports is pretty robust, including Facebook, Twitter, OpenTable (Siri will use your credentials with the on-device app to make restaurant reservations), Fandango, MLB and Yahoo. Oh, and don’t forget Microsoft’s Bing.

[repost ]In Depth Intro To Microsoft’s Cortana: Interview With Savas Parastatidis


The dream of having a real virtual personal assistant is becoming more real each day. Before Microsoft’s Cortana, the two competitors have been Apple’s Siri and Google Now. Both are impressive in their own right but Cortana might just be ahead of the curve. It will be available on Windows Phone 8.1.

This interview is with Savas Parastatidis, from Bing, and took place on April 3, 2014 via Skype.

  • What is your title at Microsoft / Bing?

    Principal Architect in Bing, Application Services Group. I am the architect of the team that is responsible for the cloud platform enabling Cortana.

  • How many people are on that team?

    I am afraid I can’t disclose that information. However, I can say that the effort involved many teams across Bing, not just mine. Most of the Bing teams contributed in some way or another to the Cortana project. It was a hugely collaborative effort across teams, and across divisions in Microsoft.

  • How did Cortana get its name?

    Robert Howard, a Principal Program Manager in the Windows Phone team, made the recommendation to Marcus Ash (his manager) to use “Cortana” as an internal codename. That happened at the beginning of the project. Everyone loved the name.

  • It does have a nice sound to it!

    The engineering teams loved it! As you might already know, Cortana is the AI character in the Halo universe.

  • It sounds like Cortana will be a mixture of Google Now and Siri. Will this be the case?

    Cortana is a personal assistant that tries to learn the user’s interests and proactively offer information. Cortana also has personality. Our goal is to make users feel that they are interacting with their own personal assistant, who knows them, who provides information that is relevant within the right context. Cortana’s personality helps our users feel more comfortable.

  • How will you be able to activate Cortana?

    Cortana takes over the search button on Windows Phone. If you tap on the button, you launch Cortana’s home page where the proactive information, based on your interests, is displayed. If you press the search button and keep it pressed, you can speak directly. Even when in Cortana’s home, you can tap on the microphone button. We wanted users to feel that their search functionality wasn’t lost, that Cortana takes over and provides them with access to the world’s knowledge.

  • Can you explain some of the helpful ways Cortana will be in use?

    Sure. You should think of Cortana as your personal assistant. We actually talked to human personal assistants in order to understand how they interacted with those they support, to understand what it is they do for them. So you can ask Cortana to place a call, to send a text message, to open an application. Or to set a reminder… “Remind me to take out my garbage when I get home” or… “Remind me to wish happy birthday when I talk with Mary”. Cortana will do the right thing. Cortana learns about you but you are always in control of the information that she uses to offer you her services. Exactly like a personal assistant in real life would do. Cortana uses a Notebook that contains all the information that she knows about you. You can search for a flight and then ask Cortana to keep track of it. The flight will go to her notebook. Or, if you receive your flight’s details via email, Cortana will extract the information from the email and ask you if you want her to add it to her notebook. You are always in control of what Cortana adds to her notebook. You are always in control of whether Cortana should be checking your email or not.

  • Love the concept of Cortana’s notebook.

    Indeed… We believe it’s an experience with which our users will feel at ease. The team designed the experience because they truly believe that the users will understand it and will feel more comfortable about all the things that Cortana can do for them.

  • And for the questions it cannot answer, Cortana falls back to Bing, correct?

    That’s correct. There are some experiences like weather and places/locations with which you can have a very rich interaction/conversation with Cortana. You can ask something like “Should I take a scarf with me tomorrow?” Cortana will understand your question and respond appropriately. You can then follow up with another question such as “What about New York” or “How about the weekend?” Cortana can understand language and also conversational context. More and more of our knowledge domains will incorporate this capability. The more our users interact with Cortana, the more we learn about the language patterns and the types of questions they ask (anonymously). So that we can improve the features that we offer. We also have a category of answers that are not yet conversational but they are still very rich. You can ask questions such as “How did the Seattle Mariners do” to get the latest score from their game or “How many calories in a hamburger?” Or “What’s the tallest mountain in the world?” And so on. If everything else fails and we can’t give you a response, we will fall back to the wisdom of the web.

  • For the voice queries that Cortana can actually answer, what third party knowledge sources will be in use on Windows Phone 8.1? Are there any differences with which knowledge sources Cortana uses versus Siri?

    Bing is our big knowledge source. All queries go to Bing.

  • Really?

    Yup. However, Bing incorporates various third party sources. For example, if you ask for “Good restaurants around here”, the response will incorporate Yelp ratings in order to show you only those restaurants with 4 stars and above. If you then say, “Show me only the ones that accept reservations”, the conversation continues by reducing the result set for only those restaurants that can accept reservations. Bing conflates data from various data sources in order to enable this type of experience.
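
As a rough illustration of that narrowing, and not a description of Bing's actual implementation, the two turns could be modeled as successive filters over a result set. The `Restaurant` type, the sample data, and both function names are assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Restaurant:
    name: str
    stars: float          # hypothetical third-party rating
    reservations: bool    # whether the venue accepts reservations


# Made-up sample data; a real system would merge several data sources.
NEARBY = [
    Restaurant("Alder", 4.5, True),
    Restaurant("Birch", 4.0, False),
    Restaurant("Cedar", 3.5, True),
]


def good_restaurants(candidates, min_stars=4.0):
    """First turn: keep only places rated at or above the threshold."""
    return [r for r in candidates if r.stars >= min_stars]


def accepting_reservations(results):
    """Follow-up turn: narrow the previous result set, not the full list."""
    return [r for r in results if r.reservations]


first_turn = good_restaurants(NEARBY)
second_turn = accepting_reservations(first_turn)
print([r.name for r in first_turn])
print([r.name for r in second_turn])
```

The point of the sketch is that the follow-up operates on the output of the first turn, which is what lets the conversation continue instead of starting over.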

  • At the moment, Siri uses Bing as a backup, and you are saying Cortana uses Bing all the time, not just as a backup? That would be a major difference between the two.

    When we say that Bing powers Cortana, we do really mean that it is behind all the Cortana experiences. The user’s notebook is maintained in the cloud. Speech recognition is done both on the device (for the cases when there is no internet connectivity) and in the cloud (better models, better accuracy). Language and conversational understanding happens in the cloud. Same is the case for all knowledge queries. All the notification scenarios (e.g. tell me when my team scores or send me an alert if my flight is delayed) are driven by the cloud. “The cloud”, in all cases, is Bing.

  • After you make a voice query with Cortana, what happens behind the scenes? Can you explain step by step?

    1) The voice is streamed to the cloud.
    2) Our speech recognition servers host speech models which have been trained using Deep Neural Networks.
    3) The output of the speech recognition phase goes through our natural language understanding phase so that we can figure out the user’s intent (e.g. when the user asks if they need an umbrella, then we determine with some degree of confidence that they are interested in the weather).
    4) In the next step, we go through the conversational management engine that maintains context about the user’s conversations. This allows us to support turns in a conversation and carry the context and the user’s intent forward as we help them accomplish a task.
    5) We use the information from the user’s notebook to resolve gaps in our understanding (e.g. “home” becomes a geolocation).
    6) If the user’s intent is to find information, then we query our Bing knowledge sources.
    7) Finally, we generate language (a response) and then the rendering of the response (HTML) to send to the device.

    The above captures the flow for a query. When we monitor the world’s information streams for the things in which the user is interested (e.g. traffic to the location of the next appointment, flight status, etc.), we proactively send notifications.
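
The seven steps above can be sketched as a small pipeline. This is a toy model under heavy assumptions, not Bing's code: speech recognition is replaced by plain text cleanup, intent detection by keyword matching, and the `NOTEBOOK` and `FORECASTS` tables, along with every function name, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Conversation:
    """Context kept between turns (step 4)."""
    intent: Optional[str] = None
    slots: dict = field(default_factory=dict)


# Step 5: a made-up notebook that maps personal terms to concrete values.
NOTEBOOK = {"home": "Seattle"}

# Step 6: a toy knowledge source standing in for Bing.
FORECASTS = {"Seattle": "rain", "New York": "sun"}


def recognize_speech(audio: str) -> str:
    """Steps 1-2: stand-in for cloud speech recognition (input is already text)."""
    return audio.strip().rstrip("?.!").lower()


def understand(text: str, convo: Conversation) -> Conversation:
    """Steps 3-4: keyword-based intent detection that carries context forward."""
    if "umbrella" in text or "weather" in text:
        convo.intent = "weather"
    # A follow-up such as "what about new york" keeps the previous intent.
    for place in FORECASTS:
        if place.lower() in text:
            convo.slots["place"] = place
    if "home" in text:
        convo.slots["place"] = "home"
    return convo


def resolve(convo: Conversation) -> Conversation:
    """Step 5: replace notebook terms such as "home" with real values."""
    place = convo.slots.get("place", "home")
    convo.slots["place"] = NOTEBOOK.get(place, place)
    return convo


def respond(convo: Conversation) -> str:
    """Steps 6-7: query the knowledge source and generate a reply."""
    if convo.intent != "weather":
        return "Falling back to web search."
    place = convo.slots["place"]
    return f"Expect {FORECASTS[place]} in {place}."


def handle_query(audio: str, convo: Conversation) -> str:
    """Run one conversational turn through all stages."""
    text = recognize_speech(audio)
    return respond(resolve(understand(text, convo)))
```

Passing the same `Conversation` object into two consecutive calls, for example "Do I need an umbrella at home?" followed by "What about New York?", shows how the weather intent from the first turn carries into the second.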

  • Siri stores voice queries in audio files (I think somewhere in North Carolina). Some might have issues with this. Will Cortana save these queries in audio format? What security measures are in place?

    We take our users’ privacy extremely seriously. We store our users’ information in a highly secure store. Not even our own developers have open access to the data. The speech queries are completely anonymized. That’s the way the service can improve. However, there is no way to track a particular voice query to a user.

  • How do you anonymize the queries?

    The voice streams aren’t associated with any identity in the cloud. For more detailed information, I will have to connect you with a domain expert from that team.

  • Cortana was developed to make life easier. For that concept to really work, you would need to trust the assistant 100%. One could argue that the definitive search results could be paid for. And unlike Bing.com, there would only be room for one answer. What are your thoughts on the ethical challenges at hand and how does Bing plan to handle them?

    Excellent observation! I truly believe that personal assistants are the future. I believe (my personal belief) that they are going to replace the way in which we interact with technology, the way we consume information, and accomplish tasks. I would go even further and say that the application model will go away. If I can accomplish a task through my personal assistant who knows how to interact with a 3rd-party service, why would I need a separate app?

  • Exactly, it makes total sense!

    Or, if my personal assistant can interact with the installed app, it would still feel natural for me to go through my trusted assistant. For a personal assistant to become my gate to the information/knowledge/services world, he/she will have to gain my trust. I truly believe that any service that doesn’t try to truly help the user, that doesn’t have the user’s interests as a priority, will eventually become irrelevant. Personal assistants will further emphasize this point. It’s exactly as in real life. If your personal assistant starts giving you information that isn’t relevant to YOU or that doesn’t really help YOU in completing a task, they will be fired, no? I don’t think there is a difference here.

  • Will there be paid ads via Cortana?

    Cortana doesn’t display any ads. We are trying to do our best for the user. When the user asks a question, we truly try our best to give the best response. It’s only when we fall back to Bing that an ad might be shown. But falling back to Bing is considered a transition to a different experience. I believe that personal assistants will find other ways for monetization. We have ideas, but I can’t discuss them yet.

  • With Cortana, ads would make the user wonder if the info you were displaying was actually correct or just paid for.

    Yup. The moment the user starts second guessing the honesty of the personal assistant, any previously established trust is broken.

  • Because there are fewer choices…

    Exactly. It’s actually an observation we made internally. Since we can only give one answer, we really need to get it right. Instead of showing 10-20 blue links and letting the user pick, a personal assistant really needs to show just one answer to a direct question. It’s a very difficult problem to get right.

  • How will the relationship be with Apple after developing a competitor to Siri? Will it be a conflict of interest?

    I think our relationship with Apple is a solid one but I am not privy to the details of our agreement.

  • What is your favorite aspect of Cortana?

    It’s been a dream of mine for many years to build a personal assistant. I used to give talks about my research work on knowledge representation. I am dreaming of a world where human-computer interaction would be ubiquitous, technology would be hidden.

  • A device would appear to do nothing, but would actually be doing everything.

    I am excited that we have made the first step towards that vision. Cortana, for me, is just the beginning. She represents the first step towards a longer path. I am super excited about taking the next steps towards that path. Any device would act as a gateway to the cloud, the universe of services, and knowledge. Yes, devices will be doing much more than what they can do today. But they will be part of the internet of things. Together, devices and cloud services would really transform the way we interact with the world of knowledge and how we accomplish tasks. Personal Assistants would become our interface into that world. Yes, I am excited!

  • When will Cortana be available to the public?

    Those with a Windows Phone developer account will get the update on their existing devices by mid-April. Then, over the summer, new devices will start becoming available. Also, operators will be able to push the update to all existing Windows Phone 8 devices.



[repost ]The story of Cortana, Microsoft’s Siri killer


Technically, Cortana isn’t supposed to exist for at least another 500 years, but that’s not stopping Microsoft from bringing her to life this week. While Apple has Siri and Google has Google Now — both digital assistants that run on smartphones — Microsoft is taking an approach that mixes the best of the competition with its own unique take. Based on a 26th-century artificially intelligent character in the Halo video game series, Cortana will debut as part of Windows Phone 8.1, the next big update for Microsoft’s mobile operating system.

By learning your habits and interests continuously, Cortana is positioned as a personal digital assistant that helps you organize your day-to-day activities, alongside regular web searches for information. Cortana will act as the primary way to discover and search for information on Windows Phone 8.1, or just an assistant to manage your meetings, reminders, and daily life. She’s smart and witty, all while being designed to closely resemble a human assistant. With the competition already years ahead, Cortana arrives at a time when Microsoft is focused on catching up in mobile. Cortana is a significant new feature for Windows Phone 8.1, one that has been in development for more than two years. In many ways, Microsoft’s bold new mobile efforts rest on her virtual shoulders. This is the story of Cortana, and how she came to be.

The name started from a simple suggestion from Windows Phone program manager Robert Howard in an early planning meeting. “It was just a codename, it stuck,” explains Marcus Ash, group program manager of Windows Phone. “We didn’t intend for it to be the actual product name from the beginning.” The fact Cortana exists simply as Cortana, and not some marketing buzz like “Microsoft Personal Digital Assistant Home Premium” is surprising given Microsoft’s history of naming products. Up until a few weeks ago, it was hit and miss whether Cortana would be the final name. It could have been Naomi, Alyx, or a number of other suggestions, but leaks and a petition to use the Cortana name helped sway Microsoft’s decision.


The Cortana naming and background are linked directly to Halo, and mesh well with Microsoft’s main goal for the product: recreate a real personal assistant without being too creepy. Cortana was always there for Master Chief in the Halo games, and now she’s always there for you on your phone, but only if you want her to be. Rival services like Google Now dig deep into data from devices, and while that’s often useful it can also be irritating in the form of non-stop notifications, or just scary that the system knows so much about you. To avoid this, Microsoft spoke to a number of high-level personal assistants (yes, actual humans) and found one that kept a notebook with all the key information and interests of the person they had to look after.

That simple idea inspired Microsoft to create a virtual “Notebook” for Cortana which stores personal information and anything that’s approved for Cortana to see and use. It’s not a privacy control panel, per se, but a list of everything Cortana knows about you. “It’s her view of you, but clearly you can just snatch it from her at any time and say ‘That’s not right, I don’t want you to know this’ or ‘I’m not comfortable with you reading my email,’” explains Ash. “So you have complete control over what she knows and she’s transparent about it.” Entries in the Notebook are stored in the cloud, and you can share contact information with it, as well as your interests, home and work locations, and more. The notion of Cortana acting as a personal assistant with a notebook, as opposed to a creepy stalker, has been drilled into the team from the beginning, they say. She also operates and functions by learning your habits and interests from your phone use, location, and communications. You can speak to Cortana or just input text, but she’ll always ask you before she stores any information she finds in her Notebook.


When you first launch Cortana, she runs through basic questions to learn about you — your name, your food preferences, what category of movie you like, and so on. After that, when the service is activated with Windows Phone’s search button, you can swipe down to see a “proactive view” of information. It’s very similar to Google Now’s cards, with information on flights, sports results, stocks, and anything else Cortana has learned and jotted down in her Notebook. You can improve the Notebook by manually adding your personal interests, reminders, news, and other important data. It’s really a hub of information that turns into cards, and parts of it can be pinned as Live Tiles on the Start Screen or used to generate notifications in Windows Phone 8.1’s new Action Center, a notification hub similar to those found in iOS and Android. If, for example, your favorite football team just scored, Cortana can alert you. If you visit a foreign country you’ll be greeted with weather information, currency conversions, and maps. If you’re in a text or email message, Cortana will underline elements like “let’s meet at 8PM” to make it easy to set reminders or calendar appointments.


One of the most useful features of Cortana is its ability to trigger actions based on events, a little bit like the popular web service IFTTT. For instance, saying “Remind me the next time I call my wife that we need to talk about Kevin” will create a reminder that is triggered when you next go to call your wife or she calls you. It’s powerful, but Cortana even impresses during basic search queries. If you search for “What’s the best restaurant near me” you won’t get a big list of results like you do with Siri, you’ll get a single restaurant that’s rated the best in the area by Yelp users. “If you asked a real assistant, ‘What is the best restaurant’ and she held a page up to you, you would fire her and try to find another one,” jokes Rob Chambers, principal group program manager of Bing. The difference is that if you had said “the best restaurants,” plural, then you’d get a list thanks to Cortana’s understanding of the voice queries and their context. The truly impressive moment is when you’re simply able to say “call it” after asking for the best restaurant, or “give me directions” and Cortana understands you mean directions to the restaurant in your previous part of the query. It’s true multistep search, a way to layer query upon query to accomplish complex tasks by voice alone. It feels like the future.

In Windows Phone 8.1, Cortana appears as little more than an animated circle, but that doesn’t mean she doesn’t have a personality. Like any good assistant, and Apple’s Siri, Cortana’s personality shines through in daily use. Ask her, “Who’s your father?” and Cortana will reply, “Technically speaking, that’d be Bill Gates. No big deal.” Other queries produce witty responses, and some answers make the circular character spring to life and animate with one of 16 emotions. Cortana won’t always respond with emotion and animations, but Microsoft envisions a future where she reacts visually to sports scores or other events — any good assistant knows to be pumped when a football team wins and furious when they lose, and so does Cortana now. “There’s just more stuff we’re gonna be able to do with the shape as we progress along this journey,” explains Ash.

Microsoft has also worked closely with Halo developer 343 Industries on the eyelike visual elements and voice actress Jen Taylor for the sound of Cortana. Taylor is the voice behind Princess Peach, Toad, and Toadette in various Mario games, but she’s best known for her role as Cortana in the Halo series. For Halo fans, and there are a lot of them, having Jen Taylor as the voice of Cortana in Windows Phone is a big deal, and for Microsoft it’s equally significant. “She’s gonna play a pretty big part in how we roll this out and how we evolve this speech technology,” explains Ash. Initially, Taylor will be used primarily for what Microsoft calls “chit chat” responses, queries where the company can use original audio. If you ask “What’s up with Master Chief,” or anything related to the Covenant, then you’ll get a Taylor response. Other interactions, meanwhile, use a synthesized voice that’s similar to Taylor’s. (If you want some more Halo-related fun with Cortana, you can just set your nickname as “Master Chief” in the settings.)


Microsoft didn’t just magically build a digital assistant in two years — the company is leveraging investments in data gathering that it has been making for half a decade. Cortana relies on Bing’s backend services for the majority of its features, and that’s backed up by thousands of servers crunching data in the background. Microsoft’s Windows Phone team worked closely with engineers from Bing to bring Cortana to life. Just as Google Now is indelibly connected with its namesake search engine, it’s impossible to imagine Cortana would ever exist without Bing.


I met with a number of the Bing architects behind Cortana, and it’s clear they’re excited to see their work represented in a single product. While Microsoft is transforming Bing into a platform and service, it’s typically viewed as just another search engine; Cortana, on the other hand, is a true showcase. Over the past several years, Microsoft’s Bing engineers have been working on several services that play a crucial part in Cortana. Foundational technologies like natural language processing and the linking of real-world objects to web data are key, but they were built without a specific end product in mind. If Bing is the house, Cortana is the shiny red sports car in the garage.

To glue all these bits of Bing together, Microsoft’s Mike Calcagno, partner development manager at Bing, joined the search side of Microsoft 18 months ago, and his first big project at Bing was Cortana. “My assessment of the approach when I got here though is … someone needs to actually pull these services together in a way that’s coherent,” he says. “Everybody who was working on it had a little Cortana doll, and we all put them in our office and when you walked around you saw like … ‘Oh … he’s in, he has the Cortana doll.’”

This bond carried on throughout the Cortana project. The Bing team spent so much personal time with the Windows Phone group that Cortana eventually thought Calcagno’s work address was a local bar in Bellevue, Washington. “We just really got along with that team. We lived the project with those guys and really worked as one team, and what came out of it really is this first version of Cortana.” You could call it a good example of the “One Microsoft” philosophy that former CEO Steve Ballmer unveiled shortly before his departure, where teams work closely together instead of competing internally.


Cortana is the first big test of a giant mind-meld between Bing services, and it’s one reason Microsoft is placing a beta tag on the feature initially. The system needs to learn and improve over time, especially on the voice recognition side, and Microsoft is only launching it in the US for now. In the days leading up to its unveiling, the team was still fixing bugs in the background. Bing principal program manager Vish Vadlamani recalls spending countless days working from 7AM until 11:30PM on Satori — a self-learning system that voraciously chews through thousands of gigabytes of content for Bing’s indexes every day — and he’s hopeful the work is really going to pay off with Cortana’s launch. “It’s in some ways exhilarating, and some ways scary,” he admits.

“The vision behind what we’re doing here is that this intelligence can expand beyond Windows Phone,” explains Bing director Stefan Weitz. But where exactly Microsoft will take Cortana in the future is still largely a mystery. Third-party apps will be able to integrate with the service, allowing users to simply say, “Hulu, show me the latest episode of Modern Family” and the app will launch with the latest episode, rather like the way voice search works on Xbox. Combined with the reminders, it’s an example of how useful and powerful speech is when it’s done right.

Microsoft has seen what Apple and Google have done, wrapping some of the best ideas from Siri and Google Now into one attractive, easy-to-use interface — but now, the real trick will be to leverage Xbox, Windows, and Microsoft’s other products to get Cortana everywhere. The Bing home page will be updated in the coming weeks with notifications and information displayed in Live Tiles, personalized for each user, perhaps a small sign of things to come. The company has an always-on microphone in millions of houses through Kinect, hundreds of millions of computers running Windows, and a healthy new attitude toward iOS. For now, Cortana lives in your pocket, but her voice might soon be everywhere.

[repost ]The story behind the birth of Cortana: Microsoft’s Siri killer



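Microsoft released Windows Phone 8.1 on the first day of its Build developer conference, and the long-rumored voice assistant Cortana made its debut alongside it. Foreign media today published an article telling the story behind the birth of this Microsoft Siri killer, covering the origin of its name, its personalized virtual “notebook” feature, its signature features, and more.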




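Apple and Google have both already launched voice assistants on their smartphone platforms, Siri and Google Now, and Microsoft’s new Cortana blends the strengths of the two while adding touches of its own. Cortana is based on a 26th-century artificial intelligence character from the Halo game series, and will ship with Windows Phone 8.1, a major new update to Microsoft’s mobile operating system.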

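Cortana is positioned as a personalized digital assistant that continually learns your habits and interests, helps you manage your daily activities, and supports ordinary web searches. It will be the main way to discover and search for information on Windows Phone 8.1, as well as an assistant for managing your meetings, notifications, and daily life.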

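It is smart and witty, and designed to act much as a human assistant would. Competitors have a head start, and Cortana arrives just as Microsoft is focused on catching up in mobile. Cortana is one of the major new features of Windows Phone 8.1 and was more than two years in development. Here is the story behind the product.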


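Its name came from a simple suggestion made by Windows Phone program manager Robert Howard in an early planning meeting. “At the time it was just the project codename,” explains Marcus Ash, group program manager for Windows Phone. “We didn’t initially plan to use it as the actual product name.” Given Microsoft’s history of naming products, it is rather surprising that its voice assistant is called Cortana rather than a marketing phrase like “Microsoft Personal Digital Assistant Home Premium Edition.” Until a few weeks ago the product’s name had still not been settled. It could have been named Naomi, Alyx, or one of the other names proposed, but press leaks and an internal petition to adopt Cortana swayed Microsoft’s final decision.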

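Cortana’s name and backstory tie directly to Halo, and they fit well with Microsoft’s main goal for the product: to create a personal assistant that is less creepy. In the Halo games Cortana is always on call for Master Chief, and now it is on call for you on your phone. Competing services such as Google Now mine data from the device deeply, which is useful but can also trigger endless, annoying notifications, and the system’s deep knowledge of the user can feel creepy. To avoid this, Microsoft spoke with a number of high-level personal assistants (yes, real human ones) and found that one of them carried a notebook everywhere, recording all the important information about, and interests of, the person in their care.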






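The first time you launch Cortana, it asks a few basic questions to get to know you: your name, the foods you like, the kinds of movies you like, and so on. After activating the service with the Windows Phone search button, you can pull down the screen to see at-a-glance information, much as in Google Now, covering flight information, sports scores, and other things Cortana has learned and recorded in the Notebook.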

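You can improve the Notebook by manually adding your own interests, reminders, messages, and other important data. It is an information hub that can be turned into cards, whose components can become Live Tiles on the Start screen or be used to generate notifications in Windows Phone 8.1’s new Action Center (similar to the notification centers on iOS and Android).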






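In the Windows Phone 8.1 interface Cortana is just a circular icon, but that doesn’t mean it lacks personality. Like Apple’s Siri and other capable voice assistants, Cortana shows personality in everyday use. Ask it “Who is your father?” and Cortana replies, “Technically speaking, Bill Gates. No big deal.” It gives witty answers to other queries too, and sometimes when it answers, its circular icon comes to life and takes on an expression (there are 16 in all). Cortana doesn’t always show expressions and animations, but Microsoft envisions a future in which it can visually reflect sports scores or other events: looking excited when your favorite football team wins and angry when it loses.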

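Microsoft also worked closely with Halo developer 343 Industries on the visual elements and on Cortana’s voice. Actress Jen Taylor has voiced several characters in the Mario games, but her best-known role is Cortana in the Halo series. For Halo fans, having Taylor voice Cortana on Windows Phone matters a great deal, and it matters just as much to Microsoft. “She will play a very important role in the delivery of Cortana and in the evolution of this speech technology.”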

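At first, Taylor’s voice will mainly be used for what Microsoft calls “chit-chat” responses. If you ask “What happened to Master Chief?” or anything else related to Cortana, you’ll get a reply from Taylor. Other interactions, meanwhile, use a synthesized voice that resembles Taylor’s.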


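Microsoft has in fact spent more than just two years on its digital assistant: it draws on five years of data gathering. Most of Cortana’s features rely on Bing’s back-end services, and Bing in turn is backed by thousands of servers mining data behind the scenes. In developing Cortana, Microsoft’s Windows Phone team worked closely with Bing’s engineers. Just as Google Now depends on Google’s search engine, it is hard to imagine Cortana existing without Bing.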


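Cortana is arguably a prime example of the “One Microsoft” philosophy that former CEO Steve Ballmer set out shortly before his departure, reflecting close collaboration between the company’s internal teams rather than competition among them.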




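“The vision behind what we’re doing here is that this intelligence can expand beyond Windows Phone,” explains Bing director Stefan Weitz. But where Microsoft will take Cortana in the future remains unknown. Microsoft executives say third-party apps will be able to integrate with the service, for example letting users simply say “Hulu, show me the latest episode of Modern Family” for the Hulu app to appear, much as voice search works on Xbox today. Together with reminders, this shows how practical and powerful good speech technology can be.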

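Microsoft has blended some of the strengths of Siri and Google Now to give Cortana an attractive, easy-to-use interface; the key now will be to use Xbox, Windows, and Microsoft’s other products to put Cortana everywhere. The Bing home page will be updated in a few weeks, with notifications and information presented as Live Tiles personalized for each user, perhaps a small sign of Microsoft’s future moves. Microsoft has reached millions of homes through Kinect, Windows runs on countless computers, and its attitude toward iOS has recently shifted. For now Cortana lives only in the phone, but before long its voice may be heard everywhere.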



[repost ]Anticipating More from Cortana


Most of us can only dream of having the perfect personal assistant, one who is always there when needed, anticipating our every request and unobtrusively organizing our lives. Cortana, the new digital personal assistant powered by Bing that comes with Windows Phone 8.1, brings users closer to that dream.

For Larry Heck, a distinguished engineer in Microsoft Research, this first release offers a taste of what he has in mind. Over time, Heck wants Cortana to interact in an increasingly anticipatory, natural manner.

Cortana already offers some of this behavior. Rather than just performing voice-activated commands, Cortana continually learns about its user and becomes increasingly personalized, with the goal of proactively carrying out the right tasks at the right time. If its user asks about outside temperatures every afternoon before leaving the office, Cortana will learn to offer that information without being asked.

Furthermore, if given permission to access phone data, Cortana can read calendars, contacts, and email to improve its knowledge of context and connections. Heck, who plays classical trumpet in a local orchestra, might receive a calendar update about a change in rehearsal time. Cortana would let him know about the change and alert him if the new time conflicts with another appointment.
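The rehearsal scenario comes down to an interval-overlap check on calendar entries. The Python sketch below is only an illustration of that idea, not Cortana's actual implementation: the `Appointment` class, the `find_conflicts` helper, and the sample dates are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Appointment:
    """A calendar entry. The fields are illustrative, not Cortana's schema."""
    title: str
    start: datetime
    end: datetime


def overlaps(a: Appointment, b: Appointment) -> bool:
    """Two appointments conflict when each one starts before the other ends."""
    return a.start < b.end and b.start < a.end


def find_conflicts(updated: Appointment, others: list[Appointment]) -> list[Appointment]:
    """Return every other appointment that overlaps the updated one."""
    return [appt for appt in others if overlaps(updated, appt)]


# Hypothetical calendar: the other appointments already booked that day.
others = [
    Appointment("Dentist", datetime(2014, 4, 17, 9, 0), datetime(2014, 4, 17, 10, 0)),
    Appointment("Team dinner", datetime(2014, 4, 17, 17, 30), datetime(2014, 4, 17, 19, 0)),
]

# Hypothetical update: rehearsal moves from 19:00 to 18:00.
rehearsal = Appointment("Rehearsal", datetime(2014, 4, 17, 18, 0), datetime(2014, 4, 17, 20, 0))

conflicts = find_conflicts(rehearsal, others)
for appt in conflicts:
    print(f"New rehearsal time conflicts with: {appt.title}")
```

The strict comparisons mean that back-to-back appointments, where one ends exactly when the next begins, are not reported as conflicts.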

Research Depth and Breadth an Advantage

While many people would categorize such logical associations and humanlike behaviors under the term “artificial intelligence” (AI), Heck points to the diversity of research areas that have contributed to Cortana’s underlying technologies. He views Cortana as a specific expression of Microsoft Research’s work on different areas of personal-assistant technology.

“The base technologies for a virtual personal assistant include speech recognition, semantic/natural language processing, dialogue modeling between human and machines, and spoken-language generation,” he says. “Each area has in it a number of research problems that Microsoft Research has addressed over the years. In fact, we’ve pioneered efforts in each of those areas.”

The Cortana user interface.

Cortana’s design philosophy is therefore grounded in state-of-the-art machine-learning and data-mining algorithms. Furthermore, both developers and researchers are able to use Microsoft’s broad assets across commercial and enterprise products, including strong ties to Bing web search and Microsoft speech algorithms and data.

If Heck has set the bar high for Cortana’s future, it’s because of the deep, varied expertise within Microsoft Research.

“Microsoft Research has a long and broad history in AI,” he says. “There are leading scientists and pioneers in the AI field who work here. The underlying vision for this work and where it can go was derived from Eric Horvitz’s work on conversational interactions and understanding, which go as far back as the early ’90s. Speech and natural language processing are research areas of long standing, and so is machine learning. Plus, Microsoft Research is a leader in deep-learning and deep-neural-network research.”

From Foundational Technology to Overall Experience

In 2009, Heck started what was then called the conversational-understanding (CU) personal-assistant effort at Microsoft.

“I was in the Bing research-and-development team reporting to Satya Nadella,” Heck says, “working on a technology vision for virtual personal assistants. Steve Ballmer had recently tapped Zig Serafin to unify Microsoft’s various speech efforts across the company, and Zig reached out to me to join the team as chief scientist. In this role and working with Zig, we began to detail out a plan to build what is now called Cortana.”

Researchers who worked on the Cortana product (from left): top row, Malcolm Slaney, Lisa Stifelman, and Larry Heck; bottom row, Gokhan Tur, Dilek Hakkani-Tür, and Andreas Stolcke.

Heck and Serafin established the vision, mission, and long-range plan for Microsoft’s digital-personal-assistant technology, based on scaling conversations to the breadth of the web, and they built a team with the expertise to create the initial prototypes for Cortana. As the effort got off the ground, Heck’s team hired and trained several Ph.D.-level engineers for the product team to develop the work.

“Because the combination of search and speech skills is unique,” Heck says, “we needed to make sure that Microsoft had the right people with the right combination of skills to deliver, and we hired the best to do it.”

After the team was in place, Heck and his colleagues joined Microsoft Research to continue to think long-term, working on next-generation personal-assistant technology.

Some of the key researchers in these early efforts included Microsoft Research senior researchers Dilek Hakkani-Tür and Gokhan Tur, and principal researcher Andreas Stolcke. Other early members of Heck’s team included principal research software developer Madhu Chinthakunta, and principal user-experience designer Lisa Stifelman.

“We started out working on the low-level, foundational technology,” Heck recalls. “Then, near the end of the project, our team was doing high-level, all-encompassing usability studies that provided guidance to the product group. It was kind of like climbing up to the crow’s nest of a ship to look over the entire experience.

“Research manager Geoff Zweig led usability studies in Microsoft Research. He brought people in, had them try out the prototype, and just let them go at it. Then we would learn from that. Microsoft Research was in a good position to study usability, because we understood the base technology as well as the long-term vision and how things should work.”

The Long-Term View

Heck has been integral to Cortana since its inception, but even before coming to Microsoft in 2009, he had already contributed to early research on CU personal assistants. His tenure at SRI International in the 1990s included some of the earliest work on deep-learning and deep-neural-network technology.

Heck was also part of an SRI team whose efforts laid the groundwork for the CALO AI project funded by the U.S. government’s Defense Advanced Research Projects Agency. The project aimed to build a new generation of cognitive assistants that could learn from experience and reason intelligently under ambiguous circumstances. Later roles at Nuance Communications and Yahoo! added expertise in research areas vital to making Cortana robust.

The notebook menu for Cortana.

Not surprisingly, Heck’s perspectives extend to a distant horizon.

“I believe the personal-assistant technology that’s out there right now is comparable to the early days of search,” he says, “in the sense that we still need to grow the breadth of domains that digital personal assistants can cover. In the mid-’90s, before search, there was the Yahoo! directory. It organized information, it was popular, but as the web grew, the directory model became unwieldy. That’s where search came in, and now you can search for anything that’s on the web.”

He sees personal-assistant technology traveling along a similar trajectory. Current implementations target the most common functions, such as reminders and calendars, but as technology matures, the personal assistant has to extend to other domains so that users can get any information and conduct any transaction anytime and anywhere.

“Microsoft has intentionally built Cortana to scale out to all the different domains,” Heck says. “Having a long-term vision means we have a long-term architecture. The goal is to support all types of human interaction—whether it’s speech, text, or gestures—across domains of information and function and make it as easy as a natural conversation.”
