Platforms
Typical legacy IVR and speech recognition platforms are proprietary, closed and tightly integrated systems, operated by scarce experts. They are expensive to maintain, and they tie the customer to the vendor through a lack of interoperability with other systems and high switching costs. The advantage of having a single point of contact for support issues does not outweigh the cost of vendor lock-in.
Since 1999, the advent of the W3C-backed VoiceXML standard has led to a proliferation of VoiceXML browsers, platforms, and development tools. The VoiceXML Forum, which promotes the language, has more than 150 member organizations. Thousands of VoiceXML developers have built tens of thousands of speech-enabled phone applications worldwide, most of them in English-speaking countries. As of June 2007, the VoiceXML standard is at version 2.1, with version 3.0 in preparation. Dozens of VoiceXML platforms are on the market.
Integration with speech recognisers is nowadays performed via a standard interface, MRCP (Media Resource Control Protocol). Unlike proprietary connectors, this standardized interface makes it easy for voice platforms to support new releases of speech recognition software from different speech vendors.
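To make the idea of a standard interface concrete, the sketch below assembles a recognition request in the text format of MRCP version 2 (specified in RFC 6787). It is a minimal illustration, not a client: it assumes the control session has already been set up (MRCPv2 relies on SIP and SDP for that, which is not shown), and the channel identifier, request ID and grammar body are made-up example values. The one subtle point it shows is that the message-length field in the start line counts the whole message, including itself, so the length has to be recomputed until its digit count stops changing.

```python
from typing import Dict, Optional

CRLF = "\r\n"


def build_mrcp_request(method: str, request_id: int, channel_id: str,
                       headers: Optional[Dict[str, str]] = None,
                       body: str = "") -> str:
    """Assemble an MRCPv2 request message (start line, headers, optional body)."""
    # Channel-Identifier is sent on every request; its value comes from session setup.
    header_lines = [f"Channel-Identifier:{channel_id}"]
    for name, value in (headers or {}).items():
        header_lines.append(f"{name}:{value}")
    body_bytes = body.encode("utf-8")
    if body_bytes:
        header_lines.append(f"Content-Length:{len(body_bytes)}")
    # The header block ends with an empty line, then the body follows.
    rest = "".join(line + CRLF for line in header_lines) + CRLF + body
    rest_len = len(rest.encode("utf-8"))

    # message-length counts the whole message, including the start line itself,
    # so recompute until adding the length's own digits no longer changes it.
    length = 0
    while True:
        start_line = f"MRCP/2.0 {length} {method} {request_id}{CRLF}"
        total = len(start_line.encode("ascii")) + rest_len
        if total == length:
            return start_line + rest
        length = total
```

A call such as `build_mrcp_request("RECOGNIZE", 543257, "32AECB23433801@speechrecog", {"Content-Type": "application/srgs+xml"}, grammar_xml)` would produce a message whose first line carries the exact octet count of the whole request.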
The Contact Centre industry has evolved considerably over the last ten years. The latest step is to offer video as an interaction channel to customers, in order to personalize the interaction and improve the quality of call resolution. Most vendors of speech platforms have already included video capabilities in their products, such as video-on-hold and video parking: the capability to play a user-defined video file when the customer is put on hold or the call is parked. A video self-service platform typically offers a choice of supported codecs (H.261, H.263, and H.264) and file formats (WMV and MPG).
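Which codec is actually used for a given video call is decided per call: in SIP-based video calls, for example, the caller's terminal offers a list of codecs and the platform picks one it supports. A minimal sketch of that selection step, assuming a hypothetical platform that supports the three codecs named above and simply honours the caller's order of preference. A real platform negotiates through SDP and also has to match codec parameters such as profiles and resolutions.

```python
from typing import List, Optional

# Hypothetical capability list of the platform (the codecs named in the text).
PLATFORM_VIDEO_CODECS = ("H.261", "H.263", "H.264")


def choose_video_codec(caller_offer: List[str],
                       supported=PLATFORM_VIDEO_CODECS) -> Optional[str]:
    """Return the first codec in the caller's offer that the platform supports,
    or None if there is no common video codec (e.g. fall back to audio only)."""
    supported_upper = {codec.upper() for codec in supported}
    for codec in caller_offer:
        if codec.upper() in supported_upper:
            return codec
    return None
```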
Speech Engines
Text-To-Speech (TTS) or speech synthesis systems have substantially gained in quality over the last few years, to the extent that in some application settings they are indistinguishable from a human reader. The technology is particularly useful in a dynamic environment, where the information to be presented to the caller is inherently unpredictable.
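As an illustration of such dynamic content, the sketch below renders back-end data into a prompt in SSML 1.0, the W3C markup that VoiceXML uses for speech output and that TTS engines commonly accept. The flight-status scenario and the function name are hypothetical. The point it makes is that values coming from a database must be XML-escaped before they are spliced into the markup, otherwise a character such as `&` would break the document sent to the synthesizer.

```python
from xml.sax.saxutils import escape

SSML_NS = "http://www.w3.org/2001/10/synthesis"


def flight_status_prompt(flight: str, status: str, gate: str) -> str:
    """Render back-end values into an SSML 1.0 document for a TTS engine."""
    # escape() turns &, < and > into entities so arbitrary data cannot break the markup.
    return (
        f'<speak version="1.0" xmlns="{SSML_NS}" xml:lang="en-US">'
        f"Flight {escape(flight)} is {escape(status)}."
        '<break time="300ms"/>'
        f"Please proceed to gate {escape(gate)}."
        "</speak>"
    )
```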
Thanks to this improvement in quality, TTS users increasingly want an exclusive voice, which makes it possible to personalize their services. Personalization can rely not only on an exclusive voice but also on additional functions, such as background music integrated in the TTS output.
Systems for speaker-independent Automatic Speech Recognition (ASR) have also improved over the last few years, but still require a fair amount of tuning for optimal performance and accuracy. This is especially true in a mobile setting, in cars, or in other noisy environments.
State-of-the-art ASR systems are capable of recognizing utterances from grammars containing tens of thousands of entries. Nevertheless, the inherently probabilistic nature of ASR systems necessitates well-designed error recovery strategies at the application level. This is where Voice User Interface Design comes into play.
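One common recovery strategy can be sketched with confidence thresholds: ASR engines typically return a confidence score with each result (VoiceXML exposes it through a field's shadow variable, `fieldname$.confidence`), and the dialogue accepts, confirms, or re-prompts depending on that score, escalating to an agent after repeated failures. The thresholds and retry limit below are hypothetical placeholders; in practice they come out of the tuning work mentioned above.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical values; real ones are set during tuning on recorded calls.
ACCEPT_THRESHOLD = 0.80
CONFIRM_THRESHOLD = 0.45
MAX_RETRIES = 2


@dataclass
class RecognitionResult:
    utterance: Optional[str]  # None stands for a no-match or no-input event
    confidence: float


def next_action(result: RecognitionResult, retries: int) -> str:
    """Decide how the dialogue continues after one recognition attempt."""
    if result.utterance is None or result.confidence < CONFIRM_THRESHOLD:
        # Too unreliable to use: ask again, or hand over after repeated failures.
        return "transfer_to_agent" if retries >= MAX_RETRIES else "reprompt"
    if result.confidence < ACCEPT_THRESHOLD:
        # Plausible but uncertain: ask the caller to confirm ("Did you say ...?").
        return "confirm"
    return "accept"
```

Keeping the policy in one small function makes it easy to adjust the thresholds per dialogue state once real call data is available.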
Systems for Speaker Verification (SV) have been deployed to recognize and authenticate callers by their voice. For added security, speaker verification can be combined with other, more classic methods like passwords, pass-phrases or PIN codes.
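A minimal sketch of such a combination, assuming a hypothetical verification engine that returns a similarity score between 0 and 1 for the caller's voice: access is granted only if the PIN is correct and the voice score is high enough, with a middle band routed to a human agent. Both thresholds are invented for illustration. The PIN is stored as a salted PBKDF2 hash and compared in constant time, using only Python's standard `hashlib` and `hmac` modules.

```python
import hashlib
import hmac

# Hypothetical thresholds on a verification score between 0 and 1.
SV_ACCEPT = 0.90
SV_REVIEW = 0.60
PBKDF2_ITERATIONS = 100_000


def hash_pin(pin: str, salt: bytes) -> bytes:
    """Derive a salted hash so the PIN itself never has to be stored."""
    return hashlib.pbkdf2_hmac("sha256", pin.encode("utf-8"), salt, PBKDF2_ITERATIONS)


def verify_pin(pin: str, stored_hash: bytes, salt: bytes) -> bool:
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(hash_pin(pin, salt), stored_hash)


def authenticate(sv_score: float, pin_ok: bool) -> str:
    """Combine the voice score with the PIN check (both factors are required)."""
    if not pin_ok:
        return "reject"
    if sv_score >= SV_ACCEPT:
        return "accept"
    if sv_score >= SV_REVIEW:
        return "escalate"  # borderline voice match: let a human agent verify
    return "reject"
```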
Tools for Application Development
The simplest tool for developing a VoiceXML application is a text editor. More than a decade ago, the first websites consisting of static HTML pages were also developed this way.
To make their lives somewhat easier, developers have written libraries or plug-ins for VoiceXML code generation in various programming languages and integrated development environments. Although this kind of voice application development still requires a programmer familiar with the programming language at hand, it is already a step forward.
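As an example of what such a library does, the sketch below generates a VoiceXML 2.1 `<menu>` with Python's standard `xml.etree.ElementTree` module instead of writing the markup by hand. The function `build_menu`, the menu content and the target documents are hypothetical. The text of each `<choice>` element is the phrase the caller can say, and its `next` attribute names the document to fetch.

```python
import xml.etree.ElementTree as ET
from typing import Dict

VXML_NS = "http://www.w3.org/2001/vxml"


def build_menu(prompt_text: str, choices: Dict[str, str]) -> str:
    """Generate a VoiceXML 2.1 document containing a single <menu>.

    `choices` maps the phrase a caller may say to the URI of the next document.
    """
    vxml = ET.Element("vxml", {"version": "2.1", "xmlns": VXML_NS})
    menu = ET.SubElement(vxml, "menu", {"id": "main"})
    ET.SubElement(menu, "prompt").text = prompt_text
    for phrase, next_uri in choices.items():
        ET.SubElement(menu, "choice", {"next": next_uri}).text = phrase
    # ElementTree escapes special characters in text and attributes for us.
    return ET.tostring(vxml, encoding="unicode")
```

For instance, `build_menu("Say balance or transfers.", {"balance": "balance.vxml", "transfers": "transfers.vxml"})` returns a complete document that a VoiceXML browser could fetch.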
The most advanced systems for voice application design, development and management are fully graphical, and often web-based. They no longer require programmers to design and develop the front-end of a speech-enabled phone application, so more time can be spent on the Voice User Interface (VUI), which is of utmost importance. Voice application management functionality includes operational statistics, archiving, live monitoring and day-to-day operation.
Integration
One of the big advantages of the VoiceXML standard is that the speech application (and its development) has become totally independent of the underlying hardware. State-of-the-art generic VoiceXML platforms support a wide variety of telephony boards and VoIP connections. Just as HTML pages can be viewed in different Web browsers (such as MS Internet Explorer or Firefox), VoiceXML applications can in principle run on any VoiceXML platform, although platforms sometimes have their own extensions or peculiarities, just as MS IE and Netscape do.
A weaker point of these generic VoiceXML platforms, however, is their limited integration with existing call centre software. This is where proprietary extensions to the platform can make a difference. The major call centre software providers extend their VoiceXML platforms with a software layer that integrates seamlessly with their existing CTI technologies, e.g. for assisted service. When necessary, operators can follow up on a speech-initiated call without difficulty. This way, speech technology and human-assisted service work in harmony to provide a continuous caller experience.
Internally and externally, the speech platform typically communicates over the Hypertext Transfer Protocol (HTTP), the lingua franca of the Internet. Using Internet protocols and technologies in a speech application setting makes it easy to integrate the new speech-enabled phone channel with existing back-ends that are already based on Internet technology. Existing business logic can be reused, which protects past investments in technology and human resources.
For example, many Contact Centres already have a CRM system in place, which is accessed via a web interface. They can now connect an automated voice channel directly to the existing CRM back-end, with or without additional human intervention, as desired. Issues arising in the voice channel can be trapped and escalated to a human operator.
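A minimal sketch of this pattern, assuming the VoiceXML browser fetches its next document over HTTP and passes the caller's number in a hypothetical `ani` query parameter. A standard-library WSGI application looks the caller up in a stand-in for the CRM (a plain dictionary here) and returns VoiceXML: a personalized prompt for a known customer, or a `<transfer>` to a hypothetical agent number when the caller is unknown, which is one simple way of escalating to a human operator.

```python
from urllib.parse import parse_qs
from xml.sax.saxutils import escape, quoteattr

# Hypothetical stand-in for the existing CRM back-end, keyed by caller number.
CRM = {"+3225551234": {"name": "Ms Janssens", "open_tickets": 2}}
AGENT_NUMBER = "tel:+3225550000"  # hypothetical agent queue number
VXML_HEAD = ('<?xml version="1.0" encoding="UTF-8"?>'
             '<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">')


def vxml_app(environ, start_response):
    """WSGI application returning the next VoiceXML document for a caller."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    caller = params.get("ani", [""])[0]
    customer = CRM.get(caller)
    if customer is None:
        # Unknown caller: escalate the call to a human operator.
        form = (f'<form><transfer name="to_agent" dest={quoteattr(AGENT_NUMBER)}'
                ' bridge="true"/></form>')
    else:
        greeting = (f"Welcome back, {escape(customer['name'])}. "
                    f"You have {customer['open_tickets']} open tickets.")
        form = f"<form><block><prompt>{greeting}</prompt></block></form>"
    body = (VXML_HEAD + form + "</vxml>").encode("utf-8")
    start_response("200 OK", [("Content-Type", "application/voicexml+xml"),
                              ("Content-Length", str(len(body)))])
    return [body]
```

For a local test, the application can be served with `wsgiref.simple_server.make_server("", 8000, vxml_app).serve_forever()`. Note that a `+` in a query string must be sent as `%2B`, since `parse_qs` decodes a bare `+` as a space.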
Operation Monitoring Tools
As speech-enabled IVRs provide more services than traditional IVRs, they have a larger impact on the enterprise's business operations. Business users therefore need accurate monitoring and reporting information to continuously improve customer satisfaction, based on a detailed understanding and analysis of the way customers use the speech applications.
Typically requested information includes:
• System monitoring: a tool for technical users that continuously monitors the hardware and software resources involved in the IVR system, provides a user interface showing the status of all components, and generates alarms in case of incidents.
• Application monitoring: a tool for operations managers that tracks calls in real time, providing information on the handled calls such as volume, status and breakdown by category, so that managers can take appropriate operational decisions.
• Historical call logging: a tool for quality managers that tracks all call sessions, with a detailed Call Data Record (CDR) for each call, including the list of all time-stamped events (such as the path followed through the IVR script, the messages played, and the caller's DTMF and speech recognition entries with their recorded confidence levels).
• Queries and statistical reports: a tool for business users that generates configurable statistical reports, automatically or on demand. Statistical reports present call statistics over periods of time, such as call origin, duration, requested services, and abandon and error rates. They make it possible to track the evolution of customer behaviour over time. The IVR statistical reports often need to be consolidated with the overall contact centre reports to provide accurate business intelligence analysis.
These tools are typically included in the Voice Application Management Systems. The user interfaces need to be web-based when the monitoring tools are hosted by a service operator.
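The historical logging and reporting tools described above can be sketched with a simple record type and an aggregation function. The field names below are assumptions chosen to mirror the items listed (origin, duration, requested service, abandon and error flags, time-stamped events); a production CDR would carry many more fields and live in a database rather than in memory.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class CallDataRecord:
    call_id: str
    origin: str              # e.g. calling number or region
    duration_s: float
    service: str             # service requested in the IVR
    abandoned: bool = False
    error: bool = False
    # (seconds since call start, event), e.g. prompt played, DTMF entry, ASR result
    events: List[Tuple[float, str]] = field(default_factory=list)


def summarize(records: List[CallDataRecord]) -> Dict[str, object]:
    """Aggregate CDRs into the kind of figures a statistical report shows."""
    n = len(records)
    if n == 0:
        return {"calls": 0, "avg_duration_s": 0.0, "abandon_rate": 0.0,
                "error_rate": 0.0, "by_service": {}}
    return {
        "calls": n,
        "avg_duration_s": sum(r.duration_s for r in records) / n,
        "abandon_rate": sum(r.abandoned for r in records) / n,
        "error_rate": sum(r.error for r in records) / n,
        "by_service": dict(Counter(r.service for r in records)),
    }
```

Running `summarize` over the records of a given period yields the figures for one report row; consolidating them with the overall contact centre reports is left to the reporting system.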