 Speech Technology Implementation Best Practices

Speech project characteristics

Setting up and implementing a speech project is much like running a classic Web project, except for:
• the increased importance of the (Voice) User Interface, which must follow linguistic and psychological principles of conversational interaction
• the increased importance of usability testing
• an incremental design and development approach, with multiple feedback loops

These aspects are particularly important: most speech deployments that failed in the past did so because of poor design, or a design that did not take the user's perspective into account. When designing an application, it is important to understand the target audience and their needs and expectations. Who are the callers, what do they call about, what and how much do they expect to find out, how often do they call, how do they phrase their questions, and what terminology do they use and understand? A number of specific tasks can help answer these questions, including:
• transcription and language analysis of recorded ‘user-to-agent’ calls
• agent interviews
• call centre visits
• listening to live calls
• call-type analysis.

During the design phase, linguistic expertise is invaluable, particularly for designing the dialogue – i.e. the recorded prompts that users are required to respond to. Good dialogue design is not easy: the dialogue needs to be simple, straightforward, consistent, unambiguous, helpful and representative of the language the target audience uses or is familiar with. Most importantly, the dialogue must help users navigate their way through the system smoothly.

Dialogue is normally designed to reflect a persona – a carefully constructed human image that the company wants to portray to its audience.

Typical project phases are:
• Business analysis: define business drivers, needs and goals; select the right application
• Requirements definition: what are we building?
• VUI and Call Flow Design: how will the dialogue between caller and system flow?
• Back-end design: how will the speech application integrate with existing back-end systems (ERP, CRM, CTI, etc.)?
• Application development: production of VoiceXML or SALT code, speech grammars, pronunciation lexicons, and system prompts (pre-recorded audio files)
• Testing: usability, functional, unit, load, and integration testing
• Deployment: accept live caller traffic on the speech application
• Evaluation: measure speech recognition accuracy, task completion rate, automation rate, customer satisfaction, ROI or other key performance indicators
• Post-deployment tuning: listen to real caller utterances, compare them with what was recognised, and tune the VUI, grammars, pronunciation lexicon or system prompts for better performance
• Operational maintenance: monitor the live application
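
To make the evaluation and tuning phases more concrete, here is a minimal Python sketch of how a few of the key performance indicators above might be computed from call logs. The `CallRecord` structure and all of its field names are hypothetical, recognition accuracy is simplified to an exact match per utterance, and the automation rate is assumed to mean the share of calls that were not transferred to an agent; a real deployment would define these measures to suit its own reporting.

```python
from dataclasses import dataclass


@dataclass
class CallRecord:
    """One logged call. All field names are hypothetical examples."""
    utterances: list            # (human transcription, recogniser result) pairs
    task_completed: bool        # the caller reached their goal
    transferred_to_agent: bool  # the call left the automated system


def _norm(text):
    """Lower-case and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def evaluate(calls):
    """Return three KPIs as fractions between 0 and 1."""
    pairs = [pair for call in calls for pair in call.utterances]
    correct = sum(1 for said, heard in pairs if _norm(said) == _norm(heard))
    n_calls = len(calls)
    return {
        "recognition_accuracy": correct / len(pairs) if pairs else 0.0,
        "task_completion_rate":
            sum(c.task_completed for c in calls) / n_calls if n_calls else 0.0,
        "automation_rate":
            sum(not c.transferred_to_agent for c in calls) / n_calls if n_calls else 0.0,
    }


def misrecognitions(calls):
    """List the (said, heard) pairs that differ: the raw material for tuning."""
    return [
        (said, heard)
        for call in calls
        for said, heard in call.utterances
        if _norm(said) != _norm(heard)
    ]
```

The list returned by `misrecognitions` is what a speech engineer would review during post-deployment tuning to decide whether a grammar, a lexicon entry or a prompt needs to change.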

Operational models

Depending on the physical location of various system components, we distinguish between three models.

In the insourcing model, the company offering the speech system keeps all components in-house. These include: back-end data, speech application, speech and telephony platform.

In the ASP (application service provider) model, all components are outsourced to an external service provider, including the speech application and even the back-end data. Examples of ASPs in Europe are BT, Telefónica, T-Com and TeliaSonera.

In the VSP (voice service provider) model, the speech and telephony platform is hosted by a third party, but the speech application and, in particular, the back-end data remain at the company’s premises for added security and control. Each live call generates HTTP traffic between the VSP’s speech platform and the customer’s Speech Application Server and/or back-end databases. Voice Application Hosting Providers have made it possible for anyone with a web server to develop and serve their own speech-enabled phone applications, in more than 15 languages. Most VSPs also offer application hosting, but more and more customers prefer to keep their application in-house, often for security reasons.
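
As an illustration of the HTTP exchange in the VSP model, the following Python sketch shows a customer-side application server that answers the platform's request with a small VoiceXML document. It is a sketch under assumptions rather than a reference implementation: the function names, the prompt text and the port are invented for the example, and a real VSP will document which VoiceXML version and URL conventions it expects.

```python
from wsgiref.simple_server import make_server
from xml.sax.saxutils import escape


def render_vxml(prompt_text):
    """Build a minimal VoiceXML 2.1 document that plays a single prompt."""
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">\n'
        '  <form>\n'
        '    <block>\n'
        f'      <prompt>{escape(prompt_text)}</prompt>\n'
        '    </block>\n'
        '  </form>\n'
        '</vxml>\n'
    )


def application(environ, start_response):
    """WSGI entry point: the hosted speech platform fetches this over HTTP."""
    body = render_vxml("Welcome to our speech application.").encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "application/voicexml+xml; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def serve(port=8000):
    """Run a development server; the port number is an arbitrary choice."""
    with make_server("", port, application) as server:
        server.serve_forever()
```

Calling `serve()` starts the server locally. In production the same `application` callable would sit behind the company's regular web infrastructure, which is what keeps the application logic and back-end data on the customer's side.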

Expertise requirements

The creation of a speech application requires a team with specialised knowledge and skills, typically covering the following roles:
• Project manager
• ICT engineers (telecom, network, security, provisioning, installation)
• VUI designer (VUI design, call flow design, usability testing)
• Speech engineer (grammar and lexicon development, speech application tuning)
• Sound engineer / voice talent (system prompt recording)
• Speech application developer (call flow implementation)
• Web architect (back-end design), Web programmer (back-end API development), CTI programmer (CTI integration)
• QA engineer (test plan writing and execution, bug report writing)

Potential pitfalls

Speech recognition and speaker verification are probabilistic technologies by nature, which means that the speech engines sometimes get it wrong. Knowledgeable application designers and developers tackle this challenge by employing error recovery strategies, and by optimising recognition rates during the post-deployment tuning phase. If speech engine problems persist during a call, the application can always fall back to DTMF recognition, or transfer the call to an operator.
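
One way such an error recovery strategy could be structured is sketched below in Python. The action names, the confidence thresholds and the failure limits are all illustrative assumptions, not values taken from any particular speech platform; in practice they would be set and adjusted during tuning.

```python
from enum import Enum


class Action(Enum):
    ACCEPT = "accept"                # use the recognition result as is
    CONFIRM = "confirm"              # ask "Did you say ...?"
    REPROMPT = "reprompt"            # ask again, ideally with more guidance
    DTMF_FALLBACK = "dtmf_fallback"  # switch to touch-tone input
    TRANSFER = "transfer"            # hand the call to an operator


def next_action(confidence, failures,
                accept_at=0.8, confirm_at=0.5,
                dtmf_after=2, transfer_after=4):
    """Choose the next dialogue step after one recognition attempt.

    confidence: recogniser score between 0 and 1, or None for no input/no match.
    failures:   number of earlier failed attempts in this dialogue state.
    """
    if confidence is not None and confidence >= accept_at:
        return Action.ACCEPT
    if confidence is not None and confidence >= confirm_at:
        return Action.CONFIRM
    total_failures = failures + 1  # the current attempt counts as a failure
    if total_failures >= transfer_after:
        return Action.TRANSFER
    if total_failures >= dtmf_after:
        return Action.DTMF_FALLBACK
    return Action.REPROMPT
```

For example, `next_action(0.6, 0)` asks the caller to confirm, while `next_action(None, 1)` switches to DTMF after a second failed attempt. Escalating step by step keeps the caller in the automated dialogue for as long as it is still likely to succeed.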

End users need some time to get acquainted with new technology, and speech-driven phone applications are no exception to this rule. Well-thought-out interaction design must give callers a pleasant and effective user experience, e.g. by providing guidance and help when needed. Usability testing and phased rollouts help application developers detect imperfections at an early stage.