Voice Gene
TABLE OF CONTENTS
Executive Summary
Introduction
Analysis
2.1 Reusability of code and developers' skills
2.2 Suitability for VoiceGenie's platform
2.3 Style
2.4 Industry Standard
Conclusions
Recommendations
References
Appendix A — An Example of an X+V Application
Appendix B — An Example of a SALT Application
Executive Summary
VoiceGenie Technologies Inc., a VoiceXML Gateway solutions company, is striving to support multimodal applications. The company must decide which multimodal markup language to support. The markup languages being considered are:
X+V (XHTML+Voice), a combination of XHTML and VoiceXML
SALT (Speech Application Language Tags), an extension of HTML/XHTML with speech tags.
VoiceGenie's method, a set of HTML pages and a set of VoiceXML pages that are synchronized by exchanging messages.
This report assumes that an X+V or SALT browser is to be implemented by VoiceGenie. The benefits and drawbacks of these markup languages are analyzed using the following criteria:
• Reusability of code and developers' skills: Both Web and voice application developers would find X+V easy to learn. An existing Web application can be reused when it is converted to SALT, whereas both Web and voice applications can be easily converted to X+V.
• Suitability for VoiceGenie's platform: X+V is more suitable for VoiceGenie because the platform already supports VoiceXML.
• Style: The layout of elements in an X+V application is more elegant and intuitive.
• Industry Standard: It is not clear whether X+V or SALT will become the standard.
This report recommends that VoiceGenie's method be used until a standard multimodal markup language emerges.
1.0 Introduction
Multimodal technology has become increasingly significant over the past several years. A multimodal application accepts different modes of input and produces different modes of output. Possible inputs include speech, keystrokes, and mouse clicks; possible outputs include synthesized speech, text, graphics, and video. Multimodality is most useful in a mobile environment, where keyboard input is difficult because the user is moving or the keyboard is small.
Since multimodality is a relatively new technology, the industry has not yet accepted a single standard markup language for developing multimodal applications. Two markup languages have so far been submitted to the W3C for review: X+V and SALT.
X+V stands for XHTML+Voice, a language that combines XHTML for the visual content with VoiceXML for the audio component. This multimodal markup language is an initiative of IBM, Motorola, and Opera Software. Version 1.0 was submitted to the W3C for review at the end of 2001, and version 1.1 was submitted on March 11, 2003 (Multimodal, 2003).
SALT (Speech Application Language Tags) is a language proposed by Microsoft and submitted to the W3C for review in July 2002. It extends XHTML with SALT tags, which handle the audio component of the application (Multimodal, 2003).
Because neither an X+V interpreter nor a SALT browser is available, VoiceGenie is currently developing its own multimodal “language” for its platform. At this stage, such an application can run only on a Pocket PC device. It consists of a set of HTML pages and a set of VoiceXML pages, and the speech and visual components are synchronized by exchanging messages. When the user gives a visual input, the device informs the VoiceXML page of the input. When speech input is detected, the VoiceXML page sends a message to the device to alter the visual content or to load a different HTML page.
This report compares X+V, SALT, and VoiceGenie's method to help decide which multimodal markup language should be supported.
2.0 Analysis
2.1 Reusability of code and developers' skills
Both SALT and X+V extend HTML / XHTML. This is good news for Web application developers, who can reuse their existing skills rather than learn something entirely new. Moreover, if a visual-only Web application already exists, adding the voice component is not difficult, and much of the existing application can be reused.
For the voice component, X+V uses VoiceXML, a language that voice application developers already know. SALT, on the other hand, has its own voice tags, which are new to all developers. Voice application developers would therefore learn X+V much faster than SALT.
Another advantage of X+V is the ease of writing a multimodal application based on an existing voice application. A developer can write an X+V application by adding XHTML to an existing VoiceXML application and making some small changes to the VoiceXML part. If SALT is used, the existing VoiceXML application cannot be reused, and the multimodal application must be written from scratch.
VoiceGenie's method has the same benefits as those of X+V described above, but to a lesser extent. A multimodal