© 1997 Copyright on this material is held by the authors.
Paradoxically, the increasing popularity of GUIs for sighted users has been accompanied by a recent trend that runs counter to the very purpose of a GUI: the development and use of "displayless" interfaces. These interfaces offer voice access to applications in which a visual display cannot be used, such as telephone-based interactions, or in which the user's hands and eyes are busy with other tasks, such as piloting an aircraft. Although displayless interface technology introduces certain issues specific to spoken language recognition, it also presents underlying challenges similar to those of GUI access technology for users with visual impairments. Each must address the unique problems presented by nonvisual access to data, especially data which is either inherently spatial in nature, e.g., geographical maps, or textual data presented through a visuospatial display metaphor. While methods have been developed for dealing with the spatial presentation of textual data [4], the problem of accessing inherently spatial data without vision remains open. This dissertation research is based on the assumption that accessing spatial data strictly via speech interaction imposes a cognitive burden on the individual, regardless of visual capability. A selective survey of the literature, given in [5], reviews research originating from many viewpoints, including psychology, education, and human-computer interaction, that supports this assumption.
GUI access researchers argue, however, that voice input should be included as an option in providing nonvisual access to graphical and spatial data [2]. While speech alone may not provide optimal access to spatial data, when used with other input and output modalities it offers certain advantages; e.g., voice input frees the hands for other tasks, such as operating a tactile output device.
In addition, the portability of speech interfaces makes them well-suited for certain applications in which access to spatial data is required. One example, Back Seat Driver, is a navigational system developed at the MIT Media Lab [6] for taxi drivers in the Boston area. In response to voice input from the driver, it consults an inertial guidance system and map database to provide directions via synthesized speech. Research described in [7] investigated the use of similar technology for a portable navigational aid for visually impaired travelers in unfamiliar environments. The Soldier's Computer, designed to meet the needs of the modern soldier, offers another example of a portable speech interface providing rapid access to map and directional data in time-critical situations [8].
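The pipeline these systems share (a recognized voice request in, a position fix combined with a map lookup, synthesized speech out) can be sketched as follows. This is a hypothetical illustration only; the function names and stub behavior are invented for the sketch and are not drawn from Back Seat Driver or the other systems cited.

```python
# A minimal, hypothetical sketch of the voice-in / speech-out loop
# described above. None of these names come from Back Seat Driver;
# position_fix(), next_instruction(), and speak() merely stand in for
# an inertial guidance reading, a map-database lookup, and a speech
# synthesizer.

def position_fix():
    """Return the vehicle's current (latitude, longitude) estimate."""
    return (42.3601, -71.0589)  # placeholder coordinates (Boston)

def next_instruction(position, destination):
    """Consult the map database for the next driving instruction."""
    return "In one quarter mile, turn left onto Massachusetts Avenue."

def speak(text):
    """Hand the instruction text to a speech synthesizer."""
    print(f"[synthesized speech] {text}")

def handle_voice_request(destination):
    """Respond to a recognized spoken request for directions."""
    speak(next_instruction(position_fix(), destination))

handle_voice_request("Logan Airport")
```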
Despite their potential benefits, many human factors issues must be resolved if speech interfaces are to enjoy widespread use. These are discussed in [1,9]. One issue in particular, prosodics, is a central focus of this research. Prosodics encompasses the nonverbal aspects of spoken language, such as pauses and intonation, which are useful in both speech synthesis and recognition. Research reviewed in [10] has examined the effects of various psychological and cognitive stresses on the prosodics of human speech, e.g., fundamental frequency (F0), which corresponds to pitch; the length and location of pauses; and speaking rate. This dissertation research examines the possible connection between the cognitive burden produced by nonvisual access to spatial data through spoken language and the impact of this burden on the prosodics of the user's speech. An understanding of how this cognitive burden affects the speaker's prosodics would contribute to the development of more robust interfaces for situations in which this type of access is necessary. Insight gained from an investigation of this issue could be used, for example, to improve algorithms in the recognition component by identifying prosodic patterns, or variations in patterns, that occur in this type of application. As noted in [11], many of the algorithms developed for prosodic pattern detection are limited by their reliance on a narrow set of acoustic cues, primarily F0 features. It is argued in [11] that additional acoustic cues, including pauses and durational features such as speaking rate, should be used for more robust prosodic pattern detection. This argument has particular relevance for displayless access to spatial data, given the additional cognitive load such access places on the user and its potential impact on the prosodics of the user's speech.
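To make these measurements concrete, the sketch below computes rough versions of the three cue classes named above: F0 via a crude autocorrelation pitch estimate, pauses via an energy gate, and a voiced-time ratio as a stand-in for speaking rate. Python and NumPy, along with all frame sizes and thresholds, are illustrative assumptions; this is not the detection method of [11], nor the extraction procedure used in the dissertation.

```python
# An illustrative sketch of the three prosodic measurements named
# above (F0, pauses, speaking rate), using plain NumPy. The frame
# sizes, thresholds, and the autocorrelation pitch tracker are
# simplifying assumptions made for this sketch.
import numpy as np

def frame_signal(x, frame_len, hop):
    """Slice a waveform into overlapping frames."""
    n = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n)])

def estimate_f0(frame, sr, fmin=75.0, fmax=400.0):
    """Crude autocorrelation pitch estimate for one voiced frame (Hz)."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)
    lag = lo + np.argmax(ac[lo:hi])
    return sr / lag

def prosodic_features(x, sr, frame_len=1024, hop=512, silence_db=-35.0):
    """Return mean F0, a pause count, and a speaking-rate proxy."""
    frames = frame_signal(x, frame_len, hop)
    rms = np.sqrt((frames ** 2).mean(axis=1)) + 1e-12
    voiced = 20 * np.log10(rms / rms.max()) > silence_db  # energy gate
    f0 = [estimate_f0(f, sr) for f, v in zip(frames, voiced) if v]
    # Pauses: transitions from a voiced frame into a silent run.
    pauses = int(np.sum(np.diff(voiced.astype(int)) == -1))
    # Speaking-rate proxy: fraction of frames spent voiced.
    return {"mean_f0": float(np.mean(f0)) if f0 else 0.0,
            "pauses": pauses,
            "voiced_ratio": float(voiced.mean())}
```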
Since a central assumption of the research is the difficulty of nonvisual access to spatial data for all users, both those with and without sight loss will be tested. However, since the conditions in the second experiment must differ for the two groups, no formal statistical comparisons can be made between the two groups. Therefore, data will be gathered and labeled separately for each group.
After all sessions are completed, acoustic features for the analysis will be extracted from the recordings, including F0, pauses, and durational features. To determine the significance of differences in the prosodic features produced in the two experiments, statistical tests will be performed comparing the means of the differences in the measurements of each extracted feature. More details on the experimental design and the criteria for the subjects are available in [5].
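As one plausible reading of this analysis plan (the proposal does not name a specific test here), the sketch below applies a paired t-test to each extracted feature, pairing each subject's measurements from the two experiments. The subject data shown are invented for illustration.

```python
# A hedged sketch of per-feature significance testing: a paired t-test
# on within-subject differences between the two experiments. The
# paired t-test is one plausible reading of "comparing the means of
# the differences"; the numbers below are made up.
import numpy as np
from scipy.stats import ttest_rel

features = ["mean_f0", "pause_count", "speaking_rate"]
# Hypothetical measurements: one row per subject, one column per feature.
experiment_1 = np.array([[182.0, 11, 4.1],
                         [201.0,  9, 4.4],
                         [176.0, 14, 3.8],
                         [190.0, 12, 4.0]])
experiment_2 = np.array([[195.0, 16, 3.5],
                         [214.0, 13, 3.9],
                         [181.0, 19, 3.2],
                         [197.0, 15, 3.6]])

for j, name in enumerate(features):
    t, p = ttest_rel(experiment_1[:, j], experiment_2[:, j])
    print(f"{name}: t = {t:.2f}, p = {p:.3f}")
```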
2. Boyd, L.H., W.L. Boyd, J. Berliss, M. Sutton, and G.C. Vanderheiden. The paradox of the graphical user interface: Unprecedented computer power for blind people. Closing the Gap 14 (October):24-25, 60-61, 1992.
3. Vanderheiden, G.C., and D.C. Kunz. Systems 3: An interface to graphic computers for blind users. In Proceedings of the 13th Annual Conference of RESNA, held in Washington, D.C., 20-24 June 1990, 150-200, 1993.
4. Mynatt, E.D., and G. Weber. Nonvisual presentation of graphical user interfaces: contrasting two approaches. In Proceedings of the ACM CHI '94 Conference, held in Boston, MA, 24-28 April 1994, 166-172, 1994.
5. Baca, J. Displayless access to spatial data: Effects on speaker prosodics. Unpublished dissertation proposal, Department of Computer Science, Mississippi State University, 1996.
6. Davis, J.R. and C. Schmandt. The back seat driver: Real time spoken driving instructions. In Vehicle Navigation and Information Systems, 146-150, 1989.
7. Loomis, J.M., R.G. Golledge, R.L. Klatzky, J. Speigle, and J. Tietz. Personal guidance system for the visually impaired. In ASSETS '94, The First Annual ACM Conference on Assistive Technologies, held in Los Angeles, CA, October 31-November 1, 1994, 85-91, 1994.
8. Weinstein, C.J. Applications of voice processing technology. In Voice Communication Between Humans and Machines, National Academy Press, Washington, D.C., 1994.
9. Kamm, C. User interfaces for voice applications. In Voice Communication Between Humans and Machines, National Academy Press, Washington, D.C., 1994.
10. Scherer, K.R. Speech and emotional states. In Speech Evaluation in Psychiatry, 189-220. New York: Grune & Stratton, 1981.
11. Wightman, C.W. and M. Ostendorf. Automatic labeling of prosodic patterns. IEEE Transactions on Speech and Audio Processing 2(4):469-481, 1994.