CHI 97 Electronic Publications: Organizational Overviews
Multimodal Human Computer Interaction Research at Toshiba Research and Development Center
Yoichi Takebayashi and Miwako Doi
Human Interface Technology Center,
Research and Development Center, Toshiba Corporation
1 Komukai-Toshiba-cho, Saiwai-ku, Kawasaki 210, Japan
+81 44 549 2243
{yoichi,doi}@eel.rdc.toshiba.co.jp
ABSTRACT
Toshiba's Human Interface Research Group is pursuing media understanding
and intelligent interaction technologies to achieve natural multimodal
HCI (human-computer interaction). In collaboration with Toshiba's
other corporate laboratories, engineering laboratories and business
divisions, we have been developing practical interactive systems
and products related to information services, consumer electronics,
document filing and industrial equipment.
KEYWORDS
Organizations, multimodal, HCI, information filtering, knowledge
sharing, media understanding.
© 1997 Copyright on this material is held by the authors.
ORGANIZATION AND RESEARCH THEMES
Best known for the world's first letter handling system using
hand-written character recognition and the first Japanese word processor
using Kana-to-Kanji conversion, the Toshiba Research and Development
Center (RDC) has been developing a variety of media conversion/understanding
systems and natural language processing systems.
We believe that these technologies play important roles in achieving
user-centered multimodal human-computer interaction. While advances
in computing environments have helped us gather and share large
amounts of multimedia data, they have also forced us to work under
stress from a flood of information. To address this problem, we are
focusing our HCI research on information retrieval and knowledge
sharing based on media understanding technologies. Specifically,
we have been exploring users' intentions and the contents of multimedia
data from the viewpoint of media conversion/understanding functions
and multimodal interfaces, because fully understanding both is crucial
for retrieving useful information.
Toshiba's Human Interface Technology Center (HIC) was established
in 1995 as a corporate organization aiming to achieve human-centered,
reliable media technologies in harmony with human society.
To apply these technologies to various systems and products, about
30 researchers are collaborating with other organizations, including
those in charge of computer and communication systems, consumer
electronics, power systems, and industrial equipment. Our work
covers a wide range of media conversion/understanding functions, from
character recognition to document understanding and natural language
understanding, as well as media interaction such as information filtering,
knowledge/information sharing, speech dialogue, video browsing, and
human factors. Figure 1 shows the framework of our information retrieval
and sharing system, which uses a set of media conversion/processing
functions called "HI-ware".
Figure 1: Framework of the information retrieval and sharing system
APPROACH
Structuring Multimedia Information
To create a user-centered multimodal interface, it is vitally
important to structure multimedia information using media conversion.
Structured multimedia information enables both humans and computers
to share and retrieve information as they wish and to understand
each other better.
The fundamental basis for this task is the knowledge bases and language
dictionaries we are currently building.
Enhancing Multimodal Interaction
We need to advance intelligent multimodal human-computer interaction
technologies using agents so that users can find more enjoyment
and comfort in working with computers. This means creating a system
that understands users' intentions and situations from their
utterances and gestures and provides services such as information
retrieval, advice and suggestions, and whatever help they need,
while conducting a natural dialogue with users.
Developing Sensors and Input-output Devices
Finally, we are also investigating new sensors and input-output
devices. These extract information that users present both voluntarily
and unintentionally. Such information facilitates the development
of media understanding technologies and makes it possible for
computers to understand users' intentions and situations.
The human brain can be viewed as thousands of computers of different
architectural types, each with various functions, including voice
understanding, scene understanding, language understanding, translation,
dialogue, speed reading, and problem solving. Thus, a future vision of
intelligent multimodal interfaces could be realized by upgrading the
conversion of each level of media and integrating those media through
organized knowledge bases and language dictionaries, thereby assisting
human intellectual activities. Now that digital information environments
are being established, we should accelerate our research and development
toward highly advanced acquisition, sharing, and dispatch of
knowledge and information.
SELECTED PROJECTS
Information Filtering System
We have developed an information filtering system for newspaper
articles published every day in digital form. The system computes
similarities between the user's information need and each article
based on our expanded vector space model, and then selects articles
suited to that need. The system also detects other similar articles,
so that it can indicate a cluster of similar articles. The selected
newspaper articles are delivered to users through Internet communication
tools, e.g., e-mail and the WWW (World Wide Web).
This system is being used as Japan's first information filtering
service.
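The details of our expanded vector space model are beyond the scope of this overview; as a rough illustration of the general technique, the sketch below ranks articles against a user profile using classic TF-IDF weighting and cosine similarity. All names and the tokenization are illustrative, not the deployed system.

```python
import math
from collections import Counter

def tfidf_vectors(texts):
    # Tokenize naively by whitespace and weight each term by TF * IDF.
    tokenized = [t.lower().split() for t in texts]
    n = len(tokenized)
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    vecs = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vecs.append({w: c * math.log(1 + n / df[w]) for w, c in tf.items()})
    return vecs

def cosine(u, v):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def rank_articles(profile, articles):
    # Vectorize the profile together with the articles so IDF statistics
    # are shared, then sort articles by similarity to the profile.
    vecs = tfidf_vectors([profile] + articles)
    query, doc_vecs = vecs[0], vecs[1:]
    return sorted(zip(articles, (cosine(query, d) for d in doc_vecs)),
                  key=lambda pair: pair[1], reverse=True)
```

A filtering service would threshold or cluster the ranked list before delivery; the expanded model adds refinements this sketch omits.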
Personal Information Provider
We have been developing a multimodal Personal Information Provider (PIP)
to enhance information/knowledge sharing and closer human relations
within groups. The system employs natural language and emotion
understanding from speech and keyboard input, with a user-initiative
dialogue manager and a multimodal response generator. It runs in
real time on a personal computer with an interface agent that makes
the user's stored information available to others with the user's
permission. Experiments based on the PIP are being performed with
about 300 people for knowledge and know-how sharing in our
laboratories. Figure 1 shows the advice/help-on-demand system
for our office knowledge/know-how sharing system.
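To make the user-initiative pipeline concrete, the following hypothetical sketch shows the overall control flow: an understanding step maps an utterance to an intent, and a response step consults a shared, user-permitted store. The keyword intent spotting and the store contents are invented stand-ins for the PIP's actual understanding and knowledge components.

```python
def understand(utterance):
    # Stand-in for natural language/emotion understanding:
    # crude keyword-based intent spotting for illustration only.
    text = utterance.lower()
    if "who" in text or "know" in text:
        return "find_expert"
    if "how" in text:
        return "find_knowhow"
    return "chat"

# Shared, user-permitted knowledge store (illustrative contents).
KNOWLEDGE = {
    "find_expert": "A colleague in the document understanding group can help.",
    "find_knowhow": "See the group notes on speech dialogue systems.",
}

def respond(utterance):
    # User-initiative loop: the user drives; the agent answers or
    # asks a clarifying question when no intent matches.
    intent = understand(utterance)
    return KNOWLEDGE.get(intent, "Could you tell me more?")
```

A real dialogue manager would track context across turns and blend in the emotion-understanding results when choosing a multimodal response.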
HI-ware (Common HI Service Environment)
We have been developing HI-ware (Common HI Service Environment),
in which various kinds of HI functions, such as speech recognition/synthesis,
character recognition, and machine translation, are easily and
organically available for developing advanced HCI. As shown in Figure
2, the environment has two features. One is a standardized API (Application
Programming Interface). The API keeps HI functions consistent
so that various kinds of HI applications can incorporate them
in a common manner. The other feature is a common dictionary
shared among HI functions; a new word registered in the common
dictionary becomes available to all HI functions.
Figure 2: Configuration of HI-ware
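The two HI-ware features can be sketched as follows. This is a minimal illustration of the idea, not the actual HI-ware interface: a shared dictionary object visible to every HI function, and a common base class standing in for the standardized API. All class and method names are assumptions.

```python
class CommonDictionary:
    """Shared word dictionary: a word registered once becomes visible
    to every HI function that holds a reference to this dictionary."""
    def __init__(self):
        self._words = {}

    def register(self, word, attributes):
        self._words[word] = attributes

    def lookup(self, word):
        return self._words.get(word)

class HIFunction:
    """Stand-in for the standardized API: every HI function exposes
    the same process() entry point and shares one dictionary."""
    def __init__(self, dictionary):
        self.dictionary = dictionary

    def process(self, data):
        raise NotImplementedError

class CharacterRecognizer(HIFunction):
    def process(self, text):
        # Stub recognizer: split input and annotate each token with
        # whatever the shared dictionary knows about it.
        return [(w, self.dictionary.lookup(w)) for w in text.split()]
```

Because a speech synthesizer or machine translator would subclass the same base and hold the same dictionary, registering a new word once makes it available to all of them, which is the point of the common dictionary.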
PUBLICATIONS
1. Aoki, H. et al. "A Shot Classification Method to Select Effective
Key-frames for Video Browsing," Proc. ACM Multimedia '96, 1996.
2. Ono, K. et al. "Abstract Generation Based on Rhetorical Structure
Extraction," Proc. COLING '94, 1994.
3. Miike, S. et al. "A Full-Text Retrieval System with a Dynamic
Abstract Generation Function," Proc. SIGIR '94, 1994.