CHI 97 Electronic Publications: Organizational Overviews
Multimodal Human Computer Interaction Research at Toshiba Research and Development Center
Yoichi Takebayashi and Miwako Doi
Human Interface Technology Center,
Research and Development Center, Toshiba Corporation
1 Komukai-Toshiba-cho, Saiwai-ku, Kawasaki 210, Japan
+81 44 549 2243
{yoichi,doi}@eel.rdc.toshiba.co.jp
ABSTRACT
Toshiba's Human Interface Research Group is pursuing media understanding
and intelligent interaction technologies to achieve natural multimodal
HCI (human-computer interaction). In collaboration with Toshiba's
other corporate laboratories, engineering laboratories and business
divisions, we have been developing practical interactive systems
and products related to information services, consumer electronics,
document filing and industrial equipment.
KEYWORDS
Organizations, multimodal, HCI, information filtering, knowledge
sharing, media understanding.
© 1997 Copyright on this material is held by the authors.
ORGANIZATION AND RESEARCH THEMES
Best known for the world's first letter handling system using
hand-written character recognition and the first Japanese word processor
using Kana-to-Kanji conversion, the Toshiba Research and Development
Center (RDC) has been developing a variety of media conversion/understanding
systems and natural language processing systems.
We believe that these technologies play important roles in achieving
user-centered multimodal human-computer interaction. While advances
in computing environments have helped us gather and share large
amounts of multimedia data, they have also forced us to work under
stress from a flood of information. To address this problem, we are
focusing our HCI research on information retrieval and knowledge
sharing based on media understanding technologies. Specifically,
we have been exploring users' intentions and the contents of multimedia
data from the viewpoint of media conversion/understanding functions
and multimodal interfaces, because fully understanding both is crucial
for retrieving useful information.
Toshiba's Human Interface Technology Center (HIC) was established
in 1995 as a corporate organization aiming to achieve human-centered,
reliable media technologies in harmony with human society.
To apply these technologies to various systems and products, about
30 researchers are collaborating with other organizations, including
those in charge of computer and communication systems, consumer
electronics, power systems, and industrial equipment. Our work
covers a wide range of media conversion/understanding functions, from
character recognition to document understanding and natural language
understanding, as well as media interaction such as information filtering,
knowledge/information sharing, speech dialogue, video browsing, and
human factors. Figure 1 shows the framework of our information retrieval
and sharing system, which uses a set of media conversion/processing
functions called "HI-ware".
Figure 1: Framework of the information retrieval and sharing system
APPROACH
Structuring Multimedia Information
To create a user-centered multimodal interface, it is vitally
important to structure multimedia information using media conversion.
Structured multimedia information enables both humans and computers
to share and retrieve information as they wish and to understand
each other better.
The fundamental basis for this task is the knowledge bases and language
dictionaries we are currently building.
Enhancing Multimodal Interaction
We need to advance intelligent multimodal human-computer interaction
technologies using agents so that users can find more enjoyment
and comfort in working with computers. This means creating a system
that understands users' intentions and situations from their
utterances and gestures and provides services such as information
retrieval, advice and suggestions, and whatever help they need,
while conducting a natural dialogue with users.
Developing Sensors and Input-output Devices
Finally, we are also investigating new sensors and input-output
devices. These extract information that users present both voluntarily
and unintentionally. Such information facilitates the development
of media understanding technologies and makes it possible for
computers to understand users' intentions and situations.
The human brain can be viewed as thousands of computers of different
architectural types, each with various functions, including voice
understanding, scene understanding, language understanding, translation,
dialogue, speed reading, and problem solving. Thus, a future vision of
intelligent multimodal interfaces could be realized by upgrading the
conversion of each level of media and integrating those media through
organized knowledge bases and language dictionaries, thereby assisting
human intellectual activities. Now that digital information environments
are being established, we should accelerate our research and development
toward highly advanced acquisition, sharing, and dispatch of
knowledge and information.
SELECTED PROJECTS
Information Filtering System
We have developed an information filtering system for newspaper
articles published every day in digital form. The system computes
similarities between the user's information need and each article
based on our expanded vector space model, and then selects articles
suited to that need. The system also detects other similar articles,
so that it can indicate a cluster of similar articles. The selected
newspaper articles are delivered to users through Internet communication
tools, e.g., e-mail and the WWW (World Wide Web).
This system is being used as Japan's first information filtering
service.
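The details of our expanded vector space model are beyond the scope of this overview; as a rough illustration of the general technique, the sketch below ranks articles against a user profile using classic TF-IDF weighting and cosine similarity. All names and the tokenization are illustrative, not the deployed system.

```python
import math
from collections import Counter

def tfidf_vectors(texts):
    # Tokenize naively by whitespace and weight each term by TF * IDF.
    tokenized = [t.lower().split() for t in texts]
    n = len(tokenized)
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    vecs = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vecs.append({w: c * math.log(1 + n / df[w]) for w, c in tf.items()})
    return vecs

def cosine(u, v):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def rank_articles(profile, articles):
    # Vectorize the profile together with the articles so IDF statistics
    # are shared, then sort articles by similarity to the profile.
    vecs = tfidf_vectors([profile] + articles)
    query, doc_vecs = vecs[0], vecs[1:]
    return sorted(zip(articles, (cosine(query, d) for d in doc_vecs)),
                  key=lambda pair: pair[1], reverse=True)
```

A filtering service would threshold or cluster the ranked list before delivery; the expanded model adds refinements this sketch omits.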
Personal Information Provider
We have been developing a multimodal Personal Information Provider (PIP)
to enhance information/knowledge sharing and closer human relations
within groups. The system employs natural language and emotion
understanding from speech and keyboard input, with a user-initiative
dialogue manager and a multimodal response generator. It runs in
real time on a personal computer with an interface agent that makes
the user's stored information available to others with the user's
permission. Experiments based on the PIP are being performed with
about 300 people for knowledge and know-how sharing in our
laboratories. Figure 1 shows the advice/help-on-demand system
for our office knowledge/know-how sharing system.
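To make the user-initiative pipeline concrete, the following hypothetical sketch shows the overall control flow: an understanding step maps an utterance to an intent, and a response step consults a shared, user-permitted store. The keyword intent spotting and the store contents are invented stand-ins for the PIP's actual understanding and knowledge components.

```python
def understand(utterance):
    # Stand-in for natural language/emotion understanding:
    # crude keyword-based intent spotting for illustration only.
    text = utterance.lower()
    if "who" in text or "know" in text:
        return "find_expert"
    if "how" in text:
        return "find_knowhow"
    return "chat"

# Shared, user-permitted knowledge store (illustrative contents).
KNOWLEDGE = {
    "find_expert": "A colleague in the document understanding group can help.",
    "find_knowhow": "See the group notes on speech dialogue systems.",
}

def respond(utterance):
    # User-initiative loop: the user drives; the agent answers or
    # asks a clarifying question when no intent matches.
    intent = understand(utterance)
    return KNOWLEDGE.get(intent, "Could you tell me more?")
```

A real dialogue manager would track context across turns and blend in the emotion-understanding results when choosing a multimodal response.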
HI-ware (Common HI Service Environment)
We have been developing HI-ware (Common HI Service Environment),
in which various kinds of HI functions, such as speech recognition/synthesis,
character recognition, and machine translation, are easily and
organically available for developing advanced HCI. As shown in Figure
2, the environment has two features. One is a standardized API (Application
Programming Interface). The API keeps HI functions consistent
so that various kinds of HI applications can incorporate them
in a common manner. The other feature is a common dictionary
shared among HI functions; a new word registered in the common
dictionary becomes available to all HI functions.
Figure 2: Configuration of HI-ware
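The two HI-ware features can be sketched as follows. This is a minimal illustration of the idea, not the actual HI-ware interface: a shared dictionary object visible to every HI function, and a common base class standing in for the standardized API. All class and method names are assumptions.

```python
class CommonDictionary:
    """Shared word dictionary: a word registered once becomes visible
    to every HI function that holds a reference to this dictionary."""
    def __init__(self):
        self._words = {}

    def register(self, word, attributes):
        self._words[word] = attributes

    def lookup(self, word):
        return self._words.get(word)

class HIFunction:
    """Stand-in for the standardized API: every HI function exposes
    the same process() entry point and shares one dictionary."""
    def __init__(self, dictionary):
        self.dictionary = dictionary

    def process(self, data):
        raise NotImplementedError

class CharacterRecognizer(HIFunction):
    def process(self, text):
        # Stub recognizer: split input and annotate each token with
        # whatever the shared dictionary knows about it.
        return [(w, self.dictionary.lookup(w)) for w in text.split()]
```

Because a speech synthesizer or machine translator would subclass the same base and hold the same dictionary, registering a new word once makes it available to all of them, which is the point of the common dictionary.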
PUBLICATIONS
1. Aoki, H. et al. "A Shot Classification Method to Select Effective
Key-frames for Video Browsing," Proc. ACM Multimedia '96, 1996.
2. Ono, K. et al. "Abstract Generation Based on Rhetorical Structure
Extraction," Proc. COLING '94, 1994.
3. Miike, S. et al. "A Full-Text Retrieval System with a Dynamic
Abstract Generation Function," Proc. SIGIR '94, 1994.