LE-EAGLES WP4-1C.1
November 97
Survey of Research on Dialogue Acts
by Martin Weisser
Types of Dialogues:
- appointments (VERBMOBIL)
- airline/travel inquiries
- developing plans to move trains and cargo from one city to another (TRAINS: "Coding Schemes for Spoken Dialogue Structure" & Dialog Act Markup in Several Layers [DAMSL])
- furnishing rooms (COCONUT)
- cooking recipes ("Instructions for Annotating Discourses")
- giving directions (Maptask)
Types of Analysis:
- Automatic: VERBMOBIL, TRAINS-93
- Manual:
Online marking tools like Nota Bene ("Instructions for Annotating Discourses"); dat (TRAINS); CLAN (CHILDES)
No tools (Maptask?)
Techniques for Identifying Topics
- keyword spotting + ? (VERBMOBIL)
- Manual semantic/pragmatic analysis of topic initiation. General underlying principle: an initiated subject has to be brought to a conclusion (cf. Maptask). Thus the beginnings and ends of "transactions" can be marked, although there does not seem to be general agreement about marking both [see further below]. In general, it seems to be assumed that the goal of the dialogue is achieved in as straightforward a manner as possible, although digressions or irrelevant material can occur.
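As a rough illustration of the keyword-spotting approach, a minimal sketch in Python. Note that the actual VERBMOBIL technique is not described here, and the topic keyword lists below are invented purely for the example:

```python
# Illustrative sketch of keyword spotting for topic identification.
# The topic labels and keyword sets are hypothetical examples.
TOPIC_KEYWORDS = {
    "appointment": {"meet", "tuesday", "schedule"},
    "travel": {"flight", "ticket", "airport"},
}

def spot_topic(utterance):
    """Return the topic whose keyword set overlaps most with the utterance."""
    words = set(utterance.lower().split())
    scores = {topic: len(words & kws) for topic, kws in TOPIC_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

print(spot_topic("could we meet on Tuesday"))  # appointment
```

Real systems would of course combine such spotting with further evidence (the "+ ?" above), since bare keyword overlap cannot distinguish topic initiation from mere mention.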
Annotation of Speech/Dialogue acts
The semantic/pragmatic annotation of Speech/Dialogue Acts according to the DAMSL model can be performed at "macro" or "micro" level. At the macro level utterances can be combined into larger units of annotation (called Segments) if they form one complete functional unit.
[However, the use of the term segment may be slightly problematical, as it may be confused with segments identified at the phonetic level. The term functional unit or functional utterance unit would thus be preferable, as it would be less likely to cause misunderstanding.]
At the micro level, individual utterances can be identified through one or a combination of Utterance Tags. The following is a summary of those tags, based on the categories described in Allen/Core. March 1997. "Draft of DAMSL: Dialog Act Markup in Several Layers".
Utterance tags:
These tags summarize the intentions of the speaker and the content of the utterance; they are classified into four main categories.
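As a purely illustrative sketch (not part of the DAMSL specification), a micro-level annotation could be represented as a record holding one slot per category; the field names and tag values below are examples only, and the DAMSL draft itself defines the actual inventory:

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical container for a DAMSL-style utterance annotation.
# Field names and example values are illustrative, not normative.
@dataclass
class UtteranceTags:
    text: str
    information_level: str = ""          # e.g. "task" vs. "communication"
    forward_function: List[str] = field(default_factory=list)   # e.g. "info-request"
    backward_function: List[str] = field(default_factory=list)  # e.g. "answer", "accept"

u = UtteranceTags(
    text="Could we meet on Tuesday, then?",
    information_level="task",
    forward_function=["info-request"],
)
print(u.forward_function)  # a single utterance may carry several tags at once
```

The list-valued slots reflect the point made above that an utterance can be identified through one tag or a combination of tags.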
Dagstuhl Conference
[Note: Listed here are only those points or issues that are not covered by or divergent from the new DAMSL manual.]
Five groups discussed different aspects of the annotations. Of the five categories discussed, we shall deal with only four, namely (1) "Forward-looking Communicative Function", (2) "Backward-looking Communicative Function", (3) "Segmentation" and (4) "Information Level and Information Status". The topic of "Coreference" has deliberately been excluded here, as it does not exclusively belong to the domain of dialogues and may thus be dealt with in a different work package.
Unresolved issues:
- Higher level discourse structures.
- Integration of coreference coding with forward and backward communicative function.
- Specifying a set of informational relations (or at least some very general categories).
- Coding of floor and topic control issues.
- Coding of significant non-linguistic signals, e.g., refusing to respond, silence as acceptance
Solutions to segmentation types and rules:
Three segment types:
- (1) regular segment boundaries: marked @; corresponds to what I refer to as a functional (utterance) unit above.
- (2) weak segment boundaries: optional subunits, marked *.
- (3) drop-in segment boundaries: marked $, which serve to indicate phenomena like self-repair or hesitations.
segmentation rules:
1. segment material that serves an illocutionary function (@).
2. when in doubt whether to segment or not, don’t segment.
3. if there are strong indicators, e.g. prosodic markers like a long pause, segment (@). (note: segment only in cases that are compatible with rule 1.)
4. in collaborative completions: segment at locations of speaker change (@).
5. optional: subsegment material into smaller units using weak boundaries (*) where the resulting units serve the same illocutionary function.
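The three boundary markers can be sketched in a few lines of Python. This is an illustration of the notation only, not an implementation of the segmentation rules themselves; the example utterance and the rendering function are invented:

```python
# Render a labelled sequence of units with the three Dagstuhl boundary
# markers: @ regular, * weak, $ drop-in.
BOUNDARY = {"regular": "@", "weak": "*", "drop-in": "$"}

def mark_segments(units):
    """units: list of (text, boundary_type) pairs; returns a marked string."""
    return " ".join(f"{BOUNDARY[kind]} {text}" for text, kind in units)

print(mark_segments([
    ("okay", "regular"),                      # own illocutionary function (rule 1)
    ("go to the left of the lake", "regular"),
    ("the uh", "drop-in"),                    # hesitation, marked $ (type 3)
    ("the small one", "weak"),                # optional subunit, marked * (rule 5)
]))
# @ okay @ go to the left of the lake $ the uh * the small one
```

Note that in this rendering only segment beginnings are marked, which already raises the notational problem discussed under the open issues below.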
Open issues:
- segmentation and prosody
- data representation (SGML coding)
- segmentation principles valid for all languages
- notational problems, such as whether to mark the beginnings of segments, their ends, or both, possibly by indexing
- In general, people at the Dagstuhl conference seemed to think that a 5-way distinction as proposed in the old version of the DAMSL manual [see my comment about this above] was too much and seemed to prefer a 3-way one, either TASK, ABOUT-TASK, NON-RELEVANT or TASK, COMMUNICATION, NON-RELEVANT.
- One thing that is not referred to in the DAMSL manual is information status. The group dealing with this issue at Dagstuhl proposes four schemes to make a distinction between old and new information:
(1) retain a simple distinction between old and new
(2) add a category irrelevant
- (3) subdivide old into (a) repetition (→ anaphora), (b) reformulation (≠ paraphrase) and (c) inference (→ bridging anaphora).
(4) define four categories (a) repetition, (b) reformulation, (c) inference and (d) new
[somehow the latter two schemes seem to coincide with each other, so presumably there must be some further distinction?]
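The apparent coincidence of schemes (3) and (4) can be made concrete by writing the four label sets out explicitly. This is only a restatement of the summary above in Python, with scheme (3) assumed to retain the category new from scheme (1):

```python
# The four alternative information-status schemes, as label sets.
# Scheme (3) is assumed to keep "new" alongside the subdivided "old".
SCHEMES = {
    1: {"old", "new"},
    2: {"old", "new", "irrelevant"},
    3: {"repetition", "reformulation", "inference", "new"},  # old subdivided
    4: {"repetition", "reformulation", "inference", "new"},
}
print(SCHEMES[3] == SCHEMES[4])  # True: the label sets are identical
```

On this reading the two schemes do indeed yield the same label inventory, so any further distinction would have to lie in how the labels are defined or applied rather than in the labels themselves.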
Prosodic Annotation
Prosodic labelling still presents one of the major problems in labelling any kind of spoken data. One difficulty is that it is sometimes hard to get access to appropriate tools for the operating-system platform one is working on, but even if those tools are available to the researcher/transcriber, it still remains difficult to make proper use of the prosodic information.
Simplified systems describing the use of intonation often resort to oversimplified categories, such as attributing a fall to any kind of statement and rises to either questions or non-final items in a list, but in reality it is not always that simple to associate a certain type of pitch movement with a certain type of speech act or even sentence type. A one-to-one mapping between pitch contours and speech acts is thus hardly achievable, and this will therefore sometimes present an obstacle to the automatic analysis of content.
A broad classification may sometimes help in establishing whether one may want to transcribe punctuation marks (if one decides to use them at all) as full stops, question marks or commas/semicolons. In general, however, a fair amount of knowledge of the language to be transcribed and of the information contained in the text to be analysed is still required, and even for well-researched languages like English not enough is yet known about how the use of certain intonational features can be related to speech acts.
At the level of segmentation, knowledge of intonational features may sometimes aid in establishing distinctions between levels of tone group boundaries and thus indicate whether an utterance may be complete or not, but even there a word of caution is necessary, as information from the speech signal may be misleading, especially with regard to non-final utterance elements that may correspond to minor tone groups. Again, this caveat applies mainly to automatic content analysis, since pauses, for example, can usually be identified perceptually by the transcriber quite easily, but may not necessarily be easily identifiable in the speech signal due to phenomena such as final lengthening.
As far as coding intonational information is concerned, a proper set of (preferably mnemonic) symbols needs to be found; most of the existing systems present only a rather feeble attempt to capture what is actually going on. The ToBI system, although widely used nowadays, does not capture pitch levels relative to the speaker's pitch range, and the combinations of symbols used to identify certain levels can be misleading, just as any multi-character sequence representing a single phenomenon can be misleading. The latter is also the problem with John Wells' X-SAMPA recommendations, where he proposes to use certain multi-character sequences to represent diacritics and intonation, and even to establish a separate tier for intonational information, which is, however, not physically separated from the rest of the segmental information. Systems like these may be easy enough for the computer to handle, but they make segmental/intonational representation more and more unreadable for the human interpreter.
The main drawback in coding segmental/intonational information is therefore still the absence of a universal character set across all platforms, which necessitates this kind of transliteration and can probably only be overcome if a move towards Unicode is promoted within both the speech and corpus linguistics communities.
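The readability point can be made concrete in a few lines of Python. The X-SAMPA diacritic shown is `_h` for aspiration, whose Unicode counterpart is the single modifier letter U+02B0; the comparison of string lengths is illustrative only:

```python
# Multi-character ASCII transliteration vs. a single Unicode code point.
ascii_form = "t_h"        # X-SAMPA: aspirated [t]; "_h" is a two-character diacritic
unicode_form = "t\u02b0"  # the same sound with U+02B0 (modifier letter small h)
print(len(ascii_form), len(unicode_form))  # 3 2
```

With Unicode, each phonetic symbol corresponds to one character, so the human-readable form and the machine-readable form coincide; with ASCII transliteration they necessarily diverge.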
Bibliography
Allen, J. & Core, M. 1997. "Draft of DAMSL: Dialog Act Markup in Several Layers".
Carletta/Dahlbäck/Reithinger/Walker. 1997. Dagstuhl Workshop report "Standards for Dialogue Coding in Natural Language Processing"
Carletta/Isard/Isard/Kowtko/Newlands/Doherty-Sneddon/Anderson. 1995. "HCRC Dialogue Structure Coding Manual". Association for Computational Linguistics.
Jekat/Klein/Maier/Maleck/Mast/Quantz. 1995. "Dialogue Acts in VERBMOBIL". VM-Report 65.
Llisterri, J. 1996. "Preliminary recommendations on Spoken Texts". EAGLES Document EAG-TCWGSPT/P. Expert Advisory Group on Language Engineering Standards.
Nakatani/Grosz/Ahn/Hirschberg. 1995. "Instructions for Annotating Discourses". Technical Report Nr. TR-21-95. Center for Research in Computing Technology, Harvard University: Cambridge, MA.
Traum, D. 1996. "Coding Schemes for Spoken Dialogue Structure". University of Geneva.
Wells, J. C. "Computer-coding the IPA: a proposed extension of SAMPA". London: UCL.