The OASIS Cover Pages: The Online Resource for Markup Language Technologies
Last modified: November 16, 2000
Transcriber - Speech Segmentation and Annotation DTD

[November 16, 2000] "[We present] 'Transcriber', a tool for assisting in the creation of speech corpora. Transcriber was designed for the manual segmentation and transcription of long duration broadcast news recordings, including annotation of speech turns, topics and acoustic conditions. It is highly portable, relying on the scripting language Tcl/Tk with extensions such as Snack for advanced audio functions and tcLex for lexical analysis, and has been tested on various Unix systems and Windows. The data format follows the XML standard with Unicode support for multilingual transcriptions. Distributed as free software in order to encourage the production of corpora, ease their sharing, increase user feedback and motivate software contributions, Transcriber has been in use for over a year in several countries. As a result of this collective experience, new requirements arose to support additional data formats, video control, and a better management of conversational speech. Using the annotation graphs framework recently formalized, adaptation of the tool towards new tasks and support of different data formats will become easier..."

"Transcriber is free software for transcribing and annotating digital audio, aimed initially at transcription of broadcast news data. Its user interface is written in Tcl/Tk. It uses the same transcription formats as the LDC's Broadcast News data, and has also been adapted for XML I/O. It was developed by Claude Barras and Edouard Geoffrois at DGA in Paris, in collaboration with LDC (UPenn). A new version of Transcriber will be based on the annotation graph model."

References:

  • UPenn Transcriber web site

  • European web site

  • DTD version 1.3

  • XML sample transcription file

  • Transcriber download

  • Transcriber publications

  • Transcriber Reference

  • [November 16, 2000] "Transcribing with Annotation Graphs." By Edouard Geoffrois, Claude Barras, Steven Bird, and Zhibiao Wu. Presented at the Second International Conference on Language Resources and Evaluation (LREC-2000, May 31 - June 2, 2000, Athens, Greece). "Transcriber is a tool for manual annotation of large speech files. It was originally designed for the broadcast news transcription task. The annotation file format was derived from previous formats used for this task, and many related features were hard-coded. In this paper we present a generalization of the tool based on the annotation graph formalism, and on a more modular design. This will allow us to address new tasks, while retaining Transcriber's simple, crisp user-interface which is critical for user acceptance. Transcriber is described more extensively in other articles and in its reference manual (available on the web site and in the tool itself in the online help). [The interface] consists mainly of two windows, one for displaying and editing the transcription, and one for displaying the signal waveform and the segmentation. The annotations include various information: orthographic transcription, speech turns, topic sections, background conditions, and various events. The data format is XML, and a DTD controls the validity of the data. This format used for file input/output is also used directly as the internal data structure. Therefore, no conversion is needed for input/output. But the major drawback is the strong dependency of the code on the file format, so that modifications in the format need to be propagated in the code. The annotation graph model provides a general-purpose abstraction layer between physical annotation formats and graphical user interfaces. As a consequence, the connections between this logical model and various physical and graphical representations can be fully modularized. 
New annotation formats and new user-interfaces to an annotation task can thus be implemented as pluggable components. The annotation graph data model is composed of two low-level structures -- nodes and arcs -- and two high-level structures -- graphs and subgraphs. A graph object is a collection of zero or more arcs, each specifying an identifier, a type, and some content consisting of domain-specific attributes and values. An arc also references a start and end node, and each node provides an optional temporal offset. This temporal offset may be qualified with a 'timeline', which is a symbolic name for a collection of signal files which are temporally co-extensive and whose times can be meaningfully compared. Node and arc identifiers may also be qualified with a user-specific namespace, to avoid collisions when multiple independent annotations are combined..."

  • "Transcriber: a Free Tool for Segmenting, Labeling and Transcribing Speech." By C. Barras, E. Geoffrois, Z. Wu, and M. Liberman. First International Conference on Language Resources and Evaluation (LREC), pages 1373-1376, May 1998.

  • "Transcriber: development and use of a tool for assisting speech corpora production." By Claude Barras, Edouard Geoffrois, Zhibiao Wu, and Mark Liberman. To be published in Speech Communication, Volume 33, Numbers 1-2 (January 2001). Special Issue on Speech Annotation and Corpus Tools. "We present 'Transcriber', a tool for assisting in the creation of speech corpora, and describe some aspects of its development and use. Transcriber was designed for the manual segmentation and transcription of long duration broadcast news recordings, including annotation of speech turns, topics and acoustic conditions. It is highly portable, relying on the scripting language Tcl/Tk with extensions such as Snack for advanced audio functions and tcLex for lexical analysis, and has been tested on various Unix systems and Windows. The data format follows the XML standard with Unicode support for multilingual transcriptions. Distributed as free software in order to encourage the production of corpora, ease their sharing, increase user feedback and motivate software contributions, Transcriber has been in use for over a year in several countries. As a result of this collective experience, new requirements arose to support additional data formats, video control, and a better management of conversational speech. Using the annotation graphs framework recently formalized, adaptation of the tool towards new tasks and support of different data formats will become easier... Segmentation levels of a transcription: transcriptions are complex objects, and a structured machine-readable format is needed. We considered SGML (Standard Generalized Markup Language) and its more recent subset XML (Extensible Markup Language). Both allow a document to be structured as a tree. Each node of the tree contains a set of attributes, each with a value. The syntax used in the document can be specified in a Document Type Declaration (DTD). 
Tools exist for automatically ensuring the well-formedness and validity of a document, that is, that it correctly follows the SGML or XML syntax as well as its specific DTD. More importantly, SGML and XML are widespread standards, which helps sharing documents. In addition, they support Unicode character codes. Automatic processing of XML documents is much easier than that of SGML documents, and thus XML was adopted. The format was designed as being backward compatible with a previous format used at the LDC for the DARPA Broadcast News evaluations. The transcriptions have three hierarchically embedded layers of segmentation (orthographic transcription, speaker turns, sections), plus a fourth level of segmentation (acoustic background conditions) which is independent of the other three. A global list of speakers along with their attributes is also managed inside a transcription, as is a list of topics. Figure 3 shows a manually indented sample of a transcription file corresponding to the screen shot of Figure 1. In our case, the validation of a document is not enough to ensure its logical consistency; indeed, some properties -- e.g., the fact that the 'startTime' and 'endTime' attributes must bear numerical values which are in increasing order, or that each of the four types of segmentation is constrained to be a partition of the whole signal -- exceed the capabilities of a DTD and have to be verified afterwards in the application. Some of these issues could be addressed using CSS (Cascading Style Sheets) and XSL (Extensible Stylesheet Language) which aim to provide more complex manipulations of XML files..."
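
The Speech Communication article notes that a DTD cannot express that 'startTime' and 'endTime' are numeric and increasing, or that each segmentation layer partitions the whole signal, so those checks fall to the application. The following is a minimal Python sketch of such checks, not Transcriber's own code (which is Tcl/Tk). It assumes each layer can be read as a list of (startTime, endTime) pairs. The function names `segments_from_xml` and `check_partition`, the tolerance `tol`, and the inline sample are all hypothetical. The sample is a simplified, hand-written fragment whose `Section` and `Turn` element names are assumed from the Transcriber DTD; it is not a complete file valid against DTD version 1.3.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Hypothetical helpers; Transcriber itself performs its checks in Tcl/Tk.
Segment = Tuple[float, float]  # (startTime, endTime) in seconds


def segments_from_xml(xml_text: str, tag: str) -> List[Segment]:
    """Collect (startTime, endTime) pairs from every `tag` element in document order.
    float() raises if an attribute is missing or non-numeric, which a DTD cannot catch."""
    root = ET.fromstring(xml_text)
    return [(float(el.get("startTime")), float(el.get("endTime")))
            for el in root.iter(tag)]


def check_partition(segments: List[Segment], signal_start: float,
                    signal_end: float, tol: float = 1e-6) -> List[str]:
    """Return a list of problems; an empty list means the segments are in
    increasing order and cover [signal_start, signal_end] with no gap or overlap."""
    if not segments:
        return ["no segments"]
    problems = []
    expected = signal_start
    for i, (start, end) in enumerate(segments):
        if not start < end:
            problems.append(f"segment {i}: startTime {start} is not before endTime {end}")
        if abs(start - expected) > tol:
            kind = "gap" if start > expected else "overlap"
            problems.append(f"segment {i}: {kind} (expected start {expected}, got {start})")
        expected = end
    if abs(expected - signal_end) > tol:
        problems.append(f"last segment ends at {expected}, signal ends at {signal_end}")
    return problems


# Simplified, hand-written sample (element names assumed from the Transcriber DTD).
sample = """<Trans><Episode>
  <Section type="report" startTime="0" endTime="12.5">
    <Turn startTime="0" endTime="7.2"/>
    <Turn startTime="7.2" endTime="12.5"/>
  </Section>
  <Section type="nontrans" startTime="12.5" endTime="20">
    <Turn startTime="12.5" endTime="20"/>
  </Section>
</Episode></Trans>"""

# prints "Section []" then "Turn []": both layers partition the 0-20 s signal
for layer in ("Section", "Turn"):
    print(layer, check_partition(segments_from_xml(sample, layer), 0.0, 20.0))
```

Checking each layer separately mirrors the article's point that every type of segmentation must independently cover the whole signal; verifying that turns nest inside their enclosing sections would be a further check of the same kind.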

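The annotation graph model summarized in the LREC-2000 paper can be made concrete with a few Python dataclasses. This sketch covers only the abstractions as described there: nodes with an optional, timeline-qualified temporal offset; typed arcs with an identifier, attribute/value content, and start and end nodes; and namespace-qualified identifiers. It is not the API of the annotation graph toolkit or of any Transcriber release. Every class, method, and example value here (`Node`, `Arc`, `Graph`, `qualify`, the `turn`/`word` arc types, the `news_audio` timeline) is hypothetical, and subgraphs are omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical sketch of the annotation graph abstractions; not a real library API.


@dataclass(frozen=True)
class Node:
    id: str                          # namespace-qualified identifier, e.g. "trans1:n0"
    offset: Optional[float] = None   # optional temporal offset (seconds)
    timeline: Optional[str] = None   # names a set of temporally co-extensive signal files


@dataclass
class Arc:
    id: str
    type: str                        # annotation type, e.g. "turn" or "word"
    start: Node
    end: Node
    content: Dict[str, str] = field(default_factory=dict)  # domain-specific attribute/value pairs


@dataclass
class Graph:
    arcs: List[Arc] = field(default_factory=list)  # a graph is a collection of zero or more arcs

    def add(self, arc: Arc) -> None:
        self.arcs.append(arc)

    def arcs_of_type(self, arc_type: str) -> List[Arc]:
        return [a for a in self.arcs if a.type == arc_type]

    def overlapping(self, t0: float, t1: float) -> List[Arc]:
        """Arcs whose time span intersects [t0, t1); arcs with an unanchored node are skipped."""
        hits = []
        for a in self.arcs:
            s, e = a.start.offset, a.end.offset
            if s is not None and e is not None and s < t1 and e > t0:
                hits.append(a)
        return hits


def qualify(namespace: str, local_id: str) -> str:
    """Prefix an identifier with a user-specific namespace to avoid collisions."""
    return f"{namespace}:{local_id}"


# Usage: one speaker turn spanning two words, all on the same (hypothetical) timeline.
n0 = Node(qualify("trans1", "n0"), 0.0, "news_audio")
n1 = Node(qualify("trans1", "n1"), 3.2, "news_audio")
n2 = Node(qualify("trans1", "n2"), 7.5, "news_audio")

g = Graph()
g.add(Arc(qualify("trans1", "a0"), "turn", n0, n2, {"speaker": "spk1"}))
g.add(Arc(qualify("trans1", "a1"), "word", n0, n1, {"orth": "good"}))
g.add(Arc(qualify("trans1", "a2"), "word", n1, n2, {"orth": "evening"}))

print([a.id for a in g.overlapping(4.0, 5.0)])  # prints ['trans1:a0', 'trans1:a2']
```

Because each segment becomes an arc between time-anchored nodes rather than a nested XML element, a new file format or display only has to map to and from these objects. This is the modularity argument the paper makes for decoupling the tool from its file format.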

Document URI: http://xml.coverpages.org/transcriber.html
Robin Cover, Editor: robin@oasis-open.org