DITA Proposed Feature #12026

Extend the DITA 1.1 glossary specialization for more complete support of glossary, linguistic, and semantic applications.

Longer description

DITA 1.1 introduced a simple glossary specialization to meet basic needs for publication as part of a bookmap.

The DITA 1.1 glossary specialization, however, is too simple to support many common glossary applications. For instance, many content publishers need to distinguish an abbreviation from the full term. In addition, a more complete representation of terminology can support processing such as the following:

Translation
The glossary identifies key terminology for human translators, as well as the meaning that each term must retain in translation. In addition, identifying special terms and supplying terminology data for them provides a dictionary that can support automated translation of term mentions and of the content around those mentions.

Key terminology standards include TBX.

Semantic search
The glossary identifies the subjects associated with specific terms, which makes it possible to index content by the meaning of terms rather than by their surface forms in the text.

Key semantic standards include TopicMaps and SKOS.

To enable these applications, DITA 1.2 allows additional detail in the glossary definition.

Statement of Requirement

Use Cases

Scope

Moderate: adding elements to one specialized topic

Technical Requirements

The expanded glossentry topic provides the following elements:

Element: <glossentry>
Base: <concept>
Content:
  1. one <glossterm>, <glossAbbreviation>, or <glossAcronym>
  2. one <glossdef>
  3. zero or one <prolog>
  4. zero or one <glossBody>
  5. zero or one <related-links>
Purpose: Defines a single reusable term subject within the glossary. The <glossterm>, <glossAbbreviation>, or <glossAcronym> gives the form of the main term.

Elements: <glossterm>, <glossAbbreviation>, <glossAcronym>, <glossFullForm>, <glossShortForm>, <glossSynonym>
Base: <title>
Content: title content for <glossterm> for consistency with DITA 1.1; text, <term>, <keyword>, or <tm> content for the other <title> specializations
Purpose: Identifies the role of one term with respect to other variant terms. A <glossFullForm> alternate term should be specified only if the main term is an abbreviation, acronym, or some other shortened form.

Element: <glossdef>
Base: <abstract>
Content: section content or <shortdesc>
Purpose: Defines the term subject for users.

Element: <glossBody>
Base: <conbody>
Content:
  1. one <glossPartOfSpeech>
  2. zero or one <glossStatus>
  3. zero or more <glossProperty>
  4. zero or one <glossUsage>
  5. zero or one <glossScopeNote>
  6. zero or more <glossSymbol>
  7. zero or more <note>
  8. zero or more <glossAlt>
Purpose: Represents terminology detail. The part of speech applies to all term variants and encourages consistency of the variants with the main term. The status indicates the overall status of the term subject. The <glossProperty> and <note> elements are extension points for more detailed terminology definitions (such as the linguistic properties from basic or full TBX).

Element: <glossPartOfSpeech>
Base: <data>
Content: value attribute enumerated as noun, properNoun, verb, adjective, or adverb; empty content
Purpose: Identifies the part of speech for the main and alternate terms (using the proposed controlled values mechanism if approved but extensible with validation by processing if not) with a default of noun. By definition alternate terms must have the same part of speech as the main term to have a common term subject. The part of speech must be specified when glossary detail is provided.

Element: <glossStatus>
Base: <data>
Content: value attribute enumerated as restricted, prohibited, or obsolete; empty content
Purpose: Identifies the allowable use of a main or alternate term (using the proposed controlled values mechanism if approved but extensible with validation by processing if not). If the status isn't specified, the main term provides a preferred term and an alternate term provides an allowed term.

Element: <glossProperty>
Base: <data>
Content: data content
Purpose: An extension point for linguistic or semantic properties such as the gender of the term.

Element: <glossUsage>
Base: <note>
Content: note content
Purpose: Any information about the correct usage of the term.

Element: <glossScopeNote>
Base: <note>
Content: note content
Purpose: An explanation of the limitations on the applicability of the term subject.

Element: <glossSymbol>
Base: <image>
Content: image content
Purpose: Identifies a standard icon associated with the term subject.

Element: <glossAlt>
Base: <section>
Content:
  1. one <glossAbbreviation>, <glossAcronym>, <glossFullForm>, <glossShortForm>, or <glossSynonym>
  2. zero or one <glossStatus>
  3. zero or more <glossProperty>
  4. zero or one <glossUsage>
  5. zero or more <note>
  6. zero or more <glossAlternateFor>
Purpose: Identifies a variant term for the main term. Any list of alternative terms is, of course, specific to the language and may get longer or shorter during translation.

Element: <glossAlternateFor>
Base: <xref>
Content: empty content
Purpose: Indicates when a variant term has a relationship to another variant term as well as to the main term.

The following example shows the use of the expanded glossentry topic to define main and alternate terms:

<glossentry id="usbfd">
  <glossterm>USB flash drive</glossterm>
  <glossdef>A small portable drive.</glossdef>
  <glossBody>
    <glossPartOfSpeech value="noun"/>
    <glossUsage>Do not use in upper case (as in "USB Flash Drive") so as not to suggest that this is a trademark.</glossUsage>
    <glossAlt>
      <glossAcronym>UFD</glossAcronym>
      <glossUsage>Explain the acronym on first occurrence.</glossUsage>
    </glossAlt>
    <glossAlt id="memoryStick">
      <glossSynonym>memory stick</glossSynonym>
      <glossUsage>This is a colloquial term.</glossUsage>
    </glossAlt>
    <glossAlt>
      <glossAbbreviation>stick</glossAbbreviation>
      <glossStatus value="prohibited"/>
      <glossUsage>This is too colloquial.</glossUsage>
      <glossAlternateFor href="#usbfd/memoryStick"/>
    </glossAlt>
    <glossAlt>
      <glossAbbreviation>flash</glossAbbreviation>
      <glossStatus value="prohibited"/>
      <glossUsage>This short form is ambiguous.</glossUsage>
    </glossAlt>
  </glossBody>
</glossentry>

The additional markup is optional, so an adopter with simpler requirements can capture the same list of alternative terms without the detailed usage notes:

<glossentry id="usbfd">
  <glossterm>USB flash drive</glossterm>
  <glossdef>A small portable drive.</glossdef>
  <glossBody>
    <glossPartOfSpeech value="noun"/>
    <glossAlt>
      <glossAcronym>UFD</glossAcronym>
    </glossAlt>
    <glossAlt>
      <glossSynonym>memory stick</glossSynonym>
    </glossAlt>
    <glossAlt>
      <glossAbbreviation>stick</glossAbbreviation>
    </glossAlt>
    <glossAlt>
      <glossAbbreviation>flash</glossAbbreviation>
    </glossAlt>
  </glossBody>
</glossentry>
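
The content model above also allows the main term itself to be a shortened form, with the expansion supplied as a <glossFullForm> alternate. The following is a minimal sketch of that case, reusing the terms from the examples above. The id value "ufd" is an assumption made for illustration and does not come from the proposal.

```xml
<!-- Sketch only: the acronym is the main term, so a full-form alternate is appropriate. -->
<glossentry id="ufd">
  <glossAcronym>UFD</glossAcronym>
  <glossdef>A small portable drive.</glossdef>
  <glossBody>
    <!-- Required whenever glossBody detail is given; applies to all variants. -->
    <glossPartOfSpeech value="noun"/>
    <glossAlt>
      <!-- Allowed here only because the main term is a shortened form. -->
      <glossFullForm>USB flash drive</glossFullForm>
    </glossAlt>
  </glossBody>
</glossentry>
```

Because no <glossStatus> is given, the acronym would be read as the preferred term and the full form as an allowed term.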

Relationships between term subjects (such as the hypernym or kind-of relationship and the holonym or part-of relationship specified by WordNet) can be specified for glossary topics by a subject scheme map. (See Proposal 12031 for Controlled Values.)
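
As a rough illustration of a kind-of relationship between two term subjects, a subject scheme map might look like the sketch below. It assumes the <subjectScheme>, <subjectdef>, and <hasKind> element names from the controlled values work. The keys and file names (portableDrive, portableDrive.dita, usbfd.dita) are hypothetical, and the authoritative markup is whatever Proposal 12031 defines.

```xml
<!-- Sketch only: element names assumed from the controlled values proposal. -->
<subjectScheme>
  <!-- Subjects nested inside hasKind are read as "kind of" their parent subject. -->
  <hasKind>
    <!-- Hypothetical broader term subject. -->
    <subjectdef keys="portableDrive" href="portableDrive.dita">
      <!-- The glossentry topic from the examples above, assumed to live in usbfd.dita. -->
      <subjectdef keys="usbfd" href="usbfd.dita"/>
    </subjectdef>
  </hasKind>
</subjectScheme>
```

Keeping the relationships in a map rather than in the glossentry topic lets the same term definitions be reused under different classifications.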

Finally, for authoring convenience, a <glossgroup> topic can contain multiple <glossentry> topics:

Element: <glossgroup>
Base: <concept>
Content:
  1. one <title>
  2. zero or one <prolog>
  3. zero or more <glossgroup> or <glossentry> topics
Purpose: Groups a set of glossary entries for some purpose, for instance, for convenient maintenance based on the alphabetic collation of the main term or on the subject matter covered by the terms.
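
A minimal sketch of a <glossgroup> that follows this content model is shown below. The group id, the titles, and the second entry (hdd) are invented for illustration. Only the usbfd entry comes from the examples earlier in this proposal.

```xml
<!-- Sketch only: groups entries by subject matter for easier maintenance. -->
<glossgroup id="storageTerms">
  <title>Storage terms</title>
  <glossentry id="usbfd">
    <glossterm>USB flash drive</glossterm>
    <glossdef>A small portable drive.</glossdef>
  </glossentry>
  <!-- Hypothetical second entry, added to show that a group can hold several topics. -->
  <glossentry id="hdd">
    <glossterm>hard disk drive</glossterm>
    <glossdef>A storage device that uses rotating magnetic disks.</glossdef>
  </glossentry>
</glossgroup>
```

Since a <glossgroup> can itself contain <glossgroup> topics, the same pattern could be nested, for example one group per letter inside a group for the whole glossary.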

New or Changed Specification Language

Addition to the language reference for the glossentry topic.

Costs

The DTD and Schema must be updated to implement the new elements. Processing for publishing does not change.

Benefits