DITA Proposed Feature #12026 and #12038

Build on the DITA 1.1 glossary specialization to provide more complete support for glossary, linguistic, and semantic applications, and to assist in resolving and handling abbreviated forms (such as acronyms, general abbreviations, and short forms) in the source and target text of DITA documents.

Longer description

DITA 1.1 introduced a simple glossary specialization to meet basic needs for publication as part of bookmap.

The DITA 1.1 glossary specialization, however, is too simple to support many common glossary applications. For instance, many content publishers need to distinguish an abbreviation from the full term. In addition, a more complete representation of terminology can support processing such as the following:

Translation
The glossary identifies key terminology for human translators, along with the meaning that each term must retain in translation. In addition, identifying special terms and recording terminology data for them produces a dictionary that can help enable automated translation of term mentions and of the content around them.

Key terminology standards include TBX (TermBase eXchange).

Semantic search
The glossary identifies the subjects associated with specific terms, which can enable indexing content based on the meaning of terms rather than the surface forms in the text.

Key semantic standards include Topic Maps and SKOS (Simple Knowledge Organization System).

Handling of abbreviated forms

Abbreviated forms, such as acronyms, are ubiquitous in technical documentation. They are a special case of glossary term because they need to be expanded to the full form under some conditions (such as the first occurrence within a printed document). In electronically published documents, expansions of abbreviated forms can also be made available through a hyperlink or tooltip. In addition, the expanded text of abbreviated forms should be available for automatic inclusion in the glossary entries for the publication. This proposal covers all types of abbreviations, such as acronyms, initialisms, apocope, clipping, elision, syncope, syllabic abbreviation, and portmanteau.

To enable these applications, DITA 1.2 allows additional detail in the glossary definition and provides additional methods for referring to abbreviated terms, which can deliver either the abbreviated or the expanded form of the term.

Statement of Requirement

The following requirements apply to glossary terms generally:

In addition, abbreviated forms and their translations require special handling:

For example, the expansion of an abbreviated form in English might consist of the abbreviated form followed by its full form in parentheses. By contrast, the translated version might consist of the expanded form followed by the abbreviated form in parentheses. The translated version might also include both the English expanded form and its translation.

For example, in a Polish book on Java Web programming, the first reference to JSP may appear as follows:

"JSP (ang. Java Server Pages)"

Another example, from a publication concerning OASIS:

"OASIS (ang. Organization for the Advancement of Structured Information Standards—organizacja dla propagowania strukturalnych standardów informacyjnych)"

In the first example, the translator assumes the reader will not require a translation of the English abbreviated form. In the second example, the translator assumes the reader may not understand the English expanded form and therefore adds the translation.
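The expansion patterns described above can be sketched as a small formatting function. This is an illustrative Python sketch only: the pattern names ("abbrev-first", "full-first", "abbrev-source-gloss") and the source_label and translation parameters are hypothetical and are not part of the proposed markup. A real processor would take the full form and any translation from the glossentry topic and its localized counterpart.

```python
def expand(abbrev: str, full_form: str, pattern: str,
           source_label: str = "", translation: str = "") -> str:
    """Return a first-occurrence expansion of an abbreviated form.

    Hypothetical helper: the pattern names and parameters are illustrative only.
    """
    if pattern == "abbrev-first":
        # Abbreviated form, then the full form in parentheses.
        return f"{abbrev} ({full_form})"
    if pattern == "full-first":
        # Expanded form, then the abbreviated form in parentheses.
        return f"{full_form} ({abbrev})"
    if pattern == "abbrev-source-gloss":
        # Keep the untranslated full form, prefixed by a source-language label
        # (for example "ang." in Polish), and optionally append its translation.
        gloss = f"{source_label} {full_form}".strip()
        if translation:
            gloss += "\u2014" + translation  # dash separator, as in the OASIS example
        return f"{abbrev} ({gloss})"
    raise ValueError(f"unknown pattern: {pattern!r}")


print(expand("JSP", "Java Server Pages", "abbrev-source-gloss", source_label="ang."))
# prints: JSP (ang. Java Server Pages)
```

Keeping the pattern separate from the term data reflects the point above: the same glossary entry can yield different expansions depending on the target language and the assumed reader.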

Use Cases

Scope

Moderate: adding elements to one specialized topic, providing a map domain for defining keys, and providing an element domain for referring to keys.

Technical Requirements

The expanded glossentry topic provides the following elements:

<glossentry> (base element: <concept>)
Content:
  1. one <glossterm>, <glossAbbreviation>, or <glossAcronym>
  2. one <glossdef>
  3. zero or one <prolog>
  4. zero or one <glossBody>
  5. zero or one <related-links>
Purpose: Defines a single reusable term subject within the glossary. The <glossterm>, <glossAbbreviation>, or <glossAcronym> gives the form of the preferred term.

<glossterm>, <glossAbbreviation>, <glossAcronym>, <glossFullForm>, <glossShortForm>, <glossSurfaceForm>, <glossSynonym> (base element: <title>)
Content: title content for <glossterm>, for consistency with DITA 1.1; text, <term>, <keyword>, or <tm> content for the other <title> specializations.
Purpose: Identifies the role of one term with respect to the other variant terms. A <glossFullForm> alternate term should be specified only if the preferred term is an abbreviation, acronym, or other shortened form. A <glossSurfaceForm>, however, can be specified as an expansion of any preferred term.

<glossdef> (base element: <abstract>)
Content: section content or <shortdesc>
Purpose: Defines the term subject for users.

<glossBody> (base element: <conbody>)
Content:
  1. one <glossPartOfSpeech>
  2. zero or one <glossStatus>
  3. zero or more <glossProperty>
  4. zero or one <glossUsage>
  5. zero or one <glossScopeNote>
  6. zero or more <glossSymbol>
  7. zero or more <note>
  8. zero or more <glossAlt>
Purpose: Represents terminology detail. The part of speech applies to all term variants and encourages consistency between the variants and the preferred term. The status indicates the overall status of the term subject. The <glossProperty> and <note> elements are extension points for more detailed terminology definitions (such as the linguistic properties from basic or full TBX).

<glossPartOfSpeech> (base element: <data>)
Content: empty; value attribute enumerated as noun, properNoun, verb, adjective, or adverb
Purpose: Identifies the part of speech for the preferred and alternate terms, with a default of noun. The values use the proposed controlled values mechanism if that proposal is approved; otherwise, they remain extensible, with validation performed by processing. By definition, alternate terms must have the same part of speech as the preferred term to share a common term subject. The part of speech must be specified whenever glossary detail is provided.

<glossStatus> (base element: <data>)
Content: empty; value attribute enumerated as restricted, prohibited, or obsolete
Purpose: Identifies the allowable use of a preferred or alternate term. The values use the proposed controlled values mechanism if that proposal is approved; otherwise, they remain extensible, with validation performed by processing. If no status is specified, the preferred term has preferred status and an alternate term has allowed status.

<glossProperty> (base element: <data>)
Content: data content
Purpose: An extension point for linguistic or semantic properties, such as the gender of the term.

<glossUsage> (base element: <note>)
Content: note content
Purpose: Any information about the correct usage of the term.

<glossScopeNote> (base element: <note>)
Content: note content
Purpose: An explanation of the limitations on the applicability of the term subject.

<glossSymbol> (base element: <image>)
Content: image content
Purpose: Identifies a standard icon associated with the term subject.

<glossAlt> (base element: <section>)
Content:
  1. one <glossAbbreviation>, <glossAcronym>, <glossFullForm>, <glossShortForm>, or <glossSynonym>
  2. zero or one <glossStatus>
  3. zero or more <glossProperty>
  4. zero or one <glossUsage>
  5. zero or more <note>
  6. zero or more <glossAlternateFor>
Purpose: Identifies a variant term for the preferred term. Any list of alternative terms is specific to the language and may become longer or shorter during translation.

<glossAlternateFor> (base element: <xref>)
Content: empty
Purpose: Indicates that a variant term has a relationship to another variant term as well as to the preferred term.

The following example shows the use of the expanded glossentry topic to define preferred and alternate terms:

<glossentry id="usbfd">
  <glossterm>USB flash drive</glossterm>
  <glossdef>A small portable drive.</glossdef>
  <glossBody>
    <glossPartOfSpeech value="noun"/>
    <glossUsage>Do not capitalize (as in "USB Flash Drive"), so as not to suggest that this is a trademark.</glossUsage>
    <glossAlt>
      <glossAcronym>UFD</glossAcronym>
      <glossUsage>Explain the acronym on first occurrence.</glossUsage>
    </glossAlt>
    <glossAlt id="memoryStick">
      <glossSynonym>memory stick</glossSynonym>
      <glossUsage>This is a colloquial term.</glossUsage>
    </glossAlt>
    <glossAlt>
      <glossAbbreviation>stick</glossAbbreviation>
      <glossStatus value="prohibited"/>
      <glossUsage>This is too colloquial.</glossUsage>
      <glossAlternateFor href="#memoryStick"/>
    </glossAlt>
    <glossAlt>
      <glossAbbreviation>flash</glossAbbreviation>
      <glossStatus value="prohibited"/>
      <glossUsage>This short form is ambiguous.</glossUsage>
    </glossAlt>
  </glossBody>
</glossentry>
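To show how the status defaults might be applied, the following Python sketch reads a glossentry like the one above with the standard xml.etree.ElementTree module and maps each term to its effective status. It is a sketch under stated assumptions: the function names term_statuses and status_of are hypothetical, <glossUsage> elements are omitted from the sample for brevity, and a <glossStatus> directly inside <glossBody> is taken to apply to the preferred term.

```python
import xml.etree.ElementTree as ET

# Term-holding children of <glossAlt>, per the element table.
VARIANT_TAGS = {"glossAbbreviation", "glossAcronym", "glossFullForm",
                "glossShortForm", "glossSynonym"}

SAMPLE = """
<glossentry id="usbfd">
  <glossterm>USB flash drive</glossterm>
  <glossdef>A small portable drive.</glossdef>
  <glossBody>
    <glossPartOfSpeech value="noun"/>
    <glossAlt><glossAcronym>UFD</glossAcronym></glossAlt>
    <glossAlt id="memoryStick"><glossSynonym>memory stick</glossSynonym></glossAlt>
    <glossAlt>
      <glossAbbreviation>stick</glossAbbreviation>
      <glossStatus value="prohibited"/>
      <glossAlternateFor href="#memoryStick"/>
    </glossAlt>
    <glossAlt>
      <glossAbbreviation>flash</glossAbbreviation>
      <glossStatus value="prohibited"/>
    </glossAlt>
  </glossBody>
</glossentry>
"""


def status_of(parent, default):
    """Return the value of the direct <glossStatus> child, or the default."""
    status = parent.find("glossStatus")
    return status.get("value") if status is not None else default


def term_statuses(glossentry_xml):
    """Map each term in a glossentry to its effective status (hypothetical helper)."""
    entry = ET.fromstring(glossentry_xml.strip())
    preferred = entry[0]  # glossterm, glossAbbreviation, or glossAcronym
    body = entry.find("glossBody")
    if body is None:
        return {preferred.text: "preferred"}
    statuses = {preferred.text: status_of(body, "preferred")}
    for alt in body.findall("glossAlt"):
        term = next((c for c in alt if c.tag in VARIANT_TAGS), None)
        if term is not None:
            statuses[term.text] = status_of(alt, "allowed")
    return statuses


prohibited = [t for t, s in term_statuses(SAMPLE).items() if s == "prohibited"]
print(prohibited)  # ['stick', 'flash']
```

A terminology checker could use a table like this to flag prohibited variants in draft content, while the unflagged variants remain available as allowed alternatives.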

The usage and status markup is optional, so an adopter with simpler requirements can capture a list of alternative terms without the burden of full terminology detail. For instance, the following example shows a minimal entry defining an abbreviation, its full form, and its surface form:

<glossentry id="abs">
  <glossAcronym>ABS</glossAcronym>
  <glossdef>A brake technology that minimizes skids.</glossdef>
  <glossBody>
    <glossPartOfSpeech value="noun"/>
    <glossAlt>
      <glossFullForm>Anti-lock Braking System</glossFullForm>
    </glossAlt>
    <glossAlt>
      <glossSurfaceForm>Anti-lock Braking System (ABS)</glossSurfaceForm>
    </glossAlt>
  </glossBody>
</glossentry>
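Even this minimal entry gives a processor enough to locate the preferred term and its expansions. The Python sketch below uses the standard xml.etree.ElementTree module to extract them. The summarize function and its result keys are hypothetical names chosen for illustration.

```python
import xml.etree.ElementTree as ET

ABS_ENTRY = """
<glossentry id="abs">
  <glossAcronym>ABS</glossAcronym>
  <glossdef>A brake technology that minimizes skids.</glossdef>
  <glossBody>
    <glossPartOfSpeech value="noun"/>
    <glossAlt>
      <glossFullForm>Anti-lock Braking System</glossFullForm>
    </glossAlt>
    <glossAlt>
      <glossSurfaceForm>Anti-lock Braking System (ABS)</glossSurfaceForm>
    </glossAlt>
  </glossBody>
</glossentry>
"""

# The preferred term is given by one of these elements.
PREFERRED_TAGS = ("glossterm", "glossAbbreviation", "glossAcronym")


def summarize(glossentry_xml):
    """Return the preferred term, its kind, and any full or surface form."""
    entry = ET.fromstring(glossentry_xml.strip())
    preferred = next(c for c in entry if c.tag in PREFERRED_TAGS)
    return {
        "preferred": preferred.text,
        "kind": preferred.tag,
        # findtext() returns None when no matching element exists.
        "full_form": entry.findtext("glossBody/glossAlt/glossFullForm"),
        "surface_form": entry.findtext("glossBody/glossAlt/glossSurfaceForm"),
    }


print(summarize(ABS_ENTRY)["surface_form"])  # Anti-lock Braking System (ABS)
```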

Two new domains complement the glossary topic:

The following example uses the "car.abs" key to refer to the glossary topic shown previously for the ABS abbreviation:

<map>
  ...
  <glossref keys="car.abs" href="abs.dita"/>
  ...
  <topicref type="task" href="maintcar.dita"/>
  ...
</map>

<task id="maintcar">
  <title>Maintaining your car</title>
  ...
    <info>The <abbreviation keyref="car.abs"/> system will prevent the car from skidding in adverse weather conditions.</info>
  ...
</task>
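Key resolution for this example can be sketched in Python with xml.etree.ElementTree. The sketch is a simplification under stated assumptions: it collects keys from any map element that has a keys attribute, lets the first definition in document order win, ignores submaps and keyrefs that also name an element ID, and replaces the elided parts of the map and task with minimal placeholder markup. The function names build_key_table and resolve_keyrefs are hypothetical.

```python
import xml.etree.ElementTree as ET

MAP = """
<map>
  <glossref keys="car.abs" href="abs.dita"/>
  <topicref type="task" href="maintcar.dita"/>
</map>
"""

TASK = """
<task id="maintcar">
  <title>Maintaining your car</title>
  <taskbody><steps><step>
    <cmd>Check the brakes.</cmd>
    <info>The <abbreviation keyref="car.abs"/> system will prevent the car
      from skidding in adverse weather conditions.</info>
  </step></steps></taskbody>
</task>
"""


def build_key_table(map_xml):
    """Collect key -> href from every map element that defines keys."""
    table = {}
    for elem in ET.fromstring(map_xml.strip()).iter():
        for key in elem.get("keys", "").split():  # keys is space-separated
            table.setdefault(key, elem.get("href"))  # first definition wins
    return table


def resolve_keyrefs(topic_xml, key_table):
    """List (element, key, target href) for each keyref in a topic."""
    topic = ET.fromstring(topic_xml.strip())
    return [(e.tag, e.get("keyref"), key_table.get(e.get("keyref")))
            for e in topic.iter() if e.get("keyref") is not None]


print(resolve_keyrefs(TASK, build_key_table(MAP)))
# [('abbreviation', 'car.abs', 'abs.dita')]
```

Because the task refers only to the key, the map can later point "car.abs" at a different glossentry file without any change to the task topic.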

Writers can set the linking attribute of the <glossref> element to "target" to enable linking from each use of a term to its glossary entry. Alternatively, writers can use a <keydef> or <topicref> element with a keys attribute to pull glossary topics into a TOC context while defining keys.

Content teams may choose to use the base <term> element to refer to glossary terms, so that a single element refers to abbreviations and other terms alike. The <term> element can also provide a context-specific surface form: processing inserts the preferred term from the glossentry topic only when the <term> element contains no text.
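The fallback rule for <term> can be sketched in a few lines of Python. In this sketch the preferred_terms dictionary stands in for key resolution against glossentry topics; that dictionary and the render_term name are assumptions of the sketch, and the "car.abs" key reuses the earlier map example.

```python
import xml.etree.ElementTree as ET


def render_term(term_xml, preferred_terms):
    """Render a <term keyref="..."> element.

    Writer-supplied text wins (a context-specific surface form); only an
    empty <term> is filled with the preferred term for its key.
    """
    term = ET.fromstring(term_xml)
    text = "".join(term.itertext()).strip()
    if text:
        return text
    return preferred_terms[term.get("keyref")]


preferred_terms = {"car.abs": "ABS"}  # hypothetical result of key resolution
print(render_term('<term keyref="car.abs"/>', preferred_terms))  # ABS
print(render_term('<term keyref="car.abs">anti-lock brakes</term>', preferred_terms))  # anti-lock brakes
```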

For authoring convenience, a <glossgroup> topic can contain multiple <glossentry> topics:

<glossgroup> (base element: <concept>)
Content:
  1. one <title>
  2. zero or one <prolog>
  3. zero or more <glossgroup> or <glossentry> topics
Purpose: Groups a set of glossary entries for some purpose, for instance, for convenient maintenance based on the alphabetic collation of the preferred terms or on the subject matter that the terms cover.
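Because <glossgroup> topics can nest, a processor that builds a flat glossary needs to walk them recursively. The Python sketch below does that and orders the entries by preferred term. The sample content and the function names collect_entries and sorted_entries are hypothetical, and casefold() is only a stand-in for the locale-aware collation a real glossary would need.

```python
import xml.etree.ElementTree as ET

GROUP = """
<glossgroup id="cars">
  <title>Automotive terms</title>
  <glossentry id="tcs">
    <glossterm>traction control</glossterm>
    <glossdef>A system that limits wheel spin.</glossdef>
  </glossentry>
  <glossentry id="abs">
    <glossAcronym>ABS</glossAcronym>
    <glossdef>A brake technology that minimizes skids.</glossdef>
  </glossentry>
  <glossgroup id="drivetrain">
    <title>Drivetrain terms</title>
    <glossentry id="awd">
      <glossterm>all-wheel drive</glossterm>
      <glossdef>A drivetrain that powers all wheels.</glossdef>
    </glossentry>
  </glossgroup>
</glossgroup>
"""

PREFERRED_TAGS = {"glossterm", "glossAbbreviation", "glossAcronym"}


def collect_entries(group):
    """Yield (preferred term, id) for every glossentry, at any nesting depth."""
    for child in group:
        if child.tag == "glossentry":
            preferred = next(c for c in child if c.tag in PREFERRED_TAGS)
            yield preferred.text, child.get("id")
        elif child.tag == "glossgroup":
            yield from collect_entries(child)
        # <title> and <prolog> children are skipped.


def sorted_entries(glossgroup_xml):
    root = ET.fromstring(glossgroup_xml.strip())
    # casefold() gives a simple case-insensitive order, not true collation.
    return sorted(collect_entries(root), key=lambda pair: pair[0].casefold())


print([term for term, _ in sorted_entries(GROUP)])
# ['ABS', 'all-wheel drive', 'traction control']
```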

Relationships between term subjects (such as the hypernym or kind-of relationship and the holonym or part-of relationship defined by WordNet) can be specified for glossary topics by a subject scheme map. (See Proposal 12031 for Controlled Values.)

Rendition

When the writer provides a keyref to a glossentry topic with a <glossSurfaceForm> element, a process can emit the surface form in contexts where the abbreviation might be unfamiliar to the reader.

For instance, a process composing a book deliverable can emit the surface form on the first reference to the glossentry topic within the book or within copyright, warning, or legal sections. A process generating an online page can emit the surface form as a hover tooltip on every instance of the term.
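The two rendering strategies described above can be sketched in Python. This is an illustration under stated assumptions: the GlossEntry class, the render_mentions function, and the use of an HTML <abbr> element with a title attribute for the online tooltip are choices made for this sketch, not part of the proposal. The input is the sequence of resolved keyrefs in reading order.

```python
import html
from dataclasses import dataclass
from typing import Optional


@dataclass
class GlossEntry:
    preferred: str                      # e.g. the <glossAcronym> text
    surface_form: Optional[str] = None  # the <glossSurfaceForm> text, if any


def render_mentions(keyrefs, glossary, mode="print"):
    """Render each referenced term, in reading order.

    print:  surface form on the first reference, preferred term afterwards.
    online: preferred term every time, with the surface form as a tooltip.
    """
    seen = set()
    rendered = []
    for key in keyrefs:
        entry = glossary[key]
        expansion = entry.surface_form or entry.preferred
        if mode == "print":
            rendered.append(entry.preferred if key in seen else expansion)
            seen.add(key)
        elif mode == "online":
            rendered.append(f'<abbr title="{html.escape(expansion)}">'
                            f'{html.escape(entry.preferred)}</abbr>')
        else:
            raise ValueError(f"unknown mode: {mode!r}")
    return rendered


glossary = {"car.abs": GlossEntry("ABS", "Anti-lock Braking System (ABS)")}
print(render_mentions(["car.abs", "car.abs"], glossary))
# ['Anti-lock Braking System (ABS)', 'ABS']
```

To repeat the expansion in copyright, warning, or legal sections, a book process could clear the seen set at the start of each such section.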

For instance, if the topic with the keyref to the "car.abs" key provided the first appearance of the ABS term within a book, the sentence could be rendered as follows:

"The Anti-lock Braking System (ABS) system will prevent the car from skidding in adverse weather conditions."

If the ABS term had appeared previously within the book, the same sentence could instead be rendered as follows:

"The ABS system will prevent the car from skidding in adverse weather conditions."

Translation Issues for Abbreviated Forms

The following cases for abbreviated forms must be considered when working with documents that require internationalization:

New or Changed Specification Language

The Language Reference for the glossentry topic should be revised to reflect the contents of this proposal including translation considerations and their impact on the use of abbreviations.

Costs

Benefits