DITA Proposed Feature 14

A specialized topic for glossary entries.

Longer description

The problem: DITA users need to publish glossaries and, more generally, identify the terminology for key concepts in an information set.

The solution: Add a specialized topic for reusable glossary entries.

Scope

Major - new topic type

Use Case

Here are some examples of potential uses of a glossary entry.

Publishing a glossary listing in the back of a book
The glossary definitions can be sorted and grouped by term prior to formatting.
Offering inline definitions in a help system or website
The glossary definitions can be included in HTML pages as pop-up windows or tooltips for mentions of unfamiliar terms.
Guiding authors during content creation
The glossary provides a controlled vocabulary to help authors create content that employs consistent terminology for a set of standard concepts.
Guiding translation
The glossary identifies key terminology for human translators, along with the meaning that each term must retain in translation. In addition, identifying special terms and recording terminology data for them provides a dictionary that helps enable automated translation of term mentions and of the content surrounding those mentions.

Key terminology standards include TMF (Terminological Markup Framework, ISO 16642) and TBX (TermBase eXchange).

Indicating subject matter for semantic processing
By defining concepts that might be unfamiliar, the glossary contributes to a formal definition of the subject matter for an information set. Formal definition of subject matter can enable semantic search as well as browsing or linking across content based on its subject matter.

Here is one example (http://tb1.siderean.com:7880/test/test2query3.jsp) of browsing based on different aspects of the subject matter.

Key standards for formal subject definition include Topic Maps (ISO/IEC 13250) and SKOS (Simple Knowledge Organization System).

Finally, because topics must be reusable in many different combinations, the glossary definitions for the terminology associated with topics must be reusable in many combinations as well.

Technical Requirements

The relationship between concepts and terms is complex. A term can have many meanings. For instance, "element" can have a chemical, programming, or XML sense. Conversely, a concept can have many labels. For instance, an XML delimiter for hierarchically structured content could be called an "element" or a "tag". A strict formal model of the relationship between concepts and terms would allow many-to-many relationships between the two.

A content creator, however, usually wants a one-to-one mapping between key concepts and terms. That is, within an information set, most content creators would like each key concept to have a single preferred label and each key term to have a standard, unambiguous meaning.

Thus, the glossary specialization should focus on the common, simple case where a single glossary entry specifies both the concept and the term. The specialization should also scale to more complex cases where the concept and term need to be defined separately and associated by reference.

The simplest possible glossary entry might resemble the following:

<glossentry id="ddl">
    <glossterm>Data Definition Language</glossterm>
    <glossdef>A language used for defining databases....</glossdef>
</glossentry>

The glossary entry must be able to accept optional, additional detail about the term or concept:

Alternative terms
The glossary entry might provide a convenient way to identify alternative terms such as synonyms, acronyms, and so on. If the user needs to capture terminological detail about an alternative term, the user should define the alternative term in a separate glossary entry and use a <topicref> or <link> relationship to associate the two terms.
Terminology detail
Examples of terminology properties include the part of speech, a list of inflections, and so on. DITA might choose to make available an initial set of optional terminology properties (perhaps enough to generate an adequate terminology exchange file in TBX).
Concept detail
Examples of conceptual properties include the scope of subject matter covered by the concept, the origin of the concept, relationships to other concepts, and categorization of the concept. It must be possible to supply these conceptual properties not only for glossary concepts but also for other concepts that don't use special terminology. DITA might choose to make available an initial set of optional conceptual properties (perhaps enough to generate an adequate semantic representation in SKOS).
Example
The glossary entry might provide an example of the use of a term in the intended sense.
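The separate-entry approach for an alternative term might look like the following minimal sketch. It assumes that the full entry for Data Definition Language lives in a file named ddl.dita in the same glossary folder (a hypothetical name), reuses the tentative glossdetail and glterm-detail elements proposed here, and assumes the entry accepts the standard DITA related-links section; the tentative glosslinks element could serve the same purpose.

<glossentry id="ddl-acronym">
    <glossterm>DDL</glossterm>
    <glossdef>Acronym for Data Definition Language.</glossdef>
    <glossdetail>
        <!-- terminology detail specific to the acronym itself -->
        <glterm-detail>
            <glpart-of-speech>noun</glpart-of-speech>
        </glterm-detail>
    </glossdetail>
    <related-links>
        <!-- points back to the entry for the full term (hypothetical file name) -->
        <link href="ddl.dita"/>
    </related-links>
</glossentry>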

In most cases, content providers won't want to display terminology or conceptual properties to the reader. Instead, these properties enable terminology or semantic processing. Regardless of the initial set of properties, organizations with specific needs must be able to extend the set of terminology or conceptual properties through specialization. Organizations must be able to share specializations for terminology properties separately from specializations for conceptual properties, depending on how much they need in each area.

The content provider might need to maintain status and workflow metadata on the glossary entry, but that's a separate issue that applies generally to many different topic types.

A glossary entry that uses the full glossary capabilities might resemble the following:

<glossentry id="ddl">
    <glossterm>Data Definition Language</glossterm>
    <glossdef>A language used for ....</glossdef>
    <glossdetail>
        <glalt-terms>
            <glacronym>DDL</glacronym>
            <glsynonym>Data Description Language</glsynonym>
            ... other alternative terms ...
        </glalt-terms>
        <glterm-detail>
            <glpart-of-speech>noun</glpart-of-speech>
            ... other terminology detail ...
        </glterm-detail>
        <glsubject-detail>
            <glscope-note>Covers SQL schema....</glscope-note>
            ... other concept detail ...
        </glsubject-detail>
        <example>
            <p>Before using ...</p>
        </example>
    </glossdetail>
    <glosslinks>
        <glrelated-entry href="glossary/database.dita"/>
        ... other relationships ...
    </glosslinks>
</glossentry>
Note: The glossary might use "subject" as a label for concept properties to avoid confusion with standard DITA concept topics.

The content provider can use the DITA map to establish relationships between glossary entries, either based on the term or based on the concept. For instance, the content provider might indicate deprecated terms or related concepts. The content provider could also choose to identify the terminology and conceptual properties in separate topics with relationships. The map can also indicate glossary entries that should be used in terminology or semantic processing but not displayed to the reader.
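As a sketch of the map-based approach, a standard DITA relationship table can associate two glossary entries without changing either entry. The file names are hypothetical, and marking a term as deprecated would need additional markup that this proposal has not yet defined.

<map title="Database glossary">
    <topicref href="glossary/ddl.dita"/>
    <topicref href="glossary/database.dita"/>
    <reltable>
        <!-- each row relates the entries in its cells to one another -->
        <relrow>
            <relcell><topicref href="glossary/ddl.dita"/></relcell>
            <relcell><topicref href="glossary/database.dita"/></relcell>
        </relrow>
    </reltable>
</map>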

Reuse of topics that are authored independently can result in glossary misalignment. For example, the topics might use different labels for the same concept or the same label for different concepts. To mitigate this problem, one approach is to use keyref or conref when authoring mentions of a term, so that a different label can be assigned to the concept at any time without editing each mention.
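A minimal sketch of the keyref approach, assuming DITA 1.2-style keys and a processor that fills an empty <term> with the link text from the key definition: the map binds a hypothetical key named ddl to the glossary entry, and topics mention the term only through that key. Assigning a different label then means editing the linktext in the map rather than every mention.

<!-- in the map -->
<map>
    <topicref keys="ddl" href="glossary/ddl.dita">
        <topicmeta>
            <linktext>Data Definition Language</linktext>
        </topicmeta>
    </topicref>
</map>

<!-- in a content topic: the term text is resolved through the key -->
<p>Define the tables with a <term keyref="ddl"/> script.</p>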

If the term is instead embedded inline as literal text, the content provider should produce a glossary that reflects the terminology actually used in the topics. That is, the reuser should specify the related synonyms (identifying the preferred definition) and generate a listing that is sorted and grouped by term.

Note: It might be possible for a concept topic to contribute its title and short description to a glossary while providing a more detailed explanation of the concept for the navigation. For instance, a topic might provide a capsule summary of LDAP for the glossary but more detailed background about LDAP for the website.
Note: We may want to add markup to identify samples of the use of a term inline within content topics.
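For the LDAP note above, a concept topic might look like the following sketch. It assumes, hypothetically, that glossary processing takes the title as the term and the short description as the definition, while the conbody supplies the longer background for the website.

<concept id="ldap">
    <title>LDAP</title>
    <!-- capsule summary that could feed the glossary -->
    <shortdesc>The Lightweight Directory Access Protocol is a protocol for
        reading and updating directory services over a network.</shortdesc>
    <conbody>
        <!-- detailed background used only in the navigation -->
        <p>... detailed background about LDAP for the website ...</p>
    </conbody>
</concept>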

Related Proposals

Costs

Benefits

Provides DITA adopters with an efficient way to enable glossary publishing, terminology processing, and semantic processing from a single set of definitions.

Time Required