Build on the DITA 1.1 glossary specialization for more complete support of glossary, linguistic, and semantic applications and also to assist in the resolution and handling of abbreviated-form text such as acronyms, general abbreviations, and short forms in source and target text within DITA documents.
This section contrasts the original and revised markup for the acronym example terms.
Original markup | Revised markup |
---|---|
Reference | |
<abbreviated-form conref="acronyms.dita#acronyms/wmd"/> | <abbreviated-form keyref="wmd"/> |
English source | |
<abbreviated-form id="wmd"> <expanded>Weapons of Mass Destruction</expanded> <short>WMD</short> <surface-form>Weapons of Mass Destruction (WMD)</surface-form> </abbreviated-form> | <glossentry id="wmd"> <glossterm>Weapons of Mass Destruction</glossterm> <glossBody> <glossSurfaceForm>Weapons of Mass Destruction (WMD)</glossSurfaceForm> <glossAlt> <glossAcronym>WMD</glossAcronym> </glossAlt> </glossBody> </glossentry> |
Spanish translation | |
<abbreviated-form id="wmd"> <expanded>armas de destrucción masiva</expanded> <short>armas de destrucción masiva</short> <surface-form>armas de destrucción masiva</surface-form> </abbreviated-form> | <glossentry id="wmd"> <glossterm>armas de destrucción masiva</glossterm> <glossBody> <glossSurfaceForm>armas de destrucción masiva</glossSurfaceForm> <glossAlt> <glossAcronym>armas de destrucción masiva</glossAcronym> </glossAlt> </glossBody> </glossentry> |
Original proposal | Revised markup |
---|---|
Reference | |
<abbreviated-form conref="acronyms.dita#acronyms/aids"/> | <abbreviated-form keyref="aids"/> |
English source | |
<abbreviated-form id="aids"> <expanded>acquired immunodeficiency syndrome</expanded> <short>AIDS</short> <surface-form>acquired immunodeficiency syndrome (AIDS)</surface-form> </abbreviated-form> | <glossentry id="aids"> <glossterm>acquired immunodeficiency syndrome</glossterm> <glossBody> <glossSurfaceForm>acquired immunodeficiency syndrome (AIDS)</glossSurfaceForm> <glossAlt> <glossAcronym>AIDS</glossAcronym> </glossAlt> </glossBody> </glossentry> |
Spanish translation | |
<abbreviated-form id="aids"> <expanded>síndrome de inmuno-deficiencia adquirida</expanded> <short>SIDA</short> <surface-form>síndrome de inmuno-deficiencia adquirida (SIDA)</surface-form> </abbreviated-form> | <glossentry id="aids"> <glossterm>síndrome de inmuno-deficiencia adquirida</glossterm> <glossBody> <glossSurfaceForm>síndrome de inmuno-deficiencia adquirida (SIDA)</glossSurfaceForm> <glossAlt> <glossAcronym>SIDA</glossAcronym> </glossAlt> </glossBody> </glossentry> |
This section gives examples of how subsets of the glossentry markup can be used for different applications making use of terms.
An adopter interested only in term resolution for acronyms can declare an acronym with a glossentry topic similar to the following example:
<glossentry id="abs"> <glossterm>Anti-lock Braking System</glossterm> <glossBody> <glossSurfaceForm>Anti-lock Braking System (ABS)</glossSurfaceForm> <glossAlt> <glossAcronym>ABS</glossAcronym> </glossAlt> </glossBody> </glossentry>
The adopter can declare a key for the acronym using the standard DITA 1.2 keyref mechanism:
<map> ... <topicref href="maintcar.dita"/> ... <glossref keys="abs" href="antiLockBrake.dita"/> ... key declarations for other referenced acronyms ... </map>
The adopter can then refer to the acronym using the standard DITA 1.2 keyref mechanism:
<task id="maintcar"> ... <info>The <abbreviated-form keyref="abs"/> will prevent the car from skidding ...</info> ... </task>
Processes should resolve the "abs" reference to the <glossSurfaceForm> text in introductory contexts and to the <glossAcronym> text in other contexts.
Note that the keyref value does not need to match the acronym. In fact, using a more qualified value for the keyref will reduce conflicts in situations where the same acronym may resolve in many ways. For example, an information set could use “cars.abs” as the key for Anti-lock Braking System, and “ship.abs” to refer to the American Bureau of Shipping.
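The benefit of qualified keys can be sketched with a small key table. This is only an illustration under stated assumptions: the map content, the `build_key_table` helper, and the file name `bureauOfShipping.dita` are hypothetical, and a real DITA processor applies its own key-resolution rules.

```python
import xml.etree.ElementTree as ET

# Hypothetical map: two topics that share the acronym "ABS" get distinct, qualified keys.
MAP_XML = """<map>
  <glossref keys="cars.abs" href="antiLockBrake.dita"/>
  <glossref keys="ship.abs" href="bureauOfShipping.dita"/>
</map>"""


def build_key_table(map_xml):
    """Return {key: href} for every element in the map that declares keys."""
    table = {}
    for elem in ET.fromstring(map_xml).iter():
        href = elem.get("href")
        # The keys attribute may hold several space-separated key names.
        for key in elem.get("keys", "").split():
            # Assumption: the first declaration of a key is kept.
            table.setdefault(key, href)
    return table


KEYS = build_key_table(MAP_XML)
print(KEYS["cars.abs"])
print(KEYS["ship.abs"])
```

Because each key is qualified by its subject area, both senses of "ABS" can coexist in one information set without either declaration hiding the other.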
An adopter interested only in traditional glossary publishing can explain one sense of a term with a glossentry topic similar to the following example:
<glossentry id="abs"> <glossterm>Anti-lock Braking System</glossterm> <glossdef>A brake technology that minimizes skids.</glossdef> <glossBody> <glossSurfaceForm>Anti-lock Braking System (ABS)</glossSurfaceForm> <glossAlt> <glossAcronym>ABS</glossAcronym> </glossAlt> </glossBody> </glossentry>
The adopter can then pull together a subset of the defined terms for a deliverable as in the following example:
<map> ... <topichead navtitle="glossary"> <topicref href="antiLockBrake.dita"/> ... other terms in the glossary for this deliverable ... </topichead> </map>
To produce a traditional glossary, a process should sort the terms included in a deliverable and list the explained senses under each term.
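The sorting and grouping step can be sketched as follows. This is a minimal illustration, not a prescribed implementation: the `build_glossary` helper is hypothetical, the entries are abridged copies of examples in this proposal, and `casefold()` stands in for the locale-specific collation a real process would need.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Abridged glossentry topics, deliberately listed out of alphabetical order.
ENTRIES = [
    '<glossentry id="usbfd"><glossterm>USB flash drive</glossterm>'
    '<glossdef>A small portable drive.</glossdef></glossentry>',
    '<glossentry id="abs"><glossterm>Anti-lock Braking System</glossterm>'
    '<glossdef>A brake technology that minimizes skids.</glossdef></glossentry>',
]


def build_glossary(entries):
    """Group definitions (senses) under each term and sort the terms."""
    senses = defaultdict(list)
    for xml_text in entries:
        entry = ET.fromstring(xml_text)
        term = entry.findtext("glossterm", default="").strip()
        definition = entry.findtext("glossdef", default="").strip()
        if term and definition:
            senses[term].append(definition)
    # Simple case-insensitive ordering; real collation depends on the language.
    return sorted(senses.items(), key=lambda item: item[0].casefold())


for term, definitions in build_glossary(ENTRIES):
    print(term)
    for definition in definitions:
        print("  -", definition)
```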
Adopters need not declare the same acronym in different ways for different purposes but instead can establish a declaration of acronym terms for multiple purposes. An adopter who needs both to refer to an acronym and list the acronym in a published glossary would provide an explanation of the acronym as in the following example:
<glossentry id="abs"> <glossterm>Anti-lock Braking System</glossterm> <glossdef>A brake technology that minimizes skids.</glossdef> <glossBody> <glossSurfaceForm>Anti-lock Braking System (ABS)</glossSurfaceForm> <glossAlt> <glossAcronym>ABS</glossAcronym> </glossAlt> </glossBody> </glossentry>
The glossary can include the expanded acronym (as shown in the following example) as well as glossary terms that are not acronyms. In addition, the team can create acronyms that are referenced but not included in the glossary:
<map> ... <topicref href="maintcar.dita"/> ... <topichead navtitle="glossary"> <topicref keys="abs" href="antiLockBrake.dita"/> ... other referenced terms in the glossary ... </topichead> ... key declarations for other referenced acronyms that aren't in the glossary ... </map>
The adopter can still refer to the acronym with the <abbreviated-form> element as in the following example:
<task id="maintcar"> ... <info>The <abbreviated-form keyref="abs"/> will prevent the car from skidding ...</info> ... </task>
Processing for term resolution to either the <glossSurfaceForm> or <glossAcronym> text and processing for glossary publishing work as before.
While a number of text analysis tools exist, the challenge for adopters is populating the terminology database that enables use of such tools. Published glossaries provide a practical source for terminology to populate such terminology databases.
An adopter whose requirements include not only acronym resolution and glossary publishing but also populating a terminology database can create glossentry topics similar to the following:
<glossentry id="abs"> <glossterm>Anti-lock Braking System</glossterm> <glossdef>A brake technology that minimizes skids.</glossdef> <glossBody> <glossPartOfSpeech value="noun"/> <glossSurfaceForm>Anti-lock Braking System (ABS)</glossSurfaceForm> <glossAlt> <glossAcronym>ABS</glossAcronym> <glossStatus value="preferred"/> <glossUsage>Recommended because more readers are familiar with the acronym than the term.</glossUsage> </glossAlt> <glossAlt> <glossSynonym>Anti-skid Brakes</glossSynonym> <glossStatus value="restricted"/> <glossUsage>Allowed in legacy content but not in new content.</glossUsage> </glossAlt> </glossBody> </glossentry>
As illustrated by these examples, adopters can scale up for more sophisticated applications as their requirements change by taking advantage of optional elements to provide additional detail about the term.
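As a rough sketch of how such a topic could feed a terminology database, the following code flattens the detailed ABS entry into one record per term form. The record layout and the `term_records` and `attr_value` helpers are hypothetical. The defaults (noun for the part of speech, preferred for the `<glossterm>`, allowed for an alternate term) follow the defaults described in this proposal.

```python
import xml.etree.ElementTree as ET

GLOSSENTRY = """<glossentry id="abs">
  <glossterm>Anti-lock Braking System</glossterm>
  <glossdef>A brake technology that minimizes skids.</glossdef>
  <glossBody>
    <glossPartOfSpeech value="noun"/>
    <glossSurfaceForm>Anti-lock Braking System (ABS)</glossSurfaceForm>
    <glossAlt>
      <glossAcronym>ABS</glossAcronym>
      <glossStatus value="preferred"/>
    </glossAlt>
    <glossAlt>
      <glossSynonym>Anti-skid Brakes</glossSynonym>
      <glossStatus value="restricted"/>
    </glossAlt>
  </glossBody>
</glossentry>"""

# Specializations of <title> that can name a variant term inside <glossAlt>.
VARIANT_TAGS = {"glossAbbreviation", "glossAcronym", "glossShortForm", "glossSynonym"}


def attr_value(parent, tag, default):
    """Return the value attribute of a direct child element, or the default."""
    child = parent.find(tag) if parent is not None else None
    return child.get("value", default) if child is not None else default


def term_records(xml_text):
    """Flatten one glossentry into a list of records, preferred term first."""
    entry = ET.fromstring(xml_text)
    body = entry.find("glossBody")
    pos = attr_value(body, "glossPartOfSpeech", "noun")
    records = [{
        "term": entry.findtext("glossterm", default="").strip(),
        "role": "glossterm",
        "partOfSpeech": pos,
        "status": attr_value(body, "glossStatus", "preferred"),
    }]
    for alt in (body.findall("glossAlt") if body is not None else []):
        for child in alt:
            # Skip empty variants, which translation can leave behind.
            if child.tag in VARIANT_TAGS and (child.text or "").strip():
                records.append({
                    "term": child.text.strip(),
                    "role": child.tag,
                    "partOfSpeech": pos,  # alternates share the part of speech
                    "status": attr_value(alt, "glossStatus", "allowed"),
                })
    return records


for record in term_records(GLOSSENTRY):
    print(record)
```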
This section discusses the subset of the glossentry vocabulary specific to acronyms.
To use the glossentry topic for acronym resolution, the writer takes advantage of the following elements:
Base element | Specialized element | Content | Purpose |
---|---|---|---|
<concept> | <glossentry> | | Declares a term, its acronym, and its surface form. |
<title> | <glossterm> | title content for <glossterm> for consistency with DITA 1.1 | Specifies a term that also has an acronym form. |
<conbody> | <glossBody> | | Contains detail about the term. |
<p> | <glossSurfaceForm> | text, <term>, <keyword>, or <tm> content | Specifies an unambiguous presentation of an acronym such as providing the term with the acronym in parentheses. The surface form is suitable to introduce the term in new contexts. |
<section> | <glossAlt> | | Identifies an alternate term, in this case, an acronym. |
<title> | <glossAcronym> | text, <term>, <keyword>, or <tm> content | Identifies the acronym for the term. |
The <glossentry> topic provides additional subelements that are optional but available to scale up for single sourcing for additional purposes such as glossary publishing of the acronym (see Technical Requirements below).
Two new domains complement the glossary entry topic to make it easy to refer to acronyms (as shown in the example of acronym resolution):
When the writer provides a keyref to a glossentry topic that contains a <glossSurfaceForm> element, a process should emit the surface form in introductory contexts where the term might be unfamiliar to the reader or in other contexts where a precise term is appropriate.
For instance, a process composing a book deliverable should emit the surface form on the first reference to the glossentry topic within the book or for every reference within a copyright or a warranty-related warning. A process generating an online page should emit the surface form as a hover tooltip on every instance of the term.
When the writer uses the <abbreviated-form> element to refer to a glossentry topic, processing resolves the term reference to the text of the <glossSurfaceForm> element in introductory contexts and to text of the <glossAcronym> element in other contexts.
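The rule above can be sketched as a small resolver that remembers which keys it has already emitted. This is a minimal sketch that treats only the first reference in a deliverable as an introductory context. The `TermResolver` class is hypothetical and is not part of any DITA toolkit.

```python
import xml.etree.ElementTree as ET

ABS_ENTRY = """<glossentry id="abs">
  <glossterm>Anti-lock Braking System</glossterm>
  <glossBody>
    <glossSurfaceForm>Anti-lock Braking System (ABS)</glossSurfaceForm>
    <glossAlt>
      <glossAcronym>ABS</glossAcronym>
    </glossAlt>
  </glossBody>
</glossentry>"""


class TermResolver:
    """Resolve <abbreviated-form keyref="..."/> references within one deliverable."""

    def __init__(self, entries_by_key):
        self._entries = {key: ET.fromstring(xml) for key, xml in entries_by_key.items()}
        self._seen = set()  # keys that have already been introduced

    def resolve(self, key):
        entry = self._entries[key]
        if key in self._seen:
            # Later reference: the reader has already met the term.
            return entry.findtext("glossBody/glossAlt/glossAcronym")
        # First reference: introduce the term with its surface form.
        self._seen.add(key)
        return entry.findtext("glossBody/glossSurfaceForm")


resolver = TermResolver({"abs": ABS_ENTRY})
SENTENCE = "The {} will prevent the car from skidding in adverse weather conditions."
first = SENTENCE.format(resolver.resolve("abs"))
later = SENTENCE.format(resolver.resolve("abs"))
print(first)
print(later)
```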
For instance, if the topic with the keyref to the "abs" key provided the first appearance of the ABS term within a book, the sentence could be rendered as follows:
"The Anti-lock Braking System (ABS) will prevent the car from skidding in adverse weather conditions."
If the ABS term had appeared previously within the book, the same sentence could instead be rendered as follows:
"The ABS will prevent the car from skidding in adverse weather conditions."
The following cases for abbreviated forms must be contemplated when working with documents that require translation:
The source and target languages may have different forms for a term. One language may lack an abbreviation or acronym that's recognized in the other, or the preferred term may be an abbreviation or acronym in one language but the expanded form in another.
Translation workbenches don't allow the translator to change markup during translation. That's necessary for the translation workbench to apply to any markup language without building in an awareness of specific markup vocabularies. For that reason, the text of an acronym and surface form may be provided in the source language but omitted or translated to the same text in a target language while preserving the markup structure.
The following example illustrates this approach for the English source topic:
<glossentry id="wmd" xml:lang="en"> <glossterm>Weapons of Mass Destruction</glossterm> <glossBody> <glossSurfaceForm>Weapons of Mass Destruction (WMD)</glossSurfaceForm> <glossAlt> <glossAcronym>WMD</glossAcronym> </glossAlt> </glossBody> </glossentry>
Term resolution processing uses the supplied text from the <glossAcronym> and <glossSurfaceForm> elements in the same way as the source English text.
Term resolution processing should always ignore empty elements. If the <glossAcronym> and <glossSurfaceForm> elements are empty, an <abbreviated-form> reference should resolve to the <glossterm> text. Thus, if allowed by the translation workbench, the translator could take advantage of standard processing by omitting the text translation for both the <glossAcronym> and <glossSurfaceForm> elements. The result of processing an empty element should be the same as if the translator had copied the <glossterm> text into the empty element.
<glossentry id="wmd" xml:lang="es"> <glossterm>armas de destrucción masiva</glossterm> <glossBody> <glossSurfaceForm></glossSurfaceForm> <glossAlt> <glossAcronym></glossAcronym> </glossAlt> </glossBody> </glossentry>
However, translation processing systems may not permit the translator to leave an element empty and will generate an error message that the translation is incomplete. In that case, the translator must duplicate the <glossterm> in the <glossAcronym> and <glossSurfaceForm> elements.
<glossentry id="wmd" xml:lang="es"> <glossterm>armas de destrucción masiva</glossterm> <glossBody> <glossSurfaceForm>armas de destrucción masiva</glossSurfaceForm> <glossAlt> <glossAcronym>armas de destrucción masiva</glossAcronym> </glossAlt> </glossBody> </glossentry>
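The equivalence of the two translation strategies can be sketched as follows. This is a simplified, hypothetical helper (`resolve_abbreviated_form` is not an existing API). It assumes that an empty or missing element simply falls back to the `<glossterm>` text, and shows that the empty and duplicated Spanish topics then resolve identically.

```python
import xml.etree.ElementTree as ET

# Translator left the acronym and surface form empty.
EMPTY_ES = """<glossentry id="wmd" xml:lang="es">
  <glossterm>armas de destrucción masiva</glossterm>
  <glossBody>
    <glossSurfaceForm></glossSurfaceForm>
    <glossAlt><glossAcronym></glossAcronym></glossAlt>
  </glossBody>
</glossentry>"""

# Translator duplicated the term because the workbench rejects empty elements.
DUPLICATED_ES = """<glossentry id="wmd" xml:lang="es">
  <glossterm>armas de destrucción masiva</glossterm>
  <glossBody>
    <glossSurfaceForm>armas de destrucción masiva</glossSurfaceForm>
    <glossAlt><glossAcronym>armas de destrucción masiva</glossAcronym></glossAlt>
  </glossBody>
</glossentry>"""


def resolve_abbreviated_form(entry_xml, introductory):
    """Return the text for a reference, ignoring empty elements."""
    entry = ET.fromstring(entry_xml)
    path = "glossBody/glossSurfaceForm" if introductory else "glossBody/glossAlt/glossAcronym"
    text = (entry.findtext(path) or "").strip()
    # Fall back to the preferred term when the element is empty or absent.
    return text or (entry.findtext("glossterm") or "").strip()


for introductory in (True, False):
    print(resolve_abbreviated_form(EMPTY_ES, introductory))
    print(resolve_abbreviated_form(DUPLICATED_ES, introductory))
```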
In some languages, like Spanish, abbreviated-form expansion should be written in lower case. This can lead to a grammatical error if the first appearance of an abbreviated form occurs at the beginning of a sentence. The same problem may arise with the indefinite article in English 'a' or 'an' depending on whether the text to be inserted begins with a vowel. It is up to the composition/display software to handle this. For example, the acronym for AIDS should be translated as:
<glossentry id="aids" xml:lang="es"> <glossterm>síndrome de inmuno-deficiencia adquirida</glossterm> <glossBody> <glossSurfaceForm>síndrome de inmuno-deficiencia adquirida (SIDA)</glossSurfaceForm> <glossAlt> <glossAcronym>SIDA</glossAcronym> </glossAlt> </glossBody> </glossentry>
Normally the <glossSurfaceForm> text from the above example could not be used at the beginning of a sentence, because it begins with a lower case letter. It is up to the composition software for the given language to cope with this input.
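One way composition software might cope is to capitalize the inserted text when it opens a sentence. The sketch below is deliberately naive and makes its own assumptions: the `insert_term` helper is hypothetical, and sentence starts are detected only from the preceding punctuation, which would misfire after abbreviations that end in a period.

```python
SENTENCE_OPENERS = ".!?¿¡"  # punctuation after which a new sentence begins


def insert_term(prefix, term_text):
    """Append term_text to prefix, capitalizing it when it starts a sentence."""
    stripped = prefix.rstrip()
    starts_sentence = not stripped or stripped[-1] in SENTENCE_OPENERS
    if starts_sentence and term_text:
        # Upper-case only the first character; leave the rest untouched.
        term_text = term_text[0].upper() + term_text[1:]
    return prefix + term_text


SURFACE = "síndrome de inmuno-deficiencia adquirida (SIDA)"
print(insert_term("", SURFACE))     # term opens the sentence
print(insert_term("El ", SURFACE))  # term appears mid-sentence
```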
Abbreviated forms can cause problems for inflected languages because abbreviated form expansion needs to be presented in the nominative case, without any inflection. This can be achieved with a surface form that provides the full form in parentheses immediately following the acronym. For example, the Polish acronym for the European Union is:
<glossentry id="eu" xml:lang="pl"> <glossterm>Unia Europejska</glossterm> <glossBody> <glossSurfaceForm>UE (Unia Europejska)</glossSurfaceForm> <glossAlt> <glossAcronym>UE</glossAcronym> </glossAlt> </glossBody> </glossentry>
Using the above construct enables automated handling of the abbreviated form in Polish without causing any problems with grammatical inflection. For example, when stating that something occurred within the EU, the inflected form in Polish caused by the use of the locative case would have to be used. For the actual abbreviated form itself this is not a problem, since abbreviated forms are not inflected. Consider, for example, the phrase "In the European Union (EU) there are many institutions…":
"W Unii Europejskiej (UE) jest wiele instytucji…"
However, because the translator controls how the text is displayed via the <glossSurfaceForm> element, the first occurrence of the abbreviated form can use the following acceptable construct:
"W UE (Unia Europejska) jest wiele instytucji…"
This section discusses the full glossentry markup available for any terminology application.
DITA 1.1 introduced a simple glossary specialization to meet basic needs for publication as part of a bookmap.
The DITA 1.1 glossary specialization, however, is too simple to support many common glossary applications. For instance, many content publishers need to distinguish an abbreviation from the full term. In addition, a more complete representation of terminology can support processing such as the following:
Key terminology standards include TBX.
Abbreviated forms, such as acronyms, are ubiquitous in technical documentation. Abbreviated forms are a special case of glossary term because they need to be expanded to the full form under some conditions (such as the first encounter within a printed document). In electronic published documents, abbreviated form expansions can also be made available in the form of a hyperlink or 'tool tip' mechanism. In addition, the expanded text of abbreviated forms should be available for automatic inclusion in glossary entries for the publication. This proposal relates to all types of abbreviations, such as acronyms, initialisms, apocope, clipping, elision, syncope, syllabic abbreviation, and portmanteau.
To enable these applications, DITA 1.2 allows additional detail about the term and additional methods for referring to terms that can deliver either abbreviated or surface forms of the term.
The following requirements apply to glossary terms generally:
In addition, abbreviated forms and their translations require special handling:
For example, the surface form for an abbreviated form in English might consist of the abbreviated form followed by its expanded form in parentheses. By contrast, the translated version might consist of the expanded form followed by the abbreviated form in parentheses. The translated version might also include the English and the translation.
For example, in a Polish book on Java Web programming, the first reference to JSP may appear as follows:
"JSP (ang. Java Server Pages)"
Another example from a publication concerning OASIS:
"OASIS (ang. Organization for the Advancement of Structured Information Standards – organizacja dla propagowania strukturalnych standardów informacyjnych)"
In the first example, the translator assumes the reader will not require a translation of the English abbreviated form. In the second example, the translator assumes the reader may not understand the English expanded form and therefore adds the translation.
Moderate: adding elements to one specialized topic, providing a map domain for defining keys, and providing an element domain for referring to keys.
The full set of elements provided by the expanded glossentry topic includes the following elements:
Base element | Specialized element | Content | Purpose |
---|---|---|---|
<concept> | <glossentry> | | Specifies the preferred and alternate forms of a term and the subject designated by those terms within a glossary or other kind of terminology set. Note: Some terminology discussions use "concept" to denote the meaning of a term. For the DITA community, however, "concept" has a strong association with the core DITA concept topic type. To avoid confusion, this proposal denotes the meaning of a term with "subject" (which has appropriate connotations by way of "subject classification" and the Dublin Core subject property). |
<title> | <glossterm> <glossAbbreviation> <glossAcronym> <glossShortForm> <glossSynonym> | title content for <glossterm> for consistency with DITA 1.1; text, <term>, <keyword>, or <tm> content for the other <title> specializations | Identifies the role of one term with respect to other variant terms. The <glossterm> element appears within the <glossentry> element to indicate the preferred term. The other <title> specializations can appear within the <glossAlt> element to indicate alternative forms with the same meaning. In particular, <glossShortForm> indicates a shorter alternative to the preferred term. |
<abstract> | <glossdef> | section content or <shortdesc> | Provides a verbal definition of the subject of a term for writers and users. |
<conbody> | <glossBody> | | Represents terminology detail. The part of speech applies to all term forms to encourage consistency of the alternate forms with the preferred term. The surface form presents the term in an unambiguous way. The status indicates the overall status of the subject of the term. The <glossProperty> and <note> elements are extension points for more detail about the preferred term or its subject (such as the linguistic properties from basic or full TBX). |
<data> | <glossPartOfSpeech> | value attribute enumerated as noun, properNoun, verb, adjective, or adverb; empty content | Identifies the part of speech for the preferred and alternate terms. Alternate terms must have the same part of speech as the preferred term because all terms in the glossentry topic designate the same subject. If the part of speech isn't specified, the default is a noun for the standard enumeration. Note: The standard enumeration is extensible or replaceable. The enumeration is validated by means of the proposed controlled values mechanism or through processing rather than validated as an XML enumeration. |
<data> | <glossStatus> | value attribute enumerated as preferred, restricted, prohibited, or obsolete; empty content | Identifies the usage status of a preferred or alternate term. If the status isn't specified, the <glossterm> provides a preferred term and an alternate term provides an allowed term. Note: This enumeration must be extensible or replaceable. The enumeration is validated by means of the proposed controlled values mechanism or through processing rather than validated as an XML enumeration. |
<data> | <glossProperty> | data content | An extension point for linguistic or semantic properties such as the gender of the term. |
<p> | <glossSurfaceForm> | text, <term>, <keyword>, or <tm> content | Specifies an unambiguous presentation of the term that may combine multiple forms. For instance, for an acronym, the <glossSurfaceForm> might provide the full form as well as the acronym in parentheses. The surface form is suitable to introduce the term in new contexts. |
<note> | <glossUsage> | note content | Any information about the correct usage of the term. |
<note> | <glossScopeNote> | note content | A clarification of the subject designated by the terms such as examples of included or excluded companies or products. For instance, a scope note for "Linux" might explain that the term doesn't apply to UNIX products and give some examples of Linux products that are included as well as UNIX products that are excluded. |
<image> | <glossSymbol> | image content | Identifies a standard icon associated with the subject of the term. |
<section> | <glossAlt> | | Identifies a variant term for the preferred term. Any list of alternative terms is, of course, specific to the language, so translation may result in empty elements. |
<xref> | <glossAlternateFor> | Empty content | Indicates when a variant term has a relationship to another variant term as well as to the preferred term. |
The following example shows the minimum declaration of a term:
<glossentry id="highavail"> <glossterm>High Availability</glossterm> </glossentry>
The following example shows a detailed glossary entry specifying the usage for the preferred and alternate terms:
<glossentry id="usbfd"> <glossterm>USB flash drive</glossterm> <glossdef>A small portable drive.</glossdef> <glossBody> <glossPartOfSpeech value="noun"/> <glossUsage>Do not provide in upper case (as in "USB Flash Drive") because that suggests a trademark.</glossUsage> <glossAlt> <glossAcronym>UFD</glossAcronym> <glossUsage>Explain the acronym on first occurrence.</glossUsage> </glossAlt> <glossAlt id="memoryStick"> <glossSynonym>memory stick</glossSynonym> <glossUsage>This is a colloquial term.</glossUsage> </glossAlt> <glossAlt> <glossAbbreviation>stick</glossAbbreviation> <glossStatus value="prohibited"/> <glossUsage>This is too colloquial.</glossUsage> <glossAlternateFor href="#memoryStick"/> </glossAlt> <glossAlt> <glossAbbreviation>flash</glossAbbreviation> <glossStatus value="prohibited"/> <glossUsage>This short form is ambiguous.</glossUsage> </glossAlt> </glossBody> </glossentry>
Using the standard keyref mechanism, the writer can assign a key to the declaration topic and refer to the key to insert the preferred term. The benefit in using a reference is that the preferred term can be maintained in one place:
<map> ... <topicref keys="reliability" href="highavail.dita" linking="none" toc="no" print="no" search="no"/> ... <topicref href="configdb.dita"/> ... </map> <task id="configdb"> <title>Configuring the database.</title> ... <context>To enable <term keyref="reliability"/>, you configure the database</context> ... </task>
Two new domains support easy definition and use of keys for glossary entry topics:
Writers can set the linking attribute to the "target" value on the <glossref> element to enable linking from the use to the glossary term. The <glossref> element is only a convenience. Writers can always use the standard capabilities of the keyref mechanism. For instance, writers can use the <topicref> element with a keys attribute to pull a glossary topic into a TOC context while defining a key.
When the writer uses the <abbreviated-form> element to refer to a glossentry topic, the process performs the following checks in the attempt to find an abbreviated or surface form with text for the reference, skipping all subsequent checks once the text has been found:
Writers can also use the <term> element with a keyref attribute to refer to a glossentry. Processing inserts text from the glossentry topic only when the referencing <term> element doesn't contain text. As a result, writers can use the <term> element to delimit terms within content while identifying the corresponding glossary entry. That is, the <term> element can provide a context-specific surface form as its content where appropriate.
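The behavior for `<term>` references can be sketched as follows. This is an illustration only: the `resolve_term` helper and the `PREFERRED_TERMS` lookup table are hypothetical stand-ins for a processor's key resolution, populated here from the "reliability" key and the High Availability example in this proposal.

```python
import xml.etree.ElementTree as ET

# Hypothetical lookup: key name -> <glossterm> text of the glossentry it points to.
PREFERRED_TERMS = {"reliability": "High Availability"}


def resolve_term(term_xml, preferred_terms):
    """Return the text to render for a <term> element with a keyref."""
    term = ET.fromstring(term_xml)
    own_text = "".join(term.itertext()).strip()
    if own_text:
        # The writer supplied a context-specific surface form: keep it.
        return own_text
    # Empty element: insert the preferred term from the glossentry topic.
    return preferred_terms.get(term.get("keyref"), "")


print(resolve_term('<term keyref="reliability"/>', PREFERRED_TERMS))
print(resolve_term('<term keyref="reliability">highly available</term>', PREFERRED_TERMS))
```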
For authoring convenience, a <glossgroup> topic can contain multiple <glossentry> topics:
Base element | Specialized element | Content | Purpose |
---|---|---|---|
<concept> | <glossgroup> | | Groups a set of glossary entries for some purpose, for instance, for convenient maintenance based on the alphabetic collation of the preferred terms or on the subject matter covered by the terms. |
Relationships between the subjects of terms (such as the hypernym or kind-of relationship and the holonym or part-of relationship specified by WordNet) can be specified for glossary topics by a subject scheme map. (See Proposal 12031 for Controlled Values.)
The Language Reference for the glossentry topic should be revised to reflect the contents of this proposal including translation considerations and their impact on the use of abbreviations.
Implementation of the DTD and Schema changes for the glossentry topic, of the map domain for the <glossref> element, of the topic domain for the <abbreviated-form> element, and of the glossgroup topic.
Implementation of special processing to emit the surface form when appropriate.
In particular, abbreviated forms can be handled in a uniform and consistent manner by putting resolution of the abbreviated form under the control of the composition software so that glossary, tooltip, and first forms can be provided as required to meet the end-user requirements.