DITA Proposed Feature #12031

Manage and validate the enumerations for attributes as controlled content.

Longer description

Problem

The list of appropriate values for an attribute is determined by the subject matter of the content. For example, consider the audience attribute. In the documentation for a software design tool, the audience attribute might take values such as Architect, Developer, and User. By contrast, in the report for a pharmaceutical trial, the same audience attribute might take values such as Participant, Researcher, and Executive. In short, content providers need the ability to specify the attribute values that are appropriate for their content.

Because of this need, the selection attributes in DITA 1.1 don't define enumerations but, instead, accept any character data. The challenge for the content provider is to define the values for their content.

A content provider could specialize a DTD or XML Schema to enumerate the values for an attribute. This solution, however, has several problems:

  • Few content creators have the necessary expertise to modify a DTD or Schema.
  • The enumeration for an attribute changes more rapidly and applies to a more limited set of content than the document type. For instance, if a software vendor ports their product to Macintosh OS X, the documentation team must add a corresponding controlled value to the platform attribute to be able to filter and flag content. The core structure and semantics of the concept, reference, and task topic types, however, require no changes.
  • Specializing to enumerate the values for an attribute creates a barrier for receiving content from other adopters of the same topic type. For example, the learning topic types apply to education content for many different industries, but each industry requires different enumerations for the user role.
  • The enumeration for an attribute must express a flat list of values. The list cannot express subsumption where a more general value subsumes a more specific value such as the Programmer audience including the Application Programmer and System Programmer audiences or the Linux platform including the RedHat Linux and SuSE Linux platforms.

Other initiatives have recognized the need to specify the enumerations of values separately from the structure and semantics of the document type. For instance, OASIS Universal Business Language (UBL) separates the validation of the document structure using XML Schema from the validation of controlled values using Schematron (see http://docs.oasis-open.org/ubl/os-UBL-2.0/UBL-2.0.html#CODELISTS). IEEE LOM (Learning Object Metadata) attributes don't have a single enumeration of values but instead use a <source> subelement to identify the standard that defines the enumeration (see http://ltsc.ieee.org/wg12/materials.html). TBX (TermBase Exchange) uses an external XCS (Extensible Constraint Specification) file to define the controlled values for typing content elements (see http://www.lisa.org/standards/tbx/tbxISO_final.html#flexiFormat).

Solution

Provide DITA adopters with a method for defining controlled values as part of their content. This approach has the following benefits:

  • Adopters can easily define controlled values for the DITA attributes and other purposes without special technical knowledge.
  • Partners can share the definition of controlled values so that, when building their combined content, a common set of filtering and flagging values can be applied.
  • Tools have the option to validate controlled values for attributes against the definitions.
  • Controlled values can be used to classify content for filtering and flagging at build time but can also scale for retrieval and traversal at runtime if sophisticated information viewers are available.

For an earlier version of this proposal, please see:

http://lists.oasis-open.org/archives/dita/200702/msg00059.html

Statement of Requirement

Define a controlled value as special DITA content
Instead of introducing a fundamentally new XML vocabulary, the definition for a controlled value should be a specialization. Adopters benefit because they use familiar tools and document structures to define the controlled values. Tool vendors benefit because they do not have to implement special tooling for editing the definitions of controlled values.
Associate a group of controlled values with an attribute
The solution must be able to define a group of controlled values and to bind the group to an attribute. This declaration allows sophisticated editors to provide pick lists for setting controlled values and for processors to validate that content is classified with defined values.
Allow readable labels for controlled values
While short keywords without spaces are most convenient for processing, some controlled values are easier to understand when provided as multiple words or with labels. For instance, wsent might be a useful keyword for which "Web Server (Enterprise Edition)" would be a more recognizable label. The label allows sophisticated editors to provide more friendly pick lists for writers.
Allow definition of the group as a hierarchy
Content providers should have the option of specifying that more general controlled values include more specific controlled values. For instance, if a warning applies only to RedHat Linux and not to other Linux platforms, the writer should be able to classify the warning with RedHat Linux but still have the warning flagged when Linux is set for flagging during a build.
Allow definition of the sense of a controlled value
Within a small community, a well-known keyword can be enough for shared understanding. For instance, within a specific development group, everyone on the team might understand the server controlled value to mean Linux and UNIX platforms running database or web servers. Across partner organization boundaries or with ambiguous keywords, the controlled value may require a definition to ensure consistent use.
Allow classification of content for retrieval
Besides filtering and flagging, controlled values can support retrieval or traversal of content. For instance, Open Source information viewers that provide retrieval based on classification include Apache Solr (http://lucene.apache.org/solr/), Aduna AutoFocus (http://www.aduna-software.com/products/autofocus_server/overview.view), and MIT Longwell (http://simile.mit.edu/wiki/Longwell).

Use Cases

Validating controlled values
A publication workflow product needs to define values for the state of an article as it passes through review and editing. The content team needs to define the values reflective of their publication workflow and, for publication processes to work reliably, be confident that the state attribute reflects only those defined values. The content team doesn't know how to create or modify document types.
Sharing values for filtering and flagging
Two companies enter into an OEM agreement to produce a product that combines their components. The documentation teams must produce a joint deliverable for multiple platforms on a tight schedule. To produce the versions, the teams must agree on the list of possible platforms for filtering content. One team must identify specific platform versions that are irrelevant to the other team (whose component isn't as tightly integrated into the native libraries of the operating system). Finally, the server platform definition must have a clear explication so both teams use the server platform in the same way.
Retrieving content based on subject matter
An enterprise web portal rolls out an Open Source facet browser for easier access to information. The content teams producing information for the portal need to classify their content to indicate which content is about which subjects. The content teams need to create and maintain the classification as they are working with the content.

Scope

This proposal has the following impacts:

  • Introduces a specialized map and a specialized map domain.
  • Requires enhancements to attribute validation and filtering and flagging processing.

Technical Requirements

Enumerating a list of controlled values

Fundamentally, a controlled value is a short, readable, and meaningful keyword that identifies a subject. For instance, when the writer sets the platform attribute of a warning note to the linux keyword, the writer is asserting that the warning is about the Linux platform. Because the DITA 1.2 key definition mechanism mints short identifiers, key definitions provide a natural DITA method for defining controlled values. Because the DITA map defines collections, the DITA map provides a natural DITA method for defining an enumeration of controlled values.

Draft comment:
The use of keys for defining controlled values here is, obviously, dependent on the DITA 1.2 keys proposal as a method for defining a controlled value that is unique within its category. In particular, the key definition proposal specifies the scope within which a key must be unique.

The core elements for a specialized map that defines the controlled value identifiers for some subjects are as follows:

<subjectScheme>
A specialized <map> that defines a collection of controlled values rather than a collection of topics.
<subjectdef>
A specialized <topicref> that defines the controlled value identifier for a subject within the scheme. (Specializing a topicref allows the definition of the subject to refer to an optional subject definition topic for more detail as described later in Defining the sense of a controlled value.)

The following example enumerates a list of operating systems. The top-level os <subjectdef> element identifies the category. The contained <subjectdef> elements identify each operating system:

<subjectScheme>
  <subjectdef keys="os">
    <subjectdef keys="linux"/>
    <subjectdef keys="mswin"/>
    <subjectdef keys="zos"/>
  </subjectdef>
  ...
</subjectScheme>

Defining both the category and the controlled values for the category as subjects allows nesting of subcategories (as described later in Defining a hierarchy of controlled values).

Because DITA adopters don't have to define controlled values, the subject scheme is, like bookmap, an optional specialization for adopters.

Binding an enumeration to an attribute

To define an enumeration for an attribute, the scheme associates the attribute with the <subjectdef> category that contains the enumeration using the following specialized elements:

<enumerationdef>
A specialized <topicgroup> or <topicref> element that identifies one attribute and one or more categories that contain the controlled values for the enumeration. (For the multiple category case, please see Binding multiple categories to a single attribute.) The type attribute has a default value of keys; enumerations based on other kinds of value definitions could be added in the future.
<elementdef>
An optional specialized <data> element that identifies the element containing the attribute. The name attribute identifies the name of the element. Where <elementdef> is omitted, the enumeration is bound to the attribute in all elements.
<attributedef>
A required specialized <data> element that identifies the attribute. The name attribute identifies the name of the attribute.
<defaultSubject>
An optional specialized <topicref> element that refers to the subject that should be considered the default controlled value if none is specified in the attribute.

The following example uses the specialized elements to associate the platform attribute with the operating system category:

<subjectScheme>
  <subjectdef keys="os">
    <subjectdef keys="linux"/>
    <subjectdef keys="mswin"/>
    <subjectdef keys="zos"/>
  </subjectdef>
  <enumerationdef>
    <attributedef name="platform"/>
    <subjectdef keyref="os"/>
  </enumerationdef>
</subjectScheme>

To establish the subject scheme governing attribute values, a map refers to the map that defines the enumerations. As with all key definitions and references, the reference must appear in the highest map that makes use of the controlled values. The general error and override conditions for key definitions apply to controlled values.

Note: Tools may choose to provide a preference dialog that establishes a default scheme so new maps are initialized with a reference to that scheme.

After locating the scheme, tools can validate an attribute against the bound enumeration. For instance, a topic editor could prevent the user from entering "linix" as a platform value:

<note platform="linux">Please don't remove the root directory.</note>

A map editor could also validate the platform attribute in a map against the scheme. Finally, a processor could check that all values listed for an attribute by the DITA values file are bound to the attribute by the scheme before applying filtering or flagging:

<val>
  <prop att="platform" val="linux" action="flag">
    <startflag>Linux</startflag>
  </prop>
</val>
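
The checks described above can be sketched in Python. This is an illustrative sketch only and not part of the proposal: the function names bound_values, invalid_values, and undefined_ditaval_props are hypothetical, the scheme is parsed from an inline string rather than located through a map, and keyref resolution is reduced to a lookup within the same scheme.

```python
import xml.etree.ElementTree as ET

# The example scheme and DITA values file from this section, inlined.
SCHEME = """
<subjectScheme>
  <subjectdef keys="os">
    <subjectdef keys="linux"/>
    <subjectdef keys="mswin"/>
    <subjectdef keys="zos"/>
  </subjectdef>
  <enumerationdef>
    <attributedef name="platform"/>
    <subjectdef keyref="os"/>
  </enumerationdef>
</subjectScheme>
"""

DITAVAL = """
<val>
  <prop att="platform" val="linux" action="flag">
    <startflag>Linux</startflag>
  </prop>
</val>
"""


def bound_values(scheme_xml):
    """Map each bound attribute name to the set of keys the scheme allows."""
    root = ET.fromstring(scheme_xml)
    # Index every subject definition by each of its keys.
    by_key = {}
    for sd in root.iter("subjectdef"):
        for key in sd.get("keys", "").split():
            by_key[key] = sd
    bindings = {}
    for enum in root.iter("enumerationdef"):
        attr = enum.find("attributedef").get("name")
        allowed = bindings.setdefault(attr, set())
        for category in enum.findall("subjectdef"):
            # The category is either referenced (keyref) or defined inline (keys).
            ref = category.get("keyref")
            target = by_key[ref] if ref else category
            # Subjects nested under the category are the controlled values;
            # the category key itself is not treated as a value.
            for sd in target.iter("subjectdef"):
                if sd is not target:
                    allowed.update(sd.get("keys", "").split())
    return bindings


def invalid_values(bindings, attr, value):
    """Return the tokens of an attribute value that the scheme does not define."""
    if attr not in bindings:
        return []  # unbound attributes still accept any character data
    return [tok for tok in value.split() if tok not in bindings[attr]]


def undefined_ditaval_props(bindings, ditaval_xml):
    """Return (att, val) pairs in a DITA values file that the scheme does not define."""
    problems = []
    for prop in ET.fromstring(ditaval_xml).iter("prop"):
        att, val = prop.get("att"), prop.get("val")
        if att in bindings and val is not None and val not in bindings[att]:
            problems.append((att, val))
    return problems


bindings = bound_values(SCHEME)
print(invalid_values(bindings, "platform", "linix"))  # ['linix']
print(undefined_ditaval_props(bindings, DITAVAL))     # []
```

An editor could run the first check as the writer types, while a build processor could run the second check before filtering or flagging begins.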

In the example scheme above, the os category is defined separately and referenced with the keyref attribute in the binding to the platform attribute. A content provider can, alternatively, define the category inline within the binding:

<subjectScheme>
  <enumerationdef>
    <attributedef name="platform"/>
    <subjectdef keys="os">
      <subjectdef keys="linux"/>
      <subjectdef keys="mswin"/>
      <subjectdef keys="zos"/>
    </subjectdef>
  </enumerationdef>
</subjectScheme>

The choice of the preferred approach is up to the adopter, though separating the binding allows for more flexibility for extension (as described later in Merging controlled values with an extension scheme).

Note: An enumeration can specify an empty category without children. In this case, no value is valid for the attribute.
Note: Whether an attribute takes a single value or multiple values from the enumeration is part of the structural definition of the element controlled by the DTD or XML Schema.
Note: One strategy for validating instances of an attribute bound to a subject scheme would be to generate a Schematron.
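
As one possible reading of the Schematron strategy, the following Python sketch generates a Schematron schema for a single bound attribute. The function name schematron_for is hypothetical, the proposal does not prescribe any particular rule shape, and the XPath 2.0 test assumes a Schematron processor that supports the xslt2 query binding. Keys are assumed to contain no quote characters.

```python
from xml.sax.saxutils import quoteattr


def schematron_for(attr, values):
    """Return Schematron source asserting that every token of `attr` is in `values`."""
    allowed = ", ".join("'%s'" % v for v in sorted(values))
    # XPath 2.0 test: split the attribute on spaces and check each token.
    test = ("every $t in tokenize(normalize-space(@%s), ' ') "
            "satisfies $t = (%s)" % (attr, allowed))
    return "\n".join([
        '<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron"'
        ' queryBinding="xslt2">',
        '  <sch:pattern>',
        '    <sch:rule context="*[@%s]">' % attr,
        '      <sch:assert test=%s>' % quoteattr(test),
        '        The %s attribute accepts only: %s'
        % (attr, ", ".join(sorted(values))),
        '      </sch:assert>',
        '    </sch:rule>',
        '  </sch:pattern>',
        '</sch:schema>',
    ])


print(schematron_for("platform", {"linux", "mswin", "zos"}))
```

Because the rule is generated from the scheme, it would have to be regenerated whenever the enumeration changes.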

Labelling controlled values

For clarity, a content provider can supply a label with the navtitle attribute of the <subjectdef> element:

<subjectScheme>
  <subjectdef keys="os" navtitle="Operating system">
    <subjectdef keys="linux" navtitle="Linux"/>
    <subjectdef keys="mswin" navtitle="Microsoft Windows"/>
    <subjectdef keys="zos"   navtitle="z/OS"/>
  </subjectdef>
  <enumerationdef>
    <attributedef name="platform"/>
    <subjectdef keyref="os"/>
  </enumerationdef>
</subjectScheme>

An editor could provide a pick list with the operating system keys and the titles for selection by the user:

linux Linux
mswin Microsoft Windows
zos z/OS

The editor should store only the key in the attribute when tagging content. That way, the content provider can maintain the title without invalidating existing classification of content.
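
An editor might build such a pick list along the following lines. This Python sketch is illustrative: the function name pick_list is hypothetical, only the direct children of the category are listed, and the key is used as the label when no navtitle is supplied.

```python
import xml.etree.ElementTree as ET

# The labelled example scheme from this section, inlined.
SCHEME = """
<subjectScheme>
  <subjectdef keys="os" navtitle="Operating system">
    <subjectdef keys="linux" navtitle="Linux"/>
    <subjectdef keys="mswin" navtitle="Microsoft Windows"/>
    <subjectdef keys="zos"   navtitle="z/OS"/>
  </subjectdef>
  <enumerationdef>
    <attributedef name="platform"/>
    <subjectdef keyref="os"/>
  </enumerationdef>
</subjectScheme>
"""


def pick_list(scheme_xml, category_key):
    """Return (key, label) pairs for the direct children of a category."""
    root = ET.fromstring(scheme_xml)
    for sd in root.iter("subjectdef"):
        if category_key in sd.get("keys", "").split():
            entries = []
            for child in sd.findall("subjectdef"):
                keys = child.get("keys", "").split()
                if keys:
                    # Fall back to the key when no navtitle label is supplied.
                    entries.append((keys[0], child.get("navtitle", keys[0])))
            return entries
    return []


for key, label in pick_list(SCHEME, "os"):
    print(key, label)
```

Only the first element of each pair would be written to the attribute; the second is for display.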

The writer can also supply a brief description for a subject within the scheme by supplying the <shortdesc> element within a <topicmeta> element under the <subjectdef> element.

Defining the sense of a controlled value

To use a controlled value consistently, content teams must apply the same interpretation to each controlled value; that is, teams must share an understanding of the subject indicated by the keyword. Otherwise, writers will apply the same controlled value to different content or apply different controlled values to the same content.

For instance, if one writer understands the server platform to cover any machine running a web server while another writer understands the server platform to cover high-end enterprise clusters accessed simultaneously by hundreds of users, different content will be classified with the same controlled value. As a result, filtering, flagging, and retrieval operations will treat dissimilar content as if it were the same.

Establishing a shared understanding is especially important when independent teams (perhaps in different companies) must produce a common deliverable.

To clarify the sense or meaning of a controlled value, the content provider can supply a subject definition topic similar to an entry in an encyclopaedia. In the following example, the linux and unix subjects have subject definition topics.

Figure 1. The baseOS.ditamap scheme map
<subjectScheme>
  <subjectdef keys="os" navtitle="Operating system">
    <subjectdef keys="linux" navtitle="Linux" href="subject/linux.dita"/>
    <subjectdef keys="mswin" navtitle="Windows"/>
    <subjectdef keys="unix"  navtitle="UNIX"  href="subject/unix.dita"/>
    <subjectdef keys="zos"   navtitle="z/OS"/>
  </subjectdef>
  <enumerationdef>
    <attributedef name="platform"/>
    <subjectdef keyref="os"/>
  </enumerationdef>
</subjectScheme>
Figure 2. The linux.dita subject definition topic
<concept id="linux">
  <title>The Linux operating system</title>
  <body>
     <p>Although Linux has historical roots in UNIX, ...</p>
  </body>
</concept>
Figure 3. The unix.dita subject definition topic
<concept id="unix">
  <title>The UNIX operating system</title>
  <body>
     <p>As a commercial operating system, UNIX differs from Linux ...</p>
  </body>
</concept>

As usual with DITA maps and topics, this approach has the benefit of decoupling maintenance of the relationships between subjects (including what belongs to an enumeration) from the maintenance of the textual explication of the meaning of the subject. For instance, if writers discover that the textual explication needs to be revised to eliminate a potential misinterpretation, the subject definition topic can be revised without touching the subject scheme map. Similarly, new subjects can be added to an enumeration without touching the subject definition topics.

A subject definition topic can be reused in multiple alternative subject schemes. Content providers who need to simplify the authoring experience for non-professional writers could take advantage of this capability to provide a subset of the defined controlled values in their environment.

Such subject definition topics can be provided during the initial creation of the subject scheme or added later as the need arises. When a subject is defined only with a key but not with a reference to a topic, the key can be thought of as an identifier for a virtual topic that could be added later if needed to explicate a well-known subject.

When offering a list of subjects in a pick list, an editor may support drill down into the subject definition topic for a detailed explanation of the subject.

Where the maintainer of the subject scheme has provided definitional topics for the controlled values, default DITA output formatting can produce a help file, PDF, or other readable catalog for understanding the controlled values.

Note: Nothing prevents the content team from using a topic both to define a subject and as part of the content provided by a deliverable. That's especially likely with conceptual definitions including glossary topics. In some cases, however, design guidelines that are published for content creators rather than content readers may provide more useful subject definition topics. Examples might include definitions of learning competencies.
Note: Nothing prevents a content provider from defining the meaning of a subject in a non-DITA format. The content provider would merely set the format attribute appropriately on the <subjectdef> element. In particular, a <subjectdef> could refer to subjects with OWL or TopicMaps definitions. That is, adopters could serialize a simplified form of the subjects as a scheme for use in DITA processing such as filtering and flagging.

Defining a hierarchy of controlled values

Content providers need the ability to classify more specifically in some cases while classifying more generally in others. For instance, a content provider might need to provide notes about specific versions of Linux as well as general notes about Linux.

An enumeration can be defined with hierarchical levels merely by nesting subject definitions. This approach is consistent with the other uses of the DITA map for expressing a general-to-specific hierarchy, notably nesting of topic references for navigation.

<subjectScheme>
  <subjectdef keys="os" navtitle="Operating system">
    <subjectdef keys="linux" navtitle="Linux">
      <subjectdef keys="redhat" navtitle="RedHat Linux"/>
      <subjectdef keys="suse"   navtitle="SuSE Linux"/>
    </subjectdef>
    <subjectdef keys="mswin" navtitle="Windows"/>
    <subjectdef keys="zos"   navtitle="z/OS"/>
  </subjectdef>
  <enumerationdef>
    <attributedef name="platform"/>
    <subjectdef keyref="os"/>
  </enumerationdef>
</subjectScheme>

A hierarchical enumeration supports tagging similar to the following:

<p platform="linux">You must set up a cron job to ...</p>
<p platform="redhat">To set up the cron job, ...</p>

This hierarchical enumeration affects filtering and flagging as follows:

  • When filtering includes or excludes Linux explicitly and doesn't identify RedHat explicitly, processes should also apply the filtering operation to the RedHat content because RedHat is a special kind of Linux.
  • When filtering includes RedHat explicitly and doesn't explicitly exclude Linux, processes should include the general Linux content because the general Linux content applies to RedHat Linux.
  • When flagging is set explicitly for Linux but isn't set explicitly for RedHat, processes should also apply the Linux flag to the RedHat content because RedHat is a special kind of Linux.
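
The three rules above can be read as "use the action set for the value itself, otherwise the action set for its nearest ancestor, otherwise include". The following Python sketch implements that reading. It is an interpretation and not normative: the names parent_map and effective_action are hypothetical, and the default of include for values with no explicit or inherited action is an assumption.

```python
import xml.etree.ElementTree as ET

# The hierarchical example scheme from this section, inlined.
SCHEME = """
<subjectScheme>
  <subjectdef keys="os" navtitle="Operating system">
    <subjectdef keys="linux" navtitle="Linux">
      <subjectdef keys="redhat" navtitle="RedHat Linux"/>
      <subjectdef keys="suse"   navtitle="SuSE Linux"/>
    </subjectdef>
    <subjectdef keys="mswin" navtitle="Windows"/>
    <subjectdef keys="zos"   navtitle="z/OS"/>
  </subjectdef>
</subjectScheme>
"""


def parent_map(scheme_xml):
    """Map each key to the key of its parent subject (None at the top level)."""
    parents = {}

    def walk(elem, parent_key):
        for child in elem.findall("subjectdef"):
            keys = child.get("keys", "").split()
            for key in keys:
                parents[key] = parent_key
            walk(child, keys[0] if keys else parent_key)

    walk(ET.fromstring(scheme_xml), None)
    return parents


def effective_action(value, rules, parents, default="include"):
    """Return the action for a value, inheriting from the nearest ancestor."""
    node = value
    while node is not None:
        if node in rules:
            return rules[node]
        node = parents.get(node)
    return default


PARENTS = parent_map(SCHEME)
# Excluding Linux also excludes RedHat content.
print(effective_action("redhat", {"linux": "exclude"}, PARENTS))  # exclude
# Including RedHat leaves general Linux content included.
print(effective_action("linux", {"redhat": "include"}, PARENTS))  # include
# Flagging Linux also flags RedHat content.
print(effective_action("redhat", {"linux": "flag"}, PARENTS))     # flag
```

An explicit setting on the more specific value takes precedence in this sketch, so excluding Linux while including RedHat keeps the RedHat content.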

Merging controlled values with an extension scheme

When content providers share an enumeration of controlled values, they may discover the need to extend the shared enumeration to handle special cases. That's particularly true when a need for new controlled values is discovered during content creation, and teams cannot afford to delay for agreement on the new values.

In the same way that maps can aggregate by reference using a <topicref> element with a format attribute of "ditamap", subject schemes can aggregate by reference. The following specialized element is used to merge schemes:

<schemeref>
A specialized <topicref> element with a format attribute of "ditamap" and a type attribute of "scheme" that identifies a base scheme extended by this scheme.

Because a scheme establishes relationships between subjects rather than a contextual navigation structure, new relationships can be added to existing subjects. In particular, the referencing scheme can extend an enumeration by adding new relationships to existing subjects that belong to the enumeration. For instance, a scheme could extend the baseOS.ditamap scheme shown in previous examples by adding Macintosh OS as a child of the existing os subject and adding special versions of Windows under the existing mswin subject:

<subjectScheme>
  <schemeref href="baseOS.ditamap"/>
  <subjectdef keyref="os">
    <subjectdef keys="macos" navtitle="Macintosh"/>
    <subjectdef keyref="mswin">
      <subjectdef keys="winxp" navtitle="Windows XP"/>
      <subjectdef keys="win98" navtitle="Windows Vista"/>
    </subjectdef>
  </subjectdef>
</subjectScheme>

The references to the subjects defined by the base scheme use the keyref attribute to avoid duplicate definitions of the keys.

The result of merging the extension scheme with the base scheme is exactly the same as the following single scheme:

<subjectScheme>
  <subjectdef keys="os" navtitle="Operating system">
    <subjectdef keys="linux" navtitle="Linux">
      <subjectdef keys="redhat" navtitle="RedHat Linux"/>
      <subjectdef keys="suse"   navtitle="SuSE Linux"/>
    </subjectdef>
    <subjectdef keys="macos" navtitle="Macintosh"/>
    <subjectdef keys="mswin" navtitle="Windows">
      <subjectdef keys="winxp" navtitle="Windows XP"/>
      <subjectdef keys="win98" navtitle="Windows Vista"/>
    </subjectdef>
    <subjectdef keys="zos"   navtitle="z/OS"/>
  </subjectdef>
  <enumerationdef>
    <attributedef name="platform"/>
    <subjectdef keyref="os"/>
  </enumerationdef>
</subjectScheme>

Because the extended baseOS scheme bound the os subject to the platform attribute, the extension scheme doesn't provide that binding. The controlled values added by the extension to the hierarchy for the os subject become part of the enumeration bound to the platform attribute.
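
A processor might perform the downward merge roughly as in the following Python sketch. It is illustrative only: the names merge_schemes, merge_into, and relations are hypothetical, both schemes are passed as strings instead of being resolved through the <schemeref> element, new subjects are appended rather than sorted, and upward extension is not handled.

```python
import xml.etree.ElementTree as ET

# A reduced base scheme, standing in for baseOS.ditamap.
BASE = """
<subjectScheme>
  <subjectdef keys="os" navtitle="Operating system">
    <subjectdef keys="linux" navtitle="Linux"/>
    <subjectdef keys="mswin" navtitle="Windows"/>
    <subjectdef keys="zos"   navtitle="z/OS"/>
  </subjectdef>
  <enumerationdef>
    <attributedef name="platform"/>
    <subjectdef keyref="os"/>
  </enumerationdef>
</subjectScheme>
"""

# The extension scheme from this section, inlined.
EXTENSION = """
<subjectScheme>
  <schemeref href="baseOS.ditamap"/>
  <subjectdef keyref="os">
    <subjectdef keys="macos" navtitle="Macintosh"/>
    <subjectdef keyref="mswin">
      <subjectdef keys="winxp" navtitle="Windows XP"/>
      <subjectdef keys="win98" navtitle="Windows Vista"/>
    </subjectdef>
  </subjectdef>
</subjectScheme>
"""


def merge_into(ext_parent, base_parent, by_key):
    """Copy the subjects under ext_parent into the matching place in the base."""
    for child in ext_parent.findall("subjectdef"):
        ref = child.get("keyref")
        if ref:
            # A subject the base already defines: descend into the base copy.
            merge_into(child, by_key[ref], by_key)
        else:
            # A new subject: attach it, then merge its own children.
            new = ET.SubElement(base_parent, "subjectdef", dict(child.attrib))
            for key in child.get("keys", "").split():
                by_key[key] = new
            merge_into(child, new, by_key)


def merge_schemes(base_xml, ext_xml):
    """Return the base scheme tree with the extension's subjects merged in."""
    base = ET.fromstring(base_xml)
    by_key = {key: sd for sd in base.iter("subjectdef")
              for key in sd.get("keys", "").split()}
    merge_into(ET.fromstring(ext_xml), base, by_key)
    return base


def relations(root):
    """Return the set of (parent key, child key) pairs defined by a scheme."""
    pairs = set()

    def walk(elem, parent_key):
        for child in elem.findall("subjectdef"):
            keys = child.get("keys", "").split()
            if keys:
                pairs.add((parent_key, keys[0]))
                walk(child, keys[0])

    walk(root, None)
    return pairs


merged = merge_schemes(BASE, EXTENSION)
print(sorted(pair for pair in relations(merged) if pair[0] is not None))
```

Comparing parent and child pairs rather than serialized XML reflects that a scheme establishes relationships between subjects, so the order of sibling subjects is not significant to the result.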

A category can also be extended upward. For instance, an extension scheme could create a Software category that includes operating systems as well as applications.

<subjectScheme>
  <schemeref href="baseOS.ditamap"/>
  <subjectdef keys="sw" navtitle="Software">
    <subjectdef keyref="os"/>
    <subjectdef keys="app" navtitle="Applications">
      <subjectdef keys="apacheserv" navtitle="Apache Web Server"/>
      <subjectdef keys="mysql"      navtitle="MySQL Database"/>
    </subjectdef>
  </subjectdef>
</subjectScheme>

If the extended baseOS scheme defined the binding of the os subject with the platform attribute, the app subjects provided by the extension scheme aren't subordinate to the os subject and thus don't become part of that enumeration. To leave open the possibility of upward extension of an enumeration, the content provider should define the controlled values in one scheme and define the binding to the attribute separately in an extension scheme. That way, the content provider can substitute a binding to a different extension without rework.

An adopter would identify the extension scheme as the scheme governing controlled values in the DITA environment. Any base schemes referenced by the extension scheme are, from a logical view, part of the extension scheme.

Note: Processors may choose to implement the merge operation by saving a new monolithic scheme that aggregates the subject definitions in a single file. This monolithic scheme would have to be refreshed when any of the source files for the merged schemes changes, but such changes may not happen frequently once the content provider's scheme stabilizes.

Binding multiple categories to a single attribute

While providing a single category for an attribute usually provides the most straightforward authoring experience, there are cases where an adopter might want to provide multiple categories for a single attribute. That's particularly true with the otherprops attribute, which allows content teams to supply controlled values even if the team lacks the technical knowledge to specialize new attributes in a DTD or XML Schema. An editor tool could prompt the user to select a category from the scheme and then select a subject within the category.

The following example defines the application and task type enumerations and binds them to the otherprops attribute:

Figure 4. scheme.ditamap
<subjectScheme>
  ...
  <subjectdef keys="app" navtitle="Applications">
    <subjectdef keys="apacheserv" navtitle="Apache Web Server"/>
    <subjectdef keys="mysql"      navtitle="MySQL Database"/>
  </subjectdef>
  <subjectdef keys="taskType" navtitle="Task type">
    <subjectdef keys="setup"        navtitle="Setting up"/>
    <subjectdef keys="operate"      navtitle="Operating"/>
    <subjectdef keys="troubleshoot" navtitle="Troubleshooting"/>
  </subjectdef>
  <enumerationdef>
    <attributedef name="otherprops"/>
    <subjectdef keyref="app"/>
    <subjectdef keyref="taskType"/>
  </enumerationdef>
</subjectScheme>

The writer can then supply the mysql and troubleshoot keys in the otherprops attribute to indicate that the content pertains to both the MySQL database and the troubleshooting task:

Figure 5. contentTopic.dita
<task ...>
  ...
  <note otherprops="mysql troubleshoot">Please check to make sure the daemon is running.</note>
  ...
</task>

When an attribute is bound to multiple enumerations, DITA processing determines exclusion for filtering based on the enumeration category rather than on the attribute. The following example excludes notes and other content that apply to MySQL and to no other software application, regardless of which task types are specified by the otherprops attribute:

<val>
  <prop att="otherprops" val="mysql" action="exclude"/>
</val>
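
One way to read the category-based rule is that content is excluded as soon as all of its values within any one category are excluded. The following Python sketch contrasts that reading with a plain attribute-level rule that excludes content only when every value in the attribute is excluded. Both functions and the CATEGORY table are hypothetical illustrations, and the exact evaluation rule is an assumption.

```python
# Category of each controlled value, mirroring the example scheme.ditamap.
CATEGORY = {
    "apacheserv": "app",
    "mysql": "app",
    "setup": "taskType",
    "operate": "taskType",
    "troubleshoot": "taskType",
}


def excluded_by_attribute(attr_value, excluded):
    """Attribute-level rule: exclude only if every value is excluded."""
    tokens = attr_value.split()
    return bool(tokens) and all(tok in excluded for tok in tokens)


def excluded_by_category(attr_value, excluded, category=CATEGORY):
    """Category-level rule: exclude if all values of any one category are excluded."""
    by_category = {}
    for tok in attr_value.split():
        by_category.setdefault(category.get(tok), []).append(tok)
    return any(all(tok in excluded for tok in toks)
               for toks in by_category.values())


excluded = {"mysql"}
print(excluded_by_attribute("mysql troubleshoot", excluded))            # False
print(excluded_by_category("mysql troubleshoot", excluded))             # True
print(excluded_by_category("mysql apacheserv troubleshoot", excluded))  # False
```

Under the category-level rule, the note in contentTopic.dita is excluded even though its troubleshoot value is not, while content that also applies to the Apache Web Server is kept.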

Draft comment:
A future version could introduce <categoryref> and <subjectref> elements for the DITA values file to express conditions by reference instead of requiring attribute values.

Classifying and qualifying topics in a map

By defining a scheme of controlled values that organizes subjects in categories with hierarchical relationships, a content provider in fact defines a simple taxonomy. By indicating which content is about the subjects defined in this scheme, the content provider can perform faceted classification (see http://en.wikipedia.org/wiki/Faceted_classification).

As noted in Statement of Requirement, information viewers can use a classification to support retrieval and traversal of the content. That is, the same enumeration of controlled values needed for filtering or flagging at build time also supports filtering, flagging, or retrieval at request time. For instance, once a content provider has defined the Linux operating system, the provider should be able to produce a deliverable without Linux or, if supported by the information viewer, allow the user to retrieve content specific to Linux.

As part of a classification, the content provider must distinguish cases where the content is about a subject (that is, provides the authoritative treatment of the subject) from cases where the content applies to the subject. Content about a subject is a good target for retrieval and traversal as well as filtering and flagging. Content that applies to a subject is appropriate for filtering and flagging but not retrieval and traversal.

Classification is needed only in the map for the following reasons:

  • Targets for retrieval or traversal are always complete topics or collections of topics.
  • As with indexing, content is classified to make distinctions; classification decisions are not made in isolation.

The classification elements are provided in a map domain so adopters who are using an information viewer that supports retrieval and traversal can classify their content but other content providers don't see the special elements. That is, like the subject scheme, the classification map domain is optional for adopters. The map domain provides the following elements:

<topicsubject>
A specialized <topicref> element that identifies the subjects for which the topic or collection of topics provides the authoritative treatment. The subjects can be identified by keys (if defined in the scheme) or, if the subject definition topic exists, by href (as with ordinary topic references). Additional secondary subjects can be specified by nested <subjectref> elements. Subjects that apply only for filtering or flagging and not for retrieval or traversal can be specified by <topicapply> elements.
<topicapply>
A specialized <topicref> element that identifies subjects that qualify the content for filtering or flagging but not retrieval. The <topicapply> element can identify a single subject. Additional subjects can be specified by nested <subjectref> elements.
<subjectref>
A specialized <topicref> element that identifies a subject by key (if defined in the scheme) or href (if a subject definition topic exists). The <subjectref> element is contained within a <topicsubject>, <topicapply>, or <topicSubjectTable> element.
<topicSubjectTable>
A specialized <reltable> that reserves the first column for the <topicref> elements that identify content and subsequent columns for <topicsubject>, <topicapply>, or <subjectref> elements that identify subjects covered by the content or qualifying the content. In the reltable header, the subject columns can have a <subjectref> element to specify the category for the subjects in the column. That way, each column covers a different category, making a facet classification easier.

In the following example, the map is classified as covering the Linux subject and the "Developing web applications" topic as covering the web and development subjects:

<map>
  <title>Working with Linux</title>
  <topicsubject keyref="linux"/>
  ...
  <topicref href="webapp.dita" navtitle="Developing web applications">
    <topicsubject>
      <subjectref keyref="web"/>
      <subjectref keyref="development"/>
    </topicsubject>
    ...
  </topicref>
  ...
</map>
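
The same kind of classification can be written as a table. The following is a minimal sketch that assumes the content models listed in the summary of new elements. The dbadmin.dita topic and the database and developer keys are hypothetical and serve only as illustration. The second column holds subjects that the content covers, and the third column holds subjects that only qualify the content for filtering or flagging:

```xml
<map>
  <title>Working with Linux</title>
  <topicSubjectTable>
    <!-- The optional <topicSubjectHeader> is omitted in this sketch. -->
    <topicSubjectRow>
      <!-- First column: the content being classified. -->
      <topicCell>
        <topicref href="webapp.dita"/>
      </topicCell>
      <!-- Second column: a subject that the content covers. -->
      <subjectCell>
        <topicsubject keyref="web"/>
      </subjectCell>
      <!-- Third column: a subject used only for filtering or flagging.
           The developer key is an assumed audience value. -->
      <subjectCell>
        <topicapply keyref="developer"/>
      </subjectCell>
    </topicSubjectRow>
    <topicSubjectRow>
      <topicCell>
        <!-- Hypothetical second topic. -->
        <topicref href="dbadmin.dita"/>
      </topicCell>
      <subjectCell>
        <topicsubject keyref="database"/>
      </subjectCell>
      <subjectCell>
        <topicapply keyref="developer"/>
      </subjectCell>
    </topicSubjectRow>
  </topicSubjectTable>
</map>
```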

As with all metadata in DITA maps, the classifications cascade down the navigation hierarchy unless overridden by a different subject for the same attribute or category. Thus, by virtue of being in the Linux map, the "Developing web applications" topic is also about Linux. DITA provides the cascade because a navigation hierarchy reflects a drilldown from the general to the specific.
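
The cascade and its override can be sketched as follows. This is one reading of the rule, not a normative example. It assumes that a redhat subject is defined in the scheme under the same category as linux, and the webapp-redhat.dita topic is hypothetical:

```xml
<map>
  <title>Working with Linux</title>
  <!-- Applies to every topic in the map unless overridden. -->
  <topicsubject keyref="linux"/>
  <topicref href="webapp.dita" navtitle="Developing web applications">
    <!-- Effective subjects: linux (inherited) and web. -->
    <topicsubject keyref="web"/>
    <topicref href="webapp-redhat.dita" navtitle="Deploying on RedHat Linux">
      <!-- redhat belongs to the same category as linux, so it replaces
           the inherited linux subject. The web subject still cascades.
           Effective subjects: redhat and web. -->
      <topicsubject keyref="redhat"/>
    </topicref>
  </topicref>
</map>
```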

When enabling retrieval or traversal, the build output format for the classification depends on the runtime viewer. Standard formats for classification include SKOS RDF (in particular, see http://www.w3.org/2004/02/skos/) and TopicMaps (in particular, see http://www.techquila.com/psi/thesaurus/).
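
As an illustration only, a build might emit the classification of the "Developing web applications" topic in SKOS RDF along the following lines. This proposal does not define the mapping. The example.com URIs, the labels for the web and development subjects, and the use of the Dublin Core subject property are all assumptions:

```xml
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:skos="http://www.w3.org/2004/02/skos/core#"
    xmlns:dcterms="http://purl.org/dc/terms/">
  <!-- One concept per subject referenced from the map. -->
  <skos:Concept rdf:about="http://example.com/subjects#linux">
    <skos:prefLabel xml:lang="en">Linux</skos:prefLabel>
  </skos:Concept>
  <skos:Concept rdf:about="http://example.com/subjects#web">
    <skos:prefLabel xml:lang="en">Web</skos:prefLabel>
  </skos:Concept>
  <skos:Concept rdf:about="http://example.com/subjects#development">
    <skos:prefLabel xml:lang="en">Development</skos:prefLabel>
  </skos:Concept>
  <!-- The topic is about web and development directly,
       and about Linux through the cascade from the map. -->
  <rdf:Description rdf:about="http://example.com/docs/webapp.html">
    <dcterms:subject rdf:resource="http://example.com/subjects#linux"/>
    <dcterms:subject rdf:resource="http://example.com/subjects#web"/>
    <dcterms:subject rdf:resource="http://example.com/subjects#development"/>
  </rdf:Description>
</rdf:RDF>
```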

Note: Some classification output formats may require additional properties. Content providers can supply these properties by creating a specialized topic type that collects the properties as part of subject definition topics.

Draft comment:
If a future version of the DITA values file introduces <categoryref> and <subjectref> elements, adopters could indicate to processors how to filter based on the classification in the DITA map.

Specific relationships between controlled values

Some advanced retrieval or traversal processing benefits from more specific relationships between subjects than simple hierarchies. The benefit of being able to use such precision has been recognized as part of the Functional Requirements for Bibliographic Records (FRBR, see http://vocab.org/frbr/extended), TermBase Exchange (TBX, see http://www.lisa.org/standards/tbx/tbxISO_final.html), and Simple Knowledge Organization System (SKOS, see http://www.w3.org/2004/02/skos/extensions/spec/) initiatives. The scheme provides the following optional elements for adopters who need to specify explicit relationships.

<hasNarrower>
A specialized <topicref> element that indicates that the container subject is more general than each of the contained subjects. That is, this element makes the default hierarchical relationship explicit.
<hasKind>, <hasPart>, <hasInstance>
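Specialized <topicref> elements that each indicate a specific relationship between the container subject and each of the contained subjects. The <hasKind> element specifies a KIND-OF / IS-A relationship, the <hasPart> element a PART-OF relationship, and the <hasInstance> element an INSTANCE-OF relationship.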
<hasRelated>
A specialized <topicref> element that identifies an associative relationship between the container subject and each of the contained subjects. As in any DITA map, the relationship applies to every parent-child pair among the descendants.
<relatedSubjects>
A specialized <topicref> element that establishes associative relationships between each child subject and every other child subject (unless the association is restricted by the linking attribute of the subjects).
<subjectHead>
Provides a labeled grouping within a subject hierarchy, similar to the <topichead> element in a navigation hierarchy. The <subjectHead> element cannot be referenced and thus does not define a controlled value.
<subjectRelTable>
A specialized <reltable> that establishes relationships between the subjects in different columns of the same row.

The following scheme establishes that Internet Explorer is part of Windows and that Linux, the Apache Web Server, and the MySQL Database are related:

<subjectScheme>
  ...
  <subjectdef keys="mswin" navtitle="Windows">
    <hasPart>
      <subjectdef keys="iexplorer" navtitle="Internet Explorer Browser"/>
      ...
    </hasPart>
  </subjectdef>
  ...
  <relatedSubjects>
    <subjectdef keys="linux"     navtitle="Linux"/>
    <subjectdef keys="apacheweb" navtitle="Apache Web Server"/>
    <subjectdef keys="mysql"     navtitle="MySQL Database"/>
    ...
  </relatedSubjects>
  ...
</subjectScheme>
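
The other relationship elements follow the same pattern. The following is a minimal sketch that combines a KIND-OF hierarchy with a relationship table. The redhat and suse keys are hypothetical, and referencing an existing subject with keyref on <subjectdef> is an assumption based on its being a specialized <topicref>:

```xml
<subjectScheme>
  <!-- Linux subsumes its distributions. <hasKind> states the
       KIND-OF relationship explicitly. -->
  <subjectdef keys="linux" navtitle="Linux">
    <hasKind>
      <subjectdef keys="redhat" navtitle="RedHat Linux"/>
      <subjectdef keys="suse" navtitle="SuSE Linux"/>
    </hasKind>
  </subjectdef>
  <subjectdef keys="apacheweb" navtitle="Apache Web Server"/>
  <subjectdef keys="mysql" navtitle="MySQL Database"/>
  <!-- Each row relates the subjects in one column to the subjects
       in the other columns of the same row. The optional
       <subjectRelHeader> is omitted. -->
  <subjectRelTable>
    <subjectRel>
      <subjectRole>
        <subjectdef keyref="redhat"/>
      </subjectRole>
      <subjectRole>
        <subjectdef keyref="apacheweb"/>
        <subjectdef keyref="mysql"/>
      </subjectRole>
    </subjectRel>
  </subjectRelTable>
</subjectScheme>
```

A filtering or flagging processor can treat the <hasKind> children exactly like ordinary nested <subjectdef> elements and can ignore the table.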

For filtering and flagging, processors need only inspect the subordinate hierarchies under category subjects that are bound to attributes. Filtering and flagging processors do not have to understand specific types of relationships. Explicit relationships are useful primarily for information viewers with advanced capabilities.

The content provider can name an explicit relationship by specifying the navtitle attribute and can provide more detailed properties of a relationship in a definitional topic. The content provider can also use keys to apply the same relationship to multiple subjects.
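
A named relationship might look like the following sketch. The "runs on" label, the runsOn key, and the runsOn.dita definitional topic are hypothetical. Reusing the relationship through keyref is one reading of the statement above and is not confirmed syntax:

```xml
<subjectScheme>
  <subjectdef keys="linux" navtitle="Linux"/>
  <subjectdef keys="apacheweb" navtitle="Apache Web Server">
    <!-- navtitle names the relationship. href points to a topic
         that describes its properties in more detail. -->
    <hasRelated keys="runsOn" navtitle="runs on" href="runsOn.dita">
      <subjectdef keyref="linux"/>
    </hasRelated>
  </subjectdef>
  <subjectdef keys="mysql" navtitle="MySQL Database">
    <!-- Applies the same named relationship to another subject. -->
    <hasRelated keyref="runsOn">
      <subjectdef keyref="linux"/>
    </hasRelated>
  </subjectdef>
</subjectScheme>
```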

Summary of new elements

The scheme map provides the following elements:

  • <subjectScheme> contains an optional <topicmeta> element followed by zero or more <schemeref>, <subjectdef>, <subjectHead>, <enumerationdef>, or the explicit relationships <hasNarrower>, <hasKind>, <hasPart>, <hasInstance>, <hasRelated>, <relatedSubjects>, and <subjectRelTable> as well as <anchor>, <data>, <data-about>, <navref>, <reltable>, or <topicref> for extensibility.
  • <schemeref> contains an optional <topicmeta> element followed by zero or more <data> or <data-about> for extensibility.
  • The explicit relationships <hasNarrower>, <hasKind>, <hasPart>, <hasInstance>, and <hasRelated> contain an optional <topicmeta> element followed by zero or more <subjectdef> or <subjectHead> as well as <topicref>, <data>, or <data-about> for extensibility.
  • <subjectdef> contains an optional <topicmeta> element followed by zero or more <subjectdef>, <subjectHead>, or the explicit relationships <hasNarrower>, <hasKind>, <hasPart>, <hasInstance>, or <hasRelated> as well as <topicref>, <data>, or <data-about> for extensibility.
  • <subjectHead> contains <subjectdef> or <subjectHead> as well as <topicref>, <data>, or <data-about> for extensibility.
  • <enumerationdef> contains an optional <elementdef>, one <attributedef>, an optional <defaultSubject>, one or more <subjectdef>, and zero or more <data> or <data-about> for extensibility.
  • <elementdef> contains zero or more <data> or <data-about> for extensibility.
  • <attributedef> contains zero or more <data> or <data-about> for extensibility.
  • <defaultSubject> contains zero or more <data> or <data-about> for extensibility.
  • <relatedSubjects> contains zero or more of <subjectdef> as well as <topicref>, <data>, or <data-about> for extensibility.
  • <subjectRelTable> contains a specialized reltable structure with an optional <topicmeta> element, an optional <subjectRelHeader> element, and one or more <subjectRel> elements.
  • <subjectRelHeader> contains one or more <subjectRole> elements.
  • <subjectRel> contains one or more <subjectRole> elements.
  • <subjectRole> contains zero or more <subjectdef> elements as well as <topicref>, <data>, or <data-about> for extensibility.

The classification map domain provides the following elements:

  • <topicsubject> contains an optional <topicmeta> element followed by zero or more <subjectref> as well as <topicref>, <data>, or <data-about> for extensibility.
  • <topicapply> contains an optional <topicmeta> element followed by zero or more <subjectref> as well as <topicref>, <data>, or <data-about> for extensibility.
  • <subjectref> contains an optional <topicmeta> element followed by zero or more <data> or <data-about> for extensibility.
  • <topicSubjectTable> contains a specialized reltable structure with an optional <topicmeta> element, an optional <topicSubjectHeader> element, and zero or more <topicSubjectRow> elements.
  • <topicSubjectHeader> contains one <topicCell> element followed by one or more <subjectCell> elements.
  • <topicSubjectRow> contains one <topicCell> element followed by one or more <subjectCell> elements.
  • <topicCell> contains one or more <topicref> as well as <data> or <data-about> for extensibility.
  • <subjectCell> contains an optional <topicsubject> followed by zero or more <topicapply> or <subjectref> as well as <topicref>, <data>, or <data-about> for extensibility.

The existing DITA taxonomy specialization (available as a plugin for the DITA Open Toolkit) provides many of these elements (see http://www.ibm.com/developerworks/xml/library/x-dita10/).

New or Changed Specification Language

Costs

  • Revision to the specification.
  • Providing the DTD and XML Schema implementation of the Scheme map and Classification domain vocabularies.
  • Extending DITA editors and processors to register and merge the scheme for validating attributes.
  • Extending DITA editors and processors to validate the values of bound attributes against the scheme.
  • Optionally extending DITA editors to provide a pick list for selecting attribute values and drilldown from a value into the definitional topic.
  • Extending DITA processors to filter or flag based on the categories bound to attributes and to qualify content based on the classification domain.

Benefits

  • Content providers can validate, filter, and flag controlled values for their content without implementing document types.
  • Content providers can handle cases where one controlled value subsumes another.
  • Content providers can share controlled values with others but still extend the shared values to meet their special requirements.
  • Content providers can scale up to use the same controlled values for retrieval and traversal where viewers support these operations.