Manage and validate the enumerations for attributes as controlled content.
The list of appropriate values for an attribute is determined by the subject matter of the content. For example, consider the audience attribute. In the documentation for a software design tool, the audience attribute might take values such as Architect, Developer, and User. By contrast, in the report for a pharmaceutical trial, the same audience attribute might take values such as Participant, Researcher, and Executive. In short, content providers need the ability to specify the attribute values that are appropriate for their content.
Because of this need, the selection attributes in DITA 1.1 don't define enumerations but, instead, accept any character data. The challenge for the content provider is to define the values for their content.
A content provider could specialize a DTD or XML Schema to enumerate the values for an attribute. This solution, however, has several problems:
Other initiatives have recognized the need to specify the enumerations of values separately from the structure and semantics of the document type. For instance, OASIS Universal Business Language (UBL) separates the validation of the document structure using XML Schema from the validation of controlled values using Schematron (see http://docs.oasis-open.org/ubl/os-UBL-2.0/UBL-2.0.html#CODELISTS). IEEE LOM (Learning Objects Metadata) attributes don't have a single enumeration of values but instead use a <source> subelement to identify the standard that defines the enumeration (see http://ltsc.ieee.org/wg12/materials.html). TBX (TermBase Exchange) uses an external XCS (Extensible Constraint Specification) file to define the controlled values for typing content elements (see http://www.lisa.org/standards/tbx/tbxISO_final.html#flexiFormat).
Provide DITA adopters with a method for defining controlled values as part of their content. This approach has the following benefits:
For an earlier version of this proposal, please see:
http://lists.oasis-open.org/archives/dita/200702/msg00059.html
This proposal has the following impacts:
Fundamentally, a controlled value is a short, readable, and meaningful keyword that identifies a subject. For instance, when the writer sets the platform attribute of a warning note to the linux keyword, the writer is asserting that the warning is about the Linux platform. Because the DITA 1.2 key definition mechanism mints short identifiers, key definitions provide a natural DITA method for defining controlled values. Because the DITA map defines collections, the DITA map provides a natural DITA method for defining an enumeration of controlled values.
The core elements for a specialized map that defines the controlled value identifiers for some subjects are as follows:
The following example enumerates a list of operating systems. The top-level os <subjectdef> element identifies the category. The contained <subjectdef> elements identify each operating system:
<subjectScheme>
  <subjectdef keys="os">
    <subjectdef keys="linux"/>
    <subjectdef keys="mswin"/>
    <subjectdef keys="zos"/>
  </subjectdef>
  ...
</subjectScheme>
Defining both the category and the controlled values for the category as subjects allows nesting of subcategories (as described later in Defining a hierarchy of controlled values).
Because DITA adopters aren't required to define controlled values, the subject scheme is, like bookmap, an optional specialization for adopters.
To define an enumeration for an attribute, the scheme associates the attribute with the <subjectdef> category that contains the enumeration using the following specialized elements:
The following example uses the specialized elements to associate the platform attribute with the operating system category:
<subjectScheme>
  <subjectdef keys="os">
    <subjectdef keys="linux"/>
    <subjectdef keys="mswin"/>
    <subjectdef keys="zos"/>
  </subjectdef>
  <enumerationdef>
    <attributedef name="platform"/>
    <subjectdef keyref="os"/>
  </enumerationdef>
</subjectScheme>
To establish the subject scheme governing attribute values, a map refers to the map that defines the enumerations. As with all key definitions and references, the reference must appear in the highest map that makes use of the controlled values. The general error and override conditions for key definitions apply to controlled values.
After locating the scheme, tools can validate an attribute against the bound enumeration. For instance, a topic editor could prevent the user from entering "linix" as a platform value:
<note platform="linux">Please don't remove the root directory.</note>
A map editor could also validate the platform attribute in a map against the scheme. Finally, a processor could check that all values listed for an attribute by the DITA values file are bound to the attribute by the scheme before applying filtering or flagging:
<val>
  <prop att="platform" val="linux" action="flag">
    <startflag>Linux</startflag>
  </prop>
</val>
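The validation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a prescribed implementation: the helper names `allowed_values` and `invalid_values` are hypothetical, the scheme is embedded as a string rather than located through a map, and the sketch assumes that the category key itself (os) is not a valid value for the attribute.

```python
import xml.etree.ElementTree as ET

# The example scheme from the text, embedded so the sketch is self-contained.
SCHEME = """
<subjectScheme>
  <subjectdef keys="os">
    <subjectdef keys="linux"/>
    <subjectdef keys="mswin"/>
    <subjectdef keys="zos"/>
  </subjectdef>
  <enumerationdef>
    <attributedef name="platform"/>
    <subjectdef keyref="os"/>
  </enumerationdef>
</subjectScheme>
"""


def allowed_values(scheme_xml, attribute):
    """Return the set of keys that the scheme binds to `attribute` (hypothetical helper)."""
    root = ET.fromstring(scheme_xml)
    # Index every subject definition that declares one or more keys.
    by_key = {}
    for sd in root.iter("subjectdef"):
        for key in sd.get("keys", "").split():
            by_key[key] = sd
    allowed = set()
    for enum in root.iter("enumerationdef"):
        attr = enum.find("attributedef")
        if attr is None or attr.get("name") != attribute:
            continue
        for ref in enum.findall("subjectdef"):
            # The category is either referenced with keyref or defined inline.
            category = by_key.get(ref.get("keyref")) if ref.get("keyref") else ref
            if category is None:
                continue
            # Assumption: every subject nested under the category is a valid
            # value, but the category key itself is not.
            for sd in category.iter("subjectdef"):
                if sd is not category:
                    allowed.update(sd.get("keys", "").split())
    return allowed


def invalid_values(scheme_xml, attribute, value):
    """Return the space-separated tokens of `value` that the scheme does not allow."""
    allowed = allowed_values(scheme_xml, attribute)
    return [token for token in value.split() if token not in allowed]


print(sorted(allowed_values(SCHEME, "platform")))
print(invalid_values(SCHEME, "platform", "linux"))
print(invalid_values(SCHEME, "platform", "linix"))
```

An editor could call a check like `invalid_values` on each attribute as the writer types, and a build processor could run the same check on the `val` attributes of a DITA values file before filtering or flagging.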
In the example scheme above, the os category is defined separately and referenced with the keyref attribute in the binding to the platform attribute. A content provider can, alternatively, define the category inline within the binding:
<subjectScheme>
  <enumerationdef>
    <attributedef name="platform"/>
    <subjectdef keys="os">
      <subjectdef keys="linux"/>
      <subjectdef keys="mswin"/>
      <subjectdef keys="zos"/>
    </subjectdef>
  </enumerationdef>
</subjectScheme>
The choice of the preferred approach is up to the adopter, though separating the binding allows for more flexibility for extension (as described later in Merging controlled values with an extension scheme).
For clarity, a content provider can supply a label with the navtitle attribute of the <subjectdef> element:
<subjectScheme>
  <subjectdef keys="os" navtitle="Operating system">
    <subjectdef keys="linux" navtitle="Linux"/>
    <subjectdef keys="mswin" navtitle="Microsoft Windows"/>
    <subjectdef keys="zos" navtitle="z/OS"/>
  </subjectdef>
  <enumerationdef>
    <attributedef name="platform"/>
    <subjectdef keyref="os"/>
  </enumerationdef>
</subjectScheme>
An editor could provide a pick list with the operating system keys and the titles for selection by the user:
linux | Linux |
mswin | Microsoft Windows |
zos | z/OS |
The editor should store only the key in the attribute when tagging content. That way, the content provider can maintain the title without invalidating the existing classification of content.
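As a rough sketch of how an editor might build such a pick list, the following Python reads the keys and labels directly under a category. The function name `pick_list` is hypothetical, and falling back to the key when no navtitle is present is an assumption of this sketch rather than a requirement of the proposal.

```python
import xml.etree.ElementTree as ET

# The labeled scheme from the text, without the attribute binding.
SCHEME = """
<subjectScheme>
  <subjectdef keys="os" navtitle="Operating system">
    <subjectdef keys="linux" navtitle="Linux"/>
    <subjectdef keys="mswin" navtitle="Microsoft Windows"/>
    <subjectdef keys="zos" navtitle="z/OS"/>
  </subjectdef>
</subjectScheme>
"""


def pick_list(scheme_xml, category_key):
    """Return (key, label) pairs for the subjects directly under a category."""
    root = ET.fromstring(scheme_xml)
    for category in root.iter("subjectdef"):
        if category_key in category.get("keys", "").split():
            return [
                # Assumption: show the key itself when no navtitle is supplied.
                (child.get("keys"), child.get("navtitle", child.get("keys")))
                for child in category.findall("subjectdef")
                if child.get("keys")
            ]
    return []


# Display both columns, but only the key (first column) would be stored.
for key, label in pick_list(SCHEME, "os"):
    print(f"{key:8} {label}")
```

Because only the key is written into the content, changing a navtitle in the scheme changes what the pick list displays without touching any tagged topic.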
The writer can also supply a brief description for a subject within the scheme by supplying the <shortdesc> element within a <topicmeta> element under the <subjectdef> element.
To use a controlled value consistently, content teams must apply the same interpretation to each controlled value; that is, teams must share an understanding of the subject indicated by the keyword. Otherwise, writers will apply the same controlled value to different content or apply different controlled values to the same content.
For instance, if one writer understands the server platform to cover any machine running a web server while another writer understands the server platform to cover high-end enterprise clusters accessed simultaneously by hundreds of users, different content will be classified with the same controlled value. As a result, filtering, flagging, and retrieval operations will treat dissimilar content as if it were the same.
Establishing a shared understanding is especially important when independent teams (perhaps in different companies) must produce a common deliverable.
To clarify the sense or meaning of a controlled value, the content provider can supply a subject definition topic similar to an entry in an encyclopaedia. In the following example, the linux and unix subjects have subject definition topics.
<subjectScheme>
  <subjectdef keys="os" navtitle="Operating system">
    <subjectdef keys="linux" navtitle="Linux" href="subject/linux.dita"/>
    <subjectdef keys="mswin" navtitle="Windows"/>
    <subjectdef keys="unix" navtitle="UNIX" href="subject/unix.dita"/>
    <subjectdef keys="zos" navtitle="z/OS"/>
  </subjectdef>
  <enumerationdef>
    <attributedef name="platform"/>
    <subjectdef keyref="os"/>
  </enumerationdef>
</subjectScheme>

<concept id="linux">
  <title>The Linux operating system</title>
  <body>
    <p>Although Linux has historical roots in UNIX, ...</p>
  </body>
</concept>

<concept id="unix">
  <title>The UNIX operating system</title>
  <body>
    <p>As a commercial operating system, UNIX differs from Linux ...</p>
  </body>
</concept>
As usual with DITA maps and topics, this approach has the benefit of decoupling maintenance of the relationships between subjects (including what belongs to an enumeration) from the maintenance of the textual explication of the meaning of the subject. For instance, if writers discover that the textual explication needs to be revised to eliminate a potential misinterpretation, the subject definition topic can be revised without touching the subject scheme map. Similarly, new subjects can be added to an enumeration without touching the subject definition topics.
A subject definition topic can be reused in multiple alternative subject schemes. Content providers who need to simplify the authoring experience for non-professional writers could take advantage of this capability to provide a subset of the defined controlled values in their environment.
Such subject definition topics can be provided during the initial creation of the subject scheme or added later as the need arises. When a subject is defined only with a key but not with a reference to a topic, the key can be thought of as an identifier for a virtual topic that could be added later if needed to explicate a well-known subject.
When offering a list of subjects in a pick list, an editor may support drill down into the subject definition topic for a detailed explanation of the subject.
Where the maintainer of the subject scheme has provided definitional topics for the controlled values, default DITA output formatting can produce a help file, PDF, or other readable catalog for understanding the controlled values.
Content providers need the ability to classify more specifically in some cases while classifying more generally in others. For instance, a content provider might need to provide notes about specific versions of Linux as well as general notes about Linux.
An enumeration can be defined with hierarchical levels merely by nesting subject definitions. This approach is consistent with the other uses of the DITA map for expressing a general-to-specific hierarchy, notably nesting of topic references for navigation.
<subjectScheme>
  <subjectdef keys="os" navtitle="Operating system">
    <subjectdef keys="linux" navtitle="Linux">
      <subjectdef keys="redhat" navtitle="RedHat Linux"/>
      <subjectdef keys="suse" navtitle="SuSE Linux"/>
    </subjectdef>
    <subjectdef keys="mswin" navtitle="Windows"/>
    <subjectdef keys="zos" navtitle="z/OS"/>
  </subjectdef>
  <enumerationdef>
    <attributedef name="platform"/>
    <subjectdef keyref="os"/>
  </enumerationdef>
</subjectScheme>
A hierarchical enumeration supports tagging similar to the following:
<p platform="linux">You must set up a cron job to ...</p>
<p platform="redhat">To set up the cron job, ...</p>
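To act on a hierarchical enumeration, a tool first needs to know which broader subjects a value belongs to. The following Python is a minimal sketch of that lookup only; the helper names `parent_map`, `ancestors`, and `applies_to` are hypothetical, each subject is assumed to declare a single key, and the sketch deliberately takes no position on how a processor combines the result with filtering or flagging actions.

```python
import xml.etree.ElementTree as ET

# The hierarchical scheme from the text, without the attribute binding.
SCHEME = """
<subjectScheme>
  <subjectdef keys="os" navtitle="Operating system">
    <subjectdef keys="linux" navtitle="Linux">
      <subjectdef keys="redhat" navtitle="RedHat Linux"/>
      <subjectdef keys="suse" navtitle="SuSE Linux"/>
    </subjectdef>
    <subjectdef keys="mswin" navtitle="Windows"/>
    <subjectdef keys="zos" navtitle="z/OS"/>
  </subjectdef>
</subjectScheme>
"""


def parent_map(scheme_xml):
    """Map each subject key to the key of its parent subject (None at the top)."""
    root = ET.fromstring(scheme_xml)
    parents = {}

    def walk(element, parent_key):
        for child in element.findall("subjectdef"):
            key = child.get("keys")
            if key:
                parents[key] = parent_key
            # Children of a keyless element inherit the nearest keyed ancestor.
            walk(child, key or parent_key)

    walk(root, None)
    return parents


def ancestors(parents, key):
    """Return the chain of broader subjects for `key`, nearest first."""
    chain = []
    current = parents.get(key)
    while current is not None:
        chain.append(current)
        current = parents.get(current)
    return chain


def applies_to(parents, value, subject):
    """True when `value` is `subject` itself or one of its narrower subjects."""
    return value == subject or subject in ancestors(parents, value)


PARENTS = parent_map(SCHEME)
print(ancestors(PARENTS, "redhat"))
print(applies_to(PARENTS, "redhat", "linux"))
print(applies_to(PARENTS, "linux", "redhat"))
```

With this lookup, content tagged redhat can be recognized as a special case of linux, while content tagged linux is not treated as specific to redhat.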
This hierarchical enumeration affects filtering and flagging as follows:
When content providers share an enumeration of controlled values, they may discover the need to extend the shared enumeration to handle special cases. That's particularly true when a need for new controlled values is discovered during content creation, and teams cannot afford to delay for agreement on the new values.
In the same way that maps can aggregate by reference using a <topicref> element with a format attribute of "ditamap", subject schemes can aggregate by reference. The specialized elements used to merge schemes:
Because a scheme establishes relationships between subjects rather than a contextual navigation structure, new relationships can be added to existing subjects. In particular, the referencing scheme can extend an enumeration by adding new relationships to existing subjects that belong to the enumeration. For instance, a scheme could extend the baseOS.ditamap scheme shown in previous examples by adding Macintosh OS as a child of the existing os subject and adding special versions of Windows under the existing mswin subject:
<subjectScheme>
  <schemeref href="baseOS.ditamap"/>
  <subjectdef keyref="os">
    <subjectdef keys="macos" navtitle="Macintosh"/>
    <subjectdef keyref="mswin">
      <subjectdef keys="winxp" navtitle="Windows XP"/>
      <subjectdef keys="win98" navtitle="Windows Vista"/>
    </subjectdef>
  </subjectdef>
</subjectScheme>
The references to the subjects defined by the base scheme use the keyref attribute to avoid duplicate definitions of the keys.
The result of merging the extension scheme with the base scheme is exactly the same as the following single scheme:
<subjectScheme>
  <subjectdef keys="os" navtitle="Operating system">
    <subjectdef keys="linux" navtitle="Linux">
      <subjectdef keys="redhat" navtitle="RedHat Linux"/>
      <subjectdef keys="suse" navtitle="SuSE Linux"/>
    </subjectdef>
    <subjectdef keys="macos" navtitle="Macintosh"/>
    <subjectdef keys="mswin" navtitle="Windows">
      <subjectdef keys="winxp" navtitle="Windows XP"/>
      <subjectdef keys="win98" navtitle="Windows Vista"/>
    </subjectdef>
    <subjectdef keys="zos" navtitle="z/OS"/>
  </subjectdef>
  <enumerationdef>
    <attributedef name="platform"/>
    <subjectdef keyref="os"/>
  </enumerationdef>
</subjectScheme>
Because the extended baseOS scheme bound the os subject to the platform attribute, the extension scheme doesn't provide that binding. The controlled values added by the extension to the hierarchy for the os subject become part of the enumeration bound to the platform attribute.
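The downward merge can be illustrated with a short Python sketch. It rests on several assumptions: the function names `merge` and `child_keys` are hypothetical, the base scheme is passed in as a string instead of being resolved from the `schemeref` href, an unknown keyref simply raises a `KeyError`, and new subjects are appended after existing ones (the sketch treats sibling order as insignificant). It does not handle upward extension, where an existing category is placed under a new parent.

```python
import xml.etree.ElementTree as ET

# Stand-in for baseOS.ditamap, embedded so the sketch is self-contained.
BASE = """
<subjectScheme>
  <subjectdef keys="os" navtitle="Operating system">
    <subjectdef keys="linux" navtitle="Linux">
      <subjectdef keys="redhat" navtitle="RedHat Linux"/>
      <subjectdef keys="suse" navtitle="SuSE Linux"/>
    </subjectdef>
    <subjectdef keys="mswin" navtitle="Windows"/>
    <subjectdef keys="zos" navtitle="z/OS"/>
  </subjectdef>
  <enumerationdef>
    <attributedef name="platform"/>
    <subjectdef keyref="os"/>
  </enumerationdef>
</subjectScheme>
"""

EXTENSION = """
<subjectScheme>
  <schemeref href="baseOS.ditamap"/>
  <subjectdef keyref="os">
    <subjectdef keys="macos" navtitle="Macintosh"/>
    <subjectdef keyref="mswin">
      <subjectdef keys="winxp" navtitle="Windows XP"/>
      <subjectdef keys="win98" navtitle="Windows Vista"/>
    </subjectdef>
  </subjectdef>
</subjectScheme>
"""


def merge(base_xml, extension_xml):
    """Return the base scheme tree with the extension's new subjects attached."""
    merged = ET.fromstring(base_xml)
    extension = ET.fromstring(extension_xml)
    # Index the subjects that the base scheme already defines.
    by_key = {}
    for sd in merged.iter("subjectdef"):
        for key in sd.get("keys", "").split():
            by_key[key] = sd

    def attach(source, target):
        for child in source.findall("subjectdef"):
            keyref = child.get("keyref")
            if keyref:
                # A reference to an existing subject: extend it in place.
                attach(child, by_key[keyref])
            else:
                # A new subject: copy it under the current target.
                new = ET.SubElement(target, "subjectdef", dict(child.attrib))
                for key in child.get("keys", "").split():
                    by_key[key] = new
                attach(child, new)

    attach(extension, merged)
    return merged


def child_keys(scheme, key):
    """Return the keys of the subjects directly under the subject `key`."""
    for sd in scheme.iter("subjectdef"):
        if key in sd.get("keys", "").split():
            return [child.get("keys") for child in sd.findall("subjectdef")]
    return []


MERGED = merge(BASE, EXTENSION)
print(child_keys(MERGED, "os"))
print(child_keys(MERGED, "mswin"))
```

Because the binding of os to the platform attribute lives in the base scheme and the new subjects are attached under os, they join the enumeration without any change to the binding.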
A category can also be extended upward. For instance, an extension scheme could create a Software category that includes operating systems as well as applications.
<subjectScheme>
  <schemeref href="baseOS.ditamap"/>
  <subjectdef keys="sw" navtitle="Software">
    <subjectdef keyref="os"/>
    <subjectdef keys="app" navtitle="Applications">
      <subjectdef keys="apacheserv" navtitle="Apache Web Server"/>
      <subjectdef keys="mysql" navtitle="MySQL Database"/>
    </subjectdef>
  </subjectdef>
</subjectScheme>
If the extended baseOS scheme defined the binding of the os subject with the platform attribute, the app subjects provided by the extension scheme aren't subordinate to the os subject and thus don't become part of that enumeration. To leave open the possibility of upward extension of an enumeration, the content provider should define the controlled values in one scheme and define the binding to the attribute separately in an extension scheme. That way, the content provider can substitute a binding to a different extension without rework.
An adopter would identify the extension scheme as the scheme governing controlled values in the DITA environment. Any base schemes referenced by the extension scheme are, from a logical view, part of the extension scheme.
While providing a single category for an attribute usually provides the most straightforward authoring experience, there are cases where an adopter might want to provide multiple categories for a single attribute. That's particularly true with the otherprops attribute, which allows content teams to supply controlled values even if the team lacks the technical knowledge to specialize new attributes in a DTD or XML Schema. An editor tool could prompt the user to select a category from the scheme and then select a subject within the category.
The following example defines the application and task type enumerations and binds them to the otherprops attribute:
<subjectScheme>
  ...
  <subjectdef keys="app" navtitle="Applications">
    <subjectdef keys="apacheserv" navtitle="Apache Web Server"/>
    <subjectdef keys="mysql" navtitle="MySQL Database"/>
  </subjectdef>
  <subjectdef keys="taskType" navtitle="Task type">
    <subjectdef keys="setup" navtitle="Setting up"/>
    <subjectdef keys="operate" navtitle="Operating"/>
    <subjectdef keys="troubleshoot" navtitle="Troubleshooting"/>
  </subjectdef>
  <enumerationdef>
    <attributedef name="otherprops"/>
    <subjectdef keyref="app"/>
    <subjectdef keyref="taskType"/>
  </enumerationdef>
</subjectScheme>
The writer can then supply the mysql and troubleshoot keys in the otherprops attribute to indicate that the content pertains to both the MySQL database and the troubleshooting task:
<task ...>
  ...
  <note otherprops="mysql troubleshoot">Please check to make sure the daemon is running.</note>
  ...
</task>
When an attribute is bound to multiple enumerations, DITA processing determines exclusion for filtering based on the enumeration category rather than on the attribute. The following example excludes notes and other content that applies to MySQL and to no other software application, regardless of which tasks are specified by the otherprops attribute:
<val>
  <prop att="otherprops" val="mysql" action="exclude"/>
</val>
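One reading of category-based exclusion can be sketched in Python as follows. The helper names `category_of` and `is_excluded` are hypothetical, and the rule itself is an interpretation of the text above: the values in the attribute are grouped by category, and the content is excluded when every value in at least one category is excluded.

```python
import xml.etree.ElementTree as ET

# The otherprops scheme from the text, with the elided content left out.
SCHEME = """
<subjectScheme>
  <subjectdef keys="app" navtitle="Applications">
    <subjectdef keys="apacheserv" navtitle="Apache Web Server"/>
    <subjectdef keys="mysql" navtitle="MySQL Database"/>
  </subjectdef>
  <subjectdef keys="taskType" navtitle="Task type">
    <subjectdef keys="setup" navtitle="Setting up"/>
    <subjectdef keys="operate" navtitle="Operating"/>
    <subjectdef keys="troubleshoot" navtitle="Troubleshooting"/>
  </subjectdef>
  <enumerationdef>
    <attributedef name="otherprops"/>
    <subjectdef keyref="app"/>
    <subjectdef keyref="taskType"/>
  </enumerationdef>
</subjectScheme>
"""


def category_of(scheme_xml, attribute):
    """Map each controlled value bound to `attribute` to its category key."""
    root = ET.fromstring(scheme_xml)
    by_key = {sd.get("keys"): sd for sd in root.iter("subjectdef") if sd.get("keys")}
    categories = {}
    for enum in root.iter("enumerationdef"):
        attr = enum.find("attributedef")
        if attr is None or attr.get("name") != attribute:
            continue
        for ref in enum.findall("subjectdef"):
            category = by_key.get(ref.get("keyref"))
            if category is None:
                continue
            for sd in category.iter("subjectdef"):
                if sd is not category and sd.get("keys"):
                    categories[sd.get("keys")] = ref.get("keyref")
    return categories


def is_excluded(value, excluded, categories):
    """Assumed rule: exclude when all values of some category are excluded."""
    groups = {}
    for token in value.split():
        groups.setdefault(categories.get(token), []).append(token)
    return any(all(t in excluded for t in tokens) for tokens in groups.values())


CATEGORIES = category_of(SCHEME, "otherprops")
EXCLUDED = {"mysql"}  # from <prop att="otherprops" val="mysql" action="exclude"/>
print(is_excluded("mysql troubleshoot", EXCLUDED, CATEGORIES))
print(is_excluded("mysql apacheserv troubleshoot", EXCLUDED, CATEGORIES))
```

Under this reading, the note tagged "mysql troubleshoot" is removed even though troubleshoot is not excluded, because mysql is the only application it names; a note that also names apacheserv is kept.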
By defining a scheme of controlled values that organizes subjects in categories with hierarchical relationships, a content provider in fact defines a simple taxonomy. By indicating which content is about the subjects defined in this scheme, the content provider can perform faceted classification (see http://en.wikipedia.org/wiki/Faceted_classification).
As noted in Statement of Requirement, information viewers can use a classification to support retrieval and traversal of the content. That is, the same enumeration of controlled values needed for filtering or flagging at build time also supports filtering, flagging, or retrieval at request time. For instance, once a content provider has defined the Linux operating system, the provider should be able to produce a deliverable without Linux or, if supported by the information viewer, allow the user to retrieve content specific to Linux.
As part of a classification, the content provider must distinguish cases where the content is about a subject (that is, provides the authoritative treatment of the subject) from cases where the content applies to the subject. Content about a subject is a good target for retrieval and traversal as well as filtering and flagging. Content that applies to a subject is appropriate for filtering and flagging but not retrieval and traversal.
Classification is needed only in the map for the following reasons:
The classification elements are provided in a map domain so adopters who are using an information viewer that supports retrieval and traversal can classify their content but other content providers don't see the special elements. That is, like the subject scheme, the classification map domain is optional for adopters. The elements in the map domain:
In the following example, the map is classified as covering the Linux subject and the "Developing web applications" topic as covering the web and development subjects:
<map>
  <title>Working with Linux</title>
  <topicsubject keyref="linux"/>
  ...
  <topicref href="webapp.dita" navtitle="Developing web applications">
    <topicsubject>
      <subjectref keyref="web"/>
      <subjectref keyref="development"/>
    </topicsubject>
    ...
  </topicref>
  ...
</map>
As with all metadata in DITA maps, the classifications cascade down the navigation hierarchy unless overridden by a different subject for the same attribute or category. Thus, by virtue of being in the Linux map, the "Developing web applications" topic is also about Linux. DITA provides the cascade because a navigation hierarchy reflects a drilldown from the general to the specific.
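The cascade can be illustrated with a small Python sketch that walks the map and accumulates subjects from each level. The function names `own_subjects` and `effective_subjects` are hypothetical, the nested deploy.dita topic reference is invented purely to show inheritance, and the sketch only accumulates subjects: it does not implement the override rule for a different subject in the same attribute or category.

```python
import xml.etree.ElementTree as ET

# The classified map from the text; deploy.dita is a hypothetical child topic.
MAP = """
<map>
  <title>Working with Linux</title>
  <topicsubject keyref="linux"/>
  <topicref href="webapp.dita" navtitle="Developing web applications">
    <topicsubject>
      <subjectref keyref="web"/>
      <subjectref keyref="development"/>
    </topicsubject>
    <topicref href="deploy.dita" navtitle="Deploying"/>
  </topicref>
</map>
"""


def own_subjects(element):
    """Return the subject keys that `element` declares directly."""
    subjects = []
    for ts in element.findall("topicsubject"):
        # A subject can be named on topicsubject itself or on nested subjectref.
        if ts.get("keyref"):
            subjects.append(ts.get("keyref"))
        subjects.extend(
            ref.get("keyref") for ref in ts.findall("subjectref") if ref.get("keyref")
        )
    return subjects


def effective_subjects(map_xml):
    """Map each topic href to its inherited plus directly declared subjects."""
    root = ET.fromstring(map_xml)
    result = {}

    def walk(element, inherited):
        subjects = inherited + [s for s in own_subjects(element) if s not in inherited]
        if element.tag == "topicref" and element.get("href"):
            result[element.get("href")] = subjects
        for child in element.findall("topicref"):
            walk(child, subjects)

    walk(root, [])
    return result


for href, subjects in effective_subjects(MAP).items():
    print(href, subjects)
```

In this sketch, webapp.dita picks up linux from the map in addition to its own web and development subjects, and the nested topic inherits all three.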
When enabling retrieval or traversal, the build output format for the classification depends on the runtime viewer. Standard formats for classification include SKOS RDF (in particular, see http://www.w3.org/2004/02/skos/) and TopicMaps (in particular, see http://www.techquila.com/psi/thesaurus/).
Some advanced retrieval or traversal processing benefits from more specific relationships between subjects than simple hierarchies. The benefit of being able to use such precision has been recognized as part of the Functional Requirements for Bibliographic Records (FRBR, see http://vocab.org/frbr/extended), TermBase Exchange (TBX, see http://www.lisa.org/standards/tbx/tbxISO_final.html), and Simple Knowledge Organization System (SKOS, see http://www.w3.org/2004/02/skos/extensions/spec/) initiatives. The scheme provides the following optional elements for adopters who need to specify explicit relationships.
The following scheme establishes that Internet Explorer is part of Windows and that Linux, the Apache Web Server, and the MySQL Database are related:
<subjectScheme>
  ...
  <subjectdef keys="mswin" navtitle="Windows">
    <hasPart>
      <subjectdef keys="iexplorer" navtitle="Internet Explorer Browser"/>
      ...
    </hasPart>
  </subjectdef>
  ...
  <relatedSubjects>
    <subjectdef keys="linux" navtitle="Linux"/>
    <subjectdef keys="apacheweb" navtitle="Apache Web Server"/>
    <subjectdef keys="mysql" navtitle="MySQL Database"/>
    ...
  </relatedSubjects>
  ...
</subjectScheme>
For filtering and flagging, processors need only inspect the subordinate hierarchies under category subjects that are bound to attributes. Filtering and flagging processors do not have to understand specific types of relationships. Explicit relationships are useful primarily for information viewers with advanced capabilities.
The content provider can name an explicit relationship by specifying the navtitle attribute and can provide more detailed properties of a relationship in a definitional topic. The content provider can also use keys to apply the same relationship to multiple subjects.
The scheme map provides the following elements:
The classification map domain provides the following elements:
The existing DITA taxonomy specialization (available as a plugin for the DITA Open Toolkit) provides many of these elements (see http://www.ibm.com/developerworks/xml/library/x-dita10/).