Add a <data> element for representing machine-processable values within DITA topics and maps.
The problem: Currently, DITA provides limited extensibility for properties as well as embedded data (such as the form fields that word processors can embed in content).
For the topic as a whole, a designer can only specialize single values from the <othermeta> element. The designer can't define complex data structures comparable to the existing <audience> and <prodinfo> properties. As a result, specializers are forced to specialize body content to create complex data structures. (For examples, see the eNote record-oriented demo and the bookinfo data for the bookmap demo in the DITA Open Toolkit.) These data structures are not part of the topic content and thus don't belong in the body.
Within the topic content, a designer can specialize data from the <state> element. Again, the <state> element supports only single values. As a result, specializers are forced to abuse discourse elements such as <ph> for data that isn't a textual phrase.
The solution: Add a <data> element for values intended to be consumed primarily by automated processes. Typical applications would include both complex metadata structures and hybrid documents with both discourse and data values. You can nest <data> elements for structures and specialize the <data> element for more precise semantics and for constraints on structures and values.
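As a rough illustration of nesting, the following fragment groups three values into one structure using only the unspecialized <data> element. The names and values (review, reviewer, reviewDate, status) are hypothetical and are not part of this proposal; only the name and value attributes come from the proposed definition.

```xml
<data name="review">
  <!-- Simple values carried in the value attribute -->
  <data name="reviewer" value="J. Doe"/>
  <data name="reviewDate" value="2005-06-30"/>
  <!-- A value carried as element content instead -->
  <data name="status">approved with minor changes</data>
</data>
```

A specialization could replace each generic name with a dedicated element (for example, a hypothetical <reviewer> element) and constrain which values may nest inside it.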
A process could harvest the data values for a machine-processable representation such as RDF. Formatting for discourse should skip the <data> element by default. A specialization could, however, extend processing to include data values in some formatted outputs (again, similar to form fields in word processor formats).
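A harvesting step might look something like the XSLT 1.0 sketch below. It does not produce RDF; it only lists each value as a plain-text line, which a later step could map to RDF or another representation. It assumes that the <data> element would carry a class attribute containing the token topic/data, following the usual DITA class convention, and that the class defaults are available to the processor (for example, through DTD-aware parsing). Neither assumption is stated in this proposal.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Emit one "name = value" line for every data element (or
       specialization of it) that has a value attribute -->
  <xsl:template match="/">
    <xsl:for-each select="//*[contains(@class, ' topic/data ')][@value]">
      <xsl:choose>
        <!-- Prefer the name attribute when it is present -->
        <xsl:when test="@name">
          <xsl:value-of select="@name"/>
        </xsl:when>
        <!-- Otherwise fall back to the element name -->
        <xsl:otherwise>
          <xsl:value-of select="name()"/>
        </xsl:otherwise>
      </xsl:choose>
      <xsl:text> = </xsl:text>
      <xsl:value-of select="@value"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

This sketch only reads the value attribute; values carried as element content would need an additional rule.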
The <data> element is a more powerful alternative to the <state> and <othermeta> elements. The <state> element could, in fact, be specialized from the <data> element and deprecated.
The following references are pertinent:
Major only because the implications need to be considered carefully. The actual design and implementation should be small.
Here are some specific examples of potential uses of the <data> element:
Design impact: a new element would be added with a definition similar to the following (in DTD syntax):
<!ELEMENT data (#PCDATA|%keyword;|%term;|%image;|%object;|%ph;|%data;)*>
<!ATTLIST data %univ-atts;
name CDATA #IMPLIED
label CDATA #IMPLIED
typeid CDATA #IMPLIED
abouthref CDATA #IMPLIED
abouttype CDATA #IMPLIED
aboutformat CDATA #IMPLIED
value CDATA #IMPLIED
href CDATA #IMPLIED
type CDATA #IMPLIED
format CDATA #IMPLIED
outputclass CDATA #IMPLIED
>
The new element would be added to the following contexts:
Processing impact: a default rule to ignore the <data> element silently.
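In an XSLT-based processor, the default rule could be as small as one empty template. The sketch below assumes that the <data> element would carry a class attribute containing the token topic/data, following the usual DITA class convention; that class value is an assumption and is not defined in this proposal.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Empty template: a matching data element, together with
       everything nested inside it, produces no output -->
  <xsl:template match="*[contains(@class, ' topic/data ')]"/>
</xsl:stylesheet>
```

Because the match tests the class attribute rather than the element name, the same rule would also cover elements specialized from <data>. A specialization that wants its values formatted could supply a more specific template for its own elements.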
Because the <data> element is optional in all contexts, there are no migration costs.
As noted above, design impacts are small and processing impacts are minimal.
Editors and content management systems must implement support for nested values.
The benefit of the <data> element is primarily for specialization. Without the <data> element, data structures have to be implemented in the topic body because topic metadata doesn't support complex structures. Semantically, this workaround (usually with the <ph> or <keyword> element) makes the false promise that the data content is discourse.
A clean basis for extensible data provides benefits to everyone who works with complex metadata and opens up the potential for DITA form-like and transactional documents. The rest of this section lists some examples of potential specializations of the data element in different subject areas.
The following example identifies the properties of a book (where <bkrights> and everything it contains are specialized from the <data> element):
<bookmap>
<bkrights>
<bkcopyrfirst><year>2003</year></bkcopyrfirst>
<bkcopyrlast><year>2005</year></bkcopyrlast>
<bkowner>
<organization>
<orgname>XYZ, Inc</orgname>
<phone>123-456-7890</phone>
<resource href="http://www.xyz.com/"/>
</organization>
</bkowner>
</bkrights>
...
</bookmap>
The following example specifies source code delimiters for automatic refresh of a code fragment (where the <sourceFile>, <startDelimiter>, and <endDelimiter> elements are specialized from <data> but the <codeFragment> is specialized from <codeblock>):
<example>
<title>An important coding technique</title>
<codeFragment>
<sourceFile value="helloWorld.java"/>
<startDelimiter value="FRAGMENT_START_1"/>
<endDelimiter value="FRAGMENT_END_1"/>
...
</codeFragment>
</example>
The following example identifies a real estate property for a house description (where the <realEstateProperty> and everything it contains are specialized from <data> but <houseDescription> is specialized from <section>):
<houseDescription>
<title>A great home for sale</title>
<realEstateProperty>
<realEstateBlock value="B7"/>
<realEstateLot value="4003"/>
...
</realEstateProperty>
<p>This elegant....</p>
<object data="B7_4003_tour360Degrees.swf"/>
</houseDescription>
The following example identifies the maintainer of the topic (where <maintainer> is specialized from the <data> element):
<topicref href="sometopic.dita">
<topicmeta>
<maintainer>Sachiko</maintainer>
</topicmeta>
...
</topicref>
With no changes to the design, 4 person hours are needed to implement the DTD and Schema changes, add the default processing rule, and expand this note as a formal specification.