I've only had time to skim it so far, but I can
already see that Erik's paper clearly and systematically outlines
what attribute extension in general should look like - in addition to
many other valuable DITA extensions that would make unnecessary the
kind of customization I've resigned myself to in any workable
implementation.
But this just makes me wonder why I've been getting so much resistance
over what, unlike some of these extensions, should be a no-brainer -
allowing DITA implementers to add arbitrary, implementation-specific
attributes to elements that will be ignored on generalization.
Erik recognizes the need for this:
Attribute addition
Adding new properties to existing elements without losing
interoperability with others. Many problem domains have special
metadata that must be represented in the document instance. For
instance, warning notes for hardware might have an attribute that
identifies the regulation that motivates the warning.
He also provides the use case and justification, along the same
object-oriented lines I've been arguing:
The Object Oriented perspective and addition of properties to
discourse
In the Object Oriented approach, a specialized class inherits all
of the properties of the base class. These properties are often
accessed through extensible behaviors, but that nuance doesn't alter
the basic principle. The specialized class introduces variation by
adding new properties (in XML Schema parlance, through extension by
addition).
The following example shows the specialization of a class for
generic structural nodes to define a class for tree nodes.
General class      | Specialized class
-------------------+---------------------
Node               | TreeNode
  data: Object     |   data: Object
  next: Node       |   next: Node
                   |   parent: TreeNode
A program can treat objects of the specialized type as objects of
the general type through a casting operation that hides the added
properties. For instance, the parent property of the TreeNode class
isn't visible when a program is treating a TreeNode object as a Node
object. Such casting makes it easy for a program to process objects in
shared or distinct ways as appropriate.
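To make the casting idea concrete, here is a minimal sketch in Python. Python has no real cast, so the hypothetical helper as_general() stands in for one by exposing only the properties declared on the general class; the helper and the sample data are my own illustration, not something from the paper.

```python
from dataclasses import dataclass, fields
from typing import Any, Optional


@dataclass
class Node:
    """General class: a generic structural node."""
    data: Any
    next: Optional["Node"] = None


@dataclass
class TreeNode(Node):
    """Specialized class: inherits data and next, adds parent."""
    parent: Optional["TreeNode"] = None


def as_general(obj: Node) -> dict:
    """Hypothetical stand-in for a cast: keep only the properties of Node."""
    return {f.name: getattr(obj, f.name) for f in fields(Node)}


root = TreeNode(data="root")
child = TreeNode(data="child", parent=root)

# The added parent property is hidden in the general view.
general_view = as_general(child)
```

A process written against Node can work with general_view (or with child itself, since a TreeNode is a Node) without knowing that parent exists.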
Adding properties to a discourse object can be important for
metadata processing and for hybrid documents that include record data
as well as discourse text. For instance, a lab report type might need
metadata about the institution that produced the report or record data
expressing the raw data analyzed in the report. If added content is
restricted to properties outside the main flow of discourse, the
standard object-oriented strategy of hiding the additions can maintain
the validity of the discourse when generalizing to a type that doesn't
have the added properties. That is, after the added properties are
hidden, the remaining discourse remains a valid instance of the general
type.
One strategy is to put the addition inside a processing
instruction that occupies the position of the hidden content during
generalization. It should even be possible to add properties to a
specialization of an empty element because, when generalized, the empty
element should be able to contain the processing instruction for the
hidden addition.
Special type:

  <fig>
    <title>Quantum engines</title>
    <labloc>B52-FA-RA13</labloc>
    <image href="qengines.jpg"/>
  </fig>

General type after hiding the addition:

  <fig>
    <title>Lab report</title>
    <?HIDDEN-ELEMENT <labloc>B52-FA-RA13</labloc> ?>
    <image href="qengines.jpg"/>
  </fig>
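As a rough sketch of that hiding strategy, the following Python uses the standard xml.etree.ElementTree module to swap each added element for a processing instruction in the same position. The function name hide_additions() and the idea of passing the added element names as a set are assumptions for illustration; the paper does not prescribe an implementation.

```python
import xml.etree.ElementTree as ET


def hide_additions(parent, added_tags):
    """Replace each added child with a HIDDEN-ELEMENT processing instruction."""
    for index, child in enumerate(list(parent)):
        if child.tag in added_tags:
            tail = child.tail
            child.tail = None  # keep trailing text out of the hidden markup
            hidden = ET.ProcessingInstruction(
                "HIDDEN-ELEMENT", ET.tostring(child, encoding="unicode")
            )
            hidden.tail = tail
            parent.remove(child)
            parent.insert(index, hidden)  # same position as the hidden content
    return parent


special = ET.fromstring(
    "<fig><title>Quantum engines</title>"
    "<labloc>B52-FA-RA13</labloc>"
    '<image href="qengines.jpg"/></fig>'
)
general = ET.tostring(hide_additions(special, {"labloc"}), encoding="unicode")
```

After the call, the fig element contains only title, the processing instruction, and image, so it can be validated against the general type while the hidden markup stays recoverable.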
Thus, addition complements substitution by supporting extensible
properties about discourse.1
So what's the fuss? Why so much resistance?
--Dana
Erik Hennum wrote:
Hi, DITA Committee Folk:
Since it came up, I'd like to summarize some ideas that have been
brewing offline for a while now. Maybe the ambition for more
significant attribute capabilities in the future can provide motivation
for progress on attribute specialization now.
FWIW, a paper at last year's Extreme has more detail:
Of course, the issues summarized herein require more thought and many
perspectives to get right.
1. Specializing an attribute that takes a single value (not an
enumeration)
If an element contains a value (that is, only text), a designer in DITA
1.0 can specialize that element by changing the name and restricting
the value to specify a more precise semantic. For instance, we can
specialize <apiname> as <javaClassName> or specialize
<msgnum> to <httpErrorCode>.
In principle, the same kind of specialization should be possible for an
attribute that takes a single value. For instance, a designer should be
able to distinguish and enforce formats for the version, release, and
modification attributes on <vrm>, for the id on
<resourceid>, the content on <othermeta>, or the value on
<state>.
In the same way that the specialized <parml> element can mandate
a specialized <plentry> in its substructure, a specialized
element should be able to mandate a specialized attribute. That ability
to specialize an attribute as part of element substructure might be
something to take on after DITA 1.1.
2. Interoperability of a model over variant XML syntax
More fundamentally, could specialization allow mutability between a
single-value attribute and a text-only subordinate element (a
possibility that Bruce raised with respect to the <data>
element)? For instance, could DITA recognize the following forms as
identical?
<p owner="bjorn">It all began...</p>
<p><owner>bjorn</owner>It all began...</p>
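One way to picture that equivalence is to reduce both forms to the same canonical value before comparing them. The sketch below is only an illustration under my own assumptions: the canonical() helper and the notion of passing a set of property names are hypothetical, and it handles only text-only subordinate elements.

```python
import xml.etree.ElementTree as ET


def canonical(elem, property_names):
    """Reduce an element to (tag, properties, content), whichever syntax was used."""
    props = {n: elem.get(n) for n in property_names if n in elem.attrib}
    content = elem.text or ""
    for child in elem:
        if child.tag in property_names and len(child) == 0:
            # A text-only subordinate element counts as a property.
            props[child.tag] = child.text or ""
            content += child.tail or ""
        else:
            # Anything else is ordinary content and is kept as markup.
            content += ET.tostring(child, encoding="unicode")
    return elem.tag, props, content


as_attribute = ET.fromstring('<p owner="bjorn">It all began...</p>')
as_element = ET.fromstring("<p><owner>bjorn</owner>It all began...</p>")

same = canonical(as_attribute, {"owner"}) == canonical(as_element, {"owner"})
```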
Building on that, could DITA recognize equivalence between the
subdivision of a value into fields via a pattern and fields in the
content delimited by subordinate elements? For instance, could a base
instance of
<bookinfo publisher="Bjornsen, Bjorn"/>
be specialized via a field pattern of "'(\w+),\s+(\w+)', lastname,
firstname" as
<bookinfo><publisherIndividual>
    <lastname>Bjornsen</lastname>
    <firstname>Bjorn</firstname>
</publisherIndividual></bookinfo>
Similarly, could a different base instance of
<bookinfo publisher="AMLW - Amalgamated Widgets"/>
be specialized via a different field pattern of "'(\w+) - (\w+)',
stock, company" as
<bookinfo><corporatePublisher>
    <stock>AMLW</stock>
    <company>Amalgamated Widgets</company>
</corporatePublisher></bookinfo>
This account is only a sketch of a direction, but this capability would
let designers specify text for general content and still allow
specialized elements for precision.
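A minimal sketch of the field-pattern idea, using Python's re and xml.etree.ElementTree modules. The function specialize_value() and its parameters are hypothetical names for illustration only. One caveat surfaced while sketching: when the whole value must match, the second pattern as written, (\w+) - (\w+), does not cover "Amalgamated Widgets" because \w does not match the space, so the sketch assumes a looser (.+) for the company field.

```python
import re
import xml.etree.ElementTree as ET


def specialize_value(base, attr, wrapper, pattern, field_names):
    """Split an attribute value into fields and emit them as subordinate elements."""
    match = re.fullmatch(pattern, base.get(attr, ""))
    if match is None:
        return None  # the value does not fit this specialization
    specialized = ET.Element(base.tag)
    container = ET.SubElement(specialized, wrapper)
    for name, value in zip(field_names, match.groups()):
        ET.SubElement(container, name).text = value
    return specialized


person = ET.fromstring('<bookinfo publisher="Bjornsen, Bjorn"/>')
by_person = specialize_value(
    person, "publisher", "publisherIndividual",
    r"(\w+),\s+(\w+)", ["lastname", "firstname"],
)

company = ET.fromstring('<bookinfo publisher="AMLW - Amalgamated Widgets"/>')
by_company = specialize_value(
    company, "publisher", "corporatePublisher",
    r"(\w+) - (.+)", ["stock", "company"],  # assumed pattern, see note above
)
```

Generalizing would run the same mapping in reverse, joining the fields back into the single attribute value.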
3. Bridging between definitions of controlled values and citations of
controlled values
How might adopters define the controlled values for an enumeration --
especially in a way that permits extensibility of those values?
One possibility would be to use the key feature proposed for DITA 1.2
(credit to Mr. Priestley for that lightbulb):
- Use a specialized DITA topic to define the meaning of the
controlled value (a meta topic, if you will).
- Use a specialized DITA map both to combine these definitional
topics in groups (like operating system platform, machine type
platform, audience education, and audience job) and to indicate
semantic hierarchies within each group ("RedHat" is a special kind of
"Linux," "appdev" is a special kind of "programmer").
- Assign a key (effectively, a local name) to each definitional
topic.
- Use the keys as values in metadata attributes.
Benefits: The enumeration can be maintained by content creators without
having to modify a schema definition. A process can still validate the
enumeration (that is, check that the controlled values in topics have
corresponding definitions). Where a controlled value without any
definition might be ambiguous, a defined controlled value can be
clarified by drilling down into the definitional topic. The definitions
of controlled values can be shared easily between adopters and allow
adopters to use different local names for the same thing (for instance,
"linux" and "LinuxOS" and "unices.linux"). The taxonomic relationships
can be maintained without forcing classification changes in the
content. Definitional topics can be reused as content topics where the
user would benefit from a definition of an unfamiliar concept. Finally,
adopters can scale the formality of their practice from single
controlled values to formal taxonomies without any change in their
authoring infrastructure. (In fact, the DITA taxonomy specialization
provides an implementation of the first two bullets above.)
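To illustrate the validation and hierarchy benefits, here is a small Python sketch. The DEFINITIONS dictionary is a hypothetical stand-in for what a map of definitional topics would provide (each key paired with its broader key, if any); it is not the DITA 1.2 key mechanism itself, and the sketch assumes attribute values are space-delimited lists of keys.

```python
# Hypothetical stand-in for a map of definitional topics:
# each key maps to its broader key, or None at the top of a group.
DEFINITIONS = {
    "linux": None,
    "redhat": "linux",
    "programmer": None,
    "appdev": "programmer",
}


def undefined_values(attribute_value, definitions):
    """Return the values in a metadata attribute that have no definition."""
    return [v for v in attribute_value.split() if v not in definitions]


def is_kind_of(key, ancestor, definitions):
    """Walk up the semantic hierarchy to test whether key is a kind of ancestor."""
    while key is not None:
        if key == ancestor:
            return True
        key = definitions[key]
    return False


problems = undefined_values("redhat solaris", DEFINITIONS)
redhat_is_linux = is_kind_of("redhat", "linux", DEFINITIONS)
```

Because the definitions live in content rather than in a schema, adding a controlled value here means adding an entry to the map, not editing a DTD.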
4. Specializing an attribute that takes an enumeration
The two sides debating attribute specialization seem to focus on
different things.
One side has a focus on the semantics of the attributes, submitting
that, if you analyze your audience by education, by job role, or by
both, you are still analyzing your audience.
The other side has a focus on the values, submitting that any
enumeration of audience education values requires additional
information to merge with any other enumeration of audience values.
The second side has a point. Where the base and specialized attributes
have a clear semantic relationship, the base enumeration would include
values that are compounds of the values from the specialized
enumerations. As a result, even in the best case, the mapping is
likely to be complex and partial.
For example, if operating system and machine are special kinds of
platform, adopters might need mappings similar to the following:
adopter 1 (base)          platform = ( bigiron | openserver | wintel | handheld )
adopter 2 (specialized)   os       = ( linux | macosx | windows )
                          machine  = ( macintosh | mainframe | pc | server )
mapping                   platform( bigiron )    MATCHES machine=( mainframe )
                          platform( openserver ) MATCHES os( linux ) OR machine=( server )
                          platform( wintel )     MATCHES os( windows ) OR machine=( pc )
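Purely to make the shape of such a hand-written mapping concrete, the table above could be captured as data along these lines. The MAPPING structure and the matches() function are hypothetical; note that handheld has no rule, which is one way the mapping ends up partial.

```python
# Hand-written mapping from base platform values to specialized values.
# Each rule matches if ANY listed attribute carries one of the listed values (OR).
MAPPING = {
    "bigiron": {"machine": {"mainframe"}},
    "openserver": {"os": {"linux"}, "machine": {"server"}},
    "wintel": {"os": {"windows"}, "machine": {"pc"}},
}


def matches(platform, **specialized):
    """Test whether specialized os/machine values satisfy a base platform value."""
    rule = MAPPING.get(platform)
    if rule is None:
        return False  # no mapping was given for this value (e.g. handheld)
    return any(specialized.get(attr) in values for attr, values in rule.items())


linux_box = matches("openserver", os="linux")
mac_on_wintel = matches("wintel", os="macosx", machine="macintosh")
```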
So far as I know, no one wants to address that mapping challenge now.
Besides, the DITA practice thus far has been to enable vocabulary
agreements within communities in advance rather than to try to
reconcile arbitrary vocabularies after the fact. So, let's acknowledge
that we won't automate the mapping of values from different
enumerations and thus won't automate integration of enumerations for
conditional processing.
All that said -- does the first side have a point, too? If I need to
enumerate the audience by education and know that I am providing an
analysis of the audience, why should I be forced to treat audience
education as if it were completely unrelated to audience? If I can
declare that audienceEducation specializes audience, processes other
than conditional processes that operate on audience semantics can
recognize the values of audienceEducation as indicating something about
the audience. For instance, a process might build an index of content
by audience or by platform:
    bigiron
    handheld
    linux
    macintosh
    macosx
    mainframe
    openserver
    pc
    server
    windows
    wintel
Otherwise, each attribute that has processing that is sensitive to
semantics will require a custom process.
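As a sketch of what such a generic process might look like, the following Python builds an index keyed by the base attribute, given a declaration of which attributes specialize which. The SPECIALIZES table, the topic ids, and the value "graduate" are all hypothetical; how the declaration would actually be expressed in DITA is exactly the open question.

```python
from collections import defaultdict

# Hypothetical declaration: specialized attribute -> base attribute.
SPECIALIZES = {
    "os": "platform",
    "machine": "platform",
    "audienceEducation": "audience",
}


def build_index(topics):
    """Index topic ids by base attribute and value, folding in specializations."""
    index = defaultdict(lambda: defaultdict(list))
    for topic_id, attrs in topics.items():
        for attr, value in attrs.items():
            base = SPECIALIZES.get(attr, attr)  # undeclared attributes index as themselves
            for token in value.split():
                index[base][token].append(topic_id)
    return index


topics = {
    "t1": {"platform": "bigiron"},
    "t2": {"os": "linux macosx"},
    "t3": {"machine": "pc", "audienceEducation": "graduate"},
}
index = build_index(topics)
platform_entries = sorted(index["platform"])
```

The one process handles platform, os, and machine alike; no values are mapped between enumerations, they are only listed together under the shared semantic.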
Hoping that's useful,
Erik Hennum
ehennum@us.ibm.com