DITA Proposed Feature # 20

Extensible universal attributes, specifically for conditional processing (filtering/flagging, also known as profiling), but also for arbitrary attributes that have a similarly simple syntax.

Longer description

Allow DITA document type developers to incorporate new conditional processing attributes that can be used for filtering and flagging, or new attributes with no existing equivalent that can be managed and generalized in the same way as conditional processing attributes.

The new attributes need to be:
  • Identified as conditional processing attributes (when intended for this purpose)
  • Preserved during generalization and respecialization
  • Still processable, while generalized, by either general or specialized behaviors (for example, conditional processing)

This proposal also covers increased flexibility in the attribute values used for conditional processing, and in the ditaval format used to select values for exclusion, inclusion, or flagging.

Scope

Major

Use Case: conditional processing

  1. DITA architect for a team defines new attributes that are needed by the team (eg proglanguage)
  2. DITA architect expresses each new attribute as a separate domain package (eg proglanguage.mod, with new attribute specialized from props attribute)
  3. DITA architect integrates the domain packages into the authoring DTDs or schemas:
    1. redefining the "props" attribute entity to include the proglanguage attribute, in the same way we redefine element entities to integrate new domain elements;
    2. adding the new attribute domain to the list of domains in the domains attribute, preceded by an "a"; for example domains="a(props proglanguage)" or domains="a(props audience role)"
  4. Author can now add values to the new attributes, since they are physically present in the document type
  5. Build developer defines values in ditaval format and runs a build to remove or flag content based on the new attributes (eg flag all proglanguage="Java").
  6. Another build developer includes this content in their own build but needs to run all content through a specialization-unaware trademarking tool that requires generalization of the contributed content; after generalization, the content is processed into output with filtering based on the new attributes (which are now collapsed into the props attribute):
    1. the generalization process turns proglanguage="Java" into props="proglanguage(Java)"
    2. the conditional processing transform recognizes the new form as equivalent to the old, and the instruction "flag all proglanguage=Java" operates on either props="proglanguage(Java)" or proglanguage="Java".
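The round trip in step 6 can be sketched in Python. This is a minimal illustration under stated assumptions, not toolkit code: an element's attributes are modeled as a plain dict, the hypothetical ANCESTRY table (proglanguage specialized from props) stands in for what a processor would read from the domains attribute, and the helper names generalize, values_for, and should_flag are invented for this example. Only one level of grouping is handled.

```python
import re

# Hypothetical ancestry table: specialized attribute -> parent attribute.
# A real processor would derive this from the domains attribute.
ANCESTRY = {"proglanguage": "props"}


def generalize(attrs: dict[str, str]) -> dict[str, str]:
    """Fold each specialized attribute into its parent as name(values)."""
    result = dict(attrs)
    for name, parent in ANCESTRY.items():
        if name in result:
            folded = f"{name}({result.pop(name)})"
            existing = result.get(parent)
            # Keep any values already present on the parent attribute.
            result[parent] = f"{existing} {folded}" if existing else folded
    return result


def values_for(attrs: dict[str, str], name: str) -> list[str]:
    """Read the values of `name` from its specialized or generalized form."""
    if name in attrs:
        return attrs[name].split()
    parent = ANCESTRY.get(name)
    if parent is None or parent not in attrs:
        return []
    # One level of grouping only, e.g. props="vendor1 proglanguage(Java C)".
    groups = re.findall(rf"\b{re.escape(name)}\(([^()]*)\)", attrs[parent])
    return [v for group in groups for v in group.split()]


def should_flag(attrs: dict[str, str], name: str, value: str) -> bool:
    """The instruction 'flag all name=value', applied to either form."""
    return value in values_for(attrs, name)


specialized = {"proglanguage": "Java", "props": "vendor1"}
generalized = generalize(specialized)
print(generalized)  # {'props': 'vendor1 proglanguage(Java)'}
print(should_flag(specialized, "proglanguage", "Java"))  # True
print(should_flag(generalized, "proglanguage", "Java"))  # True
```

The matcher reads the same value whether or not the specialized attribute is physically present, which is the equivalence step 6.2 asks the conditional processing transform to recognize.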

Draft comment:

The grouping mechanism generated during generalization will also be directly authorable. Should it be documented as such? This would effectively allow the expression of OR statements within a single attribute. If we do so, would we need to distinguish more directly between values that identify groupings (for example the "proglanguage" value in props="proglanguage(Java)") and those that are directly processible (for example props="vendor1")? One possibility would be to include a colon in the syntax for attribute-based groupings (eg props="proglanguage:(Java)").

For consistency, the rev attribute will also be made specializable, although it and its specializations will only be usable for flagging, not filtering. For example, a specialization of the rev attribute might identify a particular kind of revision (technical vs grammatical) or the role of the reviser (editor vs author).

Use case: generic attributes

  1. DITA architect for a team needs to add a new attribute that has no equivalent in existing DITA, for example a "phase" attribute that identifies what phase of a process an element is associated with.
  2. DITA architect expresses each new attribute as a separate domain package (eg phases.mod, with new attribute specialized from "base" attribute)
  3. DITA architect integrates the domain packages into the authoring DTDs or schemas:
    1. redefining the "base" attribute entity to include the phase attribute, in the same way we redefine element entities to integrate new domain elements;
    2. adding the new attribute domain to the list of domains in the domains attribute, preceded by an "a"; for example domains="a(base phase)" or domains="a(base phase phasetype)"
  4. The DITA architect must also supply processing behavior for the new attribute, and ensure that it works on both the specialized form (eg phase="develop") and the generalized form (eg base="phase(develop)"), using the conditional processing match logic as a pattern.

Use case: negative values

The DITA 1.0 attribute syntax supports positive values only. This makes it difficult to work with cases where the classification really is by negation (for example: "this applies to every possible user EXCEPT programmers"); while a special value could be created (for example "notprogrammer"), it would need to be managed in parallel with the positive value (for example, "include notprogrammer, exclude programmer") and is not a particularly usable solution.

Proposed change: allow NOT as a special keyword within an attribute value. It applies only to the value that immediately follows it: if there are multiple negatives, each must be negated independently (for example, audience="NOT programmer NOT tester"). This is still not full Boolean logic support, and is intended to remain as simple and readable as possible, given that a major cost of conditional processing is maintaining, debugging, and transferring ownership of documents with complex conditions. Maps are expected to do much more of the heavy lifting in DITA, and complex conditions are deliberately not supported within a single attribute.
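A minimal Python sketch of the NOT keyword follows. The proposal defines only that NOT applies to the next value; how negated values combine with include and exclude actions is not specified, so this sketch assumes a build is described by the set of values it targets, and that an attribute applies when any of its terms is satisfied. The function names parse_terms and applies are hypothetical.

```python
def parse_terms(value: str) -> list[tuple[bool, str]]:
    """Split an attribute value into (negated, value) terms.

    NOT applies only to the token that immediately follows it.
    """
    terms = []
    negate = False
    for token in value.split():
        if token == "NOT":
            negate = True
            continue
        terms.append((negate, token))
        negate = False
    return terms


def applies(value: str, targets: set[str]) -> bool:
    """Assumed rule: the attribute applies if any of its terms is satisfied."""
    return any(
        (v not in targets) if negated else (v in targets)
        for negated, v in parse_terms(value)
    )


print(parse_terms("NOT programmer administrator"))
# [(True, 'programmer'), (False, 'administrator')]
print(applies("NOT programmer", {"administrator"}))  # True
print(applies("NOT programmer", {"programmer"}))     # False
```

Under that OR assumption, audience="NOT programmer NOT tester" would still apply to a build that targets only programmers (because the NOT tester term is satisfied), so the combination rule for multiple negatives would need to be pinned down before a sketch like this could serve as reference behavior.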

Use case: scoped values

The DITA 1.0 attribute syntax supports simple values only. This makes it difficult to work with cases where several values have a common feature, for example audience="programmerJava programmerCPP programmerPython".

Proposed change: in order to make semantic scopes within an attribute more explicit, support componentized values separated by /: for example audience="programmer/database programmer/Java programmer/Web". The separate components of a value can then be addressed directly when filtering or flagging, for example "exclude programmer" would match all three, whereas "exclude programmer/Java" would match only the second value. The componentized syntax would require the most general scope to occur to the left, becoming more specific as it moves to the right, to be consistent with similar uses in the class attribute and href syntax in DITA.
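Because the most general scope is on the left, matching a filter value against a scoped value reduces to a component-wise prefix test. A minimal Python sketch (the function name scope_matches is hypothetical):

```python
def scope_matches(selector: str, value: str) -> bool:
    """True if `selector` names `value` or one of its enclosing scopes.

    Components are separated by '/', most general on the left, so the
    selector matches when its components are a prefix of the value's.
    """
    sel = selector.split("/")
    return value.split("/")[: len(sel)] == sel


audience = "programmer/database programmer/Java programmer/Web".split()
print([v for v in audience if scope_matches("programmer", v)])
# ['programmer/database', 'programmer/Java', 'programmer/Web']
print([v for v in audience if scope_matches("programmer/Java", v)])
# ['programmer/Java']
```

Comparing whole components, rather than raw string prefixes, keeps "programmer" from matching an unscoped value such as "programmerJava".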

Use case: extended syntax for ditaval

Publishers require more flexibility in how they process values. The following format extends and formalizes the .ditaval format used in the DITA toolkit and referred to non-normatively in the DITA 1.0 specification:
val
Root element, contains one or more prop or revprop elements
prop
Identifies an attribute, and usually values in the attribute, to take an action on. The attribute must be a specialization of the props attribute (such as platform, product, audience, and otherprops).
@att
The attribute to be acted upon. Must be one of props, audience, platform, product, otherprops, or a specialization of them. If the att attribute is absent, then the prop element declares a default behavior for any attribute specialized from props.
@val
The value to be acted upon. The value may also name only the leading, more general part of a scoped value; for example, "programmer" would match "programmer/enterprise". If the val attribute is absent, then the prop element declares a default behavior for any value in the specified attribute.
@action
The action to be taken. The options are:
include
Include the content in output. This is the default behavior unless otherwise set.
exclude
Exclude the content from output (if all values in the particular attribute are excluded).
passthrough
Include the content in output, and preserve the attribute value as part of the output stream for further processing; for example, add it to the class attribute in HTML output, using the format for generalized values: eg class="programminglanguage(programmer/Javaprogrammer)"
flag
Flag the content on output (if the content has not been excluded).
@startimg
If flag has been set, the image to use for flagging the beginning of flagged content.
@endimg
If flag has been set, the image to use for flagging the ending of flagged content.
@color
If flag has been set, the color to use to flag text. Colors may be entered by name or by code. Processor support is recommended for the following: blue #CAE1FF, green #DAF4F0, dark pink #CCCCFF, light pink #FFF0F5, yellow #FFFFCC, and tan #EED6AF
Draft comment:

The list of values given here and below differs from current toolkit support. What do we want to document in the spec? We need to strike the right balance between tool flexibility and interoperability.

@backcolor
If flag has been set, the color to use as background for flagged text. Colors may be entered by name or by code. Processor support is recommended for the following: blue #CAE1FF, green #DAF4F0, dark pink #CCCCFF, light pink #FFF0F5, yellow #FFFFCC, and tan #EED6AF
@style
If flag has been set, the text style to use for flagged text. The following values are enumerated:
  • underline
  • double-underline
  • italics
  • overline
  • bold
@printchar
If flag has been set, the character to include in the margin of the flagged text, for example "|".
revprop
Identifies a value in the rev attribute of content, or of a specialization of the rev attribute, that should be flagged in some manner. Unlike the props attribute, which can be used for both filtering and flagging, the rev attribute and its specializations can only be used for flagging.
@att
The attribute to be acted upon. Must be rev or a specialization of rev. If the att attribute is absent, then the revprop element declares a default behavior for any attribute specialized from rev.
@val
The value to be acted upon. The value may also name only the leading, more general part of a scoped value; for example, "programmer" would match "programmer/enterprise". If the val attribute is absent, then the revprop element declares a default behavior for any value in the specified attribute.
@action
The action to be taken. The options are:
include
Include the content in output without flags. This is the default behavior unless otherwise set.
passthrough
Include the content in output, and preserve the attribute value as part of the output stream for further processing.
flag
Flag the content on output (if the content has not been excluded), using >> and << characters in addition to whatever image or style options are chosen.
@startimg, @endimg, @color, @backcolor, @style, @printchar
Same as for prop element
startimgalt
An element allowed inside either prop or revprop to provide alternate text for an image, when the startimg attribute sets an image to be used for flagging. If the element is absent inside revprop, the default alternate text of "start of change" will be used, in the language of the current document or element.
endimgalt
An element allowed inside either prop or revprop to provide alternate text for an image, when the endimg attribute sets an image to be used for flagging. If the element is absent inside revprop, the default alternate text of "end of change" will be used, in the language of the current document or element.
Draft comment:
Should we make ditaval a true DITA format, by adding class attributes, id and conref attributes, and the base attribute? (I'm assuming props and its derivatives would be inappropriately self-referential in this context).
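The default rules for absent att and val attributes imply a lookup order for each attribute value. The Python sketch below reads prop elements with the standard library's xml.etree.ElementTree and resolves an action for one value. The precedence it uses (full value, then enclosing scopes, then the attribute default, then the global default, then include) is an assumption, since the format above does not state one; revprop, the flagging attributes, rules that set val without att, and matching att against specializations of the named attribute are all left out. The sample ditaval content and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical ditaval content: a global default, an attribute default,
# a value rule, and a flag rule.
DITAVAL = """\
<val>
  <prop action="exclude"/>
  <prop att="audience" action="include"/>
  <prop att="audience" val="programmer" action="exclude"/>
  <prop att="proglanguage" val="Java" action="flag" color="yellow"/>
</val>"""


def load_rules(text: str) -> dict[tuple, str]:
    """Map (att, val) to an action; None marks an absent att or val.

    Only the action is kept; flagging attributes such as color are ignored.
    """
    rules = {}
    for prop in ET.fromstring(text).iter("prop"):
        rules[(prop.get("att"), prop.get("val"))] = prop.get("action", "include")
    return rules


def resolve(rules: dict[tuple, str], att: str, value: str) -> str:
    """Return the action for one value of one attribute."""
    parts = value.split("/")
    # Full value first, then each enclosing scope: a/b/c, a/b, a.
    for n in range(len(parts), 0, -1):
        key = (att, "/".join(parts[:n]))
        if key in rules:
            return rules[key]
    # Then the attribute default, then the global default.
    for key in ((att, None), (None, None)):
        if key in rules:
            return rules[key]
    return "include"  # built-in default when nothing else applies


rules = load_rules(DITAVAL)
print(resolve(rules, "audience", "programmer/enterprise"))  # exclude
print(resolve(rules, "audience", "administrator"))          # include
print(resolve(rules, "platform", "linux"))                  # exclude
print(resolve(rules, "proglanguage", "Java"))               # flag
```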

Technical Requirements

Change the architectural specification to allow specialization of new universal attributes from the props, rev, and base attributes, following the domain model. Each new universal attribute is defined in a separate domain package that provides an attribute definition entity and a domains attribute value that starts with "a" and then lists the attribute ancestry in parentheses, eg a(props language). The domain can be integrated into a document type by redefining the univ-atts entity to include the new attribute entity, and redefining the domains attribute to include the domain's value entity.
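Because each a(...) value lists an attribute's ancestry, a processor can recover the parent of every specialized attribute from the domains attribute alone. A minimal Python sketch, with hypothetical function names and a made-up domains value that mixes an element domain with attribute domains:

```python
import re

# Hypothetical domains value: one element domain plus three attribute domains.
DOMAINS = "(topic hi-d) a(props proglanguage) a(props audience role) a(base phase)"


def attribute_ancestry(domains: str) -> dict[str, str]:
    """Map each specialized attribute to its direct parent, from a(...) tokens."""
    parents = {}
    for chain in re.findall(r"\ba\(([^()]*)\)", domains):
        names = chain.split()
        # Each adjacent pair in the chain is (parent, child).
        for parent, child in zip(names, names[1:]):
            parents[child] = parent
    return parents


def lineage(parents: dict[str, str], name: str) -> list[str]:
    """Walk from an attribute up to its root (props, rev, or base)."""
    chain = [name]
    # Stop at the root, or if a malformed domains value would create a cycle.
    while chain[-1] in parents and parents[chain[-1]] not in chain:
        chain.append(parents[chain[-1]])
    return chain


parents = attribute_ancestry(DOMAINS)
print(parents)
# {'proglanguage': 'props', 'audience': 'props', 'role': 'audience', 'phase': 'base'}
print(lineage(parents, "role"))  # ['role', 'audience', 'props']
```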

Define syntax for generalized attribute values that allows for continued processing and roundtripping: put the values of the generalized attribute into parentheses preceded by its specialized name, eg props="proglanguage(Java)" or props="audience(role(developer))".
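Nested values such as props="audience(role(developer))" can be read back into per-attribute values with a small tokenizer and a stack of open groups. The Python sketch below is one way to do this, assuming a bare token belongs to the innermost enclosing group (or to the generalized attribute itself at the top level); the function name respecialize is hypothetical, and it does not check the group names against the domains attribute.

```python
import re

# Tokens: a group opener such as "audience(", a closing ")", or a bare value.
TOKEN = re.compile(r"[^\s()]+\(|\)|[^\s()]+")


def respecialize(text: str, base: str = "props") -> dict[str, list[str]]:
    """Read a generalized value back into {attribute: [values]}."""
    result: dict[str, list[str]] = {}
    stack = [base]  # the innermost open group is stack[-1]
    for token in TOKEN.findall(text):
        if token.endswith("("):
            stack.append(token[:-1])  # open a group named by the token
        elif token == ")":
            if len(stack) == 1:
                raise ValueError(f"unbalanced ')' in {text!r}")
            stack.pop()  # close the innermost group
        else:
            result.setdefault(stack[-1], []).append(token)
    if len(stack) != 1:
        raise ValueError(f"unclosed group in {text!r}")
    return result


print(respecialize("audience(role(developer))"))
# {'role': ['developer']}
print(respecialize("vendor1 audience(role(developer) administrator)"))
# {'props': ['vendor1'], 'role': ['developer'], 'audience': ['administrator']}
```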

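Update the conditional processing logic to work the same on either the specialized or the generalized form of a value: OR between attributes (content is excluded if any one attribute excludes it), AND within an attribute (an attribute excludes content only if all of its values are excluded), whether or not the attribute is physically present in the document type.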
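The OR-between, AND-within rule can be written as a single predicate. This Python sketch assumes the generalized values have already been read back into a mapping from attribute name to its list of values, and models the ditaval exclude rules as a set of hypothetical (attribute, value) pairs:

```python
# Hypothetical exclude rules from a ditaval file, as (attribute, value) pairs.
EXCLUDED = {("audience", "programmer"), ("platform", "windows")}


def is_excluded(attrs: dict[str, list[str]], excluded: set[tuple[str, str]]) -> bool:
    """OR between attributes: any attribute can exclude the element.
    AND within an attribute: it excludes only if every one of its values is excluded.
    """
    return any(
        bool(values) and all((att, v) in excluded for v in values)
        for att, values in attrs.items()
    )


print(is_excluded({"audience": ["programmer", "administrator"]}, EXCLUDED))  # False
print(is_excluded({"audience": ["programmer"], "platform": ["linux"]}, EXCLUDED))  # True
```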

Remove existing section of specification on how to "break" architecture to get attribute specialization.

Add a "props" attribute to the architecture, from which the other metadata attributes (platform, product, audience, otherprops) will be specialized.

Add a "base" attribute to the architecture, which will be ignored by unspecialized processing.

Ensure all attributes are expressed in entities to allow domain-based expansion and specialization (DTDs); document an equivalent mechanism for schemas.

Formalize and document the new conditional processing attribute value syntax to allow scoped values (eg Java/EJB) and negative values (eg NOT Java).

Formalize and document the ditaval format, including logic for filtering, flagging, ignoring, or passing through values in the props attribute or in attributes specialized from the props attribute; for setting attribute defaults; and for matching partial values based on value scopes or components.

Costs

Design time should be minimal. The open-source toolkit will require more work: existing transforms must be enhanced to handle generalization and respecialization of "base"- and "props"-specialized attributes, and the conditional processing logic must be made specialization-aware.

Benefits

Many users would make use of this feature: it is consistently rated as a high-priority requirement, and for some organizations it would remove a major barrier to DITA adoption.

Time Required

Three 1-hour meetings to review requirements

Three 1-hour meetings to agree on a solution

Two days to complete the solution document