Re: [dita] Impacts of lightweight topics not being self-describing

Subject: Re: [dita] Impacts of lightweight topics not being self-describing

From: "Don R. Day" <donday@donrday.com>

To: dita@lists.oasis-open.org

Date: Tue, 20 May 2014 10:05:04 -0500

I'll note that specialization does give back some of the ability to collect topics by unique features, since those features can be discovered with XPath queries into the content. These tend to be ad hoc, though: if a topic contains crucial info that is wrongly marked up or not marked up at all, it is still invisible to such a query. We might do some thinking on ways to ensure that content-based inner semantic structures can be insinuated into authoring rules somehow.
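
As a rough illustration of that point, here is a minimal Python sketch that collects topics by an XPath match on their content. The file names, the topic sources, and the <hazard> element are all hypothetical (a stand-in for some specialized element), and the standard library's ElementTree supports only a subset of XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical topics keyed by file name. <hazard> stands in for a
# specialized element; it is an assumption, not a shipped DITA element.
TOPICS = {
    "install.dita": "<task id='install'><title>Install</title><taskbody>"
                    "<steps><step><cmd>Run setup</cmd></step></steps>"
                    "</taskbody></task>",
    "warning.dita": "<topic id='warn'><title>Caution</title><body>"
                    "<p>Do not <hazard>open the case</hazard>.</p>"
                    "</body></topic>",
    "plain.dita": "<topic id='plain'><title>Notes</title><body>"
                  "<p>Do not open the case.</p></body></topic>",
}

def collect(topics, xpath):
    """Return the sorted names of topics whose content matches the XPath."""
    hits = []
    for name, source in topics.items():
        root = ET.fromstring(source)
        if root.find(xpath) is not None:
            hits.append(name)
    return sorted(hits)

print(collect(TOPICS, ".//hazard"))
```

Note that plain.dita carries the same warning as warning.dita but is not collected, because the warning is not marked up.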

Don R. Day
Co-Founder, ContelligenceGroup.com
Past Chair, OASIS DITA Technical Committee
LinkedIn: donrday Twitter: @donrday
About.me: Don R. Day Skype: don.r.day

"Where is the wisdom we have lost in knowledge?
Where is the knowledge we have lost in information?"
--T.S. Eliot

On 5/20/2014 9:46 AM, Don R Day wrote:

Lightweight DITA can display well enough in a live-rendering universe if directly linked or invoked. But because of the lack of prolog in the data model, these topics are invisible to file-based searches that attempt to make collections based on metadata (other than features already in the topics: doctype/topictype, XPath to content, or filename).

The functional baseline for comparison is a typical blog or wiki entry. The SQL schemas for this type of content [1][2] typically include:

postID (usually the primary key by which the item is stored; equivalent to either a topic ID or a topic filename, whichever one chooses to be the key locator)

postTitle

postBody (which, if it contains a "more" PI, can be divided into an excerpt/shortdesc and "the rest of the body")

comment key (if not the postID itself)

feature image (may be used in sliders or feature posts, but not always part of other rendered content such as sidebar snippets)

kicker title

and a comprehensive set of data used to select by collection type:

tags (folksonomy or enumerated terms, often used in tag clouds)

categories (faceted filtering)

author (collections by author, with foreign keys into member/user tables)

date (collections by creation, publish date, archive date, etc.)

edit notes and/or status

related posts

included media (for collecting topics that contain videos or UI exhibits, for example)

other, more obscure fields, depending on the application (Drupal node types and relations, for example)

For the baseline case, this data represents one row of self-described content retrieved by "select * where id=$postid". By contrast, in the file entity case, the Lightweight DITA topic as an "entity-as-row" self-presents only the first set of data (and not all of that); the rest must be carried in a hybrid database entry as needed. In other words, it is not possible for Lightweight DITA, as a "file-only" entity, to self-represent the same data set as the "database-only" baseline.
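
The comparison above can be sketched roughly as follows in Python. The "posts" table is a hypothetical, heavily simplified stand-in for a real blog schema, and the topic is an assumed prolog-less lightweight topic; the sketch only shows which fields each form can present on its own.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical, simplified "posts" table; real blog schemas differ.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute(
    "CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT, body TEXT,"
    " author TEXT, tags TEXT, published TEXT)"
)
conn.execute(
    "INSERT INTO posts VALUES"
    " (1, 'Hello', 'First post.', 'don', 'dita,web', '2014-05-20')"
)

def row_fields(post_id):
    """Baseline case: one select returns the whole self-described row."""
    row = conn.execute(
        "SELECT * FROM posts WHERE id = ?", (post_id,)
    ).fetchone()
    return dict(row)

# Hypothetical lightweight topic with no prolog.
TOPIC = ("<topic id='hello'><title>Hello</title>"
         "<body><p>First post.</p></body></topic>")

def topic_fields(source):
    """File case: only the fields the topic itself carries."""
    root = ET.fromstring(source)
    return {
        "id": root.get("id"),
        "title": root.findtext("title"),
        "body": "".join(root.find("body").itertext()),
    }

# Fields the file-only entity cannot supply without a hybrid store.
missing = sorted(set(row_fields(1)) - set(topic_fields(TOPIC)))
print(missing)
```

Everything in "missing" is what would have to live in the hybrid database entry.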

Whether this restricted inherent data model is important or not depends on your application. It complicates the logical data access layer, which must be hybrid rather than purely file-based or purely database-backed. Regular DITA topics come close to baseline equivalency if used with some metadata conventions. [3] For example, in a microsite or landing page application, the full DITA topic is usually sufficiently self-describing, as long as you explicitly identify "feature" images as othermeta (for example) and have a convention for retrieving them, and as long as you don't use all the processing features that complicate direct rendering. [4]
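
One possible shape for such a convention, sketched in Python: a full DITA topic whose prolog carries an othermeta entry for the feature image. The name "feature-image", the image path, and the describe() helper are assumptions made for illustration, not an established convention.

```python
import xml.etree.ElementTree as ET

# Hypothetical full DITA topic. The othermeta name "feature-image" is an
# assumed local convention, not something defined by the DITA standard.
TOPIC = """
<topic id="landing">
  <title>Welcome</title>
  <prolog>
    <author>Example Author</author>
    <metadata>
      <keywords><keyword>dita</keyword><keyword>web</keyword></keywords>
      <othermeta name="feature-image" content="images/hero.png"/>
    </metadata>
  </prolog>
  <body><p>Landing page text.</p></body>
</topic>
"""

def describe(source, image_key="feature-image"):
    """Read the self-describing fields straight from the topic file."""
    root = ET.fromstring(source.strip())
    image = None
    for meta in root.iter("othermeta"):
        if meta.get("name") == image_key:
            image = meta.get("content")
            break
    return {
        "id": root.get("id"),
        "title": root.findtext("title"),
        "author": root.findtext("prolog/author"),
        "tags": [k.text for k in root.iter("keyword")],
        "feature_image": image,
    }

print(describe(TOPIC))
```

With the prolog present, the file alone yields author, tags, and feature image, which is most of the second data set in the baseline.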

All of this is noted to ensure that Lightweight DITA is appropriately disclaimed against user expectations of an equivalently simpler application. The current model only makes the input side simpler, and only until someone needs to enter other metadata into another form.

The alternative, to be weighed against the message of "utter simplicity," is that we add some of these features back in without being strictly limited by the contentEditable feature set, with the expectation of using a hybrid editor (with input fields for discrete metadata and contentEditable divs for the discourse, which is probably how a LWD editor will be designed anyway).

And I think it will soon be time to get a Subcommittee started where we can begin channeling these design discussions into their own list. I'm willing to lead in the initiation of this, if needed.

-----
[1] http://codex.wordpress.org/Database_Description
[2] http://www.mediawiki.org/wiki/Manual:Database_layout
[3] But the gaps lead me to continue thinking towards a "DITA for the Web" that breaks with strict compatibility (and therefore would not be called "DITA" when that time comes) in order to enable Web applications to use structured content in ways that the current standard inhibits.
[4] https://groups.yahoo.com/neo/groups/dita-users/conversations/messages/34990
