David, the DITA TC discussed this at our meeting today. Two
things:
- As a standard, DITA is focused on providing an architecture for
content that is authored by and for people. Everyone thought
that what you are suggesting is more processing-focused.
- No one was willing to champion the proposal.
Thanks for sending it to us.
Best,
Kris
Kristen James Eberlein
Chair, OASIS DITA Technical Committee
Principal consultant, Eberlein Consulting
www.eberleinconsulting.com
+1 919 622-1501; kriseberlein (skype)
On 1/17/2019 2:04 PM, David Hollis
wrote:
Hi Rob, Kris,
I note all the points about cached content.
I still think the interchange of content that contains reuse
mechanisms is an important use case, and would warrant inclusion
in DITA 2.0.
If this is going to go forward, it will need a voting member
to champion it. So, let's see if anyone is interested.
David
Hi David,
From what I understand of the proposal, it
is effectively trying to define a cache of resolved /
processed content together with the original source
document. As Kris said initially, that is really a
processor issue. For example, I know some processors
already have ways to cache content references for faster
load times.
Defining a new format for storing processed
content *within the source content* is dangerous for a
number of reasons. Among them: you say of this
new section, "It must not be
editable by an author." I
certainly agree -- you cannot allow cached / processed
data to be edited or the whole model falls apart. But
that itself is why any cached data belongs more to the
world of processors than to the DITA standard itself; as
an XML format, it is simply not possible to define a
portion of the document that 1) contains real meaningful
text content but 2) must not be editable. Anybody would
be able to open the document in a simple XML editor and
break the cache.
However, creating that sort of cache within
a processor's data model would definitely keep it out of
reach of the document authors, and handle most of the
use cases that you listed to start the proposal (faster
load times, avoid repetitive resolution, etc).
Robert D.
Anderson
DITA-OT lead and Co-editor DITA
1.3 specification
Marketing Services Center
From: Kristen James Eberlein <kris@eberleinconsulting.com>
To: David
Hollis <david@tdandc.co.uk>
Cc: DITA
TC <dita@lists.oasis-open.org>
Date: 01/17/2019 12:29 PM
Subject: Re: [dita] DITA 2.0 Stage 1 proposal - A
storage mechanism for resolved reuse content
Sent by: <dita@lists.oasis-open.org>
Once again, I see nothing that has to do
with DITA the standard, only how it might be rendered.
Kris
Sent from my iPad
> On Jan 17, 2019, at 1:12 PM, David Hollis <david@tdandc.co.uk>
wrote:
>
> Hi Kris,
>
> On the one hand, I agree. It's the sort of thing a
vendor might implement.
>
> But, one of the important aspects of DITA is its
ability to pass content between companies. So, sending
content to a translation company requires the integrity
of the reuse mechanisms. That is, the content sent for
translation contains reuse mechanisms, and when it
returns it must contain the same reuse mechanisms.
>
> This is why I think it is important for the TC to
consider it.
>
> Many thanks,
> David
>
>
>> David, this is an issue for processors. I
cannot see what it has to do with DITA the standard.
>>
>> Kris
>>
>>
>>> On Jan 17, 2019, at 12:19 PM, David Hollis
<david@tdandc.co.uk>
wrote:
>>> Hi all,
>>>
>>> Please find below a DITA 2.0 stage 1
proposal - A storage mechanism for resolved reuse
content
>>>
>>> I think the content model would be
relatively straightforward.
>>>
>>> Any difficulty would be with the DOT and
other DITA processors. They might need quite a bit of
work!
>>>
>>> Long term, it should make life easier for
CMS, editor and translation tool vendors.
>>>
>>> It could revolutionise DITA reuse: authors
would be able to physically see it in action, in front
of them, in the editor.
>>>
>>> It should make reused content statistics
more meaningful and manageable. In turn, this could
justify ROI, and the move to DITA.
>>>
>>> I acknowledge that this is probably quite
ambitious.
>>>
>>> Many thanks,
>>> David
>>>
>>>
>>>
>>> Background
>>>
>>> One of the core aspects of DITA is its
multiple reuse mechanisms. It is why a lot of companies
chose to use DITA.
>>>
>>> Fundamental to these reuse mechanisms is
that they require resolution. Not just by the processor
to create an output format, but at every stage during
content creation: authoring, reviewing and translation.
This requires the tool to be able to resolve the reuse
mechanism on the fly, as the topic opens.
>>>
>>> 1. This has the potential for delays,
especially for large, global CMS systems with server
replication. Given comparable file systems, it takes
longer to open a DITA topic, despite its typically small
file size, than practically any other file type.
>>>
>>> 2. On-the-fly resolution has to happen many
more times than is necessary. It is normal practice to
set up reused content for a new product at the
start of a project, and it changes little during the
project. A project might require a new warning or
caution, but adding one does not take long. So authoring
time related to reused content is, say, less than
10% of the whole content-creation effort for a
project. That is an entirely arbitrary figure, but you
get the idea.
>>>
>>> 3. Some tools cannot resolve DITA reuse
mechanisms on the fly, or perhaps the map is missing.
The author, editor or translator then sees an ugly
'lump' of meaningless, raw XML content. What are they
supposed to do with that? It might be a product name,
but how are they to know?
>>>
>>> 4. A topic has been reused many times. A
new engineer joins the team, and is not aware of how the
documentation is built, and its reliance upon reuse. A
simple job, one that helps him learn the product, is for
him to review the documentation. He's keen, and makes a
number of suggestions. Some of these suggestions impact
the reuse viability of a particular topic. How is an
author supposed to spot this?
>>>
>>> 5. A competent, DITA-savvy CMS probably
offers a 'dependencies' feature. Trying to follow
dependencies first up one map tree, and then down
another, can be futile. It can feel like chasing
Alice down the rabbit hole.
>>>
>>> 6. Every time a document is built, the
resolution of reused content takes place. The results of
that resolution are thrown away.
>>>
>>>
>>>
>>> Storage Mechanism
>>>
>>> So, what if DITA 2.0 could contain a means
to capture and store the resolved reused content?
>>>
>>> It would mean the end to on-the-fly
resolution, and the end to additional file opening
delays. It would be as easy, and as quick, to open a
DITA topic as any other file, given comparable file
systems. Tools would need to be far less DITA-savvy
because they would not need to perform on-the-fly
resolution. So, an end to meaningless, raw XML content.
>>>
>>> A storage mechanism would be able to
capture the resolved reused content for every reused
instance of that topic. An author would be able to open
the topic, without the need to define a resolution map,
and they would be able to choose the resolved reused
content for any of the multiple instances of that topic.
Say, by choosing the title of an output build map from a
drop-down list. They would be able to immediately see
any potential problems related to review comments. They
would be able to see how a reused feature list, say,
changes and grows for each individual product, and
instance of the topic.
>>>
>>> It would also be possible to navigate
directly from the reused content source to every topic
that reuses the content, and vice versa, from the reused
content target back to the content source. This could be
very useful for the management of warnings and cautions:
to reduce duplication, and when product development
removes the hazard that a warning addresses. Medical
companies work hard to engineer out hazards and reduce
the number of warnings required.
>>>
>>>
>>>
>>> Implementation
>>>
>>> I don't know the correct terminology, but
it would use a top-level 'area' or 'block' in every
topic model. That is, a new block that is parallel with
<title>, <prolog>, <body>, and
<related-links>.
>>>
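A minimal sketch of what such a topic-level block might look like. The <resolved-reuse> and <resolved-instance> element names, and all content values, are invented here purely for illustration; they are not part of any DITA draft:

```xml
<topic id="pump-care">
  <title>Pump care</title>
  <body>
    <!-- Authored source: the reuse mechanism itself stays intact -->
    <p conref="warnings.dita#warnings/hot-surface"/>
  </body>
  <!-- Hypothetical processor-only block, parallel with
       title/prolog/body/related-links -->
  <resolved-reuse>
    <!-- One resolved copy per reused instance / build map -->
    <resolved-instance map="product-a.ditamap">
      <p>Allow the pump to cool before servicing.</p>
    </resolved-instance>
  </resolved-reuse>
</topic>
```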
>>> An important aspect is that content would
be placed in this block only by the processor. It
must not be editable by an author. If an author knows
that there are recent changes to a reused content
source, they could initiate one or more 'resolution
builds' to rebuild the resolved reused content.
>>>
>>> Ideally, an editor or CMS would store a
list of files to watch, and automatically initiate the
relevant resolution builds to refresh the resolved
reused content. A CMS might also do overnight resolution
builds to keep content fresh.
>>>
>>> A resolution build would not necessarily
produce an actual output document, just refresh the
reused content resolution.
>>>
>>> In practice, the exact implementation would
be chosen by the editor or CMS, and probably in
conjunction with user preferences. Some options might
be:
>>>
>>> 1. Put the resolved reused content back
into the original topic file. That might tempt the
author to edit it, rather than go to the content source.
The CMS or editor would need to ensure that no topic
files are open during the build.
>>>
>>> 2. A second parallel or 'ghost' file for
each topic. This file would probably have the same title
as the main topic, but would only have the resolved
reused content block. No <prolog>, <body>,
or <related-links>. The content in this block
would include every reused instance for the topic.
>>>
>>> 3. Essentially the same as 2, but there
would be a parallel or ghost file for every reused
instance of a topic, rather than all reused instances in
one file. This would have the advantage that it
would be very easy to add up the reused instances and
produce reuse statistics.
>>>
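Option 2 above might look like the following ghost file; again, the file naming convention and all element names are invented for illustration only:

```xml
<!-- pump-care.resolved.dita: hypothetical processor-generated
     ghost file for pump-care.dita. Same title as the main topic,
     but no prolog, body, or related-links. -->
<topic id="pump-care">
  <title>Pump care</title>
  <resolved-reuse>
    <!-- Every reused instance of the topic, one block per build map -->
    <resolved-instance map="product-a.ditamap">
      <p>Allow the pump to cool before servicing.</p>
    </resolved-instance>
    <resolved-instance map="product-b.ditamap">
      <p>Isolate the pump electrically before servicing.</p>
    </resolved-instance>
  </resolved-reuse>
</topic>
```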
>>>
>>> When an editor or CMS opens a topic, it
would need to read the main topic and one of the reused
content resolutions. The author might be able to set a
preference for a particular build map that the editor
uses each time it opens a topic. It would need to be
able to link the content source to the content target,
and replace one with the other.
>>>
>>> It should be possible to send a single
topic for translation without accompanying maps. In
practice, an XLIFF or other translation tool would
either merge the main and ghost files, or include two or
more files in the manifest. The translation tool would
do the same as the editor, and would also mark the
reused content as non-translatable but visible to the
translator. For cost purposes, the tool would count only
the actual words in the main topic, not XML 'lumps', and
would ignore the resolved reused content entirely.
>>>
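DITA already defines a @translate attribute, so one way to keep the cached content visible for context but excluded from word counts might be the following (the <resolved-reuse> and <resolved-instance> elements are still hypothetical; only @translate is standard DITA):

```xml
<resolved-reuse translate="no">
  <!-- Visible to the translator for context, but marked
       non-translatable and excluded from cost word counts -->
  <resolved-instance map="product-a.ditamap">
    <p>Allow the pump to cool before servicing.</p>
  </resolved-instance>
</resolved-reuse>
```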
>>> This might lead to a new two-stage
translation workflow:
>>>
>>> 1. Translate reused content sources, and
review in conjunction with the English, or first
language, main content. Look for any gender and noun
declension issues related to the reused content. Get
approval for reused content translations.
>>>
>>> 2. Use the translated reused content
source, and translate the main content.
>>>
>>>
>>>
>>> Content Model
>>>
>>> It would be necessary to figure out all the
permutations and combinations for maps, topics, keys,
scoped keys, filters and branch filters. This ought to
be a closed set of combinations. For instance, a topic
might be introduced by a <keydef> with @href in one
map, and a <topicref> with @keys in another map. It
might mean a trip into 'worst practice', e.g. 'spaghetti
reuse'.
>>>
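For example, the two ways of introducing the same topic mentioned above might look like this. This is standard DITA 1.3 map markup; the map and topic file names are invented for illustration:

```xml
<!-- map-a.ditamap: key defined via a resource-only key definition -->
<map>
  <keydef keys="pump-care" href="pump-care.dita"/>
</map>

<!-- map-b.ditamap: the same key defined on a normal topic reference -->
<map>
  <topicref keys="pump-care" href="pump-care.dita"/>
</map>
```

The cached resolution would have to record which map's key definition won for each build.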
>>> A reduced form of this content model could
be used for maps.
>>>
>>>
>>> The content model would be a hierarchy of
elements:
>>>
>>> 1. The title and link to a build or root
map.
>>>
>>> 2a. A hierarchy of any intermediate maps:
title and link.
>>>
>>> 2b. Any additional map that defines keys or
filters relevant to the topic: title and link.
>>>
>>> 3. Any map that defines scoped keys or
branch filters: title and link.
>>>
>>> 4a. For any reused content source topic, the
title and link for any target topic.
>>>
>>> 4b. For the target topic of any reused
content: the title, content, and link to the source
topic.
>>>
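The hierarchy above might be sketched as follows; every element name here is invented purely for illustration:

```xml
<resolved-reuse>
  <resolved-instance>
    <!-- 1. Build/root map: title and link -->
    <root-map href="product-a.ditamap">Product A User Guide</root-map>
    <!-- 2a. Intermediate maps, in hierarchy order -->
    <via-map href="maintenance.ditamap">Maintenance</via-map>
    <!-- 2b. Additional maps contributing keys or filters -->
    <key-map href="keys.ditamap">Common keys</key-map>
    <!-- 3. Maps defining scoped keys or branch filters -->
    <scope-map href="branch-a.ditamap">Branch A</scope-map>
    <!-- 4a. From a source topic: each target topic that reuses it -->
    <reuse-target href="pump-care.dita">Pump care</reuse-target>
    <!-- 4b. From a target topic: the source, with the resolved content -->
    <reuse-source href="warnings.dita#warnings/hot-surface">Warnings
      <p>Allow the pump to cool before servicing.</p>
    </reuse-source>
  </resolved-instance>
</resolved-reuse>
```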
---------------------------------------------------------------------
>>> To unsubscribe from this mail list, you
must leave the OASIS TC that
>>> generates this mail. Follow this link to
all your TCs in OASIS at:
>>> https://www.oasis-open.org/apps/org/workgroup/portal/my_workgroups.php