Dear SC,
Following up on discussions some of us had at the LibreOffice conference last week, we started to discuss in our call today the need to standardize document changes.
Let me try to wrap up the outcome of our discussion, and allow me to spice it up with some research references and arguments we might have missed during the call:
Currently, the finest level of interoperability between users and applications is the document.
Unfortunately, for modern collaboration, exchanging the full document is no longer sufficient.
- In collaboration, especially real-time collaboration, dispatching whole documents is not feasible. Aside from the unnecessary network traffic, the receiver bears the additional burden of identifying the changes that separate the received document from its own. The complexity of this task grows with the document size. Even worse, without any aid such as change-tracking, identifying just the three basic types of change (add, delete, update) takes O(n³) [1], and once "move" changes are to be identified as well, the problem becomes NP-hard [2]. Change-tracking, as a potential aid for identifying changes, has the general disadvantage that it cannot express overlapping changes (in neither ODF nor OOXML). It also forces the user to build the collaboration on pure trust, since change-tracking, as opposed to the pure exchange of changes, gives no guarantee that the author recorded all changes. Users report that they still print different versions of documents (such as important contracts) and hold them up against the light.
- Basically, exchanging changes solves the above problem because it eases the main problem of collaboration: merging simultaneously edited documents.
- The trick is: instead of finding the changes afterwards, they are dispatched up front. Why lose precious information that the editor already had at its fingertips?
- In other words, the use of predefined changes becomes important whenever multiple people work in parallel (for instance offline) and/or do not want to require a (pessimistic write) lock that blocks others from working in their area. Exchanging the changes removes the need to find them in the document afterwards.
- Viewed from a high level, a change consists in general of three parts: the type of change (e.g. text addition), its specific parameters (e.g. the text itself and perhaps text style properties) and finally a reference to where the change occurs in the document (its position).
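To make the three parts a bit more tangible, here is a minimal Python sketch. Everything in it is an assumption for illustration only: the names Change and apply_change, the change types "addText" and "deleteText", and the position given as (paragraph index, character offset) are not taken from ODF or any existing specification, and the document is reduced to a plain list of paragraph strings.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Change:
    """Hypothetical change record with the three parts named above."""
    kind: str                   # type of change, e.g. "addText"
    position: tuple[int, int]   # reference into the document: (paragraph, offset)
    params: dict[str, Any] = field(default_factory=dict)  # type-specific parameters


def apply_change(paragraphs: list[str], change: Change) -> list[str]:
    """Return a new paragraph list with the change applied.

    The receiver does not need to diff two documents; it only replays
    the change it was sent.
    """
    result = list(paragraphs)
    para, offset = change.position
    current = result[para]
    if change.kind == "addText":
        text = change.params["text"]
        result[para] = current[:offset] + text + current[offset:]
    elif change.kind == "deleteText":
        length = change.params["length"]
        result[para] = current[:offset] + current[offset + length:]
    else:
        raise ValueError(f"unknown change type: {change.kind}")
    return result


doc = ["Hello world", "Second paragraph"]
insert_comma = Change(kind="addText", position=(0, 5), params={"text": ","})
new_doc = apply_change(doc, insert_comma)
```

Such a record is small enough to be sent on its own, which is the point of the argument above: the receiver applies it directly instead of comparing whole documents.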
Having agreed on the advantage of defining something more atomic for collaboration than the document itself, which is all we have today, we also agreed today that the interoperable interchange of such changes can only be guaranteed by standardizing them.
Please comment if I have missed something or if you have an additional thought on this discussion now that the call is over.
Best regards,
Svante
[1] M. Pawlik and N. Augsten. RTED: a robust algorithm for the tree edit distance. PVLDB, 5(4):334–345, 2011.
[2] P. Bille. A survey on tree edit distance and related problems. Theor. Comput. Sci., 337(1-3):217–239, 2005.