Proposal For Segmentation Notation in XLIFF - Additional Support Attributes

Draft by the XLIFF Segmentation Committee
05 January 2005


Table of Contents

1. Introduction
2. Problem Statement
3. Proposal
3.1. Changes To The 1.1 Specification
3.1.1. XLIFF Schema Changes
3.1.2. Documentation Changes
 

1. Introduction

This document proposes additional attributes for the existing XLIFF 1.1 specification to handle special cases that fall under the scope of 'Segmentation'.

This document is designed to complement the document produced by the XLIFF Segmentation sub-committee that deals directly with the issue of how segmentation should be encoded within an XLIFF document.

2. Problem Statement

There are two special cases that fall under the general topic of text segments that need to be addressed by the XLIFF standard:

  1. How does the standard allow <trans-unit> elements that have been incorrectly segmented to be translated?
  2. How can the standard allow a translator to signal that a given translation for a <trans-unit> should not be regarded as a direct equivalent for translation memory purposes?

    Inevitably, some XLIFF <trans-unit> elements will not represent a piece of text that can be translated without reference to one or more surrounding <trans-unit> elements. The cause may be incorrect segmentation or bad document design. A mechanism is required to state in the translated XLIFF document that specific <trans-unit> elements need to be 'grouped' together to provide a distinct, accurate, and unified translation.

    Example:

        <trans-unit id="t1"> 
          <source>The German acronym v.</source>
          <target>Niemiecki skrót v. OT oznacza górną pozycję silnika.</target>
        </trans-unit>
        <trans-unit id="t2">
          <source>OT signifies the top dead center position for an engine.</source>
          <target/>
        </trans-unit>
    	

    In addition, linguistically complete text may have to be broken into a number of segments because of message size constraints. In these instances the translator is not providing an equivalent translation for each <trans-unit>, but rather fitting the target-language text across a number of <trans-unit> elements to meet the requirements of the target application.

    Example:

        <trans-unit id="t1">
          <source>Constrained text for limited</source>
          <target>Tekst angielski dla</target>
        </trans-unit>
        <trans-unit id="t2">
          <source>display for English</source>
          <target>ograniczonego pola</target>
        </trans-unit>
        

    There may be other circumstances in which the translation provided in the <target> element is not a direct translation of the <source> element. A mechanism is required to signal this fact within the <trans-unit> element. This matters during further processing of the XLIFF document, for example when loading a translation memory.
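
To make the translation memory concern concrete, the following is a minimal Python sketch, not part of the proposal, of a naive loader that stores one source/target pair per <trans-unit>. It uses only the Python standard library. The function name naive_tm_pairs and the wrapping <body> element are assumptions made for illustration, and inline markup and XML namespaces are not handled.

```python
import xml.etree.ElementTree as ET

# The constrained-text example from above, wrapped in a <body> element
# (an assumption, so that the fragment parses as a single document).
XLIFF_FRAGMENT = """
<body>
  <trans-unit id="t1">
    <source>Constrained text for limited</source>
    <target>Tekst angielski dla</target>
  </trans-unit>
  <trans-unit id="t2">
    <source>display for English</source>
    <target>ograniczonego pola</target>
  </trans-unit>
</body>
"""


def naive_tm_pairs(xml_text):
    """Pair each <source> with its <target>, one pair per <trans-unit>."""
    root = ET.fromstring(xml_text)
    pairs = []
    for unit in root.iter("trans-unit"):
        source = unit.findtext("source", default="")
        target = unit.findtext("target", default="")
        if target:  # skip units that have no translation yet
            pairs.append((source, target))
    return pairs


pairs = naive_tm_pairs(XLIFF_FRAGMENT)
for source, target in pairs:
    print(f"{source!r} -> {target!r}")
```

Both units are stored as if each target were a translation of its own source, although neither is. Nothing in the markup lets the loader detect this, which is the gap the proposal addresses.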

3. Proposal

    After careful consideration, the XLIFF Segmentation Sub-Committee has concluded that the following additions to the XLIFF 1.1 standard are required:

3.1. Changes To The 1.1 Specification

    The changes for this proposal are as follows:

3.1.1. XLIFF Schema Changes

    There are two changes required to the XLIFF XSD file:

    1. Create a new "equivalent-translation" attribute for the <trans-unit> element. The default value of this attribute is "yes". The other possible value, "no", indicates that the translation for this <trans-unit> is not a direct linguistic equivalent of the source-language text. The following example demonstrates the use of the "equivalent-translation" attribute:

          <trans-unit id="t1" equivalent-translation="no"> 
            <source>Constrained text for limited</source>
            <target>Tekst angielski dla</target>
          </trans-unit>
          <trans-unit id="t2" equivalent-translation="no">
            <source>display for English</source>
            <target>ograniczonego pola</target>
          </trans-unit>
      	
    2. Create a new "merged-translations" attribute for the <group> element. This new attribute has two possible values: "yes" or "no". The default value is "no". A value of "yes" indicates that the <trans-unit> elements contained within this <group> element are to be treated together for linguistic purposes. All <trans-unit> elements enclosed by a <group> element whose "merged-translations" attribute is set to "yes" normally have their "equivalent-translation" attribute set to "no". The text of all of the <source> and <target> elements, taken together, forms one linguistic whole. No requirements are made regarding the distribution of the translation among the <target> elements; this is governed by the requirements of the individual applications. The translated text may be placed within the first <target> element, leaving the following <target> elements blank, or distributed among the <target> elements contained within the "merged-translations" <group> element. The following example demonstrates the use of the "merged-translations" attribute for the <group> element:

      <group merged-translations="yes">
          <trans-unit id="t1" equivalent-translation="no">
            <source>The German acronym v.</source>
            <target>Niemiecki skrót v. OT oznacza górną pozycję silnika.</target>
          </trans-unit>
          <trans-unit id="t2" equivalent-translation="no">
            <source>OT signifies the top dead center position for an engine.</source>
            <target/>
          </trans-unit>
       </group>
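
As an illustration of how a consuming tool might honor the two proposed attributes when loading a translation memory, here is a minimal Python sketch using only the standard library. It is one possible reading of the proposal, not a prescribed algorithm. The function names, the wrapping <body> element, the choice to store a merged group as a single pair, and the use of a single space as the separator are all assumptions. Inline markup and XML namespaces are not handled.

```python
import xml.etree.ElementTree as ET

# Examples from this proposal, wrapped in a <body> element so the fragment
# parses as one document. The ids c1/c2 are renamed from t1/t2 to keep ids
# unique within the fragment.
SAMPLE = """
<body>
  <trans-unit id="c1" equivalent-translation="no">
    <source>Constrained text for limited</source>
    <target>Tekst angielski dla</target>
  </trans-unit>
  <trans-unit id="c2" equivalent-translation="no">
    <source>display for English</source>
    <target>ograniczonego pola</target>
  </trans-unit>
  <group merged-translations="yes">
    <trans-unit id="t1" equivalent-translation="no">
      <source>The German acronym v.</source>
      <target>Niemiecki skrót v. OT oznacza górną pozycję silnika.</target>
    </trans-unit>
    <trans-unit id="t2" equivalent-translation="no">
      <source>OT signifies the top dead center position for an engine.</source>
      <target/>
    </trans-unit>
  </group>
</body>
"""


def joined_text(units, tag):
    """Join the non-empty texts of one child element across several units."""
    texts = (unit.findtext(tag, default="").strip() for unit in units)
    return " ".join(text for text in texts if text)


def collect_tm_pairs(element, pairs):
    """Walk the tree in document order and append (source, target) pairs."""
    for child in element:
        if child.tag == "group" and child.get("merged-translations", "no") == "yes":
            # Merged group: treat all contained units as one linguistic whole.
            units = child.findall("trans-unit")
            source = joined_text(units, "source")
            target = joined_text(units, "target")
            if source and target:
                pairs.append((source, target))
        elif child.tag == "trans-unit":
            # The attribute defaults to "yes" when it is absent.
            if child.get("equivalent-translation", "yes") == "yes":
                source = joined_text([child], "source")
                target = joined_text([child], "target")
                if source and target:
                    pairs.append((source, target))
        else:
            collect_tm_pairs(child, pairs)
    return pairs


tm_pairs = collect_tm_pairs(ET.fromstring(SAMPLE), [])
for source, target in tm_pairs:
    print(f"{source!r} -> {target!r}")
```

In this sketch the two ungrouped units marked "no" are skipped, and the merged group contributes a single pair built from its concatenated <source> and <target> texts. A tool could equally choose not to store merged groups at all; the proposal leaves that decision to the application.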
       

3.1.2. Documentation Changes

    The following changes will be required:

    1. A new section (2.7) describing the "equivalent-translation" concept including relevant examples.
    2. A new section (2.8) describing the "merged-translations" concept including relevant examples.
    3. Add the documentation concerning the "equivalent-translation" attribute for <trans-unit> elements to section 3.2 (Elements).
    4. Add the documentation concerning the "merged-translations" attribute for <group> elements to section 3.2 (Elements).
    5. Update section 3.3 (XLIFF Attributes) to incorporate the new attributes and explain their use.
        ===== Start of proposed new entry (all inserted) =====

        2.7 Non-equivalent translations

        Linguistically complete text may have to be broken into a number of <trans-unit> elements because of message size constraints or for other reasons. In these instances the translator is not providing an equivalent translation for each <trans-unit>, but rather fitting the target-language text across a number of <trans-unit> elements to meet the requirements of the target application.

        Example:

            <trans-unit id="t1">
              <source>Constrained text for limited</source>
              <target>Tekst angielski dla</target>
            </trans-unit>
            <trans-unit id="t2">
              <source>display for English</source>
              <target>ograniczonego pola</target>
            </trans-unit>
            

        In this circumstance the "equivalent-translation" attribute of the <trans-unit> element is used to denote that the translation should not be regarded as a direct translation of the <source> element. The default value of this attribute is "yes". The other possible value, "no", indicates that the translation for this <trans-unit> is not a direct linguistic equivalent of the source-language text. The following example demonstrates the use of the "equivalent-translation" attribute:

            <trans-unit id="t1" equivalent-translation="no"> 
              <source>Constrained text for limited</source>
              <target>Tekst angielski dla</target>
            </trans-unit>
            <trans-unit id="t2" equivalent-translation="no">
              <source>display for English</source>
              <target>ograniczonego pola</target>
            </trans-unit>
        	

        2.8 Grouping Translation across <trans-unit> elements

        Inevitably, some XLIFF <trans-unit> elements will not represent a piece of text that can be translated without reference to one or more following <trans-unit> elements. The cause may be incorrect segmentation or bad document design.

        Example:

            <trans-unit id="t1"> 
              <source>The German acronym v.</source>
              <target>Niemiecki skrót v. OT oznacza górną pozycję silnika.</target>
            </trans-unit>
            <trans-unit id="t2">
              <source>OT signifies the top dead center position for an engine.</source>
              <target/>
            </trans-unit>
        	

        In these cases the "merged-translations" attribute of the <group> element can be used to denote that the individual <trans-unit> elements cannot be regarded as direct translations, but rather need to be treated linguistically as a merged group. This attribute has two possible values: "yes" or "no". The default value is "no". A value of "yes" indicates that the <trans-unit> elements contained within this <group> element are to be treated together for linguistic purposes. All <trans-unit> elements enclosed by a <group> element whose "merged-translations" attribute is set to "yes" normally have their "equivalent-translation" attribute set to "no". The text of all of the <source> and <target> elements, taken together, forms one linguistic whole. No requirements are made regarding the distribution of the translation among the <target> elements; this is governed by the requirements of the individual applications. The translated text may be placed within the first <target> element, leaving the following <target> elements blank, or distributed among the <target> elements contained within the "merged-translations" <group> element. The following example demonstrates the use of the "merged-translations" attribute for the <group> element:

        <group merged-translations="yes">
            <trans-unit id="t1" equivalent-translation="no">
              <source>The German acronym v.</source>
              <target>Niemiecki skrót v. OT oznacza górną pozycję silnika.</target>
            </trans-unit>
            <trans-unit id="t2" equivalent-translation="no">
              <source>OT signifies the top dead center position for an engine.</source>
              <target/>
            </trans-unit>
         </group>
         

        3.3. Attributes

        This section lists the various attributes used in the XLIFF elements. An attribute is never specified more than once for each element. Some attributes are accompanied by a list of their possible values.

        XLIFF attributes: annotates, approved, assoc, build-num, ctype, category, charclass, comment, company-name, contact-email, contact-name, contact-phone, context-type, coord, count-type, crc, css-style, datatype, date, equivalent-translation, exstyle, extradata, extype, font, form, from, help-id, href, id, job-id, match-mandatory, match-quality, maxheight, maxbytes, maxwidth, menu, menu-name, menu-option, mid, merged-translations, mime-type, minheight, minbytes, minwidth, mtype, name, original, phase-name, pos, priority, process-name, product-name, product-version, prop-type, purpose, reformat, resname, restype, rid, size-unit, source-language, state, state-qualifier, style, tool, tool-company, tool-id, tool-name, tool-version, target-language, translate, ts, uid, unit, version, xid.
        XML namespace attributes: xml:lang, xml:space.

        3.3.1. XLIFF Attributes

        equivalent-translation

        equivalent-translation - Indicates whether the target language translation is a direct equivalent of the source text.

        Value description:

        yes, or no.

        Default value:

        yes.

        Used in:

         <trans-unit>.

        merged-translations

        merged-translations - Indicates whether the <group> element contains merged <trans-unit> elements.

        Value description:

        yes, or no.

        Default value:

        no.

        Used in:

         <group>.

        ===== End of proposed new entry =====
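
The attribute definitions in the proposed entry (element of use, permitted values, default) are simple enough to check mechanically. The following Python sketch, using only the standard library, shows one way a tool might do so. It is not a substitute for validation against the XLIFF XSD. The table PROPOSED_ATTRIBUTES, the function check_attributes, and the deliberately invalid sample are hypothetical and exist only for illustration.

```python
import xml.etree.ElementTree as ET

# Attribute name -> (element it is used in, allowed values, default value),
# transcribed from the two attribute entries in the proposed text.
PROPOSED_ATTRIBUTES = {
    "equivalent-translation": ("trans-unit", ("yes", "no"), "yes"),
    "merged-translations": ("group", ("yes", "no"), "no"),
}

# A deliberately invalid, hypothetical fragment: a bad value on <group> and
# "merged-translations" placed on a <trans-unit>.
INVALID_SAMPLE = """
<body>
  <group merged-translations="maybe">
    <trans-unit id="t1" equivalent-translation="no">
      <source>The German acronym v.</source>
      <target/>
    </trans-unit>
    <trans-unit id="t2" merged-translations="yes">
      <source>OT signifies the top dead center position for an engine.</source>
      <target/>
    </trans-unit>
  </group>
</body>
"""


def check_attributes(xml_text):
    """Return a list of human-readable problems; empty if none are found."""
    problems = []
    for element in ET.fromstring(xml_text).iter():
        for name, (used_in, allowed, _default) in PROPOSED_ATTRIBUTES.items():
            value = element.get(name)
            if value is None:
                continue  # attribute absent, so its default applies
            if element.tag != used_in:
                problems.append(f"{name} is not defined for <{element.tag}>")
            elif value not in allowed:
                choices = " or ".join(allowed)
                problems.append(f"{name} must be {choices}, found {value!r}")
    return problems


problems = check_attributes(INVALID_SAMPLE)
for problem in problems:
    print(problem)
```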

         

        -end-