Subject: RE: [xliff] Re: SPAM - RE: [xliff] HTML Inline codes
From: David Pooley <dpooley@sdl.com>
To: "'xliff@lists.oasis-open.org'" <xliff@lists.oasis-open.org>
Date: Fri, 22 Oct 2004 09:59:58 +0100
... and another view. This relates to how we handle these instances in SDLX.

[1] In SDLX, we ultimately replace <br> with the Unicode character U+2028
(LINE SEPARATOR). The segmentation rules can then be used to determine
whether to segment on
this character. An alternative is to use <x ctype="lb"/> and again allow the
segmenting application to make the decision.
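
As a rough illustration of [1], here is a minimal Python sketch. It is not
SDLX code: the function names and the break_on_line_separator flag are made
up for the example, and real segmentation rules are far richer than a single
boolean.

```python
import re

LINE_SEPARATOR = "\u2028"  # Unicode LINE SEPARATOR


def replace_br(html_text):
    """Replace each <br>, <br/> or <br /> tag with U+2028."""
    return re.sub(r"<br\s*/?>", LINE_SEPARATOR, html_text, flags=re.IGNORECASE)


def segment(text, break_on_line_separator):
    """Split on U+2028 only when the (hypothetical) rule asks for it."""
    if break_on_line_separator:
        return [part for part in text.split(LINE_SEPARATOR) if part]
    return [text]


converted = replace_br("First line.<br>Second line.")
print(segment(converted, True))   # two segments
print(segment(converted, False))  # one segment, separator kept inside
```

The point is only that the filter does the replacement once, and the
decision to break stays with whatever applies the segmentation rules.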

[2] Inline codes are ALWAYS inline. However, we may choose to "optimise out"
matching inline codes if there is no text between these codes and the
segment start/end when presenting the text for translation. So the example
below becomes

	<source><g id="1" ctype="bold">Sample text.</g></source>

However, we would only present "Sample text." to the translator for
translation. (Actually, in this case, we would make the text bold as well.)
Personally, I don't think you should ever make inline codes "external" in
the XLIFF filter. It should be left to the XLIFF editor to decide whether
to present them to the translator. After all, the formatting of the text
could have some bearing on the translation.
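
To make [2] concrete, here is a minimal Python sketch of the "optimise out"
step as an editor might do it. This is an assumption about one possible
approach, not how SDLX is implemented: text_for_translator is a hypothetical
name, and the sketch assumes a <source> element without namespaces that uses
only <g> as the paired code.

```python
import xml.etree.ElementTree as ET


def text_for_translator(source_xml):
    """Return the content to display, hiding <g> codes that wrap the whole segment.

    The XLIFF itself is not modified, so the codes stay inline in the file;
    only the presentation changes.
    """
    node = ET.fromstring(source_xml)
    # Descend while the node holds exactly one <g> child with no text
    # before it (node.text) and no text after it (child.tail).
    while (
        len(node) == 1
        and node[0].tag == "g"
        and not (node.text or "").strip()
        and not (node[0].tail or "").strip()
    ):
        node = node[0]
    # Serialise what is left; codes that are really inside the text are kept.
    return (node.text or "") + "".join(
        ET.tostring(child, encoding="unicode") for child in node
    )


print(text_for_translator(
    '<source><g id="1" ctype="bold">Sample text.</g></source>'
))
print(text_for_translator(
    '<source><g id="1" ctype="bold">Sample text.</g> Another text.</source>'
))
```

In the first call the <g> pair touches both ends of the segment, so only the
text is shown. In the second there is text after the closing code, so the
pair is kept.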

David Pooley
Software Architect
SDL International

-----Original Message-----
From: Shigemichi Yazawa [mailto:yazawa@globalsight.com] 
Sent: Thursday, October 21, 2004 7:46 PM
To: xliff@lists.oasis-open.org
Subject: [xliff] Re: SPAM - RE: [xliff] HTML Inline codes


At Wed, 20 Oct 2004 14:54:41 -0600,
Yves Savourel wrote:
>
> The question is:
> Which elements should remain inside the extracted text.

This does not answer the question directly, but rather offers a view of the
subject from a slightly different angle.

1. <br> can be inline or not, depending on the context. It can be used
   as a formatter or as a paragraph separator. It needs to be treated
   on a case-by-case basis.

2. <b> should normally be inline, but not always. If <b> appears like
   this,

   <p><b>Sample text.</b></p>

   <b> is better excluded from the text. One reason is that
   translators generally don't want to be distracted by codes. The
   fewer codes, the happier they are. Another reason is that one can
   expect better TM leverage without codes. If other documents have
   a sentence like <i>Sample text.</i>, the TM match would be 100%
   as long as neither the TM record nor the source segment includes
   codes. With codes, the match would generally be penalized.

3. So inline elements should be excluded from the segments when they
   are outside of the segments, right?  Well, it's not always
   possible.  Let's look at an example below.

   <p><b>Sample text.</b> Another text.</p>

   Whether </b> can be excluded from the segments depends on the
   implementation of the extraction framework. If the extractor
   extracts a segment as a unit and puts everything in between into
   the skeleton, </b> can be excluded. If the extractor extracts a
   paragraph as a unit, then </b> cannot be excluded.
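
The TM leverage argument in point 2 can be shown with a toy Python sketch.
Everything here is hypothetical: match_score, the one-point CODE_PENALTY and
the exact-match-only logic are assumptions made for illustration, and real
TM engines use fuzzy matching with their own penalty schemes.

```python
import re

CODE = re.compile(r"<[^>]+>")  # crude pattern for an inline code
CODE_PENALTY = 1               # hypothetical: points deducted when codes differ


def strip_codes(segment):
    """Remove inline codes, leaving only the translatable text."""
    return CODE.sub("", segment)


def match_score(tm_source, new_source):
    """Return a toy match percentage between a TM record and a new segment.

    Only identical text is handled; anything else scores 0 in this sketch.
    """
    if strip_codes(tm_source) != strip_codes(new_source):
        return 0
    if tm_source == new_source:
        return 100
    return 100 - CODE_PENALTY  # same text, different codes


# Codes excluded on both sides: exact match.
print(match_score("Sample text.", "Sample text."))
# Codes kept in the segments and they differ: penalized match.
print(match_score("<i>Sample text.</i>", "<b>Sample text.</b>"))
```

With the codes left out of both segments the score is 100; with them kept,
the same sentence drops below 100 only because <i> and <b> differ.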


These are implementation issues we came across. Other tool vendors may have
different issues, too. It would be great if these issues were captured in
some way in the profile.

-------------------
Shigemichi Yazawa
yazawa@globalsight.com
