Subject: RE: [xliff] Re: SPAM - RE: [xliff] HTML Inline codes
... and another view. This relates to how we handle these instances in SDLX.

[1] In SDLX, we ultimately replace <br> with the Unicode character U+2028 (LINE SEPARATOR). The segmentation rules can then be used to determine whether to segment on this character. An alternative is to use <x ctype="lb"/> and again allow the segmenting application to make the decision.

[2] Inline codes are ALWAYS inline. However, when presenting the text for translation, we may choose to "optimise out" matching inline codes if there is no text between these codes and the segment start/end. So the <p><b>Sample text.</b></p> example in the message below becomes:

<source><g id="1" ctype="bold">Sample text.</g></source>

However, we would only present "Sample text." to the translator for translation. (Actually, in this case, we would display the text in bold as well.)

Personally, I don't think you should ever make inline codes "external" in the XLIFF filter. It should be left to the XLIFF editor to decide whether to present them to the translator. After all, the formatting of the text could have some bearing on the translation.

David Pooley
Software Architect
SDL International

-----Original Message-----
From: Shigemichi Yazawa [mailto:yazawa@globalsight.com]
Sent: Thursday, October 21, 2004 7:46 PM
To: xliff@lists.oasis-open.org
Subject: [xliff] Re: SPAM - RE: [xliff] HTML Inline codes

At Wed, 20 Oct 2004 14:54:41 -0600, Yves Savourel wrote:
>
> The question is:
> Which elements should remain inside the extracted text.

This is not an answer to the question, but rather a view of the subject from a slightly different angle.

1. <br> can be inline or not, depending on the context. It can be used as a formatter or as a paragraph separator. It needs to be treated on a case-by-case basis.

2. There is no doubt that <b> should be inline, but not always. If <b> appears like this,

<p><b>Sample text.</b></p>

then <b> had better be excluded from the text. One reason is that translators generally don't want to be distracted by codes. The fewer codes, the happier they are.
Another reason is that one can expect better TM leverage results without codes. If other documents have a sentence like <i>Sample text.</i>, the TM match would be 100% if neither the TM record nor the source segment includes codes. With codes, the match score would generally be penalized.

3. So inline elements should be excluded from the segments when they are outside the segments, right? Well, it's not always possible. Let's look at the example below.

<p><b>Sample text.</b> Another text.</p>

Whether </b> can be excluded from the segments depends on the implementation of the extraction framework. If the extractor extracts each segment as a unit and puts anything in between into the skeleton, </b> can be excluded. If the extractor extracts a whole paragraph as a unit, then </b> cannot be excluded.

These are implementation issues we came across. Other tool vendors may have different issues, too. It would be great if these issues were captured in some way in the profile.

-------------------
Shigemichi Yazawa
yazawa@globalsight.com
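The <br> handling described in [1] of Pooley's reply can be illustrated with a minimal Python sketch. It assumes a toy regex-based segmenter; `replace_br`, `segment` and the `break_on_line_separator` flag are hypothetical names for illustration only, not SDLX's actual API or segmentation rules. The point is that once <br> is mapped to U+2028, whether it ends a segment becomes a single rule switch.

```python
import re

# U+2028 LINE SEPARATOR, the character [1] says SDLX maps <br> to.
LINE_SEPARATOR = "\u2028"

# Matches <br>, <br/>, <BR /> and <br ...> with attributes, but not e.g. <bra>.
BR_TAG = re.compile(r"<br\b[^>]*>", re.IGNORECASE)


def replace_br(text: str) -> str:
    """Replace every HTML <br> tag with U+2028."""
    return BR_TAG.sub(LINE_SEPARATOR, text)


def segment(text: str, break_on_line_separator: bool) -> list[str]:
    """Toy segmenter (hypothetical rules): split after '.', '!' or '?'
    followed by spaces/tabs and, only if the rule is enabled, at U+2028."""
    # [ \t]+ rather than \s+, because Python's \s also matches U+2028.
    pattern = r"(?<=[.!?])[ \t]+"
    if break_on_line_separator:
        pattern += "|" + LINE_SEPARATOR
    return [part for part in re.split(pattern, text) if part]
```

With the rule on, `segment(replace_br("First line<br>Second line. Third sentence."), True)` yields three segments; with it off, the U+2028 stays inside the first segment as an inline character, matching Yazawa's observation that <br> may be a formatter or a separator depending on context. A tool following the alternative in [1] would apply the same kind of rule to an <x ctype="lb"/> placeholder instead.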