Subject: Segmentation representation and scenario
Some ideas on segmentation representation:

For representing the segmentation inside a <trans-unit> I would use the <mrk> element:

<trans-unit id='2'>
 <source xml:lang='en'><mrk mid='2-1' mtype='phrase'>This is the second entry of the file.</mrk> <mrk mid='2-2' mtype='phrase'>This is the second sentence of the second entry.</mrk></source>
 <target xml:lang='fr'><mrk mid='2-1' mtype='phrase'>Ceci est la deuxième entrée du fichier.</mrk> <mrk mid='2-2' mtype='phrase'>Ceci est la deuxième phrase de la deuxième entrée.</mrk></target>
</trans-unit>

- It's part of the existing specification.
- It's non-intrusive: mergers are supposed to ignore it.
- We can have a set of specific extended attributes if we want to store sentence-level information.
- We would probably need to add an mtype value specific to a 'segment' ('phrase' is not good enough).

I agree that translation tools should be able to provide their own segmentation within a <trans-unit>, and should be able to do so during the translation itself (by the translator).

I also think that a translation tool should be able to use any existing match at the <trans-unit> level as well: there is no reason to go to a finer granularity if a match is already available at the <trans-unit> level. That said, there is obviously a usability threshold for fuzzy matches at the <trans-unit> level, and that threshold is most likely commensurate with the size of the text in the <trans-unit>: for large units, the differences between the new source and the old one may be harder to see.

I think a translation process should be able to take advantage of such high matches, obtained without the translation tool and without segmenting the <trans-unit> content. Translation tools should allow such matches to be verified during the translation.

For example: one can imagine a project where version 2 of a piece of software is to be localized. Version 1 with its translation exists, but there is no TM.
One can easily create a "TM" without complex tools from the <trans-unit>-level entries. One should be able to reuse high matches from that "TM" regardless of what segmentation is used by the translation tools.

Cheers,
-yves
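The version 1 / version 2 scenario could be sketched roughly as follows: build a unit-level "TM" from existing <trans-unit> entries while ignoring any <mrk> segmentation (as a merger would), check that source and target segments pair up by mid, and accept a fuzzy match only above a threshold that grows with the size of the unit. This is a hypothetical illustration, not a proposal for the specification: the fragment omits the XLIFF namespace, mtype='seg' is an assumed segment value, difflib's SequenceMatcher ratio stands in for whatever match metric a real tool would use, and the min_score() formula and its base, per_char and cap values are invented for the example.

```python
import difflib
import xml.etree.ElementTree as ET

# Hypothetical version-1 units (XLIFF namespace omitted for brevity).
# mtype='seg' is an assumed segment-specific value, not an existing one.
V1_UNITS = """<body>
<trans-unit id='1'>
 <source xml:lang='en'>Open the file.</source>
 <target xml:lang='fr'>Ouvrez le fichier.</target>
</trans-unit>
<trans-unit id='2'>
 <source xml:lang='en'><mrk mid='2-1' mtype='seg'>This is the second entry of the file.</mrk> <mrk mid='2-2' mtype='seg'>This is the second sentence of the second entry.</mrk></source>
 <target xml:lang='fr'><mrk mid='2-1' mtype='seg'>Ceci est la deuxième entrée du fichier.</mrk> <mrk mid='2-2' mtype='seg'>Ceci est la deuxième phrase de la deuxième entrée.</mrk></target>
</trans-unit>
</body>"""


def unit_text(elem):
    """Text of a <source>/<target>, ignoring <mrk> boundaries as a merger would."""
    return "".join(elem.itertext()).strip()


def segments_aligned(tu):
    """True if the source and target <mrk> elements carry the same set of mids."""
    src = sorted(m.get("mid") for m in tu.find("source").iter("mrk"))
    tgt = sorted(m.get("mid") for m in tu.find("target").iter("mrk"))
    return src == tgt


def build_unit_tm(xml_text):
    """Unit-level 'TM': source text -> target text, one entry per <trans-unit>."""
    tm = {}
    for tu in ET.fromstring(xml_text).iter("trans-unit"):
        src, tgt = tu.find("source"), tu.find("target")
        if src is not None and tgt is not None:
            tm[unit_text(src)] = unit_text(tgt)
    return tm


def min_score(text, base=0.75, per_char=0.0005, cap=0.95):
    """Hypothetical size-aware threshold: the longer the unit, the higher the
    similarity required, since differences are harder to spot."""
    return min(cap, base + per_char * len(text))


def best_unit_match(new_source, tm):
    """Best fuzzy match at <trans-unit> level, or None if below min_score()."""
    best = None
    for old_source, old_target in tm.items():
        score = difflib.SequenceMatcher(None, new_source, old_source).ratio()
        if best is None or score > best[0]:
            best = (score, old_source, old_target)
    if best is not None and best[0] >= min_score(new_source):
        return best
    return None


if __name__ == "__main__":
    tm = build_unit_tm(V1_UNITS)
    print(best_unit_match("Open the file!", tm))     # high match, reusable
    print(best_unit_match("Close the window.", tm))  # below threshold: None
```

Because the lookup works on whole-unit text, the same "TM" can be reused whatever segmentation the translation tool later applies inside each <trans-unit>; the translator would still verify each match.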