Subject: Re: [cti] i18n (RE: MVP Discussion)


I like this idea: it makes content easier to create, makes it look simpler (don’t underestimate that), and makes single-language content easier to deal with, while keeping the power of Ryu’s approach.



If I’m understanding the implications right, the tradeoff is that you have to hash the string rather than just look at an ID field to figure out if a translation is valid, but that seems worth it to me. The other tradeoff is that with IDs a producer could, in theory, give two slightly different strings the same ID (perhaps only the punctuation differed) so that you wouldn’t need to translate both, whereas with hashes each string would need its own translation. I’m not sure that’s a case we should really worry about though?

It would be nice to walk through a workflow of an updated object and a third-party translator with this approach to see what it means for the producer, consumer, and translator.
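
A rough sketch of what such a walk-through might look like in code. Everything here is an assumption made for illustration, not part of any proposal in this thread: SHA-256 over the UTF-8 bytes as the hash, the helper names field_hash, make_translation and valid_fields, and the updated title text.

```python
import hashlib


def field_hash(text):
    """Hash of a source text field (SHA-256 over UTF-8 is an assumed choice)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def make_translation(obj, lang, texts):
    """Translator side: store each translated text with the hash of its source text."""
    return {
        "type": "translation",
        "lang": lang,
        "translated_ref": obj["id"],
        "translated_text": {
            field: {"text": text, "hash": field_hash(obj[field])}
            for field, text in texts.items()
        },
    }


def valid_fields(obj, translation):
    """Consumer side: for each translated field, does the stored hash still match?"""
    return {
        field: field in obj and field_hash(obj[field]) == entry["hash"]
        for field, entry in translation["translated_text"].items()
    }


# Producer publishes the original object; no extra work is needed on this side.
campaign = {
    "id": "campaign--a1201df6-c352-4a81-9c7c-5a6f896a4e31",
    "lang": "en",
    "title": "Dridex Campaign - Botnet 121",
    "descriptions": "Dridex-based campaign leveraging Botnet 121",
}

# A third-party translator publishes a separate translation object.
translation = make_translation(campaign, "ja", {
    "title": "Dridex キャンペーン - ボットネット 121",
    "descriptions": "ボットネット 121 を活用する Dridex を元にしたキャンペーン",
})

# The producer later revises only the title (hypothetical new text).
updated = dict(campaign, title="Dridex Campaign - Botnet 121 (revised)")

before = valid_fields(campaign, translation)
after = valid_fields(updated, translation)
print(before)  # {'title': True, 'descriptions': True}
print(after)   # {'title': False, 'descriptions': True}
```

In this sketch the consumer can keep using the translated description after the update and knows that only the title needs retranslating. It also shows the punctuation case: any change to the source string, however small, changes the hash.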

John

On 4/18/16, 5:13 PM, "Mates, Jeffrey CIV DC3/DCCI" <cti@lists.oasis-open.org on behalf of Jeffrey.Mates@dc3.mil> wrote:

>I might have missed this earlier, but instead of using text IDs on all text
>fields for all objects to ensure that a version change didn't result in a
>translation going out of date, couldn't we have the translation object store
>a hash of each of the text fields it translated?
>
>That way the producer of the original object wouldn't have to do any
>additional work while anyone using the translation could immediately find
>out which fields their translations were valid for and which ones had been
>broken by a version change.
>
>Sorry if this solution was proposed earlier, but I didn't see mention of
>it in the discussion. It might look like:
>
>  "campaigns": [
>    {
>      "type": "campaign",
>      "id": "campaign--a1201df6-c352-4a81-9c7c-5a6f896a4e31",
>      "lang": "en",
>      "spec_version": "stix-2.0",
>      "created_at": "2015-12-03T13:13Z",
>      "created_by_ref": "identity--69a17e1b-bb45-4657-9a9d-96db3faccdde",
>      "title": "Dridex Campaign - Botnet 121",
>      "descriptions": "Dridex-based campaign leveraging Botnet 121",
>      "intended_effects": [
>        {"value": "theft-identity-theft"}
>      ],
>      "status": "Ongoing"
>   }
>  ],
>  "translations": [
>    {
>      "type": "translation",
>      "id": "translation--a1201df6-c352-4a81-9c7c-5a6f896a1111",
>      "lang": "jp",
>      "spec_version": "stix-2.0",
>      "created_at": "2015-12-03T13:13Z",
>      "created_by_ref": "identity--69a17e1b-bb45-4657-9a9d-96db3faccdde",
>      "translated_ref": "campaign--a1201df6-c352-4a81-9c7c-5a6f896a4e31",
>      "translated_text": {
>        "title": {
>          "text": "Dridex キャンペーン - ボットネット 121",
>          "hash": "abc..."
>        },
>        "descriptions": {
>          "text": "ボットネット 121 を活用する Dridex を元にしたキャンペーン",
>          "hash": "dae..."
>        }
>      }
>    }
>]
>
>Jeffrey Mates, Civ DC3/DCCI
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>Computer Scientist
>Defense Cyber Crime Institute
>jeffrey.mates@dc3.mil
>410-694-4335
>
>
>-----Original Message-----
>From: cti@lists.oasis-open.org [mailto:cti@lists.oasis-open.org] On Behalf
>Of Jordan, Bret
>Sent: Monday, April 18, 2016 3:48 PM
>To: Mark Davidson
>Cc: Paul Patrick; Jason Keirstead; Masuoka, Ryusuke; John-Mark Gurney;
>cti@lists.oasis-open.org
>Subject: [Non-DoD Source] Re: [cti] i18n (RE: MVP Discussion)
>
>This is why I have proposed the simple translation object.  It makes all of
>this work.
>
>Bret
>
>Sent from my Commodore 64
>
>On Apr 18, 2016, at 11:58 AM, Mark Davidson <mdavidson@soltra.com> wrote:
>
>
>
>	I think ownership is part of the conversation.
>
>	Enabling a third party to add translations to an object effectively
>requires that they “steal” ownership of the object. Otherwise, the third
>party wouldn’t be able to author an authoritative revision. I don’t think
>content owners will want third parties to be able to issue authoritative
>updates to their content. As a concrete question: Could a foreign-language
>ISAO modify info from a Threat Provider, and publish it to their membership
>as ThreatProvider’s intel? Or would it need to be clearly marked as
>“derived from” the Threat Provider’s intel?
>
>	The way I see it, there would be two ways for a third party to do a
>translation: 1) create a new object with the translations that is
>“derived-from” the original; 2) suggest an update back to the original
>author.
>
>	FWIW, I am for i18n being MVP as long as we can solve it in a
>reasonably straightforward way.
>
>	Thank you.
>	-Mark
>
>
>	From: Paul Patrick <ppatrick@isightpartners.com>
>	Date: Monday, April 18, 2016 at 11:39 AM
>	To: "Jordan, Bret" <bret.jordan@bluecoat.com>
>	Cc: Jason Keirstead <Jason.Keirstead@ca.ibm.com>, Mark Davidson
><mdavidson@soltra.com>, "Masuoka, Ryusuke" <masuoka.ryusuke@jp.fujitsu.com>,
>John-Mark Gurney <jmg@newcontext.com>, "cti@lists.oasis-open.org"
><cti@lists.oasis-open.org>
>	Subject: Re: [cti] i18n (RE: MVP Discussion)
>
>
>
>	I completely agree with your statement about supporting translation
>being done by someone other than the content creator as that is likely the
>most predominant case.
>
>	I also agree that it MUST NOT break existing relationships.
>
>
>
>
>	From: "Jordan, Bret" <bret.jordan@bluecoat.com>
>	Date: Monday, April 18, 2016 at 11:32 AM
>	To: Paul Patrick <ppatrick@isightpartners.com>
>	Cc: Jason Keirstead <Jason.Keirstead@ca.ibm.com>, Mark Davidson
><mdavidson@soltra.com>, "Masuoka, Ryusuke" <masuoka.ryusuke@jp.fujitsu.com>,
>John-Mark Gurney <jmg@newcontext.com>, "cti@lists.oasis-open.org"
><cti@lists.oasis-open.org>
>	Subject: Re: [cti] i18n (RE: MVP Discussion)
>
>
>
>
>		If we do internationalization then it must support the
>ability for someone other than the content creator to generate translations.
>Also the solution must not break all existing relationships.  Embedded
>languages as a dict on a field, as Ryu has suggested, would prevent a third
>party from adding a translation.  That seems like a show stopper for that
>design.
>
>		Bret
>
>		Sent from my Commodore 64
>
>		On Apr 18, 2016, at 9:22 AM, Paul Patrick
><ppatrick@isightpartners.com> wrote:
>
>
>
>			I’m concerned that postponing i18n from MVP is
>basically postponing it until a 3.x since it is very likely to break
>backwards compatibility with 2.x releases as has been demonstrated by the
>various proposed designs to address i18n support and their impact on the
>model and bindings.
>
>
>			Paul Patrick
>
>
>
>			From: <cti@lists.oasis-open.org> on behalf of
>"Jordan, Bret" <bret.jordan@bluecoat.com>
>			Date: Monday, April 18, 2016 at 9:15 AM
>			To: Jason Keirstead <Jason.Keirstead@ca.ibm.com>
>			Cc: Mark Davidson <mdavidson@soltra.com>, "Masuoka,
>Ryusuke" <masuoka.ryusuke@jp.fujitsu.com>, John-Mark Gurney
><jmg@newcontext.com>, "cti@lists.oasis-open.org" <cti@lists.oasis-open.org>
>			Subject: Re: [cti] i18n (RE: MVP Discussion)
>			Resent-From: <Paul.Patrick@FireEye.com>
>
>
>
>
>				Given the state of this discussion I move
>that we do NOT do internationalization for MVP.  There is just too much we
>need to figure out and understand.
>
>				Bret
>
>				Sent from my Commodore 64
>
>				On Apr 18, 2016, at 6:24 AM, Jason Keirstead
><Jason.Keirstead@ca.ibm.com> wrote:
>
>
>
>					The text ID is not referring to the
>field. It is referring to a piece of text that has been translated.
>
>					Example - say I publish an
>observable with a title "Dridex". That is a pretty common name and is likely
>to be used all over the place, with many instances of observables and
>sightings. You don't want to have to re-translate that over and over and
>over for each one. You want one translation, and for that to be able to be
>used everywhere that text occurs.
>
>					This is the exact same way that
>translation works for most all programming languages. You translate a piece
>of text, then forevermore you can refer to that text by an ID, in many
>different contexts.
>
>					-
>					Jason Keirstead
>					STSM, Product Architect, Security
>Intelligence, IBM Security Systems
>					www.ibm.com/security |
>www.securityintelligence.com <http://www.securityintelligence.com>
>
>					Without data, all you are is just
>another person with an opinion - Unknown
>
>
>
>					From: Mark Davidson
><mdavidson@soltra.com>
>					To: "Masuoka, Ryusuke"
><masuoka.ryusuke@jp.fujitsu.com>, "Jordan, Bret" <bret.jordan@bluecoat.com>,
>John-Mark Gurney <jmg@newcontext.com>
>					Cc: Jason
>Keirstead/CanEast/IBM@IBMCA, "cti@lists.oasis-open.org"
><cti@lists.oasis-open.org>
>					Date: 04/18/2016 08:19 AM
>					Subject: Re: [cti] i18n (RE: MVP
>Discussion)
>					Sent by: <cti@lists.oasis-open.org>
>
>
>________________________________
>
>
>
>
>					I’m trying to catch up to the
>thread - please accept my apologies if my thoughts/questions are
>under-informed.
>
>					How does the text_id help? The field
>is already uniquely identified by its location in the representation
>(i.e., indicator=2222.title). If I wanted to add a title, I’d insert a new
>key into the Title object/dictionary. Updates would be handled the same way.
>
>
>					To me, a reasonably minimal way of
>going about language on each field would be a structure like this:
>
>					"field-name": {"lang-id": "text", "lang-id2": "texto"}
>
>					However, in my mind, having the lang
>as a root attribute (and nowhere else) seems to be the simplest solution so
>I’d like to prod a little in that direction.
>
>					I’d like to ask about:
>					> For example, the titles are given
>in EN, but the descriptions are given in JP for CTI from a Japanese CTI
>provider.
>
>					Why is this the case? If it’s that
>the English title is suitable for the Japanese translation, then I’d say
>“lang=jp” is suitable for the whole object, even though not all text
>fields necessarily have a “lang=jp” entry.
>
>					I guess I’d be OK with a top level
>lang field AND a dictionary of “language-id”:”text” for each text field.
>It feels like it’s roughly the simplest mechanism for achieving i18n. I
>think that would handle revisions and updates fairly seamlessly also - a
>translation provider could send back an update to the content owner with the
>new languages for particular field(s).
>
>					Thank you.
>					-Mark
>
>					From: <cti@lists.oasis-open.org
><mailto:cti@lists.oasis-open.org> > on behalf of "Masuoka, Ryusuke"
><masuoka.ryusuke@jp.fujitsu.com <mailto:masuoka.ryusuke@jp.fujitsu.com> >
>					Date: Monday, April 18, 2016 at 5:34
>AM
>					To: "Jordan, Bret"
><bret.jordan@bluecoat.com <mailto:bret.jordan@bluecoat.com> >, John-Mark
>Gurney <jmg@newcontext.com <mailto:jmg@newcontext.com> >
>					Cc: Jason Keirstead
><Jason.Keirstead@ca.ibm.com <mailto:Jason.Keirstead@ca.ibm.com> >,
>"cti@lists.oasis-open.org <mailto:cti@lists.oasis-open.org> "
><cti@lists.oasis-open.org <mailto:cti@lists.oasis-open.org> >
>					Subject: RE: [cti] i18n (RE: MVP
>Discussion)
>
>					Hi, Bret, all,
>
>					I know I am loud, but ...
>
>					By putting everything (text_id,
>language code) into the text itself like
>
>					"title": {"text_id": "text-a1b2c3", "en": "Dridex Campaign - Botnet 121"},
>
>					It does not affect other parts of the STIX, and it is not
>					affected by changes in other parts of the STIX (like versioning
>					and object structure or how deep the object is).
>
>					As such, a single, simple piece of code to produce and parse
>					the text will do.
>
>					Regards,
>
>					Ryu
>
>					From: Jordan, Bret
>[mailto:bret.jordan@bluecoat.com <mailto:bret.jordan@bluecoat.com> ]
>					Sent: Saturday, April 16, 2016 8:23
>AM
>					To: John-Mark Gurney
>					Cc: Masuoka, Ryusuke/益岡竜介; Jason
>Keirstead; cti@lists.oasis-open.org <mailto:cti@lists.oasis-open.org>
>					Subject: Re: [cti] i18n (RE: MVP
>Discussion)
>
>					What I would like to know, yes I am
>partial to this design, is how this design will NOT work for people. With
>some workflow analysis, and keeping with the identifier and versioning
>designs in STIX, I believe this, as represented by John-Mark, is the
>simplest and most straightforward solution.
>
>					But if I am missing some really key
>point, please call it out.
>
>					Personally I believe this design is
>straightforward enough and simple enough that we could get this in the
>Summer 2016 release of STIX. However, as I said, if it is missing some key
>thing, please let me know so I can better understand.
>
>					This solution does the following:
>
>					1) Provides a solution for single
>producer, a UI in a product can easily allow translations to be created
>under the covers for an object.
>
>					2) Provides a solution for third
>party translators.
>
>					3) Allows translations to be sent
>separate from the original file.
>
>					4) It does not require us to
>pre-identify every field that we want to translate and thus add UUIDs to all
>of the objects causing significant bloat on the wire and increased demands
>on processing.
>
>					There are a few elements that this
>design does not cover...
>
>					a) Mixing languages in the same JSON
>object. This can be a good thing from a graph database standpoint and really
>simplifies the consumption of objects or request of the objects from a TAXII
>server.
>
>					b) It is not very clean for
>translating deeply nested objects. I am sure we can figure out a solution
>for this, but the current example does not show one. However, given our
>design criteria for STIX 2.x, we are desperately trying to flatten the
>objects as much as possible, so this may or may not end up being an issue.
>
>
>					Thanks,
>
>					Bret
>
>
>
>					Bret Jordan CISSP
>					Director of Security Architecture
>and Standards | Office of the CTO
>					Blue Coat Systems
>					PGP Fingerprint: 63B4 FC53 680A 6B7D
>1447 F2C0 74F8 ACAE 7415 0050
>					"Without cryptography vihv vivc ce
>xhrnrw, however, the only thing that can not be unscrambled is an egg."
>
>
>							On Apr 15, 2016, at
>16:31, John-Mark Gurney <jmg@newcontext.com <mailto:jmg@newcontext.com> >
>wrote:
>
>							Masuoka, Ryusuke
>wrote this message on Fri, Apr 15, 2016 at 02:20 +0000:
>
>							- Always give
>"text_id" and "lang" for every text field
>							(So that anyone can
>give translations to the field later, knowing
>							which language it is
>in.)
>
>							- Always give
>"text_ref", "text_id" and "lang" for every translation
>							("text_id" lets
>someone base further translations on a translation rather than on the
>original-language text.
>							Example: A CTI text
>field is created in Japanese, then given an English translation.
>							Then German and
>French translations are produced based on the English translation.)
>
>							A big issue with
>this is that now EVERY text field (that is
>							translatable) will
>now have a UUID. For descriptions, this isn't a
>							big issue, but when
>we are talking about titles and the like, it's
>							possible that the
>UUID will be longer than the translation itself..
>
>							I much prefer to
>handle translations by pointing to the object id,
>							and then the fields
>that you want to translate..
>
>							This is what I'm
>talking about:
>							{
>							  "type": "package",
>							  ...
>							  "campaigns": [
>							    {
>							      "type": "campaign",
>							      "id": "campaign--a1201df6-c352-4a81-9c7c-5a6f896a4e31",
>							      "lang": "en",
>							      "revision": 1,
>							      "spec_version": "stix-2.0",
>							      "created_at": "2015-12-03T13:13Z",
>							      "created_by_ref": "identity--69a17e1b-bb45-4657-9a9d-96db3faccdde",
>							      "title": "Dridex Campaign - Botnet 121",
>							      "descriptions": "Dridex-based campaign leveraging Botnet 121",
>							      "intended_effects": [
>							        {"value": "theft-identity-theft"}
>							      ],
>							      "status": "Ongoing"
>							    }
>							  ],
>							  "translations": [
>							    {
>							      "obj_ref": "campaign--a1201df6-c352-4a81-9c7c-5a6f896a4e31",
>							      "type": "translation",
>							      "lang": "ja",
>							      "text_id": "text-a1b2c3-ja-1",
>							      "title": "Dridex キャンペーン - ボットネット 121",
>							      "descriptions": "ボットネット 121 を活用する Dridex を元にしたキャンペーン"
>							    },
>							    {
>							      "obj_ref": "campaign--a1201df6-c352-4a81-9c7c-5a6f896a4e31",
>							      "type": "translation",
>							      "lang": "de",
>							      "title": "Some German Title",
>							      "descriptions": "Some German Description"
>							    }
>							  ]
>							  ...
>							}
>
>							This is much
>simpler. It can be handled more simply in code by
>							overlaying objects
>by language preferences, etc...
>
>							As Bret pointed out,
>this does mean you can't have a base object w/
>							mixed languages, but
>I don't see a strong value in that, as those
>							other languages can
>be provided via translations...
>
>							--
>							John-Mark
>
>
>
>
>
>
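
John-Mark's point about overlaying objects by language preference might be sketched roughly as follows. This is only an illustration under stated assumptions: the function name localize, the META_KEYS set, and the rule that the first matching language wins are all hypothetical and not taken from any proposal in the thread. The field names follow the example in his message.

```python
# Keys that describe the translation object itself rather than translated text.
# This set is an assumption based on the example quoted above.
META_KEYS = {"obj_ref", "type", "lang", "text_id"}


def localize(obj, translations, preferred_langs):
    """Return a copy of obj with text fields overlaid from the first preferred
    language that is available; fall back to the original object otherwise."""
    by_lang = {t["lang"]: t for t in translations if t.get("obj_ref") == obj["id"]}
    for lang in preferred_langs:
        if lang == obj.get("lang"):
            return dict(obj)  # the base object is already in this language
        if lang in by_lang:
            overlay = {k: v for k, v in by_lang[lang].items() if k not in META_KEYS}
            return {**obj, **overlay, "lang": lang}
    return dict(obj)


campaign = {
    "type": "campaign",
    "id": "campaign--a1201df6-c352-4a81-9c7c-5a6f896a4e31",
    "lang": "en",
    "title": "Dridex Campaign - Botnet 121",
    "descriptions": "Dridex-based campaign leveraging Botnet 121",
    "status": "Ongoing",
}

translations = [
    {
        "obj_ref": "campaign--a1201df6-c352-4a81-9c7c-5a6f896a4e31",
        "type": "translation",
        "lang": "ja",
        "title": "Dridex キャンペーン - ボットネット 121",
        "descriptions": "ボットネット 121 を活用する Dridex を元にしたキャンペーン",
    },
    {
        "obj_ref": "campaign--a1201df6-c352-4a81-9c7c-5a6f896a4e31",
        "type": "translation",
        "lang": "de",
        "title": "Some German Title",
        "descriptions": "Some German Description",
    },
]

german = localize(campaign, translations, ["de", "ja"])
japanese = localize(campaign, translations, ["fr", "ja"])
print(german["title"])    # Some German Title
print(japanese["title"])  # Dridex キャンペーン - ボットネット 121
```

Untranslated fields such as "status" pass through unchanged, and a consumer whose preferred languages are all unavailable simply gets the base object back.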