Subject: Re: [cti-stix] Proposal to establish Sightings (#306) and Relationships (#291) as our official issue topics under active consideration for STIX v2.0
I would propose that it is b), as it involves capturing information about the sighting itself, potentially including source, timing, confidence, observation details, count, etc.
All of this information is very useful for intelligence and I would characterize it as practical and not theoretical.
When we talk about adding it to the graph, let's remember that we are not necessarily talking about a human-readable graph but rather a machine-readable, analyzable (hopefully with inferencing capabilities) graph, potentially backed by big data (sorry for the buzzword) capabilities.
I think that, as with anything, various parties may wish to age out sightings information, as they may do with other information/intelligence, once they decide it is of reduced relevance. But I don’t think we can throw up our hands, declare it impossible and just leave it out of the model.
I also agree with John that the majority of high-value indicators are likely to have relatively low sighting volumes. Low-value or poorly constructed indicators are likely to have much higher sighting volumes. I would propose that, from a language model perspective, we leave the decision of how many reported sightings to keep up to the users.
If a user’s capabilities cannot keep up with massive volumes of sightings, they will filter them. If they are getting massive volumes of basically identical sightings reports from the same organization, they are very likely to capture metadata and filter out the individual sightings. There are different ways to deal with the volume. I believe we should leave those questions to the users, not the model. The model should support the capabilities needed for appropriate analysis and sharing.
sean
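Sean's suggestion above, that a receiver flooded with basically identical reports from one organization will capture metadata and filter out the individual sightings, is a consumer-side choice the model itself need not make. A minimal Python sketch of one such choice follows. It assumes a hypothetical `Sighting` record carrying only a source, an indicator ID and an observation time (not a STIX structure), and a hypothetical `summarize` helper that collapses reports sharing a source and indicator into a count with first and last seen times:

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Sighting:
    """Hypothetical minimal sighting report (not a STIX schema)."""
    source: str          # reporting organization
    indicator_id: str    # indicator that was sighted
    observed_at: datetime


@dataclass
class SightingSummary:
    """Metadata a consumer might keep in place of many near-identical sightings."""
    source: str
    indicator_id: str
    count: int
    first_seen: datetime
    last_seen: datetime


def summarize(sightings):
    """Collapse sightings sharing (source, indicator_id) into one summary each."""
    summaries = {}
    for s in sightings:
        key = (s.source, s.indicator_id)
        summary = summaries.get(key)
        if summary is None:
            summaries[key] = SightingSummary(
                s.source, s.indicator_id, 1, s.observed_at, s.observed_at
            )
        else:
            summary.count += 1
            summary.first_seen = min(summary.first_seen, s.observed_at)
            summary.last_seen = max(summary.last_seen, s.observed_at)
    return list(summaries.values())


reports = [
    Sighting("org-a", "indicator-1", datetime(2015, 11, 1, 9, 0)),
    Sighting("org-a", "indicator-1", datetime(2015, 11, 2, 9, 0)),
    Sighting("org-b", "indicator-1", datetime(2015, 11, 2, 10, 0)),
]
for summary in summarize(reports):
    print(summary.source, summary.count)
# org-a 2
# org-b 1
```

Another consumer could keep every raw sighting instead; the point of the sketch is only that this reduction can happen after receipt, without the model forbidding the individual records.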
From: "cti-stix@lists.oasis-open.org" <cti-stix@lists.oasis-open.org> on behalf of Jason Keirstead <Jason.Keirstead@ca.ibm.com>
Date: Tuesday, November 3, 2015 at 11:10 AM
To: John Wunder <jwunder@mitre.org>
Cc: Jerome Athias <athiasjerome@gmail.com>, "Jordan, Bret" <bret.jordan@bluecoat.com>, "cti-stix@lists.oasis-open.org" <cti-stix@lists.oasis-open.org>, "Taylor, Marlon" <Marlon.Taylor@hq.dhs.gov>, Mark Davidson <mdavidson@mitre.org>, "Barnum, Sean D." <sbarnum@mitre.org>, Terry MacDonald <terry@soltra.com>
Subject: Re: [cti-stix] Proposal to establish Sightings (#306) and Relationships (#291) as our official issue topics under active consideration for STIX v2.0

In general, if you design anything with the requirement to scale large, then said system can easily scale down. But the inverse is rarely true.
Is that true in all scenarios? Sure, a lot of commodity indicators will probably have zillions of hits. But what about targeted indicators of APT activity that we want to carefully track? I feel like we’re designing for this one scenario of a ton of sightings when in practice the more valuable activity might be less volume and more specificity. (Not to say we don’t care about the volume use case, just that it’s not the only one). John
I understand the theoretical usefulness, but I still stand by the fact that once you get into large scale, its usefulness as raw data becomes inconsequential... In terms of the graph, I believe a sighting is a metric on the edge between an observer and an indicator, and that edge has attributes such as "count" and "last seen". It is not a vertex in and of itself; that would not scale in real-world scenarios. You also don't need to store the raw instances of sightings to do the most useful analysis of those metrics (including temporal). I can have a time series database tied to the edge that stores sighting counts over time, without storing the actual raw sighting instances.

> Why would I want unique records of all of those sightings? What purpose is it serving? What people care about in a sighting is a count of indicators, so that they can give increased significance to those that are currently “live”.

The count is like a heartbeat. It tells you if that TTP is still “alive”, but that is really all it does. It is the actual sighting details that give you deeper insight (“intelligence”) into what is happening and how you might prevent or mitigate it. There are many forms of analysis that can be done on the sighting information, but the most obvious and prevalent have to do with “when” and “who”. Temporal analysis across the actual sightings can yield all sorts of insight beyond just “alive” or “dead”. Similarly, analysis of who is sighting the indicator and when can give very valuable insight into victim targeting: who is being affected that might not know it yet, and who will likely be affected next. If the sightings include details of what was actually observed, rather than just a “matched pattern” count, this information can also be very valuable in understanding the nature of the TTP and how variations of it may be applied to different subsets of the victim targeting pool. Think of sightings like case reporting from doctors to the CDC. If you want to know whether a potential contagion is something to worry about, counts give you the first measure. But if you want to actually study the epidemiology (how fast and how far it is spreading, where it will likely spread next, what sort of victims are most susceptible, which methods are successful in slowing or stopping it) and get ahead of it, you will need the actual “sightings”.
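Jason's model, in which a sighting is a metric on the edge between an observer and an indicator (with "count" and "last seen" attributes plus a time series of counts) rather than a vertex of its own, can be sketched as follows. This is a minimal Python illustration under stated assumptions: the `SightingEdge` class, its field names, the per-day buckets and the seven-day `is_live` window are all hypothetical and not part of any STIX draft.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import date, datetime
from typing import Optional


@dataclass
class SightingEdge:
    """Hypothetical observer -> indicator edge that keeps only aggregate sighting metrics."""
    observer: str
    indicator_id: str
    count: int = 0
    last_seen: Optional[datetime] = None
    # Time series of sighting counts, bucketed per calendar day.
    daily_counts: Counter = field(default_factory=Counter)

    def record(self, observed_at: datetime) -> None:
        """Fold one raw sighting into the edge metrics; the raw instance is not kept."""
        self.count += 1
        if self.last_seen is None or observed_at > self.last_seen:
            self.last_seen = observed_at
        self.daily_counts[observed_at.date()] += 1

    def is_live(self, today: date, window_days: int = 7) -> bool:
        """'Heartbeat' check: was the indicator sighted within the last window_days days?"""
        return any(
            0 <= (today - day).days < window_days and n > 0
            for day, n in self.daily_counts.items()
        )


edge = SightingEdge("fw-1", "indicator-1")
edge.record(datetime(2015, 11, 1, 9, 0))
edge.record(datetime(2015, 11, 1, 17, 0))
edge.record(datetime(2015, 11, 3, 8, 0))
print(edge.count, edge.is_live(date(2015, 11, 5)))  # 3 True
```

The trade-off argued in the thread shows up directly: `daily_counts` supports temporal "is it still live" analysis without storing raw instances, but whatever a sighting actually observed is discarded once `record` returns, and that detail is what Sean argues the deeper analysis depends on.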
sean

From: Jason Keirstead <Jason.Keirstead@ca.ibm.com>
Date: Tuesday, November 3, 2015 at 8:48 AM
To: Terry MacDonald <terry@soltra.com>
Cc: Jerome Athias <athiasjerome@gmail.com>, "Jordan, Bret" <bret.jordan@bluecoat.com>, "cti-stix@lists.oasis-open.org" <cti-stix@lists.oasis-open.org>, John Wunder <jwunder@mitre.org>, "Taylor, Marlon" <Marlon.Taylor@hq.dhs.gov>, Mark Davidson <mdavidson@mitre.org>, "Barnum, Sean D." <sbarnum@mitre.org>
Subject: RE: [cti-stix] Proposal to establish Sightings (#306) and Relationships (#291) as our official issue topics under active consideration for STIX v2.0

> Um, why do we want the same ID? If the attacker has sent our Org the same pdf 2000 times then don’t we want to record that fact, and link the Sighting objects (with Observable Instances) to the Incident object with 2000 relationship objects? Then don’t we want to also send that group of Incident, Sightings, Observables and Relationships to others in our Threat Sharing group so that they are aware of them?

That is accurate.
If we are worried about the size of storing the PDF multiple times, then it is up to the implementation to recognize that the MD5 of the attachment item is the same and actually store it only once (just like MS Exchange servers have been doing since the mid-2000s).

How do we identify to others that the above data came from us? If the ID of the object is just generated from the <HashofContent>, then there is no easy way to do this. If the ID of the object is generated from namespace.<HashofContent>, then we have more of a chance. But what happens if we decide to update the Incident? The ID is now namespace.<NewHashofContent>. Now how do we de-duplicate? Do we now have to publish a relationship object explicitly stating that this is a replacement for the namespace.<HashofContent> object?

And what about relationship objects? Part of the power of separate top-level objects is that we can tell people about a relationship but keep the actual data node it refers to a secret. Therefore, in some implementations, the only link tying relationships together is the fact that they share an ID, e.g.:
RelationshipA (src: CampaignA -> ThreatActorA)
RelationshipB (src: IndicatorA -> CampaignA)
RelationshipC (src: IndicatorB -> CampaignA)
The recipient may not have the CampaignA data or ThreatActorA data, but they will still know that IndicatorA and IndicatorB are related to the same campaign, thanks to the shared CampaignA ID contained in the relationships. This completely breaks if the IDs change over time. We need an ID solution that:
- Keeps the ID the same over the lifetime of the object, even if the object is updated and its content changes.
- Recognizes that IDs will be coming from many different companies and many different sources, and that we need a way of easily understanding who produced the data.

To go over the FW use case again:
2. The FW mgmt. server has STIX/TAXII capabilities. For the first detection alert that the FW mgmt. server receives, it creates a STIX v2 Sighting object, a corresponding STIX Observable containing a CybOX EmailMessage Object and a related File object, and two relationship objects to join the STIX Sighting to the Observables. It stores a mapping of the Observable SHA256 / file ID in a local internal data table for the EmailMessage and the File. It sends these out on the TAXII channel that it was configured to use.
3. The main TAXII repository receives this STIX v2 Sighting object and the corresponding STIX Observable containing a CybOX EmailMessage Object and a related File object, and adds them to its repository.
4. For the second detection alert that the FW mgmt. server receives, it computes SHA256 hashes of the email contents and the attached File independently to see if it has seen them before. It hasn’t seen the EmailMessage before, but it has seen the attached PDF.
5. It creates a STIX v2 Sighting object and a corresponding STIX Observable containing a new CybOX EmailMessage Object (the email address was different). The EmailMessage contains the idref of the previously generated File object. It also adds two relationship objects to join the new STIX Sighting to the Observables. It sends these out on the TAXII channel that it was configured to use.
6. The main TAXII repository receives this second detection's STIX v2 content and adds it to its repository.
7. Each subsequent detection alert will create a new Sighting object and a new EmailMessage object, but will refer to the same File object. Relationships will be created between these objects as well. At this point, the main TAXII repository knows that the File objects are all related.
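The steps above can be sketched as a small model of the FW mgmt. server's local SHA256-to-ID table plus the repository-side grouping. This is a hypothetical Python sketch: the `FirewallManager` class, the `sightings_by_file` helper, the `type--UUID` ID format, and the representation of relationship objects as `(source_id, target_id)` tuples are all assumptions made for illustration, not STIX v2 structures.

```python
import hashlib
import uuid
from collections import defaultdict


class FirewallManager:
    """Hypothetical FW mgmt. server that reuses IDs for content it has already seen.

    Every detection gets a new Sighting ID; the email and the attachment are
    hashed independently with SHA-256 and only get a new ID if unseen.
    """

    def __init__(self):
        # Local internal data table: (object type, SHA-256 hex digest) -> object ID.
        self._ids_by_hash = {}

    def _id_for(self, obj_type, content):
        key = (obj_type, hashlib.sha256(content).hexdigest())
        if key not in self._ids_by_hash:
            self._ids_by_hash[key] = f"{obj_type}--{uuid.uuid4()}"
        return self._ids_by_hash[key]

    def on_detection(self, email_bytes, attachment_bytes):
        """Return the IDs and relationships to publish for one detection alert."""
        sighting_id = f"sighting--{uuid.uuid4()}"
        email_id = self._id_for("email-message", email_bytes)
        file_id = self._id_for("file", attachment_bytes)
        return {
            "sighting": sighting_id,
            "email": email_id,
            "file": file_id,
            # Two relationship objects joining the Sighting to the Observables.
            "relationships": [(sighting_id, email_id), (sighting_id, file_id)],
        }


def sightings_by_file(relationships):
    """Repository side: group Sighting IDs by the File ID they are related to."""
    groups = defaultdict(set)
    for source_id, target_id in relationships:
        if target_id.startswith("file--"):
            groups[target_id].add(source_id)
    return groups


fw = FirewallManager()
pdf = b"%PDF-1.4 weaponized attachment"
first = fw.on_detection(b"To: alice@example.com", pdf)
second = fw.on_detection(b"To: bob@example.com", pdf)
print(first["file"] == second["file"])    # True: same PDF, same File ID
print(first["email"] == second["email"])  # False: different email, new ID
```

Keying the local table by object type as well as digest keeps an email and an attachment that happen to have identical bytes from sharing an ID; the File ID is the only thing that ties the sightings together on the repository side.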
Cheers
Terry MacDonald
Senior STIX Subject Matter Expert
SOLTRA | An FS-ISAC and DTCC Company
+61 (407) 203 206 | terry@soltra.com

From: Jordan, Bret [mailto:bret.jordan@bluecoat.com]
Sent: Saturday, 31 October 2015 4:54 AM
To: Jason Keirstead <Jason.Keirstead@ca.ibm.com>
Cc: Wunder, John A. <jwunder@mitre.org>; Terry MacDonald <terry@soltra.com>; Mark Davidson <mdavidson@mitre.org>; Sean D. Barnum <sbarnum@mitre.org>; Jerome Athias <athiasjerome@gmail.com>; Taylor, Marlon <Marlon.Taylor@hq.dhs.gov>; cti-stix@lists.oasis-open.org
Subject: Re: [cti-stix] Proposal to establish Sightings (#306) and Relationships (#291) as our official issue topics under active consideration for STIX v2.0

Let's run the FW use case to ground, since most everyone should understand it...

FW 1 sees a series of weaponized PDFs come down. Say it sees the same weaponized PDF 2,000 times over a period of 3 days: a large phishing attack with a lot of click-happy users.

1) Now, it is highly unlikely with the current model that the FW will remember and use the same ID value (UUID) for each Indicator+Observable+MAEC data blob it issues for this weaponized PDF. In fact, it will probably issue 2,000 different UUIDs for the same Indicator.
2) Now, when you compound this by 60,000 clients in the network issuing Sightings, this begins to blow up quickly.

Maybe... just maybe... the FW could take the JSON Indicator that it is going to issue, hash the data blob, and use that hash as the ID. Then at least each FW that is running the same code and seeing basically the same thing, with the same amount of data enrichment, will issue the same ID value.

We will have a totally different problem in TAXII Land in the Query REST API.
Because you will probably want to do something like:
/t2/query/indicator/file_name/FreeFood.pdf
or
/t2/query/indicator/file_hash/<some file hash of the PDF>

Thanks,
Bret

Bret Jordan CISSP
Director of Security Architecture and Standards | Office of the CTO
Blue Coat Systems
PGP Fingerprint: 63B4 FC53 680A 6B7D 1447 F2C0 74F8 ACAE 7415 0050
"Without cryptography vihv vivc ce xhrnrw, however, the only thing that can not be unscrambled is an egg."
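Bret's content-hash ID and Terry's objection to it can be compared in a short sketch. Everything here is a hypothetical illustration: objects are plain dicts, "canonical" JSON is assumed to mean `json.dumps` with sorted keys and compact separators, SHA-256 is an assumed hash choice, and the stable alternative is an assumed producer namespace plus a random UUID that is minted once and then carried unchanged through updates.

```python
import hashlib
import json
import uuid


def content_hash_id(obj: dict) -> str:
    """ID derived from the object's content (Bret's suggestion; format is hypothetical).

    Keys are sorted so that two producers emitting the same fields in a
    different order still compute the same ID.
    """
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return "hash--" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def stable_namespaced_id(namespace: str) -> str:
    """ID minted once per object (Terry's requirements): names the producer, never changes."""
    return f"{namespace}:{uuid.uuid4()}"


indicator = {"type": "indicator", "file_name": "FreeFood.pdf"}

# Two FWs emitting the same content converge on the same hash-based ID...
same_content = {"file_name": "FreeFood.pdf", "type": "indicator"}
print(content_hash_id(indicator) == content_hash_id(same_content))  # True

# ...but any update produces a new hash-based ID.
updated = {**indicator, "confidence": "high"}
print(content_hash_id(indicator) == content_hash_id(updated))  # False

# A stable ID is assigned once and simply carried over into the updated object.
stable_id = stable_namespaced_id("example-org")
print(stable_id.startswith("example-org:"))  # True
```

Hashing canonical JSON makes key order irrelevant, so identical content yields identical IDs across FWs, but any difference in enrichment, or any later update, yields a new ID; that is the de-duplication problem Terry raises. The namespaced random ID identifies the producer and survives updates, at the cost that independent FWs seeing the same PDF will not converge on one ID.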
So then the question becomes - if the consumers are not using the IDs, then why are they required...
I am against mandatory IDs of 32 or 64 or however many bytes in every sighting message if those bytes usually don't have any meaning behind them. And to reiterate: this problem is beyond sightings... it certainly exists for many classes of observables, and sometimes even indicators.

- Jason Keirstead
Product Architect, Security Intelligence, IBM Security Systems
www.ibm.com/security | www.securityintelligence.com
Without data, all you are is just another person with an opinion - Unknown