Notes from 12/15 Working Call

Attendees:
Bret Jordan
Ivan Kirillov
John Wunder
Desiree Beck
Ron Williams
Paul Patrick

Summary:
The group discussed malware families vs. instances, classifications, how to represent "observable" data like filenames/hashes, location, targeting, and aliases. The major takeaway is that the malware object should be split into two objects: malware and malware-instance. Malware would characterize general families; malware-instance would characterize specific samples/instances. The group will put together proposed objects. The other open questions for broader discussion are whether the malware object needs to have fields for file names and size, whether aliases should be a separate field or just use external references, and how to represent detailed analysis results.

Next steps:
The group will put together proposals for changes to the malware SDO and for a new malware-instance SDO. It will meet next week to discuss the new objects and further discuss open questions and topics we haven't gotten to yet. In the meantime, conversation will continue via Slack.

Detailed Notes:

Family vs. Instance
===================

There was some discussion of the different types of malware representations that need to be characterized. Everyone generally agreed that they fall into two buckets:
- Malware Family, which is more generic, like "Locky"
- Malware Instance, which is a specific sample w/ hash, etc.

Malware family has fields such as: name, description, aliases, kill chains, first/last seen.

Malware instance has the above fields, plus hashes, scan data, the sample itself, 1+ analysis, and potentially filenames and file sizes. It might also have a dynamic/static analysis. Relationships also likely differ between families and instances.

Given that discussion, the group seemed to agree that malware should in fact be represented by two separate SDOs: one for malware families (the current one) and one for individual instances/samples.
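To make the split concrete, here is a rough sketch of the two objects as Python dataclasses. It is not a proposal: the field names follow the lists above, while the class names, the types, the use of inheritance to model "has the above fields", and the placeholder hash value are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Malware:
    """Family-level object, e.g. "Locky" (hypothetical shape)."""
    name: str
    description: str = ""
    aliases: List[str] = field(default_factory=list)
    kill_chain_phases: List[str] = field(default_factory=list)
    first_seen: Optional[str] = None  # timestamp as a string; format is an assumption
    last_seen: Optional[str] = None


@dataclass
class MalwareInstance(Malware):
    """Specific sample: the family-level fields plus sample-specific ones."""
    hashes: Dict[str, str] = field(default_factory=dict)
    scan_data: List[dict] = field(default_factory=list)
    sample: Optional[bytes] = None  # the sample itself, if carried inline
    analyses: List[dict] = field(default_factory=list)  # 1+ analysis
    filenames: List[str] = field(default_factory=list)  # still an open question
    size: Optional[int] = None  # still an open question


# A family entry and one sample of it. The hash is a placeholder, not real data.
locky = Malware(name="Locky", description="Family-level entry")
locky_sample = MalwareInstance(name="Locky", hashes={"SHA-256": "0" * 64})
```

Inheritance is only one way to read "has the above fields"; a relationship from the instance to its family would work as well and avoids duplicating family-level data on every sample.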
The group will develop a set of changes to the current malware SDO, as well as a proposal for a new "malware-instance" SDO.

Field Details
====================

Aliases: Paul Patrick suggested that we need to add an aliases field, and everyone generally agreed. There was some discussion of whether aliases are better captured as external-references...if so, it could be represented as a gray "override" field. Finally, there was discussion of the overlap between scan data/classifications and aliases and the need to deconflict them.

Location: the group discussed location, and it was suggested that location is in fact not a location of the malware but a location of the authors/attribution of the malware. As such, it's probably better represented as a relationship to the malware authors' threat actor or intrusion set.

Targeting: the group also discussed targeting of malware. Targeting is important in many cases, but we also need to represent untargeted (general purpose) malware. The group will work to identify important types of malware targets (e.g. by identity/sector, location, ICS) and how to represent them (likely as a relationship). Targeting may also be a property of a campaign using malware.

Classifications: the group discussed the field previously known as scan_data. It was decided that we should rename it to classifications to be less confusing, yet still not use the term "av" (a four letter word). We also discussed adding a capability to say that a tool explicitly did classify a sample as a virus, while at the same time making sure that people do it correctly and don't just omit the result. There was discussion of the details field and whether it should be a custom property; we will work through some examples.

Last_seen: will be added, similar to first_seen.

Delivery vectors: there is no capability to represent this now, and adding one was strongly suggested by everyone. MAEC has an existing enumeration, which we'll look through to put together examples.
It's possible that this could be captured as relationships to infrastructure/tool/attack-pattern objects.

"Observable Data" for instances
===============================

We talked about the filenames, size, and hashes fields. Ivan suggested that filenames and size are significantly less useful than hashes for identifying the malware, searching for it, etc. Plus, much malware uses dynamic filenames today. Others agreed, though Bret said that filenames and size are still tracked in many cases. The group decided to bring that topic up for broader discussion.

If the broader discussion leads to the filenames and size fields being removed (and represented as relationships to actual file objects that were observed), then hashes will simply be added as a field on the malware-instance SDO. If the group decides to keep more properties of Cyber Observable file objects on the malware object itself, there needs to be a discussion of how they're represented.

Analysis
===============================

The group discussed options for incorporating malware analysis results (static/dynamic analysis output, etc.). One option is to just have a maec field with a MAEC analysis, but Paul said that even if everyone uses MAEC you might have output from several tools, so it needs to be a list. If not limited to MAEC, it could be a list of analysis/analysis_lang pairs. That would allow for non-MAEC representations. Another option would be to have behavioral analysis fields directly within the malware instance SDO.
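As a rough illustration of the "list of analysis/analysis_lang pairs" option, the sketch below keeps each tool's output opaque and tags it with the language it is written in. The class name, the helper function, the "other-format" label, and the payload strings are hypothetical placeholders; nothing here reflects an agreed design.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AnalysisEntry:
    """One analysis result, tagged with the language it is expressed in."""
    analysis_lang: str  # e.g. "maec"; other values are allowed
    analysis: str       # tool output, carried as an opaque payload


def entries_for_lang(entries: List[AnalysisEntry], lang: str) -> List[AnalysisEntry]:
    """Return the entries written in the given analysis language, in order."""
    return [e for e in entries if e.analysis_lang == lang]


# Two MAEC outputs from different tools plus one non-MAEC output.
# The payloads are placeholders, not real analysis data.
analyses = [
    AnalysisEntry(analysis_lang="maec", analysis="<output of tool A>"),
    AnalysisEntry(analysis_lang="maec", analysis="<output of tool B>"),
    AnalysisEntry(analysis_lang="other-format", analysis="<output of tool C>"),
]

maec_only = entries_for_lang(analyses, "maec")
```

A list covers Paul's point about several tools each producing MAEC output, and the analysis_lang tag is what leaves room for non-MAEC representations.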