How to Train a Machine Learning Model to Defeat APT Cyber Attacks

Part 4: Fuchikoma v1 - Finding the Fancy Footwork

This is Part 4 of a multi-part series on how Cybots Senior Researcher C.K. Chen and team used open source software, step by step, to design a working threat hunting machine learning model. We suggest starting from Part 1: Introducing Fuchikoma.

How to Train a ML Model to Defeat APT Cyber Attacks

Round 1: Introducing Fuchikoma

Round 2: Fuchikoma VS CyAPTEmu: The Weigh In

Round 3: Fuchikoma v0: Learning the Sweet Science

Round 4: Finding the Fancy Footwork (Released 2020.02.19)

Round 5: Fuchikoma v2: Jab, Cross, Hook, Perfecting the 1-2-3 Punch Combo (Coming Soon)

In preparation for the second round of MITRE ATT&CK evaluations, CyCraft Senior Researcher C.K. Chen and team went about designing an emulation of an APT attack, which they named CyCraft APT Emulator or CyAPTEmu for short. CyAPTEmu’s goal was to generate a series of attacks on Windows machines. Then, a proof of concept threat hunting machine learning (ML) model was designed to specifically detect and respond to APT attacks. Its name is Fuchikoma.

Last round, Fuchikoma v0 got knocked out by CyAPTEmu. Major Kusanagi of Section 9 (the purple-haired lady pictured above) was not pleased.

The Fuchikoma v0 thought experiment gave insight into the four main challenges of designing a threat hunting ML model: a weak signal, imbalanced data sets, a lack of high quality data labels, and the lack of an attack storyline.

In keeping with our boxing analogy, Fuchikoma v0 lost because it couldn’t tell whether similar movements (such as footwork) were benign or malicious, was too busy focusing on everything in and out of the ring to properly identify malicious attacks, had trouble identifying similar attacks, and couldn’t string the attacks together to aid in fight analysis. With all this in mind, it’s no wonder Fuchikoma v0 got knocked out.

Introducing Fuchikoma v1.

Fuchikoma’s elementary school yearbook photo
To turn the tide for our friend, Fuchikoma v1 was given an analysis unit (AU) builder, which structures process events into AUs (mini process trees) that contain more useful contextual information. The idea is for Fuchikoma to cluster similar events, thus reducing investigation time and increasing accuracy.

Each AU (pictured below) would be a mini process tree with a total depth of five layers: each process creation event (the cmd 1 node in the graphic below) was linked with its parent and up to three tiers of child processes. In the Unit2Doc step, the command lines of the processes in each AU were vectorized with TF-IDF and then sent further down the ML pipeline for clustering.
Left: Analysis Unit Proc Tree. Right: Actual AU consisting of vectorized command lines.
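
The AU idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the team's actual builder: the event tuples, the `build_aus` and `tfidf` helpers, and the whitespace tokenizer are all hypothetical, and the TF-IDF weighting is the plain textbook form (term frequency times log of inverse document frequency).

```python
import math
from collections import Counter, defaultdict

# Hypothetical process creation events: (pid, parent pid, command line).
EVENTS = [
    (1, 0, "explorer.exe"),
    (2, 1, "cmd.exe /c whoami"),
    (3, 2, "whoami"),
    (4, 1, "powershell.exe -nop -enc AAAA"),
    (5, 4, "netstat -ano"),
    (6, 5, "findstr LISTENING"),
]

def build_aus(events, child_tiers=3):
    """Return {pid: [command lines]}: parent, the event itself, then up to
    `child_tiers` tiers of descendants (five layers in total by default)."""
    cmd = {pid: c for pid, _, c in events}
    parent = {pid: ppid for pid, ppid, _ in events}
    children = defaultdict(list)
    for pid, ppid, _ in events:
        children[ppid].append(pid)
    aus = {}
    for pid in cmd:
        unit = []
        if parent[pid] in cmd:            # the parent may be outside the log
            unit.append(cmd[parent[pid]])
        unit.append(cmd[pid])
        tier = [pid]
        for _ in range(child_tiers):      # walk down one tier at a time
            tier = [c for p in tier for c in children[p]]
            unit.extend(cmd[c] for c in tier)
        aus[pid] = unit
    return aus

def tfidf(docs):
    """docs: {key: list of tokens}. Returns {key: {token: weight}}."""
    n = len(docs)
    df = Counter(tok for toks in docs.values() for tok in set(toks))
    out = {}
    for key, toks in docs.items():
        tf = Counter(toks)
        out[key] = {t: (tf[t] / len(toks)) * math.log(n / df[t]) for t in tf}
    return out

aus = build_aus(EVENTS)
# Assumed stand-in for Unit2Doc: join the AU's command lines into one
# lowercase document and split it on whitespace.
docs = {pid: " ".join(unit).lower().split() for pid, unit in aus.items()}
vectors = tfidf(docs)
print(aus[2])      # the AU for the "cmd.exe /c whoami" event
print(vectors[2])  # its sparse TF-IDF vector
```

Because each vector is built from the whole mini tree, a lone "whoami" launched from an interactive shell and a "whoami" launched under a scripted chain end up with different vectors, which is the extra context the AU is meant to provide.
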
As we discovered in Part 3 of this series, a typical workday in an organization’s environment could see billions of diverse events, only a tiny portion of which would actually be related to a real attack. More than 98.9 percent of the events would be benign (assuming they follow the distribution of our sample data).

Pure supervised learning would be inefficient in the real world, as investigators (Fuchikoma’s friends at Section 9) would have to individually label each of the numerous and diverse events as malicious or benign. Instead, unsupervised learning algorithms, such as k-means, could cluster many similar events prior to labelling, thus reducing the number of labels our investigators need to generate.

Since k-means centroids are pulled toward the dense, high-volume regions of the data, and 98.9 percent of events are benign, our investigators could be fairly confident that the larger clusters would not contain malicious events. This already hints that highlighting outliers might be the better approach.

After clustering, our investigators wouldn’t need to label each of the billion diverse events individually but rather each of the clusters, a dramatic decrease in time and resources.
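
The label-per-cluster idea can be sketched as follows. This is a minimal sketch, assuming toy 2-D points in place of the real TF-IDF vectors; the `kmeans` helper, the evenly spaced initial centroids, and the analyst verdicts in `cluster_label` are all hypothetical and chosen only to keep the example deterministic.

```python
def kmeans(points, k, iters=20):
    """Plain Lloyd's algorithm; returns a cluster index for each point.
    Initial centroids are evenly spaced picks from the input, an assumption
    made for reproducibility (real code would use k-means++ or restarts)."""
    centroids = [points[i * len(points) // k] for i in range(k)]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        assign = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:
                centroids[j] = tuple(sum(d) / len(members) for d in zip(*members))
    return assign

# Toy stand-ins for vectorized AUs: three tight groups of four points each.
points = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 10), (10, 11), (11, 10), (11, 11),
    (0, 10), (0, 11), (1, 10), (1, 11),
]
assign = kmeans(points, k=3)

# An investigator reviews one representative per cluster (hypothetical verdicts)...
cluster_label = {0: "benign", 1: "benign", 2: "suspicious"}
# ...and the verdict is propagated to every event in that cluster.
event_labels = [cluster_label[a] for a in assign]

labels_needed = len(set(assign))
print(f"{labels_needed} labels instead of {len(points)}")
```

The saving is real only if each cluster is internally consistent; a propagated verdict is wrong for every event that does not match its cluster's representative.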

The results were …

… not great.

Labelling ten clusters (k was set to 10 in a few of our tests) is better than labelling one billion; I’m sure we can all agree on that. Clustering is a step in the right direction; however, k-means presented two new challenges.

The first is that setting the value of k (the number of clusters to be formed) before the initial labelling isn’t ideal, as it is hard to determine the clusters a priori. For example, while k was set to 10 in a few of our tests, this value would prove difficult to determine for each environment Fuchikoma needs to inspect. I mean, do you know the ideal number of centroids to accurately cluster all of the daily process creation events for your office environment?

Our second issue is the imbalanced data set. Of the 11,254 events generated, we’re looking for the 119 (1.1 percent) that are malicious. This second problem is compounded by the first. At production scale, perhaps there should be eight clusters, each with hundreds of thousands of data points, and we’d be looking for two points in one of those clusters. Not ideal. Don’t forget that some malicious activity could appear identical to benign activity (e.g., “netstat” or “whoami”). Despite these issues, clustering could still prove useful, as the initial idea of labelling groups as opposed to individual dots can still be a time saver; however, highlighting outliers may prove better. Fuchikoma v2 will need more upgrades to its pipeline.
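
To make the imbalance concrete, here is a small worked example. The event counts come from the figures above; the two ways of splitting those events across ten clusters are invented for illustration and are not measured results, and `majority_label_recall` is a hypothetical helper.

```python
total_events = 11_254
malicious = 119
benign = total_events - malicious

print(f"malicious share: {malicious / total_events:.2%}")  # about 1.06%
print(f"benign share:    {benign / total_events:.2%}")     # about 98.94%

def majority_label_recall(clusters):
    """clusters: list of (benign_count, malicious_count) pairs.
    Each cluster gets its majority label; return the fraction of
    malicious events that end up in a cluster labelled malicious."""
    total_mal = sum(m for _, m in clusters)
    caught = sum(m for b, m in clusters if m > b)
    return caught / total_mal if total_mal else 0.0

# Hypothetical case 1: malicious events are spread evenly over 10 clusters.
mixed = [(1113, 12)] * 9 + [(1118, 11)]

# Hypothetical case 2: 100 malicious events land in a cluster of their own.
isolated = [(0, 100)] + [(1237, 2)] * 8 + [(1239, 3)]

for name, split in (("mixed", mixed), ("isolated", isolated)):
    assert sum(b + m for b, m in split) == total_events
    print(name, round(majority_label_recall(split), 3))
```

In the evenly mixed case every cluster is labelled benign and none of the 119 malicious events is caught, which is why cluster-level labels alone are not enough on data this skewed.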

C.K. Chen and team decided Fuchikoma v2 should leverage outlier detection. Outlier detection is normally used to cut noise out of a data set by flagging unusual data points as outliers and discarding them. Instead of ignoring these outliers, however, Fuchikoma’s friends at Section 9 would focus on them, as these data points theoretically have a higher probability of being malicious.
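
As a rough illustration of "focus on the outliers first", the sketch below scores each point by its mean distance to its nearest neighbours and ranks the highest scores for review. This k-nearest-neighbour score is a simple stand-in chosen for brevity, not the algorithm used in Fuchikoma v2; the toy points and the `knn_outlier_scores` helper are assumptions.

```python
import math

def knn_outlier_scores(points, k=3):
    """Score each point by its mean distance to its k nearest neighbours.
    Points in dense regions score low; isolated points score high."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores

# Toy data: a dense 5x5 grid of "routine" points plus two far-away points.
routine = [(x, y) for x in range(5) for y in range(5)]
unusual = [(20, 20), (21, 20)]
points = routine + unusual

scores = knn_outlier_scores(points)
# Review queue: highest score first, so analysts see the oddest events first.
ranked = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
print([points[i] for i in ranked[:2]])
```

The ranking turns the usual noise-removal step around: rather than dropping the highest-scoring points, they go to the top of the investigators' queue.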

Fuchikoma will have to get more training in order to properly defeat CyAPTEmu. Let’s hear from our investigator team from Section 9 on Fuchikoma v1’s performance.

Challenge One: Weak Signal [RESOLVED]

Single events in isolation do not contain enough information to determine whether they are a threat. Data needs context in order to be useful. Analysis units, which contain contextual child and parent process information, were added to the ML pipeline, where they are later clustered and labelled.

In the boxing ring, this means Fuchikoma can now relate everything it sees to everything else. The wooden stool in CyAPTEmu’s corner isn’t related to the camera flashing in the background. The backward swing of our opponent’s arm is definitely related to the punch that is now quickly speeding towards us. Ouch, was that a glove contacting my face?

Challenge Two: Imbalanced Data Sets

As stated before, a typical workday in an organization’s environment could see billions of diverse events, only a tiny portion of which (1.1 percent in the training data) would actually be related to a real attack. This massive imbalance in data sets (normal versus malicious) creates two big problems: (1) inefficient labelling time and (2) not enough malicious data to learn from.

Fuchikoma v1 attempted to leverage k-means, which proved difficult. Among other things, defining k in advance is hard, and k-means still suffered from the imbalanced data sets. Clustering could still be useful; however, other algorithms such as DBSCAN or Isolation Forest need to be leveraged for anomaly detection to further boost the signal.

In the previous fight, Fuchikoma v0 was too busy labelling everything to significantly participate in the fight. That’s an arm. That’s a glove. That’s a punch. Those are stars spinning around my head. Fuchikoma v1 was less distracted as it didn’t need to generate that many labels; however, finding CyAPTEmu out of all the noise still proved too much to handle. Remember that only 1.1 percent of everything happening in this boxing ring is malicious. Fuchikoma is practically fighting against a mostly invisible opponent, and CyAPTEmu only really needs to land one punch.

Challenge Three: Hard to Retrieve High Quality Labels

One of Fuchikoma’s many goals is to automate the alert verification process; however, both Fuchikoma v0 and Fuchikoma v1 could only verify very specific attacks, pretty much exact command lines. Attackers tend to use variations on a theme and string combinations of specific attacks together, so Fuchikoma v2 will need to be able to classify not one but all possible attack variations as malicious. Clustering was a move in the right direction, but it didn’t get us all the way there and brought its own issues.

Fuchikoma v1 had more high quality labels than Fuchikoma v0; however, there were still too many and too internally-diverse clusters that needed manual analysis, most of which were benign.

What’s in an uppercut? That which we call an uppercut by any other label would still knock Fuchikoma out for the count. Fuchikoma would still only be able to see one style of uppercut but would have trouble identifying all the various iterations. When Bruce Lee said, “Be like water,” he didn’t mean to be a refreshing beverage for your opponent. Fuchikoma v1 went down with a gulp.

Challenge Four: No Storyline

This isn’t so much a flaw in the model as a need from the forensic analyst using it. Detecting one piece of malware in isolation isn’t enough to fully understand from a forensic perspective what malicious activity is occurring on your network. Worse yet, security analysts might miss something when presented with a smattering of isolated events. Our Section 9 investigating team demands an automated attack storyline to increase their SOC efficiency.

In the next article in this series, we’ll continue exploring the steps C.K. Chen and team took in the development of Fuchikoma and discuss the multiple outcomes of integrating more ML algorithms. We will not only break down how Fuchikoma v2 performed against both APT3 and Dogeza but also discuss how Fuchikoma v2 was able to resolve two of the remaining three design challenges.

Fuchikoma is only one simplified version of one of the 50+ complex ML models that CyCraft’s CyCarrier platform uses to defeat APTs in the wild every day. For more information on our platform and the latest in Cybots security news, follow us on Facebook, LinkedIn, and our website at CybotsAi.
