How to Train a Machine Learning Model to Defeat APT Cyber Attacks

Part 3: Fuchikoma v0 - Learning the Sweet Science

The Fuchikoma ML model training for its big fight against CyAPTEmu, the APT emulator

This is Part 3 in a multi-part series of articles about how Cybots Senior Researcher C.K. Chen and team used open source software to design, step by step, a working threat hunting machine learning model. We suggest starting from Part 1: Introducing Fuchikoma.

How to Train a ML Model to Defeat APT Cyber Attacks

Round 1: Introducing Fuchikoma

Round 2: Fuchikoma VS CyAPTEmu: The Weigh In

Round 3: Fuchikoma v0: Learning the Sweet Science

Round 4: Finding the Fancy Footwork (Releases 2020.02.19)

In preparation for the second round of MITRE ATT&CK evaluations, C.K. Chen and team went about designing an emulation of an APT attack, which they named CyCraft APT Emulator or CyAPTEmu for short. CyAPTEmu’s goal was to generate a series of attacks on Windows machines. Then, a proof of concept threat hunting machine learning (ML) model was designed to specifically detect and respond to APT attacks. Its name was Fuchikoma.

Fuchikoma’s baby photo

“Fuchikoma v0 is designed to be a thought experiment–not a working ML model. Version 0 is a naive brute force solution that is doomed to fail. We start here to gain insight into the problems, or ‘challenges’ as we called them, which gave us an outline on how to focus development of the Fuchikoma ML model.”

 – C.K. Chen, CyCraft Senior Researcher

Fuchikoma begins its life as a simple theoretical supervised classifier ML model (a model such as an SVM or a Decision Tree would do), classifying events as malicious or benign. As explained in Part 2 of this series, the Elastic Stack collects the Windows event log data from the Windows machines and then sends all of that information to Fuchikoma. To reduce the overwhelming amount and variety of data Fuchikoma v0 has to read, Fuchikoma focuses solely on the process creation events within the Windows event log data from all the machines.


Even though we’ve narrowed the input data down to process creation events, a typical workday in an organization might see billions of diverse process creation events. A single ML component or classifier could face a lot of problems trying to deal with the amount and diversity of this data. Therefore, an ML pipeline consisting of several components will need to be constructed. The function of these future components in the Fuchikoma ML pipeline will stem from the solutions to the following challenges that the naive Fuchikoma v0 brings to the fore.

Challenge One: Weak Signal

A single event in isolation does not contain enough information to determine if it is a threat or not. Data needs context in order to be useful. Windows commands such as “whoami” or “netstat” could be used by a systems analyst or a cyber criminal. Without linking the preceding and subsequent contextual data to this command, Fuchikoma will have trouble labelling this individual event as benign or as malicious.
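
As a rough illustration of what "adding context" could mean, the sketch below attaches the parent and child command lines to each process creation event. The `ProcessEvent` fields, the sample process tree, and the `add_context` helper are all hypothetical; this is not Fuchikoma's actual implementation, and it assumes process IDs are not reused within the sample.

```python
from dataclasses import dataclass


@dataclass
class ProcessEvent:
    # Hypothetical, simplified view of a process creation event.
    pid: int
    ppid: int
    command: str


def add_context(events):
    """Attach the parent and child command lines to every event."""
    by_pid = {e.pid: e for e in events}
    children = {}
    for e in events:
        children.setdefault(e.ppid, []).append(e.command)

    contextual = []
    for e in events:
        parent = by_pid.get(e.ppid)
        contextual.append({
            "command": e.command,
            "parent": parent.command if parent else None,
            "children": children.get(e.pid, []),
        })
    return contextual


# Two identical "whoami" events with very different surroundings.
events = [
    ProcessEvent(100, 1, "explorer.exe"),
    ProcessEvent(200, 100, "cmd.exe"),
    ProcessEvent(201, 200, "whoami"),
    ProcessEvent(300, 1, "w3wp.exe"),
    ProcessEvent(301, 300, "whoami"),
]
context = add_context(events)
for row in context:
    if row["command"] == "whoami":
        print(row["command"], "launched by", row["parent"])
```

On their own the two "whoami" events are indistinguishable; with the parent attached, one was started from an interactive shell and the other from a web server worker process, which gives a classifier something to work with.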

To expand on our boxing analogy, Mike Tyson (CyAPTEmu) could be moving his arm; however, without contextual information Fuchikoma v0 would be unable to label the movement as Kid Dynamite moving his arm to scratch his nose (benign) or moving his arm to unleash a deadly uppercut (malicious). Fuchikoma v1 will need to be able to add contextual information to each event it processes to accurately label it.

Challenge Two: Imbalanced Data Sets

As stated earlier, a typical workday in an organization’s environment could see billions of diverse events, only a tiny portion of which would actually be related to a real attack. Out of the 11,135 events fed to Fuchikoma, only 119 (or 1.1 percent) were malicious. This massive imbalance between the two classes (normal versus malicious) creates two big problems.

Labelling becomes incredibly inefficient, as 98.9 percent of the events you label in this dataset are benign. You could label everything benign and be correct 98.9 percent of the time. However, in terms of cybersecurity, mislabelling malicious events as benign (false negatives) could be catastrophic; this isn’t an option.

If you trained Fuchikoma on unique events only, in the hope of better locating the malicious behavior, you would face the same issue: only 35 of the 1,857 unique events were malicious; that’s 98.1 percent benign to 1.9 percent malicious.
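
The arithmetic behind these figures can be checked with a few lines of Python. The event counts come from the text above; the `all_benign_baseline` helper is a hypothetical name used only for this sketch.

```python
def all_benign_baseline(total, malicious):
    """Score a 'classifier' that labels every single event as benign."""
    benign = total - malicious
    accuracy = benign / total    # every benign event is labelled correctly
    recall = 0 / malicious       # no malicious event is ever flagged
    return accuracy, recall


acc_all, rec_all = all_benign_baseline(total=11_135, malicious=119)
acc_unique, rec_unique = all_benign_baseline(total=1_857, malicious=35)

print(f"all events:    accuracy {acc_all:.1%}, malicious recall {rec_all:.0%}")
print(f"unique events: accuracy {acc_unique:.1%}, malicious recall {rec_unique:.0%}")
```

Accuracy looks excellent in both cases while recall on the malicious class is zero, which is why accuracy alone is a misleading measure on data this imbalanced.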

In addition, there are too few malicious events to properly train or create a classifier as, in general, ML classifiers need a minimum of 50 samples (with many more preferred); otherwise, you won’t be able to produce meaningful results.

Scikit-learn’s algorithm cheat-sheet has spoken: Get more data.

Heading back to the boxing ring, our friend Fuchikoma is busy processing everything in its environment, from each audience member’s individual arm movements to the subtle movement of Mike Tyson’s feet. It would have trouble telling a jab from a handshake.

Fuchikoma v1 will need to be able to quickly classify, separate, and process these data classes.

Challenge Three: Lack of High Quality Labels

One of Fuchikoma’s many goals is to automate the alert verification process; however, Fuchikoma v0 could only verify very specific attacks–pretty much exact command lines. Attackers tend to use variations of a theme and a combination of specific attacks together.

Fuchikoma v0 could identify Mimikatz but may have trouble verifying Mimikatz used again with its optional components mimilib or mimidrv. So the Fuchikoma v0 labels would not be general enough. In addition, Mimikatz is often used with several other tools at different phases of the cyber kill chain.

This problem comes from Fuchikoma v0 only being capable of classifying individual events. In addition, manually classifying individual events one at a time for training would take well beyond a reasonable amount of time, so Fuchikoma v1 will need to be able to link together contextual events to accurately label malicious events and do so in significantly less time.

In the boxing ring, this means that Fuchikoma could identify one uppercut from Mike Tyson; however, if Kid Dynamite threw another uppercut at a different angle, speed, or location, then Fuchikoma’s going down for the count.
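
To see why labels tied to exact command lines generalize poorly, consider this minimal sketch. The command lines, the renamed binary `m.exe`, and the `exact_match_label` helper are hypothetical examples chosen for illustration; they are not taken from the CyAPTEmu dataset.

```python
# Hypothetical label set: one command line seen during training.
KNOWN_MALICIOUS = {
    'mimikatz.exe "privilege::debug" "sekurlsa::logonpasswords" exit',
}


def exact_match_label(command_line):
    """Label an event malicious only if its command line was seen verbatim."""
    return "malicious" if command_line in KNOWN_MALICIOUS else "benign"


observed = [
    # Identical to the training sample.
    'mimikatz.exe "privilege::debug" "sekurlsa::logonpasswords" exit',
    # Same tool, different casing.
    'MIMIKATZ.EXE "privilege::debug" "sekurlsa::logonpasswords" exit',
    # Renamed binary with a shorter argument list.
    'm.exe "sekurlsa::logonpasswords" exit',
]
labels = [exact_match_label(c) for c in observed]
for command_line, label in zip(observed, labels):
    print(f"{label:9} {command_line}")
```

Only the verbatim repeat is caught. The two variations slip through as benign, which is exactly the kind of false negative the article warns about.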

Challenge Four: No Storyline

Seeing one event in isolation isn’t enough to fully understand from a forensic perspective what malicious activity is occurring on your network. Worse still, security analysts might miss something when presented with a smattering of isolated events.

Ideally, Fuchikoma should also be able to link similar alerts together, thereby not only creating fewer alerts but also constructing an attack storyline that “narrates” the path of the attack back to its true root cause.

Our goal is to leverage machine learning to expeditiously remove a huge number of false positive alerts and benign events, leaving only the most worthy incidents to be analyzed by human security analysts, and to do so in a way that makes sense from a forensic perspective.

Fuchikoma v0 could have labeled events as malicious, but was unable to correlate these malicious events together and construct the desired attack storyline. Our friend Fuchikoma could possibly label w3wp.exe as abuse but wouldn’t understand why it was being abused since it couldn’t link the w3wp.exe abuse to the Windows kernel exploit, MS15-015.exe.
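
One simple way to picture a storyline is to walk the parent links from a flagged event back to its earliest known ancestor. The sketch below does that over a hypothetical process tree; the PIDs, the intermediate `cmd.exe`, the trailing `whoami.exe`, and the `storyline` helper are assumptions made for illustration, not details from the source.

```python
def storyline(events, pid):
    """Walk parent links from one flagged event back to its earliest known ancestor."""
    by_pid = {e["pid"]: e for e in events}
    chain = []
    seen = set()  # guards against cycles in malformed data
    while pid in by_pid and pid not in seen:
        seen.add(pid)
        event = by_pid[pid]
        chain.append(event["image"])
        pid = event["ppid"]
    return list(reversed(chain))  # root cause first


# Hypothetical process tree for illustration only.
events = [
    {"pid": 300, "ppid": 4, "image": "w3wp.exe"},
    {"pid": 310, "ppid": 300, "image": "cmd.exe"},
    {"pid": 320, "ppid": 310, "image": "MS15-015.exe"},
    {"pid": 330, "ppid": 320, "image": "whoami.exe"},
]
story = storyline(events, pid=330)
print(" -> ".join(story))
```

Instead of four unrelated alerts, the analyst gets a single chain that starts at the web server process and ends at the command the attacker ran after the exploit.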

“In my youth, I would have argued that life is just a series of random events devoid of any meaning. But as a data scientist, I have to recognize that sometimes patterns emerge. Undeniable patterns.”

Bertram Gilfoyle, Pied Piper CTO & Senior Systems Architect

Even in the world of boxing, patterns emerge. “1-2-3” (jab, cross, hook) is an extremely powerful boxing combination. Continuing with our analogy, even though Fuchikoma v0 was able to correctly identify a jab, a cross, and a hook individually, our friend would not be able to link these seemingly disparate attacks together and predict the following combination–an important skill to master when one learns the art of the sweet science.

The post-trained Fuchikoma ML model’s automated process should be able to detect the jab, the cross, and the hook. If our friend initially verifies the jab and the hook, it should know to look for the missing link in the pattern–the cross. Ideally, if Fuchikoma reads any two of the 1-2-3 combo events, it should know to look for or predict the missing link. Not just for the good ol’ 1-2-3, but for all known attack patterns Mike Tyson has ever swung before.

Fuchikoma is defending a sensitive corporate enterprise. If one punch gets through, the enterprise could face serious repercussion concussions.
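
The "missing link" idea from the 1-2-3 analogy can be sketched as a lookup against known patterns. The pattern table and the `missing_links` helper below are hypothetical and deliberately simplistic; a real system would have to handle ordering, timing, and partial matches far more carefully.

```python
# Hypothetical table of known attack patterns, keyed by name.
KNOWN_PATTERNS = {
    "1-2-3": ["jab", "cross", "hook"],
}


def missing_links(observed, patterns=KNOWN_PATTERNS, min_seen=2):
    """For each pattern with at least `min_seen` steps observed, list the steps still missing."""
    observed = set(observed)
    hints = {}
    for name, steps in patterns.items():
        seen = [s for s in steps if s in observed]
        missing = [s for s in steps if s not in observed]
        if len(seen) >= min_seen and missing:
            hints[name] = missing
    return hints


hints = missing_links(["jab", "hook"])
print(hints)
```

Having verified the jab and the hook, the lookup points at the cross as the event to go hunting for.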

In the next article in this series, we’ll continue exploring the steps C.K. Chen and team took in the development of Fuchikoma and discuss the critical component of the Fuchikoma ML pipeline that influenced all future Fuchikoma iterations. We will also meet the rest of the Ghost in the Shell team and learn how C.K. Chen and team tackle the four main challenges of the Fuchikoma ML model.

Fuchikoma is only one simplified model out of the 50+ complex ML models that CyCraft’s CyCarrier platform uses to defeat APTs in the wild every day. For more information on our platform and the latest in CyCraft security news, follow us on Facebook, LinkedIn, and our website at CybotsAi.
