How to Train a Machine Learning Model to Defeat APT Cyber Attacks

Part 2: Fuchikoma VS CyAPTEmu – The Weigh In

Ladies and Gentlemen! Welcome to the Rumble in the Virtual Jungle – Fuchikoma versus CyAPTEmu!

Before we introduce our fighters, let’s take a brief moment to acknowledge C.K. Chen, Cybots Senior Researcher and the lead designer (and trainer) of both Fuchikoma (the threat hunting machine learning model) and CyAPTEmu (the APT emulator).

C.K. attended National Chiao Tung University (QS ranked in the top 100 CS programs in the world, and second in Taiwan), where he received his PhD in Distributed Systems & Network Security, specializing in reverse engineering malware, malware analysis, and vulnerability discovery. He founded his university’s first CTF team, BambooFox, and participated in the 2016 and 2018 DEFCON Finals. In his free time, C.K. is a reviewer for HITCON and HITB and also lectures at the AIS3 “Hacker College” from time to time. For the last year, he’s been leading the research team at CyCraft, the first Taiwanese cybersecurity vendor to join the MITRE ATT&CK Evaluations.

Determined to do well in the second round of MITRE ATT&CK Evaluations against APT29, C.K. and team developed an emulation of the first round adversary – APT3.

Caption: CyCraft presents: The Dream Fight Weigh In! ML Fuchikoma prepares to face off against the ruthless opponent, CyAPTEmu! 

This is Part 2 in a multi-part series of articles about how CyCraft Senior Researcher C.K. Chen and team used open source software, step by step, to design a working threat hunting machine learning model. We suggest starting from Part 1: Introducing Fuchikoma.

The Weigh In

Kid Dynamite, CyAPTEmu!

The goal of CyCraft APT Emulator (CyAPTEmu) is to generate attacks on Windows machines in a virtualized environment. CyAPTEmu will send two waves of attacks, each using a different pre-constructed playbook. Empire was used to run the first playbook, modeled after APT3.

Known for leveraging zero-day exploits in multiple phishing campaigns, APT3 is one of the more sophisticated threat groups that CyCraft tracks. The APT3 playbook consists of data we have collected in the wild and data from shared global threat intelligence. APT3 has several custom tools that record keystrokes in encrypted files, enumerate current network connections, establish SOCKS5 connections for an initial C2, and remove indicators of compromise.

Metasploit was used to run the second playbook, which C.K. and team called Dogeza. “Dogeza” in Japanese refers to the traditional custom of kneeling directly on the floor and bowing until one’s head touches the ground; dogeza could be used to express a deep apology or deference to someone of higher status. Here, C.K. used “dogeza” to describe the power of the sophisticated APT attacks that CyCraft faces every day in the wild, on which his team modeled the playbook.

CyCraft secures various large enterprises and organizations across multiple countries and encounters zero-day exploits, stealthy tools, and advanced attacks every day. Many of these are included within the Dogeza playbook. One we commonly find is Juicy Potato, which is a weaponized variant of RottenPotatoNG that exploits the way Microsoft handles tokens, thus allowing attackers to escalate their privileges. The popular credential dumping malware, Mimikatz, was also included in Dogeza. However, not every Dogeza playbook entry is as notorious as Juicy Potato and Mimikatz. C.K. and team also included a keylogger found in the wild that remains undetectable by VirusTotal. Dogeza was designed as an any-given-day-in-the-wild playbook, so Fuchikoma could be trained to handle attacks outside of the APT3 playbook.

To better simulate how a blue team responds to a cyber attack in real time, the open source Elastic Stack was chosen for its ability to query large data sets in a reasonable time. Beats was installed on every Windows machine and sent Windows log data to Elasticsearch, where the Fuchikoma ML pipeline picked it up.
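
To make the log pickup step a little more concrete, here is a minimal sketch of the kind of query body a pipeline could send to Elasticsearch to pull recent process-creation events for one machine. It is an illustration only, not CyCraft's actual query: it assumes the Beats agent is Winlogbeat with its default field names (`host.name`, `winlog.event_id`, `@timestamp`), it uses Windows Security event ID 4688 (process creation) as the example event, and the function name and the host name `ws-01` are hypothetical.

```python
import json


def build_recent_process_query(host_name, minutes=15, size=500):
    """Build an Elasticsearch query body for recent process-creation events.

    Assumptions (not from the article): Winlogbeat default field names and
    Windows Security event ID 4688 (a new process has been created).
    """
    return {
        "size": size,
        # Oldest first, so downstream code sees events in the order they happened.
        "sort": [{"@timestamp": {"order": "asc"}}],
        "query": {
            "bool": {
                # Filters do not affect scoring and can be cached by Elasticsearch.
                "filter": [
                    {"term": {"host.name": host_name}},
                    {"term": {"winlog.event_id": 4688}},
                    {"range": {"@timestamp": {"gte": f"now-{minutes}m"}}},
                ]
            }
        },
    }


if __name__ == "__main__":
    # Print the JSON body that would be sent to the search endpoint.
    print(json.dumps(build_recent_process_query("ws-01"), indent=2))
```

The body is plain JSON, so it can be handed to whichever Elasticsearch client or HTTP library the pipeline already uses.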

Each playbook was given its own virtual environment. To make each environment more typical, benign events were generated by letting our interns use the machines for normal everyday work.

The APT3 environment consisted of five devices: one AD server, one SMB/CIFS file server, and three Windows user endpoints. The goal here was to emulate the internal network of an enterprise or the typical office environment.

The Dogeza environment consisted of four devices: one IIS web server, two Windows user endpoints, and one Linux user endpoint. The goal here was to emulate a company with a heterogeneous environment and emulate the DMZ/internal network split that exists in many enterprise networks.

The Deep Blue Hope, Fuchikoma!

Fuchikoma is a proof of concept, demonstrating a machine learning (ML) model designed to not only accurately detect malicious cyber attacks, but also create a storyline that traces the attack from its current location back to its true root cause. Our intention with Fuchikoma is to demonstrate how ML can be used concretely to automate large swaths of SOC work. One malicious event never provides enough information; SOC analysts need to trace malicious activity back to the root cause of the attack to gain the necessary forensic perspective on the entirety of the attack. Only then are SOC analysts able to fully understand an attack and possibly the threat actors’ motivations.
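
The idea of tracing one malicious event back to its root cause can be pictured as a walk up the process tree. The sketch below is a simplified illustration, not Fuchikoma's actual algorithm: it assumes each event carries hypothetical `pid`, `ppid`, and `image` fields, the sample events are made up, and it ignores real-world complications such as PID reuse and activity that spans several machines.

```python
def trace_to_root(events, alert_pid):
    """Walk parent links from the alerted process back to its earliest known ancestor.

    Returns the chain ordered from root cause to the alerted process.
    """
    by_pid = {event["pid"]: event for event in events}
    chain = []
    seen = set()  # guards against cycles in malformed data
    pid = alert_pid
    while pid in by_pid and pid not in seen:
        seen.add(pid)
        chain.append(by_pid[pid])
        pid = by_pid[pid]["ppid"]
    chain.reverse()
    return chain


# Made-up sample data: one suspicious chain plus one unrelated benign process.
events = [
    {"pid": 100, "ppid": 1, "image": "outlook.exe"},
    {"pid": 200, "ppid": 100, "image": "winword.exe"},
    {"pid": 300, "ppid": 200, "image": "powershell.exe"},
    {"pid": 400, "ppid": 300, "image": "mimikatz.exe"},
    {"pid": 500, "ppid": 1, "image": "chrome.exe"},
]

storyline = trace_to_root(events, alert_pid=400)
print(" -> ".join(event["image"] for event in storyline))
# prints: outlook.exe -> winword.exe -> powershell.exe -> mimikatz.exe
```

Starting from the single alert on `mimikatz.exe`, the walk recovers the whole chain back to the mail client, which is the kind of context an analyst needs and a lone alert does not provide.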

“Machine learning isn’t magic; it’s work–hard work. We made mistakes and overcame challenges.”

 – C.K. Chen, CyCraft Senior Researcher

Fuchikoma uses several off-the-shelf technologies like Scikit-learn (a commonly used ML Python library), Natural Language Toolkit (NLTK, another commonly used Python library), and Neo4j (a commonly used graph database). Roughly, the idea is that NLTK processes Windows log data for specific malicious patterns and stores the results in Neo4j; that data is then sent to models trained in Scikit-learn. These technologies form the skeleton on which Fuchikoma’s more complicated muscles (other tech) will be built; however, like any ML model, there’s plenty of training that needs to be done!
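
As a rough picture of the first stage of such a pipeline, the sketch below turns command lines from log data into tokens and then into bag-of-words count vectors, the kind of numeric input an ML model expects. To stay self-contained it uses only the Python standard library: the regex tokenizer stands in for NLTK, the hand-rolled vectorizer stands in for a Scikit-learn vectorizer, and the Neo4j storage step is left out entirely. The function names and sample command lines are hypothetical, and nothing here is taken from Fuchikoma's real code.

```python
import re
from collections import Counter

# Stand-in tokenizer: runs of letters, digits, and underscores.
TOKEN_RE = re.compile(r"[A-Za-z0-9_]+")


def tokenize(command_line):
    """Split a command line into lowercase tokens."""
    return [token.lower() for token in TOKEN_RE.findall(command_line)]


def build_vocabulary(token_lists):
    """Map every distinct token to a column index, in sorted order."""
    distinct = sorted({token for tokens in token_lists for token in tokens})
    return {token: index for index, token in enumerate(distinct)}


def vectorize(tokens, vocabulary):
    """Turn one token list into a count vector over the vocabulary."""
    vector = [0] * len(vocabulary)
    for token, count in Counter(tokens).items():
        if token in vocabulary:  # tokens unseen at vocabulary-build time are skipped
            vector[vocabulary[token]] = count
    return vector


# Made-up sample command lines, as they might appear in process-creation logs.
commands = [
    "powershell.exe -NoP -Enc SQBFAFgA",
    "cmd.exe /c whoami",
    "notepad.exe report.txt",
]

token_lists = [tokenize(command) for command in commands]
vocabulary = build_vocabulary(token_lists)
vectors = [vectorize(tokens, vocabulary) for tokens in token_lists]

for command, vector in zip(commands, vectors):
    print(command, "->", vector)
```

Each row has one column per vocabulary token, so the resulting list of vectors has the rows-by-features shape that Scikit-learn estimators take as input.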


In the next article in this series, we’ll explore the theoretical base model, Fuchikoma v0, and discuss the four main challenges C.K. and team faced while training Fuchikoma through its multiple iterations. Fuchikoma is only one simplified model out of the 50+ complex ML models that CyCraft’s CyCarrier platform uses to defeat APTs every day. For more information on our platform and the latest in CyCraft security news, follow us on Facebook, LinkedIn, and our website at CybotsAi.