A supervised learning approach to Living off the Land attack classification from Adobe SI
Happy Holidays, readers!
Their LotL Classifier includes two components: feature extraction and an ML classifier algorithm. Again, read their post on these components, but I do want to focus a bit on their use of the random forest classifier for this project. As LotL Classifier is written in Python the project utilizes the sklearn.ensemble.RandomForestClassifier class from scikit-learn, simple and efficient tools for predictive data analysis and machine learning in Python. Caie, Dimitriou and Arandjelovic (2021), in their contribution to the book Artificial Intelligence and Deep Learning in Pathology, state that random forest classifiers are part of the broad umbrella of ensemble-based learning methods, are simple to implement, fast in operation, and are successful in a variety of domains. The random forest approach makes use of the construction of many “simple” decision trees during the training stage, and the majority vote (mode) across them in the classification stage (Caie et al., 2021). Of particular benefit, this voting strategy corrects for the undesirable tendency of decision trees to overfit training data (Caie et al., 2021). Cotaie, Boros, Vikramjeet, and Malik, the Living off the Land Classifier authors, found that, though they used a variety of different classifiers during testing, their best results in terms of accuracy and speed were achieved using the RandomForest classifier. This was driven in large part due to their use of test data representative of “real world” situations during the training stage.
Figure 1: Jupyter notebook: LotL reverse shells, filed uploads, coin miners
You can experiment with this notebook for yourselves, via GitHub.
Let’s explore results. Findings are scored GOOD, NEUTRAL, or BAD; I intentionally selected LotL strings that would be scored as BAD. Per the author’s use of feature extraction coupled with secondary validation courtesy of the BiLingual Evaluation Understudy (BLEU) metric, consider the results of our Linux reverse shell examples as seen in Figure 2.
Figure 2: LotL reverse shells results
The authors employ labels for the same class of features, including binaries, keywords, patterns, paths, networks, and similarity, followed by a BLEU score to express the functional similarity of two command lines that share common patterns in the parameters. As a result, GTFObins examples for Netcat and bash are scored BAD, along with Gimp, but are further bestowed with LOOKS_LIKE_KNOWN_LOL. Indeed it does. Note the model triggering on numerous keywords, commands, and paths.
Figure 3: LotL coin miner results
Again, we note keyword and command matches, and the full treatment for the regsvr32 example.
The Adobe SI crew’s work has piqued my interest such that I intend to explore their One Stop Anomaly Shop (OSAS) next. From their research paper, A Principled Approach to Enriching Security-related Data for Running Processes through Statistics and Natural Language Processing, Boros et al. (2021) propose a different approach to cloud-based anomaly detection in running processes:
I’ll certainly have opportunities to test this framework at scale; if findings are positive, I’ll share results here.
Caie P., Dimitriou N., Arandjelovic O., Chapter 8 - Precision medicine in digital pathology via image analysis and machine learning, Editor(s): Stanley Cohen, Artificial Intelligence and Deep Learning in Pathology, Elsevier, 2021, Pages 149-173, ISBN 9780323675383, https://doi.org/10.1016/B978-0-323-67538-3.00008-7.
Dec 28th 2021
|Thread locked Subscribe||
Dec 28th 2021
9 months ago