I regularly challenge myself and others to visualize the results of their analysis, when and where the data permits it. The likes of ggplot2 enables this beautifully for R users. Then, in September 2018, gganimate hit my radar via R-bloggers and I had an epiphany.
“gganimate extends the grammar of graphics as implemented by ggplot2 to include the description of animation. It does this by providing a range of new grammar classes that can be added to the plot object in order to customize how it should change with time.”
While Thomas’s gganimate examples are intriguing, and triggered my notions for deeper visualization opportunities, they were contextually unrelated to my goals. As such, I endeavored to provide example data sets and applicability for information security and assurance analysis. As purveyors of security analysis services, my team is perpetually faced with solving problems at massive scale, yet finding intelligent, accurate answers in the sea of data. While a static visualization specific to a related analysis can be truly effective, an animated visualization, particularly a time-based graphic, can bring the art to a whole new level. A couple of points and caveats:
gganimate installation is really simple. You can grab the stable version from CRAN via
or the development version via
Note that, while working on Windows 10, I used a gganimate fork via
to overcome a Windows 10-specific bug. Installation from CRAN or the thomasp85 GitHub should be otherwise successful. I strongly suggest reading through as much of the gganimate reference guide, as a Grammar of Animated Graphics, there is some granular syntax to consume and understand here.
I selected three of Thomas’s examples and customized them for use in a security analysis context. Thomas is gganimate’s author and maintainer, for a very current review of the project’s history, current state, and road map, see gganimate has transitioned to a state of release. The project is now officially a v1.0 release. The project GitHub includes three examples:
I utilized the principles and code from each of these and applied them to three unique security-oriented scenarios, namely security incident counts over time, a cloud provider Cybersecurity Framework attestation comparison, and ten years of Security Development Lifecycle utilization.
Security Incidents Time Series
I’ll start with a simple example and concept. I’m not a big fan of security incident counts by themselves as a metric or a KPI, but they do inform trend indicators. For large service providers and operations, data of this nature can inform leadership of patterns to manage as well. This visualization compares incident counts by day of the month, over five months August through December, in parallel, as seen in Figure 1.
Figure 1: Security incidents time series
One could reach conclusions such as:
Were this real data specific to the environment you’re supporting you might adjust scheduling and staffing to account for a heavier work load at the beginning of the month, while potentially pushing scheduled time off to the middle of the month.
Cloud Provider Cybersecurity Framework (CSF) Attestation Comparison
For our second scenario, imagine you’re in the market for a cloud service provider, and you’re charged with conducting the utmost due diligence. It just so happens that The Cloud Security Alliance (CSA) Cloud Controls Matrix (CCM) is “designed to provide fundamental security principles to guide cloud vendors and to assist prospective cloud customers in assessing the overall security risk of a cloud provider. The CSA CCM provides a controls framework that gives detailed understanding of security concepts and principles that are aligned to tools including the Cybersecurity Framework.” The CSF is oriented towards the function areas Identify, Protect, Detect, Respond, and Recover. With a combination of cloud service provider data, as well as your own research, you gathered data to measure provider performance in each of the function area over the period of a year. Your data is refined to a percentage of completeness towards each of the function areas for the twelve months of the year for your final two provider candidates. The code to create this visualization follows.
Visualizing this data with gganimate for purposes of comparison thus might appear as seen in Figure 2.
Figure 2: Cloud providers CSF comparison
There’s a pretty clear conclusion to be reached with this visualization. It certainly appears that Cloud Provider 2 is the more mature of the two providers, by at least 20% per function area. A visualization of this nature for vendor comparisons of many different kinds could be very useful in making better informed decision, particularly when they’re large financial investments.
Ten Years of Security Development Lifecycle Utilization
I’m personally fond of this last example as I am both a proud advocate for the practice of a Security Development Lifecycle and a believer that this level of performance measurement granularity can and should be performed. I have to imagine mature development environments with strong code management capabilities are likely able to achieve some semblance of this scenario. The premise of the data set assumes a ten year measurement where aggregate development organizations have tracked:
Each of these are valid and important measurements and KPIs for development organizations, nor matter what product is being developed. This data set represents measurements across multiple applications, built for all major platforms (Windows, Linux, Android, iOS, Mac), over a ten year period since the organization began utilizing SDL. First, the code.
The resulting visualization warrants a bit of explanation. This size of each node (application) in the five major platform panes panes represents is indicative of the size of the application’s code base. The x axis represents the number of bugs filed, and the y axis represents the number of regressions introduced, as seen in Figure 3.
Figure 3: Ten Years of SDL
A few observations:
While again, this is artificial, manipulated data, I tried to cook it in such a manner as to produce likely outcomes that would be well observed with animated visualizations over time.
Each of these scripts and data sets are available for you on my GitHub, as is a Jupyter Notebook.
I’d love to see what you come up with, please share them with me via social media, @holisticinfosec or email, russ at holisticinfosec dot io.
Cheers…until next time.
Jan 9th 2019
1 week ago