Earlier in my career I worked for a company that ran an R&D project to automate the self-defense process for a set of military systems. Defending a location from incoming missiles is difficult for several reasons. Besides the general “fog of war,” the enemy will try to come in stealthily by using materials that lower their radar signature, or to overwhelm defenses by jamming radar and firing multiple missiles that approach from different directions. Personnel must detect the threat, identify the type of incoming weapon, and determine a self-defense solution in a high-stress environment where time is very limited.
The incumbent R&D team was automating the process through a set of Boolean rules (If X then Y). The problem was the typical one of “rule explosion”: the rule sets became so large that it was eventually nearly impossible to determine what the system would actually do in a real environment. Not something you want if you are being fired at and must quickly make a decision to fire back.
They asked our team to come up with a better solution as a proof-of-concept. We went through the options. Neural nets were out because there was (thankfully) very little training data from actual missile attacks. We settled on a Bayesian approach for the detection part of the solution. It allowed us to develop an “Expert System” that leveraged the knowledge of experienced military officers who had been in that position and could help us develop the probabilistic parameters. Even so, the project would never really be finished. The enemy is constantly evolving weapons and tactics to thwart defenses, so adjustments are constantly needed.
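To make the Bayesian idea concrete, here is a minimal sketch of how expert-supplied probabilities can be combined into a single threat estimate. It is not the system we built: the indicator names, the prior, and every probability below are hypothetical, and the indicators are assumed to be conditionally independent (a naive Bayes simplification).

```python
def posterior_threat(prior, observations, likelihoods):
    """Update P(threat) from binary indicators using Bayes' rule in odds form.

    likelihoods maps indicator name -> (P(indicator | threat), P(indicator | no threat)).
    Indicators are assumed conditionally independent (naive Bayes).
    """
    odds = prior / (1.0 - prior)
    for name, seen in observations.items():
        p_threat, p_benign = likelihoods[name]
        if seen:
            odds *= p_threat / p_benign
        else:
            odds *= (1.0 - p_threat) / (1.0 - p_benign)
    return odds / (1.0 + odds)


# Hypothetical values of the kind a domain expert might help estimate.
LIKELIHOODS = {
    "fast_closing_track": (0.9, 0.05),
    "low_radar_signature": (0.7, 0.2),
    "jamming_detected": (0.6, 0.1),
}
PRIOR = 0.01  # assumed base rate of a real threat

partial = posterior_threat(
    PRIOR,
    {"fast_closing_track": True, "low_radar_signature": True, "jamming_detected": False},
    LIKELIHOODS,
)
full = posterior_threat(
    PRIOR,
    {"fast_closing_track": True, "low_radar_signature": True, "jamming_detected": True},
    LIKELIHOODS,
)
print(f"two indicators: {partial:.2f}, three indicators: {full:.2f}")
```

Unlike a Boolean rule, the output is a graded belief that moves with each piece of evidence, and the parameters can be revised as tactics change without rewriting the logic.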
So, how is cyber defense similar to military defense?
- The fog of war is certainly evident in cyber defense. Networks are very noisy and generate a lot of false positives.
- Cyber teams must identify threats, identify the type of threat, and determine a defense solution in a high-stress environment where time is very limited.
- Threats will either come in stealthily by going “low and slow” or create a lot of noise through Denial of Service attacks, etc.
- Attacker tactics continually change. Defenses must constantly be adjusted.
Unfortunately, our main alerting platforms today (SIEMs) still primarily use Boolean rules, with all the disadvantages discussed above. Worse yet, most organizations use the default detection and alerting rules out-of-the-box. Anyone, attackers included, can buy the detection tools or SIEMs to determine these rules, or just get the manuals that will tell them how to avoid detection. One of the biggest hidden costs of a SOC is customizing and tuning the security tools.
Many people, including me, believe the next logical step toward improving cyber defense is using data science techniques in conjunction with cyber subject matter knowledge. Whether it be machine learning, neural nets, or other methods, we are in the early phases of a totally new way of doing security.
As with our military self-defense project, going to the next level depends on several things. First, we need a large set of training and test data that includes “true positive” attacks: real threat signatures injected at a known time and from a known location. New tools are available that can generate this simulated attack data and inject it into the operational network. This gives us training data with which our algorithms can be validated in real-world conditions.
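Because each injected attack has a known time and source, scoring a detector against it is straightforward. The sketch below is one simple way to do it, under stated assumptions: the record layout, the five-minute matching window, and the sample data are all hypothetical. Note also that an alert matching no injection is not necessarily a false positive, since it could be a real attack.

```python
from datetime import datetime, timedelta


def score_alerts(injections, alerts, window=timedelta(minutes=5)):
    """Return (precision, recall) of alerts against injected attacks.

    injections and alerts are lists of (timestamp, source) tuples. An alert
    matches an injection when the source is the same and the alert fires
    within `window` after the injection time.
    """
    detected = set()
    matched_alerts = 0
    for alert_time, alert_src in alerts:
        hit = False
        for idx, (inj_time, inj_src) in enumerate(injections):
            delay = alert_time - inj_time
            if alert_src == inj_src and timedelta(0) <= delay <= window:
                detected.add(idx)
                hit = True
        if hit:
            matched_alerts += 1
    precision = matched_alerts / len(alerts) if alerts else 0.0
    recall = len(detected) / len(injections) if injections else 0.0
    return precision, recall


# Hypothetical data: two injected attacks, three alerts raised.
t0 = datetime(2024, 1, 1, 12, 0)
injections = [(t0, "10.0.0.5"), (t0 + timedelta(hours=1), "10.0.0.9")]
alerts = [
    (t0 + timedelta(minutes=2), "10.0.0.5"),   # matches the first injection
    (t0 + timedelta(minutes=30), "10.0.0.7"),  # no matching injection
    (t0 + timedelta(hours=2), "10.0.0.9"),     # right source, outside the window
]
precision, recall = score_alerts(injections, alerts)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Tracking these two numbers over repeated injections shows whether tuning a model is actually improving detection or just shifting the noise.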
Second, we will need professionals well versed in both cyber defense and advanced analytic techniques. Security is a blend of art, skill, and technology – there is no purely “mathematical” solution. Models need to be created and optimized using domain knowledge. These are not static models. Our work will never be finished. Attacker tactics constantly change and there will be more and more game theory involved in these solutions.
We are in the early phases of integrating data science into security and have a long way to go. It is an exciting time to see how our profession will be transformed in the future.