
Let’s stop measuring risk

Ok, I don’t quite mean that. What I mean is: let’s stop using residual risk as the final product of the risk measurement calculation. Let’s consider a more pragmatic formula. This is going to seem sacrilegious to NIST, and the VERIS guys will probably just think I am being quaint, but I am serious. I think we’ve over-complicated something that could be simpler. And I’m going to make it all more complicated before I finally simplify it below. Bear with me.

Consider the limitations of the current risk calculation formula (“typically a function of the degree of harm and likelihood of harm occurring”, NIST SP 800-30 Rev. 1, Guide for Conducting Risk Assessments):

Likelihood. It is hard to quantify the chances of something happening without significant amounts of relevant data, so many people are left with industry averages—if that. I just want to quote one of my earlier posts: “…the risk of being invaded by predatory space aliens.  Hollywood has provided us with lots of illustrations of the impact of alien invasion.  The Day the Earth Stood Still (both of them), War of the Worlds (both of them), Independence Day, Signs, Skyline, etc.   These outlandish examples are to illustrate that while we can describe impact fairly well—in graphic terms even— we are not nearly as good at describing likelihood. And that is where we lose our audience in talking about risk.”

Impact. It is often binary, especially with Healthcare and Financial data. Any breach outside specific exceptional circumstances is always more impact than the Enterprise wants to absorb, isn’t it? And how about the impact of a fire in your datacenter? Isn’t, for example, the likelihood of a fire in your datacenter completely irrelevant to why you have fire alarms and chemical fire suppressant in there? In Healthcare, we describe the Enterprise’s tolerance for certain things with the phrase: “a never should happen event”—even though the event can and does happen. Breaches and fires fall into that category. In Healthcare, when it comes to Electronic Protected Health Information, you either encrypt the data at the proper strength (impact = 0) or you protect unencrypted data, and even so the impact of a breach = “too much”. Credit Card Information with the mag stripe data? Social Security Numbers with birthdates? Does anyone want to split hairs over the “impact” of losing 10,000 Social Security Numbers? When a NASA laptop went missing with 10,000 employee records on it, no one from NASA administration said “when we chose not to encrypt every laptop, we did so accepting the risk that this could happen”. It’s just bad when it does happen—since no one ever announces they have accepted the impact, let’s agree that no one accepted the risk.

Inherent risk. It is an illusion that can be reduced to data classification or mission criticality masked as something else. Public data? Well, the inherent risk is 0 because there is no impact to losing it. Restricted data, such as the keys to fobs used for two-factor authentication? The inherent risk cannot be quantified—it’s just really big. Mission critical operation? Then the inherent risk is, well, critical. You get the idea.

The equation. We will look at this in more detail, but for now, consider whether controls really change inherent risk into residual risk, or whether they instead influence impact and/or likelihood (hold that thought). And, if it’s an equation, what’s the symbol for that influence? Even when impact and likelihood are quantified, the equation only has the illusion of accuracy. How do you take a likelihood of 1:100 and an impact of $10,000 per event and say that the risk is the total number of events (n) divided by 100, multiplied by $10,000 ((n/100)*10000)? Oh, that’s right, you can do that if you fudge both reputation risk and the indirect risks of additional scrutiny by those that govern you or those that govern your industry if your actual number of negative events is too high. Or if you’ve already created a proxy number for these qualitative risks and baked them into the $10,000. Or if you’re an actuary pricing cyber-liability coverage.
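To make that illusion of accuracy concrete, here is a minimal sketch in Python of the arithmetic I just described. The numbers are the made-up ones from above, and the reputation_fudge knob is purely hypothetical, my shorthand for baking qualitative harms into the dollar figure:

# A minimal sketch of the quantified residual-risk arithmetic described above.
# All numbers are illustrative; nothing here comes from real loss data.

def residual_risk_dollars(events_per_year: float,
                          likelihood: float = 1 / 100,
                          impact_per_event: float = 10_000.0,
                          reputation_fudge: float = 1.0) -> float:
    """(n/100) * $10,000, optionally 'fudged' to fold in qualitative
    harms like reputation damage and regulatory scrutiny."""
    return events_per_year * likelihood * impact_per_event * reputation_fudge

print(residual_risk_dollars(500))                        # 50000.0 -- looks precise...
print(residual_risk_dollars(500, reputation_fudge=3.0))  # 150000.0 -- ...until you fudge it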

In other words, from the point of view of anyone except the cyber-liability carrier, information security risk is not like financial risk. It is only ever partially quantitative, even when you find a way to quantify both impact and likelihood, unless you have a lot of data for likelihood and the ONLY impact is truly quantifiable (e.g., financial). And whenever one of the two cannot be quantified, coming up with a proxy and multiplying the two is at least something of a fiction. What’s more, the whole exercise of fully quantifying is only important if you need to have residual risk as your final product. In other words, if you sell cyber-liability insurance, you will most certainly quantify the risk. If not, then the approximations required to quantify risk may not be all that useful.

So what if we took risk out of the equation? Granted, we still need to produce an output that can be presented to the Executive leadership of an organization and that will allow them to make decisions for the organization. We need something that is the equivalent of “accepting risk”. We can accomplish this. To illustrate how, I will remove the formula’s focus on probability and instead make it an “abstract algebra” problem. And I admit, I enjoyed doing this.

Here are formulas for calculating residual risk:

irsk = (rL * rh)

where irsk = inherent risk

rL = real likelihood

rh = real harm in the absence of mitigating controls

real = within the realm of the possible

* is something like multiplication, but according to the NIST formula it is a “function” that combines “degree of harm and likelihood of harm occurring”

rrsk = irsk ǣ mitc

where rrsk = residual risk

ǣ is a symbol I appropriated to stand for the function of the influence of mitigating controls on risk (“always effect”)

mitc = mitigating controls

the formula to define a mitigating control is:

mitc = effc[c]

where effc = effectiveness of the control

c = a control without considering its effectiveness

[ ] is a way of indicating the combination of describing something with its effectiveness

First let’s remove inherent risk (irsk) from the formula using well-known rules of algebra: substituting (rL * rh) for irsk and distributing ǣ across * gives:

rrsk = (rL ǣ (effc[c])) * (rh ǣ (effc[c]))

Since not everyone defines inherent risk in their analysis, the formula directly above is what we normally consider to be the heart of risk assessment. The next step is where we get to take advantage of the fact that * and ǣ are not really arithmetic operations like multiplication, division, subtraction and addition, and that putting something in brackets [ ] is a shortcut. I am going to say that the way to remove ǣ from the equation is to break mitc up: to separate what the control is (c) from how effective it is (effc), removing the shortcut. If we take residual risk out of the formula, we get:

effc = (c * (rL + rh))

We’ve done a few things here. First of all, we’ve removed the idea that we MUST figure out how likelihood and harm interact. Just add them together[1]. If something is too big an impact and still within the realm of the possible (like a major fire in our datacenter), then we tend to ignore the likelihood; and if neither likelihood nor harm is in the realm of what we consider possible (alien invasion is the example I use), then the equation resolves to what we can define as zero in this algebra.
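Here is a minimal sketch in Python of this little algebra. To make it computable I have invented a 0-to-1 scoring scale and a real() helper of my own; none of that is from NIST, it is just to show how the out-of-the-realm cases resolve to zero:

# A minimal sketch of effc = c * (rL + rh) on an invented 0-to-1 scale.
# Anything outside the realm of the possible contributes zero, which is
# what makes alien invasion drop out of the algebra entirely.

def real(value: float, possible: bool) -> float:
    """'real' = within the realm of the possible; otherwise it counts as zero."""
    return value if possible else 0.0

def effc(c: float, rL: float, rh: float) -> float:
    """Effectiveness of a control against a real likelihood and real harm.
    Here c is a crude stand-in score for the control itself (my
    simplification: 1.0 for an appropriate control in place, 0.0 for none)."""
    return c * (rL + rh)

# Datacenter fire: low likelihood, huge harm, suppression controls in place.
fire = effc(c=1.0, rL=real(0.05, possible=True), rh=real(1.0, possible=True))

# Alien invasion: vivid harm, but nothing here is in the realm of the possible.
aliens = effc(c=1.0, rL=real(0.9, possible=False), rh=real(1.0, possible=False))

print(fire)    # 1.05 -- worth a conversation about control effectiveness
print(aliens)  # 0.0  -- resolves to zero, as argued above

# Per footnote [1]: if you genuinely could quantify both terms, you could
# multiply instead of add (c * (rL * rh)) and the definition still works.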

In addition, we’ve focused the equation on the most actionable part of risk management: how effective are your appropriate controls.

So what you present to Executive leadership is the effectiveness of the appropriate controls. To use the example I’ve been using above, here’s what you present: “The likelihood of a fire in the datacenter is not that great. We’ve been operating it for 20 years and never had a fire. But the harm it would do to our infrastructure and the cost of restoring it are huge. So we have put controls in place that include automatic smoke and heat detectors and a chemical fire suppressant system. We have had a third-party expert in fire suppression solutions evaluate our controls, and they consider these to be effective controls.” That is a serious presentation of facts without using the word “risk”. And it represents an assessment that Executive leadership can choose to accept or reject.

Appropriateness is not part of the equation, although I introduced it in the last two paragraphs.  Like the evaluation of whether or not a likelihood or harm is “real”, i.e. in the realm of possibility, appropriateness is purely subjective.  For example, if someone presented “anti-virus software on workstations” as a mitigating control for the harm of having a fire in the data center, I think we can agree to reject it from our equation as inappropriate.

Which means that you still need to locate what you need to protect and where it is.

By focusing on the effectiveness of controls, you present Executive leadership not just with a more actionable picture of how their environment is protected, but with choices that make sense at the Executive level. I’ll present two examples and then conclude this blog.

The NASA laptop. It was unencrypted and went missing after being taken from a NASA facility with personnel records for ~10,000 NASA employees. Because the laptop had sensitive data on it, the employee violated internal information security policy by removing it from the premises. I discuss this in an earlier blog (Adequately?). We know that losing employee data was considered to be a high impact event for them, and we know that they considered their policy against employees taking unencrypted mobile devices with sensitive data on them out of their office to be an adequate control to keep it from happening. Looking at this from the point of view of risk management illustrates why risk management should not have a statement of risk as its end product. Hypothetically, the statement of residual risk to NASA administration would have been “The risk of losing sensitive employee data is low”. This assessment would have been based not on the impact, which they had assessed as high, but on likelihood: the risk presented to management is “low” because it is against policy to take the information, unencrypted, out of the facility, and the likelihood of someone violating policy was hard for NASA risk assessors to imagine. Remember, even after the laptop went missing, NASA said the policy alone was an “adequate” control.

The effectiveness of the control was what NASA administration should have been asked to sign off on. Executives can be expected to accept a risk measured as “low”, but would they have signed off on people never making the mistake of taking sensitive data out of their facility on an unencrypted laptop just because a policy says they can’t? This is an organization that crashed a $125 million spacecraft into Mars because “one engineering team used metric units while another used English units for a key spacecraft operation” (CNN, September 30, 1999). In other words, they are no strangers to human error. No one can say for sure what conversations took place at NASA, but regardless, this is a good example of where “residual risk” says a lot less than “effectiveness of control” as a discussion point with Senior leadership.

Malware. Malware is another example of a potentially high impact threat. It is also a threat that comes in so many degrees of impact that it would be difficult to determine a single residual risk. And as for likelihood, it depends on whether we are discussing a new strain of malware or one that has been around for a while. The logs from an Enterprise-wide anti-malware solution that is used as its organization’s primary defense usually show a regular stream of exploits identified and cleaned. Add in spam filters, IPS and any other preventive controls with which you are hoping to keep malware out of your environment (or keep it from spreading).

The residual risk formula looks something like this:

rrsk = (“happens all the time” ǣ (effc[c])) * (“ranges from annoyance to catastrophic” ǣ (effc[c]))

where effc = “we count on the controls working roughly X% of the time”

I am not trying to play straight into the hands of anti-malware vendors, but I would argue that a conversation about the effectiveness of your controls, i.e. what percent of the time they catch the malware, has far more value than a discussion of residual risk.
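If you want a number for that conversation, here is a minimal sketch of the kind of effectiveness figure I mean. The log counts are invented for illustration; real ones would come from your own spam filter, IPS and endpoint logs:

# A minimal sketch of turning preventive-control logs into an effectiveness
# figure. All counts are invented; substitute your own log data.

blocked_by_spam_filter = 9_200        # hypothetical monthly counts
blocked_by_ips = 450
cleaned_on_endpoint = 310
confirmed_infections_missed = 40      # incidents the controls did not stop

caught = blocked_by_spam_filter + blocked_by_ips + cleaned_on_endpoint
total_observed = caught + confirmed_infections_missed

effc = caught / total_observed
print(f"Controls caught roughly {effc:.1%} of observed malware")  # ~99.6%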

There are exceptions, of course.  There are some times when taking the term “risk” out of risk management does not work.

Regulatory compliance may require that you produce output with high, medium and low residual risk cataloged into those kinds of buckets.  Compliance with such regulation may dictate that your Executives know how to discuss and accept risks that are described as high, medium and low.

Completely new environments or environments where the previous risk assessments are suspected to be unreliable need to be evaluated for the effectiveness of the controls.  Depending on what is discovered, it might be worthwhile to describe risk in such a way that makes it clear that there is something to worry about.  The phrase “unacceptably high risk” does that nicely.

In other words, when inherent risk is at the extremes and the controls are completely ineffective, then it is worth discussing risk directly. Truly slight-risk activities should be called out so that resources are not wasted on strengthening controls. And when the risk assessor comes upon a situation with high inherent risk and no effective mitigating controls? I believe the security professional is paid to jump up and down and scream wildly in those situations.

To sum up: unless you sell cyber-liability insurance, descriptions of risk should be just that, descriptions.  The distinction between residual and inherent risk is worthless because what matters is real risk.  And evaluating the effectiveness of your controls is a quicker route to developing an action plan than trying to figure out which is more important to lower: a medium risk which is the result of a low-likelihood/high-impact calculation or a medium risk based on a medium-likelihood/medium-impact calculation.  The risk assessment exercise should result in action plans and adjectives should not be the end product of formulas.


[1] In reality this equation does not preclude quantifying likelihood and/or harm. If you really could quantify them sufficiently to multiply them together, you still could. Just replace the + sign with the * and it still works for defining effc.

2 thoughts on “Let’s stop measuring risk”

  1. David – I enjoyed your article. I always question people who “accept” a $1 million risk but have $5K signing authority. Who gives them that right, and do they tell senior management what risks they have “accepted”?

  2. You’re right, Jim. Regardless of how you measure risk, it is essential that the resolution of any analysis that falls short of eliminating the possibility of the adverse impact be either an acceptance of the status quo or a commitment to an action plan. And that has to happen at the appropriate level.
