Friday, September 13, 2013

The Risk of Temporary Systems

Here is an example of the classic 'temporary system' problem, which I have seen in various forms, up to and including a rogue server that some developers installed on the DMZ. The scenario:

  • Client installs new servers at a low rate--never more than 10 per year
  • Client has an incoming QA procedure that involves a burn-in  (Yay!)
  • Client burn-in procedure is to install an ancient OS and some scripts that were written years ago, loaded via optical drive (Boo!)
  • Client has no subnet for this, just some reserved IP addresses (Boo!)
  • Client just lets systems "soak for a while" (Boo!)
  • Client discovers a temporary system is compromised (Yay!)
  • Me gets money (Yay!)

The Yay count is three, though "Me gets money" probably wouldn't count from the client's perspective, so let's toss that one and call it two. The Boo count is a definite three.

What Went Right

  1. They were sharp enough (or had been burned enough) not to just rack up a new system and place it straight into production. Infant mortality is real; look at Google's published work on disk failure rates if you don't think that is an issue. (A rough sketch of a burn-in check follows this list.)
  2. They spotted the compromise much faster than average. It's not uncommon for compromised systems to remain undetected for months, so that is a huge Yay!
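
As a rough illustration of what a scripted burn-in can look like, here is a minimal Python sketch that repeatedly writes and reads back a checksummed file over a soak window. The scratch path, chunk size, and one-day soak period are assumptions for illustration, not the client's actual procedure.

    # Minimal burn-in sketch: repeatedly write and read back a checksummed
    # test file to exercise a new disk during its soak period. The scratch
    # path, chunk size, and soak window below are illustrative assumptions.
    import hashlib
    import os
    import time

    TEST_PATH = "/var/tmp/burnin.dat"    # hypothetical scratch location
    CHUNK = os.urandom(4 * 1024 * 1024)  # 4 MiB of random data per pass
    SOAK_SECONDS = 24 * 3600             # a one-day soak; pick your own window

    def one_pass() -> bool:
        """Write the chunk, force it to disk, read it back, compare digests."""
        expected = hashlib.sha256(CHUNK).hexdigest()
        with open(TEST_PATH, "wb") as f:
            f.write(CHUNK)
            f.flush()
            os.fsync(f.fileno())
        with open(TEST_PATH, "rb") as f:
            actual = hashlib.sha256(f.read()).hexdigest()
        return actual == expected

    deadline = time.time() + SOAK_SECONDS
    passes = failures = 0
    while time.time() < deadline:
        passes += 1
        if not one_pass():
            failures += 1
            print(f"pass {passes}: checksum mismatch")
        time.sleep(60)  # pace the passes rather than hammering the disk

    print(f"burn-in complete: {passes} passes, {failures} failures")

In real life you would also want to watch SMART data, memory, and logs, but even a crude scripted soak beats an undocumented "let it sit for a while".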

What Went Wrong

  1. Ancient code. The install rate was low enough that they saw little benefit in modernizing the procedure, even though it was manual and therefore expensive. The irony is that they could have used provisioning/patching automation I'd already built for them, and this would never have happened.
  2. Improper subnetting. Ancient code, if it must run at all, should be partitioned away, particularly when it is just admin tooling rather than a creaky old legacy business system that nevertheless has to be widely reachable despite the risk. (See the subnet-check sketch after this list.)
  3. Procedures that need work. A Post-It note with a start-of-run date stuck to the front of a system is not good documentation.
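
To make the subnetting point concrete, here is a minimal sketch using Python's standard ipaddress module that flags burn-in hosts parked in production address space. The burn-in range, the production range, and the sample addresses are all invented for illustration; the client's actual addressing was, of course, different.

    # Minimal sketch of a subnet sanity check for burn-in hosts, using the
    # standard-library ipaddress module. The ranges and sample addresses
    # are invented for illustration.
    import ipaddress

    BURN_IN_NET = ipaddress.ip_network("10.99.0.0/24")        # assumed burn-in subnet
    PRODUCTION_NETS = [ipaddress.ip_network("10.0.0.0/16")]   # assumed production space

    def check_host(addr: str) -> str:
        ip = ipaddress.ip_address(addr)
        if ip in BURN_IN_NET:
            return "ok: inside the burn-in subnet"
        if any(ip in net for net in PRODUCTION_NETS):
            return "BAD: burn-in host parked in production address space"
        return "warn: address not in any known range"

    for host in ("10.99.0.17", "10.0.3.42"):   # example addresses only
        print(host, "->", check_host(host))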

Lessons

Most of this should be obvious from What Went Wrong. But it is worth stressing that countless problems are caused by organizations not being fully aware of what systems (and what security posture) are running behind the corporate firewall, or even knowing fully what Internet connections exist. The thought of the odd T3 connection being forgotten may seem strange to some, but it happens routinely in large organizations.

'Temporary' resources will become a larger problem than they are today. Virtualization, software-defined everything, and the power of modern provisioning systems ensure this. Compute and storage nodes can be spun up with a few clicks of a mouse, and interconnected with a few more. The economic imperative is obvious.

There are things the security community needs to work on, as always, but I would argue that the most important thing organizations can do is embrace continuous audit.
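
One minimal form of continuous audit is simply comparing, on a schedule, what is actually alive on the network against an approved inventory. Here is a rough Python sketch of that idea; the inventory file name, the swept subnet, and the hourly interval are invented for illustration, and it assumes nmap is available on the auditing host.

    # Minimal continuous-audit sketch: periodically compare the hosts that
    # actually answer on a subnet against an approved inventory, and flag
    # anything unexpected. Inventory file, subnet, and interval are
    # assumptions for illustration; nmap is assumed to be installed.
    import subprocess
    import time

    INVENTORY_FILE = "approved_hosts.txt"   # hypothetical: one IP per line
    AUDIT_SUBNET = "10.99.0.0/24"           # hypothetical range to sweep
    AUDIT_INTERVAL = 3600                   # hourly; tune to taste

    def load_inventory():
        with open(INVENTORY_FILE) as f:
            return {line.strip() for line in f if line.strip()}

    def discover_live_hosts():
        # nmap -sn is a ping sweep; -oG - writes greppable output to stdout.
        out = subprocess.run(
            ["nmap", "-sn", "-oG", "-", AUDIT_SUBNET],
            capture_output=True, text=True, check=True,
        ).stdout
        return {line.split()[1] for line in out.splitlines()
                if line.startswith("Host:") and "Status: Up" in line}

    while True:
        unknown = discover_live_hosts() - load_inventory()
        for host in sorted(unknown):
            print(f"ALERT: {host} is live but not in the approved inventory")
        time.sleep(AUDIT_INTERVAL)

A real deployment would reconcile against the CMDB, cover every routed segment, and feed alerts into something better than stdout, but the principle is the same: keep checking.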

