
Monday, April 20, 2015

Exploring System Data: Use Anything but bash.

Recommendation::Language
Audience::Intermediate
UUID: 4e163e7c-ec63-430e-83e2-605e9df95526

In a Gmail conversation related to changes to the Linux kernel, I asked whether anyone still used gnuplot, which appeared in the example. I asked because one of the first things you do when exploring data is look at its distribution. Duh.

Of course, I am sure that gnuplot is still in constant use. People don't scrap production systems simply because something is more fashionable. Or they shouldn't, anyway. The math is not favorable.

As a side note, I really need to make a decision about how I want to display math on this blog.

I started a project related to data analysis using some old-school techniques, all based around shells. Shells can be a win for answering questions such as, "How is this new application changing my system?" That can be important. I've seen Web application servers deployed before the location and content of log files were known, much less characterized at a level of, "What sensitive information might be written if the log level is DEBUG?"

Shells are fine for that sort of fast initial cut. The problem is that people don't want to throw that code away. They keep writing one more grep statement, or whatever. My personal alarms tend to ring at arrays. If the system becomes complex enough that I need arrays, I am going to question the wisdom of doing it in a shell.

  • They aren't POSIX, so you become wedded to one particular shell. Want to use dash instead of bash? Sorry, but you can't.
  • You can't pass arrays to functions if you need to do anything more complex than loop over them. Even for that, you are probably going to use a reference. Modify them? Sorry, but you can't.
  • Slicing an array is awkward at best; bash can at least expand a slice with ${arr[@]:offset:length}, but in most other shells, sorry, but you can't.
  • What stop-gaps exist for dealing with arrays, or for faking them when they aren't available in your shell, tend to use 'eval' (see the sketch after this list), which adds a whole new layer of potential security issues. Sorry, but you really shouldn't.
Shell arrays don't do anything more complex than map integers to strings. The exception is bash associative arrays, which are a newer, shinier, and deeper can of worms. The point is that the most advanced data structures available in shells are not really suitable for building software with any sort of complexity.
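To make those last two bullets concrete, here is a minimal sketch (the names count_args, values, and item_N are mine, purely for illustration). The first half is bash; the second is the classic eval stop-gap for a POSIX shell that has no arrays at all:

#!/bin/bash
# Passing an array to a function passes its expanded words, not
# the array itself; the function only ever sees a flat word list
# and cannot modify the caller's array.
count_args() {
    echo "$#"
}
values=(alpha "beta gamma" delta)
count_args "${values[@]}"    # prints 3

# POSIX sh has no arrays; the classic stop-gap builds variable
# names dynamically with eval -- exactly the pattern warned about
# in the last bullet. One unquoted, hostile value and you are
# executing attacker-controlled input.
i=0
for word in alpha beta gamma; do
    eval "item_$i=\$word"
    i=$((i + 1))
done
eval "echo \$item_1"         # prints: beta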

I pretty much won't start down that slippery path anymore, and I hope that you won't either. I tend to use Python. If you prefer Ruby, have fun. It's just too slow for me, but it's also widely used in the security community, including in the Metasploit framework.

There is value in knowing pretty much any language, especially in the security field, if for no other reason than to know how problems with it can be exploited. That is not an argument for falling into the same trap: eval'ing something and wedding yourself to sanitization problems because you pushed the language too far.

2015-04-23 Addendum

The power of the shell is seductive. I still use it all the time. Moments ago, on a Linux machine:

# ls -lh /var/log/secure
-rw-------. 1 root root 8.3M Apr 23 13:24 /var/log/secure
# wc -l /var/log/secure
69397 /var/log/secure
# grep hddtemp /var/log/secure | wc -l
69319
#

But this is not a mechanism for monitoring log growth. I can immediately see the log file size, that the (non-SELinux) permissions are correct, and that this log is almost entirely about monitoring a drive temperature.

The problems will surface when I try to reuse these commands: adding -Z to /usr/bin/ls to show the SELinux context, finding lines that aren't about hddtemp, etc. In scripts, to start with, you should not parse the output of /usr/bin/ls. stat(1) is your friend (and don't forget to supply, and appropriately quote, a format string).
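For instance, a minimal sketch using GNU coreutils stat; the format string here is just one I find useful, and %C needs a coreutils built with SELinux support:

# Print size, owner, group, octal mode, SELinux context, and name,
# with no ls output to parse. Note the quoted format string.
stat -c '%s %U %G %a %C %n' /var/log/secure
# Sample output (yours will differ):
# 8704123 root root 600 system_u:object_r:var_log_t:s0 /var/log/secure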

The shell gives you a lot of power to spot-check things. Leave it at that, and save yourself some grief.

Thursday, June 12, 2014

Linux Shells Just Keep Bugging Me

A couple of months ago, Dan Farmer (who is definitely someone that technically-oriented security people should pay attention to) put up a one-liner to generate 128 bits of pseudo-randomness, via a shell.

Unfortunately, single quotes are showing up as back-ticks in that post, so cut-and-paste 'programmers' had a bit of debugging to do. Hopefully Blogger will do a bit better (update: yes it does):

dd if=/dev/urandom bs=16 count=1 2>/dev/null | hexdump | awk '{$1=""; printf("%s", $0)}' | sed 's/ //g'

My first reaction was horror; sed and awk are both Turing-complete programming languages, and a bit on the large side. However, large systems written in a shell will likely use both, in which case they will likely already be resident in memory. Unless your system is swapping madly, in which case you have other problems. So, even running that code in a tight loop may not have as much effect as one might think.

There are things we might do to swap in lighter-weight utilities, and get more readable code as well:

dd if=/dev/urandom bs=16 count=1 2>/dev/null | hexdump | head -n 1 | cut -d ' ' -f 2- | tr -d ' '

In a shell pipeline, each command runs in its own process, and that is an expense. My version costs us one more process (head, cut, and tr in place of awk and sed). So, is it a win? I have no clue, and getting a clue is not an easy thing to do in shells. Variants of this code fail to give valid results using either the bash builtin 'time' or /usr/bin/time. Putting it into a script, and timing that, does not really help.

It is complicated. Have a look at https://stackoverflow.com/questions/5014823/how-to-profile-a-bash-shell-script. Note that there is no mention there of averaging over many runs, which is fundamental.
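A crude workaround, as a minimal sketch (the run count of 100 is my arbitrary choice): time many runs as a whole and divide by the count. That averages out per-run noise, even though it cannot separate the pipeline's cost from the loop's own overhead:

# bash: 'time' is a shell keyword here, so it can time the entire
# loop; divide the reported times by N for a per-run average.
N=100
time for i in $(seq "$N"); do
    dd if=/dev/urandom bs=16 count=1 2>/dev/null |
        hexdump | head -n 1 | cut -d ' ' -f 2- | tr -d ' ' > /dev/null
done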

Shell code is pretty much the last thing you should use to write anything, if you have security in mind. I've had to do it because managers demanded it, probably because they read an article about POSIX from ten or fifteen years ago. But the security argument has seldom worked in the past, so let's mount the profiling argument. It is entirely valid.

Shells are tremendously useful for things like the initial exploration of log files, and setting up execution environments*. Which means that knowing the native shell is important. And there are things I still need to look at; I've just installed the dash shell. But so far, my take is that if you are using a famously slow and error-prone tool to write production (vice exploratory) code, and have no good means of profiling that code, You Are Doing It Wrong.

* Case in point:

file $(which firefox)
/usr/bin/firefox: Bourne-Again shell script, ASCII text executable