In a Gmail conversation about changes to the Linux kernel, I asked whether anyone still used gnuplot, which had been used in the example — because one of the first things you do when exploring data is to look at the distribution. Duh.
Of course, I am sure that gnuplot is still in constant use. People don't scrap production systems simply because something is more fashionable. Or they shouldn't, anyway. The math is not favorable.
As a side note, I really need to make a decision about how I want to display math on this blog.
I started a project related to data analysis using some old-school techniques, all based around shells. Shells can be a win for answering questions such as, "How is this new application changing my system?" That can be important. I've seen Web application servers deployed before the location and content of their log files was known, much less characterized at the level of, "What sensitive information might be written if the log level is DEBUG?"
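That first cut can be a handful of find and grep commands. A minimal sketch, run here against a scratch directory standing in for the application's (as yet undiscovered) log location:

```shell
# Scratch directory as a stand-in for the app's real log location,
# which you would have to discover first.
logdir=$(mktemp -d)
printf 'DEBUG user=alice password=hunter2\n' > "$logdir/app.log"

# Which files has the application written recently?
recent=$(find "$logdir" -type f -mmin -10)

# Does anything written at DEBUG level leak credentials?
leaky=$(grep -rl 'password' "$logdir")

echo "$recent"
echo "$leaky"
rm -rf "$logdir"
```

Nothing here survives the terminal session, which is exactly the point: a quick answer, not a monitoring system.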
Shells are fine for that sort of fast initial cut. The problem is that people don't want to throw that code away. They keep writing one more grep statement, or whatever. My personal alarms tend to ring at arrays. If the system becomes complex enough that I need arrays, I am going to question the wisdom of doing it in a shell.
- Arrays aren't POSIX, so you become wedded to one particular shell. Want to use dash instead of bash? Sorry, but you can't.
- You can't pass arrays to functions if you need to do anything more complex than loop over them. Even for that, you are probably going to use a reference. Modify them? Sorry, but you can't.
- Slicing is spotty: bash has ${arr[@]:offset:length}, but that syntax is bash-specific. In a portable script? Sorry, but you can't.
- The stop-gaps for dealing with arrays, or for faking them when your shell doesn't have them, tend to rely on 'eval', which adds a whole new layer of potential security issues. Sorry, but you really shouldn't.
Shell arrays don't do anything more complex than map integers to strings. Except in the case of bash associative arrays, which are a newer, shinier, and deeper can of worms. The point is that the most advanced data structures available in shells are not really suitable for building software with any sort of complexity.
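To make the limitations concrete, here is a bash-only sketch — none of it is POSIX sh, dash rejects the array syntax outright, and the nameref trick needs bash 4.3 or later:

```shell
#!/bin/bash
files=(/var/log/secure /var/log/messages /var/log/cron)

# Slicing exists, but only as a bashism: ${arr[@]:offset:length}
first_two="${files[*]:0:2}"

# You cannot pass the array itself to a function; the closest you get
# is passing its *name* and binding a nameref (bash >= 4.3).
count_entries() {
    local -n arr=$1   # arr now aliases the caller's array
    echo "${#arr[@]}"
}
count_entries files   # prints 3
```

Every line of this is tied to one shell and one version range — which is the whole argument for reaching for a real language instead.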
I pretty much won't start down that slippery path anymore, and I hope that you won't either. I tend to use Python. If you prefer Ruby, have fun; it's too slow for my taste, but it's widely used in the security community, including in the Metasploit framework.
There is value in knowing pretty much any language, especially in the security field, if for no other reason than to know how problems with it can be exploited. That is not an argument for falling into the same trap: eval'ing something, and wedding yourself to its sanitization problems, because you pushed the language too far.
The power of the shell is seductive. I still use it all the time. Moments ago, on a Linux machine:
# ls -lh /var/log/secure
-rw-------. 1 root root 8.3M Apr 23 13:24 /var/log/secure
# wc -l /var/log/secure
# grep hddtemp /var/log/secure | wc -l
I can immediately see the log file's size, that its (non-SELinux) permissions are correct, and that this one is mostly about monitoring a drive temperature. But this is not a mechanism for monitoring log growth.
The problems surface when I try to reuse these commands: adding -Z to /usr/bin/ls to show the SELinux context, finding the lines that aren't about hddtemp, and so on. In scripts, to start with, you should not parse the output of /usr/bin/ls. stat(1) is your friend (and don't forget to supply, and appropriately quote, a format string).
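For the script case, a sketch of what stat(1) buys you, using GNU coreutils format codes (BSD and macOS stat take -f and different codes), demonstrated against a temporary file rather than a real log:

```shell
# Temporary file standing in for the log you actually care about.
tmpfile=$(mktemp)
printf 'hello\n' > "$tmpfile"
chmod 600 "$tmpfile"

# Quote the format string: %s = size in bytes, %a = octal permission
# bits (GNU stat; BSD stat uses -f with other format codes).
size=$(stat -c '%s' "$tmpfile")
mode=$(stat -c '%a' "$tmpfile")
echo "$size $mode"   # prints: 6 600

rm -f "$tmpfile"
```

No columns to count, no whitespace to split, nothing that breaks when a filename contains a space.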
The shell gives you a lot of power to spot-check things. Leave it at that, and save yourself some grief.