Tuesday, November 17, 2015

If There Were One Feature I Wish Bugzilla Had

Commentary::Performance
Audience::Entry
UUID: ddf3eae9-a84d-4083-987e-a84cf2ec8aec

It would involve track records. Specifically, there is no way to know how many of the bugs you have reported against Open Source software were never assigned, and were simply closed due to End of Life (EOL). Nor is there a way to see the track record of an assignee: bugs never addressed but simply closed at EOL, bugs closed but reopened due to a broken or incomplete fix, no communication whatsoever from the assignee, etc.

I have biases which you should probably be aware of.

  1. I weight communications issues more heavily than some do, because I am usually thinking in terms of security, and taking a sane approach to disclosure. This is often about how well communication works, and about timeliness.
  2. Many bugs can fall into a security context, for reasons that may not be readily apparent. The failure of some Linux update mechanisms to alert on .rpmnew/.rpmsave files is a pet peeve, as is maintaining performance.

This is about that second point -- maintaining performance. I have another bias there -- if I didn't do security, I would do perf. It's the next most interesting problem. The high-frequency trading community is running without firewalls. Some turn SELinux off not due to usability concerns, but because characterizing perf impacts is difficult. Some run RDMA over Converged Ethernet (RoCE) for the performance pop without considering the security implications of bypassing defenses built into the kernel.

There are a lot of interesting behaviors out there, and not all are well-considered. The last thing we should be doing is making it difficult to explore the performance impacts of things we might recommend. That's a recipe for losing all credibility, and becoming part of the problem.

I've recently destroyed the lab (again), because 2016 is coming up fast, and I wanted a first cut on what the hardware budget should look like. One of the things I wanted to explore is the overhead of rapidly spawning a lot of processes. Likely the last time I would do such a thing, for a couple of reasons.

  1. Amazon AWS probably makes more financial sense than tearing up the lab, though reproducible research is a concern. But Amazon is only one vendor.
  2. 'a lot of processes' is subjective, and relevance is entirely dependent on your *aaS (Infrastructure, Platform, etc., as a Service), your existing or contemplated local/cloud/hybrid security posture, etc.
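
For a sense of what 'the overhead of rapidly spawning a lot of processes' looks like as a measurement, here is a rough sketch in Python. It is not the harness used in the lab; the function name time_spawns and the default child command are assumptions made purely for illustration. The default child is a Python interpreter that exits immediately, so interpreter start-up dominates the numbers; pass something like ["/bin/true"] as argv to get closer to bare fork/exec cost.

```python
import statistics
import subprocess
import sys
import time


def time_spawns(count, argv=None):
    """Spawn `count` short-lived child processes, one after another.

    Returns a list of wall-clock durations in seconds, one per child.
    Each duration covers process creation, the child's (trivial) run,
    and the wait for it to exit.
    """
    # Assumption for illustration: a child that starts and exits at once.
    if argv is None:
        argv = [sys.executable, "-c", "pass"]
    durations = []
    for _ in range(count):
        start = time.perf_counter()
        subprocess.run(argv, check=True)
        durations.append(time.perf_counter() - start)
    return durations


if __name__ == "__main__":
    samples = time_spawns(100)
    print(f"spawned {len(samples)} processes")
    print(f"mean   {statistics.mean(samples) * 1000:.2f} ms")
    print(f"median {statistics.median(samples) * 1000:.2f} ms")
    print(f"max    {max(samples) * 1000:.2f} ms")
```

The children run serially here, which keeps the sketch simple but says nothing about contention when many processes start at the same moment. What counts as 'a lot' remains as subjective as noted above.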

Given the large number of possible deployment scenarios in modern infrastructure, it would be really nice if even the basics of performance tools were reliable. Nicer still if one could have some confidence that 'bleeding edge' Linux distributions would give us a preview of coming attractions. Sadly, this is much less the case than it used to be.

In the case of Fedora and the rest of the Red Hat family, it goes beyond default file systems. I could go on about that, but this is about performance testing. So, well past time that I circled back to https://bugzilla.redhat.com/process_bug.cgi (login required), and spawning processes. That submission, for those who do not have a Red Hat Bugzilla account, reads as follows.

Description of problem: /usr/bin/free -s fails for floating-point and integers.

Version-Release number of selected component (if applicable):
3.3.10-5

How reproducible: Always.

Steps to Reproduce:
1. /usr/bin/free -s 1
free: seconds argument `1' failed

2. /usr/bin/free -c 1
works

3. /usr/bin/free -s 1 -c 1
free: seconds argument `1' failed

4. /usr/bin/free -c 1 -s 1
works

Actual results: As above. Verified that values of -c greater than 1 work, when -c works at all.

Expected results: Functional continuous results from /usr/bin/free, and agreement between man page and program output. From man:
-c, --count count
Display the result count times. Requires the -s option.
-s, --seconds seconds
Continuously display the result delay seconds apart. You may actually specify any floating point number for delay, usleep(3) is used for microsecond resolution delay times.

Currently, it is -s that requires -c. Which is perverse if one wants to use free as a rough and ready means of tracking memory usage as several processes are started, and the time required to do that is unknown. Nor should the order in which -c and -s are specified matter; that it does is a usability bug.

Additional info: Discovered this while attempting to use '-s 0.1', and discovering that even integers did not work.
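
Until free -s behaves, the rough and ready memory tracking described in the report can be approximated by reading /proc/meminfo directly. The following is a minimal sketch, assuming a Linux system; parse_meminfo and sample_memory are hypothetical names chosen for illustration, not part of procps. MemAvailable is only reported by reasonably recent kernels, so the sketch yields None for it where the field is absent.

```python
import time


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a {field_name: integer} dict.

    Most fields are reported in kB; a few (the HugePages counters)
    are plain counts with no unit.
    """
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def sample_memory(interval, count, path="/proc/meminfo"):
    """Yield (elapsed_seconds, MemFree, MemAvailable) `count` times.

    `interval` may be fractional, e.g. 0.1 for ten samples a second.
    """
    start = time.monotonic()
    for i in range(count):
        with open(path) as handle:
            info = parse_meminfo(handle.read())
        yield (time.monotonic() - start,
               info.get("MemFree"),
               info.get("MemAvailable"))
        if i + 1 < count:
            time.sleep(interval)


if __name__ == "__main__":
    # Roughly what 'free -s 0.1 -c 10' was meant to provide.
    for elapsed, free_kb, avail_kb in sample_memory(0.1, 10):
        print(f"{elapsed:6.2f}s  MemFree={free_kb} kB  "
              f"MemAvailable={avail_kb} kB")
```

This is a stopgap, not a replacement for free: it reports only two fields, and the interval is the sleep time between reads, so sampling drifts by however long each read takes.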