Wednesday, February 25, 2015

Quality Time With Databases


Unusually for me, I have spent the past couple of weeks swimming in a database sea. Specifically PostgreSQL 9.3: the relational database that deserves a lot more press than it has ever received.

I lay the blame for that at the feet of Tim O'Reilly and early Web developers. We are going back to the mid 1990s here. Many local and regional printing companies thought they would own the Web development business, because the predominant commercial Web sites of the time, in any market segment, were static: divided roughly equally between online brochures and catalogues. That was what printers did, so they expected to rule by a rather natural segue, powered largely by having graphic artists, etc., on staff.

The drive for interactive sites was just beginning, with Java developers trying to stuff huge files down pipes defined by a bad-ass 56K modem on a dial-up connection. Javascript was launched on the theory that riding the name recognition of that horrible Java interactivity mechanism was a Good Thing, though Java and Javascript are entirely unrelated languages.

It turned out that having graphic artists on staff wasn't enough. Interactivity demanded running code (both server-side and client-side), not just HTML markup. A server-side database became a basic requirement, and it turned out that old-school graphic artists were not all that knowledgeable about coding and database administration.

Into that breach stepped Tim O'Reilly, and his introduction of the term 'LAMP stack'. Linux, Apache, MySQL, PHP. This hugely boosted the popularity of both Linux and the Apache Web server. It was free! ISPs could offer it on the cheap!

One downside to this idea was MySQL, which achieved that vital initial surge in popularity simply because some graphic artist, likely still affiliated with a printer, or some random person associated with an equally random ISP, could actually make it run. Sort of; it was buggy and primitive.

It stayed that way for a long time. Not ACID-compliant, horrible SQL standards compliance, didn't reliably enforce constraints, etc. A seemingly unending list of serious issues. But hey, it was easy to get it running! PostgreSQL was also available, and has historically been far superior, but it demanded more care and feeding, as serious databases are wont to do. The horrible name helped not at all.
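To make the constraint point concrete, here is a minimal sketch (the table and column names are my own invention, purely for illustration). PostgreSQL has always enforced CHECK constraints; MySQL of that era parsed the clause and then silently ignored it:

```sql
-- A table with a CHECK constraint on the balance.
CREATE TABLE accounts (
    id      integer PRIMARY KEY,
    balance numeric NOT NULL CHECK (balance >= 0)
);

-- PostgreSQL rejects this row with a constraint violation.
-- Old MySQL would happily accept it, CHECK clause notwithstanding.
INSERT INTO accounts (id, balance) VALUES (1, -100);
```

That is the difference between a database that guards your data's integrity and one that merely stores what it is handed.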

I, OTOH, had a need for reliability, was already operating in an environment that used Linux and Apache (though not PHP -- a subject for another post) and had no problem with the systems administration aspect. To this day, in any project that needs a hugely competent database, I default to PostgreSQL. Is it perfect? No. I have never seen perfect software, and do not expect to.

Enough History. What Am I Doing?

I currently have five projects in-work that require a database back-end. I am planning to migrate all of them to at least 9.2. It is possible that in one or more cases, postgres will not meet the need. In which case, it will be abandoned. Having the best historical results with a database is not the same thing as being wedded to the damned thing forever.

The way that this breaks down is that one is personal. We might toss that one on those grounds, except that it is the project that uses more postgres features than any other. The only thing it is not subject to is a sensitive dependence on ingestion rate. I am going to leave it in, with some misgivings; it is famously hard to avoid using features that are available in your database but that subject you to lock-in. 'Serial' v. 'Autoincrement' may be the canonical example.
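For anyone who has not hit this particular wall, a quick sketch of why that example is canonical (table and column names are hypothetical):

```sql
-- PostgreSQL: SERIAL is shorthand for an integer column
-- backed by an implicitly created sequence.
CREATE TABLE events (
    id   serial PRIMARY KEY,
    note text
);

-- The MySQL equivalent uses its own keyword, so the DDL
-- is not portable as written:
--
-- CREATE TABLE events (
--     id   int AUTO_INCREMENT PRIMARY KEY,
--     note text
-- );
```

Neither spelling is standard SQL, so the moment you use one, your schema is no longer portable without translation.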

Overall, what I care about, in order of importance:

  1. Integrity (very much including security)
  2. Auditability (not the same thing as security)
  3. Availability
  4. Ingestion rate
  5. Rate of change (in both code and documentation requirements)

Tuesday, February 17, 2015

An Update is Long Overdue


It is difficult to believe that my last post was over a month ago. Time flies. Particularly when there is not much of it that could be classified as 'spare'. I am still entranced with the idea of a Semantic Web, but that has been the case for a very long time now. The practical pitfalls are many, and often seem insurmountable.

Lately, the guiding wisdom has been to not let the perfect be the enemy of the good. If you wait until you create some Enormous Perfect Thing to ship anything, you will never ship. In the case of a blog, you will also fail to push possibly useful information out. This is old wisdom, and long predates blogs, but it is still an easy trap to fall into. This has become a mantra that I chant to myself multiple times per day.

It also has implications for practical software development in a modern distributed environment. This is still old news, but it bears mentioning again. Huge and sudden changes are far less likely to be accepted than slower incremental ones, particularly in the absence of a certain degree of coordination. A game plan, if you will. I am more or less operating in the limit, where that might be defined as a situation where I am hesitant to accept my own changes. In other words, self-doubt.

As I indicated in New Year, and Changes On the Way, I expect to fail at this. But, as I said in that post, "This is a case where not failing horribly would be a win."

The major thing I am contending with is, of course, namespaces. In my (even more) ignorant youth, I used to be annoyed at, say, antivirus companies each having a unique name for the same bit of malware. It reminded me of financial services classifying the same company in differing ways. One service would declare Foo Corp a member of the Services sector, while another would call it a member of the Energy sector. It smacked of marketing, and an attempt at locking customers into a proprietary classification scheme, which of course promised enormous benefits for the investor.

What, then, would be made of a corporation that provides financial transaction processing services, but is also a supplier of systems (hardware and software) for those that want to do it in-house? Is this a vendor of hardware, software, or services? Is it dependent on sales into those market segments? If so, how are changes over time accounted for?

With the best corporate will in the world, presuming a completely altruistic (yeah, right) outlook, with no intent of lock-in, this is a difficult problem. Nor can you consider, for the most part, governmental sources as above the commercial fray, and hence reliable. The United States Department of Labor, Bureau of Labor Statistics, is incapable of providing statistics on even Systems Administrators, much less the security specialties. Namespace issues are once again at the root of the problem. We have no reliable sources of data on whether security resources are increasing to meet increasing threats. What we have is marketing.

The problem continues in how one might evaluate the changing reliability of security news sources, including private and academic security researchers, the corporate publishers of white papers and press releases, the security trade press, etc.

So, am I working on some Impossible Shining Thing? Yes, if it is considered in absolute terms; some standard that is widely accepted, the evolution of which can be tracked over time, and continually evaluated by consumers as to efficacy.

So not letting the perfect be the enemy of the good returns to center stage. I might possibly succeed at creating something that is at least some sort of improvement over the current state of things. Given the huge and ongoing waste of resources, that would be a huge win. I would gladly settle for at least stimulating a bit more thought and discussion about the nature and extent of the problem, which seems sadly absent.

OTOH, this is a very low-traffic blog. Even that is a long shot.