I don't want to fall into the classic trap of trying to design the perfect semantic data system, where an enormous effort goes in and nothing useful comes out. But I would like a reasonable first cut at encompassing some very different security data sources. Some of those live on local storage, in multiple databases. A bookmarks database stored in SQLite would be one example; others (which I can't tell you about) are stored in something other than SQLite (which I also can't mention).
Data stored in the filesystem is a whole different thing. There are semantic desktop approaches in both the KDE and GNOME Linux desktops, but those approaches, while advancing at what seems to me a reasonable pace, are immature. Plus, I have rules about this sort of thing. In the midst of all the junk that accumulates on hard drives (and this is a deep problem), there are file hierarchies that I care about. For example, the reference area is not allowed to contain a file with the execute bit set. The reason is that I want to preserve my ability to mount it from anywhere on the LAN, as a network filesystem, with the noexec option set. Example exploit code is typically stored compressed. Because, duh: who would mount a network FS containing known malware and allow executable code? The only text files I write to that area are CONTENTS files, but that is more about keeping my notes files out of a purely reference area. That is just one of half a dozen or so filesystem areas that I deeply care about.
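A rule like "no execute bits in the reference area" is easy to audit mechanically. The following is a minimal sketch, not the tooling I actually use: it walks a directory tree and reports regular files with any of the user, group, or other execute bits set. The function name `find_executables` is hypothetical. It deliberately skips symlinks and directories (directories need the execute bit to be traversable). On a GNU system, `find /path/to/reference -type f -perm /111` gives a similar report from the shell.

```python
import os
import stat

# Any of the three execute bits (user, group, other).
EXEC_BITS = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH


def find_executables(root):
    """Yield paths of regular files under root that have any execute bit set.

    Symlinks are not followed and are not reported; directories are ignored
    because they legitimately carry execute (search) permission.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: inspect the link itself, not its target
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if stat.S_ISREG(st.st_mode) and st.st_mode & EXEC_BITS:
                yield path
```

Usage would be something like `for p in find_executables("/srv/reference"): print(p)`, where the path is a placeholder. The sketch only reports offenders rather than silently clearing the bits, on the theory that an unexpected executable in a reference area is worth a human look.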
There are other information flows, such as news. It turns out that Thomson Reuters has done important work in this area; see references to Calais/OpenCalais. That work matters even from a purely security perspective, because attack development follows fashion, and the security history of any particular software project is not strongly correlated with its current or future state. Witness the recent explosion in exploits against point-of-sale systems. As counter-examples, we have Drupal, and the long history of punctuated equilibrium in attacks against image-rendering libraries, etc. Several related and worthwhile papers have come out of academia, and there is much to talk about.
As usual, there is a lot going on here behind the scenes, and much of it is unrelated to this project. But I felt that an update was in order. The gist is that, in the course of creating something more useful than what has come before (and potentially far more useful), I am crossing off TODO entries about three times as fast as new ones are being added. That is a win.
There is also a massive raft of notes, links, code snippets, and other detritus accumulating, and there is no good way to publish that lot on Blogger. That is a loss. While it is conceivable that GitHub might be a complete solution, I am open to other suggestions.
Thursday, January 1, 2015
This has nothing to do with New Year's resolutions, as I discovered years ago that those simply do not work for me. It is more about looking at various note files that will become future posts. The good news (for me, anyway) is that there is material for something like 30 posts in various states of completion. Some of it would take as little as half an hour of work, if I were to continue as I have in the past.
I don't want to do that. I don't know that it has been particularly effective, and behind the scenes I have been organizing the work into themes that I want to explore in some depth, and thinking about how to make the intended audience for any given post explicit. Do I need a simple classification scheme, or do I need to think in terms of an ontology? How is what I want to do constrained by the Blogger platform?
The simplest case came up when doing the initial blog layout. I wanted the date of publication right up top. There is a lot of stale information on the Web, and I think part of that is deliberate: there is a business incentive (the so-called Long Tail economics of the Web) for making publication dates harder to find. It is also a complete pain in the ass for many members of my tribe (security workers), who may have to digest dozens of news articles, blog posts, etc., every day.
I would also like to make life a bit easier for the tool builders who are creating the next generation of automated ingestion systems, so I would like to keep things as stable as possible. That means not changing the current date format, and putting some thought into the forthcoming 'Intended Audience' label, etc.
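To illustrate why a stable date format matters to ingestion tools, here is a minimal sketch of how a scraper might parse a post-date header such as "Thursday, January 1, 2015". The format string and the function name `parse_post_date` are assumptions for illustration, not anything Blogger publishes. Note that `%A` and `%B` match locale-dependent day and month names (English under the default C locale), and that `strptime` parses the weekday name without checking it against the date.

```python
from datetime import date, datetime

# Assumed format of the post-date header, e.g. "Thursday, January 1, 2015".
POST_DATE_FORMAT = "%A, %B %d, %Y"


def parse_post_date(header: str) -> date:
    """Parse a post-date header into a date.

    Raises ValueError if the header no longer matches the expected format,
    which is exactly the breakage a format change would cause downstream.
    """
    return datetime.strptime(header.strip(), POST_DATE_FORMAT).date()
```

A tool built on something like this keeps working only as long as the header format holds; change it to, say, ISO 8601, and every such parser raises `ValueError` until someone notices and fixes it.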
I expect to fail at this. The design space begins to resemble API design (difficult to get right) and also involves some Semantic Web thinking. The Semantic Web has, rather famously, never taken off. This is a case where not failing horribly would be a win.
Do not expect this stuff immediately. It has been a couple of weeks since I posted, and it is going to be difficult to post again until sometime around mid-January. Sorry about that, but life intrudes.