I don't want to fall into the classic trap of trying to design the perfect semantic data system, where an enormous effort goes in and nothing useful comes out. But I would like a reasonable first cut at encompassing some very different security data sources. Some of those live on local storage, in multiple databases. A bookmarks database stored in sqlite would be one example, as would other databases (which I can't tell you about) stored in something that is not sqlite, which I also can't mention.
Data stored in the filesystem is a whole different thing. There are semantic desktop approaches in both the KDE and Gnome Linux desktops. But those approaches, while advancing at what seems to me to be a reasonable pace, are immature. Plus, I have rules about this sort of thing. In the midst of all the junk that accumulates on hard drives (and this is a deep problem), there are file hierarchies that I care about. As an example, the reference area is not allowed to contain a file with the execute bit set. The reason is that I want to preserve my ability to mount it from anywhere on the LAN, as a network filesystem, with the noexec option set. Example exploit code is typically compressed. Because duh. Who would mount a network FS, containing known malware, and allow executable code? The only text files I write to that area are CONTENTS files, but that is more about keeping my notes files away from a purely reference area. That is just one example of half a dozen or so filesystem areas that I deeply care about.
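A rule like "no file in the reference area has the execute bit set" is easy to audit mechanically. What follows is only a minimal sketch of such a check, not the tooling actually in use here: the function name `find_executables` is made up for illustration, and it assumes a POSIX filesystem where permission bits mean what they usually mean.

```python
import os
import stat

# Any of the user, group, or other execute bits counts as a violation.
EXEC_BITS = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH


def find_executables(root):
    """Return a sorted list of regular files under root with an execute bit set.

    Directories are ignored, since they need the execute bit to be traversable.
    Symlinks are not followed; lstat() reports on the link itself, which is
    not a regular file and is therefore skipped.
    """
    offenders = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mode = os.lstat(path).st_mode
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if stat.S_ISREG(mode) and mode & EXEC_BITS:
                offenders.append(path)
    return sorted(offenders)
```

Calling `find_executables()` on the top of a reference tree and getting back an empty list means the rule holds; anything it returns is a file to chmod or to compress before the tree is next exported.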
There are other information flows, such as news. It turns out that Thomson Reuters has done important work in this area. See references to Calais/OpenCalais. It matters, even from a purely security perspective, because attack development follows fashion, and the security history of any particular software project is not strongly correlated with its current or future state. Witness the recent explosion in exploits against Point of Sale systems. As counter-examples, we have Drupal and the long history of punctuated equilibrium related to attacks against image-rendering libraries, etc. Several related and worthwhile papers have come out of academia, and there is much to talk about.
As usual, there is a lot going on here behind the scenes, and much is unrelated to this project. But I felt that an update was in order. The gist is that crossing off TODO entries, in the context of creating something that is more useful than what has come before, and potentially far more useful, is winning over new entries by about 3:1. That is a win.
There is also a massive raft of notes, links, code snippets, and other detritus accumulating. There is no good way to publish that lot on Blogger. That is a lose, and while it is conceivable that GitHub might be a complete solution, I am open to other suggestions.