(Updated database to v1.1 on 11/30/09 at 11:49AM)
It contains all alphanumeric messages from the WikiLeaks 9/11 dataset as an SQLite3 database:
911.wikileaks.org.sqlite3db_v1.1.zip, 14MB (uncompressed: 44MB)
There are three tables - textTable (main), emailTable and urlTable - with the following schemas:
- textTable: timestamp DATETIME, service TEXT, senderID INTEGER, text TEXT, key INTEGER PRIMARY KEY
- emailTable: address TEXT, domain TEXT, textKey INTEGER, key INTEGER PRIMARY KEY
- urlTable: url TEXT, textKey INTEGER, key INTEGER PRIMARY KEY
- The textKey fields are pointers to the primary key (key) of an entry in the textTable table.
- For the senderID field, a string of all zeros is a translation from the same number of question marks in the original message, as I wanted this field to be integer typed.
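Since textKey points back at textTable, getting the full message behind an email address or URL is a simple join. Here is a minimal sketch in Python using the standard sqlite3 module. The function names (messages_for_domain, urls_with_messages) are just made up for illustration, and I'm assuming the domain column holds the part of the address after the "@".

```python
import sqlite3


def messages_for_domain(conn, domain):
    """Return (timestamp, service, address, text) rows for every message
    linked to an email address at the given domain.

    Assumption: emailTable.domain stores the part after the "@".
    """
    query = """
        SELECT t.timestamp, t.service, e.address, t.text
        FROM emailTable AS e
        JOIN textTable AS t ON t."key" = e.textKey
        WHERE e.domain = ?
        ORDER BY t.timestamp
    """
    return conn.execute(query, (domain,)).fetchall()


def urls_with_messages(conn, limit=10):
    """Return up to `limit` (url, timestamp, text) rows, pairing each URL
    with the message it was found in, oldest message first."""
    query = """
        SELECT u.url, t.timestamp, t.text
        FROM urlTable AS u
        JOIN textTable AS t ON t."key" = u.textKey
        ORDER BY t.timestamp
        LIMIT ?
    """
    return conn.execute(query, (limit,)).fetchall()
```

To try it, open the unzipped database with sqlite3.connect(path), where path is wherever you put the file, and pass the connection to either function. The "key" column is quoted in the queries only as a precaution, since KEY is also an SQL keyword.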
More to come, including: the script used for generating this database (GitHub project), a more robust script and database (added in v1.1 update on 11/30/09) that will include email and URL lookup tables, and hopefully some cool analysis of all this amazing data.
Lastly: it turns out that this Jeff Clark guy has already done a bunch of analysis along the lines of what I was thinking about doing, and it’s great stuff. In any case, hopefully this database will make things a tad easier for others wishing to do interesting analysis of their own.