The Motivation
https://twitter.com/#!/corbett_inc/status/11049922849673216
https://twitter.com/#!/corbett_inc/status/11050551278051330
https://twitter.com/#!/ario/status/11053875767279616
For more details, check out my Wikileaks Research Project midpoint report; a bite-sized piece of background, quoted from that blog post, follows:
A week ago I hatched a plan to educate myself in a more structured manner on what exactly Wikileaks stands for, its history, and its possible futures, with the idea of developing a more concrete and informed opinion (and, since I have a habit of acting on my opinions, thereby guiding my future actions). After seeking some advice about what to read via Twitter, I started with this Julian Assange essay as a basis (a very good one for philosophical background on the larger aims). I then spent a while digesting the Wikipedia page on Wikileaks before deciding that I wanted to fact-check its assertions and that its references might serve as a springboard for the next step in my studies.
When the fantastic FBZ suggested I give a talk at SecurityBSides Berlin, I submitted this project as a talk; once it was accepted, I had a tight deadline to meet.
I am happy to say I made it. The talk went decently, but I'm making up for the places where it was lacking with this thorough blog post detailing the methodology and findings.
The Methodology
Everything I discuss here is available in the project's GitHub repository: https://github.com/corbett/WikileaksResearchProject.
1. Obtain articles
This is covered in more depth in my midpoint report blog post, but the CliffsNotes version is to run the following:
./AddUrlsToInstapaper.py -u username -p password page_url
against the relevant Wikipedia page, which in this case is http://en.wikipedia.org/wiki/WikiLeaks. AddUrlsToInstapaper.py is, of course, available on GitHub.
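For readers curious about the link-harvesting half of that step, here is a minimal sketch using only Python's standard library. It is not the code in AddUrlsToInstapaper.py and does not talk to Instapaper at all; it only shows one way to pull the off-site links out of a Wikipedia page's HTML. The function name `external_links` and the filtering rule (keep http(s) links that don't point back to wikipedia.org, deduplicated in page order) are assumptions made for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ExternalLinkCollector(HTMLParser):
    """Collects off-site <a href> targets from a page, in first-seen order."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._seen = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links like "/wiki/..." or "#cite_note-1".
        url = urljoin(self.base_url, href)
        # Assumed filter: keep only http(s) links that leave Wikipedia.
        if url.startswith(("http://", "https://")) and "wikipedia.org" not in url:
            if url not in self._seen:
                self._seen.add(url)
                self.links.append(url)


def external_links(html, base_url):
    """Return the deduplicated external links found in an HTML string."""
    parser = ExternalLinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

Fetching the page (for example with `urllib.request.urlopen`) and handing each URL to a read-later service are left out on purpose, since those parts depend on the real script and its credentials.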
2. Read articles, all 265+ of them
The articles are available in the GitHub repository, both in .mobi e-book format and as the original HTML (for offline reading). Since the inception of the project, I have read a bit every day, making notes along the way. The worthwhile reads are also marked in the citation spreadsheet (see step 3): they belong to the folder "WL Reading List" under the "Instapaper Folder" column.
3. Classify articles
During the course of reading, I quickly noticed most articles didn’t cite their sources properly, and I was often led in circles before finding an original source. I decided to quantify this, and the results are available in the following spreadsheet: http://tinyurl.com/wikipedia-wikileaks-citations.
A few key results:
The coarse citation classification looks pretty good: 47% of the articles are original documents (note that this does not necessarily mean they were informative or useful, only that they were original) and 25% cite original sources.
However, the numbers get significantly more depressing when classifying in greater detail. The coarse classification above lumps every article that cites at least one original source into the "yes" pile. Reserving "yes" for articles that cite original sources wherever applicable, only 5% of articles (~10% of the non-original-source documents) handle citations properly, and most articles that do cite an original source do so in a haphazard and inadequate manner.
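To make the relationship between the coarse and strict figures concrete, here is a small sketch that tallies both views from per-article labels. The four label names are a hypothetical scheme, not the actual columns of the citation spreadsheet, and the 100 labels in the usage example are synthetic, chosen only to reproduce the percentages quoted above (the real data set has 265+ articles). It also shows where "~10% of non-original-source documents" comes from: 5% of all articles divided by the 53% that are not original documents is 5/53, about 9.4%.

```python
from collections import Counter

# Hypothetical label scheme (not the spreadsheet's real column values):
#   "original" - the article is itself an original document
#   "yes"      - cites original sources wherever applicable (strict)
#   "partial"  - cites at least one original source, but inconsistently
#   "no"       - cites no original source


def citation_shares(labels):
    """Return coarse and strict citation shares for a list of labels."""
    counts = Counter(labels)
    total = len(labels)
    non_original = total - counts["original"]
    return {
        "original": counts["original"] / total,
        # Coarse view: anything citing at least one original source.
        "coarse_cites": (counts["yes"] + counts["partial"]) / total,
        # Strict view as a share of all articles...
        "strict_of_all": counts["yes"] / total,
        # ...and as a share of the non-original documents only.
        "strict_of_non_original": (
            counts["yes"] / non_original if non_original else 0.0
        ),
    }


# Synthetic labels matching the quoted percentages (illustration only).
labels = ["original"] * 47 + ["yes"] * 5 + ["partial"] * 20 + ["no"] * 28
shares = citation_shares(labels)
```

With these synthetic labels, `shares["strict_of_non_original"]` is 5/53, which rounds to the "~10%" in the text.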
4. Analyze relationships
In addition to the citation problem, I quickly noticed a repetition problem. Many articles were rehashes or light rephrasings of other articles, which made this algorithmic "what to read" approach inefficient: I often read the same article rehashed over and over. To quantify this to some degree, I wrote a script that identifies pairs of articles sharing at least one full sentence and represents each such connection as an edge between two nodes (one per article) in a graph. The articles the numbered nodes correspond to are indicated in this table. The graphs are tangled media webs, but here are a few interesting excerpts:
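The shared-sentence idea can be sketched in a few lines of Python. This is an illustrative reimplementation, not the script from the repository: the naive regex sentence splitter, the lowercase/whitespace normalization, and the `min_words` cutoff (to stop boilerplate like "Read more." from linking unrelated articles) are all assumptions. Nodes are article indices, and an edge joins two articles that share at least one normalized sentence; `to_dot` is a hypothetical helper that writes the edges as a Graphviz DOT graph for drawing.

```python
import re
from itertools import combinations


def sentences(text, min_words=5):
    """Split text into normalized sentences, dropping very short ones."""
    # Naive splitter: break after ., ! or ? followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text)
    out = set()
    for part in parts:
        norm = " ".join(part.lower().split())
        if len(norm.split()) >= min_words:
            out.add(norm)
    return out


def shared_sentence_edges(articles, min_words=5):
    """Return (i, j) pairs of articles with at least one sentence in common."""
    sents = [sentences(a, min_words) for a in articles]
    return [
        (i, j)
        for i, j in combinations(range(len(articles)), 2)
        if sents[i] & sents[j]
    ]


def to_dot(edges, name="articles"):
    """Render the edge list as an undirected Graphviz DOT graph."""
    lines = [f"graph {name} {{"]
    lines += [f"  {i} -- {j};" for i, j in edges]
    lines.append("}")
    return "\n".join(lines)


articles = [
    "WikiLeaks published the cables on Sunday night. Officials declined to comment.",
    "In other news today. WikiLeaks published the cables on Sunday night. More later.",
    "Assange was arrested in London this week by police.",
]
edges = shared_sentence_edges(articles)
```

Here only the first two articles share a sentence, so `edges` is `[(0, 1)]`. Comparing every pair is quadratic in the number of articles, which is perfectly fine for a few hundred documents.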
I also did a quick-and-dirty scan for the number of documents mentioning terms key to the saga:
This analysis was coarse, but what pops out clearly, as if it weren't obvious already, is Assange's domination of the news.
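A scan like this comes down to computing a document frequency per term: how many documents mention it at least once, not how often it appears. Here is a minimal sketch, assuming plain-text documents and case-insensitive whole-word matching. The function name and the example terms are hypothetical, not the actual term list behind the chart.

```python
import re


def document_frequency(docs, terms):
    """Count how many documents mention each term at least once."""
    counts = {}
    for term in terms:
        # Whole-word, case-insensitive match; re.escape guards punctuation.
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        counts[term] = sum(1 for doc in docs if pattern.search(doc))
    return counts


docs = [
    "Assange speaks to reporters; Assange's lawyers object.",
    "assange and Manning in the headlines",
    "The cables leaked overnight",
]
freq = document_frequency(docs, ["Assange", "Manning", "cables"])
```

The first document counts once for "Assange" even though the name appears twice, which is what makes this a count of documents rather than of mentions.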
Conclusions
The conclusions and opinions I formed, and the actions I plan as a result (both with respect to WikiLeaks and the media as a whole), deserve their own post. Stay tuned for Wikileaks Research Project Final Report II: Conclusions, Opinions and Actions. In the meantime, the slides from my BerlinSides presentation are available for perusal. Happy New Year!