Gun Violence in Ohio 2013 - 2018 Analysis with Palladio and Google Fusion Tables
Initially, I began this project trying to answer the humanistic question of what social media, specifically tweets and conversations on Twitter, could tell me about the “mood” or sentiment around particular topics, based on keywords. To do this, I tried to use a single-file Python script that relied on the tweepy library to scrape Twitter for tweets containing my keyword of choice. While installing the libraries the script required on my Linux machine, I found that the script had been written for Python 2.7, but the current versions of the library targeted Python 3.5. The library and the script were not compatible, and modifying the script to work with the available version of tweepy proved too arduous.
Next, I decided to work backwards and engineer a humanistic question based on the dataset and the tools and technology available to me. I explored various datasets until I found one on Kaggle.com that included every reported shooting across the United States, with names, links to newspaper articles, GPS coordinates, addresses of the incidents, descriptions of the places and circumstances in which these events took place, and the times and dates. All in all, the CSV file for this dataset had over 250,000 entries and 20 metrics by which each incident was recorded.
I decided to use Palladio to manipulate this dataset and overlay it on a map to show the frequency density of violent crimes involving shootings. The dataset was so big, with an initial size of 250 megabytes, that my computer could not manipulate it using the default spreadsheet software. I then tinkered with it until I found Google Fusion Tables, a project that manipulates large data files and parses them to identify patterns. I used this tool to clean my dataset, cutting 15 of the 20 columns until I was left with just 5. I then filtered out all incidents except those that occurred in Ohio. After this, I was left with a much smaller dataset of more than 9,000 incidents of gun violence across the state of Ohio since 2013.
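I did this cleaning inside Fusion Tables, but the same two steps (dropping columns and keeping only Ohio) could also be sketched in a few lines of Python. This is only an illustration and not the process I actually followed: the column names below are assumptions and would need to be changed to match the real header of the CSV file, and filter_state is a hypothetical helper name. Because it reads one row at a time, a 250-megabyte file should not need to fit in memory.

```python
import csv
import io

# Assumed column names; adjust them to match the actual CSV header.
KEEP_COLUMNS = ["date", "address", "city_or_county", "n_killed", "n_injured"]


def filter_state(source, dest, state="Ohio", state_column="state", keep=KEEP_COLUMNS):
    """Copy rows for one state from source to dest, keeping only a few columns.

    source and dest are open text files (or file-like objects).
    Returns the number of rows written.
    """
    reader = csv.DictReader(source)
    writer = csv.DictWriter(dest, fieldnames=keep)
    writer.writeheader()
    kept = 0
    for row in reader:  # one row at a time, so the whole file is never loaded
        if (row.get(state_column) or "").strip() == state:
            writer.writerow({name: row.get(name, "") for name in keep})
            kept += 1
    return kept


if __name__ == "__main__":
    # Tiny made-up sample standing in for the real file.
    sample = io.StringIO(
        "state,date,address,city_or_county,n_killed,n_injured,notes\n"
        "Ohio,2015-01-02,1 Main St,Cleveland,0,1,example\n"
        "Texas,2015-01-03,2 Elm St,Austin,1,0,example\n"
    )
    out = io.StringIO()
    print(filter_state(sample, out), "row(s) kept")
    print(out.getvalue())
```

With real files, both would be opened with newline="" as the csv module documentation recommends, for example open("incidents.csv", newline="") for the input, where the file name is a placeholder.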
I found that Google Fusion Tables automatically used the address column to map the incidents to physical locations within Google Maps. An image of one such visualization from within Fusion Tables is below.
This shows the frequency distribution visualization in Cleveland specifically.
This shows a pin for one specific incident.
Using Fusion Tables allowed me to explore the dataset much more intuitively, and with a lower barrier to entry, than Palladio did.
Below is a visualization generated by Palladio.
Palladio allowed me to compare incidents of gun violence that culminated in death, in injury, and in no significant harm to any person. These were represented by red, blue, and green, respectively. Using Palladio, I was able to answer, both qualitatively and quantitatively, the humanistic question of how place affects safety and the incidence of gun violence. Such information could be incredibly useful in urban planning, or in choosing where to move next.
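The three-way comparison can also be expressed as a short Python sketch that tallies outcomes per place. This is an illustration under assumptions, not what Palladio does internally: the field names (city, n_killed, n_injured) and the helper names are hypothetical, and the rule that a death takes priority over an injury is my own choice for incidents that involve both.

```python
from collections import Counter

# Colors follow the scheme described above: death, injury, no harm.
OUTCOME_COLORS = {"death": "red", "injury": "blue", "no harm": "green"}


def classify(n_killed, n_injured):
    """Assign an incident to one category; a death outranks an injury (assumed rule)."""
    if n_killed > 0:
        return "death"
    if n_injured > 0:
        return "injury"
    return "no harm"


def outcome_counts(incidents):
    """Count incidents per (city, outcome) pair.

    incidents is an iterable of dicts with the assumed keys
    'city', 'n_killed', and 'n_injured'.
    """
    counts = Counter()
    for incident in incidents:
        outcome = classify(int(incident["n_killed"]), int(incident["n_injured"]))
        counts[(incident["city"], outcome)] += 1
    return counts


if __name__ == "__main__":
    # Made-up rows for illustration only; not taken from the dataset.
    sample = [
        {"city": "Cleveland", "n_killed": "1", "n_injured": "0"},
        {"city": "Cleveland", "n_killed": "0", "n_injured": "2"},
        {"city": "Columbus", "n_killed": "0", "n_injured": "0"},
    ]
    for (city, outcome), n in sorted(outcome_counts(sample).items()):
        print(city, outcome, OUTCOME_COLORS[outcome], n)
```

Counts like these give the quantitative side of the comparison, while the colored map gives the qualitative one.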
While I initially set out to carry out my research using just Palladio, I found that my research could be more comprehensive if I used both Palladio and Fusion Tables to examine my research question. My biggest lesson from this project was probably that multiple frameworks can be used to answer the same research question, and that they can often complement each other. I also learned that while I may set out to answer a specific research question, I should be open to exploring it along a different path or methodology as the project progresses.