Data Rep: Final Project – The Twitter Scorecard

 

 

 

For my final project in Jer Thorp’s Data Representation class, I explored the nature of tweets in relation to live sports. This past summer (summer of 2011), I worked on a concept for an app called “Tailgate” (formerly “Huddl”) that would provide users with curated Twitter feeds for live sporting events on TV. During my research for this project, I found that following the sports “action” on Twitter was not only entertaining but also extremely insightful. If I wanted to know what was going on in a particular game, reading recent tweets was a much faster and easier way to get a sense of the game than tuning in on TV. Also, reading tweets after a game had the potential to tell a much richer, more nuanced story than traditional forms of media, specifically box scores and highlight reels.

 

For my project, I sought to execute a proof of concept, through data visualization, that Twitter does provide an accurate and rich depiction of the “story” of a live sporting event. Using a Ruby script to access the Twitter streaming API, I pulled in tweets from two NFL football games. I stored the tweets in a MongoLab database and then exported the data in CSV format. I then brought the CSV into Processing to create the visualizations.
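To make the CSV hand-off concrete, here is a minimal sketch of reading such an export. It is written in Python rather than Processing purely for illustration, and it assumes the export has `created_at` and `text` columns, with timestamps in Twitter's classic `created_at` layout. The column names, the `Tweet` type, the `load_tweets` helper, and the sample rows (including the hashtag) are all made up for this example.

```python
import csv
import io
from datetime import datetime
from typing import NamedTuple

# Twitter's classic "created_at" layout, e.g. "Thu Nov 24 21:30:05 +0000 2011".
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


class Tweet(NamedTuple):
    created_at: datetime
    text: str


def load_tweets(csv_file):
    """Read tweets from an open CSV file with 'created_at' and 'text' columns (assumed names)."""
    reader = csv.DictReader(csv_file)
    tweets = [
        Tweet(datetime.strptime(row["created_at"], CREATED_AT_FORMAT), row["text"])
        for row in reader
    ]
    # Sort by time so later steps can treat the list as a timeline.
    tweets.sort(key=lambda t: t.created_at)
    return tweets


# Made-up sample rows standing in for the real export.
SAMPLE_CSV = """created_at,text
Thu Nov 24 21:31:10 +0000 2011,"TOUCHDOWN Cowboys! #DALvsMIA"
Thu Nov 24 21:30:05 +0000 2011,"Kickoff, here we go #DALvsMIA"
"""

tweets = load_tweets(io.StringIO(SAMPLE_CSV))
for tweet in tweets:
    print(tweet.created_at.isoformat(), tweet.text)
```

With a real export, the `io.StringIO` wrapper would be swapped for a file opened with `open(path, newline="", encoding="utf-8")`.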

 

The first game from which I pulled in tweets was the Dallas Cowboys vs. Miami Dolphins, played on Thanksgiving Day, Nov. 24, 2011. For this game, I queried the Twitter streaming API for 12 game-specific hashtags. Over the course of roughly 3 hours, I pulled in close to 35,000 tweets.

 

The Twitter streaming API sends back an extensive JSON object for each tweet. Per Jer’s advice, I started out by focusing specifically on the tweet message and the time it occurred. Using Java’s SimpleDateFormat class in Processing to parse each tweet’s exact timestamp, I plotted all the collected tweets along the timeline of the game. The x-axis represents time, progressing from left to right, with one point plotted for every tweet and each point’s y-value assigned at random, so that denser areas indicate higher tweet frequency. Here is the first visualization:

 

 

As you can see in the image above, there were areas of significant density representing higher tweet volume. (Note: I lost the feed during halftime of the game, which explains the gap in tweets in the center of the sketch.) Common sense suggests these dense areas were moments of significance during the game, probably times when points were scored. So I then plotted the important scoring events of the game along the timeline below the tweets.
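The plotting idea described above (time mapped to x, a random y for each tweet, and dense regions standing in for volume) can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the Processing sketch itself: the function names, canvas size, kickoff time, and sample timestamps are all hypothetical. The per-minute count is an addition that makes "density" measurable without looking at the picture.

```python
import random
from collections import Counter
from datetime import datetime, timedelta


def x_position(when, start, end, width):
    """Map a timestamp linearly onto the range 0..width pixels."""
    fraction = (when - start) / (end - start)
    return fraction * width


def scatter_points(timestamps, start, end, width, height, seed=0):
    """One (x, y) point per tweet: x from its time, y chosen at random."""
    rng = random.Random(seed)
    return [
        (x_position(t, start, end, width), rng.uniform(0, height))
        for t in timestamps
    ]


def tweets_per_minute(timestamps):
    """Count tweets in each one-minute bucket, a numeric stand-in for visual density."""
    return Counter(t.replace(second=0, microsecond=0) for t in timestamps)


# Made-up timeline: a three-hour window and five invented tweet times.
start = datetime(2011, 11, 24, 21, 15)
end = start + timedelta(hours=3)
offsets = [(5, 10), (40, 2), (40, 30), (40, 55), (90, 0)]
timestamps = [start + timedelta(minutes=m, seconds=s) for m, s in offsets]

points = scatter_points(timestamps, start, end, width=900, height=200)
busiest_minute, count = tweets_per_minute(timestamps).most_common(1)[0]
print(busiest_minute, count)
```

Sorting the per-minute counts and comparing the top few minutes against the scoring plays is one way to check the "dense areas are scoring moments" hunch numerically.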

 

 

 

 

Touchdowns clearly sparked the highest volume of tweets, along with the end of the game, which was decided by a game-winning field goal by the Cowboys with no time remaining.

 

To see what was actually being said, I then parsed out words from individual tweets to confirm what was being discussed and when it was happening. The images below show the occurrence of different words throughout the game.

 

 

Tweets with the word “Touchdown” or “TD”

 

 

 

 

Tweets with the word “Field Goal” or “FG”

 

 

 

Tweets with the word “Fumble” (this one is particularly interesting in that you could assume the fumble resulted in an opponent touchdown shortly after)

 

 

 

Tweets with the word “Interception” or “int”

 

 

 

Going beyond basic football terms, I started looking at the occurrence of words that had more “emotional significance.” Here’s a sketch comparing words of frustration (damn, shit, crap, fuck) with words of elation (amazing, nuts, crazy), and words of laughter (Ha, LOL).

 

 

 

 

For the most part, these occurrences appeared to line up with the significant moments of the game.
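The word filters used throughout this section can be sketched as a small lookup of word groups. The Python below is an illustration only: the group names, the `groups_in` and `tweets_matching` helpers, and the choice of case-insensitive whole-word matching are assumptions, and the word lists are a subset of the ones mentioned above.

```python
import re

# Word groups taken (in part) from the plots above; the group names are made up.
WORD_GROUPS = {
    "touchdown": ["touchdown", "td"],
    "field goal": ["field goal", "fg"],
    "frustration": ["damn", "crap"],
    "elation": ["amazing", "nuts", "crazy"],
    "laughter": ["ha", "lol"],
}


def compile_group(words):
    """Build a case-insensitive pattern matching any of the words as a whole word."""
    alternatives = "|".join(re.escape(w) for w in words)
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)


PATTERNS = {name: compile_group(words) for name, words in WORD_GROUPS.items()}


def groups_in(text):
    """Return the name of every group with at least one word in the tweet."""
    return [name for name, pattern in PATTERNS.items() if pattern.search(text)]


def tweets_matching(tweets, group):
    """Keep only the (timestamp, text) pairs whose text mentions the group."""
    return [(when, text) for when, text in tweets if PATTERNS[group].search(text)]


print(groups_in("TD!!! That was nuts lol"))
print(groups_in("Great outdoor stadium"))
```

Whole-word matching matters for short terms: without the `\b` boundaries, "td" would match inside "outdoor" and "ha" inside "That".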

 

I then repeated this entire process for the New York Giants vs. Green Bay Packers game, which was played on Sunday, Dec. 4. For this game, I searched for 10 game-specific keywords instead of hashtags. Through 3 quarters of the game, I pulled in over 150,000 tweets. (Unfortunately, I lost my feed at the beginning of the 4th quarter.)

 

For my final class presentation, I developed a sketch that was primarily an exploratory tool. It allows a user to search for any word that occurred during either game, plot when / how often it occurred, and plot the relative frequency of that particular word throughout the game. The relative frequency helped distinguish when a particular word spiked in occurrence regardless of its total volume.
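One plausible reading of "relative frequency" is the share of tweets in each time slice that contain the word, which is what the sketch below computes. The one-minute slice, the tokenization, the `relative_frequency` helper, and the sample tweets are all assumptions made for illustration. The interactive sketch may have defined it differently.

```python
import re
from collections import Counter
from datetime import datetime, timedelta


def minute_of(when):
    """Truncate a timestamp to the start of its minute."""
    return when.replace(second=0, microsecond=0)


def relative_frequency(tweets, word):
    """For each minute, the fraction of that minute's tweets that contain the word."""
    totals = Counter()
    hits = Counter()
    needle = word.lower()
    for when, text in tweets:
        bucket = minute_of(when)
        totals[bucket] += 1
        # Split into lowercase tokens so "SACK!" still counts as "sack".
        if needle in re.findall(r"[a-z0-9']+", text.lower()):
            hits[bucket] += 1
    return {bucket: hits[bucket] / totals[bucket] for bucket in sorted(totals)}


# Made-up tweets spread over two minutes.
kickoff = datetime(2011, 12, 4, 21, 15)
tweets = [
    (kickoff + timedelta(seconds=10), "Big sack on the QB"),
    (kickoff + timedelta(seconds=20), "Giants looking good"),
    (kickoff + timedelta(seconds=65), "SACK!"),
    (kickoff + timedelta(seconds=90), "Another sack, wow"),
    (kickoff + timedelta(seconds=100), "sack sack sack"),
    (kickoff + timedelta(seconds=110), "Packers punt"),
]

freq = relative_frequency(tweets, "sack")
for bucket, share in freq.items():
    print(bucket.time(), round(share, 2))
```

Dividing by the slice's total is what lets a rare word stand out: a word that appears in 3 of 4 tweets during a quiet minute registers as a bigger spike than one that appears in 30 of 1,000 during a busy one.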

 

I also added rollover functionality: the actual tweet message is displayed when the mouse rolls over a particular tweet in the lower graph. The visible message in the middle of some of the sketch images is the result of the mouse rolling over the densest area and picking a tweet to display. Here are some images of the final interactive sketch. I am working to get the sketch active online, since the real charm of the piece is being able to search for any and all words to see where and when they might have occurred, but still images will have to do for now.

 

And special thanks to Jeremy Scott Diamond, Martin Bravo, Rune Madsen, and Greg Borenstein for their help on this project.

 

 

Game 01 – Overall

 

 

 

Game 01 – “Cheerleader” (a player on the Cowboys knocked over a cheerleader at the beginning of the 4th quarter, and the television broadcast replayed the moment several times)

 

 

 

Game 02 – Overall

 

 

 

Game 02 – “Sack”

 

 

 

Game 02 – “Fumble”