The echo of a tragedy in Social Media – “The Making of”
In this post I’m going to describe how I pull together the visualization of The echo of a tragedy in Social Media.
I used 3 different technologies to pull it together: PHP for data gathering, R for data analysis and data output generation and D3.js for data visualization.
The day after the incidents in the Charlie Hebdo took place, I set up a Twitter monitor using the Search API based on the hashtag #jesuischarlie for 5 different languages: German, Spanish, English, Italian and of course French. I made sure for each language I had at least 10K Tweets gathered. I made available the data as csv (jesuischarlie.csv). The PHP part is easy so I’m not going to focus on it 🙂
The analysis part is no rocket science and is accomplished in following steps:
- Counting the occurrences of all available hashtags per language
- Taking the top 50 hashtags per language
- Computing the occurrences of hashtags pairs within the selected hashtags
- Generating the JSON according to the directed forced D3.js layout
The code taking care of the visualization is written in D3.js, embedded and readable in the visualization page.
A few comments on that:
- There are 2 functions to make both text and opacity dependent on the number of occurrences
- The nodes accept a click to highlight the connections to other nodes
- The color of the highlighted connections change according to a round robin approach to allow for multiple highlighting
- The values for gravity, distance and charge have been manually adjusted… worth spending some time playing around with them
- The size of the images for the nodes is determined by a function to allow bigger sizes for the flags
Below you can find the R code… Again, it might not be 100% optimized, but does the job… Give it a go!
library(RMySQL) require(data.table) library(stringr) con <- dbConnect(MySQL(), user="XXXXXX", password="XXXXXX",dbname="XXXXXXX", host="XXXXXXXX") res=dbSendQuery(conn=con,"SELECT * FROM domaintweets where tag='#JeSuisCharlie'") assets<-fetch(res,-1) dbDisconnect(con) # I've done it with a data based... you can do it with the data set I provided above with a read.csv statement # assets<-read.csv("~/jesuischarlie/jesuischarlie.csv") # languages we extracted tweets for languages<-c("es","de","en","fr","it") countPerLanguage<-function (assets, max) { assets$ntags<-str_count(assets$tags, ",") #filtering out tweets with way too many hashtags assets<-assets[assets$ntags<5,] assets<-head(assets,max) tags<-assets$tags alltags<-paste(tags, collapse = ',') alltags<-gsub(x=alltags, pattern = ',,',replacement = ',') all.tags<-strsplit(x = alltags,split = ',') df.all.tags<- as.data.frame(all.tags) colnames(df.all.tags)<-c("tag") df.all.tags$tag<-as.character(df.all.tags$tag) df.all.tags$tag<-tolower(df.all.tags$tag) df.all.tags$count<-1 agg.df.all.tags<-aggregate(count~tag, df.all.tags, length) agg.df.all.tags<-agg.df.all.tags[order(agg.df.all.tags$count, decreasing = T),] return (agg.df.all.tags) } # taking the top50 hashtags per languages agg.languages<-NULL for (i in 1:length(languages)) { df.lang<-subset(assets, lang==languages[i]) top50 <- countPerLanguage(df.lang, 10000) lang<-rep(x = languages[i],times = nrow(top50)) top50$lang<-lang top50<-head(top50,50) agg.languages<-rbind(agg.languages,top50) } # hashtag 2 hashtag connection unique.tags<-unique(agg.languages$tag) pairs.df<-NULL for (i in 2:length(unique.tags)) { tag1 <- unique.tags[i] assets.tag <- subset(assets, grepl(pattern = paste0("^",tag1,",", "|",",", tag1,",") ,x = assets$tags) ) for (j in i:length(unique.tags)) { if (i!=j) { tag2<-unique.tags[j] print(paste(tag1, "-", tag2)) pairs <- subset(assets.tag, grepl(pattern = paste0("^",tag2,",", "|",",", tag2,",") ,x = assets.tag$tags) ) count.pairs<-nrow(pairs) if (count.pairs!= 0) { l<-list(unique.tags[i],unique.tags[j],count.pairs) df<-as.data.frame(l) colnames(df)<-c('tag1','tag2','Freq') pairs.df<-rbind(pairs.df,df) } } } } pairs.df$tag1<-as.character(pairs.df$tag1) pairs.df$tag2<-as.character(pairs.df$tag2) # Generating JSON compatible with the force-directed graph http://bl.ocks.org/mbostock/4062045 languages<-c("de","en","it","fr","es") languages.img<-c("https://cdn3.iconfinder.com/data/icons/finalflags/32/Germany-Flag.png", "https://cdn3.iconfinder.com/data/icons/finalflags/32/United-Kingdom-flag.png", "https://cdn3.iconfinder.com/data/icons/finalflags/32/Italy-Flag.png", "https://cdn3.iconfinder.com/data/icons/finalflags/32/France-Flag.png", "https://cdn3.iconfinder.com/data/icons/finalflags/32/Spain-Flag.png") img.url<-"https://cdn4.iconfinder.com/data/icons/miu/22/editor_pencil_pen_edit_write_-16.png" listNodes<-NULL nodeNr<-0 # Genereting the Nodes strJson<-'{ "nodes":[' for(j in 1:length(languages)) { lang<-agg.languages[agg.languages$lang==languages[j],] df<-data.frame(tag=languages[j],noderNr=nodeNr) listNodes<-rbind(df,listNodes) # entry for the language dt<-paste0('{"name":"', languages[j],'","group":',j,', "image":"',languages.img[j] ,'", "count":', sum(lang$count) ,', "number":', nodeNr ,' },' ) nodeNr<-nodeNr+1 strJson<-paste0(strJson,dt) for(i in 1:nrow(lang)) { if (nrow(listNodes[listNodes$tag==lang[i,]$tag,])==0 && !(lang[i,]$tag %in% languages)) { dt<-paste0('{"name":"', lang[i,]$tag,'","group":',j,', "image":"',img.url ,'", "count":', lang[i,]$count ,', "number":', nodeNr ,' },' ) strJson<-paste0(strJson,dt) df<-data.frame(tag=lang[i,]$tag,noderNr=nodeNr) listNodes<-rbind(df,listNodes) nodeNr<-nodeNr+1 } } } strJson<-substr(strJson,1,nchar(strJson)-1) # Genereting the Links strJson<-paste0(strJson,'], "links": [') for (l in 1:nrow(agg.languages)) { matches1<-subset(listNodes, tag==agg.languages[l,]$lang) matches2<-subset(listNodes, tag==agg.languages[l,]$tag) for (k in 1:nrow(matches1)) { for (m in 1:nrow(matches2)) { dt<-paste0('{"source":',matches1[k,]$noderNr,',"target":',matches2[m,]$noderNr,',"value": 0},') strJson<-paste0(strJson,dt) } } } for (l in 1:nrow(pairs.df)) { matches1<-subset(listNodes, tag==pairs.df[l,]$tag1) matches2<-subset(listNodes, tag==pairs.df[l,]$tag2) for (k in 1:nrow(matches1)) { for (m in 1:nrow(matches2)) { dt<-paste0('{"source":',matches1[k,]$noderNr,',"target":',matches2[m,]$noderNr,',"value": 0},') strJson<-paste0(strJson,dt) } } } strJson<-substr(strJson,1,nchar(strJson)-1) strJson<-paste0(strJson,']}') # Writing it to a file write(x = strJson, file = '~/jesuischarlie/jesuischarlie2.json') |
1 Response
[…] If you want to understand the way the #jesuischarlie voices have been pulled together, read the “Making of #jesuischarlie plot” […]