Monitoring Twitter Keywords and Sentiment Analysis – BITS Pilani

Last year I did a small project: monitoring keywords on Twitter, in the hope of getting better insight into my alma mater. Unfortunately it lasted only 9 months before Twitter axed IFTTT search support.

I collected 6311 tweets over a period of 9 months (January-August 2012). As expected, the sample data is full of noise, and users also keep deleting their tweets later.

Medium Used: IFTTT recipe to monitor Twitter keywords

Keywords Monitored: BITSPilani, BITSGoa, BITSHyd, BITSDubai, BITSAA, BITS-Pilani and tons of other variations

Steps:

  1. IFTTT emails the links of tweets matching those keywords.
  2. Extract the Gmail contents using a Python script.
  3. Using a library for the Twitter API, crawl the stats of each individual tweet. (The Twitter API is pretty simple; I wrote a Java class to do the task.)
  4. Twitter training dataset taken from ThinkNook.
  5. RapidMiner, run with -Xms2048m -Xmx3072m, took around 20 hours on the SVM model for the 0.1 million row dataset.
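
The Python script from step 2 is not shown here. As a rough sketch of the same idea in Java (to match the connection code below), this pulls tweet ids out of the text of an email. The class name, the method name and the assumed link shape (`twitter.com/<user>/status/<id>`, optionally with `#!/` or `statuses`) are my own illustration, not the original script.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: not the original Python script, just the same idea.
class TweetLinkExtractor {
    // Assumed link shape: http(s)://twitter.com/<user>/status/<numeric id>.
    // Older links may carry "#!/" after the host and "statuses" instead of "status".
    private static final Pattern TWEET_URL = Pattern.compile(
        "https?://(?:www\\.)?twitter\\.com/(?:#!/)?\\w+/status(?:es)?/(\\d+)");

    // Returns the distinct tweet ids found in the text, in order of first appearance.
    // Ids are kept as strings because they are passed straight into a URL later.
    static List<String> extractTweetIds(String emailBody) {
        Set<String> ids = new LinkedHashSet<>();
        Matcher m = TWEET_URL.matcher(emailBody);
        while (m.find()) {
            ids.add(m.group(1));
        }
        return new ArrayList<>(ids);
    }

    public static void main(String[] args) {
        // Made-up email text, for illustration only
        String body = "New tweet: http://twitter.com/someuser/status/123456789012345678\n"
            + "Another: https://twitter.com/#!/other_user/statuses/987654321\n"
            + "Duplicate: http://twitter.com/someuser/status/123456789012345678";
        System.out.println(extractTweetIds(body));
    }
}
```

Duplicates are dropped on purpose, since the same tweet can show up in more than one email.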

Basic Connection Code

// Requires: java.io.BufferedWriter, java.io.FileWriter,
//           java.net.HttpURLConnection, java.net.URL
// Twitter REST API v1 endpoint for a single tweet, looked up by its id
URL url = new URL("https://api.twitter.com/1/statuses/show.json?id="
		+ tweetid + "&include_entities=true");
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
conn.setRequestMethod("GET");
conn.setRequestProperty("Accept", "application/json");
if (conn.getResponseCode() != 200)
{
	// The tweet could not be fetched (e.g. it was deleted). Instead of
	// throwing, append a placeholder row of seven fields to the CSV.
	//throw new RuntimeException("Failed : HTTP error code : "
	// + conn.getResponseCode());
	try (BufferedWriter out = new BufferedWriter(
			new FileWriter("D:/project/twitter.csv", true)))
	{
		out.write("x----x,x----x,x----x,x----x,x----x,x----x,x----x");
		out.newLine();
	}
}
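
The snippet above only covers the failure branch. For the success branch, something along these lines could read the JSON body and quote it for a CSV row. This is a minimal sketch, not the class I actually used: `ResponseReader`, `readBody` and `csvField` are hypothetical names, and the JSON in `main` is made up.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for the HTTP 200 case.
class ResponseReader {
    // Reads the whole stream as UTF-8 text, e.g. the stream from conn.getInputStream().
    static String readBody(InputStream in) {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    // Quotes a value for a CSV row: wrap it in double quotes and double any
    // embedded quotes, so commas inside a tweet do not split the row.
    static String csvField(String value) {
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    public static void main(String[] args) {
        // Stand-in for conn.getInputStream(); the JSON here is invented
        InputStream fake = new ByteArrayInputStream(
            "{\"id\":1,\"text\":\"hello, BITS\"}".getBytes(StandardCharsets.UTF_8));
        System.out.println(csvField(readBody(fake)));
    }
}
```

Picking individual fields out of the JSON would need a JSON library, which is left out here.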

Performance Vector (SVM Model)

               true 0    true 1    class precision
pred. 0        24042     9922      70.79%
pred. 1        19482     46537     70.49%
class recall   55.24%    82.43%
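
For anyone who wants to check the percentages, they follow directly from the four counts: precision divides along a row (all tweets predicted as a class), recall divides down a column (all tweets truly in a class). Here is a small sketch that recomputes them. The class and method names are my own; overall accuracy is not reported by the table above, but the same counts give roughly 70.6%.

```java
// Hypothetical helper: recomputes the RapidMiner figures from the raw counts.
class ConfusionMatrixMetrics {
    // Counts from the performance vector: rows are predictions, columns are true labels.
    static final long PRED0_TRUE0 = 24042;
    static final long PRED0_TRUE1 = 9922;
    static final long PRED1_TRUE0 = 19482;
    static final long PRED1_TRUE1 = 46537;

    // Share of "hits" among hits plus misses.
    static double ratio(long hits, long misses) {
        return (double) hits / (hits + misses);
    }

    static double precision0() { return ratio(PRED0_TRUE0, PRED0_TRUE1); }
    static double precision1() { return ratio(PRED1_TRUE1, PRED1_TRUE0); }
    static double recall0()    { return ratio(PRED0_TRUE0, PRED1_TRUE0); }
    static double recall1()    { return ratio(PRED1_TRUE1, PRED0_TRUE1); }

    // Correct predictions over all predictions.
    static double accuracy() {
        return ratio(PRED0_TRUE0 + PRED1_TRUE1, PRED0_TRUE1 + PRED1_TRUE0);
    }

    public static void main(String[] args) {
        System.out.printf("precision 0: %.2f%%%n", 100 * precision0());
        System.out.printf("precision 1: %.2f%%%n", 100 * precision1());
        System.out.printf("recall 0:    %.2f%%%n", 100 * recall0());
        System.out.printf("recall 1:    %.2f%%%n", 100 * recall1());
        System.out.printf("accuracy:    %.2f%%%n", 100 * accuracy());
    }
}
```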

Stats

Top 10 Positive and Negative words

positive word   weight                  negative word   weight
thank           0.06800427050495744     sad             0.06904954519705979
love            0.04238921785592977     miss            0.06799716497097386
good            0.03864780316342833     sorri           0.06447410364223946
great           0.03332699835307452     wish            0.04964308132602499
quot            0.028049576202737663    suck            0.04549754050714666
welcom          0.028045093611976712    bad             0.03882145370669514
awesom          0.027883840586310205    hate            0.038814744730334146
haha            0.027711586964757735    work            0.038456277249749565
nice            0.026502431781819224    poor            0.03537374379337165
happi           0.024842171425360552    want            0.03312521661076012

Total Tweets Ratio

Positive Tweets 4759
Negative Tweets 1552

Top 5 Most RTed Tweets

Head over to GitHub for the project datasets and analysis.
