<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Tobias &#38; Tobias &#187; twitter</title>
	<atom:link href="http://blog.tobias.tv/tag/twitter/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.tobias.tv</link>
	<description>Company blog of T&#38;T</description>
	<lastBuildDate>Thu, 02 Feb 2012 10:01:39 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.5</generator>
		<item>
		<title>Using Google Spreadsheets to scrape Twitter data</title>
		<link>http://blog.tobias.tv/2010/01/21/google-spreadsheets-twitte/</link>
		<comments>http://blog.tobias.tv/2010/01/21/google-spreadsheets-twitte/#comments</comments>
		<pubDate>Thu, 21 Jan 2010 10:16:55 +0000</pubDate>
		<dc:creator>Brendan</dc:creator>
				<category><![CDATA[Technology]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[twitter]]></category>
		<category><![CDATA[walkthroughs]]></category>

		<guid isPermaLink="false">http://blog.tobias.tv/?p=48</guid>
		<description><![CDATA[Find out how to use Google Documents as a Twitter search engine, extracting tweets into a useful spreadsheet format. You'll be able to see tweets that link to any URL as well as tweets containing any keyword you specify.]]></description>
			<content:encoded><![CDATA[<p>A while ago I was looking for ways to scrape Twitter search data in a structured, easily manageable format. The two APIs I was using (<a href="http://apiwiki.twitter.com/Search+API+Documentation">Twitter Search</a> and <a href="http://backtweets.com/api">Backtweets</a>) were giving good results &#8211; but as a non-developer I couldn&#8217;t do much with the raw data they returned. Instead, I needed to get the data into a format like CSV or XLS.</p>
<p>Some extensive googling led me to <a href="http://www.labnol.org/internet/monitor-web-pages-changes-with-google-docs/4536/">this extremely useful post on Labnol</a>, where I learnt how to use the <a href="http://docs.google.com/support/bin/answer.py?hl=en&amp;answer=75507">ImportXML function</a> in <a href="http://www.google.com/docs">Google Spreadsheets</a>. Before too long I&#8217;d cracked my problem. In this post I&#8217;m going to explain how you can do it too.</p>
<h3>Data you can extract from Twitter</h3>
<p>This walkthrough will teach you how to extract two types of Twitter data using Google Spreadsheets &#8211; <strong>tweets</strong> and <strong>links</strong>.</p>
<p><strong>Tweets</strong> are extracted using the Twitter Search API in conjunction with ImportFeed. This allows Twitter search results to be extracted into a spreadsheet format.</p>
<p><strong>Links</strong> are extracted using the Backtweets API in conjunction with ImportXML. The Backtweets API allows you to find any links posted on Twitter even if they&#8217;ve been shortened using services like bit.ly or tinyurl.</p>
<h3>I&#8217;m in a hurry, can I just do this right now?</h3>
<p>If you just want to do it &#8211; rather than learn <em>how</em> to do it &#8211; <a href="http://spreadsheets.google.com/ccc?key=0Ash8H8PmYM6JdENrNnkzX3l0ZkI2d2ZmZHBiOGtKNHc&amp;hl=en" target="_blank">open this Google spreadsheet I&#8217;ve created</a>. You&#8217;ll need to make your own copy so you can edit it. Instructions can be found in the spreadsheet itself.</p>
<h3>How to extract tweets containing links</h3>
<p>The instructions below will help you create a Google Spreadsheet that pulls in and displays the time, username and text of all tweets containing links to a specified page. Because it uses Backtweets, these tweets will be retrieved even if they used shortened URLs from services like <a href="http://www.bit.ly">bit.ly</a> or <a href="http://www.tinyurl.com">tinyurl</a>.</p>
<ol>
<li>Create a new spreadsheet in Google Documents.</li>
<li>Enter column labels in this order: &#8220;Search criteria&#8221;, &#8220;Timestamp&#8221;, &#8220;Username&#8221; and &#8220;Tweet text&#8221; in cells A1 to D1.</li>
<li>In cell B2, underneath Timestamp, insert the following formula:<br />
<blockquote><p>=ImportXML(&quot;http://backtweets.com/search.xml?itemsperpage=100&amp;since_id=1255588696&amp;key=key&amp;q=&quot;&amp;A2,&quot;//tweet_created_at&quot;)</p></blockquote>
</li>
<li>In cell C2, underneath Username, insert the following formula:<br />
<blockquote><p>=ImportXML(&quot;http://backtweets.com/search.xml?itemsperpage=100&amp;since_id=1255588696&amp;key=key&amp;q=&quot;&amp;A2,&quot;//tweet_from_user&quot;)</p></blockquote>
</li>
<li>In cell D2, underneath Tweet text, insert the following formula:<br />
<blockquote><p>=ImportXML(&quot;http://backtweets.com/search.xml?itemsperpage=100&amp;since_id=1255588696&amp;key=key&amp;q=&quot;&amp;A2,&quot;//tweet_text&quot;)</p></blockquote>
</li>
<li>Now paste a search query into cell A2 &#8211; say, <strong>http://www.google.com</strong>. After a few seconds, you should see columns B, C and D fill up with tweets, looking something like the image below:
<p style="text-align: center;"><a href="http://www.brelson.com/wp-content/uploads/2010/01/gdocs-backtweets.png"><img class="aligncenter size-full wp-image-310" src="http://www.brelson.com/wp-content/uploads/2010/01/gdocs-backtweets.png" alt="Google Spreadsheet showing Backtweets results" width="600" height="96" /></a></p></li>
<li>The formulas pasted into cells B2, C2 and D2 all reference the URL in cell A2. This means that whenever you paste anything new into A2, the search results should refresh.</li>
<li>Also, you can paste parts of URLs into A2 &#8211; not just entire ones. This is useful for seeing all links to a specific directory on your site, for example.</li>
</ol>
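<p>For anyone who would rather see the same idea outside a spreadsheet, here is a minimal Python sketch of what the three formulas do: build the Backtweets URL from the search term, then collect every <code>tweet_created_at</code>, <code>tweet_from_user</code> and <code>tweet_text</code> element. The sample response is made up and its layout is an assumption; only the three element names come from the formulas above. The helper names are hypothetical, and unlike the formulas, the sketch percent-encodes the search term.</p>

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

BASE_URL = "http://backtweets.com/search.xml"


def build_url(query, api_key="key"):
    """Assemble the URL the formulas build from cell A2.

    urlencode also percent-encodes the query, which the formulas do not.
    """
    params = {
        "itemsperpage": 100,
        "since_id": 1255588696,
        "key": api_key,  # placeholder value, copied from the formulas
        "q": query,
    }
    return BASE_URL + "?" + urlencode(params)


def column(root, tag):
    """Mimic the XPath //tag: the text of every matching element, in order."""
    return [element.text for element in root.iter(tag)]


def sample_response():
    """Build a made-up response tree (assumed layout, not real data)."""
    root = ET.Element("tweets")
    rows = [
        ("2010-01-21 10:00:00", "example_user", "Reading http://bit.ly/abc"),
        ("2010-01-21 10:05:00", "another_user", "See http://www.google.com"),
    ]
    for created, user, text in rows:
        tweet = ET.SubElement(root, "tweet")
        ET.SubElement(tweet, "tweet_created_at").text = created
        ET.SubElement(tweet, "tweet_from_user").text = user
        ET.SubElement(tweet, "tweet_text").text = text
    return root


root = sample_response()
timestamps = column(root, "tweet_created_at")
usernames = column(root, "tweet_from_user")
texts = column(root, "tweet_text")

# Each tuple corresponds to one spreadsheet row across columns B, C and D.
for row in zip(timestamps, usernames, texts):
    print(row)
```

<p>To run this against live data you would fetch <code>build_url(...)</code> and parse the response instead of calling <code>sample_response()</code>; that part is left out because it depends on a valid API key.</p>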
<p>Finally, this tool can only extract 100 results at a time &#8211; but it is possible to set it up to retrieve more than that. Look at my <a href="http://spreadsheets.google.com/ccc?key=0Ash8H8PmYM6JdENrNnkzX3l0ZkI2d2ZmZHBiOGtKNHc&amp;hl=en">sample Google Spreadsheet</a> if you want to do this.</p>
<h3>Extracting tweets from Twitter search results</h3>
<p>The method is much the same as above, but it uses the ImportFeed function instead of ImportXML.</p>
<ol>
<li>Create a new spreadsheet in Google Documents.</li>
<li>Enter column labels in this order: &#8220;Search criteria&#8221;, &#8220;Timestamp&#8221;, &#8220;Username&#8221; and &#8220;Tweet text&#8221;. For the rest of this walkthrough, I&#8217;m going to assume that these labels are in cells A1 to D1, but in reality you can put them wherever you like.</li>
<li>In cell B2, underneath Timestamp, insert the following formula:<br />
<blockquote><p>=ImportFeed(&quot;http://search.twitter.com/search.atom?rpp=20&amp;page=1&amp;q=&quot;&amp;A2, &quot;items created&quot;)</p></blockquote>
</li>
<li>In cell C2, underneath Username, insert the following formula:<br />
<blockquote><p>=ImportFeed(&quot;http://search.twitter.com/search.atom?rpp=20&amp;page=1&amp;q=&quot;&amp;A2, &quot;items author&quot;)</p></blockquote>
</li>
<li>In cell D2, underneath Tweet text, insert the following formula:<br />
<blockquote><p>=ImportFeed(&quot;http://search.twitter.com/search.atom?rpp=20&amp;page=1&amp;q=&quot;&amp;A2, &quot;items title&quot;)</p></blockquote>
</li>
<li>Type a search query into cell A2 &#8211; say, &#8220;Hoth&#8221;. Hit enter and the results will load. It should look something like this:<br />
<a href="http://www.brelson.com/wp-content/uploads/2010/01/gdocs-twittersearch.png"><img class="aligncenter size-full wp-image-314" title="gdocs-twittersearch" src="http://www.brelson.com/wp-content/uploads/2010/01/gdocs-twittersearch.png" alt="Google Spreadsheets with data from Twitter search" width="600" height="143" /></a></li>
<li>The search will break if you type characters like <strong>#</strong> or <strong>@</strong> directly into the query. To get around this, type <strong>%23</strong> instead of <strong>#</strong> and <strong>%40</strong> instead of <strong>@</strong>. This will allow you to search for hashtags and usernames.</li>
</ol>
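<p>The <strong>%23</strong> and <strong>%40</strong> substitutions in the last item of the list above are ordinary URL percent-encoding. As a rough illustration, this Python sketch encodes a search term and rebuilds the URL used in the ImportFeed formulas. The function names and the example username are hypothetical.</p>

```python
from urllib.parse import quote, urlencode

SEARCH_URL = "http://search.twitter.com/search.atom"


def encode_query(query):
    """Percent-encode a search term so # becomes %23 and @ becomes %40."""
    return quote(query, safe="")


def search_url(query, page=1, rpp=20):
    """Rebuild the URL from the ImportFeed formulas with an encoded query."""
    params = {"rpp": rpp, "page": page, "q": query}
    # quote_via=quote encodes spaces as %20 rather than a plus sign.
    return SEARCH_URL + "?" + urlencode(params, quote_via=quote)


print(encode_query("#hoth"))
print(encode_query("@example_user"))
print(search_url("#hoth"))
```

<p>In the spreadsheet you still type the encoded text into A2 by hand; the sketch only shows where those codes come from.</p>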
<p>I haven&#8217;t been successful in generating more than 20 search results per request, but you can get around this using the page number parameter in the ImportFeed query string. See <a href="http://spreadsheets.google.com/ccc?key=0Ash8H8PmYM6JdENrNnkzX3l0ZkI2d2ZmZHBiOGtKNHc&amp;hl=en">my own Google spreadsheet</a> to find out how to do this.</p>
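<p>To give a feel for the page number trick, here is a small Python sketch that lists one search URL per page of 20 results. It assumes that simply counting up the <code>page</code> parameter is enough, which may not match exactly what the sample spreadsheet does; the function name is hypothetical.</p>

```python
from urllib.parse import quote, urlencode

SEARCH_URL = "http://search.twitter.com/search.atom"
RESULTS_PER_PAGE = 20  # the rpp value used in the formulas


def page_urls(query, wanted):
    """Return one search URL per page needed to cover `wanted` results."""
    pages = -(-wanted // RESULTS_PER_PAGE)  # ceiling division
    urls = []
    for page in range(1, pages + 1):
        params = {"rpp": RESULTS_PER_PAGE, "page": page, "q": query}
        urls.append(SEARCH_URL + "?" + urlencode(params, quote_via=quote))
    return urls


# 50 results at 20 per page needs three requests.
for url in page_urls("Hoth", 50):
    print(url)
```

<p>In a spreadsheet, one possible layout is to place each page&#8217;s formulas 20 rows below the previous page&#8217;s, so the results stack up in a single column.</p>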
<p>I hope these instructions are useful &#8211; if you have any comments, questions or feedback, please let me know in the comments.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.tobias.tv/2010/01/21/google-spreadsheets-twitte/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

