I want to build a scraper for Twitter and have been looking online for resources that give a rough idea of how to go about it. I found quite a few gems, such as TweetStream and twitter-scraper, but haven't seen a single full-fledged implementation to study for reference. Can someone explain the overall approach? I'm familiar with Ruby, but I'm not sure how to proceed with the logic.
Then you should be able to use the Ruby Twitter client to request data from the Twitter streams etc.
I don't use Ruby myself, but I would assume there are decent Twitter clients for Ruby, as there are for most languages. I've used Twitter clients for Python, Java and Scala and they're all fairly similar.
You need to set up an "application" under your Twitter account and request a set of API keys, which you then use as your credentials inside your client program. These give you permission to access Twitter's various stream interfaces and request a feed of live Tweets, for example. The Twitter stuff is documented here:
Thanks for the link to the blog. I've already registered my application and obtained the 2 keys and their secrets. I was planning on storing the tweets in a NoSQL database like CouchDB, but MongoDB looks like a good alternative.
I hadn't thought about using Rails to create the scraper, though, and I think it would add some complexity to the design. Apart from Ruby, which language would you recommend for creating a Twitter scraper? And is it easy to create and manage?
Well, it's up to you. I like Python for this kind of thing, as it's easy to use, it handles JSON well, and the Twitter clients are pretty straightforward as far as I recall. Also, the PyMongo driver makes it really easy to work with MongoDB.
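To give a rough idea of the storage side, here is a minimal Python sketch, not a full scraper. It assumes each tweet arrives from your Twitter client as one line of JSON, and that the tweet has the usual `id_str`, `text`, `user.screen_name` and `created_at` fields; check those against the payload you actually receive. The helper names `tweet_to_document` and `store_tweets` are made up for this example. The `collection` argument is anything with a PyMongo-style `replace_one` method, so the Twitter client and the database connection stay out of the parsing logic.

```python
import json


def tweet_to_document(raw_line):
    """Parse one JSON-encoded tweet and keep a few fields.

    The field names are assumptions about the tweet payload.
    """
    tweet = json.loads(raw_line)
    return {
        "_id": tweet["id_str"],  # reuse the tweet ID as the MongoDB key
        "text": tweet.get("text", ""),
        "user": tweet.get("user", {}).get("screen_name"),
        "created_at": tweet.get("created_at"),
    }


def store_tweets(lines, collection):
    """Upsert each tweet into the collection and return how many were processed.

    Upserting by _id means a tweet delivered twice is stored only once.
    """
    processed = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines, which a stream may send as keep-alives
        doc = tweet_to_document(line)
        collection.replace_one({"_id": doc["_id"]}, doc, upsert=True)
        processed += 1
    return processed


# With PyMongo installed and a local MongoDB running, usage might look like:
#   from pymongo import MongoClient
#   collection = MongoClient()["twitter"]["tweets"]  # database/collection names are placeholders
#   store_tweets(lines_from_your_twitter_client, collection)
```

Keeping the tweet ID as `_id` is just one design choice; it makes re-runs idempotent, at the cost of tying the document key to Twitter's identifier.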