Tim Cooke wrote:Doesn't sound much like a Hadoop problem. The difficulty in this problem is being able to correctly identify positive and negative intent from written text. Do you have any idea how you might do that?
No Tim, I don't have any idea for this. I am new to Hadoop. Could you help me with this?
Forget about Hadoop, your problem has nothing to do with Hadoop.
I'm afraid nobody is going to do the work for you, so I don't think you're going to get very far with such a broad question when you have shown no effort to explore it yourself.
I would recommend doing some research on the topic of Natural Language Processing (NLP), which is a branch of Artificial Intelligence. I don't know anything about it myself, but I can see straight away that it is a non-trivial subject and would take me many, many hours to obtain even a foundational level of knowledge.
Is this a work assignment? A school assignment? Just for fun? If it's a work or school assignment then I would expect there to be some knowledge available within your peer group or faculty staff to help get you started. If it's just for fun, then you're going to spend a lot of time on Google.
Last night I was at a Gradle workshop put on by one of our local software & training consultancy shops, and I bumped into a buddy of mine, Gordon, who works with Hadoop at another company across town. We discussed this question for a bit and he tells me this is just the sort of problem Hadoop is good for. The problem of how to identify a positive or negative comment remains, but Hadoop will help with processing huge amounts of data. I know very little about it myself, so I'll just pass on the info he sent me this afternoon:
Tim's mate Gordon wrote:There are two main parts to that guy's problem.
1) First of all, Hadoop won't perform the action of collecting the data he is looking for; it's only a tool for processing large amounts of data in a distributed fashion, so he'll need to collect the tweets/posts he wants to process first. This can be done using something like Spring Integration, following this guide: http://spring.io/guides/gs/integration/ (one rough alternative way to do it is sketched just after this list).
2) Once the data is collected, it needs to be processed using a map reduce function. This processing may be done in Hadoop, but if it's not a large amount of data, say less than 1 GB, then it may actually be slower to process it with Hadoop, given the overhead of assigning the job's tasks out to different servers.
He'd basically need a mechanism for deciding which posts are good and which are bad, based on the input file created in step 1; that would be the mapping. Reducing would then squash that information down into sensible output, such as the number of good or bad posts, or how many instances of each word were used, etc. He might then want to run further map reduce jobs on this data to glean even more information from it.
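To make step 1 concrete, here is a rough sketch of collecting tweets with Twitter4J, which is just one alternative to the Spring Integration route above. The search term and output file name are made up for illustration, and it assumes Twitter API credentials are already configured in a twitter4j.properties file:

```java
import java.io.PrintWriter;
import twitter4j.Query;
import twitter4j.QueryResult;
import twitter4j.Status;
import twitter4j.Twitter;
import twitter4j.TwitterFactory;

public class TweetCollector {
    public static void main(String[] args) throws Exception {
        // Reads credentials from twitter4j.properties on the classpath
        Twitter twitter = TwitterFactory.getSingleton();
        Query query = new Query("hadoop"); // topic to search for (placeholder)
        query.setCount(100);               // tweets per page

        QueryResult result = twitter.search(query);
        try (PrintWriter out = new PrintWriter("tweets.txt", "UTF-8")) {
            for (Status status : result.getTweets()) {
                // One tweet per line, newlines stripped so each line is one record
                out.println(status.getText().replace('\n', ' '));
            }
        }
    }
}
```

Each tweet lands on its own line, which is exactly the sort of one-record-per-line input a map reduce job likes.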
If he were to use Hadoop, the data files from step 1 would just be loaded into HDFS using `hadoop fs -put <local src> <destination>`. Once the data is on HDFS, he'd be able to run his map reduce function using `hadoop jar <user-created-map-red-function>.jar <input args> <output args>`.
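To give a feel for what that user-created jar might contain, here is a minimal sketch of a complete job along the lines described above. The class names, the hard-coded good/bad word lists, and the simple scoring rule are all invented for illustration; a real job would load a proper sentiment word list instead:

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SentimentCount {

    // Map: emit ("good", 1), ("bad", 1), or ("neutral", 1) per tweet,
    // based on a toy word list baked in for illustration.
    public static class SentimentMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final Set<String> GOOD =
                new HashSet<>(Arrays.asList("love", "great", "awesome"));
        private static final Set<String> BAD =
                new HashSet<>(Arrays.asList("hate", "awful", "terrible"));
        private static final IntWritable ONE = new IntWritable(1);

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            int score = 0;
            for (String word : value.toString().toLowerCase().split("\\W+")) {
                if (GOOD.contains(word)) score++;
                if (BAD.contains(word)) score--;
            }
            if (score > 0) context.write(new Text("good"), ONE);
            else if (score < 0) context.write(new Text("bad"), ONE);
            else context.write(new Text("neutral"), ONE);
        }
    }

    // Reduce: sum the counts for each label.
    public static class SumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values,
                Context context) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) sum += v.get();
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "sentiment count");
        job.setJarByClass(SentimentCount.class);
        job.setMapperClass(SentimentMapper.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

With that built into a jar, the two commands above become something like `hadoop fs -put tweets.txt /input` followed by `hadoop jar sentiment.jar SentimentCount /input /output` (file and path names again made up).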
I'd imagine his first problem will be setting up Hadoop in reality...
So there it is. Unfortunately I haven't worked with Hadoop at all, so I'm going to be no help with any follow-up questions, but I thought it worth passing on in any case.
If you're going to roll your own solution, the hard part is probably working out or finding a good set of tagged "sentiment" terms that will let you calculate a sentiment value for each tweet. Google around and you may be able to find a good set online somewhere.
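If you do track down such a list, the scoring step itself can stay very simple. Here's a minimal sketch that assumes a made-up tab-separated lexicon file of term and score pairs; the file name and format are purely illustrative:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

public class SentimentScorer {
    private final Map<String, Integer> lexicon = new HashMap<>();

    // Load a tab-separated file of "term<TAB>score" entries, e.g. "great\t3".
    public SentimentScorer(String lexiconFile) throws IOException {
        for (String line : Files.readAllLines(Paths.get(lexiconFile))) {
            String[] parts = line.split("\t");
            if (parts.length == 2) {
                lexicon.put(parts[0].toLowerCase(),
                        Integer.parseInt(parts[1].trim()));
            }
        }
    }

    // Sum the scores of every known word; the sign gives the sentiment.
    public int score(String tweet) {
        int total = 0;
        for (String word : tweet.toLowerCase().split("\\W+")) {
            total += lexicon.getOrDefault(word, 0);
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        SentimentScorer scorer = new SentimentScorer("sentiment-lexicon.tsv");
        System.out.println(scorer.score("I love this, it's great!")); // positive
        System.out.println(scorer.score("This is awful."));           // negative
    }
}
```

Summing per-word scores like this is crude (it ignores negation like "not good"), but it's enough to get a first positive/negative signal per tweet.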