How to Write a MapReduce Program Using Hadoop to Find Positive and Negative Comments

 
Greenhorn
Posts: 2
Hi,
I want to write MapReduce code to count the positive and negative comments about a product or a social media site. Please help me; I am new to Hadoop programming.
 
Sheriff
Posts: 5555
326
IntelliJ IDE Python Java Linux
Doesn't sound much like a Hadoop problem. The difficulty in this problem is being able to correctly identify positive and negative intent from written text. Do you have any idea how you might do that?
 
saini kumar
Greenhorn
Posts: 2

Tim Cooke wrote:Doesn't sound much like a Hadoop problem. The difficulty in this problem is being able to correctly identify positive and negative intent from written text. Do you have any idea how you might do that?
No Tim, I don't have any idea about this; I am new to Hadoop. Could you help me with it?
 
Tim Cooke
Sheriff
Posts: 5555
326
IntelliJ IDE Python Java Linux
Forget about Hadoop; your problem has nothing to do with Hadoop.

I'm afraid nobody is going to do the work for you, so I don't think you're going to get very far with such a broad question for which you have shown no effort to explore things yourself.

I would recommend doing some research on the topic of Natural Language Processing (NLP), which is a branch of Artificial Intelligence. I don't know anything about it myself, but I can see straight away that it is a non-trivial subject and would take me many, many hours to obtain even a foundational level of knowledge.

Is this a work assignment? A school assignment? Just for fun? If it's a work or school assignment, I would expect there to be some knowledge available within your peer group or faculty staff to help get you started. If it's just for fun, then you're going to spend a lot of time on Google.
 
Tim Cooke
Sheriff
Posts: 5555
326
IntelliJ IDE Python Java Linux
Last night I was at a Gradle workshop put on by one of our local software & training consultancy shops, and I bumped into a buddy of mine, Gordon, who works with Hadoop at another company across town. We discussed this question for a bit, and he tells me that this is just the sort of problem Hadoop is good for. The problem of how to identify a positive or negative comment remains, but Hadoop will help with processing huge amounts of data. I know very little about it myself, so I'll just pass on the info he sent me this afternoon:

Tim's mate Gordon wrote: There are two main parts to that guy's problem.

1) First of all, Hadoop won't collect the data he is looking for; it's only a tool for processing large amounts of data in a distributed fashion, so he'll need to collect the tweets/posts he wants to process first. This can be done using something like Spring Integration, following this guide: http://spring.io/guides/gs/integration/

2) Once the data is collected, it needs to be processed with a map-reduce function. This processing may be done in Hadoop, but if it's not a large amount of data (less than 1 GB), it may actually be slower to process it with Hadoop, given that Hadoop would have to distribute the tasks of the job across different servers.

He'd basically need a mechanism for deciding which posts are good and which are bad, based on the input file created in step 1; that would be the mapping. Reducing would then squash that information down into sensible output, such as the number of good or bad posts, or how many instances of each word were used, etc. He might then want to run further map-reduce jobs on this data to glean even more information from it.

If he were to use Hadoop, the data files from step 1 would just be loaded into HDFS using `hadoop fs -put <local src> <destination>`. Once the data is in HDFS, he'd be able to run his map-reduce function using `hadoop jar <user-created-map-red-function>.jar <input args> <output args>`.

I'd imagine his first problem in practice will be setting up Hadoop...


So there it is. Unfortunately I haven't worked with Hadoop at all, so I'll be no help with any follow-up questions, but I thought it worth passing that on in any case.
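To make the mapping and reducing Gordon describes a bit more concrete, here is a minimal, Hadoop-free Python sketch of the two steps. The word lists and example posts are made-up placeholders; a real job would load proper lexicons and plug the same logic into a Hadoop mapper and reducer (or run the functions via Hadoop Streaming).

```python
# Tiny illustrative word lists -- stand-ins for real sentiment lexicons.
POSITIVE = {"good", "great", "love", "excellent", "awesome"}
NEGATIVE = {"bad", "terrible", "hate", "awful", "poor"}

def map_post(post):
    """Map step: emit a (label, 1) pair for one post based on word hits."""
    words = set(post.lower().split())
    pos = len(words & POSITIVE)
    neg = len(words & NEGATIVE)
    if pos > neg:
        return ("positive", 1)
    if neg > pos:
        return ("negative", 1)
    return ("neutral", 1)

def reduce_counts(pairs):
    """Reduce step: sum the 1s per label, like a word-count reducer."""
    totals = {}
    for label, count in pairs:
        totals[label] = totals.get(label, 0) + count
    return totals

posts = [
    "I love this product, it is great",
    "terrible quality, I hate it",
    "arrived on Tuesday",
]
print(reduce_counts(map_post(p) for p in posts))
# -> {'positive': 1, 'negative': 1, 'neutral': 1}
```

On a cluster, `map_post` would run per input line across the mappers and `reduce_counts` would run per key on the reducers; the logic itself stays the same.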
 
Ranch Hand
Posts: 63
The approach would be to have three reference data sets:

1> Positive words
2> Negative words
3> A confusion matrix

Then check whether a comment contains positive or negative words, and classify it as a positive or a negative comment accordingly.

A confusion matrix is a contingency table.

The challenges will be capturing positive words combined with negation, and sarcasm in comments.

Example: "This is not the best movie I have watched."

Here the word "not" flips the polarity of the positive word "best", so simple word matching alone would misclassify the comment.
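A sketch of how negation might be handled: flip the polarity of a positive word when a negator appears shortly before it. The word lists and the two-word window are illustrative assumptions, not a complete solution (sarcasm in particular defeats this kind of rule).

```python
POSITIVE = {"best", "good", "great"}   # placeholder positive-word set
NEGATORS = {"not", "never", "no"}      # placeholder negation words

def sentiment_hits(comment, window=2):
    """Count (positive, negative) hits, flipping polarity when a
    negator appears within `window` words before a positive word."""
    words = comment.lower().replace(",", "").split()
    pos = neg = 0
    for i, w in enumerate(words):
        if w in POSITIVE:
            if any(p in NEGATORS for p in words[max(0, i - window):i]):
                neg += 1  # e.g. "not the best" counts as negative
            else:
                pos += 1
    return pos, neg
```

For the example above, `sentiment_hits("This is not the best movie I have watched")` counts "best" as negative rather than positive.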

Can you share more information about your dataset? The computational power of Hadoop can help you process such a huge dataset; however, Hadoop will not do anything by itself.


Hope this helps

Thanks and Regards
Rajesh Nagaraju

 
Bartender
Posts: 2407
36
Scala Python Oracle Postgres Database Linux
You could start by working through the Hortonworks tutorials on sentiment analysis:

http://hortonworks.com/use-cases/sentiment-analysis-hadoop-example/

http://hortonworks.com/hadoop-tutorial/how-to-refine-and-visualize-sentiment-data/

Alternatively, there is a sentiment analysis example using the Google Prediction API.

If you're going to roll your own solution, the hard part is probably working out or finding a good set of tagged "sentiment" terms that will let you calculate a sentiment value for each tweet. Google around and you may be able to find a good set online somewhere.
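If you do find a tagged term set, calculating a per-tweet sentiment value can be as simple as summing word scores. A minimal sketch, assuming a word-to-score lexicon; the lexicon below is an invented placeholder, not a real sentiment dictionary.

```python
# Hypothetical lexicon mapping words to scores in [-1, 1].
LEXICON = {"love": 1.0, "great": 0.8, "ok": 0.2, "slow": -0.5, "hate": -1.0}

def tweet_score(tweet):
    """Sum the lexicon scores of the words in the tweet;
    unknown words contribute 0."""
    words = tweet.lower().split()
    return sum(LEXICON.get(w, 0.0) for w in words)

print(tweet_score("love it but shipping was slow"))  # -> 0.5
```

The sign of the total then gives a crude positive/negative label, which is exactly the per-record computation a mapper would perform.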

 