
Reading website text

 
Farakh khan
Ranch Hand
Posts: 833
I wrote a program using a Java Servlet that can extract email addresses from text and Word files.

I want to improve it so that it can also read emails from the web, e.g.:

1) The user enters the keyword(s)

2) The user chooses whether to search for the keyword(s) through a search engine or on a specific URL

3) The extracted emails are written out as a *.txt file

Can you please tell me how I can read emails from the web with a Java Servlet?

Thanks & best regards
 
Ashish Hareet
Ranch Hand
Posts: 375
Farakh,

Reading a web page is essentially the same as reading any other file; only the I/O classes differ. You can use the java.net.URL class to obtain a stream to the resource.
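
A minimal sketch of that idea, assuming the page is served as UTF-8 text. The class name PageEmailExtractor, its method names, and the simplified email pattern are made up for illustration; the pattern is not a complete address matcher.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PageEmailExtractor {
    // Simplified pattern, an assumption for illustration only.
    private static final Pattern EMAIL =
            Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}");

    // Reads a stream line by line, exactly as one would read a local file.
    static String readAll(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    // Collects every distinct match, in order of first appearance.
    static Set<String> extractEmails(String text) {
        Set<String> found = new LinkedHashSet<>();
        Matcher m = EMAIL.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    // Uses java.net.URL to obtain the stream to the web resource.
    static Set<String> extractEmailsFromUrl(String address) throws IOException {
        try (InputStream in = URI.create(address).toURL().openStream()) {
            return extractEmails(readAll(in));
        }
    }

    public static void main(String[] args) throws IOException {
        // Pass the page address as the first argument.
        for (String email : extractEmailsFromUrl(args[0])) {
            System.out.println(email);
        }
    }
}
```

Inside a servlet, the same extractEmailsFromUrl call could be made from doGet or doPost and the result written to the response or to a *.txt file.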

To make parsing easier, you can treat the web pages as XML resources and use a DOM or SAX parser, provided the pages are well-formed XML (XHTML). Have a look at the javax.xml.parsers.SAXParser class as a starting point.
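
As a rough sketch of the SAX approach, assuming the page is well-formed XHTML and is already held in a String: the handler below keeps only the character data and drops the markup. The class name XhtmlTextReader and the method textOf are hypothetical names chosen for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class XhtmlTextReader {
    // Returns the text content of a well-formed XHTML document.
    static String textOf(String xhtml) {
        StringBuilder text = new StringBuilder();
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new InputSource(new StringReader(xhtml)), new DefaultHandler() {
                @Override
                public void characters(char[] ch, int start, int length) {
                    // Called for text between tags, possibly in several chunks.
                    text.append(ch, start, length);
                }

                @Override
                public void endElement(String uri, String localName, String qName) {
                    // Separate the text of adjacent elements with a space.
                    text.append(' ');
                }
            });
        } catch (Exception e) {
            // Pages that are not well-formed XML end up here.
            throw new IllegalStateException("Could not parse page as XML", e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        System.out.println(textOf(
                "<html><body><p>Mail ann@example.com</p><p>now</p></body></html>"));
    }
}
```

Many real pages are not well-formed XML, and a page with a DOCTYPE may cause the parser to try to load the referenced DTD, so this only works as-is on clean XHTML.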

HTH
Ashish Hareet
 
Norm Radder
Ranch Hand
Posts: 883
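By reading emails, do you mean connecting to an email server and reading the messages it holds for a user, the way Outlook Express or Thunderbird does? To do this I think you need to understand the mail protocols. SMTP, which handles sending, is documented in RFC 821 and RFC 1869; retrieving messages from a server is normally done with POP3 or IMAP.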
 
Farakh khan
Ranch Hand
Posts: 833
Thanks a lot, Ashish Hareet.
 