Reading website text

Farakh khan
Ranch Hand

Joined: Mar 22, 2008
Posts: 732
I wrote an application using a Java Servlet that can extract emails from text and Word files.

I want to improve it so that it can also read emails from the web, e.g.:

1) The user enters the keyword(s)

2) The user chooses whether to search for the keyword(s) through a search engine or at a specific URL

3) The extracted emails are written to a *.txt output file

Can you please tell me how I can read emails on the web with a Java Servlet?

Thanks & best regards
Ashish Hareet
Ranch Hand

Joined: Jul 14, 2001
Posts: 375
Farakh,

Reading webpages is essentially the same as reading any other file except for the IO classes. You might want to use java.net.URL class to obtain a stream to the resource.
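A minimal sketch of that idea, assuming the page is UTF-8 text and that a deliberately simple regular expression is good enough to spot addresses. The class name PageEmailReader, the method names, and the pattern are made up for illustration; the pattern will not match every valid address.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PageEmailReader {
    // Assumption: a simple pattern, not a full RFC-compliant address matcher.
    private static final Pattern EMAIL =
            Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}");

    // Reads the whole resource behind the URL into a String (assumes UTF-8).
    static String readText(URL url) throws IOException {
        StringBuilder text = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                text.append(line).append('\n');
            }
        }
        return text.toString();
    }

    // Returns each distinct address once, in the order it first appears.
    static Set<String> extractEmails(String text) {
        Set<String> found = new LinkedHashSet<>();
        Matcher m = EMAIL.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        // The URL to read is passed as the first command-line argument.
        URL url = URI.create(args[0]).toURL();
        for (String email : extractEmails(readText(url))) {
            System.out.println(email);
        }
    }
}
```

In a servlet you would call readText and extractEmails from doGet or doPost instead of main, and write the result to the *.txt output.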

To make parsing easier, you can treat the webpages as XML resources and then use DOM or SAX parsers, provided the pages are well-formed (XHTML); ordinary HTML often is not. Have a look at the javax.xml.parsers.SAXParser class as a starting point.
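A rough sketch of the SAX approach, assuming the page is well-formed XML (XHTML) with no external DTD to fetch. The class name XhtmlTextCollector and the collectText method are hypothetical; only SAXParserFactory, SAXParser and DefaultHandler come from the JDK.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class XhtmlTextCollector extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();

    // Called by the parser for each run of character data between tags.
    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    // Add a space after each element so words from neighbouring tags stay apart.
    @Override
    public void endElement(String uri, String localName, String qName) {
        text.append(' ');
    }

    // Parses the stream and returns only the text content, without markup.
    static String collectText(InputStream in) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            XhtmlTextCollector handler = new XhtmlTextCollector();
            parser.parse(in, handler);
            return handler.text.toString();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // Assumption: callers treat a page that is not well-formed as a failure.
            throw new IllegalStateException("Could not parse page as XML", e);
        }
    }

    public static void main(String[] args) {
        String page = "<html><body><p>Write to bob@example.com</p><p>Thanks</p></body></html>";
        InputStream in = new ByteArrayInputStream(page.getBytes(StandardCharsets.UTF_8));
        System.out.println(collectText(in));
    }
}
```

The collected text can then be searched for addresses. A page that is not well-formed makes the parser throw, so real-world HTML may need cleaning up first.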

HTH
Ashish Hareet
Norm Radder
Ranch Hand

Joined: Aug 10, 2005
Posts: 685
By reading emails, do you mean connecting to an email server and reading the emails it has for a user, like Outlook Express or Thunderbird do? To do this I think you need to understand a mail-retrieval protocol such as POP3 (RFC 1939) or IMAP (RFC 3501). SMTP, documented in RFC 821 and RFC 1869, only covers sending mail.
Farakh khan
Ranch Hand

Joined: Mar 22, 2008
Posts: 732
Thanks a lot, Ashish Hareet.
 