JavaRanch » Java Forums » Java » Java in General

How to read text content not source code from webpage in java ?

Marimuthu Udayakumar
Greenhorn

Joined: Jun 17, 2008
Posts: 16
Hi guys,
How can I read the text content (not the HTML source code) of a web page using Java?

Thanks,
http://teknoturfian.blogspot.com


Thanks and Regards,
P.Marimuthu Udayakumar
Venkateswara Rao Desu
Greenhorn

Joined: Apr 13, 2009
Posts: 7
The java.net package has a URLConnection class. You can use it to connect to a URL, send a request, and read the response.

-- Venkateswara Rao Desu
Marimuthu Udayakumar
Greenhorn

Joined: Jun 17, 2008
Posts: 16
Hi Venkateswara,
Thanks for your reply.
I tried this:


import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;

public class URLExp {

    public static void main(String[] args) {
        try {
            URL google = new URL("http://www.google.com/");
            URLConnection yc = google.openConnection();
            // try-with-resources closes the reader even if reading fails
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(yc.getInputStream()))) {
                String inputLine;
                while ((inputLine = in.readLine()) != null) {
                    // Each line is raw HTML markup, not the rendered text
                    System.out.println(inputLine);
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}


BUT...
What happens is that I get the source code of the web page, and I need the real text content. So what should I do?
Jesper de Jong
Java Cowboy
Saloon Keeper

Joined: Aug 16, 2005
Posts: 14104

Marimuthu Udayakumar wrote: BUT...
What happens is that I get the source code of the web page, and I need the real text content. So what should I do?

You'd have to parse the HTML in your program and get the text out of it yourself.
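A minimal sketch of that approach using only the JDK: the HTML parser bundled with Swing (javax.swing.text.html.parser.ParserDelegator) calls back into an HTMLEditorKit.ParserCallback for every tag and every run of text between tags, so collecting the text runs gives you the content without the markup. The class name HtmlTextExtractor, the one-run-per-line output, and the script/style skipping are illustrative assumptions, not something from this thread. The Swing parser only understands roughly HTML 3.2, so a dedicated HTML parser will likely cope better with modern, messy pages.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class HtmlTextExtractor {

    /** Returns the text runs of an HTML document, one run per line (illustrative format). */
    public static String extractText(Reader html) throws IOException {
        StringBuilder text = new StringBuilder();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            // Nesting counter: text reported inside <script> or <style> is ignored
            private int skipDepth = 0;

            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.SCRIPT || tag == HTML.Tag.STYLE) {
                    skipDepth++;
                }
            }

            @Override
            public void handleEndTag(HTML.Tag tag, int pos) {
                if ((tag == HTML.Tag.SCRIPT || tag == HTML.Tag.STYLE) && skipDepth > 0) {
                    skipDepth--;
                }
            }

            @Override
            public void handleText(char[] data, int pos) {
                if (skipDepth == 0) {
                    text.append(data).append('\n');
                }
            }
        };
        // true = ignore any charset declared inside the markup; the Reader already decodes it
        new ParserDelegator().parse(html, callback, true);
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        // In the program above you would pass the InputStreamReader from the URLConnection
        String page = "<html><head><title>Demo</title></head>"
                + "<body><p>Hello <b>world</b></p></body></html>";
        System.out.print(extractText(new StringReader(page)));
    }
}
```

To use it with the URLConnection code, wrap yc.getInputStream() in an InputStreamReader and pass that to extractText instead of printing each line.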


Java Beginners FAQ - JavaRanch SCJP FAQ - The Java Tutorial - Java SE 7 API documentation
Scala Notes - My blog about Scala
Rob Spoor
Sheriff

Joined: Oct 27, 2005
Posts: 19670

And next time, please use code tags: http://faq.javaranch.com/java/UseCodeTags


SCJP 1.4 - SCJP 6 - SCWCD 5 - OCEEJBD 6
How To Ask Questions How To Answer Questions
Marimuthu Udayakumar
Greenhorn

Joined: Jun 17, 2008
Posts: 16
Hello Jesper de Jong,
Thanks for your reply; I got it working.

Hi Rob Prime,
Thanks for the suggestion about code tags; I used them here too.

I used the NekoHTML parser.




For the parser I used the jar files nekohtml.jar and xercesImpl.jar.
I can't attach those jar files here, but you can download them from the web.
If you can't find them, just mail me at teknoturfian@gmail.com and I will send them to you.
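For reference, a hedged sketch of the DOM-based route: as far as I know, NekoHTML parses even messy HTML into a standard org.w3c.dom.Document (through its org.cyberneko.html.parsers.DOMParser class), so once you hold that Document, extracting the text is a simple tree walk over the text nodes. The sketch below assumes you already have such a Document; to stay runnable without the NekoHTML and Xerces jars, its main method builds the DOM from a small, well-formed XHTML string with the JDK's own XML parser instead. The class DomTextCollector and the script/style skipping are illustrative assumptions, not the poster's code.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

public class DomTextCollector {

    /** Joins all non-blank text nodes under 'node' with single spaces. */
    public static String collectText(Node node) {
        StringBuilder sb = new StringBuilder();
        collect(node, sb);
        return sb.toString().trim();
    }

    private static void collect(Node node, StringBuilder sb) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            String t = node.getNodeValue().trim();
            if (!t.isEmpty()) {
                sb.append(t).append(' ');
            }
            return;
        }
        // Skip script and style bodies; compare case-insensitively because
        // HTML parsers may report element names in upper case
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            String name = node.getNodeName();
            if (name.equalsIgnoreCase("script") || name.equalsIgnoreCase("style")) {
                return;
            }
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            collect(child, sb);
        }
    }

    public static void main(String[] args) throws Exception {
        // Assumption: with NekoHTML you would obtain 'doc' from its DOMParser instead
        String xhtml = "<html><head><script>var x = 1;</script></head>"
                + "<body><p>Hello <b>world</b></p></body></html>";
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
        System.out.println(collectText(doc)); // prints: Hello world
    }
}
```

Trimming each text node and joining with spaces is a deliberate simplification; it loses paragraph breaks, which you could restore by appending a newline when leaving block elements such as p or div.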
Thanks, guys. Have a good day!
http://www.wix.com/muthu_tek/Marimuthu-at-Teknoturf
http://teknoturfian.blogspot.com

" I aim to bring Passion and Quality to every relationship"