
Retrieve HTML text element HTMLCleaner

Lex van Rijswijk
Greenhorn

Joined: Jul 07, 2012
Posts: 10
I'm trying to get some text from a website to show in an Android app. I'm using HTMLCleaner to parse the HTML code. I don't have much knowledge of HTML, but the code seems a bit messy in my opinion. I've read quite a few examples and other topics, but I just can't get it to work. My code:



The part of the HTML code I'm trying to retrieve is "Welkom" and "Havana staat voor eten, drinken en dansen in een gezellige sfeer." (Dutch). Part of the HTML code:



I've tried many different setups of XPATH_HOME, but every time my string "HomeTitle" comes back empty. I've tried starting XPATH_HOME from <div id="content">, but I've read that it's best to stay as close as possible to the element you want to retrieve because of code updates and site adjustments. So what should the XPath to the desired text be?

Hopefully you can help me out!
Thanks
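
As a minimal sketch of the kind of expression being asked about, the following uses the JDK's built-in javax.xml.xpath rather than HTMLCleaner, so it runs without extra libraries. The markup in SAMPLE is hypothetical and much simpler than the real page: only the <div id="content"> wrapper and the two target strings come from this thread, while the <h1> and <p> tags are assumptions. The names XPathSketch and firstText are made up for illustration. It shows that an expression anchored on an id returns the text when the structure matches, and an empty string with no error when it does not.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class XPathSketch {
    // Hypothetical, simplified markup. The real page is nested more deeply,
    // and the <h1>/<p> tags here are assumptions, not taken from the site.
    static final String SAMPLE =
        "<html><body><div id=\"content\">"
        + "<h1>Welkom</h1>"
        + "<p>Havana staat voor eten, drinken en dansen in een gezellige sfeer.</p>"
        + "</div></body></html>";

    // Returns the trimmed text of the first node matching the expression,
    // or an empty string when nothing matches.
    // Note: the JDK parser needs well-formed XML; real-world HTML often is not,
    // which is why a cleaner such as HTMLCleaner is normally run first.
    static String firstText(String xhtml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, doc).trim();
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Matches: the structure is what the expression expects.
        System.out.println(firstText(SAMPLE, "//div[@id='content']/h1"));
        System.out.println(firstText(SAMPLE, "//div[@id='content']/p"));
        // No match: a wrong id gives an empty string rather than an exception.
        System.out.println("[" + firstText(SAMPLE, "//div[@id='inhoud']/h1") + "]");
    }
}
```

An empty result therefore does not show whether the expression is wrong or whether the downloaded document simply differs from the one seen in the browser.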
Mario Alcantara
Greenhorn

Joined: Mar 29, 2011
Posts: 16
What is the complete HTML content of the page? Does it contain an <html> or <body> tag?
Lex van Rijswijk
Greenhorn

Joined: Jul 07, 2012
Posts: 10
Hello Mario,

It has a <body> tag which is inside <html xmlns="http://www.w3.org/1999/xhtml"> and </html>
What I found out is that as soon as you load the www.havana-tilburg.nl page, it first shows an intro. After that it goes to the home page, but both have the same address. I'm not sure whether that is going to be a problem.

I also read that HTMLCleaner can't go that deep into a tree, so I've tried Jsoup and HTML Parser as well. They give the same result, though, and that is nothing.

Gr
Mario Alcantara
Greenhorn

Joined: Mar 29, 2011
Posts: 16
I've tested your XPath expression and it's correct. I suppose the problem is the content of the page. You can see the content of the root TagNode using root.getText().toString(); the result should contain the following structure:

Lex van Rijswijk
Greenhorn

Joined: Jul 07, 2012
Posts: 10
Thank you for looking into it. I tried putting the content in a string before, but I don't know a good way to show the text.
Is it possible to put the string with the content in an XML file in Eclipse? Or how do I make it readable like you show in your last reply?
If I put it in a string I can obviously show it in the Android app, but that isn't very useful, I guess.

Can you also please explain why you asked me whether the content was in an HTML tag or a BODY tag?
Gr
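
One possible way to make the downloaded content readable, sketched under assumptions: print the string in pieces, so that each piece stays short enough for a log viewer that cuts off very long single messages (Logcat does this). The class and method names (HtmlDumpSketch, chunks, dump) and the 3000-character piece size are arbitrary choices for illustration. On Android the System.out.println call could be swapped for Log.d, and the input would be the string from root.getText().toString() mentioned above.

```java
import java.util.ArrayList;
import java.util.List;

class HtmlDumpSketch {
    // Splits text into consecutive pieces of at most `size` characters.
    static List<String> chunks(String text, int size) {
        List<String> parts = new ArrayList<String>();
        for (int start = 0; start < text.length(); start += size) {
            int end = Math.min(text.length(), start + size);
            parts.add(text.substring(start, end));
        }
        return parts;
    }

    // Prints the content piece by piece so that no single message is too long.
    static void dump(String html) {
        for (String part : chunks(html, 3000)) {
            // On Android, Log.d("HtmlDump", part) could be used here instead.
            System.out.println(part);
        }
    }

    public static void main(String[] args) {
        // Stand-in for the string obtained from the parsed page.
        dump("<html><body><div id=\"content\">Welkom</div></body></html>");
    }
}
```

Comparing this output with the page source shown in the browser is a quick check of whether the app received the home page or something else, such as the intro.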

Mario Alcantara
Greenhorn

Joined: Mar 29, 2011
Posts: 16
Hi,

I asked about the <body> and <html> tags because I thought your XPath expression was wrong, but I checked it and it's fine. The reason for showing the HTML content of the requested page is to verify that it contains the right structure. If the HTML content is very different, you will never get the correct tag.
Lex van Rijswijk
Greenhorn

Joined: Jul 07, 2012
Posts: 10
Mario, thanks for your help. I think I'm going to try some different websites and see if I get the right information from those. I'll probably be back in a few days!
 
 