HtmlUnit show content of aspx page (school timetable)

jeroen druwe
Greenhorn

Joined: Oct 07, 2012
Posts: 14
I am working on a program that shows my school timetable at the click of a button. I am doing this because I am lazy and want to improve my Java skills. After two days of searching the web I finally came here to ask for help: I have no idea how to "print" or show the content of an aspx page (the timetable is an .aspx page). I am using HtmlUnit.

This is the code I have at the moment:



At the end you see page = (HtmlPage) form.getInputByName("bGetTimetable").click(); This is the button which opens my timetable (.aspx).

Some screens to make it visual:


Fill in info
timetable

timetable html code:

http://pastebin.com/AGqr4f7c

I would be very happy if somebody could explain to me how to download its content and print it in the console!
Steve Luke
Bartender

Joined: Jan 28, 2003
Posts: 4181

Have you read this page about HtmlUnit and ActiveX: From HtmlUnit project?


Steve
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 41634
Doesn't the HtmlPage object that gets returned have methods that hand you the page source?


Steve Luke

Steve Luke wrote:Have you read this page about HtmlUnit and ActiveX: From HtmlUnit project?


Duh, I misread the topic: that is aspx (server-side execution), not ActiveX. Ignore my previous post...
jeroen druwe
Ulf Dittmer wrote:Doesn't the HtmlPage object that gets returned have methods that hand you the page source?


Methods of HtmlPage:

methods
Steve Luke
Another place to look for the available methods is the API. You can get that here: HtmlUnit API. That is better because you can see the methods that HtmlPage inherits from its parents.

So looking at that, which methods do you think you can't use to get the content?
jeroen druwe
I really do not know; it is not in a form and does not have any ids... What would you suggest?
Steve Luke

Well, how do you expect to display the output? What type of value do you need to do that?
jeroen druwe
String XD. I need a way to read those values in the HTML table and convert them to Strings so I can show them in my GUI later on.
Steve Luke

Okay, so what methods in the API look like you can get the page as a String? Recall that a String is also sometimes called Text.
jeroen druwe
Oh OK, I found .asXml(). I need to copy this and save it as an .html file to open later on. Now I need to find out how XD
jeroen druwe
Sorry edit fail ^^
Steve Luke

jeroen druwe wrote:... but is there a way to make it like a xml structure?


There is another method exactly for that. Not much different than the method you found.

jeroen druwe wrote:Or even better, copy the table structure so I can show it in my program.

It depends on exactly what you want: there is a method to get the contents of the page. Hint: What is the name of the HTML element for the tag you want?

You can display HTML in a GUI using built-in Swing components. See Text Components in the Java Tutorial.
jeroen druwe
It is just a list in a table ^^. At the moment I am writing the asXml() output to an HTML file so I can load it in the GUI.
Steve Luke

You can probably skip some steps. For example, you can use a JEditorPane (something like new JEditorPane("text/html", theText)) without first having to write to a file...
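
A minimal sketch of that idea, assuming the markup is already available as a String. The class name TimetableViewer, the helpers wrapInHtml and createPane, and the sample table are made up for illustration; only the JEditorPane("text/html", text) constructor comes from the suggestion above.

```java
import java.awt.GraphicsEnvironment;
import javax.swing.JEditorPane;
import javax.swing.JFrame;
import javax.swing.JScrollPane;
import javax.swing.SwingUtilities;

class TimetableViewer {

    // Wraps a markup fragment (for example a single <table>) in a minimal HTML document.
    static String wrapInHtml(String fragment) {
        return "<html><body>" + fragment + "</body></html>";
    }

    // Builds a read-only pane that renders the HTML string straight from memory.
    static JEditorPane createPane(String html) {
        JEditorPane pane = new JEditorPane("text/html", html);
        pane.setEditable(false);
        return pane;
    }

    public static void main(String[] args) {
        // Hypothetical sample; the real program would pass the markup taken from the page.
        String html = wrapInHtml("<table border='1'><tr><td>Mon</td><td>Math</td></tr></table>");
        if (GraphicsEnvironment.isHeadless()) {
            // No display available, so just print the markup instead of opening a window.
            System.out.println(html);
            return;
        }
        SwingUtilities.invokeLater(() -> {
            JFrame frame = new JFrame("Timetable");
            frame.setDefaultCloseOperation(JFrame.DISPOSE_ON_CLOSE);
            frame.add(new JScrollPane(createPane(html)));
            frame.setSize(600, 400);
            frame.setVisible(true);
        });
    }
}
```

No file is written or read here; the String goes straight into the component.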
jeroen druwe
Ah OK, I will try this tomorrow (it is very late at the moment ^^). I do want to thank you from the bottom of my heart for being so helpful!
I will keep you up to date.
Ulf Dittmer
Note that JEditorPane is very limited with respect to the HTML it can display. Saving the HTML to a file and calling Desktop.open to use the native browser may be a better option.
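
A rough sketch of that alternative, assuming the page markup is held in a String. BrowserPreview and saveToTempFile are hypothetical names, and the sample HTML is a placeholder; Desktop.open is the call mentioned above.

```java
import java.awt.Desktop;
import java.awt.GraphicsEnvironment;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class BrowserPreview {

    // Writes the markup to a temporary .html file and returns its path.
    // UTF-8 is an assumption; use the charset the page actually declares if it differs.
    static Path saveToTempFile(String html) {
        try {
            Path file = Files.createTempFile("timetable", ".html");
            Files.write(file, html.getBytes(StandardCharsets.UTF_8));
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Placeholder markup standing in for the downloaded timetable page.
        String html = "<html><body><h1>Timetable</h1></body></html>";
        Path file = saveToTempFile(html);
        if (!GraphicsEnvironment.isHeadless()
                && Desktop.isDesktopSupported()
                && Desktop.getDesktop().isSupported(Desktop.Action.OPEN)) {
            // Hands the file to whatever application the OS associates with .html files.
            Desktop.getDesktop().open(file.toFile());
        } else {
            System.out.println("Saved to " + file);
        }
    }
}
```

The support checks matter because Desktop is not available on every platform or in headless environments.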
jeroen druwe
I noticed that. I have a problem with one line in the HTML code: "<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"/>". JEditorPane won't accept this as valid HTML. At the moment I write page.asXml() to a tempHTML.html file, copy it to finalHTML.html (deleting that line), and pass the result to the editor pane's .setText method, and it works. Isn't there a better way?
Steve Luke

Since you don't really care about the header stuff, you could pull out just the parts you want. For example, the HtmlPage lets you pull out just the Body of the page as an HtmlElement, which you could then display. Or if you wanted to limit it even further, first pull out the Body, then from the Body pull out the table you want.
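
If the whole page string is kept rather than just the body, the meta line quoted earlier can also be removed in memory, without the two temporary files. This is only a sketch: MetaStripper and stripContentTypeMeta are hypothetical names, and the pattern assumes http-equiv is the first attribute of the tag, as it is in the quoted line.

```java
import java.util.regex.Pattern;

class MetaStripper {

    // Matches a Content-Type <meta> tag, case-insensitively, self-closed or not.
    // Assumes http-equiv is the first attribute, as in the line quoted in the thread.
    private static final Pattern CONTENT_TYPE_META = Pattern.compile(
            "<meta\\s+http-equiv\\s*=\\s*\"Content-Type\"[^>]*>",
            Pattern.CASE_INSENSITIVE);

    // Returns the markup with every matching meta tag removed.
    static String stripContentTypeMeta(String html) {
        return CONTENT_TYPE_META.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String html = "<html><head>"
                + "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=ISO-8859-1\"/>"
                + "<title>Timetable</title></head><body><p>Mon</p></body></html>";
        // Prints the same document without the meta tag.
        System.out.println(stripContentTypeMeta(html));
    }
}
```

The cleaned string can then be passed to the editor pane directly.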
jeroen druwe
Oh I see: HtmlElement element = page.getBody()
element.asXml() returns the body ^^. Thanks, I will use that for sure (no need to write and read files).
jeroen druwe
Sorry for the double post, but I need a way to get the table out of the body.asXml() output. Do you know a way to do it?

Table: http://pastebin.com/Aw4V7SHB
page.getBody().asXml(): http://pastebin.com/1tJWwTf8
Steve Luke

You know the name of the element you want, right? What type are you working with? If you look in the API for that class, what methods are available to let you get an element whose name you know?
Steve Luke

I'll give a better hint, since the HTML is so ugly. You have many tables: tables in tables in tables on other tables... and the page looks to have two different data tables you probably want to keep. All the tables have different classes. Classes would be accessible as attributes of the HtmlElement. You can search the body for all the tables with a particular attribute (class) and value to get just the tables that you want.
jeroen druwe
I found this on the internet:

System.out.println(page.getByXPath("//table[@class='grid-border-args']"));

But I don't know how I could use HtmlElement here... (more tips? ^^)
Steve Luke

The getByXPath method provides a list of Nodes. Although it doesn't say it, my guess is those Nodes are likely to be HtmlElements.

The route I was thinking of was something that would get elements by looking up their attributes. The method takes three parameters: the name of the element, the attribute name, and the attribute value. It would probably work a lot like that XPath method without the need to know XPath, but it also returns a List of HtmlElements, so there is no guesswork as to the type of Nodes returned.
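
To illustrate what the expression //table[@class='grid-border-args'] selects, here is a small stand-in that uses the JDK's own XPath API on a hard-coded snippet instead of HtmlUnit and the real page. TableSelector, tableTexts, the "layout" class, and the sample rows are invented for the example; only the XPath expression and the grid-border-args class come from the thread. It also shows the point about iterating over the returned nodes rather than printing the list itself.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class TableSelector {

    // Returns the trimmed text of every <table> whose class attribute equals cssClass.
    // cssClass is pasted into the expression, so it must not contain a single quote.
    static List<String> tableTexts(String xml, String cssClass) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        String expression = "//table[@class='" + cssClass + "']";
        try {
            NodeList nodes = (NodeList) xpath.evaluate(
                    expression, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent().trim());
            }
            return result;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Invented, well-formed sample: one data table nested in a layout table, one at top level.
        String xml = "<body>"
                + "<table class='layout'><tr><td>"
                + "<table class='grid-border-args'><tr><td>Mon 08:30 Math</td></tr></table>"
                + "</td></tr></table>"
                + "<table class='grid-border-args'><tr><td>Tue 10:15 Physics</td></tr></table>"
                + "</body>";
        for (String text : tableTexts(xml, "grid-border-args")) {
            System.out.println(text);
        }
    }
}
```

The nested layout table is skipped and both data tables are returned in document order, which is the same filtering that an element name, attribute name, and attribute value lookup would perform.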
jeroen druwe
This method, is it already written, or do I need to make my own?
Steve Luke

It's already there
jeroen druwe
Oh, it's working. Thanks for all your help! (If I need more information I will just post a comment ^^)
 