
HtmlUnit show content of aspx page (school timetable)

 
jeroen druwe
Greenhorn
Posts: 14
I am working on a program that shows my school timetable at the click of a button. I am doing this because I am lazy and want to improve my Java skills. After two days of searching the web I finally came here to ask for help: I have no idea how to "print" or show the content of an aspx page (the timetable is an .aspx page). I am using HtmlUnit.

This is the code I have at the moment:



At the end you see page = (HtmlPage) form.getInputByName("bGetTimetable").click(); this clicks the button that opens my timetable (.aspx).

Some screens to make it visual:


Fill in info
timetable

timetable html code:

http://pastebin.com/AGqr4f7c

I would be very happy if somebody could explain to me how to download its content and print it to the console!
 
Steve Luke
Bartender
Pie
Posts: 4181
21
IntelliJ IDE Java Python
Have you read this page about HtmlUnit and ActiveX: From HtmlUnit project?
 
Ulf Dittmer
Rancher
Pie
Posts: 42966
73
Doesn't the HtmlPage object that gets returned have methods that hand you the page source?
 
Steve Luke
Bartender
Pie
Posts: 4181
21
IntelliJ IDE Java Python
Steve Luke wrote:Have you read this page about HtmlUnit and ActiveX: From HtmlUnit project?


Duh, I misread the topic: that is aspx (server-side execution), not ActiveX. Ignore my previous post...
 
jeroen druwe
Greenhorn
Posts: 14
Ulf Dittmer wrote:Doesn't the HtmlPage object that gets returned have methods that hand you the page source?


methods of htmlpage:

methods
 
Steve Luke
Bartender
Pie
Posts: 4181
21
IntelliJ IDE Java Python
jeroen druwe wrote:
Ulf Dittmer wrote:Doesn't the HtmlPage object that gets returned have methods that hand you the page source?


methods of htmlpage:

methods


Another place to look for the available methods is the API. You can get that here: HtmlUnit API. That is better because you can see the methods that HtmlPage inherits from its parents.

So looking at that, which methods do you think you can't use to get the content?
 
jeroen druwe
Greenhorn
Posts: 14
Steve Luke wrote:So looking at that, which methods do you think you can't use to get the content?

I really do not know; it is not in a form and does not have any ids... What would you suggest?
 
Steve Luke
Bartender
Pie
Posts: 4181
21
IntelliJ IDE Java Python
Well, how do you expect to display the output? What type of value do you need to do that?
 
jeroen druwe
Greenhorn
Posts: 14
jeroen druwe wrote:
Steve Luke wrote:So looking at that, which methods do you think you can't use to get the content?

I really do not know; it is not in a form and does not have any ids... What would you suggest?


String XD. I need a way to read those values in the HTML table and convert them to Strings so I can show them in my GUI later on.
 
Steve Luke
Bartender
Pie
Posts: 4181
21
IntelliJ IDE Java Python
Okay, so what methods in the API look like you can get the page as a String? Recall that a String is also sometimes called Text.
 
jeroen druwe
Greenhorn
Posts: 14
Steve Luke wrote:Okay, so what methods in the API look like you can get the page as a String? Recall that a String is also sometimes called Text.


Oh OK, I found .asXml(). I need to copy its output and save it as an .html file to open later on; now I need to find out how XD
 
jeroen druwe
Greenhorn
Posts: 14
Sorry edit fail ^^
 
Steve Luke
Bartender
Pie
Posts: 4181
21
IntelliJ IDE Java Python
jeroen druwe wrote:... but is there a way to make it like a xml structure?


There is another method exactly for that. Not much different than the method you found.

jeroen druwe wrote:Or even better, copy the table structure so I can show it in my program.

It depends on exactly what you want: there is a method to get the contents of the page. Hint: What is the name of the HTML element for the tag you want?

You can display HTML in a GUI using built-in Swing components. See Text Components in the Java Tutorial.
 
jeroen druwe
Greenhorn
Posts: 14
Steve Luke wrote:
jeroen druwe wrote:... but is there a way to make it like a xml structure?


There is another method exactly for that. Not much different than the method you found.

jeroen druwe wrote:Or even better, copy the table structure so I can show it in my program.

It depends on exactly what you want: there is a method to get the contents of the page. Hint: What is the name of the HTML element for the tag you want?

You can display HTML in a GUI using built-in Swing components. See Text Components in the Java Tutorial.


It is just a list in a table ^^. At the moment I am writing the asXml() output to an HTML file so I can load it in the GUI.
 
Steve Luke
Bartender
Pie
Posts: 4181
21
IntelliJ IDE Java Python
You can probably skip some steps. For example, you can use a JEditorPane (something like new JEditorPane("text/html", theText)) without first having to write to a file...
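
A minimal sketch of that suggestion, assuming the markup is already held in a String (for instance the value returned by the method found earlier in the thread). The class HtmlPreview and its method names are made up for illustration; only JEditorPane, JScrollPane, JFrame and SwingUtilities come from the JDK.

```java
import javax.swing.JEditorPane;
import javax.swing.JFrame;
import javax.swing.JScrollPane;
import javax.swing.SwingUtilities;

// Hypothetical helper class; the names are illustrative only.
class HtmlPreview {

    // Wraps a fragment such as "<table>...</table>" in a minimal document,
    // so the pane always receives a complete HTML page.
    static String wrapFragment(String fragment) {
        return "<html><body>" + fragment + "</body></html>";
    }

    // Builds a read-only pane that renders the HTML straight from memory,
    // with no intermediate file.
    static JEditorPane createPane(String html) {
        JEditorPane pane = new JEditorPane("text/html", html);
        pane.setEditable(false);
        return pane;
    }

    // Shows the pane in a window; this needs a desktop session.
    static void show(String html) {
        SwingUtilities.invokeLater(() -> {
            JFrame frame = new JFrame("Timetable");
            frame.setDefaultCloseOperation(JFrame.DISPOSE_ON_CLOSE);
            frame.add(new JScrollPane(createPane(html)));
            frame.setSize(600, 400);
            frame.setVisible(true);
        });
    }
}
```

Calling HtmlPreview.show(theText) from the button handler would then display the page, with theText being whatever String the page object handed back.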
 
jeroen druwe
Greenhorn
Posts: 14
Steve Luke wrote:You can probably skip some steps. For example, you can use a JEditorPane (something like new JEditorPane("text/html", theText)) without first having to write to a file...


Ah OK, I will try this tomorrow (very late atm ^^). I do want to thank you from the bottom of my heart for being so helpful!
I will keep you up to date.
 
Ulf Dittmer
Rancher
Pie
Posts: 42966
73
Note that JEditorPane is very limited with respect to the HTML it can display. Saving the HTML to a file and calling Desktop.open to use the native browser may be a better option.
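
A rough sketch of that alternative, using only JDK classes. BrowserPreview and its method names are hypothetical, and writing the bytes as ISO-8859-1 is an assumption chosen to match the charset the timetable page declares; adjust it if the page uses something else.

```java
import java.awt.Desktop;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class; the names are illustrative only.
class BrowserPreview {

    // Writes the HTML to a temporary .html file and returns its location.
    // ISO-8859-1 is an assumption based on the charset the page declares.
    static Path saveToTempFile(String html) throws IOException {
        Path file = Files.createTempFile("timetable", ".html");
        Files.write(file, html.getBytes(StandardCharsets.ISO_8859_1));
        return file;
    }

    // Asks the operating system to open the file with its default application.
    // Returns false when the Desktop API is not available (e.g. on a server).
    static boolean openInBrowser(Path file) throws IOException {
        if (!Desktop.isDesktopSupported()
                || !Desktop.getDesktop().isSupported(Desktop.Action.OPEN)) {
            return false;
        }
        Desktop.getDesktop().open(file.toFile());
        return true;
    }
}
```

Usage would be along the lines of BrowserPreview.openInBrowser(BrowserPreview.saveToTempFile(theText)). The trade-off is that the page opens in a separate browser window instead of inside your own GUI.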
 
jeroen druwe
Greenhorn
Posts: 14
Ulf Dittmer wrote:Note that JEditorPane is very limited with respect to the HTML it can display. Saving the HTML to a file and calling Desktop.open to use the native browser may be a better option.


I noticed that. I have a problem with one line in the HTML code: "<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"/>". JEditorPane won't accept this as valid HTML. At the moment I write the page.asXml() output to a tempHTML.html file ==> copy it to a finalHTML.html (deleting that line) ==> pass it to the .setText method of the editor pane, and it works. Isn't there a better way?
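
One way to avoid the two files, sketched under the assumption that the only offending line is that Content-Type meta tag: remove it from the String in memory before handing the text to the editor pane. MetaStripper and stripContentTypeMeta are made-up names, and the regular expression is a simple illustration, not a general HTML parser.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative only.
class MetaStripper {

    // Matches a <meta ...> tag carrying http-equiv="Content-Type",
    // in either its self-closing (<meta .../>) or plain (<meta ...>) form.
    private static final Pattern CONTENT_TYPE_META = Pattern.compile(
            "<meta\\s+[^>]*http-equiv\\s*=\\s*\"Content-Type\"[^>]*>",
            Pattern.CASE_INSENSITIVE);

    // Returns the HTML with every such meta tag removed; other tags are untouched.
    static String stripContentTypeMeta(String html) {
        return CONTENT_TYPE_META.matcher(html).replaceAll("");
    }
}
```

The cleaned String can then go directly into the pane's setText method, so neither tempHTML.html nor finalHTML.html is needed.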
 
Steve Luke
Bartender
Pie
Posts: 4181
21
IntelliJ IDE Java Python
Since you don't really care about the header stuff, you could pull out just the parts you want. For example, the HtmlPage lets you pull out just the Body of the page as an HtmlElement, which you could then display. Or if you wanted to limit it even further, first pull out the Body, then from the Body pull out the table you want.
 
jeroen druwe
Greenhorn
Posts: 14
Steve Luke wrote:Since you don't really care about the header stuff, you could pull out just the parts you want. For example, the HtmlPage lets you pull out just the Body of the page as an HtmlElement, which you could then display. Or if you wanted to limit it even further, first pull out the Body, then from the Body pull out the table you want.


Oh I see: HtmlElement element = page.getBody();
element.asXml() returns the body ^^. Thanks, I will use that for sure (no need to write and read files).
 
jeroen druwe
Greenhorn
Posts: 14
Steve Luke wrote:Since you don't really care about the header stuff, you could pull out just the parts you want. For example, the HtmlPage lets you pull out just the Body of the page as an HtmlElement, which you could then display. Or if you wanted to limit it even further, first pull out the Body, then from the Body pull out the table you want.


Sorry for the double post, but I need a way to get the table out of the body.asXml() output. Do you know a way to do it?

Table: http://pastebin.com/Aw4V7SHB
page.getBody().asXml(): http://pastebin.com/1tJWwTf8
 
Steve Luke
Bartender
Pie
Posts: 4181
21
IntelliJ IDE Java Python
You know the name of the element you want, right? What type are you working with? If you look in the API for that class, what methods are available to let you get an element whose name you know?
 
Steve Luke
Bartender
Pie
Posts: 4181
21
IntelliJ IDE Java Python
I'll give a better hint, since the HTML is so ugly. You have many tables, tables in tables in tables on other tables... and the page looks to have two different data tables you probably want to keep. All the tables have different classes. Classes would be accessible as attributes of the HtmlElement. You can search the body for all the tables with a particular attribute (class) and value to get just the tables that you want.
 
jeroen druwe
Greenhorn
Posts: 14
Steve Luke wrote:I'll give a better hint, since the HTML is so ugly. You have many tables, tables in tables in tables on other tables... and the page looks to have two different data tables you probably want to keep. All the tables have different classes. Classes would be accessible as attributes of the HtmlElement. You can search the body for all the tables with a particular attribute (class) and value to get just the tables that you want.


I found this on the internet:

System.out.println(page.getByXPath("//table[@class='grid-border-args']"));

But I don't know how I could use HtmlElement here... (more tips? ^^)
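
To see what an XPath expression like the one above selects, here is a small sketch that runs it with the JDK's own XPath engine instead of HtmlUnit. Everything in it is an assumption for illustration: XPathDemo, tableTexts and the SAMPLE markup are made up, and the sample is hand-written, not the real timetable page.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; the names are illustrative only.
class XPathDemo {

    // Hand-written stand-in for a page body: a data table nested in a layout table.
    static final String SAMPLE =
            "<body>"
          + "<table class='layout'><tr><td>"
          + "<table class='grid-border-args'><tr><td>Monday</td><td>Maths</td></tr></table>"
          + "</td></tr></table>"
          + "</body>";

    // Parses well-formed XML and returns the text content of every node
    // that the given XPath expression selects.
    static List<String> tableTexts(String xml, String xpath) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(xpath, doc, XPathConstants.NODESET);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getTextContent());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Only the inner table has the requested class, so one entry comes back.
        System.out.println(tableTexts(SAMPLE, "//table[@class='grid-border-args']")); // prints [MondayMaths]
    }
}
```

The leading // means "anywhere in the document", which is why the nesting depth of the table does not matter. In the program itself, the page.getByXPath call from the line above plays the role that evaluate plays here.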
 
Steve Luke
Bartender
Pie
Posts: 4181
21
IntelliJ IDE Java Python
The getByXPath method provides a list of Nodes. Although it doesn't say it, my guess is those Nodes are likely to be HtmlElements.

The route I was thinking of was something that would get Elements by looking up its Attributes. The method would have to provide three parameters: the name of the element, the attribute name, and the attribute value. It would probably work a lot like that XPath method without the need to know XPath - but it also creates a List of HtmlElements so there is no guesswork as to the type of Nodes returned.
 
jeroen druwe
Greenhorn
Posts: 14
Steve Luke wrote:The getByXPath method provides a list of Nodes. Although it doesn't say it, my guess is those Nodes are likely to be HtmlElements.

The route I was thinking of was something that would get Elements by looking up its Attributes. The method would have to provide three parameters: the name of the element, the attribute name, and the attribute value. It would probably work a lot like that XPath method without the need to know XPath - but it also creates a List of HtmlElements so there is no guesswork as to the type of Nodes returned.


This method, is it already written, or do I need to make my own?
 
Steve Luke
Bartender
Pie
Posts: 4181
21
IntelliJ IDE Java Python
It's already there
 
jeroen druwe
Greenhorn
Posts: 14
Steve Luke wrote:It's already there


Oh, it's working, thanks for all your help! (If I need more information I will just post a comment ^^)
 