Problem with dom4j "Getting started" example

D Swart
Greenhorn

Joined: Nov 07, 2008
Posts: 12
The code I am running is taken from the dom4j website, http://dom4j.sourceforge.net/download.html

It throws an org.dom4j.DocumentException, and I have no clue why. It is the first "This is easy to do" example in their guide: http://dom4j.sourceforge.net/dom4j-1.6.1/guide.html

Any help most appreciated.

The full error is: org.dom4j.DocumentException: Server returned HTTP response code: 503 for URL: http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd Nested exception: Server returned HTTP response code: 503 for URL: http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd

Wouter Oet
Saloon Keeper

Joined: Oct 25, 2008
Posts: 2700

http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html
10.5.4 503 Service Unavailable

The server is currently unable to handle the request due to a temporary overloading or maintenance of the server. The implication is that this is a temporary condition which will be alleviated after some delay. If known, the length of the delay MAY be indicated in a Retry-After header. If no Retry-After is given, the client SHOULD handle the response as it would for a 500 response.

Note: The existence of the 503 status code does not imply that a server must use it when becoming overloaded. Some servers may wish to simply refuse the connection.


"Any fool can write code that a computer can understand. Good programmers write code that humans can understand." --- Martin Fowler
Please correct my English.
D Swart
Greenhorn

Joined: Nov 07, 2008
Posts: 12
Thanks Wouter,

The problem is that it says that no matter what I set the URL to (I have tried several). Also, the error seems to be about some URL which is not www.apache.org.

So I guess my question is: how do I make it look at www.apache.org?

Cheers.
Wouter Oet
Saloon Keeper

Joined: Oct 25, 2008
Posts: 2700

I'm not sure why it doesn't work for apache.org.
I've tried Google and that works fine. Remember that HTML is not necessarily valid XML,
so the SAX parser throws exceptions while trying to parse such a file.
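
To illustrate the point about HTML not being valid XML, here is a minimal sketch using the JDK's built-in parser rather than dom4j. The class and method names (HtmlIsNotXmlDemo, isWellFormed) and the sample markup are made up for this example. An unclosed <br> is fine in HTML, but an XML parser rejects it.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class HtmlIsNotXmlDemo {

    // Returns true if the markup parses as well-formed XML, false otherwise.
    static boolean isWellFormed(String markup) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // DefaultHandler rethrows fatal errors and keeps the parser from printing to stderr.
        builder.setErrorHandler(new DefaultHandler());
        try {
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXParseException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // Typical HTML: <br> is never closed, so this is not well-formed XML.
        String html = "<html><body><p>one<br>two</p></body></html>";
        // The same content written the XML way, with a self-closing <br/>.
        String xhtml = "<html><body><p>one<br/>two</p></body></html>";
        System.out.println("HTML well-formed?  " + isWellFormed(html));
        System.out.println("XHTML well-formed? " + isWellFormed(xhtml));
    }
}
```

A page that happens to be well-formed (or is real XHTML) gets past this stage, which may be why some sites appear to work and others do not.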
Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 38045
Moving thread.
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18541

What do you mean, the URL with the problem is an apache.org URL? Read the error message again:
The full error is: org.dom4j.DocumentException: Server returned HTTP response code: 503 for URL: http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd Nested exception: Server returned HTTP response code: 503 for URL: http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd


Anyway, the answer to this particular error message is in this mailing list entry:

http://lists.w3.org/Archives/Public/site-comments/2009Jun/0009.html

In other words, the W3C (the people who defined the standards for XHTML) decided that every XHTML document would refer to a DTD which they specified. They also (probably without thinking too hard about the consequences) decided that the DTD would be hosted on their site. That meant that every single application in the world which parsed an XHTML document would have to go to their site to get that DTD.

Of course a responsible application (like your browser, for example) will only get the DTD once, then cache it for future use and not go to the W3C site again. But your application isn't a responsible one, as you can see from the remarks in that link about "Java". So the W3C is basically telling your application to get lost; it doesn't have time for you.

Of course you wouldn't think it's every single Java application's responsibility to cache URLs properly. You would think it's the JVM's responsibility to do that on behalf of the applications. But apparently it doesn't do that.

So if you're just doing this to get experience with XML, I recommend you stay away from XHTML pages until you have enough experience to set up an XML catalog or a caching proxy.
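
For readers who want to see roughly what keeping the parser away from the W3C site can look like, here is a minimal sketch. It is not a full XML catalog: it installs an org.xml.sax.EntityResolver that answers every external DTD request with an empty document, so nothing is fetched over the network. It uses the JDK's DocumentBuilder rather than dom4j (dom4j's SAXReader also has a setEntityResolver method that takes the same kind of resolver). The class name, method name and sample page are made up for the example. One caveat: with an empty DTD, entities such as &nbsp; are undefined, so pages that use them will still fail; a proper setup would return a local copy of the DTD instead.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class OfflineDtdDemo {

    // Parses XML text without fetching any external DTD over the network.
    static Document parseWithoutDtdFetch(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // Assumption for this sketch: every external entity is replaced by an
        // empty stream. A real setup would map known public IDs to local DTD files.
        builder.setEntityResolver((publicId, systemId) ->
                new InputSource(new StringReader("")));
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // A tiny XHTML page that references the same DTD as in the error message.
        String xhtml =
            "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" "
            + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">"
            + "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
            + "<head><title>Demo</title></head><body><p>Hello</p></body></html>";
        Document doc = parseWithoutDtdFetch(xhtml);
        System.out.println("Root element: " + doc.getDocumentElement().getNodeName());
    }
}
```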
D Swart
Greenhorn

Joined: Nov 07, 2008
Posts: 12
Thank you - a very useful answer.

If I do want experience with just DOM parsing / HTML page traversal, can you recommend a tool?
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18541

What do you mean by "a tool"?
D Swart
Greenhorn

Joined: Nov 07, 2008
Posts: 12
Hmmm .... good question. I mean something which enables me to think at a higher level of abstraction.

I really just want to get the job done, where "the job" is read in and parse web pages. Anything that helps me do so is good - the more easily it lets me do this, the better. Does that make sense?
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18541

If you're still thinking of something written in Java code for this tool, then you still have the same problem. However, the page I linked to has a little hint at a workaround:
"request the DTD resources through it from a user-agent other than one that vaguely identifies itself as Java"

So if you use something which requests the page and sets the User-Agent header to something which, say, claims to be Firefox, you might get away with it. I generally use Apache HttpClient to access data over HTTP, as it means I don't have to learn as many details of the HTTP protocol as I would if I tried to code the access myself.

Edit: I just noticed that the page I linked to also says
"and apparently if using Apache libraries there is a catalog solution in it"

which, if it means what I think it means (that the Apache code caches those DTDs for you), would be even better. But I'm just guessing about that.
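
As a rough illustration of the User-Agent idea, here is a minimal sketch that uses the JDK's own HttpURLConnection instead of Apache HttpClient. The class name, method name and User-Agent string are placeholders invented for the example, and whether the W3C server accepts any particular value is not something this sketch can promise. Caching the DTD locally is still the more considerate fix.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

class UserAgentDemo {

    // Prepares (but does not yet send) an HTTP request with a custom User-Agent header.
    static HttpURLConnection openWithUserAgent(String url, String userAgent) throws IOException {
        // openConnection() only creates the connection object; nothing goes over the wire yet.
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        // Replaces the default "Java/<version>" value the JDK would otherwise send.
        conn.setRequestProperty("User-Agent", userAgent);
        return conn;
    }

    public static void main(String[] args) throws IOException {
        HttpURLConnection conn = openWithUserAgent(
                "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd",
                "Mozilla/5.0 (compatible; ExampleFetcher/1.0)"); // placeholder value
        System.out.println("Will send User-Agent: " + conn.getRequestProperty("User-Agent"));
        // Calling conn.getInputStream() here would actually send the request.
    }
}
```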
 