The server is currently unable to handle the request due to a temporary overloading or maintenance of the server. The implication is that this is a temporary condition which will be alleviated after some delay. If known, the length of the delay MAY be indicated in a Retry-After header. If no Retry-After is given, the client SHOULD handle the response as it would for a 500 response.
Note: The existence of the 503 status code does not imply that a
server must use it when becoming overloaded. Some servers may wish
to simply refuse the connection.
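As a rough illustration of the Retry-After rule quoted above, here is a minimal sketch of how a client might turn that header into a wait time. The class name `RetryAfter` and its `parse` method are made up for this example; the only assumption taken from the HTTP spec is that the header carries either a number of seconds or an HTTP date.

```java
import java.time.Duration;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Optional;

// Hypothetical helper, not part of any library mentioned in this thread.
public class RetryAfter {

    /**
     * Interprets a Retry-After header value relative to "now".
     * Returns empty if the header is missing or unparseable, in which case
     * the caller would fall back to treating the 503 like a 500.
     */
    public static Optional<Duration> parse(String headerValue, ZonedDateTime now) {
        if (headerValue == null || headerValue.trim().isEmpty()) {
            return Optional.empty();
        }
        String value = headerValue.trim();
        try {
            // Form 1: a delay in seconds, e.g. "120".
            long seconds = Long.parseLong(value);
            return Optional.of(Duration.ofSeconds(Math.max(0, seconds)));
        } catch (NumberFormatException notSeconds) {
            // Fall through and try the date form.
        }
        try {
            // Form 2: an HTTP date, e.g. "Fri, 31 Dec 1999 23:59:59 GMT".
            ZonedDateTime when = ZonedDateTime.parse(value, DateTimeFormatter.RFC_1123_DATE_TIME);
            Duration wait = Duration.between(now, when);
            return Optional.of(wait.isNegative() ? Duration.ZERO : wait);
        } catch (DateTimeParseException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("120", ZonedDateTime.now()));
        System.out.println(parse(null, ZonedDateTime.now()));
    }
}
```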
"Any fool can write code that a computer can understand. Good programmers write code that humans can understand." --- Martin Fowler
Please correct my English.
D Swart
Greenhorn
Joined: Nov 07, 2008
Posts: 12
Thanks Wouter,
The problem is it says that no matter what I set the URL to (I have tried several). Also, it seems to be saying that about some URL which is not www.apache.org.
So I guess my question is: how do I make it look at www.apache.org?
I'm not sure why it doesn't work for apache.org.
I've tried Google and that works fine. Remember that HTML is not valid XML,
so the SAX parser throws exceptions while trying to parse the file.
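To make the "HTML is not valid XML" point concrete, here is a small sketch using only the JDK's built-in SAX parser (not dom4j itself). The class name `WellFormedCheck` and the sample strings are invented for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: checks whether a string is well-formed XML.
public class WellFormedCheck {

    /** Returns true if the text parses as XML, false if the SAX parser rejects it. */
    public static boolean isWellFormed(String text) {
        try {
            SAXParserFactory.newInstance()
                    .newSAXParser()
                    .parse(new InputSource(new StringReader(text)), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            // This is the kind of exception an XML parser throws on ordinary HTML.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Typical HTML: <br> is never closed, which XML does not allow.
        System.out.println(isWellFormed("<html><body>Hello<br></body></html>"));
        // XHTML-style markup closes every element.
        System.out.println(isWellFormed("<html><body>Hello<br/></body></html>"));
    }
}
```

The first call should be rejected and the second accepted, which is why a page written as plain HTML can fail where an XHTML page parses.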
In other words, the W3C (the people who defined the standards for XHTML) decided that every XHTML document would refer to a DTD which they specified. They also (probably without thinking too hard about the consequences) decided that the DTD would be hosted on their site. That meant that every single application in the world which parsed an XHTML document would have to go to their site to get that DTD.
Of course a responsible application (like your browser for example) will only get the DTD once, then it will cache it for future use and not go to the W3C site again. But your application isn't a responsible one, as you can see from the remarks in that link about "Java". So the W3C is basically telling your application to get lost, it doesn't have time for you.
Of course you wouldn't think it's every single Java application's responsibility to cache URLs properly. You would think it's the JVM's responsibility to do that on behalf of the applications. But apparently it doesn't do that.
So if you're just doing this to get experience with XML, I recommend you stay away from XHTML pages until you have enough experience to set up an XML catalog or a caching proxy.
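For anyone who wants to try it anyway, one simpler stand-in for a full XML catalog is an entity resolver that never goes to the network. The sketch below uses the JDK's own DOM parser and answers every DTD request with an empty document; the class name `OfflineDtdParse` is made up, and a real catalog would return local copies of the DTDs instead. As far as I know, dom4j's `SAXReader` accepts the same kind of resolver through `setEntityResolver`, but treat that as an assumption to check against its documentation.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class: parses XHTML without fetching the DTD from www.w3.org.
public class OfflineDtdParse {

    public static Document parseWithoutFetchingDtd(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Answer every external DTD/entity request with an empty document,
            // so the parser has no reason to open a connection to the W3C site.
            builder.setEntityResolver(
                    (publicId, systemId) -> new InputSource(new StringReader("")));
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xhtml =
                "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" "
              + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">"
              + "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
              + "<head><title>Test</title></head><body><p>Hello</p></body></html>";
        Document doc = parseWithoutFetchingDtd(xhtml);
        System.out.println(doc.getDocumentElement().getTagName());
    }
}
```

The catch is that an empty DTD declares no entities, so a page that uses named entities such as `&nbsp;` would most likely still fail to parse; that is the case where real local copies of the DTDs are needed.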
D Swart
Greenhorn
Joined: Nov 07, 2008
Posts: 12
Thank you - a very useful answer.
If I do want experience with just DOM parsing / HTML page traversal, can you recommend a tool?
Hmmm .... good question. I mean something which enables me to think at a higher level of abstraction.
I really just want to get the job done, where "the job" is read in and parse web pages. Anything that helps me do so is good - the more easily it lets me do this, the better. Does that make sense?
If you're still thinking of something written in Java code for this tool, then you still have the same problem. However the page I linked to has a little hint at a workaround:
request the DTD resources through it from a user-agent other than one that vaguely identifies itself as Java
So if you use something which requests the page and sets the User-Agent header to something which, say, claims to be Firefox, you might get away with it. I generally use Apache HttpClient to access data over HTTP, as it means I don't have to learn as many details of the HTTP protocol as I would if I tried to code the access myself.
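Here is a rough sketch of the User-Agent idea. To keep it self-contained it uses the JDK's `HttpURLConnection` rather than Apache HttpClient, so it is a substitute for what I described, not the HttpClient API. The class name `BrowserLikeFetch` and the exact Firefox-style string are just placeholders.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical demo class: sends a browser-like User-Agent instead of the default "Java/..." one.
public class BrowserLikeFetch {

    // Illustrative value only; any string that does not identify itself as Java would do.
    public static final String USER_AGENT =
            "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0";

    /** Prepares (but does not yet send) a request carrying the custom User-Agent header. */
    public static HttpURLConnection open(String url) {
        try {
            HttpURLConnection conn =
                    (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setRequestProperty("User-Agent", USER_AGENT);
            return conn;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        HttpURLConnection conn = open("http://www.apache.org/");
        // The request is actually sent here, when the response is first read.
        try (InputStream in = conn.getInputStream()) {
            System.out.println("HTTP " + conn.getResponseCode());
        }
    }
}
```

Note that this only changes the header on requests made through this connection. If you then hand the page to an XML parser, the parser's own DTD lookups are separate requests and would still need a resolver or catalog.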
Edit: I just noticed that the page I linked to also says
and apparently if using Apache libraries there is a catalog solution in it
which if it means what I think it means (that the Apache code caches those DTDs for you) would be even better. But I'm just guessing about that.
This message was edited 1 time. Last update was at by Paul Clapham
subject: Problem with dom4j "Getting started" example