
entity was referenced but not declared

alexandre saviano
Greenhorn

Joined: Sep 06, 2012
Posts: 2
Greetings to all !

I am new to this forum and would be very grateful if anyone could help me solve a problem with HTML files that I have to translate. I have been searching many forums, for many hours over many days, but I still haven't found a solution.
This is about entities that are "referenced but not declared". I know this question has been asked many times, and I understand the usual fix, for example replacing &eacute; with &#233;, but I am sure there is another way around it. I have to translate more than a hundred files, all containing French entities (&eacute;, &egrave;, &agrave;...), so I cannot afford to search and replace every entity in every file; it would take me days.

The files are encoded in UTF-8, and here are the lines I have been trying to add:

<!ENTITY eacute "é" (& # 2 3 3)
<!ENTITY egrave "è" (& # 2 3 2)

But when I add these, I get the following error: "The content of elements must consist of well-formed character data or markup".

And if I add a DOCTYPE before those lines, I get an error saying that a DOCTYPE is not allowed in this document...

I would like to add or create a list of all those entities so that I can validate my XML files without any errors. Please help me out.

Thank you very much
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18987
    

Could you show us a small example of one of these documents? Right now your post isn't entirely clear to me -- these documents are HTML documents and not XML documents, am I right?
alexandre saviano
Greenhorn

Joined: Sep 06, 2012
Posts: 2
Hello,

Thank you for your reply.
Yes, the documents are HTML documents. I tried to attach one, but apparently there is no way to attach a file with a .htm extension...
Another way is to declare the entities in an external DTD (or an internal one, which is what I have been trying to do), but I still cannot get it to work...
Can a DTD (a list of entities to declare) be a .txt file?
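
For reference, here is a minimal sketch of the internal-DTD route described above, assuming the files are otherwise well-formed XML (XHTML). The class name, the sample markup, and the root element name `html` are illustrative assumptions, not taken from the actual files. Entity declarations are only accepted inside the DOCTYPE's internal subset, and the DOCTYPE has to appear before the root element, which is a likely cause of both errors quoted earlier.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class InternalEntityDemo {
    // Hypothetical sample document. The entity declarations sit inside the
    // DOCTYPE's internal subset ([ ... ]), and the DOCTYPE precedes the root element.
    static final String SAMPLE =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
      + "<!DOCTYPE html [\n"
      + "  <!ENTITY eacute \"&#233;\">\n"
      + "  <!ENTITY egrave \"&#232;\">\n"
      + "]>\n"
      + "<html><body><p>caf&eacute; cr&egrave;me</p></body></html>";

    // Parses the XML string and returns the text content of the root element,
    // with the declared entities expanded to their characters.
    static String parseText(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parseText(SAMPLE));
    }
}
```

On the .txt question: a DTD is plain text, and the parser locates an external one through the system identifier in the DOCTYPE, so the file extension itself should not matter.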




Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18987
    

But it looks like you're trying to use XML software to do the translation? Actually I don't see where you described what you were doing at all.

Anyway, what I would suggest is to use an HTML parser, one which can read an HTML document into a DOM structure (preferably an org.w3c.dom.Document). Then serialize that DOM into XML.

I'm not sure why you must convert the HTML entities to XML character entities -- why can't you just convert them to the characters themselves? In other words instead of converting "&eacute;" to "&#233;" why not just convert it to "é"?
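
A rough sketch of that last idea, converting named entities straight to the characters themselves, could look like the following. It assumes Java 9 or later; the class name, method name, and the three-entry table are hypothetical and would need to be extended to cover whatever entities the real files use. Names that are not in the table, including XML built-ins such as &amp;, are left untouched.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EntityReplacer {
    // Partial, illustrative table (an assumption, not a complete HTML entity list).
    static final Map<String, String> ENTITIES = Map.of(
        "eacute", "\u00e9",
        "egrave", "\u00e8",
        "agrave", "\u00e0");

    // Matches named references such as &eacute; and captures the name.
    static final Pattern NAMED_ENTITY = Pattern.compile("&([A-Za-z][A-Za-z0-9]*);");

    // Replaces every known named entity with its literal character.
    static String replaceEntities(String html) {
        Matcher m = NAMED_ENTITY.matcher(html);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String replacement = ENTITIES.get(m.group(1));
            // Unknown names are copied through unchanged.
            String text = (replacement != null) ? replacement : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(text));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(replaceEntities("caf&eacute; &amp; cr&egrave;me"));
    }
}
```

Since the files are already UTF-8, the result can be written back as UTF-8 and no entity declarations are needed at all.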
 