Including special characters in an XML which is validated against a schema

 
Greenhorn
Posts: 29
Hi,



I am trying to convert a file in a proprietary format into XML.
I have created a schema for the XML. However, the proprietary
file contains special characters such as 'τ' which, if
left as they are, cause an error during validation.
If I were using a DTD I could have declared these characters
using the <!ENTITY.... tag, but how do I declare these entities
in a schema? Or in the converted XML?


Thanks and Regards,

Chetan
 
author and deputy
Posts: 3150
Identifying them and converting them to Unicode is one option.
 
Chet Arora
Greenhorn
Posts: 29
Hi Balaji,

There would be hundreds of such characters, and if I were
to convert them in my program it would take
up a lot of resources. I'd rather have the parser do
it for me. Any idea how I can get the parser to do
that?

thanks,
chetan
 
author
Posts: 11962
You could try running your input file through native2ascii before handing it off to the XML parser.
 
Lasse Koskela
author
Posts: 11962
Oh, and if those characters aren't valid Unicode (or the resulting XML is otherwise invalid), a standards-compliant XML parser will throw exceptions, so you really should consider converting the characters before the XML parser sees them.
 
author
Posts: 30

I am trying to convert a file in a proprietary format into XML.
I have created a schema for the XML. However, the proprietary
file contains special characters such as 'τ' which, if
left as they are, cause an error during validation.



You must distinguish between the character encoding and characters that are not allowed in XML. The disallowed characters are essentially the non-printing control characters (except for tab, CR, and LF). Those characters have to be removed or converted to allowable characters.
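
To make the "removed" option concrete, here is a minimal sketch in Java. It assumes the text has already been decoded into a String and that XML 1.0 is the target; the class and method names are made up for illustration. The ranges follow the Char production of the XML 1.0 specification, which also excludes surrogates and U+FFFE/U+FFFF in addition to the control characters mentioned above.

```java
final class XmlCharFilter {

    /** True if the code point is allowed by the Char production of XML 1.0. */
    static boolean isLegalXml10(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD      // tab, LF, CR
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns the input with every code point that XML 1.0 forbids removed. */
    static String stripIllegal(String in) {
        StringBuilder out = new StringBuilder(in.length());
        in.codePoints()
          .filter(XmlCharFilter::isLegalXml10)
          .forEach(out::appendCodePoint);
        return out.toString();
    }

    public static void main(String[] args) {
        // The NUL character between 'a' and 'b' is dropped.
        System.out.println(stripIllegal("a\u0000b"));
    }
}
```

Note that a character such as 'τ' (U+03C4) passes this filter untouched, which is the point of the distinction: it is a legal XML character, so any error it causes comes from the encoding, not from the character itself.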

I'm not sure what your example character is supposed to be, but it isn't a non-printing control character. You need to find out what character encoding is in use, and insert a corresponding encoding declaration into the XML declaration at the start of your generated XML file.
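
One way this could look in Java, as a rough sketch only: decode the proprietary bytes with the charset you have identified, then let a StAX writer emit the XML declaration and the content in UTF-8, so the declaration and the actual bytes always agree. The element name "record" and the class name are hypothetical, and the source charset is passed in because the thread does not say which encoding the file uses.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

final class TranscodeToUtf8Xml {

    /** Decodes raw bytes with the given source charset and wraps them in a UTF-8 XML document. */
    static byte[] toUtf8Xml(byte[] raw, Charset sourceCharset) {
        // Decoding step: this is where the real encoding of the input must be known.
        String text = new String(raw, sourceCharset);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            XMLStreamWriter w =
                XMLOutputFactory.newInstance().createXMLStreamWriter(out, "UTF-8");
            w.writeStartDocument("UTF-8", "1.0"); // writes the XML declaration with the encoding
            w.writeStartElement("record");        // hypothetical element name
            w.writeCharacters(text);              // escapes markup characters such as & and <
            w.writeEndElement();
            w.writeEndDocument();
            w.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not write XML", e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Byte 0xE9 is 'é' in ISO-8859-1; the output document stores it as UTF-8.
        byte[] xml = toUtf8Xml(new byte[] {(byte) 0xE9}, StandardCharsets.ISO_8859_1);
        System.out.println(new String(xml, StandardCharsets.UTF_8));
    }
}
```

Transcoding to UTF-8 is a design choice here, not a requirement: every conforming XML parser must accept UTF-8, so it sidesteps the question of whether the parser supports the original encoding.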

If your parser does not support that encoding, you need to convert the file to an encoding that is supported. Alternatively, if there are only a few characters like that in each file, you could convert them to character references (like &#160;). Just remember that character references contain Unicode code points, not encoded byte values. Thus a value of 160 in your encoding may not be the same character as Unicode &#160;. You have to translate to the right character.
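
The character-reference option can be sketched as follows. This is an illustration with made-up names, and it assumes the input has already been decoded into a Java String with the correct charset, so that each character carries its Unicode code point. For example, byte 0xF4 in ISO-8859-7 is 'τ', which is U+03C4, so the right reference is &#964; and not &#244;.

```java
final class CharRefEscaper {

    /**
     * Replaces every non-ASCII code point with a decimal numeric character
     * reference. ASCII is copied as-is, so markup characters such as '&'
     * and '<' still need the usual escaping elsewhere.
     */
    static String toCharRefs(String text) {
        StringBuilder out = new StringBuilder(text.length());
        text.codePoints().forEach(cp -> {
            if (cp < 0x80) {
                out.appendCodePoint(cp);
            } else {
                out.append("&#").append(cp).append(';');
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: &#964; and &#160;
        System.out.println(toCharRefs("\u03C4 and \u00A0"));
    }
}
```

Because the output is pure ASCII, it is valid under practically any encoding declaration, at the cost of a larger and less readable file.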

If I were using a DTD I could have declared these characters
using the <!ENTITY.... tag, but how do I declare these entities
in a schema? Or in the converted XML?



There is no problem with putting a DTD into a schema document. In fact, the schema for schemas given in the XML Schema Recommendation does that very thing. But it is unlikely to solve your problem in this case, because your problem is either incorrect encoding or illegal characters. No tricks with the DTD can solve either of these problems.
 