Transformation Problem For Arabic/French Character : HTML to XML

Tanzy Akhtar
Ranch Hand

Joined: Jul 19, 2009
Posts: 110
Hi,

I have a program which converts HTML into XML.
I am facing a problem when transforming French/Arabic characters.

The code is as follows--



The transformer used is --



and the version of jar is xalan-2.7.0.

Any idea what I am missing?

Thanks,
Tanzy.

Roll with punchers, there is always tomorrow.
Techie Blog -- http://jtanzy.blogspot.com/
Aneesh Vijendran
Ranch Hand

Joined: Jun 29, 2008
Posts: 125
Hi,

By default Xalan uses UTF-8, so there is no problem with the transformer/parser. The problem should be with your XSL.

Could you try specifying the UTF-8 encoding in your XSL? You could do that by adding the following to the XSL file:





You can have a closer look at the syntax here:

http://www.w3schools.com/xsl/el_output.asp
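
For illustration, here is a minimal Java sketch of a transformation whose stylesheet declares UTF-8 output. The identity stylesheet and the class and method names are hypothetical, not the poster's code, and the sketch assumes whichever TransformerFactory the JDK picks up by default rather than Xalan 2.7.0 specifically. It also assumes Java 7 or later for StandardCharsets.

```java
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XslOutputEncodingDemo {
    // Hypothetical identity stylesheet; xsl:output is the element under discussion.
    static final String XSL =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"xml\" encoding=\"UTF-8\"/>"
      + "<xsl:template match=\"@*|node()\">"
      + "<xsl:copy><xsl:apply-templates select=\"@*|node()\"/></xsl:copy>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Transforms the given XML text and returns the raw bytes the serializer produced.
    static byte[] transform(String xml) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Writing to an OutputStream lets the stylesheet's encoding decide the bytes.
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        byte[] bytes = transform("<p>caf\u00e9</p>");
        // Decode with the same charset the stylesheet declared.
        String decoded = new String(bytes, StandardCharsets.UTF_8);
        System.out.println(decoded.contains("caf\u00e9"));
    }
}
```

Note that the declared encoding only governs bytes written by the serializer itself. If the result goes to a Writer, or the bytes are later decoded with a different charset, the declaration alone does not protect the text.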


Cheers
Aneesh


Tanzy Akhtar
Ranch Hand

Joined: Jul 19, 2009
Posts: 110
Thank you, Aneesh, for the response.

<xsl:output
method="xml"
encoding="UTF-8"

....
...
/>


This has already been done.

Meaning, there is some other issue.

Thanks,
Tanzy.
Aneesh Vijendran
Ranch Hand

Joined: Jun 29, 2008
Posts: 125
Could you please attach the xsl & xml here, through the attachments?

Cheers
Aneesh
Tanzy Akhtar
Ranch Hand

Joined: Jul 19, 2009
Posts: 110
Aneesh, this program runs fine if I am not using the Xalan transformer.

The problem occurs only when using Xalan.
Is there any other transformer which I should use instead of Xalan?
Aneesh Vijendran
Ranch Hand

Joined: Jun 29, 2008
Posts: 125
I didn't get the point:

Aneesh, this program is running fine if i am not using xalan transformer


Are you getting
???
in the transformed XML, or an exception? Could you please let me know the exact problem?

You can try using Cocoon. But I can say the problem is not with the transformer. I have done numerous transformations with a variety of Unicode Indian characters.

Cheers
Aneesh
Rob Spoor
Sheriff

Joined: Oct 27, 2005
Posts: 19783
    

tanzy akhtar wrote: I am facing problem when transforming french/arabic character.

What's the problem? http://faq.javaranch.com/java/TellTheDetails


SCJP 1.4 - SCJP 6 - SCWCD 5 - OCEEJBD 6
How To Ask Questions How To Answer Questions
Tanzy Akhtar
Ranch Hand

Joined: Jul 19, 2009
Posts: 110
Are you getting

???


Yes, exactly, Aneesh. This is my problem. After transformation, Arabic/French characters get replaced by "???".
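
As a rough illustration of where "?" characters typically come from in Java: when text is encoded into a charset that cannot represent a character, the encoder substitutes a replacement byte, which is "?" for US-ASCII. The sketch below is not the poster's code; the class and method names are made up for the example, and it assumes Java 7 or later for StandardCharsets.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class QuestionMarkDemo {
    // Encodes the text to bytes in the given charset, then decodes those bytes back.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        // "cafe" with an e-acute, followed by a five-letter Arabic word.
        String sample = "caf\u00e9 \u0645\u0631\u062d\u0628\u0627";
        // US-ASCII cannot represent the accented or Arabic characters.
        System.out.println(roundTrip(sample, StandardCharsets.US_ASCII)); // prints "caf? ?????"
        // UTF-8 can represent all of them, so the text survives unchanged.
        System.out.println(roundTrip(sample, StandardCharsets.UTF_8).equals(sample)); // prints "true"
    }
}
```

The same substitution happens implicitly whenever a reader, writer, or getBytes() call falls back to a platform default charset that lacks the characters in question.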


Sorry Rob, I did not specify my problem earlier, and thanks for pointing that out.
Tanzy Akhtar
Ranch Hand

Joined: Jul 19, 2009
Posts: 110
Before doing the actual transformation, I replace the "nbsp" character with "\n".
Below is the program which gets executed before the transformation takes place--



There may be some problem here when copying.
Rob Spoor
Sheriff

Joined: Oct 27, 2005
Posts: 19783
    

&nbsp; is not the same as an enter. The nearest equivalent is a space. That's where the name comes from: non breaking space.
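
A minimal sketch of that idea in Java, assuming the pre-processing step works on the HTML as a String. The class and method names are hypothetical. The first method keeps the non-breaking space by using a numeric character reference, which XML accepts without any declaration; the second simply substitutes an ordinary space.

```java
class NbspCleanup {
    // Swaps the HTML-only entity for the numeric reference to U+00A0 (non-breaking space).
    static String toNumericReference(String html) {
        return html.replace("&nbsp;", "&#160;");
    }

    // Swaps the entity for an ordinary space, the nearest plain-text equivalent.
    static String toPlainSpace(String html) {
        return html.replace("&nbsp;", " ");
    }

    public static void main(String[] args) {
        System.out.println(toNumericReference("a&nbsp;b")); // prints "a&#160;b"
        System.out.println(toPlainSpace("a&nbsp;b"));       // prints "a b"
    }
}
```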
Tanzy Akhtar
Ranch Hand

Joined: Jul 19, 2009
Posts: 110
Thanks Rob.

Well, is that nbsp; creating the problem in my case?
Aneesh Vijendran
Ranch Hand

Joined: Jun 29, 2008
Posts: 125
Tanzy,

Shouldn't it be &nbsp; rather than nbsp; ?

Well, what I think is that there isn't any problem with your code; I guess it's rather a problem with your browser. Did you try setting the browser charset to Unicode?

(Mozilla) View -> Character Encoding -> Unicode
(IE) View -> Encoding -> Unicode

Let me know this.

Cheers
Aneesh
Rob Spoor
Sheriff

Joined: Oct 27, 2005
Posts: 19783
    

tanzy akhtar wrote: Thanks Rob.

Well is that nbsp; creating problem in my case?

Possibly; &nbsp; is a valid HTML entity, but it is not defined in XML unless you declare it explicitly. For instance, an XML document that uses it without a declaration gives me errors when run through xmllint.
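
The same behaviour can be reproduced from Java. The sketch below is an illustration only, using the JDK's built-in DOM parser with its default settings; the class name, method name, and sample documents are invented for the example. A document that references the entity without declaring it is rejected, while one that declares it in an internal DTD subset parses.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class NbspEntityDemo {
    // Returns true if the text is well-formed XML, false if the parser rejects it.
    static boolean parses(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        try {
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Raised, for example, when an entity is referenced but never declared.
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // The entity is used but never declared, so this prints "false".
        System.out.println(parses("<p>a&nbsp;b</p>"));
        // The entity is declared in the internal DTD subset, so this prints "true".
        System.out.println(parses("<!DOCTYPE p [<!ENTITY nbsp \"&#160;\">]><p>a&nbsp;b</p>"));
    }
}
```

The parser's default error handler may also print a fatal-error message to standard error for the first document.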
Aneesh Vijendran
Ranch Hand

Joined: Jun 29, 2008
Posts: 125
Rob, I guess it's because &nbsp; is not an XML entity; it's only an HTML entity. So I guess his problem seems to be an encoding issue.

Cheers
Aneesh
Tanzy Akhtar
Ranch Hand

Joined: Jul 19, 2009
Posts: 110
Shouldn't it be &nbsp; rather than nbsp; ?


Yes, Aneesh. It is the same character you are describing.
Actually, while posting, the ampersand character became invisible.

Thanks for correcting this.

Well, the browser setting is also done, but it's still not working.

Aneesh Vijendran
Ranch Hand

Joined: Jun 29, 2008
Posts: 125
Hi,

I have sent you a PM with my email.


Cheers
Aneesh
Rob Spoor
Sheriff

Joined: Oct 27, 2005
Posts: 19783
    

Aneesh Vijendran wrote: Have send you a Pm with my email.

http://faq.javaranch.com/java/UseTheForumNotEmail
Don't use email or private messages to come to a solution; other people will not see those so you are withholding that solution from everybody else.
Aneesh Vijendran
Ranch Hand

Joined: Jun 29, 2008
Posts: 125
Rob, I would definitely post the solution here if I ever arrive at one. If there are some files which he can't make public, what could be done?

Cheers
Aneesh
Rob Spoor
Sheriff

Joined: Oct 27, 2005
Posts: 19783
    

He should obfuscate them: replace all sensitive data. I once had to export our customer database; I ended up giving them all my manager's name in the export.
Aneesh Vijendran
Ranch Hand

Joined: Jun 29, 2008
Posts: 125
Yeah, you are right.

Oops, manager's details, lol! He might have got a hundred calls regarding market research and stuff, lol!
Tanzy Akhtar
Ranch Hand

Joined: Jul 19, 2009
Posts: 110
lolz...
nice conversation..
Tanzy Akhtar
Ranch Hand

Joined: Jul 19, 2009
Posts: 110
Hi,

I found a workaround for my problem.

Just pass "UTF-8" as a parameter wherever an input stream/output stream is created.

It's working fine for me.
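
The workaround described above might look something like the following. This is a sketch only: the original code was not posted, so the class name, helper names, and temp-file usage are assumptions, and it assumes Java 7 or later. The point is that the charset is named explicitly when wrapping each stream, instead of relying on the platform default.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

class Utf8Streams {
    // Reads the whole file, decoding its bytes explicitly as UTF-8.
    static String readUtf8(File file) throws IOException {
        StringBuilder text = new StringBuilder();
        try (Reader in = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8))) {
            char[] buffer = new char[4096];
            int count;
            while ((count = in.read(buffer)) != -1) {
                text.append(buffer, 0, count);
            }
        }
        return text.toString();
    }

    // Writes the text, encoding it explicitly as UTF-8.
    static void writeUtf8(File file, String text) throws IOException {
        try (Writer out = new BufferedWriter(
                new OutputStreamWriter(new FileOutputStream(file), StandardCharsets.UTF_8))) {
            out.write(text);
        }
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("utf8-demo", ".xml");
        tmp.deleteOnExit();
        // French and Arabic sample text, written with escapes to keep the source ASCII-only.
        String sample = "<p>caf\u00e9 \u0645\u0631\u062d\u0628\u0627</p>";
        writeUtf8(tmp, sample);
        System.out.println(readUtf8(tmp).equals(sample)); // prints "true"
    }
}
```

On older JDKs the same constructors accept the charset name as a string, for example new InputStreamReader(stream, "UTF-8").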

Thank you Rob and Aneesh for useful guidelines.

Life Rocks,
Tanzy.
Aneesh Vijendran
Ranch Hand

Joined: Jun 29, 2008
Posts: 125
Excellent!!!
 