
Ignoring ParsingExceptions for quotes

Mustafa Garhi
Ranch Hand

Joined: Nov 05, 2008
Posts: 111
Hi,

I am using JAXP to convert HTML to CSV with an XSL stylesheet.

The source file is huge and has unquoted attributes, so the transformation fails with a javax.xml.transform.TransformerException.

I tried to find a transformer that ignores parsing errors, but had no luck.

Now I am thinking of tidying the HTML file in Java code first, if there are no other options.

Here is the JAXP code:



Please suggest how I can get around this issue.

Thanks in advance.
William Brogden
Author and all-around good cowpoke
Rancher

Joined: Mar 22, 2000
Posts: 12835
    
I don't think it is possible or advisable to try to recover from SAXParseExceptions.
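To illustrate why recovery is not really an option: an unquoted attribute value breaks XML well-formedness, so the parser reports it through ErrorHandler.fatalError, and the SAX contract says the document must be treated as unusable after that call. The following is only a minimal sketch using the JDK's built-in parser; the class and method names (WellFormedCheck, isWellFormed) are made up for the example and are not from the original poster's code.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper: reports whether a string parses as well-formed XML.
final class WellFormedCheck {

    static boolean isWellFormed(String markup) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                // Warnings and recoverable errors are ignored in this sketch.
                @Override public void warning(SAXParseException e) { }
                @Override public void error(SAXParseException e) { }
                // Well-formedness violations arrive here; rethrow to stop parsing.
                @Override public void fatalError(SAXParseException e) throws SAXException {
                    throw e;
                }
            });
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<td width=100>x</td>"));     // unquoted value
        System.out.println(isWellFormed("<td width=\"100\">x</td>")); // quoted value
    }
}
```

A check like this could also be run on each file after cleanup, to confirm it is ready for the XSLT step.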

You are going to have to do some sort of preliminary cleanup. JTidy might be able to handle it.

If the input file problems are regular, you might be able to scan it as text and patch the missing attribute quotes to a new file.
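As a rough sketch of that text-patching idea, assuming the only problem is unquoted attribute values inside start tags: the code below finds each tag with a regular expression and wraps any bare value in double quotes. The class name AttributeQuoter and both patterns are my own assumptions, not anything from the original files. It will be fooled by a ">" inside a quoted value, by tag-like text inside script blocks, and by self-closing tags written as name=value/>, so treat it as a starting point only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: adds double quotes around unquoted attribute values.
final class AttributeQuoter {

    // A start tag such as <td width=100 class="x"> (assumes no '>' inside values).
    private static final Pattern TAG = Pattern.compile("<[A-Za-z][^<>]*>");

    // name=value where value is double-quoted, single-quoted, or bare.
    private static final Pattern ATTR = Pattern.compile(
            "([A-Za-z_:][-A-Za-z0-9_:.]*)\\s*=\\s*(\"[^\"]*\"|'[^']*'|[^\\s\"'>]+)");

    static String quoteAttributes(String html) {
        Matcher tag = TAG.matcher(html);
        StringBuffer out = new StringBuffer();
        while (tag.find()) {
            // Only text inside tags is rewritten; everything else is copied as-is.
            tag.appendReplacement(out, Matcher.quoteReplacement(fixTag(tag.group())));
        }
        tag.appendTail(out);
        return out.toString();
    }

    private static String fixTag(String tag) {
        Matcher attr = ATTR.matcher(tag);
        StringBuffer out = new StringBuffer();
        while (attr.find()) {
            String value = attr.group(2);
            boolean quoted = value.startsWith("\"") || value.startsWith("'");
            String fixed = attr.group(1) + "=" + (quoted ? value : "\"" + value + "\"");
            attr.appendReplacement(out, Matcher.quoteReplacement(fixed));
        }
        attr.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(quoteAttributes("<td width=100 class=\"x\" align=left>a=b</td>"));
    }
}
```

Reading each file into a string, passing it through quoteAttributes, and writing the result to a new file would give the "patched copy" described above.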

Another possibility would be to drop XSLT and scan as text, writing the CSV directly.
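A sketch of that direct approach follows, under a big assumption: that each record is a table row (tr) whose fields are td cells. The thread never shows the actual page layout, so the patterns and the class name HtmlRowsToCsv are purely illustrative. Because the regular expressions do not care whether attributes are quoted, the unquoted-attribute problem never comes up. HTML entities such as &amp; are not decoded here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: turns <tr>/<td> rows into CSV lines without an XML parser.
final class HtmlRowsToCsv {

    private static final int FLAGS = Pattern.CASE_INSENSITIVE | Pattern.DOTALL;
    private static final Pattern ROW = Pattern.compile("<tr\\b[^>]*>(.*?)</tr>", FLAGS);
    private static final Pattern CELL = Pattern.compile("<td\\b[^>]*>(.*?)</td>", FLAGS);
    private static final Pattern INNER_TAG = Pattern.compile("<[^>]+>");

    static String toCsv(String html) {
        StringBuilder csv = new StringBuilder();
        Matcher row = ROW.matcher(html);
        while (row.find()) {
            List<String> fields = new ArrayList<>();
            Matcher cell = CELL.matcher(row.group(1));
            while (cell.find()) {
                // Drop any markup nested inside the cell and trim whitespace.
                String text = INNER_TAG.matcher(cell.group(1)).replaceAll("").trim();
                fields.add(escape(text));
            }
            if (!fields.isEmpty()) { // rows without td cells (e.g. headers) are skipped
                csv.append(String.join(",", fields)).append('\n');
            }
        }
        return csv.toString();
    }

    // Standard CSV quoting: wrap in quotes and double any embedded quotes.
    static String escape(String field) {
        if (field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r")) {
            return "\"" + field.replace("\"", "\"\"") + "\"";
        }
        return field;
    }

    public static void main(String[] args) {
        System.out.print(toCsv("<tr><td width=50>Ann</td><td>a \"b\", c</td></tr>"));
    }
}
```

Whether this is simpler than tidying plus XSLT depends on how regular the pages are; nested tables or records spread across several rows would need more than this.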

How large a "huge" file are we talking about?

Bill
Mustafa Garhi
Ranch Hand

Joined: Nov 05, 2008
Posts: 111
Thanks man,

Around 30k records distributed over 300 HTML files, so 300 records per file, with some JavaScript and other UI code on each page.

Anyway, I think tidying up the HTML looks like the best option.

Thanks again.
William Brogden
Author and all-around good cowpoke
Rancher

Joined: Mar 22, 2000
Posts: 12835
    
It would be great if you could later post what your solution was, after you get this working. We tend to get similar questions frequently and your solution could help a lot of people.

Thanks
Bill
Mustafa Garhi
Ranch Hand

Joined: Nov 05, 2008
Posts: 111
I'll be sure to do that once it's done.

Thanks
 