XML file parsing in Java using SAX

Anjali Raman
Ranch Hand

Joined: Nov 28, 2007
Posts: 57
Hello people,
Not sure if I am posting this in the right place in the forum; I didn't find anything more apt.

Now for the question: I am trying to parse an XML file in Java using SAX and am having some issues with the way the output is getting formatted.
Here is the code:


And the XML file I am trying to parse is:


The Output is:
===================================================
Start element: company
Character =

Start element: staff
Character =

Start element: firstname
Character = yong
Character =

Start element: lastname
Character = mook kim
Character =

Start element: nickname
Character = mkyong
Character =

Start element: salary
Character = 100000
Character =

Character =

Start element: staff
Character =

Start element: firstname
Character = low
Character =

Start element: lastname
Character = yin fong
Character =

Start element: nickname
Character = fong fong
Character =

Start element: salary
Character = 200000
Character =

Character =
========================================================

My problem is that I want to get rid of the empty "Character =" lines.

Could someone help me with this?


Cheers,
Anjali


Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 41621
You can use a boolean that keeps track of whether the parser is currently inside one of the elements that have text content (firstname, lastname, nickname and salary). If it's not, skip character processing.
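A minimal sketch of that idea, assuming the four element names from the sample file are known in advance. The class name `TextElementHandler` and the `parse` helper are made up for illustration and do not come from the original code.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: only keeps text that appears inside known text elements.
class TextElementHandler extends DefaultHandler {
    // Assumption: the element names that carry text are known up front.
    private static final Set<String> TEXT_ELEMENTS = new HashSet<>(
            Arrays.asList("firstname", "lastname", "nickname", "salary"));

    private boolean insideTextElement = false;
    final List<String> values = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Switch the flag on only for elements that are expected to hold text.
        insideTextElement = TEXT_ELEMENTS.contains(qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // Anything between a closing tag and the next opening tag is skipped.
        insideTextElement = false;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (insideTextElement) {
            values.add(new String(ch, start, length));
        }
    }

    // Convenience helper for trying the handler on an XML string.
    static List<String> parse(String xml) {
        try {
            TextElementHandler handler = new TextElementHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.values;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The drawback is that the element names are hard-coded, so this only fits files whose tag names are fixed.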


Joanne Neal
Rancher

Joined: Aug 05, 2005
Posts: 3499
Anjali Raman wrote:My problem is that I want to get rid of the Empty Character




Strings are immutable, so charString.trim() doesn't actually change charString at all; it returns a new String. That is why charString.isEmpty() will never be true for the whitespace-only chunks.

You would be better off using a StringBuilder. Continuous concatenation of single chars to a String creates lots of String objects.

Try this
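The code posted at this point did not survive in the thread. As a stand-in, here is a minimal sketch of the StringBuilder-plus-trim idea described above; the class name `TrimmedCharactersHandler`, the `printed` list and the `parse` helper are assumptions made for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: prints only chunks that contain something besides whitespace.
class TrimmedCharactersHandler extends DefaultHandler {
    final List<String> printed = new ArrayList<>();

    @Override
    public void characters(char[] ch, int start, int length) {
        // Copy the chunk in one call instead of concatenating char by char.
        StringBuilder sb = new StringBuilder(length);
        sb.append(ch, start, length);

        // trim() returns a new String, so its result has to be kept and used.
        String text = sb.toString().trim();
        if (!text.isEmpty()) {
            printed.add(text);
            System.out.println("Character = " + text);
        }
    }

    // Convenience helper for trying the handler on an XML string.
    static List<String> parse(String xml) {
        try {
            TrimmedCharactersHandler handler = new TrimmedCharactersHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.printed;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because nothing here depends on element names, the same handler would also work when the tag names vary from file to file.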


Joanne
Anjali Raman
Ranch Hand

Joined: Nov 28, 2007
Posts: 57
Hello,
Well, that was just an example that I was trying.
In reality my XML file looks as horrid as this:



And tags such as m02000c0037 are not constant. They keep varying. So I need something generic where we don't specify the element/node names.
vinod ernakulam
Greenhorn

Joined: Jun 29, 2009
Posts: 7
Try this.


public void characters(char[] ch, int start, int length)
{
    // Collect the chunk, skipping backslashes, quotes, line breaks and tabs
    StringBuilder sb = new StringBuilder(length);
    for (int i = start; i < start + length; i++) {
        switch (ch[i]) {
            case '\\':
            case '"':
            case '\n':
            case '\r':
            case '\t':
                break; // skipped
            default:
                sb.append(ch[i]);
                break;
        }
    }
    String charString = sb.toString();
    // Test the content with isEmpty(); != "" compares references, not text
    if (!charString.trim().isEmpty())
        System.out.println("Character = " + charString);
}
Anjali Raman
Ranch Hand

Joined: Nov 28, 2007
Posts: 57
Thanks Neal, your solution worked great for me.

Now I would actually like to add these values into a HashMap.
If the start element tag is empty, as in the case of company and staff, how can I make sure that it does not go to characters?
Anjali Raman
Ranch Hand

Joined: Nov 28, 2007
Posts: 57
My additional question is: how can I add these into a HashMap?

I saw something similar on JavaRanch, but the examples were not clear enough.
Anjali Raman
Ranch Hand

Joined: Nov 28, 2007
Posts: 57
Also, can you please let me know what the best way is to add this into a HashMap?
I can't use the nodes as keys, as they will repeat.

Any help on this will be greatly appreciated.
William Brogden
Author and all-around good cowpoke
Rancher

Joined: Mar 22, 2000
Posts: 12769
When you retrieve from this HashMap - what are you expecting to get back? How are you going to ask for it?
Jimmy Clark
Ranch Hand

Joined: Apr 16, 2008
Posts: 2187


You might want to modify this so that you don't create a String object every time the characters method is called by the parser. The concatenation also creates another String object on each pass. Look into using a StringBuffer held as a class variable.
Joanne Neal
Rancher

Joined: Aug 05, 2005
Posts: 3499
Anjali Raman wrote:If the start element tag is empty like in case of company and staff, how can I make sure that it does not go to characters?


The characters you get after the empty tags like company and staff have nothing to do with the value of the tag. Notice that you also get an 'empty' characters array with every tag. It is actually the line break and the whitespace at the start of every line. You can verify this by parsing a file where the whole XML is on one line:
<?xml version="1.0"?>
<company><staff><firstname>yong</firstname><lastname>mook kim</lastname><nickname>mkyong</nickname><salary>100000</salary></staff></company>

Other than removing the whitespace from the XML file (not recommended, as it would make it very hard for humans to read), your only other option is to ignore it. Notice that this whitespace will always be after a closing tag and characters that represent the value of a tag will always be after an opening tag. So you just need to maintain a flag that indicates whether the last tag parsed was an opening or closing one.
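A minimal sketch of the flag approach, with no element names hard-coded. In the output at the top of the thread, whitespace also follows the opening tags of company and staff, so this sketch buffers the text and reports it only when a closing tag directly follows its opening tag. The class name `LeafValueHandler`, the `name=value` format and the `parse` helper are assumptions for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: reports text only for elements that have no child elements.
class LeafValueHandler extends DefaultHandler {
    private final StringBuilder buffer = new StringBuilder();
    private boolean lastTagWasOpening = false;
    final List<String> leaves = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // A new opening tag discards whatever was buffered for the parent element.
        buffer.setLength(0);
        lastTagWasOpening = true;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Text after a closing tag is indentation only, so it is ignored.
        if (lastTagWasOpening) {
            buffer.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // Closing tag right after its opening tag: the buffer holds the element's value.
        if (lastTagWasOpening) {
            leaves.add(qName + "=" + buffer);
        }
        lastTagWasOpening = false;
    }

    // Convenience helper for trying the handler on an XML string.
    static List<String> parse(String xml) {
        try {
            LeafValueHandler handler = new LeafValueHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.leaves;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Keeping the buffer as a field also copes with a parser that delivers one piece of text in several characters calls, since the chunks are joined before the value is reported.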
Jimmy Clark
Ranch Hand

Joined: Apr 16, 2008
Posts: 2187
<company>
<staff>
<firstname>yong</firstname>
<lastname>mook kim</lastname>
<nickname>mkyong</nickname>
<salary>100000</salary>
</staff>
</company>


If the start element tag is empty like in case of company and staff, how can I make sure that it does not go to characters?


XML Elements can have two types of content: element content and character content. So, in your example file the company and staff elements have element content, not character content. For example, the content of the staff element is firstname, lastname, nickname and salary child elements.

You don't have to make sure that the parser will not call the characters method, because there is no character content in the staff or company element.

Keep in mind that your SAX content handler is only receiving events from an XML parser, it is not the actual parser. The code in the parser implementation is making sure that it only sends correct events to the content handler.

The Simple API for XML (SAX) is only a programming API, it is not an XML parser. Apache Xerces is an XML parser.
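To illustrate the API-versus-parser distinction, here is a small sketch that asks the standard JAXP factory for a SAX parser and prints the class of whatever implementation it gets. The class name `ParserLookup` and its `newReader` helper are made up for this example; the printed class name depends on the JDK and classpath in use.

```java
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.XMLReader;

// Hypothetical helper: shows which parser implementation sits behind the SAX API.
class ParserLookup {
    static XMLReader newReader() {
        try {
            // The factory picks an implementation; calling code only sees SAX interfaces.
            SAXParserFactory factory = SAXParserFactory.newInstance();
            SAXParser parser = factory.newSAXParser();
            return parser.getXMLReader();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The concrete class belongs to the parser, not to the SAX API itself.
        System.out.println("Parser implementation: " + newReader().getClass().getName());
    }
}
```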
Anjali Raman
Ranch Hand

Joined: Nov 28, 2007
Posts: 57
@William Brogden: The whole intention of putting it in a HashMap is that in my program I want to retrieve the values in the following format. I will take the example of the huge XML file I had attached.

CLA-0:
Key Value
m02000c0003 4.0499999999999998224
m02000c0004 5.9333333333333335702
.
.
.
.
Similarly, I need to collect the data for the CLA-1 node as well.

Is there any way of doing this? Since a HashMap only allows unique keys and the tag m02000c0003 is repeated in the CLA-1 node as well, I tried to concatenate CLA-0 to it so that the key will be unique.


And once again, thanks for all your help.
Please help me find a good solution for my problem.





Anjali Raman
Ranch Hand

Joined: Nov 28, 2007
Posts: 57
Also, Joanne: you told me that I should set a boolean flag to make sure that the tag has an end tag. I didn't quite understand this.
Could you please explain this in more detail?
William Brogden
Author and all-around good cowpoke
Rancher

Joined: Mar 22, 2000
Posts: 12769
This sounds to me like you really want a two-step process based on a HashMap of HashMaps rather than trying to invent some way to make a single composite key.

The first step would retrieve a HashMap for the CLA-0 entries, which in turn holds the key and value entries as per your example.
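A minimal sketch of such a map of maps, using the two CLA-0 values quoted earlier in the thread. The class name `NestedCounters` and its methods are hypothetical, and the CLA-1 value is a made-up placeholder because the real file is not shown.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical container: outer key is the group (e.g. "CLA-0"), inner key is the tag name.
class NestedCounters {
    private final Map<String, Map<String, String>> groups = new LinkedHashMap<>();

    // Creates the inner map on first use, then stores the value under its tag name.
    void put(String group, String key, String value) {
        groups.computeIfAbsent(group, g -> new LinkedHashMap<>()).put(key, value);
    }

    // Step one: fetch all entries of one group (empty map if the group is unknown).
    Map<String, String> group(String group) {
        return groups.getOrDefault(group, Collections.emptyMap());
    }

    // Step two: look up a single tag inside that group.
    String get(String group, String key) {
        return group(group).get(key);
    }

    public static void main(String[] args) {
        NestedCounters counters = new NestedCounters();
        counters.put("CLA-0", "m02000c0003", "4.0499999999999998224");
        counters.put("CLA-0", "m02000c0004", "5.9333333333333335702");
        counters.put("CLA-1", "m02000c0003", "0.0"); // placeholder value, not from the real file

        // The same tag name can appear in both groups without clashing.
        for (Map.Entry<String, String> e : counters.group("CLA-0").entrySet()) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

In a SAX handler, the group name would be remembered when the enclosing node starts, and each leaf value would then be stored with `put(currentGroup, tagName, value)`. How the group name appears in the real XML is not shown in the thread, so that part is left open.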

Bill