XML file parsing in Java using SAX

 
Ranch Hand
Posts: 57
Hello everyone,
I'm not sure if I am posting this in the right place in the forum; I didn't find anything more apt.

Now for the question: I am trying to parse an XML file in Java using SAX and am having some issues with the way the output is formatted.
Here is the code:


And the XML file I am trying to parse is :


The Output is:
===================================================
Start element: company
Character =

Start element: staff
Character =

Start element: firstname
Character = yong
Character =

Start element: lastname
Character = mook kim
Character =

Start element: nickname
Character = mkyong
Character =

Start element: salary
Character = 100000
Character =

Character =

Start element: staff
Character =

Start element: firstname
Character = low
Character =

Start element: lastname
Character = yin fong
Character =

Start element: nickname
Character = fong fong
Character =

Start element: salary
Character = 200000
Character =

Character =
========================================================

My problem is that I want to get rid of the empty "Character =" lines.

Could someone help me with this?


Cheers,
Anjali


 
Rancher
Posts: 43081
You can use a boolean that keeps track of whether the parser is currently inside one of the elements that have text content (firstname, lastname, nickname and salary). If it's not, skip character processing.
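A minimal sketch of that boolean approach, assuming the four element names from the sample output. The class name `TextElementHandler` and its `parse` helper are hypothetical and exist only for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects text only while inside a known text-bearing element.
class TextElementHandler extends DefaultHandler {
    // Assumed element names, taken from the sample output in the question.
    private static final Set<String> TEXT_ELEMENTS =
            Set.of("firstname", "lastname", "nickname", "salary");

    private boolean insideTextElement = false;
    private final StringBuilder text = new StringBuilder();
    final List<String> values = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        insideTextElement = TEXT_ELEMENTS.contains(qName);
        text.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (insideTextElement) {
            // The parser may deliver one text node in several chunks, so append.
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (insideTextElement) {
            values.add(qName + " = " + text);
        }
        insideTextElement = false;
    }

    // Parses an XML string and returns the collected "name = value" entries.
    static List<String> parse(String xml) {
        try {
            TextElementHandler handler = new TextElementHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.values;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<company>\n  <staff>\n    <firstname>yong</firstname>\n"
                + "    <salary>100000</salary>\n  </staff>\n</company>";
        parse(xml).forEach(System.out::println);
    }
}
```

This only works when the text-bearing element names are known in advance; whitespace between elements is skipped because the flag is false outside those elements.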
 
Rancher
Posts: 3742

Anjali Raman wrote:My problem is that I want to get rid of the Empty Character





Strings are immutable, so charString.trim() doesn't change charString at all; it returns a new, trimmed String. Unless you assign that result back, charString.isEmpty() will never be true for whitespace-only text.

You would be better off using a StringBuilder. Repeatedly concatenating single chars to a String creates lots of String objects.

Try this
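As a minimal sketch of the StringBuilder approach (an illustration, not the code originally posted): the class name `TrimmingHandler` and its `parse` helper are hypothetical. Text is buffered across `characters` calls and the trimmed result is tested when the next tag arrives.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: buffers text in a StringBuilder and drops whitespace-only runs.
class TrimmingHandler extends DefaultHandler {
    private final StringBuilder buffer = new StringBuilder();
    final List<String> lines = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        flush();
        lines.add("Start element: " + qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Appending the slice avoids creating a new String per character.
        buffer.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        flush();
    }

    // trim() returns a new String, so its result has to be kept and tested.
    private void flush() {
        String trimmed = buffer.toString().trim();
        if (!trimmed.isEmpty()) {
            lines.add("Character = " + trimmed);
        }
        buffer.setLength(0);
    }

    // Parses an XML string and returns the lines that would be printed.
    static List<String> parse(String xml) {
        try {
            TrimmingHandler handler = new TrimmingHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.lines;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<company>\n  <staff>\n    <firstname>yong</firstname>\n"
                + "    <lastname>mook kim</lastname>\n  </staff>\n</company>";
        parse(xml).forEach(System.out::println);
    }
}
```

Because nothing here depends on element names, the same handler should also cope with files whose tag names vary.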
 
Anjali Raman
Ranch Hand
Posts: 57
Hello,
Well, that was just an example that I was trying.
In reality my XML file looks as horrid as this:



The tag names, such as m02000c0037, are not constant; they keep varying. So I need something generic where we don't specify the element/node names.
 
Greenhorn
Posts: 7
Try this.


public void characters(char[] ch, int start, int length) {
    // Collect the text, skipping backslashes, double quotes, line breaks and tabs.
    StringBuilder sb = new StringBuilder();
    for (int i = start; i < start + length; i++) {
        switch (ch[i]) {
            case '\\':
            case '"':
            case '\n':
            case '\r':
            case '\t':
                break; // skip these characters
            default:
                sb.append(ch[i]);
                break;
        }
    }
    // Test the content with isEmpty(); comparing Strings with != only compares references.
    String charString = sb.toString();
    if (!charString.trim().isEmpty()) {
        System.out.println("Character = " + charString);
    }
}
 
Anjali Raman
Ranch Hand
Posts: 57
Thanks Neal, your solution worked great for me.

Now I would actually like to add these values to a HashMap.
If the start element has no text of its own, as in the case of company and staff, how can I make sure that it does not go to characters?
 
Anjali Raman
Ranch Hand
Posts: 57
My additional question is: how can I add these to a HashMap?

I saw something similar on JavaRanch, but the examples were not clear enough.
 
Anjali Raman
Ranch Hand
Posts: 57
Also, can you please let me know what the best way to add this to a HashMap is?
I can't use the node names as keys, as they will repeat.

Any help on this will be greatly appreciated.
 
Author and all-around good cowpoke
Posts: 13078
When you retrieve from this HashMap - what are you expecting to get back? How are you going to ask for it?
 
Ranch Hand
Posts: 2187


You might want to modify this so that you don't create a String object every time the parser calls the characters method. Each concatenation also creates another String object. Look into using a StringBuffer (or StringBuilder) held as a field of the handler class.
 
Joanne Neal
Rancher
Posts: 3742

Anjali Raman wrote:If the start element tag is empty like in case of company and staff, how can I make sure that it does not go to characters?



The characters you get after the container tags like company and staff have nothing to do with the value of the tag. Notice that you also get an 'empty' characters call after every closing tag. It is actually the whitespace (the line break and indentation) between the tags. You can verify this by parsing a file where the whole XML is on one line:
<?xml version="1.0"?>
<company><staff><firstname>yong</firstname><lastname>mook kim</lastname><nickname>mkyong</nickname><salary>100000</salary></staff></company>

Other than removing the whitespace from the XML file (not recommended, as it would make it very hard for humans to read), your only other option is to ignore it. Notice that this whitespace comes either after a closing tag or after the opening tag of a container element such as company or staff, while characters that represent the value of a tag always come directly after an opening tag. So you can maintain a flag that indicates whether the last tag parsed was an opening or closing one, and only treat text as a value when it sits between an opening tag and its own closing tag.
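A minimal sketch of the flag idea, with no element names hard-coded. The class name `LeafTextHandler` and its `parse` helper are hypothetical. Text is buffered only after an opening tag and reported only when the matching closing tag follows directly, so whitespace between elements is dropped.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: tracks whether the last tag seen was an opening tag.
class LeafTextHandler extends DefaultHandler {
    private boolean lastTagWasOpening = false;
    private final StringBuilder text = new StringBuilder();
    final List<String> values = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Any text buffered after a previous opening tag was only indentation.
        text.setLength(0);
        lastTagWasOpening = true;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (lastTagWasOpening) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // An opening tag followed directly by its closing tag marks a leaf element,
        // so the buffered text is that element's value.
        if (lastTagWasOpening) {
            values.add(qName + " = " + text);
        }
        lastTagWasOpening = false;
    }

    // Parses an XML string and returns the collected "name = value" entries.
    static List<String> parse(String xml) {
        try {
            LeafTextHandler handler = new LeafTextHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.values;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<company>\n  <staff>\n    <firstname>yong</firstname>\n"
                + "    <lastname>mook kim</lastname>\n  </staff>\n</company>";
        parse(xml).forEach(System.out::println);
    }
}
```

One consequence of this design is that an empty leaf element such as `<note></note>` is reported with an empty value; whether that is wanted depends on the data.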
 
Jimmy Clark
Ranch Hand
Posts: 2187

<company>
<staff>
<firstname>yong</firstname>
<lastname>mook kim</lastname>
<nickname>mkyong</nickname>
<salary>100000</salary>
</staff>
</company>



If the start element tag is empty like in case of company and staff, how can I make sure that it does not go to characters?



XML Elements can have two types of content: element content and character content. So, in your example file the company and staff elements have element content, not character content. For example, the content of the staff element is firstname, lastname, nickname and salary child elements.

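In principle you should not need to stop the parser from calling the characters method for staff or company, because those elements have no meaningful character content. In practice, without a DTD the parser cannot know that, so the whitespace between the child elements is still reported through characters(). Only when the parser knows the content model does it report that whitespace through ignorableWhitespace() instead.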

Keep in mind that your SAX content handler only receives events from an XML parser; it is not the parser itself. The parser implementation is responsible for sending correct events to the content handler.

The Simple API for XML (SAX) is only a programming API; it is not an XML parser. Apache Xerces is an XML parser.
 
Anjali Raman
Ranch Hand
Posts: 57
@William Brogden: The whole point of putting it in a HashMap is that my program needs to retrieve the values in the following format. I will use the example of the large XML file I attached.

CLA-0:
Key Value
m02000c0003 4.0499999999999998224
m02000c0004 5.9333333333333335702
.
.
.
.
Similarly I need to collect the data for CLA-1 node as well

Is there any way of doing this? Since a HashMap only allows unique keys and the tag m02000c0003 is repeated in the CLA-1 node as well, I tried to concatenate CLA-0 to it so that the key would be unique.


Once again, thanks for all your help.
Please help me find a good solution to my problem.





 
Anjali Raman
Ranch Hand
Posts: 57
Also, Joanne: you told me that I should set a boolean flag to make sure that the tag has an end tag. I didn't quite understand this.
Could you please explain it in more detail?
 
William Brogden
Author and all-around good cowpoke
Posts: 13078
This sounds to me like you really want a two-step process based on a HashMap of HashMaps rather than trying to invent some way to make a single composite key.

The first step would retrieve the HashMap for the CLA-0 entries, which in turn holds the key and value pairs as per your example.

Bill
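A minimal sketch of the map-of-maps idea. The real file's layout is not shown in the thread, so this assumes that each group (CLA-0, CLA-1) is an element directly under the root and that each of its children is one key/value entry. The class name `GroupedValuesHandler`, the `data` root element, and the CLA-1 value in `main` are hypothetical.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: outer key = group element name, inner key = entry element name.
class GroupedValuesHandler extends DefaultHandler {
    final Map<String, Map<String, String>> groups = new LinkedHashMap<>();
    private Map<String, String> currentGroup;
    private final StringBuilder text = new StringBuilder();
    private int depth = 0; // 1 = root, 2 = group, 3 = entry (assumed layout)

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        depth++;
        if (depth == 2) {
            currentGroup = groups.computeIfAbsent(qName, k -> new LinkedHashMap<>());
        }
        text.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (depth == 3) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (depth == 3 && currentGroup != null) {
            // Values stay Strings so long decimals are not rounded.
            currentGroup.put(qName, text.toString().trim());
        }
        depth--;
    }

    // Parses an XML string and returns group name -> (entry name -> value).
    static Map<String, Map<String, String>> parse(String xml) {
        try {
            GroupedValuesHandler handler = new GroupedValuesHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.groups;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<data>\n <CLA-0>\n  <m02000c0003>4.0499999999999998224</m02000c0003>\n"
                + "  <m02000c0004>5.9333333333333335702</m02000c0004>\n </CLA-0>\n"
                + " <CLA-1>\n  <m02000c0003>1.5</m02000c0003>\n </CLA-1>\n</data>";
        Map<String, Map<String, String>> groups = parse(xml);
        // Step one: pick the group. Step two: look up the key inside it.
        System.out.println(groups.get("CLA-0").get("m02000c0003"));
    }
}
```

With this layout the repeated tag names are no longer a problem, because m02000c0003 is only unique within its own inner map. If the group name actually lives in an attribute rather than the element name, the `depth == 2` branch would read it from `atts` instead.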
 