David Patterson

Ranch Hand
since Jul 01, 2002

Recent posts by David Patterson

Can you post the code you have written so far, or are you fishing for someone to do your assignment for you?

Dave Patterson
No, a schema doesn't let you have one set of rules for Hondas and a different set for Toyotas.

You would normally specify that the carType element contains a sequence of zero or one crazydoors element followed by zero or one headlights element. That forces the two elements into that order if both are present. Or, you could specify a choice of either a crazydoors or a headlights element, but not both.
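Here is a minimal sketch of the two options, checked with the JDK's javax.xml.validation API. The schemas are inlined as strings, the element names come from the question, and the class and method names are made up for illustration. Note that neither schema can say anything about Hondas vs. Toyotas.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class CarTypeSchemas {
    // Option 1: zero or one of each, but crazydoors must come before headlights.
    static final String SEQUENCE_XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='carType'><xs:complexType><xs:sequence>"
      + "<xs:element name='crazydoors' minOccurs='0'/>"
      + "<xs:element name='headlights' minOccurs='0'/>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    // Option 2: exactly one of the two, never both.
    static final String CHOICE_XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='carType'><xs:complexType><xs:choice>"
      + "<xs:element name='crazydoors'/>"
      + "<xs:element name='headlights'/>"
      + "</xs:choice></xs:complexType></xs:element>"
      + "</xs:schema>";

    /** Returns true if xml is valid against xsd, false if validation fails. */
    static boolean isValid(String xsd, String xml) throws Exception {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
        Validator validator = schema.newValidator();
        try {
            // With no ErrorHandler set, the first validation error is thrown.
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }
}
```

With SEQUENCE_XSD, <carType><headlights/><crazydoors/></carType> is rejected only because of the order; with CHOICE_XSD, any carType containing both is rejected.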

As this shows, a schema will not solve all data problems. There is another technology called "Schematron" that was recently accepted as a standard. It lets you write XPath expressions, so you could validate that only Hondas have crazydoors and only Toyotas have headlights.

Dave Patterson
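Starting the expression with // finds PARENT elements anywhere in the document. Note that a leading // searches from the document root, not from the current context node (the node from which relative searches are made); to search only below the context node, start with .// instead. Starting with a single / finds a PARENT element only if it is the document (root) element. To find a PARENT that is an immediate child of the context node, write PARENT with no leading slash.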

In any case, any PARENT element that does not have a CHILD element as an immediate child is ignored, as is any PARENT whose CHILD does not itself have a VALUE element as an immediate child.

If you are really looking for



you need to specify /PARENT/CHILD[VALUE] or //PARENT/CHILD[VALUE]
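To see how these forms differ, here is a small sketch using the JDK's javax.xml.xpath API. The sample document (root, group, and the values 42 and 7) is hypothetical. Against it, //PARENT/CHILD[VALUE] finds two CHILD elements, /root/PARENT/CHILD[VALUE] finds one, and /PARENT/CHILD[VALUE] finds none, because PARENT is not the document element here.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathDemo {
    // Hypothetical sample: the second PARENT's CHILD has no VALUE child.
    static final String XML =
        "<root>"
      + "<PARENT><CHILD><VALUE>42</VALUE></CHILD></PARENT>"
      + "<PARENT><CHILD/></PARENT>"
      + "<group><PARENT><CHILD><VALUE>7</VALUE></CHILD></PARENT></group>"
      + "</root>";

    /** Returns how many nodes the expression selects in XML. */
    static int count(String expression) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(expression,
                new InputSource(new StringReader(XML)), XPathConstants.NODESET);
        return nodes.getLength();
    }
}
```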

Dave Patterson
Assuming you start with a node you have already found (say, Node myNode), I'd write a method that uses a StringBuffer and getParentNode(). For each element, put its start tag at the beginning of the StringBuffer and its end tag at the end. Then keep moving up (recursively or in a loop) until you reach the Document or the root element that you located earlier.

I've thought about writing one of these to build an XPath string from a node deep within a document, but it is still on my ToDo list.
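A minimal sketch of that idea, assuming the node came from a W3C DOM tree. It uses StringBuilder (the unsynchronized replacement for StringBuffer) and a plain loop instead of recursion; the class and method names are hypothetical. The second method is a first stab at the XPath-string version: it ignores position predicates, so it cannot tell apart sibling elements with the same name.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class AncestorPath {
    /** Wraps empty start/end tags for myNode and each of its ancestor elements. */
    static String wrapInAncestors(Node myNode) {
        StringBuilder sb = new StringBuilder();
        // Walk upward; the loop stops when getParentNode() reaches the Document.
        for (Node n = myNode; n != null && n.getNodeType() == Node.ELEMENT_NODE; n = n.getParentNode()) {
            sb.insert(0, "<" + n.getNodeName() + ">");          // start tag at the beginning
            sb.append("</").append(n.getNodeName()).append(">"); // end tag at the end
        }
        return sb.toString();
    }

    /** Builds a simple absolute path such as /root/a/c (no position predicates). */
    static String simpleXPath(Node myNode) {
        StringBuilder sb = new StringBuilder();
        for (Node n = myNode; n != null && n.getNodeType() == Node.ELEMENT_NODE; n = n.getParentNode()) {
            sb.insert(0, "/" + n.getNodeName());
        }
        return sb.toString();
    }

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }
}
```

For the c element in <root><a><c/></a></root>, the first method gives <root><a><c></c></a></root> and the second gives /root/a/c.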

Dave Patterson
The biggest problem I see with people using SAX is that they assume the characters method is called only once for the text between two tags. Look at the JavaDoc and follow the suggested procedure: start with an empty StringBuffer and append the text from each characters call until the endElement method is called.

That is the only safe way to collect what is being passed. There is no rule for how many times the method is called (it may be called many times, or not at all), and the behavior can differ from parser to parser.
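Here is a sketch of that procedure with a DefaultHandler, assuming you only care about the text of leaf elements. The handler class is hypothetical and records name=text for each element as it ends. The &amp; in the sample input is a good test, because a parser may deliver the text before and after the entity in separate characters calls.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class TextCollector extends DefaultHandler {
    // Accumulates text across however many characters() calls the parser makes.
    private final StringBuilder text = new StringBuilder();
    private final List<String> values = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        text.setLength(0); // start each element with an empty buffer
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // may be called zero, one, or many times
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // Only now is the element's text complete.
        values.add(qName + "=" + text);
        text.setLength(0);
    }

    static List<String> parse(String xml) throws Exception {
        TextCollector handler = new TextCollector();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.values;
    }
}
```

Parsing <menu><item>fish &amp; chips</item></menu> yields item=fish & chips, with the spaces around the ampersand intact, followed by menu= for the outer element.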

How are you seeing evidence that the spaces around an ampersand are being trimmed? Is the data just an ampersand (which is invalid in the input) or the &amp; entity?

Dave Patterson
If you use JDOM, you will wind up with a Document object. You can build it with a SAXBuilder (usually the best way to go if your input is an XML file) or with a DOMBuilder (if you already have a W3C DOM object).

With a Document, you can easily:
- change the text string in an element
- add new elements or attributes
- delete elements or attributes
- change the contents of an attribute
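JDOM is not part of the JDK, so this sketch shows the same four kinds of edits with the standard W3C DOM API (org.w3c.dom) instead; JDOM's own method names differ. The element and attribute names (book, title, obsolete, note, edited, version) are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class DomEdits {
    static Document edit(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Element root = doc.getDocumentElement();

        // Change the text string in an element (replaces all of its children).
        Element title = (Element) doc.getElementsByTagName("title").item(0);
        title.setTextContent("New title");

        // Add a new element and a new attribute.
        Element note = doc.createElement("note");
        note.setTextContent("added");
        root.appendChild(note);
        root.setAttribute("edited", "yes");

        // Delete an element and an attribute.
        Element old = (Element) doc.getElementsByTagName("obsolete").item(0);
        old.getParentNode().removeChild(old);
        title.removeAttribute("lang");

        // Change the contents of an existing attribute.
        root.setAttribute("version", "2");
        return doc;
    }
}
```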

Dave Patterson
DOM is a good bet.

Dave Patterson
Yes, using the SAX approach, you can write a program to read the XML file and store the data in an ArrayList (or any other collection) of Card objects with the members you show. With SAX, you write a class to process the data, and the parser calls it through three main methods: startElement, endElement, and characters.
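A minimal sketch of such a handler, assuming a hypothetical layout of <Cards><Card><EnglishWord/><Location/><CorrectAttempts/></Card></Cards>; your actual element names and Card members may differ. It trusts the input to follow that layout: a stray EnglishWord outside a Card, or a CorrectAttempts that is not a number, would make it throw.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class Card {
    String englishWord;
    String location;
    int correctAttempts;
}

class CardHandler extends DefaultHandler {
    final List<Card> cards = new ArrayList<>();
    private Card current;                            // card being built, or null
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        text.setLength(0);
        if (qName.equals("Card")) {
            current = new Card();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String value = text.toString().trim();
        switch (qName) {
            case "EnglishWord":     current.englishWord = value; break;
            case "Location":        current.location = value; break;
            // Integer.parseInt throws NumberFormatException for input like "one".
            case "CorrectAttempts": current.correctAttempts = Integer.parseInt(value); break;
            case "Card":            cards.add(current); current = null; break;
            default:                break; // e.g. the outer <Cards> element
        }
        text.setLength(0);
    }

    static List<Card> read(String xml) throws Exception {
        CardHandler handler = new CardHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.cards;
    }
}
```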

Why XML? In my opinion, one of the big reasons is that you can validate your input before you start. If you were to do a good job on the program above, you would have to worry about what happens if one card did not have a location, or if a card had two englishword elements. What if someone created a file with

<CorrectAttempts>one</CorrectAttempts>

There are lots of ways someone could create such a file that could cause problems for your program. If you describe the proper content (using something called XML Schema), then the sender could check before sending that the input was correct, and you could check before processing the file that it was "proper". You can specify the sequence of elements, how often they occur, and what kind of content is valid.
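Here is a sketch of what such a check looks like with the JDK's javax.xml.validation API. The schema is a guess at the card layout, with hypothetical element names (Cards, Card, EnglishWord, Location, CorrectAttempts). It requires exactly one of each child, in order, and declares CorrectAttempts as xs:nonNegativeInteger, so <CorrectAttempts>one</CorrectAttempts>, a missing location, or a second englishword is rejected before your program ever sees the data.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class CardValidation {
    // Hypothetical schema for the card file.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='Cards'><xs:complexType><xs:sequence>"
      + "<xs:element name='Card' maxOccurs='unbounded'><xs:complexType><xs:sequence>"
      + "<xs:element name='EnglishWord' type='xs:string'/>"
      + "<xs:element name='Location' type='xs:string'/>"
      + "<xs:element name='CorrectAttempts' type='xs:nonNegativeInteger'/>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    /** Returns null if the document is valid, otherwise the first error message. */
    static String check(String xml) throws Exception {
        Validator validator = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)))
                .newValidator();
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
            return null;
        } catch (SAXException e) {
            return e.getMessage();
        }
    }
}
```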

The difference in difficulty between writing a commercial-grade program to read an unvalidated file and one to read a validated file is very large. Processing a validated file is a much easier job.

Is XML the answer to all data interchange problems? No. Can it be force fit into situations for which it is a lousy answer? Of course. Like all technologies, it needs to be used in the right places and avoided in the wrong places.

Dave Patterson
As was suggested, use recursion.

Write a method that takes an Element as a parameter. Its job is to
1) get the child nodes of that Element.
2) loop or iterate through the nodes
3) if you find another Element, check whether it has Elements as children or Text as a child. (If it has Elements, it may also have as its first child a Text node containing a newline. So, don't use "first child == Text" as the sole test.)
4) Add the elements that have text values to your HashMap.
5) Call the same method to process the child Elements of the new Element.
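A sketch of that recursion over a W3C DOM tree, assuming you want tag name to text for every element that has no child elements. The class name and sample are hypothetical, and duplicate tag names overwrite each other in the map. It decides "leaf or not" by looking for Element children, which sidesteps the whitespace Text node problem mentioned in step 3.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class LeafCollector {
    /** Recursively adds name -> text for every element that has no element children. */
    static void collect(Element element, Map<String, String> values) {
        NodeList children = element.getChildNodes();
        boolean hasElementChild = false;
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                hasElementChild = true;
                collect((Element) child, values);   // same method, one level down
            }
            // Text nodes here are ignored; whitespace between elements lands here.
        }
        if (!hasElementChild) {
            values.put(element.getTagName(), element.getTextContent());
        }
    }

    static Map<String, String> fromXml(String xml) throws Exception {
        Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        Map<String, String> values = new HashMap<>();
        collect(root, values);
        return values;
    }
}
```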

Also, it is VERY BAD FORM to use code like:



I've been doing Java processing of XML for about 3 years, and I have no idea what type of node you are selecting. If you want to use code like this, at least use the static fields in Node like Node.ELEMENT_NODE. Another approach is to use



The third approach works if you know you will not be seeing processing instructions or other unusual content: just Elements, Text nodes and Comment nodes.



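The original snippets didn't survive, so this is not a reconstruction of them. It is a sketch of the two safer styles described: comparing getNodeType() with the named constant Node.ELEMENT_NODE, and testing with instanceof Element (one common alternative; whether that is what the post showed is not known). The class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class NodeTypeChecks {
    /** Counts element children using the named constant instead of a bare number. */
    static int countByConstant(Node parent) {
        int count = 0;
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            if (children.item(i).getNodeType() == Node.ELEMENT_NODE) {
                count++;
            }
        }
        return count;
    }

    /** Same test written with instanceof against the DOM interface. */
    static int countByInstanceof(Node parent) {
        int count = 0;
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            if (children.item(i) instanceof Element) {
                count++;
            }
        }
        return count;
    }

    static Element parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml))).getDocumentElement();
    }
}
```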
Dave Patterson
With DOM processing, you get a Document at the start.

From the Document, here is how you find the element you want.


The value of ChildNode is stored in a Text node. The code above should return the correct value for you.
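The post's code is missing, so here is a minimal sketch of one way to do it with the W3C DOM API, assuming the element is named ChildNode as in the question and you want the first one. The first method reads the Text node directly; the second uses getTextContent() (DOM Level 3), which also copes with text split across several nodes. The class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class ChildNodeValue {
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    /** Value via the Text node that is the element's first child. */
    static String viaTextNode(Document doc) {
        Element element = (Element) doc.getElementsByTagName("ChildNode").item(0);
        if (element == null) {
            return null;                       // no such element in the document
        }
        Node first = element.getFirstChild();  // normally the Text node holding the value
        return first == null ? "" : first.getNodeValue();
    }

    /** DOM Level 3 shortcut: concatenates all descendant text. */
    static String viaTextContent(Document doc) {
        Node element = doc.getElementsByTagName("ChildNode").item(0);
        return element == null ? null : element.getTextContent();
    }
}
```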

Dave Patterson
The main interface involved in SAX is ContentHandler. You write your own class that implements this interface, supplying methods that respond to events. One method is called when the document starts, another when it ends. One is called when an element starts, another when it ends. Between these two there may be calls to a "characters" method if there is text between the start and end tags. If elements are nested, you may get two starts and then two ends.

The entire processing is up to you. The sequence of calls follows the input source. If you don't care about a specific element when it is reported, do nothing.

When the document end method is called, SAX is finished. Whatever you have kept in whatever format is all that is kept.

This is in contrast to DOM, which reads the entire input and constructs a tree of nodes. The entire source is represented by the tree. You can move elements or attributes around to make a different document, and you can run it through a transformer. You can search it using XPath to find sequences of elements or structures in the document and process them as you wish. When you are done, you can serialize it (to produce an XML file or an XML-format stream).

So, SAX is a Simple API for XML, as its name implies, and it does not make large demands on memory. You can process a huge file, and if you don't keep much data (for example, you are only summing values from the elements that go by), you will not need much memory. DOM builds a tree of Nodes to represent the entire file, and a node takes much more space than the element's minimal character representation: "<a/>" is 4 characters, while its node may take dozens or hundreds of bytes.

Both process the same input, and with SAX you see all of the input as it goes by. You may keep whatever you want in whatever format you want. But if you don't keep it, it is not stored anywhere for you to process later unless you run the input source through SAX again.
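As a sketch of the "summing data from the elements that go by" case, here is a handler that keeps only a running total. The <amount> element name and the class name are hypothetical. The handler's own state stays the same size no matter how many elements the file contains.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class AmountSummer extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    private boolean inAmount;   // true while inside an <amount> element
    private long total;         // the only data kept between events

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (qName.equals("amount")) {
            inAmount = true;
            text.setLength(0);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (inAmount) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("amount")) {
            total += Long.parseLong(text.toString().trim());
            inAmount = false;
        }
    }

    static long sum(String xml) throws Exception {
        AmountSummer handler = new AmountSummer();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.total;
    }
}
```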

Does this help you understand the differences?

Dave Patterson
Another approach would be to read the whole file (or a line at a time) into a StringBuffer. Then loop using charAt() to get each char. Use a switch block to select the characters you want to fix, with a default that does nothing. The ones you want to fix can be replaced with setCharAt().
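A minimal sketch of that loop, assuming the fix-up is replacing Office-style curly quotes with plain ASCII quotes (the character choices are illustrative). It uses StringBuilder, the unsynchronized equivalent of StringBuffer. setCharAt() only works for one-for-one replacements; if a character has to become two or more characters, you need a different approach, such as building a new string as you go.

```java
class CharFixer {
    /** Replaces a few "smart" punctuation characters with ASCII equivalents. */
    static String fix(String input) {
        StringBuilder sb = new StringBuilder(input);
        for (int i = 0; i < sb.length(); i++) {
            switch (sb.charAt(i)) {
                case '\u2018': // left single quotation mark
                case '\u2019': // right single quotation mark
                    sb.setCharAt(i, '\'');
                    break;
                case '\u201C': // left double quotation mark
                case '\u201D': // right double quotation mark
                    sb.setCharAt(i, '"');
                    break;
                default:
                    break; // leave every other character alone
            }
        }
        return sb.toString();
    }
}
```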

By the way, here is a tool that was written for one specific instance of this kind of fix-up: correcting bad HTML generated by MS Office tools.

http://www.fourmilab.ch/webtools/demoroniser/

Dave Patterson
18 years ago
xml
Let's start with some assumptions.

I'm assuming you have used DOM to get a Document with your XML data in it.

By "value" I assume you mean the text between the start and end tags, like the "xxx" in this example:

<a>xxx</a>

To do that, you need to
1) Find the <a> element.
2) Delete all of its existing children.
3) Create a new Text node with the new content you want.
4) Add it as a child to the <a> element.
5) Do anything else you want to do to the Document.
6) Serialize it to save the XML file.
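Here is a sketch of those steps with the JDK's W3C DOM and javax.xml.transform APIs, assuming the element to change is the first one with the given tag name (the class and method names are hypothetical). It writes to a String for convenience; passing a File or FileWriter to StreamResult instead would save the XML file. Step 5 is left out. On a DOM Level 3 implementation, element.setTextContent(newValue) does steps 2 through 4 in one call.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class ReplaceText {
    static String replaceValue(String xml, String tagName, String newValue) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        // 1) Find the element.
        Element a = (Element) doc.getElementsByTagName(tagName).item(0);
        // 2) Delete all of its existing children.
        while (a.hasChildNodes()) {
            a.removeChild(a.getFirstChild());
        }
        // 3) and 4) Create a Text node with the new content and add it as a child.
        a.appendChild(doc.createTextNode(newValue));

        // 6) Serialize the Document with an identity Transformer.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }
}
```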

Dave Patterson
The first parameter of replaceAll is a String, but it is not treated as a literal string: it is a regular expression.

The problem is that "*" has a special meaning in a regular expression. The regular expression that matches a single literal asterisk is \*. To write it as a Java string literal you have to double the backslash, because "\*" is not a valid Java escape sequence (unlike "\n" or "\t"), while "\\" puts a single backslash into the string. So if you code

String str = value.replaceAll("\\*","%");

it should work fine.
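A small sketch comparing that regex version with two alternatives: Pattern.quote(), which turns any string into a regex that matches it literally, and String.replace(CharSequence, CharSequence), which does no regex processing at all. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class ReplaceAsterisk {
    static String viaRegex(String value) {
        // "\\*" in Java source is the regex \* : a literal asterisk.
        return value.replaceAll("\\*", "%");
    }

    static String viaQuote(String value) {
        // Pattern.quote("*") builds a regex that matches "*" literally.
        return value.replaceAll(Pattern.quote("*"), "%");
    }

    static String viaLiteral(String value) {
        // String.replace(CharSequence, CharSequence) treats both arguments literally.
        return value.replace("*", "%");
    }
}
```

All three turn "a*b*" into "a%b%". Keep in mind that the replacement string of replaceAll is also special: "$" and "\" there need escaping (Matcher.quoteReplacement does that).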

Dave Patterson
18 years ago
Well, your template is still wrong. You have one template that produces one <identities> tag. I think that is OK.

It then produces a <users> tag, and again, that seems OK.

But the <user> tag should be inside a for-each so that it gets produced for each (hence the name) of the user elements you process.

It might be easier to get your head around if you made a second template with a name= attribute. Then, in your first template, you can call the second one to produce the entire <user> tag and its children. You would use the <xsl:call-template name="xx"/> instruction to call this second template.
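Here is a sketch of that structure, run through the JDK's TransformerFactory. The input layout (<people> containing <user id="..."> elements with <name> children), the template name, and the class name are all hypothetical. Note that xsl:call-template does not change the current node, so inside the named template, @id and name refer to the user selected by the for-each.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class UserTransform {
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      // First template: one <identities>, one <users>, then one call per user.
      + "<xsl:template match='/'>"
      + "<identities><users>"
      + "<xsl:for-each select='//user'>"
      + "<xsl:call-template name='user'/>"
      + "</xsl:for-each>"
      + "</users></identities>"
      + "</xsl:template>"
      // Second, named template: builds the whole <user> tag and its children.
      + "<xsl:template name='user'>"
      + "<user id='{@id}'><name><xsl:value-of select='name'/></name></user>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    static String run(String xml) throws Exception {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }
}
```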

Dave Patterson