File APIs for Java Developers
Manipulate DOC, XLS, PPT, PDF and many others from your application.
The moose likes Java in General and the fly likes Tree parsing and performance Big Moose Saloon
  Search | Java FAQ | Recent Topics | Flagged Topics | Hot Topics | Zero Replies
Register / Login
JavaRanch » Java Forums » Java » Java in General
Bookmark "Tree parsing and performance " Watch "Tree parsing and performance " New topic

Tree parsing and performance

Tad Dicks
Ranch Hand

Joined: Nov 16, 2004
Posts: 264
I have written an application that creates a fairly large tree structure from an input file and then inserts it into a database. As it parses the tree it inserts records and part of each record is the id of its parent. With really large input files the process gets extremely slow. I'm trying to conceptualize a way to do batch inserts (or anything else to speed up the process).

my basic algorithm is:

Any ideas? It also seems the application speed decreases exponentially while the input size increases linearly (thats a little bit of an overstatement).

Ernest Friedman-Hill
author and iconoclast

Joined: Jul 08, 2003
Posts: 24199

The Enumeration at the top iterates over... all the nodes? Are all the nodes in the tree held in one collection somewhere?

Then the parseNode() method iterates over all the children of each node.

Taken together, and combined with your observation that the runtime goes up exponentially with tree depth, this suggests that parseNode() is being called multiple times on any given node.

If you have a recursive algorithm for walking a tree, you generally start it by calling it once on one node at the root of the tree, and then the recursion takes care of calling it on every tree node. It looks to me as if you're calling it on every node, and many of those calls proceed to recursively call it again on many other nodes. Can the Enumeration at the top be replaced with a single call at the tree root?

[Jess in Action][AskingGoodQuestions]
Tad Dicks
Ranch Hand

Joined: Nov 16, 2004
Posts: 264
The enumeration at the top holds the top level "nodes." If the input file doesn't create a single root node, it artificially creates one from the sequence of nodes that is the top level (I hope that makes sense).
I guess I should have written while(root.hasMoreChildren())as thats what is happening.

Basically if an input file has
parent 1
parent 2
parent 3

The top while statement would get executed 3 times.
I just didn't want the "artificially created root" in the database.
I wouldn't say its exactly exponential. Just if the input file is 50k or so it runs lickety split, if the input file is 500k it takes a lot longer than 10 times the length of the first run.
I agree. Here's the link:
subject: Tree parsing and performance
It's not a secret anymore!