
# Tree parsing and performance

Ranch Hand
Posts: 264
I have written an application that creates a fairly large tree structure from an input file and then inserts it into a database. As it parses the tree, it inserts records, and part of each record is the id of its parent. With really large input files the process gets extremely slow. I'm trying to conceptualize a way to do batch inserts (or anything else to speed up the process).

My basic algorithm is:

Any ideas? It also seems the application's speed decreases exponentially while the input size increases linearly (that's a bit of an overstatement).
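On the batch-insert part of the question, JDBC's addBatch()/executeBatch() on a PreparedStatement is the usual tool: rows are queued locally and sent to the database in one round trip per batch instead of one per row. Here is a minimal sketch; the table and column names, the row layout, and the batch size of 1000 are all invented for illustration, not taken from the original code.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Sketch of JDBC batching. Table/column names are hypothetical.
public class BatchingSketch {

    static final int BATCH_SIZE = 1000;   // a common starting point, worth tuning

    // How many executeBatch() flushes a given row count needs (ceiling division).
    static int flushCount(int rows, int batchSize) {
        return (rows + batchSize - 1) / batchSize;
    }

    // Each row is {name, parentId}. Rows are queued with addBatch() and sent
    // to the database BATCH_SIZE at a time with executeBatch().
    static void insertNodes(Connection conn, String[][] rows) throws SQLException {
        String sql = "INSERT INTO node (name, parent_id) VALUES (?, ?)";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            int pending = 0;
            for (String[] row : rows) {
                ps.setString(1, row[0]);
                ps.setString(2, row[1]);
                ps.addBatch();              // queue locally instead of round-tripping
                if (++pending == BATCH_SIZE) {
                    ps.executeBatch();      // one round trip for the whole batch
                    pending = 0;
                }
            }
            if (pending > 0) {
                ps.executeBatch();          // flush the remainder
            }
        }
    }
}
```

Turning off auto-commit with conn.setAutoCommit(false) and committing once per batch (or once at the end) usually helps further, since committing after every single insert is another common cause of this kind of slowdown.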

Ernest Friedman-Hill
author and iconoclast
Marshal
Posts: 24208
35
The Enumeration at the top iterates over... all the nodes? Are all the nodes in the tree held in one collection somewhere?

Then the parseNode() method iterates over all the children of each node.

Taken together, and combined with your observation that the runtime goes up exponentially with tree depth, this suggests that parseNode() is being called multiple times on any given node.

If you have a recursive algorithm for walking a tree, you generally start it by calling it once on one node at the root of the tree, and then the recursion takes care of calling it on every tree node. It looks to me as if you're calling it on every node, and many of those calls proceed to recursively call it again on many other nodes. Can the Enumeration at the top be replaced with a single call at the tree root?
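To make that single-call pattern concrete, here is a small sketch of a recursive walk that is invoked exactly once, at the root; every class and method name here is invented for illustration, not the poster's actual code. Each node is visited exactly once, so the work is linear in the number of nodes.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: start the recursion with ONE call at the root; the recursion
// itself reaches every other node. Names are hypothetical.
public class TreeWalk {

    static class Node {
        final String name;
        final List<Node> children = new ArrayList<>();
        Node(String name) { this.name = name; }
        Node add(String childName) {
            Node c = new Node(childName);
            children.add(c);
            return c;
        }
    }

    // "visits" stands in for the database insert: each entry records
    // the node and the parent it would be stored with.
    static void walk(Node node, String parentName, List<String> visits) {
        visits.add(node.name + "<-" + parentName);
        for (Node child : node.children) {
            walk(child, node.name, visits);   // recursion handles the rest
        }
    }

    public static void main(String[] args) {
        Node root = new Node("root");
        Node p1 = root.add("parent1");
        p1.add("child1a");
        p1.add("child1b");
        List<String> visits = new ArrayList<>();
        walk(root, "(none)", visits);   // the ONLY external call
        System.out.println(visits);
    }
}
```

If instead walk() were called from a loop over every node in the tree, each subtree would be re-visited once per ancestor, which matches the much-worse-than-linear slowdown described above.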

Ranch Hand
Posts: 264
The enumeration at the top holds the top level "nodes." If the input file doesn't create a single root node, it artificially creates one from the sequence of nodes that is the top level (I hope that makes sense).
I guess I should have written while(root.hasMoreChildren()), as that's what is happening.

Basically, if an input file has:

parent 1
  child1a
  child1b
parent 2
  child2a
  child2b
parent 3
  child3a
  child3b

The top while statement would get executed 3 times.
I just didn't want the "artificially created root" in the database.
I wouldn't say it's exactly exponential. If the input file is 50k or so it runs lickety-split, but if the input file is 500k it takes a lot longer than ten times the length of the first run.
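One way to keep the artificially created root out of the database while still making only a single top-level call is to start the recursion at that root but skip its own insert, and record an empty parent id for the top-level nodes. A sketch, with all names invented for illustration:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: recurse from the artificial root, but never insert the root itself,
// so it never appears in the database. Names are hypothetical.
public class ArtificialRootSketch {

    static class Node {
        final String name;
        final List<Node> children = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    // "rows" stands in for the database: "name<-parentId" per inserted node.
    // Children of the artificial root get an empty parent id.
    static void insert(Node node, Node parent, Node artificialRoot, List<String> rows) {
        if (node != artificialRoot) {
            String parentId = (parent == artificialRoot) ? "" : parent.name;
            rows.add(node.name + "<-" + parentId);
        }
        for (Node child : node.children) {
            insert(child, node, artificialRoot, rows);
        }
    }

    public static void main(String[] args) {
        // Build the example input: three top-level parents, two children each.
        Node root = new Node("(artificial root)");
        for (int i = 1; i <= 3; i++) {
            Node p = new Node("parent" + i);
            p.children.add(new Node("child" + i + "a"));
            p.children.add(new Node("child" + i + "b"));
            root.children.add(p);
        }
        List<String> rows = new ArrayList<>();
        insert(root, null, root, rows);   // one call; nine rows, no root row
        System.out.println(rows);
    }
}
```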