I have written an application that creates a fairly large tree structure from an input file and then inserts it into a database. As it parses the tree it inserts records and part of each record is the id of its parent. With really large input files the process gets extremely slow. I'm trying to conceptualize a way to do batch inserts (or anything else to speed up the process).
my basic algorithm is:
Any ideas? It also seems the application speed decreases exponentially while the input size increases linearly (thats a little bit of an overstatement).
author and iconoclast
The Enumeration at the top iterates over... all the nodes? Are all the nodes in the tree held in one collection somewhere?
Then the parseNode() method iterates over all the children of each node.
Taken together, and combined with your observation that the runtime goes up exponentially with tree depth, this suggests that parseNode() is being called multiple times on any given node.
If you have a recursive algorithm for walking a tree, you generally start it by calling it once on one node at the root of the tree, and then the recursion takes care of calling it on every tree node. It looks to me as if you're calling it on every node, and many of those calls proceed to recursively call it again on many other nodes. Can the Enumeration at the top be replaced with a single call at the tree root?
The enumeration at the top holds the top level "nodes." If the input file doesn't create a single root node, it artificially creates one from the sequence of nodes that is the top level (I hope that makes sense). I guess I should have written while(root.hasMoreChildren())as thats what is happening.
Basically if an input file has parent 1 child1a child1b parent 2 child2a child2b parent 3 child3a child3b
The top while statement would get executed 3 times. I just didn't want the "artificially created root" in the database. I wouldn't say its exactly exponential. Just if the input file is 50k or so it runs lickety split, if the input file is 500k it takes a lot longer than 10 times the length of the first run.