
Implementing CASCADE DELETE using Java

 
Mani Ram
Ranch Hand
Posts: 1140
I need to do the following task using JDBC (I can't use database triggers or constraints with cascade delete):
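(The original code block was lost; below is a sketch of its shape, reconstructed from the rest of the thread. The table names, columns, and WHERE condition are assumptions, not the real schema.)

import java.sql.*;

public class CascadeDeleteTask {
    static void cascadeDelete(Connection conn) throws SQLException {
        // (a) the driving query - its result is what gets reused again and again below
        String idQuery = "SELECT MESSAGE_ID FROM MESSAGE WHERE STATUS = 'EXPIRED'";

        try (Statement stmt = conn.createStatement()) {
            // (b)-(d) delete the dependent rows in each child table,
            // re-running (a) as a subquery every time
            stmt.executeUpdate("DELETE FROM MESSAGE_RECIPIENT WHERE MESSAGE_ID IN (" + idQuery + ")");
            stmt.executeUpdate("DELETE FROM MESSAGE_ATTACHMENT WHERE MESSAGE_ID IN (" + idQuery + ")");
            stmt.executeUpdate("DELETE FROM MESSAGE_BODY WHERE MESSAGE_ID IN (" + idQuery + ")");
            // (e) finally delete the parent rows themselves
            stmt.executeUpdate("DELETE FROM MESSAGE WHERE STATUS = 'EXPIRED'");
        }
    }
}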

As you can see, the result of the first query is used again and again. Is there a way to cache that result and use it in the subsequent statements?

Of course, I could use an ArrayList to store the MESSAGE_IDs returned by (a), but there are two problems with that approach:

i) The number of rows returned is huge, so storing them in an ArrayList and retrieving them might affect performance.
ii) I have to format the ArrayList entries so they can be passed into the subsequent queries (fetch each element and append them all with a comma in between...), which looks awkward to me.

Just wondering, is there a way to do it elegantly?
 
Blake Minghelli
Ranch Hand
Posts: 331
Can't you do something like this pseudocode:
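(Blake's snippet was lost; here is a sketch of the row-by-row approach he seems to describe - select the ids, then delete the matching rows from each table one id at a time. Table and column names are assumptions.)

import java.sql.*;

public class RowByRowCascadeDelete {
    static void cascadeDelete(Connection conn) throws SQLException {
        // parent table last, so child rows are gone before their MESSAGE rows
        String[] tables = {"MESSAGE_RECIPIENT", "MESSAGE_ATTACHMENT", "MESSAGE_BODY", "MESSAGE"};

        try (Statement select = conn.createStatement();
             Statement delete = conn.createStatement();
             ResultSet rs = select.executeQuery(
                     "SELECT MESSAGE_ID FROM MESSAGE WHERE STATUS = 'EXPIRED'")) {
            while (rs.next()) {
                long id = rs.getLong("MESSAGE_ID");
                // one delete per table for this id
                for (String table : tables) {
                    delete.executeUpdate("DELETE FROM " + table + " WHERE MESSAGE_ID = " + id);
                }
            }
        }
    }
}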
 
Jeanne Boyarsky
author & internet detective
Marshal
Posts: 34095
337
Eclipse IDE Java VI Editor
Mani,
In this case, the database is likely to be the performance bottleneck, not the ArrayList. I would prefer storing the values in an ArrayList so you can close the connection/statement/resultset sooner. The deletes may take a while and you don't want to consume unnecessary database resources all that time.

Also look into batching those delete statements. That way you make fewer trips to the database. Further, note that with a varying IN clause you don't get the value of prepared statements. You might want to try using a fixed number of ids (say 10) in each in clause so the SQL looks the same each time.
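(A sketch of that kind of JDBC batching with addBatch/executeBatch; the table, column, and batch size are assumptions to illustrate the idea.)

import java.sql.*;
import java.util.List;

public class BatchedDelete {
    static void deleteBatched(Connection conn, List<Long> messageIds, String table) throws SQLException {
        final int BATCH_SIZE = 1000;   // assumed; tune for your driver and database
        try (PreparedStatement ps = conn.prepareStatement(
                "DELETE FROM " + table + " WHERE MESSAGE_ID = ?")) {
            int count = 0;
            for (Long id : messageIds) {
                ps.setLong(1, id);
                ps.addBatch();
                if (++count % BATCH_SIZE == 0) {
                    ps.executeBatch();   // one round trip for the whole batch
                }
            }
            ps.executeBatch();           // flush any remainder
        }
    }
}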
 
asirob civokviz
Greenhorn
Posts: 24
Did you consider deleting the data using only one SQL statement, or do you need the ids retrieved by the first SQL statement?
 
Mani Ram
Ranch Hand
Posts: 1140
Thanks for the replies

Blake:
The suggested approach will delete records one by one. Assuming there are a million rows in the table, executeUpdate will be called 4 million times to delete the records from the 4 tables. Won't that be a performance hit?

For example, a single DELETE ... WHERE MESSAGE_ID IN (...) covering many ids should perform better than issuing one DELETE per MESSAGE_ID, shouldn't it?

Jeanne:
I understand that the database will be the bottleneck compared to the ArrayList manipulation. That's why I too prefer the ArrayList over the method Blake suggested.
But I just wanted to know whether there is a better way of doing this.
Batching - yes, I will definitely go for it.
And I'm not clear what you mean by "doing a fixed number in each in clause". Could you please elaborate? Are you suggesting something like this:
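(Mani's snippet was lost; presumably it was something along these lines - the cached ids split into fixed-size IN lists built by string concatenation. The chunk size, table, and column names are assumptions.)

import java.sql.*;
import java.util.List;

public class ChunkedInClauseDelete {
    static void deleteInChunks(Connection conn, List<Long> messageIds, String table) throws SQLException {
        final int CHUNK_SIZE = 10;    // the "fixed number (say 10)" per in clause
        try (Statement stmt = conn.createStatement()) {
            for (int from = 0; from < messageIds.size(); from += CHUNK_SIZE) {
                int to = Math.min(from + CHUNK_SIZE, messageIds.size());
                // build "id1,id2,...,idN" for this chunk
                StringBuilder in = new StringBuilder();
                for (int i = from; i < to; i++) {
                    if (in.length() > 0) in.append(',');
                    in.append(messageIds.get(i));
                }
                stmt.executeUpdate("DELETE FROM " + table + " WHERE MESSAGE_ID IN (" + in + ")");
            }
        }
    }
}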


Boriša (sorry, my browser doesn't display your name properly):
No. I don't need the retrieved MESSAGE_IDs anymore.
 
Blake Minghelli
Ranch Hand
Posts: 331
The suggested approach will delete records one by one. Assuming there are a million rows in the table, executeUpdate will be called 4 million times to delete the records from the 4 tables. Won't that be a performance hit?

Correct - my approach is appropriate if you are deleting a small number of records. It's hard to imagine a business process where you would be deleting a million parent rows, but if that's the case then you need to consider something else. At that point I don't think you even want to do small batches of 10 repeated 100,000 times; either way, your db is doing the same work.
 
Jeanne Boyarsky
author & internet detective
Marshal
Posts: 34095
337
Eclipse IDE Java VI Editor
Mani,
Yes, you have the idea of what I was going for.

If you always used the number of results as returned from the original query, you would have a different number of parameters in the in clause each time. For example, you could have 102, 200 and 201 in clause entries for 3 different queries. The database would recognize this as 3 different queries and would not use the cached execution plan for similar queries.

If you always use a fixed number of parameters in the in clause, the database can reuse the prepared statement, which reduces some overhead. You could pick a couple of fixed sizes too - for example, batches of 1, 25 and 200 - and experiment to come up with good sizes for your dataset.
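(A sketch of that approach: a PreparedStatement whose IN clause always has the same number of placeholders, so the same statement text - and its cached plan - is reused for every chunk. The table, the chunk size, and the trick of padding the final short chunk by repeating its last id are illustrative assumptions, not from the thread.)

import java.sql.*;
import java.util.List;

public class FixedSizeInClauseDelete {
    static void deleteFixedChunks(Connection conn, List<Long> ids, String table) throws SQLException {
        final int CHUNK_SIZE = 25;                              // one fixed size to experiment with
        String placeholders = "?" + ",?".repeat(CHUNK_SIZE - 1);
        String sql = "DELETE FROM " + table + " WHERE MESSAGE_ID IN (" + placeholders + ")";

        try (PreparedStatement ps = conn.prepareStatement(sql)) {   // same SQL text for every chunk
            for (int from = 0; from < ids.size(); from += CHUNK_SIZE) {
                for (int i = 0; i < CHUNK_SIZE; i++) {
                    // pad the final, shorter chunk by repeating its last id;
                    // duplicate values in an IN list are harmless for a DELETE
                    int idx = Math.min(from + i, ids.size() - 1);
                    ps.setLong(i + 1, ids.get(idx));
                }
                ps.executeUpdate();
            }
        }
    }
}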
 