This week's book giveaway is in the OO, Patterns, UML and Refactoring forum. We're giving away four copies of Refactoring for Software Design Smells: Managing Technical Debt and have Girish Suryanarayana, Ganesh Samarthyam & Tushar Sharma on-line! See this thread for details.
I am accessing an Oracle database that is about 4000 miles away. The bandwidth on both ends is adequate, and the performance is reasonable so long as I set the fetch size to an appropriate number.
Retrieving 31,000 rows of data without CLOBS behaves as follows:
When reading CLOB data from an Oracle database the driver does not provide the contents of the CLOB at the same time as the rest of the row, instead it provides an InputStream to read from. This results in an additional roundtrip and causes significant overhead in my situation.
One approach I tried was to select the data in two batches - one with data smaller than 4000, and one with the data over 4000. e.g.: query 1: select to_char(foo) from sometable where length(foo) <= 4000 query 2: select foo from sometable where length(foo) > 4000
I applied this technique to the following data set: Rows with CLOB smaller than 4000: 30000 Rows with CLOB larger than 4000: 800
The result was that I could fetch the 30000 smaller records in 110 seconds, while the 800 remaining records took an additional 7 minutes. The larger records were all under 30kb, and the total size of the smaller records was much higher than the total size of the larger records.
Bandwidth is not the issue here - it is the overhead from an excessive number of roundtrips.
Thus far I have not been able to find a way to transfer CLOB data in batches along with the other row data - is there a way to do this?
[ June 05, 2007: Message edited by: Christopher Clark ] [ June 05, 2007: Message edited by: Christopher Clark ]
I'm curious, for the records with lare clob data, have you tried reading the entries one by one? Given that it takes 7 minutes to read them in a group, I wonder the performance if you performed separate selects on just the clob data using a query for each clob. Also, having the 'length' of the clob in a denormalized field (with a tree index on it) would speed up the query you are running, but not neccessarily significantly.
I'm not an Oracle DB expert, so you might need to consult one to solve your problem, but one work-around would be to use varchar instead of clobs. Since 30,000 (of 30,800) of your records are under 4000, this might not be a bad idea.
Lastly, have you tried the same tests in a system on the same network as the database? Does it really matter that the database is far away? [ June 07, 2007: Message edited by: Scott Selikoff ]
The performance for executing individual selects would be abysmal, even with a PreparedStatement.
As the table in question is read far more often than it is updated I had the DBA create two materalized views - one with the data under 4k as a varchar2, and the other with the large data in CLOBs. This avoids the length() and to_char overhead, but that overhead was a relatively small part of the problem - the reading of the CLOBs is much more expensive.
When running the application on a machine that is local to the database the performance is significantly better of course. Fetching the 34000 entries under 4k in size taks 34 seconds, and fetching the 880 entries over 4k in size takes 22 seconds. The total size of the data under 4k is 29.1mb, the total size of the larger fields is 5.3mb. That means that the overhead from the network roundtrip accounts for 94% of my execution time for the CLOBs.
I’ve looked at a lot of different solutions, and in my humble opinion Aspose is the way to go. Here’s the link: http://aspose.com
subject: Performance tuning for Oracle CLOBs with a remote database