Performance tuning for Oracle CLOBs with a remote database

 
Christopher Clark
Greenhorn
Posts: 2
I am accessing an Oracle database that is about 4000 miles away.
The bandwidth on both ends is adequate, and the performance is
reasonable so long as I set the fetch size to an appropriate number.

Retrieving 31,000 rows of data without CLOBs behaves as follows:


When reading CLOB data from an Oracle database, the driver does not provide
the contents of the CLOB along with the rest of the row; instead, it provides
an InputStream to read from. This results in additional round trips and
causes significant overhead in my situation.
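
For reference, here is roughly the read pattern I'm using at the moment. This is just a sketch: the table and column names match the queries below, the fetch size is an arbitrary value, and I'm showing getCharacterStream(), though the same applies to the other LOB accessors.

import java.io.Reader;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;

public class ClobReadExample {
    // Reads every CLOB through the stream the driver hands back; the LOB
    // contents are fetched separately from the rest of the row.
    static void readClobs(Connection conn) throws Exception {
        Statement stmt = conn.createStatement();
        stmt.setFetchSize(500);                        // only helps the non-LOB columns
        ResultSet rs = stmt.executeQuery("select foo from sometable");
        char[] buf = new char[8192];
        while (rs.next()) {
            Reader reader = rs.getCharacterStream(1);  // CLOB arrives as a locator/stream
            StringBuilder sb = new StringBuilder();
            int n;
            while ((n = reader.read(buf)) != -1) {     // these reads go back to the server
                sb.append(buf, 0, n);
            }
            reader.close();
            // ... use sb.toString() ...
        }
        rs.close();
        stmt.close();
    }
}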

One approach I tried was to select the data in two batches: one with the CLOBs of 4000 characters or fewer, and one with the CLOBs over 4000 characters, e.g.:
query 1: select to_char(foo) from sometable where length(foo) <= 4000
query 2: select foo from sometable where length(foo) > 4000
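
In JDBC terms, the first query can be consumed with plain getString() calls, so the fetch size does its job. Again just a sketch, with an arbitrary fetch size:

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;

public class SmallClobBatch {
    // Query 1: to_char(foo) comes back as an ordinary VARCHAR2 column, so the
    // rows arrive in normal fetch-size batches with no per-row LOB round trip.
    static void readSmallClobs(Connection conn) throws Exception {
        Statement stmt = conn.createStatement();
        stmt.setFetchSize(500);
        ResultSet rs = stmt.executeQuery(
                "select to_char(foo) from sometable where length(foo) <= 4000");
        while (rs.next()) {
            String value = rs.getString(1);
            // ... use value ...
        }
        rs.close();
        stmt.close();
    }
    // Query 2 (length(foo) > 4000) still has to be read through
    // getCharacterStream() as in the snippet above, which is where the
    // remaining round trips come from.
}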

I applied this technique to the following data set:
Rows with CLOB smaller than 4000: 30000
Rows with CLOB larger than 4000: 800

The result was that I could fetch the 30000 smaller records in 110 seconds,
while the 800 remaining records took an additional 7 minutes. The larger records were all under 30 KB, and the total size of the smaller records was much higher than the total size of the larger records.

Bandwidth is not the issue here - it is the overhead from an excessive number of roundtrips.

Thus far I have not been able to find a way to transfer CLOB data in
batches along with the other row data - is there a way to do this?

[ June 05, 2007: Message edited by: Christopher Clark ]
 
Scott Selikoff
author
Posts: 4335
I'm curious: for the records with large CLOB data, have you tried reading the entries one by one? Given that it takes 7 minutes to read them as a group, I wonder what the performance would be if you performed separate selects on just the CLOB data, one query per CLOB. Also, having the length of the CLOB in a denormalized field (with a b-tree index on it) would speed up the query you are running, though not necessarily significantly.
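
Something along these lines is what I had in mind for the one-by-one reads. It's only a sketch, and it assumes a primary key column (I'm calling it id here) that your table may or may not have:

import java.io.Reader;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class SingleClobRead {
    // Fetches one CLOB per query, keyed by a hypothetical id column.
    static String readOne(Connection conn, long id) throws Exception {
        PreparedStatement ps = conn.prepareStatement(
                "select foo from sometable where id = ?");  // 'id' is assumed
        ps.setLong(1, id);
        ResultSet rs = ps.executeQuery();
        StringBuilder sb = new StringBuilder();
        if (rs.next()) {
            Reader reader = rs.getCharacterStream(1);
            char[] buf = new char[8192];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
            reader.close();
        }
        rs.close();
        ps.close();
        return sb.toString();
    }
}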

I'm not an Oracle DB expert, so you might need to consult one to solve your problem, but one workaround would be to use VARCHAR2 instead of CLOBs. Since 30,000 of your 30,800 records are under 4000 characters, this might not be a bad idea.

Lastly, have you tried the same tests in a system on the same network as the database? Does it really matter that the database is far away?
[ June 07, 2007: Message edited by: Scott Selikoff ]
 
Christopher Clark
Greenhorn
Posts: 2
Thanks for the reply, Scott.

The performance for executing individual selects would be abysmal, even with a PreparedStatement.

As the table in question is read far more often than it is updated, I had the DBA create two materialized views: one with the data under 4000 characters as a VARCHAR2, and the other with the large data in CLOBs. This avoids the length() and to_char() overhead, but that overhead was a relatively small part of the problem; the reading of the CLOBs is much more expensive.
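
To give a feel for it, the read side now looks roughly like this. The view names are made up for illustration; the real ones are whatever the DBA chose:

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;

public class MaterializedViewRead {
    // The small-row view already stores the value as VARCHAR2, so there is no
    // length()/to_char() work at query time and no per-row LOB round trip.
    static void readSmallRows(Connection conn) throws Exception {
        Statement stmt = conn.createStatement();
        stmt.setFetchSize(500);
        ResultSet rs = stmt.executeQuery(
                "select foo from sometable_small_mv");      // hypothetical view name
        while (rs.next()) {
            String value = rs.getString(1);
            // ... use value ...
        }
        rs.close();
        stmt.close();
    }
    // The large-row view (sometable_large_mv here, also a made-up name) still
    // holds CLOBs and is read through getCharacterStream() as before.
}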

When running the application on a machine that is local to the database, the performance is significantly better, of course. Fetching the 34000 entries under 4k in size takes 34 seconds, and fetching the 880 entries over 4k in size takes 22 seconds. The total size of the data under 4k is 29.1 MB, and the total size of the larger fields is 5.3 MB. That means the network round-trip overhead accounts for 94% of my execution time for the CLOBs.
 