efficient way to parse clob data

ranjithkumar.gendhe kumar
Greenhorn

Joined: Oct 12, 2010
Posts: 19
Hi,

We are reading CLOB data. There are more than 25 CLOBs.

We parse each one using StringTokenizer.

With this method we get an out of memory error in some cases.

StringTokenizer always returns a new String for each token, in addition to the base String that holds the data as a char array.

Is there any way to parse the data efficiently without getting an out of memory error?

Thanks & Regards,

gr kumar
Madhan Sundararajan Devaki
Ranch Hand

Joined: Mar 18, 2011
Posts: 312

You may use the java.lang.String.split method that also takes a limit argument. Using this split method you can control the size of the String array returned.
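
For illustration, here is a minimal sketch of the split overload that takes a limit. The "~" delimiter is taken from the record format described later in this thread; the class and method names are made up for the example. A positive limit caps the array at that many elements and leaves the unsplit remainder in the last one.

```java
import java.util.Arrays;

class SplitLimitDemo {

    // Hypothetical helper: returns at most `limit` pieces of one record.
    // The last piece holds everything that was not split.
    static String[] firstFields(String record, int limit) {
        return record.split("~", limit);
    }

    public static void main(String[] args) {
        String record = "data11~data12~data13~data14";
        String[] parts = firstFields(record, 3);
        // prints [data11, data12, data13~data14]
        System.out.println(Arrays.toString(parts));
    }
}
```

Note that the remainder is still held as a String, so the limit bounds the number of tokens rather than the total number of characters kept on the heap.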


S.D. MADHAN
Not many get the right opportunity !
ranjithkumar.gendhe kumar
Greenhorn

Joined: Oct 12, 2010
Posts: 19
Hi,

Each CLOB here is more than 40 MB in size.

If I use the split method, it creates an array of Strings on the heap.

The application server has 150 MB of free memory.

We would be holding 40 MB of data on the heap while split also creates an array of Strings, and split performs worse than StringTokenizer.

Each CLOB takes nearly 60 MB of heap at runtime.

I want to parse the data without creating new Strings on the heap. Both split and StringTokenizer always create new Strings.

Thanks and Regards,
gr kumar
William Brogden
Author and all-around good cowpoke
Rancher

Joined: Mar 22, 2000
Posts: 12789
    
Exactly what is the purpose of this parse?

It is not at all clear from the original post what you expect to get out of these CLOB documents. You will get better suggestions if we know what you are trying for.

Those string tokenizer and split methods are only intended for short chunks of text.

Bill
ranjithkumar.gendhe kumar
Greenhorn

Joined: Oct 12, 2010
Posts: 19
Hi,

The CLOBs contain data in this format:

Header1~Header2~......~;
data11~. data12~.........~;
data21~data22~...........~;
. .
. .
. .
datam1~datamn~...........~;

From this data we generate an Excel file using the Aspose API.

Thanks & Regards

gr kumar.

William Brogden
Author and all-around good cowpoke
Rancher

Joined: Mar 22, 2000
Posts: 12789
    
If this was my problem I would either:

1. Read the whole CLOB into a char[] and then walk a pointer through the char[] recognizing the various line end and separator characters to locate the fields. Only extracting strings from one line at a time as needed to write a single Excel record.

OR

2. Reading the CLOB as a character stream, parse one line at a time and output the selected data as a CSV file, since Excel can read CSV. There appears to be no reason to parse more than one line at a time.
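
As a rough sketch of the streaming idea in option 2, the code below reads from a java.io.Reader in fixed-size chunks and hands over one record at a time, so only the current record is ever held as Strings. It assumes the format posted above ('~' ends a field, ';' ends a record) and that neither character occurs inside a field value. The ClobRecordStreamer class and RowHandler callback are hypothetical names; with a real CLOB the Reader would come from java.sql.Clob.getCharacterStream(), and the handler would write a row to the spreadsheet or a line to a CSV file.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class ClobRecordStreamer {

    /** Hypothetical callback: receives the fields of one record. */
    interface RowHandler {
        void handle(List<String> fields);
    }

    /** Streams records from the reader and returns how many were seen. */
    static int stream(Reader in, RowHandler handler) throws IOException {
        StringBuilder field = new StringBuilder();
        List<String> row = new ArrayList<>();
        char[] buf = new char[8192];      // fixed-size read buffer
        int rows = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            for (int i = 0; i < n; i++) {
                char c = buf[i];
                if (c == '~') {           // end of field
                    row.add(field.toString());
                    field.setLength(0);
                } else if (c == ';') {    // end of record
                    handler.handle(row);
                    rows++;
                    row = new ArrayList<>();
                    field.setLength(0);   // drop anything between '~' and ';'
                } else if (c != '\n' && c != '\r') {
                    field.append(c);      // line breaks are skipped
                }
            }
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        // A StringReader stands in for clob.getCharacterStream().
        Reader in = new StringReader("Header1~Header2~;\ndata11~data12~;\n");
        int rows = stream(in, fields -> System.out.println(fields));
        System.out.println(rows + " records");
    }
}
```

Heap use here depends on the longest record and the 8 KB buffer, not on the size of the CLOB.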

Bill
ranjithkumar.gendhe kumar
Greenhorn

Joined: Oct 12, 2010
Posts: 19
Hi Bill,

Thanks for your response.

William Brogden wrote:
2. Reading the CLOB as a character stream, parse one line at a time and output the selected data as a CSV file, since Excel can read CSV. There appears to be no reason to parse more than one line at a time.

This solution is not efficient because it is a round-trip process: we write the data to a CSV file, open it again through a file stream, and then do the generation again. The CSV file is the same size as the CLOB, and opening the file takes some heap memory as well.

William Brogden wrote:
1. Read the whole CLOB into a char[] and then walk a pointer through the char[] recognizing the various line end and separator characters to locate the fields. Only extracting strings from one line at a time as needed to write a single Excel record.

We are considering this one, but in this case there is still a chance of getting an out of memory error.

Consider an example.

Four clients each send a request to generate an Excel file with 1,000,000 rows.

In that case, extracting a string from the char array also creates a new String.

Holding the data of 4 CLOBs and creating new Strings every time leads to an out of memory error.

I need a method that returns the required string without creating a new String from the base String.

Is that possible? If yes, please tell me how to write it.

Or is there any API that provides this feature?
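
One standard-library option that comes close is java.nio.CharBuffer.wrap(char[], int, int), which returns a CharSequence view over part of an existing char[] without copying the characters. Each view is still a small object, but the text itself is not duplicated. The sketch below assumes the '~' separator from the format posted earlier; FieldViews is a made-up name, and whether the Aspose API accepts a CharSequence is not something this thread establishes, so a short-lived toString() may still be needed when a cell is written.

```java
import java.nio.CharBuffer;
import java.util.ArrayList;
import java.util.List;

class FieldViews {

    /**
     * Returns views of the '~'-terminated fields found in data[start, end).
     * Every view shares the backing array; no characters are copied.
     */
    static List<CharSequence> fields(char[] data, int start, int end) {
        List<CharSequence> out = new ArrayList<>();
        int fieldStart = start;
        for (int i = start; i < end; i++) {
            if (data[i] == '~') {
                out.add(CharBuffer.wrap(data, fieldStart, i - fieldStart));
                fieldStart = i + 1;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        char[] data = "data11~data12~;".toCharArray();
        int end = data.length - 1;   // stop before the ';' record terminator
        for (CharSequence f : fields(data, 0, end)) {
            // Concatenation calls toString() on the view, which does copy;
            // it is done here only to print the result.
            System.out.println(f.length() + " chars: " + f);
        }
    }
}
```

Because the views point into the shared array, the array must not be reused or modified while any view is still in use.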



Thanks & Regards,

Ranjith.
William Brogden
Author and all-around good cowpoke
Rancher

Joined: Mar 22, 2000
Posts: 12789
    
ranjithkumar.gendhe kumar wrote:
This solution is not efficient because it is a round-trip process: we write the data to a CSV file, open it again through a file stream, and then do the generation again. The CSV file is the same size as the CLOB, and opening the file takes some heap memory as well.

Oh really! You tried having Excel read a CSV and it was slow?

ranjithkumar.gendhe kumar wrote:
I need a method that returns the required string without creating a new String from the base String.


Why are you using Aspose API?

Your original code:


Apparently tries to read the whole CLOB when all you need is one line at a time! Why?
Ermes stg
Greenhorn

Joined: Aug 21, 2011
Posts: 1
Thank you very much for sharing this information. For some days I could not make a decision because I had inadequate information, but now I'm satisfied.
----------------------
Orlando Pool Distribution
 