No, it doesn't need to be in memory. You will have to work out how to load the data into the DB as you read it.
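To illustrate the point: read one record at a time and hand the rows to the database in fixed-size batches, so memory stays bounded regardless of input size. This is only a sketch of the pattern; the sink here is a plain `Consumer` standing in for a real JDBC batch insert (`PreparedStatement.addBatch()` / `executeBatch()`), and the batch size is an illustrative guess.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class BatchLoader {
    static final int BATCH_SIZE = 1000; // illustrative; tune to your DB

    // Reads records one at a time and passes them to the sink in batches,
    // so only BATCH_SIZE records are ever held in memory at once.
    static int load(BufferedReader in, Consumer<List<String>> sink) throws IOException {
        List<String> batch = new ArrayList<>(BATCH_SIZE);
        int total = 0;
        String line;
        while ((line = in.readLine()) != null) {
            batch.add(line);
            if (batch.size() == BATCH_SIZE) {
                sink.accept(batch);          // real code: JDBC executeBatch()
                total += batch.size();
                batch = new ArrayList<>(BATCH_SIZE);
            }
        }
        if (!batch.isEmpty()) {              // flush the final partial batch
            sink.accept(batch);
            total += batch.size();
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(new StringReader("r1\nr2\nr3"));
        int n = load(in, batch -> System.out.println("inserting " + batch.size() + " rows"));
        System.out.println(n); // prints 3
    }
}
```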
Nancy Joe wrote:. . . it needs to be loaded into a DB, so it needs to be in the memory.
Campbell Ritchie wrote:If you have enough memory, you may be able to avoid the increase in the array size in the StringBuilder by setting its size before you start.
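For the record, that just means passing an initial capacity to the constructor; the 1&nbsp;MiB figure below is an illustrative guess, not a recommendation.

```java
public class PresizedBuilder {
    public static void main(String[] args) {
        // Sizing the builder up front avoids the repeated grow-and-copy
        // of its internal char array as the content accumulates.
        StringBuilder sb = new StringBuilder(1 << 20); // ~1 MiB of chars (illustrative)
        System.out.println(sb.capacity()); // prints 1048576
    }
}
```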
Nancy Joe wrote:It is huge, can run in MB and GB. After reading this data, it needs to be loaded into a DB, so it needs to be in the memory.
Campbell Ritchie wrote:No, it doesn't need to be in memory. You will have to work out how to load the data into the DB as you read it.
I thought there would be an easy way to do it.
Paul Clapham wrote:. . . It's not as complicated as that. . . .
Nancy Joe wrote:Hi Tim,
Yeah, that's what our workflow looks like in short. Interesting that you worked on some of the Pentaho source code.

Basically, in our case Pentaho makes one REST call to the service to pull tons of data (in GBs). The built-in REST client step in Pentaho did not work for us, since we got a gzipped JSON response back from the service and Pentaho did not know how to decompress it. So we created a Java library (which calls the REST service and decompresses the gzip to a JSON string) that we use in the user-defined Java step, and pass the result back to a JSON Input step in Pentaho, where it decodes the fields. But the user-defined Java step is not able to handle so much data in memory. We are using com.fasterxml.jackson.core.JsonParser, but that also stores everything in memory when reading the JSON tree from the mapper. Is there any other option than calling the REST service in batches?