
I want to read and copy an 800 MB file. Is there an efficient way?

 
Ranch Hand
Posts: 33
I have an assignment where I have to read an 800 MB file and create another file of the same size. With plain I/O it takes 45 minutes, and I have been given a target to complete it in less time. Can anyone suggest another solution?

Thank you.

"I love Java"
[ October 26, 2005: Message edited by: kapil patel ]
 
Ranch Hand
Posts: 3640
Do you need to read the entire file using I/O and write it out again using I/O, or do you need to copy the file using a copy command?
 
kapil patel
Ranch Hand
Posts: 33
Thank you, Chetan.

I have tried using Java I/O. It takes 45 minutes, but I have a target to complete it in less time. Can you suggest any other solution?

Moreover, as input I have a ResultSet or an ArrayList, and I have to write that data to a flat file with some formatting.
 
Rancher
Posts: 13459
Did you want to show us what you have so we can offer suggestions? E.g. I'm assuming you have a buffered stream piped to another buffered stream, but as Chetan says, you may be using the OS copy command.
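
For reference, a minimal sketch of the kind of buffered-stream copy David describes; the file names are placeholders:

import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class BufferedCopy {
    public static void main(String[] args) throws IOException {
        BufferedInputStream in = new BufferedInputStream(new FileInputStream("source.dat"));
        BufferedOutputStream out = new BufferedOutputStream(new FileOutputStream("target.dat"));
        try {
            byte[] buffer = new byte[64 * 1024]; // copy in 64 KB chunks
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
        } finally {
            out.close();
            in.close();
        }
    }
}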
 
Chetan Parekh
Ranch Hand
Posts: 3640
Kapil,

Can you just post what data you are fetching and what you are storing?

Post it in the following manner:

Fetching :
FirstName Lastname
Chetan Parekh
Kapil Patel

Storing:
My name is Chetan Parekh.
My name is Kapil Patel.
 
David O'Meara
Rancher
Posts: 13459
A quick internet search provided some code that did a 600 MB file in about 3 minutes, with other activities running in the background. I'd love to pass it on, but we'd like to see yours first...
 
Chetan Parekh
Ranch Hand
Posts: 3640
Kapil, I just want to give an example of where we did some optimization.

We had to generate an .xls file by firing a query on the database. The query returned a huge amount of data, so both the query time and the .xls generation from the ResultSet were very high. We found that we were providing the same data to all users, yet to fulfil each and every request we were repeating the same process again and again.

We decided to generate the .xls file in a BOD job and give users a link to that file for download, rather than generating the file at runtime.

We reduced a huge amount of load by doing this.

This is just to give you a hint, in case your requirement is similar.

Post your requirement: why and what you want to save in the file.
 
Chetan Parekh
Ranch Hand
Posts: 3640
Some Tips
 
Ranch Hand
Posts: 52
Hi,

I attach some code that I use in a lot of programs:

 
kapil patel
Ranch Hand
Posts: 33
Thank you all.

My requirement is:

I am getting some data from the database in a ResultSet as strings. We want to dump all of the data into one flat file. The result set is huge: around 800 MB of database data. We need to do this through a Java application only. Normal Java I/O takes 45 minutes; we want to reduce that to less than 20 minutes.

Java NIO (New I/O) might be a possible solution. Has anyone tried writing String objects to the file system using NIO?

Or is there any other alternative?

Thanks again.

"I love Java"
 
Chetan Parekh
Ranch Hand
Posts: 3640
These are some general tips to optimize performance, but you will not see any dramatic improvement from them:

(1) Use a native database driver; they are faster.
(2) Set the record fetch size to an optimal level (see the sketch below).
(3) If you have to format the data before dumping it to the file, try to do some of that formatting in the query.
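
A minimal sketch of tip (2), assuming a hypothetical JDBC URL, credentials, and query:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class FetchSizeExample {
    public static void main(String[] args) throws SQLException {
        // Hypothetical URL, credentials and query - use whatever your application already has.
        Connection conn = DriverManager.getConnection("jdbc:yourdb://host/db", "user", "pass");
        Statement stmt = conn.createStatement();
        stmt.setFetchSize(1000); // ask the driver to pull rows from the server in larger batches
        ResultSet rs = stmt.executeQuery("SELECT first_name, last_name FROM customers");
        while (rs.next()) {
            // write the row to the flat file here
        }
        rs.close();
        stmt.close();
        conn.close();
    }
}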
 
Ranch Hand
Posts: 1923
Are you sure that I/O is the bottleneck, and not the database query?

If your drive is already used by other tasks, you could speed up things by writing to a different drive.

Without seeing your code, it's hard to suggest improvements there.
 
kapil patel
Ranch Hand
Posts: 33
Thank you all. I/O is the bottleneck in our case. We are trying NIO.

If I find a solution, I will let everyone know.
Thanks, Chetan; we will also consider your suggestion for this.

"I love Java"
 
Java Cowboy
Posts: 16084
You can find the code to copy a file using NIO here:

The Java Developers Almanac - Copying One File to Another (using nio)
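
The linked example is based on FileChannel; a minimal sketch along those lines, with placeholder file names:

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.channels.FileChannel;

public class NioCopy {
    public static void main(String[] args) throws IOException {
        FileChannel src = new FileInputStream("source.dat").getChannel();
        FileChannel dst = new FileOutputStream("target.dat").getChannel();
        try {
            // Let the channels move the bytes; the OS can often do this very efficiently.
            dst.transferFrom(src, 0, src.size());
        } finally {
            dst.close();
            src.close();
        }
    }
}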
 
JuanP barbancho
Ranch Hand
Posts: 52
Hi,

Try CloverETL on SourceForge; it is a tool for reading data from a source, formatting it, and saving it.

It is very fast, multithreaded, and uses NIO.

I made a similar tool using 4 threads, but I needed my program to be faster than CloverETL.

I also made a tool in C with Pthreads, for Oracle only. It is faster than CloverETL, but I love Java, not C. If you want this tool, I can provide it.

Send me an email: barbyware@yahoo.com
 
JuanP barbancho
Ranch Hand
Posts: 52
Try reading the data in one thread and writing it to the file in another thread.
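
A minimal sketch of that reader/writer split, assuming a BlockingQueue hand-off, a placeholder data source, and a hypothetical output file name:

import java.io.BufferedWriter;
import java.io.FileWriter;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class ThreadedDump {
    private static final String POISON = new String("EOF"); // sentinel marking end of data

    public static void main(String[] args) throws Exception {
        final BlockingQueue<String> queue = new ArrayBlockingQueue<String>(10000);

        Thread writer = new Thread(new Runnable() {
            public void run() {
                try {
                    BufferedWriter out = new BufferedWriter(new FileWriter("dump.txt"), 1 << 20);
                    try {
                        String line;
                        while ((line = queue.take()) != POISON) {
                            out.write(line);
                            out.newLine();
                        }
                    } finally {
                        out.close();
                    }
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        });
        writer.start();

        // Reader side: in the real program this loop would iterate over the ResultSet.
        for (int i = 0; i < 1000; i++) {
            queue.put("row " + i);
        }
        queue.put(POISON);
        writer.join();
    }
}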
 
JuanP barbancho
Ranch Hand
Posts: 52
Hi,

I have posted a very fast unload Java program; I hope to make this an open source project.

http://groups.yahoo.com/group/barbyware/files/
 
Chetan Parekh
Ranch Hand
Posts: 3640

Originally posted by JuanP barbancho:
I hope to make this an open source project.

Thanks a lot.

The future is open; support open source.
[ October 27, 2005: Message edited by: Chetan Parekh ]
 
JuanP barbancho
Ranch Hand
Posts: 52
Hi,

I made this tool without NIO; it is possible that you could improve it using NIO and in-memory ByteBuffers.

I hope to work on this project as soon as possible.
 
author
Posts: 4335
Kapil-

I'm curious how you're storing the data in memory and how frequently you are writing to the file. For example, are you storing the entire set in memory, or writing to the file as soon as each row is read from the result set? I think some clever memory management may help.

I would tend to think the database would be more of a bottleneck than the I/O, especially if it is across a network, but again, I think we'd need to see some example code to get a better idea of what you are doing.
 
Wanderer
Posts: 18671
I agree with Scott, a DB across a network is most likely the problem here. 45 minutes for 800 MB is pretty bad, even for old-fashioned Java IO. Kapil, what makes you think the DB is not the bottleneck? Is it the fact that the execute() method returned quickly? That means nothing. DB drivers often return a ResultSet very quickly, even though the actual data is still being gathered. That's useful because it allows you to start processing data right away - but the real question is, how long does it take to get all of it?

You may find it interesting to see how long it takes to execute the following:
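
A minimal sketch of such a timing test (not necessarily the code Jim had in mind), assuming a hypothetical JDBC URL, credentials, and query:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class QueryTimer {
    public static void main(String[] args) throws Exception {
        // Hypothetical URL, credentials and query - substitute whatever your application uses.
        Connection conn = DriverManager.getConnection("jdbc:yourdb://host/db", "user", "pass");
        Statement stmt = conn.createStatement();

        long start = System.currentTimeMillis();
        ResultSet rs = stmt.executeQuery("SELECT * FROM your_table");
        int rows = 0;
        while (rs.next()) {
            rows++; // no file I/O at all - just walk the ResultSet
        }
        long elapsed = System.currentTimeMillis() - start;
        System.out.println(rows + " rows in " + elapsed + " ms");

        rs.close();
        stmt.close();
        conn.close();
    }
}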

If you find it takes something close to 45 minutes with no additional I/O at all - that's a pretty big clue that the problem really is in the DB.

You could also add a call to resultSet.getString(1), or whatever field(s) you need to access, inside the while loop for each result, again with no additional I/O code. I suspect that you won't need it, but it might make a difference.

Please let us know what results you get from this. Good luck.
 
Chetan Parekh
Ranch Hand
Posts: 3640
To add to what Jim Yingst said,

 
JuanP barbancho
Ranch Hand
Posts: 52
Hi,

I had the same problem. 45 minutes is very bad, but it is possible with large tables: 6,000,000 records with a lot of columns, 50 or more. If you use a lot of CHAR columns, then the problem is the same one I had in the past.

JDBC is streaming; you need to read the data in a loop.

• Try to use a buffer of 1000 or more records (see the sketch below).
• Try to use the smallest types you can: byte, short, int, long.
• Try to save the file in batches, using a large buffer.

If no tool works for you, I can provide a C tool for unloading the data.

Thanks
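
A minimal sketch of the batching idea in the first bullet, assuming two string columns and a caller-supplied file name:

import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.sql.ResultSet;
import java.sql.SQLException;

public class BatchedDump {
    // Builds up roughly 1000 rows of text before each write to an already large output buffer.
    public static void dump(ResultSet rs, String fileName) throws IOException, SQLException {
        BufferedWriter out = new BufferedWriter(new FileWriter(fileName), 1 << 20); // ~1 MB buffer
        try {
            StringBuilder batch = new StringBuilder();
            int rows = 0;
            while (rs.next()) {
                batch.append(rs.getString(1)).append('\t').append(rs.getString(2)).append('\n');
                if (++rows % 1000 == 0) {
                    out.write(batch.toString());
                    batch.setLength(0);
                }
            }
            out.write(batch.toString()); // write the remainder
        } finally {
            out.close();
        }
    }
}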
 
JuanP barbancho
Ranch Hand
Posts: 52
I forgot to mention: if you can, try doubling the database memory.

Thanks
 
Ranch Hand
Posts: 37
Try to cheat: make a (soft) link.
 
author
Posts: 14112
Kapil, some comments:

- The suggestions you get might be a little bit confusing, because your first description of the problem did sound like you wanted to make an exact copy of an existing file. Writing the formatted content of a DB to a file is a *very* different problem.

- It's hard to help you without knowing more. The current suggestions cannot be much more than wild guesses, and your conclusion that it's an I/O problem feels very much like a guess, too. I very much doubt that NIO will solve your problem, and I would strongly suggest first doing a deeper analysis of the problem, with our help, to really find the bottleneck.

- As this is a performance question, I'm moving it to our Performance forum...
 
Ranch Hand
Posts: 308
Loading 800 MB at a time and making a copy needs 1600 MB of your RAM. If you don't have that much RAM, your operating system will page to create virtual memory, and most of the time will be lost in the paging process. In your 45-minute program this might be the issue.

The solution is to never keep more than x bytes in memory at a time. A fast algorithm will choose a value for x at runtime, based on the allotted memory, other applications running, and so on. Those x bytes are read from the input stream and written to the output stream. The value of x should be chosen so that paging does not happen.

Does this help?
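
A minimal sketch of picking a value for x at runtime from the memory the JVM reports as free; the bounds here are arbitrary assumptions:

public class ChunkSizer {
    // Picks a copy-buffer size at runtime: at least 64 KB, at most 4 MB,
    // scaled down from the memory the JVM currently reports as free.
    public static int chunkSize() {
        long free = Runtime.getRuntime().freeMemory();
        long size = Math.max(64L * 1024, Math.min(4L * 1024 * 1024, free / 8));
        return (int) size;
    }

    public static void main(String[] args) {
        System.out.println("Copy buffer size: " + chunkSize() + " bytes");
    }
}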
 
JuanP barbancho
Ranch Hand
Posts: 52
Hi,

Try using a separate thread to save the data; you could use two buffer arrays (double buffering).

Thank you.
 