
Problem in using java to Store 1GB+ files in a MySQL database

Leysen Strada
Greenhorn

Joined: Jun 25, 2011
Posts: 5
I'm currently experimenting with storing large files in a MySQL 5.5 database using Java. My main class is called FileDatabaseTest. It has the following method:



The database has only one Table - the "FILES" table, and it has the following columns.

ID - AUTOINCREMENT, PRIMARY KEY

FILENAME - VARCHAR(100)

FILESIZE - BIGINT

FILEDESCRIPTION - VARCHAR(500)

FILEDATA - LONGBLOB
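(Editor's note: the code block for the method did not survive in the post. Judging from the schema above, the setBinaryStream call quoted later, and the stack trace below, it presumably looked roughly like the following hedged sketch; the class name FileDatabaseTest is from the post, but the method and variable names are assumptions.)

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import java.sql.Connection;
import java.sql.PreparedStatement;

public class FileDatabaseTest {

    // Parameterized insert matching the FILES schema described above.
    static String insertSql() {
        return "INSERT INTO FILES (FILENAME, FILESIZE, FILEDESCRIPTION, FILEDATA)"
             + " VALUES (?, ?, ?, ?)";
    }

    // Hypothetical reconstruction of the upload method from the post.
    public static void saveFile(Connection conn, File file, String description)
            throws Exception {
        try (PreparedStatement stmt = conn.prepareStatement(insertSql());
             InputStream in = new FileInputStream(file)) {
            stmt.setString(1, file.getName());
            stmt.setLong(2, file.length());
            stmt.setString(3, description);
            stmt.setBinaryStream(4, in, (int) file.length()); // streams the LONGBLOB column
            stmt.executeUpdate();
        }
    }
}
```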

The program works fine when uploading small documents, but when I upload files of around 20 MB, the upload process is very slow. So I tried wrapping the FileInputStream in a BufferedInputStream, as in the following code:

stmt.setBinaryStream(4, new BufferedInputStream(new FileInputStream(file)));

The upload process became very fast, like just copying the file to another directory. But when I tried to upload files larger than 400 MB, I got the following error:

Exception in thread "Thread-5" java.lang.OutOfMemoryError: Java heap space
at com.mysql.jdbc.Buffer.ensureCapacity(Buffer.java:156)
at com.mysql.jdbc.Buffer.writeBytesNoNull(Buffer.java:514)
at com.mysql.jdbc.PreparedStatement.escapeblockFast(PreparedStatement.java:1169)
at com.mysql.jdbc.PreparedStatement.streamToBytes(PreparedStatement.java:5064)
at com.mysql.jdbc.PreparedStatement.fillSendPacket(PreparedStatement.java:2560)
at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:2401)
at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:2345)
at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:2330)
at FileDatabaseTest$2.run(FileDatabaseTest.java:312)
at java.lang.Thread.run(Thread.java:662)


So I tried using an embedded Apache Derby database instead of MySQL, and I didn't get the error. I was able to upload 500 MB to 1.5 GB files to the Derby database using the BufferedInputStream. I also observed that when using the BufferedInputStream with the MySQL server to upload large files, the JVM eats a lot of memory, while with the Derby database the JVM's memory usage stays at around 85 MB to 100 MB.

I am relatively new to MySQL and am just using its default configuration. The only thing I changed is the "max_allowed_packet" size, so that I can upload files of up to 2 GB to the database. So I wonder where the error comes from. Is it a bug in MySQL or in MySQL Connector/J, or is there something wrong with my code?

What I am trying to achieve here is to upload large files (up to 2 GB) to the MySQL server using Java, without increasing the Java heap space.
Martin Vajsar
Sheriff

Joined: Aug 22, 2010
Posts: 3610

I guess (only a guess) that this is a deficiency of your current MySQL driver - it seems to load the file into memory completely and then write it to the database, instead of sending it to the database as a stream. Maybe you could try putting the file into the database in smaller parts (e.g. 10-100 MB, depending on how much memory you can assign to your processing), using the Blob.setBytes() method.
Leysen Strada
Greenhorn

Joined: Jun 25, 2011
Posts: 5
Martin Vajsar wrote:Maybe you could try putting the file into the database in smaller parts (e.g. 10-100 MB, depending on how much memory you can assign to your processing), using the Blob.setBytes() method.


How do I do that? I mean, how can I create a Blob object? Should I get it from an updatable ResultSet and append the bytes from there?
Martin Vajsar
Sheriff

Joined: Aug 22, 2010
Posts: 3610

Leysen Strada wrote:How do I do that? I mean, how can I create a Blob object? Should I get it from an updatable ResultSet and append the bytes from there?

There is the Connection.createBlob() method. I'd suggest reading a JDBC tutorial.

I can't be more specific, as I usually work with Oracle and use a technique specific to that DB (namely, the empty_blob SQL function) in similar situations.
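(Editor's note: a hedged sketch of the suggestion above, built around Connection.createBlob() and Blob.setBytes(); the class and method names are assumptions, and the FILES columns follow the schema from the original post. Whether this actually avoids buffering the whole value in client memory depends on the driver implementation.)

```java
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import java.sql.Blob;
import java.sql.Connection;
import java.sql.PreparedStatement;

public class ChunkedBlobUpload {

    // Number of chunks a file of the given size splits into (chunkSize > 0).
    static int chunkCount(long fileSize, int chunkSize) {
        return (int) ((fileSize + chunkSize - 1) / chunkSize);
    }

    // Builds the BLOB piecewise instead of handing the driver one huge stream.
    static void uploadInChunks(Connection conn, File file, int chunkSize)
            throws Exception {
        Blob blob = conn.createBlob();
        long pos = 1;                       // Blob offsets are 1-based
        byte[] buf = new byte[chunkSize];
        try (InputStream in = new BufferedInputStream(new FileInputStream(file))) {
            int read;
            while ((read = in.read(buf)) != -1) {
                blob.setBytes(pos, buf, 0, read);
                pos += read;
            }
        }
        try (PreparedStatement stmt = conn.prepareStatement(
                "INSERT INTO FILES (FILENAME, FILESIZE, FILEDATA) VALUES (?, ?, ?)")) {
            stmt.setString(1, file.getName());
            stmt.setLong(2, file.length());
            stmt.setBlob(3, blob);
            stmt.executeUpdate();
        }
        blob.free();                        // release driver-side resources
    }
}
```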
Tim Moores
Rancher

Joined: Sep 21, 2011
Posts: 2408
Maybe a better solution would be to store the file data in the file system, and to store just the path to the file in the DB.
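(Editor's note: a hedged sketch of this file-system approach; the storage directory, class name, and the assumed FILEPATH column are illustrative, not from the thread.)

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.sql.Connection;
import java.sql.PreparedStatement;

public class FileStoreExample {

    // Hypothetical storage root; adjust to your environment.
    static final Path STORAGE_DIR = Paths.get("filestore");

    // Where a given upload lands inside the store.
    static Path targetPath(String fileName) {
        return STORAGE_DIR.resolve(fileName);
    }

    // Copy the bytes to the file system; keep only metadata and the path in MySQL.
    static void store(Connection conn, File upload) throws Exception {
        Files.createDirectories(STORAGE_DIR);
        Path target = targetPath(upload.getName());
        Files.copy(upload.toPath(), target, StandardCopyOption.REPLACE_EXISTING);
        try (PreparedStatement stmt = conn.prepareStatement(
                "INSERT INTO FILES (FILENAME, FILESIZE, FILEPATH) VALUES (?, ?, ?)")) {
            stmt.setString(1, upload.getName());
            stmt.setLong(2, upload.length());
            stmt.setString(3, target.toString());
            stmt.executeUpdate();
        }
    }
}
```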
Wendy Gibbons
Bartender

Joined: Oct 21, 2008
Posts: 1107

What documentation or event made you change max_allowed_packet? Maybe that is why it is buffering the entire file in memory first.
What happens if you make it smaller again?
Leysen Strada
Greenhorn

Joined: Jun 25, 2011
Posts: 5
Wendy Gibbons wrote:What documentation or event made you change max_allowed_packet? Maybe that is why it is buffering the entire file in memory first.
What happens if you make it smaller again?


If I make it smaller again, e.g. 5 MB, I get this error:

ERROR UPLOADING: Packet for query is too large (20866341 > 5242880). You can change this value on the server by setting the max_allowed_packet' variable.
com.mysql.jdbc.PacketTooBigException: Packet for query is too large (20866341 > 5242880). You can change this value on the server by setting the max_allowed_packet' variable.
at com.mysql.jdbc.MysqlIO.send(MysqlIO.java:3285)
at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1970)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2150)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2626)
at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:2119)
at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:2415)
at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:2333)
at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:2318)
at FileDatabaseTest$2.run(FileDatabaseTest.java:344)
at java.lang.Thread.run(Thread.java:662)


I can't send anything larger than the 'max_allowed_packet' size to the MySQL database.
Wendy Gibbons
Bartender

Joined: Oct 21, 2008
Posts: 1107

oh, sorry for wasting your time

but I did find this:
http://stackoverflow.com/questions/945471/handling-of-huge-blobs-in-mysql
Leysen Strada
Greenhorn

Joined: Jun 25, 2011
Posts: 5
Wendy Gibbons wrote:oh, sorry for wasting your time

but I did find this:
http://stackoverflow.com/questions/945471/handling-of-huge-blobs-in-mysql


I read the documentation on MySQL's LOAD_FILE function. It says the file has to be located on the server host for the function to work.

So there's no way I can upload a 2 GB file to a MySQL server unless I have 12 GB of RAM. Anyway, thanks for the replies.
Scott Selikoff
author
Saloon Keeper

Joined: Oct 23, 2005
Posts: 3712

Tim made an important point. In general, it's a very bad design practice to store large binary files in a database because the unstructured data overwhelms the structured data and makes the database difficult to manage. Personally, I would never upload anything more than a couple of megabytes to a database.

Think of the driver breaking at 1+ gigabyte as a feature, not a bug, in that it's encouraging you to explore better design patterns.


Martin Vajsar
Sheriff

Joined: Aug 22, 2010
Posts: 3610

Scott Selikoff wrote:Tim made an important point. In general, it's a very bad design practice to store large binary files in a database because the unstructured data overwhelms the structured data and makes the database difficult to manage. Personally, I would never upload anything more than a couple of megabytes to a database.

Think of the driver breaking at 1+ gigabyte as a feature, not a bug, in that it's encouraging you to explore better design patterns.

If this is supposed to mean "always explore out-of-database storage first", then I would not agree. Of course, in this case (MySQL) it is actually a must, but some other databases support terabyte-sized blobs. Having all data in the database seems the easier and more straightforward option to me, and I'd explore it first. It automatically provides all of the services the relational data already has - transactionality, security, point-in-time recovery, replication and so on. Of course, gigabyte-sized blobs call for a lot of consideration, but that is true for out-of-database storage as well.
Scott Selikoff
author
Saloon Keeper

Joined: Oct 23, 2005
Posts: 3712

Martin Vajsar wrote:Having all data in a database seems easier and more straightforward option to me and I'd explore it first. It automatically provides all of the services the relational data already have


Try maintaining a system with gigabyte/terabyte-sized blobs and you might have a different opinion. Also keep in mind most large databases handle thousands, perhaps millions, of transactions a day. The overhead of uploading/downloading the objects puts an unnecessary strain on the DB. That, coupled with the fact that the blob is often unstructured, makes joining on the blob and querying it pointless. Backing up the entire DB is also quite cumbersome. There are very few appropriate situations, especially in the world of web systems, where putting massive files in the database would ever be a good idea.
Martin Vajsar
Sheriff

Joined: Aug 22, 2010
Posts: 3610

Scott Selikoff wrote:Try maintaining a system with gigabyte/terabyte-sized blobs and you might have a different opinion. Also keep in mind most large databases handle thousands, perhaps millions, of transactions a day. The overhead of uploading/downloading the objects puts an unnecessary strain on the DB. That, coupled with the fact that the blob is often unstructured, makes joining on the blob and querying it pointless. Backing up the entire DB is also quite cumbersome. There are very few appropriate situations, especially in the world of web systems, where putting massive files in the database would ever be a good idea.

Yes, in the domain of web systems that is probably true. I didn't notice Leysen saying he is building a web app, though his use of the term "uploading" several times certainly points that way.

In other environments the situation might be different. Administrators of a corporation's back office might well prefer handling these data in a database. I admit I'm closer to this environment than to web applications and am probably influenced that way. Terabyte-sized databases (full of purely relational data) are not uncommon there, and people are used to them.

I, however, don't see managing terabytes of standalone files as fundamentally easier than managing the same data inside a database. And unless you can afford to lose the files, you need to back them up anyway, which is certainly a challenge in itself. Given that the database is already there - and therefore a DBA to take care of it too - not having to back up standalone files separately means you don't need another set of skills.

Of course, chances are there is not a (good) DBA taking care of the database. In this case, the issue is moot, as the database will sooner or later be lost due to hardware, software or (most probably) human error.
Scott Selikoff
author
Saloon Keeper

Joined: Oct 23, 2005
Posts: 3712

Martin Vajsar wrote:I, however, don't see managing terabytes of standalone files fundamentally easier than managing these data inside a database


Theory is not the same as practice. And except in scientific experiments, databases must be practical. These days most developers can manage, or write code that interacts with, nearly any database system without a lot of special training (except possibly Oracle). Accessing, maintaining, or writing code against a database storing terabyte-sized files requires very special handling and is not for everyday developers.

In your example, it would be a lot easier to put terabytes of data in a web cache than in a database, so that multiple users can access it and it can be distributed more easily. In other words, it's not just that it's bad to put them in the database; there are also advantages to using a file system.

Again, try managing such a system. It is non-trivial.
Martin Vajsar
Sheriff

Joined: Aug 22, 2010
Posts: 3610

Regarding pure web applications, putting large files into the DB is probably not the best solution. I don't contest this point. However, in other environments the situation might be different. My last post was concerned with the corporate environments I'm more familiar with, as is this one.

Scott Selikoff wrote:
Martin Vajsar wrote:I, however, don't see managing terabytes of standalone files fundamentally easier than managing these data inside a database

Theory is not the same as practice. And except in scientific experiments, databases must be practical. These days most developers can manage or write code that interacts with nearly any database system without a lot of special training (except possibly Oracle).

Firstly, I don't understand why Oracle should be an exception here. Anyway, writing database-agnostic code is anything but easy. I doubt most JDBC developers understand the differences between, say, the MySQL and PostgreSQL (or SQL Server and Oracle) concurrency models. And ORM frameworks are not going to shield all these differences away from them. Writing highly performant, database-agnostic code is next to impossible - unless you implement all of the functionality in stored procedures, using all the tools the target database has to offer, and limit the JDBC code to calling these procedures. I'd say most people would not consider that "database agnostic", though.

Secondly, I don't see why handling multi-GB blobs should require very special handling in JDBC, though my experience is mostly with Oracle, which probably has good support for large LOBs. Other databases might pose specific challenges - MySQL certainly does. (As a side note, I'd say that supporting GB-sized files in a web application might require more specialized knowledge than supporting them on the database side; e.g. you might have to allow the end user to restart an upload or download, which usually is not needed when storing/retrieving them to/from the DB. Sending GBs of data over a web service would probably be quite a feat too. But possibly this is just my ignorance in these areas.)

Again, try managing such a system. It is non-trivial.

Regarding management (and I reiterate I'm speaking about corporate environments): providing full ACID support, replication, point-in-time recovery, or disaster recovery for TBs of standalone files would be highly demanding in my opinion. It can be demanding for TB-sized databases too, but in corporations DBAs usually have to handle large databases anyway. I was not saying it was trivial; I'm just saying that in some environments this has to be done anyway, and a few additional TBs of data can be relatively cheap.
Scott Selikoff
author
Saloon Keeper

Joined: Oct 23, 2005
Posts: 3712

Experience is everything. Oracle, btw, is one of the few database products that still often requires a dedicated Oracle DBA, whereas most developers can successfully manage other DBMS systems without special training.
Jeanne Boyarsky
author & internet detective
Marshal

Joined: May 26, 2003
Posts: 30795

I think it depends on the type of database. In a relational database, I agree with Scott. While I don't have enough experience with non-relational databases, it seems like they would handle it better.


Martin Vajsar
Sheriff

Joined: Aug 22, 2010
Posts: 3610

Scott Selikoff wrote:Experience is everything. Oracle, btw, is one of the few database products that still often requires a dedicated Oracle DBA, whereas most developers can successfully manage other DBMS systems without special training.

I had understood your previous post to mean that developing against Oracle requires special training. If we're speaking about administration, that's a different story. Though I still think it depends on what the requirements are, not on the vendor; setting the DB up for, and then performing, point-in-time recovery (my favorite example) probably requires good knowledge of any database out there. On the other hand, I'm quite successfully managing my company's Oracle test databases, though I never took a specialized DBA course. It took some time to get into it, but not fundamentally more than it took to get into MS SQL Server, for example. Things improved dramatically when I got used to reading the documentation. (Edit: I never did backups; we don't need them for testing. Maybe backing up a DB is easier in other databases than in Oracle, I don't know.)

Anyway, we're speaking about vastly different environments here. Corporations generally do have DBAs for all of their databases, be it Oracle, DB2 or MS SQL Server. I would have thought that some of them run MySQL too, but I haven't personally encountered it in the few corporations I've come into contact with; it is quite possible that they simply don't. The point is that one database typically hosts tens of applications, and it is generally not feasible to let the developers of individual applications manage it, for quite obvious reasons. So they do need a DBA. Having all data in a database makes some operations (e.g. disaster recovery) much simpler in these environments, since there are simply fewer moving parts and fewer things that can go wrong.

Regarding experience: this month I went through moving and upgrading a several-TB relational database hosting a dozen applications onto a new machine. My position was that of a developer for one of the vendors. The tricky bits were planning for the outage, moving the core business onto a standby database (and back), coordinating actions among developers, and updating connection strings dispersed throughout the corporation. The company handled the database internally, and handled it pretty well. They have the skills, which is precisely my point. Additionally, several years ago we were discussing the design of an application with them, and their position was "put everything into the database, including LOBs". So I believe there are environments where putting files into the DB is preferable. Undoubtedly there are ones where the opposite is true. It is quite possible that the former is poorly represented here on JavaRanch, and I'll try to take notice. (I've made progress on my part - some time ago I'd actively suggest moving files into the DB in discussions here. Today I wouldn't.)
 