I would like to create a component that detects whether a file has been modified before it is processed.
I am looking for advice: does my approach make sense, and is it right to detect file modification based on the file size?
Below is the flow:
1. Get the size of the file.
2. Hash the file size value with the MD5 algorithm; say it gives us the value "0123sdf".
3. To guard against the user modifying the file content, before the file is processed we take the file and compute the MD5 value again. If it returns "0123sdf", we are sure the file has not been modified.
a. Is this the right approach to detect file modification?
b. Which library would you advise, or will java.security.DigestInputStream do?
IMHO using file size as the only indicator of whether a file has changed is unreliable. The file size may be the same even though the content has changed (e.g. a single letter replaced by another).
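To illustrate the point about equal sizes, here is a minimal sketch that hashes the file content (not the size) using java.security.DigestInputStream, as asked about in the question. The class name ContentDigest, the method hexDigest and the sample file contents are made up for this example. The demo uses SHA-256 rather than MD5, which is my substitution; passing "MD5" as the algorithm name works the same way.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class ContentDigest {

    // Streams the whole file through a DigestInputStream and
    // returns the resulting digest as a lowercase hex string.
    static String hexDigest(Path file, String algorithm)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance(algorithm);
        try (InputStream in = new DigestInputStream(Files.newInputStream(file), md)) {
            byte[] buffer = new byte[8192];
            while (in.read(buffer) != -1) {
                // Nothing to do here: the stream updates the digest as bytes are read.
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        // Two hypothetical files of identical size but different content.
        Path a = Files.createTempFile("budget", ".txt");
        Path b = Files.createTempFile("budget", ".txt");
        Files.write(a, "total=100".getBytes(StandardCharsets.UTF_8));
        Files.write(b, "total=900".getBytes(StandardCharsets.UTF_8));

        System.out.println("same size:   " + (Files.size(a) == Files.size(b)));
        System.out.println("same digest: "
                + hexDigest(a, "SHA-256").equals(hexDigest(b, "SHA-256")));

        Files.delete(a);
        Files.delete(b);
    }
}
```

A size check cannot tell the two files apart, while a digest of the content can. Note that this only detects change if the reference digest is kept somewhere the user cannot alter it.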
I suggest using the last modified date/time.
Also consider what happens while your program is working on the file (in memory, perhaps): you need to make sure no one tampers with it in between. In other words, can the file be locked (similar to database row or table locking) during processing?
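As a rough sketch of the locking idea, Java NIO offers java.nio.channels.FileLock. The class name LockedRead and the method readWhileLocked are made up for this example, and it assumes the file is small enough to fit in memory. Keep in mind that file locks are platform-dependent and on many systems only advisory, so other programs that do not ask for the lock may still be able to write to the file.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class LockedRead {

    // Reads the whole file while holding an exclusive lock on it.
    // The channel is opened for writing too, because an exclusive
    // lock requires a writable channel.
    static byte[] readWhileLocked(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(
                     file, StandardOpenOption.READ, StandardOpenOption.WRITE);
             FileLock lock = channel.lock()) {
            ByteBuffer buffer = ByteBuffer.allocate((int) channel.size());
            while (buffer.hasRemaining() && channel.read(buffer) != -1) {
                // Keep reading until the buffer is full or end of file is reached.
            }
            return buffer.array();
        } // The lock is released first, then the channel is closed.
    }
}
```

Whatever processing has to see a consistent file would go inside the try block, while the lock is still held.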
First, how is the file uploaded to the server? FTP, web UI upload, etc.? If FTP, do end users have direct access to the FTP folder through FTP client software or something similar?
Suppose it's a web UI upload. Say user A and user B each have a file with the same name, "Y2014budget.txt". Both make changes independently of each other, and each uploads their own file through the web interface.
Normally, whoever uploads last will overwrite the previous upload. I doubt you want that!
In this situation, the uploaded file name would have to be changed, say by appending a unique key (e.g. a DB PK) at the end, giving something like Y2014budget.txt_1
The advantage of this approach is that uploaded files never get overwritten. The disadvantage is that it uses more disk space.
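The renaming could look roughly like the sketch below. The class name UploadNames and the method storedName are invented for illustration, and the AtomicLong counter is only a stand-in for a real database primary key.

```java
import java.util.concurrent.atomic.AtomicLong;

class UploadNames {

    // Stand-in for a database primary key sequence (assumption for this sketch).
    private static final AtomicLong NEXT_KEY = new AtomicLong(1);

    // Builds the name under which an upload is stored by appending
    // a unique key, e.g. "Y2014budget.txt" becomes "Y2014budget.txt_<key>".
    static String storedName(String originalName) {
        return originalName + "_" + NEXT_KEY.getAndIncrement();
    }

    public static void main(String[] args) {
        // Two users upload a file with the same name; both copies are kept.
        System.out.println(storedName("Y2014budget.txt"));
        System.out.println(storedName("Y2014budget.txt"));
    }
}
```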
It also depends on whether the files follow a defined spec with a header, footer, etc. If so, the renaming may need to be reflected in the file name entries inside the file (usually in the header and/or footer).
Using the last modified date/time is not at all safe. It is very easy to modify a file and then set its modified time back to the original file's time.
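To show how little effort that takes, here is a small sketch using only standard java.nio.file calls. The class name TimestampSpoof and the method modifyKeepingTimestamp are made up for this example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;

class TimestampSpoof {

    // Overwrites the file content, then restores the last modified
    // time that the file had before the change.
    static void modifyKeepingTimestamp(Path file, byte[] newContent) throws IOException {
        FileTime original = Files.getLastModifiedTime(file);
        Files.write(file, newContent);
        Files.setLastModifiedTime(file, original);
    }
}
```

After this call the content is different but a check based on the last modified time sees nothing unusual, so the timestamp cannot serve as proof that the file is untouched.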
Hi K. Tsang,
There are two ways the user can trigger the file generation: manual or auto.
Both are triggered from the web UI. For manual, the user can download the file and is allowed to upload it into a different environment at any time.
For auto, the user triggers the file generation process from the web UI, and the generated file is transferred via FTP to the other environment instantly.
The file size checksum is there to ensure the content has not been modified by the user.