marten kay wrote:
1) On the client (HTML side), it seems that the only thing I can do is use JavaScript to check the file extension, and use some sort of mapping to check it against the content type. Is this correct?
Yes, and this only serves to help the user. It can't ensure that the wrong kind of file isn't uploaded.
marten kay wrote:
2) On the server side (Java, using com.oreilly.servlet) it seems I can only get the content type from the header, but the content type is set by the client using the file extension in the first place. Is this right?
It also occurs to me that a user could fool the system just by changing the file extension. Is this correct?
That's right. All content type information comes from the browser, and it can't be trusted.
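The JDK makes it easy to see how little an extension proves. `URLConnection.guessContentTypeFromName` maps a file name to a MIME type using only the extension, which is the same kind of name-based lookup described above for the browser. The file names below are made up for illustration.

```java
import java.net.URLConnection;

public class ExtensionGuess {
    // Returns the MIME type the JDK associates with the file name's extension.
    // The file's contents are never consulted.
    static String guessFromName(String fileName) {
        return URLConnection.guessContentTypeFromName(fileName);
    }

    public static void main(String[] args) {
        // Hypothetical names: a genuine GIF, and an executable renamed to .gif.
        System.out.println(guessFromName("holiday.gif"));
        System.out.println(guessFromName("setup.exe.gif"));
    }
}
```

Both calls return `image/gif`, so a renamed executable is labelled exactly like a real image.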
Is there some way for either the client or the server to 'interrogate' the actual file?
There are some solutions. If you expect an image file, for example, you can try to decode the upload with Java's ImageIO and then interrogate it as an image, for instance by checking that the image size comes back as something valid.
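A minimal sketch of the ImageIO approach, assuming the whole upload is already in a byte array. `ImageIO.read` returns `null` when no installed reader recognises the data, so a `null` result or a decoding exception both count as "not an image". The helper `samplePng` and the test inputs are hypothetical, used only to produce a genuine image and a fake one.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.imageio.ImageIO;

public class ImageCheck {
    // Decode the upload and sanity-check its dimensions.
    // ImageIO.read returns null when no reader recognises the bytes;
    // a reader that recognises the header but hits corrupt data throws instead.
    static boolean looksLikeImage(byte[] upload) {
        try {
            BufferedImage img = ImageIO.read(new ByteArrayInputStream(upload));
            return img != null && img.getWidth() > 0 && img.getHeight() > 0;
        } catch (IOException | RuntimeException e) {
            // Decoders can also fail with unchecked exceptions on malformed input.
            return false;
        }
    }

    // Hypothetical helper: builds a small, genuine PNG in memory.
    static byte[] samplePng(int width, int height) throws IOException {
        BufferedImage image = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        ImageIO.write(image, "png", out);
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] realPng = samplePng(4, 3);
        byte[] renamedText = "plain text saved as photo.png".getBytes(StandardCharsets.US_ASCII);
        System.out.println(looksLikeImage(realPng));
        System.out.println(looksLikeImage(renamedText));
    }
}
```

The real PNG is accepted and the renamed text file is rejected, regardless of what extension or Content-Type either one arrived with.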
If you have a PDF library like iText, you could do something similar by parsing a supposed PDF upload as a PDF and making sure it behaves correctly.
Beyond that, you can examine the byte stream and use a table of "magic numbers" for common file types to determine the file type.
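A sketch of a magic-number check, assuming only a handful of formats matter. The signatures below (PNG, GIF87a/GIF89a, JPEG, PDF, ZIP) are the standard leading bytes for those formats, but the table itself is deliberately tiny; real detectors know hundreds of formats and sometimes look at offsets other than zero. The class and method names are made up for this example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class MagicNumbers {
    // One entry in the signature table: the bytes a file must start with.
    private static final class Signature {
        final byte[] prefix;
        final String mimeType;
        Signature(byte[] prefix, String mimeType) {
            this.prefix = prefix;
            this.mimeType = mimeType;
        }
    }

    private static final List<Signature> TABLE = new ArrayList<>();
    static {
        TABLE.add(new Signature(bytes(0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A), "image/png"));
        TABLE.add(new Signature(ascii("GIF87a"), "image/gif"));
        TABLE.add(new Signature(ascii("GIF89a"), "image/gif"));
        TABLE.add(new Signature(bytes(0xFF, 0xD8, 0xFF), "image/jpeg"));
        TABLE.add(new Signature(ascii("%PDF-"), "application/pdf"));
        TABLE.add(new Signature(bytes('P', 'K', 0x03, 0x04), "application/zip"));
    }

    // Returns the MIME type whose signature matches the start of the data, or null if unknown.
    static String detect(byte[] header) {
        for (Signature s : TABLE) {
            if (startsWith(header, s.prefix)) {
                return s.mimeType;
            }
        }
        return null;
    }

    // Reads only the first bytes of an upload; 16 covers every signature above (Java 11+).
    static String detectStream(InputStream in) throws IOException {
        return detect(in.readNBytes(16));
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    private static byte[] bytes(int... values) {
        byte[] out = new byte[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = (byte) values[i];
        }
        return out;
    }

    private static byte[] ascii(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        System.out.println(detect(ascii("%PDF-1.4 ...")));
        System.out.println(detect(ascii("hello")));
    }
}
```

The first call prints `application/pdf` and the second prints `null`; an unknown result should be treated as untrusted rather than defaulted to the browser-supplied type. The JDK also offers `URLConnection.guessContentTypeFromStream`, which recognises a small built-in set of signatures.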
Here is a blog post comparing various file type identification solutions in Java:
http://fredeaker.blogspot.com/2006/12/file-type-mime-detection.html
I imagine someone could doctor a file's bytes so that it starts with the correct magic numbers and looks like a GIF file when it isn't, but at that point you are in anti-virus scanning territory.
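A sketch of that spoofing scenario, assuming a hypothetical upload that begins with a valid GIF signature followed by arbitrary text. A prefix check alone accepts it, while actually decoding it with ImageIO rejects it, which is one reason to combine the two techniques. A carefully crafted file can still pass both checks, so neither replaces virus scanning.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.imageio.ImageIO;

public class SpoofedGif {
    // Hypothetical forged upload: a correct GIF signature, then plain text.
    static byte[] forgedUpload() {
        return "GIF89a but really just text".getBytes(StandardCharsets.US_ASCII);
    }

    // Checks only the 6-byte GIF signature, standing in for a magic-number table.
    static boolean hasGifMagic(byte[] data) {
        String head = new String(data, 0, Math.min(6, data.length), StandardCharsets.US_ASCII);
        return head.equals("GIF87a") || head.equals("GIF89a");
    }

    // Decoding the whole file is a stronger, though still not complete, check.
    static boolean decodesAsImage(byte[] data) {
        try {
            return ImageIO.read(new ByteArrayInputStream(data)) != null;
        } catch (IOException | RuntimeException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] forged = forgedUpload();
        System.out.println(hasGifMagic(forged));
        System.out.println(decodesAsImage(forged));
    }
}
```

The signature check reports a GIF, but the decoder fails as soon as it reaches the bytes after the header.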