Search File System for Pattern Match

Kevin Brennan
Greenhorn

Joined: Oct 23, 2003
Posts: 27
I would like to write a program to search the file system for files whose name matches at least one of a list of "patterns." Patterns consist of legal characters for filenames and "*" (asterisk) which signals a series of zero or more contiguous characters.

I think I need to assemble the following pieces, but am unsure of how.
-- Pattern list (array of regular expressions?)
-- Use FileFilter
-- Some method to walk the directory tree
-- Some way to identify the starting point (typically the root, but must be platform independent)
-- Some way to store the absolute path, size and create date for the matches and pass them back to another process.

Is this already written somewhere? Seems like a great way to learn a lot about Java to solve a familiar problem.
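One way the pattern list could be handled is sketched below, assuming each "*" pattern is converted to a regular expression. The class name `WildcardMatcher` and its methods are hypothetical, and the case-insensitive matching is an assumption rather than a stated requirement.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: holds a list of "*" patterns and tests file names against them.
class WildcardMatcher {
    private final List<Pattern> patterns = new ArrayList<>();

    WildcardMatcher(String... wildcards) {
        for (String wildcard : wildcards) {
            patterns.add(toRegex(wildcard));
        }
    }

    // Converts a pattern such as "*.class" into a regex: each "*" becomes ".*",
    // and every other character is quoted so that it matches literally.
    static Pattern toRegex(String wildcard) {
        String[] literals = wildcard.split("\\*", -1); // -1 keeps trailing empty pieces
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < literals.length; i++) {
            if (i > 0) {
                regex.append(".*");
            }
            regex.append(Pattern.quote(literals[i]));
        }
        // Assumption: names are compared without regard to case.
        return Pattern.compile(regex.toString(), Pattern.CASE_INSENSITIVE);
    }

    // True if the name matches at least one of the patterns.
    boolean matches(String fileName) {
        for (Pattern pattern : patterns) {
            if (pattern.matcher(fileName).matches()) {
                return true;
            }
        }
        return false;
    }
}
```

A matcher built as `new WildcardMatcher("*.class", "quick*time*")` could then be consulted from inside a FileFilter or from the directory walk itself.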
marc weber
Sheriff

Joined: Aug 31, 2004
Posts: 11343

Here's a good reference that might help with most of what you're doing. See "Chapter 12: The Java I/O System" in Bruce Eckel's Thinking in Java...

http://www.faqs.org/docs/think_java/TIJ3.htm


"We're kind of on the level of crossword puzzle writers... And no one ever goes to them and gives them an award." ~Joe Strummer
sscce.org
A. Wolf
Ranch Hand

Joined: Sep 28, 2003
Posts: 57
It seems to me that you want to recursively search the contents of every folder you encounter on the system.

Yes, it exists, but like you, I made my own for the fun of learning.
On Linux you can use "find", "locate", or "whereis"; on Windows, click Start > Search; Mac OS X has Spotlight.

For the "walking" of the directory structure, I would create a recursive function that takes a directory File as its argument, iterates through each of its contents, and checks whether each item is a file or a folder. If it's a folder, the function calls itself with that folder as the argument. Here's what I mean:
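A minimal sketch of the recursive walk described above. This is an illustration, not the listing originally posted, and the names `DirectoryWalker` and `walk` are hypothetical.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical example class illustrating a recursive directory walk.
class DirectoryWalker {
    // Recursively collects every regular file found beneath 'dir' into 'found'.
    static void walk(File dir, List<File> found) {
        File[] contents = dir.listFiles();
        if (contents == null) {
            return; // 'dir' is not a directory, or it could not be read
        }
        for (File item : contents) {
            if (item.isDirectory()) {
                walk(item, found); // the function calls itself with the folder
            } else if (item.isFile()) {
                found.add(item);
            }
        }
    }

    public static void main(String[] args) {
        // Starts at the path given on the command line, or the current directory.
        File start = new File(args.length > 0 ? args[0] : ".");
        List<File> found = new ArrayList<>();
        walk(start, found);
        for (File file : found) {
            System.out.println(file.getAbsolutePath());
        }
    }
}
```

This sketch does not guard against directory cycles (for example, symbolic links that point back up the tree), so a real search would probably need to skip such directories.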



I hope that works. If it doesn't, hopefully you can use it to help you.
I don't think the Java API will let you get the creation date of a file.

As for the roots, use File.listRoots()? I actually listed the contents of my /Volumes folder to get my drives on Mac OS X.

As you can see, I've chosen to ignore some directories on my file system. If I were to index the /Volumes folder, which contains my hard drive as /Volumes/HardDrive, I would get an endless result of /Volumes/HardDrive/Volumes/HardDrive/Volumes/HardDrive.

Note that searching your whole directory structure will probably take a couple of minutes, if not longer. I would probably use a Vector to store the results that match my query, but that's up to you.
Good luck.
Kevin Brennan
Greenhorn

Joined: Oct 23, 2003
Posts: 27
Thank you for the advice. I have read both of these carefully and dimly understand them (not a criticism of the code, more a realistic assessment of my ability).

What I fail to get is:
(1) How do you start the directory walk? It needs to start somewhere in the tree, and I don't see how the current directory is necessarily the root. How do I use listRoots()?
(2) I'd like to return some kind of list of the absolute paths, lengths and last-modified dates to the calling method, and I can't find the documentation to do this. (I am fundamentally a VB programmer -- am I thinking wrongly about this?)

I appreciate the generous help from the forum members.
marc weber
Sheriff

Joined: Aug 31, 2004
Posts: 11343

Originally posted by Kevin Brennan:
... I'd like to return some kind of list of the absolute paths, lengths and last modify dates to the calling method, and I can't find the documentation to do this...

See the File class in java.io...

http://java.sun.com/j2se/1.4.2/docs/api/java/io/File.html

Note that a File instance can represent a file or a directory, and you can use methods isFile() or isDirectory() to determine which you're working with. Also note that File has methods, getAbsolutePath(), length(), and lastModified().
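As a rough sketch of how those three File methods could be combined to hand results back to a caller: the class `FileSummary` and its field names are hypothetical, and wrapping lastModified() in a java.util.Date is just one possible choice.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;

// Hypothetical value class holding the details of one matching file.
class FileSummary {
    final String absolutePath;
    final long sizeInBytes;
    final Date lastModified;

    FileSummary(File file) {
        this.absolutePath = file.getAbsolutePath();
        this.sizeInBytes = file.length();                 // size in bytes
        this.lastModified = new Date(file.lastModified()); // milliseconds since the epoch
    }

    // Builds one summary per File; a null array yields an empty list.
    static List<FileSummary> summarize(File[] files) {
        List<FileSummary> summaries = new ArrayList<>();
        if (files != null) {
            for (File file : files) {
                summaries.add(new FileSummary(file));
            }
        }
        return summaries;
    }

    @Override
    public String toString() {
        return absolutePath + " (" + sizeInBytes + " bytes, modified " + lastModified + ")";
    }
}
```

The calling method would then receive an ordinary List and could read each entry's fields, which is roughly the Java counterpart of returning a collection of records in VB.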

Here's another variation on the recursive handling approach. It's not as elegant as what A. Wolf posted above, but it might be easier to follow conceptually...
Kevin Brennan
Greenhorn

Joined: Oct 23, 2003
Posts: 27
Marc,
Your code gave me several productive lessons -- thank you!
I am trying to modify it to implement a filter on the files. I only need to find files whose names contain certain strings, so I thought the indexOf method would be easier than implementing a filter or matching on regular expressions. Here is my code:

[code]
void getFiles(String[] dir) {
    for(int i = 0; i < dir.length; i++) {
        File thisItem = new File(dir[i]);
        if(thisItem.isDirectory()) {
            getFiles(thisItem.list());
        }
        else {
            String FileName2Check=thisItem.getname().toUpperCase();
            if (thisItem.isFile() &&
                FileName2Check.indexOf("QUICKTIME") > 0 ||
                FileName2Check.indexOf("CLASS") > 0)
                fileList.add(thisItem);
        }
    }
}
[\code]

The problem is that if I just write

[\code]

thisItem.getName().indexOf("Blah")

[\code]

it works, but since it is case-sensitive, it might miss a positive match. I have experimented with different ways to upshift the filename and compare it to an upshifted string ("BLAH"), but it gives me errors. I would appreciate any advice.
[ June 27, 2005: Message edited by: Kevin Brennan ]
marc weber
Sheriff

Joined: Aug 31, 2004
Posts: 11343

String indexing starts at zero, so you probably want "greater than or equal to" instead of just "greater than": ...indexOf("BLAH") >= 0...

Also,
  • You have: if( A && B || C )...
  • I think you want: if( A && (B || C) )...
  • If these things don't fix it, what type of errors are you getting? Compilation errors, or logical errors?
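To illustrate both points, here is a small sketch with the comparison changed to >= 0 and the parentheses added. The class `NameCheck` and the method `accept` are hypothetical, and the boolean parameter stands in for thisItem.isFile() so the logic can be tried without real files.

```java
import java.util.Locale;

// Hypothetical example class showing the corrected condition A && (B || C).
class NameCheck {
    static boolean accept(boolean isFile, String fileName) {
        // Upshift once so the comparison ignores case.
        String upper = fileName.toUpperCase(Locale.ROOT);
        // indexOf returns -1 when the text is absent and 0 when it is found
        // at the very start, so the test must be ">= 0" rather than "> 0".
        // Without the parentheses, && binds tighter than ||, so the condition
        // would be read as (A && B) || C.
        return isFile
                && (upper.indexOf("QUICKTIME") >= 0 || upper.indexOf("CLASS") >= 0);
    }
}
```

With this version, a name that starts with "QuickTime" is accepted, and an item that is not a file is rejected even if its name contains "class".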

    (Note: Use forward slashes in your closing CODE tags.)
    [ June 27, 2005: Message edited by: marc weber ]
    Kevin Brennan
    Greenhorn

    Joined: Oct 23, 2003
    Posts: 27
    Hi, Marc and Mr. A. Wolf,
    I apologize for being unclear -- in fact, the errors I was picking up were due to a miscapitalized method. I am embarrassed about how long that took to figure out!
    The other source of errors was the way I was trying to begin the search. I am trying to set it up so that it begins at the root of the volume, irrespective of platform (Windows, Mac or Linux). I thought that the way to do it would be to establish a File object with the listRoots() method, then iterate through the FileBlaster, passing it a root each time. Whatever I was doing has not worked due to a variety of syntax errors -- maybe there is a simpler way?
    marc weber
    Sheriff

    Joined: Aug 31, 2004
    Posts: 11343

    I've never worked with roots, but it seems to me that instead of starting in the current directory, you could do something like...
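A possible sketch of that idea, using File.listRoots() to obtain the starting points. This is not the code originally posted, and the class `RootLister` and method `roots` are hypothetical names.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical example class: finds the file system roots to start a search from.
class RootLister {
    // Returns the available roots ("/" on Unix-like systems, drive roots on Windows).
    static List<File> roots() {
        List<File> result = new ArrayList<>();
        File[] roots = File.listRoots(); // documented to return null if roots cannot be determined
        if (roots != null) {
            for (File root : roots) {
                result.add(root);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (File root : roots()) {
            // A real program would hand each root to its directory walk here.
            System.out.println("Would start a search at: " + root.getAbsolutePath());
        }
    }
}
```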
    Kevin Brennan
    Greenhorn

    Joined: Oct 23, 2003
    Posts: 27
    Marc,
    I've been messing with this for several hours and have made headway based on your thoughtful help. My current issue is that, while I can iterate through all the roots, the program cannot distinguish between fixed and removable media, so the first time it hits a removable drive it gives a dialog box asking for media.
    I searched the API for something that would read the hardware, but found nothing. I next figured that checking for "writability" would work, but I am having trouble dealing with the exceptions. Also, when I try to print the name of the root, I get nothing.
    Here is my code:



    Thanks for your thoughts.
    Kevin Brennan
    Greenhorn

    Joined: Oct 23, 2003
    Posts: 27
    I have continued to work on this.



    I've been trying to filter out the root volumes that are removable. The problem is that rootfiles.length returns a null for the removable media. How do I trap it?
    marc weber
    Sheriff

    Joined: Aug 31, 2004
    Posts: 11343


    [ June 27, 2005: Message edited by: marc weber ]
    Kevin Brennan
    Greenhorn

    Joined: Oct 23, 2003
    Posts: 27
    Thanks for your help -- I really need to make myself more clear.

    The ".length" method returns an integer, so when I test for != null, the compiler complains of an invalid data type.
    marc weber
    Sheriff

    Joined: Aug 31, 2004
    Posts: 11343

    Yes, it occurred to me this morning that something isn't quite right here.

    rootfiles is an array, and its length member is just a primitive int, so that's not going to be null. It's the array reference itself (rootfiles) that will be set to null when the listFiles() method is called on a removable drive...

    File[] rootfiles = roots[x].listFiles();

    Therefore, when you try to access rootfiles.length, you'll get a NullPointerException, because you're trying to access a member of a null reference.

    So what you need to do is test the rootfiles reference for a null value before trying to dereference it (that is, before trying to access one of its members)...
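A sketch of such a null test follows. It is an illustration rather than the original listing; the class `RootScanner` and method `countEntries` are hypothetical, and returning -1 for a skipped root is an arbitrary choice made for this example.

```java
import java.io.File;

// Hypothetical example class: tests the listFiles() result before using it.
class RootScanner {
    // Returns the number of entries directly under 'root', or -1 if no listing is available.
    static int countEntries(File root) {
        File[] rootfiles = root.listFiles();
        if (rootfiles != null) {
            return rootfiles.length; // safe: the reference is not null
        } else {
            // Reached when listFiles() returns null, for example for a drive with no media.
            System.out.println("Skipping " + root.getPath() + " (no listing available)");
            return -1;
        }
    }

    public static void main(String[] args) {
        File[] roots = File.listRoots();
        if (roots == null) {
            return;
        }
        for (int x = 0; x < roots.length; x++) {
            int count = countEntries(roots[x]);
            if (count >= 0) {
                System.out.println(roots[x].getPath() + " has " + count + " entries");
            }
        }
    }
}
```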

    Note: The "else" part of the code isn't technically needed, but this would be a good place to log somewhere that the root is being skipped.
    [ June 28, 2005: Message edited by: marc weber ]
    Kevin Brennan
    Greenhorn

    Joined: Oct 23, 2003
    Posts: 27
    Nothing works so far. All I want to do is establish whether a volume is a local fixed disk rather than a network share or removable drive. I think I am going to start a new thread since the question has morphed so much.
    Kevin Brennan
    Greenhorn

    Joined: Oct 23, 2003
    Posts: 27
    Very Interesting problem!

    When I run the code Marc Weber wrote, it compiles and works perfectly. When I change the second line of main in order to search the whole drive,



    to



    Only C:\ is searched, but the absolute path printout shows that the names are formed by directory + file name, where the directory is my Java source directory (not C:\!!!) and the files are those in the root "C:\", which is correct. Very weird!!!
    marc weber
    Sheriff

    Joined: Aug 31, 2004
    Posts: 11343

    Originally posted by Kevin Brennan:
    ...the absolute path printout shows the names are formed by directory + file name where directory is my Java Source directory (not C:\!!!), and the files are those in the root "C:\", which is correct. Very Weird!!!

    I think these selective quotes from the API for the File class explain what's happening.

    Under the isAbsolute method...
    The definition of absolute pathname is system dependent... On Microsoft Windows systems, a pathname is absolute if its prefix is a drive specifier followed by "\\"...

    And under the getAbsolutePath method...
    On Microsoft Windows systems, a relative pathname is made absolute by resolving it ... against the current user directory.


    Ref: http://java.sun.com/j2se/1.4.2/docs/api/java/io/File.html

    But this still doesn't quite solve your problem... Or does it? (If you're able to specify the drives, then this might be fine. But if the program has to find the drives on its own, then it seems there are still issues getting past a floppy.)
    Kevin Brennan
    Greenhorn

    Joined: Oct 23, 2003
    Posts: 27
    I guess I am confused because whatever .getAbsoluteFile is doing, it is reporting files that do not exist: namely, files that exist in the root (C:\) as if they were in my Java source directory. For this reason, the test for .isDirectory() is failing because they clearly do not exist.

    I guess the real question is how do I use your (Marc's) code to begin the search at the root. I am happy to specify the root as long as I can do it with certainty (C:\ for Windows, for instance). How would I do it for the Mac?
    marc weber
    Sheriff

    Joined: Aug 31, 2004
    Posts: 11343

    I think the flaw in my code was in using the list() method instead of listFiles().

    The list() method returns an array of Strings, whereas the listFiles() method returns an array of File objects. By converting to Strings, I think we were losing critical path information that was not being resolved satisfactorily by the getAbsolutePath() method.
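The difference can be seen in a small sketch like the one below. The class `ListVersusListFiles` and its two helper methods are hypothetical names used only for illustration.

```java
import java.io.File;

// Hypothetical example class contrasting bare names with parent-aware File objects.
class ListVersusListFiles {
    // A bare name from list() carries no parent, so getAbsolutePath()
    // resolves it against the current user directory.
    static String resolvedFromName(String name) {
        return new File(name).getAbsolutePath();
    }

    // Supplying the parent directory keeps the real location, which is
    // what the File objects returned by listFiles() already do.
    static String resolvedFromParent(File dir, String name) {
        return new File(dir, name).getAbsolutePath();
    }

    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : ".");
        String[] names = dir.list();
        if (names == null) {
            return; // not a directory, or it could not be read
        }
        for (String name : names) {
            System.out.println(resolvedFromName(name) + "  vs  " + resolvedFromParent(dir, name));
        }
    }
}
```

Run against a directory other than the current one, the first column shows paths under the current directory that may not exist at all, which would explain isDirectory() and isFile() both returning false for them.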

    Below is a revised version in which my handleFiles method takes an array of File objects rather than String objects. I also added a test for a null reference because something on my Mac was giving a null reference rather than a File[]. (I haven't tracked this down yet.)
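For reference, a sketch along the lines described (a handleFiles method that takes File objects and tests for null) might look like this. It is not the revised listing itself; the class `FileSearch`, the keyword filter, and the `results` accessor are assumptions added for illustration.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical example class: recursive search that works on File[] rather than String[].
class FileSearch {
    private final List<File> fileList = new ArrayList<>();
    private final String keyword;

    FileSearch(String keyword) {
        this.keyword = keyword.toUpperCase(Locale.ROOT);
    }

    void handleFiles(File[] files) {
        if (files == null) {
            return; // listFiles() returned null for this directory, so skip it
        }
        for (File item : files) {
            if (item.isDirectory()) {
                handleFiles(item.listFiles()); // File objects keep their full paths
            } else if (item.isFile()
                    && item.getName().toUpperCase(Locale.ROOT).contains(keyword)) {
                fileList.add(item);
            }
        }
    }

    List<File> results() {
        return fileList;
    }

    public static void main(String[] args) {
        // Pass the start directory as the first argument, e.g. "C:\\" or "/".
        File startDir = new File(args.length > 0 ? args[0] : ".");
        FileSearch search = new FileSearch(args.length > 1 ? args[1] : "class");
        search.handleFiles(startDir.listFiles());
        for (File match : search.results()) {
            System.out.println(match.getAbsolutePath());
        }
    }
}
```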

    On a Mac, use a forward slash for the root...
  • Windows: File startDir = new File("C:\\");
  • Mac: File startDir = new File("/");

  • Note: Manuel has posted a solution to the "floppy drive" problem in your other post, so that could be integrated into this code as a better way of finding the roots...
    http://www.coderanch.com/t/400083/java/java/Identify-Local-Hard-Drives-Only
    [ July 01, 2005: Message edited by: marc weber ]
     