Java Regular Expressions

jay lai
Ranch Hand

Joined: Apr 04, 2002
Posts: 180
Hi,
The input file name needs to pass some validations; the criteria are listed below.
1. The file name must be lower case and no longer than 32 characters in length.
2. The file name must start with a letter (a-z) and may contain numbers (0-9), with no spaces.
3. The file name may contain one period (.), one hyphen (-), and one underscore (_) character, and must end with a .pdf extension.
4. The file name should not contain any path delimiters.

Can someone help me with this? Thank you very much.
marc weber
Sheriff

Joined: Aug 31, 2004
Posts: 11343

That's an interesting set of requirements. What do you have so far?

http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/Pattern.html

Here's some general advice: Start by writing your test code. Create instances that violate each of the requirements (individually, and possibly in various combinations), along with at least one instance that satisfies all requirements. Then as you develop your regex pattern, these test instances will tell you what needs adjustment.
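As a rough illustration of the test-first advice above, the sketch below lists sample names together with the verdict each one should get, and reports any that a candidate validator gets wrong. The class name FileNameTestCases, the sample names, and the stand-in validator are all made up for illustration, and the lambda syntax assumes Java 8 or later.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

final class FileNameTestCases {
    // Sample name -> expected verdict. The names are invented for illustration.
    static final Map<String, Boolean> CASES = new LinkedHashMap<>();
    static {
        CASES.put("report1.pdf", true);        // satisfies every rule
        CASES.put("Report1.pdf", false);       // upper-case letter
        CASES.put("1report.pdf", false);       // does not start with a letter
        CASES.put("my report.pdf", false);     // contains a space
        CASES.put("a-b-c.pdf", false);         // two hyphens
        CASES.put("a_b_c.pdf", false);         // two underscores
        CASES.put("a.b.c.pdf", false);         // two periods besides the extension
        CASES.put("report1.txt", false);       // wrong extension
        CASES.put("dir/report1.pdf", false);   // path delimiter
        CASES.put("abcdefghijklmnopqrstuvwxyz0123456789.pdf", false); // longer than 32
    }

    // Returns the sample names for which the validator disagrees with the expectation.
    static List<String> failures(Predicate<String> validator) {
        List<String> failed = new ArrayList<>();
        for (Map.Entry<String, Boolean> c : CASES.entrySet()) {
            if (validator.test(c.getKey()) != c.getValue()) {
                failed.add(c.getKey());
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        // Stand-in validator that only checks the extension; swap in the real regex here.
        Predicate<String> candidate = name -> name.endsWith(".pdf");
        System.out.println("Mismatched cases: " + failures(candidate));
    }
}
```

Each mismatched name points at a requirement the current pattern does not yet enforce.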
[ June 21, 2005: Message edited by: marc weber ]

"We're kind of on the level of crossword puzzle writers... And no one ever goes to them and gives them an award." ~Joe Strummer
sscce.org
marc weber
Sheriff

Joined: Aug 31, 2004
Posts: 11343

Are you required to use a single regex pattern? Or can you use 2 or 3 separate regex patterns? In other words, can you use something like the following?

... s.matches(regex1) && s.matches(regex2) ...
[ June 21, 2005: Message edited by: marc weber ]
Jim Yingst
Wanderer
Sheriff

Joined: Jan 30, 2000
Posts: 18671
3. The file name may contain one period (.), one hyphen (-), and one underscore (_) character, and must end with a .pdf extension.

This is going to be the hard part, I think, particularly if you want to do it all with a single regex. It's possible (e.g. using negative lookahead), but the regex will probably be more complex than most people are comfortable with. I'd ignore this requirement initially and concentrate on solving the other requirements. When you've got everything else worked out, try writing a regex that just detects whether a filename contains two or more hyphens. When you get that working, do the same for two periods, or two underscores. Then try to find a way to combine these together in your code. (I suggest doing what Marc just suggested: combining separate expressions using && and || outside the regexes.)

By the way, if the file must end with .pdf, does that use up the one allowed '.'? Or do you allow one '.' in addition to the .pdf?
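
The negative-lookahead idea mentioned above might look roughly like the following. This is only a sketch under two assumptions that the requirements as posted do not settle: that ".pdf" counts toward the 32 characters, and that the period in ".pdf" is allowed in addition to one period in the rest of the name. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

final class SingleRegexFileName {
    static final Pattern FILE_NAME = Pattern.compile(
            "(?!.*-.*-)"                 // reject a second hyphen anywhere
          + "(?!.*_.*_)"                 // reject a second underscore anywhere
          + "(?!.*\\..*\\..*\\.)"        // reject a third period (assumed: ".pdf" plus one more)
          + "[a-z][a-z0-9._-]{0,27}"     // letter first, then at most 27 allowed characters
          + "\\.pdf");                   // literal extension; 1 + 27 + 4 = 32 (assumed)

    static boolean isValid(String name) {
        // matches() requires the whole input to match, so no ^ or $ is needed.
        return FILE_NAME.matcher(name).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("my-file_v1.2.pdf")); // true
        System.out.println(isValid("a-b-c.pdf"));        // false: two hyphens
    }
}
```

Each lookahead runs at the start of the input and consumes nothing, which is why the three "at most one" rules can sit in front of the ordinary pattern. The result is compact but harder to read than separate checks.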


"I'm not back." - Bill Harding, Twister
jay lai
Ranch Hand

Joined: Apr 04, 2002
Posts: 180
So far I have this: [a-z][a-z0-9]*[\\.]?[\\-]?[\\_]?, which covers the following:

1. The file name must start with a letter (a-z) and may contain numbers (0-9).
2. The file name may contain one period (.), one hyphen (-), and one underscore (_) character.

The file name must end with .pdf, so the '.' there is extra, in addition to the 0 or 1 allowed '.' in the rest of the file name.
The criteria do not all need to be handled in a single regular expression.

The code that I have so far looks like this:

import java.util.regex.*;

public class RegularExp
{
    public static void main(String[] args)
    {
        String input = "fr11edjava.-";
        // Letter first, then letters/digits, then an optional '.', '-', '_' in that order.
        Pattern pat = Pattern.compile("[a-z][a-z0-9]*[\\.]?[\\-]?[\\_]?");
        Matcher matcher = pat.matcher(input);
        boolean flag = matcher.matches();
        System.out.println("flag1 value is:" + flag);
    }
}

Any additional help would be much appreciated.
marc weber
Sheriff

Joined: Aug 31, 2004
Posts: 11343

Originally posted by jamie lee:
I have so far done this [a-z][a-z0-9]*[\\.]?[\\-]?[\\_]? ...

Great start! A few things to consider...
  • The asterisk quantifier denotes zero or more. Is there a way to denote "at least zero, but not more than..."? This would be helpful in enforcing your 32-character limit.
  • After you've matched the first lowercase letter, followed by zero or more lowercase letters or numbers, you match the other characters (period, hyphen, and underscore). This will only work if these other characters come after the letters and numbers -- and only in that specific order of period, hyphen, then underscore (if present). What if these are mixed in with the other numbers or letters?
  • It must end with ".pdf".
  • As Jim pointed out, the period, hyphen, and underscore are the tricky parts here. You might want to first get this working without putting limits on these characters.
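
To illustrate the first three hints, here is a minimal sketch: a bounded quantifier {0,27} caps the length, a single character class lets the period, hyphen, and underscore appear anywhere after the first letter, and a literal \.pdf closes the pattern. It assumes the 32-character limit includes ".pdf" (1 + 27 + 4 = 32), and it deliberately does not yet limit how often each special character appears. FileNameShape and hasValidShape are hypothetical names.

```java
import java.util.regex.Pattern;

final class FileNameShape {
    // Assumption: ".pdf" counts toward the 32-character limit, leaving room
    // for at most 27 characters between the leading letter and the extension.
    static final Pattern SHAPE = Pattern.compile("[a-z][a-z0-9._-]{0,27}\\.pdf");

    static boolean hasValidShape(String name) {
        return SHAPE.matcher(name).matches();
    }

    public static void main(String[] args) {
        System.out.println(hasValidShape("fr11ed-java.pdf"));  // true
        System.out.println(hasValidShape("fr11ed--java.pdf")); // true: repeats are not limited yet
        System.out.println(hasValidShape("1report.pdf"));      // false: starts with a digit
    }
}
```

Because the character class contains neither '/' nor '\', the same pattern also rules out path delimiters.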
Alan Moore
Ranch Hand

Joined: May 06, 2004
Posts: 262
It turns out it is possible to do this with one regex:
Jeffrey Spaulding
Ranch Hand

Joined: Jan 15, 2004
Posts: 149
There is always at least one guy to show off his skills here.

let the kids solve their problems themselves

Tzk,

J.
Alan Moore
Ranch Hand

Joined: May 06, 2004
Posts: 262
Sorry, but this problem is too advanced for the kind of coaching you guys were trying to do.
Jim Yingst
Wanderer
Sheriff

Joined: Jan 30, 2000
Posts: 18671
It might've been, but the solution could still have been achieved by implementing several separate regexes and combining them with &&. I think Jamie would've learned quite a bit as a result, and remembered it afterward.

[Alan]: It turns out it is possible to do this with one regex

As I'd previously indicated. Though I was thinking of something more like:


Some requirements are still unclear though, IMO:

Does the ".pdf" count as part of the 32 characters?

Can the filename begin with [._-]?
marc weber
Sheriff

Joined: Aug 31, 2004
Posts: 11343

Originally posted by Jim Yingst:
...the solution could still have been achieved by implementing several separate regexes and combining them with &&...

And I don't think Jamie was too far from a solution along those lines. For example, the regex posted above could easily be adjusted to end with a literal ".pdf" and mix the period, hyphen, and underscore in with the lowercase letters and digits for a string not exceeding 32 characters. Then additional regexes could verify that the period, hyphen, and underscore don't appear more than they should. This might not be the most elegant solution; but it works, and it's a good exercise without getting too advanced.
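
The multi-regex approach described above could be sketched as follows: one pattern checks the overall shape, and three more detect a special character that appears too often, all joined with && outside the regexes. It rests on the same unsettled assumptions as before: ".pdf" counts toward the 32 characters, and its period is extra to the one allowed in the rest of the name. FileNameValidator and its constants are hypothetical names.

```java
import java.util.regex.Pattern;

final class FileNameValidator {
    // Overall shape: letter first, allowed characters only, literal ".pdf" at the end.
    private static final Pattern SHAPE =
            Pattern.compile("[a-z][a-z0-9._-]{0,27}\\.pdf");
    // Each pattern below matches a name that breaks one "at most one" rule.
    private static final Pattern TWO_HYPHENS = Pattern.compile(".*-.*-.*");
    private static final Pattern TWO_UNDERSCORES = Pattern.compile(".*_.*_.*");
    // Three periods means two in the name plus the one in ".pdf" (assumed reading).
    private static final Pattern THREE_PERIODS = Pattern.compile(".*\\..*\\..*\\..*");

    static boolean isValid(String name) {
        return SHAPE.matcher(name).matches()
                && !TWO_HYPHENS.matcher(name).matches()
                && !TWO_UNDERSCORES.matcher(name).matches()
                && !THREE_PERIODS.matcher(name).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("fr11ed-java_v1.0.pdf")); // true
        System.out.println(isValid("fr11edjava.-"));         // false: no .pdf extension
        System.out.println(isValid("a--b.pdf"));             // false: two hyphens
    }
}
```

Keeping the checks separate makes each rule easy to test and change on its own, at the cost of scanning the name several times.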
     