Java Regular Expressions

 
jay lai
Ranch Hand
Posts: 180
Hi,
The input file name needs to pass some validations. Below is the list of criteria.
1. The file name must be lower case and no longer than 32 characters.
2. The file name must start with a letter (a-z) and may contain numbers (0-9), with no spaces.
3. The file name may contain one period (.), one hyphen (-), and one underscore (_) character, and must end with a .pdf extension.
4. The file name must not contain any path delimiters.

Can someone help me with this? Thank you very much.
 
marc weber
Sheriff
Posts: 11343
That's an interesting set of requirements. What do you have so far?

http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/Pattern.html

Here's some general advice: Start by writing your test code. Create instances that violate each of the requirements (individually, and possibly in various combinations), along with at least one instance that satisfies all requirements. Then as you develop your regex pattern, these test instances will tell you what needs adjustment.
[ June 21, 2005: Message edited by: marc weber ]
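
A minimal sketch of the test-first idea, for illustration only. The class name, the sample file names, and the placeholder pattern are all assumptions, not something posted in this thread; the placeholder is deliberately incomplete so the table has something to catch.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

class FileNameTestHarness {
    // Placeholder pattern (an assumption): swap in the regex under development.
    static final Pattern CANDIDATE = Pattern.compile("[a-z][a-z0-9]*\\.pdf");

    // Sample inputs mapped to the verdict the stated requirements call for.
    static Map<String, Boolean> expectations() {
        Map<String, Boolean> cases = new LinkedHashMap<String, Boolean>();
        cases.put("report1.pdf", true);      // satisfies every rule
        cases.put("my-report.pdf", true);    // one hyphen is allowed
        cases.put("Report1.pdf", false);     // upper case
        cases.put("1report.pdf", false);     // starts with a digit
        cases.put("my report.pdf", false);   // contains a space
        cases.put("report1.txt", false);     // wrong extension
        cases.put("dir/report.pdf", false);  // path delimiter
        cases.put("a--b.pdf", false);        // two hyphens
        // 40 characters, so too long however the 32-character rule is read
        cases.put("abcdefghijklmnopqrstuvwxyz0123456789.pdf", false);
        return cases;
    }

    // Runs every case against the pattern and returns the number of mismatches.
    static int countFailures(Pattern pattern) {
        int failures = 0;
        for (Map.Entry<String, Boolean> e : expectations().entrySet()) {
            boolean actual = pattern.matcher(e.getKey()).matches();
            if (actual != e.getValue().booleanValue()) {
                System.out.println("MISMATCH: " + e.getKey() + " (expected " + e.getValue() + ")");
                failures++;
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        System.out.println("failures: " + countFailures(CANDIDATE));
    }
}
```

With the placeholder pattern, the harness reports two mismatches: the hyphenated name is wrongly rejected and the over-long name is wrongly accepted. Those are the cues for what to adjust next.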
 
marc weber
Sheriff
Posts: 11343
Are you required to use a single regex pattern? Or can you use 2 or 3 separate regex patterns? In other words, can you use something like the following?

... s.matches(regex1) && s.matches(regex2) ...
[ June 21, 2005: Message edited by: marc weber ]
 
Jim Yingst
Wanderer
Sheriff
Posts: 18671
3. The file name may contain one period (.), one hyphen (-), and one underscore (_) character, and must end with a .pdf extension.

This is going to be the hard part, I think, particularly if you want to do this all with a single regex. It's possible (e.g. using negative lookahead), but the regex will probably be more complex than most people are comfortable with. I'd ignore this requirement initially and concentrate on solving the other requirements. When you've got everything else worked out, try writing a regex that just detects whether a filename contains two or more hyphens. When you get that working, do the same for two periods and for two underscores. Then try to find a way to combine these in your code. (I suggest doing what Marc just suggested: combining separate expressions using && and || outside the regexes.)

By the way, if the file must end with .pdf, does that use up the one allowed '.'? Or do you allow one '.' in addition to the one in .pdf?
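
As a rough sketch of the "detect two or more" step described above: the helper below builds a pattern of the form `.*X.*X.*` for a given character. The class and method names are hypothetical, and `Pattern.quote` is used so that the period is treated as a literal rather than as "any character".

```java
import java.util.regex.Pattern;

class RepeatDetector {
    // True when 'name' contains the character 'c' at least twice, in any position.
    static boolean hasTwoOrMore(String name, char c) {
        String q = Pattern.quote(String.valueOf(c)); // escape regex metacharacters such as '.'
        return name.matches(".*" + q + ".*" + q + ".*");
    }

    public static void main(String[] args) {
        System.out.println(hasTwoOrMore("a-b-c.pdf", '-'));
        System.out.println(hasTwoOrMore("a-b.pdf", '-'));
        System.out.println(hasTwoOrMore("a.b.pdf", '.'));
    }
}
```

Whether two periods should count as a violation depends on the answer to the question above about the '.' in .pdf, so the threshold for periods may need to be three rather than two.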
 
jay lai
Ranch Hand
Posts: 180
So far I have come up with [a-z][a-z0-9]*[\\.]?[\\-]?[\\_]?, which is meant to satisfy the following:

1. The file name must start with a letter (a-z) and may contain numbers (0-9).
2. The file name may contain one period (.), one hyphen (-), and one underscore (_) character.

The file name must end with .pdf, so the '.' in the extension is extra, in addition to the 0 or 1 '.' allowed in the rest of the file name.
The criteria do not all need to be checked in a single regular expression.

The code that I have so far looks like this:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegularExp
{
    public static void main(String[] args)
    {
        String input = "fr11edjava.-";
        Pattern pat = Pattern.compile("[a-z][a-z0-9]*[\\.]?[\\-]?[\\_]?");
        Matcher matcher = pat.matcher(input);
        boolean flag = matcher.matches();
        System.out.println("flag1 value is:" + flag);
    }
}

Any additional help would be much appreciated.
 
marc weber
Sheriff
Posts: 11343
Originally posted by jamie lee:
I have so far done this [a-z][a-z0-9]*[\\.]?[\\-]?[\\_]? ...

Great start! A few things to consider...
  • The asterisk quantifier denotes zero or more. Is there a way to denote "at least zero, but not more than..."? This would be helpful in enforcing your 32-character limit.
  • After you've matched the first lowercase letter, followed by zero or more lowercase letters or numbers, you match the other characters (period, hyphen, and underscore). This will only work if these other characters come after the letters and numbers -- and only in that specific order of period, hyphen, then underscore (if present). What if these are mixed in with the other numbers or letters?
  • It must end with ".pdf".
  • As Jim pointed out, the period, hyphen, and underscore are the tricky parts here. You might want to first get this working without putting limits on these characters.
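
The first three points can be sketched roughly as follows. This is only an illustration: the class name is hypothetical, and the bound of 27 assumes that ".pdf" counts toward the 32-character limit, which the thread has not settled. It places no limit yet on how often the period, hyphen, and underscore appear.

```java
import java.util.regex.Pattern;

class BoundedPattern {
    // One leading letter, then 0 to 27 characters drawn from letters, digits,
    // '.', '_' and '-' in any order, then a literal ".pdf".
    // 1 + 27 + 4 = 32 characters at most (assumes the extension counts).
    static final Pattern NAME = Pattern.compile("[a-z][a-z0-9._-]{0,27}\\.pdf");

    static boolean looksValid(String name) {
        return NAME.matcher(name).matches();
    }

    public static void main(String[] args) {
        System.out.println(looksValid("fr11ed-java.pdf")); // special character mixed in
        System.out.println(looksValid("fr11edjava.-"));    // no .pdf extension
    }
}
```

Inside a character class the period needs no escaping, and the hyphen is literal when it comes last, which is why `[a-z0-9._-]` works as written.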
     
    Alan Moore
    Ranch Hand
    Posts: 262
    It turns out it is possible to do this with one regex:
     
    Jeffrey Spaulding
    Ranch Hand
    Posts: 149
    There is always at least one guy to show off his skills here.

    Let the kids solve their problems themselves.

    Tzk,

    J.
     
    Alan Moore
    Ranch Hand
    Posts: 262
    Sorry, but this problem is too advanced for the kind of coaching you guys were trying to do.
     
    Jim Yingst
    Wanderer
    Sheriff
    Posts: 18671
    It might've been, but the solution could still have been achieved by implementing several separate regexes and combining them with &&. I think Jamie would've learned quite a bit as a result, and remembered it afterward.

    [Alan]: It turns out it is possible to do this with one regex

    As I'd previously indicated. Though I was thinking of something more like:


    Some requirements are still unclear though, IMO:

    Does the ".pdf" count as part of the 32 characters?

    Can the filename begin with [._-]?
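
As an illustration only (this is not necessarily the pattern either poster had in mind), a single-regex version using lookahead might look like the sketch below. It rests on assumptions about the open questions: ".pdf" counts toward the 32 characters, the name must start with a letter and so cannot begin with [._-], and one period is allowed in addition to the one in ".pdf". The class name is hypothetical.

```java
import java.util.regex.Pattern;

class SinglePatternValidator {
    static final Pattern ONE_REGEX = Pattern.compile(
        "(?=.{1,32}$)"            // total length of 1 to 32 characters (assumed to include .pdf)
        + "(?!.*-.*-)"            // reject two or more hyphens
        + "(?!.*_.*_)"            // reject two or more underscores
        + "(?!.*\\..*\\..*\\.)"   // reject three or more periods (one in the name plus .pdf)
        + "[a-z][a-z0-9._-]*"     // leading letter, then the allowed characters
        + "\\.pdf");              // literal extension

    static boolean isValid(String name) {
        return ONE_REGEX.matcher(name).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("my-report_v1.final.pdf"));
        System.out.println(isValid("a-b-c.pdf"));
    }
}
```

Each lookahead inspects the whole string from the start without consuming it, so the checks stack up in front of the pattern that actually matches the name.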
     
    marc weber
    Sheriff
    Posts: 11343
    Originally posted by Jim Yingst:
    ...the solution could still have been achieved by implementing several separate regexes and combining them with &&...

    And I don't think Jamie was too far from a solution along those lines. For example, the regex posted above could easily be adjusted to end with a literal ".pdf" and mix the period, hyphen, and underscore in with the lowercase letters and digits for a string not exceeding 32 characters. Then additional regexes could verify that the period, hyphen, and underscore don't appear more than they should. This might not be the most elegant solution; but it works, and it's a good exercise without getting too advanced.
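
A sketch of that multi-regex approach, under stated assumptions: the class and constant names are hypothetical, ".pdf" is assumed to count toward the 32 characters, and the period limit follows the original poster's clarification that the '.' in ".pdf" is extra (so up to two periods in total).

```java
class SeparateRegexValidator {
    // Overall shape: leading letter, up to 27 allowed characters, literal ".pdf".
    static final String SHAPE = "[a-z][a-z0-9._-]{0,27}\\.pdf";
    // Detectors for characters that appear more often than allowed.
    static final String TWO_HYPHENS = ".*-.*-.*";
    static final String TWO_UNDERSCORES = ".*_.*_.*";
    static final String THREE_PERIODS = ".*\\..*\\..*\\..*";

    // Combines the separate checks outside the regexes with && and !.
    static boolean isValid(String s) {
        return s.matches(SHAPE)
            && !s.matches(TWO_HYPHENS)
            && !s.matches(TWO_UNDERSCORES)
            && !s.matches(THREE_PERIODS);
    }

    public static void main(String[] args) {
        System.out.println(isValid("my-report_v1.final.pdf"));
        System.out.println(isValid("a--b.pdf"));
    }
}
```

Each pattern stays simple enough to test on its own, which is the main appeal of this approach over a single combined expression.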
     