It's not a secret anymore!
The moose likes Java in General and the fly likes Generating a String based on a Regular Exprssion Big Moose Saloon
  Search | Java FAQ | Recent Topics
Register / Login
JavaRanch » Java Forums » Java » Java in General
Reply Bookmark "Generating a String based on a Regular Exprssion" Watch "Generating a String based on a Regular Exprssion" New topic
Author

Generating a String based on a Regular Exprssion

Amit K Srivastava
Greenhorn

Joined: May 16, 2006
Posts: 23
Hi,

I need to generate a String based on a given regular exression. Currently,
relying on a trivial approach to keep on generating a char sequence randomly
until it gets validated against the regular expression (using the
java.util.regex APIs).

Suggestions for any other optimized approach/ method/ APIs available which
can help ?

Regards,
Amit Srivastava.
Jim Yingst
Wanderer
Sheriff

Joined: Jan 30, 2000
Posts: 18670
Well, that's an unusual question. I think you'll probably have to parse the regex yourself to understand it in terms of its constituent parts, and develop a series of rules for generating random strings based on each type of feature. For example, for a character class like [xyz], you could count up how many characters are in the class, total, and generate a random number to choose one of the characters. Or for alternation (foo|bar), just assign a 50% chance to pick the thing on the left (foo), and a 50% chance to pick the one on the right (bar).

Be aware that there is no single unique way to do this. It may be tempting to say that all possibilities are "equally probable", but this idea becomes impossible when you get to something like (.*). That could be an infinite string. Or in Java, a string of length Integer.MAX_INTEGER (or however much you hae memory for). You probably don't want that though. For variable quantifiers like * or +, I would generate them one character at a time, and at each character, say there's, oh, a 10% chance that the * terminates right there. So the string will be as long as it takes before you happen to generate a random number < 0.10. It could be really long, but long strings are less likely than short ones. This is a good thing, probably.

There may be some libraries out there that would help you in parsing the regex. I don't just mean like Pattern.parse(), but something that generates a semantic structure for the regex. Perhaps a good idea would be to study the source code of Pattern.compile(), or of other regex parsers like OROMatcher or many others. You can figure this out yourself, but it would probably speed things up considerably to steal ideas from existing code. I doubt you'll find any libraries to generate random strings though - that part you'll almost certainly need to make for yourself.

Hope that helps...
[ May 17, 2006: Message edited by: Jim Yingst ]

"I'm not back." - Bill Harding, Twister
Amit K Srivastava
Greenhorn

Joined: May 16, 2006
Posts: 23
Thanks Jim,

Fine parsing could be achieved taking the concepts from existing regex parsers.
After parsing, one approach for generating String based on regex could be based on Finite State Automation

It takes:
1. Convert the regular expression's parsed components to the Finite State Machine (FSM)
2. Generating the sequence of characters in password while traversing the FSM until the final state is attained.


The major issue with the above approach is the complexity.


The determination of the total no. of sample space for the next state transition will become complex if the regular expression contains the sequence like pre defined character classes:

\d representing a digit: [0-9]
\D representing a non-digit: [^0-9]

Thanks,
Amit S
Ilja Preuss
author
Sheriff

Joined: Jul 11, 2001
Posts: 14112
Can you tell us *why* you want to do this?


The soul is dyed the color of its thoughts. Think only on those things that are in line with your principles and can bear the light of day. The content of your character is your choice. Day by day, what you do is who you become. Your integrity is your destiny - it is the light that guides your way. - Heraclitus
Amit K Srivastava
Greenhorn

Joined: May 16, 2006
Posts: 23
The intent is to generate a password based on a password policy (Policy may specify a regular expression).
Jim Yingst
Wanderer
Sheriff

Joined: Jan 30, 2000
Posts: 18670
Ah. You know, for something like that, I think allowing the full power of regular expressions would be overkill which makes your problem much harder than it needs to be. Why not just have a short list of control parameters, perhaps set in a properties file? E.g. allow the system admin to specify things like

length.min = 6
length.max = 10
require.one.digit = yes
require.mixed.case = no

I think something like this would be much easier to code...
[ May 17, 2006: Message edited by: Jim Yingst ]
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 16482
    
    2

Originally posted by Amit K Srivastava:
...based on a password policy (Policy may specify a regular expression).
I'm sure that policy wasn't your idea. But it seems to me that if the general public was asked to use a password that satisfied a certain regular expression, complete confusion would ensue. So presumably ordinary people aren't being subjected to that policy?
Ilja Preuss
author
Sheriff

Joined: Jul 11, 2001
Posts: 14112
What Jim said...
Amit K Srivastava
Greenhorn

Joined: May 16, 2006
Posts: 23
Well I agree with Jim using the regex with full power would be an overkill. So, allowing a subset of it in the regex will reduce the effort considerably.

Thanks Again.
--Amit S
 
I agree. Here's the link: http://zeroturnaround.com/jrebel/download
 
subject: Generating a String based on a Regular Exprssion
 
Similar Threads
Do you store IDE-specific files in your repository?
Need to parse and analyse tomcat logs
Question regarding regular expression
Printing repeated Characters
I am a SE and EE developer plan to learn Android for ME