
regex code to recognize fully qualified Java class names in strings?

Joe Vahabzadeh
Ranch Hand

Joined: Jan 05, 2005
Posts: 140
All,

OK, I'm trying to write a bit of Java code that reads .java files and notifies me when it finds a class name in quotes.

I've been struggling a bit with this, and am baffled.

I want to be able to see when I match something like:


where it picks up on the "com.myjob.MyClass"

Now there can be any arbitrary depth before the class name, i.e. it could be "com.myjob.mywidgets.myspecializedwidgets.MyClass" or something as simple as "com.MyClass"

I've been playing around with the matches(regex) method in String, but I'm not getting quite the results I want.

Any pointers?

Thanks!
Rob Spoor
Sheriff

Joined: Oct 27, 2005
Posts: 19543
    

How about you show us your current regular expression first, and we can tell you what's wrong with it. One hint - the package is zero or more occurrences of "something followed by a dot". I say zero because it's also possible to have a class in the default package.


SCJP 1.4 - SCJP 6 - SCWCD 5 - OCEEJBD 6
Joe Vahabzadeh
Ranch Hand

Joined: Jan 05, 2005
Posts: 140
It's the dot character that's throwing me I think, because it also means "any character" . . . in any case, I've tried the following:



Ok, so I know the + is wrong and should be a * because, as you said, it could be in the default package. I also know that all the packages follow the convention of all lowercase characters, and only the class names have mixed case.

So I already know that it should be something like:


And I tried it, but, while both of those regular expressions will capture "com.myjob.MyClass", they will also capture "commyjobMyClass"

I'm trying to say:
- any string of any number of characters (the rest of the line BEFORE what I'm looking for)
- followed by a double-quote character
- followed by zero or more occurrences of:
--- one or more lowercase letters followed by a period
- followed by an occurrence of:
--- one or more letters of any case.
- followed by a double-quote character
- followed by any string of any number of characters (the rest of the line AFTER what I'm looking for)

If I were to guess, I'd say my problem exists in this part: [[a-z]+\\.]* though I'm not sure how to fix it.
Matthew Brown
Bartender

Joined: Apr 06, 2010
Posts: 4240
    

Joe Vahabzadeh wrote:If I were to guess, I'd say my problem exists in this part: [[a-z]+\\.]* though I'm not sure how to fix it.

Do you mean ([a-z]+\\.)* ? I'm not quite sure what the effect of nesting [ ]s will be, but it's not what you're trying to do here.
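
For what it's worth, the java.util.regex.Pattern documentation describes nested brackets as a character-class union, so [[a-z]+\\.] would be read as "one character that is a lowercase letter, a plus sign, or a dot". That would explain why "commyjobMyClass" slipped through. A minimal sketch comparing the two forms (the class and method names are made up for illustration, not taken from the thread):

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names here are illustrative only.
class BracketVsGroupDemo {
    // Nested brackets: a single character class (union of a-z, '+', and '.'), repeated.
    static final Pattern CHAR_CLASS = Pattern.compile("[[a-z]+\\.]*");
    // Parentheses: a group of "one or more lowercase letters, then a literal dot", repeated.
    static final Pattern GROUP = Pattern.compile("([a-z]+\\.)*");

    static boolean matchesCharClass(String s) {
        return CHAR_CLASS.matcher(s).matches();
    }

    static boolean matchesGroup(String s) {
        return GROUP.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"com.myjob.", "commyjob", "a+b"}) {
            System.out.println(s + " -> brackets: " + matchesCharClass(s)
                    + ", parentheses: " + matchesGroup(s));
        }
    }
}
```

With the bracket version, "commyjob" and even "a+b" are accepted; the parenthesized group only accepts input where every run of letters is followed by a dot.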
Joe Vahabzadeh
Ranch Hand

Joined: Jan 05, 2005
Posts: 140
That's exactly it - I was trying to nest something, but I am doing so incorrectly.

I didn't realize that parentheses could be used like that to nest something. Thanks, that seems to have solved this problem for me!
Winston Gutkowski
Bartender

Joined: Mar 17, 2011
Posts: 7063
    

Joe Vahabzadeh wrote:I didn't realize that parentheses could be used like that to nest something. Thanks, that seems to have solved this problem for me!

The real problem that you're likely to run into (as already stated by Rob) is that a class name doesn't necessarily have to have a dot in it. Also, there are several possibilities of strings that do contain dots that aren't class names. I suspect that you'll have to verify the result with something like Class.forName() if you want to be really sure.
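
A rough sketch of what that verification step might look like, assuming the classes being scanned are on the classpath of the program doing the checking (the helper name is hypothetical). It uses the three-argument Class.forName with initialize set to false so that static initializers are not run just to test a name:

```java
// Hypothetical helper; assumes the scanned project's classes are on this program's classpath.
class ClassNameVerifier {
    static boolean isLoadableClass(String name) {
        try {
            // initialize=false: locate and load the class without running its static initializers.
            Class.forName(name, false, ClassNameVerifier.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Not found, or found but not loadable; either way, treat it as "not a class".
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isLoadableClass("java.lang.String"));
        System.out.println(isLoadableClass("com.myjob.MyClass"));
    }
}
```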

Winston


Isn't it funny how there's always time and money enough to do it WRONG?
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18134
    

The idea that Java class names will be composed entirely of the letters "a" to "z" is pretty parochial and shouldn't be used for real-world applications, where pretty much any Unicode letters can be used. (Check the Java Language Spec for the exact rules.) It's probably okay for personal use, though. Although you might want to consider the possibility that class names can also include digits -- unless you already did and I didn't notice where in the regex you did that.
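
If the Unicode case ever matters, one possible approach is to lean on the \p{javaJavaIdentifierStart} and \p{javaJavaIdentifierPart} properties that java.util.regex exposes, which delegate to the corresponding Character.isJavaIdentifierStart/Part methods. A sketch under that assumption (the class name is made up, and this does not exclude keywords):

```java
import java.util.regex.Pattern;

// Hypothetical sketch: matches dotted names built from Java identifier characters.
class UnicodeClassNamePattern {
    // One identifier: a start character followed by any number of part characters
    // (part characters include digits, so "Widget2" is fine but "2Widget" is not).
    static final String IDENT =
            "\\p{javaJavaIdentifierStart}\\p{javaJavaIdentifierPart}*";

    // Zero or more "identifier." package segments, then the class name itself.
    static final Pattern QUALIFIED =
            Pattern.compile("(" + IDENT + "\\.)*" + IDENT);

    static boolean looksQualified(String s) {
        return QUALIFIED.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"com.myjob.MyClass", "Ünïcode.Klasse2", "2fast.Foo"}) {
            System.out.println(s + " -> " + looksQualified(s));
        }
    }
}
```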
Joe Vahabzadeh
Ranch Hand

Joined: Jan 05, 2005
Posts: 140
Winston and Paul,

You are both correct.

Winston - I will be using Class.forName(), but I'm filtering the strings first to make sure I have a String that is in quotes. If it passes the regex, then Class.forName() will be used to verify.

I've also switched back to assuming at least one dot, because in the particular source code that I need to run this program against, NONE of the classes are in the default package.

Paul - agreed, but for the source code that I'm running this program against, it's a known quantity that the package names consist of only lowercase letters, and the class names only consist of upper and lower case letters, and numbers (I've modified the regex slightly to reflect that).
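
The modified regex itself isn't shown in this post, so the following is only a guess at the kind of filter described: at least one lowercase package segment, then a class name made of letters and digits. It assumes the class name starts with a letter, and it uses Matcher.find() rather than String.matches() so the leading and trailing ".*" aren't needed. All names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the filter described above; not the poster's actual regex.
class QuotedClassNameFinder {
    // A double quote, one or more "lowercase." segments, a class name, a double quote.
    // Group 1 captures the name without the surrounding quotes.
    static final Pattern QUOTED_NAME =
            Pattern.compile("\"((?:[a-z]+\\.)+[A-Za-z][A-Za-z0-9]*)\"");

    // Returns every quoted candidate class name found in one line of source.
    static List<String> find(String line) {
        List<String> names = new ArrayList<>();
        Matcher m = QUOTED_NAME.matcher(line);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(find("Class c = Class.forName(\"com.myjob.MyClass\");"));
        System.out.println(find("String s = \"commyjobMyClass\";"));
    }
}
```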


Ultimately, though, my biggest stumbling block was the nested brackets, when I should've been using parentheses.
Winston Gutkowski
Bartender

Joined: Mar 17, 2011
Posts: 7063
    

Joe Vahabzadeh wrote:Winston - I will be using Class.forName()
Paul - agreed, but for the source code that I'm running this program against, it's a known quantity that the package names consist of only lowercase letters, and the class names only consist of upper and lower case letters, and numbers (I've modified the regex slightly to reflect that)...
There are a few other rules that you can apply too. For example: class names cannot start with a number (although, as Paul said, they can include numbers). You can find the exact rules in the JLS, and I would make sure your regex covers them all, because you don't want to be calling something as heavyweight as Class.forName() on a string that can't possibly be a class name.
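
One way to apply those rules without hand-writing them all into the regex might be javax.lang.model.SourceVersion.isName, which checks that every dot-separated segment is a valid identifier and not a keyword. This is a cheap pre-check before Class.forName(), sketched here with a hypothetical wrapper name, and it assumes the java.compiler module is available (it is part of a standard JDK):

```java
import javax.lang.model.SourceVersion;

// Hypothetical wrapper around a syntactic check; it says nothing about whether the class exists.
class JlsNameCheck {
    static boolean isPlausibleQualifiedName(String candidate) {
        // True only if each segment is a legal identifier and none is a keyword or literal.
        return SourceVersion.isName(candidate);
    }

    public static void main(String[] args) {
        for (String s : new String[] {"com.myjob.MyClass", "com.myjob.2Fast", "com.class.Foo"}) {
            System.out.println(s + " -> " + isPlausibleQualifiedName(s));
        }
    }
}
```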

Joe Vahabzadeh wrote:Ultimately, though, my biggest stumbling block was the nested brackets, when I should've been using parentheses.
Welcome to the world of parsing; something that regex is definitely NOT good for. And don't forget about escaped/doubled quotes either.
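
To illustrate the escaped-quote point: a naive "everything between two quotes" pattern stops at the first \" inside a literal. A commonly used workaround is to treat a backslash plus the following character as one unit. The sketch below (hypothetical names) does that for a single line; it still knows nothing about comments, char literals such as '"', or text blocks, which is part of why regex is a poor fit for real parsing:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: extracts string-literal bodies from one line, honoring backslash escapes.
class StringLiteralScanner {
    // Inside the quotes: either a character that is not a quote or backslash,
    // or a backslash followed by any one character (so \" does not end the literal).
    static final Pattern LITERAL =
            Pattern.compile("\"((?:[^\"\\\\]|\\\\.)*)\"");

    // Returns the raw contents (escapes left as written) of each literal on the line.
    static List<String> literals(String line) {
        List<String> found = new ArrayList<>();
        Matcher m = LITERAL.matcher(line);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(literals("String s = \"say \\\"com.Foo\\\" twice\";"));
    }
}
```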

Winston
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18134
    

I don't know... my programs are full of string literals like "a" or "Error". These could of course be class names, but they aren't. Even if you get this regex working, you're going to have more false positives than true positives.

Edit: I see you're now requiring a package name. That's likely to reduce the false-positive level considerably from what I originally assumed.
 