Regex Question ...

jay vas
Ranch Hand

Joined: Aug 30, 2005
Posts: 407
Hi, I'm new to Java regexes. I need to build a regex that detects
all words in a paragraph that look like a string of amino acids.

For example:

Ala-Cys-Ala, A-C-A, and ACA all represent possible amino acid sequences of alanine, cysteine and alanine. Is there a way to build a regex in Java that represents this? Currently I'm doing it with nested for loops. I've tried
[A|Ala|V|Val|L|Lys|M|Met|W|Trp|P|S|T|Thr|C|Y|Tyr|N|Asn|-|Q|D|E|K|R|H|X]++
but it returns false positive matches. For example, GAVs is returned as group(0) by the Java matcher, even though 's' is not one of the codes in the expression.

Ala A
Arg R
Asn N
Asp D
Cys C
His H
Ile I
Leu L
Lys K
Met M
Phe F
Pro P
Ser S
Thr T
Trp W
Tyr Y
Val V
Jeanne Boyarsky
author & internet detective
Marshal

Joined: May 26, 2003
Posts: 30764
    

Jay,
group(0) returns everything the whole pattern matched, not just one part of it. Try putting the part of your regex you care about in parentheses and using group(1).
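
As a rough illustration of the difference between the two group numbers (the input string, the pattern, and the class name below are made up for the example and are not taken from Jay's code):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // Hypothetical pattern: the literal text "seq=" followed by one captured run of capitals.
    private static final Pattern P = Pattern.compile("seq=([A-Z]+)");

    // Returns the text matched by the given group, or null if the pattern is not found.
    static String group(String input, int groupNumber) {
        Matcher m = P.matcher(input);
        return m.find() ? m.group(groupNumber) : null;
    }

    public static void main(String[] args) {
        String input = "id=7 seq=GAV end";
        System.out.println(group(input, 0)); // everything the pattern matched: seq=GAV
        System.out.println(group(input, 1)); // only the parenthesised part: GAV
    }
}
```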


jay vas
Ranch Hand

Joined: Aug 30, 2005
Posts: 407
Well, I've gotten closer, but for some reason
EEEs matches. Any ideas?

Henry Wong
author
Sheriff

Joined: Sep 28, 2004
Posts: 18896
    



This pattern makes no sense... what is it that you are trying to do?

Henry


jay vas
Ranch Hand

Joined: Aug 30, 2005
Posts: 407
"([G|A|V|L|[Lys]{3}|M|F|W|P|S|T|[Thr]{3}|C|Y|[Trp]{3}|N|-|Q|D|E|K|R|H|X]){3,9}?";


The pattern is meant to

match a string that
1) has a length of 3 through 9,
where
2) every substring in the string is one of
G, A, V, L, Lys, M, F, W, P, S, T, Thr, C, Y, Trp, N, -, Q, D, E, K, R, H, or X.

So

G-A-V-L-X-L matches
Lys-L-V-G-A-Trp-X-Trp matches

but

Lys-O-Lys-X wouldn't match (since O is not a valid amino acid).
Also,
A-L-s-L-s-X wouldn't match either (s isn't an amino acid, although S is).
Henry Wong
author
Sheriff

Joined: Sep 28, 2004
Posts: 18896
    
Sorry, but the pattern that you have doesn't do what you described. In fact, I am not even sure if some of the stuff in the pattern is even valid.
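
One likely reason, sketched here with a cut-down pattern rather than the full one: square brackets define a character class, so inside them | is an ordinary character and Lys stands for the three separate characters L, y and s. That would explain how a lone s can slip through. Alternation between whole codes needs a parenthesised group instead. The class and field names below are hypothetical.

```java
import java.util.regex.Pattern;

class BracketsVersusGroup {
    // Character class: a set of the single characters A, l, a, |, L, y, s.
    static final Pattern AS_CLASS = Pattern.compile("[Ala|Lys]+");
    // Non-capturing group with alternation: the whole code Ala or the whole code Lys.
    static final Pattern AS_GROUP = Pattern.compile("(?:Ala|Lys)+");

    public static void main(String[] args) {
        for (String s : new String[] {"AlaLys", "ssss", "sAy|l"}) {
            System.out.println(s
                + " class=" + AS_CLASS.matcher(s).matches()
                + " group=" + AS_GROUP.matcher(s).matches());
        }
        // AlaLys class=true group=true
        // ssss class=true group=false
        // sAy|l class=true group=false
    }
}
```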

Assuming that the "-" is an optional separator, and not part of the sequence, this is probably closer to what you want...
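
The pattern posted at this point is not preserved in this copy of the thread. The following is only a sketch of one pattern that fits the description, under several assumptions: the three-letter codes come from the table in the first post, the one-letter codes come from that table plus G, Q, E and X from the attempted patterns, a sequence has 3 to 9 residues, and word boundaries are added so that a trailing lowercase letter (as in EEEs) rejects the word. The class and method names are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AminoAcidFinder {
    // One residue: a three-letter code, or else a single one-letter code (assumed code set).
    private static final String RESIDUE =
        "(?:Ala|Arg|Asn|Asp|Cys|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val"
        + "|[ARNDCHILKMFPSTWYVGQEX])";

    // 3 to 9 residues, each optionally followed by "-", between word boundaries.
    static final Pattern SEQUENCE =
        Pattern.compile("\\b(?:" + RESIDUE + "-?){3,9}\\b");

    // Collects every substring of the text that looks like an amino acid sequence.
    static List<String> findSequences(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = SEQUENCE.matcher(text);
        while (m.find()) {
            found.add(m.group()); // group() is the same as group(0): the whole match
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(
            findSequences("Ala-Cys-Ala, A-C-A, and ACA but not EEEs or Lys-O-Lys-X"));
        // [Ala-Cys-Ala, A-C-A, ACA]
    }
}
```

Note that an ordinary all-caps word built from these letters, such as CAT, would also be reported, so a real filter would probably need an extra check.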



Henry
jay vas
Ranch Hand

Joined: Aug 30, 2005
Posts: 407
Thanks! I'll try it and tell you the result. BTW, what does the ? after the - mean?
Rob Spoor
Sheriff

Joined: Oct 27, 2005
Posts: 19720
    

It means the - is optional.
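
A tiny sketch of that behaviour, using a made-up two-letter pattern rather than the one from the thread: ? after a character makes it match zero or one time.

```java
import java.util.regex.Pattern;

class OptionalHyphen {
    // "-?" means zero or one hyphen between the two letters.
    static final Pattern P = Pattern.compile("A-?C");

    public static void main(String[] args) {
        System.out.println(P.matcher("AC").matches());   // true: no hyphen
        System.out.println(P.matcher("A-C").matches());  // true: one hyphen
        System.out.println(P.matcher("A--C").matches()); // false: two hyphens
    }
}
```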

