| Author |
Regex Question ...
|
jay vas
Ranch Hand
Joined: Aug 30, 2005
Posts: 407
|
|
Hi, I'm new to Java regexes. I need to build a regex that detects all words in a paragraph that look like a string of amino acids. For example, Ala-Cys-Ala, A-C-A, and ACA all represent the same possible amino acid sequence: alanine, cysteine, alanine. Is there a way to build a regex in Java that represents this? Currently I'm doing it with nested for loops. I've tried [A|Ala|V|Val|L|Lys|M|Met|W|Trp|P|S|T|Thr|C|Y|Tyr|N|Asn|-|Q|D|E|K|R|H|X]++ but it returns false positive matches. For example, GAVs is returned as group(0) by the Java matcher, even though the 's' character is not in the expression?
The codes I'm working with: Ala A, Arg R, Asn N, Asp D, Cys C, His H, Ile I, Leu L, Lys K, Met M, Phe F, Pro P, Ser S, Thr T, Trp W, Tyr Y, Val V
|
 |
Jeanne Boyarsky
internet detective
Marshal
Joined: May 26, 2003
Posts: 26496
|
|
Jay, group(0) returns the whole matching string, not just the matching portion. Try putting your reg exp in parens and using group(1).
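For reference, in java.util.regex group(0) is the text matched by the whole pattern and group(1) is the text captured by the first pair of parentheses. A minimal sketch of the difference, using a made-up pattern and input (the class and method names here are only for illustration):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // Returns {group(0), group(1)} for the first match, or null if nothing matches.
    // Assumes the regex contains at least one capturing group; otherwise
    // group(1) throws IndexOutOfBoundsException.
    static String[] firstMatchGroups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(0), m.group(1) };
    }

    public static void main(String[] args) {
        String[] g = firstMatchGroups("x([AV]+)y", "..xAVAy..");
        System.out.println("group(0) = " + g[0]); // whole match: xAVAy
        System.out.println("group(1) = " + g[1]); // first capturing group: AVA
    }
}
```

Note that neither group returns the unmatched parts of the input (the dots in this example).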
|
|
 |
jay vas
Ranch Hand
Joined: Aug 30, 2005
Posts: 407
|
|
Well, I've gotten closer, but for some reason EEEs matches... any ideas?
|
 |
Henry Wong
author
Sheriff
Joined: Sep 28, 2004
Posts: 16811
|
|
This pattern makes no sense... what is it that you are trying to do? Henry
|
|
 |
jay vas
Ranch Hand
Joined: Aug 30, 2005
Posts: 407
|
|
"([G|A|V|L|[Lys]{3}|M|F|W|P|S|T|[Thr]{3}|C|Y|[Trp]{3}|N|-|Q|D|E|K|R|H|X]){3,9}?"; The pattern is meant to match a string which 1) has a length of 3 through 9, and 2) is made up entirely of substrings from G, A, V, L, Lys, M, F, W, P, S, T, Thr, C, Y, Trp, N, -, Q, D, E, K, R, H, or X. So G-A-V-L-X-L matches and Lys-L-V-G-A-Trp-X-Trp matches, but Lys-O-Lys-X wouldn't match (since O is not a valid amino acid). A-L-s-L-s-X wouldn't match either (s isn't an amino acid, although S is).
|
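A note on why a bracketed pattern like the one above accepts strings such as EEEs or GAVs: in Java, square brackets define a character class, which matches exactly one character from a set. Inside the brackets "|" is just a literal character, a nested [Lys] is merged into the set as the single characters L, y and s, and {3} contributes the literal characters {, 3 and }. Alternation between whole tokens needs a group instead. A small sketch of the difference, using a deliberately shortened pattern (not the poster's full one):

```java
import java.util.regex.Pattern;

class CharClassDemo {
    // Character class: any of the single characters G | A V L y s, repeated 3 to 9 times.
    static final Pattern BRACKETS = Pattern.compile("[G|A|V|L|Lys]{3,9}");

    // Alternation: any of the tokens G, A, V, L or Lys, repeated 3 to 9 times.
    static final Pattern ALTERNATION = Pattern.compile("(?:G|A|V|L|Lys){3,9}");

    static boolean bracketsMatch(String s) {
        return BRACKETS.matcher(s).matches();
    }

    static boolean alternationMatch(String s) {
        return ALTERNATION.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] { "GAVs", "G|y", "GAVLys" }) {
            System.out.println(s + ": brackets=" + bracketsMatch(s)
                    + ", alternation=" + alternationMatch(s));
        }
    }
}
```

With the brackets, a lone 's' or even a '|' is accepted because each is a member of the set; with the alternation, 's' is only accepted as the tail of Lys.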
 |
Henry Wong
author
Sheriff
Joined: Sep 28, 2004
Posts: 16811
|
|
jay vas wrote: "([G|A|V|L|[Lys]{3}|M|F|W|P|S|T|[Thr]{3}|C|Y|[Trp]{3}|N|-|Q|D|E|K|R|H|X]){3,9}?"; The pattern is meant to match a string which 1) has a length of 3 through 9, and 2) is made up entirely of substrings from G, A, V, L, Lys, M, F, W, P, S, T, Thr, C, Y, Trp, N, -, Q, D, E, K, R, H, or X.
Sorry, but the pattern that you have doesn't do what you described. In fact, I am not even sure if some of the stuff in the pattern is even valid. Assuming that the "-" is an optional separator, and not part of the sequence, this is probably closer to what you want... Henry
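The pattern posted at this point did not survive in this copy of the thread. As a rough sketch under the same assumption (the "-" is an optional separator, not part of the sequence), one way to write such a pattern is an alternation of residue codes with "-?" between residues. The residue list below (three-letter codes from the table in the question, plus the one-letter codes and X) and the names RESIDUE, SEQUENCE and isSequence are assumptions for illustration, not the original answer:

```java
import java.util.regex.Pattern;

class AminoAcidPattern {
    // One residue: a three-letter code (tried first) or a one-letter code.
    // Assumption: this code list is illustrative; extend it as needed.
    static final String RESIDUE =
            "(?:Ala|Arg|Asn|Asp|Cys|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val"
            + "|[ACDEFGHIKLMNPQRSTVWXY])";

    // 3 to 9 residues: one residue, then 2 to 8 more, each optionally preceded by '-'.
    static final Pattern SEQUENCE =
            Pattern.compile(RESIDUE + "(?:-?" + RESIDUE + "){2,8}");

    // True only if the entire word is a valid sequence.
    static boolean isSequence(String word) {
        return SEQUENCE.matcher(word).matches();
    }

    public static void main(String[] args) {
        String[] words = { "Ala-Cys-Ala", "A-C-A", "ACA", "GAVs", "EEEs", "Lys-O-Lys-X" };
        for (String w : words) {
            System.out.println(w + " -> " + isSequence(w));
        }
    }
}
```

This checks one word at a time with matches(), so a paragraph would first be split into words (for example on whitespace). Writing the separator between residues, rather than after each one, also rejects a trailing hyphen.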
|
 |
jay vas
Ranch Hand
Joined: Aug 30, 2005
Posts: 407
|
|
|
Thanks! I'll try it and tell you the result. BTW, what does the ? after the - mean?
|
 |
Rob Spoor
Sheriff
Joined: Oct 27, 2005
Posts: 19232
|
|
|
It means the - is optional.
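In regex terms, ? is a quantifier meaning "zero or one" of whatever comes just before it, so -? accepts either no hyphen or exactly one. A tiny sketch with a made-up pattern (the class name is only for illustration):

```java
import java.util.regex.Pattern;

class OptionalHyphenDemo {
    // "A-?C" means: an A, then zero or one '-', then a C.
    static boolean matches(String s) {
        return Pattern.matches("A-?C", s);
    }

    public static void main(String[] args) {
        System.out.println(matches("AC"));   // true: no hyphen
        System.out.println(matches("A-C"));  // true: one hyphen
        System.out.println(matches("A--C")); // false: two hyphens
    }
}
```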
|
|