regex problem

 
Greenhorn
Hello

Why is it that the following regex

(.*[^\da-zA-Z]+rebase[^\da-zA-Z]+.*)|(^rebase[^\da-zA-Z]+.*)|(.*[^\da-zA-Z]+rebase$)|(^rebase$)

returns false from matches() for this String

<HTML><HEAD><TITLE>REBASE Sequence Data</TITLE></HEAD> <!--========================================================--> <!--== Dana Macelis for Dr. R.J. Roberts ======--> <!--========================================================--> <BODY BGCOLOR=#FFFFFF TEXT=#000000 LINK=#000080 ALINK=#5577CC VLINK=#5577CC> <CENTER> <TABLE WIDTH=100%><TR ALIGN=CENTER VALIGN=CENTER> <TD><A HREF=/rebase/index.html><IMG SRC=/rebase/rebhomeB4.gif BORDER=0 ALT=Home></A></TD> <TD> <FONT SIZE=5><b>REBASE <font color=#5599BB>Sequence Data</font></b></FONT> 06/22/2011<br> <font size=2 color=gray>DNA and Protein sequences are shown in <font color=#77AADD><b>FASTA</b></font> Format.</font> <br><table cellpadding=6><tr> <td bgcolor=#99FF99><a href=javascript:window.close();><font size=2 color=#000000>Close Window</font></a></td><td bgcolor=#FFFF99><a href=javascript:history.go(-1);><font size=2 color=#000000>Back</font></a></td> <td bgcolor=lightyellow><a href=javascript:history.go();><font size=2 color=#000000>Refresh</font></a></td> </tr></table> </TD> <TD><A HREF=/rebase/rebase.seqs.html><IMG SRC=/rebase/rebseqs.gif BORDER=0 ALT=Seqs></A></TD> </TR></TABLE> <br><br> <table><tr><td align=left><b><FONT size=3><xmp> >M.EcoCB9615DamP GATC 278 aa MKKNRAFLKW AGGKYPLLDD IKRHLPKGEC LVEPFVGAGS VFLNTDFSRY ILADINSDLI SLYNIVKMRT DEYVQAAREL FVPETNCAEV YYQFREEFNK SQDPFRRAVL FLYLNRYGYN GLCRYNLRGE FNVPFGRYKK PYFPEAELYH FAEKAQNAFF YCESYADSMA RADDASVVYC DPPYAPLSAT ANFTAYHTNS FTLEQQAHLA EIAEGLVERH IPVLISNHDT MLTREWYQRA KLHVVKVRRS ISSNGGTRKK VDELLALYKP GVVSPAKK </xmp></font></b></td></tr></table><hr> </CENTER> </BODY> </HTML>

when there is "<A HREF=/rebase/rebase.seqs.html>" in it?

Thanks in advance
 
Saloon Keeper
Yulia, whatever it is you're trying to do, you're probably better off writing a few lines of code that do it. Don't use a horrible regular expression like that.

If you're hellbent on using it anyway, show us an SSCCE (a short, self-contained, correct example) that demonstrates the issue.
 
Bartender
It works for me although it does appear to be overly complex. I may be misunderstanding the intent of your regex but can't you just use:

Can you show the code you are using that returned no matches?
 
Yulia Dubinina
Greenhorn
I'm trying to find the substring "rebase" anywhere in the source of an HTML page. The substring should be delimited on both sides by anything except a digit or a letter. In this specific example the page is http://rebase.neb.com/cgi-bin/seqsget?M.EcoCB9615DamP.CP001846.pro

It did work for me at http://www.regexplanet.com/advanced/java/index.html when I copy-pasted the page source from this forum. However, when I copy-paste the string from variables in the debugger, it doesn't work for some reason. Are there any sneaky hidden characters that make it behave like that?

Here is the code example:

 
Tony Docherty
Bartender

I'm trying to find the substring "rebase" anywhere in the source of an HTML page. The substring should be delimited on both sides by anything except a digit or a letter.


Ok, this should work:

However, when I copy-paste the string from variables in the debugger, it doesn't work for some reason. Are there any sneaky hidden characters that make it behave like that?


Maybe, but without printing out the bytes in the String it's hard to say what non-displaying characters are in it.
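A minimal sketch of how the non-displaying characters could be listed, assuming the page source is already in a String. The class name `HiddenCharDump`, the method `describeHidden`, and the sample input are made up for illustration and are not from the original poster's code. It reports every code point outside printable ASCII, which would reveal things like carriage returns, line feeds, tabs, or a byte-order mark.

```java
public class HiddenCharDump {
    // Lists each code point outside printable ASCII (0x20..0x7E)
    // together with its index in the String.
    public static String describeHidden(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (cp < 0x20 || cp > 0x7E) {
                sb.append(String.format("index %d: U+%04X\n", i, cp));
            }
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical sample: two lines of HTML separated by CR LF.
        String html = "<TITLE>REBASE</TITLE>\r\n<A HREF=/rebase/index.html>";
        System.out.print(describeHidden(html));
    }
}
```

If the output lists U+000A or U+000D, the String spans several lines, which matters for the regex because `.` does not match line terminators unless `Pattern.DOTALL` is set.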
 
Yulia Dubinina
Greenhorn
I saved the host and html Strings to files and tried calling matches() on the Strings read back from those files. I looked inside the files and it looks like it should work; I don't understand why it returns false. I tried to attach the txt files to this post, but the forum doesn't allow me to do so. I renamed the files to .jpg, but they are plain text files.
teststring.jpg
testhost.jpg
 
Yulia Dubinina
Greenhorn
It looks like the forum doesn't allow attached files to be downloaded. Is it possible to attach a txt file to a post here?
 
Yulia Dubinina
Greenhorn
I uploaded the files here: http://www.filehostfree.com/?d=51A095D21
 
Tony Docherty
Bartender
The String.matches() method attempts to match the entire String against the regex.
If you want to find a match within the String, use the Pattern class to get a Matcher and call its find() method.
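The difference can be sketched as follows. This is only an illustration: the class `RebaseFinder`, the method `containsRebase`, the lookaround-based pattern, and the shortened multi-line sample are assumptions, not code posted in this thread. The sample assumes the real page source contains line breaks, which the thread does not confirm. If it does, that would be one possible reason matches() fails even with `.*` on both sides, because `.` does not match line terminators by default.

```java
import java.util.regex.Pattern;

public class RebaseFinder {
    // "rebase" with no ASCII letter or digit directly before or after it.
    // The lookarounds also cover the start and end of the input.
    private static final Pattern WORD =
            Pattern.compile("(?<![\\da-zA-Z])rebase(?![\\da-zA-Z])");

    // find() searches for the pattern anywhere in the text.
    public static boolean containsRebase(String text) {
        return WORD.matcher(text).find();
    }

    public static void main(String[] args) {
        // Hypothetical, shortened stand-in for the page source, with line breaks.
        String html = "<HTML>\n<A HREF=/rebase/rebase.seqs.html>\n</HTML>";

        // matches() must cover the whole input, and '.' stops at '\n'.
        System.out.println(html.matches(".*[^\\da-zA-Z]+rebase[^\\da-zA-Z]+.*")); // false
        System.out.println(containsRebase(html));       // true
        System.out.println(containsRebase("prebase1")); // false
    }
}
```

With find() the pattern no longer needs the leading and trailing `.*`, so line breaks elsewhere in the page stop mattering. If matches() has to be kept, compiling the pattern with `Pattern.DOTALL` lets `.` cross line terminators.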
 