
regex question

 
joe nesbitt
Greenhorn
Posts: 17
Hi all,

I have a file that contains sentences separated by CRLF.
Also, in each sentence the words are separated by |, and I need to replace any CRLF that occurs inside a word with a space (""). How do I do this using regex?

File content example:

aaa|bbbb|cccCRLF
zzz|yyyCRLFxxx|nnCRLF

I need to replace CRLF with "" in the second line only (I need to ignore the CRLFs that are at the end of each sentence).

Any help is highly appreciated.

Thanks in advance.
 
Henry Wong
author
Marshal
Posts: 20907

I am assuming that when you say "CRLF", you actually mean the literal string "CRLF", and not the more common use of the term to refer to a carriage return/line feed sequence.

BTW, what have you tried so far?

Henry
 
joe nesbitt
Greenhorn
Posts: 17
When I referred to CRLF, I meant \r\n (the sequence produced by the Return key).

I tried stringxxx.replaceAll("\\r+.", "\\X000d ") for \r, but I am not sure how to replace only a particular \r. I tried this, but in vain:

stringxxx.replaceAll("|*\\r+.*|", "\\X000d ")


Any help is appreciated.

Thanks in advance.
 
Lee Kian Giap
Ranch Hand
Posts: 213
You might try this

System.getProperty("line.separator")

instead of

\r\n
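
A minimal sketch of this suggestion; the class and method names are hypothetical. Pattern.quote keeps the separator from being read as regex syntax. Keep in mind that line.separator is the separator of the platform running the code, which is \r\n only on Windows, so a file produced elsewhere may still need an explicit "\r\n".

```java
import java.util.regex.Pattern;

class LineSeparatorDemo {
    // Replaces every occurrence of the platform line separator with one space.
    static String separatorsToSpaces(String text) {
        String separator = System.getProperty("line.separator");
        // Pattern.quote turns the separator into a literal pattern.
        return text.replaceAll(Pattern.quote(separator), " ");
    }

    public static void main(String[] args) {
        String sep = System.getProperty("line.separator");
        // Build the sample with the platform separator so the demo
        // behaves the same on every operating system.
        System.out.println(separatorsToSpaces("zzz|yyy" + sep + "xxx|nn"));
    }
}
```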
 
James Sabre
Ranch Hand
Posts: 781
joe nesbitt wrote: Hi all,

I have a file that contains sentences separated by CRLF.
Also, in each sentence the words are separated by |, and I need to replace any CRLF that occurs inside a word with a space (""). How do I do this using regex?

File content example:

aaa|bbbb|cccCRLF
zzz|yyyCRLFxxx|nnCRLF

I need to replace CRLF with "" in the second line only (I need to ignore the CRLFs that are at the end of each sentence).

Any help is highly appreciated.

Thanks in advance.



Am I missing something? I'm not sure this makes sense! You have shown two lines in your example but you have not said what constitutes a line. Since you are trying to replace \r\n you can't be using \r\n as a line separator. So what separates the lines and sentences in your file?

I don't see regex coming into a solution for this problem. If you just want to replace ALL \r\n in a file with a single space, then read and write the file a char at a time (BufferedReader and BufferedWriter make this efficient), look for \r\n, and output a space whenever you find the \r\n pair.
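
A minimal sketch of that char-at-a-time approach; the class and method names are hypothetical. It reads from any Reader and writes to any Writer, so for real files you would pass a BufferedReader and BufferedWriter (for example from Files.newBufferedReader and Files.newBufferedWriter). A lone \r that is not followed by \n is copied through unchanged, which is an assumption about what you want.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

class CrLfToSpace {
    // Copies in to out one char at a time, writing a space for each \r\n pair.
    static void replaceCrLf(Reader in, Writer out) throws IOException {
        int c = in.read();
        while (c != -1) {
            if (c == '\r') {
                int next = in.read();
                if (next == '\n') {
                    out.write(' ');   // found the \r\n pair
                    c = in.read();
                } else {
                    out.write('\r');  // lone \r: keep it
                    c = next;         // re-examine the char after it
                }
            } else {
                out.write(c);
                c = in.read();
            }
        }
    }

    // Convenience wrapper for in-memory strings.
    static String convert(String text) {
        StringWriter out = new StringWriter();
        try {
            replaceCrLf(new StringReader(text), out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(convert("zzz|yyy\r\nxxx|nn\r\n"));
    }
}
```

Note that this replaces every \r\n, including the ones that end a sentence.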
 
Rob Spoor
Sheriff
Posts: 20496
joe nesbitt wrote: I tried stringxxx.replaceAll("\\r+.", "\\X000d ") for \r, but I am not sure how to replace only a particular \r. I tried this, but in vain:

stringxxx.replaceAll("|*\\r+.*|", "\\X000d ")

String.replaceAll uses regular expressions, and | has a special meaning in regular expressions (alternation), so it must be escaped to match a literal |. Also, both your attempts would remove the . (any character) / .* (all characters) as well; that's not what you want, is it?

Check out the Javadoc of java.util.regex.Pattern and check for "positive lookahead" and "positive lookbehind".
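
A small illustration of those two points; the class and method names are hypothetical, and the hard-coded neighbours "yyy" and "xxx" are only there to show the mechanics, not a general solution. A lookbehind (?<=...) and a lookahead (?=...) check the surrounding text without making it part of the match, so only the \r\n itself is replaced.

```java
class LookaroundDemo {
    // Matches a literal |; the backslash stops | from meaning "or".
    static String pipesToCommas(String text) {
        return text.replaceAll("\\|", ",");
    }

    // Replaces only the \r\n that sits between "yyy" and "xxx".
    // The lookarounds are zero-width, so "yyy" and "xxx" are kept.
    static String joinAtKnownNeighbours(String text) {
        return text.replaceAll("(?<=yyy)\r\n(?=xxx)", " ");
    }

    public static void main(String[] args) {
        String text = "aaa|bbbb|ccc\r\nzzz|yyy\r\nxxx|nn\r\n";
        System.out.println(pipesToCommas("aaa|bbbb|ccc"));
        System.out.print(joinAtKnownNeighbours(text));
    }
}
```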

However, I don't think using simple regular expressions will help you out here. How would your example be different from the following if you'd use only regular expressions: or even All represent the same characters. Is it really the number of | characters? If so then using a simple loop would probably be better. In pseudo code:
You'll probably want a StringBuilder to store the modified file contents.
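
A minimal sketch of one such loop, not the pseudo code from this post. It assumes every sentence has the same known number of | characters (two in the example) and that a stray \r\n never occurs in the last word; under those assumptions, a \r\n seen before the expected number of | characters must be inside a word. The class, method, and parameter names are hypothetical.

```java
class PipeCounter {
    // Replaces \r\n with a space when it appears before the sentence has
    // reached separatorsPerRecord | characters; keeps it otherwise.
    static String joinBrokenRecords(String text, int separatorsPerRecord) {
        StringBuilder result = new StringBuilder(text.length());
        int pipes = 0; // | characters seen in the current sentence
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean crlf = c == '\r'
                    && i + 1 < text.length()
                    && text.charAt(i + 1) == '\n';
            if (crlf) {
                if (pipes < separatorsPerRecord) {
                    result.append(' ');     // \r\n inside a word
                } else {
                    result.append("\r\n");  // real end of sentence
                    pipes = 0;
                }
                i++; // skip the \n of the pair
            } else {
                if (c == '|') {
                    pipes++;
                }
                result.append(c);
            }
        }
        return result.toString();
    }

    public static void main(String[] args) {
        String text = "aaa|bbbb|ccc\r\nzzz|yyy\r\nxxx|nn\r\n";
        System.out.print(joinBrokenRecords(text, 2));
    }
}
```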
 