Splitting a block of text into individual lines

Michael DeChirico
Greenhorn

Joined: Jul 31, 2008
Posts: 16
I need to split blocks of text into their component text lines. Each text line has 0xff as its trailing character.

The blocks of text are variable in size and the component text lines are variable in length.

What's the best way to break the blocks of text apart?

My original thought was to use StringTokenizer, but I gather that's not a good thing to do.

Thanks.

Michael
Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 38412
Is this already in a text file? Use something which reads text files, eg java.util.Scanner.
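
A minimal sketch of the Scanner approach, assuming the records are separated by the character \u00ff. The class and method names are made up for illustration, and the source here is an in-memory String; for a real file or socket you would also need to pick a charset (for example ISO-8859-1) that maps the byte 0xff to \u00ff, which is an assumption about the data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class ScannerRecords {
    // Hypothetical helper: returns the tokens found between \u00ff delimiters.
    static List<String> readRecords(String source) {
        List<String> records = new ArrayList<>();
        try (Scanner scanner = new Scanner(source)) {
            // The delimiter is a regex; a single literal character is enough here.
            scanner.useDelimiter("\u00ff");
            while (scanner.hasNext()) {
                records.add(scanner.next());
            }
        }
        return records;
    }

    public static void main(String[] args) {
        for (String record : readRecords("first line\u00ffsecond line\u00ff")) {
            System.out.println(record);
        }
    }
}
```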

Is it already a String, and are you sure there aren't any 0x00ff characters anywhere else? Since \u00ff is ÿ (y with diaeresis), which rarely turns up in ordinary report text, that is probably the case.
Use the split method of the String class and pass something like "\u00ff" as its regex delimiter. That returns a String[] array.
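
As a rough illustration of that suggestion (the class name, method name and sample text are invented here), splitting on "\u00ff" could look like this:

```java
class SplitRecords {
    // Splits one block into its text lines; the \u00ff delimiters are not kept.
    static String[] splitRecords(String block) {
        return block.split("\u00ff");
    }

    public static void main(String[] args) {
        String block = "alpha\u00ffbeta\u00ffgamma\u00ff";
        String[] result = splitRecords(block);
        System.out.println("Result Length: " + result.length);
        for (String line : result) {
            System.out.println("Result " + line);
        }
    }
}
```

Because every line ends with the delimiter, the block ends with one too; split discards the empty String that would otherwise follow it.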

See whether that helps
Michael DeChirico
Greenhorn

Joined: Jul 31, 2008
Posts: 16
Yes. This code goes in the client app on the desktop, which communicates with a server on an IBM mainframe. The data being parsed is one or more report files generated on the mainframe, so the records contain only printable characters. The server concatenates the report records into blocks of text, delimiting the records with 0xff bytes. Of course, the server could use any non-printable character as a delimiter.

Michael
Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 38412
String#split() sounds hopeful then. Tell us how you get on, please.
Michael DeChirico
Greenhorn

Joined: Jul 31, 2008
Posts: 16
I implemented the split, but I am missing something. In each block of text I receive, the individual text lines are delimited by \u00FF. After each block is received, the statement:

System.out.println("Result Length: " + result.length);

always displays a value of 30. What does this value represent? Does it mean that there are 30 substrings in the result array? The last entry in the last text block is 0x39FF0A, but it is never recognized. I even tried placing the sequence in the text block twice, but to no avail.

The last couple of entries in the buffer were:

Result SYSPRINT OT11.LISTINGS.OTUF0401 WORK41
Result
Result 7316K allocated to Buffer Pool, 1933K would be required for this to be an In-Storage Assembly

After these, the buffer with the termination sequence 0x39FF0A was sent, but we never saw it.

From the server's perspective, we completed the sequence:

OTUF0401: WRITE SUCCESS RETCODE= +0004021
OTUF0401: WRITE SUCCESS RETCODE= +0004021
OTUF0401: WRITE SUCCESS RETCODE= +0004021
OTUF0401: WRITE SUCCESS RETCODE= +0000004
OTUF0401: CLOSE SUCCESS RETCODE= +0000000
OTUF0401: TERMAPI SUCCESS RETCODE= +0000000
OTUF0401: ENDED

The retcode value shows the number of bytes written.

Anyone have any thoughts?


Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 38412
Your "result" variable is an array; if you look in the Java Language Specification (look at §10.7, Array Members) you find that an array has a public final int field called length, and that is what you are using.

If result.length is 30, that means there are 30 separate Strings in the String[] called result.
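
A small sketch of what length counts, using made-up sample blocks. It also shows a case that may matter for the terminating 0x39FF0A sequence: whatever follows the final delimiter becomes one more element, and only trailing empty Strings are dropped unless a negative limit is passed.

```java
class SplitLengthDemo {
    // Number of elements produced by the one-argument split.
    static int count(String block) {
        return block.split("\u00ff").length;
    }

    // A negative limit keeps trailing empty Strings as well.
    static int countKeepingTrailing(String block) {
        return block.split("\u00ff", -1).length;
    }

    public static void main(String[] args) {
        System.out.println(count("A\u00ffB\u00ff"));                // 2
        System.out.println(countKeepingTrailing("A\u00ffB\u00ff")); // 3
        // '9', delimiter, newline: the newline after the delimiter is its own element.
        System.out.println(count("A\u00ff9\u00ff\n"));              // 3
    }
}
```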
Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 38412
Oh, you are using Pattern#split rather than String#split. I am pretty sure String#split would have taken the \u00ff out, but you will have to check in the API about Pattern#split; I don't know what that does.
Michael DeChirico
Greenhorn

Joined: Jul 31, 2008
Posts: 16
So what you're saying is that Pattern#split may not remove the \u00ff from the result array entries?
Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 38412
No, I was saying I don't know. If you read about Pattern#split, it would appear to take its pattern out.
I think you are using \u correctly as an escape, but I can never remember how many escape sequences there are for patterns.
Please check whether you need the square brackets [] in your pattern; I would have thought it would work without the [] since you are not grouping several characters.
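
One way to check both points is a quick experiment like the sketch below. The class name and sample text are invented, and the bracketed pattern is only a guess at what the original code looked like, since that code was not posted.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class PatternSplitDemo {
    // The same delimiter written as a plain character and as a character class.
    static final Pattern PLAIN = Pattern.compile("\u00ff");
    static final Pattern BRACKETED = Pattern.compile("[\u00ff]");

    static String[] splitPlain(String block) {
        return PLAIN.split(block);
    }

    static String[] splitBracketed(String block) {
        return BRACKETED.split(block);
    }

    public static void main(String[] args) {
        String block = "one\u00fftwo\u00ffthree\u00ff";
        System.out.println(Arrays.toString(splitPlain(block)));
        System.out.println(Arrays.toString(splitBracketed(block)));
    }
}
```

Both calls should print the same three elements with no \u00ff left in them, which suggests the brackets are harmless but unnecessary for a single delimiter character.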
Michael DeChirico
Greenhorn

Joined: Jul 31, 2008
Posts: 16
Another dumb mistake on my part. After the server sent the last block to the client, it promptly closed the connection before the client had a chance to process that block, and instead this handler:

catch (Exception e) {
    System.out.println("Sorry, an error has occurred. Connection lost.");
    System.out.println(e.toString());
    System.exit(1);
} // END CATCH

got driven, so we got a bogus error and did not process the last block. What I have had to do is, after processing each block from the server, send a response back requesting the next block.
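
For comparison, here is a sketch of a different approach from the request/response fix described above: read until the stream reports end-of-input and flush whatever is still buffered, so that an orderly close is not treated as an error. Everything here is hypothetical (class name, method name, the ISO-8859-1 charset, and a ByteArrayInputStream standing in for the socket), and an abrupt reset by the server could still raise an IOException.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class RecordReader {
    static final int DELIMITER = 0xFF; // record terminator used by the server

    // Reads bytes until end of stream, cutting a record at every 0xFF byte.
    static List<String> readAll(InputStream in) throws IOException {
        List<String> records = new ArrayList<>();
        ByteArrayOutputStream current = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1) { // -1 means the peer closed in an orderly way
            if (b == DELIMITER) {
                records.add(toText(current));
                current.reset();
            } else {
                current.write(b);
            }
        }
        if (current.size() > 0) { // bytes after the last delimiter are still data
            records.add(toText(current));
        }
        return records;
    }

    // Assumption: the bytes are single-byte text, decoded here as ISO-8859-1.
    private static String toText(ByteArrayOutputStream buffer) {
        return new String(buffer.toByteArray(), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) throws IOException {
        byte[] wire = {'A', 'B', (byte) 0xFF, '9', (byte) 0xFF, '\n'};
        System.out.println(readAll(new ByteArrayInputStream(wire)).size());
    }
}
```

For a real socket, wrapping the stream in a BufferedInputStream would avoid reading one byte per call.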
 