Transform an unformatted paragraph to a multiple-line length-limited paragraph

 
Ron McLeod
Sheriff
I want to take an unformatted paragraph of text and transform it into multiple lines (breaking on word boundaries only), with the content of each line not exceeding a specified maximum length. I also want to be able to optionally specify a string to prefix each line (I'm calling this indentation).

I am solving this using a collector which consumes a stream of words and produces a formatted string.  Internally the collector uses a list of StringBuilder objects which are created dynamically during the processing.
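
A minimal sketch of a collector along these lines. This is not the code from the original post, which isn't reproduced here. The names `WrapCollectorSketch` and `wrapping` are hypothetical, and counting the indent toward the maximum line length is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.Stream;

class WrapCollectorSketch {

    /** Collects words into lines of at most maxLength characters (indent included). */
    static Collector<String, List<StringBuilder>, String> wrapping(int maxLength, String indent) {
        return Collector.of(
            ArrayList::new,
            (lines, word) -> {
                StringBuilder last = lines.isEmpty() ? null : lines.get(lines.size() - 1);
                if (last == null || last.length() + 1 + word.length() > maxLength) {
                    // Start a new line; an over-long word still gets a line of its own.
                    lines.add(new StringBuilder(indent).append(word));
                } else {
                    last.append(' ').append(word);
                }
            },
            (left, right) -> {
                // Partial results cannot be merged without re-wrapping, so this
                // sketch supports sequential streams only.
                throw new UnsupportedOperationException("sequential streams only");
            },
            lines -> String.join("\n", lines));
    }

    public static void main(String[] args) {
        String text = "The quick brown fox jumps over the lazy dog";
        System.out.println(Stream.of(text.split("\\s+")).collect(wrapping(16, "> ")));
    }
}
```

With a limit of 16 and the prefix `"> "`, the sample sentence comes out as four lines: `> The quick`, `> brown fox`, `> jumps over the`, `> lazy dog`.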

Functionally it is working as expected, but I would like some feedback on the design/implementation. For example, would you solve this problem differently? Would you use StringBuilders or something more primitive?

Here's what I did: [code, test, and output listings not preserved]
 
Paul Clapham
Sheriff
This sounds like something I did a long time ago. I can't track it down now, maybe it was for work but I don't think so. Anyway I used (or thought of using) a java.text.BreakIterator which sounds a lot like your requirement.
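
For reference, a rough sketch of how `java.text.BreakIterator` could be applied to this problem. It is not the poster's code. The class and method names are hypothetical, and it assumes that line length is measured in characters (that is, a fixed-width rendering).

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class BreakIteratorWrapSketch {

    static List<String> wrap(String text, int maxLength) {
        BreakIterator breaks = BreakIterator.getLineInstance(Locale.ENGLISH);
        breaks.setText(text);
        List<String> lines = new ArrayList<>();
        StringBuilder line = new StringBuilder();
        int start = breaks.first();
        // Each [start, end) range is a chunk after which a line break is allowed.
        for (int end = breaks.next(); end != BreakIterator.DONE; start = end, end = breaks.next()) {
            String segment = text.substring(start, end);
            // Ignore the segment's surrounding whitespace when checking the fit.
            if (line.length() > 0 && line.length() + segment.trim().length() > maxLength) {
                lines.add(line.toString().trim());
                line.setLength(0);
            }
            line.append(segment);
        }
        if (line.length() > 0) {
            lines.add(line.toString().trim()); // flush the final partial line
        }
        return lines;
    }

    public static void main(String[] args) {
        wrap("The quick brown fox jumps over the lazy dog", 10).forEach(System.out::println);
    }
}
```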
 
Bartender
I came up with a shorter form using regex:
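
The code from this post is not reproduced here. As an illustration only, one way a regex-based version might look is sketched below. The pattern, the class name, and the method name are assumptions, not the poster's solution. It assumes a single paragraph with no embedded line breaks, and it leaves words longer than the limit unbroken.

```java
class RegexWrapSketch {

    static String wrap(String text, int maxLength) {
        // Greedily take up to maxLength characters that are followed by
        // whitespace or the end of the input, then replace that whitespace
        // with a line break.
        String pattern = "(.{1," + maxLength + "})(?:\\s+|$)";
        return text.trim().replaceAll(pattern, "$1\n").trim();
    }

    public static void main(String[] args) {
        System.out.println(wrap("The quick brown fox jumps over the lazy dog", 10));
    }
}
```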
 
Tim Holloway
Saloon Keeper
In other words, you're looking to implement a bog-standard old-time text formatting utility.

You've got 2 possible strategies here. One is to use a scanner to break the text down into tokens and keep appending tokens into a StringBuilder until you reach the limit, then output the StringBuilder contents as a String. Reset the StringBuilder to empty and start appending from there. There really isn't anything more primitive than StringBuilder in Java.

The other is faster, but maybe a bit more brutal - use a regex to suck up as large a set of characters from the source string(s) as will fill out the StringBuilder buffer, rinse and repeat until you run out of sources.

And, of course, either way, don't forget to flush out the buffer at the end! (I always forget).

The regex way is probably more performant and more likely to preserve non-semantic constructs such as multiple adjacent spaces (depending on how your scanner works). The scanner, on the other hand, can be more intelligent about how it handles tokens and, unlike regexes, is less likely to injure your sanity. So it's a matter of which suits you better.
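
A sketch of the first strategy (scanner plus StringBuilder), including the final flush that is easy to forget. The class and method names are hypothetical. Tokens are whatever `java.util.Scanner` yields with its default whitespace delimiter, so runs of spaces collapse to one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class ScannerWrapSketch {

    static List<String> wrap(String text, int maxLength) {
        List<String> lines = new ArrayList<>();
        StringBuilder buffer = new StringBuilder();
        try (Scanner scanner = new Scanner(text)) {
            while (scanner.hasNext()) {
                String token = scanner.next();
                // Would the token (plus a separating space) overflow the line?
                if (buffer.length() > 0 && buffer.length() + 1 + token.length() > maxLength) {
                    lines.add(buffer.toString());
                    buffer.setLength(0); // reset and reuse the same builder
                }
                if (buffer.length() > 0) {
                    buffer.append(' ');
                }
                buffer.append(token);
            }
        }
        if (buffer.length() > 0) {
            lines.add(buffer.toString()); // flush whatever is left at the end
        }
        return lines;
    }

    public static void main(String[] args) {
        wrap("The quick brown fox jumps over the lazy dog", 10).forEach(System.out::println);
    }
}
```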
 
Ron McLeod
Sheriff

Tim Holloway wrote: You've got 2 possible strategies here. One is to use a scanner to break the text down into tokens and keep appending tokens into a StringBuilder until you reach the limit, then output the StringBuilder contents as a String. Reset the StringBuilder to empty and start appending from there. There really isn't anything more primitive than StringBuilder in Java.


That is basically what I did, but as a stream collector.
 
Knute Snortum
Sheriff
Just a thought (and I haven't tried it): I don't think any of these solutions handle the case where a "word" is longer than the maximum characters per line.
 
Tim Holloway
Saloon Keeper

Knute Snortum wrote: Just a thought (and I haven't tried it): I don't think any of these solutions handle the case where a "word" is longer than the maximum characters per line.



I'll take "Things that will 'never' happen for $20", Alex.

Yep. Something that I usually do pay attention to when designing algorithms of that sort. Much to Management's annoyance.

The accepted solution is to simply break the word. You can do that crudely, by just snapping the two (or more!) segments apart, or you can back up one character and append a word-continuation hyphen on the affected line.
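
Both variants can be sketched in a few lines. The helper name `breakWord` and its `hyphenate` flag are hypothetical. The split is by `char` index, so it ignores syllables and could cut through a surrogate pair.

```java
import java.util.ArrayList;
import java.util.List;

class WordBreakSketch {

    /** Splits word into pieces no longer than maxLength, optionally ending all but the last with '-'. */
    static List<String> breakWord(String word, int maxLength, boolean hyphenate) {
        if (maxLength < 2) {
            throw new IllegalArgumentException("maxLength must be at least 2");
        }
        List<String> pieces = new ArrayList<>();
        int pos = 0;
        while (word.length() - pos > maxLength) {
            // When hyphenating, back up one character to leave room for the hyphen.
            int take = hyphenate ? maxLength - 1 : maxLength;
            pieces.add(word.substring(pos, pos + take) + (hyphenate ? "-" : ""));
            pos += take;
        }
        pieces.add(word.substring(pos)); // the remainder fits as-is
        return pieces;
    }

    public static void main(String[] args) {
        System.out.println(breakWord("abcdefghij", 4, false)); // [abcd, efgh, ij]
        System.out.println(breakWord("abcdefghij", 4, true));  // [abc-, def-, ghij]
    }
}
```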

OR you can get really fancy and try and backtrack to a syllable boundary. Although assuming your output lines are a civilized length (60 characters or so), the English language doesn't have a lot of potential offenders and therefore syllable-backtracking isn't likely to be worth it. Not so sure about German, though. Different syllabification rules there, anyway, I'd expect. And stuff like code samples is quite another matter.
 
Paul Clapham
Sheriff

Paul Clapham wrote:This sounds like something I did a long time ago. I can't track it down now, maybe it was for work but I don't think so. Anyway I used (or thought of using) a java.text.BreakIterator which sounds a lot like your requirement.



I just remembered why I didn't end up using that class -- it was because my data was going to be rendered by fonts which might not be fixed-width fonts.
 
Knute Snortum
Sheriff

Tim Holloway wrote:Although assuming your output lines are a civilized length (60 characters or so), the English language doesn't have a lot of potential offenders and therefore syllable-backtracking isn't likely to be worth it.


Well, I was thinking of very-long-over-hyphenated-pseudo-adjectives or a list like Democrat/Republican/Independent/Green/Pacific/Socialist/Communist, any list of words that don't have spaces between them.
 
Tim Holloway
Saloon Keeper

Knute Snortum wrote:

Tim Holloway wrote:Although assuming your output lines are a civilized length (60 characters or so), the English language doesn't have a lot of potential offenders and therefore syllable-backtracking isn't likely to be worth it.


Well, I was thinking of very-long-over-hyphenated-pseudo-adjectives or a list like Democrat/Republican/Independent/Green/Pacific/Socialist/Communist, any list of words that don't have spaces between them.



As long as you know what you're getting, though, adding a word-breaker function should be fairly easy. In fact I think I've run into a text formatter or two that specifically supported plug-in word-breakers.
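
To illustrate what a plug-in word-breaker could look like, here is a small sketch. The `WordBreaker` interface, the `AFTER_SEPARATORS` breaker, and the `wrap` method are all hypothetical names invented for this example, not taken from any existing formatter. The breaker is only consulted for words that cannot fit on a line by themselves, and a piece that is still too long simply overflows.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PluggableBreakerSketch {

    /** Hypothetical plug-in point: splits an over-long word into smaller pieces. */
    @FunctionalInterface
    interface WordBreaker {
        List<String> split(String word);
    }

    /** Breaks after each '/' or '-', keeping the separator with the piece before it. */
    static final WordBreaker AFTER_SEPARATORS =
        word -> Arrays.asList(word.split("(?<=[/-])"));

    static List<String> wrap(String text, int maxLength, WordBreaker breaker) {
        List<String> lines = new ArrayList<>();
        StringBuilder line = new StringBuilder();
        for (String word : text.trim().split("\\s+")) {
            List<String> pieces =
                word.length() > maxLength ? breaker.split(word) : Arrays.asList(word);
            boolean firstPiece = true;
            for (String piece : pieces) {
                // Pieces of the same word are joined without a space.
                String glue = (line.length() == 0 || !firstPiece) ? "" : " ";
                if (line.length() > 0
                        && line.length() + glue.length() + piece.length() > maxLength) {
                    lines.add(line.toString());
                    line.setLength(0);
                    glue = "";
                }
                line.append(glue).append(piece);
                firstPiece = false;
            }
        }
        if (line.length() > 0) {
            lines.add(line.toString()); // flush the last line
        }
        return lines;
    }

    public static void main(String[] args) {
        String text = "Parties: Democrat/Republican/Independent/Green and more";
        wrap(text, 22, AFTER_SEPARATORS).forEach(System.out::println);
    }
}
```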
 