
Add & edit ArrayList elements from inside a for loop

 
Greenhorn
Posts: 13
I'm trying to break one paragraph into multiple strings (StringBuffers, actually, since I need to keep appending each string) - one for each sentence. At the start of the program, I do not know how many sentences are in the paragraph, but I want each sentence to be its own element in an ArrayList (not an Array, since I don't know how many elements I'll have).

The problem is:

From the research I've done, it isn't possible to create & name new StringBuffers from within a for loop (so I can't have the first time create SB1, then SB2, then SB3, etc). And I can't use the myStringBuffer.append() method or otherwise edit an element within my ArrayList unless I have already used the myArrayList.add() method and created a new element in my list... but I don't know how many elements I'll need until after the for loop has finished.

Help?? Any ideas on how I can achieve my goal, without breaking the laws of physics or Java?

My code is below:

 
Ranch Hand
Posts: 1164
Eclipse IDE Firefox Browser Java
I would suggest you use the classes and methods in the java.util.regex package. They are designed for exactly the type of problem you are trying to solve. Read about the package and its classes (Pattern and Matcher), and study how they work together to sieve out data according to a compiled pattern.

Hint: Since each sentence will surely end with a ". ", you can use this pattern to separate any number of legal sentences.
 
Marshal
Posts: 79180
377
Beware of . as a regex; it is a metacharacter.
Are you absolutely sure that every . in your text means end of sentence? You might have a decimal fraction; what if you write 1.23 somewhere?
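
To illustrate the metacharacter point, here is a minimal sketch, assuming the text is split with String.split (which takes a regex). The class name DotRegexDemo and the sample text are made up for the example. An unescaped . matches any character, so it has to be escaped or quoted to match a literal period, and even then the decimal point in 3.14 is treated as a delimiter.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo class; not from the original thread.
class DotRegexDemo {

    // Unescaped dot: every character is a delimiter, so nothing useful is left.
    static String[] splitUnescaped(String text) {
        return text.split(".");
    }

    // Escaped dot: matches only a literal period.
    static String[] splitEscaped(String text) {
        return text.split("\\.");
    }

    // Pattern.quote builds a literal pattern, so no manual escaping is needed.
    static String[] splitQuoted(String text) {
        return text.split(Pattern.quote("."));
    }

    public static void main(String[] args) {
        String text = "One. Two. Pi is 3.14.";
        // All pieces are empty and split discards trailing empty strings.
        System.out.println(Arrays.toString(splitUnescaped(text)));
        // Splits at every literal period, including the decimal point in 3.14.
        System.out.println(Arrays.toString(splitEscaped(text)));
        System.out.println(Arrays.toString(splitQuoted(text)));
    }
}
```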
 
Campbell Ritchie
Marshal
Posts: 79180
377
This tutorial should tell you about special handling for metacharacters.
 
J Steele
Greenhorn
Posts: 13
Hm... I think I may have been unclear in my question.

My code above looks for a period, exclamation point, or question mark as an indication that a sentence has ended. I will later add in extra tests for multiple punctuation marks (ex: "How cool is that??"), and it already occurred to me that I need to set up a separate test for ellipses (...) because those don't always indicate the end of a sentence. Thank you, Campbell Ritchie, for pointing out another mid-sentence use of the period - I'll have to add a test for that.

However, my code can already find these punctuation marks:



What I need is a way to separate the original String input (a paragraph consisting of multiple sentences - the code does not know how many sentences) into an ArrayList, for which each element of the array is a StringBuffer whose contents are a single sentence.

In other words:

User input (a single String): "Hi, my name is Sammy. I am a Smith. I am Sammy Smith. Who are you?"

My code goes through the string, one character at a time (char c is the current character in the code above), and does two things:
- 1) The code keeps a counter of how many sentences there are in the paragraph (int numSen in the code above).
- 2) The code produces an ArrayList of StringBuffer elements:
myArrayList(0) = "Hi, my name is Sammy."
myArrayList(1) = "I am a Smith."
myArrayList(2) = "I am Sammy Smith."
myArrayList(3) = "Who are you?"

Where I need help is with part 2. Without knowing ahead of time how many sentences are in the paragraph/user input, how do I create a dynamic ArrayList with one StringBuffer element for each sentence?

I can't add a character to an element of the array (myArrayList(0).append(next character)) without first adding a StringBuffer to that element (myArrayList.add(newStringBuffer)).

However, I can't do this *before* the for loop starts because I don't know how many elements (sentences) I will need... and everything I've found online says I can't do this from *within* the for loop, because I can't set up an automatic new-variable-creating code (ex: the first time through the loop, it creates myStringBuffer1, the next time through it creates myStringBuffer2, then myStringBuffer3, etc., until the loop is done).

So my question is - is there another way I can code this to get the result I want (item 2 above)?
 
Rancher
Posts: 3742
16
Create a new StringBuilder (preferable to StringBuffer) before the start of the loop.
Add each character to the StringBuilder.
When you find the ., ? or !, add the StringBuilder to the ArrayList and then reinitialise your StringBuilder reference with a new StringBuilder.

Something like*


*Not tested. There may be syntax errors.
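
The steps above could be sketched roughly as follows. This is an illustrative reconstruction under assumptions, not the snippet from the original post: the class name SentenceCollector, the method collect, and the rule for skipping whitespace between sentences are all made up for the example. It treats every ., ? or ! as a sentence end, so decimals, ellipses and repeated punctuation are not handled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class illustrating the "new StringBuilder per sentence" idea.
class SentenceCollector {

    static List<StringBuilder> collect(String paragraph) {
        List<StringBuilder> sentences = new ArrayList<>();
        StringBuilder current = new StringBuilder();

        for (int i = 0; i < paragraph.length(); i++) {
            char c = paragraph.charAt(i);

            // Assumption: drop the whitespace that separates two sentences.
            if (current.length() == 0 && Character.isWhitespace(c)) {
                continue;
            }
            current.append(c);

            if (c == '.' || c == '?' || c == '!') {
                // Store the finished sentence, then point the SAME variable
                // at a brand-new object for the next sentence.
                sentences.add(current);
                current = new StringBuilder();
            }
        }

        // Keep any trailing text that had no closing punctuation.
        if (current.length() > 0) {
            sentences.add(current);
        }
        return sentences;
    }

    public static void main(String[] args) {
        String input = "Hi, my name is Sammy. I am a Smith. I am Sammy Smith. Who are you?";
        List<StringBuilder> sentences = collect(input);
        for (int i = 0; i < sentences.size(); i++) {
            System.out.println(i + ": " + sentences.get(i));
        }
    }
}
```

No numbered variable names are needed: the list grows by one element per sentence, and the sentence count is simply sentences.size().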
 
J Steele
Greenhorn
Posts: 13
I may have found a solution, but I won't be able to code and test it until I get home tonight: Within my for loop, at the start of each sentence, call a method that creates a new StringBuffer:



Last time I tried something like this, the problem was that all of my ArrayList elements were the same, because they all referred to the same StringBuffer variable. I'll have to check tonight and see if this fixes it...
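
The "all elements were the same" symptom can be reproduced with a small sketch. The class and method names below are hypothetical, chosen only for illustration. Adding one StringBuffer object to the list several times stores several references to the same object; creating a new object on each pass (even through the same variable name) gives each element its own contents.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo of shared versus separate StringBuffer objects.
class SharedReferenceDemo {

    // Buggy variant: one object is reused, so every list element is that object.
    static List<StringBuffer> reuseOneBuffer(String[] parts) {
        List<StringBuffer> list = new ArrayList<>();
        StringBuffer shared = new StringBuffer();
        for (String part : parts) {
            shared.setLength(0);   // clears the one shared object
            shared.append(part);
            list.add(shared);      // adds another reference to the same object
        }
        return list;
    }

    // Working variant: a new object per iteration, same variable name each time.
    static List<StringBuffer> newBufferEachTime(String[] parts) {
        List<StringBuffer> list = new ArrayList<>();
        for (String part : parts) {
            StringBuffer fresh = new StringBuffer();
            fresh.append(part);
            list.add(fresh);
        }
        return list;
    }

    public static void main(String[] args) {
        String[] parts = {"I am a Smith.", "Who are you?"};
        System.out.println(reuseOneBuffer(parts));    // both elements show the last sentence
        System.out.println(newBufferEachTime(parts)); // each element keeps its own sentence
    }
}
```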
 
J Steele
Greenhorn
Posts: 13
Joanne Neal - thank you for your suggestion! I'll have to try it tonight when I get home. Also, I'll look more into StringBuilders - I haven't used those before.
 
Mansukhdeep Thind
Ranch Hand
Posts: 1164
Eclipse IDE Firefox Browser Java

Campbell Ritchie wrote:Beware of . as a regex; it is a metacharacter.
Are you absolutely sure that every . in your text means end of sentence? You might have a decimal fraction; what if you write 1.23 somewhere?



Even if the paragraph does contain decimals etc., proper use of a regex will ensure we can still separate the sentences.
 
Joanne Neal
Rancher
Posts: 3742
16

J Steele wrote:Also, I'll look more into StringBuilders - I haven't used those before.


They're basically the same as StringBuffers but are not synchronized, which means they can be a little more efficient when synchronization is not needed.
 
Campbell Ritchie
Marshal
Posts: 79180
377
Have you considered this?
Can you design a regex which will identify sentence ends?
 
lowercase baba
Posts: 13089
67
Chrome Java Linux

Campbell Ritchie wrote:Have you considered this?
Can you design a regex which will identify sentence ends?



I think that's going to be pretty hard...

"Everyone welcome F." is a valid sentence. "Rosenberger to the Stage" is also. But so is "Everyone welcome F. Rosenberger to the Stage".
 
J Steele
Greenhorn
Posts: 13
I like the idea of a split, except that it doesn't account for all the exceptions I need. For example - a period is the end of a sentence, but an ellipsis or decimal point uses the same symbol and isn't the end of a sentence. And most importantly - I can't have empty strings if someone ends a sentence with more than one punctuation mark. These empty strings will seriously mess with my program later in the code.
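
The empty-string concern can be shown with a short sketch, assuming String.split is the split in question. The class name and both patterns are made up for illustration. Splitting on each single mark leaves an empty string between the two question marks; splitting on a run of marks plus any following whitespace avoids that. Neither pattern addresses decimals or ellipses, and both discard the punctuation itself.

```java
import java.util.Arrays;

// Hypothetical demo of empty strings produced by split.
class SplitEmptyDemo {

    // One delimiter per punctuation mark: "??" yields an empty string in between.
    static String[] splitOnEachMark(String text) {
        return text.split("[.!?]");
    }

    // One delimiter per RUN of marks, plus any whitespace that follows it.
    static String[] splitOnRuns(String text) {
        return text.split("[.!?]+\\s*");
    }

    public static void main(String[] args) {
        String text = "How cool is that?? Very cool!";
        System.out.println(Arrays.toString(splitOnEachMark(text))); // contains an empty element
        System.out.println(Arrays.toString(splitOnRuns(text)));     // no empty elements here
        // Caveat: text that STARTS with punctuation would still give a leading empty string.
    }
}
```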

fred rosenberger - You bring up a good point, but I'm not sure how I would test for that exception. I can test for decimal points by checking if there is an A-Z, a-z, or 0-9 character immediately after the period, but that doesn't solve using the same marker to indicate an abbreviation. And by the way - your exception is extremely aggravating! It occurred to me that I could check if the first letter following ". " (a period and a space) is a-z (lower-case), as it would be in the middle of most sentences... but of course you selected an example that would fail that test too. I hope I always have access to a tester as devious as you.

Additionally, it occurs to me that I really have no good way of knowing if an ellipsis is intended to be mid sentence or as a sentence-ending device.

Unless I can find a (reasonable) way of solving these issues, I think I will just have to declare arbitrary rules: ellipses end sentences, and if there is a space after a period, it indicates the end of a sentence.
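
Those arbitrary rules could be sketched like this. It is only one possible reading of them, not a definitive implementation: the class name SimpleSentenceRules and the regex are assumptions. The pattern splits at whitespace that directly follows a ., ! or ?, so a decimal such as 1.23 stays intact, an ellipsis followed by a space ends a sentence, and the punctuation is kept with its sentence.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the simplified rules: punctuation + whitespace ends a sentence.
class SimpleSentenceRules {

    static List<String> split(String paragraph) {
        List<String> sentences = new ArrayList<>();
        // (?<=[.!?]) is a lookbehind: the whitespace must come right after a mark.
        for (String s : paragraph.trim().split("(?<=[.!?])\\s+")) {
            if (!s.isEmpty()) {   // guards against an empty paragraph
                sentences.add(s);
            }
        }
        return sentences;
    }

    public static void main(String[] args) {
        System.out.println(split("It costs 1.23 dollars. Wait... Really?? Yes!"));
        // Known limitation: the abbreviation "F." is treated as a sentence end.
        System.out.println(split("Everyone welcome F. Rosenberger to the Stage."));
    }
}
```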

This program is intended to test and improve my Java skills, not for any work/school assignment, so as long as I can achieve my self-imposed goals, I may have to accept reasonable compromises.
 
Campbell Ritchie
Marshal
Posts: 79180
377
Yes, you are going to have to compromise and simplify your identifying of sentence ends. After all, what happens if you write

He is such a helpful chap, that F. Rosenberger.

… on a word processor? It will try to end the sentence after the F. You are going to have to accept a simpler solution which is actually achievable.
 