Add & edit ArrayList elements from inside a for loop

J Steele
Greenhorn

Joined: Feb 28, 2013
Posts: 13
I'm trying to break one paragraph into multiple strings (StringBuffers, actually, since I need to keep appending each string) - one for each sentence. At the start of the program, I do not know how many sentences are in the paragraph, but I want each sentence to be its own element in an ArrayList (not an Array, since I don't know how many elements I'll have).

The problem is:

From the research I've done, it isn't possible to create & name new StringBuffers from within a for loop (so I can't have the first time create SB1, then SB2, then SB3, etc). And I can't use the myStringBuffer.append() method or otherwise edit an element within my ArrayList unless I have already used the myArrayList.add() method and created a new element in my list... but I don't know how many elements I'll need until after the for loop has finished.

Help?? Any ideas on how I can achieve my goal, without breaking the laws of physics or Java?

My code is below:

Mansukhdeep Thind
Ranch Hand

Joined: Jul 27, 2010
Posts: 1157

I would suggest you use the classes and methods in the java.util.regex package. They are specifically designed for the type of problem you are trying to solve. Read about this package and its classes (Pattern and Matcher). Study how these classes work together to sift out data according to a compiled pattern.

Hint: Since each sentence will surely end with a ". ", you can use this pattern to separate any number of legal sentences.


~ Mansukh
Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 38357
    
Beware of . as a regex; it is a metacharacter.
Are you absolutely sure that every . in your text means end of sentence? You might have a decimal fraction; what if you write 1.23 somewhere?
Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 38357
    
This tutorial should tell you about special handling for metacharacters.
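To illustrate the metacharacter warning, here is a minimal sketch using String.split, which takes a regex. The class and method names are made up for the example. An unescaped "." matches any character, so every character is treated as a delimiter; escaping it as "\\." matches only a literal period.

```java
import java.util.Arrays;

class DotMetacharacterDemo {
    // Unescaped: "." is a regex metacharacter that matches any character,
    // so every character in the text is consumed as a delimiter.
    static String[] splitUnescaped(String text) {
        return text.split(".");
    }

    // Escaped: "\\." matches only a literal period.
    static String[] splitEscaped(String text) {
        return text.split("\\.");
    }

    public static void main(String[] args) {
        String text = "One. Two. Three.";
        System.out.println(Arrays.toString(splitUnescaped(text)));
        System.out.println(Arrays.toString(splitEscaped(text)));
    }
}
```

With the unescaped pattern the result is an empty array, because split drops trailing empty strings and nothing else is left. The escaped version returns three pieces, but the periods themselves are removed and the leading spaces are kept, so it is still only a starting point.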
J Steele
Greenhorn

Joined: Feb 28, 2013
Posts: 13
Hm... I think I may have been unclear in my question.

My code above looks for a period, exclamation point, or question mark as an indication that a sentence has ended. I will later add in extra tests for multiple punctuation marks (ex: "How cool is that??"), and it already occurred to me that I need to set up a separate test for ellipses (...) because those don't always indicate the end of a sentence. Thank you, Campbell Ritchie, for pointing out another mid-sentence use of the period - I'll have to add a test for that.

However, my code can already find these punctuation marks:



What I need is a way to separate the original String input (a paragraph consisting of multiple sentences - the code does not know how many) into an ArrayList in which each element is a StringBuffer whose contents are a single sentence.

In other words:

User input (a single String): "Hi, my name is Sammy. I am a Smith. I am Sammy Smith. Who are you?"

My code goes through the string, one character at a time (char c is the current character in the code above), and does two things:
- 1) The code keeps a counter of how many sentences there are in the paragraph (int numSen in the code above).
- 2) The code produces an ArrayList of StringBuffer elements:
myArrayList(0) = "Hi, my name is Sammy."
myArrayList(1) = "I am a Smith."
myArrayList(2) = "I am Sammy Smith."
myArrayList(3) = "Who are you?"

Where I need help is with part 2. Without knowing ahead of time how many sentences are in the paragraph/user input, how do I create a dynamic ArrayList with one StringBuffer element for each sentence?

I can't add a character to an element of the array (myArrayList(0).append(next character)) without first adding a StringBuffer to that element (myArrayList.add(newStringBuffer)).

However, I can't do this *before* the for loop starts because I don't know how many elements (sentences) I will need... and everything I've found online says I can't do this from *within* the for loop, because I can't set up an automatic new-variable-creating code (ex: the first time through the loop, it creates myStringBuffer1, the next time through it creates myStringBuffer2, then myStringBuffer3, etc., until the loop is done).

So my question is - is there another way I can code this to get the result I want (item 2 above)?
Joanne Neal
Rancher

Joined: Aug 05, 2005
Posts: 3446
    
Create a new StringBuilder (preferable to StringBuffer) before the start of the loop.
Add each character to the StringBuilder.
When you find the ., ? or !, add the StringBuilder to the ArrayList and then reinitialise your StringBuilder reference with a new StringBuilder.

Something like*


*Not tested. There may be syntax errors.
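The three steps above can be sketched roughly as follows. This is an illustrative reconstruction, not the code from the original post: the class name SentenceSplitter, the method name splitIntoSentences, and the step that skips whitespace between sentences are all assumptions added for the example.

```java
import java.util.ArrayList;
import java.util.List;

class SentenceSplitter {
    // Splits a paragraph into sentences, treating '.', '?' and '!' as sentence ends.
    static List<StringBuilder> splitIntoSentences(String paragraph) {
        List<StringBuilder> sentences = new ArrayList<>();
        StringBuilder current = new StringBuilder();   // created before the loop

        for (int i = 0; i < paragraph.length(); i++) {
            char c = paragraph.charAt(i);

            // Assumption: drop whitespace that sits between two sentences.
            if (current.length() == 0 && Character.isWhitespace(c)) {
                continue;
            }

            current.append(c);

            if (c == '.' || c == '?' || c == '!') {
                sentences.add(current);
                // Point the same variable at a brand-new object;
                // the list keeps its reference to the old one.
                current = new StringBuilder();
            }
        }

        // Keep any trailing text that has no closing punctuation.
        if (current.length() > 0) {
            sentences.add(current);
        }
        return sentences;
    }

    public static void main(String[] args) {
        String input = "Hi, my name is Sammy. I am a Smith. I am Sammy Smith. Who are you?";
        for (StringBuilder sentence : splitIntoSentences(input)) {
            System.out.println(sentence);
        }
    }
}
```

The list grows as sentences are found, so the number of sentences never has to be known in advance, and sentences.size() can stand in for a separate counter. Note that this sketch treats every punctuation mark as a sentence end, so "??" or "1.23" would still be split in the wrong place.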


Joanne
J Steele
Greenhorn

Joined: Feb 28, 2013
Posts: 13
I may have found a solution, but I won't be able to code and test it until I get home tonight: Within my for loop, at the start of each sentence, call a method that creates a new StringBuffer:



Last time I tried something like this, the problem was that all of my ArrayList elements were the same, because they all referred to the same StringBuffer variable. I'll have to check tonight and see if this fixes it...
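The symptom described here (every element looking the same) is what happens when one StringBuffer object is added to the list repeatedly. A minimal sketch, with hypothetical class and method names, contrasting that with creating a new object on each pass through the loop:

```java
import java.util.ArrayList;
import java.util.List;

class SharedReferenceDemo {
    // Adds the SAME StringBuffer object three times.
    // Every list element refers to that one object, so all show the last contents.
    static List<StringBuffer> sharedBuffer() {
        List<StringBuffer> list = new ArrayList<>();
        StringBuffer sb = new StringBuffer();
        for (int i = 1; i <= 3; i++) {
            sb.setLength(0);                      // clears the one shared object
            sb.append("sentence ").append(i);
            list.add(sb);
        }
        return list;
    }

    // Creates a NEW StringBuffer on each pass.
    // The variable name is reused, but each element is a distinct object.
    static List<StringBuffer> freshBuffers() {
        List<StringBuffer> list = new ArrayList<>();
        for (int i = 1; i <= 3; i++) {
            StringBuffer sb = new StringBuffer();
            sb.append("sentence ").append(i);
            list.add(sb);
        }
        return list;
    }

    public static void main(String[] args) {
        System.out.println(sharedBuffer());
        System.out.println(freshBuffers());
    }
}
```

The point is that the list stores references to objects, not variable names. There is no need for SB1, SB2, SB3: one variable can refer to a different new object on each iteration, and the list remembers each one.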
J Steele
Greenhorn

Joined: Feb 28, 2013
Posts: 13
Joanne Neal - thank you for your suggestion! I'll have to try it tonight when I get home. Also, I'll look more into StringBuilders - I haven't used those before.
Mansukhdeep Thind
Ranch Hand

Joined: Jul 27, 2010
Posts: 1157

Campbell Ritchie wrote:Beware of . as a regex; it is a metacharacter.
Are you absolutely sure that every . in your text means end of sentence? You might have a decimal fraction; what if you write 1.23 somewhere?


Even if the paragraph does contain decimals etc., proper use of regex will ensure that we can still separate the sentences.
Joanne Neal
Rancher

Joined: Aug 05, 2005
Posts: 3446
    
J Steele wrote:Also, I'll look more into StringBuilders - I haven't used those before.

They're basically the same as StringBuffers but are not synchronized, which means they might be a little more efficient when synchronization is not needed.
Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 38357
    
Have you considered this?
Can you design a regex which will identify sentence ends?
fred rosenberger
lowercase baba
Bartender

Joined: Oct 02, 2003
Posts: 11229
    

Campbell Ritchie wrote:Have you considered this?
Can you design a regex which will identify sentence ends?


I think that's going to be pretty hard...

"Everyone welcome F." is a valid sentence. "Rosenberger to the Stage" is also. But so is "Everyone welcome F. Rosenberger to the Stage".


There are only two hard things in computer science: cache invalidation, naming things, and off-by-one errors
J Steele
Greenhorn

Joined: Feb 28, 2013
Posts: 13
I like the idea of a split, except that it doesn't account for all the exceptions I need. For example - a period is the end of a sentence, but an ellipsis or decimal point uses the same symbol and isn't the end of a sentence. And most importantly - I can't have empty strings if someone ends a sentence with more than one punctuation mark. These empty strings will seriously mess with my program later in the code.

fred rosenberger - You bring up a good point, but I'm not sure how I would test for that exception. I can test for decimal points by checking if there is an A-Z, a-z, or 0-9 character immediately after the period, but that doesn't solve using the same marker to indicate an abbreviation. And by the way - your exception is extremely aggravating! It occurred to me that I could check if the first letter following ". " (a period and a space) is a-z (lower-case), as it would be in the middle of most sentences... but of course you selected an example that would fail that test too. I hope I always have access to a tester as devious as you.

Additionally, it occurs to me that I really have no good way of knowing if an ellipsis is intended to be mid sentence or as a sentence-ending device.

Unless I can find a (reasonable) way of solving these issues, I think I will just have to declare arbitrary rules: ellipses end sentences, and if there is a space after a period, it indicates the end of a sentence.
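One possible way to express these arbitrary rules is a single regex split: break wherever whitespace follows a '.', '!' or '?'. This is only a sketch under those stated rules, and the class name SimpleSentenceRules and its split method are made up for the example. The lookbehind (?<=[.!?]) keeps the punctuation attached to the sentence, so repeated marks such as "??" or "..." stay together and produce no empty strings, and a decimal like 1.23 is left alone because no space follows its period.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SimpleSentenceRules {
    // Splits on whitespace that comes directly after '.', '!' or '?'.
    // The lookbehind does not consume the punctuation, so it stays in the sentence.
    static List<String> split(String paragraph) {
        String trimmed = paragraph.trim();
        if (trimmed.isEmpty()) {
            return new ArrayList<>();   // avoid returning a single empty string
        }
        return new ArrayList<>(Arrays.asList(trimmed.split("(?<=[.!?])\\s+")));
    }

    public static void main(String[] args) {
        String input = "It costs 1.23 dollars. How cool is that?? Well... OK!";
        for (String sentence : split(input)) {
            System.out.println(sentence);
        }
    }
}
```

This still breaks after an abbreviation such as "F. Rosenberger", which is exactly the compromise described above.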

This program is intended to test and improve my Java skills, not for any work/school assignment, so as long as I can achieve my self-imposed goals, I may have to accept reasonable compromises.
Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 38357
    
Yes, you are going to have to compromise and simplify your identifying of sentence ends. After all, what happens if you write
He is such a helpful chap, that F. Rosenberger.
… on a WP? It will try to end the sentence after the F. You are going to have to accept a simpler solution which is actually achievable.
 