How to split the sentence neatly

 
Ranch Hand
Posts: 472
Good day,

I would appreciate your help: how can I split the input string below into the expected output?

Note that in the input string there are multiple spaces between 77973 and 117181, whereas there is only a single space between Revenue and 77973.

I have tried outputStr.split(" "); but it doesn't split the string neatly.




Expected output:
 
Sheriff
Posts: 4884
I assume by "doesn't split it neatly" you mean that you have a bunch of whitespace around each item?

If so, then you can split on a single whitespace character and then trim each item to get rid of the extra whitespace.
 
Bartender
Posts: 1952
Or as an alternative to what Tim suggested, you could split using a regular expression that matches one or more whitespace characters.
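
For illustration, here is a minimal sketch of the regular-expression approach. The original input was not preserved in this thread, so the sample line below (one space after "Revenue", three spaces between the numbers) is an assumption based on the description in the first post, and the class and method names are hypothetical.

```java
import java.util.Arrays;

class WhitespaceSplitDemo {

    // Splits on runs of one or more whitespace characters.
    // trim() avoids an empty first element when the line starts with whitespace.
    static String[] splitOnWhitespace(String line) {
        return line.trim().split("\\s+");
    }

    public static void main(String[] args) {
        // Assumed sample input, not the original poster's data.
        String input = "Revenue 77973   117181";

        // A single-space delimiter leaves an empty string for every extra space.
        System.out.println(Arrays.toString(input.split(" ")));
        // prints [Revenue, 77973, , , 117181]

        // A regex delimiter treats each run of whitespace as one separator.
        System.out.println(Arrays.toString(splitOnWhitespace(input)));
        // prints [Revenue, 77973, 117181]
    }
}
```

Note that this splits on every run of whitespace, so a label made of several words would be split as well.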
 
Nakata kokuyo
Ranch Hand
Posts: 472
Thanks, everyone. I am trying the code below, but it does not seem to match. Can anyone shed some light?


 
Sergej Smoljanov
Ranch Hand
Posts: 472
You need a delimiter that is whitespace not followed by a letter, for example using [^A-z].

Note that ^a matches an a at the start of a line, because outside a character class

^ The beginning of a line


http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html
Inside square brackets, however, ^ means negation, and you can use it this way:

Here [^a]+ means one or more occurrences of a character that is not a. Note that [^ab] matches any character that is neither a nor b.

You may also try using find, which makes things more interesting.
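
The difference between the two meanings of ^ can be sketched as follows. The code originally posted here was lost, so this is only an illustration with a hypothetical class name and made-up input; replaceAll is used simply to make visible what each pattern matches.

```java
class CaretDemo {

    // Outside brackets, ^ is an anchor: only an 'a' at the start of the input matches.
    static String markLeadingA(String text) {
        return text.replaceAll("^a", "_");
    }

    // Inside brackets, ^ negates the class: each run of characters other than 'a' matches.
    static String markNonA(String text) {
        return text.replaceAll("[^a]+", "_");
    }

    // [^ab] matches any character that is neither 'a' nor 'b'.
    static String markNonAB(String text) {
        return text.replaceAll("[^ab]+", "_");
    }

    public static void main(String[] args) {
        System.out.println(markLeadingA("abca")); // prints _bca
        System.out.println(markNonA("abca"));     // prints a_a
        System.out.println(markNonAB("abca"));    // prints ab_a
    }
}
```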
 
Nakata kokuyo
Ranch Hand
Posts: 472
Thanks, buddy. I figured out how to achieve it this way:



Now, when I try to split it using


the numbers are split correctly, but the text "Total comprehensive income .." which contains whitespace, is split too.

How can I handle that?

Below is the full snippet:

 
Sergej Smoljanov
Ranch Hand
Posts: 472

I mean something like this.
 
author
Posts: 23883

Nakata kokuyo wrote: Thanks, buddy. I figured out how to achieve it this way:


Well, it does match, but something tells me that it doesn't match it in a way that you think it does. For this input ...

The "[azA-Z]+" part of the regex only matches the first letter -- specifically the letter "T". Notice that it does not match lowercase letters other than "a" and "z".

The next part of the regex, i.e. ".*", matches the majority of the input -- specifically, it matches "otal comprehensive income for the year 12,464 28,164 123,601 114,72".... (spaces not shown due to lack of code tags)

The "\\d+" part of the regex only matches the last digit (i.e. the last "2" from the last number). The reason for this is that the previous portion is greedy and matches as much as possible, leaving only the minimum of a single digit for this sub-match.

And finally, the last ".*" part of the regex matches the remaining spaces.

Henry
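
The greedy behaviour described above can be observed with a small sketch. The original regex and input were not preserved here, so the pattern below is an assumption assembled from the four parts named in the post, with capturing groups added only to expose each part, and the input is a shortened, made-up line.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyMatchDemo {

    // Assumed pattern: the four parts from the post, each wrapped in a group.
    static final Pattern PATTERN = Pattern.compile("([azA-Z]+)(.*)(\\d+)(.*)");

    // Returns the four captured parts, or null if the whole input does not match.
    static String[] parts(String input) {
        Matcher m = PATTERN.matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new String[] {m.group(1), m.group(2), m.group(3), m.group(4)};
    }

    public static void main(String[] args) {
        // Hypothetical input with two trailing spaces.
        String[] p = parts("Total income 12,464   28,162  ");
        for (int i = 0; i < p.length; i++) {
            System.out.println("group " + (i + 1) + " = [" + p[i] + "]");
        }
        // prints:
        // group 1 = [T]
        // group 2 = [otal income 12,464   28,16]
        // group 3 = [2]
        // group 4 = [  ]
    }
}
```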
 
Nakata kokuyo
Ranch Hand
Posts: 472
Hi Henry,

Thanks for pointing that out!

Do you mean that for the first part I should use this?


For the second part, do I just leave it as it is?

For the third part, I have no idea how to match it. Could you please explain more?
 
Nakata kokuyo
Ranch Hand
Posts: 472
Thanks, I will try it later!

Sergej Smoljanov wrote:
I mean something like this.

 
Sergej Smoljanov
Ranch Hand
Posts: 472
Also look at the section
"Special constructs (named-capturing and non-capturing)" in
http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html
for example (?!

This time, (?! means: look ahead and match only if what follows is not \\w. Inside the parentheses you can combine the things you do not want to encounter by using the "or" operator: |
 
Henry Wong
author
Posts: 23883

Sergej Smoljanov wrote:
I mean something like this.



First of all, I am not a fan of using the "[A-z]" pattern to represent the letters. There are characters between the upper case and lower case letters (namely [ \ ] ^ _ and `), and they will be matched too. A pattern like "[A-Za-z]" is more accurate and easier to read.

Second, and this is more important, this will *NOT* work. The delimiter will grab a digit when splitting -- so the numbers will all lose their first digit.

Henry
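
To illustrate the second point: the code under discussion was lost, so the sketch below assumes a delimiter of the form whitespace followed by a non-letter, and contrasts it with a lookahead that only checks the next character without consuming it. The input line and all names are hypothetical.

```java
import java.util.Arrays;

class ConsumedDigitDemo {

    // The character class is part of the delimiter, so the digit it matches is thrown away.
    static String[] splitConsuming(String line) {
        return line.split("\\s+[^A-Za-z]");
    }

    // The lookahead only checks that a digit follows; the digit stays in the next token.
    static String[] splitWithLookahead(String line) {
        return line.split("\\s+(?=\\d)");
    }

    public static void main(String[] args) {
        // Assumed sample input.
        String input = "Revenue 77973   117181";

        System.out.println(Arrays.toString(splitConsuming(input)));
        // prints [Revenue, 7973, 17181]

        System.out.println(Arrays.toString(splitWithLookahead(input)));
        // prints [Revenue, 77973, 117181]
    }
}
```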
 
Sergej Smoljanov
Ranch Hand
Posts: 472

this will *NOT* work


You are right, sorry. I am now trying to find a solution too.
 
Sergej Smoljanov
Ranch Hand
Posts: 472

This time it works (I hope).

There are characters between the upper case and lower case letters, and they will be matched too

Thanks for pointing this out.
 
Nakata kokuyo
Ranch Hand
Posts: 472
Thanks, everyone. I am using the pattern below to split, and it is working great. I appreciate everyone's guidance, especially Sergej's and Henry's.

String pattern = "\\s+(?=\\d+,\\d+)";
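
A minimal sketch of this pattern in use. The input line is a made-up example in the style of the data discussed in the thread, and the class and method names are hypothetical.

```java
import java.util.Arrays;

class FinalPatternDemo {

    // Whitespace is a delimiter only when a number of the form digits,digits follows.
    static final String PATTERN = "\\s+(?=\\d+,\\d+)";

    static String[] splitRow(String line) {
        return line.split(PATTERN);
    }

    public static void main(String[] args) {
        // Hypothetical row: a multi-word label followed by comma-grouped numbers.
        String row = "Total comprehensive income for the year   12,464   28,164   123,601";
        System.out.println(Arrays.toString(splitRow(row)));
        // prints [Total comprehensive income for the year, 12,464, 28,164, 123,601]
    }
}
```

Because the lookahead requires a comma, numbers without one (such as 77973) are not preceded by a delimiter and stay attached to the text before them.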
 
Nakata kokuyo
Ranch Hand
Posts: 472
Question:

Does "?=" mean positive numbers only?


 
Sergej Smoljanov
Ranch Hand
Posts: 472
(?=X) is a zero-width positive lookahead. Zero-width means that this part is not included in the result; positive means that it must match; lookahead means that it looks at what follows the current position.
If you want to allow a minus sign '-' in the result, you must write it like "\\s+(?=-?\\d+,)". That means: look for one or more whitespace characters that are followed by an expression which is not included in the match. That expression is: zero or one '-' (written "-?"), followed by one or more digits, followed by exactly one ','.
\\d is equivalent to [0123456789] (one of these characters), so it has nothing to do with the sign of the number.
If you want to match '-', you must specify it. Likewise, if you want to allow other characters such as '(' or '+', you must specify them too, escaped, because they are regex metacharacters.
http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html
This has a good explanation of Java regex; I used it as my source when answering your question.
(?!X) is "X, via zero-width negative lookahead". Here too, the expression that follows what you are trying to find is not included in the match, but it must not match X.
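
As a rough illustration of the optional minus sign, the sketch below compares the two lookaheads on a made-up row that contains a negative number. The input and all names are assumptions for the example.

```java
import java.util.Arrays;

class SignedLookaheadDemo {

    // Lookahead without a sign: whitespace before "-12,464" is not a delimiter.
    static String[] splitUnsigned(String line) {
        return line.split("\\s+(?=\\d+,\\d+)");
    }

    // Lookahead with an optional minus: negative numbers become separate tokens.
    static String[] splitSigned(String line) {
        return line.split("\\s+(?=-?\\d+,)");
    }

    public static void main(String[] args) {
        // Hypothetical row with a negative value.
        String row = "Loss for the year   -12,464   28,164";

        System.out.println(Arrays.toString(splitUnsigned(row)));
        // prints [Loss for the year   -12,464, 28,164]

        System.out.println(Arrays.toString(splitSigned(row)));
        // prints [Loss for the year, -12,464, 28,164]
    }
}
```

In both cases the lookahead text itself is never removed, which is why the minus sign and the digits remain in the resulting tokens.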
 