Trouble removing spaces in ArrayList

 
Bartender
Posts: 1973
17
Hello,

I have an array list with these string elements:

"     1"
"2"
"3"
"4    "

But, running this code does not remove the spaces:
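A minimal sketch of the kind of stream-based trimming being described here. The class and method names are hypothetical, and this is not necessarily the code that was originally posted:

```java
import java.util.List;
import java.util.stream.Collectors;

class TrimList {
    // Returns a NEW list of trimmed strings; the input list is not modified.
    static List<String> trimAll(List<String> lines) {
        return lines.stream()
                .map(String::trim)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> raw = List.of("     1", "2", "3", "4    ");
        System.out.println(trimAll(raw)); // [1, 2, 3, 4]
    }
}
```

Because `trimAll` returns a new list, the caller has to use the returned value; the original list keeps its untrimmed elements.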


Even in the debugger, when this code runs, the spaces are still present.

The lists are not "final", and I can't think of any other likely cause.

Suggestions?

Thanks,

-- mike
 
Master Rancher
Posts: 5060
81
Your code really looks like it should remove the spaces.  Is it possible you're still looking at the original list, rather than the new one?  If not, can you show more code of where this is called and how you know it's not working?
 
Sheriff
Posts: 4641
582
VSCode Eclipse IDE TypeScript Redhat MicroProfile Quarkus Java Linux
The code looks fine.  Are you sure that the blank characters are actually spaces?
 
Ron McLeod
Sheriff
Posts: 4641
582
VSCode Eclipse IDE TypeScript Redhat MicroProfile Quarkus Java Linux

Ron McLeod wrote:Are you sure that the blank characters are actually spaces?


For example:
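A minimal sketch of such a check, which prints every code point in a string so that look-alike blanks become visible. The class and method names are hypothetical, and this is not necessarily the code that was originally posted:

```java
class CodePointDump {
    // Renders each code point as U+XXXX, separated by single spaces.
    static String dump(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // A no-break space, an ordinary space, then the digit 1.
        System.out.println(dump("\u00A0 1")); // U+00A0 U+0020 U+0031
    }
}
```

An ordinary space shows up as U+0020; anything else in a "blank" position is a candidate culprit.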
 
Mike London
Bartender
Posts: 1973
17
Maybe the problem is how I'm creating the list. It "looks" like spaces, but this is in a Spring Boot application.

Below is a section of the code that builds the UTF-8 list from either of two at the top:



This must be where the problem is.

If I just add an ArrayList in code, and add values to it, the stream() API works fine.

Suggestions?

Thanks,

- mike
 
Mike London
Bartender
Posts: 1973
17
Here's the value of "       1" (which refuses to trim)

after this code executes:

utf8Lines = Files.readAllLines(filePath, StandardCharsets.UTF_8)
        .stream()
        .filter(line -> !line.matches("^\\s*$"))   // skip blank lines
        .collect(Collectors.toList());



[-1, -2, 32, 0, 32, 0, 32, 0, 32, 0, 32, 0, 32, 0, 49, 0]

Doesn't seem to be UTF-8, right?

Confusing...

Thanks,

-- mike
 
Sheriff
Posts: 28323
95
Eclipse IDE Firefox Browser MySQL Database
I recently spent quite a while trying to find why two strings from two totally different external sources appeared to be identical but were actually not. The strings were only two words long, how could they be different? It turned out that one of them had a non-breaking space between the two words -- took me a long time to find that.
 
Mike London
Bartender
Posts: 1973
17

Paul Clapham wrote:I recently spent quite a while trying to find why two strings from two totally different external sources appeared to be identical but were actually not. The strings were only two words long, how could they be different? It turned out that one of them had a non-breaking space between the two words -- took me a long time to find that.



It looks like my "      1" string isn't the expected value after my (attempted) "UTF-8" conversion.

Not sure what's off.

-- mike
 
Paul Clapham
Sheriff
Posts: 28323
95
Eclipse IDE Firefox Browser MySQL Database

Mike London wrote:Doesn't seem to be UTF-8, right?

More like UTF-16, with some kind of marker at the beginning.
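One way to check that reading, offered as a sketch rather than a diagnosis: if the string held U+FEFF followed by six spaces and the digit 1, its little-endian UTF-16 bytes would match the posted dump exactly. The assumption that the debugger was showing the string's internal UTF-16 storage is mine, and the class name is made up:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ByteDump {
    // Encodes the string as little-endian UTF-16 (no BOM is added by the encoder).
    static String utf16leBytes(String s) {
        return Arrays.toString(s.getBytes(StandardCharsets.UTF_16LE));
    }

    public static void main(String[] args) {
        // U+FEFF, six spaces, then '1' (String.repeat needs Java 11+).
        String suspect = "\uFEFF" + " ".repeat(6) + "1";
        System.out.println(utf16leBytes(suspect));
        // [-1, -2, 32, 0, 32, 0, 32, 0, 32, 0, 32, 0, 32, 0, 49, 0]
    }
}
```

Under that assumption the leading `-1, -2` pair is not part of the file's encoding at all; it is the char U+FEFF sitting at the front of the string.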
 
Ron McLeod
Sheriff
Posts: 4641
582
VSCode Eclipse IDE TypeScript Redhat MicroProfile Quarkus Java Linux
If you can determine which characters are causing you grief, you could remove them before using trim.  U+00A0, U+2007, and U+202F are the likely ones.
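A minimal sketch of that idea, assuming it is acceptable to drop these three characters wherever they occur in the string, not only at the ends. The class and method names are hypothetical:

```java
class NbspCleaner {
    // Removes U+00A0, U+2007 and U+202F everywhere, then trims ordinary whitespace.
    static String clean(String s) {
        return s.replaceAll("[\\u00A0\\u2007\\u202F]", "").trim();
    }

    public static void main(String[] args) {
        String dirty = "\u00A0\u2007 1 \u202F";
        System.out.println("[" + clean(dirty) + "]"); // [1]
    }
}
```

If those characters can legitimately appear inside a value, a regex anchored to the start and end of the string would be a safer variation.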
 
Mike London
Bartender
Posts: 1973
17

Ron McLeod wrote:If you can determine which characters are causing you grief, you could remove them before using trim.  U+00A0, U+2007, and U+202F are the likely ones.



That's a good idea, I'll investigate ... but here's the thing... this should work (famous last words, I know).

In BBEdit, I created a simple text file:

   1
2
3
4

In a standalone Java program, the file should be read as UTF-8, so why the unexpected Unicode characters?



Putting the break point on the sop, I get:

0 = "    1"
1 = "2"
2 = "3"
3 = "4"

WTF?
 
Ron McLeod
Sheriff
Posts: 4641
582
VSCode Eclipse IDE TypeScript Redhat MicroProfile Quarkus Java Linux
Can you share a file (attachment to post, not text in post) which is problematic?
 
Mike London
Bartender
Posts: 1973
17

Ron McLeod wrote:Can you share a file (attachment to post, not text in post) which is problematic?



Sure, attached.

Just set the path to the text file (list1.txt) on your system, set a breakpoint on the SOP, and note that the first value was not trimmed.

Look forward to hearing your results.

Thanks Ron.

-- mike
Filename: Trimming-issue.zip
File size: 695 bytes
 
Ron McLeod
Sheriff
Posts: 4641
582
VSCode Eclipse IDE TypeScript Redhat MicroProfile Quarkus Java Linux
It looks like that file is UTF-8 with a BOM sequence.

I tried this and it did work.  It helps show what the problem is, but is a bit of a hack and a better solution should be used.
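A sketch of one such workaround, assuming the fix is simply to drop U+FEFF before trimming. The class and method names are hypothetical, the temporary file stands in for list1.txt, and this is not necessarily the code that was originally posted:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;

class BomTrim {
    // Crude: removes U+FEFF wherever it appears, then trims.
    static String stripBomAndTrim(String line) {
        return line.replace("\uFEFF", "").trim();
    }

    static List<String> readClean(Path file) throws IOException {
        return Files.readAllLines(file, StandardCharsets.UTF_8).stream()
                .map(BomTrim::stripBomAndTrim)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the attached file: U+FEFF encodes to EF BB BF in UTF-8.
        Path file = Files.createTempFile("list1", ".txt");
        Files.write(file, "\uFEFF    1\n2\n3\n4\n".getBytes(StandardCharsets.UTF_8));

        // Plain trim() leaves the first line alone: BOM + 4 spaces + '1' = 6 chars.
        String first = Files.readAllLines(file, StandardCharsets.UTF_8).get(0);
        System.out.println(first.trim().length()); // 6

        System.out.println(readClean(file)); // [1, 2, 3, 4]
        Files.delete(file);
    }
}
```

The leading spaces survive plain `trim()` because U+FEFF sits in front of them, and `trim()` only removes characters up to U+0020 from the two ends.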
 
Mike London
Bartender
Posts: 1973
17

Ron McLeod wrote:It looks like that file is UTF-8 with a BOM sequence.

I tried this and it did work.  It helps show what the problem is, but is a bit of a hack and a better solution should be used.



Thanks. Weird. I just created two new BBEdit files and they worked, too.

Don't you love Encoding issues?

Thanks very much.

-- mike
 
Marshal
Posts: 79956
396

Ron McLeod wrote:. . . remove them before using trim.  U+00A0, U+2007, and U+202F are the likely ones. . . .

It is a long time since I used trim() and I might be mistaken, but I believe it doesn't remove such characters as hard space (\u00a0). Try String#strip() instead.
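A quick check of what each method removes, as far as I understand the Javadoc: `trim()` drops leading and trailing characters up to U+0020, while `strip()` (Java 11+) relies on `Character.isWhitespace`, which excludes the non-breaking spaces. So `strip()` helps with characters such as U+2003 (em space), but in this sketch it does not remove U+00A0 or U+FEFF. The class name is made up:

```java
class TrimVsStrip {
    public static void main(String[] args) {
        String nbsp = "\u00A0x";     // no-break (hard) space
        String emSpace = "\u2003x";  // em space
        String bom = "\uFEFFx";      // byte order mark

        System.out.println(nbsp.trim().length());     // 2 (not removed)
        System.out.println(nbsp.strip().length());    // 2 (not removed)
        System.out.println(emSpace.trim().length());  // 2 (not removed)
        System.out.println(emSpace.strip().length()); // 1 (removed)
        System.out.println(bom.strip().length());     // 2 (not removed)
    }
}
```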
 
Mike London
Bartender
Posts: 1973
17

Campbell Ritchie wrote:

Ron McLeod wrote:. . . remove them before using trim.  U+00A0, U+2007, and U+202F are the likely ones. . . .

It is a long time since I used trim() and I might be mistaken, but I believe it doesn't remove such characters as hard space (\u00a0). Try String#strip() instead.




Cool tip, thank you! Will experiment.

-- mike
 
Campbell Ritchie
Marshal
Posts: 79956
396

Mike London wrote:. . . a better solution should be used. . . .

Try the isBlank() method without bothering to trim or strip anything.
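A small sketch of that idea, assuming the goal is to skip blank lines (as the earlier regex filter did) on Java 11 or later. The class and method names are made up for illustration:

```java
import java.util.List;
import java.util.stream.Collectors;

class SkipBlankLines {
    // Keeps only lines that contain at least one non-whitespace character.
    static List<String> nonBlank(List<String> lines) {
        return lines.stream()
                .filter(line -> !line.isBlank())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(nonBlank(List.of("1", "   ", "", "\t", "2"))); // [1, 2]
    }
}
```

Note that `isBlank()` uses the same whitespace definition as `strip()`, so a line holding only a hard space or a BOM would not count as blank.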

Don't you love Encoding issues? . . .

They are better than beer, aren't they?
 
Campbell Ritchie
Marshal
Posts: 79956
396
I tried printing all the chars in that text file and got this:-

\ufeff \u0020 \u0020 \u0020 \u0020 \u0031
\u0032
\u0033
\u0034

I think you are right that the byte order mark is the real problem.
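A sketch of how such a listing could be produced, one line at a time. The method name mirrors the `showChars()` label in the output, but the body is my own guess and not the code that was actually run:

```java
class ShowChars {
    // Formats every char of the line as a backslash-u escape with four hex digits.
    static String showChars(String line) {
        StringBuilder sb = new StringBuilder();
        for (char c : line.toCharArray()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("\\u%04x", (int) c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // First line of the problem file as Java sees it: BOM, four spaces, '1'.
        System.out.println(showChars("\uFEFF    1"));
    }
}
```

Run on that first line, it prints the same six escapes as the first row of the listing above.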
 
Campbell Ritchie
Marshal
Posts: 79956
396
Output from my program, as before:-

~/java$ java EncodingDemo.java /run/media/campbell/TOSHIBA/Trimming-issue/list1.txt
showChars() method
\ufeff \u0020 \u0020 \u0020 \u0020 \u0031
\u0032
\u0033
\u0034

Tried a few tweaks. All kludgy to the worst degree.

java$ java EncodingDemo.java /run/media/campbell/TOSHIBA/Trimming-issue/list1.txt
showChars() method
\u00a0 \u0020 \u0020 \u0020 \u0020 \u0031
\u0032
\u0033
\u0034

Output unchanged. Even better, guess what I got from JShell!

jshell> Character.isWhitespace((char)0xa0)
$1 ==> false

So the hard space doesn't seem to count as whitespace. Now we have got what you want:

/java$ java EncodingDemo.java /run/media/campbell/TOSHIBA/Trimming-issue/list1.txt
showChars() method
\u0031
\u0032
\u0033
\u0034


 
Saloon Keeper
Posts: 10929
87
Eclipse IDE Firefox Browser MySQL Database VI Editor Java Windows ChatGPT
Following along with all of this I get...
 
Ron McLeod
Sheriff
Posts: 4641
582
VSCode Eclipse IDE TypeScript Redhat MicroProfile Quarkus Java Linux
In the example file that ML posted, the BOM sequence was ef bb bf (UTF-8), not fe ff (UTF-16).
 
Carey Brown
Saloon Keeper
Posts: 10929
87
Eclipse IDE Firefox Browser MySQL Database VI Editor Java Windows ChatGPT
Yes, I tried that BOM replacement and it didn't work but my above code did. And I just ran 'od' on my download of the file and I got the same thing you did.
 
Ron McLeod
Sheriff
Posts: 4641
582
VSCode Eclipse IDE TypeScript Redhat MicroProfile Quarkus Java Linux

Ron McLeod wrote:In the example file that ML posted, the BOM sequence was ef bb bf (UTF-8), not fe ff (UTF-16).


char uses UTF-16, and the BOM is represented in UTF-16 as 0xfeff, so comparing to 0xfeff is correct when working with a char.
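A short sketch that shows both views of the same thing: the three bytes ef bb bf on disk decode, via Java's UTF-8 decoder, to the single char 0xfeff, and the decoder does not discard it. The class and method names are made up:

```java
import java.nio.charset.StandardCharsets;

class BomDecode {
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] utf8Bom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};
        String decoded = decodeUtf8(utf8Bom);
        System.out.println(decoded.length());            // 1
        System.out.println(decoded.charAt(0) == 0xFEFF); // true
    }
}
```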
 
Ron McLeod
Sheriff
Posts: 4641
582
VSCode Eclipse IDE TypeScript Redhat MicroProfile Quarkus Java Linux
Here's another approach (although with more code) using a BufferedReader and peeking at the first char in the buffer.
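A sketch of how that approach might look, using `mark`/`reset` to peek at the first char. The class and method names are hypothetical, the `StringReader` in `main` only stands in for a real file, and this is not necessarily the code that was originally posted:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.stream.Collectors;

class BomSkippingReader {
    // Reads all lines, skipping a leading U+FEFF if present, and trims each line.
    static List<String> readLinesSkippingBom(Reader source) {
        try (BufferedReader reader = new BufferedReader(source)) {
            reader.mark(1);                 // remember the start of the stream
            int first = reader.read();      // peek at the first char
            if (first != -1 && first != 0xFEFF) {
                reader.reset();             // not a BOM: rewind so nothing is lost
            }
            return reader.lines()
                    .map(String::trim)
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Reader input = new StringReader("\uFEFF    1\n2\n3\n4\n");
        System.out.println(readLinesSkippingBom(input)); // [1, 2, 3, 4]
    }
}
```

For a real file, the `Reader` could come from `Files.newBufferedReader(path, StandardCharsets.UTF_8)`; only the first char of the stream is inspected, so a U+FEFF appearing later in the text is left alone.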
 