
String Split with "\\t"

 
Saurabh Pillai
Ranch Hand
Posts: 524

So I am getting a text file from somewhere. According to her, the file data is tab delimited. But when I parse it with the above code, it does not give the expected output. I know that tab width is editor specific: one person can configure a tab to 4 columns while another sets it to 8. But how does Java (the above code) interpret it? Now that I have edited the file, setting proper tabs on MY machine, it parses perfectly and I get the expected result.

 
Henry Wong
author
Marshal
Posts: 21021



The regex library parses "\\t" as a tab character, meaning ASCII code 9. It doesn't do anything special, such as treating a run of spaces up to a tab stop as a tab.

Henry
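
To make that concrete, here is a minimal sketch of the behaviour described above. The class name, method name, and sample strings are made up for illustration; the original code from the first post is not shown in the thread, so this only assumes it called String.split with "\\t".

```java
import java.util.Arrays;

// Hypothetical demo class; not the code from the original post.
class TabSplitDemo {

    // Splits a line on the regex "\\t", which matches exactly one
    // tab character (ASCII 9) and nothing else.
    static String[] splitOnTab(String line) {
        return line.split("\\t");
    }

    public static void main(String[] args) {
        String withTabs = "id\tname\tcity";         // real tab characters
        String withSpaces = "id    name    city";   // spaces that only look like tabs

        // Real tabs: the line is split into three fields.
        System.out.println(Arrays.toString(splitOnTab(withTabs)));

        // Spaces: no tab is found, so the whole line comes back as one field.
        System.out.println(splitOnTab(withSpaces).length);
    }
}
```

If a file "looks" tab delimited in an editor but comes back as a single field per line, it most likely contains spaces rather than tab characters.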
 
Saurabh Pillai
Ranch Hand
Posts: 524

So since it does not parse the file properly, it is safe to conclude that the file is not actually tab delimited, right? Yes, I think so.
 
Campbell Ritchie
Sheriff
Posts: 48652
 
Saurabh Pillai
Ranch Hand
Posts: 524
Yes, the file is inconsistent in its use of tabs: some fields are delimited by multiple tabs while others are delimited by a single tab.

I think I need to start seeing the file as a string of encoded characters.

Thank you guys.
 
Campbell Ritchie
Sheriff
Posts: 48652
If they really are multiple tabs, you have two (or more) possible responses. You can request a new version of the file in the correct format, or change your regex slightly.
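
One possible "slight change", sketched under the assumption that a run of consecutive tabs always means a single delimiter (so the file has no genuinely empty fields), is to add a + quantifier and split on "\\t+". The class and method names below are hypothetical.

```java
import java.util.Arrays;

// Hypothetical demo class illustrating one way to adjust the regex.
class MultiTabSplitDemo {

    // Splits on one or more consecutive tabs, so "\t\t" counts as
    // a single delimiter instead of producing an empty field.
    static String[] splitOnTabRuns(String line) {
        return line.split("\\t+");
    }

    public static void main(String[] args) {
        String line = "id\t\tname\tcity";   // two tabs after "id", one after "name"

        // Single-tab regex: an empty field appears between the two tabs.
        System.out.println(Arrays.toString(line.split("\\t")));

        // Tab-run regex: the doubled tab is treated as one delimiter.
        System.out.println(Arrays.toString(splitOnTabRuns(line)));
    }
}
```

Note the trade-off: with "\\t+" a field that is legitimately empty can no longer be told apart from a doubled delimiter, so this only fits if the data never has blank columns. Otherwise, asking for a correctly formatted file is the safer option.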
 