split a string and differentiate the elements in string array

 
Krishna Chaitanya Reddy Balam
Greenhorn
Posts: 22
Hi all:
I am trying to split a string based on the HTML tags inside it. For example, given
String s = "I am <b> bold </b>";
I want to convert it to a string array and still be able to tell which elements of the array were inside the tags.
Is there any way to do this?
 
Campbell Ritchie
Sheriff
Posts: 48940
Depends what you want to split on. There is a split() method in the String class which does what you want, but it takes a regular expression as its parameter.

If you are not familiar with regular expressions, try here to start you off.
 
Krishna Chaitanya Reddy Balam
Greenhorn
Posts: 22
But will I be able to tell which element of the string array was between the bold tags?
 
Campbell Ritchie
Sheriff
Posts: 48940
You can probably design a regular expression which will match <b> and </b> tags, so you should be able to do that, yes.
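
As a minimal sketch of that idea, assuming the tags carry no attributes and are never nested, splitting on a pattern that matches only the <b> and </b> tags leaves the text pieces in alternating order, so the odd-numbered elements are the ones that were inside the tags. The names BoldSplit and splitOnBold are made up for illustration.

```java
class BoldSplit {
    // Splits on opening and closing <b> tags only (assumes no attributes, no nesting).
    static String[] splitOnBold(String s) {
        return s.split("</?b>");
    }

    public static void main(String[] args) {
        String[] parts = splitOnBold("I am <b> bold </b>");
        for (int i = 0; i < parts.length; i++) {
            // With strictly alternating <b> ... </b> pairs, odd indices were inside the tags.
            String kind = (i % 2 == 1) ? "bold" : "plain";
            System.out.println(i + " (" + kind + "): [" + parts[i] + "]");
        }
    }
}
```

This breaks down as soon as a second kind of tag or any nesting is involved, because the array index alone no longer says which tags were open.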
 
Krishna Chaitanya Reddy Balam
Greenhorn
Posts: 22
I think this is the pattern for <b> tags
<b\b[^>]*>(.*?)</b>
and
this is for general HTML tags
<([A-Z][A-Z0-9]*)\b[^>]*>.*?</\1>

but if I write something like
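String s4 = "I am <b>bold</b> and I am <i>italic</i> and I am <b><i>bold italic</i></b>";
// Backslashes must be doubled in a Java string literal; the tags are lowercase, so match case-insensitively.
Pattern htmlTag = Pattern.compile("<([A-Z][A-Z0-9]*)\\b[^>]*>.*?</\\1>", Pattern.CASE_INSENSITIVE);
int length = s4.length();
Matcher matcher = htmlTag.matcher(s4);
// group() is only valid after a successful find().
String result = matcher.find() ? matcher.group() : null;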

I need to get the output to String array

like
String[] sa;
and sa should contain {"I am", "bold", "and I am","italic","and I am","bold italic"}
I know I can get this, but after storing it in the string array I need to be able to tell that sa[1] was between bold tags, sa[3] was between italic tags, and sa[5] was between bold and italic tags.

Is there any way to do this?
Right now I am parsing the string character by character, but I need something more generic, because nested tags are difficult to handle with character-by-character logic.
Please help.
 
Campbell Ritchie
Sheriff
Posts: 48940
Difficult to be sure just looking at the code, but you appear to be matching everything from a <b> tag to the next </b> tag. I think you want to match only the <b> and </b>.

You might do well to Google for HTML parsers, as well, if you are looking for more than one kind of tag. Why spend hours and hours re-inventing the wheel?
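
One way to match only the tags and still remember which tags enclosed each piece of text is to walk the string with a Matcher and keep a stack of the tags currently open. The sketch below is an illustration under assumptions, not a real HTML parser: TagSplitter, Segment, and split are hypothetical names, and it assumes reasonably well-formed input with no comments, scripts, or '>' characters inside attribute values.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagSplitter {
    // Hypothetical holder: a piece of text plus the tags that were open around it.
    static final class Segment {
        final String text;
        final List<String> tags; // outermost first, e.g. [b, i]

        Segment(String text, List<String> tags) {
            this.text = text;
            this.tags = tags;
        }

        @Override
        public String toString() {
            return "\"" + text + "\" " + tags;
        }
    }

    // Matches a single opening or closing tag, never the text between tags.
    // Group 1 is "/" for a closing tag (empty otherwise); group 2 is the tag name.
    private static final Pattern TAG =
            Pattern.compile("<(/?)([A-Za-z][A-Za-z0-9]*)\\b[^>]*>");

    static List<Segment> split(String input) {
        List<Segment> out = new ArrayList<>();
        Deque<String> open = new ArrayDeque<>(); // tags currently open
        Matcher m = TAG.matcher(input);
        int last = 0;
        while (m.find()) {
            // Text between the previous tag and this one belongs to the tags open so far.
            addSegment(out, input.substring(last, m.start()), open);
            String name = m.group(2).toLowerCase(Locale.ROOT);
            if (m.group(1).isEmpty()) {
                open.addLast(name);               // opening tag
            } else {
                open.removeLastOccurrence(name);  // closing tag
            }
            last = m.end();
        }
        addSegment(out, input.substring(last), open); // text after the final tag
        return out;
    }

    // Records trimmed, non-empty text together with a snapshot of the open tags.
    private static void addSegment(List<Segment> out, String text, Deque<String> open) {
        String trimmed = text.trim();
        if (!trimmed.isEmpty()) {
            out.add(new Segment(trimmed, new ArrayList<>(open)));
        }
    }

    public static void main(String[] args) {
        String s4 = "I am <b>bold</b> and I am <i>italic</i> and I am <b><i>bold italic</i></b>";
        for (Segment seg : split(s4)) {
            System.out.println(seg);
        }
    }
}
```

For the s4 string from the earlier post this yields six segments, with "bold" tagged [b], "italic" tagged [i], and "bold italic" tagged [b, i]. For anything beyond simple markup a dedicated HTML parser is still the safer choice.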
 
Krishna Chaitanya Reddy Balam
Greenhorn
Posts: 22
I did, but nothing helped.
 
Campbell Ritchie
Sheriff
Posts: 48940
Sorry to hear that.
List of HTML parsers here.

Try myString.split("</.+>") or myString.split("<b>").
[Campbell@queeg applications]$ java BoldSplitter
I am
bold</b> and I am <i>italic</i> and I am
<i>bold italic</i></b>
[Campbell@queeg applications]$
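
For reference, a small sketch of what those two calls return on the s4 string from earlier in the thread (SplitDemo, S4, and the two method names are illustrative). Note that .+ is greedy, so "</.+>" matches from the first "</" all the way to the last ">" in the string, and everything after the first closing tag is swallowed.

```java
import java.util.Arrays;

class SplitDemo {
    static final String S4 =
            "I am <b>bold</b> and I am <i>italic</i> and I am <b><i>bold italic</i></b>";

    // Splits on the literal opening tag; closing tags and <i> tags stay in the pieces.
    static String[] splitOnOpeningBold() {
        return S4.split("<b>");
    }

    // Greedy pattern: one match runs from the first "</" to the final ">" of the string.
    static String[] splitGreedyClosing() {
        return S4.split("</.+>");
    }

    public static void main(String[] args) {
        // Three pieces, the same ones shown in the console output above.
        System.out.println(Arrays.toString(splitOnOpeningBold()));
        // A single piece: "I am <b>bold"
        System.out.println(Arrays.toString(splitGreedyClosing()));
    }
}
```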
 