
split a string and differentiate the elements in string array

Krishna Chaitanya Reddy Balam
Greenhorn

Joined: Feb 19, 2008
Posts: 22
Hi all:
I am trying to split a string based on some HTML tags inside it.
For example, given
String s = "I am <b> bold </b>";
I want it converted to a string array, and I must be able to tell which elements of the array were inside the tags.
Is there any way to do it?
Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 38107
    
Depends what you want to split on. There is a split() method in the String class which does what you want, but it takes a regular expression as its parameter.
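To make the point about the regular expression parameter concrete, here is a minimal sketch. The class name SplitOnRegex and the sample input are made up for illustration; only String.split() itself comes from the post above.

```java
import java.util.Arrays;

class SplitOnRegex {
    // The argument to split() is a regular expression, not a literal delimiter.
    // This one matches a comma with optional whitespace on either side.
    static String[] splitOnCommas(String input) {
        return input.split("\\s*,\\s*");
    }

    public static void main(String[] args) {
        // prints [red, green, blue]
        System.out.println(Arrays.toString(splitOnCommas("red , green,blue")));
    }
}
```

Because the argument is a regex, characters such as "." or "|" have a special meaning and need escaping (for example split("\\.")) if you want to split on them literally.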

If you are not familiar with regular expressions, try here to start you off.
Krishna Chaitanya Reddy Balam
Greenhorn

Joined: Feb 19, 2008
Posts: 22
But will I be able to differentiate the string in between the bold tags in the string array?
Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 38107
    
You can probably design a regular expression which will match <b> and </b> tags, so you should be able to do that, yes.
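As a rough sketch of what such an expression could look like for the simplest case, the pattern "</?b>" matches either an opening or a closing bold tag. The class and method names here are hypothetical, and this assumes plain <b> tags with no attributes and no nesting.

```java
import java.util.Arrays;

class BoldTagSplit {
    // "</?b>" matches "<b>" or "</b>": the slash is optional.
    static String[] splitOnBoldTags(String input) {
        return input.split("</?b>");
    }

    public static void main(String[] args) {
        String s = "I am <b> bold </b>";
        // prints [I am ,  bold ]
        System.out.println(Arrays.toString(splitOnBoldTags(s)));
    }
}
```

Note that split() throws the tags away, so the array itself does not record which piece was bold. With a single kind of un-nested tag the pieces simply alternate (outside, inside, outside, ...), so the odd-numbered indices are the bold ones. That shortcut stops working as soon as other tags or nesting are involved.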
Krishna Chaitanya Reddy Balam
Greenhorn

Joined: Feb 19, 2008
Posts: 22
I think this is the pattern for <b> tags
<b\b[^>]*>(.*?)</b>
and
this is for general HTML tags
<([A-Z][A-Z0-9]*)\b[^>]*>.*?</\1>

but if I write something like
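String s4 = "I am <b>bold</b> and I am <i>italic</i> and I am <b><i>bold italic</i></b>";
// Backslashes must be doubled inside a Java string literal, and the tag
// names in s4 are lowercase, so the pattern is compiled case-insensitively.
Pattern htmlTag = Pattern.compile("<([A-Z][A-Z0-9]*)\\b[^>]*>.*?</\\1>", Pattern.CASE_INSENSITIVE);
int length = s4.length();
Matcher matcher = htmlTag.matcher(s4);
// group() throws IllegalStateException unless find() has succeeded first.
String result = matcher.find() ? matcher.group() : null;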

I need to get the output into a String array

like
String[] sa;
and sa should contain {"I am", "bold", "and I am","italic","and I am","bold italic"}
I know I can get this, but after storing it in the string array I need to be able to tell that sa[1] was between bold tags, sa[3] was between italic tags, and sa[5] was between bold and italic tags.

Is there any way to do this?
Right now I am parsing the string character by character, but I need something more generic because nested tags are difficult to handle with character-by-character logic.
Please help
Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 38107
    
Difficult to be sure just looking at the code, but you appear to be matching everything from a <b> tag to the next </b> tag. I think you want to match only the <b> and </b>.
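One way to act on that, sketched under some assumptions: match each tag on its own with Matcher.find(), take the text between consecutive matches, and keep a stack of the tags that are currently open. The Segment class and all names below are hypothetical, and this is not a real HTML parser (it ignores comments, unbalanced tags, and attribute values containing ">").

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagTokenizer {
    // Hypothetical holder: a piece of text plus the tags that enclosed it.
    static final class Segment {
        final String text;
        final List<String> tags;

        Segment(String text, List<String> tags) {
            this.text = text;
            this.tags = tags;
        }

        @Override
        public String toString() {
            return "\"" + text + "\" " + tags;
        }
    }

    // Matches ONE opening or closing tag, not the text between tags.
    // Group 1 is "/" for a closing tag; group 2 is the tag name.
    private static final Pattern TAG =
            Pattern.compile("<(/?)([A-Za-z][A-Za-z0-9]*)\\b[^>]*>");

    static List<Segment> tokenize(String input) {
        List<Segment> segments = new ArrayList<>();
        Deque<String> open = new ArrayDeque<>(); // currently open tags, outermost first
        Matcher m = TAG.matcher(input);
        int last = 0;
        while (m.find()) {
            // Text between the previous tag and this one.
            addSegment(segments, input.substring(last, m.start()), open);
            String name = m.group(2).toLowerCase();
            if (m.group(1).isEmpty()) {
                open.addLast(name);              // opening tag
            } else {
                open.removeLastOccurrence(name); // closing tag
            }
            last = m.end();
        }
        addSegment(segments, input.substring(last), open); // text after the last tag
        return segments;
    }

    // Trims the text and skips pieces that are empty, e.g. between "<b>" and "<i>".
    private static void addSegment(List<Segment> out, String raw, Deque<String> open) {
        String text = raw.trim();
        if (!text.isEmpty()) {
            out.add(new Segment(text, new ArrayList<>(open)));
        }
    }

    public static void main(String[] args) {
        String s4 = "I am <b>bold</b> and I am <i>italic</i> and I am <b><i>bold italic</i></b>";
        for (Segment segment : tokenize(s4)) {
            System.out.println(segment);
        }
    }
}
```

For the s4 string from the earlier post this yields six segments whose text matches the array you asked for, with tag lists [], [b], [], [i], [], [b, i]. Keeping the tags next to each piece of text avoids having to remember separately which array index was inside which tag.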

You might do well to Google for HTML parsers, as well, if you are looking for more than one kind of tag. Why spend hours and hours re-inventing the wheel?
Krishna Chaitanya Reddy Balam
Greenhorn

Joined: Feb 19, 2008
Posts: 22
I did and nothing helps.
Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 38107
    
Sorry to hear that.
List of HTML parsers here.

Try myString.split("</.+>") or myString.split("<b>").
[Campbell@queeg applications]$ java BoldSplitter
I am
bold</b> and I am <i>italic</i> and I am
<i>bold italic</i></b>
[Campbell@queeg applications]$
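The source of BoldSplitter is not shown, so the following is only a guess at something that produces the three lines above, using the split("<b>") suggestion. The class name BoldSplitterSketch and its method names are made up. It also tries the other suggested pattern, which behaves differently from what you might expect.

```java
class BoldSplitterSketch {
    // Splits on the literal opening bold tag only.
    static String[] splitOnOpeningBold(String s) {
        return s.split("<b>");
    }

    // ".+" is greedy, so this matches from the first "</" to the LAST ">".
    static String[] splitOnGreedyClose(String s) {
        return s.split("</.+>");
    }

    public static void main(String[] args) {
        String s = "I am <b>bold</b> and I am <i>italic</i> and I am <b><i>bold italic</i></b>";
        for (String piece : splitOnOpeningBold(s)) {
            System.out.println(piece);
        }
        System.out.println("---");
        for (String piece : splitOnGreedyClose(s)) {
            System.out.println(piece);
        }
    }
}
```

The first loop prints the same three pieces as the run above. The second prints a single piece, "I am <b>bold", because the greedy match swallows everything up to the end of the string. A reluctant quantifier, "</.+?>", would match each closing tag separately.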
 