inserting strings into another string

Gill Clover
Greenhorn

Joined: Aug 10, 2002
Posts: 28
Hi there,
I would like to do the following but so far have had no luck doing step 3:
I have one very long String containing an HTML page. I need to look through it for all URLs contained in it and extract them (either a copy of each or actually split the string into smaller strings composed of the string that comes before the URL, the URL, and the string that comes after it). I then want to encode the URLs (I'm using sessions so I need to use URL rewriting) and stick them all back into the HTML string.
1. I am able to extract the first URL in the HTML string no problem (but can't figure out how to loop through the string looking for all URL occurrences).
2. I can then encode the URL easily
3. But I can't insert it back into the HTML string where the original, unencoded URL is/was. I tried using string method replaceFirst (regex, replacement) but it appears my whole URL is too long to fit into the character length permitted by this method's regex string.
My code is as follows:
String completeOutput = "...the HTML string here...";
int start = 0;
int end = 0;
int stringLength = completeOutput.length();
start = completeOutput.indexOf("http://"); // find the first occurrence of this (i.e. a URL)
end = completeOutput.indexOf("</a>", start); // find the end of the URL, searching from where the URL starts
// get a copy of the URL (which has some unwanted HTML on the end, which I will get rid of next to leave just the URL)
String substring = completeOutput.substring(start, end+4); // '4' is the number of characters in '</a>'
System.out.println("String returned from completeOutput is: " + substring);
// get rid of the unwanted HTML on the end of the URL
int end2 = substring.indexOf("\"");
String url = substring.substring(0, end2);
System.out.println("Final URL is: " + url);
String encodedURL = response.encodeURL(url);
// now I want to stick encodedURL back into the HTML string but this doesn't work...
//String newOutput = completeOutput.replaceFirst(url, encodedURL);

There surely must be a way of doing this but I sure can't see how - I would be most grateful if someone could
1. suggest how I can loop through the HTML string to find all occurrences of URLs
2. suggest how to stick the encoded URLs back into the HTML string
Thanks in advance,
Gillian Klee
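A note on step 3: replaceFirst treats its first argument as a regular expression, so a URL containing characters such as '?' or '.' is likely to be matched differently than intended, or not matched at all. Length is probably not the cause. Below is a minimal sketch of a literal replacement. It assumes Java 5 or later for Pattern.quote and Matcher.quoteReplacement (newer than the JDK discussed in this thread); the class name, sample HTML, and session id are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LiteralReplace {
    // Replaces the first literal occurrence of target in text.
    // Pattern.quote stops regex metacharacters in target from being interpreted;
    // Matcher.quoteReplacement does the same for '$' and '\' in the replacement.
    static String replaceFirstLiteral(String text, String target, String replacement) {
        return text.replaceFirst(Pattern.quote(target), Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String html = "<a href=\"http://example.com/page?id=1\">link</a>";
        String url = "http://example.com/page?id=1";
        // Hypothetical encoded form; a real one would come from response.encodeURL(url)
        String encoded = "http://example.com/page;jsessionid=ABC123?id=1";

        // Unquoted, the '?' makes the preceding 'e' optional, so nothing matches
        System.out.println(html.replaceFirst(url, encoded).equals(html)); // prints "true"

        // Quoted, the URL is matched character for character
        System.out.println(replaceFirstLiteral(html, url, encoded));
    }
}
```

On a JDK without Pattern.quote, finding the position with indexOf and rebuilding the string around it avoids regular expressions altogether.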
Greg Charles
Bartender

Joined: Oct 01, 2001
Posts: 1855

The String class really isn't designed for constantly changing strings. A String object is immutable, meaning it never changes. The methods that seem to change the String actually create a brand new String and return it to you. A better approach is to put the String into a StringBuffer. Inserts there are a bit easier to manage.
I don't know how you find the URLs, but the StringBuffer probably doesn't have the methods you need. What you can do is find the list of indexes on the String, then use those indexes on the StringBuffer. As long as you start from the end, your inserts won't mess up the indexes.
I also don't know why you're having trouble looping through the String. Most of the "finder" methods have a form that lets you find the next occurrence after a given index. Have you tried that?
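The two suggestions above (search with indexOf from a given index, then apply the changes to the buffer from the end) could be sketched roughly as follows. This is an illustration, not code from the thread: the class and method names are made up, and it assumes Java 7 or later for the generic list syntax.

```java
import java.util.ArrayList;
import java.util.List;

class BackToFrontReplace {
    // Finds every occurrence of target on the immutable String, then applies the
    // replacements to a StringBuffer from last to first so that earlier indexes
    // are not shifted by later edits.
    static String replaceAllLiteral(String text, String target, String replacement) {
        if (target.isEmpty()) {
            return text; // an empty target would otherwise never advance the search
        }
        List<Integer> starts = new ArrayList<>();
        int from = 0;
        int idx;
        // indexOf(String, int) finds the next occurrence at or after 'from'
        while ((idx = text.indexOf(target, from)) != -1) {
            starts.add(idx);
            from = idx + target.length();
        }
        StringBuffer buffer = new StringBuffer(text);
        for (int i = starts.size() - 1; i >= 0; i--) {
            int start = starts.get(i);
            buffer.replace(start, start + target.length(), replacement);
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        String text = "a http://x b http://x c";
        // prints "a https://x b https://x c"
        System.out.println(replaceAllLiteral(text, "http://", "https://"));
    }
}
```

Working back to front means the stored indexes stay valid even when the replacement is longer or shorter than the text it replaces.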
Gill Clover
Greenhorn

Joined: Aug 10, 2002
Posts: 28
Thanks for your reply Greg. I know that a String object is immutable etc, but couldn't find the methods I needed in the StringBuffer class to find the URLs. I'm going to try your String then StringBuffer idea, it sounds good.
Originally posted by Greg Charles:
I also don't know why you're having trouble looping through the String. Most of the "finder" methods have a form that lets you find the next occurrence after a given index. Have you tried that?

I still can't figure out a loop condition that will look through the string until it finds no more URLs though. I know I need to find the first occurrence of a URL, then use the index of the end of the URL as the index from where I search the string a second time etc, but I'm still not sure how to do it. Could you possibly post an example? I feel really dim to be asking this, but I can't for the life of me see how to do it!
Thanks again,
Gillian Klee
Gill Clover
Greenhorn

Joined: Aug 10, 2002
Posts: 28
Well, I finally had my breakthrough and figured out how to code that damn loop! And it was relatively easy, of course (don't know why I didn't think of it before). AND I did it all using the StringBuffer class. Thanks for your help, it gave me something to think about and helped me eventually figure out my problem.
For anyone who might be interested in the code, here it is:
String completeOutput = "...some HTML goes here...";
int start = 0;
int end = 0;
StringBuffer outputBuffer = new StringBuffer(completeOutput);
while (start != -1) {
    start = outputBuffer.indexOf("http://", end);
    if (start == -1) // there are no more occurrences of "http://"
        break;

    end = outputBuffer.indexOf("</a>", start); // have to use this as the 'end' of the URL as it's unique within the HTML

    System.out.println("Start of URL is: " + start);
    System.out.println("Nearly end of URL is: " + end);

    String almostURL = outputBuffer.substring(start, end+4); // '4' is the number of characters in '</a>'

    System.out.println("AlmostURL is: " + almostURL);

    int realEnd = almostURL.indexOf("\"");
    String url = almostURL.substring(0, realEnd);

    System.out.println("Real end of URL is: " + realEnd);
    System.out.println("URL is: " + url);

    String encodedURL = "http://thisIsMyNewURLWithASessionIDAppendedToIt123456789B677889";

    // 'realEnd' is the end of the URL when the URL starts at index 0, as it does in 'almostURL'
    outputBuffer.delete(start, start+realEnd);
    outputBuffer.insert(start, encodedURL);

    // the buffer length has changed, so shift 'end' to keep it pointing at the same '</a>'
    end += encodedURL.length() - realEnd;
} // while
System.out.println("OutputBuffer is now: " + outputBuffer);
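The loop above assumes that every "http://" is followed by a closing quote and a "</a>". For readers who want those cases guarded, here is one possible variant wrapped in a method. It is only a sketch: the class name, the method name, and the encoder parameter (standing in for response.encodeURL) are assumptions, and it needs Java 8 or later for UnaryOperator and the lambda.

```java
import java.util.function.UnaryOperator;

class UrlRewriter {
    // Rewrites each quoted "http://..." URL in html using the supplied encoder.
    // After each replacement the search resumes after the next "</a>", so link
    // text that happens to contain "http://" is left alone, as in the loop above.
    static String rewrite(String html, UnaryOperator<String> encoder) {
        StringBuffer buffer = new StringBuffer(html);
        int from = 0;
        while (true) {
            int start = buffer.indexOf("http://", from);
            if (start == -1) {
                break; // no more URLs
            }
            int quote = buffer.indexOf("\"", start);
            if (quote == -1) {
                break; // no closing quote: leave the rest untouched
            }
            String encoded = encoder.apply(buffer.substring(start, quote));
            buffer.replace(start, quote, encoded);

            int afterUrl = start + encoded.length();
            int closeTag = buffer.indexOf("</a>", afterUrl);
            from = (closeTag == -1) ? afterUrl : closeTag + 4; // 4 = length of "</a>"
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        String html = "<a href=\"http://a.com/x\">http://a.com/x</a> <a href=\"http://b.com\">B</a>";
        // Hypothetical encoder that just appends a session id
        System.out.println(rewrite(html, url -> url + ";jsessionid=42"));
    }
}
```

Like the original, this is plain string scanning and will be confused by URLs in single quotes or outside anchor tags.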
Stan James
(instanceof Sidekick)
Ranch Hand

Joined: Jan 29, 2003
Posts: 8791
And now for something completely different ... you could parse the HTML into a DOM and work there. See http://www.quiotix.com/downloads/html-parser/ This is a very nice example of the visitor pattern. I use it for several manipulations on all HTML files in a directory and could share a visitor that deals with links if you like.


A good question is never answered. It is not a bolt to be tightened into place but a seed to be planted and to bear more seed toward the hope of greening the landscape of the idea. John Ciardi
Gill Clover
Greenhorn

Joined: Aug 10, 2002
Posts: 28
Originally posted by Crash Landon:
And now for something completely different ... you could parse the HTML into a DOM and work there. See http://www.quiotix.com/downloads/html-parser/ This is a very nice example of the visitor pattern. I use it for several manipulations on all HTML files in a directory and could share a visitor that deals with links if you like.

Thanks for the suggestion, that looks quite interesting! Although I think I'm going to stick with what I've got for now, as I'm in the middle of my dissertation, for which the hand-in date is looming...
Gillian
Debashish Chakrabarty
Ranch Hand

Joined: May 14, 2002
Posts: 224

Originally posted by Gillian Klee:
start = outputBuffer.indexOf("http://", end);

Do we have an indexOf() method in StringBuffer class?


Debashish
SCJP2, SCWCD 1.4
Gill Clover
Greenhorn

Joined: Aug 10, 2002
Posts: 28
Originally posted by Debashish Chakrabarty:

Do we have an indexOf() method in StringBuffer class?

I don't know about previous versions of Java but there certainly is in 1.4.1!
Java 1.4.1 API
[ February 12, 2003: Message edited by: Gillian Klee ]
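For anyone still on a JDK older than 1.4, where StringBuffer has no indexOf, one possible workaround is to search a String snapshot of the buffer instead. The helper name below is made up for illustration, and the sketch is not from the thread.

```java
class BufferIndexOf {
    // Searches a String copy of the buffer, so it does not rely on
    // StringBuffer.indexOf (which was added in Java 1.4).
    static int indexOfCompat(StringBuffer buffer, String target, int fromIndex) {
        return buffer.toString().indexOf(target, fromIndex);
    }

    public static void main(String[] args) {
        StringBuffer sb = new StringBuffer("see http://a and http://b");
        System.out.println(indexOfCompat(sb, "http://", 0)); // prints 4
        System.out.println(indexOfCompat(sb, "http://", 5)); // prints 17
        // On 1.4 and later the built-in method gives the same answer
        System.out.println(sb.indexOf("http://", 5));        // prints 17
    }
}
```

The snapshot copies the whole buffer on every call, so in a loop over a long page it is cheaper to find all the indexes on the original String first.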
Debashish Chakrabarty
Ranch Hand

Joined: May 14, 2002
Posts: 224

Thanks Gillian. It's time I upgraded from JDK 1.3. For anyone else who has not kept up, here are the new features in Java 1.4 and here are the changes in the core libraries.
Regards,
[ February 12, 2003: Message edited by: Debashish Chakrabarty ]
 
 