String manipulation

pradipta kumar rout
Ranch Hand

Joined: Sep 13, 2010
Posts: 43
Sir,

I have saved an HTML document in .txt format. I have used the Pattern class, but I am not able to get the result I want.

1. I want all the strings from the file except tags such as <title> and unnecessary words such as "a", "the", and the like.

Kindly give me a solution.

Thank you
Henry Wong
author
Sheriff

Joined: Sep 28, 2004
Posts: 18971


Can you (1) give us an example of the file (preferably small), (2) tell us exactly what you want from the file, and (3) show us the Pattern (and code) that you have tried so far?

Henry


Books: Java Threads, 3rd Edition, Jini in a Nutshell, and Java Gems (contributor)
pradipta kumar rout
Ranch Hand

Joined: Sep 13, 2010
Posts: 43
To: Henry Wong

Sir,
Thank you for the response.

1.Here is the file sample "<!doctype html public "-//w3c//dtd xhtml 1.0 transitional//en" "http://www.w3.org/tr/xhtml1/dtd/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head id="head1"><title>santabanta mobile home</title> <link href="css/default.css" rel="stylesheet" type="text/css" /></head><body topmargin="0" leftmargin="0"> <table width="100%" cellpadding="3" cellspacing="0" border="0" align="center"> <tr> <td colspan="3" align="center" class="td1">"


2. I want to find all the words in this file except the following:
2.1 tag names like <title>, <head>, etc.
2.2 articles: a, an, the, etc.
2.3 other unnecessary strings.
3. I have used the patterns "\\S+" and "\\S+|^<title>", but I cannot find any pattern by which I can select all strings except the ones above.


Is there any other way to retrieve these strings while leaving out the unnecessary ones?
Kindly help me; I am doing my project on this, so please give me some solution.

Thank you
Henry Wong
author
Sheriff

Joined: Sep 28, 2004
Posts: 18971

pradipta kumar rout wrote:
3. I have used the patterns "\\S+" and "\\S+|^<title>", but I cannot find any pattern by which I can select all strings except the ones above.


With regex, you must describe what you want -- not what you don't want.

I am assuming that you want strings between certain tags. In that case, look into describing those tags, and using a subgroup for the parts within those tags that you want.
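
To illustrate the idea of describing the tags and using a subgroup, here is a minimal sketch rather than a complete solution. The class name TitleExtractor and the method extractTitle are made up for this example, and it assumes a single, simple <title> element with no nested markup.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, named only for this example.
class TitleExtractor {
    // Describe the tags we want; group 1 captures the text between them.
    // (?is) makes the match case-insensitive and lets '.' span line breaks.
    private static final Pattern TITLE =
            Pattern.compile("(?is)<title\\b[^>]*>(.*?)</title>");

    /** Returns the trimmed title text, or null if no title element is found. */
    static String extractTitle(String html) {
        Matcher m = TITLE.matcher(html);
        return m.find() ? m.group(1).trim() : null;
    }

    public static void main(String[] args) {
        String html = "<html><head><title>santabanta mobile home</title></head></html>";
        System.out.println(extractTitle(html)); // prints: santabanta mobile home
    }
}
```

The reluctant quantifier (.*?) stops at the first closing tag, so the match does not run on to a later </title> elsewhere in the file.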

Henry
pradipta kumar rout
Ranch Hand

Joined: Sep 13, 2010
Posts: 43
To: Henry Wong

Sir,
Thank you. That is right: I want to retrieve the data between tags, but how do I retrieve it? Please give me one example.

1. <title>javaranch</title> Kindly give me a code snippet to retrieve "javaranch", or anything else between <title> tags.

2. As there is more than one tag, how do I retrieve all the data between tags?
Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 39782
What about the String#split(java.lang.String) method?
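
One possible way to use split here, as a rough sketch only: treat every tag as a delimiter and keep the non-empty pieces left between them. The class TagSplitter and the method textPieces are names invented for this example, and the delimiter regex assumes that no attribute value contains a '>' character.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, named only for this example.
class TagSplitter {
    /** Splits the input on anything that looks like a tag and keeps the non-blank text pieces. */
    static List<String> textPieces(String html) {
        List<String> pieces = new ArrayList<>();
        // "<[^>]*>" matches one tag; split returns whatever lies between the tags.
        for (String piece : html.split("<[^>]*>")) {
            String trimmed = piece.trim();
            if (!trimmed.isEmpty()) {
                pieces.add(trimmed);
            }
        }
        return pieces;
    }

    public static void main(String[] args) {
        String html = "<head><title>javaranch</title></head><body><p>hello world</p></body>";
        System.out.println(textPieces(html)); // prints: [javaranch, hello world]
    }
}
```

Note that split discards the tag names, so this only suits the case where it does not matter which tag a piece of text came from.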
Henry Wong
author
Sheriff

Joined: Sep 28, 2004
Posts: 18971

pradipta kumar rout wrote:
1. <title>javaranch</title> Kindly give me a code snippet to retrieve "javaranch", or anything else between <title> tags.


As already mentioned, take a look at the regex group feature, which can be used to extract parts of a match.

pradipta kumar rout wrote:
2. As there is more than one tag, how do I retrieve all the data between tags?


You have yet to post any code, so we can't tell what you are doing wrong -- but what you described can easily be done with the find() method.
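
For reference, a find() loop over several tags might look roughly like the sketch below. The class TagTextFinder and the method findAllText are invented for this example, and the pattern assumes simple elements whose text contains no nested tags.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, named only for this example.
class TagTextFinder {
    // Group 1: the tag name. Group 2: text with no nested tags.
    // The backreference \1 requires the closing tag to match the opening one.
    private static final Pattern ELEMENT =
            Pattern.compile("<(\\w+)[^>]*>([^<]+)</\\1>");

    /** Returns one "tag: text" entry for each simple element found in the input. */
    static List<String> findAllText(String html) {
        List<String> results = new ArrayList<>();
        Matcher m = ELEMENT.matcher(html);
        // Each call to find() resumes searching where the previous match ended.
        while (m.find()) {
            String text = m.group(2).trim();
            if (!text.isEmpty()) {
                results.add(m.group(1) + ": " + text);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        String html = "<html><head><title>javaranch</title></head>"
                + "<body><h1>Big Moose Saloon</h1><p>friendly place</p></body></html>";
        for (String entry : findAllText(html)) {
            System.out.println(entry);
        }
    }
}
```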


And BTW, just in case you haven't figured it out yet, regexes are not something that is easily learned by example. It may be best to learn the feature and the API, rather than just an example that targets a specific task.

Henry

James Sabre
Ranch Hand

Joined: Sep 07, 2004
Posts: 781

I would have thought that a starting point for this problem would have been an HTML parser such as http://htmlparser.sourceforge.net/. One then only has to extract the content of the required elements and filter that to remove the unwanted words.
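
Whichever way the element text is extracted, the filtering step could be sketched as below. This is only an illustration: the class StopWordFilter, the method keepUsefulWords, and the three-word stop list are assumptions made for the example, and a real project would need a much longer list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, named only for this example.
class StopWordFilter {
    // Assumed stop-word list; extend it to suit the project.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "the"));

    /** Splits already-extracted text into words and drops the unwanted ones. */
    static List<String> keepUsefulWords(String text) {
        List<String> kept = new ArrayList<>();
        // "\\W+" splits on any run of non-word characters (spaces, punctuation).
        for (String word : text.split("\\W+")) {
            if (!word.isEmpty() && !STOP_WORDS.contains(word.toLowerCase(Locale.ROOT))) {
                kept.add(word);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(keepUsefulWords("The moose likes a Java forum"));
        // prints: [moose, likes, Java, forum]
    }
}
```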


Retired horse trader.