searching a string in an HTML file

sahid ul karim
Greenhorn

Joined: Jun 06, 2007
Posts: 20
How do I search for a string in an HTML file using Java? I want to parse the HTML page. I have written some code to search, but it also matches text inside the HTML tags. I want to search only within the HTML body.
Here is my code:
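import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class match {
    public static void main(String[] args) throws IOException {
        String pattern = "The String";
        Pattern regPat = Pattern.compile(pattern);
        Matcher matcher = regPat.matcher("");
        // try-with-resources closes the reader when the loop is done
        try (BufferedReader reader =
                 new BufferedReader(new FileReader("D:\\report1.html"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                matcher.reset(line);
                // Prints every matching line, including matches inside HTML tags
                if (matcher.find()) {
                    System.out.println(line);
                }
            }
        }
    }
}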
Jan van Mansum
Ranch Hand

Joined: Oct 19, 2007
Posts: 74
There are several ways to do this. Which one you want depends largely on how efficient it needs to be. For me, the simplest way would be to:

  • Search for the first occurrence of "<body>"
  • Search for the first occurrence of "</body>"
  • Create a copy of the substring between "<body>" and "</body>", deleting all occurrences of "<" followed by zero or more characters followed by ">"
  • Search the copy for the string you want

This is certainly not the most efficient way to do it, but to me it seems the easiest to understand. Copying the substring is what makes it particularly inefficient. However, if you are just searching one or a few medium-sized HTML pages, that is not a problem.
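
The steps above can be sketched in Java roughly as follows. This is only a minimal sketch and not code from the thread: the class name BodyTextSearch and the methods bodyText and bodyContains are made up for illustration, and it assumes the whole file has already been read into a String. It also treats anything between "<" and ">" as a tag, so comments, scripts, and a ">" inside an attribute value are not handled properly.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative only.
final class BodyTextSearch {

    // Opening body tag, with or without attributes, in any letter case.
    private static final Pattern BODY_OPEN =
            Pattern.compile("<body\\b[^>]*>", Pattern.CASE_INSENSITIVE);
    // Closing body tag in any letter case.
    private static final Pattern BODY_CLOSE =
            Pattern.compile("</body\\s*>", Pattern.CASE_INSENSITIVE);
    // "<" followed by zero or more characters followed by ">".
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    /** Returns a copy of the text between the body tags with all tags deleted. */
    static String bodyText(String html) {
        Matcher open = BODY_OPEN.matcher(html);
        // Fall back to the start of the document if there is no opening tag.
        int start = open.find() ? open.end() : 0;
        Matcher close = BODY_CLOSE.matcher(html);
        // Fall back to the end of the document if there is no closing tag.
        int end = close.find(start) ? close.start() : html.length();
        return TAG.matcher(html.substring(start, end)).replaceAll("");
    }

    /** Returns true if the literal string occurs in the tag-free body text. */
    static boolean bodyContains(String html, String needle) {
        return bodyText(html).contains(needle);
    }

    public static void main(String[] args) {
        String html = "<html><head><title>The String</title></head>"
                + "<body><p class=\"x\">Here is <b>The String</b> again</p></body></html>";
        System.out.println(bodyText(html));                   // Here is The String again
        System.out.println(bodyContains(html, "The String")); // true
        System.out.println(bodyContains(html, "class"));      // false, it only appears inside a tag
    }
}
```

To try it on a real file, the content could first be loaded with something like new String(Files.readAllBytes(Paths.get("D:\\report1.html"))), which uses the platform default charset.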

    Another approach could be to try and create one big regular expression for your search.
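
One possible shape for such a regular expression is sketched below. It is an assumption about what "one big regular expression" could look like, not something given in this thread; the class name BodyRegexSearch and the method bodyTextPattern are hypothetical. The idea is to require an opening body tag before the match, a closing body tag after it, and to reject any hit whose next angle bracket is ">", since that would mean the hit sits inside a tag.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative only.
final class BodyRegexSearch {

    /** Builds a pattern that finds the literal needle in body text but not inside tags. */
    static Pattern bodyTextPattern(String needle) {
        return Pattern.compile(
                "(?is:<body\\b[^>]*>.*?)"               // opening body tag, then as little as possible
                + "(" + Pattern.quote(needle) + ")"     // group 1: the literal search string (case-sensitive)
                + "(?![^<>]*>)"                         // the next angle bracket must not be '>'
                + "(?=(?is:.*</body\\b))");             // a closing body tag must still follow
    }

    public static void main(String[] args) {
        String html = "<html><head><title>The String</title></head>"
                + "<body><p id=\"The String\">Here is <b>The String</b></p></body></html>";
        Matcher m = bodyTextPattern("The String").matcher(html);
        if (m.find()) {
            // start(1) is the offset of the first hit that lies in body text.
            System.out.println("Found at offset " + m.start(1));
        } else {
            System.out.println("Not found");
        }
    }
}
```

Because every match starts at the opening body tag, find() reports only the first hit, which is enough for a yes/no search. A literal ">" in the body text can still confuse the tag check.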
    [ November 06, 2007: Message edited by: Jan van Mansum ]

    SCJP 1.4, SCWCD 1.4
    sahid ul karim
    Greenhorn

    Joined: Jun 06, 2007
    Posts: 20
    Can you please write some code for that?
    Joe Ess
    Bartender

    Joined: Oct 29, 2001
    Posts: 8876
        

    Sahid, though we love to help people understand and write code, JavaRanch is Not A Code Mill. We will not do your work for you. Try taking Jan's suggestions into consideration and show us some code. We'll be glad to give you a hand if you Show Some Effort.


    "blabbing like a narcissistic fool with a superiority complex" ~ N.A.
    [How To Ask Questions On JavaRanch]
     