searching a string in an HTML file

 
Greenhorn
Posts: 20
How can I search for a string in an HTML file using Java? I want to parse the HTML page. I have written some code to search, but it also searches inside the HTML tags. I want to search only the text in the HTML body.
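Here is my code:
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Match {
    public static void main(String[] args) throws IOException {
        String pattern = "The String";
        Pattern regPat = Pattern.compile(pattern);
        Matcher matcher = regPat.matcher("");
        // try-with-resources closes the file even if reading fails
        try (BufferedReader reader = new BufferedReader(new FileReader("D:\\report1.html"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Reuse one matcher for every line of the file
                matcher.reset(line);
                // Prints any line containing the pattern, including matches inside tags
                if (matcher.find()) {
                    System.out.println(line);
                }
            }
        }
    }
}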
 
Ranch Hand
Posts: 74
There are several ways to do this. Which one you choose depends largely on how efficient it needs to be. For me, the simplest way would be to:

  • Search for the first occurrence of "<body>"
  • Search for the first occurrence of "</body>"
  • Create a copy of the substring between "<body>" and "</body>", deleting every occurrence of "<" followed by zero or more characters followed by ">" (that is, every tag)
  • Search the copy for the string you want


This is certainly not the most efficient way to do it, but to me it seems the easiest to understand. Copying the substring is what makes it particularly inefficient. However, if you are only searching one or a few medium-sized HTML pages, that is not a problem.
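The four steps above could be sketched roughly as follows. This is only a minimal illustration, not a full HTML parser: the class and method names (BodyTextSearch, extractBodyText, bodyContains) are made up for the example, and it assumes the whole page is already in one String, that there is a single body element, and that no ">" appears inside an attribute value. It also matches an opening body tag that carries attributes, which a literal search for "<body>" would miss.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BodyTextSearch {
    // Opening body tag (with or without attributes), closing body tag, and any tag at all.
    private static final Pattern BODY_OPEN =
            Pattern.compile("<body\\b[^>]*>", Pattern.CASE_INSENSITIVE);
    private static final Pattern BODY_CLOSE =
            Pattern.compile("</body\\s*>", Pattern.CASE_INSENSITIVE);
    private static final Pattern ANY_TAG = Pattern.compile("<[^>]*>");

    /** Returns the text between the body tags with every tag deleted, or "" if no body is found. */
    static String extractBodyText(String html) {
        Matcher open = BODY_OPEN.matcher(html);
        if (!open.find()) {
            return "";
        }
        // Look for the closing tag only after the end of the opening tag.
        Matcher close = BODY_CLOSE.matcher(html);
        if (!close.find(open.end())) {
            return "";
        }
        String body = html.substring(open.end(), close.start());
        // Delete "<", zero or more characters that are not ">", then ">".
        return ANY_TAG.matcher(body).replaceAll("");
    }

    /** Searches the tag-free copy of the body for the wanted string. */
    static boolean bodyContains(String html, String needle) {
        return extractBodyText(html).contains(needle);
    }

    public static void main(String[] args) {
        String html = "<html><head><title>The String</title></head>"
                + "<body class=\"report\"><p>Hello <b>The String</b></p></body></html>";
        System.out.println(extractBodyText(html));            // prints: Hello The String
        System.out.println(bodyContains(html, "The String")); // prints: true
    }
}
```

To use it on a file, you could read the file into a String first, for example with new String(Files.readAllBytes(Paths.get("D:\\report1.html")), StandardCharsets.UTF_8), where the encoding is a guess on my part. Note that the contents of script and style elements are kept as text and entities such as &amp; are not decoded, so treat this as a starting point only.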

Another approach would be to build one larger regular expression that does the whole search.
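One possible shape for such an expression is sketched below. It is an assumption-laden illustration rather than a recommended solution: the names BodyRegexSearch, bodyTextPattern and bodyContains are invented for the example, it assumes a single body element, and it treats a match as being "inside a tag" whenever the next angle bracket after it is ">", so a literal ">" in the page text will confuse it. Unlike stripping the tags first, it will not find a string that is split by markup, such as "The <b>String</b>".

```java
import java.util.regex.Pattern;

class BodyRegexSearch {
    /** Builds one pattern that finds needle as plain text inside the body element. */
    static Pattern bodyTextPattern(String needle) {
        return Pattern.compile(
                "(?s)"                        // let "." match line breaks too
                + "(?i:<body\\b[^>]*>)"       // opening body tag, any case, optional attributes
                + ".*?"                       // skip ahead as little as possible
                + Pattern.quote(needle)       // the search string, taken literally
                + "(?![^<>]*>)"               // reject it if the next angle bracket is ">"
                + "(?=.*(?i:</body\\s*>))");  // a closing body tag must still follow
    }

    static boolean bodyContains(String html, String needle) {
        return bodyTextPattern(needle).matcher(html).find();
    }

    public static void main(String[] args) {
        String inText = "<html><head><title>x</title></head>"
                + "<body><p>Find The String here</p></body></html>";
        String inTagOnly = "<html><head><title>The String</title></head>"
                + "<body><a title=\"The String\">link</a></body></html>";
        System.out.println(bodyContains(inText, "The String"));    // prints: true
        System.out.println(bodyContains(inTagOnly, "The String")); // prints: false
    }
}
```

The lookahead at the end rescans the rest of the page for every candidate match, so this is not necessarily faster than the substring approach; its main appeal is that no copy of the body is made.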
    [ November 06, 2007: Message edited by: Jan van Mansum ]
     
    sahid ul karim
    Greenhorn
    Posts: 20
    Can you please write some code for that?
     
    Bartender
    Posts: 9626
    Sahid, though we love to help people understand and write code, JavaRanch is Not A Code Mill. We will not do your work for you. Try taking Jan's suggestions into consideration and show us some code. We'll be glad to give you a hand if you Show Some Effort.
     