
searching keywords in webpage

 
Ranch Hand
Posts: 1907
What is the best way of searching for some keywords in a web page? In the database, I have a list of web pages. In each web page, I want to check whether certain keywords are present. I'm storing the web page data as a String. Is the String.indexOf() method a good choice? The size of each page may differ.
 
Ranch Hand
Posts: 48
Maintain a collection of the words you want to search for. Once the collection is in place, you can verify whether a particular word is present by using its contains method, which returns true if the word exists in the collection and false if it does not.

Sizeof would return the size of the string, which does not serve your purpose of searching.
 
Marshal
Posts: 69789
Welcome to the Ranch
 
Campbell Ritchie
Marshal
Posts: 69789
I am not convinced Sizeof exists in Java; it looks like a C keyword. Do you mean indexOf? You should use the String#contains method rather than indexOf if you only want to check for existence of a substring.
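A minimal sketch of the two checks side by side, to show that `contains` expresses the same existence test as comparing `indexOf` against -1, only more directly. The class and method names (`SubstringCheck`, `containsKeyword`, `containsKeywordViaIndexOf`) are hypothetical, chosen for illustration.

```java
public class SubstringCheck {
    // True if keyword occurs anywhere in text, as a substring.
    static boolean containsKeyword(String text, String keyword) {
        return text.contains(keyword);
    }

    // The same test written with indexOf: -1 means "not found".
    static boolean containsKeywordViaIndexOf(String text, String keyword) {
        return text.indexOf(keyword) != -1;
    }

    public static void main(String[] args) {
        String page = "Please shorten this page.";
        System.out.println(containsKeyword(page, "page"));           // true
        System.out.println(containsKeywordViaIndexOf(page, "page")); // true
    }
}
```

`indexOf` is only worth using when you also need the position of the match.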

For a linear search, you would have to iterate over the text of the web page once for every keyword. Also, what will happen for the keyword "short" if the text includes "shorten"? I think I shall work from Balaji Vankadaru's suggestion.
Put your keywords into a set. Split the text into a String[], maybe splitting on whitespace. Iterate over the split array and check whether the set contains each word.
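A minimal sketch of the set-based approach, assuming Java 9+ (for `Set.of`), that keywords are stored in lower case, and that any run of non-word characters (`\\W+`, which covers spaces and punctuation) separates words. Splitting on whitespace alone would leave tokens such as "code," with the comma attached. The class and method names (`KeywordSearch`, `findKeywords`) are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

public class KeywordSearch {
    // Returns the keywords that appear as whole words in text.
    // Assumes keywords are lower case; non-word characters separate words.
    static Set<String> findKeywords(String text, Set<String> keywords) {
        Set<String> found = new LinkedHashSet<>();
        // One pass over the text: split it into words...
        for (String word : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            // ...and do one hash lookup per word, instead of one scan per keyword.
            if (keywords.contains(word)) {
                found.add(word);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        Set<String> keywords = Set.of("short", "java");
        String page = "Shorten your Java code, then test it.";
        System.out.println(findKeywords(page, keywords)); // [java]
        // Plain substring matching would also report "short" here:
        System.out.println(page.toLowerCase(Locale.ROOT).contains("short")); // true
    }
}
```

Because only whole words are compared, "shorten" does not count as a match for "short", and the text is traversed once regardless of how many keywords there are.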
 
Arjun Shastry
Ranch Hand
Posts: 1907
Thanks.
 