
How to periodically compare the content at a URL

 
david luis
Ranch Hand
Posts: 53
Hi,

I'm a newbie in Java and I would like to write a program that compares a web page every X minutes, because I must send a notification if there are any changes to the page.
I don't know if there is some Java library that makes this easier.

Many thanks, and sorry for my English!
 
Steve Luke
Bartender
Posts: 4181
IntelliJ IDE Java Python
There is a class called URLConnection which will help you connect and communicate with the webpage. Here is a tutorial for using it: http://docs.oracle.com/javase/tutorial/networking/urls/readingWriting.html
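As a minimal sketch of reading a page with URLConnection, assuming the page is UTF-8 text, the whole body can be read into a String so it can later be compared with the previous copy. The class name PageFetcher and the example.com URL are hypothetical placeholders.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URI;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

public class PageFetcher {

    /** Reads the whole body of the given URL into a String, assuming UTF-8 text. */
    public static String fetch(URL url) throws IOException {
        URLConnection connection = url.openConnection();
        connection.setConnectTimeout(10_000); // give up if the server cannot be reached within 10 s
        connection.setReadTimeout(10_000);    // give up if the server stops sending data for 10 s
        StringBuilder body = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(connection.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                // Normalize line endings to '\n' so two fetches compare cleanly.
                body.append(line).append('\n');
            }
        }
        return body.toString();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical target page; replace it with the page you want to watch.
        System.out.println(fetch(URI.create("https://example.com/").toURL()));
    }
}
```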

There are easier-to-use third-party tools if you need more than a simple GET request to the website. One such tool is Apache HttpClient.

As for scheduling: there are tools to do this in Java (Timer, ScheduledExecutorService, ...), but this sort of scheduling is usually best implemented using an external scheduler such as *nix's cron, the Windows Task Scheduler, or Quartz.
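If you do schedule inside the JVM instead, a minimal sketch with ScheduledExecutorService might look like this. The class and method names are hypothetical, and the check itself is passed in as a Runnable where the fetch-and-compare logic would go.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class PagePoller {

    /**
     * Runs checkTask immediately and then again `period` units after each run finishes.
     * The caller owns the returned executor and must shut it down when done.
     */
    public static ScheduledExecutorService startPolling(Runnable checkTask, long period, TimeUnit unit) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                checkTask.run();
            } catch (RuntimeException e) {
                // If the task threw, the executor would suppress all later runs,
                // so log the failure and keep polling.
                e.printStackTrace();
            }
        }, 0, period, unit);
        return scheduler;
    }

    public static void main(String[] args) {
        // Hypothetical check; in practice this would fetch both pages and compare them.
        startPolling(() -> System.out.println("checking page..."), 5, TimeUnit.MINUTES);
    }
}
```

scheduleWithFixedDelay is used rather than scheduleAtFixedRate so that a slow download never causes two checks to overlap. The try/catch matters because, per the ScheduledExecutorService documentation, an exception thrown by a periodic task suppresses its subsequent executions.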
 
david luis
Ranch Hand
Posts: 53
Thanks Steve,
I know that with HttpClient I can get the HTML code, but is there a library to compare that HTML with another version so I can see whether there are any differences?
Thanks, and sorry for my English!
 
Steve Luke
Bartender
Posts: 4181
IntelliJ IDE Java Python
Ah, there is no good built-in tool for computing diffs (a diff is the set of differences between two Strings) in Java. All the building blocks are there (regex, substring matching/splitting, ...), but nothing that actually calculates the diff.

Google has a DIFF/MATCH/PATCH library implemented in a variety of languages: http://code.google.com/p/google-diff-match-patch/. I think it can be used for your purpose if you treat the text of the pages as simple Strings. Finding diffs can be complex and slow, so my suggestion would be to take the returned HTML and filter out all the bits you don't care about. You can usually use an HTML/XML parser to select the parts of the returned page you care about, as long as the page is well-formed.
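To give an idea of what calculating a diff involves, here is a minimal line-based sketch built on a longest-common-subsequence (LCS) table. This is a swapped-in textbook technique, not the diff-match-patch API, and the class name LineDiff is hypothetical. It reports the lines of the new page that are not part of the LCS with the old page, i.e. lines that were added or changed.

```java
import java.util.ArrayList;
import java.util.List;

public class LineDiff {

    /** Returns the lines of newText that are not matched, in order, by a line of oldText. */
    public static List<String> addedLines(String oldText, String newText) {
        String[] a = oldText.split("\n");
        String[] b = newText.split("\n");
        // lcs[i][j] = length of the longest common subsequence of a[i..] and b[j..]
        int[][] lcs = new int[a.length + 1][b.length + 1];
        for (int i = a.length - 1; i >= 0; i--) {
            for (int j = b.length - 1; j >= 0; j--) {
                lcs[i][j] = a[i].equals(b[j])
                        ? lcs[i + 1][j + 1] + 1
                        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
            }
        }
        // Walk the table from the top: equal lines are kept, and lines of b
        // that we step past without matching are additions.
        List<String> added = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < a.length && j < b.length) {
            if (a[i].equals(b[j])) {
                i++;
                j++;
            } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
                i++;                 // a[i] was removed from the old page
            } else {
                added.add(b[j]);     // b[j] is new on the current page
                j++;
            }
        }
        while (j < b.length) {
            added.add(b[j++]);       // anything left over in b is also new
        }
        return added;
    }
}
```

The table takes O(n·m) memory for n old and m new lines. That is fine for a filtered page, but it is one reason a dedicated library is preferable for large documents.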
 
Junilu Lacar
Bartender
Pie
Posts: 7465
Android Eclipse IDE IntelliJ IDE Java Linux Mac Scala Spring Ubuntu
Questions:
1. Are you interested in monitoring only one page or multiple pages?
2. Are the pages that you'll be watching external or internal to your local network?
3. What kind of updates are you watching for? Any update or just specific kinds of updates?
4. Is there an option to use something like an RSS feed to learn about interesting updates to the page instead?
 
david luis
Ranch Hand
Posts: 53
1. Two pages
2. External
3. The updates are new entries on the page
4. No RSS feed

Thanks
 
Steve Luke
Bartender
Posts: 4181
IntelliJ IDE Java Python
Steve Luke wrote:...You can usually use HTML/XML parsers to select the parts of the returned page you care about as long as the page is well-formed.

Which brings up another suggestion, depending on the differences you care about: you could use the XML/HTML parser to select the parts you care about and put them in a List or Set, then use the List/Set methods to remove the objects that were already present in the old list, leaving only the new ones.
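A minimal sketch of the List/Set approach, assuming the entries have already been extracted from each page as Strings (the class name NewEntries and the sample entries are hypothetical):

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class NewEntries {

    /**
     * Returns the entries present in currentEntries but not in previousEntries,
     * in the order they appear on the current page.
     */
    public static Set<String> findNew(List<String> previousEntries, List<String> currentEntries) {
        Set<String> fresh = new LinkedHashSet<>(currentEntries); // keeps page order
        fresh.removeAll(previousEntries);                        // drop everything already seen
        return fresh;
    }

    public static void main(String[] args) {
        // Hypothetical entries; in practice they would come from parsing each page.
        List<String> before = List.of("Entry 1", "Entry 2");
        List<String> after = List.of("Entry 3", "Entry 1", "Entry 2");
        System.out.println("New entries: " + findNew(before, after)); // prints "New entries: [Entry 3]"
    }
}
```

This only detects entries whose text is new: an entry that was edited shows up as new, and entries that disappeared are ignored, which suits the "new entries on the page" case.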
 
Winston Gutkowski
Bartender
Pie
Posts: 10243
Eclipse IDE Hibernate Ubuntu
david luis wrote:I'm a newbie in Java and I would like to write a program that compares a web page every X minutes, because I must send a notification if there are any changes to the page.

I'm not going to try and improve on the good advice already given, so I'm going to try a different tack:

Why do you need to do this?

It seems to me that what you've told us is how you want to implement something, rather than what it is you need to do. If you gave us a bit more background, we might be able to offer design alternatives to a "busy page check"; but right now all we can do is advise on possible implementations of that approach.

Winston
 