How to make a comparaison cyclic with a url

david luis
Ranch Hand

Joined: Mar 10, 2011
Posts: 46
Hi,

I'm new to Java and would like to write a program that checks a web page every X minutes, because I must send a notification whenever the page changes.
I don't know whether a Java library exists that would make this easier.

Many thanks, and sorry for my English!
Steve Luke
Bartender

Joined: Jan 28, 2003
Posts: 4181
    

There is a class called URLConnection which will help you connect and communicate with the webpage. Here is a tutorial for using it: http://docs.oracle.com/javase/tutorial/networking/urls/readingWriting.html

There are easier-to-use third-party tools if you get into a system that is more than just a simple GET request to the website. One such tool is Apache HttpClient.

As for scheduling: there are tools to do this in Java (Timer, ScheduledExecutorService, ...), but this sort of scheduling is usually best implemented with an external scheduler such as *nix's cron, the Windows Task Scheduler, or Quartz.
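
To make the in-Java option concrete, here is a minimal sketch (not a definitive implementation) that polls a page with URLConnection on a ScheduledExecutorService. The names PageWatcher, checkOnce, fetch, and watch are made up for illustration, and it assumes the page is UTF-8 text and that comparing the whole body as a String is good enough.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical helper class; none of these names come from a library.
class PageWatcher {
    private final Supplier<String> fetcher;   // how to obtain the current page text
    private final Consumer<String> onChange;  // what to do when the text differs
    private String lastSeen;                  // null until the first fetch

    PageWatcher(Supplier<String> fetcher, Consumer<String> onChange) {
        this.fetcher = fetcher;
        this.onChange = onChange;
    }

    // Fetches once; notifies and returns true only if the text differs
    // from the previous fetch. The very first fetch never counts as a change.
    synchronized boolean checkOnce() {
        String current = fetcher.get();
        boolean changed = lastSeen != null && !lastSeen.equals(current);
        lastSeen = current;
        if (changed) {
            onChange.accept(current);
        }
        return changed;
    }

    // Plain GET with URLConnection, reading the body as UTF-8 (an assumption).
    static String fetch(String address) {
        try {
            URLConnection connection = URI.create(address).toURL().openConnection();
            connection.setConnectTimeout(10_000);
            connection.setReadTimeout(10_000);
            StringBuilder body = new StringBuilder();
            try (BufferedReader in = new BufferedReader(new InputStreamReader(
                    connection.getInputStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = in.readLine()) != null) {
                    body.append(line).append('\n');
                }
            }
            return body.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Runs checkOnce immediately and then every `minutes` minutes.
    static ScheduledExecutorService watch(String address, long minutes,
                                          Consumer<String> onChange) {
        PageWatcher watcher = new PageWatcher(() -> fetch(address), onChange);
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            try {
                watcher.checkOnce();
            } catch (RuntimeException e) {
                // An uncaught exception would cancel all later runs, so log and carry on.
                System.err.println("Check failed: " + e.getMessage());
            }
        }, 0, minutes, TimeUnit.MINUTES);
        return scheduler;
    }
}
```

Something like PageWatcher.watch("http://example.com/", 5, page -> System.out.println("Page changed")) would start the checks, and calling shutdown() on the returned scheduler stops them. With cron or another external scheduler you would instead run a single check per invocation and keep the previous page text in a file.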


Steve
david luis
Ranch Hand

Joined: Mar 10, 2011
Posts: 46
Steve Luke wrote: There is a class called URLConnection which will help you connect and communicate with the webpage. Here is a tutorial for using it: http://docs.oracle.com/javase/tutorial/networking/urls/readingWriting.html

There are easier-to-use third-party tools if you get into a system that is more than just a simple GET request to the website. One such tool is Apache HttpClient.

As for scheduling: there are tools to do this in Java (Timer, ScheduledExecutorService, ...), but this sort of scheduling is usually best implemented with an external scheduler such as *nix's cron, the Windows Task Scheduler, or Quartz.


Thanks Steve,
I know that with HttpClient I can get the HTML code, but is there a library that compares one HTML document with another so I can see whether there are any differences?
Thanks, and sorry for my English!
Steve Luke
Bartender

Joined: Jan 28, 2003
Posts: 4181
    

Ah, there is no good built-in tool for computing diffs in Java (a diff is the set of differences between two Strings). All the building blocks are there (regex, sub-String matching/splitting, ...), but nothing is implemented to actually calculate the diff.

Google has a DIFF/MATCH/PATCH library implemented in a variety of languages: http://code.google.com/p/google-diff-match-patch/. I think it can be used for your purpose if you treat the text of the pages as simple Strings. Finding the diffs can be complex and take some time, so my suggestion would be to take the returned HTML and filter out all the bits that you don't care about. You can usually use HTML/XML parsers to select the parts of the returned page you care about, as long as the page is well-formed.
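
If all you need to know is whether the page changed, not what changed, a cheaper check can run before any real diff. The sketch below is a swapped-in technique, not the diff library above: it collapses whitespace and compares SHA-256 fingerprints from java.security.MessageDigest. The class and method names (PageFingerprint, normalize, sha256Hex, hasChanged) are hypothetical, and collapsing whitespace stands in for whatever filtering suits your pages.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper; a sketch of "filter first, then compare".
class PageFingerprint {

    // Collapses runs of whitespace so formatting-only changes are ignored.
    static String normalize(String html) {
        return html.replaceAll("\\s+", " ").trim();
    }

    // Returns the SHA-256 digest of the text as 64 lowercase hex characters.
    static String sha256Hex(String text) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be available on every Java platform implementation.
            throw new IllegalStateException(e);
        }
    }

    // True if the two pages differ after normalization.
    static boolean hasChanged(String oldHtml, String newHtml) {
        return !sha256Hex(normalize(oldHtml)).equals(sha256Hex(normalize(newHtml)));
    }
}
```

The point of hashing rather than comparing the Strings directly is that only the short fingerprint has to be stored between runs, which is handy when each check is a separate process started by an external scheduler.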
Junilu Lacar
Bartender

Joined: Feb 26, 2001
Posts: 4462
    

Questions:
1. Are you interested in monitoring only one page or multiple pages?
2. Are the pages that you'll be watching external or internal to your local network?
3. What kind of updates are you watching for? Any update or just specific kinds of updates?
4. Is there an option to use something like an RSS feed to learn about interesting updates to the page instead?


Junilu - [How to Ask Questions] [How to Answer Questions]
david luis
Ranch Hand

Joined: Mar 10, 2011
Posts: 46
Junilu Lacar wrote: Questions:
1. Are you interested in monitoring only one page or multiple pages?
2. Are the pages that you'll be watching external or internal to your local network?
3. What kind of updates are you watching for? Any update or just specific kinds of updates?
4. Is there an option to use something like an RSS feed to learn about interesting updates to the page instead?


1. Two pages
2. External
3. The updates are new entries on the page
4. No RSS

Thanks
Steve Luke
Bartender

Joined: Jan 28, 2003
Posts: 4181
    

Steve Luke wrote: ...You can usually use HTML/XML parsers to select the parts of the returned page you care about as long as the page is well-formed.

Which brings up another suggestion - depending on the differences you care about. You could use the XML/HTML parser to select the parts that you care about and put them in a List or Set. Then use List/Set methods to remove Objects that were present in the old list. For example:
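
A minimal sketch of that approach, assuming the parser has already turned each page entry into a String (the parsing step is left out, and NewEntryFinder and newEntries are hypothetical names):

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper; the entries would come from your HTML/XML parser.
class NewEntryFinder {

    // Returns the entries that are in currentEntries but not in oldEntries,
    // in the order they appear on the current page.
    static Set<String> newEntries(List<String> oldEntries, List<String> currentEntries) {
        Set<String> added = new LinkedHashSet<>(currentEntries);
        added.removeAll(oldEntries);  // drop everything that was already there
        return added;
    }
}
```

If the returned Set is not empty, the page has new entries and you can send your notification. Entries that were removed or reordered are ignored, which fits the case where only new entries matter.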
Winston Gutkowski
Bartender

Joined: Mar 17, 2011
Posts: 7779
    

david luis wrote: I'm new to Java and would like to write a program that checks a web page every X minutes, because I must send a notification whenever the page changes.

I'm not going to try and improve on the good advice already given, so I'm going to try a different tack:

Why do you need to do this?

It seems to me that what you've told us is how you want to implement something, rather than what it is you need to do. If you gave us a bit more background, we might be able to offer design alternatives to a "busy page check"; but right now all we can do is advise on possible alternatives.

Winston


Isn't it funny how there's always time and money enough to do it WRONG?
Articles by Winston can be found here
 