
Connection time out

 
Robert Houston
Ranch Hand
I am getting my feet slightly wet in this area of Java. I want to read HTML records directly from a website. I picked a very simple example off the net which looks something like this:
 
 
// READ HTML FILE FROM WEB
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;

public class ReadHTML {

    public static void main(String[] args) throws IOException {

        URL url = new URL("https://webcode.me");

        // try-with-resources closes the reader even if an exception is thrown
        try (BufferedReader br = new BufferedReader(new InputStreamReader(url.openStream()))) {

            String line;
            StringBuilder sb = new StringBuilder();

            while ((line = br.readLine()) != null) {
                sb.append(line);
                sb.append("\r\n");
            }

            System.out.println(sb);
        }
    }
}

When it executes I get the following error messages:
Exception in thread "main" java.net.ConnectException: Connection timed out: connect
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:352)
at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:214)
at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:201)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:378)
at java.net.Socket.connect(Socket.java:528)
at java.net.Socket.connect(Socket.java:477)
at sun.net.NetworkClient.doConnect(NetworkClient.java:163)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:395)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:530)
at com.ibm.net.ssl.www2.protocol.https.c.<init>(c.java:91)
at com.ibm.net.ssl.www2.protocol.https.c.a(c.java:60)
at com.ibm.net.ssl.www2.protocol.https.d.getNewHttpClient(d.java:2)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:796)
at com.ibm.net.ssl.www2.protocol.https.d.connect(d.java:60)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1044)
at com.ibm.net.ssl.www2.protocol.https.b.getInputStream(b.java:67)
at java.net.URL.openStream(URL.java:1011)
at ReadHTML.main(ReadHTML.java:14)


On my browser I can see //webcode.me, so it really exists.
I am missing something but what? Discussions talk about proxies, firewalls, etc.
I want to keep it as simple as possible.
Thanks!
 
Paul Clapham
Marshal

Robert Houston wrote:On my browser I can see //webcode.me, so it really exists.
You can? When I try to connect to it I eventually get a page which says "The server at webcode.me is taking too long to respond."

Oh. But that happens when I use the URL which you posted, which is an HTTPS URL. If I change it to HTTP then it connects just fine. You could try that too.
 
Robert Houston
Ranch Hand
Yes, you caught it. The "S" needs to be dropped, my bad.
What I ultimately want to do is read a fishing report from a URL (https://www.accuweather.com/en/us/mayport-fl/32233/fishing-weather/2245467) so that I can get the "fishing values" for the next x days. I fish.
Accessing this URL via my browser works well; from the small class example I have shown you, not so much. I've played around with the timeout value, including setting it to zero, but no dice. There is more to this than the timeout.
I guess it's not as straightforward as I thought.
 
Paul Clapham
Marshal
Considering that the AccuWeather apps aren't free, I would guess that trying to download their data isn't free either. I did a quick search and couldn't find out how to set up a contract with them but I'm guessing maybe you didn't really want to do that?
 
Robert Houston
Ranch Hand
I don't pay for their data just to view it. As far as I know, it's there for the public to see. Is there a dynamic here that I am unaware of, as in I must pay to read their data? It's my first attempt at this, so I approach the subject from a viewpoint of blissful ignorance. Thanks for taking the time to respond.
 
Paul Clapham
Marshal
If you want to view it online, then you're going to be sitting at a screen clicking buttons. That's a low volume of usage which they can afford to give away for free. It's a common sales technique.

Whereas if you want to download it using a program, that could be a very high volume of usage. And maybe they wouldn't want to give that away for free. Now, I don't know for sure that they have a program which charges you for downloads. I could be wrong. They might just prevent all access which isn't from a browser or one of their apps.

If the latter is the case then I've had success in the past by putting in a browser description in the User-Agent header. But if you're just getting your feet "slightly wet" then maybe you wouldn't want to go through that sort of thing.
 
Robert Houston
Ranch Hand
Life is often more complicated than we think. Just when I had an idea, it sinks in early sunset.

Thanks for your time!
 
Tim Moores
Saloon Keeper

Paul Clapham wrote:If the latter is the case then I've had success in the past by putting in a browser description in the User-Agent header.
Me too. I do that in a couple of apps that are low-volume, and thus unlikely to impact the site more than an actual human being clicking their way through it. Something like this may do the trick:
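A minimal sketch of the idea, using java.net.HttpURLConnection; the URL and the timeout values are placeholders, and the User-Agent string is just one I copied from a browser:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class BrowserLikeFetch {

    public static void main(String[] args) throws IOException {
        URL url = new URL("http://webcode.me");  // stand-in URL
        HttpURLConnection con = (HttpURLConnection) url.openConnection();

        // Present ourselves as an ordinary browser; some sites refuse
        // requests whose User-Agent looks like a bot or a library.
        con.setRequestProperty("User-Agent",
            "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:73.0) Gecko/20100101 Firefox/73.0");

        // Fail fast instead of hanging if the server doesn't answer
        con.setConnectTimeout(10_000);  // milliseconds
        con.setReadTimeout(10_000);

        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(con.getInputStream()))) {
            StringBuilder sb = new StringBuilder();
            String line;
            while ((line = br.readLine()) != null) {
                sb.append(line).append(System.lineSeparator());
            }
            System.out.println(sb);
        }
    }
}
```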



For one particular site I had to add con.setRequestProperty("Host", "site.ext"); as well, with "site.ext" being the domain the content was served from.
 
Tim Holloway
Saloon Keeper
Yes, weather services, including AccuWeather, do have APIs that allow downloading their data in a more machine-friendly format. Whether the pricing and restrictions are suitable, I leave to others to determine.

Screen-scraping is an ancient and common practice. There are resources specifically designed to help with stuff like that, such as Python's BeautifulSoup library.

As to the morality and ethics, that's another matter. As a general rule, I'd consider data posted on an open website to be copyrighted information, so capturing data for personal use would generally be "fair use". But lawyers can argue about that, and it gets even more fun when client and server are in radically different legal jurisdictions.

Still, if you only download once a day, it's unlikely that anyone is going to prosecute. It's when you pull it down over and over and repeat it to the world as though it's your own, or worse yet charge for it, that you have to worry. And as far as AccuWeather goes, for many years they had what amounted to their very own personal congressperson.

It's also worth considering whether there aren't suitable alternatives with more appropriate services. DarkSky is alleged to be the most open of the private weather services these days, though I'm not familiar with their products or licensing, only their reputation. The Weather Underground is worth checking out; they recently ate Intellicast.

And, of course, there's always the official source: the United States National Weather Service (NOAA), which has its own APIs. They're not ideal for general weather forecasting, but they can provide environmental predictions on an hour-by-hour basis. So while you can't get a simple "Today will be sunny", you can tell about how sunny (or otherwise) it will likely be at a given hour of the day. Well, actually, getting a "Today will be sunny with showers in the afternoon" isn't too hard, but I wanted something more formal that I could reduce to a one-word synopsis for a weather display device I've made, and that required some creativity.

If you do use the NWS web services, it will likely be a challenge at first, as they've tried various options over the years and getting the appropriate documentation requires some effort. Also, a lot of their positional data lives in a space of its own, so some determination and translation is required. In the end, however, I think it's worth it. And, unlike screen-scraping, an API tends to be more stable. Few web page designers are kind enough to put in IDs that can reliably lead you to the good stuff, so every time someone changes the display format, the screen scraper is likely to break.
 
Robert Houston
Ranch Hand
Tim Moores: I'd like to try your suggestion. The line that says "con.setRequestProperty("User-Agent", "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:73.0) Gecko/20100101 Firefox/73.0");" I need to change for my own environment (Edge). Where did you get this info for your environment? Can I construct it for my own, and how? Thanks

Tim Holloway - Hi Tim, you bring out a bunch of good points which I hadn't a clue they would be involved in what I want to do. The site I want to scrape provides many values but I am interested in only (2): Date & "Fishing Score". If the trouble factor gets too big then its not worth it since its all based upon a simple thought.

Remember back to your mainframe days: whenever an error condition occurred, you usually had one or more error messages giving you the reason(s) for the failure. In the server world, and especially in networking, you often get nothing. Why is this?
 
Tim Holloway
Saloon Keeper
Oh, Hi Bob! Who'da thunk you were interested in fishing data!

The errors are there, but they do come from many places. It's just a matter of knowing how to deal with them. It beats the infamous IEC141I / IEC000I from MVS (wrong length record OR tape drive is on fire.)

Basically when you talk to a web server, the two biggest problems are network-related or not getting what you wanted. Most network issues will throw an Exception and you can call printStackTrace() to dump the exception, its root causes and the code traceback.

If you connect with a web server, however, and it's operating properly, you'll get back a standard response code as defined in the Internet RFCs. For example, "404 Not Found" means your URL points somewhere that the server has no response for, "500" generally means that the server-side software had an internal error, and "200 OK" is the golden one.
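A sketch of checking the code before reading the body; the describe() helper is just illustrative, not part of any library:

```java
import java.net.HttpURLConnection;
import java.net.URL;

public class StatusCheck {

    // Per the HTTP spec, 2xx means success, 4xx a client-side problem
    // (bad URL, forbidden), and 5xx a server-side one.
    static String describe(int code) {
        if (code >= 200 && code < 300) return "OK";
        if (code >= 400 && code < 500) return "client error";
        if (code >= 500 && code < 600) return "server error";
        return "other";
    }

    public static void main(String[] args) throws Exception {
        HttpURLConnection con =
            (HttpURLConnection) new URL("http://webcode.me").openConnection();
        int code = con.getResponseCode();   // performs the actual request
        System.out.println(code + " -> " + describe(code));
    }
}
```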

I did a quick look-around. Apparently there's nothing like the freebie level API that Google gives you for Accuweather - they demand a minimum of $25 a month as I read it, and I don't think they even have an API for fishing forecasts. If you scrape their page, however, you can get the raw score from within the circle and the circle itself is easy to find, since it's defined as an SVG (scalable vector graphics) tag.

Being that we're a popular locale for year-round fishing, however, I did a quick check to see what other options are available. I liked this site: https://fishingbooker.com/reports/destination/us/FL/jacksonville which doesn't have a score, but does have the virtue of an easy-to-locate go/no-go indicator and a simple document structure. Just look for the CSS class tag "reports-feed-weather-fishing-desc". You can, of course, get Small Craft and offshore conditions from the NWS API.
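As a rough illustration of what "look for the class tag" means, here's a naive string-search version run on a made-up snippet; for real pages you'd want an actual HTML parser, since markup like nested tags or reordered attributes will break this:

```java
public class ClassTagGrab {

    // Naive extraction: find the element carrying the given CSS class
    // and return its text up to the next tag. Enough to show the idea;
    // brittle against real-world markup.
    static String textForClass(String html, String cssClass) {
        int at = html.indexOf("class=\"" + cssClass + "\"");
        if (at < 0) return null;
        int start = html.indexOf('>', at) + 1;
        int end = html.indexOf('<', start);
        return html.substring(start, end).trim();
    }

    public static void main(String[] args) {
        // Made-up snippet mimicking the structure described above
        String html = "<div class=\"reports-feed-weather-fishing-desc\">Good</div>";
        System.out.println(textForClass(html, "reports-feed-weather-fishing-desc"));
    }
}
```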

I checked briefly to see what the current tech for screen scraping in Java is, and it appears to be Jaunt. Don't know anything about it, though. I've been doing mine in Python using BeautifulSoup.

My own weather data collection is being done by an app I wrote in NodeJS. It runs on one of my servers and about 4AM every morning it polls the NWS API to get daily forecast information, which gets bundled up along with data from another machine that's picking up temperature and humidity from some wireless sensors I got from Target. And time-of-day from NIST. This aggregate is itself a web service that's called by the remote device I built that's mounted on a kitchen cabinet. That's got an e-paper display for low energy consumption plus a WiFi interface chip to pull the data and present it on the display. Cost me under $50 to build and that's including the relatively expensive e-ink display unit.

Incidentally, if you really want your mind blown, let me tell you about that WiFi interface. Although this particular one is soldered onto the back of the e-paper display, I've also got a box of them for stand-alone use. I've got one going into a rain gauge repair, in fact. The unit itself is about 1.5 inches long by half an inch wide. It costs under $5. And it has enough RAM and processing power that people have actually programmed it to emulate the Apple II PC. If I was really brave, I'd see if I could compile the Hercules mainframe emulator to run on it. My current record for Hercules is a $35 credit-card sized Raspberry Pi, using a Bart Simpson USB stick for the DASD farm. Which I think is probably faster and with more capacity than the old Amdahl V6 I left behind at CPI.

Sometimes modern tech scares me.
 
Paul Clapham
Marshal

Robert Houston wrote:Tim Moores: I'd like to try your suggestion. the line that says: "con.setRequestProperty("User-Agent", "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:73.0) Gecko/20100101 Firefox/73.0");" I need to change for my own environment (Edge). Where did you get this info for your environment. Can I construct it for my own and how?
I used the User-Agent property for my own environment too, but really I don't think you have to. If you tell the server that you're Firefox running on a Macintosh, whereas you're really a Java program running on Windows, the server isn't going to care. At least probably not. It has been known for web applications to look at the user agent and return different things for different browsers. But for the moment that's an advanced topic which probably won't affect you. (I think I found out what my user agent string was by running HttpFox in Firefox to monitor the HTTP traffic. Don't know what features Edge has for that.)

My code which uses this property actually scrapes a whole site, about 10000 pages. But it has a 5-to-10-second delay between each access so it doesn't burn up the server, so it takes a couple of days to run. And I only run it once a year. And it is for personal use only.
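That throttling can be as simple as a randomized pause between fetches; a sketch, with placeholder page names and (for the demo) much shorter delays than the real 5-to-10 seconds:

```java
import java.util.Random;

public class PoliteDelay {

    static final Random RAND = new Random();

    // A randomized pause between requests, so the scraper never hits
    // the server at a fixed, machine-like rhythm.
    static long nextDelayMillis(long minMillis, long maxMillis) {
        return minMillis + (long) (RAND.nextDouble() * (maxMillis - minMillis));
    }

    public static void main(String[] args) throws InterruptedException {
        String[] pages = {"/page1", "/page2", "/page3"};  // stand-ins
        for (String page : pages) {
            System.out.println("fetching " + page);  // real fetch would go here
            // Demo values; against a real site use 5_000 and 10_000
            Thread.sleep(nextDelayMillis(50, 100));
        }
    }
}
```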
 
Tim Holloway
Saloon Keeper
I think the main use of the User-Agent header was so that the website could sneer at you for not using Internet Exploder so their ActiveVirus controls wouldn't work right.

Beyond that, it's sometimes collected to help thoughtful designers optimize the site for the most common types of visitors. Nah, who are we kidding? They were told to "just Git 'er Dun!" and go on to the next overdue project.

Paul Clapham wrote:My code which uses this property actually scrapes a whole site, about 10000 pages. But it has a 5-to-10-second delay between each access so it doesn't burn up the server, so it takes a couple of days to run. And I only run it once a year. And it is for personal use only.
As opposed to Google's indexer that runs rampant once a day. Which is why nofollow and robots.txt are your friends when developing large sites.

Incidentally, I used the NWS API to pull this data. It's for Puerto Rico, since there are no bulletins active for closer to home, and I can't guarantee that there ever will be. A lot of working with the NWS stuff is trial and error, because you can never tell what you'd think should work versus what actually does. And incidentally, the NWS zone IDs for Fernandina South to St. Augustine are AMZ453 (less than 20 miles offshore) and AMZ470 (20-40 miles offshore).

So here goes:
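A hedged sketch of what pulling active bulletins for one of those zones might look like, using the api.weather.gov alerts endpoint; the contact string in the User-Agent is a placeholder (the NWS asks callers to identify themselves):

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class NwsMarineAlerts {

    // Builds the active-alerts URL for an NWS marine zone such as AMZ453
    static String alertsUrlFor(String zoneId) {
        return "https://api.weather.gov/alerts/active?zone=" + zoneId;
    }

    public static void main(String[] args) throws Exception {
        HttpURLConnection con = (HttpURLConnection)
            new URL(alertsUrlFor("AMZ453")).openConnection();
        // Identify the caller; replace with your own app name and contact
        con.setRequestProperty("User-Agent", "fishing-forecast-demo (you@example.com)");
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(con.getInputStream()))) {
            br.lines().forEach(System.out::println);   // raw GeoJSON response
        }
    }
}
```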
 
Robert Houston
Ranch Hand
Hello all of you,

Thanks for all of the input! When I first went down this path I didn't realize how much more there is in this world of Java & network access.
Funny thing: the app in question said "Go fishing on Monday; we score it a 9 out of 10." So I go, and it's very slow going. I come back home, get back on the app, and suddenly they score the same day a 5! So there goes any incentive for me to pursue anything here in the future. I'll retire from this endeavor.

Tim Holloway - I remember you as likely the smartest person I have ever met. It's amazing how much info you have collected, and I'm guessing most of it is on your own. Thanks, Bob
 
Tim Holloway
Saloon Keeper
Thanks for the compliment, Bob! It probably has more to do with spending time reading technical articles when I probably should have been fishing. Well, maybe not fishing, since that's not my style. And I'm not totally nerdish. I'm biting my nails about the front roaring in, since if we stay warm enough, I'll have a crop of pineapples in my garden next August.

Weather predictions are tricky, though. I read yesterday that an Xbox One game console has the same processing power that the ALLTEL mainframes did back in Y2K. And while, granted, IBM was using 600MHz cores back then, when down in my department we were using 1.1GHz processors to provide the raw numbers that would eventually trigger the Great Recession, that's still quite a change. But despite that, NOAA is about to take delivery of a new processor system with literally thousands of cores, because weather prediction is so complex. And that's just their central facility. They get help from other places - your alma mater, for one, I think.

One thing I've noticed about weather systems locally, however, is that fronts very often miss schedule by slowing down about the time they reach Tallahassee, and that often, the heaviest weather in them lifts to the North and misses us. A lot of the winter weather brutality that much of the country has seen this year has done that, which is why we've not had a freeze here all winter (so far!).

But there are really two things you'd look for in a fishing forecast: 1. Weather, so your 3-hour tour doesn't go awry, and 2. Fish. I don't think AccuWeather has a clue as to what fish are doing within 500 miles of here. There are better local sources for that information. Some of them get regular marine-band radio information in real time from boats on the scene.

At any rate, don't let this be the end of your experiments. You may yet find something useful. Here's where my own efforts led me:

[weather.jpg: Weather display getting data from web sources]
 