Web Crawler Exercise

 
Ranch Hand
I just completed the Web Crawler exercise (at "127.0.0.1:3999/concurrency/10") and therefore the whole Go Tour. But I'm just kind of wondering. The exercise was to create a web crawler that explored every URL on a page, and for each such URL every URL on the page that URL referred to, and so on and on forever recursively. (Well, not quite forever; there was a depth limit built into it.) My code that accomplished it was:

But note that in order to get it to work I had to put a call to "time.Sleep(time.Second)" in my main function. Without that line, the main function would return, terminating the program, before very many calls to "Crawl()" had been executed. Is there some way in Go to tell the main function to wait and stay alive until all currently executing goroutines have completed?

I was thinking one way I could implement that would be to add an integer "Count" field to my "SafeMap" struct, increment it before each call to "go Crawl(sm, u, depth-1, fetcher)", and only decrement it at the end of the "Crawl()" function, and then have my main function loop on that "Count" variable until it was zero again. That seems kind of drastic though. Anybody have any better ideas?
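
The counter described above is essentially what sync.WaitGroup in Go's standard library provides: Add before starting a goroutine, Done when it finishes, and Wait to block until the count is back at zero. Below is a minimal sketch of that approach, not the exercise's official solution. The pages map, the visitSet type, and the countReachable helper are hypothetical stand-ins for the tour's fake fetcher and the SafeMap mentioned above.

```go
package main

import (
	"fmt"
	"sync"
)

// pages is a hypothetical link graph standing in for the tour's fake fetcher.
var pages = map[string][]string{
	"a": {"b", "c"},
	"b": {"a", "d"},
	"c": {"d"},
	"d": {},
}

// visitSet records which URLs have been seen; the mutex stays private to its methods.
type visitSet struct {
	mu   sync.Mutex
	seen map[string]bool
}

// add marks url as seen and reports whether this was the first visit.
func (s *visitSet) add(url string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.seen[url] {
		return false
	}
	s.seen[url] = true
	return true
}

// crawl visits url and starts one goroutine per outgoing link.
func crawl(url string, depth int, s *visitSet, wg *sync.WaitGroup) {
	defer wg.Done() // decrement when this call finishes
	if depth <= 0 || !s.add(url) {
		return
	}
	for _, u := range pages[url] {
		wg.Add(1) // increment before the goroutine starts
		go crawl(u, depth-1, s, wg)
	}
}

// countReachable blocks until every crawl goroutine has finished.
func countReachable(start string, depth int) int {
	s := &visitSet{seen: make(map[string]bool)}
	var wg sync.WaitGroup
	wg.Add(1)
	go crawl(start, depth, s, &wg)
	wg.Wait() // replaces time.Sleep in main
	return len(s.seen)
}

func main() {
	fmt.Println(countReachable("a", 4)) // prints 4
}
```

Because each child is added to the WaitGroup before its parent calls Done, the count cannot reach zero while work is still pending.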
 
Sheriff
Kevin, the code tag doesn't currently recognize "go" as a language that it can prettify, so don't set that attribute for now. I'll see what I can do to add "go" as a language that the code tags recognize.
 
Junilu Lacar
Sheriff
I don't think you should be changing the function signature(s) to include your SafeMap as a parameter. Since goroutines run in the same address space, the SafeMap would be shared by the functions in your program. It's the methods of your SafeMap that should call mutex.Lock() and mutex.Unlock() to serialize access to the encapsulated map. Your implementation "reaches into" the object and manipulates the mutex directly. That breaks encapsulation.

That is, your implementation should have something like this:
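
The snippet that followed is not preserved in this copy of the thread. As a hedged sketch of the encapsulation being described, and not the original poster's or the reply's actual code, the following assumes a SafeMap that wraps a map[string]int as in the question. The NewSafeMap, Inc, and Get names are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// SafeMap guards a map with a mutex that callers never touch directly.
type SafeMap struct {
	mu sync.Mutex
	m  map[string]int
}

// NewSafeMap returns an empty SafeMap that is ready to use.
func NewSafeMap() *SafeMap {
	return &SafeMap{m: make(map[string]int)}
}

// Inc increments the count for key and returns the new value.
func (s *SafeMap) Inc(key string) int {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.m[key]++
	return s.m[key]
}

// Get returns the count for key and whether the key is present.
func (s *SafeMap) Get(key string) (int, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	n, ok := s.m[key]
	return n, ok
}

func main() {
	sm := NewSafeMap()
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			sm.Inc("https://golang.org/") // no locking at the call site
		}()
	}
	wg.Wait()
	n, _ := sm.Get("https://golang.org/")
	fmt.Println(n) // prints 100
}
```

Callers only see Inc and Get, so the locking discipline lives in one place and cannot be forgotten at a call site.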
 
Junilu Lacar
Sheriff
Also, the map they refer to is supposed to act as a cache, so you can avoid going to the Fetcher more than once for each URL. A cache usually holds the same kind of thing you get from the original source. Your SafeMap holds a map[string]int, which is not the same thing you get from the original source. I would look at the Fetch function's signature to see what kind of map the SafeMap should hold. As an object, I think the SafeMap should have Fetch() and Put() methods.
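
One way to read that suggestion is sketched below. It is only an illustration under assumptions: the Result type is hypothetical and simply mirrors the three values that the tour's Fetcher.Fetch returns (body, urls, err), and the exact Fetch and Put signatures are guesses, since the post does not spell them out.

```go
package main

import (
	"fmt"
	"sync"
)

// Result is a hypothetical record of what one fetch returned.
type Result struct {
	Body string
	URLs []string
	Err  error
}

// SafeMap caches fetch results by URL behind a private mutex.
type SafeMap struct {
	mu sync.Mutex
	m  map[string]Result
}

// NewSafeMap returns an empty cache.
func NewSafeMap() *SafeMap {
	return &SafeMap{m: make(map[string]Result)}
}

// Fetch returns the cached result for url and whether it was present.
func (s *SafeMap) Fetch(url string) (Result, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	r, ok := s.m[url]
	return r, ok
}

// Put stores the result for url, replacing any earlier entry.
func (s *SafeMap) Put(url string, r Result) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.m[url] = r
}

func main() {
	cache := NewSafeMap()
	url := "https://golang.org/"
	if _, ok := cache.Fetch(url); !ok {
		// Cache miss: this is where the real Fetcher would be called.
		cache.Put(url, Result{Body: "The Go Programming Language"})
	}
	r, ok := cache.Fetch(url)
	fmt.Println(ok, r.Body)
}
```

Note that a separate Fetch followed by Put leaves a window in which two goroutines can both miss and both fetch the same URL. If that matters, a single method that checks and reserves the entry under one lock would close the gap.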
 
Junilu Lacar
Sheriff
You might also want to look at the example under the "Parallel digestion" section on this page: https://blog.golang.org/pipelines
 
Junilu Lacar
Sheriff
Instead of sleeping, you can wait on a channel that gets closed when no more new URLs are found, i.e., all URLs are retrieved from the cache.
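
A rough sketch of that idea, under assumptions: each crawl call is handed a done channel that it closes once it and all of its children have finished, so main only has to receive from the root channel. The links map, the visited type, and the crawlAndWait helper are hypothetical stand-ins for the tour's fake fetcher and cache, and this is one possible arrangement rather than the only channel-based design.

```go
package main

import (
	"fmt"
	"sync"
)

// links is a hypothetical link graph standing in for the tour's fake fetcher.
var links = map[string][]string{
	"a": {"b", "c"},
	"b": {"a", "d"},
	"c": {"d"},
	"d": {},
}

// visited remembers which URLs have already been crawled.
type visited struct {
	mu   sync.Mutex
	seen map[string]bool
}

// first reports whether url is being seen for the first time.
func (v *visited) first(url string) bool {
	v.mu.Lock()
	defer v.mu.Unlock()
	if v.seen[url] {
		return false
	}
	v.seen[url] = true
	return true
}

// crawl closes done only after it and all of its children have finished.
func crawl(url string, depth int, v *visited, done chan<- struct{}) {
	defer close(done)
	if depth <= 0 || !v.first(url) {
		return
	}
	var children []chan struct{}
	for _, u := range links[url] {
		c := make(chan struct{})
		children = append(children, c)
		go crawl(u, depth-1, v, c)
	}
	for _, c := range children {
		<-c // receiving from a closed channel returns immediately
	}
}

// crawlAndWait blocks on the root channel instead of sleeping.
func crawlAndWait(start string, depth int) int {
	v := &visited{seen: make(map[string]bool)}
	done := make(chan struct{})
	go crawl(start, depth, v, done)
	<-done
	return len(v.seen)
}

func main() {
	fmt.Println(crawlAndWait("a", 4)) // prints 4
}
```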
 
Rancher
Sorry to jump in with a negative comment, but the code posted looks like Java code that is accidentally written in Go. It's a long way from idiomatic Go.

For example, idiomatic Go code doesn't sleep to wait for goroutines; it uses channels. And the mutex usage looks straight out of Henry Wong's Java Threads book.

Don't feel bad, most folks write in their old language when learning a new one. But to see Go's strengths, you have to write idiomatic Go.
 