I just completed the Web Crawler exercise (at "127.0.0.1:3999/concurrency/10") and with it the whole Go Tour, but I'm left wondering about something. The exercise was to create a web crawler that explored every URL on a page, then every URL on each of the pages those URLs referred to, and so on recursively. (Well, not quite forever; there was a depth limit built in.) My code that accomplished it was:
But note that in order to get it to work I had to put a call to "time.Sleep(time.Second)" in my main function. Without that line, the main function would return, terminating the program, before very many calls to "Crawl()" had executed. Is there some way in Go to tell the main function to wait and stay alive until all currently executing goroutines (lightweight threads) have finished?
I was thinking one way I could implement that would be to add an integer "Count" field to my "SafeMap" struct, increment it before each call to "go Crawl(sm, u, depth-1, fetcher)", and only decrement it at the end of the "Crawl()" function, and then have my main function loop on that "Count" variable until it was zero again. That seems kind of drastic though. Anybody have any better ideas?
Kevin, the code tag doesn't currently recognize "go" as a language that it can prettify, so don't set that attribute for now. I'll see what I can do to add "go" as a language that the code tags recognize.
I don't think you should be changing the function signature(s) to include your SafeMap as a parameter. Since goroutines run in the same address space, the SafeMap would be shared by functions in your program. It's the methods in your SafeMap that would use mutex.Lock() and mutex.Unlock() to serialize access to the encapsulated Map. Your implementation "reaches into" the object and manipulates the mutex. That breaks encapsulation.
That is, your implementation should have something like this:
Also, the map they refer to is supposed to act as a cache, so you can avoid going to the Fetcher more than once per URL. A cache usually holds the same kind of thing you get from the original source. Your SafeMap holds a map[string]int, which is not what the source gives you. I would look at the Fetch function's return values to see what kind of map the SafeMap should hold. As an object, I think the SafeMap should have Fetch() and Put() methods.