Peter Hoppe wrote: I am working on a web project and needed some test URLs to test a URL input validator for a web form. Sadly, I found no test data set of real-world URLs, so I wrote a random URL generator which creates HTTP(S) links. I'd like the generator to create valid links so I can run my tests with those randomly generated URLs. For reference I delved into RFC 1738 (ouch, 20 years old), though I deviated slightly (e.g. I'm creating https URLs, which aren't mentioned in RFC 1738). See below for the generator's source code. Could you kindly comment on the code?
Well, my first one is that it's rather long.
We're all volunteers here, and asking people to plough through over 400 lines of code, however nicely documented and formatted (and it certainly appears to be that), isn't likely to produce a lot of responses. Is there any way you could shorten it? - eg, perhaps only include the parts you think might be causing problems.
Second: I'm not quite sure what a "random" URL generator will give you - even if it can spew out valid URLs for you - because unless the URL actually exists, you won't be able to connect to it. Perhaps you could explain how you intend to use it.
About the only thing I could imagine it might be useful for is checking whether a URL "validator" actually works; and it would seem to me that you then have a bit of a "chicken and egg" situation:
You can't validate a URL without knowing what a "valid" one looks like.
You can't write a program to generate valid URLs without knowing what one looks like.
and, unless you're very careful, you could easily end up creating a validator that simply reverse-engineers your generator - including any mistakes it makes.
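One way to loosen that loop is to check the generator's output against something that was not derived from the validator. A minimal sketch in Python, assuming the standard library's `urllib.parse` is acceptable as a second opinion; it is a lenient parser rather than an RFC validator, so this only catches gross errors, and the function name `looks_like_http_url` is made up for illustration:

```python
from urllib.parse import urlsplit


def looks_like_http_url(url: str) -> bool:
    """Coarse independent check: http/https scheme, a host, and a sane port."""
    try:
        parts = urlsplit(url)
        # Reading .port raises ValueError for a non-numeric or out-of-range port.
        _ = parts.port
    except ValueError:
        return False
    return parts.scheme in ("http", "https") and bool(parts.hostname)
```

Any generated URL that fails a check like this points at a bug in the generator rather than in the validator, which helps keep the two from sharing the same mistakes.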
Third: Even assuming that there are uses for such an animal, a truly random generator is only likely to be good for "smoke tests", and may never (or only very rarely) produce "corner cases" - ie, URLs that are particularly long, problematic, or obscure.
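To illustrate the point about corner cases, here is a rough sketch in Python of mixing a hand-picked list with seeded random output. Everything in it is an assumption for illustration: `random_url` is a tiny stand-in, not the generator from the original post, and the corner-case list is neither exhaustive nor a statement about what the validator should accept.

```python
import random
import string
from typing import List

# Hand-picked inputs worth probing explicitly (illustrative only).
CORNER_CASES = [
    "http://a.b/",                               # very short host
    "https://example.com:65535/",                # highest port number
    "http://example.com/" + "a" * 2000,          # very long path
    "http://example.com/%7Euser?q=1&q=2#frag",   # escapes, query, fragment
]


def random_url(rng: random.Random) -> str:
    """Hypothetical stand-in for a random URL generator."""
    scheme = rng.choice(["http", "https"])
    labels = [
        "".join(rng.choice(string.ascii_lowercase) for _ in range(rng.randint(1, 10)))
        for _ in range(rng.randint(2, 4))
    ]
    return scheme + "://" + ".".join(labels) + "/"


def build_test_inputs(seed: int, count: int) -> List[str]:
    """Corner cases first, then `count` random URLs from a seeded RNG."""
    rng = random.Random(seed)  # fixed seed so a failing run can be reproduced
    return CORNER_CASES + [random_url(rng) for _ in range(count)]
```

Seeding the generator means a URL that trips the validator can be produced again on the next run, and the explicit list guarantees the awkward inputs are always exercised instead of turning up by chance.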
Fourth: Have you tried looking for an existing library to do this? My Google for "valid URL generator" produced a slew of results; although I have to admit that none leap out at me as a "solution" for what you appear to want. I do know that there are any number of solutions (most involving regular expressions) for validating a URL though.
Peter Hoppe wrote: Thank you for your thoughts and taking the time to write!
Peter Hoppe wrote: I try to write well-formatted, well-documented code and use the Taligent naming conventions. I have always found them very useful.
Peter Hoppe wrote: The validator uses regular expression(s), so I am basically testing those regular expressions.
Jamie Zawinski, in a Tue 12 Aug 1997 Usenet post wrote: Some people, when confronted with a problem, think “I know, I’ll use regular expressions.” Now they have two problems.