Over the weekend, I spent a number of hours tracking down a bug caused by the cache in the Python urlparse module. The problem has already been reported as Python bug 1313119, but has not been fixed yet.
First a bit of background. The urlparse module does what you'd expect and parses a URL into its components:
>>> from urlparse import urlparse >>> urlparse('http://www.gnome.org/') ('http', 'www.gnome.org', '/', '', '', '') As well as accepting byte strings (which you'd be using at the HTTP protocol level), it also accepts Unicode strings (which you'd be using at the HTML or XML content level):