Today has been focused on fixing bugs that just so happen to all relate to content retrieval.
With the volume of web pages Simply Testable retrieves and tests daily, I always think I’ve seen all the ways in which people can produce HTML incorrectly.
With the volume of code I write daily for Simply Testable, I always think I’ve covered (nearly) all the ways in which content retrieval can go wrong.
I’m still amazed by what I find.
Today’s latest bug fixes:
We weren’t correctly deriving the full absolute URL of the referenced resource in some cases where a relative URL was used.
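As a general illustration (not the code Simply Testable runs), resolving a relative reference against the URL of the page it appears on is defined by RFC 3986, and Python’s standard `urllib.parse.urljoin` implements those rules. This is a minimal sketch; the helper name `absolute_url` and the example URLs are made up for illustration.

```python
from urllib.parse import urljoin


def absolute_url(page_url: str, reference: str) -> str:
    """Resolve a reference found on page_url into an absolute URL.

    urljoin follows the RFC 3986 resolution rules, so it handles
    '../' segments, root-relative paths, protocol-relative '//host'
    references and query-only references.
    """
    return urljoin(page_url, reference)


# Hypothetical page and references, purely for illustration.
page = "http://example.com/blog/2013/post.html"
for ref in ["style.css", "../images/logo.png", "/favicon.ico",
            "//cdn.example.com/app.js", "?page=2"]:
    print(f"{ref:28} -> {absolute_url(page, ref)}")
```

Gluing strings together by hand is an easy way to get this wrong: for `../images/logo.png` on the page above, the correct result is `http://example.com/blog/images/logo.png`, and a query-only reference such as `?page=2` keeps the page’s own path.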
Ignoring invalid Content-Type attributes
The HTTP Content-Type header describes the type of content present in an HTTP response, optionally followed by attributes (parameters) such as charset.
Here’s a common example:
Here’s a bad example:
Our library for parsing this header wasn’t prepared for input quite as invalid as the bad example above.
I updated the library to allow invalid attributes to be ignored entirely.
Feel free now to return Content-Type headers with gibberish attributes if you fancy.
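The library itself isn’t shown here. As a rough sketch of the idea, assuming Python and a hypothetical `parse_content_type` helper, a lenient parser keeps the media type plus any well-formed `name=value` attributes and silently drops everything else. It deliberately simplifies things: it splits on `;` without honouring semicolons inside quoted values, and it doesn’t unescape quoted strings.

```python
import re
from typing import Dict, Optional, Tuple

# RFC 7230 "token" characters, used for type, subtype and attribute names.
_TOKEN = re.compile(r"^[!#$%&'*+.^_`|~0-9A-Za-z-]+$")


def parse_content_type(header: str) -> Tuple[Optional[str], Dict[str, str]]:
    """Split a Content-Type header into (media type, attributes).

    Attributes that are not well-formed name=value pairs are skipped
    instead of causing the whole header to be rejected. The media type
    is None if it isn't a valid type/subtype pair.
    """
    # Simplification: a ';' inside a quoted value would be split too.
    parts = header.split(";")
    media_type = parts[0].strip().lower()
    type_ok = media_type.count("/") == 1 and all(
        _TOKEN.match(piece) for piece in media_type.split("/")
    )

    attributes: Dict[str, str] = {}
    for raw in parts[1:]:
        name, sep, value = raw.partition("=")
        name, value = name.strip().lower(), value.strip()
        if not sep or not _TOKEN.match(name) or not value:
            continue  # invalid attribute: ignore it rather than fail
        if len(value) >= 2 and value[0] == value[-1] == '"':
            value = value[1:-1]  # drop the quotes around a quoted value
        attributes[name] = value

    return (media_type if type_ok else None), attributes


print(parse_content_type("text/html; charset=utf-8; foo; =bar; ;"))
```

For that made-up header the stray `foo`, `=bar` and empty attributes are discarded, leaving `('text/html', {'charset': 'utf-8'})`.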
Handling cURL errors 6 and … 37
Somewhere towards the bottom of the stack we use to retrieve web content sits a library called cURL. When something goes wrong, it tells us in considerable detail why.
This fix covers two specific cURL errors that we had been seeing but not catching.
Firstly, there is the quite-likely-to-occur error 6, which more or less means that the domain name in a URL does not exist.
Secondly, there is the blimey-did-that-happen error 37, which cURL raises when it can’t read a file given with a file:// URL. You’ll hit it if you try to use a file:// URL on the Internet.
Feel free now to make such mistakes; I don’t mind at all.
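libcurl’s documentation names these codes `CURLE_COULDNT_RESOLVE_HOST` (6) and `CURLE_FILE_COULDNT_READ_FILE` (37, a file given with `file://` couldn’t be opened). As a sketch of the general approach, assuming Python and hypothetical names (`RetrievalFailure`, `describe_curl_error`, `is_retrievable`), known codes can be turned into an ordinary failure result rather than an uncaught exception, and non-HTTP URLs can optionally be filtered out before they reach cURL at all. How the numeric code is obtained depends on the cURL binding in use and isn’t shown.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit

# libcurl CURLcode values handled in this sketch.
CURLE_COULDNT_RESOLVE_HOST = 6     # host name in the URL could not be resolved
CURLE_FILE_COULDNT_READ_FILE = 37  # file:// URL could not be opened

# Hypothetical, human-readable explanations for known codes.
KNOWN_CURL_ERRORS = {
    CURLE_COULDNT_RESOLVE_HOST: "domain name does not resolve",
    CURLE_FILE_COULDNT_READ_FILE: "file:// URL cannot be read",
}


@dataclass
class RetrievalFailure:
    """A retrieval failure reported as data instead of an uncaught error."""
    url: str
    curl_code: int
    reason: str


def describe_curl_error(url: str, curl_code: int) -> RetrievalFailure:
    """Map a raw cURL error code to a failure result, with a fallback for unknown codes."""
    reason = KNOWN_CURL_ERRORS.get(curl_code, f"unhandled cURL error {curl_code}")
    return RetrievalFailure(url, curl_code, reason)


def is_retrievable(url: str) -> bool:
    """Optional guard: only hand http(s) URLs to the fetcher."""
    return urlsplit(url).scheme.lower() in ("http", "https")


print(describe_curl_error("http://no-such-domain.invalid/", CURLE_COULDNT_RESOLVE_HOST))
print(is_retrievable("file:///C:/Users/me/index.html"))
```

Keeping a fallback reason for unknown codes means a third surprising cURL error would still be reported rather than crashing the retrieval.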