asynchronous http requests in python

25 Jan 2007

At work I’ve been having a good time writing lots of Python code of late. Python’s a very nice language, and it comes with a good set of modules that make it easy to get going. For example, I’ve been needing to fetch things from a server using HTTP requests, and the provided urllib generally does a fine job of this.

However, it all came unstuck when I wanted to do an asynchronous URL fetch: I wanted to connect to the socket, make an HTTP GET request, and then get on with other work whilst the server thought about it. urllib unfortunately doesn’t support asynchronous requests. One solution is to just open a socket, manually issue the HTTP request, and use the select system call to be alerted when there’s an update. That’s fine, but then I need to parse the response, which is a bunch of HTTP headers and encoded text, and there’s nowhere in urllib or httplib that obviously wants me to hand it the string of an HTTP response and have it parse it and just give me the data. The code is clearly in there, but there’s no interface for it. Another option is to write my own HTTP parsing code, but that’s annoying and time consuming, and why should I have to when there’s already tried and tested code that does it?
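To make the raw-socket approach concrete, here’s a minimal sketch of it. It assumes Python 3 rather than the 2.4 I was using, and the helper names build_get_request and poll_readable are made up for illustration. It shows the easy half of the problem only: sending the request and asking select whether anything has come back yet.

```python
import select
import socket


def build_get_request(host, path):
    # A bare-bones HTTP/1.0 GET request; the server closes the
    # connection once it has sent the whole response.
    return ("GET %s HTTP/1.0\r\nHost: %s\r\n\r\n" % (path, host)).encode("ascii")


def poll_readable(sock, timeout=0.0):
    # Ask select whether the socket has data waiting. A timeout of 0
    # returns immediately, so this can be called from an existing run loop.
    readable, _, _ = select.select([sock], [], [], timeout)
    return sock in readable
```

What recv hands back once the socket is readable is the raw status line, headers and body all in one lump of bytes, which is exactly the bit I didn’t want to parse by hand.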

In the end, I came up with this solution, which I only blog here in case it either helps someone else googling for an answer like I did, or prompts someone to point out how wrong I am and tell me there’s a better way. In httplib there’s a class called HTTPResponse, which the documentation says should never be created directly, as it’s created by HTTPConnection. Well, phooey to that :) If you create an HTTPResponse yourself, it will happily take a socket as the parameter to its init method and then process an HTTP response that comes down said socket. Thus I can now create a socket, send my GET request, use my nice friendly select function, and when I see there’s data on the socket, pass it over to an HTTPResponse object to parse it for me.
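Roughly, the trick looks like the sketch below. Treat it as an illustration under assumptions rather than the exact code I ran: it’s written for Python 3, where httplib became http.client, and the function names send_get and read_response are made up for the example. Creating an HTTPResponse by hand is still something the documentation says users aren’t meant to do, so it may behave differently between versions.

```python
import select
import socket
from http.client import HTTPResponse  # Python 3 home of httplib's HTTPResponse


def send_get(sock, host, path):
    # Issue the GET by hand on an already connected socket.
    request = "GET %s HTTP/1.0\r\nHost: %s\r\n\r\n" % (path, host)
    sock.sendall(request.encode("ascii"))


def read_response(sock):
    # Hand the socket to HTTPResponse and let it do the parsing.
    response = HTTPResponse(sock, method="GET")
    response.begin()  # reads and parses the status line and headers
    body = response.read()  # reads the body
    return response.status, body
```

In the run loop, call send_get, carry on with other work, and only call read_response once select reports the socket as readable. One caveat: select only promises that some bytes have arrived, so begin and read can still block until the rest of the response turns up.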

This solution may not work in Python versions other than 2.4, as I’m relying on undocumented behaviour, but it works for now and keeps me happy.

In the process of seeking solutions I did come across Twisted, a very rich event-driven Python library for handling all sorts of networking protocols. I’ll definitely need to investigate this further, but here I wasn’t able to use it, as Twisted uses its own run loop and I already had a perfectly good run loop doing the work I wanted (which is why I was using select).

Anyway, there you go.