For a simple blog-to-Twitter posting gateway (source code) I’m relying on the excellent feedparser and twitter modules, and I am trusting them to handle unicode strings without trouble. With most well-written Python modules (and these two are no exception!) some methods return unicode strings as they see fit, and other methods accept those unicode strings and handle all the nitty-gritty encoding details for me.
A simplified version of my workflow would look like this:
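```python
import feedparser
import twitter  # python-twitter, provides twitter.Api

# Defined elsewhere in the gateway: api (a twitter.Api object),
# config (a settings dict) and seen (ids of entries already posted).

def post(entry):
    title = entry.title           # feedparser hands back a unicode object
    print "posting [%s]" % title  # Python 2 print statement
    api.PostUpdate(title)

feed = feedparser.parse(config["feed"])
for e in reversed(feed.entries):  # oldest entries first
    if e.id not in seen:
        post(e)
```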
This code bombed out with an exception on the first post that had a non-ASCII title. Can you spot why?
It’s the print statement. All the APIs I’m using have zero trouble with unicode, but print has to encode the string for your terminal, and unless the environment says otherwise it assumes ASCII. My ‘debugging’ output actually broke the program. My workaround is to print title.encode("ascii", "replace") instead, which swaps each non-ASCII character for a question mark.
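To see the failure and the workaround side by side without depending on your terminal's locale, here is a minimal sketch in Python 3 syntax (the post itself is Python 2). It performs the ASCII encode by hand: a strict encode raises UnicodeEncodeError, which is roughly what print ran into, while the "replace" error handler swaps each unencodable character for "?". The sample title and the helper names encode_strict and ascii_safe are made up for illustration. In Python 3, str.encode returns bytes, so the sketch decodes the result back to a str before interpolating; in Python 2, title.encode("ascii", "replace") already yields a plain byte string that can be interpolated directly.

```python
# Python 3 sketch; the title is a made-up example, not from a real feed.
title = "Caf\u00e9 na\u00efve"


def encode_strict(text):
    """Mimic print falling back to ASCII: a strict encode, which raises."""
    return text.encode("ascii")


def ascii_safe(text):
    """The post's workaround: replace each non-ASCII character with '?'."""
    return text.encode("ascii", "replace").decode("ascii")


try:
    encode_strict("posting [%s]" % title)
except UnicodeEncodeError as exc:
    print("strict ASCII encode fails:", exc.reason)

# Only ASCII reaches print, so this line cannot raise an encoding error.
print("posting [%s]" % ascii_safe(title))
```

The trade-off is lossy output: the log line keeps its shape, but accented characters become question marks.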
Brend on #python pointed out to me that the issue is not, exactly, print. The issue is interpolating title into a non-unicode string. Depending on the environment, using print on the unicode object might in fact work; in those environments, saying print u"posting [%s]" % title could help. In my case, however, I ran into the issue from cron with no locale set at all, so dumbing the string down to ASCII is still the right thing to do.
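Because whether print copes depends on the environment, one option is to ask the output stream what it can encode rather than hard-coding ASCII. The sketch below is Python 3, and log_posting and printable are hypothetical helpers, not part of the gateway: log_posting reads the stream's encoding attribute, and printable applies the same "replace" trick with that encoding, falling back to ASCII when the stream reports none (as Python 2's sys.stdout does when output is not a terminal, or as an in-memory io.StringIO does). On a UTF-8 stream the title passes through untouched; on a stream that only reports ASCII it degrades to question marks instead of raising.

```python
import io
import sys


def printable(text, encoding):
    """Reduce text to characters that `encoding` can represent.

    If no encoding is known (None), assume ASCII, mirroring the post's
    workaround for cron runs with no locale set.
    """
    encoding = encoding or "ascii"
    return text.encode(encoding, "replace").decode(encoding)


def log_posting(title, stream=None):
    """Hypothetical debug helper: print the 'posting' line without crashing.

    The stream's own `encoding` attribute decides how much of the title
    survives; streams that report none get the ASCII fallback.
    """
    stream = sys.stdout if stream is None else stream
    encoding = getattr(stream, "encoding", None)
    print("posting [%s]" % printable(title, encoding), file=stream)


# Usage: an in-memory text stream reports no encoding, so ASCII is assumed.
buf = io.StringIO()
log_posting("Caf\u00e9", buf)
print(repr(buf.getvalue()))
```

Keeping ASCII as the fallback preserves the conclusion above for the no-locale case, while a terminal that advertises a richer encoding still shows the real title.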