2014-02-10

Unicode

Great video by Ned Batchelder: https://www.youtube.com/watch?v=sgHbC6udIqc

General rule:

Decode bytes to unicode as early as possible (from db, from request, from file).
Encode unicode to the target encoding as late as possible (to db, to response, to file).
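
A minimal sketch of that rule, not taken from the video: bytes come in, get decoded once at the boundary, all processing happens on unicode text, and the result is encoded once on the way out. The function name `handle`, the `latin-1` input encoding and the upper-casing step are assumptions chosen purely for illustration. The snippet runs on both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-

def handle(raw_in, in_encoding="latin-1", out_encoding="utf-8"):
    """Hypothetical handler: bytes in, bytes out, unicode in between."""
    # Decode as early as possible: from here on we only hold unicode text.
    text = raw_in.decode(in_encoding)

    # Stand-in for the real work; it never touches bytes.
    result = text.upper()

    # Encode as late as possible: only at the output boundary.
    return result.encode(out_encoding)


# Example input: "café" as it might arrive from a latin-1 source.
raw = u"caf\u00e9".encode("latin-1")
out = handle(raw)
```

Because the decode and encode steps sit only at the edges, the input or output encoding can change without touching the processing code in the middle.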

Notes

Print a range of unicode characters by code point number. Note that I use utf-8 as my default encoding, so each character is encoded to utf-8 before printing. In Python 2.7:

print ''.join([x.encode('utf-8') for x in map(unichr, range(0x0107, 0x0187))])
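
For comparison, a rough Python 3 equivalent, offered as a sketch rather than as part of the original note. In Python 3 `str` is already unicode and `chr` replaces `unichr`, so no per-character encoding is needed. The helper name `code_point_range` is made up for this example.

```python
def code_point_range(start, stop):
    """Return the characters for code points start..stop-1 as one string."""
    return "".join(chr(cp) for cp in range(start, stop))


# Same range as the Python 2.7 one-liner above.
chars = code_point_range(0x0107, 0x0187)

# Encode explicitly only when bytes are needed, e.g. for writing to a binary file.
encoded = chars.encode("utf-8")
```

Calling `print(chars)` then shows the characters, provided the terminal's encoding can represent them.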