Unicode

Great video by Ned Batchelder: https://www.youtube.com/watch?v=sgHbC6udIqc

General rule:

  • decode to Unicode as early as possible (from db, from request, from file)
  • encode to the target encoding as late as possible (to db, to response, to file)
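
A minimal sketch of the rule above, assuming the input arrives as bytes in a known encoding. The function name shout and its parameters are hypothetical, made up for illustration; the only library calls used are bytes.decode, str.upper and str.encode. It should behave the same on Python 2.7 and Python 3.

```python
def shout(raw_bytes, source_encoding="utf-8", target_encoding="utf-8"):
    """Hypothetical helper: uppercase some incoming bytes."""
    # Decode early: turn the incoming bytes into text right at the boundary.
    text = raw_bytes.decode(source_encoding)
    # All real work happens on text, never on encoded bytes.
    result = text.upper()
    # Encode late: go back to bytes only when the data leaves the program.
    return result.encode(target_encoding)


# Latin-1 bytes in, UTF-8 bytes out.
converted = shout(b"caf\xe9", source_encoding="latin-1")
```

Because the uppercasing runs on decoded text, the accented letter is handled as one character no matter which encodings sit at either end.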

Notes

Print a run of Unicode characters by code point (here U+0107 through U+0186). Note that I use UTF-8 as my default encoding, so the output is only readable on a terminal that expects UTF-8. In Python 2.7:

# Build the unicode string first, then encode once, right before output.
print u''.join(unichr(cp) for cp in range(0x0107, 0x0187)).encode('utf-8')
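
For comparison, a rough Python 3 version of the same idea. This is a sketch, not part of the original note: the helper name chars_in_range is made up, and it relies on Python 3's chr accepting any code point (there is no unichr).

```python
def chars_in_range(start, stop):
    """Hypothetical helper: text made of code points start..stop-1."""
    # In Python 3, chr() returns a one-character str for any code point.
    return "".join(chr(cp) for cp in range(start, stop))


text = chars_in_range(0x0107, 0x0187)

# Encode explicitly only when bytes are needed (file, socket, db).
encoded = text.encode("utf-8")
```

Calling print(text) displays the characters directly, provided the terminal encoding can represent them.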