Saturday, May 02, 2015

List of Python Unicode / UTF-8 String Encoding Snippets


\u to \x conversion

>>> print(u'\u0420\u0443\u0441\u0441\u043a\u0438\u0439')
Русский
>>> a = u'\u0420\u0443\u0441\u0441\u043a\u0438\u0439'.encode('utf8')
>>> a
'\xd0\xa0\xd1\x83\xd1\x81\xd1\x81\xd0\xba\xd0\xb8\xd0\xb9'
>>> print(a)
Русский
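
The session above is Python 2, where encode() returns a plain str of bytes. As a rough sketch of the same conversion in Python 3 (an assumption, since the original only shows Python 2), str holds the code points and encode() returns a bytes object. The variable names text and encoded are only illustrative:

```python
# Python 3 sketch: str holds Unicode code points, bytes holds the encoded form.
text = '\u0420\u0443\u0441\u0441\u043a\u0438\u0439'

# Encode the code points to UTF-8; each Cyrillic letter becomes two bytes.
encoded = text.encode('utf-8')
print(encoded)  # b'\xd0\xa0\xd1\x83\xd1\x81\xd1\x81\xd0\xba\xd0\xb8\xd0\xb9'

# Decoding with the same codec gives back the original text.
decoded = encoded.decode('utf-8')
print(decoded == text)  # True
```

Unlike in Python 2, printing the bytes object shows the escaped form rather than the readable text, so decode it first if you want to see the characters.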


\x to \u conversion

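This case arises when raw bytes have ended up inside a unicode object, one code point per byte.
First, turn the unicode object back into a byte string by encoding it as iso-8859-1, which maps each code point below U+0100 to the byte with the same value.
Then decode those bytes using the real encoding of the text, which you need to know in advance. In this example, it is gb2312: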

>>> x
u'\xcc\xd8\xbe\xaf\xc1\xa6\xc1\xbf'
>>> x.encode('iso-8859-1')
'\xcc\xd8\xbe\xaf\xc1\xa6\xc1\xbf'
>>> '\xcc\xd8\xbe\xaf\xc1\xa6\xc1\xbf'.decode('gb2312')
u'\u7279\u8b66\u529b\u91cf'
>>> print(u'\u7279\u8b66\u529b\u91cf')
特警力量
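
The two steps can be wrapped in a small helper. This is a minimal sketch written for Python 3 (an assumption, the session above is Python 2), and fix_mojibake is a hypothetical name, not a library function:

```python
def fix_mojibake(text, real_encoding):
    """Recover text whose raw bytes were stored as one code point per byte."""
    # Step 1: iso-8859-1 maps code points U+0000..U+00FF to identical byte values,
    # so this recovers the original bytes unchanged.
    raw = text.encode('iso-8859-1')
    # Step 2: decode the bytes with the encoding they were really written in.
    return raw.decode(real_encoding)

# Same input as the session above; in Python 3 a plain str literal is Unicode.
garbled = '\xcc\xd8\xbe\xaf\xc1\xa6\xc1\xbf'
fixed = fix_mojibake(garbled, 'gb2312')
print(fixed == '\u7279\u8b66\u529b\u91cf')  # True
```

If the input contains a code point above U+00FF, the encode step raises UnicodeEncodeError, which is a useful sign that the string was not garbled in this particular way.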