From python2 to python3

2018.06.07

This article isn’t a complete guide to porting py2 code to py3. It just lists the solutions I tried, the problems I ran into, and the choices I made. I haven’t finished the project yet, but I haven’t given up either :).

I won’t explain much about the differences between py2 and py3; instead I’ll write down some corner cases that are easy to miss.

The codebase I’m working on:

Only python2.7 is supported; python2.6 is not considered.
1X repos, about half a million lines of code in total (counted with cloc).
The repos import each other, a bad design from the early days that isn’t easy to untangle. That means I can’t switch them to py3 one by one: I need to write py2/3-compatible code for all of them and switch them together (I’m also considering solving the import problem first).
Test coverage is not good: the best repo is around 80%, the lowest is 30%.

Tools

2to3: a command-line tool shipped with Python. It’s a one-way conversion to py3: the converted code won’t run under py2. Since I need to stay compatible with both py2 and py3 for a long time, I didn’t try it.

future: it tries to let you write clean, single-source python3.x-style code without the ugly hacks that six requires. I used it first but ran into many problems, which I explain later.

modernize: rewrites your code to use six. It’s the tool I mainly use, but there is still some code it can’t detect.

Things the tools don’t cover

hash function

The internal implementation of hash changed in py3. You shouldn’t rely on hash returning the same result in py2 and py3, but you need to be aware of the difference. Unfortunately, our old system relies on py2’s hash for some reason, so I reimplemented it for py3 as a C extension: https://github.com/monsterxx03/legacyhash. It only supports bytes and unicode.

The tools won’t check your usage of hash; you need to check it yourself. Maybe consider writing a custom fixer for it?
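As a starting point for that manual check, here is a minimal sketch that uses the standard ast module to list calls to the bare name hash in a piece of source code. The function name find_hash_calls and the sample input are made up for illustration, and the scan will miss aliased uses such as h = hash; h(x). The source must parse under the interpreter running the scan, so py2-only syntax has to be scanned with py2.

```python
import ast


def find_hash_calls(source, filename="<string>"):
    """Return sorted (line, column) pairs for every call to the bare name ``hash``."""
    tree = ast.parse(source, filename=filename)
    hits = []
    for node in ast.walk(tree):
        # Only direct calls like hash(x); method calls such as obj.hash() are skipped.
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == "hash"):
            hits.append((node.lineno, node.col_offset))
    return sorted(hits)


# Hypothetical sample input, only for illustration.
sample = (
    "key = hash('user:42') % 16\n"
    "other = obj.hash()\n"
    "bucket = hash(key)\n"
)
hash_calls = find_hash_calls(sample)
```

Running this over each file in a repo gives a list of places to review by hand; it does not decide whether a given use actually depends on py2's hash values.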

json

Before python3.6, json.loads only accepts str (unicode text in python3), not bytes. It’s better to upgrade to python3.6 directly to avoid this problem.

The output of json.dumps is str in both py2 and py3, but str is bytes in py2 and unicode in py3, so code that consumes the output may break.
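If upgrading is not possible right away, one option is to normalize types at the json boundary. The two helpers below are a sketch with hypothetical names (loads_compat, dumps_bytes), and they assume the payloads are UTF-8 encoded.

```python
import json


def loads_compat(data, encoding="utf-8"):
    """Accept bytes or text and always hand text to json.loads."""
    if isinstance(data, bytes):
        data = data.decode(encoding)
    return json.loads(data)


def dumps_bytes(obj, encoding="utf-8"):
    """Serialize to bytes on both py2 and py3."""
    text = json.dumps(obj)
    if isinstance(text, bytes):  # py2: str is already a byte string
        return text
    return text.encode(encoding)


payload = loads_compat(b'{"a": 1}')
wire = dumps_bytes({"a": 1})
```

Callers that write to sockets or caches can use dumps_bytes, while callers that build text keep using json.dumps directly.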

redis

By default, values returned by the redis client are always bytes in py3. Since the boundary between bytes and unicode is very blurry in py2, code that worked there may break.
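One way to contain this is to decode at the point where values leave the client. The sketch below does not talk to redis at all: raw only imitates the shape of a hash reply on py3, and to_text and decode_mapping are hypothetical helper names. It assumes the stored values are UTF-8 text. redis-py also has a decode_responses=True client option that decodes replies globally, which may or may not suit a mixed codebase.

```python
def to_text(value, encoding="utf-8"):
    """Decode bytes to text; leave anything else (None, int, text) unchanged."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def decode_mapping(mapping, encoding="utf-8"):
    """Decode both keys and values of a dict-like reply."""
    return {to_text(k, encoding): to_text(v, encoding)
            for k, v in mapping.items()}


# Stand-in for a reply from the client on py3 (keys and values are bytes).
raw = {b"name": b"alice", b"visits": b"3"}
decoded = decode_mapping(raw)
```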

List comprehension can’t access class scope

class A(object):
    test1 = 'a'
    test2 = [test1 for _ in range(10)]

On py3 this raises NameError: name 'test1' is not defined.

In py3 a list comprehension runs in its own scope, so its loop variable no longer leaks into the enclosing scope. A side effect is that names defined in the class body are not visible inside the comprehension. For details see: https://stackoverflow.com/questions/13905741/accessing-class-variables-from-a-list-comprehension-in-the-class-definition

A workaround is:

class A(object):
    test1 = 'a'
    test2 = (lambda test1=test1: [test1 for _ in range(10)])()
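One detail that makes this easy to miss: the outermost iterable of a comprehension is still evaluated in the class scope, so only some comprehensions break. The sketch below, with made-up class names, shows both cases on py3.

```python
class Works(object):
    items = [1, 2, 3]
    # The outermost iterable (items) is evaluated in the class scope, so this is fine.
    doubled = [x * 2 for x in items]


error = None
try:
    class Fails(object):
        factor = 2
        items = [1, 2, 3]
        # factor is looked up inside the comprehension's own scope: NameError on py3.
        scaled = [x * factor for x in items]
except NameError as exc:
    error = str(exc)
```

On py2 both classes are created without an error, which is why this kind of code survives until the switch.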

Dynamically generated code

If, like me, you have scripts that generate python code dynamically, don’t forget to fix them too.

Problems with future

standard lib changes

Many standard library modules changed name or structure in py3. future tries to let you import the py3 names directly through the future.standard_library.install_aliases() hack: it modifies sys.modules so that the new module names point to backported versions from future.backports. But this may break your code in py2.

One example is urljoin. In py2 it’s from urlparse import urljoin, in py3 it’s from urllib.parse import urljoin, but the internal implementation differs.

In py2, urljoin can mix byte strings and unicode: urljoin(b'http://localhost', u'a/b') works. In py3 it raises TypeError: Cannot mix str and non-str arguments, and future’s backported version has the same behavior. Some third-party libraries break because of this, e.g. eventbrite’s python sdk.

raven (sentry’s client) also ran into problems because of this runtime replacement. To avoid issues with third-party libraries, I didn’t use future’s install_aliases().

modernize generates code that uses six, e.g. from six.moves.urllib.parse import urljoin, which in py2 still resolves to py2’s urljoin. I prefer this way.
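The urljoin difference can be reproduced with the standard library alone. The sketch below uses a plain try/except import instead of six so that it has no dependencies. urljoin_text is a hypothetical wrapper that decodes both arguments before joining, which is one way to make your own call sites behave the same on py2 and py3; it assumes UTF-8 input.

```python
try:
    from urllib.parse import urljoin  # py3 location
except ImportError:
    from urlparse import urljoin      # py2 location


def to_text(value, encoding="utf-8"):
    """Decode bytes to text, pass text through unchanged."""
    return value.decode(encoding) if isinstance(value, bytes) else value


def urljoin_text(base, url):
    """Hypothetical wrapper: normalize both arguments to text, then join."""
    return urljoin(to_text(base), to_text(url))


# Mixing bytes and text: accepted on py2, TypeError on py3.
try:
    urljoin(b"http://localhost", u"a/b")
    mixed_ok = True
except TypeError:
    mixed_ok = False

joined = urljoin_text(b"http://localhost", u"a/b")
```

This only helps for code you control; third-party libraries that mix types internally are unaffected, which is the reason for not replacing the module globally.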

builtins

future tries to backport py3’s data types to py2 via the builtins module, but this may break py2 code that does type checks.

from builtins import str

isinstance('1', str)  # False on py2: the imported str is py3's text type, '1' is a py2 byte string
isinstance(u'1', str) # True
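An alternative that leaves the builtin names alone is to check against explicit aliases, which is the approach six takes with six.string_types and six.text_type. The sketch below defines equivalent aliases by hand so it runs without any third-party package; is_string and is_text are hypothetical helper names.

```python
import sys

if sys.version_info[0] == 2:
    text_type = unicode           # noqa: F821 (name only exists on py2)
    string_types = (basestring,)  # noqa: F821 (name only exists on py2)
else:
    text_type = str
    string_types = (str,)


def is_string(value):
    """True for anything the running interpreter treats as a string."""
    return isinstance(value, string_types)


def is_text(value):
    """True only for unicode text."""
    return isinstance(value, text_type)
```

With this, is_string('1') stays true on py2, so existing type checks keep working while the code is being ported.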

builtins.object tries to paper over the next / __next__ difference between py2 and py3, but it may break if your class overrides __getattr__.

from builtins import object

class A(object):
    def __getattr__(self, key):
        return 'test'

a = A()
if a:
    print('ok')

On py2 this raises AttributeError: type object 'A' has no attribute '__bool__'.
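A possible mitigation, if you have to keep builtins.object, is to make the catch-all __getattr__ refuse special method names. This is a sketch under an assumption I have not verified against future's source: that the error appears because a probe for __bool__ on the instance is answered by __getattr__, after which the lookup on the type fails. The class below uses the plain object so that it runs without future installed.

```python
class A(object):
    def __getattr__(self, key):
        # Refuse dunder names so probes such as hasattr(obj, '__bool__')
        # see "not defined" instead of the catch-all value.
        if key.startswith("__") and key.endswith("__"):
            raise AttributeError(key)
        return "test"


a = A()
truthy = bool(a)       # falls back to the default truthiness
value = a.anything     # ordinary names still hit the catch-all
```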

