This article isn't a perfect guide to porting py2 code to py3. It just lists the solutions I tried, the problems I ran into, and the choices I made. I haven't finished this project, but I haven't given up yet :).
I won't explain much about the differences between py2 and py3. Instead I'll write down some corner cases that are easy to miss.
The codebase I'm working on:
- Only supports python2.7; python2.6 is not a concern.
- 1X repos, about half a million lines of code in total (counted with cloc).
- These repos import each other (a bad design from the early days that is not easy to untangle), which means I can't switch them to py3 one by one. I need to write py2/3 compatible code for all of them and switch them together (I'm also considering solving the import problem first).
- Test coverage is not good: the best is around 80%, the lowest is 30%.
2to3, a command line tool packaged with py2. It's a one-way port that converts your code to py3; the new code won't work under py2. Since I need to stay compatible with both py2 and py3 for a long time, I didn't try it.
future, it tries to let you write a single clean python3.x codebase without ugly six hacks. I used it first, but ran into many problems, which I'll explain later.
modernize, it rewrites your code to use six. This is the one I mainly use, but there is still some code it can't detect.
Things the tools won't catch
The internal implementation of hash has changed in py3. You shouldn't rely on the result of hash in either py2 or py3, but you need to be aware of the difference. Unfortunately, our old system relies on py2's hash for some reason, so I reimplemented it for py3 as a C extension: https://github.com/monsterxx03/legacyhash. It only supports bytes and unicode.
The tools won't check your usage of hash; you need to check it yourself. Maybe consider writing a custom fixer for it?
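A real fixer is more work than a quick scan. As a rough sketch (not an actual modernize or 2to3 fixer), the standard ast module can at least list direct calls to the builtin hash so they can be reviewed by hand. The name find_hash_calls is made up for this example, and it can only parse source that is valid syntax for the interpreter running it.

```python
import ast


def find_hash_calls(source, filename="<string>"):
    """Return sorted (line, column) pairs for direct calls to hash().

    Hypothetical helper: it only spots the plain name `hash`, so aliases
    such as `h = hash; h(x)` or method calls like `obj.hash()` are ignored.
    """
    tree = ast.parse(source, filename=filename)
    hits = []
    for node in ast.walk(tree):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == "hash"):
            hits.append((node.lineno, node.col_offset))
    return sorted(hits)


sample = (
    "x = hash('a')\n"
    "y = obj.hash()\n"
    "z = [hash(k) for k in keys]\n"
)
hash_call_lines = [line for line, _ in find_hash_calls(sample)]
```

Running it over every file in a repo gives a checklist of places to inspect; it does not tell you whether a given call actually depends on py2's hash values.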
Before python3.6, json.loads in py3 only accepts str (unicode), not bytes. You'd better upgrade to python3.6 directly to avoid this problem.
json.dumps returns str in both py2 and py3, but str is bytes in py2 and unicode in py3, so other code may break.
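If upgrading right away isn't possible, a thin wrapper can hide both differences. A minimal sketch, assuming the payloads are UTF-8; loads_compat and dumps_text are hypothetical names, not part of the json module:

```python
import json


def loads_compat(data, encoding='utf-8'):
    """Decode bytes to text first, so json.loads sees text on every version."""
    # In py2 `bytes` is an alias of `str`, so this branch also covers py2 strs.
    if isinstance(data, bytes):
        data = data.decode(encoding)
    return json.loads(data)


def dumps_text(obj):
    """Always return text: py2's default json.dumps output is an ASCII bytestring."""
    out = json.dumps(obj)
    if isinstance(out, bytes):
        out = out.decode('ascii')
    return out


parsed = loads_compat(b'{"a": 1}')
dumped = dumps_text({"a": 1})
```

Callers that need bytes (for example to write to a socket) can then encode explicitly at that boundary instead of guessing what json.dumps returned.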
The return value from the redis client is always bytes in py3. Since the boundary between bytes and unicode in py2 is very blurry, this may cause problems.
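One way to contain this is to decode values right where they leave the redis client, so the rest of the code only sees text. The sketch below uses no redis API at all; to_text is a hypothetical helper and UTF-8 is an assumption about what was stored:

```python
def to_text(value, encoding='utf-8'):
    """Recursively decode bytes inside common reply shapes (hypothetical helper)."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    if isinstance(value, dict):
        return {to_text(k, encoding): to_text(v, encoding)
                for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        # Rebuild the same container type with decoded items.
        return type(value)(to_text(v, encoding) for v in value)
    # Numbers, None, and values that are already text pass through untouched.
    return value


reply = {b'name': b'alice', b'tags': [b'a', b'b']}
decoded = to_text(reply)
```

redis-py also has a decode_responses option on the client, which may be simpler when every stored value is text; the helper above is for cases where some values must stay binary.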
List comprehension can't access class scope
    class A(object):
        test1 = 'a'
        test2 = [test1 for _ in range(10)]

NameError: name 'test1' is not defined.
In py3 a list comprehension runs in its own scope, so its loop variables no longer leak into the surrounding scope. As a side effect, names defined in the class body are not visible inside the comprehension. For details see: https://stackoverflow.com/questions/13905741/accessing-class-variables-from-a-list-comprehension-in-the-class-definition
A workaround is:
    class A(object):
        test1 = 'a'
        test2 = (lambda test1=test1: [test1 for _ in range(10)])()
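The rule behind this is narrow: in py3 only the outermost iterable of a comprehension is evaluated in the class body, while everything else runs in the comprehension's own scope. A small sketch to show the difference (the class names are just for illustration):

```python
class Works(object):
    letters = 'abc'
    # `letters` is the outermost iterable, so it is looked up in the class body.
    upper = [c.upper() for c in letters]


try:
    class Fails(object):
        test1 = 'a'
        # `test1` is used inside the comprehension body, which can't see
        # class-level names in py3.
        test2 = [test1 for _ in range(3)]
    failed = False
except NameError:
    failed = True
```

Under py3 the second class definition raises NameError, so `failed` ends up True; under py2 both classes are created without error.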
Dynamically generated code
If, like me, you have scripts that generate python code dynamically, don't forget to fix them too.
Problems with future
Standard lib changes
Many standard libs changed names or structure in py3. future tries to let you import the py3 names directly through a hack: it modifies sys.modules so that the new module names point to backported versions from future.backports, but this may break your code in py2.
One example is urljoin. In py2 it's from urlparse import urljoin, in py3 it's from urllib.parse import urljoin, but the internal implementation differs.
In py2, urljoin can mix bytestrings and unicode: urljoin(b'http://localhost', u'a/b') works, but in py3 it raises TypeError: Cannot mix str and non-str arguments. If you use future's backported version, you will hit this issue under py2 as well. Some third-party libs break because of this behavior, e.g. eventbrite's python sdk.
raven (sentry's client) also ran into problems because of this runtime replacement. To avoid issues with third-party libs, I didn't use future's approach and went with modernize instead. It generates code with from six.moves.urllib.parse import urljoin, which in py2 still uses py2's urljoin. I prefer this way.
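For call sites you control, one defensive option is to normalize both arguments to text before calling urljoin, so the same call works whichever implementation sits behind the import. A minimal sketch, assuming UTF-8 input; urljoin_text and _to_text are hypothetical helpers, not part of six or future:

```python
try:
    from urllib.parse import urljoin  # py3 location
except ImportError:
    from urlparse import urljoin  # py2 location


def _to_text(value, encoding='utf-8'):
    # `bytes` is an alias of `str` in py2, so py2 bytestrings are decoded too.
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def urljoin_text(base, url, encoding='utf-8'):
    """Join two URL parts after coercing both to text (hypothetical helper)."""
    return urljoin(_to_text(base, encoding), _to_text(url, encoding))


joined = urljoin_text(b'http://localhost', u'a/b')
```

This does nothing for third-party libs that mix types internally, which is why avoiding the runtime replacement is still the safer default.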
future tries to backport py3's data types to py2 via the builtins module, but it may break your py2 code because of type checks.
    from builtins import str

    isinstance('1', str)   # False, since str is actually unicode type in py3
    isinstance(u'1', str)  # True
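For type checks that have to pass under both interpreters, one option is to test against a tuple of text-like types instead of a backported str. The sketch below mirrors the idea behind six.string_types without importing six; the names string_types and is_string are used here purely for illustration:

```python
try:
    # py2: basestring is the common parent of str and unicode.
    string_types = (basestring,)  # noqa: F821 (undefined in py3 on purpose)
except NameError:
    # py3: there is only one text type.
    string_types = (str,)


def is_string(value):
    """True for native strings on py2 (str or unicode) and for str on py3."""
    return isinstance(value, string_types)


native_ok = is_string('1')
unicode_ok = is_string(u'1')
```

With this check both '1' and u'1' pass on either interpreter, while bytes objects are rejected on py3.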
builtins.object tries to fix the __next__ differences between py2/3, but it may break if you override the __getattr__ method in your class.
    from builtins import object

    class A(object):
        def __getattr__(self, key):
            return 'test'

    a = A()
    if a:
        print('ok')

AttributeError: type object 'A' has no attribute '__bool__'
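A possible way to stay out of trouble, assuming the failure comes from attribute probes that a catch-all __getattr__ answers for every name, is to refuse double-underscore names so that special-method lookups see a normal AttributeError. This is a sketch against the plain builtin object, not something verified against future's builtins.object:

```python
class A(object):
    def __getattr__(self, key):
        # Don't pretend to implement special methods such as __bool__ or
        # __next__: let lookups for them fail like on any other object.
        if key.startswith('__') and key.endswith('__'):
            raise AttributeError(key)
        return 'test'


a = A()
truthy = bool(a)
fallback = a.anything
```

Ordinary attribute names still fall through to the catch-all value, while hasattr(a, '__bool__') now reports False.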