This article is not a complete guide to porting py2 code to py3. It just lists the solutions I tried, the problems I ran into, and the choices I made. I haven’t finished this project, but I haven’t given up so far either :).
I won’t explain much about the differences between py2 and py3; instead I’ll write down some corner cases that are easy to miss.
The codebase I’m working on:
- Only supports Python 2.7; Python 2.6 is not a concern.
- 1X repos, about half a million lines of code in total (calculated by cloc).
- These repos import each other, a bad design from the early days that is not easy to untangle. This means I can’t switch them to py3 one by one: I need to write py2/3 compatible code for all of them and switch them together (I’m also considering solving the import problem first).
- Test coverage is not good: the best repo is around 80%, the lowest is 30%.
Tools
- 2to3: a command-line tool that ships with Python. It is a one-way conversion to py3, and the converted code won’t run under py2. Since I need to stay compatible with both py2 and py3 for a long time, I didn’t try it.
- future: it tries to let you write clean, single-source Python 3 style code without ugly six hacks. I used it first, but ran into many problems, which I explain later.
- modernize: rewrites your code to use six. This is the one I mainly use, but there is still some code it can’t detect.
Things the tools don’t cover
hash function
The internal implementation of hash() changed in py3. You shouldn’t rely on the result of hash() in either py2 or py3, but you need to be aware of the difference. Unfortunately, our old system relies on py2’s hash() for some reason, so I reimplemented it for py3 as a C extension: https://github.com/monsterxx03/legacyhash. It only supports bytes and unicode.
The tools won’t check your usage of hash(); you need to check it yourself. Maybe consider writing a custom fixer for it?
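As a starting point for such a check, here is a minimal sketch that uses the standard ast module to list calls to the bare name hash in a piece of source code. The function name find_hash_calls and the sample string are made up for illustration. This is not a real 2to3/modernize fixer, and it will miss indirect uses such as h = hash; h(x).

```python
import ast


def find_hash_calls(source, filename="<string>"):
    """Return sorted (line, column) pairs for every call to the bare name `hash`."""
    tree = ast.parse(source, filename=filename)
    hits = []
    for node in ast.walk(tree):
        # Only match `hash(...)`, not method calls like `obj.hash()`.
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == "hash"):
            hits.append((node.lineno, node.col_offset))
    return sorted(hits)


# Hypothetical sample input: two builtin calls and one unrelated method call.
sample = "x = hash('abc')\ny = obj.hash()\nz = hash(key) % 16\n"
for line, col in find_hash_calls(sample):
    print("hash() call at line %d, column %d" % (line, col))
```

Running something like this over each repo gives a list of places to review by hand, which may be enough when the number of hash() calls is small.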
json
Before Python 3.6, json.loads only accepts str (unicode text in py3), not bytes. It’s best to upgrade straight to Python 3.6 to avoid this problem.
json.dumps returns str in both py2 and py3, but str is bytes in py2 and unicode in py3, so code that consumes the output may break.
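If upgrading is not possible right away, a pair of thin wrappers can make both directions behave the same on every version. This is only a sketch: the names loads_compat and dumps_text are hypothetical, and it assumes the payload is UTF-8 and that json.dumps is called with its default ensure_ascii=True.

```python
import json


def loads_compat(data, encoding="utf-8"):
    """json.loads that also accepts bytes on Python versions before 3.6."""
    if isinstance(data, bytes):
        data = data.decode(encoding)
    return json.loads(data)


def dumps_text(obj):
    """json.dumps that always returns text (unicode in py2, str in py3)."""
    out = json.dumps(obj)
    if isinstance(out, bytes):
        # Only reached under py2, where str is bytes. The default
        # ensure_ascii=True guarantees the output is plain ASCII.
        out = out.decode("ascii")
    return out


print(loads_compat(b'{"a": 1}'))
print(dumps_text({"a": 1}))
```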
redis
In py3, values returned by the redis client are always bytes (with the default client settings). Since the boundary between bytes and unicode is very blurry in py2, code that worked there may break.
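One way to deal with this is to decode at the boundary, right where values come back from the client. The helper below is a sketch with a hypothetical name (to_text), assumes the stored values are UTF-8 text, and is exercised here with plain literals instead of a real redis connection. redis-py also accepts decode_responses=True when creating the client, which may be simpler if none of your values are binary.

```python
def to_text(value, encoding="utf-8"):
    """Recursively decode bytes inside common container types."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    if isinstance(value, dict):
        return dict((to_text(k, encoding), to_text(v, encoding))
                    for k, v in value.items())
    if isinstance(value, (list, tuple)):
        return type(value)(to_text(v, encoding) for v in value)
    # Numbers, None and already-decoded text pass through unchanged.
    return value


# Shaped like a hash reply: bytes keys and bytes values.
reply = {b"name": b"alice", b"tags": [b"a", b"b"]}
print(to_text(reply))
```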
List comprehension can’t access class scope
class A(object):
    test1 = 'a'
    test2 = [test1 for _ in range(10)]
Under py3 this reports NameError: name 'test1' is not defined.
In py3 the scoping of list comprehensions changed so that loop variables no longer leak into the surrounding scope. The body of the comprehension now runs in its own scope, which cannot see names defined in the class body. For details see: https://stackoverflow.com/questions/13905741/accessing-class-variables-from-a-list-comprehension-in-the-class-definition
A workaround is:
class A(object):
    test1 = 'a'
    test2 = (lambda test1=test1: [test1 for _ in range(10)])()
Dynamic generated code
If, like me, you have scripts that generate Python code dynamically, don’t forget to fix them too.
Problems with future
standard lib changes
Many standard library modules were renamed or restructured in py3. future tries to let you import the py3 names directly through the future.standard_library.install_aliases() hack: it modifies sys.modules so that the new module names point to backported versions from future.backports. But this may break your code under py2.
One example is urljoin. In py2 it is from urlparse import urljoin, in py3 it is from urllib.parse import urljoin, but the internal implementations differ.
In py2, urljoin can mix byte strings and unicode: urljoin(b'http://localhost', u'a/b') works. In py3 it reports TypeError: Cannot mix str and non-str arguments, and future’s backported version has the same issue. Some third-party libraries break because of this behavior, e.g. Eventbrite’s Python SDK.
raven (Sentry’s client) also ran into problems because of this runtime replacement. To avoid issues with third-party libraries, I didn’t use future’s install_aliases().
modernize, on the other hand, generates code that uses six, e.g. from six.moves.urllib.parse import urljoin, which still resolves to py2’s own urljoin under py2. I prefer this way.
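For your own code that still passes mixed types to urljoin, one option is to normalize the arguments before the call. The sketch below uses a plain try/except import so it runs without six; the wrapper name urljoin_text is hypothetical, and it assumes byte strings are UTF-8.

```python
try:
    from urllib.parse import urljoin  # py3 location
except ImportError:
    from urlparse import urljoin  # py2 location


def urljoin_text(base, url, encoding="utf-8"):
    """Join two URL parts, decoding byte strings to text first."""
    if isinstance(base, bytes):
        base = base.decode(encoding)
    if isinstance(url, bytes):
        url = url.decode(encoding)
    return urljoin(base, url)


# Mixing bytes and text would raise TypeError with py3's urljoin;
# the wrapper decodes first, so the call succeeds on both versions.
print(urljoin_text(b"http://localhost", u"a/b"))
```

This does not help with third-party libraries that call urljoin themselves, which is why leaving py2’s own implementation in place under py2 matters.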
builtins
future tries to backport py3’s data types to py2 via the builtins module, but this may break your py2 code wherever it does type checks.
# Under py2 with future installed
from builtins import str
isinstance('1', str)   # False: str is now the backported unicode type, '1' is a py2 byte string
isinstance(u'1', str)  # True
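An alternative that leaves the builtin names alone is to do type checks against explicit aliases, which is the approach six takes with six.string_types and six.text_type. The sketch below defines equivalent aliases by hand so that it runs without six installed; the helper names is_string and is_text are made up for illustration.

```python
import sys

if sys.version_info[0] >= 3:
    text_type = str
    string_types = (str,)
else:
    # These names only exist under py2.
    text_type = unicode  # noqa: F821
    string_types = (basestring,)  # noqa: F821


def is_string(value):
    """True for any native or unicode string, on both py2 and py3."""
    return isinstance(value, string_types)


def is_text(value):
    """True only for unicode text."""
    return isinstance(value, text_type)


print(is_string('1'), is_string(u'1'), is_text(u'1'))
```

Because str itself is never rebound, existing isinstance(x, str) checks in py2 code keep their old meaning.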
builtins.object tries to paper over the next and __next__ differences between py2 and py3, but it may break if your class overrides __getattr__.
from builtins import object

class A(object):
    def __getattr__(self, key):
        return 'test'

a = A()
if a:
    print('ok')
Under py2 this reports AttributeError: type object 'A' has no attribute '__bool__'.
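A possible workaround, if you want to keep a catch-all __getattr__, is to let lookups of special (double underscore) names fail normally. This is a sketch based on my assumption that future’s object probes the instance for __bool__ with hasattr, and I have not verified it against every version of future. It uses the plain builtin object so that it runs without future installed; under py2 you would keep the from builtins import object line.

```python
class A(object):
    def __getattr__(self, key):
        # Do not answer for special names such as __bool__ or __next__;
        # raising AttributeError makes hasattr() report them as missing.
        if key.startswith('__') and key.endswith('__'):
            raise AttributeError(key)
        return 'test'


a = A()
if a:
    print('ok')
print(a.anything)
```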