Read this if you plan to contribute code written in the Python language.
Fortunately, with PEP 8 there’s an extensive official Style Guide for Python Code. All new Python code you submit should conform to it, unless you have good reasons to deviate from it, for instance readability.
Keep PEP 20, the Zen of Python, under your pillow.
The code you write now probably needs to be touched by someone else down the road, and that someone else might be less experienced than you, or have a terrible headache and be under pressure of time. So while a particular construct may be a clever way of doing something, a simple way of doing the same thing can be and often is preferrable. If (when) complexity can’t be avoided, try to isolate it: put a difficult operation into its own function, method or class, add comments. If complexity can be hidden from upper layers of the code, do so.
Be generous when it comes to commenting your code, it’s better to have a superfluous comment than if one were necessary but is missing. However, if there is a comment it should be correct and agree with the code, otherwise people have to guess if the comment or the code needs to be straightened out.
Adding docstrings to modules, classes, methods and functions is encouraged. If you use the Sphinx format to describe parameters, return values, etc., even better!
Python comes in two major versions nowadays:
Version 3 is not backwards compatible to version 2. While we mainly target “the future”, there are some components we have to work with that haven’t yet been ported over the Python 3, most notably koji. Additionally, we may also want to support the “user tools” we create on legacy systems, so we can’t write code that uses all the latest features. Fortunately, many of the original Python 3 features have been back-ported to Python 2.7, so we can and should write code that is very close to writing idiomatic Python 3 but can still be run on version 2.7. Targeting older minor releases (Python 2.6 and earlier) is much more of a balancing act, so we won’t aim for it.
The following sections cover areas that require some attention. The Python project itself has a great Porting Python 2 Code to Python 3 document which goes into much detail about the differences and is worth a read, even though it mainly addresses existing Python 2 code bases.
In Python 2, importing modules can be ambiguous when a module of that
name exists in the same package and elsewhere in the module search path
sys.path
. To work around this ambiguity, programmers often resorted
to adding paths private to the project to the beginning of sys.path
to force loading modules from a project-internal location (which adds
unwanted noise and can make e.g. testing code that isn’t installed
difficult). Python 3 introduces new syntax for import statements which
makes both cases distinct, this is available since version 2.5 from the
__future__
module:
from __future__ import absolute_import
# Import the sys module from the module search path
import sys
# Import the foo module from the same directory
from . import foo
# Import snafu from the bar module one directory above
from ..bar import snafu
Python 3 did away with print
as a statement and introduced it as a
function. In order to use it the same way in Python 2.7, add the
following to the top of source code files where you use print
:
from __future__ import print_function
Python 2 has two integer types, `int` which is whatever integer-type
is native to the system (which has certain maximal and minimal values
and can overflow) and `long` which can store arbitrary integer
numbers. Python 3 only the latter type, but it’s called int
.
Dividing integer numbers using /
truncates the result to an integer
in Python 2 by default, but yields a floating point number in Python 3.
In order for code to do the same thing on either version, include the
following line at the top of your source files where you divide numbers,
and use /
for normal divisions and //
for divisions that should
truncate the result:
from __future__ import division
Some consider this the main difference between Python 2 and 3: Both
versions have a type for strings of bytes and strings of Unicode
character points. They are called str
and unicode
in version 2
and bytes
and str
in version 3, respectively.
Python 2 and 3 use different ways of marking literals of the different
types by default. Byte strings can have no prefix or b
in Python
2.7, but must be prefixed in Python 3, and text strings must have the
u
prefix in Python 2 which can be and usually is omitted in Python
3:
# a byte string in Python 2 and 3
string1 = b"abc"
# a byte string in Python 2, but a text string in Python 3
string2 = "def"
# a text string in Python 2 and 3
string3 = u"ghi"
In order to ease writing code that is compatible between the versions,
you can switch Python 2 to treat unprefixed string literals as
unicode
, the text string type, by adding this snippet to the top of
the relevant source code files:
from __future__ import unicode_literals
In Python 2, the byte and text string types are exchangeable in many
places, taking the user’s or system default locale into account (and
sometimes failing, when the locale didn’t match up with encoded data).
Apart from the change in type names and how literals look like, Python 3
requires you to explicitly encode str
and decode bytes
objects
if you need them cast into the respective other string type. It is good
practice to exclusively use text strings for strings that represent text
in a program and decode byte strings as early and encode text strings as
late as possible at interfaces that produce or consume encoded data.
from __future__ import print_function
text_string = u"Hello, world!"
print(text_string.replace("world", "gang"))
from __future__ import print_function, unicode_literals
text_string = "Hello, world!"
print(text_string.replace(b"world".decode('utf-8'), b"gang".decode('ascii')))
With version 3.6 around the corner, there are four ways to format strings in Python now:
%
operatorstring.Template
of PEP
292str.format()
methodThe last method isn’t available yet in a stable Python release and will
never be in Python 2, so it’s not suitable for our purposes. The other
three variants work in all Python versions we’re interested in,
formatting with string.Template
is very rarely done however. The
remaining two ways, commonly called old-style (%
operator) and
new-style (str.format()
), are both in wide-spread use, here’s a
site showcasing the differences between
them. New-style formatting is more powerful
and often easier to read, but on the other hand can be a little more to
type. From a technical point of view, this is a case of “use what works
for you”, but for consistency sake the new-style str.format()
way is
preferrable if you’re comfortable with using it. If not, others can
convert old-style to new-style formatting for you during review or when
happening across it. At any rate, consistently use one way or the other
in what you submit.
Python 2 and earlier knows two types of classes, old-style which have no
base class, and new-style which have object
as the base class.
Because their behavior is slightly different in some places, and some
things can’t be done with old-style classes, we want to stick to
new-style classes wherever possible.
The syntactical difference is that new-style classes have to explicitly
be derived from object
or another new-style class.
# old-style classes
class OldFoo:
pass
class OldBar(OldFoo):
pass
# new-style classes
class NewFoo(object):
pass
class NewBar(NewFoo):
pass
Python 3 only knows new-style classes and the requirement to explicitly
derive from object
was dropped. In projects that will only ever run
on Python 3, it’s acceptable not to explicitly derive classes without
parents from object
, but if in doubt, do it just the same.
In Python, it’s easy to inadvertently emulate idiomatic styles of other languages like C/C++ or Java. In cases where there are constructs “native” to the language, it’s preferrable to use them.
Python has special syntax for literals for a couple of built-in compound data types: lists, tuples, dictionaries, strings, sets. It’s customary to use that syntax instead of the class constructor to create objects for these data types unless you have good reason not to. Apart from how it looks, the literal syntax is performing a little bit better (because it doesn’t have to look up the class name in the current scope). NB: Set literals are peculiar in that you can’t create empty ones—they would look the same as empty dicts.
Data Type | Good | Bad |
---|---|---|
str |
a_str = "abc" empty_str = "" |
empty_str = str() |
list |
a_list = [1, 2] empty_str = [] |
a_list = list((1, 2)) empty_list = list() |
tuple |
a_tuple = ('a', 'b', 3) empty_tuple = () |
a_tuple = tuple(['a', 'b', 3]) empty_tuple = tuple() |
dict |
a_dict = {'a': 1} empty_dict = {} |
a_dict = dict(('a', 1)) empty_dict = dict() |
set |
a_set = {"banana", "apple"} ``empty_set = set()``
|
a_set = set(["banana", "apple"]) |
Table: Creating compound objects
Often the initial contents of a compound object are only known when it’s created at runtime. For simple cases like mere type conversions, calling the class constructors are the way to go:
a_tuple = (1, 2, 3)
...
a_list = list(a_tuple)
...
another_list = [4, 5, 6]
...
another_tuple = tuple(another_list)
a_list = [1, 2, 3, 2]
...
a_set = set(a_list)
For more involved cases, say some values need to be filtered or a
specific attribute of the objects is wanted, Python has so-called
comprehensions to create compound objects in a syntactically “nice” way.
These largely supersede the old (ugly) way of using map()
and
filter()
in conjunction with class constructors.
Comprehension type | Data Type | Example | Remarks |
---|---|---|---|
List Comprehension | list |
a_list = [x for x in range(20) if x % 2] |
Put all odd numbers smaller than 20 into a list. |
Dict Comprehension | dict |
a_dict = {k: getattr(an_obj, k) for k in dir(an_obj) if not k.startswith("_")} |
Fill a dict with those attribute names and values of an object that aren’t considered “protected” or “private” (names with one or two leading underscores, respectively). |
Set Comprehension | set |
a_set = {o.name for o in a_list} |
Create a set containing the value of the attribute name of objects in a list. |
Table: Using comprehensions to create compound objects
Languages like C normally use incremented indices to loop over arrays:
float pixels[NUMBER_OF_PIXELS] = [...];
for (int i = 0; i < NUMBER_OF_PIXELS; i++)
{
do_something_with_a_pixel(pixels[i]);
}
Implementing the loop like this would give away that you’ve programmed in C or a similar language before:
pixels = [...]
for i in range(len(pixels)):
do_something_with_a_pixel(pixels[i])
Here’s the “native” way to implement the above loop:
pixels = [...]
for p in pixels:
do_something_with_a_pixel(p)
It yields pairs of count (starting at 0 by default) and the current value like this:
pixels = [...]
for p_no, p in enumerate(pixels, 1):
print("Working on pixel no. {}".format(p_no))
do_something_with_a_pixel(p)
In order to allow future changes in how object attributes (member variables) are set, some languages encourage always using getter and/or setter methods. This is unnecessary in Python, as you can intercept access to an attribute by wrapping it into a property if and when this becomes necessary. Properties allow having accessor methods without making the user of the class have to use them explicitly. This way you can validate values when an attribute is set, or translate back and forth between the interface used on the attribute and an internal representation.
To ensure that an Employee
object only has positive values for its
salary
attribute, you’d put a property in its place which checks
values before storing them in an attribute called e.g. _salary
:
class Employee(object):
@property
def salary(self):
return self._salary
@salary.setter
def salary(self, salary):
if salary <= 0:
raise ValueError("Salary must be positive.")
self._salary = salary
Take these classes of geometric primitives, Point
and Circle
:
class Point(object):
def __init__(self, x, y):
self.x = x
self.y = y
class Circle(object):
def __init__(self, point, radius):
self.point = point
self.radius = radius
If you wanted to add a diameter
attribute to Circle
, you can do
so as a property which translates back and forth between it and the
existing radius
attribute:
...
class Circle(object):
def __init__(self, point, radius=None, diameter=None):
self.point = point
if (radius is None) == (diameter is None):
raise ValueError("Exactly one of radius or diameter must be set")
if radius is not None:
self.radius = radius
else:
self.diameter = diameter
@property
def diameter(self):
return self.radius * 2
@diameter.setter
def diameter(self, diameter):
self.radius = diameter / 2.0
...
Even setting self.diameter
in the constructor goes by way of the
property and therefore the setter method.
%
and .format()
for
great good!