0x346 Python

1. Data Structure

1.1. Sequence

There are two types of sequences in the standard library:

  • container sequences: hold references to the objects they contain, which may be of any type (e.g. list, tuple, collections.deque)
  • flat sequences: store the value of each item directly in their own memory space (e.g. str, bytes, bytearray, memoryview, array.array)
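
A minimal sketch of the difference:

from array import array

t = (1, 'a', [2, 3])             # container sequence: holds references to objects of any type
a = array('d', [1.0, 2.0, 3.0])  # flat sequence: holds raw C doubles in one contiguous block
a.itemsize                       # 8: each item occupies exactly 8 bytes, no per-item object header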

2. Functions

Functions in python are first-class objects, which means they can be

  • created at runtime
  • assigned to a variable
  • passed as an argument to a function
  • returned as the result from a function
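
A minimal sketch demonstrating all four properties (make_shout is a made-up example):

def make_shout():                    # created at runtime
    def shout(text):
        return text.upper()
    return shout                     # returned as the result from a function

shout = make_shout()                 # assigned to a variable
print(list(map(shout, ['a', 'b']ŝ)))  # passed as an arg to a function -> ['A', 'B']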

Btw, Guido van Rossum did not view Python as a functional programming language.

3. Classes and Protocols

3.1. Iterable, Iterator and Generator

There are many implicit iterators: for i in x actually invokes iter(x) which in turn may call x.__iter__()

There are three related concepts:

  • Iterable: an object from which an iterator can be obtained; in Python, implementing either __iter__ or __getitem__ makes an object iterable
  • Iterator: an object that implements __next__ and raises StopIteration when items are exhausted
  • Generator: a simple lazy iterator; any function containing a yield statement returns a generator when called

Use the itertools library for many iterator-related functions.
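
A minimal sketch of all three concepts (Countdown is a made-up example class):

class Countdown:                  # Iterable: implements __iter__
    def __init__(self, start):
        self.start = start

    def __iter__(self):           # contains yield, so calling it returns a generator
        n = self.start
        while n > 0:
            yield n
            n -= 1

list(Countdown(3))                # [3, 2, 1]

it = iter(Countdown(3))           # what a for loop does under the hood; it is an Iterator
next(it)                          # 3, then 2, then 1, then StopIteration is raised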

3.2. Decorators and Closures

Decorators are just syntactic sugar; a decorator runs at import time. In practice, a decorator defines an inner function and returns it.

def decorate(func):  # a made-up decorator so the snippet runs
    print(f'running decorate({func.__name__})')
    return func

@decorate
def f():
    print('running f()')


# the previous snippet is just syntactic sugar for the following
def f():
    print('running f()')

f = decorate(f)

A closure is a function with an extended scope that captures free variables: variables referenced in the function body but not defined there.

# ave = make_averager()
# ave(10) => 10.0
# ave(11) => 10.5
# ave(12) => 11.0

def make_averager():
    series = [] # a free variable captured by averager; visible via ave.__code__.co_freevars

    def averager(new_value):
        series.append(new_value)
        total = sum(series)
        return total / len(series)

    return averager

The nonlocal declaration lets you mark a variable as a free variable even when it is assigned within the function (assignment would otherwise make it local):

def make_averager():
    count = 0
    total = 0

    def averager(new_value):
        nonlocal count, total
        count += 1
        total += new_value
        return total / count

    return averager

4. Concurrency

4.1. Asynchronous

Async IO helps us utilize the IO wait time by allowing us to perform other operations while we are in the wait state.

4.1.1. asyncio

An async function is called a coroutine. In Python, coroutines are implemented with the same philosophy as generators: an await is similar in function to a yield statement.

>>> import asyncio

>>> async def main():
...     print('hello')
...     await asyncio.sleep(1)
...     print('world')

>>> asyncio.run(main())
hello
world
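
The benefit shows up when several coroutines wait at the same time. A sketch using asyncio.gather (fetch is a made-up stand-in for real IO); the two sleeps overlap, so this takes about 1 second rather than 2:

import asyncio

async def fetch(name, delay):
    await asyncio.sleep(delay)    # while this coroutine waits, the event loop runs the other
    return name

async def main():
    results = await asyncio.gather(fetch('a', 1), fetch('b', 1))
    print(results)                # ['a', 'b']

asyncio.run(main())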

4.1.2. gevent

4.1.3. tornado

5. CPython Interpreter Internals

5.1. Data Structure

5.1.1. PyObject

The object definition in CPython lives in Include/object.h, where the most basic type is defined as

typedef struct _object {
    _PyObject_HEAD_EXTRA // struct _object *_ob_next; struct _object *_ob_prev;
    Py_ssize_t ob_refcnt;
    PyTypeObject *ob_type;
} PyObject;

So each Python object carries a reference count and a pointer to its type. The _PyObject_HEAD_EXTRA prev/next pointers, which would link every object into a doubly-linked list, are only compiled in when CPython is built with Py_TRACE_REFS (debug builds).
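
The ob_refcnt field can be observed from Python with sys.getrefcount (which reports one extra reference held temporarily by its own argument):

import sys

x = []
print(sys.getrefcount(x))   # 2: one from x, one from the function argument
y = x
print(sys.getrefcount(x))   # 3: y now refers to the same object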

Variable-length objects use PyVarObject, defined as

typedef struct {
    PyObject ob_base;
    Py_ssize_t ob_size; /* Number of items in variable part */
} PyVarObject;

For example, str and list are derived from PyVarObject. When the builtin len retrieves a length, it looks up ob_size.

5.1.2. int (PyLongObject)

The integer struct is as follows, from Include/longintrepr.h

struct _longobject {
    PyObject_VAR_HEAD
    digit ob_digit[1];
};

There is an optimization for small integers (-5 to 256): Python preallocates them in advance, so it does not need to malloc every time one of these values is used. Note that tuple has a similar optimization: a free list of discarded tuples is kept for reuse.
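
The small-integer cache can be observed with the is operator; the second half of this REPL sketch depends on the CPython version and on each statement being compiled separately (as in an interactive session):

>>> a = 256
>>> b = 256
>>> a is b    # both names point at the preallocated 256 object
True
>>> c = 257
>>> d = 257
>>> c is d    # 257 is outside the cache, so each statement creates a new object
False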

5.2. Lexical Analysis

5.3. Syntactic Analysis

Python code is parsed into an AST and then compiled into bytecode when interpreted. The AST is discarded afterwards, so we need access to the source code to reproduce it.

import ast
from pprint import pprint

tree = ast.parse("""
def add(a, b):
  return a + b

c = 1
""")
pprint(ast.dump(tree))

5.4. Bytecode Compile

The compiler takes the AST and compiles it into Python bytecode; we can see how code gets compiled with the dis module. There is also a third-party decompiler, uncompyle6, which translates .pyc files back to Python source.

For a function f, the bytecode is stored in f.__code__ and may be cached in a .pyc file.

The compiler implementation is in cpython/Python/compile.c

import dis

dis.dis('a=1') # compiling this yields the following bytecode
# 0 LOAD_CONST               0 (1)
# 2 STORE_NAME               0 (a)
# 4 LOAD_CONST               1 (None)
# 6 RETURN_VALUE

The current opcode list is in cpython/Include/opcode.h. There are around 165 opcodes.

The bytecode can be fetched from a function and inspected through __code__

import dis

def add(x):
  return x+1

# Instruction(opname='LOAD_FAST', opcode=124, arg=0, argval='x', argrepr='x', offset=0, starts_line=2, is_jump_target=False)
# Instruction(opname='LOAD_CONST', opcode=100, arg=1, argval=1, argrepr='1', offset=2, starts_line=None, is_jump_target=False)
# Instruction(opname='BINARY_ADD', opcode=23, arg=None, argval=None, argrepr='', offset=4, starts_line=None, is_jump_target=False)
# Instruction(opname='RETURN_VALUE', opcode=83, arg=None, argval=None, argrepr='', offset=6, starts_line=None, is_jump_target=False)
for x in dis.Bytecode(add):
  print(x)

# the raw bytecode is stored as bytes in __code__.co_code
code = add.__code__
print(code)

# it can be iterated; the bytes alternate as (opcode, arg) pairs
# 0 124
# 1 0
# 2 100
# 3 1
# 4 23
# 5 0
# 6 83
# 7 0
for i, op in enumerate(code.co_code):
  print(i, op)

5.4.1. Bytecodes

Arithmetic ops. A BINARY operation takes the top two items from the stack, performs the operation, and pushes the result back. TOS: top item of the stack, TOS1: 2nd item of the stack.

  • BINARY_ADD: TOS = TOS1 + TOS
  • BINARY_MULTIPLY
  • INPLACE_ADD // for case such as i += 1

logic ops

  • UNARY_NOT
  • BINARY_AND

variable ops

  • LOAD_CONST: push a constant onto the stack
  • LOAD_FAST: push a local variable onto the stack
  • STORE_FAST: pop the top of the stack into a local variable
  • LOAD_GLOBAL: push a global variable onto the stack

function ops

  • CALL_FUNCTION: execute function
  • RETURN_VALUE: return

stack related ops

  • POP_TOP: pop from the stack; this has opcode number 1!
  • SET_TOP: strictly a C macro in ceval.c rather than a bytecode; it overwrites the top of the stack (it appears in the BINARY_ADD implementation below)

others ops

  • ROT_TWO: swap the two items on top of the stack, used in cases like (a, b) = (b, a)
  • BUILD_STRING: build a string
  • BUILD_TUPLE: most of the commonly used Python data structures have a BUILD_* op like this (BUILD_LIST, BUILD_MAP, ...)
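
A sketch to verify a few of these ops with dis. The exact opcodes vary by version: ROT_TWO and INPLACE_ADD appear on CPython 3.10 and earlier, while newer versions emit SWAP and BINARY_OP instead:

import dis

def swap(a, b):
    a, b = b, a    # compiles to ROT_TWO on CPython <= 3.10

def inc(i):
    i += 1         # compiles to INPLACE_ADD on CPython <= 3.10
    return i

dis.dis(swap)
dis.dis(inc)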

5.4.2. Optimization

In the recent PEP 659, the concept of specialization was introduced into Python. CPython becomes adaptive and allows high-level bytecode to be rewritten into more specialized bytecode during execution.

For example, in CPython 3.11 the generic BINARY_OP may be rewritten to BINARY_OP_ADD_UNICODE or BINARY_OP_ADD_INT depending on the operands.

The rewriting does not happen immediately on the first call; it is triggered after the code has been executed a few times.
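
On CPython 3.11+ the specialized form can be inspected by passing adaptive=True to dis.dis after warming the function up (the specialized names shown are version-dependent):

import dis

def add(x, y):
    return x + y

for _ in range(100):           # warm up so the adaptive interpreter specializes the bytecode
    add(1, 2)

dis.dis(add, adaptive=True)    # shows e.g. BINARY_OP_ADD_INT in place of the generic BINARY_OP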

5.5. Virtual Machine

The CPython VM executes the bytecode with a stack-machine architecture (push, pop, execute ops).

The main VM loop is implemented in cpython/Python/ceval.c

// for example, this is BINARY_ADD impl and some of my comments
        case TARGET(BINARY_ADD): {
            PyObject *right = POP(); // this is roughly doing (*--stack_pointer)
            PyObject *left = TOP();  // this is defined as stack_pointer[-1]

            PyObject *sum;
            if (PyUnicode_CheckExact(left) &&
                     PyUnicode_CheckExact(right)) {
                sum = unicode_concatenate(tstate, left, right, f, next_instr);
                /* unicode_concatenate consumed the ref to left */
            }
            else {
                sum = PyNumber_Add(left, right); // of course, this is adding big integer not int32
                Py_DECREF(left); // decrement reference
            }
            Py_DECREF(right);
            SET_TOP(sum);
            if (sum == NULL)
                goto error;
            DISPATCH(); // this is roughly defined as `goto` fast_next_opcode
        }

6. Compiler

There are a couple of options for compiling Python code into machine code to improve speed. Mathematical code with many loops benefits most from these compilation approaches.

6.1. Cython

Cython is an ahead-of-time (AOT) compiler: it translates Python-like source into C, which is compiled before the program runs.

6.2. Numba

Numba is a just-in-time (JIT) compiler. It compiles the bytecode into machine code the first time the function is executed.

6.2.1. Nopython mode

Aggressive compilation: Numba needs to be able to infer the types of all variables. NumPy arrays and simple scalar types are reasonably well supported. For list and dict, use Numba's typed containers numba.typed.List and numba.typed.Dict.
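
A minimal nopython-mode sketch (requires the numba package; njit is shorthand for jit(nopython=True)):

import numpy as np
from numba import njit

@njit
def total(a):
    s = 0.0
    for x in a:          # plain loops are compiled to machine code in nopython mode
        s += x
    return s

arr = np.arange(1_000_000, dtype=np.float64)
total(arr)               # the first call triggers compilation; later calls reuse the machine code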

6.2.2. object mode

Permissive compilation: code whose types cannot be inferred falls back to operating on Python objects, so the speedup is usually modest.

6.3. PyPI

Commands to build and upload a new package

# build
python setup.py sdist bdist_wheel

# upload
python -m twine upload dist/my-package-1.0*

7. Numpy

7.1. API

Summary of some commonly used API

API (np.random)

The following APIs are consistent with np.ones and np.zeros:

  • np.random.randint(high, size=()): generate uniform random ints with a target shape
  • np.random.random_sample(size=()): generate random \(U(0,1)\) samples with a target shape
  • np.random.standard_normal(size=()): generate random \(N(0,1)\) samples with a target shape
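
For example:

import numpy as np

np.random.randint(10, size=(2, 3))       # 2x3 ints drawn uniformly from [0, 10)
np.random.random_sample(size=(2, 3))     # 2x3 floats from Uniform(0, 1)
np.random.standard_normal(size=(2, 3))   # 2x3 floats from N(0, 1)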
