0x346 Python

1. Data Structure
- 1.1. Sequence
- 1.2. Data Class Builders
2. Functions
3. Classes and Protocols
- 3.1. Iterable, Iterator and Generator
- 3.2. Decorators and Closures
4. Concurrency
- 4.1. Asynchronous
5. CPython Interpreter Internal
6. Compiler
Server
- WSGI
- ASGI
8. Reference

1. Data Structure

1.1. Sequence

There are two types of sequences from the standard library

container sequences: contains the reference (e.g: list, tuple, collections.deque)
flat sequences: contains the value directly (e.g: str, bytes, bytearray, memoryview, array.array)

1.2. Data Class Builders

A naive data class using traditional class and __init__ approach is not good because __repr__ and __eq__ are mostly useless.

A few data class builders are better

collections.namedtuple
typing.NamedTuple
@dataclasses.dataclass

The followings are useful features of dataclasses

@dataclass
class Data:
 a: str # @dataclass will make 'a' as an instance field
 c: ClassVar[str] = '' # declases a class instance
 b: list = field(default_factory=list) # default list

 def __post_init__(self):
    # can be used to do postprocessing

d = Data(...)

# convert to dict
dataclasses.asdict(d)

2. Functions

Functions in python are first-class objects, which means they can be

created at runtime
assigned to a variable
passed as an arg to a function
return as the result from a function

Btw, Guido van Rossum did not view python as a functional programming.

3. Classes and Protocols

3.1. Iterable, Iterator and Generator

There are many implicit iterators: for i in x actually invokes iter(x) which in turn may call x.__iter__()

There are three related concepts:

Iterable: An object which can generator iterator. In python, either __iter__ or __getitem__ should be implemented to be an Iterable
Iterator: An object that implement __next__ and raise StopIteration when items exhausted
Generator: simple lazy iterator, any function using coroutine yield

we should use itertools library for many iterator related functions

3.2. Decorators and Closures

Decorators are just syntactic sugar, the decorator runs at import time. In practice, decorator defines an inner function and return it

@decorate
def f():
    print('running f()')


# previous one is just a syntactic sugar of the following
def f():
    print('running f()')

target = decorate(f)

A closure is a function with extended scope capturing free variables

# ave = make_averager()
# ave(10) => 10
# ave(11) => 10.5
# ave(12) => 11.0

def make_averager():
    series = [] # this is a free variable captured by averager, it can be seen with ave.__code__.co_freevars

    def averager(new_value):
        series.append(new_value)
        total = sum(series)
        return total / len(series)

    return averager

nonlocal declaration let you declare a variable as a free variable even it is assigned in the function

def make_averager():
    count = 0
    total = 0

    def averager(new_value):
        nonlocal count, total
        count += 1
        total += new_value
        return total / count

    return averager

4. Concurrency

4.1. Asynchronous

Async IO helps us utilize the IO wait time by allowing us to perform other operations while we are in the wait state.

4.1.1. asyncio

An async function is called a coroutine, in python they are implemented with the same philosophies as generators. an await is similar in function to a yield statement.

>>> import asyncio

>>> async def main():
...     print('hello')
...     await asyncio.sleep(1)
...     print('world')

>>> asyncio.run(main())
hello
world

4.1.2. gevent

4.1.3. tornade

5. CPython Interpreter Internal

5.1. Data Structure

5.1.1. PyObject

The object definition in cpython is here where the most basic type is defined as

typedef struct _object {
    _PyObject_HEAD_EXTRA // struct _object *_ob_next; struct _object *_ob_prev;
    Py_ssize_t ob_refcnt;
    PyTypeObject *ob_type;
} PyObject;

So basically each python object looks to be a item in a double-linked list with ref count.

The variable length objects is using PyVarObject defined as

typedef struct {
    PyObject ob_base;
    Py_ssize_t ob_size; /* Number of items in variable part */
} PyVarObject;

For example, str, list are derived from this PyVarObject. When retrieving length by builtin len, it will look up ob_size

5.1.2. int (PyLongObject)

Integer related struct are as follows, reference is from include/longintrepr.h

struct _longobject {
    PyObject_VAR_HEAD
    digit ob_digit[1];
};

There are some optimizations for small integers (-5 ~ 256) where python allocated in advance, so do not need to malloc everytime for those integers. Note that tuple has the same type of optimizations.

5.2. Lexical Analysis

5.3. Syntactic Analysis

python code is immediately parsed into ast and into bytecode when interpreted. The asr will be discarded, so we need access to the source code to reproduce ast.

import ast
from pprint import pprint

tree = ast.parse("""
def add(a, b):
  return a + b

c = 1
""")
pprint(ast.dump(tree))

5.4. Bytecode Compile

The compiler here takes the AST and compile it into python bytecode, we can see how they get compiled with dis module. There is also a 3^rd party disassembler uncompyle6 translate pyc back to bytecode

For a function, the bytecode will be stored into f.__code__ and might be cached in .pyc file

The compiler implementation is cpython/Python/compiler.c

dis.dis('a=1') # compile this get following bytecodes
# 0 LOAD_CONST               0 (1)
# 2 STORE_NAME               0 (a)
# 4 LOAD_CONST               1 (None)
# 6 RETURN_VALUE

Current bytecode is available here in cpython/include/opcode. It has around 165 bytecodes.

The bytecode can be fetched from function and manipulated through __code__

def add(x):
  return x+1

# Instruction(opname='LOAD_FAST', opcode=124, arg=0, argval='x', argrepr='x', offset=0, starts_line=2, is_jump_target=False)
# Instruction(opname='LOAD_CONST', opcode=100, arg=1, argval=1, argrepr='1', offset=2, starts_line=None, is_jump_target=False)
# Instruction(opname='BINARY_ADD', opcode=23, arg=None, argval=None, argrepr='', offset=4, starts_line=None, is_jump_target=False)
# Instruction(opname='RETURN_VALUE', opcode=83, arg=None, argval=None, argrepr='', offset=6, starts_line=None, is_jump_target=False)
for x in dis.Bytecode(add):
  print(x)

# bytecode is stored as byte in __code__.co_code section
code = add.__code__
print(code)

# it can be iterated
# 0 124
# 1 0
# 2 100
# 3 1
# 4 23
# 5 0
# 6 83
# 7 0
for i, op in enumerate(code.co_code):
  print(i, op)

5.4.1. Bytecodes

arithmetic ops BINARY operation here means takes the top 2 from stack and perform operation then put it back. TOP: top item of the stack, TOP1: 2^nd item of the stack

BINARY_ADD: TOS = TOS1 + TOS
BINARY_MULTIPLY
INPLACE_ADD // for case such as i += 1

logic ops

UNARY_NOT
BINARY_AND

variable ops

LOAD_CONST: load const variables to stack
LOAD_FAST: this loads local variables to stack
SAVE_FAST: save local variables to stack
LOAD_GLOBAL: loads global variables to stack

function ops

CALL_FUNCTION: execute function
RETURN_VALUE: return

stack related ops

POP_TOP: pop from stack, this has the op number 1!
SET_TOP

others ops

ROT_TWO: swap two vars on top of the stack, used in cases like (a,b)=(b,a)
BUILD_STRING: build string
BUILD_TUPLE: for most commonly used python data structures, it looks we have this type of ops

5.4.2. Optimization

In the recent PEP 659, the concept of specialization is introduced into python. CPython becomes adaptive and allows the high-level bytecode to be rewritten to a more specialized bytecode during the execution.

For example, BINARY_ADD might be written to BINARY_ADD_UNICODE or BINARY_ADD_INT depending on the operands.

The rewritten would not happen immediately on the first call but would be triggered after being called for a few times.

5.5. Virtual Machine

CPython VM executes those bytecode with a stack machine architecture (e.g: push, pop, execute ops)

The main VM is implemented cpython/python/ceval.c

// for example, this is BINARY_ADD impl and some of my comments
        case TARGET(BINARY_ADD): {
            PyObject *right = POP(); // this is roughly doing (*--stack_pointer)
            PyObject *left = TOP();  // this is defined as stack_pointer[-1]

            PyObject *sum;
            if (PyUnicode_CheckExact(left) &&
                     PyUnicode_CheckExact(right)) {
                sum = unicode_concatenate(tstate, left, right, f, next_instr);
                /* unicode_concatenate consumed the ref to left */
            }
            else {
                sum = PyNumber_Add(left, right); // of course, this is adding big integer not int32
                Py_DECREF(left); // decrement reference
            }
            Py_DECREF(right);
            SET_TOP(sum);
            if (sum == NULL)
                goto error;
            DISPATCH(); // this is roughly defined as `goto` fast_next_opcode
        }

6. Compiler

There is a couple of options to compile python code into machine code to improve speed. Mathematical code with many loops can benefit from these compiling approach.

6.1. Cython

Cython is a ahead-of-time (AOT) compiler

6.2. Numba

Numba is a just-in-time (JIT) compiler. It compiles the bytecode into machine code on the first time the function get executed.

6.2.1. Nopython mode

aggressive compilation. Numba need to be able to infer the types of all variables. numpy arrays and simple scalar types are reasonably well supported. For list and dict, use numba's typed containers: typed-list, typed-dict

6.2.2. object mode

permissive compilation.

6.3. Pypi

Command to upload new packages

# build
python setup.py sdist bdist_wheel

# upload
python -m twine upload dist/my-package-1.0*

Server

WSGI

WSGI (Web Server Gateway Interface) is an interface forwarding web requests such as HTTP requests (i.e. server side) to python applications (i.e. client side)

The application side is expected to implement interface as follows:

# A relatively simple WSGI application. It's going to print out the
# environment dictionary after being updated by setup_testing_defaults
def simple_app(environ, start_response):
    setup_testing_defaults(environ)

    status = '200 OK'
    headers = [('Content-type', 'text/plain; charset=utf-8')]

    start_response(status, headers)

    ret = [("%s: %s\n" % (key, value)).encode("utf-8")
           for key, value in environ.items()]
    return ret

There are a few implementations:

Werkzeug: WSGI library including a simple wsgi server, used by flask
gunicorn: WSGI server using pre-fork worker, worker can be either gevent (using greenlet) or gthread

ASGI

ASGI (Asynchronous Server Gateway Interface) is designed to be a successor of WSGI, it forward requests to async python applications:

async def application(scope, receive, send):
    event = await receive()
    ...
    await send({"type": "websocket.send", ...})

8. Reference

[1] cpython Github (https://github.com/python/cpython)
[2] cpython internals (https://github.com/zpoint/CPython-Internals)
[3] Ramalho, Luciano. Fluent Python: clear, concise, and effective programming. " O'Reilly Media, Inc.", 2015.