Python Resource Management

I would like to share an idea about an RAII-like approach to resource management in Python. I don’t follow the Python world (my preferences go to Scala, C++ or Rust), so most likely the idea I present here is not new; someone must have "discovered" it before me. TL;DR: The idea is to use coroutine trampolining to avoid nesting with blocks in the code, and so to allow an RAII approach: when an instance goes out of its scope, a release action is executed.

1. What do we mean by "resource management"?

By a resource we mean anything which follows the pattern Acquire, Use, Release. That is, a resource which is acquired should eventually be released. A typical example from the native world is a block of memory allocated on the free store, in C++ acquired with new and released with delete (C: malloc/free).

An example which everybody knows is opening a file (C: fopen) which returns a handle to a just opened file which we should later pass to some close mechanism (C: fclose). Some other examples most developers have met: TCP sockets, database connections or mutexes.

The main problem with proper resource management is ensuring that once a resource is acquired, it is released somewhere later, even in the presence of exceptions (in languages having them). This is a well known problem, so I will make it short:

def program():
    f = open('file.txt', 'w')
    ...
    f.write('Hello, world')
    ...
    f.close()

What if the code between open and close returns or throws (in the Python lingo, raises)? Then we leave the file open "forever", which is a resource leak. For scripts which exit immediately, this might not be a big problem, but for every long running application we need to ensure this will not happen: If we open more and more files and don’t close them, the operating system will eventually refuse to open one more; if we don’t release a mutex, we have a deadlock.
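To make the leak concrete, here is a small sketch. The helper names leaky and careful and the temporary file are made up for illustration. It shows that an exception raised between open and close leaves the handle open, and that a try/finally block is the manual fix.

```python
import os
import tempfile

def leaky(path, handles):
    f = open(path, 'w')
    handles.append(f)           # keep a reference so we can inspect it later
    raise RuntimeError('boom')  # simulates a failure in the "..." part
    f.close()                   # never reached

def careful(path, handles):
    f = open(path, 'w')
    handles.append(f)
    try:
        raise RuntimeError('boom')
    finally:
        f.close()               # runs on every exit path

path = os.path.join(tempfile.mkdtemp(), 'file.txt')
handles = []
for fn in (leaky, careful):
    try:
        fn(path, handles)
    except RuntimeError:
        pass

states = [h.closed for h in handles]
print(states)        # [False, True]
handles[0].close()   # clean up the leaked handle by hand
```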

2. Quick tour on resource management solutions

2.1. RAII

An approach which C++ (and Rust and more) has taken is the Resource acquisition is initialization idiom (RAII). The idea is that for each resource we introduce a scope-bound resource manager (also known as owner). The resource is usually acquired as part of instantiation of the manager (C++: manager constructor, mgr m(…)), and the release action is called automatically when the manager goes out of scope (C++: manager destructor, m.~mgr()). For brevity I will not speak about move semantics, which allows us to pass the ownership (and hence the responsibility to release the managed resource) between different contexts. The file example in C++:

void program(const std::string & fname) {
    std::fstream f(fname);
    ...
    f << "Hello, world";
    ...
    // f.~fstream() is called here, right before }
}

The important thing is that the compiler ensures that the release action is always called, on all exit paths, including the exceptional ones. One nice property is that the RAII approach nests really well:

void program() {
    resource1 r1(...); // acquire the resource #1   -------+
    resource2 r2(...); // acquire the resource #2     --+  |
    ...                // use them                      |  |
    // automatically called (note the order):           |  |
    //   r2.~resource2() // release the resource #2   --+  |
    //   r1.~resource1() // release the resource #1 -------+
}

2.2. Java, C#: Try-with-resources

This is the approach taken by many mainstream languages with automatic memory management, such as Java and C# (for the Python version of this, please refer to the next section). The idea starts with the try-finally block:

BufferedReader br = new BufferedReader(new FileReader(path));
try {
    ... // use `br` here
} finally {
    br.close();
}

(an example from the Oracle Java tutorial). The runtime guarantees that whatever happens inside the try block, the finally block will be executed.

The pattern is so common that the languages introduced some special measures. In Java, it is the try-with-resources block:

try (BufferedReader br = new BufferedReader(new FileReader(path))) {
    return br.readLine();
}

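We create an instance of a class implementing the special interface AutoCloseable (its subinterface Closeable is the common case; C#: IDisposable, used with the using statement). The interface defines the release action close() (C#: Dispose()), which is called automatically when the block is left, provided the instance was constructed successfully.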

This pattern can lead (and often leads) to spaghetti code (similar in nature to the callback hell known from JavaScript before the async/await syntax) with nested resources:

try (Resource1 r1 = new Resource1()) {
    try (Resource2 r2 = new Resource2()) {
        try (Resource3 r3 = new Resource3()) {
            ...
        }
    }
}

There are ways to deal with this nesting and the code shifted "too much right", but it’s not the purpose of this text to discuss them.

2.3. Python: with statement

The Python approach is similar, but more functional-programming like (quite surprisingly in Python) — we will use this fact to our advantage later. The correct way to open a file for writing in Python is the following:

def program():
    with open('file.txt', 'w') as f:
        ...
        f.write('Hello, world')
        ...

The with statement starts a block of code, introducing a new variable (as …​) for a handle (f) for the just acquired resource (opened file). What happens underneath here:

  1. The expression after with (here open(…)) is evaluated to a context manager (call it ctx), an object which represents what it means to acquire and release the resource. A context manager may only save its parameters and defer the actual side effect until later (this is the functional programming face of it); the built-in open is less pure and opens the file right away, the returned file object being its own context manager.

  2. Then, ctx.__enter__() is called and its return value is bound to the variable after as. For deferred context managers this is the point where the resource is actually acquired.

  3. When leaving the block of code indented after with, Python calls ctx.__exit__(…) which closes the file. The __exit__ method is called both on normal exit and if an exception is raised.
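The protocol behind the with statement can be sketched with a hand-written context manager. The class Managed and the log list are hypothetical names used only for this illustration; the point is the order of the __enter__ and __exit__ calls, including on the exceptional path.

```python
class Managed:
    """Hypothetical context manager that records what happens to it."""

    def __init__(self, name, log):
        # Only stores parameters; nothing is acquired yet.
        self.name, self.log = name, log

    def __enter__(self):
        self.log.append(f'acquire {self.name}')
        return self.name          # this value is bound to the `as` target

    def __exit__(self, exc_type, exc, tb):
        self.log.append(f'release {self.name}')
        return False              # do not swallow the exception

log = []
try:
    with Managed('r', log) as handle:
        log.append(f'use {handle}')
        raise ValueError('oops')
except ValueError:
    log.append('error propagated')

print(log)
# ['acquire r', 'use r', 'release r', 'error propagated']
```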

As with try-with-resources in Java or C#, we often see Python codebases nesting with blocks and code there shifted too much right:

def program():
    with resource1(...) as r1:
        with resource2(...) as r2:
            with resource3(...) as r3:
                ...

Again, there are ways to deal with this, but they require some care which is not always seen in real codebases.
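For reference, two of the standard ways to flatten such nesting are a single with statement listing several managers, and contextlib.ExitStack from the standard library. The sketch below uses a made-up resource res which only records what happens to it.

```python
from contextlib import ExitStack, contextmanager

log = []

@contextmanager
def res(name):
    # Hypothetical resource: only records acquire and release.
    log.append(f'acquire {name}')
    try:
        yield name
    finally:
        log.append(f'release {name}')

# 1. Several managers in one with statement (released in reverse order).
with res(1) as r1, res(2) as r2, res(3) as r3:
    log.append(f'use {r1} {r2} {r3}')

# 2. ExitStack: enter managers one by one, possibly in a loop.
with ExitStack() as stack:
    handles = [stack.enter_context(res(n)) for n in ('a', 'b')]
    log.append(f'use {handles}')

print('\n'.join(log))
```

Both forms still need a with block around the code using the resources, which is exactly what the rest of this text tries to avoid.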

2.4. Other approaches — find them yourself!

This text is not meant as an exhaustive resource on resource management — there are definitely other approaches, the most interesting I know is a Resource[IO, T] abstraction in the IO monad world. See the Cats Effect implementation of it.

3. RAII in Python

As mentioned above, there is a problem that nesting with blocks causes our code to look like spaghetti shifted too much right. I would like to present an idea how this can be prevented using another Python language feature, coroutines. I have not seen this before, but I’m not a Python developer (meh) and so it’s very likely somebody got the idea before me. Still, I was not able to find any reference to it on Google (maybe I used the wrong search terms). I would like to know about any prior art: please let me know at [email protected].

The idea is to have something like RAII in Python — when a variable goes out of the scope, we want a release action to run:

def program():
    r1 = <RAII> resource1()
    r2 = <RAII> resource2()
    r3 = <RAII> resource3()
    ...
    # on program exit, run the release actions
    # for `r1`, `r2`, `r3`, in reversed order, similarly to RAII in C++

The <RAII> stands for some "magic" to convince Python to "register" release actions to be run when we leave the scope. This looks like an impossible task in Python, but it is not. What we want to do at run time is what Python allows us to do with the with statement at the time of writing the code:

def program():
    with resource1(...) as r1:
        with resource2(...) as r2:
            with resource3(...) as r3:
                ...

i.e. we want to delegate the guarantee to call release actions to Python itself and not "invent" some new "runtime" on top of the Python runtime (which is what IO libraries do on the JVM). At the same time, we want to avoid using with blocks and their inherent nesting (which probably is by design and in accordance with the rule Explicit is better than implicit).

The idea is to not call program directly, but manage its execution as a coroutine execution:

def program():
    r1 = yield resource1(...)
    r2 = yield resource2(...)
    r3 = yield resource3(...)
    ...

# for the implementation of the "driver" of this coroutine
# please continue reading

An oversimplified introduction: a coroutine is a "function" from which you can return to the caller with yield (instead of return), but unlike with ordinary functions, the caller can pass the execution back to the callee at the point where it was left (right after the last yield), possibly passing a value there; all you need to do is assign the result of yield to a variable.
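A minimal sketch of this back-and-forth, with a made-up generator echo: next runs the body up to the first yield, send resumes it and makes the sent value the result of the yield expression, and a return inside the generator surfaces as StopIteration on the caller's side.

```python
def echo():
    received = yield 'first'            # pause; hand 'first' to the caller
    received = yield f'got {received}'  # pause again with a reply
    return f'done with {received}'      # ends the generator

gen = echo()
a = next(gen)        # runs up to the first yield
b = gen.send('x')    # resumes; `received` becomes 'x'
try:
    gen.send('y')    # resumes; the generator returns
except StopIteration as stop:
    c = stop.value   # the return value travels in the exception

print(a, '|', b, '|', c)
# first | got x | done with y
```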

So, a coroutine execution can be driven from outside. In the example above, the code driving the program needs to execute it as if it was an ordinary function written like this:

def program():
    with resource1(...) as r1:
        with resource2(...) as r2:
            with resource3(...) as r3:
                ...

Without further ado, here it is:

def program():
    r1 = yield resource1(...)
    r2 = yield resource2(...)
    r3 = yield resource3(...)
    ...

def run(program):
    coro = program()
    def stack(res):
        with res as r:
            next_res = coro.send(r)
            stack(next_res)
    stack(next(coro))

run(program)

What happens here?

  • The run function creates a generator from the supplied program. We save this generator as coro. Note that coro is now suspended, i.e. prepared to be run; nothing has happened yet.

  • Next, next(coro) is called. This actually enters the body of program and executes resource1(…​) which returns a context manager res1 for the resource #1 (not the resource handle itself as mentioned above — this is the crucial point).

  • The context manager res1 is yielded ("sent" in the sense of message passing) from program back to run, and passed there to stack as res.

  • Now we are at the line with the with statement which calls r = res1.__enter__(). The r is the handle to the just acquired resource #1.

  • coro.send(r) resumes the program where it was left and sends there the handle r which is saved as local variable r1.

  • Now, the program continues and creates a context manager res2 for the resource #2 which is again yielded (sent) to run and saved to next_res variable.

  • run continues by executing stack(next_res) and history repeats itself: we acquire the resource #2 by calling res2.__enter__(), and the program is resumed again with the resource handle, which it saves to the local variable r2.

  • And so on.

So, we gradually build the nested with blocks inside the run driver and each time we make a new with block, we resume the program with the resource handle — and since the nesting is done inside run (with the help of recursion instead of hardcoding it), the program itself is relieved from it.
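One detail the run above leaves open: as far as I can tell, when program finishes normally (instead of raising), the final coro.send raises StopIteration, which releases the resources correctly but then escapes from run. The following variant is a sketch of one way to handle that and to pass the return value of program back to the caller. The resource helper and the log list are hypothetical and only record events.

```python
from contextlib import contextmanager

log = []

@contextmanager
def resource(r):
    log.append(('acquire', r))
    try:
        yield r
    finally:
        log.append(('release', r))

def program():
    a = yield resource(1)
    b = yield resource(2)
    return a + b                       # normal completion with a result

def run(program):
    coro = program()

    def stack(res):
        with res as r:
            try:
                next_res = coro.send(r)
            except StopIteration as stop:
                return stop.value      # program finished; unwind the withs
            return stack(next_res)

    try:
        first = next(coro)
    except StopIteration as stop:
        return stop.value              # program acquired no resource at all
    return stack(first)

result = run(program)
print(result, log)
```

The try/except sits inside the with block on purpose, so the context managers see a normal exit rather than a StopIteration passing through them.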

Let me show you a concrete example:

from contextlib import contextmanager

@contextmanager
def resource(r):
    print('resource::acquire', r)
    try:
        yield r
    finally:
        print('resource::release', r)

def program():
    a = yield resource(1)
    print('use a =', a)

    b = yield resource(2)
    print('use b =', b)

    c = yield resource(3)
    print('use c =', c)

    assert False, "intentional error"

def run(program):
    coro = program()
    def stack(res):
        with res as r:
            next_res = coro.send(r)
            stack(next_res)
    stack(next(coro))

run(program)

The output:

resource::acquire 1
use a = 1
resource::acquire 2
use b = 2
resource::acquire 3
use c = 3
resource::release 3
resource::release 2
resource::release 1
Traceback (most recent call last):
  ...
    assert False, "intentional error"
AssertionError: intentional error

The @contextmanager part is just a convenient way to create a context manager. You can see that the program itself is a nice function (more precisely, a generator function) without any syntactic noise and without any nesting, yet even in the presence of an exception (assert False), the release actions are called for resources 3, 2 and 1 (in the right, reversed order).

Again: This idea I have not seen anywhere, but this does not mean I am the first person who "discovered" it. Please let me know if you have seen this before.

This is the idea itself and what follows is just an iteration / warning that there are caveats. If we change our program to

def program():
    for i in range(1000):
        a = yield resource(i)
        print('use a =', a)

    assert False, "intentional error"

and run it, we get an unpleasant RecursionError: maximum recursion depth exceeded while calling a Python object, caused by the recursive calls of stack.

I’m not a Python person, so I will present a simple solution for this, but I wouldn’t be surprised if this had a better solution — I just want to demonstrate a solution exists:

import asyncio

async def run(program):
    coro = program()

    async def stack(res):
        with res as x:
            next_res = coro.send(x)
            next_stack = asyncio.create_task(stack(next_res))
            await next_stack

    await stack(next(coro))

asyncio.run(run(program))

Instead of letting the execution stack grow, we use asyncio to turn each call of stack into an awaitable Task which we submit to the event loop. This way, every call of stack runs in its own independent context and no RecursionError will happen.

Let us try again with this asyncio version of run. The output of program is then

resource::acquire 0
use a = 0
resource::acquire 1
use a = 1
...
resource::acquire 998
use a = 998
resource::acquire 999
use a = 999
resource::release 999
resource::release 998
...
resource::release 1
resource::release 0
Traceback (most recent call last):
  ...
    assert False, "intentional error"
AssertionError: intentional error
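The RecursionError can also be avoided without asyncio. The sketch below is an alternative, not the approach described above: it swaps the recursion for a loop and lets contextlib.ExitStack from the standard library remember the release actions. The resource helper and the log list are hypothetical and only record events.

```python
from contextlib import ExitStack, contextmanager

log = []

@contextmanager
def resource(r):
    log.append(('acquire', r))
    try:
        yield r
    finally:
        log.append(('release', r))

def program():
    for i in range(5000):        # far beyond the default recursion limit
        yield resource(i)
    raise ValueError('intentional error')

def run(program):
    coro = program()
    with ExitStack() as stack:   # releases everything in LIFO order on exit
        try:
            res = next(coro)
            while True:
                handle = stack.enter_context(res)  # acquire, register release
                res = coro.send(handle)
        except StopIteration as stop:
            return stop.value    # program finished normally

try:
    run(program)
except ValueError as e:
    error = str(e)

print(error, len(log), log[5000], log[-1])
```

The program stays flat, and the driver needs neither recursion nor an event loop; the price is that the nesting of with blocks is now emulated by ExitStack instead of being produced by real with statements.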

Please let me know if you find this interesting or if you have seen this trick before, that is, moving the nesting of with statements into a function which drives a coroutine execution. To the best of my knowledge, this is not published anywhere as of today, but I don’t follow the Python world and googling is often not very helpful in getting this kind of information.

I can imagine that for the use case which made me think about the ways of resource management in Python and which requires acquiring many nested resources, this can be a revolution in code safety / clarity.

Waiting for your feedback 🙏