Dealing with inversion of control flow

Daniel Näslund

#15561

June 14, 2018

This is just some free-wheeling thoughts. No concrete question in here (or there probably is, but I just can't formulate it yet). Sorry if I come across as someone sniffing paint thinner.

I've noticed that a couple of design choices in ion has been about avoiding "inversion of control flow":

* The compiler uses tagged unions instead of inheritance (given that we're in C, it's a more obvious choice than if we were in C++ or another language with syntactic abstractions for class hierarchies, but still it's a choice).
* The platform layer noir uses something which I guess resembles the immediate GUI paradigm (these things are not something I'm familiar with so bare with me. My experience with GUI programming has been from slots and signals OO toolkits like Qt and GTK where everything is callback based.)
* The assembler cmd_loop uses what Per refers to as "a poor mans co-routine", where we avoid the inversion of control caused by breakpoint callback, by re-enter the cmd_loop.

Avoiding callbacks and inheritance makes coding more fun and easier to me, but I'm curious about where the limitations of these approaches are. I've used yield statements in Python and promises in Javascript and it has led to great simplifications of some problems, but I always have this feeling that I'm in a "shadow land" in-between regular straight-forward code (uh, the inverse of inverse control flow. Can't come up with a name for it) and inheritance/callback based code.

Does anyone have good suggestions on how to think about these things?

Edited by Daniel Näslund on June 14, 2018, 8:34pm Reason: Initial post

ratchetfreak

#15562

June 14, 2018

If you don't have compiler or language support for coroutines you end up needing to implement a state machine for each of them. You'll need to save all variables that are alive at each yield explicitly.

Stack support (yielding in a nested call) will require the entire call stack supporting the yield by being yielding themselves.

Debugging is kinda a pain because it's not always obvious which state each coroutine object is in. This is especially true when the coroutine is created with compiler support because that will tend to hide the resume point.

Daniel Näslund

#15565

June 15, 2018

ratchetfreak
If you don't have compiler or language support for coroutines you end up needing to implement a state machine for each of them. You'll need to save all variables that are alive at each yield explicitly.

Stack support (yielding in a nested call) will require the entire call stack supporting the yield by being yielding themselves.

Debugging is kinda a pain because it's not always obvious which state each coroutine object is in. This is especially true when the coroutine is created with compiler support because that will tend to hide the resume point.

Thank you for responding. Here are more rambling thoughts:

After reading this reply I found Simon Sathams article Coroutines in C that uses the stack for keeping track of where to reenter a function. I can see how that pattern resembles what Per wrote for the breakpoint_callback.

int function(void) {
    static int i, state = 0;
    switch (state) {
        case 0: /* start of function */
        for (i = 0; i < 10; i++) {
            state = 1; /* so we will come back to "case 1" */
            return i;
            case 1:; /* resume control straight after the return */
        }
    }
}

I guess two components which consumes data in "lock-step", are never required to use callbacks. For instance a lexer and a parser can be written as the lexer calling into a parser callback; or it could be rewritten using coroutines (we just need one-way communication there so a generator would do, yielding tokens); or it could be rewritten using an iterator-style API where we have state context and a next method, like the next_token() function used in the bitwise lexers. So this applies to all those "data in one form" to "data in another form" problems.

For two components where there's back-and-forth passing of control-flow, we can use two state machines; or threads with message passing between them; or the modules can be rewritten to use coroutines. But, as you mentioned, coroutines comes with a "mental cost". The control flow might be easier to follow for simple cases, but maybe the overhead of keeping track of states will consume the benefits for larger programs? Are there other ways of structuring command loop programs, besides these I mention here?

For 1:N relations between components, say a state change in X needs to cause update of state in N components, we have to rely on callbacks or we have to manually keep track of all state (I guess that's what the immediate mode GUIs do). Coroutines can't help us here.

My confusion is about the interleaving of these concepts:

Granularity of concurrency: Fibers (non-preemptive threads), coroutines, green-threads, kernel-threads, processes, systems
Polymorphism: Tagged unions, class-based inheritance, duck-typing
Syntactic sugar over callbacks: callbacks, promises, async-await, continuations?
Higher level control flow constructs: straight-line code, conditionals+loops, callback, coroutines, iterators, excepions, message passing between threads, message passing between processes, message passing between machines
Data binding: retained mode, immediate mode, reactive. "Who is responsible for keeping track of the program state"?

Does anyone have suggestions on books or articles that clearly describes these concepts?

Are threaded code as used by forth a form of continuation?

Edited by Daniel Näslund on June 15, 2018, 6:28am

ratchetfreak

#15566

June 15, 2018

dannas

After reading this reply I found Simon Sathams article Coroutines in C that uses the stack for keeping track of where to reenter a function. I can see how that pattern resembles what Per wrote for the breakpoint_callback.
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
int function(void) {
    static int i, state = 0;
    switch (state) {
        case 0: /* start of function */
        for (i = 0; i < 10; i++) {
            state = 1; /* so we will come back to "case 1" */
            return i;
            case 1:; /* resume control straight after the return */
        }
    }
}

That doesn't use the stack but static local variables, which are essentially globals, except not accessible outside the function they are declared in

dannas

I guess two components which consumes data in "lock-step", are never required to use callbacks. For instance a lexer and a parser can be written as the lexer calling into a parser callback; or it could be rewritten using coroutines (we just need one-way communication there so a generator would do, yielding tokens); or it could be rewritten using an iterator-style API where we have state context and a next method, like the next_token() function used in the bitwise lexers. So this applies to all those "data in one form" to "data in another form" problems.

For two components where there's back-and-forth passing of control-flow, we can use two state machines; or threads with message passing between them; or the modules can be rewritten to use coroutines. But, as you mentioned, coroutines comes with a "mental cost". The control flow might be easier to follow for simple cases, but maybe the overhead of keeping track of states will consume the benefits for larger programs? Are there other ways of structuring command loop programs, besides these I mention here?

Maybe the best example I've seen is zlib's API, there after you init you pass an input buffer and an output buffer, it does its processing consuming from input and putting to output and returns once it cannot continue anymore because one of them ran out. Then the caller has to give more input or more output space as appropriate and call the function again.

This allows it to be easily slotted into either a push or a pull stack. Depending on which end controls the data rate.

Daniel Näslund

#15569

June 15, 2018

ratchetfreak

That doesn't use the stack but static local variables, which are essentially globals, except not accessible outside the function they are declared in

You're right, I pasted the code without reading it through properly. I guess that in order to store local variables I need to copy the current call frame somewhere using mechanisms like the POSIX setcontext family of functions.

ratchetfreak

Maybe the best example I've seen is zlib's API, there after you init you pass an input buffer and an output buffer, it does its processing consuming from input and putting to output and returns once it cannot continue anymore because one of them ran out. Then the caller has to give more input or more output space as appropriate and call the function again.

This allows it to be easily slotted into either a push or a pull stack. Depending on which end controls the data rate.

Speaking of compression APIs, I stumbled upon a gist by Micha Mettke on API design and found a short reply by Per Vognsen where he has a sketch of the following API:

struct unzip_t unzip = { ... };
  unzip_init(&unzip);
  struct unzip_packet_t packet;
  while (unzip_next(&unzip, &packet)) {
    if (packet.type == UNZIP_FILE) {
      packet.file.address = zip_file_memory + packet.file.offset;
    } else if (packet.type == UNZIP_TOC) {
      free(toc);
      packet.toc.address = toc = malloc(packet.toc.size);
    }
  }
return unzip.status != UNZIP_ERROR;

It provides fine-grained interleaving between user code and library code and it's another example of a type of API that I haven't come across in the past.