proposal: Add coroutines #21

Open Jookia opened this issue on 5 Jul 2022 - 3 comments

@Jookia Jookia commented on 5 Jul 2022

Yesterday I posted a proposal for adding continuations to NewLang, and while writing it I found the case wasn't really convincing. Furthermore, when I asked Xogium about it they didn't really get it. So here's an alternative: Coroutines!

What are coroutines?

Coroutines are basically easy-to-use subprocesses that don't necessarily use operating system resources. You can spawn a coroutine to run some code. Because they don't use operating system resources, thousands of them can run within a single process. They are implemented much like an RTOS implements threads of execution: stack switching.

In this case my proposal is to add a new keyword to NewLang: Spawn. Spawn will start a new coroutine and run a function in it. The coroutine is returned as a variable that can be used to communicate with and manage it.

Coroutines have a global variable, ParentProcess. This can be used to talk to the process that created them.

Example: Exceptions

Let's say you're writing some code and you want to do error handling in one place. In most languages this is done through exceptions: you register a place to 'catch' an exception, and code you run might 'throw' one. When an exception is thrown, the catcher runs immediately, bypassing intermediate code.
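For comparison, here is conventional exception handling sketched in Python, where the thrower and the catcher are linked by the call stack rather than by messages:

```python
# Conventional exceptions: the catcher is registered with try/except,
# code anywhere below it may throw, and intermediate code is bypassed.
def do_thing():
    raise RuntimeError("Big Error!")     # 'throw' the exception

def intermediate():
    do_thing()
    print("never reached")               # bypassed by the throw

def run():
    try:                                 # register a place to 'catch'
        intermediate()
    except RuntimeError as error:        # the catcher runs immediately
        return f"caught: {error}"

print(run())                             # caught: Big Error!
```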

We can do this using coroutines as follows:

  • Spawn a new process that runs some code
  • That code will message the parent process with an error
  • The parent process will respond to the error and kill the process

Here's an example using hypothetical syntax:

Function DoThing1 Do
  System Print StartText Doing some things... EndText
  ParentProcess Send StartText Big Error! EndText
  System Print StartText Things done! EndText
EndFunction
Function DoStuff Do
  StartNote All these might error, but it's ok, MyRunner will handle it EndNote
  DoThing1
  DoThing2
  DoThing3
EndFunction
Function MyRunner Do
  Set Child To Spawn DoStuff EndSet
  Set Error To Child Receive EndSet
  If Error Equals StartText Big Error! EndText
  Then Child Kill
  Else System Print StartText All good! EndText
  EndIf
EndFunction

In this case this program will execute MyRunner by:

  • Spawning a process named Child that will run DoStuff
  • Waiting for a message from Child
  • The runtime sees MyRunner is waiting so it switches to Child
  • Runs DoThing1
  • Prints "Doing some things..."
  • Sends the message "Big Error!" to ParentProcess which is MyRunner
  • The runtime switches back to MyRunner now that there's a message for it
  • MyRunner reads the message to a variable named Error
  • MyRunner checks if Error is "Big Error!"
  • It is, so MyRunner kills the Child process
  • Any cleanup needed by Child is done
  • MyRunner then continues on running
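The steps above can be sketched with Python generators standing in for coroutines. The mapping of Spawn, Send, Receive, and Kill onto generator operations here is my own illustration, not NewLang's actual semantics:

```python
# Hypothetical sketch of the MyRunner example. Generators model
# coroutines: yield is Send, next() is Receive, close() is Kill.

def do_thing1():
    print("Doing some things...")
    yield "Big Error!"            # ParentProcess Send
    print("Things done!")         # never reached if the parent kills us

def do_stuff():
    # All these might error, but it's ok, my_runner will handle it
    yield from do_thing1()

def my_runner():
    child = do_stuff()            # Spawn
    error = next(child)           # Receive: runs child until it sends
    if error == "Big Error!":
        child.close()             # Kill: child may clean up, then dies
    else:
        print("All good!")

my_runner()
```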

Other uses

I don't want to overcomplicate things by writing more examples, but here are some other cases where this is useful:

  • Streams and pipelines that join together, like with Bash and Unix
  • Having some other code do tasks for you, such as opening a file
  • Writing code that waits for user input and having another process handle that logic

Here's some things it can't do compared to continuations:

  • Go back in time

Other solutions considered

For error handling, continuation passing style and things like monads could work instead.

@Jookia Jookia commented on 8 Aug 2022

I've been programming another project in Python over the past few days, so I want to dump some thoughts here.

When I originally wrote this issue I set out a model like this:

  • You spawn a coroutine
  • The coroutine runs
  • The coroutine does something
  • The coroutine sends a value to the parent
  • The coroutine stops
  • The parent runs
  • The parent kills the coroutine

"Why does the coroutine start or stop?" is a question I didn't address. The coroutine must stop so another coroutine can run, so I figured in my example all coroutines would stop when they send a message and start when they are spawned or have a new message. This is equivalent to saying "switch to this coroutine and give it this value". This is similar to Lua coroutines.

One thing that this doesn't handle is asynchronous I/O. Consider this:

  • You have a server coroutine
  • It waits for a connection
  • It spawns a coroutine for a new connection
  • The new connection sends a response
  • The new connection waits for data from the client

At this point in our program we have two coroutines:

  • The server which spawns workers for each connection
  • A worker handling actual logic

The server is now waiting for the worker coroutine to finish, and the worker coroutine is going to spend a lot of time handling the request due to latency. This stops us from handling other work.

We want our coroutine to say 'stop me until I get data' and let the server run. This is difficult because the server doesn't know when the worker even gets data; it just spawns workers. How do we do this?

A popular solution is 'futures', the worker would send a command back to the server saying 'I want to do X, give me the result'. The server would collect all these 'future' commands then ask the runtime to do them. The runtime would then say 'I've done X, here is the result' then the server would give the result back to the appropriate worker.

This is kind of a pain to explain and a pain to think about, because now you can't actually do I/O any more: you have to send the request to your parent coroutine and have it send it to its parent, and so on until it reaches the runtime, then have the result sent back down. A questionable benefit of this solution is that you can schedule things yourself, I guess.
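A toy version of this command-passing style, with made-up command names and a scheduler playing the role of the runtime that fulfils 'read' requests:

```python
# Workers yield command tuples instead of doing I/O themselves; a toy
# scheduler collects the commands, "does" the I/O, and sends results
# back down. The ("read", ...) protocol is invented for illustration.

def io_task(name):
    data = yield ("read", name)           # "I want to do X, give me the result"
    yield ("done", f"{name} saw {data}")

def scheduler(tasks):
    results = []
    for task in tasks:
        cmd, arg = next(task)             # collect the 'future' command
        if cmd == "read":
            result = f"data-for-{arg}"    # the runtime does the actual I/O
            cmd, arg = task.send(result)  # hand the result back down
        results.append(arg)
    return results

print(scheduler([io_task("a"), io_task("b")]))
```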

Python, JavaScript, and some other languages use this system, I guess because it's easy to implement as a library.

Another model is the actor model. This is basically like processes on a computer:

  • A coroutine can send messages to other coroutines' mailboxes
  • A coroutine can get messages from its mailbox
  • A coroutine can do calculations and I/O
  • A coroutine can wait for a message
  • A coroutine only stops running when it does I/O or waits for a message

For our purposes, the 'I/O' here could be seen as sending a message to the runtime directly and then waiting for a response.

This system appeals to me because it's easy to understand, and you don't have to understand what async I/O is. Languages like Erlang and Ruby use it. The downside is that it needs to be built into the language runtime itself.
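A minimal actor sketch in Python, assuming one unbounded mailbox per actor and one OS thread per actor. This is purely illustrative, not a proposed NewLang implementation:

```python
# Each actor owns a mailbox (queue.Queue, unbounded) and a thread that
# drains it. Sends never block; the actor only stops when it waits for
# a message.
import queue
import threading

class Actor:
    def __init__(self, handler):
        self.mailbox = queue.Queue()          # unbounded: sends never fail
        self.thread = threading.Thread(target=self._run, args=(handler,))
        self.thread.start()

    def _run(self, handler):
        while True:
            msg = self.mailbox.get()          # wait for a message
            if msg is None:                   # poison pill to stop
                return
            handler(msg)

    def send(self, msg):
        self.mailbox.put(msg)                 # sender continues immediately

    def stop(self):
        self.mailbox.put(None)
        self.thread.join()

received = []
led_owner = Actor(received.append)            # the LED's single owner
led_owner.send("blink the LED")               # ask the owner to do it
led_owner.stop()
print(received)                               # ['blink the LED']
```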

Another system is 'communicating sequential processes'. This is much like Unix processes:

  • Coroutines communicate through channels which are like pipes
  • A coroutine can do calculations and I/O
  • A coroutine only stops running when it does I/O or writes to a channel

Messaging is done using channels, which are basically pipes. I'm not sure how I feel about this: you still send and receive messages, but instead of doing it with other coroutines you do it with channels. Go and Clojure use this model.
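The same producer/consumer shape through a channel, sketched in Python with threads. A Queue with maxsize=1 only approximates CSP's blocking rendezvous, but it shows the shape:

```python
# Two tasks that only know about a shared channel, not each other.
# put() blocks when the channel is full, roughly like a CSP send.
import queue
import threading

channel = queue.Queue(maxsize=1)

def producer():
    for line in ["one", "two", "three"]:
        channel.put(line)        # blocks until the consumer makes room
    channel.put(None)            # signal end of stream

def consumer(out):
    while True:
        line = channel.get()
        if line is None:
            break
        out.append(line)

out = []
t1 = threading.Thread(target=producer)
t2 = threading.Thread(target=consumer, args=(out,))
t1.start(); t2.start()
t1.join(); t2.join()
print(out)                       # ['one', 'two', 'three']
```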

One interesting distinction between the actor model and the CSP model is how each handles sending:

  • The actor model will send a message and continue; the mailbox is unbounded and sending never fails
  • CSP blocks the coroutine until the message has been read by the other side of the channel/pipe

Both of these systems can more or less model each other, so the only real differences to me are how they feel to program and their usage patterns. I'll have to have a think about this I guess. Currently I'm leaning towards the following:

  • A coroutine can be spawned and monitored or killed by a parent
  • A coroutine stops when it does I/O

Some open questions for the design are:

  • Should coroutines have multiple channels/pipes?
  • Should they have a single mailbox instead?
  • Should sending a message block?
  • Should sending a message time out?
  • What happens if you send a message to a coroutine that dies?
  • What happens if you wait for a message from a dead coroutine?
  • Do we want to build giant programs that span servers or small ones that live on single machines?

Hello again readers.

I decided today to meet a friend at a park 3km away. This would take 5 minutes by car, 10 minutes by bike, 20 minutes by kickscooter, or 30 minutes walking. I don't have a car or kickscooter, and once I met my friend I would have to walk my bike with me, so I just decided to walk. I had a lot of time to think about this issue.

I asked myself: What would we use coroutines and asynchronous I/O to make easier?

Let's imagine two applications.

The first would be some microcontroller audio player:

  • A task that handles user input
  • A task that handles blinking an LED
  • A task that handles buffering audio data
  • A task that handles generating audio data

The second would be some MUD server:

  • A task that handles the server socket
  • A task that handles the database
  • A task that handles game events
  • A task that handles each client

We can write either of these without coroutines or asynchronous I/O. We can instead use a 'superloop' where we do the following:

  1. Gather all the events we need to wait for
  2. Based on each event do some logic

The main problem here is code locality: the event and the logic are no longer right next to each other in the code. Coroutines and asynchronous I/O solve this by letting us write code that handles multiple events spread out over the codebase: we might want to wait for a timer for some game logic but also input from a client at the same time. Coroutines let us write the timer logic and the client logic in two different places without telling each other about it. This makes code easier to understand and modularize.
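The superloop those two steps describe might look like this for the MUD server, with made-up event names. Note how every piece of logic ends up in one central dispatch site, far from the code that cares about it:

```python
# A 'superloop': gather all pending events, then dispatch on each one.
# Event names are invented for illustration.

def superloop(pending_events):
    log = []
    for event in pending_events:            # 1. gather events to wait for
        if event == "socket_readable":      # 2. dispatch logic per event
            log.append("accept connection")
        elif event == "timer_fired":
            log.append("run game tick")
        elif event == "db_ready":
            log.append("flush to database")
    return log

print(superloop(["timer_fired", "socket_readable"]))
```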

It's important to note that all of this is just making code easier to understand. The coroutines are not running in parallel, and they are not networked. Coroutines simply pause to wait for an event and continue when no other coroutine is running and the event they are waiting for happens. This is known as cooperative multitasking.

But once you delve into ANY kind of multitasking you have to talk about resource sharing. This is where you and another task want to use the same resource at once, such as a file, the audio output, etc. You MUST agree on how to do this safely or bad things will happen. A real-world example here is a traffic intersection: everyone wants to use the intersection at the same time, but if more than one car uses it at once you get a literal crash.

When people talk about multithreading being difficult, it's not necessarily because multithreading is hard, it's because resource sharing is hard to do both correctly and efficiently. It's also why I get a little bothered about people claiming cooperative multitasking makes things easier: It's not the multitasking model, it's the sharing model that causes problems.

So if NewLang is to provide support for coroutines it must provide a way to share things that is easy to understand and easy to use.

The classical solution for resource sharing is for each task to lock a resource while using it and to signal to other tasks that the resource is free when it's done. This is done using mutexes and semaphores. I'm going to level with you here: these confuse and scare me. I've read a dozen articles on the difference between a mutex and a semaphore and I still couldn't tell you. Even more importantly, rigorously testing that they actually work is near impossible. These are something you definitely don't want to touch if you can avoid it.

The alternate approach which is used by the Actor model and CSP is to say: No sharing. Each coroutine owns the resource they use and others must ask it to do things for them. If anyone wants a LED to blink it must ask the LED owner to do it. If the LED owner stops responding it gets killed and restarted. This is done by passing messages. These messages always arrive in order and never get lost.

But... this still bugs me a bit. Sure, we now have proper ownership, but what about read-modify-write sequences? Say 'I want to add $50 to a player's money', so you ask for the current value, add 50, then set it back. But in the meanwhile somebody has taken out $50, and your write overwrites that withdrawal, so the player ends up with money they shouldn't have. Sure, you could solve this by having an 'add money' function in the bank, but how would that work for something more complicated like taxes? What if we require multiple banks for calculating that?

I'm a bit too tired to propose a solution at the moment, but I have the feeling the solution involves not coroutines but some kind of explicit graph that the runtime can schedule in parallel if necessary. A bit like an out of order CPU.

This topic is really starting to hurt my head, so I'm going to just write my current thoughts on this and come back to it in a day.

Imagine having multiple modules in your program that work with a bank account: the withdrawal module, the deposit module, and the tax module. Now imagine this scenario:

  • The tax module gets your current bank amount
  • You withdraw all your money
  • The tax module sets your bank account to your old amount minus $50 in tax
  • You deposit your old money
  • You now have double your money

The problem here is that we have three tasks using the bank account: the tax module, the withdrawal module, and the deposit module. Each task wants to update the bank account at the same time.
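Making the interleaving explicit shows the unsynchronised read-modify-write. The balance and the $50 tax here are just illustrative numbers:

```python
# The tax/withdraw/deposit race, step by step, with no synchronisation.
account = {"balance": 1000}

# The tax module gets your current bank amount
tax_snapshot = account["balance"]        # tax sees 1000

# You withdraw all your money
cash = account["balance"]                # you now hold 1000 in cash
account["balance"] = 0

# The tax module sets your account to its old amount minus $50 in tax
account["balance"] = tax_snapshot - 50   # stale write: balance is 950

# You deposit your old money
account["balance"] += cash               # 1950: nearly double your money

print(account["balance"])                # 1950
```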

I think the best way to solve this problem is to give data to a task instead of having the task request it. For example, the bank account is given to the task, and the task gives it back when it's done. Having the data be given instead of requested means we can model the flow of data as a network or pipeline, with data passed between tasks in a specific order.

Deciding which tasks can run in parallel is fairly straightforward from there: tasks that don't share data can run in parallel. In our case multiple tasks want access to my bank account, so only one can run at once.

To be specific, this can be further broken down by how data is used: tasks can share data sources but cannot share data sinks. So we could calculate withdrawals, deposits, and taxes in parallel. But the reason we don't is that they output to the same place, which only wants one result, not three.

I'm not exactly sure how I'd implement this system, but I imagine it would go like this:

  • Tasks take data and output more data
  • Tasks can be joined together in a circuit
  • This circuit shows what order tasks need to be run in
  • The ordering shows how tasks can be run in parallel

For instance, let's say we have this:

  • We have a bank task; it stores accounts and takes updated data
  • We have a tax task; it takes data from the bank and sends it to the bank
  • We have a withdrawal task; it takes data from the bank and sends it to the bank

The tax task and the withdrawal task both send data back to the bank, so they cannot be run in parallel.

Another example is this:

  • We have a Config task; it takes whether the Alarm should be armed and outputs whether the Alarm is armed
  • We have a GUI task that reads from Config and Keyboard, it outputs whether Alarm should be armed
  • We have an Alarm task that reads from a Timer and Config, it outputs if the Alarm was triggered and whether it should be armed

We connect things like this:

  • Config takes from GUI and Alarm whether the Alarm should be armed
  • GUI takes from Keyboard for user input
  • GUI takes from Config for displaying alarm status
  • Alarm takes from Timer and Config whether to trigger
  • Alarm tells Config that the Alarm is not armed on trigger

This is a bit more complex: various outputs feed back into each other. GUI and Alarm cannot run in parallel as they both update Config, but this shouldn't matter if neither spends much time on the CPU and they instead mostly wait on inputs and timers.

I have no idea how any of this would work in practice. I'll just have to try and prototype and make it eventually.
