Clean Concurrency in Go

The Go programming language makes it possible to write readable software. But readability is not an ingredient of a language. It follows from the language’s design and how it is used.

In my brief experience with Go, using interfaces enhances readability. Interfaces can be used to separate concerns, so that unrelated thoughts don’t intrude on one another.

Concurrency, one of Go’s “headline features,” enables separation of concerns in a different way. In sequential code, two statements always happen in the same order, even if that order is irrelevant.

Concurrency frees the programmer from having to impose an ordering on two events where none is required. For example, instead of having a main event loop that has to deal with every event in the entire program, you have a bunch of independently executing goroutines, which can be assigned to individual concerns.
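As a minimal sketch of that idea, the program below gives each concern its own goroutine and waits for all of them, without imposing any order between them. The helper name `runConcerns` and the two example concerns are hypothetical, chosen only for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// runConcerns (hypothetical helper) runs every concern in its own goroutine
// and waits for all of them. No ordering between concerns is imposed.
func runConcerns(concerns ...func() string) []string {
	results := make([]string, len(concerns))
	var wg sync.WaitGroup
	for i, c := range concerns {
		wg.Add(1)
		go func(i int, c func() string) {
			defer wg.Done()
			results[i] = c() // each goroutine writes only its own slot
		}(i, c)
	}
	wg.Wait()
	return results
}

func main() {
	results := runConcerns(
		func() string { return "logging done" },
		func() string { return "networking done" },
	)
	fmt.Println(results)
}
```

The results are stored by index, so the output order is fixed even though the execution order is not.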

Many pedagogical examples of concurrency in Go use closures and channels. Things like this are offered as an introduction to concurrency:

c := make(chan int)
go func() {
	defer close(c)
	for i := 0; i < 10; i++ {
		c <- i
	}
}()

for i := range c {
	fmt.Println(i)
}

Such code shows correct usage of Go’s concurrency features, but not how to use concurrency for the sake of organization or readability. It provides no example of using concurrency in a complex, evolving program. The result? Spaghetti concurrency.

The wise words of more senior Gophers (see also this video) prompted me to minimize my use of concurrency and focus on interfaces.

So I rewrote one of my projects (about 1000 lines) to use packages and interfaces. I was very productive. Readability and the separation of concerns dramatically improved. What was given up, of course, was the ability to do two things at one time. But this particular program didn’t suffer much from that. In fact, the pipeline model I had used was easily transformed into composed interfaces. I was happy with the trade-off.

Not long afterward, I had to implement a bunch of features that would be really nice to run simultaneously, since they could be made relatively independent of one another. I knew I could link them together with channels. The challenge was to maintain the nice organization while letting it be concurrent.

Comparing interfaces and channels

It is not a wild guess that you can get the best of both worlds by using interfaces and concurrency together. However, there are many ways the two can be combined. A channel of interfaces? Interfaces that hide channels? Channels of channels? Coordination modules? I did not have the patience to search blindly.

Instead, I wanted to understand why the channel-and-closure-based code became unmaintainable. So I took a simple, familiar programming idea, “send and receive,” and expressed it both ways: first with interfaces, then with channels and closures.

type Message string

type Sender interface {
	Send(Message) error
}

type Receiver interface {
	Recv() (Message, error)
}

What’s useful about this kind of code is its composability and polymorphism. I can write a “generic” function that works on any Sender or Receiver:

func relay(in Receiver, out Sender) (err error) {
	m, err := in.Recv()
	if err != nil {
		return
	}
	return out.Send(m)
}

To continuously relay messages, one need only run this in a loop:

for {
	if err := relay(in, out); err != nil {
		// handle error
	}
}

Analogous channel-based code might be like this:

func relay(in <-chan Message, out chan<- Message) (err error) {
	m, ok := <-in
	if !ok {
		return errors.New("closed input channel!")
	}
	out <- m
	return
}

One difference is that the types <-chan Message and chan<- Message are not interfaces, and thus can’t be “implemented” by anything but themselves. Nevertheless, channels are similar to interfaces in that they can be used to separate concerns. My guess was that the quality of separation differs between the two.

Send and Recv could be encapsulating almost anything. They are equally appropriate for concurrent and sequential tasks. The channel operations, by contrast, strongly evoke an image of at least one goroutine that does the complementary operation on the channel in question.
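To make the point concrete, here is a rough sketch of two Sender implementations, one purely sequential and one that hands messages to another goroutine. The types `logSender` and `chanSender` and the helpers `sendAll` and `runConcurrent` are hypothetical names invented for this illustration.

```go
package main

import "fmt"

type Message string

type Sender interface {
	Send(Message) error
}

// logSender is a hypothetical sequential Sender: it appends to a slice.
type logSender struct {
	log []Message
}

func (s *logSender) Send(m Message) error {
	s.log = append(s.log, m)
	return nil
}

// chanSender is a hypothetical concurrent Sender: Send hands the message
// to whichever goroutine is receiving from ch.
type chanSender struct {
	ch chan Message
}

func (s *chanSender) Send(m Message) error {
	s.ch <- m
	return nil
}

// sendAll works with any Sender and does not know which kind it has.
func sendAll(out Sender, msgs ...Message) error {
	for _, m := range msgs {
		if err := out.Send(m); err != nil {
			return err
		}
	}
	return nil
}

// runConcurrent sends msgs through a chanSender while a second goroutine
// collects them, then returns what that goroutine received.
func runConcurrent(msgs ...Message) []Message {
	s := &chanSender{ch: make(chan Message)}
	done := make(chan []Message)
	go func() {
		var got []Message
		for m := range s.ch {
			got = append(got, m)
		}
		done <- got
	}()
	_ = sendAll(s, msgs...) // chanSender.Send never returns an error
	close(s.ch)
	return <-done
}

func main() {
	seq := &logSender{}
	if err := sendAll(seq, "one", "two"); err != nil {
		fmt.Println("send:", err)
	}
	fmt.Println(seq.log)                     // prints [one two]
	fmt.Println(runConcurrent("one", "two")) // prints [one two]
}
```

Nothing in `sendAll` reveals whether a goroutine is involved, which is exactly what the interface hides and the raw channel does not.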

Asking the right question

What would be the identity of the “complementary” goroutine? This is a presumptive, higher-order question, because goroutines do not have an identity. We don’t even have a guarantee that there is only one goroutine on the other side, or that it is fixed over the lifetime of the channel.
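A small sketch shows how little the channel type says about the other side. Here several goroutines receive from one channel, and the sender cannot tell which of them takes a given message. The function `fanOut` is a hypothetical name used only for this example.

```go
package main

import (
	"fmt"
	"sync"
)

type Message string

// fanOut (hypothetical) starts n goroutines that all receive from the same
// channel and reports how many messages each one handled.
func fanOut(n int, msgs []Message) []int {
	in := make(chan Message)
	counts := make([]int, n)
	var wg sync.WaitGroup
	for w := 0; w < n; w++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			for range in {
				counts[id]++ // each worker touches only its own slot
			}
		}(w)
	}
	// The sending side looks the same no matter how many receivers exist.
	for _, m := range msgs {
		in <- m
	}
	close(in)
	wg.Wait()
	return counts
}

func main() {
	counts := fanOut(3, []Message{"a", "b", "c", "d", "e", "f"})
	fmt.Println(counts) // the split between workers can differ on every run
}
```

Only the total is predictable. Which goroutine handled which message is decided by the scheduler, not by anything written in the code.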

This means we need a convention, and a convention requires discipline and clear documentation so that a reader of the code can recognize it. Assuming that such a convention exists, we have to follow the channel. We have to look at the scope in which the channel was declared. Per convention, it should have ended up in the scope of a function that is executing concurrently.

In the interface code, if I want to find the identity of the Sender, I still have to do some scope analysis, but I have an advantage. I know that a Sender must be a type; particularly, a type that implements the Send method. I have to trace the Sender value back to the scope of its declaration, at which point I would probably find out its concrete type, given by a constructor function. Then I could easily find the implementation of the Send method. A Sender has a clear identity.

Understanding how an interface works means knowing two things: the concrete type (which receiver) and the abstract type (which methods). The first can be found by a bit of scope analysis; the second, by looking up a declaration.

Understanding how a channel works means knowing all the places where it is used. That requires scope analysis, which, to be successful, requires programmer convention. Importantly, the behavior of a channel is not subject to a declaration.

Identity over anonymity

It has become clear that interfaces give information about identity in a way that channels do not. The Sender and Receiver interfaces could embody concurrent code, yet do not expose any channel types. On the other hand, not exposing any channel types makes it impossible to use select statements, which are irreplaceable. My guess as to the solution?

An interface that exposes channels.

type Sender interface {
	Send() chan<- Message
}

type Receiver interface {
	Recv() <-chan Message
}

The interface encodes and documents the identity of the concurrent task. The channels provide the low-level details required to coordinate and communicate with it.

Then there can be a pretty simple convention that channels must be clearly derived from an interface. Channels should not be anonymous.

func relay(in Receiver, out Sender) (err error) {
	m, ok := <-in.Recv()
	if !ok {
		return errors.New("closed input!")
	}
	out.Send() <- m
	return
}

This code has in and out executing concurrently, and their role in program architecture is clear. If I want to understand what a Sender is, I don’t have to find all the places the channel is used. I just look up type and method declarations.
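For a version of this that can be run end to end, here is a sketch with one concrete type behind the interfaces. The `Mailbox` type and its constructor are hypothetical, and the channels are buffered only so that the example needs no extra goroutines.

```go
package main

import (
	"errors"
	"fmt"
)

type Message string

type Sender interface {
	Send() chan<- Message
}

type Receiver interface {
	Recv() <-chan Message
}

// Mailbox is a hypothetical named type that owns one channel and exposes
// its two directions through the Sender and Receiver interfaces.
type Mailbox struct {
	ch chan Message
}

// NewMailbox returns a Mailbox that can hold size messages without blocking.
func NewMailbox(size int) *Mailbox {
	return &Mailbox{ch: make(chan Message, size)}
}

func (b *Mailbox) Send() chan<- Message { return b.ch }
func (b *Mailbox) Recv() <-chan Message { return b.ch }

// relay moves one message from in to out.
func relay(in Receiver, out Sender) error {
	m, ok := <-in.Recv()
	if !ok {
		return errors.New("closed input")
	}
	out.Send() <- m
	return nil
}

func main() {
	inbox, outbox := NewMailbox(1), NewMailbox(1)
	inbox.Send() <- "ping"
	if err := relay(inbox, outbox); err != nil {
		fmt.Println("relay:", err)
		return
	}
	fmt.Println(<-outbox.Recv()) // prints "ping"
}
```

To find out what is on the other side of either channel, a reader looks up Mailbox and its two methods, rather than searching for every use of a channel variable.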

Conclusion

Anonymity frustrates coordination; identity helps it.

What makes “channel-and-closure” code unmaintainable is its inability to evoke the identity of the concurrent concerns of a program. There is a Go proverb, “interface{} says nothing.” Something similar can be said of channels. Channel types in themselves leave the “complementary goroutine” anonymous. Channels provide a means of separating concurrent concerns, but do not say what the concerns are. In interface-based code, readability is enhanced because concerns can be identified with types. That cooperates with existing conventions about documentation and code organization.

Go’s highly composable design means that channels and interfaces can be used together to get the benefits of each. But finding the right way to combine these features requires a structural analysis of the codebase, which benefits from a comparative approach.

Lessons learned

When creating a pipeline where each stage has the same type of input and output,

  1. Define a one-method interface to be implemented by each stage. It should return a channel for either sending or receiving.
  2. Document the interface so that it is clear what kind of behavior is expected; for instance, when the channel should close.
  3. Allow one stage to connect to another by providing it a reference to that stage, not a channel. (Identity over anonymity.)
  4. Avoid launching secret goroutines. Provide a Serve method that does the job of the stage. That way, client code can control the concurrency.
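The four points above can be sketched together in one small program. Everything named here (`Reader`, `Source`, `Upper`, `run`) is a hypothetical illustration of the list, not code from the project described earlier.

```go
package main

import (
	"fmt"
	"strings"
)

// Reader is the one-method stage interface (point 1).
// Read returns the stage's output channel. The stage closes that channel
// when its Serve method returns (point 2).
type Reader interface {
	Read() <-chan string
}

// Source emits a fixed list of words.
type Source struct {
	words []string
	out   chan string
}

func NewSource(words ...string) *Source {
	return &Source{words: words, out: make(chan string)}
}

func (s *Source) Read() <-chan string { return s.out }

// Serve does the work of the stage. The caller decides whether to run it
// in a goroutine (point 4).
func (s *Source) Serve() {
	defer close(s.out)
	for _, w := range s.words {
		s.out <- w
	}
}

// Upper holds a reference to the preceding stage, not a bare channel (point 3).
type Upper struct {
	prev Reader
	out  chan string
}

func NewUpper(prev Reader) *Upper {
	return &Upper{prev: prev, out: make(chan string)}
}

func (u *Upper) Read() <-chan string { return u.out }

func (u *Upper) Serve() {
	defer close(u.out)
	for w := range u.prev.Read() {
		u.out <- strings.ToUpper(w)
	}
}

// run wires the two stages together and collects the final output.
func run(words ...string) []string {
	src := NewSource(words...)
	up := NewUpper(src)
	go src.Serve() // concurrency is chosen here, by the client code
	go up.Serve()
	var got []string
	for w := range up.Read() {
		got = append(got, w)
	}
	return got
}

func main() {
	fmt.Println(run("clean", "concurrency")) // prints [CLEAN CONCURRENCY]
}
```

The only `go` statements are in `run`, so a reader can see in one place which stages execute concurrently.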

There are two ways to express a pipeline by nesting interfaces: supply-driven and demand-driven.

Supply-driven means that the pipeline is driven from the input. A driver generates data and has a reference to the next stage.

func drive(next Writer) {
	for i := 0; i < 10; i++ {
		next.Write() <- i
	}
}

Demand-driven means that the pipeline is driven from the output. A driver consumes data by a reference to the preceding stage.

func drive(prev Reader) {
	for i := range prev.Read() {
		fmt.Println(i)
	}
}
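Neither driver above shows the stage it talks to. As a rough sketch of the supply-driven case, assume a `Writer` interface whose `Write` method returns a send-only channel, and a hypothetical final stage `Summer` that adds up what it receives. The driver here also closes the channel when it is done, which is one possible convention and not the only one.

```go
package main

import "fmt"

// Writer is assumed to be the one-method stage interface for the
// supply-driven style: Write returns the channel used to feed the stage.
type Writer interface {
	Write() chan<- int
}

// Summer is a hypothetical final stage that adds up everything it is sent.
type Summer struct {
	in    chan int
	total chan int
}

func NewSummer() *Summer {
	return &Summer{in: make(chan int), total: make(chan int, 1)}
}

func (s *Summer) Write() chan<- int { return s.in }

// Serve runs until the input channel is closed, then publishes the sum.
func (s *Summer) Serve() {
	sum := 0
	for v := range s.in {
		sum += v
	}
	s.total <- sum
}

// Total blocks until Serve has finished and returns the sum.
func (s *Summer) Total() int { return <-s.total }

// drive generates n values and pushes them into the next stage.
// By the convention assumed here, the driver closes the channel when done.
func drive(next Writer, n int) {
	w := next.Write()
	defer close(w)
	for i := 0; i < n; i++ {
		w <- i
	}
}

// sumTo wires the driver to a Summer and returns 0 + 1 + ... + (n-1).
func sumTo(n int) int {
	s := NewSummer()
	go s.Serve()
	drive(s, n)
	return s.Total()
}

func main() {
	fmt.Println(sumTo(10)) // prints 45
}
```

The demand-driven case is the mirror image: the stage owns and closes its output channel, and the driver ranges over `prev.Read()` until it closes.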