kdsch.org

2020 Jan 17

Data in the History of Automation Tools

Shells have always been used for both interaction and automation. Today, automation tools are highly evolved and diverse. Here are some notes I’ve taken while searching for meaning in the chaos.

Louis Pouzin coined the term “shell” and also wrote the first one: RUNCOM. This program featured composable commands and permitted automating repetitive tasks. The Multics shell, inspired by Pouzin’s ideas, was created in 1965 by Glenda Schroeder. In 1971, Ken Thompson wrote a similar shell for Unix.

A shell is a human interface to an operating system, so the design pressures exerted on operating systems also shape shells.

In 1973, Version 4 Unix was rewritten in C, contrary to the thinking of the time that operating systems must be written in assembly language. The migration to C paved the way for porting the operating system to different machines, a rare but not unprecedented achievement. Economy motivated portability; it had been a common and costly practice to completely rewrite software for incompatible machines.

sed, the “stream editor”, appeared in 1974, for automating text file editing. It has only strings, although the lack of variables makes even this an overstatement.

Mashey shell, also called PWB shell, released in 1975, introduced single-letter variables. The $ character, used previously only for script arguments, became the syntax for variable values, and could be used to insert a variable value into a double-quoted string. This feature would later appear in Perl and PHP.

You could concatenate two variables with x="$a$b". In C, it would be sprintf(x, "%s%s", a, b). Clearly, the shell version is pithier.

make, written by Stuart Feldman and included in PWB/Unix 1.0 (1976), provided automatic dependency analysis in software builds, which streamlined the C programming workflow. Prior to make, builds were automated by shell scripts, with no awareness of which dependencies needed to be rebuilt.

Where shell offered a vocabulary for filesystems and concurrent processes, make refined this vocabulary by giving files contextual roles as either targets or dependencies of a recipe.
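To make the target/dependency idea concrete, here is a minimal Python sketch of the timestamp comparison make is usually described as performing: a target is out of date if it is missing or older than any of its dependencies. The function name needs_rebuild and the file names are my own inventions for illustration, and real make handles far more (rule chaining, phony targets, and so on).

```python
import os
import tempfile


def needs_rebuild(target, dependencies):
    """Return True if target is missing or older than any dependency."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(dep) > target_mtime for dep in dependencies)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "main.c")   # hypothetical dependency
        obj = os.path.join(tmp, "main.o")   # hypothetical target
        open(src, "w").close()
        print(needs_rebuild(obj, [src]))    # True: target does not exist yet

        open(obj, "w").close()
        os.utime(src, (100, 100))           # set timestamps explicitly
        os.utime(obj, (200, 200))
        print(needs_rebuild(obj, [src]))    # False: target is newer

        os.utime(src, (300, 300))           # "edit" the source
        print(needs_rebuild(obj, [src]))    # True: dependency is newer
```

A plain shell script would simply run every compile command each time; the check above is what lets a build skip work that is already done.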

The exec system call in V6 Unix took a list of arguments for the new process, but no environment. Thus, at one point, command-line arguments provided the only way to parametrize a process invocation.

awk, a text pattern processing language, was created in 1977. It worked with both numbers and strings. It appeared in Version 7 Unix (1979), and was often used to generate reports.

V7, the first Unix to have environment variables, supported them in exec, and the Bourne shell exposed them to users for passing data between programs. The Bourne shell was intended to be more suitable for scripting, approaching the power of conventional programming languages. In The Unix Programming Environment, it was the first shell to be presented as a programming language.

At the level of system calls, the environment, being a null-terminated array of strings, differed from the positional arguments only by its elements taking the form "var=value" by convention. This design, riding atop portable C pointers, proved simple, economical, and human-readable.
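As a rough illustration of how little separates the two arrays, here is a Python sketch that models them as lists of strings. The helper names to_envp and lookup, and all the sample values, are hypothetical; lookup only imitates the linear scan a C getenv might do and is not the real implementation.

```python
def to_envp(table):
    """Flatten a name -> value mapping into "name=value" strings."""
    return [f"{name}={value}" for name, value in table.items()]


def lookup(envp, name):
    """Scan for the first "name=..." entry, in the spirit of C's getenv."""
    prefix = name + "="
    for entry in envp:
        if entry.startswith(prefix):
            return entry[len(prefix):]
    return None


# Both are just lists of strings handed to the new process.
argv = ["report", "-v", "sales.txt"]            # meaning comes from position
envp = ["HOME=/usr/ken", "PATH=/bin:/usr/bin"]  # meaning comes from the prefix

print(lookup(envp, "PATH"))    # /bin:/usr/bin
print(lookup(envp, "SHELL"))   # None
print(to_envp({"TERM": "vt100", "OPTS": "a=b"}))  # ['TERM=vt100', 'OPTS=a=b']
```

Nothing enforces the "var=value" shape here, just as nothing enforces it in the array itself; the naming exists only because every reader agrees to split at the first equals sign.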

One can imagine Lisp programmers of the day shaking their heads. Today, a new generation has taken up this pastime.

It’s difficult to know what kind of data was transmitted via environments in the early days, but given the persistence of this design, I guess it wasn’t very complicated. Programs working with complex data would need an advanced parser, and would read their input from a file. There was evidently no desire for the operating system to take up structured inter-process communication, which would have freed processes of the burden of (de)serialization. Plain text is a cornerstone of the Unix philosophy, down to the syscalls.

Current languages often wrap the environment in a table-like interface. I suspect most programmers know little of the underlying implementation.
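Python’s os.environ is one such wrapper. The sketch below shows that it behaves like a dictionary but still accepts only strings, so anything structured has to be serialized by hand. The helpers put_structured and get_structured, the DEMO_ variable names, and the choice of JSON are all my own assumptions for illustration; nothing in Unix prescribes them.

```python
import json
import os


def put_structured(name, value):
    """Serialize a value to JSON text so it fits in an environment variable."""
    os.environ[name] = json.dumps(value)


def get_structured(name):
    """Read the JSON text back and rebuild the original structure."""
    return json.loads(os.environ[name])


os.environ["DEMO_GREETING"] = "hello"     # looks like a dictionary assignment
print(os.environ.get("DEMO_GREETING"))    # hello

try:
    os.environ["DEMO_COUNT"] = 3          # but only strings are accepted
except TypeError as err:
    print("rejected:", err)

put_structured("DEMO_TARGETS", ["main.o", "util.o"])
print(os.environ["DEMO_TARGETS"])         # ["main.o", "util.o"]
print(get_structured("DEMO_TARGETS"))     # ['main.o', 'util.o']
```

The table-like surface hides the flat strings underneath, but the burden of (de)serialization stays with the program.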

During the 1980s, commercialization of Unix eroded its portability as vendors carved up the market by offering incompatible variants. Competition drove complexity. The configure script became a popular way to maintain portability at the expense of the simplicity of build systems. GNU Autotools perfected the art of automatically generating these scripts, a pinnacle of absurdist software.

Perl, created in 1987, improved on shell, sed, and awk. The Debian GNU/Linux distribution took it up as a system programming language; generally, it glues components together and wrangles data. Where data is lacking, such as in Unix-style IPC, glue languages restore a semblance of structure.

dpkg, the Debian package manager, was written in shell in 1994. It was rewritten in Perl the same year.

In 2002, Jeffrey Snover wrote The Monad Manifesto, which proposed a new shell-like language suited to Windows system administration. Snover, a longtime Unix user, made the case that administrative tasks on Windows needed to be automatable, and that a scripting language in the spirit of Unix shell could serve this purpose. However, porting shell and popular Unix commands to Windows proved futile, due to core architectural differences between Unix and Windows: “everything is a file” vs. “everything is an API”.

Snover’s PowerShell achieved a similar degree of composability to Unix shells of old, but worked with .NET objects instead of bytes and strings. It is a rare example of a shell with a rich type system. Many PowerShell commands are implemented internally to the interpreter, bypassing the need to marshal and unmarshal data structures as byte streams. This gives efficiency, power, and correctness.

The irony of a Unix diehard creating a shell with object-oriented IPC for Windows will not be lost on anyone. I take it as a sign of hope.



© 2021 Karl Schultheisz — Lancaster, PA, USA