Expedient Set Operations in Unix Shell
Set theory rocks. It especially rocks for programmers, because set theory and computers have a special relationship. You see, if you can describe an answer you want in set-theoretic terms, using a programming language with support for sets, the computer can execute your description and give you an answer.
Compare this with what we normally need to do to get an answer from the computer: describe, with conditionals and jumps and counters, exactly the steps the computer must perform in order to answer our question.
Here, we use Python's support for sets to find the students who are both on the Honor Roll and in the Drama Club (aka \(honorroll \cap dramaclub\)):
honorroll = set(['Bob', 'Joe', 'Sally']) dramaclub = set(['Bob', 'Sally']) print honorroll.intersection(dramaclub)
honorroll = ['Bob', 'Joe', 'Sally'] dramaclub = ['Bob', 'Sally'] def intersection(xs, ys): l, s = [xs, ys] if len(xs) > len(ys) else [ys, xs] result =  for x in l: for y in s: if x == y: result.append(x) return result print intersection(honorroll, dramaclub)
For getting answers from the computer clearly and quickly, without many opportunities for us to introduce bugs, set-theoretic
Instead of describing how to get answers from the computer at a low level, with conditionals, counters, and jumps, we can do it at a really convenient level — set theory —
Most programming languages these days have a set data structure and operations to go with it.
Usually we need to instruct computers to give us answers at a low level: the level of conditionals, jumps, and counters.
The language of sets is a convenient middle-ground between us and the computer. We can state problems intuitively without resorting to the language of the computer: that of conditionals, jumps, and counters.
No conditionals, jumps, or counters needed.
You see, set theory gives us a vocabulary we can use to describe many kinds of computing problems in very clear ways. Beyond having an awesome way to describe our problems, we have an awesome way to get the answers we need. That's because computers are awesome at performing set operations.
State the problem in terms of set and set operations, and the computer evaluates the problem and gives us our answer. Radical, no?
Many problems can be solved easily by expressing them in set-theoretic terms. Even in humble Unix shell.
It's useful to compare files by their contents in shell. A typical question
might be: what lines are in file
A, but not in file
contain IP addresses, one per line, and we want to know which IPs appear in
but not in
A, because we work at an ISP and we're tracking down people who
illegally downloaded the latest Taylor Swift album or some other redonk thing.
We can think about the question in terms of set theory. The cool set theory
way to pose the question is: what's the set of IP addresses that are in
There's a particular set-theoretic operation — relative complement — to express exactly this relationship. It's also known as difference, and it's written like this: \(B \setminus A\).
$$B \cap A$$
The awesome part about expressing answers we want in set-theoretic terms is that computers can easily perform these operations. It's almost like our questions become executable. Even in the humble Unix shell!
Peteris Krummins wrote a comprehensive article that demonstrates many set operations that can be performed in shell using various programs. It's a terrific article, but I found it hard to remember everything well enough to be able to use in a pinch.
Below, I demonstrate only union, intersection, and relative complement (a.k.a.
difference), using only
uniq. My hope is that you can
completely internalize the examples, and that you'll never need to refer to this
The examples assume the following two files,
a b c
b c d
Sets in Shell
Commands you need:
A but not in
The shell It's frequently useful to compare files to one another. In many cases, the comparisons
Like the article?
Get notified of future blog posts. Don't worry - we won't make it hard to get to inbox zero: no more than 2 e-mails a month. We promise.