# Expedient Set Operations in Unix Shell

Set theory rocks. It especially rocks for programmers, because set theory
and computers have a special relationship. You see, if you can describe an
answer you want in set-theoretic terms, using a programming language with
support for sets, the computer can *execute your description* and give you an
answer.

Compare this with what we normally need to do to get an answer from the computer: describe, with conditionals and jumps and counters, exactly the steps the computer must perform in order to answer our question.

Fortunately, most modern managed programming languages include a set data structure and set operations. The Unix shell does not, but Peteris Krumins

Here, we use Python's support for sets to find the students who are both on the Honor Roll and in the Drama Club (aka \(honorroll \cap dramaclub\)):

```
honorroll = set(['Bob', 'Joe', 'Sally'])
dramaclub = set(['Bob', 'Sally'])
print honorroll.intersection(dramaclub)
```

```
honorroll = ['Bob', 'Joe', 'Sally']
dramaclub = ['Bob', 'Sally']
def intersection(xs, ys):
l, s = [xs, ys] if len(xs) > len(ys) else [ys, xs]
result = []
for x in l:
for y in s:
if x == y:
result.append(x)
return result
print intersection(honorroll, dramaclub)
```

For getting answers from the computer clearly and quickly, without many opportunities for us to introduce bugs, set-theoretic

Instead of describing how to get answers from the computer at a low level, with conditionals, counters, and jumps, we can do it at a really convenient level — set theory —

Most programming languages these days have a set data structure and operations to go with it.

Usually we need to instruct computers to give us answers at a low level: the level of conditionals, jumps, and counters.

The language of sets is a convenient middle-ground between us and the computer. We can state problems intuitively without resorting to the language of the computer: that of conditionals, jumps, and counters.

No conditionals, jumps, or counters needed.

You see, set theory gives us a vocabulary we can use to describe many kinds of
computing problems in very clear ways. Beyond having an awesome way to describe
our problems, we have an awesome way to get the answers we need. That's because
computers are awesome at *performing* set operations.

State the problem in terms of set and set operations, and the computer evaluates the problem and gives us our answer. Radical, no?

Many problems can be solved easily by expressing them in set-theoretic terms. Even in humble Unix shell.

It's useful to compare files by their contents in shell. A typical question
might be: what lines are in file `A`

, but not in file `B`

? Maybe `A`

and `B`

contain IP addresses, one per line, and we want to know which IPs appear in `B`

but not in `A`

, because we work at an ISP and we're tracking down people who
illegally downloaded the latest Taylor Swift album or some other redonk thing.

We can think about the question in terms of set theory. The cool set theory
way to pose the question is: what's the set of IP addresses that are in `A`

but
not in `B`

?

There's a particular set-theoretic operation — *relative complement* — to
express exactly this relationship. It's also known as *difference*, and it's
written like this: \(B \setminus A\).

$$B \cap A$$

The awesome part about expressing answers we want in set-theoretic terms is that
computers can easily *perform* these operations. It's almost like our questions
become executable. Even in the humble Unix shell!

Peteris Krummins wrote a comprehensive article that demonstrates many set operations that can be performed in shell using various programs. It's a terrific article, but I found it hard to remember everything well enough to be able to use in a pinch.

Below, I demonstrate only union, intersection, and relative complement (a.k.a.
difference), using only `cat`

, `sort`

, and `uniq`

. My hope is that you can
completely internalize the examples, and that you'll never need to refer to this
article again.

The examples assume the following two files, `A`

and `B`

:

`A`

contains:

```
a
b
c
```

`B`

contains:

```
b
c
d
```

### Sets in Shell

Commands you need:

`cat`

`sort`

`uniq`

#### Union

**In either A or B**

#### Intersection

**In both A and B**

#### Difference

**In A but not in B**

The shell It's frequently useful to compare files to one another. In many cases, the comparisons

Peteris Krummins wrote an excellent and comprehensive article

### Like the article?

Get notified of future blog posts. Don't worry - we won't make it hard to get to inbox zero: no more than 2 e-mails a month. We promise.

###### About Adzerk

Adzerk is a cloud API platform that enables user-first sites and apps to build an in-house native ad platform in weeks, not years.