A recent **revelation** I had is:

All I’m doing when I program is nothing but doing algebra - algebra on numbers and algebra on types.

Algebra on numbers are prevelant in functional programming, when we write a function, we mostly either do some algebra with the arguments, or call some other functions to do algebra for us. Afterall, functional programs are kind of like “executable mathamatics”.

What do I mean by “algebra on types”? Well, the algebra that I know, aka high school algebra, has **additions** and **multiplications** on numbers. “Doing algebra on types” simply means instead of doing additions and muliplications on numbers, we do “additions” and “multiplications” on types. Types that result from “doing algebra on types” are called **Algebraic Data Types**. The two fundamental algebraic data types are **Sum** type which corresponds to “adding” two types, and **Product** type which corresponds to “multiplying” two types.

## Part 1: Sum (aka Coproduct)

If we think of types as sets of values, then a **Sum** type corresponds to the **union**^{1} of two types. In set theory, if we take two completely different sets, one has 5 elements and the other has 10, then the union of these two sets has `5 + 10 = 15`

elements. What does this corresponds to in types? And what is a good example of a **Sum** type?

Well, if we think of `True`

as a set, i.e., a singleton set that only has one element called `True`

, and `False`

as another set, i.e., a singleton set that only has one element called `False`

, then `Bool`

is actually a **Sum** type! `Bool`

is exactly the union of the two singleton sets, the `True`

set and the `False`

set. The number of elements in `Bool`

, as expected from set theory, is 1 + 1 = 2.

Actually, if we think of the **sum** type this way, all types are sum types. Integers are sum of all individual integers, Strings are sum of all possible strings:

```
data Bool = True | False
data Integer = 1 | 2 | 3 | ...
data String = "" | "a" | "b" | ... | "aa" | "ab" | ...
```

Of course, this is not how these primitive types are implemented in haskell, but at least conceptually, they are sums/unions of lots of different singleton sets.

A more classic example of a sum type is ** Maybe a** (i.e.

**in scala). For example,**

`Option[A]`

`Maybe Char`

is a union of set `Char`

and the singleton set `Nothing`

. That is, an element of type `Maybe Char`

is either `Just Char`

(128 possible values, suppose we restrict Char to 128 ASCII characters) or `Nothing`

(1 possible value). So, `Maybe Char`

has `128 + 1 = 129`

possible values which is how many elements a union is supposed to have.In fact, in languages that have `null`

, it just feels like all non-primitive types are `Maybe`

types. For example in java/scala, although the type signature tells you that a variable is of type `Boolean`

, but because `null`

can appear anywhere, the variable can actually also be `null`

. That is, when the compiler tells you it’s a `Boolean`

, it’s actually not a `Boolean`

, it maybe is a `Boolean`

… I think that’s perhaps why `null`

causes lots of run-time null pointer exceptions.

Another example of a sum type that I used a lot but I’m not aware of the fact that they are sum types is subclasses in OOP languages. Say there is an abstract class/trait called `Animal`

and two subclasses `Dog`

and `Cat`

, then in the code where the type signature tells us it’s an animal, we know that it can either be a dog or a cat. That is, the set `Animal`

is the union of set `Dog`

and set `Cat`

.

## Part 2: Product

Let’s still think of **types** as **sets**. As discussed before, **Bool** is a set of two elements, `True`

and `False`

. **Char** is a set of 128 elements (suppose Char is restricted to ASCII characters). We can represent the **Cartesian Product** of these two sets as a new type which we denote as `(Bool, Char)`

. This new type, `(Bool, Char)`

, has 2 components: the first component is a `Bool`

and it has 2 possible values, `True`

and `False`

, and the second component is a `Char`

and it has 128 possible values. Therefore the new type has 2 * 128 = 256 possible values, which is exactly the number of elements that the Cartesian product is meant to have. This new type, usually called a **pair** or **tuple** depending on the language, is a classic example of a **Product** type. It represents the Cartesian product of two types.

What else are also product types? Well, **records** (aka **dictionary**) are also **Product**. The only difference it has with the pair type is how the components are named. For a pair, the components are named as as `fst`

and `snd`

in haskell, `_1`

, `_2`

in scala, `[0]`

, `[1]`

in python, whereas records/dictionaries are named using some user-defined names. Really, this is a superficial difference.

What’s more revealing is that, if you think about it, a (immutable) **class** in an OOP language is in essence also a **Product** type! They are just like records or dictionaries, the components of the product are member variables (aka, fields, instance variables, etc) and methods. Actually, this type of **Product** (products with with custom names and functions and without mutation), is so useful that scala has a special syntax for them: `case class`

. In fact, all case classes automatically extend **Product** in scala.

## Part 3: Algebra on Sum and Product

with addition and multiplication, we can actually do some simple algebra already. **List**, perhaps not surprisingly, is simply the result of performing a few simple additions and multiplications on some generic type. The classic definition of a list is as follows:

`data List a = Nil | Pair a (List a)`

It’s a recursive definition, which basically says that `List a`

is either `Nil`

(empty list) or it’s a pair whose first element is of type `a`

and the second element is another `List a`

. If we translate this into the “type algebra”, it’s something like this:

\[ \begin{align} List(a) = Nil + a \times List(a) \end{align} \]

Well, we have one equation and one unknown - \(List(a)\), maybe we can try to solve for it:

\[ \begin{align} List(a) & = Nil + a \times List(a) \\ & = Nil + a \times (Nil + a \times List(a)) \\ & = Nil + a \times Nil + a \times a \times List(a) \\ & = Nil + a \times Nil + a \times a \times (Nil + a \times List(a)) \\ & = Nil + a \times Nil + a \times a \times Nil + a \times a \times List(a) \\ & = Nil + a \times Nil + a \times a \times Nil + a \times a \times a \times List(a) \\ & = ... \\ & = \sum_{i=0}^\infty{ a^i \times Nil } \end{align} \]

Wow, the resulting equation perhaps reveals more substance of a list: a list contains either no element of type a and `Nil`

(\(a^0 \times Nil\)), or 1 element of type a and Nil (\(a^1 \times Nil\)), or 2 elements of type a and Nil (\(a^2 \times Nil\)), or 3 elements of type a and Nil, and so on… Looks like the distributive law in high school algebra also applies to type algebra…

## Part 4: Exponential

So, now it looks like that pretty much any data type I use when I do any sort of programming, from primitive types like integers, booleans, to more complex types like pair (aka tuple), Maybe (aka Option) and list, to user defined data types like `case class`

and classes in class hierachies, **they are all algebraic data types!!!** That is, they are simply some algebra on primitive types! In some sense, when I program in a functional style (without mutation), **all I’m doing is nothing but algebra - algebra on numbers and algebra on types**.

There’s one thing that’s omitted here - functions. Can functions also fit in to this story of doing algebra? aren’t they some different beasts altogether? Well, it turns out that they are called exponential in category theory. Here’s why from a layman’s perspective.

Let’s say a function is of type `Bool -> Char`

. Let’s still think of types as sets, so in this case, one of the possible elements in the set `Bool -> Char`

is:

```
myFunction b = if b then 'a' else 'b'
:t myFunction
myFunction True
myFunction False
```

```
## myFunction :: Bool -> Char
## 'a'
## 'b'
```

`myFunction`

takes a Bool `b`

and returns `'a'`

if the argument is `True`

or `'b'`

if it is `False`

. How many distinct such functions can there possibly be? That is, how many elements are there in the set `Bool -> Char`

?

Again let’s suppose `Char`

is ASCII character and it has only 128 elements. When `b`

is `True`

, there’re 128 different functions, one for each of the 128 possible ASCII characters; when `b`

is `False`

, there’re also `128`

different functions that each return different ASCII characters. So in total, we have \(128 \times 128 = 128^2\) such functions in the set of `Bool -> Char`

. In category theory, mathematicians call this type an **Exponential**, denoted as \(Char^{Bool}\). And it sort of describes the cardinality of the type.

So, the boundary between **function types** and **data types** are sort of blurred from the point of view of catgegory theory.

Actually, the boundary between *function* and *data* is also sort of blurred. We can think of functions as a lookup table that stores the data. Unlike elements in a data type which are themselves data, elements in a function type, i.e. a function, is like a database, we need to give it an argument, or a query, in order to get some value back.

Also, in one of the previous post, we can actually use pure functions to implement boolean values.

## Conclusion

That’s my recent revelation. I don’t know where it will lead me to lol.

Actually to be more precisely, the sum type corresponds to

**tagged unions**in set theory.↩