Monday, 11 July 2016

The Game of Hats

Lord Shamsay Snow, Warden of the North in the grim land of Pesteros delights in playing games with his prisoners. Especially games which end with said prisoners being skinned alive. Nothing enhances delight like holding out some hope of escape, then brutally extinguishing it.
Having rounded up a hundred of the miserable wretches, Shamsay announces:
“You will all play the Game of Hats – where you either go free or you die.”

The game will proceed as follows:
All prisoners will be lined up facing the dungeon wall.
Prisoner 100 stands right against the wall, prisoner 99 stands behind him and so on.
Shamsay’s trusty servant, Meek, will place a hat – which may be coloured Black or White – on their heads. Obviously they won’t know the colour of the hat. All they will see is the hats of everyone in front of them.

Now, starting from Prisoner 1, right at the back, Shamsay will ask each one of them the colour of the hat on their heads, one by one.
The only words they can say are “Black” or “White” – which they will shout out for everyone to hear.
Any other word or phrase leads to everyone being fed to the hounds. (This is what Shamsay is secretly hoping for….)
Now, if the word they shout out corresponds to the colour of their own hat, they go free. Otherwise, they are skinned alive. Such is the Game of Hats.

As the prisoners stare at him, hope and apprehension mingling in their eyes, Shamsay offers them a sporting chance – they have till next episode to figure out any way they can to save their sorry selves.
As the camera pans onto Shamsay’s leer and the credits start to roll, can you anticipate what follows? Will the next episode show a procession of prisoners hobbling off to freedom, or will carnage abound? Now may be a good time to go and think….


At most one prisoner dies, all the rest go free.

How in Pesteros did that happen?
Well, due to the ingenuity of an extremely short prisoner of very high intelligence, who then insisted on standing right at the front of the line.

The solution involves “Arithmetic mod 2” which is much simpler than it sounds.
The basic idea is that every number is replaced by its remainder when divided by 2 - so every even number becomes 0, every odd number is 1.
This applies to negative numbers as well -  -2, -12, - 1000 are all 0, while -1, -25, -117 are all 1.

Now the cool thing is, you can add, subtract and multiply “mod 2” as well, hence “Arithmetic mod 2”.
All you need to do is add, subtract or multiply as you normally do, then divide by two and take the remainder.
To illustrate:
13 + 16 mod 2 = 29 mod 2 = 1

4 x 3 mod 2 = 12 mod 2 = 0
3 – 7 mod 2 = -4 mod 2 = 0

All clear? Good. Now on to the hats.

The strategy starts with assigning a number to each colour: Black = 0, White = 1.
Prisoner 1 adds up the numbers corresponding to the 99 hats he sees in front of him mod 2.
If he gets a 0, he shouts “Black”, if 1, then “White”.
Prisoner 2 does the same for the 98 hats in front of her, then subtracts what she gets from Prisoner 1’s answer. This gives her the colour of her hat and she shouts it out.

What about the rest, say Prisoner 17 ?
Well, prisoner 17 starts off by getting the total of the 83 hats in front of him. Call it X.
Then he hears Prisoner 1’s shout and gets the corresponding number. Call it Y.
Now as Prisoners 2 to 16 call out their colours, he subtracts 1 from Y every time he hears “White” and o when he hears “Black”.
By the time Shamsay gets to him, all the subtraction leaves him with some number, Z.
So “What is the colour of your hat ?”.
Calculate (X – Z ) mod 2. If 0, say “Black”, if 1, say “White”, and he is free !

Was that confusing? Okay, let me give an example with ten prisoners rather than a hundred.
Suppose, this is the sequence of hats on their heads, with prisoner 1 on extreme left and Prisoner 10 on extreme right (B is black, W is white).


Now then, Prisoner 1 sees 4 black hats and 5 white hats.
So, following the procedure, he gets (4 x 0 + 5 x 1) mod 2 = 5 mod 2 = 1
Hence, he shouts out “White”. 
Alas, his own hat is black, so he dies, but this is the only casualty.

Prisoner 2 sees 4 black hats and 4 white hats.
So she gets (4 x 0 + 4 x 1) mod 2 = 4 mod 2 = 0
Subtracting from Prisoner 1’s answer gives 1 – 0 = 1, so she says “White”.
And notice that this is the colour of her hat!

Now let’s try it for Prisoner 5.
He sees, 2 black and 3 white hats, so X = (2 x 0 + 3 x 1) mod 2 = 1
Now he hears Prisoner 1, shout out “White”, so Y = 1.
As Prisoners 2 to 4 shout out their answers, he keeps subtracting accordingly.
Easy to verify that:  Z = 1 – 1 – 1 – 0 = -1
Now it’s his turn to answer.
Well, X – Z mod 2 = 1 – (-1) mod 2 = 2 mod 2 = 0. And “Black” it is! J

Note that the number of prisoners makes absolutely no difference.
Be it ten or a hundred or millions, everyone goes free except maybe Prisoner 1. And there’s a 50-50 chance that even he or she gets it right.

So what happens if Shamsay ups the ante by increasing the number of hat colours, to Black, White, Red and Green ?
Makes absolutely no difference.  At worst Prisoner 1 dies or they all go free.

If you followed the argument above, you will quickly see that the solution here  is to use “Arithmetic Modulo 4” where you take remainders by 4 rather than 2.
So, 13 + 16 = 29 mod 4 = 1,  8 -4 = 4 mod 4 = 0, 11 x 2 = 22 mod 4 = 2  etc.

To spell it out further, assign a number from 0 to 3 to each colour, say Black = 0, White = 1, Red = 2 and Green = 3. The strategy is exactly analogous, except you need to do the calculations using the four numbers corresponding to the four different colours and do all the arithmetic mod 4.

Exercise: With 10 prisoners, work out the procedure in detail for the hat sequence below:

No matter how many prisoners or hat colours, the worst possible outcome is one death. Thanks to the power of modular arithmetic.

With his approval ratings plummeting wildly -  “At most one death per episode ? Yawn.” -  Lord Shamsay is in despair until he gets a note from a mysteriously named GRRM.
“Don’t let them shout out the answer, dammit ! Have them whisper it in your ear !”

Carnage ensues. With no shouted answer to guide them, each prisoner is reduced to making a guess. With two hat colours about half of them get it wrong, more colours make it worse.
There is much screaming and bloodshed, the fans are ecstatic and the show is destined to continue for many more seasons.

Analytical Interlude
Time to tune out the morbidity and engage in some dispassionate analysis.
If you think about the “shout version” of the game, you realize that in principle, the outcome is not too surprising.
Every prisoner knows the colour of the hats of the people in front, and shouting allows them to communicate information. Yes, it takes a lot of ingenuity to convey all that information in a single word, but ultimately it all boils down to each person getting to know the colour of their hat from the prisoner behind them in a suitably coded form.
It is, therefore, also reasonable that  Prisoner 1 is at risk – after all, there is no one behind him to tell him his colour.

The “whisper version” of the game destroys the information channel and each person must now figure out their hat colour based only on the hats in front of them.
Is this always impossible? Not necessarily.
For instance, if Shamsay put black hats on everyone, or an alternating sequence of black and white hats, then it’s quite possible that prisoners may figure out the pattern. But all hope is lost if he assigns the colours randomly, say by tossing a coin each time.

Surely, if the hat colours are chosen randomly, then it is impossible to figure out the colour of your own hat from the ones in front ? Surely, in such a case, half the prisoners must guess their colour wrong on average ?
In a finite world, yes.
But what if we throw infinity into the mix?

The Infinite Game
In a strange dimension unimaginably distant from Pesteros, yet all too near, R’Holler, the Lord of Might has an infinity of demons at his command.
His command is to play the Game of Hats. (Why ? Do not question the will of R’Holler, puny mortal !)

The rules are the same. Hats are either black or white.
Demon 1 stands behind Demon 2, who stands behind Demon 3 and so on forever.
(Math Alert: This implies that the infinity of demons is a countable infinity.)
All too aware of Shamsay’s initial discomfiture in the mortal realm, R’Holler goes straight for the “whisper version” of the game.
Oh yes, punishments. The penalty for guessing the wrong hat colour is the worst fate imaginable – being born as a commoner in Pesteros.

Based on our analysis above, since this is the whisper version, each demon has  a 50% chance of getting their colour wrong. So, is Pesteros headed for a population explosion of literally infinite proportions?
In fact, even if the demons work out some intricate statistical strategy so that the chance of a wrong answer is 1% or 0.0001%, any fraction of infinity is infinity and Pesteros is doomed.

So, what if I tell you that:
There exists a strategy to ensure that only a finite number of demons get their hat colour wrong – ie, the proportion of demons getting it wrong can be made precisely zero.
All without any communication whatsoever between the demons during the game.

How in the seven heavens and hells ???
This will take a bit of explanation.

Strings and Grouping
Consider any arrangement of black and white hats chosen by R’Holler.
This would correspond to an infinite sequence of the letters B and W.
We will call such a sequence a “string”.

We will call two strings “finitely different” if their values differ in finitely many locations and “infinitely different” otherwise.

To illustrate, consider the four strings below.
S1: B B B B B B B B ..… all black
S2: BWWWBWWB…all black henceforth
S3: BWBWBWBW…alternating B and W
S4: BBBWWWBW…alternating B and W henceforth

You can verify that S1 and S2 are finitely different, as are S3 and S4. However, S1 is infinitely different from both S3 and S4.

To start things off, here’s an Easy Result:
If S1 is finitely different from S2 and S2 is finitely different from S3, then S1 is finitely different from S3.
Quick Proof: Suppose S1 differs from S2 at n different locations and S3 differs from S2 at k different locations. 
Then S1 and S2 differ at at most  k + n different locations.

Now, I am going to break up the set of all possible strings into groups as follows.
Pick any string (the string which is all B for instance), call it S1. Make a group consisting of all strings which are finitely different from S1.
Anything string left over? If so, call it S2. Make another group consisting of all strings which are finitely different from S2.
Anything string left over? If so, call it S3. Make a group… get the idea.

We keep repeating this again and again until all possible strings belong to some group or another.

Observe two facts about this grouping:
Fact 1: Every possible string belongs to some group.
Fact 2: No string belongs to two different groups.

How come Fact 2?
Okay, suppose a string S belongs to two different groups, say group of S1 and group of S2. This means S is finitely different from S1 and S2 is finitely different from S.
But then the Easy result we just proved implies that S2 is finitely different from S1.
But then S2 and S1 should be in the same group, which isn’t true. QED.
(My mathematician friends will realize that I am partitioning the set of strings into equivalence classes, but enough of that.)

Fact 1 and Fact 2 taken together can be combined to give us:
Big Fact: If the set of all possible strings is partitioned into groups whose members are all finitely different from each other, then every string will belong to exactly one of these groups.

Winning the Infinite Game
For what follows, we shall assume that all demons have infinitely good vision, infinite memory and infinite computing power.
Any objections ? None ? Very well then…

As a first step, the demons combine their infinite computing power to split the set of all possible strings into groups of finitely different strings as above.
The next step is to form a Representative Set, R, consisting of exactly one string from each group. (Each string in the set ‘represents’ its group, hence the name.)
At the third step, the infinite memory comes in handy – the demons simply memorize all the strings in the Representative Set.
(Note: Infinitely many different Representative Sets can be formed. The demons come to a consensus about which one to use before memorizing it.)

Now the demons line up as commanded and the Lord of Might places the hats.
With their infinitely good vision, each demon can see all the hats in front of them.

There will be a unique string, S*, in the Representative Set, R, which is finitely different from the string of hats on the demons’ heads.
Furthermore, each demon can identify S* simply by looking at the hats in front of it.

Proof: Let S represent the string of hats.
By the Big Fact stated above, S belongs to exactly one group of finitely different strings.
R consists of exactly one string from each such group.
So, S* will be the string in R which belongs to the group where S lies.
Given how the groups were formed, S and S* will be finitely different.

Consider Demon 17, say.
It can see all but the first 17 entries of S. (The 17 missing entries being the hat on its own head and the ones behind.)
Demon 17 simply goes through each string in R, picks out all but the first 17 letters and compares the result with what it sees. (Infinite computation power, remember?)

Now, suppose a string in R differs from S at infinitely many locations.
Removing 17 letters will reduce the number of different locations by at most 17 – ie, the truncated strings will still be infinitely different.
Conversely, if a string in R is only finitely different from S then, after truncation, the number of differences will either stay the same or decrease – ie, the truncated strings will be finitely different.

From what we stated above, only S* is finitely different from S and everything else in R is infinitely different. Hence, Demon 17 can identify S* because only the truncated version of S* will be finitely different from the string of hats it can see.

Clearly, the argument above applies to Demon 1, Demon 113274 or any demon in the line.

So, finally we have it.
The hats are placed, the demons see what’s in front of them, and all of them identify S*.
When the questioning begins, Demon 1 whispers the first colour in the string S*, Demon 2 whispers the second colour and so on.
Since S* differs from the actual string S in finitely many places, all but finitely many of them get their hat colour correct.
To really see how surprising this is, imagine R’Holler walking down the line hearing the answers.
Initially, there are some mistakes, but at some point he passes the last location where S and S* differ, and from then on every single demon gets its hat colour correct !!!

In fact, if you think about it, it doesn’t matter if the Lord of Might walks down the line from back to front. He can question them in any order and the result is the same !!!

If you think a bit more, you will see that the same strategy works for any number of hat colours with exactly the same result !!!
In fact, even with infinitely many different hat colours, you get the same result !!!!
If that didn’t get an exclamation from you, I don’t know what will !!!!!

In Conclusion
Did the infinite game bother you ? It ought to.
In the finite version of the game played by Shamsay, blocking the information flow between prisoners by changing from “shout” to “whisper” led to a devastating change of outcome.
In the infinite version, the lack of communication matters not at all – in terms of proportions, precisely 0% of players guess wrong.

Maybe we can figure out the secret by trying to apply the winning strategy to a finite game with, say, 100 prisoners.

Very well then, we start off with strings of B and W of length 100.
Now we lump them into groups of finitely different strings.
Result: Since the strings are all finitely long, any two strings are finitely different, so the entire set forms a single group !
But let’s go on.

Since there’s only one group, the representative set, R, consists of just one string.
Basically, all the prisoners agree on one string and memorize it.

Now they are standing in line. What next ?
Well, there’s no problem of “recognizing S*”, since there is only one S* in the set R
And the strategy ? When Shamsay comes up, simply recite the letter of S* corresponding to your location.

Does the strategy work ? Yes it does !!
You see, if they are very lucky, the string of hats on their heads is exactly S*, whereby everyone is right.
Or it could be that only one person gets it wrong. Or all hundred. Or any number in between.
In every single case above, only finitely many people get their hat colour wrong, which is precisely what the strategy promised to achieve.

So there you have it. A strategy which worked with almost magical power for infinitely many players is totally ineffective, even ridiculous, in the finite world.

Which brings me to a point I have often made about infinity.
Most people imagine infinity by starting with something very big, then making it bigger and bigger and bigger and then it becomes, you know, infinite.
But when you think carefully, the world of the infinite reveals completely peculiar features with absolutely no finite analogue.

Infinity is infinitely weirder than you think.

Wednesday, 14 October 2015

The Ladder of Infinity

Infinity comes in many different sizes.

Yes, it’s true. Get over it.
Or at least, let’s think about why it so difficult to get over it.
After all, nobody is particularly bothered when I say “Numbers come in many sizes”.

I suspect the reason is that when people hear “Infinity” they think of the
Poetic Definition: Infinity is the Ultimate Maximum

This notion of Infinity as the Absolute, the Whole, “Something greater than which nothing can be conceived” has a long history in human culture.
Notions of the Infinite are frequently associated with a Supreme Being.

Viewed in this light, the idea of different sizes of infinity is indeed very confusing.
How can there be a larger or smaller Ultimate Maximum?

The scientific study of infinity – a branch of mathematics known as Set Theory – begins by descending from the lofty heights of the Absolute and starts with a
Prosaic Definition: An infinite set is a collection of objects that does not end

Viewed in this light, different sizes of infinity suddenly look more plausible.
For instance, consider the collections {0, 1, 2, 3, 4, 5 ...} and {2, 3, 4, 5, …}
Both never end, but the latter has two fewer members than the former.

Now if you look at {3, 5, 7, 9,….} (the odd numbers greater than 1), this has infinitely fewer members than both the above.
Surely it is conceivable that, while all three sets above are infinite, they have different sizes -
just like the numbers 5, 17 and 5000 are all bigger than 2, but different from each other?

But how can we be sure ?
After all, as everyone knows, only Rajinikanth can count to infinity. J
Let’s start at the very beginning – with the idea of a Set.

The Joy of Sets
A set is a collection of objects. (Some technical conditions apply but they can wait.)
We take the objects belonging to the collection and put curly brackets around them.
{1, 2, 3} is a set. So is {a, b, c} or {Jack, Jill}.

The objects belonging to a set are called its members.
1 is a member of {1, 2, 3} and b is a member of {a, b, c}, but 1 is not a member of {a, b, c} and vice versa.

Of particular interest to us will be N, the set of positive integers (called “Natural Numbers” by mathematicians).
N = {0, 1, 2, 3, 4,….}  is an example of an infinite set, as opposed to the other sets we saw above which are finite.

Once you have a few sets in hand, you can more sets out of them, using a couple of operations.

Union: Given two sets A and B, their union A ∪ B, consists of all the members that belong to A or to B.
So, if A = {1, 2, 3} and B = {2, 4, 6}, then A ∪ B = {1, 2, 3, 4, 6}

Using this idea repeatedly, you can define the union of any number of sets or even an infinite number of sets.
So, for instance, if A1 = {0}, A2 = {0, 1}, A3 = {0, 1, 2} and so on, we can take the union of all the sets to get A1 ∪ A2 ∪ A∪ ... = {0,1, 2, 3, 4, ….} = Our old friend N

Intersection: Given two sets A and B, their intersection A ∩ B, consists of all the members which belong to both A and B
So, if A = {1, 2, 3} and B = {2, 4, 6}, then A ∩ B = {2}

What if A and B have nothing in common? Let’s say if A = {1, 2, 3} and B = {4, 5, 6}?
Well, in that case, A ∩ B is defined to be the Empty Set, which contains nothing at all.
The Empty Set is denoted by { }, (there’s nothing inside the brackets) or the letter ∅

Just like unions, you can also talk about the intersection of infinitely many sets, although we won’t make much use of them in this post.

Subsets: A set A is said to be a subset of another set B, if every member of the set A also belongs to B.
So for instance {2} and {2, 6} are both subsets of B ={2, 4, 6}.
But {2, 5} is not.

Two important things to note:
-         Any set is a subset of itself. So {2, 4, 6} is a subset of {2, 4, 6}. (Check that the definition is satisfied!)
-         The Empty Set, { }, is always a subset of every set.

These last two facts often confuse newbies to set theory, but here’s a way to think about it:
To generate a subset of a given set – go through every member of the set, one by one, and make a decision about whether to include that member or not.
Every possible set of decisions corresponds to a valid subset.

So, if A = {3, 4, 5}, you could decide to include only the first member and exclude the rest.
This gives you the subset {3}.
If you decide to include the first two and exclude the third, you get {3, 4}

If you decide to include all the members, that gives you the set A.
If you decide to include none, you get the empty set.

Exercise 1: If a set, A, has n members, show that there are 2n possible subsets of A

With subsets under our belt, we come to a crucial concept for studying infinity.

Firstly, note that a set can have other sets as its members.
So, for instance: A = { {1, 2} , {3, 4, 5}, {a, b} } is a set.
Note that {1, 2} and {a,b} are members of A. But { {1,2}, {a,b}} is a subset of A.
However, {1, 2, 5} is not a member of A. (Neither is it a subset)

Now, are ready to define:
Power Set: The power set of a set A is the set P(A) consisting of all subsets of A.
Best to illustrate this by example:
If A = {1, 2, 3}, the power set of A is given by:
P(A) = { { }, {1}, {2}, {3}, {1, 2}, {1, 3}, {2, 3}, {1, 2, 3} }
(Note carefully the placement of curly brackets and commas)

Exercise 2:
(easy) If A has n members, show that P(A) has 2n members
(harder) If A = {1, 2}, write down P( P(A) ) (“power set of power set of A”)

Note that everything we are doing can be defined for an infinite set like N.
The power set of N will consist of finite subsets like {3, 5, 7} and well as infinite subsets like
{All the odd numbers}

So, what’s all this got to do with different sizes of infinity?
Well, the language of sets allows us to define sizes of sets – finite or infinite – in a very precise way.
In fancy math-speak, the word used for “size” is “cardinality”, but we’ll stick with “size”

Sizing Things Up
Imagine you have forgotten how to count.

Is there any way for you to distinguish the size of two different sets ?
For instance, is there any way for you to recognize and convey the fact that {1, 2, 3} has just as many members as {a, b, c}, but fewer members than {a, b, c, d, e} ?

Turns out that there is.
In fact, this is how pretty much everybody learns to count but then forgets !
It’s the very basic idea of pairing things up.

The idea is, given two sets A and B, I will try to pair up every member of A with some member of B - with the stipulation that different members of A must be paired with different members of B.
(In fancy math-speak, a pairing as above is called a one-to-one function)

In what follows, I will use the notation “x -> y” as shorthand for “x is paired with y”

So, if A = {1, 2, 3} and B = {a, b, c}, then: 1 –> a, 2 –> b, 3 –> c  is a valid pairing.
So is: 1 –> b, 2 –> c, 3 –> a.
But: 1 –> a, 2 –> a, 3 –> b is invalid because 1 and 2 are both paired with a.

But if we took, B ={a, b, c, d, e} instead, then 1 –> a, 2 –> b, 3 –> c or 1 –> b, 2 –> d, 3 –> e are both valid ways of pairing all members of A with some members of B (there are many other ways).

The crucial difference, is that, every member of B = {a, b, c} ended up getting paired with some member of {1, 2, 3} but when we took B ={a, b, c, d, e}  instead, then however you try to pair things up, you always end up missing some members of B.

Hence, even without knowing how to count, one can compare sizes of sets, in a way which matches perfectly with our intuition of one set having “more things in it” than another one.
Let’s make this official.

Definition 1: Given two sets A and B, we say that A ≤ B (Size of A is less than or equal to size of B) if there is some way of pairing up all members of A with some members of B (but not necessarily all of them)

Definition 2: Given two sets A and B, we say A = B (A and B have the same size), if there is some way of pairing up all members all members of A with all members of B

Definition 3: Given two sets A and B, we say A < B (A is smaller than B or B is greater than A) if:
-There is some way of pairing up all members of A with some members of B (but not all)
- There is no way of pairing up all members of A with all members of B

Exercise 3: If A is a subset of B, then show that A ≤ B

Notice that proving A = B is relatively easy because you just have to figure out one possible way to pair up all the members of the two sets.
Showing that A < B is much trickier. It’s not enough to show some attempt at pairing up the two sets doesn’t work – you need to show that every possible attempt at pairing will fail.
This requires a good dose of ingenuity as we shall see later.

So, with these definitions in hand, we can go ahead and say that {1, 2, 3} ≤ {a, b, c} and  
{1, 2, 3} ≤ {a, b, c, d, e}.
Furthermore, {1, 2, 3} = {a, b, c}, but {1, 2, 3} < {a, b, c, d, e}

“Now this is all very interesting”, you say, “but the fact is I do know how to count and I could have told you this much earlier. Why did we take such a long winded route to this easy conclusion?”

Because – and this was the great insight of George Cantor, father of Set Theory – this method of comparing sizes of sets by pairing works perfectly well for infinite sets, whereas counting fails miserably !!
In another words, we have just managed to extend the concept of counting from finite to infinite collections.

So, without further ado, allow me to introduce…

The Smallest Infinity
Let N = {0, 1, 2, 3, …..}, be the natural numbers. This is the smallest possible infinite set.
What do I mean by that exactly? Well, this.

Theorem: For infinite set A, N ≤ A
Proof:   Pick any member from A, pair it up with 1. Now pick up another member, pair it up with 2. Keep going.
Now, if we run out of members to pick up at the k-th step for some number k, then A had only k members. But that means A was finite!

So, the process must continue until every member of N gets paired up with some members of A, but not necessarily all. Hence, by definition, N ≤ A.  □

But are we really sure of this? After all remember the sets {2, 3, 4, 5,…} and the odd numbers {1, 3, 5, 7,…} ? We had suspected that these were smaller than N, but how is that possible if N is the smallest infinity?

The answer is that both the sets above have the same size as N.

Look at the pairing 0 -> 2, 1 -> 3, …., k -> k + 2,….
Convince yourself that using this formula every single member of {2, 3, 4, 5..} gets paired up with N. Hence they have the same size.

Similarly, 0 -> 1, 1 ->3, 2 -> 5, …, k -> 2k + 1, …. pairs up N and all the odd numbers.

So, here we have a property of infinite sets that has absolutely no analogue in the world of the merely finite:
It is possible to remove members from an infinite set – even infinitely many of them – and still end with a set of the same size !!

What about the other direction? What if we add more things to N ?

How about S = {0, 0.5, 1, 1.5, 2, 3, 4, 5, … } where we threw in two more numbers ? Maybe that’s a bigger infinity?

No such luck.
0 -> 0, 1 -> 1.5, 2 -> 1, 3 -> 1.5, 4 -> 2, 5 -> 3, k -> k – 2 for all subsequent numbers k, and we have paired up N and S with nothing left over.
In fact, you could have thrown in a million, a trillion or indeed, any finite number of extra members to N without changing its size at all!

Ok, then, let’s get serious.
Let S = {0, 0.5, 1, 1.5, 2, 2.5, 3, 3.5, …. } (every whole number and whole number + 0.5)
Now we have thrown in infinitely more things. Could this give something bigger ?
0 -> 0, 1 -> 0.5, 2 -> 1,…., k -> 0.5k  for all numbers k  and once again, our hopes are dashed as everything gets paired up.

It keeps getting worse.
You can add a million sets to N, or a gazillion – each having the same size as N – and still the size remains stubbornly unchanged!

In desperation, we go all the way.
Consider an infinite collection of sets, A0, A1, A2 and so on, one for each whole number.
Let each of these sets be individually as large as N.

How large is this entire infinite collection of sets taken altogether?
The completely stunning – but unequivocal – answer is:
Just as large as N, not one bit more!!

At this point, you may legitimately wonder why we called N the smallest infinity.
Every infinite set which looked like it might be smaller than N ended up being the same size.
On the other hand, every attempt to get something bigger than N failed as well.

Maybe it’s time to call it quits and admit that there is only one infinity – the size of the set N.
Not quite.

A Bigger Infinity
Ladies and gentlemen, allow me to present:

Cantor’s Theorem: Let A be any set, finite or infinite. Then A < Power set of A

Proof: Let’s explicitly write down what we need to prove, first.

Let P(A) denote the power set of A.
By definition of A < P(A) we need to show two things
1) There is some way of pairing up all members of A with some members of P(A) (but not all)
2) There is no way of pairing up all members of A with all members of P(A)

To illustrate the ideas, I will use the set N, but the proof works for absolutely any set.

Step 1) is easy.
Note that P(A) consists of all possible subsets of A. In particular, for any member x belonging to A, the set {x} belongs to P(A). So, simply pair x -> {x} for each x and we are done.
To illustrate using N, we just pair 0 -> {0}, 1 -> {1}, 2 -> {2} and so on….

Step 2) was Cantor’s stroke of genius.
Remember, we need to show that no conceivable pairing will between members of A and P(A) will work.
So, how do we check through all possible pairings, especially if there are infinitely many of them ?
We don’t

Instead, we show that no matter what pairing recipe was given to you to link all members of P(A) and A, it couldn’t possibly have worked.

So, let’s say the World’s Biggest Math Genius has given you a possible way of pairing them all up.
As per his formula:
-         Every member x gets paired with some subset of A, which we will call “Pair of x”
-         More importantly, for any subset of A, there is some member of A which is paired with this subset.

We will use Mr. Genius’ pairing formula against him, by constructing a Rogue Subset as follows:
For any x, look at the subset  “Pair of x” as per his formula.
Now ask the question, Does “Pair of x” contain x?
If the answer is “No”, then include x in Rogue Subset.
If the answer is “Yes”, then don’t include x in Rogue Subset.

To illustrate, suppose as per Mr. Genius’s formula:
0 -> {1, 2, 3}, 1 -> {1, 3, 5, 7,…, all odd numbers} and 4 -> {1, 6, 11, 13}.
Then, Rogue Subset would include 0 and 4, but not include 1.

Now take a moment to realize that:
The Rogue Subset cannot have been paired with any member of A.

How so? Because you intentionally made it that way.
If the subset paired to a member contained that member then Rogue Subset wouldn’t contain it and vice versa. So, it is guaranteed that you won’t find Rogue Subset anywhere on the list of pairs.

Note that this doesn’t depend on any particular recipe for pairing.
If Mr. Genius realized his mistake and gave you another pairing recipe, then you would just follow the same procedure for that recipe and end up with another Rogue Subset.
Every possible attempt is doomed to failure and this completes Step 2)

This we have shown that A < P(A)
Done !! □

Let’s just take a deep breath and appreciate the magnitude of what we just proved.
For finite sets, this result is not a big deal because n < 2n for any number n (Exercise 1).

But the true magic of Cantor’s Theorem shines through when A is an infinite set.
For at last, we can break through the tyranny of N and soar upwards to a provably bigger infinity, the power set of N.
With that small (or is it infinitely large?) step, our journey has begun.

The Ladder of Infinity
Start by renaming N. Call it I0 – the “zero-th infinity”.
We now know that N < P(N), the power set of N.
Call that I1

But what worked once, works again. So, P(N) < P(P(N)). Call that I2.
Repeat. Repeat. Repeat again.

What you get is a chain of infinities, each bigger than the last – I0 < I1 < I2 < .... < I1000 < …
In other words, not only are there different sizes of infinity, you get infinitely many different sizes - one for every whole number.

(Quick Question: How do we know there are no infinities between I0 and I1 ?
 Quick Answer: This is the Continuum Hypothesis and way beyond the scope of this post.)

Leaving already? But we are just getting started.

Remember how we are allowed to take unions of sets?
Well then, once we get our infinity of infinite sets, each bigger than the last, let’s take the union them all !
Look at the Jumbo Set, J = I0 ∪ I1 ∪ I2 ∪ ….∪ ….
J is a strictly bigger infinite set than all the I’s !!

Why ? Okay, suppose, instead, that J = Ik for some integer k.
But that can’t be, because J contains all the members of Ik+1 and Ik < Ik+1
So, Ik < J for all Ik – in other words, J is bigger than all the I’s.

So, we can think of J as the “infinity-th infinity”, larger than anything that came before.
Let’s call it Iω

But now the fun starts all over again – for nothing stops us from looking at P(J), P(P(J)) and so on to get  Iω < Iω+1 < Iω+2 < ….
And yes, when this chain gets exhausted, we take the union once again to get a super jumbo sized infinite set I

Power set, power set, power set, union, repeat, repeat…  – I , I4ω ,  I , ….
Union them all and out pops I ω2
Plodding ever onwards, we get I ω3   I ω4   I ω5   ….. Then eventually  Iωω
And beyond that as well…

At this point, you may turn around and ask:
“How many different sizes of infinity are there? How many rungs on the ladder of infinity?”

Infinitely many, obviously, but now we have learned to calibrate more finely.
I0 different sizes of infinity ? I1 maybe ? Surely not Iω ?

It can be proved that, in a very well defined sense, there is an I1-th infinity, an I2-th infinity, an Iω-th infinity, even an Iωω th infinity.

In fact, we can find an infinity - call it ICrazy – such that ICrazy is the ICrazy th infinity.
In other words, ICrazy = I ICrazy  

That’s crazy ! Yes, I know.

As for your question, the answer is that there are more rungs on the ladder of infinity than can be counted by any of the infinities lying on that ladder !

Let’s say that once again - not only are there different sizes of infinity, there are far, far, far more of them than there are numbers.
In fact, there are more types of infinity than can be counted by any of the infinities in our ever increasing ladder.

But what if we joined them all together and looked at the “Set of all sets” ?
Right away, we have a problem:
Let U be the set of all sets.
From Cantor’s Theorem, we know that U < P(U)
But by definition, U contains all sets – which means P(U) is a subset of U.
But then P(U) ≤ U.
Contradiction !

Something has gone seriously wrong. Maybe it’s time to slow down and take stock of things.

The Universe of Sets
Infinity has traditionally connoted something mighty and mysterious, possibly beyond human understanding.

But now that we are dealing with infinities in the plural, it becomes apparent that they behave rather like the familiar finite (whole) numbers.

Some are bigger than others.
Taking a power set always yields a bigger infinity, just like adding one always gives a bigger number.
The enormous profusion of infinities, with entities like Iωω   or ICrazy may be startling, but perhaps no more so than the existence of really large numbers.

It turns out we can even do arithmetic with these infinities – add them up, multiply them, take one infinity to the power of another and so on.

The fact that there are more different sizes of infinity than can be counted by any given infinity is precisely analogous to there being more finite numbers than can be counted by any one number.

Even the problem we saw with our “Set of all sets” is very similar to what happens when we talk about the “Biggest possible number” - adding one should give a bigger number which leads to a contradiction.

But while it is fairly clear why you can’t talk about a biggest number, it isn’t quite apparent why we can’t mention the “set of all sets”.

Isn’t a set just a collection of things ? So why not the “set of all sets” ?

Remember how I had qualified my definition of a set with “technical collections apply” ?
It’s time to apply them now.

A set is defined as a collection of things which satisfies a set of rules called the “Zermelo Fraenkel Axioms” – usually abbreviated to ZFC.

For instance, the ZFC rules say that the Empty Set is a valid set and taking the power set of a set gives a valid set (among other things).
However, it so happens that the “collection of all sets” does not satisfy ZFC and is therefore not a set.
So taking its power set and looking at its size are out of bounds.
Contradiction avoided.

But let’s stick with the analogy just a little longer.

Once we realize that the finite numbers never end, it gradually dawns on us that the collection of all finite numbers is not described by some “biggest number”, but corresponds to something entirely new and incomparably larger.
(Or as your high school teacher would say, “Infinity is not a number”)

By analogy, although there is never a largest infinity, can’t we still imagine collecting all sets together to get – not a “largest possible set”, but “Something Beyond” ?
Yes, indeed we can.

What we get is the Universe of Sets.
Associated with the Universe is an “Infinity beyond all infinities” which Cantor referred to as the Absolute Infinite.
We are back to the long neglected “poetic definition” of Infinity – an ultimate maximum which admits nothing larger.

A deeply religious man, Cantor associated the Absolute Infinite with God – in contrast to the garden variety infinities we have been discussing, which he called the “transfinite cardinals
or “transfinite numbers”.

So, at the end of our excursion into the set theoretic realm, we are left with:
-         The finite numbers which we all know and love
-         The transfinite numbers forming our “ladder of infinity”
-         The Absolute Infinite encompassing and transcending them all

Or maybe we are thinking too small…

Stretching the Universe
Recall that all sets are required to be in compliance with the ZFC Axioms.
It turns out that all finite sets and the infinite sets – aka transfinite cardinals – we have seen till now can be explicitly constructed once we have the ZFC Axioms in place.
For what follows, I’ll call this collection the “ZFC universe”.

From here onwards, one possible approach is the
Conservative View: The universe consists of all sets which may be constructed from the ZFC Axioms.
In other words, the ZFC universe is everything you will ever get and you can stop reading now.

Life gets more interesting, however, if we adopt the
Liberal View: The universe consists of all sets which do not contradict the ZFC Axioms.

To see why, we define a set K with the following three properties.
a) I0 < K (where Io is N, our smallest infinity)
b) If a set A < K, then P(A) < K
c) (Union of fewer than K sets which are each smaller than K) < K

So, what’s the big deal ?
For starters, since I0 < K, Property b) implies I1 < K (since I0 = N and I1 = P(N) ).
But repeating the same logic, I2 < K, I3 < K , in fact In < K for every number n.
Now Property c) implies Iω < K since it is just the unions of all the In
Then we use Property b) again….

It turns out that K is at least as large as the entire ZFC universe !!!

But hang on, didn’t I just say that the universe of all sets is too large a collection to be a set ?
So am I talking about a set as big as the universe ?
Can I please make up my mind ?

Well, it all depends on how my mind is made up.
If I take the Conservative stance, then a monstrous set like K does not exist, cannot exist.

But if I take the Liberal view, then the ZFC universe is not “The Universe of All Sets”, it is merely “the collection of sets which can be constructed from ZFC” and there is no problem at all with a set as large as this collection, or even larger.

With the introduction of K, we have just stretched the Universe of sets.
And what a stretch it is!

To begin with, the once-mighty ZFC universe, which we were about to associate with Absolute Infinity, deflates to become the Least Inaccessible Cardinal - K0

And now of course, we can take the power set of K0, the power set of the power set, union of all the power sets, all the good old tricks to get a new ladder of infinities built atop K0

But now we have a new trick up our sleeve !
Because standing just beyond this new ladder, we find the next inaccessible cardinal, K1
And similarly, we get K2 K3 and – yes, you got it –an infinity of inaccessible cardinals.
And once you got those, triple your fun by taking the union of them all, then keep going.

Forget the ladder - we are now on the space elevator of infinity.

As we casually zoom past the inaccessible cardinals, it is all too easy to forget that the gap between one and the next is equivalent to the gulf between the smallest infinity I0 and the entire ZFC universe.
More power sets, more unions, more inaccessibles – are we stretching the universe of sets to breaking point ?
Not by a very long shot.

Inaccessible cardinals form the bottom level of the so called Large Cardinal Hierarchy – the frontier of research in modern set theory.
Inaccessible, Mahlo, weakly compact, ineffable, measurable, extendible, huge, superhuge… the chain of Large Cardinals goes on and on.

Each step up the hierarchy is a gargantuan, mindboggling leap, dwarfing the jump from the smallest infinity (almost invisible from these heights) to the least inaccessible cardinal and enlarging the set theoretic universe almost beyond comprehension.

Like ever-frustrated mountaineers, we triumphantly reach a summit, ready to touch the sky - when the fog rolls back to reveal an immeasurably higher peak towering overhead and the starry firmament of the Absolute recedes further than ever.

Does it ever end ? Could it ever end ?
These are, after all, mountains of the mind, unconstrained by any connection to reality.
What prevents us from spinning out ever larger cardinals for centuries to come, defining them into existence ad infinitum ?
Answer: The ZFC Axioms.

The only constraint on our “Liberal worldview” was that our sets should not contradict ZFC.
It turns out that as we climb the large cardinal hierarchy, we put ever more pressure on the girdle of ZFC – increasing the possibility of contradiction – until it finally snaps.

Bloated with mathematical hubris, we try to define a Reinhardt cardinal – an ultimate mountain of infinity that “touches the Absolute” – and it collapses into inconsistency.
But that is a story for another time.