Some fraction of these articles I’ll dedicate to exploring a part of the mathematical canon: who the author was, the nature of their project, and why their work is worth reading not just for technical specialists, but for everyone. Finally, I’ll conclude with some suggestions about the kairos for the author: when they can be incorporated into a plan of reading or a curriculum. In the last article I introduced Claude Shannon. This article continues through his corpus.
As a student and teacher of Latin, I began my fights with my computer early. Spellcheckers were flummoxed by the idea that I could write sentences in a mixture of Latin and English and, whichever language I chose, marked whole swaths of my document as incorrect. A bit later in my career, the computer escalated the skirmishes and began changing my Latin to (bad) English without my consent. I responded with the thermonuclear weapon: disabling spellchecking entirely. But the war was not yet won. Now I do most of my writing in archaic programming editors, which are the last bastion of tools that obey their user. But, alas, when I write for a public audience, my words pass under the gaze of the computer, which attempts to bend them toward some sort of corporate pablum. What is going on here? Why have computers become adversarial toward idiosyncratic writing? Why do computers attempt to make our writing wrong, as in the screenshot from my last article?
To make sense of this phenomenon, we need to look at Claude Shannon’s second work, A Mathematical Theory of Communication. In it, Shannon ushers in the “information age” in which we now live. Indeed, Shannon’s definitions hold enormous sway over all of our thoughts, whether we perceive it or not. He begins by defining the “problem of communication”:
The fundamental problem of communication is that of reproducing at one point either exactly or approximately a message selected at another point. Frequently the messages have meaning; that is they refer to or are correlated according to some system with certain physical or conceptual entities. These semantic aspects of communication are irrelevant to the engineering problem. The significant aspect is that the actual message is one selected from a set of possible messages.
He goes on to define “information”:
If the number of messages in the set is finite then this number or any monotonic function of this number can be regarded as a measure of the information produced when one message is chosen from the set, all choices being equally likely.
What do those (admittedly somewhat opaque) definitions mean? I want to draw out a few important bits. First, communication is a process of selecting a message from a finite list of possible messages. Second, the actual meaning of those messages doesn’t really matter. Finally, we can count how much “information” a particular message contains, again ignoring its meaning. Let me explain by way of a few examples of the sort of thing Shannon is talking about.
Imagine that I give my students a multiple-choice quiz that has four options per question and will be dutifully fed into the maw of the scantron. It begins:
- Who wrote the Aeneid?
- a) Catullus
- b) Vergil
- c) Cicero
- d) Livia
Here we have the paradigmatic example of communication for Shannon. A teacher offers his students a particular prompt, and they communicate a message to him in return. They must choose the message from a finite set of possible messages (four here). Perhaps the prompt and message have meaning (to whom?), perhaps not (more on this in a moment). If the teacher has written a good exam, then all four possibilities will be equally likely. The students can happily communicate their response to their teacher (perfectly unambiguously) by filling in a little bubble.
Indeed, Shannon goes on to quantify how much information the response to that question contains. The “monotonic function,” as he elaborates, measures how many yes-no questions you would need to uniquely identify the answer. So an answer in the child’s game “twenty questions” that took all twenty questions would contain twenty bits of information. For our multiple-choice quiz, we would need two questions to narrow down a particular response (“Is the answer a) or b)?” might be the first question, whose answer would be “yes”), so each question would be a two-bit question (this is a pun).
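That counting can be made concrete in a few lines of Python. This is a sketch of my own, not anything from Shannon's paper; the function name is invented for illustration.

```python
import math

def questions_needed(n_messages: int) -> int:
    """Yes/no questions needed to single out one of n equally likely messages."""
    return math.ceil(math.log2(n_messages))

# The four-option quiz question: two yes/no questions suffice,
# e.g. "Is it a) or b)?" followed by "Is it a)?"
print(questions_needed(4))        # 2

# Twenty questions can distinguish 2**20 (about a million) possibilities
print(questions_needed(2 ** 20))  # 20
```

The ceiling appears because you cannot ask a fraction of a yes-no question; Shannon's logarithm gives the exact fractional measure.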
This means that we can think of a multiple-choice test as a communicative act and count up how much information it contains. A twenty-question test with five choices per question would be a total of 46 bits of information. By comparison, the previous sentence contains 776 bits of information (each character is [represented by 8 bits](https://www.asciitable.com)). And the fact that we can have a machine grade the exams suggests that Shannon’s framework is in play: the important act is selecting the correct message, not any sort of exchange of meaning. b) chosen by lot earns as many points as b) chosen reflecting knowledge. My wife notes that this leads teachers to call such tests “multiple-guess.”
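Both totals are easy to verify. Here is a short Python check, with the sentence literal copied from the paragraph above:

```python
import math

# A 20-question test with 5 equally likely choices per question
test_bits = 20 * math.log2(5)
print(round(test_bits, 2))  # 46.44

# The sentence itself, encoded at 8 bits per ASCII character
sentence = ("A twenty-question test with five choices per question "
            "would be a total of 46 bits of information.")
ascii_bits = len(sentence) * 8
print(ascii_bits)
```

The gap between the two numbers is the point: ordinary prose is an extravagantly roomy channel compared to a bubble sheet.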
This ethic—that communication is selecting the right message from a set of possibilities—has invaded the college classroom as well. Companies like Cengage offer online textbooks whose great advantage is machine-graded homework. Some of their questions are multiple choice. Some are free response. Some are true/false. I particularly enjoy the true/false questions where the student is given five attempts to answer! Worse still is the reduction of proof writing to (and I emphasize that I am not making this up) mad libs and word jumbles. Rather than a creative act of persuasion, the Cengage texts make proof writing the selection of the right word from a short list of words to fill in a blank in a pre-written proof. Often there are fewer choices for the word than allowed attempts. This is horrifying.
Shannon’s vision of communication pervades grading as well. As a teacher, I communicate to my students by choosing from a limited set of messages: F, D, C, B, or A. Perhaps with the addition of plusses and minuses we have something like three and a half bits of information in a grade for a whole semester-long course.
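A quick sanity check on that estimate. I assume eleven distinct marks once plusses and minuses are added; the exact roster varies by school, so treat the second number as approximate.

```python
import math

# Five plain letter grades: F, D, C, B, A
plain_bits = round(math.log2(5), 2)
print(plain_bits)  # 2.32

# With plusses and minuses, assuming eleven distinct marks
fine_bits = round(math.log2(11), 2)
print(fine_bits)   # 3.46, i.e. "three and a half"
```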
This impulse to reduce communication to the selection of an appropriate message from a dictionary of messages now extends beyond the academy. Tools like Microsoft Word’s autocorrect and Grammarly conceive of all writing as the selection of a message from a corpus, and their goal is to shape our writing to consist of the most likely words. This, of course, often results in unintended hilarity, as parodied in Martha Snow’s poem:
Eye halve a spelling chequer
It came with my pea sea
It plainly marques four my revue
Miss steaks eye kin knot sea.
Shannon walks us through how we can mathematically analyze which letters are most common, then which pairs of letters, then which words, and so on. The spellchecker that gave the thumbs-up to Snow’s poem knew enough to say that “halje” was less likely than “halve,” but not enough to know that the author probably meant “have.” The former judgment is probabilistic; the latter relies on understanding semantic meaning. And this sort of probabilistic building of Markov chains is the foundation of modern A.I. Things like ChatGPT are the next steps down the road of treating communication as an engineering problem: they produce plausible texts that are only indifferently connected to meaning.
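Shannon's letter-pair analysis is simple to sketch. The toy model below is my own illustration, with a stand-in sample text rather than the large English corpora Shannon used: it counts which character follows which, then generates plausible-looking strings from those counts alone, with no meaning anywhere in sight.

```python
import random
from collections import Counter, defaultdict

# Stand-in corpus; Shannon worked from large samples of English text
text = "the theory of communication treats messages as selections from a set"

# Count how often each character follows each other character
follows = defaultdict(Counter)
for a, b in zip(text, text[1:]):
    follows[a][b] += 1

# Generate text by repeatedly sampling a likely next character
random.seed(0)
out = "t"
for _ in range(30):
    counts = follows[out[-1]]
    if not counts:  # dead end: this character never had a successor
        break
    out += random.choices(list(counts), weights=list(counts.values()))[0]
print(out)  # plausible letter sequences, indifferent to meaning
```

Scaling the same idea up from character pairs to longer and longer contexts is, in broad strokes, the road that leads to modern language models.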
Shannon’s work reminds me of the Colosseum, or any of the monumental buildings in Rome. On the one hand, Roman buildings are majestic, the fruits of empire. On the other hand, they are horrifying, the fruits of empire. The marvel of engineering and artistic vision is also a mass grave built by slaves. Certainly less viscerally, Shannon’s work rings both of those notes. Inasmuch as his theory is a mathematical theory of communication, it is majestic and beautiful. Inasmuch as his theory is a mathematical theory of communication, it horrifies.
Why, then, does Shannon belong in the canon? Why is he worth reading?
First, his ideas are enduring. His thesis articulates a vision of a machine that can, in some sense, imitate the work of the human mind, and his theory of communication transforms the human mind into something machines can make sense of: a source of discrete messages to be quantified. Like the printing press (which, in passing, is also a Shannonian invention: rather than the free composition of the pen, you have the selection of particular letters from a set of possible letters), Shannon’s theory has reshaped society. Indeed, his ideas have consequences for all of us. We live in a time of information, vice knowledge. ChatGPT flows directly from his work, as do things like Grammarly that use Shannon’s picture of information to round off the idiosyncratic bits of our writing. Indeed, much of our educational system is now geared around producing good sources of information in the sense that Shannon means. When students just want to know what to write to get marks on their homework, they are thinking of themselves as information-passing machines. A fascinating question is whether it could be otherwise. That is, could we have at our disposal machines that do what computers do today without those machines shaping the way we think and write?
Second, having inspired those changes to society, Shannon nevertheless offers us grounds to critique this sort of tomfoolery. Imagine two students. One is not particularly gifted but works hard and is diligent. She scores 75, 77, 73, and 79 on four tests. Our grading scheme compresses that to a C. Another is quite gifted, but distracted and prone to ill health. She scores 99, 58, 97, and 50 on the same four tests. Our grading scheme compresses that to the same C. But those two students are remarkably different in ways that cannot be captured well in three and a half bits of information. Hence, I suppose, the push for open gradebooks: if we communicate all four grades, then we’re communicating almost 27 bits of information! This is still a shockingly small amount compared to even a modest narrative communication. Our ability to quantify the information contained in a communication allows us to push back against the very reductive tendencies that the quantification of information inspires.
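The arithmetic behind "almost 27 bits," under the assumption that each test score is a whole number from 0 to 100 and all scores are equally likely:

```python
import math

# Each score is one of 101 possible messages (0 through 100)
bits_per_score = math.log2(101)
print(round(bits_per_score, 2))  # 6.66

# Four published scores versus one compressed letter grade
total = 4 * bits_per_score
print(round(total, 1))           # 26.6, i.e. "almost 27 bits"
```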
Both in his creation of the information age and in his implicit critique of it, Shannon lies at the root of many phenomena we encounter today. If we want to have an informed opinion about things like artificial intelligence, machine learning, even computerized pedagogy, we need to engage with the philosophical and mathematical ideas that lie at the root of those societal trends. We cannot understand how the modern world shapes the human person without understanding the nature of information that Shannon develops. His text and ideas are foundational. And in the same way that a summary of Plato does not answer the mail, my brief summary of Shannon’s work is inadequate. The more we can grapple with texts like his on our own, the more we can be part of a fully formed society.
That said, A Mathematical Theory of Communication is a very hard read. Shannon expects considerable facility with mathematics from his reader. But this, too, is true of all great literature and philosophy. If you haven’t spent some time wrestling with 19th-century philosophy and Orthodox Christian theology, Crime and Punishment will be a hard read. Great ideas, great texts have table stakes without which we approach them in vain. What makes Shannon different, perhaps, is that we don’t generally think of our mathematical education as necessary preparation for engaging with great and enduring ideas. We should.
 Shannon would say the response to such a question communicates zero bits of information.
 He says this in a more formal and exact way: we use a base-two logarithm to measure the information content of a particular message. The word “bit,” which means “binary digit,” flows from the inputs to this function.
 Actually it’s a bit more: 20 * log(5)/log(2) is 46.44ish.
 Someone, possibly me, should write an article discussing the role of mathematical thought in handwriting: the transition from script to print is all about the ascendancy of the discrete in the 19th century, aided and abetted by mechanical printing, which eschews things like ligatures by its very nature.
 This framing also makes the debate about grade inflation much clearer. All grade inflation does is take a bad communication method and make it significantly worse. If the only possible grades are A and A-, then each course is compressed into a single bit of information. Likewise for floors on bad grades: if the minimum grade is a 50, the number of possible numeric grades is cut in half, which strips a full bit from each grade’s informational content. We need more information in our communication with students, not less.