I wrote this sentence by pressing down on little pieces of plastic in a specific order on my keyboard. Afterward, a whole set of software and electronic systems performed a cascade of further transformations and transmissions. On your device, that process was reversed so that you can read this.

Mind → English Language → Keystrokes → Characters Stored/Transmitted in Electronic Format → English Language → Your Mind

Another word for this sequence of translations is encoding, and it leads to all sorts of problems that can blow up data pipelines, create errors in software, and generally lead to the degradation and loss of information that we try to move between systems or use for analytical purposes. Information that was created and stored a few decades ago can be less accessible or cause hiccups when you try to access or process it.

You have probably seen the results of this garbling while scrolling at some point. Have you ever run across � or □ or weird garbled text like theyâ€™re? Or maybe you have opened a file only to be greeted by a wall of text that looks like green scrolling Matrix code or an otherworldly message of warning like ™é£Ÿåæ ¹äº‹æ†²è»Šèµ¤è¨˜å ±é™¸äº€è–公。開仕国委館関飛å‰å¥æ¯ŽèŠ¸é–¢è¥²ã€‚社飛è.

These are typically caused by encoding issues, and the history of encoding using electronic or electromagnetic means goes back to the early 19th century.
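As a minimal sketch of how this kind of garbling can arise, the Python snippet below writes a word containing a curly apostrophe as UTF-8 bytes and then reads those bytes back under two wrong assumptions. The choice of UTF-8, Windows-1252 (`cp1252`), and ASCII is an illustrative assumption; real-world mismatches involve many other pairs of encodings.

```python
# Text as the writer intended it, containing a curly apostrophe (U+2019).
original = "they\u2019re"

# The sender stores or transmits the text as UTF-8 bytes.
data = original.encode("utf-8")

# A receiver that wrongly assumes Windows-1252 turns each byte of the
# three-byte apostrophe into its own unrelated character.
garbled = data.decode("cp1252")

# A receiver that assumes plain ASCII cannot interpret those bytes at all
# and substitutes the replacement character for each one.
replaced = data.decode("ascii", errors="replace")

print(garbled)
print(replaced)

# If no bytes were lost, undoing the wrong decoding recovers the text.
recovered = garbled.encode("cp1252").decode("utf-8")
print(recovered == original)
```

The bytes never change in this sketch. Only the receiver's assumption about what they mean changes, which is why the damage is sometimes reversible and sometimes not.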

You can write the letter A on a page and physically send it to someone, but there is no direct way to push that letter through a copper telephone line or fiber optic cable. If you don’t want to rely on writing letters and having a courier deliver them, you have to tackle translating human language into an electronic medium for transmission. Since the early 1800s, we have been tackling this problem with…widely varying degrees of success.

The first widespread way of doing so was the original Morse code, developed in the mid-1830s in America. Artist and inventor Samuel Morse and his business partners invented both an encoding system and a device to transmit the codes over long distances. The system relied on the presence or absence of electric current to represent 26 letters, 10 numerals, and a handful of punctuation marks.

The original Morse telegraph, credit: Wikipedia

As the telegraph took off around the world, demand surged for a way to encode more characters, like accented letters, more punctuation marks, capital and lowercase letters, etc. Technical issues related to how electric signals travel through super-long undersea cables also made the American Morse code ill-suited to international communication (at least for a while).

Everyone wanted to add more codes to the system, but this ran into a common problem. To communicate effectively, people must use the same (or at least compatible) encoding systems. If I think 0001 = A, and you read my message thinking that 0001 = C, you will end up with a lot of misrecorded words and letters. The information contained in the message is partially destroyed. If my system has an encoding for ø and yours doesn’t, you are going to receive a code that you have no idea how to interpret.
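To make the mismatch concrete, here is a small sketch in Python. The two codebooks and their 4-bit codes are hypothetical and chosen only to mirror the example above; they do not correspond to any real telegraph standard.

```python
# Hypothetical 4-bit codebooks; the codes are illustrative, not a real standard.
sender_book = {"A": "0001", "B": "0010", "C": "0011", "ø": "0100"}

# The receiver's book swaps the meanings of 0001 and 0011
# and has no entry at all for the code the sender uses for ø.
receiver_book = {"0001": "C", "0010": "B", "0011": "A"}

def encode(text, book):
    """Translate each character into its code using the sender's book."""
    return [book[ch] for ch in text]

def decode(codes, book):
    """Translate codes back into characters; unknown codes become '?'."""
    return "".join(book.get(code, "?") for code in codes)

message = "CABø"
codes = encode(message, sender_book)
received = decode(codes, receiver_book)

print(message)   # what was sent
print(received)  # what the receiver believes was sent
```

Only the letter both books agree on survives. The swapped codes silently produce the wrong letters, and the code missing from the receiver's book cannot be interpreted at all.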

In 1851, a group of European countries held a conference and created the code that became International Morse Code, which supported more letters, including accented letters. American and International Morse are not interoperable, and both have been referred to simply as “Morse code” in many contexts, often leading to confusion. Some telegraph operators had to learn both systems and switch back and forth between them. American Morse remained in wide use for about 100 years, until the 1920s and 1930s, and it was only really displaced by changes in communication media like the radio and telephone.

What International and American Morse share, however, is a two-level coding system, where characters in human languages are first translated into a much more limited set of codes, which are then translated into a binary form. To quote from this epic Stack Exchange post on International Morse by user babou:

Morse code is composed of a prefix ternary code expressed in the alphabet {dot, dash, sep}, with these three symbols themselves encoded in binary with the following codewords:

dot → 10, dash → 1110, and sep → 00

In a binary code, there are only two symbols: 0 and 1. In electronic or electromagnetic terms, this means that the presence of electric current in a given unit of time represents one value, and the lack of current represents the other. Both American Morse and International Morse combine these time blocks of current or absence of current to encode messages, first from their native language into an intermediate “language” with just a few codes, and then into this binary representation.
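The two levels can be sketched in a few lines of Python. This is an illustration only: the `MORSE` table is a small subset of International Morse, the binary codewords are the ones from the quoted post, and the function names are made up for this example. Word spacing and American Morse's additional timing elements are not handled.

```python
# A small subset of International Morse, written as dots and dashes.
MORSE = {"A": ".-", "E": ".", "N": "-.", "O": "---", "S": "...", "T": "-"}

# Binary codewords from the quoted post:
# 1 = current on, 0 = current off, one digit per unit of time.
BINARY = {".": "10", "-": "1110", "sep": "00"}

def to_ternary(word):
    """Level 1: letters -> a sequence of dot, dash, and sep symbols."""
    symbols = []
    for letter in word.upper():
        symbols.extend(MORSE[letter])  # the dots and dashes of one letter
        symbols.append("sep")          # a separator closes each letter
    return symbols

def to_binary(symbols):
    """Level 2: dot/dash/sep symbols -> a string of 0s and 1s."""
    return "".join(BINARY[s] for s in symbols)

ternary = to_ternary("sos")
signal = to_binary(ternary)

print(ternary)
print(signal)
```

Each dot and dash ends in a 0, so the elements within a letter are separated by one unit of silence, and the extra 00 of the separator stretches the gap between letters to three units.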