In this episode, I review the recent history and current status of science fiction on television during the streaming era. TV recommendation: The Orville
Last time, I talked about the protein tryptophan synthase A and how its “full chemical name” may be the longest “English” word ever professionally published in a book. (And I corrected the spelling of it.) But that “word” was just constructed by applying chemical naming rules to the amino acid sequence of the protein. And there are much larger proteins out there, with correspondingly longer amino acid sequences. So, to construct a longer word, you just need to use a bigger protein.
This much is not a surprise. The most extreme claim for the “longest word,” one that has been circulating around the internet for years, is the “full chemical name” of the aptly named titin, the largest known protein, which is a component of muscle tissues that gives them their passive elasticity. It is normally quoted as having 34,350 amino acids, more than 100 times as many as tryptophan synthase A, and its name is a staggering 189,819 letters long. MrBeast even made a video a few years ago where he spent two hours reading the whole thing out loud…although he obviously didn’t have a clue how to pronounce it.
I didn’t pronounce it, but I did write it out, and I even managed to fit it in a single YouTube short.
At 72 letters per line, one line per frame, and 60 frames per second, it comes in at just under a minute…Except you may notice that my version is considerably longer than the one everybody quotes around the internet.
I started this video by downloading the official 34,350 amino acid sequence of titin from the UniProt database and writing a Python script to write the full name—like I did for tryptophan synthase A—just to make sure the letter count was correct. But I didn’t get 189,819 letters. In fact, it wasn’t even close. I got 241,577 letters. This is also not the number in the video, but bear with me.
(And if you think about it, it makes sense. 189,819 letters for 34,350 amino acids is only 5.53 letters per amino acid, which, if you’re familiar with amino acid names, seems improbably short.)
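As a rough illustration of how a name like this gets built, here is a minimal sketch. It is not the script posted on GitHub. It assumes the common convention of giving every residue except the last its "-yl" form and spelling out the final residue in full. The `RESIDUES` table, `chemical_name`, and `letter_count` are names made up for this example, and spelling choices (for instance "tryptophyl" versus "tryptophanyl") will change the totals, so it may not reproduce the exact counts quoted here.

```python
# One-letter amino acid code -> (form used inside the chain, full name for the last residue).
# Spellings are an assumption; other conventions give different letter counts.
RESIDUES = {
    "A": ("alanyl", "alanine"),
    "R": ("arginyl", "arginine"),
    "N": ("asparaginyl", "asparagine"),
    "D": ("aspartyl", "aspartic acid"),
    "C": ("cysteinyl", "cysteine"),
    "E": ("glutamyl", "glutamic acid"),
    "Q": ("glutaminyl", "glutamine"),
    "G": ("glycyl", "glycine"),
    "H": ("histidyl", "histidine"),
    "I": ("isoleucyl", "isoleucine"),
    "L": ("leucyl", "leucine"),
    "K": ("lysyl", "lysine"),
    "M": ("methionyl", "methionine"),
    "F": ("phenylalanyl", "phenylalanine"),
    "P": ("prolyl", "proline"),
    "S": ("seryl", "serine"),
    "T": ("threonyl", "threonine"),
    "W": ("tryptophyl", "tryptophan"),
    "Y": ("tyrosyl", "tyrosine"),
    "V": ("valyl", "valine"),
}


def chemical_name(sequence: str) -> str:
    """Concatenate residue names for a sequence of one-letter codes."""
    if not sequence:
        raise ValueError("sequence must contain at least one residue")
    *chain, last = sequence.upper()
    return "".join(RESIDUES[code][0] for code in chain) + RESIDUES[last][1]


def letter_count(name: str) -> int:
    """Count letters only, ignoring spaces such as the one in 'aspartic acid'."""
    return sum(ch.isalpha() for ch in name)


# A three-residue toy peptide: methionine, threonine, isoleucine.
name = chemical_name("MTI")
print(name, letter_count(name))
```

Feeding in a full sequence downloaded from UniProt would work the same way, just with a much longer input string.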
What happened? Well, proteins can actually have multiple different amino acid sequences, called “isoforms.” This can happen if the gene segments that encode them get read in a different order, and different isoforms may be more useful in different circumstances. Isoform 1 of titin is the “canonical” 34,350 amino acid sequence. But the 189,819-letter name everyone circulates, for some inexplicable reason, is Isoform 3 (also known as the “small cardiac N2-B” form), which only has 26,926 amino acids (7.05 letters per amino acid). Isoform 3 is actually the shortest version of titin out there, except for Isoform 6, which is a much smaller fragment.
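The letters-per-amino-acid check is easy to reproduce. The sketch below uses only the counts quoted in this post; `letters_per_residue` is a helper name invented for the example.

```python
def letters_per_residue(letters: int, residues: int) -> float:
    """Average name length contributed by each amino acid, to two decimals."""
    return round(letters / residues, 2)


# The widely quoted name measured against the canonical Isoform 1 length.
print(letters_per_residue(189_819, 34_350))  # 5.53, implausibly short

# The same name measured against Isoform 3, the sequence it actually spells out.
print(letters_per_residue(189_819, 26_926))  # 7.05

# The name generated from the real Isoform 1 sequence.
print(letters_per_residue(241_577, 34_350))  # 7.03
```

Both self-consistent pairings land at about seven letters per amino acid, which is what gives the mismatch away.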
If we want the actual largest known protein, we need to find the largest isoform of titin. That happens to be Isoform 12, with 35,991 amino acids. When I ran it through my Python script, it returned a word length of 252,176 letters, which is the word I wrote in the above video clip.
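The "just under a minute" figure follows from the layout numbers given earlier (72 letters per line, one line per frame, 60 frames per second). A quick back-of-the-envelope check, with variable names chosen only for this example:

```python
import math

letters = 252_176        # length of the Isoform 12 name
letters_per_line = 72
frames_per_second = 60

# One line per frame, so the number of lines is the number of frames.
frames = math.ceil(letters / letters_per_line)
seconds = frames / frames_per_second

print(frames, round(seconds, 1))  # 3503 58.4
```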
I’m not the first person to figure this out. A commenter named Stephen Thomas put the pieces together in 2016. But to my knowledge, I am the first to write out the full corrected version. I’ve posted the full text of this word along with the amino acid sequence and my Python script on my GitHub page. Now, all we need to do is get MrBeast to read this version…preferably learning a bit of organic chemistry first.
But as I said in the video, this isn’t actually the longest possible word under chemical naming rules (you know, if chemists actually used crazy names like this). Just like you can string together names of amino acids in proteins, you can do the same with DNA sequences. But that’s a project that’s a little bit bigger than a short.
Quick, what’s the longest word in the English language?
If you watched my recent video, you’ll know that in descriptivist terms, that is, how people actually use language in practice, it’s a tie between transinstitutionalization and superincomprehensibleness (both 25 letters). If you allow technical terms, it’s laryngotracheobronchopneumonitis (32 letters). That’s the longest word that’s (normally) used in scholarly literature.
But that’s not the only way to answer the question. Check out this word:
Tryptophan synthase is an enzyme in bacteria that produces the amino acid tryptophan. Humans can’t synthesize tryptophan at all, so we have to get it from our food, but bacteria can. Like all enzymes, tryptophan synthase is made of a string of amino acids stuck together. In the case of tryptophan synthase in E. coli, or more specifically the alpha chain (also known as tryptophan synthase A), it’s a relatively small string of 268 amino acids.
Techno-thrillers are a surprisingly fuzzy category that sits at the border of sci-fi and suspense. In this episode, I give an overview of the subgenre and the different ways it can be classified.
I mentioned The Kaiju Preservation Society in Episode 4 of this season of my podcast as one of the hot new sci-fi books of this year and a rare serious literary treatment of the giant monster trope. This is the latest book by John Scalzi, also known for Old Man’s War and Redshirts. I have since actually read the book, and…I think it was a lot of fun.
The answer is probably not what you think. There are plenty of examples of “longest words” in various dictionaries, but most of them are curiosities that no one uses in real life. I’m looking for the longest words that are productive members of linguistic society, and that search went in some surprising directions.
Radix economy is often billed as a measure of which number base is most “efficient,” but its answer that base-3 is the best suggests there’s probably something else going on. In this video, I explain what radix economy is actually used for and when its use is not really justified.
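For reference, radix economy is usually defined as the base multiplied by the number of digits needed to write a number in that base, and for large numbers it grows roughly like b / ln(b). The sketch below assumes that standard definition; the function names are made up for this example and are not taken from the video.

```python
import math


def digit_count(n: int, base: int) -> int:
    """Number of digits needed to write the positive integer n in the given base."""
    count = 0
    while n > 0:
        n //= base
        count += 1
    return max(count, 1)


def radix_economy(n: int, base: int) -> int:
    """Base times digit count: the usual definition of radix economy."""
    return base * digit_count(n, base)


def asymptotic_cost(base: int) -> float:
    """Large-number approximation, proportional to base / ln(base)."""
    return base / math.log(base)


# Among integer bases, 3 minimizes the asymptotic cost (the continuous minimum is at e).
best_base = min(range(2, 17), key=asymptotic_cost)
print(best_base)

# For one specific number the ranking can differ from the asymptotic one.
print(radix_economy(1000, 2), radix_economy(1000, 3), radix_economy(1000, 10))
```

For 1000, binary actually comes out slightly ahead of ternary (20 versus 21), a small hint that the "base-3 is best" result is an asymptotic statement and not a universal one.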
In a future video, I’ll talk about what makes a base good in natural language.