Book Lies: Readability is Impossible to Measure

Gabe Habash -- July 20th, 2011

One of Amazon’s best and little-known book features is its “Text Stats” page, a tiny link that’s tucked three-quarters down a book’s page under the “Inside This Book” heading. Clicking the link takes you to a page with graphs and numbers, the most interesting (and objective) of which is word count. It’s always fun to compare War and Peace’s word count (590,000) to major textbooks, and to see that Tolstoy smashes most of them with his stern Russian will.

But there are other figures on the page, and these are meant to tell you, as close to objectively as possible, how readable and how complex the book is. We put these measurements to the test to see how accurate they are. The six books we sampled are Finnegans Wake by James Joyce, Where I’m Calling From by Raymond Carver, The Great Gatsby by F. Scott Fitzgerald, The Memory Keeper’s Daughter by Kim Edwards, The Tipping Point by Malcolm Gladwell, and Moby-Dick by Herman Melville.

For “Readability,” Amazon lists the Fog Index and two indices that grew out of Rudolf Flesch’s work, one of which (Flesch-Kincaid) was developed under commission by the Navy. Keeping it brief, the Fog Index estimates the number of years of formal education required to understand the book. The Flesch-Kincaid Index similarly measures the U.S. grade level likely needed to understand the book, meaning Fog and Flesch-Kincaid numbers should be, in a perfect world, quite similar. The regular Flesch Index is based on a 100-point scale, with 100 being the easiest to read (for frame of reference, a college degree is considered necessary to read a book with a score of 0 to 30, while a 5th grader should be able to understand a book with a score of 90 to 100).

Got all that? Here’s what our sample books scored (Fog, Flesch-Kincaid, Flesch):

Finnegans Wake: 11.8, 9.3, 57.8

Where I’m Calling From: 5.7, 3.9, 84.2

The Great Gatsby: 14.4, 11.8, 48.8

The Memory Keeper’s Daughter: 7.5, 5.6, 76.7

The Tipping Point: 12.6, 10.1, 55.7

Moby-Dick: 13.0, 10.8, 57.9

According to the numbers, Gatsby requires the most education and is the most difficult book to read from the group, and Where I’m Calling From requires the least education and is the least difficult. Other interesting finds: Gladwell’s bestseller is rated more difficult than Finnegans Wake by all three indices, and more difficult than Moby-Dick by the Flesch Index, even though those are two books notorious for being picked up but never finished, the former often cited as unreadable. So, there are some flaws in the numbers. But why?
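The comparison can be checked by sorting the six books by each index. What follows is a minimal sketch in Python that uses only the scores listed above; the helper name `ranked` is made up for illustration and is not anything Amazon provides.

```python
# Scores as listed in the article: (Fog, Flesch-Kincaid, Flesch)
scores = {
    "Finnegans Wake": (11.8, 9.3, 57.8),
    "Where I'm Calling From": (5.7, 3.9, 84.2),
    "The Great Gatsby": (14.4, 11.8, 48.8),
    "The Memory Keeper's Daughter": (7.5, 5.6, 76.7),
    "The Tipping Point": (12.6, 10.1, 55.7),
    "Moby-Dick": (13.0, 10.8, 57.9),
}


def ranked(index, hardest_is_high=True):
    """Return titles ordered from hardest to easiest for one index.

    Fog and Flesch-Kincaid: a higher number means a harder book.
    Flesch: a lower number means a harder book, so pass hardest_is_high=False.
    """
    return sorted(scores, key=lambda title: scores[title][index],
                  reverse=hardest_is_high)


by_fog = ranked(0)
by_flesch_kincaid = ranked(1)
by_flesch = ranked(2, hardest_is_high=False)

for label, order in [("Fog", by_fog),
                     ("Flesch-Kincaid", by_flesch_kincaid),
                     ("Flesch", by_flesch)]:
    print(label + ": " + " > ".join(order))
```

Fog and Flesch-Kincaid put the books in the same order, with Gatsby first and Moby-Dick second. The Flesch Index is the one that places The Tipping Point above both Moby-Dick and Finnegans Wake.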

Looking at the formula each index uses, we can see they all include words per sentence, as well as accounting for “complex words,” or longer-syllabic words. Conveniently, Amazon’s “Complexity” section on the Text Stats page breaks down those numbers as well. The book with the most words per sentence? Moby-Dick, followed by Gatsby, with Where I’m Calling From last. In fact, except for the switch at the top between Melville and Fitzgerald, the order of books from “hardest readability” to “easiest readability” exactly corresponds to the number of words per sentence.

The conclusion: the main “objective” way we try to measure a book’s difficulty is ultimately determined by how long its sentences are.
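To see why sentence length carries so much weight, here are the three formulas as they are commonly published, sketched in Python. Whether Amazon computes its figures in exactly this way is an assumption, and the sample inputs (1.5 syllables per word, 10 percent complex words) are invented for illustration rather than taken from any of the six books.

```python
def flesch_reading_ease(words_per_sentence, syllables_per_word):
    """Flesch Index: higher means easier to read."""
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


def flesch_kincaid_grade(words_per_sentence, syllables_per_word):
    """Flesch-Kincaid: approximate U.S. grade level."""
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59


def gunning_fog(words_per_sentence, complex_word_fraction):
    """Fog Index: approximate years of formal education.

    complex_word_fraction is the share of words with three or more
    syllables, given as a fraction between 0 and 1.
    """
    return 0.4 * (words_per_sentence + 100.0 * complex_word_fraction)


# Hypothetical text: same vocabulary, only the sentence length changes.
SYLLABLES_PER_WORD = 1.5      # assumed value, not from the article
COMPLEX_WORD_FRACTION = 0.10  # assumed value, not from the article

for words_per_sentence in (10, 20):
    print(
        words_per_sentence,
        round(gunning_fog(words_per_sentence, COMPLEX_WORD_FRACTION), 1),
        round(flesch_kincaid_grade(words_per_sentence, SYLLABLES_PER_WORD), 1),
        round(flesch_reading_ease(words_per_sentence, SYLLABLES_PER_WORD), 1),
    )
```

With the vocabulary held fixed, doubling the sentence length from 10 to 20 words raises the Fog score by four years of education and the Flesch-Kincaid score by roughly four grade levels, and lowers the Flesch score by about ten points. None of the formulas looks at what the words mean or how the sentence is built.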

So what does this all mean? It means that we can get a general idea of how difficult a book is through these numbers, but not really enough to call it reliably accurate. The Great Gatsby is read by high school freshmen and sophomores around the country, but it has an 11.8 Flesch-Kincaid score, meaning high school seniors should be the youngest to read it and understand it. Finnegans Wake, a book famous for its own language, is only moderately difficult to read according to the figures, mainly because it has a 1.6 syllables-per-word average (even though many of those words are made up). And Carver? According to the numbers, his stories could be read by fourth graders or very precocious third graders (3.9 Flesch-Kincaid). Which is probably true: they could read it. They could read “Collectors” and see that it’s about a traveling vacuum salesman trying to sell a vacuum cleaner to a man. They could read it, but no fourth grader could ever really read it.

But, for all the shortcomings here, there are some good numbers. The Memory Keeper’s Daughter could probably be read by most sixth graders, and the same with The Tipping Point and tenth graders.

Ultimately, however, there’s no substitute for picking up a book and seeing for yourself. Most people have the common sense to put down a book that opens with the word riverrun.

4 thoughts on “Book Lies: Readability is Impossible to Measure”

  1. Larry Kunz

    I think it’s telling that all three readability metrics have been around since at least the 1970s: I was introduced to them when I was a very young technical writer. These are imperfect metrics, but if nobody has invented a better one in 40-plus years, they might be the best we can do.

    Sentence length is a faulty criterion. Sentence STRUCTURE matters much more. It’s much harder to read a short sentence that turns the basic subject-verb-predicate structure inside-out than a long, straightforward sentence. In spite of this, automated style checkers place a premium on sentence length — to their detriment.

  2. jesusangelgarcia

    From what I’ve discovered recently, “readability”/comprehension has less to do w/ length of sentences than a reader’s life experience combined w/ broadmindedness (limited preconceptions) and breadth of exposure to literary styles and structures. Thorny sentences are one thing. Thorny content is another. Some books are written for adults. Some adults are children.

  3. Pingback: Book links round-up: phone-hacking lit, saving Toronto libraries, and more | Quillblog | Quill & Quire

  4. Rudy

    Readability indexes are best used with nonfiction prose–for technical and scholarly work written for nonspecialists. They can be easily gamed (though the result IS often more readable, which isn’t the same thing as understandable).
