If you are an avid reader of blogs, like I am, you may have come across the "estimated reading time" feature, like the one you can find on Medium. Indicating the reading time for a blog post or article is a great way of letting your readers know how long it will take them to consume the content. Many people lead busy (work) lives and have limited time available, and if they see that an article will take only a couple of minutes to read, they are much more likely to dive in and read it to the end.
Speedy Readers
According to a speed-reading test sponsored by Staples, these are the typical speeds at which humans read, and in theory comprehend, at various stages of educational development:
- Third-grade students = 150 words per minute (wpm)
- Eighth-grade students = 250
- Average college student = 450
- Average “high level exec” = 575
- Average college professor = 675
- Speed readers = 1,500
- World speed reading champion = 4,700
- Average adult = 300
This data tells us that the average reading speed varies across stages of educational and professional development. It's important to know your audience when estimating reading speed, but since not all content is equal, calibrating for a range of 250-300 words per minute is a good ballpark. We can leverage this data to estimate the time it would take the average reader to get through your blog post or article.
The Reading Time Formula
The formula for estimating reading time itself is incredibly simple: T = w / s, where T denotes the estimated reading time, w is the total number of words in your article or post, and s is the reading speed (either in words per minute or words per second) for your target audience.
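As a quick sanity check (the numbers here are illustrative), a 1,200-word article read at the 250 wpm ballpark takes 1200 / 250 = 4.8 minutes:

```ruby
# T = w / s: reading time in minutes, from a word count and a words-per-minute speed
def reading_time_minutes(word_count, words_per_minute)
  word_count.to_f / words_per_minute
end

reading_time_minutes(1200, 250)  # => 4.8
```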
Ruby Implementation
How can we put this formula into our Rails blog? Natural Language Processing (NLP) is a subfield of computational linguistics and gives us all the fundamental building blocks for processing and analyzing arbitrary amounts of digital natural language data. We start off with the notion of a Document:
```ruby
class Document
  def initialize(text)
  end
end
```
A Document is constructed from a chunk of raw natural language text; in our case, the contents of your blog post or article. Through NLP methods, we can cut up this text into smaller and smaller pieces such as paragraphs, sentences, tokens, and finally words.
We first start with paragraphs: consecutive runs of text that are separated by two or more line breaks. We can use a regular expression that splits the given text on two or more consecutive newline characters:
```ruby
def paragraphs(text)
  text.split(/[\n\r]{2,}/)
end
```
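As a small illustration (the sample text is made up), two blocks separated by a blank line come back as two paragraphs:

```ruby
def paragraphs(text)
  text.split(/[\n\r]{2,}/)
end

paragraphs("First paragraph.\n\nSecond paragraph.")
# => ["First paragraph.", "Second paragraph."]
```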
Next, we can further split each paragraph into sentences, again using a regular expression, albeit this time a little more complex:
```ruby
def sentences(paragraph)
  paragraph.split(/((?<=[a-z0-9][.?!])|(?<=[a-z0-9][.?!]\"))(\s|\r\n)(?=\"?[A-Z])/)
end
```
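Because the pattern contains capture groups, Ruby's split also returns the captured separators, so whitespace-only fragments need to be filtered out afterwards (the Document class below does exactly that). A small illustration with made-up text:

```ruby
def sentences(paragraph)
  paragraph.split(/((?<=[a-z0-9][.?!])|(?<=[a-z0-9][.?!]\"))(\s|\r\n)(?=\"?[A-Z])/)
end

# Drop the captured separators and empty fragments that split returns
sentences("It works. Try it!").reject { |s| s.strip.empty? }
# => ["It works.", "Try it!"]
```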
Each sentence can be further split into raw tokens, the elements of the sentence that are separated by whitespace:
```ruby
def raw_tokens(sentence)
  sentence.split(/\s+/)
end
```
Each raw token potentially contains punctuation characters. For example, if we were to split the sentence "Two books, one tall, one short." with the method above, we'd obtain the raw tokens ["Two", "books,", "one", "tall,", "one", "short."]. Hence, we need to further separate punctuation from each raw token:
```ruby
def split_with_punctuation(raw_token)
  # Keep possessives intact (tokens are upcased before this is called)
  return raw_token if raw_token.end_with?("'S")
  raw_token.split(/((?<=\p{P})|(?=\p{P}))/).map(&:strip)
end
```
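Chaining the two steps on the example sentence above (and rejecting the empty fragments the capture group produces, just as the Document class does) yields words and punctuation as separate tokens:

```ruby
def raw_tokens(sentence)
  sentence.split(/\s+/)
end

def split_with_punctuation(raw_token)
  return raw_token if raw_token.end_with?("'S")
  raw_token.split(/((?<=\p{P})|(?=\p{P}))/).map(&:strip)
end

raw_tokens("Two books, one tall, one short.")
  .flat_map { |t| split_with_punctuation(t) }
  .reject(&:empty?)
# => ["Two", "books", ",", "one", "tall", ",", "one", "short", "."]
```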
For each resulting token, we can now determine whether that token comprises a word or a punctuation element:
```ruby
def punctuation?(token)
  (token =~ /\p{P}/) != nil
end

def word?(token)
  !punctuation?(token)
end
```
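A quick check of both predicates:

```ruby
def punctuation?(token)
  (token =~ /\p{P}/) != nil
end

def word?(token)
  !punctuation?(token)
end

punctuation?(",")   # => true
word?("books")      # => true
word?("short.")     # => false (still contains punctuation)
```

Note that word? assumes punctuation has already been split off: a raw token like "short." still matches the punctuation pattern, which is why the tokenization step above matters.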
Putting it all together
Let's put all these pieces together in a complete solution:
```ruby
module Language
  def self.paragraphs(text)
    text.split(/[\n\r]{2,}/)
  end

  def self.sentences(text)
    text.split(/((?<=[a-z0-9][.?!])|(?<=[a-z0-9][.?!]\"))(\s|\r\n)(?=\"?[A-Z])/)
  end

  def self.tokenize(text)
    text.split(/\s+/)
  end

  def self.split_with_punctuation(text)
    return text if text.end_with?("'S")
    text.split(/((?<=\p{P})|(?=\p{P}))/).map(&:strip)
  end

  def self.punctuation?(token)
    (token =~ /\p{P}/) != nil
  end

  def self.word?(token)
    !punctuation?(token)
  end

  class Document
    attr_accessor :paragraphs, :sentences, :tokens

    def initialize(text)
      @text = text
      @paragraphs = Language.paragraphs(text)
      @sentences =
        @paragraphs
          .map { |paragraph| Language.sentences(paragraph) }
          .flatten
          .filter { |s| (s =~ /\A\s*\z/).nil? }
      @tokens =
        @sentences
          .map { |sentence| Language.tokenize(sentence) }
          .flatten
          .map(&:upcase)
          .map(&Language.method(:split_with_punctuation))
          .flatten
          .filter { |s| (s =~ /\A\s*\z/).nil? }
    end

    def words
      @tokens.filter(&Language.method(:word?))
    end

    # speed is in words per second; returns the estimated time in seconds
    def reading_time(speed = 3)
      n_words = words.count
      (n_words + speed / 2) / speed
    end
  end
end
```
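One detail worth calling out: with an integer speed, the expression (n_words + speed / 2) / speed is integer division rounded to the nearest whole second, rather than always rounding down. A standalone sketch of just that arithmetic:

```ruby
# Rounds n_words / speed to the nearest integer when speed is an Integer
def rounded_reading_time(n_words, speed)
  (n_words + speed / 2) / speed
end

rounded_reading_time(100, 3)  # => 33 (100 / 3 is about 33.3)
rounded_reading_time(101, 3)  # => 34 (101 / 3 is about 33.7)
```

When a Float speed such as 4.1 is passed instead, the same expression falls back to ordinary floating-point division.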
Now we can construct a Document from any given text and estimate the reading time:
```ruby
class Post < ApplicationRecord
  # other code ...

  # speed is in words per second (4.1 wps is roughly 250 wpm)
  def reading_time(speed = 4.1)
    Language::Document.new(content).reading_time(speed)
  end
end
```
For our KUY.io blog we are using a reading speed of 4.1 words per second (about 250 wpm), and display it on our blog posts like this:

```ruby
distance_of_time_in_words(@post.reading_time(4.1).seconds)
```
What if your blog is written in another language / framework?
There are a number of amazing NLP libraries out there for different languages and frameworks that come with built-in support for text parsing, chunking, and tokenization. Here is a round-up of the most popular libraries for different languages:
- Python: is quickly becoming the standard language for data scientists, and also features one of the best NLP libraries out there: the Natural Language Toolkit (NLTK)
- JavaScript: is another language that is very popular for web applications and blogs. Two NLP libraries stand out: Natural and NLP.js
- Java: is the grandfather language for NLP processing, with one of the most mature libraries, the Stanford NLP library
- Ruby: features a large collection of libraries for NLP tasks
- Elixir: has an implementation of basic NLP tools with the Essence library
How can we help?
Natural Language Processing is an incredibly fun sub-field of Artificial Intelligence and a powerful tool for teaching machines to process natural language text. If you have any questions about KUY.io or are looking for advice on implementing NLP workloads in your next project, please feel free to reach out to us.
👋 Cheers,
Nicolas Bettenburg, CEO and Founder of KUY.io