Common English Words Counted & Ranked

Which words in the English language are used most?
Or how often is a particular word is used?

Wordcount displays the most commonly used English words.

God comes in at 76. Africa is ranked at 1317. Rugby is 2843!!!

Mac is 7172nd. Hmmm. If you add apple that . . . . .


Read the VOA transcript . . .

Avi Arditti with Rosanne Skirble, and this week on Wordmaster: counting words.

RS: If you wanted to show people the 88,000 most common words in English, how would you do it? Jonathan Harris thought of a sentence -- or something that looks like one. He works on interactive art projects. He laid out the words in a straight line, from the most frequently used to the least frequently used.

AA: This is all on a Web site, so you keep clicking to the right to read the words on the screen. Or you can look up specific words to see their ranking. There's also a visual trick that displays the words as a graph. The most common are in really big type; the least common are in really small type.

RS: Jonathan Harris is an artist in the field of "information visualization." What he created is wordcount-dot-org.

JONATHAN HARRIS: "The experience I was trying to create for the user was like an archeologist sort of sifting through sand. And you never really get a look at the whole language at any one time. You really have to zero in one specific part and explore there. And in this sense you can really spend hours just killing time on this and playing around."

RS: "You say it's like one very long sentence, but is there anything connecting these words?"

JONATHAN HARRIS: "That's what's really interesting, and this is the one aspect of WordCount that people have really gravitated toward, as I've found. Because the data is essentially random -- I mean, it's not random, but the fact that a given word is next to another word is only based on how often those words appear in normal English usage. But when you have 88,000 words placed back to back, chances are pretty good that a few of those sequences are going to form some pretty conspiratorial meanings.

"Every morning I sort of come into work and I check my e-mail and I have a pile of e-mails waiting for me from people all around the globe that have found interesting sequences in WordCount. Some of my favorites are words 992 to 995 are 'American ensure oil opportunity.' Then 4304 to 4307 is 'Microsoft acquire salary tremendous.'"

AA: "I like this one, 5283 to 5285, which is 'angel seeks supper.'"

JONATHAN HARRIS: "Exactly. I found that a lot of people suggest that this be used as a good device for people trying to come up with a name for their band."

RS: "How is it determined, the frequency of any given word?"

JONATHAN HARRIS: "The frequency is data that is not generated by me. The frequency data was all coming from this source data that I used, which is the British National Corpus and that's a collection of written and spoken English words that were collected over a few years, I think back in the mid-1990s, by this group in England. It's a little bit dated; I've found one word that people are often surprised does not appear at all in the archive is blog. So clearly the phenomenon of Web logging came up after this data was collected."

AA: "So now you describe this basically as an 88,000-word-long sentence, starting with the word 'the,' the most frequently used word in the English language. What's at the other end?"

JONATHAN HARRIS: "The other end is surprising, and this is a big point of contention for a lot of people that actually find what the last word is. But the last word, surprisingly or not, is conquistador. And if you look through the list and you spend some time with it, you'll find that there are many words much, much further in front of conquistador that you've never even heard of. So clearly there seems to be some errata in their data."

Go to Wordcount

Comments

jane said…
His interactive website is lots of fun! Give it a try (www.wordcount.org). These must be written words, not spoken ones, because the word "uh" is ranked at 16,975.