Word Frequency Analyzer

Word frequency analysis reveals the most common words in a text, helping writers identify overused words, SEO professionals check keyword density, and researchers analyze large documents.

Built by Bob · Article by Lace · QA by Ben


What the Word Frequency Analyzer does

The Word Frequency Analyzer takes a block of text, counts how many times each unique word appears, and shows you the results sorted from most frequent to least. Paste a paragraph and you see a ranked list: the word, the count, and (if you want) the percentage of total words.

Paste in the opening of A Tale of Two Cities — "It was the best of times, it was the worst of times..." — and the analyzer reports that "was" appears 10 times, "of" 10 times, "the" 11 times, "it" 10 times, and so on down the list. You see the distribution. You see which words a writer reaches for, which they avoid, and which ones are doing more work than they probably should.

That's the difference between this tool and a word counter. A word counter gives you a total. This gives you the shape.
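Under the hood, the counting step is tiny. A minimal sketch in Python (an illustration of the idea, not the tool's actual code):

```python
import re
from collections import Counter

def word_frequencies(text):
    # Lowercase, then pull out runs of letters, digits, and apostrophes
    words = re.findall(r"[a-z0-9']+", text.lower())
    # most_common() sorts by count, highest first
    return Counter(words).most_common()

opening = "It was the best of times, it was the worst of times"
for word, count in word_frequencies(opening)[:3]:
    print(f"{word}: {count}")
```

Everything else — percentages, stopword filtering, tie-breaking — is layered on top of that counter.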

When you'll actually use it

Most writers and editors discover this category of tool through one of these jobs:

  • SEO keyword density. Your target keyword should appear often enough that the page reads as "about" it, and rarely enough that it doesn't feel stuffed. The analyzer shows you the actual percentage.
  • Self-editing for style. Every writer overuses words. The analyzer surfaces the ones you didn't notice — really, just, very, that — and lets you cut them deliberately.
  • Content analysis. Comparing two competing articles on the same topic? Paste each into the analyzer and you'll see immediately which one weights the topic more heavily and which one drifts.
  • Plagiarism detection (the rough version). If two documents have nearly identical frequency distributions, that's a signal. Not proof — but a starting point that's faster than reading both.
  • Translation review. Compare the frequency distribution of an English source with its translation. Big discrepancies in key terms often mean the translator paraphrased something they should have kept literal.
  • Speech writing. Politicians, comedians, and teachers all use repetition deliberately. The analyzer tells you whether your repetition is intentional or accidental.
  • Academic writing audits. Some journals flag papers that overuse hedging language. May, might, could, possibly, suggest — count them and decide whether the paper is being cautious or evasive.

The common thread: any time you want to know what your text is actually about — not what you think it's about — frequency is the fastest signal.

A worked example

Take this paragraph:

The quick brown fox jumps over the lazy dog. The dog barks. The fox runs. The fox is quick and the dog is lazy. The fox jumps again.

Paste it into the analyzer with stopwords included, and the ranked output looks like:

Rank   Word    Count   Percentage
1      the     7       25.0%
2      fox     4       14.3%
3      dog     3       10.7%
4      is      2       7.1%
5      jumps   2       7.1%
6      lazy    2       7.1%
7      quick   2       7.1%
8      and     1       3.6%
9      again   1       3.6%
10     barks   1       3.6%

Now flip the stopwords toggle to exclude, and the, is, and and disappear. The list rearranges to:

Rank   Word    Count
1      fox     4
2      dog     3
3      jumps   2
4      lazy    2
5      quick   2

Same text, different question. With stopwords in, the paragraph is "about" articles and pronouns — which is true of every English paragraph and not useful. With stopwords out, it's about the fox and the dog, which is what a human reading the text would say. That's why the toggle matters.
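The toggle itself is just a filter applied before counting. A sketch of the whole pipeline, using a tiny illustrative stopword subset in place of the full ~180-word list:

```python
import re
from collections import Counter

# A handful of entries standing in for the full stopword list
STOPWORDS = {"the", "is", "and", "a", "of", "to", "in", "over"}

def frequencies(text, exclude_stopwords=False):
    words = re.findall(r"[a-z0-9']+", text.lower())
    if exclude_stopwords:
        words = [w for w in words if w not in STOPWORDS]
    counts = Counter(words)
    # Count descending, then alphabetical for ties
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

text = ("The quick brown fox jumps over the lazy dog. The dog barks. "
        "The fox runs. The fox is quick and the dog is lazy. The fox jumps again.")

print(frequencies(text)[:3])                          # leads with 'the'
print(frequencies(text, exclude_stopwords=True)[:3])  # leads with 'fox'
```

Running it on the worked example reproduces both tables above: the stopword-inclusive list starts with the (7), the filtered list with fox (4).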

The stopwords question

"Stopwords" is the linguistics term for high-frequency function words that carry little standalone meaning. In English: the, of, and, a, to, in, is, you, that, it, he, was, for, on, are, with, as, I, his, they, be, at, one, have — about 100 to 200 words depending on the list.

In every English passage longer than a paragraph, these dominate the top of the frequency list. The is the most common word in written English — about 7% of all text. Of is next, around 3.5%. By the time you get to position 30 on a stopword-inclusive list, you've still mostly seen function words.

The analyzer offers a toggle:

  • Include stopwords — every word counts, no filtering. Use this when you're studying register, formality, or any linguistic feature where stopwords matter (legal texts use of three times more than casual prose, for instance).
  • Exclude stopwords — function words are dropped before sorting. Use this when you're looking for topic, theme, or content. This is the default for SEO audits.

There's no universally correct stopword list. The analyzer uses a standard ~180-word English list adapted from the NLTK and Lucene defaults, which covers most practical cases. If a word you care about is on the list and shouldn't be, ignore the toggle for that document and read past the function words manually.

Stopwords aren't useless — they're just usually noise. If you're studying authorial style, the function-word frequencies are gold: forensic linguists use them to attribute disputed texts to specific writers because we each have unconscious "fingerprints" in our that/which ratio, our preference for and versus but, and so on. For ordinary writing audits, turn the stopwords off. For literary analysis or authorship questions, leave them on.
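To make the fingerprint idea concrete: a function-word profile is just the relative frequency of a handful of chosen words. A toy sketch — real forensic stylometry uses dozens of such features and proper statistics, not four words and a division:

```python
import re
from collections import Counter

def function_word_profile(text, words=("that", "which", "and", "but")):
    # Share of the text taken up by each chosen function word
    counts = Counter(re.findall(r"[a-z']+", text.lower()))
    total = sum(counts.values())
    return {w: counts[w] / total for w in words}

sample = ("She argued that the data, which nobody had checked, showed that "
          "the model was wrong but the conclusion held.")
print(function_word_profile(sample))
```

Two authors writing on the same topic will often show measurably different profiles on exactly these words, which is why turning stopwords off would throw the signal away.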

Reading the distribution

A frequency list isn't just a ranking — it's a shape. A few patterns to recognize when you read your output:

  • A long flat tail. Most words appear once, a few appear a lot. This is normal English text and is called Zipf's distribution. The word at rank N appears roughly 1/N as often as the most common word.
  • A spike at one content word. Useful for SEO. If your target keyword is at rank 1 (with stopwords off) and appears around 1-3% of the time, you're in the safe zone for topical relevance without keyword stuffing.
  • A cluster of near-synonyms at the top. Important, crucial, vital, essential, key, critical all showing up multiple times means the writer ran out of ways to say "important." This is a style problem the analyzer catches faster than re-reading.
  • Filler words dominating with stopwords off. Just, really, very, actually, basically in the top 10 is a sign your draft has tics. Cut them.
  • Brand or product name missing from the top 20. For marketing copy, this is a problem — if you're writing about your product and its name isn't frequent, the reader won't anchor on it.

The distribution doesn't tell you whether the writing is good. It tells you what the writing is doing. Whether that's what you wanted is your call.
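The Zipf pattern in the first bullet is easy to check by hand: divide each word's count by the top word's count and compare with the 1/rank prediction. A sketch:

```python
import re
from collections import Counter

def zipf_check(text, top=5):
    # For each rank: (rank, observed count / top count, predicted 1/rank)
    counts = [c for _, c in
              Counter(re.findall(r"[a-z0-9']+", text.lower())).most_common(top)]
    return [(rank, count / counts[0], round(1 / rank, 2))
            for rank, count in enumerate(counts, start=1)]

# Synthetic text with counts 8, 4, 2, 1 — close to, but not exactly, Zipfian
sample = "a " * 8 + "b " * 4 + "c " * 2 + "d"
print(zipf_check(sample, top=4))
```

In a text that follows Zipf's law closely, the second and third numbers in each tuple track each other; big divergences at the top of the list are exactly the spikes and clusters described above.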

SEO keyword density (with realistic numbers)

The old SEO advice — "target 2% keyword density" — is wrong but useful as a sanity check. Modern search engines use vector embeddings and topic models, not raw keyword counts. But raw frequency is still a fast way to spot two failure modes:

  • Under-density. If your target keyword appears once in a 1,500-word article, the page doesn't read as topical. Google's classifier may decide the page is about something else.
  • Over-density. If your target keyword appears 80 times in 1,500 words (5%+ density), the page reads as spam to both humans and algorithms.

The healthy zone for most content is 0.5% to 2.5% density for the primary keyword, with secondary keywords spread naturally through the text. The analyzer shows you that percentage directly. If you're at 0.2% and the topic is one word, you may want to rework. If you're at 4%, definitely rework.

For long-tail phrases (three or more words), this tool won't help directly — it counts individual words, not multi-word strings. For exact-phrase counting, use the Word Frequency Counter on each phrase separately.
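Keyword density is nothing more than count divided by total. A sketch of the check described above (the 0.5%–2.5% band is this article's guideline, not an industry standard):

```python
import re

def keyword_density(text, keyword):
    # Percentage of all words that exactly match the keyword
    words = re.findall(r"[a-z0-9']+", text.lower())
    return 100 * words.count(keyword.lower()) / len(words) if words else 0.0

# 15 mentions in a 1,500-word article = 1% density, inside the healthy band
article = "sustainable " * 15 + "filler " * 1485
print(round(keyword_density(article, "sustainable"), 2))  # 1.0
```

Note that this counts single words only, matching the tool's behavior; a multi-word phrase would need substring or n-gram counting instead.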

How it counts

The analyzer runs entirely in your browser. Nothing is uploaded.

The rules it uses:

  • Words are sequences of letters, numbers, and apostrophes separated by whitespace or punctuation. It's counts as one word. Twenty-one counts as one word — the hyphen doesn't split. U.S.A. counts as one word.
  • Case is normalized. The and the and THE all merge into a single entry. There's an optional case-sensitive mode if you need to distinguish proper nouns from sentence-initial capitals.
  • Punctuation is stripped before counting. fox. and fox, and fox! all merge into fox.
  • Numbers count as words. 1999 and 2026 get their own entries. That's useful for date-heavy text; just disregard them when you're auditing prose style.
  • Sorting is by count descending, then alphabetically for ties.

For most writing this matches the way Microsoft Word and Google Docs count, give or take a word or two on the same edge cases. For non-Latin alphabetic scripts (Cyrillic, Greek, Arabic, Hebrew), word boundaries are still whitespace-based and work correctly. For Chinese, Japanese, and Thai — which don't use spaces between words — the analyzer treats each whitespace-separated chunk as one word, which is rarely what you want for those languages.
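The bullet rules above translate to a compact tokenizer along these lines — a sketch of the stated behavior, not the tool's actual source:

```python
import re

# Runs of letters/digits, with internal apostrophes, hyphens, or periods kept
# together — so "it's", "twenty-one", and "u.s.a" each come out as one token,
# while a trailing period ("u.s.a.") is stripped like any other punctuation.
TOKEN = re.compile(r"[a-z0-9]+(?:['.\-][a-z0-9]+)*", re.IGNORECASE)

def tokenize(text, case_sensitive=False):
    # Rule 2: case is normalized unless the case-sensitive mode is on
    if not case_sensitive:
        text = text.lower()
    return TOKEN.findall(text)

print(tokenize("It's twenty-one. U.S.A. The fox, the Fox!"))
```

On `"It's twenty-one. U.S.A."` this yields it's, twenty-one, and u.s.a as single tokens, matching the joining rules above.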

Analyzer vs Counter — pick the right one

This tool has a sibling: the Word Frequency Counter. They sound similar, but they do different jobs.

Question                                               Use the Analyzer          Use the Counter
"How often does 'sustainable' appear in my essay?"     Yes (look in the list)    Yes (faster — one input)
"What are the top 20 words in this text?"              Yes                       No
"Is my keyword density above 2%?"                      Yes (shows percent)       Yes (divide by total)
"What words am I overusing?"                           Yes                       No
"Count 'because' only — I know what I'm looking for"   Overkill                  Yes

Rule of thumb: if you know which word you care about, use the Counter. If you want to discover which words you should care about, use the Analyzer.

Related text tools

  • Word Frequency Counter — count one specific word, fast, when you already know what you're looking for.
  • Word Counter — total words, characters, sentences, reading time. The general-purpose counterpart.
  • Character Counter — character total only, for strict length limits.
  • Readability Checker — Flesch reading ease, grade level, and sentence-length analysis. Pairs well with the frequency analyzer when you're auditing prose style.
  • Sentence Counter — sentence-level structure analysis, useful when frequency tells you to vary your vocabulary and you also need to vary your rhythm.
  • Sort Lines — for sorting the exported frequency list however you need it.
  • Remove Duplicate Lines — when you're cleaning up a glossary or term list built from a frequency export.

Frequently asked questions

Is my text uploaded or stored anywhere?

No. The Word Frequency Analyzer runs entirely in your browser using JavaScript. Your text never reaches any Microapp server. The frequency table is built in memory, and closing the tab removes it completely.

What counts as a "word"?

A word is any sequence of letters, numbers, and apostrophes separated by whitespace or punctuation. It's is one word. Long-term is one word (the hyphen is treated as joining, not splitting). U.S.A. is one word. Email addresses and URLs typically count as one word each because they contain no whitespace.

Are "Dog" and "dog" counted together?

By default, yes. The analyzer is case-insensitive — Dog, dog, and DOG all merge into one entry shown in lowercase. Toggle case-sensitive mode if you want proper nouns kept separate from sentence-initial capitals.

What's the stopword list?

The analyzer uses a standard English stopword list of roughly 180 high-frequency function words: the, of, and, a, to, in, is, you, that, it, he, was, for, on, are, with, as, I, his, they, be, at, one, have, this, from, or, had, by, not, but, what, all, were, we, when — and similar. It's based on the NLTK and Lucene default lists. Toggle stopwords on or off depending on whether you're studying topic (off) or style (on).

How long can the input text be?

Practically, there's no fixed cap. The analyzer has been tested with documents over a million words and stays responsive on a modern laptop. Very long inputs (book-length, 100,000+ words) may take a second or two to process; everything shorter is instant.

Can I export the frequency list?

The output is plain HTML — you can select all, copy, and paste into a spreadsheet, a document, or any text editor. The columns separate cleanly. For more structured export (CSV, JSON), copy the table and use a converter, or run the analysis again in a script for production work.

Does it work for languages other than English?

Yes for any language that uses whitespace between words — Spanish, French, German, Russian, Greek, Arabic, Hebrew, Portuguese, and others. The stopword filter is English-only; for other languages, toggle stopwords off and read the list manually. For Chinese, Japanese, and Thai — which don't use spaces — the analyzer can't segment words correctly, and a different category of tool is needed.

What does "keyword density" actually mean for SEO?

Keyword density is the percentage of total words that match your target keyword. A 1,500-word article that mentions sustainable 15 times has 1% density. Modern search engines don't use density as a primary ranking signal anymore — they use topic models and embeddings — but density is still a useful proxy for "is this page actually about the topic." The healthy range is roughly 0.5% to 2.5% for a primary keyword; below that, the topic isn't anchored, and above that, the page reads as overoptimized.