Remove Duplicate Lines

The Duplicate Line Remover strips repeated lines from any text in seconds. Choose to keep only unique lines, sort them alphabetically, or extract only the duplicates for inspection.

Built by Bob · Article by Lace · QA by Ben

How to use

  1. Paste your text with duplicate lines.

  2. Choose a deduplication mode.

  3. Copy the cleaned result.


What does the Remove Duplicate Lines tool do?

Paste a list — one item per line — and the tool returns the same list with every repeated line removed. The first occurrence stays. Everything that comes after a line you've already seen disappears.

Five lines in, three lines out, if two of yours were duplicates. The output is in the same order as your input. No sorting, no reformatting, no surprise rearrangement — just the duplicates gone.

This is the kind of thing that's a one-liner in a programming language (sort -u in a Unix shell, set() in Python) but a real chore in Excel or Notepad. The tool exists so you don't have to leave the browser when you have a list of 400 email addresses and you suspect a few are repeated.
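The one-liners mentioned above differ from this tool in one important way: `sort -u` and `set()` discard the original order, while this tool preserves it. A minimal Python sketch of an order-preserving equivalent:

```python
lines = ["alice", "bob", "alice", "carol", "bob"]

# set() dedupes but scrambles order, and sorted(set(...)) forces alphabetical order.
# dict.fromkeys() keeps the first occurrence of each line in its original position
# (dicts preserve insertion order in Python 3.7+).
unique = list(dict.fromkeys(lines))
print(unique)  # ['alice', 'bob', 'carol']
```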

A worked example

Suppose you have this list of five lines, copied from a contact sheet:

[email protected]
[email protected]
[email protected]
[email protected]
[email protected]

Paste it in and click Remove Duplicates. The output is:

[email protected]
[email protected]
[email protected]

Three unique lines. The order matches the order they first appeared in your input. alice is first because she was first in the original list. bob is second. carol is third. The duplicate alice and the duplicate bob were both dropped.

That's it. No configuration, no panel to read, no decisions about which copy to keep. If you need the second or third occurrence of a duplicate kept instead of the first, you'd flip the list upside down before pasting and flip it back after — but in 95% of real cleanup work, "keep the first one" is what you want.
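The "flip the list upside down" trick from the paragraph above can be sketched in a few lines of Python (the sample data is illustrative):

```python
lines = ["x", "y", "x"]

# Keep-first (what the tool does): the first occurrence stays in its original slot.
keep_first = list(dict.fromkeys(lines))                 # ['x', 'y']

# Keep-last: reverse, dedupe (the original last copy is now seen first),
# then reverse back to restore reading order.
keep_last = list(dict.fromkeys(reversed(lines)))[::-1]  # ['y', 'x']
```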

When you'll use it

The tool earns its keep on lists that came from somewhere else and arrived dirty:

  • Email lists — You exported subscribers from two campaigns and want one clean list before importing them into a third. The Remove Duplicate Lines tool turns 1,200 entries into 940 unique ones without firing up Excel.
  • CSV cleanup — One column copied out of a spreadsheet, deduped, pasted back. Often faster than the spreadsheet's own deduplication menu, especially if you only care about one column.
  • Log analysis — Grep produced 2,000 lines of warnings, but you suspect there are really only 30 unique messages. Paste, dedupe, and now you can read them all.
  • URL lists — A web crawl produced thousands of links and you want the unique set before passing them to the next step.
  • Word lists for puzzles or word games — Building a custom dictionary and want to make sure no word appears twice.
  • Survey responses — One question, one row per response, and you want to see all the distinct answers without counting how often each appeared.
  • Building a checklist from messy notes — You jotted "send invoice" three times during a long meeting. The deduped version is your actual to-do list.

For all of these, the alternative is some combination of Excel, a text editor with a regex panel, or a quick Python script. None of those are hard — but they each take longer than pasting into a tab that's already open.

How the dedup works

The tool splits your input on newline characters into an array of lines. For each line, it trims leading and trailing whitespace, then asks: have I seen this exact string before? If yes, skip it. If no, add it to the output. Empty lines (lines that are blank after trimming) are also dropped.
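That description maps directly onto a few lines of code. A sketch in Python of the behavior described (split on newlines, trim, drop blanks, keep the first occurrence of each exact string):

```python
def remove_duplicate_lines(text: str) -> str:
    seen = set()
    out = []
    for raw in text.split("\n"):
        line = raw.strip()   # trim leading and trailing whitespace
        if not line:         # drop lines that are blank after trimming
            continue
        if line in seen:     # exact string match: skip anything already seen
            continue
        seen.add(line)
        out.append(line)
    return "\n".join(out)

print(remove_duplicate_lines("a\n a \n\nb\na"))  # "a\nb"
```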

Behind that simple description is a small choice that matters: how do we compare lines? There are four reasonable answers, each with its own trade-off.

  • Simple dedup (this tool) — exact string match, whitespace trimmed; keeps the first occurrence of each unique line. Best for general-purpose list cleanup.
  • Case-insensitive dedup — compares lines as lowercase, so Alice and alice count as duplicates. Best for email lists and names, where casing varies.
  • Preserve last — exact match, but keeps the most recent occurrence of each unique line. Best for versioned logs, where the newer line wins.
  • Sort then unique — sorts first, then drops adjacent duplicates, leaving one copy of each in alphabetical order. Best when order doesn't matter and sorted output is nicer.

The default behavior here — preserve first, exact-match — is the right choice for most situations because it doesn't change the order of items you care about. If your list was already in priority order, deduping won't accidentally promote a duplicate to the top of the list. If you specifically want the sorted version, run the output through the Sort Lines tool afterward.

Case sensitivity, and why it matters

By default, the comparison is exact. "Alice" and "alice" are different lines and both are kept. That's the safe default — most of the time, when two lines differ only by case, the difference is intentional.

The exception is email addresses. The local part of an email address (before the @) is technically case-sensitive per the RFC, but in practice every major email provider treats addresses case-insensitively. [email protected] and [email protected] arrive in the same inbox. If you're deduping an email list and want to treat case-different addresses as the same person, you have two options: either lowercase the whole list before pasting (the Case Converter tool does this in one click), or accept that you'll have some duplicates the tool didn't catch.

The same logic applies to URLs, where example.com and Example.com resolve to the same site, and to usernames on services that are case-insensitive at sign-in. Normalize first, then dedupe.
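"Normalize first, then dedupe" amounts to a one-line preprocessing step. A sketch in Python, using made-up example.com addresses for illustration:

```python
emails = ["Alice@Example.com", "alice@example.com", "bob@example.com"]

# Lowercase first so case-different copies collapse into one string,
# then dedupe while preserving first-seen order.
normalized = [e.lower() for e in emails]
unique = list(dict.fromkeys(normalized))
print(unique)  # ['alice@example.com', 'bob@example.com']
```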

Trailing whitespace is the silent killer. Two lines that look identical can fail a dedup check because one ends with an invisible space or a tab character. The Remove Duplicate Lines tool trims leading and trailing whitespace before comparing, which catches the most common version of this. But if your lines have internal whitespace differences — "hello world" vs "hello world" (two spaces between) — those will be treated as different. Run a regex find-and-replace to collapse multiple spaces if you need an even stricter dedupe.
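The internal-whitespace case can be handled with exactly the kind of regex find-and-replace described above, run before the dedupe. A sketch:

```python
import re

lines = ["hello  world", "hello world"]  # first line has two spaces inside

# Collapse any run of whitespace to a single space so the two lines compare equal,
# then trim and dedupe in first-seen order.
collapsed = [re.sub(r"\s+", " ", ln).strip() for ln in lines]
unique = list(dict.fromkeys(collapsed))
print(unique)  # ['hello world']
```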

Order preservation: why it matters

Some dedup tools sort the output. Some don't tell you which order you'll get. This one preserves the original input order — the first occurrence stays in its original position, and everything after the duplicate just gets skipped.

That matters in surprising places. If your list was a prioritized to-do, alphabetical sorting destroys the prioritization. If your list was a chronological log, sorting destroys the timeline. If your list was a contact import where the first occurrence has the most-recent metadata, sorting separates the right record from the rest.

The deeper reason to preserve order is that you can always sort the output if you want sorted output. If the tool sorts for you, you've lost information you can't get back without rebuilding the original.

Limits and what the tool won't do

A few things to know before you trust this for production data work:

  • It doesn't normalize content. "[email protected]" and "[email protected]" are different lines to the tool. Run them through a case converter first if that matters.
  • It doesn't handle CSV semantics. If your input has lines like "Smith, John", the tool treats each whole line as one string. It doesn't know about columns. For column-aware deduplication, a spreadsheet is still the right tool.
  • It doesn't fuzzy-match. "[email protected]" and "[email protected]" (note the typo) are different lines. Detecting near-duplicates requires a fuzzy-matching algorithm that this tool deliberately doesn't include.
  • It strips empty lines. If you had blank lines as separators in your input, they're gone in the output. This is usually what you want, but worth knowing.
  • It runs in your browser. Nothing is uploaded. Big lists (tens of thousands of lines) work fine; truly massive lists (millions of lines) might slow your browser depending on memory.

A second worked example: cleaning up a log

Suppose you grep'd an application log for warnings and got 200 lines like this:

WARNING: Connection timeout on socket 4
WARNING: Connection timeout on socket 4
WARNING: Cache miss for key user_42
WARNING: Connection timeout on socket 4
WARNING: Cache miss for key user_42
WARNING: Disk space below 10%
WARNING: Connection timeout on socket 4
...

You don't need to know the warning fired 84 times. You need to know it fired. Paste the log, click Remove Duplicates, and you're left with three unique warnings. That's a 200-line problem turned into a 3-line problem in two clicks.

If you also wanted a count of how often each line appeared (the frequency, not just the unique set), this tool doesn't do that — but it's still the right first step. Dedupe first, then go back and count whichever ones look interesting.
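If you do need frequencies, a few lines of Python cover that step; `collections.Counter` is the standard way to tally repeated lines:

```python
from collections import Counter

log = [
    "WARNING: Connection timeout on socket 4",
    "WARNING: Cache miss for key user_42",
    "WARNING: Connection timeout on socket 4",
]

# Count how many times each distinct line appears, most frequent first.
# Roughly equivalent to `sort | uniq -c | sort -rn` in a shell.
counts = Counter(log)
for line, n in counts.most_common():
    print(f"{n:4d}  {line}")
```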

Related text tools

Remove Duplicate Lines is part of a small set of list-cleanup tools that compose well with each other:

  • Sort Lines — Run after deduping to get a clean alphabetical or numerical version of your unique list. Pairing dedup with sort is the most common workflow.
  • Case Converter — Lowercase your list before deduping if case-different copies should be treated as duplicates.
  • Whitespace Remover — Collapse internal whitespace before deduping if "hello world" and "hello world" should be considered the same line.
  • Text Diff Checker — When you want to compare two lists and see what's in one but not the other, the diff tool is what you need. Use it for "what changed between yesterday's list and today's."
  • Word Counter — A quick way to see how many lines and characters you're working with before and after the dedupe.

Frequently asked questions

Does the tool keep the first or last occurrence of a duplicate?

The first occurrence. If [email protected] appears on lines 1, 5, and 12, only the line-1 version stays in the output; the copies on lines 5 and 12 are removed.

Are blank lines preserved?

No. Blank lines (lines that are empty or contain only whitespace) are dropped from the output. If you need them as separators, add them back after deduping in a text editor.

Is the comparison case-sensitive?

Yes. Alice and alice are different lines and both will be kept. If you want case-insensitive deduping, lowercase your input first with the Case Converter.

What about whitespace at the start or end of a line?

Trimmed before comparison. "hello" and " hello " are treated as the same line — the first one (the one without the extra spaces) is kept in the output. Internal whitespace, however, is preserved exactly as written.

Is there a line limit?

No fixed limit. The dedupe runs entirely in your browser, so the practical limit is whatever your browser can hold in memory. Lists in the tens of thousands of lines complete in milliseconds. Hundreds of thousands work too, just a little slower.

Is my data uploaded anywhere?

No. The tool runs in JavaScript in your browser. Your list never leaves your machine, which means it's safe to use for confidential email lists, internal logs, or anything else you'd rather not send to a server.

Can I get a count of how often each line appeared?

Not directly — this tool only outputs the unique lines, not their frequencies. For frequency analysis, paste the result into a spreadsheet with COUNTIF or run a quick sort | uniq -c in a terminal. The deduped output is still useful as the first step for that workflow.