What the PDF to Excel Converter does
A PDF stores text as positioned fragments — each word knows its (x, y) location on the page, but nothing in the file knows what a "row" or a "column" is. The PDF to Excel Converter looks at those fragments, clusters them into rows by their Y-coordinate, splits each row into columns wherever there's a wide horizontal gap, and writes the result into a real .xlsx workbook. One sheet per PDF page. Open it in Excel, Google Sheets, Numbers, or LibreOffice Calc. No upload, no watermark, no signup.
The whole pipeline runs in your browser. PDF.js (Mozilla's open-source PDF library — the same one Firefox uses internally) reads your PDF and hands us text fragments with coordinates. SheetJS encodes the result into the .xlsx binary format. Both libraries run on your CPU, in your tab. Open DevTools, switch to the Network tab, click Convert. Zero outbound requests. Your bytes do not leave your machine.
One honest caveat right up top: this works well on PDFs that already are tables — financial reports, statements, exports from a spreadsheet app, government data dumps, board-meeting attachments. It works less well on prose PDFs (a column-of-text document collapses to a single column of cell values, which is correct but probably not what you wanted) and not at all on scanned PDFs (image of text, not text — that's an OCR job, see below). Knowing what the tool does and doesn't do up front saves a wasted minute.
When PDF to Excel is the right tool
Tables get trapped in PDFs every day. Quarterly reports, bank statements, tax forms, supplier invoices, government data releases, internal MIS exports, academic papers with results tables — anywhere a spreadsheet originally lived, someone eventually decided to "share the PDF version." The numbers are right there on the page; they're just not in a format you can sum, filter, sort, or pivot.
Real situations where this tool pays off:
- Bank statement reconciliation. Your bank emails monthly statements as PDFs. You want the transactions in a spreadsheet so you can categorize them, sum by merchant, and feed them to your accountant. The transactions are a table on every statement page.
- Vendor invoices. A supplier sends 6 invoices a month as PDFs. Your accounts payable workflow needs them as rows in a workbook so you can total by month and reconcile against POs.
- Research data. A government agency, an academic paper, or an industry report publishes a key data table inside a PDF report. You want it in Excel so you can chart it or join it to your own data.
- Sales reports from your CRM. The CRM has a "Download as PDF" button but no "Download as Excel" — or only on the paid tier. The PDF has the table you'd otherwise be paying $20/seat/month to export differently.
- Class rosters or attendance logs. A teacher or admin gets a roster as a PDF, needs it in a spreadsheet to take attendance, grade, or share with substitutes.
In every case, the underlying data is tabular — fixed rows and columns — and the PDF is just the wrapper someone chose. The converter strips the wrapper and gives you the table back.
How to use the PDF to Excel Converter
One screen. Drop zone at the top, a short callout explaining what works and what doesn't, a convert button below.
- Drop or pick your PDF. Up to 100 MB and 200 pages.
- Read the yellow callout — it tells you what kind of PDF this works on (tables) and what it doesn't (prose, scans, complex layouts).
- Click "Convert to Excel." The tool reads each page, clusters the text fragments into a 2D grid, and emits one sheet per PDF page.
- Download the .xlsx file (named after your source PDF: invoice.pdf → invoice.xlsx). Open it in Excel, Google Sheets, Numbers, or LibreOffice Calc.
- Eyeball the result. Column boundaries are a best guess based on horizontal gaps. If a column split landed wrong, fix it in your spreadsheet app with Text-to-Columns or by editing the cells directly.
That's it. No "sign up to unlock conversion of more than 5 pages." No 10 MB free-tier cap. No watermark stamped through your data.
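The output naming described in the steps above can be sketched in a few lines. This is an illustration only: the function names are hypothetical, and the handling of a file name that lacks a .pdf extension is an assumption, not something the steps specify.

```typescript
// Hypothetical helper: derive the workbook name from the source PDF name.
function outputFileName(pdfName: string): string {
  // Drop a trailing ".pdf" (any letter case), then add ".xlsx".
  // Assumption: a name without ".pdf" simply gets ".xlsx" appended.
  return pdfName.replace(/\.pdf$/i, "") + ".xlsx";
}

// Hypothetical helper: one sheet per PDF page, named "Page 1", "Page 2", ...
function sheetNames(pageCount: number): string[] {
  const names: string[] = [];
  for (let page = 1; page <= pageCount; page++) {
    names.push("Page " + page);
  }
  return names;
}
```

Calling `outputFileName("invoice.pdf")` gives `invoice.xlsx`, and `sheetNames(3)` gives three names from "Page 1" to "Page 3".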
How the row-and-column detection actually works
This is the interesting part, and it's worth understanding because it explains both the wins and the failure modes.
PDF.js hands us a list of text fragments per page. Each fragment has: the string it contains, an x/y position on the page (in points — 72 per inch), and a width. A typical page is hundreds to thousands of fragments. Our job: turn that into a grid.
Step one — cluster rows by Y. Fragments whose Y-coordinates are within about 5 points of each other are treated as the same row. 5pt is roughly half a line of body text, so this absorbs the natural baseline variation in any given table row while still separating one row from the next. The result is an ordered list of rows, top to bottom.
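A minimal sketch of this row-clustering step, under stated assumptions: the `PdfFragment` shape and the `clusterRows` name are hypothetical, y is assumed to grow upward (as in native PDF coordinates), and each fragment is compared against the first fragment of its row. The real tool may differ on all of these.

```typescript
// Hypothetical fragment shape: the text plus its position in PDF points.
interface PdfFragment {
  str: string;
  x: number;     // left edge, in points
  y: number;     // baseline, in points (assumed to grow upward)
  width: number; // in points
}

const ROW_TOLERANCE_PT = 5; // "about 5 points", per the description above

// Group fragments into rows, ordered top to bottom.
function clusterRows(
  fragments: PdfFragment[],
  tolerance: number = ROW_TOLERANCE_PT
): PdfFragment[][] {
  // Sort by descending y so the top of the page comes first.
  const sorted = fragments.slice().sort((a, b) => b.y - a.y);
  const rows: PdfFragment[][] = [];
  let anchorY = 0; // y of the first fragment in the current row
  for (const frag of sorted) {
    const current = rows[rows.length - 1];
    if (current !== undefined && Math.abs(anchorY - frag.y) <= tolerance) {
      current.push(frag); // close enough vertically: same row
    } else {
      rows.push([frag]);  // too far below: start a new row
      anchorY = frag.y;
    }
  }
  return rows;
}

// Two fragments 1.5pt apart share a row; one 14pt lower starts the next row.
const sampleRows = clusterRows([
  { str: "2026-04-01", x: 40, y: 700, width: 50 },
  { str: "Coffee", x: 120, y: 701.5, width: 30 },
  { str: "2026-04-02", x: 40, y: 686, width: 50 },
]);
```

Anchoring each row to its first fragment keeps a long run of slightly drifting baselines from chaining two visually separate rows together.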
Step two — split columns by X-gaps. Within each row, fragments are sorted left to right. Where the horizontal gap between adjacent fragments exceeds about 8 points, we insert a column boundary. 8pt is wider than typical inter-word spacing (a space is around 3pt at 10pt font size) but narrower than the gutter most table designers use between columns. So adjacent words in the same column stay in the same cell; adjacent columns get split apart.
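The column split can be sketched the same way. Assumptions are labeled in the comments: the `PdfFragment` shape and function name are hypothetical, the gap is measured from the right edge of one fragment to the left edge of the next, and words that land in the same cell are joined with a single space.

```typescript
// Hypothetical fragment shape; only the horizontal fields matter here.
interface PdfFragment {
  str: string;
  x: number;     // left edge, in points
  width: number; // in points
}

const COLUMN_GAP_PT = 8; // "about 8 points", per the description above

// Turn one row of fragments into an array of cell strings.
function splitRowIntoCells(
  row: PdfFragment[],
  gap: number = COLUMN_GAP_PT
): string[] {
  const sorted = row.slice().sort((a, b) => a.x - b.x); // left to right
  const cells: string[][] = [];
  let prevRight = 0; // right edge of the previous fragment
  for (const frag of sorted) {
    const current = cells[cells.length - 1];
    if (current !== undefined && frag.x - prevRight <= gap) {
      current.push(frag.str); // small gap: same cell
    } else {
      cells.push([frag.str]); // gap wider than the threshold: new column
    }
    prevRight = frag.x + frag.width;
  }
  // Assumption: words within one cell are joined with a single space.
  return cells.map((words) => words.join(" "));
}

// "Coffee" and "Shop" sit 3pt apart, so they share a cell.
const sampleCells = splitRowIntoCells([
  { str: "2026-04-01", x: 40, width: 50 },
  { str: "Coffee", x: 120, width: 30 },
  { str: "Shop", x: 153, width: 22 },
  { str: "4.50", x: 300, width: 20 },
]);
```

Here `sampleCells` holds three cells: the date, "Coffee Shop", and "4.50". Two columns set only 6pt apart would stay merged in this sketch, since that gap never exceeds the threshold.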
That's the whole algorithm. Two heuristics tuned to the way real tables are actually laid out. No machine learning, no cloud API, no upload-and-wait-for-our-AI. Just geometry. It works well on real tables for the same reason: real tables are the geometry it's expecting.
Where it falters: a "table" that's actually a multi-column page of prose (think a magazine article, a two-column conference paper) will get split into columns by the X-gap detector. That's the algorithm doing exactly what it's supposed to do — there are wide horizontal gaps between the two text blocks. If you wanted the prose back instead, use PDF Extract Text, which preserves reading order.
Big Software alternatives — and the deal each is offering
You have three other options for getting tables out of a PDF, and each comes with its own tax.
Adobe Acrobat Pro ($19.99/month, or $239.88/year) is the gold standard. It has a real table-detection pipeline tuned over decades — line detection, OCR fallback for scanned tables, header recognition, footnote handling. For PDFs without obvious table structure (no borders, merged cells, irregular layouts), it's better than us. The price is the price: a perpetual monthly subscription whether you convert 100 PDFs a month or one. Most people who own Acrobat use it for 5% of what it does.
iLovePDF, SmallPDF, PDF24, ABBYY FineReader Online, Convertio — the upload-first crowd. They take your PDF, send it to a server somewhere, run a conversion there, send you back an .xlsx. Same conversion quality as ours in the easy cases; sometimes better in the hard cases (some of them OCR scanned tables; we don't). The trade: your PDF is now on someone's server. For a public dataset PDF, who cares. For a bank statement, a salary table, a contract with line items, an HR document — you've just emailed it to a stranger and trusted them to delete it. Plus rate limits (SmallPDF caps you at 2 conversions before sign-up; Sejda at 3/hour) and a Pro Plan pitch on every page.
"Copy and paste from the PDF into Excel." Try it once on a multi-column table and watch the columns collapse into a single column of cell values. PDF copy-paste preserves text but discards layout — it's exactly the problem our column detection was built to solve. Works fine for a one-column list. Falls apart on anything wider.
Big Software's pitch is always: the conversion is "free!" but the experience is gated. Free tier capped at 2 files, 10 pages each, with a watermark. Pro tier unlocks the rest. Sign up to remove the cap. Subscribe to remove the watermark. We're picking a fight with that model on purpose. The browser does the work. There's no marginal cost for us. So there's no marginal price for you. "There is a solution for everything" doesn't mean "there is a paid solution for everything."
Worked example: a 6-page bank statement
You're reconciling expenses for the quarter. Your bank emails a monthly statement as a PDF, so you have three statements in a folder, six pages each. Each statement has a header page (page 1, with account info and a summary), three pages of transactions in a 5-column table (date, description, debit, credit, balance), and two closing pages of totals and fine print.
What happens when you drop one statement into the converter:
- You drop statement-2026-04.pdf onto the drop zone. The widget shows the page count (6) and a Convert button.
- Click Convert to Excel. About 2 seconds later, the download activates.
- You open statement-2026-04.xlsx. It has six sheets: "Page 1", "Page 2", ..., "Page 6".
- Page 1 has the account header (name, address, account number) clustered into rough rows. Not very useful as a table, but it's faithful to what's on the page.
- Pages 2–4 are the transactions. Each row has 5 columns: date, description, debit, credit, balance. The header row from the bank's PDF is the first row of each sheet.
- Pages 5–6 are the totals and fine print. Mostly useless for reconciliation; ignore them.
You select pages 2–4, copy the rows, paste them into your master workbook. Five seconds of cleanup — a couple of debit/credit cells where the bank's PDF used a strange spacing convention and our column detector split them slightly differently. Click, click, done. Repeat for the other two statements. Total time: about 5 minutes. The same job through Adobe Acrobat would be roughly the same time, plus a $19.99 subscription. Through SmallPDF: slower upload step, plus the bank's data is now on SmallPDF's server.
What it preserves, what it doesn't
Honest expectation-setting saves frustration. Here's what survives the conversion and what doesn't:
| Feature | What happens |
|---|---|
| Cell text and numbers | Preserved exactly as PDF.js reads them |
| Row order | Preserved (top to bottom of each page) |
| Column order | Preserved (left to right of each row) |
| Header row | Becomes the first row of the sheet (no special "header" status applied) |
| Multi-page tables | One sheet per page — the table is split across sheets, you'd recombine in Excel |
| Merged cells | Best-effort: the value goes in the leftmost cell of the merge |
| Borders, colors, fonts | Stripped — values only |
| Formulas | Not applicable — PDFs don't carry formulas, only computed values |
| Scanned tables | Doesn't work — no OCR. Image PDFs come out empty. |
| Multi-column page layout (e.g. magazine columns) | Both columns end up side-by-side in the spreadsheet — usually not what you want |
| Images embedded in cells | Stripped — see Extract PDF Images for those |
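For the multi-page row in the table above, recombining is usually a matter of stacking the per-page grids and dropping the header row that repeats on each page. The sketch below is a hypothetical post-processing step, not something the converter does; it assumes each page's grid is an array of string rows and that a repeated header is an exact match of the first page's first row.

```typescript
// Hypothetical post-processing: stack per-page grids into one table.
function recombinePages(pages: string[][][]): string[][] {
  const combined: string[][] = [];
  let headerKey: string | null = null; // serialized first row of the first page
  for (const page of pages) {
    for (let i = 0; i < page.length; i++) {
      const row = page[i];
      if (row === undefined) continue;
      const key = JSON.stringify(row);
      if (i === 0 && headerKey === null) {
        headerKey = key; // remember the header the first time it appears
      } else if (i === 0 && key === headerKey) {
        continue;        // same header again on a later page: skip it
      }
      combined.push(row);
    }
  }
  return combined;
}

const mergedTable = recombinePages([
  [["Date", "Amount"], ["2026-04-01", "4.50"]],
  [["Date", "Amount"], ["2026-04-02", "12.00"]],
]);
```

`mergedTable` ends up with one header row followed by the two data rows. A page whose first row differs from the header (a continuation with no repeated header) is kept in full.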
If a column boundary lands wrong on your particular PDF — say the converter merged two columns that should have been separate because their horizontal gap was 6pt instead of 8pt — the fix is in Excel itself, not in our tool. Select the column, Data → Text to Columns → Fixed Width → drag the boundary to the right spot. 30 seconds. The alternative — building a UI for users to manually drag column boundaries in our widget — would double the complexity of the tool for the 5% of conversions where it'd help. The Simplicity Pledge says: do one job. We do.
About scanned PDFs and OCR
The single most common "the tool didn't work" report on any PDF converter is: I dropped in a scan, the result was empty. Here's why, and what to do.
A scanned PDF is a picture of a page wrapped in PDF metadata. There's no text inside — every "letter" is a pixel pattern. PDF.js, when asked to extract text from a scan, finds zero text fragments and hands us nothing. So our row/column detector has nothing to cluster, and the .xlsx output is empty. This isn't a bug we can fix in this tool; it's the input being a different kind of object than the tool reads.
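A check for this situation is simple to sketch: count the text fragments found on each page and flag the file when every page comes back with zero. The function names are hypothetical and this is not presented as the tool's own logic. Note that a genuinely blank page also yields zero fragments, so the per-page list is only a hint.

```typescript
// Returns the 1-based numbers of pages that produced no text fragments.
function pagesWithoutText(fragmentCounts: number[]): number[] {
  const empty: number[] = [];
  fragmentCounts.forEach((count, index) => {
    if (count === 0) empty.push(index + 1);
  });
  return empty;
}

// True when the PDF has pages but none of them yielded any text.
function looksLikeScan(fragmentCounts: number[]): boolean {
  return (
    fragmentCounts.length > 0 &&
    pagesWithoutText(fragmentCounts).length === fragmentCounts.length
  );
}
```

For example, `looksLikeScan([0, 0, 0])` is true, while a statement with one image-only cover page, such as `[0, 412, 388]`, is not flagged and `pagesWithoutText` reports page 1.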
What to do: OCR the PDF first. Optical Character Recognition converts the pixels back into text fragments — once that's done, you have a "real" PDF with extractable text, and the converter will work. Options:
- Adobe Acrobat Pro has built-in OCR (Tools → Recognize Text). Run it, save, then run the result through here.
- macOS Preview auto-OCRs PDFs when you open them on recent macOS versions. Save a copy.
- Tesseract (open source, command line) — install via Homebrew or apt, run on your PDF.
- Google Drive — upload, "Open with Google Docs," and Drive OCRs the contents. Privacy note: your PDF is now in Google.
We may add an in-browser OCR tool later (tesseract.js works in browser, just slow), but it's a substantially different operation from text-based extraction and deserves its own tool.
Privacy is the whole reason this exists in the browser
The reason PDF-to-Excel converters are mostly cloud-based is simple: it's easier for the vendor to run the conversion on their server than to ship a fast PDF parser to every user's browser. The user's privacy trade — "give us your PDF, we promise to delete it" — is the cost of the easier engineering choice.
Microapp picked the harder engineering choice on purpose. PDF.js runs in your browser, fast, on any modern device. SheetJS runs in your browser, fast, on any modern device. There is no reason the conversion has to happen on someone else's machine — except that it's slightly more work for the vendor to make it not.
So we did the slightly more work. The result: your bank statement, your salary table, your customer list, your contract — whatever PDF you're converting — stays on your laptop. The Microapp page loaded from our CDN; the conversion runs locally; the .xlsx is built locally and offered as a local download blob. Zero network traffic during convert. You can verify with DevTools. That's not a marketing promise — it's the architecture.
Related tools
Tools that pair naturally with the PDF to Excel Converter:
- Extract Text from PDF — pulls running prose out of a PDF (reading order, not tabular). The right pick when your PDF isn't tabular.
- Extract Images from PDF — pulls the embedded photos out of a PDF at their original resolution.
- Split PDF — break a very large PDF into smaller files before converting (helpful past 200 pages).
- PDF Merger — combine multiple PDFs into one before converting (useful for batching monthly statements).
- Excel to PDF — the reverse direction. Turn a workbook back into a PDF.
- PDF Page Count — quick check on a PDF's length before you convert.
- PDF Redact — black out sensitive fields before sharing the source PDF.
How Microapp pays the rent: annual membership for clean pages and AI work at near-cost; non-members get the same tools with ads. Either way, 10% of every dollar Microapp earns goes to charity — off the top, audited, published quarterly. The PDF to Excel Converter is one of ~115 microapps built to the same standard. Premium quality, for everyone.
Frequently asked questions
How is the table actually detected?
Two-step heuristic on the text fragments PDF.js gives us. Step one: cluster fragments by Y-coordinate — fragments within about 5pt of each other vertically are treated as the same row. Step two: within each row, sort fragments left-to-right and split into separate columns wherever the horizontal gap between adjacent fragments exceeds about 8pt (wider than inter-word spacing but narrower than typical table gutters). It's a heuristic, not magic — it works well on real tables, less well on text that just happens to be laid out in columns.
Does it handle merged cells?
No. PDFs don't have a structural concept of "merged cells" — they have visually-positioned text on a page, and a merged cell is just a single text fragment that happens to span what would be two column positions. The converter places that fragment in one cell (the leftmost it overlaps with). If you need merged cells preserved, you'll need to merge them manually in Excel after conversion — but in our experience, downstream tools work better with unmerged cells anyway.
What if my PDF has multiple pages?
You get one sheet per page in the output workbook, named "Page 1", "Page 2", etc. The page cap is 200 (set lower than text extraction's 500 because the cluster-and-write step is heavier per page). If your PDF is longer, split it first with our PDF Splitter and convert each chunk separately.
How does this compare to Adobe Acrobat or a paid converter?
Adobe Acrobat (and ABBYY, Foxit, etc.) ship with multi-pass table-detection algorithms tuned over decades — line detection, OCR fallback for scanned tables, header recognition, footnote handling. They're better at edge cases: tables with no borders, tables with merged cells, scanned PDFs (which we don't handle at all — no OCR). For the common case — a PDF that's literally an export of a spreadsheet, or a financial report that's structurally tabular — our tool gets 80% of the way there for $0 and zero upload. Use Adobe when you need the last 20%.
Is my PDF really not uploaded?
Correct. PDF.js (the library Firefox uses internally to render PDFs) runs in your browser. SheetJS (the .xlsx encoder) also runs in your browser. Your bytes go from your file system to the browser's memory to the .xlsx download — never to a server. Check your browser's network tab during convert: zero outbound requests.
Why does my prose PDF look like a single column in Excel?
Because that's what it is, structurally. Paragraphs of running text don't have horizontal gaps wide enough to trigger a column split: the words are separated by single-space gaps, well under our 8pt threshold. The converter correctly treats each line as one cell. If each line in its own cell of column A is what you wanted, it's working as intended. If you wanted the words split across columns, you probably want Extract Text from PDF followed by a Text-to-Columns step in Excel itself.
What about scanned PDFs?
Doesn't work — same as our other PDF tools. Scanned PDFs are images of text, not text. Converting them requires OCR (Optical Character Recognition), which is a fundamentally different operation and not something this tool does. Run the scan through an OCR tool first (Adobe Acrobat, macOS Preview, or Tesseract), save the OCR'd PDF, then run that through here.
What's the max file size?
100 MB and 200 pages. The cluster-and-encode step is memory-heavy; we cap it lower than the page-count tool. For really big PDFs, split first with the PDF Splitter.
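A pre-flight check against these two limits might look like the sketch below. The names are hypothetical, and treating 1 MB as 1,048,576 bytes is an assumption; the tool may count megabytes differently.

```typescript
const MAX_BYTES = 100 * 1024 * 1024; // assumption: 1 MB = 1,048,576 bytes
const MAX_PAGES = 200;

// Returns a list of problems; an empty list means the PDF is within limits.
function checkLimits(sizeBytes: number, pageCount: number): string[] {
  const problems: string[] = [];
  if (sizeBytes > MAX_BYTES) {
    problems.push("File is larger than 100 MB.");
  }
  if (pageCount > MAX_PAGES) {
    problems.push("PDF has more than 200 pages; split it first.");
  }
  return problems;
}
```

A 5 MB, 6-page statement passes with an empty list, while a 250-page report gets the page-count message and should go through the splitter first.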