
How to Find Duplicate Documents on Mac (Word, PDF, Pages, Excel) in 2026

Document duplicates on a Mac are sneakier than photo duplicates. You can spot two identical sunset shots at a glance. You cannot easily spot that ProjectBrief_Final_v2.docx and ProjectBrief_Final_v3_REAL.docx contain the same content, or that the PDF you downloaded twice lives in both ~/Downloads and ~/Documents/Archive. This guide covers exactly how to find duplicate documents on Mac, whether you are dealing with Word files, PDFs, Pages documents, or Excel spreadsheets, using built-in tools, Terminal, and dedicated scanners.

Why Document Duplicates Accumulate

Documents pile up differently than media files. The main culprits are:

  • Version-named copies: Files saved as Report_v1.docx, Report_v2.docx, Report_Final.docx, Report_Final_FINAL.docx. These share most content but differ by a sentence or two, so byte-level matching will not catch them as duplicates.
  • Re-downloads: PDFs, spreadsheets, and forms downloaded more than once land in ~/Downloads with identical content but sometimes different filenames (e.g., invoice.pdf and invoice (1).pdf).
  • Cloud sync artifacts: iCloud Drive, Dropbox, and OneDrive can each maintain a local cache. If you switched services, the same document may exist in multiple sync folders.
  • Email attachments saved manually: Files saved from Mail end up in ~/Downloads or ~/Documents even if the original already lives somewhere else.
  • Migration leftovers: Time Machine restores and Migration Assistant can bring over an entire old Documents folder that overlaps significantly with what you already have.

The Two Types of Duplicate Documents You Need to Handle Separately

Before jumping to tools, understand the split. Byte-for-byte duplicates are files with identical content regardless of filename, and comparing checksums detects them reliably. Near-duplicates (versioned files, lightly edited copies) share most content but differ slightly, so their checksums differ. No free built-in tool handles near-duplicates well; for those, your best approach is manual review using Finder's Quick Look.
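To see why checksums catch renamed copies but miss edited versions, here is a minimal bash sketch. It relies on macOS's md5 command (falling back to md5sum on Linux). The function names file_hash and same_content are hypothetical, made up for illustration.

```shell
#!/bin/bash
# Print only the MD5 hash of a file.
# Uses macOS's md5 -q; falls back to GNU md5sum where md5 is unavailable.
file_hash() {
  if command -v md5 >/dev/null 2>&1; then
    md5 -q "$1"
  else
    md5sum "$1" | awk '{print $1}'
  fi
}

# Hypothetical helper: succeeds (exit status 0) when two files have the same
# hash, i.e. identical bytes. The filenames play no part in the comparison.
same_content() {
  [ "$(file_hash "$1")" = "$(file_hash "$2")" ]
}
```

For example, same_content invoice.pdf "invoice (1).pdf" && echo duplicate prints duplicate for a re-download, while two versions of a report that differ by a single typo fail the check, which is exactly the near-duplicate problem.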

Step 1: Search Finder by File Type to See the Scope

Start with a simple Finder search to understand what you are working with:

  1. Open a Finder window and press Command+F.
  2. Click the first search filter dropdown (it says "Kind" by default) and choose Other. Search for "File Extension" and select it.
  3. Type pdf in the value field, then repeat for docx, xlsx, and pages.

This shows you all documents of each type across your Mac. Sort by Date Modified or Name to spot obvious pairs like Budget.xlsx and Budget (1).xlsx.
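If you prefer Terminal, a rough equivalent of this Finder search is to count document files by extension. This is a bash sketch that assumes the five extensions below are the ones you care about; count_docs_by_type is a hypothetical helper name.

```shell
#!/bin/bash
# Count document files by extension under one or more folders.
# -iname makes the match case-insensitive, so REPORT.PDF counts as pdf.
count_docs_by_type() {
  find "$@" -type f \( -iname '*.pdf' -o -iname '*.docx' -o -iname '*.xlsx' \
      -o -iname '*.pages' -o -iname '*.numbers' \) 2>/dev/null \
    | awk -F. '{print tolower($NF)}' \
    | sort | uniq -c | sort -rn
}
```

Run it as count_docs_by_type ~/Documents ~/Downloads ~/Desktop to see which types dominate. It only counts files; it does not detect duplicates on its own.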

For a faster document-type overview, switch to Finder's Gallery view (Command+4) to preview PDFs and Pages files without opening them.

Step 2: Use Terminal to Find Exact Duplicate PDFs and Word Files

Terminal is the most reliable free method for finding byte-for-byte duplicates. Tools like fdupes automate the job but require a third-party install (for example, through Homebrew); macOS's built-in find and md5 commands can do it without installing anything.

Find duplicate PDFs in your Documents folder

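find ~/Documents -iname "*.pdf" -type f -exec md5 -r {} + | awk '{print $1}' | sort | uniq -d

The -r flag makes md5 print the hash before the path (its default output starts with "MD5 (", which breaks field-based parsing), -exec ... {} + hands paths to md5 directly so filenames containing spaces survive, and -iname also catches uppercase extensions such as .PDF. The command prints each hash that appears more than once, which tells you duplicates exist but not where they are.

To scan several folders at once and list the matching files, grouped by hash:

find ~/Documents ~/Downloads -iname "*.pdf" -type f -exec md5 -r {} + | sort | awk '{n[$1]++; line[NR]=$0; key[NR]=$1} END {for (i=1; i<=NR; i++) if (n[key[i]] > 1) print line[i]}'

Each output line shows a hash followed by a path. Lines that share a hash are identical files, and sort places them next to each other.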

Replace *.pdf with *.docx, *.xlsx, *.pages, or *.numbers to scan other document types. Adjust the search paths to include ~/Desktop or ~/Library/Mobile\ Documents/com~apple~CloudDocs for iCloud Drive.
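Rather than running the command once per extension, a single pass can cover every document type. This bash sketch is hypothetical (find_dup_docs is a made-up name); it uses md5 -r on macOS or md5sum elsewhere, both of which print the hash first.

```shell
#!/bin/bash
# Hypothetical helper: list byte-identical PDF, Word, Excel, Pages, and Numbers
# files across the given folders, one blank-line-separated group per hash.
find_dup_docs() {
  local hasher="md5 -r"                        # macOS: "hash path"
  command -v md5 >/dev/null 2>&1 || hasher="md5sum"   # GNU: "hash  path"
  # $hasher is left unquoted on purpose so bash splits "md5 -r" into two words.
  find "$@" -type f \( -iname '*.pdf' -o -iname '*.docx' -o -iname '*.xlsx' \
      -o -iname '*.pages' -o -iname '*.numbers' \) -exec $hasher {} + 2>/dev/null \
    | sort \
    | awk '{ h = $1; path = substr($0, length(h) + 2); sub(/^ /, "", path)
             n[h]++; paths[h] = paths[h] path "\n" }
           END { for (h in n) if (n[h] > 1) printf "%s\n%s\n", h, paths[h] }'
}
```

For example, find_dup_docs ~/Documents ~/Downloads ~/Desktop prints each shared hash followed by every path with that content. The sketch is written for bash (macOS includes /bin/bash); zsh does not split $hasher the same way. If some of your Pages or Numbers files use the older package format, they are folders on disk, and -type f skips them.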

Scan iCloud Drive documents

find ~/Library/Mobile\ Documents/com~apple~CloudDocs -iname "*.docx" -type f -exec md5 -r {} + | sort

With -r, md5 prints the hash first, so sorting puts identical files on adjacent lines.

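iCloud Drive keeps downloaded files at this path even when "Optimize Mac Storage" is on. Files shown with a cloud icon in Finder live only in iCloud: depending on your macOS version, find may not match them at all, or md5 may trigger a download when it reads them. Download them first (Control-click the file or folder and choose Download Now) so the scan is complete and predictable.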
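On macOS versions that represent not-yet-downloaded iCloud files as hidden placeholder stubs (named like .Report.docx.icloud), you can list those stubs before scanning. This bash sketch assumes that naming; newer macOS releases may not create such stubs, so an empty result does not prove every file is local. list_icloud_placeholders is a hypothetical helper.

```shell
#!/bin/bash
# Hypothetical helper: list iCloud placeholder stubs under a folder and print
# the real document name each one stands for.
# Assumption: cloud-only files appear as hidden ".Name.ext.icloud" stubs.
list_icloud_placeholders() {
  local root="${1:-$HOME/Library/Mobile Documents/com~apple~CloudDocs}"
  find "$root" -type f -name '.*.icloud' 2>/dev/null \
    | sed -e 's|/\.\([^/]*\)\.icloud$|/\1|'    # ".Name.ext.icloud" -> "Name.ext"
}
```

The sed step turns each stub back into its document name, so you know which files to download before running a duplicate scan.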

Step 3: Check the Downloads Folder First (Usually the Worst Offender)

The ~/Downloads folder is where most re-downloaded PDFs and forms accumulate. In Finder's List view you can sort by name, so numbered copies land next to their originals, or by size, so large duplicate PDFs stand out.

  1. Open ~/Downloads in Finder.
  2. Switch to List view (Command+2).
  3. Click the Name column header to sort alphabetically. Files named statement.pdf and statement (1).pdf will land next to each other.
  4. Select both with Shift+click and press Space to open Quick Look, then use the arrow keys to flip between the previews and confirm they are identical before deleting one.
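Quick Look shows whether two files look the same; cmp confirms whether their bytes are the same. The bash sketch below checks every numbered copy in a folder against its original, assuming copies follow the "name (1).ext" pattern shown above (other apps may name copies differently). check_numbered_copies is a hypothetical name.

```shell
#!/bin/bash
# Hypothetical helper: for each "name (N).ext" file in a folder, find the plain
# "name.ext" beside it and report whether the two are byte-identical.
check_numbered_copies() {
  local dir="${1:-$HOME/Downloads}" copy base orig
  for copy in "$dir"/*" ("[0-9]*")."*; do
    [ -f "$copy" ] || continue               # glob matched nothing
    base=$(basename "$copy")
    # Strip the " (N)" suffix: "statement (1).pdf" -> "statement.pdf"
    orig="$dir/$(printf '%s\n' "$base" | sed -E 's/ \([0-9]+\)(\.[^.]*)$/\1/')"
    [ "$orig" = "$copy" ] && continue        # name did not fit the pattern
    if [ ! -f "$orig" ]; then
      printf 'no original: %s\n' "$base"
    elif cmp -s "$orig" "$copy"; then        # -s: silent, exit status only
      printf 'identical:   %s\n' "$base"
    else
      printf 'differs:     %s\n' "$base"
    fi
  done
}
```

Running check_numbered_copies ~/Downloads prints one line per numbered copy. Identical copies are safe candidates for the Trash; differs means the re-download changed (an updated statement, say) and deserves a look first.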

Step 4: Handle Versioned Documents (the "Final_v2.docx" Problem)

This is where byte matching fails you entirely. Report_v1.docx and Report_v2.docx will have different checksums even if the only change was fixing a typo. The only reliable approach is manual triage with Quick Look.

Triage process for versioned files

  1. In Finder, search for the document base name (e.g., search "Report" in your ~/Documents folder).
  2. Sort results by Date Modified, oldest first.
  3. Select all results and press Space to open Quick Look. Use the left/right arrows to flip through each version.
  4. Keep only the latest version unless you have a specific reason to retain earlier drafts.
  5. Move older versions to a temporary folder first rather than deleting immediately, so you can verify the latest copy has everything you need.

For Word (.docx) and Excel (.xlsx) files, Quick Look renders a readable preview on Sonoma and Sequoia without opening the full app. Pages and Numbers files preview natively in Quick Look as well.
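The last triage step, moving older versions aside instead of deleting them, can be scripted. In this hypothetical bash sketch, stash_older_versions keeps the most recently modified match and moves the others into a review folder. It treats modification time as the "latest" signal, which can be wrong when an old file was resaved, so open the kept file before emptying the review folder.

```shell
#!/bin/bash
# Hypothetical helper: among files in DIR whose names contain PATTERN, keep the
# most recently modified one in place and move the others into REVIEW_DIR.
# Nothing is deleted.
stash_older_versions() {
  local dir="$1" pattern="$2" review="$3" f newest=""
  # First pass: pick the newest match by modification time (-nt).
  # An unmatched glob stays literal, and the -f test skips it.
  for f in "$dir"/*"$pattern"*; do
    [ -f "$f" ] || continue
    if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
      newest="$f"
    fi
  done
  if [ -z "$newest" ]; then
    echo "no files matching '$pattern' in $dir" >&2
    return 1
  fi
  # Second pass: move every other match aside; mv -n never overwrites.
  mkdir -p "$review"
  for f in "$dir"/*"$pattern"*; do
    if [ -f "$f" ] && [ "$f" != "$newest" ]; then
      mv -n "$f" "$review/"
    fi
  done
  printf 'kept: %s\n' "$newest"
}
```

For example, stash_older_versions ~/Documents Report ~/Desktop/Report-older moves every file whose name contains "Report" except the newest. Because the pattern is a plain substring, it would also match an unrelated file such as Expense Report.pdf, so list the matches first with ls ~/Documents/*Report*.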

Step 5: Check the Other Places Duplicate Documents Hide

Beyond Downloads and Documents, duplicate documents commonly appear in these locations:

  • ~/Desktop: Working copies that never got filed.
  • ~/Library/Mobile Documents/com~apple~CloudDocs/Documents: iCloud's mirrored Documents folder, which overlaps with ~/Documents if you have "Desktop and Documents Folders" sync turned on in System Settings under Apple ID.
  • ~/Dropbox or ~/OneDrive: Copies from old or parallel sync services.
  • ~/Library/Containers/com.microsoft.Word/Data/Library/: Word's sandboxed container sometimes holds auto-recovery duplicates.
  • External drives or Time Machine backups: Not the same as live duplicates, but worth noting when doing a full audit.
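When an old sync folder or migrated copy mirrors your Documents folder, comparing files at the same relative path is a quick way to measure the overlap. This bash sketch (compare_trees is a hypothetical name) reports only files present at the same relative path in both folders; copies that were renamed or moved still need the hash-based scan from Step 2.

```shell
#!/bin/bash
# Hypothetical helper: for every file under folder A that also exists at the
# same relative path under folder B, report whether the copies are identical.
compare_trees() {
  local a="${1%/}" b="${2%/}" rel
  (cd "$a" && find . -type f) | while IFS= read -r rel; do
    rel="${rel#./}"                          # "./Taxes/x.pdf" -> "Taxes/x.pdf"
    [ -f "$b/$rel" ] || continue             # only in A: nothing to compare
    if cmp -s "$a/$rel" "$b/$rel"; then
      printf 'same:    %s\n' "$rel"
    else
      printf 'differs: %s\n' "$rel"
    fi
  done
}
```

For example, if your Dropbox holds a Documents subfolder, compare_trees ~/Documents ~/Dropbox/Documents marks byte-identical pairs as same; differs means one copy was edited after the folders diverged, so open both before choosing.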

Using a Dedicated Duplicate Document Finder on Mac

For large document libraries (thousands of PDFs or years of Word files), manual Terminal checks and Finder searches get tedious. Dedicated duplicate finders scan by file hash, can filter by file type, and show grouped results so you can review before deleting anything.

Crumb includes a duplicate scan that lets you filter by document file type (PDF, Word, Pages, Excel, and others), so you can target just your documents without wading through media files. Results show grouped sets with file sizes and paths, and Crumb requires you to review the plan before anything is removed. It runs fully on-device and does not need an account, which matters when documents contain sensitive information.

Whatever tool you use, look for one that lets you preview before deleting and that handles grouped sets rather than just flagging individual files.

Safe Deletion Checklist Before You Clear Duplicates

  • Confirm the copy you plan to keep opens correctly and is not corrupted.
  • Check that the "keeper" file has the most recent content, not just the most recent modification date (resaving an old file bumps its date).
  • For PDFs: verify page counts match if you are comparing a re-download with an original.
  • Move files to the Trash rather than deleting them permanently, and give yourself a day before emptying it.
  • If documents are in iCloud Drive, confirm they are fully downloaded locally before scanning (no cloud icon next to the filename in Finder).
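A few of these checks can be enforced before anything moves. The bash sketch below is a hypothetical helper, safe_trash, that moves a duplicate to the Trash only when it matches the keeper byte for byte and the keeper is a non-empty file. It cannot tell whether the keeper opens correctly, so still check that by hand. It moves files into ~/.Trash with mv, so Finder's Put Back will not know the original location, and depending on your privacy settings Terminal may need Full Disk Access to write there. TRASH_DIR is an assumed override variable, not a macOS setting.

```shell
#!/bin/bash
# Hypothetical helper: move DUPLICATE to the Trash only after confirming it is
# byte-identical to KEEPER. TRASH_DIR (assumed) overrides the default location.
safe_trash() {
  local keeper="$1" dup="$2" trash="${TRASH_DIR:-$HOME/.Trash}"
  if [ "$keeper" -ef "$dup" ]; then          # same file: never trash the keeper
    echo "refusing: keeper and duplicate are the same file" >&2; return 1
  fi
  if [ ! -s "$keeper" ]; then                # missing or zero bytes
    echo "refusing: keeper '$keeper' is missing or empty" >&2; return 1
  fi
  if ! cmp -s "$keeper" "$dup"; then
    echo "refusing: '$dup' does not match '$keeper' byte for byte" >&2; return 1
  fi
  if [ -e "$trash/$(basename "$dup")" ]; then
    echo "refusing: a file with that name is already in $trash" >&2; return 1
  fi
  mv "$dup" "$trash/" && printf 'trashed: %s\n' "$dup"
}
```

For example, safe_trash ~/Documents/invoice.pdf ~/Downloads/"invoice (1).pdf" moves the download only if the two files really are identical.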

Finding duplicate documents on a Mac takes a bit more care than clearing photo duplicates, because versioned filenames fool automated tools. Start with Terminal's md5 commands for exact PDF and Word duplicates, use Finder's Quick Look to triage versioned copies manually, and check the usual hiding spots: Downloads, iCloud Drive, and any old sync folders. If you want a faster pass across your whole document library with type filtering built in, Crumb's duplicate scan is a practical option that keeps you in control of what gets removed.


Frequently asked questions

Does macOS have a built-in duplicate document finder?
No, macOS does not include a dedicated duplicate finder. You can use Finder's search and Quick Look to manually identify duplicates, and Terminal's md5 command to compare file checksums, but there is no system-level tool that automatically groups and removes duplicate documents. Third-party apps fill this gap.
Will a duplicate PDF finder on Mac catch files with different names but identical content?
Yes, any tool that compares by file hash (MD5 or SHA) will find byte-for-byte identical PDFs regardless of filename. So invoice.pdf and invoice (1).pdf will be flagged as duplicates if their content is exactly the same. Versioned files with even minor content differences will not be caught this way, since their hashes differ.
How do I find duplicate Word docs on Mac stored in iCloud Drive?
iCloud Drive files sync to ~/Library/Mobile Documents/com~apple~CloudDocs on your Mac. Run your duplicate scan against both ~/Documents and that iCloud path. Make sure files are fully downloaded first (no cloud icon in Finder), because files stored only in iCloud are not accessible to local scanning tools until downloaded.
Is it safe to delete older versions of a document if I keep the newest copy?
Usually yes, but verify before deleting. Open the newest version and confirm it contains everything from the older copies, since modification date alone does not guarantee the newest file has the most complete content. Move the older versions to Trash and wait a day or two before permanently emptying, so you have a recovery window.
Why does my Mac show a duplicate Documents folder in iCloud Drive?
If you have Desktop and Documents Folders sync enabled in System Settings under Apple ID, your ~/Documents folder is mirrored to iCloud Drive. This can create apparent duplication if you also have a manual Documents folder inside iCloud Drive. Check System Settings, open Apple ID, click iCloud, then iCloud Drive options, and review which folders are being synced.