Document duplicates on a Mac are sneakier than photo duplicates. You can spot two identical sunset shots at a glance. You cannot easily spot that ProjectBrief_Final_v2.docx and ProjectBrief_Final_v3_REAL.docx contain the same content, or that the PDF you downloaded twice lives in both ~/Downloads and ~/Documents/Archive. This guide covers exactly how to find duplicate documents on Mac, whether you are dealing with Word files, PDFs, Pages documents, or Excel spreadsheets, using built-in tools, Terminal, and dedicated scanners.
Why Document Duplicates Accumulate
Documents pile up differently than media files. The main culprits are:
- Version-named copies: Files saved as Report_v1.docx, Report_v2.docx, Report_Final.docx, Report_Final_FINAL.docx. These share most content but differ by a sentence or two, so byte-level matching will not catch them as duplicates.
- Re-downloads: PDFs, spreadsheets, and forms downloaded more than once land in ~/Downloads with identical content but sometimes different filenames (e.g., invoice.pdf and invoice (1).pdf).
- Cloud sync artifacts: iCloud Drive, Dropbox, and OneDrive can each maintain a local cache. If you switched services, the same document may exist in multiple sync folders.
- Email attachments saved manually: Files saved from Mail end up in ~/Downloads or ~/Documents even if the original already lives somewhere else.
- Migration leftovers: Time Machine restores and Migration Assistant can bring over an entire old Documents folder that overlaps significantly with what you already have.
The Two Types of Duplicate Documents You Need to Handle Separately
Before jumping to tools, understand the split. Byte-for-byte duplicates are files with identical content regardless of filename. These are cleanly detectable by comparing checksums. Near-duplicates (versioned files, lightly edited copies) share most content but differ slightly. No free built-in tool handles near-duplicates well. For near-duplicates, your best approach is manual review using Finder's Quick Look.
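To make the split concrete, here is a minimal sketch of the byte-for-byte test. It is illustrative rather than part of any tool mentioned in this guide: is_exact_duplicate is a hypothetical helper name, and it relies only on the standard cmp utility, which compares two files byte by byte and ignores their names and dates.

```shell
# is_exact_duplicate FILE_A FILE_B  (hypothetical helper name)
# Succeeds (exit status 0) only when both files contain identical bytes.
# Filenames and modification dates play no part in the comparison.
is_exact_duplicate() {
    cmp -s "$1" "$2"
}

# Example usage with the filenames mentioned earlier in this guide:
# if is_exact_duplicate "ProjectBrief_Final_v2.docx" "ProjectBrief_Final_v3_REAL.docx"; then
#     echo "exact duplicates"
# else
#     echo "different bytes: treat as near-duplicates and review manually"
# fi
```

A "different bytes" result does not tell you how different the two files are, which is why near-duplicates still need a manual look.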
Step 1: Search Finder by File Type to See the Scope
Start with a simple Finder search to understand what you are working with:
- Open a Finder window and press Command+F.
- Click the first search filter dropdown (it says "Kind" by default) and choose Other. Search for "File Extension" and select it.
- Type pdf in the value field, then repeat for docx, xlsx, and pages.
This shows you all documents of each type across your Mac. Sort by Date Modified or Name to spot obvious pairs like Budget.xlsx and Budget (1).xlsx.
For a faster document-type overview, switch to Finder's Gallery view (Command+4) to preview PDFs and Pages files without opening them.
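If you prefer numbers to a Finder window, the same scoping pass can be approximated in Terminal. This is a rough sketch, not a replacement for the Finder search: count_documents is a hypothetical helper name, and it only counts regular files whose names end in the given extensions.

```shell
# count_documents DIR EXT...  (hypothetical helper name)
# Prints one "extension count" line per extension given.
count_documents() (
    dir=$1
    shift
    for ext in "$@"; do
        # -iname matches case-insensitively, so .PDF and .Docx are counted too
        n=$(find "$dir" -type f -iname "*.$ext" 2>/dev/null | wc -l | tr -d ' ')
        echo "$ext $n"
    done
)

# Example usage:
# count_documents ~/Documents pdf docx xlsx pages
```

A large count for one type tells you where the exact-duplicate scan in the next step is likely to pay off first.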
Step 2: Use Terminal to Find Exact Duplicate PDFs and Word Files
Terminal is the most reliable free method for finding byte-for-byte duplicates. Tools such as fdupes require a third-party install, but macOS's built-in find combined with md5 can get the job done. The commands below use md5 -r, which prints the hash first and the filename second, so the output can be sorted and grouped by hash. They also pass files to md5 with -exec rather than xargs, so filenames containing spaces are handled correctly.
Find duplicate PDFs in your Documents folder
find ~/Documents -name "*.pdf" -type f -exec md5 -r {} + | awk '{print $1}' | sort | uniq -d
The same check across both Documents and Downloads:
find ~/Documents ~/Downloads -name "*.pdf" -type f -exec md5 -r {} + | awk '{print $1}' | sort | uniq -d
This prints the MD5 hashes that appear more than once. To also see the filenames:
find ~/Documents ~/Downloads -name "*.pdf" -type f -exec md5 -r {} + | sort | awk '{h=$1; f=substr($0,length(h)+2)} h==p{if(!g)print pf; print f; g=1} h!=p{g=0} {p=h; pf=f}'
Replace *.pdf with *.docx, *.xlsx, *.pages, or *.numbers to scan other document types. Adjust the search paths to include ~/Desktop or ~/Library/Mobile\ Documents/com~apple~CloudDocs for iCloud Drive.
Scan iCloud Drive documents
find ~/Library/Mobile\ Documents/com~apple~CloudDocs -name "*.docx" -type f -exec md5 {} \;
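iCloud Drive stores synced files here even when "Optimize Mac Storage" is on, as long as the files have been downloaded locally. Files shown with a cloud icon in Finder have not been downloaded yet. Depending on your macOS version, they may be skipped by the search (older releases keep only a hidden .icloud placeholder that does not match *.docx) or downloaded on demand when md5 reads them, so download the files you care about before scanning.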
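To see which documents a scan might be missing, you can look for placeholders directly. This sketch rests on an assumption that may not hold on your system: that undownloaded items appear as hidden files named .OriginalName.icloud, as they do on a number of macOS releases. list_icloud_placeholders is a hypothetical helper name.

```shell
# list_icloud_placeholders DIR  (hypothetical helper name)
# Lists hidden ".<name>.icloud" placeholder files under DIR, printing the
# path each document would have once downloaded.
# Assumption: this macOS release represents undownloaded items this way.
list_icloud_placeholders() {
    find "$1" -type f -name ".*.icloud" 2>/dev/null |
        sed -e 's|/\.\([^/]*\)\.icloud$|/\1|'
}

# Example usage:
# list_icloud_placeholders ~/Library/Mobile\ Documents/com~apple~CloudDocs
```

An empty result means either that everything is downloaded or that your macOS version does not use this placeholder scheme, so treat it as a hint rather than proof.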
Step 3: Check the Downloads Folder First, It Is Usually the Worst Offender
The ~/Downloads folder is where most re-downloaded PDFs and forms accumulate. In Finder's List view you can also sort Downloads by Size to surface large PDF duplicates quickly.
- Open ~/Downloads in Finder.
- Switch to List view (Command+2).
- Click the Name column header to sort alphabetically. Files named statement.pdf and statement (1).pdf will land next to each other.
- Select both with Shift+click, press Space to open Quick Look, and flip between the two previews with the arrow keys to confirm they look identical before deleting one.
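The "(1)" pairs described above can also be checked in bulk. The following is a sketch under simple assumptions: check_numbered_copies is a hypothetical helper name, it looks only at the top level of the folder you give it, and it treats "name (N).ext" as a copy of "name.ext" purely on the basis of the filename pattern.

```shell
# check_numbered_copies DIR  (hypothetical helper name)
# For every "name (N).ext" file directly inside DIR, looks for "name.ext"
# beside it and reports whether the two are byte-for-byte identical.
check_numbered_copies() {
    find "$1" -maxdepth 1 -type f -name '* ([0-9]*).*' 2>/dev/null |
    while IFS= read -r copy; do
        # Strip the " (N)" suffix: "statement (1).pdf" -> "statement.pdf"
        original=$(printf '%s\n' "$copy" |
            sed -e 's/ ([0-9][0-9]*)\(\.[^./]*\)$/\1/')
        [ "$original" != "$copy" ] || continue   # name did not fit the pattern
        [ -f "$original" ] || continue           # no un-numbered twin present
        if cmp -s "$original" "$copy"; then
            echo "IDENTICAL: $copy"
        else
            echo "DIFFERENT: $copy"
        fi
    done
}

# Example usage:
# check_numbered_copies ~/Downloads
```

Files reported as DIFFERENT are worth opening: a re-downloaded statement with the same name can be a newer version rather than a copy.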
Step 4: Handle Versioned Documents (the "Final_v2.docx" Problem)
This is where byte matching fails you entirely. Report_v1.docx and Report_v2.docx will have different checksums even if the only change was fixing a typo. The only reliable approach is manual triage with Quick Look.
Triage process for versioned files
- In Finder, search for the document base name (e.g., search "Report" in your ~/Documents folder).
- Sort results by Date Modified, oldest first.
- Select all results and press Space to open Quick Look. Use the left/right arrows to flip through each version.
- Keep only the latest version unless you have a specific reason to retain earlier drafts.
- Move older versions to a temporary folder first rather than deleting immediately, so you can verify the latest copy has everything you need.
For Word (.docx) and Excel (.xlsx) files, Quick Look renders a readable preview on Sonoma and Sequoia without opening the full app. Pages and Numbers files preview natively in Quick Look as well.
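Once you have reviewed the versions, the "move older versions to a temporary folder" step can be scripted. This sketch makes a strong assumption you should check by eye first: that the most recently modified file is the one to keep. stage_older_versions is a hypothetical helper name, and nothing is deleted, only moved.

```shell
# stage_older_versions DIR BASENAME HOLDING_DIR  (hypothetical helper name)
# Moves every file directly inside DIR whose name starts with BASENAME,
# except the most recently modified one, into HOLDING_DIR for later review.
stage_older_versions() {
    mkdir -p "$3" || return 1
    # ls -t lists newest first; tail skips the first line (the keeper).
    find "$1" -maxdepth 1 -type f -name "$2*" -exec ls -t {} + 2>/dev/null |
    tail -n +2 |
    while IFS= read -r older; do
        target="$3/$(basename "$older")"
        if [ -e "$target" ]; then
            echo "skipped (name already in holding folder): $older"
        else
            mv "$older" "$target" && echo "staged: $older"
        fi
    done
}

# Example usage (the holding folder name is only an illustration):
# stage_older_versions ~/Documents Report ~/Documents/_old_versions
```

Because the base name is matched as a prefix, a search for "Report" also catches files like "Reporting_Guide.docx", so use a base name specific enough to match only the document family you reviewed.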
Step 5: Check Where Duplicate Documents Hide
Beyond Downloads and Documents, duplicate documents commonly appear in these locations:
- ~/Desktop: Working copies that never got filed.
- ~/Library/Mobile Documents/com~apple~CloudDocs/Documents: iCloud's mirrored Documents folder, which overlaps with ~/Documents if you have "Desktop and Documents Folders" sync turned on in System Settings under Apple ID.
- ~/Dropbox or ~/OneDrive: Copies from old or parallel sync services.
- ~/Library/Containers/com.microsoft.Word/Data/Library/: Word's sandboxed container sometimes holds auto-recovery duplicates.
- External drives or Time Machine backups: Not the same as live duplicates, but worth noting when doing a full audit.
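Not every Mac has all of these folders, so it can help to check which ones exist and how many documents each holds before scanning. This is a small sketch with a hypothetical helper name, survey_locations. The folder list in the usage line simply repeats the locations above and may not match your setup.

```shell
# survey_locations EXT DIR...  (hypothetical helper name)
# For each directory that exists, prints "count<TAB>directory" for files
# with the given extension. Directories that do not exist are skipped.
survey_locations() (
    ext=$1
    shift
    for dir in "$@"; do
        [ -d "$dir" ] || continue
        n=$(find "$dir" -type f -iname "*.$ext" 2>/dev/null | wc -l | tr -d ' ')
        printf '%s\t%s\n' "$n" "$dir"
    done
)

# Example usage:
# survey_locations pdf ~/Desktop ~/Documents ~/Downloads ~/Dropbox ~/OneDrive \
#     ~/Library/Mobile\ Documents/com~apple~CloudDocs
```

Folders that show up with similar counts, such as ~/Documents and its iCloud mirror, are good candidates to scan together for exact duplicates.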
Using a Dedicated Duplicate Document Finder on Mac
For large document libraries (thousands of PDFs or years of Word files), manual Terminal checks and Finder searches get tedious. Dedicated duplicate finders scan by file hash, can filter by file type, and show grouped results so you can review before deleting anything.
Crumb includes a duplicate scan that lets you filter by document file type (PDF, Word, Pages, Excel, and others), so you can target just your documents without wading through media files. Results show grouped sets with file sizes and paths, and Crumb requires you to review the plan before anything is removed. It runs fully on-device and does not need an account, which matters when documents contain sensitive information.
Whatever tool you use, look for one that lets you preview before deleting and that handles grouped sets rather than just flagging individual files.
Safe Deletion Checklist Before You Clear Duplicates
- Confirm the copy you plan to keep opens correctly and is not corrupted.
- Check that the "keeper" file has the most recent content, not just the most recent modification date (resaving an old file bumps its date).
- For PDFs: verify page counts match if you are comparing a re-download with an original.
- Move to Trash first; do not permanently delete. Give yourself a day before emptying.
- If documents are in iCloud Drive, confirm they are fully downloaded locally before scanning (no cloud icon next to the filename in Finder).
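Parts of this checklist can be enforced in a script. The sketch below covers only the mechanical checks, and it uses a holding folder as a stand-in for the Trash, which is an assumption made to keep the example simple. set_aside_duplicate is a hypothetical helper name. It refuses to act unless the keeper is non-empty and byte-for-byte identical to the duplicate.

```shell
# set_aside_duplicate KEEPER DUPLICATE HOLDING_DIR  (hypothetical helper name)
# Moves DUPLICATE into HOLDING_DIR only if KEEPER exists, is non-empty,
# is a different file on disk, and has exactly the same bytes.
set_aside_duplicate() (
    keeper=$1
    duplicate=$2
    holding=$3
    [ -s "$keeper" ] || { echo "keeper missing or empty: $keeper" >&2; exit 1; }
    [ -f "$duplicate" ] || { echo "duplicate not found: $duplicate" >&2; exit 1; }
    # -ef is true when both paths refer to the same file on disk.
    if [ "$keeper" -ef "$duplicate" ]; then
        echo "both paths are the same file, nothing moved" >&2
        exit 1
    fi
    cmp -s "$keeper" "$duplicate" ||
        { echo "contents differ, leaving in place: $duplicate" >&2; exit 1; }
    mkdir -p "$holding" || exit 1
    target="$holding/$(basename "$duplicate")"
    # Never overwrite something already in the holding folder.
    if [ -e "$target" ]; then
        target="$target.$(date +%Y%m%d%H%M%S)"
    fi
    mv "$duplicate" "$target"
)

# Example usage (the holding folder name is only an illustration):
# set_aside_duplicate ~/Documents/invoice.pdf "$HOME/Downloads/invoice (1).pdf" ~/Duplicates-To-Review
```

The script cannot judge whether the keeper opens correctly or holds the most recent content, so those two checks remain manual.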
Finding duplicate documents on a Mac takes a bit more care than clearing photo duplicates, because versioned filenames fool automated tools. Start with Terminal's md5 commands for exact PDF and Word duplicates, use Finder's Quick Look to triage versioned copies manually, and check the usual hiding spots: Downloads, iCloud Drive, and any old sync folders. If you want a faster pass across your whole document library with type filtering built in, Crumb's duplicate scan is a practical option that keeps you in control of what gets removed.