Efficiently delete millions of files on Linux
TL;DR: cd ~/public_html/var/session/ && ls -1 | wc -l && perl -e 'for(<*>){((stat)[9]<(unlink))}' && ls -1 | wc -l was the fastest in my test, with rsync a close second: rsync -a --delete /tmp/empty/ ~/public_html/var/session/
I had to delete ~1,700,000 files in a Magento ./var/session/ folder.
- the Plesk interface crashed after a few minutes because it ran out of memory while listing the folder's contents
- after ssh-ing into the machine, the classic rm -rf var/session/* also failed after a few minutes with the error: "-sh: /usr/bin/rm: Argument list too long"
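The "Argument list too long" error comes from the kernel rather than from rm: the shell expands var/session/* into one argument per file, the resulting command line exceeds the system limit on argument size, and rm is never started. As a quick check, the limit can be queried like this (the variable name arg_max is only for illustration):

```shell
# Upper limit, in bytes, on the combined size of the arguments and
# environment that can be passed to a newly executed program.
arg_max=$(getconf ARG_MAX)
echo "ARG_MAX is $arg_max bytes"
```

With 1.7 million session file names, the expanded list is far larger than this value on a typical system.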
Count the initial number of files:
cd ~/public_html/var/session/
ls | wc -l
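With this many entries, plain ls reads and sorts every name before printing anything. A rough alternative, sketched here on a throwaway directory created with mktemp (the sess_* names are made up), is to let find stream the entries unsorted:

```shell
# Demo directory so the sketch is safe to run; in practice "dir" would be
# the real session folder.
dir=$(mktemp -d)
touch "$dir/sess_a" "$dir/sess_b" "$dir/sess_c"

# find prints entries in directory order, without sorting them first.
# -maxdepth 1 keeps it from descending into subdirectories.
count=$(find "$dir" -maxdepth 1 -type f | wc -l)
echo "files: $count"
```

A file name containing a newline would be counted more than once, which should not matter for session files.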
So I started looking for other solutions on the web and came across this Kinamo post, "Efficiently delete a million files on Linux servers", which lists 4 variants:
- `-sh-4.2$ rm -rf var/session/*` -> -sh: /usr/bin/rm: Argument list too long
- `find /yourmagicmap/* -type f -mtime +3 -exec rm -f {} \;`
- `find /yourmagicmap/* -type f -mtime +3 -delete`
- `-sh-4.2$ rsync -a --delete /tmp/empty/ var/session/`
Details on those variants:
- rm: deleting millions of files is a no-can-do!
- find -exec: an option, but slower!
- find -delete: fast and easy way to remove loads of files.
- rsync --delete: without doubt the quickest!
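One caveat with the find variants as quoted: /yourmagicmap/* is still expanded by the shell, so on a huge folder it may hit the same argument limit as rm. A minimal sketch that passes the directory itself instead, shown on a throwaway directory (the names are placeholders, and the -mtime +3 age filter is left out so the freshly created demo files match):

```shell
# Demo directory with a few fake session files.
target=$(mktemp -d)
touch "$target/sess_1" "$target/sess_2" "$target/sess_3"

# No trailing /* here: find reads the directory entries itself, so the
# shell never has to build one giant argument list.
find "$target" -type f -delete

left=$(find "$target" -type f | wc -l)
echo "files left: $left"
```

The directory itself is kept; only the regular files inside it are removed.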
Besides that post, I found another solution in a Unix StackExchange thread, "Faster way to delete large number of files [duplicate]", whose answers point to a second thread, "Efficiently delete large directory containing thousands of files". The solution was a script that deletes files in batches of 5000:
#!/bin/bash
# Path to folder with many files
FOLDER="/path/to/folder/with/many/files"
# Temporary file to store file names
FILE_FILENAMES="/tmp/filenames"
if [ -z "$FOLDER" ]; then
    echo "Prevented you from deleting everything! Correct your FOLDER variable!"
    exit 1
fi
while true; do
    # Unsorted entry count; includes "." and ".."
    FILES=$(ls -f1 "$FOLDER" | wc -l)
    if [ "$FILES" -gt 10000 ]; then
        printf "[%s] %s files found. going on with removing\n" "$(date)" "$FILES"
        # Create new list of files: skip the first two entries (usually "." and ".."), keep the next 5000
        ls -f1 "$FOLDER" | head -n 5002 | tail -n 5000 > "$FILE_FILENAMES"
        if [ -s "$FILE_FILENAMES" ]; then
            # read -r keeps backslashes in file names intact
            while IFS= read -r FILE; do
                rm -- "$FOLDER/$FILE"
                sleep 0.005
            done < "$FILE_FILENAMES"
        fi
    else
        printf "[%s] script has finished, almost all files have been deleted\n" "$(date)"
        break
    fi
    sleep 5
done
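The same batching idea can be expressed more compactly with find and xargs. This is a different technique from the script above, not a rewrite of it: a sketch that assumes a find and xargs supporting -print0 and -0 (GNU and BSD both do) and that runs against a throwaway directory with made-up file names. It hands rm at most 5000 names per invocation:

```shell
# Demo directory with a handful of fake session files.
dir=$(mktemp -d)
for i in 1 2 3 4 5 6 7; do
    touch "$dir/sess_$i"
done

# -print0 / -0 pass names NUL-separated, so spaces or newlines in file
# names are safe; -n 5000 caps the number of names given to each rm call.
find "$dir" -maxdepth 1 -type f -print0 | xargs -0 -n 5000 rm -f --

left=$(find "$dir" -maxdepth 1 -type f | wc -l)
echo "files left: $left"
```

Unlike the script, this version does not sleep between batches, so it does nothing to throttle disk load.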
Stats for my test:
- batch-delete-script: 5000 files / 43 seconds -> ~116 files/s
- rsync: 50,000 files / 6 seconds -> ~8,300 files/s!
Of these two, rsync is clearly the faster way to delete a huge number of files!
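To reproduce this kind of measurement without touching real data, something along these lines could be used. It is only a sketch: the directories come from mktemp, the file count n and the sess_* names are arbitrary, and timing with date +%s is accurate only to the second:

```shell
# Throwaway target folder plus the empty folder rsync will mirror onto it.
target=$(mktemp -d)
empty=$(mktemp -d)

# Create n empty demo files.
n=2000
i=0
while [ "$i" -lt "$n" ]; do
    : > "$target/sess_$i"
    i=$((i + 1))
done

start=$(date +%s)
if command -v rsync >/dev/null 2>&1; then
    # Make "target" identical to the empty folder, i.e. delete its contents.
    # The trailing slashes matter: they mean "the contents of" each folder.
    rsync -a --delete "$empty/" "$target/"
else
    echo "rsync is not installed; nothing deleted" >&2
fi
end=$(date +%s)

remaining=$(find "$target" -mindepth 1 | wc -l)
echo "deleted $((n - remaining)) files in $((end - start)) s"
```

A few thousand files finish too quickly for a meaningful rate; n needs to be much larger to see numbers comparable to the ones above.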
PS. One of the StackExchange threads above also had an answer claiming that a Perl one-liner would be even faster:
cd ~/public_html/var/session/ && ls -1 | wc -l && perl -e 'for(<*>){((stat)[9]<(unlink))}' && ls -1 | wc -l
I tested it and it really was faster.
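For anyone puzzling over the one-liner: <*> globs every non-hidden name in the current directory, and unlink with no argument deletes the name currently held in $_. The comparison with (stat)[9] (the modification time) is computed and thrown away. A more readable sketch of the same idea follows. It is not the thread's code: it swaps the glob for opendir/readdir, so names are streamed rather than collected first, and it runs on a throwaway directory with made-up file names:

```shell
# Demo directory with a few fake session files.
dir=$(mktemp -d)
touch "$dir/sess_1" "$dir/sess_2" "$dir/sess_3"

# The directory is passed as an argument instead of relying on cd.
perl -e '
    my $dir = shift;
    opendir(my $dh, $dir) or die "cannot open $dir: $!";
    while (defined(my $name = readdir $dh)) {
        # Skip the directory itself and its parent.
        next if $name eq "." or $name eq "..";
        # unlink removes files; on a subdirectory it simply fails.
        unlink "$dir/$name";
    }
    closedir $dh;
' "$dir"

left_after_perl=$(find "$dir" -mindepth 1 | wc -l)
echo "entries left: $left_after_perl"
```

Unlike the glob version, readdir also returns hidden files (names starting with a dot), so this variant removes those too.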