Efficiently delete millions of files on Linux

26 Apr 2020 @ 2:51 PM

TL;DR: cd ~/public_html/var/session/ && ls -1 | wc -l && perl -e 'for(<*>){((stat)[9]<(unlink))}' && ls -1 | wc -l is the fastest, closely followed by rsync: rsync -a --delete /tmp/empty/ ~/public_html/var/session/

I had to delete ~1,700,000 files in a Magento ./var/session/ folder:

the Plesk interface crashed after a few minutes because it didn't have enough memory to list the folder's content;
after ssh-ing into the machine, the classic rm -rf var/session/* also failed after a few minutes with the error: "-sh: /usr/bin/rm: Argument list too long".
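
That error does not come from rm itself: the shell expands var/session/* into one argument per file, and the kernel refuses to start a program whose argument list exceeds its size limit. Here is a minimal sketch of how to look at that limit and stay under it by letting find stream the names to xargs in batches. The scratch directory and the sess_ file names are made up for the demo; they stand in for var/session/.

```shell
# Upper bound (in bytes) on arguments + environment for a single exec call.
arg_max=$(getconf ARG_MAX)
echo "ARG_MAX: $arg_max"

# Hypothetical scratch directory standing in for var/session/.
demo_dir=$(mktemp -d)
i=1
while [ "$i" -le 2000 ]; do
    : > "$demo_dir/sess_$i"
    i=$((i + 1))
done

# find emits NUL-separated names; xargs packs them into as many rm
# invocations as needed, so no single command line grows too long.
find "$demo_dir" -maxdepth 1 -type f -print0 | xargs -0 rm -f

remaining=$(find "$demo_dir" -maxdepth 1 -type f | wc -l)
echo "files left: $remaining"
rmdir "$demo_dir"
```

The point is that no glob is ever expanded by the shell, so the number of files no longer matters.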

Count the initial number of files:

cd ~/public_html/var/session/
ls | wc -l
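
On a folder this large even the count can take a while, because plain ls sorts all names before printing them. A small sketch of two unsorted alternatives, assuming GNU ls and find and a made-up scratch directory:

```shell
# Hypothetical scratch directory with 300 empty files.
demo_dir=$(mktemp -d)
i=1
while [ "$i" -le 300 ]; do
    : > "$demo_dir/sess_$i"
    i=$((i + 1))
done

# ls -f disables sorting and also lists "." and "..", hence the minus 2.
unsorted_count=$(ls -f "$demo_dir" | wc -l)
file_count=$((unsorted_count - 2))

# find counts only regular files directly inside the directory.
find_count=$(find "$demo_dir" -maxdepth 1 -type f | wc -l)

echo "ls -f: $file_count, find: $find_count"
rm -rf "$demo_dir"
```

Both counts go wrong if a file name contains a newline, which should not happen with session files.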

So I started looking for other solutions on the web and came across this Kinamo post, "Efficiently delete a million files on Linux servers", which lists 4 variants:

`-sh-4.2$ rm -rf var/session/*` -> -sh: /usr/bin/rm: Argument list too long
`find /yourmagicmap/* -type f -mtime +3 -exec rm -f {} \;`
`find /yourmagicmap/* -type f -mtime +3 -delete`
`-sh-4.2$ rsync -a --delete /tmp/empty/ var/session/`

Details on those variants:

rm: deleting millions of files is a no-can-do!
find -exec: an option, but slower!
find -delete: fast and easy way to remove loads of files.
rsync --delete: without doubt the quickest!
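
To make the last two variants concrete, here is a sketch that runs them on throwaway directories (all paths are made up; the real target would be var/session/). Two details matter. Passing the directory itself to find, rather than dir/*, keeps the shell from expanding a huge glob. And the rsync flag is --delete with two hyphens, with a trailing slash on both paths so that the contents of the empty directory are mirrored onto the target.

```shell
# Hypothetical scratch directories.
target_find=$(mktemp -d)
target_rsync=$(mktemp -d)
empty_dir=$(mktemp -d)
i=1
while [ "$i" -le 500 ]; do
    : > "$target_find/sess_$i"
    : > "$target_rsync/sess_$i"
    i=$((i + 1))
done

# find -delete: remove regular files directly inside the directory.
find "$target_find" -mindepth 1 -maxdepth 1 -type f -delete

# rsync --delete: everything in the target that is missing from the
# (empty) source gets removed. Skipped here if rsync is not installed.
if command -v rsync >/dev/null 2>&1; then
    rsync -a --delete "$empty_dir"/ "$target_rsync"/
fi

left_find=$(find "$target_find" -type f | wc -l)
left_rsync=$(find "$target_rsync" -type f | wc -l)
echo "left after find: $left_find, left after rsync: $left_rsync"
rm -rf "$target_find" "$target_rsync" "$empty_dir"
```

Add -mtime +3 to the find command, as in the Kinamo variants, to touch only files last modified more than three days ago.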

Besides that post, I found another solution in a Unix StackExchange thread, "Faster way to delete large number of files [duplicate]", which points to the answers of another thread, "Efficiently delete large directory containing thousands of files". The solution was a script that deletes files in batches of 5000:

#!/bin/bash

# Path to folder with many files
FOLDER="/path/to/folder/with/many/files"

# Temporary file to store file names
FILE_FILENAMES="/tmp/filenames"

if [ -z "$FOLDER" ]; then
    echo "Prevented you from deleting everything! Correct your FOLDER variable!"
    exit 1
fi

while true; do
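    # -f lists entries unsorted (faster), but the count includes "." and ".."
    FILES=$(ls -f1 "$FOLDER" | wc -l)
    if [ "$FILES" -gt 10000 ]; then
        printf "[%s] %s files found. going on with removing\n" "$(date)" "$FILES"
        # Create new list of files: skip the first two entries
        # (assumed to be "." and "..") and keep the next 5000
        ls -f1 "$FOLDER" | head -n 5002 | tail -n 5000 > "$FILE_FILENAMES"

        if [ -s "$FILE_FILENAMES" ]; then
            # IFS= and -r keep leading spaces and backslashes in names intact
            while IFS= read -r FILE; do
                rm "$FOLDER/$FILE"
                sleep 0.005
            done < "$FILE_FILENAMES"
        fi
    else
        printf "[%s] script has finished, almost all files have been deleted\n" "$(date)"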
        break
    fi
    sleep 5
done
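
The head/tail pair is the heart of the script: head -n 5002 keeps the first 5002 lines of the listing and tail -n 5000 then keeps the last 5000 of those, i.e. lines 3 through 5002. A tiny sketch with numbered lines standing in for file names shows the window:

```shell
# Stand-in for the output of ls -f1: 6000 numbered lines instead of names.
batch=$(seq 1 6000 | head -n 5002 | tail -n 5000)

first=$(printf '%s\n' "$batch" | head -n 1)
last=$(printf '%s\n' "$batch" | tail -n 1)
count=$(printf '%s\n' "$batch" | wc -l)

# The window covers lines 3..5002 of the listing: 5000 entries in total.
echo "first=$first last=$last count=$count"
```

Skipping the first two lines only drops "." and ".." if ls -f happens to print them first, which is common but not guaranteed. Note also that the loop stops once fewer than 10000 entries remain, hence "almost all" in the final message.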

Stats for my test:

batch-delete-script: 5000 files / 43 seconds -> ~116 files/s
rsync: 50,000 files / 6 seconds -> ~8,300 files/s !!

Of these two, rsync is clearly the faster way to delete a huge number of files!
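
The files-per-second figures are simply the file count divided by the elapsed seconds. For anyone who wants to repeat the comparison, here is a rough sketch of a timing harness; measure and delete_with_find are hypothetical helper names, the file count is arbitrary, and empty files on a scratch directory will not behave exactly like real session files on a busy server.

```shell
# Hypothetical helper: create N empty files in a scratch directory,
# time one deletion command and report the elapsed whole seconds.
measure() {
    n=$1
    shift
    dir=$(mktemp -d)
    i=1
    while [ "$i" -le "$n" ]; do
        : > "$dir/f_$i"
        i=$((i + 1))
    done
    start=$(date +%s)
    "$@" "$dir"    # the deletion command gets the directory as last argument
    end=$(date +%s)
    left=$(find "$dir" -type f | wc -l)
    echo "$n files, $((end - start)) s, $left left: $*"
    rm -rf "$dir"
}

# One candidate to measure; others can be wrapped the same way.
delete_with_find() {
    find "$1" -mindepth 1 -maxdepth 1 -type f -delete
}

measure 2000 delete_with_find
```

With only whole-second resolution the file count has to be large enough for the run to take several seconds before the numbers mean anything.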

PS. Another answer in one of the StackExchange threads above claimed that a Perl one-liner would be even faster:

cd ~/public_html/var/session/ && ls -1 | wc -l && perl -e 'for(<*>){((stat)[9]<(unlink))}' && ls -1 | wc -l

I tested it and it was indeed faster.
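
As far as I can read it, the one-liner works like this: <*> globs every non-hidden name in the current directory, the for loop puts each name in $_, (stat)[9] reads that file's modification time and unlink deletes it. The < comparison only serves to evaluate both sides; its result is thrown away. A sketch on a made-up scratch directory, skipped if perl is not installed:

```shell
# Hypothetical scratch directory standing in for var/session/.
demo_dir=$(mktemp -d)
i=1
while [ "$i" -le 500 ]; do
    : > "$demo_dir/sess_$i"
    i=$((i + 1))
done

if command -v perl >/dev/null 2>&1; then
    # Run inside the directory, because <*> globs the current directory.
    # A shorter form that should behave the same: perl -e 'unlink for <*>'
    ( cd "$demo_dir" && perl -e 'for(<*>){((stat)[9]<(unlink))}' )
fi

left=$(find "$demo_dir" -type f | wc -l)
echo "files left: $left"
rm -rf "$demo_dir"
```

Since * does not match names starting with a dot, hidden files stay in place.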

Posted By: Teodor Muraru
Last Edit: 29 Jan 2021 @ 01:50 PM


Categories: Linux, Technology
