bitrot

Detects bit rotten files on the hard drive to save your precious photo and music collection from slow decay.

Installation

Windows:
pip3 install .
Linux:
python3 setup.py install

Usage

Go to the desired directory and simply invoke:

$ bitrot

This will start digging through your directory structure recursively, indexing all files found. The index is stored in a ``.bitrot.db`` file, which is a SQLite 3 database.

Next time you run ``bitrot``, it will add new files and update the index for files with a changed modification date. Most importantly, however, it will report all errors, e.g. files that changed on the hard drive but still have the same modification date.
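
The decision described above can be sketched as a small pure function. This is only an illustration, not bitrot's actual code: the ``Entry`` record, the ``classify`` name, and the status strings are assumptions made for the example.

```python
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    """Hypothetical index record: modification time and content hash."""
    mtime: int
    sha1: str


def classify(stored: Optional[Entry], current: Entry) -> str:
    """Compare the indexed state of a file with its state on disk."""
    if stored is None:
        return "new"      # not in the index yet: add it
    if stored.mtime != current.mtime:
        return "updated"  # modification date changed: refresh the index
    if stored.sha1 != current.sha1:
        return "error"    # same date, different content: suspected bit rot
    return "ok"           # nothing changed
```

Only the ``"error"`` case is a problem worth reporting: the content differs although nothing claims to have modified the file.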

All paths stored in ``.bitrot.db`` are relative, so it's safe to rescan a folder after moving it to another drive. Just remember to move it in a way that doesn't touch modification dates; otherwise the checksum database is useless.
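
To see why relative paths make the index portable, here is a minimal sketch of a SQLite-backed index. The table layout and the ``open_index`` and ``record`` helpers are assumptions for illustration and may not match the schema bitrot really uses.

```python
import os
import sqlite3


def open_index(db_path):
    """Open (or create) the index. The table layout is an assumption."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        "path TEXT PRIMARY KEY, mtime INTEGER, hash TEXT)"
    )
    return conn


def record(conn, root, abs_path, mtime, digest):
    """Store one file under its path relative to the scanned root."""
    rel = os.path.relpath(abs_path, root)
    conn.execute(
        "INSERT OR REPLACE INTO files VALUES (?, ?, ?)", (rel, mtime, digest)
    )
    return rel
```

Because only the part of the path below the scanned root is stored, the same rows stay valid when the whole folder moves to a different mount point.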

Performance

Performance obviously depends on how fast the underlying drive is. Historically the script was single-threaded because back in 2013 checksum calculations on a single core still outran typical drives, including the mobile SSDs of the day. In 2020 this is no longer the case, so the script now uses a process pool to calculate SHA1 hashes and perform ``stat()`` calls.
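
The process pool approach can be sketched with the standard library as follows. The function names, the 16 KiB chunk size, and the serial path taken when ``workers=1`` are assumptions for this example rather than a description of bitrot's internals.

```python
import hashlib
import os
from concurrent.futures import ProcessPoolExecutor


def sha1_and_stat(path, chunk_size=16384):
    """Return (path, mtime, SHA1 hex digest) for one file."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        # Read in fixed-size chunks so large files never sit fully in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return path, int(os.stat(path).st_mtime), digest.hexdigest()


def scan(paths, workers=None):
    """Hash files in parallel; workers=1 runs serially in this process."""
    if workers == 1:
        return [sha1_and_stat(p) for p in paths]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so results line up with paths.
        return list(pool.map(sha1_and_stat, paths))
```

On platforms that spawn worker processes instead of forking them, call ``scan()`` from under an ``if __name__ == "__main__":`` guard.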

No rigorous performance tests have been done. Scanning a ~1000 file directory totalling ~5 GB takes 2.2 seconds on a 2018 MacBook Pro 15" with an AP0512M SSD. Back in 2013, that same feat on a 2015 MacBook Air with an SM0256G SSD took over 20 seconds.

On that same 2018 MacBook Pro 15", scanning a 60+ GB music library takes 24 seconds. Back in 2013, with a typical 5400 RPM laptop hard drive it took around 15 minutes. How times have changed!

Tests

There's a simple but comprehensive test scenario using pytest and pytest-order <https://pypi.org/p/pytest-order>.

Install:

$ python3 -m venv .venv
$ . .venv/bin/activate
(.venv)$ pip install -e .[test]

Run:

(.venv)$ pytest -x
==================== test session starts ====================
platform darwin -- Python 3.10.12, pytest-7.4.0, pluggy-1.2.0
rootdir: /Users/ambv/Documents/Python/bitrot
plugins: order-1.1.0
collected 12 items

tests/test_bitrot.py ............                      [100%]

==================== 12 passed in 15.05s ====================

Change Log

1.0.2

  • Integration with Healthchecks.io, Pushover, PushBullet
  • Officially remove Python 2 support, which had been broken since 1.0.0 anyway; the package now requires Python 3.9+ because it relies on a few features of that version

1.0.1

  • Can now include hidden files with --hidden

1.0.0

  • Significantly sped up execution on solid state drives by using a process pool executor to calculate SHA1 hashes and perform stat() calls; use -w1 if your runs on slow magnetic drives were negatively affected by this change

0.9.4

  • Added better progress bar (pip3 install progressbar2)
  • Can now specify source and destination directories

0.9.3

  • Added option to ignore date modified (only checks hashes). Great for verifying backups for integrity (File Integrity Monitoring) using -t 2 or --test 2
  • Added option to test only recently modified data (default: the last 1 day) using -r or --recent; great for checking the integrity of a backup you just synced
  • Added database vacuuming to shrink the DB on the hard drive once hashes of missing files have been removed
  • Added logging to file using -g or --log (on by default)
  • Added email support for hash mismatch using -e or --email (on by default)
  • Added a time elapsed counter
  • Can now fix files that have invalid modification date, and rename files/dirs that have bad chars in name, using -f or --fix (dangerous)
  • Total size now printed in B, KB, MB, GB, TB
  • Can now include and exclude at the same time, with fixed logic. Exclude takes precedence
  • Added ability to specify the hash function from the command line using -a or --hashing-function. I found SHA512 to be just as fast as SHA1 on my machine
  • Fixed bug when file doesn't have a valid modification timestamp
  • Can now create MD5 or SFV files using -c or --sfv
  • Now prints out ignored files with verbosity level 4
  • Fixes for invalid characters in file names
  • Integrates benshep's and liloman's latest changes
  • Better warning printing
  • Added grammar fixes
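
Choosing the hash function by name, as the 0.9.3 entry describes, maps naturally onto ``hashlib.new()``. The helper below is a hypothetical sketch, not the code behind ``--hashing-function``.

```python
import hashlib


def hexdigest(data: bytes, algorithm: str = "sha1") -> str:
    """Hash data with the named algorithm; unknown names raise ValueError."""
    return hashlib.new(algorithm, data).hexdigest()


if __name__ == "__main__":
    print(hexdigest(b"precious photo", "sha1"))
    print(hexdigest(b"precious photo", "sha512"))
```

A SHA512 digest is 128 hex characters against 40 for SHA1, so switching algorithms makes the stored hashes longer.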

0.9.2

  • bugfix: one place in the code incorrectly hardcoded UTF-8 as the filesystem encoding

0.9.1

  • bugfix: print the path that failed to decode with FSENCODING
  • bugfix: when using -q, don't hide warnings about files that can't be statted or read
  • bugfix: -s is no longer broken on Python 3

0.9.0

  • bugfix: bitrot.db checksum checking messages now obey --quiet
  • Python 3 compatibility

0.8.0

  • bitrot now keeps track of its own database's bitrot by storing a checksum of .bitrot.db in .bitrot.sha512
  • bugfix: now properly uses the filesystem encoding to decode file names for use with the .bitrot.db database. Report and original patch by pallinger.
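
The self-check introduced in 0.8.0 could look roughly like the sketch below. The helper names are hypothetical, and treating a missing checksum file as "intact" is an assumption made to keep a first run simple.

```python
import hashlib
from pathlib import Path


def db_digest(db_path):
    """SHA512 of the whole database file, read in one go."""
    return hashlib.sha512(Path(db_path).read_bytes()).hexdigest()


def write_db_checksum(db_path, sum_path):
    """Store the database checksum next to it, e.g. in .bitrot.sha512."""
    Path(sum_path).write_text(db_digest(db_path))


def db_is_intact(db_path, sum_path):
    """True if the database matches its stored checksum (or none exists)."""
    sum_file = Path(sum_path)
    if not sum_file.exists():
        return True  # assumption: nothing to compare against on first run
    return sum_file.read_text().strip() == db_digest(db_path)
```

The checksum would be rewritten after every successful update of the database and verified before the next scan trusts it.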

0.7.1

  • bugfix: SHA1 computation now works correctly on Windows; previously files were opened in text mode. This fix will change hashes of files containing some specific bytes like 0x1A.

0.7.0

  • when a file changes or is renamed, the timestamp of the last check is updated, too
  • bugfix: files that disappeared during the run are now properly ignored
  • bugfix: files that are locked or with otherwise denied access are skipped. If they were read before, they will be considered "missing" in the report.
  • bugfix: if there are multiple files with the same content in the scanned directory tree, renames are now handled properly for them
  • refactored some horrible code to be a little less horrible
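
Rename handling in the presence of duplicate content can be illustrated like this. It is a sketch of one possible pairing strategy, with a made-up function name and return shape, not the algorithm bitrot ships.

```python
from collections import defaultdict


def detect_renames(stored, current):
    """Pair vanished paths with new paths that carry the same hash.

    Both arguments map relative path -> hash.
    Returns (renames, missing, new).
    """
    missing_by_hash = defaultdict(list)
    for path in sorted(set(stored) - set(current)):
        missing_by_hash[stored[path]].append(path)

    renames, new = [], []
    for path in sorted(set(current) - set(stored)):
        candidates = missing_by_hash.get(current[path])
        if candidates:
            # Consume one old path per new path so duplicates pair one-to-one.
            renames.append((candidates.pop(0), path))
        else:
            new.append(path)

    missing = sorted(p for paths in missing_by_hash.values() for p in paths)
    return renames, missing, new
```

Keeping a list of old paths per hash, instead of a single path, is what lets several identical files be renamed in the same run without one pairing swallowing the others.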

0.6.0

  • more control over performance with ``--commit-interval`` and ``--chunk-size`` command-line arguments
  • bugfix: symbolic links are now properly skipped (or can be followed if ``--follow-links`` is passed)
  • bugfix: files that cannot be opened are now gracefully skipped
  • bugfix: fixed a rare division by zero when run in an empty directory
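
The idea behind a commit interval is to batch database writes rather than commit after every file. The class below is a hypothetical sketch that assumes the interval is a number of seconds between commits; the injectable ``clock`` exists only to make the behavior easy to test.

```python
import sqlite3
import time


class BatchedWriter:
    """Commit at most once per `interval` seconds (0 commits every write)."""

    def __init__(self, conn, interval, clock=time.monotonic):
        self.conn = conn
        self.interval = interval
        self.clock = clock
        self.last_commit = clock()
        self.commits = 0

    def execute(self, sql, params=()):
        self.conn.execute(sql, params)
        if self.clock() - self.last_commit >= self.interval:
            self.commit()

    def commit(self):
        self.conn.commit()
        self.last_commit = self.clock()
        self.commits += 1
```

A longer interval means fewer disk syncs during a scan, at the cost of redoing more work if the run is interrupted; a final ``commit()`` at the end flushes whatever is still pending.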

0.5.1

  • bugfix: warn about test mode only in test mode

0.5.0

  • ``--test`` command-line argument for testing the state without updating the database on disk (works for testing databases you don't have write access to)
  • size of the data read is reported upon finish
  • minor performance updates

0.4.0

  • renames are now reported as such
  • all non-regular files (e.g. symbolic links, pipes, sockets) are now skipped
  • progress presented in percentage

0.3.0

  • ``--sum`` command-line argument for easy comparison of multiple databases

0.2.1

  • fixed regression from 0.2.0 where new files caused a ``KeyError`` exception

0.2.0

  • ``--verbose`` and ``--quiet`` command-line arguments
  • if a file is no longer there, its entry is removed from the database

0.1.0

  • First published version.

Authors

Glued together by Lukasz Langa. Multiple improvements by Ben Shepherd, Jean-Louis Fuchs, Marcus Linderoth, p1r473, Peter Hofmann, Phil Lundrigan, Reid Williams, Stan Senotrusov, Yang Zhang, and Zhuoyun Wei.