This is a work-in-progress tool to solve some issues I ran into while collecting the many, many old backups I made over the years. I hate losing memories (be they pictures, old writings, pieces of code, Excel sheets with projects I imagined, notes from friends, etc.), so I kept a vast collection of unusable backups. After many years of trying to sort this out, I decided to give it a try and focus on making something that may actually work. Time will tell :)
The project includes three different executable files: delete_duplicates.py, scan.py and merge.py.
The functionality has been split into separate executables because, while they share some similarities, their goals and modes of operation are very different.
- delete_duplicates.py: The goal of this executable is to process a given source directory and build a list of files that match a set of rules (defined in the should_ignore function). If a file is not ignored and its content has already been found elsewhere in the same session, the script deletes the file (or generates a script to delete it later); see the sketch after this list.
- scan.py: The goal of this executable is to manipulate a database of objects, allowing the user to create a new scan or to work with previously scanned files.
- merge.py: The goal of this executable is to merge the content of a source folder into a target folder, manipulating the content if a file already exists. To do so, the system uses a database of previously seen files and decides, based on it, what to do with objects that have the same content.
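The core of both delete_duplicates.py and merge.py is recognising files with identical content. Below is a minimal sketch of that idea, assuming a content-hash approach; the function names (file_hash, find_duplicates) and the should_ignore signature are illustrative and may not match the project's actual code.

```python
import hashlib
import os


def file_hash(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(source_dir, should_ignore=lambda path: False):
    """Yield (duplicate_path, first_seen_path) pairs found under source_dir."""
    seen = {}  # content hash -> first path seen with that content
    for root, _dirs, files in os.walk(source_dir):
        for name in files:
            path = os.path.join(root, name)
            if should_ignore(path):
                continue
            digest = file_hash(path)
            if digest in seen:
                yield path, seen[digest]
            else:
                seen[digest] = path
```

Files that hash to an already-seen value are treated as duplicates of the first occurrence, which is the decision delete_duplicates.py acts on within a single session.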
The project includes three different versions of the DataStore backend: an in-memory dict, the shelve standard module, and sqlite3.
At the moment, the selection of the backend is not configurable. You can only change it by instantiating the corresponding class (MemoryDataStore, ShelveDataStore, or DataStore). Only DataStore implements all the required functionality so far, but it may be very slow for some large operations.
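As a rough illustration of how the three backends could sit behind a common interface, here is a sketch assuming a simple get/put key-value contract; the method names and the sqlite table layout are assumptions for the example, not the project's real API.

```python
import shelve
import sqlite3


class MemoryDataStore:
    """Backend 1: a plain in-memory dict, lost when the process exits."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)


class ShelveDataStore:
    """Backend 2: the shelve standard module, persisted to a file."""

    def __init__(self, path):
        self._db = shelve.open(path)

    def put(self, key, value):
        self._db[key] = value

    def get(self, key, default=None):
        return self._db.get(key, default)


class DataStore:
    """Backend 3: sqlite3, the most complete but potentially the slowest."""

    def __init__(self, path):
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS objects (key TEXT PRIMARY KEY, value TEXT)"
        )

    def put(self, key, value):
        self._conn.execute(
            "INSERT OR REPLACE INTO objects (key, value) VALUES (?, ?)", (key, value)
        )
        self._conn.commit()

    def get(self, key, default=None):
        row = self._conn.execute(
            "SELECT value FROM objects WHERE key = ?", (key,)
        ).fetchone()
        return row[0] if row else default


# Since the backend is not configurable yet, switching means instantiating a
# different class directly, for example:
store = DataStore("files.db")  # or MemoryDataStore() / ShelveDataStore("files.shelf")
```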