Projects tagged ‘archive’ and ‘python’



Filtered by project tags: archive, python

Refine results by project tag: backup (7), incremental (3), tar (3), rsync (2), document (2), s3 (2), scanner (1), wxpython (1), versioncontrol (1), gnupg (1), remote (1), list (1)

[15 total ]

19 Users
   

Duplicity backs up directories by producing encrypted tar-format volumes and uploading them to a remote or local file server. Because duplicity uses librsync, the incremental archives are space efficient and only record the parts of files that have changed since the last backup. Because duplicity uses GnuPG to encrypt and/or sign these archives, they will be safe from spying and/or modification by the server. The duplicity package also includes the rdiffdir utility. Rdiffdir is an extension of librsync's rdiff to directories: it can be used to produce signatures and deltas of directories as well as regular files. These signatures and deltas are in GNU tar format.
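The idea of recording only the changed parts of a file can be illustrated with a much simpler scheme than the one librsync uses. The sketch below is an assumption-laden toy: it hashes fixed-size blocks and reports the blocks whose hash differs, whereas librsync uses rolling checksums that also cope with inserted or shifted data. The names `signature`, `delta` and `BLOCK_SIZE` are hypothetical and are not part of duplicity or librsync.

```python
from __future__ import annotations

import hashlib

BLOCK_SIZE = 4  # deliberately tiny so the example is easy to follow


def signature(data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """Return one SHA-256 hex digest per fixed-size block of data."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def delta(old_sig: list[str], new_data: bytes,
          block_size: int = BLOCK_SIZE) -> dict[int, bytes]:
    """Map block index -> new block bytes for every block that differs.

    Only the old signature is needed, not the old data, which is why a
    backup tool can keep signatures locally and the archives remotely.
    """
    changed = {}
    for index, digest in enumerate(signature(new_data, block_size)):
        if index >= len(old_sig) or old_sig[index] != digest:
            start = index * block_size
            changed[index] = new_data[start:start + block_size]
    return changed
```

For example, `delta(signature(b"aaaabbbbcccc"), b"aaaaXXXXcccc")` contains only block 1. A real tool would also have to record the new file length so that truncation can be replayed; this sketch leaves that out.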
Created over 3 years ago.

5 Users

CDS Invenio (formerly CDSware) is a suite of applications that provides the framework and tools for building and managing an autonomous digital library server. It complies with the Open Archives Initiative metadata harvesting protocol (OAI-PMH) and uses MARC 21 as its underlying bibliographic standard. Its flexibility and performance make it a comprehensive solution for the management of document repositories of moderate to large size.
Created about 1 year ago.

1 Users
 

Manent is an algorithmically strong backup and archival program. It features: Efficient backup to anything that looks like storage. Currently it supports plain filesystem ("directory"), ftp and sftp. Planned are Amazon S3, optical disks and email (smtp + imap). Manent can work (making progress towards finishing a backup) over a slow and unreliable network. Manent can offer online access to the contents of the backup. Currently, local FTP serving is being worked on; in the future, FUSE support for Linux will be added. Backed up storage is completely encrypted. Backup is incremental, including changed parts of large files. Moved, renamed and duplicate files will not require additional storage. Several computers can use the same storage for backup, automatically sharing data. Both very l...
Created 12 months ago.

1 Users

sync2cd is an incremental archiving tool. It allows backing up complete filesystem hierarchies to multiple backup media (e.g. CD-R). Files are archived incrementally, i.e. only new or changed files are stored during an archive operation. All entity types are supported: directories, files, symlinks, named pipes, sockets, block and character devices.
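Selecting "only new or changed files" can be sketched as a comparison between two snapshots of a directory tree. This is an illustration under stated assumptions, not sync2cd's actual method: it treats a file as changed when its size or modification time differs, it looks at regular files only, and the names `snapshot` and `files_to_archive` are hypothetical.

```python
import os
import stat


def snapshot(root):
    """Map each regular file's path (relative to root) to (size, mtime_ns)."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            info = os.lstat(full)  # lstat: do not follow symlinks
            if stat.S_ISREG(info.st_mode):
                rel = os.path.relpath(full, root)
                state[rel] = (info.st_size, info.st_mtime_ns)
    return state


def files_to_archive(previous, current):
    """Return sorted paths that are new or whose (size, mtime) changed."""
    return sorted(
        path for path, meta in current.items()
        if previous.get(path) != meta
    )
```

A tool built this way would persist the result of `snapshot()` after each archive operation and pass it as `previous` on the next run.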
Created about 1 year ago.

1 Users

We are aiming for a comprehensive scalable software solution for electronic medical practice with emphasis on privacy protection, secure patient centric record sharing, decision support and ease of use.
Created over 3 years ago.

1 Users

fwbackups is a feature-rich user backup program that allows you to back up your documents anytime, anywhere. fwbackups offers a simple but powerful interface and supports multiple scheduled backups, on-demand backups as well as restores. fwbackups can back up to a local disk or alternatively to another host using SFTP.
Created 11 months ago.

0 Users

Python module to archive files based on date. Currently only supports text files but more files will be added in the future.
Created 11 months ago.

0 Users

DebPPA (Debian PPA) is the same as the Ubuntu PPA, but for Debian. It's in development with the aim of having the exact same functionality as the Ubuntu PPA. Please follow the links in my initial email for the motivation and goals of this project. See the Installation wiki page (on the right panel) for documentation. I am currently working on http://wiki.debian.org/svnbuildstat and I will merge debppa with svnbuildstat. This will provide the service we need and much more. Please join us as well. Please report all bugs (in the Issues, or to the mailing list).
Created 12 months ago.

0 Users

'baubau' is a multithreaded backup tool written in Python. It aims to create a backup of all unrecoverable files which either don't belong to any package or have been modified. The logic behind it is fairly simple:

- the main loop walks the file system searching for files
- a second thread checks whether the file belongs to a package (using information from rpmdb)
- if the file belongs to a package, a third thread compares it against the rpm database (size and eventually md5 checksum)
- all modified files and new files are listed to a file (and packed into a tar archive on the fly if requested)

To avoid including unnecessary files and thus save space, baubau uses regular expressions to force the exclusion or inclusion of certain files (for instance media, log and lock files are excluded in /etc/baubau/exclude_files). Generally speaking, the resulting archive will contain all files which cannot be restored by simply reinstalling packages. For instance:

- your home directory
- all the configuration files you have modified
- your log files

baubau will create a directory in your home directory (-d to specify the directory):

    [root@navid-laptop ~]# ls -l /root/baubau-20070313-131511
    total 1892
    -rw-r--r-- 1 root root 0 Mar 13 13:15 excluded_files
    -rw-r--r-- 1 root root 278 Mar 13 13:15 excluded_files_regexp
    -rw-r--r-- 1 root root 0 Mar 13 13:15 excluded_pkg_files
    -rw-r--r-- 1 root root 1900544 Mar 13 13:15 included_files
    -rw-r--r-- 1 root root 0 Mar 13 13:15 included_files_regexp
    -rw-r--r-- 1 root root 25425 Mar 13 13:15 rpm-qa

This directory will also be included in the tarball archive if you decide to let baubau create one for you (-z option). Ideally, in order to fully restore from backup all you have to do is reinstall the rpm packages and then extract the archive produced by 'baubau' over the root file system.
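The walk-and-filter part of the logic above can be sketched in a few lines. This is only an illustration: the patterns in `EXCLUDE_PATTERNS` are hypothetical stand-ins for whatever /etc/baubau/exclude_files contains, the function names are invented here, and the package check against the rpm database and the threading are left out entirely.

```python
import os
import re

# Hypothetical exclusion patterns for log, lock and media files.
EXCLUDE_PATTERNS = [r"\.log$", r"\.lock$", r"\.(mp3|avi|jpg)$"]


def compile_patterns(patterns):
    """Compile each regular expression once, up front."""
    return [re.compile(pattern) for pattern in patterns]


def is_excluded(path, compiled):
    """Return True if any exclusion pattern matches the path."""
    return any(regex.search(path) for regex in compiled)


def candidate_files(root, compiled):
    """Walk root and yield every file path that is not excluded."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not is_excluded(path, compiled):
                yield path
```

In a tool like the one described, each path yielded by `candidate_files()` would then be handed to the package check before being written to the list of included files.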
Created 12 months ago.

0 Users

About

ArchiveFS is a FUSE file system used for archiving and backup. Its primary function is to ensure that multiple copies of a file are only represented as a single file. The representation of the file system is intentionally kept simple and consists just of a single SQLite3 database file and table (which can be dumped into a text file), together with a directory full of files. The file system is not intended for general purpose computing, but mostly for copying data in and out. It seems to be working reasonably well for backup, and even file system intensive operations like software builds seem to complete OK. Please give it a good try and workout, but don't blame me if you lose any data.

Usage

Just check out the source code. You do need the python-fuse and python-sqlite3 packages (Ubuntu) or their equivalents. To start it up, use a command like:

    $ python archivefs.py -o root=/somewhere/FSDATA /my/mountpoint
    $ echo hello world > /my/mountpoint/new-file
    $ cat /my/mountpoint/new-file

The root directory must exist and be writable by you. The root directory contains the database file (DB), a working directory for temporary files (WORKING), and an archival directory containing the actual, permanent files (ARCHIVE). The file system will create those if they don't already exist.

When you're done, you should unmount the directory as usual:

    $ fusermount -u /my/mountpoint

It's intended to be used with something like:

    cp -av /home/tmb /backup/tmb-$(date)

You can get some file metadata via getfattr and attr:

    attr -g _id file -- the unique file id
    attr -g _storage file -- the path to the actual file
    attr -g _instances file -- a list of all paths referring to this content

Note the following points:

- file permissions aren't enforced (but are recorded)
- link counts are not preserved
- deleting a file only deletes its entry, it doesn't recover the space automatically

There are a number of things I can't find good documentation for and that I therefore don't quite understand in fuse-python:

- hardlinks and concurrent updates through different paths
- the degree of threading (apparently, not much, but enough to cause occasional problems)
- how mmap is handled

You can reconstruct a directory tree easily from an md5sum dump and the contents of the archive disk; you don't need FUSE. To create such a dump manually, just write:

    $ find . -type f -print0 | xargs -0 md5sum > my.md5sums

(I'll upload some scripts for this at some point.)

History

This code replaces (and is based on) a bunch of shell scripts I've been using for backup for a couple of decades that also used checksums for storage but stored the mapping in a plain text file. A file system is nicer than the scripts because it's possible not only to copy into the archival tree, but also to untar tar files in it directly, copy data in remotely, etc. With FUSE, it's finally easy and portable enough to do this (last time I looked into doing this, it still required a lot of painful kernel-level C programming).

Internals

It's written in Python using the python-fuse package. The representation of the file system is pretty simple:

    root/DB -- sqlite3 database file containing metadata and ids
    root/ARCHIVE/xx/yy/xxyyzzzzz... -- the actual content, stored by id; to keep directory size down, this has two levels of directories
    root/WORKING/zzzzzzzz... -- temporary working files

TODO

There are a bunch of things to be done:

important:

- clean up the code
- write a text file dumper for the database
- smart command line tools for local and remote copies/sync
- garbage collecting defunct working files on startup
- garbage collecting defunct archival files on demand (after a big removal)
- automatic garbage collection of defunct archival files upon deletion
- add metadata handling and search
- record checksum and discard) well-known checksums (just transparent gzip compression/decompression of chunks would be nice
- record-and-discard well-known checksum (can retrieve from the web, maybe store URL)
- by file name
- by mime type
- separate directory and file name columns to make dir listings faster
- tokenize directory names to save space
- id available via extended attribute
- speed it up by caching and other tricks
- better multithreading (maybe port to IronPython)
- record user ids in text form and resolve at runtime
- fix global scope for fs variable
- transparently handle files inside archives
- write a test suite and perform more extensive testing
- perform explicit in-memory buffering for checksumming and copying
- use a larger checksum to make collisions less likely
- add non-FUSE command line tools for storing and accessing the data
- handle extended attributes
- tools for reporting logical vs physical usage
- move small file operations in memory
- transparent mounting of the underlying file system

long term ideas (maybe a different project):

- handle file parts by partitioning files at type-dependent boundaries, e.g., paragraph boundaries, MP3 chunks, mbox message boundaries, etc.
- transparently disassemble and assemble archive formats
- S3 backend
- stick very small files into the database
- distributed storage across disks
- distributed storage across the network
- change tracking
- time-machine like functionality, i.e. represent trees at different points in time explicitly
- also saves database space for frequent backups
- this needs to have a notion of a completed checkpoint, so...

    archivefs-open-replica old-tree new-tree
    rsync ... source new-tree
    archivefs-close-replica new-tree old-tree
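The ARCHIVE layout described under Internals (content stored by id beneath two levels of directories) can be sketched as follows. This is a minimal illustration, not ArchiveFS's code: it assumes the id is the MD5 hex digest of the file content, which the md5sum-based dump suggests but the description does not state outright, and the function names are hypothetical.

```python
import hashlib
import os
import shutil


def content_id(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_path(root, file_id):
    """Build root/ARCHIVE/xx/yy/<id> from the first four id characters."""
    return os.path.join(root, "ARCHIVE", file_id[:2], file_id[2:4], file_id)


def store(root, path):
    """Copy a file into the archive unless that content is already there.

    Identical content maps to the same id, so duplicate files take no
    additional space.
    """
    file_id = content_id(path)
    destination = archive_path(root, file_id)
    if not os.path.exists(destination):
        os.makedirs(os.path.dirname(destination), exist_ok=True)
        shutil.copyfile(path, destination)
    return file_id
```

The mapping from user-visible paths to ids, which ArchiveFS keeps in its SQLite3 database, is omitted here.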
Created 2 months ago.