
S3 Diff Archive

A powerful, efficient command-line tool for incremental backup and archiving of files to Amazon S3. This tool performs differential backups by only archiving files that have changed since the last backup, making it ideal for large datasets where full backups would be inefficient.

🚀 Features

  • Incremental Backups: Only archives files that have changed since the last backup
  • S3 Integration: Direct upload to Amazon S3 with configurable storage classes
  • Password Protection: Encrypt your archives with password-based encryption
  • File Filtering: Support for exclude patterns using glob syntax
  • Multiple Tasks: Configure multiple backup tasks in a single configuration file
  • Database Tracking: Uses BadgerDB to track file states and changes
  • Compression: Automatic ZIP compression with configurable size limits
  • Restoration: Experimental restore functionality for archived backups (DEEP_ARCHIVE objects must first be restored on the S3 side)
  • Detailed Logging: Comprehensive logging for monitoring and debugging
  • Notifications: Configurable notification system for operation status updates

📦 Installation

Download Pre-built Binaries

Pre-compiled binaries are available for download from the Releases section. Choose the appropriate binary for your operating system:

  • Linux: s3-diff-archive-linux-amd64
  • macOS: s3-diff-archive-darwin-amd64 (Intel) or s3-diff-archive-darwin-arm64 (Apple Silicon)
  • Windows: s3-diff-archive-windows-amd64.exe

Build from Source

If you prefer to build from source:

git clone https://github.com/fahidsarker/s3-diff-archive.git
cd s3-diff-archive
go build -o s3-diff-archive .

⚙️ Configuration

Environment Variables

Create a .env file in your working directory with your AWS credentials:

AWS_ACCESS_KEY_ID=your_access_key_here
AWS_SECRET_ACCESS_KEY=your_secret_key_here
AWS_REGION=us-east-1
S3_BUCKET=your-bucket-name
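
Internally, the credentials simply need to end up in the process environment. A minimal sketch of how a .env file is typically loaded in Go, assuming a library such as github.com/joho/godotenv (whether this project uses that library is an assumption):

package main

import (
	"fmt"
	"os"

	"github.com/joho/godotenv"
)

func main() {
	// Load .env into the process environment; the AWS SDK then picks the
	// variables up the same way it would from a normal shell export.
	if err := godotenv.Load(".env"); err != nil {
		fmt.Println("no .env file loaded:", err)
	}
	fmt.Println("region:", os.Getenv("AWS_REGION"))
}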

Configuration File

Create a YAML configuration file (e.g., config.yaml) based on the sample:

# Base path in S3 bucket where archives will be stored
s3_base_path: "backups/my-project"

# Directory to store logs (optional)
logs_dir: "./logs"

# Temporary directory for creating zip files
working_dir: "./tmp"

# Maximum size for each zip file in MB
max_zip_size: 5000

# Notification script for operation status updates (optional)
# Available placeholders: %icon%, %operation%, %status%, %message%
notify_script: 'echo "%icon% %operation% - %status% | %message%"'

# Backup tasks configuration
tasks:
  - id: photos
    dir: "./photos"
    storage_class: "DEEP_ARCHIVE"  # Cost-effective for long-term storage
    encryption_key: "MySecurePassword123"
    exclude: ["**/.DS_Store", "**/Thumbs.db", "**/*.tmp"]

  - id: documents
    dir: "./documents"
    storage_class: "STANDARD_IA"   # For infrequently accessed files
    encryption_key: "AnotherSecurePassword456"

  - id: videos
    dir: "./videos"
    storage_class: "GLACIER"       # Even more cost-effective for archives
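
For contributors, here is a hedged sketch of how this file might deserialize into Go structs with gopkg.in/yaml.v3 (field and type names are guesses from the sample keys above, not the tool's actual types in utils/config-parser.go):

package main

import (
	"fmt"
	"os"

	"gopkg.in/yaml.v3"
)

// Task mirrors one entry under "tasks" in the sample config.
type Task struct {
	ID            string   `yaml:"id"`
	Dir           string   `yaml:"dir"`
	StorageClass  string   `yaml:"storage_class"`
	EncryptionKey string   `yaml:"encryption_key"`
	Exclude       []string `yaml:"exclude"`
}

// Config mirrors the top-level keys of the sample config.
type Config struct {
	S3BasePath   string `yaml:"s3_base_path"`
	LogsDir      string `yaml:"logs_dir"`
	WorkingDir   string `yaml:"working_dir"`
	MaxZipSize   int    `yaml:"max_zip_size"`
	NotifyScript string `yaml:"notify_script"`
	Tasks        []Task `yaml:"tasks"`
}

func main() {
	data, err := os.ReadFile("config.yaml")
	if err != nil {
		fmt.Println(err)
		return
	}
	var cfg Config
	if err := yaml.Unmarshal(data, &cfg); err != nil {
		fmt.Println(err)
		return
	}
	if len(cfg.Tasks) > 0 {
		fmt.Printf("%d task(s), first dir: %s\n", len(cfg.Tasks), cfg.Tasks[0].Dir)
	}
}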

Storage Classes

Choose the appropriate S3 storage class based on your access patterns and cost requirements:

  • STANDARD: For frequently accessed data
  • INTELLIGENT_TIERING: Automatic cost optimization
  • STANDARD_IA: For infrequently accessed data
  • ONEZONE_IA: Lower cost for infrequently accessed data (single AZ)
  • GLACIER: For archival data accessed once or twice per year
  • DEEP_ARCHIVE: Lowest cost for long-term archival (7-10 years)
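
When uploading, these configuration strings have to become the typed constants the S3 API expects. A minimal sketch using the AWS SDK for Go v2 (assuming that SDK; the tool's actual mapping may differ):

package main

import (
	"fmt"

	s3types "github.com/aws/aws-sdk-go-v2/service/s3/types"
)

// toStorageClass converts a config string such as "DEEP_ARCHIVE" into the
// typed constant the SDK expects on PutObjectInput.StorageClass.
func toStorageClass(s string) (s3types.StorageClass, error) {
	for _, sc := range s3types.StorageClass("").Values() {
		if string(sc) == s {
			return sc, nil
		}
	}
	return "", fmt.Errorf("unknown storage class %q", s)
}

func main() {
	sc, err := toStorageClass("DEEP_ARCHIVE")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("resolved:", sc)
}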

Notification System

The tool supports configurable notifications for operation status updates. Configure the notify_script in your config file to receive notifications:

# Simple echo notification (default)
notify_script: 'echo "%icon% %operation% - %status% | %message%"'

# macOS notification using osascript
notify_script: 'osascript -e "display notification \"%message%\" with title \"S3 Archive - %operation%\" subtitle \"%status%\""'

# Linux notification using notify-send
notify_script: 'notify-send "S3 Archive - %operation%" "%message%" --urgency=normal'

# Slack webhook notification
notify_script: 'curl -X POST -H "Content-type: application/json" --data "{\"text\":\"%icon% %operation% - %status%: %message%\"}" YOUR_SLACK_WEBHOOK_URL'

# Discord webhook notification
notify_script: 'curl -H "Content-Type: application/json" -d "{\"content\":\"%icon% %operation% - %status%: %message%\"}" YOUR_DISCORD_WEBHOOK_URL'

Available Placeholders

  • %icon%: Status-specific emoji (✅ for success, ❌ for error, ⚠️ for warning, ❌⚠️❌⚠️ for fatal)
  • %operation%: The operation being performed (scan, archive, restore, system)
  • %status%: Operation status (success, error, warn, fatal)
  • %message%: Detailed message about the operation result
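
Placeholder expansion is plain string substitution before the script is handed to a shell. A hedged sketch of the mechanism (the function name and shell invocation are illustrative, not the tool's actual code):

package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// expandNotify fills the %icon%, %operation%, %status% and %message%
// placeholders in the configured notify_script.
func expandNotify(script, icon, op, status, msg string) string {
	return strings.NewReplacer(
		"%icon%", icon,
		"%operation%", op,
		"%status%", status,
		"%message%", msg,
	).Replace(script)
}

func main() {
	cmd := expandNotify(`echo "%icon% %operation% - %status% | %message%"`,
		"✅", "archive", "success", "12 files uploaded")
	// Run through a shell, as a notify_script would be (Unix-only here).
	out, err := exec.Command("sh", "-c", cmd).CombinedOutput()
	if err != nil {
		fmt.Println("notify failed:", err)
		return
	}
	fmt.Print(string(out))
}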

🔧 Usage

Basic Commands

# Scan directories for changes (dry run)
s3-diff-archive scan -config config.yaml

# Archive changed files to S3
s3-diff-archive archive -config config.yaml

# Restore files from S3
s3-diff-archive restore -config config.yaml

# View database contents for a specific task
s3-diff-archive view -config config.yaml -task photos
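
Note that the restore command can only download archives that S3 has already staged: GLACIER and DEEP_ARCHIVE objects must first be restored on the S3 side. A hedged sketch of issuing such a restore request with the AWS SDK for Go v2 (bucket and key are placeholders; you can equally do this through the AWS console or CLI):

package main

import (
	"context"
	"log"

	"github.com/aws/aws-sdk-go-v2/aws"
	awsconfig "github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/s3"
	s3types "github.com/aws/aws-sdk-go-v2/service/s3/types"
)

func main() {
	ctx := context.Background()
	cfg, err := awsconfig.LoadDefaultConfig(ctx) // reads the same AWS_* env vars
	if err != nil {
		log.Fatal(err)
	}
	client := s3.NewFromConfig(cfg)

	// Ask S3 to stage a DEEP_ARCHIVE object for retrieval; the object becomes
	// downloadable hours later and stays available for the requested days.
	_, err = client.RestoreObject(ctx, &s3.RestoreObjectInput{
		Bucket: aws.String("your-bucket-name"),               // placeholder
		Key:    aws.String("backups/my-project/archive.zip"), // placeholder
		RestoreRequest: &s3types.RestoreRequest{
			Days: aws.Int32(7),
			GlacierJobParameters: &s3types.GlacierJobParameters{
				Tier: s3types.TierBulk, // cheapest; Standard/Expedited are faster
			},
		},
	})
	if err != nil {
		log.Fatal(err)
	}
	log.Println("restore requested")
}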

Command-line Options

Each command supports the following flags:

  • -config: Path to configuration file (required)
  • -env: Path to environment file (default: .env)
  • -task: Task ID (required for view command only)

Example Workflow

  1. Initial Setup:

    # Create your configuration
    cp config.sample.yaml config.yaml
    # Edit config.yaml with your settings
    
    # Set up environment variables
    cp .env.example .env
    # Edit .env with your actual AWS credentials
  2. Scan for Changes:

    s3-diff-archive scan -config config.yaml
  3. Perform Backup:

    s3-diff-archive archive -config config.yaml
  4. Restore When Needed:

    s3-diff-archive restore -config config.yaml

📁 Project Structure

s3-diff-archive/
├── main.go                 # Main application entry point
├── go.mod                  # Go module dependencies
├── config.sample.yaml      # Sample configuration file
├── archiver/
│   ├── archiver.go        # File archiving logic
│   └── zipper.go          # ZIP compression utilities
├── constants/
│   └── colors.go          # Terminal color constants
├── crypto/
│   ├── files.go           # File encryption/decryption
│   └── strings.go         # String encryption utilities
├── db/
│   ├── container.go       # Database container management
│   ├── db-archiver.go     # Database archiving
│   ├── db.go              # Main database operations
│   ├── reg.go             # File registry management
│   └── view.go            # Database viewing utilities
├── logger/
│   ├── log.go             # Logging configuration
│   └── loggers.go         # Logger implementations
├── restorer/
│   ├── compare.go         # File comparison utilities
│   └── restorer.go        # File restoration logic
├── s3/
│   ├── s3-manager.go      # S3 operations manager
│   └── task-uploader.go   # Task-specific upload logic
├── scanner/
│   ├── scanner.go         # File system scanning
│   └── types.go           # Scanner type definitions
├── types/
│   ├── s3-config.go       # S3 configuration types
│   └── sfile.go           # File metadata types
└── utils/
    ├── config-parser.go   # Configuration parsing
    ├── notifier.go        # Notification system
    ├── rand-create.go     # Random data generation
    ├── tools.go           # General utilities
    └── zipper.go          # ZIP file utilities

🔍 How It Works

  1. Scanning: The tool scans specified directories and calculates checksums for all files
  2. Comparison: File states are compared against a local BadgerDB database that is kept in sync with a copy in S3
  3. Differential Detection: Only files that have changed (new, modified, or deleted) are identified
  4. Archiving: Changed files are compressed into password-protected ZIP archives
  5. Upload: Archives are uploaded to S3 with the specified storage class
  6. Database Update: The local database is updated and synchronized with S3
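
Steps 1-3 boil down to hashing every file and diffing against the recorded state. A simplified sketch, with a plain map standing in for the BadgerDB store and SHA-256 as an assumed checksum algorithm (the tool's actual hash is not stated here):

package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"io/fs"
	"os"
	"path/filepath"
)

// checksum streams a file through SHA-256 and returns the hex digest.
func checksum(path string) (string, error) {
	f, err := os.Open(path)
	if err != nil {
		return "", err
	}
	defer f.Close()
	h := sha256.New()
	if _, err := io.Copy(h, f); err != nil {
		return "", err
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

// changedFiles returns paths whose checksum differs from, or is absent in,
// the previously recorded state.
func changedFiles(root string, prev map[string]string) ([]string, error) {
	var changed []string
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil || d.IsDir() {
			return err
		}
		sum, err := checksum(path)
		if err != nil {
			return err
		}
		if prev[path] != sum {
			changed = append(changed, path)
		}
		return nil
	})
	return changed, err
}

func main() {
	prev := map[string]string{} // previously recorded path -> checksum
	files, err := changedFiles("./photos", prev)
	if err != nil {
		fmt.Println("scan failed:", err)
		return
	}
	fmt.Println("changed:", files)
}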

🛡️ Security Features

  • Encryption: All archives are password-protected using ZIP encryption
  • AWS IAM: Leverages AWS IAM for secure access control
  • Credential Separation: AWS credentials are supplied via a separate .env file rather than the configuration file
  • Integrity Checking: File checksums ensure data integrity

🤝 Contributing

We welcome contributions! Please follow these steps:

  1. Fork the Repository

    git clone https://github.com/yourusername/s3-diff-archive.git
    cd s3-diff-archive
  2. Create a Feature Branch

    git checkout -b feature/your-feature-name
  3. Make Your Changes

    • Write clean, well-documented code
    • Follow Go best practices and conventions
    • Add tests for new functionality
  4. Test Your Changes

    go test ./...
    go build .
  5. Submit a Pull Request

    • Provide a clear description of your changes
    • Include any relevant issue numbers
    • Ensure all tests pass

Development Guidelines

  • Code Style: Follow standard Go formatting (go fmt)
  • Testing: Add unit tests for new features
  • Documentation: Update README and code comments as needed
  • Dependencies: Minimize external dependencies when possible

Reporting Issues

  • Use the GitHub Issues page
  • Provide detailed reproduction steps
  • Include configuration files (with sensitive data removed)
  • Specify your operating system and Go version

📄 License

This project is licensed under the MIT License. See the LICENSE file for details.

📞 Support

For support and questions:

  • 📫 Create an issue on GitHub Issues
  • 📖 Check the documentation and examples above
  • 🔍 Search existing issues for similar problems

Note: This tool is designed for efficient incremental backups. For initial backups of large datasets, the first run may take longer as it processes all files. Subsequent runs will be much faster as only changed files are processed.

This tool does not guarantee data integrity or security beyond the provided encryption and S3 storage features. Always test your backup and restore processes to ensure they meet your requirements.
