This repository was archived by the owner on Oct 10, 2025. It is now read-only.

Conversation

josecelano
Member

@josecelano josecelano commented Jul 1, 2025

Relates to: #10

This PR implements a complete local testing infrastructure using KVM/libvirt to enable reliable local development and testing before production deployment to Hetzner.

📋 What's Included

🔧 Infrastructure Setup

  • OpenTofu/Terraform configuration for local VM deployment
  • Cloud-init templates for automated Ubuntu 22.04 setup with Docker
  • Automated libvirt permission fixes and storage pool configuration
  • Network configuration with proper interface detection (ens3)
  • Security hardening with UFW firewall and automatic updates

⚙️ Makefile Automation

  • Complete workflow automation with intuitive targets
  • SSH key injection from local config (secure, never stored in repo)
  • Real-time VM monitoring and cloud-init progress tracking
  • Minimal configuration support for debugging
  • Comprehensive libvirt troubleshooting automation

🧪 Testing & Monitoring

  • Automated infrastructure validation tests
  • Real-time cloud-init monitoring script
  • VM status checking and SSH connectivity validation
  • Network optimization for BitTorrent traffic

📚 Documentation

  • Quick Start Guide
  • Complete Setup Guide
  • libvirt Troubleshooting
  • Updated main README with local testing section

🎯 Current Status

Working Infrastructure

  • ✅ VM deployment and network connectivity
  • ✅ SSH access with key-based authentication
  • ✅ Docker installation and configuration
  • ✅ UFW firewall with Torrust Tracker ports
  • ✅ Network optimizations for BitTorrent
  • ✅ Cloud-init automation

📝 TODO List for Review

🔍 Manual Testing Required

  • Test all Makefile commands generated by AI assistant
    • make install-deps - Dependency installation
    • make init - OpenTofu initialization
    • make plan - Infrastructure planning
    • make apply - VM deployment
    • make apply-minimal - Minimal configuration deployment
    • make monitor-cloud-init - Real-time monitoring
    • make restart-and-monitor - Complete restart workflow
    • make ssh - SSH connection
    • make destroy - VM cleanup
    • make test - Infrastructure tests

🧪 Infrastructure Validation

  • Run infrastructure tests
    • tests/infrastructure/test-local-setup.sh prerequisites
    • tests/infrastructure/test-local-setup.sh full-test
    • test-integration.sh
  • Verify GitHub Actions workflow (infrastructure.yml)
  • Test libvirt permission fixes on fresh Ubuntu/Debian systems

🎯 Torrust Tracker Installation Testing

DISCARDED: Compilation from sources was discarded. We have prioritised easy updates over performance for the demo.

  • Install tracker from Rust sources
    • Clone torrust-tracker repo in VM
    • Build with Rust/Cargo
    • Configure and run tracker
    • Test HTTP and UDP endpoints
  • Install with Docker Compose (legacy torrust-demo style)
    • Clone repo in VM
    • Run with docker-compose
    • Verify all services start correctly
    • Test tracker functionality

📋 Code Review

  • Review OpenTofu configuration security
  • Validate cloud-init templates
  • Check Makefile target implementation
  • Verify .gitignore excludes sensitive files
  • Review documentation completeness

🎯 Next PRs Scope

The next PR will focus on production deployment:

  • Hetzner production configuration
    • Terraform/OpenTofu config for Hetzner Cloud
    • Production cloud-init templates
    • SSL/TLS configuration with Let's Encrypt
    • Production security hardening
    • Monitoring and alerting setup

🔒 Security Notes

  • ✅ SSH keys properly use template variables, never stored in repo
  • local.tfvars is git-ignored and contains actual secrets locally
  • ✅ All sensitive files properly excluded via .gitignore
  • ✅ Cloud-init templates use secure templating approach

🎯 Testing the Infrastructure

# Quick setup (requires Ubuntu/Debian with sudo)
make install-deps  # Install dependencies
make setup-ssh-key # Configure SSH key
make apply        # Deploy VM
make ssh          # Connect to VM

# Monitor deployment
make monitor-cloud-init

# Run tests
make test

# Cleanup
make destroy

📊 Files Changed

  • Infrastructure: 15+ new files (OpenTofu, cloud-init, scripts)
  • Documentation: 4 new guides + updated README
  • Automation: Enhanced Makefile with 15+ targets
  • Testing: Infrastructure test suite + GitHub Actions workflow

🚀 This provides a solid foundation for local testing before production Hetzner deployment.

@josecelano josecelano requested a review from da2ce7 July 1, 2025 12:07
@josecelano josecelano self-assigned this Jul 1, 2025
@josecelano josecelano added the - Admin - Enjoyable to Install and Setup our Software label Jul 1, 2025
@josecelano
Member Author

Hi @da2ce7, I'm trying to deploy the demo to hetzner.com using Terraform and cloud-init. Before doing it with the final cloud provider, I wanted to simulate it locally to avoid wasting time (the process would be slower with the real cloud provider) and costly resources. In its current state, the PR only deploys locally, but I can already check that the VM is provisioned correctly and that all the dependencies are installed.

Why Terraform + Cloud-init Over Manual Hetzner Setup

  • Automation & Speed: 2-5 minutes vs 30+ minutes manual configuration
  • Reproducibility: Identical setups every time, eliminating "works on my machine" issues
  • Version Control: Complete infrastructure history tracked in git
  • Security Standardization: Consistent security baseline applied automatically
  • Documentation: Infrastructure requirements are self-documenting through code
  • Disaster Recovery: Complete infrastructure rebuild in minutes, not hours
  • Lower Barrier to Entry: One command deployment vs extensive Linux administration knowledge
  • Team Collaboration: Infrastructure knowledge codified and shareable across contributors
  • Error Reduction: Eliminates human configuration mistakes and forgotten steps
  • Scalability: Easy to deploy multiple environments (dev, staging, production)

@josecelano josecelano force-pushed the 10-provision-new-hetzner-vm branch 2 times, most recently from 1010819 to 67c1bc2 Compare July 1, 2025 14:45
- Add complete local VM testing setup using OpenTofu and KVM/libvirt
- Create infrastructure automation with Makefile targets for setup and testing
- Add comprehensive libvirt troubleshooting and permission fixes
- Implement automated AppArmor override for libvirt-qemu storage access
- Add cloud-init configuration for Ubuntu 22.04 VMs with Docker setup
- Create test suite for infrastructure validation and Torrust Tracker integration
- Add detailed documentation with quick-start and troubleshooting guides
- Configure markdownlint for consistent documentation formatting
- Fix all markdown linting issues in core infrastructure documentation

Infrastructure includes:

- OpenTofu configuration for local VM deployment
- Cloud-init setup with torrust user, Docker, and security hardening
- Automated scripts for libvirt permission and storage pool fixes
- Comprehensive test coverage for prerequisites, deployment, and integration
- GitHub Actions workflow for CI/CD validation

This enables reliable local testing before cloud deployment and provides
automated solutions for common libvirt permission issues on Ubuntu/Debian.
@josecelano josecelano force-pushed the 10-provision-new-hetzner-vm branch from 67c1bc2 to 8a33e42 Compare July 1, 2025 15:05
- Separate infrastructure (VM/server setup) from application (Docker/app config)
- Move infrastructure files to infrastructure/ (cloud-init, terraform, tests, docs)
- Move application files to application/ (compose, scripts, configs, docs)
- Create distributed .gitignore files for each component
- Add ADR-001 documenting Makefile location decision
- Update all documentation and references to match new structure
- Centralize port documentation in application/docs/firewall-requirements.md
- Add TOML formatting conventions with .taplo.toml and VS Code settings
- Update GitHub Actions workflow for new file paths
- Validate all Makefile commands work after reorganization
@josecelano josecelano force-pushed the 10-provision-new-hetzner-vm branch from e37ed21 to 34750e1 Compare July 1, 2025 17:17
…sults

- Complete test results for 22/27 Makefile targets (81% coverage)
- Document successful testing of VM lifecycle, configuration validation, and workflows
- Identify critical issues: test suite initialization dependency and minimal config behavior
- Add detailed phase-by-phase test results and findings
- Update status from 'In Progress' to 'Substantially Complete'
- Provide recommendations for remaining improvements
…ssues

Main fixes:
- Fix firewall configuration blocking SSH during cloud-init (root cause)
- Install Docker Compose V2 plugin instead of standalone version
- Update integration tests to auto-detect compose command

Changes:
- infrastructure/cloud-init/user-data.yaml.tpl:
  * Remove docker-compose package, install Docker Compose V2 plugin
  * Improved firewall setup to allow SSH throughout process
- infrastructure/tests/test-integration.sh:
  * Add get_docker_compose_cmd() helper for compatibility
  * Auto-detect 'docker compose' vs 'docker-compose' commands
  * Update all compose operations to use detected command
- docs/guides/integration-testing-guide.md:
  * Add comprehensive integration testing guide
  * Correct timing estimates (8-12 min total, 2-3 min cloud-init)
  * Document firewall fix as main solution
  * Add Docker Compose compatibility notes

Result: VM deployment now completes reliably in 2-3 minutes with proper
SSH connectivity throughout the cloud-init process.
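
The compose auto-detection described above could look roughly like the sketch below. Only the helper name `get_docker_compose_cmd` comes from the commit message; the body is an assumption about how such a helper might work, not the code in test-integration.sh.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: the function name is taken from the commit message,
# the implementation is an assumption.

# Print the Docker Compose command available on this machine,
# preferring the V2 plugin ("docker compose") over the standalone binary.
get_docker_compose_cmd() {
    if docker compose version >/dev/null 2>&1; then
        echo "docker compose"
    elif command -v docker-compose >/dev/null 2>&1; then
        echo "docker-compose"
    else
        echo "no Docker Compose command found" >&2
        return 1
    fi
}
```

A caller might store the result once, for example `compose_cmd="$(get_docker_compose_cmd)"`, and then run `$compose_cmd ps` unquoted so that the two-word form splits into command and subcommand.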
@josecelano josecelano force-pushed the 10-provision-new-hetzner-vm branch from c6b27a3 to 091029f Compare July 2, 2025 07:24
josecelano added 17 commits July 3, 2025 07:22
- Remove index-gui and index services from proxy dependencies
- These services were referenced but not defined in compose.yaml
- Fixes Docker Compose validation error when starting services
- Now allows tracker service to start successfully
- Fix typo: 'Crating' → 'Creating' in tracker configuration message
- Add missing newline at end of file for proper file format
…guidelines

- Add clear prohibition of state-changing git actions without explicit permission
- List allowed git actions that can be performed without permission (read-only)
- List actions requiring explicit permission (all state-changing operations)
- Add best practice guidance to always ask before committing changes
- Prevents AI assistants from committing/pushing without user consent
…ation

- Add pkg-config, libssl-dev, make, build-essential for OpenSSL support
- Add libsqlite3-dev, sqlite3 for SQLite3 database support
- Dependencies are for future source compilation to improve performance
- Currently using Docker but planning to compile from source to avoid Docker layer overhead
- Based on Torrust Tracker crate documentation requirements
- Resolves sqlite3 command not found error for source compilation
- Add auto-approve flags to Makefile commands (apply, apply-minimal, destroy)
- Add new VM management commands: 'make vm-ip' and 'make vm-info'
- Update integration testing guide with manual volume cleanup section (1.4.1)
- Enhance minimal cloud-init configuration with SSH debugging features:
  - Add password authentication support
  - Add explicit SSH daemon configuration
  - Add cloud-init completion tracking files
  - Add SSH service restart commands
- Improve test script error message for libvirtd service

This improves the development experience by eliminating manual confirmation
prompts and providing better debugging tools for SSH connectivity issues.
…nfiguration

- Update Terraform to use Ubuntu 24.04 server cloud image
- Simplify minimal cloud-init configuration with working username/password setup
- Add SSH key support to minimal cloud-init configuration
- Successfully tested username/password authentication with testuser/testpass123

The changes ensure reliable VM creation and SSH access for integration testing.
- Add 'make console' command for text-based VM console access
- Add 'make vm-console' command for graphical VM console access via virt-viewer
- Update install-deps to include virt-viewer package
- Document virt-viewer usage in integration testing guide
- Update local testing setup documentation with console access options
- Add console commands to main commands table in copilot instructions

These commands provide easy access to VM console for debugging when SSH fails
during cloud-init or boot processes. Users can now use:
- virt-viewer spice://127.0.0.1:5900 for graphical console
- virsh console for text-based console access
- Add explicit requirement that all commits MUST be signed with GPG
- Specify to use default git commit behavior (triggers GPG signing)
- Prohibit use of --no-gpg-sign to bypass signing
- Ensure all AI-assisted commits maintain security and authenticity standards
- Add guidelines for working in small, manageable steps
- Include parallel changes methodology for independent modifications
- Specify separation between refactoring and feature commits
- Add requirement for planning complex tasks with confirmation
- Include rollback strategies and risk mitigation for critical changes

This ensures AI-assisted development follows incremental, well-organized
approach with clear separation of concerns and proper planning.
Break long line in cloud-init template to comply with 120-character
yamllint limit, fixing ci-test-syntax workflow failure.

- Split 136-character line into multiple lines maintaining YAML format
- Ensures yamllint validation passes for CI/CD pipeline
- Resolves failing make ci-test-syntax command
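
To locate over-long lines like the one fixed above before yamllint reports them, a small helper along these lines may be useful. The function name `find_long_lines` is hypothetical, and its length count (via awk) may differ from yamllint's for non-ASCII text.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the repository.

# Print "file:line: N characters" for every line in file $1 that is
# longer than the limit in $2 (default 120, the limit named above).
find_long_lines() {
    local max="${2:-120}"
    awk -v max="$max" \
        'length($0) > max { printf "%s:%d: %d characters\n", FILENAME, FNR, length($0) }' \
        "$1"
}
```

For example, `find_long_lines infrastructure/cloud-init/user-data.yaml.tpl` would list any remaining offenders in the template.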
- Add comprehensive linting script (scripts/lint.sh) with yamllint, shellcheck, and markdownlint
- Simplify GitHub Actions workflow to focus only on linting checks
- Add Makefile targets for linting (lint, lint-yaml, lint-shell, lint-markdown)
- Fix all ShellCheck issues in shell scripts:
  - Quote variables properly in monitor-cloud-init.sh
  - Fix sed command in test-local-setup.sh
  - Use array handling for yamllint config
- Fix markdown line length issues in documentation files
- Update yamllint config to support GitHub Actions syntax
- Add linting documentation to main README
- Remove empty test file and cleanup directory structure

All linting checks now pass successfully, providing a solid foundation
for maintaining code quality across the entire project.
- Rename .github/workflows/infrastructure.yml to testing.yml
- Update workflow name from 'Infrastructure Tests' to 'Testing'
- Simplify workflow triggers (remove path-based filtering)
- Fix runner specification to use ubuntu-22.04 instead of ubuntu
- Maintain same linting functionality with cleaner focus
- yamllint: Use 'yamllint .' instead of manual find + per-file processing
- shellcheck: Use glob patterns to find files, process all at once
- markdownlint: Use 'markdownlint **/*.md' glob pattern
- Removed complex template file handling logic
- Simplified error handling with single pass/fail per tool
- Reduced code complexity by ~100 lines while maintaining functionality
- All linting tools now leverage their own optimized file discovery
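
The glob-based file discovery for shellcheck mentioned above might be sketched as follows. This is an illustration under assumptions: the function name `collect_shell_scripts` is hypothetical and the real scripts/lint.sh may be organised differently.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of glob-based script discovery; not the actual lint.sh.

# Print every *.sh file found directly inside the given directories, one per line.
# The body runs in a subshell so the nullglob setting does not leak to the caller.
collect_shell_scripts() (
    shopt -s nullglob   # unmatched globs expand to nothing instead of the literal pattern
    files=()
    for dir in "$@"; do
        files+=("$dir"/*.sh)
    done
    if [ "${#files[@]}" -gt 0 ]; then
        printf '%s\n' "${files[@]}"
    fi
)
```

The output can be read into an array with `mapfile -t scripts < <(collect_shell_scripts scripts infrastructure/scripts)` (directory names assumed) and passed to shellcheck in one invocation with `shellcheck "${scripts[@]}"`.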
- Add pre-commit linting requirement to Git Actions section
- Add Automated Linting section to Code Quality Standards
- Document all lint.sh command options and usage
- Enforce ./scripts/lint.sh must pass before any commit
- Ensure consistent code quality across all file types
- Add nullglob bash option to project-words.txt
- Resolves spell-check for shell option used in lint.sh refactoring
- Related to improved shell script file discovery patterns
…tion

Root Cause: YAML document start marker (---) was breaking cloud-init processing
Solution: Replace --- with #cloud-config header in user-data.yaml.tpl

Details:
- Cloud-init parser requires #cloud-config as first line, not YAML document marker
- Using --- caused SSH key templating to fail ( became None)
- This resulted in empty ssh_authorized_keys and authentication failures

Changes:
- Fixed infrastructure/cloud-init/user-data.yaml.tpl header
- Added comprehensive documentation of investigation process
- Included 15 incremental test configurations used for debugging
- Created detailed bug analysis and resolution summary

Testing:
- All individual cloud-init components validated via incremental testing
- SSH key authentication: ✅ WORKING
- Password authentication: ✅ WORKING
- Full integration test suite: ✅ PASSED
- Standard make workflow (init, plan, apply): ✅ WORKING

Documentation:
- SSH_BUG_SUMMARY.md: Complete analysis and resolution
- SSH_BUG_ANALYSIS.md: Technical investigation details
- 17 test configuration files: Incremental debugging process
- Updated project-words.txt: Added technical terms

Impact:
- Resolves critical SSH access issue preventing VM usage
- Enables proper cloud-init processing with all features working
- Infrastructure deployment now works reliably via make commands
- Integration testing workflow fully operational
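
A guard against this class of bug could be a pre-deployment check that the user-data file starts with the `#cloud-config` header. The sketch below is an assumption about how such a check might be written; the function name `has_cloud_config_header` is hypothetical and not part of the PR.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check, not part of the repository.

# Succeed only if the first line of the given user-data file is exactly
# "#cloud-config"; a leading "---" document marker makes the check fail.
has_cloud_config_header() {
    local first_line
    first_line="$(head -n 1 "$1")"
    [ "$first_line" = "#cloud-config" ]
}
```

It could be called as `has_cloud_config_header infrastructure/cloud-init/user-data.yaml.tpl || exit 1` before running the deployment.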
- Create infrastructure/docs/bugs/ directory for systematic bug documentation
- Move SSH authentication failure documentation to 001-ssh-authentication-failure/
- Organize content into logical structure:
  - README.md: Bug overview and quick reference
  - SSH_BUG_ANALYSIS.md: Initial investigation and analysis
  - SSH_BUG_SUMMARY.md: Complete timeline and resolution
  - test-configs/: All 17 test configurations used during debugging

- Add comprehensive README.md for bugs directory explaining:
  - Purpose and scope of bug documentation archive
  - Directory structure and naming conventions
  - Content guidelines and quality standards
  - Usage examples for contributors and maintainers

- Fix markdown linting issues in all documentation files
- Add markdownlint disable for technical content with long lines

This establishes a systematic approach for documenting infrastructure bugs
with complete investigation trails, test artifacts, and lessons learned.
Future bugs can follow this template for consistent documentation quality.
@josecelano josecelano force-pushed the 10-provision-new-hetzner-vm branch from 0b60e52 to b272f1b Compare July 4, 2025 16:43
- Add new section explaining why multiple DHCP leases appear
- Clarify that this is normal behavior when VMs are created/destroyed
- Provide example output and explanation of active vs expired leases
- Add verification commands to check active VMs
- Include DHCP lease cleanup instructions
- Explain impact on development workflow
- Add DHCP lease information to diagnostic commands section

This helps users understand the expected behavior when running
'virsh net-dhcp-leases default' and seeing multiple lease entries.
- Update cloud-init configuration to reference Ubuntu 24.04 LTS
- Update documentation across infrastructure and guides
- Update Makefile to reference ubuntu-24.04-base.qcow2 image
- Update terraform.tfvars.example with correct Ubuntu 24.04 image URL
- Replace jammy (22.04) URL with noble (24.04) release URL
- Ensure consistency across all documentation and configuration files

All changes maintain backward compatibility and follow existing patterns.
Linting validation passed for all modified files.
Replace deprecated chpasswd.list with plain_text_passwd in user configuration.
The chpasswd.list method is deprecated in newer cloud-init versions and
plain_text_passwd is the recommended approach for setting user passwords.

This change:
- Removes the deprecated chpasswd section
- Adds plain_text_passwd field to the user configuration
- Maintains the same password (torrust123) for compatibility
- Eliminates deprecation warnings in cloud-init logs
Comment out password authentication settings in cloud-init configuration
to enforce SSH key-only access for enhanced security.

Changes:
- Comment out plain_text_passwd field in user configuration
- Comment out ssh_pwauth setting
- Comment out SSH configuration file that enables password auth
- Update final message to reflect SSH key-only access

Password authentication can be re-enabled by uncommenting the relevant
sections if needed for debugging or recovery purposes.
- Add comprehensive twelve-factor refactoring plan (README.md)
- Add detailed phase 1 implementation checklist
- Add step-by-step migration guide with backward compatibility
- Include Torrust Tracker-specific configuration considerations
- Address deployment separation and environment parity violations
- Plan multi-cloud support starting with Hetzner infrastructure
- Update project-words.txt with new technical terms
- All documentation passes markdown linting with table formatting fixes
@josecelano
Member Author

Hi @da2ce7, the current PR status is:

I planned to add scripts and configuration to deploy to Hetzner and create the VM there, and then manually finish the installation by following the official tutorial.

This is already a step further from what we currently have. This automates the provisioning of the VM with the necessary system dependencies. However, I'm considering going even further:

  • Refactor the current state to follow https://12factor.net/ recommendations.
  • Full automation of the process, including app setup. Steps described here.

I have created a plan to do that:

22ee5f3

Improving the tracker installation has not been a priority so far. We have only improved the tracker installation, but we have not:

  • Provided a way to deploy the tracker to a cloud provider easily.
  • Expressed the infrastructure requirements declaratively.
  • Automated the provisioning, orchestration, and setup process to avoid errors due to manual processes.

If I continue working on a good infrastructure setup, it may take me some days (or even weeks) longer than a simple manual provider migration. I think this can be very valuable for users: both system admins who want to automate the process and system admins who, even if they deploy manually, would have a deterministic, detailed description of what they should do to deploy the tracker.

What do you think, @da2ce7 @cgbosse?

- Remove ubuntu docker.io package from apt packages list
- Add official Docker GPG key and repository setup
- Install docker-ce, docker-ce-cli, containerd.io, docker-buildx-plugin, docker-compose-plugin
- Remove manual Docker Compose v2.21.0 installation
- Add Docker version verification steps
- Update project-words.txt with new technical terms (buildx, containerd, dpkg, keyrings)

This ensures we get the latest Docker version (28.3.1) and Docker Compose v2.38.1
as per the official Docker installation documentation.
- Install Rust using rustup following official documentation
- Configure Rust installation as torrust user for proper ownership
- Add Rust's cargo bin directory to PATH in .bashrc and .profile
- Include verification commands for rustc and cargo
- Update final message to reflect Rust installation
- Add Rust-related terms to project-words.txt for spell checking

This enables future source compilation of Torrust Tracker for better
performance, moving away from the current Docker-based approach.
- Document the 'No IP assigned yet' issue in Terraform outputs
- Explain root cause: stale Terraform state vs actual VM DHCP lease
- Add comprehensive troubleshooting steps with solutions
- Include prevention tips and alternative methods
- Add new 'make refresh-state' command for easy state synchronization
- Update both quick-start and detailed setup documentation
- Improve command tables and cross-references

This addresses the common confusion when make status shows no IP
while virsh domifaddr shows the VM has a valid IP address.
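
When the Terraform output is stale, the VM's address can still be read directly from libvirt. The sketch below extracts the first IPv4 address from `virsh domifaddr` output; the function name `parse_domifaddr_ip` is hypothetical, and the parsing assumes the usual four-column table (Name, MAC address, Protocol, Address).

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the repository.

# Read `virsh domifaddr <domain>` output on stdin and print the first
# IPv4 address without its prefix length (e.g. 192.168.122.45).
parse_domifaddr_ip() {
    awk '$3 == "ipv4" { split($4, parts, "/"); print parts[1]; exit }'
}
```

Usage would be `virsh domifaddr <vm-name> | parse_domifaddr_ip`, with the VM name filled in for the local setup; running `make refresh-state` afterwards brings the Terraform state back in sync.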
- Create comprehensive ADR documenting the decision to use Docker for all
  services including UDP tracker despite potential performance overhead
- Document technical challenges with host networking and connection tracking
- Reference related GitHub issues #27 and torrust#72 from torrust-demo repo
- Update application README with Docker design decision section
- Update twelve-factor refactor docs to reference new ADR
- Update main README to list both ADRs with descriptions
- Update cloud-init comments to clarify dependency installation purpose
- Prioritize demo simplicity and consistency over peak performance
- Comment out Rust build packages (pkg-config, libssl-dev, etc.) in cloud-init
- Comment out Rust installation and configuration commands
- Update final message to reflect Docker-only deployment approach
- Add clear instructions for re-enabling Rust if needed for source compilation
- Aligns with ADR-002 decision to use Docker for all services
- Reduces VM provisioning time and complexity while preserving flexibility
- Add proxy_set_header X-Forwarded-For to nginx HTTP (port 80) configuration
- Enables HTTP tracker functionality through reverse proxy
- Fix resolves tracker client connection issues when using proxy
- Update smoke testing guide to reflect working HTTP tracker via proxy
- Add comprehensive end-to-end smoke testing documentation
- Update .gitignore to exclude cloned torrust-tracker/ directory
- Add project words for documentation spelling checks

Tested:
- UDP trackers (6868, 6969): ✅ Working
- HTTP tracker via proxy (80): ✅ Working
- Health check API (80): ✅ Working
- Statistics API (80): ✅ Working
- Comprehensive tracker checker: ✅ All endpoints passing
@josecelano josecelano marked this pull request as ready for review July 7, 2025 18:38
@josecelano
Member Author

ACK 8fac056

@josecelano josecelano merged commit e4076d1 into main Jul 7, 2025
1 check passed

Labels

- Admin - Enjoyable to Install and Setup our Software

1 participant