Add Local KVM/libvirt Testing Infrastructure #11
Hi @da2ce7, I'm trying to deploy the demo to hetzner.com using Terraform and cloud-init. Before deploying to the final cloud provider, I wanted to simulate the deployment locally, because iterating against the real provider would be slower and would consume paid resources. The PR currently deploys only locally, but that is enough to check that the VM is provisioned correctly and that all the dependencies are installed.

Why Terraform + Cloud-init Over Manual Hetzner Setup
1010819 to 67c1bc2
- Add complete local VM testing setup using OpenTofu and KVM/libvirt
- Create infrastructure automation with Makefile targets for setup and testing
- Add comprehensive libvirt troubleshooting and permission fixes
- Implement automated AppArmor override for libvirt-qemu storage access
- Add cloud-init configuration for Ubuntu 22.04 VMs with Docker setup
- Create test suite for infrastructure validation and Torrust Tracker integration
- Add detailed documentation with quick-start and troubleshooting guides
- Configure markdownlint for consistent documentation formatting
- Fix all markdown linting issues in core infrastructure documentation

Infrastructure includes:
- OpenTofu configuration for local VM deployment
- Cloud-init setup with torrust user, Docker, and security hardening
- Automated scripts for libvirt permission and storage pool fixes
- Comprehensive test coverage for prerequisites, deployment, and integration
- GitHub Actions workflow for CI/CD validation

This enables reliable local testing before cloud deployment and provides automated solutions for common libvirt permission issues on Ubuntu/Debian.
67c1bc2 to 8a33e42
- Separate infrastructure (VM/server setup) from application (Docker/app config)
- Move infrastructure files to infrastructure/ (cloud-init, terraform, tests, docs)
- Move application files to application/ (compose, scripts, configs, docs)
- Create distributed .gitignore files for each component
- Add ADR-001 documenting Makefile location decision
- Update all documentation and references to match new structure
- Centralize port documentation in application/docs/firewall-requirements.md
- Add TOML formatting conventions with .taplo.toml and VS Code settings
- Update GitHub Actions workflow for new file paths
- Validate all Makefile commands work after reorganization
e37ed21 to 34750e1
…sults

- Complete test results for 22/27 Makefile targets (81% coverage)
- Document successful testing of VM lifecycle, configuration validation, and workflows
- Identify critical issues: test suite initialization dependency and minimal config behavior
- Add detailed phase-by-phase test results and findings
- Update status from 'In Progress' to 'Substantially Complete'
- Provide recommendations for remaining improvements
…ssues

Main fixes:
- Fix firewall configuration blocking SSH during cloud-init (root cause)
- Install Docker Compose V2 plugin instead of standalone version
- Update integration tests to auto-detect compose command

Changes:
- infrastructure/cloud-init/user-data.yaml.tpl:
  * Remove docker-compose package, install Docker Compose V2 plugin
  * Improved firewall setup to allow SSH throughout process
- infrastructure/tests/test-integration.sh:
  * Add get_docker_compose_cmd() helper for compatibility
  * Auto-detect 'docker compose' vs 'docker-compose' commands
  * Update all compose operations to use detected command
- docs/guides/integration-testing-guide.md:
  * Add comprehensive integration testing guide
  * Correct timing estimates (8-12 min total, 2-3 min cloud-init)
  * Document firewall fix as main solution
  * Add Docker Compose compatibility notes

Result: VM deployment now completes reliably in 2-3 minutes with proper SSH connectivity throughout the cloud-init process.
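The compose-command auto-detection mentioned in this commit could look roughly like the sketch below. Only the helper name get_docker_compose_cmd comes from the commit message; the body is an assumption about how such a helper might work, not the code in infrastructure/tests/test-integration.sh.

```shell
# Sketch only: the real helper in test-integration.sh may differ.
get_docker_compose_cmd() {
    # Prefer the Compose V2 plugin, invoked as "docker compose".
    if docker compose version >/dev/null 2>&1; then
        echo "docker compose"
    # Fall back to the standalone "docker-compose" binary.
    elif command -v docker-compose >/dev/null 2>&1; then
        echo "docker-compose"
    else
        echo "Docker Compose not found" >&2
        return 1
    fi
}
```

A caller would typically capture the result once, for example compose_cmd="$(get_docker_compose_cmd)" || exit 1, and then run $compose_cmd ps unquoted so that "docker compose" is split into two words.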
c6b27a3 to 091029f
- Remove index-gui and index services from proxy dependencies
- These services were referenced but not defined in compose.yaml
- Fixes Docker Compose validation error when starting services
- Now allows tracker service to start successfully
- Fix typo: 'Crating' → 'Creating' in tracker configuration message
- Add missing newline at end of file for proper file format
…guidelines

- Add clear prohibition of state-changing git actions without explicit permission
- List allowed git actions that can be performed without permission (read-only)
- List actions requiring explicit permission (all state-changing operations)
- Add best practice guidance to always ask before committing changes
- Prevents AI assistants from committing/pushing without user consent
…ation

- Add pkg-config, libssl-dev, make, build-essential for OpenSSL support
- Add libsqlite3-dev, sqlite3 for SQLite3 database support
- Dependencies are for future source compilation to improve performance
- Currently using Docker but planning to compile from source to avoid Docker layer overhead
- Based on Torrust Tracker crate documentation requirements
- Resolves sqlite3 command not found error for source compilation
- Add auto-approve flags to Makefile commands (apply, apply-minimal, destroy)
- Add new VM management commands: 'make vm-ip' and 'make vm-info'
- Update integration testing guide with manual volume cleanup section (1.4.1)
- Enhance minimal cloud-init configuration with SSH debugging features:
  - Add password authentication support
  - Add explicit SSH daemon configuration
  - Add cloud-init completion tracking files
  - Add SSH service restart commands
- Improve test script error message for libvirtd service

This improves the development experience by eliminating manual confirmation prompts and providing better debugging tools for SSH connectivity issues.
…nfiguration

- Update Terraform to use Ubuntu 24.04 server cloud image
- Simplify minimal cloud-init configuration with working username/password setup
- Add SSH key support to minimal cloud-init configuration
- Successfully tested username/password authentication with testuser/testpass123

The changes ensure reliable VM creation and SSH access for integration testing.
- Add 'make console' command for text-based VM console access
- Add 'make vm-console' command for graphical VM console access via virt-viewer
- Update install-deps to include virt-viewer package
- Document virt-viewer usage in integration testing guide
- Update local testing setup documentation with console access options
- Add console commands to main commands table in copilot instructions

These commands provide easy access to VM console for debugging when SSH fails during cloud-init or boot processes. Users can now use:
- virt-viewer spice://127.0.0.1:5900 for graphical console
- virsh console for text-based console access
- Add explicit requirement that all commits MUST be signed with GPG
- Specify to use default git commit behavior (triggers GPG signing)
- Prohibit use of --no-gpg-sign to bypass signing
- Ensure all AI-assisted commits maintain security and authenticity standards
- Add guidelines for working in small, manageable steps
- Include parallel changes methodology for independent modifications
- Specify separation between refactoring and feature commits
- Add requirement for planning complex tasks with confirmation
- Include rollback strategies and risk mitigation for critical changes

This ensures AI-assisted development follows an incremental, well-organized approach with clear separation of concerns and proper planning.
Break long line in cloud-init template to comply with 120-character yamllint limit, fixing ci-test-syntax workflow failure.

- Split 136-character line into multiple lines maintaining YAML format
- Ensures yamllint validation passes for CI/CD pipeline
- Resolves failing make ci-test-syntax command
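Lines that exceed the 120-character limit can be located before yamllint runs. The helper below is a hypothetical illustration (the name find_long_lines and its output format are not from the PR); it uses awk's length(), which may count bytes rather than characters for non-ASCII text in some awk implementations.

```shell
# Hypothetical helper: print "file:line: length" for every line longer than max (default 120).
find_long_lines() {
    file="$1"
    max="${2:-120}"
    awk -v max="$max" 'length($0) > max { printf "%s:%d: %d\n", FILENAME, FNR, length($0) }' "$file"
}
```

For example, find_long_lines infrastructure/cloud-init/user-data.yaml.tpl would list each offending line with its length, and an empty result means the file is within the limit.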
- Add comprehensive linting script (scripts/lint.sh) with yamllint, shellcheck, and markdownlint
- Simplify GitHub Actions workflow to focus only on linting checks
- Add Makefile targets for linting (lint, lint-yaml, lint-shell, lint-markdown)
- Fix all ShellCheck issues in shell scripts:
  - Quote variables properly in monitor-cloud-init.sh
  - Fix sed command in test-local-setup.sh
  - Use array handling for yamllint config
- Fix markdown line length issues in documentation files
- Update yamllint config to support GitHub Actions syntax
- Add linting documentation to main README
- Remove empty test file and cleanup directory structure

All linting checks now pass successfully, providing a solid foundation for maintaining code quality across the entire project.
- Rename .github/workflows/infrastructure.yml to testing.yml
- Update workflow name from 'Infrastructure Tests' to 'Testing'
- Simplify workflow triggers (remove path-based filtering)
- Fix runner specification to use ubuntu-22.04 instead of ubuntu
- Maintain same linting functionality with cleaner focus
- yamllint: Use 'yamllint .' instead of manual find + per-file processing
- shellcheck: Use glob patterns to find files, process all at once
- markdownlint: Use 'markdownlint **/*.md' glob pattern
- Removed complex template file handling logic
- Simplified error handling with single pass/fail per tool
- Reduced code complexity by ~100 lines while maintaining functionality
- All linting tools now leverage their own optimized file discovery
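The "single pass/fail per tool" idea can be sketched as a small wrapper that runs each linter once and reports one result. The function name run_linter and the PASS/FAIL output are assumptions for illustration; this is not the project's actual scripts/lint.sh.

```shell
# Sketch only; not the project's actual scripts/lint.sh.
run_linter() {
    name="$1"
    shift
    # Run the tool once over all of its files and report a single result.
    if "$@"; then
        echo "PASS: $name"
    else
        echo "FAIL: $name"
        return 1
    fi
}
```

With this wrapper, the invocations named in the commit would read run_linter yamllint yamllint . and run_linter markdownlint markdownlint "**/*.md", leaving file discovery to each tool.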
- Add pre-commit linting requirement to Git Actions section
- Add Automated Linting section to Code Quality Standards
- Document all lint.sh command options and usage
- Enforce ./scripts/lint.sh must pass before any commit
- Ensure consistent code quality across all file types
- Add nullglob bash option to project-words.txt
- Resolves spell-check for shell option used in lint.sh refactoring
- Related to improved shell script file discovery patterns
…tion

Root Cause: YAML document start marker (---) was breaking cloud-init processing

Solution: Replace --- with #cloud-config header in user-data.yaml.tpl

Details:
- Cloud-init parser requires #cloud-config as first line, not YAML document marker
- Using --- caused SSH key templating to fail (the templated key became None)
- This resulted in empty ssh_authorized_keys and authentication failures

Changes:
- Fixed infrastructure/cloud-init/user-data.yaml.tpl header
- Added comprehensive documentation of investigation process
- Included 15 incremental test configurations used for debugging
- Created detailed bug analysis and resolution summary

Testing:
- All individual cloud-init components validated via incremental testing
- SSH key authentication: ✅ WORKING
- Password authentication: ✅ WORKING
- Full integration test suite: ✅ PASSED
- Standard make workflow (init, plan, apply): ✅ WORKING

Documentation:
- SSH_BUG_SUMMARY.md: Complete analysis and resolution
- SSH_BUG_ANALYSIS.md: Technical investigation details
- 17 test configuration files: Incremental debugging process
- Updated project-words.txt: Added technical terms

Impact:
- Resolves critical SSH access issue preventing VM usage
- Enables proper cloud-init processing with all features working
- Infrastructure deployment now works reliably via make commands
- Integration testing workflow fully operational
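Because cloud-init only treats user data as cloud-config when the very first line is #cloud-config, a pre-flight check on the template can catch this class of bug before a VM is created. The function below is a hypothetical sketch (has_cloud_config_header is not part of the PR).

```shell
# Hypothetical pre-flight check: succeed only if line 1 is exactly "#cloud-config".
has_cloud_config_header() {
    [ "$(head -n 1 "$1")" = "#cloud-config" ]
}
```

A file that begins with --- fails the check even if #cloud-config appears on the second line, which matches the failure described above.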
- Create infrastructure/docs/bugs/ directory for systematic bug documentation
- Move SSH authentication failure documentation to 001-ssh-authentication-failure/
- Organize content into logical structure:
  - README.md: Bug overview and quick reference
  - SSH_BUG_ANALYSIS.md: Initial investigation and analysis
  - SSH_BUG_SUMMARY.md: Complete timeline and resolution
  - test-configs/: All 17 test configurations used during debugging
- Add comprehensive README.md for bugs directory explaining:
  - Purpose and scope of bug documentation archive
  - Directory structure and naming conventions
  - Content guidelines and quality standards
  - Usage examples for contributors and maintainers
- Fix markdown linting issues in all documentation files
- Add markdownlint disable for technical content with long lines

This establishes a systematic approach for documenting infrastructure bugs with complete investigation trails, test artifacts, and lessons learned. Future bugs can follow this template for consistent documentation quality.
0b60e52 to b272f1b
- Add new section explaining why multiple DHCP leases appear
- Clarify that this is normal behavior when VMs are created/destroyed
- Provide example output and explanation of active vs expired leases
- Add verification commands to check active VMs
- Include DHCP lease cleanup instructions
- Explain impact on development workflow
- Add DHCP lease information to diagnostic commands section

This helps users understand the expected behavior when running 'virsh net-dhcp-leases default' and seeing multiple lease entries.
- Update cloud-init configuration to reference Ubuntu 24.04 LTS
- Update documentation across infrastructure and guides
- Update Makefile to reference ubuntu-24.04-base.qcow2 image
- Update terraform.tfvars.example with correct Ubuntu 24.04 image URL
- Replace jammy (22.04) URL with noble (24.04) release URL
- Ensure consistency across all documentation and configuration files

All changes maintain backward compatibility and follow existing patterns. Linting validation passed for all modified files.
Replace deprecated chpasswd.list with plain_text_passwd in user configuration.

The chpasswd.list method is deprecated in newer cloud-init versions and plain_text_passwd is the recommended approach for setting user passwords.

This change:
- Removes the deprecated chpasswd section
- Adds plain_text_passwd field to the user configuration
- Maintains the same password (torrust123) for compatibility
- Eliminates deprecation warnings in cloud-init logs
Comment out password authentication settings in cloud-init configuration to enforce SSH key-only access for enhanced security.

Changes:
- Comment out plain_text_passwd field in user configuration
- Comment out ssh_pwauth setting
- Comment out SSH configuration file that enables password auth
- Update final message to reflect SSH key-only access

Password authentication can be re-enabled by uncommenting the relevant sections if needed for debugging or recovery purposes.
- Add comprehensive twelve-factor refactoring plan (README.md)
- Add detailed phase 1 implementation checklist
- Add step-by-step migration guide with backward compatibility
- Include Torrust Tracker-specific configuration considerations
- Address deployment separation and environment parity violations
- Plan multi-cloud support starting with Hetzner infrastructure
- Update project-words.txt with new technical terms
- All documentation passes markdown linting with table formatting fixes
Hi @da2ce7, the current PR status is:
I planned to add scripts and configuration to deploy to Hetzner and create the VM there, and then finish the installation manually by following the official tutorial. That is already a step beyond what we currently have, because it automates provisioning the VM with the necessary system dependencies. However, I'm considering going even further:
I have created a plan to do that: Improving the tracker installation has not been a priority so far. We have only improved the tracker installation, but we have not:
Continue working on implementing a good infrastructure setup; it may take me some days (or even weeks) longer than a simple manual provider migration. I think this can be very valuable for users: both for system admins who want to automate the process and for system admins who deploy manually but would benefit from a deterministic, detailed description of what to do to deploy the tracker.
- Remove ubuntu docker.io package from apt packages list
- Add official Docker GPG key and repository setup
- Install docker-ce, docker-ce-cli, containerd.io, docker-buildx-plugin, docker-compose-plugin
- Remove manual Docker Compose v2.21.0 installation
- Add Docker version verification steps
- Update project-words.txt with new technical terms (buildx, containerd, dpkg, keyrings)

This ensures we get the latest Docker version (28.3.1) and Docker Compose v2.38.1 as per the official Docker installation documentation.
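The repository setup step amounts to writing one apt source entry that points at Docker's official Ubuntu repository and names the downloaded GPG key. The sketch below only builds that entry; the function name is hypothetical, and the keyring path /etc/apt/keyrings/docker.asc is an assumption taken from Docker's published install instructions, so the cloud-init template in this PR may use a different file name.

```shell
# Sketch only: build the apt source entry for Docker's official Ubuntu repository.
# Assumption: the GPG key was saved to /etc/apt/keyrings/docker.asc.
docker_apt_source_line() {
    arch="$1"      # e.g. the output of: dpkg --print-architecture
    codename="$2"  # e.g. VERSION_CODENAME from /etc/os-release (noble for Ubuntu 24.04)
    echo "deb [arch=${arch} signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu ${codename} stable"
}
```

The resulting line would be written to a file under /etc/apt/sources.list.d/, followed by apt-get update and installation of the five packages listed in the commit.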
- Install Rust using rustup following official documentation
- Configure Rust installation as torrust user for proper ownership
- Add Rust's cargo bin directory to PATH in .bashrc and .profile
- Include verification commands for rustc and cargo
- Update final message to reflect Rust installation
- Add Rust-related terms to project-words.txt for spell checking

This enables future source compilation of Torrust Tracker for better performance, moving away from the current Docker-based approach.
- Document the 'No IP assigned yet' issue in Terraform outputs
- Explain root cause: stale Terraform state vs actual VM DHCP lease
- Add comprehensive troubleshooting steps with solutions
- Include prevention tips and alternative methods
- Add new 'make refresh-state' command for easy state synchronization
- Update both quick-start and detailed setup documentation
- Improve command tables and cross-references

This addresses the common confusion when make status shows no IP while virsh domifaddr shows the VM has a valid IP address.
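When the Terraform output is stale, the address can still be read directly from libvirt. The filter below is a hypothetical sketch (extract_vm_ip is not part of the PR) that pulls the first IPv4 address out of the table printed by virsh domifaddr, assuming its usual Name / MAC address / Protocol / Address columns.

```shell
# Hypothetical filter: read "virsh domifaddr" output on stdin and print the
# first IPv4 address without its prefix length.
extract_vm_ip() {
    awk '$3 == "ipv4" { sub(/\/.*/, "", $4); print $4; exit }'
}
```

It would be used as virsh domifaddr VM_NAME | extract_vm_ip, where VM_NAME stands for whatever domain name the deployment uses. Running 'make refresh-state' afterwards brings the Terraform state back in line.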
- Create comprehensive ADR documenting the decision to use Docker for all services including UDP tracker despite potential performance overhead
- Document technical challenges with host networking and connection tracking
- Reference related GitHub issues #27 and torrust#72 from torrust-demo repo
- Update application README with Docker design decision section
- Update twelve-factor refactor docs to reference new ADR
- Update main README to list both ADRs with descriptions
- Update cloud-init comments to clarify dependency installation purpose
- Prioritize demo simplicity and consistency over peak performance
- Comment out Rust build packages (pkg-config, libssl-dev, etc.) in cloud-init
- Comment out Rust installation and configuration commands
- Update final message to reflect Docker-only deployment approach
- Add clear instructions for re-enabling Rust if needed for source compilation
- Aligns with ADR-002 decision to use Docker for all services
- Reduces VM provisioning time and complexity while preserving flexibility
- Add proxy_set_header X-Forwarded-For to nginx HTTP (port 80) configuration
- Enables HTTP tracker functionality through reverse proxy
- Fix resolves tracker client connection issues when using proxy
- Update smoke testing guide to reflect working HTTP tracker via proxy
- Add comprehensive end-to-end smoke testing documentation
- Update .gitignore to exclude cloned torrust-tracker/ directory
- Add project words for documentation spelling checks

Tested:
- UDP trackers (6868, 6969): ✅ Working
- HTTP tracker via proxy (80): ✅ Working
- Health check API (80): ✅ Working
- Statistics API (80): ✅ Working
- Comprehensive tracker checker: ✅ All endpoints passing
ACK 8fac056
Relates to: #10
This PR implements a complete local testing infrastructure using KVM/libvirt to enable reliable local development and testing before production deployment to Hetzner.
📋 What's Included
🔧 Infrastructure Setup
ens3)
⚙️ Makefile Automation
🧪 Testing & Monitoring
📚 Documentation
🎯 Current Status
✅ Working Infrastructure
📝 TODO List for Review
🔍 Manual Testing Required
- make install-deps - Dependency installation
- make init - OpenTofu initialization
- make plan - Infrastructure planning
- make apply - VM deployment
- make apply-minimal - Minimal configuration deployment
- make monitor-cloud-init - Real-time monitoring
- make restart-and-monitor - Complete restart workflow
- make ssh - SSH connection
- make destroy - VM cleanup
- make test - Infrastructure tests
🧪 Infrastructure Validation
tests/infrastructure/test-local-setup.sh prerequisites
tests/infrastructure/test-local-setup.sh full-test
🎯 Torrust Tracker Installation Testing
DISCARDED: Compilation from sources was discarded. We have prioritised easy update over performance for the demo.
📋 Code Review
🎯 Next PRs Scope
The next PR will focus on production deployment:
🔒 Security Notes
local.tfvars is git-ignored and contains actual secrets locally
🎯 Testing the Infrastructure
📊 Files Changed
🚀 This provides a solid foundation for local testing before production Hetzner deployment.