@muscariello muscariello commented Oct 13, 2025

Summary

This PR adds a privacy-first visit tracking system for docs.agntcy.org.

Changes

Core Implementation

  • Visit Tracker (docs/javascripts/visit-tracker-secure.js): Secure client-side tracking using GitHub Issues API
  • GitHub Workflow (process-visits-secure.yml): Secure workflow for processing visit data using GitHub secrets
  • Processing Script (process_visits.py): Python script to process and store visit data in gist

Configuration

  • Updated mkdocs.yml to include tracking script
  • Added visit archive directory structure

Features

Privacy First

  • No tokens exposed (uses public GitHub Issues API for submission)
  • GitHub secrets for secure gist access (server-side only)
  • Respects Do Not Track
  • Disabled on localhost
  • Bot detection

Secure Architecture

  • Client submits via GitHub Issues (no auth needed)
  • GitHub Actions processes with secrets (secure)
  • Data stored in private gist
  • Automatic issue cleanup

Developer Friendly

  • Clean separation of concerns
  • Well-organized in .github/
  • Comprehensive inline documentation

How It Works

  1. Client Side: Browser collects visits in localStorage
  2. Submission: Creates GitHub Issue with JSONL data (public API, no auth)
  3. Processing: GitHub Actions triggered by issue label
  4. Storage: Actions uses secrets to append to gist
  5. Cleanup: Issue auto-closed after processing
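
To make steps 2 and 3 concrete, here is a minimal sketch of how the processing side might pull the JSONL records out of an issue body. It is an illustration under assumptions, not the code in process_visits.py: the function name extract_visits and the idea that the records sit in the first fenced code block of the body are hypothetical.

```python
import json
import re

# Assumption: the issue body carries the visits in one fenced code block,
# one JSON object per line (JSONL). The fence is built with "`" * 3 so this
# example can itself live inside a fenced block.
FENCE = "`" * 3
FENCE_RE = re.compile(FENCE + r"[a-zA-Z]*\n(.*?)\n" + FENCE, re.DOTALL)


def extract_visits(issue_body: str) -> list[dict]:
    """Return the JSON objects found in the first fenced block of the body."""
    match = FENCE_RE.search(issue_body)
    if match is None:
        return []  # no code block, nothing to process
    visits = []
    for line in match.group(1).splitlines():
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)  # raises ValueError on malformed JSON
        if isinstance(record, dict):  # ignore bare strings, numbers, lists
            visits.append(record)
    return visits
```

Parsing with json.loads, rather than passing the text to a shell or eval, keeps user-supplied data inert until it has been validated.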

Security

  • ✅ No tokens in client code
  • ✅ GitHub secrets used only server-side
  • ✅ Private gist for data storage
  • ✅ Automated issue lifecycle

Related Issue

Closes #279

- Add secure visit tracker using GitHub Issues API
- Implement GitHub Actions workflows for processing visits
- Add CLI test scripts for validation (test-tracking-*.sh)
- Add browser automation test (test-tracking.js)
- Update Taskfile with 'task test:tracking' command
- Include visit tracker script in MkDocs configuration

The tracking system:
- Collects page visit data in localStorage
- Submits via GitHub Issues (no tokens exposed)
- Processes with GitHub Actions
- Respects privacy (Do Not Track, localhost disabled)
- Includes comprehensive testing suite

Signed-off-by: Luca Muscariello <[email protected]>
@muscariello muscariello requested a review from a team as a code owner October 13, 2025 13:59
@muscariello muscariello linked an issue Oct 13, 2025 that may be closed by this pull request
2 tasks
@muscariello muscariello requested review from jubarbot-cisco and removed request for a team and jubarbot-cisco October 13, 2025 13:59
Only keep process-visits-secure.yml which uses GitHub secrets
for secure gist access.

Signed-off-by: Luca Muscariello <[email protected]>
Test scripts moved to local-only usage:
- Removed test-tracking-simple.sh
- Removed test-tracking-flow.sh
- Removed test-tracking.js
- Removed test:tracking tasks from Taskfile
- Reverted .gitignore changes

These scripts remain available locally for manual testing.

Signed-off-by: Luca Muscariello <[email protected]>
@muscariello muscariello marked this pull request as draft October 13, 2025 15:11
Add secure validation layer to prevent malicious data submission:

- New validation script (validate_visit_data.py):
  * Size limits: 1 MB max per issue, 100 visits per issue
  * Field whitelisting and type validation
  * Path traversal prevention (no .. or ~)
  * Safe character sets for all fields
  * ISO timestamp validation
  * Domain validation for referrers

- Updated workflow (process-visits-secure.yml):
  * Validates all data before processing
  * Auto-closes invalid issues with explanation
  * Only processes data after validation success
  * Proper error handling and logging

Security protections:
- Prevents code injection attacks
- Blocks path traversal attempts
- Mitigates XSS via character whitelisting
- DoS protection via size limits
- No shell command execution of user data

Signed-off-by: Luca Muscariello <[email protected]>
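
The checks listed in the commit above could look roughly like the following sketch. It is not the contents of validate_visit_data.py: the field names (path, timestamp, referrer), the regular expressions, and the exact byte limit are assumptions chosen for illustration.

```python
import re
from datetime import datetime

MAX_BODY_BYTES = 1_000_000  # "1MB max issue"; the exact byte count is an assumption
MAX_VISITS = 100            # "100 visits per issue"

# Hypothetical whitelist: the real field names are not stated in the PR.
ALLOWED_FIELDS = {"path": str, "timestamp": str, "referrer": str}
SAFE_PATH = re.compile(r"^/[A-Za-z0-9/_\-.]*$")
SAFE_DOMAIN = re.compile(r"^(?:[A-Za-z0-9-]+\.)+[A-Za-z]{2,}$")


def validate_visit(visit: dict) -> list[str]:
    """Return a list of problems found in one visit record (empty if valid)."""
    errors = []
    for key, value in visit.items():
        expected = ALLOWED_FIELDS.get(key)
        if expected is None:
            errors.append(f"unexpected field: {key}")
        elif not isinstance(value, expected):
            errors.append(f"wrong type for {key}")
    if errors:
        return errors  # do not inspect values of a malformed record

    path = visit.get("path", "")
    # A missing path fails SAFE_PATH and is treated as unsafe as well.
    if ".." in path or "~" in path or not SAFE_PATH.match(path):
        errors.append("unsafe path")

    timestamp = visit.get("timestamp", "")
    if timestamp.endswith("Z"):  # older Pythons reject the "Z" suffix
        timestamp = timestamp[:-1] + "+00:00"
    try:
        datetime.fromisoformat(timestamp)
    except ValueError:
        errors.append("invalid timestamp")

    referrer = visit.get("referrer", "")
    if referrer and not SAFE_DOMAIN.match(referrer):
        errors.append("invalid referrer domain")
    return errors


def validate_batch(raw_body: str, visits: list[dict]) -> list[str]:
    """Apply the size limits first, then validate every record."""
    if len(raw_body.encode("utf-8")) > MAX_BODY_BYTES:
        return ["issue body too large"]
    if len(visits) > MAX_VISITS:
        return ["too many visits"]
    errors = []
    for index, visit in enumerate(visits):
        errors.extend(f"visit {index}: {e}" for e in validate_visit(visit))
    return errors
```

Checking the size limits before touching individual records keeps the cost of rejecting an oversized submission low.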
@muscariello
Member Author

Check this flow @ramizpolic

┌──────────────┐
│ User Browser │ (1) Submits via GitHub Issue API (no auth)
└──────┬───────┘
       │
       ↓
┌─────────────────────────────────────────┐
│ GitHub Issue                             │
│ - Public API (no token needed)          │
│ - Contains JSONL in code block          │
└──────┬──────────────────────────────────┘
       │
       ↓ (2) Trigger workflow
┌────────────────────────────────────────────┐
│ GitHub Actions Workflow                    │
│ ✓ Validates issue body                    │
│ ✓ Extracts JSONL block safely             │
│ ✓ Runs Python validation script           │
│ ✓ Checks all fields and formats           │
│ ✓ Rejects invalid data                    │
└──────┬─────────────────────────────────────┘
       │
       ↓ (3) Only if valid
┌────────────────────────────────────────────┐
│ GitHub Gist Storage (with secrets)         │
│ ✓ Appends validated data                  │
│ ✓ Uses VISIT_GIST_TOKEN (server-side)     │
│ ✓ Never exposes token to client           │
└────────────────────────────────────────────┘
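
Step (3) of the flow could be sketched as follows, assuming the workflow exposes the secret as the VISIT_GIST_TOKEN environment variable. The gist ID, the file name, and the helper names are hypothetical; the calls use the GitHub REST endpoints for reading and updating a gist (GET and PATCH on /gists/{gist_id}). Note that the API may truncate large file contents in the GET response, which this sketch does not handle.

```python
import json
import os
import urllib.request

API = "https://api.github.com/gists/"


def merge_jsonl(existing: str, visits: list[dict]) -> str:
    """Append visits as JSONL lines to the existing file content."""
    if existing and not existing.endswith("\n"):
        existing += "\n"
    return existing + "".join(json.dumps(v, sort_keys=True) + "\n" for v in visits)


def _call(method: str, gist_id: str, token: str, payload=None) -> dict:
    """Send one authenticated request to the gist endpoint and decode the JSON reply."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    request = urllib.request.Request(
        API + gist_id,
        data=data,
        method=method,
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def append_to_gist(gist_id: str, filename: str, visits: list[dict]) -> None:
    """Read the current file from the gist, append the visits, write it back."""
    token = os.environ["VISIT_GIST_TOKEN"]  # server-side only, never sent to the client
    gist = _call("GET", gist_id, token)
    current = gist["files"].get(filename, {}).get("content", "")
    updated = merge_jsonl(current, visits)
    _call("PATCH", gist_id, token, {"files": {filename: {"content": updated}}})
```

Keeping merge_jsonl free of network access makes the append logic testable without a token; only append_to_gist needs the secret.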

Remove test:tracking tasks from Taskfile.yml:
- test:tracking
- test:tracking:setup
- test:tracking:flow

Revert .gitignore changes:
- Remove node_modules entry
- Remove package-lock.json entry

These were part of the test scripts that have been moved to local-only usage.

Signed-off-by: Luca Muscariello <[email protected]>
@muscariello muscariello changed the title from "feat(docs): Add visit tracking with comprehensive testing" to "feat(docs): Add visit tracking" Oct 13, 2025
@muscariello
Member Author

We are keeping this on hold because a user must be authenticated with GitHub to create an issue. Visitors are not always signed in, and when they are, submitting under their account raises privacy issues.

Development

Successfully merging this pull request may close these issues.

[Docs]: collect stats about docs.agntcy.org
