@dhlaluku
Contributor

This feature introduces a new API command that will improve troubleshooting of network issues on CloudStack hosted networks by executing network-utility commands (ping, traceroute, arping) remotely on system VMs.

Description

For troubleshooting purposes, CloudStack administrators may wish to execute network utility commands remotely on system VMs, or request system VMs to ping/traceroute/arping to specific addresses over specific interfaces. An API command to provide such functionalities is being developed without altering any existing APIs. The targeted system VMs for this feature are the Virtual Router (VR), Secondary Storage VM (SSVM) and the Console Proxy VM (CPVM).

FS: https://cwiki.apache.org/confluence/display/CLOUDSTACK/CloudStack+Remote+Diagnostics+API
ML discussion: https://markmail.org/message/xt7owmb2c6iw7tva

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)

GitHub Issue/PRs

Screenshots (if appropriate):

image

How Has This Been Tested?

Log on to the CloudStack management server as root admin and start cloudmonkey. Sync the APIs, then execute the run diagnostics command as follows:
run diagnostics ipaddress=www.shapeblue.com type=arping targetid=uuid params="-I eth0 -c 4"

Where:

  • ipaddress is the destination IP address or domain name to test the connection to;
  • type is the diagnostics command type to execute on the remote target;
  • targetid is the UUID of the system VM from which to run the test;
  • params are optional command-line arguments that apply to each diagnostics command type.
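
As a rough illustration of how the four parameters fit together, the sketch below assembles the cloudmonkey command line from its parts. The helper name `build_diagnostics_command` is hypothetical and is not part of CloudStack or cloudmonkey; it only formats a string and does not contact a management server.

```python
def build_diagnostics_command(ipaddress, diag_type, targetid, params=None):
    """Format a cloudmonkey 'run diagnostics' command line (hypothetical helper)."""
    parts = [
        "run diagnostics",
        f"ipaddress={ipaddress}",
        f"type={diag_type}",
        f"targetid={targetid}",
    ]
    # params is optional; quote it because it usually contains spaces.
    if params:
        parts.append(f'params="{params}"')
    return " ".join(parts)


command = build_diagnostics_command(
    "www.shapeblue.com", "arping", "uuid", params="-I eth0 -c 4"
)
print(command)
```

Leaving `params` out simply drops the last key, which matches the description of it as optional.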

Dev environment components:

  • Platform: ACS-4.12
  • Management OS: Ubuntu 18.04 LTS
  • Hypervisor: 1 Host running KVM with CentOS 7

This command does not run with the CloudStack simulator and has only been tested on real hardware environments.

Checklist:

  • I have read the CONTRIBUTING document.
  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
    Testing
  • I have added tests to cover my changes.
  • All relevant new and existing integration tests have passed.
  • A full integration test suite with all tests that can run on my environment has passed.

@dhlaluku changed the title from "Introducing a new diagnostics API command for system VMs for CloudStack admins" to "api:Introducing a new diagnostics API command for system VMs for CloudStack admins" on Jun 25, 2018
@dhlaluku changed the title from "api:Introducing a new diagnostics API command for system VMs for CloudStack admins" to "api: Introducing a new diagnostics API command for system VMs for CloudStack admins" on Jun 25, 2018
@borisstoyanov
Contributor

@blueorangutan package

@blueorangutan

@borisstoyanov a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

Contributor

@borisstoyanov borisstoyanov left a comment

I've executed the following manual tests:

| Test Name | Steps |
| --- | --- |
| Run against CPVM | With cloudmonkey execute: run diagnostics type=arping ipaddress=www.shapeblue.com id=[CPVM-id] params="-I eth1 -c 8" |
| Run against SSVM | With cloudmonkey execute: run diagnostics type=arping ipaddress=www.shapeblue.com id=[SSVM-id] params="-I eth1 -c 8" |
| Run against VR | With cloudmonkey execute: run diagnostics type=arping ipaddress=www.shapeblue.com id=[vr-id] params="-I eth1 -c 8" |
| Run a ping type | With cloudmonkey execute: run diagnostics type=ping ipaddress=www.shapeblue.com id=[vr-id] |
| Run an arping type | With cloudmonkey execute: run diagnostics type=arping ipaddress=www.shapeblue.com id=[vr-id] params="-I eth1 -c 8" |
| Run a traceroute type | With cloudmonkey execute: run diagnostics type=traceroute ipaddress=www.shapeblue.com id=[vr-id] |
| Run against VR with arguments | With cloudmonkey execute: run diagnostics type=arping ipaddress=www.shapeblue.com id=[vr-id] params="-I eth1 -c 8" |
| Run a ping type with arguments | With cloudmonkey execute: run diagnostics type=arping ipaddress=www.shapeblue.com id=[vr-id] params="-I eth1 -c 8" |
| Run a traceroute type with arguments | With cloudmonkey execute: run diagnostics type=arping ipaddress=www.shapeblue.com id=[vr-id] params="-I eth1 -c 8" |
| Run a type with invalid arguments | With cloudmonkey execute: run diagnostics type=arping ipaddress=www.shapeblue.com id=[vr-id] params="-H eth1 -c 8" |
| Run type to unreachable destination | With cloudmonkey execute: run diagnostics type=arping ipaddress=www.something-not-readchable.com id=[vr-id] params="-H eth1 -c 8" |
| Run with invalid type | run diagnostics type=netstat ipaddress=www.shapeblue.com id=[vr-id] |
| Run with invalid target id | run diagnostics type=ping ipaddress=www.shapeblue.com id=[invalid-id] |

LGTM, let's wait for the automated results.
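
The two negative cases in the manual tests (an unsupported type and an invalid target id) can be pictured as a simple pre-check before a request is sent. This is only a sketch under assumptions: the function name `precheck` is hypothetical, the set of types comes from the PR description (ping, traceroute, arping), and nothing here claims to mirror how the management server actually validates input.

```python
import uuid

# The three command types named in the PR description.
SUPPORTED_TYPES = {"ping", "traceroute", "arping"}


def precheck(diag_type, target_id):
    """Return a list of problems found before sending a request (hypothetical)."""
    problems = []
    if diag_type not in SUPPORTED_TYPES:
        problems.append(f"unsupported type: {diag_type}")
    try:
        # uuid.UUID raises ValueError for strings that are not well-formed UUIDs.
        uuid.UUID(target_id)
    except ValueError:
        problems.append(f"target id is not a UUID: {target_id}")
    return problems


print(precheck("ping", "0c04cf04-8222-47fd-a883-6cf335646d64"))
print(precheck("netstat", "0c04cf04-8222-47fd-a883-6cf335646d64"))
print(precheck("ping", "invalid-id"))
```

A well-formed UUID can still refer to a VM that does not exist, so a check like this would complement, not replace, the server-side lookup.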

@blueorangutan

Packaging result: ✔centos6 ✔centos7 ✔debian. JID-2152

@borisstoyanov
Contributor

@blueorangutan test

@blueorangutan

@borisstoyanov a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@blueorangutan

Trillian test result (tid-2825)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 26692 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr2721-t2825-kvm-centos7.zip
Intermitten failure detected: /marvin/tests/smoke/test_accounts.py
Intermitten failure detected: /marvin/tests/smoke/test_affinity_groups_projects.py
Intermitten failure detected: /marvin/tests/smoke/test_certauthority_root.py
Intermitten failure detected: /marvin/tests/smoke/test_deploy_virtio_scsi_vm.py
Intermitten failure detected: /marvin/tests/smoke/test_primary_storage.py
Intermitten failure detected: /marvin/tests/smoke/test_privategw_acl.py
Intermitten failure detected: /marvin/tests/smoke/test_snapshots.py
Intermitten failure detected: /marvin/tests/smoke/test_vm_life_cycle.py
Intermitten failure detected: /marvin/tests/smoke/test_host_maintenance.py
Intermitten failure detected: /marvin/tests/smoke/test_hostha_kvm.py
Smoke tests completed. 58 look OK, 10 have error(s)
Only failed tests results shown below:

| Test | Result | Time (s) | Test File |
| --- | --- | --- | --- |
| ContextSuite context=TestAccounts>:setup | Error | 0.00 | test_accounts.py |
| ContextSuite context=TestAddVmToSubDomain>:setup | Error | 0.00 | test_accounts.py |
| test_DeleteDomain | Error | 0.92 | test_accounts.py |
| test_forceDeleteDomain | Error | 0.94 | test_accounts.py |
| ContextSuite context=TestRemoveUserFromAccount>:setup | Error | 5.14 | test_accounts.py |
| ContextSuite context=TestDeployVmWithAffinityGroup>:setup | Error | 0.00 | test_affinity_groups_projects.py |
| test_provision_certificate | Error | 17.38 | test_certauthority_root.py |
| ContextSuite context=TestDeployVirtioSCSIVM>:setup | Error | 0.00 | test_deploy_virtio_scsi_vm.py |
| test_01_add_primary_storage_disabled_host | Error | 0.73 | test_primary_storage.py |
| test_01_primary_storage_nfs | Error | 0.10 | test_primary_storage.py |
| ContextSuite context=TestStorageTags>:setup | Error | 0.18 | test_primary_storage.py |
| test_03_vpc_privategw_restart_vpc_cleanup | Failure | 148.40 | test_privategw_acl.py |
| test_04_rvpc_privategw_static_routes | Failure | 249.85 | test_privategw_acl.py |
| test_02_list_snapshots_with_removed_data_store | Error | 1.13 | test_snapshots.py |
| test_01_secure_vm_migration | Error | 5.16 | test_vm_life_cycle.py |
| test_02_unsecure_vm_migration | Error | 5.62 | test_vm_life_cycle.py |
| test_03_secured_to_nonsecured_vm_migration | Error | 1.14 | test_vm_life_cycle.py |
| test_04_nonsecured_to_secured_vm_migration | Error | 4.59 | test_vm_life_cycle.py |
| ContextSuite context=TestVMLifeCycle>:setup | Error | 7.35 | test_vm_life_cycle.py |
| test_01_cancel_host_maintenace_with_no_migration_jobs | Failure | 0.09 | test_host_maintenance.py |
| test_02_cancel_host_maintenace_with_migration_jobs | Error | 2.25 | test_host_maintenance.py |
| test_hostha_enable_ha_when_host_in_maintenance | Error | 2.46 | test_hostha_kvm.py |

@borisstoyanov
Contributor

@blueorangutan test

@blueorangutan

@borisstoyanov a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@borisstoyanov
Contributor

@blueorangutan test

@blueorangutan

@borisstoyanov a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@blueorangutan

Trillian test result (tid-2830)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 29372 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr2721-t2830-kvm-centos7.zip
Intermitten failure detected: /marvin/tests/smoke/test_certauthority_root.py
Intermitten failure detected: /marvin/tests/smoke/test_deploy_virtio_scsi_vm.py
Intermitten failure detected: /marvin/tests/smoke/test_privategw_acl.py
Intermitten failure detected: /marvin/tests/smoke/test_vm_life_cycle.py
Intermitten failure detected: /marvin/tests/smoke/test_volumes.py
Intermitten failure detected: /marvin/tests/smoke/test_host_maintenance.py
Intermitten failure detected: /marvin/tests/smoke/test_hostha_kvm.py
Smoke tests completed. 62 look OK, 6 have error(s)
Only failed tests results shown below:

| Test | Result | Time (s) | Test File |
| --- | --- | --- | --- |
| test_provision_certificate | Error | 8.21 | test_certauthority_root.py |
| ContextSuite context=TestDeployVirtioSCSIVM>:setup | Error | 0.00 | test_deploy_virtio_scsi_vm.py |
| test_03_vpc_privategw_restart_vpc_cleanup | Failure | 1213.03 | test_privategw_acl.py |
| test_01_secure_vm_migration | Error | 5.16 | test_vm_life_cycle.py |
| test_02_unsecure_vm_migration | Error | 1.08 | test_vm_life_cycle.py |
| test_03_secured_to_nonsecured_vm_migration | Error | 1.08 | test_vm_life_cycle.py |
| test_04_nonsecured_to_secured_vm_migration | Error | 3.12 | test_vm_life_cycle.py |
| test_11_migrate_volume_and_change_offering | Error | 129.66 | test_volumes.py |
| test_hostha_enable_ha_when_host_in_maintenance | Error | 3.46 | test_hostha_kvm.py |
| test_hostha_kvm_host_recovering | Error | 7.56 | test_hostha_kvm.py |

@borisstoyanov
Contributor

@blueorangutan test

@blueorangutan

@borisstoyanov a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@blueorangutan

Trillian test result (tid-2831)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 28543 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr2721-t2831-kvm-centos7.zip
Intermitten failure detected: /marvin/tests/smoke/test_certauthority_root.py
Intermitten failure detected: /marvin/tests/smoke/test_deploy_virtio_scsi_vm.py
Intermitten failure detected: /marvin/tests/smoke/test_privategw_acl.py
Intermitten failure detected: /marvin/tests/smoke/test_vm_life_cycle.py
Intermitten failure detected: /marvin/tests/smoke/test_volumes.py
Intermitten failure detected: /marvin/tests/smoke/test_vpc_redundant.py
Intermitten failure detected: /marvin/tests/smoke/test_hostha_kvm.py
Smoke tests completed. 62 look OK, 6 have error(s)
Only failed tests results shown below:

| Test | Result | Time (s) | Test File |
| --- | --- | --- | --- |
| test_provision_certificate | Error | 7.25 | test_certauthority_root.py |
| ContextSuite context=TestDeployVirtioSCSIVM>:setup | Error | 0.00 | test_deploy_virtio_scsi_vm.py |
| test_03_vpc_privategw_restart_vpc_cleanup | Failure | 1157.31 | test_privategw_acl.py |
| test_01_secure_vm_migration | Error | 3.15 | test_vm_life_cycle.py |
| test_02_unsecure_vm_migration | Error | 1.10 | test_vm_life_cycle.py |
| test_03_secured_to_nonsecured_vm_migration | Error | 1.10 | test_vm_life_cycle.py |
| test_04_nonsecured_to_secured_vm_migration | Error | 3.19 | test_vm_life_cycle.py |
| test_11_migrate_volume_and_change_offering | Error | 128.27 | test_volumes.py |
| test_02_redundant_VPC_default_routes | Failure | 990.47 | test_vpc_redundant.py |

Contributor

@DaanHoogland DaanHoogland left a comment

Within the niche of this new feature we might find aspects we would like to be slightly different. The code looks good and contains unit and integration tests. Let's run with it.

from nose.plugins.attrib import attr


class TestRemoteDiagnostics(cloudstackTestCase):
Member

@dhlaluku can you add this to .travis.yml and fix it so that it works, or is skipped, against the simulator?

Contributor Author

The tests have been added to the Travis file, in the section that contains the tests for routers and system VMs.

Member

@rohityadavcloud rohityadavcloud left a comment

@dhlaluku for the simulator, see if you can add some mocks so that this API works against the simulator, i.e. so that Travis can test the business logic. Please see if you can fix that.

@rohityadavcloud
Member

Please hold off on merging. Everything LGTM, but @dhlaluku will add simulator support as part of his work to make the business logic testable, and will add his new Marvin test to .travis.yml.

Dingane Hlaluku added 2 commits July 6, 2018 12:15
troubleshooting of network issues in CloudStack hosted networks
@rohityadavcloud
Member

ping @dhlaluku

@dhlaluku dhlaluku force-pushed the remote-diagnostics-api branch from bbe835d to d2d0698 Compare July 10, 2018 10:43
@dhlaluku
Contributor Author

@rhtyd I have included "test_diagnostics" in the Travis file and updated the Marvin tests to skip the negative test cases on the Simulator hypervisor.
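
For readers unfamiliar with the pattern, skipping a negative case on the simulator can look roughly like the sketch below. It uses plain `unittest` so that it runs on its own; the class name, the test name and the hard-coded `hypervisor` attribute are hypothetical stand-ins, and the actual Marvin test in this PR may be structured differently.

```python
import unittest


class TestDiagnosticsSketch(unittest.TestCase):
    # Hypothetical stand-in: a real Marvin test would obtain the hypervisor
    # type from its test client rather than hard-coding it.
    hypervisor = "simulator"

    def test_ping_unreachable_address(self):
        if self.hypervisor.lower() == "simulator":
            self.skipTest("negative cases are not mocked on the simulator")
        # Only reached on a real hypervisor in this sketch.
        self.fail("would run the negative case against a real system VM")


suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestDiagnosticsSketch)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(len(result.skipped))
```

A skipped test is reported separately from failures, so the run still counts as successful.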

I have also defined some mocks for the positive cases; an example with cloudmonkey is below:

(local) 🐵 > run diagnostics targetid=0c04cf04-8222-47fd-a883-6cf335646d64 ipaddress=8.8.8.8 type=traceroute
{
  "diagnostics": {
    "exitcode": "0",
    "stderr": "",
    "stdout": "TRACEROUTE 8.8.8.8 executed in v-1-VM"
  }
}
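
A caller consuming this response would typically look at `exitcode` first. The sketch below parses the mock output shown above and treats a zero exit code as success; the function name `succeeded` is hypothetical, and the assumption that non-zero means failure follows the usual shell convention, which this thread does not state explicitly.

```python
import json

# Mock response copied from the cloudmonkey output above.
raw = """
{
  "diagnostics": {
    "exitcode": "0",
    "stderr": "",
    "stdout": "TRACEROUTE 8.8.8.8 executed in v-1-VM"
  }
}
"""


def succeeded(response_text):
    """Return True when the diagnostics exit code is zero (hypothetical helper)."""
    diagnostics = json.loads(response_text)["diagnostics"]
    # exitcode arrives as a string in this output, so convert before comparing.
    return int(diagnostics["exitcode"]) == 0


print(succeeded(raw))
```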

@rohityadavcloud
Member

Fantastic @dhlaluku - let's wait for travis to complete/pass.

.travis.yml Outdated
# Keep the TESTS sorted by name and grouped by type
- TESTS="smoke/test_certauthority_root"

- TESTS="smoke/test_diagnostics"
Member

Please keep the name sorted, include your file under an existing test section/block, don't add a new job/runner.

Contributor Author

Included the test in the section with the router and SSVM tests.

Member

No, you should keep it sorted by name as the file suggests.

Contributor Author

- TESTS="smoke/test_accounts
         smoke/test_affinity_groups
         smoke/test_affinity_groups_projects
         smoke/test_deploy_vgpu_enabled_vm
         smoke/test_deploy_vm_iso
         smoke/test_deploy_vm_root_resize
         smoke/test_deploy_vm_with_userdata
         smoke/test_deploy_vms_with_varied_deploymentplanners
         **smoke/test_diagnostics**
         smoke/test_disk_offerings
         smoke/test_dynamicroles
         smoke/test_global_settings
         smoke/test_guest_vlan_range"
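
One way to find the slot for a new entry in a name-sorted block like this is a sorted insert. The sketch below assumes "sorted by name" means plain string ordering, which is an interpretation and not something the .travis.yml comment spells out; the list is the block above without the new entry.

```python
import bisect

# The existing block, in the order it appears in the file.
tests = [
    "smoke/test_accounts",
    "smoke/test_affinity_groups",
    "smoke/test_affinity_groups_projects",
    "smoke/test_deploy_vgpu_enabled_vm",
    "smoke/test_deploy_vm_iso",
    "smoke/test_deploy_vm_root_resize",
    "smoke/test_deploy_vm_with_userdata",
    "smoke/test_deploy_vms_with_varied_deploymentplanners",
    "smoke/test_disk_offerings",
    "smoke/test_dynamicroles",
    "smoke/test_global_settings",
    "smoke/test_guest_vlan_range",
]

# insort keeps the list ordered as long as it was ordered to begin with.
bisect.insort(tests, "smoke/test_diagnostics")
position = tests.index("smoke/test_diagnostics")
print(position, tests == sorted(tests))
```

Under that ordering the new test lands between the `test_deploy_vms_*` entry and `test_disk_offerings`, which is where it was placed.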

  } else {
-     throw new CloudRuntimeException("Command execution failed: " + details);
+     executionDetailsMap.put(ApiConstants.STDOUT, "");
+     executionDetailsMap.put(ApiConstants.STDERR, details );
Member

Remove space after details.

Contributor Author

Removed

@dhlaluku
Contributor Author

@blueorangutan package

@blueorangutan

@dhlaluku a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan

Packaging result: ✔centos6 ✖centos7 ✔debian. JID-2168

@rohityadavcloud
Member

@blueorangutan package

@blueorangutan

@rhtyd a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan

Packaging result: ✔centos6 ✔centos7 ✔debian. JID-2170

@dhlaluku
Contributor Author

@blueorangutan test

@blueorangutan

@dhlaluku a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@blueorangutan

Trillian test result (tid-2842)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 35370 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr2721-t2842-kvm-centos7.zip
Intermitten failure detected: /marvin/tests/smoke/test_certauthority_root.py
Intermitten failure detected: /marvin/tests/smoke/test_deploy_virtio_scsi_vm.py
Intermitten failure detected: /marvin/tests/smoke/test_privategw_acl.py
Intermitten failure detected: /marvin/tests/smoke/test_public_ip_range.py
Intermitten failure detected: /marvin/tests/smoke/test_templates.py
Intermitten failure detected: /marvin/tests/smoke/test_usage.py
Intermitten failure detected: /marvin/tests/smoke/test_vm_life_cycle.py
Intermitten failure detected: /marvin/tests/smoke/test_volumes.py
Intermitten failure detected: /marvin/tests/smoke/test_host_maintenance.py
Smoke tests completed. 60 look OK, 8 have error(s)
Only failed tests results shown below:

| Test | Result | Time (s) | Test File |
| --- | --- | --- | --- |
| test_provision_certificate | Error | 9.21 | test_certauthority_root.py |
| ContextSuite context=TestDeployVirtioSCSIVM>:setup | Error | 0.00 | test_deploy_virtio_scsi_vm.py |
| test_03_vpc_privategw_restart_vpc_cleanup | Failure | 1226.87 | test_privategw_acl.py |
| test_04_extract_template | Failure | 128.25 | test_templates.py |
| ContextSuite context=TestISOUsage>:setup | Error | 0.00 | test_usage.py |
| test_01_secure_vm_migration | Error | 5.14 | test_vm_life_cycle.py |
| test_02_unsecure_vm_migration | Error | 4.12 | test_vm_life_cycle.py |
| test_03_secured_to_nonsecured_vm_migration | Error | 1.09 | test_vm_life_cycle.py |
| test_04_nonsecured_to_secured_vm_migration | Error | 0.07 | test_vm_life_cycle.py |
| test_06_download_detached_volume | Failure | 137.56 | test_volumes.py |
| test_11_migrate_volume_and_change_offering | Error | 128.44 | test_volumes.py |
| test_02_cancel_host_maintenace_with_migration_jobs | Error | 2.24 | test_host_maintenance.py |

Member

@rohityadavcloud rohityadavcloud left a comment

LGTM

@rohityadavcloud
Member

Merging this based on code reviews and testing. The failures are not related to this PR.
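
A quick cross-check of the four Trillian runs in this thread is sketched below. The sets are transcribed by hand from the failed-test tables (keyed by tid); the script only shows which test files failed in every run and whether the new diagnostics test file appears among the failures. It is not proof that the failures are unrelated, just a tally of what the tables report.

```python
# Failing test files per Trillian run, transcribed from the result tables
# in this thread (keys are the tid values).
failed = {
    2825: {
        "test_accounts.py", "test_affinity_groups_projects.py",
        "test_certauthority_root.py", "test_deploy_virtio_scsi_vm.py",
        "test_primary_storage.py", "test_privategw_acl.py",
        "test_snapshots.py", "test_vm_life_cycle.py",
        "test_host_maintenance.py", "test_hostha_kvm.py",
    },
    2830: {
        "test_certauthority_root.py", "test_deploy_virtio_scsi_vm.py",
        "test_privategw_acl.py", "test_vm_life_cycle.py",
        "test_volumes.py", "test_hostha_kvm.py",
    },
    2831: {
        "test_certauthority_root.py", "test_deploy_virtio_scsi_vm.py",
        "test_privategw_acl.py", "test_vm_life_cycle.py",
        "test_volumes.py", "test_vpc_redundant.py",
    },
    2842: {
        "test_certauthority_root.py", "test_deploy_virtio_scsi_vm.py",
        "test_privategw_acl.py", "test_templates.py", "test_usage.py",
        "test_vm_life_cycle.py", "test_volumes.py",
        "test_host_maintenance.py",
    },
}

# Files that failed in every run, and whether the new test ever failed.
persistent = set.intersection(*failed.values())
touched = any("test_diagnostics.py" in files for files in failed.values())
print(sorted(persistent))
print(touched)
```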

@rohityadavcloud rohityadavcloud merged commit 40af32b into apache:master Jul 13, 2018
@dhlaluku dhlaluku deleted the remote-diagnostics-api branch July 17, 2018 12:00
borisstoyanov pushed a commit to shapeblue/cloudstack that referenced this pull request Jul 23, 2018
This is a new feature for CS that allows Admin users improved
troubleshooting of network issues in CloudStack hosted networks.
bernardodemarco pushed a commit to scclouds/cloudstack that referenced this pull request Jul 16, 2025
Change to the maximum number of projects via the UI

Closes apache#2721

See merge request scclouds/scclouds!1193