Skip to content

Conversation

@ustcweizhou
Copy link
Contributor

Description

This PR provides multiple options to handle storage issue on kvm.

FS: to be added

Two main parts
(1) kvm: Handle storage issue on NFS/KVM in multiple ways : hardreset (default), noaction, destroyvms, stopagent
(2) kvm: kvm: Schedule investigate tasks for disconnected hosts

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

  • Major
  • Minor

Bug Severity

  • BLOCKER
  • Critical
  • Major
  • Minor
  • Trivial

Screenshots (if appropriate):

How Has This Been Tested?

@weizhouapache
Copy link
Member

related to #2722 #2890 #2984 #4586

@weizhouapache weizhouapache reopened this Feb 22, 2021
@weizhouapache weizhouapache marked this pull request as ready for review April 23, 2021 09:19
@rohityadavcloud rohityadavcloud added this to the 4.16.0.0 milestone May 25, 2021
@rohityadavcloud
Copy link
Member

@blueorangutan package

@blueorangutan
Copy link

@rhtyd a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan
Copy link

Packaging result: ✔️ centos7 ✔️ centos8 ✔️ debian. SL-JID 91

@GabrielBrascher
Copy link
Member

@ustcweizhou thanks for the PR. From what I understood (really quick glance of eyes) this avoid also situations of HA monitor a removed NFS storage. Am I right?

I have seen some weird issues where a Cluster that had NFS and removes it (standard storage removal process Up > Maintenance > Removed) but still HA expects NFS pool to be "Healthy" and simply Fence all hosts on the cluster.

@nvazquez
Copy link
Contributor

Hi @ustcweizhou can you please resolve the conflicts?

Copy link
Contributor

@DaanHoogland DaanHoogland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder about the use of Thread directly, @weizhouapache. Any particular reason?

Set<String> removedPools = new HashSet<String>();
for (String uuid : _storagePool.keySet()) {
NfsStoragePool primaryStoragePool = _storagePool.get(uuid);
protected class CheckPoolThread extends Thread {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not make this a ManagemedContextRunnable as well?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ManagedContextRunnable and Thread are both implemetation of Runnable interface.
Thread works in testing.

ManagedContextRunnable should also be ok. The following might work as well

protected class CheckPoolThread implements Runnable {

@rohityadavcloud
Copy link
Member

Ping @weizhouapache can you fix the conflicts, is this essential for 4.16?

_monitorExecutor = new ScheduledThreadPoolExecutor(1, new NamedThreadFactory("AgentMonitor"));

_scanHostsExecutor = new ScheduledThreadPoolExecutor(1, new NamedThreadFactory("HostsScanner"));
_investigatorExecutor = new ScheduledThreadPoolExecutor(InvestigateDisconnectedHostsPoolSize.value(), new NamedThreadFactory("DisconnectHostsInvestigator"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
_investigatorExecutor = new ScheduledThreadPoolExecutor(InvestigateDisconnectedHostsPoolSize.value(), new NamedThreadFactory("DisconnectHostsInvestigator"));
_investigatorExecutor = new ScheduledThreadPoolExecutor(InvestigateDisconnectedHostsPoolSize.value(), new NamedThreadFactory("DisconnectedHostsInvestigator"));

_connectExecutor.shutdownNow();
_monitorExecutor.shutdownNow();
_investigatorExecutor.shutdownNow();

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_scanHostsExecutor.shutdownNow() missing

protected final ConfigKey<Integer> InvestigateDisconnectedHostsInterval = new ConfigKey<>("Advanced", Integer.class, "investigate.disconnected.hosts.interval",
"300", "The time (in seconds) between VM investigation on disconnected hosts.", false);
protected final ConfigKey<Integer> InvestigateDisconnectedHostsPoolSize = new ConfigKey<Integer>("Advanced", Integer.class, "investigate.disconnected.hosts.pool.size", "10",
"Default pool size to investigate disconnected hosts", false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"Default pool size to investigate disconnected hosts", false);
"The thread pool size to investigate disconnected hosts", false);

@sureshanaparti
Copy link
Contributor

@weizhouapache can you address outstanding remarks, and fix the conflicts.

@weizhouapache
Copy link
Member

@rhtyd @sureshanaparti
this PR has many conflicts with #5239 which has been merged into master.
I think it is better to re-target to 4.17

what do you think ? @ravening @soreana

@soreana
Copy link
Member

soreana commented Sep 16, 2021

@weizhouapache I'm agree 👍

@sureshanaparti sureshanaparti modified the milestones: 4.16.0.0, 4.17.0.0 Sep 16, 2021
@soreana
Copy link
Member

soreana commented Dec 1, 2021

@weizhouapache I hope you are doing well :)
I'm going to work on this pr. How can I reproduce it?

@weizhouapache
Copy link
Member

@weizhouapache I hope you are doing well :) I'm going to work on this pr. How can I reproduce it?

@soreana
what do you want to reproduce ? this is an enhancement, not a bug.

@soreana
Copy link
Member

soreana commented Dec 1, 2021

@weizhouapache I hope you are doing well :) I'm going to work on this pr. How can I reproduce it?

@soreana what do you want to reproduce ? this is an enhancement, not a bug.

Some test cases to see if it works or not would be enough.

@weizhouapache
Copy link
Member

@weizhouapache I hope you are doing well :) I'm going to work on this pr. How can I reproduce it?

@soreana what do you want to reproduce ? this is an enhancement, not a bug.

Some test cases to see if it works or not would be enough.

@soreana
I think you can get more info on your wiki/jira, etc.

@ravening ravening mentioned this pull request Dec 16, 2021
12 tasks
@ravening
Copy link
Member

@weizhouapache I created a new pr for this in #5783

@sureshanaparti
Copy link
Contributor

@weizhouapache I created a new pr for this in #5783

@weizhouapache @ravening is this PR no longer relevant (as the changes are ported to #5783)?

@ravening
Copy link
Member

@weizhouapache I created a new pr for this in #5783

@weizhouapache @ravening is this PR no longer relevant (as the changes are ported to #5783)?

@sureshanaparti yes. this has conflicts.. i have resolved the conflicts and created new pr

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

10 participants