@PannagaRao

Signed-off-by: Pannaga Rao Bhoja Ramamanohara

@PannagaRao changed the title from "Add kubelet and CRI-O panic detection invariant test" to "[WIP]: Add kubelet and CRI-O panic detection invariant test" on Sep 11, 2025
@openshift-ci bot added the do-not-merge/work-in-progress label (indicates that a PR should not merge because it is a work in progress) on Sep 11, 2025
@openshift-trt

openshift-trt bot commented Sep 12, 2025

Risk analysis has seen new tests most likely introduced by this PR.
Please ensure that new tests meet guidelines for naming and stability.

New tests seen in this PR at sha: 7570b8a

  • "[sig-node] kubelet-log-collector detects kubelet or CRI-O panics" [Total: 38, Pass: 38, Fail: 0, Flake: 0]

@openshift-merge-robot added the needs-rebase label (indicates a PR cannot be merged because it has merge conflicts with HEAD) on Sep 12, 2025
@openshift-merge-robot removed the needs-rebase label (indicates a PR cannot be merged because it has merge conflicts with HEAD) on Sep 15, 2025
@openshift-trt

openshift-trt bot commented Sep 16, 2025

Job Failure Risk Analysis for sha: e2aae7e

Job Name Failure Risk
pull-ci-openshift-origin-main-e2e-aws-disruptive High
operator conditions image-registry
This test has passed 99.44% of 3956 runs on release 4.21 [Overall] in the last week.

Open Bugs
Define New Post Analysis Command
pull-ci-openshift-origin-main-e2e-aws-ovn-single-node-upgrade IncompleteTests
Tests for this run (22) are below the historical average (4535): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)

Risk analysis has seen new tests most likely introduced by this PR.
Please ensure that new tests meet guidelines for naming and stability.

New tests seen in this PR at sha: e2aae7e

  • "[Monitor:kubelet-log-collector][sig-node] kubelet-log-collector detects kubelet or CRI-O panics" [Total: 11, Pass: 11, Fail: 0, Flake: 0]

@PannagaRao
Author

/test-with openshift/kubernetes#2468

@PannagaRao
Author

/test e2e-aws-ovn-fips

@PannagaRao
Author

/test e2e-aws-ovn-fips --from openshift/kubernetes#2468

@PannagaRao
Author

/test e2e-aws-ovn-fips --from=openshift/kubernetes#2468

@openshift-trt

openshift-trt bot commented Sep 18, 2025

Job Failure Risk Analysis for sha: 973a936

Job Name Failure Risk
pull-ci-openshift-origin-main-e2e-aws-disruptive High
operator conditions monitoring
This test has passed 99.52% of 4175 runs on release 4.21 [Overall] in the last week.

Open Bugs
Define New Post Analysis Command
---
operator conditions image-registry
This test has passed 99.54% of 4174 runs on release 4.21 [Overall] in the last week.

Open Bugs
Define New Post Analysis Command
---
operator conditions ingress
This test has passed 99.57% of 4175 runs on release 4.21 [Overall] in the last week.

Open Bugs
Define New Post Analysis Command
pull-ci-openshift-origin-main-e2e-openstack-ovn IncompleteTests
Tests for this run (104) are below the historical average (1641): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)

Risk analysis has seen new tests most likely introduced by this PR.
Please ensure that new tests meet guidelines for naming and stability.

New tests seen in this PR at sha: 973a936

  • "[Monitor:kubelet-log-collector][sig-node] kubelet-log-collector detects kubelet or CRI-O panics" [Total: 36, Pass: 36, Fail: 0, Flake: 0]

@PannagaRao
Author

/payload-job-with-prs e2e-aws-ovn-fips openshift/kubernetes#2468

@openshift-ci
Contributor

openshift-ci bot commented Sep 18, 2025

@PannagaRao: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • e2e-aws-ovn-fips

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/0915cc20-94aa-11f0-86a6-18150b14e243-0

@PannagaRao
Author

/payload-job-with-prs e2e-aws-ovn-fips openshift/kubernetes#2468

@openshift-ci
Contributor

openshift-ci bot commented Sep 18, 2025

@PannagaRao: it appears that you have attempted to use some version of the payload command, but your comment was incorrectly formatted and cannot be acted upon. See the docs for usage info.

@PannagaRao
Author

/payload-job-with-prs e2e-aws-ovn-fips openshift/kubernetes#2468

@openshift-ci
Contributor

openshift-ci bot commented Sep 18, 2025

@PannagaRao: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • e2e-aws-ovn-fips

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/148ed9e0-94d0-11f0-9cf6-f13decee80c1-0

@PannagaRao
Author

/payload-job-with-prs e2e-aws-ovn-fips openshift/kubernetes#2468

@openshift-ci
Contributor

openshift-ci bot commented Sep 22, 2025

@PannagaRao: it appears that you have attempted to use some version of the payload command, but your comment was incorrectly formatted and cannot be acted upon. See the docs for usage info.

@openshift-trt

openshift-trt bot commented Oct 3, 2025

Job Failure Risk Analysis for sha: 9d16343

Job Name Failure Risk
pull-ci-openshift-origin-main-e2e-aws-disruptive High
operator conditions image-registry
This test has passed 98.97% of 4163 runs on release 4.21 [Overall] in the last week.

Open Bugs
Define New Post Analysis Command
pull-ci-openshift-origin-main-e2e-aws-ovn-serial-2of2 IncompleteTests
Tests for this run (25) are below the historical average (1474): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-aws-ovn-single-node-upgrade Low
Job run should complete before timeout
This test has passed 79.83% of 5390 runs on release 4.21 [Overall] in the last week.
pull-ci-openshift-origin-main-e2e-openstack-ovn IncompleteTests
Tests for this run (25) are below the historical average (2510): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)

Risk analysis has seen new tests most likely introduced by this PR.
Please ensure that new tests meet guidelines for naming and stability.

New Test Risks for sha: 9d16343

Job Name New Test Risk
pull-ci-openshift-origin-main-e2e-aws-csi High - "[Monitor:kubelet-log-collector][sig-node] node-system-log-collector detects kubelet or CRI-O panics" is a new test that was not present in all runs against the current commit.
pull-ci-openshift-origin-main-e2e-aws-ovn-serial-1of2 High - "[Monitor:kubelet-log-collector][sig-node] node-system-log-collector detects kubelet or CRI-O panics" is a new test that was not present in all runs against the current commit.
pull-ci-openshift-origin-main-e2e-aws-ovn-serial-2of2 High - "[Monitor:kubelet-log-collector][sig-node] node-system-log-collector detects kubelet or CRI-O panics" is a new test that was not present in all runs against the current commit.
pull-ci-openshift-origin-main-e2e-gcp-csi High - "[Monitor:kubelet-log-collector][sig-node] node-system-log-collector detects kubelet or CRI-O panics" is a new test that was not present in all runs against the current commit.

New tests seen in this PR at sha: 9d16343

  • "[Monitor:kubelet-log-collector][sig-node] node-system-log-collector detects kubelet or CRI-O panics" [Total: 59, Pass: 59, Fail: 0, Flake: 0]

@openshift-trt

openshift-trt bot commented Oct 3, 2025

Job Failure Risk Analysis for sha: 9d16343

Job Name Failure Risk
pull-ci-openshift-origin-main-e2e-aws-disruptive High
operator conditions image-registry
This test has passed 98.95% of 4180 runs on release 4.21 [Overall] in the last week.

Open Bugs
Define New Post Analysis Command
pull-ci-openshift-origin-main-e2e-aws-ovn-serial-2of2 IncompleteTests
Tests for this run (25) are below the historical average (1469): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-aws-ovn-single-node-upgrade Low
Job run should complete before timeout
This test has passed 79.97% of 5412 runs on release 4.21 [Overall] in the last week.
pull-ci-openshift-origin-main-e2e-openstack-ovn IncompleteTests
Tests for this run (25) are below the historical average (2510): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)

Risk analysis has seen new tests most likely introduced by this PR.
Please ensure that new tests meet guidelines for naming and stability.

New Test Risks for sha: 9d16343

Job Name New Test Risk
pull-ci-openshift-origin-main-e2e-aws-csi High - "[Monitor:kubelet-log-collector][sig-node] node-system-log-collector detects kubelet or CRI-O panics" is a new test that was not present in all runs against the current commit.
pull-ci-openshift-origin-main-e2e-aws-ovn-serial-1of2 High - "[Monitor:kubelet-log-collector][sig-node] node-system-log-collector detects kubelet or CRI-O panics" is a new test that was not present in all runs against the current commit.
pull-ci-openshift-origin-main-e2e-aws-ovn-serial-2of2 High - "[Monitor:kubelet-log-collector][sig-node] node-system-log-collector detects kubelet or CRI-O panics" is a new test that was not present in all runs against the current commit.
pull-ci-openshift-origin-main-e2e-gcp-csi High - "[Monitor:kubelet-log-collector][sig-node] node-system-log-collector detects kubelet or CRI-O panics" is a new test that was not present in all runs against the current commit.

New tests seen in this PR at sha: 9d16343

  • "[Monitor:kubelet-log-collector][sig-node] node-system-log-collector detects kubelet or CRI-O panics" [Total: 59, Pass: 59, Fail: 0, Flake: 0]

@openshift-merge-robot added the needs-rebase label (indicates a PR cannot be merged because it has merge conflicts with HEAD) on Nov 4, 2025
@openshift-trt

openshift-trt bot commented Nov 4, 2025

Job Failure Risk Analysis for sha: 9d16343

Job Name Failure Risk
pull-ci-openshift-origin-main-e2e-aws-ovn-serial-2of2 IncompleteTests
Tests for this run (25) are below the historical average (1797): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-aws-ovn-single-node-upgrade Medium
Job run should complete before timeout
This test has passed 90.52% of 4272 runs on release 4.21 [Overall] in the last week.
pull-ci-openshift-origin-main-e2e-metal-ipi-virtualmedia Medium
Job run should complete before timeout
This test has passed 90.52% of 4272 runs on release 4.21 [Overall] in the last week.

Risk analysis has seen new tests most likely introduced by this PR.
Please ensure that new tests meet guidelines for naming and stability.

New Test Risks for sha: 9d16343

Job Name New Test Risk
pull-ci-openshift-origin-main-e2e-aws-csi High - "[Monitor:kubelet-log-collector][sig-node] node-system-log-collector detects kubelet or CRI-O panics" is a new test that was not present in all runs against the current commit.
pull-ci-openshift-origin-main-e2e-aws-ovn-microshift High - "[Monitor:kubelet-log-collector][sig-node] node-system-log-collector detects kubelet or CRI-O panics" is a new test that was not present in all runs against the current commit.
pull-ci-openshift-origin-main-e2e-aws-ovn-microshift-serial High - "[Monitor:kubelet-log-collector][sig-node] node-system-log-collector detects kubelet or CRI-O panics" is a new test that was not present in all runs against the current commit.
pull-ci-openshift-origin-main-e2e-aws-ovn-serial-1of2 High - "[Monitor:kubelet-log-collector][sig-node] node-system-log-collector detects kubelet or CRI-O panics" is a new test that was not present in all runs against the current commit.
pull-ci-openshift-origin-main-e2e-aws-ovn-serial-2of2 High - "[Monitor:kubelet-log-collector][sig-node] node-system-log-collector detects kubelet or CRI-O panics" is a new test that was not present in all runs against the current commit.
pull-ci-openshift-origin-main-e2e-gcp-csi High - "[Monitor:kubelet-log-collector][sig-node] node-system-log-collector detects kubelet or CRI-O panics" is a new test that was not present in all runs against the current commit.

New tests seen in this PR at sha: 9d16343

  • "[Monitor:kubelet-log-collector][sig-node] node-system-log-collector detects kubelet or CRI-O panics" [Total: 61, Pass: 61, Fail: 0, Flake: 0]

@openshift-ci-robot removed the verified label (signifies that the PR passed pre-merge verification criteria) on Nov 10, 2025
@openshift-ci bot removed the lgtm label (indicates that a PR is ready to be merged) on Nov 10, 2025
@openshift-merge-robot removed the needs-rebase label (indicates a PR cannot be merged because it has merge conflicts with HEAD) on Nov 10, 2025
@openshift-ci
Contributor

openshift-ci bot commented Nov 10, 2025

New changes are detected. LGTM label has been removed.

@openshift-trt

openshift-trt bot commented Nov 10, 2025

Risk analysis has seen new tests most likely introduced by this PR.
Please ensure that new tests meet guidelines for naming and stability.

New tests seen in this PR at sha: 146e59e

  • "[Monitor:kubelet-log-collector][sig-node] node-system-log-collector detects kubelet or CRI-O panics" [Total: 7, Pass: 7, Fail: 0, Flake: 0]

@openshift-trt

openshift-trt bot commented Nov 10, 2025

Risk analysis has seen new tests most likely introduced by this PR.
Please ensure that new tests meet guidelines for naming and stability.

New tests seen in this PR at sha: 146e59e

  • "[Monitor:kubelet-log-collector][sig-node] node-system-log-collector detects kubelet or CRI-O panics" [Total: 11, Pass: 11, Fail: 0, Flake: 0]

@kannon92
Contributor

cc @dgoodwin

I know that TRT was interested in improving panic detection. Curious if you all could help review?

@kannon92
Contributor

@PannagaRao can you fix the go fmt failures?

Signed-off-by: Pannaga Rao Bhoja Ramamanohara
@PannagaRao force-pushed the monitor-test-panic branch 2 times, most recently from 00d170a to 9a37149, on November 13, 2025 18:53
@PannagaRao
Author

/retest

@openshift-trt

openshift-trt bot commented Nov 14, 2025

Risk analysis has seen new tests most likely introduced by this PR.
Please ensure that new tests meet guidelines for naming and stability.

New Test Risks for sha: 9a37149

Job Name New Test Risk
pull-ci-openshift-origin-main-e2e-aws-csi High - "[Monitor:kubelet-log-collector][sig-node] node-system-log-collector detects kubelet or CRI-O panics" is a new test that was not present in all runs against the current commit.

New tests seen in this PR at sha: 9a37149

  • "[Monitor:kubelet-log-collector][sig-node] node-system-log-collector detects kubelet or CRI-O panics" [Total: 15, Pass: 15, Fail: 0, Flake: 0]

@openshift-trt

openshift-trt bot commented Nov 14, 2025

Risk analysis has seen new tests most likely introduced by this PR.
Please ensure that new tests meet guidelines for naming and stability.

New Test Risks for sha: 9a37149

Job Name New Test Risk
pull-ci-openshift-origin-main-e2e-aws-csi High - "[Monitor:kubelet-log-collector][sig-node] node-system-log-collector detects kubelet or CRI-O panics" is a new test that was not present in all runs against the current commit.
pull-ci-openshift-origin-main-e2e-aws-ovn-serial-2of2 High - "[Monitor:kubelet-log-collector][sig-node] node-system-log-collector detects kubelet or CRI-O panics" is a new test that was not present in all runs against the current commit.

New tests seen in this PR at sha: 9a37149

  • "[Monitor:kubelet-log-collector][sig-node] node-system-log-collector detects kubelet or CRI-O panics" [Total: 17, Pass: 17, Fail: 0, Flake: 0]

		errCh <- err
		return
	}
	newCrioLogs := eventsFromCrioLogs(nodeName, crioLogs)
Contributor

Nit: I was tripped up at first by 'newCrioLogs', whereas the others follow the 'newCrioInterval' / 'newCrioEvents' naming.

Author

Changed it to newCrioEvents.
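
For readers following this thread, here is a minimal sketch of how a helper in the spirit of eventsFromCrioLogs might turn raw log bytes into panic events. It is not the code from this PR: the panicEvent type, the eventsFromLogLines name, the marker list, and the sample log lines are all hypothetical. The only outside fact it relies on is that the Go runtime prefixes crash output with "panic: " or "fatal error: ".

```go
package main

import (
	"fmt"
	"strings"
)

// panicEvent is a hypothetical stand-in for whatever interval type the
// monitor test actually records; it is not a monitorapi type.
type panicEvent struct {
	Node    string
	Source  string // e.g. "kubelet" or "crio"
	Message string
}

// panicMarkers lists prefixes the Go runtime prints when a process crashes.
// Substring matching is a heuristic and may need tightening in practice.
var panicMarkers = []string{"panic: ", "fatal error: "}

// eventsFromLogLines (hypothetical name) returns one event for every log
// line that contains a panic marker.
func eventsFromLogLines(node, source string, logs []byte) []panicEvent {
	var events []panicEvent
	for _, line := range strings.Split(string(logs), "\n") {
		for _, marker := range panicMarkers {
			if strings.Contains(line, marker) {
				events = append(events, panicEvent{
					Node:    node,
					Source:  source,
					Message: strings.TrimSpace(line),
				})
				break // one event per line is enough
			}
		}
	}
	return events
}

func main() {
	// Made-up sample input, for illustration only.
	crioLogs := []byte("crio[1234]: level=info msg=\"started\"\n" +
		"crio[1234]: panic: runtime error: invalid memory address or nil pointer dereference\n")

	// Named after what the value holds (events), per the review nit above.
	newCrioEvents := eventsFromLogLines("node-1", "crio", crioLogs)
	for _, e := range newCrioEvents {
		fmt.Printf("%s/%s: %s\n", e.Node, e.Source, e.Message)
	}
}
```

Naming the result newCrioEvents rather than newCrioLogs makes it clear at the call site that the value has already been converted from raw logs.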

Locator(nodeLocator).
Message(monitorapi.NewMessage().
	Reason(reason).
	HumanMessage(human)).
Contributor

Looks like the junit references the locator, so this is likely not an issue; just noting that other messages in here include the nodeName as well:

monitorapi.NewMessage().Reason(reason).Node(nodeName).HumanMessage(message),

Author

It's redundant since the node is already in the locator; I tried keeping it consistent with other node-level functions in this file.
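
The redundancy argument can be illustrated with a small self-contained sketch. The locator, message, and interval types below are simplified, hypothetical stand-ins written only to mirror the fluent style quoted above; they are not the monitorapi types, and the "CrioPanic" reason is made up. The sketch assumes a consumer that reads the node from the locator first, which is what the junit observation in this thread suggests.

```go
package main

import "fmt"

// locator, message, and interval are hypothetical, simplified stand-ins.
type locator struct{ keys map[string]string }

type message struct {
	reason      string
	annotations map[string]string
	human       string
}

func newMessage() *message { return &message{annotations: map[string]string{}} }

func (m *message) Reason(r string) *message       { m.reason = r; return m }
func (m *message) Node(n string) *message         { m.annotations["node"] = n; return m }
func (m *message) HumanMessage(h string) *message { m.human = h; return m }

type interval struct {
	loc locator
	msg *message
}

// nodeOf resolves the node from the locator first and only falls back to the
// message annotation when the locator has no node key.
func nodeOf(i interval) string {
	if n, ok := i.loc.keys["node"]; ok {
		return n
	}
	return i.msg.annotations["node"]
}

func main() {
	loc := locator{keys: map[string]string{"node": "node-1"}}

	withoutNode := interval{loc: loc, msg: newMessage().Reason("CrioPanic").HumanMessage("panic: boom")}
	withNode := interval{loc: loc, msg: newMessage().Reason("CrioPanic").Node("node-1").HumanMessage("panic: boom")}

	// Both intervals resolve to the same node, so the extra Node(...) call
	// adds no information for a locator-based consumer.
	fmt.Println(nodeOf(withoutNode) == nodeOf(withNode)) // prints true
}
```

Under that assumption, adding Node(nodeName) to the message is a consistency choice rather than a functional one.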

@neisw
Contributor

neisw commented Nov 14, 2025

/approve
A couple of comments but no push for changes.

@openshift-ci
Contributor

openshift-ci bot commented Nov 14, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: DennisPeriquet, haircommander, neisw, PannagaRao

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [DennisPeriquet,neisw]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Signed-off-by: Pannaga Rao Bhoja Ramamanohara
@openshift-trt

openshift-trt bot commented Nov 14, 2025

Risk analysis has seen new tests most likely introduced by this PR.
Please ensure that new tests meet guidelines for naming and stability.

New tests seen in this PR at sha: 93e2bfb

  • "[Monitor:kubelet-log-collector][sig-node] node-system-log-collector detects kubelet or CRI-O panics" [Total: 11, Pass: 11, Fail: 0, Flake: 0]

@openshift-trt

openshift-trt bot commented Nov 15, 2025

Risk analysis has seen new tests most likely introduced by this PR.
Please ensure that new tests meet guidelines for naming and stability.

New tests seen in this PR at sha: 93e2bfb

  • "[Monitor:kubelet-log-collector][sig-node] node-system-log-collector detects kubelet or CRI-O panics" [Total: 12, Pass: 12, Fail: 0, Flake: 0]

@openshift-ci
Contributor

openshift-ci bot commented Nov 18, 2025

@PannagaRao: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name                                   Commit   Details  Required  Rerun command
ci/prow/e2e-aws-ovn                         9d16343  link     false     /test e2e-aws-ovn
ci/prow/e2e-metal-ipi-virtualmedia          9d16343  link     false     /test e2e-metal-ipi-virtualmedia
ci/prow/e2e-aws-ovn-single-node-upgrade     9d16343  link     false     /test e2e-aws-ovn-single-node-upgrade
ci/prow/e2e-aws-ovn-cgroupsv2               9d16343  link     false     /test e2e-aws-ovn-cgroupsv2
ci/prow/e2e-aws-ovn-single-node             9d16343  link     false     /test e2e-aws-ovn-single-node
ci/prow/e2e-aws-ovn-kube-apiserver-rollout  9d16343  link     false     /test e2e-aws-ovn-kube-apiserver-rollout
ci/prow/e2e-aws-disruptive                  9d16343  link     false     /test e2e-aws-disruptive
ci/prow/okd-scos-e2e-aws-ovn                9d16343  link     false     /test okd-scos-e2e-aws-ovn
ci/prow/e2e-openstack-ovn                   9d16343  link     false     /test e2e-openstack-ovn
ci/prow/e2e-vsphere-ovn                     93e2bfb  link     true      /test e2e-vsphere-ovn
ci/prow/e2e-vsphere-ovn-upi                 93e2bfb  link     true      /test e2e-vsphere-ovn-upi
ci/prow/e2e-metal-ipi-ovn-ipv6              93e2bfb  link     true      /test e2e-metal-ipi-ovn-ipv6

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-trt

openshift-trt bot commented Nov 18, 2025

Risk analysis has seen new tests most likely introduced by this PR.
Please ensure that new tests meet guidelines for naming and stability.

New Test Risks for sha: 93e2bfb

Job Name New Test Risk
pull-ci-openshift-origin-main-e2e-metal-ipi-ovn-ipv6 High - "[Monitor:kubelet-log-collector][sig-node] node-system-log-collector detects kubelet or CRI-O panics" is a new test that was not present in all runs against the current commit.

New tests seen in this PR at sha: 93e2bfb

  • "[Monitor:kubelet-log-collector][sig-node] node-system-log-collector detects kubelet or CRI-O panics" [Total: 13, Pass: 13, Fail: 0, Flake: 0]


Labels

  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • do-not-merge/hold: Indicates that a PR should not merge because someone has issued a /hold command.
  • jira/severity-low: Referenced Jira bug's severity is low for the branch this PR is targeting.
  • jira/valid-bug: Indicates that a referenced Jira bug is valid for the branch this PR is targeting.
  • jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.


7 participants