Type: Bug
Resolution: Unresolved
Priority: Undefined
Fix Version/s: None
Affects Version/s: 4.16.z
Component/s: Cluster Version Operator
Labels: None
Severity: Critical
Regression: None
Sprint: OTA 269, OTA 270, OTA 271, OTA 272
sprint_count: 4
Blocked: False
Blocked Reason: None

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

PX Impact Score:

Description of problem:

Following the oc adm upgrade --to-multi-arch rollout, the cluster's core components were disrupted for an extended period, causing service downtime.

Version-Release number of selected component (if applicable):

RHOCP 4.16.35

Actual results:

Between roughly 01:30 UTC and 04:00 UTC, etcd:
- Saw its database back-end size increase significantly, peaking at close to 7.5 GiB at 03:45 UTC, up from the regular 4.8 GiB (the size both before and after the peak).
- Went through multiple rounds of outages and leadership changes.

RAM consumption was reasonably low, averaging 20-30% with peaks of only ~60%, but CPU usage showed a linear increase from 200% at 01:44 UTC to 413% at 03:56 UTC.

Over the same 01:30 to 04:00 UTC window, the kube-apiserver was recurrently unavailable.
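
For scale, the figures reported above can be turned into a growth percentage and a CPU slope. This is only a rough sketch: it uses nothing but the numbers quoted in this report, treats the CPU increase as a straight line between the two quoted samples, and all variable names are illustrative.

```python
from datetime import datetime

# Figures quoted in this report (all times UTC, same day).
DB_BASELINE_GIB = 4.8   # regular etcd back-end size
DB_PEAK_GIB = 7.5       # approximate peak, reported as "close to"
CPU_START_PCT, CPU_START_AT = 200.0, "01:44"
CPU_END_PCT, CPU_END_AT = 413.0, "03:56"


def minutes_between(start: str, end: str) -> float:
    """Minutes elapsed between two HH:MM timestamps on the same day."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds() / 60


# Database growth relative to the regular size.
db_growth_gib = DB_PEAK_GIB - DB_BASELINE_GIB
db_growth_pct = db_growth_gib / DB_BASELINE_GIB * 100

# Average CPU slope, assuming a linear rise between the two samples.
cpu_window_min = minutes_between(CPU_START_AT, CPU_END_AT)
cpu_slope = (CPU_END_PCT - CPU_START_PCT) / cpu_window_min

print(f"etcd DB growth: +{db_growth_gib:.1f} GiB ({db_growth_pct:.0f}% over baseline)")
print(f"CPU rise: {cpu_slope:.2f} percentage points/min over {cpu_window_min:.0f} min")
```

Under those assumptions, the database grew by roughly half again over its regular size, while CPU usage roughly doubled over a little more than two hours.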

Expected results:

The multi-arch transition should not lead to service disruption.

Additional info:

The environment is quite large, which may have contributed to the sudden peak in resource consumption.

Assignee: W. Trevor King

Reporter: Robert Sandu

QA Contact: Jian Li

Votes: 0

Watchers: 10

Created: 2025/03/24 11:32 AM

Updated: 2025/06/04 7:29 AM
