-
Bug
-
Resolution: Unresolved
-
Undefined
-
None
-
4.16.z
-
None
-
Critical
-
None
-
OTA 269, OTA 270, OTA 271, OTA 272
-
4
-
False
-
Description of problem:
Following the oc adm upgrade --to-multi-arch rollout, the cluster's core components were disrupted for an extended period, causing service downtime.
Version-Release number of selected component (if applicable):
RHOCP 4.16.35
Actual results:
Between roughly 01:30 UTC and 04:00 UTC:
- etcd database back-end size increased significantly, peaking at close to 7.5 GiB at 03:45 UTC, up from the regular 4.8 GiB (seen both before and after the peak).
- etcd went through multiple iterations of outage and leadership changes. RAM consumption was reasonably low, averaging 20-30% with peaks of only ~60%, but CPU showed a linear increase from 200% at 01:44 UTC to 413% at 03:56 UTC.
- The kube-apiserver was recurrently unavailable.
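For scale, the figures above can be turned into rough rates. The following is a minimal sketch, assuming the reported values are accurate and that the CPU increase was linear between the two quoted samples; the function and constant names are illustrative only and do not come from any cluster tooling.

```python
from datetime import datetime

# Figures copied from the observations above (assumed accurate).
DB_BASELINE_GIB = 4.8          # regular etcd back-end size
DB_PEAK_GIB = 7.5              # approximate peak at 03:45 UTC
CPU_START = ("01:44", 200.0)   # (UTC time, etcd CPU %)
CPU_END = ("03:56", 413.0)


def minutes_between(start: str, end: str) -> float:
    """Minutes elapsed between two same-day HH:MM timestamps."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds() / 60


def db_growth_percent(baseline_gib: float, peak_gib: float) -> float:
    """Relative growth of the database size over its baseline, in percent."""
    return (peak_gib - baseline_gib) / baseline_gib * 100


def cpu_slope_per_minute(start: tuple, end: tuple) -> float:
    """Average CPU increase in percentage points per minute."""
    return (end[1] - start[1]) / minutes_between(start[0], end[0])


if __name__ == "__main__":
    growth = db_growth_percent(DB_BASELINE_GIB, DB_PEAK_GIB)
    slope = cpu_slope_per_minute(CPU_START, CPU_END)
    print(f"etcd DB growth over baseline: {growth:.1f}%")
    print(f"etcd CPU increase: {slope:.2f} percentage points per minute")
```

With the reported numbers this works out to roughly 56% database growth over baseline and about 1.6 CPU percentage points per minute across the 132-minute window.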
Expected results:
The multi-arch migration should not lead to service disruption.
Additional info:
The environment is quite large, which may have contributed to the sudden peak in resource consumption.