  1. OpenShift Bugs
  2. OCPBUGS-54151

Extensive core RHOCP operations outage during multiarch upgrade

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Undefined
    • None
    • 4.16.z
    • None
    • Critical
    • None
    • OTA 269, OTA 270, OTA 271, OTA 272
    • 4
    • False

      Description of problem:

      Following the oc adm upgrade --to-multi-arch rollout, the cluster's core components were disrupted for an extended period, causing service downtime.
          

      Version-Release number of selected component (if applicable):

      RHOCP 4.16.35

      Actual results:

      Between roughly 01:30 UTC and 04:00 UTC, etcd:
      - Saw its database back-end size increase significantly, peaking at close to 7.5 GiB at 03:45 UTC, up from the regular 4.8 GiB (the size both before and after the peak).
      - Went through multiple outages and leadership changes.
      
      RAM consumption was reasonably low, averaging 20-30% with peaks of only ~60%, but CPU usage showed a linear increase from 200% at 01:44 UTC to 413% at 03:56 UTC.
      
      Over the same window (01:30 to 04:00 UTC), the kube-apiserver was recurrently unavailable.
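
      To put the figures above in proportion, the following is a minimal sketch that derives the relative etcd database growth and the average CPU slope from the numbers quoted in this report. The constant and function names are illustrative assumptions, not part of any OpenShift or etcd tooling, and the peak size is only "close to" 7.5 GiB, so the results are approximate.

```python
from datetime import datetime

# Figures copied from this report; all names below are hypothetical helpers.
DB_BASELINE_GIB = 4.8          # regular etcd back-end size
DB_PEAK_GIB = 7.5              # approximate peak at 03:45 UTC
CPU_START = ("01:44", 200.0)   # (UTC time, CPU %)
CPU_END = ("03:56", 413.0)     # (UTC time, CPU %)


def minutes_between(start: str, end: str) -> float:
    """Minutes elapsed between two HH:MM timestamps on the same day."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds() / 60


def db_growth_percent(baseline: float, peak: float) -> float:
    """Relative growth of the database size, in percent of the baseline."""
    return (peak - baseline) / baseline * 100


def cpu_slope_per_minute(start: tuple, end: tuple) -> float:
    """Average CPU increase in percentage points per minute."""
    return (end[1] - start[1]) / minutes_between(start[0], end[0])


if __name__ == "__main__":
    growth = db_growth_percent(DB_BASELINE_GIB, DB_PEAK_GIB)
    slope = cpu_slope_per_minute(CPU_START, CPU_END)
    print(f"etcd DB growth over baseline: {growth:.0f}%")
    print(f"CPU slope: {slope:.2f} percentage points per minute")
```

      Under these assumptions the database grew by roughly 56% over its baseline, and CPU usage rose by about 1.6 percentage points per minute across the 132 minutes between the two CPU samples.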

      Expected results:

      The multi-arch upgrade should not lead to service disruption.

      Additional info:

      The environment is quite large, which may have contributed to the sudden peak in resource consumption.
          

              trking W. Trevor King
              rhn-support-rsandu Robert Sandu
              Jian Li Jian Li
              Votes: 0
              Watchers: 10