OpenShift Bugs / OCPBUGS-57314

Unnecessary churn with OLMv0 operatorgroup clusterrole management

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • 4.18.0
    • 4.14.z, 4.15.z, 4.17.z, 4.16.z, 4.18.z, 4.19.z, 4.20.0
    • OLM
    • Important
    • None
    • Lillipup Sprint 272
    • 1
    • Rejected
    • False
    • None
    • *Cause*: When OLM reconciles OperatorGroup objects that target all namespaces, it updates the OperatorGroup ClusterRoles with a non-deterministic ordering of aggregationRule selectors. The problem occurs when the operators in the OperatorGroup's namespace provide two or more APIs in total.
      *Consequence*: OLM makes unnecessary update calls to the ClusterRoles created for the OperatorGroup in the namespace(s) where operators are installed, even when only the order of the selectors has changed. This causes unnecessary churn in etcd and the API server.
      *Fix*: OLM was updated to sort the selectors in the ClusterRole aggregationRule.
      *Result*: OLM no longer makes unnecessary update calls to the OperatorGroup's ClusterRole objects.
    • Bug Fix
    • In Progress

      This is a clone of issue OCPBUGS-57279. The following is the description of the original issue:

      Description of problem:

      When one or more operators are installed in a namespace whose OperatorGroup targets all namespaces, and those operators together provide at least two APIs, OLMv0 sends ClusterRole updates to the API server in which the only change to the aggregation rule is the order of its selectors. This happens every time the OperatorGroup is reconciled, which is triggered by, among other things, the creation or deletion of other namespaces.

      This causes unnecessary etcd writes and invalidation of auth caches in openshift-apiserver, which leads to further churn.
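      One plausible way this kind of churn arises is sketched below in Go. It is an assumption, not OLM's actual code: the sketch builds one aggregation selector per provided API by ranging over a Go map, whose iteration order is unspecified and randomized by the runtime, and then compares the result with the existing selectors using the order-sensitive `reflect.DeepEqual`. The `Selector` type, the API names, and the label keys are hypothetical stand-ins for the `matchLabels` of a Kubernetes label selector.

```go
package main

import (
	"fmt"
	"reflect"
)

// Selector is a hypothetical stand-in for the matchLabels part of a
// Kubernetes LabelSelector in a ClusterRole aggregationRule.
type Selector struct {
	MatchLabels map[string]string
}

// buildSelectors returns one selector per provided API. Because it ranges
// over a map, the order of the returned slice is not deterministic.
func buildSelectors(apis map[string]string) []Selector {
	var out []Selector
	for _, label := range apis {
		out = append(out, Selector{MatchLabels: map[string]string{label: "true"}})
	}
	return out
}

// needsUpdate mimics an order-sensitive comparison between the desired and
// the existing aggregationRule selectors.
func needsUpdate(desired, existing []Selector) bool {
	return !reflect.DeepEqual(desired, existing)
}

func main() {
	// Hypothetical provided APIs and their aggregation labels.
	apis := map[string]string{
		"foos.example.com": "example.com/aggregate-to-foos-admin",
		"bars.example.com": "example.com/aggregate-to-bars-admin",
	}
	existing := buildSelectors(apis)
	updates := 0
	for i := 0; i < 20; i++ { // each pass stands in for one reconcile
		desired := buildSelectors(apis)
		if needsUpdate(desired, existing) {
			updates++ // an API server write would happen here
			existing = desired
		}
	}
	fmt.Println("spurious updates:", updates)
}
```

      With two or more APIs the printed count is typically non-zero and varies between runs; with a single API there is only one selector, so its order cannot change, which matches the two-API threshold in the report.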

      Version-Release number of selected component (if applicable):

      4.19.0-rc.5    

      How reproducible:

      Always    

      Steps to Reproduce:

          1. Launch a 4.19.0-rc.5 cluster with clusterbot.
          2. Install several operators, together providing at least two APIs, in the namespace of the global-operators OperatorGroup (openshift-operators).
          3. Start a watch on the ClusterRole whose name begins with "olm.og.global-operators.admin-" (e.g. oc get clusterrole olm.og.global-operators.admin-3gjDVezhGPF6RBtOOpjEpDpKqO39v3NK8r4hmc -w -o yaml).
          4. Create and delete namespaces several times.
          5. Observe in the watch output that the ClusterRole is updated and that the only change is the order of the selectors in its aggregation rule.
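      Step 5 is an order-insensitive comparison done by eye. The Go sketch below performs the same check on two selector lists, for instance taken from consecutive objects in the watch output after decoding them (decoding is not shown). The `Selector` type is a hypothetical stand-in that models only `matchLabels`; `reflect.DeepEqual` compares map contents, so two selectors with the same labels count as equal.

```go
package main

import (
	"fmt"
	"reflect"
)

// Selector is a hypothetical stand-in for the matchLabels part of a
// Kubernetes LabelSelector.
type Selector struct {
	MatchLabels map[string]string
}

// sameElements reports whether a and b contain the same selectors, with the
// same multiplicities, in any order.
func sameElements(a, b []Selector) bool {
	if len(a) != len(b) {
		return false
	}
	used := make([]bool, len(b))
	for _, x := range a {
		found := false
		for j, y := range b {
			if !used[j] && reflect.DeepEqual(x, y) {
				used[j] = true
				found = true
				break
			}
		}
		if !found {
			return false
		}
	}
	return true
}

// onlyOrderChanged reports whether after differs from before solely in the
// order of its selectors.
func onlyOrderChanged(before, after []Selector) bool {
	return !reflect.DeepEqual(before, after) && sameElements(before, after)
}

func main() {
	foo := Selector{MatchLabels: map[string]string{"example.com/aggregate-to-foos-admin": "true"}}
	bar := Selector{MatchLabels: map[string]string{"example.com/aggregate-to-bars-admin": "true"}}
	fmt.Println(onlyOrderChanged([]Selector{foo, bar}, []Selector{bar, foo})) // true
	fmt.Println(onlyOrderChanged([]Selector{foo, bar}, []Selector{foo, bar})) // false
	fmt.Println(onlyOrderChanged([]Selector{foo, bar}, []Selector{foo}))      // false
}
```

      The pairwise matching is quadratic, which is acceptable for short selector lists.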

      Actual results:

      The ClusterRole is written even though the only change is the order of its selectors.

      Expected results:

      No writes to the ClusterRole occur, because the selector order is deterministic.

      Additional info:

          

              rh-ee-cchantse Catherine Chan-Tse
              openshift-crt-jira-prow OpenShift Prow Bot
              Jian Zhang
              Votes: 0
              Watchers: 5