...

def Mwan3RuleReconciler.Reconcile(req ctrl.Request):
  rule_cr = k8sClient.get(req.NamespacedName)
  cnf_deployment = k8sClient.get_deployment_with_label(rule_cr.labels.sdewanPurpose)
  if rule_cr DeletionTimestamp exists:
    # The CR is being deleted; its finalizer keeps it in etcd until cleanup is done
    if cnf_deployment exists:
      if cnf_deployment is ready:
        for cnf_pod in cnf_deployment:
          err = openwrt_client.delete_rule(cnf_pod_ip, rule_cr)
          if err:
            return "re-queue req"
        rule_cr.finalizer = nil
        return "ok"
      else:
        return "re-queue req"
    else:
      # Just remove finalizer, because no CNF pod exists
      rule_cr.finalizer = nil
      return "ok"
  else:
    # The CR is not being deleted
    if cnf_deployment not exist:
      return "ok"
    else:
      if cnf_deployment not ready:
        # Reset appliedVersion to nil when cnf_deployment becomes not ready
        rule_cr.status.appliedVersion = nil
        return "re-queue req"
      else:
        for cnf_pod in cnf_deployment:
          runtime_cr = openwrt_client.get_rule(cnf_pod_ip)
          if runtime_cr != rule_cr:
            err = openwrt_client.add_or_update_rule(cnf_pod_ip, rule_cr)
            if err:
              # err may be caused by unapplied dependencies or by other failures
              return "re-queue req"
        # Set appliedVersion only after the rule is applied to all the CNF pods
        rule_cr.finalizer = cnf_finalizer
        rule_cr.status.appliedVersion = rule_cr.resourceVersion
        rule_cr.status.inSync = True
        return "ok"
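
The pseudocode above can be approximated by a small runnable sketch. This is only an illustration under stated assumptions: `RuleCR`, `FakeCNF`, `CNF_FINALIZER` and the `deleting` flag (standing in for a set DeletionTimestamp) are hypothetical in-memory stand-ins, not the real controller types. Because the fake never fails, the "re-queue on err" branches of the pseudocode are not exercised here.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

CNF_FINALIZER = "rule.finalizer.sdewan"  # hypothetical finalizer name


@dataclass
class RuleCR:
    name: str
    spec: dict
    resource_version: str = "1"
    finalizer: Optional[str] = None
    deleting: bool = False  # stands in for a set DeletionTimestamp
    applied_version: Optional[str] = None
    in_sync: bool = False


@dataclass
class FakeCNF:
    """In-memory stand-in for a CNF deployment: one rule table per pod."""
    ready: bool = True
    pods: Dict[str, Dict[str, dict]] = field(default_factory=dict)


def reconcile(rule: RuleCR, cnf: Optional[FakeCNF]) -> str:
    """Return "ok" or "requeue", following the branches of the pseudocode."""
    if rule.deleting:
        if cnf is not None:
            if not cnf.ready:
                return "requeue"
            for rules in cnf.pods.values():
                rules.pop(rule.name, None)  # removing an absent rule is a no-op
        rule.finalizer = None  # lets Kubernetes remove the CR
        return "ok"
    if cnf is None:
        return "ok"
    if not cnf.ready:
        rule.applied_version = None
        return "requeue"
    for rules in cnf.pods.values():
        if rules.get(rule.name) != rule.spec:  # compare runtime state first
            rules[rule.name] = dict(rule.spec)
    rule.finalizer = CNF_FINALIZER
    rule.applied_version = rule.resource_version
    rule.in_sync = True
    return "ok"


cnf = FakeCNF(ready=False, pods={"cnf-pod-1": {}, "cnf-pod-2": {}})
rule = RuleCR(name="rule1", spec={"policy": "policy1"})
first = reconcile(rule, cnf)   # CNF not ready: re-queue
cnf.ready = True
second = reconcile(rule, cnf)  # rule applied to every pod
rule.deleting = True
third = reconcile(rule, cnf)   # rule removed from every pod, finalizer cleared
```

The three calls walk through the not-ready, apply and delete branches in turn.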

Unusual Cases

In the following cases, when we say "call CNF API to create/update/delete rule", we mean the logic below:


def create_or_update_rule(rule):
  runtime_rule = openwrt_client.get_rule(rule.name)
  if runtime_rule exist:
    if runtime_rule equal rule:
      return
    else:
      openwrt_client.update_rule(rule)
  else:
    openwrt_client.add_rule(rule)

def delete_rule(rule):
  runtime_rule = openwrt_client.get_rule(rule.name)
  if runtime_rule exist:
    openwrt_client.del_rule(rule)
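
A minimal runnable sketch of the same get-compare-apply logic is shown below. `FakeOpenWrtClient` and its `writes` counter are hypothetical stand-ins for the CNF's rule API, used only to show that repeated calls are idempotent; the real client's signatures may differ.

```python
class FakeOpenWrtClient:
    """Hypothetical in-memory stand-in for the CNF's rule API."""

    def __init__(self):
        self.rules = {}
        self.writes = 0  # counts calls that changed state

    def get_rule(self, name):
        return self.rules.get(name)

    def add_rule(self, name, spec):
        self.rules[name] = dict(spec)
        self.writes += 1

    def update_rule(self, name, spec):
        self.rules[name] = dict(spec)
        self.writes += 1

    def del_rule(self, name):
        del self.rules[name]
        self.writes += 1


def create_or_update_rule(client, name, spec):
    runtime = client.get_rule(name)
    if runtime is None:
        client.add_rule(name, spec)
    elif runtime != spec:
        client.update_rule(name, spec)
    # runtime == spec: nothing to do


def delete_rule(client, name):
    if client.get_rule(name) is not None:
        client.del_rule(name)


client = FakeOpenWrtClient()
create_or_update_rule(client, "rule1", {"policy": "policy1"})  # add
create_or_update_rule(client, "rule1", {"policy": "policy1"})  # no-op
create_or_update_rule(client, "rule1", {"policy": "policy2"})  # update
delete_rule(client, "rule1")                                   # delete
delete_rule(client, "rule1")                                   # no-op
```

Only three of the five calls change state, which is what makes it safe for the controller to reconcile the same CR repeatedly.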


Case 1:

  • A deployment(CNF) for a given purpose has two pod replicas (CNF-pod-1 and CNF-pod-2)
  • Controller is also brought up.
  • CNF-pod-1 and CNF-pod-2 are both running with no/default configuration.
  • MWAN3 policy 1 is added
  • MWAN3 rule 1 and rule 2 are added to use MWAN3 Policy1.
  • Since the controller, CNF-pod-1 and CNF-pod-2 are all running, both CNF-pod-1 and CNF-pod-2 have the MWAN3 Policy1, rule1 and rule2 configuration.
  • Now CNF-pod-1 is stopped.

    The Mwan3Policy controller and the Mwan3Rule controller each receive a CNF event. The Mwan3Policy controller adds all the related Mwan3Policy CRs to its reconcile queue, and the Mwan3Rule controller adds all the related Mwan3Rule CRs to its reconcile queue. During reconcile, each controller finds that the CNF is not ready, so it sets the CR's status.appliedVersion to nil. The CRs are re-queued with a time delay.



  • MWAN3 rule 1 is deleted.

    Because every CR has a finalizer, the rule 1 CR is not deleted from etcd directly. Instead, the deletionTimestamp field is set on the rule 1 CR. The Mwan3Rule controller receives an event. During reconcile, the controller detects that the CNF is not ready, so it re-queues the CR with a delay.



  • MWAN3 rule 3 is added.

    The Mwan3Rule controller receives an event. During reconcile, the controller detects that the CNF is not ready, so it re-queues the CR with a delay.


  • MWAN3 rule 2 is updated.

    The Mwan3Rule controller receives an event. During reconcile, the controller detects that the CNF is not ready, so it re-queues the CR with a delay.


  • CNF-pod-1 is brought back up after 10 minutes (more than 5 minutes)

    Because the pod restarted, CNF-pod-1 is running with no/default configuration. The Mwan3Rule reconcile queue holds three CRs: rule1, rule2 and rule3. The controller reconciles each of them as follows. For rule1, the controller calls the CNF API to delete rule1 from both CNF-pod-1 and CNF-pod-2, then removes the finalizer from the rule1 CR, after which k8s deletes the rule1 CR from etcd. For rule2, the controller calls the CNF API to update rule2 on both CNF-pod-1 and CNF-pod-2, then sets rule2 status.appliedVersion=<current-version>, status.appliedTime=<now-time> and status.inSync=true. For rule3, the controller calls the CNF API to add rule3 to both CNF-pod-1 and CNF-pod-2, then sets the rule3 finalizer along with status.appliedVersion=<current-version>, status.appliedTime=<now-time> and status.inSync=true.


  • Ensure that both CNF-pod-1 and CNF-pod-2 have the latest configuration.

    Once the reconcile finishes, both CNF-pod-1 and CNF-pod-2 have the latest configuration.
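
The fan-out described in this case, where a CNF event causes each controller to queue all of its related CRs, could be sketched as follows. This is an assumption-laden illustration: it assumes that CRs and the CNF deployment are matched through a shared `sdewanPurpose` label value, and `enqueue_related` and the dict shapes are hypothetical names, not the controller's real types.

```python
from collections import deque


def enqueue_related(queue, rule_crs, deployment_labels):
    """Queue the name of every CR whose sdewanPurpose matches the deployment."""
    purpose = deployment_labels.get("sdewanPurpose")
    if purpose is None:
        return
    for cr in rule_crs:
        if cr["labels"].get("sdewanPurpose") != purpose:
            continue
        if cr["name"] not in queue:  # avoid queuing the same CR twice
            queue.append(cr["name"])


rule_crs = [
    {"name": "rule1", "labels": {"sdewanPurpose": "cnf-a"}},
    {"name": "rule2", "labels": {"sdewanPurpose": "cnf-a"}},
    {"name": "other", "labels": {"sdewanPurpose": "cnf-b"}},
]
queue = deque()
# A readiness change on the "cnf-a" deployment queues rule1 and rule2 only.
enqueue_related(queue, rule_crs, {"sdewanPurpose": "cnf-a"})
```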


Case 2:

  • A deployment(CNF) for a given purpose has two pod replicas (CNF-pod-1 and CNF-pod-2)
  • Controller is also brought up.
  • CNF-pod-1 and CNF-pod-2 are both running with no/default configuration.
  • MWAN3 policy 1 is added
  • MWAN3 rule 1 and rule 2 are added to use MWAN3 Policy1.
  • Since the controller, CNF-pod-1 and CNF-pod-2 are all running, both CNF-pod-1 and CNF-pod-2 have the MWAN3 Policy1, rule1 and rule2 configuration.
  • Now CNF-pod-1 is disconnected, but still running.

    We have an API readiness check for the CNF pod, so when CNF-pod-1 is disconnected it becomes not-ready. The Mwan3Policy controller and the Mwan3Rule controller each receive a CNF event. The Mwan3Policy controller adds all the related Mwan3Policy CRs to its reconcile queue, and the Mwan3Rule controller adds all the related Mwan3Rule CRs to its reconcile queue. During reconcile, each controller finds that the CNF is not ready, so it sets the CR's status.appliedVersion to nil. The CRs are re-queued with a time delay.


  • MWAN3 rule 1 is deleted.

    Because every CR has a finalizer, the rule 1 CR is not deleted from etcd directly. Instead, the deletionTimestamp field is set on the rule 1 CR. The Mwan3Rule controller receives an event. During reconcile, the controller detects that the CNF is not ready, so it re-queues the CR with a delay.


  • MWAN3 rule 3 is added.

    The Mwan3Rule controller receives an event. During reconcile, the controller detects that the CNF is not ready, so it re-queues the CR with a delay.


  • MWAN3 rule 2 is updated.

    The Mwan3Rule controller receives an event. During reconcile, the controller detects that the CNF is not ready, so it re-queues the CR with a delay.


  • CNF-pod-1 is brought back up after 10 minutes (more than 5 minutes)

    Because the pod restarted, CNF-pod-1 is running with no/default configuration. The Mwan3Rule reconcile queue holds three CRs: rule1, rule2 and rule3. The controller reconciles each of them as follows. For rule1, the controller calls the CNF API to delete rule1 from both CNF-pod-1 and CNF-pod-2, then removes the finalizer from the rule1 CR, after which k8s deletes the rule1 CR from etcd. For rule2, the controller calls the CNF API to update rule2 on both CNF-pod-1 and CNF-pod-2, then sets rule2 status.appliedVersion=<current-version>, status.appliedTime=<now-time> and status.inSync=true. For rule3, the controller calls the CNF API to add rule3 to both CNF-pod-1 and CNF-pod-2, then sets the rule3 finalizer along with status.appliedVersion=<current-version>, status.appliedTime=<now-time> and status.inSync=true.


  • Ensure that both CNF-pod-1 and CNF-pod-2 have the latest configuration.

    Once the reconcile finishes, both CNF-pod-1 and CNF-pod-2 have the latest configuration.
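
The "re-queue with delay" behaviour that this case relies on can be illustrated with a small time-ordered queue. This is only a sketch: `DelayQueue`, `run` and the 30-unit delay are hypothetical choices for illustration, and the real controller would rely on its framework's work queue rather than this class.

```python
import heapq


class DelayQueue:
    """Minimal work queue ordered by the time each item becomes due."""

    def __init__(self):
        self._heap = []
        self._seq = 0  # tie-breaker keeps insertion order for equal due times

    def add_after(self, item, now, delay):
        heapq.heappush(self._heap, (now + delay, self._seq, item))
        self._seq += 1

    def pop_due(self, now):
        due = []
        while self._heap and self._heap[0][0] <= now:
            due.append(heapq.heappop(self._heap)[2])
        return due


def run(queue, reconcile, now, delay):
    """Reconcile every due item once; put back those that ask for a re-queue."""
    for item in queue.pop_due(now):
        if reconcile(item) == "requeue":
            queue.add_after(item, now, delay)


state = {"cnf_ready": False}


def reconcile(name):
    return "ok" if state["cnf_ready"] else "requeue"


q = DelayQueue()
for name in ("rule1", "rule2", "rule3"):
    q.add_after(name, now=0, delay=0)
run(q, reconcile, now=0, delay=30)   # CNF not ready: all three are due again at t=30
pending_at_10 = q.pop_due(10)        # nothing is due yet
state["cnf_ready"] = True
run(q, reconcile, now=30, delay=30)  # CNF ready: the queue drains
remaining = q.pop_due(1000)
```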


Case 3:

  • A deployment(CNF) for a given purpose has two pod replicas (CNF-pod-1 and CNF-pod-2)
  • Controller is also brought up.
  • CNF-pod-1 and CNF-pod-2 are both running with no/default configuration.
  • MWAN3 policy 1 is added
  • MWAN3 rule 1 and rule 2 are added to use MWAN3 Policy1.
  • Since the controller, CNF-pod-1 and CNF-pod-2 are all running, both CNF-pod-1 and CNF-pod-2 have the MWAN3 Policy1, rule1 and rule2 configuration.
  • Controller is down for 10 minutes.
  • MWAN3 rule 1 is deleted.

    Because the controller is down, there is no event and no reconcile. The rule1 CR is not deleted from etcd because of its finalizer. Instead, k8s sets deletionTimestamp on the rule1 CR.


  • MWAN3 rule 3 is added.

    Because the controller is down, there is no event and no reconcile. The rule3 CR is added to etcd but not applied to the CNF. The rule3 status.appliedVersion, status.appliedTime and status.inSync fields hold nil/default values.


  • MWAN3 rule 2 is updated.

    Because the controller is down, there is no event and no reconcile. The rule2 CR is updated in etcd but not applied to the CNF. The rule2 status.appliedVersion, status.appliedTime and status.inSync fields keep the values they had before the controller went down.


  • Controller is up.

    The controller reconciles all CRs. For the rule1 CR, the controller calls the CNF API to delete rule1 from both CNF-pod-1 and CNF-pod-2, then removes the finalizer from the rule1 CR, after which k8s deletes the rule1 CR from etcd. For rule2, the controller calls the CNF API to update rule2 on both CNF-pod-1 and CNF-pod-2, then sets rule2 status.appliedVersion=<current-version>, status.appliedTime=<now-time> and status.inSync=true. For rule3, the controller calls the CNF API to add rule3 to both CNF-pod-1 and CNF-pod-2, then sets the rule3 finalizer along with status.appliedVersion=<current-version>, status.appliedTime=<now-time> and status.inSync=true.


  • Ensure that CNF-pod-1 and CNF-pod-2 have the latest configuration and there is no duplicate information.

    Once the reconcile finishes, both CNF-pod-1 and CNF-pod-2 have the latest configuration.


Case 4:

  • A deployment(CNF) for a given purpose has two pod replicas (CNF-pod-1 and CNF-pod-2)
  • Controller is also brought up.
  • CNF-pod-1 and CNF-pod-2 are both running with no/default configuration.
  • MWAN3 policy 1 is added
  • MWAN3 rule 1 and rule 2 are added to use MWAN3 Policy1.
  • Since the controller, CNF-pod-1 and CNF-pod-2 are all running, both CNF-pod-1 and CNF-pod-2 have the MWAN3 Policy1, rule1 and rule2 configuration.
  • Controller is down for 10 minutes.
  • After the controller goes down, CNF-pod-1 goes down.

    Because the controller is down, there is no event and no reconcile.


  • MWAN3 rule 1 is deleted.

    Because the controller is down, there is no event and no reconcile. The rule1 CR is not deleted from etcd because of its finalizer. Instead, k8s sets deletionTimestamp on the rule1 CR.


  • MWAN3 rule 3 is added.

    Because the controller is down, there is no event and no reconcile. The rule3 CR is added to etcd but not applied to the CNF. The rule3 status.appliedVersion, status.appliedTime and status.inSync fields hold nil/default values.


  • There is no change for MWAN3 rule 2
  • CNF-pod-1 is up

    Because the controller is down, there is no event and no reconcile. Because the pod restarted, CNF-pod-1 is running with no/default configuration.


  • Controller is up.

    The controller reconciles all CRs. For the rule1 CR, the controller calls the CNF API to delete rule1 from both CNF-pod-1 and CNF-pod-2, then removes the finalizer from the rule1 CR, after which k8s deletes the rule1 CR from etcd. For rule2, the controller calls the CNF API to update rule2 on both CNF-pod-1 and CNF-pod-2, then sets rule2 status.appliedVersion=<current-version>, status.appliedTime=<now-time> and status.inSync=true. For rule3, the controller calls the CNF API to add rule3 to both CNF-pod-1 and CNF-pod-2, then sets the rule3 finalizer along with status.appliedVersion=<current-version>, status.appliedTime=<now-time> and status.inSync=true.


  • Ensure that CNF-pod-1 and CNF-pod-2 have the latest configuration and there is no duplicate information.

    Once the reconcile finishes, both CNF-pod-1 and CNF-pod-2 have the latest configuration.

  • Controller goes down -> Create CNF Deployment and rule CRs -> Controller goes up
    • No reconcile is executed before the controller goes up. The rule CRs have empty status, and no rules are applied to the CNF deployment.
    • Once the controller goes up, it reconciles every rule CR. In the reconcile function, the rules are applied and each rule CR's status.appliedVersion is updated.
  • Controller goes down -> delete rule CRs -> Controller goes up
    • While the controller is down, the rules are not deleted from the CNF deployment. The rule CRs are not deleted from k8s etcd because of the finalizer.
    • Once the controller goes up, it reconciles every rule CR. It calls the CNF API to delete the rules and removes the CR finalizer.
  • CNF deployment goes to not-ready -> after some time -> CNF deployment goes to ready status
    • When the CNF deployment goes to not-ready, the controller reconciles every CR that matches the CNF deployment and sets status.appliedVersion=nil.
    • Once the CNF deployment returns to ready status, the controller receives the event and reconciles every rule CR. It applies the rule and sets status.appliedVersion.
  • Controller goes down -> CNF deployment pod restart -> Controller goes up
    • While the controller is down, the CNF pod is restarted, so the rules no longer exist in the restarted pod. All the related rule CRs should therefore have status.appliedVersion set to nil, but the controller is down and cannot receive the CNF down/up event.
    • When the controller goes up, it reconciles every rule CR, but it does not know that the CNF was ever restarted. This is a problem: we cannot use the CR's status.appliedVersion to record whether the rule is applied. Instead, we should always call the API to get the existing rule, compare it with the CR, and apply the rule if the existing rule in the CNF differs from the CR definition.
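
The last scenario can be made concrete with a short sketch that contrasts the two ways of deciding whether a rule needs to be applied. The dict layouts and the function names `needs_apply_by_status` and `needs_apply_by_runtime` are hypothetical and chosen only for illustration.

```python
def needs_apply_by_status(cr):
    """Decide from CR status only: skip when appliedVersion is current."""
    return cr["appliedVersion"] != cr["resourceVersion"]


def needs_apply_by_runtime(cr, pod_rules):
    """Decide from the pod itself: apply when the live rule differs from the CR."""
    return pod_rules.get(cr["name"]) != cr["spec"]


cr = {
    "name": "rule2",
    "spec": {"policy": "policy1"},
    "resourceVersion": "7",
    "appliedVersion": "7",  # recorded before the controller went down
}

# The pod restarted while the controller was down: its rule table is empty,
# but nobody was running to clear status.appliedVersion.
pod_rules = {}

status_says = needs_apply_by_status(cr)               # False: the restart is missed
runtime_says = needs_apply_by_runtime(cr, pod_rules)  # True: the rule gets re-applied
```

The status-based check trusts a record that the restart silently invalidated, while the runtime comparison does not depend on the controller having seen every event.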


Admission Webhook Usage

We use admission webhooks to implement several features.

...