...

  • Generic: Infrastructure orchestration shall be generic. Even though this work is being done on behalf of one BP (MICN), infrastructure orchestration shall be common across all BPs in the ICN family. It shall also be possible to use this component in BPs outside the ICN family.
  • Leverage open source projects:
    • Leverage Cluster-API for infra-global-controller. Identify gaps, provide fixes, and also provide a UI/CLI for a good user experience.
    • Leverage Ironic and metal3 for infra-local-controller to do bare-metal provisioning.  Identify any gaps to make it work with Cluster-API.
    • Leverage KuD in infra-local-controller to do Kubernetes installation. Identify any gaps and fix them.
  • Figure out ways to use the bootstrap machine also as a workload machine (not in scope for Akraino-R2).
  • Flexible and extensible:
    • Adding any new package in the future shall be a simple addition.
    • Interaction with the workload orchestrator shall not be limited to K8S; it shall be possible to talk to any workload orchestrator.
  • Data Model driven:
    • Follow CRD models as much as possible.
  • Security:
    • Infra-global-controller and infra-local-controller may have privileged access to secrets, keys, etc. These shall be protected by keeping them in a hardware root of trust (HW RoT), or at least by ensuring that they are never stored in clear text on HDDs/SSDs.
  • Redundancy: infra-global-controller shall be redundant, especially if it is used to manage multiple sites.
  • Performance:
    • First-time installation or patching across multiple servers in a site shall complete in under 10 minutes for a 10-server site. (Jobs may need to run in parallel, e.g., through multi-threading in infra-local-controller.)
    • Patching across sites shall complete in under 10 minutes for 100 sites.
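The performance target relies on per-server jobs running concurrently rather than one after another. The following is a minimal sketch, assuming each server's installation is an independent job; `install_server` and its simulated duration are hypothetical stand-ins for real provisioning work. With a thread pool, wall-clock time tracks the slowest job instead of the sum of all jobs.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def install_server(name: str, seconds: float) -> str:
    """Hypothetical stand-in for installing one server; sleeps to simulate work."""
    time.sleep(seconds)
    return f"{name}: done"


def install_site(servers: dict[str, float], max_workers: int = 10) -> tuple[list[str], float]:
    """Run all per-server jobs in parallel; return results and elapsed wall time."""
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in the same order as the input servers
        results = list(pool.map(install_server, servers.keys(), servers.values()))
    return results, time.monotonic() - start


if __name__ == "__main__":
    site = {f"server-{i}": 0.2 for i in range(10)}
    results, elapsed = install_site(site)
    print(len(results), f"{elapsed:.2f}s")
```

Ten simulated 0.2-second jobs finish in roughly 0.2 seconds with ten workers, versus about 2 seconds serially. The same reasoning applies one level up when infra-global-controller patches many sites at once.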

Architecture:

Blocks and Modules


[Figure: Blocks and Modules]


All the green items are existing open source projects. Any enhancements they require are best made in the upstream community.

...

As shown in the picture above, the bootstrap machine itself runs K8S. Note that this K8S is different from the K8S that gets installed on the compute nodes; that is, these are two different K8S clusters. The bootstrap machine is itself a complete single-node K8S cluster whose one node runs both master and minion software. All components of infra-local-controller (such as BPA, Metal3, and Ironic) are themselves containers.

Since infra-local-controller is expected to be reachable from outside, it is expected to be secured using:

  • Istio and Envoy (for internal as well as external communication)

Infra-local-controller is expected to be brought up in two ways:

...

User experience for infrastructure administrators:

When using a USB bootable disk:

  1. Select a machine in the location for bootstrapping.
  2. Boot up the bootstrap machine using the USB bootable disk.
  3. Using kubectl against infra-local-controller, create Metal3 CRs to make Ironic ready for the compute nodes to PXE boot and install Linux.
  4. Upload site-specific information via a BPA CR: compute nodes, their roles, etc.
  5. Once Linux is installed, use kubectl to create a BPA CR that makes BPA install the binary packages (such as kubelet, Docker, kubectl, and the Kubernetes API server for application-K8S).
  6. Using kubectl against BPA, obtain the kubeconfig of application-K8S.
  7. Using this kubeconfig, use kubectl against application-K8S to install the packages that can be installed via kubectl (such as Multus, OVN controllers, Virtlet, etc.).
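Steps 4 and 5 drive BPA through custom resources. The BPA CRD is not defined on this page yet, so the group `bpa.akraino.org`, version `v1alpha1`, kind `Provisioning`, and the `spec` field names below are hypothetical placeholders. The sketch only shows the general shape of a CR that carries compute nodes, their roles, and the binary packages to install. kubectl accepts JSON manifests as well as YAML, so the printed output could be piped to `kubectl apply -f -`.

```python
import json


def build_bpa_cr(site: str, nodes: dict[str, str], packages: list[str]) -> dict:
    """Build a hypothetical BPA custom resource as a plain dict.

    apiVersion, kind and the spec layout are illustrative placeholders,
    not a defined BPA CRD.
    """
    return {
        "apiVersion": "bpa.akraino.org/v1alpha1",  # hypothetical group/version
        "kind": "Provisioning",                     # hypothetical kind
        "metadata": {"name": f"{site}-provisioning"},
        "spec": {
            # one entry per compute node with its role (e.g. master or worker)
            "computeNodes": [{"name": n, "role": r} for n, r in sorted(nodes.items())],
            "binaryPackages": packages,
        },
    }


if __name__ == "__main__":
    cr = build_bpa_cr("site-a", {"node1": "master", "node2": "worker"},
                      ["kubelet", "docker", "kubectl"])
    # Output could be piped to `kubectl apply -f -` against infra-local-controller.
    print(json.dumps(cr, indent=2))
```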

...

BPA is expected to store any private keys and secret information in SMS (Secret Management Service).


...

CSM.

  • SSH passwords used to authenticate with the compute nodes are expected to be stored in SMS of CSM.
  • Kubeconfig used to authenticate with application-K8S.

BPA and Ironic related integration:

Ironic is expected to bring up Linux on compute nodes. It is also expected to create SSH keys automatically for each compute node, as well as an SSH user for each compute node. For security reasons, usernames and passwords are expected to be stored in SMS in infra-local-controller. BPA is expected to use these credentials when it installs the software packages.
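The credential flow in the paragraph above (a user and credentials created per compute node, stored in SMS, and read back by BPA) can be sketched as follows. SMS's real API is not described here, so `InMemorySecretStore` is a hypothetical in-memory stand-in, and the `sms/compute/<node>` path and `icn-<node>` username scheme are assumptions. Only the username/password part is shown, not SSH key generation.

```python
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeCredential:
    username: str
    password: str


class InMemorySecretStore:
    """Hypothetical stand-in for SMS; a real deployment would call the SMS API."""

    def __init__(self) -> None:
        self._data: dict[str, NodeCredential] = {}

    def put(self, path: str, cred: NodeCredential) -> None:
        self._data[path] = cred

    def get(self, path: str) -> NodeCredential:
        return self._data[path]


def provision_credentials(store: InMemorySecretStore, nodes: list[str]) -> None:
    """Create a user and random password per compute node and store them (Ironic side)."""
    for node in nodes:
        cred = NodeCredential(username=f"icn-{node}",  # hypothetical naming scheme
                              password=secrets.token_urlsafe(24))
        store.put(f"sms/compute/{node}", cred)  # hypothetical path layout


def credentials_for(store: InMemorySecretStore, node: str) -> NodeCredential:
    """Look up a node's credentials before installing packages on it (BPA side)."""
    return store.get(f"sms/compute/{node}")
```

The design point is that BPA resolves credentials by node name at install time instead of receiving them in configuration files.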

CSM is expected to be used not only for storing secrets, but also for securely storing keys and performing crypto operations:

  • Use PKCS#11.
  • If a TPM is present, Citadel keys are expected to be distributed to the TPM, and the TPM is also expected to be used for signing operations.
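Performing crypto operations inside CSM, rather than only handing out keys, amounts to an interface where callers submit data and get back a signature while key material never leaves the module. The following is a minimal sketch of that interface. `KeyVault` and the `citadel` key id are hypothetical. HMAC-SHA256 from the Python standard library stands in for PKCS#11- or TPM-backed signing; it is a symmetric scheme and is not what a real CSM would use for Citadel keys.

```python
import hashlib
import hmac
import secrets


class KeyVault:
    """Hypothetical CSM-like facade: keys are created and used inside, never returned.

    HMAC-SHA256 stands in for PKCS#11 / TPM-backed signing in this sketch.
    """

    def __init__(self) -> None:
        self.__keys: dict[str, bytes] = {}  # name-mangled to discourage outside access

    def create_key(self, key_id: str) -> None:
        if key_id in self.__keys:
            raise ValueError(f"key {key_id!r} already exists")
        self.__keys[key_id] = secrets.token_bytes(32)

    def sign(self, key_id: str, data: bytes) -> bytes:
        return hmac.new(self.__keys[key_id], data, hashlib.sha256).digest()

    def verify(self, key_id: str, data: bytes, signature: bytes) -> bool:
        # constant-time comparison avoids leaking timing information
        return hmac.compare_digest(self.sign(key_id, data), signature)
```

Because the class exposes only `create_key`, `sign`, and `verify`, callers such as Citadel integration code never hold raw key bytes, which is the property a PKCS#11 token or TPM enforces in hardware.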

Implementation suggestions:

KuD needs to be broken into pieces

  • KuD that installs basic packages via Kubespray, and packages that are not containerized. BPA can inherit this code.
  • KuD that acts as a private Docker image repository. BPA can inherit this code.
  • KuD that builds the packages from source code: this needs to be done outside of BPA, and the resulting binary packages and container packages are expected to be part of the USB bootable disk.
  • KuD that brings up containerized packages: this needs to be handled as a script on top of infra-local-controller.
  • CSM (Certificate and Secret Management) can be used as is. Integration with CSM can be done for Akraino-R2 rather than for the interim release.

Infra-global-controller: 

There could be multiple edges that need to be brought up. Having an administrator go to each location and use infra-local-controller to bring up the application-K8S cluster on that location's compute nodes does not scale. infra-global-controller is expected to provide a centralized software provisioning and configuration system, giving a single pane of glass for administering the edge locations with respect to infrastructure. Administration involves:

  • First-time bring-up
  • Addition of new compute nodes in locations
  • Removal of compute nodes from locations
  • Software patching
  • Software upgrading

It is expected that infra-local-controller is brought up in each location. The kubeconfig of each infra-local-controller is expected to be made known to infra-global-controller. Beyond that, everything else is taken care of by infra-global-controller, which communicates with the various infra-local-controllers to perform software installation and provisioning.

Infra-global-controller runs in its own K8S cluster. All components of infra-global-controller are containers. The following containers are part of infra-global-controller:

  • Provisioning Controller (PC) microservices
  • Binary Provisioning Manager (BPM) microservices
  • K8S Provisioning Manager (KPM) microservices
  • Certificate and secret management related microservices
  • Cluster-API related microservices
  • MongoDB for storing packages and OS images

Since infra-global-controller is expected to be reachable from the Internet, it is expected to be secured using:

  • Istio and Envoy (for internal as well as external communication)
  • Store Citadel private keys using CSM.
  • Store secrets using SMS of CSM.

Admin user experience:

Assuming that infra-global-controller is brought up with all its microservices, the following steps are expected to be taken to provision sites/edges:

  1. Register infra-local-controllers using the infra-local-K8S kubeconfig and the BPA reachability information of each infra-local-controller. This step is required for each infra-local-controller, i.e., for each location. It gives infra-global-controller the information it needs to reach the K8S API server of the infra-local-controller at each location.
  2. Upload binary packages (the packages that infra-local-controller installs on compute nodes), Helm charts, and the corresponding container images. This step normally occurs once, or whenever new versions of the packages become available.
  3. For every site, upload information about compute nodes and their roles.
  4. Trigger installation on each site.
  5. Monitor the progress
  6. Take corrective actions if necessary.
  7. Monitor the status of each site on a continuous basis. (The expectation is that all K8S clusters, whether global, local, or application, would be installed with Prometheus, node-exporter, and cAdvisor microservices. It is assumed that all logs are generated and sent to fluentd.)
    1. Monitor the status of infra-local-K8S and its node.
    2. Monitor the status of application-K8S and compute nodes.
    3. Monitor itself.
    4. Log viewer

Infra-global-controller uses Cluster-API to provision OS-related installation at the locations via infra-local-controller.

The following sections describe the components of infra-global-controller.

Provisioning Controller:

It has the following functions:

Site information:

  • It maintains site reachability information (infra-local-controller and BPA reachability information). It provides a CRD operator to allow registration of sites.
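Site registration boils down to recording, per site, how to reach the infra-local-controller K8S API (its kubeconfig) and how to reach BPA. The following is a minimal sketch of the registry a Provisioning Controller might keep. The field names `kubeconfig` and `bpa_endpoint` and the rule that BPA endpoints must be HTTPS URLs are assumptions. A real implementation would sit behind the CRD operator and keep kubeconfigs in CSM rather than in a plain dict.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class Site:
    name: str
    kubeconfig: str    # kubeconfig of the site's infra-local-controller K8S
    bpa_endpoint: str  # URL at which the site's BPA is reachable (hypothetical field)


class SiteRegistry:
    """Hypothetical in-memory registry of sites known to infra-global-controller."""

    def __init__(self) -> None:
        self._sites: dict[str, Site] = {}

    def register(self, site: Site) -> None:
        url = urlparse(site.bpa_endpoint)
        # assumption: BPA must be reached over TLS since sites are remote
        if url.scheme != "https" or not url.hostname:
            raise ValueError(f"{site.name}: BPA endpoint must be an https URL")
        if not site.kubeconfig.strip():
            raise ValueError(f"{site.name}: kubeconfig is empty")
        self._sites[site.name] = site

    def get(self, name: str) -> Site:
        return self._sites[name]

    def names(self) -> list[str]:
        return sorted(self._sites)
```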

Inventory information:

  • It maintains the inventory of compute nodes for each site and the role of each compute node. It gets this information via CRs.
  • It maintains the inventory of software packages and container packages. It gets this information via a RESTful API (CRs are not a good fit here, as software packages and container packages are very large).

Application-K8S reachability information:

  • It gets this information from BPM and stores it locally. This is used by PC and KPM at later time to install containerized packages in application-K8S.

Software and configuration of site:

  • Upon invocation of install, patch, or update, it goes through a series of steps:
    • It uploads the OS images and binary packages to the site (only what is needed, and only if the versions differ from those on the site).
    • It communicates with Cluster-API, which talks to Metal3 in infra-local-controller to prepare it for installation of the basic operating system and utilities using the Linux image.
    • It keeps monitoring to ensure that the compute nodes are installed with the OS and utilities.
    • It then communicates with BPM, which talks to BPA to install the binary packages.
    • Once BPM reports that all packages are installed successfully, it gets the kubeconfig of application-K8S.
    • It then invokes KPM to instantiate the containerized packages.
    • Once KPM indicates that all containerized packages are ready/complete, it reports the appropriate status to the user by updating the CR status.
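The sequence above is an ordered pipeline that should stop at the first failing step and report status back on the CR. The following is a minimal sketch, assuming each step is a callable returning success or failure. The step callables, the `phase`/`failedStep`/`completed` status fields, and the name-to-version maps are hypothetical stand-ins for the real Cluster-API/Metal3, BPM, and KPM interactions. `artifacts_to_upload` illustrates the "only if the versions differ" rule.

```python
from typing import Callable

Step = tuple[str, Callable[[], bool]]  # (step name, action returning success)


def artifacts_to_upload(wanted: dict[str, str], on_site: dict[str, str]) -> list[str]:
    """Names of OS images/binary packages missing on the site or at a different version."""
    return sorted(name for name, ver in wanted.items() if on_site.get(name) != ver)


def run_pipeline(steps: list[Step]) -> dict:
    """Run steps in order; stop at the first failure and return a CR-style status.

    The status field names are hypothetical, not a defined CRD schema.
    """
    completed: list[str] = []
    for name, step in steps:
        if not step():
            return {"phase": "Failed", "failedStep": name, "completed": completed}
        completed.append(name)
    return {"phase": "Ready", "completed": completed}
```

For example, if the binary-install step fails, the returned status names that step and lists only the steps before it as completed, so the administrator knows where to take corrective action.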

Implementation notes:

  • Site registration code can be borrowed from the ONAP K8S plugin service.
  • A new CRD controller is expected to be created with the following CRs:
    • Site registration related CRs
    • Compute inventory related CRs
    • Site install trigger related CRs
  • It is expected to provide APIs:
    • For uploading binary packages
    • For uploading containerized packages
    • For uploading OS images
    • Each package, OS image, or containerized package is expected to carry the right metadata so that it can be identified later.
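The last note asks that every uploaded artifact carry enough metadata to be identified later. The following sketch shows one possible shape, assuming name, version, and kind fields plus a SHA-256 digest of the uploaded bytes. All of these names are hypothetical and not a defined API. The digest lets the Provisioning Controller detect a corrupted or tampered upload, and the key gives a stable lookup identifier.

```python
import hashlib
from dataclasses import dataclass
from enum import Enum


class ArtifactKind(Enum):
    BINARY_PACKAGE = "binary-package"
    CONTAINER_PACKAGE = "container-package"
    OS_IMAGE = "os-image"


@dataclass(frozen=True)
class ArtifactMetadata:
    name: str
    version: str
    kind: ArtifactKind
    sha256: str  # digest of the uploaded bytes, used to detect corruption or tampering

    @property
    def key(self) -> str:
        """Identifier used to look the artifact up later, e.g. 'os-image/ubuntu/18.04'."""
        return f"{self.kind.value}/{self.name}/{self.version}"


def describe_upload(name: str, version: str, kind: ArtifactKind, content: bytes) -> ArtifactMetadata:
    """Compute metadata for an uploaded artifact (hypothetical helper)."""
    return ArtifactMetadata(name, version, kind, hashlib.sha256(content).hexdigest())
```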

Binary Provisioning Manager (BPM)

K8S Provisioning Manager (KPM)



Design Details

infra-local-controller (Kuralamudhan Ramakrishnan, please fill in this section and its subsections: define the CRD, give example CRs, and then fill in sequence diagrams, the directory structure of the source code for the various modules, etc.)

...