Model & Dataset CRD Design

Motivation

This document serves as a place for brainstorming ideas for Model & Dataset CRD design. The general goal is to design reusable CRDs that can be shared by various higher level machine learning tasks and frameworks.

Goals

What do the CRD controllers do? Define the exact responsibilities of model & dataset CRDs and controllers.
How will the higher level tasks, i.e. federated learning, model serving etc, utilize the services provided by model & dataset CRDs.
Cloud edge communication mechanism for the CRD controllers: do they share the existing port 10000, or use a new port exclusively for AI purpose? Related: how do cloud workers and edge workers communicate? Cloud workers can be scheduled in cloud worker nodes, which means they can be deployed as a standard K8s service and have an publicly routable endpoint. Can KubeEdge operate in hybrid mode, i.e. having both cloud worker nodes and edge nodes?

Use Cases

Model serving

This is the simplest case. Upon creating a model CRD object, the edge part of the model controller should get notified and prepare the local workspace for the new model. By specifying the target model version to pull, the controller will download the corresponding model files (the whole directory can be compressed into a tar ball) from the provided URL endpoint.

Federated learning

Upon creating a model CRD object, edge needs to download the training scripts (runs on edge) and initial model weights files from provided URL endpoint. Once the gradients have been calculated, they should be reported back to cloud worker.

It looks like the common behavior of the model CRD object is just to download and upload data and executable files (of different types) to a cloud location, i.e. some storage service like AWS S3.

Space shortcuts

Page tree

Motivation

Goals

Use Cases

Model serving

Federated learning

Design Details