K8s Summary - Containers

Containers

  • Containers are repeatable, which ensures the same behavior wherever they're run due to standardization and included dependencies.
  • Containers decouple applications from the underlying host infrastructure, making deployment easier across different cloud or OS environments.
  • Containers form the Pods assigned to a node in a Kubernetes cluster, and they are co-located and co-scheduled to run on the same node.

Container Images

  • A container image is a ready-to-run software package. It includes everything needed to run an application: the code, any required runtime, application and system libraries, and default values for any essential settings.
  • Containers are intended to be stateless and immutable. If you need to make changes, the correct process is to build a new image that includes the change, then recreate the container to start from the updated image.

Container Runtimes

  • The container runtime is the software responsible for running containers.
  • Kubernetes supports several container runtimes such as containerd, CRI-O, and any other implementation of the Kubernetes CRI (Container Runtime Interface).
  • By default, the cluster's configured container runtime handles every Pod. If you need to use more than one container runtime in a cluster, you can specify a RuntimeClass for a Pod.
  • RuntimeClass can also be used to run different Pods with the same container runtime but with different settings.

Images

  • A container image encapsulates an application and all its software dependencies. It is an executable software bundle that can run standalone.
  • Container images are typically created and pushed to a registry before being referred to in a Pod.
  • Container images have names which can include a registry hostname and possibly a port number. If no registry hostname is specified, Kubernetes assumes the Docker public registry (Docker Hub).
  • Image names can be followed by a tag to identify different versions of the same series of images. If no tag is specified, Kubernetes assumes the "latest" tag.
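As a sketch, a fully qualified image reference in a Pod spec combines these parts; the registry host, port, and image name below are placeholders:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: myapp
spec:
  containers:
    - name: myapp
      # <registry host>:<port>/<repository>:<tag>
      # Omitting the host implies Docker Hub; omitting the tag implies :latest.
      image: registry.example.com:5000/myapp:v1.2.3
```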

Updating Images

  • By default, when you create a Deployment, StatefulSet, Pod, or other object with a Pod template, and you don't set a pull policy, the imagePullPolicy of all containers in that Pod is set to "IfNotPresent" (unless the image tag is ":latest" or missing, in which case it defaults to "Always"). The "IfNotPresent" policy causes the kubelet to skip pulling an image that already exists locally.

Image Pull Policy

  • The "imagePullPolicy" and the image tag affect when the kubelet attempts to pull the specified image.
  • "IfNotPresent": the image is pulled only if it is not already present locally.
  • "Always": the kubelet queries the container image registry to resolve the name to an image digest each time it launches a container. If the image is cached locally, it is used; otherwise, it is pulled.
  • "Never": the kubelet does not try fetching the image. It attempts to start the container if the image is already present locally; otherwise, startup fails.

Notes

  • Avoid using the "latest" tag in production as it makes it harder to track versions and roll back.
  • To ensure the Pod always uses the same version of a container image, specify the image's digest.
  • If you specify an image by its digest, Kubernetes runs the same code every time it starts a container with that image name and digest, avoiding potential issues with registry changes.
  • There are third-party admission controllers that mutate Pods to ensure the workload runs based on an image digest rather than a tag, offering more control over the code that is run.
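A sketch of pinning by digest; the registry, image name, and digest below are placeholders to substitute with real values (for example, taken from the registry or from a previous pull):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pinned-by-digest
spec:
  containers:
    - name: app
      # <image>@sha256:<digest> — the digest uniquely identifies the image
      # contents, so the same code runs every time.
      image: registry.example.com/myapp@sha256:<digest>
```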

Default image pull policy:

  • If imagePullPolicy is not specified and the image tag is :latest, or there's no tag, imagePullPolicy is set to Always.
  • If imagePullPolicy is not specified and the image tag is not :latest, imagePullPolicy is set to IfNotPresent.
  • The imagePullPolicy is set when the object is first created and is not updated if the image's tag later changes.

Required image pull:

  • To always force a pull, set imagePullPolicy to Always, or omit it and use :latest as the image tag. You can also enable the AlwaysPullImages admission controller.

ImagePullBackOff:

  • If Kubernetes cannot pull a container image, the container might be in ImagePullBackOff state, meaning Kubernetes will retry pulling the image with an increasing delay, up to a maximum of 300 seconds.

Serial and parallel image pulls:

  • By default, kubelet pulls images serially. You can enable parallel image pulls by setting serializeImagePulls to false in kubelet configuration.
  • Kubelet never pulls multiple images in parallel for one Pod, but can pull images in parallel for different Pods.

Maximum parallel image pulls:

  • If serializeImagePulls is false, there is no limit on the number of images being pulled at the same time.
  • You can set maxParallelImagePulls in kubelet configuration to limit the number of parallel image pulls.
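Both settings live in the kubelet's configuration file; a fragment might look like this (the limit of 5 is an arbitrary illustration):

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Allow the kubelet to pull images for different Pods in parallel...
serializeImagePulls: false
# ...but cap the number of concurrent pulls.
maxParallelImagePulls: 5
```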

Multi-architecture images with image indexes:

  • An image index in a container registry can point to multiple image manifests for architecture-specific versions of a container.

Using a private registry:

  • Kubernetes supports specifying registry keys on a Pod via imagePullSecrets.
  • Private registries may require keys for reading images, which can be provided in several ways:
      • Configuring nodes to authenticate to a private registry.
      • Using the kubelet credential provider.
      • Pre-pulling images.
      • Specifying imagePullSecrets on a Pod.
  • Configuring nodes to authenticate to a private registry is dependent on the container runtime and registry.
  • The Kubelet Credential Provider dynamically fetches registry credentials for a container image.
  • Pre-pulled images can be used as an alternative to authenticating to a private registry.
  • imagePullSecrets can be added to a Pod definition or to a ServiceAccount resource.

Use cases:

  • For clusters running only open-source images, no configuration is required.
  • For clusters running proprietary images visible to all users, use a private registry, possibly with an admission controller active.
  • For multi-tenant clusters where each tenant needs their own private registry, run a private registry with authorization, generate a registry credential for each tenant, and populate the secret to each tenant namespace.

Interpretation of config.json

  • Kubernetes and Docker interpret config.json differently. In Docker, the auths keys only specify root URLs, but Kubernetes allows glob URLs and prefix-matched paths.
  • The root URL is matched using specific syntax patterns.
  • Multiple entries in config.json are possible, and when several credentials match an image, the kubelet attempts the pull with each matching credential in sequence until one succeeds.

Pre-pulled Images

  • The kubelet tries to pull each image from the specified registry by default. However, if imagePullPolicy is set to IfNotPresent or Never, a local image is used (preferentially or exclusively, respectively).
  • If you want to rely on pre-pulled images instead of registry authentication, you must ensure all nodes in the cluster have the same pre-pulled images.
  • This method requires control over node configuration and is not reliable if your cloud provider manages nodes.
  • All pods will have read access to any pre-pulled images.

Specifying imagePullSecrets on a Pod

  • Kubernetes supports specifying container image registry keys on a Pod using imagePullSecrets.
  • The referenced Secrets must be of type kubernetes.io/dockercfg or kubernetes.io/dockerconfigjson.
  • You can create a Secret with a Docker config using the command kubectl create secret docker-registry <name> ...
  • Pods can only reference image pull secrets in their own namespace, so the process needs to be done once per namespace.
  • Setting of this field can be automated by setting the imagePullSecrets in a ServiceAccount resource.
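As a sketch, assuming a Secret named regcred has already been created in the namespace with kubectl create secret docker-registry, a Pod and a ServiceAccount could reference it like this (all names and the registry host are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: private-image-pod
spec:
  containers:
    - name: app
      image: registry.example.com/private/app:v1
  # The kubelet presents this Secret's credentials when pulling the image.
  imagePullSecrets:
    - name: regcred
---
# Alternatively, attach the Secret to a ServiceAccount; Pods using this
# account get the imagePullSecrets added automatically.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: app-sa
imagePullSecrets:
  - name: regcred
```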

Use Cases and Solutions

  1. Cluster running only non-proprietary images: Use public images from a public registry. No configuration required.
  2. Cluster running some proprietary images: Use a hosted private registry or run an internal private registry behind your firewall with open read access.
  3. Cluster with proprietary images that require stricter access control: Ensure AlwaysPullImages admission controller is active. Move sensitive data into a "Secret" resource.
  4. Multi-tenant cluster where each tenant needs own private registry: Run a private registry with authorization required. Generate registry credential for each tenant, put into secret, and populate secret to each tenant namespace.

Container Environment

The container environment in Kubernetes provides several essential resources to containers:

  • A filesystem, a combination of an image and one or more volumes.
  • Information about the container itself.
  • Information about other objects in the cluster.

Container Information

  • The container's hostname is the name of the Pod where the container is running. It can be accessed using the hostname command or the gethostname function in libc.
  • The Pod's name and namespace are accessible as environment variables through the downward API.
  • User-defined environment variables from the Pod definition are available to the container, along with any environment variables specified statically in the container image.
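A minimal sketch of exposing the Pod's name and namespace to a container via the downward API (image and names are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: downward-env
spec:
  containers:
    - name: app
      image: busybox:1.36
      command: ["sh", "-c", "echo $POD_NAME $POD_NAMESPACE; sleep 3600"]
      env:
        # Each variable is filled in from a field of the Pod object.
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
```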

Cluster Information

  • A list of all services running when a container was created is available to the container as environment variables. This list only includes services in the same namespace as the container's Pod and Kubernetes control plane services.
  • For a service named foo mapping to a container named bar, the variables FOO_SERVICE_HOST (the host where the service runs) and FOO_SERVICE_PORT (the port where the service runs) are defined.
  • Services have dedicated IP addresses and can be accessed by the container via DNS, provided the DNS addon is enabled.

Runtime Class

RuntimeClass is a Kubernetes feature for selecting the configuration of the container runtime used to run a Pod's containers.

Motivation

  • RuntimeClass can be set differently between Pods to balance performance and security. For instance, Pods requiring high security may use a container runtime that uses hardware virtualization.
  • RuntimeClass can also be used to run different Pods with the same container runtime but with different settings.

Setup

  1. Configure the Container Runtime Interface (CRI) implementation on nodes. Note that RuntimeClass assumes a homogeneous node configuration across the cluster by default.
  2. Create the corresponding RuntimeClass resources. Each configuration set up in step 1 has an associated handler name that identifies it.

A RuntimeClass resource consists of two significant fields:

  • The RuntimeClass name (metadata.name)
  • The handler (handler)

apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: myclass
handler: myconfiguration

Usage

To use a RuntimeClass, specify runtimeClassName in the Pod spec. If no runtimeClassName is specified, the default runtime handler is used.

apiVersion: v1
kind: Pod
metadata:
  name: mypod
spec:
  runtimeClassName: myclass

CRI Configuration

Runtime handlers can be configured through the CRI implementation's configuration.

Scheduling

By specifying the scheduling field for a RuntimeClass, constraints can be set to ensure that Pods running with this RuntimeClass are scheduled to nodes that support it.

The supported nodes should share a common label selected by the RuntimeClass's scheduling.nodeSelector field.

If the supported nodes are tainted to prevent other Pods from running on them, tolerations can be added to the RuntimeClass.
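A sketch of a RuntimeClass with scheduling constraints; the gvisor/runsc names, the node label, and the taint are illustrative assumptions, not required values:

```yaml
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: gvisor          # hypothetical RuntimeClass name
handler: runsc          # hypothetical handler configured on the nodes
scheduling:
  # Pods using this RuntimeClass only schedule onto nodes with this label.
  nodeSelector:
    runtime: gvisor
  # Tolerate a matching taint that keeps other Pods off those nodes.
  tolerations:
    - key: runtime
      operator: Equal
      value: gvisor
      effect: NoSchedule
```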

Pod Overhead

Pod overhead is defined in RuntimeClass through the overhead field. Overhead resources associated with running a Pod can be declared, allowing the cluster to account for it when making decisions about Pods and resources.
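A sketch of declaring overhead; the kata handler name and the resource figures are illustrative:

```yaml
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: kata              # hypothetical RuntimeClass name
handler: kata-containers  # hypothetical handler
overhead:
  # Fixed per-Pod cost of the runtime (e.g. a guest VM), added to the
  # containers' requests when the scheduler accounts for the Pod.
  podFixed:
    memory: "120Mi"
    cpu: "250m"
```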

Container Lifecycle Hooks

Lifecycle hooks allow Containers to be aware of events in their management lifecycle and run code when the corresponding lifecycle hook is executed.

Types of Hooks

  • PostStart: Executed immediately after a container is created. There's no guarantee that the hook will execute before the container ENTRYPOINT. No parameters are passed to the handler.
  • PreStop: Called immediately before a container is terminated due to an API request or management event. The hook must complete before the TERM signal to stop the container can be sent.

Hook Handler Implementations

Containers can implement and register a handler for a hook. There are two types of hook handlers:

  • Exec: Executes a specific command inside the cgroups and namespaces of the Container.
  • HTTP: Executes an HTTP request against a specific endpoint on the Container.
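A minimal sketch of registering hook handlers on a container; the commands are illustrative (an httpGet handler with a path and port could be used in place of either exec block):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: lifecycle-demo
spec:
  containers:
    - name: web
      image: nginx:1.25
      lifecycle:
        # Runs right after the container is created; not guaranteed to
        # run before the ENTRYPOINT.
        postStart:
          exec:
            command: ["/bin/sh", "-c", "echo started > /tmp/started"]
        # Runs before termination; the TERM signal waits for it to finish.
        preStop:
          exec:
            command: ["/bin/sh", "-c", "nginx -s quit; sleep 5"]
```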

Hook Handler Execution

Hook handler calls are synchronous within the context of the Pod containing the Container.

  • For a PostStart hook, the Container ENTRYPOINT and hook fire asynchronously.
  • PreStop hooks must complete their execution before the TERM signal can be sent.

If a hook fails, it kills the Container. Therefore, hook handlers should be as lightweight as possible.

Hook Delivery Guarantees

Hook delivery is intended to be at least once, meaning a hook may be called multiple times for any given event. It's up to the hook implementation to handle this correctly.

Debugging Hook Handlers

The logs for a Hook handler are not exposed in Pod events. If a handler fails, it broadcasts an event. For PostStart, this is the FailedPostStartHook event, and for PreStop, this is the FailedPreStopHook event.