K8s Summaries - Storage
Volumes in Kubernetes
Kubernetes supports several types of volumes, including:
Deprecated Volumes
awsElasticBlockStore
azureDisk
azureFile
gcePersistentDisk
Migrating Volumes
- AWS EBS, Azure Disk, and Azure File have respective CSI migration paths.
Other Volume Types
cephfs
cinder
- OpenStack has a CSI migration path.
ConfigMap
- Injects configuration data into pods.
- You must create a ConfigMap before you can use it.
- A container using a ConfigMap as a subPath volume mount will not receive ConfigMap updates.
- Text data is exposed as files using the UTF-8 character encoding.
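A minimal sketch of mounting a ConfigMap as a volume. The ConfigMap name `app-config`, its `log_level` key, the Pod name, and the `busybox` image are assumptions chosen for illustration.

```yaml
# Hypothetical ConfigMap; it must exist before the Pod that mounts it.
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  log_level: "INFO"
---
apiVersion: v1
kind: Pod
metadata:
  name: configmap-demo
spec:
  containers:
    - name: app
      image: busybox:1.36
      command: ["sh", "-c", "cat /etc/config/log_level && sleep 3600"]
      volumeMounts:
        - name: config
          mountPath: /etc/config   # each key appears as a UTF-8 file in this directory
          readOnly: true
  volumes:
    - name: config
      configMap:
        name: app-config
```

Because the whole volume is mounted here (no subPath), updates to the ConfigMap are eventually reflected in the mounted files.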
DownwardAPI
- Makes downward API data available to applications.
- Exposed data is in read-only files in plain text format.
- A container using the downward API as a subPath volume mount does not receive updates when field values change.
EmptyDir
- Created when a Pod is assigned to a node and exists as long as that Pod is running on that node.
- All containers in the Pod can read and write the same files in the emptyDir volume.
- The data in an emptyDir volume is safe across container crashes.
- A size limit can be specified for the default medium, which limits the capacity of the emptyDir volume.
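As a rough illustration, two containers in one Pod can share an emptyDir with a size limit. The Pod name, container names, image, and the 500Mi limit are assumptions.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: emptydir-demo
spec:
  containers:
    - name: writer
      image: busybox:1.36
      command: ["sh", "-c", "date > /cache/started && sleep 3600"]
      volumeMounts:
        - name: cache
          mountPath: /cache
    - name: reader
      image: busybox:1.36
      command: ["sh", "-c", "sleep 5 && cat /cache/started && sleep 3600"]
      volumeMounts:
        - name: cache
          mountPath: /cache   # same files as seen by the writer container
  volumes:
    - name: cache
      emptyDir:
        sizeLimit: 500Mi   # caps the capacity of the default medium
```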
Fibre Channel (fc)
- Allows an existing fibre channel block storage volume to mount in a Pod.
- You must configure FC SAN Zoning to allocate and mask those LUNs (volumes) to the target WWNs beforehand so that Kubernetes hosts can access them.
GCE Persistent Disk
- The contents of a PD are preserved and the volume is merely unmounted when a pod is removed.
- A gcePersistentDisk volume permits multiple consumers to simultaneously mount a persistent disk as read-only.
- Using a GCE persistent disk with a Pod controlled by a ReplicaSet will fail unless the PD is read-only or the replica count is 0 or 1.
Regional Persistent Disks
- Available in two zones within the same region.
- Must be provisioned as a PersistentVolume; referencing the volume directly from a pod is not supported.
GCE CSI Migration
- Redirects all plugin operations from the existing in-tree plugin to the pd.csi.storage.gke.io Container Storage Interface (CSI) Driver.
- The GCE PD CSI Driver must be installed on the cluster.
gitRepo (deprecated)
- The gitRepo volume type is deprecated.
- To provision a container with a git repo, use an EmptyDir with an InitContainer to clone the repo.
- A gitRepo volume is a type of volume plugin that mounts an empty directory and clones a git repository into this directory for the Pod to use.
glusterfs (removed)
- Kubernetes 1.27 does not include a glusterfs volume type.
- The GlusterFS in-tree storage driver was deprecated in Kubernetes v1.25 and removed entirely in v1.26.
hostPath
- HostPath volumes mount a file or directory from the host node's filesystem into the Pod.
- They pose security risks and should be used sparingly and with specific access controls.
- They can be used for containers needing access to Docker internals, running cAdvisor, or allowing a Pod to specify whether a given hostPath should exist prior to the Pod running.
- Can optionally specify a type for a hostPath volume. Types include DirectoryOrCreate, Directory, FileOrCreate, File, Socket, CharDevice, and BlockDevice.
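A sketch of a hostPath volume with an explicit type, mounted read-only to limit exposure. The host path `/var/local/demo-data`, the Pod name, and the image are hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hostpath-demo
spec:
  containers:
    - name: app
      image: busybox:1.36
      command: ["sleep", "3600"]
      volumeMounts:
        - name: host-data
          mountPath: /data
          readOnly: true   # reduces the risk of the container modifying the host
  volumes:
    - name: host-data
      hostPath:
        path: /var/local/demo-data   # hypothetical directory on the node
        type: DirectoryOrCreate      # create the directory if it does not exist
```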
iscsi
- An iSCSI volume allows an existing iSCSI (SCSI over IP) volume to be mounted into your Pod.
- Its contents are preserved and the volume is merely unmounted when a Pod is removed.
- Can be mounted as read-only by multiple consumers simultaneously.
local
- A local volume represents a mounted local storage device such as a disk, partition or directory.
- They can only be used as a statically created PersistentVolume and dynamic provisioning is not supported.
- Subject to the availability of the underlying node and not suitable for all applications.
- Recommended to create a StorageClass with volumeBindingMode set to WaitForFirstConsumer when using local volumes.
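The points above could be combined as follows. This is a sketch: the class name `local-storage`, the path `/mnt/disks/ssd1`, the node name `example-node`, and the capacity are placeholders.

```yaml
# StorageClass that delays binding until a Pod using the claim is scheduled.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner   # no dynamic provisioning for local volumes
volumeBindingMode: WaitForFirstConsumer
---
# Statically created PersistentVolume backed by a local disk on one node.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: local-pv-example
spec:
  capacity:
    storage: 100Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage
  local:
    path: /mnt/disks/ssd1          # hypothetical mount point on the node
  nodeAffinity:                    # required for local volumes
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - example-node     # hypothetical node name
```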
nfs
- An NFS volume allows an existing NFS (Network File System) share to be mounted into a Pod.
- Its contents are preserved and the volume is merely unmounted when a Pod is removed.
- NFS can be mounted by multiple writers simultaneously.
persistentVolumeClaim
- A persistentVolumeClaim volume is used to mount a PersistentVolume into a Pod.
- Allows users to "claim" durable storage without knowing the details of the particular cloud environment.
portworxVolume (deprecated)
- A portworxVolume is an elastic block storage layer that runs hyperconverged with Kubernetes, but it's deprecated as of Kubernetes v1.25.
- Can be dynamically created through Kubernetes or pre-provisioned and referenced inside a Pod.
- The Portworx CSI migration feature is in beta state as of Kubernetes v1.25, but is turned off by default.
- It redirects all plugin operations from the existing in-tree plugin to the pxd.portworx.com Container Storage Interface (CSI) Driver, which must be installed on the cluster.
Projected Volume
- A projected volume maps multiple existing volume sources into the same directory.
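For illustration, a projected volume might combine a Secret, downward API data, and a ConfigMap under one mount point. The object names `mysecret` and `myconfigmap` are assumptions, and both objects would need to exist already.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: projected-demo
spec:
  containers:
    - name: app
      image: busybox:1.36
      command: ["sleep", "3600"]
      volumeMounts:
        - name: all-in-one
          mountPath: /projected-volume
          readOnly: true
  volumes:
    - name: all-in-one
      projected:
        sources:
          - secret:
              name: mysecret        # hypothetical Secret
          - downwardAPI:
              items:
                - path: labels      # Pod labels written to /projected-volume/labels
                  fieldRef:
                    fieldPath: metadata.labels
          - configMap:
              name: myconfigmap     # hypothetical ConfigMap
```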
Rados Block Device (RBD) Volume
- An RBD volume allows a Rados Block Device volume to mount into your Pod.
- Unlike emptyDir, an RBD volume's contents persist even when a pod is removed.
- This allows pre-population of data that can be shared between pods.
- You need a running Ceph installation to use RBD.
- RBD volumes can be mounted as read-only by multiple consumers simultaneously, but can only be mounted by a single consumer in read-write mode.
RBD CSI Migration
- As of Kubernetes v1.23 (alpha), all plugin operations for RBD can be redirected to the rbd.csi.ceph.com CSI driver when the CSIMigrationRBD feature gate is enabled.
- To use this, you must:
- Install the Ceph CSI driver (v3.5.0 or above) in your Kubernetes cluster.
- Create a clusterID based on the monitors hash in the CSI config map.
- If the adminId value in the StorageClass differs from admin, patch the adminSecretName with the base64 value of the adminId parameter value.
Secret Volume
- Secret volumes are used to pass sensitive information to Pods.
- Secrets can be stored in the Kubernetes API and mounted as files for use by pods.
- They are backed by tmpfs, a RAM-backed filesystem, so they are never written to non-volatile storage.
- You must create a Secret in the Kubernetes API before using it.
- A container using a Secret as a subPath volume mount won't receive Secret updates.
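A minimal sketch of a Secret mounted as files. The Secret name `db-credentials`, its `password` key and value, and the Pod details are hypothetical.

```yaml
# Hypothetical Secret; it must exist before the Pod that mounts it.
apiVersion: v1
kind: Secret
metadata:
  name: db-credentials
type: Opaque
stringData:
  password: "change-me"
---
apiVersion: v1
kind: Pod
metadata:
  name: secret-demo
spec:
  containers:
    - name: app
      image: busybox:1.36
      command: ["sleep", "3600"]
      volumeMounts:
        - name: creds
          mountPath: /etc/creds   # the key is exposed as /etc/creds/password
          readOnly: true
  volumes:
    - name: creds
      secret:
        secretName: db-credentials
```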
vSphereVolume (deprecated)
- vSphereVolume is used to mount a vSphere VMDK volume into your Pod and supports both VMFS and VSAN datastores.
- The contents of a volume persist when it is unmounted.
- The use of vSphere CSI out-of-tree driver is recommended instead of vSphereVolume.
vSphere CSI Migration
- As of Kubernetes v1.26 (stable), all operations for the in-tree vsphereVolume type are redirected to the csi.vsphere.vmware.com CSI driver.
- To migrate, you must:
- Install the vSphere CSI driver on your cluster.
- Run vSphere 7.0u2 or later.
- Some StorageClass parameters from the built-in vsphereVolume plugin are not supported by the vSphere CSI driver.
vSphere CSI Migration Complete
- As of Kubernetes v1.19 (beta), to disable the vsphereVolume plugin, you need to set the InTreePluginvSphereUnregister feature flag to true.
- The csi.vsphere.vmware.com CSI driver must be installed on all worker nodes.
Using subPath
- The volumeMounts.subPath property is used to specify a sub-path inside a referenced volume instead of the root. This allows sharing a single volume for multiple uses in one pod.
- Example: A LAMP (Linux Apache MySQL PHP) stack pod configures a shared volume, mapping the PHP application's code and assets to the volume's html folder and the MySQL database to the mysql folder.
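The LAMP example could look roughly like the manifest below. The Pod name, images, password value, and the claim name `my-lamp-site-data` are illustrative assumptions.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-lamp-site
spec:
  containers:
    - name: mysql
      image: mysql
      env:
        - name: MYSQL_ROOT_PASSWORD
          value: "rootpasswd"        # placeholder only; use a Secret in practice
      volumeMounts:
        - name: site-data
          mountPath: /var/lib/mysql
          subPath: mysql             # database files live in the volume's mysql folder
    - name: php
      image: php:7.0-apache
      volumeMounts:
        - name: site-data
          mountPath: /var/www/html
          subPath: html              # application code lives in the volume's html folder
  volumes:
    - name: site-data
      persistentVolumeClaim:
        claimName: my-lamp-site-data # hypothetical PVC that must already exist
```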
Using subPath with Expanded Environment Variables
- subPathExpr is used to construct subPath directory names from downward API environment variables.
- The subPath and subPathExpr properties are mutually exclusive.
- Example: A Pod uses subPathExpr to create a directory pod1 within the hostPath volume /var/log/pods. The directory /var/log/pods/pod1 is mounted at /logs in the container.
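A sketch of that example, assuming a `busybox` image and a simple logging loop (both illustrative). The environment variable `POD_NAME` is populated from the downward API and then expanded in subPathExpr.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod1
spec:
  restartPolicy: Never
  containers:
    - name: container1
      image: busybox:1.36
      env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.name   # resolves to "pod1"
      command: ["sh", "-c", "while true; do echo 'Hello'; sleep 10; done | tee -a /logs/hello.txt"]
      volumeMounts:
        - name: workdir1
          mountPath: /logs
          subPathExpr: $(POD_NAME)       # expands to the directory name pod1
  volumes:
    - name: workdir1
      hostPath:
        path: /var/log/pods
```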
Resources
- The storage media of an emptyDir volume is determined by the filesystem holding the kubelet root dir (typically /var/lib/kubelet). There's no space limit or isolation for emptyDir or hostPath volumes.
Out-of-tree Volume Plugins
- Out-of-tree volume plugins, like Container Storage Interface (CSI) and FlexVolume (deprecated), allow storage vendors to create custom storage plugins without adding their plugin source code to the Kubernetes repository.
- These plugins can be developed independently of the Kubernetes code base and deployed on Kubernetes clusters as extensions.
CSI
- CSI defines a standard interface for container orchestration systems to expose arbitrary storage systems to their container workloads.
- A CSI compatible volume driver, once deployed on a Kubernetes cluster, allows users to use the csi volume type to attach or mount the volumes exposed by the CSI driver.
CSI Ephemeral Volumes
- CSI volumes can be directly configured within the Pod specification. These volumes are ephemeral and do not persist across pod restarts.
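A hedged sketch of an inline CSI ephemeral volume. The driver name `inline.storage.kubernetes.io` and the `foo: bar` attribute are placeholders; a real driver must support ephemeral inline volumes and defines its own volumeAttributes.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: csi-inline-demo
spec:
  containers:
    - name: app
      image: busybox:1.36
      command: ["sleep", "3600"]
      volumeMounts:
        - name: inline-vol
          mountPath: /data
  volumes:
    - name: inline-vol
      csi:
        driver: inline.storage.kubernetes.io   # placeholder driver name
        volumeAttributes:
          foo: bar                             # driver-specific, shown only as an example
```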
Windows CSI Proxy
- For Windows worker nodes, privileged operations for containerized CSI node plugins are supported using csi-proxy, a community-managed, stand-alone binary that needs to be pre-installed on each Windows node.
Migrating to CSI Drivers from In-tree Plugins
- The CSIMigration feature redirects operations against existing in-tree plugins to corresponding CSI plugins. This allows a smooth transition to a CSI driver that supersedes an in-tree plugin without any configuration changes.
FlexVolume
- FlexVolume is a deprecated out-of-tree plugin interface that uses an exec-based model to interface with storage drivers. It's recommended to use an out-of-tree CSI driver for integrating external storage with Kubernetes.
Mount Propagation
- Mount propagation allows sharing volumes mounted by a container with other containers in the same pod or even other pods on the same node.
- Three propagation modes are available: None (default), HostToContainer, and Bidirectional.
- Bidirectional mount propagation is only allowed in privileged containers due to potential damage to the host operating system.
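The propagation mode is set per volume mount. A sketch using HostToContainer, with a hypothetical Pod name and host directory:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: propagation-demo
spec:
  containers:
    - name: app
      image: busybox:1.36
      command: ["sleep", "3600"]
      volumeMounts:
        - name: host-mnt
          mountPath: /mnt/host
          mountPropagation: HostToContainer   # mounts made later on the host under /mnt become visible here
  volumes:
    - name: host-mnt
      hostPath:
        path: /mnt
        type: Directory
```

Switching the mode to Bidirectional would additionally require the container to run as privileged.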
Configuration
- For mount propagation to work properly on some deployments (CoreOS, RedHat/Centos, Ubuntu), Docker's MountFlags must be configured correctly. After the configuration, the Docker daemon needs to be restarted.
Kubernetes Persistent Volumes and Persistent Volume Claims
Introduction
- Persistent Volume (PV): A piece of storage in the cluster that is provisioned by an administrator or dynamically provisioned using Storage Classes. It has a lifecycle independent of any individual Pod that uses the PV. It captures the details of the implementation of the storage system.
- Persistent Volume Claim (PVC): A user's request for storage. It is similar to a Pod - as Pods consume node resources, PVCs consume PV resources. PVCs can request specific size and access modes.
- StorageClass resource: Allows cluster administrators to offer a variety of PersistentVolumes with different properties, without exposing users to the implementation details.
Lifecycle of a Volume and Claim
Provisioning: PVs can be provisioned statically or dynamically.
- Static: A cluster administrator creates a number of PVs which exist in the Kubernetes API for consumption.
- Dynamic: When no static PVs match a user's PVC, the cluster may dynamically provision a volume for the PVC based on StorageClasses.
- Binding: A control loop in the control plane watches for new PVCs, finds a matching PV (if possible), and binds them together. Once bound, PVC to PV binding is exclusive and one-to-one.
- Using: Pods use claims as volumes. The cluster inspects the claim to find the bound volume and mounts that volume for a Pod.
- Storage Object in Use Protection: Ensures that PVCs in active use by a Pod and PVs that are bound to PVCs are not removed from the system to prevent data loss.
Reclaiming: When a user is done with their volume, they can delete the PVC objects from the API for resource reclamation. The reclaim policy for a PV tells the cluster what to do with the volume after it has been released of its claim. Policies include:
- Retain: Allows for manual reclamation of the resource.
- Delete: Removes both the PV object from Kubernetes and the associated storage asset in the external infrastructure.
- Recycle (deprecated): Performs a basic scrub on the volume and makes it available again for a new claim.
Note
A PVC is in active use by a Pod when a Pod object exists that is using the PVC. PVC or PV removal is postponed until the PVC is no longer actively used by any Pods, or the PV is no longer bound to a PVC.
PersistentVolume Deletion Protection Finalizer
- PersistentVolumes with a Delete reclaim policy are only deleted after the backing storage is deleted.
- Two finalizers are introduced: kubernetes.io/pv-controller for in-tree plugin volumes, and external-provisioner.volume.kubernetes.io/finalizer for CSI volumes.
- When the CSIMigration{provider} feature flag is enabled for an in-tree volume plugin, kubernetes.io/pv-controller is replaced by external-provisioner.volume.kubernetes.io/finalizer.
Reserving a PersistentVolume
- You can pre-bind a PersistentVolumeClaim (PVC) to a specific PersistentVolume (PV).
- You specify a PV in a PVC to declare a binding.
- The binding happens regardless of some volume matching criteria, but the control plane still checks that storage class, access modes, and requested storage size are valid.
- To reserve a specific storage volume, specify the relevant PVC in the claimRef field of the PV so that other PVCs can't bind to it.
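Pre-binding in both directions might look like the sketch below. The names `foo-pvc` and `foo-pv`, the namespace `foo`, the sizes, and the NFS server details are hypothetical; the NFS source only stands in for whatever backing storage the PV really uses.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: foo-pvc
  namespace: foo
spec:
  storageClassName: ""      # empty string set explicitly so the default StorageClass is not applied
  volumeName: foo-pv        # the PVC declares the PV it wants to bind to
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: foo-pv
spec:
  storageClassName: ""
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteOnce
  claimRef:                 # reserves this PV so other PVCs cannot bind to it
    name: foo-pvc
    namespace: foo
  nfs:
    server: nfs.example.internal   # hypothetical server
    path: /exports/foo
```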
Expanding Persistent Volumes Claims
- PVCs can be expanded if their storage class's allowVolumeExpansion field is set to true.
- A larger volume for a PVC can be requested by editing the PVC object and specifying a larger size.
- Directly editing the size of a PersistentVolume can prevent an automatic resize of that volume.
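As an illustration, expansion is enabled on the StorageClass and then requested on the claim. The class name, the provisioner `csi.example.com`, the claim name, and the sizes are assumptions.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: expandable-sc
provisioner: csi.example.com    # hypothetical CSI driver name
allowVolumeExpansion: true      # PVCs of this class may be resized
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-claim
spec:
  storageClassName: expandable-sc
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi             # edit this value upward on the existing PVC to request expansion
```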
CSI Volume Expansion
- CSI volume expansion requires a specific CSI driver to support volume expansion.
- Resizing a volume containing a file system can only be done if the file system is XFS, Ext3, or Ext4.
- File system expansion is either done when a Pod is starting up or when a Pod is running and the underlying file system supports online expansion.
Resizing an in-use PersistentVolumeClaim
- An in-use PVC automatically becomes available to its Pod as soon as its file system has been expanded, so the Pod does not need to be deleted and recreated.
- This has no effect on PVCs that are not in use by a Pod or deployment.
Recovering from Failure when Expanding Volumes
- If the requested size is too large for the underlying storage system to satisfy, expansion of the PVC is retried continuously until a user or cluster administrator takes action.
- A manual recovery can be done by the cluster administrator to cancel the resize requests.
Types of Persistent Volumes
- PersistentVolume types are implemented as plugins.
- Current supported types: cephfs, csi, fc, hostPath, iscsi, local, nfs, rbd.
- Deprecated types: awsElasticBlockStore, azureDisk, azureFile, cinder, flexVolume, gcePersistentDisk, portworxVolume, vsphereVolume. These are still supported but will be removed in a future Kubernetes release.
- No longer supported types: photonPersistentDisk
PersistentVolume (PV) Specifications
- Spec and Status: Specification and status of the volume.
- Name: Must be a valid DNS subdomain name.
- Capacity: Storage capacity (e.g., 5Gi).
- VolumeMode: Filesystem (default) or Block.
- AccessModes: ReadWriteOnce, ReadOnlyMany, ReadWriteMany, ReadWriteOncePod (beta).
- StorageClassName: Specifies the class of the PV.
- PersistentVolumeReclaimPolicy: Retain, Recycle, or Delete.
- MountOptions: Additional options for mounting PV on a node (not supported by all PV types).
Capacity
- Storage size is the only resource that can be set or requested (for now).
Volume Mode
- Filesystem: Mounted into Pods into a directory.
- Block: Presented as a raw block device without any filesystem on it (useful for the fastest possible access).
Access Modes
- ReadWriteOnce (RWO): Volume can be mounted as read-write by a single node.
- ReadOnlyMany (ROX): Volume can be mounted as read-only by many nodes.
- ReadWriteMany (RWX): Volume can be mounted as read-write by many nodes.
- ReadWriteOncePod (RWOP, beta): Volume can be mounted as read-write by a single Pod.
Storage Class
- StorageClassName attribute specifies the class of a PV.
- PVs with no storageClassName have no class.
Reclaim Policy
- Retain: Manual reclamation.
- Recycle: Basic scrub (rm -rf /thevolume/*).
- Delete: Deletes associated storage asset.
Mount Options
- Specify additional options for mounting a PV on a node.
- Not all PV types support mount options.
Node Affinity
- Define constraints to limit what nodes a volume can be accessed from.
- Automatically populated for AWS EBS, GCE PD, and Azure Disk volume block types.
Volume Phases
- Available: Free resource not yet bound to a claim.
- Bound: Volume is bound to a claim.
- Released: Claim has been deleted, resource not yet reclaimed by the cluster.
- Failed: Volume has failed its automatic reclamation.
PersistentVolumeClaims (PVC)
Overview
A PersistentVolumeClaim (PVC) specifies how much storage a user needs and how it should be accessed. It's a request for storage by a user.
Key Components
- Spec and Status: Each PVC contains a spec and status, which outlines the specifications and status of the claim.
- Access Modes: Claims follow the same conventions as volumes when requesting storage.
- Volume Modes: Claims use the same conventions as volumes for consumption either as a filesystem or block device.
- Resources: Like Pods, claims can request specific quantities of a resource, in this case, storage.
- Selector: Claims can specify a label selector to filter the set of volumes. Only volumes whose labels match the selector can be bound to the claim.
- Class: A claim can request a particular class by specifying the name of a StorageClass using the attribute storageClassName.
Selectors
Two types of fields in selectors:
- matchLabels: The volume must have a label with this value.
- matchExpressions: A list of requirements made by specifying key, list of values, and operator that relates the key and values.
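Both selector fields can appear in one claim, in which case all requirements must be satisfied. In this sketch the claim name, the class name `slow`, and the label keys and values are hypothetical.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: myclaim
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Filesystem
  resources:
    requests:
      storage: 8Gi
  storageClassName: slow
  selector:
    matchLabels:
      release: "stable"          # the PV must carry this exact label
    matchExpressions:
      - key: environment         # and its environment label must be one of the listed values
        operator: In
        values:
          - dev
```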
Class
A claim can request a particular class by specifying the name of a StorageClass using the attribute storageClassName. PVCs don't necessarily have to request a class. The handling of storageClassName depends on whether the DefaultStorageClass admission plugin is turned on or off.
Retroactive Default StorageClass Assignment
A PVC can be created without specifying a storageClassName, even when no default StorageClass exists in the cluster. When a default StorageClass later becomes available, the control plane identifies any existing PVCs without a storageClassName and updates those PVCs to match the new default StorageClass.
Claims as Volumes
Pods access storage by using the claim as a volume. Claims must exist in the same namespace as the Pod using the claim.
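A minimal sketch of a Pod consuming a claim. The Pod name, image, mount path, and the claim name `myclaim` are assumptions, and the claim must live in the Pod's namespace.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: mypod
spec:
  containers:
    - name: myfrontend
      image: nginx
      volumeMounts:
        - name: mypd
          mountPath: /var/www/html
  volumes:
    - name: mypd
      persistentVolumeClaim:
        claimName: myclaim   # hypothetical PVC in the same namespace as the Pod
```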
Namespaces
Since PersistentVolumeClaims are namespaced objects, mounting claims with "Many" modes (ROX, RWX) is only possible within one namespace.
Raw Block Volume Support
Certain volume plugins support raw block volumes, including dynamic provisioning where applicable.
Binding Block Volumes
When a user requests a raw block volume by using the volumeMode field in the PVC spec, the binding rules change. There's a matrix for possible combinations of requesting a raw block device.
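For illustration, a claim can request a raw block device and a Pod can consume it through volumeDevices rather than volumeMounts. All names, the size, and the device path are hypothetical.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: block-pvc
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Block            # request a raw block device instead of a filesystem
  resources:
    requests:
      storage: 10Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: pod-with-block-volume
spec:
  containers:
    - name: app
      image: busybox:1.36
      command: ["sleep", "3600"]
      volumeDevices:           # block volumes use volumeDevices, not volumeMounts
        - name: data
          devicePath: /dev/xvda   # path at which the device appears in the container
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: block-pvc
```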
Volume Snapshot and Restore Volume from Snapshot
- Supported only by out-of-tree CSI volume plugins (In-tree volume plugins are deprecated).
- To create a PersistentVolumeClaim (PVC) from a Volume Snapshot:
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: restore-pvc
spec:
  storageClassName: csi-hostpath-sc
  dataSource:
    name: new-snapshot-test
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
```
Volume Cloning
- Only available for CSI volume plugins.
- To create a PVC from an existing PVC:
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cloned-pvc
spec:
  storageClassName: my-csi-plugin
  dataSource:
    name: existing-src-pvc-name
    kind: PersistentVolumeClaim
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
```
Volume Populators and Data Sources
- Kubernetes supports custom volume populators. To use them, the AnyVolumeDataSource feature gate must be enabled.
- The dataSourceRef field can contain a reference to any object in the same namespace, except for core objects other than PVCs.
- Use of dataSourceRef is preferred over dataSource for clusters with the feature gate enabled.
Cross Namespace Data Sources
- Kubernetes supports cross-namespace volume data sources. To use them, both the AnyVolumeDataSource and CrossNamespaceVolumeDataSource feature gates must be enabled.
- Allows you to specify a namespace in the dataSourceRef field.
- Requires ReferenceGrant from the Gateway API to use this mechanism.
Data Source References
- dataSourceRef and dataSource fields are almost identical and cannot be changed after creation.
- The dataSource field ignores invalid values (as if the field were blank), while the dataSourceRef field never ignores values and causes an error if an invalid value is used.
- The dataSource field only allows PVCs and VolumeSnapshots, while the dataSourceRef field may contain different types of objects.
- When the CrossNamespaceVolumeDataSource feature is enabled, the dataSourceRef field allows objects in any namespaces and does not sync with dataSource when a namespace is specified.
Using Volume Populators
- Volume populators are controllers that create non-empty volumes determined by a Custom Resource.
- A populated volume is created by referring to a Custom Resource using the dataSourceRef field.
Using Cross-Namespace Volume Data Sources
- Create a ReferenceGrant to allow the namespace owner to accept the reference.
- Define a populated volume by specifying a cross-namespace volume data source using the dataSourceRef field.
- Requires a valid ReferenceGrant in the source namespace.
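The two steps above might be sketched as follows. The namespaces `default` and `ns1`, the snapshot name `new-snapshot-demo`, the claim name, and the class name `example` are hypothetical, and the ReferenceGrant apiVersion is an assumption that depends on the Gateway API version installed in the cluster.

```yaml
# Created in the source namespace: lets PVCs in ns1 reference this VolumeSnapshot.
apiVersion: gateway.networking.k8s.io/v1beta1
kind: ReferenceGrant
metadata:
  name: allow-ns1-pvc
  namespace: default
spec:
  from:
    - group: ""
      kind: PersistentVolumeClaim
      namespace: ns1
  to:
    - group: snapshot.storage.k8s.io
      kind: VolumeSnapshot
      name: new-snapshot-demo
---
# Created in ns1: populates the new volume from the snapshot in the default namespace.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: foo-pvc
  namespace: ns1
spec:
  storageClassName: example
  accessModes:
    - ReadWriteOnce
  volumeMode: Filesystem
  resources:
    requests:
      storage: 1Gi
  dataSourceRef:
    apiGroup: snapshot.storage.k8s.io
    kind: VolumeSnapshot
    name: new-snapshot-demo
    namespace: default   # cross-namespace reference, allowed only with the grant above
```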
Writing Portable Configuration
- Include PVC objects in your configuration bundle.
- Do not include PersistentVolume (PV) objects in the config.
- Allow the user to provide a storage class name when instantiating the template.
- If no storage class name is provided, leave the persistentVolumeClaim.storageClassName field as nil to automatically provision a PV with the default StorageClass.
- Monitor PVCs not getting bound after some time to identify potential issues with dynamic storage support or the lack of a storage system.