Overview
gridscale Managed Kubernetes (GSK) is a secure and fully-managed Kubernetes solution. All you need to do is configure how powerful you wish your cluster to be. We take care of upgrades and OS maintenance.
- Release Support
- GSK Updates and Upgrades
- Considerations for Upgrades
- Connect a Kubernetes Cluster to a PaaS service
- GSK Resource Protection
- Worker Node Resources
- Horizontal Pod Autoscaler (HPA)
- Cluster Autoscaler
- Vertical Scaling
- Logging
- Load Balancing
- Networking
- Persistent Volumes
- Local Rocket Storage
- Ingress Controller
- Access and Security
- Backups
- Node Pools
- Kubernetes Dashboard
- Known issues
- FAQ
- Terms and Abbreviations
gridscale Managed Kubernetes (GSK) fully integrates into our products, offering easy configuration, monitoring, release management and security enabling you to explicitly focus on your business applications.
GSK easily integrates with our Load Balancer, Certificates and Storage IaaS for Ingress and persistent volumes respectively.
If you are new to Kubernetes or containers in general, we’d recommend you get familiar with commonly used terminology and go through our line-up of content to get started:
- How to: connect gridscale Kubernetes Cluster and PaaS
- Release notes {%gs-only%}
- Set up a gridscale Kubernetes Cluster in 5 minutes
- Kubernetes - All about clusters, pods and kubelets {%/gs-only%}
You also may want to take a look at known issues.
Release Support
The GSK offering supports three stable Kubernetes releases (minor/major versions) at any time.
New Kubernetes releases provided by the community are adopted within 6 months of their initial stable release. This adoption window is used to migrate GSK components, ensure stable operations and provide a migration path from previous releases.
Releases other than the latest three are deprecated: they are not available for new clusters and are no longer maintained for existing clusters. You are notified of GSK release deprecation in your Cloud Panel four weeks in advance.
Once deprecated, your cluster is subject to auto-upgrade. With auto-upgrades, correct functioning of your workloads cannot be guaranteed, since Kubernetes releases can introduce breaking changes that require preparation on your side.
Please upgrade your clusters proactively ahead of deprecation.
Release Notes
Please find the release notes here.
GSK Updates and Upgrades
This section includes vital information for a smooth upgrade of your cluster. Please read it carefully.
Before you upgrade
Before you upgrade, please check the GSK release notes page for changes that affect you. We list all changes specific to GSK, our offering of Kubernetes, on that page. Breaking changes are marked explicitly.
For a list of changes to Kubernetes itself, please check the Kubernetes Changelog.
Patch Updates
Patch updates contain either a new Kubernetes patch release or GSK specific changes (such as CSI plugin) or both.
Availability of new patch updates are announced as notifications in your Cloud Panel.
Upon availability, you can update your cluster via the Cloud Panel or the API at a time of your choosing.
Please consult the upgrade considerations section below.
Release Upgrades
Release upgrades contain a new Kubernetes minor or major release and (optionally) GSK specific changes. Release upgrades are not performed automatically for you.
You can perform release upgrades via the Cloud Panel or the API at a time of your choosing.
Please consult the upgrade considerations section below for compatibility information between Kubernetes releases.
Performing Patch Updates and Release Upgrades via the API
1. Get your GSK service:
curl 'https://api.gridscale.io/objects/paas/services/<CLUSTER_UUID>' -H 'X-Auth-UserId: <AUTH_USER_UUID>' -H 'X-Auth-Token: <AUTH_TOKEN>'
2. Get the available Service Templates:
curl 'https://api.gridscale.io/objects/paas/service_templates' -H 'X-Auth-UserId: <AUTH_USER_UUID>' -H 'X-Auth-Token: <AUTH_TOKEN>'
3. Take the current service_template_uuid from Step 2 which corresponds to your GSK cluster found in Step 1.
4. Find the target service template UUID in the patch_updates attribute of the template from Step 3.
5. Initiate the GSK update via a service PATCH using the UUID from Step 4:
curl 'https://api.gridscale.io/objects/paas/services/<CLUSTER_UUID>' -X PATCH -H 'Content-Type: application/json' -H 'X-Auth-UserId: <AUTH_USER_UUID>' -H 'X-Auth-Token: <AUTH_TOKEN>' --data-raw '{"service_template_uuid":"<PATCH_UPDATE_SERVICE_TEMPLATE_UUID>"}'
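Once you have the service-templates response, the patch-update UUID can be picked out locally before issuing the PATCH. A minimal sketch, assuming the response was saved to a file; the file name, JSON snippet and UUID below are illustrative, not real API output:

```shell
# Hypothetical, heavily simplified service_templates response saved locally;
# a real response contains many more fields per template.
cat > templates.json <<'EOF'
{"patch_updates": ["1a2b3c4d-aaaa-bbbb-cccc-1234567890ab"]}
EOF

# Extract the first patch-update template UUID (use jq instead if it is available)
TEMPLATE_UUID=$(sed -n 's/.*"patch_updates": \["\([^"]*\)".*/\1/p' templates.json)
echo "$TEMPLATE_UUID"
```

The extracted UUID is what goes into the PATCH request’s service_template_uuid field.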
Effect of Updates and Upgrades on Nodes and Workloads
Nodes are considered volatile in the Kubernetes cluster. During updates, upgrades or node recoveries, nodes are not modified - they are replaced.
The process starts by upgrading the master node. The Kubernetes API will experience a short interruption, during which you won’t be able to change cluster resources. Existing pods will continue to run uninterrupted. New pods can be scheduled once the master node upgrade has completed.
The next step is upgrading all worker nodes. This is a sequential process, where nodes are upgraded one at a time. To avoid resource shortage, surge upgrades are performed by default.
Worker node upgrades drain workloads of the node before taking it down, to allow your pods to be rescheduled gracefully. In case pod disruption policies prevent your workloads from being drained, the process will continue to ensure cluster integrity. Once the node has been drained, it is replaced and joins the cluster again.
If continuous operation is a priority for your workload, be sure to configure it with redundancies in place, so that it remains available during an upgrade.
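One way to express such an availability requirement is a PodDisruptionBudget, which the drain respects where possible. A minimal sketch (the names and the app label are illustrative, assuming a multi-replica workload on Kubernetes 1.21 or later, where policy/v1 is available):

```yaml
# Keeps at least one replica running while nodes are drained during upgrades
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: my-app   # must match the labels of your workload's pods
```

As described above, the upgrade proceeds even if the budget cannot be satisfied, so treat this as a best-effort safeguard rather than a hard guarantee.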
Surge Upgrades
With surge upgrades, resource shortage during upgrades is counteracted by adding worker nodes for the time of the upgrade.
If enabled (the default is 1 surge node), the configured number of nodes is added to your cluster before the first node is taken down. Surge nodes are temporary in nature and are removed once the upgrade has succeeded.
Additional costs are generated during surge node lifetime. You can disable surge upgrades in your Cloud Panel or via the API by setting the parameter k8s_surge_node_count to 0.
Note: Surge node count is currently limited to either 0 or 1. Support for counts >1 will be added in the future.
Impact on Node Labels
Node labels are not persisted when nodes are replaced. In case you rely on node labels to control where deployments run in your cluster, please look into Affinity and anti-affinity as the preferred approach.
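As a sketch of that approach, pod anti-affinity can spread replicas across nodes without relying on custom node labels; the deployment name, app label and image below are illustrative:

```yaml
# Prefer scheduling replicas of the same app onto different nodes
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    app: my-app
                topologyKey: kubernetes.io/hostname   # one replica per node, where possible
      containers:
        - name: my-app
          image: nginx:1.25
```

Because kubernetes.io/hostname is a well-known label set automatically on every node, this keeps working even when nodes are replaced during upgrades.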
Considerations for Upgrades
Patch updates (1.19.10 → 1.19.11) are considered safe by the Kubernetes project.
Release upgrades (1.16.x → 1.17.x) can introduce breaking changes.
Read on to find out how to check whether your workloads (deployments, services, daemonsets, etc.) are still compatible with the Kubernetes release you want to upgrade to.
Official Kubernetes Documentation
You can find the official Kubernetes release notes in the Changelog.
Another helpful resource is the Deprecated API Migration Guide, which lists all API removals by release.
Example:
The extensions/v1beta1 API version of NetworkPolicy is no longer served as of v1.16.
- Migrate manifests and API clients to use the networking.k8s.io/v1 API version, available since v1.8.
- All existing persisted objects are accessible via the new API
Deploy and Test Workload on a Temporary Cluster
The easiest way is just to provision a test cluster with the new release.
Deploy your workloads to the test cluster and check if everything is working as expected.
This way you can make sure your workloads are compatible with the kubernetes release you want to upgrade to without impact on live workloads.
Third Party Cluster Linting Tool
There are some third-party tools that can make the transition easier.
Pluto is a tool that helps users find deprecated Kubernetes APIs.
In this example we see two files in our directory that have deprecated apiVersions. Deployment extensions/v1beta1 is no longer available and needs to be replaced with apps/v1. This will need to be fixed prior to a 1.16 upgrade:
pluto detect-files -d kubernetes/testdata
NAME KIND VERSION REPLACEMENT REMOVED DEPRECATED
utilities Deployment extensions/v1beta1 apps/v1 true true
utilities Deployment extensions/v1beta1 apps/v1 true true
Head over to the Pluto Documentation to read more about in-depth usage.
Connect a Kubernetes Cluster to a PaaS service
Requires 1.19.16-gs0, 1.20.15-gs0, 1.21.11-gs0 or higher.
We recently released support for private networks with IPv4 for PaaS services. This feature allows you to access a PaaS service from a Kubernetes cluster as a Kubernetes service, so your application can access the PaaS service without a proxy. Follow these steps:
- First, you need a GSK cluster. The worker nodes of the GSK cluster will be connected to a private network with IPv4.
- Determine the private network that the worker nodes are connected to. The name of the private network always consists of the cluster name and the suffix private. For example, if you have a cluster named my-first-gridscale-k8s, the name of the cluster’s private network is my-first-gridscale-k8s-private.
- Connect your PaaS service to the cluster’s private network that you looked up in the previous step. For both new and existing services you can do so:
  - via the API, where you specify network_uuid in the create or update request’s payload.
  - via the panel, where you can check the “Relate custom private Network” box during creation of the PaaS service, or use the edit icon in the Connections pane for existing PaaS services. Then select the corresponding network from the dropdown.
- Create a Kubernetes service by mapping a hostname to the PaaS service private IP.
Determine the PaaS service private IP from the Service Access; for example, a Postgres database with the following Service Access:
connection-string format:
postgres://postgres:XXpasswordXX@10.244.0.43:5432
connection-parameters format:
username = postgres
password = XXpasswordXX
host = 10.244.0.43
port = 5432
Create a Kubernetes service as in this example:
kind: "Service"
apiVersion: "v1"
metadata:
  name: "paas-postgres"
spec:
  ports:
    - name: "paas-postgres"
      protocol: "TCP"
      port: 5432
      targetPort: 5432
After applying the above yaml manifest, you can get the paas-postgres service as follows:
$ kubectl get services paas-postgres
NAME            TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
paas-postgres   ClusterIP   10.244.69.82   <none>        5432/TCP   2d17h
- Create a Kubernetes Endpoints object for the Kubernetes service. The IP address should be the one from the service access (connection-string or connection-parameters). In this example, the IP address is 10.244.0.43.
kind: "Endpoints"
apiVersion: "v1"
metadata:
  name: "paas-postgres"
subsets:
  - addresses:
      - ip: "10.244.0.43"
    ports:
      - port: 5432
        name: "paas-postgres"
After applying the above yaml manifest, you can get the paas-postgres endpoints as follows:
$ kubectl get endpoints paas-postgres
NAME            ENDPOINTS          AGE
paas-postgres   10.244.0.43:5432   2d17h
- Create the secrets for database access, using the postgres database, username, and password.
$ kubectl create secret generic paas-postgres \
    --from-literal=database=postgres \
    --from-literal=username=postgres \
    --from-literal=password=XXpasswordXX
Now that the service, endpoints, and secrets are created, the application can access the postgres database as a Kubernetes service. Here is an example of how to configure your application to access the postgres database.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  labels:
    app: my-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: postgres:12-alpine
          imagePullPolicy: Always
          env:
            - name: DATABASE_HOST
              value: "paas-postgres"
            - name: DATABASE_NAME
              valueFrom:
                secretKeyRef:
                  name: paas-postgres
                  key: database
            - name: DATABASE_USER
              valueFrom:
                secretKeyRef:
                  name: paas-postgres
                  key: username
            - name: DATABASE_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: paas-postgres
                  key: password
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: paas-postgres
                  key: password
          ports:
            - containerPort: 8080
Show the pods
$ kubectl get pods
NAME                      READY   STATUS    RESTARTS   AGE
my-app-6559f7f88c-fjqtq   1/1     Running   0          10s
You can access the database from one of the pods
$ kubectl exec -it my-app-6559f7f88c-fjqtq bash
Connect, describe and list the database
bash-5.1# PGPASSWORD=$POSTGRES_PASSWORD psql -U $DATABASE_USER -h $DATABASE_HOST
psql (12.10, server 13.0 (Debian 13.0-1.pgdg100+1))
WARNING: psql major version 12, server major version 13.
         Some psql features might not work.
Type "help" for help.

postgres=# \d
                  List of relations
 Schema |              Name              |   Type   |  Owner
--------+--------------------------------+----------+----------
 public | auth_group                     | table    | postgres
 public | auth_group_id_seq              | sequence | postgres
 public | auth_group_permissions         | table    | postgres
 public | auth_group_permissions_id_seq  | sequence | postgres
 public | auth_permission                | table    | postgres
 public | auth_permission_id_seq         | sequence | postgres
 public | auth_user                      | table    | postgres
 public | auth_user_groups               | table    | postgres
 public | auth_user_groups_id_seq        | sequence | postgres
 public | auth_user_id_seq               | sequence | postgres

postgres=# \l
                              List of databases
   Name    |  Owner   | Encoding |  Collate   |   Ctype    |   Access privileges
-----------+----------+----------+------------+------------+-----------------------
 postgres  | postgres | UTF8     | en_US.utf8 | en_US.utf8 |
 template0 | postgres | UTF8     | en_US.utf8 | en_US.utf8 | =c/postgres          +
           |          |          |            |            | postgres=CTc/postgres
 template1 | postgres | UTF8     | en_US.utf8 | en_US.utf8 | =c/postgres          +
           |          |          |            |            | postgres=CTc/postgres
(3 rows)

postgres=#
GSK Resource Protection
Resources like servers, storages, networks, ip addresses or load balancers, which make up the cluster, are visible to you via API or within the Cloud Panel for transparency and billing reasons. They are, however, protected from being altered. This not only makes sure that they are not deleted accidentally, but is also vital to stable cluster operations.
Protected Resources:
- Master Nodes (server, storage, ips)
- Worker Nodes (server, storage, ips)
- Kubernetes network
- You can still attach your own servers or platform services to it, e.g. to access them from inside your cluster.
- Storages created by Kubernetes (like Persistent Volumes)
- LoadBalancers created by Kubernetes (like Ingress-Controllers)
If you want to change your worker config you can still do this in the Kubernetes configuration.
Worker Node Resources
Workloads (pods) can utilise a subset of worker node memory and CPU resources.
Depending on the node size, GSK reserves 450MiB of memory for system resources, and between 640MiB and 1465MiB for Kubernetes components. The rest is made available to pods - this is also known as node allocatable.
To check the amount of resources that are available to your pods, you can run:
kubectl get nodes -o jsonpath='{.items[0].status.allocatable}' | jq
Worker nodes with 4GiB of memory or more can run up to 110 pods, which is the Kubernetes default.
Worker nodes with 2GiB or 3GiB of memory can run up to 35 pods simultaneously. This is to reduce the impact of memory and CPU resource reservations on the amount of resources available for workloads, since reservation amounts are a function of node pod density (maxPods).
If a pod tries to allocate more than available allocatable resources it will either get evicted by the kubelet, or killed by the underlying system, depending on how aggressive the resource allocation breach was.
In case of eviction the kubelet will terminate the pod and mark it as ‘Evicted’ and ‘Failed’. In case of an OOM kill, the pod will be marked as ‘OOM Killed’.
Any such events can be seen in Kubernetes events list.
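To make eviction behaviour predictable, give each container explicit requests and limits that fit within node allocatable. A minimal illustrative example (the pod name, image and values are placeholders, not recommendations):

```yaml
# A pod whose memory use is bounded; exceeding the limit triggers an OOM kill
# of the container rather than putting the whole node under pressure
apiVersion: v1
kind: Pod
metadata:
  name: bounded-pod
spec:
  containers:
    - name: app
      image: nginx:1.25
      resources:
        requests:          # what the scheduler reserves on the node
          memory: "256Mi"
          cpu: "250m"
        limits:            # hard cap enforced at runtime
          memory: "512Mi"
          cpu: "500m"
```

Pods with requests equal to limits get the Guaranteed QoS class and are evicted last when a node runs short on memory.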
Horizontal Pod Autoscaler (HPA)
In order to use the horizontal pod autoscaler (HPA) you need to install the Metrics Server. You can bring your own or just follow the example.
Install Metrics Server
You can install the Metrics Server via Helm. There is a ready-to-use Metrics Server Helm Chart by Bitnami.
Add the Bitnami Metrics Server repository to your Helm installation:
helm repo add bitnami https://charts.bitnami.com/bitnami
Create a values.yaml with this content to configure your Metrics Server:
apiService:
create: true
extraArgs:
- --kubelet-insecure-tls=true
- --kubelet-preferred-address-types=InternalIP
Install the Metrics Server Helm Chart:
helm install metrics-server bitnami/metrics-server -f values.yaml
Wait for the Metrics Server to be ready. It might take a minute or two before the first metrics are collected.
Run HPA
In order to run the HPA you need to create a deployment and generate some load against it.
Keep in mind that you must define resource limits and requests in order to use the HPA. The Service exists only to allow access from the load generator.
apiVersion: apps/v1
kind: Deployment
metadata:
name: php-apache
spec:
selector:
matchLabels:
run: php-apache
replicas: 1
template:
metadata:
labels:
run: php-apache
spec:
containers:
- name: php-apache
image: k8s.gcr.io/hpa-example
ports:
- containerPort: 80
resources:
limits:
cpu: 500m
requests:
cpu: 200m
---
apiVersion: v1
kind: Service
metadata:
name: php-apache
labels:
run: php-apache
spec:
ports:
- port: 80
selector:
run: php-apache
Download the example deployment and service and deploy with:
kubectl apply -f php-apache.yaml
Create the HPA for the deployment:
kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=10
Check the current status of the HPA:
kubectl get hpa
The output should look like this:
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
php-apache Deployment/php-apache 0%/50% 1 10 1 2d22h
Generate Test Load
Now you create an infinite loop which will generate a load:
kubectl run -i --tty load-generator --rm --image=busybox --restart=Never -- /bin/sh -c "while sleep 0.01; do wget -q -O- http://php-apache; done"
Open a second terminal and check the HPA status:
kubectl get hpa -w
After some time you should see the pods scale:
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
php-apache Deployment/php-apache 0%/50% 1 10 1 2d22h
php-apache Deployment/php-apache 91%/50% 1 10 1 2d22h
php-apache Deployment/php-apache 91%/50% 1 10 2 2d22h
php-apache Deployment/php-apache 253%/50% 1 10 2 2d22h
php-apache Deployment/php-apache 253%/50% 1 10 4 2d22h
php-apache Deployment/php-apache 253%/50% 1 10 6 2d22h
php-apache Deployment/php-apache 101%/50% 1 10 6 2d22h
php-apache Deployment/php-apache 71%/50% 1 10 6 2d22h
php-apache Deployment/php-apache 71%/50% 1 10 9 2d22h
php-apache Deployment/php-apache 51%/50% 1 10 9 2d22h
You can also check the deployment itself:
kubectl get deployment php-apache
NAME READY UP-TO-DATE AVAILABLE AGE
php-apache 9/9 9 9 2d22h
Stop Load and Clean Up
In order to stop the load, hit CTRL+C in the terminal where you started the load generator.
You can verify the scale down with the commands from above:
kubectl get deployment php-apache -w
Delete the example deployment and service:
kubectl delete -f php-apache.yaml
{%gs-only%}
Cluster Autoscaler
gridscale managed Kubernetes cluster-autoscaler is a tool that automatically adjusts the size of the Kubernetes cluster when the load changes. When the load is high, the cluster-autoscaler increases the size of the cluster (adds worker nodes), and when the load is low, it decreases the size of the cluster (removes worker nodes).
Note: The cluster-autoscaler currently supports gridscale managed Kubernetes clusters with version ~> 1.25.
Note 2: When scaling down, gridscale managed k8s will always remove nodes from the end of the node list (i.e., when scaling down from 8 to 6 nodes, nodes 7 and 8 will be removed, no matter how the nodes are utilized at the time; workloads are rescheduled accordingly). This differs from the default autoscaler behaviour, which is why GSK support is implemented in a forked version of the autoscaler. This behaviour will change with a future implementation of node pools, which will allow us to integrate with the upstream autoscaler.
Note 3: If you deploy your k8s cluster via Terraform, please ignore changes of node_pool.0.node_count in your tf files by adding the following code:
lifecycle {
ignore_changes = [ node_pool.0.node_count]
}
so that terraform will not overwrite the node count set by the cluster-autoscaler.
cluster-autoscaler deployment
Prerequisites
- A gridscale managed Kubernetes cluster.
- Create a gridscale API token via the panel.
- kubectl is installed on your local machine.
- kubectl is configured to access your gridscale managed Kubernetes cluster.
Deploy cluster-autoscaler
- Download the cluster-autoscaler manifest file corresponding to your gridscale managed Kubernetes cluster version from the table below, and save it as cluster-autoscaler-autodiscover.yaml.
GSK version | Manifest file |
---|---|
1.30 | link |
1.29 | link |
1.28 | link |
1.27 | link |
1.26 | link |
1.25 | link |
1.23, 1.24 | link |
NOTE: The flag --daemonset-eviction-for-occupied-nodes=false is added to the command in the manifest file to evict PVC-attached pods properly. Otherwise, the CSI-related pods will be evicted and the PVC-attached pods will be stuck in the Terminating state. If you want to use the default behavior, please remove the flag.
- If you use the namespace gsk-autoscaler in your cluster-autoscaler-autodiscover.yaml, create a new namespace called gsk-autoscaler by running the following command:
$ kubectl create namespace gsk-autoscaler
- Insert your base64 encoded gridscale API user and token in the manifest file.
- Insert your gridscale Kubernetes cluster UUID in the environment variable CLUSTER_UUID in the manifest file.
- Change the environment variable CLUSTER_MAX_NODE_COUNT in the manifest file to the maximum number of nodes you want to scale up to. Optionally, you can also change the minimum number of nodes via the environment variable CLUSTER_MIN_NODE_COUNT (default: 1) in the manifest file.
- To configure parameters of the cluster-autoscaler, you can add flags to the command in the manifest file. All available flags and their default values can be found in the table below.
GSK version | Parameter information |
---|---|
1.30 | link |
1.29 | link |
1.28 | link |
1.27 | link |
1.26 | link |
1.25 | link |
1.23, 1.24 | link |
- Deploy the cluster-autoscaler by running the following command:
$ kubectl apply -f cluster-autoscaler-autodiscover.yaml
- You can check the autoscaling activity by reading the configmap cluster-autoscaler-status in the namespace gsk-autoscaler, i.e.:
$ kubectl get configmap cluster-autoscaler-status -n gsk-autoscaler -o yaml
Note: the cluster-autoscaler will be deployed in a namespace called gsk-autoscaler.
Worker Node Storage Performance Classes
Worker nodes in your cluster use a distributed storage for their operating system. On cluster creation, you choose the performance class for this storage with the parameter k8s_worker_node_storage_type.
The performance class of your worker nodes’ storage is independent of your PersistentVolumes and only affects the OS, kubelet and potential hostPath mounts. A higher performance class can help the node stay responsive when under increased memory pressure.
The performance class of your worker nodes can be changed at any time by editing your cluster. You can do so either in your Cloud Panel in the Configuration section, or via API. Changing the performance class will recycle all nodes sequentially.
To do so via API, you need to patch your cluster’s parameters to update the parameter k8s_worker_node_storage_type. Always include all the parameters in the patch, not just the ones you want to change.
For example:
{
"parameters": {
"k8s_worker_node_ram": 4,
"k8s_worker_node_cores": 2,
"k8s_worker_node_count": 3,
"k8s_worker_node_storage": 40,
"k8s_worker_node_storage_type": "storage"
}
}
FAQ
After upgrading my gridscale managed Kubernetes cluster, the cluster-autoscaler is not working anymore. What should I do?
Please make sure that the minor version of the cluster-autoscaler matches the minor version of your gridscale managed Kubernetes cluster. If not, please redeploy the cluster-autoscaler with the correct version. {%/gs-only%}
Vertical Scaling
GSK supports vertical scaling, which can be enabled by simply editing the worker node configuration of your Kubernetes cluster in the Cloud Panel or via the API. Scaling the cluster will recycle all nodes sequentially.
The following node resources can be changed:
- Cores per worker node via parameter k8s_worker_node_cores
- RAM per worker node via parameter k8s_worker_node_ram
- Storage per worker node via parameter k8s_worker_node_storage
- Storage type per worker node via parameter k8s_worker_node_storage_type
You can either change these in your Cloud Panel in the Configuration section, or via API.
To do so via API, you need to patch your cluster’s parameters. Always include all the parameters in the patch, not just the ones you want to change.
For example:
{
"parameters": {
"k8s_worker_node_ram": 4,
"k8s_worker_node_cores": 2,
"k8s_worker_node_count": 3,
"k8s_worker_node_storage": 40,
"k8s_worker_node_storage_type": "storage"
}
}
Logging
Control Plane Logs
GSK can deliver Kubernetes control plane logs to an S3 bucket of your choice. This can be configured independently for each cluster.
Currently, the following log types (components) are supported:
- kube-apiserver
- kubernetes audit logs
All logs will be delivered with the following naming convention:
${BUCKET_NAME}/kubernetes/${CLUSTER_UUID}/${COMPONENT}/%Y/%m/%d/%H-%M-%S_${INDEX}_${UUID}.log.gz
Delivery occurs every 10 minutes, or when the uncompressed log file size reaches 65MB, whichever comes first. Files are delivered gzip-compressed.
For performance reasons we recommend disabling Bucket Versioning on the destination bucket. You can set up log rotation by specifying a Bucket Lifecycle Configuration to, for example, automatically delete log files older than 365 days.
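Such a lifecycle rule could look like the following, assuming an S3-compatible API as accepted by, for example, aws s3api put-bucket-lifecycle-configuration; the rule ID and prefix are illustrative:

```json
{
  "Rules": [
    {
      "ID": "expire-k8s-logs",
      "Status": "Enabled",
      "Filter": { "Prefix": "kubernetes/" },
      "Expiration": { "Days": 365 }
    }
  ]
}
```

The prefix matches the delivery naming convention above, so only delivered control plane logs are affected, not other objects in the bucket.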
Audit Log Policy
You can select an audit log policy for your Kubernetes cluster, which will define the types of events that will be captured.
GSK offers three policy types, each with a specific use case:
- Metadata, which will log metadata of every request and response (no body)
- RequestALLResponseCRUD, which will log request metadata and body, and response metadata and body, but not response body for [“get”, “list”, “watch”] verbs
- RequestALLResponseALL, which will log metadata and body for all requests and all responses
IMPORTANT: Secrets, ConfigMaps, and TokenReviews may contain sensitive data, so they are always only logged at the metadata level, regardless of the policy selected. Some high-volume system calls such as kube-proxy ‘watch’ events are also excluded.
Container Logs
Container logs can be obtained via kubectl. While this is certainly feasible for ad-hoc debugging of single containers, it doesn’t give you the full picture of your application or even the whole cluster.
It is therefore a common practice to ship logs to a centralized log management platform, where they can be transformed and analyzed in one place - giving you that full picture and the means to act on events or trends.
There are multiple ways to get your logs into the log management platform:
- Your application can directly implement the format your log management platform accepts the logs in, and send them there.
- Your application can log to stdout and stderr, leaving it to the container engine to store the logs.
It is good practice to use the latter approach. It decouples the application from runtime environment specifics, is non-blocking for the application, and provides a general approach to reliably and securely transfer logs, even during temporary unavailability of the log management platform.
Log Shipping
While the container engine technically might be able to ship the logs directly to your log management platform, having the container engine store them locally instead and a third-party component read and ship them has proven to be the more reliable and portable solution.
This third-party component is called a log shipper. In general it can run anywhere, has inputs to read logs from locally and outputs to ship logs to remotely. The log shipper is an application agnostic approach - in the sense that it doesn’t need to be integrated into the applications you run on your cluster in any way. It just needs to support the format the logs are stored in as an input and the format the log management platform accepts the logs in as an output.
Accessing Container Logs
GSK 1.24 and higher
The logging format used by the container engine is the CRI logging format.
You can choose any log shipper that supports the CRI logging format.
Logs are stored in /var/log/containers and /var/log/pods.
GSK 1.23 and lower
The log driver used by the container engine Docker on our managed Kubernetes platform is journald.
journald is part of systemd and designed to store logs safely and handle rotation gracefully to prevent node disks from filling up. journald makes it easy for the shipper to reliably transfer logs, since the shipper only needs to keep track of one event stream.
journald stores logs in /var/log/journal. Several log shippers support journald as an input.
Note:
The log shipper needs to keep track of where it left off, so that after a restart or redeployment log shipping does not start from the beginning, i.e. all logs would be transferred again. Since the position is node-specific, a local hostPath mount to store the position in is recommended.
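The volume layout for such a shipper could be sketched as follows, using Fluent Bit purely as an example; the namespace, names, image tag and paths are illustrative, not a GSK-provided manifest:

```yaml
# One shipper pod per node; logs are read from the node, positions are kept on the node
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: log-shipper
  namespace: logging
spec:
  selector:
    matchLabels:
      app: log-shipper
  template:
    metadata:
      labels:
        app: log-shipper
    spec:
      containers:
        - name: fluent-bit
          image: fluent/fluent-bit:2.2
          volumeMounts:
            - name: varlog        # CRI logs under /var/log/containers and /var/log/pods
              mountPath: /var/log
              readOnly: true
            - name: positions     # node-local position database survives pod restarts
              mountPath: /fluent-bit/db
      volumes:
        - name: varlog
          hostPath:
            path: /var/log
        - name: positions
          hostPath:
            path: /var/lib/fluent-bit
            type: DirectoryOrCreate
```

Since the position database lives on the node’s filesystem, it survives shipper restarts and redeployments; note that it is still reset when a worker node is replaced during upgrades.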
Load Balancing
Applying a service with the type of Load Balancer will provision a gridscale Load Balancer. Below are some helpful tips on integrating with our Load Balancer as a Service (LBaaS):
IP Address Forwarding
The Load Balancer needs to be set to HTTP mode.
The client’s IP address is then available in the X-Forwarded-For HTTP header.
Note: When in HTTP mode, HTTPS termination happens at the Load Balancer level. In HTTP mode, certificates are either obtained via Let’s Encrypt, or you can upload your own custom certificate.
Configuring Load Balancer Modes
The cloud controller manager (CCM) uses service annotations to configure the LBaaS for a GSK cluster. If an annotation for a specific parameter is not set, the default value for that parameter is used. This feature is supported from GSK versions 1.16.15-gs2, 1.17.14-gs1, 1.18.12-gs1, and 1.19.4-gs1 and later.
Annotation | Default value |
---|---|
service.beta.kubernetes.io/gs-loadbalancer-mode | tcp |
service.beta.kubernetes.io/gs-loadbalancer-redirect-http-to-https | “false” |
service.beta.kubernetes.io/gs-loadbalancer-ssl-domains | nil |
service.beta.kubernetes.io/gs-loadbalancer-algorithm | leastconn |
service.beta.kubernetes.io/gs-loadbalancer-https-ports | 443 |
service.beta.kubernetes.io/gs-loadbalancer-custom-certificate-uuids | nil |
Examples
- The following annotations configure the LBaaS with HTTP mode, the Round Robin algorithm, HTTP-to-HTTPS redirection, and multiple SSL domains separated by commas. The service.beta.kubernetes.io/gs-loadbalancer-ssl-domains annotation allows you to add multiple SSL domains to the load balancer.
annotations:
  service.beta.kubernetes.io/gs-loadbalancer-mode: http
  service.beta.kubernetes.io/gs-loadbalancer-redirect-http-to-https: "true"
  service.beta.kubernetes.io/gs-loadbalancer-ssl-domains: demo1.test.com,demo2.test.com
  service.beta.kubernetes.io/gs-loadbalancer-algorithm: roundrobin
- The following annotations configure the LBaaS with HTTP mode, the Round Robin algorithm, HTTP-to-HTTPS redirection, a non-standard SSL port 4443, and a custom certificate (certificate UUIDs are separated by commas). The service.beta.kubernetes.io/gs-loadbalancer-custom-certificate-uuids annotation allows you to attach already uploaded custom certificates to the Load Balancer. First upload the custom certificate via the panel or API. Then use the UUID of the uploaded certificate, for example c8b786e7-53ee-427b-8ff6-498f59f58b14, with the service.beta.kubernetes.io/gs-loadbalancer-custom-certificate-uuids annotation.
annotations:
  service.beta.kubernetes.io/gs-loadbalancer-mode: http
  service.beta.kubernetes.io/gs-loadbalancer-redirect-http-to-https: "true"
  service.beta.kubernetes.io/gs-loadbalancer-custom-certificate-uuids: c8b786e7-53ee-427b-8ff6-498f59f58b14
  service.beta.kubernetes.io/gs-loadbalancer-algorithm: roundrobin
  service.beta.kubernetes.io/gs-loadbalancer-https-ports: "4443"
Adding Annotations to an Existing Service
You can customize the Load Balancer behaviour for a specific Service (for example, the Service of your ingress controller) using annotations:
kubectl annotate --overwrite svc <SERVICE_NAME> \
"service.beta.kubernetes.io/gs-loadbalancer-mode=http" \
"service.beta.kubernetes.io/gs-loadbalancer-algorithm=roundrobin"
Proxy Protocol
The Proxy Protocol is a network protocol designed to maintain the original IP address of a client when its TCP connection is routed through a proxy. Without this protocol, proxies would not retain this information as they function as intermediaries for the client, transmitting messages to the server while substituting the client’s IP address with their own.
Enable Proxy Protocol for the Load Balancer in TCP mode (Layer 4)
Edit the ingress controller service:
kubectl edit service -n <namespace> <service-name>
Add these annotations to the manifest, as in the following snippet:
service.beta.kubernetes.io/gs-loadbalancer-mode: tcp
service.beta.kubernetes.io/gs-loadbalancer-proxy-protocol: v2
Enable Proxy Protocol for the NGINX Ingress Controller
Official documentation for the NGINX Ingress Controller proxy-protocol.
Edit the ConfigMap of the ingress controller:
kubectl edit configmap -n <namespace> <configmap-name>
Add use-proxy-protocol: "true"
to the data section as the following snippet:
apiVersion: v1
data:
  allow-snippet-annotations: "true"
  use-proxy-protocol: "true"
kind: ConfigMap
Ingress controller errors
broken header: " " while reading PROXY protocol
occurs:
- when the proxy protocol is enabled in the ingress controller but not in the Load Balancer.
- when a request does not go through the Load Balancer although the proxy protocol is already enabled there. For example, when a k8s service issues a request to a public ingress, the request is not routed through the Load Balancer.
Allow traffic from specific IPs or CIDRs
When the proxy protocol is enabled, the nginx.ingress.kubernetes.io/whitelist-source-range
Ingress annotation can be used to allow traffic only from specific IPs or CIDRs.
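A minimal sketch of this annotation on an Ingress (hostname, service name and CIDRs are illustrative):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-ingress             # illustrative name
  annotations:
    # comma-separated list of allowed source IPs/CIDRs
    nginx.ingress.kubernetes.io/whitelist-source-range: 203.0.113.0/24,198.51.100.7
spec:
  rules:
    - host: demo.example.com   # illustrative hostname
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-app   # illustrative backend service
                port:
                  number: 80
```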
Networking
Until GSK release 1.28 we used Flannel as the network plugin. Starting with 1.29 we use Cilium, a more advanced network plugin. This cannot be changed. For clusters upgraded to 1.29, network policy mode is configured in so-called verdict mode, where policies have no effect but can be debugged with the tooling mentioned below. In newly provisioned 1.29 clusters, network policies are enabled by default.
Network Policies
GSK releases 1.29 and upward support network policies, the common way to secure workloads inside a cluster. The GSK CNI plugin changed in release 1.29 from Flannel to Cilium, which allows you to use standard Kubernetes as well as extended Cilium network policies. Both support Layer 3 and 4 policies; Cilium network policies also support Layer 7. Aside from that, Cilium allows you to configure cluster-scoped policies.
The Cilium project offers a simple example of using Cilium network policies at layer 3/4 and layer 7, including a backend deployment and two clients.
The Cilium project also offers a very good online editor for network policies.
Change network policy mode
kubectl patch configmap/cilium-config -n kube-system --type merge -p '{"data":{"enable-policy":"default"}}' # 'default' for enabling network policies and 'never' for disabling
kubectl rollout restart ds -n kube-system cilium
Namespaced Policies
At layer 3/4 you can achieve the same behaviour as in the demo offered by Cilium with a standard Kubernetes network policy like this:
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: rule1
spec:
  podSelector:
    matchLabels:
      org: empire
      class: deathstar
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              org: empire
      ports:
        - port: 80
          protocol: TCP
Verifying that the policy works as expected:
kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
deathstar-f449b9b55-6lv4x 1/1 Running 2 (118m ago) 120m 10.244.193.135 node-pool0-0 <none> <none>
deathstar-f449b9b55-8btck 1/1 Running 0 120m 10.244.192.92 node-pool0-1 <none> <none>
tiefighter 1/1 Running 0 120m 10.244.193.115 node-pool0-0 <none> <none>
xwing 1/1 Running 0 120m 10.244.193.124 node-pool0-0 <none> <none>
kubectl apply -f network-policy.yaml
kubectl exec tiefighter -- curl -s -XPOST deathstar.default.svc.cluster.local/v1/request-landing
Ship landed
kubectl exec xwing -- curl -s -XPOST deathstar.default.svc.cluster.local/v1/request-landing
^C
Cluster-scoped policies
A CiliumClusterwideNetworkPolicy can be used when traffic should be restricted by selector labels for all pods in a cluster. Let’s use a slightly modified example from the Star Wars demo used before.
We will create three namespaces: one for the deathstar service, and then we will deploy tiefighter and xwing pods as deployments in the other two namespaces.
---
apiVersion: v1
kind: Namespace
metadata:
  name: test-1
---
apiVersion: v1
kind: Namespace
metadata:
  name: test-2
---
apiVersion: v1
kind: Namespace
metadata:
  name: test-deathstar
---
apiVersion: v1
kind: Service
metadata:
  name: deathstar
  namespace: test-deathstar
  labels:
    app.kubernetes.io/name: deathstar
spec:
  type: ClusterIP
  ports:
    - port: 80
  selector:
    class: deathstar
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deathstar
  namespace: test-deathstar
  labels:
    app.kubernetes.io/name: deathstar
spec:
  replicas: 2
  selector:
    matchLabels:
      class: deathstar
  template:
    metadata:
      labels:
        class: deathstar
        app.kubernetes.io/name: deathstar
    spec:
      containers:
        - name: deathstar
          image: docker.io/cilium/starwars
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tiefighter
  namespace: test-1
  labels:
    class: tiefighter
    app.kubernetes.io/name: tiefighter
spec:
  replicas: 1
  selector:
    matchLabels:
      class: tiefighter
  template:
    metadata:
      labels:
        class: tiefighter
        app.kubernetes.io/name: tiefighter
    spec:
      containers:
        - name: spaceship
          image: docker.io/cilium/json-mock
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: xwing
  namespace: test-1
  labels:
    app.kubernetes.io/name: xwing
    class: xwing
spec:
  replicas: 1
  selector:
    matchLabels:
      class: xwing
  template:
    metadata:
      labels:
        class: xwing
        app.kubernetes.io/name: xwing
    spec:
      containers:
        - name: spaceship
          image: docker.io/cilium/json-mock
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tiefighter
  namespace: test-2
  labels:
    class: tiefighter
    app.kubernetes.io/name: tiefighter
spec:
  replicas: 1
  selector:
    matchLabels:
      class: tiefighter
  template:
    metadata:
      labels:
        class: tiefighter
        app.kubernetes.io/name: tiefighter
    spec:
      containers:
        - name: spaceship
          image: docker.io/cilium/json-mock
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: xwing
  namespace: test-2
  labels:
    app.kubernetes.io/name: xwing
    class: xwing
spec:
  replicas: 1
  selector:
    matchLabels:
      class: xwing
  template:
    metadata:
      labels:
        class: xwing
        app.kubernetes.io/name: xwing
    spec:
      containers:
        - name: spaceship
          image: docker.io/cilium/json-mock
Finally we can deploy a CiliumClusterwideNetworkPolicy that restricts traffic to the tiefighter deployments only, and only to a specific layer 7 path.
---
apiVersion: "cilium.io/v2"
kind: CiliumClusterwideNetworkPolicy
metadata:
  name: "clusterwide-policy-example"
spec:
  description: "Policy for selective ingress allow to a pod from only a pod with given label"
  endpointSelector:
    matchLabels:
      class: deathstar
  ingress:
    - fromEndpoints:
        - matchLabels:
            class: tiefighter
      toPorts:
        - ports:
            - port: "80"
              protocol: TCP
          rules:
            http:
              - method: "POST"
                path: "/v1/request-landing"
Now we will get nearly the same results as we got in the demo before.
kubectl get pod -A
...
test-1 tiefighter-67869b6bcc-chspv 1/1 Running 0 12m
test-1 xwing-df84dd9c5-pkfbr 1/1 Running 0 12m
test-2 tiefighter-67869b6bcc-tmnnj 1/1 Running 0 12m
test-2 xwing-df84dd9c5-kpc76 1/1 Running 0 12m
test-deathstar deathstar-66949878c9-grztm 1/1 Running 0 12m
test-deathstar deathstar-66949878c9-n2mvf 1/1 Running 0 12m
kubectl exec -ti -n test-2 tiefighter-67869b6bcc-tmnnj -- curl -s -XPOST deathstar.test-deathstar.svc.cluster.local/v1/request-landing
Ship landed
kubectl exec -ti -n test-2 tiefighter-67869b6bcc-tmnnj -- curl -s -XPOST deathstar.test-deathstar.svc.cluster.local/v1/request
Access denied
kubectl exec -ti -n test-1 tiefighter-67869b6bcc-chspv -- curl -s -XPOST deathstar.test-deathstar.svc.cluster.local/v1/request-landing
Ship landed
kubectl exec -ti -n test-1 tiefighter-67869b6bcc-chspv -- curl -s -XPOST deathstar.test-deathstar.svc.cluster.local/v1/request
Access denied
kubectl exec -ti -n test-1 xwing-df84dd9c5-pkfbr -- curl -s -XPOST deathstar.test-deathstar.svc.cluster.local/v1/request-landing
^C
kubectl exec -ti -n test-2 xwing-df84dd9c5-kpc76 -- curl -s -XPOST deathstar.test-deathstar.svc.cluster.local/v1/request-landing
^C
Hubble
The Hubble UI and metrics can be enabled via the panel. Neither the UI nor the metrics endpoint is exposed outside the private network attached to the GSK cluster nodes.
To use the Hubble UI, you can simply forward the port to your local machine:
kubectl port-forward -n kube-system svc/hubble-ui 8080:80
To get an impression of the metrics, you can port-forward the metrics endpoint and curl /metrics:
kubectl port-forward -n kube-system svc/hubble-metrics 9965:9965
For now, cilium-operator metrics are not available. OpenMetrics support is enabled. Read more in the referenced links.
A good starting point in the Cilium documentation regarding metrics in general: https://docs.cilium.io/en/stable/observability/metrics/
A good starting point in the Cilium documentation regarding layer 7 visibility: https://docs.cilium.io/en/latest/observability/visibility/
To get an impression, you can apply the demo resources for Prometheus and Grafana with preconfigured dashboards.
Cilium operator
The Cilium operator runs on the hidden master node to ensure safe and stable operations. We reapply the configmap on each GSK upgrade and currently only keep settings regarding cluster traffic encryption and network policy mode.
Persistent Volumes
We differentiate between Persistent Volumes that are based on block devices and those that are based on network filesystems.
Block Device Persistent Volumes
Block device based Persistent Volumes use distributed storages that are directly attached to your GSK nodes.
Since they are block devices with plain, non-clustered filesystems (ext4
by default), they can only ever be attached to a single node at a time and can thus only be used by pods that run on that node (ReadWriteOnce (RWO) access mode).
Their strength is performance.
Storage Classes
Block device based Persistent Volumes give you the raw performance of the Distributed Storage. You can find a storage class for each of its performance classes.
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
block-storage (default) bs.csi.gridscale.io Delete Immediate true 68d
block-storage-high bs.csi.gridscale.io Delete Immediate true 68d
block-storage-insane bs.csi.gridscale.io Delete Immediate true 68d
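For example, a PersistentVolumeClaim that requests a volume from one of these performance classes could look like this (the claim name and size are illustrative):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-fast-data              # illustrative name
spec:
  accessModes:
    - ReadWriteOnce               # block devices attach to a single node
  storageClassName: block-storage-high
  resources:
    requests:
      storage: 20Gi               # illustrative size
```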
Reclaim Policy
Reclaim policy Delete
makes sure that deleting Persistent Volumes (PV) will also delete the corresponding Distributed Storage.
Deleting and changing preconfigured storage classes to modify this behaviour is not recommended. Your changes will be reverted with every upgrade.
Instead, create your own storage classes that use the same provisioner
.
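A sketch of such a custom storage class with a different reclaim policy, using the same provisioner (the class name is illustrative):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: block-storage-retain      # illustrative name
provisioner: bs.csi.gridscale.io  # same provisioner as the preconfigured classes
reclaimPolicy: Retain             # keep the Distributed Storage when the PV is deleted
volumeBindingMode: Immediate
allowVolumeExpansion: true
```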
Troubleshooting
In rare cases a persistent volume can no longer be mounted.
That happens when the volume was unmounted on the node and the VolumeAttachment
was not removed correctly.
Here is how to fix it:
- Find the affected PVC
kubectl get volumeattachments | grep <affected-pvc>
- Delete the VolumeAttachment for the affected PVC
kubectl delete volumeattachments <affected-volumeattachment>
Now the attachment process will recreate the volumeattachment
and the mount process should work.
Limitations
Block device based Persistent Volumes are subject to Distributed Storage and Server limitations. Currently, up to 15 storages, and thus Persistent Volumes, can be attached to a single GSK node at a time. The attach process takes a few seconds per storage/PV.
Network Filesystem Persistent Volumes via GridFs
Requires 1.19.16-gs0, 1.20.15-gs0, 1.21.11-gs0 or higher.
Network filesystem based Persistent Volumes use GridFs to store data. GridFs is an NFS-compatible network filesystem: it grows with your data, you only pay for the volume you actually use, and your data can be accessed read-write by any number of GSK nodes at a time (ReadWriteMany (RWX) and ReadOnlyMany (ROX) access modes).
Its strengths are scalability and being read-write accessible from all your GSK nodes.
Set up GridFs based Persistent Volumes
GridFs is an NFS compatible network filesystem. As such, access is achieved through the NFS CSI driver for Kubernetes.
- Create a new GridFs instance or use an existing one.
- Follow the first three steps of Connect a Kubernetes Cluster to a PaaS service to make sure your GridFs is connected to your GSK cluster.
- Install the NFS CSI driver for Kubernetes as described here.
helm repo add csi-driver-nfs https://raw.githubusercontent.com/kubernetes-csi/csi-driver-nfs/master/charts
helm install csi-driver-nfs csi-driver-nfs/csi-driver-nfs --namespace kube-system --version v4.0.0
- Create a storage class that uses the NFS CSI driver as the provisioner and your GridFS as the NFS server.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gridfs-<PAAS_SERVICE_UUID OF YOUR GRIDFS>
provisioner: nfs.csi.k8s.io
parameters:
  server: <IP ADDRESS OF YOUR GRIDFS>
  share: /
reclaimPolicy: Delete
volumeBindingMode: Immediate
mountOptions:
  - nfsvers=4.1
- Use that storage class for your PVCs.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-first-gridfs-pvc
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10Gi
  storageClassName: gridfs-<PAAS_SERVICE_UUID OF YOUR GRIDFS>
- The NFS CSI driver creates a directory for this PVC under the
share
-path configured in the storage class and makes it available as a new PersistentVolume.
Limitations
Network Filesystem based Persistent Volumes via GridFs can hold any number of PVCs in a single GridFs instance.
Host Path Persistent Volumes
Aside from block device based and network filesystem based Persistent Volumes, hostPath
Persistent Volumes can be used for node-local storage.
Please note:
- Due to the transient nature of the Kubernetes nodes, hostPath Persistent Volumes will be lost whenever the node is recycled (e.g. during updates, upgrades or node recovery).
- Use of hostPath Persistent Volumes can fill up node-local storage and affect the health of the node.
Persistent Volumes are not automatically deleted
The PersistentVolume is created automatically when a PersistentVolumeClaim is requested. But it’s not automatically deleted after you delete the GSK cluster. This behaviour prevents data loss of your persistent volumes.
There are two ways to delete the persistent volumes:
- After deleting the cluster, it’s also possible to delete the persistent volumes from the Cloud Panel.
- Before deleting the cluster, you should delete the related deployments that use the PersistentVolume and the PersistentVolumeClaim from the cluster.
Local Rocket Storage
GSK releases 1.26 and upward support local rocket storage, which provides nodes with additional high-performance I/O storage, as described in the storage performance classes.
Management of local rocket storage in GSK
- Adding or removing the rocket storage:
- impacts the workloads (e.g., rescheduling pods) as worker nodes will be replaced
- Increasing the size of the rocket storage:
- expands the rocket storage on-the-fly
- preserves data
- has no impact (e.g., rescheduling pods) on workloads that are not using the rocket storage
- Decreasing the size of the rocket storage:
- in order to decrease the size of the rocket storage, you MUST delete all workloads and PVCs that use the rocket storage
- recreates the rocket storage
- causes data loss
- has no impact (e.g., rescheduling pods) on workloads that are not using the rocket storage
- Updating the worker nodes’ resources (e.g., RAM, cores, and the boot storage) preserves the data of the rocket storage.
How to use the local rocket storage to provision local PVCs
- Deploy local-path-provisioner
kubectl apply -f https://raw.githubusercontent.com/rancher/local-path-provisioner/v0.0.24/deploy/local-path-storage.yaml
- Patch the local-path-config configmap so that volumes are created under the
/gs-pvc-data
path on the worker node, where the local rocket storage is mounted
cat <<EOF > configmap_batch.yaml
data:
  config.json: |-
    {
      "nodePathMap":[
        {
          "node":"DEFAULT_PATH_FOR_NON_LISTED_NODES",
          "paths":["/gs-pvc-data"]
        }
      ]
    }
EOF
kubectl -n local-path-storage patch cm local-path-config --patch-file configmap_batch.yaml
- Print the supported StorageClasses; you should see local-path included in the list
kubectl get StorageClasses
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
block-storage (default) bs.csi.gridscale.io Delete WaitForFirstConsumer true 83m
block-storage-high bs.csi.gridscale.io Delete WaitForFirstConsumer true 83m
block-storage-insane bs.csi.gridscale.io Delete WaitForFirstConsumer true 83m
local-path rancher.io/local-path Delete WaitForFirstConsumer false 33m
- When a locally provisioned PVC reaches the maximum size of the local rocket storage, you might get output: I/O error in your pods:
kubectl logs -f cronjob-logs-28102376-f8qzk
head: standard output: I/O error
- You can expand the local rocket storage by increasing the value of k8s_worker_node_rocket_storage. The pods then recover.
Limitations
- When decreasing the size of the rocket storage, all worker node rocket storages will be recreated. As such, you need to delete all workloads and PVCs that use the rocket storage first.
- As the rocket storage is local, it is bound to both the worker node and the underlying server’s compute node. As such:
  - your workloads using a local-path-provisioner PVC cannot be rescheduled to other worker nodes.
  - your worker node cannot be recovered while the compute node is not available.
- The maximum supported size of the rocket storage is 5960 GB.
- Take care to only use the local-path-provisioner deployment (as described above) if local rocket storage is enabled. Otherwise, the local-path-provisioner will store its PVCs on the boot disk of the worker nodes.
- The local-path-provisioner does not enforce volume capacity limits; the capacity limit will be ignored.
- Although ALLOWVOLUMEEXPANSION is disabled, local volume PVCs can grow to the rocket storage size.
Ingress Controller
Your cluster does not come with an ingress controller preinstalled. You can install the ingress controller of your choice as described in ingress-controllers.
Access and Security
All users with write access (or higher) to the project will be able to download the Kubernetes certificate.
{%gs-only%}
PKI Certificate Access
Authentication against the Kubernetes master is based on X.509 client certificates, which expire three days after they are generated. This works with gscloud, which will automatically renew the certificate for you.
After installation of gscloud, set it up with your API token as described here. Then use gscloud to fetch and maintain your kubeconfig as described here. {%/gs-only%}
Encryption
Data is encrypted at rest, and network traffic is TLS encrypted on the application layer.
Network Traffic Encryption
GSK releases 1.29 and upward support cluster network traffic encryption with WireGuard, since Cilium is used as the CNI plugin. To enable full encryption, you need to set two parameters in the cilium-config configmap to true. Setting only enable-wireguard
to "true"
would encrypt only pod-to-pod traffic.
kubectl patch configmap/cilium-config -n kube-system --type merge -p '{"data":{"enable-wireguard":"true", "encrypt-node": "true"}}' # 'true' for enabling network traffic encryption and 'false' for disabling
kubectl rollout restart ds -n kube-system cilium
Role-based Access Control (RBAC)
GSK supports standard Kubernetes RBAC.
Firewall
GSK control plane and worker nodes utilize the firewall in the OS to secure cluster internals from the public network.
This does not restrict you from exposing your workloads to the public network.
Backups
Data that belongs to the controlplane of the cluster (such as etcd) is backed up by gridscale.
Data that comes from within the application needs to be backed up by the user. gridscale Storage snapshots and backups are not supported by GSK at this point. They cannot be used for backing up persisted data.
Please employ a solution that runs in the cluster.
Node Pools
With GSK 1.30.3-gs0 you can now create multiple node pools per cluster. For more information, see here.
Kubernetes Dashboard
The official Kubernetes dashboard is not deployed by default and can be installed with a single command that is mentioned in the Official Kubernetes Documentation.
Known issues
Storage instances are not deleted from the Cloud Panel
To prevent this issue, please do NOT delete the PVs (Persistent Volumes) before the storage instances are deleted completely from the panel. If you already have some storage instances dangling in the panel, please contact us to remove them.
Cannot delete k8s cluster when there are other PaaS/servers connected to the cluster’s private network
The issue can be solved by either attaching the PaaS/servers to other networks or removing the PaaS/servers.
Node labels do not persist
Nodes in a Kubernetes cluster are volatile and can be replaced at any time, i.e. during updates, upgrades or node recovery. When they are, replacement nodes do not inherit their labels.
If you control scheduling of your pods with nodeSelector and node labels, please consider migrating to Affinity and anti-affinity.
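As a sketch of such a migration (the names are illustrative), a preferred pod anti-affinity rule spreads replicas across nodes using the built-in kubernetes.io/hostname label, so it keeps working even when nodes are replaced and custom labels are lost:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                    # illustrative name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      affinity:
        podAntiAffinity:
          # prefer spreading replicas across nodes; no custom node labels needed
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    app: my-app
                topologyKey: kubernetes.io/hostname   # built-in node label
      containers:
        - name: app
          image: nginx            # illustrative image
```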
FAQ
Does gridscale monitor the cluster?
We monitor the overall health of your cluster. We ensure that the cluster is healthy and functional, and we will be paged about abnormal conditions of the cluster.
gridscale does not monitor the application(s) that are deployed within the cluster. Since we don’t know anything about your workloads, we don’t include performance and resource monitoring from our side as part of the standardised gridscale Managed Kubernetes (GSK).
Do cluster components communicate on the Public or the Private Network?
Cluster communication is strictly private. This includes communication between Kubernetes components, but also communication between pods and/or services.
However, as a user you can contact external services.
It would thereby technically be possible, though not usual, to communicate with other services on the cluster through the Public Network and Load Balancers, if that service is exposed to the outside and communication is explicitly directed there through public connection details.
A specific tool that I want to use with my cluster is not working. What shall I do?
Please check whether your tool supports the Kubernetes version of your cluster. If your cluster version is not supported, please have a look in the Cloud Panel, where you can update your cluster to a new patch version (e.g. 1.24.8 to 1.24.9) or replace your cluster with a more up-to-date one.
I cannot see PVC usage in Grafana. What shall I do?
Please ensure that the volume is mounted for long enough, and the query interval in Grafana low enough to catch all metrics.
Terms and Abbreviations
- GSK: gridscale Kubernetes
- K8s: Short for Kubernetes (the 8 stands for the eight letters between “K” and “s”).
- kubectl: A command line tool which functions as a management interface for a K8s cluster.
- Node: A K8s cluster is made of a few virtual machines that talk to each other. In this context, a virtual machine is a node. A master (we have one master at the moment) and one or more workers.
- Control Plane: A fancy way of saying “masters of the cluster”. Technically, all programs that run on the master that make the cluster a cluster. For instance, a specialized database or a program that decides which worker should run which software.
- Deployment: In most cases an app running on K8s. Technically a collection of containers based on a set of templates (images).
- PV: Persistent Volume. A persistent storage for Kubernetes deployments.
- PVC: Persistent Volume Claim. When a client (user, customer, an application) needs a PV, they send a PVC to the K8s cluster.
- Service: A way of accessing your deployment outside of the cluster, tightly related to Load Balancers and Ingresses.
- Ingress: A special way of exposing a deployment outside of the cluster. Think of it as a kind of Load Balancer.
- IngressController: This component runs inside the cluster and is responsible for handling requests for an Ingress.
- RBAC: Role Based Access-Control. Allows you to selectively give different people different access rights to the cluster.
- Dashboard: A graphical frontend for the cluster API. The user can see their deployments, nodes and a few metrics without using the command line. This is not enabled by default, but can be easily installed into the GSK.