Kubernetes

1
11
submitted 2 years ago* (last edited 2 years ago) by Daemon to c/kubernetes
2
3
 
 

When combined with today’s other vulnerabilities, CVE-2025-1974 means that anything on the Pod network has a good chance of taking over your Kubernetes cluster, with no credentials or administrative access required.
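
For context, CVE-2025-1974 concerns the ingress-nginx admission controller, and the upstream advisory lists v1.11.5 and v1.12.1 as the releases carrying the fix. A minimal sketch for checking a controller version string against those releases is below. It assumes a plain `vMAJOR.MINOR.PATCH` tag, the function name `is_patched` is hypothetical, and it does not query a cluster, so treat it as a starting point rather than a scanner.

```python
import re

# Fixed releases per the upstream ingress-nginx advisory for CVE-2025-1974.
FIXED_1_11 = (1, 11, 5)
FIXED_1_12 = (1, 12, 1)


def is_patched(version: str) -> bool:
    """Return True if an ingress-nginx version tag includes the fix.

    Hypothetical helper: accepts 'v1.12.1' or '1.12.1' style tags only.
    """
    m = re.fullmatch(r"v?(\d+)\.(\d+)\.(\d+)", version.strip())
    if m is None:
        raise ValueError(f"unrecognized version: {version!r}")
    v = tuple(int(p) for p in m.groups())
    # The 1.11 line received its own patch release.
    if v[:2] == (1, 11):
        return v >= FIXED_1_11
    # Everything else is compared against the 1.12 fix.
    return v >= FIXED_1_12


if __name__ == "__main__":
    for tag in ("v1.11.4", "v1.11.5", "v1.12.0", "v1.12.1"):
        print(tag, "patched" if is_patched(tag) else "vulnerable")
```

The version you feed in would come from the image tag of the running controller Pod; how you collect it depends on how ingress-nginx was installed.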

4
 
 

Authors: Daniel Vega-Myhre (Google), Abdullah Gharaibeh (Google), Kevin Hannon (Red Hat)

In this article, we introduce JobSet, an open source API for representing distributed jobs. The goal of JobSet is to provide a unified API for distributed ML training and HPC workloads on Kubernetes.

[...]

[T]he Job API fixed many gaps for running batch workloads, including Indexed completion mode, higher scalability, Pod failure policies and Pod backoff policy to mention a few of the most recent enhancements. However, running ML training and HPC workloads using the upstream Job API requires extra orchestration to fill the following gaps:

Multi-template Pods: Most HPC or ML training jobs include more than one type of Pods. The different Pods are part of the same workload, but they need to run a different container, request different resources or have different failure policies. A common example is the driver-worker pattern.

Job groups: Large scale training workloads span multiple network topologies, running across multiple racks for example. Such workloads are network latency sensitive, and aim to localize communication and minimize traffic crossing the higher-latency network links. To facilitate this, the workload needs to be split into groups of Pods each assigned to a network topology.

Inter-Pod communication: Create and manage the resources (e.g. headless Services) necessary to establish communication between the Pods of a job.

Startup sequencing: Some jobs require a specific start sequence of pods; sometimes the driver is expected to start first (like Ray or Spark), in other cases the workers are expected to be ready before starting the driver (like MPI).

JobSet aims to address those gaps using the Job API as a building block to build a richer API for large-scale distributed HPC and ML use cases.
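
To make the driver-worker pattern above a bit more concrete, here is a rough sketch that builds a JobSet manifest as a Python dict and prints it as JSON (which `kubectl apply -f -` accepts). The field names (`replicatedJobs`, `startupPolicy.startupPolicyOrder`, `network.enableDNSHostnames`) follow the `jobset.x-k8s.io/v1alpha2` API as I understand it, so check them against the CRD version installed in your cluster. The image names and the helper functions are made up for illustration.

```python
import json


def replicated_job(name: str, replicas: int, image: str, pods: int) -> dict:
    """Build one spec.replicatedJobs entry: a Job template created `replicas` times."""
    return {
        "name": name,
        "replicas": replicas,
        "template": {  # an ordinary batch/v1 Job spec
            "spec": {
                "parallelism": pods,
                "completions": pods,
                "completionMode": "Indexed",
                "template": {
                    "spec": {
                        "restartPolicy": "Never",
                        "containers": [{"name": name, "image": image}],
                    }
                },
            }
        },
    }


def driver_worker_jobset(name: str) -> dict:
    """Assemble a JobSet with one driver Job and two groups of four workers."""
    return {
        "apiVersion": "jobset.x-k8s.io/v1alpha2",
        "kind": "JobSet",
        "metadata": {"name": name},
        "spec": {
            # Start replicated jobs in list order: driver first, then workers.
            "startupPolicy": {"startupPolicyOrder": "InOrder"},
            # Ask JobSet to manage the headless Service for Pod hostnames.
            "network": {"enableDNSHostnames": True},
            "replicatedJobs": [
                # Placeholder images, not real ones.
                replicated_job("driver", 1, "example.com/driver:latest", 1),
                replicated_job("workers", 2, "example.com/worker:latest", 4),
            ],
        },
    }


if __name__ == "__main__":
    print(json.dumps(driver_worker_jobset("demo"), indent=2))
```

Each `replicatedJobs` entry carries its own Pod template, which covers the multi-template gap, and `replicas` gives you the Job groups that could then be pinned to separate topology domains.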

5
6
 
 

cross-posted from: https://lemmy.ml/post/20234044

Do you know about using Kubernetes Debug containers? They're really useful for troubleshooting well-built, locked-down images that are running in your cluster. I was thinking it would be nice if k9s had this feature, and lo and behold, it has a plugin! I just had to add that snippet to my ${HOME}/.config/k9s/plugins.yaml, run k9s, find the pod, press enter to get into the pod's containers, select a container, and press Shift-D. The debug-container plugin uses the nicolaka/netshoot image, which has a bunch of useful tools on it. Easy debugging in k9s!
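
If you're curious what happens outside of k9s, the plugin boils down to a `kubectl debug` call that attaches an ephemeral container to the selected Pod. Here's a small sketch that assembles that command line. The function name and the example Pod/container names are made up, and the real plugin may pass extra flags or pin an image tag, so this is only an approximation of it.

```python
import shlex
from typing import List, Optional


def debug_command(
    namespace: str,
    pod: str,
    container: str,
    image: str = "nicolaka/netshoot",
    context: Optional[str] = None,
) -> List[str]:
    """Build a `kubectl debug` argv that targets one container in a Pod."""
    cmd = ["kubectl", "debug", "-it"]
    if context:
        # --context is a global kubectl flag selecting the kubeconfig context.
        cmd += ["--context", context]
    cmd += [
        "-n", namespace,
        pod,
        f"--target={container}",  # attach to this container's process namespace
        f"--image={image}",       # image for the ephemeral debug container
        "--", "bash",
    ]
    return cmd


if __name__ == "__main__":
    # Hypothetical Pod and container names.
    print(shlex.join(debug_command("default", "web-0", "app")))
```

You can paste the printed command into a shell, or hand the list to `subprocess.run` if you want to script it.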

8
submitted 11 months ago by Sheldan to c/kubernetes
13
22
submitted 1 year ago by mac to c/kubernetes
14
5
submitted 1 year ago* (last edited 1 year ago) by Sheldan to c/kubernetes
 
 

I was recently recommended this project as a way to get more natively integrated CI/CD (I would probably be more interested in the CI part, as I already have argo-cd running). It seems very interesting, and development seems reasonably active. The only thing I am curious about (and the reason I made this post, besides maybe making more people aware that it exists) is how active the Tekton hub (https://hub.tekton.dev/) is.

So, maybe somebody here has some information on that. I am not using Tekton (yet), but I read somewhere in the documentation that this hub is supposed to be the place to get reusable components. Seeing the actual activity on there turned me off from the project a little bit, because a lot of things are at version 0.1 and were last updated 1 or 2 years ago. Maybe it only looks that way because I am not logged in, but it certainly looks weird.

So, do you have any experience with Tekton? How do you feel about it?

10
submitted 1 year ago by mac to c/kubernetes
21
9
submitted 1 year ago by mac to c/kubernetes
22
23
 
 

One of the biggest problems of #kubernetes is complexity.
@thockin shares his insights in his #KubeCon keynote. I've seen that time and again with my users, as well as in our Logz.io DevOps Pulse yearly survey.
Maintainers aren't the end users of @kubernetes, which doesn't help.

24
 
 

#KubeCon #ObservabilityDay? It’s time to talk about the unspoken challenges of #monitoring #Kubernetes: the bloat of metric data, the high churn rate of pod metrics, configuration complexity, and so much more. https://horovits.medium.com/f30c58722541
#observability #devops #SRE @kubernetes @linuxfoundation

25
6
submitted 1 year ago by stoex to c/kubernetes