Running Kubernetes jobs on specific nodes

Sep 25, 2022 · 93 words · 1 minute read · devops · helm · kubernetes

Batch jobs are commonly used for data-intensive tasks – refreshing analytics and retraining models, to name a few. Data engineers and scientists often reach for special-purpose tools such as Apache Spark, Apache Airflow, AWS Glue, or Google Dataflow. These tools provide a high-level language for expressing data processing workflows and can automatically apply strategies such as data shuffling, map-reduce, and parallelization. The infrastructure is usually abstracted away from the end user, either by an internal DevOps or Platform Engineering team or through cloud-provider managed services.

Running your Kubernetes jobs on specific nodes
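As a minimal sketch of the technique this post's title describes, the Job manifest below pins its pods to nodes carrying a particular label via `nodeSelector`. The job name, label key/value, and container image are illustrative assumptions, not values taken from this post.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: analytics-refresh        # hypothetical job name
spec:
  template:
    spec:
      # Schedule the pod only on nodes carrying this label.
      # The label is an assumption; apply your own with e.g.
      #   kubectl label nodes <node-name> workload=batch
      nodeSelector:
        workload: batch
      containers:
        - name: refresh
          image: python:3.10-slim   # placeholder image
          command: ["python", "-c", "print('refreshing analytics')"]
      restartPolicy: Never
  backoffLimit: 2
```

Note that `nodeSelector` only attracts the job's pods to matching nodes; it does not keep other workloads off them. To dedicate nodes to batch work, you would additionally taint those nodes and add a matching toleration to the Job's pod spec.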