Moving From EC2 Runners to GitHub Actions Runner Controller

2025-12-01, a 12-minute read, for Software Engineers

I carried out this brief analysis of using Actions Runner Controller compared to our self-hosted EC2 runners in July 2024; this year my current company has adopted it. This blog post gives an overview of using Kubernetes instead of dedicated EC2 runners.

This post does not go through the setup process; the documentation is easy to follow.


About

Actions Runner Controller (ARC) is a Kubernetes operator that orchestrates and scales self-hosted runners for GitHub Actions.

With ARC, you can create runner scale sets that automatically scale based on the number of workflows running in your repository, organization, or enterprise. Because controlled runners can be ephemeral and based on containers, new runner instances can scale up or down rapidly and cleanly.
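
As an illustration, here is a minimal sketch of what a runner scale set's Helm values might look like, assuming the official gha-runner-scale-set chart; the URL, secret name, label, and runner counts are placeholders:

# values.yaml for the gha-runner-scale-set chart (illustrative placeholders)
githubConfigUrl: "https://github.com/my-org"   # repo, org, or enterprise URL
githubConfigSecret: gh-app-credentials         # secret holding GitHub App or PAT credentials
minRunners: 0                                  # scale down to zero when no jobs are queued
maxRunners: 10                                 # upper bound on concurrent ephemeral runners
runnerScaleSetName: build-runners              # the label workflows target via runs-on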

Why

Our AWS EC2 GitHub runners were not efficiently utilised and had no automatic scaling.

Most jobs finish in under 10 minutes; however, the runners keep running regardless.

Workflow duration distribution: 75% (53.99% + 21.09%) of runs fall under 10 minutes

Most of the workflows run from 9am to 7pm (Australian main time zones).

Build Runner CPU usage on July 5, 2024 from 5am to 11pm

ARC Key Concepts

Controller: A single controller for the whole deployment; it watches for changes, auto-scales, and manages the lifecycle of runners.

Listener: One for each runnerSet, the listener pod connects to the GitHub Actions Service to authenticate and establish an HTTPS long poll connection that handles communication with GitHub. The listener stays idle until it receives a Job Available message from the GitHub Actions Service.

Runner: Represents a single Action runner.

RunnerSet: A runner scale set is a group of homogeneous runners (sharing the same configuration) that can be assigned jobs from GitHub Actions.

Comparison with Current EC2 Runners

Advantages

  • Scaling can be done automatically, removing the need for manual intervention to increase the number of runners.
    • Faster startup compared to EC2
  • Cost optimisation: no need to pay for unused nodes (i.e., idle nodes).
  • Isolation: Each workflow can run in its own isolated Pod environment, improving security and reducing the risk of jobs interfering with each other (i.e., not cleaning up the environment after the workflow finishes). (It is possible to disable ephemeral behaviour.)
    • Runners are ephemeral: no danger of runner contamination and no need to run cleanup steps.
  • Resource efficiency: Runners can be packed more efficiently across the cluster's hardware resources, maximising the utilization of the underlying infrastructure.

Disadvantages

  • Management overhead: Kubernetes requires ongoing maintenance and regular upgrades; however, in our case Kubernetes is already used widely in the organisation.
    • EC2 runners are much simpler
  • Depending on the resources assigned to the runner pods (resources.requests and resources.limits), workflow runs might be less predictable, potentially taking more or less time depending on the capacity of the cluster (see the sketch after this list).
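
As a sketch of where those requests and limits live, assuming the official gha-runner-scale-set chart's pod template (the numbers below are placeholders, not recommendations):

# Fragment of gha-runner-scale-set values: per-runner CPU/memory requests and limits
template:
  spec:
    containers:
      - name: runner
        image: ghcr.io/actions/actions-runner:latest
        command: ["/home/runner/run.sh"]
        resources:
          requests:
            cpu: "2"        # what the scheduler reserves for each runner pod
            memory: 4Gi
          limits:
            cpu: "4"        # hard ceiling; jobs slow down if throttled here
            memory: 8Gi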

Overview

Action Runner Controller overview

More about ARC can be found on docs.github.com

Custom Resource

ARC consists of several custom resource definitions (CRDs). Once deployed, you can list these custom resources with:

~ kubectl api-resources --api-group=actions.github.com
NAME                    SHORTNAMES   APIVERSION                    NAMESPACED   KIND
autoscalinglisteners                 actions.github.com/v1alpha1   true         AutoscalingListener
autoscalingrunnersets                actions.github.com/v1alpha1   true         AutoscalingRunnerSet
ephemeralrunners                     actions.github.com/v1alpha1   true         EphemeralRunner
ephemeralrunnersets                  actions.github.com/v1alpha1   true         EphemeralRunnerSet

How It Connects to GitHub

The runner maintains an outbound HTTPS connection (long polling) to GitHub. This means that the runner is constantly checking (polling) GitHub for new jobs.

Once the runner starts executing a job, it continuously sends updates back to GitHub over the same long-polling connection. These updates include the status of the job (e.g., in progress, completed, failed), any output from the job (which is displayed in the GitHub Actions logs), and any artifacts produced by the job.

ARC Modes

ARC officially supports three modes. You are free to customise the runner image however you wish (such as having rootless DinD) as long as it meets the minimum requirements.

Standard Mode

Nothing special; a single container in a single pod. It does not come with Docker.

DinD Mode

Docker in Docker (DinD): run a DinD container alongside the runner container in the same pod.

The runner is responsible for executing all commands passed through the actions, while the Docker container runs the Docker daemon. The runner container mounts the Docker daemon socket, so even though the docker run command is executed from the runner container, that request is ultimately passed through to the Docker daemon running in the Docker container.

Resource requests and limits need to be carefully adjusted for both containers. You'll need to consider where the workloads are ultimately going to be run. If the majority of the workload is within the container, then you'd want to allocate the majority of resources to the DinD container.
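
In the official gha-runner-scale-set chart this mode is a one-line toggle; a minimal sketch (to size the runner and DinD containers separately, you would spell out the full pod template instead):

# Fragment of gha-runner-scale-set values: run a Docker daemon sidecar next to the runner
containerMode:
  type: "dind"   # the chart injects a privileged dind container and points the runner's Docker client at its socket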

K8s Mode

If security is your highest priority, then Kubernetes mode allows you to achieve the same as DinD. In K8s mode, a second pod is spun up in the same namespace, sharing the runner's persistent work volume, without requiring any privileges.

To use Kubernetes mode, you must:

  • Create persistent volumes available for the runner pods to claim.
  • Use a solution to automatically provision persistent volumes on demand. For testing, you can use a solution like OpenEBS.
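
A minimal sketch of the corresponding values, again assuming the official gha-runner-scale-set chart; the storage class is a placeholder for whatever provisioner you use:

# Fragment of gha-runner-scale-set values: job containers run as separate pods, no privileges required
containerMode:
  type: "kubernetes"
  kubernetesModeWorkVolumeClaim:
    accessModes: ["ReadWriteOnce"]
    storageClassName: "openebs-hostpath"   # placeholder; any dynamic provisioner works
    resources:
      requests:
        storage: 1Gi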

Unfortunately, due to resource limitations, I was unable to test K8s Mode.

Test Workflows

Here are some test cases I created; they should be representative of some/most of the workflows we run on a daily basis. Stress testing ARC was out of scope:

  1. A simple bash command
  2. Run a container stub beside the main workflow using the services keyword (requires DinD)
  3. Pulling another image from ECR to run the workflow on, using the container keyword
  4. Multiple different workflows running concurrently
  5. Custom backend image workflow
  6. A Matrix workflow to demonstrate scaling up and down

ARC handled all of these with no issues.
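
As an example of the kind of workflow behind these tests, here is a minimal sketch of test case 6, a matrix fan-out used to watch runners scale up and then back down; the runner label arc-runner-set is a placeholder:

# Illustrative matrix workflow: six parallel jobs -> six ephemeral runner pods
name: matrix-scale-test
on: workflow_dispatch

jobs:
  fan-out:
    runs-on: arc-runner-set            # placeholder runner scale set label
    strategy:
      matrix:
        shard: [1, 2, 3, 4, 5, 6]
    steps:
      - name: Simulate work
        run: |
          echo "Running shard ${{ matrix.shard }}"
          sleep 120                    # keep the runner busy long enough to observe scaling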

How to Estimate the Cost Saving

In this section, I estimate the cost difference between the current EC2 runners and ARC.

The analysis here makes the following assumptions:

  • Paying on-demand pricing for EC2; any AWS discounts are not taken into account
    • Normal pricing for EKS, i.e. assuming no need for EKS extended support pricing
  • Data egress cost is the same across both types of runners
  • Both use the same EC2 instance type shown in the table below, but different OS

Instance Name   vCPUs   Memory
c5.2xlarge      8       16 GiB

EC2 Runners

Cost of the current EC2 runners is calculated as such:

Cost Calculation

Cost of the EC2 instance type (c5.2xlarge) per hour, multiplied by the combined total number of runner-hours.

Our EC2 runners are scaled up and down on schedule regardless of daily demand. That is, the same number of runners are active every week.

The table below shows the cost breakdown for our build runners of type c5.2xlarge using RHEL in the Asia-Pacific (Sydney) region:

Cost per Hour (RHEL)   Total Hours per Day                             Daily Cost   Annual Cost (260 days)
$0.574                 10 nodes × 19 hrs + 5 nodes × 4 hrs = 210 hrs   $120.54      $31,340.40

ARC Runners

There are two methods that can be used to estimate this:

  1. Using current Runner CloudWatch data
  2. Using GitHub workflow statistics

The tables below show the estimated cost breakdown for our equivalent build runners of type c5.2xlarge using Amazon EKS-optimised Amazon Linux in the Asia-Pacific (Sydney) region:

Using CloudWatch

Cost Calculation

With EKS, you only pay for what you use; there are no minimum fees and no upfront commitments.

Because we only pay for what we use, the cost in EKS is dynamic, based on how many nodes are running at any moment in time.

Here is the cost breakdown:

  • $0.10 USD per hour per cluster, plus the cost of the Amazon managed nodes
    • The $0.10 is not included in the estimate, as the cluster is already provisioned to run other services
  • The number of nodes depends on the number of workflows running and how much resources (CPU/memory) each workflow/pod is requesting
    • The number of workflows running can be deduced from GitHub workflow statistics
  • Unlike dedicated runners on EC2 instances, there is no OS licensing cost, i.e. no need to pay for a RHEL licence

The graph below shows the CPU usage of 10 Build Runners on Thursday, July 4, 2024:

Build Runner CPU usage on July 5, 2024 from 5am to 11pm

The graph below shows the same CPU usage but with stacked average area of 10 Build Runners on Thursday, July 4, 2024. Note the maximum possible value for 10 instances is 1000%:

Stacked Area graph of CPU usage of Build Runners with Statistic: Average and Period:1 minute

The graph below shows the same CPU usage but with stacked maximum area of 10 Build Runners on Thursday, July 4, 2024:

Stacked Area graph of CPU usage of Build Runners with Statistic: Maximum and Period:1 minute

Based on the graphs above and current EKS requirements, my cost estimate is based on these assumptions:

  • Multiple DaemonSets (shown below) are deployed on the EKS cluster. A DaemonSet ensures that all (or some) Nodes run a copy of a Pod. As nodes are added to the cluster, Pods are added to them. As nodes are removed from the cluster, those Pods are garbage collected. This means that ARC nodes will have slightly reduced capacity for runners.
~ kubectl get daemonset --all-namespaces
NAMESPACE     NAME                                      DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE
kube-system   aws-node                                  7         7         7       7            7           <none>                   2y141d
kube-system   kube-proxy                                7         7         7       7            7           <none>                   2y85d
splunk        splunk-otel-splunk-otel-collector-agent   7         7         7       7            7           kubernetes.io/os=linux   294d
twistlock     twistlock-defender-ds                     7         7         7       7            7           <none>                   2y129d

In our case, these DaemonSets take up at most 1 core on each node.

  • Based on the stacked CPU usage from the stacked average area graph, I estimate that with ARC, our workload will use the equivalent of 5 runners for 12 hours every day, which is equal to 60 hours total.
  • We add 40% more resources to account for the slightly reduced capacity and runner idling.

Note:

  • I use a single day to represent every workday in the whole year (I actually picked the highest day out of 3 samples). In practice, this is not the case, but it should be close enough. Unfortunately on AWS, I couldn't export data for more than one day because our instances get recycled daily.
  • I'm being generous here. If we refer back to the stacked graph, we can see that the average usage doesn't exceed 300% and most of the time current build runners are sitting idle. Furthermore, the maximum usage doesn't exceed 400%.

The table below shows the estimated cost using the CloudWatch method:

Cost per Hour   Total Hours per Day   Daily Cost   Annual Cost (260 days)
$0.444          60 × 1.4 = 84         $37.296      $9,696.96

Using GitHub Workflow Statistics

Cost Calculation

I used GitHub workflow statistics for our GitHub org for the month of June 2024; this data includes build, non-prod, and deploy runners. I extracted the data and plotted it as a frequency distribution graph to highlight our current workflow usage.

We used a total of 124,292 minutes; however, this includes 5 non-prod instances and 3 deploy instances (in practice, the number of instances is usually higher). In my calculation, I only include the 10 instances of build runners, which I estimate at 60% of the 124,292 minutes, i.e. 74,575 minutes.

74,575 minutes = 1,242.92 hours = 51.79 days of workflow time over the month, which is equivalent to running ~2 instances for 24 hours a day, or ~4 for 12 hours a day. Adding 40% to account for reduced capacity and idling gives 6 instances for 12 hours, which aligns with my generous calculation in the CloudWatch section.

The table below shows the estimated cost using the GitHub statistics method:

Cost per Hour   Total Hours per Day   Daily Cost   Annual Cost (260 days)
$0.444          6 × 12 = 72           $31.968      $8,311.68

Performance Analysis

The performance per core was roughly the same; however, keep in mind that unlike dedicated EC2 runners, ARC runner nodes have slightly reduced capacity due to the DaemonSets running on each instance, and pod overhead is larger than the process overhead on a dedicated runner.

The workflow below was used as a benchmark for both runners. It builds a buildpack image, then runs the build image to download and compile CMake. The compilation uses all available cores.
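
The exact workflow is not reproduced here, but a rough sketch along those lines might look like the following; the builder image, CMake version, and runner label are placeholders:

# Illustrative benchmark sketch: build an image, then compile CMake inside it using all cores
name: cmake-benchmark
on: workflow_dispatch

jobs:
  benchmark:
    runs-on: arc-runner-set                    # placeholder label; swap to the EC2 runner label to compare
    steps:
      - uses: actions/checkout@v4
      - name: Build the builder image
        run: docker build -t bench-builder .   # Dockerfile producing the buildpack-style build image
      - name: Download and compile CMake with all available cores
        run: |
          docker run --rm bench-builder bash -c '
            curl -sL https://github.com/Kitware/CMake/releases/download/v3.30.0/cmake-3.30.0.tar.gz | tar xz &&
            cd cmake-3.30.0 &&
            ./bootstrap --parallel=$(nproc) &&
            make -j"$(nproc)"
          '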

The results of running the same benchmark on both runners are shown in the table below.

vCPUs (C5 family)   GitHub Runner   ARC
2                   33m 52s         34m 17s
4                   23m 31s         23m 24s
8                   12m 44s         12m 51s

Takeaway on Resource Utilisation

Is your workflow's CPU utilisation like this?

Workflow utilisation

Most workflow jobs make little use of multi-threading, so having exclusive ownership of a dedicated EC2 runner results in poor utilisation of resources whenever your workflow is not using the cores. This can be clearly observed from the CPU utilisation graph. So a good question to ask is:

Do you really need an 8-core system to run terraform apply, when terraform apply is API-limited in the first place? (The --parallelism flag refers to concurrent operations, not actual parallelism, i.e. multi-threading or multiprocessing.)

I couldn't confirm whether a Terraform run (apply/plan/destroy) uses more than one thread; however, I found this about the required CPU resources for Terraform Enterprise in its capacity and performance documentation:

Our rule of thumb is 10 Terraform runs per CPU core, with 2 CPU cores allocated for the base Terraform Enterprise services. So a 4-core instance with 16 GB of memory could comfortably run 20 Terraform runs, if the runs are allocated the default 512 MB each.

In other words, a single run requires roughly 0.2 of a CPU core and 512 MB of memory.

Runners can be broken into runner groups, such that each workflow only uses what it requires, decreasing node provisioning. For example, we could configure these runner groups (a workflow sketch follows the list):

  • runner-1c-2m
  • runner-2c-4m
  • runner-4c-8m
  • runner-8c-16m
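
Each scale set then gets its own runs-on label, so a job only requests the capacity it needs. A hypothetical workflow fragment using the labels above:

# Hypothetical jobs picking right-sized runner groups
jobs:
  terraform-plan:
    runs-on: runner-1c-2m    # API-bound job; one small core is plenty
    steps:
      - run: terraform plan
  compile:
    runs-on: runner-8c-16m   # CPU-bound build gets the large runner
    steps:
      - run: make -j"$(nproc)"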

Since we are paying by the minute, further cost savings could be achieved by optimising CI/CD and using parallel runs instead of sequential ones where possible.