Compare commits

89 Commits

Author SHA1 Message Date
Bassem Dghaidi
59cb1d2c8b Prepare 0.10.0 release (#3849) 2024-12-16 11:39:55 +01:00
dependabot[bot]
fd8f76b91c Bump golang.org/x/crypto from 0.22.0 to 0.31.0 (#3844)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Bassem Dghaidi <568794+Link-@users.noreply.github.com>
2024-12-13 15:57:08 +01:00
Bassem Dghaidi
7e04027d19 Make k8s client rate limiter parameters configurable (#3848)
Co-authored-by: Taketoshi Fujiwara <t-b-fujiwara@mercari.com>
2024-12-13 15:37:01 +01:00
Ken Muse
488b0956fd Update docs with details for the dashboard visualizations (#3696)
Co-authored-by: Bassem Dghaidi <568794+Link-@users.noreply.github.com>
2024-12-13 14:50:55 +01:00
dependabot[bot]
3c14ee0652 Bump github.com/bradleyfalzon/ghinstallation/v2 from 2.8.0 to 2.12.0 (#3837)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Bassem Dghaidi <568794+Link-@users.noreply.github.com>
2024-12-11 21:38:46 +01:00
Yusuke Kuoka
32ae917937 Make EphemeralRunnerReconciler create runner pods earlier (#3831)
Co-authored-by: Bassem Dghaidi <568794+Link-@users.noreply.github.com>
2024-12-11 21:28:29 +01:00
Yusuke Kuoka
3998f6dee6 Make EphemeralRunnerController MaxConcurrentReconciles configurable (#3832)
Co-authored-by: Bassem Dghaidi <568794+Link-@users.noreply.github.com>
2024-12-11 21:19:43 +01:00
Bassem Dghaidi
835bc2aed8 Fix ARC e2e tests (#3836) 2024-12-11 14:25:29 +01:00
github-actions[bot]
8b36ea90eb Updates: runner to v2.321.0 container-hooks to v0.6.2 (#3809)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2024-11-14 17:39:49 +01:00
github-actions[bot]
96d1bbcf2f Updates: runner to v2.320.0 (#3763)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2024-10-08 12:51:03 -04:00
Bassem Dghaidi
90b68fec1a Add exponential backoff when generating runner reg tokens (#3724) 2024-09-04 12:23:31 +02:00
github-actions[bot]
1be410ba80 Updates: runner to v2.319.1 (#3708)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Bassem Dghaidi <568794+Link-@users.noreply.github.com>
2024-08-20 12:22:06 +02:00
github-actions[bot]
930c9db6e7 Updates: runner to v2.319.0 (#3702)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2024-08-20 11:30:43 +02:00
github-actions[bot]
a152741a1a Updates: runner to v2.318.0 container-hooks to v0.6.1 (#3684)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2024-07-26 13:52:44 -04:00
Nikola Jokic
80d848339e Prepare 0.9.3 release (#3624) 2024-06-25 12:35:39 +02:00
dependabot[bot]
8535a24135 Bump github.com/hashicorp/go-retryablehttp from 0.7.5 to 0.7.7 (#3623)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-06-25 10:18:13 +02:00
Nikola Jokic
b349ded2be Increase test timeouts to avoid CI test failures (#3554) 2024-06-21 13:45:48 +02:00
Nikola Jokic
6276c84493 AutoscalingListener controller: Inspect listener container state instead of pod phase (#3548) 2024-06-21 13:40:08 +02:00
Nikola Jokic
4a8420ce96 Update forgotten azure/setup-helm action (#3612) 2024-06-21 13:31:36 +02:00
Nikola Jokic
a62ca3d853 Exclude label prefix propagation (#3607) 2024-06-21 12:12:14 +02:00
Nikola Jokic
4eb038eaa1 Bump node actions (#3569) 2024-06-21 12:11:29 +02:00
Nikola Jokic
b2c6992e84 Check status code of fetch access token for github app (#3568) 2024-06-21 12:10:56 +02:00
Nikola Jokic
0a6208e38d Bump Go patch version to 1.22.4 (#3593) 2024-06-17 10:36:23 +02:00
Nikola Jokic
2cc793a835 Remove .Named() from the ephemeral runner controller (#3596) 2024-06-17 10:36:08 +02:00
github-actions[bot]
894732732a Updates: runner to v2.317.0 (#3559)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2024-06-07 11:53:30 +02:00
Serge
e45ac190e2 Customize work directory (#3477) 2024-06-04 15:16:45 +02:00
Katarzyna
d0fb7206a4 Fix problem with ephemeralRunner Succeeded state before build executed (#3528) 2024-06-03 10:49:45 +02:00
Nikola Jokic
9afd93065f Remove finalizers in one pass to speed up cleanups AutoscalingRunnerSet (#3536) 2024-05-27 09:21:31 +02:00
Nikola Jokic
3be7128f9a Prepare 0.9.2 release (#3530) 2024-05-20 10:58:06 +02:00
Nikola Jokic
3bda9bb240 Refresh session if token expires during delete message (#3529) 2024-05-17 15:16:38 +02:00
Nikola Jokic
ab92e4edc3 Re-use the last desired patch on empty batch (#3453) 2024-05-17 15:12:16 +02:00
Nikola Jokic
fa7a4f584e Extract single place to set up indexers (#3454) 2024-05-17 14:42:46 +02:00
Nikola Jokic
9b51f25800 Rename imports in tests to remove double import and to improve readability (#3455) 2024-05-17 14:37:13 +02:00
Nikola Jokic
ea13873f14 Remove service monitor that is not used in controller chart (#3526) 2024-05-17 13:06:57 +02:00
github-actions[bot]
a6d87c46cd Updates: runner to v2.316.1 (#3496)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2024-05-14 11:24:14 +02:00
Nikola Jokic
51c70a64c3 Include controller version in logs (#3473) 2024-05-13 14:16:36 +02:00
dependabot[bot]
a1b8e0cc3d Bump golang.org/x/sync from 0.6.0 to 0.7.0 (#3482)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-04-30 08:53:19 +02:00
dependabot[bot]
2889029bc5 Bump github.com/onsi/gomega from 1.30.0 to 1.33.0 (#3462)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-04-29 12:56:43 +02:00
dependabot[bot]
87f2e00971 Bump go.uber.org/zap from 1.26.0 to 1.27.0 (#3442)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-04-29 12:56:05 +02:00
dependabot[bot]
d9af241a7d Bump golang.org/x/oauth2 from 0.15.0 to 0.19.0 (#3441)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Nikola Jokic <jokicnikola07@gmail.com>
2024-04-29 12:55:24 +02:00
github-actions[bot]
49490c4421 Updates: runner to v2.316.0 (#3463)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2024-04-24 12:21:30 +01:00
Bryan Peterson
109750f816 propogate arbitrary labels from runnersets to all created resources (#3157) 2024-04-23 11:19:32 +02:00
Nikola Jokic
9e191cdd21 Prepare 0.9.1 release (#3448) 2024-04-17 10:51:28 +02:00
Nikola Jokic
f965dfef73 Shutdown metrics server when listener exits (#3445) 2024-04-16 21:29:03 +02:00
Nikola Jokic
4ee49fee14 Propagate max capacity information to the actions back-end (#3431) 2024-04-16 14:00:40 +02:00
Nikola Jokic
8075e5ee74 Refactor actions client error to include request id (#3430)
Co-authored-by: Francesco Renzi <rentziass@gmail.com>
2024-04-16 12:57:44 +02:00
Nikola Jokic
963ae48a3f Include self correction on empty batch and avoid removing pending runners when cluster is busy (#3426) 2024-04-16 12:55:25 +02:00
nasa9084
98854ef9c0 Fix doc comment for listenerTemplate (#3436) 2024-04-15 11:48:30 +02:00
dependabot[bot]
1987d9eb2e Bump github.com/stretchr/testify from 1.8.4 to 1.9.0 (#3418)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Nikola Jokic <jokicnikola07@gmail.com>
2024-04-12 15:01:52 +02:00
Alexandre Chouinard
0006dd5eb1 Add topologySpreadConstraint to gha-runner-scale-set-controller chart (#3405) 2024-04-12 14:22:41 +02:00
Nikola Jokic
86f1714354 Revert "Bump k8s.io/client-go from 0.28.4 to 0.29.3 (#3416)" (#3432) 2024-04-12 13:51:44 +02:00
dependabot[bot]
f68bbad579 Bump k8s.io/client-go from 0.28.4 to 0.29.3 (#3416)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Nikola Jokic <jokicnikola07@gmail.com>
2024-04-12 13:12:02 +02:00
dependabot[bot]
d3a8a34bb2 Bump golang.org/x/net from 0.20.0 to 0.24.0 (#3417)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-04-09 07:42:41 +02:00
dependabot[bot]
d515b4a6e0 Bump github.com/onsi/ginkgo/v2 from 2.13.1 to 2.17.1 (#3379)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-04-08 10:17:02 +02:00
dependabot[bot]
d971fedbe8 Bump github.com/evanphx/json-patch from 5.7.0+incompatible to 5.9.0+incompatible (#3398)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-04-08 10:11:10 +02:00
dependabot[bot]
6c6d061f0a Bump github.com/cloudflare/circl from 1.3.6 to 1.3.7 (#3206)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-04-04 14:57:24 -04:00
github-actions[bot]
5b9b9f7ca2 Updates: runner to v2.315.0 container-hooks to v0.6.0 (#3387)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2024-04-03 16:06:30 -04:00
Nikola Jokic
4357525445 Prepare 0.9.0 release (#3388) 2024-03-27 11:54:17 +01:00
Nikola Jokic
1d1790614b Add retry on 401 and 403 for runner-registration (#3377)
Co-authored-by: Francesco Renzi <rentziass@gmail.com>
2024-03-27 10:55:17 +01:00
dependabot[bot]
442d52cd56 Bump github.com/go-logr/logr from 1.3.0 to 1.4.1 (#3383)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Nikola Jokic <jokicnikola07@gmail.com>
2024-03-26 15:19:39 +01:00
Nikola Jokic
b6a95ae879 Change duplicate message key in logs while updating ephemeral runner status (#3380) 2024-03-26 12:57:46 +01:00
dependabot[bot]
9968141086 Bump golang.org/x/sync from 0.5.0 to 0.6.0 (#3384)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-03-26 09:27:58 +01:00
dependabot[bot]
e59d127d41 Bump golang.org/x/crypto from 0.16.0 to 0.17.0 (#3173)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Nikola Jokic <jokicnikola07@gmail.com>
2024-03-25 16:28:31 +01:00
dependabot[bot]
fb1232c13e Bump google.golang.org/protobuf from 1.31.0 to 1.33.0 (#3349)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Nikola Jokic <jokicnikola07@gmail.com>
2024-03-22 18:04:34 +01:00
Nikola Jokic
7a643a5107 Fix overscaling when the controller is much faster then the listener (#3371)
Co-authored-by: Francesco Renzi <rentziass@gmail.com>
2024-03-20 15:36:12 +01:00
Nikola Jokic
46cfbb6ec7 Fix documented dind expansion (#3368) 2024-03-19 15:24:58 +01:00
Nikola Jokic
c9099a5a56 Add annotation with values hash to re-create listener (#3195) 2024-03-19 14:29:49 +01:00
Hidehito Yabuuchi
48706584fd Propagate runner scale set name annotation to EphemeralRunner (#3098) 2024-03-19 12:50:49 +01:00
Nikola Jokic
2c0e53951b Fix tests and comment string for docker socket mounted path (#3366) 2024-03-19 11:29:07 +01:00
Nikola Jokic
a7af44e042 Deprecation warning of older listener for 0.9.0 release (#3280) 2024-03-18 12:59:41 +01:00
Nikola Jokic
f225fef921 Bump Go version to 1.22.1 (#3290) 2024-03-18 12:46:30 +01:00
Nikola Jokic
814947c60e Update metrics to include repository on job-based label (#3310)
Co-authored-by: Samuel Rats <samuel.rats@teads.com>
2024-03-18 12:45:52 +01:00
Nikola Jokic
039350a0d0 Escape automated updates version to avoid changing stuff that don't exactly match (#3354) 2024-03-18 12:41:12 +01:00
Nikola Jokic
a0fb417f69 Change docker socket path to /var/run/docker.sock (#3337) 2024-03-18 12:40:27 +01:00
Nikola Jokic
f5fd831c2f Add Francesco (@rentziass) to CODEOWNERS (#3362) 2024-03-18 12:08:16 +01:00
github-actions[bot]
753afb75b9 Updates: runner to v2.314.1 (#3308)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Tingluo Huang <tingluohuang@github.com>
2024-02-28 15:43:14 -05:00
Nikola Jokic
309b53143e Prepare 0.8.3 release (#3309) 2024-02-28 10:26:32 +01:00
Nikola Jokic
7da2d7f96a Fix acquire jobs after session refresh ghalistener (#3307) 2024-02-27 17:37:42 +01:00
Ivar Larsson
e06c7edc21 Refer to the correct variable in discovery error message (#3296) 2024-02-26 15:51:07 +01:00
Talia Stocks
9fba37540a Expose volumeMounts and volumes in gha-runner-scale-set-controller (#3260) 2024-02-12 14:47:09 +01:00
github-actions[bot]
a68aa00bd8 Updates: runner to v2.313.0 container-hooks to v0.5.1 (#3270)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2024-02-09 09:44:28 -05:00
dependabot[bot]
9b053102ed Bump github.com/google/uuid from 1.4.0 to 1.6.0 (#3253)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-02-06 15:05:09 +01:00
Nick McClorey
c03fac8fdd Remove Typo in Grafana docs (#3235) 2024-02-02 10:01:22 +01:00
Nikola Jokic
d72774753c Prepare 0.8.2 release (#3249) 2024-01-26 11:03:08 +01:00
Nikola Jokic
f7b6ad901d Add listener graceful termination period and background context after the message is received (#3187) 2024-01-25 15:45:07 +01:00
Nikola Jokic
728f05c844 Delete message session when listener.Listen returns (#3240) 2024-01-25 15:12:19 +01:00
Nikola Jokic
c00465973e Publish metrics in the new ghalistener (#3193) 2024-01-25 14:46:42 +01:00
github-actions[bot]
5f23afaad3 Updates: runner to v2.312.0 (#3229)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2024-01-22 14:17:31 -05:00
Ken Muse
47dfed3ced Add documentation legacy modes warning and links to new docs (#3199) 2024-01-05 19:56:31 +01:00
123 changed files with 65305 additions and 13797 deletions

View File

@@ -47,7 +47,7 @@ runs:
-d '{"ref": "main", "inputs": { "arc_name": "${{inputs.arc-name}}" } }'
- name: Fetch workflow run & job ids
uses: actions/github-script@v6
uses: actions/github-script@v7
id: query_workflow
with:
script: |
@@ -128,7 +128,7 @@ runs:
- name: Wait for workflow to start running
if: inputs.wait-to-running == 'true' && inputs.wait-to-finish == 'false'
uses: actions/github-script@v6
uses: actions/github-script@v7
with:
script: |
function sleep(ms) {
@@ -156,7 +156,7 @@ runs:
- name: Wait for workflow to finish successfully
if: inputs.wait-to-finish == 'true'
uses: actions/github-script@v6
uses: actions/github-script@v7
with:
script: |
// Wait 5 minutes and make sure the workflow run we triggered completed with result 'success'
@@ -188,6 +188,19 @@ runs:
}
core.setFailed(`The triggered workflow run didn't finish properly using ${{inputs.arc-name}}`)
- name: Gather listener logs
shell: bash
if: always()
run: |
LISTENER_POD="$(kubectl get autoscalinglisteners.actions.github.com -n arc-systems -o jsonpath='{.items[*].metadata.name}')"
kubectl logs $LISTENER_POD -n ${{inputs.arc-controller-namespace}}
- name: Gather coredns logs
shell: bash
if: always()
run: |
kubectl logs deployments/coredns -n kube-system
- name: cleanup
if: inputs.wait-to-finish == 'true'
shell: bash
@@ -195,8 +208,8 @@ runs:
helm uninstall ${{ inputs.arc-name }} --namespace ${{inputs.arc-namespace}} --debug
kubectl wait --timeout=30s --for=delete AutoScalingRunnerSet -n ${{inputs.arc-namespace}} -l app.kubernetes.io/instance=${{ inputs.arc-name }}
- name: Gather logs and cleanup
- name: Gather controller logs
shell: bash
if: always()
run: |
kubectl logs deployment/arc-gha-rs-controller -n ${{inputs.arc-controller-namespace}}
kubectl logs deployment/arc-gha-rs-controller -n ${{inputs.arc-controller-namespace}}

View File

@@ -27,7 +27,7 @@ runs:
using: "composite"
steps:
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v2
uses: docker/setup-buildx-action@v3
with:
# Pinning v0.9.1 for Buildx and BuildKit v0.10.6
# BuildKit v0.11 which has a bug causing intermittent
@@ -36,7 +36,7 @@ runs:
driver-opts: image=moby/buildkit:v0.10.6
- name: Build controller image
uses: docker/build-push-action@v3
uses: docker/build-push-action@v5
with:
file: Dockerfile
platforms: linux/amd64
@@ -56,7 +56,7 @@ runs:
- name: Get configure token
id: config-token
uses: peter-murray/workflow-application-token-action@8e1ba3bf1619726336414f1014e37f17fbadf1db
uses: peter-murray/workflow-application-token-action@dc0413987a085fa17d19df9e47d4677cf81ffef3
with:
application_id: ${{ inputs.app-id }}
application_private_key: ${{ inputs.app-pk }}

View File

@@ -24,23 +24,23 @@ runs:
shell: bash
- name: Set up QEMU
uses: docker/setup-qemu-action@v2
uses: docker/setup-qemu-action@v3
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v2
uses: docker/setup-buildx-action@v3
with:
version: latest
- name: Login to DockerHub
if: ${{ github.event_name == 'release' || github.event_name == 'push' && github.ref == 'refs/heads/master' && inputs.password != '' }}
uses: docker/login-action@v2
uses: docker/login-action@v3
with:
username: ${{ inputs.username }}
password: ${{ inputs.password }}
- name: Login to GitHub Container Registry
if: ${{ github.event_name == 'release' || github.event_name == 'push' && github.ref == 'refs/heads/master' && inputs.ghcr_password != '' }}
uses: docker/login-action@v2
uses: docker/login-action@v3
with:
registry: ghcr.io
username: ${{ inputs.ghcr_username }}

View File

@@ -40,12 +40,12 @@ jobs:
publish-chart: ${{ steps.publish-chart-step.outputs.publish }}
steps:
- name: Checkout
uses: actions/checkout@v3
uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Set up Helm
uses: azure/setup-helm@v3.4
uses: azure/setup-helm@fe7b79cd5ee1e45176fcad797de68ecaf3ca4814
with:
version: ${{ env.HELM_VERSION }}
@@ -58,7 +58,7 @@ jobs:
run: helm template --values charts/.ci/values-kube-score.yaml charts/* | ./kube-score score - --ignore-test pod-networkpolicy --ignore-test deployment-has-poddisruptionbudget --ignore-test deployment-has-host-podantiaffinity --ignore-test container-security-context --ignore-test pod-probes --ignore-test container-image-tag --enable-optional-test container-security-context-privileged --enable-optional-test container-security-context-readonlyrootfilesystem
# python is a requirement for the chart-testing action below (supports yamllint among other tests)
- uses: actions/setup-python@v4
- uses: actions/setup-python@v5
with:
python-version: '3.11'
@@ -134,7 +134,7 @@ jobs:
steps:
- name: Checkout
uses: actions/checkout@v3
uses: actions/checkout@v4
with:
fetch-depth: 0
@@ -145,7 +145,7 @@ jobs:
- name: Get Token
id: get_workflow_token
uses: peter-murray/workflow-application-token-action@8e1ba3bf1619726336414f1014e37f17fbadf1db
uses: peter-murray/workflow-application-token-action@dc0413987a085fa17d19df9e47d4677cf81ffef3
with:
application_id: ${{ secrets.ACTIONS_ACCESS_APP_ID }}
application_private_key: ${{ secrets.ACTIONS_ACCESS_PK }}
@@ -184,7 +184,7 @@ jobs:
# this workaround is intended to move the index.yaml to the target repo
# where the github pages are hosted
- name: Checkout target repository
uses: actions/checkout@v3
uses: actions/checkout@v4
with:
repository: ${{ env.CHART_TARGET_ORG }}/${{ env.CHART_TARGET_REPO }}
path: ${{ env.CHART_TARGET_REPO }}

View File

@@ -39,9 +39,9 @@ jobs:
if: ${{ !startsWith(github.event.inputs.release_tag_name, 'gha-runner-scale-set-') }}
steps:
- name: Checkout
uses: actions/checkout@v3
uses: actions/checkout@v4
- uses: actions/setup-go@v4
- uses: actions/setup-go@v5
with:
go-version-file: 'go.mod'
@@ -73,7 +73,7 @@ jobs:
- name: Get Token
id: get_workflow_token
uses: peter-murray/workflow-application-token-action@8e1ba3bf1619726336414f1014e37f17fbadf1db
uses: peter-murray/workflow-application-token-action@dc0413987a085fa17d19df9e47d4677cf81ffef3
with:
application_id: ${{ secrets.ACTIONS_ACCESS_APP_ID }}
application_private_key: ${{ secrets.ACTIONS_ACCESS_PK }}

View File

@@ -28,7 +28,7 @@ jobs:
name: Trigger Build and Push of Runner Images
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v4
- name: Get runner version
id: versions
run: |
@@ -39,7 +39,7 @@ jobs:
- name: Get Token
id: get_workflow_token
uses: peter-murray/workflow-application-token-action@8e1ba3bf1619726336414f1014e37f17fbadf1db
uses: peter-murray/workflow-application-token-action@dc0413987a085fa17d19df9e47d4677cf81ffef3
with:
application_id: ${{ secrets.ACTIONS_ACCESS_APP_ID }}
application_private_key: ${{ secrets.ACTIONS_ACCESS_PK }}

View File

@@ -21,7 +21,7 @@ jobs:
container_hooks_current_version: ${{ steps.container_hooks_versions.outputs.container_hooks_current_version }}
container_hooks_latest_version: ${{ steps.container_hooks_versions.outputs.container_hooks_latest_version }}
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v4
- name: Get runner current and latest versions
id: runner_versions
@@ -64,7 +64,7 @@ jobs:
echo "CONTAINER_HOOKS_CURRENT_VERSION=${{ needs.check_versions.outputs.container_hooks_current_version }}"
echo "CONTAINER_HOOKS_LATEST_VERSION=${{ needs.check_versions.outputs.container_hooks_latest_version }}"
- uses: actions/checkout@v3
- uses: actions/checkout@v4
- name: PR Name
id: pr_name
@@ -78,7 +78,7 @@ jobs:
run: |
RUNNER_MESSAGE="runner to v${RUNNER_LATEST_VERSION}"
CONTAINER_HOOKS_MESSAGE="container-hooks to v${CONTAINER_HOOKS_LATEST_VERSION}"
PR_NAME="Updates:"
if [ "$RUNNER_CURRENT_VERSION" != "$RUNNER_LATEST_VERSION" ]
then
@@ -88,7 +88,7 @@ jobs:
then
PR_NAME="$PR_NAME $CONTAINER_HOOKS_MESSAGE"
fi
result=$(gh pr list --search "$PR_NAME" --json number --jq ".[].number" --limit 1)
if [ -z "$result" ]
then
@@ -119,22 +119,26 @@ jobs:
PR_NAME: ${{ needs.check_pr.outputs.pr_name }}
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v4
- name: New branch
run: git checkout -b update-runner-"$(date +%Y-%m-%d)"
- name: Update files
run: |
sed -i "s/$RUNNER_CURRENT_VERSION/$RUNNER_LATEST_VERSION/g" runner/VERSION
sed -i "s/$RUNNER_CURRENT_VERSION/$RUNNER_LATEST_VERSION/g" runner/Makefile
sed -i "s/$RUNNER_CURRENT_VERSION/$RUNNER_LATEST_VERSION/g" Makefile
sed -i "s/$RUNNER_CURRENT_VERSION/$RUNNER_LATEST_VERSION/g" test/e2e/e2e_test.go
sed -i "s/$CONTAINER_HOOKS_CURRENT_VERSION/$CONTAINER_HOOKS_LATEST_VERSION/g" runner/VERSION
sed -i "s/$CONTAINER_HOOKS_CURRENT_VERSION/$CONTAINER_HOOKS_LATEST_VERSION/g" runner/Makefile
sed -i "s/$CONTAINER_HOOKS_CURRENT_VERSION/$CONTAINER_HOOKS_LATEST_VERSION/g" Makefile
sed -i "s/$CONTAINER_HOOKS_CURRENT_VERSION/$CONTAINER_HOOKS_LATEST_VERSION/g" test/e2e/e2e_test.go
CURRENT_VERSION="${RUNNER_CURRENT_VERSION//./\\.}"
LATEST_VERSION="${RUNNER_LATEST_VERSION//./\\.}"
sed -i "s/$CURRENT_VERSION/$LATEST_VERSION/g" runner/VERSION
sed -i "s/$CURRENT_VERSION/$LATEST_VERSION/g" runner/Makefile
sed -i "s/$CURRENT_VERSION/$LATEST_VERSION/g" Makefile
sed -i "s/$CURRENT_VERSION/$LATEST_VERSION/g" test/e2e/e2e_test.go
CURRENT_VERSION="${CONTAINER_HOOKS_CURRENT_VERSION//./\\.}"
LATEST_VERSION="${CONTAINER_HOOKS_LATEST_VERSION//./\\.}"
sed -i "s/$CURRENT_VERSION/$LATEST_VERSION/g" runner/VERSION
sed -i "s/$CURRENT_VERSION/$LATEST_VERSION/g" runner/Makefile
sed -i "s/$CURRENT_VERSION/$LATEST_VERSION/g" Makefile
sed -i "s/$CURRENT_VERSION/$LATEST_VERSION/g" test/e2e/e2e_test.go
- name: Commit changes
run: |
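The sed block above escapes the dots in the version strings before substitution because sed treats an unescaped dot as "any character", so a pattern like 2.321.0 could rewrite unrelated text. A minimal shell sketch of the same idea, with illustrative version values (the real workflow takes them from the version-check job outputs):

  # Illustrative values only; the workflow reads these from job outputs
  RUNNER_CURRENT_VERSION="2.311.0"
  RUNNER_LATEST_VERSION="2.321.0"
  CURRENT_VERSION="${RUNNER_CURRENT_VERSION//./\\.}"    # becomes 2\.311\.0
  LATEST_VERSION="${RUNNER_LATEST_VERSION//./\\.}"      # becomes 2\.321\.0
  sed -i "s/$CURRENT_VERSION/$LATEST_VERSION/g" Makefile   # now matches only the exact version string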

View File

@@ -40,13 +40,13 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@v3
uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Set up Helm
# Using https://github.com/Azure/setup-helm/releases/tag/v3.5
uses: azure/setup-helm@5119fcb9089d432beecbf79bb2c7915207344b78
# Using https://github.com/Azure/setup-helm/releases/tag/v4.2
uses: azure/setup-helm@fe7b79cd5ee1e45176fcad797de68ecaf3ca4814
with:
version: ${{ env.HELM_VERSION }}
@@ -67,7 +67,7 @@ jobs:
--enable-optional-test container-security-context-readonlyrootfilesystem
# python is a requirement for the chart-testing action below (supports yamllint among other tests)
- uses: actions/setup-python@v4
- uses: actions/setup-python@v5
with:
python-version: '3.11'

View File

@@ -24,7 +24,7 @@ jobs:
name: runner / shellcheck
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v4
- name: shellcheck
uses: reviewdog/action-shellcheck@v1
with:
@@ -45,7 +45,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@v3
uses: actions/checkout@v4
- name: Run tests
run: |

View File

@@ -16,7 +16,7 @@ env:
TARGET_ORG: actions-runner-controller
TARGET_REPO: arc_e2e_test_dummy
IMAGE_NAME: "arc-test-image"
IMAGE_VERSION: "0.8.1"
IMAGE_VERSION: "0.10.0"
concurrency:
# This will make sure we only apply the concurrency limits on pull requests
@@ -33,7 +33,7 @@ jobs:
env:
WORKFLOW_FILE: "arc-test-workflow.yaml"
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v4
with:
ref: ${{github.head_ref}}
@@ -103,6 +103,8 @@ jobs:
kubectl wait --timeout=30s --for=condition=ready pod -n arc-systems -l actions.github.com/scale-set-name=$ARC_NAME
kubectl get pod -n arc-systems
sleep 60
- name: Test ARC E2E
uses: ./.github/actions/execute-assert-arc-e2e
timeout-minutes: 10
@@ -122,7 +124,7 @@ jobs:
env:
WORKFLOW_FILE: "arc-test-workflow.yaml"
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v4
with:
ref: ${{github.head_ref}}
@@ -194,6 +196,8 @@ jobs:
kubectl wait --timeout=30s --for=condition=ready pod -n arc-systems -l actions.github.com/scale-set-name=$ARC_NAME
kubectl get pod -n arc-systems
sleep 60
- name: Test ARC E2E
uses: ./.github/actions/execute-assert-arc-e2e
timeout-minutes: 10
@@ -213,7 +217,7 @@ jobs:
env:
WORKFLOW_FILE: arc-test-dind-workflow.yaml
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v4
with:
ref: ${{github.head_ref}}
@@ -284,6 +288,8 @@ jobs:
kubectl wait --timeout=30s --for=condition=ready pod -n arc-systems -l actions.github.com/scale-set-name=$ARC_NAME
kubectl get pod -n arc-systems
sleep 60
- name: Test ARC E2E
uses: ./.github/actions/execute-assert-arc-e2e
timeout-minutes: 10
@@ -303,7 +309,7 @@ jobs:
env:
WORKFLOW_FILE: "arc-test-kubernetes-workflow.yaml"
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v4
with:
ref: ${{github.head_ref}}
@@ -383,6 +389,8 @@ jobs:
kubectl wait --timeout=30s --for=condition=ready pod -n arc-systems -l actions.github.com/scale-set-name=$ARC_NAME
kubectl get pod -n arc-systems
sleep 60
- name: Test ARC E2E
uses: ./.github/actions/execute-assert-arc-e2e
timeout-minutes: 10
@@ -402,7 +410,7 @@ jobs:
env:
WORKFLOW_FILE: "arc-test-workflow.yaml"
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v4
with:
ref: ${{github.head_ref}}
@@ -484,6 +492,8 @@ jobs:
kubectl wait --timeout=30s --for=condition=ready pod -n arc-systems -l actions.github.com/scale-set-name=$ARC_NAME
kubectl get pod -n arc-systems
sleep 60
- name: Test ARC E2E
uses: ./.github/actions/execute-assert-arc-e2e
timeout-minutes: 10
@@ -503,7 +513,7 @@ jobs:
env:
WORKFLOW_FILE: "arc-test-workflow.yaml"
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v4
with:
ref: ${{github.head_ref}}
@@ -579,6 +589,8 @@ jobs:
kubectl wait --timeout=30s --for=condition=ready pod -n arc-systems -l actions.github.com/scale-set-name=$ARC_NAME
kubectl get pod -n arc-systems
sleep 60
- name: Test ARC E2E
uses: ./.github/actions/execute-assert-arc-e2e
timeout-minutes: 10
@@ -598,7 +610,7 @@ jobs:
env:
WORKFLOW_FILE: "arc-test-workflow.yaml"
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v4
with:
ref: ${{github.head_ref}}
@@ -699,6 +711,8 @@ jobs:
kubectl wait --timeout=30s --for=condition=ready pod -n arc-systems -l actions.github.com/scale-set-name=$ARC_NAME
kubectl get pod -n arc-systems
sleep 60
- name: Test ARC E2E
uses: ./.github/actions/execute-assert-arc-e2e
timeout-minutes: 10
@@ -718,7 +732,7 @@ jobs:
env:
WORKFLOW_FILE: "arc-test-sleepy-matrix.yaml"
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v4
with:
ref: ${{github.head_ref}}
@@ -789,6 +803,8 @@ jobs:
kubectl wait --timeout=30s --for=condition=ready pod -n arc-systems -l actions.github.com/scale-set-name=$ARC_NAME
kubectl get pod -n arc-systems
sleep 60
- name: Trigger long running jobs and wait for runners to pick them up
uses: ./.github/actions/execute-assert-arc-e2e
timeout-minutes: 10
@@ -888,7 +904,7 @@ jobs:
env:
WORKFLOW_FILE: arc-test-workflow.yaml
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v4
with:
ref: ${{ github.head_ref }}

View File

@@ -45,7 +45,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@v3
uses: actions/checkout@v4
with:
# If inputs.ref is empty, it'll resolve to the default branch
ref: ${{ inputs.ref }}
@@ -72,10 +72,10 @@ jobs:
echo "repository_owner=$(echo ${{ github.repository_owner }} | tr '[:upper:]' '[:lower:]')" >> $GITHUB_OUTPUT
- name: Set up QEMU
uses: docker/setup-qemu-action@v2
uses: docker/setup-qemu-action@v3
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v2
uses: docker/setup-buildx-action@v3
with:
# Pinning v0.9.1 for Buildx and BuildKit v0.10.6
# BuildKit v0.11 which has a bug causing intermittent
@@ -84,14 +84,14 @@ jobs:
driver-opts: image=moby/buildkit:v0.10.6
- name: Login to GitHub Container Registry
uses: docker/login-action@v2
uses: docker/login-action@v3
with:
registry: ghcr.io
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
- name: Build & push controller image
uses: docker/build-push-action@v3
uses: docker/build-push-action@v5
with:
file: Dockerfile
platforms: linux/amd64,linux/arm64
@@ -121,7 +121,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@v3
uses: actions/checkout@v4
with:
# If inputs.ref is empty, it'll resolve to the default branch
ref: ${{ inputs.ref }}
@@ -140,8 +140,8 @@ jobs:
echo "repository_owner=$(echo ${{ github.repository_owner }} | tr '[:upper:]' '[:lower:]')" >> $GITHUB_OUTPUT
- name: Set up Helm
# Using https://github.com/Azure/setup-helm/releases/tag/v3.5
uses: azure/setup-helm@5119fcb9089d432beecbf79bb2c7915207344b78
# Using https://github.com/Azure/setup-helm/releases/tag/v4.2
uses: azure/setup-helm@fe7b79cd5ee1e45176fcad797de68ecaf3ca4814
with:
version: ${{ env.HELM_VERSION }}
@@ -169,7 +169,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@v3
uses: actions/checkout@v4
with:
# If inputs.ref is empty, it'll resolve to the default branch
ref: ${{ inputs.ref }}
@@ -188,8 +188,8 @@ jobs:
echo "repository_owner=$(echo ${{ github.repository_owner }} | tr '[:upper:]' '[:lower:]')" >> $GITHUB_OUTPUT
- name: Set up Helm
# Using https://github.com/Azure/setup-helm/releases/tag/v3.5
uses: azure/setup-helm@5119fcb9089d432beecbf79bb2c7915207344b78
# Using https://github.com/Azure/setup-helm/releases/tag/v4.2
uses: azure/setup-helm@fe7b79cd5ee1e45176fcad797de68ecaf3ca4814
with:
version: ${{ env.HELM_VERSION }}

View File

@@ -36,13 +36,13 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@v3
uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Set up Helm
# Using https://github.com/Azure/setup-helm/releases/tag/v3.5
uses: azure/setup-helm@5119fcb9089d432beecbf79bb2c7915207344b78
# Using https://github.com/Azure/setup-helm/releases/tag/v4.2
uses: azure/setup-helm@fe7b79cd5ee1e45176fcad797de68ecaf3ca4814
with:
version: ${{ env.HELM_VERSION }}
@@ -63,7 +63,7 @@ jobs:
--enable-optional-test container-security-context-readonlyrootfilesystem
# python is a requirement for the chart-testing action below (supports yamllint among other tests)
- uses: actions/setup-python@v4
- uses: actions/setup-python@v5
with:
python-version: '3.11'
@@ -84,13 +84,13 @@ jobs:
ct lint --config charts/.ci/ct-config-gha.yaml
- name: Set up docker buildx
uses: docker/setup-buildx-action@v2
uses: docker/setup-buildx-action@v3
if: steps.list-changed.outputs.changed == 'true'
with:
version: latest
- name: Build controller image
uses: docker/build-push-action@v3
uses: docker/build-push-action@v5
if: steps.list-changed.outputs.changed == 'true'
with:
file: Dockerfile

View File

@@ -55,11 +55,11 @@ jobs:
TARGET_REPO: actions-runner-controller
steps:
- name: Checkout
uses: actions/checkout@v3
uses: actions/checkout@v4
- name: Get Token
id: get_workflow_token
uses: peter-murray/workflow-application-token-action@8e1ba3bf1619726336414f1014e37f17fbadf1db
uses: peter-murray/workflow-application-token-action@dc0413987a085fa17d19df9e47d4677cf81ffef3
with:
application_id: ${{ secrets.ACTIONS_ACCESS_APP_ID }}
application_private_key: ${{ secrets.ACTIONS_ACCESS_PK }}
@@ -90,10 +90,10 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@v3
uses: actions/checkout@v4
- name: Login to GitHub Container Registry
uses: docker/login-action@v2
uses: docker/login-action@v3
with:
registry: ghcr.io
username: ${{ github.actor }}
@@ -110,16 +110,16 @@ jobs:
echo "repository_owner=$(echo ${{ github.repository_owner }} | tr '[:upper:]' '[:lower:]')" >> $GITHUB_OUTPUT
- name: Set up QEMU
uses: docker/setup-qemu-action@v2
uses: docker/setup-qemu-action@v3
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v2
uses: docker/setup-buildx-action@v3
with:
version: latest
# Unstable builds - run at your own risk
- name: Build and Push
uses: docker/build-push-action@v3
uses: docker/build-push-action@v5
with:
context: .
file: ./Dockerfile

View File

@@ -25,10 +25,10 @@ jobs:
security-events: write
steps:
- name: Checkout repository
uses: actions/checkout@v3
uses: actions/checkout@v4
- name: Install Go
uses: actions/setup-go@v4
uses: actions/setup-go@v5
with:
go-version-file: go.mod

View File

@@ -11,7 +11,7 @@ jobs:
check_for_first_interaction:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v4
- uses: actions/first-interaction@main
with:
repo-token: ${{ secrets.GITHUB_TOKEN }}

View File

@@ -29,8 +29,8 @@ jobs:
fmt:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- uses: actions/setup-go@v4
- uses: actions/checkout@v4
- uses: actions/setup-go@v5
with:
go-version-file: 'go.mod'
cache: false
@@ -42,13 +42,13 @@ jobs:
lint:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- uses: actions/setup-go@v4
- uses: actions/checkout@v4
- uses: actions/setup-go@v5
with:
go-version-file: 'go.mod'
cache: false
- name: golangci-lint
uses: golangci/golangci-lint-action@v3
uses: golangci/golangci-lint-action@v6
with:
only-new-issues: true
version: v1.55.2
@@ -56,8 +56,8 @@ jobs:
generate:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- uses: actions/setup-go@v4
- uses: actions/checkout@v4
- uses: actions/setup-go@v5
with:
go-version-file: 'go.mod'
cache: false
@@ -69,8 +69,8 @@ jobs:
test:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- uses: actions/setup-go@v4
- uses: actions/checkout@v4
- uses: actions/setup-go@v5
with:
go-version-file: 'go.mod'
- run: make manifests

View File

@@ -1,7 +1,9 @@
run:
timeout: 3m
output:
format: github-actions
formats:
- format: github-actions
path: stdout
linters-settings:
errcheck:
exclude-functions:
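For readability, a hedged reconstruction of how the new output stanza above likely nests in .golangci.yml (indentation is lost in the hunk; newer golangci-lint releases — the Makefile diff further below bumps the lint image to v1.57.2 — take output.formats as a list):

  run:
    timeout: 3m
  output:
    formats:
      - format: github-actions
        path: stdout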

View File

@@ -1,2 +1,2 @@
# actions-runner-controller maintainers
* @mumoshu @toast-gear @actions/actions-launch @nikola-jokic
* @mumoshu @toast-gear @actions/actions-launch @nikola-jokic @rentziass

View File

@@ -1,5 +1,5 @@
# Build the manager binary
FROM --platform=$BUILDPLATFORM golang:1.21.3 as builder
FROM --platform=$BUILDPLATFORM golang:1.22.4 as builder
WORKDIR /workspace

View File

@@ -6,7 +6,7 @@ endif
DOCKER_USER ?= $(shell echo ${DOCKER_IMAGE_NAME} | cut -d / -f1)
VERSION ?= dev
COMMIT_SHA = $(shell git rev-parse HEAD)
RUNNER_VERSION ?= 2.311.0
RUNNER_VERSION ?= 2.321.0
TARGETPLATFORM ?= $(shell arch)
RUNNER_NAME ?= ${DOCKER_USER}/actions-runner
RUNNER_TAG ?= ${VERSION}
@@ -68,7 +68,7 @@ endif
all: manager
lint:
docker run --rm -v $(PWD):/app -w /app golangci/golangci-lint:v1.55.2 golangci-lint run
docker run --rm -v $(PWD):/app -w /app golangci/golangci-lint:v1.57.2 golangci-lint run
GO_TEST_ARGS ?= -short
@@ -310,7 +310,7 @@ github-release: release
# Otherwise we get errors like the below:
# Error: failed to install CRD crds/actions.summerwind.dev_runnersets.yaml: CustomResourceDefinition.apiextensions.k8s.io "runnersets.actions.summerwind.dev" is invalid: [spec.validation.openAPIV3Schema.properties[spec].properties[template].properties[spec].properties[containers].items.properties[ports].items.properties[protocol].default: Required value: this property is in x-kubernetes-list-map-keys, so it must have a default or be a required property, spec.validation.openAPIV3Schema.properties[spec].properties[template].properties[spec].properties[initContainers].items.properties[ports].items.properties[protocol].default: Required value: this property is in x-kubernetes-list-map-keys, so it must have a default or be a required property]
#
# Note that controller-gen newer than 0.6.0 is needed due to https://github.com/kubernetes-sigs/controller-tools/issues/448
# Note that controller-gen newer than 0.6.2 is needed due to https://github.com/kubernetes-sigs/controller-tools/issues/448
# Otherwise ObjectMeta embedded in Spec results in empty on the storage.
controller-gen:
ifeq (, $(shell which controller-gen))
@@ -320,7 +320,7 @@ ifeq (, $(wildcard $(GOBIN)/controller-gen))
CONTROLLER_GEN_TMP_DIR=$$(mktemp -d) ;\
cd $$CONTROLLER_GEN_TMP_DIR ;\
go mod init tmp ;\
go install sigs.k8s.io/controller-tools/cmd/controller-gen@v0.13.0 ;\
go install sigs.k8s.io/controller-tools/cmd/controller-gen@v0.14.0 ;\
rm -rf $$CONTROLLER_GEN_TMP_DIR ;\
}
endif

View File

@@ -42,6 +42,10 @@ type EphemeralRunner struct {
Status EphemeralRunnerStatus `json:"status,omitempty"`
}
func (er *EphemeralRunner) IsDone() bool {
return er.Status.Phase == corev1.PodSucceeded || er.Status.Phase == corev1.PodFailed
}
// EphemeralRunnerSpec defines the desired state of EphemeralRunner
type EphemeralRunnerSpec struct {
// INSERT ADDITIONAL SPEC FIELDS - desired state of cluster

View File

@@ -24,6 +24,8 @@ import (
type EphemeralRunnerSetSpec struct {
// Replicas is the number of desired EphemeralRunner resources in the k8s namespace.
Replicas int `json:"replicas,omitempty"`
// PatchID is the unique identifier for the patch issued by the listener app
PatchID int `json:"patchID"`
EphemeralRunnerSpec EphemeralRunnerSpec `json:"ephemeralRunnerSpec,omitempty"`
}

View File

@@ -3,7 +3,7 @@ apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
annotations:
controller-gen.kubebuilder.io/version: v0.13.0
controller-gen.kubebuilder.io/version: v0.14.0
name: horizontalrunnerautoscalers.actions.summerwind.dev
spec:
group: actions.summerwind.dev
@@ -35,10 +35,19 @@ spec:
description: HorizontalRunnerAutoscaler is the Schema for the horizontalrunnerautoscaler API
properties:
apiVersion:
description: 'APIVersion defines the versioned schema of this representation of an object. Servers should convert recognized schemas to the latest internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources'
description: |-
APIVersion defines the versioned schema of this representation of an object.
Servers should convert recognized schemas to the latest internal value, and
may reject unrecognized values.
More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources
type: string
kind:
description: 'Kind is a string value representing the REST resource this object represents. Servers may infer this from the endpoint the client submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds'
description: |-
Kind is a string value representing the REST resource this object represents.
Servers may infer this from the endpoint the client submits requests to.
Cannot be updated.
In CamelCase.
More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds
type: string
metadata:
type: object
@@ -47,7 +56,9 @@ spec:
properties:
capacityReservations:
items:
description: CapacityReservation specifies the number of replicas temporarily added to the scale target until ExpirationTime.
description: |-
CapacityReservation specifies the number of replicas temporarily added
to the scale target until ExpirationTime.
properties:
effectiveTime:
format: date-time
@@ -79,30 +90,46 @@ spec:
items:
properties:
repositoryNames:
description: RepositoryNames is the list of repository names to be used for calculating the metric. For example, a repository name is the REPO part of `github.com/USER/REPO`.
description: |-
RepositoryNames is the list of repository names to be used for calculating the metric.
For example, a repository name is the REPO part of `github.com/USER/REPO`.
items:
type: string
type: array
scaleDownAdjustment:
description: ScaleDownAdjustment is the number of runners removed on scale-down. You can only specify either ScaleDownFactor or ScaleDownAdjustment.
description: |-
ScaleDownAdjustment is the number of runners removed on scale-down.
You can only specify either ScaleDownFactor or ScaleDownAdjustment.
type: integer
scaleDownFactor:
description: ScaleDownFactor is the multiplicative factor applied to the current number of runners used to determine how many pods should be removed.
description: |-
ScaleDownFactor is the multiplicative factor applied to the current number of runners used
to determine how many pods should be removed.
type: string
scaleDownThreshold:
description: ScaleDownThreshold is the percentage of busy runners less than which will trigger the hpa to scale the runners down.
description: |-
ScaleDownThreshold is the percentage of busy runners less than which will
trigger the hpa to scale the runners down.
type: string
scaleUpAdjustment:
description: ScaleUpAdjustment is the number of runners added on scale-up. You can only specify either ScaleUpFactor or ScaleUpAdjustment.
description: |-
ScaleUpAdjustment is the number of runners added on scale-up.
You can only specify either ScaleUpFactor or ScaleUpAdjustment.
type: integer
scaleUpFactor:
description: ScaleUpFactor is the multiplicative factor applied to the current number of runners used to determine how many pods should be added.
description: |-
ScaleUpFactor is the multiplicative factor applied to the current number of runners used
to determine how many pods should be added.
type: string
scaleUpThreshold:
description: ScaleUpThreshold is the percentage of busy runners greater than which will trigger the hpa to scale runners up.
description: |-
ScaleUpThreshold is the percentage of busy runners greater than which will
trigger the hpa to scale runners up.
type: string
type:
description: Type is the type of metric to be used for autoscaling. It can be TotalNumberOfQueuedAndInProgressWorkflowRuns or PercentageRunnersBusy.
description: |-
Type is the type of metric to be used for autoscaling.
It can be TotalNumberOfQueuedAndInProgressWorkflowRuns or PercentageRunnersBusy.
type: string
type: object
type: array
@@ -110,7 +137,9 @@ spec:
description: MinReplicas is the minimum number of replicas the deployment is allowed to scale
type: integer
scaleDownDelaySecondsAfterScaleOut:
description: ScaleDownDelaySecondsAfterScaleUp is the approximate delay for a scale down followed by a scale up Used to prevent flapping (down->up->down->... loop)
description: |-
ScaleDownDelaySecondsAfterScaleUp is the approximate delay for a scale down followed by a scale up
Used to prevent flapping (down->up->down->... loop)
type: integer
scaleTargetRef:
description: ScaleTargetRef is the reference to scaled resource like RunnerDeployment
@@ -126,7 +155,18 @@ spec:
type: string
type: object
scaleUpTriggers:
description: "ScaleUpTriggers is an experimental feature to increase the desired replicas by 1 on each webhook requested received by the webhookBasedAutoscaler. \n This feature requires you to also enable and deploy the webhookBasedAutoscaler onto your cluster. \n Note that the added runners remain until the next sync period at least, and they may or may not be used by GitHub Actions depending on the timing. They are intended to be used to gain \"resource slack\" immediately after you receive a webhook from GitHub, so that you can loosely expect MinReplicas runners to be always available."
description: |-
ScaleUpTriggers is an experimental feature to increase the desired replicas by 1
on each webhook requested received by the webhookBasedAutoscaler.
This feature requires you to also enable and deploy the webhookBasedAutoscaler onto your cluster.
Note that the added runners remain until the next sync period at least,
and they may or may not be used by GitHub Actions depending on the timing.
They are intended to be used to gain "resource slack" immediately after you
receive a webhook from GitHub, so that you can loosely expect MinReplicas runners to be always available.
items:
properties:
amount:
@@ -139,12 +179,18 @@ spec:
description: https://docs.github.com/en/actions/reference/events-that-trigger-workflows#check_run
properties:
names:
description: Names is a list of GitHub Actions glob patterns. Any check_run event whose name matches one of patterns in the list can trigger autoscaling. Note that check_run name seem to equal to the job name you've defined in your actions workflow yaml file. So it is very likely that you can utilize this to trigger depending on the job.
description: |-
Names is a list of GitHub Actions glob patterns.
Any check_run event whose name matches one of patterns in the list can trigger autoscaling.
Note that check_run name seem to equal to the job name you've defined in your actions workflow yaml file.
So it is very likely that you can utilize this to trigger depending on the job.
items:
type: string
type: array
repositories:
description: Repositories is a list of GitHub repositories. Any check_run event whose repository matches one of repositories in the list can trigger autoscaling.
description: |-
Repositories is a list of GitHub repositories.
Any check_run event whose repository matches one of repositories in the list can trigger autoscaling.
items:
type: string
type: array
@@ -169,7 +215,9 @@ spec:
type: array
type: object
push:
description: PushSpec is the condition for triggering scale-up on push event Also see https://docs.github.com/en/actions/reference/events-that-trigger-workflows#push
description: |-
PushSpec is the condition for triggering scale-up on push event
Also see https://docs.github.com/en/actions/reference/events-that-trigger-workflows#push
type: object
workflowJob:
description: https://docs.github.com/en/developers/webhooks-and-events/webhooks/webhook-events-and-payloads#workflow_job
@@ -178,23 +226,33 @@ spec:
type: object
type: array
scheduledOverrides:
description: ScheduledOverrides is the list of ScheduledOverride. It can be used to override a few fields of HorizontalRunnerAutoscalerSpec on schedule. The earlier a scheduled override is, the higher it is prioritized.
description: |-
ScheduledOverrides is the list of ScheduledOverride.
It can be used to override a few fields of HorizontalRunnerAutoscalerSpec on schedule.
The earlier a scheduled override is, the higher it is prioritized.
items:
description: ScheduledOverride can be used to override a few fields of HorizontalRunnerAutoscalerSpec on schedule. A schedule can optionally be recurring, so that the corresponding override happens every day, week, month, or year.
description: |-
ScheduledOverride can be used to override a few fields of HorizontalRunnerAutoscalerSpec on schedule.
A schedule can optionally be recurring, so that the corresponding override happens every day, week, month, or year.
properties:
endTime:
description: EndTime is the time at which the first override ends.
format: date-time
type: string
minReplicas:
description: MinReplicas is the number of runners while overriding. If omitted, it doesn't override minReplicas.
description: |-
MinReplicas is the number of runners while overriding.
If omitted, it doesn't override minReplicas.
minimum: 0
nullable: true
type: integer
recurrenceRule:
properties:
frequency:
description: Frequency is the name of a predefined interval of each recurrence. The valid values are "Daily", "Weekly", "Monthly", and "Yearly". If empty, the corresponding override happens only once.
description: |-
Frequency is the name of a predefined interval of each recurrence.
The valid values are "Daily", "Weekly", "Monthly", and "Yearly".
If empty, the corresponding override happens only once.
enum:
- Daily
- Weekly
@@ -202,7 +260,9 @@ spec:
- Yearly
type: string
untilTime:
description: UntilTime is the time of the final recurrence. If empty, the schedule recurs forever.
description: |-
UntilTime is the time of the final recurrence.
If empty, the schedule recurs forever.
format: date-time
type: string
type: object
@@ -231,18 +291,24 @@ spec:
type: object
type: array
desiredReplicas:
description: DesiredReplicas is the total number of desired, non-terminated and latest pods to be set for the primary RunnerSet This doesn't include outdated pods while upgrading the deployment and replacing the runnerset.
description: |-
DesiredReplicas is the total number of desired, non-terminated and latest pods to be set for the primary RunnerSet
This doesn't include outdated pods while upgrading the deployment and replacing the runnerset.
type: integer
lastSuccessfulScaleOutTime:
format: date-time
nullable: true
type: string
observedGeneration:
description: ObservedGeneration is the most recent generation observed for the target. It corresponds to e.g. RunnerDeployment's generation, which is updated on mutation by the API Server.
description: |-
ObservedGeneration is the most recent generation observed for the target. It corresponds to e.g.
RunnerDeployment's generation, which is updated on mutation by the API Server.
format: int64
type: integer
scheduledOverridesSummary:
description: ScheduledOverridesSummary is the summary of active and upcoming scheduled overrides to be shown in e.g. a column of a `kubectl get hra` output for observability.
description: |-
ScheduledOverridesSummary is the summary of active and upcoming scheduled overrides to be shown in e.g. a column of a `kubectl get hra` output
for observability.
type: string
type: object
type: object

View File

@@ -15,13 +15,13 @@ type: application
# This is the chart version. This version number should be incremented each time you make changes
# to the chart and its templates, including the app version.
# Versions are expected to follow Semantic Versioning (https://semver.org/)
version: 0.8.1
version: 0.10.0
# This is the version number of the application being deployed. This version number should be
# incremented each time you make changes to the application. Versions are not expected to
# follow Semantic Versioning. They should reflect the version the application is using.
# It is recommended to use it with quotes.
appVersion: "0.8.1"
appVersion: "0.10.0"
home: https://github.com/actions/actions-runner-controller

View File

@@ -2,3 +2,4 @@ Thank you for installing {{ .Chart.Name }}.
Your release is named {{ .Release.Name }}.
WARNING: Older version of the listener (githubrunnerscalesetlistener) is deprecated and will be removed in the future gha-runner-scale-set-0.10.0 release. If you are using environment variable override to force the old listener, please remove the environment variable and use the new listener (ghalistener) instead.

View File

@@ -126,7 +126,3 @@ Create the name of the service account to use
{{- end }}
{{- $names | join ","}}
{{- end }}
{{- define "gha-runner-scale-set-controller.serviceMonitorName" -}}
{{- include "gha-runner-scale-set-controller.fullname" . }}-service-monitor
{{- end }}

View File

@@ -65,6 +65,9 @@ spec:
{{- with .Values.flags.watchSingleNamespace }}
- "--watch-single-namespace={{ . }}"
{{- end }}
{{- with .Values.runnerMaxConcurrentReconciles }}
- "--runner-max-concurrent-reconciles={{ . }}"
{{- end }}
{{- with .Values.flags.updateStrategy }}
- "--update-strategy={{ . }}"
{{- end }}
@@ -79,6 +82,15 @@ spec:
- "--listener-metrics-endpoint="
- "--metrics-addr=0"
{{- end }}
{{- range .Values.flags.excludeLabelPropagationPrefixes }}
- "--exclude-label-propagation-prefix={{ . }}"
{{- end }}
{{- with .Values.flags.k8sClientRateLimiterQPS }}
- "--k8s-client-rate-limiter-qps={{ . }}"
{{- end }}
{{- with .Values.flags.k8sClientRateLimiterBurst }}
- "--k8s-client-rate-limiter-burst={{ . }}"
{{- end }}
command:
- "/manager"
{{- with .Values.metrics }}
@@ -110,10 +122,16 @@ spec:
volumeMounts:
- mountPath: /tmp
name: tmp
{{- range .Values.volumeMounts }}
- {{ toYaml . | nindent 10 }}
{{- end }}
terminationGracePeriodSeconds: 10
volumes:
- name: tmp
emptyDir: {}
{{- range .Values.volumes }}
- {{ toYaml . | nindent 8 }}
{{- end }}
{{- with .Values.nodeSelector }}
nodeSelector:
{{- toYaml . | nindent 8 }}
@@ -122,6 +140,10 @@ spec:
affinity:
{{- toYaml . | nindent 8 }}
{{- end }}
{{- with .Values.topologySpreadConstraints }}
topologySpreadConstraints:
{{- toYaml . | nindent 8 }}
{{- end }}
{{- with .Values.tolerations }}
tolerations:
{{- toYaml . | nindent 8 }}
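The template changes above surface several new chart inputs. A hedged values.yaml sketch showing how they might be set — the key names come from the template above and the chart tests below, while the concrete values are purely illustrative:

  runnerMaxConcurrentReconciles: 2            # rendered as --runner-max-concurrent-reconciles=2

  flags:
    excludeLabelPropagationPrefixes:          # each entry becomes --exclude-label-propagation-prefix=...
      - "prefix.com/"
      - "complete.io/label"
    k8sClientRateLimiterQPS: 50               # --k8s-client-rate-limiter-qps=50
    k8sClientRateLimiterBurst: 100            # --k8s-client-rate-limiter-burst=100

  topologySpreadConstraints:                  # illustrative constraint; the label selector here is an assumption
    - maxSkew: 1
      topologyKey: kubernetes.io/hostname
      whenUnsatisfiable: ScheduleAnyway
      labelSelector:
        matchLabels:
          app.kubernetes.io/name: gha-rs-controller

  volumes:                                    # extra volumes/mounts appended to the manager pod
    - name: customMount
      configMap:
        name: my-configmap
  volumeMounts:
    - name: customMount
      mountPath: /my/mount/path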

View File

@@ -345,6 +345,7 @@ func TestTemplate_ControllerDeployment_Defaults(t *testing.T) {
assert.Len(t, deployment.Spec.Template.Spec.NodeSelector, 0)
assert.Nil(t, deployment.Spec.Template.Spec.Affinity)
assert.Len(t, deployment.Spec.Template.Spec.TopologySpreadConstraints, 0)
assert.Len(t, deployment.Spec.Template.Spec.Tolerations, 0)
managerImage := "ghcr.io/actions/gha-runner-scale-set-controller:dev"
@@ -424,10 +425,17 @@ func TestTemplate_ControllerDeployment_Customize(t *testing.T) {
"tolerations[0].key": "foo",
"affinity.nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution.nodeSelectorTerms[0].matchExpressions[0].key": "foo",
"affinity.nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution.nodeSelectorTerms[0].matchExpressions[0].operator": "bar",
"priorityClassName": "test-priority-class",
"flags.updateStrategy": "eventual",
"flags.logLevel": "info",
"flags.logFormat": "json",
"topologySpreadConstraints[0].labelSelector.matchLabels.foo": "bar",
"topologySpreadConstraints[0].maxSkew": "1",
"topologySpreadConstraints[0].topologyKey": "foo",
"priorityClassName": "test-priority-class",
"flags.updateStrategy": "eventual",
"flags.logLevel": "info",
"flags.logFormat": "json",
"volumes[0].name": "customMount",
"volumes[0].configMap.name": "my-configmap",
"volumeMounts[0].name": "customMount",
"volumeMounts[0].mountPath": "/my/mount/path",
},
KubectlOptions: k8s.NewKubectlOptions("", "", namespaceName),
}
@@ -470,9 +478,11 @@ func TestTemplate_ControllerDeployment_Customize(t *testing.T) {
assert.Equal(t, int64(1000), *deployment.Spec.Template.Spec.SecurityContext.FSGroup)
assert.Equal(t, "test-priority-class", deployment.Spec.Template.Spec.PriorityClassName)
assert.Equal(t, int64(10), *deployment.Spec.Template.Spec.TerminationGracePeriodSeconds)
assert.Len(t, deployment.Spec.Template.Spec.Volumes, 1)
assert.Len(t, deployment.Spec.Template.Spec.Volumes, 2)
assert.Equal(t, "tmp", deployment.Spec.Template.Spec.Volumes[0].Name)
assert.NotNil(t, 10, deployment.Spec.Template.Spec.Volumes[0].EmptyDir)
assert.NotNil(t, deployment.Spec.Template.Spec.Volumes[0].EmptyDir)
assert.Equal(t, "customMount", deployment.Spec.Template.Spec.Volumes[1].Name)
assert.Equal(t, "my-configmap", deployment.Spec.Template.Spec.Volumes[1].ConfigMap.Name)
assert.Len(t, deployment.Spec.Template.Spec.NodeSelector, 1)
assert.Equal(t, "bar", deployment.Spec.Template.Spec.NodeSelector["foo"])
@@ -481,6 +491,11 @@ func TestTemplate_ControllerDeployment_Customize(t *testing.T) {
assert.Equal(t, "foo", deployment.Spec.Template.Spec.Affinity.NodeAffinity.RequiredDuringSchedulingIgnoredDuringExecution.NodeSelectorTerms[0].MatchExpressions[0].Key)
assert.Equal(t, "bar", string(deployment.Spec.Template.Spec.Affinity.NodeAffinity.RequiredDuringSchedulingIgnoredDuringExecution.NodeSelectorTerms[0].MatchExpressions[0].Operator))
assert.Len(t, deployment.Spec.Template.Spec.TopologySpreadConstraints, 1)
assert.Equal(t, "bar", deployment.Spec.Template.Spec.TopologySpreadConstraints[0].LabelSelector.MatchLabels["foo"])
assert.Equal(t, int32(1), deployment.Spec.Template.Spec.TopologySpreadConstraints[0].MaxSkew)
assert.Equal(t, "foo", deployment.Spec.Template.Spec.TopologySpreadConstraints[0].TopologyKey)
assert.Len(t, deployment.Spec.Template.Spec.Tolerations, 1)
assert.Equal(t, "foo", deployment.Spec.Template.Spec.Tolerations[0].Key)
@@ -521,9 +536,11 @@ func TestTemplate_ControllerDeployment_Customize(t *testing.T) {
assert.True(t, *deployment.Spec.Template.Spec.Containers[0].SecurityContext.RunAsNonRoot)
assert.Equal(t, int64(1000), *deployment.Spec.Template.Spec.Containers[0].SecurityContext.RunAsUser)
assert.Len(t, deployment.Spec.Template.Spec.Containers[0].VolumeMounts, 1)
assert.Len(t, deployment.Spec.Template.Spec.Containers[0].VolumeMounts, 2)
assert.Equal(t, "tmp", deployment.Spec.Template.Spec.Containers[0].VolumeMounts[0].Name)
assert.Equal(t, "/tmp", deployment.Spec.Template.Spec.Containers[0].VolumeMounts[0].MountPath)
assert.Equal(t, "customMount", deployment.Spec.Template.Spec.Containers[0].VolumeMounts[1].Name)
assert.Equal(t, "/my/mount/path", deployment.Spec.Template.Spec.Containers[0].VolumeMounts[1].MountPath)
}
func TestTemplate_EnableLeaderElectionRole(t *testing.T) {
@@ -737,6 +754,7 @@ func TestTemplate_ControllerDeployment_WatchSingleNamespace(t *testing.T) {
assert.Len(t, deployment.Spec.Template.Spec.NodeSelector, 0)
assert.Nil(t, deployment.Spec.Template.Spec.Affinity)
assert.Len(t, deployment.Spec.Template.Spec.TopologySpreadConstraints, 0)
assert.Len(t, deployment.Spec.Template.Spec.Tolerations, 0)
managerImage := "ghcr.io/actions/gha-runner-scale-set-controller:dev"
@@ -1017,3 +1035,41 @@ func TestControllerDeployment_MetricsPorts(t *testing.T) {
assert.Equal(t, value.frequency, 1, fmt.Sprintf("frequency of %q is not 1", key))
}
}
func TestDeployment_excludeLabelPropagationPrefixes(t *testing.T) {
t.Parallel()
// Path to the helm chart we will test
helmChartPath, err := filepath.Abs("../../gha-runner-scale-set-controller")
require.NoError(t, err)
chartContent, err := os.ReadFile(filepath.Join(helmChartPath, "Chart.yaml"))
require.NoError(t, err)
chart := new(Chart)
err = yaml.Unmarshal(chartContent, chart)
require.NoError(t, err)
releaseName := "test-arc"
namespaceName := "test-" + strings.ToLower(random.UniqueId())
options := &helm.Options{
Logger: logger.Discard,
SetValues: map[string]string{
"flags.excludeLabelPropagationPrefixes[0]": "prefix.com/",
"flags.excludeLabelPropagationPrefixes[1]": "complete.io/label",
},
KubectlOptions: k8s.NewKubectlOptions("", "", namespaceName),
}
output := helm.RenderTemplate(t, options, helmChartPath, releaseName, []string{"templates/deployment.yaml"})
var deployment appsv1.Deployment
helm.UnmarshalK8SYaml(t, output, &deployment)
require.Len(t, deployment.Spec.Template.Spec.Containers, 1, "Expected one container")
container := deployment.Spec.Template.Spec.Containers[0]
assert.Contains(t, container.Args, "--exclude-label-propagation-prefix=prefix.com/")
assert.Contains(t, container.Args, "--exclude-label-propagation-prefix=complete.io/label")
}
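The test above only checks that each entry of flags.excludeLabelPropagationPrefixes is rendered as an --exclude-label-propagation-prefix argument. As a rough illustration of the intended semantics (labels whose keys match a configured prefix are not copied to internal resources), here is a minimal, self-contained Go sketch; the filterLabels helper and its exact matching rules are assumptions for illustration, not the controller's actual implementation.

package main

import (
    "fmt"
    "strings"
)

// filterLabels is a hypothetical helper used only for illustration: it returns
// a copy of labels with every key dropped that matches one of the configured
// exclusion prefixes. A value ending in "/" excludes a whole label namespace
// (e.g. "prefix.com/"), while a complete key such as "complete.io/label"
// excludes only that single label.
func filterLabels(labels map[string]string, excludePrefixes []string) map[string]string {
    out := make(map[string]string, len(labels))
    for key, value := range labels {
        excluded := false
        for _, prefix := range excludePrefixes {
            if strings.HasPrefix(key, prefix) {
                excluded = true
                break
            }
        }
        if !excluded {
            out[key] = value
        }
    }
    return out
}

func main() {
    labels := map[string]string{
        "app.kubernetes.io/name": "runner",
        "prefix.com/managed-by":  "argocd",
        "complete.io/label":      "value",
    }
    fmt.Println(filterLabels(labels, []string{"prefix.com/", "complete.io/label"}))
    // Output: map[app.kubernetes.io/name:runner]
}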

View File

@@ -72,14 +72,20 @@ tolerations: []
affinity: {}
topologySpreadConstraints: []
# Mount volumes in the container.
volumes: []
volumeMounts: []
# Leverage a PriorityClass to ensure your pods survive resource shortages
# ref: https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/
# PriorityClass: system-cluster-critical
priorityClassName: ""
## If `metrics:` object is not provided, or commented out, the following flags
## will be applied to the controller-manager and listener pods with empty values:
## `--metrics-addr`, `--listener-metrics-addr`, `--listener-metrics-endpoint`.
## This will disable metrics.
##
## To enable metrics, uncomment the following lines.
@@ -100,6 +106,11 @@ flags:
## Defaults to watch all namespaces when unset.
# watchSingleNamespace: ""
## The maximum number of concurrent reconciles which can be run by the EphemeralRunner controller.
# Increase this value to improve the throughput of the controller.
# It may also increase the load on the API server and the external service (e.g. GitHub API).
runnerMaxConcurrentReconciles: 2
## Defines how the controller should handle upgrades while having running jobs.
##
## The strategies available are:
@@ -115,3 +126,16 @@ flags:
## This can lead to a longer time to apply the change but it will ensure
## that you don't have any overprovisioning of runners.
updateStrategy: "immediate"
## Defines a list of prefixes that should not be propagated to internal resources.
## This is useful when you have labels that are used for internal purposes and should not be propagated to internal resources.
## See https://github.com/actions/actions-runner-controller/issues/3533 for more information.
##
## By default, all labels are propagated to internal resources
## Labels that match prefix specified in the list are excluded from propagation.
# excludeLabelPropagationPrefixes:
# - "argocd.argoproj.io/instance"
## Defines the K8s client rate limiter parameters.
# k8sClientRateLimiterQPS: 20
# k8sClientRateLimiterBurst: 30
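The new k8sClientRateLimiterQPS / k8sClientRateLimiterBurst and runnerMaxConcurrentReconciles values are plain tuning knobs that, one way or another, end up on the client-go rest.Config and on the controller-runtime controller options. The sketch below shows the usual way such flags are consumed; the flag names and the wiring are assumptions for illustration only, not the chart's or controller's exact code.

package main

import (
    "flag"

    "k8s.io/client-go/rest"
    "sigs.k8s.io/controller-runtime/pkg/controller"
)

func main() {
    // Hypothetical flag names mirroring the Helm values above; the real chart
    // renders these values into the manager's container arguments.
    qps := flag.Float64("k8s-client-rate-limiter-qps", 20, "client-go rate limiter QPS")
    burst := flag.Int("k8s-client-rate-limiter-burst", 30, "client-go rate limiter burst")
    maxReconciles := flag.Int("runner-max-concurrent-reconciles", 2, "parallel EphemeralRunner reconciles")
    flag.Parse()

    // QPS and burst throttle every API call made through the shared client config...
    cfg := &rest.Config{
        QPS:   float32(*qps),
        Burst: *burst,
    }

    // ...while MaxConcurrentReconciles bounds how many EphemeralRunner objects
    // a single controller reconciles at the same time.
    opts := controller.Options{
        MaxConcurrentReconciles: *maxReconciles,
    }

    _, _ = cfg, opts
}

The trade-off named in the comments above applies directly here: raising MaxConcurrentReconciles increases throughput but also multiplies the API traffic the rate limiter has to absorb.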

View File

@@ -15,13 +15,13 @@ type: application
# This is the chart version. This version number should be incremented each time you make changes
# to the chart and its templates, including the app version.
# Versions are expected to follow Semantic Versioning (https://semver.org/)
version: 0.8.1
version: 0.10.0
# This is the version number of the application being deployed. This version number should be
# incremented each time you make changes to the application. Versions are not expected to
# follow Semantic Versioning. They should reflect the version the application is using.
# It is recommended to use it with quotes.
appVersion: "0.8.1"
appVersion: "0.10.0"
home: https://github.com/actions/actions-runner-controller

View File

@@ -99,7 +99,7 @@ volumeMounts:
image: docker:dind
args:
- dockerd
- --host=unix:///run/docker/docker.sock
- --host=unix:///var/run/docker.sock
- --group=$(DOCKER_GROUP_GID)
env:
- name: DOCKER_GROUP_GID
@@ -110,7 +110,7 @@ volumeMounts:
- name: work
mountPath: /home/runner/_work
- name: dind-sock
mountPath: /run/docker
mountPath: /var/run
- name: dind-externals
mountPath: /home/runner/externals
{{- end }}
@@ -223,7 +223,7 @@ env:
{{- end }}
{{- if $setDockerHost }}
- name: DOCKER_HOST
value: unix:///run/docker/docker.sock
value: unix:///var/run/docker.sock
{{- end }}
{{- if $setRunnerWaitDocker }}
- name: RUNNER_WAIT_FOR_DOCKER_IN_SECONDS
@@ -264,8 +264,7 @@ volumeMounts:
{{- end }}
{{- if $mountDindCert }}
- name: dind-sock
mountPath: /run/docker
readOnly: true
mountPath: /var/run
{{- end }}
{{- if $mountGitHubServerTLS }}
- name: github-server-tls-cert
@@ -526,13 +525,13 @@ volumeMounts:
{{- end }}
{{- end }}
{{- if and (eq $multiNamespacesCounter 0) (eq $singleNamespaceCounter 0) }}
{{- fail "No gha-rs-controller deployment found using label (app.kubernetes.io/part-of=gha-rs-controller). Consider setting controllerServiceAccount.name in values.yaml to be explicit if you think the discovery is wrong." }}
{{- fail "No gha-rs-controller deployment found using label (app.kubernetes.io/part-of=gha-rs-controller). Consider setting controllerServiceAccount.namespace in values.yaml to be explicit if you think the discovery is wrong." }}
{{- end }}
{{- if and (gt $multiNamespacesCounter 0) (gt $singleNamespaceCounter 0) }}
{{- fail "Found both gha-rs-controller installed with flags.watchSingleNamespace set and unset in cluster, this is not supported. Consider setting controllerServiceAccount.name in values.yaml to be explicit if you think the discovery is wrong." }}
{{- fail "Found both gha-rs-controller installed with flags.watchSingleNamespace set and unset in cluster, this is not supported. Consider setting controllerServiceAccount.namespace in values.yaml to be explicit if you think the discovery is wrong." }}
{{- end }}
{{- if gt $multiNamespacesCounter 1 }}
{{- fail "More than one gha-rs-controller deployment found using label (app.kubernetes.io/part-of=gha-rs-controller). Consider setting controllerServiceAccount.name in values.yaml to be explicit if you think the discovery is wrong." }}
{{- fail "More than one gha-rs-controller deployment found using label (app.kubernetes.io/part-of=gha-rs-controller). Consider setting controllerServiceAccount.namespace in values.yaml to be explicit if you think the discovery is wrong." }}
{{- end }}
{{- if eq $multiNamespacesCounter 1 }}
{{- with $controllerDeployment.metadata }}
@@ -545,11 +544,11 @@ volumeMounts:
{{- $managerServiceAccountNamespace = (get $controllerDeployment.metadata.labels "actions.github.com/controller-service-account-namespace") }}
{{- end }}
{{- else }}
{{- fail "No gha-rs-controller deployment that watch this namespace found using label (actions.github.com/controller-watch-single-namespace). Consider setting controllerServiceAccount.name in values.yaml to be explicit if you think the discovery is wrong." }}
{{- fail "No gha-rs-controller deployment that watch this namespace found using label (actions.github.com/controller-watch-single-namespace). Consider setting controllerServiceAccount.namespace in values.yaml to be explicit if you think the discovery is wrong." }}
{{- end }}
{{- end }}
{{- if eq $managerServiceAccountNamespace "" }}
{{- fail "No service account namespace found for gha-rs-controller deployment using label (actions.github.com/controller-service-account-namespace), consider setting controllerServiceAccount.name in values.yaml to be explicit if you think the discovery is wrong." }}
{{- fail "No service account namespace found for gha-rs-controller deployment using label (actions.github.com/controller-service-account-namespace), consider setting controllerServiceAccount.namespace in values.yaml to be explicit if you think the discovery is wrong." }}
{{- end }}
{{- $managerServiceAccountNamespace }}
{{- end }}

View File

@@ -13,6 +13,7 @@ metadata:
app.kubernetes.io/component: "autoscaling-runner-set"
{{- include "gha-runner-scale-set.labels" . | nindent 4 }}
annotations:
actions.github.com/values-hash: {{ toJson .Values | sha256sum | trunc 63 }}
{{- $containerMode := .Values.containerMode }}
{{- if not (kindIs "string" .Values.githubConfigSecret) }}
actions.github.com/cleanup-github-secret-name: {{ include "gha-runner-scale-set.githubsecret" . }}

View File

@@ -900,7 +900,7 @@ func TestTemplateRenderedAutoScalingRunnerSet_EnableDinD(t *testing.T) {
assert.Equal(t, "ghcr.io/actions/actions-runner:latest", ars.Spec.Template.Spec.Containers[0].Image)
assert.Len(t, ars.Spec.Template.Spec.Containers[0].Env, 2, "The runner container should have 2 env vars, DOCKER_HOST and RUNNER_WAIT_FOR_DOCKER_IN_SECONDS")
assert.Equal(t, "DOCKER_HOST", ars.Spec.Template.Spec.Containers[0].Env[0].Name)
assert.Equal(t, "unix:///run/docker/docker.sock", ars.Spec.Template.Spec.Containers[0].Env[0].Value)
assert.Equal(t, "unix:///var/run/docker.sock", ars.Spec.Template.Spec.Containers[0].Env[0].Value)
assert.Equal(t, "RUNNER_WAIT_FOR_DOCKER_IN_SECONDS", ars.Spec.Template.Spec.Containers[0].Env[1].Name)
assert.Equal(t, "120", ars.Spec.Template.Spec.Containers[0].Env[1].Value)
@@ -910,8 +910,7 @@ func TestTemplateRenderedAutoScalingRunnerSet_EnableDinD(t *testing.T) {
assert.False(t, ars.Spec.Template.Spec.Containers[0].VolumeMounts[0].ReadOnly)
assert.Equal(t, "dind-sock", ars.Spec.Template.Spec.Containers[0].VolumeMounts[1].Name)
assert.Equal(t, "/run/docker", ars.Spec.Template.Spec.Containers[0].VolumeMounts[1].MountPath)
assert.True(t, ars.Spec.Template.Spec.Containers[0].VolumeMounts[1].ReadOnly)
assert.Equal(t, "/var/run", ars.Spec.Template.Spec.Containers[0].VolumeMounts[1].MountPath)
assert.Equal(t, "dind", ars.Spec.Template.Spec.Containers[1].Name)
assert.Equal(t, "docker:dind", ars.Spec.Template.Spec.Containers[1].Image)
@@ -921,7 +920,7 @@ func TestTemplateRenderedAutoScalingRunnerSet_EnableDinD(t *testing.T) {
assert.Equal(t, "/home/runner/_work", ars.Spec.Template.Spec.Containers[1].VolumeMounts[0].MountPath)
assert.Equal(t, "dind-sock", ars.Spec.Template.Spec.Containers[1].VolumeMounts[1].Name)
assert.Equal(t, "/run/docker", ars.Spec.Template.Spec.Containers[1].VolumeMounts[1].MountPath)
assert.Equal(t, "/var/run", ars.Spec.Template.Spec.Containers[1].VolumeMounts[1].MountPath)
assert.Equal(t, "dind-externals", ars.Spec.Template.Spec.Containers[1].VolumeMounts[2].Name)
assert.Equal(t, "/home/runner/externals", ars.Spec.Template.Spec.Containers[1].VolumeMounts[2].MountPath)
@@ -2089,3 +2088,58 @@ func TestRunnerContainerVolumeNotEmptyMap(t *testing.T) {
_, ok := m.Spec.Template.Spec.Containers[0]["volumeMounts"]
assert.False(t, ok, "volumeMounts should not be set")
}
func TestAutoscalingRunnerSetAnnotationValuesHash(t *testing.T) {
t.Parallel()
const valuesHash = "actions.github.com/values-hash"
// Path to the helm chart we will test
helmChartPath, err := filepath.Abs("../../gha-runner-scale-set")
require.NoError(t, err)
releaseName := "test-runners"
namespaceName := "test-" + strings.ToLower(random.UniqueId())
options := &helm.Options{
Logger: logger.Discard,
SetValues: map[string]string{
"githubConfigUrl": "https://github.com/actions",
"githubConfigSecret.github_token": "gh_token12345",
"controllerServiceAccount.name": "arc",
"controllerServiceAccount.namespace": "arc-system",
},
KubectlOptions: k8s.NewKubectlOptions("", "", namespaceName),
}
output := helm.RenderTemplate(t, options, helmChartPath, releaseName, []string{"templates/autoscalingrunnerset.yaml"})
var autoscalingRunnerSet v1alpha1.AutoscalingRunnerSet
helm.UnmarshalK8SYaml(t, output, &autoscalingRunnerSet)
firstHash := autoscalingRunnerSet.Annotations["actions.github.com/values-hash"]
assert.NotEmpty(t, firstHash)
assert.LessOrEqual(t, len(firstHash), 63)
helmChartPath, err = filepath.Abs("../../gha-runner-scale-set")
require.NoError(t, err)
options = &helm.Options{
Logger: logger.Discard,
SetValues: map[string]string{
"githubConfigUrl": "https://github.com/actions",
"githubConfigSecret.github_token": "gh_token1234567890",
"controllerServiceAccount.name": "arc",
"controllerServiceAccount.namespace": "arc-system",
},
KubectlOptions: k8s.NewKubectlOptions("", "", namespaceName),
}
output = helm.RenderTemplate(t, options, helmChartPath, releaseName, []string{"templates/autoscalingrunnerset.yaml"})
helm.UnmarshalK8SYaml(t, output, &autoscalingRunnerSet)
secondHash := autoscalingRunnerSet.Annotations[valuesHash]
assert.NotEmpty(t, secondHash)
assert.NotEqual(t, firstHash, secondHash)
assert.LessOrEqual(t, len(secondHash), 63)
}
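This test only asserts that the actions.github.com/values-hash annotation is non-empty, changes when the values change, and stays within 63 characters, which matches the template expression toJson .Values | sha256sum | trunc 63. A minimal Go sketch of that pipeline follows; Helm's toJson does not serialize identically to encoding/json, so this illustrates the shape and length of the hash, not the exact value the chart produces.

package main

import (
    "crypto/sha256"
    "encoding/hex"
    "encoding/json"
    "fmt"
)

// valuesHash mimics `toJson .Values | sha256sum | trunc 63`: hash the JSON
// encoding of the values and keep at most 63 characters of the hex digest.
// A full SHA-256 hex digest is 64 characters, so the truncation always drops
// exactly one character.
func valuesHash(values map[string]any) (string, error) {
    encoded, err := json.Marshal(values)
    if err != nil {
        return "", err
    }
    sum := sha256.Sum256(encoded)
    digest := hex.EncodeToString(sum[:])
    if len(digest) > 63 {
        digest = digest[:63]
    }
    return digest, nil
}

func main() {
    h1, _ := valuesHash(map[string]any{"githubConfigSecret": map[string]any{"github_token": "gh_token12345"}})
    h2, _ := valuesHash(map[string]any{"githubConfigSecret": map[string]any{"github_token": "gh_token1234567890"}})
    fmt.Println(len(h1) <= 63, h1 != h2) // true true
}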

View File

@@ -88,7 +88,7 @@ githubConfigSecret:
# kubernetesModeServiceAccount:
# annotations:
## template is the PodSpec for each listener Pod
## listenerTemplate is the PodSpec for each listener Pod
## For reference: https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/pod-v1/#PodSpec
# listenerTemplate:
# spec:
@@ -125,18 +125,17 @@ template:
## command: ["/home/runner/run.sh"]
## env:
## - name: DOCKER_HOST
## value: unix:///run/docker/docker.sock
## value: unix:///var/run/docker.sock
## volumeMounts:
## - name: work
## mountPath: /home/runner/_work
## - name: dind-sock
## mountPath: /run/docker
## readOnly: true
## mountPath: /var/run
## - name: dind
## image: docker:dind
## args:
## - dockerd
## - --host=unix:///run/docker/docker.sock
## - --host=unix:///var/run/docker.sock
## - --group=$(DOCKER_GROUP_GID)
## env:
## - name: DOCKER_GROUP_GID
@@ -147,7 +146,7 @@ template:
## - name: work
## mountPath: /home/runner/_work
## - name: dind-sock
## mountPath: /run/docker
## mountPath: /var/run
## - name: dind-externals
## mountPath: /home/runner/externals
## volumes:

View File

@@ -34,7 +34,7 @@ type Listener interface {
//go:generate mockery --name Worker --output ./mocks --outpkg mocks --case underscore
type Worker interface {
HandleJobStarted(ctx context.Context, jobInfo *actions.JobStarted) error
HandleDesiredRunnerCount(ctx context.Context, desiredRunnerCount int) error
HandleDesiredRunnerCount(ctx context.Context, count int, jobsCompleted int) (int, error)
}
func New(config config.Config) (*App, error) {
@@ -117,15 +117,19 @@ func (app *App) Run(ctx context.Context) error {
}
g, ctx := errgroup.WithContext(ctx)
metricsCtx, cancelMetrics := context.WithCancelCause(ctx)
g.Go(func() error {
app.logger.Info("Starting listener")
return app.listener.Listen(ctx, app.worker)
listenerErr := app.listener.Listen(ctx, app.worker)
cancelMetrics(fmt.Errorf("Listener exited: %w", listenerErr))
return listenerErr
})
if app.metrics != nil {
g.Go(func() error {
app.logger.Info("Starting metrics server")
return app.metrics.ListenAndServe(ctx)
return app.metrics.ListenAndServe(metricsCtx)
})
}
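The change above makes the listener goroutine cancel the metrics server's context, with the listener's exit recorded as the cause, so the metrics goroutine does not keep the errgroup alive after the listener stops. A stripped-down sketch of that pattern, independent of the ARC types:

package main

import (
    "context"
    "errors"
    "fmt"
    "time"

    "golang.org/x/sync/errgroup"
)

func main() {
    g, ctx := errgroup.WithContext(context.Background())
    metricsCtx, cancelMetrics := context.WithCancelCause(ctx)

    // Stand-in for listener.Listen: exits with an error after a while and
    // records that exit as the cause of the metrics shutdown.
    g.Go(func() error {
        time.Sleep(100 * time.Millisecond)
        err := errors.New("listener done")
        cancelMetrics(fmt.Errorf("listener exited: %w", err))
        return err
    })

    // Stand-in for metrics.ListenAndServe: blocks until its context is cancelled.
    g.Go(func() error {
        <-metricsCtx.Done()
        fmt.Println("metrics stopped:", context.Cause(metricsCtx))
        return nil
    })

    fmt.Println("group error:", g.Wait())
}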

View File

@@ -15,18 +15,28 @@ type Worker struct {
mock.Mock
}
// HandleDesiredRunnerCount provides a mock function with given fields: ctx, desiredRunnerCount
func (_m *Worker) HandleDesiredRunnerCount(ctx context.Context, desiredRunnerCount int) error {
ret := _m.Called(ctx, desiredRunnerCount)
// HandleDesiredRunnerCount provides a mock function with given fields: ctx, count, acquireCount
func (_m *Worker) HandleDesiredRunnerCount(ctx context.Context, count int, acquireCount int) (int, error) {
ret := _m.Called(ctx, count, acquireCount)
var r0 error
if rf, ok := ret.Get(0).(func(context.Context, int) error); ok {
r0 = rf(ctx, desiredRunnerCount)
var r0 int
var r1 error
if rf, ok := ret.Get(0).(func(context.Context, int, int) (int, error)); ok {
return rf(ctx, count, acquireCount)
}
if rf, ok := ret.Get(0).(func(context.Context, int, int) int); ok {
r0 = rf(ctx, count, acquireCount)
} else {
r0 = ret.Error(0)
r0 = ret.Get(0).(int)
}
return r0
if rf, ok := ret.Get(1).(func(context.Context, int, int) error); ok {
r1 = rf(ctx, count, acquireCount)
} else {
r1 = ret.Error(1)
}
return r0, r1
}
// HandleJobStarted provides a mock function with given fields: ctx, jobInfo

View File

@@ -31,10 +31,11 @@ const (
type Client interface {
GetAcquirableJobs(ctx context.Context, runnerScaleSetId int) (*actions.AcquirableJobList, error)
CreateMessageSession(ctx context.Context, runnerScaleSetId int, owner string) (*actions.RunnerScaleSetSession, error)
GetMessage(ctx context.Context, messageQueueUrl, messageQueueAccessToken string, lastMessageId int64) (*actions.RunnerScaleSetMessage, error)
GetMessage(ctx context.Context, messageQueueUrl, messageQueueAccessToken string, lastMessageId int64, maxCapacity int) (*actions.RunnerScaleSetMessage, error)
DeleteMessage(ctx context.Context, messageQueueUrl, messageQueueAccessToken string, messageId int64) error
AcquireJobs(ctx context.Context, runnerScaleSetId int, messageQueueAccessToken string, requestIds []int64) ([]int64, error)
RefreshMessageSession(ctx context.Context, runnerScaleSetId int, sessionId *uuid.UUID) (*actions.RunnerScaleSetSession, error)
DeleteMessageSession(ctx context.Context, runnerScaleSetId int, sessionId *uuid.UUID) error
}
type Config struct {
@@ -79,6 +80,7 @@ type Listener struct {
// updated fields
lastMessageID int64 // The ID of the last processed message.
maxCapacity int // The maximum number of runners that can be created.
session *actions.RunnerScaleSetSession // The session for managing the runner scale set.
}
@@ -88,10 +90,11 @@ func New(config Config) (*Listener, error) {
}
listener := &Listener{
scaleSetID: config.ScaleSetID,
client: config.Client,
logger: config.Logger,
metrics: metrics.Discard,
scaleSetID: config.ScaleSetID,
client: config.Client,
logger: config.Logger,
metrics: metrics.Discard,
maxCapacity: config.MaxRunners,
}
if config.Metrics != nil {
@@ -113,7 +116,7 @@ func New(config Config) (*Listener, error) {
//go:generate mockery --name Handler --output ./mocks --outpkg mocks --case underscore
type Handler interface {
HandleJobStarted(ctx context.Context, jobInfo *actions.JobStarted) error
HandleDesiredRunnerCount(ctx context.Context, desiredRunnerCount int) error
HandleDesiredRunnerCount(ctx context.Context, count, jobsCompleted int) (int, error)
}
// Listen listens for incoming messages and handles them using the provided handler.
@@ -126,6 +129,12 @@ func (l *Listener) Listen(ctx context.Context, handler Handler) error {
return fmt.Errorf("createSession failed: %w", err)
}
defer func() {
if err := l.deleteMessageSession(); err != nil {
l.logger.Error(err, "failed to delete message session")
}
}()
initialMessage := &actions.RunnerScaleSetMessage{
MessageId: 0,
MessageType: "RunnerScaleSetJobMessages",
@@ -133,28 +142,21 @@ func (l *Listener) Listen(ctx context.Context, handler Handler) error {
Body: "",
}
if l.session.Statistics.TotalAvailableJobs > 0 || l.session.Statistics.TotalAssignedJobs > 0 {
acquirableJobs, err := l.client.GetAcquirableJobs(ctx, l.scaleSetID)
if err != nil {
return fmt.Errorf("failed to call GetAcquirableJobs: %w", err)
}
acquirableJobsJson, err := json.Marshal(acquirableJobs)
if err != nil {
return fmt.Errorf("failed to marshal acquirable jobs: %w", err)
}
initialMessage.Body = string(acquirableJobsJson)
if l.session.Statistics == nil {
return fmt.Errorf("session statistics is nil")
}
l.metrics.PublishStatistics(initialMessage.Statistics)
if err := handler.HandleDesiredRunnerCount(ctx, initialMessage.Statistics.TotalAssignedJobs); err != nil {
desiredRunners, err := handler.HandleDesiredRunnerCount(ctx, initialMessage.Statistics.TotalAssignedJobs, 0)
if err != nil {
return fmt.Errorf("handling initial message failed: %w", err)
}
l.metrics.PublishDesiredRunners(desiredRunners)
for {
select {
case <-ctx.Done():
return fmt.Errorf("context cancelled: %w", ctx.Err())
return ctx.Err()
default:
}
@@ -164,32 +166,62 @@ func (l *Listener) Listen(ctx context.Context, handler Handler) error {
}
if msg == nil {
_, err := handler.HandleDesiredRunnerCount(ctx, 0, 0)
if err != nil {
return fmt.Errorf("handling nil message failed: %w", err)
}
continue
}
statistics, jobsStarted, err := l.parseMessage(ctx, msg)
if err != nil {
return fmt.Errorf("failed to parse message: %w", err)
}
l.lastMessageID = msg.MessageId
if err := l.deleteLastMessage(ctx); err != nil {
return fmt.Errorf("failed to delete message: %w", err)
}
for _, jobStarted := range jobsStarted {
if err := handler.HandleJobStarted(ctx, jobStarted); err != nil {
return fmt.Errorf("failed to handle job started: %w", err)
}
}
if err := handler.HandleDesiredRunnerCount(ctx, statistics.TotalAssignedJobs); err != nil {
return fmt.Errorf("failed to handle desired runner count: %w", err)
// Remove cancellation from the context to avoid cancelling the message handling.
if err := l.handleMessage(context.WithoutCancel(ctx), handler, msg); err != nil {
return fmt.Errorf("failed to handle message: %w", err)
}
}
}
func (l *Listener) handleMessage(ctx context.Context, handler Handler, msg *actions.RunnerScaleSetMessage) error {
parsedMsg, err := l.parseMessage(ctx, msg)
if err != nil {
return fmt.Errorf("failed to parse message: %w", err)
}
l.metrics.PublishStatistics(parsedMsg.statistics)
if len(parsedMsg.jobsAvailable) > 0 {
acquiredJobIDs, err := l.acquireAvailableJobs(ctx, parsedMsg.jobsAvailable)
if err != nil {
return fmt.Errorf("failed to acquire jobs: %w", err)
}
l.logger.Info("Jobs are acquired", "count", len(acquiredJobIDs), "requestIds", fmt.Sprint(acquiredJobIDs))
}
for _, jobCompleted := range parsedMsg.jobsCompleted {
l.metrics.PublishJobCompleted(jobCompleted)
}
l.lastMessageID = msg.MessageId
if err := l.deleteLastMessage(ctx); err != nil {
return fmt.Errorf("failed to delete message: %w", err)
}
for _, jobStarted := range parsedMsg.jobsStarted {
if err := handler.HandleJobStarted(ctx, jobStarted); err != nil {
return fmt.Errorf("failed to handle job started: %w", err)
}
l.metrics.PublishJobStarted(jobStarted)
}
desiredRunners, err := handler.HandleDesiredRunnerCount(ctx, parsedMsg.statistics.TotalAssignedJobs, len(parsedMsg.jobsCompleted))
if err != nil {
return fmt.Errorf("failed to handle desired runner count: %w", err)
}
l.metrics.PublishDesiredRunners(desiredRunners)
return nil
}
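Listen now wraps the per-message work in context.WithoutCancel, so once a message has been received it is acquired, deleted, and handed to the handler even if the listener's own context is cancelled mid-flight. A minimal sketch of what context.WithoutCancel buys here, with a placeholder process function standing in for handleMessage:

package main

import (
    "context"
    "fmt"
    "time"
)

// process stands in for handleMessage: once a message has been received it
// should be handled to completion, because the message is deleted from the
// queue as part of handling it.
func process(ctx context.Context, msg string) error {
    select {
    case <-ctx.Done():
        return ctx.Err()
    case <-time.After(50 * time.Millisecond):
        fmt.Println("handled:", msg)
        return nil
    }
}

func main() {
    ctx, cancel := context.WithCancel(context.Background())
    cancel() // the parent is already cancelled, as after a shutdown signal

    // With the parent context the work is aborted immediately...
    fmt.Println(process(ctx, "msg-1"))

    // ...while WithoutCancel detaches the work from the parent's cancellation:
    // values are inherited, but Done and Err never fire because of the parent.
    fmt.Println(process(context.WithoutCancel(ctx), "msg-2"))
}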
func (l *Listener) createSession(ctx context.Context) error {
var session *actions.RunnerScaleSetSession
var retries int
@@ -237,7 +269,7 @@ func (l *Listener) createSession(ctx context.Context) error {
func (l *Listener) getMessage(ctx context.Context) (*actions.RunnerScaleSetMessage, error) {
l.logger.Info("Getting next message", "lastMessageID", l.lastMessageID)
msg, err := l.client.GetMessage(ctx, l.session.MessageQueueUrl, l.session.MessageQueueAccessToken, l.lastMessageID)
msg, err := l.client.GetMessage(ctx, l.session.MessageQueueUrl, l.session.MessageQueueAccessToken, l.lastMessageID, l.maxCapacity)
if err == nil { // if NO error
return msg, nil
}
@@ -253,66 +285,89 @@ func (l *Listener) getMessage(ctx context.Context) (*actions.RunnerScaleSetMessa
l.logger.Info("Getting next message", "lastMessageID", l.lastMessageID)
msg, err = l.client.GetMessage(ctx, l.session.MessageQueueUrl, l.session.MessageQueueAccessToken, l.lastMessageID)
msg, err = l.client.GetMessage(ctx, l.session.MessageQueueUrl, l.session.MessageQueueAccessToken, l.lastMessageID, l.maxCapacity)
if err != nil { // still failing after the session refresh
return nil, fmt.Errorf("failed to get next message after message session refresh: %w", err)
}
return msg, nil
}
func (l *Listener) deleteLastMessage(ctx context.Context) error {
l.logger.Info("Deleting last message", "lastMessageID", l.lastMessageID)
if err := l.client.DeleteMessage(ctx, l.session.MessageQueueUrl, l.session.MessageQueueAccessToken, l.lastMessageID); err != nil {
return fmt.Errorf("failed to delete message: %w", err)
err := l.client.DeleteMessage(ctx, l.session.MessageQueueUrl, l.session.MessageQueueAccessToken, l.lastMessageID)
if err == nil { // if NO error
return nil
}
expiredError := &actions.MessageQueueTokenExpiredError{}
if !errors.As(err, &expiredError) {
return fmt.Errorf("failed to delete last message: %w", err)
}
if err := l.refreshSession(ctx); err != nil {
return err
}
err = l.client.DeleteMessage(ctx, l.session.MessageQueueUrl, l.session.MessageQueueAccessToken, l.lastMessageID)
if err != nil {
return fmt.Errorf("failed to delete last message after message session refresh: %w", err)
}
return nil
}
func (l *Listener) parseMessage(ctx context.Context, msg *actions.RunnerScaleSetMessage) (*actions.RunnerScaleSetStatistic, []*actions.JobStarted, error) {
type parsedMessage struct {
statistics *actions.RunnerScaleSetStatistic
jobsStarted []*actions.JobStarted
jobsAvailable []*actions.JobAvailable
jobsCompleted []*actions.JobCompleted
}
func (l *Listener) parseMessage(ctx context.Context, msg *actions.RunnerScaleSetMessage) (*parsedMessage, error) {
if msg.MessageType != "RunnerScaleSetJobMessages" {
l.logger.Info("Skipping message", "messageType", msg.MessageType)
return nil, fmt.Errorf("invalid message type: %s", msg.MessageType)
}
l.logger.Info("Processing message", "messageId", msg.MessageId, "messageType", msg.MessageType)
if msg.Statistics == nil {
return nil, nil, fmt.Errorf("invalid message: statistics is nil")
return nil, fmt.Errorf("invalid message: statistics is nil")
}
l.logger.Info("New runner scale set statistics.", "statistics", msg.Statistics)
if msg.MessageType != "RunnerScaleSetJobMessages" {
l.logger.Info("Skipping message", "messageType", msg.MessageType)
return nil, nil, fmt.Errorf("invalid message type: %s", msg.MessageType)
}
var batchedMessages []json.RawMessage
if len(msg.Body) > 0 {
if err := json.Unmarshal([]byte(msg.Body), &batchedMessages); err != nil {
return nil, nil, fmt.Errorf("failed to unmarshal batched messages: %w", err)
return nil, fmt.Errorf("failed to unmarshal batched messages: %w", err)
}
}
var availableJobs []int64
var startedJobs []*actions.JobStarted
parsedMsg := &parsedMessage{
statistics: msg.Statistics,
}
for _, msg := range batchedMessages {
var messageType actions.JobMessageType
if err := json.Unmarshal(msg, &messageType); err != nil {
return nil, nil, fmt.Errorf("failed to decode job message type: %w", err)
return nil, fmt.Errorf("failed to decode job message type: %w", err)
}
switch messageType.MessageType {
case messageTypeJobAvailable:
var jobAvailable actions.JobAvailable
if err := json.Unmarshal(msg, &jobAvailable); err != nil {
return nil, nil, fmt.Errorf("failed to decode job available: %w", err)
return nil, fmt.Errorf("failed to decode job available: %w", err)
}
l.logger.Info("Job available message received", "jobId", jobAvailable.RunnerRequestId)
availableJobs = append(availableJobs, jobAvailable.RunnerRequestId)
parsedMsg.jobsAvailable = append(parsedMsg.jobsAvailable, &jobAvailable)
case messageTypeJobAssigned:
var jobAssigned actions.JobAssigned
if err := json.Unmarshal(msg, &jobAssigned); err != nil {
return nil, nil, fmt.Errorf("failed to decode job assigned: %w", err)
return nil, fmt.Errorf("failed to decode job assigned: %w", err)
}
l.logger.Info("Job assigned message received", "jobId", jobAssigned.RunnerRequestId)
@@ -320,43 +375,39 @@ func (l *Listener) parseMessage(ctx context.Context, msg *actions.RunnerScaleSet
case messageTypeJobStarted:
var jobStarted actions.JobStarted
if err := json.Unmarshal(msg, &jobStarted); err != nil {
return nil, nil, fmt.Errorf("could not decode job started message. %w", err)
return nil, fmt.Errorf("could not decode job started message. %w", err)
}
l.logger.Info("Job started message received.", "RequestId", jobStarted.RunnerRequestId, "RunnerId", jobStarted.RunnerId)
startedJobs = append(startedJobs, &jobStarted)
parsedMsg.jobsStarted = append(parsedMsg.jobsStarted, &jobStarted)
case messageTypeJobCompleted:
var jobCompleted actions.JobCompleted
if err := json.Unmarshal(msg, &jobCompleted); err != nil {
return nil, nil, fmt.Errorf("failed to decode job completed: %w", err)
return nil, fmt.Errorf("failed to decode job completed: %w", err)
}
l.logger.Info("Job completed message received.", "RequestId", jobCompleted.RunnerRequestId, "Result", jobCompleted.Result, "RunnerId", jobCompleted.RunnerId, "RunnerName", jobCompleted.RunnerName)
parsedMsg.jobsCompleted = append(parsedMsg.jobsCompleted, &jobCompleted)
default:
l.logger.Info("unknown job message type.", "messageType", messageType.MessageType)
}
}
l.logger.Info("Available jobs.", "count", len(availableJobs), "requestIds", fmt.Sprint(availableJobs))
if len(availableJobs) > 0 {
acquired, err := l.acquireAvailableJobs(ctx, availableJobs)
if err != nil {
return nil, nil, err
}
l.logger.Info("Jobs are acquired", "count", len(acquired), "requestIds", fmt.Sprint(acquired))
}
return msg.Statistics, startedJobs, nil
return parsedMsg, nil
}
func (l *Listener) acquireAvailableJobs(ctx context.Context, availableJobs []int64) ([]int64, error) {
l.logger.Info("Acquiring jobs")
func (l *Listener) acquireAvailableJobs(ctx context.Context, jobsAvailable []*actions.JobAvailable) ([]int64, error) {
ids := make([]int64, 0, len(jobsAvailable))
for _, job := range jobsAvailable {
ids = append(ids, job.RunnerRequestId)
}
ids, err := l.client.AcquireJobs(ctx, l.scaleSetID, l.session.MessageQueueAccessToken, availableJobs)
l.logger.Info("Acquiring jobs", "count", len(ids), "requestIds", fmt.Sprint(ids))
idsAcquired, err := l.client.AcquireJobs(ctx, l.scaleSetID, l.session.MessageQueueAccessToken, ids)
if err == nil { // if NO errors
return ids, nil
return idsAcquired, nil
}
expiredError := &actions.MessageQueueTokenExpiredError{}
@@ -368,12 +419,12 @@ func (l *Listener) acquireAvailableJobs(ctx context.Context, availableJobs []int
return nil, err
}
ids, err = l.client.AcquireJobs(ctx, l.scaleSetID, l.session.MessageQueueAccessToken, availableJobs)
idsAcquired, err = l.client.AcquireJobs(ctx, l.scaleSetID, l.session.MessageQueueAccessToken, ids)
if err != nil {
return nil, fmt.Errorf("failed to acquire jobs after session refresh: %w", err)
}
return ids, nil
return idsAcquired, nil
}
func (l *Listener) refreshSession(ctx context.Context) error {
@@ -386,3 +437,16 @@ func (l *Listener) refreshSession(ctx context.Context) error {
l.session = session
return nil
}
func (l *Listener) deleteMessageSession() error {
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()
l.logger.Info("Deleting message session")
if err := l.client.DeleteMessageSession(ctx, l.session.RunnerScaleSet.Id, l.session.SessionId); err != nil {
return fmt.Errorf("failed to delete message session: %w", err)
}
return nil
}
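getMessage, deleteLastMessage, and acquireAvailableJobs now all follow the same shape: attempt the call, and only if the failure is a MessageQueueTokenExpiredError refresh the session and retry exactly once. The skeleton of that retry-once pattern, with a placeholder error type standing in for the actions client error:

package main

import (
    "errors"
    "fmt"
)

// tokenExpiredError is a stand-in for actions.MessageQueueTokenExpiredError.
type tokenExpiredError struct{}

func (*tokenExpiredError) Error() string { return "message queue token expired" }

// callWithRefresh runs call once; only if it fails with an expired-token error
// does it refresh the session and retry exactly once. Any other error, or a
// second failure, is returned to the caller.
func callWithRefresh(call func() error, refresh func() error) error {
    err := call()
    if err == nil {
        return nil
    }
    var expired *tokenExpiredError
    if !errors.As(err, &expired) {
        return fmt.Errorf("call failed: %w", err)
    }
    if err := refresh(); err != nil {
        return fmt.Errorf("session refresh failed: %w", err)
    }
    if err := call(); err != nil {
        return fmt.Errorf("call failed after session refresh: %w", err)
    }
    return nil
}

func main() {
    attempts := 0
    err := callWithRefresh(
        func() error {
            attempts++
            if attempts == 1 {
                return &tokenExpiredError{}
            }
            return nil
        },
        func() error { fmt.Println("refreshing session"); return nil },
    )
    fmt.Println(attempts, err) // 2 <nil>
}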

View File

@@ -2,6 +2,7 @@ package listener
import (
"context"
"encoding/json"
"errors"
"net/http"
"testing"
@@ -9,7 +10,6 @@ import (
listenermocks "github.com/actions/actions-runner-controller/cmd/ghalistener/listener/mocks"
"github.com/actions/actions-runner-controller/cmd/ghalistener/metrics"
metricsmocks "github.com/actions/actions-runner-controller/cmd/ghalistener/metrics/mocks"
"github.com/actions/actions-runner-controller/github/actions"
"github.com/google/uuid"
"github.com/stretchr/testify/assert"
@@ -37,23 +37,6 @@ func TestNew(t *testing.T) {
assert.Nil(t, err)
assert.NotNil(t, l)
})
t.Run("SetStaticMetrics", func(t *testing.T) {
t.Parallel()
metrics := metricsmocks.NewPublisher(t)
metrics.On("PublishStatic", mock.Anything, mock.Anything).Once()
config := Config{
Client: listenermocks.NewClient(t),
ScaleSetID: 1,
Metrics: metrics,
}
l, err := New(config)
assert.Nil(t, err)
assert.NotNil(t, l)
})
}
func TestListener_createSession(t *testing.T) {
@@ -140,13 +123,14 @@ func TestListener_getMessage(t *testing.T) {
config := Config{
ScaleSetID: 1,
Metrics: metrics.Discard,
MaxRunners: 10,
}
client := listenermocks.NewClient(t)
want := &actions.RunnerScaleSetMessage{
MessageId: 1,
}
client.On("GetMessage", ctx, mock.Anything, mock.Anything, mock.Anything).Return(want, nil).Once()
client.On("GetMessage", ctx, mock.Anything, mock.Anything, mock.Anything, 10).Return(want, nil).Once()
config.Client = client
l, err := New(config)
@@ -165,10 +149,11 @@ func TestListener_getMessage(t *testing.T) {
config := Config{
ScaleSetID: 1,
Metrics: metrics.Discard,
MaxRunners: 10,
}
client := listenermocks.NewClient(t)
client.On("GetMessage", ctx, mock.Anything, mock.Anything, mock.Anything).Return(nil, &actions.HttpClientSideError{Code: http.StatusNotFound}).Once()
client.On("GetMessage", ctx, mock.Anything, mock.Anything, mock.Anything, 10).Return(nil, &actions.HttpClientSideError{Code: http.StatusNotFound}).Once()
config.Client = client
l, err := New(config)
@@ -187,6 +172,7 @@ func TestListener_getMessage(t *testing.T) {
config := Config{
ScaleSetID: 1,
Metrics: metrics.Discard,
MaxRunners: 10,
}
client := listenermocks.NewClient(t)
@@ -202,12 +188,12 @@ func TestListener_getMessage(t *testing.T) {
}
client.On("RefreshMessageSession", ctx, mock.Anything, mock.Anything).Return(session, nil).Once()
client.On("GetMessage", ctx, mock.Anything, mock.Anything, mock.Anything).Return(nil, &actions.MessageQueueTokenExpiredError{}).Once()
client.On("GetMessage", ctx, mock.Anything, mock.Anything, mock.Anything, 10).Return(nil, &actions.MessageQueueTokenExpiredError{}).Once()
want := &actions.RunnerScaleSetMessage{
MessageId: 1,
}
client.On("GetMessage", ctx, mock.Anything, mock.Anything, mock.Anything).Return(want, nil).Once()
client.On("GetMessage", ctx, mock.Anything, mock.Anything, mock.Anything, 10).Return(want, nil).Once()
config.Client = client
@@ -231,6 +217,7 @@ func TestListener_getMessage(t *testing.T) {
config := Config{
ScaleSetID: 1,
Metrics: metrics.Discard,
MaxRunners: 10,
}
client := listenermocks.NewClient(t)
@@ -246,7 +233,7 @@ func TestListener_getMessage(t *testing.T) {
}
client.On("RefreshMessageSession", ctx, mock.Anything, mock.Anything).Return(session, nil).Once()
client.On("GetMessage", ctx, mock.Anything, mock.Anything, mock.Anything).Return(nil, &actions.MessageQueueTokenExpiredError{}).Twice()
client.On("GetMessage", ctx, mock.Anything, mock.Anything, mock.Anything, 10).Return(nil, &actions.MessageQueueTokenExpiredError{}).Twice()
config.Client = client
@@ -390,6 +377,93 @@ func TestListener_deleteLastMessage(t *testing.T) {
err = l.deleteLastMessage(ctx)
assert.NotNil(t, err)
})
t.Run("RefreshAndSucceeds", func(t *testing.T) {
t.Parallel()
ctx := context.Background()
config := Config{
ScaleSetID: 1,
Metrics: metrics.Discard,
}
client := listenermocks.NewClient(t)
newUUID := uuid.New()
session := &actions.RunnerScaleSetSession{
SessionId: &newUUID,
OwnerName: "example",
RunnerScaleSet: &actions.RunnerScaleSet{},
MessageQueueUrl: "https://example.com",
MessageQueueAccessToken: "1234567890",
Statistics: nil,
}
client.On("RefreshMessageSession", ctx, mock.Anything, mock.Anything).Return(session, nil).Once()
client.On("DeleteMessage", ctx, mock.Anything, mock.Anything, mock.Anything).Return(&actions.MessageQueueTokenExpiredError{}).Once()
client.On("DeleteMessage", ctx, mock.Anything, mock.Anything, mock.MatchedBy(func(lastMessageID any) bool {
return lastMessageID.(int64) == int64(5)
})).Return(nil).Once()
config.Client = client
l, err := New(config)
require.Nil(t, err)
oldUUID := uuid.New()
l.session = &actions.RunnerScaleSetSession{
SessionId: &oldUUID,
RunnerScaleSet: &actions.RunnerScaleSet{},
}
l.lastMessageID = 5
config.Client = client
err = l.deleteLastMessage(ctx)
assert.NoError(t, err)
})
t.Run("RefreshAndFails", func(t *testing.T) {
t.Parallel()
ctx := context.Background()
config := Config{
ScaleSetID: 1,
Metrics: metrics.Discard,
}
client := listenermocks.NewClient(t)
newUUID := uuid.New()
session := &actions.RunnerScaleSetSession{
SessionId: &newUUID,
OwnerName: "example",
RunnerScaleSet: &actions.RunnerScaleSet{},
MessageQueueUrl: "https://example.com",
MessageQueueAccessToken: "1234567890",
Statistics: nil,
}
client.On("RefreshMessageSession", ctx, mock.Anything, mock.Anything).Return(session, nil).Once()
client.On("DeleteMessage", ctx, mock.Anything, mock.Anything, mock.Anything).Return(&actions.MessageQueueTokenExpiredError{}).Twice()
config.Client = client
l, err := New(config)
require.Nil(t, err)
oldUUID := uuid.New()
l.session = &actions.RunnerScaleSetSession{
SessionId: &oldUUID,
RunnerScaleSet: &actions.RunnerScaleSet{},
}
l.lastMessageID = 5
config.Client = client
err = l.deleteLastMessage(ctx)
assert.Error(t, err)
})
}
func TestListener_Listen(t *testing.T) {
@@ -435,6 +509,8 @@ func TestListener_Listen(t *testing.T) {
Statistics: &actions.RunnerScaleSetStatistic{},
}
client.On("CreateMessageSession", ctx, mock.Anything, mock.Anything).Return(session, nil).Once()
client.On("DeleteMessageSession", mock.Anything, session.RunnerScaleSet.Id, session.SessionId).Return(nil).Once()
config.Client = client
l, err := New(config)
@@ -442,8 +518,8 @@ func TestListener_Listen(t *testing.T) {
var called bool
handler := listenermocks.NewHandler(t)
handler.On("HandleDesiredRunnerCount", mock.Anything, mock.Anything).
Return(nil).
handler.On("HandleDesiredRunnerCount", mock.Anything, mock.Anything, 0).
Return(0, nil).
Run(
func(mock.Arguments) {
called = true
@@ -456,6 +532,65 @@ func TestListener_Listen(t *testing.T) {
assert.True(t, errors.Is(err, context.Canceled))
assert.True(t, called)
})
t.Run("CancelContextAfterGetMessage", func(t *testing.T) {
t.Parallel()
ctx, cancel := context.WithCancel(context.Background())
config := Config{
ScaleSetID: 1,
Metrics: metrics.Discard,
MaxRunners: 10,
}
client := listenermocks.NewClient(t)
uuid := uuid.New()
session := &actions.RunnerScaleSetSession{
SessionId: &uuid,
OwnerName: "example",
RunnerScaleSet: &actions.RunnerScaleSet{},
MessageQueueUrl: "https://example.com",
MessageQueueAccessToken: "1234567890",
Statistics: &actions.RunnerScaleSetStatistic{},
}
client.On("CreateMessageSession", ctx, mock.Anything, mock.Anything).Return(session, nil).Once()
client.On("DeleteMessageSession", mock.Anything, session.RunnerScaleSet.Id, session.SessionId).Return(nil).Once()
msg := &actions.RunnerScaleSetMessage{
MessageId: 1,
MessageType: "RunnerScaleSetJobMessages",
Statistics: &actions.RunnerScaleSetStatistic{},
}
client.On("GetMessage", ctx, mock.Anything, mock.Anything, mock.Anything, 10).
Return(msg, nil).
Run(
func(mock.Arguments) {
cancel()
},
).
Once()
// Ensure delete message is called without cancel
client.On("DeleteMessage", context.WithoutCancel(ctx), mock.Anything, mock.Anything, mock.Anything).Return(nil).Once()
config.Client = client
handler := listenermocks.NewHandler(t)
handler.On("HandleDesiredRunnerCount", mock.Anything, mock.Anything, 0).
Return(0, nil).
Once()
handler.On("HandleDesiredRunnerCount", mock.Anything, mock.Anything, 0).
Return(0, nil).
Once()
l, err := New(config)
require.Nil(t, err)
err = l.Listen(ctx, handler)
assert.ErrorIs(t, err, context.Canceled)
})
}
func TestListener_acquireAvailableJobs(t *testing.T) {
@@ -489,7 +624,24 @@ func TestListener_acquireAvailableJobs(t *testing.T) {
Statistics: &actions.RunnerScaleSetStatistic{},
}
_, err = l.acquireAvailableJobs(ctx, []int64{1, 2, 3})
availableJobs := []*actions.JobAvailable{
{
JobMessageBase: actions.JobMessageBase{
RunnerRequestId: 1,
},
},
{
JobMessageBase: actions.JobMessageBase{
RunnerRequestId: 2,
},
},
{
JobMessageBase: actions.JobMessageBase{
RunnerRequestId: 3,
},
},
}
_, err = l.acquireAvailableJobs(ctx, availableJobs)
assert.Error(t, err)
})
@@ -523,9 +675,26 @@ func TestListener_acquireAvailableJobs(t *testing.T) {
Statistics: &actions.RunnerScaleSetStatistic{},
}
acquiredJobIDs, err := l.acquireAvailableJobs(ctx, []int64{1, 2, 3})
availableJobs := []*actions.JobAvailable{
{
JobMessageBase: actions.JobMessageBase{
RunnerRequestId: 1,
},
},
{
JobMessageBase: actions.JobMessageBase{
RunnerRequestId: 2,
},
},
{
JobMessageBase: actions.JobMessageBase{
RunnerRequestId: 3,
},
},
}
acquiredJobIDs, err := l.acquireAvailableJobs(ctx, availableJobs)
assert.NoError(t, err)
assert.Equal(t, jobIDs, acquiredJobIDs)
assert.Equal(t, []int64{1, 2, 3}, acquiredJobIDs)
})
t.Run("RefreshAndSucceeds", func(t *testing.T) {
@@ -550,12 +719,43 @@ func TestListener_acquireAvailableJobs(t *testing.T) {
}
client.On("RefreshMessageSession", ctx, mock.Anything, mock.Anything).Return(session, nil).Once()
// First call to AcquireJobs will fail with a token expired error
client.On("AcquireJobs", ctx, mock.Anything, mock.Anything, mock.Anything).Return(nil, &actions.MessageQueueTokenExpiredError{}).Once()
// Second call to AcquireJobs will succeed
want := []int64{1, 2, 3}
client.On("AcquireJobs", ctx, mock.Anything, mock.Anything, mock.Anything).Return(want, nil).Once()
availableJobs := []*actions.JobAvailable{
{
JobMessageBase: actions.JobMessageBase{
RunnerRequestId: 1,
},
},
{
JobMessageBase: actions.JobMessageBase{
RunnerRequestId: 2,
},
},
{
JobMessageBase: actions.JobMessageBase{
RunnerRequestId: 3,
},
},
}
// First call to AcquireJobs will fail with a token expired error
client.On("AcquireJobs", ctx, mock.Anything, mock.Anything, mock.Anything).
Run(func(args mock.Arguments) {
ids := args.Get(3).([]int64)
assert.Equal(t, want, ids)
}).
Return(nil, &actions.MessageQueueTokenExpiredError{}).
Once()
// Second call should succeed
client.On("AcquireJobs", ctx, mock.Anything, mock.Anything, mock.Anything).
Run(func(args mock.Arguments) {
ids := args.Get(3).([]int64)
assert.Equal(t, want, ids)
}).
Return(want, nil).
Once()
config.Client = client
@@ -567,7 +767,7 @@ func TestListener_acquireAvailableJobs(t *testing.T) {
RunnerScaleSet: &actions.RunnerScaleSet{},
}
got, err := l.acquireAvailableJobs(ctx, want)
got, err := l.acquireAvailableJobs(ctx, availableJobs)
assert.Nil(t, err)
assert.Equal(t, want, got)
})
@@ -606,8 +806,165 @@ func TestListener_acquireAvailableJobs(t *testing.T) {
RunnerScaleSet: &actions.RunnerScaleSet{},
}
got, err := l.acquireAvailableJobs(ctx, []int64{1, 2, 3})
availableJobs := []*actions.JobAvailable{
{
JobMessageBase: actions.JobMessageBase{
RunnerRequestId: 1,
},
},
{
JobMessageBase: actions.JobMessageBase{
RunnerRequestId: 2,
},
},
{
JobMessageBase: actions.JobMessageBase{
RunnerRequestId: 3,
},
},
}
got, err := l.acquireAvailableJobs(ctx, availableJobs)
assert.NotNil(t, err)
assert.Nil(t, got)
})
}
func TestListener_parseMessage(t *testing.T) {
t.Run("FailOnEmptyStatistics", func(t *testing.T) {
msg := &actions.RunnerScaleSetMessage{
MessageId: 1,
MessageType: "RunnerScaleSetJobMessages",
Statistics: nil,
}
l := &Listener{}
parsedMsg, err := l.parseMessage(context.Background(), msg)
assert.Error(t, err)
assert.Nil(t, parsedMsg)
})
t.Run("FailOnIncorrectMessageType", func(t *testing.T) {
msg := &actions.RunnerScaleSetMessage{
MessageId: 1,
MessageType: "RunnerMessages", // arbitrary message type
Statistics: &actions.RunnerScaleSetStatistic{},
}
l := &Listener{}
parsedMsg, err := l.parseMessage(context.Background(), msg)
assert.Error(t, err)
assert.Nil(t, parsedMsg)
})
t.Run("ParseAll", func(t *testing.T) {
msg := &actions.RunnerScaleSetMessage{
MessageId: 1,
MessageType: "RunnerScaleSetJobMessages",
Body: "",
Statistics: &actions.RunnerScaleSetStatistic{
TotalAvailableJobs: 1,
TotalAcquiredJobs: 2,
TotalAssignedJobs: 3,
TotalRunningJobs: 4,
TotalRegisteredRunners: 5,
TotalBusyRunners: 6,
TotalIdleRunners: 7,
},
}
var batchedMessages []any
jobsAvailable := []*actions.JobAvailable{
{
AcquireJobUrl: "https://github.com/example",
JobMessageBase: actions.JobMessageBase{
JobMessageType: actions.JobMessageType{
MessageType: messageTypeJobAvailable,
},
RunnerRequestId: 1,
},
},
{
AcquireJobUrl: "https://github.com/example",
JobMessageBase: actions.JobMessageBase{
JobMessageType: actions.JobMessageType{
MessageType: messageTypeJobAvailable,
},
RunnerRequestId: 2,
},
},
}
for _, msg := range jobsAvailable {
batchedMessages = append(batchedMessages, msg)
}
jobsAssigned := []*actions.JobAssigned{
{
JobMessageBase: actions.JobMessageBase{
JobMessageType: actions.JobMessageType{
MessageType: messageTypeJobAssigned,
},
RunnerRequestId: 3,
},
},
{
JobMessageBase: actions.JobMessageBase{
JobMessageType: actions.JobMessageType{
MessageType: messageTypeJobAssigned,
},
RunnerRequestId: 4,
},
},
}
for _, msg := range jobsAssigned {
batchedMessages = append(batchedMessages, msg)
}
jobsStarted := []*actions.JobStarted{
{
JobMessageBase: actions.JobMessageBase{
JobMessageType: actions.JobMessageType{
MessageType: messageTypeJobStarted,
},
RunnerRequestId: 5,
},
RunnerId: 2,
RunnerName: "runner2",
},
}
for _, msg := range jobsStarted {
batchedMessages = append(batchedMessages, msg)
}
jobsCompleted := []*actions.JobCompleted{
{
JobMessageBase: actions.JobMessageBase{
JobMessageType: actions.JobMessageType{
MessageType: messageTypeJobCompleted,
},
RunnerRequestId: 6,
},
Result: "success",
RunnerId: 1,
RunnerName: "runner1",
},
}
for _, msg := range jobsCompleted {
batchedMessages = append(batchedMessages, msg)
}
b, err := json.Marshal(batchedMessages)
require.NoError(t, err)
msg.Body = string(b)
l := &Listener{}
parsedMsg, err := l.parseMessage(context.Background(), msg)
require.NoError(t, err)
assert.Equal(t, msg.Statistics, parsedMsg.statistics)
assert.Equal(t, jobsAvailable, parsedMsg.jobsAvailable)
assert.Equal(t, jobsStarted, parsedMsg.jobsStarted)
assert.Equal(t, jobsCompleted, parsedMsg.jobsCompleted)
})
}

View File

@@ -0,0 +1,205 @@
package listener
import (
"context"
"encoding/json"
"testing"
listenermocks "github.com/actions/actions-runner-controller/cmd/ghalistener/listener/mocks"
metricsmocks "github.com/actions/actions-runner-controller/cmd/ghalistener/metrics/mocks"
"github.com/actions/actions-runner-controller/github/actions"
"github.com/google/uuid"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/mock"
"github.com/stretchr/testify/require"
)
func TestInitialMetrics(t *testing.T) {
t.Parallel()
t.Run("SetStaticMetrics", func(t *testing.T) {
t.Parallel()
metrics := metricsmocks.NewPublisher(t)
minRunners := 5
maxRunners := 10
metrics.On("PublishStatic", minRunners, maxRunners).Once()
config := Config{
Client: listenermocks.NewClient(t),
ScaleSetID: 1,
Metrics: metrics,
MinRunners: minRunners,
MaxRunners: maxRunners,
}
l, err := New(config)
assert.Nil(t, err)
assert.NotNil(t, l)
})
t.Run("InitialMessageStatistics", func(t *testing.T) {
t.Parallel()
ctx, cancel := context.WithCancel(context.Background())
sessionStatistics := &actions.RunnerScaleSetStatistic{
TotalAvailableJobs: 1,
TotalAcquiredJobs: 2,
TotalAssignedJobs: 3,
TotalRunningJobs: 4,
TotalRegisteredRunners: 5,
TotalBusyRunners: 6,
TotalIdleRunners: 7,
}
uuid := uuid.New()
session := &actions.RunnerScaleSetSession{
SessionId: &uuid,
OwnerName: "example",
RunnerScaleSet: &actions.RunnerScaleSet{},
MessageQueueUrl: "https://example.com",
MessageQueueAccessToken: "1234567890",
Statistics: sessionStatistics,
}
metrics := metricsmocks.NewPublisher(t)
metrics.On("PublishStatic", mock.Anything, mock.Anything).Once()
metrics.On("PublishStatistics", sessionStatistics).Once()
metrics.On("PublishDesiredRunners", sessionStatistics.TotalAssignedJobs).
Run(
func(mock.Arguments) {
cancel()
},
).Once()
config := Config{
Client: listenermocks.NewClient(t),
ScaleSetID: 1,
Metrics: metrics,
}
client := listenermocks.NewClient(t)
client.On("CreateMessageSession", mock.Anything, mock.Anything, mock.Anything).Return(session, nil).Once()
client.On("DeleteMessageSession", mock.Anything, session.RunnerScaleSet.Id, session.SessionId).Return(nil).Once()
config.Client = client
handler := listenermocks.NewHandler(t)
handler.On("HandleDesiredRunnerCount", mock.Anything, sessionStatistics.TotalAssignedJobs, 0).
Return(sessionStatistics.TotalAssignedJobs, nil).
Once()
l, err := New(config)
assert.Nil(t, err)
assert.NotNil(t, l)
assert.ErrorIs(t, l.Listen(ctx, handler), context.Canceled)
})
}
func TestHandleMessageMetrics(t *testing.T) {
t.Parallel()
msg := &actions.RunnerScaleSetMessage{
MessageId: 1,
MessageType: "RunnerScaleSetJobMessages",
Body: "",
Statistics: &actions.RunnerScaleSetStatistic{
TotalAvailableJobs: 1,
TotalAcquiredJobs: 2,
TotalAssignedJobs: 3,
TotalRunningJobs: 4,
TotalRegisteredRunners: 5,
TotalBusyRunners: 6,
TotalIdleRunners: 7,
},
}
var batchedMessages []any
jobsStarted := []*actions.JobStarted{
{
JobMessageBase: actions.JobMessageBase{
JobMessageType: actions.JobMessageType{
MessageType: messageTypeJobStarted,
},
RunnerRequestId: 8,
},
RunnerId: 3,
RunnerName: "runner3",
},
}
for _, msg := range jobsStarted {
batchedMessages = append(batchedMessages, msg)
}
jobsCompleted := []*actions.JobCompleted{
{
JobMessageBase: actions.JobMessageBase{
JobMessageType: actions.JobMessageType{
MessageType: messageTypeJobCompleted,
},
RunnerRequestId: 6,
},
Result: "success",
RunnerId: 1,
RunnerName: "runner1",
},
{
JobMessageBase: actions.JobMessageBase{
JobMessageType: actions.JobMessageType{
MessageType: messageTypeJobCompleted,
},
RunnerRequestId: 7,
},
Result: "success",
RunnerId: 2,
RunnerName: "runner2",
},
}
for _, msg := range jobsCompleted {
batchedMessages = append(batchedMessages, msg)
}
b, err := json.Marshal(batchedMessages)
require.NoError(t, err)
msg.Body = string(b)
desiredResult := 4
metrics := metricsmocks.NewPublisher(t)
metrics.On("PublishStatic", 0, 0).Once()
metrics.On("PublishStatistics", msg.Statistics).Once()
metrics.On("PublishJobCompleted", jobsCompleted[0]).Once()
metrics.On("PublishJobCompleted", jobsCompleted[1]).Once()
metrics.On("PublishJobStarted", jobsStarted[0]).Once()
metrics.On("PublishDesiredRunners", desiredResult).Once()
handler := listenermocks.NewHandler(t)
handler.On("HandleJobStarted", mock.Anything, jobsStarted[0]).Return(nil).Once()
handler.On("HandleDesiredRunnerCount", mock.Anything, mock.Anything, 2).Return(desiredResult, nil).Once()
client := listenermocks.NewClient(t)
client.On("DeleteMessage", mock.Anything, mock.Anything, mock.Anything, mock.Anything).Return(nil).Once()
config := Config{
Client: listenermocks.NewClient(t),
ScaleSetID: 1,
Metrics: metrics,
}
l, err := New(config)
require.NoError(t, err)
l.client = client
l.session = &actions.RunnerScaleSetSession{
OwnerName: "",
RunnerScaleSet: &actions.RunnerScaleSet{},
MessageQueueUrl: "",
MessageQueueAccessToken: "",
Statistics: &actions.RunnerScaleSetStatistic{},
}
err = l.handleMessage(context.Background(), handler, msg)
require.NoError(t, err)
}

View File

@@ -83,6 +83,20 @@ func (_m *Client) DeleteMessage(ctx context.Context, messageQueueUrl string, mes
return r0
}
// DeleteMessageSession provides a mock function with given fields: ctx, runnerScaleSetId, sessionId
func (_m *Client) DeleteMessageSession(ctx context.Context, runnerScaleSetId int, sessionId *uuid.UUID) error {
ret := _m.Called(ctx, runnerScaleSetId, sessionId)
var r0 error
if rf, ok := ret.Get(0).(func(context.Context, int, *uuid.UUID) error); ok {
r0 = rf(ctx, runnerScaleSetId, sessionId)
} else {
r0 = ret.Error(0)
}
return r0
}
// GetAcquirableJobs provides a mock function with given fields: ctx, runnerScaleSetId
func (_m *Client) GetAcquirableJobs(ctx context.Context, runnerScaleSetId int) (*actions.AcquirableJobList, error) {
ret := _m.Called(ctx, runnerScaleSetId)
@@ -109,25 +123,25 @@ func (_m *Client) GetAcquirableJobs(ctx context.Context, runnerScaleSetId int) (
return r0, r1
}
// GetMessage provides a mock function with given fields: ctx, messageQueueUrl, messageQueueAccessToken, lastMessageId
func (_m *Client) GetMessage(ctx context.Context, messageQueueUrl string, messageQueueAccessToken string, lastMessageId int64) (*actions.RunnerScaleSetMessage, error) {
ret := _m.Called(ctx, messageQueueUrl, messageQueueAccessToken, lastMessageId)
// GetMessage provides a mock function with given fields: ctx, messageQueueUrl, messageQueueAccessToken, lastMessageId, maxCapacity
func (_m *Client) GetMessage(ctx context.Context, messageQueueUrl string, messageQueueAccessToken string, lastMessageId int64, maxCapacity int) (*actions.RunnerScaleSetMessage, error) {
ret := _m.Called(ctx, messageQueueUrl, messageQueueAccessToken, lastMessageId, maxCapacity)
var r0 *actions.RunnerScaleSetMessage
var r1 error
if rf, ok := ret.Get(0).(func(context.Context, string, string, int64) (*actions.RunnerScaleSetMessage, error)); ok {
return rf(ctx, messageQueueUrl, messageQueueAccessToken, lastMessageId)
if rf, ok := ret.Get(0).(func(context.Context, string, string, int64, int) (*actions.RunnerScaleSetMessage, error)); ok {
return rf(ctx, messageQueueUrl, messageQueueAccessToken, lastMessageId, maxCapacity)
}
if rf, ok := ret.Get(0).(func(context.Context, string, string, int64) *actions.RunnerScaleSetMessage); ok {
r0 = rf(ctx, messageQueueUrl, messageQueueAccessToken, lastMessageId)
if rf, ok := ret.Get(0).(func(context.Context, string, string, int64, int) *actions.RunnerScaleSetMessage); ok {
r0 = rf(ctx, messageQueueUrl, messageQueueAccessToken, lastMessageId, maxCapacity)
} else {
if ret.Get(0) != nil {
r0 = ret.Get(0).(*actions.RunnerScaleSetMessage)
}
}
if rf, ok := ret.Get(1).(func(context.Context, string, string, int64) error); ok {
r1 = rf(ctx, messageQueueUrl, messageQueueAccessToken, lastMessageId)
if rf, ok := ret.Get(1).(func(context.Context, string, string, int64, int) error); ok {
r1 = rf(ctx, messageQueueUrl, messageQueueAccessToken, lastMessageId, maxCapacity)
} else {
r1 = ret.Error(1)
}

View File

@@ -15,18 +15,28 @@ type Handler struct {
mock.Mock
}
// HandleDesiredRunnerCount provides a mock function with given fields: ctx, desiredRunnerCount
func (_m *Handler) HandleDesiredRunnerCount(ctx context.Context, desiredRunnerCount int) error {
ret := _m.Called(ctx, desiredRunnerCount)
// HandleDesiredRunnerCount provides a mock function with given fields: ctx, count, jobsCompleted
func (_m *Handler) HandleDesiredRunnerCount(ctx context.Context, count int, jobsCompleted int) (int, error) {
ret := _m.Called(ctx, count, jobsCompleted)
var r0 error
if rf, ok := ret.Get(0).(func(context.Context, int) error); ok {
r0 = rf(ctx, desiredRunnerCount)
var r0 int
var r1 error
if rf, ok := ret.Get(0).(func(context.Context, int, int) (int, error)); ok {
return rf(ctx, count, jobsCompleted)
}
if rf, ok := ret.Get(0).(func(context.Context, int, int) int); ok {
r0 = rf(ctx, count, jobsCompleted)
} else {
r0 = ret.Error(0)
r0 = ret.Get(0).(int)
}
return r0
if rf, ok := ret.Get(1).(func(context.Context, int, int) error); ok {
r1 = rf(ctx, count, jobsCompleted)
} else {
r1 = ret.Error(1)
}
return r0, r1
}
// HandleJobStarted provides a mock function with given fields: ctx, jobInfo

View File

@@ -4,6 +4,7 @@ import (
"context"
"net/http"
"strconv"
"time"
"github.com/actions/actions-runner-controller/github/actions"
"github.com/go-logr/logr"
@@ -223,8 +224,8 @@ type baseLabels struct {
func (b *baseLabels) jobLabels(jobBase *actions.JobMessageBase) prometheus.Labels {
return prometheus.Labels{
labelKeyEnterprise: b.enterprise,
labelKeyOrganization: b.organization,
labelKeyRepository: b.repository,
labelKeyOrganization: jobBase.OwnerName,
labelKeyRepository: jobBase.RepositoryName,
labelKeyJobName: jobBase.JobDisplayName,
labelKeyJobWorkflowRef: jobBase.JobWorkflowRef,
labelKeyEventName: jobBase.EventName,
@@ -271,8 +272,10 @@ type ServerPublisher interface {
ListenAndServe(ctx context.Context) error
}
var _ Publisher = &discard{}
var _ ServerPublisher = &exporter{}
var (
_ Publisher = &discard{}
_ ServerPublisher = &exporter{}
)
var Discard Publisher = &discard{}
@@ -336,7 +339,9 @@ func (e *exporter) ListenAndServe(ctx context.Context) error {
e.logger.Info("starting metrics server", "addr", e.srv.Addr)
go func() {
<-ctx.Done()
e.logger.Info("stopping metrics server")
e.logger.Info("stopping metrics server", "err", ctx.Err())
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()
e.srv.Shutdown(ctx)
}()
return e.srv.ListenAndServe()
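
The shutdown change above is an instance of a common Go pattern: watch the parent context, then give the HTTP server a bounded window to drain in-flight requests before exiting. A minimal, self-contained sketch of that pattern follows; the address, the simulated cancellation, and the log messages are illustrative, not the exporter's actual configuration.

package main

import (
	"context"
	"log"
	"net/http"
	"time"
)

func main() {
	// Simulate the parent context being cancelled after two seconds.
	ctx, cancel := context.WithCancel(context.Background())
	time.AfterFunc(2*time.Second, cancel)

	srv := &http.Server{Addr: ":8080", Handler: http.DefaultServeMux}

	go func() {
		<-ctx.Done()
		log.Printf("stopping metrics server: %v", ctx.Err())

		// Give in-flight scrapes up to five seconds to finish before forcing the exit.
		shutdownCtx, cancelShutdown := context.WithTimeout(context.Background(), 5*time.Second)
		defer cancelShutdown()
		if err := srv.Shutdown(shutdownCtx); err != nil {
			log.Printf("shutdown error: %v", err)
		}
	}()

	// ListenAndServe returns http.ErrServerClosed once Shutdown completes.
	if err := srv.ListenAndServe(); err != nil && err != http.ErrServerClosed {
		log.Fatal(err)
	}
	log.Print("metrics server stopped")
}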

View File

@@ -11,6 +11,7 @@ import (
"github.com/actions/actions-runner-controller/logging"
jsonpatch "github.com/evanphx/json-patch"
"github.com/go-logr/logr"
kerrors "k8s.io/apimachinery/pkg/api/errors"
"k8s.io/apimachinery/pkg/types"
"k8s.io/client-go/kubernetes"
"k8s.io/client-go/rest"
@@ -40,6 +41,7 @@ type Worker struct {
clientset *kubernetes.Clientset
config Config
lastPatch int
patchSeq int
logger *logr.Logger
}
@@ -49,6 +51,7 @@ func New(config Config, options ...Option) (*Worker, error) {
w := &Worker{
config: config,
lastPatch: -1,
patchSeq: -1,
}
conf, err := rest.InClusterConfig()
@@ -141,6 +144,10 @@ func (w *Worker) HandleJobStarted(ctx context.Context, jobInfo *actions.JobStart
Do(ctx).
Into(patchedStatus)
if err != nil {
if kerrors.IsNotFound(err) {
w.logger.Info("Ephemeral runner not found, skipping patching of ephemeral runner status", "runnerName", jobInfo.RunnerName)
return nil
}
return fmt.Errorf("could not patch ephemeral runner status, patch JSON: %s, error: %w", string(mergePatch), err)
}
@@ -156,55 +163,41 @@ func (w *Worker) HandleJobStarted(ctx context.Context, jobInfo *actions.JobStart
// The function then scales the ephemeral runner set by applying the merge patch.
// Finally, it logs the scaled ephemeral runner set details and returns nil if successful.
// If any error occurs during the process, it returns an error with a descriptive message.
func (w *Worker) HandleDesiredRunnerCount(ctx context.Context, count int) error {
// Max runners should always be set by the resource builder either to the configured value,
// or the maximum int32 (resourcebuilder.newAutoScalingListener()).
targetRunnerCount := min(w.config.MinRunners+count, w.config.MaxRunners)
logValues := []any{
"assigned job", count,
"decision", targetRunnerCount,
"min", w.config.MinRunners,
"max", w.config.MaxRunners,
"currentRunnerCount", w.lastPatch,
}
if targetRunnerCount == w.lastPatch {
w.logger.Info("Skipping patching of EphemeralRunnerSet as the desired count has not changed", logValues...)
return nil
}
func (w *Worker) HandleDesiredRunnerCount(ctx context.Context, count, jobsCompleted int) (int, error) {
patchID := w.setDesiredWorkerState(count, jobsCompleted)
original, err := json.Marshal(
&v1alpha1.EphemeralRunnerSet{
Spec: v1alpha1.EphemeralRunnerSetSpec{
Replicas: -1,
PatchID: -1,
},
},
)
if err != nil {
return fmt.Errorf("failed to marshal empty ephemeral runner set: %w", err)
return 0, fmt.Errorf("failed to marshal empty ephemeral runner set: %w", err)
}
patch, err := json.Marshal(
&v1alpha1.EphemeralRunnerSet{
Spec: v1alpha1.EphemeralRunnerSetSpec{
Replicas: targetRunnerCount,
Replicas: w.lastPatch,
PatchID: patchID,
},
},
)
if err != nil {
w.logger.Error(err, "could not marshal patch ephemeral runner set")
return err
return 0, err
}
w.logger.Info("Compare", "original", string(original), "patch", string(patch))
mergePatch, err := jsonpatch.CreateMergePatch(original, patch)
if err != nil {
return fmt.Errorf("failed to create merge patch json for ephemeral runner set: %w", err)
return 0, fmt.Errorf("failed to create merge patch json for ephemeral runner set: %w", err)
}
w.logger.Info("Created merge patch json for EphemeralRunnerSet update", "json", string(mergePatch))
w.logger.Info("Scaling ephemeral runner set", logValues...)
w.logger.Info("Preparing EphemeralRunnerSet update", "json", string(mergePatch))
patchedEphemeralRunnerSet := &v1alpha1.EphemeralRunnerSet{}
err = w.clientset.RESTClient().
@@ -217,7 +210,7 @@ func (w *Worker) HandleDesiredRunnerCount(ctx context.Context, count int) error
Do(ctx).
Into(patchedEphemeralRunnerSet)
if err != nil {
return fmt.Errorf("could not patch ephemeral runner set , patch JSON: %s, error: %w", string(mergePatch), err)
return 0, fmt.Errorf("could not patch ephemeral runner set , patch JSON: %s, error: %w", string(mergePatch), err)
}
w.logger.Info("Ephemeral runner set scaled.",
@@ -225,5 +218,40 @@ func (w *Worker) HandleDesiredRunnerCount(ctx context.Context, count int) error
"name", w.config.EphemeralRunnerSetName,
"replicas", patchedEphemeralRunnerSet.Spec.Replicas,
)
return nil
return w.lastPatch, nil
}
// setDesiredWorkerState calculates the desired state of the worker based on the desired count and the number of jobs completed.
func (w *Worker) setDesiredWorkerState(count, jobsCompleted int) int {
// Max runners should always be set by the resource builder either to the configured value,
// or the maximum int32 (resourcebuilder.newAutoScalingListener()).
targetRunnerCount := min(w.config.MinRunners+count, w.config.MaxRunners)
w.patchSeq++
desiredPatchID := w.patchSeq
if count == 0 && jobsCompleted == 0 { // empty batch
targetRunnerCount = max(w.lastPatch, targetRunnerCount)
if targetRunnerCount == w.config.MinRunners {
// We have an empty batch, and the last patch was the min runners.
// Since this is an empty batch, and we are at the min runners, they should all be idle.
// If the controller accidentally created a few extra pods (during scale-down events),
// this allows it to scale back down to the min runners.
// However, it is important to keep the patch sequence increasing so we don't ignore a batch.
desiredPatchID = 0
}
}
w.lastPatch = targetRunnerCount
w.logger.Info(
"Calculated target runner count",
"assigned job", count,
"decision", targetRunnerCount,
"min", w.config.MinRunners,
"max", w.config.MaxRunners,
"currentRunnerCount", w.lastPatch,
"jobsCompleted", jobsCompleted,
)
return desiredPatchID
}
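
To make the patch bookkeeping above easier to follow, here is a small standalone sketch that reproduces the same arithmetic: the target replica count is clamped between the configured min and max runners, the patch sequence always advances, and the patch ID is forced to 0 when an empty batch lands exactly on the minimum. The scaler type and the batch values are illustrative stand-ins for the listener's Worker; the printed sequence matches the min-runners test case shown further down.

package main

import "fmt"

// scaler is an illustrative stand-in for the listener's Worker; it keeps just
// enough state to reproduce the patch bookkeeping shown above.
type scaler struct {
	minRunners, maxRunners int
	lastPatch, patchSeq    int
}

// setDesiredState mirrors setDesiredWorkerState: clamp the target between the
// configured min and max runners, always advance the patch sequence, and force
// patch ID 0 when an empty batch lands exactly on the minimum.
// Requires Go 1.21+ for the built-in min and max.
func (s *scaler) setDesiredState(count, jobsCompleted int) (target, patchID int) {
	target = min(s.minRunners+count, s.maxRunners)
	s.patchSeq++
	patchID = s.patchSeq
	if count == 0 && jobsCompleted == 0 { // empty batch
		target = max(s.lastPatch, target)
		if target == s.minRunners {
			patchID = 0 // ask the controller to settle back on the minimum
		}
	}
	s.lastPatch = target
	return target, patchID
}

func main() {
	s := &scaler{minRunners: 1, maxRunners: 1<<31 - 1, lastPatch: -1, patchSeq: -1}
	for _, batch := range [][2]int{{3, 0}, {0, 3}, {0, 0}} {
		target, patchID := s.setDesiredState(batch[0], batch[1])
		fmt.Printf("assigned=%d completed=%d -> replicas=%d patchID=%d\n", batch[0], batch[1], target, patchID)
	}
	// Prints:
	// assigned=3 completed=0 -> replicas=4 patchID=0
	// assigned=0 completed=3 -> replicas=1 patchID=1
	// assigned=0 completed=0 -> replicas=1 patchID=0
}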

View File

@@ -0,0 +1,326 @@
package worker
import (
"math"
"testing"
"github.com/go-logr/logr"
"github.com/stretchr/testify/assert"
)
func TestSetDesiredWorkerState_MinMaxDefaults(t *testing.T) {
logger := logr.Discard()
newEmptyWorker := func() *Worker {
return &Worker{
config: Config{
MinRunners: 0,
MaxRunners: math.MaxInt32,
},
lastPatch: -1,
patchSeq: -1,
logger: &logger,
}
}
t.Run("init calculate with acquired 0", func(t *testing.T) {
w := newEmptyWorker()
patchID := w.setDesiredWorkerState(0, 0)
assert.Equal(t, 0, w.lastPatch)
assert.Equal(t, 0, w.patchSeq)
assert.Equal(t, 0, patchID)
})
t.Run("init calculate with acquired 1", func(t *testing.T) {
w := newEmptyWorker()
patchID := w.setDesiredWorkerState(1, 0)
assert.Equal(t, 1, w.lastPatch)
assert.Equal(t, 0, w.patchSeq)
assert.Equal(t, 0, patchID)
})
t.Run("increment patch when job done", func(t *testing.T) {
w := newEmptyWorker()
patchID := w.setDesiredWorkerState(1, 0)
assert.Equal(t, 0, patchID)
patchID = w.setDesiredWorkerState(0, 1)
assert.Equal(t, 1, patchID)
assert.Equal(t, 0, w.lastPatch)
assert.Equal(t, 1, w.patchSeq)
})
t.Run("increment patch when called with same parameters", func(t *testing.T) {
w := newEmptyWorker()
patchID := w.setDesiredWorkerState(1, 0)
assert.Equal(t, 0, patchID)
patchID = w.setDesiredWorkerState(1, 0)
assert.Equal(t, 1, patchID)
assert.Equal(t, 1, w.lastPatch)
assert.Equal(t, 1, w.patchSeq)
})
t.Run("calculate desired scale when acquired > 0 and completed > 0", func(t *testing.T) {
w := newEmptyWorker()
patchID := w.setDesiredWorkerState(1, 1)
assert.Equal(t, 0, patchID)
assert.Equal(t, 1, w.lastPatch)
assert.Equal(t, 0, w.patchSeq)
})
t.Run("re-use the last state when acquired == 0 and completed == 0", func(t *testing.T) {
w := newEmptyWorker()
patchID := w.setDesiredWorkerState(1, 0)
assert.Equal(t, 0, patchID)
patchID = w.setDesiredWorkerState(0, 0)
assert.Equal(t, 1, patchID)
assert.Equal(t, 1, w.lastPatch)
assert.Equal(t, 1, w.patchSeq)
})
t.Run("adjust when acquired == 0 and completed == 1", func(t *testing.T) {
w := newEmptyWorker()
patchID := w.setDesiredWorkerState(1, 1)
assert.Equal(t, 0, patchID)
patchID = w.setDesiredWorkerState(0, 1)
assert.Equal(t, 1, patchID)
assert.Equal(t, 0, w.lastPatch)
assert.Equal(t, 1, w.patchSeq)
})
}
func TestSetDesiredWorkerState_MinSet(t *testing.T) {
logger := logr.Discard()
newEmptyWorker := func() *Worker {
return &Worker{
config: Config{
MinRunners: 1,
MaxRunners: math.MaxInt32,
},
lastPatch: -1,
patchSeq: -1,
logger: &logger,
}
}
t.Run("initial scale when acquired == 0 and completed == 0", func(t *testing.T) {
w := newEmptyWorker()
patchID := w.setDesiredWorkerState(0, 0)
assert.Equal(t, 0, patchID)
assert.Equal(t, 1, w.lastPatch)
assert.Equal(t, 0, w.patchSeq)
})
t.Run("re-use the old state on count == 0 and completed == 0", func(t *testing.T) {
w := newEmptyWorker()
patchID := w.setDesiredWorkerState(2, 0)
assert.Equal(t, 0, patchID)
patchID = w.setDesiredWorkerState(0, 0)
assert.Equal(t, 1, patchID)
assert.Equal(t, 3, w.lastPatch)
assert.Equal(t, 1, w.patchSeq)
})
t.Run("request back to 0 on job done", func(t *testing.T) {
w := newEmptyWorker()
patchID := w.setDesiredWorkerState(2, 0)
assert.Equal(t, 0, patchID)
patchID = w.setDesiredWorkerState(0, 1)
assert.Equal(t, 1, patchID)
assert.Equal(t, 1, w.lastPatch)
assert.Equal(t, 1, w.patchSeq)
})
t.Run("desired patch is 0 but sequence continues on empty batch and min runners", func(t *testing.T) {
w := newEmptyWorker()
patchID := w.setDesiredWorkerState(3, 0)
assert.Equal(t, 0, patchID)
assert.Equal(t, 4, w.lastPatch)
assert.Equal(t, 0, w.patchSeq)
patchID = w.setDesiredWorkerState(0, 3)
assert.Equal(t, 1, patchID)
assert.Equal(t, 1, w.lastPatch)
assert.Equal(t, 1, w.patchSeq)
// Empty batch on min runners
patchID = w.setDesiredWorkerState(0, 0)
assert.Equal(t, 0, patchID) // forcing the state
assert.Equal(t, 1, w.lastPatch)
assert.Equal(t, 2, w.patchSeq)
})
}
func TestSetDesiredWorkerState_MaxSet(t *testing.T) {
logger := logr.Discard()
newEmptyWorker := func() *Worker {
return &Worker{
config: Config{
MinRunners: 0,
MaxRunners: 5,
},
lastPatch: -1,
patchSeq: -1,
logger: &logger,
}
}
t.Run("initial scale when acquired == 0 and completed == 0", func(t *testing.T) {
w := newEmptyWorker()
patchID := w.setDesiredWorkerState(0, 0)
assert.Equal(t, 0, patchID)
assert.Equal(t, 0, w.lastPatch)
assert.Equal(t, 0, w.patchSeq)
})
t.Run("re-use the old state on count == 0 and completed == 0", func(t *testing.T) {
w := newEmptyWorker()
patchID := w.setDesiredWorkerState(2, 0)
assert.Equal(t, 0, patchID)
patchID = w.setDesiredWorkerState(0, 0)
assert.Equal(t, 1, patchID)
assert.Equal(t, 2, w.lastPatch)
assert.Equal(t, 1, w.patchSeq)
})
t.Run("request back to 0 on job done", func(t *testing.T) {
w := newEmptyWorker()
patchID := w.setDesiredWorkerState(2, 0)
assert.Equal(t, 0, patchID)
patchID = w.setDesiredWorkerState(0, 1)
assert.Equal(t, 1, patchID)
assert.Equal(t, 0, w.lastPatch)
assert.Equal(t, 1, w.patchSeq)
})
t.Run("scale up to max when count > max", func(t *testing.T) {
w := newEmptyWorker()
patchID := w.setDesiredWorkerState(6, 0)
assert.Equal(t, 0, patchID)
assert.Equal(t, 5, w.lastPatch)
assert.Equal(t, 0, w.patchSeq)
})
t.Run("scale to max when count == max", func(t *testing.T) {
w := newEmptyWorker()
w.setDesiredWorkerState(5, 0)
assert.Equal(t, 5, w.lastPatch)
assert.Equal(t, 0, w.patchSeq)
})
t.Run("scale to max when count > max and completed > 0", func(t *testing.T) {
w := newEmptyWorker()
patchID := w.setDesiredWorkerState(1, 0)
assert.Equal(t, 0, patchID)
patchID = w.setDesiredWorkerState(6, 1)
assert.Equal(t, 1, patchID)
assert.Equal(t, 5, w.lastPatch)
assert.Equal(t, 1, w.patchSeq)
})
t.Run("scale back to 0 when count was > max", func(t *testing.T) {
w := newEmptyWorker()
patchID := w.setDesiredWorkerState(6, 0)
assert.Equal(t, 0, patchID)
patchID = w.setDesiredWorkerState(0, 1)
assert.Equal(t, 1, patchID)
assert.Equal(t, 0, w.lastPatch)
assert.Equal(t, 1, w.patchSeq)
})
t.Run("force 0 on empty batch and last patch == min runners", func(t *testing.T) {
w := newEmptyWorker()
patchID := w.setDesiredWorkerState(3, 0)
assert.Equal(t, 0, patchID)
assert.Equal(t, 3, w.lastPatch)
assert.Equal(t, 0, w.patchSeq)
patchID = w.setDesiredWorkerState(0, 3)
assert.Equal(t, 1, patchID)
assert.Equal(t, 0, w.lastPatch)
assert.Equal(t, 1, w.patchSeq)
// Empty batch on min runners
patchID = w.setDesiredWorkerState(0, 0)
assert.Equal(t, 0, patchID) // forcing the state
assert.Equal(t, 0, w.lastPatch)
assert.Equal(t, 2, w.patchSeq)
})
}
func TestSetDesiredWorkerState_MinMaxSet(t *testing.T) {
logger := logr.Discard()
newEmptyWorker := func() *Worker {
return &Worker{
config: Config{
MinRunners: 1,
MaxRunners: 3,
},
lastPatch: -1,
patchSeq: -1,
logger: &logger,
}
}
t.Run("initial scale when acquired == 0 and completed == 0", func(t *testing.T) {
w := newEmptyWorker()
patchID := w.setDesiredWorkerState(0, 0)
assert.Equal(t, 0, patchID)
assert.Equal(t, 1, w.lastPatch)
assert.Equal(t, 0, w.patchSeq)
})
t.Run("re-use the old state on count == 0 and completed == 0", func(t *testing.T) {
w := newEmptyWorker()
patchID := w.setDesiredWorkerState(2, 0)
assert.Equal(t, 0, patchID)
patchID = w.setDesiredWorkerState(0, 0)
assert.Equal(t, 1, patchID)
assert.Equal(t, 3, w.lastPatch)
assert.Equal(t, 1, w.patchSeq)
})
t.Run("scale to min when count == 0", func(t *testing.T) {
w := newEmptyWorker()
patchID := w.setDesiredWorkerState(2, 0)
assert.Equal(t, 0, patchID)
patchID = w.setDesiredWorkerState(0, 1)
assert.Equal(t, 1, patchID)
assert.Equal(t, 1, w.lastPatch)
assert.Equal(t, 1, w.patchSeq)
})
t.Run("scale up to max when count > max", func(t *testing.T) {
w := newEmptyWorker()
patchID := w.setDesiredWorkerState(4, 0)
assert.Equal(t, 0, patchID)
assert.Equal(t, 3, w.lastPatch)
assert.Equal(t, 0, w.patchSeq)
})
t.Run("scale to max when count == max", func(t *testing.T) {
w := newEmptyWorker()
patchID := w.setDesiredWorkerState(3, 0)
assert.Equal(t, 0, patchID)
assert.Equal(t, 3, w.lastPatch)
assert.Equal(t, 0, w.patchSeq)
})
t.Run("force 0 on empty batch and last patch == min runners", func(t *testing.T) {
w := newEmptyWorker()
patchID := w.setDesiredWorkerState(3, 0)
assert.Equal(t, 0, patchID)
assert.Equal(t, 3, w.lastPatch)
assert.Equal(t, 0, w.patchSeq)
patchID = w.setDesiredWorkerState(0, 3)
assert.Equal(t, 1, patchID)
assert.Equal(t, 1, w.lastPatch)
assert.Equal(t, 1, w.patchSeq)
// Empty batch on min runners
patchID = w.setDesiredWorkerState(0, 0)
assert.Equal(t, 0, patchID) // forcing the state
assert.Equal(t, 1, w.lastPatch)
assert.Equal(t, 2, w.patchSeq)
})
}

View File

@@ -129,7 +129,7 @@ func (m *AutoScalerClient) Close() error {
return m.client.Close()
}
func (m *AutoScalerClient) GetRunnerScaleSetMessage(ctx context.Context, handler func(msg *actions.RunnerScaleSetMessage) error) error {
func (m *AutoScalerClient) GetRunnerScaleSetMessage(ctx context.Context, handler func(msg *actions.RunnerScaleSetMessage) error, maxCapacity int) error {
if m.initialMessage != nil {
err := handler(m.initialMessage)
if err != nil {
@@ -141,7 +141,7 @@ func (m *AutoScalerClient) GetRunnerScaleSetMessage(ctx context.Context, handler
}
for {
message, err := m.client.GetMessage(ctx, m.lastMessageId)
message, err := m.client.GetMessage(ctx, m.lastMessageId, maxCapacity)
if err != nil {
return fmt.Errorf("get message failed from refreshing client. %w", err)
}

View File

@@ -317,7 +317,7 @@ func TestGetRunnerScaleSetMessage(t *testing.T) {
Statistics: &actions.RunnerScaleSetStatistic{},
}
mockActionsClient.On("CreateMessageSession", ctx, 1, mock.Anything).Return(session, nil)
mockSessionClient.On("GetMessage", ctx, int64(0)).Return(&actions.RunnerScaleSetMessage{
mockSessionClient.On("GetMessage", ctx, int64(0), mock.Anything).Return(&actions.RunnerScaleSetMessage{
MessageId: 1,
MessageType: "test",
Body: "test",
@@ -332,7 +332,7 @@ func TestGetRunnerScaleSetMessage(t *testing.T) {
err = asClient.GetRunnerScaleSetMessage(ctx, func(msg *actions.RunnerScaleSetMessage) error {
logger.Info("Message received", "messageId", msg.MessageId, "messageType", msg.MessageType, "body", msg.Body)
return nil
})
}, 10)
assert.NoError(t, err, "Error getting message")
assert.Equal(t, int64(0), asClient.lastMessageId, "Initial message")
@@ -340,7 +340,7 @@ func TestGetRunnerScaleSetMessage(t *testing.T) {
err = asClient.GetRunnerScaleSetMessage(ctx, func(msg *actions.RunnerScaleSetMessage) error {
logger.Info("Message received", "messageId", msg.MessageId, "messageType", msg.MessageType, "body", msg.Body)
return nil
})
}, 10)
assert.NoError(t, err, "Error getting message")
assert.Equal(t, int64(1), asClient.lastMessageId, "Last message id should be updated")
@@ -368,7 +368,7 @@ func TestGetRunnerScaleSetMessage_HandleFailed(t *testing.T) {
Statistics: &actions.RunnerScaleSetStatistic{},
}
mockActionsClient.On("CreateMessageSession", ctx, 1, mock.Anything).Return(session, nil)
mockSessionClient.On("GetMessage", ctx, int64(0)).Return(&actions.RunnerScaleSetMessage{
mockSessionClient.On("GetMessage", ctx, int64(0), mock.Anything).Return(&actions.RunnerScaleSetMessage{
MessageId: 1,
MessageType: "test",
Body: "test",
@@ -383,14 +383,14 @@ func TestGetRunnerScaleSetMessage_HandleFailed(t *testing.T) {
err = asClient.GetRunnerScaleSetMessage(ctx, func(msg *actions.RunnerScaleSetMessage) error {
logger.Info("Message received", "messageId", msg.MessageId, "messageType", msg.MessageType, "body", msg.Body)
return nil
})
}, 10)
assert.NoError(t, err, "Error getting message")
err = asClient.GetRunnerScaleSetMessage(ctx, func(msg *actions.RunnerScaleSetMessage) error {
logger.Info("Message received", "messageId", msg.MessageId, "messageType", msg.MessageType, "body", msg.Body)
return fmt.Errorf("error")
})
}, 10)
assert.ErrorContains(t, err, "handle message failed. error", "Error getting message")
assert.Equal(t, int64(0), asClient.lastMessageId, "Last message id should not be updated")
@@ -419,7 +419,7 @@ func TestGetRunnerScaleSetMessage_HandleInitialMessage(t *testing.T) {
TotalAssignedJobs: 2,
},
}
mockActionsClient.On("CreateMessageSession", ctx, 1, mock.Anything).Return(session, nil)
mockActionsClient.On("CreateMessageSession", ctx, 1, mock.Anything, mock.Anything).Return(session, nil)
mockActionsClient.On("GetAcquirableJobs", ctx, 1).Return(&actions.AcquirableJobList{
Count: 1,
Jobs: []actions.AcquirableJob{
@@ -439,7 +439,7 @@ func TestGetRunnerScaleSetMessage_HandleInitialMessage(t *testing.T) {
err = asClient.GetRunnerScaleSetMessage(ctx, func(msg *actions.RunnerScaleSetMessage) error {
logger.Info("Message received", "messageId", msg.MessageId, "messageType", msg.MessageType, "body", msg.Body)
return nil
})
}, 10)
assert.NoError(t, err, "Error getting message")
assert.Nil(t, asClient.initialMessage, "Initial message should be nil")
@@ -488,7 +488,7 @@ func TestGetRunnerScaleSetMessage_HandleInitialMessageFailed(t *testing.T) {
err = asClient.GetRunnerScaleSetMessage(ctx, func(msg *actions.RunnerScaleSetMessage) error {
logger.Info("Message received", "messageId", msg.MessageId, "messageType", msg.MessageType, "body", msg.Body)
return fmt.Errorf("error")
})
}, 10)
assert.ErrorContains(t, err, "fail to process initial message. error", "Error getting message")
assert.NotNil(t, asClient.initialMessage, "Initial message should be nil")
@@ -516,8 +516,8 @@ func TestGetRunnerScaleSetMessage_RetryUntilGetMessage(t *testing.T) {
Statistics: &actions.RunnerScaleSetStatistic{},
}
mockActionsClient.On("CreateMessageSession", ctx, 1, mock.Anything).Return(session, nil)
mockSessionClient.On("GetMessage", ctx, int64(0)).Return(nil, nil).Times(3)
mockSessionClient.On("GetMessage", ctx, int64(0)).Return(&actions.RunnerScaleSetMessage{
mockSessionClient.On("GetMessage", ctx, int64(0), mock.Anything).Return(nil, nil).Times(3)
mockSessionClient.On("GetMessage", ctx, int64(0), mock.Anything).Return(&actions.RunnerScaleSetMessage{
MessageId: 1,
MessageType: "test",
Body: "test",
@@ -532,13 +532,13 @@ func TestGetRunnerScaleSetMessage_RetryUntilGetMessage(t *testing.T) {
err = asClient.GetRunnerScaleSetMessage(ctx, func(msg *actions.RunnerScaleSetMessage) error {
logger.Info("Message received", "messageId", msg.MessageId, "messageType", msg.MessageType, "body", msg.Body)
return nil
})
}, 10)
assert.NoError(t, err, "Error getting initial message")
err = asClient.GetRunnerScaleSetMessage(ctx, func(msg *actions.RunnerScaleSetMessage) error {
logger.Info("Message received", "messageId", msg.MessageId, "messageType", msg.MessageType, "body", msg.Body)
return nil
})
}, 10)
assert.NoError(t, err, "Error getting message")
assert.Equal(t, int64(1), asClient.lastMessageId, "Last message id should be updated")
@@ -565,7 +565,7 @@ func TestGetRunnerScaleSetMessage_ErrorOnGetMessage(t *testing.T) {
Statistics: &actions.RunnerScaleSetStatistic{},
}
mockActionsClient.On("CreateMessageSession", ctx, 1, mock.Anything).Return(session, nil)
mockSessionClient.On("GetMessage", ctx, int64(0)).Return(nil, fmt.Errorf("error"))
mockSessionClient.On("GetMessage", ctx, int64(0), mock.Anything).Return(nil, fmt.Errorf("error"))
asClient, err := NewAutoScalerClient(ctx, mockActionsClient, &logger, 1, func(asc *AutoScalerClient) {
asc.client = mockSessionClient
@@ -575,12 +575,12 @@ func TestGetRunnerScaleSetMessage_ErrorOnGetMessage(t *testing.T) {
// process initial message
err = asClient.GetRunnerScaleSetMessage(ctx, func(msg *actions.RunnerScaleSetMessage) error {
return nil
})
}, 10)
assert.NoError(t, err, "Error getting initial message")
err = asClient.GetRunnerScaleSetMessage(ctx, func(msg *actions.RunnerScaleSetMessage) error {
return fmt.Errorf("Should not be called")
})
}, 10)
assert.ErrorContains(t, err, "get message failed from refreshing client. error", "Error should be returned")
assert.Equal(t, int64(0), asClient.lastMessageId, "Last message id should be updated")
@@ -608,7 +608,7 @@ func TestDeleteRunnerScaleSetMessage_Error(t *testing.T) {
Statistics: &actions.RunnerScaleSetStatistic{},
}
mockActionsClient.On("CreateMessageSession", ctx, 1, mock.Anything).Return(session, nil)
mockSessionClient.On("GetMessage", ctx, int64(0)).Return(&actions.RunnerScaleSetMessage{
mockSessionClient.On("GetMessage", ctx, int64(0), mock.Anything).Return(&actions.RunnerScaleSetMessage{
MessageId: 1,
MessageType: "test",
Body: "test",
@@ -623,13 +623,13 @@ func TestDeleteRunnerScaleSetMessage_Error(t *testing.T) {
err = asClient.GetRunnerScaleSetMessage(ctx, func(msg *actions.RunnerScaleSetMessage) error {
logger.Info("Message received", "messageId", msg.MessageId, "messageType", msg.MessageType, "body", msg.Body)
return nil
})
}, 10)
assert.NoError(t, err, "Error getting initial message")
err = asClient.GetRunnerScaleSetMessage(ctx, func(msg *actions.RunnerScaleSetMessage) error {
logger.Info("Message received", "messageId", msg.MessageId, "messageType", msg.MessageType, "body", msg.Body)
return nil
})
}, 10)
assert.ErrorContains(t, err, "delete message failed from refreshing client. error", "Error getting message")
assert.Equal(t, int64(1), asClient.lastMessageId, "Last message id should be updated")

View File

@@ -89,7 +89,7 @@ func (s *Service) Start() error {
s.logger.Info("service is stopped.")
return nil
default:
err := s.rsClient.GetRunnerScaleSetMessage(s.ctx, s.processMessage)
err := s.rsClient.GetRunnerScaleSetMessage(s.ctx, s.processMessage, s.settings.MaxRunners)
if err != nil {
return fmt.Errorf("could not get and process message. %w", err)
}
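
The Start loop above follows a standard cancellable-poll shape: check the context in a select, otherwise fetch and process the next message, and bubble up any processing error. A self-contained sketch of that shape under hypothetical names (runLoop and getAndProcess are stand-ins for illustration, not the service's real API):

package main

import (
	"context"
	"fmt"
	"time"
)

// runLoop mirrors the shape of Service.Start: keep fetching and processing
// messages until the context is cancelled, and bubble up any processing error.
func runLoop(ctx context.Context, getAndProcess func(context.Context) error) error {
	for {
		select {
		case <-ctx.Done():
			fmt.Println("service is stopped.")
			return nil
		default:
			if err := getAndProcess(ctx); err != nil {
				return fmt.Errorf("could not get and process message. %w", err)
			}
		}
	}
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 50*time.Millisecond)
	defer cancel()

	batches := 0
	_ = runLoop(ctx, func(context.Context) error {
		batches++
		time.Sleep(10 * time.Millisecond) // stand-in for a long-poll on the message queue
		return nil
	})
	fmt.Println("processed batches:", batches)
}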

View File

@@ -64,7 +64,7 @@ func TestStart(t *testing.T) {
)
require.NoError(t, err)
mockRsClient.On("GetRunnerScaleSetMessage", service.ctx, mock.Anything).Run(func(args mock.Arguments) { cancel() }).Return(nil).Once()
mockRsClient.On("GetRunnerScaleSetMessage", service.ctx, mock.Anything, mock.Anything).Run(func(mock.Arguments) { cancel() }).Return(nil).Once()
err = service.Start()
@@ -98,7 +98,7 @@ func TestStart_ScaleToMinRunners(t *testing.T) {
)
require.NoError(t, err)
mockRsClient.On("GetRunnerScaleSetMessage", ctx, mock.Anything).Run(func(args mock.Arguments) {
mockRsClient.On("GetRunnerScaleSetMessage", ctx, mock.Anything, mock.Anything).Run(func(args mock.Arguments) {
_ = service.scaleForAssignedJobCount(5)
}).Return(nil)
@@ -137,7 +137,7 @@ func TestStart_ScaleToMinRunnersFailed(t *testing.T) {
require.NoError(t, err)
c := mockKubeManager.On("ScaleEphemeralRunnerSet", ctx, service.settings.Namespace, service.settings.ResourceName, 5).Return(fmt.Errorf("error")).Once()
mockRsClient.On("GetRunnerScaleSetMessage", ctx, mock.Anything).Run(func(args mock.Arguments) {
mockRsClient.On("GetRunnerScaleSetMessage", ctx, mock.Anything, mock.Anything).Run(func(args mock.Arguments) {
_ = service.scaleForAssignedJobCount(5)
}).Return(c.ReturnArguments.Get(0))
@@ -172,8 +172,8 @@ func TestStart_GetMultipleMessages(t *testing.T) {
)
require.NoError(t, err)
mockRsClient.On("GetRunnerScaleSetMessage", service.ctx, mock.Anything).Return(nil).Times(5)
mockRsClient.On("GetRunnerScaleSetMessage", service.ctx, mock.Anything).Run(func(args mock.Arguments) { cancel() }).Return(nil).Once()
mockRsClient.On("GetRunnerScaleSetMessage", service.ctx, mock.Anything, mock.Anything).Return(nil).Times(5)
mockRsClient.On("GetRunnerScaleSetMessage", service.ctx, mock.Anything, mock.Anything).Run(func(args mock.Arguments) { cancel() }).Return(nil).Once()
err = service.Start()
@@ -207,8 +207,8 @@ func TestStart_ErrorOnMessage(t *testing.T) {
)
require.NoError(t, err)
mockRsClient.On("GetRunnerScaleSetMessage", service.ctx, mock.Anything).Return(nil).Times(2)
mockRsClient.On("GetRunnerScaleSetMessage", service.ctx, mock.Anything).Return(fmt.Errorf("error")).Once()
mockRsClient.On("GetRunnerScaleSetMessage", service.ctx, mock.Anything, mock.Anything).Return(nil).Times(2)
mockRsClient.On("GetRunnerScaleSetMessage", service.ctx, mock.Anything, mock.Anything).Return(fmt.Errorf("error")).Once()
err = service.Start()

View File

@@ -8,6 +8,6 @@ import (
//go:generate mockery --inpackage --name=RunnerScaleSetClient
type RunnerScaleSetClient interface {
GetRunnerScaleSetMessage(ctx context.Context, handler func(msg *actions.RunnerScaleSetMessage) error) error
GetRunnerScaleSetMessage(ctx context.Context, handler func(msg *actions.RunnerScaleSetMessage) error, maxCapacity int) error
AcquireJobsForRunnerScaleSet(ctx context.Context, requestIds []int64) error
}

View File

@@ -29,13 +29,13 @@ func (_m *MockRunnerScaleSetClient) AcquireJobsForRunnerScaleSet(ctx context.Con
return r0
}
// GetRunnerScaleSetMessage provides a mock function with given fields: ctx, handler
func (_m *MockRunnerScaleSetClient) GetRunnerScaleSetMessage(ctx context.Context, handler func(*actions.RunnerScaleSetMessage) error) error {
ret := _m.Called(ctx, handler)
// GetRunnerScaleSetMessage provides a mock function with given fields: ctx, handler, maxCapacity
func (_m *MockRunnerScaleSetClient) GetRunnerScaleSetMessage(ctx context.Context, handler func(*actions.RunnerScaleSetMessage) error, maxCapacity int) error {
ret := _m.Called(ctx, handler, maxCapacity)
var r0 error
if rf, ok := ret.Get(0).(func(context.Context, func(*actions.RunnerScaleSetMessage) error) error); ok {
r0 = rf(ctx, handler)
if rf, ok := ret.Get(0).(func(context.Context, func(*actions.RunnerScaleSetMessage) error, int) error); ok {
r0 = rf(ctx, handler, maxCapacity)
} else {
r0 = ret.Error(0)
}

View File

@@ -24,8 +24,12 @@ func newSessionClient(client actions.ActionsService, logger *logr.Logger, sessio
}
}
func (m *SessionRefreshingClient) GetMessage(ctx context.Context, lastMessageId int64) (*actions.RunnerScaleSetMessage, error) {
message, err := m.client.GetMessage(ctx, m.session.MessageQueueUrl, m.session.MessageQueueAccessToken, lastMessageId)
func (m *SessionRefreshingClient) GetMessage(ctx context.Context, lastMessageId int64, maxCapacity int) (*actions.RunnerScaleSetMessage, error) {
if maxCapacity < 0 {
return nil, fmt.Errorf("maxCapacity must be greater than or equal to 0")
}
message, err := m.client.GetMessage(ctx, m.session.MessageQueueUrl, m.session.MessageQueueAccessToken, lastMessageId, maxCapacity)
if err == nil {
return message, nil
}
@@ -42,7 +46,7 @@ func (m *SessionRefreshingClient) GetMessage(ctx context.Context, lastMessageId
}
m.session = session
message, err = m.client.GetMessage(ctx, m.session.MessageQueueUrl, m.session.MessageQueueAccessToken, lastMessageId)
message, err = m.client.GetMessage(ctx, m.session.MessageQueueUrl, m.session.MessageQueueAccessToken, lastMessageId, maxCapacity)
if err != nil {
return nil, fmt.Errorf("delete message failed after refresh message session. %w", err)
}
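
The change above keeps the session client's refresh-then-retry shape while threading maxCapacity through both attempts: try the call, and only on a token-expiry error refresh the message session and retry once. A self-contained sketch of that shape follows; tokenExpiredError, queueClient, and fakeClient are hypothetical stand-ins for the actions service types, not the repository's API.

package main

import (
	"context"
	"errors"
	"fmt"
)

// tokenExpiredError stands in for the actions client's token-expiry error type.
type tokenExpiredError struct{}

func (tokenExpiredError) Error() string { return "message queue token expired" }

type session struct{ url, token string }

// queueClient is a hypothetical interface standing in for the actions service.
type queueClient interface {
	GetMessage(ctx context.Context, url, token string, lastMessageID int64, maxCapacity int) (string, error)
	RefreshSession(ctx context.Context) (*session, error)
}

// getMessageWithRefresh retries exactly once, and only when the first attempt
// failed because the message queue token expired.
func getMessageWithRefresh(ctx context.Context, c queueClient, s *session, lastID int64, maxCapacity int) (string, error) {
	if maxCapacity < 0 {
		return "", fmt.Errorf("maxCapacity must be greater than or equal to 0")
	}
	msg, err := c.GetMessage(ctx, s.url, s.token, lastID, maxCapacity)
	if err == nil {
		return msg, nil
	}
	var expired tokenExpiredError
	if !errors.As(err, &expired) {
		return "", fmt.Errorf("get message failed. %w", err)
	}
	refreshed, err := c.RefreshSession(ctx)
	if err != nil {
		return "", fmt.Errorf("refresh message session failed. %w", err)
	}
	*s = *refreshed
	msg, err = c.GetMessage(ctx, s.url, s.token, lastID, maxCapacity)
	if err != nil {
		return "", fmt.Errorf("get message failed after refreshing the session. %w", err)
	}
	return msg, nil
}

// fakeClient expires the first token and succeeds once it sees the fresh one.
type fakeClient struct{}

func (fakeClient) GetMessage(_ context.Context, _, token string, _ int64, _ int) (string, error) {
	if token == "stale" {
		return "", tokenExpiredError{}
	}
	return "hello from the queue", nil
}

func (fakeClient) RefreshSession(context.Context) (*session, error) {
	return &session{url: "https://example.invalid/queue", token: "fresh"}, nil
}

func main() {
	s := &session{url: "https://example.invalid/queue", token: "stale"}
	msg, err := getMessageWithRefresh(context.Background(), fakeClient{}, s, 0, 10)
	fmt.Println(msg, err) // "hello from the queue <nil>" after a single refresh
}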

View File

@@ -31,17 +31,17 @@ func TestGetMessage(t *testing.T) {
},
}
mockActionsClient.On("GetMessage", ctx, session.MessageQueueUrl, session.MessageQueueAccessToken, int64(0)).Return(nil, nil).Once()
mockActionsClient.On("GetMessage", ctx, session.MessageQueueUrl, session.MessageQueueAccessToken, int64(0)).Return(&actions.RunnerScaleSetMessage{MessageId: 1}, nil).Once()
mockActionsClient.On("GetMessage", ctx, session.MessageQueueUrl, session.MessageQueueAccessToken, int64(0), 10).Return(nil, nil).Once()
mockActionsClient.On("GetMessage", ctx, session.MessageQueueUrl, session.MessageQueueAccessToken, int64(0), 10).Return(&actions.RunnerScaleSetMessage{MessageId: 1}, nil).Once()
client := newSessionClient(mockActionsClient, &logger, session)
msg, err := client.GetMessage(ctx, 0)
msg, err := client.GetMessage(ctx, 0, 10)
require.NoError(t, err, "GetMessage should not return an error")
assert.Nil(t, msg, "GetMessage should return nil message")
msg, err = client.GetMessage(ctx, 0)
msg, err = client.GetMessage(ctx, 0, 10)
require.NoError(t, err, "GetMessage should not return an error")
assert.Equal(t, int64(1), msg.MessageId, "GetMessage should return a message with id 1")
@@ -146,11 +146,11 @@ func TestGetMessage_Error(t *testing.T) {
},
}
mockActionsClient.On("GetMessage", ctx, session.MessageQueueUrl, session.MessageQueueAccessToken, int64(0)).Return(nil, fmt.Errorf("error")).Once()
mockActionsClient.On("GetMessage", ctx, session.MessageQueueUrl, session.MessageQueueAccessToken, int64(0), 10).Return(nil, fmt.Errorf("error")).Once()
client := newSessionClient(mockActionsClient, &logger, session)
msg, err := client.GetMessage(ctx, 0)
msg, err := client.GetMessage(ctx, 0, 10)
assert.ErrorContains(t, err, "get message failed. error", "GetMessage should return an error")
assert.Nil(t, msg, "GetMessage should return nil message")
assert.True(t, mockActionsClient.AssertExpectations(t), "All expected calls to mockActionsClient should have been made")
@@ -227,8 +227,8 @@ func TestGetMessage_RefreshToken(t *testing.T) {
Id: 1,
},
}
mockActionsClient.On("GetMessage", ctx, session.MessageQueueUrl, session.MessageQueueAccessToken, int64(0)).Return(nil, &actions.MessageQueueTokenExpiredError{}).Once()
mockActionsClient.On("GetMessage", ctx, session.MessageQueueUrl, "token2", int64(0)).Return(&actions.RunnerScaleSetMessage{
mockActionsClient.On("GetMessage", ctx, session.MessageQueueUrl, session.MessageQueueAccessToken, int64(0), 10).Return(nil, &actions.MessageQueueTokenExpiredError{}).Once()
mockActionsClient.On("GetMessage", ctx, session.MessageQueueUrl, "token2", int64(0), 10).Return(&actions.RunnerScaleSetMessage{
MessageId: 1,
MessageType: "test",
Body: "test",
@@ -243,7 +243,7 @@ func TestGetMessage_RefreshToken(t *testing.T) {
}, nil).Once()
client := newSessionClient(mockActionsClient, &logger, session)
msg, err := client.GetMessage(ctx, 0)
msg, err := client.GetMessage(ctx, 0, 10)
assert.NoError(t, err, "Error getting message")
assert.Equal(t, int64(1), msg.MessageId, "message id should be updated")
assert.Equal(t, "token2", client.session.MessageQueueAccessToken, "Message queue access token should be updated")
@@ -340,11 +340,11 @@ func TestGetMessage_RefreshToken_Failed(t *testing.T) {
Id: 1,
},
}
mockActionsClient.On("GetMessage", ctx, session.MessageQueueUrl, session.MessageQueueAccessToken, int64(0)).Return(nil, &actions.MessageQueueTokenExpiredError{}).Once()
mockActionsClient.On("GetMessage", ctx, session.MessageQueueUrl, session.MessageQueueAccessToken, int64(0), 10).Return(nil, &actions.MessageQueueTokenExpiredError{}).Once()
mockActionsClient.On("RefreshMessageSession", ctx, session.RunnerScaleSet.Id, session.SessionId).Return(nil, fmt.Errorf("error"))
client := newSessionClient(mockActionsClient, &logger, session)
msg, err := client.GetMessage(ctx, 0)
msg, err := client.GetMessage(ctx, 0, 10)
assert.ErrorContains(t, err, "refresh message session failed. error", "Error should be returned")
assert.Nil(t, msg, "Message should be nil")
assert.Equal(t, "token", client.session.MessageQueueAccessToken, "Message queue access token should not be updated")

File diff suppressed because it is too large

File diff suppressed because it is too large

File diff suppressed because it is too large

File diff suppressed because it is too large

View File

@@ -3,7 +3,7 @@ apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
annotations:
controller-gen.kubebuilder.io/version: v0.13.0
controller-gen.kubebuilder.io/version: v0.14.0
name: horizontalrunnerautoscalers.actions.summerwind.dev
spec:
group: actions.summerwind.dev
@@ -35,10 +35,19 @@ spec:
description: HorizontalRunnerAutoscaler is the Schema for the horizontalrunnerautoscaler API
properties:
apiVersion:
description: 'APIVersion defines the versioned schema of this representation of an object. Servers should convert recognized schemas to the latest internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources'
description: |-
APIVersion defines the versioned schema of this representation of an object.
Servers should convert recognized schemas to the latest internal value, and
may reject unrecognized values.
More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources
type: string
kind:
description: 'Kind is a string value representing the REST resource this object represents. Servers may infer this from the endpoint the client submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds'
description: |-
Kind is a string value representing the REST resource this object represents.
Servers may infer this from the endpoint the client submits requests to.
Cannot be updated.
In CamelCase.
More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds
type: string
metadata:
type: object
@@ -47,7 +56,9 @@ spec:
properties:
capacityReservations:
items:
description: CapacityReservation specifies the number of replicas temporarily added to the scale target until ExpirationTime.
description: |-
CapacityReservation specifies the number of replicas temporarily added
to the scale target until ExpirationTime.
properties:
effectiveTime:
format: date-time
@@ -79,30 +90,46 @@ spec:
items:
properties:
repositoryNames:
description: RepositoryNames is the list of repository names to be used for calculating the metric. For example, a repository name is the REPO part of `github.com/USER/REPO`.
description: |-
RepositoryNames is the list of repository names to be used for calculating the metric.
For example, a repository name is the REPO part of `github.com/USER/REPO`.
items:
type: string
type: array
scaleDownAdjustment:
description: ScaleDownAdjustment is the number of runners removed on scale-down. You can only specify either ScaleDownFactor or ScaleDownAdjustment.
description: |-
ScaleDownAdjustment is the number of runners removed on scale-down.
You can only specify either ScaleDownFactor or ScaleDownAdjustment.
type: integer
scaleDownFactor:
description: ScaleDownFactor is the multiplicative factor applied to the current number of runners used to determine how many pods should be removed.
description: |-
ScaleDownFactor is the multiplicative factor applied to the current number of runners used
to determine how many pods should be removed.
type: string
scaleDownThreshold:
description: ScaleDownThreshold is the percentage of busy runners less than which will trigger the hpa to scale the runners down.
description: |-
ScaleDownThreshold is the percentage of busy runners less than which will
trigger the hpa to scale the runners down.
type: string
scaleUpAdjustment:
description: ScaleUpAdjustment is the number of runners added on scale-up. You can only specify either ScaleUpFactor or ScaleUpAdjustment.
description: |-
ScaleUpAdjustment is the number of runners added on scale-up.
You can only specify either ScaleUpFactor or ScaleUpAdjustment.
type: integer
scaleUpFactor:
description: ScaleUpFactor is the multiplicative factor applied to the current number of runners used to determine how many pods should be added.
description: |-
ScaleUpFactor is the multiplicative factor applied to the current number of runners used
to determine how many pods should be added.
type: string
scaleUpThreshold:
description: ScaleUpThreshold is the percentage of busy runners greater than which will trigger the hpa to scale runners up.
description: |-
ScaleUpThreshold is the percentage of busy runners greater than which will
trigger the hpa to scale runners up.
type: string
type:
description: Type is the type of metric to be used for autoscaling. It can be TotalNumberOfQueuedAndInProgressWorkflowRuns or PercentageRunnersBusy.
description: |-
Type is the type of metric to be used for autoscaling.
It can be TotalNumberOfQueuedAndInProgressWorkflowRuns or PercentageRunnersBusy.
type: string
type: object
type: array
@@ -110,7 +137,9 @@ spec:
description: MinReplicas is the minimum number of replicas the deployment is allowed to scale
type: integer
scaleDownDelaySecondsAfterScaleOut:
description: ScaleDownDelaySecondsAfterScaleUp is the approximate delay for a scale down followed by a scale up Used to prevent flapping (down->up->down->... loop)
description: |-
ScaleDownDelaySecondsAfterScaleUp is the approximate delay for a scale down followed by a scale up
Used to prevent flapping (down->up->down->... loop)
type: integer
scaleTargetRef:
description: ScaleTargetRef is the reference to scaled resource like RunnerDeployment
@@ -126,7 +155,18 @@ spec:
type: string
type: object
scaleUpTriggers:
description: "ScaleUpTriggers is an experimental feature to increase the desired replicas by 1 on each webhook requested received by the webhookBasedAutoscaler. \n This feature requires you to also enable and deploy the webhookBasedAutoscaler onto your cluster. \n Note that the added runners remain until the next sync period at least, and they may or may not be used by GitHub Actions depending on the timing. They are intended to be used to gain \"resource slack\" immediately after you receive a webhook from GitHub, so that you can loosely expect MinReplicas runners to be always available."
description: |-
ScaleUpTriggers is an experimental feature to increase the desired replicas by 1
on each webhook request received by the webhookBasedAutoscaler.
This feature requires you to also enable and deploy the webhookBasedAutoscaler onto your cluster.
Note that the added runners remain until the next sync period at least,
and they may or may not be used by GitHub Actions depending on the timing.
They are intended to be used to gain "resource slack" immediately after you
receive a webhook from GitHub, so that you can loosely expect MinReplicas runners to be always available.
items:
properties:
amount:
@@ -139,12 +179,18 @@ spec:
description: https://docs.github.com/en/actions/reference/events-that-trigger-workflows#check_run
properties:
names:
description: Names is a list of GitHub Actions glob patterns. Any check_run event whose name matches one of patterns in the list can trigger autoscaling. Note that check_run name seem to equal to the job name you've defined in your actions workflow yaml file. So it is very likely that you can utilize this to trigger depending on the job.
description: |-
Names is a list of GitHub Actions glob patterns.
Any check_run event whose name matches one of patterns in the list can trigger autoscaling.
Note that the check_run name seems to equal the job name you've defined in your actions workflow yaml file.
So it is very likely that you can utilize this to trigger depending on the job.
items:
type: string
type: array
repositories:
description: Repositories is a list of GitHub repositories. Any check_run event whose repository matches one of repositories in the list can trigger autoscaling.
description: |-
Repositories is a list of GitHub repositories.
Any check_run event whose repository matches one of repositories in the list can trigger autoscaling.
items:
type: string
type: array
@@ -169,7 +215,9 @@ spec:
type: array
type: object
push:
description: PushSpec is the condition for triggering scale-up on push event Also see https://docs.github.com/en/actions/reference/events-that-trigger-workflows#push
description: |-
PushSpec is the condition for triggering scale-up on push event
Also see https://docs.github.com/en/actions/reference/events-that-trigger-workflows#push
type: object
workflowJob:
description: https://docs.github.com/en/developers/webhooks-and-events/webhooks/webhook-events-and-payloads#workflow_job
@@ -178,23 +226,33 @@ spec:
type: object
type: array
scheduledOverrides:
description: ScheduledOverrides is the list of ScheduledOverride. It can be used to override a few fields of HorizontalRunnerAutoscalerSpec on schedule. The earlier a scheduled override is, the higher it is prioritized.
description: |-
ScheduledOverrides is the list of ScheduledOverride.
It can be used to override a few fields of HorizontalRunnerAutoscalerSpec on schedule.
The earlier a scheduled override is, the higher it is prioritized.
items:
description: ScheduledOverride can be used to override a few fields of HorizontalRunnerAutoscalerSpec on schedule. A schedule can optionally be recurring, so that the corresponding override happens every day, week, month, or year.
description: |-
ScheduledOverride can be used to override a few fields of HorizontalRunnerAutoscalerSpec on schedule.
A schedule can optionally be recurring, so that the corresponding override happens every day, week, month, or year.
properties:
endTime:
description: EndTime is the time at which the first override ends.
format: date-time
type: string
minReplicas:
description: MinReplicas is the number of runners while overriding. If omitted, it doesn't override minReplicas.
description: |-
MinReplicas is the number of runners while overriding.
If omitted, it doesn't override minReplicas.
minimum: 0
nullable: true
type: integer
recurrenceRule:
properties:
frequency:
description: Frequency is the name of a predefined interval of each recurrence. The valid values are "Daily", "Weekly", "Monthly", and "Yearly". If empty, the corresponding override happens only once.
description: |-
Frequency is the name of a predefined interval of each recurrence.
The valid values are "Daily", "Weekly", "Monthly", and "Yearly".
If empty, the corresponding override happens only once.
enum:
- Daily
- Weekly
@@ -202,7 +260,9 @@ spec:
- Yearly
type: string
untilTime:
description: UntilTime is the time of the final recurrence. If empty, the schedule recurs forever.
description: |-
UntilTime is the time of the final recurrence.
If empty, the schedule recurs forever.
format: date-time
type: string
type: object
@@ -231,18 +291,24 @@ spec:
type: object
type: array
desiredReplicas:
description: DesiredReplicas is the total number of desired, non-terminated and latest pods to be set for the primary RunnerSet This doesn't include outdated pods while upgrading the deployment and replacing the runnerset.
description: |-
DesiredReplicas is the total number of desired, non-terminated and latest pods to be set for the primary RunnerSet
This doesn't include outdated pods while upgrading the deployment and replacing the runnerset.
type: integer
lastSuccessfulScaleOutTime:
format: date-time
nullable: true
type: string
observedGeneration:
description: ObservedGeneration is the most recent generation observed for the target. It corresponds to e.g. RunnerDeployment's generation, which is updated on mutation by the API Server.
description: |-
ObservedGeneration is the most recent generation observed for the target. It corresponds to e.g.
RunnerDeployment's generation, which is updated on mutation by the API Server.
format: int64
type: integer
scheduledOverridesSummary:
description: ScheduledOverridesSummary is the summary of active and upcoming scheduled overrides to be shown in e.g. a column of a `kubectl get hra` output for observability.
description: |-
ScheduledOverridesSummary is the summary of active and upcoming scheduled overrides to be shown in e.g. a column of a `kubectl get hra` output
for observability.
type: string
type: object
type: object

File diff suppressed because it is too large

File diff suppressed because it is too large

File diff suppressed because it is too large

File diff suppressed because it is too large

View File

@@ -55,7 +55,7 @@ type AutoscalingListenerReconciler struct {
ListenerMetricsAddr string
ListenerMetricsEndpoint string
resourceBuilder resourceBuilder
ResourceBuilder
}
// +kubebuilder:rbac:groups=core,resources=pods,verbs=get;list;watch;create;update;patch;delete
@@ -242,17 +242,27 @@ func (r *AutoscalingListenerReconciler) Reconcile(ctx context.Context, req ctrl.
return r.createListenerPod(ctx, &autoscalingRunnerSet, autoscalingListener, serviceAccount, mirrorSecret, log)
}
// A failed listener pod might mean the mirror secret is out of date
// Delete the listener pod and re-create it to make sure the mirror secret is up to date
if listenerPod.Status.Phase == corev1.PodFailed && listenerPod.DeletionTimestamp.IsZero() {
log.Info("Listener pod failed, deleting it and re-creating it", "namespace", listenerPod.Namespace, "name", listenerPod.Name, "reason", listenerPod.Status.Reason, "message", listenerPod.Status.Message)
if err := r.Delete(ctx, listenerPod); err != nil && !kerrors.IsNotFound(err) {
log.Error(err, "Unable to delete the listener pod", "namespace", listenerPod.Namespace, "name", listenerPod.Name)
return ctrl.Result{}, err
}
}
cs := listenerContainerStatus(listenerPod)
switch {
case cs == nil:
log.Info("Listener pod is not ready", "namespace", listenerPod.Namespace, "name", listenerPod.Name)
return ctrl.Result{}, nil
case cs.State.Terminated != nil:
log.Info("Listener pod is terminated", "namespace", listenerPod.Namespace, "name", listenerPod.Name, "reason", cs.State.Terminated.Reason, "message", cs.State.Terminated.Message)
if listenerPod.Status.Phase == corev1.PodRunning {
if err := r.publishRunningListener(autoscalingListener, false); err != nil {
log.Error(err, "Unable to publish runner listener down metric", "namespace", listenerPod.Namespace, "name", listenerPod.Name)
}
if listenerPod.DeletionTimestamp.IsZero() {
log.Info("Deleting the listener pod", "namespace", listenerPod.Namespace, "name", listenerPod.Name)
if err := r.Delete(ctx, listenerPod); err != nil && !kerrors.IsNotFound(err) {
log.Error(err, "Unable to delete the listener pod", "namespace", listenerPod.Namespace, "name", listenerPod.Name)
return ctrl.Result{}, err
}
}
return ctrl.Result{}, nil
case cs.State.Running != nil:
if err := r.publishRunningListener(autoscalingListener, true); err != nil {
log.Error(err, "Unable to publish running listener", "namespace", listenerPod.Namespace, "name", listenerPod.Name)
// stop reconciling. We should never get to this point but if we do,
@@ -260,8 +270,8 @@ func (r *AutoscalingListenerReconciler) Reconcile(ctx context.Context, req ctrl.
// notify the reconciler again.
return ctrl.Result{}, nil
}
return ctrl.Result{}, nil
}
return ctrl.Result{}, nil
}
@@ -373,7 +383,7 @@ func (r *AutoscalingListenerReconciler) cleanupResources(ctx context.Context, au
}
func (r *AutoscalingListenerReconciler) createServiceAccountForListener(ctx context.Context, autoscalingListener *v1alpha1.AutoscalingListener, logger logr.Logger) (ctrl.Result, error) {
newServiceAccount := r.resourceBuilder.newScaleSetListenerServiceAccount(autoscalingListener)
newServiceAccount := r.ResourceBuilder.newScaleSetListenerServiceAccount(autoscalingListener)
if err := ctrl.SetControllerReference(autoscalingListener, newServiceAccount, r.Scheme); err != nil {
return ctrl.Result{}, err
@@ -458,7 +468,7 @@ func (r *AutoscalingListenerReconciler) createListenerPod(ctx context.Context, a
logger.Info("Creating listener config secret")
podConfig, err := r.resourceBuilder.newScaleSetListenerConfig(autoscalingListener, secret, metricsConfig, cert)
podConfig, err := r.ResourceBuilder.newScaleSetListenerConfig(autoscalingListener, secret, metricsConfig, cert)
if err != nil {
logger.Error(err, "Failed to build listener config secret")
return ctrl.Result{}, err
@@ -477,7 +487,7 @@ func (r *AutoscalingListenerReconciler) createListenerPod(ctx context.Context, a
return ctrl.Result{Requeue: true}, nil
}
newPod, err := r.resourceBuilder.newScaleSetListenerPod(autoscalingListener, &podConfig, serviceAccount, secret, metricsConfig, envs...)
newPod, err := r.ResourceBuilder.newScaleSetListenerPod(autoscalingListener, &podConfig, serviceAccount, secret, metricsConfig, envs...)
if err != nil {
logger.Error(err, "Failed to build listener pod")
return ctrl.Result{}, err
@@ -537,7 +547,7 @@ func (r *AutoscalingListenerReconciler) certificate(ctx context.Context, autosca
}
func (r *AutoscalingListenerReconciler) createSecretsForListener(ctx context.Context, autoscalingListener *v1alpha1.AutoscalingListener, secret *corev1.Secret, logger logr.Logger) (ctrl.Result, error) {
newListenerSecret := r.resourceBuilder.newScaleSetListenerSecretMirror(autoscalingListener, secret)
newListenerSecret := r.ResourceBuilder.newScaleSetListenerSecretMirror(autoscalingListener, secret)
if err := ctrl.SetControllerReference(autoscalingListener, newListenerSecret, r.Scheme); err != nil {
return ctrl.Result{}, err
@@ -609,7 +619,7 @@ func (r *AutoscalingListenerReconciler) updateSecretsForListener(ctx context.Con
}
func (r *AutoscalingListenerReconciler) createRoleForListener(ctx context.Context, autoscalingListener *v1alpha1.AutoscalingListener, logger logr.Logger) (ctrl.Result, error) {
newRole := r.resourceBuilder.newScaleSetListenerRole(autoscalingListener)
newRole := r.ResourceBuilder.newScaleSetListenerRole(autoscalingListener)
logger.Info("Creating listener role", "namespace", newRole.Namespace, "name", newRole.Name, "rules", newRole.Rules)
if err := r.Create(ctx, newRole); err != nil {
@@ -637,7 +647,7 @@ func (r *AutoscalingListenerReconciler) updateRoleForListener(ctx context.Contex
}
func (r *AutoscalingListenerReconciler) createRoleBindingForListener(ctx context.Context, autoscalingListener *v1alpha1.AutoscalingListener, listenerRole *rbacv1.Role, serviceAccount *corev1.ServiceAccount, logger logr.Logger) (ctrl.Result, error) {
newRoleBinding := r.resourceBuilder.newScaleSetListenerRoleBinding(autoscalingListener, listenerRole, serviceAccount)
newRoleBinding := r.ResourceBuilder.newScaleSetListenerRoleBinding(autoscalingListener, listenerRole, serviceAccount)
logger.Info("Creating listener role binding",
"namespace", newRoleBinding.Namespace,
@@ -690,30 +700,6 @@ func (r *AutoscalingListenerReconciler) publishRunningListener(autoscalingListen
// SetupWithManager sets up the controller with the Manager.
func (r *AutoscalingListenerReconciler) SetupWithManager(mgr ctrl.Manager) error {
groupVersionIndexer := func(rawObj client.Object) []string {
groupVersion := v1alpha1.GroupVersion.String()
owner := metav1.GetControllerOf(rawObj)
if owner == nil {
return nil
}
// ...make sure it is owned by this controller
if owner.APIVersion != groupVersion || owner.Kind != "AutoscalingListener" {
return nil
}
// ...and if so, return it
return []string{owner.Name}
}
if err := mgr.GetFieldIndexer().IndexField(context.Background(), &corev1.Pod{}, resourceOwnerKey, groupVersionIndexer); err != nil {
return err
}
if err := mgr.GetFieldIndexer().IndexField(context.Background(), &corev1.ServiceAccount{}, resourceOwnerKey, groupVersionIndexer); err != nil {
return err
}
labelBasedWatchFunc := func(_ context.Context, obj client.Object) []reconcile.Request {
var requests []reconcile.Request
labels := obj.GetLabels()
@@ -746,3 +732,13 @@ func (r *AutoscalingListenerReconciler) SetupWithManager(mgr ctrl.Manager) error
WithEventFilter(predicate.ResourceVersionChangedPredicate{}).
Complete(r)
}
func listenerContainerStatus(pod *corev1.Pod) *corev1.ContainerStatus {
for i := range pod.Status.ContainerStatuses {
cs := &pod.Status.ContainerStatuses[i]
if cs.Name == autoscalingListenerContainerName {
return cs
}
}
return nil
}
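
As a quick illustration of what the new helper buys the reconciler, here is a minimal sketch that builds a pod whose listener container has terminated even though the pod phase is still Running, and picks out the container status by name. The container name constant and the helper name are stand-ins for the controller's actual identifiers.

package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// listenerContainer is a stand-in for autoscalingListenerContainerName.
const listenerContainer = "listener"

// containerStatusByName mirrors the helper added above: scan the pod's
// container statuses and return the one that belongs to the listener container.
func containerStatusByName(pod *corev1.Pod, name string) *corev1.ContainerStatus {
	for i := range pod.Status.ContainerStatuses {
		if pod.Status.ContainerStatuses[i].Name == name {
			return &pod.Status.ContainerStatuses[i]
		}
	}
	return nil
}

func main() {
	pod := &corev1.Pod{
		Status: corev1.PodStatus{
			// The pod phase can still be Running while the listener container has
			// exited, which is exactly the case the reconciler now detects.
			Phase: corev1.PodRunning,
			ContainerStatuses: []corev1.ContainerStatus{
				{
					Name: listenerContainer,
					State: corev1.ContainerState{
						Terminated: &corev1.ContainerStateTerminated{ExitCode: 1},
					},
				},
			},
		},
	}

	if cs := containerStatusByName(pod, listenerContainer); cs != nil && cs.State.Terminated != nil {
		fmt.Println("listener container terminated; delete and re-create the pod")
	}
}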

View File

@@ -21,11 +21,11 @@ import (
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/types"
actionsv1alpha1 "github.com/actions/actions-runner-controller/apis/actions.github.com/v1alpha1"
"github.com/actions/actions-runner-controller/apis/actions.github.com/v1alpha1"
)
const (
autoscalingListenerTestTimeout = time.Second * 5
autoscalingListenerTestTimeout = time.Second * 20
autoscalingListenerTestInterval = time.Millisecond * 250
autoscalingListenerTestGitHubToken = "gh_token"
)
@@ -34,9 +34,9 @@ var _ = Describe("Test AutoScalingListener controller", func() {
var ctx context.Context
var mgr ctrl.Manager
var autoscalingNS *corev1.Namespace
var autoscalingRunnerSet *actionsv1alpha1.AutoscalingRunnerSet
var autoscalingRunnerSet *v1alpha1.AutoscalingRunnerSet
var configSecret *corev1.Secret
var autoscalingListener *actionsv1alpha1.AutoscalingListener
var autoscalingListener *v1alpha1.AutoscalingListener
BeforeEach(func() {
ctx = context.Background()
@@ -53,12 +53,12 @@ var _ = Describe("Test AutoScalingListener controller", func() {
min := 1
max := 10
autoscalingRunnerSet = &actionsv1alpha1.AutoscalingRunnerSet{
autoscalingRunnerSet = &v1alpha1.AutoscalingRunnerSet{
ObjectMeta: metav1.ObjectMeta{
Name: "test-asrs",
Namespace: autoscalingNS.Name,
},
Spec: actionsv1alpha1.AutoscalingRunnerSetSpec{
Spec: v1alpha1.AutoscalingRunnerSetSpec{
GitHubConfigUrl: "https://github.com/owner/repo",
GitHubConfigSecret: configSecret.Name,
MaxRunners: &max,
@@ -79,12 +79,12 @@ var _ = Describe("Test AutoScalingListener controller", func() {
err = k8sClient.Create(ctx, autoscalingRunnerSet)
Expect(err).NotTo(HaveOccurred(), "failed to create AutoScalingRunnerSet")
autoscalingListener = &actionsv1alpha1.AutoscalingListener{
autoscalingListener = &v1alpha1.AutoscalingListener{
ObjectMeta: metav1.ObjectMeta{
Name: "test-asl",
Namespace: autoscalingNS.Name,
},
Spec: actionsv1alpha1.AutoscalingListenerSpec{
Spec: v1alpha1.AutoscalingListenerSpec{
GitHubConfigUrl: "https://github.com/owner/repo",
GitHubConfigSecret: configSecret.Name,
RunnerScaleSetId: 1,
@@ -119,7 +119,7 @@ var _ = Describe("Test AutoScalingListener controller", func() {
).Should(Succeed(), "Config secret should be created")
// Check if finalizer is added
created := new(actionsv1alpha1.AutoscalingListener)
created := new(v1alpha1.AutoscalingListener)
Eventually(
func() (string, error) {
err := k8sClient.Get(ctx, client.ObjectKey{Name: autoscalingListener.Name, Namespace: autoscalingListener.Namespace}, created)
@@ -298,7 +298,7 @@ var _ = Describe("Test AutoScalingListener controller", func() {
// The AutoScalingListener should be deleted
Eventually(
func() error {
listenerList := new(actionsv1alpha1.AutoscalingListenerList)
listenerList := new(v1alpha1.AutoscalingListenerList)
err := k8sClient.List(ctx, listenerList, client.InNamespace(autoscalingListener.Namespace), client.MatchingFields{".metadata.name": autoscalingListener.Name})
if err != nil {
return err
@@ -351,6 +351,53 @@ var _ = Describe("Test AutoScalingListener controller", func() {
autoscalingListenerTestInterval).Should(BeEquivalentTo(rulesForListenerRole([]string{updated.Spec.EphemeralRunnerSetName})), "Role should be updated")
})
It("It should re-create pod whenever listener container is terminated", func() {
// Wait for the pod to be created
pod := new(corev1.Pod)
Eventually(
func() (string, error) {
err := k8sClient.Get(ctx, client.ObjectKey{Name: autoscalingListener.Name, Namespace: autoscalingListener.Namespace}, pod)
if err != nil {
return "", err
}
return pod.Name, nil
},
autoscalingListenerTestTimeout,
autoscalingListenerTestInterval,
).Should(BeEquivalentTo(autoscalingListener.Name), "Pod should be created")
oldPodUID := string(pod.UID)
updated := pod.DeepCopy()
updated.Status.ContainerStatuses = []corev1.ContainerStatus{
{
Name: autoscalingListenerContainerName,
State: corev1.ContainerState{
Terminated: &corev1.ContainerStateTerminated{
ExitCode: 0,
},
},
},
}
err := k8sClient.Status().Update(ctx, updated)
Expect(err).NotTo(HaveOccurred(), "failed to update test pod")
// Wait for the new pod to be created
Eventually(
func() (string, error) {
pod := new(corev1.Pod)
err := k8sClient.Get(ctx, client.ObjectKey{Name: autoscalingListener.Name, Namespace: autoscalingListener.Namespace}, pod)
if err != nil {
return "", err
}
return string(pod.UID), nil
},
autoscalingListenerTestTimeout,
autoscalingListenerTestInterval,
).ShouldNot(BeEquivalentTo(oldPodUID), "Pod should be re-created")
})
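The tests above and below drive the listener health check by flipping only the container status, not the pod phase. As a rough illustration of the kind of check this implies — a sketch under the assumption that the controller keys off the named listener container; the helper name and shape are illustrative, not the controller's actual code:
package main
import (
	"fmt"
	corev1 "k8s.io/api/core/v1"
)
// listenerContainerTerminated reports whether the container with the given
// name has a Terminated state, regardless of the pod phase. A pod can still
// report Running while its listener container has already exited.
func listenerContainerTerminated(pod *corev1.Pod, containerName string) bool {
	for i := range pod.Status.ContainerStatuses {
		cs := &pod.Status.ContainerStatuses[i]
		if cs.Name == containerName {
			return cs.State.Terminated != nil
		}
	}
	return false
}
func main() {
	pod := &corev1.Pod{
		Status: corev1.PodStatus{
			Phase: corev1.PodRunning, // phase alone is not enough here
			ContainerStatuses: []corev1.ContainerStatus{
				{
					Name:  "listener",
					State: corev1.ContainerState{Terminated: &corev1.ContainerStateTerminated{ExitCode: 1}},
				},
			},
		},
	}
	fmt.Println(listenerContainerTerminated(pod, "listener")) // true
}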
It("It should update mirror secrets to match secret used by AutoScalingRunnerSet", func() {
// Wait for the pod to be created
pod := new(corev1.Pod)
@@ -373,7 +420,18 @@ var _ = Describe("Test AutoScalingListener controller", func() {
Expect(err).NotTo(HaveOccurred(), "failed to update test secret")
updatedPod := pod.DeepCopy()
updatedPod.Status.Phase = corev1.PodFailed
// Keep the pod phase Running; the controller should consult the container state instead
updatedPod.Status.Phase = corev1.PodRunning
updatedPod.Status.ContainerStatuses = []corev1.ContainerStatus{
{
Name: autoscalingListenerContainerName,
State: corev1.ContainerState{
Terminated: &corev1.ContainerStateTerminated{
ExitCode: 1,
},
},
},
}
err = k8sClient.Status().Update(ctx, updatedPod)
Expect(err).NotTo(HaveOccurred(), "failed to update test pod to failed")
@@ -415,11 +473,12 @@ var _ = Describe("Test AutoScalingListener customization", func() {
var ctx context.Context
var mgr ctrl.Manager
var autoscalingNS *corev1.Namespace
var autoscalingRunnerSet *actionsv1alpha1.AutoscalingRunnerSet
var autoscalingRunnerSet *v1alpha1.AutoscalingRunnerSet
var configSecret *corev1.Secret
var autoscalingListener *actionsv1alpha1.AutoscalingListener
var autoscalingListener *v1alpha1.AutoscalingListener
var runAsUser int64 = 1001
const sidecarContainerName = "sidecar"
listenerPodTemplate := corev1.PodTemplateSpec{
Spec: corev1.PodSpec{
@@ -432,7 +491,7 @@ var _ = Describe("Test AutoScalingListener customization", func() {
},
},
{
Name: "sidecar",
Name: sidecarContainerName,
ImagePullPolicy: corev1.PullIfNotPresent,
Image: "busybox",
},
@@ -458,12 +517,12 @@ var _ = Describe("Test AutoScalingListener customization", func() {
min := 1
max := 10
autoscalingRunnerSet = &actionsv1alpha1.AutoscalingRunnerSet{
autoscalingRunnerSet = &v1alpha1.AutoscalingRunnerSet{
ObjectMeta: metav1.ObjectMeta{
Name: "test-asrs",
Namespace: autoscalingNS.Name,
},
Spec: actionsv1alpha1.AutoscalingRunnerSetSpec{
Spec: v1alpha1.AutoscalingRunnerSetSpec{
GitHubConfigUrl: "https://github.com/owner/repo",
GitHubConfigSecret: configSecret.Name,
MaxRunners: &max,
@@ -484,12 +543,12 @@ var _ = Describe("Test AutoScalingListener customization", func() {
err = k8sClient.Create(ctx, autoscalingRunnerSet)
Expect(err).NotTo(HaveOccurred(), "failed to create AutoScalingRunnerSet")
autoscalingListener = &actionsv1alpha1.AutoscalingListener{
autoscalingListener = &v1alpha1.AutoscalingListener{
ObjectMeta: metav1.ObjectMeta{
Name: "test-asltest",
Namespace: autoscalingNS.Name,
},
Spec: actionsv1alpha1.AutoscalingListenerSpec{
Spec: v1alpha1.AutoscalingListenerSpec{
GitHubConfigUrl: "https://github.com/owner/repo",
GitHubConfigSecret: configSecret.Name,
RunnerScaleSetId: 1,
@@ -512,7 +571,7 @@ var _ = Describe("Test AutoScalingListener customization", func() {
Context("When creating a new AutoScalingListener", func() {
It("It should create customized pod with applied configuration", func() {
// Check if finalizer is added
created := new(actionsv1alpha1.AutoscalingListener)
created := new(v1alpha1.AutoscalingListener)
Eventually(
func() (string, error) {
err := k8sClient.Get(ctx, client.ObjectKey{Name: autoscalingListener.Name, Namespace: autoscalingListener.Namespace}, created)
@@ -525,7 +584,8 @@ var _ = Describe("Test AutoScalingListener customization", func() {
return created.Finalizers[0], nil
},
autoscalingListenerTestTimeout,
autoscalingListenerTestInterval).Should(BeEquivalentTo(autoscalingListenerFinalizerName), "AutoScalingListener should have a finalizer")
autoscalingListenerTestInterval,
).Should(BeEquivalentTo(autoscalingListenerFinalizerName), "AutoScalingListener should have a finalizer")
// Check if config is created
config := new(corev1.Secret)
@@ -559,30 +619,119 @@ var _ = Describe("Test AutoScalingListener customization", func() {
Expect(pod.Spec.Containers[0].SecurityContext.RunAsUser).To(Equal(&runAsUser), "Pod should have the correct security context")
Expect(pod.Spec.Containers[0].ImagePullPolicy).To(Equal(corev1.PullAlways), "Pod should have the correct image pull policy")
Expect(pod.Spec.Containers[1].Name).To(Equal("sidecar"), "Pod should have the correct container name")
Expect(pod.Spec.Containers[1].Name).To(Equal(sidecarContainerName), "Pod should have the correct container name")
Expect(pod.Spec.Containers[1].Image).To(Equal("busybox"), "Pod should have the correct image")
Expect(pod.Spec.Containers[1].ImagePullPolicy).To(Equal(corev1.PullIfNotPresent), "Pod should have the correct image pull policy")
})
})
Context("When AutoscalingListener pod has interuptions", func() {
It("Should re-create pod when it is deleted", func() {
pod := new(corev1.Pod)
Eventually(
func() (string, error) {
err := k8sClient.Get(ctx, client.ObjectKey{Name: autoscalingListener.Name, Namespace: autoscalingListener.Namespace}, pod)
if err != nil {
return "", err
}
return pod.Name, nil
},
autoscalingListenerTestTimeout,
autoscalingListenerTestInterval,
).Should(BeEquivalentTo(autoscalingListener.Name), "Pod should be created")
Expect(len(pod.Spec.Containers)).To(Equal(2), "Pod should have 2 containers")
oldPodUID := string(pod.UID)
err := k8sClient.Delete(ctx, pod)
Expect(err).NotTo(HaveOccurred(), "failed to delete pod")
pod = new(corev1.Pod)
Eventually(
func() (string, error) {
err := k8sClient.Get(ctx, client.ObjectKey{Name: autoscalingListener.Name, Namespace: autoscalingListener.Namespace}, pod)
if err != nil {
return "", err
}
return string(pod.UID), nil
},
autoscalingListenerTestTimeout,
autoscalingListenerTestInterval,
).ShouldNot(BeEquivalentTo(oldPodUID), "Pod should be re-created")
})
It("Should re-create pod when the listener pod is terminated", func() {
pod := new(corev1.Pod)
Eventually(
func() (string, error) {
err := k8sClient.Get(ctx, client.ObjectKey{Name: autoscalingListener.Name, Namespace: autoscalingListener.Namespace}, pod)
if err != nil {
return "", err
}
return pod.Name, nil
},
autoscalingListenerTestTimeout,
autoscalingListenerTestInterval,
).Should(BeEquivalentTo(autoscalingListener.Name), "Pod should be created")
updated := pod.DeepCopy()
oldPodUID := string(pod.UID)
updated.Status.ContainerStatuses = []corev1.ContainerStatus{
{
Name: autoscalingListenerContainerName,
State: corev1.ContainerState{
Terminated: &corev1.ContainerStateTerminated{
ExitCode: 1,
},
},
},
{
Name: sidecarContainerName,
State: corev1.ContainerState{
Running: &corev1.ContainerStateRunning{},
},
},
}
err := k8sClient.Status().Update(ctx, updated)
Expect(err).NotTo(HaveOccurred(), "failed to update pod status")
pod = new(corev1.Pod)
Eventually(
func() (string, error) {
err := k8sClient.Get(ctx, client.ObjectKey{Name: autoscalingListener.Name, Namespace: autoscalingListener.Namespace}, pod)
if err != nil {
return "", err
}
return string(pod.UID), nil
},
autoscalingListenerTestTimeout,
autoscalingListenerTestInterval,
).ShouldNot(BeEquivalentTo(oldPodUID), "Pod should be re-created")
})
})
})
var _ = Describe("Test AutoScalingListener controller with proxy", func() {
var ctx context.Context
var mgr ctrl.Manager
var autoscalingNS *corev1.Namespace
var autoscalingRunnerSet *actionsv1alpha1.AutoscalingRunnerSet
var autoscalingRunnerSet *v1alpha1.AutoscalingRunnerSet
var configSecret *corev1.Secret
var autoscalingListener *actionsv1alpha1.AutoscalingListener
var autoscalingListener *v1alpha1.AutoscalingListener
createRunnerSetAndListener := func(proxy *actionsv1alpha1.ProxyConfig) {
createRunnerSetAndListener := func(proxy *v1alpha1.ProxyConfig) {
min := 1
max := 10
autoscalingRunnerSet = &actionsv1alpha1.AutoscalingRunnerSet{
autoscalingRunnerSet = &v1alpha1.AutoscalingRunnerSet{
ObjectMeta: metav1.ObjectMeta{
Name: "test-asrs",
Namespace: autoscalingNS.Name,
},
Spec: actionsv1alpha1.AutoscalingRunnerSetSpec{
Spec: v1alpha1.AutoscalingRunnerSetSpec{
GitHubConfigUrl: "https://github.com/owner/repo",
GitHubConfigSecret: configSecret.Name,
MaxRunners: &max,
@@ -604,12 +753,12 @@ var _ = Describe("Test AutoScalingListener controller with proxy", func() {
err := k8sClient.Create(ctx, autoscalingRunnerSet)
Expect(err).NotTo(HaveOccurred(), "failed to create AutoScalingRunnerSet")
autoscalingListener = &actionsv1alpha1.AutoscalingListener{
autoscalingListener = &v1alpha1.AutoscalingListener{
ObjectMeta: metav1.ObjectMeta{
Name: "test-asl",
Namespace: autoscalingNS.Name,
},
Spec: actionsv1alpha1.AutoscalingListenerSpec{
Spec: v1alpha1.AutoscalingListenerSpec{
GitHubConfigUrl: "https://github.com/owner/repo",
GitHubConfigSecret: configSecret.Name,
RunnerScaleSetId: 1,
@@ -658,12 +807,12 @@ var _ = Describe("Test AutoScalingListener controller with proxy", func() {
err := k8sClient.Create(ctx, proxyCredentials)
Expect(err).NotTo(HaveOccurred(), "failed to create proxy credentials secret")
proxy := &actionsv1alpha1.ProxyConfig{
HTTP: &actionsv1alpha1.ProxyServerConfig{
proxy := &v1alpha1.ProxyConfig{
HTTP: &v1alpha1.ProxyServerConfig{
Url: "http://localhost:8080",
CredentialSecretRef: "proxy-credentials",
},
HTTPS: &actionsv1alpha1.ProxyServerConfig{
HTTPS: &v1alpha1.ProxyServerConfig{
Url: "https://localhost:8443",
CredentialSecretRef: "proxy-credentials",
},
@@ -766,19 +915,19 @@ var _ = Describe("Test AutoScalingListener controller with template modification
var ctx context.Context
var mgr ctrl.Manager
var autoscalingNS *corev1.Namespace
var autoscalingRunnerSet *actionsv1alpha1.AutoscalingRunnerSet
var autoscalingRunnerSet *v1alpha1.AutoscalingRunnerSet
var configSecret *corev1.Secret
var autoscalingListener *actionsv1alpha1.AutoscalingListener
var autoscalingListener *v1alpha1.AutoscalingListener
createRunnerSetAndListener := func(listenerTemplate *corev1.PodTemplateSpec) {
min := 1
max := 10
autoscalingRunnerSet = &actionsv1alpha1.AutoscalingRunnerSet{
autoscalingRunnerSet = &v1alpha1.AutoscalingRunnerSet{
ObjectMeta: metav1.ObjectMeta{
Name: "test-asrs",
Namespace: autoscalingNS.Name,
},
Spec: actionsv1alpha1.AutoscalingRunnerSetSpec{
Spec: v1alpha1.AutoscalingRunnerSetSpec{
GitHubConfigUrl: "https://github.com/owner/repo",
GitHubConfigSecret: configSecret.Name,
MaxRunners: &max,
@@ -800,12 +949,12 @@ var _ = Describe("Test AutoScalingListener controller with template modification
err := k8sClient.Create(ctx, autoscalingRunnerSet)
Expect(err).NotTo(HaveOccurred(), "failed to create AutoScalingRunnerSet")
autoscalingListener = &actionsv1alpha1.AutoscalingListener{
autoscalingListener = &v1alpha1.AutoscalingListener{
ObjectMeta: metav1.ObjectMeta{
Name: "test-asl",
Namespace: autoscalingNS.Name,
},
Spec: actionsv1alpha1.AutoscalingListenerSpec{
Spec: v1alpha1.AutoscalingListenerSpec{
GitHubConfigUrl: "https://github.com/owner/repo",
GitHubConfigSecret: configSecret.Name,
RunnerScaleSetId: 1,
@@ -887,7 +1036,6 @@ var _ = Describe("Test AutoScalingListener controller with template modification
g.Expect(pod.ObjectMeta.Annotations).To(HaveKeyWithValue("test-annotation-key", "test-annotation-value"), "pod annotations should be copied from runner set template")
g.Expect(pod.ObjectMeta.Labels).To(HaveKeyWithValue("test-label-key", "test-label-value"), "pod labels should be copied from runner set template")
},
autoscalingListenerTestTimeout,
autoscalingListenerTestInterval).Should(Succeed(), "failed to create listener pod from the modified template")
@@ -915,9 +1063,9 @@ var _ = Describe("Test GitHub Server TLS configuration", func() {
var ctx context.Context
var mgr ctrl.Manager
var autoscalingNS *corev1.Namespace
var autoscalingRunnerSet *actionsv1alpha1.AutoscalingRunnerSet
var autoscalingRunnerSet *v1alpha1.AutoscalingRunnerSet
var configSecret *corev1.Secret
var autoscalingListener *actionsv1alpha1.AutoscalingListener
var autoscalingListener *v1alpha1.AutoscalingListener
var rootCAConfigMap *corev1.ConfigMap
BeforeEach(func() {
@@ -955,16 +1103,16 @@ var _ = Describe("Test GitHub Server TLS configuration", func() {
min := 1
max := 10
autoscalingRunnerSet = &actionsv1alpha1.AutoscalingRunnerSet{
autoscalingRunnerSet = &v1alpha1.AutoscalingRunnerSet{
ObjectMeta: metav1.ObjectMeta{
Name: "test-asrs",
Namespace: autoscalingNS.Name,
},
Spec: actionsv1alpha1.AutoscalingRunnerSetSpec{
Spec: v1alpha1.AutoscalingRunnerSetSpec{
GitHubConfigUrl: "https://github.com/owner/repo",
GitHubConfigSecret: configSecret.Name,
GitHubServerTLS: &actionsv1alpha1.GitHubServerTLSConfig{
CertificateFrom: &actionsv1alpha1.TLSCertificateSource{
GitHubServerTLS: &v1alpha1.GitHubServerTLSConfig{
CertificateFrom: &v1alpha1.TLSCertificateSource{
ConfigMapKeyRef: &corev1.ConfigMapKeySelector{
LocalObjectReference: corev1.LocalObjectReference{
Name: rootCAConfigMap.Name,
@@ -991,16 +1139,16 @@ var _ = Describe("Test GitHub Server TLS configuration", func() {
err = k8sClient.Create(ctx, autoscalingRunnerSet)
Expect(err).NotTo(HaveOccurred(), "failed to create AutoScalingRunnerSet")
autoscalingListener = &actionsv1alpha1.AutoscalingListener{
autoscalingListener = &v1alpha1.AutoscalingListener{
ObjectMeta: metav1.ObjectMeta{
Name: "test-asl",
Namespace: autoscalingNS.Name,
},
Spec: actionsv1alpha1.AutoscalingListenerSpec{
Spec: v1alpha1.AutoscalingListenerSpec{
GitHubConfigUrl: "https://github.com/owner/repo",
GitHubConfigSecret: configSecret.Name,
GitHubServerTLS: &actionsv1alpha1.GitHubServerTLSConfig{
CertificateFrom: &actionsv1alpha1.TLSCertificateSource{
GitHubServerTLS: &v1alpha1.GitHubServerTLSConfig{
CertificateFrom: &v1alpha1.TLSCertificateSource{
ConfigMapKeyRef: &corev1.ConfigMapKeySelector{
LocalObjectReference: corev1.LocalObjectReference{
Name: rootCAConfigMap.Name,


@@ -30,7 +30,6 @@ import (
corev1 "k8s.io/api/core/v1"
rbacv1 "k8s.io/api/rbac/v1"
kerrors "k8s.io/apimachinery/pkg/api/errors"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/runtime"
"k8s.io/apimachinery/pkg/types"
ctrl "sigs.k8s.io/controller-runtime"
@@ -42,10 +41,14 @@ import (
)
const (
labelKeyRunnerSpecHash = "runner-spec-hash"
annotationKeyRunnerSpecHash = "actions.github.com/runner-spec-hash"
// annotationKeyValuesHash is the hash of the entire values JSON.
// It is used to determine whether the values have changed, so the
// listener can be re-created.
annotationKeyValuesHash = "actions.github.com/values-hash"
autoscalingRunnerSetFinalizerName = "autoscalingrunnerset.actions.github.com/finalizer"
runnerScaleSetIdAnnotationKey = "runner-scale-set-id"
runnerScaleSetNameAnnotationKey = "runner-scale-set-name"
)
type UpdateStrategy string
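For orientation, the annotationKeyValuesHash comment above describes a hash over the entire values JSON. Below is a minimal, hypothetical sketch of how such a hash could be computed and compared; the helper name and the exact hashing scheme are assumptions for illustration, not the chart's or controller's actual implementation.
package main
import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)
// valuesHash hashes the marshaled values document; any change to the values
// yields a different hash, which a reconciler can compare against the
// actions.github.com/values-hash annotation on the existing listener.
func valuesHash(values map[string]any) (string, error) {
	raw, err := json.Marshal(values) // map keys are marshaled in sorted order, so this is deterministic
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(raw)
	return hex.EncodeToString(sum[:]), nil
}
func main() {
	oldHash, _ := valuesHash(map[string]any{"maxRunners": 10})
	newHash, _ := valuesHash(map[string]any{"maxRunners": 20})
	fmt.Println(oldHash != newHash) // true: a values change should re-create the listener
}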
@@ -76,8 +79,7 @@ type AutoscalingRunnerSetReconciler struct {
DefaultRunnerScaleSetListenerImagePullSecrets []string
UpdateStrategy UpdateStrategy
ActionsClient actions.MultiClient
resourceBuilder resourceBuilder
ResourceBuilder
}
// +kubebuilder:rbac:groups=actions.github.com,resources=autoscalingrunnersets,verbs=get;list;watch;create;update;patch;delete
@@ -131,17 +133,11 @@ func (r *AutoscalingRunnerSetReconciler) Reconcile(ctx context.Context, req ctrl
return ctrl.Result{}, err
}
requeue, err := r.removeFinalizersFromDependentResources(ctx, autoscalingRunnerSet, log)
if err != nil {
if err := r.removeFinalizersFromDependentResources(ctx, autoscalingRunnerSet, log); err != nil {
log.Error(err, "Failed to remove finalizers on dependent resources")
return ctrl.Result{}, err
}
if requeue {
log.Info("Waiting for dependent resources to be deleted")
return ctrl.Result{Requeue: true}, nil
}
log.Info("Removing finalizer")
err = patch(ctx, r.Client, autoscalingRunnerSet, func(obj *v1alpha1.AutoscalingRunnerSet) {
controllerutil.RemoveFinalizer(obj, autoscalingRunnerSetFinalizerName)
@@ -205,7 +201,7 @@ func (r *AutoscalingRunnerSetReconciler) Reconcile(ctx context.Context, req ctrl
}
// Make sure the runner scale set name is up to date
currentRunnerScaleSetName, ok := autoscalingRunnerSet.Annotations[runnerScaleSetNameAnnotationKey]
currentRunnerScaleSetName, ok := autoscalingRunnerSet.Annotations[AnnotationKeyGitHubRunnerScaleSetName]
if !ok || (len(autoscalingRunnerSet.Spec.RunnerScaleSetName) > 0 && !strings.EqualFold(currentRunnerScaleSetName, autoscalingRunnerSet.Spec.RunnerScaleSetName)) {
log.Info("AutoScalingRunnerSet runner scale set name changed. Updating the runner scale set.")
return r.updateRunnerScaleSetName(ctx, autoscalingRunnerSet, log)
@@ -231,9 +227,8 @@ func (r *AutoscalingRunnerSetReconciler) Reconcile(ctx context.Context, req ctrl
return r.createEphemeralRunnerSet(ctx, autoscalingRunnerSet, log)
}
desiredSpecHash := autoscalingRunnerSet.RunnerSetSpecHash()
for _, runnerSet := range existingRunnerSets.all() {
log.Info("Find existing ephemeral runner set", "name", runnerSet.Name, "specHash", runnerSet.Labels[labelKeyRunnerSpecHash])
log.Info("Find existing ephemeral runner set", "name", runnerSet.Name, "specHash", runnerSet.Annotations[annotationKeyRunnerSpecHash])
}
// Make sure the AutoscalingListener is up and running in the controller namespace
@@ -250,7 +245,9 @@ func (r *AutoscalingRunnerSetReconciler) Reconcile(ctx context.Context, req ctrl
}
// Our listener pod is out of date, so we need to delete it so that it gets re-created.
if listenerFound && (listener.Labels[labelKeyRunnerSpecHash] != autoscalingRunnerSet.ListenerSpecHash()) {
listenerValuesHashChanged := listener.Annotations[annotationKeyValuesHash] != autoscalingRunnerSet.Annotations[annotationKeyValuesHash]
listenerSpecHashChanged := listener.Annotations[annotationKeyRunnerSpecHash] != autoscalingRunnerSet.ListenerSpecHash()
if listenerFound && (listenerValuesHashChanged || listenerSpecHashChanged) {
log.Info("RunnerScaleSetListener is out of date. Deleting it so that it is recreated", "name", listener.Name)
if err := r.Delete(ctx, listener); err != nil {
if kerrors.IsNotFound(err) {
@@ -264,7 +261,7 @@ func (r *AutoscalingRunnerSetReconciler) Reconcile(ctx context.Context, req ctrl
return ctrl.Result{}, nil
}
if desiredSpecHash != latestRunnerSet.Labels[labelKeyRunnerSpecHash] {
if latestRunnerSet.Annotations[annotationKeyRunnerSpecHash] != autoscalingRunnerSet.RunnerSetSpecHash() {
if r.drainingJobs(&latestRunnerSet.Status) {
log.Info("Latest runner set spec hash does not match the current autoscaling runner set. Waiting for the running and pending runners to finish:", "running", latestRunnerSet.Status.RunningEphemeralRunners, "pending", latestRunnerSet.Status.PendingEphemeralRunners)
log.Info("Scaling down the number of desired replicas to 0")
@@ -272,6 +269,7 @@ func (r *AutoscalingRunnerSetReconciler) Reconcile(ctx context.Context, req ctrl
// need to scale down to 0
err := patch(ctx, r.Client, latestRunnerSet, func(obj *v1alpha1.EphemeralRunnerSet) {
obj.Spec.Replicas = 0
obj.Spec.PatchID = 0
})
if err != nil {
log.Error(err, "Failed to patch runner set to set desired count to 0")
@@ -384,7 +382,7 @@ func (r *AutoscalingRunnerSetReconciler) deleteEphemeralRunnerSets(ctx context.C
return nil
}
func (r *AutoscalingRunnerSetReconciler) removeFinalizersFromDependentResources(ctx context.Context, autoscalingRunnerSet *v1alpha1.AutoscalingRunnerSet, logger logr.Logger) (requeue bool, err error) {
func (r *AutoscalingRunnerSetReconciler) removeFinalizersFromDependentResources(ctx context.Context, autoscalingRunnerSet *v1alpha1.AutoscalingRunnerSet, logger logr.Logger) error {
c := autoscalingRunnerSetFinalizerDependencyCleaner{
client: r.Client,
autoscalingRunnerSet: autoscalingRunnerSet,
@@ -399,12 +397,7 @@ func (r *AutoscalingRunnerSetReconciler) removeFinalizersFromDependentResources(
c.removeManagerRoleBindingFinalizer(ctx)
c.removeManagerRoleFinalizer(ctx)
requeue, err = c.result()
if err != nil {
logger.Error(err, "Failed to cleanup finalizer from dependent resource")
return true, err
}
return requeue, nil
return c.Err()
}
func (r *AutoscalingRunnerSetReconciler) createRunnerScaleSet(ctx context.Context, autoscalingRunnerSet *v1alpha1.AutoscalingRunnerSet, logger logr.Logger) (ctrl.Result, error) {
@@ -480,7 +473,7 @@ func (r *AutoscalingRunnerSetReconciler) createRunnerScaleSet(ctx context.Contex
logger.Info("Adding runner scale set ID, name and runner group name as an annotation and url labels")
if err = patch(ctx, r.Client, autoscalingRunnerSet, func(obj *v1alpha1.AutoscalingRunnerSet) {
obj.Annotations[runnerScaleSetNameAnnotationKey] = runnerScaleSet.Name
obj.Annotations[AnnotationKeyGitHubRunnerScaleSetName] = runnerScaleSet.Name
obj.Annotations[runnerScaleSetIdAnnotationKey] = strconv.Itoa(runnerScaleSet.Id)
obj.Annotations[AnnotationKeyGitHubRunnerGroupName] = runnerScaleSet.RunnerGroupName
if err := applyGitHubURLLabels(obj.Spec.GitHubConfigUrl, obj.Labels); err != nil { // should never happen
@@ -528,9 +521,10 @@ func (r *AutoscalingRunnerSetReconciler) updateRunnerScaleSetRunnerGroup(ctx con
return ctrl.Result{}, err
}
logger.Info("Updating runner scale set runner group name as an annotation")
logger.Info("Updating runner scale set name and runner group name as annotations")
if err := patch(ctx, r.Client, autoscalingRunnerSet, func(obj *v1alpha1.AutoscalingRunnerSet) {
obj.Annotations[AnnotationKeyGitHubRunnerGroupName] = updatedRunnerScaleSet.RunnerGroupName
obj.Annotations[AnnotationKeyGitHubRunnerScaleSetName] = updatedRunnerScaleSet.Name
}); err != nil {
logger.Error(err, "Failed to update runner group name annotation")
return ctrl.Result{}, err
@@ -566,7 +560,7 @@ func (r *AutoscalingRunnerSetReconciler) updateRunnerScaleSetName(ctx context.Co
logger.Info("Updating runner scale set name as an annotation")
if err := patch(ctx, r.Client, autoscalingRunnerSet, func(obj *v1alpha1.AutoscalingRunnerSet) {
obj.Annotations[runnerScaleSetNameAnnotationKey] = updatedRunnerScaleSet.Name
obj.Annotations[AnnotationKeyGitHubRunnerScaleSetName] = updatedRunnerScaleSet.Name
}); err != nil {
logger.Error(err, "Failed to update runner scale set name annotation")
return ctrl.Result{}, err
@@ -628,7 +622,7 @@ func (r *AutoscalingRunnerSetReconciler) deleteRunnerScaleSet(ctx context.Contex
}
func (r *AutoscalingRunnerSetReconciler) createEphemeralRunnerSet(ctx context.Context, autoscalingRunnerSet *v1alpha1.AutoscalingRunnerSet, log logr.Logger) (ctrl.Result, error) {
desiredRunnerSet, err := r.resourceBuilder.newEphemeralRunnerSet(autoscalingRunnerSet)
desiredRunnerSet, err := r.ResourceBuilder.newEphemeralRunnerSet(autoscalingRunnerSet)
if err != nil {
log.Error(err, "Could not create EphemeralRunnerSet")
return ctrl.Result{}, err
@@ -657,7 +651,7 @@ func (r *AutoscalingRunnerSetReconciler) createAutoScalingListenerForRunnerSet(c
})
}
autoscalingListener, err := r.resourceBuilder.newAutoScalingListener(autoscalingRunnerSet, ephemeralRunnerSet, r.ControllerNamespace, r.DefaultRunnerScaleSetListenerImage, imagePullSecrets)
autoscalingListener, err := r.ResourceBuilder.newAutoScalingListener(autoscalingRunnerSet, ephemeralRunnerSet, r.ControllerNamespace, r.DefaultRunnerScaleSetListenerImage, imagePullSecrets)
if err != nil {
log.Error(err, "Could not create AutoscalingListener spec")
return ctrl.Result{}, err
@@ -752,26 +746,6 @@ func (r *AutoscalingRunnerSetReconciler) actionsClientOptionsFor(ctx context.Con
// SetupWithManager sets up the controller with the Manager.
func (r *AutoscalingRunnerSetReconciler) SetupWithManager(mgr ctrl.Manager) error {
groupVersionIndexer := func(rawObj client.Object) []string {
groupVersion := v1alpha1.GroupVersion.String()
owner := metav1.GetControllerOf(rawObj)
if owner == nil {
return nil
}
// ...make sure it is owned by this controller
if owner.APIVersion != groupVersion || owner.Kind != "AutoscalingRunnerSet" {
return nil
}
// ...and if so, return it
return []string{owner.Name}
}
if err := mgr.GetFieldIndexer().IndexField(context.Background(), &v1alpha1.EphemeralRunnerSet{}, resourceOwnerKey, groupVersionIndexer); err != nil {
return err
}
return ctrl.NewControllerManagedBy(mgr).
For(&v1alpha1.AutoscalingRunnerSet{}).
Owns(&v1alpha1.EphemeralRunnerSet{}).
@@ -798,17 +772,16 @@ type autoscalingRunnerSetFinalizerDependencyCleaner struct {
autoscalingRunnerSet *v1alpha1.AutoscalingRunnerSet
logger logr.Logger
// fields to operate on
requeue bool
err error
err error
}
func (c *autoscalingRunnerSetFinalizerDependencyCleaner) result() (requeue bool, err error) {
return c.requeue, c.err
func (c *autoscalingRunnerSetFinalizerDependencyCleaner) Err() error {
return c.err
}
func (c *autoscalingRunnerSetFinalizerDependencyCleaner) removeKubernetesModeRoleBindingFinalizer(ctx context.Context) {
if c.requeue || c.err != nil {
if c.err != nil {
c.logger.Info("Skipping cleaning up kubernetes mode service account")
return
}
@@ -839,7 +812,6 @@ func (c *autoscalingRunnerSetFinalizerDependencyCleaner) removeKubernetesModeRol
c.err = fmt.Errorf("failed to patch kubernetes mode role binding without finalizer: %w", err)
return
}
c.requeue = true
c.logger.Info("Removed finalizer from container mode kubernetes role binding", "name", roleBindingName)
return
case err != nil && !kerrors.IsNotFound(err):
@@ -852,7 +824,7 @@ func (c *autoscalingRunnerSetFinalizerDependencyCleaner) removeKubernetesModeRol
}
func (c *autoscalingRunnerSetFinalizerDependencyCleaner) removeKubernetesModeRoleFinalizer(ctx context.Context) {
if c.requeue || c.err != nil {
if c.err != nil {
return
}
@@ -882,7 +854,6 @@ func (c *autoscalingRunnerSetFinalizerDependencyCleaner) removeKubernetesModeRol
c.err = fmt.Errorf("failed to patch kubernetes mode role without finalizer: %w", err)
return
}
c.requeue = true
c.logger.Info("Removed finalizer from container mode kubernetes role")
return
case err != nil && !kerrors.IsNotFound(err):
@@ -895,7 +866,7 @@ func (c *autoscalingRunnerSetFinalizerDependencyCleaner) removeKubernetesModeRol
}
func (c *autoscalingRunnerSetFinalizerDependencyCleaner) removeKubernetesModeServiceAccountFinalizer(ctx context.Context) {
if c.requeue || c.err != nil {
if c.err != nil {
return
}
@@ -926,7 +897,6 @@ func (c *autoscalingRunnerSetFinalizerDependencyCleaner) removeKubernetesModeSer
c.err = fmt.Errorf("failed to patch kubernetes mode service account without finalizer: %w", err)
return
}
c.requeue = true
c.logger.Info("Removed finalizer from container mode kubernetes service account")
return
case err != nil && !kerrors.IsNotFound(err):
@@ -939,7 +909,7 @@ func (c *autoscalingRunnerSetFinalizerDependencyCleaner) removeKubernetesModeSer
}
func (c *autoscalingRunnerSetFinalizerDependencyCleaner) removeNoPermissionServiceAccountFinalizer(ctx context.Context) {
if c.requeue || c.err != nil {
if c.err != nil {
return
}
@@ -970,7 +940,6 @@ func (c *autoscalingRunnerSetFinalizerDependencyCleaner) removeNoPermissionServi
c.err = fmt.Errorf("failed to patch service account without finalizer: %w", err)
return
}
c.requeue = true
c.logger.Info("Removed finalizer from no permission service account", "name", serviceAccountName)
return
case err != nil && !kerrors.IsNotFound(err):
@@ -983,7 +952,7 @@ func (c *autoscalingRunnerSetFinalizerDependencyCleaner) removeNoPermissionServi
}
func (c *autoscalingRunnerSetFinalizerDependencyCleaner) removeGitHubSecretFinalizer(ctx context.Context) {
if c.requeue || c.err != nil {
if c.err != nil {
return
}
@@ -1014,7 +983,6 @@ func (c *autoscalingRunnerSetFinalizerDependencyCleaner) removeGitHubSecretFinal
c.err = fmt.Errorf("failed to patch GitHub secret without finalizer: %w", err)
return
}
c.requeue = true
c.logger.Info("Removed finalizer from GitHub secret", "name", githubSecretName)
return
case err != nil && !kerrors.IsNotFound(err) && !kerrors.IsForbidden(err):
@@ -1027,7 +995,7 @@ func (c *autoscalingRunnerSetFinalizerDependencyCleaner) removeGitHubSecretFinal
}
func (c *autoscalingRunnerSetFinalizerDependencyCleaner) removeManagerRoleBindingFinalizer(ctx context.Context) {
if c.requeue || c.err != nil {
if c.err != nil {
return
}
@@ -1058,7 +1026,6 @@ func (c *autoscalingRunnerSetFinalizerDependencyCleaner) removeManagerRoleBindin
c.err = fmt.Errorf("failed to patch manager role binding without finalizer: %w", err)
return
}
c.requeue = true
c.logger.Info("Removed finalizer from manager role binding", "name", managerRoleBindingName)
return
case err != nil && !kerrors.IsNotFound(err):
@@ -1071,7 +1038,7 @@ func (c *autoscalingRunnerSetFinalizerDependencyCleaner) removeManagerRoleBindin
}
func (c *autoscalingRunnerSetFinalizerDependencyCleaner) removeManagerRoleFinalizer(ctx context.Context) {
if c.requeue || c.err != nil {
if c.err != nil {
return
}
@@ -1102,7 +1069,6 @@ func (c *autoscalingRunnerSetFinalizerDependencyCleaner) removeManagerRoleFinali
c.err = fmt.Errorf("failed to patch manager role without finalizer: %w", err)
return
}
c.requeue = true
c.logger.Info("Removed finalizer from manager role", "name", managerRoleName)
return
case err != nil && !kerrors.IsNotFound(err):


@@ -34,7 +34,7 @@ import (
)
const (
autoscalingRunnerSetTestTimeout = time.Second * 5
autoscalingRunnerSetTestTimeout = time.Second * 20
autoscalingRunnerSetTestInterval = time.Millisecond * 250
autoscalingRunnerSetTestGitHubToken = "gh_token"
)
@@ -280,6 +280,10 @@ var _ = Describe("Test AutoScalingRunnerSet controller", Ordered, func() {
// This should trigger re-creation of EphemeralRunnerSet and Listener
patched := autoscalingRunnerSet.DeepCopy()
patched.Spec.Template.Spec.PriorityClassName = "test-priority-class"
if patched.ObjectMeta.Annotations == nil {
patched.ObjectMeta.Annotations = make(map[string]string)
}
patched.ObjectMeta.Annotations[annotationKeyValuesHash] = "test-hash"
err = k8sClient.Patch(ctx, patched, client.MergeFrom(autoscalingRunnerSet))
Expect(err).NotTo(HaveOccurred(), "failed to patch AutoScalingRunnerSet")
autoscalingRunnerSet = patched.DeepCopy()
@@ -297,10 +301,10 @@ var _ = Describe("Test AutoScalingRunnerSet controller", Ordered, func() {
return "", fmt.Errorf("We should have only 1 EphemeralRunnerSet, but got %v", len(runnerSetList.Items))
}
return runnerSetList.Items[0].Labels[labelKeyRunnerSpecHash], nil
return runnerSetList.Items[0].Annotations[annotationKeyRunnerSpecHash], nil
},
autoscalingRunnerSetTestTimeout,
autoscalingRunnerSetTestInterval).ShouldNot(BeEquivalentTo(runnerSet.Labels[labelKeyRunnerSpecHash]), "New EphemeralRunnerSet should be created")
autoscalingRunnerSetTestInterval).ShouldNot(BeEquivalentTo(runnerSet.Annotations[annotationKeyRunnerSpecHash]), "New EphemeralRunnerSet should be created")
// We should create a new listener
Eventually(
@@ -334,6 +338,55 @@ var _ = Describe("Test AutoScalingRunnerSet controller", Ordered, func() {
err = k8sClient.Patch(ctx, patched, client.MergeFrom(autoscalingRunnerSet))
Expect(err).NotTo(HaveOccurred(), "failed to patch AutoScalingRunnerSet")
// We should not re-create a new EphemeralRunnerSet
Consistently(
func() (string, error) {
runnerSetList := new(v1alpha1.EphemeralRunnerSetList)
err := k8sClient.List(ctx, runnerSetList, client.InNamespace(autoscalingRunnerSet.Namespace))
if err != nil {
return "", err
}
if len(runnerSetList.Items) != 1 {
return "", fmt.Errorf("We should have only 1 EphemeralRunnerSet, but got %v", len(runnerSetList.Items))
}
return string(runnerSetList.Items[0].UID), nil
},
autoscalingRunnerSetTestTimeout,
autoscalingRunnerSetTestInterval).Should(BeEquivalentTo(string(runnerSet.UID)), "New EphemeralRunnerSet should not be created")
// We should only re-create a new listener
Eventually(
func() (string, error) {
listener := new(v1alpha1.AutoscalingListener)
err := k8sClient.Get(ctx, client.ObjectKey{Name: scaleSetListenerName(autoscalingRunnerSet), Namespace: autoscalingRunnerSet.Namespace}, listener)
if err != nil {
return "", err
}
return string(listener.UID), nil
},
autoscalingRunnerSetTestTimeout,
autoscalingRunnerSetTestInterval).ShouldNot(BeEquivalentTo(string(listener.UID)), "New Listener should be created")
// Only update the values hash for the autoscaling runner set
// This should trigger re-creation of the Listener only
runnerSetList = new(v1alpha1.EphemeralRunnerSetList)
err = k8sClient.List(ctx, runnerSetList, client.InNamespace(autoscalingRunnerSet.Namespace))
Expect(err).NotTo(HaveOccurred(), "failed to list EphemeralRunnerSet")
Expect(len(runnerSetList.Items)).To(Equal(1), "There should be 1 EphemeralRunnerSet")
runnerSet = runnerSetList.Items[0]
listener = new(v1alpha1.AutoscalingListener)
err = k8sClient.Get(ctx, client.ObjectKey{Name: scaleSetListenerName(autoscalingRunnerSet), Namespace: autoscalingRunnerSet.Namespace}, listener)
Expect(err).NotTo(HaveOccurred(), "failed to get Listener")
patched = autoscalingRunnerSet.DeepCopy()
patched.ObjectMeta.Annotations[annotationKeyValuesHash] = "hash-changes"
err = k8sClient.Patch(ctx, patched, client.MergeFrom(autoscalingRunnerSet))
Expect(err).NotTo(HaveOccurred(), "failed to patch AutoScalingRunnerSet")
// We should not re-create a new EphemeralRunnerSet
Consistently(
func() (string, error) {
@@ -493,6 +546,10 @@ var _ = Describe("Test AutoScalingRunnerSet controller", Ordered, func() {
// Patch the AutoScalingRunnerSet image which should trigger
// the recreation of the Listener and EphemeralRunnerSet
patched := autoscalingRunnerSet.DeepCopy()
if patched.ObjectMeta.Annotations == nil {
patched.ObjectMeta.Annotations = make(map[string]string)
}
patched.ObjectMeta.Annotations[annotationKeyValuesHash] = "testgroup2"
patched.Spec.Template.Spec = corev1.PodSpec{
Containers: []corev1.Container{
{
@@ -501,7 +558,6 @@ var _ = Describe("Test AutoScalingRunnerSet controller", Ordered, func() {
},
},
}
// patched.Spec.Template.Spec.PriorityClassName = "test-priority-class"
err = k8sClient.Patch(ctx, patched, client.MergeFrom(autoscalingRunnerSet))
Expect(err).NotTo(HaveOccurred(), "failed to patch AutoScalingRunnerSet")
autoscalingRunnerSet = patched.DeepCopy()
@@ -698,7 +754,7 @@ var _ = Describe("Test AutoScalingController updates", Ordered, func() {
return "", err
}
if val, ok := ars.Annotations[runnerScaleSetNameAnnotationKey]; ok {
if val, ok := ars.Annotations[AnnotationKeyGitHubRunnerScaleSetName]; ok {
return val, nil
}
@@ -722,7 +778,7 @@ var _ = Describe("Test AutoScalingController updates", Ordered, func() {
return "", err
}
if val, ok := ars.Annotations[runnerScaleSetNameAnnotationKey]; ok {
if val, ok := ars.Annotations[AnnotationKeyGitHubRunnerScaleSetName]; ok {
return val, nil
}


@@ -39,7 +39,11 @@ const (
// Finalizer used to protect resources from deletion while AutoscalingRunnerSet is running
const AutoscalingRunnerSetCleanupFinalizerName = "actions.github.com/cleanup-protection"
const AnnotationKeyGitHubRunnerGroupName = "actions.github.com/runner-group-name"
const (
AnnotationKeyGitHubRunnerGroupName = "actions.github.com/runner-group-name"
AnnotationKeyGitHubRunnerScaleSetName = "actions.github.com/runner-scale-set-name"
AnnotationKeyPatchID = "actions.github.com/patch-id"
)
// Labels applied to listener roles
const (


@@ -21,7 +21,6 @@ import (
"errors"
"fmt"
"net/http"
"strings"
"time"
"github.com/actions/actions-runner-controller/apis/actions.github.com/v1alpha1"
@@ -50,10 +49,10 @@ const (
// EphemeralRunnerReconciler reconciles a EphemeralRunner object
type EphemeralRunnerReconciler struct {
client.Client
Log logr.Logger
Scheme *runtime.Scheme
ActionsClient actions.MultiClient
resourceBuilder resourceBuilder
Log logr.Logger
Scheme *runtime.Scheme
ActionsClient actions.MultiClient
ResourceBuilder
}
// +kubebuilder:rbac:groups=actions.github.com,resources=ephemeralrunners,verbs=get;list;watch;create;update;patch;delete
@@ -133,17 +132,21 @@ func (r *EphemeralRunnerReconciler) Reconcile(ctx context.Context, req ctrl.Requ
return ctrl.Result{}, nil
}
if !controllerutil.ContainsFinalizer(ephemeralRunner, ephemeralRunnerActionsFinalizerName) {
log.Info("Adding runner registration finalizer")
err := patch(ctx, r.Client, ephemeralRunner, func(obj *v1alpha1.EphemeralRunner) {
controllerutil.AddFinalizer(obj, ephemeralRunnerActionsFinalizerName)
})
if ephemeralRunner.IsDone() {
log.Info("Cleaning up resources after after ephemeral runner termination", "phase", ephemeralRunner.Status.Phase)
done, err := r.cleanupResources(ctx, ephemeralRunner, log)
if err != nil {
log.Error(err, "Failed to update with runner registration finalizer set")
log.Error(err, "Failed to clean up ephemeral runner owned resources")
return ctrl.Result{}, err
}
log.Info("Successfully added runner registration finalizer")
if !done {
log.Info("Waiting for ephemeral runner owned resources to be deleted")
return ctrl.Result{Requeue: true}, nil
}
// Stop reconciling on this object.
// The EphemeralRunnerSet is responsible for cleaning it up.
log.Info("EphemeralRunner has already finished. Stopping reconciliation and waiting for EphemeralRunnerSet to clean it up", "phase", ephemeralRunner.Status.Phase)
return ctrl.Result{}, nil
}
if !controllerutil.ContainsFinalizer(ephemeralRunner, ephemeralRunnerFinalizerName) {
@@ -159,16 +162,25 @@ func (r *EphemeralRunnerReconciler) Reconcile(ctx context.Context, req ctrl.Requ
return ctrl.Result{}, nil
}
if ephemeralRunner.Status.Phase == corev1.PodSucceeded || ephemeralRunner.Status.Phase == corev1.PodFailed {
// Stop reconciling on this object.
// The EphemeralRunnerSet is responsible for cleaning it up.
log.Info("EphemeralRunner has already finished. Stopping reconciliation and waiting for EphemeralRunnerSet to clean it up", "phase", ephemeralRunner.Status.Phase)
if !controllerutil.ContainsFinalizer(ephemeralRunner, ephemeralRunnerActionsFinalizerName) {
log.Info("Adding runner registration finalizer")
err := patch(ctx, r.Client, ephemeralRunner, func(obj *v1alpha1.EphemeralRunner) {
controllerutil.AddFinalizer(obj, ephemeralRunnerActionsFinalizerName)
})
if err != nil {
log.Error(err, "Failed to update with runner registration finalizer set")
return ctrl.Result{}, err
}
log.Info("Successfully added runner registration finalizer")
return ctrl.Result{}, nil
}
if ephemeralRunner.Status.RunnerId == 0 {
log.Info("Creating new ephemeral runner registration and updating status with runner config")
return r.updateStatusWithRunnerConfig(ctx, ephemeralRunner, log)
if r, err := r.updateStatusWithRunnerConfig(ctx, ephemeralRunner, log); r != nil {
return *r, err
}
}
secret := new(corev1.Secret)
@@ -179,7 +191,17 @@ func (r *EphemeralRunnerReconciler) Reconcile(ctx context.Context, req ctrl.Requ
}
// create secret if not created
log.Info("Creating new ephemeral runner secret for jitconfig.")
return r.createSecret(ctx, ephemeralRunner, log)
if r, err := r.createSecret(ctx, ephemeralRunner, log); r != nil {
return *r, err
}
// Re-fetch the secret that was just created.
// Otherwise, even though we want to continue and create the pod,
// pod creation fails because the missing secret makes the pod spec invalid.
if err := r.Get(ctx, req.NamespacedName, secret); err != nil {
log.Error(err, "Failed to fetch secret")
return ctrl.Result{}, err
}
}
pod := new(corev1.Pod)
@@ -285,14 +307,17 @@ func (r *EphemeralRunnerReconciler) Reconcile(ctx context.Context, req ctrl.Requ
}
func (r *EphemeralRunnerReconciler) cleanupRunnerFromService(ctx context.Context, ephemeralRunner *v1alpha1.EphemeralRunner, log logr.Logger) (ctrl.Result, error) {
actionsError := &actions.ActionsError{}
err := r.deleteRunnerFromService(ctx, ephemeralRunner, log)
if err != nil {
if errors.As(err, &actionsError) &&
actionsError.StatusCode == http.StatusBadRequest &&
strings.Contains(actionsError.ExceptionName, "JobStillRunningException") {
if err := r.deleteRunnerFromService(ctx, ephemeralRunner, log); err != nil {
actionsError := &actions.ActionsError{}
if !errors.As(err, &actionsError) {
log.Error(err, "Failed to clean up runner from the service (not an ActionsError)")
return ctrl.Result{}, err
}
if actionsError.StatusCode == http.StatusBadRequest && actionsError.IsException("JobStillRunningException") {
log.Info("Runner is still running the job. Re-queue in 30 seconds")
return ctrl.Result{RequeueAfter: 30 * time.Second}, nil
}
log.Error(err, "Failed clean up runner from the service")
@@ -300,10 +325,9 @@ func (r *EphemeralRunnerReconciler) cleanupRunnerFromService(ctx context.Context
}
log.Info("Successfully removed runner registration from service")
err = patch(ctx, r.Client, ephemeralRunner, func(obj *v1alpha1.EphemeralRunner) {
if err := patch(ctx, r.Client, ephemeralRunner, func(obj *v1alpha1.EphemeralRunner) {
controllerutil.RemoveFinalizer(obj, ephemeralRunnerActionsFinalizerName)
})
if err != nil {
}); err != nil {
return ctrl.Result{}, err
}
@@ -324,7 +348,7 @@ func (r *EphemeralRunnerReconciler) cleanupResources(ctx context.Context, epheme
}
}
return false, nil
case err != nil && !kerrors.IsNotFound(err):
case !kerrors.IsNotFound(err):
return false, err
}
log.Info("Pod is deleted")
@@ -341,7 +365,7 @@ func (r *EphemeralRunnerReconciler) cleanupResources(ctx context.Context, epheme
}
}
return false, nil
case err != nil && !kerrors.IsNotFound(err):
case !kerrors.IsNotFound(err):
return false, err
}
log.Info("Secret is deleted")
@@ -499,27 +523,35 @@ func (r *EphemeralRunnerReconciler) deletePodAsFailed(ctx context.Context, ephem
// updateStatusWithRunnerConfig fetches runtime configuration needed by the runner
// This method should always set .status.runnerId and .status.runnerJITConfig
func (r *EphemeralRunnerReconciler) updateStatusWithRunnerConfig(ctx context.Context, ephemeralRunner *v1alpha1.EphemeralRunner, log logr.Logger) (ctrl.Result, error) {
func (r *EphemeralRunnerReconciler) updateStatusWithRunnerConfig(ctx context.Context, ephemeralRunner *v1alpha1.EphemeralRunner, log logr.Logger) (*ctrl.Result, error) {
// Runner is not registered with the service. We need to register it first
log.Info("Creating ephemeral runner JIT config")
actionsClient, err := r.actionsClientFor(ctx, ephemeralRunner)
if err != nil {
return ctrl.Result{}, fmt.Errorf("failed to get actions client for generating JIT config: %v", err)
return &ctrl.Result{}, fmt.Errorf("failed to get actions client for generating JIT config: %v", err)
}
jitSettings := &actions.RunnerScaleSetJitRunnerSetting{
Name: ephemeralRunner.Name,
}
for i := range ephemeralRunner.Spec.Spec.Containers {
if ephemeralRunner.Spec.Spec.Containers[i].Name == EphemeralRunnerContainerName &&
ephemeralRunner.Spec.Spec.Containers[i].WorkingDir != "" {
jitSettings.WorkFolder = ephemeralRunner.Spec.Spec.Containers[i].WorkingDir
}
}
jitConfig, err := actionsClient.GenerateJitRunnerConfig(ctx, jitSettings, ephemeralRunner.Spec.RunnerScaleSetId)
if err != nil {
actionsError := &actions.ActionsError{}
if !errors.As(err, &actionsError) {
return ctrl.Result{}, fmt.Errorf("failed to generate JIT config with generic error: %v", err)
return &ctrl.Result{}, fmt.Errorf("failed to generate JIT config with generic error: %v", err)
}
if actionsError.StatusCode != http.StatusConflict ||
!strings.Contains(actionsError.ExceptionName, "AgentExistsException") {
return ctrl.Result{}, fmt.Errorf("failed to generate JIT config with Actions service error: %v", err)
!actionsError.IsException("AgentExistsException") {
return &ctrl.Result{}, fmt.Errorf("failed to generate JIT config with Actions service error: %v", err)
}
// If the runner with the name we want already exists it means:
@@ -532,12 +564,12 @@ func (r *EphemeralRunnerReconciler) updateStatusWithRunnerConfig(ctx context.Con
log.Info("Getting runner jit config failed with conflict error, trying to get the runner by name", "runnerName", ephemeralRunner.Name)
existingRunner, err := actionsClient.GetRunnerByName(ctx, ephemeralRunner.Name)
if err != nil {
return ctrl.Result{}, fmt.Errorf("failed to get runner by name: %v", err)
return &ctrl.Result{}, fmt.Errorf("failed to get runner by name: %v", err)
}
if existingRunner == nil {
log.Info("Runner with the same name does not exist, re-queuing the reconciliation")
return ctrl.Result{Requeue: true}, nil
return &ctrl.Result{Requeue: true}, nil
}
log.Info("Found the runner with the same name", "runnerId", existingRunner.Id, "runnerScaleSetId", existingRunner.RunnerScaleSetId)
@@ -545,16 +577,16 @@ func (r *EphemeralRunnerReconciler) updateStatusWithRunnerConfig(ctx context.Con
log.Info("Removing the runner with the same name")
err := actionsClient.RemoveRunner(ctx, int64(existingRunner.Id))
if err != nil {
return ctrl.Result{}, fmt.Errorf("failed to remove runner from the service: %v", err)
return &ctrl.Result{}, fmt.Errorf("failed to remove runner from the service: %v", err)
}
log.Info("Removed the runner with the same name, re-queuing the reconciliation")
return ctrl.Result{Requeue: true}, nil
return &ctrl.Result{Requeue: true}, nil
}
// TODO: Do we want to mark the ephemeral runner as failed and let the EphemeralRunnerSet clean it up, so we can recover from this situation?
// The situation is that the EphemeralRunner's name is already used by something else to register a runner, and we can't take control back.
return ctrl.Result{}, fmt.Errorf("runner with the same name but doesn't belong to this RunnerScaleSet: %v", err)
return &ctrl.Result{}, fmt.Errorf("runner with the same name but doesn't belong to this RunnerScaleSet: %v", err)
}
log.Info("Created ephemeral runner JIT config", "runnerId", jitConfig.Runner.Id)
@@ -565,11 +597,20 @@ func (r *EphemeralRunnerReconciler) updateStatusWithRunnerConfig(ctx context.Con
obj.Status.RunnerJITConfig = jitConfig.EncodedJITConfig
})
if err != nil {
return ctrl.Result{}, fmt.Errorf("failed to update runner status for RunnerId/RunnerName/RunnerJITConfig: %v", err)
return &ctrl.Result{}, fmt.Errorf("failed to update runner status for RunnerId/RunnerName/RunnerJITConfig: %v", err)
}
// We want to continue without a requeue for faster pod creation.
//
// To do so, we update the status in-place, so that continuing the loop and
// requeuing (which skips updateStatusWithRunnerConfig in the next loop)
// have the same effect.
ephemeralRunner.Status.RunnerId = jitConfig.Runner.Id
ephemeralRunner.Status.RunnerName = jitConfig.Runner.Name
ephemeralRunner.Status.RunnerJITConfig = jitConfig.EncodedJITConfig
log.Info("Updated ephemeral runner status with runnerId and runnerJITConfig")
return ctrl.Result{}, nil
return nil, nil
}
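The change above makes updateStatusWithRunnerConfig (and createSecret below) return a *ctrl.Result, where nil appears to mean "continue this reconcile pass" and a non-nil result is returned to controller-runtime as-is. A toy sketch of that caller contract, with placeholder function names rather than the controller's methods:
package main
import (
	"fmt"
	ctrl "sigs.k8s.io/controller-runtime"
)
// step returns nil to signal "keep going in the same reconcile pass",
// or a *ctrl.Result (possibly with an error) that the caller must return.
func step(done bool) (*ctrl.Result, error) {
	if !done {
		return &ctrl.Result{Requeue: true}, nil
	}
	return nil, nil
}
func reconcile() (ctrl.Result, error) {
	if r, err := step(false); r != nil {
		return *r, err // stop here and hand the result back to controller-runtime
	}
	// fall through to later steps without a requeue
	return ctrl.Result{}, nil
}
func main() {
	res, err := reconcile()
	fmt.Println(res.Requeue, err) // true <nil>
}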
func (r *EphemeralRunnerReconciler) createPod(ctx context.Context, runner *v1alpha1.EphemeralRunner, secret *corev1.Secret, log logr.Logger) (ctrl.Result, error) {
@@ -622,7 +663,7 @@ func (r *EphemeralRunnerReconciler) createPod(ctx context.Context, runner *v1alp
}
log.Info("Creating new pod for ephemeral runner")
newPod := r.resourceBuilder.newEphemeralRunnerPod(ctx, runner, secret, envs...)
newPod := r.ResourceBuilder.newEphemeralRunnerPod(ctx, runner, secret, envs...)
if err := ctrl.SetControllerReference(runner, newPod, r.Scheme); err != nil {
log.Error(err, "Failed to set controller reference to a new pod")
@@ -645,21 +686,21 @@ func (r *EphemeralRunnerReconciler) createPod(ctx context.Context, runner *v1alp
return ctrl.Result{}, nil
}
func (r *EphemeralRunnerReconciler) createSecret(ctx context.Context, runner *v1alpha1.EphemeralRunner, log logr.Logger) (ctrl.Result, error) {
func (r *EphemeralRunnerReconciler) createSecret(ctx context.Context, runner *v1alpha1.EphemeralRunner, log logr.Logger) (*ctrl.Result, error) {
log.Info("Creating new secret for ephemeral runner")
jitSecret := r.resourceBuilder.newEphemeralRunnerJitSecret(runner)
jitSecret := r.ResourceBuilder.newEphemeralRunnerJitSecret(runner)
if err := ctrl.SetControllerReference(runner, jitSecret, r.Scheme); err != nil {
return ctrl.Result{}, fmt.Errorf("failed to set controller reference: %v", err)
return &ctrl.Result{}, fmt.Errorf("failed to set controller reference: %v", err)
}
log.Info("Created new secret spec for ephemeral runner")
if err := r.Create(ctx, jitSecret); err != nil {
return ctrl.Result{}, fmt.Errorf("failed to create jit secret: %v", err)
return &ctrl.Result{}, fmt.Errorf("failed to create jit secret: %v", err)
}
log.Info("Created ephemeral runner secret", "secretName", jitSecret.Name)
return ctrl.Result{Requeue: true}, nil
return nil, nil
}
// updateRunStatusFromPod is responsible for updating non-exiting statuses.
@@ -675,7 +716,7 @@ func (r *EphemeralRunnerReconciler) updateRunStatusFromPod(ctx context.Context,
return nil
}
log.Info("Updating ephemeral runner status with pod phase", "phase", pod.Status.Phase, "reason", pod.Status.Reason, "message", pod.Status.Message)
log.Info("Updating ephemeral runner status with pod phase", "statusPhase", pod.Status.Phase, "statusReason", pod.Status.Reason, "statusMessage", pod.Status.Message)
err := patchSubResource(ctx, r.Status(), ephemeralRunner, func(obj *v1alpha1.EphemeralRunner) {
obj.Status.Phase = pod.Status.Phase
obj.Status.Ready = obj.Status.Ready || (pod.Status.Phase == corev1.PodRunning)
@@ -774,7 +815,7 @@ func (r EphemeralRunnerReconciler) runnerRegisteredWithService(ctx context.Conte
}
if actionsError.StatusCode != http.StatusNotFound ||
!strings.Contains(actionsError.ExceptionName, "AgentNotFoundException") {
!actionsError.IsException("AgentNotFoundException") {
return false, fmt.Errorf("failed to check if runner exists in GitHub service: %v", err)
}
@@ -803,14 +844,14 @@ func (r *EphemeralRunnerReconciler) deleteRunnerFromService(ctx context.Context,
}
// SetupWithManager sets up the controller with the Manager.
func (r *EphemeralRunnerReconciler) SetupWithManager(mgr ctrl.Manager) error {
// TODO(nikola-jokic): Add indexing and filtering fields on corev1.Pod{}
return ctrl.NewControllerManagedBy(mgr).
For(&v1alpha1.EphemeralRunner{}).
Owns(&corev1.Pod{}).
WithEventFilter(predicate.ResourceVersionChangedPredicate{}).
Named("ephemeral-runner-controller").
Complete(r)
func (r *EphemeralRunnerReconciler) SetupWithManager(mgr ctrl.Manager, opts ...Option) error {
return builderWithOptions(
ctrl.NewControllerManagedBy(mgr).
For(&v1alpha1.EphemeralRunner{}).
Owns(&corev1.Pod{}).
WithEventFilter(predicate.ResourceVersionChangedPredicate{}),
opts,
).Complete(r)
}
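SetupWithManager now accepts variadic opts and routes the builder through builderWithOptions, which is not shown in this hunk. The following is a hypothetical sketch of a functional-options shape that would fit this call site, assuming Option and builderWithOptions wrap controller-runtime's builder; the package name, option, and real definitions may differ.
package controllers
import (
	"sigs.k8s.io/controller-runtime/pkg/builder"
	"sigs.k8s.io/controller-runtime/pkg/controller"
)
// Option mutates the controller builder before Complete is called.
// This shape is assumed for illustration only.
type Option func(*builder.Builder) *builder.Builder
// WithMaxConcurrentReconciles is an illustrative option that would let
// callers tune reconcile concurrency for a controller.
func WithMaxConcurrentReconciles(n int) Option {
	return func(b *builder.Builder) *builder.Builder {
		return b.WithOptions(controller.Options{MaxConcurrentReconciles: n})
	}
}
// builderWithOptions applies each option in order and returns the builder,
// matching how SetupWithManager chains .Complete(r) onto the result.
func builderWithOptions(b *builder.Builder, opts []Option) *builder.Builder {
	for _, opt := range opts {
		b = opt(b)
	}
	return b
}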
func runnerContainerStatus(pod *corev1.Pod) *corev1.ContainerStatus {


@@ -29,9 +29,9 @@ import (
)
const (
timeout = time.Second * 10
interval = time.Millisecond * 250
runnerImage = "ghcr.io/actions/actions-runner:latest"
ephemeralRunnerTimeout = time.Second * 20
ephemeralRunnerInterval = time.Millisecond * 250
runnerImage = "ghcr.io/actions/actions-runner:latest"
)
func newExampleRunner(name, namespace, configSecretName string) *v1alpha1.EphemeralRunner {
@@ -133,9 +133,9 @@ var _ = Describe("EphemeralRunner", func() {
n := len(created.Finalizers) // avoid capacity mismatch
return created.Finalizers[:n:n], nil
},
timeout,
interval,
).Should(BeEquivalentTo([]string{ephemeralRunnerActionsFinalizerName, ephemeralRunnerFinalizerName}))
ephemeralRunnerTimeout,
ephemeralRunnerInterval,
).Should(BeEquivalentTo([]string{ephemeralRunnerFinalizerName, ephemeralRunnerActionsFinalizerName}))
Eventually(
func() (bool, error) {
@@ -147,8 +147,8 @@ var _ = Describe("EphemeralRunner", func() {
_, ok := secret.Data[jitTokenKey]
return ok, nil
},
timeout,
interval,
ephemeralRunnerTimeout,
ephemeralRunnerInterval,
).Should(BeEquivalentTo(true))
Eventually(
@@ -160,8 +160,8 @@ var _ = Describe("EphemeralRunner", func() {
return pod.Name, nil
},
timeout,
interval,
ephemeralRunnerTimeout,
ephemeralRunnerInterval,
).Should(BeEquivalentTo(ephemeralRunner.Name))
})
@@ -184,8 +184,8 @@ var _ = Describe("EphemeralRunner", func() {
}
return true, nil
},
timeout,
interval,
ephemeralRunnerTimeout,
ephemeralRunnerInterval,
).Should(BeEquivalentTo(true))
})
@@ -203,7 +203,7 @@ var _ = Describe("EphemeralRunner", func() {
return "", nil
}
return updated.Status.Phase, nil
}, timeout, interval).Should(BeEquivalentTo(corev1.PodFailed))
}, ephemeralRunnerTimeout, ephemeralRunnerInterval).Should(BeEquivalentTo(corev1.PodFailed))
Expect(updated.Status.Reason).Should(Equal("InvalidPod"))
Expect(updated.Status.Message).Should(Equal("Failed to create the pod: pods \"invalid-ephemeral-runner\" is forbidden: no PriorityClass with name notexist was found"))
})
@@ -247,8 +247,8 @@ var _ = Describe("EphemeralRunner", func() {
}
return true, nil
},
timeout,
interval,
ephemeralRunnerTimeout,
ephemeralRunnerInterval,
).Should(BeEquivalentTo(true))
// create runner linked secret
@@ -273,8 +273,8 @@ var _ = Describe("EphemeralRunner", func() {
}
return true, nil
},
timeout,
interval,
ephemeralRunnerTimeout,
ephemeralRunnerInterval,
).Should(BeEquivalentTo(true))
err = k8sClient.Delete(ctx, ephemeralRunner)
@@ -289,8 +289,8 @@ var _ = Describe("EphemeralRunner", func() {
}
return kerrors.IsNotFound(err), nil
},
timeout,
interval,
ephemeralRunnerTimeout,
ephemeralRunnerInterval,
).Should(BeEquivalentTo(true))
Eventually(
@@ -302,8 +302,8 @@ var _ = Describe("EphemeralRunner", func() {
}
return kerrors.IsNotFound(err), nil
},
timeout,
interval,
ephemeralRunnerTimeout,
ephemeralRunnerInterval,
).Should(BeEquivalentTo(true))
Eventually(
@@ -315,8 +315,8 @@ var _ = Describe("EphemeralRunner", func() {
}
return kerrors.IsNotFound(err), nil
},
timeout,
interval,
ephemeralRunnerTimeout,
ephemeralRunnerInterval,
).Should(BeEquivalentTo(true))
Eventually(
@@ -328,8 +328,8 @@ var _ = Describe("EphemeralRunner", func() {
}
return kerrors.IsNotFound(err), nil
},
timeout,
interval,
ephemeralRunnerTimeout,
ephemeralRunnerInterval,
).Should(BeEquivalentTo(true))
Eventually(
@@ -341,8 +341,8 @@ var _ = Describe("EphemeralRunner", func() {
}
return kerrors.IsNotFound(err), nil
},
timeout,
interval,
ephemeralRunnerTimeout,
ephemeralRunnerInterval,
).Should(BeEquivalentTo(true))
})
@@ -356,8 +356,8 @@ var _ = Describe("EphemeralRunner", func() {
}
return updatedEphemeralRunner.Status.RunnerId, nil
},
timeout,
interval,
ephemeralRunnerTimeout,
ephemeralRunnerInterval,
).Should(BeNumerically(">", 0))
})
@@ -371,8 +371,8 @@ var _ = Describe("EphemeralRunner", func() {
}
return true, nil
},
timeout,
interval,
ephemeralRunnerTimeout,
ephemeralRunnerInterval,
).Should(BeEquivalentTo(true))
for _, phase := range []corev1.PodPhase{corev1.PodRunning, corev1.PodPending} {
@@ -395,8 +395,8 @@ var _ = Describe("EphemeralRunner", func() {
}
return updated.Status.Phase, nil
},
timeout,
interval,
ephemeralRunnerTimeout,
ephemeralRunnerInterval,
).Should(BeEquivalentTo(phase))
}
})
@@ -411,8 +411,8 @@ var _ = Describe("EphemeralRunner", func() {
}
return true, nil
},
timeout,
interval,
ephemeralRunnerTimeout,
ephemeralRunnerInterval,
).Should(BeEquivalentTo(true))
pod.Status.Phase = corev1.PodRunning
@@ -427,7 +427,7 @@ var _ = Describe("EphemeralRunner", func() {
}
return updated.Status.Phase, nil
},
timeout,
ephemeralRunnerTimeout,
).Should(BeEquivalentTo(""))
})
@@ -462,8 +462,8 @@ var _ = Describe("EphemeralRunner", func() {
Expect(err).To(BeNil(), "Failed to update pod status")
return false, fmt.Errorf("pod haven't failed for 5 times.")
},
timeout,
interval,
ephemeralRunnerTimeout,
ephemeralRunnerInterval,
).Should(BeEquivalentTo(true), "we should stop creating pod after 5 failures")
// In case a pod still gets created due to controller-runtime cache delay, mark the container as exited
@@ -488,7 +488,7 @@ var _ = Describe("EphemeralRunner", func() {
return "", err
}
return updated.Status.Reason, nil
}, timeout, interval).Should(BeEquivalentTo("TooManyPodFailures"), "Reason should be TooManyPodFailures")
}, ephemeralRunnerTimeout, ephemeralRunnerInterval).Should(BeEquivalentTo("TooManyPodFailures"), "Reason should be TooManyPodFailures")
// EphemeralRunner should not have any pod
Eventually(func() (bool, error) {
@@ -497,7 +497,7 @@ var _ = Describe("EphemeralRunner", func() {
return false, nil
}
return kerrors.IsNotFound(err), nil
}, timeout, interval).Should(BeEquivalentTo(true))
}, ephemeralRunnerTimeout, ephemeralRunnerInterval).Should(BeEquivalentTo(true))
})
It("It should re-create pod on eviction", func() {
@@ -510,8 +510,8 @@ var _ = Describe("EphemeralRunner", func() {
}
return true, nil
},
timeout,
interval,
ephemeralRunnerTimeout,
ephemeralRunnerInterval,
).Should(BeEquivalentTo(true))
pod.Status.Phase = corev1.PodFailed
@@ -530,7 +530,7 @@ var _ = Describe("EphemeralRunner", func() {
return false, err
}
return len(updated.Status.Failures) == 1, nil
}, timeout, interval).Should(BeEquivalentTo(true))
}, ephemeralRunnerTimeout, ephemeralRunnerInterval).Should(BeEquivalentTo(true))
// should re-create after failure
Eventually(
@@ -541,8 +541,8 @@ var _ = Describe("EphemeralRunner", func() {
}
return true, nil
},
timeout,
interval,
ephemeralRunnerTimeout,
ephemeralRunnerInterval,
).Should(BeEquivalentTo(true))
})
@@ -555,8 +555,8 @@ var _ = Describe("EphemeralRunner", func() {
}
return true, nil
},
timeout,
interval,
ephemeralRunnerTimeout,
ephemeralRunnerInterval,
).Should(BeEquivalentTo(true))
pod.Status.ContainerStatuses = append(pod.Status.ContainerStatuses, corev1.ContainerStatus{
@@ -577,7 +577,7 @@ var _ = Describe("EphemeralRunner", func() {
return false, err
}
return len(updated.Status.Failures) == 1, nil
}, timeout, interval).Should(BeEquivalentTo(true))
}, ephemeralRunnerTimeout, ephemeralRunnerInterval).Should(BeEquivalentTo(true))
// should re-create after failure
Eventually(
@@ -588,8 +588,8 @@ var _ = Describe("EphemeralRunner", func() {
}
return true, nil
},
timeout,
interval,
ephemeralRunnerTimeout,
ephemeralRunnerInterval,
).Should(BeEquivalentTo(true))
})
@@ -602,8 +602,8 @@ var _ = Describe("EphemeralRunner", func() {
}
return true, nil
},
timeout,
interval,
ephemeralRunnerTimeout,
ephemeralRunnerInterval,
).Should(BeEquivalentTo(true))
// first set phase to running
@@ -627,8 +627,8 @@ var _ = Describe("EphemeralRunner", func() {
}
return updated.Status.Phase, nil
},
timeout,
interval,
ephemeralRunnerTimeout,
ephemeralRunnerInterval,
).Should(BeEquivalentTo(corev1.PodRunning))
// set phase to succeeded
@@ -644,7 +644,7 @@ var _ = Describe("EphemeralRunner", func() {
}
return updated.Status.Phase, nil
},
timeout,
ephemeralRunnerTimeout,
).Should(BeEquivalentTo(corev1.PodRunning))
})
})
@@ -671,8 +671,10 @@ var _ = Describe("EphemeralRunner", func() {
fake.WithGetRunner(
nil,
&actions.ActionsError{
StatusCode: http.StatusNotFound,
ExceptionName: "AgentNotFoundException",
StatusCode: http.StatusNotFound,
Err: &actions.ActionsExceptionError{
ExceptionName: "AgentNotFoundException",
},
},
),
),
@@ -698,7 +700,7 @@ var _ = Describe("EphemeralRunner", func() {
return false, err
}
return true, nil
}, timeout, interval).Should(BeEquivalentTo(true))
}, ephemeralRunnerTimeout, ephemeralRunnerInterval).Should(BeEquivalentTo(true))
pod.Status.ContainerStatuses = append(pod.Status.ContainerStatuses, corev1.ContainerStatus{
Name: EphemeralRunnerContainerName,
@@ -718,7 +720,7 @@ var _ = Describe("EphemeralRunner", func() {
return "", nil
}
return updated.Status.Phase, nil
}, timeout, interval).Should(BeEquivalentTo(corev1.PodSucceeded))
}, ephemeralRunnerTimeout, ephemeralRunnerInterval).Should(BeEquivalentTo(corev1.PodSucceeded))
})
})
@@ -798,7 +800,7 @@ var _ = Describe("EphemeralRunner", func() {
return proxySuccessfulllyCalled
},
2*time.Second,
interval,
ephemeralRunnerInterval,
).Should(BeEquivalentTo(true))
})
@@ -823,8 +825,8 @@ var _ = Describe("EphemeralRunner", func() {
err := k8sClient.Get(ctx, client.ObjectKey{Name: ephemeralRunner.Name, Namespace: ephemeralRunner.Namespace}, pod)
g.Expect(err).To(BeNil(), "failed to get ephemeral runner pod")
},
timeout,
interval,
ephemeralRunnerTimeout,
ephemeralRunnerInterval,
).Should(Succeed(), "failed to get ephemeral runner pod")
Expect(pod.Spec.Containers[0].Env).To(ContainElement(corev1.EnvVar{
@@ -956,7 +958,7 @@ var _ = Describe("EphemeralRunner", func() {
return serverSuccessfullyCalled
},
2*time.Second,
interval,
ephemeralRunnerInterval,
).Should(BeTrue(), "failed to contact server")
})
})

View File

@@ -22,7 +22,7 @@ import (
"fmt"
"net/http"
"sort"
"strings"
"strconv"
"github.com/actions/actions-runner-controller/apis/actions.github.com/v1alpha1"
"github.com/actions/actions-runner-controller/controllers/actions.github.com/metrics"
@@ -53,7 +53,7 @@ type EphemeralRunnerSetReconciler struct {
PublishMetrics bool
resourceBuilder resourceBuilder
ResourceBuilder
}
//+kubebuilder:rbac:groups=actions.github.com,resources=ephemeralrunnersets,verbs=get;list;watch;create;update;patch;delete
@@ -156,14 +156,14 @@ func (r *EphemeralRunnerSetReconciler) Reconcile(ctx context.Context, req ctrl.R
return ctrl.Result{}, err
}
pendingEphemeralRunners, runningEphemeralRunners, finishedEphemeralRunners, failedEphemeralRunners, deletingEphemeralRunners := categorizeEphemeralRunners(ephemeralRunnerList)
ephemeralRunnerState := newEphemeralRunnerState(ephemeralRunnerList)
log.Info("Ephemeral runner counts",
"pending", len(pendingEphemeralRunners),
"running", len(runningEphemeralRunners),
"finished", len(finishedEphemeralRunners),
"failed", len(failedEphemeralRunners),
"deleting", len(deletingEphemeralRunners),
"pending", len(ephemeralRunnerState.pending),
"running", len(ephemeralRunnerState.running),
"finished", len(ephemeralRunnerState.finished),
"failed", len(ephemeralRunnerState.failed),
"deleting", len(ephemeralRunnerState.deleting),
)
if r.PublishMetrics {
@@ -183,54 +183,56 @@ func (r *EphemeralRunnerSetReconciler) Reconcile(ctx context.Context, req ctrl.R
Organization: parsedURL.Organization,
Enterprise: parsedURL.Enterprise,
},
len(pendingEphemeralRunners),
len(runningEphemeralRunners),
len(failedEphemeralRunners),
len(ephemeralRunnerState.pending),
len(ephemeralRunnerState.running),
len(ephemeralRunnerState.failed),
)
}
// cleanup finished runners and proceed
var errs []error
for i := range finishedEphemeralRunners {
log.Info("Deleting finished ephemeral runner", "name", finishedEphemeralRunners[i].Name)
if err := r.Delete(ctx, finishedEphemeralRunners[i]); err != nil {
if !kerrors.IsNotFound(err) {
errs = append(errs, err)
total := ephemeralRunnerState.scaleTotal()
if ephemeralRunnerSet.Spec.PatchID == 0 || ephemeralRunnerSet.Spec.PatchID != ephemeralRunnerState.latestPatchID {
defer func() {
if err := r.cleanupFinishedEphemeralRunners(ctx, ephemeralRunnerState.finished, log); err != nil {
log.Error(err, "failed to cleanup finished ephemeral runners")
}
}()
log.Info("Scaling comparison", "current", total, "desired", ephemeralRunnerSet.Spec.Replicas)
switch {
case total < ephemeralRunnerSet.Spec.Replicas: // Handle scale up
count := ephemeralRunnerSet.Spec.Replicas - total
log.Info("Creating new ephemeral runners (scale up)", "count", count)
if err := r.createEphemeralRunners(ctx, ephemeralRunnerSet, count, log); err != nil {
log.Error(err, "failed to make ephemeral runner")
return ctrl.Result{}, err
}
}
}
if len(errs) > 0 {
mergedErrs := multierr.Combine(errs...)
log.Error(mergedErrs, "Failed to delete finished ephemeral runners")
return ctrl.Result{}, mergedErrs
}
total := len(pendingEphemeralRunners) + len(runningEphemeralRunners) + len(failedEphemeralRunners)
log.Info("Scaling comparison", "current", total, "desired", ephemeralRunnerSet.Spec.Replicas)
switch {
case total < ephemeralRunnerSet.Spec.Replicas: // Handle scale up
count := ephemeralRunnerSet.Spec.Replicas - total
log.Info("Creating new ephemeral runners (scale up)", "count", count)
if err := r.createEphemeralRunners(ctx, ephemeralRunnerSet, count, log); err != nil {
log.Error(err, "failed to make ephemeral runner")
return ctrl.Result{}, err
}
case total > ephemeralRunnerSet.Spec.Replicas: // Handle scale down scenario.
count := total - ephemeralRunnerSet.Spec.Replicas
log.Info("Deleting ephemeral runners (scale down)", "count", count)
if err := r.deleteIdleEphemeralRunners(ctx, ephemeralRunnerSet, pendingEphemeralRunners, runningEphemeralRunners, count, log); err != nil {
log.Error(err, "failed to delete idle runners")
return ctrl.Result{}, err
case ephemeralRunnerSet.Spec.PatchID > 0 && total >= ephemeralRunnerSet.Spec.Replicas: // Handle scale down scenario.
// If the ephemeral runner has not yet updated its phase to succeeded but a scale down
// request is issued, we should ignore the scale down request.
// Eventually, the ephemeral runner will be cleaned up by the next patch request, which happens
// on the next batch.
case ephemeralRunnerSet.Spec.PatchID == 0 && total > ephemeralRunnerSet.Spec.Replicas:
count := total - ephemeralRunnerSet.Spec.Replicas
log.Info("Deleting ephemeral runners (scale down)", "count", count)
if err := r.deleteIdleEphemeralRunners(
ctx,
ephemeralRunnerSet,
ephemeralRunnerState.pending,
ephemeralRunnerState.running,
count,
log,
); err != nil {
log.Error(err, "failed to delete idle runners")
return ctrl.Result{}, err
}
}
}
desiredStatus := v1alpha1.EphemeralRunnerSetStatus{
CurrentReplicas: total,
PendingEphemeralRunners: len(pendingEphemeralRunners),
RunningEphemeralRunners: len(runningEphemeralRunners),
FailedEphemeralRunners: len(failedEphemeralRunners),
PendingEphemeralRunners: len(ephemeralRunnerState.pending),
RunningEphemeralRunners: len(ephemeralRunnerState.running),
FailedEphemeralRunners: len(ephemeralRunnerState.failed),
}
// Update the status if needed.
@@ -247,6 +249,21 @@ func (r *EphemeralRunnerSetReconciler) Reconcile(ctx context.Context, req ctrl.R
return ctrl.Result{}, nil
}
func (r *EphemeralRunnerSetReconciler) cleanupFinishedEphemeralRunners(ctx context.Context, finishedEphemeralRunners []*v1alpha1.EphemeralRunner, log logr.Logger) error {
// cleanup finished runners and proceed
var errs []error
for i := range finishedEphemeralRunners {
log.Info("Deleting finished ephemeral runner", "name", finishedEphemeralRunners[i].Name)
if err := r.Delete(ctx, finishedEphemeralRunners[i]); err != nil {
if !kerrors.IsNotFound(err) {
errs = append(errs, err)
}
}
}
return multierr.Combine(errs...)
}
func (r *EphemeralRunnerSetReconciler) cleanUpProxySecret(ctx context.Context, ephemeralRunnerSet *v1alpha1.EphemeralRunnerSet, log logr.Logger) error {
if ephemeralRunnerSet.Spec.EphemeralRunnerSpec.Proxy == nil {
return nil
@@ -284,19 +301,19 @@ func (r *EphemeralRunnerSetReconciler) cleanUpEphemeralRunners(ctx context.Conte
return true, nil
}
pendingEphemeralRunners, runningEphemeralRunners, finishedEphemeralRunners, failedEphemeralRunners, deletingEphemeralRunners := categorizeEphemeralRunners(ephemeralRunnerList)
ephemeralRunnerState := newEphemeralRunnerState(ephemeralRunnerList)
log.Info("Clean up runner counts",
"pending", len(pendingEphemeralRunners),
"running", len(runningEphemeralRunners),
"finished", len(finishedEphemeralRunners),
"failed", len(failedEphemeralRunners),
"deleting", len(deletingEphemeralRunners),
"pending", len(ephemeralRunnerState.pending),
"running", len(ephemeralRunnerState.running),
"finished", len(ephemeralRunnerState.finished),
"failed", len(ephemeralRunnerState.failed),
"deleting", len(ephemeralRunnerState.deleting),
)
log.Info("Cleanup finished or failed ephemeral runners")
var errs []error
for _, ephemeralRunner := range append(finishedEphemeralRunners, failedEphemeralRunners...) {
for _, ephemeralRunner := range append(ephemeralRunnerState.finished, ephemeralRunnerState.failed...) {
log.Info("Deleting ephemeral runner", "name", ephemeralRunner.Name)
if err := r.Delete(ctx, ephemeralRunner); err != nil && !kerrors.IsNotFound(err) {
errs = append(errs, err)
@@ -310,7 +327,7 @@ func (r *EphemeralRunnerSetReconciler) cleanUpEphemeralRunners(ctx context.Conte
}
// avoid fetching the client if we have nothing left to do
if len(runningEphemeralRunners) == 0 && len(pendingEphemeralRunners) == 0 {
if len(ephemeralRunnerState.running) == 0 && len(ephemeralRunnerState.pending) == 0 {
return false, nil
}
@@ -321,7 +338,7 @@ func (r *EphemeralRunnerSetReconciler) cleanUpEphemeralRunners(ctx context.Conte
log.Info("Cleanup pending or running ephemeral runners")
errs = errs[0:0]
for _, ephemeralRunner := range append(pendingEphemeralRunners, runningEphemeralRunners...) {
for _, ephemeralRunner := range append(ephemeralRunnerState.pending, ephemeralRunnerState.running...) {
log.Info("Removing the ephemeral runner from the service", "name", ephemeralRunner.Name)
_, err := r.deleteEphemeralRunnerWithActionsClient(ctx, ephemeralRunner, actionsClient, log)
if err != nil {
@@ -343,7 +360,7 @@ func (r *EphemeralRunnerSetReconciler) createEphemeralRunners(ctx context.Contex
// Track multiple errors at once and return the bundle.
errs := make([]error, 0)
for i := 0; i < count; i++ {
ephemeralRunner := r.resourceBuilder.newEphemeralRunner(runnerSet)
ephemeralRunner := r.ResourceBuilder.newEphemeralRunner(runnerSet)
if runnerSet.Spec.EphemeralRunnerSpec.Proxy != nil {
ephemeralRunner.Spec.ProxySecretRef = proxyEphemeralRunnerSetSecretName(runnerSet)
}
@@ -414,6 +431,9 @@ func (r *EphemeralRunnerSetReconciler) createProxySecret(ctx context.Context, ep
// When this happens, the next reconcile loop will try to delete the remaining ephemeral runners
// after we get notified by any of the `v1alpha1.EphemeralRunner.Status` updates.
func (r *EphemeralRunnerSetReconciler) deleteIdleEphemeralRunners(ctx context.Context, ephemeralRunnerSet *v1alpha1.EphemeralRunnerSet, pendingEphemeralRunners, runningEphemeralRunners []*v1alpha1.EphemeralRunner, count int, log logr.Logger) error {
if count <= 0 {
return nil
}
runners := newEphemeralRunnerStepper(pendingEphemeralRunners, runningEphemeralRunners)
if runners.len() == 0 {
log.Info("No pending or running ephemeral runners running at this time for scale down")
@@ -427,12 +447,13 @@ func (r *EphemeralRunnerSetReconciler) deleteIdleEphemeralRunners(ctx context.Co
deletedCount := 0
for runners.next() {
ephemeralRunner := runners.object()
if ephemeralRunner.Status.RunnerId == 0 {
isDone := ephemeralRunner.IsDone()
if !isDone && ephemeralRunner.Status.RunnerId == 0 {
log.Info("Skipping ephemeral runner since it is not registered yet", "name", ephemeralRunner.Name)
continue
}
if ephemeralRunner.Status.JobRequestId > 0 {
if !isDone && ephemeralRunner.Status.JobRequestId > 0 {
log.Info("Skipping ephemeral runner since it is running a job", "name", ephemeralRunner.Name, "jobRequestId", ephemeralRunner.Status.JobRequestId)
continue
}
@@ -458,10 +479,14 @@ func (r *EphemeralRunnerSetReconciler) deleteIdleEphemeralRunners(ctx context.Co
func (r *EphemeralRunnerSetReconciler) deleteEphemeralRunnerWithActionsClient(ctx context.Context, ephemeralRunner *v1alpha1.EphemeralRunner, actionsClient actions.ActionsService, log logr.Logger) (bool, error) {
if err := actionsClient.RemoveRunner(ctx, int64(ephemeralRunner.Status.RunnerId)); err != nil {
actionsError := &actions.ActionsError{}
if errors.As(err, &actionsError) &&
actionsError.StatusCode == http.StatusBadRequest &&
strings.Contains(actionsError.ExceptionName, "JobStillRunningException") {
// Runner is still running a job, proceed with the next one
if !errors.As(err, &actionsError) {
log.Error(err, "failed to remove runner from the service", "name", ephemeralRunner.Name, "runnerId", ephemeralRunner.Status.RunnerId)
return false, err
}
if actionsError.StatusCode == http.StatusBadRequest &&
actionsError.IsException("JobStillRunningException") {
log.Info("Runner is still running a job, skipping deletion", "name", ephemeralRunner.Name, "runnerId", ephemeralRunner.Status.RunnerId)
return false, nil
}
@@ -546,28 +571,6 @@ func (r *EphemeralRunnerSetReconciler) actionsClientOptionsFor(ctx context.Conte
// SetupWithManager sets up the controller with the Manager.
func (r *EphemeralRunnerSetReconciler) SetupWithManager(mgr ctrl.Manager) error {
// Index EphemeralRunner owned by EphemeralRunnerSet so we can perform faster look ups.
if err := mgr.GetFieldIndexer().IndexField(context.Background(), &v1alpha1.EphemeralRunner{}, resourceOwnerKey, func(rawObj client.Object) []string {
groupVersion := v1alpha1.GroupVersion.String()
// grab the job object, extract the owner...
ephemeralRunner := rawObj.(*v1alpha1.EphemeralRunner)
owner := metav1.GetControllerOf(ephemeralRunner)
if owner == nil {
return nil
}
// ...make sure it is owned by this controller
if owner.APIVersion != groupVersion || owner.Kind != "EphemeralRunnerSet" {
return nil
}
// ...and if so, return it
return []string{owner.Name}
}); err != nil {
return err
}
return ctrl.NewControllerManagedBy(mgr).
For(&v1alpha1.EphemeralRunnerSet{}).
Owns(&v1alpha1.EphemeralRunner{}).
@@ -580,16 +583,22 @@ type ephemeralRunnerStepper struct {
index int
}
func newEphemeralRunnerStepper(pending, running []*v1alpha1.EphemeralRunner) *ephemeralRunnerStepper {
sort.Slice(pending, func(i, j int) bool {
return pending[i].GetCreationTimestamp().Time.Before(pending[j].GetCreationTimestamp().Time)
})
sort.Slice(running, func(i, j int) bool {
return running[i].GetCreationTimestamp().Time.Before(running[j].GetCreationTimestamp().Time)
func newEphemeralRunnerStepper(primary []*v1alpha1.EphemeralRunner, othersOrdered ...[]*v1alpha1.EphemeralRunner) *ephemeralRunnerStepper {
sort.Slice(primary, func(i, j int) bool {
return primary[i].GetCreationTimestamp().Time.Before(primary[j].GetCreationTimestamp().Time)
})
for _, bucket := range othersOrdered {
sort.Slice(bucket, func(i, j int) bool {
return bucket[i].GetCreationTimestamp().Time.Before(bucket[j].GetCreationTimestamp().Time)
})
}
for _, bucket := range othersOrdered {
primary = append(primary, bucket...)
}
return &ephemeralRunnerStepper{
items: append(pending, running...),
items: primary,
index: -1,
}
}
@@ -613,28 +622,48 @@ func (s *ephemeralRunnerStepper) len() int {
return len(s.items)
}
func categorizeEphemeralRunners(ephemeralRunnerList *v1alpha1.EphemeralRunnerList) (pendingEphemeralRunners, runningEphemeralRunners, finishedEphemeralRunners, failedEphemeralRunners, deletingEphemeralRunners []*v1alpha1.EphemeralRunner) {
type ephemeralRunnerState struct {
pending []*v1alpha1.EphemeralRunner
running []*v1alpha1.EphemeralRunner
finished []*v1alpha1.EphemeralRunner
failed []*v1alpha1.EphemeralRunner
deleting []*v1alpha1.EphemeralRunner
latestPatchID int
}
func newEphemeralRunnerState(ephemeralRunnerList *v1alpha1.EphemeralRunnerList) *ephemeralRunnerState {
var ephemeralRunnerState ephemeralRunnerState
for i := range ephemeralRunnerList.Items {
r := &ephemeralRunnerList.Items[i]
patchID, err := strconv.Atoi(r.Annotations[AnnotationKeyPatchID])
if err == nil && patchID > ephemeralRunnerState.latestPatchID {
ephemeralRunnerState.latestPatchID = patchID
}
if !r.ObjectMeta.DeletionTimestamp.IsZero() {
deletingEphemeralRunners = append(deletingEphemeralRunners, r)
ephemeralRunnerState.deleting = append(ephemeralRunnerState.deleting, r)
continue
}
switch r.Status.Phase {
case corev1.PodRunning:
runningEphemeralRunners = append(runningEphemeralRunners, r)
ephemeralRunnerState.running = append(ephemeralRunnerState.running, r)
case corev1.PodSucceeded:
finishedEphemeralRunners = append(finishedEphemeralRunners, r)
ephemeralRunnerState.finished = append(ephemeralRunnerState.finished, r)
case corev1.PodFailed:
failedEphemeralRunners = append(failedEphemeralRunners, r)
ephemeralRunnerState.failed = append(ephemeralRunnerState.failed, r)
default:
// Pending or no phase should be considered pending.
//
// If the field is not set, the EphemeralRunner
// has not yet had a chance to update the Status.Phase field.
pendingEphemeralRunners = append(pendingEphemeralRunners, r)
ephemeralRunnerState.pending = append(ephemeralRunnerState.pending, r)
}
}
return
return &ephemeralRunnerState
}
func (s *ephemeralRunnerState) scaleTotal() int {
return len(s.pending) + len(s.running) + len(s.failed)
}
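For illustration only (this snippet is not part of the diff), the bucketing and latestPatchID tracking above can be exercised directly. It assumes the usual corev1/metav1/v1alpha1 imports and the package-local AnnotationKeyPatchID constant; the annotation values mirror what newEphemeralRunner writes from Spec.PatchID.
// Hypothetical sketch: two runners, one running from patch 3 and one finished
// from patch 4. newEphemeralRunnerState buckets them by phase and records the
// highest patch ID it has seen; Reconcile later compares that against
// ephemeralRunnerSet.Spec.PatchID before cleaning up or scaling.
list := &v1alpha1.EphemeralRunnerList{
	Items: []v1alpha1.EphemeralRunner{
		{
			ObjectMeta: metav1.ObjectMeta{Annotations: map[string]string{AnnotationKeyPatchID: "3"}},
			Status:     v1alpha1.EphemeralRunnerStatus{Phase: corev1.PodRunning},
		},
		{
			ObjectMeta: metav1.ObjectMeta{Annotations: map[string]string{AnnotationKeyPatchID: "4"}},
			Status:     v1alpha1.EphemeralRunnerStatus{Phase: corev1.PodSucceeded},
		},
	},
}
state := newEphemeralRunnerState(list)
// len(state.running) == 1, len(state.finished) == 1,
// state.latestPatchID == 4, state.scaleTotal() == 1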

View File

@@ -18,6 +18,9 @@ const defaultGitHubToken = "gh_token"
func startManagers(t ginkgo.GinkgoTInterface, first manager.Manager, others ...manager.Manager) {
for _, mgr := range append([]manager.Manager{first}, others...) {
if err := SetupIndexers(mgr); err != nil {
t.Fatalf("failed to setup indexers: %v", err)
}
ctx, cancel := context.WithCancel(context.Background())
g, ctx := errgroup.WithContext(ctx)

View File

@@ -0,0 +1,71 @@
package actionsgithubcom
import (
"context"
"slices"
v1alpha1 "github.com/actions/actions-runner-controller/apis/actions.github.com/v1alpha1"
corev1 "k8s.io/api/core/v1"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
ctrl "sigs.k8s.io/controller-runtime"
"sigs.k8s.io/controller-runtime/pkg/client"
)
func SetupIndexers(mgr ctrl.Manager) error {
if err := mgr.GetFieldIndexer().IndexField(
context.Background(),
&corev1.Pod{},
resourceOwnerKey,
newGroupVersionOwnerKindIndexer("AutoscalingListener", "EphemeralRunner"),
); err != nil {
return err
}
if err := mgr.GetFieldIndexer().IndexField(
context.Background(),
&corev1.ServiceAccount{},
resourceOwnerKey,
newGroupVersionOwnerKindIndexer("AutoscalingListener"),
); err != nil {
return err
}
if err := mgr.GetFieldIndexer().IndexField(
context.Background(),
&v1alpha1.EphemeralRunnerSet{},
resourceOwnerKey,
newGroupVersionOwnerKindIndexer("AutoscalingRunnerSet"),
); err != nil {
return err
}
if err := mgr.GetFieldIndexer().IndexField(
context.Background(),
&v1alpha1.EphemeralRunner{},
resourceOwnerKey,
newGroupVersionOwnerKindIndexer("EphemeralRunnerSet"),
); err != nil {
return err
}
return nil
}
func newGroupVersionOwnerKindIndexer(ownerKind string, otherOwnerKinds ...string) client.IndexerFunc {
owners := append([]string{ownerKind}, otherOwnerKinds...)
return func(o client.Object) []string {
groupVersion := v1alpha1.GroupVersion.String()
owner := metav1.GetControllerOfNoCopy(o)
if owner == nil {
return nil
}
// ...make sure it is owned by this controller
if owner.APIVersion != groupVersion || !slices.Contains(owners, owner.Kind) {
return nil
}
// ...and if so, return it
return []string{owner.Name}
}
}
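As a hedged illustration of how these indexes are typically consumed (this helper is not part of the diff and its name is hypothetical), a reconciler can list only the children owned by a particular parent through a field selector on resourceOwnerKey, using the same imports as the file above:
// Hypothetical helper: list the EphemeralRunners owned by one EphemeralRunnerSet
// using the resourceOwnerKey field index registered in SetupIndexers, instead of
// listing everything and filtering in memory.
func listOwnedEphemeralRunners(ctx context.Context, c client.Client, owner *v1alpha1.EphemeralRunnerSet) (*v1alpha1.EphemeralRunnerList, error) {
	list := new(v1alpha1.EphemeralRunnerList)
	err := c.List(
		ctx,
		list,
		client.InNamespace(owner.Namespace),
		client.MatchingFields{resourceOwnerKey: owner.Name},
	)
	return list, err
}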

View File

@@ -0,0 +1,56 @@
package actionsgithubcom
import (
"sigs.k8s.io/controller-runtime/pkg/builder"
"sigs.k8s.io/controller-runtime/pkg/controller"
)
// Options is the optional configuration for the controllers, which can be
// set via command-line flags or environment variables.
type Options struct {
// RunnerMaxConcurrentReconciles is the maximum number of concurrent Reconciles which can be run
// by the EphemeralRunnerController.
RunnerMaxConcurrentReconciles int
}
// OptionsWithDefault returns the default options.
// This is here to maintain the options and their default values in one place,
// rather than having to correlate those in multiple places.
func OptionsWithDefault() Options {
return Options{
RunnerMaxConcurrentReconciles: 2,
}
}
type Option func(*controller.Options)
// WithMaxConcurrentReconciles sets the maximum number of concurrent Reconciles which can be run.
//
// This is useful to improve the throughput of the controller, but it may also increase the load on the API server and
// the external service (e.g. GitHub API). The default value is 1, as defined by the controller-runtime.
//
// See https://github.com/actions/actions-runner-controller/issues/3021 for more information
// on real-world use cases and the potential impact of this option.
func WithMaxConcurrentReconciles(n int) Option {
return func(b *controller.Options) {
b.MaxConcurrentReconciles = n
}
}
// builderWithOptions applies the given options to the provided builder, if any.
// This is a helper function to avoid the need to import the controller-runtime package in every reconciler source file
// and the command package that creates the controller.
// This is also useful for reducing code duplication around setting controller options in
// multiple reconcilers.
func builderWithOptions(b *builder.Builder, opts []Option) *builder.Builder {
if len(opts) == 0 {
return b
}
var controllerOpts controller.Options
for _, opt := range opts {
opt(&controllerOpts)
}
return b.WithOptions(controllerOpts)
}
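To illustrate how these pieces fit together (a sketch under assumptions, not code from this diff), a controller registration in this package could thread an option slice through builderWithOptions; the standalone helper and its name are hypothetical, and it assumes the usual ctrl, reconcile, corev1, and v1alpha1 imports:
// Hypothetical wiring: register a controller with a configurable number of
// concurrent reconciles. The opts value would normally come from command-line
// flags, e.g. WithMaxConcurrentReconciles(OptionsWithDefault().RunnerMaxConcurrentReconciles).
func setupWithConcurrency(mgr ctrl.Manager, r reconcile.Reconciler, opts ...Option) error {
	return builderWithOptions(
		ctrl.NewControllerManagedBy(mgr).
			For(&v1alpha1.EphemeralRunner{}).
			Owns(&corev1.Pod{}),
		opts,
	).Complete(r)
}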

View File

@@ -8,6 +8,7 @@ import (
"math"
"net"
"strconv"
"strings"
"github.com/actions/actions-runner-controller/apis/actions.github.com/v1alpha1"
"github.com/actions/actions-runner-controller/build"
@@ -68,9 +69,11 @@ func SetListenerEntrypoint(entrypoint string) {
}
}
type resourceBuilder struct{}
type ResourceBuilder struct {
ExcludeLabelPropagationPrefixes []string
}
func (b *resourceBuilder) newAutoScalingListener(autoscalingRunnerSet *v1alpha1.AutoscalingRunnerSet, ephemeralRunnerSet *v1alpha1.EphemeralRunnerSet, namespace, image string, imagePullSecrets []corev1.LocalObjectReference) (*v1alpha1.AutoscalingListener, error) {
func (b *ResourceBuilder) newAutoScalingListener(autoscalingRunnerSet *v1alpha1.AutoscalingRunnerSet, ephemeralRunnerSet *v1alpha1.EphemeralRunnerSet, namespace, image string, imagePullSecrets []corev1.LocalObjectReference) (*v1alpha1.AutoscalingListener, error) {
runnerScaleSetId, err := strconv.Atoi(autoscalingRunnerSet.Annotations[runnerScaleSetIdAnnotationKey])
if err != nil {
return nil, err
@@ -85,13 +88,17 @@ func (b *resourceBuilder) newAutoScalingListener(autoscalingRunnerSet *v1alpha1.
effectiveMinRunners = *autoscalingRunnerSet.Spec.MinRunners
}
labels := map[string]string{
labels := b.mergeLabels(autoscalingRunnerSet.Labels, map[string]string{
LabelKeyGitHubScaleSetNamespace: autoscalingRunnerSet.Namespace,
LabelKeyGitHubScaleSetName: autoscalingRunnerSet.Name,
LabelKeyKubernetesPartOf: labelValueKubernetesPartOf,
LabelKeyKubernetesComponent: "runner-scale-set-listener",
LabelKeyKubernetesVersion: autoscalingRunnerSet.Labels[LabelKeyKubernetesVersion],
labelKeyRunnerSpecHash: autoscalingRunnerSet.ListenerSpecHash(),
})
annotations := map[string]string{
annotationKeyRunnerSpecHash: autoscalingRunnerSet.ListenerSpecHash(),
annotationKeyValuesHash: autoscalingRunnerSet.Annotations[annotationKeyValuesHash],
}
if err := applyGitHubURLLabels(autoscalingRunnerSet.Spec.GitHubConfigUrl, labels); err != nil {
@@ -100,9 +107,10 @@ func (b *resourceBuilder) newAutoScalingListener(autoscalingRunnerSet *v1alpha1.
autoscalingListener := &v1alpha1.AutoscalingListener{
ObjectMeta: metav1.ObjectMeta{
Name: scaleSetListenerName(autoscalingRunnerSet),
Namespace: namespace,
Labels: labels,
Name: scaleSetListenerName(autoscalingRunnerSet),
Namespace: namespace,
Labels: labels,
Annotations: annotations,
},
Spec: v1alpha1.AutoscalingListenerSpec{
GitHubConfigUrl: autoscalingRunnerSet.Spec.GitHubConfigUrl,
@@ -145,7 +153,7 @@ func (lm *listenerMetricsServerConfig) containerPort() (corev1.ContainerPort, er
}, nil
}
func (b *resourceBuilder) newScaleSetListenerConfig(autoscalingListener *v1alpha1.AutoscalingListener, secret *corev1.Secret, metricsConfig *listenerMetricsServerConfig, cert string) (*corev1.Secret, error) {
func (b *ResourceBuilder) newScaleSetListenerConfig(autoscalingListener *v1alpha1.AutoscalingListener, secret *corev1.Secret, metricsConfig *listenerMetricsServerConfig, cert string) (*corev1.Secret, error) {
var (
metricsAddr = ""
metricsEndpoint = ""
@@ -208,7 +216,7 @@ func (b *resourceBuilder) newScaleSetListenerConfig(autoscalingListener *v1alpha
}, nil
}
func (b *resourceBuilder) newScaleSetListenerPod(autoscalingListener *v1alpha1.AutoscalingListener, podConfig *corev1.Secret, serviceAccount *corev1.ServiceAccount, secret *corev1.Secret, metricsConfig *listenerMetricsServerConfig, envs ...corev1.EnvVar) (*corev1.Pod, error) {
func (b *ResourceBuilder) newScaleSetListenerPod(autoscalingListener *v1alpha1.AutoscalingListener, podConfig *corev1.Secret, serviceAccount *corev1.ServiceAccount, secret *corev1.Secret, metricsConfig *listenerMetricsServerConfig, envs ...corev1.EnvVar) (*corev1.Pod, error) {
listenerEnv := []corev1.EnvVar{
{
Name: "LISTENER_CONFIG_PATH",
@@ -226,6 +234,7 @@ func (b *resourceBuilder) newScaleSetListenerPod(autoscalingListener *v1alpha1.A
ports = append(ports, port)
}
terminationGracePeriodSeconds := int64(60)
podSpec := corev1.PodSpec{
ServiceAccountName: serviceAccount.Name,
Containers: []corev1.Container{
@@ -256,8 +265,9 @@ func (b *resourceBuilder) newScaleSetListenerPod(autoscalingListener *v1alpha1.A
},
},
},
ImagePullSecrets: autoscalingListener.Spec.ImagePullSecrets,
RestartPolicy: corev1.RestartPolicyNever,
ImagePullSecrets: autoscalingListener.Spec.ImagePullSecrets,
RestartPolicy: corev1.RestartPolicyNever,
TerminationGracePeriodSeconds: &terminationGracePeriodSeconds,
}
labels := make(map[string]string, len(autoscalingListener.Labels))
@@ -399,33 +409,33 @@ func mergeListenerContainer(base, from *corev1.Container) {
base.TTY = from.TTY
}
func (b *resourceBuilder) newScaleSetListenerServiceAccount(autoscalingListener *v1alpha1.AutoscalingListener) *corev1.ServiceAccount {
func (b *ResourceBuilder) newScaleSetListenerServiceAccount(autoscalingListener *v1alpha1.AutoscalingListener) *corev1.ServiceAccount {
return &corev1.ServiceAccount{
ObjectMeta: metav1.ObjectMeta{
Name: scaleSetListenerServiceAccountName(autoscalingListener),
Namespace: autoscalingListener.Namespace,
Labels: map[string]string{
Labels: b.mergeLabels(autoscalingListener.Labels, map[string]string{
LabelKeyGitHubScaleSetNamespace: autoscalingListener.Spec.AutoscalingRunnerSetNamespace,
LabelKeyGitHubScaleSetName: autoscalingListener.Spec.AutoscalingRunnerSetName,
},
}),
},
}
}
func (b *resourceBuilder) newScaleSetListenerRole(autoscalingListener *v1alpha1.AutoscalingListener) *rbacv1.Role {
func (b *ResourceBuilder) newScaleSetListenerRole(autoscalingListener *v1alpha1.AutoscalingListener) *rbacv1.Role {
rules := rulesForListenerRole([]string{autoscalingListener.Spec.EphemeralRunnerSetName})
rulesHash := hash.ComputeTemplateHash(&rules)
newRole := &rbacv1.Role{
ObjectMeta: metav1.ObjectMeta{
Name: scaleSetListenerRoleName(autoscalingListener),
Namespace: autoscalingListener.Spec.AutoscalingRunnerSetNamespace,
Labels: map[string]string{
Labels: b.mergeLabels(autoscalingListener.Labels, map[string]string{
LabelKeyGitHubScaleSetNamespace: autoscalingListener.Spec.AutoscalingRunnerSetNamespace,
LabelKeyGitHubScaleSetName: autoscalingListener.Spec.AutoscalingRunnerSetName,
labelKeyListenerNamespace: autoscalingListener.Namespace,
labelKeyListenerName: autoscalingListener.Name,
"role-policy-rules-hash": rulesHash,
},
}),
},
Rules: rules,
}
@@ -433,7 +443,7 @@ func (b *resourceBuilder) newScaleSetListenerRole(autoscalingListener *v1alpha1.
return newRole
}
func (b *resourceBuilder) newScaleSetListenerRoleBinding(autoscalingListener *v1alpha1.AutoscalingListener, listenerRole *rbacv1.Role, serviceAccount *corev1.ServiceAccount) *rbacv1.RoleBinding {
func (b *ResourceBuilder) newScaleSetListenerRoleBinding(autoscalingListener *v1alpha1.AutoscalingListener, listenerRole *rbacv1.Role, serviceAccount *corev1.ServiceAccount) *rbacv1.RoleBinding {
roleRef := rbacv1.RoleRef{
Kind: "Role",
Name: listenerRole.Name,
@@ -453,14 +463,14 @@ func (b *resourceBuilder) newScaleSetListenerRoleBinding(autoscalingListener *v1
ObjectMeta: metav1.ObjectMeta{
Name: scaleSetListenerRoleName(autoscalingListener),
Namespace: autoscalingListener.Spec.AutoscalingRunnerSetNamespace,
Labels: map[string]string{
Labels: b.mergeLabels(autoscalingListener.Labels, map[string]string{
LabelKeyGitHubScaleSetNamespace: autoscalingListener.Spec.AutoscalingRunnerSetNamespace,
LabelKeyGitHubScaleSetName: autoscalingListener.Spec.AutoscalingRunnerSetName,
labelKeyListenerNamespace: autoscalingListener.Namespace,
labelKeyListenerName: autoscalingListener.Name,
"role-binding-role-ref-hash": roleRefHash,
"role-binding-subject-hash": subjectHash,
},
}),
},
RoleRef: roleRef,
Subjects: subjects,
@@ -469,18 +479,18 @@ func (b *resourceBuilder) newScaleSetListenerRoleBinding(autoscalingListener *v1
return newRoleBinding
}
func (b *resourceBuilder) newScaleSetListenerSecretMirror(autoscalingListener *v1alpha1.AutoscalingListener, secret *corev1.Secret) *corev1.Secret {
func (b *ResourceBuilder) newScaleSetListenerSecretMirror(autoscalingListener *v1alpha1.AutoscalingListener, secret *corev1.Secret) *corev1.Secret {
dataHash := hash.ComputeTemplateHash(&secret.Data)
newListenerSecret := &corev1.Secret{
ObjectMeta: metav1.ObjectMeta{
Name: scaleSetListenerSecretMirrorName(autoscalingListener),
Namespace: autoscalingListener.Namespace,
Labels: map[string]string{
Labels: b.mergeLabels(autoscalingListener.Labels, map[string]string{
LabelKeyGitHubScaleSetNamespace: autoscalingListener.Spec.AutoscalingRunnerSetNamespace,
LabelKeyGitHubScaleSetName: autoscalingListener.Spec.AutoscalingRunnerSetName,
"secret-data-hash": dataHash,
},
}),
},
Data: secret.DeepCopy().Data,
}
@@ -488,28 +498,29 @@ func (b *resourceBuilder) newScaleSetListenerSecretMirror(autoscalingListener *v
return newListenerSecret
}
func (b *resourceBuilder) newEphemeralRunnerSet(autoscalingRunnerSet *v1alpha1.AutoscalingRunnerSet) (*v1alpha1.EphemeralRunnerSet, error) {
func (b *ResourceBuilder) newEphemeralRunnerSet(autoscalingRunnerSet *v1alpha1.AutoscalingRunnerSet) (*v1alpha1.EphemeralRunnerSet, error) {
runnerScaleSetId, err := strconv.Atoi(autoscalingRunnerSet.Annotations[runnerScaleSetIdAnnotationKey])
if err != nil {
return nil, err
}
runnerSpecHash := autoscalingRunnerSet.RunnerSetSpecHash()
labels := map[string]string{
labelKeyRunnerSpecHash: runnerSpecHash,
labels := b.mergeLabels(autoscalingRunnerSet.Labels, map[string]string{
LabelKeyKubernetesPartOf: labelValueKubernetesPartOf,
LabelKeyKubernetesComponent: "runner-set",
LabelKeyKubernetesVersion: autoscalingRunnerSet.Labels[LabelKeyKubernetesVersion],
LabelKeyGitHubScaleSetName: autoscalingRunnerSet.Name,
LabelKeyGitHubScaleSetNamespace: autoscalingRunnerSet.Namespace,
}
})
if err := applyGitHubURLLabels(autoscalingRunnerSet.Spec.GitHubConfigUrl, labels); err != nil {
return nil, fmt.Errorf("failed to apply GitHub URL labels: %v", err)
}
newAnnotations := map[string]string{
AnnotationKeyGitHubRunnerGroupName: autoscalingRunnerSet.Annotations[AnnotationKeyGitHubRunnerGroupName],
AnnotationKeyGitHubRunnerGroupName: autoscalingRunnerSet.Annotations[AnnotationKeyGitHubRunnerGroupName],
AnnotationKeyGitHubRunnerScaleSetName: autoscalingRunnerSet.Annotations[AnnotationKeyGitHubRunnerScaleSetName],
annotationKeyRunnerSpecHash: runnerSpecHash,
}
newEphemeralRunnerSet := &v1alpha1.EphemeralRunnerSet{
@@ -536,24 +547,21 @@ func (b *resourceBuilder) newEphemeralRunnerSet(autoscalingRunnerSet *v1alpha1.A
return newEphemeralRunnerSet, nil
}
func (b *resourceBuilder) newEphemeralRunner(ephemeralRunnerSet *v1alpha1.EphemeralRunnerSet) *v1alpha1.EphemeralRunner {
func (b *ResourceBuilder) newEphemeralRunner(ephemeralRunnerSet *v1alpha1.EphemeralRunnerSet) *v1alpha1.EphemeralRunner {
labels := make(map[string]string)
for _, key := range commonLabelKeys {
switch key {
case LabelKeyKubernetesComponent:
labels[key] = "runner"
default:
v, ok := ephemeralRunnerSet.Labels[key]
if !ok {
continue
}
labels[key] = v
for k, v := range ephemeralRunnerSet.Labels {
if k == LabelKeyKubernetesComponent {
labels[k] = "runner"
} else {
labels[k] = v
}
}
annotations := make(map[string]string)
for key, val := range ephemeralRunnerSet.Annotations {
annotations[key] = val
}
annotations[AnnotationKeyPatchID] = strconv.Itoa(ephemeralRunnerSet.Spec.PatchID)
return &v1alpha1.EphemeralRunner{
TypeMeta: metav1.TypeMeta{},
ObjectMeta: metav1.ObjectMeta{
@@ -566,7 +574,7 @@ func (b *resourceBuilder) newEphemeralRunner(ephemeralRunnerSet *v1alpha1.Epheme
}
}
func (b *resourceBuilder) newEphemeralRunnerPod(ctx context.Context, runner *v1alpha1.EphemeralRunner, secret *corev1.Secret, envs ...corev1.EnvVar) *corev1.Pod {
func (b *ResourceBuilder) newEphemeralRunnerPod(ctx context.Context, runner *v1alpha1.EphemeralRunner, secret *corev1.Secret, envs ...corev1.EnvVar) *corev1.Pod {
var newPod corev1.Pod
labels := map[string]string{}
@@ -634,7 +642,7 @@ func (b *resourceBuilder) newEphemeralRunnerPod(ctx context.Context, runner *v1a
return &newPod
}
func (b *resourceBuilder) newEphemeralRunnerJitSecret(ephemeralRunner *v1alpha1.EphemeralRunner) *corev1.Secret {
func (b *ResourceBuilder) newEphemeralRunnerJitSecret(ephemeralRunner *v1alpha1.EphemeralRunner) *corev1.Secret {
return &corev1.Secret{
ObjectMeta: metav1.ObjectMeta{
Name: ephemeralRunner.Name,
@@ -741,3 +749,29 @@ func trimLabelValue(val string) string {
}
return val
}
func (b *ResourceBuilder) mergeLabels(base, overwrite map[string]string) map[string]string {
mergedLabels := make(map[string]string, len(base))
base:
for k, v := range base {
for _, prefix := range b.ExcludeLabelPropagationPrefixes {
if strings.HasPrefix(k, prefix) {
continue base
}
}
mergedLabels[k] = v
}
overwrite:
for k, v := range overwrite {
for _, prefix := range b.ExcludeLabelPropagationPrefixes {
if strings.HasPrefix(k, prefix) {
continue overwrite
}
}
mergedLabels[k] = v
}
return mergedLabels
}
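A small, hypothetical usage sketch (not from the diff) showing the strings.HasPrefix semantics: an exclusion entry can be a true prefix such as "example.com/" or an exact key, and it applies to both the base and the overwrite map.
// Hypothetical example of the prefix-based exclusion performed by mergeLabels.
b := &ResourceBuilder{ExcludeLabelPropagationPrefixes: []string{"example.com/"}}
merged := b.mergeLabels(
	map[string]string{"example.com/team": "runners", "app": "arc"}, // base (propagated) labels
	map[string]string{"app.kubernetes.io/component": "runner-set"}, // overwrite (controller-set) labels
)
// merged contains "app" and "app.kubernetes.io/component";
// "example.com/team" is dropped because it matches the excluded prefix.
_ = merged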

View File

@@ -19,12 +19,18 @@ func TestLabelPropagation(t *testing.T) {
Name: "test-scale-set",
Namespace: "test-ns",
Labels: map[string]string{
LabelKeyKubernetesPartOf: labelValueKubernetesPartOf,
LabelKeyKubernetesVersion: "0.2.0",
LabelKeyKubernetesPartOf: labelValueKubernetesPartOf,
LabelKeyKubernetesVersion: "0.2.0",
"arbitrary-label": "random-value",
"example.com/label": "example-value",
"example.com/example": "example-value",
"directly.excluded.org/label": "excluded-value",
"directly.excluded.org/arbitrary": "not-excluded-value",
},
Annotations: map[string]string{
runnerScaleSetIdAnnotationKey: "1",
AnnotationKeyGitHubRunnerGroupName: "test-group",
runnerScaleSetIdAnnotationKey: "1",
AnnotationKeyGitHubRunnerGroupName: "test-group",
AnnotationKeyGitHubRunnerScaleSetName: "test-scale-set",
},
},
Spec: v1alpha1.AutoscalingRunnerSetSpec{
@@ -32,31 +38,44 @@ func TestLabelPropagation(t *testing.T) {
},
}
var b resourceBuilder
b := ResourceBuilder{
ExcludeLabelPropagationPrefixes: []string{
"example.com/",
"directly.excluded.org/label",
},
}
ephemeralRunnerSet, err := b.newEphemeralRunnerSet(&autoscalingRunnerSet)
require.NoError(t, err)
assert.Equal(t, labelValueKubernetesPartOf, ephemeralRunnerSet.Labels[LabelKeyKubernetesPartOf])
assert.Equal(t, "runner-set", ephemeralRunnerSet.Labels[LabelKeyKubernetesComponent])
assert.Equal(t, autoscalingRunnerSet.Labels[LabelKeyKubernetesVersion], ephemeralRunnerSet.Labels[LabelKeyKubernetesVersion])
assert.NotEmpty(t, ephemeralRunnerSet.Labels[labelKeyRunnerSpecHash])
assert.NotEmpty(t, ephemeralRunnerSet.Annotations[annotationKeyRunnerSpecHash])
assert.Equal(t, autoscalingRunnerSet.Name, ephemeralRunnerSet.Labels[LabelKeyGitHubScaleSetName])
assert.Equal(t, autoscalingRunnerSet.Namespace, ephemeralRunnerSet.Labels[LabelKeyGitHubScaleSetNamespace])
assert.Equal(t, "", ephemeralRunnerSet.Labels[LabelKeyGitHubEnterprise])
assert.Equal(t, "org", ephemeralRunnerSet.Labels[LabelKeyGitHubOrganization])
assert.Equal(t, "repo", ephemeralRunnerSet.Labels[LabelKeyGitHubRepository])
assert.Equal(t, autoscalingRunnerSet.Annotations[AnnotationKeyGitHubRunnerGroupName], ephemeralRunnerSet.Annotations[AnnotationKeyGitHubRunnerGroupName])
assert.Equal(t, autoscalingRunnerSet.Annotations[AnnotationKeyGitHubRunnerScaleSetName], ephemeralRunnerSet.Annotations[AnnotationKeyGitHubRunnerScaleSetName])
assert.Equal(t, autoscalingRunnerSet.Labels["arbitrary-label"], ephemeralRunnerSet.Labels["arbitrary-label"])
listener, err := b.newAutoScalingListener(&autoscalingRunnerSet, ephemeralRunnerSet, autoscalingRunnerSet.Namespace, "test:latest", nil)
require.NoError(t, err)
assert.Equal(t, labelValueKubernetesPartOf, listener.Labels[LabelKeyKubernetesPartOf])
assert.Equal(t, "runner-scale-set-listener", listener.Labels[LabelKeyKubernetesComponent])
assert.Equal(t, autoscalingRunnerSet.Labels[LabelKeyKubernetesVersion], listener.Labels[LabelKeyKubernetesVersion])
assert.NotEmpty(t, ephemeralRunnerSet.Labels[labelKeyRunnerSpecHash])
assert.NotEmpty(t, ephemeralRunnerSet.Annotations[annotationKeyRunnerSpecHash])
assert.Equal(t, autoscalingRunnerSet.Name, listener.Labels[LabelKeyGitHubScaleSetName])
assert.Equal(t, autoscalingRunnerSet.Namespace, listener.Labels[LabelKeyGitHubScaleSetNamespace])
assert.Equal(t, "", listener.Labels[LabelKeyGitHubEnterprise])
assert.Equal(t, "org", listener.Labels[LabelKeyGitHubOrganization])
assert.Equal(t, "repo", listener.Labels[LabelKeyGitHubRepository])
assert.Equal(t, autoscalingRunnerSet.Labels["arbitrary-label"], listener.Labels["arbitrary-label"])
assert.NotContains(t, listener.Labels, "example.com/label")
assert.NotContains(t, listener.Labels, "example.com/example")
assert.NotContains(t, listener.Labels, "directly.excluded.org/label")
assert.Equal(t, "not-excluded-value", listener.Labels["directly.excluded.org/arbitrary"])
listenerServiceAccount := &corev1.ServiceAccount{
ObjectMeta: metav1.ObjectMeta{
@@ -83,6 +102,7 @@ func TestLabelPropagation(t *testing.T) {
}
assert.Equal(t, "runner", ephemeralRunner.Labels[LabelKeyKubernetesComponent])
assert.Equal(t, autoscalingRunnerSet.Annotations[AnnotationKeyGitHubRunnerGroupName], ephemeralRunner.Annotations[AnnotationKeyGitHubRunnerGroupName])
assert.Equal(t, autoscalingRunnerSet.Annotations[AnnotationKeyGitHubRunnerScaleSetName], ephemeralRunnerSet.Annotations[AnnotationKeyGitHubRunnerScaleSetName])
runnerSecret := &corev1.Secret{
ObjectMeta: metav1.ObjectMeta{
@@ -109,8 +129,9 @@ func TestGitHubURLTrimLabelValues(t *testing.T) {
LabelKeyKubernetesVersion: "0.2.0",
},
Annotations: map[string]string{
runnerScaleSetIdAnnotationKey: "1",
AnnotationKeyGitHubRunnerGroupName: "test-group",
runnerScaleSetIdAnnotationKey: "1",
AnnotationKeyGitHubRunnerGroupName: "test-group",
AnnotationKeyGitHubRunnerScaleSetName: "test-scale-set",
},
},
}
@@ -121,7 +142,7 @@ func TestGitHubURLTrimLabelValues(t *testing.T) {
GitHubConfigUrl: fmt.Sprintf("https://github.com/%s/%s", organization, repository),
}
var b resourceBuilder
var b ResourceBuilder
ephemeralRunnerSet, err := b.newEphemeralRunnerSet(autoscalingRunnerSet)
require.NoError(t, err)
assert.Len(t, ephemeralRunnerSet.Labels[LabelKeyGitHubEnterprise], 0)
@@ -145,7 +166,7 @@ func TestGitHubURLTrimLabelValues(t *testing.T) {
GitHubConfigUrl: fmt.Sprintf("https://github.com/enterprises/%s", enterprise),
}
var b resourceBuilder
var b ResourceBuilder
ephemeralRunnerSet, err := b.newEphemeralRunnerSet(autoscalingRunnerSet)
require.NoError(t, err)
assert.Len(t, ephemeralRunnerSet.Labels[LabelKeyGitHubEnterprise], 63)

View File

@@ -1,5 +1,8 @@
# About ARC
> [!WARNING]
> This documentation covers the legacy mode of ARC (resources in the `actions.summerwind.net` namespace). If you're looking for documentation on the newer autoscaling runner scale sets, it is available in [GitHub Docs](https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners-with-actions-runner-controller/quickstart-for-actions-runner-controller). To understand why these resources are considered legacy (and the benefits of using the newer autoscaling runner scale sets), read [this discussion (#2775)](https://github.com/actions/actions-runner-controller/discussions/2775).
## Introduction
This document provides a high-level overview of Actions Runner Controller (ARC). ARC enables running GitHub Actions Runners on Kubernetes (K8s) clusters.

View File

@@ -42,7 +42,7 @@ eliminate some duplication:
`-coverprofile` flags: while `-short` is used to skip [old ARC E2E
tests](https://github.com/actions/actions-runner-controller/blob/master/test/e2e/e2e_test.go#L85-L87),
`-coverprofile` is adding to the test time without really giving us any value
in return. We should also start using `actions/setup-go@v4` to take advantage
in return. We should also start using `actions/setup-go@v5` to take advantage
of caching (it would speed up our tests by a lot) or enable it on `v3` if we
have a strong reason not to upgrade. We should keep ignoring our E2E tests too
as those will be run elsewhere (either use `Short` there too or ignoring the

View File

@@ -1,5 +1,8 @@
# Authenticating to the GitHub API
> [!WARNING]
> This documentation covers the legacy mode of ARC (resources in the `actions.summerwind.net` namespace). If you're looking for documentation on the newer autoscaling runner scale sets, it is available in [GitHub Docs](https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners-with-actions-runner-controller/quickstart-for-actions-runner-controller). To understand why these resources are considered legacy (and the benefits of using the newer autoscaling runner scale sets), read [this discussion (#2775)](https://github.com/actions/actions-runner-controller/discussions/2775).
## Setting Up Authentication with GitHub API
There are two ways for actions-runner-controller to authenticate with the GitHub API (only one can be configured at a time, however):

View File

@@ -1,5 +1,8 @@
# Automatically scaling runners
> [!WARNING]
> This documentation covers the legacy mode of ARC (resources in the `actions.summerwind.net` namespace). If you're looking for documentation on the newer autoscaling runner scale sets, it is available in [GitHub Docs](https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners-with-actions-runner-controller/quickstart-for-actions-runner-controller). To understand why these resources are considered legacy (and the benefits of using the newer autoscaling runner scale sets), read [this discussion (#2775)](https://github.com/actions/actions-runner-controller/discussions/2775).
## Overview
> If you are using controller version < [v0.22.0](https://github.com/actions/actions-runner-controller/releases/tag/v0.22.0) and you are not using GHES, and so you can't set your rate limit budget, it is recommended that you use 100 replicas or fewer to prevent being rate limited.

View File

@@ -1,5 +1,8 @@
# Adding ARC runners to a repository, organization, or enterprise
> [!WARNING]
> This documentation covers the legacy mode of ARC (resources in the `actions.summerwind.net` namespace). If you're looking for documentation on the newer autoscaling runner scale sets, it is available in [GitHub Docs](https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners-with-actions-runner-controller/quickstart-for-actions-runner-controller). To understand why these resources are considered legacy (and the benefits of using the newer autoscaling runner scale sets), read [this discussion (#2775)](https://github.com/actions/actions-runner-controller/discussions/2775).
## Usage
[GitHub self-hosted runners can be deployed at various levels in a management hierarchy](https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners/about-self-hosted-runners#about-self-hosted-runners):

View File

@@ -1,5 +1,8 @@
# Configuring Windows runners
> [!WARNING]
> This documentation covers the legacy mode of ARC (resources in the `actions.summerwind.net` namespace). If you're looking for documentation on the newer autoscaling runner scale sets, it is available in [GitHub Docs](https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners-with-actions-runner-controller/quickstart-for-actions-runner-controller). To understand why these resources are considered legacy (and the benefits of using the newer autoscaling runner scale sets), read [this discussion (#2775)](https://github.com/actions/actions-runner-controller/discussions/2775).
## Setting up Windows Runners
The main two steps in enabling Windows self-hosted runners are:

View File

@@ -1,5 +1,8 @@
# Deploying alternative runners
> [!WARNING]
> This documentation covers the legacy mode of ARC (resources in the `actions.summerwind.net` namespace). If you're looking for documentation on the newer autoscaling runner scale sets, it is available in [GitHub Docs](https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners-with-actions-runner-controller/quickstart-for-actions-runner-controller). To understand why these resources are considered legacy (and the benefits of using the newer autoscaling runner scale sets), read [this discussion (#2775)](https://github.com/actions/actions-runner-controller/discussions/2775).
## Alternative Runners
ARC also offers a few alternative runner options

View File

@@ -1,5 +1,8 @@
# Deploying ARC runners
> [!WARNING]
> This documentation covers the legacy mode of ARC (resources in the `actions.summerwind.net` namespace). If you're looking for documentation on the newer autoscaling runner scale sets, it is available in [GitHub Docs](https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners-with-actions-runner-controller/quickstart-for-actions-runner-controller). To understand why these resources are considered legacy (and the benefits of using the newer autoscaling runner scale sets), read [this discussion (#2775)](https://github.com/actions/actions-runner-controller/discussions/2775).
## Deploying runners with RunnerDeployments
In our previous examples we deployed a single runner via the `RunnerDeployment` kind. The number of runners deployed can be set statically via the `replicas:` field, and we can increase this value to deploy additional sets of runners instead:

View File

@@ -43,10 +43,91 @@ You can follow [this troubleshooting guide](https://docs.github.com/en/actions/h
## Changelog
### v0.10.0
This release includes major improvements to the runner provisioning duration. In short, you should see less latency between queueing a workflow run and having a runner available to execute the job.
Make sure to check [#3832](https://github.com/actions/actions-runner-controller/pull/3832) and [#3848](https://github.com/actions/actions-runner-controller/pull/3848) for details on how to fine-tune that behavior.
#### Major changes
1. Add exponential backoff when generating runner reg tokens [#3724](https://github.com/actions/actions-runner-controller/pull/3724)
1. Make EphemeralRunnerController MaxConcurrentReconciles configurable [#3832](https://github.com/actions/actions-runner-controller/pull/3832)
1. Make EphemeralRunnerReconciler create runner pods earlier [#3831](https://github.com/actions/actions-runner-controller/pull/3831)
1. Make k8s client rate limiter parameters configurable [#3848](https://github.com/actions/actions-runner-controller/pull/3848)
#### Minor changes
1. Bump github.com/bradleyfalzon/ghinstallation/v2 from `2.8.0` to `2.12.0` [#3837](https://github.com/actions/actions-runner-controller/pull/3837)
1. Bump golang.org/x/crypto from `0.22.0` to `0.31.0` [#3844](https://github.com/actions/actions-runner-controller/pull/3844)
1. Update docs with details for the dashboard visualizations [#3696](https://github.com/actions/actions-runner-controller/pull/3696)
### v0.9.3
1. AutoscalingListener controller: Inspect listener container state instead of pod phase [#3548](https://github.com/actions/actions-runner-controller/pull/3548)
1. Exclude label prefix propagation [#3607](https://github.com/actions/actions-runner-controller/pull/3607)
1. Check status code of fetch access token for github app [#3568](https://github.com/actions/actions-runner-controller/pull/3568)
1. Remove .Named() from the ephemeral runner controller [#3596](https://github.com/actions/actions-runner-controller/pull/3596)
1. Customize work directory [#3477](https://github.com/actions/actions-runner-controller/pull/3477)
1. Fix problem with ephemeralRunner Succeeded state before build executed [#3528](https://github.com/actions/actions-runner-controller/pull/3528)
1. Remove finalizers in one pass to speed up cleanups AutoscalingRunnerSet [#3536](https://github.com/actions/actions-runner-controller/pull/3536)
### v0.9.2
1. Refresh session if token expires during delete message [#3529](https://github.com/actions/actions-runner-controller/pull/3529)
1. Re-use the last desired patch on empty batch [#3453](https://github.com/actions/actions-runner-controller/pull/3453)
1. Extract single place to set up indexers [#3454](https://github.com/actions/actions-runner-controller/pull/3454)
1. Include controller version in logs [#3473](https://github.com/actions/actions-runner-controller/pull/3473)
1. Propagate arbitrary labels from runnersets to all created resources [#3157](https://github.com/actions/actions-runner-controller/pull/3157)
### v0.9.1
#### Major changes
1. Shutdown metrics server when listener exits [#3445](https://github.com/actions/actions-runner-controller/pull/3445)
1. Propagate max capacity information to the actions back-end [#3431](https://github.com/actions/actions-runner-controller/pull/3431)
1. Refactor actions client error to include request id [#3430](https://github.com/actions/actions-runner-controller/pull/3430)
1. Include self correction on empty batch and avoid removing pending runners when cluster is busy [#3426](https://github.com/actions/actions-runner-controller/pull/3426)
1. Add topologySpreadConstraint to gha-runner-scale-set-controller chart [#3405](https://github.com/actions/actions-runner-controller/pull/3405)
### v0.9.0
#### ⚠️ Warning
- This release contains CRD changes. During the upgrade, please remove the old CRDs before re-installing the new version. For more information, please read the [Upgrading ARC](https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners-with-actions-runner-controller/deploying-runner-scale-sets-with-actions-runner-controller#upgrading-arc).
- This release contains changes in the [default docker socket path](https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners-with-actions-runner-controller/deploying-runner-scale-sets-with-actions-runner-controller#upgrading-arc) expanded for container mode `dind`.
- Older version of the listener (`githubrunnerscalesetlistener`) is deprecated and will be removed in the future `0.10.0` release.
Please evaluate these changes carefully before upgrading.
#### Major changes
1. Change docker socket path to /var/run/docker.sock [#3337](https://github.com/actions/actions-runner-controller/pull/3337)
1. Update metrics to include repository on job-based label [#3310](https://github.com/actions/actions-runner-controller/pull/3310)
1. Bump Go version to 1.22.1 [#3290](https://github.com/actions/actions-runner-controller/pull/3290)
1. Propagate runner scale set name annotation to EphemeralRunner [#3098](https://github.com/actions/actions-runner-controller/pull/3098)
1. Add annotation with values hash to re-create listener [#3195](https://github.com/actions/actions-runner-controller/pull/3195)
1. Fix overscaling when the controller is much faster than the listener [#3371](https://github.com/actions/actions-runner-controller/pull/3371)
1. Add retry on 401 and 403 for runner-registration [#3377](https://github.com/actions/actions-runner-controller/pull/3377)
### v0.8.3
1. Expose volumeMounts and volumes in gha-runner-scale-set-controller [#3260](https://github.com/actions/actions-runner-controller/pull/3260)
1. Refer to the correct variable in discovery error message [#3296](https://github.com/actions/actions-runner-controller/pull/3296)
1. Fix acquire jobs after session refresh ghalistener [#3307](https://github.com/actions/actions-runner-controller/pull/3307)
### v0.8.2
1. Add listener graceful termination period and background context after the message is received [#3187](https://github.com/actions/actions-runner-controller/pull/3187)
1. Publish metrics in the new ghalistener [#3193](https://github.com/actions/actions-runner-controller/pull/3193)
1. Delete message session when listener.Listen returns [#3240](https://github.com/actions/actions-runner-controller/pull/3240)
### v0.8.1
1. Fix proxy issue in new listener client [#3181](https://github.com/actions/actions-runner-controller/pull/3181)
### v0.8.0
1. Change listener container name [#3167](https://github.com/actions/actions-runner-controller/pull/3167)
1. Fix empty env and volumeMounts object on default setup [#3166](https://github.com/actions/actions-runner-controller/pull/3166)
1. Fix override listener pod spec [#3161](https://github.com/actions/actions-runner-controller/pull/3161)
@@ -68,6 +149,7 @@ You can follow [this troubleshooting guide](https://docs.github.com/en/actions/h
1. ADR: Changing semantics of min runners to be min idle runners [#3040](https://github.com/actions/actions-runner-controller/pull/3040)
### v0.7.0
1. Add ResizePolicy and RestartPolicy on mergeListenerContainer [#3075](https://github.com/actions/actions-runner-controller/pull/3075)
1. feat: GHA controller Helm Chart quoted labels [#3061](https://github.com/actions/actions-runner-controller/pull/3061)
1. Update authorization for PAT to be Bearer as documented [#3039](https://github.com/actions/actions-runner-controller/pull/3039)
@@ -82,12 +164,14 @@ You can follow [this troubleshooting guide](https://docs.github.com/en/actions/h
1. chore: Service accounts in Kubernetes mode can now be annotated. [#2566](https://github.com/actions/actions-runner-controller/pull/2566)
### v0.6.1
1. Replace TLS dockerd connection with unix socket [#2833](https://github.com/actions/actions-runner-controller/pull/2833)
1. Fix name override labels when runnerScaleSetName value is set [#2915](https://github.com/actions/actions-runner-controller/pull/2915)
1. Fix nil map when annotations are applied [#2916](https://github.com/actions/actions-runner-controller/pull/2916)
1. Updates: container-hooks to v0.4.0 [#2928](https://github.com/actions/actions-runner-controller/pull/2928)
### v0.6.0
1. Fix parsing AcquireJob MessageQueueTokenExpiredError [#2837](https://github.com/actions/actions-runner-controller/pull/2837)
1. Set restart policy on the runner pod to Never if restartPolicy is not set in template [#2787](https://github.com/actions/actions-runner-controller/pull/2787)
1. Set the AutoscalingRunnerSet name to runnerScaleSetName [#2803](https://github.com/actions/actions-runner-controller/pull/2803)

View File

@@ -12,4 +12,26 @@ We do not intend to provide a supported ARC dashboard. This is simply a referenc
1. Make sure to have [Grafana](https://grafana.com/docs/grafana/latest/installation/) and [Prometheus](https://prometheus.io/docs/prometheus/latest/installation/) running in your cluster.
2. Make sure that Prometheus is properly scraping the metrics endpoints of the controller-manager and listeners (a sample scrape configuration is sketched after this list).
3. Import the [dashboard](ARC-Autoscaling-Runner-Set-Monitoring_1692627561838.json) into Grafana.
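For step 2, the following is a minimal sketch of a Prometheus scrape configuration, assuming the controller-manager and listener pods serve plain-HTTP metrics and live in a namespace named `arc-systems`; the job name, namespace, and the pod label used for filtering are all assumptions about your installation.

```yaml
# Illustrative Prometheus configuration; the namespace, job name, and the pod
# label used for filtering are assumptions about your installation.
scrape_configs:
  - job_name: arc-metrics                  # hypothetical job name
    kubernetes_sd_configs:
      - role: pod
        namespaces:
          names: ["arc-systems"]           # assumed controller/listener namespace
    relabel_configs:
      # Keep only pods that belong to the ARC controller chart; the label
      # key/value pair here is an assumption and may differ per chart version.
      - source_labels: [__meta_kubernetes_pod_label_app_kubernetes_io_part_of]
        regex: gha-rs-controller
        action: keep
```

If you run the Prometheus Operator, a `PodMonitor` selecting the same pods is the more idiomatic equivalent.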
## Details
This dashboard demonstrates some of the metrics provided by ARC and the underlying Kubernetes runtime. It provides a sample visualization of the behavior of the runner scale set, the ARC controllers, and the listeners. This should not be considered a comprehensive dashboard; it is a starting point that can be used with other metrics and logs to understand the health of the cluster. Review the [GitHub documentation detailing the Actions Runner Controller metrics and how to enable them](https://docs.github.com/en/enterprise-server@3.10/actions/hosting-your-own-runners/managing-self-hosted-runners-with-actions-runner-controller/deploying-runner-scale-sets-with-actions-runner-controller#enabling-metrics).
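Metrics are not emitted unless they are enabled at install time. As a hedged example of doing that through the controller chart's Helm values (the exact keys live in the chart's `values.yaml` and may change between releases, so verify them against the version you deploy):

```yaml
# Sketch of Helm values for the gha-runner-scale-set-controller chart; the key
# names below should be checked against the values.yaml of your chart version.
metrics:
  controllerManagerAddr: ":8080"   # address on which the controller-manager serves metrics
  listenerAddr: ":8080"            # address on which each listener serves metrics
  listenerEndpoint: "/metrics"     # HTTP path for listener metrics
```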
The dashboard includes the following metrics:
| Label | Description |
| -------------------------------- | ----------------------------------------------------|
| Active listeners | The number of listeners currently running and attempting to manage jobs for the scale set. This should match the number of scale sets deployed. |
| Runner States | Displays the number of runners in a given state. The finished and deleted states are not included in this panel. |
| Failed (total) | The total number of ephemeral runners that have failed to properly start. This may require reviewing the custom resource and logs to identify and resolve the root causes. Common causes include resource issues and failure to pull the required image. |
| Pending (total) | The total number of ephemeral runners that ARC has requested and is waiting for Kubernetes to provide in a running state. If the Kubernetes API server is responsive, this will typically match the number of runner pods that are in a pending state. This number includes requests for runner pods that have not yet been scheduled. When this number is higher than the number of runner pods in a pending state, it can indicate performance issues with the API server and resource contention. |
| Idle (total) | The total number of ephemeral runners that are available to accept jobs across all scale sets. Keeping a pool of idle runners can enable a faster start time under load, but excessive idle runners will consume resources and can prevent nodes from scaling down. |
| Total assigned jobs per listener | The number of workflow jobs acquired and assigned to the listener. The listener must provide supporting runners to complete these jobs. Once jobs are assigned, they cannot be delegated to other listeners and must be processed by the scale set or cancelled. |
| Assigned vs running jobs | Compares the number of jobs assigned against the number of runners that are currently processing jobs. When running jobs is less than assigned jobs, it can indicate that ARC is waiting on Kubernetes to provide and start additional runners. |
| Average startup duration | The average time in seconds between when jobs are assigned and when a runner accepts the job and begins processing. An increasing duration can indicate that the cluster has resource contention or a lack of available nodes for scheduling jobs. |
| Average execution duration | The average time in seconds that runners are taking to complete a job. Changes in this value reflect the efficiency of workflow jobs and the pod configuration. If the value is decreasing without changes to the job, it can indicate resource contention or CPU throttling. |
| Reconciliation errors | Reconciliation is the process of a controller ensuring the desired state and actual state of the resources match. Each time an event occurs on a resource watched by the controller, the controller is required to indicate whether the new state matches the desired state. Kubernetes adds a task to the work queue for the controller to perform this reconciliation. Errors indicate that the controller has not achieved a desired state and is requesting Kubernetes to queue another reconciliation request. Ideally, this number remains close to zero. An increasing number can indicate resource contention or delays processing API server requests. This reflects Kubernetes resources that ARC is waiting to be provided or to reach the necessary state. As a concrete example, ARC requests the creation of a secret prior to creating the pod. If the response indicates the secret is not immediately ready, ARC requeues the reconciliation task with the error details, incrementing this count. |
| Reconciliation time | A histogram reflecting the time in seconds to perform a single reconciliation task from the controller's work queue. A histogram counts the number of requests that are processed within a given bucket of time. This metric reflects the time it takes for ARC to complete each step in creating, managing, and cleaning up runners. As this increases, it can indicate resource contention or processing delays within Kubernetes or the API server. The panel displays an average, which may hide larger or smaller times occurring in the processing. |
| Workqueue depth | The number of tasks that Kubernetes has queued for the ARC controllers to process. This includes reconciliation requests and tasks from ARC. ARC sequentially processes a work queue of single, small tasks to avoid concurrency issues. Managing a runner requires multiple steps to prepare, create, update, and delete the runner, its resources, and the ARC custom resources. As each step is completed (or triggers reconciliation), new tasks are queued for processing. As the depth increases, it indicates more tasks awaiting time from the controller. Growth indicates increasing work and may indicate Kubernetes resource contention or processing latencies. Each request for a new runner will result in multiple tasks being added to the work queue to prepare and create the runner and the related ARC custom resources. |
| Scrape Duration (seconds) | The amount of time required for Prometheus to read the configured metrics from components in the cluster. An increasing number may indicate a lack of resources for Prometheus and a risk of the process exceeding the configured timeout, leading to lost metrics data. |
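To make the panels above concrete, the sketch below packages a few example queries as Prometheus recording rules. The `controller_runtime_*` and `workqueue_depth` names are standard controller-runtime metrics; the `gha_*` names are assumptions based on the linked metrics documentation and may differ between ARC versions.

```yaml
# Example recording rules only; verify metric and label names against the
# /metrics output of your controller-manager and listeners.
groups:
  - name: arc-dashboard-examples
    rules:
      # Reconciliation errors per controller (standard controller-runtime metric).
      - record: arc:reconcile_errors:rate5m
        expr: sum(rate(controller_runtime_reconcile_errors_total[5m])) by (controller)
      # Workqueue depth per controller work queue.
      - record: arc:workqueue_depth
        expr: sum(workqueue_depth) by (name)
      # Average reconciliation time over the last five minutes.
      - record: arc:reconcile_time_seconds:avg5m
        expr: >
          sum(rate(controller_runtime_reconcile_time_seconds_sum[5m]))
          /
          sum(rate(controller_runtime_reconcile_time_seconds_count[5m]))
      # Assigned vs running jobs across listeners (gha_* series names assumed).
      - record: arc:assigned_minus_running_jobs
        expr: sum(gha_assigned_jobs) - sum(gha_running_jobs)
```

In Grafana, the same expressions (or the raw queries behind them) map directly onto the panels in the table.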

View File

@@ -1,5 +1,8 @@
# Installing ARC
> [!WARNING]
> This documentation covers the legacy mode of ARC (resources in the `actions.summerwind.net` namespace). If you're looking for documentation on the newer autoscaling runner scale sets, it is available in [GitHub Docs](https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners-with-actions-runner-controller/quickstart-for-actions-runner-controller). To understand why these resources are considered legacy (and the benefits of using the newer autoscaling runner scale sets), read [this discussion (#2775)](https://github.com/actions/actions-runner-controller/discussions/2775).
## Installation
By default, actions-runner-controller uses [cert-manager](https://cert-manager.io/docs/installation/kubernetes/) for certificate management of its admission webhook. Make sure you have cert-manager installed before installing ARC. Installation instructions for cert-manager can be found below.

View File

@@ -1,5 +1,8 @@
# Managing access with runner groups
> [!WARNING]
> This documentation covers the legacy mode of ARC (resources in the `actions.summerwind.net` namespace). If you're looking for documentation on the newer autoscaling runner scale sets, it is available in [GitHub Docs](https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners-with-actions-runner-controller/quickstart-for-actions-runner-controller). To understand why these resources are considered legacy (and the benefits of using the newer autoscaling runner scale sets), read [this discussion (#2775)](https://github.com/actions/actions-runner-controller/discussions/2775).
## Runner Groups
Runner groups can be used to limit which repositories are able to use the GitHub Runner at an organization level. Runner groups have to be [created in GitHub first](https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners/managing-access-to-self-hosted-runners-using-groups) before they can be referenced.
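As a sketch of how a set of legacy runners joins a group (the organization and group names are placeholders, and the `group` field should be verified against the CRD version you run):

```yaml
# Sketch: organization-level runners registered into an existing runner group.
# "custom-runner-group" must already exist in the organization's settings.
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: org-runners
spec:
  replicas: 2
  template:
    spec:
      organization: your-org        # placeholder organization name
      group: custom-runner-group    # runner group created in GitHub beforehand
```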

View File

@@ -1,5 +1,8 @@
# Monitoring and troubleshooting
> [!WARNING]
> This documentation covers the legacy mode of ARC (resources in the `actions.summerwind.net` namespace). If you're looking for documentation on the newer autoscaling runner scale sets, it is available in [GitHub Docs](https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners-with-actions-runner-controller/quickstart-for-actions-runner-controller). To understand why these resources are considered legacy (and the benefits of using the newer autoscaling runner scale sets), read [this discussion (#2775)](https://github.com/actions/actions-runner-controller/discussions/2775).
## Metrics
The controller also exposes Prometheus metrics on a `/metrics` endpoint. By default this is on port `8443` behind an RBAC proxy.
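Because the endpoint sits behind an authenticating proxy, a scraper needs HTTPS and a token that is authorized to read it. A rough sketch of a scrape job follows (the target service name and namespace are placeholders, and the scraping service account still needs RBAC permission to reach the metrics path):

```yaml
# Illustrative scrape job for the RBAC-proxied metrics endpoint; the target
# address is a placeholder for the controller's metrics Service.
scrape_configs:
  - job_name: legacy-arc-controller
    scheme: https
    tls_config:
      insecure_skip_verify: true   # skip verification if the proxy serves a self-signed certificate
    authorization:
      credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    static_configs:
      - targets: ["controller-manager-metrics-service.actions-runner-system.svc:8443"]
```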

View File

@@ -1,5 +1,8 @@
# Actions Runner Controller Quickstart
> [!WARNING]
> This documentation covers the legacy mode of ARC (resources in the `actions.summerwind.net` namespace). If you're looking for documentation on the newer autoscaling runner scale sets, it is available in [GitHub Docs](https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners-with-actions-runner-controller/quickstart-for-actions-runner-controller). To understand why these resources are considered legacy (and the benefits of using the newer autoscaling runner scale sets), read [this discussion (#2775)](https://github.com/actions/actions-runner-controller/discussions/2775).
GitHub Actions automates the deployment of code to different environments, including production. The environments contain the `GitHub Runner` software, which executes the automation. `GitHub Runner` can run in GitHub-hosted cloud or self-hosted environments. Self-hosted environments offer more control over hardware, operating system, and software tools. They can run on physical machines, virtual machines, or in containers. Containerized environments are lightweight, loosely coupled, and highly efficient, and they can be managed centrally. However, they are not straightforward to use.
`Actions Runner Controller (ARC)` makes it simpler to run self-hosted environments on a Kubernetes (K8s) cluster.

View File

@@ -1,5 +1,8 @@
# Using ARC across organizations
> [!WARNING]
> This documentation covers the legacy mode of ARC (resources in the `actions.summerwind.net` namespace). If you're looking for documentation on the newer autoscaling runner scale sets, it is available in [GitHub Docs](https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners-with-actions-runner-controller/quickstart-for-actions-runner-controller). To understand why these resources are considered legacy (and the benefits of using the newer autoscaling runner scale sets), read [this discussion (#2775)](https://github.com/actions/actions-runner-controller/discussions/2775).
## Multitenancy
> This feature requires controller version >= [v0.26.0](https://github.com/actions/actions-runner-controller/releases/tag/v0.26.0)
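A minimal sketch of per-tenant credentials, assuming the `githubAPICredentialsFrom` field from the legacy multitenancy docs (the secret and organization names are placeholders, and the field layout should be checked against your controller version):

```yaml
# Sketch: runners for one tenant authenticate with that tenant's GitHub App,
# stored in a dedicated secret. All names below are placeholders.
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: tenant-a-runners
spec:
  template:
    spec:
      organization: tenant-a-org
      githubAPICredentialsFrom:
        secretRef:
          name: tenant-a-github-app
```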

View File

@@ -1,5 +1,8 @@
# Using ARC runners in a workflow
> [!WARNING]
> This documentation covers the legacy mode of ARC (resources in the `actions.summerwind.net` namespace). If you're looking for documentation on the newer autoscaling runner scale sets, it is available in [GitHub Docs](https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners-with-actions-runner-controller/quickstart-for-actions-runner-controller). To understand why these resources are considered legacy (and the benefits of using the newer autoscaling runner scale sets), read [this discussion (#2775)](https://github.com/actions/actions-runner-controller/discussions/2775).
## Runner Labels
To run a workflow job on a self-hosted runner, you can use the following syntax in your workflow:
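A hedged example of that syntax is shown below; `my-arc-runner` is a placeholder for whatever label you configured on the ARC runners, used alongside the default `self-hosted` label.

```yaml
# Example workflow job that targets self-hosted ARC runners by label.
name: arc-demo
on: [push]
jobs:
  build:
    runs-on: [self-hosted, my-arc-runner]   # "my-arc-runner" is a placeholder label
    steps:
      - uses: actions/checkout@v4
      - run: echo "Running on an ARC-provisioned runner"
```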

View File

@@ -1,5 +1,8 @@
# Using custom volumes
> [!WARNING]
> This documentation covers the legacy mode of ARC (resources in the `actions.summerwind.net` namespace). If you're looking for documentation on the newer autoscaling runner scale sets, it is available in [GitHub Docs](https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners-with-actions-runner-controller/quickstart-for-actions-runner-controller). To understand why these resources are considered legacy (and the benefits of using the newer autoscaling runner scale sets), read [this discussion (#2775)](https://github.com/actions/actions-runner-controller/discussions/2775).
## Custom Volume mounts
You can configure your own custom volume mounts. For example, to have the work/docker data in memory or on NVMe SSD, for

Some files were not shown because too many files have changed in this diff.