Compare commits


430 Commits

Author SHA1 Message Date
Yusuke Kuoka
edbdef8d20 Bump chart version to 0.20.0 for ARC 0.25.0 (#1600)
We'll be merging this immediately after ARC 0.25.0 gets released.
2022-07-05 11:19:24 +09:00
Nguyễn Đức Chiến
a190fa97bb Fix helm charts (#1603) 2022-07-05 10:35:57 +09:00
Yusuke Kuoka
bfc5ea4727 Fix a regression in webhook-based autoscaler (#1596)
The regression resulted in the webhook-based autoscaler being unable to find visible runner groups, and therefore unable to scale the target RunnerDeployment/RunnerSet up or down at all, when the webhook-based autoscaler was provided GitHub API credentials to enable runner groups support. This fixes that.

The regression was introduced via #1578 which is not released yet. Users of existing ARC releases are therefore not affected.
2022-07-04 20:17:09 +09:00
renovate[bot]
5a9e8545aa fix(deps): update golang.org/x/oauth2 digest to 2104d58 (#1593)
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
2022-07-02 14:06:21 +09:00
Yusuke Kuoka
4446ba57e1 Cover ARC upgrade in E2E test (#1592)
* Cover ARC upgrade in E2E test

so that we can be extra sure that an existing installation of ARC can be upgraded to the next version and (hopefully) that the upgrade is backward-compatible, or at least does not break immediately after upgrading.

* Consolidate E2E tests for RS and RD

* Fix E2E for RD to pass

* Add a comment in E2E on how to release the disk space consumed after dozens of test runs
2022-07-01 21:32:05 +09:00
Martin Moon (문성주)
d62c8a4697 fix typo. (#1594) 2022-07-01 10:24:41 +09:00
Yusuke Kuoka
946d5b1fa7 Add release note for v0.25.0 (#1591) 2022-06-30 22:11:22 +09:00
renovate[bot]
da6b07660e fix(deps): update golang.org/x/oauth2 digest to 02e64fa (#1480)
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
2022-06-30 11:32:26 +09:00
Callum Tait
e3deb0d752 chore: move runner docker check (#1548) 2022-06-30 11:31:50 +09:00
Callum Tait
82641e5036 chore: move HOME to more logical place (#1460)
* chore: move HOME to more logical place

* chore: don't break the PATH

* chore: don't break the PATH

Co-authored-by: toast-gear <toast-gear@users.noreply.github.com>
2022-06-30 11:21:05 +09:00
Vladyslav Miletskyi
2fe6adf5b7 Runner Entrypoint: fix daemon.json (#1409)
* Runner Entrypoint: fix daemon.json

Do not overwrite daemon.json if it already exists.
Use case: custom images that use the public image as their base (see the sketch after this entry).

* Update runner/startup.sh

Co-authored-by: Callum Tait <15716903+toast-gear@users.noreply.github.com>

Co-authored-by: Callum Tait <15716903+toast-gear@users.noreply.github.com>
2022-06-30 11:03:12 +09:00
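The change itself is to the runner's startup.sh (shell); the guard it describes can be sketched in Go purely for illustration: write a default daemon.json only when none exists, so custom images built on top of the public runner image keep their own file. The path and default config below are assumptions, not ARC's actual values.

```go
package main

import (
	"errors"
	"log"
	"os"
)

// ensureDaemonJSON writes a default daemon.json only when no file exists yet,
// so custom images that ship their own daemon.json keep it untouched.
// Illustrative only: the real change lives in the shell startup script.
func ensureDaemonJSON(path string, defaultConfig []byte) error {
	if _, err := os.Stat(path); err == nil {
		// File already exists (e.g. baked into a custom image) - leave it alone.
		return nil
	} else if !errors.Is(err, os.ErrNotExist) {
		return err
	}
	return os.WriteFile(path, defaultConfig, 0644)
}

func main() {
	cfg := []byte(`{"mtu": 1400}`) // hypothetical default config
	if err := ensureDaemonJSON("/etc/docker/daemon.json", cfg); err != nil {
		log.Fatal(err)
	}
}
```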
renovate[bot]
736126b793 chore(deps): update helm values quay.io/brancz/kube-rbac-proxy to v0.13.0 (#1589)
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
2022-06-30 09:51:38 +09:00
renovate[bot]
6abf5bbac8 fix(deps): update module github.com/stretchr/testify to v1.8.0 (#1584)
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
2022-06-30 09:50:55 +09:00
Yusuke Kuoka
dc4f116bda Reflect manual test scenario for containerMode=kubernetes to E2E (#1588)
With this my semi-automatic E2E manual testing becomes even easier :)
2022-06-30 09:09:58 +09:00
Callum Tait
cda10fd243 docs: remove once feature flag env var (#1590) 2022-06-30 09:09:37 +09:00
Yusuke Kuoka
b5d1a63bdf Enhance the acceptance runnerset yaml template for manual E2E (#1587)
The primary goal of this change is to let the tester know about the config difference between the explicitly configured ephemeral work volume and the work volume configured automatically via workVolumeClaimTemplate+containerMode=kubernetes.
2022-06-29 22:15:50 +09:00
Yusuke Kuoka
6f3e23973d Bump E2E runner version to 2.294.0 (#1586)
so that runners do not auto-update themselves on startup in E2E, which makes E2E take longer to complete.
2022-06-29 22:05:50 +09:00
Yusuke Kuoka
a517c1ff66 Fix old runner pods stuck in Terminating since #1579 (#1585)
Ref #1579
2022-06-29 22:02:42 +09:00
Yusuke Kuoka
9b28e633c1 Drop support for --once (#1580)
Ref #1196
2022-06-29 21:49:52 +09:00
Yusuke Kuoka
8161136cbd Fix PercentageRunnersBusy scaling delay (#1579)
* Use a dedicated pod label to say it is a runner pod

Follow-up for #1546

* Fix PercentageRunnersBusy scaling delay

Ref #1374
2022-06-29 20:49:21 +09:00
Nikola Jokic
a9ac5a1cbf extracted validations to a single point (#1582) 2022-06-29 20:32:00 +09:00
Callum Tait
d4f35cff4f ci: add paths to push trigger (#1583) 2022-06-29 20:30:07 +09:00
Yusuke Kuoka
f661249f07 Use the go-github impl of ListRunnerGroups with visible_to_repository (#1578)
Ref #1402
2022-06-29 09:53:03 +01:00
Mike
73e430ce54 Add a solution to InternalError webhook context timedout (#1558)
* added troubleshooting solution

* added error example

* added entry to the pages index

* sorted

Co-authored-by: Mike Joseph <mike@Mikes-MacBook-Pro-5618.local>
Co-authored-by: Mike Joseph <mike@efrontier.com>
2022-06-29 09:40:23 +09:00
renovate[bot]
858ef8979d chore(deps): update helm/kind-action action to v1.3.0 (#1532)
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
2022-06-29 09:05:26 +09:00
renovate[bot]
1ce0a183a6 chore(deps): update azure/setup-helm action to v3 (#1571)
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
2022-06-29 09:04:33 +09:00
renovate[bot]
63935d2053 fix(deps): update module github.com/stretchr/testify to v1.7.5 (#1510)
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
2022-06-29 08:39:09 +09:00
John Delivuk
fc63d6d26e Fix: Match Ingress API Version correctly. (#1541)
* Updating conditional to match the api version and kind

mend

* Updating conditional to match the api version and kind

mend
2022-06-29 08:30:11 +09:00
renovate[bot]
5ea08411e6 chore(deps): update dependency golang to v1.18.3 (#1509)
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
2022-06-29 08:29:14 +09:00
Giuseppe Crinò
067ed2e5ec docs: fix logic explanation for scale down delay (#1562)
Signed-off: Giuseppe Crinò <giuscri@gmail.com>
2022-06-29 08:26:28 +09:00
renovate[bot]
d86bd2bcd7 fix(deps): update module sigs.k8s.io/controller-runtime to v0.12.2 (#1449)
* fix(deps): update module sigs.k8s.io/controller-runtime to v0.12.2

* Regenerate manfiests with the updated k8s and controller-runtime deps

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
2022-06-29 06:42:17 +09:00
Yusuke Kuoka
ddd417f756 Bump go-github to v45 (#1573)
* Bump go-github to v45

Ref #1402

* fixup! Bump go-github to v45
2022-06-29 06:34:58 +09:00
Thomas Boop
0386c0734c containerMode option to allow running jobs in k8's instead of docker (#1546)
* added containerMode=kubernetes env variables to the runner

* removed unused logging

* restored configs and charts

* restored makefile cert version and acceptance/run

* added workVolumeClaimTemplate in pod definition, including logic

* added claim template name based on the runner

* Apply suggestions from code review

update errors

* added concurrent cleanup before runner pod is deleted

* update manifests

* added retry after 30s if pod cleanup contains err

* added admission webhook check, made workVolumeClaimTemplate mandatory for k8s

* style changes and added comments

* added isZero timestamp check for deleting runner-linked pods

* changed order of local variable to avoid copy if p is deleted

* removed docker from container mode k8s

* restored charts, config, makefile

* restored forked files back and not the ARC ones

* created PersistentVolume on containerMode k8s

* create pv only if storage class name is local-storage

* removed actions if storage class name is local-storage

* added service account validation if container mode kubernetes

* changed the coding style to match rest of the ARC

* added validation to the runnerdeployment webhook

* specified fields more precisely, added webhook validation to the replicaset as well

* remake manifests

* wrapped deletion of runner-linked pods in kube mode

* fixed empty line

* fixed import

* makefile changes for hooks

* added cleanup secrets

* create manifests

* docs

* update access modes

* update dockerfile

* nit changes

* fixed dockerfile

* rewrite allowing reuse for runners and runnersets

* deepcopy forgot to stage

* changed privileged

* make manifests

* partly moved to finalizer, still need to apply finalizer first

* finalizer added if env variable used in container mode exists

* bump runner version

* error message moved from Error to Info on cleanup pods/secrets

* removed useless dereferencing, added transformation tests of workVolumeClaimTemplate

* Apply suggestions from code review

* Update controllers/utils_test.go

Co-authored-by: Thomas Boop <52323235+thboop@users.noreply.github.com>

* Update controllers/utils_test.go

Co-authored-by: Thomas Boop <52323235+thboop@users.noreply.github.com>

* add hook version to cli, update to 0.1.2

* Apply suggestions from code review

* Update controllers/utils_test.go

* Update runner/Makefile

* Fix missing secret permission and the error handling

* Fix a runnerpod reconciler finalizer to not trigger unnecessary retry

Co-authored-by: Nikola Jokic <nikola-jokic@github.com>
Co-authored-by: Nikola Jokic <97525037+nikola-jokic@users.noreply.github.com>
Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
2022-06-28 14:12:40 +09:00
Yusuke Kuoka
af96de6184 Fix completed runner pod recreation not to be blocked after max out (#1568)
Ref https://github.com/actions-runner-controller/actions-runner-controller/pull/1477#issuecomment-1164154496
2022-06-28 13:50:07 +09:00
Arnaud
abb8615796 Webhook server configuration with kustomize (#1312)
* webhook server configuration with kustomize

* Update README.md

* Update README.md

* Update README.md

Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
2022-06-28 09:08:25 +09:00
Sam Weston
bc7a3cab1b Add priorityClassName to CRDs (#1513)
* Add pod priorityClassName to controller and crds

* Add missing bits in bases directory

* Regenerate crds
2022-06-28 08:45:19 +09:00
Yusuke Kuoka
e2c8163b8c Make webhook-based scale race-free (#1477)
* Make webhook-based scale operation asynchronous

This prevents a race condition in the webhook-based autoscaler where it received a second webhook event while still processing an earlier one, and both ended up scaling up the same horizontal runner autoscaler (a rough sketch of the batching idea follows this entry).

Ref #1321

* Fix typos

* Update rather than Patch HRA to avoid race among webhook-based autoscaler servers

* Batch capacity reservation updates for efficient use of apiserver

* Fix potential never-ending HRA update conflicts in batch update

* Extract batchScaler out of webhook-based autoscaler for testability

* Fix log levels and batch scaler hang on start

* Correlate webhook event with scale trigger amount in logs

* Fix log message
2022-06-27 18:31:48 +09:00
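A rough Go sketch of the batching idea described in this entry: webhook handlers only enqueue scale operations, and a single worker accumulates them and applies one update per target, so concurrent handlers never race to update the same HorizontalRunnerAutoscaler. This is illustrative only; `applyUpdate`, `scaleOp`, and the target names stand in for the real HRA capacity-reservation update and are not ARC's actual types.

```go
package main

import (
	"fmt"
	"time"
)

// scaleOp is one webhook-triggered scale request against a named HRA target.
type scaleOp struct {
	target string
	delta  int
}

// batchScaler drains queued scale ops and applies them as one update per
// target, instead of each webhook handler updating the same HRA concurrently.
// applyUpdate stands in for the real "update HRA capacity reservations" call.
func batchScaler(ops <-chan scaleOp, interval time.Duration, applyUpdate func(target string, delta int) error) {
	pending := map[string]int{}
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case op, ok := <-ops:
			if !ok {
				return
			}
			pending[op.target] += op.delta // accumulate instead of updating immediately
		case <-ticker.C:
			for target, delta := range pending {
				if err := applyUpdate(target, delta); err != nil {
					fmt.Println("update failed, will retry next tick:", err)
					continue // keep the delta so it is retried
				}
				delete(pending, target)
			}
		}
	}
}

func main() {
	ops := make(chan scaleOp, 16)
	go batchScaler(ops, 500*time.Millisecond, func(target string, delta int) error {
		fmt.Printf("scaling %s by %+d\n", target, delta)
		return nil
	})
	ops <- scaleOp{target: "example-hra", delta: +1} // e.g. a workflow_job "queued" event
	ops <- scaleOp{target: "example-hra", delta: +1}
	time.Sleep(time.Second)
}
```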
Callum Tait
84d16c1c12 revert: "Overhauled startup.sh Script (#1454)" (#1561)
This reverts commit 071898c96b.
2022-06-23 12:39:32 +01:00
Richard Fussenegger
071898c96b Overhauled startup.sh Script (#1454)
This overhaul turns it into a shellcheck valid script with explicit error handling for all possible situations I could think of. This change takes https://github.com/actions-runner-controller/actions-runner-controller/pull/1409 into account and things can be merged in any order. There are a few important changes here to the logic:

- The wait logic for checking whether docker comes up was fundamentally flawed because it checks for the PID. Docker will always come up and thus become visible in the process list, just to immediately die when it encounters an issue, after which supervisor starts it again. This means that our check so far is flaky: due to the `sleep 1`, it might encounter a PID, or it might not, and the existence of the PID does not mean anything. The `docker ps` check we have in the `entrypoint.sh` script does not suffer from this as it checks for a feature of docker and not a PID. I thus entirely removed the PID check, and instead I am handing things over to our `entrypoint.sh` script by setting the environment variables correctly.
- This change has an influence on the `docker0` interface MTU configuration, because the interface might or might not exist after we started docker. Hence, I changed this to a time boxed loop that tries for one minute to set up the interface's MTU. In case the command fails we log an error and continue with the run.
- I changed the entire MTU handling by validating its value before configuring it, logging an error and continuing without if it is set incorrectly. This ensures that we are not going to send our users on a bug hunt.
- The way we started supervisord did not make much sense to me. It sends itself into the background automatically, there is no need for us to do so with Bash.

The decision to not fail on errors but continue is a deliberate choice, because I believe that running a build is more important than having a perfectly configured system. However, this strategy might also hide issues for all users who are not properly checking their logs. It also makes testing harder. Hence, we could change all error conditions from graceful to panicking. We should then align the exit codes across `startup.sh` and `entrypoint.sh` to ensure that every possible error condition has its own unique error code for easy debugging.
2022-06-23 09:37:01 +09:00
renovate[bot]
f24e2fa44e chore(deps): update dependency actions/runner to v2.294.0 2022-06-22 21:45:32 +00:00
Callum Tait
3c7d3d6b57 ci: hardcode dockerhub username (#1555) 2022-06-22 16:15:50 +01:00
Callum Tait
23f091d7fa ci: don't login on a pr (#1554)
* ci: don't login on a pr

Co-authored-by: toast-gear <toast-gear@users.noreply.github.com>
2022-06-22 16:03:36 +01:00
Callum Tait
667764e027 chore: suggest gist first (#1539) 2022-06-18 17:38:37 +09:00
Callum Tait
de693c4191 ci: runners trigger on push (#1549)
* ci: runners trigger on push

* ci: comments

* ci: comments
2022-06-18 17:34:40 +09:00
Callum Tait
510fc9c834 ci: add GitHub packages to arc release (#1525)
* ci: add GitHub packages to arc release

* ci: use restrictive permissions
2022-06-15 11:37:19 +09:00
Callum Tait
7fd5e24961 chore: bump chart to app 0.24.1 (#1531) 2022-06-15 11:34:55 +09:00
Yusuke Kuoka
9974b1a2b7 e2e: Enable buildx in more images (#1530) 2022-06-14 09:29:30 +01:00
Yusuke Kuoka
bd91b73fd9 chore: update bug_report.yml (#1529) 2022-06-14 09:29:06 +01:00
Callum Tait
a7ae910ee4 docs: add CRD disclaimer to bug report (#1516) 2022-06-14 09:42:31 +09:00
Callum Tait
2733c36d0e ci: publish controller canary to github packages (#1524)
* ci: publish controller canary to github packages

* ci: include image name
2022-06-14 09:10:13 +09:00
Yusuke Kuoka
0ef9a22cd4 Fix confusing PV controller log (#1526)
Ref #1511
2022-06-14 08:35:04 +09:00
Renovate Bot
933b0c7888 chore(deps): update dependency actions/runner to v2.293.0 2022-06-13 17:09:29 +00:00
renovate[bot]
1b7ec33135 chore(deps): update actions/setup-python action to v4 (#1514)
Co-authored-by: Renovate Bot <bot@renovateapp.com>
2022-06-13 14:07:52 +01:00
Callum Tait
a62882d243 ci: fix permissions (#1512)
* ci: fix permissions

* chore: change to trigger build

* ci: add write permission to packages

* ci: remove conditionals for docker logins

* Update controllers/utils_test.go

Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
2022-06-09 10:25:56 +09:00
Callum Tait
0cd13fe51d ci: align pipeline files and setups (#1484)
* ci: align pipeline files and setups

* ci: more changes

* ci: various changes

* ci: fix setup-helm action ref

* ci: better pipeline name

* ci: more format aligning

* ci: more format aligning

* ci: better job name

* ci: supports multiple languages

* ci: better pipeline and job names

* ci: do a verb-noun thing for consistency

* ci: use 'arc' when talking holistically

* ci: add caching scope

* ci:  put canary in a scope

* ci: fix syntax error

* ci: better pipeline and job names

* ci: better job name

Co-authored-by: toast-gear <toast-gear@users.noreply.github.com>
2022-06-08 10:04:14 +09:00
Vinícius Garcia
01c8dc237e Fix example manifests for webhooks-based scaling (#1354)
* Fix example manifests for webhook based scaling

I tried running these on my k8s cluster and got some easy-to-fix errors, so I am committing the fixes here.

* Fix example manifests for webhook autoscaling with workflow_jobs

* Fix the explanation of how to set up webhooks on your cluster

* Replace unclear comment with actual code examples

There was a comment instructing users to add minReplicas and
maxReplicas to all the HRA yamls, so I just removed it and added
these attributes to the yamls themselves for clarity.

* Make clear that using the ingress example is just a suggestion

* Apply some text improvements suggested by @mumoshu

* Update examples so the webhook server is exposed on a NodePort

Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>

* Remove an unnecessary field from one the examples

Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>

* Apply suggestion from @mumoshu

Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>

* Remove namespace fields from webhook autoscaler examples

This change was suggested by @mumoshu

* Apply final suggestion from @mumoshu

Co-authored-by: Callum Tait <15716903+toast-gear@users.noreply.github.com>
Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
2022-06-07 08:33:09 +09:00
Yusuke Kuoka
7c4db63718 chart: Bump appVersion to 0.24.0 (#1505) 2022-06-03 22:01:35 +09:00
Yusuke Kuoka
3d88b9630a doc: Add "people" section (#1498)
Ref #1497
2022-05-31 09:29:15 +09:00
Yusuke Kuoka
1152e6b31d Add release note for v0.24.0 (#1493) 2022-05-30 09:10:36 +09:00
renovate[bot]
ac27df8301 chore(deps): update dependency actions/runner to v2.292.0 (#1475)
Co-authored-by: Renovate Bot <bot@renovateapp.com>
2022-05-27 09:49:46 +09:00
Yusuke Kuoka
9dd26168d6 Fix confusing logs from pv and pvc controllers (#1487)
Ref https://github.com/actions-runner-controller/actions-runner-controller/issues/1321#issuecomment-1137431212
2022-05-26 18:21:23 +09:00
Yusuke Kuoka
18bfb28c0b e2e: ARC_E2E_NO_CLEANUP to prevent cleanup (#1470)
A small improvement to our E2E test suite which allows you to set `ARC_E2E_NO_CLEANUP=whatever` to prevent the kind cluster cleanup on a successful test run, so that you can rerun it without waiting for a new kind cluster to come up.
2022-05-26 10:59:50 +09:00
renovate[bot]
84210e900b chore(deps): update actions/setup-python digest to fff15a2 (#1458)
Co-authored-by: Renovate Bot <bot@renovateapp.com>
2022-05-25 12:12:22 +09:00
Yusuke Kuoka
ef3313d147 doc: Use RunnerSet to retain various cache by leveraging PV (#1464)
* doc: Use RunnerSet to retain various cache

In relation to #1286 and as a follow-up for #1340

* docs: clarify client vs daemon

* docs: better wording

* Separate RunnerSet examples for docker image layer caching

* Revert changes on testdata as it is going to be added via #1471 instead

* Update README.md

Co-authored-by: Callum Tait <15716903+toast-gear@users.noreply.github.com>

* fixup! Update README.md

* Remove the outdated RunnerSet limitation

Co-authored-by: Callum Tait <15716903+toast-gear@users.noreply.github.com>
2022-05-25 11:09:36 +09:00
Yusuke Kuoka
c7eea169ad test: Add test for runner with generic ephemeral volume as "work" (#1472)
This adds a test to verify the runner pod generation logic for the case where you use a generic ephemeral volume as "work".
It is largely an adaptation of the test cases written for RunnerSet in #1471 to RunnerDeployment and Runner.
2022-05-25 10:37:23 +09:00
Yusuke Kuoka
63be0223ad fix: Avoid duplicate volume and mount name error for generic ephemeral volume as "work" (#1471)
* fix: Avoid duplicate volume and mount name error for generic ephemeral volume as "work"

While manually testing the configurations being documented in #1464, I discovered that using a dynamic ephemeral volume for the "work" directory was not working correctly due to a validation error.

This fixes the runner pod generation logic to not add the default volume and volume mount for the "work" dir, so that the error disappears (see the sketch after this entry).

Ref #1464

* e2e: Ensure work generic ephemeral volume to work as expected
2022-05-22 10:25:50 +09:00
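A minimal sketch of the kind of guard the fix describes, using Kubernetes core types: only add the default "work" volume when the pod spec does not already define one with that name. Function and variable names are illustrative, not ARC's actual code.

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// appendDefaultWorkVolume adds the default emptyDir "work" volume only when
// the pod spec does not already define a volume with that name. This is the
// kind of guard needed to avoid the "duplicate volume name" validation error
// when users bring their own generic ephemeral volume named "work".
func appendDefaultWorkVolume(spec *corev1.PodSpec) {
	for _, v := range spec.Volumes {
		if v.Name == "work" {
			return // user-provided volume wins; do not add the default
		}
	}
	spec.Volumes = append(spec.Volumes, corev1.Volume{
		Name:         "work",
		VolumeSource: corev1.VolumeSource{EmptyDir: &corev1.EmptyDirVolumeSource{}},
	})
}

func main() {
	spec := &corev1.PodSpec{
		Volumes: []corev1.Volume{{
			Name:         "work", // user-supplied generic ephemeral volume
			VolumeSource: corev1.VolumeSource{Ephemeral: &corev1.EphemeralVolumeSource{}},
		}},
	}
	appendDefaultWorkVolume(spec)
	fmt.Println(len(spec.Volumes)) // still 1: the default was skipped
}
```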
Yusuke Kuoka
5bbea772f7 doc: enhance troubleshooting guide with the scale-to-zero issue (#1469)
Ref https://github.com/actions-runner-controller/actions-runner-controller/issues/1057#issuecomment-1133439061
2022-05-21 19:00:40 +01:00
Callum Tait
2aa3f1e142 chore: remove stale app config (#1465) 2022-05-21 14:19:41 +09:00
Yusuke Kuoka
3e988afc09 test: add fuzzing to the test suite (#1463)
As a part of #1298, we add fuzzing based on Go test's fuzzing support to the test suite
2022-05-19 13:34:23 +01:00
Yusuke Kuoka
84210f3d2b Bump Go to 1.18.2 (#1462)
As a part of #1298, I'm going to use Go fuzzing, which is available since Go 1.18.

Co-authored-by: Callum Tait <15716903+toast-gear@users.noreply.github.com>
2022-05-19 10:33:31 +01:00
Yusuke Kuoka
536692181b docs: Add CII Best Practices badge to README (#1461)
Ref https://github.com/actions-runner-controller/actions-runner-controller/issues/1298
2022-05-19 10:16:39 +01:00
renovate[bot]
23403172cb chore(deps): update dependency golang to v1.18.2 (#1229)
Co-authored-by: Renovate Bot <bot@renovateapp.com>
2022-05-19 17:36:31 +09:00
renovate[bot]
8a8ec43364 chore(deps): update github/codeql-action action to v2.1.11 (#1455)
Co-authored-by: Renovate Bot <bot@renovateapp.com>
2022-05-18 09:02:26 +09:00
Felipe Galindo Sanchez
78c01fd31d cleanup some unused code and minor refactors (#1274)
Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
2022-05-16 18:38:32 +09:00
Bernardo Meurer
bf45aa9f6b refactor(runner/entrypoint): don't mv externalstmp if it's not there (#1315) 2022-05-16 18:37:37 +09:00
Callum Tait
b5aa1750bb ci: match renovate with new dockerfile names (#1453) 2022-05-16 18:15:44 +09:00
Richard Fussenegger
cdc9d20e7a Renamed Runner Dockerfiles (#1248)
Renamed the runner dockerfiles so that we have proper syntax highlighting for them, as well as a consistent way to map from the image name to the dockerfile. Added a `.dockerignore` file to avoid uploading things to the daemon that we never use.
2022-05-16 11:41:28 +09:00
Hyeonmin Park
8035d6d9f8 chart: Add extraPaths to Ingress of GitHub Webhook Server (#1129)
* chart: Add extraPaths to Ingress of GitHub Webhook Server

* Update charts/actions-runner-controller/templates/githubwebhook.ingress.yaml

Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>

* Prefix the toYaml expression to remove the extra newline before extra paths

Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
2022-05-16 11:34:56 +09:00
Callum Tait
65f7ee92a6 refactor: remove registration runner dead code (#1260)
We had some dead code left over from the removal of registration runners. Registration runners were removed in #859 and #1207

Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
2022-05-16 11:23:39 +09:00
Matéo Mévollon
fca8a538db docs: document the Docker MTU problem in troubleshooting guide (#1257)
* docs: document the Docker MTU problem

* Update TROUBLESHOOTING.md

Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
2022-05-16 11:13:05 +09:00
Nicholas Farley
95ddc77245 Allow customizing the controller webhook port (#1410)
Closes #1314

Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
2022-05-16 10:33:13 +09:00
Yusuke Kuoka
b5194fd75a Enhance RunnerSet to optionally retain PVs across restarts (#1340)
* Enhance RunnerSet to optionally retain PVs across restarts

This is our initial attempt to bring back the ability to retain PVs across runner pod restarts when using RunnerSet.
The implementation is composed of two new controllers, `runnerpersistentvolumeclaim-controller` and `runnerpersistentvolume-controller`.
It all starts from our existing `runnerset-controller`. The controller now tries to mark any PVCs created by StatefulSets created for the RunnerSet.
Once the controller has terminated the statefulsets, their corresponding PVCs are cleaned up by `runnerpersistentvolumeclaim-controller`, then PVs are unbound from their corresponding PVCs by `runnerpersistentvolume-controller` so that they can be reused by future PVCs created for future StatefulSets that share the same StorageClass (see the sketch after this entry).

Ref #1286

* Update E2E test suite to cover runner, docker, and go caching with RunnerSet + PVs

Ref #1286
2022-05-16 09:26:48 +09:00
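A rough client-go sketch of the "unbind the PV from its old PVC" step described above, assuming the standard Kubernetes mechanism of clearing spec.claimRef on a Released PV so it becomes Available for a future PVC with the same StorageClass. This is not ARC's actual controller code, and the PV name in main is a placeholder.

```go
package main

import (
	"context"
	"log"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// unbindReleasedPV clears ClaimRef on a Released PV so the cluster marks it
// Available again, letting a future PVC (e.g. from a new RunnerSet StatefulSet
// using the same StorageClass) bind to it and reuse the cached data.
func unbindReleasedPV(ctx context.Context, client kubernetes.Interface, name string) error {
	pv, err := client.CoreV1().PersistentVolumes().Get(ctx, name, metav1.GetOptions{})
	if err != nil {
		return err
	}
	if pv.Status.Phase != corev1.VolumeReleased {
		return nil // only touch PVs whose previous PVC is already gone
	}
	pv.Spec.ClaimRef = nil
	_, err = client.CoreV1().PersistentVolumes().Update(ctx, pv, metav1.UpdateOptions{})
	return err
}

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)
	// "example-runner-work-pv" is a placeholder PV name.
	if err := unbindReleasedPV(context.Background(), client, "example-runner-work-pv"); err != nil {
		log.Fatal(err)
	}
}
```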
renovate[bot]
adf69bbea0 fix(deps): update module github.com/prometheus/client_golang to v1.12.2 (#1448)
Co-authored-by: Renovate Bot <bot@renovateapp.com>
2022-05-16 09:19:55 +09:00
renovate[bot]
b43ef70ac6 fix(deps): update module github.com/bradleyfalzon/ghinstallation/v2 to v2.0.4 (#1452)
Co-authored-by: Renovate Bot <bot@renovateapp.com>
2022-05-16 08:59:53 +09:00
Yusuke Kuoka
f1caebbaf0 Update codeql.yml (#1451)
Give up pinning deps with commit IDs because the PRs were unreviewable due to missing changelogs, and it sends a PR for every commit to the master/main branch of the deps, which is undesired. We only need updates for tagged releases!
2022-05-16 08:59:29 +09:00
renovate[bot]
ede28f5046 chore(deps): update helm values quay.io/brancz/kube-rbac-proxy to v0.12.0 (#1323)
Co-authored-by: Renovate Bot <bot@renovateapp.com>
2022-05-16 08:50:12 +09:00
shettarvinay
f08ab1490d Update Dockerfile, github/github.go, go.mod and go.sum for fixing CVE-2020-2616 and CVE-2022-24921 (#1230)
Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
2022-05-16 08:45:44 +09:00
renovate[bot]
772ca57056 fix(deps): update module github.com/stretchr/testify to v1.7.1 (#1228)
Co-authored-by: Renovate Bot <bot@renovateapp.com>
2022-05-16 08:43:41 +09:00
renovate[bot]
51b13e3bab fix(deps): update module github.com/onsi/gomega to v1.19.0 (#1069)
Co-authored-by: Renovate Bot <bot@renovateapp.com>
2022-05-16 08:41:59 +09:00
Michael Kuhnt
81017b130f fix(chart): add missing namespace to webhook.ingress (#1417)
The ingress needs to be deployed in the very same namespace
as the service it is forwarding to.
2022-05-16 08:41:35 +09:00
Yusuke Kuoka
bdbcf66569 chore: Add signrel command for signing arc release assets (#1426)
* chore: Add signrel command for signing arc release assets

I used this command to sign assets for the recent releases to comply with the recommendation of 5758364c82/docs/checks.md (signed-releases)

Ref #1298

* Implement signrel subcommands for listing tags and signing assets, with docs
2022-05-16 08:40:41 +09:00
Yusuke Kuoka
0e15a78541 Create SECURITY.md (#1424)
* Create SECURITY.md

According to 5758364c82/docs/checks.md (security-policy)

Ref #1298

* Update SECURITY.md
2022-05-16 08:40:16 +09:00
renovate[bot]
f85c3d06d9 chore(deps): update docker/setup-qemu-action action to v2 (#1450)
Co-authored-by: Renovate Bot <bot@renovateapp.com>
2022-05-14 16:07:23 +01:00
Callum Tait
51ba7d7160 chore: more initialisation info to help debug (#1276)
* chore: more initialisation info to help debug

* chore: clearer flag description

* chore: use actual english

Co-authored-by: toast-gear <toast-gear@users.noreply.github.com>
2022-05-14 17:11:20 +09:00
Yusuke Kuoka
759349de11 fix: force restartPolicy "Never" to prevent runner pods from getting stuck in Terminating when the container disappeared (#1395)
Ref #1369
2022-05-14 09:07:17 +01:00
renovate[bot]
3014e98681 chore(deps): update helm/chart-releaser-action digest to a3454e4 (#1441)
Co-authored-by: Renovate Bot <bot@renovateapp.com>
2022-05-13 07:47:19 +09:00
renovate[bot]
5f4be6a883 fix(deps): update module github.com/go-logr/logr to v1.2.3 (#1241)
Co-authored-by: Renovate Bot <bot@renovateapp.com>
2022-05-13 07:42:58 +09:00
Yusuke Kuoka
b98f470a70 ci: enable CodeQL Alerts following the OpenSSF Security Scorecards recommendation (#1421)
Ref #1298
2022-05-12 10:55:11 +01:00
Yusuke Kuoka
e46b90f758 fix: runner pods managed by RunnerSet to not stuck in Terminating (#1420)
This is intended to fix #1369 mostly for RunnerSet-managed runner pods. It is "mostly" because this fix might also help RunnerDeployment in cases where #1395 does not work, such as when the user explicitly set the runner pod restart policy to something other than "Never".

Ref #1369
2022-05-12 09:34:27 +01:00
Yusuke Kuoka
3a7e8c844b feat: Support arbitrarily setting privileged: true for runner container (#1383)
Resolves #1282
2022-05-12 09:25:51 +01:00
renovate[bot]
65a67ee61c chore(deps): update docker/setup-qemu-action digest to 0522dcd (#1440)
Co-authored-by: Renovate Bot <bot@renovateapp.com>
2022-05-12 11:09:43 +09:00
renovate[bot]
215ba36fd1 chore(deps): update docker/setup-buildx-action digest to 91cb32d (#1439)
Co-authored-by: Renovate Bot <bot@renovateapp.com>
2022-05-12 11:08:57 +09:00
renovate[bot]
27774b47bd fix(deps): update golang.org/x/oauth2 digest to 9780585 (#1329)
Co-authored-by: Renovate Bot <bot@renovateapp.com>
2022-05-12 10:50:29 +09:00
renovate[bot]
fbde2b9a41 chore(deps): update docker/login-action digest to d398f07 (#1438)
Co-authored-by: Renovate Bot <bot@renovateapp.com>
2022-05-12 10:37:34 +09:00
renovate[bot]
212098183a chore(deps): update docker/build-push-action digest to c5e6528 (#1437)
Co-authored-by: Renovate Bot <bot@renovateapp.com>
2022-05-12 10:37:09 +09:00
renovate[bot]
4a5097d8cf chore(deps): update actions/setup-go digest to 193b404 (#1431)
Co-authored-by: Renovate Bot <bot@renovateapp.com>
2022-05-12 10:36:53 +09:00
renovate[bot]
9c57d085f8 chore(deps): update actions/stale digest to 65d24b7 (#1433)
Co-authored-by: Renovate Bot <bot@renovateapp.com>
2022-05-12 09:21:08 +09:00
renovate[bot]
d6622f9369 chore(deps): update actions/setup-python digest to c57f793 (#1432)
Co-authored-by: Renovate Bot <bot@renovateapp.com>
2022-05-12 09:20:57 +09:00
Yusuke Kuoka
3b67ee727f e2e: Fix wrong scale trigger configuration used in test (#1434) 2022-05-12 09:19:58 +09:00
Yusuke Kuoka
e6bddcd238 Fix certain runnerset name in E2E and acceptance (#1435) 2022-05-12 09:19:47 +09:00
Callum Tait
f60e57d789 docs: improve troubleshooting (#1428)
* docs: runner cleanup instructions

* docs: add delay in job allocation

* docs: fix broken link

* docs: reorganise into categories

* docs: align the format across the doc

* docs: remove code comment

* docs: add a tools section

* docs: add a short description to each section

Co-authored-by: toast-gear <toast-gear@users.noreply.github.com>
2022-05-12 08:45:10 +09:00
renovate[bot]
3ca1152420 chore(deps): update actions/checkout digest to 2541b12 (#1430)
Co-authored-by: Renovate Bot <bot@renovateapp.com>
2022-05-12 08:43:35 +09:00
renovate[bot]
e94fa19843 chore(deps): update actions/cache digest to 95f200e (#1429)
Co-authored-by: Renovate Bot <bot@renovateapp.com>
2022-05-12 08:43:27 +09:00
renovate[bot]
99832d7104 chore(deps): update docker/setup-buildx-action action to v2 (#1416)
Co-authored-by: Renovate Bot <bot@renovateapp.com>
2022-05-11 16:34:53 +01:00
renovate[bot]
289bcd8b64 chore(deps): update docker/login-action action to v2 (#1415)
Co-authored-by: Renovate Bot <bot@renovateapp.com>
2022-05-11 16:34:40 +01:00
Jacob Gadikian
5e8cba82c2 docs: simplify wording (#1427)
clarify docs
2022-05-11 11:44:07 +01:00
Yusuke Kuoka
dabbc99c78 refactor(controller): stop auto-setting RUNNER_FEATURE_FLAG_EPHEMERAL (#1385)
This feature flag was provided by ARC to the runner container automatically to let it use `--ephemeral` instead of `--once` by default. As support for `--once` is being dropped from the runner image via #1384, we no longer need that.

Ref #1196
2022-05-11 11:42:55 +01:00
Yusuke Kuoka
d01595cfbc ci: pin GitHub Actions workflow actions by hash (#1422)
as recommended in 5758364c82/docs/checks.md (pinned-dependencies)

Ref #1298
2022-05-11 11:41:30 +01:00
Yusuke Kuoka
c1e5829b03 refactor(runner): ability to opt-out of using --ephemeral / opt-in to legacy --once for GHES older than 3.3 (#1384)
* runner: Remove the ability to use the deprecated `--once` flag

Ref #1196

* runner: Ability to opt-out of using --ephemeral

Although we are going to eventually remove the ability to use the legacy --once flag as proposed in #1196, there might be folks still using legacy GHES versions 3.2 or earlier.

This commit removes the existing feature flag to opt in to --ephemeral, while adding another feature flag, RUNNER_FEATURE_FLAG_ONCE, to opt in to --once, so that folks stuck on legacy GHES versions
can still use ARC.

Since this change, every user uses --ephemeral by default. If they see any issues on a legacy GHES instance, RUNNER_FEATURE_FLAG_ONCE=true can be set to keep using --once, which gives them one more ARC release to upgrade their GHES instance (see the sketch after this entry).

But beware, we won't support legacy GHES instances forever as it's going to be a maintenance nightmare. Please upgrade!

Ref #1196
2022-05-11 09:55:33 +01:00
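The actual selection happens in the runner's shell entrypoint; the snippet below expresses it in Go purely to illustrate the behavior this entry describes — --ephemeral by default, --once only when RUNNER_FEATURE_FLAG_ONCE=true is explicitly set for legacy GHES (3.2 or earlier). Function names are illustrative.

```go
package main

import (
	"fmt"
	"os"
)

// runnerFlag picks the registration flag the way the commit describes:
// --ephemeral is the default, and --once is used only when the user
// explicitly opts back in via RUNNER_FEATURE_FLAG_ONCE=true (legacy GHES <= 3.2).
func runnerFlag() string {
	if os.Getenv("RUNNER_FEATURE_FLAG_ONCE") == "true" {
		return "--once"
	}
	return "--ephemeral"
}

func main() {
	fmt.Println("config.sh", runnerFlag())
}
```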
Renovate Bot
800d6bd586 chore(deps): update dependency actions/runner to v2.291.1 2022-04-29 19:05:31 +00:00
Callum Tait
d3b7f0bf7d chore: release chart targeting v0.23.0 (#1404) 2022-04-29 13:54:22 +01:00
Yusuke Kuoka
dbcb67967f Turn the bug report template into a form with more context (#1401)
I believe this helps us focus on relatively more important issues like critical bug reports and highly-requested feature requests.

Co-authored-by: Callum Tait <15716903+toast-gear@users.noreply.github.com>
2022-04-29 21:09:59 +09:00
Callum Tait
55369bf846 fix: forgot to do the chart (#1388)
Co-authored-by: toast-gear <toast-gear@users.noreply.github.com>
Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>

> chart test is failing due to `flag provided but not defined: -default-scale-down-delay` which seems to come from the fact that we still use ARC 0.22.3 for chart testing.
> 
> Probably we'd better figure out how to test it against both the latest release version of ARC and the canary version of ARC?
> 
> Or just test it against the canary version so that it won't fail when the chart depends on features that are available only in the canary version of ARC? 🤔

yup, let's get this merged though so we can do a release today
2022-04-29 09:15:27 +01:00
Yusuke Kuoka
1f6303daed Merge pull request #1396 from actions-runner-controller/docs/pre-release
docs: final doc changes + v0.23.0 release notes
2022-04-29 12:35:42 +09:00
Yusuke Kuoka
0fd1a681af Update bug_report.md (#1400)
so that we can hopefully get enough information to diagnose the issue if it's really a bug report, or redirect it to Discussions if it's a question.
2022-04-29 12:32:08 +09:00
toast-gear
58416db8c8 docs: add new runner group API enhancement 2022-04-28 16:17:53 +01:00
toast-gear
78a0817c2c docs: align release doc format 2022-04-28 16:06:59 +01:00
toast-gear
9ed429513d docs: bump the helm upgrade chart docs version 2022-04-28 16:04:58 +01:00
toast-gear
46291c1823 docs: highlight the new scale down delay flag 2022-04-28 16:04:16 +01:00
toast-gear
832e59338e docs: clarification of the release log 2022-04-28 16:00:03 +01:00
toast-gear
70ae5aef1f docs: add migration steps 2022-04-28 15:57:03 +01:00
toast-gear
6d10dd8e1d docs: breaking changes in v0.23.0 2022-04-28 10:51:19 +01:00
toast-gear
61c5a112db docs: remove reference to cleared limitation 2022-04-28 10:39:11 +01:00
toast-gear
7bc08fbe7c docs: remove TotalNumberOfQueuedAndInProgressWorkflowRuns limitation 2022-04-28 10:36:12 +01:00
Yusuke Kuoka
4053ab3e11 Fix label support for TotalNumberOfQueuedAndInProgressWorkflowRuns metric (#1390)
In #1373 we made two mistakes:

- We mistakenly checked whether all the runner labels are included in the job labels, and only then marked the target as eligible for scaling. It should definitely be the opposite!
- We mistakenly checked for the existence of the `self-hosted` label in the job. [Although it should be a good practice to explicitly say `runs-on: ["self-hosted", "custom-label"]`](https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#example-using-labels-for-runner-selection), that's not a requirement, so we should code accordingly.

The consequence of those two mistakes was that, for example, jobs with `self-hosted` + `custom` labels didn't result in scaling a runner with `self-hosted` + `custom` + `custom2`. This should fix that (a sketch of the corrected check follows this entry).

Ref #1056
Ref #1373
2022-04-27 16:24:21 +01:00
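A small Go sketch of the corrected check: the job's runs-on labels must be a subset of the runner's labels, with `self-hosted` treated as implicit. Names are illustrative and this is not the controller's actual code.

```go
package main

import "fmt"

// jobFitsRunner reports whether a workflow job's runs-on labels are a subset
// of the runner's labels. "self-hosted" is treated as implicit, since jobs are
// not required to list it explicitly. This is the direction the fix restores:
// check the job's labels against the runner's, not the other way around.
func jobFitsRunner(jobLabels, runnerLabels []string) bool {
	have := map[string]bool{"self-hosted": true}
	for _, l := range runnerLabels {
		have[l] = true
	}
	for _, l := range jobLabels {
		if !have[l] {
			return false
		}
	}
	return true
}

func main() {
	runner := []string{"custom", "custom2"}
	job := []string{"self-hosted", "custom"}
	fmt.Println(jobFitsRunner(job, runner)) // true: the runner can take this job
}
```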
Callum Tait
059481b610 refactor: remove legacy controller Docker build (#1360) [skip ci]
* refactor: remove legacy build and use buildkit

* refactor: add runner version to root makefile

* refactor: enable buildkit for runner make build

* refactor: ignore runner makefile in ci

Co-authored-by: toast-gear <toast-gear@users.noreply.github.com>
2022-04-27 08:21:02 +01:00
renovate[bot]
9fdb2c009d fix(deps): update module github.com/google/go-cmp to v0.5.8 (#1394)
Co-authored-by: Renovate Bot <bot@renovateapp.com>
2022-04-27 10:11:33 +09:00
Callum Tait
9f7ea0c014 docs: highlight breaking changes are possible (#1310)
It's probably worth highlighting that it's version 0.X.X by design and that breaking changes are possible until we move it to 1.0.0

Co-authored-by: toast-gear <toast-gear@users.noreply.github.com>
2022-04-26 11:20:11 +09:00
Callum Tait
0caa0315c6 feat: set default in chart (#1389)
Ref #963
Ref #899

Co-authored-by: toast-gear <toast-gear@users.noreply.github.com>
2022-04-26 10:25:01 +09:00
Yusuke Kuoka
1c726ae20c chore: Add unit test for RunnerReconciler.newPod (#1382)
Adds some unit tests for the runner pod generation logic that is used internally by runner controller as preparation for #1282
2022-04-25 09:59:21 +09:00
Yusuke Kuoka
d6cdd5964c chore: Add unit test for newRunnerPod (#1381)
Adds some unit tests for the runner pod generation logic that is used internally by runner deployment and runner set controllers as preparation for #1282
2022-04-25 08:52:58 +09:00
Yusuke Kuoka
a622968ff2 feat: Add label support to TotalNumberOfQueuedAndInProgressWorkflowRuns metric (#1373)
This is an implementation of my interpretation of the "bronze" case proposed in #1056

Ref #1056
2022-04-24 14:41:34 +09:00
Soham Banerjee
e8ef84ab76 Removed the default githubEvent: {} (#1361)
Ref #1358
See also #1379
2022-04-24 13:39:59 +09:00
Yusuke Kuoka
1551f3b5fc Remove the default githubEvent: {}, requiring an event to be defined (#1379)
Ref #1358
2022-04-24 13:37:26 +09:00
Yusuke Kuoka
3ba7179995 Do not enable TotalNumberOfQueuedAndInProgressWorkflowRuns by default (#1372)
Previously, omitting hra.spec.metrics entirely resulted in ARC enabling TotalNumberOfQueuedAndInProgressWorkflowRuns.
That turned out not to be a good idea, so since this change it is no longer enabled by default.

Ref https://github.com/actions-runner-controller/actions-runner-controller/issues/728
2022-04-24 13:36:42 +09:00
Mário Uhrík
e7c6c26266 Runner CRD: Add required conversionReviewVersions field (#1259)
Without that field, GKE 1.21 refuses to create the CRD
with an error message that conversionReviewVersions is mandatory.

conversionReviewVersions is a required field when creating apiextensions.k8s.io/v1 custom resource definitions.
Webhooks are required to support at least one ConversionReview version understood by the current and previous API server.

See https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/_print/#webhook-request-and-response
2022-04-24 11:04:15 +09:00
Tingluo Huang
ebe7d060cb Find runner groups that visible to repository using a single API call. (#1324)
The [ListRunnerGroup API](https://docs.github.com/en/rest/reference/actions#list-self-hosted-runner-groups-for-an-organization) now adds a new query parameter `visible_to_repository`.

We were doing an `N+1` lookup when trying to find which runner group can be used for a job from a certain repository:
- List all runner groups
- Loop through all groups to check repository access for each of them via [API](https://docs.github.com/en/rest/reference/actions#list-repository-access-to-a-self-hosted-runner-group-in-an-organization)

The new query parameter `visible_to_repository` should allow us to get the runner groups with access in one call.

Limitations:
- The new query parameter is only supported on GitHub.com, which means anyone who uses ARC on GitHub Enterprise Server won't get this.
- I am working on a PR to update the `go-github` library to support the new parameter, but it will take a few weeks for a newer `go-github` to be released, so in the meantime I am duplicating the implementation in ARC to support the new query parameter (a minimal sketch of the single-call lookup follows this entry).
2022-04-24 10:54:40 +09:00
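A minimal sketch of the single-call lookup against the REST endpoint and query parameter named above, using plain net/http rather than go-github. The org, repo, and token handling are placeholders, and the response struct keeps only the documented group names.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"net/http"
	"net/url"
	"os"
)

// listVisibleRunnerGroups asks GitHub for only the org runner groups that are
// visible to the given repository, replacing the old N+1 pattern of listing
// every group and then checking repository access per group.
func listVisibleRunnerGroups(org, repo, token string) ([]string, error) {
	u := fmt.Sprintf("https://api.github.com/orgs/%s/actions/runner-groups?visible_to_repository=%s",
		url.PathEscape(org), url.QueryEscape(repo))
	req, err := http.NewRequest("GET", u, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+token)
	req.Header.Set("Accept", "application/vnd.github+json")
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status: %s", resp.Status)
	}

	var body struct {
		RunnerGroups []struct {
			Name string `json:"name"`
		} `json:"runner_groups"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&body); err != nil {
		return nil, err
	}
	var names []string
	for _, g := range body.RunnerGroups {
		names = append(names, g.Name)
	}
	return names, nil
}

func main() {
	// "my-org" and "my-repo" are placeholders.
	groups, err := listVisibleRunnerGroups("my-org", "my-repo", os.Getenv("GITHUB_TOKEN"))
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(groups)
}
```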
Callum Tait
c3e280eadb refactor: set sync period default to 1m (#1308)
Fixes: #1294

Co-authored-by: toast-gear <toast-gear@users.noreply.github.com>
Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
2022-04-24 10:47:00 +09:00
Vinícius Garcia
9f254a2393 docs: run README files through Grammarly (#1353)
* Update README.md

* Run charts/actions-runner-controller/README.md through Grammarly

* Fix broken link as suggested by @toast-gear

Co-authored-by: Callum Tait <15716903+toast-gear@users.noreply.github.com>

Co-authored-by: Callum Tait <15716903+toast-gear@users.noreply.github.com>
2022-04-22 16:58:10 +01:00
renovate[bot]
e5cf3b95cf fix(deps): update module github.com/teambition/rrule-go to v1.8.0 (#1048)
Co-authored-by: Renovate Bot <bot@renovateapp.com>
2022-04-20 11:11:38 +09:00
Callum Tait
24aae58dbc feat: default scale down flag (#963)
Resolves #899

Co-authored-by: Callum <callum@domain.com>
Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
2022-04-20 11:09:09 +09:00
Jeff Billimek
13bfa2da4e Fix runner pod dnsConfig (#1227)
Fixes #1226
Fixes #1224

Signed-off-by: Jeff Billimek <jeff@billimek.com>
2022-04-20 10:55:20 +09:00
Chris Bui
cb4e1fa8f2 breaking: Pluralize topologySpreadConstraint to match docs (#1089)
Original PR:
https://github.com/actions-runner-controller/actions-runner-controller/pull/814/files#diff-25283fab3c6d5fa726652c8741a122c1ba14d8486fe092774617a385e4bc1a92R145

If you're already using this feature, follow the process explained in https://github.com/actions-runner-controller/actions-runner-controller/pull/1089#issuecomment-1103354025 when upgrading.

Fixes #984
2022-04-20 10:47:18 +09:00
Patrick Ellis
7a5a6381c3 Add WorkflowJob to GitHubEventScaleUpTriggerSpec types (#922) 2022-04-20 09:59:08 +09:00
Renovate Bot
81951780b1 chore(deps): update dependency actions/runner to v2.290.1 2022-04-14 18:36:24 +00:00
renovate[bot]
3b48db0d26 chore(deps): update actions/stale action to v5 (#1338)
Co-authored-by: Renovate Bot <bot@renovateapp.com>
2022-04-13 09:42:27 +01:00
Callum Tait
352e206148 refactor: use apt-get instead of apt (#1342)
Co-authored-by: toast-gear <toast-gear@users.noreply.github.com>
2022-04-13 09:40:15 +01:00
Richard Fussenegger
6288036ed4 Removed modprobe Script (#1247) [skip ci]
* Removed `modprobe` Script

I was able to find out that this script originates from https://github.com/docker-library/docker/blob/master/modprobe.sh but our image has neither `lsmod` nor `modprobe` installed. Hence, if it were in use, it would fail every time. 🤔

* fix: correct command order

Co-authored-by: toast-gear <toast-gear@users.noreply.github.com>
2022-04-13 09:39:55 +01:00
Siyuan Zhang
a37b4dfbe3 Fix scale down condition to exclude skipped (#1330)
* Fix scale down condition to exclude skipped
* Use fallthrough and break to let default handle the skipped case

Fixes #1326
2022-04-13 08:53:07 +09:00
Callum Tait
c4ff1a588f chore: migrate to actions stale bot (#1334)
Co-authored-by: toast-gear <toast-gear@users.noreply.github.com>
2022-04-13 08:29:49 +09:00
Callum Tait
4a3b7bc8d5 refactor: location of some runner cmds (#1337)
Co-authored-by: toast-gear <toast-gear@users.noreply.github.com>
2022-04-12 22:18:34 +01:00
Richard Fussenegger
8db071c4ba Improved Bash Logger (#1246)
* Improved Bash Logger

This is a first step towards having robust Bash scripts in the runner images. The changes _could_ be considered breaking, depending on our backwards compatibility definition.

* Fixed Log Formatting Issues

Co-authored-by: Callum Tait <15716903+toast-gear@users.noreply.github.com>
2022-04-12 22:02:06 +01:00
Renovate Bot
7b8057e417 chore(deps): update dependency actions/runner to v2.290.0 2022-04-12 20:46:19 +00:00
renovate[bot]
960a704246 chore(deps): update azure/setup-helm action to v2.1 (#1328) [skip ci]
Co-authored-by: Renovate Bot <bot@renovateapp.com>
2022-04-11 18:23:48 +01:00
Daniel Moran
f907f82275 Add more usages of RUNNER_VERSION to Renovate config. (#1313)
* Add more usages of RUNNER_VERSION to Renovate config.

* Double-escape `?` in pattern
2022-04-11 11:28:00 +01:00
Rolf Ahrenberg
7124451cea chore: fix typo (#1316) [skip ci] 2022-04-08 17:32:01 +01:00
Yusuke Kuoka
c8f1acd92c chore: bump chart to latest (#1319)
Bumps the chart version along with the controller version.
We bump the patch number for the chart as the release for the controller is a patch release.
That's the same handling as we've done in the previous version ecc8b4472a and #1300

As always, be sure to upgrade CRDs before updating the controller version!
Otherwise it can break in interesting ways.
2022-04-08 10:59:07 +09:00
Yusuke Kuoka
b0fd7a75ea Fix release workflow 2022-04-08 01:36:14 +00:00
Yusuke Kuoka
b09c54045a Prevent runners from stuck in Terminating when pod disappeared without standard termination process (#1318)
This fixes the said issue by additionally treating any runner pod whose phase is Failed, or whose runner container exited with a non-zero code, as "complete", so that ARC gives up unregistering the runner from Actions and deletes the runner pod anyway (a sketch of the condition follows this entry).

Note that there are plenty of causes for that. If you are deploying runner pods on AWS spot instances or GCE preemptible instances and a job assigned to a runner took more time than the shutdown grace period provided by your cloud provider (2 minutes for AWS spot instances), the runner pod would be terminated prematurely without letting actions/runner unregister itself from Actions. If your VM or hypervisor failed, runner pods that were running on the node become PodFailed without unregistering the runners from Actions.

Please beware that it is currently the user's responsibility to clean up any dangling runner resources on GitHub Actions.

Ref https://github.com/actions-runner-controller/actions-runner-controller/issues/1307
Might also relate to https://github.com/actions-runner-controller/actions-runner-controller/issues/1273
2022-04-08 10:17:33 +09:00
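A sketch of the "complete" condition described above, using Kubernetes core types: a runner pod whose phase is Failed, or whose runner container terminated with a non-zero exit code, is treated as complete. The helper name and container-name parameter are illustrative, not ARC's actual code.

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// runnerPodIsComplete mirrors the condition described above: a runner pod is
// treated as "complete" (give up unregistering, just delete it) when its phase
// is Failed, or when the runner container has terminated with a non-zero code,
// e.g. after a spot instance was reclaimed mid-job.
func runnerPodIsComplete(pod *corev1.Pod, runnerContainerName string) bool {
	if pod.Status.Phase == corev1.PodFailed {
		return true
	}
	for _, cs := range pod.Status.ContainerStatuses {
		if cs.Name != runnerContainerName {
			continue
		}
		if t := cs.State.Terminated; t != nil && t.ExitCode != 0 {
			return true
		}
	}
	return false
}

func main() {
	pod := &corev1.Pod{Status: corev1.PodStatus{Phase: corev1.PodFailed}}
	fmt.Println(runnerPodIsComplete(pod, "runner")) // true
}
```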
Yusuke Kuoka
96f2da1c2e Merge pull request #1262 from fgalind1/patch-4
Fix deleting a runner when pod was deleted
2022-04-08 10:17:13 +09:00
Yusuke Kuoka
cac8b76c68 Merge pull request #1292 from actions-runner-controller/renovate/sigs.k8s.io-controller-runtime-0.x
fix(deps): update module sigs.k8s.io/controller-runtime to v0.11.2
2022-04-08 10:14:47 +09:00
Felipe Galindo Sanchez
e24d942d63 Merge remote-tracking branch 'upstream/master' into patch-4 2022-04-06 06:43:01 -07:00
Felipe Galindo Sanchez
b855991373 ci: pin go version to the known working version (#1303) 2022-04-06 09:34:48 +01:00
Felipe Galindo Sanchez
e7e48a77e4 Merge remote-tracking branch 'upstream/master' into patch-4 2022-04-04 09:54:08 -07:00
Yusuke Kuoka
85dea9b67c Merge pull request #1285 from actions-runner-controller/docs/runnersets
docs: add limitations to runnersets + reorder
2022-04-03 18:18:54 +09:00
Yusuke Kuoka
1d9347f418 chore: bump chart to latest (#1300)
* chore: bump chart to latest

Bumps the chart version along with the controller version.
We bump the patch number for the chart as the release for the controller is a patch release.
That's the same handling as we've done in the previous version ecc8b4472a

As always, be sure to upgrade CRDs before updating the controller version!
Otherwise it can break in interesting ways.

* docs: expand on CRD upgrade requirement

Co-authored-by: Callum Tait <15716903+toast-gear@users.noreply.github.com>
2022-04-03 10:15:39 +01:00
Yusuke Kuoka
631a70a35f Fix runner pod to be cleaned up earlier regardless of the sync period (#1299)
Ref #1291
2022-04-03 11:12:44 +09:00
Yusuke Kuoka
b614dcf54b Make the hard-coded runner startup timeout longer to avoid a race on token expiration (#1296)
Ref #1295
2022-04-03 09:59:35 +09:00
Callum Tait
14f9e7229e docs: highlight why persistent runners are not ideal 2022-04-01 15:49:15 +01:00
Renovate Bot
82770e145b fix(deps): update module sigs.k8s.io/controller-runtime to v0.11.2 2022-03-30 21:38:12 +00:00
Renovate Bot
971c54bf5c chore(deps): update dependency actions/runner to v2.289.2 2022-03-30 18:18:17 +00:00
renovate[bot]
b80d9b0cdc chore(deps): update helm/chart-releaser-action action to v1.4.0 (#1287)
Co-authored-by: Renovate Bot <bot@renovateapp.com>
2022-03-30 13:24:26 +01:00
Bernardo Meurer
e46df413a1 refactor(runner/entrypoint): check for externalstmp (#1277)
* refactor(runner/entrypoint): check for externalstmp [skip ci]

Co-authored-by: Callum Tait <15716903+toast-gear@users.noreply.github.com>
2022-03-30 12:18:18 +01:00
toast-gear
eb02f6f26e docs: redundant words 2022-03-30 10:36:34 +01:00
toast-gear
7a750b9285 docs: wording 2022-03-30 10:35:32 +01:00
toast-gear
d26c8d6529 docs: add autoscaling also causes problems 2022-03-30 10:26:08 +01:00
toast-gear
fd0092d13f chore: new line for consistency 2022-03-30 10:02:33 +01:00
toast-gear
88d17c7988 docs: use the right font 2022-03-30 10:00:34 +01:00
toast-gear
98567dadc9 docs: fix index 2022-03-30 09:59:32 +01:00
toast-gear
7e8d80689b docs: add limitations to runnersets + reorder 2022-03-30 09:53:59 +01:00
Callum Tait
d72c396ff1 docs: slight correction for a multi controller env 2022-03-29 16:57:58 +01:00
Milan Aleks
13e7b440a8 chore: typo fix in runner Dockerfile [skip ci] (#1270) 2022-03-29 11:05:24 +01:00
Michael Goodness
a95983fb98 feat(kustomize): add github-webhook-server overlay (#1198)
* feat(kustomize): add github-webhook-server overlay

* chore(kustomize): add image to github-webhook-server overlay

* feat(kustomize): drop sync-period
2022-03-29 11:00:55 +01:00
Callum Tait
ecc8b4472a chore: bump chart to latest (#1280) 2022-03-29 07:46:44 +01:00
Callum Tait
459beeafb9 docs: remove the nonsense 2022-03-27 14:15:42 +01:00
Rolf Ahrenberg
1b327a0721 refactor: use const envvars (#1251) 2022-03-27 12:14:56 +01:00
Jérôme Foray
1f8a23c129 fix(chart): add namespace selector to webhooks when in singleNamespace mode (#1237)
* fix(chart): add namespace selector to webhooks when in singleNamespace mode

* docs: expand multi controller setup

Co-authored-by: Callum Tait <15716903+toast-gear@users.noreply.github.com>
2022-03-27 11:52:39 +01:00
Naka Masato
af8d8f7e1d Update runnerdeployment_webhook.go (#1271) 2022-03-25 09:24:13 +09:00
Yusuke Kuoka
e7ef21fdf9 Merge pull request #1264 from ekarlso/env-var-detection-fix
Use container name to detect runner container in Pod
2022-03-25 09:23:48 +09:00
Endre Karlson
ee7484ac91 Use container name to detect runner container in Pod 2022-03-23 12:39:58 +01:00
Yusuke Kuoka
debf53c640 Fix missing pip bin path (/home/runner/.local/bin) (#1263)
Fixes #1261
2022-03-23 10:28:12 +09:00
Felipe Galindo Sanchez
9657d3e5b3 Fix deleting a runner when pod was deleted
With the current implementation, if a pod is deleted, the controller fails to delete the runner because it tries to annotate a pod that doesn't exist, as we're passing a new pod object that is not an existing resource.
2022-03-22 14:44:50 -07:00
Callum Tait
2cb04ddde7 feat: move to new container-friendly run.sh file (#1244)
* fix: unit tests were very broken

Co-authored-by: toast-gear <toast-gear@users.noreply.github.com>
2022-03-22 19:02:51 +00:00
renovate[bot]
366f8927d8 chore(deps): update actions/cache action to v3 (#1252)
Co-authored-by: Renovate Bot <bot@renovateapp.com>
2022-03-22 18:48:23 +00:00
Richard Fussenegger
532a2bb2a9 feat: remove registration-only runner logic from entrypoint (#1249)
Closes #1207
2022-03-22 18:33:14 +00:00
Callum Tait
f28cecffe9 docs: various minor changes (#1250)
* docs: various minor changes

* docs: format fixes
2022-03-20 16:05:03 +00:00
Renovate Bot
4cbbcd64ce chore(deps): update dependency actions/runner to v2.289.1 2022-03-18 22:36:38 +00:00
Richard Fussenegger
a68eede616 feat: copy dotfiles from asset to service dir (#1136)
* feat: copy dotfiles from asset to service dir

* Fixed `UNITTEST` Condition

* Load `/etc/environment`

See https://github.com/actions/runner/issues/1703 for context on this change.
2022-03-18 07:40:52 +00:00
Julien Tanay
c06a806d75 Add note about having 100+ replicas (#1103) 2022-03-16 21:03:05 +00:00
Callum Tait
857c1700ba docs: add repo update to upgrade notes (#1233) 2022-03-16 10:37:37 +00:00
Callum Tait
a40793bb60 chore: bump chart to app 0.22.0 (#1232)
* chore: bump chart to app 0.22.0
2022-03-16 07:57:30 +00:00
Callum Tait
48a7b78bf3 docs: remove runnerset limitation (#1225)
This works great in testing now; this is no longer a limitation because ARC now creates a statefulset per runner.
2022-03-16 09:08:41 +09:00
renovate[bot]
6ff93eae95 chore(deps): update helm/chart-testing-action action to v2.2.1 (#1216)
Co-authored-by: Renovate Bot <bot@renovateapp.com>
2022-03-15 18:51:54 +00:00
Yusuke Kuoka
b25a0fd606 Merge pull request #1217 from actions-runner-controller/docs/re-order
docs: various changes in preparation for 0.22.0 release

- Move RunnerSets to a more predominant location in the docs
- Clean up a few bits
- Highlight the deprecation and removal timeline for the `--once` flag
- Renamed the ephemeral runners section to something more logical (persistent runners). Static runners were an option; however, the word static is awkward as it's somewhat tied up with autoscaling and the `Runner` kind, so Persistent was chosen instead.
- Update upgrade docs to use `replace` instead of `apply`
2022-03-15 09:01:32 +09:00
toast-gear
3beef84f30 docs: better sentences 2022-03-14 12:43:07 +00:00
toast-gear
76cc758d12 docs: minor consistency change 2022-03-14 12:37:57 +00:00
toast-gear
c4c6e833a7 chore: add deprecation warning 2022-03-14 12:35:07 +00:00
toast-gear
ecf74e615e docs: bump versions and upgrade instructions 2022-03-14 10:23:36 +00:00
toast-gear
bb19e85037 docs: various cleanups and re-orderings 2022-03-14 09:52:22 +00:00
Yusuke Kuoka
e7200f274d Merge pull request #1214 from actions-runner-controller/fix-static-runners
Fix runner{set,deployment} rollouts and static runner scaling

I was testing static runners in preparation for cutting the next release of ARC, v0.22.0, and found several problems that I thought worth fixing.

In particular, this PR fixes static runner reliability issues in two ways.

c4b24f8366 fixes the issue that ARC gave up retrying RemoveRunner calls too early, especially for static runners, which resulted in static runners often being terminated prematurely while running jobs.

791634fb12 fixes the issue that ARC was unable to scale up any static runners when the corresponding desired replicas number in, e.g., RunnerDeployment was updated. It was caused by a bug in the mechanism intended to prevent ephemeral runners from being recreated in unwanted circumstances, which was mistakenly triggered for static runners as well.

Since #1179, RunnerDeployment was not able to gracefully terminate old RunnerReplicaSets on update. c612e87 fixes that by changing RunnerDeployment to first scale old RunnerReplicaSet(s) down to zero, wait for sync, and set the deletion timestamp only after that. That way, RunnerDeployment can ensure that all the old RunnerReplicaSets being deleted are already scaled to zero via the standard unregister-and-then-delete runner termination process.

That revealed a hidden bug in #1179 where the scale-to-zero-before-RunnerReplicaSet-termination sometimes did not work as intended. 4551309 fixes that, so that RunnerDeployment can actually terminate old RunnerReplicaSets gracefully.
2022-03-13 22:19:26 +09:00
Yusuke Kuoka
1cc06e7408 e2e: Make enterprise runners optional for testing GitHub App
As a GitHub App does not allow ARC to access enterprise-runner-related API endpoints, like the create-registration-token API.
2022-03-13 13:11:26 +00:00
Yusuke Kuoka
4551309e30 Fix runners to not terminate before unregistration when scaling down
#1179 was not working, particularly for scale-down of static (and perhaps long-running ephemeral) runners, which resulted in some runner pods being terminated before the requested unregistration processes completed, causing some in-progress workflow jobs to hang forever. This fixes an edge case in which a decrease in desired replicas triggered the failure, so that every runner is unregistered and then terminated, as originally designed.
2022-03-13 13:09:46 +00:00
Yusuke Kuoka
7123b18a47 chore: Log more variables when log level is -2 2022-03-13 13:04:28 +00:00
Yusuke Kuoka
cc55d0bd7d Let runnerdeployment controller log runnerreplicaset creation 2022-03-13 12:25:53 +00:00
Yusuke Kuoka
c612e87d85 fix: Let RunnerDeployment scale RunnerReplicaSet to zero before terminating it
so that hopefully RunnerDeployment can gracefully terminate older RunnerReplicaSets on update.
2022-03-13 12:18:22 +00:00
Yusuke Kuoka
326d6a1fe8 Fix the timing of Marking owner for unregistration completion log 2022-03-13 12:16:55 +00:00
Yusuke Kuoka
fa8ff70aa2 Add log when deletion timestamp is being set on owner object 2022-03-13 12:16:29 +00:00
Yusuke Kuoka
efb7fca308 Fix externally deleted runner pod to not block unregistration process 2022-03-13 12:15:49 +00:00
Yusuke Kuoka
e4280dcb0d Fix patch MergeFrom target 2022-03-13 12:14:14 +00:00
Yusuke Kuoka
f153870f5f fix: Do not block indefinitely on runner that cannot be deleted due to 403 2022-03-13 12:12:01 +00:00
Yusuke Kuoka
8ca39caff5 Fix log message on runner deletion 2022-03-13 12:11:11 +00:00
Yusuke Kuoka
791634fb12 Fix static runners not scaling up
It turned out that #1179 broke static runners in a way that they were no longer able to scale up at all when the desired replicas count was updated.
This fixes that by correcting a short-circuit that is intended only for ephemeral runners so that it is not mistakenly triggered for static runners.
2022-03-13 07:26:43 +00:00
Yusuke Kuoka
c4b24f8366 Prevent static runners from terminating due to unregister timeout
The unregister timeout of 1 minute (no matter how long it is) can negatively impact the availability of a static runner constantly running workflow jobs, or of an ephemeral runner that runs a long-running job.
We deal with that by completely removing the unregistration timeout, so that regardless of the type of runner (static or ephemeral), it waits forever until it is successfully unregistered before being terminated.
2022-03-13 07:26:36 +00:00
Yusuke Kuoka
a1c6d1d11a doc: Add release note for 0.22.0 (#1199)
As it turned out to be the biggest release ever, I was afraid I might not be able to write a summary of changes that communicates well. Here is my attempt. Please review and leave any comments so that we can be more confident in this release. Thank you!
2022-03-13 16:25:24 +09:00
Yusuke Kuoka
adc889ce8a Fix RunnerDeployment to be able to finish rollout (#1213)
I found that #1179 was unable to finish the rollout of a RunnerDeployment update (like a runner env update). It was able to create a new RunnerReplicaSet with the desired spec, but unable to tear down the older ones. This fixes that.
2022-03-13 10:10:24 +09:00
Yusuke Kuoka
b83db7be8f Merge pull request #1212 from actions-runner-controller/fix-runnerdeploy-duplicate-envvars
Fix RunnerDeployment-managed runner pods to not get RUNNER_NAME and RUNNER_TOKEN injected twice
2022-03-12 23:27:45 +09:00
Yusuke Kuoka
da2adc0cc5 e2e: Omit RUNNER_FEATURE_FLAG_EPHEMERAL when TEST_FEATURE_FLAG_EPHEMERAL is not set 2022-03-12 14:08:23 +00:00
Yusuke Kuoka
fa287c4395 Fix RunnerDeployment-managed runner pods to not get RUNNER_NAME and RUNNER_TOKEN injected twice
Since #1179, runner pods managed by RunnerDeployment had two duplicate environment variables for RUNNER_NAME and RUNNER_TOKEN. This fixes that.
2022-03-12 13:49:50 +00:00
Yusuke Kuoka
7c0340dea0 Merge pull request #1211 from actions-runner-controller/use-ephemeral-by-default
Use --ephemeral by default

Every runner is now --ephemeral by default.

Note that this works by ARC setting the RUNNER_FEATURE_FLAG_EPHEMERAL envvar to true by default. Previously you had to explicitly set it to true; otherwise the runner was passed --once, which is known to have various race conditions.

It's worth noting that the very confusing and related configuration, ephemeral: true, which creates --once runners instead of static (or persistent) runners, had already been the default for many months. So, this should be the only change needed to make every runner ephemeral without any explicit configuration.

You can still fall back to static (persistent) runners by setting ephemeral: false, and to --once runners by setting RUNNER_FEATURE_FLAG_EPHEMERAL to "false". But I don't think there are many reasons to do so.

Ref #1189
2022-03-12 22:47:38 +09:00
Yusuke Kuoka
c3dd1c5c05 e2e: Make TEST_FEATURE_FLAG_EPHEMERAL optional 2022-03-12 13:32:42 +00:00
Yusuke Kuoka
051089733b Use --ephemeral by default
Ref https://github.com/actions-runner-controller/actions-runner-controller/issues/1189
2022-03-12 13:20:07 +00:00
Yusuke Kuoka
757e0a82a2 Merge pull request #1210 from actions-runner-controller/fix-github-api-cache-for-github-app-mode
Fix GitHub API cache to work with GitHub App authentication
2022-03-12 21:17:25 +09:00
Yusuke Kuoka
83e550cde5 Experimental log level "-4" for logging every HTTP round-trip for GitHub API calls 2022-03-12 12:11:16 +00:00
Yusuke Kuoka
22ef7b3a71 acceptance,e2e: Fix deploy.sh and e2e_test.go for testing with GitHub App 2022-03-12 12:10:04 +00:00
Yusuke Kuoka
28fccbcecd Fix GitHub API cache to work with GitHub App authentication
The version of `bradleyfalzon/ghinstallation` which is used to enable GitHub App authentication turned out to add an extra header, `application/vnd.github.machine-man-preview+json`, to every HTTP request. That revealed an edge case in our HTTP cache layer `gregjones/httpcache` that caused it to not serve responses from cache when it should.

There were two problems. One was that it did not support multi-valued headers and only looked at the first value for each header, and the other was that it did not support http.RoundTripper implementations that modify HTTP request headers within a RoundTrip call.

I fixed it in my fork of httpcache, which is hosted at https://github.com/actions-runner-controller/httpcache.

The relevant commits are:

- 70d975e77d
- 197a8a3546

This can be considered a follow-up to #1127, which turned out to have enabled the cache only when ARC uses a PAT for authentication.
Since this fix, the cache is also enabled when ARC authenticates as a GitHub App.
2022-03-12 11:14:16 +00:00
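Below is a minimal sketch of the transport wiring described in the commit above: a `ghinstallation` GitHub App transport wrapped by an `httpcache` in-memory cache so that conditional responses can be served from cache. The App IDs, key path, and org/repo names are placeholders, and the upstream `gregjones/httpcache` import path is shown only for illustration; per the commit, ARC actually swaps in its own fork of `httpcache`.

```
package main

import (
	"context"
	"fmt"
	"net/http"

	"github.com/bradleyfalzon/ghinstallation/v2"
	"github.com/google/go-github/v39/github"
	"github.com/gregjones/httpcache"
)

func main() {
	// Placeholder GitHub App credentials.
	var appID, installationID int64 = 12345, 67890

	// The installation transport injects the App installation token (and an
	// extra Accept header) into every request it round-trips.
	appTr, err := ghinstallation.NewKeyFromFile(http.DefaultTransport, appID, installationID, "private-key.pem")
	if err != nil {
		panic(err)
	}

	// Wrap it with an in-memory HTTP cache. This is the layering that exposed
	// the header-handling edge case described above; ARC uses its httpcache
	// fork so that a transport modifying request headers in RoundTrip still
	// gets cache hits.
	cachingTr := &httpcache.Transport{
		Transport:           appTr,
		Cache:               httpcache.NewMemoryCache(),
		MarkCachedResponses: true, // adds an X-From-Cache header to cached responses
	}

	client := github.NewClient(&http.Client{Transport: cachingTr})

	runners, _, err := client.Actions.ListRunners(context.Background(), "example-org", "example-repo", nil)
	if err != nil {
		panic(err)
	}
	fmt.Println("registered runners:", runners.TotalCount)
}
```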
Yusuke Kuoka
9628bb2937 Prevent RemoveRunner spam on busy ephemeral runner scale down (#1204)
Since #1127 and #1167, we had been retrying `RemoveRunner` API call on each graceful runner stop attempt when the runner was still busy.
There was no reliable way to throttle the retry attempts. The combination of these resulted in ARC spamming RemoveRunner calls (one call per reconciliation loop, but the loop runs quite often due to how the controller works) when it failed once because the runner was in the middle of running a workflow job.

This fixes that by adding a few short-circuit conditions that work for ephemeral runners. An ephemeral runner can unregister itself on completion, so in most cases ARC can just wait for the runner to stop if it's already running a job. As a RemoveRunner response of status 422 implies that the runner is running a job, we can use that as a trigger to start the runner-stop waiter.

The end result is that 422 errors will be observed at most once per the whole graceful termination process of an ephemeral runner pod. RemoveRunner API calls are never retried for ephemeral runners. ARC consumes less GitHub API rate limit budget and logs are much cleaner than before.

Ref https://github.com/actions-runner-controller/actions-runner-controller/pull/1167#issuecomment-1064213271
2022-03-11 19:03:17 +09:00
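As a rough illustration of the short-circuit described in the commit above, the sketch below treats a 422 from RemoveRunner as "the runner is busy" and switches to waiting for the ephemeral runner to unregister itself instead of retrying the call. The helper and its polling behaviour are hypothetical, not ARC's actual code.

```
package removal

import (
	"context"
	"net/http"
	"time"

	"github.com/google/go-github/v39/github"
)

// tryRemoveRunner attempts a graceful removal of an ephemeral runner. A 422
// response is taken to mean "the runner is still running a job", so instead of
// retrying RemoveRunner on every reconciliation we just wait for the runner to
// finish and unregister itself.
func tryRemoveRunner(ctx context.Context, gh *github.Client, owner, repo string, runnerID int64) error {
	resp, err := gh.Actions.RemoveRunner(ctx, owner, repo, runnerID)
	if err == nil {
		return nil // unregistered; the runner pod can now be deleted safely
	}
	if resp != nil && resp.StatusCode == http.StatusUnprocessableEntity {
		// Busy: never retry RemoveRunner for an ephemeral runner; it will
		// unregister itself once the in-progress job completes.
		return waitForRunnerToStop(ctx, gh, owner, repo, runnerID)
	}
	return err
}

// waitForRunnerToStop is a hypothetical stand-in for ARC requeueing the
// reconciliation until the runner no longer appears in ListRunners.
func waitForRunnerToStop(ctx context.Context, gh *github.Client, owner, repo string, runnerID int64) error {
	for {
		runners, _, err := gh.Actions.ListRunners(ctx, owner, repo, nil)
		if err != nil {
			return err
		}
		gone := true
		for _, r := range runners.Runners {
			if r.GetID() == runnerID {
				gone = false
				break
			}
		}
		if gone {
			return nil
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(30 * time.Second):
		}
	}
}
```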
Renovate Bot
736a53fed6 fix(deps): update golang.org/x/oauth2 commit hash to 6242fa9 2022-03-10 08:39:51 +09:00
yourmoonlight
132faa13a1 docs: fix the helm command for webhook installation (#1188)
* fix doc for install the webhook server

* modify cmd with single set && add double quote for zsh users
2022-03-08 17:59:01 +00:00
Callum Tait
66e070f798 docs: remove githubAPICacheDuration from docs (#1194) 2022-03-08 13:27:30 +00:00
Yusuke Kuoka
55ff4de79a Remove legacy GitHub API cache of HRA.Status.CachedEntries (#1192)
* Remove legacy GitHub API cache of HRA.Status.CachedEntries

We migrated to the transport-level cache introduced in #1127, so not only is this legacy cache useless, it also makes it harder to deduce which cache resulted in the desired replicas number calculated by HRA.
Just remove the legacy cache to keep it simple and easy to understand.

* Deprecate the githubAPICacheDuration helm chart value and the --github-api-cache-duration flag as well

* Fix integration test
2022-03-08 19:05:43 +09:00
Yusuke Kuoka
301439b06a chore: Change log ts format to RFC3339 (#1191)
The TimeEncoder for zap seems to have been set to EpochTimeEncoder, which is the default and not very readable. Changing it to TimeEncoderOfLayout(time.RFC3339) for readability.

Another benefit of doing this is that the ts format is now consistent with the various timestamps ARC puts into pod and other custom resource annotations.
2022-03-08 10:34:52 +09:00
Yusuke Kuoka
15ee6d6360 chore: Reorganize "Calculated desired replicas" log fields (#1190)
So that `max` is emitted immediately after `min`, which is its counterpart.
2022-03-08 10:29:53 +09:00
Felipe Galindo Sanchez
5b899f578b fix(chart): allow to use basic auth when authSecret.create is false (#1149)
* fix(chart): allow to use basic auth when authSecret.create is false

When the secret is created outside of the ARC chart using authSecret.create=false and basicAuth,
the controller fails because we're not including the basic-auth password as an environment variable,
since the password value won't be inside the helm values.

This PR includes both environment variables for consistency, regardless of whether
they are set or not, similar to the rest of the auth options (e.g.
app_id, private key, etc.)

* chart: Add back the conditional block for .Values.authSecret.github_basicauth_username

Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
2022-03-07 10:07:24 +09:00
Yusuke Kuoka
d8c9eb7ba7 Fix arm64 image (#1185)
Fixes #1184
2022-03-07 10:00:20 +09:00
Yusuke Kuoka
cbbc383a80 Auto-correct replicas number on missing webhook_job completion event (#1180)
While testing #1179, I discovered that ARC sometimes stops resyncing a RunnerReplicaSet when the desired replicas count is greater than the actual number of runner pods.
This seems to happen when ARC misses a workflow_job completion event, but ARC has no way to decide whether (1) something went wrong on ARC's side or (2) a load balancer in the middle, GitHub, or anything other than ARC went wrong. It needs a criterion to decide that, or, if that's impossible, a way to deal with it.

In this change, I added a hard-coded 10-minute timeout (which can be made customizable later) to prevent runner pod recreation.
Now, a RunnerReplicaSet/RunnerSet restarts runner pod recreation 10 minutes after the last scale-up. If the workflow completion event arrives after the timeout, it will decrease the desired replicas number, which results in the removal of a runner pod. The removed runner pod might be deleted without ever being used, but I think that's better than leaving the desired replicas and the actual number of replicas diverged forever.
2022-03-07 09:35:13 +09:00
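The timeout rule from the commit above can be pictured with a tiny sketch like the one below; the function and constant names are hypothetical and only restate the idea (suppress runner pod recreation for 10 minutes after the last scale-up, in case the completion event is merely late).

```
package autoscaling

import "time"

// recreationTimeout mirrors the hard-coded 10-minute window mentioned above.
const recreationTimeout = 10 * time.Minute

// shouldRecreateMissingRunnerPods reports whether missing runner pods may be
// recreated to match the desired replicas. Right after a scale-up a missing
// workflow_job completion event may still arrive, so recreation is suppressed
// until the timeout elapses.
func shouldRecreateMissingRunnerPods(lastScaleUp, now time.Time) bool {
	return now.Sub(lastScaleUp) > recreationTimeout
}
```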
seplak
b57e885a73 Fix service account typo in Helm README (#1183)
Just fixing a typo I discovered while reading through the README.
2022-03-07 08:39:01 +09:00
Yusuke Kuoka
bed927052d Merge pull request #1179 from actions-runner-controller/refactor-runner-and-runnerset
Refactor Runner and RunnerReplicaSet so that they use the same library code that powers RunnerSet.

RunnerSet is StatefulSet-based while RunnerReplicaSet/Runner are Pod-based, so it had been hard to unify the implementations although they look very similar in many aspects.

This change finally resolves that issue, by first introducing a library that implements the generic logic that is used to reconcile RunnerSet, then adding an adapter that can be used to let the generic logic manage runner pods via Runner, instead of via StatefulSet.

Follow-up for #1127, #1167, and #1178
2022-03-06 15:56:51 +09:00
Yusuke Kuoka
14a878bfae refactor: Make RunnerReplicaSet and Runner backed by the same logic that backs RunnerSet 2022-03-06 05:53:26 +00:00
Yusuke Kuoka
c95e84a528 refactor: Extract runner pod owner management out of runnerset controller
so that it can potentially be reusable from runnerreplicaset controller
2022-03-05 12:18:02 +00:00
Yusuke Kuoka
95a5770d55 Fix regression that registration-timeout check was not working for runnerset (#1178)
Follow-up for #1167
2022-03-05 19:31:05 +09:00
Yusuke Kuoka
9cc9f8c182 chore: Add a few comments to runnerset and runnerpod controllers to help potential contributors 2022-03-05 05:41:56 +00:00
Yusuke Kuoka
b7c5611516 dockerfile: Fix unintended removal of CGO_ENABLED=0 2022-03-05 05:41:56 +00:00
Yusuke Kuoka
138e326705 chore: Add comment on lastSyncTime in runnerset controller 2022-03-05 05:41:56 +00:00
Renovate Bot
c21fa75afa fix(deps): update kubernetes packages to v0.23.4 2022-03-04 08:39:18 +09:00
Yusuke Kuoka
34483e268f ci: Enable actions/cache for Go modules 2022-03-03 18:47:54 +09:00
Yusuke Kuoka
5f2b5327f7 integration: Reduce error logs to ease debugging 2022-03-03 18:47:54 +09:00
renovate[bot]
a93b2fdad4 fix(deps): update golang.org/x/oauth2 commit hash to ee48083 (#1150)
fix(deps): update golang.org/x/oauth2 commit hash to ee48083

Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
Co-authored-by: Renovate Bot <bot@renovateapp.com>
2022-03-03 18:00:43 +09:00
Yusuke Kuoka
25570a0c6d Fix docker build 2022-03-03 02:05:38 +00:00
Felipe Galindo Sanchez
d20ad71071 Fix minor log in runner controller (#1175)
The log mentions registration only, but this is about the standard runner pod
2022-03-03 09:51:30 +09:00
Daniel
8a379ac94b Add custom volume mount documentation (#1045)
One example for in-memory storage,
and one example for NVMe-backed storage, also pointing out all the
current flaws/risks of that configuration.
2022-03-03 09:13:42 +09:00
Felipe Galindo Sanchez
27563c4378 Remove unused function (#1173) 2022-03-03 09:02:47 +09:00
Felipe Galindo Sanchez
4a0f68bfe3 Cleanup extra block in runner controller (#1174) 2022-03-03 09:01:34 +09:00
Yusuke Kuoka
1917cf90c4 chore: Tweak runner-id annotation name and the annotation prefix to be more consistent 2022-03-02 19:03:20 +09:00
Yusuke Kuoka
0ba3cad6c2 fix: Prefix runner pod related annotation keys with actions/ to make them distinguishable from other annotations 2022-03-02 19:03:20 +09:00
Yusuke Kuoka
7f0e65cb73 refactor: Extract definitions of various annotation keys and other defaults to their own source 2022-03-02 19:03:20 +09:00
Yusuke Kuoka
12a04b7f38 Fix typo in comment 2022-03-02 19:03:20 +09:00
Yusuke Kuoka
a3072c110d Prevent runnerset pod unregistration until it gets runner ID
This eliminates the race condition that resulted in a runner being terminated prematurely when RunnerSet triggered unregistration of a StatefulSet that was added just a few seconds earlier.
2022-03-02 19:03:20 +09:00
Yusuke Kuoka
15b402bb32 Make RunnerSet much more reliable with or without webhook 2022-03-02 19:03:20 +09:00
Yusuke Kuoka
11be6c1fb6 Prevent runner pod deletion delay when pod disappeared before unregistration 2022-03-02 19:03:20 +09:00
Yusuke Kuoka
59c3288e87 acceptance,e2e: Automate restarts of ARC pods in case image tag is not changed 2022-03-02 19:03:20 +09:00
Yusuke Kuoka
5030e075a9 dockerfile,e2e: Use buildx and cache mounts for faster rebuilds in E2E 2022-03-02 19:03:20 +09:00
Yusuke Kuoka
3115d71471 acceptance,e2e: Enhance deploy.sh to support more types of runnersets 2022-03-02 19:03:20 +09:00
Renovate Bot
c221b6e278 chore(deps): update actions/checkout action to v3 2022-03-02 11:05:16 +09:00
Renovate Bot
a8dbc8a501 fix(deps): update module github.com/prometheus/client_golang to v1.12.1 2022-03-02 10:56:53 +09:00
Renovate Bot
b1ac63683f fix(deps): update module go.uber.org/zap to v1.21.0 2022-03-02 10:54:35 +09:00
Renovate Bot
10bc28af75 fix(deps): update module sigs.k8s.io/controller-runtime to v0.11.1 2022-03-02 10:52:43 +09:00
Renovate Bot
e23692b3bc chore(deps): update actions/setup-python action to v3 2022-03-02 10:51:22 +09:00
renovate[bot]
e7f4a0e200 chore(deps): update actions/setup-go action to v3 (#1163)
Co-authored-by: Renovate Bot <bot@renovateapp.com>
2022-03-02 10:51:01 +09:00
Yusuke Kuoka
828ddcd44e Merge pull request #1151 from fgalind1/improve-logs
logging: improve logs for scaling
2022-03-02 10:46:53 +09:00
Yusuke Kuoka
fc821fd473 Merge pull request #1168 from actions-runner-controller/docs/better-runner-group-description
docs: better runner group description
2022-03-02 10:31:22 +09:00
Callum Tait
4b0aa92286 docs: better wording 2022-03-01 08:56:30 +00:00
Callum Tait
c69c8dd84d docs: better runner group description 2022-03-01 08:54:24 +00:00
Renovate Bot
e42db00006 chore(deps): update dependency actions/runner to v2.288.1 2022-02-28 22:30:10 +00:00
Felipe Galindo Sanchez
eff0c7364f Merge branch 'master' into improve-logs 2022-02-28 09:25:30 -08:00
Tingluo Huang
516695b275 Set UserAgent to 'actions-runner-controller' for all Http Client. (#1140)
I can't find any requests made by user agent `actions-runner-controller` in GitHub.com's telemetry in the past 7 days.

Turns out we only set the user agent `actions-runner-controller` if we are configured to use BasicAuth, which is not the case for most customers, I think.

I updated the code a little bit to make sure it always sets `actions-runner-controller` as the UserAgent for the GitHub HttpClient in ARC.

A further step would be somehow baking the ARC release version into the UserAgent as well.
2022-02-28 09:17:58 +09:00
Yusuke Kuoka
686d40c20d Merge pull request #1127 from actions-runner-controller/github-api-cache
Enhances ARC (both the controller-manager and the github-webhook-server) to cache any GitHub API responses to HTTP GET requests that carry an appropriate Cache-Control header.

Ref #920

## Cache Implementation

`gregjones/httpcache` has been chosen as the library to implement this feature, as it is recommended in `go-github`'s documentation:

https://github.com/google/go-github#conditional-requests

`gregjones/httpcache` supports a number of cache backends like `diskcache`, `s3cache`, and so on:

https://github.com/gregjones/httpcache#cache-backends

We stick to the built-in in-memory cache as a starter. This will probably never become an issue as long as the HTTP responses for all the GitHub API calls that ARC makes (list-runners, list-workflow-jobs, list-runner-groups, etc.) don't overflow the in-memory cache.

`httpcache` has a known unfixed issue where it doesn't update the cache on chunked responses. But we assume that the APIs we call don't use chunked responses. See #1503 for more information on that.

## Ephemeral runner pods are no longer recreated

The addition of the cache layer slowed down the scale-down process and created a trade-off between making the runner pod termination process fragile to various race conditions (a shorter grace period before runner deletion) and delaying runner pod deletion (a longer grace period). The grace period needs to be longer than 60s (the cache duration of the ListRunners API) to avoid prematurely deleting a runner pod that was just created.

But once I disabled automatic recreation of ephemeral runner pods, this turned out to no longer be an issue when scaling is driven by the workflow_job webhook.

Ephemeral runner resources are still automatically added on demand by RunnerDeployment via RunnerReplicaSet (I've added `EffectiveTime` fields to our CRDs, but that's an implementation detail, so let's omit it here). A good side-effect of disabling ephemeral runner pod recreation is that ARC will no longer create redundant ephemeral runners when used with the webhook-based autoscaler.

Basically, autoscaling still works as everyone might expect. It's just better than before overall.
2022-02-28 08:37:26 +09:00
Renovate Bot
f0fa99fc53 chore(deps): update dependency actions/runner to v2.288.0 2022-02-26 01:34:49 +00:00
Javier Sotelo
6b12413fdd Add optional hostNetwork (#1035)
Co-authored-by: jsotelo <javier.sotelo@viasat.com>
2022-02-23 20:11:40 +00:00
Felipe Galindo Sanchez
3abecd0f19 logging: improve logs for scaling 2022-02-23 08:29:13 -08:00
Callum Tait
7156ce040e chore: bump chart (#1138) 2022-02-21 09:24:14 +00:00
Yusuke Kuoka
1463d4927f acceptance,e2e: Let capacity reservation expired more later 2022-02-21 00:07:49 +00:00
Yusuke Kuoka
5bc16f2619 Enhance HRA capacity reservation update log 2022-02-21 00:06:26 +00:00
Yusuke Kuoka
b8e65aa857 Prevent unnecessary ephemeral runner recreations 2022-02-20 13:45:42 +00:00
Yusuke Kuoka
d4a9750e20 acceptance,e2e: Enhance E2E test and deploy.sh to support scaleDownDelaySeconds~ and minReplicas for HRA 2022-02-20 13:45:42 +00:00
Yusuke Kuoka
a6f0e0008f Make unregistration timeout and retry delay configurable in integration tests 2022-02-20 12:05:34 +00:00
Yusuke Kuoka
79a31328a5 Stop recreating ephemeral runner pod
Ref https://github.com/actions-runner-controller/actions-runner-controller/issues/911#issuecomment-1046161384
2022-02-20 04:42:19 +00:00
Yusuke Kuoka
4e6bfd8114 e2e: Add ability to toggle dockerdWithinRunnerContainer 2022-02-20 04:37:15 +00:00
Yusuke Kuoka
3c16188371 Introduce consistent timeouts for runner unregistration and runner pod deletion
Enhances the runner controller and the runner pod controller to have consistent timeouts for runner unregistration and runner pod deletion,
so that we are very unlikely to terminate pods that are running jobs.
2022-02-20 04:36:35 +00:00
Yusuke Kuoka
9e356b419e chart: Add default-logs-container annotation to controller pods
so that you can run `kubectl logs` on controller pods without specifying the container name.

It is especially useful when you want to run kubectl-logs on all ARC pods across controller-manager and github-webhook-server like:

```
kubectl -n actions-runner-system logs -l app.kubernetes.io/name=actions-runner-controller
```

That was previously impossible because the selector matches pods from both the controller-manager and the github-webhook-server, and kubectl does not provide a way to specify container names for the respective pods.
2022-02-19 12:22:53 +00:00
Yusuke Kuoka
f3ceccd904 acceptance: Improve deploy.sh to recreate ARC (not runner) pods on new test id
So that one does not need to manually recreate ARC pods frequently.
2022-02-19 12:22:53 +00:00
Yusuke Kuoka
4b557dc54c Add logging transport to log HTTP requests in log level -3
The log level -3 is the minimum log level that is supported today, smaller than debug (-1) and -2 (used to log some HRA-related logs).

This commit adds a logging HTTP transport to log HTTP requests and responses to that log level.

It implements http.RoundTripper so that it can log each HTTP request with useful metadata like `from_cache` and `ratelimit_remaining`.
The former is set to `true` only when the logged request's response was served from ARC's in-memory cache.
The latter is set to the X-RateLimit-Remaining response header value if and only if the response was served by GitHub, not by ARC's cache.
2022-02-19 12:22:53 +00:00
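A sketch of such a logging round-tripper is shown below. It is illustrative rather than ARC's actual implementation: the verbosity value, logger wiring, and struct name are assumptions, while the `from_cache`/`ratelimit_remaining` fields and the reliance on httpcache's `X-From-Cache` marker follow the commit above.

```
package transport

import (
	"net/http"

	"github.com/go-logr/logr"
)

// loggingTransport wraps another RoundTripper and logs each GitHub API call
// together with cache and rate-limit metadata.
type loggingTransport struct {
	next http.RoundTripper
	log  logr.Logger
}

func (t loggingTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	resp, err := t.next.RoundTrip(req)
	if err != nil {
		return nil, err
	}

	// httpcache marks cache hits with the X-From-Cache header when
	// MarkCachedResponses is enabled.
	fromCache := resp.Header.Get("X-From-Cache") != ""

	kvs := []interface{}{
		"method", req.Method,
		"url", req.URL.String(),
		"status", resp.StatusCode,
		"from_cache", fromCache,
	}
	// Only responses actually served by GitHub carry rate-limit headers.
	if !fromCache {
		kvs = append(kvs, "ratelimit_remaining", resp.Header.Get("X-RateLimit-Remaining"))
	}

	// The verbosity used here is arbitrary for the sketch; ARC maps its
	// negative zap levels onto logr verbosity internally.
	t.log.V(3).Info("GitHub API call", kvs...)

	return resp, nil
}
```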
Yusuke Kuoka
4c53e3aa75 Add GitHub API cache to avoid rate limit
This will cache any GitHub API responses with correct Cache-Control header.

`gregjones/httpcache` has been chosen as a library to implement this feature, as it is as recommended in `go-github`'s documentation:

https://github.com/google/go-github#conditional-requests

`gregjones/httpcache` supports a number of cache backends like `diskcache`, `s3cache`, and so on:

https://github.com/gregjones/httpcache#cache-backends

We stick to the built-in in-memory cache as a starter. Probably this will never becomes an issue as long as various HTTP responses for all the GitHub API calls that ARC makes, list-runners, list-workflow-jobs, list-runner-groups, etc., doesn't overflow the in-memory cache.

`httpcache` has an known unfixed issue that it doesn't update cache on chunked responses. But we assume that the APIs that we call doesn't use chunked responses. See #1503 for more information on that.

Ref #920
2022-02-19 12:22:53 +00:00
Tingluo Huang
0b9bef2c08 Try to unconfig runner before deleting the pod to recreate (#1125)
There is a race condition between ARC and GitHub service about deleting runner pod.

- ARC uses the REST API to find a particular runner whose pod is not running any jobs, so it decides to delete the pod.
- A job is queued on the GitHub service side, and it sends the job to this idle runner right before ARC deletes the pod.
- ARC deletes the runner pod, which causes the in-progress job to end up canceled.

To avoid this race condition, I am calling `r.unregisterRunner()` before deleting the pod.
- `r.unregisterRunner()` returns 204 to indicate the runner is deleted from the GitHub service, so we should be safe to delete the pod.
- `r.unregisterRunner()` returns 400 to indicate the runner is still running a job, so we will leave this runner pod as it is.

TODO: I need to do some E2E tests to force the race condition to happen.

Ref #911
2022-02-19 21:22:31 +09:00
Yusuke Kuoka
a5ed6bd263 Fix RunnerSet-managed runner pods to terminate more gracefully (#1126)
Make RunnerSet-managed runners as reliable as RunnerDeployment-managed runners.

Ref https://github.com/actions-runner-controller/actions-runner-controller/issues/911#issuecomment-1042404460
2022-02-19 21:19:37 +09:00
Yusuke Kuoka
921f547200 fix: Do recreate runner pod on registration token update (#1087)
Apparently, we had missed taking an updated registration token into account when generating the pod template hash which is used to detect whether the runner pod needs to be recreated.

This shouldn't have been the end of the world, since the runner pod is recreated on the next reconciliation loop anyway, but this change makes the pod recreation happen one reconciliation loop earlier, so that you're less likely to get runner pods with outdated registration tokens.

Ref https://github.com/actions-runner-controller/actions-runner-controller/pull/1085#issuecomment-1027433365
2022-02-19 21:18:00 +09:00
Felipe Galindo Sanchez
9079c5d85f fix: configure logger before trying to log (#1128)
The log message about the GitHub client not being initialized was never seen because the logger was configured after that message was logged
2022-02-19 20:56:58 +09:00
Yusuke Kuoka
a9aea0bd9c Fix issue that visible runner groups are printed as if empty in log 2022-02-19 14:43:41 +09:00
Yusuke Kuoka
fcf4778bac Fix regression that prevented default organizational runner group from being scale target
Fixes #1131
2022-02-19 14:43:41 +09:00
Yusuke Kuoka
eb0a4a9603 chart: Bump to 0.16.0 (with appVersion 0.21.0) 2022-02-18 01:57:37 +00:00
Yusuke Kuoka
b6151ebb8d Fix release.yml upload artifacts to not fail due to outdated go (1.15) 2022-02-18 10:27:39 +09:00
Yusuke Kuoka
ba4bd7c0db e2e,acceptance: Cover enterprise runners (#1124)
Adds various code and changes I have used while testing #1062
2022-02-17 09:16:28 +09:00
Yusuke Kuoka
5b92c412a4 chart: Allow using different secrets for controller-manager and gh-webhook-server (#1122)
* chart: Allow using different secrets for controller-manager and gh-webhook-server

It is entirely possible to do so because they are two different K8s deployments. It may also provide better scalability, because each component then gets its own GitHub API quota.
2022-02-17 09:16:16 +09:00
Yusuke Kuoka
e22d981d58 githubwebhookserver: Tweak log levels of various messages (#1123)
Some logs, like `HRA keys indexed for HRA`, were so excessive that they made testing and debugging the githubwebhookserver harder. This tries to fix that.
2022-02-17 09:15:26 +09:00
Yusuke Kuoka
a7b39cc247 acceptance: Avoid "metadata.annotations too long" errors on applying CRDs 2022-02-17 09:01:44 +09:00
Yusuke Kuoka
1e452358b4 acceptance: Do recreate the controller-manager secret on every deployment
We had to manually remove the secret first to update the GitHub credentials used by the controller, which was cumbersome.
Note that you still need to recreate the controller pods and the gh webhook server pods to let them remount the recreated secret.
2022-02-17 09:01:44 +09:00
Carlos Tadeu Panato Junior
92e133e007 ci: update helm to 3.8.0 and go to 1.17.7 (#1119)
Signed-off-by: Carlos Panato <ctadeu@gmail.com>
2022-02-16 20:40:27 +09:00
Felipe Galindo Sanchez
d0d316252e Option to consider runner group visibility on scale based on webhook (#1062)
This will work on GHES but not on GitHub Enterprise Cloud, due to the excessive number of GitHub API calls required.
More work is needed, like adding a cache layer to the GitHub client, to make it usable on GitHub Enterprise Cloud.

Fixes additional cases from https://github.com/actions-runner-controller/actions-runner-controller/pull/1012

If GitHub auth is provided in the webhooks controller, then runner groups with custom visibility are supported. Otherwise, all runner groups will be assumed to be visible to all repositories.

`getScaleUpTargetWithFunction()` will check if there is an HRA available with the following flow:

1. Search for **repository** HRAs - if one is found, the search ends here
2. Get available HRAs in k8s
3. Compute visible runner groups
  a. If GitHub auth is provided - get all the runner groups that are visible to the repository of the incoming webhook using GitHub API calls.  
  b. If GitHub auth is not provided - assume all runner groups are visible to all repositories
4. Search for **default organization** runners (a.k.a runners from organization's visible default runner group) with matching labels
5. Search for **default enterprise** runners (a.k.a runners from enterprise's visible default runner group) with matching labels
6. Search for **custom organization runner groups** with matching labels
7. Search for **custom enterprise runner groups** with matching labels

Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
2022-02-16 19:08:56 +09:00
Shu Ambat
b509eb4388 Update the helm chart app version (#1099) 2022-02-09 09:29:49 +09:00
Yusuke Kuoka
59437ef79f Update README.md
Ref https://github.com/actions-runner-controller/actions-runner-controller/issues/1100#issuecomment-1032775144
2022-02-09 09:16:46 +09:00
Ryo Sakamoto
a51fb90cd2 modify chart ingress (#1098)
Signed-off-by: cw-sakamoto <sakamoto@chatwork.com>
2022-02-08 12:56:30 +09:00
Callum Tait
eb53d238d1 docs: move istio to troubleshooting (#1097)
Co-authored-by: toast-gear <toast-gear@users.noreply.github.com>
2022-02-07 20:49:26 +00:00
renovate[bot]
7fdf9a6c67 chore(deps): update helm/chart-releaser-action action to v1.3.0 (#1091)
Co-authored-by: Renovate Bot <bot@renovateapp.com>
2022-02-07 20:40:12 +00:00
Callum Tait
6f591ee774 chore: bump docker version (#1094)
* chore: bump docker version

Co-authored-by: toast-gear <toast-gear@users.noreply.github.com>
2022-02-07 20:10:02 +00:00
Callum Tait
cc25dd7926 chore: change to trigger build (#1093)
* chore: change to trigger build

* ci: actually use variable

Co-authored-by: toast-gear <toast-gear@users.noreply.github.com>
2022-02-03 21:23:42 +00:00
Chris Bui
1b911749a6 feat: disable automatic runner updates (#1088)
* Add env variable to configure the `disableupdate` flag

* Write test for entrypoint disable update

* Rename flag, update docs for DISABLE_RUNNER_UPDATE

* chore: bump runner version in makefile

Co-authored-by: Callum Tait <15716903+toast-gear@users.noreply.github.com>
2022-02-03 21:03:38 +00:00
maruware
b652a8f9ae Update chart README (#1083)
Fix the reversed descriptions of `services.port` and `services.type`.
2022-01-31 20:28:19 +00:00
sdubey-optum
069bf6a042 docs: fixing helm readme typo (#1064) 2022-01-28 22:26:17 +00:00
Callum Tait
f09a974ac2 chore: change to trigger build (#1079)
* chore: change to trigger build

Co-authored-by: toast-gear <toast-gear@users.noreply.github.com>
2022-01-28 21:57:53 +00:00
Callum Tait
1f7e440030 ci: add required token permissions 2022-01-28 21:37:16 +00:00
cspargo
9d5a562407 fix: use copy instead of move (#1066)
* fix: use copy instead of move

Co-authored-by: Colin Spargo <cspargo@users.noreply.github.com>
2022-01-28 21:24:52 +00:00
Renovate Bot
715e6a40f1 chore(deps): update dependency actions/runner to v2.287.1 2022-01-27 23:37:58 +00:00
Renovate Bot
81b2c5ada9 chore(deps): update dependency actions/runner to v2.287.0 2022-01-27 19:50:48 +00:00
renovate[bot]
9ae83dfff5 fix(deps): update module github.com/google/go-cmp to v0.5.7 (#1060)
Co-authored-by: Renovate Bot <bot@renovateapp.com>
2022-01-20 08:38:07 +09:00
Renovate Bot
5e86881c30 chore(deps): update dependency actions/runner to v2.286.1 2022-01-14 20:23:54 +00:00
renovate[bot]
1c75b20767 chore(deps): update helm/chart-testing-action action to v2.2.0 (#1038)
Co-authored-by: Renovate Bot <bot@renovateapp.com>
2022-01-07 13:56:48 +00:00
Daniel
8a73560dbc if a Volume is defined by the operator don't add another "work" volume. (#1015)
This allows providing a different `work` Volume.

This should be a cloud-agnostic way of allowing the operator to use (for example) NVMe-backed storage.

This is a working example where the workDir uses the provided volume; additionally, Docker is placed on the same NVMe disk.
```
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: runner-2
spec:
  template:
    spec:
      dockerdContainerResources: {}
      env:
      - name: POD_NAME
        valueFrom:
          fieldRef:
            fieldPath: metadata.name
      # this is to mount the docker in docker onto NVME disk
      dockerVolumeMounts:
      - mountPath: /var/lib/docker
        name: scratch
        subPathExpr: $(POD_NAME)-docker
      - mountPath: /runner/_work
        name: work
        subPathExpr: $(POD_NAME)-work
      volumeMounts:
      - mountPath: /runner/_work
        name: work
        subPathExpr: $(POD_NAME)-work
      dockerEnv:
      - name: POD_NAME
        valueFrom:
          fieldRef:
            fieldPath: metadata.name
      volumes:
      - hostPath:
          path: /mnt/disks/ssd0
        name: scratch
      - hostPath:
          path: /mnt/disks/ssd0
        name: work
      nodeSelector:
       cloud.google.com/gke-nodepool: runner-16-with-nvme
      ephemeral: false
      image: ""
      imagePullPolicy: Always
      labels:
        - runner-2
        - self-hosted
      organization: yourorganization

```
2022-01-07 10:01:40 +09:00
Yusuke Kuoka
01301d3ce8 Stop creating registration-only runners on scale-to-zero (#1028)
Resolves #859
2022-01-07 09:56:21 +09:00
renovate[bot]
02679ac1d8 fix(deps): update module go.uber.org/zap to v1.20.0 (#1027)
Co-authored-by: Renovate Bot <bot@renovateapp.com>
2022-01-07 08:51:57 +09:00
Hyeonmin Park
1a6e5719c3 test: Add tests with self-hosted label for #953 (#1030) 2022-01-07 08:50:26 +09:00
Callum Tait
f72d871c5b docs: move troubleshooting out of main readme (#1023)
Co-authored-by: toast-gear <toast-gear@users.noreply.github.com>
2021-12-29 10:24:09 +09:00
Callum Tait
ad48851dc9 feat: expose if docker is enabled and wait for docker to be ready (#962)
Resolves #897
Resolves #915
2021-12-29 10:23:35 +09:00
Lars Haugan
c5950d75fa fix: pagination for ListWorkflowJobs in autoscaler (#990) (#992)
Adding handling of paginated results when calling `ListWorkflowJobs`. By default the `per_page` is 30, which potentially would return 30 queued and 30 in_progress jobs.

This change should enable the autoscaler to scale workflows with more than 60 jobs to the exact number of runners needed.

Problem: I did not find any support for pagination in the GitHub fake client, and I have not been able to test this (as I have not been able to push an image to an environment where I can verify it).
If anyone is able to help verify this PR, I would really appreciate it.

Resolves #990
2021-12-24 09:12:36 +09:00
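For reference, the pagination pattern this fix relies on looks roughly like the sketch below with go-github; the owner, repo, and run ID values are placeholders.

```
package main

import (
	"context"
	"fmt"

	"github.com/google/go-github/v39/github"
)

// listAllWorkflowJobs pages through every job of a workflow run instead of
// relying on the 30-item default page size.
func listAllWorkflowJobs(ctx context.Context, client *github.Client, owner, repo string, runID int64) ([]*github.WorkflowJob, error) {
	opts := &github.ListWorkflowJobsOptions{
		ListOptions: github.ListOptions{PerPage: 100},
	}
	var all []*github.WorkflowJob
	for {
		jobs, resp, err := client.Actions.ListWorkflowJobs(ctx, owner, repo, runID, opts)
		if err != nil {
			return nil, err
		}
		all = append(all, jobs.Jobs...)
		if resp.NextPage == 0 {
			return all, nil
		}
		opts.Page = resp.NextPage
	}
}

func main() {
	client := github.NewClient(nil) // unauthenticated client, for illustration only
	jobs, err := listAllWorkflowJobs(context.Background(), client, "example-org", "example-repo", 123456789)
	if err != nil {
		panic(err)
	}
	fmt.Println("total jobs:", len(jobs))
}
```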
Felipe Galindo Sanchez
de1f48111a feat: support routing GitHub API calls to custom proxy API (#1017)
GitHub currently has some limitations w.r.t. permissions management on
runner groups, as they all require org admin. However, at our company
we're using runner groups to serve different internal teams (with
different permissions), so we needed to deploy a custom proxy API with
our internal authentication to control who has access to certain APIs
depending on the repository/runner group in a given org/enterprise.

This change just allows optionally sending the GitHub API calls to an alternate custom
proxy URL, instead of cloud GitHub (github.com) or an enterprise URL, with
basic authentication.

Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
2021-12-23 09:24:10 +09:00
Renovate Bot
8a7720da77 chore(deps): update dependency actions/runner to v2.286.0 2021-12-21 19:47:32 +00:00
Felipe Galindo Sanchez
608c56936e Remove duplicate self-hosted condition (#1016)
Duplicate condition introduced after the merge of #953 and #1012
2021-12-21 09:08:21 +09:00
Felipe Galindo Sanchez
4ebec38208 Support runner groups with selected visibility in webhooks autoscaler (#1012)
The current implementation doesn't yet support runner groups with custom visibility (e.g. selected repositories only). If there are multiple runner groups with selected visibility, not all runner groups may be a potential target to be scaled up. Thus this PR introduces support for runner groups with selected visibility. This requires querying the GitHub API to find the potential runner groups that are linked to a specific repository (whether using visibility "all" or "selected").

This also improves resolving the `scaleTargetKey` that is used to match an HRA based on the inputs of the `RunnerSet`/`RunnerDeployment` spec, to better support runner groups.

This requires configuring GitHub auth in the webhook server. To keep backwards compatibility, if GitHub auth is not provided to the webhook server, it will assume no runner groups have selected visibility and will target any available runner group as before.
2021-12-19 18:29:44 +09:00
clement-loiselet-talend
0c34196d87 fix(#951): add exception for self-hosted label in webhook search (#953)
The webhook "workflowJob" pass the labels the job needs to the controller, who in turns search for them in its RunnerDeployment / RunnerSet. The current implementation ignore the search for `self-hosted` if this is the only label, however if multiple labels are found the `self-hosted` label must be declared explicitely or the RD / RS will not be selected for the autoscaling.

This PR fixes the behavior by ignoring this label, and add documentation on this webhook for the other labels that will still require an explicit declaration (OS and architecture). 

The exception should be temporary, ideally the labels implicitely created (self-hosted, OS, architecture) should be searchable alongside the explicitly declared labels.

code tested, work with `["self-hosted"]` and `["self-hosted","anotherLabel"]`

Fixes #951
2021-12-19 10:55:23 +09:00
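The matching rule described in the commit above can be pictured as a small label filter like the one below; the function name and structure are hypothetical, not the controller's real code.

```
package labels

import "strings"

// jobLabelsMatchRunner reports whether a workflow_job's labels are satisfied
// by a RunnerDeployment/RunnerSet's declared labels. The implicit
// "self-hosted" label is skipped so users don't have to declare it
// explicitly; OS and architecture labels still need an explicit declaration.
func jobLabelsMatchRunner(jobLabels, runnerLabels []string) bool {
	declared := make(map[string]struct{}, len(runnerLabels))
	for _, l := range runnerLabels {
		declared[strings.ToLower(l)] = struct{}{}
	}
	for _, l := range jobLabels {
		l = strings.ToLower(l)
		if l == "self-hosted" {
			continue // implicit label: ignore rather than require declaration
		}
		if _, ok := declared[l]; !ok {
			return false
		}
	}
	return true
}
```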
Jacob Lauritzen
83c8a9809e Add GKE firewall issues to common errors (#1010)
A lot of people have issues with private GKE clusters, and it seems they are all solved by setting up a firewall policy. I think it would be relevant to include this in a troubleshooting section since so many people are searching through issues for it. I myself just spent most of my day trying to figure it out.

Issues where this is the solution:
* #293
* #335
* #68
2021-12-17 09:10:15 +09:00
renovate[bot]
c64000e11c fix(deps): update module sigs.k8s.io/controller-runtime to v0.11.0 (#740)
* fix(deps): update module sigs.k8s.io/controller-runtime to v0.11.0

* Fix dependencies and bump Go to 1.17 so that it builds after controller-runtime 0.11.0 upgrade

* Regenerate manifests with the latest K8s dependencies

Co-authored-by: Renovate Bot <bot@renovateapp.com>
Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
2021-12-17 09:06:55 +09:00
Felipe Galindo Sanchez
9bb21aef1f Add support for default image pull secret name (#921)
Resolves #896

Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
2021-12-15 09:29:31 +09:00
Gabriele Mambrini
7261d927fb fix: report busy status for offline workers (#1009)
ref #911 

Fix: #993 cannot work because the runner busy status is not reported when the runner is offline
2021-12-15 08:57:13 +09:00
Pavel Smalenski
91102c8088 Add dockerEnv variable for RunnerDeployment (#912)
Resolves #878

Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
2021-12-14 17:13:24 +09:00
apr-1985
6f51f560ba fix: allow GH priv key from env in helm chart (#884)
Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
2021-12-14 13:15:12 +09:00
Bryan Peterson
961f01baed allow providing webhook secret token via flag instead of environment variable (#876)
Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
2021-12-12 17:00:32 +09:00
Skyler Mäntysaari
d0642eeff1 chart: ingress for k8s v1.22.x support (#988)
Also dropped the deprecated .Capabilities.KubeVersion.GitVersion usage in the ingress template.

Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
2021-12-12 16:43:32 +09:00
kannappan senthilnathan
473fe7f736 update readme instruction for webhook scaling (#987)
Added a note about mandatory parameters for webhook-driven scaling
2021-12-12 16:16:54 +09:00
Piaras Hoban
84b0c64d29 feat: add authSecret.enabled to Helm chart (#937)
When false the chart deployment template will not add GITHUB_*
environment variables to the manager container. In addition, the `volume`
and `volumeMount` for the secret will also be omitted from the
deployment manifest.

Signed-off-by: Piaras Hoban <phoban01@gmail.com>
2021-12-12 16:13:14 +09:00
Felipe Galindo Sanchez
f0fccc020b refactor: split Reconciler from Reconcile in a few methods (#926)
Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
2021-12-12 14:22:55 +09:00
Yusuke Kuoka
2bd6d6342e Push packages to GHCR (#1004)
Ref #849
Ref #566

Co-authored-by: Pål Sollie <sollie@sparkz.no>
2021-12-12 11:41:33 +09:00
Patrick Ellis
ea2dbc2807 Update go-github from v37 -> v39 (#925) 2021-12-11 21:43:40 +09:00
Yusuke Kuoka
c718eaae4f Bump ginkgo and gomega (#1003)
Supercedes #880 and #746
2021-12-11 21:10:09 +09:00
renovate[bot]
67e39d719e fix(deps): update golang.org/x/oauth2 commit hash to d3ed0bb (#947)
Co-authored-by: Renovate Bot <bot@renovateapp.com>
2021-12-11 20:51:46 +09:00
Yusuke Kuoka
bbd328a7cc Bump controller-runtime to v0.10.3 (#1002)
Enhanced version of https://github.com/actions-runner-controller/actions-runner-controller/pull/740
2021-12-11 20:49:47 +09:00
renovate[bot]
8eb6c0f3f0 fix(deps): update module github.com/teambition/rrule-go to v1.7.2 (#747)
Co-authored-by: Renovate Bot <bot@renovateapp.com>
2021-12-11 18:58:09 +09:00
renovate[bot]
231c1f80e7 fix(deps): update module sigs.k8s.io/yaml to v1.3.0 (#841)
Co-authored-by: Renovate Bot <bot@renovateapp.com>
2021-12-11 18:41:52 +09:00
renovate[bot]
3c073c5e17 fix(deps): update module go.uber.org/zap to v1.19.1 (#799)
Co-authored-by: Renovate Bot <bot@renovateapp.com>
2021-12-11 18:41:22 +09:00
Callum Tait
a1cfe3be36 docs: re-order helm param order (#996)
* docs: re-order helm param order

* docs: re-order params in values
2021-12-09 10:20:51 +00:00
Yusuke Kuoka
898ad3c355 Work-around for offline+busy runners (#993)
Ref #911
2021-12-09 09:31:06 +09:00
renovate[bot]
164a91b18f chore(deps): update quay.io/brancz/kube-rbac-proxy docker tag to v0.11.0 (#745)
* chore(deps): update quay.io/brancz/kube-rbac-proxy docker tag to v0.11.0

* chore(deps): update quay.io/brancz/kube-rbac-proxy make tag to v0.11.0

Co-authored-by: Renovate Bot <bot@renovateapp.com>
Co-authored-by: Callum Tait <15716903+toast-gear@users.noreply.github.com>
2021-12-08 22:53:50 +00:00
Callum Tait
acb004f291 docs: remove RunnerSet limitation (#991) 2021-12-08 22:03:42 +00:00
Jonathan Sokolowski
3de4e7e9c6 Support installing without cert-manager (#834)
* Support installing without cert-manager
2021-12-08 21:58:46 +00:00
Renovate Bot
4a55fe563c chore(deps): update dependency actions/runner to v2.285.1 2021-12-06 21:22:38 +00:00
Callum Tait
23841642df docs: bring ephemeral runners to current (#981)
Co-authored-by: toast-gear <toast-gear@users.noreply.github.com>
2021-12-04 11:12:24 +00:00
toast-gear
7c4ac2ef44 ci: update workflow triggers 2021-12-04 10:52:38 +00:00
Callum Tait
550717020d ci: remove ubuntu 18.04 image (#980)
* ci: remove Ubuntu 18.04 runner

Co-authored-by: toast-gear <toast-gear@users.noreply.github.com>
2021-12-04 10:46:08 +00:00
Callum Tait
9a5ae93cb7 docs: various clarifications (#976)
* docs: various clarifications
2021-11-30 19:22:49 +00:00
Callum Tait
f5175256c6 chore(deps): bump 20.04 runner version (#975) 2021-11-30 14:34:04 +00:00
Callum Tait
031b1848e0 ci: separate ubuntu versions out in ci (#969)
* ci: separate ubuntu versions out in ci
2021-11-30 14:09:33 +00:00
Yusuke Kuoka
47a17754fd chore: bumping chart for new release 2021-11-27 10:54:54 +09:00
Greg Schofield
85ddd0d137 Update volumes and volumes mounts indent (#966)
Follow-up for #952

Signed-off-by: Gregory Schofield <gscho@github.com>
2021-11-27 10:54:01 +09:00
brunocous
eefb48ba3f add additionalVolumes and additionalVolumeMounts to helm chart (#952)
* added additional volumes and volumeMounts
2021-11-22 19:03:09 +00:00
Callum Tait
62995fec5b chore: bumping chart for new release (#948) 2021-11-15 19:58:49 +00:00
Roee Landesman
7ee1d6bcdb Add podDisruptionBudget resource for controller pods (#805)
* Add podDisruptionBudget resource for controller pods

* Add PDB to GithubWebhookServer

* Fix truncation on pdb naming

Co-authored-by: Roee Landesman <roee.landesman@gmail.com>
2021-11-15 19:07:23 +00:00
Yusuke Kuoka
b87e6e3966 doc: Add GitHub App Permission for Workflow Job Webhook (#932)
* doc: Add GitHub App Permission for Workflow Job Webhook

I think we've missed documenting the app permission needed for receiving workflow_job events via the app webhook

* docs: clarification on webhook events

Co-authored-by: Callum Tait <15716903+toast-gear@users.noreply.github.com>
2021-11-09 09:12:47 +09:00
Max N. Boyarov
88b8871830 Reduce the number of superfluous HTTP messages (#894)
Writing to http.ResponseWriter creates an HTTP OK response, so set *ok* to disable the error code in the deferred function.
2021-11-09 09:07:07 +09:00
Yusuke Kuoka
2191617eb5 Remove unnecessary scale-target-not-found error on in_progress workflow_job event (#927)
Ref https://github.com/actions-runner-controller/actions-runner-controller/issues/877#issuecomment-955614456
2021-11-09 09:05:50 +09:00
Yusuke Kuoka
b305e38b17 Add webhook-based autoscale for Enterprise runners (#906)
Fixes #892
2021-11-09 09:04:19 +09:00
Yusuke Kuoka
b6c33cee32 Fail with detailed message on envvar parse error (#907)
Ref #829
2021-11-03 10:24:24 +09:00
Renovate Bot
46da4a6b6e chore(deps): update dependency actions/runner to v2.284.0 2021-11-01 18:16:10 +00:00
Callum Tait
f7e14e06e8 docs: clarify runnerset limitation (#919)
* docs: clarify runnerset limitation

* docs: correct terminology

* docs: missing word
2021-10-28 11:19:56 +01:00
Callum Tait
09e6b1839b docs: correct anti-flapping example (#918)
* docs: correct anti-flapping example

* docs: provide complete example

* docs: use example/myrepo everywhere
2021-10-28 10:52:54 +01:00
Callum Tait
0416a9272f docs: minor correction to autoscaling description (#917)
* docs: minor correction to autoscaling description

* docs: make pull metrics plural

* docs: remove redundant words

* docs: make plural and expand anti-flapping
2021-10-28 10:33:20 +01:00
Huu Khiem (Mark)
c33578a041 Update examples of RunnerDeployment in README.md (#913)
The README had some examples of RunnerDeployment with wrong/outdated
spec.

This caused some pain when trying to create them based on the examples.

=> Fixed using the spec declared in: api/v1alpha1/runner_types.go
2021-10-28 10:12:48 +01:00
Callum Tait
49566aaebd docs: fix broken link (#916) 2021-10-28 09:57:46 +01:00
Callum Tait
f66e6a00fa docs: clean up auto scaling documentation (#909)
* docs: clean up of autoscaling section

* docs: clarifying anti-flapping

* docs: more improvements

* docs: more improvements

* docs: adding duration details and cavaets

* docs: smaller english and better structure

* docs: use consistent wording

* docs: adding limitation cavaet for RunnerSets

* docs: correct helm uprgade order

* docs: lines helm upgrade command with help switch

* docs: use existing limitations section

* docs: fix table of headers and contents

* docs: add link to runnersets on first mention

* docs: adding runnerset limitation

* chore: use new enterprise permission for PAT

* docs: bump example deploy to latest version

* docs: adding oauth apps link

* docs: adding cavaet to the oauth apps doc
2021-10-28 09:55:04 +01:00
Yusuke Kuoka
431c1ed04a Add how-to for testing controller built from PR (#908)
We occasionally ask you to help test actions-runner-controller built from a pull request branch, like in https://github.com/actions-runner-controller/actions-runner-controller/issues/829#issuecomment-950252645 and https://github.com/actions-runner-controller/actions-runner-controller/issues/892#issuecomment-950113578.

To make the process more streamlined, I'd like to add a guide for that, so that we can just point to the guide when asking for help, and we can improve the guide in a more sustainable manner.

Co-authored-by: Callum Tait <15716903+toast-gear@users.noreply.github.com>
2021-10-26 12:53:19 +09:00
apr-1985
0d3de9ee2a chore: correct logging typo (#904) 2021-10-22 09:03:23 +09:00
Callum Tait
79d63acded chore: bumping chart for new release (#903) 2021-10-18 22:05:41 +01:00
apr-1985
271a4dcd9d Revert "chore: support app ids as int or strings (#869)" (#883)
* Revert "chore: support app ids as int or strings (#869)"

This reverts commit 0a3d2b686e.

* docs: adding some comments to the code

* docs: adding comment to the chart values
2021-10-17 23:23:31 +01:00
Arun Anandhan
0401b2d786 Create optional serviceAnnotations value in helm chart (#867)
* Create optional serviceAnnotations value in helm chart

* update annotation key

* update annotation key - webhook service

* fix README.md

* docs: using consistent tense

* docs: making the code comments more generic
2021-10-17 22:37:43 +01:00
Maxim Tacu
43141cb751 feat: Added option for secret annotation (#824)
* feat: Added option for secret annotation

* bump the chart version

* chore: aligning values attributes with standard

* fixed template for manager_secrets

* docs: update annotations and fix layout

Co-authored-by: Maxim Tacu <maxim.tacu@mercedes-benz.io>
Co-authored-by: Callum Tait <15716903+toast-gear@users.noreply.github.com>
2021-10-17 22:18:35 +01:00
KeisukeYamashita
b805cfada7 Fix maxReplicas typo in HorizontalRunnerAutoscaler spec comment (#895)
* Fix maxreplicas in spec comment

Signed-off-by: KeisukeYamashita <19yamashita15@gmail.com>

* Generate manifests

Signed-off-by: KeisukeYamashita <19yamashita15@gmail.com>
2021-10-17 22:01:08 +01:00
Sebastien Le Digabel
c4e97d600d feat: Have githubwebhook monitor a single namespace (#828)
* feat: Have githubwebhook monitor a single namespace

When using `scope.singleNamespace: true` in Helm, the expected behaviour is
that the github webhook server behaves the same way as the controller.

The current behaviour is that the webhook server monitors all the
namespaces.

* Changing the chart README.md to reflect the scope

The documentation now mentions that both the controller and the github
webhook server will make use of the `scope.watchNamespace` field if
`scope.singleNamespace` is set to `true`.

Co-authored-by: Sebastien Le Digabel <sebastien.ledigabel@skyscanner.net>
2021-10-17 21:54:32 +01:00
Maxim Pogozhiy
fce7d6d2a7 Add topologySpreadConstraints (#814) 2021-10-17 21:49:44 +01:00
Callum Tait
5805e39e1f Revert "feat: adding workflow_dispatch webhook event" (#879)
This reverts commit d36d47fe66.
2021-10-09 18:36:02 +01:00
Callum
d36d47fe66 feat: adding workflow_dispatch webhook event 2021-10-09 10:07:07 +01:00
Callum Tait
8657a34f32 ci: fix chart publish workflow (#874)
* ci: correcting cross job output syntax

* ci: adding job names
2021-10-05 22:35:32 +01:00
Callum Tait
b01e193aab ci: publish chart on chart.yaml changes only (#873)
* ci: clean up of workflows in general
2021-10-05 22:28:05 +01:00
Callum Tait
0a3d2b686e chore: support app ids as int or strings (#869)
Co-authored-by: Callum <callum@domain.com>
2021-10-05 09:01:27 +09:00
Renovate Bot
2bc050a62d chore(deps): update dependency actions/runner to v2.283.3 2021-10-04 21:17:21 +00:00
Callum Tait
0725e72ae0 ci: disable chart version bump check (#870) 2021-10-04 20:39:38 +01:00
Callum Tait
5f9fcaf016 ci: add label to renovate pull requests (#863)
* ci: add label to renovate pull requests

* ci: updating stale config to exempt new label
2021-10-02 22:48:28 +01:00
Callum Tait
e4e0b45933 chore: bump helm chart to latest app version (#862) 2021-10-02 10:05:06 +01:00
Renovate Bot
2937173101 chore(deps): update dependency actions/runner to v2.283.2 2021-09-30 15:01:11 +00:00
Aidan
fccf29970b Fix bug related to label matching. (#852)
* Fix bug related to label matching.

Add start of test framework for Workflow Job Events

Signed-off-by: Aidan Jensen <aidan@artificial.com>
Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
2021-09-30 11:02:59 +09:00
Alex Kulikovskikh
ea06001819 fix: scaling issue based on workflow_job event (#850)
This PR fixes the scaling issue based on the `workflow_job` event discussed in #819
2021-09-30 10:36:59 +09:00
Yusuke Kuoka
1bc1712519 Do bump go-github in codebase to fix build error in CI builds (#853) 2021-09-30 10:35:24 +09:00
Callum Tait
24224613f3 docs: updating to reflect the latest release (#844)
* docs: updating to reflect the latest release

* docs: further updates

* docs: format updates

* docs: correcting title sizes

* docs: correcting links

* docs: correcting the grammar of multi controllers

* docs: slightly less awkward English
2021-09-28 09:44:15 +01:00
Rob Bos
3f331e9a39 Fixing capitalization and a typo (#838)
* Fixing capitalization and a typo

* typo

* Typo

* Update controllers/autoscaling.go

* Update controllers/autoscaling.go

Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
2021-09-26 14:34:55 +09:00
Yusuke Kuoka
67c7b7a228 Bump chart version to 0.13.1 with controller 0.20.1 2021-09-24 00:40:08 +00:00
178 changed files with 17443 additions and 6453 deletions


@@ -11,3 +11,4 @@ charts
*.md
*.txt
*.sh
test/e2e/.docker-build


@@ -1,36 +0,0 @@
---
name: Bug report
about: Create a report to help us improve
title: ''
assignees: ''
---
**Describe the bug**
A clear and concise description of what the bug is.
**Checks**
- [ ] My actions-runner-controller version (v0.x.y) does support the feature
- [ ] I'm using an unreleased version of the controller I built from HEAD of the default branch
**To Reproduce**
Steps to reproduce the behavior:
1. Go to '...'
2. Click on '....'
3. Scroll down to '....'
4. See error
**Expected behavior**
A clear and concise description of what you expected to happen.
**Screenshots**
If applicable, add screenshots to help explain your problem.
**Environment (please complete the following information):**
- Controller Version [e.g. 0.18.2]
- Deployment Method [e.g. Helm and Kustomize ]
- Helm Chart Version [e.g. 0.11.0, if applicable]
**Additional context**
Add any other context about the problem here.

.github/ISSUE_TEMPLATE/bug_report.yml

@@ -0,0 +1,177 @@
name: Bug Report
description: File a bug report
title: "Bug"
labels: ["bug"]
body:
- type: input
id: controller-version
attributes:
label: Controller Version
description: Refer to semver-like release tags for controller versions. Any release tags prefixed with `actions-runner-controller-` are for chart releases
placeholder: ex. 0.18.2 or git commit ID
validations:
required: true
- type: input
id: chart-version
attributes:
label: Helm Chart Version
description: Run `helm list` and see what's shown under CHART VERSION. Any release tags prefixed with `actions-runner-controller-` are for chart releases
placeholder: ex. 0.11.0
- type: input
id: cert-manager-version
attributes:
label: CertManager Version
description: Run `kubectl get po -o yaml $CERT_MANAGER_POD` and see the image tag, or run `helm list` and see what's shown under APP VERSION for your cert-manager Helm release.
placeholder: ex. 1.8
- type: dropdown
id: deployment-method
attributes:
label: Deployment Method
description: Which deployment method did you use to install ARC?
options:
- Helm
- Kustomize
- ArgoCD
- Other
validations:
required: true
- type: textarea
id: cert-manager
attributes:
label: cert-manager installation
description: Confirm that you've installed cert-manager correctly by answering a few questions
placeholder: |
- Did you follow https://github.com/actions-runner-controller/actions-runner-controller#installation? If not, describe the installation process so that we can reproduce your environment.
- Are you sure you've installed cert-manager from an official source?
(Note that we won't provide user support for cert-manager itself. Make sure cert-manager is fully working before testing ARC or reporting a bug
validations:
required: true
- type: checkboxes
id: checks
attributes:
label: Checks
description: Please check the boxes below before submitting
options:
- label: This isn't a question or user support case (For Q&A and community support, go to [Discussions](https://github.com/actions-runner-controller/actions-runner-controller/discussions). It might also be a good idea to contract with any of contributors and maintainers if your business is so critical and therefore you need priority support
required: true
- label: I've read the [release notes](https://github.com/actions-runner-controller/actions-runner-controller/tree/master/docs/releasenotes) before submitting this issue and I'm sure it's not due to any recently introduced backward-incompatible changes
required: true
- label: My actions-runner-controller version (v0.x.y) does support the feature
required: true
- label: I've already upgraded ARC (including the CRDs, see charts/actions-runner-controller/docs/UPGRADING.md for details) to the latest and it didn't fix the issue
required: true
- type: textarea
id: resource-definitions
attributes:
label: Resource Definitions
description: "Add copy(s) of your resource definition(s) (RunnerDeployment or RunnerSet, and HorizontalRunnerAutoscaler. If RunnerSet, also include the StorageClass being used)"
render: yaml
placeholder: |
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
name: example
spec:
#snip
---
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerSet
metadata:
name: example
spec:
#snip
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: example
provisioner: ...
reclaimPolicy: ...
volumeBindingMode: ...
---
apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
name:
spec:
#snip
validations:
required: true
- type: textarea
id: reproduction-steps
attributes:
label: To Reproduce
description: "Steps to reproduce the behavior"
render: markdown
placeholder: |
1. Go to '...'
2. Click on '....'
3. Scroll down to '....'
4. See error
validations:
required: true
- type: textarea
id: actual-behavior
attributes:
label: Describe the bug
description: Tell us what actually happened.
placeholder: A clear and concise description of what happened.
validations:
required: true
- type: textarea
id: expected-behavior
attributes:
label: Describe the expected behavior
description: Tell us what you expected to happen.
placeholder: A clear and concise description of what the expected behavior is.
validations:
required: true
- type: textarea
id: controller-logs
attributes:
label: Controller Logs
description: "NEVER EVER OMIT THIS! Include logs from `actions-runner-controller`'s controller-manager pod"
render: shell
placeholder: |
PROVIDE THE LOGS VIA A GIST LINK (https://gist.github.com/), NOT DIRECTLY IN THIS TEXT AREA
To grab controller logs:
# Set NS according to your setup
NS=actions-runner-system
# Grab the pod name and set it to $POD_NAME
kubectl -n $NS get po
kubectl -n $NS logs $POD_NAME > arc.log
validations:
required: true
- type: textarea
id: runner-pod-logs
attributes:
label: Runner Pod Logs
description: "Include logs from runner pod(s)"
render: shell
placeholder: |
PROVIDE THE LOGS VIA A GIST LINK (https://gist.github.com/), NOT DIRECTLY IN THIS TEXT AREA
To grab the runner pod logs:
# Set NS according to your setup. It should match your RunnerDeployment's metadata.namespace.
NS=default
# Grab the name of the problematic runner pod and set it to $POD_NAME
kubectl -n $NS get po
kubectl -n $NS logs $POD_NAME -c runner > runnerpod_runner.log
kubectl -n $NS logs $POD_NAME -c docker > runnerpod_docker.log
validations:
required: true
- type: textarea
id: additional-context
attributes:
label: Additional Context
description: |
Add any other context about the problem here.
Tip: You can attach images or log files by clicking this area to highlight it and then dragging files in.

15
.github/ISSUE_TEMPLATE/config.yml vendored Normal file

@@ -0,0 +1,15 @@
# Blank issues are mainly for maintainers who are known to write complete issue descriptions without needing to follow a form
blank_issues_enabled: true
contact_links:
- name: Sponsor ARC Maintainers
about: If your business relies on the continued maintenance of actions-runner-controller, please consider sponsoring the project and the maintainers.
url: https://github.com/actions-runner-controller/actions-runner-controller/tree/master/CODEOWNERS
- name: Ideas and Feature Requests
about: Wanna request a feature? Create a discussion and collect :+1:s first.
url: https://github.com/actions-runner-controller/actions-runner-controller/discussions/new?category=ideas
- name: Questions and User Support
about: Need support using ARC? We use Discussions as the place to provide community support.
url: https://github.com/actions-runner-controller/actions-runner-controller/discussions/new?category=questions
- name: Need Paid Support?
about: Consider contracting with any of the actions-runner-controller maintainers and contributors.
url: https://github.com/actions-runner-controller/actions-runner-controller/tree/master/CODEOWNERS


@@ -0,0 +1,52 @@
name: "Setup Docker"
inputs:
username:
description: "Username"
required: true
password:
description: "Password"
required: true
ghcr_username:
description: "GHCR username. Usually set from the github.actor variable"
required: true
ghcr_password:
description: "GHCR password. Usually set from the secrets.GITHUB_TOKEN variable"
required: true
outputs:
sha_short:
description: "The short SHA used for image builds"
value: ${{ steps.vars.outputs.sha_short }}
runs:
using: "composite"
steps:
- name: Get Short SHA
id: vars
run: |
echo ::set-output name=sha_short::${GITHUB_SHA::7}
shell: bash
- name: Set up QEMU
uses: docker/setup-qemu-action@v2
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v2
with:
version: latest
- name: Login to DockerHub
if: ${{ github.event_name == 'release' || github.event_name == 'push' && github.ref == 'refs/heads/master' }}
uses: docker/login-action@v2
with:
username: ${{ inputs.username }}
password: ${{ inputs.password }}
- name: Login to GitHub Container Registry
if: ${{ github.event_name == 'release' || github.event_name == 'push' && github.ref == 'refs/heads/master' }}
uses: docker/login-action@v2
with:
registry: ghcr.io
username: ${{ inputs.ghcr_username }}
password: ${{ inputs.ghcr_password }}

25
.github/lock.yml vendored

@@ -1,25 +0,0 @@
# Configuration for Lock Threads
# Repo: https://github.com/dessant/lock-threads-app
# App: https://github.com/apps/lock
# Number of days of inactivity before a closed issue or pull request is locked
daysUntilLock: 7
# Skip issues and pull requests created before a given timestamp. Timestamp must
# follow ISO 8601 (`YYYY-MM-DD`). Set to `false` to disable
skipCreatedBefore: false
# Issues and pull requests with these labels will be ignored. Set to `[]` to disable
exemptLabels: []
# Label to add before locking, such as `outdated`. Set to `false` to disable
lockLabel: false
# Comment to post before locking. Set to `false` to disable
lockComment: >
This thread has been automatically locked since there has not been
any recent activity after it was closed. Please open a new issue for
related bugs.
# Assign `resolved` as the reason for locking. Set to `false` to disable
setLockReason: true


@@ -1,5 +1,6 @@
{
"extends": ["config:base"],
"labels": ["dependencies"],
"packageRules": [
{
// automatically merge an update of runner
@@ -11,10 +12,30 @@
"regexManagers": [
{
// use https://github.com/actions/runner/releases
"fileMatch": [".github/workflows/build-and-release-runners.yml"],
"fileMatch": [
".github/workflows/runners.yaml"
],
"matchStrings": ["RUNNER_VERSION: +(?<currentValue>.*?)\\n"],
"depNameTemplate": "actions/runner",
"datasourceTemplate": "github-releases"
},
{
"fileMatch": [
"runner/Makefile",
"Makefile"
],
"matchStrings": ["RUNNER_VERSION \\?= +(?<currentValue>.*?)\\n"],
"depNameTemplate": "actions/runner",
"datasourceTemplate": "github-releases"
},
{
"fileMatch": [
"runner/actions-runner.dockerfile",
"runner/actions-runner-dind.dockerfile"
],
"matchStrings": ["RUNNER_VERSION=+(?<currentValue>.*?)\\n"],
"depNameTemplate": "actions/runner",
"datasourceTemplate": "github-releases"
}
]
}
}

66
.github/stale.yml vendored

@@ -1,66 +0,0 @@
# Configuration for probot-stale - https://github.com/probot/stale
# Number of days of inactivity before an Issue or Pull Request becomes stale
daysUntilStale: 30
# Number of days of inactivity before an Issue or Pull Request with the stale label is closed.
# Set to false to disable. If disabled, issues still need to be closed manually, but will remain marked as stale.
daysUntilClose: 14
# Only issues or pull requests with all of these labels are check if stale. Defaults to `[]` (disabled)
onlyLabels: []
# Issues or Pull Requests with these labels will never be considered stale. Set to `[]` to disable
exemptLabels:
- pinned
- security
- enhancement
- refactor
- documentation
- chore
- needs-investigation
- bug
# Set to true to ignore issues in a project (defaults to false)
exemptProjects: false
# Set to true to ignore issues in a milestone (defaults to false)
exemptMilestones: false
# Set to true to ignore issues with an assignee (defaults to false)
exemptAssignees: false
# Label to use when marking as stale
staleLabel: stale
# Comment to post when marking as stale. Set to `false` to disable
markComment: >
This issue has been automatically marked as stale because it has not had
recent activity. It will be closed if no further activity occurs. Thank you
for your contributions.
# Comment to post when removing the stale label.
# unmarkComment: >
# Your comment here.
# Comment to post when closing a stale Issue or Pull Request.
# closeComment: >
# Your comment here.
# Limit the number of actions per hour, from 1-30. Default is 30
limitPerRun: 30
# Limit to only `issues` or `pulls`
# only: issues
# Optionally, specify configuration settings that are specific to just 'issues' or 'pulls':
# pulls:
# daysUntilStale: 30
# markComment: >
# This pull request has been automatically marked as stale because it has not had
# recent activity. It will be closed if no further activity occurs. Thank you
# for your contributions.
# issues:
# exemptLabels:
# - confirmed


@@ -1,121 +0,0 @@
name: Build and Release Runners
on:
pull_request:
branches:
- '**'
paths:
- 'runner/**'
- .github/workflows/build-and-release-runners.yml
push:
branches:
- master
paths:
- runner/patched/*
- runner/Dockerfile
- runner/Dockerfile.ubuntu.1804
- runner/Dockerfile.dindrunner
- runner/entrypoint.sh
- .github/workflows/build-and-release-runners.yml
env:
RUNNER_VERSION: 2.283.1
DOCKER_VERSION: 20.10.8
DOCKERHUB_USERNAME: summerwind
jobs:
build:
runs-on: ubuntu-latest
name: Build ${{ matrix.name }}-ubuntu-${{ matrix.os-version }}
strategy:
matrix:
include:
- name: actions-runner
os-version: 20.04
dockerfile: Dockerfile
- name: actions-runner
os-version: 18.04
dockerfile: Dockerfile.ubuntu.1804
- name: actions-runner-dind
os-version: 20.04
dockerfile: Dockerfile.dindrunner
steps:
- name: Set outputs
id: vars
run: echo ::set-output name=sha_short::${GITHUB_SHA::7}
- name: Checkout
uses: actions/checkout@v2
- name: Set up QEMU
uses: docker/setup-qemu-action@v1
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v1
with:
version: latest
- name: Login to DockerHub
uses: docker/login-action@v1
if: ${{ github.event_name == 'push' || github.event_name == 'release' }}
with:
username: ${{ secrets.DOCKER_USER }}
password: ${{ secrets.DOCKER_ACCESS_TOKEN }}
- name: Build and Push Versioned Tags
uses: docker/build-push-action@v2
with:
context: ./runner
file: ./runner/${{ matrix.dockerfile }}
platforms: linux/amd64,linux/arm64
push: ${{ github.event_name != 'pull_request' }}
build-args: |
RUNNER_VERSION=${{ env.RUNNER_VERSION }}
DOCKER_VERSION=${{ env.DOCKER_VERSION }}
tags: |
${{ env.DOCKERHUB_USERNAME }}/${{ matrix.name }}:v${{ env.RUNNER_VERSION }}-ubuntu-${{ matrix.os-version }}
${{ env.DOCKERHUB_USERNAME }}/${{ matrix.name }}:v${{ env.RUNNER_VERSION }}-ubuntu-${{ matrix.os-version }}-${{ steps.vars.outputs.sha_short }}
latest-tags:
if: ${{ github.event_name == 'push' || github.event_name == 'release' }}
runs-on: ubuntu-latest
name: Build ${{ matrix.name }}-latest
strategy:
matrix:
include:
- name: actions-runner
dockerfile: Dockerfile
- name: actions-runner-dind
dockerfile: Dockerfile.dindrunner
steps:
- name: Checkout
uses: actions/checkout@v2
- name: Set up QEMU
uses: docker/setup-qemu-action@v1
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v1
with:
version: latest
- name: Login to DockerHub
uses: docker/login-action@v1
with:
username: ${{ secrets.DOCKER_USER }}
password: ${{ secrets.DOCKER_ACCESS_TOKEN }}
- name: Build and Push Latest Tag
uses: docker/build-push-action@v2
with:
context: ./runner
file: ./runner/${{ matrix.dockerfile }}
platforms: linux/amd64,linux/arm64
push: true
build-args: |
RUNNER_VERSION=${{ env.RUNNER_VERSION }}
DOCKER_VERSION=${{ env.DOCKER_VERSION }}
tags: |
${{ env.DOCKERHUB_USERNAME }}/${{ matrix.name }}:latest


@@ -1,20 +1,28 @@
name: Publish ARC
on:
release:
types: [published]
types:
- published
# https://docs.github.com/en/rest/overview/permissions-required-for-github-apps
permissions:
contents: write
packages: write
jobs:
build:
runs-on: ubuntu-latest
release-controller:
name: Release
runs-on: ubuntu-latest
env:
DOCKERHUB_USERNAME: ${{ secrets.DOCKER_USER }}
steps:
- name: Set outputs
id: vars
run: echo ::set-output name=sha_short::${GITHUB_SHA::7}
- name: Checkout
uses: actions/checkout@v2
uses: actions/checkout@v3
- uses: actions/setup-go@v3
with:
go-version: '1.18.2'
- name: Install tools
run: |
@@ -33,25 +41,20 @@ jobs:
- name: Upload artifacts
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
run: make github-release
run: |
make github-release
- name: Set up QEMU
uses: docker/setup-qemu-action@v1
- name: Set up Docker Buildx
id: buildx
uses: docker/setup-buildx-action@v1
- name: Setup Docker Environment
id: vars
uses: ./.github/actions/setup-docker-environment
with:
version: latest
- name: Login to DockerHub
uses: docker/login-action@v1
with:
username: ${{ secrets.DOCKER_USER }}
username: ${{ env.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKER_ACCESS_TOKEN }}
ghcr_username: ${{ github.actor }}
ghcr_password: ${{ secrets.GITHUB_TOKEN }}
- name: Build and Push
uses: docker/build-push-action@v2
uses: docker/build-push-action@v3
with:
file: Dockerfile
platforms: linux/amd64,linux/arm64
@@ -60,4 +63,8 @@ jobs:
${{ env.DOCKERHUB_USERNAME }}/actions-runner-controller:latest
${{ env.DOCKERHUB_USERNAME }}/actions-runner-controller:${{ env.VERSION }}
${{ env.DOCKERHUB_USERNAME }}/actions-runner-controller:${{ env.VERSION }}-${{ steps.vars.outputs.sha_short }}
ghcr.io/actions-runner-controller/actions-runner-controller:latest
ghcr.io/actions-runner-controller/actions-runner-controller:${{ env.VERSION }}
ghcr.io/actions-runner-controller/actions-runner-controller:${{ env.VERSION }}-${{ steps.vars.outputs.sha_short }}
cache-from: type=gha
cache-to: type=gha,mode=max

58
.github/workflows/publish-canary.yaml vendored Normal file

@@ -0,0 +1,58 @@
name: Publish Canary Image
on:
push:
branches:
- master
paths-ignore:
- '**.md'
- '.github/ISSUE_TEMPLATE/**'
- '.github/workflows/validate-chart.yaml'
- '.github/workflows/publish-chart.yaml'
- '.github/workflows/publish-arc.yaml'
- '.github/workflows/runners.yaml'
- '.github/workflows/validate-entrypoint.yaml'
- '.github/renovate.*'
- 'runner/**'
- '.gitignore'
- 'PROJECT'
- 'LICENSE'
- 'Makefile'
# https://docs.github.com/en/rest/overview/permissions-required-for-github-apps
permissions:
contents: read
packages: write
jobs:
canary-build:
name: Build and Publish Canary Image
runs-on: ubuntu-latest
env:
DOCKERHUB_USERNAME: ${{ secrets.DOCKER_USER }}
steps:
- name: Checkout
uses: actions/checkout@v3
- name: Setup Docker Environment
id: vars
uses: ./.github/actions/setup-docker-environment
with:
username: ${{ env.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKER_ACCESS_TOKEN }}
ghcr_username: ${{ github.actor }}
ghcr_password: ${{ secrets.GITHUB_TOKEN }}
# Considered unstable builds
# See Issue #285, PR #286, and PR #323 for more information
- name: Build and Push
uses: docker/build-push-action@v3
with:
file: Dockerfile
platforms: linux/amd64,linux/arm64
push: true
tags: |
${{ env.DOCKERHUB_USERNAME }}/actions-runner-controller:canary
ghcr.io/actions-runner-controller/actions-runner-controller:canary
cache-from: type=gha,scope=arc-canary
cache-to: type=gha,mode=max,scope=arc-canary


@@ -1,4 +1,4 @@
name: Publish helm chart
name: Publish Helm Chart
on:
push:
@@ -6,26 +6,32 @@ on:
- master
paths:
- 'charts/**'
- '.github/workflows/on-push-master-publish-chart.yml'
- '.github/workflows/publish-chart.yaml'
- '!charts/actions-runner-controller/docs/**'
- '!**.md'
workflow_dispatch:
env:
KUBE_SCORE_VERSION: 1.10.0
HELM_VERSION: v3.4.1
HELM_VERSION: v3.8.0
permissions:
contents: read
jobs:
lint-chart:
name: Lint Chart
runs-on: ubuntu-latest
outputs:
publish-chart: ${{ steps.publish-chart-step.outputs.publish }}
steps:
- name: Checkout
uses: actions/checkout@v2
uses: actions/checkout@v3
with:
fetch-depth: 0
- name: Set up Helm
uses: azure/setup-helm@v1
uses: azure/setup-helm@v3.0
with:
version: ${{ env.HELM_VERSION }}
@@ -46,12 +52,12 @@ jobs:
--enable-optional-test container-security-context-readonlyrootfilesystem
# python is a requirement for the chart-testing action below (supports yamllint among other tests)
- uses: actions/setup-python@v2
- uses: actions/setup-python@v4
with:
python-version: 3.7
python-version: '3.7'
- name: Set up chart-testing
uses: helm/chart-testing-action@v2.1.0
uses: helm/chart-testing-action@v2.2.1
- name: Run chart-testing (list-changed)
id: list-changed
@@ -62,31 +68,50 @@ jobs:
fi
- name: Run chart-testing (lint)
run: ct lint --config charts/.ci/ct-config.yaml
run: |
ct lint --config charts/.ci/ct-config.yaml
- name: Create kind cluster
uses: helm/kind-action@v1.2.0
if: steps.list-changed.outputs.changed == 'true'
uses: helm/kind-action@v1.3.0
# We need cert-manager already installed in the cluster because we assume the CRDs exist
- name: Install cert-manager
if: steps.list-changed.outputs.changed == 'true'
run: |
helm repo add jetstack https://charts.jetstack.io --force-update
helm install cert-manager jetstack/cert-manager --set installCRDs=true --wait
if: steps.list-changed.outputs.changed == 'true'
- name: Run chart-testing (install)
run: ct install --config charts/.ci/ct-config.yaml
if: steps.list-changed.outputs.changed == 'true'
run: ct install --config charts/.ci/ct-config.yaml
# WARNING: This relies on the latest release being at the top of the JSON from GitHub and a clean chart.yaml
- name: Check if Chart Publish is Needed
id: publish-chart-step
run: |
CHART_TEXT=$(curl -fs https://raw.githubusercontent.com/actions-runner-controller/actions-runner-controller/master/charts/actions-runner-controller/Chart.yaml)
NEW_CHART_VERSION=$(echo "$CHART_TEXT" | grep version: | cut -d ' ' -f 2)
RELEASE_LIST=$(curl -fs https://api.github.com/repos/actions-runner-controller/actions-runner-controller/releases | jq .[].tag_name | grep actions-runner-controller | cut -d '"' -f 2 | cut -d '-' -f 4)
LATEST_RELEASED_CHART_VERSION=$(echo $RELEASE_LIST | cut -d ' ' -f 1)
echo "Chart version in master : $NEW_CHART_VERSION"
echo "Latest release chart version : $LATEST_RELEASED_CHART_VERSION"
if [[ $NEW_CHART_VERSION != $LATEST_RELEASED_CHART_VERSION ]]; then
echo "::set-output name=publish::true"
fi
publish-chart:
runs-on: ubuntu-latest
if: needs.lint-chart.outputs.publish-chart == 'true'
needs: lint-chart
name: Publish Chart
runs-on: ubuntu-latest
permissions:
contents: write # for helm/chart-releaser-action to push chart release and create a release
steps:
- name: Checkout
uses: actions/checkout@v2
uses: actions/checkout@v3
with:
fetch-depth: 0
@@ -96,7 +121,7 @@ jobs:
git config user.email "$GITHUB_ACTOR@users.noreply.github.com"
- name: Run chart-releaser
uses: helm/chart-releaser-action@v1.2.1
uses: helm/chart-releaser-action@v1.4.0
env:
CR_TOKEN: "${{ secrets.GITHUB_TOKEN }}"

32
.github/workflows/run-codeql.yaml vendored Normal file

@@ -0,0 +1,32 @@
name: Run CodeQL
on:
push:
branches:
- master
pull_request:
branches:
- master
schedule:
- cron: '30 1 * * 0'
jobs:
analyze:
name: Analyze
runs-on: ubuntu-latest
permissions:
security-events: write
steps:
- name: Checkout repository
uses: actions/checkout@v3
- name: Initialize CodeQL
uses: github/codeql-action/init@v2
with:
languages: go
- name: Autobuild
uses: github/codeql-action/autobuild@v2
- name: Perform CodeQL Analysis
uses: github/codeql-action/analyze@v2

25
.github/workflows/run-stale.yaml vendored Normal file

@@ -0,0 +1,25 @@
name: Run Stale Bot
on:
schedule:
- cron: '30 1 * * *'
permissions:
contents: read
jobs:
stale:
name: Run Stale
runs-on: ubuntu-latest
permissions:
issues: write # for actions/stale to close stale issues
pull-requests: write # for actions/stale to close stale PRs
steps:
- uses: actions/stale@v5
with:
stale-issue-message: 'This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.'
# turn off stale for both issues and PRs
days-before-stale: -1
# turn stale back on for issues only
days-before-issue-stale: 30
days-before-issue-close: 14
exempt-issue-labels: 'pinned,security,enhancement,refactor,documentation,chore,bug,dependencies,needs-investigation'

83
.github/workflows/runners.yaml vendored Normal file

@@ -0,0 +1,83 @@
name: Runners
on:
pull_request:
types:
- opened
- synchronize
- reopened
branches:
- 'master'
paths:
- 'runner/**'
- '!runner/Makefile'
- '.github/workflows/runners.yaml'
- '!**.md'
# We must do a trigger on a push: instead of a types: closed so GitHub Secrets
# are available to the workflow run
push:
branches:
- 'master'
paths:
- 'runner/**'
- '!runner/Makefile'
- '.github/workflows/runners.yaml'
- '!**.md'
env:
RUNNER_VERSION: 2.294.0
DOCKER_VERSION: 20.10.12
RUNNER_CONTAINER_HOOKS_VERSION: 0.1.2
DOCKERHUB_USERNAME: summerwind
jobs:
build-runners:
name: Build ${{ matrix.name }}-${{ matrix.os-name }}-${{ matrix.os-version }}
runs-on: ubuntu-latest
permissions:
packages: write
contents: read
strategy:
fail-fast: false
matrix:
include:
- name: actions-runner
os-name: ubuntu
os-version: 20.04
- name: actions-runner-dind
os-name: ubuntu
os-version: 20.04
steps:
- name: Checkout
uses: actions/checkout@v3
- name: Setup Docker Environment
id: vars
uses: ./.github/actions/setup-docker-environment
with:
username: ${{ env.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKER_ACCESS_TOKEN }}
ghcr_username: ${{ github.actor }}
ghcr_password: ${{ secrets.GITHUB_TOKEN }}
- name: Build and Push Versioned Tags
uses: docker/build-push-action@v3
with:
context: ./runner
file: ./runner/${{ matrix.name }}.dockerfile
platforms: linux/amd64,linux/arm64
push: ${{ github.event_name == 'push' && github.ref == 'refs/heads/master' }}
build-args: |
RUNNER_VERSION=${{ env.RUNNER_VERSION }}
DOCKER_VERSION=${{ env.DOCKER_VERSION }}
RUNNER_CONTAINER_HOOKS_VERSION=${{ env.RUNNER_CONTAINER_HOOKS_VERSION }}
tags: |
${{ env.DOCKERHUB_USERNAME }}/${{ matrix.name }}:v${{ env.RUNNER_VERSION }}-${{ matrix.os-name }}-${{ matrix.os-version }}
${{ env.DOCKERHUB_USERNAME }}/${{ matrix.name }}:v${{ env.RUNNER_VERSION }}-${{ matrix.os-name }}-${{ matrix.os-version }}-${{ steps.vars.outputs.sha_short }}
${{ env.DOCKERHUB_USERNAME }}/${{ matrix.name }}:latest
ghcr.io/${{ github.repository }}/${{ matrix.name }}:latest
ghcr.io/${{ github.repository }}/${{ matrix.name }}:v${{ env.RUNNER_VERSION }}-${{ matrix.os-name }}-${{ matrix.os-version }}
ghcr.io/${{ github.repository }}/${{ matrix.name }}:v${{ env.RUNNER_VERSION }}-${{ matrix.os-name }}-${{ matrix.os-version }}-${{ steps.vars.outputs.sha_short }}
cache-from: type=gha,scope=build-${{ matrix.name }}
cache-to: type=gha,mode=max,scope=build-${{ matrix.name }}


@@ -1,34 +0,0 @@
name: CI
on:
pull_request:
branches:
- master
paths-ignore:
- 'runner/**'
- .github/workflows/build-and-release-runners.yml
- '*.md'
- '.gitignore'
jobs:
test:
runs-on: ubuntu-latest
name: Test
steps:
- name: Checkout
uses: actions/checkout@v2
- uses: actions/setup-go@v2
with:
go-version: '^1.16.5'
- run: go version
- name: Install kubebuilder
run: |
curl -L -O https://github.com/kubernetes-sigs/kubebuilder/releases/download/v2.3.2/kubebuilder_2.3.2_linux_amd64.tar.gz
tar zxvf kubebuilder_2.3.2_linux_amd64.tar.gz
sudo mv kubebuilder_2.3.2_linux_amd64 /usr/local/kubebuilder
- name: Run tests
run: make test
- name: Verify manifests are up-to-date
run: |
make manifests
git diff --exit-code

60
.github/workflows/validate-arc.yaml vendored Normal file

@@ -0,0 +1,60 @@
name: Validate ARC
on:
pull_request:
branches:
- master
paths-ignore:
- '**.md'
- '.github/ISSUE_TEMPLATE/**'
- '.github/workflows/publish-canary.yaml'
- '.github/workflows/validate-chart.yaml'
- '.github/workflows/publish-chart.yaml'
- '.github/workflows/runners.yaml'
- '.github/workflows/publish-arc.yaml'
- '.github/workflows/validate-entrypoint.yaml'
- '.github/renovate.*'
- 'runner/**'
- '.gitignore'
- 'PROJECT'
- 'LICENSE'
- 'Makefile'
permissions:
contents: read
jobs:
test-controller:
name: Test ARC
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@v3
- name: Set-up Go
uses: actions/setup-go@v3
with:
go-version: '1.18.2'
check-latest: false
- uses: actions/cache@v3
with:
path: ~/go/pkg/mod
key: ${{ runner.os }}-go-${{ hashFiles('**/go.sum') }}
restore-keys: |
${{ runner.os }}-go-
- name: Install kubebuilder
run: |
curl -L -O https://github.com/kubernetes-sigs/kubebuilder/releases/download/v2.3.2/kubebuilder_2.3.2_linux_amd64.tar.gz
tar zxvf kubebuilder_2.3.2_linux_amd64.tar.gz
sudo mv kubebuilder_2.3.2_linux_amd64 /usr/local/kubebuilder
- name: Run tests
run: |
make test
- name: Verify manifests are up-to-date
run: |
make manifests
git diff --exit-code


@@ -1,28 +1,32 @@
name: Lint and Test Charts
name: Validate Helm Chart
on:
push:
paths:
- 'charts/**'
- '.github/workflows/on-push-lint-charts.yml'
- '.github/workflows/validate-chart.yaml'
- '!charts/actions-runner-controller/docs/**'
- '!**.md'
workflow_dispatch:
env:
KUBE_SCORE_VERSION: 1.10.0
HELM_VERSION: v3.4.1
HELM_VERSION: v3.8.0
permissions:
contents: read
jobs:
lint-test:
validate-chart:
name: Lint Chart
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@v2
uses: actions/checkout@v3
with:
fetch-depth: 0
- name: Set up Helm
uses: azure/setup-helm@v1
uses: azure/setup-helm@v3.0
with:
version: ${{ env.HELM_VERSION }}
@@ -43,12 +47,12 @@ jobs:
--enable-optional-test container-security-context-readonlyrootfilesystem
# python is a requirement for the chart-testing action below (supports yamllint among other tests)
- uses: actions/setup-python@v2
- uses: actions/setup-python@v4
with:
python-version: 3.7
python-version: '3.7'
- name: Set up chart-testing
uses: helm/chart-testing-action@v2.1.0
uses: helm/chart-testing-action@v2.2.1
- name: Run chart-testing (list-changed)
id: list-changed
@@ -59,18 +63,20 @@ jobs:
fi
- name: Run chart-testing (lint)
run: ct lint --config charts/.ci/ct-config.yaml
run: |
ct lint --config charts/.ci/ct-config.yaml
- name: Create kind cluster
uses: helm/kind-action@v1.2.0
uses: helm/kind-action@v1.3.0
if: steps.list-changed.outputs.changed == 'true'
# We need cert-manager already installed in the cluster because we assume the CRDs exist
- name: Install cert-manager
if: steps.list-changed.outputs.changed == 'true'
run: |
helm repo add jetstack https://charts.jetstack.io --force-update
helm install cert-manager jetstack/cert-manager --set installCRDs=true --wait
if: steps.list-changed.outputs.changed == 'true'
- name: Run chart-testing (install)
run: ct install --config charts/.ci/ct-config.yaml
run: |
ct install --config charts/.ci/ct-config.yaml


@@ -1,4 +1,4 @@
name: Unit tests for entrypoint
name: Validate Runners
on:
pull_request:
@@ -7,15 +7,19 @@ on:
paths:
- 'runner/**'
- 'test/entrypoint/**'
- '!**.md'
permissions:
contents: read
jobs:
test:
runs-on: ubuntu-latest
test-runner-entrypoint:
name: Test entrypoint
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@v2
- name: Run unit tests for entrypoint.sh
uses: actions/checkout@v3
- name: Run tests
run: |
cd test/entrypoint
bash entrypoint_unittest.sh
make acceptance/runner/entrypoint


@@ -1,44 +0,0 @@
on:
push:
branches:
- master
paths-ignore:
- "runner/**"
- "**.md"
- ".gitignore"
jobs:
build:
runs-on: ubuntu-latest
name: release-latest
env:
DOCKERHUB_USERNAME: ${{ secrets.DOCKER_USER }}
steps:
- name: Checkout
uses: actions/checkout@v2
- name: Set up QEMU
uses: docker/setup-qemu-action@v1
- name: Set up Docker Buildx
id: buildx
uses: docker/setup-buildx-action@v1
with:
version: latest
- name: Login to DockerHub
uses: docker/login-action@v1
with:
username: ${{ secrets.DOCKER_USER }}
password: ${{ secrets.DOCKER_ACCESS_TOKEN }}
# Considered unstable builds
# See Issue #285, PR #286, and PR #323 for more information
- name: Build and Push
uses: docker/build-push-action@v2
with:
file: Dockerfile
platforms: linux/amd64,linux/arm64
push: true
tags: |
${{ env.DOCKERHUB_USERNAME }}/actions-runner-controller:canary

2
CODEOWNERS Normal file

@@ -0,0 +1,2 @@
# actions-runner-controller maintainers
* @mumoshu @toast-gear


@@ -1,10 +1,25 @@
## Contributing
### Testing Controller Built from a Pull Request
We always appreciate your help in testing open pull requests by deploying custom builds of actions-runner-controller onto your own environment, so that we are extra sure we didn't break anything.
It is especially true when the pull request is about GitHub Enterprise, both GHEC and GHES, as [maintainers don't have GitHub Enterprise environments for testing](/README.md#github-enterprise-support).
The process would look like the below:
- Clone this repository locally
- Checkout the branch. If you use the `gh` command, run `gh pr checkout $PR_NUMBER`
- Run `NAME=$DOCKER_USER/actions-runner-controller VERSION=canary make docker-build docker-push` for a custom container image build
- Update your actions-runner-controller's controller-manager deployment to use the new image, `$DOCKER_USER/actions-runner-controller:canary`
Please also note that you need to replace `$DOCKER_USER` with your own DockerHub account name.
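Putting those steps together, a minimal sketch of the flow might look like the following (assuming you have `gh` installed, are logged in to your registry, and that your controller-manager Deployment and its container are named `actions-runner-controller` and `manager`; adjust the names to your installation):

```shell
# Hypothetical end-to-end sketch; replace $PR_NUMBER and $DOCKER_USER with your own values
git clone https://github.com/actions-runner-controller/actions-runner-controller.git
cd actions-runner-controller
gh pr checkout $PR_NUMBER

# Build and push a custom controller image
NAME=$DOCKER_USER/actions-runner-controller VERSION=canary make docker-build docker-push

# Point the controller-manager Deployment at the custom image
kubectl -n actions-runner-system set image deploy/actions-runner-controller \
  manager=$DOCKER_USER/actions-runner-controller:canary
```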
### How to Contribute a Patch
How you should go about it depends on what you are patching. Below are some guides on how to test patches locally as well as develop the controller and runners.
When sumitting a PR for a change please provide evidence that your change works as we still need to work on improving the CI of the project. Some resources are provided for helping achieve this, see this guide for details.
When submitting a PR for a change please provide evidence that your change works as we still need to work on improving the CI of the project. Some resources are provided for helping achieve this, see this guide for details.
#### Running an End to End Test
@@ -80,6 +95,7 @@ To make your development cycle faster, use the below command to update deploy an
# you either need to bump VERSION and RUNNER_TAG on each run,
# or manually run `kubectl delete pod $POD` on respective pods for changes to actually take effect.
# Makefile
VERSION=controller1 \
RUNNER_TAG=runner1 \
make acceptance/pull acceptance/kind docker-build acceptance/load acceptance/deploy
@@ -88,14 +104,16 @@ VERSION=controller1 \
If you've already deployed actions-runner-controller and only want to recreate pods to use the newer image, you can run:
```shell
# Makefile
NAME=$DOCKER_USER/actions-runner-controller \
make docker-build acceptance/load && \
kubectl -n actions-runner-system delete po $(kubectl -n actions-runner-system get po -ojsonpath={.items[*].metadata.name})
```
Similarly, if you'd like to recreate runner pods with the newer runner image,
Similarly, if you'd like to recreate runner pods with the newer runner image, you can use the runner-specific [Makefile](runner/Makefile) to build and/or push new runner images:
```shell
# runner/Makefile
NAME=$DOCKER_USER/actions-runner make \
-C runner docker-{build,push}-ubuntu && \
(kubectl get po -ojsonpath={.items[*].metadata.name} | xargs -n1 kubectl delete po)
@@ -136,7 +154,4 @@ GINKGO_FOCUS='[It] should create a new Runner resource from the specified templa
#### Helm Version Bumps
**Chart Version :** When bumping the chart version follow semantic versioning https://semver.org/<br />
**App Version :** When bumping the app version you will also need to bump the chart version too. Again, follow semantic versioning when bumping the chart.
To determine whether you need to bump the MAJOR, MINOR or PATCH version, review the changes between the previous app version and the new app version and/or ask a maintainer to advise.
In general we ask you not to bump the version in your PR; the maintainers manage the publishing of new charts.


@@ -1,29 +1,44 @@
# Build the manager binary
FROM golang:1.17 as builder
ARG TARGETPLATFORM
FROM --platform=$BUILDPLATFORM golang:1.18.3 as builder
WORKDIR /workspace
ENV GO111MODULE=on \
CGO_ENABLED=0
# Make it runnable on a distroless image/without libc
ENV CGO_ENABLED=0
# Copy the Go Modules manifests
COPY go.mod go.sum ./
# cache deps before building and copying source so that we don't need to re-download as much
# and so that source changes don't invalidate our downloaded layer
# and so that source changes don't invalidate our downloaded layer.
#
# Also, we need to do this before setting TARGETPLATFORM/TARGETOS/TARGETARCH/TARGETVARIANT
# so that go mod cache is shared across platforms.
RUN go mod download
# Copy the go source
COPY . .
# COPY . .
# Usage:
# docker buildx build --tag repo/img:tag -f ./Dockerfile . --platform linux/amd64,linux/arm64,linux/arm/v7
#
# With the above command,
# TARGETOS can be "linux", TARGETARCH can be "amd64", "arm64", and "arm", TARGETVARIANT can be "v7".
ARG TARGETPLATFORM TARGETOS TARGETARCH TARGETVARIANT
# We intentionally avoid `--mount=type=cache,mode=0777,target=/go/pkg/mod` in the `go mod download` and the `go build` runs
# to avoid https://github.com/moby/buildkit/issues/2334
# We can use docker layer cache so the build is fast enough anyway
# We also use per-platform GOCACHE for the same reason.
env GOCACHE /build/${TARGETPLATFORM}/root/.cache/go-build
# Build
RUN export GOOS=$(echo ${TARGETPLATFORM} | cut -d / -f1) && \
export GOARCH=$(echo ${TARGETPLATFORM} | cut -d / -f2) && \
GOARM=$(echo ${TARGETPLATFORM} | cut -d / -f3 | cut -c2-) && \
go build -a -o manager main.go && \
go build -a -o github-webhook-server ./cmd/githubwebhookserver
RUN --mount=target=. \
--mount=type=cache,mode=0777,target=${GOCACHE} \
export GOOS=${TARGETOS} GOARCH=${TARGETARCH} GOARM=${TARGETVARIANT#v} && \
go build -o /out/manager main.go && \
go build -o /out/github-webhook-server ./cmd/githubwebhookserver
# Use distroless as minimal base image to package the manager binary
# Refer to https://github.com/GoogleContainerTools/distroless for more details
@@ -31,8 +46,8 @@ FROM gcr.io/distroless/static:nonroot
WORKDIR /
COPY --from=builder /workspace/manager .
COPY --from=builder /workspace/github-webhook-server .
COPY --from=builder /out/manager .
COPY --from=builder /out/github-webhook-server .
USER nonroot:nonroot


@@ -5,21 +5,23 @@ else
endif
DOCKER_USER ?= $(shell echo ${NAME} | cut -d / -f1)
VERSION ?= latest
RUNNER_VERSION ?= 2.294.0
TARGETPLATFORM ?= $(shell arch)
RUNNER_NAME ?= ${DOCKER_USER}/actions-runner
RUNNER_TAG ?= ${VERSION}
TEST_REPO ?= ${DOCKER_USER}/actions-runner-controller
TEST_ORG ?=
TEST_ORG_REPO ?=
TEST_EPHEMERAL ?= false
SYNC_PERIOD ?= 5m
SYNC_PERIOD ?= 1m
USE_RUNNERSET ?=
RUNNER_FEATURE_FLAG_EPHEMERAL ?=
KUBECONTEXT ?= kind-acceptance
CLUSTER ?= acceptance
CERT_MANAGER_VERSION ?= v1.1.1
KUBE_RBAC_PROXY_VERSION ?= v0.11.0
# Produce CRDs that work back to Kubernetes 1.11 (no version conversion)
CRD_OPTIONS ?= "crd:trivialVersions=true,generateEmbeddedObjectMeta=true"
CRD_OPTIONS ?= "crd:generateEmbeddedObjectMeta=true"
# Get the currently used golang install path (in GOPATH/bin, unless GOBIN is set)
ifeq (,$(shell go env GOBIN))
@@ -54,6 +56,7 @@ GO_TEST_ARGS ?= -short
# Run tests
test: generate fmt vet manifests
go test $(GO_TEST_ARGS) ./... -coverprofile cover.out
go test -fuzz=Fuzz -fuzztime=10s -run=Fuzz* ./controllers
test-with-deps: kube-apiserver etcd kubectl
# See https://pkg.go.dev/sigs.k8s.io/controller-runtime/pkg/envtest#pkg-constants
@@ -107,13 +110,9 @@ vet:
generate: controller-gen
$(CONTROLLER_GEN) object:headerFile=./hack/boilerplate.go.txt paths="./..."
# Build the docker image
docker-build:
docker build -t ${NAME}:${VERSION} .
docker build -t ${RUNNER_NAME}:${RUNNER_TAG} --build-arg TARGETPLATFORM=$(shell arch) runner
docker-buildx:
export DOCKER_CLI_EXPERIMENTAL=enabled
export DOCKER_CLI_EXPERIMENTAL=enabled ;\
export DOCKER_BUILDKIT=1
@if ! docker buildx ls | grep -q container-builder; then\
docker buildx create --platform ${PLATFORMS} --name container-builder --use;\
fi
@@ -156,7 +155,7 @@ acceptance/kind:
# See https://kind.sigs.k8s.io/docs/user/known-issues/#docker-installed-with-snap
acceptance/load:
kind load docker-image ${NAME}:${VERSION} --name ${CLUSTER}
kind load docker-image quay.io/brancz/kube-rbac-proxy:v0.10.0 --name ${CLUSTER}
kind load docker-image quay.io/brancz/kube-rbac-proxy:$(KUBE_RBAC_PROXY_VERSION) --name ${CLUSTER}
kind load docker-image ${RUNNER_NAME}:${RUNNER_TAG} --name ${CLUSTER}
kind load docker-image docker:dind --name ${CLUSTER}
kind load docker-image quay.io/jetstack/cert-manager-controller:$(CERT_MANAGER_VERSION) --name ${CLUSTER}
@@ -166,7 +165,7 @@ acceptance/load:
# Pull the docker images for acceptance
acceptance/pull:
docker pull quay.io/brancz/kube-rbac-proxy:v0.10.0
docker pull quay.io/brancz/kube-rbac-proxy:$(KUBE_RBAC_PROXY_VERSION)
docker pull docker:dind
docker pull quay.io/jetstack/cert-manager-controller:$(CERT_MANAGER_VERSION)
docker pull quay.io/jetstack/cert-manager-cainjector:$(CERT_MANAGER_VERSION)
@@ -189,12 +188,14 @@ acceptance/deploy:
TEST_ORG=${TEST_ORG} TEST_ORG_REPO=${TEST_ORG_REPO} SYNC_PERIOD=${SYNC_PERIOD} \
USE_RUNNERSET=${USE_RUNNERSET} \
TEST_EPHEMERAL=${TEST_EPHEMERAL} \
RUNNER_FEATURE_FLAG_EPHEMERAL=${RUNNER_FEATURE_FLAG_EPHEMERAL} \
acceptance/deploy.sh
acceptance/tests:
acceptance/checks.sh
acceptance/runner/entrypoint:
cd test/entrypoint/ && bash test.sh
# We use -count=1 instead of `go clean -testcache`
# See https://terratest.gruntwork.io/docs/testing-best-practices/avoid-test-caching/
.PHONY: e2e
@@ -221,7 +222,7 @@ ifeq (, $(wildcard $(GOBIN)/controller-gen))
CONTROLLER_GEN_TMP_DIR=$$(mktemp -d) ;\
cd $$CONTROLLER_GEN_TMP_DIR ;\
go mod init tmp ;\
go get sigs.k8s.io/controller-tools/cmd/controller-gen@v0.6.0 ;\
go install sigs.k8s.io/controller-tools/cmd/controller-gen@v0.7.0 ;\
rm -rf $$CONTROLLER_GEN_TMP_DIR ;\
}
endif
@@ -241,7 +242,7 @@ ifeq (, $(wildcard $(GOBIN)/yq))
YQ_TMP_DIR=$$(mktemp -d) ;\
cd $$YQ_TMP_DIR ;\
go mod init tmp ;\
go get github.com/mikefarah/yq/v3@3.4.0 ;\
go install github.com/mikefarah/yq/v3@3.4.0 ;\
rm -rf $$YQ_TMP_DIR ;\
}
endif

1322
README.md

File diff suppressed because it is too large.

22
SECURITY.md Normal file

@@ -0,0 +1,22 @@
# Security Policy
## Sponsoring the project
This project is maintained by a small team of two and therefore lacks the resources to provide security fixes in a timely manner.
If you have an important business that relies on this project, please consider sponsoring the project so that the maintainer(s) can commit to providing such a service.
Please refer to https://github.com/sponsors/actions-runner-controller for available tiers.
## Supported Versions
| Version | Supported |
| ------- | ------------------ |
| 0.23.0 | :white_check_mark: |
| < 0.23.0| :x: |
## Reporting a Vulnerability
To report a security issue, please email ykuoka+arcsecurity(at)gmail.com with a description of the issue, the steps you took to create the issue, affected versions, and, if known, mitigations for the issue.
A maintainer will try to respond within 5 working days. If the issue is confirmed as a vulnerability, a Security Advisory will be opened. This project tries to follow a 90 day disclosure timeline.

240
TROUBLESHOOTING.md Normal file

@@ -0,0 +1,240 @@
# Troubleshooting
* [Tools](#tools)
* [Installation](#installation)
* [InternalError when calling webhook: context deadline exceeded](#internalerror-when-calling-webhook-context-deadline-exceeded)
* [Invalid header field value](#invalid-header-field-value)
* [Operations](#operations)
* [Stuck runner kind or backing pod](#stuck-runner-kind-or-backing-pod)
* [Delay in jobs being allocated to runners](#delay-in-jobs-being-allocated-to-runners)
* [Runner coming up before network available](#runner-coming-up-before-network-available)
* [Outgoing network action hangs indefinitely](#outgoing-network-action-hangs-indefinitely)
* [Unable to scale to zero with TotalNumberOfQueuedAndInProgressWorkflowRuns](#unable-to-scale-to-zero-with-totalnumberofqueuedandinprogressworkflowruns)
## Tools
A list of tools which are helpful for troubleshooting
* https://github.com/rewanthtammana/kubectl-fields Kubernetes resources hierarchy parsing tool
* https://github.com/stern/stern Multi pod and container log tailing for Kubernetes
## Installation
Troubleshooting runbooks that relate to ARC installation problems
### InternalError when calling webhook: context deadline exceeded
**Problem**
This issue can come up for various reasons, such as leftovers from previous installations, or the control plane not being able to access the K8s Service clusterIP associated with ARC's admission webhook server.
```
Internal error occurred: failed calling webhook "mutate.runnerdeployment.actions.summerwind.dev":
Post "https://actions-runner-controller-webhook.actions-runner-system.svc:443/mutate-actions-summerwind-dev-v1alpha1-runnerdeployment?timeout=10s": context deadline exceeded
```
**Solution**
First we will try the common solution of checking webhook leftovers from previous installations:
1. ```bash
kubectl get validatingwebhookconfiguration -A
kubectl get mutatingwebhookconfiguration -A
```
2. If you see any webhooks related to actions-runner-controller, delete them:
```bash
kubectl delete mutatingwebhookconfiguration actions-runner-controller-mutating-webhook-configuration
kubectl delete validatingwebhookconfiguration actions-runner-controller-validating-webhook-configuration
```
If that didn't work, then your K8s control plane is probably unable to access the K8s Service clusterIP associated with the admission webhook server. Typical causes are:
1. You're running the apiserver as a binary and you didn't make Service cluster IPs available to the host network.
2. You're running the apiserver in a pod, but your pod network (i.e. the CNI plugin installation and config) is misconfigured, so pods such as kube-apiserver on the K8s control-plane nodes can't reach ARC's admission webhook server pod(s), which usually run on data-plane nodes.
Another cause can be GKE's firewall settings; you may run into this error when trying to deploy runners on a private GKE cluster.
To fix this, you may either:
1. Configure the webhook to use another port, such as 443 or 10250, [each of
which allow traffic by default](https://cloud.google.com/kubernetes-engine/docs/how-to/private-clusters#add_firewall_rules).
```sh
# With helm, you'd set `webhookPort` to the port number of your choice
# See https://github.com/actions-runner-controller/actions-runner-controller/pull/1410/files for more information
helm upgrade --install --namespace actions-runner-system --create-namespace \
--wait actions-runner-controller actions-runner-controller/actions-runner-controller \
--set webhookPort=10250
```
2. Set up a firewall rule to allow the master node to connect to the default
webhook port. The exact way to do this may vary, but the following script
should point you in the right direction:
```sh
# 1) Retrieve the network tag automatically given to the worker nodes
# NOTE: this only works if you have only one cluster in your GCP project. You will have to manually inspect the result of this command to find the tag for the cluster you want to target
WORKER_NODES_TAG=$(gcloud compute instances list --format='text(tags.items[0])' --filter='metadata.kubelet-config:*' | grep tags | awk '{print $2}' | sort | uniq)
# 2) Take note of the VPC network in which you deployed your cluster
# NOTE this only works if you have only one network in which you deploy your clusters
NETWORK=$(gcloud compute instances list --format='text(networkInterfaces[0].network)' --filter='metadata.kubelet-config:*' | grep networks | awk -F'/' '{print $NF}' | sort | uniq)
# 3) Get the master source ip block
SOURCE=$(gcloud container clusters describe <cluster-name> --region <region> | grep masterIpv4CidrBlock| cut -d ':' -f 2 | tr -d ' ')
gcloud compute firewall-rules create k8s-cert-manager --source-ranges $SOURCE --target-tags $WORKER_NODES_TAG --allow TCP:9443 --network $NETWORK
```
### Invalid header field value
**Problem**
```json
2020-11-12T22:17:30.693Z ERROR controller-runtime.controller Reconciler error
{
"controller": "runner",
"request": "actions-runner-system/runner-deployment-dk7q8-dk5c9",
"error": "failed to create registration token: Post \"https://api.github.com/orgs/$YOUR_ORG_HERE/actions/runners/registration-token\": net/http: invalid header field value \"Bearer $YOUR_TOKEN_HERE\\n\" for key Authorization"
}
```
**Solution**
Your base64'ed PAT token has a newline at the end; it needs to be created without a trailing `\n`. Either:
* `echo -n $TOKEN | base64`
* Create the secret as described in the docs using the shell and documented flags
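For example, a minimal sketch of recreating the secret from the shell (assuming a PAT-based setup using the default `controller-manager` secret in the `actions-runner-system` namespace; adjust names to your installation):

```bash
# kubectl's --from-literal does not append a trailing newline to the value
kubectl -n actions-runner-system delete secret controller-manager --ignore-not-found
kubectl -n actions-runner-system create secret generic controller-manager \
  --from-literal=github_token="${GITHUB_TOKEN}"
```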
## Operations
Troubleshooting runbooks that relate to ARC operational problems
### Stuck runner kind or backing pod
**Problem**
Sometimes either the runner kind (`kubectl get runners`) or its underlying pod can get stuck in a terminating state for various reasons. You can get it unstuck by removing its finalizer as shown below.
**Solution**
Remove the finalizer from the relevant runner kind or pod
```
# Get all kind runners and remove the finalizer
$ kubectl get runners --no-headers | awk {'print $1'} | xargs kubectl patch runner --type merge -p '{"metadata":{"finalizers":null}}'
# Get all pods that are stuck terminating and remove the finalizer
$ kubectl -n get pods | grep Terminating | awk {'print $1'} | xargs kubectl patch pod -p '{"metadata":{"finalizers":null}}'
```
_Note the code assumes you have already selected the namespace your runners are in and that they
are in a namespace not shared with anything else_
### Delay in jobs being allocated to runners
**Problem**
ARC isn't involved in jobs actually getting allocated to a runner; it is responsible for orchestrating runners and the runner lifecycle. Why some people see large delays in job allocation is not clear, however it has been [reported](https://github.com/actions-runner-controller/actions-runner-controller/issues/1387#issuecomment-1122593984) that this is somehow caused by the self-update process.
**Solution**
Disable the self-update process in your runner manifests
```yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
name: example-runnerdeployment-with-sleep
spec:
template:
spec:
...
env:
- name: DISABLE_RUNNER_UPDATE
value: "true"
```
### Runner coming up before network available
**Problem**
If you're running your action runners on a service mesh like Istio, you might
have problems with runner configuration accompanied by logs like:
```
....
runner Starting Runner listener with startup type: service
runner Started listener process
runner An error occurred: Not configured
runner Runner listener exited with error code 2
runner Runner listener exit with retryable error, re-launch runner in 5 seconds.
....
```
This is because the `istio-proxy` has not completed configuring itself when the
configuration script tries to communicate with the network.
More broadly, there are many other circumstances where the runner pod coming up first can cause issues.
**Solution**<br />
> Added originally to help users with older istio instances.
> Newer Istio instances can use Istio's `holdApplicationUntilProxyStarts` attribute ([istio/istio#11130](https://github.com/istio/istio/issues/11130)) to avoid having to delay starting up the runner.
> Please read the discussion in [#592](https://github.com/actions-runner-controller/actions-runner-controller/pull/592) for more information.
You can add a delay to the runner's entrypoint script by setting the `STARTUP_DELAY_IN_SECONDS` environment variable for the runner pod. This will cause the script to sleep for the given number of seconds; this works with any runner kind.
```yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
name: example-runnerdeployment-with-sleep
spec:
template:
spec:
...
env:
- name: STARTUP_DELAY_IN_SECONDS
value: "5"
```
### Outgoing network action hangs indefinitely
**Problem**
Some outgoing network actions hang indefinitely. This could be because your cluster does not give Docker the standard MTU of 1500; you can check this by running `ip link` in a pod that encounters the problem and reading the outgoing interface's MTU value. If it is smaller than 1500, then try the following.
**Solution**
Add a `dockerMTU` key in your runner's spec with the value you read on the outgoing interface. For instance:
```yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
name: github-runner
namespace: github-system
spec:
replicas: 6
template:
spec:
dockerMTU: 1400
repository: $username/$repo
env: []
```
There may be more places you need to tweak for MTU.
Please consult issues like #651 for more information.
### Unable to scale to zero with TotalNumberOfQueuedAndInProgressWorkflowRuns
**Problem**
HRA doesn't scale the RunnerDeployment to zero, even though you did configure HRA correctly, to have a pull-based scaling metric `TotalNumberOfQueuedAndInProgressWorkflowRuns`, and set `minReplicas: 0`.
**Solution**
You very likely have some dangling workflow jobs stuck in `queued` or `in_progress` as seen in [#1057](https://github.com/actions-runner-controller/actions-runner-controller/issues/1057#issuecomment-1133439061).
Manually call [the "list workflow runs" API](https://docs.github.com/en/rest/actions/workflow-runs#list-workflow-runs-for-a-repository), and [remove the dangling workflow job(s)](https://docs.github.com/en/rest/actions/workflow-runs#delete-a-workflow-run).
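A minimal sketch of doing that with the `gh` CLI (assuming `$OWNER/$REPO` is your repository and `$RUN_ID` is the ID of a dangling run):

```bash
# List runs stuck in "queued"; repeat with status=in_progress
gh api "repos/$OWNER/$REPO/actions/runs?status=queued" --jq '.workflow_runs[] | {id, status, created_at}'

# Delete a dangling workflow run by its ID
gh api -X DELETE "repos/$OWNER/$REPO/actions/runs/$RUN_ID"
```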


@@ -6,6 +6,8 @@ tpe=${ACCEPTANCE_TEST_SECRET_TYPE}
VALUES_FILE=${VALUES_FILE:-$(dirname $0)/values.yaml}
kubectl delete secret -n actions-runner-system controller-manager || :
if [ "${tpe}" == "token" ]; then
if ! kubectl get secret controller-manager -n actions-runner-system >/dev/null; then
kubectl create secret generic controller-manager \
@@ -16,16 +18,29 @@ elif [ "${tpe}" == "app" ]; then
kubectl create secret generic controller-manager \
-n actions-runner-system \
--from-literal=github_app_id=${APP_ID:?must not be empty} \
--from-literal=github_app_installation_id=${INSTALLATION_ID:?must not be empty} \
--from-file=github_app_private_key=${PRIVATE_KEY_FILE_PATH:?must not be empty}
--from-literal=github_app_installation_id=${APP_INSTALLATION_ID:?must not be empty} \
--from-file=github_app_private_key=${APP_PRIVATE_KEY_FILE:?must not be empty}
else
echo "ACCEPTANCE_TEST_SECRET_TYPE must be set to either \"token\" or \"app\"" 1>&2
exit 1
fi
if [ -n "${WEBHOOK_GITHUB_TOKEN}" ]; then
kubectl -n actions-runner-system delete secret \
github-webhook-server || :
kubectl -n actions-runner-system create secret generic \
github-webhook-server \
--from-literal=github_token=${WEBHOOK_GITHUB_TOKEN:?WEBHOOK_GITHUB_TOKEN must not be empty}
else
echo 'Skipped deploying secret "github-webhook-server". Set WEBHOOK_GITHUB_TOKEN to deploy.' 1>&2
fi
tool=${ACCEPTANCE_TEST_DEPLOYMENT_TOOL}
TEST_ID=${TEST_ID:-default}
if [ "${tool}" == "helm" ]; then
set -v
helm upgrade --install actions-runner-controller \
charts/actions-runner-controller \
-n actions-runner-system \
@@ -34,42 +49,30 @@ if [ "${tool}" == "helm" ]; then
--set authSecret.create=false \
--set image.repository=${NAME} \
--set image.tag=${VERSION} \
--set podAnnotations.test-id=${TEST_ID} \
--set githubWebhookServer.podAnnotations.test-id=${TEST_ID} \
-f ${VALUES_FILE}
kubectl apply -f charts/actions-runner-controller/crds
kubectl -n actions-runner-system wait deploy/actions-runner-controller --for condition=available --timeout 60s
set +v
# To prevent `CustomResourceDefinition.apiextensions.k8s.io "runners.actions.summerwind.dev" is invalid: metadata.annotations: Too long: must have at most 262144 bytes`
# errors
kubectl create -f charts/actions-runner-controller/crds || kubectl replace -f charts/actions-runner-controller/crds
# This wait fails due to timeout when the deployment is already in CrashLoopBackOff and this update doesn't change the image tag.
# That's why we add `|| :`. With that we prevent stopping the script in case of timeout and
# proceed to delete the (possibly crash-looping and/or outdated-image) pods so that they are recreated by K8s.
kubectl -n actions-runner-system wait deploy/actions-runner-controller --for condition=available --timeout 60s || :
else
kubectl apply \
-n actions-runner-system \
-f release/actions-runner-controller.yaml
kubectl -n actions-runner-system wait deploy/controller-manager --for condition=available --timeout 120s
kubectl -n actions-runner-system wait deploy/controller-manager --for condition=available --timeout 120s || :
fi
# Restart all ARC pods
kubectl -n actions-runner-system delete po -l app.kubernetes.io/name=actions-runner-controller
echo Waiting for all ARC pods to be up and running after restart
kubectl -n actions-runner-system wait deploy/actions-runner-controller --for condition=available --timeout 120s
# Adhocly wait for some time until actions-runner-controller's admission webhook gets ready
sleep 20
RUNNER_LABEL=${RUNNER_LABEL:-self-hosted}
if [ -n "${TEST_REPO}" ]; then
if [ -n "USE_RUNNERSET" ]; then
cat acceptance/testdata/repo.runnerset.yaml | envsubst | kubectl apply -f -
cat acceptance/testdata/repo.runnerset.hra.yaml | envsubst | kubectl apply -f -
else
echo 'Deploying runnerdeployment and hra. Set USE_RUNNERSET if you want to deploy runnerset instead.'
cat acceptance/testdata/repo.runnerdeploy.yaml | envsubst | kubectl apply -f -
cat acceptance/testdata/repo.hra.yaml | envsubst | kubectl apply -f -
fi
else
echo 'Skipped deploying runnerdeployment and hra. Set TEST_REPO to "yourorg/yourrepo" to deploy.'
fi
if [ -n "${TEST_ORG}" ]; then
cat acceptance/testdata/org.runnerdeploy.yaml | envsubst | kubectl apply -f -
if [ -n "${TEST_ORG_REPO}" ]; then
cat acceptance/testdata/org.hra.yaml | envsubst | kubectl apply -f -
else
echo 'Skipped deploying organizational hra. Set TEST_ORG_REPO to "yourorg/yourrepo" to deploy.'
fi
else
echo 'Skipped deploying organizational runnerdeployment. Set TEST_ORG to deploy.'
fi

58
acceptance/deploy_runners.sh Executable file

@@ -0,0 +1,58 @@
#!/usr/bin/env bash
set -e
OP=${OP:-apply}
RUNNER_LABEL=${RUNNER_LABEL:-self-hosted}
if [ -n "${TEST_REPO}" ]; then
if [ "${USE_RUNNERSET}" != "false" ]; then
cat acceptance/testdata/runnerset.envsubst.yaml | TEST_ENTERPRISE= TEST_ORG= RUNNER_MIN_REPLICAS=${REPO_RUNNER_MIN_REPLICAS} NAME=repo-runnerset envsubst | kubectl ${OP} -f -
else
echo "Running ${OP} runnerdeployment and hra. Set USE_RUNNERSET if you want to deploy runnerset instead."
cat acceptance/testdata/runnerdeploy.envsubst.yaml | TEST_ENTERPRISE= TEST_ORG= RUNNER_MIN_REPLICAS=${REPO_RUNNER_MIN_REPLICAS} NAME=repo-runnerdeploy envsubst | kubectl ${OP} -f -
fi
else
echo "Skipped ${OP} for runnerdeployment and hra. Set TEST_REPO to "yourorg/yourrepo" to deploy."
fi
if [ -n "${TEST_ORG}" ]; then
if [ "${USE_RUNNERSET}" != "false" ]; then
cat acceptance/testdata/runnerset.envsubst.yaml | TEST_ENTERPRISE= TEST_REPO= RUNNER_MIN_REPLICAS=${ORG_RUNNER_MIN_REPLICAS} NAME=org-runnerset envsubst | kubectl ${OP} -f -
else
cat acceptance/testdata/runnerdeploy.envsubst.yaml | TEST_ENTERPRISE= TEST_REPO= RUNNER_MIN_REPLICAS=${ORG_RUNNER_MIN_REPLICAS} NAME=org-runnerdeploy envsubst | kubectl ${OP} -f -
fi
if [ -n "${TEST_ORG_GROUP}" ]; then
if [ "${USE_RUNNERSET}" != "false" ]; then
cat acceptance/testdata/runnerset.envsubst.yaml | TEST_ENTERPRISE= TEST_REPO= RUNNER_MIN_REPLICAS=${ORG_RUNNER_MIN_REPLICAS} TEST_GROUP=${TEST_ORG_GROUP} NAME=orggroup-runnerset envsubst | kubectl ${OP} -f -
else
cat acceptance/testdata/runnerdeploy.envsubst.yaml | TEST_ENTERPRISE= TEST_REPO= RUNNER_MIN_REPLICAS=${ORG_RUNNER_MIN_REPLICAS} TEST_GROUP=${TEST_ORG_GROUP} NAME=orggroup-runnerdeploy envsubst | kubectl ${OP} -f -
fi
else
echo "Skipped ${OP} on enterprise runnerdeployment. Set TEST_ORG_GROUP to ${OP}."
fi
else
echo "Skipped ${OP} on organizational runnerdeployment. Set TEST_ORG to ${OP}."
fi
if [ -n "${TEST_ENTERPRISE}" ]; then
if [ "${USE_RUNNERSET}" != "false" ]; then
cat acceptance/testdata/runnerset.envsubst.yaml | TEST_ORG= TEST_REPO= RUNNER_MIN_REPLICAS=${ENTERPRISE_RUNNER_MIN_REPLICAS} NAME=enterprise-runnerset envsubst | kubectl ${OP} -f -
else
cat acceptance/testdata/runnerdeploy.envsubst.yaml | TEST_ORG= TEST_REPO= RUNNER_MIN_REPLICAS=${ENTERPRISE_RUNNER_MIN_REPLICAS} NAME=enterprise-runnerdeploy envsubst | kubectl ${OP} -f -
fi
if [ -n "${TEST_ENTERPRISE_GROUP}" ]; then
if [ "${USE_RUNNERSET}" != "false" ]; then
cat acceptance/testdata/runnerset.envsubst.yaml | TEST_ORG= TEST_REPO= RUNNER_MIN_REPLICAS=${ENTERPRISE_RUNNER_MIN_REPLICAS} TEST_GROUP=${TEST_ENTERPRISE_GROUP} NAME=enterprisegroup-runnerset envsubst | kubectl ${OP} -f -
else
cat acceptance/testdata/runnerdeploy.envsubst.yaml | TEST_ORG= TEST_REPO= RUNNER_MIN_REPLICAS=${ENTERPRISE_RUNNER_MIN_REPLICAS} TEST_GROUP=${TEST_ENTERPRISE_GROUP} NAME=enterprisegroup-runnerdeploy envsubst | kubectl ${OP} -f -
fi
else
echo "Skipped ${OP} on enterprise runnerdeployment. Set TEST_ENTERPRISE_GROUP to ${OP}."
fi
else
echo "Skipped ${OP} on enterprise runnerdeployment. Set TEST_ENTERPRISE to ${OP}."
fi
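For reference, a hypothetical invocation of `acceptance/deploy_runners.sh`; the repository and organization names are placeholders, and the runner image variables consumed by the templates (`RUNNER_NAME`, `RUNNER_TAG`, `TEST_EPHEMERAL`, ...) are assumed to be exported by the surrounding acceptance tooling:
```shell
# Apply a repository-scoped and an organization-scoped RunnerDeployment + HRA
# (USE_RUNNERSET=false selects RunnerDeployment; leaving it unset deploys RunnerSets).
OP=apply \
TEST_REPO=yourorg/yourrepo \
TEST_ORG=yourorg \
USE_RUNNERSET=false \
REPO_RUNNER_MIN_REPLICAS=0 \
ORG_RUNNER_MIN_REPLICAS=1 \
  acceptance/deploy_runners.sh

# The same variables with OP=delete tear the resources down again.
```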


@@ -1,36 +0,0 @@
apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
name: org
spec:
scaleTargetRef:
name: org-runnerdeploy
scaleUpTriggers:
- githubEvent:
checkRun:
types: ["created"]
status: "queued"
amount: 1
duration: "1m"
scheduledOverrides:
- startTime: "2021-05-11T16:05:00+09:00"
endTime: "2021-05-11T16:40:00+09:00"
minReplicas: 2
- startTime: "2021-05-01T00:00:00+09:00"
endTime: "2021-05-03T00:00:00+09:00"
recurrenceRule:
frequency: Weekly
untilTime: "2022-05-01T00:00:00+09:00"
minReplicas: 0
minReplicas: 0
maxReplicas: 5
# Used to test that HRA is working for org runners
metrics:
- type: PercentageRunnersBusy
scaleUpThreshold: '0.75'
scaleDownThreshold: '0.3'
scaleUpFactor: '2'
scaleDownFactor: '0.5'
- type: TotalNumberOfQueuedAndInProgressWorkflowRuns
repositoryNames:
- ${TEST_ORG_REPO}


@@ -1,25 +0,0 @@
apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
name: actions-runner-aos-autoscaler
spec:
scaleTargetRef:
name: example-runnerdeploy
scaleUpTriggers:
- githubEvent:
checkRun:
types: ["created"]
status: "queued"
amount: 1
duration: "1m"
minReplicas: 0
maxReplicas: 5
metrics:
- type: PercentageRunnersBusy
scaleUpThreshold: '0.75'
scaleDownThreshold: '0.3'
scaleUpFactor: '2'
scaleDownFactor: '0.5'
- type: TotalNumberOfQueuedAndInProgressWorkflowRuns
repositoryNames:
- ${TEST_REPO}


@@ -1,37 +0,0 @@
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
name: example-runnerdeploy
spec:
# replicas: 1
template:
spec:
repository: ${TEST_REPO}
#
# Custom runner image
#
image: ${RUNNER_NAME}:${RUNNER_TAG}
imagePullPolicy: IfNotPresent
#
# dockerd within runner container
#
## Replace `mumoshu/actions-runner-dind:dev` with your dind image
#dockerdWithinRunnerContainer: true
#image: mumoshu/actions-runner-dind:dev
#
# Set the MTU used by dockerd-managed network interfaces (including docker-build-ubuntu)
#
#dockerMTU: 1450
#Runner group
# labels:
# - "mylabel 1"
# - "mylabel 2"
#
# Non-standard working directory
#
# workDir: "/"


@@ -1,29 +0,0 @@
apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
name: example-runnerset
spec:
scaleTargetRef:
kind: RunnerSet
name: example-runnerset
scaleUpTriggers:
- githubEvent:
checkRun:
types: ["created"]
status: "queued"
amount: 1
duration: "1m"
# RunnerSet doesn't support scale from/to zero yet
minReplicas: 1
maxReplicas: 5
# This should be less than 600(seconds, the default) for faster testing
scaleDownDelaySecondsAfterScaleOut: 60
metrics:
- type: PercentageRunnersBusy
scaleUpThreshold: '0.75'
scaleDownThreshold: '0.3'
scaleUpFactor: '2'
scaleDownFactor: '0.5'
- type: TotalNumberOfQueuedAndInProgressWorkflowRuns
repositoryNames:
- ${TEST_REPO}


@@ -1,59 +0,0 @@
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerSet
metadata:
name: example-runnerset
spec:
# MANDATORY because it is based on StatefulSet: Results in a below error when omitted:
# missing required field "selector" in dev.summerwind.actions.v1alpha1.RunnerSet.spec
selector:
matchLabels:
app: example-runnerset
# MANDATORY because it is based on StatefulSet: Results in a below error when omitted:
# missing required field "serviceName" in dev.summerwind.actions.v1alpha1.RunnerSet.spec]
serviceName: example-runnerset
#replicas: 1
# From my limited testing, `ephemeral: true` is more reliable.
# Sometimes, updating already deployed runners from `ephemeral: false` to `ephemeral: true` seems to
# result in queued jobs hanging forever.
ephemeral: ${TEST_EPHEMERAL}
repository: ${TEST_REPO}
#
# Custom runner image
#
image: ${RUNNER_NAME}:${RUNNER_TAG}
#
# dockerd within runner container
#
## Replace `mumoshu/actions-runner-dind:dev` with your dind image
#dockerdWithinRunnerContainer: true
#
# Set the MTU used by dockerd-managed network interfaces (including docker-build-ubuntu)
#
#dockerMTU: 1450
#Runner group
# labels:
# - "mylabel 1"
# - "mylabel 2"
labels:
- "${RUNNER_LABEL}"
#
# Non-standard working directory
#
# workDir: "/"
template:
metadata:
labels:
app: example-runnerset
spec:
containers:
- name: runner
imagePullPolicy: IfNotPresent
env:
- name: RUNNER_FEATURE_FLAG_EPHEMERAL
value: "${RUNNER_FEATURE_FLAG_EPHEMERAL}"
#- name: docker
# #image: mumoshu/actions-runner-dind:dev


@@ -1,12 +1,15 @@
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
name: org-runnerdeploy
name: ${NAME}
spec:
# replicas: 1
template:
spec:
enterprise: ${TEST_ENTERPRISE}
group: ${TEST_GROUP}
organization: ${TEST_ORG}
repository: ${TEST_REPO}
#
# Custom runner image
@@ -14,12 +17,15 @@ spec:
image: ${RUNNER_NAME}:${RUNNER_TAG}
imagePullPolicy: IfNotPresent
ephemeral: ${TEST_EPHEMERAL}
#
# dockerd within runner container
#
## Replace `mumoshu/actions-runner-dind:dev` with your dind image
#dockerdWithinRunnerContainer: true
#image: mumoshu/actions-runner-dind:dev
dockerdWithinRunnerContainer: ${RUNNER_DOCKERD_WITHIN_RUNNER_CONTAINER}
#
# Set the MTU used by dockerd-managed network interfaces (including docker-build-ubuntu)
@@ -30,8 +36,26 @@ spec:
# labels:
# - "mylabel 1"
# - "mylabel 2"
labels:
- "${RUNNER_LABEL}"
#
# Non-standard working directory
#
# workDir: "/"
---
apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
name: ${NAME}
spec:
scaleTargetRef:
name: ${NAME}
scaleUpTriggers:
- githubEvent:
workflowJob: {}
amount: 1
duration: "10m"
minReplicas: ${RUNNER_MIN_REPLICAS}
maxReplicas: 10
scaleDownDelaySecondsAfterScaleOut: ${RUNNER_SCALE_DOWN_DELAY_SECONDS_AFTER_SCALE_OUT}


@@ -0,0 +1,263 @@
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: ${NAME}-runner-work-dir
labels:
content: ${NAME}-runner-work-dir
provisioner: rancher.io/local-path
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: ${NAME}
# In kind environments, the provider writes:
# /var/lib/docker/volumes/KIND_NODE_CONTAINER_VOL_ID/_data/local-path-provisioner/PV_NAME
# It can grow to hundreds of gigabytes depending on what you cache in the test workflow. Beware of `no space left on device` errors!
# If you do encounter no-space errors, try:
# docker system prune
# docker buildx prune #=> frees up /var/lib/docker/volumes/buildx_buildkit_container-builder0_state
# sudo rm -rf /var/lib/docker/volumes/KIND_NODE_CONTAINER_VOL_ID/_data/local-path-provisioner #=> frees up local-path-provisioner's data
provisioner: rancher.io/local-path
reclaimPolicy: Retain
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: ${NAME}-var-lib-docker
labels:
content: ${NAME}-var-lib-docker
provisioner: rancher.io/local-path
reclaimPolicy: Retain
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: ${NAME}-cache
labels:
content: ${NAME}-cache
provisioner: rancher.io/local-path
reclaimPolicy: Retain
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: ${NAME}-runner-tool-cache
labels:
content: ${NAME}-runner-tool-cache
provisioner: rancher.io/local-path
reclaimPolicy: Retain
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerSet
metadata:
name: ${NAME}
spec:
# MANDATORY because it is based on StatefulSet: results in the error below when omitted:
# missing required field "selector" in dev.summerwind.actions.v1alpha1.RunnerSet.spec
selector:
matchLabels:
app: ${NAME}
# MANDATORY because it is based on StatefulSet: results in the error below when omitted:
# missing required field "serviceName" in dev.summerwind.actions.v1alpha1.RunnerSet.spec
serviceName: ${NAME}
#replicas: 1
# From my limited testing, `ephemeral: true` is more reliable.
# Sometimes, updating already deployed runners from `ephemeral: false` to `ephemeral: true` seems to
# result in queued jobs hanging forever.
ephemeral: ${TEST_EPHEMERAL}
enterprise: ${TEST_ENTERPRISE}
group: ${TEST_GROUP}
organization: ${TEST_ORG}
repository: ${TEST_REPO}
#
# Custom runner image
#
image: ${RUNNER_NAME}:${RUNNER_TAG}
#
# dockerd within runner container
#
## Replace `mumoshu/actions-runner-dind:dev` with your dind image
#dockerdWithinRunnerContainer: true
dockerdWithinRunnerContainer: ${RUNNER_DOCKERD_WITHIN_RUNNER_CONTAINER}
#
# Set the MTU used by dockerd-managed network interfaces (including docker-build-ubuntu)
#
#dockerMTU: 1450
#Runner group
# labels:
# - "mylabel 1"
# - "mylabel 2"
labels:
- "${RUNNER_LABEL}"
#
# Non-standard working directory
#
# workDir: "/"
template:
metadata:
labels:
app: ${NAME}
spec:
containers:
- name: runner
imagePullPolicy: IfNotPresent
env:
- name: RUNNER_FEATURE_FLAG_EPHEMERAL
value: "${RUNNER_FEATURE_FLAG_EPHEMERAL}"
- name: GOMODCACHE
value: "/home/runner/.cache/go-mod"
# PV-backed runner work dir
volumeMounts:
# Comment out the ephemeral work volume if you're going to test the kubernetes container mode
# The volume and mount with the same names will be created by workVolumeClaimTemplate and the kubernetes container mode support.
# - name: work
# mountPath: /runner/_work
# Cache docker image layers, in case dockerdWithinRunnerContainer=true
- name: var-lib-docker
mountPath: /var/lib/docker
# Cache go modules and builds
# - name: gocache
# # Run `go env | grep GOCACHE` to verify the path is correct for your env
# mountPath: /home/runner/.cache/go-build
# - name: gomodcache
# # Run `go env | grep GOMODCACHE` to verify the path is correct for your env
# # mountPath: /home/runner/go/pkg/mod
- name: cache
# go: could not create module cache: stat /home/runner/.cache/go-mod: permission denied
mountPath: "/home/runner/.cache"
- name: runner-tool-cache
# This corresponds to our runner image's default setting of RUNNER_TOOL_CACHE=/opt/hostedtoolcache.
#
# In case you customize the env var in both the runner and docker containers of the runner pod spec,
# you'd need to change this mountPath accordingly.
#
# The tool cache directory is defined in actions/toolkit's tool-cache module:
# https://github.com/actions/toolkit/blob/2f164000dcd42fb08287824a3bc3030dbed33687/packages/tool-cache/src/tool-cache.ts#L621-L638
#
# Many setup-* actions like setup-go utilizes the tool-cache module to download and cache installed binaries:
# https://github.com/actions/setup-go/blob/56a61c9834b4a4950dbbf4740af0b8a98c73b768/src/installer.ts#L144
mountPath: "/opt/hostedtoolcache"
# Valid only when dockerdWithinRunnerContainer=false
- name: docker
# PV-backed runner work dir
volumeMounts:
- name: work
mountPath: /runner/_work
# Cache docker image layers, in case dockerdWithinRunnerContainer=false
- name: var-lib-docker
mountPath: /var/lib/docker
# image: mumoshu/actions-runner-dind:dev
# For buildx cache
- name: cache
mountPath: "/home/runner/.cache"
# Comment out the ephemeral work volume if you're going to test the kubernetes container mode
# volumes:
# - name: work
# ephemeral:
# volumeClaimTemplate:
# spec:
# accessModes:
# - ReadWriteOnce
# storageClassName: "${NAME}-runner-work-dir"
# resources:
# requests:
# storage: 10Gi
volumeClaimTemplates:
- metadata:
name: vol1
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 10Mi
storageClassName: ${NAME}
## It's unclear which provisioner supports auto-provisioning with a selector.
## At least the rancher local-path provisioner stopped with:
## waiting for a volume to be created, either by external provisioner "rancher.io/local-path" or manually created by system administrator
# selector:
# matchLabels:
# runnerset-volume-id: ${NAME}-vol1
- metadata:
name: vol2
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 10Mi
storageClassName: ${NAME}
# selector:
# matchLabels:
# runnerset-volume-id: ${NAME}-vol2
- metadata:
name: var-lib-docker
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 10Mi
storageClassName: ${NAME}-var-lib-docker
- metadata:
name: cache
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 10Mi
storageClassName: ${NAME}-cache
- metadata:
name: runner-tool-cache
# It turns out labels don't distinguish PVs across PVCs, and the
# end result is that PVs get reused by the wrong PVCs.
# The correct way seems to be to use a distinct storage class per PVC template.
# labels:
# id: runner-tool-cache
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 10Mi
storageClassName: ${NAME}-runner-tool-cache
---
apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
name: ${NAME}
spec:
scaleTargetRef:
kind: RunnerSet
name: ${NAME}
scaleUpTriggers:
- githubEvent:
workflowJob: {}
amount: 1
duration: "10m"
minReplicas: ${RUNNER_MIN_REPLICAS}
maxReplicas: 10
scaleDownDelaySecondsAfterScaleOut: ${RUNNER_SCALE_DOWN_DELAY_SECONDS_AFTER_SCALE_OUT}
# Comment out the whole metrics section if you'd like to test webhook-based scaling only
metrics:
- type: PercentageRunnersBusy
scaleUpThreshold: '0.75'
scaleDownThreshold: '0.25'
scaleUpFactor: '2'
scaleDownFactor: '0.5'
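After applying the RunnerSet above (assuming it was rendered with `NAME=repo-runnerset`), a quick way to check that each `volumeClaimTemplates` entry produced PVCs bound through its own StorageClass:
```shell
# Pods carry the app=<NAME> label from the pod template above.
kubectl get pods -l app=repo-runnerset
# PVC names are derived from the claim template and StatefulSet names.
kubectl get pvc | grep repo-runnerset
# Confirm each PV landed on the intended StorageClass.
kubectl get pv -o custom-columns=NAME:.metadata.name,STORAGECLASS:.spec.storageClassName,CLAIM:.spec.claimRef.name
```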


@@ -1,12 +1,15 @@
# Set actions-runner-controller settings for testing
githubAPICacheDuration: 10s
logLevel: "-4"
githubWebhookServer:
logLevel: "-4"
enabled: true
labels: {}
replicaCount: 1
syncPeriod: 10m
useRunnerGroupsVisibility: true
secret:
create: true
enabled: true
# create: true
name: "github-webhook-server"
### GitHub Webhook Configuration
#github_webhook_secret_token: ""


@@ -29,7 +29,7 @@ type HorizontalRunnerAutoscalerSpec struct {
// +optional
MinReplicas *int `json:"minReplicas,omitempty"`
// MinReplicas is the maximum number of replicas the deployment is allowed to scale
// MaxReplicas is the maximum number of replicas the deployment is allowed to scale
// +optional
MaxReplicas *int `json:"maxReplicas,omitempty"`
@@ -72,10 +72,12 @@ type GitHubEventScaleUpTriggerSpec struct {
CheckRun *CheckRunSpec `json:"checkRun,omitempty"`
PullRequest *PullRequestSpec `json:"pullRequest,omitempty"`
Push *PushSpec `json:"push,omitempty"`
WorkflowJob *WorkflowJobSpec `json:"workflowJob,omitempty"`
}
// https://docs.github.com/en/actions/reference/events-that-trigger-workflows#check_run
type CheckRunSpec struct {
// One of: created, rerequested, or completed
Types []string `json:"types,omitempty"`
Status string `json:"status,omitempty"`
@@ -90,6 +92,10 @@ type CheckRunSpec struct {
Repositories []string `json:"repositories,omitempty"`
}
// https://docs.github.com/en/developers/webhooks-and-events/webhooks/webhook-events-and-payloads#workflow_job
type WorkflowJobSpec struct {
}
// https://docs.github.com/en/actions/reference/events-that-trigger-workflows#pull_request
type PullRequestSpec struct {
Types []string `json:"types,omitempty"`
@@ -107,6 +113,9 @@ type CapacityReservation struct {
Name string `json:"name,omitempty"`
ExpirationTime metav1.Time `json:"expirationTime,omitempty"`
Replicas int `json:"replicas,omitempty"`
// +optional
EffectiveTime metav1.Time `json:"effectiveTime,omitempty"`
}
type ScaleTargetRef struct {


@@ -18,8 +18,10 @@ package v1alpha1
import (
"errors"
"fmt"
"k8s.io/apimachinery/pkg/api/resource"
"k8s.io/apimachinery/pkg/util/validation/field"
corev1 "k8s.io/api/core/v1"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
@@ -71,6 +73,9 @@ type RunnerConfig struct {
VolumeSizeLimit *resource.Quantity `json:"volumeSizeLimit,omitempty"`
// +optional
VolumeStorageMedium *string `json:"volumeStorageMedium,omitempty"`
// +optional
ContainerMode string `json:"containerMode,omitempty"`
}
// RunnerPodSpec defines the desired pod spec fields of the runner pod
@@ -81,6 +86,9 @@ type RunnerPodSpec struct {
// +optional
DockerVolumeMounts []corev1.VolumeMount `json:"dockerVolumeMounts,omitempty"`
// +optional
DockerEnv []corev1.EnvVar `json:"dockerEnv,omitempty"`
// +optional
Containers []corev1.Container `json:"containers,omitempty"`
@@ -132,6 +140,9 @@ type RunnerPodSpec struct {
// +optional
Tolerations []corev1.Toleration `json:"tolerations,omitempty"`
// +optional
PriorityClassName string `json:"priorityClassName,omitempty"`
// +optional
TerminationGracePeriodSeconds *int64 `json:"terminationGracePeriodSeconds,omitempty"`
@@ -141,17 +152,47 @@ type RunnerPodSpec struct {
// +optional
HostAliases []corev1.HostAlias `json:"hostAliases,omitempty"`
// +optional
TopologySpreadConstraints []corev1.TopologySpreadConstraint `json:"topologySpreadConstraints,omitempty"`
// RuntimeClassName is the container runtime configuration that containers should run under.
// More info: https://kubernetes.io/docs/concepts/containers/runtime-class
// +optional
RuntimeClassName *string `json:"runtimeClassName,omitempty"`
// +optional
DnsConfig []corev1.PodDNSConfig `json:"dnsConfig,omitempty"`
DnsConfig *corev1.PodDNSConfig `json:"dnsConfig,omitempty"`
// +optional
WorkVolumeClaimTemplate *WorkVolumeClaimTemplate `json:"workVolumeClaimTemplate,omitempty"`
}
func (rs *RunnerSpec) Validate(rootPath *field.Path) field.ErrorList {
var (
errList field.ErrorList
err error
)
err = rs.validateRepository()
if err != nil {
errList = append(errList, field.Invalid(rootPath.Child("repository"), rs.Repository, err.Error()))
}
err = rs.validateWorkVolumeClaimTemplate()
if err != nil {
errList = append(errList, field.Invalid(rootPath.Child("workVolumeClaimTemplate"), rs.WorkVolumeClaimTemplate, err.Error()))
}
err = rs.validateIsServiceAccountNameSet()
if err != nil {
errList = append(errList, field.Invalid(rootPath.Child("serviceAccountName"), rs.ServiceAccountName, err.Error()))
}
return errList
}
// ValidateRepository validates repository field.
func (rs *RunnerSpec) ValidateRepository() error {
func (rs *RunnerSpec) validateRepository() error {
// Enterprise, Organization and Repository are mutually exclusive.
foundCount := 0
if len(rs.Organization) > 0 {
@@ -173,8 +214,34 @@ func (rs *RunnerSpec) ValidateRepository() error {
return nil
}
func (rs *RunnerSpec) validateWorkVolumeClaimTemplate() error {
if rs.ContainerMode != "kubernetes" {
return nil
}
if rs.WorkVolumeClaimTemplate == nil {
return errors.New("Spec.ContainerMode: kubernetes must have workVolumeClaimTemplate field specified")
}
return rs.WorkVolumeClaimTemplate.validate()
}
func (rs *RunnerSpec) validateIsServiceAccountNameSet() error {
if rs.ContainerMode != "kubernetes" {
return nil
}
if rs.ServiceAccountName == "" {
return errors.New("service account name is required if container mode is kubernetes")
}
return nil
}
// RunnerStatus defines the observed state of Runner
type RunnerStatus struct {
// Turns true only if the runner pod is ready.
// +optional
Ready bool `json:"ready"`
// +optional
Registration RunnerStatusRegistration `json:"registration"`
// +optional
@@ -198,6 +265,51 @@ type RunnerStatusRegistration struct {
ExpiresAt metav1.Time `json:"expiresAt"`
}
type WorkVolumeClaimTemplate struct {
StorageClassName string `json:"storageClassName"`
AccessModes []corev1.PersistentVolumeAccessMode `json:"accessModes"`
Resources corev1.ResourceRequirements `json:"resources"`
}
func (w *WorkVolumeClaimTemplate) validate() error {
if len(w.AccessModes) == 0 {
return errors.New("at least one access mode must be specified")
}
for _, accessMode := range w.AccessModes {
switch accessMode {
case corev1.ReadWriteOnce, corev1.ReadWriteMany:
default:
return fmt.Errorf("Access mode %v is not supported", accessMode)
}
}
return nil
}
func (w *WorkVolumeClaimTemplate) V1Volume() corev1.Volume {
return corev1.Volume{
Name: "work",
VolumeSource: corev1.VolumeSource{
Ephemeral: &corev1.EphemeralVolumeSource{
VolumeClaimTemplate: &corev1.PersistentVolumeClaimTemplate{
Spec: corev1.PersistentVolumeClaimSpec{
AccessModes: w.AccessModes,
StorageClassName: &w.StorageClassName,
Resources: w.Resources,
},
},
},
},
}
}
func (w *WorkVolumeClaimTemplate) V1VolumeMount(mountPath string) corev1.VolumeMount {
return corev1.VolumeMount{
MountPath: mountPath,
Name: "work",
}
}
// +kubebuilder:object:root=true
// +kubebuilder:subresource:status
// +kubebuilder:printcolumn:JSONPath=".spec.enterprise",name=Enterprise,type=string
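To illustrate the validation added above (`containerMode: kubernetes` requires both a `workVolumeClaimTemplate` and a `serviceAccountName`), here is a minimal, hypothetical RunnerDeployment that should pass the webhook; all names and sizes are placeholders:
```shell
cat <<'EOF' | kubectl apply -f -
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: k8s-mode-example
spec:
  template:
    spec:
      repository: yourorg/yourrepo
      containerMode: kubernetes
      # Both of the following are required when containerMode is kubernetes:
      serviceAccountName: runner-work
      workVolumeClaimTemplate:
        storageClassName: standard
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 10Gi
EOF
```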


@@ -66,15 +66,7 @@ func (r *Runner) ValidateDelete() error {
// Validate validates resource spec.
func (r *Runner) Validate() error {
var (
errList field.ErrorList
err error
)
err = r.Spec.ValidateRepository()
if err != nil {
errList = append(errList, field.Invalid(field.NewPath("spec", "repository"), r.Spec.Repository, err.Error()))
}
errList := r.Spec.Validate(field.NewPath("spec"))
if len(errList) > 0 {
return apierrors.NewInvalid(r.GroupVersionKind().GroupKind(), r.Name, errList)


@@ -31,6 +31,14 @@ type RunnerDeploymentSpec struct {
// +nullable
Replicas *int `json:"replicas,omitempty"`
// EffectiveTime is the time the upstream controller requested to sync Replicas.
// It is usually populated by the webhook-based autoscaler via HRA.
// The value is inherited by RunnerReplicaSet(s) and used to prevent ephemeral runners from being unnecessarily recreated.
//
// +optional
// +nullable
EffectiveTime *metav1.Time `json:"effectiveTime"`
// +optional
// +nullable
Selector *metav1.LabelSelector `json:"selector"`


@@ -26,7 +26,7 @@ import (
)
// log is for logging in this package.
var runenrDeploymentLog = logf.Log.WithName("runnerdeployment-resource")
var runnerDeploymentLog = logf.Log.WithName("runnerdeployment-resource")
func (r *RunnerDeployment) SetupWebhookWithManager(mgr ctrl.Manager) error {
return ctrl.NewWebhookManagedBy(mgr).
@@ -49,13 +49,13 @@ var _ webhook.Validator = &RunnerDeployment{}
// ValidateCreate implements webhook.Validator so a webhook will be registered for the type
func (r *RunnerDeployment) ValidateCreate() error {
runenrDeploymentLog.Info("validate resource to be created", "name", r.Name)
runnerDeploymentLog.Info("validate resource to be created", "name", r.Name)
return r.Validate()
}
// ValidateUpdate implements webhook.Validator so a webhook will be registered for the type
func (r *RunnerDeployment) ValidateUpdate(old runtime.Object) error {
runenrDeploymentLog.Info("validate resource to be updated", "name", r.Name)
runnerDeploymentLog.Info("validate resource to be updated", "name", r.Name)
return r.Validate()
}
@@ -66,15 +66,7 @@ func (r *RunnerDeployment) ValidateDelete() error {
// Validate validates resource spec.
func (r *RunnerDeployment) Validate() error {
var (
errList field.ErrorList
err error
)
err = r.Spec.Template.Spec.ValidateRepository()
if err != nil {
errList = append(errList, field.Invalid(field.NewPath("spec", "template", "spec", "repository"), r.Spec.Template.Spec.Repository, err.Error()))
}
errList := r.Spec.Template.Spec.Validate(field.NewPath("spec", "template", "spec"))
if len(errList) > 0 {
return apierrors.NewInvalid(r.GroupVersionKind().GroupKind(), r.Name, errList)


@@ -26,6 +26,15 @@ type RunnerReplicaSetSpec struct {
// +nullable
Replicas *int `json:"replicas,omitempty"`
// EffectiveTime is the time the upstream controller requested to sync Replicas.
// It is usually populated by the webhook-based autoscaler via HRA and RunnerDeployment.
// The value is used to prevent the runnerreplicaset controller from unnecessarily recreating ephemeral runners
// based on a potentially outdated Replicas value.
//
// +optional
// +nullable
EffectiveTime *metav1.Time `json:"effectiveTime"`
// +optional
// +nullable
Selector *metav1.LabelSelector `json:"selector"`


@@ -66,15 +66,7 @@ func (r *RunnerReplicaSet) ValidateDelete() error {
// Validate validates resource spec.
func (r *RunnerReplicaSet) Validate() error {
var (
errList field.ErrorList
err error
)
err = r.Spec.Template.Spec.ValidateRepository()
if err != nil {
errList = append(errList, field.Invalid(field.NewPath("spec", "template", "spec", "repository"), r.Spec.Template.Spec.Repository, err.Error()))
}
errList := r.Spec.Template.Spec.Validate(field.NewPath("spec", "template", "spec"))
if len(errList) > 0 {
return apierrors.NewInvalid(r.GroupVersionKind().GroupKind(), r.Name, errList)


@@ -25,6 +25,20 @@ import (
type RunnerSetSpec struct {
RunnerConfig `json:",inline"`
// EffectiveTime is the time the upstream controller requested to sync Replicas.
// It is usually populated by the webhook-based autoscaler via HRA.
// It is used to prevent ephemeral runners from being unnecessarily recreated.
//
// +optional
// +nullable
EffectiveTime *metav1.Time `json:"effectiveTime,omitempty"`
// +optional
ServiceAccountName string `json:"serviceAccountName,omitempty"`
// +optional
WorkVolumeClaimTemplate *WorkVolumeClaimTemplate `json:"workVolumeClaimTemplate,omitempty"`
appsv1.StatefulSetSpec `json:",inline"`
}


@@ -47,6 +47,7 @@ func (in *CacheEntry) DeepCopy() *CacheEntry {
func (in *CapacityReservation) DeepCopyInto(out *CapacityReservation) {
*out = *in
in.ExpirationTime.DeepCopyInto(&out.ExpirationTime)
in.EffectiveTime.DeepCopyInto(&out.EffectiveTime)
}
// DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new CapacityReservation.
@@ -107,6 +108,11 @@ func (in *GitHubEventScaleUpTriggerSpec) DeepCopyInto(out *GitHubEventScaleUpTri
*out = new(PushSpec)
**out = **in
}
if in.WorkflowJob != nil {
in, out := &in.WorkflowJob, &out.WorkflowJob
*out = new(WorkflowJobSpec)
**out = **in
}
}
// DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new GitHubEventScaleUpTriggerSpec.
@@ -498,6 +504,10 @@ func (in *RunnerDeploymentSpec) DeepCopyInto(out *RunnerDeploymentSpec) {
*out = new(int)
**out = **in
}
if in.EffectiveTime != nil {
in, out := &in.EffectiveTime, &out.EffectiveTime
*out = (*in).DeepCopy()
}
if in.Selector != nil {
in, out := &in.Selector, &out.Selector
*out = new(metav1.LabelSelector)
@@ -599,6 +609,13 @@ func (in *RunnerPodSpec) DeepCopyInto(out *RunnerPodSpec) {
(*in)[i].DeepCopyInto(&(*out)[i])
}
}
if in.DockerEnv != nil {
in, out := &in.DockerEnv, &out.DockerEnv
*out = make([]v1.EnvVar, len(*in))
for i := range *in {
(*in)[i].DeepCopyInto(&(*out)[i])
}
}
if in.Containers != nil {
in, out := &in.Containers, &out.Containers
*out = make([]v1.Container, len(*in))
@@ -707,6 +724,13 @@ func (in *RunnerPodSpec) DeepCopyInto(out *RunnerPodSpec) {
(*in)[i].DeepCopyInto(&(*out)[i])
}
}
if in.TopologySpreadConstraints != nil {
in, out := &in.TopologySpreadConstraints, &out.TopologySpreadConstraints
*out = make([]v1.TopologySpreadConstraint, len(*in))
for i := range *in {
(*in)[i].DeepCopyInto(&(*out)[i])
}
}
if in.RuntimeClassName != nil {
in, out := &in.RuntimeClassName, &out.RuntimeClassName
*out = new(string)
@@ -714,10 +738,13 @@ func (in *RunnerPodSpec) DeepCopyInto(out *RunnerPodSpec) {
}
if in.DnsConfig != nil {
in, out := &in.DnsConfig, &out.DnsConfig
*out = make([]v1.PodDNSConfig, len(*in))
for i := range *in {
(*in)[i].DeepCopyInto(&(*out)[i])
}
*out = new(v1.PodDNSConfig)
(*in).DeepCopyInto(*out)
}
if in.WorkVolumeClaimTemplate != nil {
in, out := &in.WorkVolumeClaimTemplate, &out.WorkVolumeClaimTemplate
*out = new(WorkVolumeClaimTemplate)
(*in).DeepCopyInto(*out)
}
}
@@ -798,6 +825,10 @@ func (in *RunnerReplicaSetSpec) DeepCopyInto(out *RunnerReplicaSetSpec) {
*out = new(int)
**out = **in
}
if in.EffectiveTime != nil {
in, out := &in.EffectiveTime, &out.EffectiveTime
*out = (*in).DeepCopy()
}
if in.Selector != nil {
in, out := &in.Selector, &out.Selector
*out = new(metav1.LabelSelector)
@@ -909,6 +940,15 @@ func (in *RunnerSetList) DeepCopyObject() runtime.Object {
func (in *RunnerSetSpec) DeepCopyInto(out *RunnerSetSpec) {
*out = *in
in.RunnerConfig.DeepCopyInto(&out.RunnerConfig)
if in.EffectiveTime != nil {
in, out := &in.EffectiveTime, &out.EffectiveTime
*out = (*in).DeepCopy()
}
if in.WorkVolumeClaimTemplate != nil {
in, out := &in.WorkVolumeClaimTemplate, &out.WorkVolumeClaimTemplate
*out = new(WorkVolumeClaimTemplate)
(*in).DeepCopyInto(*out)
}
in.StatefulSetSpec.DeepCopyInto(&out.StatefulSetSpec)
}
@@ -1095,3 +1135,39 @@ func (in *ScheduledOverride) DeepCopy() *ScheduledOverride {
in.DeepCopyInto(out)
return out
}
// DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
func (in *WorkVolumeClaimTemplate) DeepCopyInto(out *WorkVolumeClaimTemplate) {
*out = *in
if in.AccessModes != nil {
in, out := &in.AccessModes, &out.AccessModes
*out = make([]v1.PersistentVolumeAccessMode, len(*in))
copy(*out, *in)
}
in.Resources.DeepCopyInto(&out.Resources)
}
// DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new WorkVolumeClaimTemplate.
func (in *WorkVolumeClaimTemplate) DeepCopy() *WorkVolumeClaimTemplate {
if in == nil {
return nil
}
out := new(WorkVolumeClaimTemplate)
in.DeepCopyInto(out)
return out
}
// DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
func (in *WorkflowJobSpec) DeepCopyInto(out *WorkflowJobSpec) {
*out = *in
}
// DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new WorkflowJobSpec.
func (in *WorkflowJobSpec) DeepCopy() *WorkflowJobSpec {
if in == nil {
return nil
}
out := new(WorkflowJobSpec)
in.DeepCopyInto(out)
return out
}


@@ -2,3 +2,4 @@
lint-conf: charts/.ci/lint-config.yaml
chart-repos:
- jetstack=https://charts.jetstack.io
check-version-increment: false # Disable checking that the chart version has been bumped


@@ -15,10 +15,10 @@ type: application
# This is the chart version. This version number should be incremented each time you make changes
# to the chart and its templates, including the app version.
# Versions are expected to follow Semantic Versioning (https://semver.org/)
version: 0.13.0
version: 0.20.0
# Used as the default manager tag value when no tag property is provided in the values.yaml
appVersion: 0.20.0
appVersion: 0.25.0
home: https://github.com/actions-runner-controller/actions-runner-controller


@@ -4,69 +4,88 @@ All additional docs are kept in the `docs/` folder, this README is solely for do
## Values
**_The values are documented as of HEAD, to review the configuration options for your chart version ensure you view this file at the relevent [tag](https://github.com/actions-runner-controller/actions-runner-controller/tags)_**
**_The values are documented as of HEAD, to review the configuration options for your chart version ensure you view this file at the relevant [tag](https://github.com/actions-runner-controller/actions-runner-controller/tags)_**
> _Default values are the defaults set in the charts values.yaml, some properties have default configurations in the code for when the property is omitted or invalid_
> _Default values are the defaults set in the charts `values.yaml`, some properties have default configurations in the code for when the property is omitted or invalid_
| Key | Description | Default |
|----------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------|
| `labels` | Set labels to apply to all resources in the chart | |
| `replicaCount` | Set the number of controller pods | 1 |
| `webhookPort` | Set the containerPort for the webhook Pod | 9443 |
| `syncPeriod` | Set the period in which the controller reconciles the desired runner count | 10m |
| `enableLeaderElection` | Enable election configuration | true |
| `leaderElectionId` | Set the election ID for the controller group | |
| `githubAPICacheDuration` | Set the cache period for API calls | |
| `githubEnterpriseServerURL` | Set the URL for a self-hosted GitHub Enterprise Server | |
| `githubURL` | Override GitHub URL to be used for GitHub API calls | |
| `githubUploadURL` | Override GitHub Upload URL to be used for GitHub API calls | |
| `runnerGithubURL` | Override GitHub URL to be used by runners during registration | |
| `logLevel` | Set the log level of the controller container | |
| `additionalVolumes` | Set additional volumes to add to the manager container | |
| `additionalVolumeMounts` | Set additional volume mounts to add to the manager container | |
| `authSecret.create` | Deploy the controller auth secret | false |
| `authSecret.name` | Set the name of the auth secret | controller-manager |
| `authSecret.annotations` | Set annotations for the auth Secret | |
| `authSecret.github_app_id` | The ID of your GitHub App. **This can't be set at the same time as `authSecret.github_token`** | |
| `authSecret.github_app_installation_id` | The ID of your GitHub App installation. **This can't be set at the same time as `authSecret.github_token`** | |
| `authSecret.github_app_private_key` | The multiline string of your GitHub App's private key. **This can't be set at the same time as `authSecret.github_token`** | |
| `authSecret.github_token` | Your chosen GitHub PAT token. **This can't be set at the same time as the `authSecret.github_app_*`** | |
| `authSecret.github_basicauth_username` | Username for GitHub basic auth to use instead of PAT or GitHub APP in case it's running behind a proxy API | |
| `authSecret.github_basicauth_password` | Password for GitHub basic auth to use instead of PAT or GitHub APP in case it's running behind a proxy API | |
| `dockerRegistryMirror` | The default Docker Registry Mirror used by runners. | |
| `hostNetwork` | The "hostNetwork" of the controller container | false |
| `image.repository` | The "repository/image" of the controller container | summerwind/actions-runner-controller |
| `image.tag` | The tag of the controller container | |
| `image.actionsRunnerRepositoryAndTag` | The "repository/image" of the actions runner container | summerwind/actions-runner:latest |
| `image.actionsRunnerImagePullSecrets` | Optional image pull secrets to be included in the runner pod's ImagePullSecrets | |
| `image.dindSidecarRepositoryAndTag` | The "repository/image" of the dind sidecar container | docker:dind |
| `image.pullPolicy` | The pull policy of the controller image | IfNotPresent |
| `metrics.serviceMonitor` | Deploy a ServiceMonitor kind for use with prometheus-operator CRDs | false |
| `metrics.serviceAnnotations` | Set annotations for the provisioned metrics service resource | |
| `metrics.port` | Set port of metrics service | 8443 |
| `metrics.proxy.enabled` | Deploy kube-rbac-proxy container in controller pod | true |
| `metrics.proxy.image.repository` | The "repository/image" of the kube-proxy container | quay.io/brancz/kube-rbac-proxy |
| `metrics.proxy.image.tag` | The tag of the kube-proxy image to use when pulling the container | v0.10.0 |
| `metrics.serviceMonitorLabels` | Set labels to apply to ServiceMonitor resources | |
| `imagePullSecrets` | Specifies the secret to be used when pulling the controller pod containers | |
| `fullNameOverride` | Override the full resource names | |
| `fullnameOverride` | Override the full resource names | |
| `nameOverride` | Override the resource name prefix | |
| `serviceAccont.annotations` | Set annotations to the service account | |
| `serviceAccount.annotations` | Set annotations to the service account | |
| `serviceAccount.create` | Deploy the controller pod under a service account | true |
| `podAnnotations` | Set annotations for the controller pod | |
| `podLabels` | Set labels for the controller pod | |
| `serviceAccount.name` | Set the name of the service account | |
| `securityContext` | Set the security context for each container in the controller pod | |
| `podSecurityContext` | Set the security context to controller pod | |
| `service.port` | Set controller service type | |
| `service.type` | Set controller service ports | |
| `service.annotations` | Set annotations for the provisioned webhook service resource | |
| `service.port` | Set controller service ports | |
| `service.type` | Set controller service type | |
| `topologySpreadConstraints` | Set the controller pod topologySpreadConstraints | |
| `nodeSelector` | Set the controller pod nodeSelector | |
| `resources` | Set the controller pod resources | |
| `affinity` | Set the controller pod affinity rules | |
| `podDisruptionBudget.enabled` | Enables a PDB to ensure HA of controller pods | false |
| `podDisruptionBudget.minAvailable` | Minimum number of pods that must be available after eviction | |
| `podDisruptionBudget.maxUnavailable` | Maximum number of pods that can be unavailable after eviction. Kubernetes 1.7+ required. | |
| `tolerations` | Set the controller pod tolerations | |
| `env` | Set environment variables for the controller container | |
| `priorityClassName` | Set the controller pod priorityClassName | |
| `scope.watchNamespace` | Tells the controller which namespace to watch if `scope.singleNamespace` is true | |
| `scope.watchNamespace` | Tells the controller and the github webhook server which namespace to watch if `scope.singleNamespace` is true | `Release.Namespace` (the default namespace of the helm chart). |
| `scope.singleNamespace` | Limit the controller to watch a single namespace | false |
| `certManagerEnabled` | Enable cert-manager. If disabled you must set admissionWebHooks.caBundle and create TLS secrets manually | true |
| `admissionWebHooks.caBundle` | Base64-encoded PEM bundle containing the CA that signed the webhook's serving certificate | |
| `githubWebhookServer.logLevel` | Set the log level of the githubWebhookServer container | |
| `githubWebhookServer.replicaCount` | Set the number of webhook server pods | 1 |
| `githubWebhookServer.useRunnerGroupsVisibility` | Enable support for runner groups with custom visibility. This incurs extra GitHub API calls and may exhaust your API rate limit. Currently, you also need to set `githubWebhookServer.secret.enabled` to enable this feature. | false |
| `githubWebhookServer.syncPeriod` | Set the period in which the controller reconciles the resources | 10m |
| `githubWebhookServer.enabled` | Deploy the webhook server pod | false |
| `githubWebhookServer.secret.enabled` | Passes the webhook hook secret to the github-webhook-server | false |
| `githubWebhookServer.secret.create` | Deploy the webhook hook secret | false |
| `githubWebhookServer.secret.name` | Set the name of the webhook hook secret | github-webhook-server |
| `githubWebhookServer.secret.github_webhook_secret_token` | Set the webhook secret token value | |
| `githubWebhookServer.imagePullSecrets` | Specifies the secret to be used when pulling the githubWebhookServer pod containers | |
| `githubWebhookServer.nameOveride` | Override the resource name prefix | |
| `githubWebhookServer.fullNameOveride` | Override the full resource names | |
| `githubWebhookServer.nameOverride` | Override the resource name prefix | |
| `githubWebhookServer.fullnameOverride` | Override the full resource names | |
| `githubWebhookServer.serviceAccount.create` | Deploy the githubWebhookServer under a service account | true |
| `githubWebhookServer.serviceAccount.annotations` | Set annotations for the service account | |
| `githubWebhookServer.serviceAccount.name` | Set the service account name | |
@@ -86,3 +105,7 @@ All additional docs are kept in the `docs/` folder, this README is solely for do
| `githubWebhookServer.ingress.annotations` | Set annotations for the ingress kind | |
| `githubWebhookServer.ingress.hosts` | Set hosts configuration for ingress | `[{"host": "chart-example.local", "paths": []}]` |
| `githubWebhookServer.ingress.tls` | Set tls configuration for ingress | |
| `githubWebhookServer.ingress.ingressClassName` | Set ingress class name | |
| `githubWebhookServer.podDisruptionBudget.enabled` | Enables a PDB to ensure HA of githubwebhook pods | false |
| `githubWebhookServer.podDisruptionBudget.minAvailable` | Minimum number of pods that must be available after eviction | |
| `githubWebhookServer.podDisruptionBudget.maxUnavailable` | Maximum number of pods that can be unavailable after eviction. Kubernetes 1.7+ required. | |
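As a worked example of a few of the values documented above, a hypothetical override file that enables the webhook server with runner-groups visibility (the GitHub App private key is omitted here but would be supplied the same way):
```shell
cat <<'EOF' > arc-values-override.yaml
githubWebhookServer:
  enabled: true
  useRunnerGroupsVisibility: true
  secret:
    enabled: true
    create: true
    name: github-webhook-server
    github_webhook_secret_token: "replace-me"
    github_app_id: "12345"
    github_app_installation_id: "67890"
    # github_app_private_key would be set here as well (multiline string)
EOF
helm upgrade --install actions-runner-controller \
  actions-runner-controller/actions-runner-controller \
  --namespace actions-runner-system \
  -f arc-values-override.yaml
```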


@@ -2,7 +2,7 @@ apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
annotations:
controller-gen.kubebuilder.io/version: v0.6.0
controller-gen.kubebuilder.io/version: v0.7.0
creationTimestamp: null
name: horizontalrunnerautoscalers.actions.summerwind.dev
spec:
@@ -49,6 +49,9 @@ spec:
items:
description: CapacityReservation specifies the number of replicas temporarily added to the scale target until ExpirationTime.
properties:
effectiveTime:
format: date-time
type: string
expirationTime:
format: date-time
type: string
@@ -59,7 +62,7 @@ spec:
type: object
type: array
maxReplicas:
description: MinReplicas is the maximum number of replicas the deployment is allowed to scale
description: MaxReplicas is the maximum number of replicas the deployment is allowed to scale
type: integer
metrics:
description: Metrics is the collection of various metric targets to calculate desired number of runners
@@ -138,6 +141,7 @@ spec:
status:
type: string
types:
description: 'One of: created, rerequested, or completed'
items:
type: string
type: array
@@ -157,6 +161,9 @@ spec:
push:
description: PushSpec is the condition for triggering scale-up on push event Also see https://docs.github.com/en/actions/reference/events-that-trigger-workflows#push
type: object
workflowJob:
description: https://docs.github.com/en/developers/webhooks-and-events/webhooks/webhook-events-and-payloads#workflow_job
type: object
type: object
type: object
type: array


@@ -18,23 +18,27 @@ Due to the above you can't just do a `helm upgrade` to release the latest versio
## Steps
1. Upgrade CRDs
1. Upgrade the CRDs. This isn't optional: the CRDs you are using must be those that correspond to the version of the controller you are installing
```shell
# REMEMBER TO UPDATE THE CHART_VERSION TO THE RELEVANT CHART VERSION!!!!
CHART_VERSION=0.11.0
# REMEMBER TO UPDATE THE CHART_VERSION TO THE RELEVANT CHART VERSION!!!!
CHART_VERSION=0.18.0
curl -L https://github.com/actions-runner-controller/actions-runner-controller/releases/download/actions-runner-controller-${CHART_VERSION}/actions-runner-controller-${CHART_VERSION}.tgz | tar zxv --strip 1 actions-runner-controller/crds
kubectl apply -f crds/
kubectl replace -f crds/
```
2. Upgrade the Helm release
```shell
helm upgrade --install \
--namespace actions-runner-system \
--version ${CHART_VERSION} \
# helm repo [command]
helm repo update
# helm upgrade [RELEASE] [CHART] [flags]
helm upgrade actions-runner-controller \
actions-runner-controller/actions-runner-controller \
actions-runner-controller
--install \
--namespace actions-runner-system \
--version ${CHART_VERSION}
```
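An optional sanity check after the two steps above, assuming the release and namespace names used in this guide:
```shell
# Confirm the CRDs were replaced and the release rolled out
kubectl get crds | grep actions.summerwind.dev
helm list --namespace actions-runner-system
kubectl -n actions-runner-system rollout status deploy/actions-runner-controller
```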


@@ -58,3 +58,7 @@ Create the name of the service account to use
{{- define "actions-runner-controller-github-webhook-server.serviceMonitorName" -}}
{{- include "actions-runner-controller-github-webhook-server.fullname" . | trunc 47 }}-service-monitor
{{- end }}
{{- define "actions-runner-controller-github-webhook-server.pdbName" -}}
{{- include "actions-runner-controller-github-webhook-server.fullname" . | trunc 59 }}-pdb
{{- end }}


@@ -68,6 +68,10 @@ Create the name of the service account to use
{{- default (include "actions-runner-controller.fullname" .) .Values.authSecret.name -}}
{{- end }}
{{- define "actions-runner-controller.githubWebhookServerSecretName" -}}
{{- default (include "actions-runner-controller.fullname" .) .Values.githubWebhookServer.secret.name -}}
{{- end }}
{{- define "actions-runner-controller.leaderElectionRoleName" -}}
{{- include "actions-runner-controller.fullname" . }}-leader-election
{{- end }}
@@ -107,3 +111,7 @@ Create the name of the service account to use
{{- define "actions-runner-controller.servingCertName" -}}
{{- include "actions-runner-controller.fullname" . }}-serving-cert
{{- end }}
{{- define "actions-runner-controller.pdbName" -}}
{{- include "actions-runner-controller.fullname" . | trunc 59 }}-pdb
{{- end }}


@@ -1,3 +1,4 @@
{{- if .Values.certManagerEnabled }}
# The following manifests contain a self-signed issuer CR and a certificate CR.
# More document can be found at https://docs.cert-manager.io
# WARNING: Targets CertManager 0.11 check https://docs.cert-manager.io/en/latest/tasks/upgrading/index.html for breaking changes
@@ -22,3 +23,4 @@ spec:
kind: Issuer
name: {{ include "actions-runner-controller.selfsignedIssuerName" . }}
secretName: {{ include "actions-runner-controller.servingCertName" . }}
{{- end }}


@@ -7,4 +7,8 @@ data:
kind: Secret
metadata:
name: controller-manager
{{- if .Values.authSecret.annotations }}
annotations:
{{ toYaml .Values.authSecret.annotations | nindent 4 }}
{{- end }}
{{- end }}


@@ -5,6 +5,10 @@ metadata:
{{- include "actions-runner-controller.labels" . | nindent 4 }}
name: {{ include "actions-runner-controller.metricsServiceName" . }}
namespace: {{ .Release.Namespace }}
{{- with .Values.metrics.serviceAnnotations }}
annotations:
{{- toYaml . | nindent 4 }}
{{- end }}
spec:
ports:
- name: metrics-port


@@ -0,0 +1,19 @@
{{- if .Values.podDisruptionBudget.enabled }}
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
labels:
{{- include "actions-runner-controller.labels" . | nindent 4 }}
name: {{ include "actions-runner-controller.pdbName" . }}
namespace: {{ .Release.Namespace }}
spec:
{{- if .Values.podDisruptionBudget.minAvailable }}
minAvailable: {{ .Values.podDisruptionBudget.minAvailable }}
{{- end }}
{{- if .Values.podDisruptionBudget.maxUnavailable }}
maxUnavailable: {{ .Values.podDisruptionBudget.maxUnavailable }}
{{- end }}
selector:
matchLabels:
{{- include "actions-runner-controller.selectorLabels" . | nindent 6 }}
{{- end -}}
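A sketch of enabling the PodDisruptionBudget rendered by the template above, using the value keys from the chart README (release and namespace names are placeholders):
```shell
helm upgrade --install actions-runner-controller \
  actions-runner-controller/actions-runner-controller \
  --namespace actions-runner-system \
  --set podDisruptionBudget.enabled=true \
  --set podDisruptionBudget.minAvailable=1
```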


@@ -14,6 +14,7 @@ spec:
metadata:
{{- with .Values.podAnnotations }}
annotations:
kubectl.kubernetes.io/default-logs-container: "manager"
{{- toYaml . | nindent 8 }}
{{- end }}
labels:
@@ -43,9 +44,14 @@ spec:
{{- if .Values.leaderElectionId }}
- "--leader-election-id={{ .Values.leaderElectionId }}"
{{- end }}
- "--port={{ .Values.webhookPort }}"
- "--sync-period={{ .Values.syncPeriod }}"
- "--default-scale-down-delay={{ .Values.defaultScaleDownDelay }}"
- "--docker-image={{ .Values.image.dindSidecarRepositoryAndTag }}"
- "--runner-image={{ .Values.image.actionsRunnerRepositoryAndTag }}"
{{- range .Values.image.actionsRunnerImagePullSecrets }}
- "--runner-image-pull-secret={{ . }}"
{{- end }}
{{- if .Values.dockerRegistryMirror }}
- "--docker-registry-mirror={{ .Values.dockerRegistryMirror }}"
{{- end }}
@@ -58,6 +64,9 @@ spec:
{{- if .Values.logLevel }}
- "--log-level={{ .Values.logLevel }}"
{{- end }}
{{- if .Values.runnerGithubURL }}
- "--runner-github-url={{ .Values.runnerGithubURL }}"
{{- end }}
command:
- "/manager"
env:
@@ -65,6 +74,15 @@ spec:
- name: GITHUB_ENTERPRISE_URL
value: {{ .Values.githubEnterpriseServerURL }}
{{- end }}
{{- if .Values.githubURL }}
- name: GITHUB_URL
value: {{ .Values.githubURL }}
{{- end }}
{{- if .Values.githubUploadURL }}
- name: GITHUB_UPLOAD_URL
value: {{ .Values.githubUploadURL }}
{{- end }}
{{- if .Values.authSecret.enabled }}
- name: GITHUB_TOKEN
valueFrom:
secretKeyRef:
@@ -84,7 +102,22 @@ spec:
name: {{ include "actions-runner-controller.secretName" . }}
optional: true
- name: GITHUB_APP_PRIVATE_KEY
value: /etc/actions-runner-controller/github_app_private_key
valueFrom:
secretKeyRef:
key: github_app_private_key
name: {{ include "actions-runner-controller.secretName" . }}
optional: true
{{- if .Values.authSecret.github_basicauth_username }}
- name: GITHUB_BASICAUTH_USERNAME
value: {{ .Values.authSecret.github_basicauth_username }}
{{- end }}
- name: GITHUB_BASICAUTH_PASSWORD
valueFrom:
secretKeyRef:
key: github_basicauth_password
name: {{ include "actions-runner-controller.secretName" . }}
optional: true
{{- end }}
{{- range $key, $val := .Values.env }}
- name: {{ $key }}
value: {{ $val | quote }}
@@ -93,7 +126,7 @@ spec:
name: manager
imagePullPolicy: {{ .Values.image.pullPolicy }}
ports:
- containerPort: 9443
- containerPort: {{ .Values.webhookPort }}
name: webhook-server
protocol: TCP
{{- if not .Values.metrics.proxy.enabled }}
@@ -106,14 +139,19 @@ spec:
securityContext:
{{- toYaml .Values.securityContext | nindent 12 }}
volumeMounts:
{{- if .Values.authSecret.enabled }}
- mountPath: "/etc/actions-runner-controller"
name: secret
readOnly: true
{{- end }}
- mountPath: /tmp
name: tmp
- mountPath: /tmp/k8s-webhook-server/serving-certs
name: cert
readOnly: true
{{- if .Values.additionalVolumeMounts }}
{{- toYaml .Values.additionalVolumeMounts | nindent 8 }}
{{- end }}
{{- if .Values.metrics.proxy.enabled }}
- args:
- "--secure-listen-address=0.0.0.0:{{ .Values.metrics.port }}"
@@ -133,15 +171,20 @@ spec:
{{- end }}
terminationGracePeriodSeconds: 10
volumes:
{{- if .Values.authSecret.enabled }}
- name: secret
secret:
secretName: {{ include "actions-runner-controller.secretName" . }}
{{- end }}
- name: cert
secret:
defaultMode: 420
secretName: {{ include "actions-runner-controller.servingCertName" . }}
- name: tmp
emptyDir: {}
{{- if .Values.additionalVolumes }}
{{- toYaml .Values.additionalVolumes | nindent 6}}
{{- end }}
{{- with .Values.nodeSelector }}
nodeSelector:
{{- toYaml . | nindent 8 }}
@@ -158,3 +201,6 @@ spec:
topologySpreadConstraints:
{{- toYaml . | nindent 8 }}
{{- end }}
{{- if .Values.hostNetwork }}
hostNetwork: {{ .Values.hostNetwork }}
{{- end }}
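The template above now takes the webhook container port from `.Values.webhookPort` and can optionally enable host networking; a hypothetical override using those keys:
```shell
helm upgrade --install actions-runner-controller \
  actions-runner-controller/actions-runner-controller \
  --namespace actions-runner-system \
  --set webhookPort=9443 \
  --set hostNetwork=true
```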


@@ -15,6 +15,7 @@ spec:
metadata:
{{- with .Values.githubWebhookServer.podAnnotations }}
annotations:
kubectl.kubernetes.io/default-logs-container: "github-webhook-server"
{{- toYaml . | nindent 8 }}
{{- end }}
labels:
@@ -42,6 +43,12 @@ spec:
{{- if .Values.githubWebhookServer.logLevel }}
- "--log-level={{ .Values.githubWebhookServer.logLevel }}"
{{- end }}
{{- if .Values.scope.singleNamespace }}
- "--watch-namespace={{ default .Release.Namespace .Values.scope.watchNamespace }}"
{{- end }}
{{- if .Values.runnerGithubURL }}
- "--runner-github-url={{ .Values.runnerGithubURL }}"
{{- end }}
command:
- "/github-webhook-server"
env:
@@ -51,6 +58,54 @@ spec:
key: github_webhook_secret_token
name: {{ include "actions-runner-controller-github-webhook-server.secretName" . }}
optional: true
{{- if .Values.githubEnterpriseServerURL }}
- name: GITHUB_ENTERPRISE_URL
value: {{ .Values.githubEnterpriseServerURL }}
{{- end }}
{{- if .Values.githubURL }}
- name: GITHUB_URL
value: {{ .Values.githubURL }}
{{- end }}
{{- if .Values.githubUploadURL }}
- name: GITHUB_UPLOAD_URL
value: {{ .Values.githubUploadURL }}
{{- end }}
{{- if and .Values.githubWebhookServer.useRunnerGroupsVisibility .Values.githubWebhookServer.secret.enabled }}
- name: GITHUB_TOKEN
valueFrom:
secretKeyRef:
key: github_token
name: {{ include "actions-runner-controller.githubWebhookServerSecretName" . }}
optional: true
- name: GITHUB_APP_ID
valueFrom:
secretKeyRef:
key: github_app_id
name: {{ include "actions-runner-controller.githubWebhookServerSecretName" . }}
optional: true
- name: GITHUB_APP_INSTALLATION_ID
valueFrom:
secretKeyRef:
key: github_app_installation_id
name: {{ include "actions-runner-controller.githubWebhookServerSecretName" . }}
optional: true
- name: GITHUB_APP_PRIVATE_KEY
valueFrom:
secretKeyRef:
key: github_app_private_key
name: {{ include "actions-runner-controller.githubWebhookServerSecretName" . }}
optional: true
{{- if .Values.authSecret.github_basicauth_username }}
- name: GITHUB_BASICAUTH_USERNAME
value: {{ .Values.authSecret.github_basicauth_username }}
{{- end }}
- name: GITHUB_BASICAUTH_PASSWORD
valueFrom:
secretKeyRef:
key: github_basicauth_password
name: {{ include "actions-runner-controller.secretName" . }}
optional: true
{{- end }}
{{- range $key, $val := .Values.githubWebhookServer.env }}
- name: {{ $key }}
value: {{ $val | quote }}


@@ -1,14 +1,17 @@
{{- if .Values.githubWebhookServer.ingress.enabled -}}
{{- $fullName := include "actions-runner-controller-github-webhook-server.fullname" . -}}
{{- $svcPort := (index .Values.githubWebhookServer.service.ports 0).port -}}
{{- if semverCompare ">=1.14-0" .Capabilities.KubeVersion.GitVersion -}}
{{- if .Capabilities.APIVersions.Has "networking.k8s.io/v1/Ingress" }}
apiVersion: networking.k8s.io/v1
{{- else if .Capabilities.APIVersions.Has "networking.k8s.io/v1beta1/Ingress" }}
apiVersion: networking.k8s.io/v1beta1
{{- else -}}
{{- else if .Capabilities.APIVersions.Has "extensions/v1beta1/Ingress" }}
apiVersion: extensions/v1beta1
{{- end }}
kind: Ingress
metadata:
name: {{ $fullName }}
namespace: {{ .Release.Namespace }}
labels:
{{- include "actions-runner-controller.labels" . | nindent 4 }}
{{- with .Values.githubWebhookServer.ingress.annotations }}
@@ -26,16 +29,32 @@ spec:
secretName: {{ .secretName }}
{{- end }}
{{- end }}
{{- with .Values.githubWebhookServer.ingress.ingressClassName }}
ingressClassName: {{ . }}
{{- end }}
rules:
{{- range .Values.githubWebhookServer.ingress.hosts }}
- host: {{ .host | quote }}
http:
paths:
{{- if .extraPaths }}
{{- toYaml .extraPaths | nindent 10 }}
{{- end }}
{{- range .paths }}
- path: {{ .path }}
{{- if $.Capabilities.APIVersions.Has "networking.k8s.io/v1/Ingress" }}
pathType: {{ .pathType }}
{{- end }}
backend:
{{- if $.Capabilities.APIVersions.Has "networking.k8s.io/v1/Ingress" }}
service:
name: {{ $fullName }}
port:
number: {{ $svcPort }}
{{- else }}
serviceName: {{ $fullName }}
servicePort: {{ $svcPort }}
{{- end }}
{{- end }}
{{- end }}
{{- end }}
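A hypothetical values fragment exercising the ingress template above; host, ingress class, annotations, and TLS secret are placeholders:
```shell
cat <<'EOF' > webhook-ingress-values.yaml
githubWebhookServer:
  enabled: true
  ingress:
    enabled: true
    ingressClassName: nginx
    annotations:
      cert-manager.io/cluster-issuer: letsencrypt
    hosts:
      - host: arc-webhook.example.com
        paths:
          - path: /
            pathType: Prefix
    tls:
      - secretName: arc-webhook-tls
        hosts:
          - arc-webhook.example.com
EOF
helm upgrade --install actions-runner-controller \
  actions-runner-controller/actions-runner-controller \
  --namespace actions-runner-system \
  -f webhook-ingress-values.yaml
```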


@@ -0,0 +1,19 @@
{{- if .Values.githubWebhookServer.podDisruptionBudget.enabled }}
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
labels:
{{- include "actions-runner-controller.labels" . | nindent 4 }}
name: {{ include "actions-runner-controller-github-webhook-server.pdbName" . }}
namespace: {{ .Release.Namespace }}
spec:
{{- if .Values.githubWebhookServer.podDisruptionBudget.minAvailable }}
minAvailable: {{ .Values.githubWebhookServer.podDisruptionBudget.minAvailable }}
{{- end }}
{{- if .Values.githubWebhookServer.podDisruptionBudget.maxUnavailable }}
maxUnavailable: {{ .Values.githubWebhookServer.podDisruptionBudget.maxUnavailable }}
{{- end }}
selector:
matchLabels:
{{- include "actions-runner-controller-github-webhook-server.selectorLabels" . | nindent 6 }}
{{- end -}}


@@ -12,5 +12,17 @@ data:
{{- if .Values.githubWebhookServer.secret.github_webhook_secret_token }}
github_webhook_secret_token: {{ .Values.githubWebhookServer.secret.github_webhook_secret_token | toString | b64enc }}
{{- end }}
{{- if .Values.githubWebhookServer.secret.github_app_id }}
github_app_id: {{ .Values.githubWebhookServer.secret.github_app_id | toString | b64enc }}
{{- end }}
{{- if .Values.githubWebhookServer.secret.github_app_installation_id }}
github_app_installation_id: {{ .Values.githubWebhookServer.secret.github_app_installation_id | toString | b64enc }}
{{- end }}
{{- if .Values.githubWebhookServer.secret.github_app_private_key }}
github_app_private_key: {{ .Values.githubWebhookServer.secret.github_app_private_key | toString | b64enc }}
{{- end }}
{{- if .Values.githubWebhookServer.secret.github_token }}
github_token: {{ .Values.githubWebhookServer.secret.github_token | toString | b64enc }}
{{- end }}
{{- end }}
{{- end }}


@@ -195,6 +195,28 @@ rules:
verbs:
- create
- patch
- apiGroups:
- ""
resources:
- persistentvolumeclaims
verbs:
- delete
- get
- list
- patch
- update
- watch
- apiGroups:
- ""
resources:
- persistentvolumes
verbs:
- delete
- get
- list
- patch
- update
- watch
- apiGroups:
- coordination.k8s.io
resources:
@@ -228,3 +250,11 @@ rules:
- patch
- update
- watch
- apiGroups:
- ""
resources:
- secrets
verbs:
- get
- list
- watch
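The new PersistentVolume(Claim) and Secret rules above can be spot-checked after installation; a sketch assuming the controller's service account is `actions-runner-controller` in `actions-runner-system`:
```shell
# Verify the RBAC additions took effect for the controller's service account.
SA=system:serviceaccount:actions-runner-system:actions-runner-controller
kubectl auth can-i list persistentvolumeclaims --as "$SA"
kubectl auth can-i delete persistentvolumes --as "$SA"
kubectl auth can-i get secrets --as "$SA" -n actions-runner-system
```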


@@ -4,14 +4,20 @@ kind: Secret
metadata:
name: {{ include "actions-runner-controller.secretName" . }}
namespace: {{ .Release.Namespace }}
{{- if .Values.authSecret.annotations }}
annotations:
{{ toYaml .Values.authSecret.annotations | nindent 4 }}
{{- end }}
labels:
{{- include "actions-runner-controller.labels" . | nindent 4 }}
type: Opaque
data:
{{- if .Values.authSecret.github_app_id }}
# Keep this as a string as strings integrate better with things like AWS Parameter Store, see PR #882 for an example
github_app_id: {{ .Values.authSecret.github_app_id | toString | b64enc }}
{{- end }}
{{- if .Values.authSecret.github_app_installation_id }}
# Keep this as a string as strings integrate better with things like AWS Parameter Store, see PR #882 for an example
github_app_installation_id: {{ .Values.authSecret.github_app_installation_id | toString | b64enc }}
{{- end }}
{{- if .Values.authSecret.github_app_private_key }}
@@ -20,4 +26,7 @@ data:
{{- if .Values.authSecret.github_token }}
github_token: {{ .Values.authSecret.github_token | toString | b64enc }}
{{- end }}
{{- if .Values.authSecret.github_basicauth_password }}
github_basicauth_password: {{ .Values.authSecret.github_basicauth_password | toString | b64enc }}
{{- end }}
{{- end }}

View File

@@ -5,12 +5,22 @@ kind: MutatingWebhookConfiguration
metadata:
creationTimestamp: null
name: {{ include "actions-runner-controller.fullname" . }}-mutating-webhook-configuration
{{- if .Values.certManagerEnabled }}
annotations:
cert-manager.io/inject-ca-from: {{ .Release.Namespace }}/{{ include "actions-runner-controller.servingCertName" . }}
{{- end }}
webhooks:
- admissionReviewVersions:
- v1beta1
{{- if .Values.scope.singleNamespace }}
namespaceSelector:
matchLabels:
name: {{ default .Release.Namespace .Values.scope.watchNamespace }}
{{- end }}
clientConfig:
{{- if .Values.admissionWebHooks.caBundle }}
caBundle: {{ quote .Values.admissionWebHooks.caBundle }}
{{- end }}
service:
name: {{ include "actions-runner-controller.webhookServiceName" . }}
namespace: {{ .Release.Namespace }}
@@ -30,7 +40,15 @@ webhooks:
sideEffects: None
- admissionReviewVersions:
- v1beta1
{{- if .Values.scope.singleNamespace }}
namespaceSelector:
matchLabels:
name: {{ default .Release.Namespace .Values.scope.watchNamespace }}
{{- end }}
clientConfig:
{{- if .Values.admissionWebHooks.caBundle }}
caBundle: {{ .Values.admissionWebHooks.caBundle }}
{{- end }}
service:
name: {{ include "actions-runner-controller.webhookServiceName" . }}
namespace: {{ .Release.Namespace }}
@@ -50,7 +68,15 @@ webhooks:
sideEffects: None
- admissionReviewVersions:
- v1beta1
{{- if .Values.scope.singleNamespace }}
namespaceSelector:
matchLabels:
name: {{ default .Release.Namespace .Values.scope.watchNamespace }}
{{- end }}
clientConfig:
{{- if .Values.admissionWebHooks.caBundle }}
caBundle: {{ .Values.admissionWebHooks.caBundle }}
{{- end }}
service:
name: {{ include "actions-runner-controller.webhookServiceName" . }}
namespace: {{ .Release.Namespace }}
@@ -70,7 +96,15 @@ webhooks:
sideEffects: None
- admissionReviewVersions:
- v1beta1
{{- if .Values.scope.singleNamespace }}
namespaceSelector:
matchLabels:
name: {{ default .Release.Namespace .Values.scope.watchNamespace }}
{{- end }}
clientConfig:
{{- if .Values.admissionWebHooks.caBundle }}
caBundle: {{ .Values.admissionWebHooks.caBundle }}
{{- end }}
service:
name: {{ include "actions-runner-controller.webhookServiceName" . }}
namespace: {{ .Release.Namespace }}
@@ -96,12 +130,22 @@ kind: ValidatingWebhookConfiguration
metadata:
creationTimestamp: null
name: {{ include "actions-runner-controller.fullname" . }}-validating-webhook-configuration
{{- if .Values.certManagerEnabled }}
annotations:
cert-manager.io/inject-ca-from: {{ .Release.Namespace }}/{{ include "actions-runner-controller.servingCertName" . }}
{{- end }}
webhooks:
- admissionReviewVersions:
- v1beta1
{{- if .Values.scope.singleNamespace }}
namespaceSelector:
matchLabels:
name: {{ default .Release.Namespace .Values.scope.watchNamespace }}
{{- end }}
clientConfig:
{{- if .Values.admissionWebHooks.caBundle }}
caBundle: {{ .Values.admissionWebHooks.caBundle }}
{{- end }}
service:
name: {{ include "actions-runner-controller.webhookServiceName" . }}
namespace: {{ .Release.Namespace }}
@@ -121,7 +165,15 @@ webhooks:
sideEffects: None
- admissionReviewVersions:
- v1beta1
{{- if .Values.scope.singleNamespace }}
namespaceSelector:
matchLabels:
name: {{ default .Release.Namespace .Values.scope.watchNamespace }}
{{- end }}
clientConfig:
{{- if .Values.admissionWebHooks.caBundle }}
caBundle: {{ .Values.admissionWebHooks.caBundle }}
{{- end }}
service:
name: {{ include "actions-runner-controller.webhookServiceName" . }}
namespace: {{ .Release.Namespace }}
@@ -141,7 +193,15 @@ webhooks:
sideEffects: None
- admissionReviewVersions:
- v1beta1
{{- if .Values.scope.singleNamespace }}
namespaceSelector:
matchLabels:
name: {{ default .Release.Namespace .Values.scope.watchNamespace }}
{{- end }}
clientConfig:
{{- if .Values.admissionWebHooks.caBundle }}
caBundle: {{ .Values.admissionWebHooks.caBundle }}
{{- end }}
service:
name: {{ include "actions-runner-controller.webhookServiceName" . }}
namespace: {{ .Release.Namespace }}

View File

@@ -5,11 +5,15 @@ metadata:
namespace: {{ .Release.Namespace }}
labels:
{{- include "actions-runner-controller.labels" . | nindent 4 }}
{{- with .Values.service.annotations }}
annotations:
{{- toYaml . | nindent 4 }}
{{- end }}
spec:
type: {{ .Values.service.type }}
ports:
- port: 443
targetPort: 9443
targetPort: {{ .Values.webhookPort }}
protocol: TCP
name: https
selector:

View File

@@ -6,13 +6,16 @@ labels: {}
replicaCount: 1
syncPeriod: 10m
webhookPort: 9443
syncPeriod: 1m
defaultScaleDownDelay: 10m
enableLeaderElection: true
# Specifies the controller id for leader election.
# Must be unique if more than one controller installed onto the same namespace.
#leaderElectionId: "actions-runner-controller"
# DEPRECATED: This has been removed as unnecessary in #1192
# The controller tries its best not to repeat the duplicate GitHub API call
# within this duration.
# Defaults to syncPeriod - 10s.
@@ -21,17 +24,34 @@ enableLeaderElection: true
# The URL of your GitHub Enterprise server, if you're using one.
#githubEnterpriseServerURL: https://github.example.com
# Override GitHub URLs in case of using proxy APIs
#githubURL: ""
#githubUploadURL: ""
#runnerGithubURL: ""
# Only 1 authentication method can be deployed at a time
# Uncomment the configuration you are applying and fill in the details
#
# If authSecret.enabled=true, these values are inherited by the actions-runner-controller controller-manager container's env.
#
# Set authSecret.enabled=false and set env if you want full control over
# the GitHub authn-related envvars of the container.
# See https://github.com/actions-runner-controller/actions-runner-controller/pull/937 for more details.
authSecret:
enabled: true
create: false
name: "controller-manager"
annotations: {}
### GitHub Apps Configuration
## NOTE: IDs MUST be strings, use quotes
#github_app_id: ""
#github_app_installation_id: ""
#github_app_private_key: |
### GitHub PAT Configuration
#github_token: ""
### Basic auth for github API proxy
#github_basicauth_username: ""
#github_basicauth_password: ""
dockerRegistryMirror: ""
image:
@@ -39,6 +59,9 @@ image:
actionsRunnerRepositoryAndTag: "summerwind/actions-runner:latest"
dindSidecarRepositoryAndTag: "docker:dind"
pullPolicy: IfNotPresent
# The default image-pull secrets name for self-hosted runner container.
# It's added to spec.ImagePullSecrets of self-hosted runner pods.
actionsRunnerImagePullSecrets: []
imagePullSecrets: []
nameOverride: ""
@@ -70,11 +93,15 @@ securityContext:
# runAsNonRoot: true
# runAsUser: 1000
# Webhook service resource
service:
type: ClusterIP
port: 443
annotations: {}
# Metrics service resource
metrics:
serviceAnnotations: {}
serviceMonitor: false
serviceMonitorLabels: {}
port: 8443
@@ -82,7 +109,7 @@ metrics:
enabled: true
image:
repository: quay.io/brancz/kube-rbac-proxy
tag: v0.10.0
tag: v0.13.0
resources:
{}
@@ -103,6 +130,12 @@ tolerations: []
affinity: {}
# Only one of minAvailable or maxUnavailable can be set
podDisruptionBudget:
enabled: false
# minAvailable: 1
# maxUnavailable: 3
# Leverage a PriorityClass to ensure your pods survive resource shortages
# ref: https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/
# PriorityClass: system-cluster-critical
@@ -114,6 +147,14 @@ env:
# https_proxy: "proxy.com:8080"
# no_proxy: ""
## specify additional volumes to mount in the manager container, this can be used
## to specify additional storage of material or to inject files from ConfigMaps
## into the running container
additionalVolumes: []
## specify where the additional volumes are mounted in the manager container
additionalVolumeMounts: []
scope:
# If true, the controller will only watch custom resources in a single namespace
singleNamespace: false
@@ -121,15 +162,34 @@ scope:
# The default value is "", which means the namespace of the controller
watchNamespace: ""
certManagerEnabled: true
admissionWebHooks:
{}
#caBundle: "Ci0tLS0tQk...<base64-encoded PEM bundle containing the CA that signed the webhook's serving certificate>...tLS0K"
# There may be alternatives to setting `hostNetwork: true`, see
# https://github.com/actions-runner-controller/actions-runner-controller/issues/1005#issuecomment-993097155
#hostNetwork: true
githubWebhookServer:
enabled: false
replicaCount: 1
syncPeriod: 10m
useRunnerGroupsVisibility: false
secret:
enabled: false
create: false
name: "github-webhook-server"
### GitHub Webhook Configuration
github_webhook_secret_token: ""
### GitHub Apps Configuration
## NOTE: IDs MUST be strings, use quotes
#github_app_id: ""
#github_app_installation_id: ""
#github_app_private_key: |
### GitHub PAT Configuration
#github_token: ""
imagePullSecrets: []
nameOverride: ""
fullnameOverride: ""
@@ -162,14 +222,36 @@ githubWebhookServer:
#nodePort: someFixedPortForUseWithTerraformCdkCfnEtc
ingress:
enabled: false
annotations:
{}
ingressClassName: ""
annotations: {}
# kubernetes.io/ingress.class: nginx
# kubernetes.io/tls-acme: "true"
hosts:
- host: chart-example.local
paths: []
# - path: /*
# pathType: ImplementationSpecific
# Extra paths that are not automatically connected to the server. This is useful when working with annotation based services.
extraPaths: []
# - path: /*
# backend:
# serviceName: ssl-redirect
# servicePort: use-annotation
## for Kubernetes >=1.19 (when "networking.k8s.io/v1" is used)
# - path: /*
# pathType: Prefix
# backend:
# service:
# name: ssl-redirect
# port:
# name: use-annotation
tls: []
# - secretName: chart-example-tls
# hosts:
# - chart-example.local
# Only one of minAvailable or maxUnavailable can be set
podDisruptionBudget:
enabled: false
# minAvailable: 1
# maxUnavailable: 3

View File

@@ -20,6 +20,7 @@ import (
"context"
"errors"
"flag"
"fmt"
"net/http"
"os"
"sync"
@@ -27,14 +28,15 @@ import (
actionsv1alpha1 "github.com/actions-runner-controller/actions-runner-controller/api/v1alpha1"
"github.com/actions-runner-controller/actions-runner-controller/controllers"
zaplib "go.uber.org/zap"
"github.com/actions-runner-controller/actions-runner-controller/github"
"github.com/actions-runner-controller/actions-runner-controller/logging"
"github.com/kelseyhightower/envconfig"
"k8s.io/apimachinery/pkg/runtime"
clientgoscheme "k8s.io/client-go/kubernetes/scheme"
_ "k8s.io/client-go/plugin/pkg/client/auth/exec"
_ "k8s.io/client-go/plugin/pkg/client/auth/gcp"
_ "k8s.io/client-go/plugin/pkg/client/auth/oidc"
ctrl "sigs.k8s.io/controller-runtime"
"sigs.k8s.io/controller-runtime/pkg/log/zap"
// +kubebuilder:scaffold:imports
)
@@ -44,10 +46,7 @@ var (
)
const (
logLevelDebug = "debug"
logLevelInfo = "info"
logLevelWarn = "warn"
logLevelError = "error"
webhookSecretTokenEnvName = "GITHUB_WEBHOOK_SECRET_TOKEN"
)
func init() {
@@ -65,16 +64,27 @@ func main() {
metricsAddr string
// The secret token of the GitHub Webhook. See https://docs.github.com/en/developers/webhooks-and-events/securing-your-webhooks
webhookSecretToken string
webhookSecretToken string
webhookSecretTokenEnv string
watchNamespace string
enableLeaderElection bool
syncPeriod time.Duration
logLevel string
queueLimit int
ghClient *github.Client
)
webhookSecretToken = os.Getenv("GITHUB_WEBHOOK_SECRET_TOKEN")
var c github.Config
err = envconfig.Process("github", &c)
if err != nil {
fmt.Fprintf(os.Stderr, "Error: processing environment variables: %v\n", err)
os.Exit(1)
}
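// Note on envconfig (assumption based on its documented default behavior): Process("github", &c)
// populates github.Config from GITHUB_-prefixed environment variables (e.g. GITHUB_TOKEN,
// GITHUB_APP_ID), unless a field declares its own envconfig tag.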
webhookSecretTokenEnv = os.Getenv(webhookSecretTokenEnvName)
flag.StringVar(&webhookAddr, "webhook-addr", ":8000", "The address the webhook endpoint binds to.")
flag.StringVar(&metricsAddr, "metrics-addr", ":8080", "The address the metric endpoint binds to.")
@@ -82,11 +92,28 @@ func main() {
flag.BoolVar(&enableLeaderElection, "enable-leader-election", false,
"Enable leader election for controller manager. Enabling this will ensure there is only one active controller manager.")
flag.DurationVar(&syncPeriod, "sync-period", 10*time.Minute, "Determines the minimum frequency at which K8s resources managed by this controller are reconciled. When you use autoscaling, set to a lower value like 10 minute, because this corresponds to the minimum time to react on demand change")
flag.StringVar(&logLevel, "log-level", logLevelDebug, `The verbosity of the logging. Valid values are "debug", "info", "warn", "error". Defaults to "debug".`)
flag.StringVar(&logLevel, "log-level", logging.LogLevelDebug, `The verbosity of the logging. Valid values are "debug", "info", "warn", "error". Defaults to "debug".`)
flag.IntVar(&queueLimit, "queue-limit", controllers.DefaultQueueLimit, `The maximum length of the scale operation queue. The scale operation is enqueued for every matching webhook event, and the server returns a 500 HTTP status when the queue was already full on enqueue attempt.`)
flag.StringVar(&webhookSecretToken, "github-webhook-secret-token", "", "The personal access token of GitHub.")
flag.StringVar(&c.Token, "github-token", c.Token, "The personal access token of GitHub.")
flag.Int64Var(&c.AppID, "github-app-id", c.AppID, "The application ID of GitHub App.")
flag.Int64Var(&c.AppInstallationID, "github-app-installation-id", c.AppInstallationID, "The installation ID of GitHub App.")
flag.StringVar(&c.AppPrivateKey, "github-app-private-key", c.AppPrivateKey, "The path of a private key file to authenticate as a GitHub App")
flag.StringVar(&c.URL, "github-url", c.URL, "GitHub URL to be used for GitHub API calls")
flag.StringVar(&c.UploadURL, "github-upload-url", c.UploadURL, "GitHub Upload URL to be used for GitHub API calls")
flag.StringVar(&c.BasicauthUsername, "github-basicauth-username", c.BasicauthUsername, "Username for GitHub basic auth to use instead of PAT or GitHub APP in case it's running behind a proxy API")
flag.StringVar(&c.BasicauthPassword, "github-basicauth-password", c.BasicauthPassword, "Password for GitHub basic auth to use instead of PAT or GitHub APP in case it's running behind a proxy API")
flag.StringVar(&c.RunnerGitHubURL, "runner-github-url", c.RunnerGitHubURL, "GitHub URL to be used by runners during registration")
flag.Parse()
if webhookSecretToken == "" && webhookSecretTokenEnv != "" {
setupLog.Info(fmt.Sprintf("Using the value from %s for -github-webhook-secret-token", webhookSecretTokenEnvName))
webhookSecretToken = webhookSecretTokenEnv
}
if webhookSecretToken == "" {
setupLog.Info("-webhook-secret-token is missing or empty. Create one following https://docs.github.com/en/developers/webhooks-and-events/securing-your-webhooks")
setupLog.Info(fmt.Sprintf("-github-webhook-secret-token and %s are missing or empty. Create one following https://docs.github.com/en/developers/webhooks-and-events/securing-your-webhooks and specify it via the flag or the envvar", webhookSecretTokenEnvName))
}
if watchNamespace == "" {
@@ -95,24 +122,28 @@ func main() {
setupLog.Info("-watch-namespace is %q. Only HorizontalRunnerAutoscalers in %q are watched, cached, and considered as scale targets.")
}
logger := zap.New(func(o *zap.Options) {
switch logLevel {
case logLevelDebug:
o.Development = true
case logLevelInfo:
lvl := zaplib.NewAtomicLevelAt(zaplib.InfoLevel)
o.Level = &lvl
case logLevelWarn:
lvl := zaplib.NewAtomicLevelAt(zaplib.WarnLevel)
o.Level = &lvl
case logLevelError:
lvl := zaplib.NewAtomicLevelAt(zaplib.ErrorLevel)
o.Level = &lvl
}
})
logger := logging.NewLogger(logLevel)
ctrl.SetLogger(logger)
// In order to support runner groups with custom visibility (selected repositories), we need to perform some GitHub API calls.
// Let the user define if they want to opt-in supporting this option by providing the proper GitHub authentication parameters
// Without an opt-in, runner groups with custom visibility won't be supported to save API calls
// That is, all runner groups managed by ARC are assumed to be visible to any repositories,
// which is wrong when you have one or more non-default runner groups in your organization or enterprise.
if len(c.Token) > 0 || (c.AppID > 0 && c.AppInstallationID > 0 && c.AppPrivateKey != "") || (len(c.BasicauthUsername) > 0 && len(c.BasicauthPassword) > 0) {
c.Log = &logger
ghClient, err = c.NewClient()
if err != nil {
fmt.Fprintln(os.Stderr, "Error: Client creation failed.", err)
setupLog.Error(err, "unable to create controller", "controller", "Runner")
os.Exit(1)
}
} else {
setupLog.Info("GitHub client is not initialized. Runner groups with custom visibility are not supported. If needed, please provide GitHub authentication. This will incur in extra GitHub API calls")
}
mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
Scheme: scheme,
SyncPeriod: &syncPeriod,
@@ -127,16 +158,19 @@ func main() {
}
hraGitHubWebhook := &controllers.HorizontalRunnerAutoscalerGitHubWebhook{
Name: "webhookbasedautoscaler",
Client: mgr.GetClient(),
Log: ctrl.Log.WithName("controllers").WithName("Runner"),
Log: ctrl.Log.WithName("controllers").WithName("webhookbasedautoscaler"),
Recorder: nil,
Scheme: mgr.GetScheme(),
SecretKeyBytes: []byte(webhookSecretToken),
Namespace: watchNamespace,
GitHubClient: ghClient,
QueueLimit: queueLimit,
}
if err = hraGitHubWebhook.SetupWithManager(mgr); err != nil {
setupLog.Error(err, "unable to create controller", "controller", "Runner")
setupLog.Error(err, "unable to create controller", "controller", "webhookbasedautoscaler")
os.Exit(1)
}
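
The opt-in above boils down to whether at least one complete set of GitHub credentials was supplied. A minimal standalone sketch of that predicate (parameter names mirror the flags above; this is an illustration, not the controller's actual function):

// hasGitHubCredentials reports whether at least one complete authentication method is configured.
func hasGitHubCredentials(token string, appID, appInstallationID int64, appPrivateKey, basicUser, basicPass string) bool {
	if token != "" {
		return true // a PAT alone is enough
	}
	if appID > 0 && appInstallationID > 0 && appPrivateKey != "" {
		return true // GitHub App auth needs the app ID, installation ID, and a private key
	}
	return basicUser != "" && basicPass != "" // basic auth for proxy APIs needs both parts
}

Only when this holds does the webhook-based autoscaler construct a GitHub client and enable runner-group visibility checks; otherwise every runner group managed by ARC is assumed to be visible to all repositories.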

View File

@@ -2,7 +2,7 @@ apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
annotations:
controller-gen.kubebuilder.io/version: v0.6.0
controller-gen.kubebuilder.io/version: v0.7.0
creationTimestamp: null
name: horizontalrunnerautoscalers.actions.summerwind.dev
spec:
@@ -49,6 +49,9 @@ spec:
items:
description: CapacityReservation specifies the number of replicas temporarily added to the scale target until ExpirationTime.
properties:
effectiveTime:
format: date-time
type: string
expirationTime:
format: date-time
type: string
@@ -59,7 +62,7 @@ spec:
type: object
type: array
maxReplicas:
description: MinReplicas is the maximum number of replicas the deployment is allowed to scale
description: MaxReplicas is the maximum number of replicas the deployment is allowed to scale
type: integer
metrics:
description: Metrics is the collection of various metric targets to calculate desired number of runners
@@ -138,6 +141,7 @@ spec:
status:
type: string
types:
description: 'One of: created, rerequested, or completed'
items:
type: string
type: array
@@ -157,6 +161,9 @@ spec:
push:
description: PushSpec is the condition for triggering scale-up on push event Also see https://docs.github.com/en/actions/reference/events-that-trigger-workflows#push
type: object
workflowJob:
description: https://docs.github.com/en/developers/webhooks-and-events/webhooks/webhook-events-and-payloads#workflow_job
type: object
type: object
type: object
type: array

File diff suppressed because it is too large

File diff suppressed because it is too large

File diff suppressed because it is too large

File diff suppressed because it is too large

View File

@@ -8,6 +8,7 @@ spec:
conversion:
strategy: Webhook
webhook:
conversionReviewVersions: ["v1","v1beta1"]
clientConfig:
# this is "\n" used as a placeholder, otherwise it will be rejected by the apiserver for being blank,
# but we're going to set it later using the cert-manager (or potentially a patch if not using cert-manager)

View File

@@ -20,19 +20,20 @@ bases:
- ../webhook
# [CERTMANAGER] To enable cert-manager, uncomment all sections with 'CERTMANAGER'. 'WEBHOOK' components are required.
- ../certmanager
# [PROMETHEUS] To enable prometheus monitor, uncomment all sections with 'PROMETHEUS'.
# [PROMETHEUS] To enable prometheus monitor, uncomment all sections with 'PROMETHEUS'.
#- ../prometheus
patchesStrategicMerge:
# Protect the /metrics endpoint by putting it behind auth.
# Only one of manager_auth_proxy_patch.yaml and
# manager_prometheus_metrics_patch.yaml should be enabled.
# Protect the /metrics endpoint by putting it behind auth.
# Only one of manager_auth_proxy_patch.yaml and
# manager_prometheus_metrics_patch.yaml should be enabled.
- manager_auth_proxy_patch.yaml
# If you want your controller-manager to expose the /metrics
# endpoint w/o any authn/z, uncomment the following line and
# comment manager_auth_proxy_patch.yaml.
# Only one of manager_auth_proxy_patch.yaml and
# manager_prometheus_metrics_patch.yaml should be enabled.
# If you want your controller-manager to expose the /metrics
# endpoint w/o any authn/z, uncomment the following line and
# comment manager_auth_proxy_patch.yaml.
# Only one of manager_auth_proxy_patch.yaml and
# manager_prometheus_metrics_patch.yaml should be enabled.
#- manager_prometheus_metrics_patch.yaml
# [WEBHOOK] To enable webhook, uncomment all the sections with [WEBHOOK] prefix including the one in crd/kustomization.yaml

View File

@@ -23,4 +23,3 @@ spec:
args:
- "--metrics-addr=127.0.0.1:8080"
- "--enable-leader-election"
- "--sync-period=10m"

View File

@@ -0,0 +1,37 @@
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
app.kubernetes.io/component: github-webhook-server
app.kubernetes.io/part-of: actions-runner-controller
name: github-webhook-server
spec:
replicas: 1
selector:
matchLabels:
app.kubernetes.io/component: github-webhook-server
app.kubernetes.io/part-of: actions-runner-controller
template:
metadata:
labels:
app.kubernetes.io/component: github-webhook-server
app.kubernetes.io/part-of: actions-runner-controller
spec:
containers:
- name: github-webhook-server
image: controller:latest
command:
- '/github-webhook-server'
env:
- name: GITHUB_WEBHOOK_SECRET_TOKEN
valueFrom:
secretKeyRef:
key: github_webhook_secret_token
name: github-webhook-server
optional: true
ports:
- containerPort: 8000
name: http
protocol: TCP
serviceAccountName: github-webhook-server
terminationGracePeriodSeconds: 10

View File

@@ -0,0 +1,23 @@
# This patch injects an HTTP proxy sidecar container that performs RBAC
# authorization against the Kubernetes API using SubjectAccessReviews.
apiVersion: apps/v1
kind: Deployment
metadata:
name: github-webhook-server
spec:
template:
spec:
containers:
- name: kube-rbac-proxy
image: quay.io/brancz/kube-rbac-proxy:v0.10.0
args:
- '--secure-listen-address=0.0.0.0:8443'
- '--upstream=http://127.0.0.1:8080/'
- '--logtostderr=true'
- '--v=10'
ports:
- containerPort: 8443
name: https
- name: github-webhook-server
args:
- '--metrics-addr=127.0.0.1:8080'

View File

@@ -0,0 +1,15 @@
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
images:
- name: controller
newName: summerwind/actions-runner-controller
newTag: latest
resources:
- deployment.yaml
- rbac.yaml
- service.yaml
patchesStrategicMerge:
- gh-webhook-server-auth-proxy-patch.yaml

View File

@@ -0,0 +1,113 @@
apiVersion: v1
kind: ServiceAccount
metadata:
labels:
app.kubernetes.io/component: github-webhook-server
app.kubernetes.io/part-of: actions-runner-controller
name: github-webhook-server
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
labels:
app.kubernetes.io/component: github-webhook-server
app.kubernetes.io/part-of: actions-runner-controller
name: github-webhook-server
rules:
- apiGroups:
- actions.summerwind.dev
resources:
- horizontalrunnerautoscalers
verbs:
- get
- list
- patch
- update
- watch
- apiGroups:
- actions.summerwind.dev
resources:
- horizontalrunnerautoscalers/finalizers
verbs:
- create
- delete
- get
- list
- patch
- update
- watch
- apiGroups:
- actions.summerwind.dev
resources:
- horizontalrunnerautoscalers/status
verbs:
- get
- patch
- update
- apiGroups:
- actions.summerwind.dev
resources:
- runnersets
verbs:
- get
- list
- watch
- apiGroups:
- actions.summerwind.dev
resources:
- runnerdeployments
verbs:
- create
- delete
- get
- list
- patch
- update
- watch
- apiGroups:
- actions.summerwind.dev
resources:
- runnerdeployments/finalizers
verbs:
- create
- delete
- get
- list
- patch
- update
- watch
- apiGroups:
- actions.summerwind.dev
resources:
- runnerdeployments/status
verbs:
- get
- patch
- update
- apiGroups:
- authentication.k8s.io
resources:
- tokenreviews
verbs:
- create
- apiGroups:
- authorization.k8s.io
resources:
- subjectaccessreviews
verbs:
- create
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
labels:
app.kubernetes.io/component: github-webhook-server
app.kubernetes.io/part-of: actions-runner-controller
name: github-webhook-server
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: github-webhook-server
subjects:
- kind: ServiceAccount
name: github-webhook-server

View File

@@ -0,0 +1,16 @@
apiVersion: v1
kind: Service
metadata:
labels:
app.kubernetes.io/component: github-webhook-server
app.kubernetes.io/part-of: actions-runner-controller
name: github-webhook-server
spec:
ports:
- port: 80
targetPort: http
protocol: TCP
name: http
selector:
app.kubernetes.io/component: github-webhook-server
app.kubernetes.io/part-of: actions-runner-controller

View File

@@ -202,6 +202,29 @@ rules:
verbs:
- create
- patch
- apiGroups:
- ""
resources:
- persistentvolumeclaims
verbs:
- create
- delete
- get
- list
- patch
- update
- watch
- apiGroups:
- ""
resources:
- persistentvolumes
verbs:
- delete
- get
- list
- patch
- update
- watch
- apiGroups:
- ""
resources:
@@ -226,3 +249,12 @@ rules:
- patch
- update
- watch
- apiGroups:
- ""
resources:
- secrets
verbs:
- delete
- get
- list
- watch

View File

@@ -7,9 +7,11 @@ import (
"math"
"strconv"
"strings"
"time"
"github.com/actions-runner-controller/actions-runner-controller/api/v1alpha1"
"github.com/google/go-github/v45/github"
corev1 "k8s.io/api/core/v1"
"sigs.k8s.io/controller-runtime/pkg/client"
)
const (
@@ -19,47 +21,6 @@ const (
defaultScaleDownFactor = 0.7
)
func getValueAvailableAt(now time.Time, from, to *time.Time, reservedValue int) *int {
if to != nil && now.After(*to) {
return nil
}
if from != nil && now.Before(*from) {
return nil
}
return &reservedValue
}
func (r *HorizontalRunnerAutoscalerReconciler) fetchSuggestedReplicasFromCache(hra v1alpha1.HorizontalRunnerAutoscaler) *int {
var entry *v1alpha1.CacheEntry
for i := range hra.Status.CacheEntries {
ent := hra.Status.CacheEntries[i]
if ent.Key != v1alpha1.CacheEntryKeyDesiredReplicas {
continue
}
if !time.Now().Before(ent.ExpirationTime.Time) {
continue
}
entry = &ent
break
}
if entry != nil {
v := getValueAvailableAt(time.Now(), nil, &entry.ExpirationTime.Time, entry.Value)
if v != nil {
return v
}
}
return nil
}
func (r *HorizontalRunnerAutoscalerReconciler) suggestDesiredReplicas(st scaleTarget, hra v1alpha1.HorizontalRunnerAutoscaler) (*int, error) {
if hra.Spec.MinReplicas == nil {
return nil, fmt.Errorf("horizontalrunnerautoscaler %s/%s is missing minReplicas", hra.Namespace, hra.Name)
@@ -70,13 +31,11 @@ func (r *HorizontalRunnerAutoscalerReconciler) suggestDesiredReplicas(st scaleTa
metrics := hra.Spec.Metrics
numMetrics := len(metrics)
if numMetrics == 0 {
if len(hra.Spec.ScaleUpTriggers) == 0 {
return r.suggestReplicasByQueuedAndInProgressWorkflowRuns(st, hra, nil)
}
// We don't default to anything since ARC 0.23.0
// See https://github.com/actions-runner-controller/actions-runner-controller/issues/728
return nil, nil
} else if numMetrics > 2 {
return nil, fmt.Errorf("Too many autoscaling metrics configured: It must be 0 to 2, but got %d", numMetrics)
return nil, fmt.Errorf("too many autoscaling metrics configured: It must be 0 to 2, but got %d", numMetrics)
}
primaryMetric := metrics[0]
@@ -93,7 +52,7 @@ func (r *HorizontalRunnerAutoscalerReconciler) suggestDesiredReplicas(st scaleTa
case v1alpha1.AutoscalingMetricTypePercentageRunnersBusy:
suggested, err = r.suggestReplicasByPercentageRunnersBusy(st, hra, primaryMetric)
default:
return nil, fmt.Errorf("validting autoscaling metrics: unsupported metric type %q", primaryMetric)
return nil, fmt.Errorf("validating autoscaling metrics: unsupported metric type %q", primaryMetric)
}
if err != nil {
@@ -164,14 +123,46 @@ func (r *HorizontalRunnerAutoscalerReconciler) suggestReplicasByQueuedAndInProgr
fallback_cb()
return
}
jobs, _, err := r.GitHubClient.Actions.ListWorkflowJobs(context.TODO(), user, repoName, runID, nil)
if err != nil {
r.Log.Error(err, "Error listing workflow jobs")
fallback_cb()
} else if len(jobs.Jobs) == 0 {
opt := github.ListWorkflowJobsOptions{ListOptions: github.ListOptions{PerPage: 50}}
var allJobs []*github.WorkflowJob
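// Paginate through every job of the run; go-github reports the last page by setting resp.NextPage to 0.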
for {
jobs, resp, err := r.GitHubClient.Actions.ListWorkflowJobs(context.TODO(), user, repoName, runID, &opt)
if err != nil {
r.Log.Error(err, "Error listing workflow jobs")
return //err
}
allJobs = append(allJobs, jobs.Jobs...)
if resp.NextPage == 0 {
break
}
opt.Page = resp.NextPage
}
if len(allJobs) == 0 {
fallback_cb()
} else {
for _, job := range jobs.Jobs {
JOB:
for _, job := range allJobs {
runnerLabels := make(map[string]struct{}, len(st.labels))
for _, l := range st.labels {
runnerLabels[l] = struct{}{}
}
if len(job.Labels) == 0 {
// This shouldn't usually happen
r.Log.Info("Detected job with no labels, which is not supported by ARC. Skipping anyway.", "labels", job.Labels, "run_id", job.GetRunID(), "job_id", job.GetID())
continue JOB
}
for _, l := range job.Labels {
if l == "self-hosted" {
continue
}
if _, ok := runnerLabels[l]; !ok {
continue JOB
}
}
switch job.GetStatus() {
case "completed":
// We add a case for `completed` so it is not counted in `unknown`.
@@ -325,22 +316,52 @@ func (r *HorizontalRunnerAutoscalerReconciler) suggestReplicasByPercentageRunner
numRunners int
numRunnersRegistered int
numRunnersBusy int
numTerminatingBusy int
)
numRunners = len(runnerMap)
busyTerminatingRunnerPods := map[string]struct{}{}
kindLabel := LabelKeyRunnerDeploymentName
if hra.Spec.ScaleTargetRef.Kind == "RunnerSet" {
kindLabel = LabelKeyRunnerSetName
}
var runnerPodList corev1.PodList
if err := r.Client.List(ctx, &runnerPodList, client.InNamespace(hra.Namespace), client.MatchingLabels(map[string]string{
kindLabel: hra.Spec.ScaleTargetRef.Name,
})); err != nil {
return nil, err
}
for _, p := range runnerPodList.Items {
if p.Annotations[AnnotationKeyUnregistrationFailureMessage] != "" {
busyTerminatingRunnerPods[p.Name] = struct{}{}
}
}
for _, runner := range runners {
if _, ok := runnerMap[*runner.Name]; ok {
numRunnersRegistered++
if runner.GetBusy() {
numRunnersBusy++
} else if _, ok := busyTerminatingRunnerPods[*runner.Name]; ok {
numTerminatingBusy++
}
delete(busyTerminatingRunnerPods, *runner.Name)
}
}
// Remaining busyTerminatingRunnerPods are runners that were not on the ListRunners API response yet
for range busyTerminatingRunnerPods {
numTerminatingBusy++
}
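// Busy-but-terminating runners (those that failed to unregister cleanly) are added to the busy count below,
// so their in-flight jobs keep holding capacity and do not cause a premature scale-down.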
var desiredReplicas int
fractionBusy := float64(numRunnersBusy) / float64(desiredReplicasBefore)
fractionBusy := float64(numRunnersBusy+numTerminatingBusy) / float64(desiredReplicasBefore)
if fractionBusy >= scaleUpThreshold {
if scaleUpAdjustment > 0 {
desiredReplicas = desiredReplicasBefore + scaleUpAdjustment
@@ -369,6 +390,7 @@ func (r *HorizontalRunnerAutoscalerReconciler) suggestReplicasByPercentageRunner
"num_runners", numRunners,
"num_runners_registered", numRunnersRegistered,
"num_runners_busy", numRunnersBusy,
"num_terminating_busy", numTerminatingBusy,
"namespace", hra.Namespace,
"kind", st.kind,
"name", st.st,

View File

@@ -41,8 +41,12 @@ func TestDetermineDesiredReplicas_RepositoryRunner(t *testing.T) {
metav1Now := metav1.Now()
testcases := []struct {
repo string
org string
description string
repo string
org string
labels []string
fixed *int
max *int
min *int
@@ -68,6 +72,19 @@ func TestDetermineDesiredReplicas_RepositoryRunner(t *testing.T) {
workflowRuns_in_progress: `{"total_count": 2, "workflow_runs":[{"status":"in_progress"}, {"status":"in_progress"}]}"`,
want: 3,
},
// Explicitly specified the default `self-hosted` label, which is ignored by the simulator,
// as we assume that GitHub Actions automatically associates the `self-hosted` label to every self-hosted runner.
// 3 demanded, max at 3
{
repo: "test/valid",
labels: []string{"self-hosted"},
min: intPtr(2),
max: intPtr(3),
workflowRuns: `{"total_count": 4, "workflow_runs":[{"status":"queued"}, {"status":"in_progress"}, {"status":"in_progress"}, {"status":"completed"}]}"`,
workflowRuns_queued: `{"total_count": 1, "workflow_runs":[{"status":"queued"}]}"`,
workflowRuns_in_progress: `{"total_count": 2, "workflow_runs":[{"status":"in_progress"}, {"status":"in_progress"}]}"`,
want: 3,
},
// 2 demanded, max at 3, currently 3, delay scaling down due to grace period
{
repo: "test/valid",
@@ -152,9 +169,40 @@ func TestDetermineDesiredReplicas_RepositoryRunner(t *testing.T) {
want: 3,
},
// Job-level autoscaling
// 5 requested from 3 workflows
{
description: "Job-level autoscaling with no explicit runner label (runners have implicit self-hosted, requested self-hosted, 5 jobs from 3 workflows)",
repo: "test/valid",
min: intPtr(2),
max: intPtr(10),
workflowRuns: `{"total_count": 4, "workflow_runs":[{"id": 1, "status":"queued"}, {"id": 2, "status":"in_progress"}, {"id": 3, "status":"in_progress"}, {"status":"completed"}]}"`,
workflowRuns_queued: `{"total_count": 1, "workflow_runs":[{"id": 1, "status":"queued"}]}"`,
workflowRuns_in_progress: `{"total_count": 2, "workflow_runs":[{"id": 2, "status":"in_progress"}, {"id": 3, "status":"in_progress"}]}"`,
workflowJobs: map[int]string{
1: `{"jobs": [{"status":"queued", "labels":["self-hosted"]}, {"status":"queued", "labels":["self-hosted"]}]}`,
2: `{"jobs": [{"status": "in_progress", "labels":["self-hosted"]}, {"status":"completed", "labels":["self-hosted"]}]}`,
3: `{"jobs": [{"status": "in_progress", "labels":["self-hosted"]}, {"status":"queued", "labels":["self-hosted"]}]}`,
},
want: 5,
},
{
description: "Skipped job-level autoscaling with no explicit runner label (runners have implicit self-hosted, requested self-hosted+custom, 0 jobs from 3 workflows)",
repo: "test/valid",
min: intPtr(2),
max: intPtr(10),
workflowRuns: `{"total_count": 4, "workflow_runs":[{"id": 1, "status":"queued"}, {"id": 2, "status":"in_progress"}, {"id": 3, "status":"in_progress"}, {"status":"completed"}]}"`,
workflowRuns_queued: `{"total_count": 1, "workflow_runs":[{"id": 1, "status":"queued"}]}"`,
workflowRuns_in_progress: `{"total_count": 2, "workflow_runs":[{"id": 2, "status":"in_progress"}, {"id": 3, "status":"in_progress"}]}"`,
workflowJobs: map[int]string{
1: `{"jobs": [{"status":"queued", "labels":["self-hosted", "custom"]}, {"status":"queued", "labels":["self-hosted", "custom"]}]}`,
2: `{"jobs": [{"status": "in_progress", "labels":["self-hosted", "custom"]}, {"status":"completed", "labels":["self-hosted", "custom"]}]}`,
3: `{"jobs": [{"status": "in_progress", "labels":["self-hosted", "custom"]}, {"status":"queued", "labels":["self-hosted", "custom"]}]}`,
},
want: 2,
},
{
description: "Skipped job-level autoscaling with no label (runners have implicit self-hosted, jobs had no labels, 0 jobs from 3 workflows)",
repo: "test/valid",
min: intPtr(2),
max: intPtr(10),
@@ -166,6 +214,91 @@ func TestDetermineDesiredReplicas_RepositoryRunner(t *testing.T) {
2: `{"jobs": [{"status": "in_progress"}, {"status":"completed"}]}`,
3: `{"jobs": [{"status": "in_progress"}, {"status":"queued"}]}`,
},
want: 2,
},
{
description: "Skipped job-level autoscaling with default runner label (runners have self-hosted only, requested self-hosted+custom, 0 jobs from 3 workflows)",
repo: "test/valid",
labels: []string{"self-hosted"},
min: intPtr(2),
max: intPtr(10),
workflowRuns: `{"total_count": 4, "workflow_runs":[{"id": 1, "status":"queued"}, {"id": 2, "status":"in_progress"}, {"id": 3, "status":"in_progress"}, {"status":"completed"}]}"`,
workflowRuns_queued: `{"total_count": 1, "workflow_runs":[{"id": 1, "status":"queued"}]}"`,
workflowRuns_in_progress: `{"total_count": 2, "workflow_runs":[{"id": 2, "status":"in_progress"}, {"id": 3, "status":"in_progress"}]}"`,
workflowJobs: map[int]string{
1: `{"jobs": [{"status":"queued", "labels":["self-hosted", "custom"]}, {"status":"queued", "labels":["self-hosted", "custom"]}]}`,
2: `{"jobs": [{"status": "in_progress", "labels":["self-hosted", "custom"]}, {"status":"completed", "labels":["self-hosted", "custom"]}]}`,
3: `{"jobs": [{"status": "in_progress", "labels":["self-hosted", "custom"]}, {"status":"queued", "labels":["self-hosted", "custom"]}]}`,
},
want: 2,
},
{
description: "Skipped job-level autoscaling with custom runner label (runners have custom2, requested self-hosted+custom, 0 jobs from 5 workflows",
repo: "test/valid",
labels: []string{"custom2"},
min: intPtr(2),
max: intPtr(10),
workflowRuns: `{"total_count": 4, "workflow_runs":[{"id": 1, "status":"queued"}, {"id": 2, "status":"in_progress"}, {"id": 3, "status":"in_progress"}, {"status":"completed"}]}"`,
workflowRuns_queued: `{"total_count": 1, "workflow_runs":[{"id": 1, "status":"queued"}]}"`,
workflowRuns_in_progress: `{"total_count": 2, "workflow_runs":[{"id": 2, "status":"in_progress"}, {"id": 3, "status":"in_progress"}]}"`,
workflowJobs: map[int]string{
1: `{"jobs": [{"status":"queued", "labels":["self-hosted", "custom"]}, {"status":"queued", "labels":["self-hosted", "custom"]}]}`,
2: `{"jobs": [{"status": "in_progress", "labels":["self-hosted", "custom"]}, {"status":"completed", "labels":["self-hosted", "custom"]}]}`,
3: `{"jobs": [{"status": "in_progress", "labels":["self-hosted", "custom"]}, {"status":"queued", "labels":["self-hosted", "custom"]}]}`,
},
want: 2,
},
{
description: "Skipped job-level autoscaling with default runner label (runners have self-hosted, requested managed-runner-label, 0 jobs from 3 runs)",
repo: "test/valid",
labels: []string{"self-hosted"},
min: intPtr(2),
max: intPtr(10),
workflowRuns: `{"total_count": 4, "workflow_runs":[{"id": 1, "status":"queued"}, {"id": 2, "status":"in_progress"}, {"id": 3, "status":"in_progress"}, {"status":"completed"}]}"`,
workflowRuns_queued: `{"total_count": 1, "workflow_runs":[{"id": 1, "status":"queued"}]}"`,
workflowRuns_in_progress: `{"total_count": 2, "workflow_runs":[{"id": 2, "status":"in_progress"}, {"id": 3, "status":"in_progress"}]}"`,
workflowJobs: map[int]string{
1: `{"jobs": [{"status":"queued", "labels":["managed-runner-label"]}, {"status":"queued", "labels":["managed-runner-label"]}]}`,
2: `{"jobs": [{"status": "in_progress", "labels":["managed-runner-label"]}, {"status":"completed", "labels":["managed-runner-label"]}]}`,
3: `{"jobs": [{"status": "in_progress", "labels":["managed-runner-label"]}, {"status":"queued", "labels":["managed-runner-label"]}]}`,
},
want: 2,
},
{
description: "Job-level autoscaling with default + custom runner label (runners have self-hosted+custom, requested self-hosted+custom, 5 jobs from 3 workflows)",
repo: "test/valid",
labels: []string{"self-hosted", "custom"},
min: intPtr(2),
max: intPtr(10),
workflowRuns: `{"total_count": 4, "workflow_runs":[{"id": 1, "status":"queued"}, {"id": 2, "status":"in_progress"}, {"id": 3, "status":"in_progress"}, {"status":"completed"}]}"`,
workflowRuns_queued: `{"total_count": 1, "workflow_runs":[{"id": 1, "status":"queued"}]}"`,
workflowRuns_in_progress: `{"total_count": 2, "workflow_runs":[{"id": 2, "status":"in_progress"}, {"id": 3, "status":"in_progress"}]}"`,
workflowJobs: map[int]string{
1: `{"jobs": [{"status":"queued", "labels":["self-hosted", "custom"]}, {"status":"queued", "labels":["self-hosted", "custom"]}]}`,
2: `{"jobs": [{"status": "in_progress", "labels":["self-hosted", "custom"]}, {"status":"completed", "labels":["self-hosted", "custom"]}]}`,
3: `{"jobs": [{"status": "in_progress", "labels":["self-hosted", "custom"]}, {"status":"queued", "labels":["self-hosted", "custom"]}]}`,
},
want: 5,
},
{
description: "Job-level autoscaling with custom runner label (runners have custom, requested self-hosted+custom, 5 jobs from 3 workflows)",
repo: "test/valid",
labels: []string{"custom"},
min: intPtr(2),
max: intPtr(10),
workflowRuns: `{"total_count": 4, "workflow_runs":[{"id": 1, "status":"queued"}, {"id": 2, "status":"in_progress"}, {"id": 3, "status":"in_progress"}, {"status":"completed"}]}"`,
workflowRuns_queued: `{"total_count": 1, "workflow_runs":[{"id": 1, "status":"queued"}]}"`,
workflowRuns_in_progress: `{"total_count": 2, "workflow_runs":[{"id": 2, "status":"in_progress"}, {"id": 3, "status":"in_progress"}]}"`,
workflowJobs: map[int]string{
1: `{"jobs": [{"status":"queued", "labels":["self-hosted", "custom"]}, {"status":"queued", "labels":["self-hosted", "custom"]}]}`,
2: `{"jobs": [{"status": "in_progress", "labels":["self-hosted", "custom"]}, {"status":"completed", "labels":["self-hosted", "custom"]}]}`,
3: `{"jobs": [{"status": "in_progress", "labels":["self-hosted", "custom"]}, {"status":"queued", "labels":["self-hosted", "custom"]}]}`,
},
want: 5,
},
}
@@ -181,7 +314,12 @@ func TestDetermineDesiredReplicas_RepositoryRunner(t *testing.T) {
_ = clientgoscheme.AddToScheme(scheme)
_ = v1alpha1.AddToScheme(scheme)
t.Run(fmt.Sprintf("case %d", i), func(t *testing.T) {
testName := fmt.Sprintf("case %d", i)
if tc.description != "" {
testName = tc.description
}
t.Run(testName, func(t *testing.T) {
server := fake.NewServer(
fake.WithListRepositoryWorkflowRunsResponse(200, tc.workflowRuns, tc.workflowRuns_queued, tc.workflowRuns_in_progress),
fake.WithListWorkflowJobsResponse(200, tc.workflowJobs),
@@ -191,9 +329,10 @@ func TestDetermineDesiredReplicas_RepositoryRunner(t *testing.T) {
client := newGithubClient(server)
h := &HorizontalRunnerAutoscalerReconciler{
Log: log,
GitHubClient: client,
Scheme: scheme,
Log: log,
GitHubClient: client,
Scheme: scheme,
DefaultScaleDownDelay: DefaultScaleDownDelay,
}
rd := v1alpha1.RunnerDeployment{
@@ -206,6 +345,7 @@ func TestDetermineDesiredReplicas_RepositoryRunner(t *testing.T) {
Spec: v1alpha1.RunnerSpec{
RunnerConfig: v1alpha1.RunnerConfig{
Repository: tc.repo,
Labels: tc.labels,
},
},
},
@@ -220,6 +360,11 @@ func TestDetermineDesiredReplicas_RepositoryRunner(t *testing.T) {
Spec: v1alpha1.HorizontalRunnerAutoscalerSpec{
MaxReplicas: tc.max,
MinReplicas: tc.min,
Metrics: []v1alpha1.MetricSpec{
{
Type: "TotalNumberOfQueuedAndInProgressWorkflowRuns",
},
},
},
Status: v1alpha1.HorizontalRunnerAutoscalerStatus{
DesiredReplicas: tc.sReplicas,
@@ -234,7 +379,7 @@ func TestDetermineDesiredReplicas_RepositoryRunner(t *testing.T) {
st := h.scaleTargetFromRD(context.Background(), rd)
got, _, _, err := h.computeReplicasWithCache(log, metav1Now.Time, st, hra, minReplicas)
got, err := h.computeReplicasWithCache(log, metav1Now.Time, st, hra, minReplicas)
if err != nil {
if tc.err == "" {
t.Fatalf("unexpected error: expected none, got %v", err)
@@ -258,8 +403,12 @@ func TestDetermineDesiredReplicas_OrganizationalRunner(t *testing.T) {
metav1Now := metav1.Now()
testcases := []struct {
repos []string
org string
description string
repos []string
org string
labels []string
fixed *int
max *int
min *int
@@ -399,9 +548,43 @@ func TestDetermineDesiredReplicas_OrganizationalRunner(t *testing.T) {
err: "validating autoscaling metrics: spec.autoscaling.metrics[].repositoryNames is required and must have one more more entries for organizational runner deployment",
},
// Job-level autoscaling
// 5 requested from 3 workflows
{
description: "Job-level autoscaling (runners have implicit self-hosted, requested self-hosted, 5 jobs from 3 runs)",
org: "test",
repos: []string{"valid"},
min: intPtr(2),
max: intPtr(10),
workflowRuns: `{"total_count": 4, "workflow_runs":[{"id": 1, "status":"queued"}, {"id": 2, "status":"in_progress"}, {"id": 3, "status":"in_progress"}, {"status":"completed"}]}"`,
workflowRuns_queued: `{"total_count": 1, "workflow_runs":[{"id": 1, "status":"queued"}]}"`,
workflowRuns_in_progress: `{"total_count": 2, "workflow_runs":[{"id": 2, "status":"in_progress"}, {"id": 3, "status":"in_progress"}, {"status":"completed"}]}"`,
workflowJobs: map[int]string{
1: `{"jobs": [{"status":"queued", "labels":["self-hosted"]}, {"status":"queued", "labels":["self-hosted"]}]}`,
2: `{"jobs": [{"status": "in_progress", "labels":["self-hosted"]}, {"status":"completed", "labels":["self-hosted"]}]}`,
3: `{"jobs": [{"status": "in_progress", "labels":["self-hosted"]}, {"status":"queued", "labels":["self-hosted"]}]}`,
},
want: 5,
},
{
description: "Job-level autoscaling (runners have explicit self-hosted, requested self-hosted, 5 jobs from 3 runs)",
org: "test",
repos: []string{"valid"},
labels: []string{"self-hosted"},
min: intPtr(2),
max: intPtr(10),
workflowRuns: `{"total_count": 4, "workflow_runs":[{"id": 1, "status":"queued"}, {"id": 2, "status":"in_progress"}, {"id": 3, "status":"in_progress"}, {"status":"completed"}]}"`,
workflowRuns_queued: `{"total_count": 1, "workflow_runs":[{"id": 1, "status":"queued"}]}"`,
workflowRuns_in_progress: `{"total_count": 2, "workflow_runs":[{"id": 2, "status":"in_progress"}, {"id": 3, "status":"in_progress"}, {"status":"completed"}]}"`,
workflowJobs: map[int]string{
1: `{"jobs": [{"status":"queued", "labels":["self-hosted"]}, {"status":"queued", "labels":["self-hosted"]}]}`,
2: `{"jobs": [{"status": "in_progress", "labels":["self-hosted"]}, {"status":"completed", "labels":["self-hosted"]}]}`,
3: `{"jobs": [{"status": "in_progress", "labels":["self-hosted"]}, {"status":"queued", "labels":["self-hosted"]}]}`,
},
want: 5,
},
{
description: "Skipped job-level autoscaling (jobs lack labels, 0 requested from 3 workflows)",
org: "test",
repos: []string{"valid"},
min: intPtr(2),
@@ -414,8 +597,97 @@ func TestDetermineDesiredReplicas_OrganizationalRunner(t *testing.T) {
2: `{"jobs": [{"status": "in_progress"}, {"status":"completed"}]}`,
3: `{"jobs": [{"status": "in_progress"}, {"status":"queued"}]}`,
},
want: 2,
},
{
description: "Skipped job-level autoscaling (runners have valid and implicit self-hosted, requested self-hosted+custom, 0 jobs from 3 runs)",
org: "test",
repos: []string{"valid"},
min: intPtr(2),
max: intPtr(10),
workflowRuns: `{"total_count": 4, "workflow_runs":[{"id": 1, "status":"queued"}, {"id": 2, "status":"in_progress"}, {"id": 3, "status":"in_progress"}, {"status":"completed"}]}"`,
workflowRuns_queued: `{"total_count": 1, "workflow_runs":[{"id": 1, "status":"queued"}]}"`,
workflowRuns_in_progress: `{"total_count": 2, "workflow_runs":[{"id": 2, "status":"in_progress"}, {"id": 3, "status":"in_progress"}, {"status":"completed"}]}"`,
workflowJobs: map[int]string{
1: `{"jobs": [{"status":"queued", "labels":["self-hosted", "custom"]}, {"status":"queued", "labels":["self-hosted", "custom"]}]}`,
2: `{"jobs": [{"status": "in_progress", "labels":["self-hosted", "custom"]}, {"status":"completed", "labels":["self-hosted", "custom"]}]}`,
3: `{"jobs": [{"status": "in_progress", "labels":["self-hosted", "custom"]}, {"status":"queued", "labels":["self-hosted", "custom"]}]}`,
},
want: 2,
},
{
description: "Skipped job-level autoscaling (runners have self-hosted, requested self-hosted+custom, 0 jobs from 3 workflows)",
org: "test",
repos: []string{"valid"},
labels: []string{"self-hosted"},
min: intPtr(2),
max: intPtr(10),
workflowRuns: `{"total_count": 4, "workflow_runs":[{"id": 1, "status":"queued"}, {"id": 2, "status":"in_progress"}, {"id": 3, "status":"in_progress"}, {"status":"completed"}]}"`,
workflowRuns_queued: `{"total_count": 1, "workflow_runs":[{"id": 1, "status":"queued"}]}"`,
workflowRuns_in_progress: `{"total_count": 2, "workflow_runs":[{"id": 2, "status":"in_progress"}, {"id": 3, "status":"in_progress"}, {"status":"completed"}]}"`,
workflowJobs: map[int]string{
1: `{"jobs": [{"status":"queued", "labels":["self-hosted", "custom"]}, {"status":"queued", "labels":["self-hosted", "custom"]}]}`,
2: `{"jobs": [{"status": "in_progress", "labels":["self-hosted", "custom"]}, {"status":"completed", "labels":["self-hosted", "custom"]}]}`,
3: `{"jobs": [{"status": "in_progress", "labels":["self-hosted", "custom"]}, {"status":"queued", "labels":["self-hosted", "custom"]}]}`,
},
want: 2,
},
{
description: "Job-level autoscaling (runners have custom, requested self-hosted+custom, 5 requested from 3 workflows)",
org: "test",
repos: []string{"valid"},
labels: []string{"custom"},
min: intPtr(2),
max: intPtr(10),
workflowRuns: `{"total_count": 4, "workflow_runs":[{"id": 1, "status":"queued"}, {"id": 2, "status":"in_progress"}, {"id": 3, "status":"in_progress"}, {"status":"completed"}]}"`,
workflowRuns_queued: `{"total_count": 1, "workflow_runs":[{"id": 1, "status":"queued"}]}"`,
workflowRuns_in_progress: `{"total_count": 2, "workflow_runs":[{"id": 2, "status":"in_progress"}, {"id": 3, "status":"in_progress"}, {"status":"completed"}]}"`,
workflowJobs: map[int]string{
1: `{"jobs": [{"status":"queued", "labels":["self-hosted", "custom"]}, {"status":"queued", "labels":["self-hosted", "custom"]}]}`,
2: `{"jobs": [{"status": "in_progress", "labels":["self-hosted", "custom"]}, {"status":"completed", "labels":["self-hosted", "custom"]}]}`,
3: `{"jobs": [{"status": "in_progress", "labels":["self-hosted", "custom"]}, {"status":"queued", "labels":["self-hosted", "custom"]}]}`,
},
want: 5,
},
{
description: "Job-level autoscaling (runners have custom, requested custom, 5 requested from 3 workflows)",
org: "test",
repos: []string{"valid"},
labels: []string{"custom"},
min: intPtr(2),
max: intPtr(10),
workflowRuns: `{"total_count": 4, "workflow_runs":[{"id": 1, "status":"queued"}, {"id": 2, "status":"in_progress"}, {"id": 3, "status":"in_progress"}, {"status":"completed"}]}"`,
workflowRuns_queued: `{"total_count": 1, "workflow_runs":[{"id": 1, "status":"queued"}]}"`,
workflowRuns_in_progress: `{"total_count": 2, "workflow_runs":[{"id": 2, "status":"in_progress"}, {"id": 3, "status":"in_progress"}, {"status":"completed"}]}"`,
workflowJobs: map[int]string{
1: `{"jobs": [{"status":"queued", "labels":["custom"]}, {"status":"queued", "labels":["custom"]}]}`,
2: `{"jobs": [{"status": "in_progress", "labels":["custom"]}, {"status":"completed", "labels":["custom"]}]}`,
3: `{"jobs": [{"status": "in_progress", "labels":["custom"]}, {"status":"queued", "labels":["custom"]}]}`,
},
want: 5,
},
{
description: "Skipped job-level autoscaling (specified custom2, 0 requested from 3 workflows)",
org: "test",
repos: []string{"valid"},
labels: []string{"custom2"},
min: intPtr(2),
max: intPtr(10),
workflowRuns: `{"total_count": 4, "workflow_runs":[{"id": 1, "status":"queued"}, {"id": 2, "status":"in_progress"}, {"id": 3, "status":"in_progress"}, {"status":"completed"}]}"`,
workflowRuns_queued: `{"total_count": 1, "workflow_runs":[{"id": 1, "status":"queued"}]}"`,
workflowRuns_in_progress: `{"total_count": 2, "workflow_runs":[{"id": 2, "status":"in_progress"}, {"id": 3, "status":"in_progress"}, {"status":"completed"}]}"`,
workflowJobs: map[int]string{
1: `{"jobs": [{"status":"queued", "labels":["self-hosted", "custom"]}, {"status":"queued", "labels":["self-hosted", "custom"]}]}`,
2: `{"jobs": [{"status": "in_progress", "labels":["self-hosted", "custom"]}, {"status":"completed", "labels":["self-hosted", "custom"]}]}`,
3: `{"jobs": [{"status": "in_progress", "labels":["self-hosted", "custom"]}, {"status":"queued", "labels":["self-hosted", "custom"]}]}`,
},
want: 2,
},
}
for i := range testcases {
@@ -429,7 +701,12 @@ func TestDetermineDesiredReplicas_OrganizationalRunner(t *testing.T) {
_ = clientgoscheme.AddToScheme(scheme)
_ = v1alpha1.AddToScheme(scheme)
t.Run(fmt.Sprintf("case %d", i), func(t *testing.T) {
testName := fmt.Sprintf("case %d", i)
if tc.description != "" {
testName = tc.description
}
t.Run(testName, func(t *testing.T) {
t.Helper()
server := fake.NewServer(
@@ -441,9 +718,10 @@ func TestDetermineDesiredReplicas_OrganizationalRunner(t *testing.T) {
client := newGithubClient(server)
h := &HorizontalRunnerAutoscalerReconciler{
Log: log,
Scheme: scheme,
GitHubClient: client,
Log: log,
Scheme: scheme,
GitHubClient: client,
DefaultScaleDownDelay: DefaultScaleDownDelay,
}
rd := v1alpha1.RunnerDeployment{
@@ -465,6 +743,7 @@ func TestDetermineDesiredReplicas_OrganizationalRunner(t *testing.T) {
Spec: v1alpha1.RunnerSpec{
RunnerConfig: v1alpha1.RunnerConfig{
Organization: tc.org,
Labels: tc.labels,
},
},
},
@@ -502,7 +781,7 @@ func TestDetermineDesiredReplicas_OrganizationalRunner(t *testing.T) {
st := h.scaleTargetFromRD(context.Background(), rd)
got, _, _, err := h.computeReplicasWithCache(log, metav1Now.Time, st, hra, minReplicas)
got, err := h.computeReplicasWithCache(log, metav1Now.Time, st, hra, minReplicas)
if err != nil {
if tc.err == "" {
t.Fatalf("unexpected error: expected none, got %v", err)

controllers/constants.go (new file, 72 lines)
View File

@@ -0,0 +1,72 @@
package controllers
import "time"
const (
LabelKeyRunnerSetName = "runnerset-name"
LabelKeyRunner = "actions-runner"
)
const (
// These names require at least one slash to work.
// See https://github.com/google/knative-gcp/issues/378
runnerPodFinalizerName = "actions.summerwind.dev/runner-pod"
runnerLinkedResourcesFinalizerName = "actions.summerwind.dev/linked-resources"
annotationKeyPrefix = "actions-runner/"
AnnotationKeyLastRegistrationCheckTime = "actions-runner-controller/last-registration-check-time"
// AnnotationKeyUnregistrationFailureMessage is the annotation that is added onto the pod once it failed to be unregistered from GitHub due to e.g. 422 error
AnnotationKeyUnregistrationFailureMessage = annotationKeyPrefix + "unregistration-failure-message"
// AnnotationKeyUnregistrationCompleteTimestamp is the annotation that is added onto the pod once the previously started unregistration process has been completed.
AnnotationKeyUnregistrationCompleteTimestamp = annotationKeyPrefix + "unregistration-complete-timestamp"
// AnnotationKeyRunnerCompletionWaitStartTimestamp is the annotation that is added onto the pod when
// ARC decided to wait for the pod to complete by itself, without the need for ARC to unregister the corresponding runner.
AnnotationKeyRunnerCompletionWaitStartTimestamp = annotationKeyPrefix + "runner-completion-wait-start-timestamp"
// AnnotationKeyUnregistrationStartTimestamp is the annotation that contains the time at which the requested unregistration process was started
AnnotationKeyUnregistrationStartTimestamp = annotationKeyPrefix + "unregistration-start-timestamp"
// AnnotationKeyUnregistrationRequestTimestamp is the annotation that contains the time that the unregistration has been requested.
// This doesn't immediately start the unregistration. Instead, ARC will first check if the runner has already been registered.
// If not, ARC waits for the registration to complete first, and only after that does it start the unregistration process.
// This is crucial to avoid a race where ARC marks the runner pod for deletion while the actions-runner is still registering itself to GitHub,
// which would leave the assigned job hanging forever.
AnnotationKeyUnregistrationRequestTimestamp = annotationKeyPrefix + "unregistration-request-timestamp"
AnnotationKeyRunnerID = annotationKeyPrefix + "id"
// This can be any value but a larger value can make an unregistration timeout longer than configured in practice.
DefaultUnregistrationRetryDelay = time.Minute
// RetryDelayOnCreateRegistrationError is the delay between retry attempts for runner registration token creation.
// Usually, a retry in this case happens when e.g. your PAT has no access to a certain scope of runners, like when you're using a repository admin's token
// to create a broader-scoped runner token, such as an organization or enterprise runner token.
// Such a permission issue will never be fixed automatically, so we don't need to retry so often, hence this value.
RetryDelayOnCreateRegistrationError = 3 * time.Minute
// registrationTimeout is the duration until a pod times out after it becomes Ready and Running.
// A pod that has timed out can be terminated if needed.
registrationTimeout = 10 * time.Minute
// DefaultRunnerPodRecreationDelayAfterWebhookScale is the delay until syncing the runners with the desired replicas
// after a webhook-based scale up.
// This is used to prevent ARC from recreating completed runner pods that are deleted soon without being used at all.
// In other words, this is used as a timer to wait for the completed runner to emit the next `workflow_job` webhook event to decrease the desired replicas.
// So if we set this to 30 seconds, you are basically saying that you assume GitHub and your installation of ARC can
// emit and propagate a workflow_job completion event down to the RunnerSet or RunnerReplicaSet, via ARC's github webhook server and HRA, in approximately 30 seconds.
// In case it actually took more than DefaultRunnerPodRecreationDelayAfterWebhookScale for the workflow_job completion event to arrive,
// ARC will recreate the completed runner(s), assuming something went wrong in either GitHub, your K8s cluster, or ARC, so ARC needs to resync anyway.
//
// See https://github.com/actions-runner-controller/actions-runner-controller/pull/1180
DefaultRunnerPodRecreationDelayAfterWebhookScale = 10 * time.Minute
EnvVarRunnerName = "RUNNER_NAME"
EnvVarRunnerToken = "RUNNER_TOKEN"
// defaultRunnerHookPath is the path to the hook script used when "containerMode: kubernetes" is specified
defaultRunnerHookPath = "/runner/k8s/index.js"
)
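// The sketch below is illustrative only and not part of this change: it shows how the
// unregistration-related annotation keys defined above could be read back from a runner
// pod's annotations. The RFC3339 timestamp format is an assumption made for illustration.
func exampleReadUnregistrationTimestamps(annotations map[string]string) (requested, completed time.Time, err error) {
	if v, ok := annotations[AnnotationKeyUnregistrationRequestTimestamp]; ok {
		if requested, err = time.Parse(time.RFC3339, v); err != nil {
			return time.Time{}, time.Time{}, err
		}
	}
	if v, ok := annotations[AnnotationKeyUnregistrationCompleteTimestamp]; ok {
		if completed, err = time.Parse(time.RFC3339, v); err != nil {
			return time.Time{}, time.Time{}, err
		}
	}
	return requested, completed, nil
}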

View File

@@ -0,0 +1,207 @@
package controllers
import (
"context"
"fmt"
"sync"
"time"
"github.com/actions-runner-controller/actions-runner-controller/api/v1alpha1"
"github.com/go-logr/logr"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/types"
"sigs.k8s.io/controller-runtime/pkg/client"
)
type batchScaler struct {
Ctx context.Context
Client client.Client
Log logr.Logger
interval time.Duration
queue chan *ScaleTarget
workerStart sync.Once
}
func newBatchScaler(ctx context.Context, client client.Client, log logr.Logger) *batchScaler {
return &batchScaler{
Ctx: ctx,
Client: client,
Log: log,
interval: 3 * time.Second,
}
}
type batchScaleOperation struct {
namespacedName types.NamespacedName
scaleOps []scaleOperation
}
type scaleOperation struct {
trigger v1alpha1.ScaleUpTrigger
log logr.Logger
}
// Add the scale target to the unbounded queue, blocking until the target is successfully added to the queue.
// All the targets in the queue are dequeued every 3 seconds, grouped by the HRA, and applied.
// In the happy path, batchScaler updates each HRA only once, even when the HRA had two or more associated webhook events within the 3-second interval,
// which results in fewer K8s API calls and fewer HRA update conflicts in case your ARC installation receives a lot of webhook events.
func (s *batchScaler) Add(st *ScaleTarget) {
if st == nil {
return
}
s.workerStart.Do(func() {
var expBackoff = []time.Duration{time.Second, 2 * time.Second, 4 * time.Second, 8 * time.Second, 16 * time.Second}
s.queue = make(chan *ScaleTarget)
log := s.Log
go func() {
log.Info("Starting batch worker")
defer log.Info("Stopped batch worker")
for {
select {
case <-s.Ctx.Done():
return
default:
}
log.V(2).Info("Batch worker is dequeueing operations")
batches := map[types.NamespacedName]batchScaleOperation{}
after := time.After(s.interval)
var ops uint
batch:
for {
select {
case <-after:
after = nil
break batch
case st := <-s.queue:
nsName := types.NamespacedName{
Namespace: st.HorizontalRunnerAutoscaler.Namespace,
Name: st.HorizontalRunnerAutoscaler.Name,
}
b, ok := batches[nsName]
if !ok {
b = batchScaleOperation{
namespacedName: nsName,
}
}
b.scaleOps = append(b.scaleOps, scaleOperation{
log: *st.log,
trigger: st.ScaleUpTrigger,
})
batches[nsName] = b
ops++
}
}
log.V(2).Info("Batch worker dequeued operations", "ops", ops, "batches", len(batches))
retry:
for i := 0; ; i++ {
failed := map[types.NamespacedName]batchScaleOperation{}
for nsName, b := range batches {
b := b
if err := s.batchScale(context.Background(), b); err != nil {
log.V(2).Info("Failed to scale due to error", "error", err)
failed[nsName] = b
} else {
log.V(2).Info("Successfully ran batch scale", "hra", b.namespacedName)
}
}
if len(failed) == 0 {
break retry
}
batches = failed
delay := 16 * time.Second
if i < len(expBackoff) {
delay = expBackoff[i]
}
time.Sleep(delay)
}
}
}()
})
s.queue <- st
}
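// The sketch below is illustrative only and not part of this change: it shows how two
// webhook events targeting the same HRA, enqueued within the 3-second batching interval,
// are coalesced by the batch worker into a single HRA update. The hra, trigger, and log
// arguments are placeholders supplied by the caller.
func exampleBatchedScale(ctx context.Context, c client.Client, log logr.Logger, hra v1alpha1.HorizontalRunnerAutoscaler, trigger v1alpha1.ScaleUpTrigger) {
	bs := newBatchScaler(ctx, c, log)
	// Both targets resolve to the same NamespacedName, so they end up in one batchScaleOperation
	// and therefore a single update of the HRA's capacityReservations.
	bs.Add(&ScaleTarget{HorizontalRunnerAutoscaler: hra, ScaleUpTrigger: trigger, log: &log})
	bs.Add(&ScaleTarget{HorizontalRunnerAutoscaler: hra, ScaleUpTrigger: trigger, log: &log})
}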
func (s *batchScaler) batchScale(ctx context.Context, batch batchScaleOperation) error {
var hra v1alpha1.HorizontalRunnerAutoscaler
if err := s.Client.Get(ctx, batch.namespacedName, &hra); err != nil {
return err
}
copy := hra.DeepCopy()
copy.Spec.CapacityReservations = getValidCapacityReservations(copy)
var added, completed int
for _, scale := range batch.scaleOps {
amount := 1
if scale.trigger.Amount != 0 {
amount = scale.trigger.Amount
}
scale.log.V(2).Info("Adding capacity reservation", "amount", amount)
if amount > 0 {
now := time.Now()
copy.Spec.CapacityReservations = append(copy.Spec.CapacityReservations, v1alpha1.CapacityReservation{
EffectiveTime: metav1.Time{Time: now},
ExpirationTime: metav1.Time{Time: now.Add(scale.trigger.Duration.Duration)},
Replicas: amount,
})
added += amount
} else if amount < 0 {
var reservations []v1alpha1.CapacityReservation
var found bool
for _, r := range copy.Spec.CapacityReservations {
if !found && r.Replicas+amount == 0 {
found = true
} else {
reservations = append(reservations, r)
}
}
copy.Spec.CapacityReservations = reservations
completed += amount
}
}
before := len(hra.Spec.CapacityReservations)
expired := before - len(copy.Spec.CapacityReservations)
after := len(copy.Spec.CapacityReservations)
s.Log.V(1).Info(
fmt.Sprintf("Updating hra %s for capacityReservations update", hra.Name),
"before", before,
"expired", expired,
"added", added,
"completed", completed,
"after", after,
)
if err := s.Client.Update(ctx, copy); err != nil {
return fmt.Errorf("updating horizontalrunnerautoscaler to add capacity reservation: %w", err)
}
return nil
}
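// The sketch below is illustrative only and not part of this change: it mirrors how
// batchScale treats a positive Amount (append a fresh CapacityReservation) and a negative
// Amount (drop the oldest reservation that cancels it), which is how a workflow_job
// "completed" event undoes the reservation created for its earlier "queued" event.
func exampleCapacityReservationDelta(now time.Time, d time.Duration) []v1alpha1.CapacityReservation {
	reservations := []v1alpha1.CapacityReservation{
		{Replicas: 1, EffectiveTime: metav1.Time{Time: now.Add(-time.Minute)}},
	}
	// Amount: +1 appends a new reservation that expires after d.
	reservations = append(reservations, v1alpha1.CapacityReservation{
		Replicas:       1,
		EffectiveTime:  metav1.Time{Time: now},
		ExpirationTime: metav1.Time{Time: now.Add(d)},
	})
	// Amount: -1 removes the oldest reservation whose Replicas cancel it out.
	for i, r := range reservations {
		if r.Replicas == 1 {
			reservations = append(reservations[:i], reservations[i+1:]...)
			break
		}
	}
	return reservations
}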

View File

@@ -18,28 +18,36 @@ package controllers
import (
"context"
"encoding/json"
"fmt"
"io/ioutil"
"net/http"
"strings"
"sync"
"time"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/types"
"sigs.k8s.io/controller-runtime/pkg/reconcile"
"github.com/go-logr/logr"
gogithub "github.com/google/go-github/v37/github"
gogithub "github.com/google/go-github/v45/github"
"k8s.io/apimachinery/pkg/runtime"
"k8s.io/client-go/tools/record"
ctrl "sigs.k8s.io/controller-runtime"
"sigs.k8s.io/controller-runtime/pkg/client"
"github.com/actions-runner-controller/actions-runner-controller/api/v1alpha1"
"github.com/actions-runner-controller/actions-runner-controller/github"
"github.com/actions-runner-controller/actions-runner-controller/simulator"
)
const (
scaleTargetKey = "scaleTarget"
keyPrefixEnterprise = "enterprises/"
keyRunnerGroup = "/group/"
DefaultQueueLimit = 100
)
// HorizontalRunnerAutoscalerGitHubWebhook autoscales a HorizontalRunnerAutoscaler and the RunnerDeployment on each
@@ -54,11 +62,23 @@ type HorizontalRunnerAutoscalerGitHubWebhook struct {
// the administrator is generated and specified in GitHub Web UI.
SecretKeyBytes []byte
// GitHub Client to discover runner groups assigned to a repository
GitHubClient *github.Client
// Namespace is the namespace to watch for HorizontalRunnerAutoscaler's to be
// scaled on Webhook.
// Set to empty for letting it watch for all namespaces.
Namespace string
Name string
// QueueLimit is the maximum length of the bounded queue of scale targets and their associated operations.
// A scale target is enqueued on receipt of each eligible webhook event, so that it can be processed asynchronously.
QueueLimit int
worker *worker
workerInit sync.Once
workerStart sync.Once
batchCh chan *ScaleTarget
}
func (autoscaler *HorizontalRunnerAutoscalerGitHubWebhook) Reconcile(_ context.Context, request reconcile.Request) (reconcile.Result, error) {
@@ -84,7 +104,7 @@ func (autoscaler *HorizontalRunnerAutoscalerGitHubWebhook) Handle(w http.Respons
if err != nil {
msg := err.Error()
if written, err := w.Write([]byte(msg)); err != nil {
autoscaler.Log.Error(err, "failed writing http error response", "msg", msg, "written", written)
autoscaler.Log.V(1).Error(err, "failed writing http error response", "msg", msg, "written", written)
}
}
}
@@ -98,6 +118,7 @@ func (autoscaler *HorizontalRunnerAutoscalerGitHubWebhook) Handle(w http.Respons
// respond ok to GET / e.g. for health check
if r.Method == http.MethodGet {
ok = true
fmt.Fprintln(w, "webhook server is running")
return
}
@@ -141,6 +162,20 @@ func (autoscaler *HorizontalRunnerAutoscalerGitHubWebhook) Handle(w http.Respons
"delivery", r.Header.Get("X-GitHub-Delivery"),
)
var enterpriseEvent struct {
Enterprise struct {
Slug string `json:"slug,omitempty"`
} `json:"enterprise,omitempty"`
}
if err := json.Unmarshal(payload, &enterpriseEvent); err != nil {
var s string
if payload != nil {
s = string(payload)
}
autoscaler.Log.Error(err, "could not parse webhook payload for extracting enterprise slug", "webhookType", webhookType, "payload", s)
}
enterpriseSlug := enterpriseEvent.Enterprise.Slug
switch e := event.(type) {
case *gogithub.PushEvent:
target, err = autoscaler.getScaleUpTarget(
@@ -149,6 +184,9 @@ func (autoscaler *HorizontalRunnerAutoscalerGitHubWebhook) Handle(w http.Respons
e.Repo.GetName(),
e.Repo.Owner.GetLogin(),
e.Repo.Owner.GetType(),
// Most go-github Event types don't seem to contain the Enterprise(.Slug) fields
// we need, so we parse the payload ourselves.
enterpriseSlug,
autoscaler.MatchPushEvent(e),
)
case *gogithub.PullRequestEvent:
@@ -158,6 +196,9 @@ func (autoscaler *HorizontalRunnerAutoscalerGitHubWebhook) Handle(w http.Respons
e.Repo.GetName(),
e.Repo.Owner.GetLogin(),
e.Repo.Owner.GetType(),
// Most go-github Event types don't seem to contain the Enterprise(.Slug) fields
// we need, so we parse the payload ourselves.
enterpriseSlug,
autoscaler.MatchPullRequestEvent(e),
)
@@ -174,6 +215,9 @@ func (autoscaler *HorizontalRunnerAutoscalerGitHubWebhook) Handle(w http.Respons
e.Repo.GetName(),
e.Repo.Owner.GetLogin(),
e.Repo.Owner.GetType(),
// Most go-github Event types don't seem to contain the Enterprise(.Slug) fields
// we need, so we parse the payload ourselves.
enterpriseSlug,
autoscaler.MatchCheckRunEvent(e),
)
@@ -191,13 +235,14 @@ func (autoscaler *HorizontalRunnerAutoscalerGitHubWebhook) Handle(w http.Respons
"repository.name", e.Repo.GetName(),
"repository.owner.login", e.Repo.Owner.GetLogin(),
"repository.owner.type", e.Repo.Owner.GetType(),
"enterprise.slug", enterpriseSlug,
"action", e.GetAction(),
)
}
labels := e.WorkflowJob.Labels
switch e.GetAction() {
switch action := e.GetAction(); action {
case "queued", "completed":
target, err = autoscaler.getJobScaleUpTargetForRepoOrOrg(
context.TODO(),
@@ -205,22 +250,34 @@ func (autoscaler *HorizontalRunnerAutoscalerGitHubWebhook) Handle(w http.Respons
e.Repo.GetName(),
e.Repo.Owner.GetLogin(),
e.Repo.Owner.GetType(),
enterpriseSlug,
labels,
)
if target != nil {
if e.GetAction() == "queued" {
target.Amount = 1
} else if e.GetAction() == "completed" {
// A negative amount is processed in the tryScale func as a scale-down request,
// which erases the oldest CapacityReservation with the same amount.
// If the first CapacityReservation had Replicas=1, this negative scale target erases that,
// so that the resulting desired replicas decreases by 1.
target.Amount = -1
}
if target == nil {
break
}
default:
if e.GetAction() == "queued" {
target.Amount = 1
break
} else if e.GetAction() == "completed" && e.GetWorkflowJob().GetConclusion() != "skipped" {
// A negative amount is processed in the tryScale func as a scale-down request,
// which erases the oldest CapacityReservation with the same amount.
// If the first CapacityReservation had Replicas=1, this negative scale target erases that,
// so that the resulting desired replicas decreases by 1.
target.Amount = -1
break
}
// If the conclusion is "skipped", we ignore it and fall through to the default case.
fallthrough
default:
ok = true
w.WriteHeader(http.StatusOK)
log.V(2).Info("Received and ignored a workflow_job event as it triggers neither scale-up nor scale-down", "action", action)
return
}
case *gogithub.PingEvent:
ok = true
@@ -249,7 +306,7 @@ func (autoscaler *HorizontalRunnerAutoscalerGitHubWebhook) Handle(w http.Respons
}
if target == nil {
log.Info(
log.V(1).Info(
"Scale target not found. If this is unexpected, ensure that there is exactly one repository-wide or organizational runner deployment that matches this webhook event",
)
@@ -266,9 +323,19 @@ func (autoscaler *HorizontalRunnerAutoscalerGitHubWebhook) Handle(w http.Respons
return
}
if err := autoscaler.tryScale(context.TODO(), target); err != nil {
log.Error(err, "could not scale up")
autoscaler.workerInit.Do(func() {
batchScaler := newBatchScaler(context.Background(), autoscaler.Client, autoscaler.Log)
queueLimit := autoscaler.QueueLimit
if queueLimit == 0 {
queueLimit = DefaultQueueLimit
}
autoscaler.worker = newWorker(context.Background(), queueLimit, batchScaler.Add)
})
target.log = &log
if ok := autoscaler.worker.Add(target); !ok {
log.Error(err, "Could not scale up due to queue full")
return
}
@@ -310,9 +377,7 @@ func (autoscaler *HorizontalRunnerAutoscalerGitHubWebhook) findHRAsByKey(ctx con
return nil, err
}
for _, d := range hraList.Items {
hras = append(hras, d)
}
hras = append(hras, hraList.Items...)
}
return hras, nil
@@ -339,6 +404,8 @@ func matchTriggerConditionAgainstEvent(types []string, eventAction *string) bool
type ScaleTarget struct {
v1alpha1.HorizontalRunnerAutoscaler
v1alpha1.ScaleUpTrigger
log *logr.Logger
}
func (autoscaler *HorizontalRunnerAutoscalerGitHubWebhook) searchScaleTargets(hras []v1alpha1.HorizontalRunnerAutoscaler, f func(v1alpha1.ScaleUpTrigger) bool) []ScaleTarget {
@@ -391,7 +458,7 @@ func (autoscaler *HorizontalRunnerAutoscalerGitHubWebhook) getScaleTarget(ctx co
"Found too many scale targets: "+
"It must be exactly one to avoid ambiguity. "+
"Either set Namespace for the webhook-based autoscaler to let it only find HRAs in the namespace, "+
"or update Repository or Organization fields in your RunnerDeployment resources to fix the ambiguity.",
"or update Repository, Organization, or Enterprise fields in your RunnerDeployment resources to fix the ambiguity.",
"scaleTargets", strings.Join(scaleTargetIDs, ","))
return nil, nil
@@ -400,44 +467,31 @@ func (autoscaler *HorizontalRunnerAutoscalerGitHubWebhook) getScaleTarget(ctx co
return &targets[0], nil
}
func (autoscaler *HorizontalRunnerAutoscalerGitHubWebhook) getScaleUpTarget(ctx context.Context, log logr.Logger, repo, owner, ownerType string, f func(v1alpha1.ScaleUpTrigger) bool) (*ScaleTarget, error) {
repositoryRunnerKey := owner + "/" + repo
if target, err := autoscaler.getScaleTarget(ctx, repositoryRunnerKey, f); err != nil {
log.Info("finding repository-wide runner", "repository", repositoryRunnerKey)
return nil, err
} else if target != nil {
log.Info("scale up target is repository-wide runners", "repository", repo)
return target, nil
func (autoscaler *HorizontalRunnerAutoscalerGitHubWebhook) getScaleUpTarget(ctx context.Context, log logr.Logger, repo, owner, ownerType, enterprise string, f func(v1alpha1.ScaleUpTrigger) bool) (*ScaleTarget, error) {
scaleTarget := func(value string) (*ScaleTarget, error) {
return autoscaler.getScaleTarget(ctx, value, f)
}
if ownerType == "User" {
log.V(1).Info("no repository runner found", "organization", owner)
return nil, nil
}
if target, err := autoscaler.getScaleTarget(ctx, owner, f); err != nil {
log.Info("finding organizational runner", "organization", owner)
return nil, err
} else if target != nil {
log.Info("scale up target is organizational runners", "organization", owner)
return target, nil
} else {
log.V(1).Info("no repository runner or organizational runner found",
"repository", repositoryRunnerKey,
"organization", owner,
)
}
return nil, nil
return autoscaler.getScaleUpTargetWithFunction(ctx, log, repo, owner, ownerType, enterprise, scaleTarget)
}
func (autoscaler *HorizontalRunnerAutoscalerGitHubWebhook) getJobScaleUpTargetForRepoOrOrg(ctx context.Context, log logr.Logger, repo, owner, ownerType string, labels []string) (*ScaleTarget, error) {
func (autoscaler *HorizontalRunnerAutoscalerGitHubWebhook) getJobScaleUpTargetForRepoOrOrg(
ctx context.Context, log logr.Logger, repo, owner, ownerType, enterprise string, labels []string,
) (*ScaleTarget, error) {
scaleTarget := func(value string) (*ScaleTarget, error) {
return autoscaler.getJobScaleTarget(ctx, value, labels)
}
return autoscaler.getScaleUpTargetWithFunction(ctx, log, repo, owner, ownerType, enterprise, scaleTarget)
}
func (autoscaler *HorizontalRunnerAutoscalerGitHubWebhook) getScaleUpTargetWithFunction(
ctx context.Context, log logr.Logger, repo, owner, ownerType, enterprise string, scaleTarget func(value string) (*ScaleTarget, error)) (*ScaleTarget, error) {
repositoryRunnerKey := owner + "/" + repo
if target, err := autoscaler.getJobScaleTarget(ctx, repositoryRunnerKey, labels); err != nil {
log.Info("finding repository-wide runner", "repository", repositoryRunnerKey)
// Search for repository HRAs
if target, err := scaleTarget(repositoryRunnerKey); err != nil {
log.Error(err, "finding repository-wide runner", "repository", repositoryRunnerKey)
return nil, err
} else if target != nil {
log.Info("job scale up target is repository-wide runners", "repository", repo)
@@ -445,25 +499,181 @@ func (autoscaler *HorizontalRunnerAutoscalerGitHubWebhook) getJobScaleUpTargetFo
}
if ownerType == "User" {
log.V(1).Info("no repository runner found", "organization", owner)
log.V(1).Info("user repositories not supported", "owner", owner)
return nil, nil
}
if target, err := autoscaler.getJobScaleTarget(ctx, owner, labels); err != nil {
log.Info("finding organizational runner", "organization", owner)
// Find the potential runner groups first to avoid spending API queries needlessly. Once/if GitHub provides an
// API to find the related/linked runner groups for a specific repository, this logic could be removed.
managedRunnerGroups, err := autoscaler.getManagedRunnerGroupsFromHRAs(ctx, enterprise, owner)
if err != nil {
log.Error(err, "finding potential organization/enterprise runner groups from HRAs", "organization", owner)
return nil, err
} else if target != nil {
log.Info("job scale up target is organizational runners", "organization", owner)
return target, nil
} else {
log.V(1).Info("no repository runner or organizational runner found",
}
if managedRunnerGroups.IsEmpty() {
log.V(1).Info("no repository/organizational/enterprise runner found",
"repository", repositoryRunnerKey,
"organization", owner,
"enterprises", enterprise,
)
} else {
log.V(1).Info("Found some runner groups are managed by ARC", "groups", managedRunnerGroups)
}
var visibleGroups *simulator.VisibleRunnerGroups
if autoscaler.GitHubClient != nil {
simu := &simulator.Simulator{
Client: autoscaler.GitHubClient,
Log: log,
}
// Get the organization runner groups and enterprise runner groups available to the repository.
// These are the union of runner groups whose repository access is set to "All repositories" and runner groups
// that owner/repo has been granted access to. The list also includes the default runner group if the repository has access to it.
visibleGroups, err = simu.GetRunnerGroupsVisibleToRepository(ctx, owner, repositoryRunnerKey, managedRunnerGroups)
log.V(1).Info("Searching in runner groups", "groups", visibleGroups)
if err != nil {
log.Error(err, "Unable to find runner groups from repository", "organization", owner, "repository", repo)
return nil, fmt.Errorf("error while finding visible runner groups: %v", err)
}
} else {
// For backwards compatibility, if GitHub authentication is not configured, we assume all runner groups have
// visibility=all to honor the previous implementation; therefore any available enterprise/organization runner
// is a potential target for scaling. This also avoids the extra API calls made by
// GitHubClient.GetRunnerGroupsVisibleToRepository in case users are not using custom visibility on their runner
// groups or are only using default runner groups.
visibleGroups = managedRunnerGroups
}
scaleTargetKey := func(rg simulator.RunnerGroup) string {
switch rg.Kind {
case simulator.Default:
switch rg.Scope {
case simulator.Organization:
return owner
case simulator.Enterprise:
return enterpriseKey(enterprise)
}
case simulator.Custom:
switch rg.Scope {
case simulator.Organization:
return organizationalRunnerGroupKey(owner, rg.Name)
case simulator.Enterprise:
return enterpriseRunnerGroupKey(enterprise, rg.Name)
}
}
return ""
}
log.V(1).Info("groups", "groups", visibleGroups)
var t *ScaleTarget
traverseErr := visibleGroups.Traverse(func(rg simulator.RunnerGroup) (bool, error) {
key := scaleTargetKey(rg)
target, err := scaleTarget(key)
if err != nil {
log.Error(err, "finding runner group", "enterprise", enterprise, "organization", owner, "repository", repo, "key", key)
return false, err
} else if target == nil {
return false, nil
}
t = target
log.V(1).Info("job scale up target found", "enterprise", enterprise, "organization", owner, "repository", repo, "key", key)
return true, nil
})
if traverseErr != nil {
return nil, traverseErr
}
if t == nil {
log.V(1).Info("no repository/organizational/enterprise runner found",
"repository", repositoryRunnerKey,
"organization", owner,
"enterprise", enterprise,
)
}
return nil, nil
return t, nil
}
func (autoscaler *HorizontalRunnerAutoscalerGitHubWebhook) getManagedRunnerGroupsFromHRAs(ctx context.Context, enterprise, org string) (*simulator.VisibleRunnerGroups, error) {
groups := simulator.NewVisibleRunnerGroups()
ns := autoscaler.Namespace
var defaultListOpts []client.ListOption
if ns != "" {
defaultListOpts = append(defaultListOpts, client.InNamespace(ns))
}
opts := append([]client.ListOption{}, defaultListOpts...)
if autoscaler.Namespace != "" {
opts = append(opts, client.InNamespace(autoscaler.Namespace))
}
var hraList v1alpha1.HorizontalRunnerAutoscalerList
if err := autoscaler.List(ctx, &hraList, opts...); err != nil {
return groups, err
}
for _, hra := range hraList.Items {
var o, e, g string
kind := hra.Spec.ScaleTargetRef.Kind
switch kind {
case "RunnerSet":
var rs v1alpha1.RunnerSet
if err := autoscaler.Client.Get(context.Background(), types.NamespacedName{Namespace: hra.Namespace, Name: hra.Spec.ScaleTargetRef.Name}, &rs); err != nil {
return groups, err
}
o, e, g = rs.Spec.Organization, rs.Spec.Enterprise, rs.Spec.Group
case "RunnerDeployment", "":
var rd v1alpha1.RunnerDeployment
if err := autoscaler.Client.Get(context.Background(), types.NamespacedName{Namespace: hra.Namespace, Name: hra.Spec.ScaleTargetRef.Name}, &rd); err != nil {
return groups, err
}
o, e, g = rd.Spec.Template.Spec.Organization, rd.Spec.Template.Spec.Enterprise, rd.Spec.Template.Spec.Group
default:
return nil, fmt.Errorf("unsupported scale target kind: %v", kind)
}
if g != "" && e == "" && o == "" {
autoscaler.Log.V(1).Info(
"invalid runner group config in scale target: spec.group must be set along with either spec.enterprise or spec.organization",
"scaleTargetKind", kind,
"group", g,
"enterprise", e,
"organization", o,
)
continue
}
if e != enterprise && o != org {
autoscaler.Log.V(1).Info(
"Skipped scale target irrelevant to event",
"eventOrganization", org,
"eventEnterprise", enterprise,
"scaleTargetKind", kind,
"scaleTargetGroup", g,
"scaleTargetEnterprise", e,
"scaleTargetOrganization", o,
)
continue
}
rg := simulator.NewRunnerGroupFromProperties(e, o, g)
if err := groups.Add(rg); err != nil {
return groups, fmt.Errorf("failed adding visible group from HRA %s/%s: %w", hra.Namespace, hra.Name, err)
}
}
return groups, nil
}
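// The sketch below is illustrative only and not part of this change: it shows the kind of
// VisibleRunnerGroups value that getManagedRunnerGroupsFromHRAs builds, assuming a single
// RunnerDeployment whose spec declares organization "myorg" and runner group "gpu"
// (both names are hypothetical).
func exampleManagedRunnerGroups() (*simulator.VisibleRunnerGroups, error) {
	groups := simulator.NewVisibleRunnerGroups()
	// An organization-scoped custom runner group, derived from spec.organization and spec.group.
	if err := groups.Add(simulator.NewRunnerGroupFromProperties("", "myorg", "gpu")); err != nil {
		return nil, err
	}
	return groups, nil
}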
func (autoscaler *HorizontalRunnerAutoscalerGitHubWebhook) getJobScaleTarget(ctx context.Context, name string, labels []string) (*ScaleTarget, error) {
@@ -482,16 +692,29 @@ HRA:
if len(hra.Spec.ScaleUpTriggers) > 1 {
autoscaler.Log.V(1).Info("Skipping this HRA as it has too many ScaleUpTriggers to be used in workflow_job based scaling", "hra", hra.Name)
continue
}
if len(hra.Spec.ScaleUpTriggers) == 0 {
autoscaler.Log.V(1).Info("Skipping this HRA as it has no ScaleUpTriggers configured", "hra", hra.Name)
continue
}
scaleUpTrigger := hra.Spec.ScaleUpTriggers[0]
if scaleUpTrigger.GitHubEvent == nil {
autoscaler.Log.V(1).Info("Skipping this HRA as it has no `githubEvent` scale trigger configured", "hra", hra.Name)
continue
}
var duration metav1.Duration
if scaleUpTrigger.GitHubEvent.WorkflowJob == nil {
autoscaler.Log.V(1).Info("Skipping this HRA as it has no `githubEvent.workflowJob` scale trigger configured", "hra", hra.Name)
if len(hra.Spec.ScaleUpTriggers) > 0 {
duration = hra.Spec.ScaleUpTriggers[0].Duration
continue
}
duration := scaleUpTrigger.Duration
if duration.Duration <= 0 {
// Try to release the reserved capacity after at least 10 minutes by default, so that
// we don't end up with the reserved capacity remaining forever in case GitHub somehow stops sending us "completed" workflow_job events.
@@ -508,13 +731,17 @@ HRA:
return nil, err
}
if len(labels) == 1 && labels[0] == "self-hosted" {
return &ScaleTarget{HorizontalRunnerAutoscaler: hra, ScaleUpTrigger: v1alpha1.ScaleUpTrigger{Duration: duration}}, nil
}
// Ensure that the RunnerSet-managed runners have all the labels requested by the workflow_job.
for _, l := range labels {
var matched bool
// ignore "self-hosted" label as all instance here are self-hosted
if l == "self-hosted" {
continue
}
// TODO: labels related to OS and architecture need to be explicitly declared or the current implementation will not be able to find them.
for _, l2 := range rs.Spec.Labels {
if l == l2 {
matched = true
@@ -535,14 +762,18 @@ HRA:
return nil, err
}
if len(labels) == 1 && labels[0] == "self-hosted" {
return &ScaleTarget{HorizontalRunnerAutoscaler: hra, ScaleUpTrigger: v1alpha1.ScaleUpTrigger{Duration: duration}}, nil
}
// Ensure that the RunnerDeployment-managed runners have all the labels requested by the workflow_job.
for _, l := range labels {
var matched bool
for _, l2 := range rd.Spec.Template.Labels {
// ignore "self-hosted" label as all instance here are self-hosted
if l == "self-hosted" {
continue
}
// TODO: labels related to OS and architecture need to be explicitly declared or the current implementation will not be able to find them.
for _, l2 := range rd.Spec.Template.Spec.Labels {
if l == l2 {
matched = true
break
@@ -563,55 +794,6 @@ HRA:
return nil, nil
}
func (autoscaler *HorizontalRunnerAutoscalerGitHubWebhook) tryScale(ctx context.Context, target *ScaleTarget) error {
if target == nil {
return nil
}
copy := target.HorizontalRunnerAutoscaler.DeepCopy()
amount := 1
if target.ScaleUpTrigger.Amount != 0 {
amount = target.ScaleUpTrigger.Amount
}
capacityReservations := getValidCapacityReservations(copy)
if amount > 0 {
copy.Spec.CapacityReservations = append(capacityReservations, v1alpha1.CapacityReservation{
ExpirationTime: metav1.Time{Time: time.Now().Add(target.ScaleUpTrigger.Duration.Duration)},
Replicas: amount,
})
} else if amount < 0 {
var reservations []v1alpha1.CapacityReservation
var found bool
for _, r := range capacityReservations {
if !found && r.Replicas+amount == 0 {
found = true
} else {
reservations = append(reservations, r)
}
}
copy.Spec.CapacityReservations = reservations
}
autoscaler.Log.Info(
"Patching hra for capacityReservations update",
"before", target.HorizontalRunnerAutoscaler.Spec.CapacityReservations,
"after", copy.Spec.CapacityReservations,
)
if err := autoscaler.Client.Patch(ctx, copy, client.MergeFrom(&target.HorizontalRunnerAutoscaler)); err != nil {
return fmt.Errorf("patching horizontalrunnerautoscaler to add capacity reservation: %w", err)
}
return nil
}
func getValidCapacityReservations(autoscaler *v1alpha1.HorizontalRunnerAutoscaler) []v1alpha1.CapacityReservation {
var capacityReservations []v1alpha1.CapacityReservation
@@ -638,26 +820,63 @@ func (autoscaler *HorizontalRunnerAutoscalerGitHubWebhook) SetupWithManager(mgr
hra := rawObj.(*v1alpha1.HorizontalRunnerAutoscaler)
if hra.Spec.ScaleTargetRef.Name == "" {
autoscaler.Log.V(1).Info(fmt.Sprintf("scale target ref name not set for hra %s", hra.Name))
return nil
}
switch hra.Spec.ScaleTargetRef.Kind {
case "", "RunnerDeployment":
var rd v1alpha1.RunnerDeployment
if err := autoscaler.Client.Get(context.Background(), types.NamespacedName{Namespace: hra.Namespace, Name: hra.Spec.ScaleTargetRef.Name}, &rd); err != nil {
autoscaler.Log.V(1).Info(fmt.Sprintf("RunnerDeployment not found with scale target ref name %s for hra %s", hra.Spec.ScaleTargetRef.Name, hra.Name))
return nil
}
return []string{rd.Spec.Template.Spec.Repository, rd.Spec.Template.Spec.Organization}
keys := []string{}
if rd.Spec.Template.Spec.Repository != "" {
keys = append(keys, rd.Spec.Template.Spec.Repository) // Repository runners
}
if rd.Spec.Template.Spec.Organization != "" {
if group := rd.Spec.Template.Spec.Group; group != "" {
keys = append(keys, organizationalRunnerGroupKey(rd.Spec.Template.Spec.Organization, rd.Spec.Template.Spec.Group)) // Organization runner groups
} else {
keys = append(keys, rd.Spec.Template.Spec.Organization) // Organization runners
}
}
if enterprise := rd.Spec.Template.Spec.Enterprise; enterprise != "" {
if group := rd.Spec.Template.Spec.Group; group != "" {
keys = append(keys, enterpriseRunnerGroupKey(enterprise, rd.Spec.Template.Spec.Group)) // Enterprise runner groups
} else {
keys = append(keys, enterpriseKey(enterprise)) // Enterprise runners
}
}
autoscaler.Log.V(2).Info(fmt.Sprintf("HRA keys indexed for HRA %s: %v", hra.Name, keys))
return keys
case "RunnerSet":
var rs v1alpha1.RunnerSet
if err := autoscaler.Client.Get(context.Background(), types.NamespacedName{Namespace: hra.Namespace, Name: hra.Spec.ScaleTargetRef.Name}, &rs); err != nil {
autoscaler.Log.V(1).Info(fmt.Sprintf("RunnerSet not found with scale target ref name %s for hra %s", hra.Spec.ScaleTargetRef.Name, hra.Name))
return nil
}
return []string{rs.Spec.Repository, rs.Spec.Organization}
keys := []string{}
if rs.Spec.Repository != "" {
keys = append(keys, rs.Spec.Repository) // Repository runners
}
if rs.Spec.Organization != "" {
keys = append(keys, rs.Spec.Organization) // Organization runners
if group := rs.Spec.Group; group != "" {
keys = append(keys, organizationalRunnerGroupKey(rs.Spec.Organization, rs.Spec.Group)) // Organization runner groups
}
}
if enterprise := rs.Spec.Enterprise; enterprise != "" {
keys = append(keys, enterpriseKey(enterprise)) // Enterprise runners
if group := rs.Spec.Group; group != "" {
keys = append(keys, enterpriseRunnerGroupKey(enterprise, rs.Spec.Group)) // Enterprise runner groups
}
}
autoscaler.Log.V(2).Info(fmt.Sprintf("HRA keys indexed for HRA %s: %v", hra.Name, keys))
return keys
}
return nil
@@ -670,3 +889,15 @@ func (autoscaler *HorizontalRunnerAutoscalerGitHubWebhook) SetupWithManager(mgr
Named(name).
Complete(autoscaler)
}
func enterpriseKey(name string) string {
return keyPrefixEnterprise + name
}
func organizationalRunnerGroupKey(owner, group string) string {
return owner + keyRunnerGroup + group
}
func enterpriseRunnerGroupKey(enterprise, group string) string {
return keyPrefixEnterprise + enterprise + keyRunnerGroup + group
}
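// The sketch below is illustrative only and not part of this change: it lists the index
// keys produced by the helpers above for a hypothetical enterprise "acme", organization
// "myorg", and runner group "gpu".
func exampleRunnerGroupKeys() []string {
	return []string{
		enterpriseKey("acme"),                        // "enterprises/acme"
		organizationalRunnerGroupKey("myorg", "gpu"), // "myorg/group/gpu"
		enterpriseRunnerGroupKey("acme", "gpu"),      // "enterprises/acme/group/gpu"
	}
}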

View File

@@ -3,7 +3,7 @@ package controllers
import (
"github.com/actions-runner-controller/actions-runner-controller/api/v1alpha1"
"github.com/actions-runner-controller/actions-runner-controller/pkg/actionsglob"
"github.com/google/go-github/v37/github"
"github.com/google/go-github/v45/github"
)
func (autoscaler *HorizontalRunnerAutoscalerGitHubWebhook) MatchCheckRunEvent(event *github.CheckRunEvent) func(scaleUpTrigger v1alpha1.ScaleUpTrigger) bool {

View File

@@ -2,7 +2,7 @@ package controllers
import (
"github.com/actions-runner-controller/actions-runner-controller/api/v1alpha1"
"github.com/google/go-github/v37/github"
"github.com/google/go-github/v45/github"
)
func (autoscaler *HorizontalRunnerAutoscalerGitHubWebhook) MatchPullRequestEvent(event *github.PullRequestEvent) func(scaleUpTrigger v1alpha1.ScaleUpTrigger) bool {

View File

@@ -2,7 +2,7 @@ package controllers
import (
"github.com/actions-runner-controller/actions-runner-controller/api/v1alpha1"
"github.com/google/go-github/v37/github"
"github.com/google/go-github/v45/github"
)
func (autoscaler *HorizontalRunnerAutoscalerGitHubWebhook) MatchPushEvent(event *github.PushEvent) func(scaleUpTrigger v1alpha1.ScaleUpTrigger) bool {
@@ -15,10 +15,6 @@ func (autoscaler *HorizontalRunnerAutoscalerGitHubWebhook) MatchPushEvent(event
push := g.Push
if push == nil {
return false
}
return true
return push != nil
}
}

View File

@@ -15,7 +15,7 @@ import (
actionsv1alpha1 "github.com/actions-runner-controller/actions-runner-controller/api/v1alpha1"
"github.com/go-logr/logr"
"github.com/google/go-github/v37/github"
"github.com/google/go-github/v45/github"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/runtime"
clientgoscheme "k8s.io/client-go/kubernetes/scheme"
@@ -114,6 +114,326 @@ func TestWebhookPing(t *testing.T) {
)
}
func TestWebhookWorkflowJob(t *testing.T) {
setupTest := func() github.WorkflowJobEvent {
f, err := os.Open("testdata/org_webhook_workflow_job_payload.json")
if err != nil {
t.Fatalf("could not open the fixture: %s", err)
}
defer f.Close()
var e github.WorkflowJobEvent
if err := json.NewDecoder(f).Decode(&e); err != nil {
t.Fatalf("invalid json: %s", err)
}
return e
}
t.Run("Successful", func(t *testing.T) {
e := setupTest()
hra := &actionsv1alpha1.HorizontalRunnerAutoscaler{
ObjectMeta: metav1.ObjectMeta{
Name: "test-name",
},
Spec: actionsv1alpha1.HorizontalRunnerAutoscalerSpec{
ScaleTargetRef: actionsv1alpha1.ScaleTargetRef{
Name: "test-name",
},
ScaleUpTriggers: []actionsv1alpha1.ScaleUpTrigger{
{
GitHubEvent: &actionsv1alpha1.GitHubEventScaleUpTriggerSpec{
WorkflowJob: &actionsv1alpha1.WorkflowJobSpec{},
},
},
},
},
}
rd := &actionsv1alpha1.RunnerDeployment{
ObjectMeta: metav1.ObjectMeta{
Name: "test-name",
},
Spec: actionsv1alpha1.RunnerDeploymentSpec{
Template: actionsv1alpha1.RunnerTemplate{
Spec: actionsv1alpha1.RunnerSpec{
RunnerConfig: actionsv1alpha1.RunnerConfig{
Organization: "MYORG",
Labels: []string{"label1"},
},
},
},
},
}
initObjs := []runtime.Object{hra, rd}
testServerWithInitObjs(t,
"workflow_job",
&e,
200,
"scaled test-name by 1",
initObjs,
)
})
t.Run("WrongLabels", func(t *testing.T) {
e := setupTest()
hra := &actionsv1alpha1.HorizontalRunnerAutoscaler{
ObjectMeta: metav1.ObjectMeta{
Name: "test-name",
},
Spec: actionsv1alpha1.HorizontalRunnerAutoscalerSpec{
ScaleTargetRef: actionsv1alpha1.ScaleTargetRef{
Name: "test-name",
},
ScaleUpTriggers: []actionsv1alpha1.ScaleUpTrigger{
{
GitHubEvent: &actionsv1alpha1.GitHubEventScaleUpTriggerSpec{
WorkflowJob: &actionsv1alpha1.WorkflowJobSpec{},
},
},
},
},
}
rd := &actionsv1alpha1.RunnerDeployment{
ObjectMeta: metav1.ObjectMeta{
Name: "test-name",
},
Spec: actionsv1alpha1.RunnerDeploymentSpec{
Template: actionsv1alpha1.RunnerTemplate{
Spec: actionsv1alpha1.RunnerSpec{
RunnerConfig: actionsv1alpha1.RunnerConfig{
Organization: "MYORG",
Labels: []string{"bad-label"},
},
},
},
},
}
initObjs := []runtime.Object{hra, rd}
testServerWithInitObjs(t,
"workflow_job",
&e,
200,
"no horizontalrunnerautoscaler to scale for this github event",
initObjs,
)
})
// This test verifies that the old way of matching labels doesn't work anymore
t.Run("OldLabels", func(t *testing.T) {
e := setupTest()
hra := &actionsv1alpha1.HorizontalRunnerAutoscaler{
ObjectMeta: metav1.ObjectMeta{
Name: "test-name",
},
Spec: actionsv1alpha1.HorizontalRunnerAutoscalerSpec{
ScaleTargetRef: actionsv1alpha1.ScaleTargetRef{
Name: "test-name",
},
ScaleUpTriggers: []actionsv1alpha1.ScaleUpTrigger{
{
GitHubEvent: &actionsv1alpha1.GitHubEventScaleUpTriggerSpec{
WorkflowJob: &actionsv1alpha1.WorkflowJobSpec{},
},
},
},
},
}
rd := &actionsv1alpha1.RunnerDeployment{
ObjectMeta: metav1.ObjectMeta{
Name: "test-name",
},
Spec: actionsv1alpha1.RunnerDeploymentSpec{
Template: actionsv1alpha1.RunnerTemplate{
ObjectMeta: metav1.ObjectMeta{
Labels: map[string]string{
"label1": "label1",
},
},
Spec: actionsv1alpha1.RunnerSpec{
RunnerConfig: actionsv1alpha1.RunnerConfig{
Organization: "MYORG",
Labels: []string{"bad-label"},
},
},
},
},
}
initObjs := []runtime.Object{hra, rd}
testServerWithInitObjs(t,
"workflow_job",
&e,
200,
"no horizontalrunnerautoscaler to scale for this github event",
initObjs,
)
})
}
func TestWebhookWorkflowJobWithSelfHostedLabel(t *testing.T) {
setupTest := func() github.WorkflowJobEvent {
f, err := os.Open("testdata/org_webhook_workflow_job_with_self_hosted_label_payload.json")
if err != nil {
t.Fatalf("could not open the fixture: %s", err)
}
defer f.Close()
var e github.WorkflowJobEvent
if err := json.NewDecoder(f).Decode(&e); err != nil {
t.Fatalf("invalid json: %s", err)
}
return e
}
t.Run("Successful", func(t *testing.T) {
e := setupTest()
hra := &actionsv1alpha1.HorizontalRunnerAutoscaler{
ObjectMeta: metav1.ObjectMeta{
Name: "test-name",
},
Spec: actionsv1alpha1.HorizontalRunnerAutoscalerSpec{
ScaleTargetRef: actionsv1alpha1.ScaleTargetRef{
Name: "test-name",
},
ScaleUpTriggers: []actionsv1alpha1.ScaleUpTrigger{
{
GitHubEvent: &actionsv1alpha1.GitHubEventScaleUpTriggerSpec{
WorkflowJob: &actionsv1alpha1.WorkflowJobSpec{},
},
},
},
},
}
rd := &actionsv1alpha1.RunnerDeployment{
ObjectMeta: metav1.ObjectMeta{
Name: "test-name",
},
Spec: actionsv1alpha1.RunnerDeploymentSpec{
Template: actionsv1alpha1.RunnerTemplate{
Spec: actionsv1alpha1.RunnerSpec{
RunnerConfig: actionsv1alpha1.RunnerConfig{
Organization: "MYORG",
Labels: []string{"label1"},
},
},
},
},
}
initObjs := []runtime.Object{hra, rd}
testServerWithInitObjs(t,
"workflow_job",
&e,
200,
"scaled test-name by 1",
initObjs,
)
})
t.Run("WrongLabels", func(t *testing.T) {
e := setupTest()
hra := &actionsv1alpha1.HorizontalRunnerAutoscaler{
ObjectMeta: metav1.ObjectMeta{
Name: "test-name",
},
Spec: actionsv1alpha1.HorizontalRunnerAutoscalerSpec{
ScaleTargetRef: actionsv1alpha1.ScaleTargetRef{
Name: "test-name",
},
ScaleUpTriggers: []actionsv1alpha1.ScaleUpTrigger{
{
GitHubEvent: &actionsv1alpha1.GitHubEventScaleUpTriggerSpec{
WorkflowJob: &actionsv1alpha1.WorkflowJobSpec{},
},
},
},
},
}
rd := &actionsv1alpha1.RunnerDeployment{
ObjectMeta: metav1.ObjectMeta{
Name: "test-name",
},
Spec: actionsv1alpha1.RunnerDeploymentSpec{
Template: actionsv1alpha1.RunnerTemplate{
Spec: actionsv1alpha1.RunnerSpec{
RunnerConfig: actionsv1alpha1.RunnerConfig{
Organization: "MYORG",
Labels: []string{"bad-label"},
},
},
},
},
}
initObjs := []runtime.Object{hra, rd}
testServerWithInitObjs(t,
"workflow_job",
&e,
200,
"no horizontalrunnerautoscaler to scale for this github event",
initObjs,
)
})
// This test verifies that the old way of matching labels doesn't work anymore
t.Run("OldLabels", func(t *testing.T) {
e := setupTest()
hra := &actionsv1alpha1.HorizontalRunnerAutoscaler{
ObjectMeta: metav1.ObjectMeta{
Name: "test-name",
},
Spec: actionsv1alpha1.HorizontalRunnerAutoscalerSpec{
ScaleTargetRef: actionsv1alpha1.ScaleTargetRef{
Name: "test-name",
},
ScaleUpTriggers: []actionsv1alpha1.ScaleUpTrigger{
{
GitHubEvent: &actionsv1alpha1.GitHubEventScaleUpTriggerSpec{
WorkflowJob: &actionsv1alpha1.WorkflowJobSpec{},
},
},
},
},
}
rd := &actionsv1alpha1.RunnerDeployment{
ObjectMeta: metav1.ObjectMeta{
Name: "test-name",
},
Spec: actionsv1alpha1.RunnerDeploymentSpec{
Template: actionsv1alpha1.RunnerTemplate{
ObjectMeta: metav1.ObjectMeta{
Labels: map[string]string{
"label1": "label1",
},
},
Spec: actionsv1alpha1.RunnerSpec{
RunnerConfig: actionsv1alpha1.RunnerConfig{
Organization: "MYORG",
Labels: []string{"bad-label"},
},
},
},
},
}
initObjs := []runtime.Object{hra, rd}
testServerWithInitObjs(t,
"workflow_job",
&e,
200,
"no horizontalrunnerautoscaler to scale for this github event",
initObjs,
)
})
}
func TestGetRequest(t *testing.T) {
hra := HorizontalRunnerAutoscalerGitHubWebhook{}
request, _ := http.NewRequest(http.MethodGet, "/", nil)
@@ -167,23 +487,23 @@ func TestGetValidCapacityReservations(t *testing.T) {
func installTestLogger(webhook *HorizontalRunnerAutoscalerGitHubWebhook) *bytes.Buffer {
logs := &bytes.Buffer{}
log := testLogger{
sink := &testLogSink{
name: "testlog",
writer: logs,
}
webhook.Log = &log
log := logr.New(sink)
webhook.Log = log
return logs
}
func testServer(t *testing.T, eventType string, event interface{}, wantCode int, wantBody string) {
func testServerWithInitObjs(t *testing.T, eventType string, event interface{}, wantCode int, wantBody string, initObjs []runtime.Object) {
t.Helper()
hraWebhook := &HorizontalRunnerAutoscalerGitHubWebhook{}
var initObjs []runtime.Object
client := fake.NewFakeClientWithScheme(sc, initObjs...)
logs := installTestLogger(hraWebhook)
@@ -227,6 +547,11 @@ func testServer(t *testing.T, eventType string, event interface{}, wantCode int,
}
}
func testServer(t *testing.T, eventType string, event interface{}, wantCode int, wantBody string) {
var initObjs []runtime.Object
testServerWithInitObjs(t, eventType, event, wantCode, wantBody, initObjs)
}
func sendWebhook(server *httptest.Server, eventType string, event interface{}) (*http.Response, error) {
jsonBuf := &bytes.Buffer{}
enc := json.NewEncoder(jsonBuf)
@@ -256,18 +581,22 @@ func sendWebhook(server *httptest.Server, eventType string, event interface{}) (
return http.DefaultClient.Do(req)
}
// testLogger is a sample logr.Logger that logs in-memory.
// testLogSink is a sample logr.LogSink that logs in-memory.
// It's only for testing log outputs.
type testLogger struct {
type testLogSink struct {
name string
keyValues map[string]interface{}
writer io.Writer
}
var _ logr.Logger = &testLogger{}
var _ logr.LogSink = &testLogSink{}
func (l *testLogger) Info(msg string, kvs ...interface{}) {
func (l *testLogSink) Init(_ logr.RuntimeInfo) {
}
func (l *testLogSink) Info(_ int, msg string, kvs ...interface{}) {
fmt.Fprintf(l.writer, "%s] %s\t", l.name, msg)
for k, v := range l.keyValues {
fmt.Fprintf(l.writer, "%s=%+v ", k, v)
@@ -278,28 +607,24 @@ func (l *testLogger) Info(msg string, kvs ...interface{}) {
fmt.Fprintf(l.writer, "\n")
}
func (_ *testLogger) Enabled() bool {
func (_ *testLogSink) Enabled(level int) bool {
return true
}
func (l *testLogger) Error(err error, msg string, kvs ...interface{}) {
func (l *testLogSink) Error(err error, msg string, kvs ...interface{}) {
kvs = append(kvs, "error", err)
l.Info(msg, kvs...)
l.Info(0, msg, kvs...)
}
func (l *testLogger) V(_ int) logr.InfoLogger {
return l
}
func (l *testLogger) WithName(name string) logr.Logger {
return &testLogger{
func (l *testLogSink) WithName(name string) logr.LogSink {
return &testLogSink{
name: l.name + "." + name,
keyValues: l.keyValues,
writer: l.writer,
}
}
func (l *testLogger) WithValues(kvs ...interface{}) logr.Logger {
func (l *testLogSink) WithValues(kvs ...interface{}) logr.LogSink {
newMap := make(map[string]interface{}, len(l.keyValues)+len(kvs)/2)
for k, v := range l.keyValues {
newMap[k] = v
@@ -307,7 +632,7 @@ func (l *testLogger) WithValues(kvs ...interface{}) logr.Logger {
for i := 0; i < len(kvs); i += 2 {
newMap[kvs[i].(string)] = kvs[i+1]
}
return &testLogger{
return &testLogSink{
name: l.name,
keyValues: newMap,
writer: l.writer,

View File

@@ -0,0 +1,55 @@
package controllers
import (
"context"
)
// worker has a non-blocking bounded queue of scale targets; it dequeues scale targets and executes the scale operations one by one.
type worker struct {
scaleTargetQueue chan *ScaleTarget
work func(*ScaleTarget)
done chan struct{}
}
func newWorker(ctx context.Context, queueLimit int, work func(*ScaleTarget)) *worker {
w := &worker{
scaleTargetQueue: make(chan *ScaleTarget, queueLimit),
work: work,
done: make(chan struct{}),
}
go func() {
defer close(w.done)
for {
select {
case <-ctx.Done():
return
case t := <-w.scaleTargetQueue:
work(t)
}
}
}()
return w
}
// Add adds the scale target to the bounded queue, returning true on a successful enqueue and false otherwise.
// When false is returned, the queue is already full, so the enqueue operation must be retried later.
// If the enqueue was triggered by an external source and there's no intermediate queue that we can use,
// you must instruct the source to resend the original request later.
// In case you're building a webhook server around this worker, this means that you must return an http error from the webhook server,
// so that (hopefully) the sender can resend the webhook event later, or at least a human operator can notice or be notified about the
// webhook delivery failure so that a manual retry can be done later.
func (w *worker) Add(st *ScaleTarget) bool {
select {
case w.scaleTargetQueue <- st:
return true
default:
return false
}
}
func (w *worker) Done() chan struct{} {
return w.done
}
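// An illustrative sketch (not part of this change) of how a caller could honor the
// contract described on Add, assuming a net/http handler with response writer resp;
// the error message and status code are placeholders:
//
//	if ok := w.Add(target); !ok {
//		// Queue full: reject the delivery so that GitHub (or an operator) can retry it later.
//		http.Error(resp, "scale target queue is full, retry later", http.StatusServiceUnavailable)
//		return
//	}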

View File

@@ -0,0 +1,36 @@
package controllers
import (
"context"
"testing"
"github.com/stretchr/testify/require"
)
func TestWorker_Add(t *testing.T) {
ctx, cancel := context.WithCancel(context.Background())
defer cancel()
w := newWorker(ctx, 2, func(st *ScaleTarget) {})
require.True(t, w.Add(&ScaleTarget{}))
require.True(t, w.Add(&ScaleTarget{}))
require.False(t, w.Add(&ScaleTarget{}))
}
func TestWorker_Work(t *testing.T) {
ctx, cancel := context.WithCancel(context.Background())
defer cancel()
var count int
w := newWorker(ctx, 1, func(st *ScaleTarget) {
count++
cancel()
})
require.True(t, w.Add(&ScaleTarget{}))
require.False(t, w.Add(&ScaleTarget{}))
<-w.Done()
require.Equal(t, count, 1)
}

View File

@@ -25,10 +25,10 @@ import (
corev1 "k8s.io/api/core/v1"
"github.com/actions-runner-controller/actions-runner-controller/github"
"github.com/go-logr/logr"
kerrors "k8s.io/apimachinery/pkg/api/errors"
"k8s.io/apimachinery/pkg/types"
"github.com/go-logr/logr"
"k8s.io/apimachinery/pkg/runtime"
"k8s.io/client-go/tools/record"
ctrl "sigs.k8s.io/controller-runtime"
@@ -47,13 +47,13 @@ const (
// HorizontalRunnerAutoscalerReconciler reconciles a HorizontalRunnerAutoscaler object
type HorizontalRunnerAutoscalerReconciler struct {
client.Client
GitHubClient *github.Client
Log logr.Logger
Recorder record.EventRecorder
Scheme *runtime.Scheme
CacheDuration time.Duration
Name string
GitHubClient *github.Client
Log logr.Logger
Recorder record.EventRecorder
Scheme *runtime.Scheme
CacheDuration time.Duration
DefaultScaleDownDelay time.Duration
Name string
}
const defaultReplicas = 1
@@ -99,11 +99,33 @@ func (r *HorizontalRunnerAutoscalerReconciler) Reconcile(ctx context.Context, re
return r.reconcile(ctx, req, log, hra, st, func(newDesiredReplicas int) error {
currentDesiredReplicas := getIntOrDefault(rd.Spec.Replicas, defaultReplicas)
ephemeral := rd.Spec.Template.Spec.Ephemeral == nil || *rd.Spec.Template.Spec.Ephemeral
var effectiveTime *time.Time
for _, r := range hra.Spec.CapacityReservations {
t := r.EffectiveTime
if effectiveTime == nil || effectiveTime.Before(t.Time) {
effectiveTime = &t.Time
}
}
// Please add more conditions that we can in-place update the newest runnerreplicaset without disruption
if currentDesiredReplicas != newDesiredReplicas {
copy := rd.DeepCopy()
copy.Spec.Replicas = &newDesiredReplicas
if ephemeral && effectiveTime != nil {
copy.Spec.EffectiveTime = &metav1.Time{Time: *effectiveTime}
}
if err := r.Client.Patch(ctx, copy, client.MergeFrom(&rd)); err != nil {
return fmt.Errorf("patching runnerdeployment to have %d replicas: %w", newDesiredReplicas, err)
}
} else if ephemeral && effectiveTime != nil {
copy := rd.DeepCopy()
copy.Spec.EffectiveTime = &metav1.Time{Time: *effectiveTime}
if err := r.Client.Patch(ctx, copy, client.MergeFrom(&rd)); err != nil {
return fmt.Errorf("patching runnerdeployment to have %d replicas: %w", newDesiredReplicas, err)
}
@@ -137,6 +159,7 @@ func (r *HorizontalRunnerAutoscalerReconciler) Reconcile(ctx context.Context, re
org: rs.Spec.Organization,
repo: rs.Spec.Repository,
replicas: replicas,
labels: rs.Spec.RunnerConfig.Labels,
getRunnerMap: func() (map[string]struct{}, error) {
// return the list of runners in namespace. Horizontal Runner Autoscaler should only be responsible for scaling resources in its own ns.
var runnerPodList corev1.PodList
@@ -180,15 +203,38 @@ func (r *HorizontalRunnerAutoscalerReconciler) Reconcile(ctx context.Context, re
}
currentDesiredReplicas := getIntOrDefault(replicas, defaultReplicas)
ephemeral := rs.Spec.Ephemeral == nil || *rs.Spec.Ephemeral
var effectiveTime *time.Time
for _, r := range hra.Spec.CapacityReservations {
t := r.EffectiveTime
if effectiveTime == nil || effectiveTime.Before(t.Time) {
effectiveTime = &t.Time
}
}
if currentDesiredReplicas != newDesiredReplicas {
copy := rs.DeepCopy()
v := int32(newDesiredReplicas)
copy.Spec.Replicas = &v
if ephemeral && effectiveTime != nil {
copy.Spec.EffectiveTime = &metav1.Time{Time: *effectiveTime}
}
if err := r.Client.Patch(ctx, copy, client.MergeFrom(&rs)); err != nil {
return fmt.Errorf("patching runnerset to have %d replicas: %w", newDesiredReplicas, err)
}
} else if ephemeral && effectiveTime != nil {
copy := rs.DeepCopy()
copy.Spec.EffectiveTime = &metav1.Time{Time: *effectiveTime}
if err := r.Client.Patch(ctx, copy, client.MergeFrom(&rs)); err != nil {
return fmt.Errorf("patching runnerset to have %d replicas: %w", newDesiredReplicas, err)
}
}
return nil
})
}
@@ -206,6 +252,7 @@ func (r *HorizontalRunnerAutoscalerReconciler) scaleTargetFromRD(ctx context.Con
org: rd.Spec.Template.Spec.Organization,
repo: rd.Spec.Template.Spec.Repository,
replicas: rd.Spec.Replicas,
labels: rd.Spec.Template.Spec.RunnerConfig.Labels,
getRunnerMap: func() (map[string]struct{}, error) {
// return the list of runners in namespace. Horizontal Runner Autoscaler should only be responsible for scaling resources in its own ns.
var runnerList v1alpha1.RunnerList
@@ -248,6 +295,7 @@ type scaleTarget struct {
st, kind string
enterprise, repo, org string
replicas *int
labels []string
getRunnerMap func() (map[string]struct{}, error)
}
@@ -262,7 +310,7 @@ func (r *HorizontalRunnerAutoscalerReconciler) reconcile(ctx context.Context, re
return ctrl.Result{}, err
}
newDesiredReplicas, computedReplicas, computedReplicasFromCache, err := r.computeReplicasWithCache(log, now, st, hra, minReplicas)
newDesiredReplicas, err := r.computeReplicasWithCache(log, now, st, hra, minReplicas)
if err != nil {
r.Recorder.Event(&hra, corev1.EventTypeNormal, "RunnerAutoscalingFailure", err.Error())
@@ -287,24 +335,6 @@ func (r *HorizontalRunnerAutoscalerReconciler) reconcile(ctx context.Context, re
updated.Status.DesiredReplicas = &newDesiredReplicas
}
if computedReplicasFromCache == nil {
cacheEntries := getValidCacheEntries(updated, now)
var cacheDuration time.Duration
if r.CacheDuration > 0 {
cacheDuration = r.CacheDuration
} else {
cacheDuration = 10 * time.Minute
}
updated.Status.CacheEntries = append(cacheEntries, v1alpha1.CacheEntry{
Key: v1alpha1.CacheEntryKeyDesiredReplicas,
Value: computedReplicas,
ExpirationTime: metav1.Time{Time: time.Now().Add(cacheDuration)},
})
}
var overridesSummary string
if (active != nil && upcoming == nil) || (active != nil && upcoming != nil && active.Period.EndTime.Before(upcoming.Period.StartTime)) {
@@ -339,18 +369,6 @@ func (r *HorizontalRunnerAutoscalerReconciler) reconcile(ctx context.Context, re
return ctrl.Result{}, nil
}
func getValidCacheEntries(hra *v1alpha1.HorizontalRunnerAutoscaler, now time.Time) []v1alpha1.CacheEntry {
var cacheEntries []v1alpha1.CacheEntry
for _, ent := range hra.Status.CacheEntries {
if ent.ExpirationTime.After(now) {
cacheEntries = append(cacheEntries, ent)
}
}
return cacheEntries
}
func (r *HorizontalRunnerAutoscalerReconciler) SetupWithManager(mgr ctrl.Manager) error {
name := "horizontalrunnerautoscaler-controller"
if r.Name != "" {
@@ -443,32 +461,18 @@ func (r *HorizontalRunnerAutoscalerReconciler) getMinReplicas(log logr.Logger, n
return minReplicas, active, upcoming, nil
}
func (r *HorizontalRunnerAutoscalerReconciler) computeReplicasWithCache(log logr.Logger, now time.Time, st scaleTarget, hra v1alpha1.HorizontalRunnerAutoscaler, minReplicas int) (int, int, *int, error) {
func (r *HorizontalRunnerAutoscalerReconciler) computeReplicasWithCache(log logr.Logger, now time.Time, st scaleTarget, hra v1alpha1.HorizontalRunnerAutoscaler, minReplicas int) (int, error) {
var suggestedReplicas int
suggestedReplicasFromCache := r.fetchSuggestedReplicasFromCache(hra)
v, err := r.suggestDesiredReplicas(st, hra)
if err != nil {
return 0, err
}
var cached *int
if suggestedReplicasFromCache != nil {
cached = suggestedReplicasFromCache
if cached == nil {
suggestedReplicas = minReplicas
} else {
suggestedReplicas = *cached
}
if v == nil {
suggestedReplicas = minReplicas
} else {
v, err := r.suggestDesiredReplicas(st, hra)
if err != nil {
return 0, 0, nil, err
}
if v == nil {
suggestedReplicas = minReplicas
} else {
suggestedReplicas = *v
}
suggestedReplicas = *v
}
var reserved int
@@ -496,7 +500,7 @@ func (r *HorizontalRunnerAutoscalerReconciler) computeReplicasWithCache(log logr
if hra.Spec.ScaleDownDelaySecondsAfterScaleUp != nil {
scaleDownDelay = time.Duration(*hra.Spec.ScaleDownDelaySecondsAfterScaleUp) * time.Second
} else {
scaleDownDelay = DefaultScaleDownDelay
scaleDownDelay = r.DefaultScaleDownDelay
}
var scaleDownDelayUntil *time.Time
@@ -527,8 +531,8 @@ func (r *HorizontalRunnerAutoscalerReconciler) computeReplicasWithCache(log logr
"min", minReplicas,
}
if cached != nil {
kvs = append(kvs, "cached", *cached)
if maxReplicas := hra.Spec.MaxReplicas; maxReplicas != nil {
kvs = append(kvs, "max", *maxReplicas)
}
if scaleDownDelayUntil != nil {
@@ -536,13 +540,9 @@ func (r *HorizontalRunnerAutoscalerReconciler) computeReplicasWithCache(log logr
kvs = append(kvs, "scale_down_delay_until", scaleDownDelayUntil)
}
if maxReplicas := hra.Spec.MaxReplicas; maxReplicas != nil {
kvs = append(kvs, "max", *maxReplicas)
}
log.V(1).Info(fmt.Sprintf("Calculated desired replicas of %d", newDesiredReplicas),
kvs...,
)
return newDesiredReplicas, suggestedReplicas, suggestedReplicasFromCache, nil
return newDesiredReplicas, nil
}

View File

@@ -1,50 +0,0 @@
package controllers
import (
"testing"
"time"
actionsv1alpha1 "github.com/actions-runner-controller/actions-runner-controller/api/v1alpha1"
"github.com/google/go-cmp/cmp"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
func TestGetValidCacheEntries(t *testing.T) {
now := time.Now()
hra := &actionsv1alpha1.HorizontalRunnerAutoscaler{
Status: actionsv1alpha1.HorizontalRunnerAutoscalerStatus{
CacheEntries: []actionsv1alpha1.CacheEntry{
{
Key: "foo",
Value: 1,
ExpirationTime: metav1.Time{Time: now.Add(-time.Second)},
},
{
Key: "foo",
Value: 2,
ExpirationTime: metav1.Time{Time: now},
},
{
Key: "foo",
Value: 3,
ExpirationTime: metav1.Time{Time: now.Add(time.Second)},
},
},
},
}
revs := getValidCacheEntries(hra, now)
counts := map[string]int{}
for _, r := range revs {
counts[r.Key] += r.Value
}
want := map[string]int{"foo": 3}
if d := cmp.Diff(want, counts); d != "" {
t.Errorf("%s", d)
}
}

View File

@@ -8,7 +8,7 @@ import (
"time"
github2 "github.com/actions-runner-controller/actions-runner-controller/github"
"github.com/google/go-github/v37/github"
"github.com/google/go-github/v45/github"
"github.com/actions-runner-controller/actions-runner-controller/github/fake"
@@ -108,8 +108,9 @@ func SetupIntegrationTest(ctx2 context.Context) *testEnvironment {
RunnerImage: "example/runner:test",
DockerImage: "example/docker:test",
Name: controllerName("runner"),
RegistrationRecheckInterval: time.Millisecond,
RegistrationRecheckJitter: time.Millisecond,
RegistrationRecheckInterval: time.Millisecond * 100,
RegistrationRecheckJitter: time.Millisecond * 10,
UnregistrationRetryDelay: 1 * time.Second,
}
err = runnerController.SetupWithManager(mgr)
Expect(err).NotTo(HaveOccurred(), "failed to setup runner controller")
@@ -268,7 +269,6 @@ var _ = Context("INTEGRATION: Inside of a new namespace", func() {
ExpectRunnerSetsCountEventuallyEquals(ctx, ns.Name, 1)
ExpectRunnerSetsManagedReplicasCountEventuallyEquals(ctx, ns.Name, 2)
ExpectHRAStatusCacheEntryLengthEventuallyEquals(ctx, ns.Name, name, 1)
}
{
@@ -371,7 +371,6 @@ var _ = Context("INTEGRATION: Inside of a new namespace", func() {
ExpectRunnerSetsCountEventuallyEquals(ctx, ns.Name, 1)
ExpectRunnerSetsManagedReplicasCountEventuallyEquals(ctx, ns.Name, 3)
ExpectHRAStatusCacheEntryLengthEventuallyEquals(ctx, ns.Name, name, 1)
}
{
@@ -538,6 +537,106 @@ var _ = Context("INTEGRATION: Inside of a new namespace", func() {
}
})
It("should create and scale organization's repository runners on workflow_job event", func() {
name := "example-runnerdeploy"
{
rd := &actionsv1alpha1.RunnerDeployment{
ObjectMeta: metav1.ObjectMeta{
Name: name,
Namespace: ns.Name,
},
Spec: actionsv1alpha1.RunnerDeploymentSpec{
Replicas: intPtr(1),
Selector: &metav1.LabelSelector{
MatchLabels: map[string]string{
"foo": "bar",
},
},
Template: actionsv1alpha1.RunnerTemplate{
ObjectMeta: metav1.ObjectMeta{
Labels: map[string]string{
"foo": "bar",
},
},
Spec: actionsv1alpha1.RunnerSpec{
RunnerConfig: actionsv1alpha1.RunnerConfig{
Repository: "test/valid",
Image: "bar",
Group: "baz",
},
RunnerPodSpec: actionsv1alpha1.RunnerPodSpec{
Env: []corev1.EnvVar{
{Name: "FOO", Value: "FOOVALUE"},
},
},
},
},
},
}
ExpectCreate(ctx, rd, "test RunnerDeployment")
ExpectRunnerSetsCountEventuallyEquals(ctx, ns.Name, 1)
ExpectRunnerSetsManagedReplicasCountEventuallyEquals(ctx, ns.Name, 1)
env.ExpectRegisteredNumberCountEventuallyEquals(1, "count of fake list runners")
}
// Scale-up to 1 replica via ScaleUpTriggers.GitHubEvent.WorkflowJob based scaling
{
hra := &actionsv1alpha1.HorizontalRunnerAutoscaler{
ObjectMeta: metav1.ObjectMeta{
Name: name,
Namespace: ns.Name,
},
Spec: actionsv1alpha1.HorizontalRunnerAutoscalerSpec{
ScaleTargetRef: actionsv1alpha1.ScaleTargetRef{
Name: name,
},
MinReplicas: intPtr(1),
MaxReplicas: intPtr(5),
ScaleDownDelaySecondsAfterScaleUp: intPtr(1),
ScaleUpTriggers: []actionsv1alpha1.ScaleUpTrigger{
{
GitHubEvent: &actionsv1alpha1.GitHubEventScaleUpTriggerSpec{
WorkflowJob: &actionsv1alpha1.WorkflowJobSpec{},
},
Amount: 1,
Duration: metav1.Duration{Duration: time.Minute},
},
},
},
}
ExpectCreate(ctx, hra, "test HorizontalRunnerAutoscaler")
ExpectRunnerSetsCountEventuallyEquals(ctx, ns.Name, 1)
ExpectRunnerSetsManagedReplicasCountEventuallyEquals(ctx, ns.Name, 1)
env.ExpectRegisteredNumberCountEventuallyEquals(1, "count of fake list runners")
}
// Scale-up to 2 replicas on first workflow_job.queued webhook event
{
env.SendWorkflowJobEvent("test", "valid", "queued", []string{"self-hosted"})
ExpectRunnerSetsManagedReplicasCountEventuallyEquals(ctx, ns.Name, 2, "runners after first webhook event")
env.ExpectRegisteredNumberCountEventuallyEquals(2, "count of fake list runners")
}
// Scale-up to 3 replicas on second workflow_job.queued webhook event
{
env.SendWorkflowJobEvent("test", "valid", "queued", []string{"self-hosted"})
ExpectRunnerSetsManagedReplicasCountEventuallyEquals(ctx, ns.Name, 3, "runners after second webhook event")
env.ExpectRegisteredNumberCountEventuallyEquals(3, "count of fake list runners")
}
// Do not scale-up on third workflow_job.queued webhook event
// repo "example" doesn't match our Spec
{
env.SendWorkflowJobEvent("test", "example", "queued", []string{"self-hosted"})
ExpectRunnerSetsManagedReplicasCountEventuallyEquals(ctx, ns.Name, 3, "runners after third webhook event")
env.ExpectRegisteredNumberCountEventuallyEquals(3, "count of fake list runners")
}
})
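This new test drives scaling purely through webhook events: each accepted workflow_job queued event raises the desired replica count by the trigger's Amount for its Duration, starting from MinReplicas, while an event for a repository the deployment does not serve is ignored. A sketch of that arithmetic, under the assumption that each accepted event is tracked as a reservation that lapses after the trigger duration; the type and field names below are illustrative, not the controller's:

type reservation struct {
	amount    int
	expiresAt time.Time
}

func desiredReplicas(minReplicas, maxReplicas int, reservations []reservation, now time.Time) int {
	// Start from the floor and add every reservation that has not lapsed yet,
	// then cap at the configured maximum.
	desired := minReplicas
	for _, r := range reservations {
		if r.expiresAt.After(now) {
			desired += r.amount
		}
	}
	if desired > maxReplicas {
		desired = maxReplicas
	}
	return desired
}

With MinReplicas 1 and two accepted events of amount 1 each, this yields the 2 and then 3 replicas the expectations above assert, and the third, mismatched event leaves the count at 3.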
It("should create and scale organization's repository runners only on check_run event", func() {
name := "example-runnerdeploy"
@@ -582,9 +681,7 @@ var _ = Context("INTEGRATION: Inside of a new namespace", func() {
env.ExpectRegisteredNumberCountEventuallyEquals(1, "count of fake list runners")
}
-// Scale-up to 3 replicas by the default TotalNumberOfQueuedAndInProgressWorkflowRuns-based scaling
-// See workflowRunsFor3Replicas_queued and workflowRunsFor3Replicas_in_progress for GitHub List-Runners API responses
-// used while testing.
+// Scale-up to 1 replica via ScaleUpTriggers.GitHubEvent.CheckRun based scaling
{
hra := &actionsv1alpha1.HorizontalRunnerAutoscaler{
ObjectMeta: metav1.ObjectMeta{
@@ -1077,24 +1174,176 @@ var _ = Context("INTEGRATION: Inside of a new namespace", func() {
}
})
It("should be able to scale visible organization runner group with default labels", func() {
name := "example-runnerdeploy"
{
rd := &actionsv1alpha1.RunnerDeployment{
ObjectMeta: metav1.ObjectMeta{
Name: name,
Namespace: ns.Name,
},
Spec: actionsv1alpha1.RunnerDeploymentSpec{
Replicas: intPtr(1),
Selector: &metav1.LabelSelector{
MatchLabels: map[string]string{
"foo": "bar",
},
},
Template: actionsv1alpha1.RunnerTemplate{
ObjectMeta: metav1.ObjectMeta{
Labels: map[string]string{
"foo": "bar",
},
},
Spec: actionsv1alpha1.RunnerSpec{
RunnerConfig: actionsv1alpha1.RunnerConfig{
Repository: "test/valid",
Image: "bar",
Group: "baz",
},
RunnerPodSpec: actionsv1alpha1.RunnerPodSpec{
Env: []corev1.EnvVar{
{Name: "FOO", Value: "FOOVALUE"},
},
},
},
},
},
}
ExpectCreate(ctx, rd, "test RunnerDeployment")
hra := &actionsv1alpha1.HorizontalRunnerAutoscaler{
ObjectMeta: metav1.ObjectMeta{
Name: name,
Namespace: ns.Name,
},
Spec: actionsv1alpha1.HorizontalRunnerAutoscalerSpec{
ScaleTargetRef: actionsv1alpha1.ScaleTargetRef{
Name: name,
},
MinReplicas: intPtr(1),
MaxReplicas: intPtr(5),
ScaleDownDelaySecondsAfterScaleUp: intPtr(1),
ScaleUpTriggers: []actionsv1alpha1.ScaleUpTrigger{
{
GitHubEvent: &actionsv1alpha1.GitHubEventScaleUpTriggerSpec{
WorkflowJob: &actionsv1alpha1.WorkflowJobSpec{},
},
Amount: 1,
Duration: metav1.Duration{Duration: time.Minute},
},
},
},
}
ExpectCreate(ctx, hra, "test HorizontalRunnerAutoscaler")
ExpectRunnerSetsCountEventuallyEquals(ctx, ns.Name, 1)
ExpectRunnerSetsManagedReplicasCountEventuallyEquals(ctx, ns.Name, 1)
}
{
env.ExpectRegisteredNumberCountEventuallyEquals(1, "count of fake list runners")
}
// Scale-up to 2 replicas on first workflow_job webhook event
{
env.SendWorkflowJobEvent("test", "valid", "queued", []string{"self-hosted"})
ExpectRunnerSetsCountEventuallyEquals(ctx, ns.Name, 1, "runner sets after webhook")
ExpectRunnerSetsManagedReplicasCountEventuallyEquals(ctx, ns.Name, 2, "runners after first webhook event")
env.ExpectRegisteredNumberCountEventuallyEquals(2, "count of fake list runners")
}
})
It("should be able to scale visible organization runner group with custom labels", func() {
name := "example-runnerdeploy"
{
rd := &actionsv1alpha1.RunnerDeployment{
ObjectMeta: metav1.ObjectMeta{
Name: name,
Namespace: ns.Name,
},
Spec: actionsv1alpha1.RunnerDeploymentSpec{
Replicas: intPtr(1),
Selector: &metav1.LabelSelector{
MatchLabels: map[string]string{
"foo": "bar",
},
},
Template: actionsv1alpha1.RunnerTemplate{
ObjectMeta: metav1.ObjectMeta{
Labels: map[string]string{
"foo": "bar",
},
},
Spec: actionsv1alpha1.RunnerSpec{
RunnerConfig: actionsv1alpha1.RunnerConfig{
Repository: "test/valid",
Image: "bar",
Group: "baz",
Labels: []string{"custom-label"},
},
RunnerPodSpec: actionsv1alpha1.RunnerPodSpec{
Env: []corev1.EnvVar{
{Name: "FOO", Value: "FOOVALUE"},
},
},
},
},
},
}
ExpectCreate(ctx, rd, "test RunnerDeployment")
hra := &actionsv1alpha1.HorizontalRunnerAutoscaler{
ObjectMeta: metav1.ObjectMeta{
Name: name,
Namespace: ns.Name,
},
Spec: actionsv1alpha1.HorizontalRunnerAutoscalerSpec{
ScaleTargetRef: actionsv1alpha1.ScaleTargetRef{
Name: name,
},
MinReplicas: intPtr(1),
MaxReplicas: intPtr(5),
ScaleDownDelaySecondsAfterScaleUp: intPtr(1),
ScaleUpTriggers: []actionsv1alpha1.ScaleUpTrigger{
{
GitHubEvent: &actionsv1alpha1.GitHubEventScaleUpTriggerSpec{
WorkflowJob: &actionsv1alpha1.WorkflowJobSpec{},
},
Amount: 1,
Duration: metav1.Duration{Duration: time.Minute},
},
},
},
}
ExpectCreate(ctx, hra, "test HorizontalRunnerAutoscaler")
ExpectRunnerSetsCountEventuallyEquals(ctx, ns.Name, 1)
ExpectRunnerSetsManagedReplicasCountEventuallyEquals(ctx, ns.Name, 1)
}
{
env.ExpectRegisteredNumberCountEventuallyEquals(1, "count of fake list runners")
}
// Scale-up to 2 replicas on first workflow_job webhook event
{
env.SendWorkflowJobEvent("test", "valid", "queued", []string{"custom-label"})
ExpectRunnerSetsCountEventuallyEquals(ctx, ns.Name, 1, "runner sets after webhook")
ExpectRunnerSetsManagedReplicasCountEventuallyEquals(ctx, ns.Name, 2, "runners after first webhook event")
env.ExpectRegisteredNumberCountEventuallyEquals(2, "count of fake list runners")
}
})
})
})
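Both runner-group tests hinge on label matching: the queued job's requested labels have to be satisfiable by the RunnerDeployment, either through the implicit defaults (self-hosted, plus the usual OS and architecture labels on real runners) or through the explicit custom-label entry in the second test. A sketch of a subset check in that spirit; the actual matching logic in the webhook autoscaler may be stricter or handle more cases, and this helper needs the strings package:

func labelsSatisfied(jobLabels, runnerLabels []string) bool {
	// A job can run on this deployment only if every label it asks for
	// is provided by the deployment's runners (case-insensitively).
	have := make(map[string]struct{}, len(runnerLabels))
	for _, l := range runnerLabels {
		have[strings.ToLower(l)] = struct{}{}
	}
	for _, l := range jobLabels {
		if _, ok := have[strings.ToLower(l)]; !ok {
			return false
		}
	}
	return true
}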
func ExpectHRAStatusCacheEntryLengthEventuallyEquals(ctx context.Context, ns string, name string, value int, optionalDescriptions ...interface{}) {
EventuallyWithOffset(
1,
func() int {
var hra actionsv1alpha1.HorizontalRunnerAutoscaler
err := k8sClient.Get(ctx, types.NamespacedName{Namespace: ns, Name: name}, &hra)
ExpectWithOffset(1, err).NotTo(HaveOccurred(), "failed to get test HRA resource")
return len(hra.Status.CacheEntries)
},
time.Second*5, time.Millisecond*500).Should(Equal(value), optionalDescriptions...)
}
func ExpectHRADesiredReplicasEquals(ctx context.Context, ns, name string, desired int, optionalDescriptions ...interface{}) {
var rd actionsv1alpha1.HorizontalRunnerAutoscaler
@@ -1118,7 +1367,7 @@ func (env *testEnvironment) ExpectRegisteredNumberCountEventuallyEquals(want int
return len(rs)
},
-time.Second*5, time.Millisecond*500).Should(Equal(want), optionalDescriptions...)
+time.Second*10, time.Millisecond*500).Should(Equal(want), optionalDescriptions...)
}
func (env *testEnvironment) SendOrgPullRequestEvent(org, repo, branch, action string) {
@@ -1166,6 +1415,30 @@ func (env *testEnvironment) SendOrgCheckRunEvent(org, repo, status, action strin
ExpectWithOffset(1, resp.StatusCode).To(Equal(200))
}
func (env *testEnvironment) SendWorkflowJobEvent(org, repo, statusAndAction string, labels []string) {
resp, err := sendWebhook(env.webhookServer, "workflow_job", &github.WorkflowJobEvent{
WorkflowJob: &github.WorkflowJob{
Status: &statusAndAction,
Labels: labels,
},
Org: &github.Organization{
Login: github.String(org),
},
Repo: &github.Repository{
Name: github.String(repo),
Owner: &github.User{
Login: github.String(org),
Type: github.String("Organization"),
},
},
Action: github.String(statusAndAction),
})
ExpectWithOffset(1, err).NotTo(HaveOccurred(), "failed to send workflow_job event")
ExpectWithOffset(1, resp.StatusCode).To(Equal(200))
}
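SendWorkflowJobEvent delegates to sendWebhook, which this hunk does not show. At minimum a helper like that has to serialize the go-github event struct and POST it with the X-GitHub-Event header set to the event name; the sketch below captures that shape and is an assumption about the test helper, which may also sign the payload. It needs bytes, encoding/json, and net/http:

func postWebhook(url, event string, payload interface{}) (*http.Response, error) {
	body, err := json.Marshal(payload)
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	// GitHub identifies the payload type via this header, e.g. "workflow_job".
	req.Header.Set("X-GitHub-Event", event)
	return http.DefaultClient.Do(req)
}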
func (env *testEnvironment) SendUserPullRequestEvent(owner, repo, branch, action string) {
resp, err := sendWebhook(env.webhookServer, "pull_request", &github.PullRequestEvent{
PullRequest: &github.PullRequest{

Some files were not shown because too many files have changed in this diff.