Compare commits

..

199 Commits

Author SHA1 Message Date
Callum Tait
d3b7f0bf7d chore: release chart targeting v0.23.0 (#1404) 2022-04-29 13:54:22 +01:00
Yusuke Kuoka
dbcb67967f Turn the bug report template into a form with more context (#1401)
I believe this helps us focus on relatively more important issues like critical bug reports and highly-requested feature requests.

Co-authored-by: Callum Tait <15716903+toast-gear@users.noreply.github.com>
2022-04-29 21:09:59 +09:00
Callum Tait
55369bf846 fix: forgot to do the chart (#1388)
Co-authored-by: toast-gear <toast-gear@users.noreply.github.com>
Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>

> chart test is failing due to `flag provided but not defined: -default-scale-down-delay` which seems to come from the fact that we still use ARC 0.22.3 for chart testing.
> 
> Probably we'd better figure out how to test it against both the latest release version of ARC and the canary version of ARC?
> 
> Or just test it against the canary version so that it won't fail when the chart depends on features that are available only in the canary version of ARC? 🤔

yup, lets get this merged though so we can do a release today
2022-04-29 09:15:27 +01:00
Yusuke Kuoka
1f6303daed Merge pull request #1396 from actions-runner-controller/docs/pre-release
docs: final doc changes + v0.23.0 release notes
2022-04-29 12:35:42 +09:00
Yusuke Kuoka
0fd1a681af Update bug_report.md (#1400)
so that we can hopefully get enough information to diagnose the issue in case it's really a bug report, or it goes to Discussions in case it's a question.
2022-04-29 12:32:08 +09:00
toast-gear
58416db8c8 docs: add new runner group API enhancemnet 2022-04-28 16:17:53 +01:00
toast-gear
78a0817c2c docs: align release doc format 2022-04-28 16:06:59 +01:00
toast-gear
9ed429513d docs: bump the helm upgrade chart docs version 2022-04-28 16:04:58 +01:00
toast-gear
46291c1823 docs: highlight the new scale down delay flag 2022-04-28 16:04:16 +01:00
toast-gear
832e59338e docs: clarification of the release log 2022-04-28 16:00:03 +01:00
toast-gear
70ae5aef1f docs: add migration steps 2022-04-28 15:57:03 +01:00
toast-gear
6d10dd8e1d docs: breaking changes in v0.23.0 2022-04-28 10:51:19 +01:00
toast-gear
61c5a112db docs: remove reference to cleared limitation 2022-04-28 10:39:11 +01:00
toast-gear
7bc08fbe7c docs: remove TotalNumberOfQueuedAndInProgressWorkflowRuns limitation 2022-04-28 10:36:12 +01:00
Yusuke Kuoka
4053ab3e11 Fix label support for TotalNumberOfQueuedAndInProgressWorkflowRuns metric (#1390)
In #1373 we made two mistakes:

- We mistakenly checked if all the runner labels are included in the job labels and only after that it marked the target as eligible for scale. It should definitely be the opposite!
- We mistakenly checked for the existence of `self-hosted` labe l in the job. [Although it should be a good practice to explicitly say `runs-on: ["self-hosted", "custom-label"]`](https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#example-using-labels-for-runner-selection), that's not a requirement so we should code accordingly.

The consequence of those two mistakes was that, for example, jobs with `self-hosted` + `custom` labels didn't result in scaling runner with `self-hosted` + `custom` + `custom2`. This should fix that.

Ref #1056
Ref #1373
2022-04-27 16:24:21 +01:00
Callum Tait
059481b610 refactor: remove legacy controller Docker build (#1360) [skip ci]
* refactor: remove legacy build and use buildkit

* refactor: add runner version to root makefie

* refactor: enable buildkit for runner make build

* refactor: ignore runner makefile in ci

Co-authored-by: toast-gear <toast-gear@users.noreply.github.com>
2022-04-27 08:21:02 +01:00
renovate[bot]
9fdb2c009d fix(deps): update module github.com/google/go-cmp to v0.5.8 (#1394)
Co-authored-by: Renovate Bot <bot@renovateapp.com>
2022-04-27 10:11:33 +09:00
Callum Tait
9f7ea0c014 docs: highlight breaking changes are possible (#1310)
It's probably worth highlighting it's ver 0.X.X by design and that breaking changes are possible until we move it to 1.0.0

Co-authored-by: toast-gear <toast-gear@users.noreply.github.com>
2022-04-26 11:20:11 +09:00
Callum Tait
0caa0315c6 feat: set default in chart (#1389)
Ref #963
Ref #899

Co-authored-by: toast-gear <toast-gear@users.noreply.github.com>
2022-04-26 10:25:01 +09:00
Yusuke Kuoka
1c726ae20c chore: Add unit test for RunnerReconciler.newPod (#1382)
Adds some unit tests for the runner pod generation logic that is used internally by runner controller as preparation for #1282
2022-04-25 09:59:21 +09:00
Yusuke Kuoka
d6cdd5964c chore: Add unit test for newRunnerPod (#1381)
Adds some unit tests for the runner pod generation logic that is used internally by runner deployment and runner set controllers as preparation for #1282
2022-04-25 08:52:58 +09:00
Yusuke Kuoka
a622968ff2 feat: Add label support to TotalNumberOfQueuedAndInProgressWorkflowRuns metric (#1373)
This is an implementation for my intepretation of the "bronze" case proposed in #1056

Ref #1056
2022-04-24 14:41:34 +09:00
Soham Banerjee
e8ef84ab76 Removed the default githubEvent: {} (#1361)
Ref #1358
See also #1379
2022-04-24 13:39:59 +09:00
Yusuke Kuoka
1551f3b5fc Remove the default githubEvent: {} requiring a event to be defined (#1379)
Ref #1358
2022-04-24 13:37:26 +09:00
Yusuke Kuoka
3ba7179995 Do not enable TotalNumberOfQueuedAndInProgressWorkflowRuns by default (#1372)
Previously, omitting hra.spec.metrics at all resulted in ARC enabling the TotalNumberOfQueuedAndInProgressWorkflowRuns.
That turned out not a good idea so since this change it is not enabled by default.

Ref https://github.com/actions-runner-controller/actions-runner-controller/issues/728
2022-04-24 13:36:42 +09:00
Mário Uhrík
e7c6c26266 Runner CRD: Add required conversionReviewVersions field (#1259)
Without that field, GKE 1.21 refuses to create the CRD
with an error message that conversionReviewVersions is mandatory.

conversionReviewVersions is a required field when creating apiextensions.k8s.io/v1 custom resource definitions.
Webhooks are required to support at least one ConversionReview version understood by the current and previous API server.

See https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/_print/#webhook-request-and-response
2022-04-24 11:04:15 +09:00
Tingluo Huang
ebe7d060cb Find runner groups that visible to repository using a single API call. (#1324)
The [ListRunnerGroup API](https://docs.github.com/en/rest/reference/actions#list-self-hosted-runner-groups-for-an-organization) now add a new query parameter `visible_to_repository`.

We were doing `N+1` lookup when trying to find which runner group can be used for job from a certain repository.
- List all runner groups
- Loop through all groups to check repository access for each of them via [API](https://docs.github.com/en/rest/reference/actions#list-repository-access-to-a-self-hosted-runner-group-in-an-organization)

The new query parameter `visible_to_repository` should allow us to get the runner groups with access in one call.

Limitation:
- The new query parameter is only supported in GitHub.com, which means anyone who uses ARC in GitHub Enterprise Server won't get this.
- I am working on a PR to update `go-github` library to support the new parameter, but it will take a few weeks for a newer `go-github` to be released, so in the meantime, I am duplicating the implementation in ARC as well to support the new query parameter.
2022-04-24 10:54:40 +09:00
Callum Tait
c3e280eadb refactor: set sync period default to 1m (#1308)
Fixes: #1294

Co-authored-by: toast-gear <toast-gear@users.noreply.github.com>
Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
2022-04-24 10:47:00 +09:00
Vinícius Garcia
9f254a2393 docs: run README files through Grammarly (#1353)
* Update README.md

* Run charts/actions-runner-controller/README.md thorugh Grammarly

* Fix broken link as suggested by @toast-gear

Co-authored-by: Callum Tait <15716903+toast-gear@users.noreply.github.com>

Co-authored-by: Callum Tait <15716903+toast-gear@users.noreply.github.com>
2022-04-22 16:58:10 +01:00
renovate[bot]
e5cf3b95cf fix(deps): update module github.com/teambition/rrule-go to v1.8.0 (#1048)
Co-authored-by: Renovate Bot <bot@renovateapp.com>
2022-04-20 11:11:38 +09:00
Callum Tait
24aae58dbc feat: default scale down flag (#963)
Resolves #899

Co-authored-by: Callum <callum@domain.com>
Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
2022-04-20 11:09:09 +09:00
Jeff Billimek
13bfa2da4e Fix runner pod dnsConfig (#1227)
Fixes #1226
Fixes #1224

Signed-off-by: Jeff Billimek <jeff@billimek.com>
2022-04-20 10:55:20 +09:00
Chris Bui
cb4e1fa8f2 breaking: Pluralize topologySpreadConstraint to match docs (#1089)
Original PR:
https://github.com/actions-runner-controller/actions-runner-controller/pull/814/files#diff-25283fab3c6d5fa726652c8741a122c1ba14d8486fe092774617a385e4bc1a92R145

If you're already using this feature, follow the process explained in https://github.com/actions-runner-controller/actions-runner-controller/pull/1089#issuecomment-1103354025 when upgrading.

Fixes #984
2022-04-20 10:47:18 +09:00
Patrick Ellis
7a5a6381c3 Add WorkflowJob to GitHubEventScaleUpTriggerSpec types (#922) 2022-04-20 09:59:08 +09:00
Renovate Bot
81951780b1 chore(deps): update dependency actions/runner to v2.290.1 2022-04-14 18:36:24 +00:00
renovate[bot]
3b48db0d26 chore(deps): update actions/stale action to v5 (#1338)
Co-authored-by: Renovate Bot <bot@renovateapp.com>
2022-04-13 09:42:27 +01:00
Callum Tait
352e206148 refactor: use apt-get instead of apt (#1342)
Co-authored-by: toast-gear <toast-gear@users.noreply.github.com>
2022-04-13 09:40:15 +01:00
Richard Fussenegger
6288036ed4 Removed modprobe Script (#1247) [skip ci]
* Removed `modprobe` Script

I was able to find out that this script originates from https://github.com/docker-library/docker/blob/master/modprobe.sh but our image does not have `lsmod` nor `modprobe` installed. Hence, if it were in use, it would fail every time. 🤔

* fix: correct command order

Co-authored-by: toast-gear <toast-gear@users.noreply.github.com>
2022-04-13 09:39:55 +01:00
Siyuan Zhang
a37b4dfbe3 Fix scale down condition to exclude skipped (#1330)
* Fix scale down condition to exclude skipped
* Use fallthrough and break to let default handle the skipped case

Fixes #1326
2022-04-13 08:53:07 +09:00
Callum Tait
c4ff1a588f chore: migrate to actions stale bot (#1334)
Co-authored-by: toast-gear <toast-gear@users.noreply.github.com>
2022-04-13 08:29:49 +09:00
Callum Tait
4a3b7bc8d5 refactor: location of some runner cmds (#1337)
Co-authored-by: toast-gear <toast-gear@users.noreply.github.com>
2022-04-12 22:18:34 +01:00
Richard Fussenegger
8db071c4ba Improved Bash Logger (#1246)
* Improved Bash Logger

This is a first step towards having robust Bash scripts in the runner images. The changes _could_ be considered breaking, depending on our backwards compatibility definition.

* Fixed Log Formatting Issues

Co-authored-by: Callum Tait <15716903+toast-gear@users.noreply.github.com>
2022-04-12 22:02:06 +01:00
Renovate Bot
7b8057e417 chore(deps): update dependency actions/runner to v2.290.0 2022-04-12 20:46:19 +00:00
renovate[bot]
960a704246 chore(deps): update azure/setup-helm action to v2.1 (#1328) [skip ci]
Co-authored-by: Renovate Bot <bot@renovateapp.com>
2022-04-11 18:23:48 +01:00
Daniel Moran
f907f82275 Add more usages of RUNNER_VERSION to Renovate config. (#1313)
* Add more usages of RUNNER_VERSION to Renovate config.

* Double-escape `?` in pattern
2022-04-11 11:28:00 +01:00
Rolf Ahrenberg
7124451cea chore: fix typo (#1316) [skip ci] 2022-04-08 17:32:01 +01:00
Yusuke Kuoka
c8f1acd92c chore: bump chart to latest (#1319)
Bumps the chart version along with the controller version.
We bump the patch number for the chart as the release for the controller is a patch release.
That's the same handling as we've done in the previous version ecc8b4472a and #1300

As always, be sure to upgrade CRDs before updating the controller version!
Otherwise it can break in interesting ways.
2022-04-08 10:59:07 +09:00
Yusuke Kuoka
b0fd7a75ea Fix release workflow 2022-04-08 01:36:14 +00:00
Yusuke Kuoka
b09c54045a Prevent runners from stuck in Terminating when pod disappeared without standard termination process (#1318)
This fixes the said issue by additionally treating any runner pod whose phase is Failed or the runner container exited with non-zero code as "complete" so that ARC gives up unregistering the runner from Actions, deletes the runner pod anyway.

Note that there are a plenty of causes for that. If you are deploying runner pods on AWS spot instances or GCE preemptive instances and a job assigned to a runner took more time than the shutdown grace period provided by your cloud provider (2 minutes for AWS spot instances), the runner pod would be terminated prematurely without letting actions/runner unregisters itself from Actions. If your VM or hypervisor failed then runner pods that were running on the node will become PodFailed without unregistering runners from Actions.

Please beware that it is currently users responsibility to clean up any dangling runner resources on GitHub Actions.

Ref https://github.com/actions-runner-controller/actions-runner-controller/issues/1307
Might also relate to https://github.com/actions-runner-controller/actions-runner-controller/issues/1273
2022-04-08 10:17:33 +09:00
Yusuke Kuoka
96f2da1c2e Merge pull request #1262 from fgalind1/patch-4
Fix deleting a runner when pod was deleted
2022-04-08 10:17:13 +09:00
Yusuke Kuoka
cac8b76c68 Merge pull request #1292 from actions-runner-controller/renovate/sigs.k8s.io-controller-runtime-0.x
fix(deps): update module sigs.k8s.io/controller-runtime to v0.11.2
2022-04-08 10:14:47 +09:00
Felipe Galindo Sanchez
e24d942d63 Merge remote-tracking branch 'upstream/master' into patch-4 2022-04-06 06:43:01 -07:00
Felipe Galindo Sanchez
b855991373 ci: pin go version to the known working version (#1303) 2022-04-06 09:34:48 +01:00
Felipe Galindo Sanchez
e7e48a77e4 Merge remote-tracking branch 'upstream/master' into patch-4 2022-04-04 09:54:08 -07:00
Yusuke Kuoka
85dea9b67c Merge pull request #1285 from actions-runner-controller/docs/runnersets
docs: add limitations to runnersets + reorder
2022-04-03 18:18:54 +09:00
Yusuke Kuoka
1d9347f418 chore: bump chart to latest (#1300)
* chore: bump chart to latest

Bumps the chart version along with the controller version.
We bump the patch number for the chart as the release for the controller is a patch release.
That's the same handling as we've done in the previous version ecc8b4472a

As always, be sure to upgrade CRDs before updating the controller version!
Otherwise it can break in interesting ways.

* docs: expand on CRD upgrade requirement

Co-authored-by: Callum Tait <15716903+toast-gear@users.noreply.github.com>
2022-04-03 10:15:39 +01:00
Yusuke Kuoka
631a70a35f Fix runner pod to be cleaned up earlier regardless of the sync period (#1299)
Ref #1291
2022-04-03 11:12:44 +09:00
Yusuke Kuoka
b614dcf54b Make the hard-coded runner startup timeout to avoid race on token expiration longer (#1296)
Ref #1295
2022-04-03 09:59:35 +09:00
Callum Tait
14f9e7229e docs: highlight why persistent are not ideal 2022-04-01 15:49:15 +01:00
Renovate Bot
82770e145b fix(deps): update module sigs.k8s.io/controller-runtime to v0.11.2 2022-03-30 21:38:12 +00:00
Renovate Bot
971c54bf5c chore(deps): update dependency actions/runner to v2.289.2 2022-03-30 18:18:17 +00:00
renovate[bot]
b80d9b0cdc chore(deps): update helm/chart-releaser-action action to v1.4.0 (#1287)
Co-authored-by: Renovate Bot <bot@renovateapp.com>
2022-03-30 13:24:26 +01:00
Bernardo Meurer
e46df413a1 refactor(runner/entrypoint): check for externalstmp (#1277)
* refactor(runner/entrypoint): check for externalstmp [skip ci]

Co-authored-by: Callum Tait <15716903+toast-gear@users.noreply.github.com>
2022-03-30 12:18:18 +01:00
toast-gear
eb02f6f26e docs: redundant words 2022-03-30 10:36:34 +01:00
toast-gear
7a750b9285 docs: wording 2022-03-30 10:35:32 +01:00
toast-gear
d26c8d6529 docs: add autoscaling also causes problems 2022-03-30 10:26:08 +01:00
toast-gear
fd0092d13f chore: new line for consistency 2022-03-30 10:02:33 +01:00
toast-gear
88d17c7988 docs: use the right font 2022-03-30 10:00:34 +01:00
toast-gear
98567dadc9 docs: fix index 2022-03-30 09:59:32 +01:00
toast-gear
7e8d80689b docs: add limitations to runnersets + reorder 2022-03-30 09:53:59 +01:00
Callum Tait
d72c396ff1 docs: slight correction for a multi controller env 2022-03-29 16:57:58 +01:00
Milan Aleks
13e7b440a8 chore: typo fix in runner Dockerfile [skip ci] (#1270) 2022-03-29 11:05:24 +01:00
Michael Goodness
a95983fb98 feat(kustomize): add github-webhook-server overlay (#1198)
* feat(kustomize): add github-webhook-server overlay

* chore(kustomize): add image to github-webhook-server overlay

* feat(kustomize): drop sync-period
2022-03-29 11:00:55 +01:00
Callum Tait
ecc8b4472a chore: bump chart to latest (#1280) 2022-03-29 07:46:44 +01:00
Callum Tait
459beeafb9 docs: remove the nonsense 2022-03-27 14:15:42 +01:00
Rolf Ahrenberg
1b327a0721 refactor: use const envvars (#1251) 2022-03-27 12:14:56 +01:00
Jérôme Foray
1f8a23c129 fix(chart): add namespace selector to webhooks when in singleNamespace mode (#1237)
* fix(chart): add namespace selector to webhooks when in singleNamespace mode

* docs: expand multi controller setup

Co-authored-by: Callum Tait <15716903+toast-gear@users.noreply.github.com>
2022-03-27 11:52:39 +01:00
Naka Masato
af8d8f7e1d Update runnerdeployment_webhook.go (#1271) 2022-03-25 09:24:13 +09:00
Yusuke Kuoka
e7ef21fdf9 Merge pull request #1264 from ekarlso/env-var-detection-fix
Use container name to detect runner container in Pod
2022-03-25 09:23:48 +09:00
Endre Karlson
ee7484ac91 Use container name to detect runner container in Pod 2022-03-23 12:39:58 +01:00
Yusuke Kuoka
debf53c640 Fix missing pip bin path (/home/runner/.local/bin) (#1263)
Fixes #1261
2022-03-23 10:28:12 +09:00
Felipe Galindo Sanchez
9657d3e5b3 Fix deleting a runner when pod was deleted
With the current implementation if a pod is deleted, controller is failing to delete the runner as it's trying to annotate a pod that doesn't exist as we're passing a new pod object that is not an existing resource
2022-03-22 14:44:50 -07:00
Callum Tait
2cb04ddde7 * feat: move to new run.sh container friendly file (#1244)
* fix: unit tests were very broken

Co-authored-by: toast-gear <toast-gear@users.noreply.github.com>
2022-03-22 19:02:51 +00:00
renovate[bot]
366f8927d8 chore(deps): update actions/cache action to v3 (#1252)
Co-authored-by: Renovate Bot <bot@renovateapp.com>
2022-03-22 18:48:23 +00:00
Richard Fussenegger
532a2bb2a9 feat: remove registration-only runner logic from entrypoint (#1249)
Closes #1207
2022-03-22 18:33:14 +00:00
Callum Tait
f28cecffe9 docs: various minor changes (#1250)
* docs: various minor changes

* docs: format fixes
2022-03-20 16:05:03 +00:00
Renovate Bot
4cbbcd64ce chore(deps): update dependency actions/runner to v2.289.1 2022-03-18 22:36:38 +00:00
Richard Fussenegger
a68eede616 feat: copy dotfiles from asset to service dir (#1136)
* feat: copy dotfiles from asset to service dir

* Fixed `UNITTEST` Condition

* Load `/etc/environment`

See https://github.com/actions/runner/issues/1703 for context on this change.
2022-03-18 07:40:52 +00:00
Julien Tanay
c06a806d75 Add note about having 100+ replicas (#1103) 2022-03-16 21:03:05 +00:00
Callum Tait
857c1700ba docs: add repo update to upgrade notes (#1233) 2022-03-16 10:37:37 +00:00
Callum Tait
a40793bb60 chore: bump chart to app 0.22.0 (#1232)
* chore: bump chart to app 0.22.0
2022-03-16 07:57:30 +00:00
Callum Tait
48a7b78bf3 docs: remove runnerset limitation (#1225)
This works great from testing now, this is no longer a limitation due to ARC now creating a statefulset per runner
2022-03-16 09:08:41 +09:00
renovate[bot]
6ff93eae95 chore(deps): update helm/chart-testing-action action to v2.2.1 (#1216)
Co-authored-by: Renovate Bot <bot@renovateapp.com>
2022-03-15 18:51:54 +00:00
Yusuke Kuoka
b25a0fd606 Merge pull request #1217 from actions-runner-controller/docs/re-order
docs: various changes in preparation for 0.22.0 release

- Move RunnerSets to a more predominant location in the docs
- Clean up  a few bits
- Highlight the deprecation and removal timeline for the `--once` flag
- Renamed ephemeral runners section to something more logical (persistent runners). Static runners were an option however the word static is awkward as it's sort of tied up with autoscaling and the `Runner` kind so Persistent was chosen instead.
- Update upgrade docs to use `replace` instead of `apply`
2022-03-15 09:01:32 +09:00
toast-gear
3beef84f30 docs: better sentences 2022-03-14 12:43:07 +00:00
toast-gear
76cc758d12 docs: minor consistency change 2022-03-14 12:37:57 +00:00
toast-gear
c4c6e833a7 chore: add deprecation warning 2022-03-14 12:35:07 +00:00
toast-gear
ecf74e615e docs: bump versions and upgrade instructions 2022-03-14 10:23:36 +00:00
toast-gear
bb19e85037 docs: various cleanups and re-orderings 2022-03-14 09:52:22 +00:00
Yusuke Kuoka
e7200f274d Merge pull request #1214 from actions-runner-controller/fix-static-runners
Fix runner{set,deployment} rollouts and static runner scaling

I was testing static runners as a preparation to cut the next release of ARC, v0.22.0, and found several problems that I thought worth being fixed.

In particular, this PR fixes static runners reliability issues in two means.

c4b24f8366 fixes the issue that ARC gives up retrying RemoveRunner calls too early, especially on static runners, that resulted in static runners to often get terminated prematurely while running jobs.

791634fb12 fixes the issue that ARC was unable to scale up any static runners when the corresponding desired replicas number in e.g. RunnerDeployment gets updated. It was caused by a bug in the mechanism that is intended to prevent ephemeral runners from being recreated in unwanted circumstances, mistakenly triggered also for static runners.

Since #1179, RunnerDeployment was not able to gracefully terminate old RunnerReplicaSet on update. c612e87 fixes that by changing RunnerDeployment to firstly scale old RunnerReplicaSet(s) down to zero and waits for sync, and set the deletion timestamp only after that. That way, RunnerDeployment can ensure that all the old RunnerReplicaSets that are being deleted are already scaled to zero passing the standard unregister-and-then-delete runner termination process.

It revealed a hidden bug in #1179 that sometimes the scale-to-zero-before-runnerreplicaset-termination does not work as intended. 4551309 fixes that, so that RunnerDeployment can actually terminate old RunnerReplicaSets gracefully.
2022-03-13 22:19:26 +09:00
Yusuke Kuoka
1cc06e7408 e2e: Make enterprise runners optional for testing GitHub App
As GitHub App does not allow ARC to access enterprise runner related API endpoints, like the create-registration-token API.
2022-03-13 13:11:26 +00:00
Yusuke Kuoka
4551309e30 Fix runners to not terminate before unregistration when scaling down
#1179 was not working particularly for scale down of static (and perhaps long-running ephemeral) runners, which resulted in some runner pods are terminated before the requested unregistration processes complete, that triggered some in-progress workflow jobs to hang forever. This fixes an edge-case that resulted in a decreased desired replicas to trigger the failure, so that every runner is unregistered then terminated, as originally designed.
2022-03-13 13:09:46 +00:00
Yusuke Kuoka
7123b18a47 chore: Log more variables when log level is -2 2022-03-13 13:04:28 +00:00
Yusuke Kuoka
cc55d0bd7d Let runnerdeployment controller log runnerreplicaset creation 2022-03-13 12:25:53 +00:00
Yusuke Kuoka
c612e87d85 fix: Let RunnerDeployment scale RunnerReplicaSet to zero before terminating it
so that hopefully RunnerDeployment can gracefully termiante older RunnerReplicaSet on update.
2022-03-13 12:18:22 +00:00
Yusuke Kuoka
326d6a1fe8 Fix the timing of Marking owner for unregistration completion log 2022-03-13 12:16:55 +00:00
Yusuke Kuoka
fa8ff70aa2 Add log when deletion timestamp is being set on owner object 2022-03-13 12:16:29 +00:00
Yusuke Kuoka
efb7fca308 Fix externally deleted runner pod to not block unregistration process 2022-03-13 12:15:49 +00:00
Yusuke Kuoka
e4280dcb0d Fix patch MergeFrom target 2022-03-13 12:14:14 +00:00
Yusuke Kuoka
f153870f5f fix: Do not block indefinitely on runner that cannot be deleted due to 403 2022-03-13 12:12:01 +00:00
Yusuke Kuoka
8ca39caff5 Fix log message on runner deletion 2022-03-13 12:11:11 +00:00
Yusuke Kuoka
791634fb12 Fix static runners not scaling up
It turned out that #1179 broke static runners in a way it is no longer able to scale up at all when the desired replicas is updated.
This fixes that by correcting a certain short-circuit that is intended only for ephemeral runners to not mistakenly triggered for static runners.
2022-03-13 07:26:43 +00:00
Yusuke Kuoka
c4b24f8366 Prevent static runners from terminating due to unregister timeout
The unregister timeout of 1 minute (no matter how long it is) can negatively impact availability of static runner constantly running workflow jobs, and ephemeral runner that runs a long-running job.
We deal with that by completely removing the unregistaration timeout, so that regarldess of the type of runner(static or ephemeral) it waits forever until it successfully to get unregistered before being terminated.
2022-03-13 07:26:36 +00:00
Yusuke Kuoka
a1c6d1d11a doc: Add release note for 0.22.0 (#1199)
As it turned out to be the biggest release ever, I was afraid I might not be able to write a summary of changes that communicates well. Here is my attempt. Please review and leave any comments so that we can be more confident in this release. Thank you!
2022-03-13 16:25:24 +09:00
Yusuke Kuoka
adc889ce8a Fix RunnerDeployment to be able to finish rollout (#1213)
I found that #1179 was unable to finish rollout of an RunnerDeployment update(like runner env update). It was able to create a new RunnerReplicaSet with the desired spec, but unable to tear down the older ones. This fixes that.
2022-03-13 10:10:24 +09:00
Yusuke Kuoka
b83db7be8f Merge pull request #1212 from actions-runner-controller/fix-runnerdeploy-duplicate-envvars
Fix RunnerDeployment-managed runner pods to not get RUNNER_NAME and RUNNER_TOKEN injected twice
2022-03-12 23:27:45 +09:00
Yusuke Kuoka
da2adc0cc5 e2e: Omit RUNNER_FEATURE_FLAG_EPHEMERAL when TEST_FEATURE_FLAG_EPHEMERAL is not set 2022-03-12 14:08:23 +00:00
Yusuke Kuoka
fa287c4395 Fix RunnerDeployment-managed runner pods to not get RUNNER_NAME and RUNNER_TOKEN injected twice
Since #1179, runner pods managed by RunnerDeployment had two duplicate environment variables for RUNNER_NAME and RUNNER_TOKEN. This fixes that.
2022-03-12 13:49:50 +00:00
Yusuke Kuoka
7c0340dea0 Merge pull request #1211 from actions-runner-controller/use-ephemeral-by-default
Use --ephemeral by default

Every runner is now --ephemeral by default.

Note that this works by ARC setting the RUNNER_FEATURE_FLAG_EPHEMERAL envvar to true by default. Previously you had to explicitly set it to true otherwise the runner was passed --once which is known to various race conditions.

It's worth noting that the very confusing and related configuration, ephemeral: true, which creates --once runners instead of static(or persistent) runners had been the default since many months ago. So, this should be the only change needed to make every runner ephemeral without any explicit configuration.

You can still fall back to static(persistent) runners by setting ephemeral: false, and to --once runners by setting RUNNER_FEATURE_FLAG_EPHEMERAL to "false". But I don't think there're many reasons to do so.

Ref #1189
2022-03-12 22:47:38 +09:00
Yusuke Kuoka
c3dd1c5c05 e2e: Make TEST_FEATURE_FLAG_EPHEMERAL optional 2022-03-12 13:32:42 +00:00
Yusuke Kuoka
051089733b Use --ephemeral by default
Ref https://github.com/actions-runner-controller/actions-runner-controller/issues/1189
2022-03-12 13:20:07 +00:00
Yusuke Kuoka
757e0a82a2 Merge pull request #1210 from actions-runner-controller/fix-github-api-cache-for-github-app-mode
Fix GitHub API cache to work with GitHub App authentication
2022-03-12 21:17:25 +09:00
Yusuke Kuoka
83e550cde5 Experimetanl log level "-4" for logging every HTTP round-trip for GitHub API calls 2022-03-12 12:11:16 +00:00
Yusuke Kuoka
22ef7b3a71 acceptance,e2e: Fix deploy.sh and e2e_test.go for testing with GitHub App 2022-03-12 12:10:04 +00:00
Yusuke Kuoka
28fccbcecd Fix GitHub API cache to work with GitHub App authentication
The version of `bradleyfalzon/ghinstallation` which is used to enable GitHub App authentication turned out to add an extra header `application/vnd.github.machine-man-preview+json` to every HTTP request. That revealed an edge-case in our HTTP cache layer `gregjones/httpcache` that results it to not serve responses from cache when it should.

There were two problems. One was that it does not support multi-valued header and it only looked for the first value for each header, and another is that it does not support any http.RoundTripper implementation that modifies HTTP request headers in a RoundTrip function call.

I fixed it in my fork of httpcache, which is hosted at https://github.com/actions-runner-controller/httpcache.

The relevant commits are:

- 70d975e77d
- 197a8a3546

This can be considered as a follow-up for #1127, which turned out to have enabled the cache only for the case that ARC uses PAT for authentication.
Since this fix, the cache is also enabled when ARC authenticates as a GitHub App.
2022-03-12 11:14:16 +00:00
Yusuke Kuoka
9628bb2937 Prevent RemoveRunner spam on busy ephemeral runner scale down (#1204)
Since #1127 and #1167, we had been retrying `RemoveRunner` API call on each graceful runner stop attempt when the runner was still busy.
There was no reliable way to throttle the retry attempts. The combination of these resulted in ARC spamming RemoveRunner calls(one call per reconciliation loop but the loop runs quite often due to how the controller works) when it failed once due to that the runner is in the middle of running a workflow job.

This fixes that, by adding a few short-circuit conditions that would work for ephemeral runners. An ephemeral runner can unregister itself on completion so in most of cases ARC can just wait for the runner to stop if it's already running a job. As a RemoveRunner response of status 422 implies that the runner is running a job, we can use that as a trigger to start the runner stop waiter.

The end result is that 422 errors will be observed at most once per the whole graceful termination process of an ephemeral runner pod. RemoveRunner API calls are never retried for ephemeral runners. ARC consumes less GitHub API rate limit budget and logs are much cleaner than before.

Ref https://github.com/actions-runner-controller/actions-runner-controller/pull/1167#issuecomment-1064213271
2022-03-11 19:03:17 +09:00
Renovate Bot
736a53fed6 fix(deps): update golang.org/x/oauth2 commit hash to 6242fa9 2022-03-10 08:39:51 +09:00
yourmoonlight
132faa13a1 docs: fix the helm command for webhook installation (#1188)
* fix doc for install the webhook server

* modify cmd with single set && add double quote for zsh users
2022-03-08 17:59:01 +00:00
Callum Tait
66e070f798 docs: remove githubAPICacheDuration from docs (#1194) 2022-03-08 13:27:30 +00:00
Yusuke Kuoka
55ff4de79a Remove legacy GitHub API cache of HRA.Status.CachedEntries (#1192)
* Remove legacy GitHub API cache of HRA.Status.CachedEntries

We migrated to the transport-level cache introduced in #1127 so not only this is useless, it is harder to deduce which cache resulted in the desired replicas number calculated by HRA.
Just remove the legacy cache to keep it simple and easy to understand.

* Deprecate githubAPICacheDuration helm chart value and the --github-api-cache-duration as well

* Fix integration test
2022-03-08 19:05:43 +09:00
Yusuke Kuoka
301439b06a chore: Change log ts format to RFC3339 (#1191)
The TimeEncoder for zap seems to have been set to EpochTimeEncoder which is the default and it was not very readable. Changing it to a TimeEncoderOfLayout(time.RFC3339) for readability.

Another benefit of doing this is the ts format is now consistent with various timestamps ARC put into pod and other custom resource annotations.
2022-03-08 10:34:52 +09:00
Yusuke Kuoka
15ee6d6360 chore: Reorganize "Calculated desired replicas log fields (#1190)
So that `max` is emitted immediately after `min`, which is the counterpart of it.
2022-03-08 10:29:53 +09:00
Felipe Galindo Sanchez
5b899f578b fix(chart): allow to use basic auth when authSecret.create is false (#1149)
* fix(chart): allow to use basic auth when authSecret.create is false

When secret is created outside of the ARC chart using authSecret.create=false and basicAuth,
the controller fails as we're not including the basic password as environment variable as
the password value won't be inside the helm values.

This PR includes both environment variables for consistent regardless if
those are set or not similar as the rest of the other auth options (e.g
app_id, private  key, etc)

* chart: Add back the conditional block for .Values.authSecret.github_basicauth_username

Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
2022-03-07 10:07:24 +09:00
Yusuke Kuoka
d8c9eb7ba7 Fix arm64 image (#1185)
Fixes #1184
2022-03-07 10:00:20 +09:00
Yusuke Kuoka
cbbc383a80 Auto-correct replicas number on missing webhook_job completion event (#1180)
While testing #1179, I discovered that ARC sometimes stop resyncing RunnerReplicaSet when the desired replicas is greater than the actual number of runner pods.
This seems to happen when ARC missed receiving a workflow_job completion event but it has no way to decide if it is either (1) something went wrong on ARC or (2) a loadbalancer in the middle or GitHub or anything not ARC went wrong. It needs a standard to decide it, or if it's not impossible, how to deal with it.

In this change, I added a hard-coded 10 minutes timeout(can be made customizable later) to prevent runner pod recreation.
Now, a RunnerReplicaSet/RunnerSet to restart runner pod recreation 10 minutes after the last scale-up. If the workflow completion event arrived after the timeout, it will decrease the desired replicas number that results in the removal of a runner pod. The removed runner pod might be deleted without ever being used, but I think that's better than leaving the desired replicas and the actual number of replicas diverged forever.
2022-03-07 09:35:13 +09:00
seplak
b57e885a73 Fix service account typo in Helm README (#1183)
Just fixing a typo I discovered while reading through the README.
2022-03-07 08:39:01 +09:00
Yusuke Kuoka
bed927052d Merge pull request #1179 from actions-runner-controller/refactor-runner-and-runnerset
Refactor Runner and RunnerSet so that they use the same library code that powers RunnerSet.

RunnerSet is StatefulSet-based and RunnerSet/Runner is Pod-based so it had been hard to unify the implementation although they look very similar in many aspects.

This change finally resolves that issue, by first introducing a library that implements the generic logic that is used to reconcile RunnerSet, then adding an adapter that can be used to let the generic logic manage runner pods via Runner, instead of via StatefulSet.

Follow-up for #1127, #1167, and 1178
2022-03-06 15:56:51 +09:00
Yusuke Kuoka
14a878bfae refactor: Make RunnerReplicaSet and Runner backed by the same logic that backs RunnerSet 2022-03-06 05:53:26 +00:00
Yusuke Kuoka
c95e84a528 refactor: Extract runner pod owner management out of runnerset controller
so that it can potentially be reusable from runnerreplicaset controller
2022-03-05 12:18:02 +00:00
Yusuke Kuoka
95a5770d55 Fix regression that registration-timeout check was not working for runnerset (#1178)
Follow-up for #1167
2022-03-05 19:31:05 +09:00
Yusuke Kuoka
9cc9f8c182 chore: Add a few comments to runnerset and runnerpod controllers to help potential contributors 2022-03-05 05:41:56 +00:00
Yusuke Kuoka
b7c5611516 dockerfile: Fix unintended removal of CGO_ENABLED=0 2022-03-05 05:41:56 +00:00
Yusuke Kuoka
138e326705 chore: Add comment on lastSyncTime in runnerset controller 2022-03-05 05:41:56 +00:00
Renovate Bot
c21fa75afa fix(deps): update kubernetes packages to v0.23.4 2022-03-04 08:39:18 +09:00
Yusuke Kuoka
34483e268f ci: Enable actions/cache for Go modules 2022-03-03 18:47:54 +09:00
Yusuke Kuoka
5f2b5327f7 integration: Reduce error logs to ease debugging 2022-03-03 18:47:54 +09:00
renovate[bot]
a93b2fdad4 fix(deps): update golang.org/x/oauth2 commit hash to ee48083 (#1150)
fix(deps): update golang.org/x/oauth2 commit hash to ee48083

Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
Co-authored-by: Renovate Bot <bot@renovateapp.com>
2022-03-03 18:00:43 +09:00
Yusuke Kuoka
25570a0c6d Fix docker build 2022-03-03 02:05:38 +00:00
Felipe Galindo Sanchez
d20ad71071 Fix minor log in runner controller (#1175)
Log is mentioning registration only but this is about the standard runner pod
2022-03-03 09:51:30 +09:00
Daniel
8a379ac94b Add custom volume mount documentation (#1045)
one example for in-memory
and one example for NVME backed storage, also pointing out all the
current flaws/risks for that configuration
2022-03-03 09:13:42 +09:00
Felipe Galindo Sanchez
27563c4378 Remove unused function (#1173) 2022-03-03 09:02:47 +09:00
Felipe Galindo Sanchez
4a0f68bfe3 Cleanup extra block in runner controller (#1174) 2022-03-03 09:01:34 +09:00
Yusuke Kuoka
1917cf90c4 chore: Tweak runner-id annotation name and the annotation prefix to be more consistent 2022-03-02 19:03:20 +09:00
Yusuke Kuoka
0ba3cad6c2 fix: Prefix runner pod related annotation keys with actions/ to make them distinguishable from other annotations 2022-03-02 19:03:20 +09:00
Yusuke Kuoka
7f0e65cb73 refactor: Extract definitions of various annotation keys and other defaults to their own source 2022-03-02 19:03:20 +09:00
Yusuke Kuoka
12a04b7f38 Fix typo in comment 2022-03-02 19:03:20 +09:00
Yusuke Kuoka
a3072c110d Prevent runnerset pod unregistration until it gets runner ID
This eliminates the race condition that results in the runner terminated prematurely when RunnerSet triggered unregistration of StatefulSet that added just a few seconds ago.
2022-03-02 19:03:20 +09:00
Yusuke Kuoka
15b402bb32 Make RunnerSet much more reliable with or without webhook 2022-03-02 19:03:20 +09:00
Yusuke Kuoka
11be6c1fb6 Prevent runner pod deletion delay when pod disappeared before unregistration 2022-03-02 19:03:20 +09:00
Yusuke Kuoka
59c3288e87 acceptance,e2e: Automate restarts of ARC pods in case image tag is not changed 2022-03-02 19:03:20 +09:00
Yusuke Kuoka
5030e075a9 dockerfile,e2e: Use buildx and cache mounts for faster rebuilds in E2E 2022-03-02 19:03:20 +09:00
Yusuke Kuoka
3115d71471 acceptance,e2e: Enhance deploy.sh to support more types of runnersets 2022-03-02 19:03:20 +09:00
Renovate Bot
c221b6e278 chore(deps): update actions/checkout action to v3 2022-03-02 11:05:16 +09:00
Renovate Bot
a8dbc8a501 fix(deps): update module github.com/prometheus/client_golang to v1.12.1 2022-03-02 10:56:53 +09:00
Renovate Bot
b1ac63683f fix(deps): update module go.uber.org/zap to v1.21.0 2022-03-02 10:54:35 +09:00
Renovate Bot
10bc28af75 fix(deps): update module sigs.k8s.io/controller-runtime to v0.11.1 2022-03-02 10:52:43 +09:00
Renovate Bot
e23692b3bc chore(deps): update actions/setup-python action to v3 2022-03-02 10:51:22 +09:00
renovate[bot]
e7f4a0e200 chore(deps): update actions/setup-go action to v3 (#1163)
Co-authored-by: Renovate Bot <bot@renovateapp.com>
2022-03-02 10:51:01 +09:00
Yusuke Kuoka
828ddcd44e Merge pull request #1151 from fgalind1/improve-logs
logging: improve logs for scaling
2022-03-02 10:46:53 +09:00
Yusuke Kuoka
fc821fd473 Merge pull request #1168 from actions-runner-controller/docs/better-runner-group-description
docs: better runner group description
2022-03-02 10:31:22 +09:00
Callum Tait
4b0aa92286 docs: better wording 2022-03-01 08:56:30 +00:00
Callum Tait
c69c8dd84d docs: better runner group description 2022-03-01 08:54:24 +00:00
Renovate Bot
e42db00006 chore(deps): update dependency actions/runner to v2.288.1 2022-02-28 22:30:10 +00:00
Felipe Galindo Sanchez
eff0c7364f Merge branch 'master' into improve-logs 2022-02-28 09:25:30 -08:00
Tingluo Huang
516695b275 Set UserAgent to 'actions-runner-controller' for all Http Client. (#1140)
I can't find any requests made by user agent `actions-runner-controller` in GitHub.com's telemetry in the past 7 days.

Turns out we only set user agent `actions-runner-controller` if we are configured to use BasicAuth which is not the case for most customers I think.

I update the code a little bit to make sure it always set `actions-runner-controller` as UserAgent for the GitHub HttpClient in ARC.

A further step would be somehow baking the ARC release version into the UserAgent as well.
2022-02-28 09:17:58 +09:00
Yusuke Kuoka
686d40c20d Merge pull request #1127 from actions-runner-controller/github-api-cache
Enhances ARC(both the controller-manager and github-webhook-server) to cache any GitHub API responses with HTTP GET and an appropriate Cache-Control header.

Ref #920

## Cache Implementation

`gregjones/httpcache` has been chosen as a library to implement this feature, as it is as recommended in `go-github`'s documentation:

https://github.com/google/go-github#conditional-requests

`gregjones/httpcache` supports a number of cache backends like `diskcache`, `s3cache`, and so on:

https://github.com/gregjones/httpcache#cache-backends

We stick to the built-in in-memory cache as a starter. Probably this will never becomes an issue as long as various HTTP responses for all the GitHub API calls that ARC makes, list-runners, list-workflow-jobs, list-runner-groups, etc., doesn't overflow the in-memory cache.

`httpcache` has an known unfixed issue that it doesn't update cache on chunked responses. But we assume that the APIs that we call doesn't use chunked responses. See #1503 for more information on that.

## Ephemeral runner pods are no longer recreated

The addition of the cache layer resulted in a slow down of a scale-down process and a trade-off between making the runner pod termination process fragile to various race conditions(shorter grace period before runner deletion) or delaying runner pod deletion depending on how long the grace period is(longer grace period). A grace period needs to be at least longer than 60s (which is the same as cache duration of ListRunners API) to not prematurely delete a runner pod that was just created.

But once I disabled automatic recreation of ephemeral runner pod, it turned out to be no more of an issue when it's being scaled via workflow_job webhook.

Ephemeral runner resources are still automatically added on demand by RunnerDeployment via RunnerReplicaSet(I've added `EffectiveTime` fields to our CRDs but that's an implementation detail so let's omit). A good side-effect of disabling ephemeral runner pod recreations is that ARC will no longer create redundant ephemeral runners when used with webhook-based autoscaler.

Basically, autoscaling still works as everyone might expect. It's just better than before overall.
2022-02-28 08:37:26 +09:00
Renovate Bot
f0fa99fc53 chore(deps): update dependency actions/runner to v2.288.0 2022-02-26 01:34:49 +00:00
Javier Sotelo
6b12413fdd Add optional hostNetwork (#1035)
Co-authored-by: jsotelo <javier.sotelo@viasat.com>
2022-02-23 20:11:40 +00:00
Felipe Galindo Sanchez
3abecd0f19 logging: improve logs for scaling 2022-02-23 08:29:13 -08:00
Callum Tait
7156ce040e chore: bump chart (#1138) 2022-02-21 09:24:14 +00:00
Yusuke Kuoka
1463d4927f acceptance,e2e: Let capacity reservation expired more later 2022-02-21 00:07:49 +00:00
Yusuke Kuoka
5bc16f2619 Enhance HRA capacity reservation update log 2022-02-21 00:06:26 +00:00
Yusuke Kuoka
b8e65aa857 Prevent unnecessary ephemeral runner recreations 2022-02-20 13:45:42 +00:00
Yusuke Kuoka
d4a9750e20 acceptance,e2e: Enhance E2E test and deploy.sh to support scaleDownDelaySeconds~ and minReplicas for HRA 2022-02-20 13:45:42 +00:00
Yusuke Kuoka
a6f0e0008f Make unregistration timeout and retry delay configurable in integration tests 2022-02-20 12:05:34 +00:00
Yusuke Kuoka
79a31328a5 Stop recreating ephemeral runner pod
Ref https://github.com/actions-runner-controller/actions-runner-controller/issues/911#issuecomment-1046161384
2022-02-20 04:42:19 +00:00
Yusuke Kuoka
4e6bfd8114 e2e: Add ability to toggle dockerdWithinRunnerContainer 2022-02-20 04:37:15 +00:00
Yusuke Kuoka
3c16188371 Introduce consistent timeouts for runner unregistration and runner pod deletion
Enhances runner controller and runner pod controller to have consistent timeouts for runner unregistration and runner pod deletion,
so that we are very much unlikely to terminate pods that are running any jobs.
2022-02-20 04:36:35 +00:00
Yusuke Kuoka
9e356b419e chart: Add default-logs-container annotation to controller pods
so that you can run `kubectl logs` on controller pods without the specifying the container name.

It is especially useful when you want to run kubectl-logs on all ARC pods across controller-manager and github-webhook-server like:

```
kubectl -n actions-runner-system logs -l app.kubernetes.io/name=actions-runner-controller
```

That was previously impossible due to that the selector matches pods from both controller-manager and github-webhook-server and kubectl does not provide a way to specify container names for respective pods.
2022-02-19 12:22:53 +00:00
Yusuke Kuoka
f3ceccd904 acceptance: Improve deploy.sh to recreate ARC (not runner) pods on new test id
So that one does not need to manually recreate ARC pods frequently.
2022-02-19 12:22:53 +00:00
Yusuke Kuoka
4b557dc54c Add logging transport to log HTTP requests in log level -3
The log level -3 is the minimum log level that is supported today, smaller than debug(-1) and -2(used to log some HRA related logs).

This commit adds a logging HTTP transport to log HTTP requests and responses to that log level.

It implements http.RoundTripper so that it can log each HTTP request with useful metadata like `from_cache` and `ratelimit_remaining`.
The former is set to `true` only when the logged request's response was served from ARC's in-memory cache.
The latter is set to X-RateLimit-Remaining response header value if and only if the response was served by GitHub, not by ARC's cache.
2022-02-19 12:22:53 +00:00
Yusuke Kuoka
4c53e3aa75 Add GitHub API cache to avoid rate limit
This will cache any GitHub API responses with correct Cache-Control header.

`gregjones/httpcache` has been chosen as a library to implement this feature, as it is as recommended in `go-github`'s documentation:

https://github.com/google/go-github#conditional-requests

`gregjones/httpcache` supports a number of cache backends like `diskcache`, `s3cache`, and so on:

https://github.com/gregjones/httpcache#cache-backends

We stick to the built-in in-memory cache as a starter. Probably this will never becomes an issue as long as various HTTP responses for all the GitHub API calls that ARC makes, list-runners, list-workflow-jobs, list-runner-groups, etc., doesn't overflow the in-memory cache.

`httpcache` has an known unfixed issue that it doesn't update cache on chunked responses. But we assume that the APIs that we call doesn't use chunked responses. See #1503 for more information on that.

Ref #920
2022-02-19 12:22:53 +00:00
Tingluo Huang
0b9bef2c08 Try to unconfig runner before deleting the pod to recreate (#1125)
There is a race condition between ARC and GitHub service about deleting runner pod.

- The ARC use REST API to find a particular runner in a pod that is not running any jobs, so it decides to delete the pod.
- A job is queued on the GitHub service side, and it sends the job to this idle runner right before ARC deletes the pod.
- The ARC delete the runner pod which cause the in-progress job to end up canceled.

To avoid this race condition, I am calling `r.unregisterRunner()` before deleting the pod.
- `r.unregisterRunner()` will return 204 to indicate the runner is deleted from the GitHub service, we should be safe to delete the pod.
- `r.unregisterRunner` will return 400 to indicate the runner is still running a job, so we will leave this runner pod as it is.

TODO: I need to do some E2E tests to force the race condition to happen.

Ref #911
2022-02-19 21:22:31 +09:00
Yusuke Kuoka
a5ed6bd263 Fix RunerSet managed runner pods to terminate more gracefully (#1126)
Make RunnerSet-managed runners as reliable as RunnerDeployment-managed runners.

Ref https://github.com/actions-runner-controller/actions-runner-controller/issues/911#issuecomment-1042404460
2022-02-19 21:19:37 +09:00
Yusuke Kuoka
921f547200 fix: Do recreate runner pod on registration token update (#1087)
Apparently, we've been missed taking an updated registration token into account when generating the pod template hash which is used to detect if the runner pod needs to be recreated.

This shouldn't have been the end of the world since the runner pod is recreated on the next reconciliation loop anyway, but this change will make the pod recreation happen one reconciliation loop earlier so that you're less likely to get runner pods with outdated refresh tokens.

Ref https://github.com/actions-runner-controller/actions-runner-controller/pull/1085#issuecomment-1027433365
2022-02-19 21:18:00 +09:00
Felipe Galindo Sanchez
9079c5d85f fix: configure logger before trying to log (#1128)
Log about GitHub client not being initialized is not seen as logger is configured after adding the log
2022-02-19 20:56:58 +09:00
Yusuke Kuoka
a9aea0bd9c Fix issue that visible runner groups are printed as if empty in log 2022-02-19 14:43:41 +09:00
Yusuke Kuoka
fcf4778bac Fix regression that prevented default organizational runner group from being scale target
Fixes #1131
2022-02-19 14:43:41 +09:00
Yusuke Kuoka
eb0a4a9603 chart: Bump to 0.16.0 (with appVersion 0.21.0) 2022-02-18 01:57:37 +00:00
120 changed files with 5317 additions and 2870 deletions

View File

@@ -11,3 +11,4 @@ charts
*.md
*.txt
*.sh
test/e2e/.docker-build

View File

@@ -1,36 +0,0 @@
---
name: Bug report
about: Create a report to help us improve
title: ''
assignees: ''
---
**Describe the bug**
A clear and concise description of what the bug is.
**Checks**
- [ ] My actions-runner-controller version (v0.x.y) does support the feature
- [ ] I'm using an unreleased version of the controller I built from HEAD of the default branch
**To Reproduce**
Steps to reproduce the behavior:
1. Go to '...'
2. Click on '....'
3. Scroll down to '....'
4. See error
**Expected behavior**
A clear and concise description of what you expected to happen.
**Screenshots**
If applicable, add screenshots to help explain your problem.
**Environment (please complete the following information):**
- Controller Version [e.g. 0.18.2]
- Deployment Method [e.g. Helm and Kustomize ]
- Helm Chart Version [e.g. 0.11.0, if applicable]
**Additional context**
Add any other context about the problem here.

160
.github/ISSUE_TEMPLATE/bug_report.yml vendored Normal file
View File

@@ -0,0 +1,160 @@
name: Bug Report
description: File a bug report
title: "Bug"
labels: ["bug"]
body:
- type: input
id: controller-version
attributes:
label: Controller Version
description: Refer to semver-like release tags for controller versions. Any release tags prefixed with `actions-runner-controller-` are for chart releases
placeholder: ex. 0.18.2 or git commit ID
validations:
required: true
- type: input
id: chart-version
attributes:
label: Helm Chart Version
description: Run `helm list` and see what's shown under CHART VERSION. Any release tags prefixed with `actions-runner-controller-` are for chart releases
placeholder: ex. 0.11.0
- type: dropdown
id: deployment-method
attributes:
label: Deployment Method
description: Which deployment method did you use to install ARC?
options:
- Helm
- Kustomize
- ArgoCD
- Other
validations:
required: true
- type: checkboxes
id: checks
attributes:
label: Checks
description: Please check the boxes below before submitting
options:
- label: This isn't a question or user support case (For Q&A and community support, go to [Discussions](https://github.com/actions-runner-controller/actions-runner-controller/discussions). It might also be a good idea to contract with any of contributors and maintainers if your business is so critical and therefore you need priority support
required: true
- label: I've read [releasenotes](https://github.com/actions-runner-controller/actions-runner-controller/tree/master/docs/releasenotes) before submitting this issue and I'm sure it's not due to any recently-introduced backward-incompatible changes
required: true
- label: My actions-runner-controller version (v0.x.y) does support the feature
required: true
- label: I've already upgraded ARC to the latest and it didn't fix the issue
required: true
- type: textarea
id: resource-definitions
attributes:
label: Resource Definitions
description: "Add copy(s) of your resource definition(s) (RunnerDeployment or RunnerSet, and HorizontalRunnerAutoscaler. If RunnerSet, also include the StorageClass being used)"
render: yaml
placeholder: |
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
name: example
spec:
#snip
---
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerSet
metadata:
name: example
spec:
#snip
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: example
provisioner: ...
reclaimPolicy: ...
volumeBindingMode: ...
---
apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
name:
spec:
#snip
validations:
required: true
- type: textarea
id: reproduction-steps
attributes:
label: To Reproduce
description: "Steps to reproduce the behavior"
render: markdown
placeholder: |
1. Go to '...'
2. Click on '....'
3. Scroll down to '....'
4. See error
validations:
required: true
- type: textarea
id: actual-behavior
attributes:
label: Describe the bug
description: Also tell us, what did happen?
placeholder: A clear and concise description of what happened.
validations:
required: true
- type: textarea
id: expected-behavior
attributes:
label: Describe the expected behavior
description: Also tell us, what did you expect to happen?
placeholder: A clear and concise description of what the expected behavior is.
validations:
required: true
- type: textarea
id: controller-logs
attributes:
label: Controller Logs
description: "Include logs from `actions-runner-controller`'s controller-manager pod"
render: shell
placeholder: |
To grab controller logs:
# Set NS according to your setup
NS=actions-runner-system
# Grab the pod name and set it to $POD_NAME
kubectl -n $NS get po
kubectl -n $NS logs $POD_NAME > arc.log
Upload it to e.g. https://gist.github.com/ and paste the link to it here.
validations:
required: true
- type: textarea
id: runner-pod-logs
attributes:
label: Runner Pod Logs
description: "Include logs from runner pod(s)"
render: shell
placeholder: |
To grab the runner pod logs:
# Set NS according to your setup. It should match your RunnerDeployment's metadata.namespace.
NS=default
# Grab the name of the problematic runner pod and set it to $POD_NAME
kubectl -n $NS get po
kubectl -n $NS logs $POD_NAME -c runner > runnerpod_runner.log
kubectl -n $NS logs $POD_NAME -c docker > runnerpod_docker.log
Upload it to e.g. https://gist.github.com/ and paste the link to it here.
validations:
required: true
- type: textarea
id: additional-context
attributes:
label: Additional Context
description: |
Add any other context about the problem here.
Tip: You can attach images or log files by clicking this area to highlight it and then dragging files in.

15
.github/ISSUE_TEMPLATE/config.yml vendored Normal file
View File

@@ -0,0 +1,15 @@
# Blank issues are mainly for maintainers who are known to write complete issue descriptions without need to following a form
blank_issues_enabled: true
contact_links:
- name: Sponsor ARC Maintainers
about: If your business relies on the continued maintainance of actions-runner-controller, please consider sponsoring the project and the maintainers.
url: https://github.com/actions-runner-controller/actions-runner-controller/tree/master/CODEOWNERS
- name: Ideas and Feature Requests
about: Wanna request a feature? Create a discussion and collect :+1:s first.
url: https://github.com/actions-runner-controller/actions-runner-controller/discussions/new?category=ideas
- name: Questions and User Support
about: Need support using ARC? We use Discussions as the place to provide community support.
url: https://github.com/actions-runner-controller/actions-runner-controller/discussions/new?category=questions
- name: Need Paid Support?
about: Consider contracting with any of the actions-runner-controller maintainers and contributors.
url: https://github.com/actions-runner-controller/actions-runner-controller/tree/master/CODEOWNERS

View File

@@ -14,10 +14,28 @@
// use https://github.com/actions/runner/releases
"fileMatch": [
".github/workflows/runners.yml"
],
],
"matchStrings": ["RUNNER_VERSION: +(?<currentValue>.*?)\\n"],
"depNameTemplate": "actions/runner",
"datasourceTemplate": "github-releases"
},
{
"fileMatch": [
"runner/Makefile",
"Makefile"
],
"matchStrings": ["RUNNER_VERSION \\?= +(?<currentValue>.*?)\\n"],
"depNameTemplate": "actions/runner",
"datasourceTemplate": "github-releases"
},
{
"fileMatch": [
"runner/Dockerfile",
"runner/Dockerfile.dindrunner"
],
"matchStrings": ["RUNNER_VERSION=+(?<currentValue>.*?)\\n"],
"depNameTemplate": "actions/runner",
"datasourceTemplate": "github-releases"
}
]
}
}

67
.github/stale.yml vendored
View File

@@ -1,67 +0,0 @@
# Configuration for probot-stale - https://github.com/probot/stale
# Number of days of inactivity before an Issue or Pull Request becomes stale
daysUntilStale: 30
# Number of days of inactivity before an Issue or Pull Request with the stale label is closed.
# Set to false to disable. If disabled, issues still need to be closed manually, but will remain marked as stale.
daysUntilClose: 14
# Only issues or pull requests with all of these labels are check if stale. Defaults to `[]` (disabled)
onlyLabels: []
# Issues or Pull Requests with these labels will never be considered stale. Set to `[]` to disable
exemptLabels:
- pinned
- security
- enhancement
- refactor
- documentation
- chore
- bug
- dependencies
- needs-investigation
# Set to true to ignore issues in a project (defaults to false)
exemptProjects: false
# Set to true to ignore issues in a milestone (defaults to false)
exemptMilestones: false
# Set to true to ignore issues with an assignee (defaults to false)
exemptAssignees: false
# Label to use when marking as stale
staleLabel: stale
# Comment to post when marking as stale. Set to `false` to disable
markComment: >
This issue has been automatically marked as stale because it has not had
recent activity. It will be closed if no further activity occurs. Thank you
for your contributions.
# Comment to post when removing the stale label.
# unmarkComment: >
# Your comment here.
# Comment to post when closing a stale Issue or Pull Request.
# closeComment: >
# Your comment here.
# Limit the number of actions per hour, from 1-30. Default is 30
limitPerRun: 30
# Limit to only `issues` or `pulls`
# only: issues
# Optionally, specify configuration settings that are specific to just 'issues' or 'pulls':
# pulls:
# daysUntilStale: 30
# markComment: >
# This pull request has been automatically marked as stale because it has not had
# recent activity. It will be closed if no further activity occurs. Thank you
# for your contributions.
# issues:
# exemptLabels:
# - confirmed

View File

@@ -18,12 +18,12 @@ jobs:
name: Lint Chart
steps:
- name: Checkout
uses: actions/checkout@v2
uses: actions/checkout@v3
with:
fetch-depth: 0
- name: Set up Helm
uses: azure/setup-helm@v2.0
uses: azure/setup-helm@v2.1
with:
version: ${{ env.HELM_VERSION }}
@@ -44,12 +44,12 @@ jobs:
--enable-optional-test container-security-context-readonlyrootfilesystem
# python is a requirement for the chart-testing action below (supports yamllint among other tests)
- uses: actions/setup-python@v2
- uses: actions/setup-python@v3
with:
python-version: 3.7
- name: Set up chart-testing
uses: helm/chart-testing-action@v2.2.0
uses: helm/chart-testing-action@v2.2.1
- name: Run chart-testing (list-changed)
id: list-changed

View File

@@ -23,12 +23,12 @@ jobs:
publish-chart: ${{ steps.publish-chart-step.outputs.publish }}
steps:
- name: Checkout
uses: actions/checkout@v2
uses: actions/checkout@v3
with:
fetch-depth: 0
- name: Set up Helm
uses: azure/setup-helm@v2.0
uses: azure/setup-helm@v2.1
with:
version: ${{ env.HELM_VERSION }}
@@ -49,12 +49,12 @@ jobs:
--enable-optional-test container-security-context-readonlyrootfilesystem
# python is a requirement for the chart-testing action below (supports yamllint among other tests)
- uses: actions/setup-python@v2
- uses: actions/setup-python@v3
with:
python-version: 3.7
- name: Set up chart-testing
uses: helm/chart-testing-action@v2.2.0
uses: helm/chart-testing-action@v2.2.1
- name: Run chart-testing (list-changed)
id: list-changed
@@ -104,7 +104,7 @@ jobs:
steps:
- name: Checkout
uses: actions/checkout@v2
uses: actions/checkout@v3
with:
fetch-depth: 0
@@ -114,7 +114,7 @@ jobs:
git config user.email "$GITHUB_ACTOR@users.noreply.github.com"
- name: Run chart-releaser
uses: helm/chart-releaser-action@v1.3.0
uses: helm/chart-releaser-action@v1.4.0
env:
CR_TOKEN: "${{ secrets.GITHUB_TOKEN }}"

View File

@@ -16,11 +16,11 @@ jobs:
run: echo ::set-output name=sha_short::${GITHUB_SHA::7}
- name: Checkout
uses: actions/checkout@v2
uses: actions/checkout@v3
- uses: actions/setup-go@v2
- uses: actions/setup-go@v3
with:
go-version: '^1.17.7'
go-version: '1.17.7'
- name: Install tools
run: |

View File

@@ -11,11 +11,12 @@ on:
- 'master'
paths:
- 'runner/**'
- '!runner/Makefile'
- .github/workflows/runners.yml
- '!**.md'
env:
RUNNER_VERSION: 2.287.1
RUNNER_VERSION: 2.290.1
DOCKER_VERSION: 20.10.12
DOCKERHUB_USERNAME: summerwind
@@ -41,7 +42,7 @@ jobs:
steps:
- name: Checkout
uses: actions/checkout@v2
uses: actions/checkout@v3
- name: Setup Docker Environment
id: vars

19
.github/workflows/stale.yaml vendored Normal file
View File

@@ -0,0 +1,19 @@
name: 'Close stale issues and PRs'
on:
schedule:
# 01:30 every day
- cron: '30 1 * * *'
jobs:
stale:
runs-on: ubuntu-latest
steps:
- uses: actions/stale@v5
with:
stale-issue-message: 'This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.'
# turn off stale for both issues and PRs
days-before-stale: -1
# turn stale back on for issues only
days-before-issue-stale: 30
days-before-issue-close: 14
exempt-issue-labels: 'pinned,security,enhancement,refactor,documentation,chore,bug,dependencies,needs-investigation'

View File

@@ -15,8 +15,7 @@ jobs:
name: Test entrypoint
steps:
- name: Checkout
uses: actions/checkout@v2
uses: actions/checkout@v3
- name: Run unit tests for entrypoint.sh
run: |
cd test/entrypoint
bash entrypoint_unittest.sh
make acceptance/runner/entrypoint

View File

@@ -21,11 +21,18 @@ jobs:
name: Test
steps:
- name: Checkout
uses: actions/checkout@v2
- uses: actions/setup-go@v2
uses: actions/checkout@v3
- uses: actions/setup-go@v3
with:
go-version: '^1.17.7'
go-version: '1.17.7'
check-latest: false
- run: go version
- uses: actions/cache@v3
with:
path: ~/go/pkg/mod
key: ${{ runner.os }}-go-${{ hashFiles('**/go.sum') }}
restore-keys: |
${{ runner.os }}-go-
- name: Install kubebuilder
run: |
curl -L -O https://github.com/kubernetes-sigs/kubebuilder/releases/download/v2.3.2/kubebuilder_2.3.2_linux_amd64.tar.gz

View File

@@ -22,7 +22,7 @@ jobs:
DOCKERHUB_USERNAME: ${{ secrets.DOCKER_USER }}
steps:
- name: Checkout
uses: actions/checkout@v2
uses: actions/checkout@v3
- name: Set up QEMU
uses: docker/setup-qemu-action@v1

2
CODEOWNERS Normal file
View File

@@ -0,0 +1,2 @@
# actions-runner-controller maintainers
* @mumoshu @toast-gear

View File

@@ -1,29 +1,44 @@
# Build the manager binary
FROM golang:1.17 as builder
ARG TARGETPLATFORM
FROM --platform=$BUILDPLATFORM golang:1.17 as builder
WORKDIR /workspace
ENV GO111MODULE=on \
CGO_ENABLED=0
# Make it runnable on a distroless image/without libc
ENV CGO_ENABLED=0
# Copy the Go Modules manifests
COPY go.mod go.sum ./
# cache deps before building and copying source so that we don't need to re-download as much
# and so that source changes don't invalidate our downloaded layer
# and so that source changes don't invalidate our downloaded layer.
#
# Also, we need to do this before setting TARGETPLATFORM/TARGETOS/TARGETARCH/TARGETVARIANT
# so that go mod cache is shared across platforms.
RUN go mod download
# Copy the go source
COPY . .
# COPY . .
# Usage:
# docker buildx build --tag repo/img:tag -f ./Dockerfile . --platform linux/amd64,linux/arm64,linux/arm/v7
#
# With the above commmand,
# TARGETOS can be "linux", TARGETARCH can be "amd64", "arm64", and "arm", TARGETVARIANT can be "v7".
ARG TARGETPLATFORM TARGETOS TARGETARCH TARGETVARIANT
# We intentionally avoid `--mount=type=cache,mode=0777,target=/go/pkg/mod` in the `go mod download` and the `go build` runs
# to avoid https://github.com/moby/buildkit/issues/2334
# We can use docker layer cache so the build is fast enogh anyway
# We also use per-platform GOCACHE for the same reason.
env GOCACHE /build/${TARGETPLATFORM}/root/.cache/go-build
# Build
RUN export GOOS=$(echo ${TARGETPLATFORM} | cut -d / -f1) && \
export GOARCH=$(echo ${TARGETPLATFORM} | cut -d / -f2) && \
GOARM=$(echo ${TARGETPLATFORM} | cut -d / -f3 | cut -c2-) && \
go build -a -o manager main.go && \
go build -a -o github-webhook-server ./cmd/githubwebhookserver
RUN --mount=target=. \
--mount=type=cache,mode=0777,target=${GOCACHE} \
export GOOS=${TARGETOS} GOARCH=${TARGETARCH} GOARM=${TARGETVARIANT#v} && \
go build -o /out/manager main.go && \
go build -o /out/github-webhook-server ./cmd/githubwebhookserver
# Use distroless as minimal base image to package the manager binary
# Refer to https://github.com/GoogleContainerTools/distroless for more details
@@ -31,8 +46,8 @@ FROM gcr.io/distroless/static:nonroot
WORKDIR /
COPY --from=builder /workspace/manager .
COPY --from=builder /workspace/github-webhook-server .
COPY --from=builder /out/manager .
COPY --from=builder /out/github-webhook-server .
USER nonroot:nonroot

View File

@@ -5,6 +5,7 @@ else
endif
DOCKER_USER ?= $(shell echo ${NAME} | cut -d / -f1)
VERSION ?= latest
RUNNER_VERSION ?= 2.290.1
TARGETPLATFORM ?= $(shell arch)
RUNNER_NAME ?= ${DOCKER_USER}/actions-runner
RUNNER_TAG ?= ${VERSION}
@@ -12,7 +13,7 @@ TEST_REPO ?= ${DOCKER_USER}/actions-runner-controller
TEST_ORG ?=
TEST_ORG_REPO ?=
TEST_EPHEMERAL ?= false
SYNC_PERIOD ?= 5m
SYNC_PERIOD ?= 1m
USE_RUNNERSET ?=
RUNNER_FEATURE_FLAG_EPHEMERAL ?=
KUBECONTEXT ?= kind-acceptance
@@ -109,13 +110,9 @@ vet:
generate: controller-gen
$(CONTROLLER_GEN) object:headerFile=./hack/boilerplate.go.txt paths="./..."
# Build the docker image
docker-build:
docker build -t ${NAME}:${VERSION} .
docker build -t ${RUNNER_NAME}:${RUNNER_TAG} --build-arg TARGETPLATFORM=${TARGETPLATFORM} runner
docker-buildx:
export DOCKER_CLI_EXPERIMENTAL=enabled
export DOCKER_CLI_EXPERIMENTAL=enabled ;\
export DOCKER_BUILDKIT=1
@if ! docker buildx ls | grep -q container-builder; then\
docker buildx create --platform ${PLATFORMS} --name container-builder --use;\
fi
@@ -197,6 +194,9 @@ acceptance/deploy:
acceptance/tests:
acceptance/checks.sh
acceptance/runner/entrypoint:
cd test/entrypoint/ && bash test.sh
# We use -count=1 instead of `go clean -testcache`
# See https://terratest.gruntwork.io/docs/testing-best-practices/avoid-test-caching/
.PHONY: e2e

475
README.md
View File

@@ -1,4 +1,4 @@
# actions-runner-controller
# actions-runner-controller (ARC)
[![awesome-runners](https://img.shields.io/badge/listed%20on-awesome--runners-blue.svg)](https://github.com/jonico/awesome-runners)
@@ -6,7 +6,8 @@ This controller operates self-hosted runners for GitHub Actions on your Kubernet
ToC:
- [Motivation](#motivation)
- [Status](#status)
- [About](#about)
- [Installation](#installation)
- [GitHub Enterprise Support](#github-enterprise-support)
- [Setting Up Authentication with GitHub API](#setting-up-authentication-with-github-api)
@@ -18,6 +19,8 @@ ToC:
- [Organization Runners](#organization-runners)
- [Enterprise Runners](#enterprise-runners)
- [RunnerDeployments](#runnerdeployments)
- [RunnerSets](#runnersets)
- [Persistent Runners](#persistent-runners)
- [Autoscaling](#autoscaling)
- [Anti-Flapping Configuration](#anti-flapping-configuration)
- [Pull Driven Scaling](#pull-driven-scaling)
@@ -26,37 +29,43 @@ ToC:
- [Scheduled Overrides](#scheduled-overrides)
- [Runner with DinD](#runner-with-dind)
- [Additional Tweaks](#additional-tweaks)
- [Custom Volume mounts](#custom-volume-mounts)
- [Runner Labels](#runner-labels)
- [Runner Groups](#runner-groups)
- [Runner Entrypoint Features](#runner-entrypoint-features)
- [Using IRSA (IAM Roles for Service Accounts) in EKS](#using-irsa-iam-roles-for-service-accounts-in-eks)
- [Stateful Runners](#stateful-runners)
- [Ephemeral Runners](#ephemeral-runners)
- [Software Installed in the Runner Image](#software-installed-in-the-runner-image)
- [Using without cert-manager](#using-without-cert-manager)
- [Common Errors](#common-errors)
- [Troubleshooting](#troubleshooting)
- [Contributing](#contributing)
## Motivation
## Status
Even though actions-runner-controller is used in production environments, it is still in its early stage of development, hence versioned 0.x.
actions-runner-controller complies to Semantic Versioning 2.0.0 in which v0.x means that there could be backward-incompatible changes for every release.
The documentation is kept inline with master@HEAD, we do our best to highlight any features that require a specific ARC version or higher however this is not always easily done due to there being many moving parts. Additionally, we actively do not retain compatibly with every GitHub Enterprise Server version nor every Kubernetes version so you will need to ensure you stay current within a reasonable timespan.
## About
[GitHub Actions](https://github.com/features/actions) is a very useful tool for automating development. GitHub Actions jobs are run in the cloud by default, but you may want to run your jobs in your environment. [Self-hosted runner](https://github.com/actions/runner) can be used for such use cases, but requires the provisioning and configuration of a virtual machine instance. Instead if you already have a Kubernetes cluster, it makes more sense to run the self-hosted runner on top of it.
**actions-runner-controller** makes that possible. Just create a *Runner* resource on your Kubernetes, and it will run and operate the self-hosted runner for the specified repository. Combined with Kubernetes RBAC, you can also build simple Self-hosted runners as a Service.
## Installation
By default, actions-runner-controller uses [cert-manager](https://cert-manager.io/docs/installation/kubernetes/) for certificate management of Admission Webhook. Make sure you have already installed cert-manager before you install. The installation instructions for cert-manager can be found below.
By default, actions-runner-controller uses [cert-manager](https://cert-manager.io/docs/installation/kubernetes/) for certificate management of Admission Webhook. Make sure you have already installed cert-manager before you install. The installation instructions for the cert-manager can be found below.
- [Installing cert-manager on Kubernetes](https://cert-manager.io/docs/installation/kubernetes/)
Subsequent to this, install the custom resource definitions and actions-runner-controller with `kubectl` or `helm`. This will create actions-runner-system namespace in your Kubernetes and deploy the required resources.
Subsequent to this, install the custom resource definitions and actions-runner-controller with `kubectl` or `helm`. This will create an actions-runner-system namespace in your Kubernetes and deploy the required resources.
**Kubectl Deployment:**
```shell
# REPLACE "v0.20.2" with the version you wish to deploy
kubectl apply -f https://github.com/actions-runner-controller/actions-runner-controller/releases/download/v0.20.2/actions-runner-controller.yaml
# REPLACE "v0.22.0" with the version you wish to deploy
kubectl apply -f https://github.com/actions-runner-controller/actions-runner-controller/releases/download/v0.22.0/actions-runner-controller.yaml
```
**Helm Deployment:**
@@ -81,7 +90,7 @@ When deploying the solution for a GHES environment you need to provide an additi
kubectl set env deploy controller-manager -c manager GITHUB_ENTERPRISE_URL=<GHEC/S URL> --namespace actions-runner-system
```
**_Note: The repository maintainers do not have an enterprise environment (cloud or server). Support for the enterprise specific feature set is community driven and on a best effort basis. PRs from the community are welcomed to add features and maintain support._**
**_Note: The repository maintainers do not have an enterprise environment (cloud or server). Support for the enterprise specific feature set is community driven and on a best effort basis. PRs from the community are welcome to add features and maintain support._**
## Setting Up Authentication with GitHub API
@@ -90,7 +99,7 @@ There are two ways for actions-runner-controller to authenticate with the GitHub
1. Using a GitHub App (not supported for enterprise level runners due to lack of support from GitHub)
2. Using a PAT
Functionality wise, there isn't much of a difference between the 2 authentication methods. The primarily benefit of authenticating via a GitHub App is an [increased API quota](https://docs.github.com/en/developers/apps/rate-limits-for-github-apps).
Functionality wise, there isn't much of a difference between the 2 authentication methods. The primary benefit of authenticating via a GitHub App is an [increased API quota](https://docs.github.com/en/developers/apps/rate-limits-for-github-apps).
If you are deploying the solution for a GHES environment you are able to [configure your rate limit settings](https://docs.github.com/en/enterprise-server@3.0/admin/configuration/configuring-rate-limits) making the main benefit irrelevant. If you're deploying the solution for a GHEC or regular GitHub environment and you run into rate limit issues, consider deploying the solution using the GitHub App authentication method instead.
@@ -156,7 +165,7 @@ When the installation is complete, you will be taken to a URL in one of the foll
- `https://github.com/organizations/eventreactor/settings/installations/${INSTALLATION_ID}`
Finally, register the App ID (`APP_ID`), Installation ID (`INSTALLATION_ID`), and downloaded private key file (`PRIVATE_KEY_FILE_PATH`) to Kubernetes as Secret.
Finally, register the App ID (`APP_ID`), Installation ID (`INSTALLATION_ID`), and the downloaded private key file (`PRIVATE_KEY_FILE_PATH`) to Kubernetes as a secret.
**Kubectl Deployment:**
@@ -196,9 +205,9 @@ Log-in to a GitHub account that has `admin` privileges for the repository, and [
* admin:enterprise (manage_runners:enterprise)
_Note: When you deploy enterprise runners they will get access to organizations, however, access to the repositories themselves is **NOT** allowed by default. Each GitHub organization must allow enterprise runner groups to be used in repositories as an initial one time configuration step, this only needs to be done once after which it is permanent for that runner group._
_Note: When you deploy enterprise runners they will get access to organizations, however, access to the repositories themselves is **NOT** allowed by default. Each GitHub organization must allow enterprise runner groups to be used in repositories as an initial one-time configuration step, this only needs to be done once after which it is permanent for that runner group._
_Note: GitHub do not document exactly what permissions you get with each PAT scope beyond a vague description. The best documentation they provide on the topic can be found [here](https://docs.github.com/en/developers/apps/building-oauth-apps/scopes-for-oauth-apps) if you wish to review. The docs target OAuth apps and so are incomplete and amy not be 100% accurate._
_Note: GitHub does not document exactly what permissions you get with each PAT scope beyond a vague description. The best documentation they provide on the topic can be found [here](https://docs.github.com/en/developers/apps/building-oauth-apps/scopes-for-oauth-apps) if you wish to review. The docs target OAuth apps and so are incomplete and may not be 100% accurate._
---
@@ -220,20 +229,22 @@ Configure your values.yaml, see the chart's [README](./charts/actions-runner-con
> This feature requires controller version => [v0.18.0](https://github.com/actions-runner-controller/actions-runner-controller/releases/tag/v0.18.0)
**_Note: Be aware when using this feature that CRDs are cluster wide and so you should upgrade all of your controllers (and your CRDs) as the same time if you are doing an upgrade. Do not mix and match CRD versions with different controller versions. Doing so risks out of control scaling._**
**_Note: Be aware when using this feature that CRDs are cluster-wide and so you should upgrade all of your controllers (and your CRDs) at the same time if you are doing an upgrade. Do not mix and match CRD versions with different controller versions. Doing so risks out of control scaling._**
By default the controller will look for runners in all namespaces, the watch namespace feature allows you to restrict the controller to monitoring a single namespace. This then lets you deploy multiple controllers in a single cluster. You may want to do this either because you wish to scale beyond the API rate limit of a single PAT / GitHub App configuration or you wish to support multiple GitHub organizations with runners installed at the organization level in a single cluster.
This feature is configured via the controller's `--watch-namespace` flag. When a namespace is provided via this flag, the controller will only monitor runners in that namespace.
If you plan on installing all instances of the controller stack into a single namespace you will need to make the names of the resources unique to each stack. In the case of Helm this can be done by giving each install a unique release name, or via the `fullnameOverride` properties.
You can deploy multiple controllers either in a single shared namespace, or in a unique namespace per controller.
Alternatively, you can install each controller stack into its own unique namespace (relative to other controller stacks in the cluster), avoiding the need to uniquely prefix resources.
If you plan on installing all instances of the controller stack into a single namespace there are a few things you need to do for this to work.
When you go to the route of sharing the namespace while giving each a unique Helm release name, you must also ensure the following values are configured correctly:
1. All resources per stack must have a unique, in the case of Helm this can be done by giving each install a unique release name, or via the `fullnameOverride` properties.
2. `authSecret.name` needs to be unique per stack when each stack is tied to runners in different GitHub organizations and repositories AND you want your GitHub credentials to be narrowly scoped.
3. `leaderElectionId` needs to be unique per stack. If this is not unique to the stack the controller tries to race onto the leader election lock resulting in only one stack working concurrently. Your controller will be stuck with a log message something like this `attempting to acquire leader lease arc-controllers/actions-runner-controller...`
4. The MutatingWebhookConfiguration in each stack must include a namespace selector for that stack's corresponding runner namespace, this is already configured in the helm chart.
- `authSecret.name` needs be unique per stack when each stack is tied to runners in different GitHub organizations and repositories AND you want your GitHub credentials to narrowly scoped.
- `leaderElectionId` needs to be unique per stack. If this is not unique to the stack the controller tries to race onto the leader election lock and resulting in only one stack working concurrently.
Alternatively, you can install each controller stack into a unique namespace (relative to other controller stacks in the cluster). Implementing ARC this way avoids the first, second and third pitfalls (you still need to set the corresponding namespace selector for each stack's mutating webhook)
## Usage
@@ -249,7 +260,7 @@ There are two ways to use this controller:
### Repository Runners
To launch a single self-hosted runner, you need to create a manifest file includes `Runner` resource as follows. This example launches a self-hosted runner with name *example-runner* for the *actions-runner-controller/actions-runner-controller* repository.
To launch a single self-hosted runner, you need to create a manifest file that includes a `Runner` resource as follows. This example launches a self-hosted runner with name *example-runner* for the *actions-runner-controller/actions-runner-controller* repository.
```yaml
# runner.yaml
@@ -361,21 +372,133 @@ example-runnerdeploy2475h595fr mumoshu/actions-runner-controller-ci Running
example-runnerdeploy2475ht2qbr mumoshu/actions-runner-controller-ci Running
```
### RunnerSets
> This feature requires controller version => [v0.20.0](https://github.com/actions-runner-controller/actions-runner-controller/releases/tag/v0.20.0)
_Ensure you see the limitations before using this kind!!!!!_
For scenarios where you require the advantages of a `StatefulSet`, for example persistent storage, ARC implements a runner based on Kubernetes' `StatefulSets`, the `RunnerSet`.
A basic `RunnerSet` would look like this:
```yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerSet
metadata:
name: example
spec:
ephemeral: false
replicas: 2
repository: mumoshu/actions-runner-controller-ci
# Other mandatory fields from StatefulSet
selector:
matchLabels:
app: example
serviceName: example
template:
metadata:
labels:
app: example
```
As it is based on `StatefulSet`, `selector` and `template.medatada.labels` it needs to be defined and have the exact same set of labels. `serviceName` must be set to some non-empty string as it is also required by `StatefulSet`.
Runner-related fields like `ephemeral`, `repository`, `organization`, `enterprise`, and so on should be written directly under `spec`.
Fields like `volumeClaimTemplates` that originates from `StatefulSet` should also be written directly under `spec`.
Pod-related fields like security contexts and volumes are written under `spec.template.spec` like `StatefulSet`.
Similarly, container-related fields like resource requests and limits, container image names and tags, security context, and so on are written under `spec.template.spec.containers`. There are two reserved container `name`, `runner` and `docker`. The former is for the container that runs [actions runner](https://github.com/actions/runner) and the latter is for the container that runs a `dockerd`.
For a more complex example, see the below:
```yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerSet
metadata:
name: example
spec:
ephemeral: false
replicas: 2
repository: mumoshu/actions-runner-controller-ci
dockerdWithinRunnerContainer: true
template:
spec:
securityContext:
# All level/role/type/user values will vary based on your SELinux policies.
# See https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux_atomic_host/7/html/container_security_guide/docker_selinux_security_policy for information about SELinux with containers
seLinuxOptions:
level: "s0"
role: "system_r"
type: "super_t"
user: "system_u"
containers:
- name: runner
env: []
resources:
limits:
cpu: "4.0"
memory: "8Gi"
requests:
cpu: "2.0"
memory: "4Gi"
- name: docker
resources:
limits:
cpu: "4.0"
memory: "8Gi"
requests:
cpu: "2.0"
memory: "4Gi"
```
You can also read the design and usage documentation written in the original pull request that introduced `RunnerSet` for more information [#629](https://github.com/actions-runner-controller/actions-runner-controller/pull/629).
Under the hood, `RunnerSet` relies on Kubernetes's `StatefulSet` and Mutating Webhook. A `statefulset` is used to create a number of pods that has stable names and dynamically provisioned persistent volumes, so that each `statefulset-managed` pod gets the same persistent volume even after restarting. A mutating webhook is used to dynamically inject a runner's "registration token" which is used to call GitHub's "Create Runner" API.
**Limitations**
* For autoscaling the `RunnerSet` kind only supports pull driven scaling or the `workflow_job` event for webhook driven scaling.
* Whilst `RunnerSets` support all runner modes as well as autoscaling, currently PVs are **NOT** automatically cleaned up as they are still bound to their respective PVCs when a runner is deleted by the controller. This has **major** implications when using `RunnerSets` in the standard runner mode, `ephemeral: true`, see [persistent runners](#persistent-runners) for more details. As a result of this, using the default ephemeral configuration or implementing autoscaling for your `RunnerSets`, you will get a build-up of PVCs and PVs without some sort of custom solution for cleaning up.
### Persistent Runners
Every runner managed by ARC is "ephemeral" by default. The life of an ephemeral runner managed by ARC looks like this- ARC creates a runner pod for the runner. As it's an ephemeral runner, the `--ephemeral` flag is passed to the `actions/runner` agent that runs within the `runner` container of the runner pod.
`--ephemeral` is an `actions/runner` feature that instructs the runner to stop and de-register itself after the first job run.
Once the ephemeral runner has completed running a workflow job, it stops with a status code of 0, hence the runner pod is marked as completed, removed by ARC.
As it's removed after a workflow job run, the runner pod is never reused across multiple GitHub Actions workflow jobs, providing you a clean environment per each workflow job.
Although not generally recommended, it's possible to disable the passing of the `--ephemeral` flag by explicitly setting `ephemeral: false` in the `RunnerDeployment` or `RunnerSet` spec. When disabled, your runner becomes "persistent". A persistent runner does not stop after workflow job ends, and in this mode `actions/runner` is known to clean only runner's work dir after each job. Whilst this can seem helpful it creates a non-deterministic environment which is not ideal for a CI/CD environment. Between runs, your actions cache, docker images stored in the `dind` and layer cache, globally installed packages etc are retained across multiple workflow job runs which can cause issues that are hard to debug and inconsistent.
Persistent runners are available as an option for some edge cases however they are not preferred as they can create challenges around providing a deterministic and secure environment.
### Autoscaling
> Since the release of GitHub's [`workflow_job` webhook](https://docs.github.com/en/developers/webhooks-and-events/webhooks/webhook-events-and-payloads#workflow_job), webhook driven scaling is the preferred way of autoscaling as it enables targeted scaling of your `RunnerDeployment` / `RunnerSet` as it includes the `runs-on` information needed to scale the appropriate runners for that workflow run. More broadly, webhook driven scaling is the preferred scaling option as it is far quicker compared to the pull driven scaling and is easy to setup.
> Since the release of GitHub's [`workflow_job` webhook](https://docs.github.com/en/developers/webhooks-and-events/webhooks/webhook-events-and-payloads#workflow_job), webhook driven scaling is the preferred way of autoscaling as it enables targeted scaling of your `RunnerDeployment` / `RunnerSet` as it includes the `runs-on` information needed to scale the appropriate runners for that workflow run. More broadly, webhook driven scaling is the preferred scaling option as it is far quicker compared to the pull driven scaling and is easy to set up.
A `RunnerDeployment` or `RunnerSet` (see [stateful runners](#stateful-runners) for more details on this kind) can scale the number of runners between `minReplicas` and `maxReplicas` fields driven by either pull based scaling metrics or via a webhook event (see limitations section of [stateful runners](#stateful-runners) for cavaets of this kind). Whether the autoscaling is driven from a webhook event or pull based metrics it is implemented by backing a `RunnerDeployment` or `RunnerSet` kind with a `HorizontalRunnerAutoscaler` kind.
> If you are using controller version < [v0.22.0](https://github.com/actions-runner-controller/actions-runner-controller/releases/tag/v0.22.0) and you are not using GHES, and so can't set your rate limit budget, it is recommended that you use 100 replicas or fewer to prevent being rate limited.
A `RunnerDeployment` or `RunnerSet` can scale the number of runners between `minReplicas` and `maxReplicas` fields driven by either pull based scaling metrics or via a webhook event (see limitations section of [RunnerSets](#runnersets) for caveats of this kind). Whether the autoscaling is driven from a webhook event or pull based metrics it is implemented by backing a `RunnerDeployment` or `RunnerSet` kind with a `HorizontalRunnerAutoscaler` kind.
**_Important!!! If you opt to configure autoscaling, ensure you remove the `replicas:` attribute in the `RunnerDeployment` / `RunnerSet` kinds that are configured for autoscaling [#206](https://github.com/actions-runner-controller/actions-runner-controller/issues/206#issuecomment-748601907)_**
#### Anti-Flapping Configuration
For both pull driven or webhook driven scaling an anti-flapping implementation is included, by default a runner won't be scaled down within 10 minutes of it having been scaled up. This delay is configurable by including the attribute `scaleDownDelaySecondsAfterScaleOut:` in a `HorizontalRunnerAutoscaler` kind's `spec:`.
For both pull driven or webhook driven scaling an anti-flapping implementation is included, by default a runner won't be scaled down within 10 minutes of it having been scaled up.
This configuration has the final say on if a runner can be scaled down or not regardless of the chosen scaling method. Depending on your requirements, you may want to consider adjusting this by setting the `scaleDownDelaySecondsAfterScaleOut:` attribute.
This anti-flap configuration also has the final say on if a runner can be scaled down or not regardless of the chosen scaling method.
Below is a complete basic example with one of the pull driven scaling metrics.
This delay is configurable via 2 methods:
1. By setting a new default via the controller's `--default-scale-down-delay` flag
2. By setting by setting the attribute `scaleDownDelaySecondsAfterScaleOut:` in a `HorizontalRunnerAutoscaler` kind's `spec:`.
Below is a complete basic example of one of the pull driven scaling metrics.
```yaml
apiVersion: actions.summerwind.dev/v1alpha1
@@ -413,7 +536,9 @@ spec:
> To configure webhook driven scaling see the [Webhook Driven Scaling](#webhook-driven-scaling) section
The pull based metrics are configured in the `metrics` attribute of a HRA (see snippet below). The period between polls is defined by the controller's `--sync-period` flag. If this flag isn't provided then the controller defaults to a sync period of 10 minutes. The default value is set to 10 minutes to prevent default deployments rate limiting themselves from the GitHub API, you will most likely want to adjust this.
The pull based metrics are configured in the `metrics` attribute of a HRA (see snippet below). The period between polls is defined by the controller's `--sync-period` flag. If this flag isn't provided then the controller defaults to a sync period of `1m`, this can be configured in seconds or minutes.
Be aware that the shorter the sync period the quicker you will consume your rate limit budget, depending on your environment this may or may not be a risk. Consider monitoring ARCs rate limit budget when configuring this feature to find the optimal performance sync period.
```yaml
apiVersion: actions.summerwind.dev/v1alpha1
@@ -440,14 +565,13 @@ The `TotalNumberOfQueuedAndInProgressWorkflowRuns` metric polls GitHub for all p
**Benefits of this metric**
1. Supports named repositories allowing you to restrict the runner to a specified set of repositories server-side.
2. Scales the runner count based on the depth of the job queue meaning a more 1:1 scaling of runners to queued jobs (caveat, see drawback #4)
2. Scales the runner count based on the depth of the job queue meaning a 1:1 scaling of runners to queued jobs.
3. Like all scaling metrics, you can manage workflow allocation to the RunnerDeployment through the use of [GitHub labels](#runner-labels).
**Drawbacks of this metric**
1. A list of repositories must be included within the scaling metric. Maintaining a list of repositories may not be viable in larger environments or self-serve environments.
2. May not scale quick enough for some users needs. This metric is pull based and so the queue depth is polled as configured by the sync period, as a result scaling performance is bound by this sync period meaning there is a lag to scaling activity.
3. Relatively large amounts of API requests required to maintain this metric, you may run in API rate limit issues depending on the size of your environment and how aggressive your sync period configuration is.
4. The GitHub API doesn't provide a way to filter workflow jobs to just those targeting self-hosted runners. If your environment's workflows target both self-hosted and GitHub hosted runners then the queue depth this metric scales against isn't a true 1:1 mapping of queue depth to required runner count. As a result of this, this metric may scale too aggressively for your actual self-hosted runner count needs.
2. May not scale quickly enough for some users' needs. This metric is pull based and so the queue depth is polled as configured by the sync period, as a result scaling performance is bound by this sync period meaning there is a lag to scaling activity.
3. Relatively large amounts of API requests are required to maintain this metric, you may run into API rate limit issues depending on the size of your environment and how aggressive your sync period configuration is.
Example `RunnerDeployment` backed by a `HorizontalRunnerAutoscaler`:
@@ -468,7 +592,7 @@ metadata:
spec:
scaleTargetRef:
name: example-runner-deployment
# Uncomment the below in case the target is not RunnerDeployment but RunnerSet
# IMPORTANT : If your HRA is targeting a RunnerSet you must specify the kind in the scaleTargetRef:, uncomment the below
#kind: RunnerSet
minReplicas: 1
maxReplicas: 5
@@ -489,7 +613,7 @@ The `HorizontalRunnerAutoscaler` will poll GitHub for the number of runners in t
4. Supports scaling desired runner count on both a percentage increase / decrease basis as well as on a fixed increase / decrease count basis [#223](https://github.com/actions-runner-controller/actions-runner-controller/pull/223) [#315](https://github.com/actions-runner-controller/actions-runner-controller/pull/315)
**Drawbacks of this metric**
1. May not scale quick enough for some users needs. This metric is pull based and so the number of busy runners are polled as configured by the sync period, as a result scaling performance is bound by this sync period meaning there is a lag to scaling activity.
1. May not scale quickly enough for some users' needs. This metric is pull based and so the number of busy runners is polled as configured by the sync period, as a result scaling performance is bound by this sync period meaning there is a lag to scaling activity.
2. We are scaling up and down based on indicative information rather than a count of the actual number of queued jobs and so the desired runner count is likely to under provision new runners or overprovision them relative to actual job queue depth, this may or may not be a problem for you.
Examples of each scaling type implemented with a `RunnerDeployment` backed by a `HorizontalRunnerAutoscaler`:
@@ -540,10 +664,10 @@ spec:
> To configure pull driven scaling see the [Pull Driven Scaling](#pull-driven-scaling) section
Webhooks are processed by a seperate webhook server. The webhook server receives GitHub Webhook events and scales
Webhooks are processed by a separate webhook server. The webhook server receives GitHub Webhook events and scales
[`RunnerDeployments`](#runnerdeployments) by updating corresponding [`HorizontalRunnerAutoscalers`](#autoscaling).
Today, the Webhook server can be configured to respond GitHub `check_run`, `workflow_job`, `pull_request` and `push` events
Today, the Webhook server can be configured to respond to GitHub's `check_run`, `workflow_job`, `pull_request`, and `push` events
by scaling up the matching `HorizontalRunnerAutoscaler` by N replica(s), where `N` is configurable within `HorizontalRunnerAutoscaler`'s `spec:`.
More concretely, you can configure the targeted GitHub event types and the `N` in `scaleUpTriggers`:
@@ -566,27 +690,27 @@ spec:
With the above example, the webhook server scales `example-runners` by `1` replica for 5 minutes on each `check_run` event with the type of `created` and the status of `queued` received.
Of note is the `HRA.spec.scaleUpTriggers[].duration` attribute. This attribute is used to calculate if the replica number added via the trigger is expired or not. On each reconcilation loop, the controller sums up all the non-expiring replica numbers from previous scale up triggers. It then compares the summed desired replica number against the current replica number. If the summed desired replica number > the current number then it means the replica count needs to scale up.
Of note is the `HRA.spec.scaleUpTriggers[].duration` attribute. This attribute is used to calculate if the replica number added via the trigger is expired or not. On each reconciliation loop, the controller sums up all the non-expiring replica numbers from previous scale-up triggers. It then compares the summed desired replica number against the current replica number. If the summed desired replica number > the current number then it means the replica count needs to scale up.
As mentioned previously, the `scaleDownDelaySecondsAfterScaleOut` property has the final say still. If the latest scale-up time + the anti-flapping duration is later than the current time, it doesnt immediately scale up and instead retries the calculation again later to see if it needs to scale yet.
---
The primary benefit of autoscaling on Webhook compared to the pull driven scaling is that it is far quicker as it allows you to immediately add runners resource rather than waiting for the next sync period.
The primary benefit of autoscaling on Webhooks compared to the pull driven scaling is that it is far quicker as it allows you to immediately add runner resources rather than waiting for the next sync period.
> You can learn the implementation details in [#282](https://github.com/actions-runner-controller/actions-runner-controller/pull/282)
To enable this feature, you firstly need to install the webhook server, currently, only our Helm chart has the ability install it:
To enable this feature, you first need to install the GitHub webhook server. To install via our Helm chart,
_[see the values documentation for all configuration options](https://github.com/actions-runner-controller/actions-runner-controller/blob/master/charts/actions-runner-controller/README.md)_
```console
$ helm --upgrade install actions-runner-controller/actions-runner-controller \
githubWebhookServer.enabled=true \
githubWebhookServer.ports[0].nodePort=33080
$ helm upgrade --install --namespace actions-runner-system --create-namespace \
--wait actions-runner-controller actions-runner-controller/actions-runner-controller \
--set "githubWebhookServer.enabled=true,githubWebhookServer.ports[0].nodePort=33080"
```
The above command will result in exposing the node port 33080 for Webhook events. Usually, you need to create an
external loadbalancer targeted to the node port, and register the hostname or the IP address of the external loadbalancer
external load balancer targeted to the node port, and register the hostname or the IP address of the external load balancer
to the GitHub Webhook.
Once you were able to confirm that the Webhook server is ready and running from GitHub - this is usually verified by the
@@ -598,13 +722,13 @@ by learning the following configuration examples.
- [Example 3: Scale on each `pull_request` event against a given set of branches](#example-3-scale-on-each-pull_request-event-against-a-given-set-of-branches)
- [Example 4: Scale on each `push` event](#example-4-scale-on-each-push-event)
**Note:** All these examples should have **minReplicas** & **maxReplicas** as mandatory parameter even for webhook driven scaling.
**Note:** All these examples should have **minReplicas** & **maxReplicas** as mandatory parameters even for webhook driven scaling.
##### Example 1: Scale on each `workflow_job` event
> This feature requires controller version => [v0.20.0](https://github.com/actions-runner-controller/actions-runner-controller/releases/tag/v0.20.0)
_Note: GitHub does not include the runner group information of a repository in the payload of `workflow_job` event in the initial `queued` event. The runner group information is only include for `workflow_job` events when the job has already been allocated to a runner (events with a status of `in_progress` or `completed`). Please do raise feature requests against [GitHub](https://support.github.com/tickets/personal/0) for this information to be included in the initial `queued` event if this would improve autoscaling runners for you._
_Note: GitHub does not include the runner group information of a repository in the payload of `workflow_job` event in the initial `queued` event. The runner group information is only included for `workflow_job` events when the job has already been allocated to a runner (events with a status of `in_progress` or `completed`). Please do raise feature requests against [GitHub](https://support.github.com/tickets/personal/0) for this information to be included in the initial `queued` event if this would improve autoscaling runners for you._
The most flexible webhook GitHub offers is the `workflow_job` webhook, it includes the `runs-on` information in the payload allowing scaling based on runner labels.
@@ -626,7 +750,8 @@ spec:
# Uncomment the below in case the target is not RunnerDeployment but RunnerSet
#kind: RunnerSet
scaleUpTriggers:
- githubEvent: {}
- githubEvent:
workflowJob: {}
duration: "30m"
```
@@ -634,7 +759,7 @@ This webhook requires you to explicitly set the labels in the RunnerDeployment /
You can configure your GitHub webhook settings to only include `Workflows Job` events, so that it sends us three kinds of `workflow_job` events per a job run.
Each kind has a `status` of `queued`, `in_progress` and `completed`. With the above configuration, `actions-runner-controller` adds one runner for a `workflow_job` event whose `status` is `queued`. Similarly, it removes one runner for a `workflow_job` event whose `status` is `completed`. The cavaet to this to remember is that this the scale down is within the bounds of your `scaleDownDelaySecondsAfterScaleOut` configuration, if this time hasn't past the scale down will be defered.
Each kind has a `status` of `queued`, `in_progress` and `completed`. With the above configuration, `actions-runner-controller` adds one runner for a `workflow_job` event whose `status` is `queued`. Similarly, it removes one runner for a `workflow_job` event whose `status` is `completed`. The caveat to this to remember is that this scale-down is within the bounds of your `scaleDownDelaySecondsAfterScaleOut` configuration, if this time hasn't passed the scale down will be deferred.
##### Example 2: Scale up on each `check_run` event
@@ -752,7 +877,7 @@ spec:
> This feature requires controller version => [v0.19.0](https://github.com/actions-runner-controller/actions-runner-controller/releases/tag/v0.19.0)
The regular `RunnerDeployment` `replicas:` attribute as well as the `HorizontalRunnerAutoscaler` `minReplicas:` attribute supports being set to 0.
The regular `RunnerDeployment` / `RunnerSet` `replicas:` attribute as well as the `HorizontalRunnerAutoscaler` `minReplicas:` attribute supports being set to 0.
The main use case for scaling from 0 is with the `HorizontalRunnerAutoscaler` kind. To scale from 0 whilst still being able to provision runners as jobs are queued we must use the `HorizontalRunnerAutoscaler` with only certain scaling configurations, only the below configurations support scaling from 0 whilst also being able to provision runners as jobs are queued:
@@ -761,17 +886,17 @@ The main use case for scaling from 0 is with the `HorizontalRunnerAutoscaler` ki
- `PercentageRunnersBusy` + Webhook-based autoscaling
- Webhook-based autoscaling only
`PercentageRunnersBusy` can't be used alone as, by its definition, it needs one or more GitHub runners to become `busy` to be able to scale. If there isn't a runner to pick up a job and enter a `busy` state then the controller will never know to provision a runner to begin with as this metric has no knowledge of the job queue and is relying using the number of busy runners as a means for calculating the desired replica count.
`PercentageRunnersBusy` can't be used alone as, by its definition, it needs one or more GitHub runners to become `busy` to be able to scale. If there isn't a runner to pick up a job and enter a `busy` state then the controller will never know to provision a runner to begin with as this metric has no knowledge of the job queue and is relying on using the number of busy runners as a means for calculating the desired replica count.
If a HorizontalRunnerAutoscaler is configured with a secondary metric of `TotalNumberOfQueuedAndInProgressWorkflowRuns` then be aware that the controller will check the primary metric of `PercentageRunnersBusy` first and will only use the secondary metric to calculate the desired replica count if the primary metric returns 0 desired replicas.
Webhook-based autoscaling is the best option as it is relatively easy to configure and also it can scale scale quickly.
Webhook-based autoscaling is the best option as it is relatively easy to configure and also it can scale quickly.
#### Scheduled Overrides
> This feature requires controller version => [v0.19.0](https://github.com/actions-runner-controller/actions-runner-controller/releases/tag/v0.19.0)
`Scheduled Overrides` allows you to configure `HorizontalRunnerAutoscaler` so that its `spec:` gets updated only during a certain period of time. This feature is usually used for following scenarios:
`Scheduled Overrides` allows you to configure `HorizontalRunnerAutoscaler` so that its `spec:` gets updated only during a certain period of time. This feature is usually used for the following scenarios:
- You want to reduce your infrastructure costs by scaling your Kubernetes nodes down outside a given period
- You want to scale for scheduled spikes in workloads
@@ -822,7 +947,7 @@ spec:
minReplicas: 1
```
A recurring override is initially active between `startTime` and `endTime`, and then it repeatedly get activated after a certain period of time denoted by `frequency`.
A recurring override is initially active between `startTime` and `endTime`, and then it repeatedly gets activated after a certain period of time denoted by `frequency`.
`frequecy` can take one of the following values:
@@ -831,21 +956,21 @@ spec:
- `Monthly`
- `Yearly`
By default, a scheduled override repeats forever. If you want it to repeat until a specific point in time, define `untilTime`. The controller create the last recurrence of the override until the recurrence's `startTime` is equal or earlier than `untilTime`.
By default, a scheduled override repeats forever. If you want it to repeat until a specific point in time, define `untilTime`. The controller creates the last recurrence of the override until the recurrence's `startTime` is equal or earlier than `untilTime`.
Do ensure that you have enough slack for `untilTime` so that a delayed or offline `actions-runner-controller` is much less likely to miss the last recurrence. For example, you might want to set `untilTime` to `M` minutes after the last recurrence's `startTime`, so that `actions-runner-controller` being offline up to `M` minutes doesn't miss the last recurrence.
**Combining Multiple Scheduled Overrides**:
In case you have a more complex scenarios, try writing two or more entries under `scheduledOverrides`.
In case you have a more complex scenario, try writing two or more entries under `scheduledOverrides`.
The earlier entry is prioritized higher than later entries. So you usually define one-time overrides in the top of your list, then yearly, monthly, weekly, and lastly daily overrides.
The earlier entry is prioritized higher than later entries. So you usually define one-time overrides at the top of your list, then yearly, monthly, weekly, and lastly daily overrides.
A common use case for this may be to have 1 override to scale to 0 during the week outside of core business hours and another override to scale to 0 during all hours of the weekend.
### Runner with DinD
When using default runner, runner pod starts up 2 containers: runner and DinD (Docker-in-Docker). This might create issues if there's `LimitRange` set to namespace.
When using the default runner, the runner pod starts up 2 containers: runner and DinD (Docker-in-Docker). This might create issues if there's `LimitRange` set to namespace.
```yaml
# dindrunnerdeployment.yaml
@@ -950,7 +1075,7 @@ spec:
# false (default) = Docker support is provided by a sidecar container deployed in the runner pod.
# true = No docker sidecar container is deployed in the runner pod but docker can be used within the runner container instead. The image summerwind/actions-runner-dind is used by default.
dockerdWithinRunnerContainer: true
#Optional environement variables for docker container
#Optional environment variables for docker container
# Valid only when dockerdWithinRunnerContainer=false
dockerEnv:
- name: HTTP_PROXY
@@ -992,7 +1117,7 @@ spec:
- mountPath: /var/lib/docker
name: docker-extra
# You can mount some of the shared volumes to the runner container using volumeMounts.
# NOTE: Do not try to mount the volume onto the runner workdir itself as it will not work. You could mount it however on a sub directory in the runner workdir
# NOTE: Do not try to mount the volume onto the runner workdir itself as it will not work. You could mount it however on a subdirectory in the runner workdir
# Please see https://github.com/actions-runner-controller/actions-runner-controller/issues/630#issuecomment-862087323 for more information.
volumeMounts:
- mountPath: /home/runner/work/repo
@@ -1015,6 +1140,84 @@ spec:
runtimeClassName: "runc"
```
### Custom Volume mounts
You can configure your own custom volume mounts. For example to have the work/docker data in memory or on NVME SSD, for
i/o intensive builds. Other custom volume mounts should be possible as well, see [kubernetes documentation](https://kubernetes.io/docs/concepts/storage/volumes/)
**RAM Disk Runner**<br />
Example how to place the runner work dir, docker sidecar and /tmp within the runner onto a ramdisk.
```yaml
kind: RunnerDeployment
spec:
template:
spec:
dockerVolumeMounts:
- mountPath: /var/lib/docker
name: docker
volumeMounts:
- mountPath: /tmp
name: tmp
volumes:
- name: docker
emptyDir:
medium: Memory
- name: work # this volume gets automatically used up for the workdir
emptyDir:
medium: Memory
- name: tmp
emptyDir:
medium: Memory
emphemeral: true # recommended to not leak data between builds.
```
**NVME SSD Runner**<br />
In this example we provide NVME backed storage for the workdir, docker sidecar and /tmp within the runner.
Here we use a working example on GKE, which will provide the NVME disk at /mnt/disks/ssd0. We will be placing the respective volumes in subdirs here and in order to be able to run multiple runners we will use the pod name as a prefix for subdirectories. Also the disk will fill up over time and disk space will not be freed until the node is removed.
**Beware** that running these persistent backend volumes **leave data behind** between 2 different jobs on the workdir and `/tmp` with `emphemeral: false`.
```yaml
kind: RunnerDeployment
spec:
template:
spec:
env:
- name: POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
dockerVolumeMounts:
- mountPath: /var/lib/docker
name: docker
subPathExpr: $(POD_NAME)-docker
- mountPath: /runner/_work
name: work
subPathExpr: $(POD_NAME)-work
volumeMounts:
- mountPath: /runner/_work
name: work
subPathExpr: $(POD_NAME)-work
- mountPath: /tmp
name: tmp
subPathExpr: $(POD_NAME)-tmp
dockerEnv:
- name: POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
volumes:
- hostPath:
path: /mnt/disks/ssd0
name: docker
- hostPath:
path: /mnt/disks/ssd0
name: work
- hostPath:
path: /mnt/disks/ssd0
name: tmp
emphemeral: true # VERY important. otherwise data inside the workdir and /tmp is not cleared between builds
```
### Runner Labels
To run a workflow job on a self-hosted runner, you can use the following syntax in your workflow:
@@ -1071,12 +1274,11 @@ spec:
group: NewGroup
```
GitHub supports custom visilibity in a Runner Group to make it available to a specific set of repositories only. By default if no GitHub
authentication is included in the GitHub webhook server it will be assumed that all runner groups to be usable in all repositories.
Supporting custom visibility requires to do a few GitHub API calls to find out what are the potential runner groups that are visible to
the webhook's repository, this may incur in increased API rate limiting when using github.com
GitHub supports custom visibility in a Runner Group to make it available to a specific set of repositories only. By default if no GitHub
authentication is included in the webhook server ARC will be assumed that all runner groups to be usable in all repositories.
Currently, GitHub does not include the repository runner group membership information in the workflow_job event (or any webhook). To make the ARC "runner group aware" additional GitHub API calls are needed to find out what runner groups are visible to the webhook's repository. This behaviour will impact your rate-limit budget and so the option needs to be explicitly configured by the end user.
This option will be enabled when proper GitHub authentication options (token, app or basic auth is provided) in the GitHub webhook server and `useRunnerGroupsVisibility` is set to true, e.g.
This option will be enabled when proper GitHub authentication options (token, app or basic auth) are provided in the webhook server and `useRunnerGroupsVisibility` is set to true, e.g.
```yaml
githubWebhookServer:
@@ -1109,16 +1311,23 @@ spec:
# Disables automatic runner updates
- name: DISABLE_RUNNER_UPDATE
value: "true"
# Configure runner with --ephemeral instead of --once flag
# WARNING | THIS ENV VAR IS DEPRECATED AND WILL BE REMOVED
# IN A FUTURE VERSION OF ARC. IN 0.22.0 ARC SETS --ephemeral VIA
# THE CONTROLLER SETTING THIS ENV VAR ON POD CREATION.
# THIS ENV VAR WILL BE REMOVED, SEE ISSUE #1196 FOR DETAILS
- name: RUNNER_FEATURE_FLAG_EPHEMERAL
value: "true"
```
### Using IRSA (IAM Roles for Service Accounts) in EKS
> This feature requires controller version => [v0.15.0](https://github.com/actions-runner-controller/actions-runner-controller/releases/tag/v0.15.0)
As similar as for regular pods and deployments, you firstly need an existing service account with the IAM role associated.
Similar to regular pods and deployments, you firstly need an existing service account with the IAM role associated.
Create one using e.g. `eksctl`. You can refer to [the EKS documentation](https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts.html) for more details.
Once you set up the service account, all you need is to add `serviceAccountName` and `fsGroup` to any pods that uses the IAM-role enabled service account.
Once you set up the service account, all you need is to add `serviceAccountName` and `fsGroup` to any pods that use the IAM-role enabled service account.
For `RunnerDeployment`, you can set those two fields under the runner spec at `RunnerDeployment.Spec.Template`:
@@ -1135,141 +1344,13 @@ spec:
securityContext:
fsGroup: 1000
```
### Stateful Runners
> This feature requires controller version => [v0.20.0](https://github.com/actions-runner-controller/actions-runner-controller/releases/tag/v0.20.0)
`actions-runner-controller` supports `RunnerSet` API that let you deploy stateful runners. A stateful runner is designed to be able to store some data persists across GitHub Actions workflow and job runs. You might find it useful, for example, to speed up your docker builds by persisting the docker layer cache.
A basic `RunnerSet` would look like this:
```yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerSet
metadata:
name: example
spec:
ephemeral: false
replicas: 2
repository: mumoshu/actions-runner-controller-ci
# Other mandatory fields from StatefulSet
selector:
matchLabels:
app: example
serviceName: example
template:
metadata:
labels:
app: example
```
As it is based on `StatefulSet`, `selector` and `template.medatada.labels` needs to be defined and have the exact same set of labels. `serviceName` must be set to some non-empty string as it is also required by `StatefulSet`.
Runner-related fields like `ephemeral`, `repository`, `organization`, `enterprise`, and so on should be written directly under `spec`.
Fields like `volumeClaimTemplates` that originates from `StatefulSet` should also be written directly under `spec`.
Pod-related fields like security contexts and volumes are written under `spec.template.spec` like `StatefulSet`.
Similarly, container-related fields like resource requests and limits, container image names and tags, security context, and so on are written under `spec.template.spec.containers`. There are two reserved container `name`, `runner` and `docker`. The former is for the container that runs [actions runner](https://github.com/actions/runner) and the latter is for the container that runs a dockerd.
For a more complex example, see the below:
```yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerSet
metadata:
name: example
spec:
ephemeral: false
replicas: 2
repository: mumoshu/actions-runner-controller-ci
dockerdWithinRunnerContainer: true
template:
spec:
securityContext:
#All level/role/type/user values will vary based on your SELinux policies.
#See https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux_atomic_host/7/html/container_security_guide/docker_selinux_security_policy for information about SELinux with containers
seLinuxOptions:
level: "s0"
role: "system_r"
type: "super_t"
user: "system_u"
containers:
- name: runner
env: []
resources:
limits:
cpu: "4.0"
memory: "8Gi"
requests:
cpu: "2.0"
memory: "4Gi"
- name: docker
resources:
limits:
cpu: "4.0"
memory: "8Gi"
requests:
cpu: "2.0"
memory: "4Gi"
```
You can also read the design and usage documentation written in the original pull request that introduced `RunnerSet` for more information.
https://github.com/actions-runner-controller/actions-runner-controller/pull/629
Under the hood, `RunnerSet` relies on Kubernetes's `StatefulSet` and Mutating Webhook. A statefulset is used to create a number of pods that has stable names and dynamically provisioned persistent volumes, so that each statefulset-managed pod gets the same persistent volume even after restarting. A mutating webhook is used to dynamically inject a runner's "registration token" which is used to call GitHub's "Create Runner" API.
We envision that `RunnerSet` will eventually replace `RunnerDeployment`, as `RunnerSet` provides a more standard API that is easy to learn and use because it is based on `StatefulSet`, and it has a support for `volumeClaimTemplates` which is crucial to manage dynamically provisioned persistent volumes.
**Limitations**
* For autoscaling the `RunnerSet` kind only supports pull driven scaling or the `workflow_job` event for webhook driven scaling.
* For autoscaling the `RunnerSet` kind doesn't support the [registration-only runner](#autoscaling-tofrom-0), these are deprecated however and to be [removed](https://github.com/actions-runner-controller/actions-runner-controller/issues/859)
* A known down-side of relying on `StatefulSet` is that it misses a support for `maxUnavailable`. A `StatefulSet` basically works like `maxUnavailable: 1` in `Deployment`, which means that it can take down only one pod concurrently while doing a rolling-update of pods. Kubernetes 1.22 doesn't support customizing it yet so probably it takes more releases to arrive. See https://github.com/kubernetes/kubernetes/issues/68397 for more information.
### Ephemeral Runners
Both `RunnerDeployment` and `RunnerSet` has ability to configure `ephemeral: true` in the spec.
When it is configured, it passes a `--once` flag to every runner.
`--once` is an experimental `actions/runner` feature that instructs the runner to stop after the first job run. It has a known race condition issue that means the runner may fetch a job even when it's being terminated. If a runner fetched a job while terminating, the job is very likely to fail because the terminating runner doesn't wait for the job to complete. This is tracked in issue [#466](https://github.com/actions-runner-controller/actions-runner-controller/issues/466).
Since the implementation of the `--once` flag GitHub have implemented the `--ephemeral` flag which has no known race conditions and is much more supported by GitHub, this is the prefered flag for ephemeral runners. To have your `RunnerDeployment` and `RunnerSet` kinds use this new flag instead of the `--once` flag set `RUNNER_FEATURE_FLAG_EPHEMERAL` to `"true"`. For example, a `RunnerSet` configured to use the new flag looks like:
```yaml
kind: RunnerSet
metadata:
name: example-runnerset
spec:
# ...
template:
metadata:
labels:
app: example-runnerset
spec:
containers:
- name: runner
imagePullPolicy: IfNotPresent
env:
- name: RUNNER_FEATURE_FLAG_EPHEMERAL
value: "true"
```
You should configure all your ephemeral runners to use the new flag unless you have a reason for needing to use the old flag.
Once able, `actions-runner-controller` will make `--ephemeral` the default option for `ephemeral: true` runners and potentially remove `--once` entirely. It is likely that in the future the `--once` flag will be officially deprecated by GitHub and subsquently removed in `actions/runner`.
### Software Installed in the Runner Image
**Cloud Tooling**<br />
The project supports being deployed on the various cloud Kubernetes platforms (e.g. EKS), it does not however aim to go beyond that. No cloud specific tooling is bundled in the base runner, this is an active decision to keep the overhead of maintaining the solution manageable.
**Bundled Software**<br />
The GitHub hosted runners include a large amount of pre-installed software packages. GitHub maintain a list in README files at <https://github.com/actions/virtual-environments/tree/main/images/linux>
The GitHub hosted runners include a large amount of pre-installed software packages. GitHub maintains a list in README files at <https://github.com/actions/virtual-environments/tree/main/images/linux>
This solution maintains a few runner images with `latest` aligning with GitHub's Ubuntu version, these images do not contain all of the software installed on the GitHub runners. The images contain the following subset of packages from the GitHub runners:
@@ -1280,7 +1361,7 @@ This solution maintains a few runner images with `latest` aligning with GitHub's
The virtual environments from GitHub contain a lot more software packages (different versions of Java, Node.js, Golang, .NET, etc) which are not provided in the runner image. Most of these have dedicated setup actions which allow the tools to be installed on-demand in a workflow, for example: `actions/setup-java` or `actions/setup-node`
If there is a need to include packages in the runner image for which there is no setup action, then this can be achieved by building a custom container image for the runner. The easiest way is to start with the `summerwind/actions-runner` image and installing the extra dependencies directly in the docker image:
If there is a need to include packages in the runner image for which there is no setup action, then this can be achieved by building a custom container image for the runner. The easiest way is to start with the `summerwind/actions-runner` image and then install the extra dependencies directly in the docker image:
```shell
FROM summerwind/actions-runner:latest
@@ -1333,7 +1414,7 @@ $ helm --upgrade install actions-runner-controller/actions-runner-controller \
# Troubleshooting
See [troubleshooting guide](TROUBLESHOOTING.md) for solutions to various problems people have ran into consistently.
See [troubleshooting guide](TROUBLESHOOTING.md) for solutions to various problems people have run into consistently.
# Contributing

View File

@@ -6,7 +6,7 @@ tpe=${ACCEPTANCE_TEST_SECRET_TYPE}
VALUES_FILE=${VALUES_FILE:-$(dirname $0)/values.yaml}
kubectl delete secret controller-manager || :
kubectl delete secret -n actions-runner-system controller-manager || :
if [ "${tpe}" == "token" ]; then
if ! kubectl get secret controller-manager -n actions-runner-system >/dev/null; then
@@ -18,8 +18,8 @@ elif [ "${tpe}" == "app" ]; then
kubectl create secret generic controller-manager \
-n actions-runner-system \
--from-literal=github_app_id=${APP_ID:?must not be empty} \
--from-literal=github_app_installation_id=${INSTALLATION_ID:?must not be empty} \
--from-file=github_app_private_key=${PRIVATE_KEY_FILE_PATH:?must not be empty}
--from-literal=github_app_installation_id=${APP_INSTALLATION_ID:?must not be empty} \
--from-file=github_app_private_key=${APP_PRIVATE_KEY_FILE:?must not be empty}
else
echo "ACCEPTANCE_TEST_SECRET_TYPE must be set to either \"token\" or \"app\"" 1>&2
exit 1
@@ -37,7 +37,10 @@ fi
tool=${ACCEPTANCE_TEST_DEPLOYMENT_TOOL}
TEST_ID=${TEST_ID:-default}
if [ "${tool}" == "helm" ]; then
set -v
helm upgrade --install actions-runner-controller \
charts/actions-runner-controller \
-n actions-runner-system \
@@ -46,41 +49,60 @@ if [ "${tool}" == "helm" ]; then
--set authSecret.create=false \
--set image.repository=${NAME} \
--set image.tag=${VERSION} \
--set podAnnotations.test-id=${TEST_ID} \
--set githubWebhookServer.podAnnotations.test-id=${TEST_ID} \
-f ${VALUES_FILE}
set +v
# To prevent `CustomResourceDefinition.apiextensions.k8s.io "runners.actions.summerwind.dev" is invalid: metadata.annotations: Too long: must have at most 262144 bytes`
# errors
kubectl create -f charts/actions-runner-controller/crds || kubectl replace -f charts/actions-runner-controller/crds
kubectl -n actions-runner-system wait deploy/actions-runner-controller --for condition=available --timeout 60s
# This wait fails due to timeout when it's already in crashloopback and this update doesn't change the image tag.
# That's why we add `|| :`. With that we prevent stopping the script in case of timeout and
# proceed to delete (possibly in crashloopback and/or running with outdated image) pods so that they are recreated by K8s.
kubectl -n actions-runner-system wait deploy/actions-runner-controller --for condition=available --timeout 60s || :
else
kubectl apply \
-n actions-runner-system \
-f release/actions-runner-controller.yaml
kubectl -n actions-runner-system wait deploy/controller-manager --for condition=available --timeout 120s
kubectl -n actions-runner-system wait deploy/controller-manager --for condition=available --timeout 120s || :
fi
# Restart all ARC pods
kubectl -n actions-runner-system delete po -l app.kubernetes.io/name=actions-runner-controller
echo Waiting for all ARC pods to be up and running after restart
kubectl -n actions-runner-system wait deploy/actions-runner-controller --for condition=available --timeout 120s
# Adhocly wait for some time until actions-runner-controller's admission webhook gets ready
sleep 20
RUNNER_LABEL=${RUNNER_LABEL:-self-hosted}
if [ -n "${TEST_REPO}" ]; then
if [ "${USE_RUNNERSET}" -ne "false" ]; then
cat acceptance/testdata/repo.runnerset.yaml | envsubst | kubectl apply -f -
cat acceptance/testdata/repo.runnerset.hra.yaml | envsubst | kubectl apply -f -
if [ "${USE_RUNNERSET}" != "false" ]; then
cat acceptance/testdata/runnerset.envsubst.yaml | TEST_ENTERPRISE= TEST_ORG= RUNNER_MIN_REPLICAS=${REPO_RUNNER_MIN_REPLICAS} NAME=repo-runnerset envsubst | kubectl apply -f -
else
echo 'Deploying runnerdeployment and hra. Set USE_RUNNERSET if you want to deploy runnerset instead.'
cat acceptance/testdata/repo.runnerdeploy.yaml | envsubst | kubectl apply -f -
cat acceptance/testdata/repo.hra.yaml | envsubst | kubectl apply -f -
cat acceptance/testdata/runnerdeploy.envsubst.yaml | TEST_ENTERPRISE= TEST_ORG= RUNNER_MIN_REPLICAS=${REPO_RUNNER_MIN_REPLICAS} NAME=repo-runnerdeploy envsubst | kubectl apply -f -
fi
else
echo 'Skipped deploying runnerdeployment and hra. Set TEST_REPO to "yourorg/yourrepo" to deploy.'
fi
if [ -n "${TEST_ORG}" ]; then
cat acceptance/testdata/runnerdeploy.envsubst.yaml | TEST_ENTERPRISE= TEST_REPO= NAME=org-runnerdeploy envsubst | kubectl apply -f -
if [ "${USE_RUNNERSET}" != "false" ]; then
cat acceptance/testdata/runnerset.envsubst.yaml | TEST_ENTERPRISE= TEST_REPO= RUNNER_MIN_REPLICAS=${ORG_RUNNER_MIN_REPLICAS} NAME=org-runnerset envsubst | kubectl apply -f -
else
cat acceptance/testdata/runnerdeploy.envsubst.yaml | TEST_ENTERPRISE= TEST_REPO= RUNNER_MIN_REPLICAS=${ORG_RUNNER_MIN_REPLICAS} NAME=org-runnerdeploy envsubst | kubectl apply -f -
fi
if [ -n "${TEST_ORG_GROUP}" ]; then
cat acceptance/testdata/runnerdeploy.envsubst.yaml | TEST_ENTERPRISE= TEST_REPO= TEST_GROUP=${TEST_ORG_GROUP} NAME=orggroup-runnerdeploy envsubst | kubectl apply -f -
if [ "${USE_RUNNERSET}" != "false" ]; then
cat acceptance/testdata/runnerset.envsubst.yaml | TEST_ENTERPRISE= TEST_REPO= RUNNER_MIN_REPLICAS=${ORG_RUNNER_MIN_REPLICAS} TEST_GROUP=${TEST_ORG_GROUP} NAME=orgroupg-runnerset envsubst | kubectl apply -f -
else
cat acceptance/testdata/runnerdeploy.envsubst.yaml | TEST_ENTERPRISE= TEST_REPO= RUNNER_MIN_REPLICAS=${ORG_RUNNER_MIN_REPLICAS} TEST_GROUP=${TEST_ORG_GROUP} NAME=orggroup-runnerdeploy envsubst | kubectl apply -f -
fi
else
echo 'Skipped deploying enterprise runnerdeployment. Set TEST_ORG_GROUP to deploy.'
fi
@@ -89,10 +111,18 @@ else
fi
if [ -n "${TEST_ENTERPRISE}" ]; then
cat acceptance/testdata/runnerdeploy.envsubst.yaml | TEST_ORG= TEST_REPO= NAME=enterprise-runnerdeploy envsubst | kubectl apply -f -
if [ "${USE_RUNNERSET}" != "false" ]; then
cat acceptance/testdata/runnerset.envsubst.yaml | TEST_ORG= TEST_REPO= RUNNER_MIN_REPLICAS=${ENTERPRISE_RUNNER_MIN_REPLICAS} NAME=enterprise-runnerset envsubst | kubectl apply -f -
else
cat acceptance/testdata/runnerdeploy.envsubst.yaml | TEST_ORG= TEST_REPO= RUNNER_MIN_REPLICAS=${ENTERPRISE_RUNNER_MIN_REPLICAS} NAME=enterprise-runnerdeploy envsubst | kubectl apply -f -
fi
if [ -n "${TEST_ENTERPRISE_GROUP}" ]; then
cat acceptance/testdata/runnerdeploy.envsubst.yaml | TEST_ORG= TEST_REPO= TEST_GROUP=${TEST_ENTERPRISE_GROUP} NAME=enterprisegroup-runnerdeploy envsubst | kubectl apply -f -
if [ "${USE_RUNNERSET}" != "false" ]; then
cat acceptance/testdata/runnerset.envsubst.yaml | TEST_ORG= TEST_REPO= RUNNER_MIN_REPLICAS=${ENTERPRISE_RUNNER_MIN_REPLICAS} TEST_GROUP=${TEST_ENTERPRISE_GROUP} NAME=enterprisegroup-runnerset envsubst | kubectl apply -f -
else
cat acceptance/testdata/runnerdeploy.envsubst.yaml | TEST_ORG= TEST_REPO= RUNNER_MIN_REPLICAS=${ENTERPRISE_RUNNER_MIN_REPLICAS} TEST_GROUP=${TEST_ENTERPRISE_GROUP} NAME=enterprisegroup-runnerdeploy envsubst | kubectl apply -f -
fi
else
echo 'Skipped deploying enterprise runnerdeployment. Set TEST_ENTERPRISE_GROUP to deploy.'
fi

View File

@@ -1,36 +0,0 @@
apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
name: org
spec:
scaleTargetRef:
name: org-runnerdeploy
scaleUpTriggers:
- githubEvent:
checkRun:
types: ["created"]
status: "queued"
amount: 1
duration: "1m"
scheduledOverrides:
- startTime: "2021-05-11T16:05:00+09:00"
endTime: "2021-05-11T16:40:00+09:00"
minReplicas: 2
- startTime: "2021-05-01T00:00:00+09:00"
endTime: "2021-05-03T00:00:00+09:00"
recurrenceRule:
frequency: Weekly
untilTime: "2022-05-01T00:00:00+09:00"
minReplicas: 0
minReplicas: 0
maxReplicas: 5
# Used to test that HRA is working for org runners
metrics:
- type: PercentageRunnersBusy
scaleUpThreshold: '0.75'
scaleDownThreshold: '0.3'
scaleUpFactor: '2'
scaleDownFactor: '0.5'
- type: TotalNumberOfQueuedAndInProgressWorkflowRuns
repositoryNames:
- ${TEST_ORG_REPO}

View File

@@ -1,44 +0,0 @@
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
name: org-runnerdeploy
spec:
# replicas: 1
template:
spec:
organization: ${TEST_ORG}
#
# Custom runner image
#
image: ${RUNNER_NAME}:${RUNNER_TAG}
imagePullPolicy: IfNotPresent
# Whether to pass --ephemeral (true) or --once (false, deprecated)
env:
- name: RUNNER_FEATURE_FLAG_EPHEMERAL
value: "${RUNNER_FEATURE_FLAG_EPHEMERAL}"
#
# dockerd within runner container
#
## Replace `mumoshu/actions-runner-dind:dev` with your dind image
#dockerdWithinRunnerContainer: true
#image: mumoshu/actions-runner-dind:dev
#
# Set the MTU used by dockerd-managed network interfaces (including docker-build-ubuntu)
#
#dockerMTU: 1450
#Runner group
# labels:
# - "mylabel 1"
# - "mylabel 2"
labels:
- "${RUNNER_LABEL}"
#
# Non-standard working directory
#
# workDir: "/"

View File

@@ -1,25 +0,0 @@
apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
name: actions-runner-aos-autoscaler
spec:
scaleTargetRef:
name: example-runnerdeploy
scaleUpTriggers:
- githubEvent:
checkRun:
types: ["created"]
status: "queued"
amount: 1
duration: "1m"
minReplicas: 0
maxReplicas: 5
metrics:
- type: PercentageRunnersBusy
scaleUpThreshold: '0.75'
scaleDownThreshold: '0.3'
scaleUpFactor: '2'
scaleDownFactor: '0.5'
- type: TotalNumberOfQueuedAndInProgressWorkflowRuns
repositoryNames:
- ${TEST_REPO}

View File

@@ -1,44 +0,0 @@
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
name: example-runnerdeploy
spec:
# replicas: 1
template:
spec:
repository: ${TEST_REPO}
#
# Custom runner image
#
image: ${RUNNER_NAME}:${RUNNER_TAG}
imagePullPolicy: IfNotPresent
# Whether to pass --ephemeral (true) or --once (false, deprecated)
env:
- name: RUNNER_FEATURE_FLAG_EPHEMERAL
value: "${RUNNER_FEATURE_FLAG_EPHEMERAL}"
#
# dockerd within runner container
#
## Replace `mumoshu/actions-runner-dind:dev` with your dind image
#dockerdWithinRunnerContainer: true
#image: mumoshu/actions-runner-dind:dev
#
# Set the MTU used by dockerd-managed network interfaces (including docker-build-ubuntu)
#
#dockerMTU: 1450
#Runner group
# labels:
# - "mylabel 1"
# - "mylabel 2"
labels:
- "${RUNNER_LABEL}"
#
# Non-standard working directory
#
# workDir: "/"

View File

@@ -1,29 +0,0 @@
apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
name: example-runnerset
spec:
scaleTargetRef:
kind: RunnerSet
name: example-runnerset
scaleUpTriggers:
- githubEvent:
checkRun:
types: ["created"]
status: "queued"
amount: 1
duration: "1m"
# RunnerSet doesn't support scale from/to zero yet
minReplicas: 1
maxReplicas: 5
# This should be less than 600(seconds, the default) for faster testing
scaleDownDelaySecondsAfterScaleOut: 60
metrics:
- type: PercentageRunnersBusy
scaleUpThreshold: '0.75'
scaleDownThreshold: '0.3'
scaleUpFactor: '2'
scaleDownFactor: '0.5'
- type: TotalNumberOfQueuedAndInProgressWorkflowRuns
repositoryNames:
- ${TEST_REPO}

View File

@@ -17,6 +17,8 @@ spec:
image: ${RUNNER_NAME}:${RUNNER_TAG}
imagePullPolicy: IfNotPresent
ephemeral: ${TEST_EPHEMERAL}
# Whether to pass --ephemeral (true) or --once (false, deprecated)
env:
- name: RUNNER_FEATURE_FLAG_EPHEMERAL
@@ -28,6 +30,7 @@ spec:
## Replace `mumoshu/actions-runner-dind:dev` with your dind image
#dockerdWithinRunnerContainer: true
#image: mumoshu/actions-runner-dind:dev
dockerdWithinRunnerContainer: ${RUNNER_DOCKERD_WITHIN_RUNNER_CONTAINER}
#
# Set the MTU used by dockerd-managed network interfaces (including docker-build-ubuntu)
@@ -56,6 +59,7 @@ spec:
scaleUpTriggers:
- githubEvent: {}
amount: 1
duration: "1m"
minReplicas: 0
duration: "10m"
minReplicas: ${RUNNER_MIN_REPLICAS}
maxReplicas: 10
scaleDownDelaySecondsAfterScaleOut: ${RUNNER_SCALE_DOWN_DELAY_SECONDS_AFTER_SCALE_OUT}

View File

@@ -1,17 +1,17 @@
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerSet
metadata:
name: example-runnerset
name: ${NAME}
spec:
# MANDATORY because it is based on StatefulSet: Results in a below error when omitted:
# missing required field "selector" in dev.summerwind.actions.v1alpha1.RunnerSet.spec
selector:
matchLabels:
app: example-runnerset
app: ${NAME}
# MANDATORY because it is based on StatefulSet: Results in a below error when omitted:
# missing required field "serviceName" in dev.summerwind.actions.v1alpha1.RunnerSet.spec]
serviceName: example-runnerset
serviceName: ${NAME}
#replicas: 1
@@ -20,16 +20,23 @@ spec:
# result in queued jobs hanging forever.
ephemeral: ${TEST_EPHEMERAL}
enterprise: ${TEST_ENTERPRISE}
group: ${TEST_GROUP}
organization: ${TEST_ORG}
repository: ${TEST_REPO}
#
# Custom runner image
#
image: ${RUNNER_NAME}:${RUNNER_TAG}
#
# dockerd within runner container
#
## Replace `mumoshu/actions-runner-dind:dev` with your dind image
#dockerdWithinRunnerContainer: true
dockerdWithinRunnerContainer: ${RUNNER_DOCKERD_WITHIN_RUNNER_CONTAINER}
#
# Set the MTU used by dockerd-managed network interfaces (including docker-build-ubuntu)
#
@@ -47,7 +54,7 @@ spec:
template:
metadata:
labels:
app: example-runnerset
app: ${NAME}
spec:
containers:
- name: runner
@@ -57,3 +64,19 @@ spec:
value: "${RUNNER_FEATURE_FLAG_EPHEMERAL}"
#- name: docker
# #image: mumoshu/actions-runner-dind:dev
---
apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
name: ${NAME}
spec:
scaleTargetRef:
kind: RunnerSet
name: ${NAME}
scaleUpTriggers:
- githubEvent: {}
amount: 1
duration: "10m"
minReplicas: ${RUNNER_MIN_REPLICAS}
maxReplicas: 10
scaleDownDelaySecondsAfterScaleOut: ${RUNNER_SCALE_DOWN_DELAY_SECONDS_AFTER_SCALE_OUT}

View File

@@ -1,7 +1,7 @@
# Set actions-runner-controller settings for testing
githubAPICacheDuration: 10s
logLevel: "-4"
githubWebhookServer:
logLevel: debug
logLevel: "-4"
enabled: true
labels: {}
replicaCount: 1

View File

@@ -72,10 +72,12 @@ type GitHubEventScaleUpTriggerSpec struct {
CheckRun *CheckRunSpec `json:"checkRun,omitempty"`
PullRequest *PullRequestSpec `json:"pullRequest,omitempty"`
Push *PushSpec `json:"push,omitempty"`
WorkflowJob *WorkflowJobSpec `json:"workflowJob,omitempty"`
}
// https://docs.github.com/en/actions/reference/events-that-trigger-workflows#check_run
type CheckRunSpec struct {
// One of: created, rerequested, or completed
Types []string `json:"types,omitempty"`
Status string `json:"status,omitempty"`
@@ -90,6 +92,10 @@ type CheckRunSpec struct {
Repositories []string `json:"repositories,omitempty"`
}
// https://docs.github.com/en/developers/webhooks-and-events/webhooks/webhook-events-and-payloads#workflow_job
type WorkflowJobSpec struct {
}
// https://docs.github.com/en/actions/reference/events-that-trigger-workflows#pull_request
type PullRequestSpec struct {
Types []string `json:"types,omitempty"`
@@ -107,6 +113,9 @@ type CapacityReservation struct {
Name string `json:"name,omitempty"`
ExpirationTime metav1.Time `json:"expirationTime,omitempty"`
Replicas int `json:"replicas,omitempty"`
// +optional
EffectiveTime metav1.Time `json:"effectiveTime,omitempty"`
}
type ScaleTargetRef struct {

View File

@@ -145,7 +145,7 @@ type RunnerPodSpec struct {
HostAliases []corev1.HostAlias `json:"hostAliases,omitempty"`
// +optional
TopologySpreadConstraints []corev1.TopologySpreadConstraint `json:"topologySpreadConstraint,omitempty"`
TopologySpreadConstraints []corev1.TopologySpreadConstraint `json:"topologySpreadConstraints,omitempty"`
// RuntimeClassName is the container runtime configuration that containers should run under.
// More info: https://kubernetes.io/docs/concepts/containers/runtime-class
@@ -153,7 +153,7 @@ type RunnerPodSpec struct {
RuntimeClassName *string `json:"runtimeClassName,omitempty"`
// +optional
DnsConfig []corev1.PodDNSConfig `json:"dnsConfig,omitempty"`
DnsConfig *corev1.PodDNSConfig `json:"dnsConfig,omitempty"`
}
// ValidateRepository validates repository field.
@@ -181,6 +181,9 @@ func (rs *RunnerSpec) ValidateRepository() error {
// RunnerStatus defines the observed state of Runner
type RunnerStatus struct {
// Turns true only if the runner pod is ready.
// +optional
Ready bool `json:"ready"`
// +optional
Registration RunnerStatusRegistration `json:"registration"`
// +optional

View File

@@ -31,6 +31,14 @@ type RunnerDeploymentSpec struct {
// +nullable
Replicas *int `json:"replicas,omitempty"`
// EffectiveTime is the time the upstream controller requested to sync Replicas.
// It is usually populated by the webhook-based autoscaler via HRA.
// The value is inherited to RunnerRepicaSet(s) and used to prevent ephemeral runners from unnecessarily recreated.
//
// +optional
// +nullable
EffectiveTime *metav1.Time `json:"effectiveTime"`
// +optional
// +nullable
Selector *metav1.LabelSelector `json:"selector"`

View File

@@ -26,7 +26,7 @@ import (
)
// log is for logging in this package.
var runenrDeploymentLog = logf.Log.WithName("runnerdeployment-resource")
var runnerDeploymentLog = logf.Log.WithName("runnerdeployment-resource")
func (r *RunnerDeployment) SetupWebhookWithManager(mgr ctrl.Manager) error {
return ctrl.NewWebhookManagedBy(mgr).
@@ -49,13 +49,13 @@ var _ webhook.Validator = &RunnerDeployment{}
// ValidateCreate implements webhook.Validator so a webhook will be registered for the type
func (r *RunnerDeployment) ValidateCreate() error {
runenrDeploymentLog.Info("validate resource to be created", "name", r.Name)
runnerDeploymentLog.Info("validate resource to be created", "name", r.Name)
return r.Validate()
}
// ValidateUpdate implements webhook.Validator so a webhook will be registered for the type
func (r *RunnerDeployment) ValidateUpdate(old runtime.Object) error {
runenrDeploymentLog.Info("validate resource to be updated", "name", r.Name)
runnerDeploymentLog.Info("validate resource to be updated", "name", r.Name)
return r.Validate()
}

View File

@@ -26,6 +26,15 @@ type RunnerReplicaSetSpec struct {
// +nullable
Replicas *int `json:"replicas,omitempty"`
// EffectiveTime is the time the upstream controller requested to sync Replicas.
// It is usually populated by the webhook-based autoscaler via HRA and RunnerDeployment.
// The value is used to prevent runnerreplicaset controller from unnecessarily recreating ephemeral runners
// based on potentially outdated Replicas value.
//
// +optional
// +nullable
EffectiveTime *metav1.Time `json:"effectiveTime"`
// +optional
// +nullable
Selector *metav1.LabelSelector `json:"selector"`

View File

@@ -25,6 +25,14 @@ import (
type RunnerSetSpec struct {
RunnerConfig `json:",inline"`
// EffectiveTime is the time the upstream controller requested to sync Replicas.
// It is usually populated by the webhook-based autoscaler via HRA.
// It is used to prevent ephemeral runners from unnecessarily recreated.
//
// +optional
// +nullable
EffectiveTime *metav1.Time `json:"effectiveTime,omitempty"`
appsv1.StatefulSetSpec `json:",inline"`
}

View File

@@ -47,6 +47,7 @@ func (in *CacheEntry) DeepCopy() *CacheEntry {
func (in *CapacityReservation) DeepCopyInto(out *CapacityReservation) {
*out = *in
in.ExpirationTime.DeepCopyInto(&out.ExpirationTime)
in.EffectiveTime.DeepCopyInto(&out.EffectiveTime)
}
// DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new CapacityReservation.
@@ -107,6 +108,11 @@ func (in *GitHubEventScaleUpTriggerSpec) DeepCopyInto(out *GitHubEventScaleUpTri
*out = new(PushSpec)
**out = **in
}
if in.WorkflowJob != nil {
in, out := &in.WorkflowJob, &out.WorkflowJob
*out = new(WorkflowJobSpec)
**out = **in
}
}
// DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new GitHubEventScaleUpTriggerSpec.
@@ -498,6 +504,10 @@ func (in *RunnerDeploymentSpec) DeepCopyInto(out *RunnerDeploymentSpec) {
*out = new(int)
**out = **in
}
if in.EffectiveTime != nil {
in, out := &in.EffectiveTime, &out.EffectiveTime
*out = (*in).DeepCopy()
}
if in.Selector != nil {
in, out := &in.Selector, &out.Selector
*out = new(metav1.LabelSelector)
@@ -728,10 +738,8 @@ func (in *RunnerPodSpec) DeepCopyInto(out *RunnerPodSpec) {
}
if in.DnsConfig != nil {
in, out := &in.DnsConfig, &out.DnsConfig
*out = make([]v1.PodDNSConfig, len(*in))
for i := range *in {
(*in)[i].DeepCopyInto(&(*out)[i])
}
*out = new(v1.PodDNSConfig)
(*in).DeepCopyInto(*out)
}
}
@@ -812,6 +820,10 @@ func (in *RunnerReplicaSetSpec) DeepCopyInto(out *RunnerReplicaSetSpec) {
*out = new(int)
**out = **in
}
if in.EffectiveTime != nil {
in, out := &in.EffectiveTime, &out.EffectiveTime
*out = (*in).DeepCopy()
}
if in.Selector != nil {
in, out := &in.Selector, &out.Selector
*out = new(metav1.LabelSelector)
@@ -923,6 +935,10 @@ func (in *RunnerSetList) DeepCopyObject() runtime.Object {
func (in *RunnerSetSpec) DeepCopyInto(out *RunnerSetSpec) {
*out = *in
in.RunnerConfig.DeepCopyInto(&out.RunnerConfig)
if in.EffectiveTime != nil {
in, out := &in.EffectiveTime, &out.EffectiveTime
*out = (*in).DeepCopy()
}
in.StatefulSetSpec.DeepCopyInto(&out.StatefulSetSpec)
}
@@ -1109,3 +1125,18 @@ func (in *ScheduledOverride) DeepCopy() *ScheduledOverride {
in.DeepCopyInto(out)
return out
}
// DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
func (in *WorkflowJobSpec) DeepCopyInto(out *WorkflowJobSpec) {
*out = *in
}
// DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new WorkflowJobSpec.
func (in *WorkflowJobSpec) DeepCopy() *WorkflowJobSpec {
if in == nil {
return nil
}
out := new(WorkflowJobSpec)
in.DeepCopyInto(out)
return out
}

View File

@@ -15,10 +15,10 @@ type: application
# This is the chart version. This version number should be incremented each time you make changes
# to the chart and its templates, including the app version.
# Versions are expected to follow Semantic Versioning (https://semver.org/)
version: 0.15.3
version: 0.18.0
# Used as the default manager tag value when no tag property is provided in the values.yaml
appVersion: 0.20.4
appVersion: 0.23.0
home: https://github.com/actions-runner-controller/actions-runner-controller

View File

@@ -4,9 +4,9 @@ All additional docs are kept in the `docs/` folder, this README is solely for do
## Values
**_The values are documented as of HEAD, to review the configuration options for your chart version ensure you view this file at the relevent [tag](https://github.com/actions-runner-controller/actions-runner-controller/tags)_**
**_The values are documented as of HEAD, to review the configuration options for your chart version ensure you view this file at the relevant [tag](https://github.com/actions-runner-controller/actions-runner-controller/tags)_**
> _Default values are the defaults set in the charts values.yaml, some properties have default configurations in the code for when the property is omitted or invalid_
> _Default values are the defaults set in the charts `values.yaml`, some properties have default configurations in the code for when the property is omitted or invalid_
| Key | Description | Default |
|----------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------|
@@ -15,7 +15,6 @@ All additional docs are kept in the `docs/` folder, this README is solely for do
| `syncPeriod` | Set the period in which the controler reconciles the desired runners count | 10m |
| `enableLeaderElection` | Enable election configuration | true |
| `leaderElectionId` | Set the election ID for the controller group | |
| `githubAPICacheDuration` | Set the cache period for API calls | |
| `githubEnterpriseServerURL` | Set the URL for a self-hosted GitHub Enterprise Server | |
| `githubURL` | Override GitHub URL to be used for GitHub API calls | |
| `githubUploadURL` | Override GitHub Upload URL to be used for GitHub API calls | |
@@ -33,6 +32,7 @@ All additional docs are kept in the `docs/` folder, this README is solely for do
| `authSecret.github_basicauth_username` | Username for GitHub basic auth to use instead of PAT or GitHub APP in case it's running behind a proxy API | |
| `authSecret.github_basicauth_password` | Password for GitHub basic auth to use instead of PAT or GitHub APP in case it's running behind a proxy API | |
| `dockerRegistryMirror` | The default Docker Registry Mirror used by runners. | |
| `hostNetwork` | The "hostNetwork" of the controller container | false |
| `image.repository` | The "repository/image" of the controller container | summerwind/actions-runner-controller |
| `image.tag` | The tag of the controller container | |
| `image.actionsRunnerRepositoryAndTag` | The "repository/image" of the actions runner container | summerwind/actions-runner:latest |
@@ -49,7 +49,7 @@ All additional docs are kept in the `docs/` folder, this README is solely for do
| `imagePullSecrets` | Specifies the secret to be used when pulling the controller pod containers | |
| `fullnameOverride` | Override the full resource names | |
| `nameOverride` | Override the resource name prefix | |
| `serviceAccont.annotations` | Set annotations to the service account | |
| `serviceAccount.annotations` | Set annotations to the service account | |
| `serviceAccount.create` | Deploy the controller pod under a service account | true |
| `podAnnotations` | Set annotations for the controller pod | |
| `podLabels` | Set labels for the controller pod | |

View File

@@ -49,6 +49,9 @@ spec:
items:
description: CapacityReservation specifies the number of replicas temporarily added to the scale target until ExpirationTime.
properties:
effectiveTime:
format: date-time
type: string
expirationTime:
format: date-time
type: string
@@ -138,6 +141,7 @@ spec:
status:
type: string
types:
description: 'One of: created, rerequested, or completed'
items:
type: string
type: array
@@ -157,6 +161,9 @@ spec:
push:
description: PushSpec is the condition for triggering scale-up on push event Also see https://docs.github.com/en/actions/reference/events-that-trigger-workflows#push
type: object
workflowJob:
description: https://docs.github.com/en/developers/webhooks-and-events/webhooks/webhook-events-and-payloads#workflow_job
type: object
type: object
type: object
type: array

View File

@@ -48,6 +48,11 @@ spec:
spec:
description: RunnerDeploymentSpec defines the desired state of RunnerDeployment
properties:
effectiveTime:
description: EffectiveTime is the time the upstream controller requested to sync Replicas. It is usually populated by the webhook-based autoscaler via HRA. The value is inherited to RunnerRepicaSet(s) and used to prevent ephemeral runners from unnecessarily recreated.
format: date-time
nullable: true
type: string
replicas:
nullable: true
type: integer
@@ -1349,33 +1354,31 @@ spec:
type: object
type: array
dnsConfig:
items:
description: PodDNSConfig defines the DNS parameters of a pod in addition to those generated from DNSPolicy.
properties:
nameservers:
description: A list of DNS name server IP addresses. This will be appended to the base nameservers generated from DNSPolicy. Duplicated nameservers will be removed.
items:
type: string
type: array
options:
description: A list of DNS resolver options. This will be merged with the base options generated from DNSPolicy. Duplicated entries will be removed. Resolution options given in Options will override those that appear in the base DNSPolicy.
items:
description: PodDNSConfigOption defines DNS resolver options of a pod.
properties:
name:
description: Required.
type: string
value:
type: string
type: object
type: array
searches:
description: A list of DNS search domains for host-name lookup. This will be appended to the base search paths generated from DNSPolicy. Duplicated search paths will be removed.
items:
type: string
type: array
type: object
type: array
description: PodDNSConfig defines the DNS parameters of a pod in addition to those generated from DNSPolicy.
properties:
nameservers:
description: A list of DNS name server IP addresses. This will be appended to the base nameservers generated from DNSPolicy. Duplicated nameservers will be removed.
items:
type: string
type: array
options:
description: A list of DNS resolver options. This will be merged with the base options generated from DNSPolicy. Duplicated entries will be removed. Resolution options given in Options will override those that appear in the base DNSPolicy.
items:
description: PodDNSConfigOption defines DNS resolver options of a pod.
properties:
name:
description: Required.
type: string
value:
type: string
type: object
type: array
searches:
description: A list of DNS search domains for host-name lookup. This will be appended to the base search paths generated from DNSPolicy. Duplicated search paths will be removed.
items:
type: string
type: array
type: object
dockerEnabled:
type: boolean
dockerEnv:
@@ -4152,7 +4155,7 @@ spec:
type: string
type: object
type: array
topologySpreadConstraint:
topologySpreadConstraints:
items:
description: TopologySpreadConstraint specifies how to spread matching pods among the given topology.
properties:

View File

@@ -45,6 +45,11 @@ spec:
spec:
description: RunnerReplicaSetSpec defines the desired state of RunnerReplicaSet
properties:
effectiveTime:
description: EffectiveTime is the time the upstream controller requested to sync Replicas. It is usually populated by the webhook-based autoscaler via HRA and RunnerDeployment. The value is used to prevent runnerreplicaset controller from unnecessarily recreating ephemeral runners based on potentially outdated Replicas value.
format: date-time
nullable: true
type: string
replicas:
nullable: true
type: integer
@@ -1346,33 +1351,31 @@ spec:
type: object
type: array
dnsConfig:
items:
description: PodDNSConfig defines the DNS parameters of a pod in addition to those generated from DNSPolicy.
properties:
nameservers:
description: A list of DNS name server IP addresses. This will be appended to the base nameservers generated from DNSPolicy. Duplicated nameservers will be removed.
items:
type: string
type: array
options:
description: A list of DNS resolver options. This will be merged with the base options generated from DNSPolicy. Duplicated entries will be removed. Resolution options given in Options will override those that appear in the base DNSPolicy.
items:
description: PodDNSConfigOption defines DNS resolver options of a pod.
properties:
name:
description: Required.
type: string
value:
type: string
type: object
type: array
searches:
description: A list of DNS search domains for host-name lookup. This will be appended to the base search paths generated from DNSPolicy. Duplicated search paths will be removed.
items:
type: string
type: array
type: object
type: array
description: PodDNSConfig defines the DNS parameters of a pod in addition to those generated from DNSPolicy.
properties:
nameservers:
description: A list of DNS name server IP addresses. This will be appended to the base nameservers generated from DNSPolicy. Duplicated nameservers will be removed.
items:
type: string
type: array
options:
description: A list of DNS resolver options. This will be merged with the base options generated from DNSPolicy. Duplicated entries will be removed. Resolution options given in Options will override those that appear in the base DNSPolicy.
items:
description: PodDNSConfigOption defines DNS resolver options of a pod.
properties:
name:
description: Required.
type: string
value:
type: string
type: object
type: array
searches:
description: A list of DNS search domains for host-name lookup. This will be appended to the base search paths generated from DNSPolicy. Duplicated search paths will be removed.
items:
type: string
type: array
type: object
dockerEnabled:
type: boolean
dockerEnv:
@@ -4149,7 +4152,7 @@ spec:
type: string
type: object
type: array
topologySpreadConstraint:
topologySpreadConstraints:
items:
description: TopologySpreadConstraint specifies how to spread matching pods among the given topology.
properties:

View File

@@ -1292,33 +1292,31 @@ spec:
type: object
type: array
dnsConfig:
items:
description: PodDNSConfig defines the DNS parameters of a pod in addition to those generated from DNSPolicy.
properties:
nameservers:
description: A list of DNS name server IP addresses. This will be appended to the base nameservers generated from DNSPolicy. Duplicated nameservers will be removed.
items:
type: string
type: array
options:
description: A list of DNS resolver options. This will be merged with the base options generated from DNSPolicy. Duplicated entries will be removed. Resolution options given in Options will override those that appear in the base DNSPolicy.
items:
description: PodDNSConfigOption defines DNS resolver options of a pod.
properties:
name:
description: Required.
type: string
value:
type: string
type: object
type: array
searches:
description: A list of DNS search domains for host-name lookup. This will be appended to the base search paths generated from DNSPolicy. Duplicated search paths will be removed.
items:
type: string
type: array
type: object
type: array
description: PodDNSConfig defines the DNS parameters of a pod in addition to those generated from DNSPolicy.
properties:
nameservers:
description: A list of DNS name server IP addresses. This will be appended to the base nameservers generated from DNSPolicy. Duplicated nameservers will be removed.
items:
type: string
type: array
options:
description: A list of DNS resolver options. This will be merged with the base options generated from DNSPolicy. Duplicated entries will be removed. Resolution options given in Options will override those that appear in the base DNSPolicy.
items:
description: PodDNSConfigOption defines DNS resolver options of a pod.
properties:
name:
description: Required.
type: string
value:
type: string
type: object
type: array
searches:
description: A list of DNS search domains for host-name lookup. This will be appended to the base search paths generated from DNSPolicy. Duplicated search paths will be removed.
items:
type: string
type: array
type: object
dockerEnabled:
type: boolean
dockerEnv:
@@ -4095,7 +4093,7 @@ spec:
type: string
type: object
type: array
topologySpreadConstraint:
topologySpreadConstraints:
items:
description: TopologySpreadConstraint specifies how to spread matching pods among the given topology.
properties:
@@ -5126,6 +5124,9 @@ spec:
type: string
phase:
type: string
ready:
description: Turns true only if the runner pod is ready.
type: boolean
reason:
type: string
registration:

View File

@@ -55,6 +55,11 @@ spec:
type: string
dockerdWithinRunnerContainer:
type: boolean
effectiveTime:
description: EffectiveTime is the time the upstream controller requested to sync Replicas. It is usually populated by the webhook-based autoscaler via HRA. It is used to prevent ephemeral runners from unnecessarily recreated.
format: date-time
nullable: true
type: string
enterprise:
pattern: ^[^/]+$
type: string

View File

@@ -18,20 +18,23 @@ Due to the above you can't just do a `helm upgrade` to release the latest versio
## Steps
1. Upgrade CRDs
1. Upgrade CRDs, this isn't optional, the CRDs you are using must be those that correspond with the version of the controller you are installing
```shell
# REMEMBER TO UPDATE THE CHART_VERSION TO RELEVANT CHART VERISON!!!!
CHART_VERSION=0.14.0
# REMEMBER TO UPDATE THE CHART_VERSION TO RELEVANT CHART VERISON!!!!
CHART_VERSION=0.18.0
curl -L https://github.com/actions-runner-controller/actions-runner-controller/releases/download/actions-runner-controller-${CHART_VERSION}/actions-runner-controller-${CHART_VERSION}.tgz | tar zxv --strip 1 actions-runner-controller/crds
kubectl apply -f crds/
kubectl replace -f crds/
```
2. Upgrade the Helm release
```shell
# helm repo [command]
helm repo update
# helm upgrade [RELEASE] [CHART] [flags]
helm upgrade actions-runner-controller \
actions-runner-controller/actions-runner-controller \

View File

@@ -14,6 +14,7 @@ spec:
metadata:
{{- with .Values.podAnnotations }}
annotations:
kubectl.kubernetes.io/default-logs-container: "manager"
{{- toYaml . | nindent 8 }}
{{- end }}
labels:
@@ -44,6 +45,7 @@ spec:
- "--leader-election-id={{ .Values.leaderElectionId }}"
{{- end }}
- "--sync-period={{ .Values.syncPeriod }}"
- "--default-scale-down-delay={{ .Values.defaultScaleDownDelay }}"
- "--docker-image={{ .Values.image.dindSidecarRepositoryAndTag }}"
- "--runner-image={{ .Values.image.actionsRunnerRepositoryAndTag }}"
{{- range .Values.image.actionsRunnerImagePullSecrets }}
@@ -104,17 +106,16 @@ spec:
key: github_app_private_key
name: {{ include "actions-runner-controller.secretName" . }}
optional: true
{{- if .Values.authSecret.github_basicauth_username }}
{{- if .Values.authSecret.github_basicauth_username }}
- name: GITHUB_BASICAUTH_USERNAME
value: {{ .Values.authSecret.github_basicauth_username }}
{{- end }}
{{- if .Values.authSecret.github_basicauth_password }}
- name: GITHUB_BASICAUTH_PASSWORD
valueFrom:
secretKeyRef:
key: github_basicauth_password
name: {{ include "actions-runner-controller.secretName" . }}
{{- end }}
optional: true
{{- end }}
{{- range $key, $val := .Values.env }}
- name: {{ $key }}
@@ -199,3 +200,6 @@ spec:
topologySpreadConstraints:
{{- toYaml . | nindent 8 }}
{{- end }}
{{- if .Values.hostNetwork }}
hostNetwork: {{ .Values.hostNetwork }}
{{- end }}

View File

@@ -15,6 +15,7 @@ spec:
metadata:
{{- with .Values.githubWebhookServer.podAnnotations }}
annotations:
kubectl.kubernetes.io/default-logs-container: "github-webhook-server"
{{- toYaml . | nindent 8 }}
{{- end }}
labels:
@@ -94,17 +95,16 @@ spec:
key: github_app_private_key
name: {{ include "actions-runner-controller.githubWebhookServerSecretName" . }}
optional: true
{{- if .Values.authSecret.github_basicauth_username }}
{{- if .Values.authSecret.github_basicauth_username }}
- name: GITHUB_BASICAUTH_USERNAME
value: {{ .Values.authSecret.github_basicauth_username }}
{{- end }}
{{- if .Values.authSecret.github_basicauth_password }}
- name: GITHUB_BASICAUTH_PASSWORD
valueFrom:
secretKeyRef:
key: github_basicauth_password
name: {{ include "actions-runner-controller.secretName" . }}
{{- end }}
optional: true
{{- end }}
{{- range $key, $val := .Values.githubWebhookServer.env }}
- name: {{ $key }}

View File

@@ -12,6 +12,11 @@ metadata:
webhooks:
- admissionReviewVersions:
- v1beta1
{{- if .Values.scope.singleNamespace }}
namespaceSelector:
matchLabels:
name: {{ default .Release.Namespace .Values.scope.watchNamespace }}
{{- end }}
clientConfig:
{{- if .Values.admissionWebHooks.caBundle }}
caBundle: {{ quote .Values.admissionWebHooks.caBundle }}
@@ -35,6 +40,11 @@ webhooks:
sideEffects: None
- admissionReviewVersions:
- v1beta1
{{- if .Values.scope.singleNamespace }}
namespaceSelector:
matchLabels:
name: {{ default .Release.Namespace .Values.scope.watchNamespace }}
{{- end }}
clientConfig:
{{- if .Values.admissionWebHooks.caBundle }}
caBundle: {{ .Values.admissionWebHooks.caBundle }}
@@ -58,6 +68,11 @@ webhooks:
sideEffects: None
- admissionReviewVersions:
- v1beta1
{{- if .Values.scope.singleNamespace }}
namespaceSelector:
matchLabels:
name: {{ default .Release.Namespace .Values.scope.watchNamespace }}
{{- end }}
clientConfig:
{{- if .Values.admissionWebHooks.caBundle }}
caBundle: {{ .Values.admissionWebHooks.caBundle }}
@@ -81,6 +96,11 @@ webhooks:
sideEffects: None
- admissionReviewVersions:
- v1beta1
{{- if .Values.scope.singleNamespace }}
namespaceSelector:
matchLabels:
name: {{ default .Release.Namespace .Values.scope.watchNamespace }}
{{- end }}
clientConfig:
{{- if .Values.admissionWebHooks.caBundle }}
caBundle: {{ .Values.admissionWebHooks.caBundle }}
@@ -117,6 +137,11 @@ metadata:
webhooks:
- admissionReviewVersions:
- v1beta1
{{- if .Values.scope.singleNamespace }}
namespaceSelector:
matchLabels:
name: {{ default .Release.Namespace .Values.scope.watchNamespace }}
{{- end }}
clientConfig:
{{- if .Values.admissionWebHooks.caBundle }}
caBundle: {{ .Values.admissionWebHooks.caBundle }}
@@ -140,6 +165,11 @@ webhooks:
sideEffects: None
- admissionReviewVersions:
- v1beta1
{{- if .Values.scope.singleNamespace }}
namespaceSelector:
matchLabels:
name: {{ default .Release.Namespace .Values.scope.watchNamespace }}
{{- end }}
clientConfig:
{{- if .Values.admissionWebHooks.caBundle }}
caBundle: {{ .Values.admissionWebHooks.caBundle }}
@@ -163,6 +193,11 @@ webhooks:
sideEffects: None
- admissionReviewVersions:
- v1beta1
{{- if .Values.scope.singleNamespace }}
namespaceSelector:
matchLabels:
name: {{ default .Release.Namespace .Values.scope.watchNamespace }}
{{- end }}
clientConfig:
{{- if .Values.admissionWebHooks.caBundle }}
caBundle: {{ .Values.admissionWebHooks.caBundle }}

View File

@@ -6,13 +6,15 @@ labels: {}
replicaCount: 1
syncPeriod: 10m
syncPeriod: 1m
defaultScaleDownDelay: 10m
enableLeaderElection: true
# Specifies the controller id for leader election.
# Must be unique if more than one controller installed onto the same namespace.
#leaderElectionId: "actions-runner-controller"
# DEPRECATED: This has been removed as unnecessary in #1192
# The controller tries its best not to repeat the duplicate GitHub API call
# within this duration.
# Defaults to syncPeriod - 10s.
@@ -165,6 +167,10 @@ admissionWebHooks:
{}
#caBundle: "Ci0tLS0tQk...<base64-encoded PEM bundle containing the CA that signed the webhook's serving certificate>...tLS0K"
# There may be alternatives to setting `hostNetwork: true`, see
# https://github.com/actions-runner-controller/actions-runner-controller/issues/1005#issuecomment-993097155
#hostNetwork: true
githubWebhookServer:
enabled: false
replicaCount: 1

View File

@@ -29,15 +29,14 @@ import (
actionsv1alpha1 "github.com/actions-runner-controller/actions-runner-controller/api/v1alpha1"
"github.com/actions-runner-controller/actions-runner-controller/controllers"
"github.com/actions-runner-controller/actions-runner-controller/github"
"github.com/actions-runner-controller/actions-runner-controller/logging"
"github.com/kelseyhightower/envconfig"
zaplib "go.uber.org/zap"
"k8s.io/apimachinery/pkg/runtime"
clientgoscheme "k8s.io/client-go/kubernetes/scheme"
_ "k8s.io/client-go/plugin/pkg/client/auth/exec"
_ "k8s.io/client-go/plugin/pkg/client/auth/gcp"
_ "k8s.io/client-go/plugin/pkg/client/auth/oidc"
ctrl "sigs.k8s.io/controller-runtime"
"sigs.k8s.io/controller-runtime/pkg/log/zap"
// +kubebuilder:scaffold:imports
)
@@ -47,11 +46,6 @@ var (
)
const (
logLevelDebug = "debug"
logLevelInfo = "info"
logLevelWarn = "warn"
logLevelError = "error"
webhookSecretTokenEnvName = "GITHUB_WEBHOOK_SECRET_TOKEN"
)
@@ -97,7 +91,7 @@ func main() {
flag.BoolVar(&enableLeaderElection, "enable-leader-election", false,
"Enable leader election for controller manager. Enabling this will ensure there is only one active controller manager.")
flag.DurationVar(&syncPeriod, "sync-period", 10*time.Minute, "Determines the minimum frequency at which K8s resources managed by this controller are reconciled. When you use autoscaling, set to a lower value like 10 minute, because this corresponds to the minimum time to react on demand change")
flag.StringVar(&logLevel, "log-level", logLevelDebug, `The verbosity of the logging. Valid values are "debug", "info", "warn", "error". Defaults to "debug".`)
flag.StringVar(&logLevel, "log-level", logging.LogLevelDebug, `The verbosity of the logging. Valid values are "debug", "info", "warn", "error". Defaults to "debug".`)
flag.StringVar(&webhookSecretToken, "github-webhook-secret-token", "", "The personal access token of GitHub.")
flag.StringVar(&c.Token, "github-token", c.Token, "The personal access token of GitHub.")
flag.Int64Var(&c.AppID, "github-app-id", c.AppID, "The application ID of GitHub App.")
@@ -126,23 +120,9 @@ func main() {
setupLog.Info("-watch-namespace is %q. Only HorizontalRunnerAutoscalers in %q are watched, cached, and considered as scale targets.")
}
logger := zap.New(func(o *zap.Options) {
switch logLevel {
case logLevelDebug:
o.Development = true
lvl := zaplib.NewAtomicLevelAt(-2) // maps to logr's V(2)
o.Level = &lvl
case logLevelInfo:
lvl := zaplib.NewAtomicLevelAt(zaplib.InfoLevel)
o.Level = &lvl
case logLevelWarn:
lvl := zaplib.NewAtomicLevelAt(zaplib.WarnLevel)
o.Level = &lvl
case logLevelError:
lvl := zaplib.NewAtomicLevelAt(zaplib.ErrorLevel)
o.Level = &lvl
}
})
logger := logging.NewLogger(logLevel)
ctrl.SetLogger(logger)
// In order to support runner groups with custom visibility (selected repositories), we need to perform some GitHub API calls.
// Let the user define if they want to opt-in supporting this option by providing the proper GitHub authentication parameters
@@ -150,6 +130,8 @@ func main() {
// That is, all runner groups managed by ARC are assumed to be visible to any repositories,
// which is wrong when you have one or more non-default runner groups in your organization or enterprise.
if len(c.Token) > 0 || (c.AppID > 0 && c.AppInstallationID > 0 && c.AppPrivateKey != "") || (len(c.BasicauthUsername) > 0 && len(c.BasicauthPassword) > 0) {
c.Log = &logger
ghClient, err = c.NewClient()
if err != nil {
fmt.Fprintln(os.Stderr, "Error: Client creation failed.", err)
@@ -160,8 +142,6 @@ func main() {
setupLog.Info("GitHub client is not initialized. Runner groups with custom visibility are not supported. If needed, please provide GitHub authentication. This will incur in extra GitHub API calls")
}
ctrl.SetLogger(logger)
mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
Scheme: scheme,
SyncPeriod: &syncPeriod,

View File

@@ -49,6 +49,9 @@ spec:
items:
description: CapacityReservation specifies the number of replicas temporarily added to the scale target until ExpirationTime.
properties:
effectiveTime:
format: date-time
type: string
expirationTime:
format: date-time
type: string
@@ -138,6 +141,7 @@ spec:
status:
type: string
types:
description: 'One of: created, rerequested, or completed'
items:
type: string
type: array
@@ -157,6 +161,9 @@ spec:
push:
description: PushSpec is the condition for triggering scale-up on push event Also see https://docs.github.com/en/actions/reference/events-that-trigger-workflows#push
type: object
workflowJob:
description: https://docs.github.com/en/developers/webhooks-and-events/webhooks/webhook-events-and-payloads#workflow_job
type: object
type: object
type: object
type: array

View File

@@ -48,6 +48,11 @@ spec:
spec:
description: RunnerDeploymentSpec defines the desired state of RunnerDeployment
properties:
effectiveTime:
description: EffectiveTime is the time the upstream controller requested to sync Replicas. It is usually populated by the webhook-based autoscaler via HRA. The value is inherited to RunnerRepicaSet(s) and used to prevent ephemeral runners from unnecessarily recreated.
format: date-time
nullable: true
type: string
replicas:
nullable: true
type: integer
@@ -1349,33 +1354,31 @@ spec:
type: object
type: array
dnsConfig:
items:
description: PodDNSConfig defines the DNS parameters of a pod in addition to those generated from DNSPolicy.
properties:
nameservers:
description: A list of DNS name server IP addresses. This will be appended to the base nameservers generated from DNSPolicy. Duplicated nameservers will be removed.
items:
type: string
type: array
options:
description: A list of DNS resolver options. This will be merged with the base options generated from DNSPolicy. Duplicated entries will be removed. Resolution options given in Options will override those that appear in the base DNSPolicy.
items:
description: PodDNSConfigOption defines DNS resolver options of a pod.
properties:
name:
description: Required.
type: string
value:
type: string
type: object
type: array
searches:
description: A list of DNS search domains for host-name lookup. This will be appended to the base search paths generated from DNSPolicy. Duplicated search paths will be removed.
items:
type: string
type: array
type: object
type: array
description: PodDNSConfig defines the DNS parameters of a pod in addition to those generated from DNSPolicy.
properties:
nameservers:
description: A list of DNS name server IP addresses. This will be appended to the base nameservers generated from DNSPolicy. Duplicated nameservers will be removed.
items:
type: string
type: array
options:
description: A list of DNS resolver options. This will be merged with the base options generated from DNSPolicy. Duplicated entries will be removed. Resolution options given in Options will override those that appear in the base DNSPolicy.
items:
description: PodDNSConfigOption defines DNS resolver options of a pod.
properties:
name:
description: Required.
type: string
value:
type: string
type: object
type: array
searches:
description: A list of DNS search domains for host-name lookup. This will be appended to the base search paths generated from DNSPolicy. Duplicated search paths will be removed.
items:
type: string
type: array
type: object
dockerEnabled:
type: boolean
dockerEnv:
@@ -4152,7 +4155,7 @@ spec:
type: string
type: object
type: array
topologySpreadConstraint:
topologySpreadConstraints:
items:
description: TopologySpreadConstraint specifies how to spread matching pods among the given topology.
properties:

View File

@@ -45,6 +45,11 @@ spec:
spec:
description: RunnerReplicaSetSpec defines the desired state of RunnerReplicaSet
properties:
effectiveTime:
description: EffectiveTime is the time the upstream controller requested to sync Replicas. It is usually populated by the webhook-based autoscaler via HRA and RunnerDeployment. The value is used to prevent runnerreplicaset controller from unnecessarily recreating ephemeral runners based on potentially outdated Replicas value.
format: date-time
nullable: true
type: string
replicas:
nullable: true
type: integer
@@ -1346,33 +1351,31 @@ spec:
type: object
type: array
dnsConfig:
items:
description: PodDNSConfig defines the DNS parameters of a pod in addition to those generated from DNSPolicy.
properties:
nameservers:
description: A list of DNS name server IP addresses. This will be appended to the base nameservers generated from DNSPolicy. Duplicated nameservers will be removed.
items:
type: string
type: array
options:
description: A list of DNS resolver options. This will be merged with the base options generated from DNSPolicy. Duplicated entries will be removed. Resolution options given in Options will override those that appear in the base DNSPolicy.
items:
description: PodDNSConfigOption defines DNS resolver options of a pod.
properties:
name:
description: Required.
type: string
value:
type: string
type: object
type: array
searches:
description: A list of DNS search domains for host-name lookup. This will be appended to the base search paths generated from DNSPolicy. Duplicated search paths will be removed.
items:
type: string
type: array
type: object
type: array
description: PodDNSConfig defines the DNS parameters of a pod in addition to those generated from DNSPolicy.
properties:
nameservers:
description: A list of DNS name server IP addresses. This will be appended to the base nameservers generated from DNSPolicy. Duplicated nameservers will be removed.
items:
type: string
type: array
options:
description: A list of DNS resolver options. This will be merged with the base options generated from DNSPolicy. Duplicated entries will be removed. Resolution options given in Options will override those that appear in the base DNSPolicy.
items:
description: PodDNSConfigOption defines DNS resolver options of a pod.
properties:
name:
description: Required.
type: string
value:
type: string
type: object
type: array
searches:
description: A list of DNS search domains for host-name lookup. This will be appended to the base search paths generated from DNSPolicy. Duplicated search paths will be removed.
items:
type: string
type: array
type: object
dockerEnabled:
type: boolean
dockerEnv:
@@ -4149,7 +4152,7 @@ spec:
type: string
type: object
type: array
topologySpreadConstraint:
topologySpreadConstraints:
items:
description: TopologySpreadConstraint specifies how to spread matching pods among the given topology.
properties:

View File

@@ -1292,33 +1292,31 @@ spec:
type: object
type: array
dnsConfig:
items:
description: PodDNSConfig defines the DNS parameters of a pod in addition to those generated from DNSPolicy.
properties:
nameservers:
description: A list of DNS name server IP addresses. This will be appended to the base nameservers generated from DNSPolicy. Duplicated nameservers will be removed.
items:
type: string
type: array
options:
description: A list of DNS resolver options. This will be merged with the base options generated from DNSPolicy. Duplicated entries will be removed. Resolution options given in Options will override those that appear in the base DNSPolicy.
items:
description: PodDNSConfigOption defines DNS resolver options of a pod.
properties:
name:
description: Required.
type: string
value:
type: string
type: object
type: array
searches:
description: A list of DNS search domains for host-name lookup. This will be appended to the base search paths generated from DNSPolicy. Duplicated search paths will be removed.
items:
type: string
type: array
type: object
type: array
description: PodDNSConfig defines the DNS parameters of a pod in addition to those generated from DNSPolicy.
properties:
nameservers:
description: A list of DNS name server IP addresses. This will be appended to the base nameservers generated from DNSPolicy. Duplicated nameservers will be removed.
items:
type: string
type: array
options:
description: A list of DNS resolver options. This will be merged with the base options generated from DNSPolicy. Duplicated entries will be removed. Resolution options given in Options will override those that appear in the base DNSPolicy.
items:
description: PodDNSConfigOption defines DNS resolver options of a pod.
properties:
name:
description: Required.
type: string
value:
type: string
type: object
type: array
searches:
description: A list of DNS search domains for host-name lookup. This will be appended to the base search paths generated from DNSPolicy. Duplicated search paths will be removed.
items:
type: string
type: array
type: object
dockerEnabled:
type: boolean
dockerEnv:
@@ -4095,7 +4093,7 @@ spec:
type: string
type: object
type: array
topologySpreadConstraint:
topologySpreadConstraints:
items:
description: TopologySpreadConstraint specifies how to spread matching pods among the given topology.
properties:
@@ -5126,6 +5124,9 @@ spec:
type: string
phase:
type: string
ready:
description: Turns true only if the runner pod is ready.
type: boolean
reason:
type: string
registration:

View File

@@ -55,6 +55,11 @@ spec:
type: string
dockerdWithinRunnerContainer:
type: boolean
effectiveTime:
description: EffectiveTime is the time the upstream controller requested to sync Replicas. It is usually populated by the webhook-based autoscaler via HRA. It is used to prevent ephemeral runners from unnecessarily recreated.
format: date-time
nullable: true
type: string
enterprise:
pattern: ^[^/]+$
type: string

View File

@@ -8,6 +8,7 @@ spec:
conversion:
strategy: Webhook
webhook:
conversionReviewVersions: ["v1","v1beta1"]
clientConfig:
# this is "\n" used as a placeholder, otherwise it will be rejected by the apiserver for being blank,
# but we're going to set it later using the cert-manager (or potentially a patch if not using cert-manager)

View File

@@ -0,0 +1,23 @@
# This patch injects an HTTP proxy sidecar container that performs RBAC
# authorization against the Kubernetes API using SubjectAccessReviews.
apiVersion: apps/v1
kind: Deployment
metadata:
name: github-webhook-server
spec:
template:
spec:
containers:
- name: kube-rbac-proxy
image: quay.io/brancz/kube-rbac-proxy:v0.10.0
args:
- '--secure-listen-address=0.0.0.0:8443'
- '--upstream=http://127.0.0.1:8080/'
- '--logtostderr=true'
- '--v=10'
ports:
- containerPort: 8443
name: https
- name: github-webhook-server
args:
- '--metrics-addr=127.0.0.1:8080'

View File

@@ -20,19 +20,22 @@ bases:
- ../webhook
# [CERTMANAGER] To enable cert-manager, uncomment all sections with 'CERTMANAGER'. 'WEBHOOK' components are required.
- ../certmanager
# [PROMETHEUS] To enable prometheus monitor, uncomment all sections with 'PROMETHEUS'.
# [PROMETHEUS] To enable prometheus monitor, uncomment all sections with 'PROMETHEUS'.
#- ../prometheus
# [GH_WEBHOOK_SERVER] To enable the GitHub webhook server, uncomment all sections with 'GH_WEBHOOK_SERVER'.
#- ../github-webhook-server
patchesStrategicMerge:
# Protect the /metrics endpoint by putting it behind auth.
# Only one of manager_auth_proxy_patch.yaml and
# manager_prometheus_metrics_patch.yaml should be enabled.
# Protect the /metrics endpoint by putting it behind auth.
# Only one of manager_auth_proxy_patch.yaml and
# manager_prometheus_metrics_patch.yaml should be enabled.
- manager_auth_proxy_patch.yaml
# If you want your controller-manager to expose the /metrics
# endpoint w/o any authn/z, uncomment the following line and
# comment manager_auth_proxy_patch.yaml.
# Only one of manager_auth_proxy_patch.yaml and
# manager_prometheus_metrics_patch.yaml should be enabled.
# If you want your controller-manager to expose the /metrics
# endpoint w/o any authn/z, uncomment the following line and
# comment manager_auth_proxy_patch.yaml.
# Only one of manager_auth_proxy_patch.yaml and
# manager_prometheus_metrics_patch.yaml should be enabled.
#- manager_prometheus_metrics_patch.yaml
# [WEBHOOK] To enable webhook, uncomment all the sections with [WEBHOOK] prefix including the one in crd/kustomization.yaml
@@ -43,6 +46,10 @@ patchesStrategicMerge:
# 'CERTMANAGER' needs to be enabled to use ca injection
- webhookcainjection_patch.yaml
# [GH_WEBHOOK_SERVER] To enable the GitHub webhook server, uncomment all sections with 'GH_WEBHOOK_SERVER'.
# Protect the GitHub webhook server metrics endpoint by putting it behind auth.
# - gh-webhook-server-auth-proxy-patch.yaml
# the following config is for teaching kustomize how to do var substitution
vars:
# [CERTMANAGER] To enable cert-manager, uncomment all sections with 'CERTMANAGER' prefix.

View File

@@ -23,4 +23,3 @@ spec:
args:
- "--metrics-addr=127.0.0.1:8080"
- "--enable-leader-election"
- "--sync-period=10m"

View File

@@ -0,0 +1,37 @@
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
app.kubernetes.io/component: github-webhook-server
app.kubernetes.io/part-of: actions-runner-controller
name: github-webhook-server
spec:
replicas: 1
selector:
matchLabels:
app.kubernetes.io/component: github-webhook-server
app.kubernetes.io/part-of: actions-runner-controller
template:
metadata:
labels:
app.kubernetes.io/component: github-webhook-server
app.kubernetes.io/part-of: actions-runner-controller
spec:
containers:
- name: github-webhook-server
image: controller:latest
command:
- '/github-webhook-server'
env:
- name: GITHUB_WEBHOOK_SECRET_TOKEN
valueFrom:
secretKeyRef:
key: github_webhook_secret_token
name: github-webhook-server
optional: true
ports:
- containerPort: 8000
name: http
protocol: TCP
serviceAccountName: github-webhook-server
terminationGracePeriodSeconds: 10

View File

@@ -0,0 +1,12 @@
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
images:
- name: controller
newName: summerwind/actions-runner-controller
newTag: latest
resources:
- deployment.yaml
- rbac.yaml
- service.yaml

View File

@@ -0,0 +1,113 @@
apiVersion: v1
kind: ServiceAccount
metadata:
labels:
app.kubernetes.io/component: github-webhook-server
app.kubernetes.io/part-of: actions-runner-controller
name: github-webhook-server
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
labels:
app.kubernetes.io/component: github-webhook-server
app.kubernetes.io/part-of: actions-runner-controller
name: github-webhook-server
rules:
- apiGroups:
- actions.summerwind.dev
resources:
- horizontalrunnerautoscalers
verbs:
- get
- list
- patch
- update
- watch
- apiGroups:
- actions.summerwind.dev
resources:
- horizontalrunnerautoscalers/finalizers
verbs:
- create
- delete
- get
- list
- patch
- update
- watch
- apiGroups:
- actions.summerwind.dev
resources:
- horizontalrunnerautoscalers/status
verbs:
- get
- patch
- update
- apiGroups:
- actions.summerwind.dev
resources:
- runnersets
verbs:
- get
- list
- watch
- apiGroups:
- actions.summerwind.dev
resources:
- runnerdeployments
verbs:
- create
- delete
- get
- list
- patch
- update
- watch
- apiGroups:
- actions.summerwind.dev
resources:
- runnerdeployments/finalizers
verbs:
- create
- delete
- get
- list
- patch
- update
- watch
- apiGroups:
- actions.summerwind.dev
resources:
- runnerdeployments/status
verbs:
- get
- patch
- update
- apiGroups:
- authentication.k8s.io
resources:
- tokenreviews
verbs:
- create
- apiGroups:
- authorization.k8s.io
resources:
- subjectaccessreviews
verbs:
- create
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
labels:
app.kubernetes.io/component: github-webhook-server
app.kubernetes.io/part-of: actions-runner-controller
name: github-webhook-server
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: github-webhook-server
subjects:
- kind: ServiceAccount
name: github-webhook-server

View File

@@ -0,0 +1,16 @@
apiVersion: v1
kind: Service
metadata:
labels:
app.kubernetes.io/component: github-webhook-server
app.kubernetes.io/part-of: actions-runner-controller
name: github-webhook-server
spec:
ports:
- port: 80
targetPort: http
protocol: TCP
name: http
selector:
app.kubernetes.io/component: github-webhook-server
app.kubernetes.io/part-of: actions-runner-controller

View File

@@ -7,7 +7,6 @@ import (
"math"
"strconv"
"strings"
"time"
"github.com/actions-runner-controller/actions-runner-controller/api/v1alpha1"
"github.com/google/go-github/v39/github"
@@ -20,47 +19,6 @@ const (
defaultScaleDownFactor = 0.7
)
func getValueAvailableAt(now time.Time, from, to *time.Time, reservedValue int) *int {
if to != nil && now.After(*to) {
return nil
}
if from != nil && now.Before(*from) {
return nil
}
return &reservedValue
}
func (r *HorizontalRunnerAutoscalerReconciler) fetchSuggestedReplicasFromCache(hra v1alpha1.HorizontalRunnerAutoscaler) *int {
var entry *v1alpha1.CacheEntry
for i := range hra.Status.CacheEntries {
ent := hra.Status.CacheEntries[i]
if ent.Key != v1alpha1.CacheEntryKeyDesiredReplicas {
continue
}
if !time.Now().Before(ent.ExpirationTime.Time) {
continue
}
entry = &ent
break
}
if entry != nil {
v := getValueAvailableAt(time.Now(), nil, &entry.ExpirationTime.Time, entry.Value)
if v != nil {
return v
}
}
return nil
}
func (r *HorizontalRunnerAutoscalerReconciler) suggestDesiredReplicas(st scaleTarget, hra v1alpha1.HorizontalRunnerAutoscaler) (*int, error) {
if hra.Spec.MinReplicas == nil {
return nil, fmt.Errorf("horizontalrunnerautoscaler %s/%s is missing minReplicas", hra.Namespace, hra.Name)
@@ -71,10 +29,8 @@ func (r *HorizontalRunnerAutoscalerReconciler) suggestDesiredReplicas(st scaleTa
metrics := hra.Spec.Metrics
numMetrics := len(metrics)
if numMetrics == 0 {
if len(hra.Spec.ScaleUpTriggers) == 0 {
return r.suggestReplicasByQueuedAndInProgressWorkflowRuns(st, hra, nil)
}
// We don't default to anything since ARC 0.23.0
// See https://github.com/actions-runner-controller/actions-runner-controller/issues/728
return nil, nil
} else if numMetrics > 2 {
return nil, fmt.Errorf("too many autoscaling metrics configured: It must be 0 to 2, but got %d", numMetrics)
@@ -182,7 +138,29 @@ func (r *HorizontalRunnerAutoscalerReconciler) suggestReplicasByQueuedAndInProgr
if len(allJobs) == 0 {
fallback_cb()
} else {
JOB:
for _, job := range allJobs {
runnerLabels := make(map[string]struct{}, len(st.labels))
for _, l := range st.labels {
runnerLabels[l] = struct{}{}
}
if len(job.Labels) == 0 {
// This shouldn't usually happen
r.Log.Info("Detected job with no labels, which is not supported by ARC. Skipping anyway.", "labels", job.Labels, "run_id", job.GetRunID(), "job_id", job.GetID())
continue JOB
}
for _, l := range job.Labels {
if l == "self-hosted" {
continue
}
if _, ok := runnerLabels[l]; !ok {
continue JOB
}
}
switch job.GetStatus() {
case "completed":
// We add a case for `completed` so it is not counted in `unknown`.

View File

@@ -41,8 +41,12 @@ func TestDetermineDesiredReplicas_RepositoryRunner(t *testing.T) {
metav1Now := metav1.Now()
testcases := []struct {
repo string
org string
description string
repo string
org string
labels []string
fixed *int
max *int
min *int
@@ -68,6 +72,19 @@ func TestDetermineDesiredReplicas_RepositoryRunner(t *testing.T) {
workflowRuns_in_progress: `{"total_count": 2, "workflow_runs":[{"status":"in_progress"}, {"status":"in_progress"}]}"`,
want: 3,
},
// Explicitly speified the default `self-hosted` label which is ignored by the simulator,
// as we assume that GitHub Actions automatically associates the `self-hosted` label to every self-hosted runner.
// 3 demanded, max at 3
{
repo: "test/valid",
labels: []string{"self-hosted"},
min: intPtr(2),
max: intPtr(3),
workflowRuns: `{"total_count": 4, "workflow_runs":[{"status":"queued"}, {"status":"in_progress"}, {"status":"in_progress"}, {"status":"completed"}]}"`,
workflowRuns_queued: `{"total_count": 1, "workflow_runs":[{"status":"queued"}]}"`,
workflowRuns_in_progress: `{"total_count": 2, "workflow_runs":[{"status":"in_progress"}, {"status":"in_progress"}]}"`,
want: 3,
},
// 2 demanded, max at 3, currently 3, delay scaling down due to grace period
{
repo: "test/valid",
@@ -152,9 +169,40 @@ func TestDetermineDesiredReplicas_RepositoryRunner(t *testing.T) {
want: 3,
},
// Job-level autoscaling
// 5 requested from 3 workflows
{
description: "Job-level autoscaling with no explicit runner label (runners have implicit self-hosted, requested self-hosted, 5 jobs from 3 workflows)",
repo: "test/valid",
min: intPtr(2),
max: intPtr(10),
workflowRuns: `{"total_count": 4, "workflow_runs":[{"id": 1, "status":"queued"}, {"id": 2, "status":"in_progress"}, {"id": 3, "status":"in_progress"}, {"status":"completed"}]}"`,
workflowRuns_queued: `{"total_count": 1, "workflow_runs":[{"id": 1, "status":"queued"}]}"`,
workflowRuns_in_progress: `{"total_count": 2, "workflow_runs":[{"id": 2, "status":"in_progress"}, {"id": 3, "status":"in_progress"}]}"`,
workflowJobs: map[int]string{
1: `{"jobs": [{"status":"queued", "labels":["self-hosted"]}, {"status":"queued", "labels":["self-hosted"]}]}`,
2: `{"jobs": [{"status": "in_progress", "labels":["self-hosted"]}, {"status":"completed", "labels":["self-hosted"]}]}`,
3: `{"jobs": [{"status": "in_progress", "labels":["self-hosted"]}, {"status":"queued", "labels":["self-hosted"]}]}`,
},
want: 5,
},
{
description: "Skipped job-level autoscaling with no explicit runner label (runners have implicit self-hosted, requested self-hosted+custom, 0 jobs from 3 workflows)",
repo: "test/valid",
min: intPtr(2),
max: intPtr(10),
workflowRuns: `{"total_count": 4, "workflow_runs":[{"id": 1, "status":"queued"}, {"id": 2, "status":"in_progress"}, {"id": 3, "status":"in_progress"}, {"status":"completed"}]}"`,
workflowRuns_queued: `{"total_count": 1, "workflow_runs":[{"id": 1, "status":"queued"}]}"`,
workflowRuns_in_progress: `{"total_count": 2, "workflow_runs":[{"id": 2, "status":"in_progress"}, {"id": 3, "status":"in_progress"}]}"`,
workflowJobs: map[int]string{
1: `{"jobs": [{"status":"queued", "labels":["self-hosted", "custom"]}, {"status":"queued", "labels":["self-hosted", "custom"]}]}`,
2: `{"jobs": [{"status": "in_progress", "labels":["self-hosted", "custom"]}, {"status":"completed", "labels":["self-hosted", "custom"]}]}`,
3: `{"jobs": [{"status": "in_progress", "labels":["self-hosted", "custom"]}, {"status":"queued", "labels":["self-hosted", "custom"]}]}`,
},
want: 2,
},
{
description: "Skipped job-level autoscaling with no label (runners have implicit self-hosted, jobs had no labels, 0 jobs from 3 workflows)",
repo: "test/valid",
min: intPtr(2),
max: intPtr(10),
@@ -166,6 +214,91 @@ func TestDetermineDesiredReplicas_RepositoryRunner(t *testing.T) {
2: `{"jobs": [{"status": "in_progress"}, {"status":"completed"}]}`,
3: `{"jobs": [{"status": "in_progress"}, {"status":"queued"}]}`,
},
want: 2,
},
{
description: "Skipped job-level autoscaling with default runner label (runners have self-hosted only, requested self-hosted+custom, 0 jobs from 3 workflows)",
repo: "test/valid",
labels: []string{"self-hosted"},
min: intPtr(2),
max: intPtr(10),
workflowRuns: `{"total_count": 4, "workflow_runs":[{"id": 1, "status":"queued"}, {"id": 2, "status":"in_progress"}, {"id": 3, "status":"in_progress"}, {"status":"completed"}]}"`,
workflowRuns_queued: `{"total_count": 1, "workflow_runs":[{"id": 1, "status":"queued"}]}"`,
workflowRuns_in_progress: `{"total_count": 2, "workflow_runs":[{"id": 2, "status":"in_progress"}, {"id": 3, "status":"in_progress"}]}"`,
workflowJobs: map[int]string{
1: `{"jobs": [{"status":"queued", "labels":["self-hosted", "custom"]}, {"status":"queued", "labels":["self-hosted", "custom"]}]}`,
2: `{"jobs": [{"status": "in_progress", "labels":["self-hosted", "custom"]}, {"status":"completed", "labels":["self-hosted", "custom"]}]}`,
3: `{"jobs": [{"status": "in_progress", "labels":["self-hosted", "custom"]}, {"status":"queued", "labels":["self-hosted", "custom"]}]}`,
},
want: 2,
},
{
description: "Skipped job-level autoscaling with custom runner label (runners have custom2, requested self-hosted+custom, 0 jobs from 5 workflows",
repo: "test/valid",
labels: []string{"custom2"},
min: intPtr(2),
max: intPtr(10),
workflowRuns: `{"total_count": 4, "workflow_runs":[{"id": 1, "status":"queued"}, {"id": 2, "status":"in_progress"}, {"id": 3, "status":"in_progress"}, {"status":"completed"}]}"`,
workflowRuns_queued: `{"total_count": 1, "workflow_runs":[{"id": 1, "status":"queued"}]}"`,
workflowRuns_in_progress: `{"total_count": 2, "workflow_runs":[{"id": 2, "status":"in_progress"}, {"id": 3, "status":"in_progress"}]}"`,
workflowJobs: map[int]string{
1: `{"jobs": [{"status":"queued", "labels":["self-hosted", "custom"]}, {"status":"queued", "labels":["self-hosted", "custom"]}]}`,
2: `{"jobs": [{"status": "in_progress", "labels":["self-hosted", "custom"]}, {"status":"completed", "labels":["self-hosted", "custom"]}]}`,
3: `{"jobs": [{"status": "in_progress", "labels":["self-hosted", "custom"]}, {"status":"queued", "labels":["self-hosted", "custom"]}]}`,
},
want: 2,
},
{
description: "Skipped job-level autoscaling with default runner label (runners have self-hosted, requested managed-runner-label, 0 jobs from 3 runs)",
repo: "test/valid",
labels: []string{"self-hosted"},
min: intPtr(2),
max: intPtr(10),
workflowRuns: `{"total_count": 4, "workflow_runs":[{"id": 1, "status":"queued"}, {"id": 2, "status":"in_progress"}, {"id": 3, "status":"in_progress"}, {"status":"completed"}]}"`,
workflowRuns_queued: `{"total_count": 1, "workflow_runs":[{"id": 1, "status":"queued"}]}"`,
workflowRuns_in_progress: `{"total_count": 2, "workflow_runs":[{"id": 2, "status":"in_progress"}, {"id": 3, "status":"in_progress"}]}"`,
workflowJobs: map[int]string{
1: `{"jobs": [{"status":"queued", "labels":["managed-runner-label"]}, {"status":"queued", "labels":["managed-runner-label"]}]}`,
2: `{"jobs": [{"status": "in_progress", "labels":["managed-runner-label"]}, {"status":"completed", "labels":["managed-runner-label"]}]}`,
3: `{"jobs": [{"status": "in_progress", "labels":["managed-runner-label"]}, {"status":"queued", "labels":["managed-runner-label"]}]}`,
},
want: 2,
},
{
description: "Job-level autoscaling with default + custom runner label (runners have self-hosted+custom, requested self-hosted+custom, 5 jobs from 3 workflows)",
repo: "test/valid",
labels: []string{"self-hosted", "custom"},
min: intPtr(2),
max: intPtr(10),
workflowRuns: `{"total_count": 4, "workflow_runs":[{"id": 1, "status":"queued"}, {"id": 2, "status":"in_progress"}, {"id": 3, "status":"in_progress"}, {"status":"completed"}]}"`,
workflowRuns_queued: `{"total_count": 1, "workflow_runs":[{"id": 1, "status":"queued"}]}"`,
workflowRuns_in_progress: `{"total_count": 2, "workflow_runs":[{"id": 2, "status":"in_progress"}, {"id": 3, "status":"in_progress"}]}"`,
workflowJobs: map[int]string{
1: `{"jobs": [{"status":"queued", "labels":["self-hosted", "custom"]}, {"status":"queued", "labels":["self-hosted", "custom"]}]}`,
2: `{"jobs": [{"status": "in_progress", "labels":["self-hosted", "custom"]}, {"status":"completed", "labels":["self-hosted", "custom"]}]}`,
3: `{"jobs": [{"status": "in_progress", "labels":["self-hosted", "custom"]}, {"status":"queued", "labels":["self-hosted", "custom"]}]}`,
},
want: 5,
},
{
description: "Job-level autoscaling with custom runner label (runners have custom, requested self-hosted+custom, 5 jobs from 3 workflows)",
repo: "test/valid",
labels: []string{"custom"},
min: intPtr(2),
max: intPtr(10),
workflowRuns: `{"total_count": 4, "workflow_runs":[{"id": 1, "status":"queued"}, {"id": 2, "status":"in_progress"}, {"id": 3, "status":"in_progress"}, {"status":"completed"}]}"`,
workflowRuns_queued: `{"total_count": 1, "workflow_runs":[{"id": 1, "status":"queued"}]}"`,
workflowRuns_in_progress: `{"total_count": 2, "workflow_runs":[{"id": 2, "status":"in_progress"}, {"id": 3, "status":"in_progress"}]}"`,
workflowJobs: map[int]string{
1: `{"jobs": [{"status":"queued", "labels":["self-hosted", "custom"]}, {"status":"queued", "labels":["self-hosted", "custom"]}]}`,
2: `{"jobs": [{"status": "in_progress", "labels":["self-hosted", "custom"]}, {"status":"completed", "labels":["self-hosted", "custom"]}]}`,
3: `{"jobs": [{"status": "in_progress", "labels":["self-hosted", "custom"]}, {"status":"queued", "labels":["self-hosted", "custom"]}]}`,
},
want: 5,
},
}
@@ -181,7 +314,12 @@ func TestDetermineDesiredReplicas_RepositoryRunner(t *testing.T) {
_ = clientgoscheme.AddToScheme(scheme)
_ = v1alpha1.AddToScheme(scheme)
t.Run(fmt.Sprintf("case %d", i), func(t *testing.T) {
testName := fmt.Sprintf("case %d", i)
if tc.description != "" {
testName = tc.description
}
t.Run(testName, func(t *testing.T) {
server := fake.NewServer(
fake.WithListRepositoryWorkflowRunsResponse(200, tc.workflowRuns, tc.workflowRuns_queued, tc.workflowRuns_in_progress),
fake.WithListWorkflowJobsResponse(200, tc.workflowJobs),
@@ -191,9 +329,10 @@ func TestDetermineDesiredReplicas_RepositoryRunner(t *testing.T) {
client := newGithubClient(server)
h := &HorizontalRunnerAutoscalerReconciler{
Log: log,
GitHubClient: client,
Scheme: scheme,
Log: log,
GitHubClient: client,
Scheme: scheme,
DefaultScaleDownDelay: DefaultScaleDownDelay,
}
rd := v1alpha1.RunnerDeployment{
@@ -206,6 +345,7 @@ func TestDetermineDesiredReplicas_RepositoryRunner(t *testing.T) {
Spec: v1alpha1.RunnerSpec{
RunnerConfig: v1alpha1.RunnerConfig{
Repository: tc.repo,
Labels: tc.labels,
},
},
},
@@ -220,6 +360,11 @@ func TestDetermineDesiredReplicas_RepositoryRunner(t *testing.T) {
Spec: v1alpha1.HorizontalRunnerAutoscalerSpec{
MaxReplicas: tc.max,
MinReplicas: tc.min,
Metrics: []v1alpha1.MetricSpec{
{
Type: "TotalNumberOfQueuedAndInProgressWorkflowRuns",
},
},
},
Status: v1alpha1.HorizontalRunnerAutoscalerStatus{
DesiredReplicas: tc.sReplicas,
@@ -234,7 +379,7 @@ func TestDetermineDesiredReplicas_RepositoryRunner(t *testing.T) {
st := h.scaleTargetFromRD(context.Background(), rd)
got, _, _, err := h.computeReplicasWithCache(log, metav1Now.Time, st, hra, minReplicas)
got, err := h.computeReplicasWithCache(log, metav1Now.Time, st, hra, minReplicas)
if err != nil {
if tc.err == "" {
t.Fatalf("unexpected error: expected none, got %v", err)
@@ -258,8 +403,12 @@ func TestDetermineDesiredReplicas_OrganizationalRunner(t *testing.T) {
metav1Now := metav1.Now()
testcases := []struct {
repos []string
org string
description string
repos []string
org string
labels []string
fixed *int
max *int
min *int
@@ -399,9 +548,43 @@ func TestDetermineDesiredReplicas_OrganizationalRunner(t *testing.T) {
err: "validating autoscaling metrics: spec.autoscaling.metrics[].repositoryNames is required and must have one more more entries for organizational runner deployment",
},
// Job-level autoscaling
// 5 requested from 3 workflows
{
description: "Job-level autoscaling (runners have implicit self-hosted, requested self-hosted, 5 jobs from 3 runs)",
org: "test",
repos: []string{"valid"},
min: intPtr(2),
max: intPtr(10),
workflowRuns: `{"total_count": 4, "workflow_runs":[{"id": 1, "status":"queued"}, {"id": 2, "status":"in_progress"}, {"id": 3, "status":"in_progress"}, {"status":"completed"}]}"`,
workflowRuns_queued: `{"total_count": 1, "workflow_runs":[{"id": 1, "status":"queued"}]}"`,
workflowRuns_in_progress: `{"total_count": 2, "workflow_runs":[{"id": 2, "status":"in_progress"}, {"id": 3, "status":"in_progress"}, {"status":"completed"}]}"`,
workflowJobs: map[int]string{
1: `{"jobs": [{"status":"queued", "labels":["self-hosted"]}, {"status":"queued", "labels":["self-hosted"]}]}`,
2: `{"jobs": [{"status": "in_progress", "labels":["self-hosted"]}, {"status":"completed", "labels":["self-hosted"]}]}`,
3: `{"jobs": [{"status": "in_progress", "labels":["self-hosted"]}, {"status":"queued", "labels":["self-hosted"]}]}`,
},
want: 5,
},
{
description: "Job-level autoscaling (runners have explicit self-hosted, requested self-hosted, 5 jobs from 3 runs)",
org: "test",
repos: []string{"valid"},
labels: []string{"self-hosted"},
min: intPtr(2),
max: intPtr(10),
workflowRuns: `{"total_count": 4, "workflow_runs":[{"id": 1, "status":"queued"}, {"id": 2, "status":"in_progress"}, {"id": 3, "status":"in_progress"}, {"status":"completed"}]}"`,
workflowRuns_queued: `{"total_count": 1, "workflow_runs":[{"id": 1, "status":"queued"}]}"`,
workflowRuns_in_progress: `{"total_count": 2, "workflow_runs":[{"id": 2, "status":"in_progress"}, {"id": 3, "status":"in_progress"}, {"status":"completed"}]}"`,
workflowJobs: map[int]string{
1: `{"jobs": [{"status":"queued", "labels":["self-hosted"]}, {"status":"queued", "labels":["self-hosted"]}]}`,
2: `{"jobs": [{"status": "in_progress", "labels":["self-hosted"]}, {"status":"completed", "labels":["self-hosted"]}]}`,
3: `{"jobs": [{"status": "in_progress", "labels":["self-hosted"]}, {"status":"queued", "labels":["self-hosted"]}]}`,
},
want: 5,
},
{
description: "Skipped job-level autoscaling (jobs lack labels, 0 requested from 3 workflows)",
org: "test",
repos: []string{"valid"},
min: intPtr(2),
@@ -414,8 +597,97 @@ func TestDetermineDesiredReplicas_OrganizationalRunner(t *testing.T) {
2: `{"jobs": [{"status": "in_progress"}, {"status":"completed"}]}`,
3: `{"jobs": [{"status": "in_progress"}, {"status":"queued"}]}`,
},
want: 2,
},
{
description: "Skipped job-level autoscaling (runners have valid and implicit self-hosted, requested self-hosted+custom, 0 jobs from 3 runs)",
org: "test",
repos: []string{"valid"},
min: intPtr(2),
max: intPtr(10),
workflowRuns: `{"total_count": 4, "workflow_runs":[{"id": 1, "status":"queued"}, {"id": 2, "status":"in_progress"}, {"id": 3, "status":"in_progress"}, {"status":"completed"}]}"`,
workflowRuns_queued: `{"total_count": 1, "workflow_runs":[{"id": 1, "status":"queued"}]}"`,
workflowRuns_in_progress: `{"total_count": 2, "workflow_runs":[{"id": 2, "status":"in_progress"}, {"id": 3, "status":"in_progress"}, {"status":"completed"}]}"`,
workflowJobs: map[int]string{
1: `{"jobs": [{"status":"queued", "labels":["self-hosted", "custom"]}, {"status":"queued", "labels":["self-hosted", "custom"]}]}`,
2: `{"jobs": [{"status": "in_progress", "labels":["self-hosted", "custom"]}, {"status":"completed", "labels":["self-hosted", "custom"]}]}`,
3: `{"jobs": [{"status": "in_progress", "labels":["self-hosted", "custom"]}, {"status":"queued", "labels":["self-hosted", "custom"]}]}`,
},
want: 2,
},
{
description: "Skipped job-level autoscaling (runners have self-hosted, requested self-hosted+custom, 0 jobs from 3 workflows)",
org: "test",
repos: []string{"valid"},
labels: []string{"self-hosted"},
min: intPtr(2),
max: intPtr(10),
workflowRuns: `{"total_count": 4, "workflow_runs":[{"id": 1, "status":"queued"}, {"id": 2, "status":"in_progress"}, {"id": 3, "status":"in_progress"}, {"status":"completed"}]}"`,
workflowRuns_queued: `{"total_count": 1, "workflow_runs":[{"id": 1, "status":"queued"}]}"`,
workflowRuns_in_progress: `{"total_count": 2, "workflow_runs":[{"id": 2, "status":"in_progress"}, {"id": 3, "status":"in_progress"}, {"status":"completed"}]}"`,
workflowJobs: map[int]string{
1: `{"jobs": [{"status":"queued", "labels":["self-hosted", "custom"]}, {"status":"queued", "labels":["self-hosted", "custom"]}]}`,
2: `{"jobs": [{"status": "in_progress", "labels":["self-hosted", "custom"]}, {"status":"completed", "labels":["self-hosted", "custom"]}]}`,
3: `{"jobs": [{"status": "in_progress", "labels":["self-hosted", "custom"]}, {"status":"queued", "labels":["self-hosted", "custom"]}]}`,
},
want: 2,
},
{
description: "Job-level autoscaling (runners have custom, requested self-hosted+custom, 5 requested from 3 workflows)",
org: "test",
repos: []string{"valid"},
labels: []string{"custom"},
min: intPtr(2),
max: intPtr(10),
workflowRuns: `{"total_count": 4, "workflow_runs":[{"id": 1, "status":"queued"}, {"id": 2, "status":"in_progress"}, {"id": 3, "status":"in_progress"}, {"status":"completed"}]}"`,
workflowRuns_queued: `{"total_count": 1, "workflow_runs":[{"id": 1, "status":"queued"}]}"`,
workflowRuns_in_progress: `{"total_count": 2, "workflow_runs":[{"id": 2, "status":"in_progress"}, {"id": 3, "status":"in_progress"}, {"status":"completed"}]}"`,
workflowJobs: map[int]string{
1: `{"jobs": [{"status":"queued", "labels":["self-hosted", "custom"]}, {"status":"queued", "labels":["self-hosted", "custom"]}]}`,
2: `{"jobs": [{"status": "in_progress", "labels":["self-hosted", "custom"]}, {"status":"completed", "labels":["self-hosted", "custom"]}]}`,
3: `{"jobs": [{"status": "in_progress", "labels":["self-hosted", "custom"]}, {"status":"queued", "labels":["self-hosted", "custom"]}]}`,
},
want: 5,
},
{
description: "Job-level autoscaling (runners have custom, requested custom, 5 requested from 3 workflows)",
org: "test",
repos: []string{"valid"},
labels: []string{"custom"},
min: intPtr(2),
max: intPtr(10),
workflowRuns: `{"total_count": 4, "workflow_runs":[{"id": 1, "status":"queued"}, {"id": 2, "status":"in_progress"}, {"id": 3, "status":"in_progress"}, {"status":"completed"}]}"`,
workflowRuns_queued: `{"total_count": 1, "workflow_runs":[{"id": 1, "status":"queued"}]}"`,
workflowRuns_in_progress: `{"total_count": 2, "workflow_runs":[{"id": 2, "status":"in_progress"}, {"id": 3, "status":"in_progress"}, {"status":"completed"}]}"`,
workflowJobs: map[int]string{
1: `{"jobs": [{"status":"queued", "labels":["custom"]}, {"status":"queued", "labels":["custom"]}]}`,
2: `{"jobs": [{"status": "in_progress", "labels":["custom"]}, {"status":"completed", "labels":["custom"]}]}`,
3: `{"jobs": [{"status": "in_progress", "labels":["custom"]}, {"status":"queued", "labels":["custom"]}]}`,
},
want: 5,
},
{
description: "Skipped job-level autoscaling (specified custom2, 0 requested from 3 workflows)",
org: "test",
repos: []string{"valid"},
labels: []string{"custom2"},
min: intPtr(2),
max: intPtr(10),
workflowRuns: `{"total_count": 4, "workflow_runs":[{"id": 1, "status":"queued"}, {"id": 2, "status":"in_progress"}, {"id": 3, "status":"in_progress"}, {"status":"completed"}]}"`,
workflowRuns_queued: `{"total_count": 1, "workflow_runs":[{"id": 1, "status":"queued"}]}"`,
workflowRuns_in_progress: `{"total_count": 2, "workflow_runs":[{"id": 2, "status":"in_progress"}, {"id": 3, "status":"in_progress"}, {"status":"completed"}]}"`,
workflowJobs: map[int]string{
1: `{"jobs": [{"status":"queued", "labels":["self-hosted", "custom"]}, {"status":"queued", "labels":["self-hosted", "custom"]}]}`,
2: `{"jobs": [{"status": "in_progress", "labels":["self-hosted", "custom"]}, {"status":"completed", "labels":["self-hosted", "custom"]}]}`,
3: `{"jobs": [{"status": "in_progress", "labels":["self-hosted", "custom"]}, {"status":"queued", "labels":["self-hosted", "custom"]}]}`,
},
want: 2,
},
}
for i := range testcases {
@@ -429,7 +701,12 @@ func TestDetermineDesiredReplicas_OrganizationalRunner(t *testing.T) {
_ = clientgoscheme.AddToScheme(scheme)
_ = v1alpha1.AddToScheme(scheme)
t.Run(fmt.Sprintf("case %d", i), func(t *testing.T) {
testName := fmt.Sprintf("case %d", i)
if tc.description != "" {
testName = tc.description
}
t.Run(testName, func(t *testing.T) {
t.Helper()
server := fake.NewServer(
@@ -441,9 +718,10 @@ func TestDetermineDesiredReplicas_OrganizationalRunner(t *testing.T) {
client := newGithubClient(server)
h := &HorizontalRunnerAutoscalerReconciler{
Log: log,
Scheme: scheme,
GitHubClient: client,
Log: log,
Scheme: scheme,
GitHubClient: client,
DefaultScaleDownDelay: DefaultScaleDownDelay,
}
rd := v1alpha1.RunnerDeployment{
@@ -465,6 +743,7 @@ func TestDetermineDesiredReplicas_OrganizationalRunner(t *testing.T) {
Spec: v1alpha1.RunnerSpec{
RunnerConfig: v1alpha1.RunnerConfig{
Organization: tc.org,
Labels: tc.labels,
},
},
},
@@ -502,7 +781,7 @@ func TestDetermineDesiredReplicas_OrganizationalRunner(t *testing.T) {
st := h.scaleTargetFromRD(context.Background(), rd)
got, _, _, err := h.computeReplicasWithCache(log, metav1Now.Time, st, hra, minReplicas)
got, err := h.computeReplicasWithCache(log, metav1Now.Time, st, hra, minReplicas)
if err != nil {
if tc.err == "" {
t.Fatalf("unexpected error: expected none, got %v", err)

66
controllers/constants.go Normal file
View File

@@ -0,0 +1,66 @@
package controllers
import "time"
const (
LabelKeyRunnerSetName = "runnerset-name"
)
const (
// This names requires at least one slash to work.
// See https://github.com/google/knative-gcp/issues/378
runnerPodFinalizerName = "actions.summerwind.dev/runner-pod"
annotationKeyPrefix = "actions-runner/"
AnnotationKeyLastRegistrationCheckTime = "actions-runner-controller/last-registration-check-time"
// AnnotationKeyUnregistrationCompleteTimestamp is the annotation that is added onto the pod once the previously started unregistration process has been completed.
AnnotationKeyUnregistrationCompleteTimestamp = annotationKeyPrefix + "unregistration-complete-timestamp"
// AnnotationKeyRunnerCompletionWaitStartTimestamp is the annotation that is added onto the pod when
// ARC decided to wait until the pod to complete by itself, without the need for ARC to unregister the corresponding runner.
AnnotationKeyRunnerCompletionWaitStartTimestamp = annotationKeyPrefix + "runner-completion-wait-start-timestamp"
// unregistarionStartTimestamp is the annotation that contains the time that the requested unregistration process has been started
AnnotationKeyUnregistrationStartTimestamp = annotationKeyPrefix + "unregistration-start-timestamp"
// AnnotationKeyUnregistrationRequestTimestamp is the annotation that contains the time that the unregistration has been requested.
// This doesn't immediately start the unregistration. Instead, ARC will first check if the runner has already been registered.
// If not, ARC will hold on until the registration to complete first, and only after that it starts the unregistration process.
// This is crucial to avoid a race between ARC marking the runner pod for deletion while the actions-runner registers itself to GitHub, leaving the assigned job
// hang like forever.
AnnotationKeyUnregistrationRequestTimestamp = annotationKeyPrefix + "unregistration-request-timestamp"
AnnotationKeyRunnerID = annotationKeyPrefix + "id"
// This can be any value but a larger value can make an unregistration timeout longer than configured in practice.
DefaultUnregistrationRetryDelay = time.Minute
// RetryDelayOnCreateRegistrationError is the delay between retry attempts for runner registration token creation.
// Usually, a retry in this case happens when e.g. your PAT has no access to certain scope of runners, like you're using repository admin's token
// for creating a broader scoped runner token, like organizationa or enterprise runner token.
// Such permission issue will never fixed automatically, so we don't need to retry so often, hence this value.
RetryDelayOnCreateRegistrationError = 3 * time.Minute
// registrationTimeout is the duration until a pod times out after it becomes Ready and Running.
// A pod that is timed out can be terminated if needed.
registrationTimeout = 10 * time.Minute
defaultRegistrationCheckInterval = time.Minute
// DefaultRunnerPodRecreationDelayAfterWebhookScale is the delay until syncing the runners with the desired replicas
// after a webhook-based scale up.
// This is used to prevent ARC from recreating completed runner pods that are deleted soon without being used at all.
// In other words, this is used as a timer to wait for the completed runner to emit the next `workflow_job` webhook event to decrease the desired replicas.
// So if we set 30 seconds for this, you are basically saying that you would assume GitHub and your installation of ARC to
// emit and propagate a workflow_job completion event down to the RunnerSet or RunnerReplicaSet, vha ARC's github webhook server and HRA, in approximately 30 seconds.
// In case it actually took more than DefaultRunnerPodRecreationDelayAfterWebhookScale for the workflow_job completion event to arrive,
// ARC will recreate the completed runner(s), assuming something went wrong in either GitHub, your K8s cluster, or ARC, so ARC needs to resync anyway.
//
// See https://github.com/actions-runner-controller/actions-runner-controller/pull/1180
DefaultRunnerPodRecreationDelayAfterWebhookScale = 10 * time.Minute
EnvVarRunnerName = "RUNNER_NAME"
EnvVarRunnerToken = "RUNNER_TOKEN"
)

View File

@@ -242,18 +242,23 @@ func (autoscaler *HorizontalRunnerAutoscalerGitHubWebhook) Handle(w http.Respons
enterpriseSlug,
labels,
)
if target != nil {
if e.GetAction() == "queued" {
target.Amount = 1
} else if e.GetAction() == "completed" {
// A nagative amount is processed in the tryScale func as a scale-down request,
// that erasese the oldest CapacityReservation with the same amount.
// If the first CapacityReservation was with Replicas=1, this negative scale target erases that,
// so that the resulting desired replicas decreases by 1.
target.Amount = -1
}
if target == nil {
break
}
if e.GetAction() == "queued" {
target.Amount = 1
break
} else if e.GetAction() == "completed" && e.GetWorkflowJob().GetConclusion() != "skipped" {
// A nagative amount is processed in the tryScale func as a scale-down request,
// that erasese the oldest CapacityReservation with the same amount.
// If the first CapacityReservation was with Replicas=1, this negative scale target erases that,
// so that the resulting desired replicas decreases by 1.
target.Amount = -1
break
}
// If the conclusion is "skipped", we will ignore it and fallthrough to the default case.
fallthrough
default:
ok = true
@@ -351,9 +356,7 @@ func (autoscaler *HorizontalRunnerAutoscalerGitHubWebhook) findHRAsByKey(ctx con
return nil, err
}
for _, d := range hraList.Items {
hras = append(hras, d)
}
hras = append(hras, hraList.Items...)
}
return hras, nil
@@ -614,11 +617,7 @@ func (autoscaler *HorizontalRunnerAutoscalerGitHubWebhook) getManagedRunnerGroup
return nil, fmt.Errorf("unsupported scale target kind: %v", kind)
}
if g == "" {
continue
}
if e == "" && o == "" {
if g != "" && e == "" && o == "" {
autoscaler.Log.V(1).Info(
"invalid runner group config in scale target: spec.group must be set along with either spec.enterprise or spec.organization",
"scaleTargetKind", kind,
@@ -631,6 +630,16 @@ func (autoscaler *HorizontalRunnerAutoscalerGitHubWebhook) getManagedRunnerGroup
}
if e != enterprise && o != org {
autoscaler.Log.V(1).Info(
"Skipped scale target irrelevant to event",
"eventOrganization", org,
"eventEnterprise", enterprise,
"scaleTargetKind", kind,
"scaleTargetGroup", g,
"scaleTargetEnterprise", e,
"scaleTargetOrganization", o,
)
continue
}
@@ -659,16 +668,29 @@ HRA:
if len(hra.Spec.ScaleUpTriggers) > 1 {
autoscaler.Log.V(1).Info("Skipping this HRA as it has too many ScaleUpTriggers to be used in workflow_job based scaling", "hra", hra.Name)
continue
}
if len(hra.Spec.ScaleUpTriggers) == 0 {
autoscaler.Log.V(1).Info("Skipping this HRA as it has no ScaleUpTriggers configured", "hra", hra.Name)
continue
}
scaleUpTrigger := hra.Spec.ScaleUpTriggers[0]
if scaleUpTrigger.GitHubEvent == nil {
autoscaler.Log.V(1).Info("Skipping this HRA as it has no `githubEvent` scale trigger configured", "hra", hra.Name)
continue
}
var duration metav1.Duration
if scaleUpTrigger.GitHubEvent.WorkflowJob == nil {
autoscaler.Log.V(1).Info("Skipping this HRA as it has no `githubEvent.workflowJob` scale trigger configured", "hra", hra.Name)
if len(hra.Spec.ScaleUpTriggers) > 0 {
duration = hra.Spec.ScaleUpTriggers[0].Duration
continue
}
duration := scaleUpTrigger.Duration
if duration.Duration <= 0 {
// Try to release the reserved capacity after at least 10 minutes by default,
// we won't end up in the reserved capacity remained forever in case GitHub somehow stopped sending us "completed" workflow_job events.
@@ -764,8 +786,10 @@ func (autoscaler *HorizontalRunnerAutoscalerGitHubWebhook) tryScale(ctx context.
capacityReservations := getValidCapacityReservations(copy)
if amount > 0 {
now := time.Now()
copy.Spec.CapacityReservations = append(capacityReservations, v1alpha1.CapacityReservation{
ExpirationTime: metav1.Time{Time: time.Now().Add(target.ScaleUpTrigger.Duration.Duration)},
EffectiveTime: metav1.Time{Time: now},
ExpirationTime: metav1.Time{Time: now.Add(target.ScaleUpTrigger.Duration.Duration)},
Replicas: amount,
})
} else if amount < 0 {
@@ -784,10 +808,16 @@ func (autoscaler *HorizontalRunnerAutoscalerGitHubWebhook) tryScale(ctx context.
copy.Spec.CapacityReservations = reservations
}
autoscaler.Log.Info(
"Patching hra for capacityReservations update",
"before", target.HorizontalRunnerAutoscaler.Spec.CapacityReservations,
"after", copy.Spec.CapacityReservations,
before := len(target.HorizontalRunnerAutoscaler.Spec.CapacityReservations)
expired := before - len(capacityReservations)
after := len(copy.Spec.CapacityReservations)
autoscaler.Log.V(1).Info(
fmt.Sprintf("Patching hra %s for capacityReservations update", target.HorizontalRunnerAutoscaler.Name),
"before", before,
"expired", expired,
"amount", amount,
"after", after,
)
if err := autoscaler.Client.Patch(ctx, copy, client.MergeFrom(&target.HorizontalRunnerAutoscaler)); err != nil {
@@ -823,6 +853,7 @@ func (autoscaler *HorizontalRunnerAutoscalerGitHubWebhook) SetupWithManager(mgr
hra := rawObj.(*v1alpha1.HorizontalRunnerAutoscaler)
if hra.Spec.ScaleTargetRef.Name == "" {
autoscaler.Log.V(1).Info(fmt.Sprintf("scale target ref name not set for hra %s", hra.Name))
return nil
}

View File

@@ -138,6 +138,13 @@ func TestWebhookWorkflowJob(t *testing.T) {
ScaleTargetRef: actionsv1alpha1.ScaleTargetRef{
Name: "test-name",
},
ScaleUpTriggers: []actionsv1alpha1.ScaleUpTrigger{
{
GitHubEvent: &actionsv1alpha1.GitHubEventScaleUpTriggerSpec{
WorkflowJob: &actionsv1alpha1.WorkflowJobSpec{},
},
},
},
},
}
@@ -177,6 +184,13 @@ func TestWebhookWorkflowJob(t *testing.T) {
ScaleTargetRef: actionsv1alpha1.ScaleTargetRef{
Name: "test-name",
},
ScaleUpTriggers: []actionsv1alpha1.ScaleUpTrigger{
{
GitHubEvent: &actionsv1alpha1.GitHubEventScaleUpTriggerSpec{
WorkflowJob: &actionsv1alpha1.WorkflowJobSpec{},
},
},
},
},
}
@@ -217,6 +231,13 @@ func TestWebhookWorkflowJob(t *testing.T) {
ScaleTargetRef: actionsv1alpha1.ScaleTargetRef{
Name: "test-name",
},
ScaleUpTriggers: []actionsv1alpha1.ScaleUpTrigger{
{
GitHubEvent: &actionsv1alpha1.GitHubEventScaleUpTriggerSpec{
WorkflowJob: &actionsv1alpha1.WorkflowJobSpec{},
},
},
},
},
}
@@ -277,6 +298,13 @@ func TestWebhookWorkflowJobWithSelfHostedLabel(t *testing.T) {
ScaleTargetRef: actionsv1alpha1.ScaleTargetRef{
Name: "test-name",
},
ScaleUpTriggers: []actionsv1alpha1.ScaleUpTrigger{
{
GitHubEvent: &actionsv1alpha1.GitHubEventScaleUpTriggerSpec{
WorkflowJob: &actionsv1alpha1.WorkflowJobSpec{},
},
},
},
},
}
@@ -316,6 +344,13 @@ func TestWebhookWorkflowJobWithSelfHostedLabel(t *testing.T) {
ScaleTargetRef: actionsv1alpha1.ScaleTargetRef{
Name: "test-name",
},
ScaleUpTriggers: []actionsv1alpha1.ScaleUpTrigger{
{
GitHubEvent: &actionsv1alpha1.GitHubEventScaleUpTriggerSpec{
WorkflowJob: &actionsv1alpha1.WorkflowJobSpec{},
},
},
},
},
}
@@ -356,6 +391,13 @@ func TestWebhookWorkflowJobWithSelfHostedLabel(t *testing.T) {
ScaleTargetRef: actionsv1alpha1.ScaleTargetRef{
Name: "test-name",
},
ScaleUpTriggers: []actionsv1alpha1.ScaleUpTrigger{
{
GitHubEvent: &actionsv1alpha1.GitHubEventScaleUpTriggerSpec{
WorkflowJob: &actionsv1alpha1.WorkflowJobSpec{},
},
},
},
},
}

View File

@@ -47,13 +47,13 @@ const (
// HorizontalRunnerAutoscalerReconciler reconciles a HorizontalRunnerAutoscaler object
type HorizontalRunnerAutoscalerReconciler struct {
client.Client
GitHubClient *github.Client
Log logr.Logger
Recorder record.EventRecorder
Scheme *runtime.Scheme
CacheDuration time.Duration
Name string
GitHubClient *github.Client
Log logr.Logger
Recorder record.EventRecorder
Scheme *runtime.Scheme
CacheDuration time.Duration
DefaultScaleDownDelay time.Duration
Name string
}
const defaultReplicas = 1
@@ -99,11 +99,33 @@ func (r *HorizontalRunnerAutoscalerReconciler) Reconcile(ctx context.Context, re
return r.reconcile(ctx, req, log, hra, st, func(newDesiredReplicas int) error {
currentDesiredReplicas := getIntOrDefault(rd.Spec.Replicas, defaultReplicas)
ephemeral := rd.Spec.Template.Spec.Ephemeral == nil || *rd.Spec.Template.Spec.Ephemeral
var effectiveTime *time.Time
for _, r := range hra.Spec.CapacityReservations {
t := r.EffectiveTime
if effectiveTime == nil || effectiveTime.Before(t.Time) {
effectiveTime = &t.Time
}
}
// Please add more conditions that we can in-place update the newest runnerreplicaset without disruption
if currentDesiredReplicas != newDesiredReplicas {
copy := rd.DeepCopy()
copy.Spec.Replicas = &newDesiredReplicas
if ephemeral && effectiveTime != nil {
copy.Spec.EffectiveTime = &metav1.Time{Time: *effectiveTime}
}
if err := r.Client.Patch(ctx, copy, client.MergeFrom(&rd)); err != nil {
return fmt.Errorf("patching runnerdeployment to have %d replicas: %w", newDesiredReplicas, err)
}
} else if ephemeral && effectiveTime != nil {
copy := rd.DeepCopy()
copy.Spec.EffectiveTime = &metav1.Time{Time: *effectiveTime}
if err := r.Client.Patch(ctx, copy, client.MergeFrom(&rd)); err != nil {
return fmt.Errorf("patching runnerdeployment to have %d replicas: %w", newDesiredReplicas, err)
}
@@ -137,6 +159,7 @@ func (r *HorizontalRunnerAutoscalerReconciler) Reconcile(ctx context.Context, re
org: rs.Spec.Organization,
repo: rs.Spec.Repository,
replicas: replicas,
labels: rs.Spec.RunnerConfig.Labels,
getRunnerMap: func() (map[string]struct{}, error) {
// return the list of runners in namespace. Horizontal Runner Autoscaler should only be responsible for scaling resources in its own ns.
var runnerPodList corev1.PodList
@@ -180,15 +203,38 @@ func (r *HorizontalRunnerAutoscalerReconciler) Reconcile(ctx context.Context, re
}
currentDesiredReplicas := getIntOrDefault(replicas, defaultReplicas)
ephemeral := rs.Spec.Ephemeral == nil || *rs.Spec.Ephemeral
var effectiveTime *time.Time
for _, r := range hra.Spec.CapacityReservations {
t := r.EffectiveTime
if effectiveTime == nil || effectiveTime.Before(t.Time) {
effectiveTime = &t.Time
}
}
if currentDesiredReplicas != newDesiredReplicas {
copy := rs.DeepCopy()
v := int32(newDesiredReplicas)
copy.Spec.Replicas = &v
if ephemeral && effectiveTime != nil {
copy.Spec.EffectiveTime = &metav1.Time{Time: *effectiveTime}
}
if err := r.Client.Patch(ctx, copy, client.MergeFrom(&rs)); err != nil {
return fmt.Errorf("patching runnerset to have %d replicas: %w", newDesiredReplicas, err)
}
} else if ephemeral && effectiveTime != nil {
copy := rs.DeepCopy()
copy.Spec.EffectiveTime = &metav1.Time{Time: *effectiveTime}
if err := r.Client.Patch(ctx, copy, client.MergeFrom(&rs)); err != nil {
return fmt.Errorf("patching runnerset to have %d replicas: %w", newDesiredReplicas, err)
}
}
return nil
})
}
@@ -206,6 +252,7 @@ func (r *HorizontalRunnerAutoscalerReconciler) scaleTargetFromRD(ctx context.Con
org: rd.Spec.Template.Spec.Organization,
repo: rd.Spec.Template.Spec.Repository,
replicas: rd.Spec.Replicas,
labels: rd.Spec.Template.Spec.RunnerConfig.Labels,
getRunnerMap: func() (map[string]struct{}, error) {
// return the list of runners in namespace. Horizontal Runner Autoscaler should only be responsible for scaling resources in its own ns.
var runnerList v1alpha1.RunnerList
@@ -248,6 +295,7 @@ type scaleTarget struct {
st, kind string
enterprise, repo, org string
replicas *int
labels []string
getRunnerMap func() (map[string]struct{}, error)
}
@@ -262,7 +310,7 @@ func (r *HorizontalRunnerAutoscalerReconciler) reconcile(ctx context.Context, re
return ctrl.Result{}, err
}
newDesiredReplicas, computedReplicas, computedReplicasFromCache, err := r.computeReplicasWithCache(log, now, st, hra, minReplicas)
newDesiredReplicas, err := r.computeReplicasWithCache(log, now, st, hra, minReplicas)
if err != nil {
r.Recorder.Event(&hra, corev1.EventTypeNormal, "RunnerAutoscalingFailure", err.Error())
@@ -287,24 +335,6 @@ func (r *HorizontalRunnerAutoscalerReconciler) reconcile(ctx context.Context, re
updated.Status.DesiredReplicas = &newDesiredReplicas
}
if computedReplicasFromCache == nil {
cacheEntries := getValidCacheEntries(updated, now)
var cacheDuration time.Duration
if r.CacheDuration > 0 {
cacheDuration = r.CacheDuration
} else {
cacheDuration = 10 * time.Minute
}
updated.Status.CacheEntries = append(cacheEntries, v1alpha1.CacheEntry{
Key: v1alpha1.CacheEntryKeyDesiredReplicas,
Value: computedReplicas,
ExpirationTime: metav1.Time{Time: time.Now().Add(cacheDuration)},
})
}
var overridesSummary string
if (active != nil && upcoming == nil) || (active != nil && upcoming != nil && active.Period.EndTime.Before(upcoming.Period.StartTime)) {
@@ -339,18 +369,6 @@ func (r *HorizontalRunnerAutoscalerReconciler) reconcile(ctx context.Context, re
return ctrl.Result{}, nil
}
func getValidCacheEntries(hra *v1alpha1.HorizontalRunnerAutoscaler, now time.Time) []v1alpha1.CacheEntry {
var cacheEntries []v1alpha1.CacheEntry
for _, ent := range hra.Status.CacheEntries {
if ent.ExpirationTime.After(now) {
cacheEntries = append(cacheEntries, ent)
}
}
return cacheEntries
}
func (r *HorizontalRunnerAutoscalerReconciler) SetupWithManager(mgr ctrl.Manager) error {
name := "horizontalrunnerautoscaler-controller"
if r.Name != "" {
@@ -443,32 +461,18 @@ func (r *HorizontalRunnerAutoscalerReconciler) getMinReplicas(log logr.Logger, n
return minReplicas, active, upcoming, nil
}
func (r *HorizontalRunnerAutoscalerReconciler) computeReplicasWithCache(log logr.Logger, now time.Time, st scaleTarget, hra v1alpha1.HorizontalRunnerAutoscaler, minReplicas int) (int, int, *int, error) {
func (r *HorizontalRunnerAutoscalerReconciler) computeReplicasWithCache(log logr.Logger, now time.Time, st scaleTarget, hra v1alpha1.HorizontalRunnerAutoscaler, minReplicas int) (int, error) {
var suggestedReplicas int
suggestedReplicasFromCache := r.fetchSuggestedReplicasFromCache(hra)
v, err := r.suggestDesiredReplicas(st, hra)
if err != nil {
return 0, err
}
var cached *int
if suggestedReplicasFromCache != nil {
cached = suggestedReplicasFromCache
if cached == nil {
suggestedReplicas = minReplicas
} else {
suggestedReplicas = *cached
}
if v == nil {
suggestedReplicas = minReplicas
} else {
v, err := r.suggestDesiredReplicas(st, hra)
if err != nil {
return 0, 0, nil, err
}
if v == nil {
suggestedReplicas = minReplicas
} else {
suggestedReplicas = *v
}
suggestedReplicas = *v
}
var reserved int
@@ -496,7 +500,7 @@ func (r *HorizontalRunnerAutoscalerReconciler) computeReplicasWithCache(log logr
if hra.Spec.ScaleDownDelaySecondsAfterScaleUp != nil {
scaleDownDelay = time.Duration(*hra.Spec.ScaleDownDelaySecondsAfterScaleUp) * time.Second
} else {
scaleDownDelay = DefaultScaleDownDelay
scaleDownDelay = r.DefaultScaleDownDelay
}
var scaleDownDelayUntil *time.Time
@@ -527,8 +531,8 @@ func (r *HorizontalRunnerAutoscalerReconciler) computeReplicasWithCache(log logr
"min", minReplicas,
}
if cached != nil {
kvs = append(kvs, "cached", *cached)
if maxReplicas := hra.Spec.MaxReplicas; maxReplicas != nil {
kvs = append(kvs, "max", *maxReplicas)
}
if scaleDownDelayUntil != nil {
@@ -536,13 +540,9 @@ func (r *HorizontalRunnerAutoscalerReconciler) computeReplicasWithCache(log logr
kvs = append(kvs, "scale_down_delay_until", scaleDownDelayUntil)
}
if maxReplicas := hra.Spec.MaxReplicas; maxReplicas != nil {
kvs = append(kvs, "max", *maxReplicas)
}
log.V(1).Info(fmt.Sprintf("Calculated desired replicas of %d", newDesiredReplicas),
kvs...,
)
return newDesiredReplicas, suggestedReplicas, suggestedReplicasFromCache, nil
return newDesiredReplicas, nil
}

View File

@@ -1,50 +0,0 @@
package controllers
import (
"testing"
"time"
actionsv1alpha1 "github.com/actions-runner-controller/actions-runner-controller/api/v1alpha1"
"github.com/google/go-cmp/cmp"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
func TestGetValidCacheEntries(t *testing.T) {
now := time.Now()
hra := &actionsv1alpha1.HorizontalRunnerAutoscaler{
Status: actionsv1alpha1.HorizontalRunnerAutoscalerStatus{
CacheEntries: []actionsv1alpha1.CacheEntry{
{
Key: "foo",
Value: 1,
ExpirationTime: metav1.Time{Time: now.Add(-time.Second)},
},
{
Key: "foo",
Value: 2,
ExpirationTime: metav1.Time{Time: now},
},
{
Key: "foo",
Value: 3,
ExpirationTime: metav1.Time{Time: now.Add(time.Second)},
},
},
},
}
revs := getValidCacheEntries(hra, now)
counts := map[string]int{}
for _, r := range revs {
counts[r.Key] += r.Value
}
want := map[string]int{"foo": 3}
if d := cmp.Diff(want, counts); d != "" {
t.Errorf("%s", d)
}
}

View File

@@ -108,8 +108,9 @@ func SetupIntegrationTest(ctx2 context.Context) *testEnvironment {
RunnerImage: "example/runner:test",
DockerImage: "example/docker:test",
Name: controllerName("runner"),
RegistrationRecheckInterval: time.Millisecond,
RegistrationRecheckJitter: time.Millisecond,
RegistrationRecheckInterval: time.Millisecond * 100,
RegistrationRecheckJitter: time.Millisecond * 10,
UnregistrationRetryDelay: 1 * time.Second,
}
err = runnerController.SetupWithManager(mgr)
Expect(err).NotTo(HaveOccurred(), "failed to setup runner controller")
@@ -268,7 +269,6 @@ var _ = Context("INTEGRATION: Inside of a new namespace", func() {
ExpectRunnerSetsCountEventuallyEquals(ctx, ns.Name, 1)
ExpectRunnerSetsManagedReplicasCountEventuallyEquals(ctx, ns.Name, 2)
ExpectHRAStatusCacheEntryLengthEventuallyEquals(ctx, ns.Name, name, 1)
}
{
@@ -371,7 +371,6 @@ var _ = Context("INTEGRATION: Inside of a new namespace", func() {
ExpectRunnerSetsCountEventuallyEquals(ctx, ns.Name, 1)
ExpectRunnerSetsManagedReplicasCountEventuallyEquals(ctx, ns.Name, 3)
ExpectHRAStatusCacheEntryLengthEventuallyEquals(ctx, ns.Name, name, 1)
}
{
@@ -538,6 +537,106 @@ var _ = Context("INTEGRATION: Inside of a new namespace", func() {
}
})
It("should create and scale organization's repository runners on workflow_job event", func() {
name := "example-runnerdeploy"
{
rd := &actionsv1alpha1.RunnerDeployment{
ObjectMeta: metav1.ObjectMeta{
Name: name,
Namespace: ns.Name,
},
Spec: actionsv1alpha1.RunnerDeploymentSpec{
Replicas: intPtr(1),
Selector: &metav1.LabelSelector{
MatchLabels: map[string]string{
"foo": "bar",
},
},
Template: actionsv1alpha1.RunnerTemplate{
ObjectMeta: metav1.ObjectMeta{
Labels: map[string]string{
"foo": "bar",
},
},
Spec: actionsv1alpha1.RunnerSpec{
RunnerConfig: actionsv1alpha1.RunnerConfig{
Repository: "test/valid",
Image: "bar",
Group: "baz",
},
RunnerPodSpec: actionsv1alpha1.RunnerPodSpec{
Env: []corev1.EnvVar{
{Name: "FOO", Value: "FOOVALUE"},
},
},
},
},
},
}
ExpectCreate(ctx, rd, "test RunnerDeployment")
ExpectRunnerSetsCountEventuallyEquals(ctx, ns.Name, 1)
ExpectRunnerSetsManagedReplicasCountEventuallyEquals(ctx, ns.Name, 1)
env.ExpectRegisteredNumberCountEventuallyEquals(1, "count of fake list runners")
}
// Scale-up to 1 replica via ScaleUpTriggers.GitHubEvent.WorkflowJob based scaling
{
hra := &actionsv1alpha1.HorizontalRunnerAutoscaler{
ObjectMeta: metav1.ObjectMeta{
Name: name,
Namespace: ns.Name,
},
Spec: actionsv1alpha1.HorizontalRunnerAutoscalerSpec{
ScaleTargetRef: actionsv1alpha1.ScaleTargetRef{
Name: name,
},
MinReplicas: intPtr(1),
MaxReplicas: intPtr(5),
ScaleDownDelaySecondsAfterScaleUp: intPtr(1),
ScaleUpTriggers: []actionsv1alpha1.ScaleUpTrigger{
{
GitHubEvent: &actionsv1alpha1.GitHubEventScaleUpTriggerSpec{
WorkflowJob: &actionsv1alpha1.WorkflowJobSpec{},
},
Amount: 1,
Duration: metav1.Duration{Duration: time.Minute},
},
},
},
}
ExpectCreate(ctx, hra, "test HorizontalRunnerAutoscaler")
ExpectRunnerSetsCountEventuallyEquals(ctx, ns.Name, 1)
ExpectRunnerSetsManagedReplicasCountEventuallyEquals(ctx, ns.Name, 1)
env.ExpectRegisteredNumberCountEventuallyEquals(1, "count of fake list runners")
}
// Scale-up to 2 replicas on first workflow_job.queued webhook event
{
env.SendWorkflowJobEvent("test", "valid", "queued", []string{"self-hosted"})
ExpectRunnerSetsManagedReplicasCountEventuallyEquals(ctx, ns.Name, 2, "runners after first webhook event")
env.ExpectRegisteredNumberCountEventuallyEquals(2, "count of fake list runners")
}
// Scale-up to 3 replicas on second workflow_job.queued webhook event
{
env.SendWorkflowJobEvent("test", "valid", "queued", []string{"self-hosted"})
ExpectRunnerSetsManagedReplicasCountEventuallyEquals(ctx, ns.Name, 3, "runners after second webhook event")
env.ExpectRegisteredNumberCountEventuallyEquals(3, "count of fake list runners")
}
// Do not scale-up on third workflow_job.queued webhook event
// repo "example" doesn't match our Spec
{
env.SendWorkflowJobEvent("test", "example", "queued", []string{"self-hosted"})
ExpectRunnerSetsManagedReplicasCountEventuallyEquals(ctx, ns.Name, 3, "runners after third webhook event")
env.ExpectRegisteredNumberCountEventuallyEquals(3, "count of fake list runners")
}
})
It("should create and scale organization's repository runners only on check_run event", func() {
name := "example-runnerdeploy"
@@ -582,9 +681,7 @@ var _ = Context("INTEGRATION: Inside of a new namespace", func() {
env.ExpectRegisteredNumberCountEventuallyEquals(1, "count of fake list runners")
}
// Scale-up to 3 replicas by the default TotalNumberOfQueuedAndInProgressWorkflowRuns-based scaling
// See workflowRunsFor3Replicas_queued and workflowRunsFor3Replicas_in_progress for GitHub List-Runners API responses
// used while testing.
// Scale-up to 1 replica via ScaleUpTriggers.GitHubEvent.CheckRun based scaling
{
hra := &actionsv1alpha1.HorizontalRunnerAutoscaler{
ObjectMeta: metav1.ObjectMeta{
@@ -1131,9 +1228,11 @@ var _ = Context("INTEGRATION: Inside of a new namespace", func() {
ScaleDownDelaySecondsAfterScaleUp: intPtr(1),
ScaleUpTriggers: []actionsv1alpha1.ScaleUpTrigger{
{
GitHubEvent: &actionsv1alpha1.GitHubEventScaleUpTriggerSpec{},
Amount: 1,
Duration: metav1.Duration{Duration: time.Minute},
GitHubEvent: &actionsv1alpha1.GitHubEventScaleUpTriggerSpec{
WorkflowJob: &actionsv1alpha1.WorkflowJobSpec{},
},
Amount: 1,
Duration: metav1.Duration{Duration: time.Minute},
},
},
},
@@ -1151,7 +1250,7 @@ var _ = Context("INTEGRATION: Inside of a new namespace", func() {
// Scale-up to 2 replicas on first workflow_job webhook event
{
env.SendWorkflowJobEvent("test", "valid", "pending", "created", []string{"self-hosted"})
env.SendWorkflowJobEvent("test", "valid", "queued", []string{"self-hosted"})
ExpectRunnerSetsCountEventuallyEquals(ctx, ns.Name, 1, "runner sets after webhook")
ExpectRunnerSetsManagedReplicasCountEventuallyEquals(ctx, ns.Name, 2, "runners after first webhook event")
env.ExpectRegisteredNumberCountEventuallyEquals(2, "count of fake list runners")
@@ -1213,9 +1312,11 @@ var _ = Context("INTEGRATION: Inside of a new namespace", func() {
ScaleDownDelaySecondsAfterScaleUp: intPtr(1),
ScaleUpTriggers: []actionsv1alpha1.ScaleUpTrigger{
{
GitHubEvent: &actionsv1alpha1.GitHubEventScaleUpTriggerSpec{},
Amount: 1,
Duration: metav1.Duration{Duration: time.Minute},
GitHubEvent: &actionsv1alpha1.GitHubEventScaleUpTriggerSpec{
WorkflowJob: &actionsv1alpha1.WorkflowJobSpec{},
},
Amount: 1,
Duration: metav1.Duration{Duration: time.Minute},
},
},
},
@@ -1233,7 +1334,7 @@ var _ = Context("INTEGRATION: Inside of a new namespace", func() {
// Scale-up to 2 replicas on first workflow_job webhook event
{
env.SendWorkflowJobEvent("test", "valid", "pending", "created", []string{"custom-label"})
env.SendWorkflowJobEvent("test", "valid", "queued", []string{"custom-label"})
ExpectRunnerSetsCountEventuallyEquals(ctx, ns.Name, 1, "runner sets after webhook")
ExpectRunnerSetsManagedReplicasCountEventuallyEquals(ctx, ns.Name, 2, "runners after first webhook event")
env.ExpectRegisteredNumberCountEventuallyEquals(2, "count of fake list runners")
@@ -1243,21 +1344,6 @@ var _ = Context("INTEGRATION: Inside of a new namespace", func() {
})
})
func ExpectHRAStatusCacheEntryLengthEventuallyEquals(ctx context.Context, ns string, name string, value int, optionalDescriptions ...interface{}) {
EventuallyWithOffset(
1,
func() int {
var hra actionsv1alpha1.HorizontalRunnerAutoscaler
err := k8sClient.Get(ctx, types.NamespacedName{Namespace: ns, Name: name}, &hra)
ExpectWithOffset(1, err).NotTo(HaveOccurred(), "failed to get test HRA resource")
return len(hra.Status.CacheEntries)
},
time.Second*5, time.Millisecond*500).Should(Equal(value), optionalDescriptions...)
}
func ExpectHRADesiredReplicasEquals(ctx context.Context, ns, name string, desired int, optionalDescriptions ...interface{}) {
var rd actionsv1alpha1.HorizontalRunnerAutoscaler
@@ -1329,6 +1415,30 @@ func (env *testEnvironment) SendOrgCheckRunEvent(org, repo, status, action strin
ExpectWithOffset(1, resp.StatusCode).To(Equal(200))
}
func (env *testEnvironment) SendWorkflowJobEvent(org, repo, statusAndAction string, labels []string) {
resp, err := sendWebhook(env.webhookServer, "workflow_job", &github.WorkflowJobEvent{
WorkflowJob: &github.WorkflowJob{
Status: &statusAndAction,
Labels: labels,
},
Org: &github.Organization{
Login: github.String(org),
},
Repo: &github.Repository{
Name: github.String(repo),
Owner: &github.User{
Login: github.String(org),
Type: github.String("Organization"),
},
},
Action: github.String(statusAndAction),
})
ExpectWithOffset(1, err).NotTo(HaveOccurred(), "failed to send workflow_job event")
ExpectWithOffset(1, resp.StatusCode).To(Equal(200))
}
func (env *testEnvironment) SendUserPullRequestEvent(owner, repo, branch, action string) {
resp, err := sendWebhook(env.webhookServer, "pull_request", &github.PullRequestEvent{
PullRequest: &github.PullRequest{
@@ -1371,28 +1481,6 @@ func (env *testEnvironment) SendUserCheckRunEvent(owner, repo, status, action st
ExpectWithOffset(1, resp.StatusCode).To(Equal(200))
}
func (env *testEnvironment) SendWorkflowJobEvent(owner, repo, status, action string, labels []string) {
resp, err := sendWebhook(env.webhookServer, "workflow_job", &github.WorkflowJobEvent{
Org: &github.Organization{
Name: github.String(owner),
},
WorkflowJob: &github.WorkflowJob{
Labels: labels,
},
Action: github.String("queued"),
Repo: &github.Repository{
Name: github.String(repo),
Owner: &github.User{
Login: github.String(owner),
Type: github.String("Organization"),
},
},
})
ExpectWithOffset(1, err).NotTo(HaveOccurred(), "failed to send check_run event")
ExpectWithOffset(1, resp.StatusCode).To(Equal(200))
}
func (env *testEnvironment) SyncRunnerRegistrations() {
var runnerList actionsv1alpha1.RunnerList

View File

@@ -0,0 +1,940 @@
package controllers
import (
"testing"
arcv1alpha1 "github.com/actions-runner-controller/actions-runner-controller/api/v1alpha1"
"github.com/actions-runner-controller/actions-runner-controller/github"
"github.com/stretchr/testify/require"
corev1 "k8s.io/api/core/v1"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/runtime"
clientgoscheme "k8s.io/client-go/kubernetes/scheme"
)
func TestNewRunnerPod(t *testing.T) {
type testcase struct {
description string
template corev1.Pod
config arcv1alpha1.RunnerConfig
want corev1.Pod
}
base := corev1.Pod{
ObjectMeta: metav1.ObjectMeta{
Labels: map[string]string{
"actions-runner-controller/inject-registration-token": "true",
"runnerset-name": "runner",
},
},
Spec: corev1.PodSpec{
Volumes: []corev1.Volume{
{
Name: "runner",
VolumeSource: corev1.VolumeSource{
EmptyDir: &corev1.EmptyDirVolumeSource{},
},
},
{
Name: "work",
VolumeSource: corev1.VolumeSource{
EmptyDir: &corev1.EmptyDirVolumeSource{},
},
},
{
Name: "certs-client",
VolumeSource: corev1.VolumeSource{
EmptyDir: &corev1.EmptyDirVolumeSource{},
},
},
},
Containers: []corev1.Container{
{
Name: "runner",
Image: "default-runner-image",
Env: []corev1.EnvVar{
{
Name: "RUNNER_ORG",
Value: "",
},
{
Name: "RUNNER_REPO",
Value: "",
},
{
Name: "RUNNER_ENTERPRISE",
Value: "",
},
{
Name: "RUNNER_LABELS",
Value: "",
},
{
Name: "RUNNER_GROUP",
Value: "",
},
{
Name: "DOCKER_ENABLED",
Value: "true",
},
{
Name: "DOCKERD_IN_RUNNER",
Value: "false",
},
{
Name: "GITHUB_URL",
Value: "api.github.com",
},
{
Name: "RUNNER_WORKDIR",
Value: "/runner/_work",
},
{
Name: "RUNNER_EPHEMERAL",
Value: "true",
},
{
Name: "DOCKER_HOST",
Value: "tcp://localhost:2376",
},
{
Name: "DOCKER_TLS_VERIFY",
Value: "1",
},
{
Name: "DOCKER_CERT_PATH",
Value: "/certs/client",
},
{
Name: "RUNNER_FEATURE_FLAG_EPHEMERAL",
Value: "true",
},
},
VolumeMounts: []corev1.VolumeMount{
{
Name: "runner",
MountPath: "/runner",
},
{
Name: "work",
MountPath: "/runner/_work",
},
{
Name: "certs-client",
MountPath: "/certs/client",
ReadOnly: true,
},
},
ImagePullPolicy: corev1.PullAlways,
SecurityContext: &corev1.SecurityContext{
Privileged: func() *bool { v := false; return &v }(),
},
},
{
Name: "docker",
Image: "default-docker-image",
Env: []corev1.EnvVar{
{
Name: "DOCKER_TLS_CERTDIR",
Value: "/certs",
},
},
VolumeMounts: []corev1.VolumeMount{
{
Name: "runner",
MountPath: "/runner",
},
{
Name: "certs-client",
MountPath: "/certs/client",
},
{
Name: "work",
MountPath: "/runner/_work",
},
},
SecurityContext: &corev1.SecurityContext{
Privileged: func(b bool) *bool { return &b }(true),
},
},
},
RestartPolicy: corev1.RestartPolicyOnFailure,
},
}
boolPtr := func(v bool) *bool {
return &v
}
dinrBase := corev1.Pod{
ObjectMeta: metav1.ObjectMeta{
Labels: map[string]string{
"actions-runner-controller/inject-registration-token": "true",
"runnerset-name": "runner",
},
},
Spec: corev1.PodSpec{
Volumes: []corev1.Volume{
{
Name: "runner",
VolumeSource: corev1.VolumeSource{
EmptyDir: &corev1.EmptyDirVolumeSource{},
},
},
},
Containers: []corev1.Container{
{
Name: "runner",
Image: "default-runner-image",
Env: []corev1.EnvVar{
{
Name: "RUNNER_ORG",
Value: "",
},
{
Name: "RUNNER_REPO",
Value: "",
},
{
Name: "RUNNER_ENTERPRISE",
Value: "",
},
{
Name: "RUNNER_LABELS",
Value: "",
},
{
Name: "RUNNER_GROUP",
Value: "",
},
{
Name: "DOCKER_ENABLED",
Value: "true",
},
{
Name: "DOCKERD_IN_RUNNER",
Value: "true",
},
{
Name: "GITHUB_URL",
Value: "api.github.com",
},
{
Name: "RUNNER_WORKDIR",
Value: "/runner/_work",
},
{
Name: "RUNNER_EPHEMERAL",
Value: "true",
},
{
Name: "RUNNER_FEATURE_FLAG_EPHEMERAL",
Value: "true",
},
},
VolumeMounts: []corev1.VolumeMount{
{
Name: "runner",
MountPath: "/runner",
},
},
ImagePullPolicy: corev1.PullAlways,
SecurityContext: &corev1.SecurityContext{
Privileged: boolPtr(true),
},
},
},
RestartPolicy: corev1.RestartPolicyOnFailure,
},
}
dockerDisabled := corev1.Pod{
ObjectMeta: metav1.ObjectMeta{
Labels: map[string]string{
"actions-runner-controller/inject-registration-token": "true",
"runnerset-name": "runner",
},
},
Spec: corev1.PodSpec{
Volumes: []corev1.Volume{
{
Name: "runner",
VolumeSource: corev1.VolumeSource{
EmptyDir: &corev1.EmptyDirVolumeSource{},
},
},
},
Containers: []corev1.Container{
{
Name: "runner",
Image: "default-runner-image",
Env: []corev1.EnvVar{
{
Name: "RUNNER_ORG",
Value: "",
},
{
Name: "RUNNER_REPO",
Value: "",
},
{
Name: "RUNNER_ENTERPRISE",
Value: "",
},
{
Name: "RUNNER_LABELS",
Value: "",
},
{
Name: "RUNNER_GROUP",
Value: "",
},
{
Name: "DOCKER_ENABLED",
Value: "false",
},
{
Name: "DOCKERD_IN_RUNNER",
Value: "false",
},
{
Name: "GITHUB_URL",
Value: "api.github.com",
},
{
Name: "RUNNER_WORKDIR",
Value: "/runner/_work",
},
{
Name: "RUNNER_EPHEMERAL",
Value: "true",
},
{
Name: "RUNNER_FEATURE_FLAG_EPHEMERAL",
Value: "true",
},
},
VolumeMounts: []corev1.VolumeMount{
{
Name: "runner",
MountPath: "/runner",
},
},
ImagePullPolicy: corev1.PullAlways,
SecurityContext: &corev1.SecurityContext{
Privileged: boolPtr(false),
},
},
},
RestartPolicy: corev1.RestartPolicyOnFailure,
},
}
newTestPod := func(base corev1.Pod, f func(*corev1.Pod)) corev1.Pod {
pod := base.DeepCopy()
if f != nil {
f(pod)
}
return *pod
}
testcases := []testcase{
{
description: "it should have unprivileged runner and privileged sidecar docker container",
template: corev1.Pod{},
config: arcv1alpha1.RunnerConfig{},
want: newTestPod(base, nil),
},
{
description: "dockerdWithinRunnerContainer=true should set privileged=true and omit the dind sidecar container",
template: corev1.Pod{},
config: arcv1alpha1.RunnerConfig{
DockerdWithinRunnerContainer: boolPtr(true),
},
want: newTestPod(dinrBase, nil),
},
{
description: "in the default config you should provide both dockerdWithinRunnerContainer=true and runnerImage",
template: corev1.Pod{},
config: arcv1alpha1.RunnerConfig{
DockerdWithinRunnerContainer: boolPtr(true),
Image: "dind-runner-image",
},
want: newTestPod(dinrBase, func(p *corev1.Pod) {
p.Spec.Containers[0].Image = "dind-runner-image"
}),
},
{
description: "dockerEnabled=false should have no effect when dockerdWithinRunnerContainer=true",
template: corev1.Pod{},
config: arcv1alpha1.RunnerConfig{
DockerdWithinRunnerContainer: boolPtr(true),
DockerEnabled: boolPtr(false),
},
want: newTestPod(dinrBase, nil),
},
{
description: "dockerEnabled=false should omit the dind sidecar and set privileged=false and envvars DOCKER_ENABLED=false and DOCKERD_IN_RUNNER=false",
template: corev1.Pod{},
config: arcv1alpha1.RunnerConfig{
DockerEnabled: boolPtr(false),
},
want: newTestPod(dockerDisabled, nil),
},
{
description: "TODO: dockerEnabled=false results in privileged=false by default but you can override it",
template: corev1.Pod{
Spec: corev1.PodSpec{
Containers: []corev1.Container{
{
Name: "runner",
SecurityContext: &corev1.SecurityContext{
Privileged: boolPtr(true),
},
},
},
},
},
config: arcv1alpha1.RunnerConfig{
DockerEnabled: boolPtr(false),
},
want: newTestPod(dockerDisabled, func(p *corev1.Pod) {
// TODO
// p.Spec.Containers[0].SecurityContext.Privileged = boolPtr(true)
}),
},
}
var (
defaultRunnerImage = "default-runner-image"
defaultRunnerImagePullSecrets = []string{}
defaultDockerImage = "default-docker-image"
defaultDockerRegistryMirror = ""
githubBaseURL = "api.github.com"
)
for i := range testcases {
tc := testcases[i]
t.Run(tc.description, func(t *testing.T) {
got, err := newRunnerPod("runner", tc.template, tc.config, defaultRunnerImage, defaultRunnerImagePullSecrets, defaultDockerImage, defaultDockerRegistryMirror, githubBaseURL, false)
require.NoError(t, err)
require.Equal(t, tc.want, got)
})
}
}
func TestNewRunnerPodFromRunnerController(t *testing.T) {
type testcase struct {
description string
runner arcv1alpha1.Runner
want corev1.Pod
}
boolPtr := func(v bool) *bool {
return &v
}
base := corev1.Pod{
ObjectMeta: metav1.ObjectMeta{
Name: "runner",
Labels: map[string]string{
"actions-runner-controller/inject-registration-token": "true",
"pod-template-hash": "8857b86c7",
"runnerset-name": "runner",
},
OwnerReferences: []metav1.OwnerReference{
{
APIVersion: "actions.summerwind.dev/v1alpha1",
Kind: "Runner",
Name: "runner",
Controller: boolPtr(true),
BlockOwnerDeletion: boolPtr(true),
},
},
},
Spec: corev1.PodSpec{
Volumes: []corev1.Volume{
{
Name: "runner",
VolumeSource: corev1.VolumeSource{
EmptyDir: &corev1.EmptyDirVolumeSource{},
},
},
{
Name: "work",
VolumeSource: corev1.VolumeSource{
EmptyDir: &corev1.EmptyDirVolumeSource{},
},
},
{
Name: "certs-client",
VolumeSource: corev1.VolumeSource{
EmptyDir: &corev1.EmptyDirVolumeSource{},
},
},
},
Containers: []corev1.Container{
{
Name: "runner",
Image: "default-runner-image",
Env: []corev1.EnvVar{
{
Name: "RUNNER_ORG",
Value: "",
},
{
Name: "RUNNER_REPO",
Value: "",
},
{
Name: "RUNNER_ENTERPRISE",
Value: "",
},
{
Name: "RUNNER_LABELS",
Value: "",
},
{
Name: "RUNNER_GROUP",
Value: "",
},
{
Name: "DOCKER_ENABLED",
Value: "true",
},
{
Name: "DOCKERD_IN_RUNNER",
Value: "false",
},
{
Name: "GITHUB_URL",
Value: "api.github.com",
},
{
Name: "RUNNER_WORKDIR",
Value: "/runner/_work",
},
{
Name: "RUNNER_EPHEMERAL",
Value: "true",
},
{
Name: "DOCKER_HOST",
Value: "tcp://localhost:2376",
},
{
Name: "DOCKER_TLS_VERIFY",
Value: "1",
},
{
Name: "DOCKER_CERT_PATH",
Value: "/certs/client",
},
{
Name: "RUNNER_FEATURE_FLAG_EPHEMERAL",
Value: "true",
},
{
Name: "RUNNER_NAME",
Value: "runner",
},
{
Name: "RUNNER_TOKEN",
Value: "",
},
},
VolumeMounts: []corev1.VolumeMount{
{
Name: "runner",
MountPath: "/runner",
},
{
Name: "work",
MountPath: "/runner/_work",
},
{
Name: "certs-client",
MountPath: "/certs/client",
ReadOnly: true,
},
},
ImagePullPolicy: corev1.PullAlways,
SecurityContext: &corev1.SecurityContext{
Privileged: func() *bool { v := false; return &v }(),
},
},
{
Name: "docker",
Image: "default-docker-image",
Env: []corev1.EnvVar{
{
Name: "DOCKER_TLS_CERTDIR",
Value: "/certs",
},
},
VolumeMounts: []corev1.VolumeMount{
{
Name: "runner",
MountPath: "/runner",
},
{
Name: "certs-client",
MountPath: "/certs/client",
},
{
Name: "work",
MountPath: "/runner/_work",
},
},
SecurityContext: &corev1.SecurityContext{
Privileged: func(b bool) *bool { return &b }(true),
},
},
},
RestartPolicy: corev1.RestartPolicyOnFailure,
},
}
dinrBase := corev1.Pod{
ObjectMeta: metav1.ObjectMeta{
Name: "runner",
Labels: map[string]string{
"actions-runner-controller/inject-registration-token": "true",
"pod-template-hash": "8857b86c7",
"runnerset-name": "runner",
},
OwnerReferences: []metav1.OwnerReference{
{
APIVersion: "actions.summerwind.dev/v1alpha1",
Kind: "Runner",
Name: "runner",
Controller: boolPtr(true),
BlockOwnerDeletion: boolPtr(true),
},
},
},
Spec: corev1.PodSpec{
Volumes: []corev1.Volume{
{
Name: "runner",
VolumeSource: corev1.VolumeSource{
EmptyDir: &corev1.EmptyDirVolumeSource{},
},
},
},
Containers: []corev1.Container{
{
Name: "runner",
Image: "default-runner-image",
Env: []corev1.EnvVar{
{
Name: "RUNNER_ORG",
Value: "",
},
{
Name: "RUNNER_REPO",
Value: "",
},
{
Name: "RUNNER_ENTERPRISE",
Value: "",
},
{
Name: "RUNNER_LABELS",
Value: "",
},
{
Name: "RUNNER_GROUP",
Value: "",
},
{
Name: "DOCKER_ENABLED",
Value: "true",
},
{
Name: "DOCKERD_IN_RUNNER",
Value: "true",
},
{
Name: "GITHUB_URL",
Value: "api.github.com",
},
{
Name: "RUNNER_WORKDIR",
Value: "/runner/_work",
},
{
Name: "RUNNER_EPHEMERAL",
Value: "true",
},
{
Name: "RUNNER_FEATURE_FLAG_EPHEMERAL",
Value: "true",
},
{
Name: "RUNNER_NAME",
Value: "runner",
},
{
Name: "RUNNER_TOKEN",
Value: "",
},
},
VolumeMounts: []corev1.VolumeMount{
{
Name: "runner",
MountPath: "/runner",
},
},
ImagePullPolicy: corev1.PullAlways,
SecurityContext: &corev1.SecurityContext{
Privileged: boolPtr(true),
},
},
},
RestartPolicy: corev1.RestartPolicyOnFailure,
},
}
dockerDisabled := corev1.Pod{
ObjectMeta: metav1.ObjectMeta{
Name: "runner",
Labels: map[string]string{
"actions-runner-controller/inject-registration-token": "true",
"pod-template-hash": "8857b86c7",
"runnerset-name": "runner",
},
OwnerReferences: []metav1.OwnerReference{
{
APIVersion: "actions.summerwind.dev/v1alpha1",
Kind: "Runner",
Name: "runner",
Controller: boolPtr(true),
BlockOwnerDeletion: boolPtr(true),
},
},
},
Spec: corev1.PodSpec{
Volumes: []corev1.Volume{
{
Name: "runner",
VolumeSource: corev1.VolumeSource{
EmptyDir: &corev1.EmptyDirVolumeSource{},
},
},
},
Containers: []corev1.Container{
{
Name: "runner",
Image: "default-runner-image",
Env: []corev1.EnvVar{
{
Name: "RUNNER_ORG",
Value: "",
},
{
Name: "RUNNER_REPO",
Value: "",
},
{
Name: "RUNNER_ENTERPRISE",
Value: "",
},
{
Name: "RUNNER_LABELS",
Value: "",
},
{
Name: "RUNNER_GROUP",
Value: "",
},
{
Name: "DOCKER_ENABLED",
Value: "false",
},
{
Name: "DOCKERD_IN_RUNNER",
Value: "false",
},
{
Name: "GITHUB_URL",
Value: "api.github.com",
},
{
Name: "RUNNER_WORKDIR",
Value: "/runner/_work",
},
{
Name: "RUNNER_EPHEMERAL",
Value: "true",
},
{
Name: "RUNNER_FEATURE_FLAG_EPHEMERAL",
Value: "true",
},
{
Name: "RUNNER_NAME",
Value: "runner",
},
{
Name: "RUNNER_TOKEN",
Value: "",
},
},
VolumeMounts: []corev1.VolumeMount{
{
Name: "runner",
MountPath: "/runner",
},
},
ImagePullPolicy: corev1.PullAlways,
SecurityContext: &corev1.SecurityContext{
Privileged: boolPtr(false),
},
},
},
RestartPolicy: corev1.RestartPolicyOnFailure,
},
}
newTestPod := func(base corev1.Pod, f func(*corev1.Pod)) corev1.Pod {
pod := base.DeepCopy()
if f != nil {
f(pod)
}
return *pod
}
testcases := []testcase{
{
description: "it should have unprivileged runner and privileged sidecar docker container",
runner: arcv1alpha1.Runner{
ObjectMeta: metav1.ObjectMeta{
Name: "runner",
},
Spec: arcv1alpha1.RunnerSpec{
RunnerConfig: arcv1alpha1.RunnerConfig{},
},
},
want: newTestPod(base, nil),
},
{
description: "dockerdWithinRunnerContainer=true should set privileged=true and omit the dind sidecar container",
runner: arcv1alpha1.Runner{
ObjectMeta: metav1.ObjectMeta{
Name: "runner",
},
Spec: arcv1alpha1.RunnerSpec{
RunnerConfig: arcv1alpha1.RunnerConfig{
DockerdWithinRunnerContainer: boolPtr(true),
},
},
},
want: newTestPod(dinrBase, nil),
},
{
description: "in the default config you should provide both dockerdWithinRunnerContainer=true and runnerImage",
runner: arcv1alpha1.Runner{
ObjectMeta: metav1.ObjectMeta{
Name: "runner",
},
Spec: arcv1alpha1.RunnerSpec{
RunnerConfig: arcv1alpha1.RunnerConfig{
DockerdWithinRunnerContainer: boolPtr(true),
Image: "dind-runner-image",
},
},
},
want: newTestPod(dinrBase, func(p *corev1.Pod) {
p.Spec.Containers[0].Image = "dind-runner-image"
}),
},
{
description: "dockerEnabled=false should have no effect when dockerdWithinRunnerContainer=true",
runner: arcv1alpha1.Runner{
ObjectMeta: metav1.ObjectMeta{
Name: "runner",
},
Spec: arcv1alpha1.RunnerSpec{
RunnerConfig: arcv1alpha1.RunnerConfig{
DockerdWithinRunnerContainer: boolPtr(true),
DockerEnabled: boolPtr(false),
},
},
},
want: newTestPod(dinrBase, nil),
},
{
description: "dockerEnabled=false should omit the dind sidecar and set privileged=false and envvars DOCKER_ENABLED=false and DOCKERD_IN_RUNNER=false",
runner: arcv1alpha1.Runner{
ObjectMeta: metav1.ObjectMeta{
Name: "runner",
},
Spec: arcv1alpha1.RunnerSpec{
RunnerConfig: arcv1alpha1.RunnerConfig{
DockerEnabled: boolPtr(false),
},
},
},
want: newTestPod(dockerDisabled, nil),
},
{
description: "TODO: dockerEnabled=false results in privileged=false by default but you can override it",
runner: arcv1alpha1.Runner{
ObjectMeta: metav1.ObjectMeta{
Name: "runner",
},
Spec: arcv1alpha1.RunnerSpec{
RunnerConfig: arcv1alpha1.RunnerConfig{
DockerEnabled: boolPtr(false),
},
RunnerPodSpec: arcv1alpha1.RunnerPodSpec{
Containers: []corev1.Container{
{
Name: "runner",
SecurityContext: &corev1.SecurityContext{
Privileged: boolPtr(true),
},
},
},
},
},
},
want: newTestPod(dockerDisabled, func(p *corev1.Pod) {
// p.Spec.Containers[0].SecurityContext.Privileged = boolPtr(true)
}),
},
}
var (
defaultRunnerImage = "default-runner-image"
defaultRunnerImagePullSecrets = []string{}
defaultDockerImage = "default-docker-image"
defaultDockerRegistryMirror = ""
githubBaseURL = "api.github.com"
)
scheme := runtime.NewScheme()
_ = clientgoscheme.AddToScheme(scheme)
_ = arcv1alpha1.AddToScheme(scheme)
for i := range testcases {
tc := testcases[i]
t.Run(tc.description, func(t *testing.T) {
r := &RunnerReconciler{
RunnerImage: defaultRunnerImage,
RunnerImagePullSecrets: defaultRunnerImagePullSecrets,
DockerImage: defaultDockerImage,
DockerRegistryMirror: defaultDockerRegistryMirror,
GitHubClient: &github.Client{GithubBaseURL: githubBaseURL},
Scheme: scheme,
}
got, err := r.newPod(tc.runner)
require.NoError(t, err)
require.Equal(t, tc.want, got)
})
}
}

View File

@@ -59,9 +59,9 @@ func (t *PodRunnerTokenInjector) Handle(ctx context.Context, req admission.Reque
return newEmptyResponse()
}
enterprise, okEnterprise := getEnv(runnerContainer, "RUNNER_ENTERPRISE")
repo, okRepo := getEnv(runnerContainer, "RUNNER_REPO")
org, okOrg := getEnv(runnerContainer, "RUNNER_ORG")
enterprise, okEnterprise := getEnv(runnerContainer, EnvVarEnterprise)
repo, okRepo := getEnv(runnerContainer, EnvVarRepo)
org, okOrg := getEnv(runnerContainer, EnvVarOrg)
if !okRepo || !okOrg || !okEnterprise {
return newEmptyResponse()
}

View File

@@ -18,15 +18,12 @@ package controllers
import (
"context"
"errors"
"fmt"
"strings"
"time"
"github.com/actions-runner-controller/actions-runner-controller/hash"
"github.com/go-logr/logr"
gogithub "github.com/google/go-github/v39/github"
"k8s.io/apimachinery/pkg/util/wait"
kerrors "k8s.io/apimachinery/pkg/api/errors"
"k8s.io/apimachinery/pkg/runtime"
@@ -53,9 +50,12 @@ const (
// This is an annotation internal to actions-runner-controller and can change in backward-incompatible ways
annotationKeyRegistrationOnly = "actions-runner-controller/registration-only"
EnvVarOrg = "RUNNER_ORG"
EnvVarRepo = "RUNNER_REPO"
EnvVarEnterprise = "RUNNER_ENTERPRISE"
EnvVarOrg = "RUNNER_ORG"
EnvVarRepo = "RUNNER_REPO"
EnvVarEnterprise = "RUNNER_ENTERPRISE"
EnvVarEphemeral = "RUNNER_EPHEMERAL"
EnvVarRunnerFeatureFlagEphemeral = "RUNNER_FEATURE_FLAG_EPHEMERAL"
EnvVarTrue = "true"
)
// RunnerReconciler reconciles a Runner object
@@ -72,6 +72,8 @@ type RunnerReconciler struct {
Name string
RegistrationRecheckInterval time.Duration
RegistrationRecheckJitter time.Duration
UnregistrationRetryDelay time.Duration
}
// +kubebuilder:rbac:groups=actions.summerwind.dev,resources=runners,verbs=get;list;watch;create;update;patch;delete
@@ -89,12 +91,6 @@ func (r *RunnerReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctr
return ctrl.Result{}, client.IgnoreNotFound(err)
}
err := runner.Validate()
if err != nil {
log.Info("Failed to validate runner spec", "error", err.Error())
return ctrl.Result{}, nil
}
if runner.ObjectMeta.DeletionTimestamp.IsZero() {
finalizers, added := addFinalizer(runner.ObjectMeta.Finalizers, finalizerName)
@@ -111,35 +107,16 @@ func (r *RunnerReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctr
}
} else {
// Request to remove a runner. DeletionTimestamp was set in the runner - we need to unregister runner
return r.processRunnerDeletion(runner, ctx, log)
}
registrationOnly := metav1.HasAnnotation(runner.ObjectMeta, annotationKeyRegistrationOnly)
if registrationOnly && runner.Status.Phase != "" {
// At this point we are sure that the registration-only runner has successfully configured and
// is of `offline` status, because we set runner.Status.Phase to that of the runner pod only after
// successful registration.
var pod corev1.Pod
if err := r.Get(ctx, req.NamespacedName, &pod); err != nil {
if !kerrors.IsNotFound(err) {
log.Info(fmt.Sprintf("Retrying soon as we failed to get registration-only runner pod: %v", err))
return ctrl.Result{Requeue: true}, nil
}
} else if err := r.Delete(ctx, &pod); err != nil {
if !kerrors.IsNotFound(err) {
log.Info(fmt.Sprintf("Retrying soon as we failed to delete registration-only runner pod: %v", err))
log.Info(fmt.Sprintf("Retrying soon as we failed to get runner pod: %v", err))
return ctrl.Result{Requeue: true}, nil
}
// Pod was not found
return r.processRunnerDeletion(runner, ctx, log, nil)
}
log.Info("Successfully deleted registration-only runner pod to free node and cluster resource")
// Return here to not recreate the deleted pod, because recreating it is the waste of cluster and node resource,
// and also defeats the original purpose of scale-from/to-zero we're trying to implement by using the registration-only runner.
return ctrl.Result{}, nil
return r.processRunnerDeletion(runner, ctx, log, &pod)
}
var pod corev1.Pod
@@ -151,15 +128,67 @@ func (r *RunnerReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctr
return r.processRunnerCreation(ctx, runner, log)
}
// Pod already exists
if !pod.ObjectMeta.DeletionTimestamp.IsZero() {
return r.processRunnerPodDeletion(ctx, runner, log, pod)
phase := string(pod.Status.Phase)
if phase == "" {
phase = "Created"
}
ready := runnerPodReady(&pod)
if runner.Status.Phase != phase || runner.Status.Ready != ready {
if pod.Status.Phase == corev1.PodRunning {
// Seeing this message, you can expect the runner to become `Running` soon.
log.V(1).Info(
"Runner appears to have been registered and running.",
"podCreationTimestamp", pod.CreationTimestamp,
)
}
updated := runner.DeepCopy()
updated.Status.Phase = phase
updated.Status.Ready = ready
updated.Status.Reason = pod.Status.Reason
updated.Status.Message = pod.Status.Message
if err := r.Status().Patch(ctx, updated, client.MergeFrom(&runner)); err != nil {
log.Error(err, "Failed to update runner status for Phase/Reason/Message")
return ctrl.Result{}, err
}
}
return ctrl.Result{}, nil
}
func runnerPodReady(pod *corev1.Pod) bool {
for _, c := range pod.Status.Conditions {
if c.Type != corev1.PodReady {
continue
}
return c.Status == corev1.ConditionTrue
}
return false
}
func runnerContainerExitCode(pod *corev1.Pod) *int32 {
for _, status := range pod.Status.ContainerStatuses {
if status.Name != containerName {
continue
}
if status.State.Terminated != nil {
return &status.State.Terminated.ExitCode
}
}
return nil
}
func runnerPodOrContainerIsStopped(pod *corev1.Pod) bool {
// If pod has ended up succeeded we need to restart it
// Happens e.g. when dind is in runner and run completes
stopped := pod.Status.Phase == corev1.PodSucceeded
stopped := pod.Status.Phase == corev1.PodSucceeded || pod.Status.Phase == corev1.PodFailed
if !stopped {
if pod.Status.Phase == corev1.PodRunning {
@@ -168,338 +197,37 @@ func (r *RunnerReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctr
continue
}
if status.State.Terminated != nil && status.State.Terminated.ExitCode == 0 {
if status.State.Terminated != nil {
stopped = true
}
}
}
}
restart := stopped
if registrationOnly && stopped {
restart = false
log.Info(
"Observed that registration-only runner for scaling-from-zero has successfully stopped. " +
"Unlike other pods, this one will be recreated only when runner spec changes.",
)
}
if updated, err := r.updateRegistrationToken(ctx, runner); err != nil {
return ctrl.Result{}, err
} else if updated {
return ctrl.Result{Requeue: true}, nil
}
newPod, err := r.newPod(runner)
if err != nil {
log.Error(err, "Could not create pod")
return ctrl.Result{}, err
}
if registrationOnly {
newPod.Spec.Containers[0].Env = append(
newPod.Spec.Containers[0].Env,
corev1.EnvVar{
Name: "RUNNER_REGISTRATION_ONLY",
Value: "true",
},
)
}
var registrationRecheckDelay time.Duration
// all checks done below only decide whether a restart is needed
// if a restart was already decided before, there is no need for the checks
// saving API calls and scary log messages
if !restart {
registrationCheckInterval := time.Minute
if r.RegistrationRecheckInterval > 0 {
registrationCheckInterval = r.RegistrationRecheckInterval
}
// We want to call ListRunners GitHub Actions API only once per runner per minute.
// This if block, in conjunction with:
// return ctrl.Result{RequeueAfter: registrationRecheckDelay}, nil
// achieves that.
if lastCheckTime := runner.Status.LastRegistrationCheckTime; lastCheckTime != nil {
nextCheckTime := lastCheckTime.Add(registrationCheckInterval)
now := time.Now()
// Requeue scheduled by RequeueAfter can happen a bit earlier (like dozens of milliseconds)
// so to avoid excessive, in-effective retry, we heuristically ignore the remaining delay in case it is
// shorter than 1s
requeueAfter := nextCheckTime.Sub(now) - time.Second
if requeueAfter > 0 {
log.Info(
fmt.Sprintf("Skipped registration check because it's deferred until %s. Retrying in %s at latest", nextCheckTime, requeueAfter),
"lastRegistrationCheckTime", lastCheckTime,
"registrationCheckInterval", registrationCheckInterval,
)
// Without RequeueAfter, the controller may not retry on scheduled. Instead, it must wait until the
// next sync period passes, which can be too much later than nextCheckTime.
//
// We need to requeue on this reconcilation even though we have already scheduled the initial
// requeue previously with `return ctrl.Result{RequeueAfter: registrationRecheckDelay}, nil`.
// Apparently, the workqueue used by controller-runtime seems to deduplicate and resets the delay on
// other requeues- so the initial scheduled requeue may have been reset due to requeue on
// spec/status change.
return ctrl.Result{RequeueAfter: requeueAfter}, nil
}
}
notFound := false
offline := false
runnerBusy, err := r.GitHubClient.IsRunnerBusy(ctx, runner.Spec.Enterprise, runner.Spec.Organization, runner.Spec.Repository, runner.Name)
currentTime := time.Now()
if err != nil {
var notFoundException *github.RunnerNotFound
var offlineException *github.RunnerOffline
if errors.As(err, &notFoundException) {
notFound = true
} else if errors.As(err, &offlineException) {
offline = true
} else {
var e *gogithub.RateLimitError
if errors.As(err, &e) {
// We log the underlying error when we failed calling GitHub API to list or unregisters,
// or the runner is still busy.
log.Error(
err,
fmt.Sprintf(
"Failed to check if runner is busy due to Github API rate limit. Retrying in %s to avoid excessive GitHub API calls",
retryDelayOnGitHubAPIRateLimitError,
),
)
return ctrl.Result{RequeueAfter: retryDelayOnGitHubAPIRateLimitError}, err
}
return ctrl.Result{}, err
}
}
// See the `newPod` function called above for more information
// about when this hash changes.
curHash := pod.Labels[LabelKeyPodTemplateHash]
newHash := newPod.Labels[LabelKeyPodTemplateHash]
if !runnerBusy && curHash != newHash {
restart = true
}
registrationTimeout := 10 * time.Minute
durationAfterRegistrationTimeout := currentTime.Sub(pod.CreationTimestamp.Add(registrationTimeout))
registrationDidTimeout := durationAfterRegistrationTimeout > 0
if notFound {
if registrationDidTimeout {
log.Info(
"Runner failed to register itself to GitHub in timely manner. "+
"Recreating the pod to see if it resolves the issue. "+
"CAUTION: If you see this a lot, you should investigate the root cause. "+
"See https://github.com/actions-runner-controller/actions-runner-controller/issues/288",
"podCreationTimestamp", pod.CreationTimestamp,
"currentTime", currentTime,
"configuredRegistrationTimeout", registrationTimeout,
)
restart = true
} else {
log.V(1).Info(
"Runner pod exists but we failed to check if runner is busy. Apparently it still needs more time.",
"runnerName", runner.Name,
)
}
} else if offline {
if registrationOnly {
log.Info(
"Observed that registration-only runner for scaling-from-zero has successfully been registered.",
"podCreationTimestamp", pod.CreationTimestamp,
"currentTime", currentTime,
"configuredRegistrationTimeout", registrationTimeout,
)
} else if registrationDidTimeout {
if runnerBusy {
log.Info(
"Timeout out while waiting for the runner to be online, but observed that it's busy at the same time."+
"This is a known (unintuitive) behaviour of a runner that is already running a job. Please see https://github.com/actions-runner-controller/actions-runner-controller/issues/911",
"podCreationTimestamp", pod.CreationTimestamp,
"currentTime", currentTime,
"configuredRegistrationTimeout", registrationTimeout,
)
} else {
log.Info(
"Already existing GitHub runner still appears offline . "+
"Recreating the pod to see if it resolves the issue. "+
"CAUTION: If you see this a lot, you should investigate the root cause. ",
"podCreationTimestamp", pod.CreationTimestamp,
"currentTime", currentTime,
"configuredRegistrationTimeout", registrationTimeout,
)
restart = true
}
} else {
log.V(1).Info(
"Runner pod exists but the GitHub runner appears to be still offline. Waiting for runner to get online ...",
"runnerName", runner.Name,
)
}
}
if (notFound || (offline && !registrationOnly)) && !registrationDidTimeout {
registrationRecheckJitter := 10 * time.Second
if r.RegistrationRecheckJitter > 0 {
registrationRecheckJitter = r.RegistrationRecheckJitter
}
registrationRecheckDelay = registrationCheckInterval + wait.Jitter(registrationRecheckJitter, 0.1)
}
}
// Don't do anything if there's no need to restart the runner
if !restart {
// This guard enables us to update runner.Status.Phase to `Running` only after
// the runner is registered to GitHub.
if registrationRecheckDelay > 0 {
log.V(1).Info(fmt.Sprintf("Rechecking the runner registration in %s", registrationRecheckDelay))
updated := runner.DeepCopy()
updated.Status.LastRegistrationCheckTime = &metav1.Time{Time: time.Now()}
if err := r.Status().Patch(ctx, updated, client.MergeFrom(&runner)); err != nil {
log.Error(err, "Failed to update runner status for LastRegistrationCheckTime")
return ctrl.Result{}, err
}
return ctrl.Result{RequeueAfter: registrationRecheckDelay}, nil
}
if runner.Status.Phase != string(pod.Status.Phase) {
if pod.Status.Phase == corev1.PodRunning {
// Seeing this message, you can expect the runner to become `Running` soon.
log.Info(
"Runner appears to have registered and running.",
"podCreationTimestamp", pod.CreationTimestamp,
)
}
updated := runner.DeepCopy()
updated.Status.Phase = string(pod.Status.Phase)
updated.Status.Reason = pod.Status.Reason
updated.Status.Message = pod.Status.Message
if err := r.Status().Patch(ctx, updated, client.MergeFrom(&runner)); err != nil {
log.Error(err, "Failed to update runner status for Phase/Reason/Message")
return ctrl.Result{}, err
}
}
return ctrl.Result{}, nil
}
// Delete current pod if recreation is needed
if err := r.Delete(ctx, &pod); err != nil {
log.Error(err, "Failed to delete pod resource")
return ctrl.Result{}, err
}
r.Recorder.Event(&runner, corev1.EventTypeNormal, "PodDeleted", fmt.Sprintf("Deleted pod '%s'", newPod.Name))
log.Info("Deleted runner pod", "repository", runner.Spec.Repository)
return ctrl.Result{}, nil
return stopped
}
func (r *RunnerReconciler) processRunnerDeletion(runner v1alpha1.Runner, ctx context.Context, log logr.Logger) (reconcile.Result, error) {
func (r *RunnerReconciler) processRunnerDeletion(runner v1alpha1.Runner, ctx context.Context, log logr.Logger, pod *corev1.Pod) (reconcile.Result, error) {
finalizers, removed := removeFinalizer(runner.ObjectMeta.Finalizers, finalizerName)
if removed {
if len(runner.Status.Registration.Token) > 0 {
ok, err := r.unregisterRunner(ctx, runner.Spec.Enterprise, runner.Spec.Organization, runner.Spec.Repository, runner.Name)
if err != nil {
if errors.Is(err, &gogithub.RateLimitError{}) {
// We log the underlying error when we failed calling GitHub API to list or unregisters,
// or the runner is still busy.
log.Error(
err,
fmt.Sprintf(
"Failed to unregister runner due to GitHub API rate limits. Delaying retry for %s to avoid excessive GitHub API calls",
retryDelayOnGitHubAPIRateLimitError,
),
)
return ctrl.Result{RequeueAfter: retryDelayOnGitHubAPIRateLimitError}, err
}
return ctrl.Result{}, err
}
if !ok {
log.V(1).Info("Runner no longer exists on GitHub")
}
} else {
log.V(1).Info("Runner was never registered on GitHub")
}
newRunner := runner.DeepCopy()
newRunner.ObjectMeta.Finalizers = finalizers
if err := r.Patch(ctx, newRunner, client.MergeFrom(&runner)); err != nil {
log.Error(err, "Failed to update runner for finalizer removal")
log.Error(err, "Unable to remove finalizer")
return ctrl.Result{}, err
}
log.Info("Removed runner from GitHub", "repository", runner.Spec.Repository, "organization", runner.Spec.Organization)
log.Info("Removed finalizer")
}
return ctrl.Result{}, nil
}
func (r *RunnerReconciler) processRunnerPodDeletion(ctx context.Context, runner v1alpha1.Runner, log logr.Logger, pod corev1.Pod) (reconcile.Result, error) {
deletionTimeout := 1 * time.Minute
currentTime := time.Now()
deletionDidTimeout := currentTime.Sub(pod.DeletionTimestamp.Add(deletionTimeout)) > 0
if deletionDidTimeout {
log.Info(
fmt.Sprintf("Failed to delete pod within %s. ", deletionTimeout)+
"This is typically the case when a Kubernetes node became unreachable "+
"and the kube controller started evicting nodes. Forcefully deleting the pod to not get stuck.",
"podDeletionTimestamp", pod.DeletionTimestamp,
"currentTime", currentTime,
"configuredDeletionTimeout", deletionTimeout,
)
var force int64 = 0
// forcefully delete runner as we would otherwise get stuck if the node stays unreachable
if err := r.Delete(ctx, &pod, &client.DeleteOptions{GracePeriodSeconds: &force}); err != nil {
// probably
if !kerrors.IsNotFound(err) {
log.Error(err, "Failed to forcefully delete pod resource ...")
return ctrl.Result{}, err
}
// forceful deletion finally succeeded
return ctrl.Result{Requeue: true}, nil
}
r.Recorder.Event(&runner, corev1.EventTypeNormal, "PodDeleted", fmt.Sprintf("Forcefully deleted pod '%s'", pod.Name))
log.Info("Forcefully deleted runner pod", "repository", runner.Spec.Repository)
// give kube manager a little time to forcefully delete the stuck pod
return ctrl.Result{RequeueAfter: 3 * time.Second}, nil
} else {
return ctrl.Result{}, nil
}
}
func (r *RunnerReconciler) processRunnerCreation(ctx context.Context, runner v1alpha1.Runner, log logr.Logger) (reconcile.Result, error) {
if updated, err := r.updateRegistrationToken(ctx, runner); err != nil {
return ctrl.Result{}, err
return ctrl.Result{RequeueAfter: RetryDelayOnCreateRegistrationError}, nil
} else if updated {
return ctrl.Result{Requeue: true}, nil
}
@@ -528,37 +256,10 @@ func (r *RunnerReconciler) processRunnerCreation(ctx context.Context, runner v1a
r.Recorder.Event(&runner, corev1.EventTypeNormal, "PodCreated", fmt.Sprintf("Created pod '%s'", newPod.Name))
log.Info("Created runner pod", "repository", runner.Spec.Repository)
return ctrl.Result{}, nil
}
func (r *RunnerReconciler) unregisterRunner(ctx context.Context, enterprise, org, repo, name string) (bool, error) {
runners, err := r.GitHubClient.ListRunners(ctx, enterprise, org, repo)
if err != nil {
return false, err
}
id := int64(0)
for _, runner := range runners {
if runner.GetName() == name {
if runner.GetBusy() {
return false, fmt.Errorf("runner is busy")
}
id = runner.GetID()
break
}
}
if id == int64(0) {
return false, nil
}
if err := r.GitHubClient.RemoveRunner(ctx, enterprise, org, repo, id); err != nil {
return false, err
}
return true, nil
}
func (r *RunnerReconciler) updateRegistrationToken(ctx context.Context, runner v1alpha1.Runner) (bool, error) {
if runner.IsRegisterable() {
return false, nil
@@ -568,6 +269,10 @@ func (r *RunnerReconciler) updateRegistrationToken(ctx context.Context, runner v
rt, err := r.GitHubClient.GetRegistrationToken(ctx, runner.Spec.Enterprise, runner.Spec.Organization, runner.Spec.Repository, runner.Name)
if err != nil {
// An error can be a permanent, permission issue like the below:
// POST https://api.github.com/enterprises/YOUR_ENTERPRISE/actions/runners/registration-token: 403 Resource not accessible by integration []
// In such case retrying in seconds might not make much sense.
r.Recorder.Event(&runner, corev1.EventTypeWarning, "FailedUpdateRegistrationToken", "Updating registration token failed")
log.Error(err, "Failed to get new registration token")
return false, err
@@ -626,6 +331,11 @@ func (r *RunnerReconciler) newPod(runner v1alpha1.Runner) (corev1.Pod, error) {
runner.ObjectMeta.Annotations,
runner.Spec,
r.GitHubClient.GithubBaseURL,
// Token change should trigger replacement.
// We need to include this explicitly here because
// runner.Spec does not contain the possibly updated token stored in the
// runner status yet.
runner.Status.Registration.Token,
)
objectMeta := metav1.ObjectMeta{
@@ -663,7 +373,7 @@ func (r *RunnerReconciler) newPod(runner v1alpha1.Runner) (corev1.Pod, error) {
registrationOnly := metav1.HasAnnotation(runner.ObjectMeta, annotationKeyRegistrationOnly)
pod, err := newRunnerPod(template, runner.Spec.RunnerConfig, r.RunnerImage, r.RunnerImagePullSecrets, r.DockerImage, r.DockerRegistryMirror, r.GitHubClient.GithubBaseURL, registrationOnly)
pod, err := newRunnerPod(runner.Name, template, runner.Spec.RunnerConfig, r.RunnerImage, r.RunnerImagePullSecrets, r.DockerImage, r.DockerRegistryMirror, r.GitHubClient.GithubBaseURL, registrationOnly)
if err != nil {
return pod, err
}
@@ -743,6 +453,10 @@ func (r *RunnerReconciler) newPod(runner v1alpha1.Runner) (corev1.Pod, error) {
pod.Spec.HostAliases = runnerSpec.HostAliases
}
if runnerSpec.DnsConfig != nil {
pod.Spec.DNSConfig = runnerSpec.DnsConfig
}
if runnerSpec.RuntimeClassName != nil {
pod.Spec.RuntimeClassName = runnerSpec.RuntimeClassName
}
@@ -762,25 +476,18 @@ func (r *RunnerReconciler) newPod(runner v1alpha1.Runner) (corev1.Pod, error) {
func mutatePod(pod *corev1.Pod, token string) *corev1.Pod {
updated := pod.DeepCopy()
for i := range pod.Spec.Containers {
if pod.Spec.Containers[i].Name == "runner" {
updated.Spec.Containers[i].Env = append(updated.Spec.Containers[i].Env,
corev1.EnvVar{
Name: "RUNNER_NAME",
Value: pod.ObjectMeta.Name,
},
corev1.EnvVar{
Name: "RUNNER_TOKEN",
Value: token,
},
)
}
if getRunnerEnv(pod, EnvVarRunnerName) == "" {
setRunnerEnv(updated, EnvVarRunnerName, pod.ObjectMeta.Name)
}
if getRunnerEnv(pod, EnvVarRunnerToken) == "" {
setRunnerEnv(updated, EnvVarRunnerToken, token)
}
return updated
}
func newRunnerPod(template corev1.Pod, runnerSpec v1alpha1.RunnerConfig, defaultRunnerImage string, defaultRunnerImagePullSecrets []string, defaultDockerImage, defaultDockerRegistryMirror string, githubBaseURL string, registrationOnly bool) (corev1.Pod, error) {
func newRunnerPod(runnerName string, template corev1.Pod, runnerSpec v1alpha1.RunnerConfig, defaultRunnerImage string, defaultRunnerImagePullSecrets []string, defaultDockerImage, defaultDockerRegistryMirror string, githubBaseURL string, registrationOnly bool) (corev1.Pod, error) {
var (
privileged bool = true
dockerdInRunner bool = runnerSpec.DockerdWithinRunnerContainer != nil && *runnerSpec.DockerdWithinRunnerContainer
@@ -789,6 +496,12 @@ func newRunnerPod(template corev1.Pod, runnerSpec v1alpha1.RunnerConfig, default
dockerdInRunnerPrivileged bool = dockerdInRunner
)
template = *template.DeepCopy()
// This label selector is used by default when rd.Spec.Selector is empty.
template.ObjectMeta.Labels = CloneAndAddLabel(template.ObjectMeta.Labels, LabelKeyRunnerSetName, runnerName)
template.ObjectMeta.Labels = CloneAndAddLabel(template.ObjectMeta.Labels, LabelKeyPodMutation, LabelValuePodMutation)
workDir := runnerSpec.WorkDir
if workDir == "" {
workDir = "/runner/_work"
@@ -841,7 +554,7 @@ func newRunnerPod(template corev1.Pod, runnerSpec v1alpha1.RunnerConfig, default
Value: workDir,
},
{
Name: "RUNNER_EPHEMERAL",
Name: EnvVarEphemeral,
Value: fmt.Sprintf("%v", ephemeral),
},
}
@@ -1117,6 +830,12 @@ func newRunnerPod(template corev1.Pod, runnerSpec v1alpha1.RunnerConfig, default
}
}
// TODO Remove this once we remove RUNNER_FEATURE_FLAG_EPHEMERAL from runner's entrypoint.sh
// and make --ephemeral the default option.
if getRunnerEnv(pod, EnvVarRunnerFeatureFlagEphemeral) == "" {
setRunnerEnv(pod, EnvVarRunnerFeatureFlagEphemeral, EnvVarTrue)
}
return *pod, nil
}

View File

@@ -0,0 +1,396 @@
package controllers
import (
"context"
"errors"
"fmt"
"strconv"
"time"
"github.com/actions-runner-controller/actions-runner-controller/github"
"github.com/go-logr/logr"
gogithub "github.com/google/go-github/v39/github"
corev1 "k8s.io/api/core/v1"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
ctrl "sigs.k8s.io/controller-runtime"
"sigs.k8s.io/controller-runtime/pkg/client"
)
// tickRunnerGracefulStop reconciles the runner and the runner pod in a way so that
// we can delete the runner pod without disrupting a workflow job.
//
// This function returns a non-nil pointer to corev1.Pod as the first return value
// if the runner is considered to have gracefully stopped, hence it's pod is safe for deletion.
//
// It's a "tick" operation so a graceful stop can take multiple calls to complete.
// This function is designed to complete a lengthy graceful stop process in a unblocking way.
// When it wants to be retried later, the function returns a non-nil *ctrl.Result as the second return value, may or may not populating the error in the second return value.
// The caller is expected to return the returned ctrl.Result and error to postpone the current reconcilation loop and trigger a scheduled retry.
func tickRunnerGracefulStop(ctx context.Context, retryDelay time.Duration, log logr.Logger, ghClient *github.Client, c client.Client, enterprise, organization, repository, runner string, pod *corev1.Pod) (*corev1.Pod, *ctrl.Result, error) {
pod, err := annotatePodOnce(ctx, c, log, pod, AnnotationKeyUnregistrationStartTimestamp, time.Now().Format(time.RFC3339))
if err != nil {
return nil, &ctrl.Result{}, err
}
if res, err := ensureRunnerUnregistration(ctx, retryDelay, log, ghClient, c, enterprise, organization, repository, runner, pod); res != nil {
return nil, res, err
}
pod, err = annotatePodOnce(ctx, c, log, pod, AnnotationKeyUnregistrationCompleteTimestamp, time.Now().Format(time.RFC3339))
if err != nil {
return nil, &ctrl.Result{}, err
}
return pod, nil, nil
}
// annotatePodOnce annotates the pod if it wasn't.
// Returns the provided pod as-is if it was already annotated.
// Returns the updated pod if the pod was missing the annotation and the update to add the annotation succeeded.
func annotatePodOnce(ctx context.Context, c client.Client, log logr.Logger, pod *corev1.Pod, k, v string) (*corev1.Pod, error) {
if pod == nil {
return nil, nil
}
if _, ok := getAnnotation(pod, k); ok {
return pod, nil
}
updated := pod.DeepCopy()
setAnnotation(&updated.ObjectMeta, k, v)
if err := c.Patch(ctx, updated, client.MergeFrom(pod)); err != nil {
log.Error(err, fmt.Sprintf("Failed to patch pod to have %s annotation", k))
return nil, err
}
log.V(2).Info("Annotated pod", "key", k, "value", v)
return updated, nil
}
// If the first return value is nil, it's safe to delete the runner pod.
func ensureRunnerUnregistration(ctx context.Context, retryDelay time.Duration, log logr.Logger, ghClient *github.Client, c client.Client, enterprise, organization, repository, runner string, pod *corev1.Pod) (*ctrl.Result, error) {
var runnerID *int64
if id, ok := getAnnotation(pod, AnnotationKeyRunnerID); ok {
v, err := strconv.ParseInt(id, 10, 64)
if err != nil {
return &ctrl.Result{}, err
}
runnerID = &v
}
if runnerID == nil {
runner, err := getRunner(ctx, ghClient, enterprise, organization, repository, runner)
if err != nil {
return &ctrl.Result{}, err
}
if runner != nil && runner.ID != nil {
runnerID = runner.ID
}
}
code := runnerContainerExitCode(pod)
if pod != nil && pod.Annotations[AnnotationKeyUnregistrationCompleteTimestamp] != "" {
// If it's already unregistered in the previous reconcilation loop,
// you can safely assume that it won't get registered again so it's safe to delete the runner pod.
log.Info("Runner pod is marked as already unregistered.")
} else if runnerID == nil {
log.Info(
"Unregistration started before runner ID is assigned. " +
"Perhaps the runner pod was terminated by anyone other than ARC? Was it OOM killed? " +
"Marking unregistration as completed anyway because there's nothing ARC can do.",
)
} else if pod != nil && runnerPodOrContainerIsStopped(pod) {
// If it's an ephemeral runner with the actions/runner container exited with 0,
// we can safely assume that it has unregistered itself from GitHub Actions
// so it's natural that RemoveRunner fails due to 404.
// If pod has ended up succeeded we need to restart it
// Happens e.g. when dind is in runner and run completes
log.Info("Runner pod has been stopped with a successful status.")
} else if pod != nil && pod.Annotations[AnnotationKeyRunnerCompletionWaitStartTimestamp] != "" {
log.Info("Runner pod is annotated to wait for completion")
return &ctrl.Result{RequeueAfter: retryDelay}, nil
} else if ok, err := unregisterRunner(ctx, ghClient, enterprise, organization, repository, runner, *runnerID); err != nil {
if errors.Is(err, &gogithub.RateLimitError{}) {
// We log the underlying error when we failed calling GitHub API to list or unregisters,
// or the runner is still busy.
log.Error(
err,
fmt.Sprintf(
"Failed to unregister runner due to GitHub API rate limits. Delaying retry for %s to avoid excessive GitHub API calls",
retryDelayOnGitHubAPIRateLimitError,
),
)
return &ctrl.Result{RequeueAfter: retryDelayOnGitHubAPIRateLimitError}, err
}
log.V(1).Info("Failed to unregister runner before deleting the pod.", "error", err)
var runnerBusy bool
errRes := &gogithub.ErrorResponse{}
if errors.As(err, &errRes) {
if errRes.Response.StatusCode == 403 {
log.Error(err, "Unable to unregister due to permission error. "+
"Perhaps you've changed the permissions of PAT or GitHub App, or you updated authentication method of ARC in a wrong way? "+
"ARC considers it as already unregistered and continue removing the pod. "+
"You may need to remove the runner on GitHub UI.")
return nil, nil
}
runner, _ := getRunner(ctx, ghClient, enterprise, organization, repository, runner)
var runnerID int64
if runner != nil && runner.ID != nil {
runnerID = *runner.ID
}
runnerBusy = errRes.Response.StatusCode == 422
if runnerBusy && code != nil {
log.V(2).Info("Runner container has already stopped but the unregistration attempt failed. "+
"This can happen when the runner container crashed due to an unhandled error, OOM, etc. "+
"ARC terminates the pod anyway. You'd probably need to manually delete the runner later by calling the GitHub API",
"runnerExitCode", *code,
"runnerID", runnerID,
)
return nil, nil
}
}
if runnerBusy {
// We want to prevent spamming the deletion attemps but returning ctrl.Result with RequeueAfter doesn't
// work as the reconcilation can happen earlier due to pod status update.
// For ephemeral runners, we can expect it to stop and unregister itself on completion.
// So we can just wait for the completion without actively retrying unregistration.
ephemeral := getRunnerEnv(pod, EnvVarEphemeral)
if ephemeral == "true" {
pod, err = annotatePodOnce(ctx, c, log, pod, AnnotationKeyRunnerCompletionWaitStartTimestamp, time.Now().Format(time.RFC3339))
if err != nil {
return &ctrl.Result{}, err
}
return &ctrl.Result{}, nil
}
log.V(2).Info("Retrying runner unregistration because the static runner is still busy")
// Otherwise we may end up spamming 422 errors,
// each call consuming GitHub API rate limit
// https://github.com/actions-runner-controller/actions-runner-controller/pull/1167#issuecomment-1064213271
return &ctrl.Result{RequeueAfter: retryDelay}, nil
}
return &ctrl.Result{}, err
} else if ok {
log.Info("Runner has just been unregistered.")
} else if pod == nil {
// `r.unregisterRunner()` will returns `false, nil` if the runner is not found on GitHub.
// However, that doesn't always mean the pod can be safely removed.
//
// If the pod does not exist for the runner,
// it may be due to that the runner pod has never been created.
// In that case we can safely assume that the runner will never be registered.
log.Info("Runner was not found on GitHub and the runner pod was not found on Kuberntes.")
} else if ts := pod.Annotations[AnnotationKeyUnregistrationStartTimestamp]; ts != "" {
log.Info("Runner unregistration is in-progress. It can take forever to complete if if it's a static runner constantly running jobs."+
" It can also take very long time if it's an ephemeral runner that is running a log-running job.", "error", err)
return &ctrl.Result{RequeueAfter: retryDelay}, nil
} else {
// A runner and a runner pod that is created by this version of ARC should match
// any of the above branches.
//
// But we leave this match all branch for potential backward-compatibility.
// The caller is expected to take appropriate actions, like annotating the pod as started the unregistration process,
// and retry later.
log.V(1).Info("Runner unregistration is being retried later.")
return &ctrl.Result{RequeueAfter: retryDelay}, nil
}
return nil, nil
}
func ensureRunnerPodRegistered(ctx context.Context, log logr.Logger, ghClient *github.Client, c client.Client, enterprise, organization, repository, runner string, pod *corev1.Pod) (*corev1.Pod, *ctrl.Result, error) {
_, hasRunnerID := getAnnotation(pod, AnnotationKeyRunnerID)
if runnerPodOrContainerIsStopped(pod) || hasRunnerID {
return pod, nil, nil
}
r, err := getRunner(ctx, ghClient, enterprise, organization, repository, runner)
if err != nil {
return nil, &ctrl.Result{RequeueAfter: 10 * time.Second}, err
}
if r == nil || r.ID == nil {
return nil, &ctrl.Result{RequeueAfter: 10 * time.Second}, err
}
id := *r.ID
updated, err := annotatePodOnce(ctx, c, log, pod, AnnotationKeyRunnerID, fmt.Sprintf("%d", id))
if err != nil {
return nil, &ctrl.Result{RequeueAfter: 10 * time.Second}, err
}
return updated, nil, nil
}
func getAnnotation(obj client.Object, key string) (string, bool) {
if obj.GetAnnotations() == nil {
return "", false
}
v, ok := obj.GetAnnotations()[key]
return v, ok
}
func setAnnotation(meta *metav1.ObjectMeta, key, value string) {
if meta.Annotations == nil {
meta.Annotations = map[string]string{}
}
meta.Annotations[key] = value
}
func podConditionTransitionTime(pod *corev1.Pod, tpe corev1.PodConditionType, v corev1.ConditionStatus) *metav1.Time {
for _, c := range pod.Status.Conditions {
if c.Type == tpe && c.Status == v {
return &c.LastTransitionTime
}
}
return nil
}
func podConditionTransitionTimeAfter(pod *corev1.Pod, tpe corev1.PodConditionType, d time.Duration) bool {
c := podConditionTransitionTime(pod, tpe, corev1.ConditionTrue)
if c == nil {
return false
}
return c.Add(d).Before(time.Now())
}
func podRunnerID(pod *corev1.Pod) string {
id, _ := getAnnotation(pod, AnnotationKeyRunnerID)
return id
}
func getRunnerEnv(pod *corev1.Pod, key string) string {
for _, c := range pod.Spec.Containers {
if c.Name == containerName {
for _, e := range c.Env {
if e.Name == key {
return e.Value
}
}
}
}
return ""
}
func setRunnerEnv(pod *corev1.Pod, key, value string) {
for i := range pod.Spec.Containers {
c := pod.Spec.Containers[i]
if c.Name == containerName {
for j, env := range c.Env {
if env.Name == key {
pod.Spec.Containers[i].Env[j].Value = value
return
}
}
pod.Spec.Containers[i].Env = append(c.Env, corev1.EnvVar{Name: key, Value: value})
}
}
}
// unregisterRunner unregisters the runner from GitHub Actions by name.
//
// This function returns:
//
// Case 1. (true, nil) when it has successfully unregistered the runner.
// Case 2. (false, nil) when (2-1.) the runner has been already unregistered OR (2-2.) the runner will never be created OR (2-3.) the runner is not created yet and it is about to be registered(hence we couldn't see it's existence from GitHub Actions API yet)
// Case 3. (false, err) when it postponed unregistration due to the runner being busy, or it tried to unregister the runner but failed due to
// an error returned by GitHub API.
//
// When the returned values is "Case 2. (false, nil)", the caller must handle the three possible sub-cases appropriately.
// In other words, all those three sub-cases cannot be distinguished by this function alone.
//
// - Case "2-1." can happen when e.g. ARC has successfully unregistered in a previous reconcilation loop or it was an ephemeral runner that finished it's job run(an ephemeral runner is designed to stop after a job run).
// You'd need to maintain the runner state(i.e. if it's already unregistered or not) somewhere,
// so that you can either not call this function at all if the runner state says it's already unregistered, or determine that it's case "2-1." when you got (false, nil).
//
// - Case "2-2." can happen when e.g. the runner registration token was somehow broken so that `config.sh` within the runner container was never meant to succeed.
// Waiting and retrying forever on this case is not a solution, because `config.sh` won't succeed with a wrong token hence the runner gets stuck in this state forever.
// There isn't a perfect solution to this, but a practical workaround would be implement a "grace period" in the caller side.
//
// - Case "2-3." can happen when e.g. ARC recreated an ephemral runner pod in a previous reconcilation loop and then it was requested to delete the runner before the runner comes up.
// If handled inappropriately, this can cause a race condition betweeen a deletion of the runner pod and GitHub scheduling a workflow job onto the runner.
//
// Once successfully detected case "2-1." or "2-2.", you can safely delete the runner pod because you know that the runner won't come back
// as long as you recreate the runner pod.
//
// If it was "2-3.", you need a workaround to avoid the race condition.
//
// You shall introduce a "grace period" mechanism, similar or equal to that is required for "Case 2-2.", so that you ever
// start the runner pod deletion only after it's more and more likely that the runner pod is not coming up.
//
// Beware though, you need extra care to set an appropriate grace period depending on your environment.
// There isn't a single right grace period that works for everyone.
// The longer the grace period is, the earlier a cluster resource shortage can occur due to throttoled runner pod deletions,
// while the shorter the grace period is, the more likely you may encounter the race issue.
func unregisterRunner(ctx context.Context, client *github.Client, enterprise, org, repo, name string, id int64) (bool, error) {
// For the record, historically ARC did not try to call RemoveRunner on a busy runner, but it's no longer true.
// The reason ARC did so was to let a runner running a job to not stop prematurely.
//
// However, we learned that RemoveRunner already has an ability to prevent stopping a busy runner,
// so ARC doesn't need to do anything special for a graceful runner stop.
// It can just call RemoveRunner, and if it returned 200 you're guaranteed that the runner will not automatically come back and
// the runner pod is safe for deletion.
//
// Trying to remove a busy runner can result in errors like the following:
// failed to remove runner: DELETE https://api.github.com/repos/actions-runner-controller/mumoshu-actions-test/actions/runners/47: 422 Bad request - Runner \"example-runnerset-0\" is still running a job\" []
//
// # NOTES
//
// - It can be "status=offline" at the same time but that's another story.
// - After https://github.com/actions-runner-controller/actions-runner-controller/pull/1127, ListRunners responses that are used to
// determine if the runner is busy can be more outdated than before, as those responeses are now cached for 60 seconds.
// - Note that 60 seconds is controlled by the Cache-Control response header provided by GitHub so we don't have a strict control on it but we assume it won't
// change from 60 seconds.
//
// TODO: Probably we can just remove the runner by ID without seeing if the runner is busy, by treating it as busy when a remove-runner call failed with 422?
if err := client.RemoveRunner(ctx, enterprise, org, repo, id); err != nil {
return false, err
}
return true, nil
}
func getRunner(ctx context.Context, client *github.Client, enterprise, org, repo, name string) (*gogithub.Runner, error) {
runners, err := client.ListRunners(ctx, enterprise, org, repo)
if err != nil {
return nil, err
}
for _, runner := range runners {
if runner.GetName() == name {
return runner, nil
}
}
return nil, nil
}

View File

@@ -23,8 +23,6 @@ import (
"time"
"github.com/go-logr/logr"
gogithub "github.com/google/go-github/v39/github"
"k8s.io/apimachinery/pkg/util/wait"
kerrors "k8s.io/apimachinery/pkg/api/errors"
"k8s.io/apimachinery/pkg/runtime"
@@ -47,16 +45,10 @@ type RunnerPodReconciler struct {
Name string
RegistrationRecheckInterval time.Duration
RegistrationRecheckJitter time.Duration
UnregistrationRetryDelay time.Duration
}
const (
// This names requires at least one slash to work.
// See https://github.com/google/knative-gcp/issues/378
runnerPodFinalizerName = "actions.summerwind.dev/runner-pod"
AnnotationKeyLastRegistrationCheckTime = "actions-runner-controller/last-registration-check-time"
)
// +kubebuilder:rbac:groups=core,resources=pods,verbs=get;list;watch;update;patch;delete
// +kubebuilder:rbac:groups=core,resources=events,verbs=create;patch
@@ -73,9 +65,19 @@ func (r *RunnerPodReconciler) Reconcile(ctx context.Context, req ctrl.Request) (
return ctrl.Result{}, nil
}
var envvars []corev1.EnvVar
for _, container := range runnerPod.Spec.Containers {
if container.Name == "runner" {
envvars = container.Env
}
}
if len(envvars) == 0 {
return ctrl.Result{}, errors.New("Could not determine env vars for runner Pod")
}
var enterprise, org, repo string
envvars := runnerPod.Spec.Containers[0].Env
for _, e := range envvars {
switch e.Name {
case EnvVarEnterprise:
@@ -99,44 +101,36 @@ func (r *RunnerPodReconciler) Reconcile(ctx context.Context, req ctrl.Request) (
return ctrl.Result{}, err
}
log.V(2).Info("Added finalizer")
return ctrl.Result{}, nil
}
} else {
log.V(2).Info("Seen deletion-timestamp is already set")
finalizers, removed := removeFinalizer(runnerPod.ObjectMeta.Finalizers, runnerPodFinalizerName)
if removed {
ok, err := r.unregisterRunner(ctx, enterprise, org, repo, runnerPod.Name)
if err != nil {
if errors.Is(err, &gogithub.RateLimitError{}) {
// We log the underlying error when we failed calling GitHub API to list or unregisters,
// or the runner is still busy.
log.Error(
err,
fmt.Sprintf(
"Failed to unregister runner due to GitHub API rate limits. Delaying retry for %s to avoid excessive GitHub API calls",
retryDelayOnGitHubAPIRateLimitError,
),
)
return ctrl.Result{RequeueAfter: retryDelayOnGitHubAPIRateLimitError}, err
}
return ctrl.Result{}, err
// In a standard scenario, the upstream controller, like runnerset-controller, ensures this runner to be gracefully stopped before the deletion timestamp is set.
// But for the case that the user manually deleted it for whatever reason,
// we have to ensure it to gracefully stop now.
updatedPod, res, err := tickRunnerGracefulStop(ctx, r.unregistrationRetryDelay(), log, r.GitHubClient, r.Client, enterprise, org, repo, runnerPod.Name, &runnerPod)
if res != nil {
return *res, err
}
if !ok {
log.V(1).Info("Runner no longer exists on GitHub")
}
patchedPod := updatedPod.DeepCopy()
patchedPod.ObjectMeta.Finalizers = finalizers
newRunner := runnerPod.DeepCopy()
newRunner.ObjectMeta.Finalizers = finalizers
if err := r.Patch(ctx, newRunner, client.MergeFrom(&runnerPod)); err != nil {
// We commit the removal of the finalizer so that Kuberenetes notices it and delete the pod resource from the cluster.
if err := r.Patch(ctx, patchedPod, client.MergeFrom(&runnerPod)); err != nil {
log.Error(err, "Failed to update runner for finalizer removal")
return ctrl.Result{}, err
}
log.Info("Removed runner from GitHub", "repository", repo, "organization", org)
log.V(2).Info("Removed finalizer")
return ctrl.Result{}, nil
}
deletionTimeout := 1 * time.Minute
@@ -174,246 +168,45 @@ func (r *RunnerPodReconciler) Reconcile(ctx context.Context, req ctrl.Request) (
return ctrl.Result{}, nil
}
// If pod has ended up succeeded we need to restart it
// Happens e.g. when dind is in runner and run completes
stopped := runnerPod.Status.Phase == corev1.PodSucceeded
if !stopped {
if runnerPod.Status.Phase == corev1.PodRunning {
for _, status := range runnerPod.Status.ContainerStatuses {
if status.Name != containerName {
continue
}
if status.State.Terminated != nil && status.State.Terminated.ExitCode == 0 {
stopped = true
}
}
}
po, res, err := ensureRunnerPodRegistered(ctx, log, r.GitHubClient, r.Client, enterprise, org, repo, runnerPod.Name, &runnerPod)
if res != nil {
return *res, err
}
restart := stopped
runnerPod = *po
var registrationRecheckDelay time.Duration
if _, unregistrationRequested := getAnnotation(&runnerPod, AnnotationKeyUnregistrationRequestTimestamp); unregistrationRequested {
log.V(2).Info("Progressing unregistration because unregistration-request timestamp is set")
// all checks done below only decide whether a restart is needed
// if a restart was already decided before, there is no need for the checks
// saving API calls and scary log messages
if !restart {
registrationCheckInterval := time.Minute
if r.RegistrationRecheckInterval > 0 {
registrationCheckInterval = r.RegistrationRecheckInterval
// At this point we're sure that DeletionTimestamp is not set yet, but the unregistration process is triggered by an upstream controller like runnerset-controller.
//
// In a standard scenario, ARC starts the unregistration process before marking the pod for deletion at all,
// so that it isn't subject to terminationGracePeriod and can safely take hours to finish it's work.
_, res, err := tickRunnerGracefulStop(ctx, r.unregistrationRetryDelay(), log, r.GitHubClient, r.Client, enterprise, org, repo, runnerPod.Name, &runnerPod)
if res != nil {
return *res, err
}
lastCheckTimeStr := runnerPod.Annotations[AnnotationKeyLastRegistrationCheckTime]
var lastCheckTime *time.Time
if lastCheckTimeStr != "" {
t, err := time.Parse(time.RFC3339, lastCheckTimeStr)
if err != nil {
log.Error(err, "failed to parase last check time %q", lastCheckTimeStr)
return ctrl.Result{}, nil
}
lastCheckTime = &t
}
// We want to call ListRunners GitHub Actions API only once per runner per minute.
// This if block, in conjunction with:
// return ctrl.Result{RequeueAfter: registrationRecheckDelay}, nil
// achieves that.
if lastCheckTime != nil {
nextCheckTime := lastCheckTime.Add(registrationCheckInterval)
now := time.Now()
// Requeue scheduled by RequeueAfter can happen a bit earlier (like dozens of milliseconds)
// so to avoid excessive, in-effective retry, we heuristically ignore the remaining delay in case it is
// shorter than 1s
requeueAfter := nextCheckTime.Sub(now) - time.Second
if requeueAfter > 0 {
log.Info(
fmt.Sprintf("Skipped registration check because it's deferred until %s. Retrying in %s at latest", nextCheckTime, requeueAfter),
"lastRegistrationCheckTime", lastCheckTime,
"registrationCheckInterval", registrationCheckInterval,
)
// Without RequeueAfter, the controller may not retry on scheduled. Instead, it must wait until the
// next sync period passes, which can be too much later than nextCheckTime.
//
// We need to requeue on this reconcilation even though we have already scheduled the initial
// requeue previously with `return ctrl.Result{RequeueAfter: registrationRecheckDelay}, nil`.
// Apparently, the workqueue used by controller-runtime seems to deduplicate and resets the delay on
// other requeues- so the initial scheduled requeue may have been reset due to requeue on
// spec/status change.
return ctrl.Result{RequeueAfter: requeueAfter}, nil
}
}
notFound := false
offline := false
_, err := r.GitHubClient.IsRunnerBusy(ctx, enterprise, org, repo, runnerPod.Name)
currentTime := time.Now()
if err != nil {
var notFoundException *github.RunnerNotFound
var offlineException *github.RunnerOffline
if errors.As(err, &notFoundException) {
notFound = true
} else if errors.As(err, &offlineException) {
offline = true
} else {
var e *gogithub.RateLimitError
if errors.As(err, &e) {
// We log the underlying error when we failed calling GitHub API to list or unregisters,
// or the runner is still busy.
log.Error(
err,
fmt.Sprintf(
"Failed to check if runner is busy due to Github API rate limit. Retrying in %s to avoid excessive GitHub API calls",
retryDelayOnGitHubAPIRateLimitError,
),
)
return ctrl.Result{RequeueAfter: retryDelayOnGitHubAPIRateLimitError}, err
}
return ctrl.Result{}, err
}
}
registrationTimeout := 10 * time.Minute
durationAfterRegistrationTimeout := currentTime.Sub(runnerPod.CreationTimestamp.Add(registrationTimeout))
registrationDidTimeout := durationAfterRegistrationTimeout > 0
if notFound {
if registrationDidTimeout {
log.Info(
"Runner failed to register itself to GitHub in timely manner. "+
"Recreating the pod to see if it resolves the issue. "+
"CAUTION: If you see this a lot, you should investigate the root cause. "+
"See https://github.com/actions-runner-controller/actions-runner-controller/issues/288",
"podCreationTimestamp", runnerPod.CreationTimestamp,
"currentTime", currentTime,
"configuredRegistrationTimeout", registrationTimeout,
)
restart = true
} else {
log.V(1).Info(
"Runner pod exists but we failed to check if runner is busy. Apparently it still needs more time.",
"runnerName", runnerPod.Name,
)
}
} else if offline {
if registrationDidTimeout {
log.Info(
"Already existing GitHub runner still appears offline . "+
"Recreating the pod to see if it resolves the issue. "+
"CAUTION: If you see this a lot, you should investigate the root cause. ",
"podCreationTimestamp", runnerPod.CreationTimestamp,
"currentTime", currentTime,
"configuredRegistrationTimeout", registrationTimeout,
)
restart = true
} else {
log.V(1).Info(
"Runner pod exists but the GitHub runner appears to be still offline. Waiting for runner to get online ...",
"runnerName", runnerPod.Name,
)
}
}
if (notFound || offline) && !registrationDidTimeout {
registrationRecheckJitter := 10 * time.Second
if r.RegistrationRecheckJitter > 0 {
registrationRecheckJitter = r.RegistrationRecheckJitter
}
registrationRecheckDelay = registrationCheckInterval + wait.Jitter(registrationRecheckJitter, 0.1)
}
}
// Don't do anything if there's no need to restart the runner
if !restart {
// This guard enables us to update runner.Status.Phase to `Running` only after
// the runner is registered to GitHub.
if registrationRecheckDelay > 0 {
log.V(1).Info(fmt.Sprintf("Rechecking the runner registration in %s", registrationRecheckDelay))
updated := runnerPod.DeepCopy()
t := time.Now().Format(time.RFC3339)
updated.Annotations[AnnotationKeyLastRegistrationCheckTime] = t
if err := r.Patch(ctx, updated, client.MergeFrom(&runnerPod)); err != nil {
log.Error(err, "Failed to update runner pod annotation for LastRegistrationCheckTime")
return ctrl.Result{}, err
}
return ctrl.Result{RequeueAfter: registrationRecheckDelay}, nil
}
// Seeing this message, you can expect the runner to become `Running` soon.
log.Info(
"Runner appears to have registered and running.",
"podCreationTimestamp", runnerPod.CreationTimestamp,
)
// At this point we are sure that the runner has successfully unregistered, hence is safe to be deleted.
// But we don't delete the pod here. Instead, let the upstream controller/parent object to delete this pod as
// a part of a cascade deletion.
// This is to avoid a parent object, like statefulset, to recreate the deleted pod.
// If the pod was recreated, it will start a registration process and that may race with the statefulset deleting the pod.
log.V(2).Info("Unregistration seems complete")
return ctrl.Result{}, nil
}
// Delete current pod if recreation is needed
if err := r.Delete(ctx, &runnerPod); err != nil {
log.Error(err, "Failed to delete pod resource")
return ctrl.Result{}, err
}
r.Recorder.Event(&runnerPod, corev1.EventTypeNormal, "PodDeleted", fmt.Sprintf("Deleted pod '%s'", runnerPod.Name))
log.Info("Deleted runner pod", "name", runnerPod.Name)
return ctrl.Result{}, nil
}
func (r *RunnerPodReconciler) unregisterRunner(ctx context.Context, enterprise, org, repo, name string) (bool, error) {
runners, err := r.GitHubClient.ListRunners(ctx, enterprise, org, repo)
if err != nil {
return false, err
func (r *RunnerPodReconciler) unregistrationRetryDelay() time.Duration {
retryDelay := DefaultUnregistrationRetryDelay
if r.UnregistrationRetryDelay > 0 {
retryDelay = r.UnregistrationRetryDelay
}
var busy bool
id := int64(0)
for _, runner := range runners {
if runner.GetName() == name {
// Sometimes a runner can stuck "busy" even though it is already "offline".
// Thus removing the condition on status can block the runner pod from being terminated forever.
busy = runner.GetBusy()
if runner.GetStatus() != "offline" && busy {
r.Log.Info("This runner will delay the runner pod deletion and the runner deregistration until it becomes either offline or non-busy", "name", runner.GetName(), "status", runner.GetStatus(), "busy", runner.GetBusy())
return false, fmt.Errorf("runner is busy")
}
id = runner.GetID()
break
}
}
if id == int64(0) {
return false, nil
}
// Sometimes a runner can stuck "busy" even though it is already "offline".
// Trying to remove the offline but busy runner can result in errors like the following:
// failed to remove runner: DELETE https://api.github.com/repos/actions-runner-controller/mumoshu-actions-test/actions/runners/47: 422 Bad request - Runner \"example-runnerset-0\" is still running a job\" []
if !busy {
if err := r.GitHubClient.RemoveRunner(ctx, enterprise, org, repo, id); err != nil {
return false, err
}
}
return true, nil
return retryDelay
}
func (r *RunnerPodReconciler) SetupWithManager(mgr ctrl.Manager) error {

View File

@@ -0,0 +1,600 @@
package controllers
import (
"context"
"fmt"
"sort"
"time"
"github.com/actions-runner-controller/actions-runner-controller/api/v1alpha1"
"github.com/go-logr/logr"
appsv1 "k8s.io/api/apps/v1"
corev1 "k8s.io/api/core/v1"
"k8s.io/apimachinery/pkg/api/errors"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/types"
"sigs.k8s.io/controller-runtime/pkg/client"
)
type podsForOwner struct {
total int
completed int
running int
terminating int
regTimeout int
pending int
templateHash string
runner *v1alpha1.Runner
statefulSet *appsv1.StatefulSet
owner owner
object client.Object
synced bool
pods []corev1.Pod
}
type owner interface {
client.Object
pods(context.Context, client.Client) ([]corev1.Pod, error)
templateHash() (string, bool)
withAnnotation(k, v string) client.Object
synced() bool
}
type ownerRunner struct {
client.Object
Log logr.Logger
Runner *v1alpha1.Runner
}
var _ owner = (*ownerRunner)(nil)
func (r *ownerRunner) pods(ctx context.Context, c client.Client) ([]corev1.Pod, error) {
var pod corev1.Pod
if err := c.Get(ctx, types.NamespacedName{Namespace: r.Runner.Namespace, Name: r.Runner.Name}, &pod); err != nil {
if errors.IsNotFound(err) {
return nil, nil
}
r.Log.Error(err, "Failed to get pod managed by runner")
return nil, err
}
return []corev1.Pod{pod}, nil
}
func (r *ownerRunner) templateHash() (string, bool) {
return getRunnerTemplateHash(r.Runner)
}
func (r *ownerRunner) withAnnotation(k, v string) client.Object {
copy := r.Runner.DeepCopy()
setAnnotation(&copy.ObjectMeta, k, v)
return copy
}
func (r *ownerRunner) synced() bool {
return r.Runner.Status.Phase != ""
}
type ownerStatefulSet struct {
client.Object
Log logr.Logger
StatefulSet *appsv1.StatefulSet
}
var _ owner = (*ownerStatefulSet)(nil)
func (s *ownerStatefulSet) pods(ctx context.Context, c client.Client) ([]corev1.Pod, error) {
var podList corev1.PodList
if err := c.List(ctx, &podList, client.MatchingLabels(s.StatefulSet.Spec.Template.ObjectMeta.Labels)); err != nil {
s.Log.Error(err, "Failed to list pods managed by statefulset")
return nil, err
}
var pods []corev1.Pod
for _, pod := range podList.Items {
if owner := metav1.GetControllerOf(&pod); owner == nil || owner.Kind != "StatefulSet" || owner.Name != s.StatefulSet.Name {
continue
}
pods = append(pods, pod)
}
return pods, nil
}
func (s *ownerStatefulSet) templateHash() (string, bool) {
return getRunnerTemplateHash(s.StatefulSet)
}
func (s *ownerStatefulSet) withAnnotation(k, v string) client.Object {
copy := s.StatefulSet.DeepCopy()
setAnnotation(&copy.ObjectMeta, k, v)
return copy
}
func (s *ownerStatefulSet) synced() bool {
var replicas int32 = 1
if s.StatefulSet.Spec.Replicas != nil {
replicas = *s.StatefulSet.Spec.Replicas
}
if s.StatefulSet.Status.Replicas != replicas {
s.Log.V(2).Info("Waiting for statefulset to sync", "desiredReplicas", replicas, "currentReplicas", s.StatefulSet.Status.Replicas)
return false
}
return true
}
func getPodsForOwner(ctx context.Context, c client.Client, log logr.Logger, o client.Object) (*podsForOwner, error) {
var (
owner owner
runner *v1alpha1.Runner
statefulSet *appsv1.StatefulSet
object client.Object
)
switch v := o.(type) {
case *v1alpha1.Runner:
owner = &ownerRunner{
Log: log,
Runner: v,
Object: v,
}
runner = v
object = v
case *appsv1.StatefulSet:
owner = &ownerStatefulSet{
Log: log,
StatefulSet: v,
Object: v,
}
statefulSet = v
object = v
default:
return nil, fmt.Errorf("BUG: Unsupported runner pods owner %v(%T)", v, v)
}
pods, err := owner.pods(ctx, c)
if err != nil {
return nil, err
}
var completed, running, terminating, regTimeout, pending, total int
for _, pod := range pods {
total++
if runnerPodOrContainerIsStopped(&pod) {
completed++
} else if pod.Status.Phase == corev1.PodRunning {
if podRunnerID(&pod) == "" && podConditionTransitionTimeAfter(&pod, corev1.PodReady, registrationTimeout) {
log.Info(
"Runner failed to register itself to GitHub in timely manner. "+
"Recreating the pod to see if it resolves the issue. "+
"CAUTION: If you see this a lot, you should investigate the root cause. "+
"See https://github.com/actions-runner-controller/actions-runner-controller/issues/288",
"creationTimestamp", pod.CreationTimestamp,
"readyTransitionTime", podConditionTransitionTime(&pod, corev1.PodReady, corev1.ConditionTrue),
"configuredRegistrationTimeout", registrationTimeout,
)
regTimeout++
} else {
running++
}
} else if !pod.DeletionTimestamp.IsZero() {
terminating++
} else {
// pending includes running but timedout runner's pod too
pending++
}
}
templateHash, ok := owner.templateHash()
if !ok {
log.Info("Failed to get template hash of statefulset. It must be in an invalid state. Please manually delete the statefulset so that it is recreated")
return nil, nil
}
synced := owner.synced()
return &podsForOwner{
total: total,
completed: completed,
running: running,
terminating: terminating,
regTimeout: regTimeout,
pending: pending,
templateHash: templateHash,
runner: runner,
statefulSet: statefulSet,
owner: owner,
object: object,
synced: synced,
pods: pods,
}, nil
}
func getRunnerTemplateHash(r client.Object) (string, bool) {
hash, ok := r.GetLabels()[LabelKeyRunnerTemplateHash]
return hash, ok
}
type state struct {
podsForOwners map[string][]*podsForOwner
lastSyncTime *time.Time
}
type result struct {
currentObjects []*podsForOwner
}
// Why `create` must be a function rather than a client.Object? That's becase we use it to create one or more objects on scale up.
//
// We use client.Create to create a necessary number of client.Object. client.Create mutates the passed object on a successful creation.
// It seems to set .Revision at least, and the existence of .Revision let client.Create fail due to K8s restriction that an object being just created
// can't have .Revision.
// Now, imagine that you are to add 2 runner replicas on scale up.
// We create one resource object per a replica that ends up calling 2 client.Create calls.
// If we were reusing client.Object to be passed to client.Create calls, only the first call suceeeds.
// The second call fails due to the first call mutated the client.Object to have .Revision.
// Passing a factory function of client.Object and creating a brand-new client.Object per a client.Create call resolves this issue,
// allowing us to create two or more replicas in one reconcilation loop without being rejected by K8s.
func syncRunnerPodsOwners(ctx context.Context, c client.Client, log logr.Logger, effectiveTime *metav1.Time, newDesiredReplicas int, create func() client.Object, ephemeral bool, owners []client.Object) (*result, error) {
state, err := collectPodsForOwners(ctx, c, log, owners)
if err != nil || state == nil {
return nil, err
}
podsForOwnersPerTemplateHash, lastSyncTime := state.podsForOwners, state.lastSyncTime
// # Why do we recreate statefulsets instead of updating their desired replicas?
//
// A statefulset cannot add more pods when not all the pods are running.
// Our ephemeral runners' pods that have finished running become Completed(Phase=Succeeded).
// So creating one statefulset per a batch of ephemeral runners is the only way for us to add more replicas.
//
// # Why do we recreate statefulsets instead of updating fields other than replicas?
//
// That's because Kubernetes doesn't allow updating anything other than replicas, template, and updateStrategy.
// And the nature of ephemeral runner pods requires you to create a statefulset per a batch of new runner pods so
// we have really no other choice.
//
// If you're curious, the below is the error message you will get when you tried to update forbidden StatefulSet field(s):
//
// 2021-06-13T07:19:52.760Z ERROR actions-runner-controller.runnerset Failed to patch statefulset
// {"runnerset": "default/example-runnerset", "error": "StatefulSet.apps \"example-runnerset\" is invalid: s
// pec: Forbidden: updates to statefulset spec for fields other than 'replicas', 'template', and 'updateStrategy'
// are forbidden"}
//
// Even though the error message includes "Forbidden", this error's reason is "Invalid".
// So we used to match these errors by using errors.IsInvalid. But that's another story...
desiredTemplateHash, ok := getRunnerTemplateHash(create())
if !ok {
log.Info("Failed to get template hash of desired owner resource. It must be in an invalid state. Please manually delete the owner so that it is recreated")
return nil, nil
}
currentObjects := podsForOwnersPerTemplateHash[desiredTemplateHash]
sort.SliceStable(currentObjects, func(i, j int) bool {
return currentObjects[i].owner.GetCreationTimestamp().Time.Before(currentObjects[j].owner.GetCreationTimestamp().Time)
})
if len(currentObjects) > 0 {
timestampFirst := currentObjects[0].owner.GetCreationTimestamp()
timestampLast := currentObjects[len(currentObjects)-1].owner.GetCreationTimestamp()
var names []string
for _, ss := range currentObjects {
names = append(names, ss.owner.GetName())
}
log.V(2).Info("Detected some current object(s)", "creationTimestampFirst", timestampFirst, "creationTimestampLast", timestampLast, "names", names)
}
var total, terminating, pending, running, regTimeout int
for _, ss := range currentObjects {
total += ss.total
terminating += ss.terminating
pending += ss.pending
running += ss.running
regTimeout += ss.regTimeout
}
numOwners := len(owners)
var hashes []string
for h, _ := range state.podsForOwners {
hashes = append(hashes, h)
}
log.V(2).Info(
"Found some pods across owner(s)",
"total", total,
"terminating", terminating,
"pending", pending,
"running", running,
"regTimeout", regTimeout,
"desired", newDesiredReplicas,
"owners", numOwners,
)
maybeRunning := pending + running
wantMoreRunners := newDesiredReplicas > maybeRunning
alreadySyncedAfterEffectiveTime := ephemeral && lastSyncTime != nil && effectiveTime != nil && lastSyncTime.After(effectiveTime.Time)
runnerPodRecreationDelayAfterWebhookScale := lastSyncTime != nil && time.Now().Before(lastSyncTime.Add(DefaultRunnerPodRecreationDelayAfterWebhookScale))
log = log.WithValues(
"lastSyncTime", lastSyncTime,
"effectiveTime", effectiveTime,
"templateHashDesired", desiredTemplateHash,
"replicasDesired", newDesiredReplicas,
"replicasPending", pending,
"replicasRunning", running,
"replicasMaybeRunning", maybeRunning,
"templateHashObserved", hashes,
)
if wantMoreRunners && alreadySyncedAfterEffectiveTime && runnerPodRecreationDelayAfterWebhookScale {
// This is our special handling of the situation for ephemeral runners only.
//
// Handling static runners this way results in scale-up to not work at all,
// because then any scale up attempts for static runenrs fall within this condition, for two reasons.
// First, static(persistent) runners will never restart on their own.
// Second, we don't update EffectiveTime for static runners.
//
// We do need to skip this condition for static runners, and that's why we take the `ephemeral` flag into account when
// computing `alreadySyncedAfterEffectiveTime``.
log.V(2).Info(
"Detected that some ephemeral runners have disappeared. " +
"Usually this is due to that ephemeral runner completions " +
"so ARC does not create new runners until EffectiveTime is updated, or DefaultRunnerPodRecreationDelayAfterWebhookScale is elapsed.")
} else if wantMoreRunners {
if alreadySyncedAfterEffectiveTime && !runnerPodRecreationDelayAfterWebhookScale {
log.V(2).Info("Adding more replicas because DefaultRunnerPodRecreationDelayAfterWebhookScale has been passed")
}
num := newDesiredReplicas - maybeRunning
for i := 0; i < num; i++ {
// Add more replicas
if err := c.Create(ctx, create()); err != nil {
return nil, err
}
}
log.V(1).Info("Created replica(s)",
"created", num,
)
return nil, nil
} else if newDesiredReplicas <= running {
// If you use ephemeral runners with webhook-based autoscaler and the runner controller is working normally,
// you're unlikely to fall into this branch.
//
// That's because all the stakeholders work like this:
//
// 1. A runner pod completes with the runner container exiting with code 0
// 2. ARC runner controller detects the pod completion, marks the owner(runner or statefulset) resource on k8s for deletion (=Runner.DeletionTimestamp becomes non-zero)
// 3. GitHub triggers a corresponding workflow_job "complete" webhook event
// 4. ARC github-webhook-server (webhook-based autoscaler) receives the webhook event updates HRA with removing the oldest capacity reservation
// 5. ARC horizontalrunnerautoscaler updates RunnerDeployment's desired replicas based on capacity reservations
// 6. ARC runnerdeployment controller updates RunnerReplicaSet's desired replicas
// 7. (We're here) ARC runnerset or runnerreplicaset controller starts reconciling the owner resource (statefulset or runner)
//
// In a normally working ARC installation, the runner that was used to run the workflow job should already have been
// marked for deletion by the runner controller.
// This runnerreplicaset controller doesn't count marked runners into the `running` value, hence you're unlikely to
// fall into this branch when you're using ephemeral runners with webhook-based-autoscaler.
var retained int
var delete []*podsForOwner
for i := len(currentObjects) - 1; i >= 0; i-- {
ss := currentObjects[i]
if ss.running == 0 || retained >= newDesiredReplicas {
// In case the desired replicas is satisfied until i-1, or this owner has no running pods,
// this owner can be considered safe for deletion.
// Note that we already waited on this owner to create pods by waiting for
// `.Status.Replicas`(=total number of pods managed by owner, regardless of the runner is Running or Completed) to match the desired replicas in a previous step.
// So `.running == 0` means "the owner has created the desired number of pods before, and all of them are completed now".
delete = append(delete, ss)
} else if retained < newDesiredReplicas {
retained += ss.running
}
}
if retained == newDesiredReplicas {
for _, ss := range delete {
log := log.WithValues("owner", types.NamespacedName{Namespace: ss.owner.GetNamespace(), Name: ss.owner.GetName()})
// Statefulset termination process 1/4: Set unregistrationRequestTimestamp only after all the pods managed by the statefulset have
// started unregistreation process.
//
// NOTE: We just mark it instead of immediately starting the deletion process.
// Otherwise, the runner pod may hit termiationGracePeriod before the unregistration completes(the max terminationGracePeriod is limited to 1h by K8s and a job can be run for more than that),
// or actions/runner may potentially misbehave on SIGTERM immediately sent by K8s.
// We'd better unregister first and then start a pod deletion process.
// The annotation works as a mark to start the pod unregistration and deletion process of ours.
if _, ok := getAnnotation(ss.owner, AnnotationKeyUnregistrationRequestTimestamp); ok {
log.V(2).Info("Still waiting for runner pod(s) unregistration to complete")
continue
}
for _, po := range ss.pods {
if _, err := annotatePodOnce(ctx, c, log, &po, AnnotationKeyUnregistrationRequestTimestamp, time.Now().Format(time.RFC3339)); err != nil {
return nil, err
}
}
updated := ss.owner.withAnnotation(AnnotationKeyUnregistrationRequestTimestamp, time.Now().Format(time.RFC3339))
if err := c.Patch(ctx, updated, client.MergeFrom(ss.owner)); err != nil {
log.Error(err, fmt.Sprintf("Failed to patch owner to have %s annotation", AnnotationKeyUnregistrationRequestTimestamp))
return nil, err
}
log.V(2).Info("Redundant owner has been annotated to start the unregistration before deletion")
}
} else if retained > newDesiredReplicas {
log.V(2).Info("Waiting sync before scale down", "retained", retained, "newDesiredReplicas", newDesiredReplicas)
return nil, nil
} else {
log.Info("Invalid state", "retained", retained, "newDesiredReplicas", newDesiredReplicas)
panic("crashed due to invalid state")
}
}
for _, sss := range podsForOwnersPerTemplateHash {
for _, ss := range sss {
if ss.templateHash != desiredTemplateHash {
if ss.owner.GetDeletionTimestamp().IsZero() {
if err := c.Delete(ctx, ss.object); err != nil {
log.Error(err, "Unable to delete object")
return nil, err
}
log.V(2).Info("Deleted redundant and outdated object")
}
return nil, nil
}
}
}
return &result{
currentObjects: currentObjects,
}, nil
}
func collectPodsForOwners(ctx context.Context, c client.Client, log logr.Logger, owners []client.Object) (*state, error) {
podsForOwnerPerTemplateHash := map[string][]*podsForOwner{}
// lastSyncTime becomes non-nil only when there are one or more owner(s) hence there are same number of runner pods.
// It's used to prevent runnerset-controller from recreating "completed ephemeral runners".
// This is needed to prevent runners from being terminated prematurely.
// See https://github.com/actions-runner-controller/actions-runner-controller/issues/911 for more context.
//
// This becomes nil when there are zero statefulset(s). That's fine because then there should be zero stateful(s) to be recreated either hence
// we don't need to guard with lastSyncTime.
var lastSyncTime *time.Time
for _, ss := range owners {
log := log.WithValues("owner", types.NamespacedName{Namespace: ss.GetNamespace(), Name: ss.GetName()})
res, err := getPodsForOwner(ctx, c, log, ss)
if err != nil {
return nil, err
}
if res.templateHash == "" {
log.Info("validation error: runner pod owner must have template hash", "object", res.object)
return nil, nil
}
// Statefulset termination process 4/4: Let Kubernetes cascade-delete the statefulset and the pods.
//
// If the runner is already marked for deletion(=has a non-zero deletion timestamp) by the runner controller (can be caused by an ephemeral runner completion)
// or by this controller (in case it was deleted in the previous reconcilation loop),
// we don't need to bother calling GitHub API to re-mark the runner for deletion.
// Just hold on, and runners will disappear as long as the runner controller is up and running.
if !res.owner.GetDeletionTimestamp().IsZero() {
continue
}
// Statefulset termination process 3/4: Set the deletionTimestamp to let Kubernetes start a cascade deletion of the statefulset and the pods.
if _, ok := getAnnotation(res.owner, AnnotationKeyUnregistrationCompleteTimestamp); ok {
if err := c.Delete(ctx, res.object); err != nil {
log.Error(err, "Failed to delete owner")
return nil, err
}
log.V(2).Info("Started deletion of owner")
continue
}
// Statefulset termination process 2/4: Set unregistrationCompleteTimestamp only if all the pods managed by the statefulset
// have either unregistered or being deleted.
if _, ok := getAnnotation(res.owner, AnnotationKeyUnregistrationRequestTimestamp); ok {
var deletionSafe int
for _, po := range res.pods {
if _, ok := getAnnotation(&po, AnnotationKeyUnregistrationCompleteTimestamp); ok {
deletionSafe++
} else if !po.DeletionTimestamp.IsZero() {
deletionSafe++
}
}
if deletionSafe == res.total {
log.V(2).Info("Marking owner for unregistration completion", "deletionSafe", deletionSafe, "total", res.total)
if _, ok := getAnnotation(res.owner, AnnotationKeyUnregistrationCompleteTimestamp); !ok {
updated := res.owner.withAnnotation(AnnotationKeyUnregistrationCompleteTimestamp, time.Now().Format(time.RFC3339))
if err := c.Patch(ctx, updated, client.MergeFrom(res.owner)); err != nil {
log.Error(err, fmt.Sprintf("Failed to patch owner to have %s annotation", AnnotationKeyUnregistrationCompleteTimestamp))
return nil, err
}
log.V(2).Info("Redundant owner has been annotated to start the deletion")
} else {
log.V(2).Info("BUG: Redundant owner was already annotated to start the deletion")
}
continue
}
}
if annotations := res.owner.GetAnnotations(); annotations != nil {
if a, ok := annotations[SyncTimeAnnotationKey]; ok {
t, err := time.Parse(time.RFC3339, a)
if err == nil {
if lastSyncTime == nil || lastSyncTime.Before(t) {
lastSyncTime = &t
}
}
}
}
// A completed owner and a completed runner pod can safely be deleted without
// a race condition so delete it here,
// so that the later process can be a bit simpler.
if res.total > 0 && res.total == res.completed {
if err := c.Delete(ctx, ss); err != nil {
log.Error(err, "Unable to delete owner")
return nil, err
}
log.V(2).Info("Deleted completed owner")
return nil, nil
}
if !res.synced {
log.V(1).Info("Skipped reconcilation because owner is not synced yet", "pods", res.pods)
return nil, nil
}
podsForOwnerPerTemplateHash[res.templateHash] = append(podsForOwnerPerTemplateHash[res.templateHash], res)
}
return &state{podsForOwnerPerTemplateHash, lastSyncTime}, nil
}

View File

@@ -118,6 +118,8 @@ func (r *RunnerDeploymentReconciler) Reconcile(ctx context.Context, req ctrl.Req
return ctrl.Result{}, err
}
log.Info("Created runnerreplicaset", "runnerreplicaset", desiredRS.Name)
return ctrl.Result{}, nil
}
@@ -142,6 +144,8 @@ func (r *RunnerDeploymentReconciler) Reconcile(ctx context.Context, req ctrl.Req
return ctrl.Result{}, err
}
log.Info("Created runnerreplicaset", "runnerreplicaset", desiredRS.Name)
// We requeue in order to clean up old runner replica sets later.
// Otherwise, they aren't cleaned up until the next re-sync interval.
return ctrl.Result{RequeueAfter: 5 * time.Second}, nil
@@ -177,6 +181,7 @@ func (r *RunnerDeploymentReconciler) Reconcile(ctx context.Context, req ctrl.Req
// Please add more conditions that we can in-place update the newest runnerreplicaset without disruption
if currentDesiredReplicas != newDesiredReplicas {
newestSet.Spec.Replicas = &newDesiredReplicas
newestSet.Spec.EffectiveTime = rd.Spec.EffectiveTime
if err := r.Client.Update(ctx, newestSet); err != nil {
log.Error(err, "Failed to update runnerreplicaset resource")
@@ -221,15 +226,38 @@ func (r *RunnerDeploymentReconciler) Reconcile(ctx context.Context, req ctrl.Req
for i := range oldSets {
rs := oldSets[i]
rslog := log.WithValues("runnerreplicaset", rs.Name)
if rs.Status.Replicas != nil && *rs.Status.Replicas > 0 {
if rs.Spec.Replicas != nil && *rs.Spec.Replicas == 0 {
rslog.V(2).Info("Waiting for runnerreplicaset to scale to zero")
continue
}
updated := rs.DeepCopy()
zero := 0
updated.Spec.Replicas = &zero
if err := r.Client.Update(ctx, updated); err != nil {
rslog.Error(err, "Failed to scale runnerreplicaset to zero")
return ctrl.Result{}, err
}
rslog.Info("Scaled runnerreplicaset to zero")
continue
}
if err := r.Client.Delete(ctx, &rs); err != nil {
log.Error(err, "Failed to delete runnerreplicaset resource")
rslog.Error(err, "Failed to delete runnerreplicaset resource")
return ctrl.Result{}, err
}
r.Recorder.Event(&rd, corev1.EventTypeNormal, "RunnerReplicaSetDeleted", fmt.Sprintf("Deleted runnerreplicaset '%s'", rs.Name))
log.Info("Deleted runnerreplicaset", "runnerdeployment", rd.ObjectMeta.Name, "runnerreplicaset", rs.Name)
rslog.Info("Deleted runnerreplicaset")
}
}
@@ -417,9 +445,10 @@ func newRunnerReplicaSet(rd *v1alpha1.RunnerDeployment, commonRunnerLabels []str
Labels: newRSTemplate.ObjectMeta.Labels,
},
Spec: v1alpha1.RunnerReplicaSetSpec{
Replicas: rd.Spec.Replicas,
Selector: newRSSelector,
Template: newRSTemplate,
Replicas: rd.Spec.Replicas,
Selector: newRSSelector,
Template: newRSTemplate,
EffectiveTime: rd.Spec.EffectiveTime,
},
}

View File

@@ -18,13 +18,10 @@ package controllers
import (
"context"
"errors"
"fmt"
"reflect"
"time"
"github.com/go-logr/logr"
gogithub "github.com/google/go-github/v39/github"
kerrors "k8s.io/apimachinery/pkg/api/errors"
"k8s.io/apimachinery/pkg/runtime"
@@ -32,7 +29,6 @@ import (
ctrl "sigs.k8s.io/controller-runtime"
"sigs.k8s.io/controller-runtime/pkg/client"
corev1 "k8s.io/api/core/v1"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"github.com/actions-runner-controller/actions-runner-controller/api/v1alpha1"
@@ -49,6 +45,10 @@ type RunnerReplicaSetReconciler struct {
Name string
}
const (
SyncTimeAnnotationKey = "sync-time"
)
// +kubebuilder:rbac:groups=actions.summerwind.dev,resources=runnerreplicasets,verbs=get;list;watch;create;update;patch;delete
// +kubebuilder:rbac:groups=actions.summerwind.dev,resources=runnerreplicasets/finalizers,verbs=get;list;watch;create;update;patch;delete
// +kubebuilder:rbac:groups=actions.summerwind.dev,resources=runnerreplicasets/status,verbs=get;update;patch
@@ -65,18 +65,42 @@ func (r *RunnerReplicaSetReconciler) Reconcile(ctx context.Context, req ctrl.Req
}
if !rs.ObjectMeta.DeletionTimestamp.IsZero() {
// RunnerReplicaSet cannot be gracefuly removed.
// That means any runner that is running a job can be prematurely terminated.
// To gracefully remove a RunnerReplicaSet, scale it down to zero first, observe RunnerReplicaSet's status replicas,
// and remove it only after the status replicas becomes zero.
return ctrl.Result{}, nil
}
if rs.ObjectMeta.Labels == nil {
rs.ObjectMeta.Labels = map[string]string{}
}
// Template hash is usually set by the upstream controller(RunnerDeplloyment controller) on authoring
// RunerReplicaset resource, but it may be missing when the user directly created RunnerReplicaSet.
// As a template hash is required by by the runner replica management, we dynamically add it here without ever persisting it.
if rs.ObjectMeta.Labels[LabelKeyRunnerTemplateHash] == "" {
template := rs.Spec.DeepCopy()
template.Replicas = nil
template.EffectiveTime = nil
templateHash := ComputeHash(template)
log.Info("Using auto-generated template hash", "value", templateHash)
rs.ObjectMeta.Labels = CloneAndAddLabel(rs.ObjectMeta.Labels, LabelKeyRunnerTemplateHash, templateHash)
rs.Spec.Template.ObjectMeta.Labels = CloneAndAddLabel(rs.Spec.Template.ObjectMeta.Labels, LabelKeyRunnerTemplateHash, templateHash)
}
selector, err := metav1.LabelSelectorAsSelector(rs.Spec.Selector)
if err != nil {
return ctrl.Result{}, err
}
// Get the Runners managed by the target RunnerReplicaSet
var allRunners v1alpha1.RunnerList
var runnerList v1alpha1.RunnerList
if err := r.List(
ctx,
&allRunners,
&runnerList,
client.InNamespace(req.Namespace),
client.MatchingLabelsSelector{Selector: selector},
); err != nil {
@@ -85,179 +109,44 @@ func (r *RunnerReplicaSetReconciler) Reconcile(ctx context.Context, req ctrl.Req
}
}
var myRunners []v1alpha1.Runner
replicas := 1
if rs.Spec.Replicas != nil {
replicas = *rs.Spec.Replicas
}
effectiveTime := rs.Spec.EffectiveTime
ephemeral := rs.Spec.Template.Spec.Ephemeral == nil || *rs.Spec.Template.Spec.Ephemeral
desired, err := r.newRunner(rs)
if err != nil {
log.Error(err, "Could not create runner")
return ctrl.Result{}, err
}
var live []client.Object
for _, r := range runnerList.Items {
r := r
live = append(live, &r)
}
res, err := syncRunnerPodsOwners(ctx, r.Client, log, effectiveTime, replicas, func() client.Object { return desired.DeepCopy() }, ephemeral, live)
if err != nil || res == nil {
return ctrl.Result{}, err
}
var (
current int
ready int
available int
status v1alpha1.RunnerReplicaSetStatus
current, available, ready int
)
for _, r := range allRunners.Items {
// This guard is required to avoid the RunnerReplicaSet created by the controller v0.17.0 or before
// to not treat all the runners in the namespace as its children.
if metav1.IsControlledBy(&r, &rs) && !metav1.HasAnnotation(r.ObjectMeta, annotationKeyRegistrationOnly) {
myRunners = append(myRunners, r)
current += 1
if r.Status.Phase == string(corev1.PodRunning) {
ready += 1
// available is currently the same as ready, as we don't yet have minReadySeconds for runners
available += 1
}
}
for _, o := range res.currentObjects {
current += o.total
available += o.running
ready += o.running
}
var desired int
if rs.Spec.Replicas != nil {
desired = *rs.Spec.Replicas
} else {
desired = 1
}
// TODO: remove this registration runner cleanup later (v0.23.0 or v0.24.0)
//
// We had to have a registration-only runner to support scale-from-zero before.
// But since Sep 2021 Actions update on GitHub Cloud and GHES 3.3, it is unneceesary.
// See the below issues for more contexts:
// https://github.com/actions-runner-controller/actions-runner-controller/issues/516
// https://github.com/actions-runner-controller/actions-runner-controller/issues/859
//
// In the below block, we have a logic to remove existing registration-only runners as unnecessary.
// This logic is introduced since actions-runner-controller 0.21.0 and probably last one or two minor releases
// so that actions-runner-controller instance in everyone's cluster won't leave dangling registration-only runners.
registrationOnlyRunnerNsName := req.NamespacedName
registrationOnlyRunnerNsName.Name = registrationOnlyRunnerNameFor(rs.Name)
registrationOnlyRunner := v1alpha1.Runner{}
registrationOnlyRunnerExists := false
if err := r.Get(
ctx,
registrationOnlyRunnerNsName,
&registrationOnlyRunner,
); err != nil {
if !kerrors.IsNotFound(err) {
return ctrl.Result{}, err
}
} else {
registrationOnlyRunnerExists = true
}
if registrationOnlyRunnerExists {
if err := r.Client.Delete(ctx, &registrationOnlyRunner); err != nil {
log.Error(err, "Retrying soon because we failed to delete registration-only runner")
return ctrl.Result{Requeue: true}, nil
}
}
if current > desired {
n := current - desired
log.V(0).Info(fmt.Sprintf("Deleting %d runners", n), "desired", desired, "current", current, "ready", ready)
// get runners that are currently offline/not busy/timed-out to register
var deletionCandidates []v1alpha1.Runner
for _, runner := range allRunners.Items {
busy, err := r.GitHubClient.IsRunnerBusy(ctx, runner.Spec.Enterprise, runner.Spec.Organization, runner.Spec.Repository, runner.Name)
if err != nil {
notRegistered := false
offline := false
var notFoundException *github.RunnerNotFound
var offlineException *github.RunnerOffline
if errors.As(err, &notFoundException) {
log.V(1).Info("Failed to check if runner is busy. Either this runner has never been successfully registered to GitHub or it still needs more time.", "runnerName", runner.Name)
notRegistered = true
} else if errors.As(err, &offlineException) {
offline = true
} else {
var e *gogithub.RateLimitError
if errors.As(err, &e) {
// We log the underlying error when we failed calling GitHub API to list or unregisters,
// or the runner is still busy.
log.Error(
err,
fmt.Sprintf(
"Failed to check if runner is busy due to GitHub API rate limit. Retrying in %s to avoid excessive GitHub API calls",
retryDelayOnGitHubAPIRateLimitError,
),
)
return ctrl.Result{RequeueAfter: retryDelayOnGitHubAPIRateLimitError}, err
}
return ctrl.Result{}, err
}
registrationTimeout := 15 * time.Minute
currentTime := time.Now()
registrationDidTimeout := currentTime.Sub(runner.CreationTimestamp.Add(registrationTimeout)) > 0
if notRegistered && registrationDidTimeout {
log.Info(
"Runner failed to register itself to GitHub in timely manner. "+
"Marking the runner for scale down. "+
"CAUTION: If you see this a lot, you should investigate the root cause. "+
"See https://github.com/actions-runner-controller/actions-runner-controller/issues/288",
"runnerCreationTimestamp", runner.CreationTimestamp,
"currentTime", currentTime,
"configuredRegistrationTimeout", registrationTimeout,
)
deletionCandidates = append(deletionCandidates, runner)
}
// offline runners should always be a great target for scale down
if offline {
deletionCandidates = append(deletionCandidates, runner)
}
} else if !busy {
deletionCandidates = append(deletionCandidates, runner)
}
}
if len(deletionCandidates) < n {
n = len(deletionCandidates)
}
log.V(0).Info(fmt.Sprintf("Deleting %d runner(s)", n), "desired", desired, "current", current, "ready", ready)
for i := 0; i < n; i++ {
if err := r.Client.Delete(ctx, &deletionCandidates[i]); client.IgnoreNotFound(err) != nil {
log.Error(err, "Failed to delete runner resource")
return ctrl.Result{}, err
}
r.Recorder.Event(&rs, corev1.EventTypeNormal, "RunnerDeleted", fmt.Sprintf("Deleted runner '%s'", deletionCandidates[i].Name))
log.Info("Deleted runner")
}
} else if desired > current {
n := desired - current
log.V(0).Info(fmt.Sprintf("Creating %d runner(s)", n), "desired", desired, "available", current, "ready", ready)
for i := 0; i < n; i++ {
newRunner, err := r.newRunner(rs)
if err != nil {
log.Error(err, "Could not create runner")
return ctrl.Result{}, err
}
if err := r.Client.Create(ctx, &newRunner); err != nil {
log.Error(err, "Failed to create runner resource")
return ctrl.Result{}, err
}
}
}
var status v1alpha1.RunnerReplicaSetStatus
status.Replicas = &current
status.AvailableReplicas = &available
status.ReadyReplicas = &ready
@@ -278,10 +167,16 @@ func (r *RunnerReplicaSetReconciler) Reconcile(ctx context.Context, req ctrl.Req
}
func (r *RunnerReplicaSetReconciler) newRunner(rs v1alpha1.RunnerReplicaSet) (v1alpha1.Runner, error) {
// Note that the upstream controller (runnerdeployment) is expected to add
// the "runner template hash" label to the template.meta which is necessary to make this controller work correctly
objectMeta := rs.Spec.Template.ObjectMeta.DeepCopy()
objectMeta.GenerateName = rs.ObjectMeta.Name + "-"
objectMeta.Namespace = rs.ObjectMeta.Namespace
if objectMeta.Annotations == nil {
objectMeta.Annotations = map[string]string{}
}
objectMeta.Annotations[SyncTimeAnnotationKey] = time.Now().Format(time.RFC3339)
runner := v1alpha1.Runner{
TypeMeta: metav1.TypeMeta{},

View File

@@ -7,7 +7,6 @@ import (
"time"
corev1 "k8s.io/api/core/v1"
"k8s.io/apimachinery/pkg/types"
"k8s.io/client-go/kubernetes/scheme"
ctrl "sigs.k8s.io/controller-runtime"
logf "sigs.k8s.io/controller-runtime/pkg/log"
@@ -102,12 +101,40 @@ func intPtr(v int) *int {
var _ = Context("Inside of a new namespace", func() {
ctx := context.TODO()
ns := SetupTest(ctx)
name := "example-runnerreplicaset"
Describe("when no existing resources exist", func() {
getRunnerCount := func() int {
runners := actionsv1alpha1.RunnerList{Items: []actionsv1alpha1.Runner{}}
It("should create a new Runner resource from the specified template, add a another Runner on replicas increased, and removes all the replicas when set to 0", func() {
name := "example-runnerreplicaset"
selector, err := metav1.LabelSelectorAsSelector(
&metav1.LabelSelector{
MatchLabels: map[string]string{
"foo": "bar",
},
},
)
if err != nil {
logf.Log.Error(err, "failed to create labelselector")
return -1
}
err = k8sClient.List(
ctx,
&runners,
client.InNamespace(ns.Name),
client.MatchingLabelsSelector{Selector: selector},
)
if err != nil {
logf.Log.Error(err, "list runners")
}
runnersList.Sync(runners.Items)
return len(runners.Items)
}
Describe("RunnerReplicaSet", func() {
It("should create a new Runner resource from the specified template", func() {
{
rs := &actionsv1alpha1.RunnerReplicaSet{
ObjectMeta: metav1.ObjectMeta{
@@ -146,126 +173,99 @@ var _ = Context("Inside of a new namespace", func() {
Expect(err).NotTo(HaveOccurred(), "failed to create test RunnerReplicaSet resource")
runners := actionsv1alpha1.RunnerList{Items: []actionsv1alpha1.Runner{}}
Eventually(
func() int {
selector, err := metav1.LabelSelectorAsSelector(
&metav1.LabelSelector{
MatchLabels: map[string]string{
"foo": "bar",
},
},
)
if err != nil {
logf.Log.Error(err, "failed to create labelselector")
return -1
}
err = k8sClient.List(
ctx,
&runners,
client.InNamespace(ns.Name),
client.MatchingLabelsSelector{Selector: selector},
)
if err != nil {
logf.Log.Error(err, "list runners")
return -1
}
runnersList.Sync(runners.Items)
return len(runners.Items)
},
time.Second*5, time.Millisecond*500).Should(BeEquivalentTo(1))
getRunnerCount,
time.Second*5, time.Second).Should(BeEquivalentTo(1))
}
})
It("should create 2 runners when specified 2 replicas", func() {
{
// We wrap the update in the Eventually block to avoid the below error that occurs due to concurrent modification
// made by the controller to update .Status.AvailableReplicas and .Status.ReadyReplicas
// Operation cannot be fulfilled on runnerreplicasets.actions.summerwind.dev "example-runnerreplicaset": the object has been modified; please apply your changes to the latest version and try again
Eventually(func() error {
var rs actionsv1alpha1.RunnerReplicaSet
err := k8sClient.Get(ctx, types.NamespacedName{Namespace: ns.Name, Name: name}, &rs)
Expect(err).NotTo(HaveOccurred(), "failed to get test RunnerReplicaSet resource")
rs.Spec.Replicas = intPtr(2)
return k8sClient.Update(ctx, &rs)
},
time.Second*1, time.Millisecond*500).Should(BeNil())
runners := actionsv1alpha1.RunnerList{Items: []actionsv1alpha1.Runner{}}
Eventually(
func() int {
selector, err := metav1.LabelSelectorAsSelector(
&metav1.LabelSelector{
MatchLabels: map[string]string{
"foo": "bar",
},
},
)
if err != nil {
logf.Log.Error(err, "failed to create labelselector")
return -1
}
err = k8sClient.List(
ctx,
&runners,
client.InNamespace(ns.Name),
client.MatchingLabelsSelector{Selector: selector},
)
if err != nil {
logf.Log.Error(err, "list runners")
}
runnersList.Sync(runners.Items)
return len(runners.Items)
rs := &actionsv1alpha1.RunnerReplicaSet{
ObjectMeta: metav1.ObjectMeta{
Name: name,
Namespace: ns.Name,
},
time.Second*5, time.Millisecond*500).Should(BeEquivalentTo(2))
}
{
// We wrap the update in the Eventually block to avoid the below error that occurs due to concurrent modification
// made by the controller to update .Status.AvailableReplicas and .Status.ReadyReplicas
// Operation cannot be fulfilled on runnersets.actions.summerwind.dev "example-runnerset": the object has been modified; please apply your changes to the latest version and try again
Eventually(func() error {
var rs actionsv1alpha1.RunnerReplicaSet
err := k8sClient.Get(ctx, types.NamespacedName{Namespace: ns.Name, Name: name}, &rs)
Expect(err).NotTo(HaveOccurred(), "failed to get test RunnerReplicaSet resource")
rs.Spec.Replicas = intPtr(0)
return k8sClient.Update(ctx, &rs)
},
time.Second*1, time.Millisecond*500).Should(BeNil())
runners := actionsv1alpha1.RunnerList{Items: []actionsv1alpha1.Runner{}}
Eventually(
func() int {
selector, err := metav1.LabelSelectorAsSelector(&metav1.LabelSelector{
Spec: actionsv1alpha1.RunnerReplicaSetSpec{
Replicas: intPtr(2),
Selector: &metav1.LabelSelector{
MatchLabels: map[string]string{
"foo": "bar",
},
})
Expect(err).ToNot(HaveOccurred())
if err := k8sClient.List(ctx, &runners, client.InNamespace(ns.Name), client.MatchingLabelsSelector{Selector: selector}); err != nil {
logf.Log.Error(err, "list runners")
return -1
}
runnersList.Sync(runners.Items)
return len(runners.Items)
},
Template: actionsv1alpha1.RunnerTemplate{
ObjectMeta: metav1.ObjectMeta{
Labels: map[string]string{
"foo": "bar",
},
},
Spec: actionsv1alpha1.RunnerSpec{
RunnerConfig: actionsv1alpha1.RunnerConfig{
Repository: "test/valid",
Image: "bar",
},
RunnerPodSpec: actionsv1alpha1.RunnerPodSpec{
Env: []corev1.EnvVar{
{Name: "FOO", Value: "FOOVALUE"},
},
},
},
},
},
time.Second*5, time.Millisecond*500).Should(BeEquivalentTo(0))
}
err := k8sClient.Create(ctx, rs)
Expect(err).NotTo(HaveOccurred(), "failed to create test RunnerReplicaSet resource")
Eventually(
getRunnerCount,
time.Second*5, time.Second).Should(BeEquivalentTo(2))
}
})
It("should not create any runners when specified 0 replicas", func() {
{
rs := &actionsv1alpha1.RunnerReplicaSet{
ObjectMeta: metav1.ObjectMeta{
Name: name,
Namespace: ns.Name,
},
Spec: actionsv1alpha1.RunnerReplicaSetSpec{
Replicas: intPtr(0),
Selector: &metav1.LabelSelector{
MatchLabels: map[string]string{
"foo": "bar",
},
},
Template: actionsv1alpha1.RunnerTemplate{
ObjectMeta: metav1.ObjectMeta{
Labels: map[string]string{
"foo": "bar",
},
},
Spec: actionsv1alpha1.RunnerSpec{
RunnerConfig: actionsv1alpha1.RunnerConfig{
Repository: "test/valid",
Image: "bar",
},
RunnerPodSpec: actionsv1alpha1.RunnerPodSpec{
Env: []corev1.EnvVar{
{Name: "FOO", Value: "FOOVALUE"},
},
},
},
},
},
}
err := k8sClient.Create(ctx, rs)
Expect(err).NotTo(HaveOccurred(), "failed to create test RunnerReplicaSet resource")
Consistently(
getRunnerCount,
time.Second*5, time.Second).Should(BeEquivalentTo(0))
}
})
})

View File

@@ -22,8 +22,6 @@ import (
"time"
appsv1 "k8s.io/api/apps/v1"
"k8s.io/apimachinery/pkg/api/errors"
"k8s.io/apimachinery/pkg/types"
"k8s.io/apimachinery/pkg/runtime"
"k8s.io/client-go/tools/record"
@@ -38,10 +36,6 @@ import (
"github.com/go-logr/logr"
)
const (
LabelKeyRunnerSetName = "runnerset-name"
)
// RunnerSetReconciler reconciles a Runner object
type RunnerSetReconciler struct {
Name string
@@ -90,6 +84,18 @@ func (r *RunnerSetReconciler) Reconcile(ctx context.Context, req ctrl.Request) (
metrics.SetRunnerSet(*runnerSet)
var statefulsetList appsv1.StatefulSetList
if err := r.List(ctx, &statefulsetList, client.InNamespace(req.Namespace), client.MatchingFields{runnerSetOwnerKey: req.Name}); err != nil {
return ctrl.Result{}, err
}
statefulsets := statefulsetList.Items
if len(statefulsets) > 1000 {
log.Info("Postponed reconcilation to prevent potential infinite loop. If you're really scaling more than 1000 statefulsets, do change this hard-coded threshold!")
return ctrl.Result{}, nil
}
desiredStatefulSet, err := r.newStatefulSet(runnerSet)
if err != nil {
r.Recorder.Event(runnerSet, corev1.EventTypeNormal, "RunnerAutoscalingFailure", err.Error())
@@ -99,107 +105,43 @@ func (r *RunnerSetReconciler) Reconcile(ctx context.Context, req ctrl.Request) (
return ctrl.Result{}, err
}
liveStatefulSet := &appsv1.StatefulSet{}
if err := r.Get(ctx, types.NamespacedName{Namespace: runnerSet.Namespace, Name: runnerSet.Name}, liveStatefulSet); err != nil {
if !errors.IsNotFound(err) {
log.Error(err, "Failed to get live statefulset")
return ctrl.Result{}, err
}
if err := r.Client.Create(ctx, desiredStatefulSet); err != nil {
log.Error(err, "Failed to create statefulset resource")
return ctrl.Result{}, err
}
return ctrl.Result{}, nil
}
liveTemplateHash, ok := getStatefulSetTemplateHash(liveStatefulSet)
if !ok {
log.Info("Failed to get template hash of newest statefulset resource. It must be in an invalid state. Please manually delete the statefulset so that it is recreated")
return ctrl.Result{}, nil
}
desiredTemplateHash, ok := getStatefulSetTemplateHash(desiredStatefulSet)
if !ok {
log.Info("Failed to get template hash of desired statefulset. It must be in an invalid state. Please manually delete the statefulset so that it is recreated")
return ctrl.Result{}, nil
}
if liveTemplateHash != desiredTemplateHash {
copy := liveStatefulSet.DeepCopy()
copy.Spec = desiredStatefulSet.Spec
if err := r.Client.Patch(ctx, copy, client.MergeFrom(liveStatefulSet)); err != nil {
log.Error(err, "Failed to patch statefulset", "reason", errors.ReasonForError(err))
if errors.IsInvalid(err) {
// NOTE: This might not be ideal but is currently required to deal with the forbidden error by recreating the statefulset
//
// 2021-06-13T07:19:52.760Z ERROR actions-runner-controller.runnerset Failed to patch statefulset
// {"runnerset": "default/example-runnerset", "error": "StatefulSet.apps \"example-runnerset\" is invalid: s
// pec: Forbidden: updates to statefulset spec for fields other than 'replicas', 'template', and 'updateStrategy'
// are forbidden"}
//
// Even though the error message includes "Forbidden", this error's reason is "Invalid".
// That's why we're using errors.IsInvalid above.
if err := r.Client.Delete(ctx, liveStatefulSet); err != nil {
log.Error(err, "Failed to delete statefulset for force-update")
return ctrl.Result{}, err
}
log.Info("Deleted statefulset for force-update")
}
return ctrl.Result{}, err
}
// We requeue in order to clean up old runner replica sets later.
// Otherwise, they aren't cleaned up until the next re-sync interval.
return ctrl.Result{RequeueAfter: 5 * time.Second}, nil
}
addedReplicas := int32(1)
create := desiredStatefulSet.DeepCopy()
create.Spec.Replicas = &addedReplicas
const defaultReplicas = 1
var replicasOfLiveStatefulSet *int
if liveStatefulSet.Spec.Replicas != nil {
v := int(*liveStatefulSet.Spec.Replicas)
replicasOfLiveStatefulSet = &v
}
var replicasOfDesiredStatefulSet *int
if desiredStatefulSet.Spec.Replicas != nil {
v := int(*desiredStatefulSet.Spec.Replicas)
replicasOfDesiredStatefulSet = &v
}
currentDesiredReplicas := getIntOrDefault(replicasOfLiveStatefulSet, defaultReplicas)
newDesiredReplicas := getIntOrDefault(replicasOfDesiredStatefulSet, defaultReplicas)
// Please add more conditions that we can in-place update the newest runnerreplicaset without disruption
if currentDesiredReplicas != newDesiredReplicas {
v := int32(newDesiredReplicas)
effectiveTime := runnerSet.Spec.EffectiveTime
ephemeral := runnerSet.Spec.Ephemeral == nil || *runnerSet.Spec.Ephemeral
updated := liveStatefulSet.DeepCopy()
updated.Spec.Replicas = &v
var owners []client.Object
if err := r.Client.Patch(ctx, updated, client.MergeFrom(liveStatefulSet)); err != nil {
log.Error(err, "Failed to update statefulset")
return ctrl.Result{}, err
}
return ctrl.Result{}, nil
for _, ss := range statefulsets {
ss := ss
owners = append(owners, &ss)
}
statusReplicas := int(liveStatefulSet.Status.Replicas)
statusReadyReplicas := int(liveStatefulSet.Status.ReadyReplicas)
totalCurrentReplicas := int(liveStatefulSet.Status.CurrentReplicas)
updatedReplicas := int(liveStatefulSet.Status.UpdatedReplicas)
res, err := syncRunnerPodsOwners(ctx, r.Client, log, effectiveTime, newDesiredReplicas, func() client.Object { return create.DeepCopy() }, ephemeral, owners)
if err != nil || res == nil {
return ctrl.Result{}, err
}
var statusReplicas, statusReadyReplicas, totalCurrentReplicas, updatedReplicas int
for _, ss := range res.currentObjects {
statusReplicas += int(ss.statefulSet.Status.Replicas)
statusReadyReplicas += int(ss.statefulSet.Status.ReadyReplicas)
totalCurrentReplicas += int(ss.statefulSet.Status.CurrentReplicas)
updatedReplicas += int(ss.statefulSet.Status.UpdatedReplicas)
}
status := runnerSet.Status.DeepCopy()
@@ -224,12 +166,6 @@ func (r *RunnerSetReconciler) Reconcile(ctx context.Context, req ctrl.Request) (
return ctrl.Result{}, nil
}
func getStatefulSetTemplateHash(rs *appsv1.StatefulSet) (string, bool) {
hash, ok := rs.Labels[LabelKeyRunnerTemplateHash]
return hash, ok
}
func getRunnerSetSelector(runnerSet *v1alpha1.RunnerSet) *metav1.LabelSelector {
selector := runnerSet.Spec.Selector
if selector == nil {
@@ -249,17 +185,12 @@ func (r *RunnerSetReconciler) newStatefulSet(runnerSet *v1alpha1.RunnerSet) (*ap
runnerSetWithOverrides.Labels = append(runnerSetWithOverrides.Labels, l)
}
// This label selector is used by default when rd.Spec.Selector is empty.
runnerSetWithOverrides.Template.ObjectMeta.Labels = CloneAndAddLabel(runnerSetWithOverrides.Template.ObjectMeta.Labels, LabelKeyRunnerSetName, runnerSet.Name)
runnerSetWithOverrides.Template.ObjectMeta.Labels = CloneAndAddLabel(runnerSetWithOverrides.Template.ObjectMeta.Labels, LabelKeyPodMutation, LabelValuePodMutation)
template := corev1.Pod{
ObjectMeta: runnerSetWithOverrides.StatefulSetSpec.Template.ObjectMeta,
Spec: runnerSetWithOverrides.StatefulSetSpec.Template.Spec,
}
pod, err := newRunnerPod(template, runnerSet.Spec.RunnerConfig, r.RunnerImage, r.RunnerImagePullSecrets, r.DockerImage, r.DockerRegistryMirror, r.GitHubBaseURL, false)
pod, err := newRunnerPod(runnerSet.Name, template, runnerSet.Spec.RunnerConfig, r.RunnerImage, r.RunnerImagePullSecrets, r.DockerImage, r.DockerRegistryMirror, r.GitHubBaseURL, false)
if err != nil {
return nil, err
}
@@ -288,9 +219,12 @@ func (r *RunnerSetReconciler) newStatefulSet(runnerSet *v1alpha1.RunnerSet) (*ap
rs := appsv1.StatefulSet{
TypeMeta: metav1.TypeMeta{},
ObjectMeta: metav1.ObjectMeta{
Name: runnerSet.ObjectMeta.Name,
Namespace: runnerSet.ObjectMeta.Namespace,
Labels: CloneAndAddLabel(runnerSet.ObjectMeta.Labels, LabelKeyRunnerTemplateHash, templateHash),
GenerateName: runnerSet.ObjectMeta.Name + "-",
Namespace: runnerSet.ObjectMeta.Namespace,
Labels: CloneAndAddLabel(runnerSet.ObjectMeta.Labels, LabelKeyRunnerTemplateHash, templateHash),
Annotations: map[string]string{
SyncTimeAnnotationKey: time.Now().Format(time.RFC3339),
},
},
Spec: runnerSetWithOverrides.StatefulSetSpec,
}
@@ -310,6 +244,22 @@ func (r *RunnerSetReconciler) SetupWithManager(mgr ctrl.Manager) error {
r.Recorder = mgr.GetEventRecorderFor(name)
if err := mgr.GetFieldIndexer().IndexField(context.TODO(), &appsv1.StatefulSet{}, runnerSetOwnerKey, func(rawObj client.Object) []string {
set := rawObj.(*appsv1.StatefulSet)
owner := metav1.GetControllerOf(set)
if owner == nil {
return nil
}
if owner.APIVersion != v1alpha1.GroupVersion.String() || owner.Kind != "RunnerSet" {
return nil
}
return []string{owner.Name}
}); err != nil {
return err
}
return ctrl.NewControllerManagedBy(mgr).
For(&v1alpha1.RunnerSet{}).
Owns(&appsv1.StatefulSet{}).

74
docs/releasenotes/0.22.md Normal file
View File

@@ -0,0 +1,74 @@
# actions-runner-controller v0.22.0
This version of ARC focuses on scalability and reliablity of runners.
## GitHub API Cache
In terms of scalability, ARC now caches GitHub API responses according to their recommendation(=Cache-Control header[^1]).
As long as GitHub keeps its current behavior, it will result in ARC to cache various List Runners API and List Workflow Jobs calls for 60 seconds.
[^1]: https://docs.github.com/en/rest/overview/resources-in-the-rest-api#conditional-requests
The cache for List Runners API is expecially important, as their responses can be shared between every runner under the same scope (repository, organization, or enterprise).
In previous versions of ARC, the number of List Runners API calls had scaled proportional to the number of runners managed by ARC.
Thanks to the addition of cache, since v0.22.0, it may scale proportional to the number of runner scopes (=The number of repositories for your repository runners + The number of organizations for your organizational runners + The number of enterprises for your enterprise runners). You might be able to scale to hundreds of runners depending on your environemnt.
Please share your experience if you successfully scaled to a level that wasn't possible with previous versions!
## Improved Runner Scale Down Process
In terms of reliability, the first thing to note is that it has a new scale down process for both RunnerDeployment and RunnerSet.
Previously every runner pod can restart immediately after the completion, while at the same time ARC might mark the same runner pod for deletion due to scale down.
That resulted in various race conditions that terminated the runner prematurely while running a workflow job[^2].
[^2]: See [this issue](https://github.com/actions-runner-controller/actions-runner-controller/issues/911) for more context.
And it's now fixed. The new scale down process ensures that the runner has been registered successfully and then de-registered from GitHub Actions, before starting the runner pod deletion process.
Any runner pod can't be terminated while being restarting or running a job now, which makes it impossible to be in the middle of running a workflow job when a runner pod is being terminated. No more race conditions.
## Optimized Ephemeral Runner Termination Makes Less "Remove Runner" API calls
It is also worth mentioning that the new scale down process makes less GitHub Actions `RemoveRunner` API calls, which contributes to more scallability.
Two enhancements had been made on that.
First, every runner managed by ARC now [uses `--ephemeral` by default](https://github.com/actions-runner-controller/actions-runner-controller/pull/1211).
Second, we [removed unnecessary `RemoveRunner` API calls](https://github.com/actions-runner-controller/actions-runner-controller/pull/1204) when it's an ephemeral runner that has already completed running.
[GitHub designed ephemeral runners to be automatically unregistered from GitHub Actions after running their first workflow jobs](https://github.blog/changelog/2021-09-20-github-actions-ephemeral-self-hosted-runners-new-webhooks-for-auto-scaling). It is unnecessary to call `RemoveRunner` API when the ephemeral runner pod has already completed successfully. These two enhancements aligns with that fact and it results in ARC making less API calls.
## Prevention of Unnecessary Runner Pod Recreations
Another reliability enhancement is based on the addition of a new field, `EffectiveTime`, to our RunnerDeployment and RunnerSet specifications.
The field comes in play only for ephemeral runners, and ARC uses it as an indicator of when to add more runner pods, to match the current number of runner pods to the desired number.
How that improves the reliability?
Previously, ARC had been continuously recreating runner pods as they complete, with no delay. That sometimes resulted in a runner pod to get recreated and then immediately terminated without being used at all. Not only this is a waste of cluster resource, it resulted in race conditions we explained in the previous section about "Improved Runner Scale Down Process". We fixed the race conditions as explained in the previous section, but the waste of cluster resource was still problematic.
With `EffectiveTime`, ARC defers the addition(and recreations, as ARC doesn't distinguish addition vs recreation) of
missing runner pods until the `EffectiveTime` is updated. `EffectiveTime` is updated only when the github-webhook-server of ARC updates the desired replicas number, ARC adds/recreates runner pods only after the webhook server updates it, the issue is resolved.
This can be an unnecessary detail, but anyway- the "defer" mechanism times out after the `DefaultRunnerPodRecreationDelayAfterWebhookScale` duration, which is currently hard-coded to 10 minutes. So in case ARC missed receiving a webhook event for proper scaling, it converges to the desired replicas after 10 minutes anyway, so that the current state eventually syncs up with the desired state.
Note that `EffectiveTime` fields are set by HRA controller for any RunnerDeployment and RunnerSet that manages ephemeral runners. That means, it is enabled regardless of the type of autoscaler you're using, webhook or API polling based ones. It isn't enabled for static(persistent) runners.
There's currently no way to opt-out of `EffectiveTime` because the author of the feature(@mumoshu) thought it's unneeded. Please open a GitHub issue with details on your use-case if you do need to opt-out.
## Generalized Runner Pod Management Logic
This one might not be an user-visible change, but I'm explaining it for anyone who may wonder.
Since this version, ARC uses the same logic for `RunnerDeployment` and `RunnerSet`. `RunnerDeployment` is Pod-based and `RunnerSet` is StatefulSet-based. That remains unchanged. But the most of the logic about how runner pods are managed is shared between the two.
The only difference is that what adapters those variants pass to the generalized logic. `RunnerDeployment` uses `RunnerReplicaSet`(our another Kubernetes custom resource that powers `RunnerDeployment`) as an owner of a runner pod, and `RunnerSet` uses `StatefulSet`(it's vanilla Kubernetes StatefulSet) as an owner of a runner pod.
This refactoring turned out to enable us to make `RunnerSet` as reliable as `RunnerDeployment`. `RunnerSet` has been considered an experimental feature
even though it is more customizable than `RunnerDeployment` and has a support for Persistent Volume Claim(PVC)s.
But since it now uses the same logic under the hood, `RunnerSet` can be considered more production-ready than before.
If you staed away from using `RunnerSet` due to that, please try it and report anything you experienced!

89
docs/releasenotes/0.23.md Normal file
View File

@@ -0,0 +1,89 @@
# actions-runner-controller v0.23.0
All changes in this release can be found in the milestone https://github.com/actions-runner-controller/actions-runner-controller/milestone/3
This log documents breaking and major enhancements
## BREAKING CHANGE : Workflow job webhooks require an explicit field set
Previously the webhook event workflow job was set as the default if no `githubEvent` was set.
**Migration Steps**
Change this:
```yaml
scaleUpTriggers:
- githubEvent: {}
duration: "30m"
```
To this:
```yaml
scaleUpTriggers:
- githubEvent:
workflowJob: {}
duration: "30m"
```
## BREAKING CHANGE : topologySpreadConstraints renamed to topologySpreadConstraint
Previously to use the pod `topologySpreadConstraint:` attribute in your runners you had to set `topologySpreadConstraints:` instead, this was a typo and has been corrected.
**Migration Steps**
Update your runners to use `topologySpreadConstraints:` instead
## BREAKING CHANGE : Default sync period is now 1 minute instead of 10 minutes
Since caching as been implemented the default sync period of 10 minutes is unnecessarily conservative and gives a poor out of the box user experience. If you need a 10 minute sync period ensure you explicitly set this value.
**Migration Steps**
Update your sync period, how this is done will depend on how you've deployed ARC.
## BREAKING CHANGE : A metric is set by default
Previously if no metric was provided and you were using pull based scaling the `TotalNumberOfQueuedAndInProgressWorkflowRuns` was metric applied. No default is set now.
**Migration Steps**
Add in the `TotalNumberOfQueuedAndInProgressWorkflowRuns` metric where you are currenty relying on it
```yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
name: example-runner-deployment
spec:
template:
spec:
organisation: my-awesome-organisation
labels:
- my-awesome-runner
---
apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
name: example-runner-deployment-autoscaler
spec:
scaleTargetRef:
name: example-runner-deployment
minReplicas: 1
maxReplicas: 5
metrics:
- type: TotalNumberOfQueuedAndInProgressWorkflowRuns
repositoryNames:
- owner/my-awesome-repo-1
- owner/my-awesome-repo-2
- owner/my-awesome-repo-3
```
## ENHANCEMENT : Find runner groups that visible to repository using a single API call
GitHub has contributed code to utilise a new API to enable us to get a repositories runner groups with a single API call. This enables us to scale runners based on the requesting repositories runner group membership without a series of expensive API queries.
This is an opt-in feature currently as it's a significant change in behaviour if enabled, additionally, whilst scaling based on the repositories runner group membership is supported in both GHES and github.com, only github.com currently has access to the new raate-limit budget friendly API.
To enable this set deploy via Helm and set `githubWebhookServer.useRunnerGroupsVisibility` to `true`.

View File

@@ -162,6 +162,10 @@ func NewServer(opts ...Option) *httptest.Server {
},
// For RemoveRunner
"/repos/test/valid/actions/runners/0": &Handler{
Status: http.StatusNoContent,
Body: "",
},
"/repos/test/valid/actions/runners/1": &Handler{
Status: http.StatusNoContent,
Body: "",

View File

@@ -11,8 +11,11 @@ import (
"time"
"github.com/actions-runner-controller/actions-runner-controller/github/metrics"
"github.com/actions-runner-controller/actions-runner-controller/logging"
"github.com/bradleyfalzon/ghinstallation"
"github.com/go-logr/logr"
"github.com/google/go-github/v39/github"
"github.com/gregjones/httpcache"
"golang.org/x/oauth2"
)
@@ -28,6 +31,8 @@ type Config struct {
BasicauthUsername string `split_words:"true"`
BasicauthPassword string `split_words:"true"`
RunnerGitHubURL string `split_words:"true"`
Log *logr.Logger
}
// Client wraps GitHub client with some additional
@@ -46,7 +51,6 @@ type BasicAuthTransport struct {
func (p BasicAuthTransport) RoundTrip(req *http.Request) (*http.Response, error) {
req.SetBasicAuth(p.Username, p.Password)
req.Header.Set("User-Agent", "actions-runner-controller")
return http.DefaultTransport.RoundTrip(req)
}
@@ -82,8 +86,11 @@ func (c *Config) NewClient() (*Client, error) {
transport = tr
}
transport = metrics.Transport{Transport: transport}
httpClient := &http.Client{Transport: transport}
cached := httpcache.NewTransport(httpcache.NewMemoryCache())
cached.Transport = transport
loggingTransport := logging.Transport{Transport: cached, Log: c.Log}
metricsTransport := metrics.Transport{Transport: loggingTransport}
httpClient := &http.Client{Transport: metricsTransport}
var client *github.Client
var githubBaseURL string
@@ -128,6 +135,8 @@ func (c *Config) NewClient() (*Client, error) {
}
}
client.UserAgent = "actions-runner-controller"
return &Client{
Client: client,
regTokens: map[string]*github.RegistrationToken{},
@@ -144,8 +153,18 @@ func (c *Client) GetRegistrationToken(ctx context.Context, enterprise, org, repo
key := getRegistrationKey(org, repo, enterprise)
rt, ok := c.regTokens[key]
// we like to give runners a chance that are just starting up and may miss the expiration date by a bit
runnerStartupTimeout := 3 * time.Minute
// We'd like to allow the runner just starting up to miss the expiration date by a bit.
// Note that this means that we're going to cache Creation Registraion Token API response longer than the
// recommended cache duration.
//
// https://docs.github.com/en/rest/reference/actions#create-a-registration-token-for-a-repository
// https://docs.github.com/en/rest/reference/actions#create-a-registration-token-for-an-organization
// https://docs.github.com/en/rest/reference/actions#create-a-registration-token-for-an-enterprise
// https://docs.github.com/en/rest/overview/resources-in-the-rest-api#conditional-requests
//
// This is currently set to 30 minutes as the result of the discussion took place at the following issue:
// https://github.com/actions-runner-controller/actions-runner-controller/issues/1295
runnerStartupTimeout := 30 * time.Minute
if ok && rt.GetExpiresAt().After(time.Now().Add(runnerStartupTimeout)) {
return rt, nil
@@ -246,6 +265,29 @@ func (c *Client) ListOrganizationRunnerGroups(ctx context.Context, org string) (
return runnerGroups, nil
}
// ListOrganizationRunnerGroupsForRepository returns all the runner groups defined in the organization and
// inherited to the organization from an enterprise.
// We can remove this when google/go-github library is updated to support this.
func (c *Client) ListOrganizationRunnerGroupsForRepository(ctx context.Context, org, repo string) ([]*github.RunnerGroup, error) {
var runnerGroups []*github.RunnerGroup
opts := github.ListOptions{PerPage: 100}
for {
list, res, err := c.listOrganizationRunnerGroupsVisibleToRepo(ctx, org, repo, &opts)
if err != nil {
return runnerGroups, fmt.Errorf("failed to list organization runner groups: %w", err)
}
runnerGroups = append(runnerGroups, list.RunnerGroups...)
if res.NextPage == 0 {
break
}
opts.Page = res.NextPage
}
return runnerGroups, nil
}
func (c *Client) ListRunnerGroupRepositoryAccesses(ctx context.Context, org string, runnerGroupId int64) ([]*github.Repository, error) {
var repos []*github.Repository
@@ -267,6 +309,42 @@ func (c *Client) ListRunnerGroupRepositoryAccesses(ctx context.Context, org stri
return repos, nil
}
// listOrganizationRunnerGroupsVisibleToRepo lists all self-hosted runner groups configured in an organization which can be used by the repository.
//
// GitHub API docs: https://docs.github.com/en/rest/reference/actions#list-self-hosted-runner-groups-for-an-organization
func (c *Client) listOrganizationRunnerGroupsVisibleToRepo(ctx context.Context, org, repo string, opts *github.ListOptions) (*github.RunnerGroups, *github.Response, error) {
repoName := repo
parts := strings.Split(repo, "/")
if len(parts) == 2 {
repoName = parts[1]
}
u := fmt.Sprintf("orgs/%v/actions/runner-groups?visible_to_repository=%v", org, repoName)
if opts != nil {
if opts.PerPage > 0 {
u = fmt.Sprintf("%v&per_page=%v", u, opts.PerPage)
}
if opts.Page > 0 {
u = fmt.Sprintf("%v&page=%v", u, opts.Page)
}
}
req, err := c.Client.NewRequest("GET", u, nil)
if err != nil {
return nil, nil, err
}
groups := &github.RunnerGroups{}
resp, err := c.Client.Do(ctx, req, &groups)
if err != nil {
return nil, resp, err
}
return groups, resp, nil
}
// cleanup removes expired registration tokens.
func (c *Client) cleanup() {
c.mu.Lock()

View File

@@ -152,3 +152,10 @@ func TestCleanup(t *testing.T) {
t.Errorf("expired token still exists")
}
}
func TestUserAgent(t *testing.T) {
client := newTestClient()
if client.UserAgent != "actions-runner-controller" {
t.Errorf("UserAgent should be set to actions-runner-controller")
}
}

41
go.mod
View File

@@ -6,29 +6,30 @@ require (
github.com/bradleyfalzon/ghinstallation v1.1.1
github.com/davecgh/go-spew v1.1.1
github.com/go-logr/logr v1.2.0
github.com/google/go-cmp v0.5.7
github.com/google/go-cmp v0.5.8
github.com/google/go-github/v39 v39.2.0
github.com/gorilla/mux v1.8.0
github.com/gregjones/httpcache v0.0.0-20190611155906-901d90724c79
github.com/kelseyhightower/envconfig v1.4.0
github.com/onsi/ginkgo v1.16.5
github.com/onsi/gomega v1.17.0
github.com/prometheus/client_golang v1.11.0
github.com/prometheus/client_golang v1.12.1
github.com/stretchr/testify v1.7.0
github.com/teambition/rrule-go v1.7.2
go.uber.org/zap v1.20.0
golang.org/x/oauth2 v0.0.0-20211104180415-d3ed0bb246c8
github.com/teambition/rrule-go v1.8.0
go.uber.org/zap v1.21.0
golang.org/x/oauth2 v0.0.0-20220309155454-6242fa91716a
gomodules.xyz/jsonpatch/v2 v2.2.0
k8s.io/api v0.23.0
k8s.io/apimachinery v0.23.0
k8s.io/client-go v0.23.0
sigs.k8s.io/controller-runtime v0.11.0
k8s.io/api v0.23.5
k8s.io/apimachinery v0.23.5
k8s.io/client-go v0.23.5
sigs.k8s.io/controller-runtime v0.11.2
sigs.k8s.io/yaml v1.3.0
)
require (
cloud.google.com/go v0.81.0 // indirect
github.com/beorn7/perks v1.0.1 // indirect
github.com/cespare/xxhash/v2 v2.1.1 // indirect
github.com/cespare/xxhash/v2 v2.1.2 // indirect
github.com/dgrijalva/jwt-go v3.2.0+incompatible // indirect
github.com/evanphx/json-patch v4.12.0+incompatible // indirect
github.com/fsnotify/fsnotify v1.5.1 // indirect
@@ -50,15 +51,15 @@ require (
github.com/pkg/errors v0.9.1 // indirect
github.com/pmezard/go-difflib v1.0.0 // indirect
github.com/prometheus/client_model v0.2.0 // indirect
github.com/prometheus/common v0.28.0 // indirect
github.com/prometheus/procfs v0.6.0 // indirect
github.com/prometheus/common v0.32.1 // indirect
github.com/prometheus/procfs v0.7.3 // indirect
github.com/spf13/pflag v1.0.5 // indirect
go.uber.org/atomic v1.7.0 // indirect
go.uber.org/multierr v1.6.0 // indirect
golang.org/x/crypto v0.0.0-20210817164053-32db794688a5 // indirect
golang.org/x/net v0.0.0-20210825183410-e898025ed96a // indirect
golang.org/x/sys v0.0.0-20211029165221-6e7872819dc8 // indirect
golang.org/x/term v0.0.0-20210615171337-6886f2dfbf5b // indirect
golang.org/x/net v0.0.0-20220127200216-cd36cc0744dd // indirect
golang.org/x/sys v0.0.0-20220114195835-da31bd327af9 // indirect
golang.org/x/term v0.0.0-20210927222741-03fcf44c2211 // indirect
golang.org/x/text v0.3.7 // indirect
golang.org/x/time v0.0.0-20210723032227-1f47c861a9ac // indirect
google.golang.org/appengine v1.6.7 // indirect
@@ -67,11 +68,13 @@ require (
gopkg.in/tomb.v1 v1.0.0-20141024135613-dd632973f1e7 // indirect
gopkg.in/yaml.v2 v2.4.0 // indirect
gopkg.in/yaml.v3 v3.0.0-20210107192922-496545a6307b // indirect
k8s.io/apiextensions-apiserver v0.23.0 // indirect
k8s.io/component-base v0.23.0 // indirect
k8s.io/apiextensions-apiserver v0.23.5 // indirect
k8s.io/component-base v0.23.5 // indirect
k8s.io/klog/v2 v2.30.0 // indirect
k8s.io/kube-openapi v0.0.0-20211115234752-e816edb12b65 // indirect
k8s.io/utils v0.0.0-20210930125809-cb0fa318a74b // indirect
k8s.io/utils v0.0.0-20211116205334-6203023598ed // indirect
sigs.k8s.io/json v0.0.0-20211020170558-c049b76a60c6 // indirect
sigs.k8s.io/structured-merge-diff/v4 v4.2.0 // indirect
sigs.k8s.io/structured-merge-diff/v4 v4.2.1 // indirect
)
replace github.com/gregjones/httpcache => github.com/actions-runner-controller/httpcache v0.2.0

73
go.sum
View File

@@ -54,6 +54,8 @@ github.com/NYTimes/gziphandler v1.1.1/go.mod h1:n/CVRwUEOgIxrgPvAQhUUr9oeUtvrhMo
github.com/OneOfOne/xxhash v1.2.2/go.mod h1:HSdplMjZKSmBqAxg5vPj2TmRDmfkzw+cTzAElWljhcU=
github.com/PuerkitoBio/purell v1.1.1/go.mod h1:c11w/QuzBsJSee3cPx9rAFu61PvFxuPbtSwDGJws/X0=
github.com/PuerkitoBio/urlesc v0.0.0-20170810143723-de5bf2ad4578/go.mod h1:uGdkoq3SwY9Y+13GIhn11/XLaGBb4BfwItxLd5jeuXE=
github.com/actions-runner-controller/httpcache v0.2.0 h1:hCNvYuVPJ2xxYBymqBvH0hSiQpqz4PHF/LbU3XghGNI=
github.com/actions-runner-controller/httpcache v0.2.0/go.mod h1:JLu9/2M/btPz1Zu/vTZ71XzukQHn2YeISPmJoM5exBI=
github.com/alecthomas/template v0.0.0-20160405071501-a0175ee3bccc/go.mod h1:LOuyumcjzFXgccqObfd/Ljyb9UuFJ6TxHnclSeseNhc=
github.com/alecthomas/template v0.0.0-20190718012654-fb15b899a751/go.mod h1:LOuyumcjzFXgccqObfd/Ljyb9UuFJ6TxHnclSeseNhc=
github.com/alecthomas/units v0.0.0-20151022065526-2efee857e7cf/go.mod h1:ybxpYRFXyAe+OPACYpWeL0wqObRcbAqCMya13uyzqw0=
@@ -83,8 +85,9 @@ github.com/certifi/gocertifi v0.0.0-20191021191039-0944d244cd40/go.mod h1:sGbDF6
github.com/certifi/gocertifi v0.0.0-20200922220541-2c3bb06c6054/go.mod h1:sGbDF6GwGcLpkNXPUTkMRoywsNa/ol15pxFe6ERfguA=
github.com/cespare/xxhash v1.1.0 h1:a6HrQnmkObjyL+Gs60czilIUGqrzKutQD6XZog3p+ko=
github.com/cespare/xxhash v1.1.0/go.mod h1:XrSqR1VqqWfGrhpAt58auRo0WTKS1nRRg3ghfAqPWnc=
github.com/cespare/xxhash/v2 v2.1.1 h1:6MnRN8NT7+YBpUIWxHtefFZOKTAPgGjpQSxqLNn0+qY=
github.com/cespare/xxhash/v2 v2.1.1/go.mod h1:VGX0DQ3Q6kWi7AoAeZDth3/j3BFtOZR5XLFGgcrjCOs=
github.com/cespare/xxhash/v2 v2.1.2 h1:YRXhKfTDauu4ajMg1TPgFO5jnlC2HCbmLXMcTG5cbYE=
github.com/cespare/xxhash/v2 v2.1.2/go.mod h1:VGX0DQ3Q6kWi7AoAeZDth3/j3BFtOZR5XLFGgcrjCOs=
github.com/chzyer/logex v1.1.10/go.mod h1:+Ywpsq7O8HXn0nuIou7OrIPyXbp3wmkHB+jjWRnGsAI=
github.com/chzyer/readline v0.0.0-20180603132655-2972be24d48e/go.mod h1:nSuG5e5PlCu98SY8svDHJxuZscDgtXS6KTTbou5AhLI=
github.com/chzyer/test v0.0.0-20180213035817-a1ea475d72b1/go.mod h1:Q3SI9o4m/ZMnBNeIyt5eFwwo7qiLfzFZmjNmxjkiQlU=
@@ -218,10 +221,11 @@ github.com/google/go-cmp v0.5.2/go.mod h1:v8dTdLbMG2kIc/vJvl+f65V22dbkXbowE6jgT/
github.com/google/go-cmp v0.5.3/go.mod h1:v8dTdLbMG2kIc/vJvl+f65V22dbkXbowE6jgT/gNBxE=
github.com/google/go-cmp v0.5.4/go.mod h1:v8dTdLbMG2kIc/vJvl+f65V22dbkXbowE6jgT/gNBxE=
github.com/google/go-cmp v0.5.5/go.mod h1:v8dTdLbMG2kIc/vJvl+f65V22dbkXbowE6jgT/gNBxE=
github.com/google/go-cmp v0.5.6 h1:BKbKCqvP6I+rmFHt06ZmyQtvB8xAkWdhFyr0ZUNZcxQ=
github.com/google/go-cmp v0.5.6/go.mod h1:v8dTdLbMG2kIc/vJvl+f65V22dbkXbowE6jgT/gNBxE=
github.com/google/go-cmp v0.5.7 h1:81/ik6ipDQS2aGcBfIN5dHDB36BwrStyeAQquSYCV4o=
github.com/google/go-cmp v0.5.7/go.mod h1:n+brtR0CgQNWTVd5ZUFpTBC8YFBDLK/h/bpaJ8/DtOE=
github.com/google/go-cmp v0.5.8 h1:e6P7q2lk1O+qJJb4BtCQXlK8vWEO8V1ZeuEdJNOqZyg=
github.com/google/go-cmp v0.5.8/go.mod h1:17dUlkBOakJ0+DkrSSNjCkIjxS6bF9zb3elmeNGIjoY=
github.com/google/go-github/v29 v29.0.2 h1:opYN6Wc7DOz7Ku3Oh4l7prmkOMwEcQxpFtxdU8N8Pts=
github.com/google/go-github/v29 v29.0.2/go.mod h1:CHKiKKPHJ0REzfwc14QMklvtHwCveD0PxlMjLlzAM5E=
github.com/google/go-github/v39 v39.2.0 h1:rNNM311XtPOz5rDdsJXAp2o8F67X9FnROXTvto3aSnQ=
@@ -258,7 +262,6 @@ github.com/gopherjs/gopherjs v0.0.0-20181017120253-0766667cb4d1/go.mod h1:wJfORR
github.com/gorilla/mux v1.8.0 h1:i40aqfkR1h2SlN9hojwV5ZA91wcXFOvkdNIeFDP5koI=
github.com/gorilla/mux v1.8.0/go.mod h1:DVbg23sWSpFRCP0SfiEN6jmj59UnW/n46BH5rLB71So=
github.com/gorilla/websocket v1.4.2/go.mod h1:YR8l580nyteQvAITg2hZ9XVh4b55+EU/adAjf1fMHhE=
github.com/gregjones/httpcache v0.0.0-20180305231024-9cad4c3443a7/go.mod h1:FecbI9+v66THATjSRHfNgh1IVFe/9kFxbXtjV0ctIMA=
github.com/grpc-ecosystem/go-grpc-middleware v1.0.0/go.mod h1:FiyG127CGDf3tlThmgyCl78X/SZQqEOJBCDaAfeWzPs=
github.com/grpc-ecosystem/go-grpc-middleware v1.3.0/go.mod h1:z0ButlSOZa5vEBq9m2m2hlwIgKw+rp3sdCBRoJY+30Y=
github.com/grpc-ecosystem/go-grpc-prometheus v1.2.0/go.mod h1:8NvIoxWQoOIhqOTXgfV/d3M/q6VIi02HzZEHgUlZvzk=
@@ -392,8 +395,9 @@ github.com/prometheus/client_golang v0.9.1/go.mod h1:7SWBe2y4D6OKWSNQJUaRYU/AaXP
github.com/prometheus/client_golang v0.9.3/go.mod h1:/TN21ttK/J9q6uSwhBd54HahCDft0ttaMvbicHlPoso=
github.com/prometheus/client_golang v1.0.0/go.mod h1:db9x61etRT2tGnBNRi70OPL5FsnadC4Ky3P0J6CfImo=
github.com/prometheus/client_golang v1.7.1/go.mod h1:PY5Wy2awLA44sXw4AOSfFBetzPP4j5+D6mVACh+pe2M=
github.com/prometheus/client_golang v1.11.0 h1:HNkLOAEQMIDv/K+04rukrLx6ch7msSRwf3/SASFAGtQ=
github.com/prometheus/client_golang v1.11.0/go.mod h1:Z6t4BnS23TR94PD6BsDNk8yVqroYurpAkEiz0P2BEV0=
github.com/prometheus/client_golang v1.12.1 h1:ZiaPsmm9uiBeaSMRznKsCDNtPCS0T3JVDGF+06gjBzk=
github.com/prometheus/client_golang v1.12.1/go.mod h1:3Z9XVyYiZYEO+YQWt3RD2R3jrbd179Rt297l4aS6nDY=
github.com/prometheus/client_model v0.0.0-20180712105110-5c3871d89910/go.mod h1:MbSGuTsp3dbXC40dX6PRTWyKYBIrTGTE9sqQNg2J8bo=
github.com/prometheus/client_model v0.0.0-20190129233127-fd36f4220a90/go.mod h1:xMI15A0UPsDsEKsMN9yxemIoYk6Tm2C1GtYGdfGttqA=
github.com/prometheus/client_model v0.0.0-20190812154241-14fe0d1b01d4/go.mod h1:xMI15A0UPsDsEKsMN9yxemIoYk6Tm2C1GtYGdfGttqA=
@@ -404,14 +408,16 @@ github.com/prometheus/common v0.4.0/go.mod h1:TNfzLD0ON7rHzMJeJkieUDPYmFC7Snx/y8
github.com/prometheus/common v0.4.1/go.mod h1:TNfzLD0ON7rHzMJeJkieUDPYmFC7Snx/y86RQel1bk4=
github.com/prometheus/common v0.10.0/go.mod h1:Tlit/dnDKsSWFlCLTWaA1cyBgKHSMdTB80sz/V91rCo=
github.com/prometheus/common v0.26.0/go.mod h1:M7rCNAaPfAosfx8veZJCuw84e35h3Cfd9VFqTh1DIvc=
github.com/prometheus/common v0.28.0 h1:vGVfV9KrDTvWt5boZO0I19g2E3CsWfpPPKZM9dt3mEw=
github.com/prometheus/common v0.28.0/go.mod h1:vu+V0TpY+O6vW9J44gczi3Ap/oXXR10b+M/gUGO4Hls=
github.com/prometheus/common v0.32.1 h1:hWIdL3N2HoUx3B8j3YN9mWor0qhY/NlEKZEaXxuIRh4=
github.com/prometheus/common v0.32.1/go.mod h1:vu+V0TpY+O6vW9J44gczi3Ap/oXXR10b+M/gUGO4Hls=
github.com/prometheus/procfs v0.0.0-20181005140218-185b4288413d/go.mod h1:c3At6R/oaqEKCNdg8wHV1ftS6bRYblBhIjjI8uT2IGk=
github.com/prometheus/procfs v0.0.0-20190507164030-5867b95ac084/go.mod h1:TjEm7ze935MbeOT/UhFTIMYKhuLP4wbCsTZCD3I8kEA=
github.com/prometheus/procfs v0.0.2/go.mod h1:TjEm7ze935MbeOT/UhFTIMYKhuLP4wbCsTZCD3I8kEA=
github.com/prometheus/procfs v0.1.3/go.mod h1:lV6e/gmhEcM9IjHGsFOCxxuZ+z1YqCvr4OA4YeYWdaU=
github.com/prometheus/procfs v0.6.0 h1:mxy4L2jP6qMonqmq+aTtOx1ifVWUgG/TAmntgbh3xv4=
github.com/prometheus/procfs v0.6.0/go.mod h1:cz+aTbrPOrUb4q7XlbU9ygM+/jj0fzG6c1xBZuNvfVA=
github.com/prometheus/procfs v0.7.3 h1:4jVXhlkAyzOScmCkXBTOLRLTz8EeU+eyjrwB/EPq0VU=
github.com/prometheus/procfs v0.7.3/go.mod h1:cz+aTbrPOrUb4q7XlbU9ygM+/jj0fzG6c1xBZuNvfVA=
github.com/prometheus/tsdb v0.7.1/go.mod h1:qhTCs0VvXwvX/y3TZrWD7rabWM+ijKTux40TwIPHuXU=
github.com/rogpeppe/fastuuid v0.0.0-20150106093220-6724a57986af/go.mod h1:XWv6SoW27p1b0cqNHllgS5HIMJraePCO15w5zCzIWYg=
github.com/rogpeppe/fastuuid v1.2.0/go.mod h1:jVj6XXZzXRy/MSR5jhDC/2q6DgLz+nrA6LYCDYWNEvQ=
@@ -457,6 +463,8 @@ github.com/stretchr/testify v1.7.0/go.mod h1:6Fq8oRcR53rry900zMqJjRRixrwX3KX962/
github.com/subosito/gotenv v1.2.0/go.mod h1:N0PQaV/YGNqwC0u51sEeR/aUtSLEXKX9iv69rRypqCw=
github.com/teambition/rrule-go v1.7.2 h1:goEajFWYydfCgavn2m/3w5U+1b3PGqPUHx/fFSVfTy0=
github.com/teambition/rrule-go v1.7.2/go.mod h1:mBJ1Ht5uboJ6jexKdNUJg2NcwP8uUMNvStWXlJD3MvU=
github.com/teambition/rrule-go v1.8.0 h1:a/IX5s56hGkFF+nRlJUooZU/45OTeeldBGL29nDKIHw=
github.com/teambition/rrule-go v1.8.0/go.mod h1:Ieq5AbrKGciP1V//Wq8ktsTXwSwJHDD5mD/wLBGl3p4=
github.com/tmc/grpc-websocket-proxy v0.0.0-20190109142713-0ad062ec5ee5/go.mod h1:ncp9v5uamzpCO7NfCPTXjqaC+bZgJeR0sMTm6dMHP7U=
github.com/tmc/grpc-websocket-proxy v0.0.0-20201229170055-e5319fda7802/go.mod h1:ncp9v5uamzpCO7NfCPTXjqaC+bZgJeR0sMTm6dMHP7U=
github.com/xiang90/probing v0.0.0-20190116061207-43a291ad63a2/go.mod h1:UETIi67q53MR2AWcXfiuqkDkRtnGDLqkBTpCHuJHxtU=
@@ -509,8 +517,8 @@ go.uber.org/zap v1.10.0/go.mod h1:vwi/ZaCAaUcBkycHslxD9B2zi4UTXhF60s6SWpuDF0Q=
go.uber.org/zap v1.17.0/go.mod h1:MXVU+bhUf/A7Xi2HNOnopQOrmycQ5Ih87HtOu4q5SSo=
go.uber.org/zap v1.19.0/go.mod h1:xg/QME4nWcxGxrpdeYfq7UvYrLh66cuVKdrbD1XF/NI=
go.uber.org/zap v1.19.1/go.mod h1:j3DNczoxDZroyBnOT1L/Q79cfUMGZxlv/9dzN7SM1rI=
go.uber.org/zap v1.20.0 h1:N4oPlghZwYG55MlU6LXk/Zp00FVNE9X9wrYO8CEs4lc=
go.uber.org/zap v1.20.0/go.mod h1:wjWOCqI0f2ZZrJF/UufIOkiC8ii6tm1iqIsLo76RfJw=
go.uber.org/zap v1.21.0 h1:WefMeulhovoZ2sYXz7st6K0sLj7bBhpiFaud4r4zST8=
go.uber.org/zap v1.21.0/go.mod h1:wjWOCqI0f2ZZrJF/UufIOkiC8ii6tm1iqIsLo76RfJw=
golang.org/x/crypto v0.0.0-20180904163835-0709b304e793/go.mod h1:6SG95UA2DQfeDnfUPMdvaQW0Q7yPrPDi9nlGo2tz2b4=
golang.org/x/crypto v0.0.0-20181029021203-45a5f77698d3/go.mod h1:6SG95UA2DQfeDnfUPMdvaQW0Q7yPrPDi9nlGo2tz2b4=
golang.org/x/crypto v0.0.0-20190308221718-c2843e01d9a2/go.mod h1:djNgcEr1/C05ACkg1iLfiJU5Ep61QUkGW8qpdssI0+w=
@@ -603,8 +611,10 @@ golang.org/x/net v0.0.0-20210405180319-a5a99cb37ef4/go.mod h1:p54w0d4576C0XHj96b
golang.org/x/net v0.0.0-20210428140749-89ef3d95e781/go.mod h1:OJAsFXCWl8Ukc7SiCT/9KSuxbyM7479/AVlXFRxuMCk=
golang.org/x/net v0.0.0-20210525063256-abc453219eb5/go.mod h1:9nx3DQGgdP8bBQD5qxJ1jj9UTztislL4KSBs9R2vV5Y=
golang.org/x/net v0.0.0-20210805182204-aaa1db679c0d/go.mod h1:9nx3DQGgdP8bBQD5qxJ1jj9UTztislL4KSBs9R2vV5Y=
golang.org/x/net v0.0.0-20210825183410-e898025ed96a h1:bRuuGXV8wwSdGTB+CtJf+FjgO1APK1CoO39T4BN/XBw=
golang.org/x/net v0.0.0-20210825183410-e898025ed96a/go.mod h1:9nx3DQGgdP8bBQD5qxJ1jj9UTztislL4KSBs9R2vV5Y=
golang.org/x/net v0.0.0-20211209124913-491a49abca63/go.mod h1:9nx3DQGgdP8bBQD5qxJ1jj9UTztislL4KSBs9R2vV5Y=
golang.org/x/net v0.0.0-20220127200216-cd36cc0744dd h1:O7DYs+zxREGLKzKoMQrtrEacpb0ZVXA5rIwylE2Xchk=
golang.org/x/net v0.0.0-20220127200216-cd36cc0744dd/go.mod h1:CfG3xpIq0wQ8r1q4Su4UZFWDARRcnwPjda9FqA0JpMk=
golang.org/x/oauth2 v0.0.0-20180821212333-d2e6202438be/go.mod h1:N/0e6XlmueqKjAGxoOufVs8QHGRruUQn6yWY3a++T0U=
golang.org/x/oauth2 v0.0.0-20190226205417-e64efc72b421/go.mod h1:gOpvHmFTYa4IltrdGE7lF6nIHvwfUNPOp7c8zoXwtLw=
golang.org/x/oauth2 v0.0.0-20190604053449-0f29369cfe45/go.mod h1:gOpvHmFTYa4IltrdGE7lF6nIHvwfUNPOp7c8zoXwtLw=
@@ -619,8 +629,8 @@ golang.org/x/oauth2 v0.0.0-20210313182246-cd4f82c27b84/go.mod h1:KelEdhl1UZF7XfJ
golang.org/x/oauth2 v0.0.0-20210402161424-2e8d93401602/go.mod h1:KelEdhl1UZF7XfJ4dDtk6s++YSgaE7mD/BuKKDLBl4A=
golang.org/x/oauth2 v0.0.0-20210514164344-f6687ab2804c/go.mod h1:KelEdhl1UZF7XfJ4dDtk6s++YSgaE7mD/BuKKDLBl4A=
golang.org/x/oauth2 v0.0.0-20210819190943-2bc19b11175f/go.mod h1:KelEdhl1UZF7XfJ4dDtk6s++YSgaE7mD/BuKKDLBl4A=
golang.org/x/oauth2 v0.0.0-20211104180415-d3ed0bb246c8 h1:RerP+noqYHUQ8CMRcPlC2nvTa4dcBIjegkuWdcUDuqg=
golang.org/x/oauth2 v0.0.0-20211104180415-d3ed0bb246c8/go.mod h1:KelEdhl1UZF7XfJ4dDtk6s++YSgaE7mD/BuKKDLBl4A=
golang.org/x/oauth2 v0.0.0-20220309155454-6242fa91716a h1:qfl7ob3DIEs3Ml9oLuPwY2N04gymzAW04WsUQHIClgM=
golang.org/x/oauth2 v0.0.0-20220309155454-6242fa91716a/go.mod h1:DAh4E804XQdzx2j+YRIaUnCqCV2RuMz24cGBJ5QYIrc=
golang.org/x/sync v0.0.0-20180314180146-1d60e4601c6f/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
golang.org/x/sync v0.0.0-20181108010431-42b317875d0f/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
golang.org/x/sync v0.0.0-20181221193216-37e7f081c4d4/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
@@ -697,11 +707,14 @@ golang.org/x/sys v0.0.0-20210616094352-59db8d763f22/go.mod h1:oPkhp1MJrh7nUepCBc
golang.org/x/sys v0.0.0-20210630005230-0f9fa26af87c/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.0.0-20210809222454-d867a43fc93e/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.0.0-20210831042530-f4d43177bf5e/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.0.0-20211029165221-6e7872819dc8 h1:M69LAlWZCshgp0QSzyDcSsSIejIEeuaCVpmwcKwyLMk=
golang.org/x/sys v0.0.0-20211029165221-6e7872819dc8/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.0.0-20211216021012-1d35b9e2eb4e/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.0.0-20220114195835-da31bd327af9 h1:XfKQ4OlFl8okEOr5UvAqFRVj8pY/4yfcXrddB8qAbU0=
golang.org/x/sys v0.0.0-20220114195835-da31bd327af9/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/term v0.0.0-20201126162022-7de9c90e9dd1/go.mod h1:bj7SfCRtBDWHUb9snDiAeCFNEtKQo2Wmx5Cou7ajbmo=
golang.org/x/term v0.0.0-20210615171337-6886f2dfbf5b h1:9zKuko04nR4gjZ4+DNjHqRlAJqbJETHwiNKDqTfOjfE=
golang.org/x/term v0.0.0-20210615171337-6886f2dfbf5b/go.mod h1:jbD1KX2456YbFQfuXm/mYQcufACuNUgVhRMnK/tPxf8=
golang.org/x/term v0.0.0-20210927222741-03fcf44c2211 h1:JGgROgKl9N8DuW20oFS5gxc+lE67/N3FcwmBPMe7ArY=
golang.org/x/term v0.0.0-20210927222741-03fcf44c2211/go.mod h1:jbD1KX2456YbFQfuXm/mYQcufACuNUgVhRMnK/tPxf8=
golang.org/x/text v0.0.0-20170915032832-14c0d48ead0c/go.mod h1:NqM8EUOU14njkJ3fqMW+pc6Ldnwhi/IjpwHt7yyuwOQ=
golang.org/x/text v0.3.0/go.mod h1:NqM8EUOU14njkJ3fqMW+pc6Ldnwhi/IjpwHt7yyuwOQ=
golang.org/x/text v0.3.1-0.20180807135948-17ff2d5776d2/go.mod h1:NqM8EUOU14njkJ3fqMW+pc6Ldnwhi/IjpwHt7yyuwOQ=
@@ -935,18 +948,33 @@ honnef.co/go/tools v0.0.0-20190523083050-ea95bdfd59fc/go.mod h1:rf3lG4BRIbNafJWh
honnef.co/go/tools v0.0.1-2019.2.3/go.mod h1:a3bituU0lyd329TUQxRnasdCoJDkEUEAqEt0JzvZhAg=
honnef.co/go/tools v0.0.1-2020.1.3/go.mod h1:X/FiERA/W4tHapMX5mGpAtMSVEeEUOyHaw9vFzvIQ3k=
honnef.co/go/tools v0.0.1-2020.1.4/go.mod h1:X/FiERA/W4tHapMX5mGpAtMSVEeEUOyHaw9vFzvIQ3k=
k8s.io/api v0.23.0 h1:WrL1gb73VSC8obi8cuYETJGXEoFNEh3LU0Pt+Sokgro=
k8s.io/api v0.23.0/go.mod h1:8wmDdLBHBNxtOIytwLstXt5E9PddnZb0GaMcqsvDBpg=
k8s.io/api v0.23.4 h1:85gnfXQOWbJa1SiWGpE9EEtHs0UVvDyIsSMpEtl2D4E=
k8s.io/api v0.23.4/go.mod h1:i77F4JfyNNrhOjZF7OwwNJS5Y1S9dpwvb9iYRYRczfI=
k8s.io/api v0.23.5 h1:zno3LUiMubxD/V1Zw3ijyKO3wxrhbUF1Ck+VjBvfaoA=
k8s.io/api v0.23.5/go.mod h1:Na4XuKng8PXJ2JsploYYrivXrINeTaycCGcYgF91Xm8=
k8s.io/apiextensions-apiserver v0.23.0 h1:uii8BYmHYiT2ZTAJxmvc3X8UhNYMxl2A0z0Xq3Pm+WY=
k8s.io/apiextensions-apiserver v0.23.0/go.mod h1:xIFAEEDlAZgpVBl/1VSjGDmLoXAWRG40+GsWhKhAxY4=
k8s.io/apimachinery v0.23.0 h1:mIfWRMjBuMdolAWJ3Fd+aPTMv3X9z+waiARMpvvb0HQ=
k8s.io/apiextensions-apiserver v0.23.5 h1:5SKzdXyvIJKu+zbfPc3kCbWpbxi+O+zdmAJBm26UJqI=
k8s.io/apiextensions-apiserver v0.23.5/go.mod h1:ntcPWNXS8ZPKN+zTXuzYMeg731CP0heCTl6gYBxLcuQ=
k8s.io/apimachinery v0.23.0/go.mod h1:fFCTTBKvKcwTPFzjlcxp91uPFZr+JA0FubU4fLzzFYc=
k8s.io/apimachinery v0.23.4 h1:fhnuMd/xUL3Cjfl64j5ULKZ1/J9n8NuQEgNL+WXWfdM=
k8s.io/apimachinery v0.23.4/go.mod h1:BEuFMMBaIbcOqVIJqNZJXGFTP4W6AycEpb5+m/97hrM=
k8s.io/apimachinery v0.23.5 h1:Va7dwhp8wgkUPWsEXk6XglXWU4IKYLKNlv8VkX7SDM0=
k8s.io/apimachinery v0.23.5/go.mod h1:BEuFMMBaIbcOqVIJqNZJXGFTP4W6AycEpb5+m/97hrM=
k8s.io/apiserver v0.23.0/go.mod h1:Cec35u/9zAepDPPFyT+UMrgqOCjgJ5qtfVJDxjZYmt4=
k8s.io/client-go v0.23.0 h1:vcsOqyPq7XV3QmQRCBH/t9BICJM9Q1M18qahjv+rebY=
k8s.io/apiserver v0.23.5/go.mod h1:7wvMtGJ42VRxzgVI7jkbKvMbuCbVbgsWFT7RyXiRNTw=
k8s.io/client-go v0.23.0/go.mod h1:hrDnpnK1mSr65lHHcUuIZIXDgEbzc7/683c6hyG4jTA=
k8s.io/client-go v0.23.4 h1:YVWvPeerA2gpUudLelvsolzH7c2sFoXXR5wM/sWqNFU=
k8s.io/client-go v0.23.4/go.mod h1:PKnIL4pqLuvYUK1WU7RLTMYKPiIh7MYShLshtRY9cj0=
k8s.io/client-go v0.23.5 h1:zUXHmEuqx0RY4+CsnkOn5l0GU+skkRXKGJrhmE2SLd8=
k8s.io/client-go v0.23.5/go.mod h1:flkeinTO1CirYgzMPRWxUCnV0G4Fbu2vLhYCObnt/r4=
k8s.io/code-generator v0.23.0/go.mod h1:vQvOhDXhuzqiVfM/YHp+dmg10WDZCchJVObc9MvowsE=
k8s.io/code-generator v0.23.5/go.mod h1:S0Q1JVA+kSzTI1oUvbKAxZY/DYbA/ZUb4Uknog12ETk=
k8s.io/component-base v0.23.0 h1:UAnyzjvVZ2ZR1lF35YwtNY6VMN94WtOnArcXBu34es8=
k8s.io/component-base v0.23.0/go.mod h1:DHH5uiFvLC1edCpvcTDV++NKULdYYU6pR9Tt3HIKMKI=
k8s.io/component-base v0.23.5 h1:8qgP5R6jG1BBSXmRYW+dsmitIrpk8F/fPEvgDenMCCE=
k8s.io/component-base v0.23.5/go.mod h1:c5Nq44KZyt1aLl0IpHX82fhsn84Sb0jjzwjpcA42bY0=
k8s.io/gengo v0.0.0-20210813121822-485abfe95c7c/go.mod h1:FiNAH4ZV3gBg2Kwh89tzAEV2be7d5xI0vBa/VySYy3E=
k8s.io/klog/v2 v2.0.0/go.mod h1:PBfzABfn139FHAV07az/IF9Wp1bkk3vpT2XSJ76fSDE=
k8s.io/klog/v2 v2.2.0/go.mod h1:Od+F08eJP+W3HUb4pSrPpgp9DGU4GzlpG/TmITuYh/Y=
@@ -955,20 +983,25 @@ k8s.io/klog/v2 v2.30.0/go.mod h1:y1WjHnz7Dj687irZUWR/WLkLc5N1YHtjLdmgWjndZn0=
k8s.io/kube-openapi v0.0.0-20211115234752-e816edb12b65 h1:E3J9oCLlaobFUqsjG9DfKbP2BmgwBL2p7pn0A3dG9W4=
k8s.io/kube-openapi v0.0.0-20211115234752-e816edb12b65/go.mod h1:sX9MT8g7NVZM5lVL/j8QyCCJe8YSMW30QvGZWaCIDIk=
k8s.io/utils v0.0.0-20210802155522-efc7438f0176/go.mod h1:jPW/WVKK9YHAvNhRxK0md/EJ228hCsBRufyofKtW8HA=
k8s.io/utils v0.0.0-20210930125809-cb0fa318a74b h1:wxEMGetGMur3J1xuGLQY7GEQYg9bZxKn3tKo5k/eYcs=
k8s.io/utils v0.0.0-20210930125809-cb0fa318a74b/go.mod h1:jPW/WVKK9YHAvNhRxK0md/EJ228hCsBRufyofKtW8HA=
k8s.io/utils v0.0.0-20211116205334-6203023598ed h1:ck1fRPWPJWsMd8ZRFsWc6mh/zHp5fZ/shhbrgPUxDAE=
k8s.io/utils v0.0.0-20211116205334-6203023598ed/go.mod h1:jPW/WVKK9YHAvNhRxK0md/EJ228hCsBRufyofKtW8HA=
rsc.io/binaryregexp v0.2.0/go.mod h1:qTv7/COck+e2FymRvadv62gMdZztPaShugOCi3I+8D8=
rsc.io/quote/v3 v3.1.0/go.mod h1:yEA65RcK8LyAZtP9Kv3t0HmxON59tX3rD+tICJqUlj0=
rsc.io/sampler v1.3.0/go.mod h1:T1hPZKmBbMNahiBKFy5HrXp6adAjACjK9JXDnKaTXpA=
sigs.k8s.io/apiserver-network-proxy/konnectivity-client v0.0.25/go.mod h1:Mlj9PNLmG9bZ6BHFwFKDo5afkpWyUISkb9Me0GnK66I=
sigs.k8s.io/controller-runtime v0.11.0 h1:DqO+c8mywcZLFJWILq4iktoECTyn30Bkj0CwgqMpZWQ=
sigs.k8s.io/controller-runtime v0.11.0/go.mod h1:KKwLiTooNGu+JmLZGn9Sl3Gjmfj66eMbCQznLP5zcqA=
sigs.k8s.io/apiserver-network-proxy/konnectivity-client v0.0.30/go.mod h1:fEO7lRTdivWO2qYVCVG7dEADOMo/MLDCVr8So2g88Uw=
sigs.k8s.io/controller-runtime v0.11.1 h1:7YIHT2QnHJArj/dk9aUkYhfqfK5cIxPOX5gPECfdZLU=
sigs.k8s.io/controller-runtime v0.11.1/go.mod h1:KKwLiTooNGu+JmLZGn9Sl3Gjmfj66eMbCQznLP5zcqA=
sigs.k8s.io/controller-runtime v0.11.2 h1:H5GTxQl0Mc9UjRJhORusqfJCIjBO8UtUxGggCwL1rLA=
sigs.k8s.io/controller-runtime v0.11.2/go.mod h1:P6QCzrEjLaZGqHsfd+os7JQ+WFZhvB8MRFsn4dWF7O4=
sigs.k8s.io/json v0.0.0-20211020170558-c049b76a60c6 h1:fD1pz4yfdADVNfFmcP2aBEtudwUQ1AlLnRBALr33v3s=
sigs.k8s.io/json v0.0.0-20211020170558-c049b76a60c6/go.mod h1:p4QtZmO4uMYipTQNzagwnNoseA6OxSUutVw05NhYDRs=
sigs.k8s.io/structured-merge-diff/v4 v4.0.2/go.mod h1:bJZC9H9iH24zzfZ/41RGcq60oK1F7G282QMXDPYydCw=
sigs.k8s.io/structured-merge-diff/v4 v4.1.2/go.mod h1:j/nl6xW8vLS49O8YvXW1ocPhZawJtm+Yrr7PPRQ0Vg4=
sigs.k8s.io/structured-merge-diff/v4 v4.2.0 h1:kDvPBbnPk+qYmkHmSo8vKGp438IASWofnbbUKDE/bv0=
sigs.k8s.io/structured-merge-diff/v4 v4.2.0/go.mod h1:j/nl6xW8vLS49O8YvXW1ocPhZawJtm+Yrr7PPRQ0Vg4=
sigs.k8s.io/structured-merge-diff/v4 v4.2.1 h1:bKCqE9GvQ5tiVHn5rfn1r+yao3aLQEaLzkkmAkf+A6Y=
sigs.k8s.io/structured-merge-diff/v4 v4.2.1/go.mod h1:j/nl6xW8vLS49O8YvXW1ocPhZawJtm+Yrr7PPRQ0Vg4=
sigs.k8s.io/yaml v1.2.0/go.mod h1:yfXDCHCao9+ENCvLSE62v9VSji2MKu5jeNfTrofGhJc=
sigs.k8s.io/yaml v1.3.0 h1:a2VclLzOGrwOHDiV8EfBGhvjHvP46CtW5j6POvhYGGo=
sigs.k8s.io/yaml v1.3.0/go.mod h1:GeOyir5tyXNByN85N/dRIT9es5UQNerPYEKK56eTBm8=

56
logging/logger.go Normal file
View File

@@ -0,0 +1,56 @@
package logging
import (
"fmt"
"os"
"strconv"
"time"
"github.com/go-logr/logr"
zaplib "go.uber.org/zap"
"go.uber.org/zap/zapcore"
"sigs.k8s.io/controller-runtime/pkg/log/zap"
)
const (
LogLevelDebug = "debug"
LogLevelInfo = "info"
LogLevelWarn = "warn"
LogLevelError = "error"
)
func NewLogger(logLevel string) logr.Logger {
log := zap.New(func(o *zap.Options) {
switch logLevel {
case LogLevelDebug:
o.Development = true
lvl := zaplib.NewAtomicLevelAt(zaplib.DebugLevel) // maps to logr's V(1)
o.Level = &lvl
case LogLevelInfo:
lvl := zaplib.NewAtomicLevelAt(zaplib.InfoLevel)
o.Level = &lvl
case LogLevelWarn:
lvl := zaplib.NewAtomicLevelAt(zaplib.WarnLevel)
o.Level = &lvl
case LogLevelError:
lvl := zaplib.NewAtomicLevelAt(zaplib.ErrorLevel)
o.Level = &lvl
default:
// We use bitsize of 8 as zapcore.Level is a type alias to int8
levelInt, err := strconv.ParseInt(logLevel, 10, 8)
if err != nil {
fmt.Fprintf(os.Stderr, "Failed to parse --log-level=%s: %v", logLevel, err)
os.Exit(1)
}
// For example, --log-level=debug a.k.a --log-level=-1 maps to zaplib.DebugLevel, which is associated to logr's V(1)
// --log-level=-2 maps the specific custom log level that is associated to logr's V(2).
level := zapcore.Level(levelInt)
atomicLevel := zaplib.NewAtomicLevelAt(level)
o.Level = &atomicLevel
}
o.TimeEncoder = zapcore.TimeEncoderOfLayout(time.RFC3339)
})
return log
}

66
logging/transport.go Normal file
View File

@@ -0,0 +1,66 @@
// Package logging provides various logging helpers for ARC
package logging
import (
"bytes"
"io"
"net/http"
"github.com/go-logr/logr"
"github.com/gregjones/httpcache"
)
const (
// https://docs.github.com/en/rest/overview/resources-in-the-rest-api#rate-limiting
headerRateLimitRemaining = "X-RateLimit-Remaining"
)
// Transport wraps a transport with metrics monitoring
type Transport struct {
Transport http.RoundTripper
Log *logr.Logger
}
func (t Transport) RoundTrip(req *http.Request) (*http.Response, error) {
resp, err := t.Transport.RoundTrip(req)
if resp != nil {
t.log(req, resp)
}
return resp, err
}
func (t Transport) log(req *http.Request, resp *http.Response) {
if t.Log == nil {
return
}
var args []interface{}
marked := resp.Header.Get(httpcache.XFromCache) == "1"
args = append(args, "from_cache", marked, "method", req.Method, "url", req.URL.String())
if !marked {
// Do not log outdated rate limit remaining value
remaining := resp.Header.Get(headerRateLimitRemaining)
args = append(args, "ratelimit_remaining", remaining)
}
if t.Log.V(4).Enabled() {
var buf bytes.Buffer
if _, err := io.Copy(&buf, resp.Body); err != nil {
t.Log.V(3).Info("unable to copy http response", "error", err)
}
resp.Body.Close()
t.Log.V(4).Info("Logging HTTP round-trip", "method", req.Method, "requestHeader", req.Header, "statusCode", resp.StatusCode, "responseHeader", resp.Header, "responseBody", buf.String())
resp.Body = io.NopCloser(&buf)
}
t.Log.V(3).Info("Seen HTTP response", args...)
}

46
main.go
View File

@@ -26,24 +26,18 @@ import (
actionsv1alpha1 "github.com/actions-runner-controller/actions-runner-controller/api/v1alpha1"
"github.com/actions-runner-controller/actions-runner-controller/controllers"
"github.com/actions-runner-controller/actions-runner-controller/github"
"github.com/actions-runner-controller/actions-runner-controller/logging"
"github.com/kelseyhightower/envconfig"
zaplib "go.uber.org/zap"
"k8s.io/apimachinery/pkg/runtime"
clientgoscheme "k8s.io/client-go/kubernetes/scheme"
_ "k8s.io/client-go/plugin/pkg/client/auth/gcp"
ctrl "sigs.k8s.io/controller-runtime"
"sigs.k8s.io/controller-runtime/pkg/log/zap"
// +kubebuilder:scaffold:imports
)
const (
defaultRunnerImage = "summerwind/actions-runner:latest"
defaultDockerImage = "docker:dind"
logLevelDebug = "debug"
logLevelInfo = "info"
logLevelWarn = "warn"
logLevelError = "error"
)
var (
@@ -80,6 +74,7 @@ func main() {
syncPeriod time.Duration
gitHubAPICacheDuration time.Duration
defaultScaleDownDelay time.Duration
runnerImage string
runnerImagePullSecrets stringSlice
@@ -116,28 +111,17 @@ func main() {
flag.StringVar(&c.BasicauthUsername, "github-basicauth-username", c.BasicauthUsername, "Username for GitHub basic auth to use instead of PAT or GitHub APP in case it's running behind a proxy API")
flag.StringVar(&c.BasicauthPassword, "github-basicauth-password", c.BasicauthPassword, "Password for GitHub basic auth to use instead of PAT or GitHub APP in case it's running behind a proxy API")
flag.StringVar(&c.RunnerGitHubURL, "runner-github-url", c.RunnerGitHubURL, "GitHub URL to be used by runners during registration")
flag.DurationVar(&gitHubAPICacheDuration, "github-api-cache-duration", 0, "The duration until the GitHub API cache expires. Setting this to e.g. 10m results in the controller tries its best not to make the same API call within 10m to reduce the chance of being rate-limited. Defaults to mostly the same value as sync-period. If you're tweaking this in order to make autoscaling more responsive, you'll probably want to tweak sync-period, too")
flag.DurationVar(&syncPeriod, "sync-period", 10*time.Minute, "Determines the minimum frequency at which K8s resources managed by this controller are reconciled. When you use autoscaling, set to a lower value like 10 minute, because this corresponds to the minimum time to react on demand change. . If you're tweaking this in order to make autoscaling more responsive, you'll probably want to tweak github-api-cache-duration, too")
flag.DurationVar(&gitHubAPICacheDuration, "github-api-cache-duration", 0, "DEPRECATED: The duration until the GitHub API cache expires. Setting this to e.g. 10m results in the controller tries its best not to make the same API call within 10m to reduce the chance of being rate-limited. Defaults to mostly the same value as sync-period. If you're tweaking this in order to make autoscaling more responsive, you'll probably want to tweak sync-period, too")
flag.DurationVar(&defaultScaleDownDelay, "default-scale-down-delay", controllers.DefaultScaleDownDelay, "The approximate delay for a scale down followed by a scale up, used to prevent flapping (down->up->down->... loop)")
flag.DurationVar(&syncPeriod, "sync-period", 1*time.Minute, "Determines the minimum frequency at which K8s resources managed by this controller are reconciled.")
flag.Var(&commonRunnerLabels, "common-runner-labels", "Runner labels in the K1=V1,K2=V2,... format that are inherited all the runners created by the controller. See https://github.com/actions-runner-controller/actions-runner-controller/issues/321 for more information")
flag.StringVar(&namespace, "watch-namespace", "", "The namespace to watch for custom resources. Set to empty for letting it watch for all namespaces.")
flag.StringVar(&logLevel, "log-level", logLevelDebug, `The verbosity of the logging. Valid values are "debug", "info", "warn", "error". Defaults to "debug".`)
flag.StringVar(&logLevel, "log-level", logging.LogLevelDebug, `The verbosity of the logging. Valid values are "debug", "info", "warn", "error". Defaults to "debug".`)
flag.Parse()
logger := zap.New(func(o *zap.Options) {
switch logLevel {
case logLevelDebug:
o.Development = true
case logLevelInfo:
lvl := zaplib.NewAtomicLevelAt(zaplib.InfoLevel)
o.Level = &lvl
case logLevelWarn:
lvl := zaplib.NewAtomicLevelAt(zaplib.WarnLevel)
o.Level = &lvl
case logLevelError:
lvl := zaplib.NewAtomicLevelAt(zaplib.ErrorLevel)
o.Level = &lvl
}
})
logger := logging.NewLogger(logLevel)
c.Log = &logger
ghClient, err = c.NewClient()
if err != nil {
@@ -230,6 +214,7 @@ func main() {
log.Info(
"Initializing actions-runner-controller",
"github-api-cache-duration", gitHubAPICacheDuration,
"default-scale-down-delay", defaultScaleDownDelay,
"sync-period", syncPeriod,
"runner-image", runnerImage,
"docker-image", dockerImage,
@@ -238,11 +223,12 @@ func main() {
)
horizontalRunnerAutoscaler := &controllers.HorizontalRunnerAutoscalerReconciler{
Client: mgr.GetClient(),
Log: log.WithName("horizontalrunnerautoscaler"),
Scheme: mgr.GetScheme(),
GitHubClient: ghClient,
CacheDuration: gitHubAPICacheDuration,
Client: mgr.GetClient(),
Log: log.WithName("horizontalrunnerautoscaler"),
Scheme: mgr.GetScheme(),
GitHubClient: ghClient,
CacheDuration: gitHubAPICacheDuration,
DefaultScaleDownDelay: defaultScaleDownDelay,
}
runnerPodReconciler := &controllers.RunnerPodReconciler{

View File

@@ -1,7 +1,7 @@
FROM ubuntu:20.04
ARG TARGETPLATFORM
ARG RUNNER_VERSION=2.287.1
ARG RUNNER_VERSION=2.290.1
ARG DOCKER_CHANNEL=stable
ARG DOCKER_VERSION=20.10.12
ARG DUMB_INIT_VERSION=1.2.5
@@ -10,10 +10,10 @@ RUN test -n "$TARGETPLATFORM" || (echo "TARGETPLATFORM must be set" && false)
ENV DEBIAN_FRONTEND=noninteractive
RUN apt update -y \
&& apt install -y software-properties-common \
&& apt-get install -y software-properties-common \
&& add-apt-repository -y ppa:git-core/ppa \
&& apt update -y \
&& apt install -y --no-install-recommends \
&& apt-get update -y \
&& apt-get install -y --no-install-recommends \
build-essential \
curl \
ca-certificates \
@@ -66,7 +66,6 @@ RUN set -vx; \
&& usermod -aG docker runner \
&& echo "%sudo ALL=(ALL:ALL) NOPASSWD:ALL" > /etc/sudoers
ENV RUNNER_ASSETS_DIR=/runnertmp
ENV HOME=/home/runner
# Uncomment the below COPY to use your own custom build of actions-runner.
@@ -83,7 +82,7 @@ ENV HOME=/home/runner
#
# If you're willing to uncomment the following line, you'd also need to comment-out the
# && curl -L -o runner.tar.gz https://github.com/actions/runner/releases/download/v${RUNNER_VERSION}/actions-runner-linux-${ARCH}-${RUNNER_VERSION}.tar.gz \
# line in the next `RUN` command in this Dockerfile, to avoid overwiding this runner.tar.gz with a remote one.
# line in the next `RUN` command in this Dockerfile, to avoid overwiting this runner.tar.gz with a remote one.
# COPY actions-runner-linux-x64-2.280.3.tar.gz /runnertmp/runner.tar.gz
@@ -92,6 +91,7 @@ ENV HOME=/home/runner
# libyaml-dev is required for ruby/setup-ruby action.
# It is installed after installdependencies.sh and before removing /var/lib/apt/lists
# to avoid rerunning apt-update on its own.
ENV RUNNER_ASSETS_DIR=/runnertmp
RUN export ARCH=$(echo ${TARGETPLATFORM} | cut -d / -f2) \
&& if [ "$ARCH" = "amd64" ] || [ "$ARCH" = "x86_64" ] || [ "$ARCH" = "i386" ]; then export ARCH=x64 ; fi \
&& mkdir -p "$RUNNER_ASSETS_DIR" \
@@ -110,14 +110,18 @@ RUN mkdir /opt/hostedtoolcache \
&& chgrp docker /opt/hostedtoolcache \
&& chmod g+rwx /opt/hostedtoolcache
COPY entrypoint.sh /
COPY --chown=runner:docker patched $RUNNER_ASSETS_DIR/patched
# We place the scripts in `/usr/bin` so that users who extend this image can
# override them with scripts of the same name placed in `/usr/local/bin`.
COPY entrypoint.sh logger.bash /usr/bin/
# Add the Python "User Script Directory" to the PATH
ENV PATH="${PATH}:${HOME}/.local/bin"
ENV ImageOS=ubuntu20
RUN echo "PATH=${PATH}" > /etc/environment \
&& echo "ImageOS=${ImageOS}" >> /etc/environment
USER runner
ENTRYPOINT ["/usr/local/bin/dumb-init", "--"]
CMD ["/entrypoint.sh"]
CMD ["entrypoint.sh"]

View File

@@ -1,7 +1,7 @@
FROM ubuntu:20.04
ARG TARGETPLATFORM
ARG RUNNER_VERSION=2.287.1
ARG RUNNER_VERSION=2.290.1
ARG DOCKER_CHANNEL=stable
ARG DOCKER_VERSION=20.10.12
ARG DUMB_INIT_VERSION=1.2.5
@@ -10,10 +10,10 @@ RUN test -n "$TARGETPLATFORM" || (echo "TARGETPLATFORM must be set" && false)
ENV DEBIAN_FRONTEND=noninteractive
RUN apt update -y \
&& apt install -y software-properties-common \
&& apt-get install -y software-properties-common \
&& add-apt-repository -y ppa:git-core/ppa \
&& apt update -y \
&& apt install -y --no-install-recommends \
&& apt-get update -y \
&& apt-get install -y --no-install-recommends \
build-essential \
curl \
ca-certificates \
@@ -74,7 +74,6 @@ RUN export ARCH=$(echo ${TARGETPLATFORM} | cut -d / -f2) \
dockerd --version; \
docker --version
ENV RUNNER_ASSETS_DIR=/runnertmp
ENV HOME=/home/runner
# Runner download supports amd64 as x64
@@ -82,6 +81,7 @@ ENV HOME=/home/runner
# libyaml-dev is required for ruby/setup-ruby action.
# It is installed after installdependencies.sh and before removing /var/lib/apt/lists
# to avoid rerunning apt-update on its own.
ENV RUNNER_ASSETS_DIR=/runnertmp
RUN export ARCH=$(echo ${TARGETPLATFORM} | cut -d / -f2) \
&& if [ "$ARCH" = "amd64" ] || [ "$ARCH" = "x86_64" ] || [ "$ARCH" = "i386" ]; then export ARCH=x64 ; fi \
&& mkdir -p "$RUNNER_ASSETS_DIR" \
@@ -97,13 +97,12 @@ ENV RUNNER_TOOL_CACHE=/opt/hostedtoolcache
RUN mkdir /opt/hostedtoolcache \
&& chgrp docker /opt/hostedtoolcache \
&& chmod g+rwx /opt/hostedtoolcache
COPY modprobe startup.sh /usr/local/bin/
COPY supervisor/ /etc/supervisor/conf.d/
COPY logger.sh /opt/bash-utils/logger.sh
COPY entrypoint.sh /usr/local/bin/
RUN chmod +x /usr/local/bin/startup.sh /usr/local/bin/entrypoint.sh /usr/local/bin/modprobe
# We place the scripts in `/usr/bin` so that users who extend this image can
# override them with scripts of the same name placed in `/usr/local/bin`.
COPY entrypoint.sh logger.bash startup.sh /usr/bin/
COPY supervisor/ /etc/supervisor/conf.d/
RUN chmod +x /usr/bin/startup.sh /usr/bin/entrypoint.sh
# arch command on OS X reports "i386" for Intel CPUs regardless of bitness
RUN export ARCH=$(echo ${TARGETPLATFORM} | cut -d / -f2) \
@@ -114,12 +113,13 @@ RUN export ARCH=$(echo ${TARGETPLATFORM} | cut -d / -f2) \
VOLUME /var/lib/docker
COPY --chown=runner:docker patched $RUNNER_ASSETS_DIR/patched
# Add the Python "User Script Directory" to the PATH
ENV PATH="${PATH}:${HOME}/.local/bin"
ENV ImageOS=ubuntu20
RUN echo "PATH=${PATH}" > /etc/environment \
&& echo "ImageOS=${ImageOS}" >> /etc/environment
# No group definition, as that makes it harder to run docker.
USER runner

View File

@@ -4,7 +4,7 @@ DIND_RUNNER_NAME ?= ${DOCKER_USER}/actions-runner-dind
TAG ?= latest
TARGETPLATFORM ?= $(shell arch)
RUNNER_VERSION ?= 2.287.1
RUNNER_VERSION ?= 2.290.1
DOCKER_VERSION ?= 20.10.12
# default list of platforms for which multiarch image is built
@@ -33,7 +33,8 @@ docker-push-ubuntu:
docker push ${DIND_RUNNER_NAME}:${TAG}
docker-buildx-ubuntu:
export DOCKER_CLI_EXPERIMENTAL=enabled
export DOCKER_CLI_EXPERIMENTAL=enabled ;\
export DOCKER_BUILDKIT=1
@if ! docker buildx ls | grep -q container-builder; then\
docker buildx create --platform ${PLATFORMS} --name container-builder --use;\
fi

View File

@@ -1,50 +1,35 @@
#!/bin/bash
source logger.bash
RUNNER_ASSETS_DIR=${RUNNER_ASSETS_DIR:-/runnertmp}
RUNNER_HOME=${RUNNER_HOME:-/runner}
LIGHTGREEN="\e[0;32m"
LIGHTRED="\e[0;31m"
WHITE="\e[0;97m"
RESET="\e[0m"
log(){
printf "${WHITE}${@}${RESET}\n" 1>&2
}
success(){
printf "${LIGHTGREEN}${@}${RESET}\n" 1>&2
}
error(){
printf "${LIGHTRED}${@}${RESET}\n" 1>&2
}
if [ ! -z "${STARTUP_DELAY_IN_SECONDS}" ]; then
log "Delaying startup by ${STARTUP_DELAY_IN_SECONDS} seconds"
log.notice "Delaying startup by ${STARTUP_DELAY_IN_SECONDS} seconds"
sleep ${STARTUP_DELAY_IN_SECONDS}
fi
if [[ "${DISABLE_WAIT_FOR_DOCKER}" != "true" ]] && [[ "${DOCKER_ENABLED}" == "true" ]]; then
log "Docker enabled runner detected and Docker daemon wait is enabled"
log "Waiting until Docker is avaliable or the timeout is reached"
log.debug 'Docker enabled runner detected and Docker daemon wait is enabled'
log.debug 'Waiting until Docker is available or the timeout is reached'
timeout 120s bash -c 'until docker ps ;do sleep 1; done'
else
log "Docker wait check skipped. Either Docker is disabled or the wait is disabled, continuing with entrypoint"
log.notice 'Docker wait check skipped. Either Docker is disabled or the wait is disabled, continuing with entrypoint'
fi
if [ -z "${GITHUB_URL}" ]; then
log "Working with public GitHub"
log.debug 'Working with public GitHub'
GITHUB_URL="https://github.com/"
else
length=${#GITHUB_URL}
last_char=${GITHUB_URL:length-1:1}
[[ $last_char != "/" ]] && GITHUB_URL="$GITHUB_URL/"; :
log "Github endpoint URL ${GITHUB_URL}"
log.debug "Github endpoint URL ${GITHUB_URL}"
fi
if [ -z "${RUNNER_NAME}" ]; then
error "RUNNER_NAME must be set"
log.error 'RUNNER_NAME must be set'
exit 1
fi
@@ -57,12 +42,12 @@ elif [ -n "${RUNNER_REPO}" ]; then
elif [ -n "${RUNNER_ENTERPRISE}" ]; then
ATTACH="enterprises/${RUNNER_ENTERPRISE}"
else
error "At least one of RUNNER_ORG or RUNNER_REPO or RUNNER_ENTERPRISE must be set"
log.error 'At least one of RUNNER_ORG, RUNNER_REPO, or RUNNER_ENTERPRISE must be set'
exit 1
fi
if [ -z "${RUNNER_TOKEN}" ]; then
error "RUNNER_TOKEN must be set"
log.error 'RUNNER_TOKEN must be set'
exit 1
fi
@@ -72,33 +57,37 @@ fi
# Hack due to https://github.com/actions-runner-controller/actions-runner-controller/issues/252#issuecomment-758338483
if [ ! -d "${RUNNER_HOME}" ]; then
error "${RUNNER_HOME} should be an emptyDir mount. Please fix the pod spec."
log.error "$RUNNER_HOME should be an emptyDir mount. Please fix the pod spec."
exit 1
fi
# if this is not a testing environment
if [ -z "${UNITTEST:-}" ]; then
sudo chown -R runner:docker ${RUNNER_HOME}
# use cp over mv to avoid issues when /runnertmp and {RUNNER_HOME} are on different devices
cp -r /runnertmp/* ${RUNNER_HOME}/
if [[ "${UNITTEST:-}" == '' ]]; then
sudo chown -R runner:docker "$RUNNER_HOME"
# enable dotglob so we can copy a ".env" file to load in env vars as part of the service startup if one is provided
# loading a .env from the root of the service is part of the actions/runner logic
shopt -s dotglob
# use cp instead of mv to avoid issues when src and dst are on different devices
cp -r "$RUNNER_ASSETS_DIR"/* "$RUNNER_HOME"/
shopt -u dotglob
fi
cd ${RUNNER_HOME}
# past that point, it's all relative pathes from /runner
config_args=()
if [ "${RUNNER_FEATURE_FLAG_EPHEMERAL:-}" == "true" -a "${RUNNER_EPHEMERAL}" != "false" ]; then
if [ "${RUNNER_FEATURE_FLAG_EPHEMERAL:-}" == "true" -a "${RUNNER_EPHEMERAL}" == "true" ]; then
config_args+=(--ephemeral)
echo "Passing --ephemeral to config.sh to enable the ephemeral runner."
log.debug 'Passing --ephemeral to config.sh to enable the ephemeral runner.'
fi
if [ "${DISABLE_RUNNER_UPDATE:-}" == "true" ]; then
config_args+=(--disableupdate)
echo "Passing --disableupdate to config.sh to disable automatic runner updates."
log.debug 'Passing --disableupdate to config.sh to disable automatic runner updates.'
fi
retries_left=10
while [[ ${retries_left} -gt 0 ]]; do
log "Configuring the runner."
log.debug 'Configuring the runner.'
./config.sh --unattended --replace \
--name "${RUNNER_NAME}" \
--url "${GITHUB_URL}${ATTACH}" \
@@ -108,18 +97,18 @@ while [[ ${retries_left} -gt 0 ]]; do
--work "${RUNNER_WORKDIR}" "${config_args[@]}"
if [ -f .runner ]; then
success "Runner successfully configured."
log.debug 'Runner successfully configured.'
break
fi
error "Configuration failed. Retrying"
log.debug 'Configuration failed. Retrying'
retries_left=$((retries_left - 1))
sleep 1
done
if [ ! -f .runner ]; then
# we couldn't configure and register the runner; no point continuing
error "Configuration failed!"
log.error 'Configuration failed!'
exit 2
fi
@@ -145,28 +134,34 @@ cat .runner
# -H "Authorization: bearer ${GITHUB_TOKEN}"
# https://api.github.com/repos/USER/REPO/actions/runners/171
if [ -n "${RUNNER_REGISTRATION_ONLY}" ]; then
success "This runner is configured to be registration-only. Exiting without starting the runner service..."
exit 0
fi
if [ -z "${UNITTEST:-}" ]; then
mkdir ./externals
mkdir -p ./externals
# Hack due to the DinD volumes
mv ./externalstmp/* ./externals/
for f in runsvc.sh RunnerService.js; do
diff {bin,patched}/${f} || :
sudo mv bin/${f}{,.bak}
sudo mv {patched,bin}/${f}
done
fi
args=()
if [ "${RUNNER_FEATURE_FLAG_EPHEMERAL:-}" != "true" -a "${RUNNER_EPHEMERAL}" != "false" ]; then
if [ "${RUNNER_FEATURE_FLAG_EPHEMERAL:-}" != "true" -a "${RUNNER_EPHEMERAL}" == "true" ]; then
args+=(--once)
echo "Passing --once to runsvc.sh to enable the legacy ephemeral runner."
log.warning 'Passing --once is deprecated and will be removed as an option' \
'from the image and actions-runner-controller at the release of 0.24.0.' \
'Upgrade to GHES => 3.3 to continue using actions-runner-controller. If' \
'you are using github.com ignore this warning.'
fi
unset RUNNER_NAME RUNNER_REPO RUNNER_TOKEN
exec ./bin/runsvc.sh "${args[@]}"
# Unset entrypoint environment variables so they don't leak into the runner environment
unset RUNNER_NAME RUNNER_REPO RUNNER_TOKEN STARTUP_DELAY_IN_SECONDS DISABLE_WAIT_FOR_DOCKER
# Docker ignores PAM and thus never loads the system environment variables that
# are meant to be set in every environment of every user. We emulate the PAM
# behavior by reading the environment variables without interpreting them.
#
# https://github.com/actions-runner-controller/actions-runner-controller/issues/1135
# https://github.com/actions/runner/issues/1703
# /etc/environment may not exist when running unit tests depending on the platform being used
# (e.g. Mac OS) so we just skip the mapping entirely
if [ -z "${UNITTEST:-}" ]; then
mapfile -t env </etc/environment
fi
exec env -- "${env[@]}" ./run.sh "${args[@]}"

73
runner/logger.bash Executable file
View File

@@ -0,0 +1,73 @@
#!/usr/bin/env bash
# We are not using `set -Eeuo pipefail` here because this file is sourced by
# other scripts that might not be ready for a strict Bash setup. The functions
# in this file do not require it, because they are not handling signals, have
# no external calls that can fail (printf as well as date failures are ignored),
# are not using any variables that need to be set, and are not using any pipes.
# This logger implementation can be replaced with another logger implementation
# by placing a script called `logger.bash` in `/usr/local/bin` of the image. The
# only requirement for the script is that it defines the following functions:
#
# - `log.debug`
# - `log.notice`
# - `log.warning`
# - `log.error`
# - `log.success`
#
# Each function **MUST** accept an arbitrary amount of arguments that make up
# the (unstructured) logging message.
#
# Additionally the following environment variables **SHOULD** be supported to
# disable their corresponding log entries, the value of the variables **MUST**
# not matter the mere fact that they are set is all that matters:
#
# - `LOG_DEBUG_DISABLED`
# - `LOG_NOTICE_DISABLED`
# - `LOG_WARNING_DISABLED`
# - `LOG_ERROR_DISABLED`
# - `LOG_SUCCESS_DISABLED`
# The log format is constructed in a way that it can easily be parsed with
# standard tools and simple string manipulations; pattern and example:
#
# YYYY-MM-DD hh:mm:ss.SSS $level --- $message
# 2022-03-19 10:01:23.172 NOTICE --- example message
#
# This function is an implementation detail and **MUST NOT** be called from
# outside this script (which is possible if the file is sourced).
__log() {
local color instant level
color=${1:?missing required <color> argument}
shift
level=${FUNCNAME[1]} # `main` if called from top-level
level=${level#log.} # substring after `log.`
level=${level^^} # UPPERCASE
if [[ ! -v "LOG_${level}_DISABLED" ]]; then
instant=$(date '+%F %T.%-3N' 2>/dev/null || :)
# https://no-color.org/
if [[ -v NO_COLOR ]]; then
printf -- '%s %s --- %s\n' "$instant" "$level" "$*" 1>&2 || :
else
printf -- '\033[0;%dm%s %s --- %s\033[0m\n' "$color" "$instant" "$level" "$*" 1>&2 || :
fi
fi
}
# To log with a dynamic level use standard Bash capabilities:
#
# level=notice
# command || level=error
# "log.$level" message
#
# @formatter:off
log.debug () { __log 37 "$@"; } # white
log.notice () { __log 34 "$@"; } # blue
log.warning () { __log 33 "$@"; } # yellow
log.error () { __log 31 "$@"; } # red
log.success () { __log 32 "$@"; } # green
# @formatter:on

View File

@@ -1,24 +0,0 @@
#!/bin/sh
# Logger from this post http://www.cubicrace.com/2016/03/log-tracing-mechnism-for-shell-scripts.html
function INFO(){
local function_name="${FUNCNAME[1]}"
local msg="$1"
timeAndDate=`date`
echo "[$timeAndDate] [INFO] [${0}] $msg"
}
function DEBUG(){
local function_name="${FUNCNAME[1]}"
local msg="$1"
timeAndDate=`date`
echo "[$timeAndDate] [DEBUG] [${0}] $msg"
}
function ERROR(){
local function_name="${FUNCNAME[1]}"
local msg="$1"
timeAndDate=`date`
echo "[$timeAndDate] [ERROR] $msg"
}

View File

@@ -1,21 +0,0 @@
#!/bin/sh
set -eu
# "modprobe" without modprobe
# https://twitter.com/lucabruno/status/902934379835662336
# this isn't 100% fool-proof, but it'll have a much higher success rate than simply using the "real" modprobe
# Docker often uses "modprobe -va foo bar baz"
# so we ignore modules that start with "-"
for module; do
if [ "${module#-}" = "$module" ]; then
ip link show "$module" || true
lsmod | grep "$module" || true
fi
done
# remove /usr/local/... from PATH so we can exec the real modprobe as a last resort
export PATH='/usr/sbin:/usr/bin:/sbin:/bin'
exec modprobe "$@"

View File

@@ -1,91 +0,0 @@
#!/usr/bin/env node
// Copyright (c) GitHub. All rights reserved.
// Licensed under the MIT license. See LICENSE file in the project root for full license information.
var childProcess = require("child_process");
var path = require("path")
var supported = ['linux', 'darwin']
if (supported.indexOf(process.platform) == -1) {
console.log('Unsupported platform: ' + process.platform);
console.log('Supported platforms are: ' + supported.toString());
process.exit(1);
}
var stopping = false;
var listener = null;
var runService = function() {
var listenerExePath = path.join(__dirname, '../bin/Runner.Listener');
var interactive = process.argv[2] === "interactive";
if(!stopping) {
try {
if (interactive) {
console.log('Starting Runner listener interactively');
listener = childProcess.spawn(listenerExePath, ['run'].concat(process.argv.slice(3)), { env: process.env });
} else {
console.log('Starting Runner listener with startup type: service');
listener = childProcess.spawn(listenerExePath, ['run', '--startuptype', 'service'].concat(process.argv.slice(2)), { env: process.env });
}
console.log('Started listener process');
listener.stdout.on('data', (data) => {
process.stdout.write(data.toString('utf8'));
});
listener.stderr.on('data', (data) => {
process.stdout.write(data.toString('utf8'));
});
listener.on('close', (code) => {
console.log(`Runner listener exited with error code ${code}`);
if (code === 0) {
console.log('Runner listener exit with 0 return code, stop the service, no retry needed.');
stopping = true;
} else if (code === 1) {
console.log('Runner listener exit with terminated error, stop the service, no retry needed.');
stopping = true;
} else if (code === 2) {
console.log('Runner listener exit with retryable error, re-launch runner in 5 seconds.');
} else if (code === 3) {
console.log('Runner listener exit because of updating, re-launch runner in 5 seconds.');
} else {
console.log('Runner listener exit with undefined return code, re-launch runner in 5 seconds.');
}
if(!stopping) {
setTimeout(runService, 5000);
}
});
} catch(ex) {
console.log(ex);
}
}
}
runService();
console.log('Started running service');
var gracefulShutdown = function(code) {
console.log('Shutting down runner listener');
stopping = true;
if (listener) {
console.log('Sending SIGINT to runner listener to stop');
listener.kill('SIGINT');
// TODO wait for 30 seconds and send a SIGKILL
}
}
process.on('SIGINT', () => {
gracefulShutdown(0);
});
process.on('SIGTERM', () => {
gracefulShutdown(0);
});

View File

@@ -1,20 +0,0 @@
#!/bin/bash
# convert SIGTERM signal to SIGINT
# for more info on how to propagate SIGTERM to a child process see: http://veithen.github.io/2014/11/16/sigterm-propagation.html
trap 'kill -INT $PID' TERM INT
if [ -f ".path" ]; then
# configure
export PATH=`cat .path`
echo ".path=${PATH}"
fi
# insert anything to setup env when running as a service
# run the host process which keep the listener alive
./externals/node12/bin/node ./bin/RunnerService.js $* &
PID=$!
wait $PID
trap - TERM INT
wait $PID

35
runner/startup.sh Normal file → Executable file
View File

@@ -1,13 +1,13 @@
#!/bin/bash
source /opt/bash-utils/logger.sh
source logger.bash
function wait_for_process () {
local max_time_wait=30
local process_name="$1"
local waited_sec=0
while ! pgrep "$process_name" >/dev/null && ((waited_sec < max_time_wait)); do
INFO "Process $process_name is not running yet. Retrying in 1 seconds"
INFO "Waited $waited_sec seconds of $max_time_wait seconds"
log.debug "Process $process_name is not running yet. Retrying in 1 seconds"
log.debug "Waited $waited_sec seconds of $max_time_wait seconds"
sleep 1
((waited_sec=waited_sec+1))
if ((waited_sec >= max_time_wait)); then
@@ -33,29 +33,32 @@ jq ".\"registry-mirrors\"[0] = \"${DOCKER_REGISTRY_MIRROR}\"" /etc/docker/daemon
fi
SCRIPT
INFO "Using /etc/docker/daemon.json with the following content"
dump() {
local path=${1:?missing required <path> argument}
shift
printf -- "%s\n---\n" "${*//\{path\}/"$path"}" 1>&2
cat "$path" 1>&2
printf -- '---\n' 1>&2
}
cat /etc/docker/daemon.json
for config in /etc/docker/daemon.json /etc/supervisor/conf.d/dockerd.conf; do
dump "$config" 'Using {path} with the following content:'
done
INFO "Using /etc/supervisor/conf.d/dockerd.conf with the following content"
cat /etc/supervisor/conf.d/dockerd.conf
INFO "Starting supervisor"
log.debug 'Starting supervisor daemon'
sudo /usr/bin/supervisord -n >> /dev/null 2>&1 &
INFO "Waiting for processes to be running"
log.debug 'Waiting for processes to be running...'
processes=(dockerd)
for process in "${processes[@]}"; do
wait_for_process "$process"
if [ $? -ne 0 ]; then
ERROR "$process is not running after max time"
ERROR "Dumping /var/log/dockerd.err.log to help investigation"
cat /var/log/dockerd.err.log
log.error "$process is not running after max time"
dump /var/log/dockerd.err.log 'Dumping {path} to aid investigation'
exit 1
else
INFO "$process is running"
else
log.debug "$process is running"
fi
done

View File

@@ -18,30 +18,47 @@ func (c *Simulator) GetRunnerGroupsVisibleToRepository(ctx context.Context, org,
panic(fmt.Sprintf("BUG: owner should not be empty in this context. repo=%v", repo))
}
runnerGroups, err := c.Client.ListOrganizationRunnerGroups(ctx, org)
if err != nil {
return visible, err
}
for _, runnerGroup := range runnerGroups {
ref := NewRunnerGroupFromGitHub(runnerGroup)
if !managed.Includes(ref) {
continue
if c.Client.GithubBaseURL == "https://github.com/" {
runnerGroups, err := c.Client.ListOrganizationRunnerGroupsForRepository(ctx, org, repo)
if err != nil {
return visible, err
}
if runnerGroup.GetVisibility() != "all" {
hasAccess, err := c.hasRepoAccessToOrganizationRunnerGroup(ctx, org, runnerGroup.GetID(), repo)
if err != nil {
return visible, err
}
for _, runnerGroup := range runnerGroups {
ref := NewRunnerGroupFromGitHub(runnerGroup)
if !hasAccess {
if !managed.Includes(ref) {
continue
}
visible.Add(ref)
}
} else {
runnerGroups, err := c.Client.ListOrganizationRunnerGroups(ctx, org)
if err != nil {
return visible, err
}
visible.Add(ref)
for _, runnerGroup := range runnerGroups {
ref := NewRunnerGroupFromGitHub(runnerGroup)
if !managed.Includes(ref) {
continue
}
if runnerGroup.GetVisibility() != "all" {
hasAccess, err := c.hasRepoAccessToOrganizationRunnerGroup(ctx, org, runnerGroup.GetID(), repo)
if err != nil {
return visible, err
}
if !hasAccess {
continue
}
}
visible.Add(ref)
}
}
return visible, nil

View File

@@ -3,6 +3,7 @@ package simulator
import (
"fmt"
"sort"
"strings"
"github.com/google/go-github/v39/github"
)
@@ -96,6 +97,10 @@ type RunnerGroup struct {
Name string
}
func (r RunnerGroup) String() string {
return fmt.Sprintf("RunnerGroup{Scope:%s, Kind:%s, Name:%s}", r.Scope, r.Kind, r.Name)
}
// VisibleRunnerGroups is a set of enterprise and organization runner groups
// that are visible to a GitHub repository.
// GitHub Actions chooses one of such visible group on which the workflow job is scheduled.
@@ -111,6 +116,15 @@ func NewVisibleRunnerGroups() *VisibleRunnerGroups {
return &VisibleRunnerGroups{}
}
func (g *VisibleRunnerGroups) String() string {
var gs []string
for _, g := range g.sortedGroups {
gs = append(gs, g.String())
}
return strings.Join(gs, ", ")
}
func (g *VisibleRunnerGroups) IsEmpty() bool {
return len(g.sortedGroups) == 0
}

Some files were not shown because too many files have changed in this diff Show More