* Changed Dockerfile to get the environment variable from the GitHub Actions workflow and pass it to the main.go file
Added a function in main.go to fetch the environment variable, with a fallback if the env variable isn't there
Added a test for the version, to be used for this branch only
* Update test-version.yaml
* Update test-version.yaml
* Removed the test because it's not needed when we push upstream
* Moved the version print in main.go to the Log code block as requested by toast-gear
Added version as issue #1161 requests.
Decided to use a docker-tag structure for the userAgent string, with : being a separator between the name and version
* Used ldflags instead, as mumoshu recommended (see the sketch after this entry)
Changed Dockerfile to use $VERSION from the workflow
Added version.go and the build package
Removed the getVersion function as we can just get the value directly
* Removed the default from the Go code (set it as N/A)
* Changed version from latest to dev inside the Makefile
* Added a build arg for version to the Dockerfile in the Makefile
* Added VERSION with a default dev value as ARG inside the Dockerfile
* Cleaned up inside the Dockerfile
* Fix failing test
* Fix possible missing VERSION in the ARC UA suffix due to missing build arg in docker-build-push step
Co-authored-by: S8338C <viktor.lindgren@seb.se>
Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
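For reference, a minimal sketch of the ldflags pattern described above, assuming a package-level variable and an `NA` fallback; the actual package path and variable name in ARC may differ:
```
// Illustrative only: inject a version string at build time via -ldflags.
// Build with: go build -ldflags "-X main.Version=${VERSION}" .
package main

import "fmt"

// Version falls back to "NA" when nothing is injected at build time.
var Version = "NA"

func main() {
	// ARC logs the version at startup; here we simply print it.
	fmt.Println("version:", Version)
}
```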
* [Doc] Create ARC overview doc
The purpose of this doc is to be a starting-point overview of ARC, with links to the Quick Start guide within.
* Update Actions-Runner-Controller-Overview.md
Fixed some formatting
* Update for minor formatting
Fixed links to include quotes, where missing. Added spaces after periods, where missing.
* Updated links to the QuickStart guide
* Updated Images and scaling sections
Updated the following based on PR feedback
- `The Runner container image` now calls out more explicitly the recommended way to install additional software
- `Scaling runners - dynamically with Pull Driven Scaling` - Removed mentions of `TotalNumberOfQueuedAndInProgressWorkflowRuns` as it's not fully implemented
* Apply suggestions from code review
Incorporated review feedback from @andyfeller, @sethrylan, @debuger24 and @mumoshu. Thank you all.
Co-authored-by: Andy Feller <andyfeller@github.com>
Co-authored-by: Rahul Kumar <rahulcomp24@gmail.com>
Co-authored-by: Seth Rylan Gainey <sethrylan@github.com>
Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
* Apply suggestions from code review
Add more detailed config for PercentageRunnersBusy metric
Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
* Updated link text to "Pull Driven Scaling"
Co-authored-by: Andy Feller <andyfeller@github.com>
Co-authored-by: Rahul Kumar <rahulcomp24@gmail.com>
Co-authored-by: Seth Rylan Gainey <sethrylan@github.com>
Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
* Add prometheus metrics for autoscaling
* Add desc for prometheus-metrics
* FIX: Typo
* Remove replicas_desired_before in metrics
* Remove Num prefix in metrics
* Create QuickStartGuide.md
Creating a new Quickstart guide that captures simple onboarding instructions. The intent is for first-time users to be able to follow this guide, get their environment running, and try out ARC. A link to this guide would be added to the repo readme once this PR gets merged.
* Update QuickStartGuide.md
Fixed a typo - removed "$" from codeblock "$ kubectl apply -f runnerdeployment.yaml"
* Update QuickStartGuide.md
Eliminated the need to specify the PAT in Custom_values.yaml. Instead it is passed as a parameter while installing the Helm chart. This eliminates the need to store the PAT in a file and also eliminates a setup step.
* Fixed minor typos
Fixed typos identified by @nebuk89
* Minor formatting in links and periods.
Fixed formatting to include spaces after periods and commas. Fixed formatting on some links to include quotes
Changed from `Kuernetes` to `Kubernetes` in the **Multitenancy** chapter.
By the way, why not use [the vale-action](https://github.com/errata-ai/vale-action) to automate linting in the markdown files? If you'd like, I can probably find some time to do it. Just a small token of appreciation for an awesome project!
* chart: Remove support for extensions/v1beta1 and networking.k8s.io/v1beta1
`networking.k8s.io/v1` has been available since v1.19.
As of today, AWS EKS supports v1.19+ and Oracle Cloud supports v1.20+. GKE and AKS support v1.21+. The upstream Kubernetes project maintains v1.22+.
So it should be safe to remove it now.
* fixup! chart: Remove support for extensions/v1beta1 and networking.k8s.io/v1beta1
This removes the flag and code for the legacy GitHub API cache. We already migrated to fully use the new HTTP-cache-based API cache functionality which was added via #1127 and has been available since ARC 0.22.0. Since then, the legacy one had been a no-op, and therefore removing it is safe.
Ref #1412
* feat: allow to discover runner statuses
* fix manifests
* Bump runner version to 2.289.1 which includes the hooks support
* Add feedback from review
* Update reference to newRunnerPod
* Fix TestNewRunnerPodFromRunnerController and make hooks file names job specific
* Fix additional TestNewRunnerPod test
* Cover additional feedback from review
* fix rbac manager role
* Add permissions to service account for container mode if not provided
* Rename flag to runner.statusUpdateHook.enabled and fix needsServiceAccount
Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
This contains apparently enough changes to the current E2E test code to make it runnable against remote Kubernetes clusters. I was actually able to make the test pass against my AWS EKS-based test clusters with these changes. You still need to trigger it manually from a local checkout of the ARC repo today. But this might be the foundation for automated E2E tests against major cloud providers.
I encountered this once while E2E testing ARC with K8s 1.22 and cert-manager 1.1.1. Either the K8s version is too high or the cert-manager version is too low, so you generally need to fix one of them. In a standard scenario, it should be more feasible and meaningful to upgrade cert-manager to a recent enough version that supports the new Kubernetes version.
I've been manually setting up Argo Tunnel to expose the webhook server while running E2E tests so that I can cover the webhook-based autoscaling. This automates the setup process so that we can automatically bring up and tear down cloudflared before/after the test run, so that it can be a part of our upcoming automated E2E test.
The regression resulted in the webhook-based autoscaler being unable to find visible runner groups and therefore unable to scale up and down the target RunnerDeployment/RunnerSet at all when the webhook-based autoscaler was provided GitHub API credentials to enable the runner groups support. This fixes that.
The regression was introduced via #1578 which is not released yet. Users of existing ARC releases are therefore not affected.
* Cover ARC upgrade in E2E test
so that we can make extra sure that you can upgrade an existing installation of ARC to the next version and also (hopefully) that it is backward-compatible, or at least does not break immediately after upgrading.
* Consolidate E2E tests for RS and RD
* Fix E2E for RD to pass
* Add some comment in E2E for how to release disk consumed after dozens of test runs
* chore: move HOME to more logical place
* chore: don't break the PATH
* chore: don't break the PATH
Co-authored-by: toast-gear <toast-gear@users.noreply.github.com>
The primary goal of this change is to let the tester know about the config difference between the explicitly configured ephemeral work volume and the automatically configured work volume with workVolumeClaimTemplate+containerMode=kubernetes.
* added troubleshooting solution
* added error example
* added entry to the pages index
* sorted
Co-authored-by: Mike Joseph <mike@Mikes-MacBook-Pro-5618.local>
Co-authored-by: Mike Joseph <mike@efrontier.com>
* added containerMode=kubernetes env variables to the runner
* removed unused logging
* restored configs and charts
* restored makefile cert version and acceptance/run
* added workVolumeClaimTemplate in pod definition, including logic
* added claim template name based on the runner
* Apply suggestions from code review
update errors
* added concurrent cleanup before runner pod is deleted
* update manifests
* added retry after 30s if pod cleanup contains err
* added admission webhook check, made workVolumeClaimTemplate mandatory for k8s
* style changes and added comments
* added IsZero timestamp check for deleting runner-linked pods
* changed order of local variable to avoid copy if p is deleted
* removed docker from container mode k8s
* restored charts, config, makefile
* restored the forked files, not the ARC ones
* created PersistentVolume on containerMode k8s
* create pv only if storage class name is local-storage
* removed actions if storage class name is local-storage
* added service account validation if container mode kubernetes
* changed the coding style to match rest of the ARC
* added validation to the runnerdeployment webhook
* specified fields more precisely, added webhook validation to the replicaset as well
* remake manifests
* wrapped delete runner-linked-pods in kube mode
* fixed empty line
* fixed import
* makefile changes for hooks
* added cleanup secrets
* create manifests
* docs
* update access modes
* update dockerfile
* nit changes
* fixed dockerfile
* rewrite allowing reuse for runners and runnersets
* deepcopy forgot to stage
* changed privileged
* make manifests
* partly moved to finalizer, still need to apply finalizer first
* finalizer added if env variable used in container mode exists
* bump runner version
* error message moved from Error to Info on cleanup pods/secrets
* removed useless dereferencing, added transformation tests of workVolumeClaimTemplate
* Apply suggestions from code review
* Update controllers/utils_test.go
Co-authored-by: Thomas Boop <52323235+thboop@users.noreply.github.com>
* Update controllers/utils_test.go
Co-authored-by: Thomas Boop <52323235+thboop@users.noreply.github.com>
* add hook version to cli, update to 0.1.2
* Apply suggestions from code review
* Update controllers/utils_test.go
* Update runner/Makefile
* Fix missing secret permission and the error handling
* Fix a runnerpod reconciler finalizer to not trigger unnecessary retry
Co-authored-by: Nikola Jokic <nikola-jokic@github.com>
Co-authored-by: Nikola Jokic <97525037+nikola-jokic@users.noreply.github.com>
Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
* Make webhook-based scale operation asynchronous
This prevents a race condition in the webhook-based autoscaler where it received a new webhook event while still processing another one, and both ended up
scaling up the same horizontal runner autoscaler.
Ref #1321
* Fix typos
* Update rather than Patch HRA to avoid race among webhook-based autoscaler servers
* Batch capacity reservation updates for efficient use of apiserver
* Fix potential never-ending HRA update conflicts in batch update
* Extract batchScaler out of webhook-based autoscaler for testability
* Fix log levels and batch scaler hang on start
* Correlate webhook event with scale trigger amount in logs
* Fix log message
This overhaul turns it into a shellcheck valid script with explicit error handling for all possible situations I could think of. This change takes https://github.com/actions-runner-controller/actions-runner-controller/pull/1409 into account and things can be merged in any order. There are a few important changes here to the logic:
- The wait logic for checking if docker comes up was fundamentally flawed because it checks for the PID. Docker will always come up and thus become visible in the process list, just to immediately die when it encounters an issue, after which supervisor starts it again. This means that our check so far is flaky: due to the `sleep 1`, it might encounter a PID, or it might not, and the existence of the PID does not mean anything. The `docker ps` check we have in the `entrypoint.sh` script does not suffer from this as it checks for a feature of docker and not a PID. I thus entirely removed the PID check, and instead I am handing things over to our `entrypoint.sh` script by setting the environment variables correctly.
- This change has an influence on the `docker0` interface MTU configuration, because the interface might or might not exist after we started docker. Hence, I changed this to a time-boxed loop that tries for one minute to set up the interface's MTU. In case the command fails we log an error and continue with the run.
- I changed the entire MTU handling by validating its value before configuring it, logging an error and continuing without it if it is set incorrectly. This ensures that we are not going to send our users on a bug hunt.
- The way we started supervisord did not make much sense to me. It sends itself into the background automatically, there is no need for us to do so with Bash.
The decision to not fail on errors but continue is a deliberate choice, because I believe that running a build is more important than having a perfectly configured system. However, this strategy might also hide issues for all users who are not properly checking their logs. It also makes testing harder. Hence, we could change all error conditions from graceful to panicking. We should then align the exit codes across `startup.sh` and `entrypoint.sh` to ensure that every possible error condition has its own unique error code for easy debugging.
* ci: align pipeline files and setups
* ci: more changes
* ci: various changes
* ci: fix setup-helm action ref
* ci: better pipeline name
* ci: more format aligning
* ci: more format aligning
* ci: better job name
* ci: supports multiple languages
* ci: better pipeline and job names
* ci: do a verb-noun thing for consistency
* ci: use 'arc' when talking holistically
* ci: add caching scope
* ci: put canary in a scope
* ci: fix syntax error
* ci: better pipeline and job names
* ci: better job name
Co-authored-by: toast-gear <toast-gear@users.noreply.github.com>
* Fix example manifests for webhook based scaling
I tried running these on my k8s cluster and I got some easy-to-fix errors, so I am committing them here.
* Fix example manifests for webhook autoscaling with workflow_jobs
* Fix the explanation on how to set up webhooks on your cluster
* Replace unclear comment with actual code examples
There was a comment instructing users to add minReplicas and
maxReplicas to all the HRA yamls, so I just removed it and added
these attributes to the yamls themselves for clarity.
* Make clear that using the ingress example is just a suggestion
* Apply some text improvements suggested by @mumoshu
* Update examples so the webhook server is exposed on a NodePort
Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
* Remove an unnecessary field from one of the examples
Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
* Apply suggestion from @mumoshu
Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
* Remove namespace fields from webhook autoscaler examples
This change was suggested by @mumoshu
* Apply final suggestion from @mumoshu
Co-authored-by: Callum Tait <15716903+toast-gear@users.noreply.github.com>
Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
A small improvement to our E2E test suite which allows you to set `ARC_E2E_NO_CLEANUP=whatever` to prevent the kind cluster cleanup on a successful test run, so that you can rerun it without waiting for a new kind cluster to come up.
* doc: Use RunnerSet to retain various cache
In relation to #1286 and as a follow-up for #1340
* docs: clarify client vs daemon
* docs: better wording
* Separate RunnerSet examples for docker image layer caching
* Revert changes on testdata as it is going to be added via #1471 instead
* Update README.md
Co-authored-by: Callum Tait <15716903+toast-gear@users.noreply.github.com>
* fixup! Update README.md
* Remove the outdated RunnerSet limitation
Co-authored-by: Callum Tait <15716903+toast-gear@users.noreply.github.com>
This adds a test to verify the runner pod generation logic for the case where you use a generic ephemeral volume as "work".
It is almost an adaptation of the test cases written for RunnerSet in #1471, to RunnerDeployment and Runner.
* fix: Avoid duplicate volume and mount name error for generic ephemeral volume as "work"
While manually testing configurations being documented in #1464, I discovered that the use of a dynamic ephemeral volume for the "work" directory was not working correctly due to a validation error.
This fixes the runner pod generation logic to not add the default volume and volume mount for "work" dir, so that the error disappears.
Ref #1464
* e2e: Ensure work generic ephemeral volume to work as expected
Renamed the runner dockerfiles so that we have proper syntax highlighting for them, as well as a consistent way to map from the image name to the dockerfile. Added a `.dockerignore` file to avoid uploading things to the daemon that we never use.
* chart: Add extraPaths to Ingress of GitHub Webhook Server
* Update charts/actions-runner-controller/templates/githubwebhook.ingress.yaml
Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
* Prefix the toYaml expression to remove the extra newline before extra paths
Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
We had some dead code left over from the removal of registration runners. Registration runners were removed in #859 and #1207
Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
* Enhance RunnerSet to optionally retain PVs across restarts
This is our initial attempt to bring back the ability to retain PVs across runner pod restarts when using RunnerSet.
The implementation is composed of two new controllers, `runnerpersistentvolumeclaim-controller` and `runnerpersistentvolume-controller`.
It all starts from our existing `runnerset-controller`. The controller now tries to mark any PVCs created by StatefulSets created for the RunnerSet.
Once the controller has terminated the StatefulSets, their corresponding PVCs are cleaned up by `runnerpersistentvolumeclaim-controller`, then PVs are unbound from their corresponding PVCs by `runnerpersistentvolume-controller` so that they can be reused by future PVCs created for future StatefulSets that share the same StorageClass.
Ref #1286
* Update E2E test suite to cover runner, docker, and go caching with RunnerSet + PVs
Ref #1286
Give up pinning deps with commit IDs, because the PRs were unreviewable due to missing changelogs and it sends PRs for every commit to the master/main branch of the deps, which is undesired. We only need updates for tagged releases!
* chore: Add signrel command for signing ARC release assets
I used this command to sign assets for the recent releases to comply with the recommendation of 5758364c82/docs/checks.md (signed-releases)
Ref #1298
* Implement signrel subcommands for listing tags and signing assets, with docs
* chore: more initialisation info to help debug
* chore: clearer flag description
* chore: use actual english
Co-authored-by: toast-gear <toast-gear@users.noreply.github.com>
This is intended to fix #1369 mostly for RunnerSet-managed runner pods. It is "mostly" because this fix might work well for RunnerDeployment in cases where #1395 does not work, like when the user explicitly set the runner pod restart policy to anything other than "Never".
Ref #1369
* docs: runner cleanup instructions
* docs: add delay in job allocation
* docs: fix broken link
* docs: reorganise into categories
* docs: align the format across the doc
* docs: remove code comment
* docs: add a tools section
* docs: add a short description to each section
Co-authored-by: toast-gear <toast-gear@users.noreply.github.com>
This feature flag was provided from ARC to the runner container automatically to let it use `--ephemeral` instead of `--once` by default. As the support for `--once` is being dropped from the runner image via #1384, we no longer need it.
Ref #1196
* runner: Remove the ability to use the deprecated `--once` flag
Ref #1196
* runner: Ability to opt-out of using --ephemeral
Although we are going to eventually remove the ability to use the legacy --once flag as proposed in #1196, there might be folks still using legacy GHES versions 3.2 or earlier.
This commit removes the existing feature flag to opt-in for --ephemeral, while adding another feature flag RUNNER_FEATURE_FLAG_ONCE to opt-in for --once so that folks stuck in legacy GHES versions
can still use ARC.
Since this change, every user starts using --ephemeral by default. If they see any issues on a legacy GHES instance, RUNNER_FEATURE_FLAG_ONCE=true can be set to opt in to keep using --once, which gives them one more ARC release to upgrade their GHES instance.
But beware, we won't support legacy GHES instances forever as it's going to be a maintenance nightmare. Please upgrade!
Ref #1196
I believe this helps us focus on relatively more important issues like critical bug reports and highly-requested feature requests.
Co-authored-by: Callum Tait <15716903+toast-gear@users.noreply.github.com>
Co-authored-by: toast-gear <toast-gear@users.noreply.github.com>
Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
> chart test is failing due to `flag provided but not defined: -default-scale-down-delay` which seems to come from the fact that we still use ARC 0.22.3 for chart testing.
>
> Probably we'd better figure out how to test it against both the latest release version of ARC and the canary version of ARC?
>
> Or just test it against the canary version so that it won't fail when the chart depends on features that are available only in the canary version of ARC? 🤔
yup, let's get this merged though so we can do a release today
so that we can hopefully get enough information to diagnose the issue in case it's really a bug report, or it goes to Discussions in case it's a question.
In #1373 we made two mistakes:
- We mistakenly checked if all the runner labels are included in the job labels and only after that marked the target as eligible for scaling. It should definitely be the opposite!
- We mistakenly checked for the existence of the `self-hosted` label in the job. [Although it should be a good practice to explicitly say `runs-on: ["self-hosted", "custom-label"]`](https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#example-using-labels-for-runner-selection), that's not a requirement so we should code accordingly.
The consequence of those two mistakes was that, for example, jobs with `self-hosted` + `custom` labels didn't result in scaling runner with `self-hosted` + `custom` + `custom2`. This should fix that.
Ref #1056
Ref #1373
* refactor: remove legacy build and use buildkit
* refactor: add runner version to root makefile
* refactor: enable buildkit for runner make build
* refactor: ignore runner makefile in ci
Co-authored-by: toast-gear <toast-gear@users.noreply.github.com>
It's probably worth highlighting that it's version 0.X.X by design and that breaking changes are possible until we move it to 1.0.0
Co-authored-by: toast-gear <toast-gear@users.noreply.github.com>
Adds some unit tests for the runner pod generation logic that is used internally by runner deployment and runner set controllers as preparation for #1282
Without that field, GKE 1.21 refuses to create the CRD
with an error message that conversionReviewVersions is mandatory.
conversionReviewVersions is a required field when creating apiextensions.k8s.io/v1 custom resource definitions.
Webhooks are required to support at least one ConversionReview version understood by the current and previous API server.
See https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/_print/#webhook-request-and-response
The [ListRunnerGroup API](https://docs.github.com/en/rest/reference/actions#list-self-hosted-runner-groups-for-an-organization) now accepts a new query parameter `visible_to_repository`.
We were doing `N+1` lookups when trying to find which runner group can be used for a job from a certain repository.
- List all runner groups
- Loop through all groups to check repository access for each of them via [API](https://docs.github.com/en/rest/reference/actions#list-repository-access-to-a-self-hosted-runner-group-in-an-organization)
The new query parameter `visible_to_repository` should allow us to get the runner groups with access in one call.
Limitation:
- The new query parameter is only supported on GitHub.com, which means anyone who uses ARC on GitHub Enterprise Server won't get this.
- I am working on a PR to update the `go-github` library to support the new parameter, but it will take a few weeks for a newer `go-github` to be released, so in the meantime I am duplicating the implementation in ARC as well to support the new query parameter (see the sketch below).
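A hedged sketch of how such a call can be made with go-github's generic request helpers while waiting for upstream support; the go-github major version in the import path and the trimmed response struct are illustrative assumptions, not ARC's actual code:
```
// Illustrative only: list an organization's runner groups that are visible
// to a given repository, using the visible_to_repository query parameter.
package arcsketch

import (
	"context"
	"fmt"
	"net/url"

	"github.com/google/go-github/v39/github"
)

// runnerGroups is a trimmed response shape for illustration.
type runnerGroups struct {
	TotalCount   int `json:"total_count"`
	RunnerGroups []struct {
		ID         int64  `json:"id"`
		Name       string `json:"name"`
		Visibility string `json:"visibility"`
	} `json:"runner_groups"`
}

func listVisibleRunnerGroups(ctx context.Context, c *github.Client, org, repo string) (*runnerGroups, error) {
	// go-github's typed method lacked this parameter at the time, so we build
	// the request by hand against the documented endpoint.
	u := fmt.Sprintf("orgs/%v/actions/runner-groups?visible_to_repository=%v", org, url.QueryEscape(repo))
	req, err := c.NewRequest("GET", u, nil)
	if err != nil {
		return nil, err
	}
	var groups runnerGroups
	if _, err := c.Do(ctx, req, &groups); err != nil {
		return nil, err
	}
	return &groups, nil
}
```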
* Improved Bash Logger
This is a first step towards having robust Bash scripts in the runner images. The changes _could_ be considered breaking, depending on our backwards compatibility definition.
* Fixed Log Formatting Issues
Co-authored-by: Callum Tait <15716903+toast-gear@users.noreply.github.com>
Bumps the chart version along with the controller version.
We bump the patch number for the chart as the release for the controller is a patch release.
That's the same handling as we've done in the previous version ecc8b4472a and #1300
As always, be sure to upgrade CRDs before updating the controller version!
Otherwise it can break in interesting ways.
This fixes the said issue by additionally treating any runner pod whose phase is Failed, or whose runner container exited with a non-zero code, as "complete", so that ARC gives up unregistering the runner from Actions and deletes the runner pod anyway.
Note that there are plenty of causes for that. If you are deploying runner pods on AWS spot instances or GCE preemptible instances and a job assigned to a runner took more time than the shutdown grace period provided by your cloud provider (2 minutes for AWS spot instances), the runner pod would be terminated prematurely without letting actions/runner unregister itself from Actions. If your VM or hypervisor failed, then runner pods that were running on the node will become PodFailed without unregistering runners from Actions.
Please beware that it is currently the user's responsibility to clean up any dangling runner resources on GitHub Actions.
Ref https://github.com/actions-runner-controller/actions-runner-controller/issues/1307
Might also relate to https://github.com/actions-runner-controller/actions-runner-controller/issues/1273
* chore: bump chart to latest
Bumps the chart version along with the controller version.
We bump the patch number for the chart as the release for the controller is a patch release.
That's the same handling as we've done in the previous version ecc8b4472a
As always, be sure to upgrade CRDs before updating the controller version!
Otherwise it can break in interesting ways.
* docs: expand on CRD upgrade requirement
Co-authored-by: Callum Tait <15716903+toast-gear@users.noreply.github.com>
With the current implementation, if a pod is deleted, the controller fails to delete the runner because it tries to annotate a pod that doesn't exist, since we're passing a new pod object that is not an existing resource
* feat: copy dotfiles from asset to service dir
* Fixed `UNITTEST` Condition
* Load `/etc/environment`
See https://github.com/actions/runner/issues/1703 for context on this change.
docs: various changes in preparation for 0.22.0 release
- Move RunnerSets to a more predominant location in the docs
- Clean up a few bits
- Highlight the deprecation and removal timeline for the `--once` flag
- Renamed the ephemeral runners section to something more logical (persistent runners). Static runners were an option; however, the word static is awkward as it's sort of tied up with autoscaling and the `Runner` kind, so Persistent was chosen instead.
- Update upgrade docs to use `replace` instead of `apply`
Fix runner{set,deployment} rollouts and static runner scaling
I was testing static runners as a preparation to cut the next release of ARC, v0.22.0, and found several problems that I thought worth being fixed.
In particular, this PR fixes static runners reliability issues in two means.
c4b24f8366 fixes the issue that ARC gives up retrying RemoveRunner calls too early, especially on static runners, which resulted in static runners often getting terminated prematurely while running jobs.
791634fb12 fixes the issue that ARC was unable to scale up any static runners when the corresponding desired replicas number in e.g. RunnerDeployment gets updated. It was caused by a bug in the mechanism that is intended to prevent ephemeral runners from being recreated in unwanted circumstances, which was mistakenly triggered for static runners as well.
Since #1179, RunnerDeployment was not able to gracefully terminate old RunnerReplicaSet on update. c612e87 fixes that by changing RunnerDeployment to firstly scale old RunnerReplicaSet(s) down to zero and waits for sync, and set the deletion timestamp only after that. That way, RunnerDeployment can ensure that all the old RunnerReplicaSets that are being deleted are already scaled to zero passing the standard unregister-and-then-delete runner termination process.
It revealed a hidden bug in #1179 that sometimes the scale-to-zero-before-runnerreplicaset-termination does not work as intended. 4551309 fixes that, so that RunnerDeployment can actually terminate old RunnerReplicaSets gracefully.
#1179 was not working particularly for scale-down of static (and perhaps long-running ephemeral) runners, which resulted in some runner pods being terminated before the requested unregistration processes completed, which triggered some in-progress workflow jobs to hang forever. This fixes an edge case that resulted in a decreased desired replicas triggering the failure, so that every runner is unregistered and then terminated, as originally designed.
It turned out that #1179 broke static runners in a way that they are no longer able to scale up at all when the desired replicas is updated.
This fixes that by correcting a certain short-circuit that is intended only for ephemeral runners so that it is not mistakenly triggered for static runners.
The unregister timeout of 1 minute (no matter how long it is) can negatively impact availability of a static runner constantly running workflow jobs, and of an ephemeral runner that runs a long-running job.
We deal with that by completely removing the unregistration timeout, so that regardless of the type of runner (static or ephemeral) it waits forever until it successfully gets unregistered before being terminated.
As it turned out to be the biggest release ever, I was afraid I might not be able to write a summary of changes that communicates well. Here is my attempt. Please review and leave any comments so that we can be more confident in this release. Thank you!
I found that #1179 was unable to finish the rollout of a RunnerDeployment update (like a runner env update). It was able to create a new RunnerReplicaSet with the desired spec, but unable to tear down the older ones. This fixes that.
Use --ephemeral by default
Every runner is now --ephemeral by default.
Note that this works by ARC setting the RUNNER_FEATURE_FLAG_EPHEMERAL envvar to true by default. Previously you had to explicitly set it to true, otherwise the runner was passed --once, which is known to have various race conditions.
It's worth noting that the very confusing and related configuration, ephemeral: true, which creates --once runners instead of static (or persistent) runners, had been the default for many months already. So, this should be the only change needed to make every runner ephemeral without any explicit configuration.
You can still fall back to static (persistent) runners by setting ephemeral: false, and to --once runners by setting RUNNER_FEATURE_FLAG_EPHEMERAL to "false". But I don't think there are many reasons to do so.
Ref #1189
The version of `bradleyfalzon/ghinstallation` which is used to enable GitHub App authentication turned out to add an extra header `application/vnd.github.machine-man-preview+json` to every HTTP request. That revealed an edge-case in our HTTP cache layer `gregjones/httpcache` that results in it not serving responses from the cache when it should.
There were two problems. One was that it does not support multi-valued headers and only looked for the first value of each header, and another is that it does not support any http.RoundTripper implementation that modifies HTTP request headers in a RoundTrip function call.
I fixed it in my fork of httpcache, which is hosted at https://github.com/actions-runner-controller/httpcache.
The relevant commits are:
- 70d975e77d
- 197a8a3546
This can be considered as a follow-up for #1127, which turned out to have enabled the cache only for the case where ARC uses a PAT for authentication.
Since this fix, the cache is also enabled when ARC authenticates as a GitHub App.
Since #1127 and #1167, we had been retrying `RemoveRunner` API call on each graceful runner stop attempt when the runner was still busy.
There was no reliable way to throttle the retry attempts. The combination of these resulted in ARC spamming RemoveRunner calls (one call per reconciliation loop, but the loop runs quite often due to how the controller works) when it failed once because the runner was in the middle of running a workflow job.
This fixes that, by adding a few short-circuit conditions that would work for ephemeral runners. An ephemeral runner can unregister itself on completion, so in most cases ARC can just wait for the runner to stop if it's already running a job. As a RemoveRunner response of status 422 implies that the runner is running a job, we can use that as a trigger to start the runner stop waiter.
The end result is that 422 errors will be observed at most once per the whole graceful termination process of an ephemeral runner pod. RemoveRunner API calls are never retried for ephemeral runners. ARC consumes less GitHub API rate limit budget and logs are much cleaner than before.
Ref https://github.com/actions-runner-controller/actions-runner-controller/pull/1167#issuecomment-1064213271
* Remove legacy GitHub API cache of HRA.Status.CachedEntries
We migrated to the transport-level cache introduced in #1127, so not only is this useless, it also makes it harder to deduce which cache resulted in the desired replicas number calculated by HRA.
Just remove the legacy cache to keep it simple and easy to understand.
* Deprecate the githubAPICacheDuration helm chart value and the --github-api-cache-duration flag as well
* Fix integration test
The TimeEncoder for zap seems to have been set to EpochTimeEncoder, which is the default, and it was not very readable. Changing it to a TimeEncoderOfLayout(time.RFC3339) for readability.
Another benefit of doing this is that the ts format is now consistent with various timestamps ARC puts into pod and other custom resource annotations.
* fix(chart): allow to use basic auth when authSecret.create is false
When the secret is created outside of the ARC chart using authSecret.create=false and basicAuth,
the controller fails as we're not including the basic-auth password as an environment variable,
since the password value won't be inside the helm values.
This PR includes both environment variables for consistency, regardless of whether
those are set or not, similar to the rest of the other auth options (e.g.
app_id, private key, etc.)
* chart: Add back the conditional block for .Values.authSecret.github_basicauth_username
Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
While testing #1179, I discovered that ARC sometimes stops resyncing a RunnerReplicaSet when the desired replicas is greater than the actual number of runner pods.
This seems to happen when ARC missed receiving a workflow_job completion event, but it has no way to decide whether (1) something went wrong in ARC or (2) a loadbalancer in the middle, GitHub, or anything other than ARC went wrong. It needs a criterion to decide that, or, if that's impossible, a way to deal with it.
In this change, I added a hard-coded 10-minute timeout (can be made customizable later) to prevent runner pod recreation.
Now, a RunnerReplicaSet/RunnerSet restarts runner pod recreation 10 minutes after the last scale-up. If the workflow completion event arrives after the timeout, it will decrease the desired replicas number, which results in the removal of a runner pod. The removed runner pod might be deleted without ever being used, but I think that's better than leaving the desired replicas and the actual number of replicas diverged forever.
Refactor Runner and RunnerReplicaSet so that they use the same library code that powers RunnerSet.
RunnerSet is StatefulSet-based while RunnerReplicaSet/Runner is Pod-based, so it had been hard to unify the implementations although they look very similar in many aspects.
This change finally resolves that issue, by first introducing a library that implements the generic logic that is used to reconcile RunnerSet, then adding an adapter that can be used to let the generic logic manage runner pods via Runner, instead of via StatefulSet.
Follow-up for #1127, #1167, and 1178
This eliminates the race condition that results in the runner being terminated prematurely when RunnerSet triggered unregistration of a StatefulSet that was added just a few seconds ago.
I can't find any requests made by user agent `actions-runner-controller` in GitHub.com's telemetry in the past 7 days.
Turns out we only set the user agent `actions-runner-controller` if we are configured to use BasicAuth, which is not the case for most customers I think.
I updated the code a little bit to make sure it always sets `actions-runner-controller` as the UserAgent for the GitHub HttpClient in ARC.
A further step would be somehow baking the ARC release version into the UserAgent as well.
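A minimal sketch of what that could look like with go-github's `UserAgent` field; the exact string format (here the docker-tag-like `name:version` mentioned earlier) and the go-github version in the import path are assumptions:
```
// Illustrative only: always identify ARC to the GitHub API via User-Agent.
package main

import (
	"fmt"
	"net/http"

	"github.com/google/go-github/v39/github"
)

// newGitHubClient wraps go-github so every request carries the ARC user agent,
// regardless of which authentication method is configured.
func newGitHubClient(httpClient *http.Client, version string) *github.Client {
	c := github.NewClient(httpClient)
	// go-github sends this value as the User-Agent header on each request.
	c.UserAgent = fmt.Sprintf("actions-runner-controller:%s", version)
	return c
}

func main() {
	c := newGitHubClient(nil, "dev")
	fmt.Println(c.UserAgent)
}
```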
Enhances ARC (both the controller-manager and github-webhook-server) to cache any GitHub API responses with HTTP GET and an appropriate Cache-Control header.
Ref #920
## Cache Implementation
`gregjones/httpcache` has been chosen as a library to implement this feature, as it is recommended in `go-github`'s documentation:
https://github.com/google/go-github#conditional-requests
`gregjones/httpcache` supports a number of cache backends like `diskcache`, `s3cache`, and so on:
https://github.com/gregjones/httpcache#cache-backends
We stick to the built-in in-memory cache as a starter. This will probably never become an issue as long as the various HTTP responses for all the GitHub API calls that ARC makes (list-runners, list-workflow-jobs, list-runner-groups, etc.) don't overflow the in-memory cache.
`httpcache` has a known unfixed issue in that it doesn't update the cache on chunked responses. But we assume that the APIs that we call don't use chunked responses. See #1503 for more information on that.
## Ephemeral runner pods are no longer recreated
The addition of the cache layer resulted in a slow-down of the scale-down process and a trade-off between making the runner pod termination process fragile to various race conditions (shorter grace period before runner deletion) and delaying runner pod deletion (longer grace period). A grace period needs to be at least longer than 60s (which is the same as the cache duration of the ListRunners API) to not prematurely delete a runner pod that was just created.
But once I disabled automatic recreation of ephemeral runner pods, it turned out to be no more of an issue when scaling via the workflow_job webhook.
Ephemeral runner resources are still automatically added on demand by RunnerDeployment via RunnerReplicaSet (I've added `EffectiveTime` fields to our CRDs but that's an implementation detail so let's omit it). A good side-effect of disabling ephemeral runner pod recreation is that ARC will no longer create redundant ephemeral runners when used with the webhook-based autoscaler.
Basically, autoscaling still works as everyone might expect. It's just better than before overall.
Enhances runner controller and runner pod controller to have consistent timeouts for runner unregistration and runner pod deletion,
so that we are very much unlikely to terminate pods that are running any jobs.
so that you can run `kubectl logs` on controller pods without specifying the container name.
It is especially useful when you want to run kubectl-logs on all ARC pods across controller-manager and github-webhook-server like:
```
kubectl -n actions-runner-system logs -l app.kubernetes.io/name=actions-runner-controller
```
That was previously impossible because the selector matches pods from both controller-manager and github-webhook-server, and kubectl does not provide a way to specify container names for respective pods.
The log level -3 is the minimum log level that is supported today, smaller than debug (-1) and -2 (used to log some HRA-related logs).
This commit adds a logging HTTP transport to log HTTP requests and responses to that log level.
It implements http.RoundTripper so that it can log each HTTP request with useful metadata like `from_cache` and `ratelimit_remaining`.
The former is set to `true` only when the logged request's response was served from ARC's in-memory cache.
The latter is set to X-RateLimit-Remaining response header value if and only if the response was served by GitHub, not by ARC's cache.
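A rough sketch of such a logging transport; the logger and output format are simplified stand-ins, while httpcache's `X-From-Cache` marker header and GitHub's `X-RateLimit-Remaining` header are the actual signals being read:
```
// Illustrative only: an http.RoundTripper that logs cache and rate-limit metadata.
package main

import (
	"log"
	"net/http"
)

type loggingTransport struct {
	base http.RoundTripper
}

func (t *loggingTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	resp, err := t.base.RoundTrip(req)
	if err != nil {
		return nil, err
	}
	// httpcache marks responses served from its cache with an X-From-Cache header.
	fromCache := resp.Header.Get("X-From-Cache") != ""
	// Only responses actually served by GitHub carry a rate-limit header.
	remaining := resp.Header.Get("X-RateLimit-Remaining")
	log.Printf("%s %s from_cache=%v ratelimit_remaining=%q", req.Method, req.URL, fromCache, remaining)
	return resp, nil
}

func main() {
	client := &http.Client{Transport: &loggingTransport{base: http.DefaultTransport}}
	_, _ = client.Get("https://api.github.com/rate_limit")
}
```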
This will cache any GitHub API responses with correct Cache-Control header.
`gregjones/httpcache` has been chosen as a library to implement this feature, as it is recommended in `go-github`'s documentation:
https://github.com/google/go-github#conditional-requests
`gregjones/httpcache` supports a number of cache backends like `diskcache`, `s3cache`, and so on:
https://github.com/gregjones/httpcache#cache-backends
We stick to the built-in in-memory cache as a starter. This will probably never become an issue as long as the various HTTP responses for all the GitHub API calls that ARC makes (list-runners, list-workflow-jobs, list-runner-groups, etc.) don't overflow the in-memory cache.
`httpcache` has a known unfixed issue in that it doesn't update the cache on chunked responses. But we assume that the APIs that we call don't use chunked responses. See #1503 for more information on that.
Ref #920
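A minimal sketch of wiring `gregjones/httpcache`'s in-memory cache in front of a go-github client, roughly as the libraries' documentation suggests; the authentication transport is omitted and the go-github version in the import path, as well as the org name, are placeholders:
```
// Illustrative only: serve cacheable GitHub API responses from memory.
package main

import (
	"context"
	"fmt"

	"github.com/google/go-github/v39/github"
	"github.com/gregjones/httpcache"
)

func main() {
	// NewMemoryCacheTransport returns a transport backed by the built-in
	// in-memory cache; responses with suitable Cache-Control headers are
	// served from memory on subsequent, conditional requests.
	cachingTransport := httpcache.NewMemoryCacheTransport()
	client := github.NewClient(cachingTransport.Client())

	// "my-org" is a placeholder; real calls also need authentication.
	runners, _, err := client.Actions.ListOrganizationRunners(context.Background(), "my-org", nil)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("total runners:", runners.TotalCount)
}
```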
There is a race condition between ARC and GitHub service about deleting runner pod.
- ARC uses the REST API to find a particular runner in a pod that is not running any jobs, so it decides to delete the pod.
- A job is queued on the GitHub service side, and it sends the job to this idle runner right before ARC deletes the pod.
- ARC deletes the runner pod, which causes the in-progress job to end up canceled.
To avoid this race condition, I am calling `r.unregisterRunner()` before deleting the pod.
- `r.unregisterRunner()` will return 204 to indicate the runner is deleted from the GitHub service, so we should be safe to delete the pod.
- `r.unregisterRunner` will return 400 to indicate the runner is still running a job, so we will leave this runner pod as it is.
TODO: I need to do some E2E tests to force the race condition to happen.
Ref #911
Apparently, we've missed taking an updated registration token into account when generating the pod template hash which is used to detect if the runner pod needs to be recreated.
This shouldn't have been the end of the world since the runner pod is recreated on the next reconciliation loop anyway, but this change will make the pod recreation happen one reconciliation loop earlier so that you're less likely to get runner pods with outdated refresh tokens.
Ref https://github.com/actions-runner-controller/actions-runner-controller/pull/1085#issuecomment-1027433365
* chart: Allow using different secrets for controller-manager and gh-webhook-server
As it is entirely possible to do so because they are two different K8s deployments. It may provide better scalability because then each component gets its own GitHub API quota.
Some logs, like `HRA keys indexed for HRA`, were so excessive that they made testing and debugging the githubwebhookserver harder. This tries to fix that.
We had to manually remove the secret first to update the GitHub credentials used by the controller, which was cumbersome.
Note that you still need to recreate the controller pods and the gh webhook server pods to let them remount the recreated secret.
This will work on GHES, but not on GitHub Enterprise Cloud, due to the excessive GitHub API calls required.
More work is needed, like adding a cache layer to the GitHub client, to make it usable on GitHub Enterprise Cloud.
Fixes additional cases from https://github.com/actions-runner-controller/actions-runner-controller/pull/1012
If GitHub auth is provided in the webhooks controller then runner groups with custom visibility are supported. Otherwise, all runner groups will be assumed to be visible to all repositories
`getScaleUpTargetWithFunction()` will check if there is an HRA available with the following flow:
1. Search for **repository** HRAs - if so it ends here
2. Get available HRAs in k8s
3. Compute visible runner groups
a. If GitHub auth is provided - get all the runner groups that are visible to the repository of the incoming webhook using GitHub API calls.
b. If GitHub auth is not provided - assume all runner groups are visible to all repositories
4. Search for **default organization** runners (a.k.a runners from organization's visible default runner group) with matching labels
5. Search for **default enterprise** runners (a.k.a runners from enterprise's visible default runner group) with matching labels
6. Search for **custom organization runner groups** with matching labels
7. Search for **custom enterprise runner groups** with matching labels
Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
* Add env variable to configure the `disableupdate` flag
* Write test for entrypoint disable update
* Rename flag, update docs for DISABLE_RUNNER_UPDATE
* chore: bump runner version in makefile
Co-authored-by: Callum Tait <15716903+toast-gear@users.noreply.github.com>
This allows providing a different `work` Volume.
This should be a cloud-agnostic way of allowing the operator to use (for example) NVMe-backed storage.
This is a working example where the workDir will use the provided volume; additionally, here docker is placed on the same NVMe disk.
```
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: runner-2
spec:
  template:
    spec:
      dockerdContainerResources: {}
      env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
      # this is to mount the docker in docker onto NVME disk
      dockerVolumeMounts:
        - mountPath: /var/lib/docker
          name: scratch
          subPathExpr: $(POD_NAME)-docker
        - mountPath: /runner/_work
          name: work
          subPathExpr: $(POD_NAME)-work
      volumeMounts:
        - mountPath: /runner/_work
          name: work
          subPathExpr: $(POD_NAME)-work
      dockerEnv:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
      volumes:
        - hostPath:
            path: /mnt/disks/ssd0
          name: scratch
        - hostPath:
            path: /mnt/disks/ssd0
          name: work
      nodeSelector:
        cloud.google.com/gke-nodepool: runner-16-with-nvme
      ephemeral: false
      image: ""
      imagePullPolicy: Always
      labels:
        - runner-2
        - self-hosted
      organization: yourorganization
```
Adding handling of paginated results when calling `ListWorkflowJobs`. By default the `per_page` is 30, which potentially would return 30 queued and 30 in_progress jobs.
This change should enable the autoscaler to scale workflows with more than 60 jobs to the exact number of runners needed.
Problem: I did not find any support for pagination in the GitHub fake client, and have not been able to test this (as I have not been able to push an image to an environment where I can verify this).
If anyone is able to help out verifying this PR, I would really appreciate it.
Resolves #990
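A hedged sketch of the pagination loop being described; the owner/repo/run ID values and the go-github version in the import path are placeholders:
```
// Illustrative only: page through ListWorkflowJobs instead of relying on
// the default per_page of 30.
package main

import (
	"context"
	"fmt"

	"github.com/google/go-github/v39/github"
)

func listAllWorkflowJobs(ctx context.Context, c *github.Client, owner, repo string, runID int64) ([]*github.WorkflowJob, error) {
	opts := &github.ListWorkflowJobsOptions{ListOptions: github.ListOptions{PerPage: 100}}
	var all []*github.WorkflowJob
	for {
		jobs, resp, err := c.Actions.ListWorkflowJobs(ctx, owner, repo, runID, opts)
		if err != nil {
			return nil, err
		}
		all = append(all, jobs.Jobs...)
		// NextPage is zero once the last page has been fetched.
		if resp.NextPage == 0 {
			return all, nil
		}
		opts.Page = resp.NextPage
	}
}

func main() {
	jobs, err := listAllWorkflowJobs(context.Background(), github.NewClient(nil), "my-org", "my-repo", 12345)
	fmt.Println(len(jobs), err)
}
```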
GitHub currently has some limitations w.r.t. permissions management on
runner groups, as they all require org admin. However, at our company
we're using runner groups to serve different internal teams (with
different permissions), thus we needed to deploy a custom proxy API with
our internal authentication to control who has access to certain APIs
depending on the repository/runner group on a given org/enterprise.
This change just allows optionally sending the GitHub API calls to an alternate custom
proxy URL instead of cloud GitHub (github.com) or an enterprise URL, with
basic authentication
Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
The current implementation doesn't yet support runner groups with custom visibility (e.g. selected repositories only). If there are multiple runner groups with selected visibility, not all runner groups may be a potential target to be scaled up. Thus this PR introduces support for runner groups with selected visibility. This requires querying the GitHub API to find the potential runner groups that are linked to a specific repository (whether using visibility all or selected).
This also improves resolving the `scaleTargetKey` values that are used to match an HRA based on the inputs of the `RunnerSet`/`RunnerDeployment` spec, to better support runner groups.
This requires configuring GitHub auth in the webhook server. To keep backwards compatibility, if GitHub auth is not provided to the webhook server, it will assume all runner groups have no selected visibility and will target any available runner group as before
The webhook "workflowJob" pass the labels the job needs to the controller, who in turns search for them in its RunnerDeployment / RunnerSet. The current implementation ignore the search for `self-hosted` if this is the only label, however if multiple labels are found the `self-hosted` label must be declared explicitely or the RD / RS will not be selected for the autoscaling.
This PR fixes the behavior by ignoring this label, and add documentation on this webhook for the other labels that will still require an explicit declaration (OS and architecture).
The exception should be temporary, ideally the labels implicitely created (self-hosted, OS, architecture) should be searchable alongside the explicitly declared labels.
code tested, work with `["self-hosted"]` and `["self-hosted","anotherLabel"]`
Fixes #951
A lot of people have issues with private GKE clusters and it seems they are all solved by setting up a firewall policy. I think it would be relevant to include this in a troubleshooting section since so many people are searching around issues for it. I myself just spent most of my day trying to figure it out.
Issues where this is the solution:
* #293
* #335
* #68
* fix(deps): update module sigs.k8s.io/controller-runtime to v0.11.0
* Fix dependencies and bump Go to 1.17 so that it builds after controller-runtime 0.11.0 upgrade
* Regenerate manifests with the latest K8s dependencies
Co-authored-by: Renovate Bot <bot@renovateapp.com>
Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
When false, the chart deployment template will not add GITHUB_*
environment variables to the manager container. In addition, the `volume`
and `volumeMount` for the secret will also be omitted from the
deployment manifest.
Signed-off-by: Piaras Hoban <phoban01@gmail.com>
* chore(deps): update quay.io/brancz/kube-rbac-proxy docker tag to v0.11.0
* chore(deps): update quay.io/brancz/kube-rbac-proxy make tag to v0.11.0
Co-authored-by: Renovate Bot <bot@renovateapp.com>
Co-authored-by: Callum Tait <15716903+toast-gear@users.noreply.github.com>
* doc: Add GitHub App Permission for Workflow Job Webhook
I think we've missed documenting the app permission needed for receiving workflow_job events via the app webhook
* docs: clarification on webhook events
Co-authored-by: Callum Tait <15716903+toast-gear@users.noreply.github.com>
* docs: minor correction to autoscaling description
* docs: make pull metrics plural
* docs: remove redundant words
* docs: make plural and expand anti-flapping
The README had some examples of RunnerDeployment with wrong/outdated
spec.
This caused some pain when trying to create them based on the examples.
=> Fixed using the spec declared in: api/v1alpha1/runner_types.go
* docs: clean up of autoscaling section
* docs: clarifying anti-flapping
* docs: more improvements
* docs: more improvements
* docs: adding duration details and caveats
* docs: smaller english and better structure
* docs: use consistent wording
* docs: adding limitation caveat for RunnerSets
* docs: correct helm upgrade order
* docs: lines helm upgrade command with help switch
* docs: use existing limitations section
* docs: fix table of headers and contents
* docs: add link to runnersets on first mention
* docs: adding runnerset limitation
* chore: use new enterprise permission for PAT
* docs: bump example deploy to latest version
* docs: adding oauth apps link
* docs: adding caveat to the oauth apps doc
* Revert "chore: support app ids as int or strings (#869)"
This reverts commit 0a3d2b686e.
* docs: adding some comments to the code
* docs: adding comment to the chart values
* Create optional serviceAnnotations value in helm chart
* update annotation key
* update annotation key - webhook service
* fix README.md
* docs: using consistent tense
* docs: making the code comments more generic
* feat: Have githubwebhook monitor a single namespace
When using `scope.singleNamespace: true` in Helm, the expected behaviour is
that the github webhook server behaves the same way as the controller.
The current behaviour is that the webhook server monitors all the
namespaces.
* Changing the chart README.md to reflect the scope
The documentation now mentions that both the controller and the github
webhook server will make use of the `scope.watchNamespace` field if
`scope.singleNamespace` is set to `true`.
Co-authored-by: Sebastien Le Digabel <sebastien.ledigabel@skyscanner.net>
* Fix bug related to label matching.
Add start of test framework for Workflow Job Events
Signed-off-by: Aidan Jensen <aidan@artificial.com>
Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
* docs: updating to reflect the latest release
* docs: further updates
* docs: format updates
* docs: correcting title sizes
* docs: correcting links
* docs: correcting the grammar of multi controllers
* docs: slightly less awkward English
I have a dedicated GitHub organization and a private repository to run this E2E test. After a few fixes included in this change, it has successfully passed.
This was something that was missing in #707.
Adding a new test to make sure the ephemeral feature flag from upstream
is set up correctly by the script.
The unit tests simulate a run of entrypoint. They create some
dummy config.sh and runsvc.sh and make sure the logic behind
entrypoint.sh is correct.
Unfortunately the entrypoint.sh contains some sections that are not
mockable so I had to put some logic in there too.
Testing includes for now:
- the normal scenario
- the normal non-ephemeral scenario
- the configuration failure scenario
Also tested the entrypoint.sh on a real runner, still works as expected.
Adding a basic retry loop during configuration. If configuration fails,
the runner will just go straight into a retry loop and will continuously
fail until it dies after a while.
This change will retry 10 times and will exit if the configuration
wasn't successful.
Also, changed the logging format, adding a bit of color in the event of
success or failure.
This fixes the below error that occurs in `make release`:
```
kustomize build config/default > release/actions-runner-controller.yaml
Error: accumulating resources: accumulation err='accumulating resources from '../webhook': '/home/mumoshu/p/actions-runner-controller/config/webhook' must resolve to a file': recursed accumulation of path '/home/mumoshu/p/actions-runner-controller/config/webhook': accumulating resources: accumulation err='accumulating resources from 'manifests.v1beta1.yaml': evalsymlink failure on '/home/mumoshu/p/actions-runner-controller/config/webhook/manifests.v1beta1.yaml' : lstat /home/mumoshu/p/actions-runner-controller/config/webhook/manifests.v1beta1.yaml: no such file or directory': evalsymlink failure on '/home/mumoshu/p/actions-runner-controller/config/webhook/manifests.v1beta1.yaml' : lstat /home/mumoshu/p/actions-runner-controller/config/webhook/manifests.v1beta1.yaml: no such file or directory
make: *** [Makefile:156: release] Error 1
```
Ref #144
This fixes the below error on installing the chart:
```
Error: UPGRADE FAILED: error validating "": error validating data: [ValidationError(MutatingWebhookConfiguration.webhooks[0]): missing required field "admissionReviewVersions" in io.k8s.api.admissionregistration.v1.MutatingWebhook, ValidationError(MutatingWebhookConfiguration.webhooks[1]): missing required field "admissionReviewVersions" in io.k8s.api.admissionregistration.v1.MutatingWebhook, ValidationError(MutatingWebhookConfiguration.webhooks[2]): missing required field "admissionReviewVersions" in io.k8s.api.admissionregistration.v1.MutatingWebhook, ValidationError(MutatingWebhookConfiguration.webhooks[3]): missing required field "admissionReviewVersions" in io.k8s.api.admissionregistration.v1.MutatingWebhook]
```
Ref #144
This adds support for two upcoming enhancements on the GitHub side of self-hosted runners: ephemeral runners and `workflow_job` events. You can't use these yet.
**These features are not yet generally available to all GitHub users**. Please take this pull request as a preparation to make it available to actions-runner-controller users as soon as possible after GitHub released the necessary features on their end.
**Ephemeral runners**:
The former, ephemeral runners, is basically the reliable alternative to `--once`, which we've been using when you enabled `ephemeral: true` (default in actions-runner-controller).
`--once` has been suffering from a race issue #466. `--ephemeral` fixes that.
To enable ephemeral runners with `actions/runner`, you give `--ephemeral` to `config.sh`. This updated version of `actions-runner-controller` does it for you, by using `--ephemeral` instead of `--once` when you set `RUNNER_FEATURE_FLAG_EPHEMERAL=true`.
Please read the section `Ephemeral Runners` in the updated version of our README for more information.
Note that ephemeral runners are not released on GitHub yet, and `RUNNER_FEATURE_FLAG_EPHEMERAL=true` won't work at all until the feature gets released on GitHub. Stay tuned for an announcement from GitHub!
**`workflow_job` events**:
`workflow_job` is the additional webhook event that corresponds to each GitHub Actions workflow job run. It provides `actions-runner-controller` a solid foundation to improve our webhook-based autoscale.
Formerly, we've been exploiting webhook events like `check_run` for autoscaling. However, as none of our supported events has included `labels`, you had to configure an HRA to only match relevant `check_run` events. It wasn't trivial.
In contrast, a `workflow_job` event payload contains `labels` of runners requested. `actions-runner-controller` is able to automatically decide which HRA to scale by filtering the corresponding RunnerDeployment by `labels` included in the webhook payload. So all you need to use webhook-based autoscale will be to enable `workflow_job` on GitHub and expose actions-runner-controller's webhook server to the internet.
Note that the current implementation of `workflow_job` support works in two ways: increment and decrement. An increment happens when the webhook server receives a `workflow_job` event with `queued` status. A decrement happens when it receives a `workflow_job` event with `completed` status. The latter is used to make scaling down faster so that you waste less money than before. You still don't suffer from flapping, as a scale-down is still subject to `scaleDownDelaySecondsAfterScaleOut`.
Please read the section `Example 3: Scale on each `workflow_job` event` in the updated version of our README for more information on its usage.
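For illustration, a minimal HRA sketch wired to the `workflow_job` event might look like the following; the `workflowJob`, `amount`, and `duration` field names follow the webhook-driven scaling examples in the README and should be treated as illustrative, not authoritative:
```yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
  name: example-hra
spec:
  scaleTargetRef:
    name: example-runnerdeploy
  scaleUpTriggers:
    # Assumed field names: scale up by 1 on each queued workflow_job,
    # and let the capacity reservation expire after 30 minutes.
    - githubEvent:
        workflowJob: {}
      amount: 1
      duration: "30m"
```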
* Adding a default docker registry mirror
This change allows the controller to start with a specified default
docker registry mirror and avoid having to specify it in all the runner*
objects.
The change is backward compatible, if a runner has a docker registry
mirror specified, it will supersede the default one.
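As a rough sketch of the intended precedence (assuming the controller flag is `--docker-registry-mirror` and the per-runner field is `dockerRegistryMirror`, mirroring the wording above), a runner that overrides the default might look like:
```yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: example-runnerdeploy
spec:
  template:
    spec:
      repository: example/repo
      # Overrides the controller-wide default mirror, if one was configured
      # via the (assumed) --docker-registry-mirror controller flag.
      dockerRegistryMirror: https://mirror.example.com/
```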
Add `shortNames` to kube api-resource CRDs. Short-names make it easier when interacting/troubleshooting api-resources with kubectl.
We have tried to follow the naming convention similar to what K8s uses which should help with avoiding any naming conflicts as well. For example:
* `Deployment` has a shortName of deploy, so added rdeploy for `runnerdeployment`
* `HorizontalPodAutoscaler` has a shortName of hpa, so added hra for `HorizontalRunnerAutoscaler`
* `ReplicaSets` has a shortName of rs, so added rrs for `runnerreplicaset`
Co-authored-by: abhinav454 <43758739+abhinav454@users.noreply.github.com>
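For reference, the relevant excerpt of a CRD manifest carrying such a short name would look roughly like this (a sketch, not a verbatim copy of the generated CRDs):
```yaml
# Excerpt from the RunnerDeployment CustomResourceDefinition (illustrative)
spec:
  names:
    kind: RunnerDeployment
    plural: runnerdeployments
    singular: runnerdeployment
    shortNames:
      - rdeploy
```
With these in place, `kubectl get rdeploy`, `kubectl get rrs` and `kubectl get hra` resolve to the corresponding resources.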
* Add POC of GitHub Webhook Delivery Forwarder
* multi-forwarder and ctrl-c exiting, and fix for non-working http post
* Rename source files
* Extract signal handling into a dedicated source file
* Faster ctrl-c handling
* Enable automatic creation of repo hook on startup
* Add support for forwarding org hook deliveries
* Set hook secret on hook creation via envvar (HOOK_SECRET)
* Fix org hook support
* Fix HOOK_SECRET for consistency
* Refactor to prepare for custom log position provider
* Refactor to extract inmemory log position provider
* Add configmap-based log position provider
* Rename githubwebhookdeliveryforwarder to hookdeliveryforwarder
* Refactor to rename LogPositionProvider to Checkpointer and extract ConfigMap checkpointer into a dedicated pkg
* Refactor to extract logger initialization
* Add hookdeliveryforwarder README and bump go-github to unreleased ver
Improves #664 by adding annotations to the server's service. Beyond general applications, I use these annotations within my own projects to configure various LB values.
* Optional override of runner image in chart
This commit adds the option to override the actions runner image. This
allows running the controller in environments where access to Dockerhub
is restricted.
It uses the parameter [--runner-image](https://github.com/actions-runner-controller/actions-runner-controller/blob/master/main.go#L89) from the controller.
The default value is set as a constant
[here](acb906164b/main.go (L40)).
The default value for the chart is the same.
* Fixing actionsRunner name
... to actionsRunnerRepositoryAndTag for consistency.
* Bumping chart to v0.12.5
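A hedged sketch of the corresponding chart values, assuming the key lives under the chart's `image` block with the name mentioned in the bullets above, and using a placeholder registry host:
```yaml
# values.yaml (sketch)
image:
  # Full repository and tag of the runner image to use instead of the default.
  actionsRunnerRepositoryAndTag: "registry.example.com/actions-runner:latest"
```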
Previously the E2E test suite covered only RunnerSet. This refactors the existing E2E test code to extract the common test structure into a `env` struct and its methods, and use it to write two very similar tests, one for RunnerSet and another for RunnerDeployment.
There's no reason to create a non-working secret by default. If someone wants to deploy the secrets via the chart, they will need to do some config regardless, so they might as well also set the create flag.
Enhances our existing E2E test suite to additionally support triggering two or more concurrent workflow jobs and verifying all the results, so that you can ensure the runners managed by the controller are able to handle jobs reliably when loaded.
This enhances the E2E test suite introduced in #658 to also include the following steps:
- Install GitHub Actions workflow
- Trigger a workflow run via a git commit
- Verify the workflow run result
In the workflow, we use `kubectl create cm --from-literal` to create a configmap that contains a unique test ID. In the last step we obtain the configmap from within the E2E test and check that the test ID matches the expected one.
To install a GitHub Actions workflow, we clone a GitHub repository denoted by the TEST_REPO envvar, programmatically generate a few files with some Go code, and run `git-add`, `git-commit`, and then `git-push` to actually push the files to the repository. A single commit containing an updated workflow definition and an updated file seems to run a workflow derived from the definition introduced in the commit, which was a bit surprising but useful behaviour.
At this point, the E2E test fully covers all the steps for a GitHub token based installation. We need to add scenarios for more deployment options, like GitHub App, RunnerDeployment, HRA, and so on. But each of them would be worth another pull request.
This is the initial version of our E2E test suite which is currently a subset of the acceptance test suite reimplemented in Go.
To run it, pass `-run ^TestE2E$` to `go test`, without `-short`, like `go test -timeout 600s -run ^TestE2E$ github.com/actions-runner-controller/actions-runner-controller/test/e2e -v`.
`make test` is modified to pass `-short` to `go test` by default to skip E2E tests.
The biggest benefit of rewriting the acceptance test in Go turned out to be the fact that you can easily rerun each step- a go-test "subtest"- individually from your IDE, for faster turnaround. Both VS Code and IntelliJ IDEA/GoLand are known to work.
In the near future, we will add more steps to the suite, like actually git-committing some Actions workflow and pushing a commit to trigger a workflow run, verifying the workflow and job run results, and finally running it in our `test` workflow for fully automated E2E testing. But that's another story.
`HRA.Spec.ScaleTargetRef.Kind` is added to denote that the scale-target is a RunnerSet.
It defaults to `RunnerDeployment` for backward compatibility.
```
apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
  name: myhra
spec:
  scaleTargetRef:
    kind: RunnerSet
    name: myrunnerset
```
Ref #629
Ref #613
Ref #612
This option internally exposes some `KUBERNETES_*` environment variables
that prevent the runner from using KinD (Kubernetes in Docker), since it will
try to connect to the Kubernetes cluster where the runner is running.
This option is set to `true` by default in any Kubernetes deployment.
Signed-off-by: Jonathan Gonzalez V <jonathan.gonzalez@enterprisedb.com>
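Assuming the option maps onto the standard pod-level `enableServiceLinks` field (service links are what inject the `KUBERNETES_*` variables, so this is a reasonable but unverified guess at the field name), a KinD-friendly runner would be configured roughly like:
```yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: kind-capable-runner
spec:
  template:
    spec:
      repository: example/repo
      # Assumed field name; disables the KUBERNETES_* service-link envvars
      # so tools inside the runner can target a KinD cluster instead.
      enableServiceLinks: false
```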
* feat: RunnerSet backed by StatefulSet
Unlike a runner deployment, a runner set can manage a set of stateful runners by combining a statefulset and an admission webhook that mutates statefulset-managed pods with required envvars and registration tokens.
Resolves #613
Ref #612
* Upgrade controller-runtime to 0.9.0
* Bump Go to 1.16.x following controller-runtime 0.9.0
* Upgrade kubebuilder to 2.3.2 for updated etcd and apiserver following local setup
* Fix startup failure due to missing LeaderElectionID
* Fix the issue that any pods become unable to start once actions-runner-controller got failed after the mutating webhook has been registered
* Allow force-updating statefulset
* Fix runner container missing work and certs-client volume mounts and DOCKER_HOST and DOCKER_TLS_VERIFY envvars when dockerdWithinRunner=false
* Fix runnerset-controller not applying statefulset.spec.template.spec changes when there were no changes in runnerset spec
* Enable running acceptance tests against arbitrary kind cluster
* RunnerSet supports non-ephemeral runners only today
* fix: docker-build from root Makefile on intel mac
* fix: arch check fixes for mac and ARM
* ci: aligning test data format and patching checks
* fix: removing namespace in test data
* chore: adding more ignores
* chore: removing leading space in shebang
* Re-add metrics to org hra testdata
* Bump cert-manager to v1.1.1 and fix deploy.sh
Co-authored-by: toast-gear <15716903+toast-gear@users.noreply.github.com>
Co-authored-by: Callum James Tait <callum.tait@photobox.com>
To prevent people from writing related and unrelated things to already closed issues. As a maintainer, that kind of situation only makes it harder to effectively provide user support. Please create another issue with a concrete description of "your issue" and the reproduction steps, rather than commenting "me too" on unrelated issues!
Sets the privileged flag to false if SELinuxOptions are present/defined. This is needed because containerd treats SELinux and Privileged controls as mutually exclusive. Also see https://github.com/containerd/cri/blob/aa2d5a97c/pkg/server/container_create.go#L164.
This allows users who use SELinux for managing privileged processes to use GH Actions - otherwise, based on the SELinux policy, the Docker in Docker container might not be privileged enough.
Signed-off-by: Jonah Back <jonah@jonahback.com>
Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
This allows using the `runtimeClassName` directive in the runner's spec.
One of the use-cases for this is Kata Containers, which use `runtimeClassName` in a pod spec as an indicator that the pod should run inside a Kata container. This allows us a greater degree of pod isolation.
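A minimal sketch of a runner using that directive, assuming a `RuntimeClass` named `kata` already exists in the cluster:
```yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: kata-runner
spec:
  template:
    spec:
      repository: example/repo
      # Must reference an existing RuntimeClass, e.g. one backed by Kata Containers.
      runtimeClassName: kata
```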
* docs: adding upgrade notes for Helm
* chore: adding new ignore
* docs: add in cmd to check for stuck runners
* docs: better format
* docs: removing superfluous steps
* docs: moved location of docs
Co-authored-by: Callum James Tait <callum.tait@photobox.com>
Adds a column to help the operator see if they configured HRA.Spec.ScheduledOverrides correctly, in a form of "next override schedule recognized by the controller":
```
$ k get horizontalrunnerautoscaler
NAME                            MIN   MAX   DESIRED   SCHEDULE
actions-runner-aos-autoscaler   0     5     0
org                             0     5     0         min=0 time=2021-05-21 15:00:00 +0000 UTC
```
Ref https://github.com/actions-runner-controller/actions-runner-controller/issues/484
This fixes human-readable output of `kubectl get` on `runnerdeployment`, `runnerreplicaset`, and `runner`.
Most notably, CURRENT and READY of runner replicasets are now computed and printed correctly. Runner deployments now have UP-TO-DATE and AVAILABLE instead of READY so that it is consistent with columns of K8s deployments.
A few fixes have also been made to the runner deployment and runner replicaset controllers so that the numbers stored in Status objects are reliably updated and in sync with actual values.
Finally, `AGE` columns are added to runnerdeployment, runnerreplicaset, and runner to make that more visible to users.
`kubectl get` outputs should now look like the below examples:
```
# Immediately after runnerdeployment updated/created
$ k get runnerdeployment
NAME                   DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
example-runnerdeploy   0         0         0            0           8d
org-runnerdeploy       5         5         5            0           8d
# A few dozen seconds after the update/create, all the runners are registered and the "available" numbers increase
$ k get runnerdeployment
NAME                   DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
example-runnerdeploy   0         0         0            0           8d
org-runnerdeploy       5         5         5            5           8d
```
```
$ k get runnerreplicaset
NAME                         DESIRED   CURRENT   READY   AGE
example-runnerdeploy-wnpf6   0         0         0       61m
org-runnerdeploy-fsnmr       2         2         0       8m41s
```
```
$ k get runner
NAME                                           ENTERPRISE   ORGANIZATION                REPOSITORY                                        LABELS                      STATUS    AGE
example-runnerdeploy-wnpf6-registration-only                                            actions-runner-controller/mumoshu-actions-test                               Running   61m
org-runnerdeploy-fsnmr-n8kkx                                actions-runner-controller                                                     ["mylabel 1","mylabel 2"]             21s
org-runnerdeploy-fsnmr-sq6m8                                actions-runner-controller                                                     ["mylabel 1","mylabel 2"]             21s
```
Fixes #490
`PercentageRunnersBusy`, in combination with a secondary `TotalInProgressAndQueuedWorkflowRuns` metric, enables scale-from-zero for PercentageRunnersBusy.
Please see the new `Autoscaling to/from 0` section in the updated documentation about how it works.
Resolves #522
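A sketch of what such a configuration might look like; the secondary metric is written here with the longer `TotalNumberOfQueuedAndInProgressWorkflowRuns` name used elsewhere in these notes, and the repository name is a placeholder:
```yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
  name: example-hra
spec:
  scaleTargetRef:
    name: example-runnerdeploy
  minReplicas: 0
  maxReplicas: 5
  metrics:
    - type: PercentageRunnersBusy
      scaleUpThreshold: '0.75'
      scaleDownThreshold: '0.25'
      scaleUpFactor: '2'
      scaleDownFactor: '0.5'
    # Secondary metric consulted when there are zero runners, enabling scale-from-zero.
    - type: TotalNumberOfQueuedAndInProgressWorkflowRuns
      repositoryNames:
        - example/repo
```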
This adds the initial version of ScheduledOverrides to HorizontalRunnerAutoscaler.
`MinReplicas` overriding should just work.
When there are two or more ScheduledOverrides, the earliest one that matched is activated. Each ScheduledOverride can be recurring or one-time. If you have two or more ScheduledOverrides, only one of them should be one-time. And the one-time override should be the earliest item in the list to make sense.
Tests will be added in another commit. Logging improvements and additional observability in HRA.Status will also be added in further commits.
Ref #484
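A hedged sketch of a recurring override that scales an HRA to zero over a weekend; the field names (`scheduledOverrides`, `recurrenceRule`, `frequency`) follow the feature description above and the README, and the timestamps are placeholders:
```yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
  name: example-hra
spec:
  scaleTargetRef:
    name: example-runnerdeploy
  minReplicas: 1
  maxReplicas: 5
  scheduledOverrides:
    # Recurring override: keep zero runners between the start and end time, every week.
    - startTime: "2021-05-01T00:00:00+09:00"
      endTime: "2021-05-03T00:00:00+09:00"
      recurrenceRule:
        frequency: Weekly
      minReplicas: 0
```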
Adds two types `RecurrenceRule` and `Period` and one function `MatchSchedule` as the foundation for building the upcoming ScheduledOverrides feature.
Ref #484
- Adds `ephemeral` option to `runner.spec`
```
....
  template:
    spec:
      ephemeral: false
      repository: mumoshu/actions-runner-controller-ci
....
```
- `ephemeral` defaults to `true`
- `entrypoint.sh` in runner/Dockerfile modified to read `RUNNER_EPHEMERAL` flag
- Runner images are backward-compatible. `--once` is omitted only when the new envvar `RUNNER_EPHEMERAL` is explicitly set to `false`.
Resolves #457
Previously any non-go changes resulted in `make docker-build` rerunning the time-consuming `go build`. This fixes that by adding clearly unnecessary files to .dockerignore.
- You can now use `make acceptance/run` to run only a specific acceptance test case
- Add note about Ubuntu 20.04 users / snap-provided docker
- Add instruction to run Ginkgo tests
- Extract acceptance/load from acceptance/kind
- Make `acceptance/pull` not depend on `docker-build`, so that you can do `make docker-build acceptance/load` for faster image reload
This is an attempt to support scaling from/to zero.
The basic idea is that we create a one-off "registration-only" runner pod when a RunnerReplicaSet is scaled to zero, so that there is one "offline" runner, which enables GitHub Actions to queue jobs instead of discarding them.
GitHub Actions seems to immediately throw away the new job when there are no runners at all. Generally, having runners of any status, `busy`, `idle`, or `offline`, would prevent GitHub Actions from failing jobs. But retaining `busy` or `idle` runners means that we need to keep runner pods running, which conflicts with our desire to scale to/from zero, hence we retain `offline` runners.
In this change, I enhanced the runnerreplicaset controller to create a registration-only runner at the very beginning of its reconciliation logic, only when a runnerreplicaset is scaled to zero. The runner controller creates the registration-only runner pod, waits for it to become "offline", and then removes the runner pod. The runner on GitHub stays `offline` until the runner resource on K8s is deleted. As we remove the registration-only runner pod as soon as it registers, this doesn't block cluster-autoscaler.
Related to #447
* Fix acceptance helm test not using newly built controller image
* Locally build runner image instead of pulling it
* Revert runner controller image pull policy to always
and add a line to the test deployment to use IfNotPresent
* Change runner repository from summerwind/action-runner to the owner of actions-runner-controller.
Also fix some Makefile formatting.
* Undo renaming acceptance/pull to docker-pull
* Some env var cleanup
Rename USERNAME to DOCKER_USER (it is still used for GitHub too, though)
Add RUNNER_NAME var(defaults to $DOCKER_USER/actions-runner)
Add TEST_REPO(defaults to $DOCKER_USER/actions-runner-controller)
Changes:
- Switched to use `jq` in startup.sh
- Enable docker registry mirror configuration which is useful when e.g. avoiding the Docker Hub rate-limiting
Check #478 for how this feature is tested and supposed to be used.
* chore: adding Helm app version back
* chore: removing redundant values entry
* chore: bumping to newer version
* chore: bumping app version to latest
Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
Enables the user to set a size limit on the runner's volume so that a runner pod does not affect other resources in the same cluster.
Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
* Cache docker images locally
Cache dind, runner, and kube-rbac-proxy docker image on the host and copy onto the kind node instead of downloading it to the node directly.
* Also cache certmanager docker images
Images for `actions-runner:v${VERSION}` and `actions-runner:latest` tags are upgraded to Ubuntu 20.04.
If you would like not to upgrade Ubuntu in the runner image in the future, migrate to new tags suffixed with `-ubuntu-20.04`, like `actions-runner:v${VERSION}-ubuntu-20.04`.
We also keep publishing the existing Ubuntu 18.04 images with new `actions-runner:v${VERSION}-ubuntu-18.04` tags. Please use them if it turns out that you have workflows dependent on Ubuntu 18.04.
Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
Prevents arguments from being split when e.g. the RUNNER_GROUP variable contains spaces (which is legit. One can create such groups in GitHub).
I've seen that all workers with group names that contain no spaces can register successfully, while all workers with groups that contain spaces will not register.
Furthermore, I suppose other characters could also be used here to inject arbitrary commands in an unsupported way, e.g. via a pipe symbol.
Quoting the vars correctly should prevent that and allow for e.g. group names and runner labels with spaces and other bash reserved characters.
This makes logging more concise by changing logger names from something like `controllers.Runner` to `actions-runner-controller.runner`, following the standard `controller-runtime.controller`, and by reducing redundant logs through removing unnecessary requeues. I have also tweaked log messages so that their style is more consistent, which will also help readability. Also, the runnerreplicaset controller lacked useful logs, so I have enhanced it.
As part of #282, I introduced a caching mechanism to avoid excessive GitHub API calls, because the autoscaling calculation that involves GitHub API calls is executed on each webhook event.
Apparently, it was saving the wrong value in the cache: the value was the one after applying `HRA.Spec.{Max,Min}Replicas`, so manual changes to {Max,Min}Replicas don't affect RunnerDeployment.Spec.Replicas until the cache expires. This isn't what I had wanted.
This patch fixes that, by changing the value being cached to one before applying {Min,Max}Replicas.
Additionally, I've also updated logging so that you can observe which number was fetched from the cache, what number was suggested by either TotalNumberOfQueuedAndInProgressWorkflowRuns or PercentageRunnersBusy, and what was the final number used as the desired replicas (after applying {Min,Max}Replicas).
Follow-up for #282
Since #392, the runner controller could have taken unexpectedly long time until it finally notices that the runner has been registered to GitHub. This patch fixes the issue, so that the controller will notice the successful registration in approximately 1 minute(hard-coded).
More concretely, if you had configured a long sync-period of, say, 10m, the runner controller could have taken approximately 10m to notice the successful registration. The original expectation was 1m, because it was intended to recheck every 1m as implemented in #392. It wasn't working as such due to my misunderstanding of how requeueing works.
We occasionally encountered those errors while the underlying RunnerReplicaSet was being recreated/replaced on RunnerDeployment.Spec.Template update. It turned out to be because the RunnerDeployment controller was waiting for the runner pod to become `Running`, instead of waiting for the new replacement runner to have registered to GitHub. This fixes that by setting Runner.Status.Phase to `Running` only after the runner in the runner pod appears to be registered.
A side-effect of this change is that the runner controller will call the "ListRunners" GitHub Actions API more often. I've reviewed and improved the runner controller code and Runner CRD to keep the number of calls to a minimum. In most cases, ListRunners should be called only twice for each runner creation.
This allows you to trigger autoscaling depending on check_run names (i.e. actions job names). If you want to differentiate the scale amount for a specific job, or want to scale only on a specific job, try this.
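A sketch of an HRA scale-up trigger filtered by check_run names; the `checkRun` trigger fields shown here mirror the webhook-driven scaling examples in the README and are illustrative:
```yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
  name: example-hra
spec:
  scaleTargetRef:
    name: example-runnerdeploy
  scaleUpTriggers:
    - githubEvent:
        checkRun:
          types: ["created"]
          status: "queued"
          # Scale only when the check_run (i.e. job) name matches one of these.
          names: ["build", "integration-test"]
      amount: 1
      duration: "5m"
```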
PercentageRunnerBusy seems to have regressed since #355 due to that RunnerDeployment.Spec.Selector is empty by default and the HRA controller was using that empty selector to query runners, which somehow returned 0 runners. This fixes that by using the newly added automatic `runner-deployment-name` label for the default runner label and the selector, which avoids querying with empty selector.
Ref https://github.com/summerwind/actions-runner-controller/issues/377#issuecomment-795200205
* so far, runner group parameter is only set for runners with org scope
* now set group for enterprise runners as well
* removed null check for org scope as either org or enterprise will be set
We sometimes see that integration test fails due to runner replicas not meeting the expected number in a timely manner. After investigating a bit, this turned out to be due to that HRA updates on webhook-based autoscaler and HRA controller are conflicting. This changes the controllers to use Patch instead of Update to make conflicts less likely to happen.
I have also updated the hra controller to use Patch when updating RunnerDeployment, too.
Overall, these changes should make the webhook-based autoscaling more reliable due to less conflicts.
* add replicaCount
* Add authSecret.existingSecret
* set image.tag null by default
* implement ingress for githubwebhook server
* fix deprecated and secretName template
* backward compat .authSecret.enabled
* existingSecret for github webhook secret
* use secretName template
* set default secret names
* do not use app version based image tag
* create and name variable for secrets
Similar to #348 for #346, but for HRA.Spec.CapacityReservations usually modified by the webhook-based autoscaler controller.
This patch tries to fix that by improving the webhook-based autoscaler controller to omit expired reservations on updating HRA spec.
`kubectl get horizontalrunnerautoscalers.actions.summerwind.dev` shows HRA.status.desiredReplicas as the DESIRED count. However the value had been not taking capacityReservations into account, which resulted in showing incorrect count when you used webhook-based autoscaler, or capacityReservations API directly. This fixes that.
The controller had been writing confusing messages like the below on missing scale target:
```
Found too many scale targets: It must be exactly one to avoid ambiguity. Either set WatchNamespace for the webhook-based autoscaler to let it only find HRAs in the namespace, or update Repository or Organization fields in your RunnerDeployment resources to fix the ambiguity.{"scaleTargets": ""}
```
This fixes that, while improving many kinds of messages written during reconciliation, so that the error messages are more actionable.
* if a new runner pod was just scheduled to start up right before a
registration expired, it would not get a new registration token and would go into
an infinite update loop (until #341 kicks in)
* if registration tokens got updated a little bit before they actually
expired, pods that are just starting up would much more likely get a working token
We occasionally see logs like the below:
```
2021-02-24T02:48:26.769ZERRORFailed to update runner status{"runnerreplicaset": "testns-244ol/example-runnerdeploy-j5wzf", "error": "Operation cannot be fulfilled on runnerreplicasets.actions.summerwind.dev \"example-runnerdeploy-j5wzf\": the object has been modified; please apply your changes to the latest version and try again"}
github.com/go-logr/zapr.(*zapLogger).Error
/home/runner/go/pkg/mod/github.com/go-logr/zapr@v0.1.0/zapr.go:128
github.com/summerwind/actions-runner-controller/controllers.(*RunnerReplicaSetReconciler).Reconcile
/home/runner/work/actions-runner-controller/actions-runner-controller/controllers/runnerreplicaset_controller.go:207
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.4.0/pkg/internal/controller/controller.go:256
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.4.0/pkg/internal/controller/controller.go:232
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker
/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.4.0/pkg/internal/controller/controller.go:211
k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1
/home/runner/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20190913080033-27d36303b655/pkg/util/wait/wait.go:152
k8s.io/apimachinery/pkg/util/wait.JitterUntil
/home/runner/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20190913080033-27d36303b655/pkg/util/wait/wait.go:153
k8s.io/apimachinery/pkg/util/wait.Until
/home/runner/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20190913080033-27d36303b655/pkg/util/wait/wait.go:88
2021-02-24T02:48:26.769ZERRORcontroller-runtime.controllerReconciler error{"controller": "testns-244olrunnerreplicaset", "request": "testns-244ol/example-runnerdeploy-j5wzf", "error": "Operation cannot be fulfilled on runnerreplicasets.actions.summerwind.dev \"example-runnerdeploy-j5wzf\": the object has been modified; please apply your changes to the latest version and try again"}
github.com/go-logr/zapr.(*zapLogger).Error
/home/runner/go/pkg/mod/github.com/go-logr/zapr@v0.1.0/zapr.go:128
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.4.0/pkg/internal/controller/controller.go:258
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.4.0/pkg/internal/controller/controller.go:232
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker
/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.4.0/pkg/internal/controller/controller.go:211
k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1
/home/runner/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20190913080033-27d36303b655/pkg/util/wait/wait.go:152
k8s.io/apimachinery/pkg/util/wait.JitterUntil
/home/runner/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20190913080033-27d36303b655/pkg/util/wait/wait.go:153
k8s.io/apimachinery/pkg/util/wait.Until
/home/runner/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20190913080033-27d36303b655/pkg/util/wait/wait.go:88
```
which can be compacted into a one-liner, without the useless stack trace, and without double-logging the same error from the logger and the controller.
Apparently I mistakenly removed the `push` option from the workflow in #323, which resulted in the new runner build #323 not being pushed. This fixes that.
* if a runner pod starts up with an invalid token, it will go in an
infinite retry loop, appearing as RUNNING from the outside
* normally, this error situation is detected because no corresponding
runner objects exists in GitHub and the pod will get removed after
registration timeout
* if the GitHub runner object already existed before - e.g. because a
finalizer was not properly run as part of a partial Kubernetes crash,
the runner will always stay in a running mode, even updating the
registration token will not kill the problematic pod
* introducing RunnerOffline exception that can be handled in runner
controller and replicaset controller
* as runners are offline when a pod is completed and marked for restart,
only do additional restart checks if no restart was already decided,
making code a bit cleaner and saving GitHub API calls after each job
completion
Changes:
1. Fix length of github-webhook-server port name
2. Add a cluster role binding for github-webhook-server
3. Remove --enable-leader-election from github-webhook-server
* Update runner to 2.277.1
* Update build-and-release-runners.yml
* integration test condition
Don't run integration tests when only updating the runner image
* fixup! integration test condition
Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
* so far, only push events would trigger the DockerHub login step
* hence, attempts to release would fail because of a permission problem (tested locally)
* adding OR condition to also login in case a release got published
I have heard from some user that they have hundreds of thousands of `status=completed` workflow runs in their repository, which effectively blocked TotalNumberOfQueuedAndInProgressWorkflowRuns from working because of the GitHub API rate limit due to excessive paginated requests.
This fixes that by separating the list-workflow-runs calls into two (one for `queued` and one for `in_progress`), which increases the minimum number of API calls from 1 to 2, but allows it to work regardless of the number of remaining `completed` workflow runs.
* the reconciliation loop is often much faster than the runner startup,
so changing runner not found related messages to debug and also add the
possibility that the runner just needs more time
* as pointed out in #281 the currently used image for the
kube-rbac-proxy - gcr.io/kubebuilder/kube-rbac-proxy:v0.4.1" - does not
have an ARM64 image
* hence, trying to use the standard deployment manifest / helm chart will
fail on ARM64 systems
* replaced image with quay.io/brancz/kube-rbac-proxy:v0.8.0 which is the
latest version from the upstream maintainer
(https://github.com/brancz/kube-rbac-proxy/blob/master/Makefile#L13)
* successfully tested on both AMD64 and ARM64 clusters
* fixes #281
* errors.Is compares all members of a struct to return true which never
happened
* switched to type check instead of exact value check
* notRegistered was using double negation in an if statement, which led to
unregistering runners after the registration timeout
* ... otherwise it will take 40 seconds (until a node is detected as unreachable) + 5 minutes (until pods are evicted from unreachable/crashed nodes)
* pods stuck in "Terminating" status on unreachable nodes will only be freed once #307 gets merged
* if a k8s node becomes unresponsive, the kube controller will soft
delete all pods after the eviction time (default 5 mins)
* as long as the node stays unresponsive, the pod will never leave the
last status and hence the runner controller will assume that everything
is fine with the pod and will not try to create new pods
* this can result in a situation where a horizontal autoscaler thinks
that none of its runners are currently busy and will not schedule any
further runners / pods, resulting in a broken runner deployment until the
runnerreplicaset is deleted or the node comes back online
* introducing a pod deletion timeout (1 minute) after which the runner
controller will try to reboot the runner and create a pod on a working
node
* use forceful deletion and requeue for pods that have been stuck for
more than one minute in terminating state
* gracefully handling race conditions if the pod finally gets forcefully deleted in the meantime
This enhances the controller to recreate the runner pod if the corresponding runner has failed to register itself to GitHub within 10 minutes (currently hard-coded).
It should alleviate #288 in case the root cause is some kind of transient failure (network unreliability, GitHub being down, temporary compute resource shortage, etc.).
Formerly you had to manually detect and delete such pods or even force-delete corresponding runners to unblock the controller.
With this enhancement, the controller deletes the pod automatically 10 minutes after pod creation, which results in the controller creating another pod that might work.
Ref #288
When we used `QueuedAndInProgressWorkflowRuns`-based autoscaling, it fetched and considered only the first 30 workflow runs at reconciliation time. This may have resulted in unreliable scaling behaviour, like scale-in/out not happening when it was expected.
* feat: HorizontalRunnerAutoscaler Webhook server
This introduces a Webhook server that responds GitHub `check_run`, `pull_request`, and `push` events by scaling up matched HorizontalRunnerAutoscaler by 1 replica. This allows you to immediately add "resource slack" for future GitHub Actions job runs, without waiting next sync period to add insufficient runners.
This feature is highly inspired by https://github.com/philips-labs/terraform-aws-github-runner. terraform-aws-github-runner can manage one set of runners per deployment, where actions-runner-controller with this feature can manage as many sets of runners as you declare with HorizontalRunnerAutoscaler and RunnerDeployment pairs.
On each GitHub event received, the webhook server queries repository-wide and organizational runners from the cluster and searches for the single target to scale up. The webhook server tries to match HorizontalRunnerAutoscaler.Spec.ScaleUpTriggers[].GitHubEvent.[CheckRun|Push|PullRequest] against the event and if it finds only one HRA, it is the scale target. If none or two or more targets are found for repository-wide runners, it does the same on organizational runners.
Changes:
* Fix integration test
* Update manifests
* chart: Add support for github webhook server
* dockerfile: Include github-webhook-server binary
* Do not import unversioned go-github
* Update README
* bug-fix: patched dir owned by runner
* always build with latest runner version
* Revert "always build with latest runner version"
This reverts commit e719724ae9fe92a12d4a087185cf2a2ff543a0dd.
* Also patch dindrunner.Dockerfile
Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
* Added GITHUB.RUN_NUMBER to DockerHub push
* switch run_number to sha on docker tag
* re-add mutable tags for backwards compatibility
* truncate to short SHA (7 chars)
* behaviour workaround
* use ENV to define sha_short
* use ::set-output to define sha_short
* bump action
* feat/helm: Bump appVersion to 0.6.1 release
* Also bump chart version to trigger a new chart release
Co-authored-by: Yusuke Kuoka <c-ykuoka@zlab.co.jp>
* Add chart workflows (#1)
* Add chart workflows
* Fix publishing step in CI
Signed-off-by: David Young <davidy@funkypenguin.co.nz>
* Update CI on push-to-master (#3)
* Put helm installation step in the correct CI job
Signed-off-by: David Young <davidy@funkypenguin.co.nz>
* Put helm installation step in the correct CI job (#4)
* Update on-push-master-publish-chart.yml
* Remove references to certmanager dependency
Signed-off-by: David Young <davidy@funkypenguin.co.nz>
* Add ability to customize kube-rbac-proxy image
Signed-off-by: David Young <davidy@funkypenguin.co.nz>
* Only install cert-manager if we're going to spin up KinD
Signed-off-by: David Young <davidy@funkypenguin.co.nz>
* when setting a GitHub Enterprise server URL without a namespace, an error occurs: "error: the server doesn't have a resource type "controller-manager"
* setting default namespace "actions-runner-system" makes the example work out of the box
* ensure that minReplicas <= desiredReplicas <= maxReplicas no matter what
* before this change, if the number of runners was much larger than the max number, the applied scale down factor might still result in a desired value > maxReplicas
* if, due to resource constraints in the cluster, runners were permanently restarted, the number of runners could go up by more than the reverse scale-down factor until the next reconciliation round, resulting in a situation where the number of runners climbs even though it should actually go down
* by checking whether the desiredReplicas is always <= maxReplicas, infinite scaling up loops can be prevented
* feat: adding manager secret to Helm
* fix: correcting secret data format
* feat: adding in common labels
* fix: updating default values to have config
The auth config needs to be commented out by default as we don't want to deploy both configs empty. This may break stuff, so we want the user to actively uncomment the auth method they want instead
* chore: updating default format of cert
* chore: wording
One of the pod recreation conditions has been modified to use hash of runner spec, so that the controller does not keep restarting pods mutated by admission webhooks. This naturally allows us, for example, to use IRSA for EKS that requires its admission webhook to mutate the runner pod to have additional, IRSA-related volumes, volume mounts and env.
Resolves #200
It turned out previous versions of runner images were unable to run actions that require `AGENT_TOOLSDIRECTORY` or `libyaml` to exist in the runner environment. One of the notable examples of such actions is [`ruby/setup-ruby`](https://github.com/ruby/setup-ruby).
This change adds the support for those actions, by setting up AGENT_TOOLSDIRECTORY and installing libyaml-dev within runner images.
* runner/controller: Add externals directory mount point
* Runner: Create hack for moving content of /runner/externals/ dir
* Externals dir Mount: mount examples for '__e/node12/bin/node' not found error
Adds a dockerEnabled option for users who do not need docker and do not want to run a privileged container.
If `dockerEnabled == false`, the dind container does not run, and there is no privileged container.
Does the same as the closed #96.
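A minimal sketch of a runner with docker disabled, assuming the field is spelled `dockerEnabled` as described above:
```yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: no-docker-runner
spec:
  template:
    spec:
      repository: example/repo
      # No dind sidecar and no privileged container when set to false.
      dockerEnabled: false
```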
The docker:dind container creates `/var/run/docker.sock` owned by the root user and root group,
so the docker command in the runner container needs root privileges to use docker.sock, and docker actions fail because of the lack of permission.
This uses a TCP connection between the runner and docker containers, so the runner container doesn't need root privileges to run docker and can run docker actions.
Fixes #174
Acceptance tests are passing with the chart. In addition to standard chart values, syncPeriod is supported.
Please use it as a foundation for further collaboration.
Ref #184
Inspired by #91
Related #61
* Adds RUNNER_GROUP argument to the runner registration
Adds the ability to register a runner to a predefined runner_group
Resolves #137
* Update README with runner group example
- Updates the README with instructions of how to add the runner to a
group
- Fix code fencing for shell and yaml blocks in the README
- Use consistent bullet points (dash not asterisk)
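A sketch of registering runners into a predefined group, assuming the runner spec exposes it as `group` and that the group already exists in the organization settings:
```yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: grouped-runner
spec:
  template:
    spec:
      organization: example-org
      # Name of an existing runner group in the organization (or enterprise).
      group: my-runner-group
```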
      description: Refer to semver-like release tags for controller versions. Any release tags prefixed with `actions-runner-controller-` are for chart releases
      placeholder: ex. 0.18.2 or git commit ID
    validations:
      required: true
  - type: input
    id: chart-version
    attributes:
      label: Helm Chart Version
      description: Run `helm list` and see what's shown under CHART VERSION. Any release tags prefixed with `actions-runner-controller-` are for chart releases
      placeholder: ex. 0.11.0
  - type: input
    id: cert-manager-version
    attributes:
      label: CertManager Version
      description: Run `kubectl get po -o yaml $CERT_MANAGER_POD` and see the image tag, or run `helm list` and see what's shown under APP VERSION for your cert-manager Helm release.
      placeholder: ex. 1.8
  - type: dropdown
    id: deployment-method
    attributes:
      label: Deployment Method
      description: Which deployment method did you use to install ARC?
      options:
        - Helm
        - Kustomize
        - ArgoCD
        - Other
    validations:
      required: true
  - type: textarea
    id: cert-manager
    attributes:
      label: cert-manager installation
      description: Confirm that you've installed cert-manager correctly by answering a few questions
      placeholder: |
        - Did you follow https://github.com/actions-runner-controller/actions-runner-controller#installation? If not, describe the installation process so that we can reproduce your environment.
        - Are you sure you've installed cert-manager from an official source?
        (Note that we won't provide user support for cert-manager itself. Make sure cert-manager is fully working before testing ARC or reporting a bug
    validations:
      required: true
  - type: checkboxes
    id: checks
    attributes:
      label: Checks
      description: Please check the boxes below before submitting
      options:
        - label: This isn't a question or user support case (For Q&A and community support, go to [Discussions](https://github.com/actions-runner-controller/actions-runner-controller/discussions). It might also be a good idea to contract with any of contributors and maintainers if your business is so critical and therefore you need priority support
          required: true
        - label: I've read [releasenotes](https://github.com/actions-runner-controller/actions-runner-controller/tree/master/docs/releasenotes) before submitting this issue and I'm sure it's not due to any recently-introduced backward-incompatible changes
          required: true
        - label: My actions-runner-controller version (v0.x.y) does support the feature
          required: true
        - label: I've already upgraded ARC (including the CRDs, see charts/actions-runner-controller/docs/UPGRADING.md for details) to the latest and it didn't fix the issue
          required: true
  - type: textarea
    id: resource-definitions
    attributes:
      label: Resource Definitions
      description: "Add copy(s) of your resource definition(s) (RunnerDeployment or RunnerSet, and HorizontalRunnerAutoscaler. If RunnerSet, also include the StorageClass being used)"
      render: yaml
      placeholder: |
        apiVersion: actions.summerwind.dev/v1alpha1
        kind: RunnerDeployment
        metadata:
          name: example
        spec:
          #snip
        ---
        apiVersion: actions.summerwind.dev/v1alpha1
        kind: RunnerSet
        metadata:
          name: example
        spec:
          #snip
        ---
        apiVersion: storage.k8s.io/v1
        kind: StorageClass
        metadata:
          name: example
        provisioner: ...
        reclaimPolicy: ...
        volumeBindingMode: ...
        ---
        apiVersion: actions.summerwind.dev/v1alpha1
        kind: HorizontalRunnerAutoscaler
        metadata:
          name:
        spec:
          #snip
    validations:
      required: true
  - type: textarea
    id: reproduction-steps
    attributes:
      label: To Reproduce
      description: "Steps to reproduce the behavior"
      render: markdown
      placeholder: |
        1. Go to '...'
        2. Click on '....'
        3. Scroll down to '....'
        4. See error
    validations:
      required: true
  - type: textarea
    id: actual-behavior
    attributes:
      label: Describe the bug
      description: Also tell us, what did happen?
      placeholder: A clear and concise description of what happened.
    validations:
      required: true
  - type: textarea
    id: expected-behavior
    attributes:
      label: Describe the expected behavior
      description: Also tell us, what did you expect to happen?
      placeholder: A clear and concise description of what the expected behavior is.
    validations:
      required: true
  - type: textarea
    id: controller-logs
    attributes:
      label: Controller Logs
      description: "NEVER EVER OMIT THIS! Include logs from `actions-runner-controller`'s controller-manager pod"
      render: shell
      placeholder: |
        PROVIDE THE LOGS VIA A GIST LINK (https://gist.github.com/), NOT DIRECTLY IN THIS TEXT AREA
        To grab controller logs:
        # Set NS according to your setup
        NS=actions-runner-system
        # Grab the pod name and set it to $POD_NAME
        kubectl -n $NS get po
        kubectl -n $NS logs $POD_NAME > arc.log
    validations:
      required: true
  - type: textarea
    id: runner-pod-logs
    attributes:
      label: Runner Pod Logs
      description: "Include logs from runner pod(s)"
      render: shell
      placeholder: |
        PROVIDE THE LOGS VIA A GIST LINK (https://gist.github.com/), NOT DIRECTLY IN THIS TEXT AREA
        To grab the runner pod logs:
        # Set NS according to your setup. It should match your RunnerDeployment's metadata.namespace.
        NS=default
        # Grab the name of the problematic runner pod and set it to $POD_NAME
This is the template of actions-runner-controller's release notes.
Whenever a new release is made, I start by manually copy-pasting this template onto the GitHub UI for creating the release.
I then walk through all the changes, take some time to think about the best one-sentence explanations to tell the users about the changes, write it all,
and click the publish button.
If you think you can improve future release notes in any way, please do submit a pull request to change the template below.
Note that even though it looks like a Go template, I don't use any templating to generate the changelog.
It's just that I'm used to reading and interpreting Go templates by myself, not a computer program :)
**Title**:
```
v{{ .Version }}: {{ .TitlesOfImportantChanges }}
```
**Body**:
```
**CAUTION:** If you're using the Helm chart, beware to review changes to CRDs and do manually upgrade CRDs! Helm installs CRDs only on installing a chart. It doesn't automatically upgrade CRDs. Otherwise you end up with troubles like #427, #467, and #468. Please refer to the [UPGRADING](charts/actions-runner-controller/docs/UPGRADING.md) docs for the latest process.
This release includes the following changes from contributors. Thank you!
- @{{ .GitHubUser }} fixed {{ .Feature }} to not break when ... (#{{ .PullRequestNumber }})
      issues: write # for actions/stale to close stale issues
      pull-requests: write # for actions/stale to close stale PRs
    steps:
      - uses: actions/stale@v5
        with:
          stale-issue-message: 'This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.'
This document provides a high level overview of Actions Runner Controller (ARC). ARC enables running Github Actions Runners on Kubernetes (K8s) clusters.
This document provides a background of Github Actions, self-hosted runners and ARC overview. By the end of the doc, the reader should have a foundation with basic scenarios and be capable of reviewing other advanced topics.
## GitHub Actions
[GitHub Actions](https://github.com/features/actions) is a continuous integration and continuous delivery (CI/CD) platform to automate your build, test, and deployment pipeline.
You can create workflows that build and test every pull request to your repository, or deploy merged pull requests to production. Your workflow contains one or more jobs which can run in sequential order or in parallel. Each job will run inside its own runner and has one or more steps that either run a script that you define or run an action, which is a reusable extension that can simplify your workflow. To learn more about Actions, see "[Learn Github Actions](https://docs.github.com/en/actions/learn-github-actions)".
## Runners
Runners execute the jobs that are assigned to them by the GitHub Actions workflow. There are two types of Runners:
- [Github-hosted runners](https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners) - GitHub provides Linux, Windows, and macOS virtual machines to run your workflows. These virtual machines are hosted in the cloud by Github.
- [Self-hosted runners](https://docs.github.com/en/actions/hosting-your-own-runners/about-self-hosted-runners) - you can host your own self-hosted runners in your own data center or cloud infrastructure. ARC deploys self-hosted runners.
## Self hosted runners
Self-hosted runners offer more control of hardware, operating system, and software tools than GitHub-hosted runners. With self-hosted runners, you can create custom hardware configurations that meet your needs with processing power or memory to run larger jobs, install software available on your local network, and choose an operating system not offered by GitHub-hosted runners.
### Types of Self hosted runners
Self-hosted runners can be physical, virtual, in a container, on-premises, or in a cloud.
- A traditional deployment has a physical machine with an OS and apps on it. The runner runs on this machine and executes any jobs. It comes with the cost of owning and operating the hardware 24/7, even if it isn't in use that entire time.
- Virtualized deployments are simpler to manage. Each runner runs on a virtual machine (VM) that runs on a host. There could be multiple such VMs running on the same host. VMs are complete OSes and might take time to bring up every time a clean environment is needed to run workflows.
- Containerized deployments are similar to VMs, but instead of bringing up entire VMs, a container gets deployed. Kubernetes (K8s) provides a scalable and reproducible environment for containerized workloads. Containers are lightweight, loosely coupled, highly efficient and can be managed centrally. There are advantages to using Kubernetes (outlined "[here](https://kubernetes.io/docs/concepts/overview/what-is-kubernetes/)"), but it is more complicated and less widely understood than the other options. A managed provider makes this much simpler to run at scale.
*Actions Runner Controller (ARC) makes it simpler to run self-hosted runners on K8s-managed containers.*
## Actions Runner Controller (ARC)
ARC is a K8s controller to create self-hosted runners on your K8s cluster. With a few commands, you can set up self-hosted runners that scale up and down based on demand. And since these can be ephemeral and based on containers, new instances of the runner can be brought up rapidly and cleanly.
### Deploying ARC
We have a quick start guide that demonstrates how to easily deploy ARC into your K8s environment. For more details, see "[QuickStart Guide](https://github.com/actions-runner-controller/actions-runner-controller/blob/master/QuickStartGuide.md)."
## ARC components
ARC basically consists of a set of custom resources. An ARC deployment applies these custom resources onto a K8s cluster. Once applied, it creates a set of Pods with the GitHub Actions runner running within them. GitHub is then able to treat these Pods as self-hosted runners and allocate jobs to them.
### Custom resources
ARC consists of several custom resource definitions (Runner, Runner Set, Runner Deployment, Runner Replica Set and Horizontal Runner AutoScaler). For more information on CRDs, refer to "[Kubernetes Custom Resources](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/)".
The helm command (in the QuickStart guide) installs the custom resources into the actions-runner-system namespace.
The `Deployment and Configure ARC` section in the `Quick Start guide` lists the steps to deploy ARC using a `runnerdeployment.yaml` file. Here, we will explain the details.
For more details, see "[QuickStart Guide](https://github.com/actions-runner-controller/actions-runner-controller/blob/master/QuickStartGuide.md)."
```yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
name: example-runnerdeploy
spec:
replicas: 1
template:
spec:
repository: mumoshu/actions-runner-controller-ci
```
- `kind: RunnerDeployment`: indicates it's the RunnerDeployment kind of custom resource.
- `replicas: 1`: will deploy one replica. Multiple replicas can also be deployed (more on that later).
- `repository: mumoshu/actions-runner-controller-ci`: is the repository to link to when the pod comes up with the Actions runner (note that this can also be configured at the Enterprise or Organization level).
When this configuration is applied with `kubectl apply -f runnerdeployment.yaml`, ARC creates one pod `example-runnerdeploy-[**]` with 2 containers, `runner` and `docker`.
The `runner` container has the GitHub runner component installed; the `docker` container has Docker installed.
### The Runner container image
The GitHub hosted runners include a large amount of pre-installed software packages. For a complete list, see "[Runner images](https://github.com/actions/virtual-environments/tree/main/images/linux)".
ARC maintains a few runner images, with `latest` aligning with GitHub's Ubuntu version. These images do not contain all of the software installed on the GitHub runners. They contain a subset of packages from the GitHub runners: basic CLI packages, git, docker and build-essentials. To install additional software, it is recommended to use the corresponding setup actions. For instance, `actions/setup-java` for Java or `actions/setup-node` for Node.
## Executing workflows
Now, all the setup and configuration is done. A workflow can be created in the same repository that could target the self hosted runner created from ARC. The workflow needs to have `runs-on: self-hosted` so it can target the self host pool. For more information on targeting workflows to run on self hosted runners, see "[Using Self-hosted runners](https://docs.github.com/en/actions/hosting-your-own-runners/using-self-hosted-runners-in-a-workflow)."
## Scaling runners - statically with replicas count
With a small tweak to the replicas count (e.g. `replicas: 2`) in the `runnerdeployment.yaml` file, more runners can be created. As many pods as the replica count will be created, and as before, each pod contains the two containers.
## Scaling runners - dynamically with Pull Driven Scaling
ARC also allows for scaling the runners dynamically. There are two mechanisms for dynamic scaling: (1) Webhook driven scaling and (2) Pull Driven scaling. This document describes the Pull Driven scaling model.
1) Enable `HorizontalRunnerAutoscaler` - Create a `deployment.yaml` file of type `HorizontalRunnerAutoscaler`. The schema for this file is defined below.
2) Scaling parameters - `minReplicas` and `maxReplicas` indicate the min and max number of replicas to scale to.
3) Scaling metrics - ARC currently supports `PercentageRunnersBusy` as a metric type. `PercentageRunnersBusy` will poll GitHub for the number of runners in the `busy` state in the RunnerDeployment's namespace, and it will then scale depending on how you have configured the scale factors.
### Pull Driven Scaling Schema
```yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
name: example-runner-deployment-autoscaler
spec:
scaleTargetRef:
# Your RunnerDeployment Here
name: example-runnerdeploy
kind: RunnerDeployment
minReplicas: 1
maxReplicas: 5
metrics:
- type: PercentageRunnersBusy
scaleUpThreshold: '0.75'
scaleDownThreshold: '0.25'
scaleUpFactor: '2'
scaleDownFactor: '0.5'
```
For more details - please see "[Pull Driven Scaling](https://github.com/actions-runner-controller/actions-runner-controller#pull-driven-scaling)."
*The period between polls is defined by the controller's `--sync-period` flag. If this flag isn't provided then the controller defaults to a sync period of `1m`, this can be configured in seconds or minutes.*
## Other Configurations
ARC supports several different advanced configurations.
- Support for alternate runners: setting up runner pods with a Docker-in-Docker configuration.
- Managing runner groups: managing a set of runners with runner groups, making it easy to manage different groups within an enterprise.
- Webhook driven scaling.
Please refer to the documentation in this repo for further details.
We always appreciate your help in testing open pull requests by deploying custom builds of actions-runner-controller onto your own environment, so that we are extra sure we didn't break anything.
It is especially true when the pull request is about GitHub Enterprise, both GHEC and GHES, as [maintainers don't have GitHub Enterprise environments for testing](/README.md#github-enterprise-support).
The process would look like the below:
- Clone this repository locally
- Checkout the branch. If you use the `gh` command, run `gh pr checkout $PR_NUMBER`
- Run `NAME=$DOCKER_USER/actions-runner-controller VERSION=canary make docker-build docker-push` for a custom container image build
- Update your actions-runner-controller's controller-manager deployment to use the new image, `$DOCKER_USER/actions-runner-controller:canary`
Please also note that you need to replace `$DOCKER_USER` with your own DockerHub account name.
### How to Contribute a Patch
How you should go about a patch depends on what you are patching. Below are some guides on how to test patches locally as well as develop the controller and runners.
When submitting a PR, please provide evidence that your change works, as we still need to work on improving the CI of the project. Some resources are provided to help achieve this; see this guide for details.
#### Running an End to End Test
> **Notes for Ubuntu 20.04+ users**
>
> If you're using Ubuntu 20.04 or greater, you might have installed `docker` with `snap`.
>
> If you want to stick with `snap`-provided `docker`, do not forget to set `TMPDIR` to
> somewhere under `$HOME`.
> Otherwise `kind load docker-image` fails while running `docker save`.
> See https://kind.sigs.k8s.io/docs/user/known-issues/#docker-installed-with-snap for more information.
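For example, with `snap`-provided Docker you might point `TMPDIR` at a directory under your home before running the tests; the exact path below is only an illustration:
```shell
# Keep temporary files somewhere the snap-confined docker can read
mkdir -p "$HOME/.tmp"
export TMPDIR="$HOME/.tmp"
```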
To test your local changes against both PAT and App based authentication please run the `acceptance` make target with the authentication configuration details provided:
```shell
# This sets `VERSION` envvar to some appropriate value
. hack/make-env.sh
DOCKER_USER=*** \
GITHUB_TOKEN=*** \
APP_ID=*** \
PRIVATE_KEY_FILE_PATH=path/to/pem/file \
INSTALLATION_ID=*** \
make acceptance
```
**Rerunning a failed test**
When one of the tests run by `make acceptance` fails, you'd probably like to rerun only the failed one.
This can be done by running `make acceptance/run` with the combination of `ACCEPTANCE_TEST_DEPLOYMENT_TOOL=helm|kubectl` and `ACCEPTANCE_TEST_SECRET_TYPE=token|app` values that failed (note that you only need to set the corresponding authentication configuration in this case).
In the example below, we rerun the test for the combination `ACCEPTANCE_TEST_DEPLOYMENT_TOOL=helm ACCEPTANCE_TEST_SECRET_TYPE=token` only:
```shell
DOCKER_USER=*** \
GITHUB_TOKEN=*** \
ACCEPTANCE_TEST_DEPLOYMENT_TOOL=helm \
ACCEPTANCE_TEST_SECRET_TYPE=token \
make acceptance/run
```
**Testing in a non-kind cluster**
If you prefer to test in a non-kind cluster, you can instead run:
```shell
KUBECONFIG=path/to/kubeconfig \
DOCKER_USER=*** \
GITHUB_TOKEN=*** \
APP_ID=*** \
PRIVATE_KEY_FILE_PATH=path/to/pem/file \
INSTALLATION_ID=*** \
ACCEPTANCE_TEST_SECRET_TYPE=token \
make docker-build acceptance/setup \
acceptance/deploy \
acceptance/tests
```
#### Developing the Controller
Rerunning the whole acceptance test suite from scratch on every little change to the controller, the runner, and the chart would be counter-productive.
To make your development cycle faster, use the command below to rebuild and redeploy all three:
```shell
# Let assume we have all other envvars like DOCKER_USER, GITHUB_TOKEN already set,
# The below command will (re)build `actions-runner-controller:controller1` and `actions-runner:runner1`,
# load those into kind nodes, and then rerun kubectl or helm to install/upgrade the controller,
# and finally upgrade the runner deployment to use the new runner image.
#
# As helm 3 and kubectl are unable to recreate a pod when the image tag hasn't changed,
# you either need to bump VERSION and RUNNER_TAG on each run,
# or manually run `kubectl delete pod $POD` on respective pods for changes to actually take effect.
# Makefile
VERSION=controller1 \
RUNNER_TAG=runner1 \
make acceptance/pull acceptance/kind docker-build acceptance/load acceptance/deploy
```
If you've already deployed actions-runner-controller and only want to recreate pods to use the newer image, you can run:
```shell
# Makefile
NAME=$DOCKER_USER/actions-runner-controller \
make docker-build acceptance/load &&\
kubectl -n actions-runner-system delete po $(kubectl -n actions-runner-system get po -ojsonpath={.items[*].metadata.name})
```
Similarly, if you'd like to recreate runner pods with the newer runner image, you can use the runner-specific [Makefile](runner/Makefile) to build and/or push new runner images:
```shell
# runner/Makefile
NAME=$DOCKER_USER/actions-runner make \
-C runner docker-{build,push}-ubuntu &&\
(kubectl get po -ojsonpath={.items[*].metadata.name}| xargs -n1 kubectl delete po)
```
#### Developing the Runners
**Tests**
A set of example pipelines (./acceptance/pipelines) are provided in this repository which you can use to validate your runners are working as expected. When raising a PR please run the relevant suites to prove your change hasn't broken anything.
**Running Ginkgo Tests**
You can run the integration test suite that is written in Ginkgo with:
```shell
make test-with-deps
```
This first installs a few binaries required to set up the integration test environment and then runs `go test` to start the Ginkgo tests.
If you don't want to use `make`, like when you're running tests from your IDE, install the required binaries to `/usr/local/kubebuilder/bin`. That's the directory in which controller-runtime's `envtest` framework locates the binaries.
```shell
go test -v -run TestAPIs github.com/actions-runner-controller/actions-runner-controller/controllers
```
To run Ginkgo tests selectively, set `GINKGO_FOCUS` to the pattern of the target test names.
All the Ginkgo tests that match `GINKGO_FOCUS` will be run.
```shell
GINKGO_FOCUS='[It] should create a new Runner resource from the specified template, add a another Runner on replicas increased, and removes all the replicas when set to 0' \
go test -v -run TestAPIs github.com/actions-runner-controller/actions-runner-controller/controllers
```
#### Helm Version Bumps
In general we ask you not to bump the version in your PR, the maintainers in general manage the publishing of a new chart.
```makefile
# We use -count=1 instead of `go clean -testcache`
# See https://terratest.gruntwork.io/docs/testing-best-practices/avoid-test-caching/
.PHONY: e2e
e2e:
	go test -count=1 -v -timeout 600s -run '^TestE2E$$' ./test/e2e

# Upload release file to GitHub.
github-release: release
	ghr ${VERSION} release/

# Find or download controller-gen
#
# Note that controller-gen newer than 0.4.1 is needed for https://github.com/kubernetes-sigs/controller-tools/issues/444#issuecomment-680168439
# Otherwise we get errors like the below:
#   Error: failed to install CRD crds/actions.summerwind.dev_runnersets.yaml: CustomResourceDefinition.apiextensions.k8s.io "runnersets.actions.summerwind.dev" is invalid: [spec.validation.openAPIV3Schema.properties[spec].properties[template].properties[spec].properties[containers].items.properties[ports].items.properties[protocol].default: Required value: this property is in x-kubernetes-list-map-keys, so it must have a default or be a required property, spec.validation.openAPIV3Schema.properties[spec].properties[template].properties[spec].properties[initContainers].items.properties[ports].items.properties[protocol].default: Required value: this property is in x-kubernetes-list-map-keys, so it must have a default or be a required property]
#
# Note that controller-gen newer than 0.6.0 is needed due to https://github.com/kubernetes-sigs/controller-tools/issues/448
# Otherwise ObjectMeta embedded in Spec results in empty on the storage.
```
GitHub Actions can be run in GitHub-hosted or self-hosted environments. Self-hosted runners offer more control of hardware, operating system, and software tools than GitHub-hosted runners provide.
With just a few steps, you can set up your Kubernetes (K8s) cluster to be a self-hosted environment.
In this guide, we will set up the prerequisites, deploy Actions Runner Controller (ARC), and then target that cluster to run GitHub Actions workflows.
<details><summary><sub>Create a K8s cluster, if not available.</sub></summary>
<sub>
If you don't have a K8s cluster, you can install a local environment using minikube. For more information, see "[Installing minikube](https://minikube.sigs.k8s.io/docs/start/)."
"[Using workflows](/actions/using-workflows)."
</sub>
</details>
:one: Install cert-manager in your cluster. For more information, see "[cert-manager](https://cert-manager.io/docs/installation/)."
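If you install with the static manifests, the command typically looks like the following sketch, using cert-manager's standard release manifest for the version mentioned in the note below:
```shell
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.8.2/cert-manager.yaml
```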
<sub>*Note: this command uses v1.8.2. Please replace it with a later version, if available.*</sub>
>You may also install cert-manager using Helm. For instructions, see "[Installing with Helm](https://cert-manager.io/docs/installation/helm/#installing-with-helm)."
:two: Next, generate a Personal Access Token (PAT) for ARC to authenticate with GitHub.
- Log in to your GitHub account and navigate to https://github.com/settings/tokens/new.
- Select **repo**.
- Click **Generate Token** and then copy the token locally (we'll need it later).
## Deploy and Configure ARC
1️⃣ Deploy and configure ARC on your K8s cluster. You may use Helm or kubectl.
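With Helm, the installation is roughly a sketch like the one below; the `authSecret.*` values used to pass the PAT are assumptions based on this chart's `authSecret` settings, so double-check them against the chart documentation:
```shell
helm repo add actions-runner-controller https://actions-runner-controller.github.io/actions-runner-controller
helm repo update
helm upgrade --install --namespace actions-runner-system --create-namespace \
  --set=authSecret.create=true \
  --set=authSecret.github_token="REPLACE_WITH_YOUR_PAT" \
  actions-runner-controller actions-runner-controller/actions-runner-controller
```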
Also, this runner has been registered directly to the specified repository; you can see it in the repository settings. For more information, see "[settings](https://docs.github.com/en/actions/hosting-your-own-runners/monitoring-and-troubleshooting-self-hosted-runners#checking-the-status-of-a-self-hosted-runner)."
:two: You are ready to execute workflows against this self-hosted runner.
GitHub documentation lists the steps to target Actions against self-hosted runners. For more information, see "[Using self-hosted runners in a workflow - GitHub Docs](https://docs.github.com/en/actions/hosting-your-own-runners/using-self-hosted-runners-in-a-workflow#using-self-hosted-runners-in-a-workflow)."
There's also a quick start guide for getting started with Actions. For more information, see "[Quick start Guide to GitHub Actions](https://docs.github.com/en/actions/quickstart)."
## Next steps
ARC provides several interesting features and capabilities. For more information, see "[readme](https://github.com/actions-runner-controller/actions-runner-controller/blob/master/README.md)."
This project is maintained by a small team of two and therefore lacks the resources to provide security fixes in a timely manner.
If you have an important business that relies on this project, please consider sponsoring the project so that the maintainers can commit to providing such a service.
Please refer to https://github.com/sponsors/actions-runner-controller for available tiers.
* [Unable to scale to zero with TotalNumberOfQueuedAndInProgressWorkflowRuns](#unable-to-scale-to-zero-with-totalnumberofqueuedandinprogressworkflowruns)
## Tools
A list of tools which are helpful for troubleshooting:
* https://github.com/stern/stern - multi-pod and container log tailing for Kubernetes
## Installation
Troubleshooting runbooks that relate to ARC installation problems
### InternalError when calling webhook: context deadline exceeded
**Problem**
This issue can come up for various reasons like leftovers from previous installations or not being able to access the K8s service's clusterIP associated with the admission webhook server (of ARC).
Post "https://actions-runner-controller-webhook.actions-runner-system.svc:443/mutate-actions-summerwind-dev-v1alpha1-runnerdeployment?timeout=10s": context deadline exceeded
```
**Solution**
First we will try the common solution of checking webhook leftovers from previous installations:
1. ```bash
kubectl get validatingwebhookconfiguration -A
kubectl get mutatingwebhookconfiguration -A
```
2. If you see any webhooks related to actions-runner-controller, delete them:
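For example, using the names returned by the commands above (the names below are placeholders):
```bash
kubectl delete validatingwebhookconfiguration <name-from-output>
kubectl delete mutatingwebhookconfiguration <name-from-output>
```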
If that didn't work, then your K8s control plane is probably unable to access the K8s service's clusterIP associated with ARC's admission webhook server, for one of these reasons:
1. You're running the apiserver as a binary and you didn't make service cluster IPs available to the host network.
2. You're running the apiserver in a pod, but your pod network (i.e. CNI plugin installation and config) is misconfigured, so pods (like kube-apiserver) on the K8s control-plane nodes can't access ARC's admission webhook server pod(s), which probably run on data-plane nodes.
Another possible cause is GKE's firewall settings: when deploying runners on a private GKE cluster, the control plane may be blocked from reaching the webhook.
To fix this, you may either:
1. Configure the webhook to use another port, such as 443 or 10250, [each of
which allow traffic by default](https://cloud.google.com/kubernetes-engine/docs/how-to/private-clusters#add_firewall_rules).
```sh
# With helm, you'd set `webhookPort` to the port number of your choice
# See https://github.com/actions-runner-controller/actions-runner-controller/pull/1410/files for more information
```
2. Set up a firewall rule to allow the master node to connect to the default
webhook port. The exact way to do this may vary, but the following script
should point you in the right direction:
```sh
# 1) Retrieve the network tag automatically given to the worker nodes
# NOTE: this only works if you have only one cluster in your GCP project. You will have to manually inspect the result of this command to find the tag for the cluster you want to target
```
```
{
  "error": "failed to create registration token: Post \"https://api.github.com/orgs/$YOUR_ORG_HERE/actions/runners/registration-token\": net/http: invalid header field value \"Bearer $YOUR_TOKEN_HERE\\n\" for key Authorization"
}
```
**Solution**
Your base64'ed PAT token has a newline at the end; it needs to be created without a trailing `\n`. Either:
* `echo -n $TOKEN | base64`
* Create the secret as described in the docs using the shell and documented flags
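For the second option, a minimal sketch of creating the secret from the shell; the secret name `controller-manager` and namespace `actions-runner-system` are the usual defaults, so adjust them to your installation:
```shell
kubectl create secret generic controller-manager \
  -n actions-runner-system \
  --from-literal=github_token="${GITHUB_TOKEN}"
```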
### Helm chart install failure: certificate signed by unknown authority
**Problem**
```
Error: UPGRADE FAILED: failed to create resource: Internal error occurred: failed calling webhook "webhook.cert-manager.io": failed to call webhook: Post "https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=10s": x509: certificate signed by unknown authority
```
Apparently, it's failing while `helm` is creating one of the resources defined in the ARC chart, and the cause is that cert-manager's webhook is not working correctly due to a missing or invalid CA certificate.
If you tail logs from the `cert-manager-cainjector` pod, you'll see it failing with an error like:
```
I0703 03:32:12.339197 1 request.go:645] Throttling request took 1.047437675s, request: GET:https://10.96.0.1:443/apis/storage.k8s.io/v1beta1?timeout=32s
E0703 03:32:13.143790 1 start.go:151] cert-manager/ca-injector "msg"="Error registering certificate based controllers. Retrying after 5 seconds." "error"="no matches for kind \"MutatingWebhookConfiguration\" in version \"admissionregistration.k8s.io/v1beta1\""
Error: error registering secret controller: no matches for kind "MutatingWebhookConfiguration" in version "admissionregistration.k8s.io/v1beta1"
```
**Solution**
Your cluster is based on a new enough Kubernetes version (1.22 or greater) which no longer supports the legacy `admissionregistration.k8s.io/v1beta1` API, and your `cert-manager` is not up-to-date, hence it's still trying to use the legacy Kubernetes API.
In many cases, downgrading Kubernetes is not an option. So, just upgrade `cert-manager` to a more recent version that does have support for the specific Kubernetes version you're using.
See https://cert-manager.io/docs/installation/supported-releases/ for the list of available cert-manager versions.
## Operations
Troubleshooting runbooks that relate to ARC operational problems
### Stuck runner kind or backing pod
**Problem**
Sometimes either the runner kind (`kubectl get runners`) or its underlying pod can get stuck in a terminating state for various reasons. You can get the kind unstuck by removing its finaliser using something like this:
**Solution**
Remove the finaliser from the relevant runner kind or pod:
```bash
# Get all pods that are stuck terminating and remove the finalizer
kubectl get pods | grep Terminating | awk '{print $1}' | xargs kubectl patch pod -p '{"metadata":{"finalizers":null}}'
```
_Note the code assumes you have already selected the namespace your runners are in and that they
are in a namespace not shared with anything else_
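Similarly, if the Runner resource itself is stuck rather than its pod, removing its finaliser with a patch along these lines may help; the runner name is a placeholder:
```bash
# Remove the finalizer from a stuck Runner resource
kubectl patch runner <stuck-runner-name> --type=merge -p '{"metadata":{"finalizers":null}}'
```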
### Delay in jobs being allocated to runners
**Problem**
ARC isn't involved in jobs actually getting allocated to a runner. ARC is responsible for orchestrating runners and the runner lifecycle. Why some people see large delays in job allocation is not clear; however, it has been [suggested](https://github.com/actions-runner-controller/actions-runner-controller/issues/1387#issuecomment-1122593984) that this is somehow caused by the self-update process.
**Solution**
Disable the self-update process in your runner manifests
```yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
name: example-runnerdeployment-with-sleep
spec:
template:
spec:
...
env:
- name: DISABLE_RUNNER_UPDATE
value: "true"
```
### Runner coming up before network available
**Problem**
If you're running your action runners on a service mesh like Istio, you might
have problems with runner configuration accompanied by logs like:
```
....
runner Starting Runner listener with startup type: service
runner Started listener process
runner An error occurred: Not configured
runner Runner listener exited with error code 2
runner Runner listener exit with retryable error, re-launch runner in 5 seconds.
....
```
This is because the `istio-proxy` has not completed configuring itself when the
configuration script tries to communicate with the network.
More broadly, there are many other circumstances where the runner pod coming up first can cause issues.
**Solution**<br />
> Added originally to help users with older istio instances.
> Newer Istio instances can use Istio's `holdApplicationUntilProxyStarts` attribute ([istio/istio#11130](https://github.com/istio/istio/issues/11130)) to avoid having to delay starting up the runner.
> Please read the discussion in [#592](https://github.com/actions-runner-controller/actions-runner-controller/pull/592) for more information.
You can add a delay to the runner's entrypoint script by setting the `STARTUP_DELAY_IN_SECONDS` environment variable for the runner pod. This will cause the script to sleep for the given number of seconds; this works with any runner kind.
```yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
name: example-runnerdeployment-with-sleep
spec:
template:
spec:
...
env:
- name: STARTUP_DELAY_IN_SECONDS
value: "5"
```
### Outgoing network action hangs indefinitely
**Problem**
Some outgoing network actions hang indefinitely. This could be because your cluster does not give Docker the standard MTU of 1500. You can check this by running `ip link` in a pod that encounters the problem and reading the outgoing interface's MTU value. If it is smaller than 1500, try the following.
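For example, you can check the MTU from inside an affected runner pod like this; the pod name is a placeholder and `-c runner` assumes the default runner container name:
```shell
# Look at the "mtu" value reported for the outgoing interface
kubectl exec -it <runner-pod-name> -c runner -- ip link
```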
**Solution**
Add a `dockerMTU` key in your runner's spec with the value you read on the outgoing interface. For instance:
```yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
name: github-runner
namespace: github-system
spec:
replicas: 6
template:
spec:
dockerMTU: 1400
repository: $username/$repo
env: []
```
There may be more places you need to tweak for MTU.
Please consult issues like #651 for more information.
### Unable to scale to zero with TotalNumberOfQueuedAndInProgressWorkflowRuns
**Problem**
HRA doesn't scale the RunnerDeployment to zero, even though you configured HRA correctly with the pull-based scaling metric `TotalNumberOfQueuedAndInProgressWorkflowRuns` and `minReplicas: 0`.
**Solution**
You very likely have some dangling workflow jobs stuck in `queued` or `in_progress` as seen in [#1057](https://github.com/actions-runner-controller/actions-runner-controller/issues/1057#issuecomment-1133439061).
Manually call [the "list workflow runs" API](https://docs.github.com/en/rest/actions/workflow-runs#list-workflow-runs-for-a-repository), and [remove the dangling workflow job(s)](https://docs.github.com/en/rest/actions/workflow-runs#delete-a-workflow-run).
```yaml
# To prevent `CustomResourceDefinition.apiextensions.k8s.io "runners.actions.summerwind.dev" is invalid: metadata.annotations: Too long: must have at most 262144 bytes`
# Needed to report test success by creating a cm from within a workflow job step
- apiGroups: [""]
  resources: ["configmaps"]
  verbs: ["create", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: runner-status-updater
rules:
- apiGroups: ["actions.summerwind.dev"]
  resources: ["runners/status"]
  verbs: ["get", "update", "patch"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: ${RUNNER_SERVICE_ACCOUNT_NAME}
  namespace: ${NAMESPACE}
---
# To verify it's working, try:
#   kubectl auth can-i --as system:serviceaccount:default:runner get pod
# If incomplete, workflows and jobs would fail with an error message like:
#   Error: Error: The Service account needs the following permissions [{"group":"","verbs":["get","list","create","delete"],"resource":"pods","subresource":""},{"group":"","verbs":["get","create"],"resource":"pods","subresource":"exec"},{"group":"","verbs":["get","list","watch"],"resource":"pods","subresource":"log"},{"group":"batch","verbs":["get","list","create","delete"],"resource":"jobs","subresource":""},{"group":"","verbs":["create","delete","get","list"],"resource":"secrets","subresource":""}] on the pod resource in the 'default' namespace. Please contact your self hosted runner administrator.
#   Error: Process completed with exit code 1.
apiVersion: rbac.authorization.k8s.io/v1
# This role binding allows "jane" to read pods in the "default" namespace.
# You need to already have a Role named "pod-reader" in that namespace.
```
```yaml
# Set actions-runner-controller settings for testing
logLevel: "-4"
imagePullSecrets: []
image:
# This needs to be an empty array rather than a single-item array with empty name.
# Otherwise you end up with the following error on helm-upgrade:
#   Error: UPGRADE FAILED: failed to create patch: map: map[] does not contain declared merge key: name && failed to create patch: map: map[] does not contain declared merge key: name
```
All additional docs are kept in the `docs/` folder; this README is solely for documenting the `values.yaml` keys and values.
## Values
**_The values are documented as of HEAD, to review the configuration options for your chart version ensure you view this file at the relevant [tag](https://github.com/actions-runner-controller/actions-runner-controller/tags)_**
> _Default values are the defaults set in the charts `values.yaml`, some properties have default configurations in the code for when the property is omitted or invalid_
| Key | Description | Default |
|-----|-------------|---------|
| `authSecret.name` | Set the name of the auth secret | controller-manager |
| `authSecret.annotations` | Set annotations for the auth Secret | |
| `authSecret.github_app_id` | The ID of your GitHub App. **This can't be set at the same time as `authSecret.github_token`** | |
| `authSecret.github_app_installation_id` | The ID of your GitHub App installation. **This can't be set at the same time as `authSecret.github_token`** | |
| `authSecret.github_app_private_key` | The multiline string of your GitHub App's private key. **This can't be set at the same time as `authSecret.github_token`** | |
| `authSecret.github_token` | Your chosen GitHub PAT token. **This can't be set at the same time as the `authSecret.github_app_*`** | |
| `authSecret.github_basicauth_username` | Username for GitHub basic auth to use instead of PAT or GitHub APP in case it's running behind a proxy API | |
| `authSecret.github_basicauth_password` | Password for GitHub basic auth to use instead of PAT or GitHub APP in case it's running behind a proxy API | |
| `dockerRegistryMirror` | The default Docker Registry Mirror used by runners. | |
| `hostNetwork` | The "hostNetwork" of the controller container | false |
| `image.repository` | The "repository/image" of the controller container | summerwind/actions-runner-controller |
| `image.tag` | The tag of the controller container | |
| `image.actionsRunnerRepositoryAndTag` | The "repository/image" of the actions runner container | summerwind/actions-runner:latest |
| `image.actionsRunnerImagePullSecrets` | Optional image pull secrets to be included in the runner pod's ImagePullSecrets | |
| `image.dindSidecarRepositoryAndTag` | The "repository/image" of the dind sidecar container | docker:dind |
| `image.pullPolicy` | The pull policy of the controller image | IfNotPresent |
| `metrics.serviceMonitor` | Deploy serviceMonitor kind for use with prometheus-operator CRDs | false |
| `metrics.serviceAnnotations` | Set annotations for the provisioned metrics service resource | |
| `metrics.port` | Set port of metrics service | 8443 |
| `metrics.proxy.enabled` | Deploy kube-rbac-proxy container in controller pod | true |
| `metrics.proxy.image.repository` | The "repository/image" of the kube-proxy container | quay.io/brancz/kube-rbac-proxy |
| `metrics.proxy.image.tag` | The tag of the kube-proxy image to use when pulling the container | v0.10.0 |
| `metrics.serviceMonitorLabels` | Set labels to apply to ServiceMonitor resources | |
| `imagePullSecrets` | Specifies the secret to be used when pulling the controller pod containers | |
| `fullnameOverride` | Override the full resource names | |
| `nameOverride` | Override the resource name prefix | |
| `serviceAccount.annotations` | Set annotations to the service account | |
| `serviceAccount.create` | Deploy the controller pod under a service account | true |
| `podAnnotations` | Set annotations for the controller pod | |
| `podLabels` | Set labels for the controller pod | |
| `serviceAccount.name` | Set the name of the service account | |
| `securityContext` | Set the security context for each container in the controller pod | |
| `podSecurityContext` | Set the security context to controller pod | |
| `service.annotations` | Set annotations for the provisioned webhook service resource | |
| `service.port` | Set controller service ports | |
| `service.type` | Set controller service type | |
| `topologySpreadConstraints` | Set the controller pod topologySpreadConstraints | |
| `nodeSelector` | Set the controller pod nodeSelector | |
| `resources` | Set the controller pod resources | |
| `affinity` | Set the controller pod affinity rules | |
| `podDisruptionBudget.enabled` | Enables a PDB to ensure HA of controller pods | false |
| `podDisruptionBudget.minAvailable` | Minimum number of pods that must be available after eviction | |
| `podDisruptionBudget.maxUnavailable` | Maximum number of pods that can be unavailable after eviction. Kubernetes 1.7+ required. | |
| `tolerations` | Set the controller pod tolerations | |
| `env` | Set environment variables for the controller container | |
| `priorityClassName` | Set the controller pod priorityClassName | |
| `scope.watchNamespace` | Tells the controller and the github webhook server which namespace to watch if `scope.singleNamespace` is true | `Release.Namespace` (the default namespace of the helm chart). |
| `scope.singleNamespace` | Limit the controller to watch a single namespace | false |
| `certManagerEnabled` | Enable cert-manager. If disabled you must set admissionWebHooks.caBundle and create TLS secrets manually | true |
| `runner.statusUpdateHook.enabled` | Use custom RBAC for runners (role, role binding and service account), this will enable reporting runner statuses | false |
| `admissionWebHooks.caBundle` | Base64-encoded PEM bundle containing the CA that signed the webhook's serving certificate | |
| `githubWebhookServer.logLevel` | Set the log level of the githubWebhookServer container | |
| `githubWebhookServer.replicaCount` | Set the number of webhook server pods | 1 |
| `githubWebhookServer.useRunnerGroupsVisibility` | Enable supporting runner groups with custom visibility. This will incur extra API calls and may blow up your budget. Currently, you also need to set `githubWebhookServer.secret.enabled` to enable this feature. | false |
| `githubWebhookServer.enabled` | Deploy the webhook server pod | false |
| `githubWebhookServer.queueLimit` | Set the queue size limit in the githubWebhookServer | |
| `githubWebhookServer.secret.enabled` | Passes the webhook hook secret to the github-webhook-server | false |
| `githubWebhookServer.secret.name` | Set the name of the webhook hook secret | github-webhook-server |
| `githubWebhookServer.secret.github_webhook_secret_token` | Set the webhook secret token value | |
| `githubWebhookServer.imagePullSecrets` | Specifies the secret to be used when pulling the githubWebhookServer pod containers | |
| `githubWebhookServer.nameOverride` | Override the resource name prefix | |
| `githubWebhookServer.fullnameOverride` | Override the full resource names | |
| `githubWebhookServer.serviceAccount.create` | Deploy the githubWebhookServer under a service account | true |
| `githubWebhookServer.serviceAccount.annotations` | Set annotations for the service account | |
| `githubWebhookServer.serviceAccount.name` | Set the service account name | |
| `githubWebhookServer.podAnnotations` | Set annotations for the githubWebhookServer pod | |
| `githubWebhookServer.podLabels` | Set labels for the githubWebhookServer pod | |
| `githubWebhookServer.podSecurityContext` | Set the security context to githubWebhookServer pod | |
| `githubWebhookServer.securityContext` | Set the security context for each container in the githubWebhookServer pod | |
| `githubWebhookServer.resources` | Set the githubWebhookServer pod resources | |
| `githubWebhookServer.topologySpreadConstraints` | Set the githubWebhookServer pod topologySpreadConstraints | |
| `githubWebhookServer.nodeSelector` | Set the githubWebhookServer pod nodeSelector | |
| `githubWebhookServer.tolerations` | Set the githubWebhookServer pod tolerations | |
| `githubWebhookServer.affinity` | Set the githubWebhookServer pod affinity rules | |
| `githubWebhookServer.priorityClassName` | Set the githubWebhookServer pod priorityClassName | |
| `githubWebhookServer.service.type` | Set githubWebhookServer service type | |
| `githubWebhookServer.service.ports` | Set githubWebhookServer service ports | `[{"port":80, "targetPort":"http", "protocol":"TCP", "name":"http"}]` |
| `githubWebhookServer.ingress.enabled` | Deploy an ingress kind for the githubWebhookServer | false |
| `githubWebhookServer.ingress.annotations` | Set annotations for the ingress kind | |
| `githubWebhookServer.ingress.hosts` | Set hosts configuration for ingress | `[{"host": "chart-example.local", "paths": []}]` |
| `githubWebhookServer.ingress.tls` | Set tls configuration for ingress | |
| `githubWebhookServer.ingress.ingressClassName` | Set ingress class name | |
| `githubWebhookServer.podDisruptionBudget.enabled` | Enables a PDB to ensure HA of githubwebhook pods | false |
| `githubWebhookServer.podDisruptionBudget.minAvailable` | Minimum number of pods that must be available after eviction | |
| `githubWebhookServer.podDisruptionBudget.maxUnavailable` | Maximum number of pods that can be unavailable after eviction. Kubernetes 1.7+ required. | |
description:HorizontalRunnerAutoscaler is the Schema for the horizontalrunnerautoscaler API
properties:
apiVersion:
description: 'APIVersion defines the versioned schema of this representation of an object. Servers should convert recognized schemas to the latest internal value, and may reject unrecognized values. More info:https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources'
type:string
kind:
description: 'Kind is a string value representing the REST resource this object represents. Servers may infer this from the endpoint the client submits requests to. Cannot be updated. In CamelCase. More info:https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds'
type:string
metadata:
type:object
spec:
description:HorizontalRunnerAutoscalerSpec defines the desired state of HorizontalRunnerAutoscaler
properties:
capacityReservations:
items:
description:CapacityReservation specifies the number of replicas temporarily added to the scale target until ExpirationTime.
properties:
effectiveTime:
format:date-time
type:string
expirationTime:
format:date-time
type:string
name:
type:string
replicas:
type:integer
type:object
type:array
githubAPICredentialsFrom:
properties:
secretRef:
properties:
name:
type:string
required:
- name
type:object
type:object
maxReplicas:
description:MaxReplicas is the maximum number of replicas the deployment is allowed to scale
type:integer
metrics:
description:Metrics is the collection of various metric targets to calculate desired number of runners
items:
properties:
repositoryNames:
description:RepositoryNames is the list of repository names to be used for calculating the metric. For example, a repository name is the REPO part of `github.com/USER/REPO`.
items:
type:string
type:array
scaleDownAdjustment:
description:ScaleDownAdjustment is the number of runners removed on scale-down. You can only specify either ScaleDownFactor or ScaleDownAdjustment.
type:integer
scaleDownFactor:
description:ScaleDownFactor is the multiplicative factor applied to the current number of runners used to determine how many pods should be removed.
type:string
scaleDownThreshold:
description:ScaleDownThreshold is the percentage of busy runners less than which will trigger the hpa to scale the runners down.
type:string
scaleUpAdjustment:
description:ScaleUpAdjustment is the number of runners added on scale-up. You can only specify either ScaleUpFactor or ScaleUpAdjustment.
type:integer
scaleUpFactor:
description:ScaleUpFactor is the multiplicative factor applied to the current number of runners used to determine how many pods should be added.
type:string
scaleUpThreshold:
description:ScaleUpThreshold is the percentage of busy runners greater than which will trigger the hpa to scale runners up.
type:string
type:
description:Type is the type of metric to be used for autoscaling. It can be TotalNumberOfQueuedAndInProgressWorkflowRuns or PercentageRunnersBusy.
type:string
type:object
type:array
minReplicas:
description:MinReplicas is the minimum number of replicas the deployment is allowed to scale
type:integer
scaleDownDelaySecondsAfterScaleOut:
description:ScaleDownDelaySecondsAfterScaleUp is the approximate delay for a scale down followed by a scale up Used to prevent flapping (down->up->down->... loop)
type:integer
scaleTargetRef:
description:ScaleTargetRef is the reference to scaled resource like RunnerDeployment
properties:
kind:
description:Kind is the type of resource being referenced
enum:
- RunnerDeployment
- RunnerSet
type:string
name:
description:Name is the name of resource being referenced
type:string
type:object
scaleUpTriggers:
description:"ScaleUpTriggers is an experimental feature to increase the desired replicas by 1 on each webhook requested received by the webhookBasedAutoscaler. \n This feature requires you to also enable and deploy the webhookBasedAutoscaler onto your cluster. \n Note that the added runners remain until the next sync period at least, and they may or may not be used by GitHub Actions depending on the timing. They are intended to be used to gain \"resource slack\" immediately after you receive a webhook from GitHub, so that you can loosely expect MinReplicas runners to be always available."
description:Names is a list of GitHub Actions glob patterns. Any check_run event whose name matches one of patterns in the list can trigger autoscaling. Note that check_run name seem to equal to the job name you've defined in your actions workflow yaml file. So it is very likely that you can utilize this to trigger depending on the job.
items:
type:string
type:array
repositories:
description:Repositories is a list of GitHub repositories. Any check_run event whose repository matches one of repositories in the list can trigger autoscaling.
items:
type:string
type:array
status:
type:string
types:
description: 'One of:created, rerequested, or completed'
description:PushSpec is the condition for triggering scale-up on push event Also see https://docs.github.com/en/actions/reference/events-that-trigger-workflows#push
description:ScheduledOverrides is the list of ScheduledOverride. It can be used to override a few fields of HorizontalRunnerAutoscalerSpec on schedule. The earlier a scheduled override is, the higher it is prioritized.
items:
description:ScheduledOverride can be used to override a few fields of HorizontalRunnerAutoscalerSpec on schedule. A schedule can optionally be recurring, so that the corresponding override happens every day, week, month, or year.
properties:
endTime:
description:EndTime is the time at which the first override ends.
format:date-time
type:string
minReplicas:
description:MinReplicas is the number of runners while overriding. If omitted, it doesn't override minReplicas.
minimum:0
nullable:true
type:integer
recurrenceRule:
properties:
frequency:
description:Frequency is the name of a predefined interval of each recurrence. The valid values are "Daily", "Weekly", "Monthly", and "Yearly". If empty, the corresponding override happens only once.
enum:
- Daily
- Weekly
- Monthly
- Yearly
type:string
untilTime:
description:UntilTime is the time of the final recurrence. If empty, the schedule recurs forever.
format:date-time
type:string
type:object
startTime:
description:StartTime is the time at which the first override starts.
format:date-time
type:string
required:
- endTime
- startTime
type:object
type:array
type:object
status:
properties:
cacheEntries:
items:
properties:
expirationTime:
format:date-time
type:string
key:
type:string
value:
type:integer
type:object
type:array
desiredReplicas:
description:DesiredReplicas is the total number of desired, non-terminated and latest pods to be set for the primary RunnerSet This doesn't include outdated pods while upgrading the deployment and replacing the runnerset.
type:integer
lastSuccessfulScaleOutTime:
format:date-time
nullable:true
type:string
observedGeneration:
description:ObservedGeneration is the most recent generation observed for the target. It corresponds to e.g. RunnerDeployment's generation, which is updated on mutation by the API Server.
format:int64
type:integer
scheduledOverridesSummary:
description:ScheduledOverridesSummary is the summary of active and upcoming scheduled overrides to be shown in e.g. a column of a `kubectl get hra` output for observability.
This project makes extensive use of CRDs to provide much of its functionality. Helm unfortunately does not support [managing](https://helm.sh/docs/chart_best_practices/custom_resource_definitions/) CRDs by design:
_The full breakdown as to how they came to this decision and why they have taken the approach they have for dealing with CRDs can be found in [Helm Improvement Proposal 11](https://github.com/helm/community/blob/main/hips/hip-0011.md)_
```
There is no support at this time for upgrading or deleting CRDs using Helm. This was an explicit decision after much
community discussion due to the danger for unintentional data loss. Furthermore, there is currently no community
consensus around how to handle CRDs and their lifecycle. As this evolves, Helm will add support for those use cases.
```
Helm will do an initial install of CRDs but it will not touch them afterwards (update or delete).
Additionally, because the project leverages CRDs so extensively, you **MUST** run the matching controller app container with its matching CRDs, i.e. always redeploy your CRDs if you are changing the app version.
Due to the above, you can't just do a `helm upgrade` to release the latest version of the chart; the best-practice steps are recorded below:
## Steps
1. Upgrade the CRDs. This isn't optional: the CRDs you are using must be those that correspond to the version of the controller you are installing.
```shell
# REMEMBER TO UPDATE THE CHART_VERSION TO THE RELEVANT CHART VERSION!!!!
CHART_VERSION=0.18.0
curl -L https://github.com/actions-runner-controller/actions-runner-controller/releases/download/actions-runner-controller-${CHART_VERSION}/actions-runner-controller-${CHART_VERSION}.tgz | tar zxv --strip 1 actions-runner-controller/crds
```
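The extracted CRDs can then be applied before upgrading the chart itself; a minimal sketch, assuming the `crds/` directory produced by the command above:
```shell
kubectl apply -f crds/
# If apply fails due to annotation size limits, server-side apply may help:
# kubectl apply --server-side -f crds/
```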
flag.StringVar(&webhookAddr,"webhook-addr",":8000","The address the metric endpoint binds to.")
flag.StringVar(&metricsAddr,"metrics-addr",":8080","The address the metric endpoint binds to.")
flag.StringVar(&watchNamespace,"watch-namespace","","The namespace to watch for HorizontalRunnerAutoscaler's to scale on Webhook. Set to empty for letting it watch for all namespaces.")
flag.StringVar(&logLevel,"log-level",logging.LogLevelDebug,`The verbosity of the logging. Valid values are "debug", "info", "warn", "error". Defaults to "debug".`)
flag.IntVar(&queueLimit,"queue-limit",controllers.DefaultQueueLimit,`The maximum length of the scale operation queue. The scale opration is enqueued per every matching webhook event, and the server returns a 500 HTTP status when the queue was already full on enqueue attempt.`)
flag.StringVar(&webhookSecretToken,"github-webhook-secret-token","","The personal access token of GitHub.")
flag.StringVar(&c.Token,"github-token",c.Token,"The personal access token of GitHub.")
flag.Int64Var(&c.AppID,"github-app-id",c.AppID,"The application ID of GitHub App.")
flag.Int64Var(&c.AppInstallationID,"github-app-installation-id",c.AppInstallationID,"The installation ID of GitHub App.")
flag.StringVar(&c.AppPrivateKey,"github-app-private-key",c.AppPrivateKey,"The path of a private key file to authenticate as a GitHub App")
flag.StringVar(&c.URL,"github-url",c.URL,"GitHub URL to be used for GitHub API calls")
flag.StringVar(&c.UploadURL,"github-upload-url",c.UploadURL,"GitHub Upload URL to be used for GitHub API calls")
flag.StringVar(&c.BasicauthUsername,"github-basicauth-username",c.BasicauthUsername,"Username for GitHub basic auth to use instead of PAT or GitHub APP in case it's running behind a proxy API")
flag.StringVar(&c.BasicauthPassword,"github-basicauth-password",c.BasicauthPassword,"Password for GitHub basic auth to use instead of PAT or GitHub APP in case it's running behind a proxy API")
flag.StringVar(&c.RunnerGitHubURL,"runner-github-url",c.RunnerGitHubURL,"GitHub URL to be used by runners during registration")
setupLog.Info(fmt.Sprintf("Using the value from %s for -github-webhook-secret-token",webhookSecretTokenEnvName))
webhookSecretToken=webhookSecretTokenEnv
}
ifwebhookSecretToken==""{
setupLog.Info(fmt.Sprintf("-github-webhook-secret-token and %s are missing or empty. Create one following https://docs.github.com/en/developers/webhooks-and-events/securing-your-webhooks and specify it via the flag or the envvar",webhookSecretTokenEnvName))
}
ifwatchNamespace==""{
setupLog.Info("-watch-namespace is empty. HorizontalRunnerAutoscalers in all the namespaces are watched, cached, and considered as scale targets.")
}else{
setupLog.Info("-watch-namespace is %q. Only HorizontalRunnerAutoscalers in %q are watched, cached, and considered as scale targets.")
}
logger:=logging.NewLogger(logLevel)
ctrl.SetLogger(logger)
// In order to support runner groups with custom visibility (selected repositories), we need to perform some GitHub API calls.
// Let the user define if they want to opt-in supporting this option by providing the proper GitHub authentication parameters
// Without an opt-in, runner groups with custom visibility won't be supported to save API calls
// That is, all runner groups managed by ARC are assumed to be visible to any repositories,
// which is wrong when you have one or more non-default runner groups in your organization or enterprise.
setupLog.Error(err,"unable to create controller","controller","Runner")
os.Exit(1)
}
}else{
setupLog.Info("GitHub client is not initialized. Runner groups with custom visibility are not supported. If needed, please provide GitHub authentication. This will incur in extra GitHub API calls")
description:HorizontalRunnerAutoscaler is the Schema for the horizontalrunnerautoscaler API
properties:
apiVersion:
description: 'APIVersion defines the versioned schema of this representation of an object. Servers should convert recognized schemas to the latest internal value, and may reject unrecognized values. More info:https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources'
type:string
kind:
description: 'Kind is a string value representing the REST resource this object represents. Servers may infer this from the endpoint the client submits requests to. Cannot be updated. In CamelCase. More info:https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds'
type:string
metadata:
type:object
spec:
description:HorizontalRunnerAutoscalerSpec defines the desired state of HorizontalRunnerAutoscaler
properties:
capacityReservations:
items:
description:CapacityReservation specifies the number of replicas temporarily added to the scale target until ExpirationTime.
properties:
effectiveTime:
format:date-time
type:string
expirationTime:
format:date-time
type:string
name:
type:string
replicas:
type:integer
type:object
type:array
githubAPICredentialsFrom:
properties:
secretRef:
properties:
name:
type:string
required:
- name
type:object
type:object
maxReplicas:
description:MaxReplicas is the maximum number of replicas the deployment is allowed to scale
type:integer
metrics:
description:Metrics is the collection of various metric targets to calculate desired number of runners
items:
properties:
repositoryNames:
description:RepositoryNames is the list of repository names to be used for calculating the metric. For example, a repository name is the REPO part of `github.com/USER/REPO`.
items:
type:string
type:array
scaleDownAdjustment:
description:ScaleDownAdjustment is the number of runners removed on scale-down. You can only specify either ScaleDownFactor or ScaleDownAdjustment.
type:integer
scaleDownFactor:
description:ScaleDownFactor is the multiplicative factor applied to the current number of runners used to determine how many pods should be removed.
type:string
scaleDownThreshold:
description:ScaleDownThreshold is the percentage of busy runners less than which will trigger the hpa to scale the runners down.
type:string
scaleUpAdjustment:
description:ScaleUpAdjustment is the number of runners added on scale-up. You can only specify either ScaleUpFactor or ScaleUpAdjustment.
type:integer
scaleUpFactor:
description:ScaleUpFactor is the multiplicative factor applied to the current number of runners used to determine how many pods should be added.
type:string
scaleUpThreshold:
description:ScaleUpThreshold is the percentage of busy runners greater than which will trigger the hpa to scale runners up.
type:string
type:
description:Type is the type of metric to be used for autoscaling. It can be TotalNumberOfQueuedAndInProgressWorkflowRuns or PercentageRunnersBusy.
type:string
type:object
type:array
minReplicas:
description:MinReplicas is the minimum number of replicas the deployment is allowed to scale
type:integer
scaleDownDelaySecondsAfterScaleOut:
description:ScaleDownDelaySecondsAfterScaleUp is the approximate delay for a scale down followed by a scale up Used to prevent flapping (down->up->down->... loop)
type:integer
scaleTargetRef:
description:ScaleTargetRef is the reference to scaled resource like RunnerDeployment
properties:
kind:
description:Kind is the type of resource being referenced
enum:
- RunnerDeployment
- RunnerSet
type:string
name:
description:Name is the name of resource being referenced
type:string
type:object
scaleUpTriggers:
description:"ScaleUpTriggers is an experimental feature to increase the desired replicas by 1 on each webhook requested received by the webhookBasedAutoscaler. \n This feature requires you to also enable and deploy the webhookBasedAutoscaler onto your cluster. \n Note that the added runners remain until the next sync period at least, and they may or may not be used by GitHub Actions depending on the timing. They are intended to be used to gain \"resource slack\" immediately after you receive a webhook from GitHub, so that you can loosely expect MinReplicas runners to be always available."
description:Names is a list of GitHub Actions glob patterns. Any check_run event whose name matches one of patterns in the list can trigger autoscaling. Note that check_run name seem to equal to the job name you've defined in your actions workflow yaml file. So it is very likely that you can utilize this to trigger depending on the job.
items:
type:string
type:array
repositories:
description:Repositories is a list of GitHub repositories. Any check_run event whose repository matches one of repositories in the list can trigger autoscaling.
items:
type:string
type:array
status:
type:string
types:
description: 'One of:created, rerequested, or completed'
description:PushSpec is the condition for triggering scale-up on push event Also see https://docs.github.com/en/actions/reference/events-that-trigger-workflows#push
description:ScheduledOverrides is the list of ScheduledOverride. It can be used to override a few fields of HorizontalRunnerAutoscalerSpec on schedule. The earlier a scheduled override is, the higher it is prioritized.
items:
description:ScheduledOverride can be used to override a few fields of HorizontalRunnerAutoscalerSpec on schedule. A schedule can optionally be recurring, so that the corresponding override happens every day, week, month, or year.
properties:
endTime:
description:EndTime is the time at which the first override ends.
format:date-time
type:string
minReplicas:
description:MinReplicas is the number of runners while overriding. If omitted, it doesn't override minReplicas.
minimum:0
nullable:true
type:integer
recurrenceRule:
properties:
frequency:
description:Frequency is the name of a predefined interval of each recurrence. The valid values are "Daily", "Weekly", "Monthly", and "Yearly". If empty, the corresponding override happens only once.
enum:
- Daily
- Weekly
- Monthly
- Yearly
type:string
untilTime:
description:UntilTime is the time of the final recurrence. If empty, the schedule recurs forever.
format:date-time
type:string
type:object
startTime:
description:StartTime is the time at which the first override starts.
format:date-time
type:string
required:
- endTime
- startTime
type:object
type:array
type:object
status:
properties:
cacheEntries:
items:
properties:
expirationTime:
format:date-time
type:string
key:
type:string
value:
type:integer
type:object
type:array
desiredReplicas:
description:DesiredReplicas is the total number of desired, non-terminated and latest pods to be set for the primary RunnerSet This doesn't include outdated pods while upgrading the deployment and replacing the runnerset.
type:integer
lastSuccessfulScaleOutTime:
format:date-time
nullable:true
type:string
observedGeneration:
description:ObservedGeneration is the most recent generation observed for the target. It corresponds to e.g. RunnerDeployment's generation, which is updated on mutation by the API Server.
format:int64
type:integer
scheduledOverridesSummary:
description:ScheduledOverridesSummary is the summary of active and upcoming scheduled overrides to be shown in e.g. a column of a `kubectl get hra` output for observability.