Commit Graph

61 Commits

Author SHA1 Message Date
Nikola Jokic
07bff8aa1e Extend the user agent and fix the build version for the listener app (#2892) 2023-09-14 20:10:49 +02:00
Nikola Jokic
a0a3916c80 Provide scale-set listener metrics (#2559)
Co-authored-by: Tingluo Huang <tingluohuang@github.com>
Co-authored-by: Bassem Dghaidi <568794+Link-@users.noreply.github.com>
2023-08-21 13:50:07 +02:00
Nikola Jokic
6fe8008640 Add configurable log format to values.yaml and propagate it to listener (#2686) 2023-07-05 21:06:42 +02:00
Bassem Dghaidi
8afef51c8b Add DrainJobsMode (aka UpdateStrategy feature) (#2569) 2023-05-23 07:42:30 -04:00
Yusuke Kuoka
3e0bc3f7be Fix docker.sock permission error for non-dind Ubuntu 20.04 runners since v0.27.2 (#2499)
#2490 has been happening since v0.27.2 for non-dind runners based on Ubuntu 20.04 runner images. It does not affect Ubuntu 22.04 non-dind runners(i.e. runners with dockerd sidecars) and Ubuntu 20.04/22.04 dind runners(i.e. runners without dockerd sidecars). However, presuming many folks are still using Ubuntu 20.04 runners and non-dind runners, we should fix it.

This change tries to fix it by defaulting to the docker group id 1001 used by Ubuntu 20.04 runners, and use gid 121 for Ubuntu 22.04 runners. We use the image tag to see which Ubuntu version the runner is based on. The algorithm is so simple- we assume it's Ubuntu-22.04-based if the image tag contains "22.04".

This might be a breaking change for folks who have already upgraded to Ubuntu 22.04 runners using their own custom runner images. Note again; we rely on the image tag to detect Ubuntu 22.04 runner images and use the proper docker gid- Folks using our official Ubuntu 22.04 runner images are not affected. It is a breaking change anyway, so I have added a remedy-

ARC got a new flag, `--docker-gid`, which defaults to `1001` but can be set to `121` or whatever gid the operator/admin likes. This can be set to `--docker-gid=121`, for example, if you are using your own custom runner image based on Ubuntu 22.04 and the image tag does not contain "22.04".

Fixes #2490
2023-04-17 21:30:41 +09:00
Nikola Jokic
a804bf8b00 Add ImagePullPolicy to the AutoscalingListener, configurable through Manager env (#2477) 2023-04-04 19:07:20 +02:00
Tingluo Huang
40811ebe0e Support the controller to watching a single namespace. (#2374) 2023-03-14 10:52:25 -04:00
Tingluo Huang
2bf83d0d7f Remove list/watch secrets permission from the manager cluster role. (#2276) 2023-03-14 09:23:14 -04:00
Nikola Jokic
a5f98dea75 Refactor main.go and introduce make run-scaleset to be able to run manager locally (#2337) 2023-03-10 18:05:51 +01:00
Yusuke Kuoka
a44fe04bef Fix manager crashloopback for ARC deployments without scaleset-related controllers (#2293) 2023-02-21 08:18:59 +09:00
Tingluo Huang
622eaa34f8 Introduce new preview auto-scaling mode for ARC. (#2153)
Co-authored-by: Cory Miller <cory-miller@github.com>
Co-authored-by: Nikola Jokic <nikola-jokic@github.com>
Co-authored-by: Ava Stancu <AvaStancu@github.com>
Co-authored-by: Ferenc Hammerl <fhammerl@github.com>
Co-authored-by: Francesco Renzi <rentziass@github.com>
Co-authored-by: Bassem Dghaidi <Link-@github.com>
2023-01-17 12:06:20 -05:00
Nikola Jokic
aa6dab5a9a Changes to folder structure to allow multigroups and changed go mod name (#2105)
* Changed folder structure to allow multi group registration

* included actions.github.com directory for resources and controllers

* updated go module to actions/actions-runner-controller

* publish arc packages under actions-runner-controller

* Update charts/actions-runner-controller/docs/UPGRADING.md

Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
2022-12-28 09:38:34 +09:00
malachiobadeyi
fbdfe0df8c 1770 update log format and add additional fields to webhook server logs (#1771)
* 1770 update log format and add runID and Id to worflow logs

update tests, change log format for controllers.HorizontalRunnerAutoscalerGitHubWebhook

use logging package

remove unused modules

add setup name to setuplog

add flag to change log format

change flag name to enableProdLogConfig

move log opts to logger package

remove empty else and reset timeEncoder

update flag description

use get function to handle nil

rename flag and update logger function

Update main.go

Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>

Update controllers/horizontal_runner_autoscaler_webhook.go

Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>

Update logging/logger.go

Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>

copy log opt per each NewLogger call

revert to use autoscaler.log

update flag descript and remove unused imports

add logFormat to readme

 rename setupLog to logger

make fmt

* Fix E2E along the way

Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
2022-11-04 10:46:58 +09:00
Barun Mishra
921daff61b Add cmd line arg for enterprise url. Fix enterprise bug. (#1)
* Add cmd line arg for enterprise url. Fix enterprise bug.

* Fix package import order

* Fix comment
2022-09-05 13:50:17 +01:00
Viktor Lindgren
ca97f39fcb Print Version Number on startup (#1659)
* Changed Dockerfile to get the Enviroment variable from the github actions workflow and pass it to the main.go file

Added a function in main.go to fetch the enviroment varible and to have a fallback if the env variable isnt there

Added a test for the version to use for this branch only

* Update test-version.yaml

* Update test-version.yaml

* Removed the test because its not needed when we push upstream

* Moved the version print in main.go to the Log codeblock as requested by toast-gear

Added version as issue#1161 requests.

Decided to use a docker tag structure for the userAgent string, with : being a seperator of the name and version

* Used ldflags instead like mumoshu recommended

Changed Dockerfile to use $VERSION from the workflow

Added version.go and the build package
Removed the getVersion function as we can just get the value directly

* Used ldflags instead like mumoshu recommended

Changed Dockerfile to use $VERSION from the workflow

Added version.go and the build package
Removed the getVersion function as we can just get the value directly

* * Removed the default from the go code (set it as N/A)
* Changed version from latest to dev inside makefile
* Added buildarg for version to the dockerfile in the makerfile
* Added VERSION with default dev value as arg inside dockerfile
* Cleaned up inside dockerfile

* Fix failing test

* Fix possible missing VERSION in the ARC UA suffix due to missing build arg in docker-build-push step

Co-authored-by: S8338C <viktor.lindgren@seb.se>
Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
2022-08-23 13:40:16 +09:00
Yusuke Kuoka
8071ac7066 Remove github-api-cache-duration flag and code (#1631)
This removes the flag and code for the legacy GitHub API cache. We already migrated to fully use the new HTTP cache based API cache functionality which had been added via #1127 and available since ARC 0.22.0. Since then, the legacy one had been no-op and therefore removing it is safe.

Ref #1412
2022-07-12 20:37:24 +09:00
Yusuke Kuoka
618276e3d3 Enhance support for multi-tenancy (#1371)
This enhances every ARC controller and the various K8s custom resources so that the user can now configure a custom GitHub API credentials (that is different from the default one configured per the ARC instance).

Ref https://github.com/actions-runner-controller/actions-runner-controller/issues/1067#issuecomment-1043716646
2022-07-12 09:45:00 +09:00
Felipe Galindo Sanchez
11cb9b7882 feat: allow to discover runner statuses (#1268)
* feat: allow to discover runner statuses

* fix manifests

* Bump runner version to 2.289.1 which includes the hooks support

* Add feedback from review

* Update reference to newRunnerPod

* Fix TestNewRunnerPodFromRunnerController and make hooks file names job specific

* Fix additional TestNewRunnerPod test

* Cover additional feedback from review

* fix rbac manager role

* Add permissions to service account for container mode if not provided

* Rename flag to runner.statusUpdateHook.enabled and fix needsServiceAccount

Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
2022-07-10 15:11:29 +09:00
Nicholas Farley
95ddc77245 Allow customizing the controller webhook port (#1410)
Closes #1314

Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
2022-05-16 10:33:13 +09:00
Yusuke Kuoka
b5194fd75a Enhance RunnerSet to optionally retain PVs accross restarts (#1340)
* Enhance RunnerSet to optionally retain PVs accross restarts

This is our initial attempt to bring back the ability to retain PVs across runner pod restarts when using RunnerSet.
The implementation is composed of two new controllers, `runnerpersistentvolumeclaim-controller` and `runnerpersistentvolume-controller`.
It all starts from our existing `runnerset-controller`. The controller now tries to mark any PVCs created by StatefulSets created for the RunnerSet.
Once the controller terminated statefulsets, their corresponding PVCs are clean up by `runnerpersistentvolumeclaim-controller`, then PVs are unbound from their corresponding PVCs by `runnerpersistentvolume-controller` so that they can be reused by future PVCs createf for future StatefulSets that shares the same same StorageClass.

Ref #1286

* Update E2E test suite to cover runner, docker, and go caching with RunnerSet + PVs

Ref #1286
2022-05-16 09:26:48 +09:00
Callum Tait
51ba7d7160 chore: more initialisation info to help debug (#1276)
* chore: more initialisation info to help debug

* chore: clearer flag description

* chore: use actual english

Co-authored-by: toast-gear <toast-gear@users.noreply.github.com>
2022-05-14 17:11:20 +09:00
Callum Tait
c3e280eadb refactor: set sync period default to 1m (#1308)
Fixes: #1294

Co-authored-by: toast-gear <toast-gear@users.noreply.github.com>
Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
2022-04-24 10:47:00 +09:00
Callum Tait
24aae58dbc feat: default scale down flag (#963)
Resolves #899

Co-authored-by: Callum <callum@domain.com>
Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
2022-04-20 11:09:09 +09:00
Yusuke Kuoka
55ff4de79a Remove legacy GitHub API cache of HRA.Status.CachedEntries (#1192)
* Remove legacy GitHub API cache of HRA.Status.CachedEntries

We migrated to the transport-level cache introduced in #1127 so not only this is useless, it is harder to deduce which cache resulted in the desired replicas number calculated by HRA.
Just remove the legacy cache to keep it simple and easy to understand.

* Deprecate githubAPICacheDuration helm chart value and the --github-api-cache-duration as well

* Fix integration test
2022-03-08 19:05:43 +09:00
Yusuke Kuoka
4b557dc54c Add logging transport to log HTTP requests in log level -3
The log level -3 is the minimum log level that is supported today, smaller than debug(-1) and -2(used to log some HRA related logs).

This commit adds a logging HTTP transport to log HTTP requests and responses to that log level.

It implements http.RoundTripper so that it can log each HTTP request with useful metadata like `from_cache` and `ratelimit_remaining`.
The former is set to `true` only when the logged request's response was served from ARC's in-memory cache.
The latter is set to X-RateLimit-Remaining response header value if and only if the response was served by GitHub, not by ARC's cache.
2022-02-19 12:22:53 +00:00
Felipe Galindo Sanchez
de1f48111a feat: support routing GitHub API calls to custom proxy API (#1017)
GitHub currently has some limitations w.r.t permissions management on
runner groups as they all require org admin, however at our company
we're using runner groups to serve different internal teams (with
different permissions), thus we needed to deploy a custom proxy API with
our internal authentication to provide who has access to certain APIs
depending on the repository/runner group on a given org/enterprise

This change just allows to optionally send the GitHub API calls to an alternate custom
proxy URL instead of cloud github (github.com) or an enterprise URL with
basic authentication

Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
2021-12-23 09:24:10 +09:00
Felipe Galindo Sanchez
9bb21aef1f Add support for default image pull secret name (#921)
Resolves #896

Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
2021-12-15 09:29:31 +09:00
Yusuke Kuoka
b6c33cee32 Fail with detailed message on envvar parse error (#907)
Ref #829
2021-11-03 10:24:24 +09:00
Rolf Ahrenberg
5da808af96 Allow defining unique election leader id 2021-09-14 16:37:04 +09:00
Rolf Ahrenberg
14564c7b8e Allow disabling /runner emptydir mounts and setting storage volume (#674)
* Allow disabling /runner emptydir mounts

* Support defining storage medium for emptydirs

* Fix typos
2021-07-15 06:29:58 +09:00
Sebastien Le Digabel
7f2795b5d6 Adding a default docker registry mirror (#689)
* Adding a default docker registry mirror

This change allows the controller to start with a specified default
docker registry mirror and avoid having to specify it in all the runner*
objects.

The change is backward compatible, if a runner has a docker registry
mirror specified, it will supersede the default one.
2021-07-15 06:20:08 +09:00
Yusuke Kuoka
acb906164b RunnerSet: Automatic-recovery from registration timeout and deregistration on pod termination (#652)
Ref #629
Ref #613
Ref #612
2021-06-24 20:39:37 +09:00
Yusuke Kuoka
8b90b0f0e3 Clean up import list (#645)
Resolves #644
2021-06-22 17:55:06 +09:00
Yusuke Kuoka
9e4dbf497c feat: RunnerSet backed by StatefulSet (#629)
* feat: RunnerSet backed by StatefulSet

Unlike a runner deployment, a runner set can manage a set of stateful runners by combining a statefulset and an admission webhook that mutates statefulset-managed pods with required envvars and registration tokens.

Resolves #613
Ref #612

* Upgrade controller-runtime to 0.9.0

* Bump Go to 1.16.x following controller-runtime 0.9.0

* Upgrade kubebuilder to 2.3.2 for updated etcd and apiserver following local setup

* Fix startup failure due to missing LeaderElectionID

* Fix the issue that any pods become unable to start once actions-runner-controller got failed after the mutating webhook has been registered

* Allow force-updating statefulset

* Fix runner container missing work and certs-client volume mounts and DOCKER_HOST and DOCKER_TLS_VERIFY envvars when dockerdWithinRunner=false

* Fix runnerset-controller not applying statefulset.spec.template.spec changes when there were no changes in runnerset spec

* Enable running acceptance tests against arbitrary kind cluster

* RunnerSet supports non-ephemeral runners only today

* fix: docker-build from root Makefile on intel mac

* fix: arch check fixes for mac and ARM

* ci: aligning test data format and patching checks

* fix: removing namespace in test data

* chore: adding more ignores

* chore: removing leading space in shebang

* Re-add metrics to org hra testdata

* Bump cert-manager to v1.1.1 and fix deploy.sh

Co-authored-by: toast-gear <15716903+toast-gear@users.noreply.github.com>
Co-authored-by: Callum James Tait <callum.tait@photobox.com>
2021-06-22 17:10:09 +09:00
Yusuke Kuoka
ae09e6ebb7 Make log level configurable (#541)
Resolves #425
2021-05-11 20:23:06 +09:00
Yusuke Kuoka
9d961c58ff Log used settings on startup 2021-05-11 11:46:35 +09:00
Yusuke Kuoka
6cbba80df1 Add --github-api-cache-duration
Resolves #502
2021-05-11 11:46:35 +09:00
Yusuke Kuoka
bc6e499e4f Make logging more concise (#410)
This makes logging more concise by changing logger names to something like `controllers.Runner` to `actions-runner-controller.runner` after the standard `controller-rutime.controller` and reducing redundant logs by removing unnecessary requeues. I have also tweaked log messages so that their style is more consistent, which will also help readability. Also, runnerreplicaset-controller lacked useful logs so I have enhanced it.
2021-03-20 07:34:25 +09:00
Yusuke Kuoka
1b8a656051 Use --watch-namespace flag to restrict the namespace to watch
Ref https://github.com/summerwind/actions-runner-controller/issues/377#issuecomment-793172995
2021-03-09 09:46:21 +09:00
Yusuke Kuoka
67f6de010b feat: Common runner labels configurable per controller (#327)
* feat: Common runner labels configurable per controller

Ref #321
2021-02-18 20:19:08 +09:00
Yusuke Kuoka
ab1c39de57 feat: HorizontalRunnerAutoscaler Webhook server (#282)
* feat: HorizontalRunnerAutoscaler Webhook server

This introduces a Webhook server that responds GitHub `check_run`, `pull_request`, and `push` events by scaling up matched HorizontalRunnerAutoscaler by 1 replica. This allows you to immediately add "resource slack" for future GitHub Actions job runs, without waiting next sync period to add insufficient runners.

This feature is highly inspired by https://github.com/philips-labs/terraform-aws-github-runner. terraform-aws-github-runner can manage one set of runners per deployment, where actions-runner-controller with this feature can manage as many sets of runners as you declare with HorizontalRunnerAutoscaler and RunnerDeployment pairs.

On each GitHub event received, the webhook server queries repository-wide and organizational runners from the cluster and searches for the single target to scale up. The webhook server tries to match HorizontalRunnerAutoscaler.Spec.ScaleUpTriggers[].GitHubEvent.[CheckRun|Push|PullRequest] against the event and if it finds only one HRA, it is the scale target. If none or two or more targets are found for repository-wide runners, it does the same on organizational runners.

Changes:

* Fix integration test
* Update manifests
* chart: Add support for github webhook server
* dockerfile: Include github-webhook-server binary
* Do not import unversioned go-github
* Update README
2021-02-07 17:37:27 +09:00
Yusuke Kuoka
3f335ca628 Fix panic on startup when misconfigured (#154)
Fixes #153
2020-11-10 17:03:33 +09:00
Juho Saarinen
40c5050978 Added support for other than public GitHub URL (#146)
Refactoring a bit
2020-10-28 22:15:53 +09:00
Helder Moreira
7a2fa7fbce runner-controller: do not delete runner if it is busy (#103)
Currently, after refreshing the token, the controller re-creates the runner with the new token. This results in jobs being interrupted. This PR makes sure the pod is not restarted if it is busy.

Closes #74
2020-10-05 09:06:37 +09:00
Yusuke Kuoka
ae30648985 feat: Use HorizontalRunnerAutoscaler for autoscaling 2020-07-27 20:33:44 +09:00
KUOKA Yusuke
5bb2694349 feat: Repository-wide RunnerDeployment Autoscaling (#57)
* feat: Repository-wide RunnerDeployment Autoscaling

This adds `maxReplicas` and `minReplicas` to the RunnerDeploymentSpec. If and only if both fields are set, the controller computes and sets desired `replicas` automatically depending on the demand.

The number of demanded runner replicas is computed by `queued workflow runs + in_progress workflow runs` for the repository. The support for organizational runners is not included.

Ref https://github.com/summerwind/actions-runner-controller/issues/10
2020-06-27 17:26:46 +09:00
Moto Ishizawa
e889eaeb04 Add validation webhooks 2020-04-30 22:11:59 +09:00
Moto Ishizawa
13616ba1b2 Use latest container image 2020-04-16 19:16:15 +09:00
Rui Chen
79f15b4906 Make defaultDockerImage to 19.03.8 2020-04-15 19:47:07 -04:00
Moto Ishizawa
3ccc51433f Use github package to access the GitHub API 2020-04-13 22:28:07 +09:00