Commit Graph

27 Commits

Author SHA1 Message Date
Yusuke Kuoka
f858e2e432 Add POC of GitHub Webhook Delivery Forwarder (#682)
* Add POC of GitHub Webhook Delivery Forwarder

* Multi-forwarder, ctrl-c exiting, and fix for non-working http post

* Rename source files

* Extract signal handling into a dedicated source file

* Faster ctrl-c handling

* Enable automatic creation of repo hook on startup

* Add support for forwarding org hook deliveries

* Set hook secret on hook creation via envvar (HOOK_SECRET)

* Fix org hook support

* Fix HOOK_SECRET for consistency

* Refactor to prepare for custom log position provider

* Refactor to extract inmemory log position provider

* Add configmap-based log position provider

* Rename githubwebhookdeliveryforwarder to hookdeliveryforwarder

* Refactor to rename LogPositionProvider to Checkpointer and extract ConfigMap checkpointer into a dedicated pkg

* Refactor to extract logger initialization

* Add hookdeliveryforwarder README and bump go-github to unreleased ver
2021-07-14 10:18:55 +09:00
Yusuke Kuoka
f19e7ea8a8 chore: Upgrade go-github to v36 (#681) 2021-07-04 17:43:52 +09:00
Yusuke Kuoka
8b90b0f0e3 Clean up import list (#645)
Resolves #644
2021-06-22 17:55:06 +09:00
Yusuke Kuoka
3f23501b8e Reduce "No runner matching the specified labels was found" errors during runner replacement (#392)
We occasionally encountered those errors while the underlying RunnerReplicaSet was being recreated/replaced on a RunnerDeployment.Spec.Template update. This turned out to be because the RunnerDeployment controller was waiting for the runner pod to become `Running`, instead of waiting for the new replacement runner to register with GitHub. This fixes that by setting Runner.Status.Phase to `Running` only after the runner in the runner pod appears to be registered.

A side effect of this change is that the runner controller makes more calls to the "ListRunners" GitHub Actions API. I've reviewed and improved the runner controller code and the Runner CRD to minimize the number of calls. In most cases, ListRunners should be called only twice for each runner creation.
2021-03-16 10:52:30 +09:00
Yusuke Kuoka
81016154c0 GITHUB_APP_PRIVATE_KEY can now be the content of the key (#383)
Resolves #382
2021-03-10 09:37:15 +09:00
Yusuke Kuoka
9ae3551744 Remove unnecessary GitHub API calls (#363)
The controller made 2 redundant calls to the List Workflow Runs API.

Ref #362
2021-03-02 10:55:30 +09:00
Johannes Nicolai
4d4137aa28 Avoid zombie runners that missed token expiration by a bit (#345)
* If a new runner pod was scheduled to start up right before a registration token expired, it would not get a new registration token and would go into an infinite update loop (until #341 kicks in).
* If registration tokens are updated a little before they actually expire, pods that are just starting up are much more likely to get a working token.
2021-02-25 09:07:49 +09:00
Johannes Nicolai
2d7fbbfb68 Handle offline runners gracefully (#341)
* If a runner pod starts up with an invalid token, it goes into an infinite retry loop while appearing as RUNNING from the outside.
* Normally, this error is detected because no corresponding runner object exists in GitHub, and the pod is removed after the registration timeout.
* If the GitHub runner object already existed before (e.g. because a finalizer was not properly run as part of a partial Kubernetes crash), the runner always stays in a running mode. Even updating the registration token does not kill the problematic pod.
* Introduce a RunnerOffline exception that can be handled in the runner controller and the replicaset controller.
* As runners are offline when a pod is completed and marked for restart, only do additional restart checks if no restart was already decided, which makes the code a bit cleaner and saves GitHub API calls after each job completion.
2021-02-22 10:08:04 +09:00
Yusuke Kuoka
eb2eaf8130 Fix TotalNumberOfQueuedAndInProgressWorkflowRuns to work with a lot of remaining completed jobs (#316)
I have heard from a user that they have hundreds of thousands of `status=completed` workflow runs in their repository, which effectively blocked TotalNumberOfQueuedAndInProgressWorkflowRuns from working because excessive paginated requests hit the GitHub API rate limit.

This fixes that by splitting the list-workflow-runs call into two, one for `queued` and one for `in_progress`. This raises the minimum number of API calls from 1 to 2, but allows it to work regardless of the number of remaining `completed` workflow runs.
2021-02-16 18:55:55 +09:00
Yusuke Kuoka
35d047db01 Fix enterprise runners misusing cached token (#314)
Follow-up for #290
2021-02-16 12:56:52 +09:00
Hidetake Iwata
4f3f2fb60d Add metrics for GitHub API rate limit (#312) 2021-02-16 09:58:09 +09:00
Johannes Nicolai
bc8bc70f69 Fix rate limit and runner registration logic (#309)
* errors.Is compares all members of a struct and returns true only if they all match, which never happened
* switched to a type check instead of an exact value check
* notRegistered was using double negation in an if statement, which led to unregistering runners after the registration timeout
2021-02-15 09:36:49 +09:00
Yusuke Kuoka
bbb036e732 feat: Prevent blocking on transient runner registration failure (#297)
This enhances the controller to recreate the runner pod if the corresponding runner has failed to register itself to GitHub within 10 minutes (currently hard-coded).

It should alleviate #288 in case the root cause is some kind of transient failure (network unreliability, GitHub being down, temporary compute resource shortage, etc.).

Formerly you had to manually detect and delete such pods or even force-delete corresponding runners to unblock the controller.

With this enhancement, the controller deletes the pod automatically 10 minutes after pod creation, which results in the controller creating another pod that might work.

Ref #288
2021-02-09 10:17:52 +09:00
Yusuke Kuoka
9301409aec fix: Paginate ListRepositoryWorkflowRuns (#295)
When we used `QueuedAndInProgressWorkflowRuns`-based autoscaling, it fetched and considered only the first 30 workflow runs at reconciliation time. This may have resulted in unreliable scaling behaviour, like scale-in/out not happening when it was expected.
2021-02-09 10:13:53 +09:00
Jesse Haka
28e80a2d28 Add support for enterprise runners (#290)
* Add support for enterprise runners

* update docs
2021-02-05 09:31:06 +09:00
Reinier Timmer
8d6f77e07c Remove beta GitHub client implementations (#228) 2020-12-10 09:08:51 +09:00
Erik Nobel
a2b335ad6a Github pkg: Bump github package to version 33 (#222) 2020-12-06 10:01:47 +09:00
ZacharyBenamram
df99f394b4 Remove 10 minute buffer to token expiration (#214)
Co-authored-by: Zachary Benamram <zacharybenamram@blend.com>
2020-11-30 09:03:27 +09:00
Yusuke Kuoka
4eb45d3c7f Fix build error 2020-11-10 17:09:16 +09:00
Juho Saarinen
1c30bdf35b Add GHE URL to transport (#152)
Fixes #149
2020-11-10 17:05:09 +09:00
Yusuke Kuoka
3f335ca628 Fix panic on startup when misconfigured (#154)
Fixes #153
2020-11-10 17:03:33 +09:00
Juho Saarinen
40c5050978 Added support for GitHub URLs other than the public one (#146)
Refactoring a bit
2020-10-28 22:15:53 +09:00
Helder Moreira
7a2fa7fbce runner-controller: do not delete runner if it is busy (#103)
Currently, after refreshing the token, the controller re-creates the runner with the new token. This results in jobs being interrupted. This PR makes sure the pod is not restarted if it is busy.

Closes #74
2020-10-05 09:06:37 +09:00
KUOKA Yusuke
5bb2694349 feat: Repository-wide RunnerDeployment Autoscaling (#57)
* feat: Repository-wide RunnerDeployment Autoscaling

This adds `maxReplicas` and `minReplicas` to the RunnerDeploymentSpec. If and only if both fields are set, the controller computes and sets desired `replicas` automatically depending on the demand.

The number of demanded runner replicas is computed by `queued workflow runs + in_progress workflow runs` for the repository. The support for organizational runners is not included.

Ref https://github.com/summerwind/actions-runner-controller/issues/10
2020-06-27 17:26:46 +09:00
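The replica computation described in #57 might look roughly like the sketch below. The function name `desiredReplicas` and its parameters are hypothetical, and clamping the demand to the `minReplicas`/`maxReplicas` range is an assumption inferred from the field names, not something the commit message states.

```go
package main

import "fmt"

// desiredReplicas is a hypothetical helper, not the controller's real code.
// Autoscaling applies only when both bounds are set; otherwise the current
// replica count is kept as-is.
func desiredReplicas(queued, inProgress int, minReplicas, maxReplicas *int, current int) int {
	if minReplicas == nil || maxReplicas == nil {
		return current
	}

	// Demand is the number of queued plus in_progress workflow runs.
	demand := queued + inProgress

	// Assumption: the demand is clamped to [minReplicas, maxReplicas].
	if demand < *minReplicas {
		return *minReplicas
	}
	if demand > *maxReplicas {
		return *maxReplicas
	}
	return demand
}

func main() {
	minR, maxR := 1, 5
	fmt.Println(desiredReplicas(2, 1, &minR, &maxR, 1))
	fmt.Println(desiredReplicas(7, 3, &minR, &maxR, 1))
	fmt.Println(desiredReplicas(7, 3, nil, &maxR, 1))
}
```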
Reinier Timmer
9f57f52e36 organization and repository are now exclusive 2020-04-28 11:14:31 +02:00
Reinier Timmer
fb35dd4131 support for organization runners 2020-04-28 11:14:31 +02:00
Moto Ishizawa
5f608058cd Add github package 2020-04-13 22:27:05 +09:00