Compare commits


28 Commits

Author SHA1 Message Date
Yusuke Kuoka
c8f1acd92c chore: bump chart to latest (#1319)
Bumps the chart version along with the controller version.
We bump the patch number for the chart as the release for the controller is a patch release.
That's the same handling as for the previous version bump, ecc8b4472a and #1300.

As always, be sure to upgrade CRDs before updating the controller version!
Otherwise it can break in interesting ways.
2022-04-08 10:59:07 +09:00
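For readers following that advice, here is a minimal sketch of the CRD-first upgrade order. The release name, namespace, `<CHART_VERSION>` placeholder and `crds/` path are assumptions for illustration (and assume the chart repo is already added locally), not commands taken from this compare view:

```shell
# Sketch only: release name, namespace and paths are placeholders.
# 1. Apply the CRDs shipped with the target chart version first.
helm repo update
helm pull actions-runner-controller/actions-runner-controller \
  --version <CHART_VERSION> --untar
kubectl replace -f actions-runner-controller/crds/

# 2. Only then upgrade the controller itself.
helm upgrade --install actions-runner-controller \
  actions-runner-controller/actions-runner-controller \
  --namespace actions-runner-system --version <CHART_VERSION>
```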
Yusuke Kuoka
b0fd7a75ea Fix release workflow 2022-04-08 01:36:14 +00:00
Yusuke Kuoka
b09c54045a Prevent runners from getting stuck in Terminating when the pod disappeared without the standard termination process (#1318)
This fixes the said issue by additionally treating any runner pod whose phase is Failed, or whose runner container exited with a non-zero code, as "complete", so that ARC gives up unregistering the runner from Actions and deletes the runner pod anyway.

Note that there are plenty of causes for that. If you are deploying runner pods on AWS spot instances or GCE preemptible instances and a job assigned to a runner takes more time than the shutdown grace period provided by your cloud provider (2 minutes for AWS spot instances), the runner pod is terminated prematurely without letting actions/runner unregister itself from Actions. If your VM or hypervisor fails, runner pods that were running on the node end up PodFailed without the runners being unregistered from Actions.

Please be aware that it is currently the user's responsibility to clean up any dangling runner resources on GitHub Actions.

Ref https://github.com/actions-runner-controller/actions-runner-controller/issues/1307
Might also relate to https://github.com/actions-runner-controller/actions-runner-controller/issues/1273
2022-04-08 10:17:33 +09:00
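The gist of the change, as a simplified Go sketch mirroring the `runnerPodOrContainerIsStopped` diff further down (the package, function and container names here are illustrative, not the exact code):

```go
package sketch

import corev1 "k8s.io/api/core/v1"

// runnerPodLooksStopped sketches the relaxed check described above: a runner
// pod now counts as stopped when its phase is Succeeded OR Failed, or when the
// runner container has terminated with any exit code (previously only 0), so
// ARC can give up unregistering and delete the pod instead of leaving it
// stuck in Terminating.
func runnerPodLooksStopped(pod *corev1.Pod) bool {
	if pod.Status.Phase == corev1.PodSucceeded || pod.Status.Phase == corev1.PodFailed {
		return true
	}
	for _, status := range pod.Status.ContainerStatuses {
		if status.Name != "runner" {
			continue
		}
		// Any termination of the runner container counts, not just exit code 0.
		if status.State.Terminated != nil {
			return true
		}
	}
	return false
}
```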
Yusuke Kuoka
96f2da1c2e Merge pull request #1262 from fgalind1/patch-4
Fix deleting a runner when pod was deleted
2022-04-08 10:17:13 +09:00
Yusuke Kuoka
cac8b76c68 Merge pull request #1292 from actions-runner-controller/renovate/sigs.k8s.io-controller-runtime-0.x
fix(deps): update module sigs.k8s.io/controller-runtime to v0.11.2
2022-04-08 10:14:47 +09:00
Felipe Galindo Sanchez
e24d942d63 Merge remote-tracking branch 'upstream/master' into patch-4 2022-04-06 06:43:01 -07:00
Felipe Galindo Sanchez
b855991373 ci: pin go version to the known working version (#1303) 2022-04-06 09:34:48 +01:00
Felipe Galindo Sanchez
e7e48a77e4 Merge remote-tracking branch 'upstream/master' into patch-4 2022-04-04 09:54:08 -07:00
Yusuke Kuoka
85dea9b67c Merge pull request #1285 from actions-runner-controller/docs/runnersets
docs: add limitations to runnersets + reorder
2022-04-03 18:18:54 +09:00
Yusuke Kuoka
1d9347f418 chore: bump chart to latest (#1300)
* chore: bump chart to latest

Bumps the chart version along with the controller version.
We bump the patch number for the chart as the release for the controller is a patch release.
That's the same handling as for the previous version bump, ecc8b4472a

As always, be sure to upgrade CRDs before updating the controller version!
Otherwise it can break in interesting ways.

* docs: expand on CRD upgrade requirement

Co-authored-by: Callum Tait <15716903+toast-gear@users.noreply.github.com>
2022-04-03 10:15:39 +01:00
Yusuke Kuoka
631a70a35f Fix runner pod to be cleaned up earlier regardless of the sync period (#1299)
Ref #1291
2022-04-03 11:12:44 +09:00
Yusuke Kuoka
b614dcf54b Lengthen the hard-coded runner startup timeout to avoid a race on token expiration (#1296)
Ref #1295
2022-04-03 09:59:35 +09:00
Callum Tait
14f9e7229e docs: highlight why persistent runners are not ideal 2022-04-01 15:49:15 +01:00
Renovate Bot
82770e145b fix(deps): update module sigs.k8s.io/controller-runtime to v0.11.2 2022-03-30 21:38:12 +00:00
Renovate Bot
971c54bf5c chore(deps): update dependency actions/runner to v2.289.2 2022-03-30 18:18:17 +00:00
renovate[bot]
b80d9b0cdc chore(deps): update helm/chart-releaser-action action to v1.4.0 (#1287)
Co-authored-by: Renovate Bot <bot@renovateapp.com>
2022-03-30 13:24:26 +01:00
Bernardo Meurer
e46df413a1 refactor(runner/entrypoint): check for externalstmp (#1277)
* refactor(runner/entrypoint): check for externalstmp [skip ci]

Co-authored-by: Callum Tait <15716903+toast-gear@users.noreply.github.com>
2022-03-30 12:18:18 +01:00
toast-gear
eb02f6f26e docs: redundant words 2022-03-30 10:36:34 +01:00
toast-gear
7a750b9285 docs: wording 2022-03-30 10:35:32 +01:00
toast-gear
d26c8d6529 docs: add autoscaling also causes problems 2022-03-30 10:26:08 +01:00
toast-gear
fd0092d13f chore: new line for consistency 2022-03-30 10:02:33 +01:00
toast-gear
88d17c7988 docs: use the right font 2022-03-30 10:00:34 +01:00
toast-gear
98567dadc9 docs: fix index 2022-03-30 09:59:32 +01:00
toast-gear
7e8d80689b docs: add limitations to runnersets + reorder 2022-03-30 09:53:59 +01:00
Callum Tait
d72c396ff1 docs: slight correction for a multi controller env 2022-03-29 16:57:58 +01:00
Milan Aleks
13e7b440a8 chore: typo fix in runner Dockerfile [skip ci] (#1270) 2022-03-29 11:05:24 +01:00
Michael Goodness
a95983fb98 feat(kustomize): add github-webhook-server overlay (#1198)
* feat(kustomize): add github-webhook-server overlay

* chore(kustomize): add image to github-webhook-server overlay

* feat(kustomize): drop sync-period
2022-03-29 11:00:55 +01:00
Felipe Galindo Sanchez
9657d3e5b3 Fix deleting a runner when pod was deleted
With the current implementation, if a pod is deleted the controller fails to delete the runner: it tries to annotate a pod that doesn't exist, because we pass a new pod object rather than the existing resource.
2022-03-22 14:44:50 -07:00
24 changed files with 326 additions and 56 deletions


@@ -114,7 +114,7 @@ jobs:
git config user.email "$GITHUB_ACTOR@users.noreply.github.com"
- name: Run chart-releaser
uses: helm/chart-releaser-action@v1.3.0
uses: helm/chart-releaser-action@v1.4.0
env:
CR_TOKEN: "${{ secrets.GITHUB_TOKEN }}"


@@ -20,7 +20,7 @@ jobs:
- uses: actions/setup-go@v3
with:
go-version: '^1.17.7'
go-version: '1.17.7'
- name: Install tools
run: |


@@ -15,7 +15,7 @@ on:
- '!**.md'
env:
RUNNER_VERSION: 2.289.1
RUNNER_VERSION: 2.289.2
DOCKER_VERSION: 20.10.12
DOCKERHUB_USERNAME: summerwind


@@ -24,7 +24,8 @@ jobs:
uses: actions/checkout@v3
- uses: actions/setup-go@v3
with:
go-version: '^1.17.7'
go-version: '1.17.7'
check-latest: false
- run: go version
- uses: actions/cache@v3
with:


@@ -19,6 +19,7 @@ ToC:
- [Enterprise Runners](#enterprise-runners)
- [RunnerDeployments](#runnerdeployments)
- [RunnerSets](#runnersets)
- [Persistent Runners](#persistent-runners)
- [Autoscaling](#autoscaling)
- [Anti-Flapping Configuration](#anti-flapping-configuration)
- [Pull Driven Scaling](#pull-driven-scaling)
@@ -32,7 +33,6 @@ ToC:
- [Runner Groups](#runner-groups)
- [Runner Entrypoint Features](#runner-entrypoint-features)
- [Using IRSA (IAM Roles for Service Accounts) in EKS](#using-irsa-iam-roles-for-service-accounts-in-eks)
- [Persistent Runners](#persistent-runners)
- [Software Installed in the Runner Image](#software-installed-in-the-runner-image)
- [Using without cert-manager](#using-without-cert-manager)
- [Troubleshooting](#troubleshooting)
@@ -233,9 +233,9 @@ If you plan on installing all instances of the controller stack into a single na
1. All resources per stack must have a unique, in the case of Helm this can be done by giving each install a unique release name, or via the `fullnameOverride` properties.
2. `authSecret.name` needs be unique per stack when each stack is tied to runners in different GitHub organizations and repositories AND you want your GitHub credentials to narrowly scoped.
3. `leaderElectionId` needs to be unique per stack. If this is not unique to the stack the controller tries to race onto the leader election lock resulting in only one stack working concurrently. Your controller will be stuck with a log message something like this `attempting to acquire leader lease arc-controllers/actions-runner-controller...`
4. The MutatingWebhookConfiguration in each stack must include a namespace selector for that stacks corresponding runners namespace, this is already configured in the helm chart.
4. The MutatingWebhookConfiguration in each stack must include a namespace selector for that stacks corresponding runner namespace, this is already configured in the helm chart.
Alternatively, you can install each controller stack into a unique namespace (relative to other controller stacks in the cluster), avoiding these potential pitfalls.
Alternatively, you can install each controller stack into a unique namespace (relative to other controller stacks in the cluster). Implementing ARC this way avoids the first, second and third pitfalls (you still need to set the corresponding namespace selector for each stacks mutating webhook)
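For illustration, per-stack Helm values might look like the sketch below. The keys (`fullnameOverride`, `authSecret.name`, `leaderElectionId`) are the ones named in the list above and are assumed to map directly to chart values; the stack names are placeholders.

```yaml
# values-stack-a.yaml (placeholders) -- each additional stack in the same
# namespace gets its own unique names, credentials secret and
# leader-election ID.
fullnameOverride: arc-stack-a
leaderElectionId: arc-stack-a
authSecret:
  name: arc-stack-a-github-auth
```

Each stack would then be installed under its own release name, for example `helm install arc-stack-a ... -f values-stack-a.yaml`.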
## Usage
@@ -367,6 +367,8 @@ example-runnerdeploy2475ht2qbr mumoshu/actions-runner-controller-ci Running
> This feature requires controller version => [v0.20.0](https://github.com/actions-runner-controller/actions-runner-controller/releases/tag/v0.20.0)
_Ensure you see the limitations before using this kind!!!!!_
For scenarios where you require the advantages of a `StatefulSet`, for example persistent storage, ARC implements a runner based on Kubernete's StatefulSets, the RunnerSet.
A basic `RunnerSet` would look like this:
@@ -450,6 +452,21 @@ Under the hood, `RunnerSet` relies on Kubernetes's `StatefulSet` and Mutating We
**Limitations**
* For autoscaling the `RunnerSet` kind only supports pull driven scaling or the `workflow_job` event for webhook driven scaling.
* Whilst `RunnerSets` support all runner modes as well as autoscaling, currently PVs are **NOT** automatically cleaned up as they are still bound to their respective PVCs when a runner is deleted by the controller. This has **major** implications when using `RunnerSets` in the standard runner mode, `ephemeral: true`, see [persistent runners](#persistent-runners) for more details. As a result of this, using the default ephemeral configuration or implementing autoscaling for your `RunnerSets`, you will get a build up of PVCs and PVs without some sort of custom solution for cleaning up.
### Persistent Runners
Every runner managed by ARC is "ephemeral" by default. The life of an ephemeral runner managed by ARC looks like this- ARC creates a runner pod for the runner. As it's an ephemeral runner, the `--ephemeral` flag is passed to the `actions/runner` agent that runs within the `runner` container of the runner pod.
`--ephemeral` is an `actions/runner` feature that instructs the runner to stop and de-register itself after the first job run.
Once the ephemeral runner has completed running a workflow job, it stops with a status code of 0, hence the runner pod is marked as completed, removed by ARC.
As it's removed after a workflow job run, the runner pod is never reused across multiple GitHub Actions workflow jobs, providing you a clean environment per each workflow job.
Although not generally recommended, it's possible to disable passing `--ephemeral` flag by explicitly setting `ephemeral: false` in the `RunnerDeployment` or `RunnerSet` spec. When disabled, your runner becomes "persistent". A persistent runner does not stop after workflow job ends, and in this mode `actions/runner` is known to clean only runner's work dir after each job. Whilst this can seem helpful it creates a non-deterministic environment which is not ideal for a CI/CD environment. Between runs your actions cache, docker images stored in the `dind` and layer cache, globally installed packages etc are retained across multiple workflow job runs which can cause issues which are hard to debug and inconsistent.
Persistent runners are available as an option for some edge cases however they are not preferred as they can create challenges around providing a deterministic and secure environment.
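For concreteness, a minimal sketch of opting into a persistent runner; the metadata and repository names are placeholders, and the `ephemeral` field is assumed to sit under the runner template spec as in typical `RunnerDeployment` manifests:

```yaml
# Sketch only: names and repository are placeholders.
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: example-persistent-runnerdeploy
spec:
  replicas: 1
  template:
    spec:
      repository: your-org/your-repo
      # Persistent mode: --ephemeral is not passed to actions/runner, so the
      # runner is reused across jobs (only its work dir is cleaned between jobs).
      ephemeral: false
```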
### Autoscaling
@@ -668,7 +685,7 @@ The primary benefit of autoscaling on Webhook compared to the pull driven scalin
> You can learn the implementation details in [#282](https://github.com/actions-runner-controller/actions-runner-controller/pull/282)
To enable this feature, you firstly need to install the webhook server, currently, only our Helm chart has the ability install it:
To enable this feature, you first need to install the GitHub webhook server. To install via our Helm chart,
_[see the values documentation for all configuration options](https://github.com/actions-runner-controller/actions-runner-controller/blob/master/charts/actions-runner-controller/README.md)_
```console
@@ -1311,21 +1328,6 @@ spec:
securityContext:
fsGroup: 1000
```
### Persistent Runners
Every runner managed by ARC is "ephemeral" by default. The life of an ephemeral runner managed by ARC looks like this- ARC creates a runner pod for the runner. As it's an ephemeral runner, the `--ephemeral` flag is passed to the `actions/runner` agent that runs within the `runner` container of the runner pod.
`--ephemeral` is an `actions/runner` feature that instructs the runner to stop and de-register itself after the first job run.
Once the ephemeral runner has completed running a workflow job, it stops with a status code of 0, hence the runner pod is marked as completed, removed by ARC.
As it's removed after a workflow job run, the runner pod is never reused across multiple GitHub Actions workflow jobs, providing you a clean environment per each workflow job.
Although not recommended, it's possible to disable passing `--ephemeral` flag by explicitly setting `ephemeral: false` in the `RunnerDeployment` or `RunnerSet` spec. When disabled, your runner becomes "persistent". A persistent runner does not stop after workflow job ends, and in this mode `actions/runner` is known to clean only runner's work dir after each job. That means your runner's environment, including various actions cache, docker images stored in the `dind` and layer cache, is retained across multiple workflow job runs.
Persistent runners are available as an option for some edge cases however they are not preferred as they can create challenges around providing a deterministic and secure environment.
### Software Installed in the Runner Image
**Cloud Tooling**<br />


@@ -181,6 +181,9 @@ func (rs *RunnerSpec) ValidateRepository() error {
// RunnerStatus defines the observed state of Runner
type RunnerStatus struct {
// Turns true only if the runner pod is ready.
// +optional
Ready bool `json:"ready"`
// +optional
Registration RunnerStatusRegistration `json:"registration"`
// +optional


@@ -15,10 +15,10 @@ type: application
# This is the chart version. This version number should be incremented each time you make changes
# to the chart and its templates, including the app version.
# Versions are expected to follow Semantic Versioning (https://semver.org/)
version: 0.17.1
version: 0.17.3
# Used as the default manager tag value when no tag property is provided in the values.yaml
appVersion: 0.22.1
appVersion: 0.22.3
home: https://github.com/actions-runner-controller/actions-runner-controller


@@ -5126,6 +5126,9 @@ spec:
type: string
phase:
type: string
ready:
description: Turns true only if the runner pod is ready.
type: boolean
reason:
type: string
registration:


@@ -18,7 +18,7 @@ Due to the above you can't just do a `helm upgrade` to release the latest versio
## Steps
1. Upgrade CRDs
1. Upgrade CRDs, this isn't optional, the CRDs you are using must be those that correspond with the version of the controller you are installing
```shell
# REMEMBER TO UPDATE THE CHART_VERSION TO RELEVANT CHART VERISON!!!!


@@ -5126,6 +5126,9 @@ spec:
type: string
phase:
type: string
ready:
description: Turns true only if the runner pod is ready.
type: boolean
reason:
type: string
registration:


@@ -0,0 +1,23 @@
# This patch injects an HTTP proxy sidecar container that performs RBAC
# authorization against the Kubernetes API using SubjectAccessReviews.
apiVersion: apps/v1
kind: Deployment
metadata:
name: github-webhook-server
spec:
template:
spec:
containers:
- name: kube-rbac-proxy
image: quay.io/brancz/kube-rbac-proxy:v0.10.0
args:
- '--secure-listen-address=0.0.0.0:8443'
- '--upstream=http://127.0.0.1:8080/'
- '--logtostderr=true'
- '--v=10'
ports:
- containerPort: 8443
name: https
- name: github-webhook-server
args:
- '--metrics-addr=127.0.0.1:8080'


@@ -20,19 +20,22 @@ bases:
- ../webhook
# [CERTMANAGER] To enable cert-manager, uncomment all sections with 'CERTMANAGER'. 'WEBHOOK' components are required.
- ../certmanager
# [PROMETHEUS] To enable prometheus monitor, uncomment all sections with 'PROMETHEUS'.
# [PROMETHEUS] To enable prometheus monitor, uncomment all sections with 'PROMETHEUS'.
#- ../prometheus
# [GH_WEBHOOK_SERVER] To enable the GitHub webhook server, uncomment all sections with 'GH_WEBHOOK_SERVER'.
#- ../github-webhook-server
patchesStrategicMerge:
# Protect the /metrics endpoint by putting it behind auth.
# Only one of manager_auth_proxy_patch.yaml and
# manager_prometheus_metrics_patch.yaml should be enabled.
# Protect the /metrics endpoint by putting it behind auth.
# Only one of manager_auth_proxy_patch.yaml and
# manager_prometheus_metrics_patch.yaml should be enabled.
- manager_auth_proxy_patch.yaml
# If you want your controller-manager to expose the /metrics
# endpoint w/o any authn/z, uncomment the following line and
# comment manager_auth_proxy_patch.yaml.
# Only one of manager_auth_proxy_patch.yaml and
# manager_prometheus_metrics_patch.yaml should be enabled.
# If you want your controller-manager to expose the /metrics
# endpoint w/o any authn/z, uncomment the following line and
# comment manager_auth_proxy_patch.yaml.
# Only one of manager_auth_proxy_patch.yaml and
# manager_prometheus_metrics_patch.yaml should be enabled.
#- manager_prometheus_metrics_patch.yaml
# [WEBHOOK] To enable webhook, uncomment all the sections with [WEBHOOK] prefix including the one in crd/kustomization.yaml
@@ -43,6 +46,10 @@ patchesStrategicMerge:
# 'CERTMANAGER' needs to be enabled to use ca injection
- webhookcainjection_patch.yaml
# [GH_WEBHOOK_SERVER] To enable the GitHub webhook server, uncomment all sections with 'GH_WEBHOOK_SERVER'.
# Protect the GitHub webhook server metrics endpoint by putting it behind auth.
# - gh-webhook-server-auth-proxy-patch.yaml
# the following config is for teaching kustomize how to do var substitution
vars:
# [CERTMANAGER] To enable cert-manager, uncomment all sections with 'CERTMANAGER' prefix.


@@ -23,4 +23,3 @@ spec:
args:
- "--metrics-addr=127.0.0.1:8080"
- "--enable-leader-election"
- "--sync-period=10m"


@@ -0,0 +1,37 @@
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
app.kubernetes.io/component: github-webhook-server
app.kubernetes.io/part-of: actions-runner-controller
name: github-webhook-server
spec:
replicas: 1
selector:
matchLabels:
app.kubernetes.io/component: github-webhook-server
app.kubernetes.io/part-of: actions-runner-controller
template:
metadata:
labels:
app.kubernetes.io/component: github-webhook-server
app.kubernetes.io/part-of: actions-runner-controller
spec:
containers:
- name: github-webhook-server
image: controller:latest
command:
- '/github-webhook-server'
env:
- name: GITHUB_WEBHOOK_SECRET_TOKEN
valueFrom:
secretKeyRef:
key: github_webhook_secret_token
name: github-webhook-server
optional: true
ports:
- containerPort: 8000
name: http
protocol: TCP
serviceAccountName: github-webhook-server
terminationGracePeriodSeconds: 10


@@ -0,0 +1,12 @@
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
images:
- name: controller
newName: summerwind/actions-runner-controller
newTag: latest
resources:
- deployment.yaml
- rbac.yaml
- service.yaml


@@ -0,0 +1,113 @@
apiVersion: v1
kind: ServiceAccount
metadata:
labels:
app.kubernetes.io/component: github-webhook-server
app.kubernetes.io/part-of: actions-runner-controller
name: github-webhook-server
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
labels:
app.kubernetes.io/component: github-webhook-server
app.kubernetes.io/part-of: actions-runner-controller
name: github-webhook-server
rules:
- apiGroups:
- actions.summerwind.dev
resources:
- horizontalrunnerautoscalers
verbs:
- get
- list
- patch
- update
- watch
- apiGroups:
- actions.summerwind.dev
resources:
- horizontalrunnerautoscalers/finalizers
verbs:
- create
- delete
- get
- list
- patch
- update
- watch
- apiGroups:
- actions.summerwind.dev
resources:
- horizontalrunnerautoscalers/status
verbs:
- get
- patch
- update
- apiGroups:
- actions.summerwind.dev
resources:
- runnersets
verbs:
- get
- list
- watch
- apiGroups:
- actions.summerwind.dev
resources:
- runnerdeployments
verbs:
- create
- delete
- get
- list
- patch
- update
- watch
- apiGroups:
- actions.summerwind.dev
resources:
- runnerdeployments/finalizers
verbs:
- create
- delete
- get
- list
- patch
- update
- watch
- apiGroups:
- actions.summerwind.dev
resources:
- runnerdeployments/status
verbs:
- get
- patch
- update
- apiGroups:
- authentication.k8s.io
resources:
- tokenreviews
verbs:
- create
- apiGroups:
- authorization.k8s.io
resources:
- subjectaccessreviews
verbs:
- create
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
labels:
app.kubernetes.io/component: github-webhook-server
app.kubernetes.io/part-of: actions-runner-controller
name: github-webhook-server
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: github-webhook-server
subjects:
- kind: ServiceAccount
name: github-webhook-server


@@ -0,0 +1,16 @@
apiVersion: v1
kind: Service
metadata:
labels:
app.kubernetes.io/component: github-webhook-server
app.kubernetes.io/part-of: actions-runner-controller
name: github-webhook-server
spec:
ports:
- port: 80
targetPort: http
protocol: TCP
name: http
selector:
app.kubernetes.io/component: github-webhook-server
app.kubernetes.io/part-of: actions-runner-controller


@@ -106,15 +106,16 @@ func (r *RunnerReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctr
return ctrl.Result{}, nil
}
} else {
// Request to remove a runner. DeletionTimestamp was set in the runner - we need to unregister runner
var pod corev1.Pod
if err := r.Get(ctx, req.NamespacedName, &pod); err != nil {
if !kerrors.IsNotFound(err) {
log.Info(fmt.Sprintf("Retrying soon as we failed to get runner pod: %v", err))
return ctrl.Result{Requeue: true}, nil
}
// Pod was not found
return r.processRunnerDeletion(runner, ctx, log, nil)
}
// Request to remove a runner. DeletionTimestamp was set in the runner - we need to unregister runner
return r.processRunnerDeletion(runner, ctx, log, &pod)
}
@@ -132,7 +133,9 @@ func (r *RunnerReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctr
phase = "Created"
}
if runner.Status.Phase != phase {
ready := runnerPodReady(&pod)
if runner.Status.Phase != phase || runner.Status.Ready != ready {
if pod.Status.Phase == corev1.PodRunning {
// Seeing this message, you can expect the runner to become `Running` soon.
log.V(1).Info(
@@ -143,6 +146,7 @@ func (r *RunnerReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctr
updated := runner.DeepCopy()
updated.Status.Phase = phase
updated.Status.Ready = ready
updated.Status.Reason = pod.Status.Reason
updated.Status.Message = pod.Status.Message
@@ -155,6 +159,18 @@ func (r *RunnerReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctr
return ctrl.Result{}, nil
}
func runnerPodReady(pod *corev1.Pod) bool {
for _, c := range pod.Status.Conditions {
if c.Type != corev1.PodReady {
continue
}
return c.Status == corev1.ConditionTrue
}
return false
}
func runnerContainerExitCode(pod *corev1.Pod) *int32 {
for _, status := range pod.Status.ContainerStatuses {
if status.Name != containerName {
@@ -172,7 +188,7 @@ func runnerContainerExitCode(pod *corev1.Pod) *int32 {
func runnerPodOrContainerIsStopped(pod *corev1.Pod) bool {
// If pod has ended up succeeded we need to restart it
// Happens e.g. when dind is in runner and run completes
stopped := pod.Status.Phase == corev1.PodSucceeded
stopped := pod.Status.Phase == corev1.PodSucceeded || pod.Status.Phase == corev1.PodFailed
if !stopped {
if pod.Status.Phase == corev1.PodRunning {
@@ -181,7 +197,7 @@ func runnerPodOrContainerIsStopped(pod *corev1.Pod) bool {
continue
}
if status.State.Terminated != nil && status.State.Terminated.ExitCode == 0 {
if status.State.Terminated != nil {
stopped = true
}
}


@@ -153,8 +153,18 @@ func (c *Client) GetRegistrationToken(ctx context.Context, enterprise, org, repo
key := getRegistrationKey(org, repo, enterprise)
rt, ok := c.regTokens[key]
// we like to give runners a chance that are just starting up and may miss the expiration date by a bit
runnerStartupTimeout := 3 * time.Minute
// We'd like to allow the runner just starting up to miss the expiration date by a bit.
// Note that this means that we're going to cache Creation Registraion Token API response longer than the
// recommended cache duration.
//
// https://docs.github.com/en/rest/reference/actions#create-a-registration-token-for-a-repository
// https://docs.github.com/en/rest/reference/actions#create-a-registration-token-for-an-organization
// https://docs.github.com/en/rest/reference/actions#create-a-registration-token-for-an-enterprise
// https://docs.github.com/en/rest/overview/resources-in-the-rest-api#conditional-requests
//
// This is currently set to 30 minutes as the result of the discussion took place at the following issue:
// https://github.com/actions-runner-controller/actions-runner-controller/issues/1295
runnerStartupTimeout := 30 * time.Minute
if ok && rt.GetExpiresAt().After(time.Now().Add(runnerStartupTimeout)) {
return rt, nil

go.mod

@@ -19,10 +19,10 @@ require (
go.uber.org/zap v1.21.0
golang.org/x/oauth2 v0.0.0-20220309155454-6242fa91716a
gomodules.xyz/jsonpatch/v2 v2.2.0
k8s.io/api v0.23.4
k8s.io/apimachinery v0.23.4
k8s.io/client-go v0.23.4
sigs.k8s.io/controller-runtime v0.11.1
k8s.io/api v0.23.5
k8s.io/apimachinery v0.23.5
k8s.io/client-go v0.23.5
sigs.k8s.io/controller-runtime v0.11.2
sigs.k8s.io/yaml v1.3.0
)
@@ -68,8 +68,8 @@ require (
gopkg.in/tomb.v1 v1.0.0-20141024135613-dd632973f1e7 // indirect
gopkg.in/yaml.v2 v2.4.0 // indirect
gopkg.in/yaml.v3 v3.0.0-20210107192922-496545a6307b // indirect
k8s.io/apiextensions-apiserver v0.23.0 // indirect
k8s.io/component-base v0.23.0 // indirect
k8s.io/apiextensions-apiserver v0.23.5 // indirect
k8s.io/component-base v0.23.5 // indirect
k8s.io/klog/v2 v2.30.0 // indirect
k8s.io/kube-openapi v0.0.0-20211115234752-e816edb12b65 // indirect
k8s.io/utils v0.0.0-20211116205334-6203023598ed // indirect

go.sum

@@ -947,18 +947,30 @@ honnef.co/go/tools v0.0.1-2020.1.4/go.mod h1:X/FiERA/W4tHapMX5mGpAtMSVEeEUOyHaw9
k8s.io/api v0.23.0/go.mod h1:8wmDdLBHBNxtOIytwLstXt5E9PddnZb0GaMcqsvDBpg=
k8s.io/api v0.23.4 h1:85gnfXQOWbJa1SiWGpE9EEtHs0UVvDyIsSMpEtl2D4E=
k8s.io/api v0.23.4/go.mod h1:i77F4JfyNNrhOjZF7OwwNJS5Y1S9dpwvb9iYRYRczfI=
k8s.io/api v0.23.5 h1:zno3LUiMubxD/V1Zw3ijyKO3wxrhbUF1Ck+VjBvfaoA=
k8s.io/api v0.23.5/go.mod h1:Na4XuKng8PXJ2JsploYYrivXrINeTaycCGcYgF91Xm8=
k8s.io/apiextensions-apiserver v0.23.0 h1:uii8BYmHYiT2ZTAJxmvc3X8UhNYMxl2A0z0Xq3Pm+WY=
k8s.io/apiextensions-apiserver v0.23.0/go.mod h1:xIFAEEDlAZgpVBl/1VSjGDmLoXAWRG40+GsWhKhAxY4=
k8s.io/apiextensions-apiserver v0.23.5 h1:5SKzdXyvIJKu+zbfPc3kCbWpbxi+O+zdmAJBm26UJqI=
k8s.io/apiextensions-apiserver v0.23.5/go.mod h1:ntcPWNXS8ZPKN+zTXuzYMeg731CP0heCTl6gYBxLcuQ=
k8s.io/apimachinery v0.23.0/go.mod h1:fFCTTBKvKcwTPFzjlcxp91uPFZr+JA0FubU4fLzzFYc=
k8s.io/apimachinery v0.23.4 h1:fhnuMd/xUL3Cjfl64j5ULKZ1/J9n8NuQEgNL+WXWfdM=
k8s.io/apimachinery v0.23.4/go.mod h1:BEuFMMBaIbcOqVIJqNZJXGFTP4W6AycEpb5+m/97hrM=
k8s.io/apimachinery v0.23.5 h1:Va7dwhp8wgkUPWsEXk6XglXWU4IKYLKNlv8VkX7SDM0=
k8s.io/apimachinery v0.23.5/go.mod h1:BEuFMMBaIbcOqVIJqNZJXGFTP4W6AycEpb5+m/97hrM=
k8s.io/apiserver v0.23.0/go.mod h1:Cec35u/9zAepDPPFyT+UMrgqOCjgJ5qtfVJDxjZYmt4=
k8s.io/apiserver v0.23.5/go.mod h1:7wvMtGJ42VRxzgVI7jkbKvMbuCbVbgsWFT7RyXiRNTw=
k8s.io/client-go v0.23.0/go.mod h1:hrDnpnK1mSr65lHHcUuIZIXDgEbzc7/683c6hyG4jTA=
k8s.io/client-go v0.23.4 h1:YVWvPeerA2gpUudLelvsolzH7c2sFoXXR5wM/sWqNFU=
k8s.io/client-go v0.23.4/go.mod h1:PKnIL4pqLuvYUK1WU7RLTMYKPiIh7MYShLshtRY9cj0=
k8s.io/client-go v0.23.5 h1:zUXHmEuqx0RY4+CsnkOn5l0GU+skkRXKGJrhmE2SLd8=
k8s.io/client-go v0.23.5/go.mod h1:flkeinTO1CirYgzMPRWxUCnV0G4Fbu2vLhYCObnt/r4=
k8s.io/code-generator v0.23.0/go.mod h1:vQvOhDXhuzqiVfM/YHp+dmg10WDZCchJVObc9MvowsE=
k8s.io/code-generator v0.23.5/go.mod h1:S0Q1JVA+kSzTI1oUvbKAxZY/DYbA/ZUb4Uknog12ETk=
k8s.io/component-base v0.23.0 h1:UAnyzjvVZ2ZR1lF35YwtNY6VMN94WtOnArcXBu34es8=
k8s.io/component-base v0.23.0/go.mod h1:DHH5uiFvLC1edCpvcTDV++NKULdYYU6pR9Tt3HIKMKI=
k8s.io/component-base v0.23.5 h1:8qgP5R6jG1BBSXmRYW+dsmitIrpk8F/fPEvgDenMCCE=
k8s.io/component-base v0.23.5/go.mod h1:c5Nq44KZyt1aLl0IpHX82fhsn84Sb0jjzwjpcA42bY0=
k8s.io/gengo v0.0.0-20210813121822-485abfe95c7c/go.mod h1:FiNAH4ZV3gBg2Kwh89tzAEV2be7d5xI0vBa/VySYy3E=
k8s.io/klog/v2 v2.0.0/go.mod h1:PBfzABfn139FHAV07az/IF9Wp1bkk3vpT2XSJ76fSDE=
k8s.io/klog/v2 v2.2.0/go.mod h1:Od+F08eJP+W3HUb4pSrPpgp9DGU4GzlpG/TmITuYh/Y=
@@ -974,8 +986,11 @@ rsc.io/binaryregexp v0.2.0/go.mod h1:qTv7/COck+e2FymRvadv62gMdZztPaShugOCi3I+8D8
rsc.io/quote/v3 v3.1.0/go.mod h1:yEA65RcK8LyAZtP9Kv3t0HmxON59tX3rD+tICJqUlj0=
rsc.io/sampler v1.3.0/go.mod h1:T1hPZKmBbMNahiBKFy5HrXp6adAjACjK9JXDnKaTXpA=
sigs.k8s.io/apiserver-network-proxy/konnectivity-client v0.0.25/go.mod h1:Mlj9PNLmG9bZ6BHFwFKDo5afkpWyUISkb9Me0GnK66I=
sigs.k8s.io/apiserver-network-proxy/konnectivity-client v0.0.30/go.mod h1:fEO7lRTdivWO2qYVCVG7dEADOMo/MLDCVr8So2g88Uw=
sigs.k8s.io/controller-runtime v0.11.1 h1:7YIHT2QnHJArj/dk9aUkYhfqfK5cIxPOX5gPECfdZLU=
sigs.k8s.io/controller-runtime v0.11.1/go.mod h1:KKwLiTooNGu+JmLZGn9Sl3Gjmfj66eMbCQznLP5zcqA=
sigs.k8s.io/controller-runtime v0.11.2 h1:H5GTxQl0Mc9UjRJhORusqfJCIjBO8UtUxGggCwL1rLA=
sigs.k8s.io/controller-runtime v0.11.2/go.mod h1:P6QCzrEjLaZGqHsfd+os7JQ+WFZhvB8MRFsn4dWF7O4=
sigs.k8s.io/json v0.0.0-20211020170558-c049b76a60c6 h1:fD1pz4yfdADVNfFmcP2aBEtudwUQ1AlLnRBALr33v3s=
sigs.k8s.io/json v0.0.0-20211020170558-c049b76a60c6/go.mod h1:p4QtZmO4uMYipTQNzagwnNoseA6OxSUutVw05NhYDRs=
sigs.k8s.io/structured-merge-diff/v4 v4.0.2/go.mod h1:bJZC9H9iH24zzfZ/41RGcq60oK1F7G282QMXDPYydCw=


@@ -83,7 +83,7 @@ ENV HOME=/home/runner
#
# If you're willing to uncomment the following line, you'd also need to comment-out the
# && curl -L -o runner.tar.gz https://github.com/actions/runner/releases/download/v${RUNNER_VERSION}/actions-runner-linux-${ARCH}-${RUNNER_VERSION}.tar.gz \
# line in the next `RUN` command in this Dockerfile, to avoid overwiding this runner.tar.gz with a remote one.
# line in the next `RUN` command in this Dockerfile, to avoid overwiting this runner.tar.gz with a remote one.
# COPY actions-runner-linux-x64-2.280.3.tar.gz /runnertmp/runner.tar.gz


@@ -151,7 +151,7 @@ cat .runner
# https://api.github.com/repos/USER/REPO/actions/runners/171
if [ -z "${UNITTEST:-}" ]; then
mkdir ./externals
mkdir -p ./externals
# Hack due to the DinD volumes
mv ./externalstmp/* ./externals/
fi


@@ -37,13 +37,23 @@ var (
},
{
Dockerfile: "../../runner/Dockerfile",
Args: []testing.BuildArg{},
Image: runnerImage,
Args: []testing.BuildArg{
{
Name: "RUNNER_VERSION",
Value: "2.289.2",
},
},
Image: runnerImage,
},
{
Dockerfile: "../../runner/Dockerfile.dindrunner",
Args: []testing.BuildArg{},
Image: runnerDindImage,
Args: []testing.BuildArg{
{
Name: "RUNNER_VERSION",
Value: "2.289.2",
},
},
Image: runnerDindImage,
},
}
@@ -58,7 +68,7 @@ var (
}
commonScriptEnv = []string{
"SYNC_PERIOD=" + "10s",
"SYNC_PERIOD=" + "30m",
"NAME=" + controllerImageRepo,
"VERSION=" + controllerImageTag,
"RUNNER_TAG=" + runnerImageTag,