Quote vars in entrypoint.sh to prevent unwanted argument split (#420)

Prevents arguments from being split when, for example, the RUNNER_GROUP variable contains spaces, which is legitimate: such groups can be created in GitHub. I've seen that all workers whose group names contain no spaces register successfully, while all workers whose group names contain spaces fail to register. Furthermore, I suppose other characters, e.g. the pipe symbol, could also be used here to inject arbitrary commands in an unsupported way. Quoting the vars correctly should prevent that and allow for, e.g., group names and runner labels with spaces and other bash reserved characters.
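
A minimal sketch of the splitting behaviour described above. It is not taken from entrypoint.sh: `count_args` is a hypothetical helper that only reports how many arguments it received, the group name is an example value, and `--runnergroup` is passed purely for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of entrypoint.sh): prints its argument count.
count_args() { echo "$#"; }

# Example value (an assumption): a runner group name that contains spaces.
RUNNER_GROUP="My Runner Group"

# Unquoted expansion: the shell splits the value on whitespace,
# so the helper sees the flag plus one argument per word.
unquoted=$(count_args --runnergroup $RUNNER_GROUP)

# Quoted expansion: the value stays a single argument.
quoted=$(count_args --runnergroup "$RUNNER_GROUP")

echo "unquoted: $unquoted arguments"
echo "quoted:   $quoted arguments"
```

Unquoted expansions are also subject to pathname expansion, so a value containing `*` may be replaced by matching file names. Quoting avoids that as well.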
Fix MTU configuration for dockerd (#421)
2025-12-10 11:41:27 +00:00 · 2021-03-31 10:09:08 +09:00 · 2021-03-31 09:29:21 +09:00 · 2021-03-31 09:23:16 +09:00 · 2021-03-29 10:08:21 +09:00 · 2021-03-25 10:23:36 +09:00
23 changed files with 601 additions and 211 deletions
--- a/72
+++ b/72
@@ -14,6 +14,8 @@ else
 GOBIN=$(shell go env GOBIN)
 endif

+TEST_ASSETS=$(PWD)/test-assets
+
 # default list of platforms for which multiarch image is built
 ifeq (${PLATFORMS}, )
 	export PLATFORMS="linux/amd64,linux/arm64"
@@ -37,6 +39,13 @@ all: manager
 test: generate fmt vet manifests
 	go test ./... -coverprofile cover.out

+test-with-deps: kube-apiserver etcd kubectl
+	# See https://pkg.go.dev/sigs.k8s.io/controller-runtime/pkg/envtest#pkg-constants
+	TEST_ASSET_KUBE_APISERVER=$(KUBE_APISERVER_BIN) \
+	TEST_ASSET_ETCD=$(ETCD_BIN) \
+	TEST_ASSET_KUBECTL=$(KUBECTL_BIN) \
+	  make test
+
 # Build manager binary
 manager: generate fmt vet
 	go build -o bin/manager main.go
@@ -191,3 +200,66 @@ ifeq (, $(wildcard $(GOBIN)/yq))
 	}
 endif
 YQ=$(GOBIN)/yq
+
+OS_NAME := $(shell uname -s | tr A-Z a-z)
+
+# find or download etcd
+etcd:
+ifeq (, $(wildcard $(TEST_ASSETS)/etcd))
+	@{ \
+	set -xe ;\
+	INSTALL_TMP_DIR=$$(mktemp -d) ;\
+	cd $$INSTALL_TMP_DIR ;\
+	wget https://github.com/kubernetes-sigs/kubebuilder/releases/download/v2.3.2/kubebuilder_2.3.2_$(OS_NAME)_amd64.tar.gz ;\
+	mkdir -p $(TEST_ASSETS) ;\
+	tar zxvf kubebuilder_2.3.2_$(OS_NAME)_amd64.tar.gz ;\
+	mv kubebuilder_2.3.2_$(OS_NAME)_amd64/bin/etcd $(TEST_ASSETS)/etcd ;\
+	mv kubebuilder_2.3.2_$(OS_NAME)_amd64/bin/kube-apiserver $(TEST_ASSETS)/kube-apiserver ;\
+	mv kubebuilder_2.3.2_$(OS_NAME)_amd64/bin/kubectl $(TEST_ASSETS)/kubectl ;\
+	rm -rf $$INSTALL_TMP_DIR ;\
+	}
+ETCD_BIN=$(TEST_ASSETS)/etcd
+else
+ETCD_BIN=$(TEST_ASSETS)/etcd
+endif
+
+# find or download kube-apiserver
+kube-apiserver:
+ifeq (, $(wildcard $(TEST_ASSETS)/kube-apiserver))
+	@{ \
+	set -xe ;\
+	INSTALL_TMP_DIR=$$(mktemp -d) ;\
+	cd $$INSTALL_TMP_DIR ;\
+	wget https://github.com/kubernetes-sigs/kubebuilder/releases/download/v2.3.2/kubebuilder_2.3.2_$(OS_NAME)_amd64.tar.gz ;\
+	mkdir -p $(TEST_ASSETS) ;\
+	tar zxvf kubebuilder_2.3.2_$(OS_NAME)_amd64.tar.gz ;\
+	mv kubebuilder_2.3.2_$(OS_NAME)_amd64/bin/etcd $(TEST_ASSETS)/etcd ;\
+	mv kubebuilder_2.3.2_$(OS_NAME)_amd64/bin/kube-apiserver $(TEST_ASSETS)/kube-apiserver ;\
+	mv kubebuilder_2.3.2_$(OS_NAME)_amd64/bin/kubectl $(TEST_ASSETS)/kubectl ;\
+	rm -rf $$INSTALL_TMP_DIR ;\
+	}
+KUBE_APISERVER_BIN=$(TEST_ASSETS)/kube-apiserver
+else
+KUBE_APISERVER_BIN=$(TEST_ASSETS)/kube-apiserver
+endif
+
+
+# find or download kubectl
+kubectl:
+ifeq (, $(wildcard $(TEST_ASSETS)/kubectl))
+	@{ \
+	set -xe ;\
+	INSTALL_TMP_DIR=$$(mktemp -d) ;\
+	cd $$INSTALL_TMP_DIR ;\
+	wget https://github.com/kubernetes-sigs/kubebuilder/releases/download/v2.3.2/kubebuilder_2.3.2_$(OS_NAME)_amd64.tar.gz ;\
+	mkdir -p $(TEST_ASSETS) ;\
+	tar zxvf kubebuilder_2.3.2_$(OS_NAME)_amd64.tar.gz ;\
+	mv kubebuilder_2.3.2_$(OS_NAME)_amd64/bin/etcd $(TEST_ASSETS)/etcd ;\
+	mv kubebuilder_2.3.2_$(OS_NAME)_amd64/bin/kube-apiserver $(TEST_ASSETS)/kube-apiserver ;\
+	mv kubebuilder_2.3.2_$(OS_NAME)_amd64/bin/kubectl $(TEST_ASSETS)/kubectl ;\
+	rm -rf $$INSTALL_TMP_DIR ;\
+	}
+KUBECTL_BIN=$(TEST_ASSETS)/kubectl
+else
+KUBECTL_BIN=$(TEST_ASSETS)/kubectl
+endif
--- a/README.md
+++ b/README.md
@@ -163,7 +163,7 @@ Log-in to a GitHub account that has `admin` privileges for the repository, and [

 * repo (Full control)

-**Scopes for a Organisation Runner**
+**Scopes for a Organization Runner**

 * repo (Full control)
 * admin:org (Full control)
@@ -419,11 +419,11 @@ spec:
 > Please get prepared to put some time and effort to learn and leverage this feature!

 `actions-runner-controller` has an optional Webhook server that receives GitHub Webhook events and scale
-[`RunnerDeployment`s](#runnerdeployments) by updating corresponding [`HorizontalRunnerAutoscaler`s](#autoscaling).
+[`RunnerDeployments`](#runnerdeployments) by updating corresponding [`HorizontalRunnerAutoscalers`](#autoscaling).

 Today, the Webhook server can be configured to respond GitHub `check_run`, `pull_request`, and `push` events
 by scaling up the matching `HorizontalRunnerAutoscaler` by N replica(s), where `N` is configurable within
-`HorizontalRunerAutoscaler`'s `Spec`.
+`HorizontalRunerAutoscaler's` `Spec`.

 More concretely, you can configure the targeted GitHub event types and the `N` in
 `scaleUpTriggers`:
--- a/acceptance/testdata/runnerdeploy.yaml
+++ b/acceptance/testdata/runnerdeploy.yaml
@@ -7,3 +7,14 @@ spec:
  template:
    spec:
      repository: mumoshu/actions-runner-controller-ci
+      #
+      # dockerd within runner container
+      #
+      ## Replace `mumoshu/actions-runner-dind:dev` with your dind image
+      #dockerdWithinRunnerContainer: true
+      #image: mumoshu/actions-runner-dind:dev
+
+      #
+      # Set the MTU used by dockerd-managed network interfaces (including docker-build)
+      #
+      #dockerMTU: 1450
--- a/api/v1alpha1/runner_types.go
+++ b/api/v1alpha1/runner_types.go
@@ -130,6 +130,7 @@ type RunnerStatus struct {
 	// +optional
 	Message string `json:"message,omitempty"`
 	// +optional
+	// +nullable
 	LastRegistrationCheckTime *metav1.Time `json:"lastRegistrationCheckTime,omitempty"`
 }

--- a/charts/actions-runner-controller/Chart.yaml
+++ b/charts/actions-runner-controller/Chart.yaml
@@ -15,7 +15,7 @@ type: application
 # This is the chart version. This version number should be incremented each time you make changes
 # to the chart and its templates, including the app version.
 # Versions are expected to follow Semantic Versioning (https://semver.org/)
-version: 0.10.3
+version: 0.10.4

 home: https://github.com/summerwind/actions-runner-controller

--- a/charts/actions-runner-controller/crds/actions.summerwind.dev_runners.yaml
+++ b/charts/actions-runner-controller/crds/actions.summerwind.dev_runners.yaml
@@ -1543,6 +1543,7 @@ spec:
          properties:
            lastRegistrationCheckTime:
              format: date-time
+              nullable: true
              type: string
            message:
              type: string
--- a/config/crd/bases/actions.summerwind.dev_runners.yaml
+++ b/config/crd/bases/actions.summerwind.dev_runners.yaml
@@ -1543,6 +1543,7 @@ spec:
          properties:
            lastRegistrationCheckTime:
              format: date-time
+              nullable: true
              type: string
            message:
              type: string
--- a/controllers/autoscaling.go
+++ b/controllers/autoscaling.go
@@ -34,7 +34,7 @@ func getValueAvailableAt(now time.Time, from, to *time.Time, reservedValue int)
 	return &reservedValue
 }

-func (r *HorizontalRunnerAutoscalerReconciler) getDesiredReplicasFromCache(hra v1alpha1.HorizontalRunnerAutoscaler) *int {
+func (r *HorizontalRunnerAutoscalerReconciler) fetchSuggestedReplicasFromCache(hra v1alpha1.HorizontalRunnerAutoscaler) *int {
 	var entry *v1alpha1.CacheEntry

 	for i := range hra.Status.CacheEntries {
@@ -63,7 +63,7 @@ func (r *HorizontalRunnerAutoscalerReconciler) getDesiredReplicasFromCache(hra v
 	return nil
 }

-func (r *HorizontalRunnerAutoscalerReconciler) determineDesiredReplicas(rd v1alpha1.RunnerDeployment, hra v1alpha1.HorizontalRunnerAutoscaler) (*int, error) {
+func (r *HorizontalRunnerAutoscalerReconciler) suggestDesiredReplicas(rd v1alpha1.RunnerDeployment, hra v1alpha1.HorizontalRunnerAutoscaler) (*int, error) {
 	if hra.Spec.MinReplicas == nil {
 		return nil, fmt.Errorf("horizontalrunnerautoscaler %s/%s is missing minReplicas", hra.Namespace, hra.Name)
 	} else if hra.Spec.MaxReplicas == nil {
@@ -73,20 +73,20 @@ func (r *HorizontalRunnerAutoscalerReconciler) determineDesiredReplicas(rd v1alp
 	metrics := hra.Spec.Metrics
 	if len(metrics) == 0 {
 		if len(hra.Spec.ScaleUpTriggers) == 0 {
-			return r.calculateReplicasByQueuedAndInProgressWorkflowRuns(rd, hra)
+			return r.suggestReplicasByQueuedAndInProgressWorkflowRuns(rd, hra)
 		}

-		return hra.Spec.MinReplicas, nil
+		return nil, nil
 	} else if metrics[0].Type == v1alpha1.AutoscalingMetricTypeTotalNumberOfQueuedAndInProgressWorkflowRuns {
-		return r.calculateReplicasByQueuedAndInProgressWorkflowRuns(rd, hra)
+		return r.suggestReplicasByQueuedAndInProgressWorkflowRuns(rd, hra)
 	} else if metrics[0].Type == v1alpha1.AutoscalingMetricTypePercentageRunnersBusy {
-		return r.calculateReplicasByPercentageRunnersBusy(rd, hra)
+		return r.suggestReplicasByPercentageRunnersBusy(rd, hra)
 	} else {
 		return nil, fmt.Errorf("validting autoscaling metrics: unsupported metric type %q", metrics[0].Type)
 	}
 }

-func (r *HorizontalRunnerAutoscalerReconciler) calculateReplicasByQueuedAndInProgressWorkflowRuns(rd v1alpha1.RunnerDeployment, hra v1alpha1.HorizontalRunnerAutoscaler) (*int, error) {
+func (r *HorizontalRunnerAutoscalerReconciler) suggestReplicasByQueuedAndInProgressWorkflowRuns(rd v1alpha1.RunnerDeployment, hra v1alpha1.HorizontalRunnerAutoscaler) (*int, error) {

 	var repos [][]string
 	metrics := hra.Spec.Metrics
@@ -101,7 +101,7 @@ func (r *HorizontalRunnerAutoscalerReconciler) calculateReplicasByQueuedAndInPro
 		// we assume that the desired replicas should always be `minReplicas + capacityReservedThroughWebhook`.
 		// See https://github.com/summerwind/actions-runner-controller/issues/377#issuecomment-793372693
 		if len(metrics) == 0 {
-			return hra.Spec.MinReplicas, nil
+			return nil, nil
 		}

 		if len(metrics[0].RepositoryNames) == 0 {
@@ -178,28 +178,10 @@ func (r *HorizontalRunnerAutoscalerReconciler) calculateReplicasByQueuedAndInPro
 		}
 	}

-	minReplicas := *hra.Spec.MinReplicas
-	maxReplicas := *hra.Spec.MaxReplicas
 	necessaryReplicas := queued + inProgress

-	var desiredReplicas int
-
-	if necessaryReplicas < minReplicas {
-		desiredReplicas = minReplicas
-	} else if necessaryReplicas > maxReplicas {
-		desiredReplicas = maxReplicas
-	} else {
-		desiredReplicas = necessaryReplicas
-	}
-
-	rd.Status.Replicas = &desiredReplicas
-	replicas := desiredReplicas
-
 	r.Log.V(1).Info(
-		"Calculated desired replicas",
-		"computed_replicas_desired", desiredReplicas,
-		"spec_replicas_min", minReplicas,
-		"spec_replicas_max", maxReplicas,
+		fmt.Sprintf("Suggested desired replicas of %d by TotalNumberOfQueuedAndInProgressWorkflowRuns", necessaryReplicas),
 		"workflow_runs_completed", completed,
 		"workflow_runs_in_progress", inProgress,
 		"workflow_runs_queued", queued,
@@ -209,13 +191,11 @@ func (r *HorizontalRunnerAutoscalerReconciler) calculateReplicasByQueuedAndInPro
 		"horizontal_runner_autoscaler", hra.Name,
 	)

-	return &replicas, nil
+	return &necessaryReplicas, nil
 }

-func (r *HorizontalRunnerAutoscalerReconciler) calculateReplicasByPercentageRunnersBusy(rd v1alpha1.RunnerDeployment, hra v1alpha1.HorizontalRunnerAutoscaler) (*int, error) {
+func (r *HorizontalRunnerAutoscalerReconciler) suggestReplicasByPercentageRunnersBusy(rd v1alpha1.RunnerDeployment, hra v1alpha1.HorizontalRunnerAutoscaler) (*int, error) {
 	ctx := context.Background()
-	minReplicas := *hra.Spec.MinReplicas
-	maxReplicas := *hra.Spec.MaxReplicas
 	metrics := hra.Spec.Metrics[0]
 	scaleUpThreshold := defaultScaleUpThreshold
 	scaleDownThreshold := defaultScaleDownThreshold
@@ -363,21 +343,13 @@ func (r *HorizontalRunnerAutoscalerReconciler) calculateReplicasByPercentageRunn
 		desiredReplicas = *rd.Spec.Replicas
 	}

-	if desiredReplicas < minReplicas {
-		desiredReplicas = minReplicas
-	} else if desiredReplicas > maxReplicas {
-		desiredReplicas = maxReplicas
-	}
-
 	// NOTES for operators:
 	//
 	// - num_runners can be as twice as large as replicas_desired_before while
 	//   the runnerdeployment controller is replacing RunnerReplicaSet for runner update.

 	r.Log.V(1).Info(
-		"Calculated desired replicas",
-		"replicas_min", minReplicas,
-		"replicas_max", maxReplicas,
+		fmt.Sprintf("Suggested desired replicas of %d by PercentageRunnersBusy", desiredReplicas),
 		"replicas_desired_before", desiredReplicasBefore,
 		"replicas_desired", desiredReplicas,
 		"num_runners", numRunners,
@@ -391,8 +363,5 @@ func (r *HorizontalRunnerAutoscalerReconciler) calculateReplicasByPercentageRunn
 		"repository", repository,
 	)

-	rd.Status.Replicas = &desiredReplicas
-	replicas := desiredReplicas
-
-	return &replicas, nil
+	return &desiredReplicas, nil
 }
--- a/controllers/autoscaling_test.go
+++ b/controllers/autoscaling_test.go
@@ -224,7 +224,7 @@ func TestDetermineDesiredReplicas_RepositoryRunner(t *testing.T) {
 				},
 			}

-			got, err := h.computeReplicas(rd, hra)
+			got, _, _, err := h.computeReplicasWithCache(log, metav1Now.Time, rd, hra)
 			if err != nil {
 				if tc.err == "" {
 					t.Fatalf("unexpected error: expected none, got %v", err)
@@ -234,12 +234,8 @@ func TestDetermineDesiredReplicas_RepositoryRunner(t *testing.T) {
 				return
 			}

-			if got == nil {
-				t.Fatalf("unexpected value of rs.Spec.Replicas: nil")
-			}
-
-			if *got != tc.want {
-				t.Errorf("%d: incorrect desired replicas: want %d, got %d", i, tc.want, *got)
+			if got != tc.want {
+				t.Errorf("%d: incorrect desired replicas: want %d, got %d", i, tc.want, got)
 			}
 		})
 	}
@@ -424,6 +420,8 @@ func TestDetermineDesiredReplicas_OrganizationalRunner(t *testing.T) {
 		_ = v1alpha1.AddToScheme(scheme)

 		t.Run(fmt.Sprintf("case %d", i), func(t *testing.T) {
+			t.Helper()
+
 			server := fake.NewServer(
 				fake.WithListRepositoryWorkflowRunsResponse(200, tc.workflowRuns, tc.workflowRuns_queued, tc.workflowRuns_in_progress),
 				fake.WithListWorkflowJobsResponse(200, tc.workflowJobs),
@@ -485,7 +483,7 @@ func TestDetermineDesiredReplicas_OrganizationalRunner(t *testing.T) {
 				},
 			}

-			got, err := h.computeReplicas(rd, hra)
+			got, _, _, err := h.computeReplicasWithCache(log, metav1Now.Time, rd, hra)
 			if err != nil {
 				if tc.err == "" {
 					t.Fatalf("unexpected error: expected none, got %v", err)
@@ -495,12 +493,8 @@ func TestDetermineDesiredReplicas_OrganizationalRunner(t *testing.T) {
 				return
 			}

-			if got == nil {
-				t.Fatalf("unexpected value of rs.Spec.Replicas: nil, wanted %v", tc.want)
-			}
-
-			if *got != tc.want {
-				t.Errorf("%d: incorrect desired replicas: want %d, got %d", i, tc.want, *got)
+			if got != tc.want {
+				t.Errorf("%d: incorrect desired replicas: want %d, got %d", i, tc.want, got)
 			}
 		})
 	}
--- a/controllers/horizontalrunnerautoscaler_controller.go
+++ b/controllers/horizontalrunnerautoscaler_controller.go
@@ -19,6 +19,7 @@ package controllers
 import (
 	"context"
 	"fmt"
+	corev1 "k8s.io/api/core/v1"
 	"time"

 	"github.com/summerwind/actions-runner-controller/github"
@@ -30,10 +31,10 @@ import (
 	ctrl "sigs.k8s.io/controller-runtime"
 	"sigs.k8s.io/controller-runtime/pkg/client"

-	corev1 "k8s.io/api/core/v1"
 	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

 	"github.com/summerwind/actions-runner-controller/api/v1alpha1"
+	"github.com/summerwind/actions-runner-controller/controllers/metrics"
 )

 const (
@@ -52,6 +53,8 @@ type HorizontalRunnerAutoscalerReconciler struct {
 	Name          string
 }

+const defaultReplicas = 1
+
 // +kubebuilder:rbac:groups=actions.summerwind.dev,resources=runnerdeployments,verbs=get;list;watch;update;patch
 // +kubebuilder:rbac:groups=actions.summerwind.dev,resources=horizontalrunnerautoscalers,verbs=get;list;watch;create;update;patch;delete
 // +kubebuilder:rbac:groups=actions.summerwind.dev,resources=horizontalrunnerautoscalers/finalizers,verbs=get;list;watch;create;update;patch;delete
@@ -71,6 +74,8 @@ func (r *HorizontalRunnerAutoscalerReconciler) Reconcile(req ctrl.Request) (ctrl
 		return ctrl.Result{}, nil
 	}

+	metrics.SetHorizontalRunnerAutoscalerSpec(hra.ObjectMeta, hra.Spec)
+
 	var rd v1alpha1.RunnerDeployment
 	if err := r.Get(ctx, types.NamespacedName{
 		Namespace: req.Namespace,
@@ -83,41 +88,18 @@ func (r *HorizontalRunnerAutoscalerReconciler) Reconcile(req ctrl.Request) (ctrl
 		return ctrl.Result{}, nil
 	}

-	var replicas *int
-
-	replicasFromCache := r.getDesiredReplicasFromCache(hra)
-
-	if replicasFromCache != nil {
-		replicas = replicasFromCache
-	} else {
-		var err error
-
-		replicas, err = r.computeReplicas(rd, hra)
-		if err != nil {
-			r.Recorder.Event(&hra, corev1.EventTypeNormal, "RunnerAutoscalingFailure", err.Error())
-
-			log.Error(err, "Could not compute replicas")
-
-			return ctrl.Result{}, err
-		}
-	}
-
-	const defaultReplicas = 1
-
-	currentDesiredReplicas := getIntOrDefault(rd.Spec.Replicas, defaultReplicas)
-	newDesiredReplicas := getIntOrDefault(replicas, defaultReplicas)
-
 	now := time.Now()

-	for _, reservation := range hra.Spec.CapacityReservations {
-		if reservation.ExpirationTime.Time.After(now) {
-			newDesiredReplicas += reservation.Replicas
-		}
+	newDesiredReplicas, computedReplicas, computedReplicasFromCache, err := r.computeReplicasWithCache(log, now, rd, hra)
+	if err != nil {
+		r.Recorder.Event(&hra, corev1.EventTypeNormal, "RunnerAutoscalingFailure", err.Error())
+
+		log.Error(err, "Could not compute replicas")
+
+		return ctrl.Result{}, err
 	}

-	if hra.Spec.MaxReplicas != nil && *hra.Spec.MaxReplicas < newDesiredReplicas {
-		newDesiredReplicas = *hra.Spec.MaxReplicas
-	}
+	currentDesiredReplicas := getIntOrDefault(rd.Spec.Replicas, defaultReplicas)

 	// Please add more conditions that we can in-place update the newest runnerreplicaset without disruption
 	if currentDesiredReplicas != newDesiredReplicas {
@@ -143,7 +125,7 @@ func (r *HorizontalRunnerAutoscalerReconciler) Reconcile(req ctrl.Request) (ctrl
 		updated.Status.DesiredReplicas = &newDesiredReplicas
 	}

-	if replicasFromCache == nil {
+	if computedReplicasFromCache == nil {
 		if updated == nil {
 			updated = hra.DeepCopy()
 		}
@@ -160,12 +142,14 @@ func (r *HorizontalRunnerAutoscalerReconciler) Reconcile(req ctrl.Request) (ctrl

 		updated.Status.CacheEntries = append(cacheEntries, v1alpha1.CacheEntry{
 			Key:            v1alpha1.CacheEntryKeyDesiredReplicas,
-			Value:          *replicas,
+			Value:          computedReplicas,
 			ExpirationTime: metav1.Time{Time: time.Now().Add(cacheDuration)},
 		})
 	}

 	if updated != nil {
+		metrics.SetHorizontalRunnerAutoscalerStatus(updated.ObjectMeta, updated.Status)
+
 		if err := r.Status().Patch(ctx, updated, client.MergeFrom(&hra)); err != nil {
 			return ctrl.Result{}, fmt.Errorf("patching horizontalrunnerautoscaler status to add cache entry: %w", err)
 		}
@@ -200,14 +184,59 @@ func (r *HorizontalRunnerAutoscalerReconciler) SetupWithManager(mgr ctrl.Manager
 		Complete(r)
 }

-func (r *HorizontalRunnerAutoscalerReconciler) computeReplicas(rd v1alpha1.RunnerDeployment, hra v1alpha1.HorizontalRunnerAutoscaler) (*int, error) {
-	var computedReplicas *int
-
-	replicas, err := r.determineDesiredReplicas(rd, hra)
-	if err != nil {
-		return nil, err
+func (r *HorizontalRunnerAutoscalerReconciler) computeReplicasWithCache(log logr.Logger, now time.Time, rd v1alpha1.RunnerDeployment, hra v1alpha1.HorizontalRunnerAutoscaler) (int, int, *int, error) {
+	minReplicas := defaultReplicas
+	if hra.Spec.MinReplicas != nil && *hra.Spec.MinReplicas > 0 {
+		minReplicas = *hra.Spec.MinReplicas
 	}

+	var suggestedReplicas int
+
+	suggestedReplicasFromCache := r.fetchSuggestedReplicasFromCache(hra)
+
+	var cached *int
+
+	if suggestedReplicasFromCache != nil {
+		cached = suggestedReplicasFromCache
+
+		if cached == nil {
+			suggestedReplicas = minReplicas
+		} else {
+			suggestedReplicas = *cached
+		}
+	} else {
+		v, err := r.suggestDesiredReplicas(rd, hra)
+		if err != nil {
+			return 0, 0, nil, err
+		}
+
+		if v == nil {
+			suggestedReplicas = minReplicas
+		} else {
+			suggestedReplicas = *v
+		}
+	}
+
+	var reserved int
+
+	for _, reservation := range hra.Spec.CapacityReservations {
+		if reservation.ExpirationTime.Time.After(now) {
+			reserved += reservation.Replicas
+		}
+	}
+
+	newDesiredReplicas := suggestedReplicas + reserved
+
+	if newDesiredReplicas < minReplicas {
+		newDesiredReplicas = minReplicas
+	} else if hra.Spec.MaxReplicas != nil && newDesiredReplicas > *hra.Spec.MaxReplicas {
+		newDesiredReplicas = *hra.Spec.MaxReplicas
+	}
+
+	//
+	// Delay scaling-down for ScaleDownDelaySecondsAfterScaleUp or DefaultScaleDownDelay
+	//
+
 	var scaleDownDelay time.Duration

 	if hra.Spec.ScaleDownDelaySecondsAfterScaleUp != nil {
@@ -216,17 +245,50 @@ func (r *HorizontalRunnerAutoscalerReconciler) computeReplicas(rd v1alpha1.Runne
 		scaleDownDelay = DefaultScaleDownDelay
 	}

-	now := time.Now()
+	var scaleDownDelayUntil *time.Time

 	if hra.Status.DesiredReplicas == nil ||
-		*hra.Status.DesiredReplicas < *replicas ||
-		hra.Status.LastSuccessfulScaleOutTime == nil ||
-		hra.Status.LastSuccessfulScaleOutTime.Add(scaleDownDelay).Before(now) {
+		*hra.Status.DesiredReplicas < newDesiredReplicas ||
+		hra.Status.LastSuccessfulScaleOutTime == nil {

-		computedReplicas = replicas
+	} else if hra.Status.LastSuccessfulScaleOutTime != nil {
+		t := hra.Status.LastSuccessfulScaleOutTime.Add(scaleDownDelay)
+
+		// ScaleDownDelay is not passed
+		if t.After(now) {
+			scaleDownDelayUntil = &t
+			newDesiredReplicas = *hra.Status.DesiredReplicas
+		}
 	} else {
-		computedReplicas = hra.Status.DesiredReplicas
+		newDesiredReplicas = *hra.Status.DesiredReplicas
 	}

-	return computedReplicas, nil
+	//
+	// Logs various numbers for monitoring and debugging purpose
+	//
+
+	kvs := []interface{}{
+		"suggested", suggestedReplicas,
+		"reserved", reserved,
+		"min", minReplicas,
+	}
+
+	if cached != nil {
+		kvs = append(kvs, "cached", *cached)
+	}
+
+	if scaleDownDelayUntil != nil {
+		kvs = append(kvs, "last_scale_up_time", *hra.Status.LastSuccessfulScaleOutTime)
+		kvs = append(kvs, "scale_down_delay_until", scaleDownDelayUntil)
+	}
+
+	if maxReplicas := hra.Spec.MaxReplicas; maxReplicas != nil {
+		kvs = append(kvs, "max", *maxReplicas)
+	}
+
+	log.V(1).Info(fmt.Sprintf("Calculated desired replicas of %d", newDesiredReplicas),
+		kvs...,
+	)
+
+	return newDesiredReplicas, suggestedReplicas, suggestedReplicasFromCache, nil
 }
--- a/controllers/integration_test.go
+++ b/controllers/integration_test.go
@@ -71,7 +71,9 @@ func SetupIntegrationTest(ctx context.Context) *testEnvironment {
 		err := k8sClient.Create(ctx, ns)
 		Expect(err).NotTo(HaveOccurred(), "failed to create test namespace")

-		mgr, err := ctrl.NewManager(cfg, ctrl.Options{})
+		mgr, err := ctrl.NewManager(cfg, ctrl.Options{
+			Namespace: ns.Name,
+		})
 		Expect(err).NotTo(HaveOccurred(), "failed to create manager")

 		responses := &fake.FixedResponses{}
@@ -97,6 +99,21 @@ func SetupIntegrationTest(ctx context.Context) *testEnvironment {
 			return fmt.Sprintf("%s%s", ns.Name, name)
 		}

+		runnerController := &RunnerReconciler{
+			Client:                      mgr.GetClient(),
+			Scheme:                      scheme.Scheme,
+			Log:                         logf.Log,
+			Recorder:                    mgr.GetEventRecorderFor("runnerreplicaset-controller"),
+			GitHubClient:                env.ghClient,
+			RunnerImage:                 "example/runner:test",
+			DockerImage:                 "example/docker:test",
+			Name:                        controllerName("runner"),
+			RegistrationRecheckInterval: time.Millisecond,
+			RegistrationRecheckJitter:   time.Millisecond,
+		}
+		err = runnerController.SetupWithManager(mgr)
+		Expect(err).NotTo(HaveOccurred(), "failed to setup runner controller")
+
 		replicasetController := &RunnerReplicaSetReconciler{
 			Client:       mgr.GetClient(),
 			Scheme:       scheme.Scheme,
@@ -106,7 +123,7 @@ func SetupIntegrationTest(ctx context.Context) *testEnvironment {
 			Name:         controllerName("runnerreplicaset"),
 		}
 		err = replicasetController.SetupWithManager(mgr)
-		Expect(err).NotTo(HaveOccurred(), "failed to setup controller")
+		Expect(err).NotTo(HaveOccurred(), "failed to setup runnerreplicaset controller")

 		deploymentsController := &RunnerDeploymentReconciler{
 			Client:   mgr.GetClient(),
@@ -116,7 +133,7 @@ func SetupIntegrationTest(ctx context.Context) *testEnvironment {
 			Name:     controllerName("runnnerdeployment"),
 		}
 		err = deploymentsController.SetupWithManager(mgr)
-		Expect(err).NotTo(HaveOccurred(), "failed to setup controller")
+		Expect(err).NotTo(HaveOccurred(), "failed to setup runnerdeployment controller")

 		autoscalerController := &HorizontalRunnerAutoscalerReconciler{
 			Client:        mgr.GetClient(),
@@ -128,7 +145,7 @@ func SetupIntegrationTest(ctx context.Context) *testEnvironment {
 			Name:          controllerName("horizontalrunnerautoscaler"),
 		}
 		err = autoscalerController.SetupWithManager(mgr)
-		Expect(err).NotTo(HaveOccurred(), "failed to setup controller")
+		Expect(err).NotTo(HaveOccurred(), "failed to setup autoscaler controller")

 		autoscalerWebhook := &HorizontalRunnerAutoscalerGitHubWebhook{
 			Client:    mgr.GetClient(),
@@ -475,10 +492,9 @@ var _ = Context("INTEGRATION: Inside of a new namespace", func() {

 				ExpectRunnerSetsCountEventuallyEquals(ctx, ns.Name, 1)
 				ExpectRunnerSetsManagedReplicasCountEventuallyEquals(ctx, ns.Name, 3)
-			}
-
-			{
 				env.ExpectRegisteredNumberCountEventuallyEquals(3, "count of fake list runners")
+				env.SyncRunnerRegistrations()
+				ExpectRunnerCountEventuallyEquals(ctx, ns.Name, 3)
 			}

 			// Scale-up to 4 replicas on first check_run create webhook event
@@ -486,19 +502,19 @@ var _ = Context("INTEGRATION: Inside of a new namespace", func() {
 				env.SendOrgCheckRunEvent("test", "valid", "pending", "created")
 				ExpectRunnerSetsCountEventuallyEquals(ctx, ns.Name, 1, "runner sets after webhook")
 				ExpectRunnerSetsManagedReplicasCountEventuallyEquals(ctx, ns.Name, 4, "runners after first webhook event")
-			}
-
-			{
 				env.ExpectRegisteredNumberCountEventuallyEquals(4, "count of fake list runners")
+				env.SyncRunnerRegistrations()
+				ExpectRunnerCountEventuallyEquals(ctx, ns.Name, 4)
 			}

 			// Scale-up to 5 replicas on second check_run create webhook event
 			{
 				env.SendOrgCheckRunEvent("test", "valid", "pending", "created")
 				ExpectRunnerSetsManagedReplicasCountEventuallyEquals(ctx, ns.Name, 5, "runners after second webhook event")
+				env.ExpectRegisteredNumberCountEventuallyEquals(5, "count of fake list runners")
+				env.SyncRunnerRegistrations()
+				ExpectRunnerCountEventuallyEquals(ctx, ns.Name, 5)
 			}
-
-			env.ExpectRegisteredNumberCountEventuallyEquals(5, "count of fake list runners")
 		})

 		It("should create and scale organization's repository runners only on check_run event", func() {
@@ -1228,6 +1244,44 @@ func ExpectRunnerSetsCountEventuallyEquals(ctx context.Context, ns string, count
 		time.Second*10, time.Millisecond*500).Should(BeEquivalentTo(count), optionalDescription...)
 }

+func ExpectRunnerCountEventuallyEquals(ctx context.Context, ns string, count int, optionalDescription ...interface{}) {
+	runners := actionsv1alpha1.RunnerList{Items: []actionsv1alpha1.Runner{}}
+
+	EventuallyWithOffset(
+		1,
+		func() int {
+			err := k8sClient.List(ctx, &runners, client.InNamespace(ns))
+			if err != nil {
+				logf.Log.Error(err, "list runner sets")
+			}
+
+			var running int
+
+			for _, r := range runners.Items {
+				if r.Status.Phase == string(corev1.PodRunning) {
+					running++
+				} else {
+					var pod corev1.Pod
+					if err := k8sClient.Get(ctx, types.NamespacedName{Namespace: ns, Name: r.Name}, &pod); err != nil {
+						logf.Log.Error(err, "simulating pod controller")
+						continue
+					}
+
+					copy := pod.DeepCopy()
+					copy.Status.Phase = corev1.PodRunning
+
+					if err := k8sClient.Status().Patch(ctx, copy, client.MergeFrom(&pod)); err != nil {
+						logf.Log.Error(err, "simulating pod controller")
+						continue
+					}
+				}
+			}
+
+			return running
+		},
+		time.Second*10, time.Millisecond*500).Should(BeEquivalentTo(count), optionalDescription...)
+}
+
 func ExpectRunnerSetsManagedReplicasCountEventuallyEquals(ctx context.Context, ns string, count int, optionalDescription ...interface{}) {
 	runnerSets := actionsv1alpha1.RunnerReplicaSetList{Items: []actionsv1alpha1.RunnerReplicaSet{}}

--- a/controllers/metrics/horizontalrunnerautoscaler.go
+++ b/controllers/metrics/horizontalrunnerautoscaler.go
@@ -0,0 +1,67 @@
+package metrics
+
+import (
+	"github.com/prometheus/client_golang/prometheus"
+	"github.com/summerwind/actions-runner-controller/api/v1alpha1"
+	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
+)
+
+const (
+	hraName      = "horizontalrunnerautoscaler"
+	hraNamespace = "namespace"
+)
+
+var (
+	horizontalRunnerAutoscalerMetrics = []prometheus.Collector{
+		horizontalRunnerAutoscalerMinReplicas,
+		horizontalRunnerAutoscalerMaxReplicas,
+		horizontalRunnerAutoscalerDesiredReplicas,
+	}
+)
+
+var (
+	horizontalRunnerAutoscalerMinReplicas = prometheus.NewGaugeVec(
+		prometheus.GaugeOpts{
+			Name: "horizontalrunnerautoscaler_spec_min_replicas",
+			Help: "minReplicas of HorizontalRunnerAutoscaler",
+		},
+		[]string{hraName, hraNamespace},
+	)
+	horizontalRunnerAutoscalerMaxReplicas = prometheus.NewGaugeVec(
+		prometheus.GaugeOpts{
+			Name: "horizontalrunnerautoscaler_spec_max_replicas",
+			Help: "maxReplicas of HorizontalRunnerAutoscaler",
+		},
+		[]string{hraName, hraNamespace},
+	)
+	horizontalRunnerAutoscalerDesiredReplicas = prometheus.NewGaugeVec(
+		prometheus.GaugeOpts{
+			Name: "horizontalrunnerautoscaler_status_desired_replicas",
+			Help: "desiredReplicas of HorizontalRunnerAutoscaler",
+		},
+		[]string{hraName, hraNamespace},
+	)
+)
+
+func SetHorizontalRunnerAutoscalerSpec(o metav1.ObjectMeta, spec v1alpha1.HorizontalRunnerAutoscalerSpec) {
+	labels := prometheus.Labels{
+		hraName:      o.Name,
+		hraNamespace: o.Namespace,
+	}
+	if spec.MaxReplicas != nil {
+		horizontalRunnerAutoscalerMaxReplicas.With(labels).Set(float64(*spec.MaxReplicas))
+	}
+	if spec.MinReplicas != nil {
+		horizontalRunnerAutoscalerMinReplicas.With(labels).Set(float64(*spec.MinReplicas))
+	}
+}
+
+func SetHorizontalRunnerAutoscalerStatus(o metav1.ObjectMeta, status v1alpha1.HorizontalRunnerAutoscalerStatus) {
+	labels := prometheus.Labels{
+		hraName:      o.Name,
+		hraNamespace: o.Namespace,
+	}
+	if status.DesiredReplicas != nil {
+		horizontalRunnerAutoscalerDesiredReplicas.With(labels).Set(float64(*status.DesiredReplicas))
+	}
+}
--- a/controllers/metrics/metrics.go
+++ b/controllers/metrics/metrics.go
@@ -0,0 +1,14 @@
+// Package metrics provides the metrics of custom resources such as HRA.
+//
+// This depends on the metrics exporter of kubebuilder.
+// See https://book.kubebuilder.io/reference/metrics.html for details.
+package metrics
+
+import (
+	"sigs.k8s.io/controller-runtime/pkg/metrics"
+)
+
+func init() {
+	metrics.Registry.MustRegister(runnerDeploymentMetrics...)
+	metrics.Registry.MustRegister(horizontalRunnerAutoscalerMetrics...)
+}
--- a/controllers/metrics/runnerdeployment.go
+++ b/controllers/metrics/runnerdeployment.go
@@ -0,0 +1,37 @@
+package metrics
+
+import (
+	"github.com/prometheus/client_golang/prometheus"
+	"github.com/summerwind/actions-runner-controller/api/v1alpha1"
+)
+
+const (
+	rdName      = "runnerdeployment"
+	rdNamespace = "namespace"
+)
+
+var (
+	runnerDeploymentMetrics = []prometheus.Collector{
+		runnerDeploymentReplicas,
+	}
+)
+
+var (
+	runnerDeploymentReplicas = prometheus.NewGaugeVec(
+		prometheus.GaugeOpts{
+			Name: "runnerdeployment_spec_replicas",
+			Help: "replicas of RunnerDeployment",
+		},
+		[]string{rdName, rdNamespace},
+	)
+)
+
+func SetRunnerDeployment(rd v1alpha1.RunnerDeployment) {
+	labels := prometheus.Labels{
+		rdName:      rd.Name,
+		rdNamespace: rd.Namespace,
+	}
+	if rd.Spec.Replicas != nil {
+		runnerDeploymentReplicas.With(labels).Set(float64(*rd.Spec.Replicas))
+	}
+}
--- a/controllers/runner_controller.go
+++ b/controllers/runner_controller.go
@@ -52,12 +52,15 @@ const (
 // RunnerReconciler reconciles a Runner object
 type RunnerReconciler struct {
 	client.Client
-	Log          logr.Logger
-	Recorder     record.EventRecorder
-	Scheme       *runtime.Scheme
-	GitHubClient *github.Client
-	RunnerImage  string
-	DockerImage  string
+	Log                         logr.Logger
+	Recorder                    record.EventRecorder
+	Scheme                      *runtime.Scheme
+	GitHubClient                *github.Client
+	RunnerImage                 string
+	DockerImage                 string
+	Name                        string
+	RegistrationRecheckInterval time.Duration
+	RegistrationRecheckJitter   time.Duration
 }

 // +kubebuilder:rbac:groups=actions.summerwind.dev,resources=runners,verbs=get;list;watch;create;update;patch;delete
@@ -164,9 +167,11 @@ func (r *RunnerReconciler) Reconcile(req ctrl.Request) (ctrl.Result, error) {
 				// Gracefully handle pod-already-exists errors due to informer cache delay.
 				// Without this we got a few errors like the below on new runner pod:
 				// 2021-03-16T00:23:10.116Z        ERROR   controller-runtime.controller   Reconciler error      {"controller": "runner-controller", "request": "default/example-runnerdeploy-b2g2g-j4mcp", "error": "pods \"example-runnerdeploy-b2g2g-j4mcp\" already exists"}
-				log.Info("Runner pod already exists. Probably this pod has been already created in previous reconcilation but the new pod is not yet cached.")
+				log.Info(
+					"Failed to create pod due to AlreadyExists error. Probably this pod has been already created in previous reconcilation but is still not in the informer cache. Will retry on pod created. If it doesn't repeat, there's no problem",
+				)

-				return ctrl.Result{RequeueAfter: 10 * time.Second}, nil
+				return ctrl.Result{}, nil
 			}

 			log.Error(err, "Failed to create pod resource")
@@ -184,7 +189,7 @@ func (r *RunnerReconciler) Reconcile(req ctrl.Request) (ctrl.Result, error) {

 			if deletionDidTimeout {
 				log.Info(
-					"Pod failed to delete itself in a timely manner. "+
+					fmt.Sprintf("Failed to delete pod within %s. ", deletionTimeout)+
 						"This is typically the case when a Kubernetes node became unreachable "+
 						"and the kube controller started evicting nodes. Forcefully deleting the pod to not get stuck.",
 					"podDeletionTimestamp", pod.DeletionTimestamp,
@@ -248,6 +253,9 @@ func (r *RunnerReconciler) Reconcile(req ctrl.Request) (ctrl.Result, error) {
 		// saving API calls and scary log messages
 		if !restart {
 			registrationCheckInterval := time.Minute
+			if r.RegistrationRecheckInterval > 0 {
+				registrationCheckInterval = r.RegistrationRecheckInterval
+			}

 			// We want to call ListRunners GitHub Actions API only once per runner per minute.
 			// This if block, in conjunction with:
@@ -255,15 +263,28 @@ func (r *RunnerReconciler) Reconcile(req ctrl.Request) (ctrl.Result, error) {
 			// achieves that.
 			if lastCheckTime := runner.Status.LastRegistrationCheckTime; lastCheckTime != nil {
 				nextCheckTime := lastCheckTime.Add(registrationCheckInterval)
-				if nextCheckTime.After(time.Now()) {
+				now := time.Now()
+
+				// A requeue scheduled by RequeueAfter can fire slightly early (by dozens of milliseconds),
+				// so to avoid excessive, ineffective retries, we heuristically ignore the remaining delay
+				// when it is shorter than 1s.
+				requeueAfter := nextCheckTime.Sub(now) - time.Second
+				if requeueAfter > 0 {
 					log.Info(
-						fmt.Sprintf("Skipping registration check because it's deferred until %s", nextCheckTime),
+						fmt.Sprintf("Skipped registration check because it's deferred until %s. Retrying in %s at latest", nextCheckTime, requeueAfter),
+						"lastRegistrationCheckTime", lastCheckTime,
+						"registrationCheckInterval", registrationCheckInterval,
 					)

-					// Note that we don't need to explicitly requeue on this reconcilation because
-					// the requeue should have been already scheduled previsouly
-					// (with `return ctrl.Result{RequeueAfter: registrationRecheckDelay}, nil` as noted above and coded below)
-					return ctrl.Result{}, nil
+					// Without RequeueAfter, the controller may not retry on schedule. Instead, it must wait until the
+					// next sync period passes, which can be much later than nextCheckTime.
+					//
+					// We need to requeue on this reconciliation even though we have already scheduled the initial
+					// requeue previously with `return ctrl.Result{RequeueAfter: registrationRecheckDelay}, nil`.
+					// Apparently, the workqueue used by controller-runtime deduplicates requeues and resets the delay
+					// on other requeues, so the initially scheduled requeue may have been reset by a requeue on
+					// spec/status change.
+					return ctrl.Result{RequeueAfter: requeueAfter}, nil
 				}
 			}

@@ -354,7 +375,12 @@ func (r *RunnerReconciler) Reconcile(req ctrl.Request) (ctrl.Result, error) {
 			}

 			if (notFound || offline) && !registrationDidTimeout {
-				registrationRecheckDelay = registrationCheckInterval + wait.Jitter(10*time.Second, 0.1)
+				registrationRecheckJitter := 10 * time.Second
+				if r.RegistrationRecheckJitter > 0 {
+					registrationRecheckJitter = r.RegistrationRecheckJitter
+				}
+
+				registrationRecheckDelay = registrationCheckInterval + wait.Jitter(registrationRecheckJitter, 0.1)
 			}
 		}

@@ -608,45 +634,58 @@ func (r *RunnerReconciler) newPod(runner v1alpha1.Runner) (corev1.Pod, error) {
 		}...)
 	}

-	if !dockerdInRunner && dockerEnabled {
-		runnerVolumeName := "runner"
-		runnerVolumeMountPath := "/runner"
+	//
+	// /runner must be generated on runtime from /runnertmp embedded in the container image.
+	//
+	// When you're NOT using dindWithinRunner=true,
+	// it must also be shared with the dind container, as that appears to be required to run docker steps.
+	//

-		pod.Spec.Volumes = []corev1.Volume{
-			{
+	runnerVolumeName := "runner"
+	runnerVolumeMountPath := "/runner"
+
+	pod.Spec.Volumes = append(pod.Spec.Volumes,
+		corev1.Volume{
+			Name: runnerVolumeName,
+			VolumeSource: corev1.VolumeSource{
+				EmptyDir: &corev1.EmptyDirVolumeSource{},
+			},
+		},
+	)
+
+	pod.Spec.Containers[0].VolumeMounts = append(pod.Spec.Containers[0].VolumeMounts,
+		corev1.VolumeMount{
+			Name:      runnerVolumeName,
+			MountPath: runnerVolumeMountPath,
+		},
+	)
+
+	if !dockerdInRunner && dockerEnabled {
+		pod.Spec.Volumes = append(pod.Spec.Volumes,
+			corev1.Volume{
 				Name: "work",
 				VolumeSource: corev1.VolumeSource{
 					EmptyDir: &corev1.EmptyDirVolumeSource{},
 				},
 			},
-			{
-				Name: runnerVolumeName,
-				VolumeSource: corev1.VolumeSource{
-					EmptyDir: &corev1.EmptyDirVolumeSource{},
-				},
-			},
-			{
+			corev1.Volume{
 				Name: "certs-client",
 				VolumeSource: corev1.VolumeSource{
 					EmptyDir: &corev1.EmptyDirVolumeSource{},
 				},
 			},
-		}
-		pod.Spec.Containers[0].VolumeMounts = []corev1.VolumeMount{
-			{
+		)
+		pod.Spec.Containers[0].VolumeMounts = append(pod.Spec.Containers[0].VolumeMounts,
+			corev1.VolumeMount{
 				Name:      "work",
 				MountPath: workDir,
 			},
-			{
-				Name:      runnerVolumeName,
-				MountPath: runnerVolumeMountPath,
-			},
-			{
+			corev1.VolumeMount{
 				Name:      "certs-client",
 				MountPath: "/certs/client",
 				ReadOnly:  true,
 			},
-		}
+		)
 		pod.Spec.Containers[0].Env = append(pod.Spec.Containers[0].Env, []corev1.EnvVar{
 			{
 				Name:  "DOCKER_HOST",
@@ -664,6 +703,7 @@ func (r *RunnerReconciler) newPod(runner v1alpha1.Runner) (corev1.Pod, error) {
 		pod.Spec.Containers = append(pod.Spec.Containers, corev1.Container{
 			Name:  "docker",
 			Image: r.DockerImage,
+			Args:  []string{"dockerd"},
 			VolumeMounts: []corev1.VolumeMount{
 				{
 					Name:      "work",
@@ -692,11 +732,17 @@ func (r *RunnerReconciler) newPod(runner v1alpha1.Runner) (corev1.Pod, error) {

 		if mtu := runner.Spec.DockerMTU; mtu != nil {
 			pod.Spec.Containers[1].Env = append(pod.Spec.Containers[1].Env, []corev1.EnvVar{
+				// See https://docs.docker.com/engine/security/rootless/
 				{
 					Name:  "DOCKERD_ROOTLESS_ROOTLESSKIT_MTU",
 					Value: fmt.Sprintf("%d", *runner.Spec.DockerMTU),
 				},
 			}...)
+
+			pod.Spec.Containers[1].Args = append(pod.Spec.Containers[1].Args,
+				"--mtu",
+				fmt.Sprintf("%d", *runner.Spec.DockerMTU),
+			)
 		}

 	}
@@ -768,6 +814,9 @@ func (r *RunnerReconciler) newPod(runner v1alpha1.Runner) (corev1.Pod, error) {

 func (r *RunnerReconciler) SetupWithManager(mgr ctrl.Manager) error {
 	name := "runner-controller"
+	if r.Name != "" {
+		name = r.Name
+	}

 	r.Recorder = mgr.GetEventRecorderFor(name)

--- a/controllers/runnerdeployment_controller.go
+++ b/controllers/runnerdeployment_controller.go
@@ -38,6 +38,7 @@ import (
 	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

 	"github.com/summerwind/actions-runner-controller/api/v1alpha1"
+	"github.com/summerwind/actions-runner-controller/controllers/metrics"
 )

 const (
@@ -77,6 +78,8 @@ func (r *RunnerDeploymentReconciler) Reconcile(req ctrl.Request) (ctrl.Result, e
 		return ctrl.Result{}, nil
 	}

+	metrics.SetRunnerDeployment(rd)
+
 	var myRunnerReplicaSetList v1alpha1.RunnerReplicaSetList
 	if err := r.List(ctx, &myRunnerReplicaSetList, client.InNamespace(req.Namespace), client.MatchingFields{runnerSetOwnerKey: req.Name}); err != nil {
 		return ctrl.Result{}, err
@@ -189,17 +192,28 @@ func (r *RunnerDeploymentReconciler) Reconcile(req ctrl.Request) (ctrl.Result, e
 	if len(oldSets) > 0 {
 		readyReplicas := newestSet.Status.ReadyReplicas

-		if readyReplicas < currentDesiredReplicas {
-			log.WithValues("runnerreplicaset", types.NamespacedName{
+		oldSetsCount := len(oldSets)
+
+		logWithDebugInfo := log.WithValues(
+			"newest_runnerreplicaset", types.NamespacedName{
 				Namespace: newestSet.Namespace,
 				Name:      newestSet.Name,
-			}).
-				Info("Waiting until the newest runner replica set to be 100% available",
-					"ready", readyReplicas,
-					"desired", currentDesiredReplicas,
-				)
+			},
+			"newest_runnerreplicaset_replicas_ready", readyReplicas,
+			"newest_runnerreplicaset_replicas_desired", currentDesiredReplicas,
+			"old_runnerreplicasets_count", oldSetsCount,
+		)

-			return ctrl.Result{RequeueAfter: 10 * time.Second}, nil
+		if readyReplicas < currentDesiredReplicas {
+			logWithDebugInfo.
+				Info("Waiting until the newest runnerreplicaset to be 100% available")
+
+			return ctrl.Result{}, nil
+		}
+
+		if oldSetsCount > 0 {
+			logWithDebugInfo.
+				Info("The newest runnerreplicaset is 100% available. Deleting old runnerreplicasets")
 		}

 		for i := range oldSets {
--- a/controllers/runnerdeployment_controller_test.go
+++ b/controllers/runnerdeployment_controller_test.go
@@ -139,7 +139,9 @@ func SetupDeploymentTest(ctx context.Context) *corev1.Namespace {
 		err := k8sClient.Create(ctx, ns)
 		Expect(err).NotTo(HaveOccurred(), "failed to create test namespace")

-		mgr, err := ctrl.NewManager(cfg, ctrl.Options{})
+		mgr, err := ctrl.NewManager(cfg, ctrl.Options{
+			Namespace: ns.Name,
+		})
 		Expect(err).NotTo(HaveOccurred(), "failed to create manager")

 		controller := &RunnerDeploymentReconciler{
@@ -199,7 +201,7 @@ var _ = Context("Inside of a new namespace", func() {
 								},
 							},
 							Spec: actionsv1alpha1.RunnerSpec{
-								Repository: "foo/bar",
+								Repository: "test/valid",
 								Image:      "bar",
 								Env: []corev1.EnvVar{
 									{Name: "FOO", Value: "FOOVALUE"},
@@ -295,7 +297,7 @@ var _ = Context("Inside of a new namespace", func() {
 						Replicas: intPtr(1),
 						Template: actionsv1alpha1.RunnerTemplate{
 							Spec: actionsv1alpha1.RunnerSpec{
-								Repository: "foo/bar",
+								Repository: "test/valid",
 								Image:      "bar",
 								Env: []corev1.EnvVar{
 									{Name: "FOO", Value: "FOOVALUE"},
@@ -391,7 +393,7 @@ var _ = Context("Inside of a new namespace", func() {
 						Replicas: intPtr(1),
 						Template: actionsv1alpha1.RunnerTemplate{
 							Spec: actionsv1alpha1.RunnerSpec{
-								Repository: "foo/bar",
+								Repository: "test/valid",
 								Image:      "bar",
 								Env: []corev1.EnvVar{
 									{Name: "FOO", Value: "FOOVALUE"},
--- a/controllers/runnerreplicaset_controller.go
+++ b/controllers/runnerreplicaset_controller.go
@@ -114,13 +114,14 @@ func (r *RunnerReplicaSetReconciler) Reconcile(req ctrl.Request) (ctrl.Result, e
 		desired = 1
 	}

-	log.V(0).Info("debug", "desired", desired, "available", available)
-
 	if available > desired {
 		n := available - desired

-		// get runners that are currently not busy
-		var notBusy []v1alpha1.Runner
+		log.V(0).Info(fmt.Sprintf("Deleting %d runners", n), "desired", desired, "available", available, "ready", ready)
+
+		// get runners that are currently offline/not busy/timed-out to register
+		var deletionCandidates []v1alpha1.Runner
+
 		for _, runner := range allRunners.Items {
 			busy, err := r.GitHubClient.IsRunnerBusy(ctx, runner.Spec.Enterprise, runner.Spec.Organization, runner.Spec.Repository, runner.Name)
 			if err != nil {
@@ -168,35 +169,37 @@ func (r *RunnerReplicaSetReconciler) Reconcile(req ctrl.Request) (ctrl.Result, e
 						"configuredRegistrationTimeout", registrationTimeout,
 					)

-					notBusy = append(notBusy, runner)
+					deletionCandidates = append(deletionCandidates, runner)
 				}

 				// offline runners should always be a great target for scale down
 				if offline {
-					notBusy = append(notBusy, runner)
+					deletionCandidates = append(deletionCandidates, runner)
 				}
 			} else if !busy {
-				notBusy = append(notBusy, runner)
+				deletionCandidates = append(deletionCandidates, runner)
 			}
 		}

-		if len(notBusy) < n {
-			n = len(notBusy)
+		if len(deletionCandidates) < n {
+			n = len(deletionCandidates)
 		}

 		for i := 0; i < n; i++ {
-			if err := r.Client.Delete(ctx, &notBusy[i]); client.IgnoreNotFound(err) != nil {
+			if err := r.Client.Delete(ctx, &deletionCandidates[i]); client.IgnoreNotFound(err) != nil {
 				log.Error(err, "Failed to delete runner resource")

 				return ctrl.Result{}, err
 			}

-			r.Recorder.Event(&rs, corev1.EventTypeNormal, "RunnerDeleted", fmt.Sprintf("Deleted runner '%s'", notBusy[i].Name))
-			log.Info("Deleted runner", "runnerreplicaset", rs.ObjectMeta.Name)
+			r.Recorder.Event(&rs, corev1.EventTypeNormal, "RunnerDeleted", fmt.Sprintf("Deleted runner '%s'", deletionCandidates[i].Name))
+			log.Info("Deleted runner")
 		}
 	} else if desired > available {
 		n := desired - available

+		log.V(0).Info(fmt.Sprintf("Creating %d runner(s)", n), "desired", desired, "available", available, "ready", ready)
+
 		for i := 0; i < n; i++ {
 			newRunner, err := r.newRunner(rs)
 			if err != nil {
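
The scale-down path in the runnerreplicaset controller hunks above can be sketched in isolation. This is a minimal sketch, not the controller's actual code: the `runner` struct and the `deletionCandidates` function are hypothetical simplifications, and the real controller derives the busy/offline/timed-out state from the GitHub API and the Runner resource in ways this diff only partly shows.

```go
package main

import "fmt"

// runner is a hypothetical, simplified view of a runner. The flags are
// assumptions for illustration; they are not fields of v1alpha1.Runner.
type runner struct {
	Name                 string
	Busy                 bool
	Offline              bool
	RegistrationTimedOut bool
}

// deletionCandidates returns at most (available - desired) runners that are
// offline, timed out while registering, or not busy.
func deletionCandidates(runners []runner, available, desired int) []runner {
	n := available - desired
	if n <= 0 {
		return nil
	}

	var candidates []runner
	for _, r := range runners {
		if r.Offline || r.RegistrationTimedOut || !r.Busy {
			candidates = append(candidates, r)
		}
	}

	// Never delete more runners than there are eligible candidates.
	if len(candidates) < n {
		n = len(candidates)
	}
	return candidates[:n]
}

func main() {
	runners := []runner{
		{Name: "a", Busy: true},
		{Name: "b"},
		{Name: "c", Offline: true},
	}
	for _, r := range deletionCandidates(runners, 3, 1) {
		fmt.Println("would delete", r.Name)
	}
}
```

Capping `n` at the number of candidates mirrors the `len(deletionCandidates) < n` check in the hunk, so a busy runner is never removed just to reach the desired count.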
--- a/controllers/runnerreplicaset_controller_test.go
+++ b/controllers/runnerreplicaset_controller_test.go
@@ -47,7 +47,9 @@ func SetupTest(ctx context.Context) *corev1.Namespace {
 		err := k8sClient.Create(ctx, ns)
 		Expect(err).NotTo(HaveOccurred(), "failed to create test namespace")

-		mgr, err := ctrl.NewManager(cfg, ctrl.Options{})
+		mgr, err := ctrl.NewManager(cfg, ctrl.Options{
+			Namespace: ns.Name,
+		})
 		Expect(err).NotTo(HaveOccurred(), "failed to create manager")

 		runnersList = fake.NewRunnersList()
@@ -127,7 +129,7 @@ var _ = Context("Inside of a new namespace", func() {
 								},
 							},
 							Spec: actionsv1alpha1.RunnerSpec{
-								Repository: "foo/bar",
+								Repository: "test/valid",
 								Image:      "bar",
 								Env: []corev1.EnvVar{
 									{Name: "FOO", Value: "FOOVALUE"},
--- a/controllers/suite_test.go
+++ b/controllers/suite_test.go
@@ -55,9 +55,17 @@ func TestAPIs(t *testing.T) {
 var _ = BeforeSuite(func(done Done) {
 	logf.SetLogger(zap.LoggerTo(GinkgoWriter, true))

+	var apiServerFlags []string
+
+	apiServerFlags = append(apiServerFlags, envtest.DefaultKubeAPIServerFlags...)
+	// Avoids the following error:
+	// 2021-03-19T15:14:11.673+0900    ERROR   controller-runtime.controller   Reconciler error      {"controller": "testns-tvjzjrunner", "request": "testns-gdnyx/example-runnerdeploy-zps4z-j5562", "error": "Pod \"example-runnerdeploy-zps4z-j5562\" is invalid: [spec.containers[1].image: Required value, spec.containers[1].securityContext.privileged: Forbidden: disallowed by cluster policy]"}
+	apiServerFlags = append(apiServerFlags, "--allow-privileged=true")
+
 	By("bootstrapping test environment")
 	testEnv = &envtest.Environment{
-		CRDDirectoryPaths: []string{filepath.Join("..", "config", "crd", "bases")},
+		CRDDirectoryPaths:  []string{filepath.Join("..", "config", "crd", "bases")},
+		KubeAPIServerFlags: apiServerFlags,
 	}

 	var err error
--- a/main.go
+++ b/main.go
@@ -41,8 +41,8 @@ const (
 )

 var (
-	scheme   = runtime.NewScheme()
-	setupLog = ctrl.Log.WithName("setup")
+	scheme = runtime.NewScheme()
+	log    = ctrl.Log.WithName("actions-runner-controller")
 )

 func init() {
@@ -109,13 +109,13 @@ func main() {
 		Namespace:          namespace,
 	})
 	if err != nil {
-		setupLog.Error(err, "unable to start manager")
+		log.Error(err, "unable to start manager")
 		os.Exit(1)
 	}

 	runnerReconciler := &controllers.RunnerReconciler{
 		Client:       mgr.GetClient(),
-		Log:          ctrl.Log.WithName("controllers").WithName("Runner"),
+		Log:          log.WithName("runner"),
 		Scheme:       mgr.GetScheme(),
 		GitHubClient: ghClient,
 		RunnerImage:  runnerImage,
@@ -123,64 +123,64 @@ func main() {
 	}

 	if err = runnerReconciler.SetupWithManager(mgr); err != nil {
-		setupLog.Error(err, "unable to create controller", "controller", "Runner")
+		log.Error(err, "unable to create controller", "controller", "Runner")
 		os.Exit(1)
 	}

 	runnerSetReconciler := &controllers.RunnerReplicaSetReconciler{
 		Client:       mgr.GetClient(),
-		Log:          ctrl.Log.WithName("controllers").WithName("RunnerReplicaSet"),
+		Log:          log.WithName("runnerreplicaset"),
 		Scheme:       mgr.GetScheme(),
 		GitHubClient: ghClient,
 	}

 	if err = runnerSetReconciler.SetupWithManager(mgr); err != nil {
-		setupLog.Error(err, "unable to create controller", "controller", "RunnerReplicaSet")
+		log.Error(err, "unable to create controller", "controller", "RunnerReplicaSet")
 		os.Exit(1)
 	}

 	runnerDeploymentReconciler := &controllers.RunnerDeploymentReconciler{
 		Client:             mgr.GetClient(),
-		Log:                ctrl.Log.WithName("controllers").WithName("RunnerDeployment"),
+		Log:                log.WithName("runnerdeployment"),
 		Scheme:             mgr.GetScheme(),
 		CommonRunnerLabels: commonRunnerLabels,
 	}

 	if err = runnerDeploymentReconciler.SetupWithManager(mgr); err != nil {
-		setupLog.Error(err, "unable to create controller", "controller", "RunnerDeployment")
+		log.Error(err, "unable to create controller", "controller", "RunnerDeployment")
 		os.Exit(1)
 	}

 	horizontalRunnerAutoscaler := &controllers.HorizontalRunnerAutoscalerReconciler{
 		Client:        mgr.GetClient(),
-		Log:           ctrl.Log.WithName("controllers").WithName("HorizontalRunnerAutoscaler"),
+		Log:           log.WithName("horizontalrunnerautoscaler"),
 		Scheme:        mgr.GetScheme(),
 		GitHubClient:  ghClient,
 		CacheDuration: syncPeriod - 10*time.Second,
 	}

 	if err = horizontalRunnerAutoscaler.SetupWithManager(mgr); err != nil {
-		setupLog.Error(err, "unable to create controller", "controller", "HorizontalRunnerAutoscaler")
+		log.Error(err, "unable to create controller", "controller", "HorizontalRunnerAutoscaler")
 		os.Exit(1)
 	}

 	if err = (&actionsv1alpha1.Runner{}).SetupWebhookWithManager(mgr); err != nil {
-		setupLog.Error(err, "unable to create webhook", "webhook", "Runner")
+		log.Error(err, "unable to create webhook", "webhook", "Runner")
 		os.Exit(1)
 	}
 	if err = (&actionsv1alpha1.RunnerDeployment{}).SetupWebhookWithManager(mgr); err != nil {
-		setupLog.Error(err, "unable to create webhook", "webhook", "RunnerDeployment")
+		log.Error(err, "unable to create webhook", "webhook", "RunnerDeployment")
 		os.Exit(1)
 	}
 	if err = (&actionsv1alpha1.RunnerReplicaSet{}).SetupWebhookWithManager(mgr); err != nil {
-		setupLog.Error(err, "unable to create webhook", "webhook", "RunnerReplicaSet")
+		log.Error(err, "unable to create webhook", "webhook", "RunnerReplicaSet")
 		os.Exit(1)
 	}
 	// +kubebuilder:scaffold:builder

-	setupLog.Info("starting manager")
+	log.Info("starting manager")
 	if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
-		setupLog.Error(err, "problem running manager")
+		log.Error(err, "problem running manager")
 		os.Exit(1)
 	}
 }
--- a/runner/entrypoint.sh
+++ b/runner/entrypoint.sh
@@ -29,21 +29,13 @@ else
  exit 1
 fi

-if [ -n "${RUNNER_WORKDIR}" ]; then
-  WORKDIR_ARG="--work ${RUNNER_WORKDIR}"
-fi
-
-if [ -n "${RUNNER_LABELS}" ]; then
-  LABEL_ARG="--labels ${RUNNER_LABELS}"
-fi
-
 if [ -z "${RUNNER_TOKEN}" ]; then
  echo "RUNNER_TOKEN must be set" 1>&2
  exit 1
 fi

 if [ -z "${RUNNER_REPO}" ] && [ -n "${RUNNER_GROUP}" ];then
-  RUNNER_GROUP_ARG="--runnergroup ${RUNNER_GROUP}"
+  RUNNER_GROUPS=${RUNNER_GROUP}
 fi

 # Hack due to https://github.com/summerwind/actions-runner-controller/issues/252#issuecomment-758338483
@@ -56,7 +48,14 @@ sudo chown -R runner:docker /runner
 mv /runnertmp/* /runner/

 cd /runner
-./config.sh --unattended --replace --name "${RUNNER_NAME}" --url "${GITHUB_URL}${ATTACH}" --token "${RUNNER_TOKEN}" ${RUNNER_GROUP_ARG} ${LABEL_ARG} ${WORKDIR_ARG}
+./config.sh --unattended --replace \
+  --name "${RUNNER_NAME}" \
+  --url "${GITHUB_URL}${ATTACH}" \
+  --token "${RUNNER_TOKEN}" \
+  --runnergroup "${RUNNER_GROUPS}" \
+  --labels "${RUNNER_LABELS}" \
+  --work "${RUNNER_WORKDIR}"
+
 mkdir ./externals
 # Hack due to the DinD volumes
 mv ./externalstmp/* ./externals/
--- a/runner/startup.sh
+++ b/runner/startup.sh
@@ -17,6 +17,34 @@ function wait_for_process () {
    return 0
 }

+sudo /bin/bash <<SCRIPT
+mkdir -p /etc/docker
+
+cat <<EOS > /etc/docker/daemon.json
+{
+EOS
+
+if [ -n "${MTU}" ]; then
+cat <<EOS >> /etc/docker/daemon.json
+  "mtu": ${MTU}
+EOS
+# See https://docs.docker.com/engine/security/rootless/
+echo "environment=DOCKERD_ROOTLESS_ROOTLESSKIT_MTU=${MTU}" >> /etc/supervisor/conf.d/dockerd.conf
+fi
+
+cat <<EOS >> /etc/docker/daemon.json
+}
+EOS
+SCRIPT
+
+INFO "Using /etc/docker/daemon.json with the following content"
+
+cat /etc/docker/daemon.json
+
+INFO "Using /etc/supervisor/conf.d/dockerd.conf with the following content"
+
+cat /etc/supervisor/conf.d/dockerd.conf
+
 INFO "Starting supervisor"
 sudo /usr/bin/supervisord -n >> /dev/null 2>&1 &

@@ -27,6 +55,8 @@ for process in "${processes[@]}"; do
    wait_for_process "$process"
    if [ $? -ne 0 ]; then
        ERROR "$process is not running after max time"
+        ERROR "Dumping /var/log/dockerd.err.log to help investigation"
+        cat /var/log/dockerd.err.log
        exit 1
    else 
        INFO "$process is running"
Author	SHA1	Message	Date
Florian Braun	5b7807d54b	Quote vars in entrypoint.sh to prevent unwanted argument split (#420 ) Prevents arguments from being split when e.g. the RUNNER_GROUP variable contains spaces (which is legit. One can create such groups in GitHub). I've seen that all workers with group names that contain no spaces can register successfully, while all workers with groups that contain spaces will not register. Furthermore, I suppose also other chars can be used here to inject arbitrary commands in an unsupported way via e.g. pipe symbol. Quoting the vars correctly should prevent that and allow for e.g. group names and runner labels with spaces and other bash reserved characters.	2021-03-31 10:09:08 +09:00
Yusuke Kuoka	156e2c1987	Fix MTU configuration for dockerd (#421 ) Resolves #393	2021-03-31 09:29:21 +09:00
Yusuke Kuoka	da4dfb3fdf	Add make target `test-with-deps` to ease setting up dependent binaries (#426 )	2021-03-31 09:23:16 +09:00
Gabriel Dantas Gomes	0783ffe989	some readme typos (#423 )	2021-03-29 10:08:21 +09:00
Yusuke Kuoka	374105c1f3	Fix dindWithinRunnerContainer not to crash-loop runner pods (#419 ) Apparently #253 broke dindWithinRunnerContainer completely due to the difference in how /runner volume is set up.	2021-03-25 10:23:36 +09:00
Yusuke Kuoka	bc6e499e4f	Make logging more concise (#410 ) This makes logging more concise by changing logger names from something like `controllers.Runner` to `actions-runner-controller.runner`, after the standard `controller-runtime.controller`, and by reducing redundant logs through the removal of unnecessary requeues. I have also tweaked log messages so that their style is more consistent, which will also help readability. Also, runnerreplicaset-controller lacked useful logs so I have enhanced it.	2021-03-20 07:34:25 +09:00
Yusuke Kuoka	07f822bb08	Do include Runner controller in integration test (#409 ) So that we could catch bugs in runner controller like seen in #398, #404, and #407. Ref #400	2021-03-19 16:14:15 +09:00
Hidetake Iwata	3a0332dfdc	Add metrics of RunnerDeployment and HRA (#408 ) * Add metrics of RunnerDeployment and HRA * Use kube-state-metrics-style label names	2021-03-19 16:14:02 +09:00
Yusuke Kuoka	f6ab66c55b	Do not delay min/maxReplicas propagation from HRA to RD due to caching (#406 ) As part of #282, I have introduced some caching mechanism to avoid excessive GitHub API calls due to the autoscaling calculation involving GitHub API calls is executed on each Webhook event. Apparently, it was saving the wrong value in the cache- The value was one after applying `HRA.Spec.{Max,Min}Replicas` so manual changes to {Max,Min}Replicas doesn't affect RunnerDeployment.Spec.Replicas until the cache expires. This isn't what I had wanted. This patch fixes that, by changing the value being cached to one before applying {Min,Max}Replicas. Additionally, I've also updated logging so that you observe which number was fetched from cache, and what number was suggested by either TotalNumberOfQueuedAndInProgressWorkflowRuns or PercentageRunnersBusy, and what was the final number used as the desired-replicas(after applying {Min,Max}Replicas). Follow-up for #282	2021-03-19 12:58:02 +09:00
Yusuke Kuoka	d874a5cfda	Fix `status.lastRegistrationCheckTime in body must be of type string: \"null\"` errors (#407 ) Follow-up for #398 and #404	2021-03-19 11:15:35 +09:00
Yusuke Kuoka	c424215044	Do recheck runner registration timely (#405 ) Since #392, the runner controller could have taken an unexpectedly long time until it finally noticed that the runner had been registered to GitHub. This patch fixes the issue, so that the controller will notice the successful registration in approximately 1 minute (hard-coded). More concretely, let's say you had configured a long sync-period of like 10m; the runner controller could have taken approx 10m to notice the successful registration. The original expectation was 1m, because it was intended to recheck every 1m as implemented in #392. It wasn't working as such due to my misunderstanding of how requeueing works.	2021-03-19 11:02:47 +09:00