Commit Graph

355 Commits

Author SHA1 Message Date
Nikola Jokic
b6a95ae879 Change duplicate message key in logs while updating ephemeral runner status (#3380) 2024-03-26 12:57:46 +01:00
Nikola Jokic
7a643a5107 Fix overscaling when the controller is much faster then the listener (#3371)
Co-authored-by: Francesco Renzi <rentziass@gmail.com>
2024-03-20 15:36:12 +01:00
Nikola Jokic
c9099a5a56 Add annotation with values hash to re-create listener (#3195) 2024-03-19 14:29:49 +01:00
Hidehito Yabuuchi
48706584fd Propagate runner scale set name annotation to EphemeralRunner (#3098) 2024-03-19 12:50:49 +01:00
Nikola Jokic
c00465973e Publish metrics in the new ghalistener (#3193) 2024-01-25 14:46:42 +01:00
Nikola Jokic
fe8c3bb789 Change listener container name (#3167) 2023-12-19 12:22:52 +01:00
Serge
d7d479172d Fix override listener pod spec (#3139) (#3161)
Signed-off-by: Serge Logvinov <serge.logvinov@sinextra.dev>
2023-12-18 16:50:06 +01:00
Nikola Jokic
b78cadd901 Refactoring listener app with configurable fallback (#3096) 2023-12-08 13:41:06 +01:00
Nikola Jokic
202a97ab12 Modify user agent format with subsystem and is proxy configured information (#3116) 2023-12-08 13:16:29 +01:00
Toru Komatsu
b08d533105 Record the error when the creation pod fails (#3112)
Signed-off-by: utam0k <k0ma@utam0k.jp>
2023-12-07 21:11:52 +01:00
Toru Komatsu
7793e1974a Record a reason for pod failure in EphemeralRunner (#3074)
Signed-off-by: utam0k <k0ma@utam0k.jp>
2023-11-21 08:26:29 +01:00
Nikola Jokic
2939640fa9 Add ResizePolicy and RestartPolicy on mergeListenerContainer (#3075) 2023-11-15 10:39:11 +01:00
Nikola Jokic
65fd04540c Bump go version and all direct dependencies to newest for k8s compatibility (#2947) 2023-11-14 16:19:43 +01:00
Nikola Jokic
95487735a2 Remove inheritance of imagePullPolicy from manager to listeners (#3009) 2023-11-07 15:08:36 +01:00
Nikola Jokic
2117fd1892 Configure listener pod with the secret instead of env (#2965)
Co-authored-by: Bassem Dghaidi <568794+Link-@users.noreply.github.com>
2023-10-19 12:29:32 +02:00
Nikola Jokic
bffcb32b19 Fix role and rolebinding cleanup for the listener controller (#2970) 2023-10-16 12:40:38 +02:00
Nikola Jokic
fdf7b6c525 Fix nil map when annotations are applied (#2916)
Co-authored-by: Hidetake Iwata <int128@gmail.com>
2023-09-26 11:21:16 +02:00
Nikola Jokic
07bff8aa1e Extend the user agent and fix the build version for the listener app (#2892) 2023-09-14 20:10:49 +02:00
Nikola Jokic
ea2fb32e20 Extend and generate crds allowing listener pod spec change (#2758)
Co-authored-by: Bassem Dghaidi <568794+Link-@users.noreply.github.com>
2023-09-14 15:33:29 +02:00
Nikola Jokic
a0a3916c80 Provide scale-set listener metrics (#2559)
Co-authored-by: Tingluo Huang <tingluohuang@github.com>
Co-authored-by: Bassem Dghaidi <568794+Link-@users.noreply.github.com>
2023-08-21 13:50:07 +02:00
Gavin Williams
cd996e7c27 Fix panic: slice bounds out of range when runner spec contains volumeMounts. (#2720)
Signed-off-by: Gavin Williams <gavin.williams@machinemax.com>
2023-07-25 09:53:50 +09:00
Lars Lange
297442975e fix: remove callbacks resulting in scales due to incomplete response (#2671)
Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
2023-07-25 09:04:54 +09:00
Nikola Jokic
6fe8008640 Add configurable log format to values.yaml and propagate it to listener (#2686) 2023-07-05 21:06:42 +02:00
Nikola Jokic
94934819c4 Trim repo/org/enterprise to 63 characters in label values (#2657) 2023-06-09 20:57:20 +02:00
Nuru
aac811f210 Update unconsumed HRA capacity reservation's expiration more frequently and consistently (#2502)
Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
2023-05-30 09:04:57 +09:00
Thang Le
e7ec736738 Use head_branch metric (#2549)
Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
2023-05-28 16:36:55 +09:00
Daniel Hobley
90ea691e72 feat: allow for modifying var-run mount maximum size limit (#2624) 2023-05-27 11:47:23 +09:00
Bassem Dghaidi
8afef51c8b Add DrainJobsMode (aka UpdateStrategy feature) (#2569) 2023-05-23 07:42:30 -04:00
argokasper
a2d4b95b79 Fix GET validation for lowercase http methods (#2497)
Some requests send method in lowercase (verified with curl and as a default for AWS ALB health check requests), but Go HTTP library constant MethodGet is in upper.
2023-04-27 13:22:41 +09:00
Nuru
9bd4025e9c Stricter filtering of check run completion events (#2520)
I observed that 100% of canceled jobs in my runner pool were not causing scale down events. This PR fixes that.

The problem was caused by #2119. 

#2119 ignores certain webhook events in order to fix #2118. However, #2119 overdoes it and filters out valid job cancellation events. This PR uses stricter filtering and add visibility for future troubleshooting.

<details><summary>Example cancellation event</summary>

This is the redacted top portion of a valid cancellation event my runner pool received and ignored.

```json
{
  "action": "completed",
  "workflow_job": {
    "id": 12848997134,
    "run_id": 4738060033,
    "workflow_name": "slack-notifier",
    "head_branch": "auto-update/slack-notifier-0.5.1",
    "run_url": "https://api.github.com/repos/nuru/<redacted>/actions/runs/4738060033",
    "run_attempt": 1,
    "node_id": "CR_kwDOB8Xtbc8AAAAC_dwjDg",
    "head_sha": "55bada8f3d0d3e12a510a1bf34d0c3e169b65f89",
    "url": "https://api.github.com/repos/nuru/<redacted>/actions/jobs/12848997134",
    "html_url": "https://github.com/nuru/<redacted>/actions/runs/4738060033/jobs/8411515430",
    "status": "completed",
    "conclusion": "cancelled",
    "created_at": "2023-04-19T00:03:12Z",
    "started_at": "2023-04-19T00:03:42Z",
    "completed_at": "2023-04-19T00:03:42Z",
    "name": "build (arm64)",
    "steps": [

    ],
    "check_run_url": "https://api.github.com/repos/nuru/<redacted>/check-runs/12848997134",
    "labels": [
      "self-hosted",
      "arm64"
    ],
    "runner_id": 0,
    "runner_name": "",
    "runner_group_id": 0,
    "runner_group_name": ""
  },
```

</details>
2023-04-27 13:15:23 +09:00
Yusuke Kuoka
94c089c407 Revert docker.sock path to /var/run/docker.sock (#2536)
Starting ARC v0.27.2, we've changed the `docker.sock` path from `/var/run/docker.sock` to `/var/run/docker/docker.sock`. That resulted in breaking some container-based actions due to the hard-coded `docker.sock` path in various places.

Even `actions/runner` seem to use `/var/run/docker.sock` for building container-based actions and for service containers?

Anyway, this fixes that by moving the sock file back to the previous location.

Once this gets merged, users stuck at ARC v0.27.1, previously upgraded to 0.27.2 or 0.27.3 and reverted back to v0.27.1 due to #2519, should be able to upgrade to the upcoming v0.27.4.

Resolves #2519
Resolves #2538
2023-04-27 13:06:35 +09:00
Nikola Jokic
9859bbc7f2 Use build.Version to check if resource version is a mismatch (#2521)
Co-authored-by: Bassem Dghaidi <568794+Link-@users.noreply.github.com>
2023-04-24 10:40:15 +02:00
Yusuke Kuoka
3e0bc3f7be Fix docker.sock permission error for non-dind Ubuntu 20.04 runners since v0.27.2 (#2499)
#2490 has been happening since v0.27.2 for non-dind runners based on Ubuntu 20.04 runner images. It does not affect Ubuntu 22.04 non-dind runners(i.e. runners with dockerd sidecars) and Ubuntu 20.04/22.04 dind runners(i.e. runners without dockerd sidecars). However, presuming many folks are still using Ubuntu 20.04 runners and non-dind runners, we should fix it.

This change tries to fix it by defaulting to the docker group id 1001 used by Ubuntu 20.04 runners, and use gid 121 for Ubuntu 22.04 runners. We use the image tag to see which Ubuntu version the runner is based on. The algorithm is so simple- we assume it's Ubuntu-22.04-based if the image tag contains "22.04".

This might be a breaking change for folks who have already upgraded to Ubuntu 22.04 runners using their own custom runner images. Note again; we rely on the image tag to detect Ubuntu 22.04 runner images and use the proper docker gid- Folks using our official Ubuntu 22.04 runner images are not affected. It is a breaking change anyway, so I have added a remedy-

ARC got a new flag, `--docker-gid`, which defaults to `1001` but can be set to `121` or whatever gid the operator/admin likes. This can be set to `--docker-gid=121`, for example, if you are using your own custom runner image based on Ubuntu 22.04 and the image tag does not contain "22.04".

Fixes #2490
2023-04-17 21:30:41 +09:00
Nikola Jokic
ba1ac0990b Reordering methods and constants so it is easier to look it up (#2501) 2023-04-12 09:50:23 +02:00
Nikola Jokic
b86af190f7 Extend manager roles to accept ephemeralrunnerset/finalizers (#2493) 2023-04-10 08:49:32 +02:00
Nikola Jokic
a804bf8b00 Add ImagePullPolicy to the AutoscalingListener, configurable through Manager env (#2477) 2023-04-04 19:07:20 +02:00
Nikola Jokic
5dea6db412 Fix helm uninstall cleanup by adding finalizers and cleaning them from the controller (#2433)
Co-authored-by: Tingluo Huang <tingluohuang@github.com>
2023-04-03 21:06:12 +02:00
Stewart Thomson
a7ef871248 Check if appID and instID are non-empty before attempting to parseInt (#2463) 2023-04-03 09:06:59 +09:00
Milas Bowman
878c9b8b49 runner: Use Docker socket via shared emptyDir instead of TCP/mTLS (#2324)
Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
2023-03-28 11:29:16 +09:00
Nikola Jokic
56e1c62ac2 Add labels to autoscaling runner set subresources to allow easier inspection (#2391)
Co-authored-by: Tingluo Huang <tingluohuang@github.com>
2023-03-27 11:19:34 +02:00
Tingluo Huang
08acb1b831 Get RunnerScaleSet based on both RunnerGroupId and Name. (#2413) 2023-03-15 11:10:09 -04:00
Tingluo Huang
2bf83d0d7f Remove list/watch secrets permission from the manager cluster role. (#2276) 2023-03-14 09:23:14 -04:00
Tingluo Huang
261d4371b5 Update E2E test workflow. (#2395) 2023-03-14 09:00:07 -04:00
Nikola Jokic
babbfc77d5 Surface EphemeralRunnerSet stats to AutoscalingRunnerSet (#2382) 2023-03-13 16:16:28 +01:00
Tingluo Huang
d7b589bed5 Helm chart react changes for the new runner image. (#2348) 2023-03-10 11:18:21 +00:00
Francesco Renzi
c569304271 Add support for self-signed CA certificates (#2268)
Co-authored-by: Bassem Dghaidi <568794+Link-@users.noreply.github.com>
Co-authored-by: Nikola Jokic <jokicnikola07@gmail.com>
Co-authored-by: Tingluo Huang <tingluohuang@github.com>
2023-03-09 17:23:32 +00:00
Chris Patterson
41f2ca3ed9 Adding parameter to configure the runner set name. (#2279)
Co-authored-by: TingluoHuang <TingluoHuang@github.com>
2023-03-03 08:36:14 -05:00
Francesco Renzi
40c905f25d Simplify the setup of controller tests (#2352) 2023-03-02 18:55:49 +00:00
Nikola Jokic
2984de912c Split listener pod label to avoid long names issue (#2341) 2023-03-02 17:25:50 +01:00
Alex Williams
69abd51f30 Ensure that EffectiveTime is updated on webhook scale down (#2258)
Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
2023-03-01 08:27:37 +09:00