Server Admin Log
2025-06-07
- 11:43 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host thanos-be2007.codfw.wmnet with OS bullseye
- 11:07 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host thanos-be2006.codfw.wmnet with OS bullseye
- 08:12 elukey: restart apache2 / php-fpm on phab1004
- 04:18 mutante: restarted apache on phab1004
2025-06-06
- 21:33 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 21:25 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 21:19 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 21:15 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 21:02 bking@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on relforge[1003-1004].eqiad.wmnet with reason: downtime before decom
- 20:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-be2007.codfw.wmnet with reason: host reimage
- 20:40 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 20:38 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-be2007.codfw.wmnet with reason: host reimage
- 20:35 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 20:16 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 20:15 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-be2007.codfw.wmnet with OS bullseye
- 20:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-be2006.codfw.wmnet with reason: host reimage
- 20:03 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-be2006.codfw.wmnet with reason: host reimage
- 19:45 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1185.eqiad.wmnet with OS bullseye
- 19:31 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-be2006.codfw.wmnet with OS bullseye
- 19:11 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_eqiad: T383811 - bking@cumin2002
- 19:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2244.codfw.wmnet with OS bookworm
- 19:01 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 18:25 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1185.eqiad.wmnet with OS bullseye
- 18:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 17:49 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2244.codfw.wmnet with reason: host reimage
- 17:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2244.codfw.wmnet with reason: host reimage
- 17:29 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db2244.codfw.wmnet with OS bookworm
- 17:29 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['db2244']
- 17:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2244']
- 17:20 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_eqiad: T383811 - bking@cumin2002
- 17:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2244.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 17:08 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_eqiad: T383811 - bking@cumin2002
- 17:06 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_eqiad: T383811 - bking@cumin2002
- 17:00 sukhe: forced agent run on O:alerting_host to reload vopsbot to add cdobbins
- 16:57 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2244.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:57 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2244.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:56 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2244.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:55 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db2244
- 16:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host db2244
- 16:44 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 16:42 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 16:08 sbassett: Deployed security update to fix T396111
- 15:41 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2066.codfw.wmnet
- 15:34 eevans@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be2066.codfw.wmnet
- 15:24 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P{lvs1013.eqiad.wmnet} and A:liberica
- 15:23 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.admin config_reloading P{lvs1013.eqiad.wmnet} and A:liberica
- 15:19 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs1013.eqiad.wmnet
- 15:19 vgutierrez@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs1013.eqiad.wmnet
- 14:53 akosiaris@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
- 14:42 akosiaris@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
- 14:24 sukhe@dns1004: END - running authdns-update
- 14:23 sukhe@dns1004: START - running authdns-update
- 14:23 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-reboot (exit_code=0) rolling reboot on P{dns2004*} and (A:dnsbox)
- 14:23 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot finished rebooting dns2004.wikimedia.org
- 14:22 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-reboot (exit_code=0) rolling reboot on P{dns1005*} and (A:dnsbox)
- 14:22 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot finished rebooting dns1005.wikimedia.org
- 14:10 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns1005.wikimedia.org
- 14:10 sukhe@cumin1002: START - Cookbook sre.dns.roll-reboot rolling reboot on P{dns1005*} and (A:dnsbox)
- 14:10 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns2004.wikimedia.org
- 14:10 sukhe@cumin1002: START - Cookbook sre.dns.roll-reboot rolling reboot on P{dns2004*} and (A:dnsbox)
- 13:51 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-reboot (exit_code=0) rolling reboot on P{dns2006*} and (A:dnsbox)
- 13:51 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot finished rebooting dns2006.wikimedia.org
- 13:49 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-reboot (exit_code=0) rolling reboot on P{dns1006*} and (A:dnsbox)
- 13:49 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot finished rebooting dns1006.wikimedia.org
- 13:40 vgutierrez@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P{lvs1013.eqiad.wmnet} and A:liberica
- 13:40 vgutierrez@cumin1003: START - Cookbook sre.loadbalancer.admin config_reloading P{lvs1013.eqiad.wmnet} and A:liberica
- 13:40 vgutierrez@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: switching to katran
- 13:35 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns2006.wikimedia.org
- 13:35 sukhe@cumin1002: START - Cookbook sre.dns.roll-reboot rolling reboot on P{dns2006*} and (A:dnsbox)
- 13:34 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns1006.wikimedia.org
- 13:34 sukhe@cumin1002: START - Cookbook sre.dns.roll-reboot rolling reboot on P{dns1006*} and (A:dnsbox)
- 13:32 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-reboot (exit_code=0) rolling reboot on P{dns3003*} and (A:dnsbox)
- 13:32 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot finished rebooting dns3003.wikimedia.org
- 13:31 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-reboot (exit_code=0) rolling reboot on P{dns6001*} and (A:dnsbox)
- 13:31 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot finished rebooting dns6001.wikimedia.org
- 13:21 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 13:21 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add back entry for mistakenly deleted ssw1-a8-codfw IP - cmooney@cumin1003"
- 13:21 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add back entry for mistakenly deleted ssw1-a8-codfw IP - cmooney@cumin1003"
- 13:18 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns6001.wikimedia.org
- 13:18 sukhe@cumin1002: START - Cookbook sre.dns.roll-reboot rolling reboot on P{dns6001*} and (A:dnsbox)
- 13:18 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns3003.wikimedia.org
- 13:18 sukhe@cumin1002: START - Cookbook sre.dns.roll-reboot rolling reboot on P{dns3003*} and (A:dnsbox)
- 13:13 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-reboot (exit_code=0) rolling reboot on P{dns5003*} and (A:dnsbox)
- 13:13 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot finished rebooting dns5003.wikimedia.org
- 13:09 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-reboot (exit_code=0) rolling reboot on P{dns4003*} and (A:dnsbox)
- 13:09 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot finished rebooting dns4003.wikimedia.org
- 13:06 cmooney@cumin1003: START - Cookbook sre.dns.netbox
- 12:58 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns5003.wikimedia.org
- 12:58 sukhe@cumin1002: START - Cookbook sre.dns.roll-reboot rolling reboot on P{dns5003*} and (A:dnsbox)
- 12:58 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns4003.wikimedia.org
- 12:58 sukhe@cumin1002: START - Cookbook sre.dns.roll-reboot rolling reboot on P{dns4003*} and (A:dnsbox)
- 12:20 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db2158 - Depool db2158.codfw.wmnet to then clone it to db2151.codfw.wmnet - fceratto@cumin1002
- 12:20 fceratto@cumin1002: START - Cookbook sre.mysql.depool db2158 - Depool db2158.codfw.wmnet to then clone it to db2151.codfw.wmnet - fceratto@cumin1002
- 12:19 fceratto@cumin1002: START - Cookbook sre.mysql.clone of db2158.codfw.wmnet onto db2151.codfw.wmnet
- 12:03 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
- 12:03 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
- 11:52 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2151.codfw.wmnet with reason: Disabling notifications
- 11:42 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
- 11:42 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
- 11:32 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
- 11:32 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
- 11:32 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
- 11:31 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
- 11:31 jayme@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
- 11:30 jayme@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
- 11:30 jayme@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
- 11:28 jayme@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
- 11:28 jayme@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 11:28 jayme@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 11:28 jayme@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 11:27 jayme@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
- 11:27 jayme@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'apply'.
- 11:26 jayme@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'apply'.
- 11:25 jayme@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'apply'.
- 11:22 jayme@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
- 11:21 jayme@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
- 11:18 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
- 11:18 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
- 11:17 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 11:17 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 11:12 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 11:12 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 09:40 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db2151* - Log issue and disk filled up
- 09:40 fceratto@cumin1002: START - Cookbook sre.mysql.depool db2151* - Log issue and disk filled up
- 09:05 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 09:04 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 08:37 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 398044
- 08:36 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 398044
- 08:35 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 46997
- 08:34 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 46997
- 08:33 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 394065
- 08:33 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 394065
- 08:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 46562
- 08:32 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 46562
- 08:30 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 13150
- 08:30 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 13150
- 08:28 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 199524
- 08:25 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 199524
- 08:24 volans@cumin1003: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts cirrussearch2113.codfw.wmnet
- 08:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 63199
- 08:23 volans@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cirrussearch2113.codfw.wmnet
- 08:23 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 63199
- 08:22 volans@cumin1003: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts cirrussearch2113.codfw.wmnet
- 08:22 volans@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cirrussearch2113.codfw.wmnet
- 08:11 volans@cumin1003: START - Cookbook sre.hosts.reboot-single for host cirrussearch2113.codfw.wmnet
- 08:10 volans@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cirrussearch2113.codfw.wmnet
- 07:52 ryankemper@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on cirrussearch2113.codfw.wmnet with reason: T394543
- 06:15 stevemunene@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on an-worker[1157-1159].eqiad.wmnet with reason: Upgrade an-worker hard drives from 4TB to 8TB group 5 - rack F1
- 06:14 stevemunene@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on an-worker[1160-1162].eqiad.wmnet with reason: Upgrade an-worker hard drives from 4TB to 8TB group 5 - rack F1
- 06:13 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device asw1-b3-magru
- 06:13 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device asw1-b3-magru
- 05:42 XioNoX: push pfw policies - T395904
- 03:02 eevans@cumin1002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:swift-fe-codfw
- 02:57 eevans@cumin1002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe-codfw
2025-06-05
- 22:05 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 22:04 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 22:03 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 22:02 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 21:58 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 21:58 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 21:57 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 21:42 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 21:29 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 21:24 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 21:00 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 21:00 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 21:00 cjming@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply
- 20:59 cjming@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply
- 20:57 cjming@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
- 20:56 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 20:56 cjming@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
- 20:56 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 20:50 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 20:49 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 20:24 jdlrobson@deploy1003: Finished scap sync-world: Backport for Fix back compat for data-chart (T395462) (duration: 10m 05s)
- 20:17 jdlrobson@deploy1003: jdlrobson: Continuing with sync
- 20:16 jdlrobson@deploy1003: jdlrobson: Backport for Fix back compat for data-chart (T395462) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 20:14 jdlrobson@deploy1003: Started scap sync-world: Backport for Fix back compat for data-chart (T395462)
- 20:09 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1185.eqiad.wmnet with OS bullseye
- 19:24 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 19:22 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 19:15 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 19:14 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 19:14 vriley@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-worker1186
- 19:13 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1186
- 19:13 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 19:12 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 19:12 vriley@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-worker1186
- 19:12 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1186
- 18:52 phuedx: Disabled the SDS 2.4.11 Synthetic A/A Test in xLab
- 18:49 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1185.eqiad.wmnet with OS bullseye
- 18:48 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 18:43 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 18:32 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1185.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 18:30 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 18:21 dduvall@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.45.0-wmf.4 refs T392174
- 18:21 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 18:20 vriley@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-worker1186
- 18:20 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1186
- 18:19 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1185.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 18:18 cjming@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply
- 18:17 cjming@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply
- 18:17 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading scholarly_articles on wdqs1023.eqiad.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/scholarly/20250526/ using stat1011.eqiad.wmnet)
- 18:17 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1185.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 18:17 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1185.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 18:05 cjming@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
- 18:04 cjming@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
- 17:28 bd808@deploy1003: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
- 17:27 bd808@deploy1003: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
- 17:25 bd808@deploy1003: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
- 17:24 bd808@deploy1003: helmfile [codfw] START helmfile.d/services/developer-portal: apply
- 17:24 bd808@deploy1003: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
- 17:24 bd808@deploy1003: helmfile [staging] START helmfile.d/services/developer-portal: apply
- 17:21 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P{lvs1013.eqiad.wmnet} and A:liberica
- 17:21 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.admin config_reloading P{lvs1013.eqiad.wmnet} and A:liberica
- 17:15 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs1013.eqiad.wmnet
- 17:15 vgutierrez@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs1013.eqiad.wmnet
- 16:54 brett@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for acmechief2002.codfw.wmnet,acmechief1002.eqiad.wmnet,acmechief-test2001.codfw.wmnet,acmechief-test1001.eqiad.wmnet
- 16:54 brett@cumin2002: START - Cookbook sre.hosts.remove-downtime for acmechief2002.codfw.wmnet,acmechief1002.eqiad.wmnet,acmechief-test2001.codfw.wmnet,acmechief-test1001.eqiad.wmnet
- 16:51 mfossati@deploy1003: Finished deploy [airflow-dags/platform_eng@930d28b]: adapt check_bad_parsing to dumps 2.0 (duration: 01m 16s)
- 16:50 mfossati@deploy1003: Started deploy [airflow-dags/platform_eng@930d28b]: adapt check_bad_parsing to dumps 2.0
- 16:50 brett@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on acmechief2002.codfw.wmnet,acmechief1002.eqiad.wmnet,acmechief-test2001.codfw.wmnet,acmechief-test1001.eqiad.wmnet with reason: Reboots
- 16:27 jdlrobson@deploy1003: Finished scap sync-world: Backport for Revert "Deploy survey to en at twenty percent" (duration: 11m 23s)
- 16:20 jdlrobson@deploy1003: jdlrobson, jdrewniak: Continuing with sync
- 16:18 jdlrobson@deploy1003: jdlrobson, jdrewniak: Backport for Revert "Deploy survey to en at twenty percent" synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 16:17 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2237 (T395241)', diff saved to https://phabricator.wikimedia.org/P77191 and previous config saved to /var/cache/conftool/dbconfig/20250605-161701-fceratto.json
- 16:16 jdlrobson@deploy1003: Started scap sync-world: Backport for Revert "Deploy survey to en at twenty percent"
- 16:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2244.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2244.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:02 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db2244
- 16:02 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host db2244
- 16:01 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2237', diff saved to https://phabricator.wikimedia.org/P77190 and previous config saved to /var/cache/conftool/dbconfig/20250605-160154-fceratto.json
- 16:01 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 16:01 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding db2244 to codfw - jhancock@cumin2002"
- 16:01 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding db2244 to codfw - jhancock@cumin2002"
- 15:57 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 15:55 aokoth@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on doc1003.eqiad.wmnet with reason: Bookworm Migration
- 15:46 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2237', diff saved to https://phabricator.wikimedia.org/P77189 and previous config saved to /var/cache/conftool/dbconfig/20250605-154647-fceratto.json
- 15:31 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2237 (T395241)', diff saved to https://phabricator.wikimedia.org/P77188 and previous config saved to /var/cache/conftool/dbconfig/20250605-153139-fceratto.json
- 15:23 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2237 (T395241)', diff saved to https://phabricator.wikimedia.org/P77187 and previous config saved to /var/cache/conftool/dbconfig/20250605-152314-fceratto.json
- 15:23 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2237.codfw.wmnet with reason: Maintenance
- 15:22 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2236 (T395241)', diff saved to https://phabricator.wikimedia.org/P77186 and previous config saved to /var/cache/conftool/dbconfig/20250605-152248-fceratto.json
- 15:08 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudlb2004-dev.codfw.wmnet
- 15:07 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2236', diff saved to https://phabricator.wikimedia.org/P77185 and previous config saved to /var/cache/conftool/dbconfig/20250605-150741-fceratto.json
- 14:59 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudlb2004-dev.codfw.wmnet
- 14:59 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudlb2003-dev.codfw.wmnet
- 14:53 damilare: payments-wiki upgraded from 2d8b655a to aa102260
- 14:52 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2236', diff saved to https://phabricator.wikimedia.org/P77184 and previous config saved to /var/cache/conftool/dbconfig/20250605-145234-fceratto.json
- 14:50 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudlb2003-dev.codfw.wmnet
- 14:37 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2236 (T395241)', diff saved to https://phabricator.wikimedia.org/P77183 and previous config saved to /var/cache/conftool/dbconfig/20250605-143724-fceratto.json
- 14:29 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2236 (T395241)', diff saved to https://phabricator.wikimedia.org/P77182 and previous config saved to /var/cache/conftool/dbconfig/20250605-142908-fceratto.json
- 14:29 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: Maintenance
- 14:28 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T395241)', diff saved to https://phabricator.wikimedia.org/P77181 and previous config saved to /var/cache/conftool/dbconfig/20250605-142840-fceratto.json
- 14:20 taavi@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) 'private.codfw.wikimedia.cloud$' on codfw recursors
- 14:20 taavi@cumin1002: START - Cookbook sre.dns.wipe-cache 'private.codfw.wikimedia.cloud$' on codfw recursors
- 14:19 tgr@deploy1003: Unlocked for deployment [MediaWiki]: T395468 (duration: 39m 39s)
- 14:19 vgutierrez@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: switching to katran
- 14:18 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P{lvs1013.eqiad.wmnet} and A:liberica
- 14:18 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.admin config_reloading P{lvs1013.eqiad.wmnet} and A:liberica
- 14:18 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host sretest2007
- 14:18 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host sretest2007
- 14:17 taavi@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:17 taavi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: assign cloud-private v6 addresses for codfw1dev devices - taavi@cumin1002"
- 14:17 taavi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: assign cloud-private v6 addresses for codfw1dev devices - taavi@cumin1002"
- 14:17 tgr: deploying a PrivateSettings config change
- 14:13 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P77179 and previous config saved to /var/cache/conftool/dbconfig/20250605-141333-fceratto.json
- 14:13 taavi@cumin1002: START - Cookbook sre.dns.netbox
- 14:11 marostegui@cumin1002: dbctl commit (dc=all): 'db2151 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P77178 and previous config saved to /var/cache/conftool/dbconfig/20250605-141108-root.json
- 13:58 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P77177 and previous config saved to /var/cache/conftool/dbconfig/20250605-135826-fceratto.json
- 13:56 marostegui@cumin1002: dbctl commit (dc=all): 'db2151 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P77176 and previous config saved to /var/cache/conftool/dbconfig/20250605-135603-root.json
- 13:51 marostegui: Migrate s2 codfw to SBR dbmaint T383795
- 13:50 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply
- 13:50 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2037.codfw.wmnet
- 13:49 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
- 13:49 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2037.codfw.wmnet
- 13:49 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
- 13:49 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply
- 13:49 cgoubert@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply
- 13:48 cgoubert@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply
- 13:48 vgutierrez: upload liberica 0.16 to bookworm-wikimedia (apt.wm.o) - T395228
- 13:48 cgoubert@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply
- 13:47 cgoubert@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply
- 13:44 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti2037.codfw.wmnet
- 13:43 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T395241)', diff saved to https://phabricator.wikimedia.org/P77175 and previous config saved to /var/cache/conftool/dbconfig/20250605-134319-fceratto.json
- 13:41 marostegui@cumin1002: dbctl commit (dc=all): 'es2042 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P77174 and previous config saved to /var/cache/conftool/dbconfig/20250605-134158-root.json
- 13:41 marostegui@cumin1002: dbctl commit (dc=all): 'es2045 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P77173 and previous config saved to /var/cache/conftool/dbconfig/20250605-134153-root.json
- 13:40 marostegui@cumin1002: dbctl commit (dc=all): 'db2151 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P77172 and previous config saved to /var/cache/conftool/dbconfig/20250605-134057-root.json
- 13:40 moritzm: installing net-tools bugfix updates for bookworm
- 13:40 tgr@deploy1003: Locking from deployment [MediaWiki]: T395468
- 13:38 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2037.codfw.wmnet
- 13:37 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2036.codfw.wmnet
- 13:37 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2036.codfw.wmnet
- 13:35 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2219 (T395241)', diff saved to https://phabricator.wikimedia.org/P77171 and previous config saved to /var/cache/conftool/dbconfig/20250605-133500-fceratto.json
- 13:34 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2219.codfw.wmnet with reason: Maintenance
- 13:34 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210 (T395241)', diff saved to https://phabricator.wikimedia.org/P77170 and previous config saved to /var/cache/conftool/dbconfig/20250605-133434-fceratto.json
- 13:31 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti2036.codfw.wmnet
- 13:26 marostegui@cumin1002: dbctl commit (dc=all): 'es2042 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P77169 and previous config saved to /var/cache/conftool/dbconfig/20250605-132652-root.json
- 13:26 marostegui@cumin1002: dbctl commit (dc=all): 'es2045 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P77168 and previous config saved to /var/cache/conftool/dbconfig/20250605-132648-root.json
- 13:25 marostegui@cumin1002: dbctl commit (dc=all): 'db2151 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P77167 and previous config saved to /var/cache/conftool/dbconfig/20250605-132552-root.json
- 13:23 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts netflow7001.magru.wmnet
- 13:23 jmm@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 13:23 jmm@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: netflow7001.magru.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin1003"
- 13:21 jmm@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: netflow7001.magru.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin1003"
- 13:21 Lucas_WMDE: UTC afternoon backport+config window done
- 13:19 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P77166 and previous config saved to /var/cache/conftool/dbconfig/20250605-131926-fceratto.json
- 13:18 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2036.codfw.wmnet
- 13:17 jmm@cumin1003: START - Cookbook sre.dns.netbox
- 13:16 gkyziridis@deploy1003: Finished scap sync-world: Backport for ores-extension: enable extension with revertrisk filter for second batch of wikis (excluding azwiki) (T395823) (duration: 11m 51s)
- 13:12 jmm@cumin1003: START - Cookbook sre.hosts.decommission for hosts netflow7001.magru.wmnet
- 13:11 marostegui@cumin1002: dbctl commit (dc=all): 'es2042 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P77165 and previous config saved to /var/cache/conftool/dbconfig/20250605-131147-root.json
- 13:11 marostegui@cumin1002: dbctl commit (dc=all): 'es2045 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P77164 and previous config saved to /var/cache/conftool/dbconfig/20250605-131142-root.json
- 13:10 marostegui@cumin1002: dbctl commit (dc=all): 'db2151 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P77163 and previous config saved to /var/cache/conftool/dbconfig/20250605-131046-root.json
- 13:09 gkyziridis@deploy1003: gkyziridis: Continuing with sync
- 13:07 gkyziridis@deploy1003: gkyziridis: Backport for ores-extension: enable extension with revertrisk filter for second batch of wikis (excluding azwiki) (T395823) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 13:04 gkyziridis@deploy1003: Started scap sync-world: Backport for ores-extension: enable extension with revertrisk filter for second batch of wikis (excluding azwiki) (T395823)
- 13:04 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P77162 and previous config saved to /var/cache/conftool/dbconfig/20250605-130419-fceratto.json
- 13:03 taavi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "remove outdated octavia net - taavi@cumin1002"
- 13:03 taavi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "remove outdated octavia net - taavi@cumin1002"
- 12:56 marostegui@cumin1002: dbctl commit (dc=all): 'es2042 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P77161 and previous config saved to /var/cache/conftool/dbconfig/20250605-125641-root.json
- 12:56 marostegui@cumin1002: dbctl commit (dc=all): 'es2045 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P77160 and previous config saved to /var/cache/conftool/dbconfig/20250605-125637-root.json
- 12:55 marostegui@cumin1002: dbctl commit (dc=all): 'db2151 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P77159 and previous config saved to /var/cache/conftool/dbconfig/20250605-125540-root.json
- 12:54 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2035.codfw.wmnet
- 12:54 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2035.codfw.wmnet
- 12:51 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2151.codfw.wmnet with reason: Maintenance
- 12:50 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2151 T395989', diff saved to https://phabricator.wikimedia.org/P77158 and previous config saved to /var/cache/conftool/dbconfig/20250605-125057-marostegui.json
- 12:49 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210 (T395241)', diff saved to https://phabricator.wikimedia.org/P77157 and previous config saved to /var/cache/conftool/dbconfig/20250605-124912-fceratto.json
- 12:48 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti2035.codfw.wmnet
- 12:43 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2035.codfw.wmnet
- 12:41 marostegui@cumin1002: dbctl commit (dc=all): 'es2042 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P77156 and previous config saved to /var/cache/conftool/dbconfig/20250605-124136-root.json
- 12:41 marostegui@cumin1002: dbctl commit (dc=all): 'es2045 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P77155 and previous config saved to /var/cache/conftool/dbconfig/20250605-124131-root.json
- 12:41 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2210 (T395241)', diff saved to https://phabricator.wikimedia.org/P77154 and previous config saved to /var/cache/conftool/dbconfig/20250605-124110-fceratto.json
- 12:41 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2210.codfw.wmnet with reason: Maintenance
- 12:40 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206 (T395241)', diff saved to https://phabricator.wikimedia.org/P77153 and previous config saved to /var/cache/conftool/dbconfig/20250605-124043-fceratto.json
- 12:32 jakob@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikidata-query-gui: apply
- 12:32 jakob@deploy1003: helmfile [eqiad] START helmfile.d/services/wikidata-query-gui: apply
- 12:31 jakob@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikidata-query-gui: apply
- 12:30 jakob@deploy1003: helmfile [codfw] START helmfile.d/services/wikidata-query-gui: apply
- 12:30 jakob@deploy1003: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
- 12:30 jakob@deploy1003: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply
- 12:26 marostegui@cumin1002: dbctl commit (dc=all): 'es2042 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P77151 and previous config saved to /var/cache/conftool/dbconfig/20250605-122631-root.json
- 12:26 marostegui@cumin1002: dbctl commit (dc=all): 'es2045 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P77150 and previous config saved to /var/cache/conftool/dbconfig/20250605-122625-root.json
- 12:25 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P77149 and previous config saved to /var/cache/conftool/dbconfig/20250605-122537-fceratto.json
- 12:21 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es[2042,2045].codfw.wmnet with reason: Maintenance
- 12:20 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es2042 es2045 T395241', diff saved to https://phabricator.wikimedia.org/P77147 and previous config saved to /var/cache/conftool/dbconfig/20250605-122035-marostegui.json
- 12:10 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P77146 and previous config saved to /var/cache/conftool/dbconfig/20250605-121029-fceratto.json
- 11:55 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206 (T395241)', diff saved to https://phabricator.wikimedia.org/P77145 and previous config saved to /var/cache/conftool/dbconfig/20250605-115522-fceratto.json
- 11:49 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 11:48 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 11:47 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2206 (T395241)', diff saved to https://phabricator.wikimedia.org/P77144 and previous config saved to /var/cache/conftool/dbconfig/20250605-114711-fceratto.json
- 11:47 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2206.codfw.wmnet with reason: Maintenance
- 11:42 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2199.codfw.wmnet with reason: Maintenance
- 11:42 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T395241)', diff saved to https://phabricator.wikimedia.org/P77143 and previous config saved to /var/cache/conftool/dbconfig/20250605-114213-fceratto.json
- 11:35 moritzm: installing Linux 5.10.237 on Bullseye hosts
- 11:27 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P77142 and previous config saved to /var/cache/conftool/dbconfig/20250605-112706-fceratto.json
- 11:26 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2032.codfw.wmnet
- 11:25 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2032.codfw.wmnet
- 11:25 marostegui@cumin1002: dbctl commit (dc=all): 'es2046 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P77141 and previous config saved to /var/cache/conftool/dbconfig/20250605-112518-root.json
- 11:25 marostegui@cumin1002: dbctl commit (dc=all): 'es2043 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P77140 and previous config saved to /var/cache/conftool/dbconfig/20250605-112511-root.json
- 11:19 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti2032.codfw.wmnet
- 11:11 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P77139 and previous config saved to /var/cache/conftool/dbconfig/20250605-111158-fceratto.json
- 11:10 marostegui@cumin1002: dbctl commit (dc=all): 'es2046 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P77138 and previous config saved to /var/cache/conftool/dbconfig/20250605-111013-root.json
- 11:10 marostegui@cumin1002: dbctl commit (dc=all): 'es2043 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P77137 and previous config saved to /var/cache/conftool/dbconfig/20250605-111005-root.json
- 11:09 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2032.codfw.wmnet
- 11:04 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2031.codfw.wmnet
- 11:03 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading scholarly_articles on wdqs1023.eqiad.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/scholarly/20250526/ using stat1011.eqiad.wmnet)
- 11:03 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2031.codfw.wmnet
- 11:02 gehel: restarting Blazegraph on wdqs1023 to address allocator decreasing alert
- 10:57 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti2031.codfw.wmnet
- 10:56 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T395241)', diff saved to https://phabricator.wikimedia.org/P77136 and previous config saved to /var/cache/conftool/dbconfig/20250605-105650-fceratto.json
- 10:55 marostegui@cumin1002: dbctl commit (dc=all): 'es2046 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P77135 and previous config saved to /var/cache/conftool/dbconfig/20250605-105507-root.json
- 10:55 marostegui@cumin1002: dbctl commit (dc=all): 'es2043 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P77134 and previous config saved to /var/cache/conftool/dbconfig/20250605-105500-root.json
- 10:52 marostegui@cumin1002: dbctl commit (dc=all): 'db2158 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P77133 and previous config saved to /var/cache/conftool/dbconfig/20250605-105216-root.json
- 10:49 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2179 (T395241)', diff saved to https://phabricator.wikimedia.org/P77132 and previous config saved to /var/cache/conftool/dbconfig/20250605-104928-fceratto.json
- 10:49 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2179.codfw.wmnet with reason: Maintenance
- 10:49 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T395241)', diff saved to https://phabricator.wikimedia.org/P77131 and previous config saved to /var/cache/conftool/dbconfig/20250605-104912-fceratto.json
- 10:42 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2031.codfw.wmnet
- 10:41 jmm@cumin1003: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti2030.codfw.wmnet
- 10:40 marostegui@cumin1002: dbctl commit (dc=all): 'es2046 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P77130 and previous config saved to /var/cache/conftool/dbconfig/20250605-104002-root.json
- 10:39 marostegui@cumin1002: dbctl commit (dc=all): 'es2043 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P77129 and previous config saved to /var/cache/conftool/dbconfig/20250605-103954-root.json
- 10:39 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1165.eqiad.wmnet
- 10:37 marostegui@cumin1002: dbctl commit (dc=all): 'db2158 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P77128 and previous config saved to /var/cache/conftool/dbconfig/20250605-103711-root.json
- 10:34 tappof@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host centrallog2002.codfw.wmnet
- 10:34 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P77127 and previous config saved to /var/cache/conftool/dbconfig/20250605-103403-fceratto.json
- 10:33 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2030.codfw.wmnet
- 10:32 stevemunene@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-worker1165.eqiad.wmnet
- 10:31 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1164.eqiad.wmnet
- 10:31 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-experimental: apply
- 10:30 Ammar: Ran fixStuckGlobalRename.php for T396054
- 10:27 claime: Manual run of generatecaptcha on mw-cron with delete - T388531
- 10:26 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 10:26 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 10:24 marostegui@cumin1002: dbctl commit (dc=all): 'es2046 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P77126 and previous config saved to /var/cache/conftool/dbconfig/20250605-102456-root.json
- 10:24 marostegui@cumin1002: dbctl commit (dc=all): 'es2043 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P77125 and previous config saved to /var/cache/conftool/dbconfig/20250605-102449-root.json
- 10:24 stevemunene@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-worker1164.eqiad.wmnet
- 10:23 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1163.eqiad.wmnet
- 10:23 marostegui@cumin1002: dbctl commit (dc=all): 'Promote es2041 to es4 master and es2044 as es5 master', diff saved to https://phabricator.wikimedia.org/P77124 and previous config saved to /var/cache/conftool/dbconfig/20250605-102319-root.json
- 10:22 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2029.codfw.wmnet
- 10:22 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2029.codfw.wmnet
- 10:22 marostegui@cumin1002: dbctl commit (dc=all): 'db2158 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P77123 and previous config saved to /var/cache/conftool/dbconfig/20250605-102205-root.json
- 10:21 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-experimental: apply
- 10:18 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P77122 and previous config saved to /var/cache/conftool/dbconfig/20250605-101856-fceratto.json
- 10:16 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti2029.codfw.wmnet
- 10:16 stevemunene@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-worker1163.eqiad.wmnet
- 10:09 tappof@cumin1002: START - Cookbook sre.hosts.reboot-single for host centrallog2002.codfw.wmnet
- 10:09 marostegui@cumin1002: dbctl commit (dc=all): 'es2046 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P77121 and previous config saved to /var/cache/conftool/dbconfig/20250605-100950-root.json
- 10:09 marostegui@cumin1002: dbctl commit (dc=all): 'es2043 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P77120 and previous config saved to /var/cache/conftool/dbconfig/20250605-100943-root.json
- 10:08 cgoubert@deploy1003: Finished scap sync-world: Backport for Revert "ores-extension: enable extension with revertrisk filter for second batch of wikis" (duration: 10m 36s)
- 10:08 tappof@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2007.codfw.wmnet
- 10:07 marostegui@cumin1002: dbctl commit (dc=all): 'db2158 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P77119 and previous config saved to /var/cache/conftool/dbconfig/20250605-100700-root.json
- 10:05 marostegui@cumin1002: dbctl commit (dc=all): 'db2169 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P77118 and previous config saved to /var/cache/conftool/dbconfig/20250605-100527-root.json
- 10:04 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es[2043,2046].codfw.wmnet with reason: Maintenance
- 10:04 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es2043 es2046 T395241', diff saved to https://phabricator.wikimedia.org/P77117 and previous config saved to /var/cache/conftool/dbconfig/20250605-100419-marostegui.json
- 10:04 joal@deploy1003: Finished deploy [airflow-dags/analytics_test@d11bd51]: Update webrequest-test hive jar for ua-parser (duration: 00m 16s)
- 10:03 joal@deploy1003: Started deploy [airflow-dags/analytics_test@d11bd51]: Update webrequest-test hive jar for ua-parser
- 10:03 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T395241)', diff saved to https://phabricator.wikimedia.org/P77116 and previous config saved to /var/cache/conftool/dbconfig/20250605-100349-fceratto.json
- 10:03 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2029.codfw.wmnet
- 10:01 cgoubert@deploy1003: gkyziridis, cgoubert: Continuing with sync
- 10:00 tappof@cumin1002: START - Cookbook sre.hosts.reboot-single for host prometheus2007.codfw.wmnet
- 10:00 cgoubert@deploy1003: gkyziridis, cgoubert: Backport for Revert "ores-extension: enable extension with revertrisk filter for second batch of wikis" synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 09:58 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2028.codfw.wmnet
- 09:58 tappof@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus1007.eqiad.wmnet
- 09:58 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2028.codfw.wmnet
- 09:57 cgoubert@deploy1003: Started scap sync-world: Backport for Revert "ores-extension: enable extension with revertrisk filter for second batch of wikis"
- 09:54 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2172 (T395241)', diff saved to https://phabricator.wikimedia.org/P77115 and previous config saved to /var/cache/conftool/dbconfig/20250605-095415-fceratto.json
- 09:54 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2172.codfw.wmnet with reason: Maintenance
- 09:53 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T395241)', diff saved to https://phabricator.wikimedia.org/P77114 and previous config saved to /var/cache/conftool/dbconfig/20250605-095347-fceratto.json
- 09:53 gmodena@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich: apply
- 09:53 gmodena@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich: apply
- 09:52 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti2028.codfw.wmnet
- 09:51 marostegui@cumin1002: dbctl commit (dc=all): 'db2158 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P77113 and previous config saved to /var/cache/conftool/dbconfig/20250605-095155-root.json
- 09:50 tappof@cumin1002: START - Cookbook sre.hosts.reboot-single for host prometheus1007.eqiad.wmnet
- 09:50 marostegui@cumin1002: dbctl commit (dc=all): 'db2169 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P77112 and previous config saved to /var/cache/conftool/dbconfig/20250605-095022-root.json
- 09:44 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2028.codfw.wmnet
- 09:44 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 09:43 claime: Re-enabling CPU/RAM limits on mw-cron - T395436
- 09:43 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 09:38 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P77111 and previous config saved to /var/cache/conftool/dbconfig/20250605-093840-fceratto.json
- 09:36 marostegui@cumin1002: dbctl commit (dc=all): 'db2158 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P77110 and previous config saved to /var/cache/conftool/dbconfig/20250605-093649-root.json
- 09:36 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 09:36 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2027.codfw.wmnet
- 09:36 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2027.codfw.wmnet
- 09:36 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 09:35 hnowlan: Migrate reading lists API out of restbase for group1 via rest-gateway
- 09:35 marostegui@cumin1002: dbctl commit (dc=all): 'db2169 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P77109 and previous config saved to /var/cache/conftool/dbconfig/20250605-093515-root.json
- 09:31 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2158.codfw.wmnet with reason: Maintenance
- 09:31 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2158 T395989', diff saved to https://phabricator.wikimedia.org/P77108 and previous config saved to /var/cache/conftool/dbconfig/20250605-093107-marostegui.json
- 09:30 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti2027.codfw.wmnet
- 09:26 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2027.codfw.wmnet
- 09:25 jmm@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "add netflow7002 - jmm@cumin1003"
- 09:24 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2026.codfw.wmnet
- 09:24 jmm@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "add netflow7002 - jmm@cumin1003"
- 09:24 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2026.codfw.wmnet
- 09:23 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P77107 and previous config saved to /var/cache/conftool/dbconfig/20250605-092333-fceratto.json
- 09:20 marostegui@cumin1002: dbctl commit (dc=all): 'db2169 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P77106 and previous config saved to /var/cache/conftool/dbconfig/20250605-092010-root.json
- 09:18 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti2026.codfw.wmnet
- 09:17 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ping2004.codfw.wmnet
- 09:13 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ping2004.codfw.wmnet
- 09:08 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T395241)', diff saved to https://phabricator.wikimedia.org/P77105 and previous config saved to /var/cache/conftool/dbconfig/20250605-090825-fceratto.json
- 09:07 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 09:07 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 09:05 marostegui@cumin1002: dbctl commit (dc=all): 'db2169 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P77104 and previous config saved to /var/cache/conftool/dbconfig/20250605-090504-root.json
- 08:58 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2155 (T395241)', diff saved to https://phabricator.wikimedia.org/P77103 and previous config saved to /var/cache/conftool/dbconfig/20250605-085847-fceratto.json
- 08:58 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance
- 08:58 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T395241)', diff saved to https://phabricator.wikimedia.org/P77102 and previous config saved to /var/cache/conftool/dbconfig/20250605-085820-fceratto.json
- 08:53 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2026.codfw.wmnet
- 08:50 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2025.codfw.wmnet
- 08:50 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2025.codfw.wmnet
- 08:50 marostegui@cumin1002: dbctl commit (dc=all): 'db2169 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P77101 and previous config saved to /var/cache/conftool/dbconfig/20250605-084959-root.json
- 08:47 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ping1004.eqiad.wmnet
- 08:46 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2169.codfw.wmnet with reason: Maintenance
- 08:45 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2169 T395989', diff saved to https://phabricator.wikimedia.org/P77100 and previous config saved to /var/cache/conftool/dbconfig/20250605-084557-marostegui.json
- 08:44 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti2025.codfw.wmnet
- 08:43 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ping1004.eqiad.wmnet
- 08:43 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P77099 and previous config saved to /var/cache/conftool/dbconfig/20250605-084313-fceratto.json
- 08:35 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 08:34 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 08:32 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 08:32 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 08:31 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2025.codfw.wmnet
- 08:28 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P77098 and previous config saved to /var/cache/conftool/dbconfig/20250605-082806-fceratto.json
- 08:24 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host netflow7002.magru.wmnet
- 08:24 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host netflow7002.magru.wmnet with OS bookworm
- 08:24 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 08:24 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 08:12 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T395241)', diff saved to https://phabricator.wikimedia.org/P77097 and previous config saved to /var/cache/conftool/dbconfig/20250605-081258-fceratto.json
- 08:10 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2024.codfw.wmnet
- 08:09 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2024.codfw.wmnet
- 08:06 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on netflow7002.magru.wmnet with reason: host reimage
- 08:03 jmm@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on netflow7002.magru.wmnet with reason: host reimage
- 08:03 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti2024.codfw.wmnet
- 08:03 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2147 (T395241)', diff saved to https://phabricator.wikimedia.org/P77096 and previous config saved to /var/cache/conftool/dbconfig/20250605-080310-fceratto.json
- 08:03 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: Maintenance
- 08:00 gmodena@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich-next: apply
- 08:00 gmodena@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich-next: apply
- 07:56 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2024.codfw.wmnet
- 07:50 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2023.codfw.wmnet
- 07:49 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2023.codfw.wmnet
- 07:44 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 07:43 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 07:43 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti2023.codfw.wmnet
- 07:38 gkyziridis@deploy1003: Sync cancelled.
- 07:36 jmm@cumin1003: START - Cookbook sre.hosts.reimage for host netflow7002.magru.wmnet with OS bookworm
- 07:36 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2023.codfw.wmnet
- 07:32 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2022.codfw.wmnet
- 07:32 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2022.codfw.wmnet
- 07:27 jmm@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM netflow7002.magru.wmnet - jmm@cumin1003"
- 07:27 jmm@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM netflow7002.magru.wmnet - jmm@cumin1003"
- 07:26 jmm@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) netflow7002.magru.wmnet on all recursors
- 07:26 jmm@cumin1003: START - Cookbook sre.dns.wipe-cache netflow7002.magru.wmnet on all recursors
- 07:26 jmm@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 07:26 jmm@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM netflow7002.magru.wmnet - jmm@cumin1003"
- 07:26 jmm@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM netflow7002.magru.wmnet - jmm@cumin1003"
- 07:26 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti2022.codfw.wmnet
- 07:23 gkyziridis@deploy1003: gkyziridis: Backport for ores-extension: enable extension with revertrisk filter for second batch of wikis (T395823) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 07:22 jmm@cumin1003: START - Cookbook sre.dns.netbox
- 07:22 jmm@cumin1003: START - Cookbook sre.ganeti.makevm for new host netflow7002.magru.wmnet
- 07:21 gkyziridis@deploy1003: Started scap sync-world: Backport for ores-extension: enable extension with revertrisk filter for second batch of wikis (T395823)
- 07:19 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2022.codfw.wmnet
- 06:41 marostegui@cumin1002: dbctl commit (dc=all): 'db2180 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P77095 and previous config saved to /var/cache/conftool/dbconfig/20250605-064137-root.json
- 06:26 marostegui@cumin1002: dbctl commit (dc=all): 'db2180 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P77094 and previous config saved to /var/cache/conftool/dbconfig/20250605-062629-root.json
- 06:19 marostegui: Change datadir on pc8 dbmaint eqiad codfw T395983
- 06:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repool pc8 T395983', diff saved to https://phabricator.wikimedia.org/P77093 and previous config saved to /var/cache/conftool/dbconfig/20250605-061929-marostegui.json
- 06:16 marostegui@cumin1002: dbctl commit (dc=all): 'Depool pc8 T395983', diff saved to https://phabricator.wikimedia.org/P77092 and previous config saved to /var/cache/conftool/dbconfig/20250605-061612-marostegui.json
- 06:15 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on pc2018.codfw.wmnet,pc1018.eqiad.wmnet with reason: Maintenance
- 06:15 marostegui: Change datadir on pc7 dbmaint eqiad codfw T395983
- 06:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repool pc7 T395983', diff saved to https://phabricator.wikimedia.org/P77091 and previous config saved to /var/cache/conftool/dbconfig/20250605-061502-marostegui.json
- 06:12 marostegui@cumin1002: dbctl commit (dc=all): 'Depool pc7 T395983', diff saved to https://phabricator.wikimedia.org/P77090 and previous config saved to /var/cache/conftool/dbconfig/20250605-061200-marostegui.json
- 06:11 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on pc2017.codfw.wmnet,pc1017.eqiad.wmnet with reason: Maintenance
- 06:11 marostegui@cumin1002: dbctl commit (dc=all): 'db2180 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P77089 and previous config saved to /var/cache/conftool/dbconfig/20250605-061124-root.json
- 05:57 marostegui: Change datadir on pc6 dbmaint eqiad codfw T395983
- 05:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repool pc6 T395983', diff saved to https://phabricator.wikimedia.org/P77088 and previous config saved to /var/cache/conftool/dbconfig/20250605-055655-marostegui.json
- 05:56 marostegui@cumin1002: dbctl commit (dc=all): 'db2180 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P77087 and previous config saved to /var/cache/conftool/dbconfig/20250605-055619-root.json
- 05:54 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on pc2016.codfw.wmnet,pc1016.eqiad.wmnet with reason: Maintenance
- 05:54 marostegui@cumin1002: dbctl commit (dc=all): 'Depool pc6 T395983', diff saved to https://phabricator.wikimedia.org/P77086 and previous config saved to /var/cache/conftool/dbconfig/20250605-055438-marostegui.json
- 05:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repool pc5 T395983', diff saved to https://phabricator.wikimedia.org/P77085 and previous config saved to /var/cache/conftool/dbconfig/20250605-055349-marostegui.json
- 05:51 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on pc2015.codfw.wmnet,pc1015.eqiad.wmnet with reason: Maintenance
- 05:51 marostegui: Change datadir on pc5 dbmaint eqiad codfw T395983
- 05:51 marostegui@cumin1002: dbctl commit (dc=all): 'Depool pc5 T395983', diff saved to https://phabricator.wikimedia.org/P77084 and previous config saved to /var/cache/conftool/dbconfig/20250605-055121-marostegui.json
- 05:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repool pc4 T395983', diff saved to https://phabricator.wikimedia.org/P77083 and previous config saved to /var/cache/conftool/dbconfig/20250605-055013-marostegui.json
- 05:48 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on pc2014.codfw.wmnet,pc1014.eqiad.wmnet with reason: Maintenance
- 05:48 marostegui@cumin1002: dbctl commit (dc=all): 'Depool pc4 T395983', diff saved to https://phabricator.wikimedia.org/P77082 and previous config saved to /var/cache/conftool/dbconfig/20250605-054806-marostegui.json
- 05:47 marostegui: Change datadir on pc4 dbmaint eqiad codfw T395983
- 05:47 marostegui: Change datadir on pc3 dbmaint eqiad codfw T395983
- 05:41 marostegui@cumin1002: dbctl commit (dc=all): 'db2180 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P77081 and previous config saved to /var/cache/conftool/dbconfig/20250605-054113-root.json
- 05:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repool pc3 T395983', diff saved to https://phabricator.wikimedia.org/P77080 and previous config saved to /var/cache/conftool/dbconfig/20250605-053647-marostegui.json
- 05:33 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on pc2013.codfw.wmnet,pc1013.eqiad.wmnet with reason: Maintenance
- 05:33 marostegui@cumin1002: dbctl commit (dc=all): 'Depool pc3 T395983', diff saved to https://phabricator.wikimedia.org/P77079 and previous config saved to /var/cache/conftool/dbconfig/20250605-053317-marostegui.json
- 05:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repool pc2 T395983', diff saved to https://phabricator.wikimedia.org/P77078 and previous config saved to /var/cache/conftool/dbconfig/20250605-052934-marostegui.json
- 05:26 marostegui@cumin1002: dbctl commit (dc=all): 'db2180 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P77077 and previous config saved to /var/cache/conftool/dbconfig/20250605-052604-root.json
- 05:25 marostegui: Change datadir on pc2 dbmaint eqiad codfw T395983
- 05:25 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on pc2012.codfw.wmnet,pc1012.eqiad.wmnet with reason: Maintenance
- 05:24 marostegui@cumin1002: dbctl commit (dc=all): 'Depool pc2 T395983', diff saved to https://phabricator.wikimedia.org/P77076 and previous config saved to /var/cache/conftool/dbconfig/20250605-052442-marostegui.json
- 05:20 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2180.codfw.wmnet with reason: Maintenance
- 05:20 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2180 T395989', diff saved to https://phabricator.wikimedia.org/P77075 and previous config saved to /var/cache/conftool/dbconfig/20250605-052003-marostegui.json
2025-06-04
- 23:55 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-restart-reboot-ncredir (exit_code=0) rolling reboot on A:ncredir
- 22:45 brett@cumin2002: START - Cookbook sre.cdn.roll-restart-reboot-ncredir rolling reboot on A:ncredir
- 22:30 robh@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts cp7016.magru.wmnet
- 22:27 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1185.eqiad.wmnet with OS bullseye
- 22:20 robh@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts cp7014.magru.wmnet
- 22:18 damilare: SmashPig upgraded from d08693e5 to 3222a1f3
- 22:16 ladsgroup@deploy1003: Finished scap sync-world: Backport for Bump cache key version in EventStore (T396075) (duration: 13m 54s)
- 22:12 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp7016.magru.wmnet
- 22:12 robh@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp7015.magru.wmnet
- 22:12 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp7015.magru.wmnet
- 22:11 robh@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp7015.magru.wmnet
- 22:11 brett: sudo -i cumin 'A:ncredir' 'depool && apt-get update && apt-get upgrade -y && pool' -b1 -s10
- 22:09 ladsgroup@deploy1003: ladsgroup: Continuing with sync
- 22:04 ladsgroup@deploy1003: ladsgroup: Backport for Bump cache key version in EventStore (T396075) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 22:02 ladsgroup@deploy1003: Started scap sync-world: Backport for Bump cache key version in EventStore (T396075)
- 22:02 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp7015.magru.wmnet
- 22:02 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp7014.magru.wmnet
- 22:02 robh@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts cp7013.magru.wmnet
- 21:58 robh@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts cp7012.magru.wmnet
- 21:43 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp7013.magru.wmnet
- 21:42 robh@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp7011.magru.wmnet
- 21:40 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp7012.magru.wmnet
- 21:40 robh@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp7010.magru.wmnet
- 21:39 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp7010.magru.wmnet
- 21:35 robh@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp7010.magru.wmnet
- 21:29 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading scholarly_articles on wdqs1023.eqiad.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/scholarly/20250526/ using stat1011.eqiad.wmnet)
- 21:25 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp7011.magru.wmnet
- 21:25 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp7010.magru.wmnet
- 21:24 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_main on wdqs1022.eqiad.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/main/20250526/ using stat1009.eqiad.wmnet)
- 21:22 robh@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp7009.magru.wmnet
- 21:14 robh@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts cp7008.magru.wmnet
- 21:07 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1185.eqiad.wmnet with OS bullseye
- 21:06 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 21:05 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp7009.magru.wmnet
- 21:05 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp7008.magru.wmnet
- 21:04 cjming: end of UTC late backport window
- 21:04 robh@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp7007.magru.wmnet
- 21:02 cjming@deploy1003: Finished scap sync-world: Backport for SUL3: Retry local login on failure due to invalid/expired login token (T390784), SUL3: Retry local login on failure… (follow-ups) (T390784), SUL3: Retry local login on failure due to invalid/expired login token (T390784), SUL3: Retry local login on failure… (follow-ups) (T390784) (d
- 21:01 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
- 20:55 cjming@deploy1003: matmarex, cjming: Continuing with sync
- 20:55 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 20:54 cjming@deploy1003: matmarex, cjming: Backport for SUL3: Retry local login on failure due to invalid/expired login token (T390784), SUL3: Retry local login on failure… (follow-ups) (T390784), SUL3: Retry local login on failure due to invalid/expired login token (T390784), SUL3: Retry local login on failure… (follow-ups) (T390784) synced to
- 20:51 cjming@deploy1003: Started scap sync-world: Backport for SUL3: Retry local login on failure due to invalid/expired login token (T390784), SUL3: Retry local login on failure… (follow-ups) (T390784), SUL3: Retry local login on failure due to invalid/expired login token (T390784), SUL3: Retry local login on failure… (follow-ups) (T390784)
- 20:51 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
- 20:50 robh@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp7006.magru.wmnet
- 20:46 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp7007.magru.wmnet
- 20:44 robh@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts cp7005.magru.wmnet
- 20:40 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp7006.magru.wmnet
- 20:38 cjming@deploy1003: Finished scap sync-world: Backport for Treat File::getShortDesc() as possibly unsafe HTML (T395834), Treat File::getShortDesc() as possibly unsafe HTML (T395834) (duration: 15m 37s)
- 20:37 robh@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp7004.magru.wmnet
- 20:31 cjming@deploy1003: cjming, matmarex: Continuing with sync
- 20:26 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp7005.magru.wmnet
- 20:26 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp7004.magru.wmnet
- 20:25 cjming@deploy1003: cjming, matmarex: Backport for Treat File::getShortDesc() as possibly unsafe HTML (T395834), Treat File::getShortDesc() as possibly unsafe HTML (T395834) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 20:23 cjming@deploy1003: Started scap sync-world: Backport for Treat File::getShortDesc() as possibly unsafe HTML (T395834), Treat File::getShortDesc() as possibly unsafe HTML (T395834)
- 20:20 robh@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp7003.magru.wmnet
- 20:18 robh@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp7002.magru.wmnet
- 20:15 cjming@deploy1003: Finished scap sync-world: Backport for beta cluster: Set $wgOATHAuthAccountPrefix (T396061) (duration: 10m 13s)
- 20:10 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp7003.magru.wmnet
- 20:09 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp7002.magru.wmnet
- 20:08 cjming@deploy1003: lucaswerkmeister, cjming: Continuing with sync
- 20:07 cjming@deploy1003: lucaswerkmeister, cjming: Backport for beta cluster: Set $wgOATHAuthAccountPrefix (T396061) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 20:05 cjming@deploy1003: Started scap sync-world: Backport for beta cluster: Set $wgOATHAuthAccountPrefix (T396061)
- 20:03 robh@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp3072.esams.wmnet
- 19:54 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp3072.esams.wmnet
- 19:42 robh@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp7001.magru.wmnet
- 19:36 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 19:31 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp7001.magru.wmnet
- 19:23 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 19:22 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 19:13 sukhe@dns1004: END - running authdns-update
- 19:12 sukhe@dns1004: START - running authdns-update
- 19:11 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=dns7002.wikimedia.org [reason: repooling after reboot]
- 19:11 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=dns7002.magru.wmnet [reason: repooling after reboot]
- 19:11 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for dns7002.wikimedia.org
- 19:11 sukhe@cumin1002: START - Cookbook sre.hosts.remove-downtime for dns7002.wikimedia.org
- 19:10 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-reboot (exit_code=0) rolling reboot on A:dnsbox and A:magru and (A:dnsbox)
- 19:10 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot finished rebooting dns7002.wikimedia.org
- 19:10 bvibber@deploy1003: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply
- 19:10 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
- 19:10 bvibber@deploy1003: helmfile [codfw] START helmfile.d/services/chart-renderer: apply
- 19:10 dreamyjazz@deploy1003: Finished scap sync-world: Backport for CustomBlockedDomainStorage::fetchConfig: Cast LinkTarget to a Title for RevisionStore (T396056) (duration: 12m 27s)
- 19:10 bvibber@deploy1003: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply
- 19:09 bvibber@deploy1003: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply
- 19:09 bvibber@deploy1003: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply
- 19:08 bvibber@deploy1003: helmfile [staging] START helmfile.d/services/chart-renderer: apply
- 19:03 dreamyjazz@deploy1003: dreamyjazz: Continuing with sync
- 19:00 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
- 19:00 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 19:00 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 18:59 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
- 18:59 dreamyjazz@deploy1003: dreamyjazz: Backport for CustomBlockedDomainStorage::fetchConfig: Cast LinkTarget to a Title for RevisionStore (T396056) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 18:57 dreamyjazz@deploy1003: Started scap sync-world: Backport for CustomBlockedDomainStorage::fetchConfig: Cast LinkTarget to a Title for RevisionStore (T396056)
- 18:55 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns7002.wikimedia.org
- 18:51 brett@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp7001.*
- 18:49 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
- 18:45 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot finished rebooting dns7001.wikimedia.org
- 18:37 brett@puppetserver1001: conftool action : set/pooled=no; selector: name=cp7001.*
- 18:37 brett: depooling cp7001 for CPU stress testing and temperature effects (T373993)
- 18:29 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns7001.wikimedia.org
- 18:29 sukhe@cumin1002: START - Cookbook sre.dns.roll-reboot rolling reboot on A:dnsbox and A:magru and (A:dnsbox)
- 18:24 gmodena@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich-next: apply
- 18:24 gmodena@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich-next: apply
- 18:18 dduvall@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.45.0-wmf.4 refs T392174
- 18:16 damilare: SmashPig upgraded from a99f2265 to d08693e5
- 18:15 sukhe: puppet re-enabled on A:cp and finished rolling out removal of ats-be from single backend cp nodes: T288106
- 18:14 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 18:13 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 18:03 bvibber@deploy1003: Finished scap sync-world: Backport for Update Charts so they can render from data-mw-charts as well as data-charts (T395462), Update Charts so they can render from data-mw-charts as well as data-charts (T395462) (duration: 10m 05s)
- 17:56 bvibber@deploy1003: bvibber: Continuing with sync
- 17:55 bvibber@deploy1003: bvibber: Backport for Update Charts so they can render from data-mw-charts as well as data-charts (T395462), Update Charts so they can render from data-mw-charts as well as data-charts (T395462) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 17:53 bvibber@deploy1003: Started scap sync-world: Backport for Update Charts so they can render from data-mw-charts as well as data-charts (T395462), Update Charts so they can render from data-mw-charts as well as data-charts (T395462)
- 17:50 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: sync
- 17:49 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: sync
- 17:48 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 17:48 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 17:36 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 17:35 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 17:15 cgoubert@deploy1003: Finished scap sync-world: 1153647: mediawiki: Fix captcha configmap structure - T388531 (duration: 02m 39s)
- 17:13 cgoubert@deploy1003: Started scap sync-world: 1153647: mediawiki: Fix captcha configmap structure - T388531
- 16:41 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1185.eqiad.wmnet with OS bullseye
- 16:33 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:32 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:30 vriley@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-worker1186
- 16:30 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1186
- 16:27 stevemunene@cumin1002: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1165.eqiad.wmnet
- 16:27 stevemunene@cumin1002: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1165.eqiad.wmnet
- 16:22 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 16:22 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt an-worker1186 - vriley@cumin1002"
- 16:22 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt an-worker1186 - vriley@cumin1002"
- 16:19 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=99) for hosts an-worker1164.eqiad.wmnet
- 16:18 stevemunene@cumin1002: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1164.eqiad.wmnet
- 16:18 stevemunene@cumin1002: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1165.eqiad.wmnet
- 16:16 stevemunene@cumin1002: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1165.eqiad.wmnet
- 16:14 vriley@cumin1002: START - Cookbook sre.dns.netbox
- 16:14 stevemunene@cumin1002: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1164.eqiad.wmnet
- 16:13 stevemunene@cumin1002: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1164.eqiad.wmnet
- 16:13 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:12 stevemunene@cumin1002: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1164.eqiad.wmnet
- 16:12 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-restart-reboot-durum (exit_code=0) rolling reboot on A:durum and not A:magru and A:durum
- 16:12 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:11 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns (exit_code=0) rolling reboot on A:wikidough and not A:magru and A:wikidough
- 16:10 stevemunene@cumin1002: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1164.eqiad.wmnet
- 16:07 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1163.eqiad.wmnet
- 16:04 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
- 16:04 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
- 16:03 stevemunene@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-worker1163.eqiad.wmnet
- 16:01 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 (T395241)', diff saved to https://phabricator.wikimedia.org/P77074 and previous config saved to /var/cache/conftool/dbconfig/20250604-160120-fceratto.json
- 15:56 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 15:56 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 15:46 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P77073 and previous config saved to /var/cache/conftool/dbconfig/20250604-154611-fceratto.json
- 15:42 tappof@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2005.codfw.wmnet
- 15:31 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P77072 and previous config saved to /var/cache/conftool/dbconfig/20250604-153104-fceratto.json
- 15:30 tappof@cumin1002: START - Cookbook sre.hosts.reboot-single for host prometheus2005.codfw.wmnet
- 15:29 tappof@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus1005.eqiad.wmnet
- 15:24 dreamyjazz@deploy1003: Finished scap sync-world: Backport for Set wgCheckUserDisableCheckUserAPI to false on loginwiki (T396010) (duration: 10m 03s)
- 15:20 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1185.eqiad.wmnet with OS bullseye
- 15:20 tappof@cumin1002: START - Cookbook sre.hosts.reboot-single for host prometheus1005.eqiad.wmnet
- 15:18 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 15:18 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 15:18 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
- 15:17 dreamyjazz@deploy1003: dreamyjazz: Continuing with sync
- 15:16 dreamyjazz@deploy1003: dreamyjazz: Backport for Set wgCheckUserDisableCheckUserAPI to false on loginwiki (T396010) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 15:15 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 (T395241)', diff saved to https://phabricator.wikimedia.org/P77071 and previous config saved to /var/cache/conftool/dbconfig/20250604-151556-fceratto.json
- 15:15 sukhe@cumin1002: START - Cookbook sre.dns.roll-restart-reboot-durum rolling reboot on A:durum and not A:magru and A:durum
- 15:14 dreamyjazz@deploy1003: Started scap sync-world: Backport for Set wgCheckUserDisableCheckUserAPI to false on loginwiki (T396010)
- 15:14 sukhe@cumin1002: START - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns rolling reboot on A:wikidough and not A:magru and A:wikidough
- 15:14 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1185.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 15:08 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
- 15:07 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1226 (T395241)', diff saved to https://phabricator.wikimedia.org/P77070 and previous config saved to /var/cache/conftool/dbconfig/20250604-150740-fceratto.json
- 15:07 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance
- 15:07 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 (T395241)', diff saved to https://phabricator.wikimedia.org/P77069 and previous config saved to /var/cache/conftool/dbconfig/20250604-150716-fceratto.json
- 15:05 jiji@deploy1003: Finished scap sync-world: T276994: Chart bump, noop (duration: 02m 52s)
- 15:04 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol2010-dev.codfw.wmnet with OS bookworm
- 15:04 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 15:02 jiji@deploy1003: Started scap sync-world: T276994: Chart bump, noop
- 15:00 cmooney@cumin1002: START - Cookbook sre.dns.netbox
- 15:00 cmooney@cumin1002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
- 15:00 cmooney@cumin1002: START - Cookbook sre.dns.netbox
- 15:00 stevemunene@cumin1002: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1163.eqiad.wmnet
- 14:58 stevemunene@cumin1002: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1163.eqiad.wmnet
- 14:58 moritzm: installing Linux 5.10.237 on Bullseye hosts
- 14:55 cmooney@dns2005: END - running authdns-update
- 14:54 cmooney@dns2005: START - running authdns-update
- 14:52 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P77068 and previous config saved to /var/cache/conftool/dbconfig/20250604-145209-fceratto.json
- 14:50 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:50 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove DNS entries for IPs used in Nokia test lab codfw - cmooney@cumin1002"
- 14:49 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove DNS entries for IPs used in Nokia test lab codfw - cmooney@cumin1002"
- 14:49 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be1094.eqiad.wmnet with OS bullseye
- 14:49 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 14:49 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 14:47 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2020.codfw.wmnet
- 14:47 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2020.codfw.wmnet
- 14:46 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1185.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 14:46 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol2010-dev.codfw.wmnet with reason: host reimage
- 14:46 cmooney@cumin1002: START - Cookbook sre.dns.netbox
- 14:46 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be1095.eqiad.wmnet with OS bullseye
- 14:46 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 14:44 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 14:43 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol2010-dev.codfw.wmnet with reason: host reimage
- 14:41 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti2020.codfw.wmnet
- 14:38 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 14:37 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 14:37 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P77067 and previous config saved to /var/cache/conftool/dbconfig/20250604-143702-fceratto.json
- 14:36 cgoubert@deploy1003: Finished scap sync-world: 1153634: mediawiki: Fix captcha wordlists path - T388531 (duration: 02m 33s)
- 14:33 cgoubert@deploy1003: Started scap sync-world: 1153634: mediawiki: Fix captcha wordlists path - T388531
- 14:32 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 14:32 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 14:31 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2020.codfw.wmnet
- 14:31 cgoubert@deploy1003: Finished scap sync-world: 1153634: mediawiki: Fix captcha wordlists path - T388531 (duration: 02m 24s)
- 14:29 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2019.codfw.wmnet
- 14:29 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2019.codfw.wmnet
- 14:28 cgoubert@deploy1003: Started scap sync-world: 1153634: mediawiki: Fix captcha wordlists path - T388531
- 14:28 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
- 14:27 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
- 14:27 cgoubert@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
- 14:26 cgoubert@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
- 14:26 cgoubert@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
- 14:25 cgoubert@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
- 14:24 cgoubert@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 14:24 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts doh7001.wikimedia.org
- 14:24 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:24 sukhe@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: doh7001.wikimedia.org decommissioned, removing all IPs except the asset tag one - sukhe@cumin1002"
- 14:23 sukhe@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: doh7001.wikimedia.org decommissioned, removing all IPs except the asset tag one - sukhe@cumin1002"
- 14:23 cgoubert@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 14:23 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti2019.codfw.wmnet
- 14:21 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 (T395241)', diff saved to https://phabricator.wikimedia.org/P77066 and previous config saved to /var/cache/conftool/dbconfig/20250604-142155-fceratto.json
- 14:21 cgoubert@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 14:21 cgoubert@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 14:20 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcontrol2010-dev.codfw.wmnet with OS bookworm
- 14:19 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts durum7001.magru.wmnet
- 14:19 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:19 sukhe@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: durum7001.magru.wmnet decommissioned, removing all IPs except the asset tag one - sukhe@cumin1002"
- 14:19 sukhe@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: durum7001.magru.wmnet decommissioned, removing all IPs except the asset tag one - sukhe@cumin1002"
- 14:19 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol2010-dev.codfw.wmnet with OS bullseye
- 14:19 andrew@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - andrew@cumin1002"
- 14:17 sukhe@cumin1002: START - Cookbook sre.dns.netbox
- 14:16 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
- 14:14 andrew@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - andrew@cumin1002"
- 14:13 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2019.codfw.wmnet
- 14:12 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1214 (T395241)', diff saved to https://phabricator.wikimedia.org/P77065 and previous config saved to /var/cache/conftool/dbconfig/20250604-141238-fceratto.json
- 14:12 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance
- 14:12 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 (T395241)', diff saved to https://phabricator.wikimedia.org/P77064 and previous config saved to /var/cache/conftool/dbconfig/20250604-141213-fceratto.json
- 14:11 sukhe@cumin1002: START - Cookbook sre.dns.netbox
- 14:11 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
- 14:08 sukhe: decommissioning doh7001 and durum7001: T396015
- 14:07 sukhe@cumin1002: START - Cookbook sre.hosts.decommission for hosts doh7001.wikimedia.org
- 14:07 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
- 14:07 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
- 14:07 sukhe@cumin1002: START - Cookbook sre.hosts.decommission for hosts durum7001.magru.wmnet
- 14:06 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
- 14:06 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
- 14:05 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 14:05 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 14:04 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P{lvs1013.eqiad.wmnet} and A:liberica
- 14:04 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.admin config_reloading P{lvs1013.eqiad.wmnet} and A:liberica
- 14:02 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 14:02 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 14:01 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1094.eqiad.wmnet with reason: host reimage
- 13:58 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol2010-dev.codfw.wmnet with reason: host reimage
- 13:57 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P77062 and previous config saved to /var/cache/conftool/dbconfig/20250604-135706-fceratto.json
- 13:56 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1094.eqiad.wmnet with reason: host reimage
- 13:54 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol2010-dev.codfw.wmnet with reason: host reimage
- 13:51 claime: Manual run of generatecaptcha on mw-cron, no delete - T388531
- 13:51 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 13:50 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 13:49 sukhe: sudo cumin -b1 -s15 'A:cp' 'run-puppet-agent --enable "merging CR 1114074"': T288106
- 13:48 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P{lvs1013.eqiad.wmnet} and A:liberica
- 13:48 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.admin config_reloading P{lvs1013.eqiad.wmnet} and A:liberica
- 13:47 tappof@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host grafana2001.codfw.wmnet
- 13:46 sukhe: forcing ats-backend-restart on cp1104
- 13:43 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1095.eqiad.wmnet with reason: host reimage
- 13:43 marostegui@cumin1002: dbctl commit (dc=all): 'db2217 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P77061 and previous config saved to /var/cache/conftool/dbconfig/20250604-134336-root.json
- 13:43 tappof@cumin1002: START - Cookbook sre.hosts.reboot-single for host grafana2001.codfw.wmnet
- 13:41 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P77060 and previous config saved to /var/cache/conftool/dbconfig/20250604-134158-fceratto.json
- 13:41 tappof@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host grafana2001.codfw.wmnet
- 13:41 tappof@cumin1002: START - Cookbook sre.hosts.reboot-single for host grafana2001.codfw.wmnet
- 13:40 tappof@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host grafana2001.codfw.wmnet
- 13:40 tappof@cumin1002: START - Cookbook sre.hosts.reboot-single for host grafana2001.codfw.wmnet
- 13:40 samtar@deploy1003: Finished scap sync-world: Backport for IS: Undo turning on wgTemplateDataEnableCategoryBrowser for mw.org (T377975) (duration: 09m 57s)
- 13:39 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1095.eqiad.wmnet with reason: host reimage
- 13:38 tappof@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host grafana2001.codfw.wmnet
- 13:38 tappof@cumin1002: START - Cookbook sre.hosts.reboot-single for host grafana2001.codfw.wmnet
- 13:37 sukhe: forcing agent run on cp2037 (non-single BE node): CR 1114074
- 13:37 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be1094.eqiad.wmnet with OS bullseye
- 13:33 samtar@deploy1003: samtar: Continuing with sync
- 13:32 samtar@deploy1003: samtar: Backport for IS: Undo turning on wgTemplateDataEnableCategoryBrowser for mw.org (T377975) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 13:30 sukhe: forcing agent run on cp7001 (single BE node): CR 1114074
- 13:30 samtar@deploy1003: Started scap sync-world: Backport for IS: Undo turning on wgTemplateDataEnableCategoryBrowser for mw.org (T377975)
- 13:29 sukhe: forcing agent run on cp6015: CR 1114074
- 13:28 marostegui@cumin1002: dbctl commit (dc=all): 'db2217 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P77059 and previous config saved to /var/cache/conftool/dbconfig/20250604-132829-root.json
- 13:27 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs1013.eqiad.wmnet
- 13:27 vgutierrez@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs1013.eqiad.wmnet
- 13:26 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 (T395241)', diff saved to https://phabricator.wikimedia.org/P77058 and previous config saved to /var/cache/conftool/dbconfig/20250604-132648-fceratto.json
- 13:23 sukhe: starting removal of ats-be service from eqiad, eqsin, esams, magru, ulsfo: T288106
- 13:21 sukhe: sudo cumin 'A:cp' 'disable-puppet "merging CR 1114074"'
- 13:20 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be1095.eqiad.wmnet with OS bullseye
- 13:18 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1209 (T395241)', diff saved to https://phabricator.wikimedia.org/P77057 and previous config saved to /var/cache/conftool/dbconfig/20250604-131852-fceratto.json
- 13:18 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance
- 13:18 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T395241)', diff saved to https://phabricator.wikimedia.org/P77056 and previous config saved to /var/cache/conftool/dbconfig/20250604-131827-fceratto.json
- 13:14 jforrester@deploy1003: Finished scap sync-world: Backport for release CampaignEvents to cbk-zam wiki (T393604), Bump portals to the 2025-06-02 09:23:11+00:00 build (T128546), build: Rename the rarely-used 'typos' script to 'checkTypos', Drop Chart roll-out dblists, no longer needed (T383079) (duration: 10m 29s)
- 13:13 marostegui@cumin1002: dbctl commit (dc=all): 'db2217 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P77055 and previous config saved to /var/cache/conftool/dbconfig/20250604-131323-root.json
- 13:11 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply
- 13:11 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply
- 13:08 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply
- 13:07 jforrester@deploy1003: jforrester, mhorsey: Continuing with sync
- 13:06 jforrester@deploy1003: jforrester, mhorsey: Backport for release CampaignEvents to cbk-zam wiki (T393604), Bump portals to the 2025-06-02 09:23:11+00:00 build (T128546), build: Rename the rarely-used 'typos' script to 'checkTypos', Drop Chart roll-out dblists, no longer needed (T383079) synced to the testservers (see https://wikitech.wikimedia
- 13:04 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti7001.magru.wmnet to cluster magru03 and group B
- 13:04 jforrester@deploy1003: Started scap sync-world: Backport for release CampaignEvents to cbk-zam wiki (T393604), Bump portals to the 2025-06-02 09:23:11+00:00 build (T128546), build: Rename the rarely-used 'typos' script to 'checkTypos', Drop Chart roll-out dblists, no longer needed (T383079)
- 13:03 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P77054 and previous config saved to /var/cache/conftool/dbconfig/20250604-130319-fceratto.json
- 13:03 jmm@cumin1003: START - Cookbook sre.ganeti.addnode for new host ganeti7001.magru.wmnet to cluster magru03 and group B
- 13:02 sbassett@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
- 13:02 sbassett@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
- 13:02 sbassett@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
- 13:01 sbassett@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
- 13:01 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti7001.magru.wmnet
- 12:58 marostegui@cumin1002: dbctl commit (dc=all): 'db2217 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P77053 and previous config saved to /var/cache/conftool/dbconfig/20250604-125817-root.json
- 12:43 marostegui@cumin1002: dbctl commit (dc=all): 'db2217 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P77051 and previous config saved to /var/cache/conftool/dbconfig/20250604-124311-root.json
- 12:43 andrew@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudgw1004.eqiad.wmnet
- 12:42 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw1003.eqiad.wmnet
- 12:39 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply
- 12:39 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti7001.magru.wmnet with OS bookworm
- 12:39 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply
- 12:37 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ms-be1095.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 12:36 moritzm: installing modsecurity-apache security updates
- 12:36 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ms-be1094.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 12:35 andrew@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudgw1003.eqiad.wmnet
- 12:35 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 12:35 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for ms-be1094/95 - jclark@cumin1002"
- 12:35 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for ms-be1094/95 - jclark@cumin1002"
- 12:34 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcontrol2010-dev.codfw.wmnet with OS bullseye
- 12:33 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T395241)', diff saved to https://phabricator.wikimedia.org/P77050 and previous config saved to /var/cache/conftool/dbconfig/20250604-123304-fceratto.json
- 12:32 jclark@cumin1002: START - Cookbook sre.dns.netbox
- 12:29 marostegui@cumin1002: dbctl commit (dc=all): 'db2193 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P77049 and previous config saved to /var/cache/conftool/dbconfig/20250604-122948-root.json
- 12:28 reedy@deploy1003: Finished scap sync-world: Backport for GenerateFancyCaptchas: Handle captcha.py not generating any captchas, but not erroring (T388531), captcha.py: Expand variables and user in filenames (T395810), captcha.py: Check if output dir exists, and attempt to create it (else error) (T395804), [[gerrit:1153595|captcha.py: Bail out if no words were rea
- 12:28 marostegui@cumin1002: dbctl commit (dc=all): 'db2217 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P77048 and previous config saved to /var/cache/conftool/dbconfig/20250604-122806-root.json
- 12:27 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 12:27 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudcontrol2010-dev.codfw.wmnet on all recursors
- 12:27 cmooney@cumin1002: START - Cookbook sre.dns.wipe-cache cloudcontrol2010-dev.codfw.wmnet on all recursors
- 12:26 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudcephosd2010-dev.codfw.wmnet on all recursors
- 12:26 cmooney@cumin1002: START - Cookbook sre.dns.wipe-cache cloudcephosd2010-dev.codfw.wmnet on all recursors
- 12:25 jclark@cumin1002: START - Cookbook sre.dns.netbox
- 12:25 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 12:25 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: fix entries for cloudcontrol2010-dev which had been added on wrong vlan - cmooney@cumin1002"
- 12:25 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: fix entries for cloudcontrol2010-dev which had been added on wrong vlan - cmooney@cumin1002"
- 12:24 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1203 (T395241)', diff saved to https://phabricator.wikimedia.org/P77047 and previous config saved to /var/cache/conftool/dbconfig/20250604-122436-fceratto.json
- 12:24 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance
- 12:24 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T395241)', diff saved to https://phabricator.wikimedia.org/P77046 and previous config saved to /var/cache/conftool/dbconfig/20250604-122411-fceratto.json
- 12:23 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2217 T395989', diff saved to https://phabricator.wikimedia.org/P77045 and previous config saved to /var/cache/conftool/dbconfig/20250604-122303-marostegui.json
- 12:22 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2217.codfw.wmnet with reason: Maintenance
- 12:21 cmooney@cumin1002: START - Cookbook sre.dns.netbox
- 12:21 reedy@deploy1003: reedy: Continuing with sync
- 12:21 cmooney@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
- 12:20 reedy@deploy1003: reedy: Backport for GenerateFancyCaptchas: Handle captcha.py not generating any captchas, but not erroring (T388531), captcha.py: Expand variables and user in filenames (T395810), captcha.py: Check if output dir exists, and attempt to create it (else error) (T395804), [[gerrit:1153595|captcha.py: Bail out if no words were read from wordlist (T3
- 12:18 cmooney@cumin1002: START - Cookbook sre.dns.netbox
- 12:18 reedy@deploy1003: Started scap sync-world: Backport for GenerateFancyCaptchas: Handle captcha.py not generating any captchas, but not erroring (T388531), captcha.py: Expand variables and user in filenames (T395810), captcha.py: Check if output dir exists, and attempt to create it (else error) (T395804), [[gerrit:1153595|captcha.py: Bail out if no words were read
- 12:16 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti7001.magru.wmnet with reason: host reimage
- 12:14 marostegui@cumin1002: dbctl commit (dc=all): 'db2193 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P77044 and previous config saved to /var/cache/conftool/dbconfig/20250604-121442-root.json
- 12:12 jmm@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti7001.magru.wmnet with reason: host reimage
- 12:09 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P77043 and previous config saved to /var/cache/conftool/dbconfig/20250604-120904-fceratto.json
- 11:59 marostegui@cumin1002: dbctl commit (dc=all): 'db2193 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P77041 and previous config saved to /var/cache/conftool/dbconfig/20250604-115936-root.json
- 11:58 samtar@deploy1003: Finished scap sync-world: Backport for IS/IS-labs: Enable TemplateDiscovery flags for mediawikiwiki (T377975) (duration: 12m 28s)
- 11:55 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts krb1001.eqiad.wmnet
- 11:55 jmm@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 11:55 jmm@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: krb1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin1003"
- 11:53 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P77040 and previous config saved to /var/cache/conftool/dbconfig/20250604-115357-fceratto.json
- 11:51 jmm@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: krb1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin1003"
- 11:51 jmm@cumin1003: START - Cookbook sre.hosts.reimage for host ganeti7001.magru.wmnet with OS bookworm
- 11:51 samtar@deploy1003: samtar: Continuing with sync
- 11:47 samtar@deploy1003: samtar: Backport for IS/IS-labs: Enable TemplateDiscovery flags for mediawikiwiki (T377975) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 11:45 samtar@deploy1003: Started scap sync-world: Backport for IS/IS-labs: Enable TemplateDiscovery flags for mediawikiwiki (T377975)
- 11:44 marostegui@cumin1002: dbctl commit (dc=all): 'db2193 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P77038 and previous config saved to /var/cache/conftool/dbconfig/20250604-114430-root.json
- 11:38 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T395241)', diff saved to https://phabricator.wikimedia.org/P77037 and previous config saved to /var/cache/conftool/dbconfig/20250604-113849-fceratto.json
- 11:38 jmm@cumin1003: START - Cookbook sre.dns.netbox
- 11:35 mvolz@deploy1003: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
- 11:35 bwojtowicz@deploy1003: helmfile [ml-serve-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
- 11:35 mvolz@deploy1003: helmfile [eqiad] START helmfile.d/services/citoid: apply
- 11:34 bwojtowicz@deploy1003: helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' .
- 11:34 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti4008.ulsfo.wmnet
- 11:34 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4008.ulsfo.wmnet
- 11:33 mvolz@deploy1003: helmfile [codfw] DONE helmfile.d/services/citoid: apply
- 11:32 mvolz@deploy1003: helmfile [codfw] START helmfile.d/services/citoid: apply
- 11:32 jmm@cumin1003: START - Cookbook sre.hosts.decommission for hosts krb1001.eqiad.wmnet
- 11:31 mvolz@deploy1003: helmfile [staging] DONE helmfile.d/services/citoid: apply
- 11:31 mvolz@deploy1003: helmfile [staging] START helmfile.d/services/citoid: apply
- 11:30 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1192 (T395241)', diff saved to https://phabricator.wikimedia.org/P77036 and previous config saved to /var/cache/conftool/dbconfig/20250604-113030-fceratto.json
- 11:30 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance
- 11:30 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T395241)', diff saved to https://phabricator.wikimedia.org/P77035 and previous config saved to /var/cache/conftool/dbconfig/20250604-113005-fceratto.json
- 11:29 marostegui@cumin1002: dbctl commit (dc=all): 'db2193 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P77034 and previous config saved to /var/cache/conftool/dbconfig/20250604-112923-root.json
- 11:27 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti4008.ulsfo.wmnet
- 11:25 mvolz@deploy1003: helmfile [staging] DONE helmfile.d/services/citoid: apply
- 11:25 mvolz@deploy1003: helmfile [staging] START helmfile.d/services/citoid: apply
- 11:22 mvolz@deploy1003: helmfile [staging] DONE helmfile.d/services/citoid: apply
- 11:22 mvolz@deploy1003: helmfile [staging] START helmfile.d/services/citoid: apply
- 11:20 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti4008.ulsfo.wmnet
- 11:14 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P77033 and previous config saved to /var/cache/conftool/dbconfig/20250604-111457-fceratto.json
- 11:14 claime: Deployed k8s-controller-sidecars version 1.0.2-3
- 11:14 marostegui@cumin1002: dbctl commit (dc=all): 'db2193 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P77032 and previous config saved to /var/cache/conftool/dbconfig/20250604-111418-root.json
- 11:10 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2193.codfw.wmnet with reason: Maintenance
- 11:09 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2193 T395989', diff saved to https://phabricator.wikimedia.org/P77031 and previous config saved to /var/cache/conftool/dbconfig/20250604-110955-marostegui.json
- 11:06 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
- 11:06 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
- 11:06 cgoubert@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
- 11:05 cgoubert@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
- 11:05 cgoubert@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 11:04 cgoubert@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 11:04 cgoubert@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
- 11:03 cgoubert@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
- 10:59 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P77030 and previous config saved to /var/cache/conftool/dbconfig/20250604-105950-fceratto.json
- 10:52 jmm@cumin1003: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti7001.magru.wmnet to cluster magru03 and group B
- 10:51 jmm@cumin1003: START - Cookbook sre.ganeti.addnode for new host ganeti7001.magru.wmnet to cluster magru03 and group B
- 10:45 akosiaris@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
- 10:44 akosiaris@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
- 10:44 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T395241)', diff saved to https://phabricator.wikimedia.org/P77029 and previous config saved to /var/cache/conftool/dbconfig/20250604-104443-fceratto.json
- 10:38 jmm@cumin1003: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti7001.magru.wmnet to cluster magru03 and group B
- 10:37 jmm@cumin1003: START - Cookbook sre.ganeti.addnode for new host ganeti7001.magru.wmnet to cluster magru03 and group B
- 10:36 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1178 (T395241)', diff saved to https://phabricator.wikimedia.org/P77028 and previous config saved to /var/cache/conftool/dbconfig/20250604-103629-fceratto.json
- 10:36 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance
- 10:36 moritzm: failover ganeti master in ulsfo to ganeti4005
- 10:36 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T395241)', diff saved to https://phabricator.wikimedia.org/P77027 and previous config saved to /var/cache/conftool/dbconfig/20250604-103604-fceratto.json
- 10:35 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti4007.ulsfo.wmnet
- 10:35 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti7001.magru.wmnet
- 10:35 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4007.ulsfo.wmnet
- 10:32 marostegui@cumin1002: dbctl commit (dc=all): 'es2044 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P77026 and previous config saved to /var/cache/conftool/dbconfig/20250604-103233-root.json
- 10:29 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti4007.ulsfo.wmnet
- 10:25 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti4007.ulsfo.wmnet
- 10:25 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti7001.magru.wmnet
- 10:23 marostegui@cumin1002: dbctl commit (dc=all): 'es2041 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P77025 and previous config saved to /var/cache/conftool/dbconfig/20250604-102351-root.json
- 10:23 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti4006.ulsfo.wmnet
- 10:23 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4006.ulsfo.wmnet
- 10:20 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P77024 and previous config saved to /var/cache/conftool/dbconfig/20250604-102056-fceratto.json
- 10:17 marostegui@cumin1002: dbctl commit (dc=all): 'es2044 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P77023 and previous config saved to /var/cache/conftool/dbconfig/20250604-101728-root.json
- 10:17 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti4006.ulsfo.wmnet
- 10:09 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti4006.ulsfo.wmnet
- 10:08 marostegui@cumin1002: dbctl commit (dc=all): 'es2041 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P77022 and previous config saved to /var/cache/conftool/dbconfig/20250604-100846-root.json
- 10:05 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P77021 and previous config saved to /var/cache/conftool/dbconfig/20250604-100547-fceratto.json
- 10:05 vgutierrez@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: switching to katran
- 10:04 vgutierrez: upload liberica 0.15 to bookworm-wikimedia (apt.wm.o) - T395228
- 10:02 marostegui@cumin1002: dbctl commit (dc=all): 'es2044 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P77020 and previous config saved to /var/cache/conftool/dbconfig/20250604-100222-root.json
- 10:00 vgutierrez: depool lvs1013 before switching to katran - T395228
- 09:59 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 09:59 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 09:55 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 09:55 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 09:53 marostegui@cumin1002: dbctl commit (dc=all): 'es2041 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P77018 and previous config saved to /var/cache/conftool/dbconfig/20250604-095340-root.json
- 09:52 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
- 09:52 akosiaris: re-deploy changeprop-jobqueue to set the Host HTTP header for submission of all jobs. T395451
- 09:52 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
- 09:50 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T395241)', diff saved to https://phabricator.wikimedia.org/P77017 and previous config saved to /var/cache/conftool/dbconfig/20250604-095041-fceratto.json
- 09:47 marostegui@cumin1002: dbctl commit (dc=all): 'es2044 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P77016 and previous config saved to /var/cache/conftool/dbconfig/20250604-094715-root.json
- 09:46 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-jobrunner: apply
- 09:46 akosiaris: T395451 deploy mw-jobrunner hot patch for VirtualHost selection, testing out that the single version change will work this time around.
- 09:46 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-jobrunner: apply
- 09:42 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1177 (T395241)', diff saved to https://phabricator.wikimedia.org/P77015 and previous config saved to /var/cache/conftool/dbconfig/20250604-094217-fceratto.json
- 09:42 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti4005.ulsfo.wmnet
- 09:42 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance
- 09:42 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4005.ulsfo.wmnet
- 09:41 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T395241)', diff saved to https://phabricator.wikimedia.org/P77014 and previous config saved to /var/cache/conftool/dbconfig/20250604-094152-fceratto.json
- 09:40 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti7001.magru.wmnet with OS bookworm
- 09:38 marostegui@cumin1002: dbctl commit (dc=all): 'es2041 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P77013 and previous config saved to /var/cache/conftool/dbconfig/20250604-093835-root.json
- 09:37 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
- 09:37 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
- 09:36 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti4005.ulsfo.wmnet
- 09:33 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti4005.ulsfo.wmnet
- 09:32 marostegui@cumin1002: dbctl commit (dc=all): 'db2224 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P77011 and previous config saved to /var/cache/conftool/dbconfig/20250604-093251-root.json
- 09:32 marostegui@cumin1002: dbctl commit (dc=all): 'es2044 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P77010 and previous config saved to /var/cache/conftool/dbconfig/20250604-093209-root.json
- 09:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repool pc1 T395983', diff saved to https://phabricator.wikimedia.org/P77009 and previous config saved to /var/cache/conftool/dbconfig/20250604-092819-marostegui.json
- 09:27 marostegui: Move datadir on pc1011 dbmaint pc1 eqiad T395983
- 09:26 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P77008 and previous config saved to /var/cache/conftool/dbconfig/20250604-092645-fceratto.json
- 09:24 taavi@cumin1002: conftool action : set/weight=100:pooled=yes; selector: name=clouddb1020.eqiad.wmnet,service=x3
- 09:23 marostegui@cumin1002: dbctl commit (dc=all): 'es2041 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P77007 and previous config saved to /var/cache/conftool/dbconfig/20250604-092328-root.json
- 09:20 marostegui: Move datadir on pc2011 dbmaint pc1 codfw T395983
- 09:19 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on pc2011.codfw.wmnet,pc1011.eqiad.wmnet with reason: Maintenance
- 09:19 marostegui@cumin1002: dbctl commit (dc=all): 'Depool pc1 T395983', diff saved to https://phabricator.wikimedia.org/P77006 and previous config saved to /var/cache/conftool/dbconfig/20250604-091921-marostegui.json
- 09:17 marostegui@cumin1002: dbctl commit (dc=all): 'db2224 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P77005 and previous config saved to /var/cache/conftool/dbconfig/20250604-091745-root.json
- 09:17 marostegui@cumin1002: dbctl commit (dc=all): 'es2044 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P77004 and previous config saved to /var/cache/conftool/dbconfig/20250604-091704-root.json
- 09:16 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
- 09:16 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti7001.magru.wmnet with reason: host reimage
- 09:16 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
- 09:15 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
- 09:15 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
- 09:15 akosiaris: T395451 rollback the host header addition, this is erroring out, returning 404s.
- 09:14 akosiaris: T395451 rollback the host header addition, this is erroring out, returning 3xx.
- 09:14 akosiaris@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
- 09:14 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
- 09:14 akosiaris@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
- 09:12 jmm@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti7001.magru.wmnet with reason: host reimage
- 09:11 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P77003 and previous config saved to /var/cache/conftool/dbconfig/20250604-091138-fceratto.json
- 09:11 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply
- 09:11 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es2044.codfw.wmnet with reason: Maintenance
- 09:10 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es2044', diff saved to https://phabricator.wikimedia.org/P77002 and previous config saved to /var/cache/conftool/dbconfig/20250604-091041-marostegui.json
- 09:10 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
- 09:10 moritzm: installing qemu bugfix updates from Bookworm point release
- 09:08 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-f1-codfw
- 09:08 marostegui@cumin1002: dbctl commit (dc=all): 'es2041 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P77001 and previous config saved to /var/cache/conftool/dbconfig/20250604-090823-root.json
- 09:06 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-f1-codfw
- 09:05 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply
- 09:03 akosiaris@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
- 09:02 akosiaris@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
- 09:02 marostegui@cumin1002: dbctl commit (dc=all): 'db2224 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P77000 and previous config saved to /var/cache/conftool/dbconfig/20250604-090240-root.json
- 09:02 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es2041', diff saved to https://phabricator.wikimedia.org/P76999 and previous config saved to /var/cache/conftool/dbconfig/20250604-090226-marostegui.json
- 09:02 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
- 09:02 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es2041.codfw.wmnet with reason: Maintenance
- 09:01 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
- 08:57 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply
- 08:57 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply
- 08:56 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T395241)', diff saved to https://phabricator.wikimedia.org/P76998 and previous config saved to /var/cache/conftool/dbconfig/20250604-085630-fceratto.json
- 08:53 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply
- 08:53 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply
- 08:51 jmm@cumin1003: START - Cookbook sre.hosts.reimage for host ganeti7001.magru.wmnet with OS bookworm
- 08:48 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1172 (T395241)', diff saved to https://phabricator.wikimedia.org/P76997 and previous config saved to /var/cache/conftool/dbconfig/20250604-084819-fceratto.json
- 08:48 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance
- 08:47 marostegui@cumin1002: dbctl commit (dc=all): 'db2224 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P76996 and previous config saved to /var/cache/conftool/dbconfig/20250604-084735-root.json
- 08:42 akosiaris: deploy changeprop-jobqueue to set the Host HTTP header for submission of all jobs. T395451
- 08:42 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
- 08:42 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T395241)', diff saved to https://phabricator.wikimedia.org/P76995 and previous config saved to /var/cache/conftool/dbconfig/20250604-084231-fceratto.json
- 08:42 akosiaris@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
- 08:41 akosiaris@deploy1003: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
- 08:38 moritzm: removing ganeti7001 from magru01 cluster T394263
- 08:38 marostegui: Change s6 eqiad dbmaint to SBR T383795
- 08:38 akosiaris: revoke and clean helm-charts.discovery.wmnet old cergen cert from puppetmaster1001
- 08:32 marostegui@cumin1002: dbctl commit (dc=all): 'db2224 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P76994 and previous config saved to /var/cache/conftool/dbconfig/20250604-083229-root.json
- 08:28 marostegui: Change s6 codfw dbmaint to SBR T383795
- 08:27 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P76993 and previous config saved to /var/cache/conftool/dbconfig/20250604-082725-fceratto.json
- 08:17 marostegui@cumin1002: dbctl commit (dc=all): 'db2224 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P76992 and previous config saved to /var/cache/conftool/dbconfig/20250604-081725-root.json
- 08:14 moritzm: removing atlas7001 from magru01 cluster T394263
- 08:12 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P76991 and previous config saved to /var/cache/conftool/dbconfig/20250604-081219-fceratto.json
- 08:11 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2224.codfw.wmnet with reason: Maintenance
- 08:11 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2224 T395989', diff saved to https://phabricator.wikimedia.org/P76990 and previous config saved to /var/cache/conftool/dbconfig/20250604-081058-marostegui.json
- 08:05 moritzm: installing gcc-12 bugfix updates from Bookworm point releases (includes various run time libraries)
- 08:05 marostegui@cumin1002: dbctl commit (dc=all): 'es1035 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P76989 and previous config saved to /var/cache/conftool/dbconfig/20250604-080546-root.json
- 08:03 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host apt2002.wikimedia.org
- 08:02 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host apt2002.wikimedia.org
- 07:57 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T395241)', diff saved to https://phabricator.wikimedia.org/P76988 and previous config saved to /var/cache/conftool/dbconfig/20250604-075711-fceratto.json
- 07:50 marostegui@cumin1002: dbctl commit (dc=all): 'es1035 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P76987 and previous config saved to /var/cache/conftool/dbconfig/20250604-075041-root.json
- 07:48 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1167 (T395241)', diff saved to https://phabricator.wikimedia.org/P76986 and previous config saved to /var/cache/conftool/dbconfig/20250604-074850-fceratto.json
- 07:48 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
- 07:48 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
- 07:39 marostegui@cumin1002: dbctl commit (dc=all): 'es2048 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P76985 and previous config saved to /var/cache/conftool/dbconfig/20250604-073921-root.json
- 07:37 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of netflow7001.magru.wmnet to plain
- 07:37 jmm@cumin1003: START - Cookbook sre.ganeti.changedisk for changing disk type of netflow7001.magru.wmnet to plain
- 07:35 marostegui@cumin1002: dbctl commit (dc=all): 'es1035 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P76984 and previous config saved to /var/cache/conftool/dbconfig/20250604-073535-root.json
- 07:34 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of durum7001.magru.wmnet to plain
- 07:31 jmm@cumin1003: START - Cookbook sre.ganeti.changedisk for changing disk type of durum7001.magru.wmnet to plain
- 07:24 marostegui@cumin1002: dbctl commit (dc=all): 'es2048 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P76983 and previous config saved to /var/cache/conftool/dbconfig/20250604-072416-root.json
- 07:23 Emperor: restart swift-object-replicator ms-be2066
- 07:20 marostegui@cumin1002: dbctl commit (dc=all): 'es1035 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P76982 and previous config saved to /var/cache/conftool/dbconfig/20250604-072030-root.json
- 07:19 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of doh7001.wikimedia.org to plain
- 07:18 jmm@cumin1003: START - Cookbook sre.ganeti.changedisk for changing disk type of doh7001.wikimedia.org to plain
- 07:16 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir7001.magru.wmnet to plain
- 07:13 jmm@cumin1003: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir7001.magru.wmnet to plain
- 07:09 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of install7001.wikimedia.org to plain
- 07:09 marostegui@cumin1002: dbctl commit (dc=all): 'es2048 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P76981 and previous config saved to /var/cache/conftool/dbconfig/20250604-070910-root.json
- 07:08 jmm@cumin1003: START - Cookbook sre.ganeti.changedisk for changing disk type of install7001.wikimedia.org to plain
- 07:05 marostegui@cumin1002: dbctl commit (dc=all): 'es1035 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P76980 and previous config saved to /var/cache/conftool/dbconfig/20250604-070525-root.json
- 07:01 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti7001.magru.wmnet
- 06:57 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti7001.magru.wmnet
- 06:54 marostegui@cumin1002: dbctl commit (dc=all): 'es2048 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P76979 and previous config saved to /var/cache/conftool/dbconfig/20250604-065405-root.json
- 06:50 marostegui@cumin1002: dbctl commit (dc=all): 'es1035 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P76978 and previous config saved to /var/cache/conftool/dbconfig/20250604-065020-root.json
- 06:39 marostegui@cumin1002: dbctl commit (dc=all): 'es2048 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P76977 and previous config saved to /var/cache/conftool/dbconfig/20250604-063900-root.json
- 06:35 marostegui@cumin1002: dbctl commit (dc=all): 'es1035 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P76976 and previous config saved to /var/cache/conftool/dbconfig/20250604-063515-root.json
- 06:33 gmodena@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich-next: apply
- 06:33 gmodena@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich-next: apply
- 06:31 marostegui@deploy1003: Finished scap sync-world: Backport for Revert "db-production.php: Disable writes on es7" (duration: 09m 52s)
- 06:24 marostegui@deploy1003: marostegui: Continuing with sync
- 06:24 marostegui@deploy1003: marostegui: Backport for Revert "db-production.php: Disable writes on es7" synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 06:23 marostegui@cumin1002: dbctl commit (dc=all): 'es2048 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P76974 and previous config saved to /var/cache/conftool/dbconfig/20250604-062355-root.json
- 06:21 marostegui@deploy1003: Started scap sync-world: Backport for Revert "db-production.php: Disable writes on es7"
- 06:20 marostegui@cumin1002: dbctl commit (dc=all): 'es1035 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P76973 and previous config saved to /var/cache/conftool/dbconfig/20250604-062010-root.json
- 06:04 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es1035 T395982', diff saved to https://phabricator.wikimedia.org/P76972 and previous config saved to /var/cache/conftool/dbconfig/20250604-060413-marostegui.json
- 06:03 marostegui@dns1006: END - running authdns-update
- 06:03 marostegui@dns1006: START - running authdns-update
- 06:02 marostegui@cumin1002: dbctl commit (dc=all): 'Promote es1039 to es7 primary T395982', diff saved to https://phabricator.wikimedia.org/P76971 and previous config saved to /var/cache/conftool/dbconfig/20250604-060246-marostegui.json
- 06:02 marostegui: Starting es7 eqiad failover from es1035 to es1039 - T395982
- 05:57 marostegui@cumin1002: dbctl commit (dc=all): 'Set es1039 with weight 0 T395982', diff saved to https://phabricator.wikimedia.org/P76970 and previous config saved to /var/cache/conftool/dbconfig/20250604-055744-marostegui.json
- 05:56 marostegui@deploy1003: Finished scap sync-world: Backport for db-production.php: Disable writes on es7 (T395982) (duration: 13m 00s)
- 05:49 marostegui@deploy1003: marostegui: Continuing with sync
- 05:45 marostegui@deploy1003: marostegui: Backport for db-production.php: Disable writes on es7 (T395982) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 05:43 marostegui@deploy1003: Started scap sync-world: Backport for db-production.php: Disable writes on es7 (T395982)
- 05:42 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 7 hosts with reason: Primary switchover es7 T395982
- 00:38 eileen: civicrm upgraded from 8eb67a94 to 22171c0b
2025-06-03
- 22:42 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 22:39 andrew@cumin1002: START - Cookbook sre.dns.netbox
- 22:10 eileen: civicrm upgraded from 3b59e784 to 8eb67a94
- 22:09 sbassett@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
- 21:59 sbassett@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
- 21:53 tzatziki: removing 4 files for legal compliance
- 21:48 sbassett@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
- 21:41 tzatziki: removing 2 files for legal compliance
- 21:38 sbassett@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
- 21:38 sbassett@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
- 21:28 sbassett@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
- 21:21 mstyles@deploy1003: Finished scap sync-world: Backport for Revert^2 "OATHAuth: Mark checkuser and suppress as requiring 2FA" (T150898) (duration: 11m 31s)
- 21:18 bvibber@deploy1003: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply
- 21:14 mstyles@deploy1003: mstyles, sbassett: Continuing with sync
- 21:11 mstyles@deploy1003: mstyles, sbassett: Backport for Revert^2 "OATHAuth: Mark checkuser and suppress as requiring 2FA" (T150898) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 21:09 mstyles@deploy1003: Started scap sync-world: Backport for Revert^2 "OATHAuth: Mark checkuser and suppress as requiring 2FA" (T150898)
- 21:03 cjming@deploy1003: Finished scap sync-world: Backport for Use default preference if no client preference in auth request (T395957) (duration: 09m 49s)
- 20:56 cjming@deploy1003: matmarex, cjming: Continuing with sync
- 20:55 cjming@deploy1003: matmarex, cjming: Backport for Use default preference if no client preference in auth request (T395957) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 20:53 cjming@deploy1003: Started scap sync-world: Backport for Use default preference if no client preference in auth request (T395957)
- 20:46 marostegui@cumin1002: END (FAIL) - Cookbook sre.mysql.clone (exit_code=99) of es2040.codfw.wmnet onto es2048.codfw.wmnet
- 20:46 marostegui@cumin1002: END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) es2048 gradually with 4 steps - Pool es2048.codfw.wmnet in after cloning
- 20:37 cscott@deploy1003: Finished scap sync-world: Backport for Use ::getContentId() and ::clearContentId() from the Parsoid extension API (duration: 12m 41s)
- 20:31 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_main on wdqs1022.eqiad.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/rdf_subgraphs/snapshot=20250526/wiki=wikidata/scope=wikidata_main/ using stat1009.eqiad.wmnet)
- 20:30 cscott@deploy1003: cscott: Continuing with sync
- 20:27 cscott@deploy1003: cscott: Backport for Use ::getContentId() and ::clearContentId() from the Parsoid extension API synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 20:25 cscott@deploy1003: Started scap sync-world: Backport for Use ::getContentId() and ::clearContentId() from the Parsoid extension API
- 20:18 cjming@deploy1003: Finished scap sync-world: Backport for Deploy survey to en at twenty percent (T389393) (duration: 11m 18s)
- 20:11 cjming@deploy1003: ksarabia, cjming: Continuing with sync
- 20:09 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading scholarly_articles on wdqs1023.eqiad.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/rdf_subgraphs/snapshot=20250526/wiki=wikidata/scope=scholarly_articles/ using stat1011.eqiad.wmnet)
- 20:08 cjming@deploy1003: ksarabia, cjming: Backport for Deploy survey to en at twenty percent (T389393) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 20:06 cjming@deploy1003: Started scap sync-world: Backport for Deploy survey to en at twenty percent (T389393)
- 19:37 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs1020.eqiad.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/rdf/date=20250526/wiki=wikidata/ using stat1008.eqiad.wmnet)
- 19:37 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs1020.eqiad.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/rdf/date=20250526/wiki=wikidata/ using stat1008.eqiad.wmnet)
- 19:36 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs1020.eqiad.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/rdf/date=20250526/wiki=wikidata/ using stat1010.eqiad.wmnet)
- 19:36 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs1020.eqiad.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/rdf/date=20250526/wiki=wikidata/ using stat1010.eqiad.wmnet)
- 19:35 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata on wdqs1020.eqiad.wmnet from DumpsSource.NFS (munging data to /srv/wdqs/munged, /srv/wdqs/lex-munged)
- 19:35 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata on wdqs1020.eqiad.wmnet from DumpsSource.NFS (munging data to /srv/wdqs/munged, /srv/wdqs/lex-munged)
- 19:22 dduvall@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.45.0-wmf.4 refs T392174
- 19:11 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading scholarly_articles on wdqs1023.eqiad.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/rdf_subgraphs/snapshot=20250526/wiki=wikidata/scope=scholarly_articles/ using stat1011.eqiad.wmnet)
- 19:08 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading scholarly_articles on wdqs1023.eqiad.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/rdf_subgraphs/snapshot=20250526/wiki=wikidata/scope=scholarly_articles/ using stat1009.eqiad.wmnet)
- 19:07 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading scholarly_articles on wdqs1023.eqiad.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/rdf_subgraphs/snapshot=20250526/wiki=wikidata/scope=scholarly_articles/ using stat1009.eqiad.wmnet)
- 19:05 ebernhardson@deploy1003: Finished deploy [wdqs/wdqs@fea7794]: 0.3.157 (duration: 17m 57s)
- 19:00 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_main on wdqs1022.eqiad.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/rdf_subgraphs/snapshot=20250526/wiki=wikidata/scope=wikidata_main/ using stat1009.eqiad.wmnet)
- 18:47 ebernhardson@deploy1003: Started deploy [wdqs/wdqs@fea7794]: 0.3.157
- 18:08 aokoth@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host vrts1003.eqiad.wmnet
- 18:00 aokoth@cumin1002: START - Cookbook sre.hosts.reboot-single for host vrts1003.eqiad.wmnet
- 17:50 gmodena@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich-next: apply
- 17:50 gmodena@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich-next: apply
- 17:45 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 17:45 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 17:42 isaranto@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
- 17:41 isaranto@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
- 17:39 bvibber@deploy1003: helmfile [codfw] START helmfile.d/services/chart-renderer: apply
- 17:39 bvibber@deploy1003: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply
- 17:39 isaranto@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
- 17:39 bvibber@deploy1003: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply
- 17:38 bvibber@deploy1003: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply
- 17:38 bvibber@deploy1003: helmfile [staging] START helmfile.d/services/chart-renderer: apply
- 17:36 swfrench@deploy1003: Finished scap sync-world: Scap test run after revert - T389786 (duration: 02m 10s)
- 17:35 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1019.eqiad.wmnet,service=s4
- 17:34 swfrench@deploy1003: Started scap sync-world: Scap test run after revert - T389786
- 17:19 swfrench@deploy1003: Started scap sync-world: Scap run to test newly enabled dse-k8s-eqiad deployment - T388761 T389786
- 17:17 bvibber@deploy1003: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply
- 17:16 bvibber@deploy1003: helmfile [codfw] START helmfile.d/services/chart-renderer: apply
- 17:16 bvibber@deploy1003: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply
- 17:15 cdanis@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
- 17:15 cdanis@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
- 17:14 bvibber@deploy1003: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply
- 17:14 cdanis@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
- 17:13 bvibber@deploy1003: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply
- 17:13 bvibber@deploy1003: helmfile [staging] START helmfile.d/services/chart-renderer: apply
- 17:12 cdanis@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
- 17:06 cdanis@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
- 17:06 cdanis@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
- 17:01 cdanis@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
- 16:59 cdanis@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
- 16:57 fnegri@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on clouddb1019.eqiad.wmnet with reason: Debugging stuck queries T390767
- 16:42 bvibber@deploy1003: Finished scap sync-world: Backport for Fixes: Charts embedded in template rendering in Parsoid (T395462), Fixes: Charts embedded in template rendering in Parsoid (T395462) (duration: 09m 54s)
- 16:38 andrew@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcontrol2010-dev.codfw.wmnet with OS bullseye
- 16:35 bvibber@deploy1003: bvibber: Continuing with sync
- 16:35 bvibber@deploy1003: bvibber: Backport for Fixes: Charts embedded in template rendering in Parsoid (T395462), Fixes: Charts embedded in template rendering in Parsoid (T395462) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 16:34 sukhe@dns1004: END - running authdns-update
- 16:33 sukhe: testing dummy authdns-update to ensure clean run after gc-authdns-git-repo.timer run
- 16:33 sukhe@dns1004: START - running authdns-update
- 16:32 bvibber@deploy1003: Started scap sync-world: Backport for Fixes: Charts embedded in template rendering in Parsoid (T395462), Fixes: Charts embedded in template rendering in Parsoid (T395462)
- 16:23 jiji@deploy1003: Finished scap sync-world: T276994: We merged a number of noop patches, sparing deployers the scary diffs (duration: 02m 58s)
- 16:22 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 16:21 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 16:20 jiji@deploy1003: Started scap sync-world: T276994: We merged a number of noop patches, sparing deployers the scary diffs
- 16:12 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
- 16:12 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
- 16:06 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
- 16:04 jiji@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
- 16:04 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
- 15:57 jiji@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
- 15:57 jiji@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
- 15:55 jiji@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
- 15:54 jiji@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 15:51 jiji@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 15:51 jiji@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 15:50 jiji@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 15:18 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcontrol2010-dev.codfw.wmnet with OS bullseye
- 15:18 moritzm: installing gcc-12 bugfix updates from Bookworm point releases (includes various run time libraries)
- 15:17 andrew@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcontrol2010-dev.codfw.wmnet with OS bullseye
- 15:16 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance
- 15:15 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 (T395241)', diff saved to https://phabricator.wikimedia.org/P76966 and previous config saved to /var/cache/conftool/dbconfig/20250603-151552-fceratto.json
- 15:10 jmm@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "add prometheus7002 - jmm@cumin1003"
- 15:10 jmm@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "add prometheus7002 - jmm@cumin1003"
- 15:10 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcontrol2010-dev.codfw.wmnet with OS bullseye
- 15:06 hashar: Restarted Gerrit due to issue with replication config | T395887
- 15:00 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P76964 and previous config saved to /var/cache/conftool/dbconfig/20250603-150045-fceratto.json
- 14:58 jmm@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "add prometheus7002 - jmm@cumin1003"
- 14:58 jmm@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "add prometheus7002 - jmm@cumin1003"
- 14:54 jmm@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "add prometheus7002 - jmm@cumin1003"
- 14:54 jmm@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "add prometheus7002 - jmm@cumin1003"
- 14:50 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1019.eqiad.wmnet,service=s4
- 14:46 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host prometheus7002.magru.wmnet with OS bookworm
- 14:45 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P76963 and previous config saved to /var/cache/conftool/dbconfig/20250603-144538-fceratto.json
- 14:30 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 (T395241)', diff saved to https://phabricator.wikimedia.org/P76962 and previous config saved to /var/cache/conftool/dbconfig/20250603-143031-fceratto.json
- 14:27 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on prometheus7002.magru.wmnet with reason: host reimage
- 14:26 jmm@cumin1003: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: elastic1063.eqiad.wmnet
- 14:23 jmm@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on prometheus7002.magru.wmnet with reason: host reimage
- 14:23 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2195 (T395241)', diff saved to https://phabricator.wikimedia.org/P76961 and previous config saved to /var/cache/conftool/dbconfig/20250603-142314-fceratto.json
- 14:23 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance
- 14:22 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T395241)', diff saved to https://phabricator.wikimedia.org/P76960 and previous config saved to /var/cache/conftool/dbconfig/20250603-142248-fceratto.json
- 14:19 jmm@cumin1003: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: elastic1103.eqiad.wmnet
- 14:07 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P76959 and previous config saved to /var/cache/conftool/dbconfig/20250603-140740-fceratto.json
- 14:01 Amir1: dropping term store tables from s8 (T351820)
- 14:01 Amir1: dropping term store tables from s8 (T351802)
- 13:57 jmm@cumin1003: START - Cookbook sre.hosts.reimage for host prometheus7002.magru.wmnet with OS bookworm
- 13:52 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P76957 and previous config saved to /var/cache/conftool/dbconfig/20250603-135233-fceratto.json
- 13:49 marostegui@cumin1002: dbctl commit (dc=all): 'es1039 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P76956 and previous config saved to /var/cache/conftool/dbconfig/20250603-134935-root.json
- 13:44 phuedx: Disabled the SDS 2.4.11 Synthetic A/A Test in xLab
- 13:42 dbrant@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
- 13:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1048.eqiad.wmnet with OS bullseye
- 13:38 dbrant@deploy1003: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
- 13:37 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T395241)', diff saved to https://phabricator.wikimedia.org/P76954 and previous config saved to /var/cache/conftool/dbconfig/20250603-133725-fceratto.json
- 13:34 dbrant@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
- 13:34 marostegui@cumin1002: dbctl commit (dc=all): 'es1039 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P76953 and previous config saved to /var/cache/conftool/dbconfig/20250603-133429-root.json
- 13:33 dbrant@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
- 13:32 bwojtowicz@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
- 13:32 bwojtowicz@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
- 13:31 bwojtowicz@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
- 13:28 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2181 (T395241)', diff saved to https://phabricator.wikimedia.org/P76952 and previous config saved to /var/cache/conftool/dbconfig/20250603-132802-fceratto.json
- 13:27 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance
- 13:27 bwojtowicz@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
- 13:27 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 (T395241)', diff saved to https://phabricator.wikimedia.org/P76951 and previous config saved to /var/cache/conftool/dbconfig/20250603-132735-fceratto.json
- 13:26 bwojtowicz@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
- 13:22 bwojtowicz@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
- 13:21 bwojtowicz@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
- 13:20 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2050.codfw.wmnet
- 13:20 bwojtowicz@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
- 13:20 marostegui@cumin1002: dbctl commit (dc=all): 'es2038 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P76950 and previous config saved to /var/cache/conftool/dbconfig/20250603-131959-root.json
- 13:19 marostegui@cumin1002: dbctl commit (dc=all): 'es1039 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P76949 and previous config saved to /var/cache/conftool/dbconfig/20250603-131923-root.json
- 13:18 bwojtowicz@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
- 13:17 bwojtowicz@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
- 13:16 moritzm: installing libavif security updates
- 13:16 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1048.eqiad.wmnet with reason: host reimage
- 13:15 bwojtowicz@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
- 13:14 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti2050.codfw.wmnet
- 13:14 jgleeson: payments-wiki rolled back from def6c267 to 1a4ef678
- 13:12 bwojtowicz@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
- 13:12 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1048.eqiad.wmnet with reason: host reimage
- 13:12 dbrant@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
- 13:12 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P76948 and previous config saved to /var/cache/conftool/dbconfig/20250603-131228-fceratto.json
- 13:11 dbrant@deploy1003: helmfile [staging] START helmfile.d/services/wikifeeds: apply
- 13:11 bwojtowicz@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
- 13:08 bwojtowicz@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
- 13:04 marostegui@cumin1002: dbctl commit (dc=all): 'es2038 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P76946 and previous config saved to /var/cache/conftool/dbconfig/20250603-130453-root.json
- 13:04 marostegui@cumin1002: dbctl commit (dc=all): 'es1039 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P76945 and previous config saved to /var/cache/conftool/dbconfig/20250603-130418-root.json
- 13:04 marostegui: Shutdown clouddb1016:x3 T390954
- 13:02 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb1016.eqiad.wmnet with reason: Setting up x3 T390954
- 12:58 moritzm: uploaded wmf-laptop 1.0.2 to apt.wikimedia.org
- 12:57 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P76943 and previous config saved to /var/cache/conftool/dbconfig/20250603-125721-fceratto.json
- 12:49 marostegui@cumin1002: dbctl commit (dc=all): 'es2038 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P76942 and previous config saved to /var/cache/conftool/dbconfig/20250603-124948-root.json
- 12:49 marostegui@cumin1002: dbctl commit (dc=all): 'es1039 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P76941 and previous config saved to /var/cache/conftool/dbconfig/20250603-124913-root.json
- 12:42 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 (T395241)', diff saved to https://phabricator.wikimedia.org/P76940 and previous config saved to /var/cache/conftool/dbconfig/20250603-124214-fceratto.json
- 12:40 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2049.codfw.wmnet
- 12:35 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti2049.codfw.wmnet
- 12:34 marostegui@cumin1002: dbctl commit (dc=all): 'es2038 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P76939 and previous config saved to /var/cache/conftool/dbconfig/20250603-123442-root.json
- 12:34 marostegui@cumin1002: dbctl commit (dc=all): 'es1039 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P76938 and previous config saved to /var/cache/conftool/dbconfig/20250603-123407-root.json
- 12:33 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2167 (T395241)', diff saved to https://phabricator.wikimedia.org/P76937 and previous config saved to /var/cache/conftool/dbconfig/20250603-123357-fceratto.json
- 12:33 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2167.codfw.wmnet with reason: Maintenance
- 12:33 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T395241)', diff saved to https://phabricator.wikimedia.org/P76936 and previous config saved to /var/cache/conftool/dbconfig/20250603-123331-fceratto.json
- 12:30 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1048.eqiad.wmnet with OS bullseye
- 12:25 marostegui@cumin1002: START - Cookbook sre.mysql.pool es2048 gradually with 4 steps - Pool es2048.codfw.wmnet in after cloning
- 12:19 marostegui@cumin1002: dbctl commit (dc=all): 'es2038 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P76935 and previous config saved to /var/cache/conftool/dbconfig/20250603-121937-root.json
- 12:19 marostegui@cumin1002: dbctl commit (dc=all): 'es1039 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P76934 and previous config saved to /var/cache/conftool/dbconfig/20250603-121902-root.json
- 12:18 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P76933 and previous config saved to /var/cache/conftool/dbconfig/20250603-121824-fceratto.json
- 12:16 marostegui@deploy1003: Finished scap sync-world: Backport for Revert "db-production.php: Disable writes on es7" (duration: 09m 47s)
- 12:15 bwojtowicz@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revision-models' for release 'main' .
- 12:12 bwojtowicz@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revision-models' for release 'main' .
- 12:10 bwojtowicz@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'readability' for release 'main' .
- 12:09 marostegui@deploy1003: marostegui: Continuing with sync
- 12:09 marostegui@deploy1003: marostegui: Backport for Revert "db-production.php: Disable writes on es7" synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 12:08 bwojtowicz@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'readability' for release 'main' .
- 12:07 claime: Launching manual run of recount-categories cronjob - T395745
- 12:06 marostegui@deploy1003: Started scap sync-world: Backport for Revert "db-production.php: Disable writes on es7"
- 12:06 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 12:05 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 12:04 bwojtowicz@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'logo-detection' for release 'main' .
- 12:04 marostegui@cumin1002: dbctl commit (dc=all): 'es2038 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P76931 and previous config saved to /var/cache/conftool/dbconfig/20250603-120431-root.json
- 12:03 marostegui@cumin1002: dbctl commit (dc=all): 'es1039 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P76930 and previous config saved to /var/cache/conftool/dbconfig/20250603-120356-root.json
- 12:03 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P76929 and previous config saved to /var/cache/conftool/dbconfig/20250603-120316-fceratto.json
- 12:02 bwojtowicz@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'logo-detection' for release 'main' .
- 12:00 bwojtowicz@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .
- 11:59 bwojtowicz@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'llm' for release 'main' .
- 11:56 bwojtowicz@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'edit-check' for release 'main' .
- 11:55 bwojtowicz@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'edit-check' for release 'main' .
- 11:54 bwojtowicz@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
- 11:52 bwojtowicz@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
- 11:50 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es2038 T395785', diff saved to https://phabricator.wikimedia.org/P76927 and previous config saved to /var/cache/conftool/dbconfig/20250603-115026-marostegui.json
- 11:49 bwojtowicz@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'article-models' for release 'main' .
- 11:49 marostegui@cumin1002: dbctl commit (dc=all): 'Promote es2039 to es7 primary and set section read-write T395785', diff saved to https://phabricator.wikimedia.org/P76926 and previous config saved to /var/cache/conftool/dbconfig/20250603-114917-marostegui.json
- 11:48 marostegui: Starting es7 codfw failover from es2038 to es2039 - T395785
- 11:48 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T395241)', diff saved to https://phabricator.wikimedia.org/P76925 and previous config saved to /var/cache/conftool/dbconfig/20250603-114809-fceratto.json
- 11:46 marostegui@cumin1002: dbctl commit (dc=all): 'Set es2039 with weight 0 T395785', diff saved to https://phabricator.wikimedia.org/P76924 and previous config saved to /var/cache/conftool/dbconfig/20250603-114637-marostegui.json
- 11:46 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 7 hosts with reason: Primary switchover es7 T395785
- 11:46 bwojtowicz@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'article-models' for release 'main' .
- 11:41 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2048.codfw.wmnet
- 11:40 bwojtowicz@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
- 11:39 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2166 (T395241)', diff saved to https://phabricator.wikimedia.org/P76923 and previous config saved to /var/cache/conftool/dbconfig/20250603-113952-fceratto.json
- 11:39 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es1039 T395647', diff saved to https://phabricator.wikimedia.org/P76922 and previous config saved to /var/cache/conftool/dbconfig/20250603-113946-marostegui.json
- 11:39 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2166.codfw.wmnet with reason: Maintenance
- 11:39 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2165 (T395241)', diff saved to https://phabricator.wikimedia.org/P76921 and previous config saved to /var/cache/conftool/dbconfig/20250603-113924-fceratto.json
- 11:39 marostegui@deploy1003: Finished scap sync-world: Backport for db-production.php: Disable writes on es7 (T395647) (duration: 09m 56s)
- 11:36 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti2048.codfw.wmnet
- 11:33 bwojtowicz@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
- 11:32 marostegui@deploy1003: marostegui: Continuing with sync
- 11:31 marostegui@deploy1003: marostegui: Backport for db-production.php: Disable writes on es7 (T395647) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 11:29 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 7 hosts with reason: Primary switchover es7 T395785
- 11:29 marostegui@deploy1003: Started scap sync-world: Backport for db-production.php: Disable writes on es7 (T395647)
- 11:25 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es2040 gradually with 4 steps - Pool es2040.codfw.wmnet in after cloning
- 11:22 taavi@cumin1002: conftool action : set/weight=100:pooled=yes; selector: name=clouddb1016.eqiad.wmnet,service=x3
- 11:21 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 7 hosts with reason: Primary switchover es7 T395785
- 11:18 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P76919 and previous config saved to /var/cache/conftool/dbconfig/20250603-111822-fceratto.json
- 11:09 fnegri@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=99)
- 11:03 jgleeson: payments-wiki upgraded from 1a4ef678 to def6c267
- 11:03 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P76917 and previous config saved to /var/cache/conftool/dbconfig/20250603-110315-fceratto.json
- 10:57 phuedx: Enabling the SDS 2.4.11 Synthetic A/A Test in xLab
- 10:57 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2047.codfw.wmnet
- 10:52 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti2047.codfw.wmnet
- 10:49 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2046.codfw.wmnet
- 10:48 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2165 (T395241)', diff saved to https://phabricator.wikimedia.org/P76915 and previous config saved to /var/cache/conftool/dbconfig/20250603-104809-fceratto.json
- 10:44 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views
- 10:44 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti2046.codfw.wmnet
- 10:43 fnegri@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=99)
- 10:43 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2045.codfw.wmnet
- 10:40 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2165 (T395241)', diff saved to https://phabricator.wikimedia.org/P76914 and previous config saved to /var/cache/conftool/dbconfig/20250603-104056-fceratto.json
- 10:40 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2165.codfw.wmnet with reason: Maintenance
- 10:40 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T395241)', diff saved to https://phabricator.wikimedia.org/P76913 and previous config saved to /var/cache/conftool/dbconfig/20250603-104030-fceratto.json
- 10:40 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti2045.codfw.wmnet
- 10:38 cmooney@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 11 hosts with reason: silence alerts due to down BGP groups on cr2-codfw while PIC is reconfigured
- 10:25 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P76912 and previous config saved to /var/cache/conftool/dbconfig/20250603-102523-fceratto.json
- 10:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Fix the repool dbctl commit', diff saved to https://phabricator.wikimedia.org/P76911 and previous config saved to /var/cache/conftool/dbconfig/20250603-102517-ladsgroup.json
- 10:18 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views
- 10:13 topranks: drain cr2-codfw traffic to enable PIC port bw rebalance on slot 0 T387504
- 09:59 jmm@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host prometheus7002.magru.wmnet with OS bookworm
- 09:43 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gitlab-runner1002.eqiad.wmnet with OS bookworm
- 09:25 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab-runner1002.eqiad.wmnet with reason: host reimage
- 09:22 jelto@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on gitlab-runner1002.eqiad.wmnet with reason: host reimage
- 09:22 elukey: puppet cert destroy {mobileapps,proton,recommendation-api}.discovery.wmnet on puppetmaster1001 - old certs not used anymore
- 09:18 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 09:18 marostegui@cumin1002: START - Cookbook sre.mysql.pool es2040 gradually with 4 steps - Pool es2040.codfw.wmnet in after cloning
- 09:18 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 09:16 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 09:16 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 09:15 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P76909 and previous config saved to /var/cache/conftool/dbconfig/20250603-091521-fceratto.json
- 09:10 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 09:10 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 09:06 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thanos-be2008.codfw.wmnet with OS bullseye
- 09:06 mvernon@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - mvernon@cumin1002"
- 09:00 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T395241)', diff saved to https://phabricator.wikimedia.org/P76907 and previous config saved to /var/cache/conftool/dbconfig/20250603-090013-fceratto.json
- 08:59 mvernon@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - mvernon@cumin1002"
- 08:51 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2164 (T395241)', diff saved to https://phabricator.wikimedia.org/P76906 and previous config saved to /var/cache/conftool/dbconfig/20250603-085148-fceratto.json
- 08:51 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2164.codfw.wmnet with reason: Maintenance
- 08:51 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T395241)', diff saved to https://phabricator.wikimedia.org/P76905 and previous config saved to /var/cache/conftool/dbconfig/20250603-085121-fceratto.json
- 08:45 bwojtowicz@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
- 08:43 bwojtowicz@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
- 08:42 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-be2008.codfw.wmnet with reason: host reimage
- 08:41 bwojtowicz@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
- 08:40 bwojtowicz@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
- 08:38 mvernon@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-be2008.codfw.wmnet with reason: host reimage
- 08:37 bwojtowicz@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
- 08:36 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P76904 and previous config saved to /var/cache/conftool/dbconfig/20250603-083614-fceratto.json
- 08:36 bwojtowicz@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
- 08:34 jmm@cumin1003: START - Cookbook sre.hosts.reimage for host prometheus7002.magru.wmnet with OS bookworm
- 08:30 jelto@cumin1002: START - Cookbook sre.hosts.reimage for host gitlab-runner1002.eqiad.wmnet with OS bookworm
- 08:22 moritzm: rearm keyholder on cumin1003 following reboot
- 08:21 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P76903 and previous config saved to /var/cache/conftool/dbconfig/20250603-082107-fceratto.json
- 08:16 mvernon@cumin1002: START - Cookbook sre.hosts.reimage for host thanos-be2008.codfw.wmnet with OS bullseye
- 08:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cumin1003.eqiad.wmnet
- 08:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host cumin1003.eqiad.wmnet
- 08:10 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thanos-be2009.codfw.wmnet with OS bullseye
- 08:10 mvernon@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - mvernon@cumin1002"
- 08:06 mvernon@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - mvernon@cumin1002"
- 08:06 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T395241)', diff saved to https://phabricator.wikimedia.org/P76901 and previous config saved to /var/cache/conftool/dbconfig/20250603-080600-fceratto.json
- 08:04 phuedx: Disabling the SDS 2.4.11 Synthetic A/A Test in xLab
- 07:57 phuedx: Enabling the SDS 2.4.11 Synthetic A/A Test in xLab
- 07:56 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2163 (T395241)', diff saved to https://phabricator.wikimedia.org/P76900 and previous config saved to /var/cache/conftool/dbconfig/20250603-075638-fceratto.json
- 07:56 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2163.codfw.wmnet with reason: Maintenance
- 07:56 jmm@cumin1003: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host prometheus7002.magru.wmnet
- 07:56 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T395241)', diff saved to https://phabricator.wikimedia.org/P76899 and previous config saved to /var/cache/conftool/dbconfig/20250603-075622-fceratto.json
- 07:56 jmm@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
- 07:49 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-be2009.codfw.wmnet with reason: host reimage
- 07:49 jmm@cumin1003: END (PASS) - Cookbook sre.misc-clusters.restart-reboot-config-master (exit_code=0) rolling reboot on A:config-master
- 07:46 jmm@cumin1003: START - Cookbook sre.dns.netbox
- 07:46 jmm@cumin1003: START - Cookbook sre.ganeti.makevm for new host prometheus7002.magru.wmnet
- 07:46 mvernon@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-be2009.codfw.wmnet with reason: host reimage
- 07:44 jmm@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) config-master.discovery.wmnet. on all recursors
- 07:44 jmm@cumin1003: START - Cookbook sre.dns.wipe-cache config-master.discovery.wmnet. on all recursors
- 07:43 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host bast7002.wikimedia.org
- 07:43 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host bast7002.wikimedia.org with OS bookworm
- 07:41 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P76898 and previous config saved to /var/cache/conftool/dbconfig/20250603-074113-fceratto.json
- 07:38 jmm@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) config-master.discovery.wmnet. on all recursors
- 07:38 jmm@cumin1003: START - Cookbook sre.dns.wipe-cache config-master.discovery.wmnet. on all recursors
- 07:37 jmm@cumin1003: START - Cookbook sre.misc-clusters.restart-reboot-config-master rolling reboot on A:config-master
- 07:27 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on bast7002.wikimedia.org with reason: host reimage
- 07:26 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P76897 and previous config saved to /var/cache/conftool/dbconfig/20250603-072604-fceratto.json
- 07:25 tchanders@deploy1003: Finished scap sync-world: Backport for Assign IP auto-reveal rights to certain groups (T386492) (duration: 10m 39s)
- 07:23 mvernon@cumin1002: START - Cookbook sre.hosts.reimage for host thanos-be2009.codfw.wmnet with OS bullseye
- 07:23 jmm@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on bast7002.wikimedia.org with reason: host reimage
- 07:18 tchanders@deploy1003: tchanders: Continuing with sync
- 07:16 tchanders@deploy1003: tchanders: Backport for Assign IP auto-reveal rights to certain groups (T386492) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 07:14 tchanders@deploy1003: Started scap sync-world: Backport for Assign IP auto-reveal rights to certain groups (T386492)
- 07:10 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T395241)', diff saved to https://phabricator.wikimedia.org/P76896 and previous config saved to /var/cache/conftool/dbconfig/20250603-071057-fceratto.json
- 07:06 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 07:06 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 07:02 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 07:02 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 07:01 marostegui@cumin1002: dbctl commit (dc=all): 'es1037 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P76895 and previous config saved to /var/cache/conftool/dbconfig/20250603-070155-root.json
- 07:00 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2154 (T395241)', diff saved to https://phabricator.wikimedia.org/P76894 and previous config saved to /var/cache/conftool/dbconfig/20250603-070036-fceratto.json
- 07:00 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2154.codfw.wmnet with reason: Maintenance
- 07:00 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T395241)', diff saved to https://phabricator.wikimedia.org/P76893 and previous config saved to /var/cache/conftool/dbconfig/20250603-070021-fceratto.json
- 06:58 jmm@cumin1003: START - Cookbook sre.hosts.reimage for host bast7002.wikimedia.org with OS bookworm
- 06:54 jmm@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM bast7002.wikimedia.org - jmm@cumin1003"
- 06:54 jmm@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM bast7002.wikimedia.org - jmm@cumin1003"
- 06:54 jmm@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) bast7002.wikimedia.org on all recursors
- 06:54 jmm@cumin1003: START - Cookbook sre.dns.wipe-cache bast7002.wikimedia.org on all recursors
- 06:54 jmm@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 06:54 jmm@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM bast7002.wikimedia.org - jmm@cumin1003"
- 06:53 jmm@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM bast7002.wikimedia.org - jmm@cumin1003"
- 06:48 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host thanos-be2009.codfw.wmnet with OS bullseye
- 06:46 marostegui@cumin1002: dbctl commit (dc=all): 'es1037 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P76892 and previous config saved to /var/cache/conftool/dbconfig/20250603-064649-root.json
- 06:45 jmm@cumin1003: START - Cookbook sre.dns.netbox
- 06:45 jmm@cumin1003: START - Cookbook sre.ganeti.makevm for new host bast7002.wikimedia.org
- 06:45 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P76891 and previous config saved to /var/cache/conftool/dbconfig/20250603-064513-fceratto.json
- 06:41 marostegui@cumin1002: dbctl commit (dc=all): 'es2037 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P76890 and previous config saved to /var/cache/conftool/dbconfig/20250603-064147-root.json
- 06:37 marostegui: Decrease buffer size on clouddb1016:s8 T390954
- 06:31 marostegui@cumin1002: dbctl commit (dc=all): 'es1037 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P76889 and previous config saved to /var/cache/conftool/dbconfig/20250603-063144-root.json
- 06:30 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1154.eqiad.wmnet with reason: Setting up x3 T390954
- 06:30 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P76887 and previous config saved to /var/cache/conftool/dbconfig/20250603-063004-fceratto.json
- 06:29 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb1016.eqiad.wmnet with reason: Setting up x3 T390954
- 06:26 marostegui@cumin1002: dbctl commit (dc=all): 'es2037 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P76886 and previous config saved to /var/cache/conftool/dbconfig/20250603-062641-root.json
- 06:16 marostegui@cumin1002: dbctl commit (dc=all): 'es1037 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P76885 and previous config saved to /var/cache/conftool/dbconfig/20250603-061638-root.json
- 06:14 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T395241)', diff saved to https://phabricator.wikimedia.org/P76884 and previous config saved to /var/cache/conftool/dbconfig/20250603-061457-fceratto.json
- 06:11 marostegui@cumin1002: dbctl commit (dc=all): 'es2037 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P76883 and previous config saved to /var/cache/conftool/dbconfig/20250603-061134-root.json
- 06:07 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2152 (T395241)', diff saved to https://phabricator.wikimedia.org/P76882 and previous config saved to /var/cache/conftool/dbconfig/20250603-060719-fceratto.json
- 06:07 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2152.codfw.wmnet with reason: Maintenance
- 06:01 marostegui@cumin1002: dbctl commit (dc=all): 'es1037 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P76881 and previous config saved to /var/cache/conftool/dbconfig/20250603-060132-root.json
- 05:56 marostegui@cumin1002: dbctl commit (dc=all): 'es2037 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P76880 and previous config saved to /var/cache/conftool/dbconfig/20250603-055628-root.json
- 05:46 marostegui@cumin1002: dbctl commit (dc=all): 'es1037 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P76879 and previous config saved to /var/cache/conftool/dbconfig/20250603-054626-root.json
- 05:41 marostegui@cumin1002: dbctl commit (dc=all): 'es2037 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P76878 and previous config saved to /var/cache/conftool/dbconfig/20250603-054123-root.json
- 05:39 marostegui@deploy1003: Finished scap sync-world: Backport for Revert "db-production.php: Disable writes on es6" (duration: 09m 52s)
- 05:32 marostegui@deploy1003: marostegui: Continuing with sync
- 05:31 marostegui@cumin1002: dbctl commit (dc=all): 'Give some weight to es1038', diff saved to https://phabricator.wikimedia.org/P76877 and previous config saved to /var/cache/conftool/dbconfig/20250603-053151-marostegui.json
- 05:31 marostegui@deploy1003: marostegui: Backport for Revert "db-production.php: Disable writes on es6" synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 05:29 marostegui@deploy1003: Started scap sync-world: Backport for Revert "db-production.php: Disable writes on es6"
- 05:27 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es1037 T395867', diff saved to https://phabricator.wikimedia.org/P76876 and previous config saved to /var/cache/conftool/dbconfig/20250603-052719-marostegui.json
- 05:27 marostegui@dns1006: END - running authdns-update
- 05:26 marostegui@dns1006: START - running authdns-update
- 05:26 marostegui@cumin1002: dbctl commit (dc=all): 'Promote es1038 to es6 primary T395867', diff saved to https://phabricator.wikimedia.org/P76875 and previous config saved to /var/cache/conftool/dbconfig/20250603-052614-marostegui.json
- 05:25 marostegui: Starting es6 eqiad failover from es1037 to es1038 - T395867
- 05:23 marostegui@cumin1002: dbctl commit (dc=all): 'Set es1038 with weight 0 T395867', diff saved to https://phabricator.wikimedia.org/P76874 and previous config saved to /var/cache/conftool/dbconfig/20250603-052353-marostegui.json
- 05:22 marostegui@deploy1003: Finished scap sync-world: Backport for db-production.php: Disable writes on es6 (T395867) (duration: 13m 39s)
- 05:15 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 7 hosts with reason: Primary switchover es6 T395867
- 05:14 marostegui@deploy1003: marostegui: Continuing with sync
- 05:13 marostegui@deploy1003: marostegui: Backport for db-production.php: Disable writes on es6 (T395867) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 05:09 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 7 hosts with reason: Primary switchover es6 T395867
- 05:09 marostegui@deploy1003: Started scap sync-world: Backport for db-production.php: Disable writes on es6 (T395867)
- 04:54 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es2037.codfw.wmnet with reason: Primary switchover es6 T395420
- 04:52 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es2037 T395420', diff saved to https://phabricator.wikimedia.org/P76873 and previous config saved to /var/cache/conftool/dbconfig/20250603-045251-marostegui.json
- 04:52 marostegui@cumin1002: dbctl commit (dc=all): 'Promote es2035 to es6 primary and set section read-write T395420', diff saved to https://phabricator.wikimedia.org/P76872 and previous config saved to /var/cache/conftool/dbconfig/20250603-045202-root.json
- 04:51 marostegui: Starting es6 codfw failover from es2037 to es2035 - T395420
- 04:49 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 7 hosts with reason: Primary switchover es6 T395420
- 04:48 marostegui@cumin1002: dbctl commit (dc=all): 'Set es2035 with weight 0 T395420', diff saved to https://phabricator.wikimedia.org/P76871 and previous config saved to /var/cache/conftool/dbconfig/20250603-044855-root.json
- 04:45 marostegui@cumin1002: dbctl commit (dc=all): 'Add es2048 to dbctl depooled T395771+', diff saved to https://phabricator.wikimedia.org/P76870 and previous config saved to /var/cache/conftool/dbconfig/20250603-044550-marostegui.json
- 04:40 marostegui@cumin1002: START - Cookbook sre.mysql.clone of es2040.codfw.wmnet onto es2048.codfw.wmnet
- 04:31 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es2040 T395771', diff saved to https://phabricator.wikimedia.org/P76869 and previous config saved to /var/cache/conftool/dbconfig/20250603-043151-marostegui.json
- 04:29 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2040.codfw.wmnet with reason: Maintenance
- 04:01 mwpresync@deploy1003: Pruned MediaWiki: 1.45.0-wmf.1 (duration: 01m 39s)
- 03:48 mwpresync@deploy1003: Finished scap sync-world: testwikis to 1.45.0-wmf.4 refs T392174 (duration: 45m 55s)
- 03:03 mwpresync@deploy1003: Started scap sync-world: testwikis to 1.45.0-wmf.4 refs T392174
- 00:47 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1048.eqiad.wmnet with reason: host reimage
- 00:45 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1048.eqiad.wmnet with reason: host reimage
2025-06-02
- 23:53 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-be2008.codfw.wmnet with reason: host reimage
- 23:50 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-be2008.codfw.wmnet with reason: host reimage
- 23:30 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host thanos-be2008.codfw.wmnet with OS bullseye
- 23:25 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1048.eqiad.wmnet with OS bullseye
- 23:24 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1051.eqiad.wmnet with OS bullseye
- 23:24 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 23:24 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 23:22 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1050.eqiad.wmnet with OS bullseye
- 23:22 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 23:22 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 23:10 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1049.eqiad.wmnet with OS bullseye
- 23:10 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 23:09 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 22:54 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1051.eqiad.wmnet with reason: host reimage
- 22:50 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1051.eqiad.wmnet with reason: host reimage
- 22:47 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1050.eqiad.wmnet with reason: host reimage
- 22:45 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1050.eqiad.wmnet with reason: host reimage
- 22:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1049.eqiad.wmnet with reason: host reimage
- 22:35 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1049.eqiad.wmnet with reason: host reimage
- 22:29 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1051.eqiad.wmnet with OS bullseye
- 22:16 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1048.eqiad.wmnet with OS bullseye
- 22:08 maryum: scap sync-world finished to deploy several security bugs and PrivateSettings.php changes
- 22:01 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1050.eqiad.wmnet with OS bullseye
- 21:53 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1049.eqiad.wmnet with OS bullseye
- 21:38 tgr@deploy1003: Unlocked for deployment [MediaWiki]: T395758 (duration: 22m 32s)
- 21:38 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1048.eqiad.wmnet with reason: host reimage
- 21:34 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1048.eqiad.wmnet with reason: host reimage
- 21:22 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: cirrussearch205*,cirrussearch2060* for T395855 - bking@cumin2002
- 21:22 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: cirrussearch205*,cirrussearch2060* for T395855 - bking@cumin2002
- 21:16 bking@cumin2002: conftool action : set/pooled=no; selector: name=cirrussearch2055.codfw.wmnet|cirrussearch2056.codfw.wmnet|cirrussearch2057.codfw.wmnet|cirrussearch2058.codfw.wmnet|cirrussearch2059.codfw.wmnet|cirrussearch2060.codfw.wmnet|cirrussearch2091.codfw.wmnet
- 21:16 tgr@deploy1003: Locking from deployment [MediaWiki]: T395758
- 21:13 bking@cumin2002: conftool action : set/pooled=yes:weight=10; selector: name=cirrussearch.*.codfw.wmnet
- 21:11 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1048.eqiad.wmnet with OS bullseye
- 21:06 cjming@deploy1003: Finished scap sync-world: Backport for Simple summaries survey for English (T389393) (duration: 11m 41s)
- 21:05 andrew@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1049.eqiad.wmnet with OS bullseye
- 21:04 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1049.eqiad.wmnet with OS bullseye
- 20:59 cjming@deploy1003: cjming, ksarabia: Continuing with sync
- 20:59 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1048.eqiad.wmnet with OS bullseye
- 20:59 andrew@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - andrew@cumin1002"
- 20:56 cjming@deploy1003: cjming, ksarabia: Backport for Simple summaries survey for English (T389393) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 20:56 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 20:55 cjming@deploy1003: Started scap sync-world: Backport for Simple summaries survey for English (T389393)
- 20:51 jsn@deploy1003: Finished scap sync-world: Backport for Undeploy first set of Patroller Tools surveys (T389401) (duration: 12m 55s)
- 20:45 jsn@deploy1003: jsn: Continuing with sync
- 20:41 jsn@deploy1003: jsn: Backport for Undeploy first set of Patroller Tools surveys (T389401) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 20:38 jsn@deploy1003: Started scap sync-world: Backport for Undeploy first set of Patroller Tools surveys (T389401)
- 20:36 arlolra@deploy1003: Finished scap sync-world: Backport for Remove wgParserEnableLegacyHeadingDOM option (T371756) (duration: 10m 37s)
- 20:29 arlolra@deploy1003: arlolra: Continuing with sync
- 20:27 kamila@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1028
- 20:27 arlolra@deploy1003: arlolra: Backport for Remove wgParserEnableLegacyHeadingDOM option (T371756) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 20:27 jhathaway@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on mx-in2001.wikimedia.org with reason: T395240
- 20:26 jhathaway@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on mx-in1001.wikimedia.org with reason: T395240
- 20:26 kamila@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1028
- 20:25 arlolra@deploy1003: Started scap sync-world: Backport for Remove wgParserEnableLegacyHeadingDOM option (T371756)
- 20:23 cjming@deploy1003: Finished scap sync-world: Backport for ext.xLab: Send limited copies of stream configs (T391988) (duration: 15m 51s)
- 20:22 jhathaway@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on mx-out1001.wikimedia.org with reason: T395240
- 20:18 andrew@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - andrew@cumin1002"
- 20:16 cjming@deploy1003: cjming, phuedx: Continuing with sync
- 20:16 jhathaway@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on mx-out2001.wikimedia.org with reason: T395240
- 20:10 cjming@deploy1003: cjming, phuedx: Backport for ext.xLab: Send limited copies of stream configs (T391988) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 20:07 cjming@deploy1003: Started scap sync-world: Backport for ext.xLab: Send limited copies of stream configs (T391988)
- 19:40 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 19:40 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 19:36 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1048.eqiad.wmnet with reason: host reimage
- 19:32 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1048.eqiad.wmnet with reason: host reimage
- 19:20 sukhe@cumin1002: END (PASS) - Cookbook sre.loadbalancer.upgrade (exit_code=0) restarting P{lvs3008.esams.wmnet} and A:liberica
- 19:20 sukhe@cumin1002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) pooling P{lvs3008.esams.wmnet} and A:liberica
- 19:20 sukhe@cumin1002: START - Cookbook sre.loadbalancer.admin pooling P{lvs3008.esams.wmnet} and A:liberica
- 19:20 sukhe@cumin1002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) depooling P{lvs3008.esams.wmnet} and A:liberica
- 19:20 sukhe@cumin1002: START - Cookbook sre.loadbalancer.admin depooling P{lvs3008.esams.wmnet} and A:liberica
- 19:19 sukhe@cumin1002: START - Cookbook sre.loadbalancer.upgrade restarting P{lvs3008.esams.wmnet} and A:liberica
- 19:15 sukhe@cumin1002: END (PASS) - Cookbook sre.loadbalancer.upgrade (exit_code=0) restarting P{lvs3009.esams.wmnet} and A:liberica
- 19:15 sukhe@cumin1002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) pooling P{lvs3009.esams.wmnet} and A:liberica
- 19:14 sukhe@cumin1002: START - Cookbook sre.loadbalancer.admin pooling P{lvs3009.esams.wmnet} and A:liberica
- 19:14 sukhe@cumin1002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) depooling P{lvs3009.esams.wmnet} and A:liberica
- 19:14 sukhe@cumin1002: START - Cookbook sre.loadbalancer.admin depooling P{lvs3009.esams.wmnet} and A:liberica
- 19:14 sukhe@cumin1002: START - Cookbook sre.loadbalancer.upgrade restarting P{lvs3009.esams.wmnet} and A:liberica
- 19:08 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1048.eqiad.wmnet with OS bullseye
- 19:06 sukhe@cumin1002: END (PASS) - Cookbook sre.loadbalancer.upgrade (exit_code=0) restarting P{lvs3010.esams.wmnet} and A:liberica
- 19:06 sukhe@cumin1002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) pooling P{lvs3010.esams.wmnet} and A:liberica
- 19:05 sukhe@cumin1002: START - Cookbook sre.loadbalancer.admin pooling P{lvs3010.esams.wmnet} and A:liberica
- 19:05 sukhe@cumin1002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) depooling P{lvs3010.esams.wmnet} and A:liberica
- 19:05 sukhe@cumin1002: START - Cookbook sre.loadbalancer.admin depooling P{lvs3010.esams.wmnet} and A:liberica
- 19:05 sukhe@cumin1002: START - Cookbook sre.loadbalancer.upgrade restarting P{lvs3010.esams.wmnet} and A:liberica
- 18:46 jasmine@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts wikikube-worker1028.eqiad.wmnet
- 18:46 jasmine@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 18:44 jasmine@cumin1002: START - Cookbook sre.dns.netbox
- 18:34 jasmine@cumin1002: START - Cookbook sre.hosts.decommission for hosts wikikube-worker1028.eqiad.wmnet
- 18:33 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp1104.eqiad.wmnet,service=(cdn|ats-be)
- 18:32 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1254 (T395241)', diff saved to https://phabricator.wikimedia.org/P76866 and previous config saved to /var/cache/conftool/dbconfig/20250602-183230-fceratto.json
- 18:24 brett: include libvmod-wmfuniq 0.2.0~deb11u1 in bullseye-wikimedia
- 18:23 brett: include libvmod-wmfuniq 0.2.0~deb12u1 in bookworm-wikimedia
- 18:21 ebernhardson@deploy1003: Finished deploy [airflow-dags/search@443d0ab]: bump glent to 0.3.6 (duration: 00m 29s)
- 18:21 ebernhardson@deploy1003: Started deploy [airflow-dags/search@443d0ab]: bump glent to 0.3.6
- 18:17 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P76865 and previous config saved to /var/cache/conftool/dbconfig/20250602-181722-fceratto.json
- 18:10 jasmine@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts wikikube-worker[1026-1028].eqiad.wmnet
- 18:10 jasmine@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 18:10 jasmine@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-worker[1026-1028].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jasmine@cumin1002"
- 18:07 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1048.eqiad.wmnet with OS bullseye
- 18:06 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1048.eqiad.wmnet with OS bullseye
- 18:05 jasmine@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-worker[1026-1028].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jasmine@cumin1002"
- 18:02 jasmine@cumin1002: START - Cookbook sre.dns.netbox
- 18:02 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P76864 and previous config saved to /var/cache/conftool/dbconfig/20250602-180216-fceratto.json
- 17:50 jasmine@cumin1002: START - Cookbook sre.hosts.decommission for hosts wikikube-worker[1026-1028].eqiad.wmnet
- 17:47 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1254 (T395241)', diff saved to https://phabricator.wikimedia.org/P76863 and previous config saved to /var/cache/conftool/dbconfig/20250602-174708-fceratto.json
- 17:38 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1254 (T395241)', diff saved to https://phabricator.wikimedia.org/P76862 and previous config saved to /var/cache/conftool/dbconfig/20250602-173850-fceratto.json
- 17:38 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1254.eqiad.wmnet with reason: Maintenance
- 17:33 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1239.eqiad.wmnet with reason: Maintenance
- 17:33 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233 (T395241)', diff saved to https://phabricator.wikimedia.org/P76861 and previous config saved to /var/cache/conftool/dbconfig/20250602-173316-fceratto.json
- 17:22 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1048.eqiad.wmnet with OS bullseye
- 17:18 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20250602-171804-fceratto.json
- 17:05 jasmine@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[1026-1028].eqiad.wmnet
- 17:04 jasmine@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[1026-1028].eqiad.wmnet
- 17:02 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P76860 and previous config saved to /var/cache/conftool/dbconfig/20250602-170256-fceratto.json
- 16:50 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2010-dev.codfw.wmnet with OS bookworm
- 16:47 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233 (T395241)', diff saved to https://phabricator.wikimedia.org/P76859 and previous config saved to /var/cache/conftool/dbconfig/20250602-164748-fceratto.json
- 16:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcontrol2010-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:43 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cloudcontrol2010-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:40 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1233 (T395241)', diff saved to https://phabricator.wikimedia.org/P76857 and previous config saved to /var/cache/conftool/dbconfig/20250602-164030-fceratto.json
- 16:40 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1233.eqiad.wmnet with reason: Maintenance
- 16:40 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcontrol2010-dev.codfw.wmnet with OS bookworm
- 16:40 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229 (T395241)', diff saved to https://phabricator.wikimedia.org/P76856 and previous config saved to /var/cache/conftool/dbconfig/20250602-164003-fceratto.json
- 16:36 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 16:36 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 16:30 marostegui@cumin1002: dbctl commit (dc=all): 'db1211 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P76855 and previous config saved to /var/cache/conftool/dbconfig/20250602-162957-root.json
- 16:24 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P76854 and previous config saved to /var/cache/conftool/dbconfig/20250602-162455-fceratto.json
- 16:22 sukhe: sudo cumin -b1 -s60 'A:cp and not P{cp7001*}' "depool cdn && sleep 10 && run-puppet-agent --enable 'merging CR 1091330' && systemctl restart trafficserver.service && sleep 10 && pool cdn"
- 16:14 marostegui@cumin1002: dbctl commit (dc=all): 'db1211 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P76853 and previous config saved to /var/cache/conftool/dbconfig/20250602-161452-root.json
- 16:09 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P76852 and previous config saved to /var/cache/conftool/dbconfig/20250602-160948-fceratto.json
- 16:03 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp7001.magru.wmnet [reason: [end] testing CR 1091330]
- 15:59 marostegui@cumin1002: dbctl commit (dc=all): 'db1211 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P76851 and previous config saved to /var/cache/conftool/dbconfig/20250602-155946-root.json
- 15:55 sukhe: enable puppet and run agent on cp7001
- 15:54 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229 (T395241)', diff saved to https://phabricator.wikimedia.org/P76850 and previous config saved to /var/cache/conftool/dbconfig/20250602-155441-fceratto.json
- 15:52 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=cp7001.magru.wmnet [reason: testing CR 1091330]
- 15:50 sukhe: disable puppet on A:cp to merge CR: 1091330
- 15:49 phuedx@deploy1003: Finished scap sync-world: Backport for Enable MetricsPlatform's experimentation feature (duration: 14m 23s)
- 15:47 cjming@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
- 15:47 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1229 (T395241)', diff saved to https://phabricator.wikimedia.org/P76849 and previous config saved to /var/cache/conftool/dbconfig/20250602-154734-fceratto.json
- 15:47 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1229.eqiad.wmnet with reason: Maintenance
- 15:47 cjming@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
- 15:47 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222 (T395241)', diff saved to https://phabricator.wikimedia.org/P76848 and previous config saved to /var/cache/conftool/dbconfig/20250602-154709-fceratto.json
- 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-be2009.codfw.wmnet with reason: host reimage
- 15:44 marostegui@cumin1002: dbctl commit (dc=all): 'db1211 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P76847 and previous config saved to /var/cache/conftool/dbconfig/20250602-154440-root.json
- 15:42 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-be2009.codfw.wmnet with reason: host reimage
- 15:42 phuedx@deploy1003: phuedx: Continuing with sync
- 15:38 phuedx@deploy1003: phuedx: Backport for Enable MetricsPlatform's experimentation feature synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 15:35 phuedx@deploy1003: Started scap sync-world: Backport for Enable MetricsPlatform's experimentation feature
- 15:32 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222', diff saved to https://phabricator.wikimedia.org/P76846 and previous config saved to /var/cache/conftool/dbconfig/20250602-153201-fceratto.json
- 15:29 marostegui@cumin1002: dbctl commit (dc=all): 'db1211 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P76845 and previous config saved to /var/cache/conftool/dbconfig/20250602-152935-root.json
- 15:27 joal@deploy1003: Finished deploy [airflow-dags/analytics@03db055]: Regular analytics weekly train (with pull...) [airflow-dags/analytics_test@03db0552] (duration: 00m 42s)
- 15:26 joal@deploy1003: Started deploy [airflow-dags/analytics@03db055]: Regular analytics weekly train (with pull...) [airflow-dags/analytics_test@03db0552]
- 15:21 thcipriani: jouncebot nowandnext
- 15:19 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2010-dev.codfw.wmnet with OS bookworm
- 15:16 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222', diff saved to https://phabricator.wikimedia.org/P76844 and previous config saved to /var/cache/conftool/dbconfig/20250602-151654-fceratto.json
- 15:15 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-be2009.codfw.wmnet with OS bullseye
- 15:14 marostegui@cumin1002: dbctl commit (dc=all): 'db1211 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P76843 and previous config saved to /var/cache/conftool/dbconfig/20250602-151429-root.json
- 15:03 joal@deploy1003: Finished deploy [airflow-dags/analytics_test@4ebb376]: Regular analytics weekly train [airflow-dags/analytics_test@4ebb376f] (duration: 00m 05s)
- 15:03 joal@deploy1003: Started deploy [airflow-dags/analytics_test@4ebb376]: Regular analytics weekly train [airflow-dags/analytics_test@4ebb376f]
- 15:03 joal@deploy1003: Finished deploy [airflow-dags/analytics@afad011]: Regular analytics weekly train [airflow-dags/main@afad011c] (duration: 00m 07s)
- 15:03 joal@deploy1003: Started deploy [airflow-dags/analytics@afad011]: Regular analytics weekly train [airflow-dags/main@afad011c]
- 15:01 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222 (T395241)', diff saved to https://phabricator.wikimedia.org/P76842 and previous config saved to /var/cache/conftool/dbconfig/20250602-150146-fceratto.json
- 15:00 phuedx: Disabled the SDS 2.4.11 Synthetic A/A Test in xLab
- 14:54 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1222 (T395241)', diff saved to https://phabricator.wikimedia.org/P76841 and previous config saved to /var/cache/conftool/dbconfig/20250602-145443-fceratto.json
- 14:54 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1222.eqiad.wmnet with reason: Maintenance
- 14:54 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T395241)', diff saved to https://phabricator.wikimedia.org/P76840 and previous config saved to /var/cache/conftool/dbconfig/20250602-145418-fceratto.json
- 14:54 joal@deploy1003: Finished deploy [analytics/refinery@b1aa837] (hadoop-test): Regular analytics weekly train test [analytics/refinery@b1aa837f] (duration: 09m 27s)
- 14:44 joal@deploy1003: Started deploy [analytics/refinery@b1aa837] (hadoop-test): Regular analytics weekly train test [analytics/refinery@b1aa837f]
- 14:44 joal@deploy1003: Finished deploy [analytics/refinery@b1aa837] (thin): Regular analytics weekly train THIN [analytics/refinery@b1aa837f] (duration: 01m 06s)
- 14:43 joal@deploy1003: Started deploy [analytics/refinery@b1aa837] (thin): Regular analytics weekly train THIN [analytics/refinery@b1aa837f]
- 14:42 joal@deploy1003: Finished deploy [analytics/refinery@b1aa837]: Regular analytics weekly train [analytics/refinery@b1aa837f] (duration: 03m 08s)
- 14:39 joal@deploy1003: Started deploy [analytics/refinery@b1aa837]: Regular analytics weekly train [analytics/refinery@b1aa837f]
- 14:39 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P76835 and previous config saved to /var/cache/conftool/dbconfig/20250602-143910-fceratto.json
- 14:36 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db[1154,1211].eqiad.wmnet with reason: Maintenance
- 14:35 bking@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts elastic1067.eqiad.wmnet
- 14:35 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:35 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: elastic1067.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - bking@cumin2002"
- 14:35 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: elastic1067.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - bking@cumin2002"
- 14:24 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P76833 and previous config saved to /var/cache/conftool/dbconfig/20250602-142403-fceratto.json
- 14:08 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T395241)', diff saved to https://phabricator.wikimedia.org/P76832 and previous config saved to /var/cache/conftool/dbconfig/20250602-140854-fceratto.json
- 14:04 phuedx: Enabling the SDS 2.4.11 Synthetic A/A Test in xLab
- 13:59 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1197 (T395241)', diff saved to https://phabricator.wikimedia.org/P76831 and previous config saved to /var/cache/conftool/dbconfig/20250602-135945-fceratto.json
- 13:59 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1197.eqiad.wmnet with reason: Maintenance
- 13:59 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T395241)', diff saved to https://phabricator.wikimedia.org/P76830 and previous config saved to /var/cache/conftool/dbconfig/20250602-135920-fceratto.json
- 13:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host thanos-be2009.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 13:49 jmm@cumin1003: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host prometheus7002.magru.wmnet
- 13:49 jmm@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host prometheus7002.magru.wmnet with OS bookworm
- 13:44 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P76829 and previous config saved to /var/cache/conftool/dbconfig/20250602-134413-fceratto.json
- 13:43 bking@cumin2002: START - Cookbook sre.dns.netbox
- 13:41 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1048.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 13:41 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1051.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 13:41 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1050.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 13:41 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1049.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 13:38 bking@cumin2002: START - Cookbook sre.hosts.decommission for hosts elastic1067.eqiad.wmnet
- 13:37 bking@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cirrussearch[1064-1066].eqiad.wmnet
- 13:37 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 13:37 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cirrussearch[1064-1066].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - bking@cumin2002"
- 13:37 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cirrussearch[1064-1066].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - bking@cumin2002"
- 13:34 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2009.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 13:29 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P76828 and previous config saved to /var/cache/conftool/dbconfig/20250602-132906-fceratto.json
- 13:24 Lucas_WMDE: UTC afternoon backport+config window done
- 13:22 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance
- 13:22 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for core-Namespaces: Add Page, Author to default search ns in ruwikisource (T395632) (duration: 12m 00s)
- 13:21 bking@cumin2002: START - Cookbook sre.dns.netbox
- 13:20 jclark@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1051.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 13:19 jclark@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1048.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 13:19 jclark@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1050.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 13:19 jclark@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1049.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 13:19 bwojtowicz@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revision-models' for release 'main' .
- 13:17 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 13:17 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for cloudcephosd1048-51 - jclark@cumin1002"
- 13:16 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for cloudcephosd1048-51 - jclark@cumin1002"
- 13:15 lucaswerkmeister-wmde@deploy1003: bunnypranav, lucaswerkmeister-wmde: Continuing with sync
- 13:14 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T395241)', diff saved to https://phabricator.wikimedia.org/P76826 and previous config saved to /var/cache/conftool/dbconfig/20250602-131359-fceratto.json
- 13:13 lucaswerkmeister-wmde@deploy1003: bunnypranav, lucaswerkmeister-wmde: Backport for core-Namespaces: Add Page, Author to default search ns in ruwikisource (T395632) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 13:13 jclark@cumin1002: START - Cookbook sre.dns.netbox
- 13:10 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for core-Namespaces: Add Page, Author to default search ns in ruwikisource (T395632)
- 13:09 bking@cumin2002: START - Cookbook sre.hosts.decommission for hosts cirrussearch[1064-1066].eqiad.wmnet
- 13:05 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1188 (T395241)', diff saved to https://phabricator.wikimedia.org/P76825 and previous config saved to /var/cache/conftool/dbconfig/20250602-130548-fceratto.json
- 13:05 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1188.eqiad.wmnet with reason: Maintenance
- 13:05 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T395241)', diff saved to https://phabricator.wikimedia.org/P76824 and previous config saved to /var/cache/conftool/dbconfig/20250602-130523-fceratto.json
- 12:50 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P76823 and previous config saved to /var/cache/conftool/dbconfig/20250602-125016-fceratto.json
- 12:37 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of es2036.codfw.wmnet onto es2047.codfw.wmnet
- 12:37 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es2047 gradually with 4 steps - Pool es2047.codfw.wmnet in after cloning
- 12:37 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc2003.wikimedia.org
- 12:35 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P76821 and previous config saved to /var/cache/conftool/dbconfig/20250602-123508-fceratto.json
- 12:33 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host irc2003.wikimedia.org
- 12:26 jmm@cumin1003: START - Cookbook sre.hosts.reimage for host prometheus7002.magru.wmnet with OS bookworm
- 12:25 jmm@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM prometheus7002.magru.wmnet - jmm@cumin1003"
- 12:25 jmm@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM prometheus7002.magru.wmnet - jmm@cumin1003"
- 12:24 jmm@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) prometheus7002.magru.wmnet on all recursors
- 12:24 jmm@cumin1003: START - Cookbook sre.dns.wipe-cache prometheus7002.magru.wmnet on all recursors
- 12:24 jmm@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 12:24 jmm@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM prometheus7002.magru.wmnet - jmm@cumin1003"
- 12:22 jmm@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM prometheus7002.magru.wmnet - jmm@cumin1003"
- 12:20 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T395241)', diff saved to https://phabricator.wikimedia.org/P76819 and previous config saved to /var/cache/conftool/dbconfig/20250602-122001-fceratto.json
- 12:17 jmm@cumin1003: START - Cookbook sre.dns.netbox
- 12:17 jmm@cumin1003: START - Cookbook sre.ganeti.makevm for new host prometheus7002.magru.wmnet
- 12:12 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host doh7003.wikimedia.org
- 12:12 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host doh7003.wikimedia.org with OS bookworm
- 12:11 claime: cgoubert@mwmaint1002:~$ sudo systemctl restart mediawiki_job_generatecaptcha.service - T388531
- 12:11 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host apt-staging2001.codfw.wmnet
- 12:10 mvernon@cumin1002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of global-data-captcha-render
- 12:10 mvernon@cumin1002: START - Cookbook sre.swift.check-dbs Checking container DBs of global-data-captcha-render
- 12:10 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1182 (T395241)', diff saved to https://phabricator.wikimedia.org/P76818 and previous config saved to /var/cache/conftool/dbconfig/20250602-121041-fceratto.json
- 12:10 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance
- 12:10 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T395241)', diff saved to https://phabricator.wikimedia.org/P76817 and previous config saved to /var/cache/conftool/dbconfig/20250602-121016-fceratto.json
- 12:07 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host apt-staging2001.codfw.wmnet
- 12:00 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host build2002.codfw.wmnet
- 11:59 bwojtowicz@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revision-models' for release 'main' .
- 11:57 bwojtowicz@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'readability' for release 'main' .
- 11:55 bwojtowicz@deploy1003: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
- 11:55 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P76814 and previous config saved to /var/cache/conftool/dbconfig/20250602-115509-fceratto.json
- 11:54 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host build2002.codfw.wmnet
- 11:51 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh7003.wikimedia.org with reason: host reimage
- 11:49 jmm@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on doh7003.wikimedia.org with reason: host reimage
- 11:48 marostegui@cumin1002: START - Cookbook sre.mysql.pool es2047 gradually with 4 steps - Pool es2047.codfw.wmnet in after cloning
- 11:47 bwojtowicz@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
- 11:46 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-rw1001.wikimedia.org
- 11:42 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ldap-rw1001.wikimedia.org
- 11:40 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P76812 and previous config saved to /var/cache/conftool/dbconfig/20250602-114001-fceratto.json
- 11:39 claime: cgoubert@mwmaint1002:~$ sudo systemctl restart mediawiki_job_generatecaptcha.service - T388531
- 11:37 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-rw2001.wikimedia.org
- 11:33 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ldap-rw2001.wikimedia.org
- 11:32 claime: Manual run of cronjobs/generatecaptcha on k8s - T388531
- 11:31 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 11:30 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 11:24 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T395241)', diff saved to https://phabricator.wikimedia.org/P76811 and previous config saved to /var/cache/conftool/dbconfig/20250602-112453-fceratto.json
- 11:23 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-maint1001.eqiad.wmnet
- 11:19 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ldap-maint1001.eqiad.wmnet
- 11:18 jmm@cumin1003: START - Cookbook sre.hosts.reimage for host doh7003.wikimedia.org with OS bookworm
- 11:17 jmm@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM doh7003.wikimedia.org - jmm@cumin1003"
- 11:17 jmm@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM doh7003.wikimedia.org - jmm@cumin1003"
- 11:17 jmm@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) doh7003.wikimedia.org on all recursors
- 11:17 jmm@cumin1003: START - Cookbook sre.dns.wipe-cache doh7003.wikimedia.org on all recursors
- 11:17 jmm@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 11:17 jmm@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM doh7003.wikimedia.org - jmm@cumin1003"
- 11:16 jmm@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM doh7003.wikimedia.org - jmm@cumin1003"
- 11:15 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1156 (T395241)', diff saved to https://phabricator.wikimedia.org/P76810 and previous config saved to /var/cache/conftool/dbconfig/20250602-111519-fceratto.json
- 11:15 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
- 11:14 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1156.eqiad.wmnet with reason: Maintenance
- 11:11 jmm@cumin1003: START - Cookbook sre.dns.netbox
- 11:11 jmm@cumin1003: START - Cookbook sre.ganeti.makevm for new host doh7003.wikimedia.org
- 11:10 marostegui@cumin1002: dbctl commit (dc=all): 'es2039 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P76809 and previous config saved to /var/cache/conftool/dbconfig/20250602-111044-root.json
- 10:58 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum7003.magru.wmnet
- 10:58 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum7003.magru.wmnet with OS bookworm
- 10:58 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-maint2001.codfw.wmnet
- 10:55 marostegui@cumin1002: dbctl commit (dc=all): 'es2039 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P76808 and previous config saved to /var/cache/conftool/dbconfig/20250602-105539-root.json
- 10:54 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ldap-maint2001.codfw.wmnet
- 10:48 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es2036 gradually with 4 steps - Pool es2036.codfw.wmnet in after cloning
- 10:41 kamila@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/aux-k8s-services/jaeger: apply
- 10:40 kamila@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/aux-k8s-services/jaeger: apply
- 10:40 marostegui@cumin1002: dbctl commit (dc=all): 'es2039 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P76806 and previous config saved to /var/cache/conftool/dbconfig/20250602-104032-root.json
- 10:40 kamila@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/jaeger: apply
- 10:40 kamila@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/jaeger: apply
- 10:37 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum7003.magru.wmnet with reason: host reimage
- 10:34 jmm@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on durum7003.magru.wmnet with reason: host reimage
- 10:31 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host debmonitor1003.eqiad.wmnet
- 10:27 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host debmonitor1003.eqiad.wmnet
- 10:25 marostegui@cumin1002: dbctl commit (dc=all): 'es2039 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P76804 and previous config saved to /var/cache/conftool/dbconfig/20250602-102526-root.json
- 10:22 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host debmonitor2003.codfw.wmnet
- 10:18 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host debmonitor2003.codfw.wmnet
- 10:12 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rpki1001.eqiad.wmnet
- 10:10 marostegui@cumin1002: dbctl commit (dc=all): 'es2039 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P76802 and previous config saved to /var/cache/conftool/dbconfig/20250602-101020-root.json
- 10:08 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host rpki1001.eqiad.wmnet
- 10:07 jmm@cumin1003: START - Cookbook sre.hosts.reimage for host durum7003.magru.wmnet with OS bookworm
- 10:06 jmm@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum7003.magru.wmnet - jmm@cumin1003"
- 10:06 jmm@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum7003.magru.wmnet - jmm@cumin1003"
- 10:06 jmm@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) durum7003.magru.wmnet on all recursors
- 10:06 jmm@cumin1003: START - Cookbook sre.dns.wipe-cache durum7003.magru.wmnet on all recursors
- 10:06 jmm@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 10:06 jmm@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum7003.magru.wmnet - jmm@cumin1003"
- 10:06 jmm@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum7003.magru.wmnet - jmm@cumin1003"
- 10:02 marostegui@cumin1002: START - Cookbook sre.mysql.pool es2036 gradually with 4 steps - Pool es2036.codfw.wmnet in after cloning
- 10:02 jmm@cumin1003: START - Cookbook sre.dns.netbox
- 10:02 jmm@cumin1003: START - Cookbook sre.ganeti.makevm for new host durum7003.magru.wmnet
- 09:59 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rpki2003.codfw.wmnet
- 09:59 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ncredir7003.magru.wmnet
- 09:59 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ncredir7003.magru.wmnet with OS bookworm
- 09:55 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host rpki2003.codfw.wmnet
- 09:55 marostegui@cumin1002: dbctl commit (dc=all): 'es2039 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P76800 and previous config saved to /var/cache/conftool/dbconfig/20250602-095514-root.json
- 09:45 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es2039.codfw.wmnet with reason: Maintenance
- 09:44 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es2039 T395647', diff saved to https://phabricator.wikimedia.org/P76798 and previous config saved to /var/cache/conftool/dbconfig/20250602-094402-marostegui.json
- 09:40 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netmon1003.wikimedia.org
- 09:33 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host netmon1003.wikimedia.org
- 09:28 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ncredir7003.magru.wmnet with reason: host reimage
- 09:27 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netmon2002.wikimedia.org
- 09:24 jmm@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on ncredir7003.magru.wmnet with reason: host reimage
- 09:22 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host netmon2002.wikimedia.org
- 09:13 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard1003.eqiad.wmnet
- 09:10 jelto: update gitlab-settings artifact retention to 6 months - T395014
- 09:09 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host puppetboard1003.eqiad.wmnet
- 09:02 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard2003.codfw.wmnet
- 08:58 jmm@cumin1003: START - Cookbook sre.hosts.reimage for host ncredir7003.magru.wmnet with OS bookworm
- 08:58 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host puppetboard2003.codfw.wmnet
- 08:51 jmm@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ncredir7003.magru.wmnet - jmm@cumin1003"
- 08:51 jmm@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ncredir7003.magru.wmnet - jmm@cumin1003"
- 08:51 jmm@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ncredir7003.magru.wmnet on all recursors
- 08:51 jmm@cumin1003: START - Cookbook sre.dns.wipe-cache ncredir7003.magru.wmnet on all recursors
- 08:51 jmm@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 08:51 jmm@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ncredir7003.magru.wmnet - jmm@cumin1003"
- 08:51 jmm@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ncredir7003.magru.wmnet - jmm@cumin1003"
- 08:51 marostegui@cumin1002: dbctl commit (dc=all): 'es1040 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P76796 and previous config saved to /var/cache/conftool/dbconfig/20250602-085105-root.json
- 08:47 jmm@cumin1003: START - Cookbook sre.dns.netbox
- 08:47 jmm@cumin1003: START - Cookbook sre.ganeti.makevm for new host ncredir7003.magru.wmnet
- 08:45 jmm@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ncredir7003.magru.wmnet with OS bookworm
- 08:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ncredir7003.magru.wmnet
- 08:42 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 08:42 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ncredir7003.magru.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
- 08:41 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ncredir7003.magru.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
- 08:37 jmm@cumin2002: START - Cookbook sre.dns.netbox
- 08:36 marostegui@cumin1002: dbctl commit (dc=all): 'es1040 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P76795 and previous config saved to /var/cache/conftool/dbconfig/20250602-083559-root.json
- 08:33 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ncredir7003.magru.wmnet
- 08:20 marostegui@cumin1002: dbctl commit (dc=all): 'es1040 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P76794 and previous config saved to /var/cache/conftool/dbconfig/20250602-082053-root.json
- 08:11 jmm@cumin1003: START - Cookbook sre.hosts.reimage for host ncredir7003.magru.wmnet with OS bookworm
- 08:05 marostegui@cumin1002: dbctl commit (dc=all): 'es1040 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P76793 and previous config saved to /var/cache/conftool/dbconfig/20250602-080547-root.json
- 07:58 phuedx@deploy1003: Finished scap sync-world: Backport for Beta Cluster: Support A/B experiments (T393918) (duration: 35m 59s)
- 07:50 marostegui@cumin1002: dbctl commit (dc=all): 'es1040 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P76792 and previous config saved to /var/cache/conftool/dbconfig/20250602-075041-root.json
- 07:49 phuedx@deploy1003: phuedx, dr0ptp4kt: Continuing with sync
- 07:38 phuedx@deploy1003: phuedx, dr0ptp4kt: Backport for Beta Cluster: Support A/B experiments (T393918) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 07:36 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti7002.magru.wmnet
- 07:35 marostegui@cumin1002: dbctl commit (dc=all): 'es1040 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P76791 and previous config saved to /var/cache/conftool/dbconfig/20250602-073535-root.json
- 07:27 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti7002.magru.wmnet
- 07:22 phuedx@deploy1003: Started scap sync-world: Backport for Beta Cluster: Support A/B experiments (T393918)
- 07:20 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast5004.wikimedia.org
- 07:13 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 6 hosts with reason: Maintenance
- 07:13 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es1040.eqiad.wmnet with reason: Maintenance
- 07:08 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P76790 and previous config saved to /var/cache/conftool/dbconfig/20250602-070837-root.json
- 07:08 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast4005.wikimedia.org
- 07:06 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es1040 T395647', diff saved to https://phabricator.wikimedia.org/P76789 and previous config saved to /var/cache/conftool/dbconfig/20250602-070602-marostegui.json
- 07:02 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host bast4005.wikimedia.org
- 06:59 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast2003.wikimedia.org
- 06:53 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P76787 and previous config saved to /var/cache/conftool/dbconfig/20250602-065331-root.json
- 06:53 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host bast2003.wikimedia.org
- 06:52 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netbox-dev2003.codfw.wmnet
- 06:48 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host netbox-dev2003.codfw.wmnet
- 06:38 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P76786 and previous config saved to /var/cache/conftool/dbconfig/20250602-063826-root.json
- 06:23 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P76785 and previous config saved to /var/cache/conftool/dbconfig/20250602-062320-root.json
- 06:08 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P76783 and previous config saved to /var/cache/conftool/dbconfig/20250602-060815-root.json
- 05:53 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P76782 and previous config saved to /var/cache/conftool/dbconfig/20250602-055309-root.json
- 05:39 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1169 T395663', diff saved to https://phabricator.wikimedia.org/P76781 and previous config saved to /var/cache/conftool/dbconfig/20250602-053905-marostegui.json
- 05:38 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1169.eqiad.wmnet with reason: Maintenance
- 05:20 marostegui@cumin1002: dbctl commit (dc=all): 'Add es2047 to dbctl depooled T395771', diff saved to https://phabricator.wikimedia.org/P76780 and previous config saved to /var/cache/conftool/dbconfig/20250602-051957-marostegui.json
- 05:15 marostegui@cumin1002: START - Cookbook sre.mysql.clone of es2036.codfw.wmnet onto es2047.codfw.wmnet
- 05:02 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2036.codfw.wmnet with reason: Maintenance
- 05:01 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es2036 T395771', diff saved to https://phabricator.wikimedia.org/P76779 and previous config saved to /var/cache/conftool/dbconfig/20250602-050150-marostegui.json