[CELEBORN-2016] Add cooldown time in worker shutdown #3294

s0nskar · 2025-05-27T17:05:43Z

What changes were proposed in this pull request?

Add an additional cooldown time to the worker shutdown logic, which allows all shutdown hooks to execute completely.
Minor improvements to the shutdown logic.

Why are the changes needed?

With the current shutdown logic, we have seen workers get shut down abruptly with a timeout exception before the shutdown hook has finished executing. Because of this, Celeborn is:

unable to print unreleased partition info on decommission
unable to update the sorted file DB

Does this PR introduce any user-facing change?

NA

How was this patch tested?

NA

s0nskar · 2025-05-27T17:06:56Z

common/src/main/scala/org/apache/celeborn/common/CelebornConf.scala

+  val WORKER_GRACEFUL_SHUTDOWN_TIMEOUT: ConfigEntry[Long] =
+    buildConf("celeborn.worker.graceful.shutdown.timeout")
+      .categories("worker")
+      .doc(s"The worker's graceful shutdown timeout time. This should include " +
+        s"${WORKER_CHECK_SLOTS_FINISHED_TIMEOUT.key} and ${WORKER_PARTITION_SORTER_SHUTDOWN_TIMEOUT.key}.")
+      .version("0.2.0")
+      .timeConf(TimeUnit.MILLISECONDS)
+      .createWithDefaultString("600s")


Changed the ordering to fix forward references to these variables in the description.
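The forward-reference problem comes from Scala's object initialization order: vals in an object body are initialized top to bottom, so an earlier val that reads a later one sees `null`. The sketch below reproduces this with a hypothetical `Entry` case class standing in for `ConfigEntry` (it does not use Celeborn's API), assuming the original ordering placed the referencing entry before the entries it mentions:

```scala
// Hypothetical stand-in for a config entry; this is NOT Celeborn's ConfigEntry API.
final case class Entry(key: String, doc: String = "")

object BadOrder {
  // GRACEFUL is initialized first and reads SORTER, which is still null at this point,
  // so SORTER.key throws a NullPointerException while the object is being initialized.
  val GRACEFUL: Entry = Entry("graceful.timeout", s"Should include ${SORTER.key}.")
  val SORTER: Entry = Entry("sorter.timeout")
}

object GoodOrder {
  // Defining the referenced entry first means it is fully initialized when GRACEFUL reads it.
  val SORTER: Entry = Entry("sorter.timeout")
  val GRACEFUL: Entry = Entry("graceful.timeout", s"Should include ${SORTER.key}.")
}

object ForwardReferenceDemo {
  def main(args: Array[String]): Unit = {
    println(GoodOrder.GRACEFUL.doc)
    try println(BadOrder.GRACEFUL.doc)
    catch {
      // For a top-level object the JVM wraps the NPE in ExceptionInInitializerError;
      // for a nested object the NPE surfaces directly.
      case e @ (_: ExceptionInInitializerError | _: NullPointerException) =>
        println(s"BadOrder failed: $e")
    }
  }
}
```

Reordering, as done in this change, avoids the failure without changing any config values.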

FMX · 2025-06-04T07:35:54Z

Can we just set a larger value for celeborn.worker.graceful.shutdown.timeout?

s0nskar · 2025-06-04T08:38:56Z


For graceful shutdown, we can do this and let other users know by mentioning it in the config description.

But we still need to handle this for the decommission flow, so this config can serve as a general, default way to provide some time buffer without requiring users to tune configs.

FMX · 2025-06-05T03:19:25Z


The config celeborn.worker.graceful.shutdown.timeout is a Celeborn worker config, which means users should not tune it, because the Celeborn cluster should not be exposed to users. In what scenario would a user tune configs for the Celeborn cluster?

s0nskar · 2025-06-09T06:52:44Z

@FMX By users I meant the Celeborn admins, not the clients.

For worker decommission, the force-exit timeout is 600s. But in some cases the shutdown hook does not fully execute; say sendWorkerDecommissionToMaster() takes 2s. As a result, we miss the information about unreleased shuffles, which is printed at the end of the decommission shutdown hook.

Similarly, for worker graceful shutdown, the current default timeout is 600s, which is exactly the check slots finished timeout (480s) plus the partition sorter timeout (120s). So we can either increase the default graceful shutdown timeout slightly, or ask Celeborn admins to set it slightly greater than (check slots finished timeout + partition sorter timeout).
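The sizing argument is simple arithmetic. As a hedged sketch (the helper `GracefulShutdownBudget.headroom` is hypothetical, and the values are only the defaults quoted in this thread), the margin left for the rest of the shutdown hook is the graceful timeout minus the two sub-timeouts it has to cover:

```scala
import scala.concurrent.duration._

// Hypothetical helper (not part of Celeborn): the part of the graceful shutdown
// budget left over once the slot-finish wait and the partition sorter shutdown
// have each used their full timeouts.
object GracefulShutdownBudget {
  def headroom(
      gracefulTimeout: FiniteDuration,
      checkSlotsFinishedTimeout: FiniteDuration,
      sorterShutdownTimeout: FiniteDuration): FiniteDuration =
    gracefulTimeout - (checkSlotsFinishedTimeout + sorterShutdownTimeout)

  def main(args: Array[String]): Unit = {
    // Defaults quoted in this thread: 600s overall, 480s + 120s for the two waits.
    println(s"headroom at defaults: ${headroom(600.seconds, 480.seconds, 120.seconds)}")
    // Admin-side workaround: set the graceful timeout slightly above the sum (630s is arbitrary).
    println(s"headroom with 630s: ${headroom(630.seconds, 480.seconds, 120.seconds)}")
  }
}
```

With the defaults the headroom is zero, so any hook work that runs after both waits have used their full timeouts has no time left before the worker is stopped.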

Both cases can be handled by adding a small cooldown time.
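To make the two cases concrete, here is a hedged simulation, not Celeborn code: shutdown-hook steps run back to back, a force exit fires at a fixed deadline, and any step that would finish after the deadline is lost. The step labels, their order, their durations, and the 30s cooldown are all illustrative assumptions loosely based on the examples in this thread; the actual cooldown default in this PR may differ.

```scala
// Hypothetical model of a shutdown hook racing a force-exit deadline.
// Step labels and durations (in seconds) are illustrative, not taken from Celeborn.
final case class Step(name: String, seconds: Long)

object ShutdownRace {
  /** Names of the steps that finish no later than `deadline` when run back to back. */
  def completedSteps(steps: Seq[Step], deadline: Long): Seq[String] = {
    // Cumulative finish time of each step, e.g. durations (2, 600, 0) -> (2, 602, 602).
    val finishTimes = steps.scanLeft(0L)(_ + _.seconds).tail
    steps.zip(finishTimes).collect { case (step, end) if end <= deadline => step.name }
  }

  def main(args: Array[String]): Unit = {
    val forceExitTimeout = 600L
    val cooldown = 30L // hypothetical cooldown, chosen only for illustration

    // Decommission: assumes the hook notifies the master, then waits up to 600s for shuffles.
    val decommission = Seq(
      Step("sendWorkerDecommissionToMaster()", 2),
      Step("wait for shuffles to be released", 600),
      Step("print unreleased shuffle info", 0))

    // Graceful shutdown: the two waits already add up to the whole 600s budget.
    val graceful = Seq(
      Step("wait for slots to finish", 480),
      Step("shut down partition sorter", 120),
      Step("update sorted file DB", 1))

    for ((label, steps) <- Seq("decommission" -> decommission, "graceful" -> graceful)) {
      println(s"$label without cooldown: ${completedSteps(steps, forceExitTimeout)}")
      println(s"$label with cooldown:    ${completedSteps(steps, forceExitTimeout + cooldown)}")
    }
  }
}
```

In this model the trailing step (printing unreleased shuffle info, or updating the sorted file DB) is cut off without the cooldown and completes with it, which is the behavior the cooldown is meant to provide in both flows.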

FMX · 2025-06-10T08:05:29Z


In this scenario, why not increase the graceful shutdown timeout slightly? It looks like with both approaches, the cluster admin will need to change the config file.

s0nskar · 2025-06-11T04:42:21Z

Increasing the timeout only solves the graceful shutdown case; the decommission flow will still have this problem.

Adding worker shutdown cooldown time

5084033

github-actions bot added kind:documentation module:common module:worker labels May 27, 2025
