Add retries to local sync download #9191
base: master
Conversation
Bug: Download Retry Fails, Progress Bars Leak
When retrying a failed download, the file is not truncated. The code only seeks to the beginning of the file, which can lead to file corruption if a previous attempt wrote partial data and the retry writes less, leaving leftover data. Additionally, a new progress bar is created for each retry attempt, resulting in multiple progress indicators for the same download and potential resource leaks, as previous bars are not cleaned up.
pkg/local/sync.go, lines 186 to 202 (commit 4ca2f9e):
```go
if attempt > 1 {
	// If we are retrying, we need to reset the file pointer to the beginning
	if _, err := f.Seek(0, io.SeekStart); err != nil {
		return fmt.Errorf("could not seek to start of file '%s': %w", destination, err)
	}
}
b := s.progressBar.AddReader(fmt.Sprintf("download %s", path), sizeBytes)
defer func() {
	if err != nil {
		b.Error()
	} else {
		atomic.AddUint64(&s.tasks.Downloaded, 1)
		b.Done()
	}
}()
```
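To make the truncation half of the report concrete, here is a minimal sketch of one way to fix it: discard any partial data before rewinding, so a shorter successful retry cannot leave stale bytes from a longer failed attempt. The helper name and the `*os.File` parameter are assumptions for illustration, not the PR's actual code; the progress-bar leak would additionally require creating the bar once per file rather than once per attempt.

```go
package local

import (
	"fmt"
	"io"
	"os"
)

// resetForRetry is a hypothetical helper: on retry attempts it truncates the
// destination file and then seeks back to the start, so leftover bytes from a
// previous partial write cannot survive a shorter successful one.
func resetForRetry(f *os.File, destination string, attempt int) error {
	if attempt <= 1 {
		return nil // first attempt writes into a fresh file
	}
	if err := f.Truncate(0); err != nil {
		return fmt.Errorf("could not truncate file '%s': %w", destination, err)
	}
	if _, err := f.Seek(0, io.SeekStart); err != nil {
		return fmt.Errorf("could not seek to start of file '%s': %w", destination, err)
	}
	return nil
}
```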
Some higher-level things I don't understand:
- SyncManager already receives a retryable HTTP client, so why do we need this mechanism as well?
- We already have a retry configuration parameter; why are we not using it?
- The PR states the changes should address issues with httpclient hangs and large files, yet we catch all errors and retry on all of them. I'm not sure that's what we want.
- I did not understand the use case. There is no linked issue, reproduction scenario, or cited bug. More context is needed.
```go
// retryIfError attempts to execute the given operation up to maxAttempts times
func retryIfError(operation func(attempt int) error, maxAttempts int) error {
```
Suggested change:

```diff
-func retryIfError(operation func(attempt int) error, maxAttempts int) error {
+func retryOnError(operation func(attempt int) error, maxAttempts int) error {
```
Or just simply `retry`.
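For reference, here is a minimal sketch of a helper matching the signature under review. The body is an assumption about what the PR implements, shown only to ground the naming discussion; the actual code may differ.

```go
package local

// retryOnError runs operation up to maxAttempts times, passing the 1-based
// attempt number, and returns nil on the first success or the last error.
func retryOnError(operation func(attempt int) error, maxAttempts int) error {
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if err = operation(attempt); err == nil {
			return nil
		}
	}
	return err
}
```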
```go
retriesCount := 3
```
What is this magic number?
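One hedged sketch of how this comment could be addressed: name the default and let an existing retries configuration value override it. The `configured` parameter stands in for whatever retry setting the codebase already exposes; the names below are assumptions for illustration, not actual lakeFS configuration keys.

```go
package local

// defaultDownloadRetries replaces the bare literal with a named constant.
const defaultDownloadRetries = 3

// resolveRetries is a hypothetical helper that prefers an explicitly
// configured retry count and falls back to the named default.
func resolveRetries(configured int) int {
	if configured > 0 {
		return configured
	}
	return defaultDownloadRetries
}
```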
Change Description
Background
Adding extra retries to `lakectl local` sync (which is invoked by either `clone` or `checkout`). These are an attempt to better handle socket hang-ups from the storage backend, in extreme cases of very large files.
Testing Details
I haven't been able to reproduce the bug (which was reported by a client), but this addresses the specific reported failure.
Added unit tests, and verified that a simple `local clone` and `local checkout` work as before.