hcloud-upload-image/hcloudimages/internal/control/retry.go
Julian Tölle c1f086867d
fix: timeout while waiting for SSH to become available
In #68 I reduced the general limits for the backoff, expecting the upload
to speed up on average because it retried faster. But with the shorter
backoff, the 10 available retries were used up before SSH became
available.

The new 100 retries match the roughly 3 minutes of total timeout that the
previous solution had, and should fix the timeouts while waiting for SSH.
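
A rough sanity check (my addition, not part of the commit; it reuses the backoff options configured in retry.go below): the sleeps grow 200ms, 400ms, 800ms, 1.6s and are then capped at 2s, so 100 retries add up to about 195 seconds, slightly over 3 minutes.

// Sketch: sum the sleeps produced by the backoff options from retry.go
// for 100 retries. Assumes hcloud-go v2 is available; no jitter is configured.
package main

import (
	"fmt"
	"time"

	"github.com/hetznercloud/hcloud-go/v2/hcloud"
)

func main() {
	backoffFunc := hcloud.ExponentialBackoffWithOpts(hcloud.ExponentialBackoffOpts{
		Multiplier: 2,
		Base:       200 * time.Millisecond,
		Cap:        2 * time.Second,
	})

	var total time.Duration
	for try := 0; try < 100; try++ {
		total += backoffFunc(try)
	}

	// Expected: 3m15s, i.e. 0.2s + 0.4s + 0.8s + 1.6s + 96 * 2s = 195s.
	fmt.Println(total)
}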

In addition, I discovered that my implementation in
`hcloudimages/backoff.ExponentialBackoffWithLimit` has a bug: the
calculated backoff could overflow before the limit was applied, resulting
in negative durations (see the sketch below). I did not fix the issue
because `hcloud-go` nowadays provides such a method natively. Instead I
marked the method as deprecated, to be removed in a later release.
2025-05-09 15:55:08 +02:00
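
To illustrate the class of bug described above (a hypothetical reconstruction, not the actual `hcloudimages/backoff` code): if the exponential term is computed in a time.Duration, an int64 of nanoseconds, it can wrap around before the limit is applied, and the clamp never triggers on the wrapped, negative value.

// Hypothetical reconstruction of the overflow: grow the duration first, clamp last.
package main

import (
	"fmt"
	"time"
)

func buggyBackoffWithLimit(base, limit time.Duration) func(retries int) time.Duration {
	return func(retries int) time.Duration {
		sleep := base
		for i := 0; i < retries; i++ {
			sleep *= 2 // int64 nanoseconds wrap around on overflow
		}
		if sleep > limit {
			sleep = limit // too late: a wrapped value is already negative and skips the clamp
		}
		return sleep
	}
}

func main() {
	backoff := buggyBackoffWithLimit(time.Second, 30*time.Second)

	fmt.Println(backoff(5)) // 32s, clamped to 30s as intended

	// Find the first retry count where the wrapped value turns negative.
	for retries := 0; retries < 64; retries++ {
		if d := backoff(retries); d < 0 {
			fmt.Printf("retries=%d -> %s\n", retries, d)
			break
		}
	}
}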


// SPDX-License-Identifier: MPL-2.0

// From https://github.com/hetznercloud/terraform-provider-hcloud/blob/v1.46.1/internal/control/retry.go
// Copyright (c) Hetzner Cloud GmbH

package control

import (
	"context"
	"time"

	"github.com/hetznercloud/hcloud-go/v2/hcloud"

	"github.com/apricote/hcloud-upload-image/hcloudimages/contextlogger"
)

// Retry executes f at most maxTries times, sleeping with exponential backoff
// between failed attempts. It returns nil as soon as f succeeds, the context
// error if ctx is cancelled, or the last error from f once all tries are used.
func Retry(ctx context.Context, maxTries int, f func() error) error {
	logger := contextlogger.From(ctx)

	var err error

	// 200ms, 400ms, 800ms, 1.6s, then capped at 2s per retry.
	backoffFunc := hcloud.ExponentialBackoffWithOpts(hcloud.ExponentialBackoffOpts{
		Multiplier: 2,
		Base:       200 * time.Millisecond,
		Cap:        2 * time.Second,
	})

	for try := 0; try < maxTries; try++ {
		if ctx.Err() != nil {
			return ctx.Err()
		}

		err = f()
		if err != nil {
			sleep := backoffFunc(try)
			logger.DebugContext(ctx, "operation failed, waiting before trying again", "try", try, "backoff", sleep)
			time.Sleep(sleep)
			continue
		}

		return nil
	}

	return err
}
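
For context, a minimal usage sketch (not taken from the repository; the address is a placeholder, and control is the internal package above, so this only compiles from within the module): waiting for a TCP port such as SSH to start accepting connections.

// Sketch: retry a TCP dial until the port accepts connections or the tries run out.
package main

import (
	"context"
	"fmt"
	"net"
	"time"

	"github.com/apricote/hcloud-upload-image/hcloudimages/internal/control"
)

func main() {
	ctx := context.Background()

	err := control.Retry(ctx, 100, func() error {
		conn, err := net.DialTimeout("tcp", "203.0.113.10:22", 5*time.Second)
		if err != nil {
			return err
		}
		return conn.Close()
	})
	if err != nil {
		fmt.Println("ssh did not become available:", err)
	}
}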