AWS STS Session Token Support for Backups

Feature Request: AWS STS Session Token Support for Backups

Summary

Add support for AWS STS temporary credentials (specifically AWS_SESSION_TOKEN) in both scheduled backups and PITR binlog collector components.

Current Behavior

Both backup components currently have issues with AWS STS temporary credentials:

Scheduled Backups

The containerOptions.args.xbcloud feature appears to support passing --s3-session-token, but variable substitution does not work. When configuring:

containerOptions:
  env:
    - name: AWS_SESSION_TOKEN
      valueFrom:
        secretKeyRef:
          name: backup-creds
          key: AWS_SESSION_TOKEN
  args:
    xbcloud:
      - "--s3-session-token=${AWS_SESSION_TOKEN}"

The backup script receives the literal string --s3-session-token=${AWS_SESSION_TOKEN} instead of the expanded token value. This is because:

  1. The operator passes containerOptions.args.xbcloud to the environment variable XBCLOUD_EXTRA_ARGS
  2. The backup script at /backup/lib/pxc/backup.sh line 10 does:
    XBCLOUD_ARGS="--curl-retriable-errors=7 $XBCLOUD_EXTRA_ARGS"
    
  3. Shell variable references inside $XBCLOUD_EXTRA_ARGS are not expanded because they’re already strings, not shell code

The result is xbcloud receiving a literal ${AWS_SESSION_TOKEN} string:

xbcloud put ... '--s3-session-token=${AWS_SESSION_TOKEN}' ...
xbcloud: Probe failed. Please check your credentials and endpoint settings.

PITR Binlog Collector

The PITR binlog collector uses the minio-go SDK directly with a hardcoded empty session token in pkg/pxc/backup/storage/storage.go:

Creds: credentials.NewStaticV4(accessKeyID, secretAccessKey, ""),  // Empty session token

Use Case

AWS STS temporary credentials are commonly used to:

  1. Follow AWS security best practices by avoiding long-lived static credentials
  2. Implement fine-grained, time-limited access control
  3. Enable cross-account S3 access via AssumeRole
  4. Comply with organizational security policies that mandate temporary credentials

Without session token support, users must either:

  • Disable backups entirely or accept that they will fail with STS credentials
  • Use static long-lived credentials (not always possible due to security requirements)

Technical Analysis

Root Cause 1: Scheduled Backups (xbcloud variable expansion)

The backup shell scripts don’t expand environment variable references passed via XBCLOUD_EXTRA_ARGS. The script at /backup/lib/pxc/backup.sh simply concatenates the string without evaluation:

XBCLOUD_ARGS="--curl-retriable-errors=7 $XBCLOUD_EXTRA_ARGS"

When XBCLOUD_EXTRA_ARGS contains --s3-session-token=${AWS_SESSION_TOKEN}, the ${AWS_SESSION_TOKEN} is treated as a literal string because:

  • It’s already a string value in the environment variable
  • Shell expansion only happens once when the line is executed
  • Nested variable references in strings are not automatically expanded
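The behavior can be reproduced in isolation; `demo-token` below is a stand-in value, not a real credential:

```shell
# XBCLOUD_EXTRA_ARGS arrives as plain data, so the reference inside it
# stays literal when the script concatenates it:
export AWS_SESSION_TOKEN="demo-token"   # stand-in value for illustration
XBCLOUD_EXTRA_ARGS='--s3-session-token=${AWS_SESSION_TOKEN}'

XBCLOUD_ARGS="--curl-retriable-errors=7 $XBCLOUD_EXTRA_ARGS"
echo "$XBCLOUD_ARGS"
# prints: --curl-retriable-errors=7 --s3-session-token=${AWS_SESSION_TOKEN}
```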

Files requiring changes:

  • /backup/lib/pxc/backup.sh - Add eval or use envsubst to expand variables in XBCLOUD_EXTRA_ARGS
  • /backup/run_backup.sh - Same fix needed
  • /backup/recovery-cloud.sh - Same fix needed

Proposed fix for backup scripts:

# Option 1: Use eval (simple, but executes the variable's content as shell
# code, so untrusted input requires careful escaping)
XBCLOUD_EXTRA_ARGS=$(eval echo "$XBCLOUD_EXTRA_ARGS")
XBCLOUD_ARGS="--curl-retriable-errors=7 $XBCLOUD_EXTRA_ARGS"

# Option 2: Use envsubst (safer)
XBCLOUD_EXTRA_ARGS=$(echo "$XBCLOUD_EXTRA_ARGS" | envsubst)
XBCLOUD_ARGS="--curl-retriable-errors=7 $XBCLOUD_EXTRA_ARGS"

Root Cause 2: PITR Binlog Collector (minio-go SDK)

The issue is in pkg/pxc/backup/storage/storage.go at lines 100-101:

minioClient, err := minio.New(strings.TrimRight(endpoint, "/"), &minio.Options{
    Creds:     credentials.NewStaticV4(accessKeyID, secretAccessKey, ""),  // Empty session token
    Secure:    useSSL,
    Region:    region,
    Transport: transport,
})

The third parameter to credentials.NewStaticV4() is the session token, which is hardcoded to an empty string "".

PITR Code Flow

  1. PITR Deployment is created by pkg/pxc/app/binlogcollector/binlog-collector.go
  2. Environment variables are set from the credentials secret in getStorageEnvs() (lines 241-336)
  3. The collector binary reads config from environment variables in cmd/pitr/collector/collector.go (lines 104-110)
  4. Storage client is created in collector.New() calling storage.NewS3() (line 147)
  5. storage.NewS3() creates the minio client with hardcoded empty session token

Files Requiring Changes for PITR

  1. pkg/pxc/backup/storage/storage.go - Add session token parameter to NewS3() function

  2. cmd/pitr/collector/collector.go - Add SessionToken field to BackupS3 config struct

  3. pkg/pxc/app/binlogcollector/binlog-collector.go - Add AWS_SESSION_TOKEN environment variable to the PITR deployment

  4. api/v1/pxc_types.go (optional) - Add sessionToken field to BackupStorageS3Spec if a dedicated secret key reference is preferred

Proposed Solution

Part 1: Fix Scheduled Backups (xbcloud variable expansion)

Update the backup shell scripts to expand environment variables in XBCLOUD_EXTRA_ARGS:

In /backup/lib/pxc/backup.sh, /backup/run_backup.sh, and /backup/recovery-cloud.sh:

# Before using XBCLOUD_EXTRA_ARGS, expand any variable references
if [ -n "$XBCLOUD_EXTRA_ARGS" ]; then
    XBCLOUD_EXTRA_ARGS=$(echo "$XBCLOUD_EXTRA_ARGS" | envsubst)
fi
XBCLOUD_ARGS="--curl-retriable-errors=7 $XBCLOUD_EXTRA_ARGS"

This allows users to configure:

containerOptions:
  env:
    - name: AWS_SESSION_TOKEN
      valueFrom:
        secretKeyRef:
          name: backup-creds
          key: AWS_SESSION_TOKEN
  args:
    xbcloud:
      - "--s3-session-token=${AWS_SESSION_TOKEN}"

Part 2: Fix PITR Binlog Collector

Add support for an optional AWS_SESSION_TOKEN key in the existing credentials secret. If present, it will be used; if absent, behavior remains unchanged (backwards compatible).

Change 1: pkg/pxc/backup/storage/storage.go

// Modify function signature to accept sessionToken
func NewS3(
    ctx context.Context,
    endpoint,
    accessKeyID,
    secretAccessKey,
    sessionToken,  // NEW PARAMETER
    bucketName,
    prefix,
    region string,
    verifyTLS bool,
    caBundle []byte,
) (Storage, error) {
    // ... existing code ...

    minioClient, err := minio.New(strings.TrimRight(endpoint, "/"), &minio.Options{
        Creds:     credentials.NewStaticV4(accessKeyID, secretAccessKey, sessionToken),  // USE sessionToken
        Secure:    useSSL,
        Region:    region,
        Transport: transport,
    })

    // ... rest of function ...
}

Change 2: cmd/pitr/collector/collector.go

type BackupS3 struct {
    Endpoint     string `env:"ENDPOINT" envDefault:"s3.amazonaws.com"`
    AccessKeyID  string `env:"ACCESS_KEY_ID,required"`
    AccessKey    string `env:"SECRET_ACCESS_KEY,required"`
    SessionToken string `env:"AWS_SESSION_TOKEN"`  // NEW FIELD (optional)
    BucketURL    string `env:"S3_BUCKET_URL,required"`
    Region       string `env:"DEFAULT_REGION,required"`
}

And update the call to storage.NewS3():

s, err = storage.NewS3(
    ctx,
    c.BackupStorageS3.Endpoint,
    c.BackupStorageS3.AccessKeyID,
    c.BackupStorageS3.AccessKey,
    c.BackupStorageS3.SessionToken,  // NEW ARGUMENT
    bucketArr[0],
    prefix,
    c.BackupStorageS3.Region,
    c.VerifyTLS,
    caBundle,
)

Change 3: pkg/pxc/app/binlogcollector/binlog-collector.go

In getStorageEnvs() function, add the session token environment variable:

case api.BackupStorageS3:
    if storage.S3 == nil {
        return nil, errors.New("s3 storage is not specified")
    }
    envs = []corev1.EnvVar{
        {
            Name: "SECRET_ACCESS_KEY",
            ValueFrom: &corev1.EnvVarSource{
                SecretKeyRef: app.SecretKeySelector(storage.S3.CredentialsSecret, "AWS_SECRET_ACCESS_KEY"),
            },
        },
        {
            Name: "ACCESS_KEY_ID",
            ValueFrom: &corev1.EnvVarSource{
                SecretKeyRef: app.SecretKeySelector(storage.S3.CredentialsSecret, "AWS_ACCESS_KEY_ID"),
            },
        },
        // NEW: Add session token (optional key)
        {
            Name: "AWS_SESSION_TOKEN",
            ValueFrom: &corev1.EnvVarSource{
                SecretKeyRef: &corev1.SecretKeySelector{
                    LocalObjectReference: corev1.LocalObjectReference{
                        Name: storage.S3.CredentialsSecret,
                    },
                    Key:      "AWS_SESSION_TOKEN",
                    Optional: ptr.To(true),  // Optional - won't fail if key doesn't exist
                },
            },
        },
        // ... rest of existing env vars ...
    }

Change 4: cmd/pitr/recoverer/recoverer.go

Apply the same changes to the recoverer for consistency:

type BackupS3 struct {
    Endpoint     string `env:"ENDPOINT" envDefault:"s3.amazonaws.com"`
    AccessKeyID  string `env:"ACCESS_KEY_ID,required"`
    AccessKey    string `env:"SECRET_ACCESS_KEY,required"`
    SessionToken string `env:"AWS_SESSION_TOKEN"`  // NEW FIELD
    BackupDest   string `env:"S3_BUCKET_URL,required"`
    Region       string `env:"DEFAULT_REGION,required"`
}

type BinlogS3 struct {
    Endpoint     string `env:"BINLOG_S3_ENDPOINT" envDefault:"s3.amazonaws.com"`
    AccessKeyID  string `env:"BINLOG_S3_ACCESS_KEY_ID,required"`
    AccessKey    string `env:"BINLOG_S3_SECRET_ACCESS_KEY,required"`
    SessionToken string `env:"BINLOG_S3_SESSION_TOKEN"`  // NEW FIELD
    Region       string `env:"BINLOG_S3_REGION,required"`
    BucketURL    string `env:"BINLOG_S3_BUCKET_URL,required"`
}

And update the calls to storage.NewS3() accordingly.

Backwards Compatibility

This solution maintains full backwards compatibility:

  • Existing deployments without AWS_SESSION_TOKEN in their secrets will continue to work unchanged
  • The session token parameter defaults to empty string if not provided
  • No changes required to existing CRs unless users want to enable STS support

Testing Recommendations

  1. Unit tests: Add tests for storage.NewS3() with and without session token
  2. Integration test with STS credentials:
    • Create credentials using aws sts assume-role
    • Store all three values (access key, secret key, session token) in Kubernetes secret
    • Verify PITR uploads succeed
    • Verify PITR recovery works
  3. Backwards compatibility test: Verify existing deployments without session token continue to function
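For the integration test, the temporary credentials can be minted and stored along these lines (the role ARN, session name, and secret name are placeholders; assumes the aws CLI, jq, and kubectl are available):

```shell
# Assume a role and capture the temporary credentials (placeholder ARN):
CREDS=$(aws sts assume-role \
  --role-arn arn:aws:iam::123456789012:role/pxc-backup-role \
  --role-session-name pxc-backup \
  --query 'Credentials' --output json)

# Store all three values in the secret the operator and backup jobs read:
kubectl create secret generic backup-creds \
  --from-literal=AWS_ACCESS_KEY_ID="$(jq -r .AccessKeyId <<<"$CREDS")" \
  --from-literal=AWS_SECRET_ACCESS_KEY="$(jq -r .SecretAccessKey <<<"$CREDS")" \
  --from-literal=AWS_SESSION_TOKEN="$(jq -r .SessionToken <<<"$CREDS")"
```

Note that STS credentials expire (15 minutes to 12 hours depending on the role), so the secret must be refreshed before each test run.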

Additional Context

Related: CloudNativePG Implementation

For reference, the CloudNativePG Barman plugin handles this well:

  • Supports sessionToken in the ObjectStore CRD
  • Reads credentials fresh from Kubernetes secrets on each operation
  • Full STS support out of the box

See: "Hello from Barman Cloud CNPG-I plugin" on the Barman Cloud CNPG-I plugin site

Environment

  • PXC Operator version: 1.18.0 (also affects earlier versions)
  • Kubernetes: Any
  • Cloud provider: AWS (or S3-compatible with STS support)

Summary

Adding AWS STS session token support to both scheduled backups and PITR would:

  1. Align with AWS security best practices for temporary credentials
  2. Enable use of STS AssumeRole for S3 backup storage
  3. Require minimal code changes:
    • Scheduled backups: Add envsubst expansion in 3 shell scripts
    • PITR: Add session token parameter across 4 Go files
  4. Maintain full backwards compatibility

Thank you for considering this feature request.


Hi @gmautner, I have created two tasks:

to add support for AWS Session Token.


Good news: I was able to figure out how to pass the STS Token to the Backup Jobs.

By inserting this configuration in the PXC CR:

    storages:
      s3-backup:
        type: s3
        containerOptions:
          env:
            - name: AWS_SESSION_TOKEN
              valueFrom:
                secretKeyRef:
                  name: backup-creds
                  key: AWS_SESSION_TOKEN
            - name: AWS_ACCESS_KEY_ID
              valueFrom:
                secretKeyRef:
                  key: AWS_ACCESS_KEY_ID
                  name: backup-creds
            - name: AWS_SECRET_ACCESS_KEY
              valueFrom:
                secretKeyRef:
                  key: AWS_SECRET_ACCESS_KEY
                  name: backup-creds

And by NOT providing the containerOptions.args key, the xbcloud commands executed correctly.

It seems that loading the AWS credentials into the environment is all that is needed to make it work.

So, all that remains is the PITR issue.

I can make backups work without PITR as described above. However, that isn’t very helpful on its own, because restores with PerconaXtraDBClusterRestore don’t support session tokens: they read only the access and secret keys from the secret. Below is a draft of how to fix that as well.

Feature Request: Add AWS Session Token Support for S3 Restore Validation

Summary

The Percona XtraDB Cluster Operator v1.18.0 does not support AWS STS temporary credentials (session tokens) when validating S3 backups during restore operations. This prevents using AWS Security Token Service (STS) where scoped, temporary credentials are preferred over permanent IAM user credentials.

Use Case

STS temporary credentials are already configured for:

  • Backup jobs (via containerOptions.env with AWS_SESSION_TOKEN)
  • The secret referenced by backupSource.s3.credentialsSecret, which contains AWS_SESSION_TOKEN

However, restore validation fails because the operator’s internal S3 client does not read or use AWS_SESSION_TOKEN from the credentials secret.

Current Behavior

When processing a PerconaXtraDBClusterRestore, the operator:

  1. Reads credentials from the secret specified in backupSource.s3.credentialsSecret
  2. Only extracts AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
  3. Creates an S3 client with an empty session token
  4. Fails with “Access Denied” when calling BucketExists()

Error observed:

failed to validate restore job: failed to validate backup existence:
failed to create s3 client: failed to check if bucket exists: Access Denied.

Proposed Solution

Add support for reading AWS_SESSION_TOKEN from the credentials secret and passing it to the minio-go S3 client.

Changes Required

1. pkg/pxc/backup/storage/options.go

Add SessionToken field to S3Options struct:

type S3Options struct {
	Endpoint        string
	AccessKeyID     string
	SecretAccessKey string
	SessionToken    string  // NEW FIELD
	BucketName      string
	Prefix          string
	Region          string
	VerifyTLS       bool
}

Update getS3Options() function to read the session token:

func getS3Options(
	ctx context.Context,
	cl client.Client,
	cluster *api.PerconaXtraDBCluster,
	s3 *api.BackupStorageS3Spec,
	verifyTLS *bool,
) (*S3Options, error) {
	secret := new(corev1.Secret)
	err := cl.Get(ctx, types.NamespacedName{
		Name:      s3.CredentialsSecret,
		Namespace: cluster.Namespace,
	}, secret)
	if client.IgnoreNotFound(err) != nil {
		return nil, errors.Wrap(err, "failed to get secret")
	}

	accessKeyID := string(secret.Data["AWS_ACCESS_KEY_ID"])
	secretAccessKey := string(secret.Data["AWS_SECRET_ACCESS_KEY"])
	sessionToken := string(secret.Data["AWS_SESSION_TOKEN"])  // NEW LINE

	// ... existing code ...

	return &S3Options{
		Endpoint:        s3.EndpointURL,
		AccessKeyID:     accessKeyID,
		SecretAccessKey: secretAccessKey,
		SessionToken:    sessionToken,  // NEW FIELD
		BucketName:      bucket,
		Prefix:          prefix,
		Region:          region,
		VerifyTLS:       verify,
	}, nil
}

Update getS3OptionsFromBackup() similarly:

func getS3OptionsFromBackup(ctx context.Context, cl client.Client, cluster *api.PerconaXtraDBCluster, backup *api.PerconaXtraDBClusterBackup) (*S3Options, error) {
	secret := new(corev1.Secret)
	err := cl.Get(ctx, types.NamespacedName{
		Name:      backup.Status.S3.CredentialsSecret,
		Namespace: backup.Namespace,
	}, secret)
	if client.IgnoreNotFound(err) != nil {
		return nil, errors.Wrap(err, "failed to get secret")
	}
	accessKeyID := string(secret.Data["AWS_ACCESS_KEY_ID"])
	secretAccessKey := string(secret.Data["AWS_SECRET_ACCESS_KEY"])
	sessionToken := string(secret.Data["AWS_SESSION_TOKEN"])  // NEW LINE

	// ... existing code ...

	return &S3Options{
		Endpoint:        backup.Status.S3.EndpointURL,
		AccessKeyID:     accessKeyID,
		SecretAccessKey: secretAccessKey,
		SessionToken:    sessionToken,  // NEW FIELD
		BucketName:      bucket,
		Prefix:          prefix,
		Region:          region,
		VerifyTLS:       verifyTLS,
	}, nil
}

2. pkg/pxc/backup/storage/storage.go

Update NewS3() function signature and implementation:

func NewS3(ctx context.Context, endpoint, accessKeyID, secretAccessKey, sessionToken, bucketName, prefix, region string, verifyTLS bool) (Storage, error) {
	// ... existing endpoint logic ...

	minioClient, err := minio.New(strings.TrimRight(endpoint, "/"), &minio.Options{
		Creds:     credentials.NewStaticV4(accessKeyID, secretAccessKey, sessionToken),  // CHANGED: was ""
		Secure:    useSSL,
		Region:    region,
		Transport: transport,
	})
	// ... rest of function ...
}

Update NewClient() to pass the session token:

func NewClient(ctx context.Context, opts Options) (Storage, error) {
	switch opts.Type() {
	case api.BackupStorageS3:
		opts, ok := opts.(*S3Options)
		if !ok {
			return nil, errors.New("invalid options type")
		}
		return NewS3(ctx, opts.Endpoint, opts.AccessKeyID, opts.SecretAccessKey, opts.SessionToken, opts.BucketName, opts.Prefix, opts.Region, opts.VerifyTLS)
		//                                                                        ^^^^^^^^^^^^^^^^^ NEW PARAMETER
	// ... rest of switch ...
	}
}

Backward Compatibility

This change is fully backward compatible:

  • If AWS_SESSION_TOKEN is not present in the secret, an empty string is read
  • The minio-go credentials.NewStaticV4() function already accepts an empty string for the session token parameter (current behavior)
  • Existing configurations without session tokens continue to work unchanged
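The "empty string is read" claim follows from Go's map semantics, which a minimal sketch can confirm (the key names match the secret layout above; the data values are placeholders):

```go
package main

import "fmt"

func main() {
	// Secret data as the operator sees it: no AWS_SESSION_TOKEN key present.
	data := map[string][]byte{
		"AWS_ACCESS_KEY_ID":     []byte("placeholder-id"),
		"AWS_SECRET_ACCESS_KEY": []byte("placeholder-key"),
	}

	// Indexing a map with a missing key yields the zero value (a nil []byte),
	// and string(nil) is "", so the session token silently defaults to empty.
	sessionToken := string(data["AWS_SESSION_TOKEN"])
	fmt.Printf("%q\n", sessionToken)
	// prints: ""
}
```

This is why no nil-check or feature flag is needed: old secrets fall through to the existing empty-token behavior automatically.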

Testing

  1. Existing behavior: Secrets with only AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY should continue to work
  2. New behavior: Secrets with all three keys (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN) should successfully authenticate using STS temporary credentials

Environment

  • Operator Version: 1.18.0
  • Kubernetes: EKS (also applicable to any K8s distribution)
  • Storage: AWS S3

Related

  • The backup jobs already support session tokens via containerOptions.env - this change brings parity to the operator’s internal validation
  • Similar support may be needed for the backup validation code path if not already present