Upload files to Rook Ceph or S3 Bucket using s5cmd

s5cmd is a CLI utility for accessing S3 buckets and managing files: uploading, downloading, or deleting objects. It is written in Go and highly optimized for speed and parallelism while maintaining simplicity. If you are a developer, DevOps engineer, or data scientist who needs to manage large-scale S3 operations, s5cmd is an ideal tool for you. Alternative tools include s3cmd and aws-cli, but neither is optimized for high performance.

Features of s5cmd

Here is a list of standard features available in s5cmd:

  • List buckets and objects
  • Upload, download or delete objects
  • Move, copy or rename objects
  • Set Server Side Encryption using AWS Key Management Service (KMS)
  • Set Access Control List (ACL) for objects/files on upload, copy, and move
  • Print object contents to stdout
  • Select JSON records from objects using SQL expressions
  • Create or remove buckets
  • Summarize objects sizes, grouping by storage class
  • Wildcard support for all operations
  • Multiple arguments support for delete operation
  • Command file support to run commands in batches at very high execution speeds
  • Dry run support
  • S3 Transfer Acceleration support
  • Google Cloud Storage (and any other S3 API compatible service) support
  • Structured logging for querying command outputs
  • Shell auto-completion
  • S3 ListObjects API backward compatibility

s5cmd comparison with AWS CLI

Feature            | s5cmd       | aws s3 CLI
-------------------|-------------|------------
Parallel Transfers | ✅ Yes      | ⚠️ Limited
Wildcard Support   | ✅ Yes      | ✅ Yes
Batch Mode         | ✅ Yes      | ❌ No
Performance        | 🚀 High     | 🐢 Moderate
Syntax Simplicity  | ✅ Friendly | ❌ Verbose

Installing s5cmd on Linux, macOS, BSD

Visit the s5cmd GitHub releases page to check the latest version. You can also get the latest release tag programmatically by running the following command:

VER=$(curl -s https://api.github.com/repos/peak/s5cmd/releases/latest|grep tag_name|cut -d '"' -f 4|sed 's/v//')
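
Confirm the variable was set before downloading (your output may show a newer release than this example):

$ echo $VER
2.3.0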

Next download the binary:

  • Linux Intel 64-bit
wget https://github.com/peak/s5cmd/releases/download/v${VER}/s5cmd_${VER}_Linux-64bit.tar.gz
tar xvf s5cmd_${VER}_Linux-64bit.tar.gz
chmod +x s5cmd
mv s5cmd /usr/local/bin/
  • Linux ARM 64-bit
wget https://github.com/peak/s5cmd/releases/download/v${VER}/s5cmd_${VER}_Linux-arm64.tar.gz
tar xvf s5cmd_${VER}_Linux-arm64.tar.gz
chmod +x s5cmd
mv s5cmd /usr/local/bin/
  • macOS
brew install peak/tap/s5cmd
  • FreeBSD
pkg install s5cmd
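
Whichever installation method you used, confirm that the binary works by printing its version:

s5cmd version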

You can also run s5cmd in a Docker container:

docker pull peakcom/s5cmd
docker run --rm -v ~/.aws:/root/.aws peakcom/s5cmd <S3 operation>
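
The container can only access files you mount into it. Here is a minimal sketch for uploading a local file from inside the container; the /data mount point, endpoint, file name, and bucket name are illustrative:

docker run --rm \
  -v ~/.aws:/root/.aws \
  -v $(pwd):/data \
  peakcom/s5cmd --endpoint-url http://192.168.20.15 cp /data/myfile.txt s3://test-bucket/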

Run s5cmd to see commonly used commands and options:

$ s5cmd
NAME:
   s5cmd - Blazing fast S3 and local filesystem execution tool

USAGE:
   s5cmd [global options] command [command options] [arguments...]

COMMANDS:
   ls              list buckets and objects
   cp              copy objects
   rm              remove objects
   mv              move/rename objects
   mb              make bucket
   rb              remove bucket
   select          run SQL queries on objects
   du              show object size usage
   cat             print remote object content
   pipe            stream to remote from stdin
   run             run commands in batch
   sync            sync objects
   version         print version
   bucket-version  configure bucket versioning
   presign         print remote object presign url
   head            print remote object metadata
   help, h         Shows a list of commands or help for one command

GLOBAL OPTIONS:
   --credentials-file value       use the specified credentials file instead of the default credentials file
   --dry-run                      fake run; show what commands will be executed without actually executing them (default: false)
   --endpoint-url value           override default S3 host for custom services [$S3_ENDPOINT_URL]
   --help, -h                     show help (default: false)
   --install-completion           get completion installation instructions for your shell (only available for bash, pwsh, and zsh) (default: false)
   --json                         enable JSON formatted output (default: false)
   --log value                    log level: (trace, debug, info, error) (default: info)
   --no-sign-request              do not sign requests: credentials will not be loaded if --no-sign-request is provided (default: false)
   --no-verify-ssl                disable SSL certificate verification (default: false)
   --numworkers value             number of workers execute operation on each object (default: 256)
   --profile value                use the specified profile from the credentials file
   --request-payer value          who pays for request (access requester pays buckets)
   --retry-count value, -r value  number of times that a request will be retried for failures (default: 10)
   --stat                         collect statistics of program execution and display it at the end (default: false)
   --use-list-objects-v1          use ListObjectsV1 API for services that don't support ListObjectsV2 (default: false)
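
Global options can be combined freely. For example, here is a sketch of previewing a recursive copy against a custom endpoint without executing it, while collecting execution statistics (the paths and bucket name are illustrative):

s5cmd --dry-run --stat --endpoint-url http://192.168.20.15 cp "/tmp/data/*" s3://test-bucket/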

Configure s5cmd with S3 access credentials

For Rook Ceph, check out our post on how to create buckets and obtain access credentials.

If you are using a vanilla Ceph setup, see our related guide.

Create the ~/.aws directory if it doesn't exist already:

mkdir -p ~/.aws

Then create a file that will store credentials:

touch ~/.aws/credentials

Edit the file and set your credentials:

[default]
aws_access_key_id = <AWS_ACCESS_KEY_ID>
aws_secret_access_key = <AWS_SECRET_ACCESS_KEY>

You can also set a custom profile. For example:

[ceph]
aws_access_key_id = P1ZP4WJ573B6LZMCIKVW
aws_secret_access_key = TiDDV4aGv4rWJYsc6dglINW3v8j0DycfdvZAep3z

When using a custom credentials file path, specify it with --credentials-file:

s5cmd --credentials-file ~/.your-credentials-file
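
s5cmd also honors the standard AWS SDK environment variables, so you can skip the credentials file entirely. Replace the placeholders with your own keys:

export AWS_ACCESS_KEY_ID=<AWS_ACCESS_KEY_ID>
export AWS_SECRET_ACCESS_KEY=<AWS_SECRET_ACCESS_KEY>
s5cmd --endpoint-url http://192.168.20.15 ls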

In our setup, the RADOS Gateway is accessible at http://192.168.20.15.

Using s5cmd: Practical Examples

Get started with s5cmd by following these clear examples that demonstrate its speed and simplicity.

Set the RGW host and bucket name as environment variables (for AWS S3, use the correct regional endpoint instead):

export S3_HOST=http://192.168.20.15
export BUCKET_NAME=test-bucket

1. Upload files to the bucket

Let’s create dummy files with text contents:

echo "This is Test Bucket 1" > /tmp/testbucketest1
echo "This is Test Bucket 2" > /tmp/testbucketest2

Copy the files into the bucket:

s5cmd --profile ceph --endpoint-url $S3_HOST cp /tmp/testbucketest1 s3://$BUCKET_NAME
s5cmd --profile ceph --endpoint-url $S3_HOST cp /tmp/testbucketest2 s3://$BUCKET_NAME
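
Since both files share a common prefix, the same upload can also be done in a single wildcard command:

s5cmd --profile ceph --endpoint-url $S3_HOST cp "/tmp/testbucketest*" s3://$BUCKET_NAME/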

Confirm that the files were uploaded successfully:

s5cmd --profile ceph --endpoint-url $S3_HOST ls s3://$BUCKET_NAME

Output:

2025/04/11 10:38:31                22  testbucketest1
2025/04/11 10:38:18                22  testbucketest2

2. Check object size usage

Use the du command to check the bucket usage size:

$ s5cmd --profile ceph --endpoint-url $S3_HOST du s3://$BUCKET_NAME
44 bytes in 2 objects: s3://test-bucket
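
For larger buckets, a human-readable summary is easier to read; the du command accepts a --humanize (-H) flag:

s5cmd --profile ceph --endpoint-url $S3_HOST du --humanize "s3://$BUCKET_NAME/*"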

3. Rename objects in bucket

Consider the following example to move objects using the mv command:

s5cmd --profile ceph --endpoint-url $S3_HOST mv s3://$BUCKET_NAME/testbucketest1 s3://$BUCKET_NAME/testbucketest11

Confirmation:

$ s5cmd --profile ceph --endpoint-url $S3_HOST ls s3://$BUCKET_NAME
2025/04/11 10:43:54                22  testbucketest11
2025/04/11 10:38:18                22  testbucketest2
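
mv also accepts wildcards, which is handy for relocating many objects under a new prefix in one go (the logs* objects and archive/ prefix below are illustrative):

s5cmd --profile ceph --endpoint-url $S3_HOST mv "s3://$BUCKET_NAME/logs*" s3://$BUCKET_NAME/archive/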

4. Print contents of a remote object

To print the contents of an object in the bucket, use the cat command:

s5cmd --profile ceph --endpoint-url $S3_HOST cat s3://$BUCKET_NAME/testbucketest2

Output:

This is Test Bucket 2

Concatenate multiple objects matching a prefix or wildcard and print to stdout:

s5cmd cat "s3://bucket/prefix/*"
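
The inverse of cat is the pipe command, which streams stdin to a remote object. A quick sketch:

echo "Hello from stdin" | s5cmd --profile ceph --endpoint-url $S3_HOST pipe s3://$BUCKET_NAME/hello.txt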

5. Using sync to copy objects

Sync a single file to the S3 bucket:

s5cmd --profile ceph --endpoint-url $S3_HOST sync myfile.gz s3://$BUCKET_NAME/

Next, let's synchronize multiple files at once. Create a few test files:

mkdir temp_files && cd temp_files
for i in {1..5}; do echo "Hi from CloudSpinx $i" > file$i.txt; done

Synchronize the current directory’s files to the bucket using the sync command:

$ s5cmd --profile ceph --endpoint-url $S3_HOST sync ./ s3://$BUCKET_NAME/
cp file1.txt s3://test-bucket/file1.txt
cp file2.txt s3://test-bucket/file2.txt
cp file4.txt s3://test-bucket/file4.txt
cp file3.txt s3://test-bucket/file3.txt
cp file5.txt s3://test-bucket/file5.txt

List files in the bucket:

$ s5cmd --profile ceph --endpoint-url $S3_HOST ls s3://$BUCKET_NAME/
2025/04/11 12:01:46                21  file1.txt
2025/04/11 12:01:46                21  file2.txt
2025/04/11 12:01:46                21  file3.txt
2025/04/11 12:01:46                21  file4.txt
2025/04/11 12:01:46                21  file5.txt
2025/04/11 10:43:54                22  testbucketest11
2025/04/11 10:38:18                22  testbucketest2

Sync a local folder to an S3 bucket:

s5cmd --profile ceph --endpoint-url $S3_HOST sync folder/ s3://$BUCKET_NAME/

Sync S3 objects under a prefix to another bucket:

s5cmd --profile ceph --endpoint-url $S3_HOST sync "s3://sourcebucket/prefix/*" s3://destbucket/

Sync matching S3 objects to another bucket:

s5cmd sync "s3://bucket/*.gz" s3://target-bucket/prefix/
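
sync also supports a --delete flag that removes destination objects missing from the source, turning the operation into a true mirror. Use it with care:

s5cmd --profile ceph --endpoint-url $S3_HOST sync --delete folder/ s3://$BUCKET_NAME/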

6. Download objects locally

Download an S3 object to the working directory:

s5cmd cp s3://bucket/prefix/object.gz .

Download all S3 objects to a directory:

s5cmd cp "s3://bucket/*" target-directory/

Download an S3 object from a public bucket:

s5cmd --no-sign-request cp s3://bucket/prefix/object.gz .

Sync a single file from the bucket to a local directory:

s5cmd --profile ceph --endpoint-url $S3_HOST sync s3://$BUCKET_NAME/file2.txt /tmp/
cat /tmp/file2.txt

Sync an S3 bucket to a local folder, using size as the only comparison criterion:

s5cmd --profile ceph --endpoint-url $S3_HOST sync --size-only "s3://bucket/*" folder/

7. Delete files in the bucket

Delete a single S3 object:

s5cmd --profile ceph --endpoint-url $S3_HOST rm s3://$BUCKET_NAME/file5.txt

Delete all objects with a prefix:

s5cmd --profile ceph --endpoint-url $S3_HOST rm "s3://bucketname/prefix/*"

Delete all objects that match a wildcard:

s5cmd --profile ceph --endpoint-url $S3_HOST rm "s3://bucketname/*/obj*.gz"

Delete all versions of an object in the bucket:

s5cmd --profile ceph --endpoint-url $S3_HOST rm --all-versions s3://bucket/object

Delete all versions of all objects that start with a prefix in the bucket:

s5cmd --profile ceph --endpoint-url $S3_HOST rm --all-versions "s3://bucket/prefix*"

Delete multiple objects by name:

s5cmd --profile ceph --endpoint-url $S3_HOST rm s3://$BUCKET_NAME/{file4.txt,file3.txt}

Delete all versions of all objects in the bucket:

s5cmd --profile ceph --endpoint-url $S3_HOST rm --all-versions "s3://bucket/*"
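
For bulk operations on many specific objects, the run command executes a file of commands in parallel (the batch mode mentioned in the feature list). A sketch, assuming a commands.txt with one command per line; the object names are illustrative:

cat > commands.txt <<'EOF'
rm s3://test-bucket/old-backup-1.tar.gz
rm s3://test-bucket/old-backup-2.tar.gz
EOF
s5cmd --profile ceph --endpoint-url $S3_HOST run commands.txt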

8. Delete a bucket

Before you delete a bucket, it must be empty. The command used to remove a bucket is:

s5cmd rb s3://bucketname

Example:

# Remove all objects from the bucket
s5cmd --profile ceph --endpoint-url "$S3_HOST" rm "s3://$BUCKET_NAME/*"

# Remove the empty bucket
s5cmd --profile ceph --endpoint-url "$S3_HOST" rb "s3://$BUCKET_NAME"
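
Afterwards, list your buckets to confirm the removal:

s5cmd --profile ceph --endpoint-url "$S3_HOST" ls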

In this post, we explored how to install, configure, and use s5cmd to manage Ceph S3 storage faster and more efficiently. The included examples should help you get started with confidence.

