- 2026-01-14
- it
- #fastapi, #ceph, #radosgw, #s3, #object storage, #storage, #r2, #infiniband, #clustering, #clustermax, #roce, #nccl, #all_reduce_perf
At $Dayjob Instant Cluster
Major things at Verda: what did I do and learn?
I got better at making coffee with the Sage machine; it's still not perfect, but it's getting there :)
The Instant Cluster product went live.
We got Bronze in both ClusterMAX v1 and v2.
I've spent quite a lot of time finding race conditions.
What I'm looking forward to next year is getting around to making the backend API for this even more resilient. For example, wouldn't it be nice to be able to do a rolling upgrade of the API without affecting existing flows?
Multi-node all_reduce_perf with a RoCE backend network
Sure took a while to learn:
- the `show_gids` command, to find out which `NCCL_IB_GID_INDEX` to use
- these kinds of extra flags to `mpirun` to get `all_reduce_perf` running:
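```shell
# -H host:8                   : 8 slots on each of the two nodes, one GPU per rank (-g 1)
# NCCL_IB_GID_INDEX=5         : the GID index found with show_gids
# -mca coll ^hcoll            : exclude Open MPI's hcoll collectives component
# -mca pml ob1 / btl tcp,self : MPI's own traffic uses ob1 over TCP; NCCL picks its transport separately
# all_reduce_perf             : 512M (-b) to 8G (-e), doubling (-f 2), 200 iterations (-n)
mpirun -H 10.1.1.5:8,10.1.1.6:8 \
  -x NCCL_IB_GID_INDEX=5 \
  -mca coll ^hcoll \
  -mca pml ob1 \
  -mca btl tcp,self \
  ./build/all_reduce_perf -e 8G -n 200 -b 512M -f 2 -g 1
```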
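Looking up the GID index can also be scripted. This is a minimal sketch, assuming the MLNX_OFED `show_gids` column layout (DEV, PORT, INDEX, GID, IPv4, VER, DEV) and that the GID you want is the RoCE v2 entry carrying the node's IPv4 address on the backend network. `pick_gid_index` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `show_gids` output on stdin and print the INDEX of
# the first RoCE v2 GID on device $1 whose IPv4 column equals $2.
# Assumed column layout (MLNX_OFED show_gids): DEV PORT INDEX GID IPv4 VER DEV.
# Rows without an IPv4 address have one field fewer, so $5 holds the RoCE
# version there and can never equal an IP address.
# Exits non-zero when no matching GID is found.
pick_gid_index() {
  local dev="$1" ip="$2"
  awk -v dev="$dev" -v ip="$ip" '
    $1 == dev && $5 == ip && $6 == "v2" { print $3; found = 1; exit }
    END { exit !found }
  '
}
```

On a node this would be used as `gid=$(show_gids | pick_gid_index mlx5_0 10.1.1.5)`, where the device name `mlx5_0` is an assumption. Passing a single `-x NCCL_IB_GID_INDEX="$gid"` to `mpirun` only makes sense if every node reports the same index, so it's worth running the lookup on each host.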
Do you know of a better way to run it?
A RadosGW temporary user API
I had a trip to Palo Alto and some focus time to work on our Ceph RadosGW API wrapper, adding something that replicates Cloudflare R2's temporary access credentials, which only grant access to a certain prefix.
How would you implement something like that? This was for a single customer and a single bucket, and I wanted to avoid adding a database and state to this API, which until then was just a wrapper in front of Ceph's RadosGW API.
I went with RadosGW user names made of `_`-separated keys and values that encode:
- which user it was for
- when it expires (as an epoch timestamp)
Then a cron task calls an endpoint that goes through the users for this one customer, checks each one's expiry time, and deletes the expired ones.
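To make the naming scheme concrete, here is a minimal Bash sketch. The exact format is an assumption (the scheme only specifies `_`-separated keys and values): a hypothetical `user_NAME_expires_EPOCH` layout, with illustrative helper names `make_temp_uid`, `uid_field`, `is_expired` and `expired_uids`. The expiry check is the core of what the cron-triggered cleanup needs.

```shell
#!/usr/bin/env bash
# Hypothetical naming scheme: key_value pairs joined by "_", e.g.
#   user_alice_expires_1768435200
# Values must not contain "_" themselves, or the pairs become ambiguous.

# Build a uid for user $1 that expires $2 seconds from now.
make_temp_uid() {
  local now
  now=$(date +%s)
  printf 'user_%s_expires_%s\n' "$1" "$((now + $2))"
}

# Print the value stored under key $2 in uid $1; return 1 if the key is missing.
uid_field() {
  local -a parts
  local i
  IFS=_ read -r -a parts <<< "$1"
  for ((i = 0; i + 1 < ${#parts[@]}; i += 2)); do
    if [[ ${parts[i]} == "$2" ]]; then
      printf '%s\n' "${parts[i + 1]}"
      return 0
    fi
  done
  return 1
}

# Succeed if uid $1 has a numeric "expires" value at or before epoch $2 (default: now).
is_expired() {
  local exp now=${2:-$(date +%s)}
  exp=$(uid_field "$1" expires) || return 1
  [[ $exp =~ ^[0-9]+$ ]] && (( exp <= now ))
}

# Read uids (one per line) on stdin and print the expired temporary ones.
expired_uids() {
  local uid
  while IFS= read -r uid; do
    if [[ $uid == user_*_expires_* ]] && is_expired "$uid"; then
      printf '%s\n' "$uid"
    fi
  done
}
```

For a quick manual check, `radosgw-admin user list` prints the user IDs as a JSON array, so `radosgw-admin user list | jq -r '.[]' | expired_uids` would list candidates for `radosgw-admin user rm --uid=...`. The API wrapper does the equivalent over HTTP against the RadosGW API rather than through the CLI.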
This was quite fun to figure out, since it also involves managing S3 bucket policies, and to create those you need to use the S3 API, not the RadosGW admin API.
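For the policy side, a bucket policy that limits a temporary user to one prefix could look roughly like the output of the sketch below. This is hedged: `prefix_policy` is a hypothetical helper, the set of allowed actions is an assumption, and the principal form `arn:aws:iam:::user/UID` assumes a RadosGW user without a tenant.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print an S3 bucket policy that lets RadosGW user $2
# list, read, write and delete objects only under prefix $3 of bucket $1.
# Assumes a user without an RGW tenant (principal arn:aws:iam:::user/UID).
# The action list is an illustrative choice; values are not JSON-escaped,
# so this only suits plain bucket, user and prefix names.
prefix_policy() {
  local bucket="$1" uid="$2" prefix="$3"
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {"AWS": ["arn:aws:iam:::user/${uid}"]},
      "Action": ["s3:ListBucket"],
      "Resource": ["arn:aws:s3:::${bucket}"],
      "Condition": {"StringLike": {"s3:prefix": ["${prefix}/*"]}}
    },
    {
      "Effect": "Allow",
      "Principal": {"AWS": ["arn:aws:iam:::user/${uid}"]},
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": ["arn:aws:s3:::${bucket}/${prefix}/*"]
    }
  ]
}
EOF
}
```

The policy would then be applied over the S3 API with the bucket owner's credentials, for example `prefix_policy BUCKET "$uid" uploads > policy.json` followed by `aws s3api put-bucket-policy --bucket BUCKET --policy file://policy.json --endpoint-url RGW_URL` (BUCKET and RGW_URL are placeholders). PutBucketPolicy replaces the bucket's whole policy, so with several temporary users the statements have to be merged into the existing policy and removed again when the cleanup deletes a user.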