TLDR: CHECKING YOUR WEB/PUMA LOGS AND SIDEKIQ DEAD QUEUE FOR S3 ReadTimeout ERRORS FOR FUN AND PROFIT!!!
I recently started investigating the high bandwidth being used by my Mastodon instance, and noticed a lot of errors (similar to the one below) showing failed file uploads to my S3 provider, Wasabi.
Aws::S3::MultipartUploadError (multipart upload failed: Net::ReadTimeout with #<TCPSocket:(closed)>):
lib/paperclip/attachment_extensions.rb:87:in `block in save'
lib/paperclip/attachment_extensions.rb:93:in `save'
app/controllers/api/v2/media_controller.rb:5:in `create'
app/controllers/concerns/localized.rb:11:in `set_locale'
lib/mastodon/rack_middleware.rb:9:in `call'
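As the post title suggests, a quick way to gauge how widespread this is on your own instance is to grep your web/Puma logs for the timeout. A minimal sketch, assuming a standard Rails log location (adjust the path to wherever your instance writes its production log):

```shell
# Count S3 ReadTimeout failures in the web/Puma log.
# (log/production.log is an assumption; adjust for your setup, or pipe
# "journalctl -u mastodon-web" into grep if you log to the journal.)
LOG="${LOG:-log/production.log}"
grep -c 'Net::ReadTimeout' "$LOG" 2>/dev/null || true
```

A non-trivial count here (or a pile of matching jobs in the Sidekiq dead queue) is the signal that the tweaks below are worth trying.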
To resolve the ReadTimeout issue, I modified the default timeouts from 5 seconds to 15, and added the option to retry a failed upload.
S3_MULTIPART_THRESHOLD=52428800 # 50MB (I believe this one isn't required)
S3_OPEN_TIMEOUT=15
S3_READ_TIMEOUT=15
S3_RETRY_LIMIT=1
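These go in .env.production, and they don't take effect until the services are restarted. A minimal sketch, assuming the systemd unit names from the standard Mastodon install guide (yours may differ):

```shell
# Restart the web and Sidekiq services so the new S3 timeout/retry
# settings are picked up. (Unit names are an assumption based on the
# standard Mastodon setup; check with "systemctl list-units 'mastodon*'".)
sudo systemctl restart mastodon-web mastodon-sidekiq
```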
Daily bandwidth for Aus.Social (Before and After)
As you can see, my daily bandwidth usage dropped by 50%.

Wasabi Usage for Aus.Social (Before and After)
I’ve also added a bucket lifecycle rule to delete failed multipart uploads after 1 day, and this has resulted in my Wasabi bucket dropping from 12TB to 7.45TB (roughly a 38% drop).
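For anyone wanting to do the same: since Wasabi speaks the S3 API, the rule can be applied with the standard AWS CLI. A sketch, assuming the us-east-1 Wasabi endpoint and a placeholder bucket name:

```shell
# Lifecycle rule: abort (and clean up) incomplete multipart uploads
# after 1 day. YOUR_BUCKET is a placeholder; the endpoint URL assumes
# a bucket in Wasabi's us-east-1 region.
cat > lifecycle.json <<'EOF'
{
  "Rules": [
    {
      "ID": "abort-incomplete-multipart",
      "Status": "Enabled",
      "Filter": {"Prefix": ""},
      "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 1}
    }
  ]
}
EOF

aws s3api put-bucket-lifecycle-configuration \
  --endpoint-url https://s3.wasabisys.com \
  --bucket YOUR_BUCKET \
  --lifecycle-configuration file://lifecycle.json
```

The orphaned parts from every failed multipart upload otherwise sit in the bucket invisibly, which is where those missing terabytes were hiding.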

Plus: this makes the usage reported by Mastodon's tootctl media usage command look much closer to what Wasabi actually invoices!

Outcome
My understanding is that Mastodon was spending half its bandwidth stuck in a loop:
- Download remote media from another mastodon instance
- Try to upload the media and fail.
- Download the media again and fail the upload again (repeat multiple times)
Now my instance is:
- downloading the remote media
- uploading it once without issue.
This has dropped my bandwidth and storage costs dramatically, and lowered my CPU usage, because it’s no longer testing/transcoding the same remote media over and over.
win win win