At work we have seen a big uptick in traffic - enough that the egress costs for S3 are exceeding a reasonable budget. We use Cloudflare to cache our HTML, JS, CSS, etc. but have always just served our images straight from our S3 bucket. Time to change that and use Cloudflare for caching and perhaps to block some of the excess traffic.
We had already added a CloudFront distribution in front of our S3 bucket, so our initial idea was
just to point a Cloudflare name at the CloudFront url. Unfortunately that didn’t work. We could get
to the images when using the CloudFront url, but when using the Cloudflare proxy to that url, we got
Error 525: SSL handshake failed. We tried a bunch of variations - including going straight to the
CloudFront distribution name rather than the nicer domain name we had assigned to it. But no
dice. We are paying customers so we filed a ticket to get help.
The SSL handshake problem is because the domain names don’t match. To be able to proxy to the nice CloudFront domain name, we need to add a rule that sets a Host header on each request. Steps:
- In the zone where you want to create the proxy, navigate to “Rules” and click “Create rule”.
- For the type of rule to create, choose “Origin Rule”.
- Use the “Change HTTP host header” template to start creating your rule:
  - Create a “Custom filter expression” that matches “Hostname” to the host you are setting up in Cloudflare.
  - Set the Host Header to rewrite to the hostname of your CloudFront distribution.
  - Preserve the DNS Record and Destination Port (the other fields in that form).
- Deploy.
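If you would rather not click through the dashboard, the same rule can be created through Cloudflare’s Rulesets API (Origin Rules are the entrypoint ruleset of the http_request_origin phase). Here is a rough Python sketch; the zone ID, token, and hostnames are placeholders, and note that a PUT to the phase entrypoint replaces any existing origin rules in the zone:

import requests

CF_API = "https://api.cloudflare.com/client/v4"
ZONE_ID = "your-zone-id"      # the Cloudflare zone that holds the image hostname
CF_TOKEN = "your-api-token"   # needs permission to edit the zone's rules

origin_rule = {
    "description": "Proxy image traffic to CloudFront",
    # Match requests for the hostname we are setting up in Cloudflare.
    "expression": '(http.host eq "images.example.com")',
    "action": "route",
    "action_parameters": {
        # Rewrite the Host header to the CloudFront distribution's hostname.
        "host_header": "d1234abcd.cloudfront.net",
    },
}

# Origin Rules live in the http_request_origin phase; this PUT replaces the
# entrypoint ruleset for that phase with the single rule above.
response = requests.put(
    f"{CF_API}/zones/{ZONE_ID}/rulesets/phases/http_request_origin/entrypoint",
    headers={"Authorization": f"Bearer {CF_TOKEN}"},
    json={"rules": [origin_rule]},
    timeout=30,
)
response.raise_for_status()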
Our Cloudflare rep had also told us to rewrite the SNI header, but that appears to be optional since
the proxy to CloudFront works without it. That surprised me a little - especially since one of the
other configuration parameters they had us adjust was the encryption mode.
When we had trouble with the SSL handshake, we had tried backing off from the default of “Full” to “Flexible” but were advised to be more strict instead. While doing this configuration, we kept the encryption mode between Cloudflare and S3 set to Strict (SSL-Only Origin Pull).
If you want to rewrite the SNI header, you need to use a domain name you control. If you try to use
the url to your CloudFront distribution, <somestring>.cloudfront.net, you get the error message that
<somestring>.cloudfront.net does not belong to your account. Cloudflare doesn’t have the same
restriction for the Host header field. You can use the direct url for your CloudFront distribution
or any alternate domain name you have set up in CloudFront. Both worked.
If you try to configure the Host header to point to the S3 bucket url directly,
e.g. mybucket.s3.us-west-1.amazonaws.com, you still get the SSL handshake failed message, and you are
prevented from working around that by adding an SNI header.
So in summary, if you want to use CloudFront + Cloudflare, you will need both a DNS record and an Origin Rule to proxy requests to CloudFront and thus to S3.
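The DNS half is just a proxied CNAME from your Cloudflare hostname to the CloudFront distribution. As a sketch (same placeholder zone ID, token, and hostnames as above), creating it via the API looks roughly like this:

import requests

CF_API = "https://api.cloudflare.com/client/v4"
ZONE_ID = "your-zone-id"
CF_TOKEN = "your-api-token"

# A proxied (orange-cloud) CNAME so requests for images.example.com go
# through Cloudflare and on to the CloudFront distribution.
record = {
    "type": "CNAME",
    "name": "images.example.com",
    "content": "d1234abcd.cloudfront.net",
    "proxied": True,
    "ttl": 1,  # 1 means "automatic", which is required for proxied records
}

response = requests.post(
    f"{CF_API}/zones/{ZONE_ID}/dns_records",
    headers={"Authorization": f"Bearer {CF_TOKEN}"},
    json=record,
    timeout=30,
)
response.raise_for_status()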
Cloud Connector
Or, you can dispense with CloudFront and the extra rule and use Cloud Connector to do the setup for
you. Cloud Connector is a Cloudflare feature for connecting to other cloud providers. We use
Terraform to do any configuration we can, so we followed the Cloudflare documentation to connect
directly to the S3 bucket url <mybucket>.s3.us-west-1.amazonaws.com. From what our tech rep said,
this is a more automatic way of doing the header rewrites I was fooling with in the first section
AND apparently will do the SNI header rewrite for us (or so I surmise).
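We did ours in Terraform, but for illustration, here is roughly what the equivalent call looks like against the Cloud Connector rules API. Treat the endpoint path, the aws_s3 provider name, and the payload shape as assumptions to check against the current docs; the zone ID, token, and bucket host are placeholders:

import requests

CF_API = "https://api.cloudflare.com/client/v4"
ZONE_ID = "your-zone-id"
CF_TOKEN = "your-api-token"

# One Cloud Connector rule: send matching requests straight to the S3 bucket.
# The endpoint, provider name, and parameters shape here are assumptions about
# the Cloud Connector API; the bucket and hostnames are placeholders.
rule = {
    "enabled": True,
    "provider": "aws_s3",
    "description": "Serve images.example.com from S3",
    "expression": '(http.host eq "images.example.com")',
    "parameters": {"host": "mybucket.s3.us-west-1.amazonaws.com"},
}

# The PUT replaces the zone's full list of Cloud Connector rules.
response = requests.put(
    f"{CF_API}/zones/{ZONE_ID}/cloud_connector/rules",
    headers={"Authorization": f"Bearer {CF_TOKEN}"},
    json=[rule],
    timeout=30,
)
response.raise_for_status()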
Configuring Image Serving and Cache Clearing
For our web pages, we configure Cloudflare to obey whatever caching headers we set on the origin server. It’s possible to get S3 to send caching headers, but we decided it would be a lot easier to control image caching with a Cloudflare rule, so we created a custom Cloudflare zone just for serving images. Then we created a new domain name for each bucket we want to put behind Cloudflare and added a Cloud Connector rule for each. We use django-storages, so once we have configured AWS_S3_CUSTOM_DOMAIN, our app starts building image urls that use the new Cloudflare-backed domain.
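The storage side is just a settings change. A minimal sketch, with placeholder bucket and domain names:

# settings.py (sketch) -- bucket and domain names are placeholders.
AWS_STORAGE_BUCKET_NAME = "mybucket"
AWS_S3_REGION_NAME = "us-west-1"

# django-storages builds file URLs from AWS_S3_CUSTOM_DOMAIN instead of the
# raw S3 endpoint, so image urls now point at the Cloudflare-backed domain.
AWS_S3_CUSTOM_DOMAIN = "images.example.com"

# Or the STORAGES dict on newer Django versions.
DEFAULT_FILE_STORAGE = "storages.backends.s3boto3.S3Boto3Storage"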
A lot of the images we use are very stable so we can cache them for a long time. But when they change, we need to clear them from our cache. First I created a separate cache clearing backend for the images Cloudflare zone.
def get_images_zone_backend():
    """
    Set up Cloudflare backend for clearing cache in images.example.com zone
    """
    cf_token = getattr(settings, "CLOUDFLARE_BEARER_TOKEN", None)
    if not cf_token:
        if not settings.DEBUG and not settings.TESTING:
            logger.error(
                "cloudflare.configuration_error: ImproperlyConfigured caching: CLOUDFLARE_BEARER_TOKEN not configured"
            )
        return None
    zone_id = getattr(settings, "CLOUDFLARE_IMAGES_ZONE_ID", None)
    if not zone_id:
        # Some of our installs don't use the images zone yet, so don't log an error.
        return None
    return MultitenantCloudflareBackend({"BEARER_TOKEN": cf_token, "ZONE_ID": zone_id})
Then we need to configure cache clearing when images change.
def clear_image_cache(file_url):
    """
    Clear the Cloudflare cache for the given image file URL.
    """
    backend = get_images_zone_backend()
    if backend:
        logger.info(f"cloudflare: Purging image file: {file_url}", zone_id=backend.cloudflare_zoneid)
        backend.purge(file_url)


@receiver(pre_save, sender=CustomImage)
def clear_cache_when_image_saved(sender, instance, **kwargs):
    """
    Clear the Cloudflare cache if we are saving the entire image (the 'not
    update_fields' clause) or if the collection or file are changed.
    """
    update_fields = kwargs.get("update_fields")
    if not update_fields or "collection" in update_fields or "file" in update_fields:
        clear_image_cache(instance.file.url)


# Delete the source image file when an image is deleted.
@receiver(pre_delete, sender=CustomImage)
def image_delete(sender, instance, **kwargs):
    clear_image_cache(instance.file.url)
    # Tell delete() not to save the instance, since we're in the middle of a delete operation for it.
    instance.file.delete(save=False)


# Delete the rendition image file when a rendition is deleted.
@receiver(pre_delete, sender=CustomRendition)
def rendition_delete(sender, instance, **kwargs):
    clear_image_cache(instance.file.url)
    # Tell delete() not to save the instance, since we're in the middle of a delete operation for it.
    instance.file.delete(save=False)
Caching for the win!
We set the edge cache TTL within Cloudflare to 7 days and the browser TTL (the time we tell browsers to cache an image before checking for updates) to 4 hours. With those settings, we have a cache hit ratio of 99% and the S3 hosting costs are down to below what they were before the traffic increase.
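Those TTLs are configured with a Cache Rule. As one last hedged sketch, cache rules live in the http_request_cache_settings phase of the same Rulesets API used above; the zone ID, token, and hostname are placeholders, and the PUT again replaces any existing rules in that phase:

import requests

CF_API = "https://api.cloudflare.com/client/v4"
ZONE_ID = "your-zone-id"
CF_TOKEN = "your-api-token"

cache_rule = {
    "description": "Cache images at the edge for 7 days, in browsers for 4 hours",
    "expression": '(http.host eq "images.example.com")',
    "action": "set_cache_settings",
    "action_parameters": {
        "cache": True,
        # How long Cloudflare keeps the object at the edge.
        "edge_ttl": {"mode": "override_origin", "default": 7 * 24 * 60 * 60},
        # How long browsers may cache before re-checking.
        "browser_ttl": {"mode": "override_origin", "default": 4 * 60 * 60},
    },
}

response = requests.put(
    f"{CF_API}/zones/{ZONE_ID}/rulesets/phases/http_request_cache_settings/entrypoint",
    headers={"Authorization": f"Bearer {CF_TOKEN}"},
    json={"rules": [cache_rule]},
    timeout=30,
)
response.raise_for_status()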