Using Python, Django, and Boto3 with Scaleway Object Storage
This post is an excerpt from my own journey in making NewShots, a not-so-simple news outlet screenshot capture site.
More specifically, this excerpt exists to help you understand how to use the popular boto3 library to work with Scaleway's Object Storage, which aims to offer an Amazon S3-compatible object storage system. You might also consider django-storages.
Personal Opinion Warning
In my personal opinion (I am not paid or compensated in any way by Scaleway), it works well and as expected for my simple use cases of CRUD (create/retrieve/update/delete) on objects. The security model is simple and straightforward. I find it much easier to work with than AWS, but the tradeoff is probably in security and other big enterprisey features.
My original use case
I need to save images that come in over webhooks from the screenshot service.
- The screenshot service fires a webhook to the backend.
- I unwrap the JSON payload and look at the file-location of the image
- I pull that image down and immediately re-upload it into my own s3-compatible bucket.
The initial MVP saved them to disk, but my little VM ran out of space after only a few thousand hi-res screenshots piled up. This was the first scaling issue I hit – disk space.
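The three steps above can be sketched as a small function. This is my own illustration, not code from NewShots: the function name, the `file_location` payload key, and the idea of passing the HTTP fetcher and S3 client in as parameters are all assumptions made so the flow is easy to follow (and to test).

```python
def mirror_screenshot(payload, fetch, s3, bucket):
    """Re-upload an image named in a webhook payload into our own bucket.

    payload: decoded JSON dict from the webhook; 'file_location' is an
             assumed key name, not necessarily what your service sends.
    fetch:   callable returning raw bytes for a URL,
             e.g. lambda url: requests.get(url).content
    s3:      a boto3-style client exposing put_object
    bucket:  target bucket name
    """
    image_url = payload["file_location"]        # where the service stored the shot
    file_bytes = fetch(image_url)               # pull the image down
    object_name = image_url.rsplit("/", 1)[-1]  # derive an object key from the URL
    s3.put_object(
        Body=file_bytes,
        Bucket=bucket,
        Key=object_name,
        ACL="public-read",
    )
    return object_name
```

Keeping the fetch and the client as parameters also means the handler can be unit-tested without touching the network.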
Let's get started!
- Assemble your configuration options
- Create the session object
- Ready the content to upload/update
- Perform your operations
- Cleanup
Assemble your config options
I prefer to put access keys and config options into either settings.py
or django-solo. The latter is out of scope for this document, but I like it.
This is a little over-simplified, because I prefer to configure settings like this via environment variables (again, out of scope for this post).
settings.py
```python
AWS_ACCESS_KEY_ID = 'myaccessid'
AWS_SECRET_ACCESS_KEY = 'mysecretkey'
AWS_STORAGE_BUCKET_NAME = 'mybucket-2020'
AWS_DEFAULT_ACL = 'public-read'
AWS_S3_REGION_NAME = 'nl-ams'
AWS_S3_ENDPOINT_URL = 'https://s3.nl-ams.scw.cloud'
```
Let's break this down a little bit.
- `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` can be obtained from the credentials control panel under API TOKENS.
- `AWS_STORAGE_BUCKET_NAME` is the name of the bucket you create on the objects administration page.
- `AWS_DEFAULT_ACL` is set to `public-read` so that the objects can be pulled from a URL without any access keys or time-limited signatures.
- `AWS_S3_REGION_NAME` and `AWS_S3_ENDPOINT_URL` should be configured so that boto3 knows to point to Scaleway's resources. (We are not actually using AWS, after all.)
All of this is referenced in the Scaleway docs on Object Storage.
Creating a session object
Ok, now that we have our credentials and settings done, we are ready to create a session object that makes all the operations possible. In the following code we simply import the settings module and then instantiate the client.
```python
import boto3
from django.conf import settings

s3 = boto3.client(
    's3',
    region_name=settings.AWS_S3_REGION_NAME,
    endpoint_url=settings.AWS_S3_ENDPOINT_URL,
    aws_access_key_id=settings.AWS_ACCESS_KEY_ID,
    aws_secret_access_key=settings.AWS_SECRET_ACCESS_KEY,
)
```
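A quick way to confirm the credentials and endpoint are right is to list a few object keys. This helper is my own sketch (the function name and the sanity-check idea are not from the original post); `list_objects_v2` is a standard boto3 client method.

```python
def list_bucket_keys(s3, bucket, max_keys=10):
    """Return up to `max_keys` object keys from the bucket."""
    # list_objects_v2 returns at most `max_keys` entries in one page;
    # 'Contents' is absent entirely when the bucket is empty
    response = s3.list_objects_v2(Bucket=bucket, MaxKeys=max_keys)
    return [obj["Key"] for obj in response.get("Contents", [])]
```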
Before we dig deeper...
This is really where the tricky part ends. The rest is standard use of the Boto3 library, and I think you should look at the documentation here:
Create/ Upload an object
OK, we have the session object, so now we can do something. Let's start with uploading.
```python
import requests

# download the source image; the raw bytes end up in file_bytes
file_bytes = requests.get(image_url).content
s3_object_name = 'myobjectname.jpg'

s3.put_object(
    Body=file_bytes,
    Bucket=settings.AWS_STORAGE_BUCKET_NAME,
    Key=s3_object_name,
    ACL='public-read',
    CacheControl='max-age=31556926'  # 1 year
)
```
I cheat here a bit, as we simply use the popular requests library to download an image from somewhere. The raw bytes are put into the `file_bytes` variable.
Now the good part – using the session object `s3`, we put the object and name it using the `s3_object_name` variable. It is that easy.
Note – I am lazy here because there is no error handling. The operation can fail and leave our app in an unknown state (most probably a crash with an unhandled exception).
Retrieve (Download) an Object
I prefer to grab the object directly using HTTPS, but you can do the same with the session object.
Or, you can refer to:
- `s3.download_file` – downloads a file and writes it to the local filesystem
- `s3.download_fileobj` – downloads a file into a file-like object you provide
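As an illustration of the second option, here is a small sketch of my own (the helper name is made up): `download_fileobj` writes into any file-like object, so an in-memory `io.BytesIO` buffer gets you the raw bytes without touching the disk.

```python
import io


def fetch_object_bytes(s3, bucket, key):
    """Download an object into memory and return its raw bytes."""
    buffer = io.BytesIO()
    s3.download_fileobj(bucket, key, buffer)  # writes into the file-like object
    return buffer.getvalue()
```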
Delete an object
Deleting is pretty straightforward. Using the same session object, we can delete an object by passing in the `Bucket` and `Key` names.
```python
s3.delete_object(Bucket=settings.AWS_STORAGE_BUCKET_NAME, Key=s3_object_name)
```
More Django-centric options
Wow, so you made it this far. Thanks. I should mention that using boto3 directly with Django works well. I have no complaints, but if you are looking for tighter integration with Django, then you might want to consider django-storages. It offers convenient tie-ins with the way Django saves files and works with models. One nice thing I like is that it can automatically delete objects when you delete the model.
If I make a Part II about my s3 journey with NewShots, it will be how I moved from boto3 to django-storages.