File uploads and downloads in GraphQL
Part #1 of the GraphQL Patterns Series
I have been working with GraphQL for the past three years. Noticing a shortage of guidelines and patterns, I decided to start this series of articles describing the patterns I use in my own GraphQL-based projects.
Today I will focus on file uploads and downloads. From an implementation point of view, there are several ways to upload files in GraphQL. I will describe the possible approaches along with their upsides and downsides. At the end of this article I will present in depth how I upload and download files in GraphQL using Minio, a self-hosted object storage server compatible with the S3 API. If you only want the code, jump to the end of the article, which contains a link to the repository.
To my knowledge, we currently have these options for uploading a file via GraphQL:
- Serializing a file into BASE64 format
- Using the multipart request specification
- Using a cloud-based storage solution (S3 or alternatives)
- Using Minio (the option I will present in detail)
Serialization — BASE64 uploading
The file is first processed on the client side and encoded as BASE64. It is then sent as a string to the GraphQL server, which decodes the file and saves it. Downloads work the other way around: the server sends the BASE64-encoded file to the frontend, where it is decoded and served to the user.
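Below is a minimal Node.js sketch of both halves of this flow. The uploadFileBase64 mutation and its fileName and content arguments are hypothetical. In a browser you would read the bytes from the selected file (for example with Blob.arrayBuffer()) instead of using Node's Buffer.

```javascript
// Hypothetical mutation: the file travels inside GraphQL as a BASE64 string.
const UPLOAD_MUTATION = `
  mutation UploadFile($fileName: String!, $content: String!) {
    uploadFileBase64(fileName: $fileName, content: $content)
  }
`;

// Client side: encode the raw bytes and build the GraphQL request body.
function buildUploadRequest(fileName, bytes) {
  const content = Buffer.from(bytes).toString("base64");
  return { query: UPLOAD_MUTATION, variables: { fileName, content } };
}

// Server side (inside a resolver): decode the string back into bytes
// before writing them to disk or another storage backend.
function decodeUpload({ fileName, content }) {
  return { fileName, bytes: Buffer.from(content, "base64") };
}

const request = buildUploadRequest("hello.txt", Buffer.from("Hello, GraphQL!"));
const decoded = decodeUpload(request.variables);
console.log(decoded.bytes.toString()); // Hello, GraphQL!
```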
The downside? BASE64 is wasteful. Each encoded character is one of only 64 possible values, while a byte can represent 256. In other words, we use bytes (8-bit words) as 6-bit words, so 2 of every 8 transmitted bits are wasted. To send three bytes of data (24 bits), we need four bytes (32 bits), which means about 33% more storage and traffic than sending the raw file. The other downside is that encoding and decoding the file takes extra computation time.
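The arithmetic is easy to check. Standard BASE64 turns every 3 input bytes into 4 output characters and pads the last group with "=", so n bytes encode to 4 * ceil(n / 3) characters. The snippet below compares that formula with the output of Node's Buffer encoder.

```javascript
// Standard BASE64 (with "=" padding): every 3 input bytes become 4 characters.
function base64Length(n) {
  return 4 * Math.ceil(n / 3);
}

for (const n of [3, 300, 3_000_000]) {
  const encoded = Buffer.alloc(n).toString("base64").length;
  const overhead = ((encoded - n) / n) * 100;
  console.log(
    `${n} bytes -> ${encoded} chars (+${overhead.toFixed(1)}%), ` +
      `formula matches: ${encoded === base64Length(n)}`
  );
}
// Every line reports +33.3% and "formula matches: true".
```

For tiny inputs the padding pushes the ratio higher (1 byte becomes 4 characters). For real files it settles at one third.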
Upside: Uploads and downloads are easy to implement, and you can use GraphQL all the way.
Downside: BASE64 encoding takes up extra space and network traffic.
Multipart request specification for GraphQL
The second technique uses multipart form requests, a feature that lets you upload files directly through GraphQL mutations. Not all server implementations support it; Apollo is one that does.
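The GraphQL multipart request specification defines three parts to such a request:

- an operations field holding the GraphQL request as JSON, with every file variable set to null
- a map field telling the server which form field fills which variable
- the files themselves

The helper below is a hypothetical sketch that only handles top-level file variables in a single operation. The singleUpload mutation is illustrative. A client would append each returned pair to a FormData object and POST it to the GraphQL endpoint.

```javascript
// Builds the form fields defined by the GraphQL multipart request spec,
// in the required order: operations, map, then the files.
// Simplification: one operation, file variables at the top level only.
function buildMultipartFields(query, variables, files) {
  const names = Object.keys(files);
  const opVariables = { ...variables };
  const map = {};
  names.forEach((name, i) => {
    opVariables[name] = null; // files are sent as null in `operations`
    map[String(i)] = [`variables.${name}`]; // form field i fills this path
  });
  const fields = [
    ["operations", JSON.stringify({ query, variables: opVariables })],
    ["map", JSON.stringify(map)],
  ];
  names.forEach((name, i) => fields.push([String(i), files[name]]));
  return fields;
}

// Hypothetical usage; in a browser, append each pair to a FormData
// object and POST it to the GraphQL endpoint.
const fields = buildMultipartFields(
  "mutation ($file: Upload!) { singleUpload(file: $file) { filename } }",
  {},
  { file: "<file contents>" }
);
for (const [name, value] of fields) console.log(name, value);
```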
There are some downsides to this approach, though. One is that this implementation is (or at least was) incompatible with schema stitching. The other is that I still haven't found a viable solution for downloading files. Yes, you can use mutations to generate links, but that feels like a half-baked solution.
Upside: It follows a published specification and has official support in some servers, such as Apollo.
Downside: It offers no viable way to download files through GraphQL directly, not all server implementations support it, and it is not production ready.
Cloud based solution
You can use hosted services such as S3 to store your files. This means sending the file directly to Cloudinary or S3 and passing the returned URL through a GraphQL mutation. However, coordinating these multiple requests can be hard to manage.
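The coordination problem is easier to see in code. Below is a minimal sketch assuming two injected helpers: uploadToStorage stands in for the provider's SDK or REST call, and graphqlRequest stands in for your GraphQL client. Both helpers and the attachFile mutation are hypothetical.

```javascript
// Hypothetical mutation that records an already-uploaded file's URL.
const ATTACH_FILE = `
  mutation AttachFile($name: String!, $url: String!) {
    attachFile(name: $name, url: $url) { id }
  }
`;

async function uploadAndRegister(file, { uploadToStorage, graphqlRequest }) {
  // Step 1: REST call to S3/Cloudinary (or similar) that returns a URL.
  const url = await uploadToStorage(file);
  // Step 2: record that URL through a GraphQL mutation. If this step
  // fails, the stored file is orphaned, which is part of what makes
  // coordinating the two requests hard.
  return graphqlRequest(ATTACH_FILE, { name: file.name, url });
}

// Usage with stubbed transports (the GraphQL stub just echoes variables).
uploadAndRegister(
  { name: "report.pdf", bytes: new Uint8Array(4) },
  {
    uploadToStorage: async (f) => `https://files.example.com/${f.name}`,
    graphqlRequest: async (query, variables) => variables,
  }
).then((variables) => console.log(variables.url));
```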
Upside: We stay with what we know best: REST, with the providers' APIs covering everything for us.
Downside: Extra costs (depending on how much traffic you need).
The project I was working on needed cloud-based storage, with the requirement that the data center be located in the country of residence. Because the usual cloud providers were not a viable option, I had to search for alternatives.
The first one that came up was Minio, a high-performance, Kubernetes-friendly object storage server compatible with the S3 API.
The idea is to use Minio for storage, with a microservice that communicates with Minio and only gives GraphQL the links needed for file uploads and downloads. Because I also needed time-limited links that expire, this was a perfect fit.
The following diagram illustrates the approach:
Let's look at this from a technical point of view, starting with what we did in GraphQL itself.
Note that the example that follows simplifies things a bit: we are not sending any app IDs to our service, just plain bucket names.
The initial schema for our communication with a Minio microservice would look like this:
Because we will not be querying any data, we just have a dummy query here, but we do have three mutations:
The first mutation creates a new bucket. A bucket is like a folder that holds our data, and each bucket can contain multiple files. As input, it receives a single argument called bucketId, which is actually the name of the bucket.
The other two mutations generate presigned URLs, one for uploading and one for downloading, which are valid for a fixed amount of time. As input, they receive the bucketId and a fileName.
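A schema matching this description might look like the sketch below, written as a typeDefs string in the style Apollo Server accepts. The field names (_dummy, createBucket, uploadUrl, downloadUrl) and the storageAPI data source key are assumptions for illustration. The resolvers follow the Apollo Server 2/3 convention of reading data sources from the context.

```javascript
// Hypothetical schema: a dummy query plus the three mutations described above.
const typeDefs = `
  type Query {
    _dummy: String
  }

  type Mutation {
    createBucket(bucketId: String!): Boolean
    uploadUrl(bucketId: String!, fileName: String!): String
    downloadUrl(bucketId: String!, fileName: String!): String
  }
`;

// Resolvers only delegate to a storage data source (assumed to be
// registered as dataSources.storageAPI) that talks to the microservice.
const resolvers = {
  Query: {
    _dummy: () => "ok",
  },
  Mutation: {
    createBucket: (_, { bucketId }, { dataSources }) =>
      dataSources.storageAPI.createBucket(bucketId),
    uploadUrl: (_, { bucketId, fileName }, { dataSources }) =>
      dataSources.storageAPI.getUploadUrl(bucketId, fileName),
    downloadUrl: (_, { bucketId, fileName }, { dataSources }) =>
      dataSources.storageAPI.getDownloadUrl(bucketId, fileName),
  },
};
```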
The mutations actually communicate with the storage microservice, which is responsible for communicating with our Minio server. For that we use Apollo's RESTDataSource class. This is the code that communicates with our microservice:
We won't go into detail here. The internal hostname of our microservice is storage_service, and it runs on port 8080. The rest is just calling our endpoints on that service.
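Below is a standalone sketch of such a data source. With Apollo's RESTDataSource you would subclass it, set baseURL and call its HTTP helpers such as this.post. To keep the example runnable without dependencies, this version takes a fetch-like function instead. The hostname and port come from the text, and the route names match the storage service's upload, download and bucket routes. The request body fields (bucketId, fileName) and the { url } response shape are assumptions.

```javascript
// Standalone stand-in for a RESTDataSource subclass that calls the
// storage microservice. Body fields and response shape are assumed.
class StorageAPI {
  constructor(fetchFn, baseURL = "http://storage_service:8080/") {
    this.fetchFn = fetchFn;
    this.baseURL = baseURL;
  }

  // POST a JSON body to one of the storage service routes.
  async post(path, body) {
    const res = await this.fetchFn(`${this.baseURL}${path}`, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(body),
    });
    if (!res.ok) throw new Error(`storage service returned ${res.status}`);
    return res.json();
  }

  async createBucket(bucketId) {
    await this.post("bucket", { bucketId });
    return true;
  }

  async getUploadUrl(bucketId, fileName) {
    const { url } = await this.post("upload", { bucketId, fileName });
    return url;
  }

  async getDownloadUrl(bucketId, fileName) {
    const { url } = await this.post("download", { bucketId, fileName });
    return url;
  }
}

// In Apollo Server 2/3 this could be registered with something like
// dataSources: () => ({ storageAPI: new StorageAPI(fetch) }).
```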
The storage service is a simple Express server with the following routes:
- upload (which returns presigned, time-limited upload links)
- download (which returns presigned, time-limited download links)
- bucket (which creates a new bucket)
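The bucket route can be sketched with two calls the minio JavaScript client provides: bucketExists and makeBucket. In recent versions both return promises when no callback is passed. The client and the Express-style req and res objects are injected so the example runs with stubs. The bucketId body field and the default region are assumptions.

```javascript
// Hypothetical handler for POST /bucket. `minioClient` would be the shared
// Minio client instance; it is injected here so the handler can be tested.
function makeBucketHandler(minioClient, region = "us-east-1") {
  return async (req, res) => {
    const { bucketId } = req.body;
    if (!bucketId) {
      return res.status(400).json({ error: "bucketId is required" });
    }
    try {
      // Creating a bucket that already exists fails, so check first.
      const exists = await minioClient.bucketExists(bucketId);
      if (!exists) await minioClient.makeBucket(bucketId, region);
      return res.status(200).json({ bucketId, created: !exists });
    } catch (err) {
      return res.status(500).json({ error: err.message });
    }
  };
}

// Wiring in Express (4.16+) could look like:
//   app.post("/bucket", express.json(), makeBucketHandler(minioClient));
```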
Each of these routes handles a single POST request, and they all use the Minio client object from the minio library, which makes it easier to call the endpoints of the Minio server. To initialize the Minio client, we create a file called minio.js in our utils folder:
The environment variables are located in the .env file at the root of the project. Both the access key and the secret key are configured in the project's Docker Compose file, under the Minio section. With the client, we can use the various API calls documented on the package's NPM page.
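Below is a sketch of what utils/minio.js could contain. The option names endPoint, port, useSSL, accessKey and secretKey are the ones the minio package's Client constructor accepts. The environment variable names are assumptions, since they are not listed here. To stay dependency-free, the sketch only builds and validates the options object. The real module would pass it to new Minio.Client(...).

```javascript
// Builds the options object for the minio client from environment
// variables (the variable names are hypothetical). In utils/minio.js:
//   const Minio = require("minio");
//   module.exports = new Minio.Client(buildMinioConfig(process.env));
function buildMinioConfig(env) {
  const required = ["MINIO_ENDPOINT", "MINIO_ACCESS_KEY", "MINIO_SECRET_KEY"];
  const missing = required.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  return {
    endPoint: env.MINIO_ENDPOINT, // hostname only, e.g. the Compose service name
    port: Number(env.MINIO_PORT || 9000), // 9000 is Minio's default API port
    useSSL: env.MINIO_USE_SSL === "true",
    accessKey: env.MINIO_ACCESS_KEY,
    secretKey: env.MINIO_SECRET_KEY,
  };
}

const config = buildMinioConfig({
  MINIO_ENDPOINT: "minio",
  MINIO_ACCESS_KEY: "minio-access-key",
  MINIO_SECRET_KEY: "minio-secret-key",
});
console.log(config.port, config.useSSL); // 9000 false
```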
Now we just add the specific API calls to the endpoints we need (I will only explain the upload portion, because the download portion follows the same procedure):
The presignedPutObject method receives the bucket name, the name of the file (object) for which we are generating the upload link, and an integer stating how many seconds the link stays valid. The download route uses presignedGetObject with the same arguments. Note that a link is only valid for the object name it was generated for, so it cannot be used to upload a file under a different name. The link is then returned to the GraphQL server, which passes it on to the frontend.
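Below is a sketch of the upload and download handlers. presignedPutObject and presignedGetObject are methods of the minio JavaScript client. Both take a bucket name, an object name and an expiry in seconds (at most seven days), and return a promise for the URL when no callback is given. The one-hour expiry, the request body fields and the { url } response shape are assumptions. The client is injected so the example runs with a stub.

```javascript
const LINK_TTL_SECONDS = 60 * 60; // hypothetical: links stay valid for one hour

// Shared factory for POST /upload and POST /download: `presign` picks
// which minio client method generates the URL.
function makePresignHandler(minioClient, presign) {
  return async (req, res) => {
    const { bucketId, fileName } = req.body;
    if (!bucketId || !fileName) {
      return res.status(400).json({ error: "bucketId and fileName are required" });
    }
    try {
      const url = await presign(minioClient, bucketId, fileName, LINK_TTL_SECONDS);
      return res.status(200).json({ url });
    } catch (err) {
      return res.status(500).json({ error: err.message });
    }
  };
}

// Upload links are signed for HTTP PUT, download links for HTTP GET.
const uploadHandler = (client) =>
  makePresignHandler(client, (c, b, f, ttl) => c.presignedPutObject(b, f, ttl));
const downloadHandler = (client) =>
  makePresignHandler(client, (c, b, f, ttl) => c.presignedGetObject(b, f, ttl));

// Wiring in Express could look like:
//   app.post("/upload", express.json(), uploadHandler(minioClient));
//   app.post("/download", express.json(), downloadHandler(minioClient));
```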
So let's have a look at the final result:
We open up Minio and check that the bucket space is empty. Then we run our mutation to create a bucket called test. Afterwards, we create an upload link for a file called WD_Passport.pdf in the test bucket. We upload the file with Postman, using the URL returned by the upload mutation (in a frontend project we would use either a form that submits to that URL or a background request that sends the file). Finally, we use the download mutation to create a download link for that file and successfully download it.
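Outside Postman, the frontend can use a background request. This assumes the upload link comes from presignedPutObject, which signs it for HTTP PUT. In that case the file goes in the request body as-is. A plain HTML form cannot send PUT, so a form-based upload would need a different kind of presigned link, such as one based on a POST policy. The sketch takes a fetch-like function as a parameter; the global fetch in browsers and Node 18+ fits.

```javascript
// Upload bytes to a presigned PUT URL and return the HTTP status.
async function uploadToPresignedUrl(fetchFn, url, body, contentType = "application/octet-stream") {
  const res = await fetchFn(url, {
    method: "PUT", // links from presignedPutObject expect PUT
    headers: { "Content-Type": contentType },
    body, // a File/Blob in the browser, a Buffer or Uint8Array in Node
  });
  if (!res.ok) {
    throw new Error(`Upload failed with status ${res.status}`);
  }
  return res.status;
}

// Download with the link from the download mutation: a plain GET request.
async function downloadFromPresignedUrl(fetchFn, url) {
  const res = await fetchFn(url);
  if (!res.ok) {
    throw new Error(`Download failed with status ${res.status}`);
  }
  return new Uint8Array(await res.arrayBuffer());
}
```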
Since this is a sample implementation, there is plenty of room for improvement. As noted earlier, I use a more in-depth implementation that allows the service and mutations to be called by a variety of applications in the same ecosystem. This way I can have separate buckets per client for different apps.
Upside: We have a centralized, scalable file storage system. We use GraphQL to fetch upload and download links and can also limit how long they stay valid. It can be used across many applications.
Downside: Files do not go through GraphQL; it is merely a proxy for getting links to the files.
The whole sample project is available on GitHub. The repository also contains a Postman file, which you can use to test the endpoints on the microservice itself.
Disclaimer: This is not a production-ready implementation. It does not contain permission schemas, authentication or authorization mechanisms, or any other limitations that a production-ready GraphQL server needs.
I would love to hear about the challenges you are facing and what kinds of patterns you are interested in.
If you would like to receive notifications regarding further articles, please follow me on Twitter.
- Uploading files with Apollo Server: https://www.apollographql.com/docs/apollo-server/data/file-uploads/
- GraphQL multipart request specification: https://github.com/jaydenseric/graphql-multipart-request-spec
- Apollo Server RESTDataSource: https://www.apollographql.com/docs/apollo-server/data/data-sources/