What is it …

Almost every second web application needs to let users upload large files. With huge internet bandwidth available and applications moving to the cloud, users simply expect to be able to push large amounts of data to a web application.

While working on a web application in the banking domain, we needed exactly such a feature: many users upload their huge files during the same slot of the day. Each user may upload anywhere from 500 MB to 2 GB of files, and with the number of end users growing over the last few months, getting this right is critical.

… and how we took the path?

OK, understood: our Rails application does not have a big code base or feature set yet, nor a huge user base. But the user count is now growing at double the monthly pace it had six months ago. That means we could have almost three times as many users within the next two months, and if this feature is not implemented correctly, performance will suffer.

We started testing the application with growing upload sizes, each time trying 500–900 MB of files in a single go, where each individual file could be at most 100 MB. That worked fine, but once we went above 1 GB, or when the network was slower, uploads started failing!

  • We identified that the upload mechanism had been kept very basic in the first iteration, and the feature requirements had now changed, so we had to rework it to hold up under all conditions. Our stack: Dropzone (drag-and-drop support and AJAX-based file uploads), Amazon S3 (cloud file storage), nginx as the web server, and unicorn as the application server.
  • Nginx: Every request first hits nginx, which forwards it to unicorn, which eventually routes it to the right controller. We found many references to issues with large file uploads in Rails, so based on those we fixed the nginx configuration by adding the nginx upload module. This requires compiling nginx from source, changing the service configuration, and restarting the server. We also tweaked the nginx configuration to handle client timeouts and file uploads, and to forward only the non-file parameters to Rails. This reduced the response time, because Rails no longer has to handle the file itself (see the nginx sketch after this list).
  • Amazon S3 data upload: We identified that for a huge file, Rails takes time to push the data to S3, and meanwhile the request may expire because of the timeout between nginx and unicorn. The unicorn timeout is usually 30s, so if the upload takes longer, the request times out and nginx responds to the browser with an error. How to fix this? We found carrierwave_backgrounder, which moves the file storage into a background job. We used sidekiq to run that job, which takes the long-running upload out of the synchronous request cycle, so the response comes back faster (see the Ruby sketch after this list).
  • Dropzone: Still, uploading more than 700–900 MB of files over a slower connection sometimes did not work as expected. To fix that we tweaked the Dropzone configuration. Dropzone can upload files in different ways: push all files at once, or process them in batches. Since each file can be at most 50–100 MB, we decided to process files in batches of 2. So if the user adds, say, 50 files of 20 MB each, Dropzone sends 25 requests to the server with 2 files in each request. We only had to change the request-handling logic on the server side and in the client-side JavaScript (see the Dropzone sketch after this list). This reduces the load placed on the server by any single request and improves the response time. End users still see the progress, and the overall upload takes about the same time as before, just split into multiple sequential requests, one after the other.
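For the nginx piece, the configuration we ended up with looked roughly like the sketch below. This is a minimal illustration rather than our exact config: it assumes nginx was compiled with the nginx-upload-module, and the location name, temp path, and unicorn socket are placeholders.

    # Sketch only: assumes nginx built with nginx-upload-module; paths and names are placeholders.
    upstream unicorn_app {
      server unix:/tmp/unicorn.sock fail_timeout=0;
    }

    server {
      client_max_body_size 2g;       # allow large request bodies
      client_body_timeout  300s;     # give slow clients time to stream the body

      location /documents/upload {
        upload_pass @rails;                       # nginx stores the file itself; Rails only gets params
        upload_store /var/nginx/uploads;

        # replace the raw file in the forwarded params with its name and temp path
        upload_set_form_field $upload_field_name.name "$upload_file_name";
        upload_set_form_field $upload_field_name.path "$upload_tmp_path";
        upload_pass_form_field "^authenticity_token$|^utf8$";
        upload_cleanup 400 404 499 500-505;       # remove temp files when the request fails
      }

      location @rails {
        proxy_pass http://unicorn_app;
        proxy_read_timeout 300s;
      }
    }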
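For the S3 part, the gist of wiring carrierwave_backgrounder to sidekiq is shown below. The store_in_background and backend :sidekiq calls come from the gem's documented API, but treat this as a sketch, not our production code: Document and FileUploader are placeholder names for this illustration.

    # config/initializers/carrierwave_backgrounder.rb
    CarrierWave::Backgrounder.configure do |c|
      c.backend :sidekiq, queue: :carrierwave   # Sidekiq workers drain this queue
    end

    # app/uploaders/file_uploader.rb
    class FileUploader < CarrierWave::Uploader::Base
      include ::CarrierWave::Backgrounder::Delay  # defer storage until the worker runs
      storage :fog                                # fog credentials point at the S3 bucket
    end

    # app/models/document.rb -- placeholder model; the gem also expects a `file_tmp` string column
    class Document < ActiveRecord::Base
      mount_uploader :file, FileUploader
      store_in_background :file   # the actual push to S3 happens in a Sidekiq job,
                                  # so the HTTP request returns before the upload finishes
    end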
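On the client, the batching boils down to a couple of Dropzone options: uploadMultiple lets Dropzone pack several files into one request, and parallelUploads caps that at 2. The selector, URL, and limits below are placeholders for our setup, so read it as a sketch.

    // Sketch of the Dropzone setup (form id, URL and limits are placeholders)
    var dropzone = new Dropzone("#upload-form", {
      url: "/documents/upload",
      maxFilesize: 100,        // per-file limit in MB
      uploadMultiple: true,    // send several files in one request...
      parallelUploads: 2,      // ...but never more than 2 per request
      autoProcessQueue: true,
      timeout: 0               // let slow connections finish instead of aborting
    });

    // With uploadMultiple, the files arrive as file[0], file[1] in the same request,
    // so the server-side handler loops over them instead of reading a single param.
    dropzone.on("totaluploadprogress", function (progress) {
      console.log("Uploaded " + Math.round(progress) + "%");
    });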

And it works as expected! Now we can upload 1.5–2 GB of files in one go, even on a cloud server with 4 GB of RAM. End users notice no difference from before: they still see the upload progress, and the server never complains about timeouts.

Summary

  • Always send data to the server in chunks to keep response times low.
  • Configure the web server to handle large file uploads; it handles them faster than most application servers.
  • Use background processing on the app server when pushing large files to cloud storage such as S3 or Dropbox.


At BoTree Technologies, we build enterprise applications with our RoR team of 25+ engineers.

We also specialize in Python, RPA, AI, Django, JavaScript and ReactJS.

Consulting is free – let us help you grow!