How to Handle Large File Uploads?

I'm actually posting this as a question. If you're looking for the answer, sorry I don't have it yet.

How can we reasonably handle large file uploads? I'm talking in the >100MB range; YouTube, for instance, now supports 2GB files, and this will become increasingly the norm. I don't think that most servers are up to that yet, particularly if you need an application to scale.

Currently, using PHP, you need to set memory_limit to more than twice the upload_max_filesize, which as you can see would be prohibitive in the example of 2GB uploads; you'd need to set your PHP memory to >4GB (adding the buffer of 64M or whatever you need to run Drupal). EDIT: Looks like I was incorrect in my assumption; if you're not going to process the file, you don't need a huge memory footprint just to handle the raw uploads. Thanks Nate and Jamie!

Even if you manage to have that kind of resource available, you can probably expect things to splode with concurrent uploads...

So I spent some time yesterday looking at SWFUpload (module here), as I'd misunderstood its claims. Yes, it handles large file uploads (from the browser's standpoint), but you still need to set PHP memory accordingly. Not suitable for what I'm looking for, but it is a really nice way to handle multiple uploads. WARNING: I also learned from experience and much head-scratching that it doesn't work if you have Apache authentication on your server...

Now I'm looking at node.js as a possibility. This looks really great, and might do the job. Basically, it's a JavaScript runtime that sits on your server. Yes, you heard that right. Turns out that as JS has evolved, it's turned into a really tight language, and should be quite suitable for concurrent tasks.

Sorry if you came to this post looking for answers; I've simply postulated more questions. But I'm hoping that someone with more experience with this issue might be able to comment, and we'll all benefit from it. Additionally, this might turn out to be a handy addition to the Media suite, perhaps as a fancy stream wrapper for handling large files? And I'll definitely follow-up when I figure out how best to tackle this.

Thanks,
Aaron

Comments

hatsch's picture

new filefield sources release

hi,
just wanted to mention that there is a new filefield_sources release including great new functionality for using FTP-uploaded files: http://drupal.org/node/438940
maybe you want to give it a try. i have been using this patch for 2 months now without a problem. now it's included in the release!

kmind's picture

Nice post Aaron! I have faced that problem as well. At work, I am building a file hosting site as a side project. We need to move files that are 30-40 GB with ease, over HTTPS, as anything else is pretty much ruled out, both for encryption and because of tightly firewalled modern enterprise networks. We've found Flash to work until about 2 GB, after which it starts sending a negative Content-Length header. It might be some time before 64-bit Flash is common and/or Adobe uses a 64-bit counter for Content-Length.
In order to upload such large files with a browser, we ended up using both SWFUpload and the native HTML4 method: if the user selects a large file, she is prompted to click a button to switch to the native uploader. In WebKit, Firefox and IE8 that seems to work great for files in the 10+ GB range.

Another way to approach this is to write a Java applet. I personally don't like going there ;)

Good luck!

hatsch's picture

the real problem

the real problem with large file uploads is not php (although there might be much better solutions for that),
but limitations in current browsers. they still use a signed int for the content length in form data.
opera seems to be the only exception to this limitation.
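
To make the signed-int limitation concrete, here is a minimal Python sketch. It assumes the length is kept in a signed 32-bit counter, as the comments above describe; the helper name `as_int32` is made up for illustration and is not taken from any browser or from Flash.

```python
INT32_MAX = 2**31 - 1  # largest value a signed 32-bit integer can hold


def as_int32(length: int) -> int:
    """Return what a signed 32-bit counter would report for `length` bytes."""
    # Wrap into the range [-2**31, 2**31 - 1], as two's-complement overflow does.
    return (length + 2**31) % 2**32 - 2**31


# Just under 2 GiB is still fine; at 2 GiB and above the value turns negative.
for size in (INT32_MAX, 2**31, 3 * 1024**3):
    print(size, "->", as_int32(size))
```

Anything at or past 2 GiB comes out negative, which would be consistent with the negative Content-Length header reported earlier in this thread.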

Jen's picture

Module Suggestion

MediaFront Module and CDN or Amazon S3 maybe?
Or for files on another server: File Field Sources?

aabid's picture

I was also in need of that type of module. I posted my question in several forums asking for suggestions and was advised to go for SWFUpload. I don't know whether it will work properly or not, but I will opt for it.

Jamie's picture

Not sure why you are saying that you have to set the memory limit to double the upload size. I run a video site, and we have numerous uploads per day that are over 200 MB in size, while our memory limit is only set to 32 MB. That's with Apache 2.2.x running mod_php on PHP 5.2.x. We have never had a single problem from it.

For upload progress we currently use SWFUpload, but will be moving to upload_progress in the near future.

SWFUpload doesn't do anything different on the PHP side. It's the same as uploading via a standard HTML form. How SWFUpload does the progress is by querying the bytesTotal property in the progress event of the flash.net.FileReference class. It's 100% client side.

Now something I just finished testing, and am rolling out in a couple of weeks, is using a separate server for our video uploads. Instead of the videos going directly into Drupal, we will now have an upload.* server. Basically it works as a proxy to Drupal so that the upload page looks like it came from Drupal (getting the menu items to render correctly was a pain, but it works). Now the videos go directly to this other server, which also contains our FFMPEG encoding scripts. The point of this was to keep processes from being tied up on the main server while files are uploading, plus I have certain things running on the upload server as soon as the file is done (get a screen grab, verify FFMPEG can actually read the file, and populate the database entry with the meta information). On that server I have PHP's memory limit set at 16 MB and have successfully uploaded a 1.5 GB video file, and everything worked fine.

aaron's picture

Wow! 16M! That's impressive; I like your solution of a separate server for handling the uploads, which absolutely makes sense if you're going to have another server to process anyway. (By the way, one absolutely needs that if they're going to process with FFMPEG anyway...) In any case, after reading your and Nate's comments, I've retested, and confirm that I'm able to handle large uploads with 64M (which we require for our basic Drupal + contrib modules). I'm going to update my post accordingly so folks don't continue to be misled by the perception that you must have a large memory footprint to handle large file uploads.

Thanks!

Wim Mostrey's picture

FTP

For large file uploads I always try to encourage the client to use ftp along with either a file browser like imce or a solution like image_import to get the files linked to Drupal.

aaron's picture

FTP + Media Mover is also an excellent workflow implementation. Unfortunately, in this case, it's user-facing, not client-facing, so we need to be able to offer the ability for HTML uploads. Thanks!

Onopoc (Not receiving registration confirmation email)'s picture

Kaltura

With the module http://drupal.org/project/kaltura you can upload files of unlimited size. This is for the free Kaltura.org, which is open source. I never tested with files larger than 200MB though. You might be limited by your PHP settings. Read more at http://www.kaltura.org/upload-limit-200-mb-kalturace

With the paid version at Kaltura.com you can upload files of up to 150MB. Read more at http://www.kaltura.com/index.php/kmc/help

aaron's picture

Kaltura continues to rock, of course! That's an excellent resource for folks to use. Thanks for the reminder.

guidot's picture

WebDAV

Since we can bypass the load of a scripting language on public downloads by letting the web server send the files directly, it should be possible to do the same in the other direction too. WebDAV is an HTTP extension that does exactly that. There is no need to have another script (be it Perl or Java or whatever) on the server, with the additional load that it would cost.

aaron's picture

Very interesting. I wasn't familiar with WebDAV before. Looks like it would sit on the server. I haven't explored that option yet -- would it also require something on the client side to function properly?

Kars-T's picture

Perl

Some years ago I used a PHP script for a progress bar that came with a mini Perl script to get around PHP's limitations, as Perl has none. Perl can read the raw headers and has, AFAIK, no size limit. If you use something other than HTML on the client side, like Flash, maybe it's wise to use something different on the server as well.

I think this was the project: Megaupload

aaron's picture

Interesting solution -- I looked at another Perl solution earlier as well. Thanks for the tip!

Nate Haug's picture

Memory Limits

I think the claim of needing "twice the upload_max_filesize" only applies to images (though my experience is more like 4 times the size of images for resizing). If you don't do any on-the-fly processing, you can handle extremely large files with a modest amount of memory. In testing FileField I've managed 1.5 GB files with 96MB of memory. Timeouts are still a problem though.
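
As a rough illustration of why an unprocessed upload doesn't need much memory, here is a minimal Python sketch of copying a stream through a small fixed-size buffer. This is not PHP's actual implementation; the function name `save_upload` and the 64 KiB chunk size are assumptions chosen for the example.

```python
import io

CHUNK_SIZE = 64 * 1024  # 64 KiB buffer; an arbitrary size chosen for illustration


def save_upload(source, destination, chunk_size=CHUNK_SIZE):
    """Copy `source` to `destination` one chunk at a time; return bytes written."""
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # an empty read means the stream is exhausted
            break
        destination.write(chunk)
        total += len(chunk)
    return total


# In-memory streams stand in for the network socket and the file on disk.
incoming = io.BytesIO(b"x" * (1024 * 1024))
stored = io.BytesIO()
print(save_upload(incoming, stored))
```

Only one chunk is held by the copy loop at any moment, so its working memory stays about the same whether the file is 1 MB or 1.5 GB. Memory use only grows when something loads the whole file, such as image resizing.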

aaron's picture

Don't I feel foolish now... I hadn't even bothered to test that. I now confirm your reported behavior -- looks like that page should be updated, as it's a little misleading. If you're not processing a file, you don't need to bump up your memory... I'll still need to figure out something for time outs, but this solves half the battle. Thanks, Nate!

blainelang's picture

filedepot module?

The filedepot module http://drupal.org/project/filedepot that was released a few weeks ago is focused on providing a secure document management application for Drupal, but it can store any file format. There is an optional desktop client http://www.nextide.ca/solutions/filedepotclient that allows users to drag multiple files from their desktop to be uploaded to the file repository. It can also be used to upload very large files, but it is still restricted by the web server's PHP settings. The filedepot client is an MS Windows desktop app and behaves like a mini-browser, monitoring a local desktop directory for files and then uploading them in the background. Although still limited by some of the PHP settings, users are not sitting and waiting for the browser form to process the upload.

aaron's picture

Wow, this looks like an excellent resource! I'll definitely check it out. Thanks!

Mark Schoonover's picture

Use A Content Delivery Network

Interesting topic as I've run into this myself. It's more than just PHP memory limits, it's also internet bandwidth in & out, and disk space. Disk space is probably not so much an issue these days I will agree, but that can make backing up and restoring your server problematic.

During SANDCamp 2010 here in San Diego, I gave a session on Amazon S3 & Cloudfront integration: http://cf.thetajoin.com/SandCAMP2010/DrupalAmazonIntegration.pdf . It's pretty easy to configure, plus you can use familiar tools to upload very large files. Cloudfront supports streaming media too, and I have a blog post on that very subject: http://blog.thetajoin.com/content/preparing-amazon-s3-cloudfront-streami...

The same concepts can be applied to other CDNs as well. I prefer to split content delivery to a CDN and let the server just handle Drupal.

aaron's picture

Yes, a CDN or other external hosting service is crucial for serving up large files, or even for small files when you need to scale. Thanks for the tips!

Anonymous's picture

SWFUpload Option

I have successfully used http://drupal.org/project/swfupload
I must admit I did have some problems and it was a while ago. I am not a developer. Perhaps a developer would have less trouble.

aaron's picture

I spent some time patching on the issue queue the other day while testing out the module. SWFUpload is an excellent plugin, and well worth the effort to get in place if you want a nice client-side manager, particularly for multiple file uploads.

Wim Leers's picture

W3C File API

It looks like the W3C File API might help us out here. Unfortunately, it's only been implemented in FF 3.6 so far AFAICT.

Unfortunately, while this is very likely what the future will bring, it's not yet possible to use this today. Maybe when we're at IE10/Chrome 6/Safari 5/FF4/Opera 11. So, as for realistic solutions for today, I don't know.

The spec: http://dev.w3.org/2006/webapi/FileAPI/
Demo video: http://www.thecssninja.com/javascript/drag-and-drop-upload
Other relevant links:
- http://www.appelsiini.net/2009/10/html5-drag-and-drop-multiple-file-upload
- http://www.broken-links.com/2009/12/15/firefox-3-6-uses-the-w3c-file-api/

aaron's picture

HTML5 FTW!

Exciting things to come at least. In my dream world, I'd completely ignore IE. Unfortunately, we have to continue to contend with them, at least until they're less than 3% of the market share, or they wake up and embrace open standards. I'm not holding my breath on either outcome...

bojanz's picture

video_upload?

I've recently contributed the browser upload method to video_upload (it's in the issue queue, tested by me, my client, and a few community members).

1) video_upload is the module that provides a filefield-like cck field for youtube uploads
2) The browser upload method means that it uploads directly to youtube, bypassing your server.

Might be a good option to explore...

aaron's picture

Wow, that's really great! Why create a new field type though? Wouldn't it be better to instead supply a new widget on top of the Filefield field? That seems like it would create more traction in the community -- see SWFUpload to see how they did that.

Justin Ellison's picture

Java Applets

The only answer to this problem that I'm aware of is a Java applet. My company has a PHP-based file manager webapp that uses http://www.jfileupload.com/. While it's a Java applet, and it's fairly unexciting, it does work and I've uploaded 4+ GB ISOs using it.

aaron's picture

That looks like a great client-side solution for some cases. Of course, much of that functionality (such as image resizing and file management) would be better handled on Drupal, with such solutions as Image Styles in core and the Media module.

awolfey's picture

jupload

The open source java applet jupload will also do this. It works by breaking files into chunks, which the server must reassemble.
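
The chunking idea can be sketched in a few lines of Python. This is only an illustration of splitting and reassembling in general, not jupload's actual protocol; the function names and the tiny chunk size are made up for the example.

```python
def split_into_chunks(data: bytes, chunk_size: int):
    """Yield (index, chunk) pairs covering `data` in order."""
    for index, start in enumerate(range(0, len(data), chunk_size)):
        yield index, data[start:start + chunk_size]


def reassemble(chunks) -> bytes:
    """Join chunks by index, so the order they arrived in does not matter."""
    ordered = sorted(chunks, key=lambda pair: pair[0])
    return b"".join(chunk for _, chunk in ordered)


original = b"0123456789" * 10
# The client side splits the file; here the chunks arrive in reverse order.
received = list(split_into_chunks(original, chunk_size=16))[::-1]
# The server side puts them back together.
print(reassemble(received) == original)
```

Because each request carries only one chunk, no single request has to exceed a per-request size limit. A real implementation would also need to handle missing or repeated chunks, which this sketch ignores.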

aaron's picture

That looks like an interesting solution; thanks -- I'd thought about the idea of breaking up files, and am not surprised to see an open source solution using that implementation already in place. I knew there had to be folks who have tackled this issue already. Thanks!

arthur's picture

Timeouts and memory limits too!

One of the things that kills uploads is PHP's execution time and depending on how you run Apache it can create the same problem with long uploads. PHP also has a bug with some file operations over 2GB (http://php.net/manual/en/function.filesize.php)
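
To get a feel for how long such an upload keeps a connection open, here is a back-of-the-envelope Python sketch. The 1 Mbit/s uplink is an assumed figure for illustration only, the function name is made up, and the estimate ignores protocol overhead.

```python
def estimated_upload_seconds(size_bytes: int, uplink_bits_per_second: float) -> float:
    """Estimate transfer time, ignoring protocol overhead and retries."""
    return size_bytes * 8 / uplink_bits_per_second


two_gib = 2 * 1024**3
assumed_uplink = 1_000_000  # 1 Mbit/s, an assumed home uplink speed
hours = estimated_upload_seconds(two_gib, assumed_uplink) / 3600
print(f"{hours:.1f} hours")
```

At that assumed speed a 2 GB upload runs for several hours, far longer than typical default timeouts, so whichever limits apply on your server need to be checked against the slowest uploader you expect.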

FTP/SFTP is a much better uploading solution, but I don't know of any way to embed a client in a web page.

aaron's picture

Yes, I agree re. FTP/SFTP. However, until browsers have a consistent way to embed an FTP upload form into a page, that becomes prohibitive to user-facing uploads, such as what YouTube offers.
