Download file in chunks in parallel in C#
Improved file download by fetching chunks of the file in parallel in C#
Downloading large files from your code may cause problems due to limitations in your network or in the system where your code is executing. For example, some systems limit the size of the file you can download through the network. This is a common case in highly controlled environments.
Another aspect is download speed. When you open the response stream in your code, you read the bytes in order, which means you could define ranges of the stream and read them in parallel, using the power of multiple cores on your machine to achieve a faster download. A similar principle is used in download manager applications, with the addition of download resuming.
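To make the range idea concrete before going through the full implementation, here is a minimal sketch of a single ranged GET request; the URL and temp path are placeholders, not part of the sample project. The AddRange call translates into a Range: bytes=0-1023 header, and a server that supports range requests answers with 206 Partial Content and only that slice of the file.

using System;
using System.IO;
using System.Net;

class RangedGetSample
{
    static void Main()
    {
        // Request only the first 1024 bytes of the file (URL is a placeholder)
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create("http://example.com/file.zip");
        request.Method = "GET";
        request.AddRange(0, 1023); // sends "Range: bytes=0-1023"

        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        using (Stream responseStream = response.GetResponseStream())
        using (FileStream fileStream = new FileStream(@"c:\temp\chunk_0.tmp", FileMode.Create))
        {
            // 206 PartialContent means the server honored the range
            Console.WriteLine(response.StatusCode);
            responseStream.CopyTo(fileStream);
        }
    }
}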
The following approach enables the power of parallel operations on multi-core machines and can be used as a base for download resuming. It is only the base implementation, which allows downloading a file in chunks in parallel.
The approach consists of a few simple steps:
- acquire the file size by making an HTTP request with the HEAD method
- calculate the size of the chunks based on the desired number of parallel downloads (a minimal sketch of these first two steps follows the list)
- initiate the download of each chunk in parallel and save it to a separate file
- merge all chunk files into a single final file
- delete all temporary files
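Here is the promised sketch of the first two steps in isolation, assuming the server reports Content-Length on a HEAD request; the URL and the number of parallel downloads are placeholder values.

using System;
using System.Net;

class FileSizeAndRangesSample
{
    static void Main()
    {
        // Step 1: ask for the file size with a HEAD request (URL is a placeholder)
        WebRequest headRequest = WebRequest.Create("http://example.com/file.zip");
        headRequest.Method = "HEAD";

        long fileSize;
        using (WebResponse response = headRequest.GetResponse())
        {
            fileSize = response.ContentLength; // taken from the Content-Length header
        }

        // Step 2: split the size into equal ranges, one per parallel download;
        // the last range absorbs the remainder so the whole file is covered
        int parallelDownloads = 4;
        long chunkSize = fileSize / parallelDownloads;
        for (int chunk = 0; chunk < parallelDownloads; chunk++)
        {
            long start = chunk * chunkSize;
            long end = (chunk == parallelDownloads - 1) ? fileSize - 1 : start + chunkSize - 1;
            Console.WriteLine($"Chunk {chunk}: bytes {start}-{end}");
        }
    }
}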
And now, hands on with the code. The first thing I decided to do is handle the response stream ranges in a collection of model objects. I could have gone with a dictionary in this case, but using a model class seemed like a more readable solution.
namespace Downloader.App
{
    internal class Range
    {
        public long Start { get; set; }
        public long End { get; set; }
    }
}
Before we switch to the logic, we need to declare a model for the result. The invoker of the download method will need a few pieces of information. I found the following properties useful, so I made them part of the download result model.
using System;

namespace Downloader.App
{
    public class DownloadResult
    {
        public long Size { get; set; }
        public String FilePath { get; set; }
        public TimeSpan TimeTaken { get; set; }
        public int ParallelDownloads { get; set; }
    }
}
And now to the main part. The following method accepts the download URL and the destination path, as well as an optional number of parallel downloads and a flag controlling whether SSL certificates are validated when downloading from an HTTPS URL.
using System;
using System.Collections.Generic;
using System.Collections.Concurrent;
using System.IO;
using System.Linq;
using System.Net;
using System.Threading.Tasks;

namespace Downloader.App
{
    public static class Downloader
    {
        static Downloader()
        {
            ServicePointManager.Expect100Continue = false;
            ServicePointManager.DefaultConnectionLimit = 100;
            ServicePointManager.MaxServicePointIdleTime = 1000;
        }

        public static DownloadResult Download(String fileUrl, String destinationFolderPath, int numberOfParallelDownloads = 0, bool validateSSL = false)
        {
            if (!validateSSL)
            {
                ServicePointManager.ServerCertificateValidationCallback = delegate { return true; };
            }

            Uri uri = new Uri(fileUrl);

            //Calculate destination path
            String destinationFilePath = Path.Combine(destinationFolderPath, uri.Segments.Last());

            DownloadResult result = new DownloadResult() { FilePath = destinationFilePath };

            //Handle number of parallel downloads
            if (numberOfParallelDownloads <= 0)
            {
                numberOfParallelDownloads = Environment.ProcessorCount;
            }

            #region Get file size
            WebRequest webRequest = WebRequest.Create(fileUrl);
            webRequest.Method = "HEAD";
            long responseLength;
            using (WebResponse webResponse = webRequest.GetResponse())
            {
                responseLength = long.Parse(webResponse.Headers.Get("Content-Length"));
                result.Size = responseLength;
            }
            #endregion

            if (File.Exists(destinationFilePath))
            {
                File.Delete(destinationFilePath);
            }

            using (FileStream destinationStream = new FileStream(destinationFilePath, FileMode.Append))
            {
                ConcurrentDictionary<int, String> tempFilesDictionary = new ConcurrentDictionary<int, String>();

                #region Calculate ranges
                List<Range> readRanges = new List<Range>();
                for (int chunk = 0; chunk < numberOfParallelDownloads - 1; chunk++)
                {
                    var range = new Range()
                    {
                        Start = chunk * (responseLength / numberOfParallelDownloads),
                        End = ((chunk + 1) * (responseLength / numberOfParallelDownloads)) - 1
                    };
                    readRanges.Add(range);
                }
                //The last range absorbs the remainder so the whole file is covered
                readRanges.Add(new Range()
                {
                    Start = readRanges.Any() ? readRanges.Last().End + 1 : 0,
                    End = responseLength - 1
                });
                #endregion

                DateTime startTime = DateTime.Now;

                #region Parallel download
                Parallel.ForEach(readRanges, new ParallelOptions() { MaxDegreeOfParallelism = numberOfParallelDownloads }, (readRange, state, index) =>
                {
                    HttpWebRequest httpWebRequest = WebRequest.Create(fileUrl) as HttpWebRequest;
                    httpWebRequest.Method = "GET";
                    httpWebRequest.AddRange(readRange.Start, readRange.End);
                    using (HttpWebResponse httpWebResponse = httpWebRequest.GetResponse() as HttpWebResponse)
                    {
                        String tempFilePath = Path.GetTempFileName();
                        using (var fileStream = new FileStream(tempFilePath, FileMode.Create, FileAccess.Write, FileShare.Write))
                        {
                            httpWebResponse.GetResponseStream().CopyTo(fileStream);
                            //The loop index provided by Parallel.ForEach keeps the chunk order thread safe
                            tempFilesDictionary.TryAdd((int)index, tempFilePath);
                        }
                    }
                });
                result.ParallelDownloads = readRanges.Count;
                #endregion

                result.TimeTaken = DateTime.Now.Subtract(startTime);

                #region Merge to single file
                foreach (var tempFile in tempFilesDictionary.OrderBy(b => b.Key))
                {
                    byte[] tempFileBytes = File.ReadAllBytes(tempFile.Value);
                    destinationStream.Write(tempFileBytes, 0, tempFileBytes.Length);
                    File.Delete(tempFile.Value);
                }
                #endregion

                return result;
            }
        }
    }
}
The class and method are static, and they set properties of the ServicePointManager class, which is also static. That makes this class not thread safe.
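If that global state becomes a concern, one possible alternative (not part of the sample project) is HttpClient, where certificate validation can be configured per handler instance instead of process-wide. Below is a minimal sketch, assuming a runtime where HttpClientHandler.ServerCertificateCustomValidationCallback is available (.NET Core or .NET Framework 4.7.1+), C# 7.1+ for async Main, and a placeholder URL.

using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading.Tasks;

class PerInstanceValidationSample
{
    static async Task Main()
    {
        // The validation callback is scoped to this handler, not to the whole process
        var handler = new HttpClientHandler
        {
            ServerCertificateCustomValidationCallback = (message, certificate, chain, errors) => true
        };

        using (var client = new HttpClient(handler))
        {
            // URL is a placeholder; ranged requests work here too via the Range header
            var request = new HttpRequestMessage(HttpMethod.Get, "https://example.com/file.zip");
            request.Headers.Range = new RangeHeaderValue(0, 1023);

            using (var response = await client.SendAsync(request))
            {
                byte[] chunk = await response.Content.ReadAsByteArrayAsync();
                Console.WriteLine($"Downloaded {chunk.Length} bytes");
            }
        }
    }
}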
Now let's give the code a test run. Although I did not test with a large file, you can see the difference in download speed.
using System;

namespace Downloader.App
{
    class Program
    {
        static void Main(string[] args)
        {
            var result = Downloader.Download("http://dejanstojanovic.net/media/215073/optimize-jpg.zip", @"c:\temp\", 1);
            Console.WriteLine($"Location: {result.FilePath}");
            Console.WriteLine($"Size: {result.Size}bytes");
            Console.WriteLine($"Time taken: {(int)result.TimeTaken.TotalMilliseconds}ms");
            Console.WriteLine($"Parallel: {result.ParallelDownloads}");
            Console.ReadKey();
        }
    }
}
This is the output of the console application running with a single download:
Size: 307440bytes
Time taken: 486ms
Parallel: 1
Now, running with 4 parallel downloads, the results are the following:
Size: 307440bytes
Time taken: 279ms
Parallel: 4
Not a big difference in absolute time since the file is too small, but if you compare the two values you will see that the download time dropped from 486 ms to 279 ms, an improvement of over 40%.
The complete code, as a ready-to-debug project, can be found in the download section of this article page.
Disclaimer
The purpose of the code contained in snippets or available for download in this article is solely for learning and demo purposes. The author will not be held responsible for any failure or damages caused by any other usage.