Fast image search in .NET using C#

Searching images on the file system with C#

Searching files on the file system is pretty easy using the System.IO namespace classes. The problem arises when you need to query the file system by properties that belong to the content of a file rather than to the file system itself.
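As a baseline, a query that only involves file-system properties needs nothing beyond System.IO. The following is a minimal sketch; the `FileQuery` class and its `FindByExtension` method are illustrative names, not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class FileQuery
{
    // Hypothetical helper for illustration only.
    // Lazily enumerates all files under root (including sub-folders) whose
    // extension matches one of the given extensions, ignoring case.
    public static IEnumerable<string> FindByExtension(string root, params string[] extensions)
    {
        var wanted = new HashSet<string>(extensions, StringComparer.OrdinalIgnoreCase);

        return Directory
            .EnumerateFiles(root, "*", SearchOption.AllDirectories)
            .Where(path => wanted.Contains(Path.GetExtension(path)));
    }
}
```

Calling `FileQuery.FindByExtension(folder, ".png", ".jpg")` returns the matching paths without opening any file, which is why this kind of query is cheap.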

This is the case when searching for images or other media files. For these types of files you might want to take into account the width and height of an image, the length of a video or audio file, and a number of other properties that the System.IO namespace does not expose.

.NET does have classes for working with these types of files, but they are often slow and use a lot of memory because they deserialize the whole file into an object instance. For example, to get the dimensions of a bitmap you have to create an instance of Bitmap, which can be several times larger in memory than the file itself.

Imagine you need to do this for a search across thousands of files. Loading each whole image just to check its dimensions will use a lot of memory.

Another approach is to read just the headers of the image files to get the dimensions. This uses less memory and is a lot faster. To avoid reinventing the wheel, I used a snippet from this stackoverflow.com article. I made some small modifications, like adding a fall-back in case it fails to read the image headers.

using System;
using System.Collections.Generic;
using System.Drawing;
using System.IO;
using System.Linq;

namespace ImageSearch
{
    internal static class ImageHelper
    {
        const string errorMessage = "Could not recognise image format.";

        private static Dictionary<byte[], Func<BinaryReader, Size>> imageFormatDecoders = new Dictionary<byte[], Func<BinaryReader, Size>>()
        {
            { new byte[]{ 0x42, 0x4D }, DecodeBitmap},
            { new byte[]{ 0x47, 0x49, 0x46, 0x38, 0x37, 0x61 }, DecodeGif },
            { new byte[]{ 0x47, 0x49, 0x46, 0x38, 0x39, 0x61 }, DecodeGif },
            { new byte[]{ 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A }, DecodePng },
            { new byte[]{ 0xff, 0xd8 }, DecodeJfif },
        };

        /// <summary>
        /// Gets the dimensions of an image.
        /// </summary>
        /// <param name="path">The path of the image to get the dimensions of.</param>
        /// <returns>The dimensions of the specified image.</returns>
        /// <exception cref="ArgumentException">The image was of an unrecognised format.</exception>
        public static Size GetDimensions(string path)
        {
            using (BinaryReader binaryReader = new BinaryReader(File.OpenRead(path)))
            {
                try
                {
                    return GetDimensions(binaryReader);
                }
                catch (ArgumentException e)
                {
                    if (e.Message.StartsWith(errorMessage))
                    {
                        throw new ArgumentException(errorMessage, "path", e);
                    }
                    else
                    {
                        throw; // rethrow without resetting the stack trace
                    }
                }
            }
        }

        /// <summary>
        /// Gets the dimensions of an image.
        /// </summary>
        /// <param name="binaryReader">A reader positioned at the start of the image data.</param>
        /// <returns>The dimensions of the specified image.</returns>
        /// <exception cref="ArgumentException">The image was of an unrecognised format.</exception>    
        public static Size GetDimensions(BinaryReader binaryReader)
        {
            int maxMagicBytesLength = imageFormatDecoders.Keys.OrderByDescending(x => x.Length).First().Length;

            byte[] magicBytes = new byte[maxMagicBytesLength];

            for (int i = 0; i < maxMagicBytesLength; i += 1)
            {
                magicBytes[i] = binaryReader.ReadByte();

                foreach (var kvPair in imageFormatDecoders)
                {
                    if (magicBytes.StartsWith(kvPair.Key))
                    {
                        return kvPair.Value(binaryReader);
                    }
                }
            }

            throw new ArgumentException(errorMessage, "binaryReader");
        }

        private static bool StartsWith(this byte[] thisBytes, byte[] thatBytes)
        {
            for (int i = 0; i < thatBytes.Length; i += 1)
            {
                if (thisBytes[i] != thatBytes[i])
                {
                    return false;
                }
            }
            return true;
        }

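        // Despite its name, this reverses the byte order before converting, so on a
        // little-endian machine it reads a big-endian value (the order PNG and JPEG use).
        private static short ReadLittleEndianInt16(this BinaryReader binaryReader)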
        {
            byte[] bytes = new byte[sizeof(short)];
            for (int i = 0; i < sizeof(short); i += 1)
            {
                bytes[sizeof(short) - 1 - i] = binaryReader.ReadByte();
            }
            return BitConverter.ToInt16(bytes, 0);
        }

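        // Despite its name, this reverses the byte order before converting, so on a
        // little-endian machine it reads a big-endian value (the order PNG and JPEG use).
        private static int ReadLittleEndianInt32(this BinaryReader binaryReader)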
        {
            byte[] bytes = new byte[sizeof(int)];
            for (int i = 0; i < sizeof(int); i += 1)
            {
                bytes[sizeof(int) - 1 - i] = binaryReader.ReadByte();
            }
            return BitConverter.ToInt32(bytes, 0);
        }

        private static Size DecodeBitmap(BinaryReader binaryReader)
        {
            binaryReader.ReadBytes(16);
            int width = binaryReader.ReadInt32();
            // The stored height is negative for top-down bitmaps, so take the absolute value.
            int height = Math.Abs(binaryReader.ReadInt32());
            return new Size(width, height);
        }

        private static Size DecodeGif(BinaryReader binaryReader)
        {
            int width = binaryReader.ReadInt16();
            int height = binaryReader.ReadInt16();
            return new Size(width, height);
        }

        private static Size DecodePng(BinaryReader binaryReader)
        {
            binaryReader.ReadBytes(8);
            int width = binaryReader.ReadLittleEndianInt32();
            int height = binaryReader.ReadLittleEndianInt32();
            return new Size(width, height);
        }

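        private static Size DecodeJfif(BinaryReader binaryReader)
        {
            while (binaryReader.ReadByte() == 0xff)
            {
                byte marker = binaryReader.ReadByte();
                // Segment lengths are unsigned 16-bit values; keeping them as a signed
                // short would turn lengths above 32767 into negative numbers.
                int chunkLength = (ushort)binaryReader.ReadLittleEndianInt16();

                // 0xc0 is the baseline start-of-frame marker; 0xc2 (progressive) shares its layout.
                if (marker == 0xc0 || marker == 0xc2)
                {
                    binaryReader.ReadByte(); // skip the sample precision byte

                    int height = (ushort)binaryReader.ReadLittleEndianInt16();
                    int width = (ushort)binaryReader.ReadLittleEndianInt16();
                    return new Size(width, height);
                }

                // Skip the rest of this segment (the length includes its own two bytes).
                binaryReader.ReadBytes(chunkLength - 2);
            }

            throw new ArgumentException(errorMessage);
        }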
    }
}
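The header layouts the class depends on can be checked in isolation. As a minimal sketch, assuming a PNG whose first chunk is IHDR (as the PNG specification requires), the width and height sit at byte offsets 16 and 20 as big-endian 32-bit integers. The `PngHeader` class below is an illustrative name and works on an in-memory buffer rather than a file.

```csharp
using System;

public static class PngHeader
{
    // The fixed 8-byte signature that starts every PNG file.
    private static readonly byte[] Signature =
        { 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A };

    // Reads a big-endian 32-bit integer starting at the given offset.
    private static int ReadBigEndianInt32(byte[] data, int offset)
    {
        return (data[offset] << 24) | (data[offset + 1] << 16)
             | (data[offset + 2] << 8) | data[offset + 3];
    }

    // Returns true and the dimensions if the buffer starts with the PNG
    // signature and is long enough to hold the IHDR width and height.
    public static bool TryGetSize(byte[] header, out int width, out int height)
    {
        width = 0;
        height = 0;

        if (header == null || header.Length < 24)
        {
            return false;
        }

        for (int i = 0; i < Signature.Length; i++)
        {
            if (header[i] != Signature[i])
            {
                return false;
            }
        }

        // Bytes 8-15 hold the IHDR chunk length and type; the dimensions follow.
        width = ReadBigEndianInt32(header, 16);
        height = ReadBigEndianInt32(header, 20);
        return true;
    }
}
```

Only the first 24 bytes of the file are needed, which is the reason the header approach stays cheap no matter how large the image is.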
    

This class does the magic of finding the dimensions of an image file, but there is still room to improve performance. The first thing that came to mind was using Tasks to make looping through the files faster. Using tasks or threads made the code a lot faster and I was pretty happy with it: it took around 650 milliseconds for 4174 files across all sub-folders.
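The Task-based version is not shown in this article. A minimal sketch of the one-task-per-item pattern, assuming a hypothetical `TaskPerItem.Filter` helper and a caller-supplied predicate standing in for the dimension check, might look like this:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class TaskPerItem
{
    // Illustrative only: starts one task per item, waits for all of them and
    // returns the items for which the predicate was true, in input order.
    // The predicate stands in for a check such as comparing an image's width.
    public static List<T> Filter<T>(IEnumerable<T> items, Func<T, bool> predicate)
    {
        List<T> list = items.ToList();

        // One Task<bool> per item; tasks[i] holds the result for list[i].
        Task<bool>[] tasks = list
            .Select(item => Task.Run(() => predicate(item)))
            .ToArray();

        Task.WaitAll(tasks);

        return list.Where((item, index) => tasks[index].Result).ToList();
    }
}
```

With thousands of files this creates thousands of tasks, each doing only a tiny amount of work, so the scheduling overhead becomes a noticeable part of the total time.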

My friend and colleague helped me make it even faster by using the Parallel class:

using System;
using System.Collections;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.IO;
using System.Drawing;
using ImageSearch;

namespace ImageSearch
{
    public class Finder
    {
        private ConcurrentBag<string> resultPaths = new ConcurrentBag<string>();
        public IEnumerable<string> FilePaths
        {
            get
            {
                return resultPaths;
            }
        }

        public void Find(string folderPath, int? width = null, int? height = null)
        {
            string[] extensions = new string[] { ".bmp", ".jpg", ".jpeg", ".png", ".gif" };

            IEnumerable<String> filesList =
                Directory.EnumerateFiles(folderPath, "*.*", SearchOption.AllDirectories).Where(f => extensions.Contains(Path.GetExtension(f), StringComparer.InvariantCultureIgnoreCase));

            Parallel.ForEach(filesList, new Action<string>((fn) =>
                {
                    if (!string.IsNullOrWhiteSpace(fn) && File.Exists(fn))
                    {
                        string fileName = fn;
                        bool match = true;

                        //Check image dimensions
                        if (width != null || height != null)
                        {
                            Size imageDimension;
                            try
                            {
                                imageDimension = ImageHelper.GetDimensions(fileName);
                            }
                            catch (Exception)
                            {
                                // Fall back to loading the whole image when the header cannot be parsed.
                                using (var bitmap = new Bitmap(fileName))
                                {
                                    imageDimension = bitmap.Size;
                                }
                            }

                            //Compare size
                            if (width != null)
                            {
                                match = imageDimension.Width == width;
                            }

                            if (match && height != null)
                            {
                                match = imageDimension.Height == height;
                            }
                        }

                        if (match)
                        {
                            resultPaths.Add(fileName);
                        }
                    }
                }));
        }
    }
}
    

Running the search with Parallel instead of Tasks made the code more than twice as fast. For the same amount of work, checking the headers of 4174 image files, it now took around 250 milliseconds.

Note

Task.Run always creates a single task per item when it is called once per item, as in the Task-based version, but the Parallel class batches work so that it creates fewer tasks than there are work items. This can provide significantly better overall performance, especially if the loop body does only a small amount of work per item.
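The batching can be observed directly. In the sketch below, the `localInit` delegate of `Parallel.ForEach` runs once for each task that takes part in the loop, so counting its invocations gives the number of tasks used. The `BatchingDemo` class is an illustrative name, and the actual task count depends on the scheduler and the machine.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class BatchingDemo
{
    // Processes itemCount trivial work items with Parallel.ForEach and reports
    // how many tasks the loop used and how many items were processed in total.
    public static void Run(int itemCount, out int tasksUsed, out int itemsProcessed)
    {
        int taskCounter = 0;
        int itemCounter = 0;

        Parallel.ForEach(
            Enumerable.Range(0, itemCount),
            // localInit: called once for each task that takes part in the loop.
            () => { Interlocked.Increment(ref taskCounter); return 0; },
            // body: counts the items handled by this task in its local state.
            (item, loopState, localCount) => localCount + 1,
            // localFinally: merges the per-task count into the shared total.
            localCount => Interlocked.Add(ref itemCounter, localCount));

        tasksUsed = taskCounter;
        itemsProcessed = itemCounter;
    }
}
```

Calling `BatchingDemo.Run(4174, out tasks, out items)` should report all 4174 items processed, typically by a number of tasks close to the number of CPU cores rather than one task per item.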

using System;
using System.Diagnostics;
using System.Linq;

namespace ImageSearch
{
    class Program
    {
        static void Main(string[] args)
        {
            var finder = new Finder();

            // Time only the search itself, not the printing of the results.
            var watch = Stopwatch.StartNew();
            finder.Find(@"D:\Images", 300);
            watch.Stop();

            foreach (string file in finder.FilePaths)
            {
                Console.WriteLine(file);
            }

            Console.WriteLine(string.Format("TOTAL RESULTS: {0}", finder.FilePaths.Count()));
            Console.WriteLine(string.Format("TOTAL TIME: {0}", watch.ElapsedMilliseconds));
            Console.ReadLine();
        }
    }
}

Disclaimer

Purpose of the code contained in snippets or available for download in this article is solely for learning and demo purposes. Author will not be held responsible for any failure or damages caused due to any other usage.
