Using GZip compression for large text values in Redis

Store and retrieve larger text values with Redis in .NET

Using Redis for caching values can significantly increase the performance and scalability of your application.

Although Redis allows quite large values, up to 512 MB per key, which you will most likely never reach if you are storing, for example, JSON data, putting and pulling large values still means a significant amount of data transferred over the wire.

You can always switch to a more compact serialization format such as Protocol Buffers or Apache Avro, which will reduce the size of the data, but the simplest option is GZip compression, which is already included in the .NET Framework in the System.IO.Compression namespace.

The following is a helper class that uses GZip to compress and decompress string values:

    using System;
    using System.IO;
    using System.IO.Compression;
    using System.Text;

    public static class GZipHelper
    {
        // Copies the source stream to the destination in 4 KB chunks.
        // (On .NET 4.0 and later the built-in Stream.CopyTo does the same job.)
        private static void CopyTo(Stream sourceStream, Stream destinationStream)
        {
            byte[] bytes = new byte[4096];

            int cnt;

            while ((cnt = sourceStream.Read(bytes, 0, bytes.Length)) != 0)
            {
                destinationStream.Write(bytes, 0, cnt);
            }
        }

        // Compresses a string into a GZip byte array (UTF-8 encoded).
        public static byte[] Zip(String stringValue)
        {
            var bytes = Encoding.UTF8.GetBytes(stringValue);

            using (var inputStream = new MemoryStream(bytes))
            using (var outputStream = new MemoryStream())
            {
                // The GZipStream must be disposed before reading outputStream,
                // so the final compressed blocks are flushed.
                using (var gzipStream = new GZipStream(outputStream, CompressionMode.Compress))
                {
                    CopyTo(inputStream, gzipStream);
                }

                return outputStream.ToArray();
            }
        }

        // Decompresses a GZip byte array back into a UTF-8 string.
        public static String Unzip(byte[] bytes)
        {
            using (var inputStream = new MemoryStream(bytes))
            using (var outputStream = new MemoryStream())
            {
                using (var gzipStream = new GZipStream(inputStream, CompressionMode.Decompress))
                {
                    CopyTo(gzipStream, outputStream);
                }

                return Encoding.UTF8.GetString(outputStream.ToArray());
            }
        }
    }
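Before wiring this into Redis, a quick round trip verifies the helper works and gives a feel for the compression ratio (a minimal sketch using the GZipHelper class above; repetitive text such as serialized JSON compresses particularly well):

```csharp
using System;
using System.Linq;

class GZipHelperDemo
{
    static void Main()
    {
        // A repetitive JSON-like payload, similar in shape to serialized model data.
        var json = "[" + String.Join(",", Enumerable.Range(0, 1000)
            .Select(i => $"{{\"Id\":{i},\"Name\":\"Sample\"}}")) + "]";

        byte[] compressed = GZipHelper.Zip(json);
        string restored = GZipHelper.Unzip(compressed);

        Console.WriteLine($"{json.Length} chars -> {compressed.Length} bytes");
        Console.WriteLine(restored == json); // True: the round trip is lossless
    }
}
```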

    

With compression and decompression in place, the remaining part is storing the values in Redis.

Since Microsoft recommends the StackExchange.Redis NuGet package, we will use it for the Redis operations of storing and retrieving the compressed value. A full guide to storing and retrieving values with Redis can be found in Microsoft's article: https://docs.microsoft.com/en-us/azure/redis-cache/cache-dotnet-how-to-use-azure-redis-cache.

Note

When storing and pulling larger data from Redis with StackExchange.Redis, make sure you configure a higher value for the connectTimeout parameter in the connection string. The complete list of configuration parameters can be found here: https://stackexchange.github.io/StackExchange.Redis/Configuration.html

The sample in the link above describes simple text store and retrieve operations. The catch is that we cannot store the byte array generated by our GZip helper class as simple text in Redis.

Instead of using the StringSet method, we will store the compressed bytes in a Redis hash with the HashSet method. For this, we are going to use a slightly different approach.

The first thing is the connection setup, as recommended on the Microsoft Docs page. We are going to use a single shared connection created through a Lazy&lt;ConnectionMultiplexer&gt;:

        private static Lazy<ConnectionMultiplexer> lazyConnection = new Lazy<ConnectionMultiplexer>(() =>
        {
            return ConnectionMultiplexer.Connect("myazure.cache.windows.net:6380,password=XXXXXXXXXXXXXXXXXXXXXXXXXX=,ssl=True,abortConnect=False,connectRetry=5,connectTimeout=5000,syncTimeout=100000,defaultDatabase=0");
        });

        public static ConnectionMultiplexer Connection
        {
            get
            {
                return lazyConnection.Value;
            }
        }
    

We are going to need some data to serialize and deserialize, with compression in between. For test purposes, I created a simple POCO class with a few properties that will be populated at runtime:

    public class SampleModel
    {
        public int Id { get; set; }
        public DateTime Time { get; set; }
        public String Guid { get; set; }
    }
    

The data model is in place, so we need to generate some sample data for the test:

            var objectInputData = Enumerable.Range(0, 10000)
                                                    .Select(r => new SampleModel() { Id = r, Guid = Guid.NewGuid().ToString(), Time = DateTime.Now })
                                                    .ToList();
            var jsonInputData = Newtonsoft.Json.JsonConvert.SerializeObject(objectInputData);
            var byteInputData = System.Text.Encoding.UTF8.GetBytes(jsonInputData); //1006830 bytes
    

You may notice the comment next to the byte array variable. It shows the size of the raw data that would go to Redis without compression. Of course, we are not going to use this uncompressed data; we are going to compress it with GZip before we push it to Redis.

As mentioned above, we cannot just use StringSet; instead, we have to store the value in a hash:

            String redisKey = "ZipValue";
            String hashKey = "data";
            IDatabase redisDatabase = Connection.GetDatabase();

            #region Write compressed data to REDIS
            var compressedJsonData = GZipHelper.Zip(jsonInputData); //264542 bytes
            redisDatabase.HashSet(redisKey, new HashEntry[] { new HashEntry(hashKey, compressedJsonData) });
            #endregion
    

From the comment you can see the size of the compressed data, which is roughly four times smaller than the original JSON string. This reduces network transfer and shortens the I/O operation time. Of course, it puts some pressure on the CPU for compression and decompression, but CPU is far easier to scale than network transfer time.
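To judge this trade-off for your own payloads, you can time both directions with a Stopwatch (a rough sketch reusing the GZipHelper and SampleModel classes from earlier; the numbers will vary with hardware and data shape, so no figures are claimed here):

```csharp
using System;
using System.Diagnostics;
using System.Linq;

class CompressionTiming
{
    static void Main()
    {
        // A payload shaped like the serialized sample data above.
        var json = Newtonsoft.Json.JsonConvert.SerializeObject(
            Enumerable.Range(0, 10000)
                .Select(r => new SampleModel { Id = r, Guid = Guid.NewGuid().ToString(), Time = DateTime.Now })
                .ToList());

        var sw = Stopwatch.StartNew();
        var compressed = GZipHelper.Zip(json);
        sw.Stop();
        Console.WriteLine($"Zip:   {sw.ElapsedMilliseconds} ms, {json.Length} chars -> {compressed.Length} bytes");

        sw.Restart();
        var restored = GZipHelper.Unzip(compressed);
        sw.Stop();
        Console.WriteLine($"Unzip: {sw.ElapsedMilliseconds} ms, lossless: {restored == json}");
    }
}
```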

Once we have our data in Redis, we can connect with the Redis CLI and check that the key is there:

>select 0
OK

>dbsize
(integer) 1

>scan 0 count 10
1) "0"
2) 1) "ZipValue"

The last step is to retrieve the data from Redis, decompress it, and deserialize it back into object instances:

            var hashSet = redisDatabase.HashGetAll(redisKey).FirstOrDefault();
            byte[] data = hashSet.Value;
            var decompressedJsonData = GZipHelper.Unzip(data);
            var decompressedObjectData = Newtonsoft.Json.JsonConvert.DeserializeObject<IEnumerable<SampleModel>>(decompressedJsonData);
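
For reuse, the write and read paths can be folded into two small methods (a sketch; the class and method names are illustrative, it assumes the GZipHelper class from earlier, and it needs a reachable Redis server behind the IDatabase handle):

```csharp
using StackExchange.Redis;

public static class CompressedRedisCache
{
    private const String HashField = "data";

    // Compresses the JSON string and stores it as a single hash field.
    public static void SetCompressed(IDatabase redisDatabase, String redisKey, String json)
    {
        var compressed = GZipHelper.Zip(json);
        redisDatabase.HashSet(redisKey, new HashEntry[] { new HashEntry(HashField, compressed) });
    }

    // Reads the hash field back and decompresses it; returns null when the key is missing.
    public static String GetCompressed(IDatabase redisDatabase, String redisKey)
    {
        byte[] compressed = redisDatabase.HashGet(redisKey, HashField);
        return compressed == null ? null : GZipHelper.Unzip(compressed);
    }
}
```

With this in place, the store and retrieve steps above shrink to SetCompressed(redisDatabase, "ZipValue", jsonInputData) and GetCompressed(redisDatabase, "ZipValue").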
    

 


Disclaimer

The purpose of the code contained in snippets or available for download in this article is solely for learning and demo purposes. The author will not be held responsible for any failure or damages caused by any other usage.


About the author

DEJAN STOJANOVIC

Dejan Stojanovic is a passionate Software Architect/Developer. He is highly experienced in the .NET programming platform, including ASP.NET MVC and WebApi. He likes working on new technologies and exciting, challenging projects.
