Friday, November 15, 2013

Mongo Style Capped Collections In MS SQL Server

Suppose you want to store temporary data in MS SQL Server. In general it's not a good practice because of relatively low performance. Although when expected load isn't too high it can be desirable. The lifetime of each piece of data is short but you create the pieces often therefore you need to find a way of keeping the size of the storage small.

One possible way of solving this is removing multiple outdated records over a time interval using SQL Server Agent or a special console app launched as a service. The downside here is that we introduce one extra dependency which needs to be configured on every server you want to use your storage.

The second option is using a special data structure which prevents your storage from overflow. Mongo capped collections (which are basically circular buffers) are a good example of this kind of data structures. Unfortunately there is no anything similar in MS SQL Server but it's rather easy to build one.

Let's start by defining a table:

CREATE TABLE [dbo].[Storage] (
 [Key] int IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
 [Data] varbinary(MAX),
 [Version] rowversion NOT NULL
)

Retrieving data from storage by key should be fast so a clustered index on the Key column is very handy here. We can create an index on rowversion to find records quickly during updates but this should be done carefully. The amount of reads and writes is almost the same so while we gain some performance boost on reads we lose it on writes.

Next we should decide should we initialize the storage with empty values or not. I think it is beneficial because if the entire storage is filled we won't have to choose between an update and insert operation every time we want to store some bytes. We simply update the oldest record with a new value.

declare @MaxItems int = 999,
 @ItemIndex int = 0
while @ItemIndex < @MaxItems
begin
 INSERT INTO [dbo].[Storage] ([Data]) VALUES(null)
 SET @ItemIndex = @ItemIndex + 1
end

This will populate our storage with empty data. We use auto increment for simplicity; in fact keys can be generated whatever we want.

Last but not least we need to create a mechanism for inserting data to the storage. The following stored procedure will do the job:

CREATE PROCEDURE [dbo].[insertToStorage]
 @Data varbinary(MAX)
AS
BEGIN
 UPDATE [dbo].[Storage] WITH (UPDLOCK, READPAST)
 SET [Data] = @Data
 OUTPUT inserted.[Key]
 WHERE [Version] = (SELECT MIN([Version]) FROM [dbo].[Storage])
END

No comments:

Post a Comment