syedhassaanahmed/log-storage-service

log-storage-service

Problem

Background:

An automated test framework runs tens of thousands of tests per week, all of which generate various output files. These files are zipped and uploaded to this service, which is hosted on Azure.

Requirements:

  • An App Service with a Controller written in C#.
  • A URL to which zipped files can be sent using a RESTful API, i.e. HTTP requests.
  • On success, the API returns an appropriate status code, with a URL to the unzipped contents in the response body.
  • It supports load balancing (e.g. Azure Traffic Manager).
  • Unit tests

Solution


Deploy to Azure

Setup:

  • Visual Studio 2017
  • Azure Storage Emulator for local development and executing StorageServiceTests
  • NOTE: If debugging the app in a container, Azurite is used instead of Azure Storage Emulator.

Design choices:

  • ASP.NET Core 1.1 was selected since it's the latest, greatest and fastest ASP.NET yet!
  • HTTP PUT is preferred over POST due to its idempotence (see the OneDrive API).
  • ContentType is currently limited to application/zip but can easily be modified/extended.
  • Solution is optimized for writes hence zip is directly being put in blob storage. Reads can be optimized by using ETag, response caching, CDN or even Redis Cache.
  • Each blob has associated metadata describing its inner files (name, length, lastModified).
  • Block Blobs were preferred over Page Blobs since they allow multiple blocks to be uploaded in parallel, decreasing upload time.
  • Blob Storage request options (SingleBlobUploadThresholdInBytes and ParallelOperationThreadCount) can be configured in App settings and do not require an app restart to take effect.
  • Upload file size is enforced at the IIS level (maxAllowedContentLength is currently set to 60MB in Web.config).
  • Static Files Middleware is used for serving files inside zip archive. This way we get some functionality for free.
  • For each upload, an MD5 hash is computed and stored in the Blob Properties. It is verified on each download. On a dual-core i7 CPU it takes ~150ms to compute the hash of a 60MB file. If raw performance is needed, this check can be turned off.
  • Controllers are tested using TestHost so that authorization, routes and request headers can also be asserted.
  • Solution has naive implementation of security using Basic Authentication with hardcoded Claims-Based Authorization (Test Credentials can be configured in App settings).
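To illustrate the PUT-based upload flow described above, here is a minimal controller sketch. The `IStorageService` abstraction and `UploadAsync` method are hypothetical placeholders for illustration, not the repository's actual types:

```csharp
using System.Threading.Tasks;
using Microsoft.AspNetCore.Mvc;

// Hypothetical storage abstraction; the real service talks to Azure Blob Storage.
public interface IStorageService
{
    Task<string> UploadAsync(string fileName, System.IO.Stream content);
}

[Route("api/[controller]")]
public class LogsController : Controller
{
    private readonly IStorageService _storage;

    public LogsController(IStorageService storage)
    {
        _storage = storage;
    }

    // PUT is idempotent: re-sending the same archive produces the same result.
    [HttpPut("{fileName}")]
    public async Task<IActionResult> Put(string fileName)
    {
        // ContentType is currently limited to application/zip.
        if (Request.ContentType != "application/zip")
            return StatusCode(415); // Unsupported Media Type

        var url = await _storage.UploadAsync(fileName, Request.Body);
        return Created(url, url); // 201 with a URL to the unzipped contents
    }
}
```

Because PUT replaces the resource at a known URL, a client that retries after a network failure cannot accidentally create duplicate archives, which matters at tens of thousands of tests per week.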
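The MD5 integrity check above can be sketched as follows; the hash is assumed to be stored in the Base64 form that Azure Blob Storage uses for Content-MD5 (the helper name is hypothetical):

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Hypothetical helper: compute the hash once on upload, store it in
// Blob Properties, then re-compute and compare on each download.
static class ContentHash
{
    public static string ComputeMd5(Stream content)
    {
        using (var md5 = MD5.Create())
        {
            // Base64 matches the Content-MD5 format used by Blob Storage.
            return Convert.ToBase64String(md5.ComputeHash(content));
        }
    }
}
```

On download, comparing the freshly computed value against the stored one detects corruption in transit or at rest; skipping the comparison is what "turning the check off" amounts to.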
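The IIS upload limit mentioned above would be configured along these lines in Web.config (a sketch; 62914560 is 60MB expressed in bytes, and the surrounding elements are the standard IIS request-filtering sections):

```xml
<system.webServer>
  <security>
    <requestFiltering>
      <!-- maxAllowedContentLength is in bytes: 60 * 1024 * 1024 = 62914560 -->
      <requestLimits maxAllowedContentLength="62914560" />
    </requestFiltering>
  </security>
</system.webServer>
```

Requests larger than this are rejected by IIS before they reach the controller, so the limit needs no application-level enforcement.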

Assumptions:

  • Re-uploading a zip file with the same name will overwrite the existing one.

Future improvements:

  • Instead of writing directly to Blob storage, store archives on the local disk first and let a WebJob upload them to Blob storage. Download requests in the meantime can be served from disk.
  • Multiple file uploads using multipart/form-data would add flexibility for the API consumer; however, partial failures must then be handled.
  • Protect the API by replacing Basic Authentication and hardcoded Authorization with something more secure (e.g. Azure AD).
  • Resumable upload if test zip outputs are expected to be large.
  • LZ4 for transferring zips: LZ4 scales with multi-core CPUs. Benchmarks suggest it offers fast compression/decompression at the expense of memory.
  • Blob metadata has a total size limit of 8KB, including both names and values. Consider alternatives if many inner files are expected inside the archive.
  • Consider keeping fresh archives in hot tier while moving older ones to cool tier.