
Azure Storage: Comparing 'Copy Blob' and 'Put Block From URL'

Azure has two APIs which, on the surface, do very similar things. You give the APIs a source, and they copy the data into the destination storage account (a blob storage account).

Copy Blob

Copy Blob can only copy from a storage account (either within one account or from one account to another), but the source can be a blob or a file. The Copy Blob process is asynchronous, so when you call Copy Blob you get a copy status and a copy id back. You then need to poll to check on the status of your copy operation. The response to a Copy Blob request includes these two headers:

  • x-ms-copy-id - to find out the status of the request, call Get Blob or Get Blob Properties on the destination blob and compare the x-ms-copy-id it returns with this one. You can also pass this id to Abort Copy Blob to cancel the copy.
  • x-ms-copy-status - this shows whether the copy status is success or pending. In practice I have only ever seen pending, or no header at all when an error was returned. The API documentation says that you can get a copy status of success, which means the copy has already completed.
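To make the two headers a bit more concrete, here is a minimal Python sketch (standard library only) that builds, but does not send, a Copy Blob request and the matching Abort Copy Blob request, and pulls the copy id and status out of a response. It assumes both URLs already carry SAS tokens, so no Shared Key signing is done; the x-ms-version value and all the function names are illustrative assumptions, not part of any SDK.

```python
import urllib.request
from typing import Mapping, Optional, Tuple
from urllib.parse import urlencode, urlsplit, urlunsplit

# Assumption: an example service version; use whichever version you target.
API_VERSION = "2018-03-28"


def add_query(url: str, params: Mapping[str, str]) -> str:
    """Append query parameters, keeping any SAS token already on the URL."""
    parts = urlsplit(url)
    extra = urlencode(params)
    query = f"{parts.query}&{extra}" if parts.query else extra
    return urlunsplit(parts._replace(query=query))


def build_copy_blob_request(dest_url: str, source_url: str) -> urllib.request.Request:
    """Copy Blob: a PUT on the destination blob, with the source in x-ms-copy-source."""
    return urllib.request.Request(
        dest_url,
        method="PUT",
        headers={"x-ms-version": API_VERSION, "x-ms-copy-source": source_url},
    )


def build_abort_copy_request(dest_url: str, copy_id: str) -> urllib.request.Request:
    """Abort Copy Blob: comp=copy&copyid=<id> plus x-ms-copy-action: abort."""
    return urllib.request.Request(
        add_query(dest_url, {"comp": "copy", "copyid": copy_id}),
        method="PUT",
        headers={"x-ms-version": API_VERSION, "x-ms-copy-action": "abort"},
    )


def read_copy_response(headers: Mapping[str, str]) -> Tuple[Optional[str], Optional[str]]:
    """Return (x-ms-copy-id, x-ms-copy-status) from a Copy Blob response."""
    return headers.get("x-ms-copy-id"), headers.get("x-ms-copy-status")
```

Sending the request is left to urllib.request.urlopen or any other HTTP client; keep the copy id somewhere safe, because it is the only way to recognise your copy later.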

If you get a copy status of pending, you can call Get Blob Properties on the destination blob, which returns four interesting headers:

  • x-ms-copy-id - compare this with the copy id returned by your Copy Blob request to tell whether the current (or last completed) copy operation is yours. If it is empty, your request has either completed or been aborted, and something else such as a Put Blob or Put Block List (we’ll come back to this later) has happened since your request stopped. There is no way to tell whether your request completed successfully or not. If it holds someone else's id, they are busy doing something, so better leave them to it :)
  • x-ms-copy-progress - if a Copy Blob is in progress, this tells you how far you have left to go, in the form bytes copied/total bytes
  • x-ms-copy-status-description - a description of why the copy is failed or pending (only returned in those states)
  • x-ms-copy-source - the source URL for the copy
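The checks described in these bullets can be sketched as two small Python helpers: one splits x-ms-copy-progress into bytes copied and total bytes, and the other compares the x-ms-copy-id on the destination blob with the id you got back from Copy Blob. The headers argument is assumed to be any mapping of the Get Blob Properties response headers, and the returned labels ("unknown", "not-mine") are made up for illustration.

```python
from typing import Mapping, Tuple


def parse_copy_progress(value: str) -> Tuple[int, int]:
    """x-ms-copy-progress has the form '<bytes copied>/<total bytes>'."""
    copied, total = value.split("/")
    return int(copied), int(total)


def copy_state(headers: Mapping[str, str], my_copy_id: str) -> str:
    """Classify the destination blob's copy state relative to our own copy id."""
    current_id = headers.get("x-ms-copy-id")
    if not current_id:
        # Copy history wiped (e.g. by Put Blob or Put Block List):
        # we can't tell whether our copy succeeded.
        return "unknown"
    if current_id != my_copy_id:
        # Someone else's copy is (or was last) in charge of this blob.
        return "not-mine"
    # Ours: pending, success, aborted or failed.
    return headers.get("x-ms-copy-status", "unknown")
```

For example, parse_copy_progress("1024/4096") gives (1024, 4096), so 3072 bytes are still to be copied.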

The catch with Copy Blob is that someone can abort your request, or it can complete successfully and then someone else can start a new Copy Blob request, wiping out your history, or use Put Blob or Put Block List to make a change, which again wipes out your history.

If you were designing a solution to copy blobs around, you would undoubtedly want to know which requests should be allowed to complete and which should be aborted. We get some help here: along with the copy information, Get Blob Properties returns the blob's MD5 (the Content-MD5 header, when the blob has one), so if we want to ensure a blob was replicated successfully we can check that it has the value we expect and, if not, do something about it.
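The MD5 check might look something like this in Python. It assumes you have the expected content locally (or can stream it) and that the blob actually has a Content-MD5 header; note that the header holds the Base64-encoded MD5 digest, not the hex string. The function names are hypothetical.

```python
import base64
import hashlib
from typing import Iterable, Mapping


def content_md5(chunks: Iterable[bytes]) -> str:
    """Base64-encoded MD5 digest, the format used by the Content-MD5 header."""
    md5 = hashlib.md5()
    for chunk in chunks:  # stream, so large files don't need to fit in memory
        md5.update(chunk)
    return base64.b64encode(md5.digest()).decode("ascii")


def replica_matches(headers: Mapping[str, str], expected_chunks: Iterable[bytes]) -> bool:
    """True only if the blob reports a Content-MD5 equal to the expected one."""
    reported = headers.get("Content-MD5")
    return reported is not None and reported == content_md5(expected_chunks)
```

To hash a local file in chunks you could pass iter(lambda: f.read(1 << 20), b"") for a file opened in binary mode.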

If you need to monitor lots of Copy Blob requests, remember that each Get Blob Properties call counts as one transaction, so polling in a loop every few milliseconds might get you rate limited.
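One way to avoid hammering Get Blob Properties is to back off between polls. This is a minimal sketch assuming a get_properties callable that you supply (for example, a wrapper around a HEAD request on the destination blob) which returns the response headers; the doubling delay, the 30-second cap and the poll limit are arbitrary choices, not Azure recommendations.

```python
import time
from typing import Callable, Mapping


def wait_for_copy(
    get_properties: Callable[[], Mapping[str, str]],
    my_copy_id: str,
    initial_delay: float = 1.0,
    max_delay: float = 30.0,
    max_polls: int = 100,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll with exponential backoff until our copy leaves the 'pending' state."""
    delay = initial_delay
    for _ in range(max_polls):
        headers = get_properties()  # one Get Blob Properties call = one transaction
        if headers.get("x-ms-copy-id") != my_copy_id:
            # History wiped or someone else's copy: we've lost track of ours.
            return "lost-track"
        status = headers.get("x-ms-copy-status", "unknown")
        if status != "pending":
            return status  # success, failed or aborted
        sleep(delay)
        delay = min(delay * 2, max_delay)
    return "still-pending"
```

Passing sleep in as a parameter is just there so the loop can be exercised without actually waiting.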

When it comes to security, Copy Blob has some great options: you can use a SAS key on the URL for either the source or the destination, or, if you are copying a blob within a storage account, you can use a shared key. I have used the account key for the source and a SAS key for the destination, but ideally I would use a SAS key for both the source and the destination.
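Putting a SAS on both URLs is mostly string handling. This sketch assumes you already have SAS tokens (generated in the portal or by an SDK) and just appends one to a blob URL, coping with a leading "?" and with URLs that already have a querystring; with_sas is a hypothetical helper name.

```python
from urllib.parse import urlsplit, urlunsplit


def with_sas(blob_url: str, sas_token: str) -> str:
    """Append a SAS token to a blob URL, with or without a leading '?'."""
    token = sas_token.lstrip("?")
    parts = urlsplit(blob_url)
    query = f"{parts.query}&{token}" if parts.query else token
    return urlunsplit(parts._replace(query=query))
```

Both the x-ms-copy-source value and the destination URL of a Copy Blob request could be built this way, so neither side needs the account key.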

Put Block From URL

Put Block From URL wins the award for the driest-sounding storage API, but it is pretty cool. You give it a source URL (HINT: this could be an S3 bucket or a web page somewhere), and it copies the data into blob storage on your storage account. Wowsers, I’m in love :)

Because the source for Put Block From URL can be any URL, a blob in a storage account can be the source too!

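To use this API you send a PUT request to the destination blob's URL with comp=block and a blockid in the querystring, plus an x-ms-copy-source header containing the URL to copy data from. If the source is in a storage account, then it either needs to be available anonymously or you must include a SAS key in the URL.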
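Here is a minimal Python sketch that builds (but doesn't send) a Put Block From URL request. It assumes the destination URL already carries a SAS token; the helper names, the zero-padded block id scheme and the x-ms-version value (2018-03-28, which I believe is the first version supporting this API) are assumptions. Block ids have to be Base64-encoded and the same length for every block in a blob, and the optional x-ms-source-range header lets you copy just part of the source into this block.

```python
import base64
import urllib.request
from typing import Dict, Optional, Tuple
from urllib.parse import urlencode, urlsplit, urlunsplit

# Assumption: Put Block From URL needs service version 2018-03-28 or later.
API_VERSION = "2018-03-28"


def make_block_id(index: int, width: int = 6) -> str:
    """Base64 of a zero-padded number, so every id in a blob has the same length."""
    return base64.b64encode(f"{index:0{width}d}".encode("ascii")).decode("ascii")


def build_put_block_from_url(
    dest_url: str,
    source_url: str,
    block_id: str,
    source_range: Optional[Tuple[int, int]] = None,
    timeout: Optional[int] = None,
) -> urllib.request.Request:
    params: Dict[str, str] = {"comp": "block", "blockid": block_id}
    if timeout is not None:
        params["timeout"] = str(timeout)  # seconds; the call is synchronous
    parts = urlsplit(dest_url)
    extra = urlencode(params)  # URL-encodes any '=', '+' or '/' in the block id
    query = f"{parts.query}&{extra}" if parts.query else extra
    headers = {
        "x-ms-version": API_VERSION,
        "x-ms-copy-source": source_url,
        "Content-Length": "0",  # no body: the service pulls the data itself
    }
    if source_range is not None:
        start, end = source_range  # inclusive byte offsets
        headers["x-ms-source-range"] = f"bytes={start}-{end}"
    return urllib.request.Request(
        urlunsplit(parts._replace(query=query)), method="PUT", headers=headers
    )
```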

When you send the request you can also include an optional timeout in the querystring, for example timeout=20. The timeout is in seconds, and it matters because the request is synchronous. I haven’t tried using this with a large file that takes a long time to copy, and I wouldn’t expect the TCP/HTTP connection to stay up for days, for example. That said, each call only copies a single block, so a large blob is built up from many calls (a block blob can have up to 50,000 blocks, and at the time of writing the maximum blob size was 4.75 TB); as long as the source system isn't terribly slow, you should get the data within a reasonable amount of time.
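Because each Put Block From URL call fills one block, a big source is copied as a series of byte ranges, which also keeps each synchronous call (and its timeout) short. The sketch below plans those ranges. The block size is your choice and must stay within the per-block limit for Put Block From URL (100 MiB when the API first shipped, as far as I know; check the current docs), and the 50,000-block cap is the block blob limit. plan_ranges is a hypothetical helper.

```python
from typing import List, Tuple

MAX_BLOCKS_PER_BLOB = 50_000  # block blob limit on committed blocks


def plan_ranges(total_size: int, block_size: int) -> List[Tuple[int, int]]:
    """Split [0, total_size) into inclusive (start, end) ranges of at most block_size bytes."""
    if total_size < 0 or block_size <= 0:
        raise ValueError("total_size must be >= 0 and block_size > 0")
    ranges = []
    for start in range(0, total_size, block_size):
        end = min(start + block_size, total_size) - 1
        ranges.append((start, end))
    if len(ranges) > MAX_BLOCKS_PER_BLOB:
        raise ValueError("too many blocks: use a bigger block_size")
    return ranges
```

For a 250-byte source and 100-byte blocks this gives [(0, 99), (100, 199), (200, 249)], and each tuple maps onto an x-ms-source-range header of the form bytes=start-end.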

If you are copying small blobs, a synchronous call could work well for you; Node.js would be happy with it.

Once your request finishes you are not quite done: the blocks copied from the URL are not committed until you (or some other kind soul) call Put Block List on the new blob. Once Put Block List completes, you can go home for the day and tell your family how exciting the Azure storage APIs are.
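Committing is one more PUT, this time with comp=blocklist and an XML body listing the block ids in the order they should appear in the blob. A minimal sketch, again building the request without sending it and assuming a SAS-carrying destination URL; each <Latest> entry asks the service to use the most recent version of that block, whether uncommitted or committed. The function names and the x-ms-version value are assumptions.

```python
import urllib.request
from typing import Iterable
from urllib.parse import urlencode, urlsplit, urlunsplit
from xml.sax.saxutils import escape

API_VERSION = "2018-03-28"  # assumption: example service version


def block_list_xml(block_ids: Iterable[str]) -> bytes:
    """Put Block List body: one <Latest> entry per block, in blob order."""
    entries = "".join(f"<Latest>{escape(block_id)}</Latest>" for block_id in block_ids)
    return f'<?xml version="1.0" encoding="utf-8"?><BlockList>{entries}</BlockList>'.encode("utf-8")


def build_put_block_list(dest_url: str, block_ids: Iterable[str]) -> urllib.request.Request:
    parts = urlsplit(dest_url)
    extra = urlencode({"comp": "blocklist"})
    query = f"{parts.query}&{extra}" if parts.query else extra
    return urllib.request.Request(
        urlunsplit(parts._replace(query=query)),
        data=block_list_xml(block_ids),
        method="PUT",
        # Explicit, otherwise urllib would default to application/x-www-form-urlencoded.
        headers={"x-ms-version": API_VERSION, "Content-Type": "application/xml"},
    )
```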

Which one is better?

I prefer Copy Blob because it feels more logical to me: you create a request and then you wait for it to finish. Put Block From URL gives you a lot of flexibility, since you can get data from anywhere, but I find the whole “do this”, “now commit it” dance a little onerous. Give me a single operation and I’m sold.

It is worth pointing out that not all of the SDKs have Put Block From URL: the .NET SDK only added it recently, in version 9.3.0 (https://github.com/Azure/azure-storage-net/blob/master/changelog.txt). The Node.js SDK doesn’t have it, and who knows what the Java SDK has; we can’t get past the BlobServiceAbstractFactoryFactory to know what the hell it does!

References

Copy Blob: https://docs.microsoft.com/en-us/rest/api/storageservices/copy-blob

Put Block From URL: https://docs.microsoft.com/en-us/rest/api/storageservices/put-block-from-url

Get Blob Properties: https://docs.microsoft.com/en-us/rest/api/storageservices/get-blob-properties

Introducing the async Copy Blob (way back in 2012!): https://blogs.msdn.microsoft.com/windowsazurestorage/2012/06/12/introducing-asynchronous-cross-account-copy-blob/