How do you automate deployment of a multi-VM application from staging to production?
Two high-level approaches might be:
1. Rolling deployment: update one or more VMs at a time. This is a good approach for avoiding downtime, as only a subset of machines is down during the update. It assumes different versions of the application can coexist. See: https://msftstack.wordpress.com/2016/05/17/how-to-upgrade-an-azure-vm-scale-set-without-shutting-it-down/
2. Blue-green deployment: create a staging cluster and move it to production by swapping the network endpoints. This is a good way to publish an application as a consistent, immutable set, but can involve downtime during the transition.
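To make the blue-green idea concrete, here is a minimal Python sketch (hypothetical class names, not Azure code) of the core mechanism: traffic moves because the public endpoint is repointed at a different environment, not because VMs are updated in place.

```python
# Hypothetical model of a blue-green swap: the router represents the public
# endpoint, and swapping changes which environment receives live traffic.

class Environment:
    def __init__(self, name, version):
        self.name = name
        self.version = version

class Router:
    """Models the public endpoint: it points at exactly one environment."""
    def __init__(self, live, standby):
        self.live = live
        self.standby = standby

    def swap(self):
        # The whole staging deployment goes live as one immutable unit.
        self.live, self.standby = self.standby, self.live

prod = Environment('blue', 'v1')
staging = Environment('green', 'v2')
router = Router(live=prod, standby=staging)
router.swap()
print(router.live.version)  # prints "v2": green now serves production
```

The rest of this article is about implementing that swap step for real Azure load balancers.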
This article describes how to swap the public IP addresses between two Azure Load Balancers in Azure Resource Manager (ARM). You can use this method if, for example, you have two VM scale sets behind load balancers (one production, one staging) and you want to move the staging scale set into production.
The Azure Cloud Service (classic) deployment model included an asynchronous Swap Deployment operation, which was a fast and powerful way to initiate a virtual IP address swap between staging and production environments. Azure Resource Manager doesn't have an equivalent built-in VIP swap function, so if you have a staging environment behind a load balancer and want to swap it with a production environment behind another load balancer, you have to perform the swap in stages.
Since you can't assign a public IP address to another resource until it has been unassigned from its current resource, one way to do this is to create a temporary public IP address to use as a float: assign the float to the production load balancer, assign the freed production IP to the staging load balancer, assign the old staging IP back to the production load balancer, then delete the float.
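The swap order matters because of the one-IP-per-resource constraint. This hedged Python sketch models it with plain dicts standing in for load balancers (names and addresses are placeholders, not real Azure resources):

```python
# Sketch of the temporary-IP swap order described above, using dicts in
# place of Azure load balancers. The constraint being modeled: a public IP
# can be attached to only one resource at a time, so a float is needed.

def vip_swap(lb1, lb2, temp_ip):
    """Swap the frontend IPs of two load balancers via a temporary IP."""
    prod_ip = lb1['frontend_ip']
    staging_ip = lb2['frontend_ip']
    lb1['frontend_ip'] = temp_ip      # free up the production IP
    lb2['frontend_ip'] = prod_ip      # staging takes over production traffic
    lb1['frontend_ip'] = staging_ip   # old production gets the staging IP
    # temp_ip is now unused and can be deleted

lb1 = {'name': 'vipswap1lb', 'frontend_ip': '1.2.3.4'}   # production
lb2 = {'name': 'vipswap2lb', 'frontend_ip': '5.6.7.8'}   # staging
vip_swap(lb1, lb2, '9.9.9.9')
print(lb1['frontend_ip'], lb2['frontend_ip'])  # prints "5.6.7.8 1.2.3.4"
```

Each dict assignment corresponds to an Azure operation that takes real time, which is where the downtime below comes from.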
One caveat to be aware of is that these operations involve some downtime. Unassigning a public IP address can take around 30 seconds (at the time of writing), so total downtime for your app could be at least 60 seconds as the temporary, staging and production IP addresses are moved around.
To test VIP swap in ARM, I created two load balancers in the same resource group and VNet, and associated each one with a different VM scale set. The Azure Resource Manager templates used to set up this infrastructure can be found here: https://github.com/gbowerman/azure-myriad/tree/master/vip-swap
PowerShell VIP swap example
```powershell
# put your load balancer names, resource group and location here
$lb1name = 'vipswap1lb'
$lb2name = 'vipswap2lb'
$rgname = 'vipswap'
$location = 'southcentralus'

# create a new temporary public IP address
"Creating a temporary public IP address"
New-AzureRmPublicIpAddress -Name 'floatip' -ResourceGroupName $rgname -Location $location -AllocationMethod Dynamic
$floatip = Get-AzureRmPublicIpAddress -Name 'floatip' -ResourceGroupName $rgname

# get the LB1 model
$lb1 = Get-AzureRmLoadBalancer -Name $lb1name -ResourceGroupName $rgname
$lb1_ip_id = $lb1.FrontendIpConfigurations[0].PublicIpAddress.Id

# set the LB1 IP address to floatip
"Assigning the temporary public IP address id " + $floatip.Id + " to load balancer " + $lb1name
$lb1.FrontendIpConfigurations[0].PublicIpAddress.Id = $floatip.Id
Set-AzureRmLoadBalancer -LoadBalancer $lb1

# get the LB2 model
$lb2 = Get-AzureRmLoadBalancer -Name $lb2name -ResourceGroupName $rgname
$lb2_ip_id = $lb2.FrontendIpConfigurations[0].PublicIpAddress.Id

# set the LB2 IP address to LB1's old IP
"Assigning the public IP address id " + $lb1_ip_id + " to load balancer " + $lb2name
$lb2.FrontendIpConfigurations[0].PublicIpAddress.Id = $lb1_ip_id
Set-AzureRmLoadBalancer -LoadBalancer $lb2

# set the LB1 IP address to LB2's old IP
"Assigning the public IP id " + $lb2_ip_id + " to load balancer " + $lb1name
$lb1.FrontendIpConfigurations[0].PublicIpAddress.Id = $lb2_ip_id
Set-AzureRmLoadBalancer -LoadBalancer $lb1

# now delete the floatip
"Deleting the temporary public IP address"
Remove-AzureRmPublicIpAddress -Name 'floatip' -ResourceGroupName $rgname -Force
```
Python VIP swap example
Here's a Python example, based on the azurerm REST wrapper library, that follows the same logic and adds some timing code: https://github.com/gbowerman/vmsstools/blob/master/vipswap/vip_swap.py
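The linked script times the swap by probing the endpoint while the IPs move. As a hedged sketch (a hypothetical helper, not the actual vip_swap.py code), one simple way to turn a probe log into a downtime figure is to sum the gaps that begin with a failed probe:

```python
# Hypothetical downtime calculation from a probe log. Each probe is a
# (timestamp_seconds, reachable) pair recorded while the swap runs.

def downtime_seconds(probes):
    """Sum the time intervals that start with a failed probe.

    probes: list of (timestamp_seconds, reachable) pairs in time order.
    """
    total = 0.0
    for (t0, ok), (t1, _) in zip(probes, probes[1:]):
        if not ok:
            total += t1 - t0
    return total

# e.g. one probe per second, endpoint unreachable from t=10 until t=71
probes = [(t, not (10 <= t < 71)) for t in range(0, 120)]
print(downtime_seconds(probes))  # prints 61.0
```

In a real run the probes would come from timed HTTP or TCP requests against the production IP; the resolution of the measurement is the probe interval.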
In this case, the measured downtime was 61 seconds.
If your application requires an immutable deployment approach, for further reading take a look at Spinnaker on Azure, which runs Netflix's open source continuous delivery platform on Azure: http://www.codechannels.com/video/microsoft/azure/host-spinnaker-on-azure/ and associated templates: https://azure.microsoft.com/en-us/resources/templates/?term=Spinnaker.