This article describes how to deploy an Azure Data Science virtual machine from the command line using Azure CLI 2.0.
Setting up a compute environment for data science work can be challenging for several reasons:
- Identifying the software you need, and setting up all the required open source and other tools can be time-consuming and complex.
- Figuring out all the versions of individual tools which play nicely together is a hard problem.
- You could be working with sensitive data that needs to stay within a customer-controlled security boundary. Your trusty desktop/laptop won’t do. Data compliance is an increasingly important and complex area to navigate.
The Azure Data Science virtual machine (DSVM) images provide a quick way to get a data science and machine learning environment on virtual machines without requiring any software installation or configuration.
There are pre-configured images available for Ubuntu, Windows and CentOS. They each come with a comprehensive set of software including Jupyter Notebook Server with R, Python, development tools & IDEs, data movement and management systems, machine learning, deep learning, big data and many other tools.
You have the option of deploying DSVMs interactively through the Azure Portal, but it can save time to simply run a script to deploy a VM in a single step. If you need a client or an operator to deploy the VM infrastructure within their Azure subscription in order to maintain a security boundary, you can provide a script to run without having to document all the interactive steps.
Because the Data Science VMs are Azure Marketplace images, you can’t just run an az vm create command and point it at the image. You also need to pass the Marketplace plan name, product and publisher in the create command (the --plan-name, --plan-product and --plan-publisher arguments).
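The relationship between the image reference and the plan arguments can be sketched as a small bash helper. This is an illustration only: marketplace_args is a hypothetical name, and it assumes, as the script later in this article does, that the plan name matches the image SKU and the plan product matches the offer. Whether a given publisher, offer and SKU is currently published in the Marketplace is not something this sketch checks.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original script): print the image and
# plan arguments that az vm create needs for a Marketplace image.
# Assumption: plan name == SKU and plan product == offer, as in this article's script.
marketplace_args() {
  local pub=$1 offer=$2 sku=$3 version=${4:-latest}
  echo "--image $pub:$offer:$sku:$version" \
       "--plan-publisher $pub --plan-product $offer --plan-name $sku"
}

# Values taken from the script in this article
marketplace_args microsoft-ads windows-data-science-vm windows2016
```

The helper only prints the arguments, so you can inspect them before pasting them into a create command.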
Here’s an example bash script that uses Azure CLI 2.0 to automate the creation of a Microsoft Windows 2016 Data Science VM with an attached data disk. The script takes the following command line arguments:
- VM name
- Azure resource group name
- Azure data center location
- VM size (small, medium or large; each maps to an Azure VM size and a data disk size, edit the script if you want to pick different sizes)
- Admin user name
- Admin password (must be 12 or more characters)
The Azure resource group is created if it does not already exist. This script can be run directly from the Azure Portal in an Azure Cloud Shell.
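The script reads six positional arguments ($1 through $6): VM name, resource group, location, size, admin user name and admin password. A pre-flight check along the following lines could catch mistakes before any Azure resources are created. It is a sketch, not part of the original script: check_args is a hypothetical name, and the only password rule it enforces is the 12-character minimum noted in the script's own comment.

```shell
#!/bin/bash
# Hypothetical pre-flight check (not part of the original script).
# Returns 0 when the six arguments look usable, 1 otherwise.
check_args() {
  if [ "$#" -ne 6 ]; then
    echo "usage: create-ds-vm.sh VMNAME RGNAME LOCATION small|medium|large USER PASS" >&2
    return 1
  fi
  case $4 in
    small|medium|large) ;;
    *) echo "size must be small, medium or large" >&2; return 1 ;;
  esac
  # The script's comment says the password must be 12 or more characters
  if [ "${#6}" -lt 12 ]; then
    echo "password must be 12 or more characters" >&2
    return 1
  fi
}
```

Calling check_args "$@" || exit 1 near the top of the script would stop it early on bad input.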

    #!/bin/bash
    # script to create a Microsoft Windows 2016 data science VM in Azure
    # usage: create-ds-vm.sh VMNAME RGNAME LOCATION CONFIG USER PASS
    VMNAME=$1
    RGNAME=$2
    LOCATION=$3   # e.g. westus2
    CONFIG=$4     # small|medium|large
    USER=$5       # e.g. wpauser
    PASS=$6       # must be 12 or more characters

    # Marketplace image details; the plan arguments below reuse them
    PUB='microsoft-ads'
    OFFER='windows-data-science-vm'
    SKU='windows2016'
    VERSION='latest'

    # determine config size
    case $CONFIG in
      small)
        SIZE='Standard_D1_v2'
        DATASIZEGB='32'
        ;;
      medium)
        SIZE='Standard_D2_v3'
        DATASIZEGB='256'
        ;;
      large)
        SIZE='Standard_D8_v3'
        DATASIZEGB='1024'
        ;;
      *)
        # stop here rather than call az with an empty size
        echo "CONFIG must be small, medium or large" >&2
        exit 1
        ;;
    esac

    # create the resource group (keeps going if already exists)
    az group create --name "$RGNAME" --location "$LOCATION"

    # create the VM (variables are quoted so special characters in the password survive)
    az vm create \
      --name "$VMNAME" --resource-group "$RGNAME" --image "$PUB:$OFFER:$SKU:$VERSION" \
      --plan-name "$SKU" --plan-product "$OFFER" --plan-publisher "$PUB" \
      --admin-username "$USER" --admin-password "$PASS" \
      --size "$SIZE" \
      --data-disk-sizes-gb "$DATASIZEGB"

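To see which VM size and data disk a given size argument selects without creating anything, the mapping can be pulled into a dry-run helper. This is a sketch under stated assumptions: preview_vm_create is a hypothetical name, the size table is copied from the script above, and the image, plan and admin arguments are left out for brevity, so the printed line is only the size-dependent part of the command.

```shell
#!/bin/bash
# Hypothetical dry-run helper (not part of the original script): print the
# size-dependent part of the az vm create command instead of running it.
preview_vm_create() {
  local vmname=$1 rgname=$2 config=$3 size datasizegb
  case $config in
    small)  size='Standard_D1_v2'; datasizegb='32' ;;
    medium) size='Standard_D2_v3'; datasizegb='256' ;;
    large)  size='Standard_D8_v3'; datasizegb='1024' ;;
    *) echo "unknown size: $config" >&2; return 1 ;;
  esac
  echo "az vm create --name $vmname --resource-group $rgname" \
       "--size $size --data-disk-sizes-gb $datasizegb"
}

# Placeholder names; nothing is created
preview_vm_create myvm myrg medium
```

Because nothing here calls az, it is safe to run anywhere bash is available.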
To run this script you need an Azure subscription and an environment with the Azure CLI installed. The simplest way to do this is to create the script as an executable file in your Azure Cloud Shell. For convenience I’ve put the script on GitHub here: https://github.com/gbowerman/data-science/blob/master/CLI/create-ds-vm.sh.
The video below shows an example of running the script in the Cloud Shell once you’ve created the file. To create the file you could log in to your Cloud Shell and run:

    curl https://raw.githubusercontent.com/gbowerman/data-science/master/CLI/create-ds-vm.sh > create-ds-vm.sh
    chmod +x create-ds-vm.sh

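Because the curl output is redirected straight into the file, a wrong URL or a network error can leave you with an error message saved under the script's name. A quick sanity check, sketched below, is to confirm the first line is the expected shebang before running the file. The function name has_bash_shebang is hypothetical, and the check assumes the script starts with #!/bin/bash as shown in this article.

```shell
#!/bin/bash
# Hypothetical check (not part of the original article): succeed only if the
# file's first line is the bash shebang used by the script in this article.
has_bash_shebang() {
  local file=$1 first_line
  [ -r "$file" ] || return 1
  first_line=$(head -n 1 "$file")
  [ "$first_line" = '#!/bin/bash' ]
}

if has_bash_shebang create-ds-vm.sh; then
  echo "create-ds-vm.sh looks like a bash script"
else
  echo "create-ds-vm.sh is missing or does not start with #!/bin/bash"
fi
```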