Deploying an Azure Data Science VM from the command line

This article describes how to deploy an Azure Data Science virtual machine from the command line using Azure CLI 2.0.

Setting up a compute environment for data science work can be challenging for several reasons:

  • Identifying the software you need and setting up all the required open source and other tools can be time-consuming and complex.
  • Figuring out which versions of the individual tools work well together is a hard problem.
  • You could be working with sensitive data that needs to stay within a customer-controlled security boundary. Your trusty desktop/laptop won’t do. Data compliance is an increasingly important and complex area to navigate.

The Azure Data Science virtual machine (DSVM) images provide a quick way to get a data science and machine learning environment on virtual machines without requiring any software installation or configuration.

There are pre-configured images available for Ubuntu, Windows and CentOS. They each come with a comprehensive set of software including Jupyter Notebook Server with R, Python, development tools & IDEs, data movement and management systems, machine learning, deep learning, big data and many other tools.

You have the option of deploying DSVMs interactively through the Azure Portal, but it can save time to simply run a script to deploy a VM in a single step. If you need a client or an operator to deploy the VM infrastructure within their Azure subscription in order to maintain a security boundary, you can provide a script to run without having to document all the interactive steps.

Because the data science VMs are Azure Marketplace images, you can’t just issue an az vm create command and point to the image. You also need to include the Marketplace plan name, product and publisher in the create command.
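If you are not sure which plan values an image expects, they can be looked up from the image URN (publisher:offer:sku:version). The following is a minimal sketch, assuming the az vm image show command with its --urn and --query options; image_urn and show_image_plan are hypothetical helper names used only for illustration.

```shell
#!/bin/bash
# Sketch: look up the Marketplace plan details for an image.
# image_urn and show_image_plan are illustrative helper names (assumptions).

# Join publisher, offer, SKU and version into a colon-separated image URN.
image_urn() {
    printf '%s:%s:%s:%s\n' "$1" "$2" "$3" "$4"
}

# Print the plan section (name, product, publisher) recorded for an image.
# Needs a logged-in Azure CLI; nothing runs until the function is called.
show_image_plan() {
    az vm image show --urn "$1" --query plan --output json
}

# Example (not executed here):
#   show_image_plan "$(image_urn microsoft-ads windows-data-science-vm windows2016 latest)"
```

The values in the plan output are the ones that go into the --plan-name, --plan-product and --plan-publisher arguments of az vm create.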

Here’s an example bash script which uses Azure CLI 2.0 to automate the creation of a Microsoft Windows 2016 Data Science VM with an attached data disk. The script takes the following command line arguments:

  • VM name
  • Azure resource group name
  • Azure data center location
  • VM size (small, medium or large; this selects a suggested Azure VM size and data disk size. Edit the script if you want to pick different sizes)
  • user
  • password

The Azure resource group is created if it does not already exist. This script can be run directly from the Azure Portal in an Azure Cloud Shell.
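The script itself does not check its arguments, so a typo in the size or a short password only surfaces once Azure rejects the request. A minimal sketch of an up-front check is shown here; check_args is a hypothetical helper, not part of the original script, and the 12-character rule is taken from the comment in the script.

```shell
#!/bin/bash
# Sketch: validate the six positional arguments before calling Azure.
# check_args is a hypothetical helper, not part of the original script.
check_args() {
    if [ "$#" -ne 6 ]; then
        echo "usage: create-ds-vm.sh <vm-name> <resource-group> <location> <small|medium|large> <user> <password>" >&2
        return 1
    fi
    # The fourth argument must be one of the three supported sizes.
    case "$4" in
        small|medium|large) ;;
        *) echo "unknown size: $4" >&2; return 1 ;;
    esac
    # The script's own comment says the password must be 12 or more characters.
    if [ "${#6}" -lt 12 ]; then
        echo "password must be at least 12 characters" >&2
        return 1
    fi
    return 0
}
```

One way to use it would be to call check_args "$@" || exit 1 at the top of the script, before any az command runs.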

#!/bin/bash
# script to create a Microsoft Windows 2016 data science VM in Azure
VMNAME=$1
RGNAME=$2
LOCATION=$3  # e.g. westus2
CONFIG=$4    # small|medium|large
USER=$5      # e.g. wpauser
PASS=$6      # must be 12 or more characters

PUB='microsoft-ads'
OFFER='windows-data-science-vm'
SKU='windows2016'
VERSION='latest'

# determine config size
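case "$CONFIG" in
    small)
        SIZE='Standard_D1_v2'
        DATASIZEGB='32'
        ;;
    medium)
        SIZE='Standard_D2_v3'
        DATASIZEGB='256'
        ;;
    large)
        SIZE='Standard_D8_v3'
        DATASIZEGB='1024'
        ;;
    *)
        # stop early instead of calling az with an empty size
        echo "CONFIG must be small, medium or large" >&2
        exit 1
        ;;
esac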

# create the resource group (keeps going if already exists)
az group create --name "$RGNAME" --location "$LOCATION"

# create the VM (variables are quoted so values with spaces or special characters survive)
az vm create \
    --name "$VMNAME" --resource-group "$RGNAME" --image "${PUB}:${OFFER}:${SKU}:${VERSION}" \
    --plan-name "$SKU" --plan-product "$OFFER" --plan-publisher "$PUB" \
    --admin-username "$USER" --admin-password "$PASS" \
    --size "$SIZE" \
    --data-disk-sizes-gb "$DATASIZEGB"

To run this script you need an Azure subscription and an environment with the Azure CLI installed. The simplest way to get both is to create the script as an executable file in your Azure Cloud Shell. For convenience I’ve put the script on GitHub here: https://github.com/gbowerman/data-science/blob/master/CLI/create-ds-vm.sh.

The video below shows an example of running the script in the Cloud Shell once you’ve created the file. To create the file, you could log into your Cloud Shell and run:

curl https://raw.githubusercontent.com/gbowerman/data-science/master/CLI/create-ds-vm.sh > create-ds-vm.sh
chmod +x create-ds-vm.sh
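
With the file in place, running it might look like the sketch below. The VM name, resource group, location and user are placeholder values, and deploy_dsvm is a hypothetical wrapper. The password is read from the terminal so that it does not end up in your shell history.

```shell
#!/bin/bash
# Sketch: call create-ds-vm.sh with placeholder values (all names are assumptions).
deploy_dsvm() {
    local script=$1 password=$2
    # Arguments in order: VM name, resource group, location, size, user, password
    "$script" mydsvm mydsvm-rg westus2 medium wpauser "$password"
}

# Example (not executed here):
#   read -r -s -p 'Admin password: ' PASS; echo
#   deploy_dsvm ./create-ds-vm.sh "$PASS"
```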