Anaconda Easy Button – Microsoft SQL Server and Python

Previously there were many twisty roads that you may have followed if you wanted to use Python on a client system to connect to a Microsoft SQL Server database, and not all of those roads would even get you to your destination. With the news that Microsoft SQL Server 2017 has increased support for Python, by including a subset of Anaconda packages on the server-side, I thought it would be useful to demonstrate how Anaconda delivers the easy button to get Python on the client side connected to Microsoft SQL Server.

This blog post demonstrates how Anaconda can be used on the client-side to connect Python running on Windows, Mac, or Linux to a SQL Server instance. The instructions should work for many versions of SQL Server, Python, and Anaconda. If you run into any trouble let us know either through the Anaconda Community Support mailing list or on Twitter @ContinuumIO.

TL;DR: For the Impatient

If you’re the kind of person who just wants the punch line and not the story, there are three core steps to connect to an existing SQL Server database:

  1. Install the SQL Server drivers for your platform on the client system. That is described in the “Client Driver Installation” section below.

  2. conda install pyodbc

  3. Establish a connection to the SQL Server database with an appropriately constructed connection statement:

     conn = pyodbc.connect(
     r'DRIVER={ODBC Driver 13 for SQL Server};' +
     ('SERVER={server},{port};' +
     'DATABASE={database};' +
     'UID={username};' +
     'PWD={password}').format(
     server= 'sqlserver.testnet.corp',
     port= 1433,
     database= 'AdventureWorksDW2012',
     username= 'tanya',
     password= 'Tanya1234')
    )
    

For cut-and-paste convenience, here’s the string:

 (r'DRIVER={ODBC Driver 13 for SQL Server};' +
 ('SERVER={server},{port};' +
 'DATABASE={database};' +
 'UID={username};' +
 'PWD={password}').format(
 server= 'sqlserver.testnet.corp',
 port= 1433,
 database= 'AdventureWorksDW2012',
 username= 'tanya',
 password= 'Tanya1234')
)
'DRIVER={ODBC Driver 13 for SQL Server};SERVER=sqlserver.testnet.corp,1433;DATABASE=AdventureWorksDW2012;UID=tanya;PWD=Tanya1234'

Hopefully that doesn’t look too intimidating!

Here’s the scoop: the Python piece is easy (yay Python!) whereas the challenges are installing the platform-specific drivers (Step 1), and if you don’t already have a database properly setup then the SQL Server installation, configuration, database loading, and setting up appropriate security credentials are the parts that the rest of this blog post are going to go into in more detail. As well as a fully worked out example of client-side connection and query.

And you can grab a copy of this blog post as a Jupyter Notebook from Anaconda Cloud.

On With The Story

While this isn’t meant to be an exhaustive reference for SQL Server connectivity from Python and Anaconda it does cover several client/server configurations. In all cases I was running SQL Server 2016 on a Windows 10 system. My Linux system was CentOS 6.9 based. My Mac was running macOS 10.12.4, and my client-side Windows system also used Windows 10. The Windows and Mac Python examples were using Anaconda 4.3 with Python 3.6 and pyodbc version 3.0, while the Linux example used Anaconda Enterprise, based on Anaconda 4.2, using Python 2.7.

NOTE: In the examples below the $ symbol indicates the command line prompt. Do not include this in any commands if you cut-and-paste. Your prompt will probably look different!

Server Side Preparation

If you are an experienced SQL Server administrator then you can skip this section. All you need to know are:

  1. The hostname or IP address and port number for your SQL Server instance
  2. The database you want to connect to
  3. The user credentials that will be used to make the connection

The following provides details on how to setup your SQL Server instance to be able to exactly replicate the client-side Python-based connection that follows. If you do not have Microsoft SQL Server it can be downloaded and installed for free and is now available for Windows and Linux. NOTE: The recently released SQL Server 2017 and SQL Server on Azure both require pyodbc version >= 3.2. This blog post has been developed using SQL Server 2016 with pyodbc version 3.0.1.

This demonstration is going to use the Adventure Works sample database provided by Microsoft on CodePlex. There are instructions on how to install this into your SQL Server instance in Step 3 of this blog post, however you can also simply connect to an existing database by adjusting the connection commands below accordingly.

Many of the preparation steps described below are most easily handled using the SQL Server Management Studio which can be downloaded and installed for free.

Additionally this example makes use of the Mixed Authentication Mode which allows SQL Server-based usernames and passwords for database authentication. By default this is not enabled, and only Windows Authentication is permitted, which makes use of the pre-existing Kerberos user authentication token wallet. It should go without saying that you would only change to Mixed Authentication Mode for testing purposes if SQL Server is not already so configured. While the example below focuses on SQL Server Authentication there are also alternatives presented for the Windows Authentication Mode that uses Kerberos tokens.

You should know the hostname and port on which SQL Server is running and be sure you can connect to that hostname (or IP address) and port from the client-side system. The easiest way to test this is with telnet from the command line excuting the command:

$ telnet sqlserver.testnet.corp 1433

Where you would replace sqlserver.testnet.corp with the hostname or IP address of your SQL Server instance and 1433 with the port SQL Server is running on. Port 1433 is the SQL Server default. Executing this command on the client system should return output like:

Trying sqlserver.testnet.corp...
Connected to sqlserver.testnet.corp.
Escape character is '^]'.

telnet> close

At which point you can then type CTRL-] and then close.

If your client system is also Windows you can perform this simple Universal Data Link (UDL) test.

Finally you will need to confirm that your have access credentials for a user known by SQL Server and that the user is permitted to perform SELECT operations (and perhaps others) on the database in question. In this particular example we are making use of a fictional user named Tanya who has the username tanya and a password of Tanya1234. It is a 3 step process to get Tanya access to the Adventure Works database:

  1. The SQL Server user tanya is added as a Login to the DBMS:

    • which can be found in SQL Server Management Studio
    • under Security->Logins
    • right-click on Logins
    • add a New Login…
    • provide a Login name of tanya
    • select SQL Server authentication
    • provide a password of Tanya1234
    • uncheck the option for Enforce password policy (the other two will automatically be unchecked and greyed out)
  2. The database user needs to be added:

    • under Databases->AdventureWorksDW2012->Security->Users
    • right-click on Users
    • add a New User…
    • select SQL user with Login for type
    • add a Username set to tanya
    • add a Login name set to tanya.
  3. Grant tanya permissions on the AdventureWorksDW2012 database by executing the following query:

    use AdventureWorksDW2012; 
    GRANT SELECT, INSERT, DELETE, UPDATE ON SCHEMA::DBO TO tanya;
    

Using the UDL test method described above is a good way to confirm that tanya can connect to the database, even just from localhost, though it does not confirm if she can perform operations such as SELECT. For that I would recommend installing the free Microsoft Command Line Utilities 13.1 for SQL Server

Client Driver Installation

You’ll now need to get the drivers installed on your client system. There are two parts to this: the platform specific dynamic libraries for SQL Server, and the Python ODBC interface. You won’t be surprised to hear that the platform-specific libraries are harder to get setup, but this should still only take 10-15 minutes and there are established processes for all major operating systems.

Linux

The Linux drivers are avaialble for RHEL, Ubuntu, and SUSE. This is the process that you’d have to follow if you are using Anaconda Enterprise, as well as for anyone using a Linux variant with Anaconda installed.

My Linux test system was using CentOS 6.9, so I followed the RHEL6 installation procedure from Microsoft (linked above), which essentially consisted of 3 steps:

  1. Adding the Microsoft RPM repository to the yum configuration
  2. Pre-emptively removing some packages that may cause conflicts (I didn’t have these installed)
  3. Using yum to install the msodbcsql RPM for version 13.1 of the drivers

In my case I had to play around with the yum command and in the end just doing:

$ ACCEPT_EULA=Y yum install msodbcsql

Windows

The Windows drivers are dead easy to install. Download the msodbcsql MSI file, double click to install, and you’re in business.

Mac

The SQL Server drivers for Mac need to be installed via Homebrew which is a popular OS X package manager, though not affiliated with nor supported by Apple. Microsoft has created their own tap which is a Homebrew package repository. If you don’t already have Homebrew installed you’ll need to do that, then Microsoft have provided some simple instructions describing how to add the SQL Server tap and then install the mssql-tools package. The steps are simple enough I’ll repeat them here, though check out that link above if you need more details or background.

$ brew tap microsoft/mssql-preview https://github.com/Microsoft/homebrew-mssql-preview
$ brew update
$ brew install mssql-tools

One thing I’ll note is that the Homebrew installation output suggested I should execute a command to remove one of the SQL Server drivers. Don’t do this! That driver is required. If you’ve already done it then the way to correct the process is to reset the configuration file by removing and re-adding the package:

$ brew remove mssql-tools
$ brew install mssql-tools

Install Anaconda

Download and install Anaconda if you don’t already have it on your system. There are graphical and command line installers available for Windows, Mac, and Linux. It is about 400 MB to download and a bit over 1 GB installed. If you’re looking for a minimal system you can install Miniconda instead (command line only installer) and then a la carte pick the packages you want with conda install commands.

Anaconda users or administrators can simply execute the commands below in the conda environment where they want pyodbc to be available.

Python ODBC package

This part is easy. You can just do:

$ conda install pyodbc

And if you’re not using Anaconda or prefer pip, then you can also do:

$ pip install pyodbc

NOTE: If you are using the recently released SQL Server 2017 you will need pyodbc >= 3.2. There should be a conda package available for that “shortly” but be sure to check which version you get if you use the conda command above.

Connecting to SQL Server using pyodbc

Now that you’ve got your server-side and client-side systems setup with the correct software, databases, users, libraries, and drivers it is time to connect. If everything works properly these steps are very simple and work for all platforms. Everything that is platform-specific has been handled elsewhere in the process.

We start by importing the common pyodbc package. This is Microsoft’s recommended Python interface to SQL Server. There was an alternate Python interface pymssql that at one point was more reliable for SQL Server connections and even until quite recently was the only way to get Python on a Mac to connect to SQL Server, however with Microsoft’s renewed support for Python and Microsoft’s own Mac Homebrew packages it is now the case that pyodbc is the leader for all platforms.

import pyodbc

Use a Python dict to define the configuration parameters for the connection

config = dict(server= 'sqlserver.testnet.corp', # change this to your SQL Server hostname or IP address
 port= 1433, # change this to your SQL Server port number [1433 is the default]
 database= 'AdventureWorksDW2012',
 username= 'tanya',
 password= 'Tanya1234')

Create a template connection string that can be re-used.

conn_str = ('SERVER={server},{port};' +
 'DATABASE={database};' +
 'UID={username};' +
 'PWD={password}')

If you are using the Windows Authentication mode where existing authorization tokens are picked up automatically this connection string would be changed to remove UID and PWD entries and replace them with TRUSTED_CONNECTION, as below:

trusted_conn_str = ('SERVER={server};' +
 'DATABASE={database};' +
 'TRUSTED_CONNECTION=yes')

Check your configuration looks right:

config
{'database': 'AdventureWorksDW2012',
 'password': 'Tanya1234',
 'port': 1433,
 'server': 'sqlserver.testnet.corp',
 'username': 'tanya'}

Now open a connection by specifying the driver and filling in the connection string with the connection parameters.

The following connection operation can take 10s of seconds to complete.

conn = pyodbc.connect(
 r'DRIVER={ODBC Driver 13 for SQL Server};' +
 conn_str.format(**config)
 )

Executing Queries

Request a cursor from the connection that can be used for queries.

cursor = conn.cursor()

Perform your query.

cursor.execute('SELECT TOP 10 EnglishProductName FROM dbo.DimProduct;')
<pyodbc.Cursor at 0x7f7ca4a82d50>

Loop through to look at the results (an iterable of 1-tuples, containing unicodde strings of the results).

for entry in cursor:
 print(entry)
(u'Adjustable Race', )
(u'Bearing Ball', )
(u'BB Ball Bearing', )
(u'Headset Ball Bearings', )
(u'Blade', )
(u'LL Crankarm', )
(u'ML Crankarm', )
(u'HL Crankarm', )
(u'Chainring Bolts', )
(u'Chainring Nut', )

Data Science Happens Here

Now that we’ve demonstrated how to connect to a SQL Server instance from Windows, Mac, and Linux using Anaconda it is possible to use T-SQL queries to interact with that database as you normally would.

Looking to the future, the latest preview release of SQL Server 2017 includes a server-side Python interface built around Anaconda. There are lots of great resources on Python and SQL Server connectivity from the team at Microsoft, and here are a few that you may find particularly interesting:

Next Steps

My bet is that if you’re reading a blog post on SQL Server and Python (and you can download a Notebook version of it here), then you’re using it in a commercial context. Anaconda is going to be the best way for you and your organization to make a strategic investment in open data science.

See how Anaconda is transforming data science through our webinar series or grab one of our white papers on enterprise open data science.

Let us help you be successful in your strategic adoption of Python and Anaconda for high-performance enterprise-oriented open data science connected to your existing data sources and systems, such as SQL Server.

Talk to an Expert

Talk to one of our financial services and banking industry experts to find solutions for your AI journey.

Talk to an Expert