tl;dr Blaze migrates data efficiently between a variety of data stores.
In our last post on Blaze expressions we showed how Blaze can execute the same tabular query on a variety of computational backends. However, this ability is only useful if you can migrate your data to the new computational system in the first place.
To help with this, Blaze provides the into function, which moves data from one container type to another. In this post, we will go over how to perform these data migrations. If you want to see this and other functions in action, sign up for the Getting Started with Blaze webinar on October 8th.
The into function takes two arguments, a and b, and it puts the data in b into a container like a. For example, suppose we have the classic iris dataset in a CSV file (iris.csv includes measurements and species of various flowers):
$ head iris.csv
SepalLength,SepalWidth,PetalLength,PetalWidth,Species
5.1,3.5,1.4,0.2,setosa
4.9,3.0,1.4,0.2,setosa
4.7,3.2,1.3,0.2,setosa
4.6,3.1,1.5,0.2,setosa
5.0,3.6,1.4,0.2,setosa
5.4,3.9,1.7,0.4,setosa
4.6,3.4,1.4,0.3,setosa
5.0,3.4,1.5,0.2,setosa
4.4,2.9,1.4,0.2,setosa
We can load this CSV file into a Python list, a NumPy array, and a Pandas DataFrame, all using the into function.
List <— CSV
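With Blaze, this conversion is a single into call with list as the target. As a rough illustration of the result, and not Blaze's own code, the following sketch reads the file into a list of tuples using only the standard library. The helper name csv_to_list and the temporary sample file are assumptions made so the example runs on its own.

```python
import csv
import os
import tempfile

# Hypothetical sample: the first rows of iris.csv, written to a temp file
# so that the sketch is self-contained.
SAMPLE = """SepalLength,SepalWidth,PetalLength,PetalWidth,Species
5.1,3.5,1.4,0.2,setosa
4.9,3.0,1.4,0.2,setosa
4.7,3.2,1.3,0.2,setosa
"""


def csv_to_list(path):
    """Read a CSV file into a list of tuples, skipping the header row.

    Assumes every column except the last one is numeric.
    """
    rows = []
    with open(path, newline="") as f:
        reader = csv.reader(f)
        next(reader)  # skip the header line
        for row in reader:
            *measurements, species = row
            rows.append(tuple(float(x) for x in measurements) + (species,))
    return rows


path = os.path.join(tempfile.mkdtemp(), "iris.csv")
with open(path, "w") as f:
    f.write(SAMPLE)

iris_list = csv_to_list(path)
print(iris_list[0])
```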
NumPy <— CSV
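For a sense of what the resulting array looks like, here is a minimal sketch that calls NumPy's genfromtxt directly rather than going through Blaze. The sample file and variable names are assumptions for illustration; the result is a structured array with one named field per column.

```python
import os
import tempfile

import numpy as np

# Hypothetical sample: the first rows of iris.csv, written to a temp file.
SAMPLE = """SepalLength,SepalWidth,PetalLength,PetalWidth,Species
5.1,3.5,1.4,0.2,setosa
4.9,3.0,1.4,0.2,setosa
4.7,3.2,1.3,0.2,setosa
"""

path = os.path.join(tempfile.mkdtemp(), "iris.csv")
with open(path, "w") as f:
    f.write(SAMPLE)

# names=True takes field names from the header row;
# dtype=None asks NumPy to infer a type for each column.
iris_array = np.genfromtxt(
    path, delimiter=",", names=True, dtype=None, encoding="utf-8"
)

print(iris_array.dtype.names)
print(iris_array["SepalLength"])
```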
Pandas <— CSV
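The equivalent direct call in Pandas is read_csv. The sketch below shows the DataFrame that such a migration yields; the sample file is an assumption so the example runs standalone, and the Blaze call itself is not reproduced here.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample: the first rows of iris.csv, written to a temp file.
SAMPLE = """SepalLength,SepalWidth,PetalLength,PetalWidth,Species
5.1,3.5,1.4,0.2,setosa
4.9,3.0,1.4,0.2,setosa
4.7,3.2,1.3,0.2,setosa
"""

path = os.path.join(tempfile.mkdtemp(), "iris.csv")
with open(path, "w") as f:
    f.write(SAMPLE)

# read_csv infers column names from the header and a dtype per column.
iris_df = pd.read_csv(path)
print(iris_df.head())
```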
Again, Blaze isn’t doing any of the work; it just calls out to the read_csv function of the appropriate library with the right inputs.
The cases above, such as moving a CSV file into a Pandas DataFrame, are generally well known to Python-savvy data scientists. Where Blaze adds real value is in extending this operation to other powerful yet less well understood backends. We demonstrate this breadth below by moving data between some of these more exotic backends.
SQL <— CSV
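To make the CSV-to-SQL step concrete, here is a hand-written sketch of the same kind of migration using only the standard library's csv and sqlite3 modules. It is not how Blaze does it; it uses plain INSERT statements for clarity. The helper name csv_to_sqlite, the column types, and the in-memory database are all assumptions for illustration.

```python
import csv
import os
import sqlite3
import tempfile

# Hypothetical sample: the first rows of iris.csv, written to a temp file.
SAMPLE = """SepalLength,SepalWidth,PetalLength,PetalWidth,Species
5.1,3.5,1.4,0.2,setosa
4.9,3.0,1.4,0.2,setosa
4.7,3.2,1.3,0.2,setosa
"""


def csv_to_sqlite(csv_path, conn, table, types):
    """Create `table` from the CSV header and insert every data row.

    `types` lists one SQL column type per CSV column (assumed, not inferred).
    """
    with open(csv_path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        columns = ", ".join(f'"{name}" {typ}' for name, typ in zip(header, types))
        placeholders = ", ".join("?" for _ in header)
        conn.execute(f'CREATE TABLE "{table}" ({columns})')
        # executemany consumes the reader lazily, one row at a time.
        conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    conn.commit()


path = os.path.join(tempfile.mkdtemp(), "iris.csv")
with open(path, "w") as f:
    f.write(SAMPLE)

conn = sqlite3.connect(":memory:")
csv_to_sqlite(path, conn, "iris", ["REAL"] * 4 + ["TEXT"])

first_row = conn.execute(
    'SELECT SepalLength, Species FROM iris ORDER BY rowid LIMIT 1'
).fetchone()
print(first_row)
```

Because the numeric columns are declared REAL, SQLite converts the text values read from the file into floating-point numbers on insert.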
MongoDB <— Pandas
The into function doesn’t just work with CSV files. We can use it to convert between any pair of data types.
And to demonstrate that it’s there
BColz <— MongoDB
Finally, we migrate from a Mongo database to a BColz out-of-core array.
Robustness and Performance
Blaze leverages known solutions where they exist. For example, when migrating from CSV files to SQL databases, we use the fast, built-in loaders for that particular database.
Blaze manages solutions where they don’t exist. For example, when migrating from MongoDB to a BColz out-of-core array, we stream the database through Python, translating types as necessary.
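The streaming approach can be sketched in plain Python. This is a simplified illustration under stated assumptions, not Blaze's implementation: a generator of dictionaries stands in for a MongoDB cursor, plain lists stand in for on-disk columns, and the names chunks, CONVERTERS, and stream_into_columns are hypothetical.

```python
from itertools import islice


def chunks(iterable, size):
    """Yield successive lists of at most `size` items from `iterable`."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


# Hypothetical translation table: target column name -> type converter.
CONVERTERS = {"SepalLength": float, "Species": str}


def stream_into_columns(records, converters, chunk_size=2):
    """Stream dict-like records into per-column lists, one chunk at a time.

    Only `chunk_size` records are held in memory besides the output columns.
    """
    columns = {name: [] for name in converters}
    for chunk in chunks(records, chunk_size):
        for name, convert in converters.items():
            columns[name].extend(convert(record[name]) for record in chunk)
    return columns


# Stand-in for documents coming out of a database cursor; values arrive as
# strings and carry an extra "_id" field that the target does not need.
documents = (
    {"_id": i, "SepalLength": length, "Species": species}
    for i, (length, species) in enumerate(
        [("5.1", "setosa"), ("4.9", "setosa"), ("4.7", "setosa")]
    )
)

columns = stream_into_columns(documents, CONVERTERS)
print(columns)
```

Processing in chunks keeps memory use bounded by the chunk size, which is what makes this pattern suitable for datasets that do not fit in memory.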