Technological advancements in the collection, storage, and retrieval of data have ushered in a new era of Big Data. Today, petascale data sets are common in fields as diverse as medicine, finance, science, and web traffic analysis. These immense data sets enable us, in principle, to answer more complicated questions, discover subtler trends, and make more informed and accurate decisions. However, advanced analytics on big data becomes challenging when the data do not fit into memory, or when answers are needed so quickly that computing on a single core does not suffice.

So how do we gain insight into these massive data? Machine learning (ML) is a powerful set of tools that enables the discovery of complex patterns in heterogeneous, high-dimensional data and gives us the ability to make optimized, data-driven decisions. As we collect more data, machine learning algorithms automatically learn from the new instances, allowing further insight and more accurate assessment. However, as data sets grow from gigabytes to petabytes, most standard off-the-shelf implementations of ML algorithms break down.

This is where wiseRF comes in. With wiseRF, we have created a blazingly fast and scalable implementation of the popular machine learning algorithm Random Forest. Random Forest is a highly accurate method for predicting a response variable of interest (e.g., whether an email is spam) from heterogeneous input data (e.g., the sender, subject, and content of the email), and is widely regarded as one of the best ML tools around. It works by using a set of training data with known responses to discover a set of rules, organized as an ensemble of decision trees, that relate the high-dimensional input data to the response. As training data become more abundant, standard Random Forest implementations become prohibitively slow and eventually cannot process the data at all. But with wiseRF, speed and memory limitations are no longer a concern.
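To make the idea of learning rules from labeled training data concrete, here is a toy sketch in plain Python. It is not wiseRF's implementation or API. It is a deliberately simplified stand-in in which each "tree" is a single threshold rule (a decision stump) fit on a bootstrap resample of the data using a randomly chosen feature, and the ensemble predicts by majority vote. The function names (fit_stump, fit_forest, predict), the two email features, and the tiny data set are all made up for illustration.

```python
import random
from collections import Counter


def fit_stump(rows, labels, feature_indices):
    """Find the single-threshold rule with the fewest training errors."""
    best = None
    for f in feature_indices:
        for threshold in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= threshold]
            right = [y for r, y in zip(rows, labels) if r[f] > threshold]
            if not left or not right:
                continue  # a rule must send data to both sides
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0]
            errors = (sum(y != left_label for y in left)
                      + sum(y != right_label for y in right))
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    if best is None:
        # No usable split: fall back to always predicting the majority label.
        majority = Counter(labels).most_common(1)[0][0]
        return (0, float("inf"), majority, majority)
    return best[1:]


def fit_forest(rows, labels, n_trees=25, n_features=1, seed=0):
    """Fit n_trees stumps, each on a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        sample = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        features = rng.sample(range(len(rows[0])), n_features)
        forest.append(fit_stump([rows[i] for i in sample],
                                [labels[i] for i in sample],
                                features))
    return forest


def predict(forest, row):
    """Majority vote over the rules in the ensemble."""
    votes = Counter(left if row[f] <= t else right for f, t, left, right in forest)
    return votes.most_common(1)[0][0]


# Hypothetical training data: [number of links, number of exclamation marks]
rows = [[0, 0], [1, 0], [0, 1], [1, 1], [8, 6], [9, 7], [7, 9], [10, 8]]
labels = ["ham", "ham", "ham", "ham", "spam", "spam", "spam", "spam"]

forest = fit_forest(rows, labels)
print(predict(forest, [9, 9]))
print(predict(forest, [0, 0]))
```

A real Random Forest grows full decision trees rather than single rules, and the exhaustive threshold search above is exactly the kind of step that becomes expensive as the training set grows.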

wise.io has forged a partnership with Continuum Analytics to make wiseRF available on the Anaconda Pro distribution for Big Data computing in Python. With its fast data handling, easy parallelization, and scalable storage and computation capacity, we believe that Anaconda Pro is the best-in-class platform for Big Data analytics and the Python distribution that no data scientist should be without. By making wiseRF available on Anaconda Pro, our mission is to enable deep insight into massive data where standard analytics software falls short. Our wiseRF software pairs perfectly with Anaconda Pro's swift and scalable I/O and computing, giving users unprecedented capacity to fit sophisticated ML models on huge data and unlock the secrets behind the bits.

wiseRF is a game changer. It gives you the freedom to unleash the finest ML algorithm around on your immense data, so you can fully utilize your data to answer your biggest questions.

At wise.io, we are working to make state-of-the-art machine learning methods scalable to your massive data needs.


About the Author

Max Gamurar

Contractor

Max Gamurar has been with the Anaconda Global Inc. team for over 2 years.
