This one-day course explores general concepts in machine learning using the Python library scikit-learn. The focus is on learning how to apply existing Python implementations of machine learning algorithms to solve custom data analysis problems. The course mixes presentation with hands-on exploration in roughly equal parts.

What You’ll Learn

  • How to select between available models and machine learning tools in appropriate data analysis contexts
  • When and how to apply strategies for data normalization and dealing with outliers
  • How to develop training/testing sets and perform model validation
  • How to perform feature selection and use estimators and scoring metrics
  • How to construct pipelines involving multiple estimators
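As a rough illustration of how these skills fit together, the sketch below chains normalization, feature selection, and a classifier into one scikit-learn pipeline, then validates it on a held-out test set. It is not taken from the course materials: the iris dataset, the step names, and all parameter values (k=2, n_neighbors=5, test_size=0.25) are assumptions chosen only for the example.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Example data only (assumption): 150 samples, 4 features, 3 classes.
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the data for final testing; stratify keeps class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Each step is fitted on training data only, which avoids leaking
# information from validation folds or the test set into preprocessing.
pipeline = Pipeline([
    ("scale", StandardScaler()),                         # normalization
    ("select", SelectKBest(score_func=f_classif, k=2)),  # feature selection
    ("classify", KNeighborsClassifier(n_neighbors=5)),   # final estimator
])

# 5-fold cross-validation on the training set (default scoring: accuracy).
cv_scores = cross_val_score(pipeline, X_train, y_train, cv=5)

# Fit on the full training set, then score once on the held-out test set.
pipeline.fit(X_train, y_train)
test_accuracy = pipeline.score(X_test, y_test)

print("Cross-validation accuracy per fold:", cv_scores)
print("Held-out test accuracy:", test_accuracy)
```

Wrapping the scaler and selector inside the pipeline means cross_val_score refits them within every fold, so the validation scores reflect the whole workflow rather than the classifier alone.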

Topics Covered

  • Core features of the Python package scikit-learn
  • Supervised and unsupervised learning
  • Principal algorithms and techniques in machine learning (e.g., k-nearest neighbors classification, support vector machines, principal/independent component analysis, cross-validation)

Who Should Attend

Data scientists and analysts who want a better practical understanding of the algorithmic frameworks underlying machine learning.

This course has a limit of 20 participants.


Participants need to have a decent background in mathematics (particularly in statistics and linear algebra) and strong Python programming skills (e.g., thorough knowledge of Python’s core data structures, familiarity with the Python Standard Library, some experience with NumPy, pandas and related libraries in the Python scientific stack).