AutoML: Automated Machine Learning

What is AutoML?

Machine learning (ML) success crucially relies on human machine learning experts to perform the following tasks:

  • Preprocess and clean the data.
  • Select and construct appropriate features.
  • Select an appropriate model family.
  • Optimize model hyperparameters.
  • Postprocess machine learning models.
  • Critically analyze the results obtained.
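The manual workflow above can be sketched as a toy pipeline. Everything here is hypothetical (made-up data, a trivial threshold "model"); the point is only to show the steps a human expert chains together and that AutoML systems aim to automate:

```python
import statistics

# Hypothetical toy data: each row is (feature1, feature2), with a binary label.
X = [(1.0, 5.2), (2.0, 4.8), (3.0, 5.1), (4.0, 4.9), (5.0, 5.0), (6.0, 5.3)]
y = [0, 0, 0, 1, 1, 1]

# 1. Preprocess: z-score normalize each feature column.
def normalize(rows):
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    stds = [statistics.pstdev(c) for c in cols]
    return [tuple((v - m) / s for v, m, s in zip(row, means, stds)) for row in rows]

# 2. Feature selection: keep the column whose mean differs most between classes.
def select_feature(rows, labels):
    cols = list(zip(*rows))
    def separation(col):
        a = [v for v, lab in zip(col, labels) if lab == 0]
        b = [v for v, lab in zip(col, labels) if lab == 1]
        return abs(statistics.mean(a) - statistics.mean(b))
    return max(range(len(cols)), key=lambda i: separation(cols[i]))

# 3./4. Model family + hyperparameter optimization: a threshold classifier
# whose single hyperparameter (the threshold) is tuned by grid search.
def fit_threshold(values, labels, grid):
    def accuracy(t):
        preds = [int(v > t) for v in values]
        return sum(p == lab for p, lab in zip(preds, labels)) / len(labels)
    return max(grid, key=accuracy)

Xn = normalize(X)
best = select_feature(Xn, y)
col = [row[best] for row in Xn]
threshold = fit_threshold(col, y, grid=[i / 10 - 2 for i in range(40)])

# 6. Analyze the result: training accuracy of the final pipeline.
preds = [int(v > threshold) for v in col]
acc = sum(p == lab for p, lab in zip(preds, y)) / len(y)
```

Each of these hand-written steps (normalization, feature choice, model family, hyperparameter grid) is a decision an AutoML system tries to make automatically.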

AutoML systems

  • AutoWEKA is an approach for the simultaneous selection of a machine learning algorithm and its hyperparameters; combined with the WEKA package it automatically yields good models for a wide variety of data sets.
  • Auto-sklearn extends the AutoWEKA approach to the Python library scikit-learn; it is a drop-in replacement for regular scikit-learn classifiers and regressors.
  • TPOT is a data-science assistant which optimizes machine learning pipelines using genetic programming.
  • H2O AutoML provides automated model selection and ensembling for the H2O machine learning and data analytics platform. H2O AutoML now includes XGBoost GBMs (Gradient Boosting Machines) among its set of algorithms.
  • TransmogrifAI is an AutoML library for building modular, reusable, strongly typed machine learning workflows on Apache Spark with minimal hand-tuning.
  • MLBoX is an AutoML library with three components: preprocessing/cleaning/formatting, hyper-parameter optimization and prediction.
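At their core, systems such as AutoWEKA and Auto-sklearn solve the combined algorithm selection and hyperparameter optimization (CASH) problem: choose a model family and its hyperparameters jointly. A minimal sketch of the idea using random search (the two "model families" and the data are made up for illustration; real systems use far more sophisticated search):

```python
import random

random.seed(0)

# Toy 1-D regression data following y = 3x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 4.0, 7.0, 10.0, 13.0]

def mse(preds):
    return sum((p - t) ** 2 for p, t in zip(preds, ys)) / len(ys)

# Two hypothetical model families, each with its own hyperparameter space.
def constant_model(c):
    return [c] * len(xs)

def linear_model(slope, intercept):
    return [slope * x + intercept for x in xs]

search_space = [
    ("constant", lambda: constant_model(random.uniform(0, 15))),
    ("linear", lambda: linear_model(random.uniform(0, 5), random.uniform(0, 5))),
]

# CASH by random search: sample (family, hyperparameters) jointly, keep the best.
best_name, best_err = None, float("inf")
for _ in range(500):
    name, sample = random.choice(search_space)
    err = mse(sample())
    if err < best_err:
        best_name, best_err = name, err
```

Because family and hyperparameters are sampled together, the search can discover both that the linear family fits this data and roughly which slope and intercept to use.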

Hyperparameter Optimization

Several hyperparameter optimization packages follow the suggestions of Bergstra et al.’s Making a Science of Model Search. These include:

  • Hyperopt, including the TPE algorithm
  • Sequential Model-based Algorithm Configuration (SMAC)
  • Spearmint – a package to perform Bayesian optimization
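The idea these packages share, sequential model-based (Bayesian) optimization, can be sketched in plain Python. This is a didactic toy, not any library's actual algorithm: a crude nearest-neighbor surrogate stands in for a real probabilistic model, and a distance bonus stands in for a principled acquisition function.

```python
import random

random.seed(1)

def objective(x):
    # Stand-in for an expensive black-box function; minimum at x = 2.
    return (x - 2.0) ** 2

# Start from a few random observations of the objective.
history = [(x, objective(x)) for x in (random.uniform(0, 5) for _ in range(3))]

for _ in range(20):
    candidates = [random.uniform(0, 5) for _ in range(50)]

    def acquisition(c):
        # Surrogate: value at the nearest observed point (crude stand-in for
        # a Gaussian process); exploration bonus: distance to that point.
        nearest_x, nearest_y = min(history, key=lambda h: abs(h[0] - c))
        return nearest_y - 0.5 * abs(nearest_x - c)

    # Evaluate only the candidate the surrogate deems most promising.
    x_next = min(candidates, key=acquisition)
    history.append((x_next, objective(x_next)))

best_x, best_y = min(history, key=lambda h: h[1])
```

The key property, shared with Hyperopt's TPE, SMAC, and Spearmint, is that each expensive evaluation is chosen using a cheap model of all evaluations so far, rather than blindly.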

ML Freiburg provides the following packages for hyperparameter optimization:

  • BOHB – Bayesian Optimization combined with HyperBand
  • RoBO – Robust Bayesian Optimization framework
  • SMAC3 – a Python re-implementation of the SMAC algorithm
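BOHB combines Bayesian optimization with HyperBand, whose core subroutine is successive halving: start many configurations on a small budget, keep the better-performing half, and double the budget. A toy sketch (the `train` function here merely simulates noisy training, with made-up numbers; a real run would train actual models):

```python
import random

random.seed(42)

# Hypothetical setup: each "configuration" is a learning rate. Training for
# `budget` epochs yields a noisy loss estimate; more budget means less noise.
def train(lr, budget):
    true_loss = abs(lr - 0.1)              # hypothetical optimum at lr = 0.1
    noise = random.gauss(0, 1.0 / budget)  # longer training -> better estimate
    return true_loss + noise

configs = [random.uniform(0.001, 1.0) for _ in range(16)]
budget = 1

# Successive halving: score all survivors, keep the best half, double budget.
while len(configs) > 1:
    scored = sorted(configs, key=lambda lr: train(lr, budget))
    configs = scored[: len(configs) // 2]
    budget *= 2

best_lr = configs[0]
```

Cheap low-budget evaluations prune most configurations early, so the expensive high-budget evaluations are spent only on the most promising ones; BOHB additionally replaces the random sampling of configurations with a Bayesian optimization model.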

The field of architecture search addresses the problem of finding a well-performing architecture for a deep neural network. Design decisions include the number of layers, the number of neurons per layer, the type of activation functions, and many more.

Packages for architecture search and hyperoptimization for deep learning include:

  • AutoKeras
  • DEvol: deep neural network architecture search for Keras models via genetic programming
  • Hyperas: a very simple wrapper around Keras and Hyperopt for convenient hyperparameter optimization
  • Talos: Hyperparameter Optimization for Keras Models
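Architecture search can be framed as hyperparameter optimization over structural choices. A minimal sketch using random search over a small, hypothetical configuration space; the `evaluate` function is a stand-in with made-up scores, where a real search would train the described network and return its validation accuracy:

```python
import random

random.seed(0)

# Hypothetical architecture search space.
space = {
    "num_layers": [1, 2, 3, 4],
    "units": [16, 32, 64, 128],
    "activation": ["relu", "tanh", "sigmoid"],
}

def sample_architecture():
    return {name: random.choice(choices) for name, choices in space.items()}

def evaluate(arch):
    # Stand-in for expensive training + validation of the described network.
    score = arch["num_layers"] * 0.1 + (arch["units"] / 128) * 0.2
    if arch["activation"] == "relu":
        score += 0.1
    return score + random.gauss(0, 0.01)

# Sample 30 architectures and keep the one with the best (simulated) score.
best_arch = max((sample_architecture() for _ in range(30)), key=evaluate)
```

Packages like AutoKeras and DEvol replace this blind sampling with smarter strategies (Bayesian optimization, evolution), but the search space of structural decisions is the same kind of object.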

Preparing time series data

Time series data must be transformed into a structure of samples with input and output components before it can be used to fit a supervised learning model.
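A minimal sketch of this transformation in plain Python (the function name `series_to_supervised` is a hypothetical helper, not a library API): slide a window of `n_in` input steps over the series, pairing each window with the value that follows it.

```python
def series_to_supervised(series, n_in):
    """Turn a series into (input window, next value) samples."""
    samples = []
    for i in range(len(series) - n_in):
        samples.append((series[i : i + n_in], series[i + n_in]))
    return samples

# A univariate series becomes windowed input/output pairs:
pairs = series_to_supervised([1, 2, 3, 4, 5, 6], n_in=3)
# pairs == [([1, 2, 3], 4), ([2, 3, 4], 5), ([3, 4, 5], 6)]
```

Each pair is then an ordinary supervised sample: the window is the model input and the following value is the target.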

Keras TimeseriesGenerator

Keras provides the TimeseriesGenerator class to automatically transform both univariate and multivariate time series data into samples, ready to train deep learning models. A model can be trained with the TimeseriesGenerator as a data generator; this can be achieved by passing it to the fit_generator() function.
