AutoML: Automated Machine Learning
What is AutoML?
The success of machine learning (ML) crucially relies on human experts to perform the following tasks:
- Preprocess and clean the data.
- Select and construct appropriate features.
- Select an appropriate model family.
- Optimize model hyperparameters.
- Postprocess machine learning models.
- Critically analyze the results obtained.
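To make these manual steps concrete, here is a toy, stdlib-only sketch of the workflow that AutoML systems aim to automate. The dataset, the min-max scaling, and the nearest-centroid classifier are all illustrative assumptions, not part of any AutoML library:

```python
# Toy dataset: (feature_vector, label) pairs -- illustrative only.
data = [([1.0, 200.0], 0), ([1.2, 220.0], 0),
        ([3.0, 600.0], 1), ([3.2, 640.0], 1)]

def minmax_scale(rows):
    """Preprocessing step: rescale each feature to [0, 1]."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0
             for v, l, h in zip(row, lo, hi)] for row in rows]

def nearest_centroid_fit(X, y):
    """Model selection/fitting step: one centroid per class."""
    centroids = {}
    for label in set(y):
        pts = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(c) / len(c) for c in zip(*pts)]
    return centroids

def predict(centroids, x):
    """Assign the class whose centroid is nearest in squared distance."""
    return min(centroids,
               key=lambda lab: sum((a - b) ** 2
                                   for a, b in zip(x, centroids[lab])))

X = minmax_scale([row for row, _ in data])
y = [label for _, label in data]
model = nearest_centroid_fit(X, y)
print([predict(model, x) for x in X])  # -> [0, 0, 1, 1]
```

Every choice here (scaler, model family, and so on) is one an AutoML system would search over instead of hard-coding.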
AutoML systems
- AutoWEKA is an approach for the simultaneous selection of a machine learning algorithm and its hyperparameters; combined with the WEKA package it automatically yields good models for a wide variety of data sets.
- Auto-sklearn extends the AutoWEKA idea to the Python library scikit-learn; it is a drop-in replacement for regular scikit-learn classifiers and regressors.
- TPOT is a data-science assistant which optimizes machine learning pipelines using genetic programming.
- H2O AutoML provides automated model selection and ensembling for the H2O machine learning and data analytics platform. H2O AutoML now includes XGBoost GBMs (Gradient Boosting Machines) among its set of algorithms.
- TransmogrifAI is an AutoML library for building modular, reusable, strongly typed machine learning workflows on Apache Spark with minimal hand-tuning.
- MLBoX is an AutoML library with three components: preprocessing/cleaning/formatting, hyper-parameter optimization and prediction.
Hyperparameter Optimization
Bergstra et al.'s Making a Science of Model Search suggests several hyperparameter optimization tools. These include:
- Hyperopt, including the TPE algorithm
- Sequential Model-based Algorithm Configuration (SMAC)
- Spearmint – a package to perform Bayesian optimization
ML Freiburg provides packages for hyperparameter optimization:
- BOHB – Bayesian Optimization combined with HyperBand
- RoBO – Robust Bayesian Optimization framework
- SMAC3 – a Python re-implementation of the SMAC algorithm
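The packages above use model-based (Bayesian) optimization; a minimal stdlib sketch of the random-search baseline they improve upon may help fix the idea. The search space and the quadratic "validation error" are made-up stand-ins for training and evaluating a real model:

```python
import math
import random

random.seed(0)

# Hypothetical search space: learning rate (log-uniform) and depth (integer).
def sample_config():
    return {"lr": 10 ** random.uniform(-4, 0),
            "depth": random.randint(1, 10)}

# Stand-in for "train a model and return validation error";
# by construction the optimum is lr=0.01, depth=5.
def validation_error(cfg):
    return (math.log10(cfg["lr"]) + 2) ** 2 + (cfg["depth"] - 5) ** 2 / 10

best_cfg, best_err = None, float("inf")
for _ in range(200):
    cfg = sample_config()
    err = validation_error(cfg)
    if err < best_err:
        best_cfg, best_err = cfg, err

print(best_cfg, round(best_err, 3))
```

Bayesian optimizers such as SMAC or TPE replace the blind `sample_config()` draws with a surrogate model that proposes promising configurations, typically reaching a good optimum in far fewer evaluations.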
Architecture Search
The field of architecture search addresses the problem of finding a well-performing architecture for a deep neural network. This includes, for example, the number of layers, the number of neurons per layer, the type of activation functions, and many other design decisions.
Several packages address architecture search and hyperparameter optimization for deep learning.
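To make the search space concrete, here is a stdlib-only sketch of random architecture search over the design decisions mentioned above (layer count, layer widths, activation function). The `score` function is a placeholder assumption for actually training each candidate network and measuring validation accuracy:

```python
import random

random.seed(1)

ACTIVATIONS = ["relu", "tanh", "sigmoid"]

def sample_architecture():
    """Draw one candidate from a toy architecture search space."""
    n_layers = random.randint(1, 4)
    return {"layers": [random.choice([16, 32, 64]) for _ in range(n_layers)],
            "activation": random.choice(ACTIVATIONS)}

def score(arch):
    """Placeholder for 'train the network and return validation accuracy'.
    Here: a made-up preference for two layers of width 32 using relu."""
    s = -abs(len(arch["layers"]) - 2)
    s += sum(1 for width in arch["layers"] if width == 32)
    s += 1 if arch["activation"] == "relu" else 0
    return s

candidates = [sample_architecture() for _ in range(50)]
best = max(candidates, key=score)
print(best)
```

Real architecture-search methods replace this exhaustive-random loop with reinforcement learning, evolutionary search, or gradient-based relaxations, since each `score` call is an expensive full training run.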
Preparing time series data
Time series data must be transformed into a structure of samples, each with input and output components, before it can be used to fit a supervised learning model.
Keras TimeseriesGenerator
The Keras TimeseriesGenerator automatically transforms both univariate and multivariate time series data into samples, ready for training deep learning models. A model can then be trained with the TimeseriesGenerator as a data generator by passing it to the fit_generator() function.
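As a stdlib-only illustration of the transformation TimeseriesGenerator performs (a sliding window over the series, not the Keras API itself):

```python
def to_supervised(series, n_input):
    """Split a series into (input window, next value) samples,
    mirroring the windowing that Keras's TimeseriesGenerator yields
    in batches."""
    X, y = [], []
    for i in range(len(series) - n_input):
        X.append(series[i:i + n_input])
        y.append(series[i + n_input])
    return X, y

series = [10, 20, 30, 40, 50, 60]
X, y = to_supervised(series, n_input=3)
print(X)  # [[10, 20, 30], [20, 30, 40], [30, 40, 50]]
print(y)  # [40, 50, 60]
```

Each row of X is a window of past observations, and the corresponding entry of y is the value the model learns to predict next.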