MlBayesOpt

R package to tune hyperparameters for machine learning (Support Vector Machine, Random Forest, and XGBoost) using Bayesian optimization with Gaussian processes.



Overview

This is an R package to tune hyperparameters for machine learning algorithms using Bayesian optimization based on Gaussian processes. Currently supported algorithms are Support Vector Machines, Random Forest, and XGBoost.
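
Bayesian optimization treats the cross-validated score as an expensive black-box function of the hyperparameters: it fits a Gaussian process to the scores observed so far and uses the posterior mean and uncertainty to pick the next value to try. The base-R sketch below shows that loop on a one-dimensional toy objective. The squared-exponential kernel, its fixed length scale, the upper-confidence-bound (UCB) acquisition rule and the helper names sq_exp_kernel(), gp_posterior() and bayes_opt_1d() are all assumptions for illustration; this is not MlBayesOpt's implementation.

```r
# Toy Bayesian optimization in base R (illustration only, not MlBayesOpt's code).
# Assumptions: 1-D search space, squared-exponential kernel with a fixed
# length scale, and an upper-confidence-bound (UCB) acquisition rule.
sq_exp_kernel <- function(a, b, length_scale = 0.5) {
  exp(-outer(a, b, "-")^2 / (2 * length_scale^2))
}

# Gaussian-process posterior mean and standard deviation at x_new
gp_posterior <- function(x_obs, y_obs, x_new, noise = 1e-6) {
  K     <- sq_exp_kernel(x_obs, x_obs) + diag(noise, length(x_obs))
  K_s   <- sq_exp_kernel(x_obs, x_new)          # n_obs x n_new
  K_inv <- solve(K)
  mu    <- drop(t(K_s) %*% K_inv %*% y_obs)
  # prior variance of this kernel is 1; subtract the part explained by the data
  post_var <- pmax(1 - colSums(K_s * (K_inv %*% K_s)), 0)
  list(mean = mu, sd = sqrt(post_var))
}

bayes_opt_1d <- function(f, lower, upper, init_points = 3, n_iter = 10,
                         kappa = 2.576) {
  grid  <- seq(lower, upper, length.out = 200)  # candidate points
  x_obs <- runif(init_points, lower, upper)     # random initial design
  y_obs <- vapply(x_obs, f, numeric(1))
  for (i in seq_len(n_iter)) {
    y_mean <- mean(y_obs)                       # zero-mean GP on centred scores
    post   <- gp_posterior(x_obs, y_obs - y_mean, grid)
    ucb    <- post$mean + y_mean + kappa * post$sd  # favour high mean or high uncertainty
    x_next <- grid[which.max(ucb)]
    x_obs  <- c(x_obs, x_next)
    y_obs  <- c(y_obs, f(x_next))
  }
  best <- which.max(y_obs)
  list(best_x = x_obs[best], best_y = y_obs[best],
       history = data.frame(x = x_obs, y = y_obs))
}

set.seed(71)
res_toy <- bayes_opt_1d(function(x) -(x - 2)^2, lower = 0, upper = 4)
res_toy$best_x
```

In this sketch the objective is called init_points + n_iter times (13 here): the random points seed the Gaussian process, and each later point is the one the acquisition rule rates most promising.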

Why MlBayesOpt?

Easy to write

Writing a Bayesian optimization call is very easy: you only have to specify the data and the name of the label column to classify. You can also customize your model easily.
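
Passing the label as a bare column name (label = Species, as in the examples below) relies on R's non-standard evaluation. The following sketch shows one way a function can accept either Species or "Species" and split the features from the label. split_label() is a hypothetical helper built on base R's substitute(); it is not MlBayesOpt's internal code.

```r
# Hypothetical helper: accept the label column unquoted (Species) or as a
# string ("Species"), then separate the feature columns from the label.
split_label <- function(data, label) {
  label_expr <- substitute(label)
  label_name <- if (is.character(label_expr)) label_expr else deparse(label_expr)
  if (!label_name %in% names(data)) {
    stop("label column '", label_name, "' not found in data")
  }
  list(
    x = data[, setdiff(names(data), label_name), drop = FALSE],  # features
    y = data[[label_name]]                                       # label vector
  )
}

parts <- split_label(iris, Species)
str(parts$y)
```

substitute() only sees the expression written at the call site, so this simple version does not work when the column name is stored in a variable; tidy-evaluation tools such as rlang are a common alternative.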

In the XGBoost functions, your data frame is automatically converted into an xgb.DMatrix object.
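
A rough sketch of that kind of conversion, assuming every feature column is numeric: the features are collected into a numeric matrix and the label is recoded to 0-based integers, which is the coding XGBoost's multiclass objectives expect. df_to_xgb_input() is a hypothetical helper, and the xgb.DMatrix() call only runs when the xgboost package is installed.

```r
# Hypothetical helper: numeric feature matrix plus 0-based integer label.
# Assumes all non-label columns are numeric (as.matrix would otherwise
# produce a character matrix).
df_to_xgb_input <- function(data, label_name) {
  x <- as.matrix(data[, setdiff(names(data), label_name), drop = FALSE])
  # multiclass objectives expect labels coded 0, 1, ..., K - 1
  y <- as.integer(factor(data[[label_name]])) - 1L
  list(x = x, y = y)
}

inp <- df_to_xgb_input(iris, "Species")
if (requireNamespace("xgboost", quietly = TRUE)) {
  dtrain <- xgboost::xgb.DMatrix(data = inp$x, label = inp$y)
}
```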

Any label class is OK

The label column can be of any class (character, integer, or factor); it is converted automatically.
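
One plausible way to accept any label class is to route every label through factor() and use the 0-based level index. encode_label() below is a hypothetical sketch of that idea, not the package's code; it keeps the levels so that predicted codes can be mapped back to the original labels.

```r
# Hypothetical sketch: map a label vector of any common class to 0-based codes.
encode_label <- function(y) {
  f <- factor(y)                     # works for character, integer and factor input
  list(codes  = as.integer(f) - 1L,  # 0, 1, ..., K - 1
       levels = levels(f))           # mapping needed to decode predictions
}

chr <- c("cat", "dog", "cat", "bird")
encode_label(chr)$codes                  # [1] 1 2 1 0
encode_label(factor(chr))$codes          # [1] 1 2 1 0
encode_label(c(10L, 20L, 10L, 5L))$codes # [1] 1 2 1 0
```

Character labels are ordered alphabetically and integer labels numerically, while an existing factor keeps its own level order.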

Installation

You can install the released version of MlBayesOpt from CRAN with:

install.packages("MlBayesOpt")

You can also install MlBayesOpt from GitHub with either of the following:

# install.packages("githubinstall")
githubinstall::githubinstall("MlBayesOpt")

# install.packages("devtools")
devtools::install_github("ymattu/MlBayesOpt")

Data

Small Fashion MNIST

fashion_train and fashion_test are small datasets derived from Fashion-MNIST. Each has 1,000 rows, with 784 feature columns and one label column named y.

fashion is a dataset created with dplyr::bind_rows(fashion_train, fashion_test).

iris

iris_train and iris_test are also included in this package: iris_train contains the odd-numbered rows of the iris data, and iris_test contains the even-numbered rows.
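
The odd/even split described above can be reproduced from R's built-in iris data with base R. This is only a sketch for checking the description; the *_sketch object names are hypothetical, and the packaged iris_train and iris_test may differ in details such as row names.

```r
# Reproduce an odd/even row split of the built-in iris data
odd_rows  <- seq(1, nrow(iris), by = 2)
even_rows <- seq(2, nrow(iris), by = 2)
iris_train_sketch <- iris[odd_rows, ]
iris_test_sketch  <- iris[even_rows, ]

nrow(iris_train_sketch)          # 75
nrow(iris_test_sketch)           # 75
table(iris_train_sketch$Species) # 25 rows of each species
```

Because iris stores 50 consecutive rows per species, alternating rows keeps the classes balanced in both halves.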

Example

3-fold cross-validation on the iris data using SVM:

library(MlBayesOpt)

set.seed(71)
res0 <- svm_cv_opt(data = iris,
                   label = Species,
                   n_folds = 3,
                   init_points = 10,
                   n_iter = 1)
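
n_folds sets the number of cross-validation folds: the rows are split into k folds, each fold is held out once while the model is trained on the rest, and the held-out scores are averaged. The base-R sketch below illustrates that loop. To stay dependency-free it swaps the SVM for a nearest-centroid classifier and scores accuracy; both choices are assumptions for illustration, and cv_accuracy() is a hypothetical helper, not part of MlBayesOpt.

```r
# Toy k-fold cross-validation in base R (illustration only).
# Assumptions: x is a data frame of numeric features, y is a factor label,
# a nearest-centroid classifier stands in for the SVM, and the score is accuracy.
cv_accuracy <- function(x, y, n_folds = 3) {
  folds <- sample(rep(seq_len(n_folds), length.out = nrow(x)))  # random fold ids
  acc <- numeric(n_folds)
  for (k in seq_len(n_folds)) {
    held_out <- folds == k
    train_x  <- x[!held_out, , drop = FALSE]
    test_x   <- as.matrix(x[held_out, , drop = FALSE])
    # feature means per class, fitted on the training folds only
    centroids <- sapply(split(train_x, droplevels(y[!held_out])), colMeans)
    # squared Euclidean distance from every held-out row to every centroid
    dist2 <- apply(centroids, 2, function(cen) rowSums(sweep(test_x, 2, cen)^2))
    pred  <- colnames(centroids)[max.col(-dist2, ties.method = "first")]
    acc[k] <- mean(pred == as.character(y[held_out]))
  }
  mean(acc)  # average held-out accuracy over the folds
}

set.seed(71)
cv_acc <- cv_accuracy(iris[, 1:4], iris$Species, n_folds = 3)
cv_acc
```

In a tuning run, a score like this is what the Bayesian optimizer would evaluate once per candidate hyperparameter setting.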

3-fold cross-validation on the iris data using XGBoost:

res1 <- xgb_cv_opt(data = iris,
                   label = Species,
                   objectfun = "multi:softmax",
                   evalmetric = "mlogloss",
                   n_folds = 3,
                   classes = 3,
                   init_points = 10,
                   n_iter = 1)
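
evalmetric = "mlogloss" selects XGBoost's multiclass log loss: the mean, over rows, of the negative log of the probability assigned to the true class (with classes = 3, each row has one probability per iris species). The sketch below computes it in base R; mlogloss() is a hypothetical helper, and the clipping constant eps is an assumption added to avoid log(0).

```r
# Multiclass log loss: mean negative log-probability of the true class.
# prob: n x K matrix of predicted probabilities (rows sum to 1)
# y:    integer labels coded 0, ..., K - 1
mlogloss <- function(prob, y, eps = 1e-15) {
  p_true <- prob[cbind(seq_along(y), y + 1L)]  # probability of each true class
  p_true <- pmin(pmax(p_true, eps), 1 - eps)   # clip to avoid log(0)
  -mean(log(p_true))
}

prob <- matrix(c(0.7, 0.2, 0.1,
                 0.1, 0.8, 0.1,
                 0.2, 0.2, 0.6), nrow = 3, byrow = TRUE)
mlogloss(prob, y = c(0L, 1L, 2L))  # approximately 0.3635
```

Lower is better: confident correct predictions push the loss towards 0, while confident wrong ones are penalized heavily.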

For Details

See the package vignette, which you can open with browseVignettes("MlBayesOpt").

ToDo

  • [x] Make functions to execute cross-validation
  • [ ] Fix minor bugs