Random Forests simplified

  • Decide on a number of record sets, numRecordSets, and call bootStrapAggregate in a loop that many times. Each bootstrapped dataset is the same size as the original, but because records are sampled with replacement, about 37% of the original records are missing from each set and their places are taken by duplicates of other records.
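  • Decide on a number of feature groupings, numFeatureGroups. Within the bootStrapAggregate method we call randomSubspace (passing the bootstrapped dataset) numFeatureGroups times. In this simplified scheme the features are resampled with replacement just like the records, so each modified dataset keeps its size but about 37% of its features are replaced with duplicates. Common implementations instead consider roughly √p randomly chosen features (p being the number of features) at each split, which is where the √p rule of thumb applies.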
  • For each of the (numRecordSets * numFeatureGroups) combinations we have a dataset that has been resampled twice, once by record and once by feature, and we call trainDecisionTree with that dataset.
  • At the end we take the mode (classification) or the mean (regression) of all the trees' predictions as our prediction.
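The steps above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not a reference implementation: the snake_case functions mirror the names used in the list, the nesting is flattened into a hypothetical train_forest helper for readability, and trainDecisionTree is stood in for by a one-level decision stump so that the example needs no external libraries. The 37% figure comes from sampling n items with replacement: the chance that a given item is never drawn is (1 - 1/n)^n, which approaches 1/e ≈ 0.368 as n grows.

```python
import random
from collections import Counter


def majority(labels):
    """Return the most common label (the mode)."""
    return Counter(labels).most_common(1)[0][0]


def bootstrap_aggregate(records, rng):
    """Resample records with replacement; the size stays unchanged."""
    n = len(records)
    return [records[rng.randrange(n)] for _ in range(n)]


def random_subspace(num_features, rng):
    """Resample feature indices with replacement (the simplified scheme)."""
    return [rng.randrange(num_features) for _ in range(num_features)]


def train_decision_tree(rows, feature_ids):
    """Stand-in for trainDecisionTree: a one-level stump (assumption)."""
    default = majority([y for _, y in rows])
    best = None  # (errors, feature, threshold, left_label, right_label)
    for f in sorted(set(feature_ids)):
        for t in sorted({x[f] for x, _ in rows}):
            left = [y for x, y in rows if x[f] <= t]
            right = [y for x, y in rows if x[f] > t]
            left_label = majority(left)
            right_label = majority(right) if right else default
            errors = (sum(y != left_label for y in left)
                      + sum(y != right_label for y in right))
            if best is None or errors < best[0]:
                best = (errors, f, t, left_label, right_label)
    _, f, t, left_label, right_label = best
    return lambda x: left_label if x[f] <= t else right_label


def train_forest(records, num_record_sets, num_feature_groups, seed=0):
    """Train num_record_sets * num_feature_groups trees."""
    rng = random.Random(seed)
    num_features = len(records[0][0])
    trees = []
    for _ in range(num_record_sets):
        sample = bootstrap_aggregate(records, rng)
        for _ in range(num_feature_groups):
            feature_ids = random_subspace(num_features, rng)
            trees.append(train_decision_tree(sample, feature_ids))
    return trees


def predict(trees, x):
    """Classification: take the mode of the trees' votes."""
    return majority([tree(x) for tree in trees])


# Toy data (made up for illustration): label is 1 when i >= 5.
records = [((i, 2 * i, i + 1), int(i >= 5)) for i in range(10)]
trees = train_forest(records, num_record_sets=4, num_feature_groups=3)
print(len(trees), predict(trees, (9, 18, 10)), predict(trees, (0, 0, 1)))
```

For regression, predict would return the mean of the trees' outputs instead of the mode, and the stump would need to predict a numeric value on each side of the split.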

Data Scientist, Artificial Intelligence, Machine Learning, Author of “Artificial Intelligence with Python”

Alberto Artasanchez
