A good showcase for random forests is a sparse signal embedded in a high-dimensional space: only features 1 and 2 carry the true class structure, while features 3 through d are pure nuisance variables. A random forest often beats a single deep tree here because feature subsampling decorrelates the trees, so no single noise coordinate gets split on consistently across the ensemble, and averaging washes out the spurious splits that do occur.
Validation error versus number of trees. The dashed gray line marks the single-deep-tree baseline.
Random-forest feature importance. Ideally features 1 and 2 dominate, because they are the only informative coordinates.
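The experiment above can be sketched as follows. This is an illustrative reconstruction, not the original code: the dataset parameters (20 total features, 2 informative) and model settings are assumptions chosen to match the description.

```python
# Sparse-signal setup: only the first two coordinates are informative,
# the remaining 18 are pure noise. All parameter choices are illustrative.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# shuffle=False keeps the informative features in columns 0 and 1.
X, y = make_classification(
    n_samples=2000, n_features=20,
    n_informative=2, n_redundant=0, n_repeated=0,
    shuffle=False, random_state=0,
)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

# Baseline: one fully grown tree, free to split on noise coordinates.
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Forest: feature subsampling plus averaging over 200 trees.
forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X_train, y_train)

print(f"single deep tree accuracy: {tree.score(X_val, y_val):.3f}")
print(f"random forest accuracy:    {forest.score(X_val, y_val):.3f}")

# The two informative coordinates should dominate the importances.
top2 = sorted(np.argsort(forest.feature_importances_)[-2:])
print("top-2 features by importance:", top2)
```

Sweeping `n_estimators` and plotting validation error against the single-tree baseline reproduces the first figure; `forest.feature_importances_` gives the bar heights for the second.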