RandomForest again: Better Practice

I learn something from every debugging session. This follows up on my previous post, where I hit the error “new factor levels not present in the training data”. Here’s some advice I copied from https://stat.ethz.ch/pipermail/r-sig-ecology/2008-September/000320.html, just so I know where to look next time.

And here’s more to read about randomForest from http://www.stat.berkeley.edu/~breiman/RandomForests/

Do str(indata) and str(test) give the same information regarding the
types of variables? If any of the variables used are factors, do the
factors have the same levels in indata and test?
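
For my own reference, here is a minimal sketch of that check in base R. It assumes indata and test are data frames that share column names; the toy values and the helper names (shared, type_mismatch, new_levels) are made up for illustration and are not from the mailing-list post.

```r
# Toy data frames standing in for indata and test (hypothetical values).
indata <- data.frame(x = c(1.2, 3.4, 5.6),
                     site = factor(c("a", "b", "a")))
test   <- data.frame(x = c(2.1, 4.3),
                     site = factor(c("a", "c")))

# Compare the (first) class of every column the two data frames share.
shared     <- intersect(names(indata), names(test))
class_in   <- sapply(indata[shared], function(col) class(col)[1])
class_test <- sapply(test[shared],   function(col) class(col)[1])
type_mismatch <- shared[class_in != class_test]

# For factor columns, collect levels that appear in test but not in indata.
factor_cols <- shared[sapply(indata[shared], is.factor)]
new_levels <- lapply(factor_cols, function(nm) {
  setdiff(levels(test[[nm]]), levels(indata[[nm]]))
})
names(new_levels) <- factor_cols

print(type_mismatch)  # columns whose types differ
print(new_levels)     # per-factor levels unseen in the training data
```

Any non-empty entry in new_levels is a likely source of the “new factor levels” error, since the model never saw those levels during training.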

I'd probably do this differently, and store the test and training data
in the same df to start with, and then split it out at random into a
training and test set object (or just use the indices on the main object
depending on whether I want the training or test rows).

This way, the variables will be the same type/format/structure as they
came from the same df to begin with.
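
A rough sketch of that single-data-frame approach, again in base R. The data frame alldata, the 70/30 split, and the seed are my own assumptions for illustration, not something the original advice specifies.

```r
# Hypothetical combined data frame holding every observation.
set.seed(42)  # only so the random split is reproducible
alldata <- data.frame(y = rnorm(10),
                      site = factor(rep(c("a", "b"), 5)))

# Draw the training row indices at random (70% here, an arbitrary choice).
n <- nrow(alldata)
train_idx <- sample(n, size = floor(0.7 * n))

# Split by index; the remaining rows become the test set.
train <- alldata[train_idx, ]
test  <- alldata[-train_idx, ]

# Row subsetting keeps the factor's full level set, so both pieces agree.
same_levels <- identical(levels(train$site), levels(test$site))
print(same_levels)
```

Because both pieces are subsets of the same data frame, their column types and factor levels match by construction, which is the point of the advice.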

Also, I really don't follow your loop code. You seem to be indexing
indata without reference to columns/rows in first line within the loop.
There also seem to be several syntax errors - too many "]"?

So start simple, set y <- 7 and perform the first run of the loop "by
hand" and once that works, then do the loop in full.

 
