
Another K Means example to learn from

Adventures in R

I am a fan of K-means approaches to clustering data, particularly when you have a theoretical reason to expect a certain number of clusters and you have a large data set. However, I think plotting the cluster means can be misleading. In his ggplot2 book, Hadley Wickham suggests the following, to which I add a few small changes.

# First we run the kmeans analysis. The arguments are the dataset used
# (in this case I only want variables 1 through 11, hence the [1:11])
# and the number of clusters I want produced (in this case 4).
cl <- kmeans(mydata[1:11], 4)
# We will need to add an id variable for later use. In this case I have called it .row.
clustT1WIN$.row <- rownames(clustT1WIN)
#At this stage I also make a new variable indicating…
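
For readers who want to try these first steps without the original data, here is a minimal sketch using the built-in `mtcars` dataset, which happens to have 11 numeric columns. The dataset, the seed, the `nstart` value, and the `cluster` column are all my assumptions for illustration; the `cluster` column in particular is not necessarily the variable the excerpt goes on to create. The last two lines hint at why cluster means alone can mislead: they say nothing about cluster size or spread.

```r
# Illustrative sketch on a built-in dataset (mtcars), not the author's data.
set.seed(42)                           # kmeans starts from random centers

dat <- mtcars                          # 32 rows, 11 numeric columns

# Columns 1 to 11, 4 clusters; nstart = 10 is an assumed choice that
# reruns the algorithm from 10 random starts and keeps the best fit
cl <- kmeans(dat[1:11], centers = 4, nstart = 10)

# id variable for later use, named .row as in the post
dat$.row <- rownames(dat)

# Assumption: store each row's cluster label as a factor
dat$cluster <- factor(cl$cluster)

# Means alone hide how large and how spread out each cluster is,
# so look at sizes and per-cluster standard deviations as well
cl$size
aggregate(dat[1:11], by = list(cluster = dat$cluster), FUN = sd)
```

Because the variables in `mtcars` are on very different scales, a real analysis would probably standardise them first (for example with `scale()`); that step is left out here to stay close to the snippet above.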





