Today I’m in 46th position on the Online Product Sales leaderboard. Not a stellar result, but it’s a start. Moreover, it is the improvement that comes with each model that makes me want to try more and do better.
I think I have squeezed out most of the improvement randomForest is able to give me. Now I need to look for other techniques. Maybe blending a few models together, or feeding them into a neural network or something. *rubs hands together* I have one more week to play around before I start preparing myself for the corporate world again.
After the bigdata.sg meetup, I went looking for a definition of a data scientist, the buzzword of the moment. I like the one in the Yahoo article that interviewed EMC Greenplum’s Steven Hillion. His take on the definition (the rest of the article can be found here):
To Hillion, data scientists are “analytically-minded, statistically and mathematically sophisticated data engineers who can infer insights into business and other complex systems out of large quantities of data.”
The skill set of the data scientist goes beyond the capabilities of what many would call “traditional business intelligence (BI).” Traditional BI is interested in the “what and the where,” while data scientists are interested in the “how and why,” Hillion says. “They’re interested in inferring things that are not already present in the data.”
I like the part where he mentions that they are “equal parts engineer, statistician and investigative journalist / forensic reporter”. I can relate to those, but something is missing: the programming/hacker skills? And of course the need to understand the business. They need to listen to people, understand what questions they’re asking, but then sort of read between the lines. Skills in mathematics, statistics, modeling and data mining are of course essential.
Can’t wait to jump into Kaggle and have fun!