kaggle

This tag is associated with 2 posts

46th position

Today I’m in 46th position on the Online Product Sales leaderboard. Not a stellar result, but it’s a start. More importantly, it is the improvement that comes with each model that makes me want to try more and do better.

I think I have squeezed out most of the improvement randomForest is able to give me. Now I need to look for other techniques: maybe blending a few models together, or feeding their predictions into a neural network or something. *rubs hands together* I have one more week to play around before I start preparing myself for the corporate world again.

What’s a Data Scientist?

After the bigdata.sg meetup I went looking for a definition of “data scientist”, the buzzword of the moment. I like the one in the Yahoo article that interviewed EMC Greenplum’s Steven Hillion. Here is his take (the rest of the article can be found here):

To Hillion, data scientists are “analytically-minded, statistically and mathematically sophisticated data engineers who can infer insights into business and other complex systems out of large quantities of data.”

The skill set of the data scientist goes beyond the capabilities of what many would call “traditional business intelligence (BI).” Traditional BI is interested in the “what and the where,” while data scientists are interested in the “how and why,” Hillion says. “They’re interested in inferring things that are not already present in the data.”

I like the part where he mentions that they are “equal parts engineer, statistician and investigative journalist / forensic reporter”. I can relate to those, but something is missing: the programming/hacker skills? And, of course, the need to understand the business. Data scientists need to listen to people, understand what questions they’re asking, and then read between the lines. Skills in mathematics, statistics, modeling and data mining are of course essential.

Can’t wait to jump into Kaggle and have fun!

 
