BigData Meetup in SG

Went to my first Big Data Meetup last night.  Short but interesting talks.  It was nice to see so many passionate people turning up on a Friday night for something that is probably not exactly part of their job.

David Smith of Revolution Analytics was there, as were their GM and a consultant.  He gave a talk on “Future of Big Data Analytics: Data Science holds the key to unlocking insight”, which gave a good description of the skills he thinks someone needs to become a data scientist, and of why machine learning alone is not enough.  Basically, data science sits in the sweet spot where 3 circles of a Venn diagram overlap: Hacking (computer and programming skills), Statistics, and Substantial domain expertise (finding data sources, a good understanding of the data, the relationships between variables, and inherent assumptions… essentially, putting data and analysis into context so that they are meaningful beyond mathematical/model terms).  It encouraged me a lot that he came from a statistics background; at least I have 1 area covered.

I did not take many notes or photos of his slides, but it appears that data science involves a fair amount of effort at the initial stage: finding data sources, massaging, mashups, linking/mapping and cleaning.  That’s because data sources rarely exist in nicely structured formats, and you will need to source them from relational databases, web scraping and available APIs.

The analysis stage requires the muscles of statistics and machine learning.  For large datasets, there is a need to move the code to the data, i.e. leverage parallel computing frameworks such as MapReduce to process data at that scale efficiently. In a nutshell, there are 3 layers as shown below:

Presentation layer – BI tools, Reports
Analytical layer – R
Data layer – RDBMS, unstructured data
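
To make the "move the code to the data" idea a little more concrete, here is a minimal sketch of the MapReduce pattern. It is my own illustration and not something shown in the talk: it is written in Python rather than R, the data and the function names (map_partition, merge, mean_by_key) are made up, and a thread pool merely stands in for the cluster nodes that would each hold one chunk of the data.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

# Hypothetical data: each inner list stands in for a chunk stored on a different node.
partitions = [
    [("sg", 10.0), ("my", 4.0)],
    [("sg", 20.0), ("id", 6.0)],
    [("my", 8.0), ("sg", 30.0)],
]


def map_partition(rows):
    """Map step: summarise one partition locally as {key: (sum, count)}."""
    partial = {}
    for key, value in rows:
        total, count = partial.get(key, (0.0, 0))
        partial[key] = (total + value, count + 1)
    return partial


def merge(left, right):
    """Reduce step: combine two partial summaries into one."""
    merged = dict(left)
    for key, (total, count) in right.items():
        prev_total, prev_count = merged.get(key, (0.0, 0))
        merged[key] = (prev_total + total, prev_count + count)
    return merged


def mean_by_key(parts):
    # The small summarising function is sent to each partition;
    # only the compact partial results travel back to be combined.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(map_partition, parts))
    combined = reduce(merge, partials, {})
    return {key: total / count for key, (total, count) in combined.items()}


means = mean_by_key(partitions)
print(means)
```

The point of the pattern is that only the small per-partition summaries get moved around, never the raw rows, which is what makes it workable at big data scale.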

I had two questions that I didn’t get to ask though:

Q1 What is the level of maturity of the data science industry in SG?

Q2 How will it evolve? Will data scientists be embedded within companies, or will specialized data science consulting firms emerge?

Two more people talked about the Heritage Health Prize and Kaggle.  Kaggle really provided one of them with a good platform to learn, practice and be validated (think: get a job interview if you win a prize).  My plans are in the right direction at least.  What’s left is execution.  Kaggle, here I come (after exams).


There was a presentation on UP Singapore by a group called Newton Circus.  It is somewhat related because technical developers or data scientist-wannabes can contribute greatly to their quest:

 Leveraging rich data from the government partners, financial support from corporate partners, NGOs and community members will identify critical urban issues and solutions, and use designers, developers and hackers to prototype workable products

The last speaker was Dr Liu Xiaohui from HP Research lab, who talked about the Bamboo initiative, which will not only simplify the cloud infrastructure used in solving big data problems but also ease the administration of that infrastructure.  I don’t think I am doing justice to the initiative with my description; hopefully more information will become public soon.

Apparently there’s going to be a call for collaboration soon.  Event to look out for: Cloud Asia, 14-17 May.





