It was March 7th this year when this mail from the ASA found its way to the ASA members: At first sight, it didn’t look like something one needs to pay too much attention to, but in the longer PDF version, you can read these six principles: P-values can indicate how incompatible the data are with a […]
Posted on 04/30/2016, 17:56, by martin, under Big Data.
Well, if you are not in Data Science today, you are apparently missing a major trend … many say. Just in the last year, I witnessed at least three people mutating from ordinary computer scientists or statisticians into data scientists or data engineers. If you don’t really know what these people do, Analytics Vidhya has an easy […]
Posted on 03/05/2016, 16:49, by martin, under Big Data.
So far, my favorite description of Big Data is: Big Data is when it is cheaper to keep all data than to think about what data you probably need to answer your (business) questions. Why is this description so attractive? Well, Big Data is primarily a technology, i.e. storing data in a Hadoop File System […]
The blog post by Vincent Granville “Data science without statistics is possible, even desirable” starts by talking about “old statistics” and “new statistics”, which sparked more discussion about how statistics and data science relate. Whereas I agree that there is something like “old” and “new” thinking about the role and the tools of statistics, I am less […]
I’ve been thinking about what Big Data really is for quite a while now, and am always happy about voices that can shed some more light on this phenomenon – especially by contrasting it with what we have called statistics for centuries now. Recently I stumbled upon two articles. The first is from Miko Matsumura, who claims “Data […]
… would have been the better title for the book “Graphics of Large Datasets“. As the book was published a few years before the birth of the next buzzword promoted by McKinsey with the white paper “Big data: The next frontier for innovation, competition, and productivity“, we just did not know better. But to be […]
Google is certainly the world champion in collecting endless masses of data, be it search terms, web surfing preferences, e-mail communication, social media posts and links, … As a consequence, at Google they are not only masters of statistics (hey, my former boss at AT&T Labs who was heading statistics research went there!) but they also […]
With Big Data and the internet, we all feel like we can know and analyze everything. Certainly Google must feel that way, as they collect not only data, but also what we – the users – find interesting in that vast pile of information. As we should always keep in mind: Google is not a charity […]
I read the President’s Corner of the last ASA Newsletter by Bob Rodriguez the other day and had a flashback to the times when statistics met Data Mining in the late 90s. Daryl Pregibon – who happened to be my boss at that time – put forth a definition of data mining as “statistics at scale […]