Data Science 101 - Definitions You Need to Know - Shelly Palmer

By Thought Leaders Archives

Our 7th Annual Media Technology Summit is just a few weeks away, and the subject of data science is going to be front and center. Each and every one of us creates a wealth of data every day, but... information is not knowledge. The data must be wrangled and put in context to make it actionable. There are many different techniques one can apply to data to accomplish this goal, but an important part of the process falls into the multi-disciplinary field of data science. Which is what, exactly?

Data Science -- the analysis of data using the scientific method. (It may be the most overused term of the year, but you're unlikely to have a meeting where the topic does not come up.)

Data Scientists -- There are many who say that data scientists (people who practice data science) are not really scientists. That seems unfair. While there are a bunch of charlatans (people and organizations) passing themselves off as data scientists, I would argue that if the scientific method is applied (which, if you remember from middle school science class, is generally a statistically controlled six-step process: question, research, hypothesize, experiment, analyze, conclude), the professionals doing the work qualify as scientists. Let's not get caught up in whether or not data science is real science. There are people who use analytical tools to find patterns in data; let's call them data scientists.

Data Wrangling or Data Munging -- a laborious process of manually extracting, mapping, converting, or generally cleaning up data in raw form. Data wranglers use algorithms (an algorithm is a process or set of rules to be followed in calculations or other problem-solving operations, especially by a computer) to parse disparate types of data and fit them into defined structures. The ultimate goal is to prep the data for storage and future use.
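To make the definition a little more concrete, here is a minimal Python sketch of one wrangling pass. It assumes a small, hypothetical CSV export of viewing records; the field names, the two date formats, and the `ViewingRecord` structure are all invented for illustration. The code parses inconsistent raw values (stray whitespace, mixed date formats, a currency symbol) and fits each row into a defined structure, setting aside rows it cannot parse.

```python
import csv
import io
from dataclasses import dataclass
from datetime import date, datetime

# Hypothetical raw export: messy headers, mixed date formats,
# a stray "$", an unparseable date, and a missing value.
RAW = """viewer_id, watched_on ,minutes
 101 ,2013-10-01, 45
102,10/02/2013,$30
103,not a date,12
104,2013-10-03,
"""

@dataclass
class ViewingRecord:
    """The defined structure every cleaned row must fit (hypothetical)."""
    viewer_id: int
    watched_on: date
    minutes: int

# Assumed set of date formats that appear in the raw data.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")

def parse_date(text: str) -> date:
    """Try each known format in turn; raise ValueError if none match."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text!r}")

def wrangle(raw: str) -> tuple[list[ViewingRecord], list[dict]]:
    """Return (clean records, rejected rows) from a raw CSV string."""
    reader = csv.DictReader(io.StringIO(raw))
    # Normalize header names: strip surrounding whitespace and lowercase.
    reader.fieldnames = [name.strip().lower() for name in reader.fieldnames]
    clean, rejected = [], []
    for row in reader:
        try:
            clean.append(ViewingRecord(
                viewer_id=int(row["viewer_id"].strip()),
                watched_on=parse_date(row["watched_on"]),
                minutes=int(row["minutes"].strip().lstrip("$")),
            ))
        except (ValueError, AttributeError):
            # AttributeError covers short rows, where a missing field is None.
            rejected.append(row)
    return clean, rejected

clean, rejected = wrangle(RAW)
print(len(clean), "clean,", len(rejected), "rejected")  # 2 clean, 2 rejected
```

Rejected rows are kept rather than silently dropped so a person can inspect them, which is where much of the manual labor in the definition above tends to go.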

Big Data -- can mean anything that anyone wants it to mean. It is on my list of banned words and really is more of a concept than an agreed-upon thing. However, it is usually defined as sets of data that are too large and complex to manipulate or interrogate with standard methods or tools -- in other words... big. If you collect big amounts of data, go ahead and call it big data. A good example of a big data set is all the digital health records in the United States or all of the viewer data from all of Comcast's set-top boxes.

Hadoop -- Apache Hadoop is an open-source software framework for the storage and large-scale processing of data sets on clusters of commodity hardware. It is commonly run on "Hadoop clusters," which are purpose-designed computational clusters.
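Hadoop's original processing model is MapReduce: a map step emits key/value pairs from each chunk of input, the framework groups (shuffles) those pairs by key, and a reduce step combines each group. The sketch below simulates that flow in a single Python process purely to show the idea; it does not use any Hadoop API, and the function names are hypothetical. On a real cluster, the input chunks would be stored in HDFS (Hadoop's distributed file system) and the map and reduce tasks would run in parallel across machines.

```python
from collections import defaultdict
from itertools import chain
from typing import Iterable, Iterator

def map_phase(chunk: str) -> Iterator[tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input chunk."""
    for word in chunk.lower().split():
        yield word, 1

def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group every emitted value under its key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)

def word_count(chunks: list[str]) -> dict[str, int]:
    # In Hadoop, each chunk would be mapped by a separate task, likely on a
    # separate node; here they are simply processed one after another.
    mapped = chain.from_iterable(map_phase(c) for c in chunks)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())

chunks = ["big data big clusters", "data wrangling", "Big data"]
print(word_count(chunks))
# {'big': 3, 'data': 3, 'clusters': 1, 'wrangling': 1}
```

Because map tasks never talk to each other and each reduce call sees only one key's values, the work can be split across many commodity machines, which is the design choice that lets this model scale.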

Multivariate statistical analysis -- is based on the statistical principle of multivariate statistics, which involves the observation and analysis of more than one statistical outcome variable at a time. Multivariate analysis has several uses, such as capability-based design, inverse design (where any variable can be treated as an independent variable), Analysis of Alternatives (AoA), and correlations across hierarchical levels.
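As a small illustration of looking at several variables at once, the sketch below computes a Pearson correlation matrix across multiple variables measured on the same observations. Correlation is only one of many multivariate techniques (principal component analysis and MANOVA are others), and the variable names and numbers here are made up for the example.

```python
import math

def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two equal-length samples.

    Assumes neither sample is constant (zero variance would divide by zero).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def correlation_matrix(data: dict[str, list[float]]) -> dict[tuple[str, str], float]:
    """Correlation for every ordered pair of variables in `data`."""
    names = list(data)
    return {(a, b): pearson(data[a], data[b]) for a in names for b in names}

# Hypothetical observations: position i in each list is the same household.
households = {
    "hours_tv":     [2.0, 4.0, 6.0, 8.0],
    "hours_online": [8.0, 6.0, 4.0, 2.0],
    "ad_recall":    [1.0, 2.0, 3.0, 4.0],
}
matrix = correlation_matrix(households)
print(round(matrix[("hours_tv", "ad_recall")], 3))     # 1.0
print(round(matrix[("hours_tv", "hours_online")], 3))  # -1.0
```

Real data will rarely produce perfect correlations like these; the toy numbers were chosen so the results are easy to check by hand.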

Time-series analysis -- the analysis of data points recorded in time order. A common application, time-series forecasting, is the use of a model to predict future values based on previously observed values. Time-series analysis differs from regression analysis (which is often used to test theories that the current values of one or more independent time series affect the current value of another time series) in that it focuses on comparing values of a single time series, or of multiple dependent time series, at different points in time.
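A minimal forecasting sketch follows, using simple exponential smoothing, one of many time-series forecasting models, chosen here only because it fits in a few lines. Each new observation updates a running "level" that weights recent values more heavily than older ones, and the final level serves as the forecast for the next period. The smoothing factor `alpha` and the weekly ratings are assumptions for illustration.

```python
def exponential_smoothing_forecast(series: list[float], alpha: float) -> float:
    """One-step-ahead forecast via simple exponential smoothing.

    level_t = alpha * y_t + (1 - alpha) * level_{t-1}, starting from y_0.
    """
    if not series:
        raise ValueError("series must contain at least one observation")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]  # initialize the level with the first observation
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
    return level

# Hypothetical weekly ratings for one show.
weekly_ratings = [10.0, 12.0, 14.0]
print(exponential_smoothing_forecast(weekly_ratings, alpha=0.5))  # 12.5
```

Larger `alpha` values react faster to recent changes; smaller ones smooth out noise. Note that the forecast (12.5) lags a steadily rising series whose last value is 14, which is one reason models with an explicit trend component exist.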

Multidimensional array -- a data structure that has the semantics of an array of arrays, all of which may be indexed with values of any data type, usually with supporting syntax built into a programming language.
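Both halves of that definition can be sketched in Python, whose standard library has no dedicated multidimensional array type (third-party libraries such as NumPy provide one). Nested lists give the "array of arrays" semantics with integer indices, while a dictionary keyed by tuples lets each dimension be indexed by labels of other types, much as a reporting data set might be indexed by region, show, and month. The dimensions, labels, and figures below are hypothetical.

```python
# Nested lists: a 2 x 3 x 4 array of arrays, indexed by integer position.
# Comprehensions build independent inner lists; [[0] * cols] * rows would
# make every row the same list object.
depth, rows, cols = 2, 3, 4
grid = [[[0 for _ in range(cols)] for _ in range(rows)] for _ in range(depth)]
grid[1][2][3] = 42  # layer 1, row 2, column 3

# Dict keyed by tuples: each coordinate can be any hashable value.
# Hypothetical viewing minutes by (region, show, month).
cube: dict[tuple[str, str, int], int] = {
    ("Northeast", "News", 10): 1200,
    ("Northeast", "Sports", 10): 900,
    ("West", "News", 10): 700,
}

def slice_by_region(cube: dict[tuple[str, str, int], int],
                    region: str) -> dict[tuple[str, int], int]:
    """Return every cell whose first coordinate matches `region`."""
    return {key[1:]: value for key, value in cube.items() if key[0] == region}

print(grid[1][2][3])                      # 42
print(slice_by_region(cube, "Northeast"))
# {('News', 10): 1200, ('Sports', 10): 900}
```

The nested-list form is compact when every position holds a value; the dictionary form only stores cells that exist, which suits sparse data where most combinations of labels never occur.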

These are just a few of the terms that you should know if you're going to discuss data science with your HR department or anyone else, for that matter. There is a great deal of myth and mystery around this subject, such as:

· Are data scientists just statisticians with fancy titles?

· Don't we need super-expensive data appliances to support a data science department?

· Aren't all these people just academics who don't know anything about business?

Of course, everyone really wants to know: Where can I find one? These are all great questions. For great answers, come join us at the 7th Annual Media Technology Summit on October 23rd at the Sheraton Times Square. BTW, if you've read this far, email me for a discount code – you deserve it!

Shelly Palmer is Fox 5 New York's On-air Tech Expert (WNYW-TV) and the host of Fox Television's monthly show Shelly Palmer Digital Living. He also hosts United Stations Radio Network's Shelly Palmer Digital Living Daily, a daily syndicated radio report that features insightful commentary and a unique insider's take on the biggest stories in technology, media, and entertainment. He is Managing Director of Advanced Media Ventures Group, LLC, an industry-leading advisory and business development firm, and a member of the Executive Committee of the National Academy of Television Arts & Sciences (the organization that bestows the coveted Emmy® Awards). Palmer is the author of Television Disrupted: The Transition from Network to Networked TV 2nd Edition (York House Press, 2008), the seminal book about the technological, economic, and sociological forces that are changing everything; Overcoming The Digital Divide: How to use Social Media and Digital Tools to Reinvent Yourself and Your Career (York House Press, 2011); and Digital Wisdom: Thought Leadership for a Connected World (York House Press, 2013). For more information, visit shellypalmer.com.

Read all Shelly's MediaBizBloggers commentaries at Shelly Palmer Report.


The opinions and points of view expressed in this commentary are exclusively the views of the author and do not necessarily represent the views of MediaBizBloggers.com management or associated bloggers. MediaBizBloggers is an open thought leadership platform and readers may share their comments and opinions in response to all commentaries.

Copyright ©2024 MediaVillage, Inc. All rights reserved. By using this site you agree to the Terms of Use and Privacy Policy.