So far in this series, we have given an overview of data science and covered the basics of data enrichment. Next, we conquer the sexiest part of data science: data applications.
Let’s face it, data enrichment is pretty boring. But without data enrichment, you can’t get to the fun stuff—analyzing the data, gaining insights, and blowing the mind of your boss. Big data is only as useful as the insights you pull from it.
With the proper tools and a stroke of mathematical genius, data scientists analyze data and do everything from setting up real-time business dashboards to monitoring trends to actually predicting the future. Let’s begin with some of the more popular tools of the trade.
There are three basic categories for data applications: data mining, statistical tools, and business intelligence. KDnuggets, a leading site on Business Analytics, Big Data, Data Mining, and Data Science, conducted a survey in 2015 to discover which tools data scientists were actually using.
The top 10 tools by share of users were
One of the most popular data science tools is R. R is free software available under the terms of the Free Software Foundation’s GNU General Public License in source code form. The statistical programming language is used primarily for data analysis and graphics. Check out our complete guide to R for more information!
Another must-have for data science is SQL. Structured Query Language is a special-purpose programming language designed for managing data held in a relational database management system (RDBMS), or for stream processing in a relational data stream management system (RDSMS). It is built on the relational model, which Dr. E.F. Codd introduced in June of 1970 in a paper entitled "A Relational Model of Data for Large Shared Data Banks"; the language itself was developed at IBM in the years that followed. Decades later, it remains essential for storing and retrieving customer data, website activity, and purchases. SQL is a great language to learn whether you are a data scientist or a businessperson. Codementor offers a free course to get you started!
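To make the "fetching customer data and purchases" idea concrete, here is a minimal sketch of a SQL query run from Python's built-in sqlite3 module. The purchases table, its columns, and the sample rows are all hypothetical, invented purely for illustration; a real system would have its own schema and database engine.

```python
import sqlite3


def top_customers(rows, limit=3):
    """Load (customer, amount) rows into an in-memory table and return
    the biggest spenders as (customer, total) pairs, highest total first."""
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical schema: one row per purchase.
        conn.execute("CREATE TABLE purchases (customer TEXT, amount REAL)")
        conn.executemany("INSERT INTO purchases VALUES (?, ?)", rows)
        cursor = conn.execute(
            "SELECT customer, SUM(amount) AS total "
            "FROM purchases "
            "GROUP BY customer "
            "ORDER BY total DESC, customer ASC "
            "LIMIT ?",
            (limit,),
        )
        return cursor.fetchall()
    finally:
        conn.close()


# Made-up sample data, not taken from any real system.
SAMPLE_PURCHASES = [
    ("alice", 30.0),
    ("bob", 12.5),
    ("alice", 20.0),
    ("carol", 45.0),
    ("bob", 7.5),
]

if __name__ == "__main__":
    for customer, total in top_customers(SAMPLE_PURCHASES):
        print(customer, total)
```

The interesting part is the SELECT statement: GROUP BY collapses many purchase rows into one row per customer, and ORDER BY with LIMIT keeps only the top spenders.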
As you can judge by the top 10 list, there is a wide variety of tools and applications available to data scientists. Each has its own strengths and weaknesses and adds value in its own way. That said, tools are only as valuable as the insights you can pull with them. How do we make data relevant?
Business intelligence is all about taking complex data and putting it into a form that is easily understandable to non-technical leaders in your business. While it’s great to get robust, technical data, if it's hard to understand for the rest of the team, then that awesome work remains merely impressive facts as opposed to actionable business data. Great data scientists take complicated data sets and translate them into understandable, actionable insights.
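As a rough illustration of that translation step, the sketch below collapses raw order records into a short per-region summary that a non-technical reader could scan. The function name, the regions, and the revenue figures are hypothetical; a real BI tool does far more than this, so treat it only as a picture of the idea.

```python
from collections import defaultdict


def summarize_by_region(orders):
    """Collapse raw (region, revenue) records into one line per region:
    (region, total revenue, percent share of all revenue), largest first."""
    totals = defaultdict(float)
    for region, revenue in orders:
        totals[region] += revenue

    grand_total = sum(totals.values())
    summary = []
    # Sort by revenue descending, then by name so ties are deterministic.
    for region, revenue in sorted(totals.items(), key=lambda kv: (-kv[1], kv[0])):
        share = round(100 * revenue / grand_total, 1) if grand_total else 0.0
        summary.append((region, revenue, share))
    return summary


# Made-up raw records, standing in for thousands of transaction rows.
SAMPLE_ORDERS = [
    ("East", 400.0),
    ("West", 250.0),
    ("East", 100.0),
    ("North", 250.0),
]

if __name__ == "__main__":
    for region, revenue, share in summarize_by_region(SAMPLE_ORDERS):
        print(f"{region}: {revenue:.2f} ({share}% of revenue)")
```

Four raw rows become three summary lines here; with real data, the same step turns millions of rows into the handful of numbers a leadership team actually needs.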
If you have a ton of data, but no actionable insights, it may be time to invest in a BI tool. There comes a point in every growing business where you gather so much data that Excel spreadsheets become impractical and bug-ridden. If you are looking to purchase a BI tool, G2Crowd has a great comparison tool to get you started. Once you have an application, or even if you have a legacy system you need to connect to your new application, our team at Progress DataDirect can connect any data source to any application. Don’t let data integration scare you or keep you from business modernization. We have your back.
Predictive models consist of algorithms that are trained over a period of time with vast amounts of data so that they will eventually provide predictive value. This is where we get into machine learning and deep learning.
These algorithms perform regressions and find relevant trends and correlations to make accurate forecasts. One popular application of deep learning is image recognition. The image recognition revolution kicked off in 2012, when a team from the University of Toronto won an image recognition contest. Since then, Facebook has released facial recognition software, Google Maps differentiates between clouds and cars, and Twitter can identify obscene images without the need for a human curator. Wired magazine has a great article that delves deeper into Deep Learning.
Algorithms are developed to do everything from recognizing natural speech to winning Jeopardy! to successfully predicting crime patterns. Marcos Otero gives some great insight into the algorithms that change our world every single day in his article, “The real 10 algorithms that dominate our world.” One of the most powerful events in this space was in 2011, when IBM’s Watson beat Ken Jennings in a televised game of Jeopardy! “My past Jeopardy experiences have been great, but they weren’t really weighty with this kind of technological, philosophical, importance. I think we saw something important today.” (Ken Jennings)
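For a feel of what "performing a regression to make a forecast" means at the smallest possible scale, here is a sketch of ordinary least-squares fitting of a straight line, written in plain Python. The monthly sales numbers are invented, and real predictive models (let alone deep learning systems) are trained on far more data with far richer models; this only shows the fit-then-predict pattern.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares.
    Returns (slope, intercept)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two points and equal-length inputs")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x around its mean, and how x and y vary together.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply a fitted (slope, intercept) model to a new x value."""
    slope, intercept = model
    return slope * x + intercept


# Made-up training data: month number vs. units sold.
MONTHS = [1, 2, 3, 4, 5]
SALES = [110, 120, 130, 140, 150]

if __name__ == "__main__":
    model = fit_line(MONTHS, SALES)
    print("forecast for month 6:", predict(model, 6))
```

The "training" here is the call to fit_line on past months, and the "prediction" is applying the learned slope and intercept to a month the model has not seen.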
In this series you have taken a dive into data science as a whole, data enrichment, and data applications. This is a world brimming with potential and a field that is becoming one of the most envied job titles in the world. So whether you are a data scientist now or just someone curious about this awesome field, I encourage you to keep learning, reading, and investing in this big data revolution.
Now that you have a better idea of the intricacies of data science, you may be curious about how to get your company started in collecting, integrating, and analyzing data. Luckily for you, we are here to help you at every step of this road. We specialize in integrating data on premises, in the cloud, or even in hybrid environments from any source. Take a look, and if you are interested in any of our products, try them for free today!
Justin Moore is a Data Scientist for Progress DataDirect. He has more than 10 years of experience in software development, analytics, and data analysis. He holds a B.S. in theoretical mathematics from North Carolina State University and a post-baccalaureate certification in computer science, and is currently pursuing a Master’s degree in Data Science through Regis University.