Because this post is about the most popular languages for data science, it is reasonable to first ask, “What is data science?”
Most of the hype around data science has focused on the first word: data. People care about volume and velocity and whatever other buzzwords describe data that is too big to analyze in Excel. This hype about the size of the data being collected fed into a second category of hype: hype about tools. People threw around EC2, Hadoop, and Pig, and had huge debates about Python versus R.
But the key word in data science is not “data”; it is science. Data science is only useful when the data are used to answer a question.
When it comes to choosing a programming language for data science and analytics projects or jobs, people hold different views depending on their career backgrounds and the domains they have worked in. The analytics field is growing in popularity with the advent of machine learning and AI all around us. To work in data science, beginners must know at least one programming language. Luckily, there are many programming languages that are used for data science.
KDnuggets Analytics/Data Science Survey 2016
The top 10 most popular tools of 2016, according to the popular KDnuggets Analytics/Data Science 2016 Software Poll, are given below.
The 2017 Top Programming Languages – IEEE Spectrum
IEEE Spectrum has just published The 2017 Top Programming Languages, and its ranking of languages is below.
Results of the 2017 StackOverflow Survey
On the other hand, the results of the 2017 StackOverflow Survey of nearly 65,000 developers were recently published, and they include lots of interesting insights about developers’ work, lives and preferences. The results include a cross-tabulation of the most popular languages among the “Data Scientist/Engineer” subset, and the results were … well, super surprising:
Analysis Of Data From indeed.com
Here I have used the trend search available on indeed.com. It looks for occurrences over time of selected terms in job offers. It gives an indication of what skills employers are seeking.
Note: This is not a poll of which skills are actually in use. It is just an indicator of how skill popularity evolves.
Running this query, I got the data below:
The data show a clear increase in popularity for machine learning and data science over the last few years. Python is the clear leader, followed by R and Java. Julia has not reached comparable popularity yet. Will Julia turn into one of the popular languages for machine learning and data science? Only the future will tell.
When I focus on the keyword “deep learning” and run this query, I get the data below:
Here, Python is still the leader.
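The kind of counting behind such a trend search can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the postings below are made-up sample data, not results from indeed.com, and `term_share_by_year` is a hypothetical helper, not how indeed.com actually computes its trends. It reports, for each year, the fraction of postings that mention each term.

```python
import re
from collections import defaultdict

# Hypothetical sample postings as (year, text). NOT real indeed.com data.
POSTINGS = [
    (2015, "Data analyst with R and SQL experience"),
    (2015, "Backend engineer, Java and Spring"),
    (2016, "Data scientist: Python, machine learning"),
    (2016, "Machine learning engineer with Python and Java"),
    (2017, "Deep learning researcher, Python required"),
    (2017, "Data scientist, R or Python"),
]

TERMS = ["python", "r", "java", "machine learning"]


def term_share_by_year(postings, terms):
    """Return {year: {term: fraction of that year's postings mentioning term}}."""
    # Word boundaries keep "r" from matching inside other words
    # and "java" from matching "javascript".
    patterns = {
        term: re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for term in terms
    }
    totals = defaultdict(int)
    hits = defaultdict(lambda: defaultdict(int))
    for year, text in postings:
        totals[year] += 1
        for term, pattern in patterns.items():
            if pattern.search(text):
                hits[year][term] += 1
    return {
        year: {term: hits[year][term] / totals[year] for term in terms}
        for year in sorted(totals)
    }


if __name__ == "__main__":
    for year, shares in term_share_by_year(POSTINGS, TERMS).items():
        print(year, shares)
```

As with the real trend search, a rising share only says that employers mention a term more often, not that the skill is used more in practice.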
My Fab Languages For Crunching Data
Here is my list of favorite languages that I use on a daily basis for crunching data. I don’t use Scala and Go myself, but I kept them in the list because I thought you might be interested in them.
R has been kicking around since 1997 as a free alternative to pricey statistical software, such as Matlab or SAS.
In recent years, it has become the golden child of data science. It has a simple appeal, and its greatest asset is the vibrant ecosystem that has developed around it: the R community is constantly adding new packages and features to its already rich function sets. With R, you can work with complex data sets, manipulate data through sophisticated modeling functions, and create attractive graphics to represent the numbers, in just a few lines of code.
Python’s ecosystem has grown dramatically in recent years, making it more capable of statistical analysis. Many people find Python intuitive and easier to learn than R.
“It’s the big one people in the industry are moving toward. Over the past two years, there’s been a noticeable shift away from R and towards Python,” – Paul Butler, data scientist at Chango and formerly at Facebook.
The vast majority of data science today is conducted through R, Python, Java, MatLab, and SAS. But there are still gaps to be filled, and Julia is one newcomer to watch.
“It’s up and coming. Eventually, you’ll be able to do anything you could have done in R and Python, in Julia,” – Butler.
Java and Java-based frameworks form the skeleton of most tech companies. Java doesn’t provide the same quality of visualizations that R and Python do, and it isn’t the best for statistical modeling. But if you are moving past prototyping and need to build large systems, Java is often your best bet.
“If you look inside Twitter, LinkedIn, or Facebook, you will find that Java is the foundational language for all of their data engineering infrastructures,” – Michael Driscoll, CEO of Metamarkets.
Scala is another language that runs on the Java Virtual Machine and, like Java, it is increasingly becoming the tool for anyone doing machine learning at large scale or building high-level algorithms. It is expressive, and also capable of building robust systems.
“Java is like building in steel. Scala is like working with clay that you can then put into a kiln and turn into steel,” – Michael Driscoll, CEO of Metamarkets.
MatLab has been around for what feels like an eternity and, despite its price tag, it is still widely used in very specific niches: research-intensive machine learning, signal processing, and image recognition, to name a few.
Octave is very similar to MatLab, except it’s free. Still, it’s rarely seen outside of academic signal processing circles.
Go is another newcomer that’s gaining steam. It was developed by Google, loosely derives from C, and is gaining ground against rivals such as Java and Python for building robust infrastructure.