Open source platforms like Python and R play an important role in the Data Science market. In recent years Python has gained a lot of traction in the Data Science industry alongside R. In this post, I have listed the 5 most popular and useful Python libraries for Machine Learning and Deep Learning.
There is a lot of confusion these days about Machine Learning (ML) and Deep Learning (DL).
Machine Learning VS Deep Learning
ML is a field of computer science that uses statistical or mathematical techniques to construct a model from observed data, rather than having the user enter a specific set of instructions that define the model for that data. It involves many modelling techniques (mostly statistical) such as linear regression, logistic regression, K-means, decision trees, random forests, PCA, SVM and ANN.
Among these, Artificial Neural Networks (ANNs) are one especially important class of models. When artificial neural networks are designed with multiple hidden layers they form Deep Neural Networks (DNNs), and training such networks is what is usually called Deep Learning. So Deep Learning is essentially ANNs with multiple layers: it is a subset of ML that sits alongside SVM, decision trees and random forests, and it often outperforms them on problems with large amounts of data, such as image, speech and text tasks.
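To make "an ANN with multiple hidden layers" concrete, here is a minimal sketch of a forward pass through a small deep network written with plain NumPy. The layer sizes, the ReLU and softmax choices, and the helper names (init_layers, forward) are assumptions picked for illustration; the network is untrained, so the output only shows the shape of the computation.

```python
import numpy as np

def relu(x):
    # Element-wise rectified linear unit.
    return np.maximum(0.0, x)

def softmax(z):
    # Subtract the row maximum for numerical stability.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def init_layers(sizes, seed=0):
    # One (weights, bias) pair per connection between consecutive layers.
    rng = np.random.default_rng(seed)
    return [
        (rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out))
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]

def forward(x, layers):
    # Hidden layers apply an affine map followed by ReLU.
    a = x
    for w, b in layers[:-1]:
        a = relu(a @ w + b)
    # The output layer turns scores into class probabilities.
    w, b = layers[-1]
    return softmax(a @ w + b)

# 4 inputs -> two hidden layers of 16 units -> 3 output classes.
layers = init_layers([4, 16, 16, 3])
x = np.random.default_rng(1).normal(size=(5, 4))
probs = forward(x, layers)
print(probs.shape)  # (5, 3)
```

With a single hidden layer this would be an ordinary ANN; adding more entries to the sizes list is what makes it "deep".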
“Software is eating the world”, “Deep Learning is eating ML”
Machine Learning Library
Scikit-learn is an open source machine learning library for the Python programming language. It features various classification, regression and clustering algorithms including support vector machines, random forests, gradient boosting, k-means and DBSCAN. It is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy. The package is built on top of SciPy and makes heavy use of its math operations.
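As a rough illustration of how Scikit-learn is typically used, the sketch below trains a random forest classifier on the iris dataset that ships with the library. The dataset, the 75/25 split and the hyperparameters are assumptions chosen only to keep the example short.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Load a small built-in dataset as NumPy arrays.
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the samples to estimate accuracy on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Fit the model on the training split, then score it on the held-out split.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
print(f"test accuracy: {accuracy:.2f}")
```

Most Scikit-learn estimators share this fit / predict / score interface, so swapping in a support vector machine or gradient boosting model mostly means changing the class that is instantiated.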
Deep Learning Libraries
One of the most prominent libraries for Python in the field of deep learning is Keras, which can function on top of either TensorFlow or Theano.
TensorFlow is an open-source software library for machine intelligence. It was developed by the Google Brain team, which released it as open source on November 9, 2015. It is the successor to DistBelief, a machine learning system based on neural networks. TensorFlow performs numerical computation efficiently using data flow graphs. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) communicated between them. The flexible architecture allows you to deploy computation to one or more CPUs or GPUs in a desktop, server, or mobile device with a single API.
Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. It is similar to NumPy in the math operations and expressions it offers. Theano was developed by the Machine Learning group of Université de Montréal. The library also optimizes the use of GPU and CPU, making data-intensive computation even faster. In effect, it provides the building blocks for neural networks, whereas NumPy provides the building blocks for scientific computing.
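The data flow graph idea can be sketched in a few lines of plain Python. This is not the TensorFlow API: the Node class and the constant, matmul, add and run helpers are hypothetical names invented for illustration, and NumPy arrays stand in for tensors.

```python
import numpy as np

class Node:
    """A graph node: an operation plus the nodes whose outputs feed into it."""
    def __init__(self, op, inputs=()):
        self.op = op
        self.inputs = tuple(inputs)

def constant(value):
    # A source node with no inputs that always yields the same array.
    arr = np.asarray(value, dtype=float)
    return Node(lambda: arr)

def matmul(a, b):
    return Node(np.matmul, (a, b))

def add(a, b):
    return Node(np.add, (a, b))

def run(node, cache=None):
    # Evaluate the inputs first, then apply this node's operation.
    # The cache ensures each node is computed only once per run.
    if cache is None:
        cache = {}
    if node not in cache:
        args = [run(n, cache) for n in node.inputs]
        cache[node] = node.op(*args)
    return cache[node]

# Build the graph for y = x @ w + b; nothing is computed yet.
x = constant([[1.0, 2.0]])
w = constant([[3.0], [4.0]])
b = constant([[0.5]])
y = add(matmul(x, w), b)

# Only now do arrays flow along the edges of the graph.
result = run(y)
print(result)  # [[11.5]]
```

Separating graph construction from execution is what lets a framework decide where each operation runs, for example on a CPU or a GPU, before any data moves.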
Keras is a prominent open source library written in Python for building neural networks. It is capable of running on top of MXNet, Deeplearning4j, TensorFlow, Microsoft Cognitive Toolkit (CNTK) or Theano. The library contains numerous implementations of commonly used neural network building blocks such as layers, objectives, activation functions and optimizers, along with a host of tools to make working with image and text data easier.
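The building blocks mentioned above map quite directly onto code. The following is a minimal sketch that assumes the Keras bundled with TensorFlow 2.x (imported as tensorflow.keras); the layer sizes, the random data and the training settings are made up purely for illustration.

```python
import numpy as np
from tensorflow import keras

# Layers and activation functions: a small fully connected classifier.
model = keras.Sequential([
    keras.layers.Input(shape=(4,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(3, activation="softmax"),
])

# Optimizer and objective (loss) are picked by name.
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# Random stand-in data: 60 samples, 4 features, 3 classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4)).astype("float32")
y = rng.integers(0, 3, size=60)

# Train briefly, then predict class probabilities for a few samples.
model.fit(X, y, epochs=2, batch_size=16, verbose=0)
probs = model.predict(X[:5], verbose=0)
print(probs.shape)  # (5, 3)
```

Because the data here is random, the accuracy is meaningless; the point is only how layers, a loss and an optimizer are assembled into a trainable model.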
Google Trends results for the past year
Lasagne is a lightweight library to build and train neural networks in Theano. It supports Convolutional Neural Networks (CNNs) and recurrent networks, including Long Short-Term Memory (LSTM). It provides transparent support for CPUs and GPUs thanks to Theano’s expression compiler. You can use it if you want the flexibility of Theano but don’t want to always write neural network layers from scratch.
Caffe is a deep learning framework made with expression, speed, and modularity in mind. It is developed by the Berkeley Vision and Learning Center (BVLC) and by community contributors. Google’s DeepDream is based on the Caffe framework. Caffe isn’t a Python library: it is a BSD-licensed C++ library that provides bindings for the Python programming language.
A bonus list of Python libraries:
Blocks is a framework that helps you build neural network models on top of Theano.
Pylearn2 is a library that wraps a lot of models and training algorithms such as Stochastic Gradient Descent that are commonly used in Deep Learning. Its functional libraries are built on top of Theano.
DeepPy is another Python deep learning framework built on top of NumPy.
deepnet is a GPU-based Python implementation of deep learning algorithms. It includes Feed-forward Neural Nets, Restricted Boltzmann Machines, Deep Belief Nets, Autoencoders, Deep Boltzmann Machines and Convolutional Neural Nets.
Gensim is a toolkit implemented in the Python programming language for topic modelling and for learning vector representations of text, such as word2vec. It is intended for handling large text collections using efficient algorithms.
nolearn contains a number of wrappers and abstractions around existing neural network libraries. Just as Keras wraps Theano and TensorFlow to provide a friendly API, nolearn provides wrappers and abstractions for Lasagne, along with a few machine learning utility modules.
Passage is a library well suited to text analysis with RNNs.
The Microsoft Cognitive Toolkit (CNTK) is also a deep learning toolkit that describes neural networks as a series of computational steps via a directed graph. CNTK makes it easy to build and combine popular model types such as feed-forward DNNs, convolutional nets (CNNs), and recurrent networks (RNNs/LSTMs). It implements stochastic gradient descent (SGD, error backpropagation) learning with automatic differentiation and parallelization across multiple GPUs and servers. CNTK has been available under an open-source license since April 2015.
This list is by no means exhaustive, and there are a few other popular deep learning libraries available in languages other than Python. Deeplearning4j is a very powerful Java library which supports GPUs and MapReduce. Torch is another scientific computing framework with wide support for machine learning algorithms that puts GPUs first. It is easy to use and efficient, thanks to an easy and fast scripting language, LuaJIT, and an underlying C/CUDA implementation. Similarly, deepnet is an R package that implements some deep learning architectures and neural network algorithms, including BP, RBM, DBN, deep autoencoders and so on. Also in R, darch is a package that can be used for generating neural networks with many layers.