In recent years, Natural Language Processing (NLP) technology has progressed rapidly despite many challenges, and the trend is expected to continue in the coming years. Today there is a plethora of diverse NLP solutions built on new-age technologies, and as new solutions arrive at a rapid pace, this fast-growing ecosystem makes it hard for customers to objectively compare the features, performance, scalability, and cost of different systems.
In this post, I have handpicked some of the leading NLP solutions:
- Google Natural Language API
- Microsoft Linguistic Analysis API & Text Analytics API
- Watson Natural Language Understanding
- Stanford CoreNLP
- Natural Language Toolkit (NLTK)
Note: This is not a ranking of the products; it is just a list of awesome NLP solutions that I have used to date.
Google Cloud Natural Language API reveals the structure and meaning of text by offering powerful machine learning models in an easy to use REST API. You can use it to extract information about people, places, events and much more, mentioned in text documents, news articles or blog posts. You can use it to understand sentiment about your product on social media or parse intent from customer conversations happening in a call center or a messaging app. You can analyze text uploaded in your request or integrate with your document storage on Google Cloud Storage.
CLOUD NATURAL LANGUAGE API FEATURES
1. Syntax Analysis
Extract tokens and sentences, identify parts of speech (PoS) and create dependency parse trees for each sentence.
2. Entity Recognition
Identify entities and label them by type, such as person, organization, location, event, product, and media.
3. Sentiment Analysis
Understand the overall sentiment expressed in a block of text.
4. Content Classification
It can classify documents into more than 600 predefined categories.
5. Language Support
It supports multiple languages, including Chinese (Simplified and Traditional).
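To give a feel for what the sentiment feature returns, here is a sketch of parsing a response shaped like the output of the Cloud Natural Language `analyzeSentiment` endpoint. The sample values and the threshold-based labeling are my own illustration, not part of the API:

```python
import json

# Sample response in the shape returned by the analyzeSentiment endpoint
# (values here are made up for illustration).
sample_response = json.dumps({
    "documentSentiment": {"score": 0.8, "magnitude": 1.6},
    "language": "en",
    "sentences": [
        {"text": {"content": "The API is easy to use.", "beginOffset": 0},
         "sentiment": {"score": 0.8, "magnitude": 0.8}},
    ],
})

def summarize_sentiment(response_json: str) -> str:
    """Turn a sentiment response into a human-readable label."""
    data = json.loads(response_json)
    score = data["documentSentiment"]["score"]  # -1.0 (negative) .. 1.0 (positive)
    if score > 0.25:
        return "positive"
    if score < -0.25:
        return "negative"
    return "neutral"

print(summarize_sentiment(sample_response))  # positive
```

The `score` field captures polarity, while `magnitude` grows with the amount of emotional content, so a long mixed review can have a near-zero score but a large magnitude.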
The Microsoft Text Analytics API can detect sentiment, key phrases, topics, and language in text, and the Linguistic Analysis API lets you simplify complex language concepts and parse text.
Capabilities in Text Analytics
1. Sentiment Analysis
Find out what customers think of your brand or topic by analyzing raw text for clues about positive or negative sentiment. This API returns a sentiment score between 0 and 1 for each document, where 1 is the most positive.
2. Key Phrase Extraction
It automatically extracts key phrases to quickly identify the main points. For example, for the input text ‘The food was delicious and there were wonderful staff’, the API returns the main talking points: ‘food’ and ‘wonderful staff’.
3. Language Detection
For up to 120 languages, detect which language the input text is written in and report a single language code for every document submitted in the request.
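A sketch of working with the sentiment capability, using request and response shapes modeled on the Text Analytics sentiment endpoint (the exact field names here are an assumption for illustration, and the response is mocked rather than fetched):

```python
def build_sentiment_request(texts):
    """Batch documents in the shape the sentiment endpoint expects."""
    return {"documents": [
        {"id": str(i), "language": "en", "text": t}
        for i, t in enumerate(texts, start=1)
    ]}

def label_scores(response):
    """Map each document's 0..1 score (1 = most positive) to a label."""
    return {d["id"]: ("positive" if d["score"] >= 0.5 else "negative")
            for d in response["documents"]}

request = build_sentiment_request(["Great service!", "Terrible food."])
mock_response = {"documents": [{"id": "1", "score": 0.93},
                               {"id": "2", "score": 0.08}]}
print(label_scores(mock_response))  # {'1': 'positive', '2': 'negative'}
```

Batching several documents per request keeps you within rate limits when scoring large volumes of customer feedback.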
Capabilities in Linguistic Analysis API
1. Sentence separation & Tokenization
Given a body of text, one of the first steps in analysis is to break it into sentences and tokens.
2. Part-of-speech tagging
Once your text is split into tokens, you can find the nouns (entities, persons, places, things, etc.), verbs (actions, changes of state), and more using part-of-speech tagging.
3. Constituency parsing
Determine the internal structure and meaning of a sentence (entities, purpose, etc.) by breaking it into labelled phrases. This helps you understand who is doing what to whom.
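The Linguistic Analysis API performs these steps for you; to show what sentence separation and tokenization mean in practice, here is a deliberately naive pure-Python sketch (not the API itself):

```python
import re

def split_sentences(text):
    """Naive sentence separation: split on ., !, or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def tokenize(sentence):
    """Naive tokenization: runs of word characters, punctuation kept as tokens."""
    return re.findall(r"\w+|[^\w\s]", sentence)

text = "I love this API. Do you?"
sentences = split_sentences(text)
print(sentences)                           # ['I love this API.', 'Do you?']
print([tokenize(s) for s in sentences])    # [['I', 'love', 'this', 'API', '.'], ['Do', 'you', '?']]
```

Real tokenizers handle the hard cases this sketch ignores, such as abbreviations (“Dr. Smith”), decimal numbers, and contractions, which is exactly why a dedicated API is useful.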
Watson Natural Language Understanding provides natural language processing functionality for advanced text analysis. Natural Language Understanding is a collection of APIs that offer text analysis through natural language processing. This set of APIs can analyze text to help you understand its concepts, entities, keywords, sentiment, and more. Additionally, you can create a custom model for some APIs to get specific results that are tailored to your domain.
Features of Watson Natural Language Understanding
1. Uncover insights from structured and unstructured data
It can analyze text to extract metadata from content such as concepts, entities, keywords, categories, relations, and semantic roles.
2. Understand sentiment and emotion
Returns both overall sentiment and emotion for a document, and targeted sentiment and emotion towards keywords in the text for deeper analysis.
3. Grasp multiple languages
NLU understands text in nine languages and, through customization with Watson Knowledge Studio, can be tailored to the unique language of your domain.
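As a concrete illustration of what the metadata extraction looks like, here is a sketch that filters a response shaped like NLU's `/analyze` output (field names follow the v1 API; the sample values are made up):

```python
# Sample response shaped like NLU's /analyze output.
sample = {
    "language": "en",
    "entities": [{"type": "Company", "text": "IBM", "relevance": 0.92}],
    "keywords": [{"text": "text analysis", "relevance": 0.88},
                 {"text": "natural language", "relevance": 0.71}],
    "sentiment": {"document": {"label": "positive", "score": 0.64}},
}

def top_keywords(response, min_relevance=0.8):
    """Keep only keywords above a relevance threshold."""
    return [k["text"] for k in response.get("keywords", [])
            if k["relevance"] >= min_relevance]

print(top_keywords(sample))                      # ['text analysis']
print(sample["sentiment"]["document"]["label"])  # positive
```

Each extracted item carries a relevance score, so downstream code can trade recall for precision simply by raising the threshold.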
Stanford CoreNLP provides a set of human language technology tools. It can give the base forms of words and their parts of speech; recognize names of companies, people, etc.; normalize dates, times, and numeric quantities; mark up the structure of sentences in terms of phrases and syntactic dependencies; indicate which noun phrases refer to the same entities; indicate sentiment; extract particular or open-class relations between entity mentions; and get the quotes people said.
Features of Stanford CoreNLP
- An integrated NLP toolkit with a broad range of grammatical analysis tools
- A fast, robust annotator for arbitrary texts, widely used in production
- A modern, regularly updated package, with the overall highest quality text analytics
- Support for a number of major (human) languages
- Available APIs for most major modern programming languages
- Ability to run as a simple web service
Stanford CoreNLP’s goal is to make it very easy to apply a bunch of linguistic analysis tools to a piece of text. A tool pipeline can be run on plain text with just two lines of code. CoreNLP is designed to be highly flexible and extensible: with a single option you can change which tools are enabled or disabled. Stanford CoreNLP integrates many of Stanford’s NLP tools, including the
- Part-of-speech (POS) tagger
- Named entity recognizer (NER)
- The coreference resolution system
- Sentiment analysis
- Bootstrapped pattern learning
- Open information extraction tools.
Moreover, an annotator pipeline can include additional custom or third-party annotators. CoreNLP’s analyses provide the foundational building blocks for higher-level and domain-specific text understanding applications.
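CoreNLP's web-service mode makes it usable from any language. Below is a sketch of consuming its JSON output from Python; the live-request snippet in the comment assumes a server running on `localhost:9000` (a hypothetical local setup), and the sample output mirrors the shape the server returns for `tokenize,ssplit,pos`:

```python
# A live call would look roughly like (hypothetical local server):
#   import requests
#   props = '{"annotators":"tokenize,ssplit,pos","outputFormat":"json"}'
#   resp = requests.post("http://localhost:9000/",
#                        params={"properties": props},
#                        data="Stanford is in California.".encode("utf-8")).json()

# Sample output in the shape the CoreNLP server returns.
sample_output = {
    "sentences": [{
        "index": 0,
        "tokens": [
            {"index": 1, "word": "Stanford", "pos": "NNP"},
            {"index": 2, "word": "is", "pos": "VBZ"},
            {"index": 3, "word": "in", "pos": "IN"},
            {"index": 4, "word": "California", "pos": "NNP"},
            {"index": 5, "word": ".", "pos": "."},
        ],
    }]
}

def pos_pairs(output):
    """Flatten a CoreNLP-style response into (word, POS tag) pairs."""
    return [(t["word"], t["pos"])
            for s in output["sentences"] for t in s["tokens"]]

print(pos_pairs(sample_output))
```

Because annotators are selected per request, the same server can cheaply serve lightweight tokenization for one client and full coreference resolution for another.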
NLTK is a leading platform for building Python programs to work with human language data. It provides easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet, along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning, wrappers for industrial-strength NLP libraries, and an active discussion forum.
Features of NLTK
- Lexical analysis: Word and text tokenizer
- n-gram and collocations
- Part-of-speech tagger
- Tree model and text chunker for capturing phrase structure
- Named-entity recognition
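NLTK wraps the n-gram and collocation machinery in helpers like `nltk.bigrams` and `FreqDist`; the underlying idea is simple enough to sketch in pure Python:

```python
from collections import Counter

def bigrams(tokens):
    """Adjacent token pairs, as nltk.bigrams would produce."""
    return list(zip(tokens, tokens[1:]))

tokens = "to be or not to be".split()
counts = Counter(bigrams(tokens))
print(counts.most_common(1))  # [(('to', 'be'), 2)]
```

Frequent bigrams are only the starting point: NLTK's collocation finders additionally score pairs by association measures such as PMI, so that genuinely bound phrases outrank pairs that are merely common.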