Natural Language Processing (NLP) is a field of artificial intelligence that focuses on extracting and understanding important information from text. Some applications are as follows:

* When we have a long essay (>5,000 words) to read and understand, we can extract the major themes of the text (e.g., with bag-of-words analysis).
* When we have a whole set of documents (i.e., a corpus) that we know is about cooking, we can find the major themes in the corpus, such as whether it mentions Asian cooking or Western cooking more (e.g., with term frequency-inverse document frequency).
* In a chatbot application, we need to understand what question a person is asking and then direct them to the right resources to answer it.
* We may need to translate a set of documents from English to French, or vice versa.
* We may want to analyze the general sentiment on a specific topic (e.g., immigration) using a large set of documents written on that topic.
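
To make the first two ideas more concrete, here is a minimal sketch of bag-of-words counting and TF-IDF weighting using only the Python standard library. The helper names (`tokenize`, `bag_of_words`, `tf_idf`) and the three-sentence cooking corpus are made up for illustration, and the TF-IDF formula is the plain textbook variant, which may differ from the smoothed versions that libraries such as gensim or sklearn apply.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words(text):
    """Count how often each token occurs in a single document."""
    return Counter(tokenize(text))


def tf_idf(corpus):
    """Return one {term: weight} dict per document.

    tf  = term count / number of tokens in the document
    idf = log(number of documents / number of documents containing the term)
    """
    bags = [bag_of_words(doc) for doc in corpus]
    n_docs = len(bags)
    # Iterating a Counter yields each distinct term once per document.
    doc_freq = Counter(term for bag in bags for term in bag)
    weights = []
    for bag in bags:
        total = sum(bag.values())
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in bag.items()
        })
    return weights


# Hypothetical toy corpus, not taken from any real dataset.
corpus = [
    "Stir fry the noodles in a hot wok with soy sauce",
    "Roast the chicken in the oven with butter and thyme",
    "Steam the dumplings and serve with soy sauce",
]

# Most frequent word in the second document.
print(bag_of_words(corpus[1]).most_common(1))

# Three highest-weighted terms in the first document.
scores = tf_idf(corpus)
print(sorted(scores[0], key=scores[0].get, reverse=True)[:3])
```

Words that appear in every document, such as "the" and "with", get a weight of zero under this formula, while words unique to one document, such as "wok", score highest. That is the property that makes TF-IDF useful for finding what distinguishes one document from the rest of the corpus.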

Specifically for finance, NLP can be used in the following applications:

* Extracting the buy/sell sentiment from a corpus of sell-side analyst reports on a specific stock.
* Understanding economic sentiment from conference calls, online job postings, inflation chatter, e-invoicing, etc.
* Measuring the amount of media attention given to political events.
* Extracting a long-term view of global economic sentiment from Warren Buffett's annual reports.
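
As a rough illustration of what "extracting buy/sell sentiment" can mean in its simplest form, the sketch below scores a piece of text by counting words from two small word lists. The lists `POSITIVE` and `NEGATIVE`, the function `sentiment_score`, and the sample sentences are all hypothetical and hand-picked for this example. A real application would use a curated financial lexicon or a trained classifier instead.

```python
import re

# Hypothetical word lists for illustration only, not a real financial lexicon.
POSITIVE = {"buy", "upgrade", "outperform", "growth", "strong"}
NEGATIVE = {"sell", "downgrade", "underperform", "weak", "risk"}


def sentiment_score(text):
    """Return a score in [-1, 1]: +1 is all positive words, -1 all negative.

    Text containing none of the listed words scores 0.0.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    pos = sum(token in POSITIVE for token in tokens)
    neg = sum(token in NEGATIVE for token in tokens)
    if pos + neg == 0:
        return 0.0
    return (pos - neg) / (pos + neg)


# Made-up sentences standing in for analyst report excerpts.
reports = [
    "We upgrade the stock to buy on strong growth",
    "We downgrade to sell; weak demand is a risk",
    "The company reported results",
]

for report in reports:
    print(f"{sentiment_score(report):+.2f}  {report}")
```

Word counting like this ignores negation ("not a buy") and context, so it is best seen as a baseline to compare more capable models against.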

There are several popular NLP libraries. I found the nice summary below from ActiveWizards:

[Figure: NLP library comparison from ActiveWizards]

Important packages to install using conda for Natural Language Processing are:

* gensim
* stop_words

In this blog series, we will go through the following topics and packages, where [x] marks a post that is available and [ ] marks one that is in the pipeline:

[x] Regular expressions with re
[x] Tokenization with nltk
[x] Bag-of-Words (BoW) with nltk
[x] Bag-of-Words (BoW) with gensim
[x] Named-Entity-Recognition (NER) with nltk
[ ] Production-grade NLP with spacy
[ ] Translation with polyglot
[x] Text classification with sklearn
[ ] Text scraping with BeautifulSoup
[ ] [CNNs with t]
