# Named Entity Recognition (NER)

Use Stanford CoreNLP with nltk.

NER extracts the most important entities from an article and categorizes them according to the table below.
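The real workflow runs Stanford CoreNLP through nltk, but the extract-and-categorize idea can be sketched with a toy lookup table. The entries below are made up for illustration; a real tagger learns these categories from data rather than hard-coding them.

```python
# Toy illustration of the extract-and-categorize idea behind NER.
# The lookup table is a hypothetical stand-in for a trained tagger.
ENTITY_TABLE = {
    "Obama": "PERSON",
    "Google": "ORGANIZATION",
    "Paris": "LOCATION",
}

def toy_ner(text):
    """Return (token, category) pairs for tokens found in the lookup table."""
    return [(tok, ENTITY_TABLE[tok]) for tok in text.split() if tok in ENTITY_TABLE]

print(toy_ner("Obama visited Google in Paris"))
# [('Obama', 'PERSON'), ('Google', 'ORGANIZATION'), ('Paris', 'LOCATION')]
```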

# NLP Capstone

## NLP final product (single document)

This code is the capstone of all the processes covered so far. It allows the user to input the text of any single document, and it immediately extracts keywords to show what the document is about.
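A minimal sketch of that keyword-extraction step, assuming a simple frequency-based approach: tokenize, drop a small (hypothetical) stop-word list, and keep the most frequent remaining words.

```python
from collections import Counter

# Hypothetical stop-word list for illustration; a real pipeline would
# use a fuller list (e.g. nltk's stopwords corpus).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "as", "in", "is", "it"}

def top_keywords(text, n=3):
    """Return the n most frequent non-stop-words in the text."""
    tokens = [w.strip(".,!?").lower() for w in text.split()]
    counts = Counter(w for w in tokens if w and w not in STOPWORDS)
    return [word for word, _ in counts.most_common(n)]

doc = "The market fell as the market reacted to rates. Rates worry the market."
print(top_keywords(doc))  # most frequent content words first
```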

# NLP: Detecting the occurrence of fake news

## NLP fake news classifier

We download the fake news dataset from Kaggle and train a supervised classification machine learning model on it.
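The supervised-classification idea can be sketched with a tiny hand-rolled Naive Bayes classifier. The headlines and labels below are invented for illustration; the actual post trains on the Kaggle dataset.

```python
import math
from collections import Counter, defaultdict

# Made-up training examples standing in for the Kaggle fake-news data.
train = [
    ("miracle cure doctors hate this trick", "FAKE"),
    ("shocking secret they do not want you to know", "FAKE"),
    ("parliament passes new budget bill", "REAL"),
    ("central bank raises interest rates", "REAL"),
]

def fit(data):
    """Count word occurrences per class for multinomial Naive Bayes."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    vocab = set()
    for text, label in data:
        class_counts[label] += 1
        for w in text.split():
            word_counts[label][w] += 1
            vocab.add(w)
    return word_counts, class_counts, vocab

def predict(model, text):
    """Pick the class with the highest log-probability (Laplace smoothing)."""
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    best, best_score = None, float("-inf")
    for label in class_counts:
        score = math.log(class_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in text.split():
            score += math.log((word_counts[label][w] + 1) / denom)
        if score > best_score:
            best, best_score = label, score
    return best

model = fit(train)
print(predict(model, "shocking miracle trick"))  # FAKE
```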

# Polyglot

## polyglot

The polyglot package is used because it allows NLP to be applied to roughly 200 languages.

# Portfolio optimization & backtesting

We evaluate, compare, and demonstrate different packages for performing portfolio optimization. There are several options available:

1. Optimization using scipy.optimize
2. Optimization with cvxopt
3. Optimization with cvxpy
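As a sketch of option 1, a minimum-variance long-only portfolio can be found with `scipy.optimize.minimize`. The covariance matrix here is made up for illustration; in practice it would be estimated from historical returns.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical 3-asset covariance matrix, for illustration only.
cov = np.array([
    [0.10, 0.02, 0.04],
    [0.02, 0.08, 0.01],
    [0.04, 0.01, 0.12],
])
n = cov.shape[0]

def variance(w):
    """Portfolio variance w' * cov * w."""
    return w @ cov @ w

result = minimize(
    variance,
    x0=np.full(n, 1 / n),                                        # equal-weight start
    bounds=[(0, 1)] * n,                                         # long-only
    constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1}],  # fully invested
)
weights = result.x
print(weights.round(3), round(variance(weights), 4))
```

The same problem can be posed in cvxopt or cvxpy as a quadratic program; the later sections compare those formulations.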

# Regular expressions (regex)

## Regular Expressions (regex)

This post explores regular expressions (also known as 'regex') in Python using the re module. A regular expression is a sequence of characters that defines a search pattern. Applications of regular expressions include (i) searching for certain file names in a directory, (ii) string processing such as search and replace, (iii) syntax highlighting, (iv) data/web scraping, and (v) natural language processing.
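A few of those applications in miniature with the re module (the file names and strings are made up for illustration):

```python
import re

# (i) searching file names: keep only the .csv files in a listing
files = ["report.pdf", "data.csv", "notes.txt", "old_data.csv"]
csvs = [f for f in files if re.search(r"\.csv$", f)]

# (ii) search and replace: collapse runs of whitespace to a single space
cleaned = re.sub(r"\s+", " ", "too   many    spaces")

# pattern extraction: pull every run of digits out of a string
numbers = re.findall(r"\d+", "regex support arrived in Python in 1997")

print(csvs, cleaned, numbers)
# ['data.csv', 'old_data.csv'] too many spaces ['1997']
```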

# Term Frequency - Inverse Document Frequency (tf-idf) with gensim

## Term Frequency - Inverse Document Frequency (tf-idf) using gensim

tf-idf identifies the most important words in a corpus. A corpus (that is, a collection of documents) can contain words that are shared across documents. For example, a corpus on finance might mention "money" in nearly every document, and we would like to down-weight this keyword. The idea is to make sure that article-specific frequent words are weighted heavily and corpus-shared words are weighted low.
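Before turning to gensim, the weighting itself can be sketched in plain Python using the common tf * log(N/df) variant (gensim's TfidfModel applies its own normalization, so its exact numbers differ). The three tiny documents are made up for illustration:

```python
import math

# tf-idf by hand: term frequency times log(N / document frequency).
docs = [
    "money markets money banks",
    "money rates inflation",
    "football money scores goals",
]
tokenized = [d.split() for d in docs]
N = len(tokenized)

def tfidf(term, doc_tokens):
    """tf-idf weight of a term within one tokenized document."""
    tf = doc_tokens.count(term) / len(doc_tokens)
    df = sum(term in d for d in tokenized)  # documents containing the term
    return tf * math.log(N / df)

doc = tokenized[0]
print(tfidf("money", doc))  # appears in every document -> weight 0
print(tfidf("banks", doc))  # unique to this document -> positive weight
```

The corpus-shared word "money" gets weight zero while the document-specific "banks" scores positively, which is exactly the down-weighting described above.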

The most common module for natural language processing in Python is nltk, an acronym for the Natural Language Toolkit. There are several tokenization commands that can be used via from nltk.tokenize import <cmd>, where <cmd> is replaced by one of the tokenizers below: