We evaluate, compare, and demonstrate different packages for performing portfolio optimization. There are several options available:
- Optimization using scipy.optimize
- Optimization with cvxopt
- Optimization with cvxpy
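As a taste of the first option, here is a minimal mean-variance sketch using scipy.optimize.minimize with the SLSQP solver. The returns data are synthetic and for illustration only; a real workflow would load historical prices instead.

```python
# Minimal portfolio optimization sketch: maximize the Sharpe ratio
# (no risk-free rate) subject to long-only, fully-invested constraints.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
returns = rng.normal(0.001, 0.02, size=(250, 4))  # 250 days, 4 synthetic assets
mu = returns.mean(axis=0)                         # expected returns
cov = np.cov(returns, rowvar=False)               # covariance matrix

def neg_sharpe(w):
    # negative Sharpe ratio, since minimize() minimizes
    return -(w @ mu) / np.sqrt(w @ cov @ w)

n = len(mu)
result = minimize(
    neg_sharpe,
    x0=np.full(n, 1 / n),                      # start from equal weights
    bounds=[(0, 1)] * n,                       # long-only
    constraints={"type": "eq", "fun": lambda w: w.sum() - 1},  # fully invested
)
weights = result.x
print(weights.round(3))
```

The same problem can be written more declaratively in cvxpy, which is often preferable for convex formulations.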
Regular Expressions (regex)¶
This post explores performing regular expressions (also known as 'regex') in Python using the
re module. A regular expression is a sequence of characters that defines a search pattern. Applications of regular expressions include (i) searching for certain file names in a directory, (ii) string processing such as search and replace, (iii) syntax highlighting, (iv) data/web scraping, and (v) natural language processing.
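The first two applications can be sketched with a few core re functions; the filenames and patterns below are invented for illustration.

```python
# Common re-module operations: search, findall, and sub.
import re

text = "Report_2021.csv, notes.txt, Report_2022.csv"

# (i) search: find the first match and capture the year with a group
m = re.search(r"Report_(\d{4})\.csv", text)
print(m.group(1))  # -> 2021

# find every filename ending in .csv
csvs = re.findall(r"\w+\.csv", text)

# (ii) sub: search and replace the extension
renamed = re.sub(r"\.csv", ".parquet", text)
print(csvs, renamed)
```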
Term Frequency - Inverse Document Frequency (tf-idf) using gensim¶
tf-idf allows the analysis of the most important words in a corpus. A corpus (that is, a collection of documents) can have words that are shared across documents. For example, a corpus on finance might mention money in every article, and we would like to down-weight this keyword. The idea is that article-specific frequent words are weighted heavily while words shared across articles are weighted low.
A token is the technical name for a sequence of characters — such as car, his, or :) — that we want to treat as a group. Tokenization is breaking up a text into these tokens.
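A toy tokenizer can be sketched with the stdlib re module; real NLP libraries handle many more cases (contractions, abbreviations, URLs), but this shows the idea of splitting text into word and symbol groups, including keeping an emoticon like :) together.

```python
# Toy tokenization: words, or runs of punctuation, as separate tokens.
import re

text = "The car is his; he smiled :)"
# \w+ matches word tokens; [^\w\s]+ matches punctuation runs such as ":)"
tokens = re.findall(r"\w+|[^\w\s]+", text)
print(tokens)  # -> ['The', 'car', 'is', 'his', ';', 'he', 'smiled', ':)']
```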
The most common module for natural language processing in Python is
nltk, an acronym for the Natural Language Tool Kit. There are several tokenization commands that can be imported with
from nltk.tokenize import <cmd>, where
<cmd> is replaced by one of the token commands below: