My Python workflow for data science and financial research

This post summarizes my Python setup for data science, finance, and economics research. It covers the Python packages, JupyterLab extensions, and programs I find useful in my development work.

It assumes the following:

  • Basic knowledge of using conda for installing Python packages.

  • conda-forge is set as one of the default installation channels for Python packages.
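
One way to check the conda-forge assumption is to ask conda for its configured channels. The sketch below assumes that `conda` is on the PATH and that `conda config --show channels --json` prints a JSON object with a `channels` list. The helper names are hypothetical, and the sample string is illustrative rather than captured from a real run.

```python
import json
import subprocess


def channels_from_json(text):
    """Return the channel list from the JSON printed by conda config."""
    return json.loads(text).get("channels", [])


def has_conda_forge(channels):
    """True if conda-forge appears anywhere in the configured channels."""
    return "conda-forge" in channels


def read_conda_channels():
    """Ask conda for its channels (assumes `conda` is on the PATH)."""
    result = subprocess.run(
        ["conda", "config", "--show", "channels", "--json"],
        capture_output=True, text=True, check=True,
    )
    return channels_from_json(result.stdout)


# Illustrative output only; call read_conda_channels() for the real list.
sample = '{"channels": ["conda-forge", "defaults"]}'
print(has_conda_forge(channels_from_json(sample)))
```

If conda-forge turns out to be missing, `conda config --add channels conda-forge` adds it.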


Command line multiplexer

Figure: Command line multiplexer (tmux!)

I usually run a terminal multiplexer in my console (i.e., the command line prompt), because at any one time I might be updating this website, coding in JupyterLab, or installing Python packages.

Windows

  • For multiplexer support in Anaconda Prompt, install ConEmu.

  • In ConEmu, go to Settings -> Startup -> Command line.

  • Key in the command that launches the Anaconda Prompt batch file (e.g., "%windir%\System32\cmd.exe "/K" C:\Users\<YOUR USERNAME>\Anaconda3\Scripts\activate.bat").

Tip

  • If you are unsure of the above entry, right-click your Anaconda Prompt icon and choose Open File Location.

  • Right-click the Anaconda Prompt shortcut and open Properties.

  • Go to Shortcut -> Target.

  • The Target field contains the entry to use, so that each new ConEmu console opens Anaconda Prompt instead of the Windows Command Prompt.
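
To illustrate what the ConEmu entry looks like for a given install folder, here is a minimal sketch that assembles the command string. The function name and the `C:\Users\alice\Anaconda3` location are hypothetical; the Target field of your own shortcut remains the authoritative value.

```python
from pathlib import PureWindowsPath


def anaconda_prompt_command(anaconda_root):
    """Build a ConEmu startup command that opens cmd.exe and runs activate.bat.

    `anaconda_root` is the Anaconda install folder (an assumption; use the
    folder shown in your shortcut's Target field).
    """
    activate = PureWindowsPath(anaconda_root) / "Scripts" / "activate.bat"
    return f'%windir%\\System32\\cmd.exe "/K" {activate}'


# Hypothetical install location, used only for illustration.
print(anaconda_prompt_command(r"C:\Users\alice\Anaconda3"))
```

The sketch does not quote the path, so an install folder containing spaces would need quoting added.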

Linux

Installing Python packages

  • Use the following command in Terminal to install packages.

$ conda install <PKG1> <PKG2> <PKG3>
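
Before installing, it can help to see which packages are already available. The following is a minimal sketch using only the standard library; the helper names are hypothetical. It works with import names, which can differ from conda package names (for example, scikit-learn is imported as `sklearn`).

```python
import importlib.util


def missing_packages(import_names):
    """Return the names that Python's import system cannot find."""
    return [name for name in import_names
            if importlib.util.find_spec(name) is None]


def conda_install_command(packages):
    """Format the conda install command for the given package names."""
    return "conda install " + " ".join(packages)


# Top-level import names to check.
wanted = ["numpy", "scipy", "pandas", "statsmodels", "matplotlib"]
missing = missing_packages(wanted)
if missing:
    print(conda_install_command(missing))
else:
    print("All packages are already installed.")
```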

Core data analytics

Package Name    Description
numpy           Matrix operations
scipy           Scientific operations
pandas          DataFrame operations
statsmodels     Statistical models
matplotlib      Data visualization

Finance

Package Name         Description
pandas_datareader    Reading financial data from the web
arch                 Volatility modelling
cvxopt               Core optimization
cvxpy                Wrapper providing a "nicer" interface to cvxopt
prophet              Facebook's time series forecasting tool

Data visualization

Package Name    Description
seaborn         Colorful data visualization
bokeh           Interactive data visualization
geoplotlib      Maps and geographical visualization
gleam           Similar to R's Shiny web apps
altair          Declarative statistical visualization built on Vega-Lite
plotnine        Similar to R's ggplot2, using the grammar of graphics

Machine & deep learning

Package Name    Description
scikit-learn    Core machine learning package
xgboost         Gradient boosted decision trees package, originally released by Tianqi Chen (University of Washington, Seattle)
lightgbm        Gradient boosted decision trees package (Microsoft)
catboost        Gradient boosted decision trees package (Yandex)
keras           High-level neural networks API
tensorflow      Deep learning package from Google

Natural language processing (NLP)

Package Name    Description
nltk            General NLP tasks
textblob        Creating NLP prototypes quickly
gensim          NLP applications for topic modelling, document similarity, etc.
scrapy          Web scraping
spacy           Production-level NLP library

Favourite JupyterLab extensions

See JupyterLab extensions for more details and GitHub for a full list of available extensions for JupyterLab.

  • Install nodejs

$ conda install -c conda-forge nodejs
  • Install JupyterLab extensions in Linux as follows:

$ jupyter labextension install @jupyterlab/<EXT_NAME>
$ jupyter labextension install jupyterlab_nbmetadata
  • You can also install plugins in JupyterLab by clicking the jigsaw icon on the menu bar to the right of the editor.

  • The following are my favourite JupyterLab plugins.

Extension name                  Description
toc                             Table of contents
jupyterlab_variableInspector    Variable inspector in JupyterLab
jupyterlab_nbmetadata           Allows you to edit the notebook metadata
jupyterlab_go_to_definition     Jump to definition of a variable or function in JupyterLab notebook and file editor
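
Since the labextension commands above depend on nodejs, a quick check that the required tools are on the PATH can save a failed install. This is a minimal sketch with a hypothetical helper name; it assumes nodejs provides a `node` executable and that `jupyter` is the launcher for `jupyter labextension`.

```python
import shutil


def tools_not_on_path(tools):
    """Return the command names that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


# `node` comes from the nodejs package; `jupyter` runs `jupyter labextension`.
missing = tools_not_on_path(["node", "jupyter"])
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("node and jupyter are available.")
```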

  • Install Jupyter notebook extensions:

$ conda install -c conda-forge jupyter_contrib_nbextensions

Additional Jupyter commands

  • jupyter notebook list: Lists all running Jupyter notebook servers and the portIDs they are running on (e.g., 8889, 8888).

  • jupyter notebook stop <portID>: Stops the Jupyter notebook server running on the given portID.
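
To pick out the portIDs programmatically, the text printed by `jupyter notebook list` can be parsed. The sketch below assumes each server appears on its own line as `<URL> :: <directory>` after a header line; the function name is hypothetical and the sample listing uses made-up tokens and directories.

```python
from urllib.parse import urlsplit


def running_ports(listing):
    """Extract port numbers from the text printed by `jupyter notebook list`."""
    ports = []
    for line in listing.splitlines():
        line = line.strip()
        if not line.startswith("http"):
            continue  # skip the header line and blank lines
        url = line.split(" :: ")[0]
        port = urlsplit(url).port
        if port is not None:
            ports.append(port)
    return ports


# Illustrative listing (made-up tokens and directories).
sample = """Currently running servers:
http://localhost:8888/?token=abc123 :: /home/user/research
http://localhost:8889/?token=def456 :: /home/user/website
"""
print(running_ports(sample))
```

Each port returned can then be passed to `jupyter notebook stop <portID>`.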
