My Python workflow for data science and financial research
This page summarizes my Python setup for data science, finance, and economics research. It covers the Python packages, JupyterLab extensions, and programs I find useful in my development work.
It assumes the following:
Basic knowledge of using conda to install Python packages.
conda-forge is one of the default installation channels for Python packages.
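As a rough way to check the second assumption, the sketch below parses the text printed by conda config --show channels and reports whether conda-forge is listed. The helper names (parse_channels, has_conda_forge) and the sample output are illustrative assumptions, not part of conda itself.

```python
def parse_channels(config_output):
    """Extract channel names from the YAML-style list printed by
    `conda config --show channels`."""
    channels = []
    for line in config_output.splitlines():
        stripped = line.strip()
        # Each channel is printed as a list item, e.g. "  - conda-forge".
        if stripped.startswith("- "):
            channels.append(stripped[2:].strip())
    return channels


def has_conda_forge(config_output):
    """Return True if conda-forge appears among the configured channels."""
    return "conda-forge" in parse_channels(config_output)


# Hypothetical sample of what the command might print on a given machine.
sample_output = """channels:
  - conda-forge
  - defaults
"""

print(parse_channels(sample_output))
print(has_conda_forge(sample_output))
```

If conda-forge turns out to be missing, it can be added with:
$ conda config --add channels conda-forge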
Command line multiplexer
I usually run a terminal multiplexer in my console (i.e., the command-line prompt), because at any one time I might be updating this website, coding in JupyterLab, or installing Python packages.
Windows
For multiplexer support for Anaconda Prompt, install ConEmu.
Under Settings -> Startup -> Command line, key in the location of the Anaconda Prompt batch file, e.g.:
"%windir%\System32\cmd.exe "/K" C:\Users\<YOUR USERNAME>\Anaconda3\Scripts\activate.bat"
Tip
If you are unsure of the above entry, look it up from the Anaconda Prompt shortcut:
Right-click your Anaconda Prompt icon and choose Open File Location.
Right-click the Anaconda Prompt shortcut and choose Properties.
Go to Shortcut -> Target.
The Target field holds the entry that ConEmu needs, so that each new console instance opens the Anaconda Prompt instead of the Windows Command Prompt.
Linux
I use tmux with my handy tmux cheatsheet. For instructions on installing it on Linux, see Setting up the Pixelbook/Chromebook for Python programming.
Installing Python packages
Use the following command in Terminal to install packages.
$ conda install <PKG1> <PKG2> <PKG3>
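After installing, a quick sanity check is to ask Python which packages it can actually find in the active environment. The sketch below uses the standard library's importlib.util.find_spec; the helper name missing_packages is a hypothetical one chosen for illustration. Note that a package's import name can differ from its install name (e.g., scikit-learn is imported as sklearn).

```python
from importlib.util import find_spec


def missing_packages(import_names):
    """Return the top-level import names that cannot be found
    in the current Python environment."""
    return [name for name in import_names if find_spec(name) is None]


# Import names for the core data analytics packages listed on this page.
wanted = ["numpy", "scipy", "pandas", "statsmodels", "matplotlib"]

missing = missing_packages(wanted)
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All packages found.")
```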
Core data analytics
Package Name | Description |
---|---|
numpy | Matrix operations |
scipy | Scientific operations |
pandas | DataFrame operations |
statsmodels | Statistical models |
matplotlib | Data visualization |
Finance
Package Name | Description |
---|---|
pandas_datareader | Reading financial data from the web |
arch | Volatility modelling |
cvxopt | Core optimization |
cvxpy | Modelling layer for convex optimization that "nicely" interfaces with solvers such as cvxopt |
prophet | Facebook's time series forecasting tool |
Data visualization
Package Name | Description |
---|---|
seaborn | Colorful data visualization |
bokeh | Interactive data visualization |
geoplotlib | Create maps and geographical visualization |
gleam | Similar to R's Shiny web app framework |
altair | Declarative plotting framework built on Vega-Lite |
plotnine | Similar to R's ggplot2, using the grammar of graphics |
Machine & deep learning
Package Name | Description |
---|---|
scikit-learn | Core machine learning package |
xgboost | Gradient boosting package, originally released by Tianqi Chen (University of Washington, Seattle) |
lightgbm | Gradient Boosted Decision Trees package (Microsoft) |
catboost | Gradient Boosted Decision Trees package (Yandex) |
keras | High-level neural networks API |
tensorflow | Deep learning package from Google |
Natural language processing (NLP)
Package Name | Description |
---|---|
nltk | General NLP tasks |
textblob | Creating NLP prototypes quickly |
gensim | NLP applications for topic modelling, document similarity, etc. |
scrapy | Web scraping |
spacy | Production-level NLP library |
Favourite JupyterLab extensions
See JupyterLab extensions for more details and GitHub for a full list of available extensions for JupyterLab.
Install nodejs
$ conda install -c conda-forge nodejs
Install JupyterLab extensions on Linux as follows:
$ jupyter labextension install @jupyterlab/<EXT_NAME>
$ jupyter labextension install jupyterlab_nbmetadata
You can also install extensions in JupyterLab by clicking the jigsaw icon on the menu bar to the right of the editor.
The following are my favourite JupyterLab extensions.
Extension name | Description |
---|---|
toc | Table of contents |
jupyterlab_variableInspector | Variable inspector in JupyterLab |
jupyterlab_nbmetadata | Allows you to edit the notebook metadata |
jupyterlab_go_to_definition | Jump to definition of a variable or function in JupyterLab notebook and file editor |
Installing Jupyter notebook extensions
$ conda install -c conda-forge jupyter_contrib_nbextensions
Additional Jupyter commands
jupyter notebook list: Lists all running Jupyter notebook servers and the ports they are running on (e.g., 8889, 8888).
jupyter notebook stop <portID>: Stops the Jupyter notebook server running on the given portID.
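To pick out the port IDs programmatically, one option is to parse the text that jupyter notebook list prints. The sketch below assumes each server is reported on one line in the form URL :: notebook directory; the function name running_ports and the sample text (tokens and paths included) are hypothetical.

```python
from urllib.parse import urlsplit


def running_ports(list_output):
    """Extract port numbers from the output of `jupyter notebook list`."""
    ports = []
    for line in list_output.splitlines():
        # Each server line looks like "<url> :: <notebook directory>".
        url = line.split("::")[0].strip()
        if not url.startswith("http"):
            continue  # Skip the header line and any blank lines.
        port = urlsplit(url).port
        if port is not None:
            ports.append(port)
    return ports


# Hypothetical sample output with made-up tokens and directories.
sample = """Currently running servers:
http://localhost:8888/?token=abc123 :: /home/user/project
http://localhost:8889/?token=def456 :: /home/user/other
"""

print(running_ports(sample))  # [8888, 8889]
```

Any port returned this way can then be passed to jupyter notebook stop <portID>.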