This is a list of helpful Python tips and tricks for my workflow.

[ ] zip
[ ] List/Dict comprehensions
[ ] Generators
[ ] pandas

zip¶

z1 = zip(l1,l2) combines two iterables (here l1 and l2) into a zip object, an iterator that yields tuples of paired items. If the inputs differ in length, zip stops at the shortest one. This is useful when you have several related lists (e.g., names, phone numbers, addresses) and need to group them together quickly.

list(z1) converts the zip object into a list of tuples.

*z1 unpacks the zip object. A zip object is a one-shot iterator, so once it has been unpacked (or otherwise iterated over) it is exhausted and yields no more items; call zip again to get a fresh one.

In [2]:

l1 = range(0,5)
l2 = ['a','b','c','d','e','f']

In [10]:

z1 = zip(l1,l2)
print(type(z1))
print(*z1)

<class 'zip'>
(0, 'a') (1, 'b') (2, 'c') (3, 'd') (4, 'e')

In [12]:

z1 = zip(l1,l2)
listZ = list(z1)
print(listZ)

[(0, 'a'), (1, 'b'), (2, 'c'), (3, 'd'), (4, 'e')]
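
As a small follow-up sketch (the names nums, letters, first_pass and second_pass are made up for illustration), a second pass over the same zip object yields nothing, and zip(*...) reverses the pairing:

```python
nums = [0, 1, 2]
letters = ['a', 'b', 'c', 'd']  # one item longer on purpose

pairs = zip(nums, letters)      # stops at the shorter input
first_pass = list(pairs)        # consumes the iterator
second_pass = list(pairs)       # already exhausted, so this is empty

# zip(*...) "unzips" a list of tuples back into separate tuples
unzipped_nums, unzipped_letters = zip(*first_pass)

print(first_pass)        # [(0, 'a'), (1, 'b'), (2, 'c')]
print(second_pass)       # []
print(unzipped_nums)     # (0, 1, 2)
print(unzipped_letters)  # ('a', 'b', 'c')
```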

List/Dict comprehensions¶

[ ] List comprehension.
[ ] List comprehension with if conditional.
[ ] List comprehension with nested if conditional.
[ ] List comprehension with if-else conditional.
[ ] Nested List comprehensions.
[ ] Dict comprehension.

List comprehension¶

In [40]:

x = ['alpha','beta','gamma','theta']
_ = [print(val) for val in x]  # print() returns None, so assigning the result back to x would replace it with a list of None

alpha
beta
gamma
theta

List comprehension (if-conditional)¶

A filtering if condition comes after the for clause; items that fail the condition are dropped.

In [36]:

x = list(range(0,100,5))
print('x:{0}'.format(x))
cond_x = [val for val in x if val > 50]
print('y:{0}'.format(cond_x))

x:[0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95]
y:[55, 60, 65, 70, 75, 80, 85, 90, 95]

List comprehension (nested if-conditional)¶

In [37]:

x = list(range(0,100,5))
print('x:{0}'.format(x))
cond_x = [val for val in x if val > 50 if val % 2]  # chained ifs act like "and"; val % 2 is truthy only for odd values
print('y:{0}'.format(cond_x))

x:[0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95]
y:[55, 65, 75, 85, 95]

List comprehension (if-else conditional)¶

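An if-else conditional expression comes before the for clause; it transforms each item instead of filtering, so the output has the same length as the input.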

In [38]:

x = list(range(0,100,5))
print('x:{0}'.format(x))
cond_x = [val if val > 50 else 0 for val in x]
print('y:{0}'.format(cond_x))

x:[0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95]
y:[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 55, 60, 65, 70, 75, 80, 85, 90, 95]
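
The two placements can also be combined in one comprehension. The following is a minimal sketch with made-up values (the labels 'big' and 'small' are purely illustrative):

```python
x = list(range(0, 30, 5))  # [0, 5, 10, 15, 20, 25]

# The if-else expression (before "for") chooses the output value;
# the trailing if (after "for") decides which items are kept at all.
labels = ['big' if val > 15 else 'small' for val in x if val % 2 == 0]

print(labels)  # ['small', 'small', 'big']
```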

Nested List comprehensions¶

We show how to perform nested for loops using list comprehensions. The first cell shows the version written with explicit for loops.

In [14]:

my_list = []

for x in [10, 25, 50]:
    for y in [1, 3, 5]:
        my_list.append(x * y)

print(my_list)

[10, 30, 50, 25, 75, 125, 50, 150, 250]

Below is the same nested for loop written as a list comprehension. The for clauses appear in the same order as in the nested loops.

In [16]:

nested_my_list = [ x*y for x in [10,25,50] for y in [1,3,5] ]
print(nested_my_list)

[10, 30, 50, 25, 75, 125, 50, 150, 250]
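
Two related patterns, sketched here on a made-up 2x3 matrix: several for clauses in one comprehension flatten nested data, while a comprehension inside another comprehension builds a nested result.

```python
matrix = [[1, 2, 3], [4, 5, 6]]

# Flatten: the outer loop comes first, exactly as in nested for statements
flat = [item for row in matrix for item in row]

# A comprehension inside a comprehension builds a nested list instead
# (here: the transpose of the matrix)
transposed = [[row[i] for row in matrix] for i in range(3)]

print(flat)        # [1, 2, 3, 4, 5, 6]
print(transposed)  # [[1, 4], [2, 5], [3, 6]]
```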

Dict comprehension¶

COMING SOON!

Generator expressions¶

Generator expressions have the same syntax as list comprehensions, except that they use () instead of []. A generator produces its items lazily and can be consumed only once: iterate over it with a for loop, or call next(genObj) to fetch one item at a time.

In [70]:

names = ['tracy','clarissa','tom','hyacinth','sowhattoo']  # not named "list", which would shadow the built-in

gen_expr = ( len(name) for name in names )
print('item 1:{0}'.format(next(gen_expr))) # prints first item
print('item 2:{0}'.format(next(gen_expr))) # prints second item

print(*gen_expr) # prints all remaining items and exhausts the generator

[print(l) for l in gen_expr] # prints nothing: the generator was already exhausted by *gen_expr

print(','.join(names))

item 1:5
item 2:8
3 8 9
tracy,clarissa,tom,hyacinth,sowhattoo
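
A short sketch of two further generator habits, using a made-up list of names: next() accepts a default so an exhausted generator does not raise StopIteration, and a generator expression can be passed directly to a function such as sum().

```python
names = ['tracy', 'clarissa', 'tom']

lengths = (len(name) for name in names)  # nothing is computed yet

first = next(lengths)            # 5
rest = list(lengths)             # [8, 3]; this exhausts the generator
sentinel = next(lengths, None)   # default value instead of StopIteration

# Generators can be passed straight to functions that consume iterables
total = sum(len(name) for name in names)  # 5 + 8 + 3

print(first, rest, sentinel, total)  # 5 [8, 3] None 16
```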

Pandas¶

Dataframe manipulation
Filtering

In [28]:

import os
import seaborn as sns
import pandas as pd
import numpy as np

In [21]:

iris = sns.load_dataset('iris')
print(iris.head())
iris.info()

   sepal_length  sepal_width  petal_length  petal_width species
0           5.1          3.5           1.4          0.2  setosa
1           4.9          3.0           1.4          0.2  setosa
2           4.7          3.2           1.3          0.2  setosa
3           4.6          3.1           1.5          0.2  setosa
4           5.0          3.6           1.4          0.2  setosa
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
sepal_length    150 non-null float64
sepal_width     150 non-null float64
petal_length    150 non-null float64
petal_width     150 non-null float64
species         150 non-null object
dtypes: float64(4), object(1)
memory usage: 5.9+ KB

In [3]:

iris.describe()

Out[3]:

	sepal_length	sepal_width	petal_length	petal_width
count	150.000000	150.000000	150.000000	150.000000
mean	5.843333	3.057333	3.758000	1.199333
std	0.828066	0.435866	1.765298	0.762238
min	4.300000	2.000000	1.000000	0.100000
25%	5.100000	2.800000	1.600000	0.300000
50%	5.800000	3.000000	4.350000	1.300000
75%	6.400000	3.300000	5.100000	1.800000
max	7.900000	4.400000	6.900000	2.500000

DataFrame extraction: When you have single brackets ([]) for selecting columns of a DataFrame, it returns a Series. If you use double brackets ([[]]) it returns a DataFrame.

In [9]:

sepal_data = iris['sepal_length']
print(type(sepal_data))

sepal_data = iris[['sepal_length']]
print(type(sepal_data))

<class 'pandas.core.series.Series'>
<class 'pandas.core.frame.DataFrame'>

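Filtering data: When filtering on multiple criteria, combine the boolean masks element-wise with np.logical_and or np.logical_or (or the equivalent & and | operators, with each comparison in parentheses). Python's plain and/or do not work on a Series.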

In [23]:

filter_data1 = iris[iris['sepal_length']>5]
filter_data2 = iris[np.logical_and(iris['sepal_length']>5,iris['petal_width']>0.3)]

In [25]:

filter_data1.head()

Out[25]:

	sepal_length	sepal_width	petal_length	petal_width	species
0	5.1	3.5	1.4	0.2	setosa
5	5.4	3.9	1.7	0.4	setosa
10	5.4	3.7	1.5	0.2	setosa
14	5.8	4.0	1.2	0.2	setosa
15	5.7	4.4	1.5	0.4	setosa

In [26]:

filter_data2.head()

Out[26]:

	sepal_length	sepal_width	petal_length	petal_width	species
5	5.4	3.9	1.7	0.4	setosa
15	5.7	4.4	1.5	0.4	setosa
16	5.4	3.9	1.3	0.4	setosa
21	5.1	3.7	1.5	0.4	setosa
23	5.1	3.3	1.7	0.5	setosa
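
For comparison, here is a sketch of the same kind of filter written with the & and | operators. The tiny DataFrame is made up so the example does not depend on the iris download; the column names are borrowed from above.

```python
import numpy as np
import pandas as pd

# Tiny made-up frame standing in for iris
df = pd.DataFrame({
    'sepal_length': [5.1, 4.9, 5.4, 5.7],
    'petal_width':  [0.2, 0.2, 0.4, 0.4],
})

mask_np = np.logical_and(df['sepal_length'] > 5, df['petal_width'] > 0.3)
mask_op = (df['sepal_length'] > 5) & (df['petal_width'] > 0.3)  # parentheses are required

# | keeps rows that satisfy at least one condition
either = df[(df['sepal_length'] > 5.5) | (df['petal_width'] < 0.3)]

print((mask_np == mask_op).all())   # True
print(df[mask_op].index.tolist())   # [2, 3]
print(either.index.tolist())        # [0, 1, 3]
```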

Reading csv: When we read csv files, pandas is sometimes unable to recognize the date format. We have two options:

Read in the csv file and perform a conversion later
Write a dateparser for the pd.read_csv command

In [46]:

currDir = os.getcwd()
fileName = os.path.join(currDir, 'inputs', 'FF_10_Industry_Portfolios.CSV')  # os.path.join picks the right separator on any OS

df_10indus_m = pd.read_csv(fileName,skiprows=11,nrows=1107,index_col=0,parse_dates=True)
df_10indus_m.head()

Out[46]:

	NoDur	Durbl	Manuf	Enrgy	HiTec	Telcm	Shops	Hlth	Utils	Other
192607	1.45	15.55	4.69	-1.18	2.90	0.83	0.11	1.77	7.04	2.16
192608	3.97	3.68	2.81	3.47	2.66	2.17	-0.71	4.25	-1.69	4.38
192609	1.14	4.80	1.15	-3.39	-0.38	2.41	0.21	0.69	2.04	0.29
192610	-1.24	-8.23	-3.63	-0.78	-4.58	-0.11	-2.29	-0.57	-2.63	-2.85
192611	5.21	-0.19	4.10	0.01	4.71	1.63	6.43	5.42	3.71	2.11

Perform a conversion on the datetime index

In [47]:

timeformat = '%Y%m' # Can be as complex as '%Y-%m-%d %H:%M'
df_10indus_m.index = pd.to_datetime(df_10indus_m.index,format=timeformat)
df_10indus_m.head()

Out[47]:

	NoDur	Durbl	Manuf	Enrgy	HiTec	Telcm	Shops	Hlth	Utils	Other
1926-07-01	1.45	15.55	4.69	-1.18	2.90	0.83	0.11	1.77	7.04	2.16
1926-08-01	3.97	3.68	2.81	3.47	2.66	2.17	-0.71	4.25	-1.69	4.38
1926-09-01	1.14	4.80	1.15	-3.39	-0.38	2.41	0.21	0.69	2.04	0.29
1926-10-01	-1.24	-8.23	-3.63	-0.78	-4.58	-0.11	-2.29	-0.57	-2.63	-2.85
1926-11-01	5.21	-0.19	4.10	0.01	4.71	1.63	6.43	5.42	3.71	2.11

Write a date parser as shown below

In [40]:

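from datetime import datetime  # the pd.datetime alias is no longer available in newer pandas releases

dateparser = lambda x: datetime.strptime(x,'%Y%m')
dateparser('192004')

# Note: recent pandas versions deprecate date_parser in favor of date_format='%Y%m'
df_10indus_m = pd.read_csv(fileName,skiprows=11,nrows=1107,index_col=0,parse_dates=True,date_parser=dateparser)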
df_10indus_m.head()

Out[40]:

	NoDur	Durbl	Manuf	Enrgy	HiTec	Telcm	Shops	Hlth	Utils	Other
1926-07-01	1.45	15.55	4.69	-1.18	2.90	0.83	0.11	1.77	7.04	2.16
1926-08-01	3.97	3.68	2.81	3.47	2.66	2.17	-0.71	4.25	-1.69	4.38
1926-09-01	1.14	4.80	1.15	-3.39	-0.38	2.41	0.21	0.69	2.04	0.29
1926-10-01	-1.24	-8.23	-3.63	-0.78	-4.58	-0.11	-2.29	-0.57	-2.63	-2.85
1926-11-01	5.21	-0.19	4.10	0.01	4.71	1.63	6.43	5.42	3.71	2.11
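
The first option (read, then convert) can be tried without the original file. This is a minimal sketch on a made-up two-row sample in the same YYYYMM layout; csv_text and the column subset are assumptions for illustration only.

```python
import io
import pandas as pd

# Made-up sample that mimics the YYYYMM index of the file above
csv_text = "date,NoDur,Durbl\n192607,1.45,15.55\n192608,3.97,3.68\n"

df = pd.read_csv(io.StringIO(csv_text), index_col=0)

# The index is read as plain integers; go through str so the format applies
df.index = pd.to_datetime(df.index.astype(str), format='%Y%m')

print(df.index[0])  # 1926-07-01 00:00:00
print(df)
```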

Looping¶

List: for idx, val in enumerate(my_list): yields the index and value of each item
Dictionary: for key, val in my_dict.items(): yields each key and its value
2D array: for item in np.nditer(arr2d): yields every element of the 2D NumPy array
DataFrame: for idx, row in df.iterrows(): yields each row's index label and the row's data as a Series
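
The four loops above can be sketched on small made-up objects (letters, ages, arr2d and df are illustrative names, not part of the original notebook):

```python
import numpy as np
import pandas as pd

letters = ['a', 'b']
ages = {'tom': 30, 'ann': 25}
arr2d = np.array([[1, 2], [3, 4]])
df = pd.DataFrame({'x': [10, 20]}, index=['r1', 'r2'])

seen = []
for idx, val in enumerate(letters):   # index and value
    seen.append((idx, val))
for key, val in ages.items():         # key and value
    seen.append((key, val))
for item in np.nditer(arr2d):         # every element, one at a time
    seen.append(int(item))
for idx, row in df.iterrows():        # index label and the row as a Series
    seen.append((idx, int(row['x'])))

print(seen)
```

iterrows() is convenient but slow on large frames, so it is best kept for small data or debugging.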

Map vs. apply vs. applymap¶

Command	Description	Example
map	Applies a function to each element of a `Series`.	`df["col1"].map(lambda x: 5+x)` adds 5 to each element of `col1`. `df["col1"].map(lambda x: "BNE"+x)` prepends "BNE" to each element of `col1` (the column must hold strings).
apply	Applies a function along an axis of the `DataFrame`.	`df[["col1","col2"]].apply(sum)` returns the sum of `col1` and the sum of `col2` (one value per column).
applymap	Applies a function to each element of the `DataFrame`.	`func = lambda x: x+2` followed by `df.applymap(func)` adds 2 to each element of the DataFrame (all columns must be numeric).
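
A runnable sketch of the three commands on a made-up two-column frame. One assumption to flag: newer pandas versions rename DataFrame.applymap to DataFrame.map, so the example picks whichever method the installed version provides.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [10, 20, 30]})

plus_five = df['col1'].map(lambda x: 5 + x)          # Series, element by element
col_sums = df[['col1', 'col2']].apply(sum)           # one result per column
row_sums = df[['col1', 'col2']].apply(sum, axis=1)   # one result per row

# Assumption: use DataFrame.map when it exists, otherwise fall back to applymap
elementwise = df.map if hasattr(df, 'map') else df.applymap
plus_two = elementwise(lambda x: x + 2)

print(plus_five.tolist())   # [6, 7, 8]
print(col_sums.to_dict())   # {'col1': 6, 'col2': 60}
print(row_sums.tolist())    # [11, 22, 33]
print(plus_two)
```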

Counting items¶

Use collections.defaultdict instead of a normal dict {} when missing keys should get a default value automatically; this removes the explicit key-existence check. Use collections.defaultdict(int) when setting up a dictionary to count items.
Use collections.Counter on any Series or other iterable to get a dict-like object that maps each value to its count; its most_common() method returns a list of (value, count) tuples.
df["col1"].value_counts() is another way to get a count of every distinct value in that column.

In [79]:

iris.head()

Out[79]:

	sepal_length	sepal_width	petal_length	petal_width	species
0	5.1	3.5	1.4	0.2	setosa
1	4.9	3.0	1.4	0.2	setosa
2	4.7	3.2	1.3	0.2	setosa
3	4.6	3.1	1.5	0.2	setosa
4	5.0	3.6	1.4	0.2	setosa

Using value_counts¶

In [92]:

iris['species'].value_counts()

Out[92]:

virginica     50
versicolor    50
setosa        50
Name: species, dtype: int64

In [93]:

iris['sepal_length'].value_counts()

Out[93]:

5.0    10
6.3     9
5.1     9
6.7     8
5.7     8
5.5     7
5.8     7
6.4     7
6.0     6
4.9     6
6.1     6
5.4     6
5.6     6
6.5     5
4.8     5
7.7     4
6.9     4
5.2     4
6.2     4
4.6     4
7.2     3
6.8     3
4.4     3
5.9     3
6.6     2
4.7     2
7.6     1
7.4     1
4.3     1
7.9     1
7.3     1
7.0     1
4.5     1
5.3     1
7.1     1
Name: sepal_length, dtype: int64

Using defaultdict¶

In [95]:

import collections

spec_cnt = collections.defaultdict(int)

spec = iris['species']

for s in spec:
    spec_cnt[s] += 1  # missing keys start at int() == 0, so no existence check is needed

print(spec_cnt.keys())
print(spec_cnt.values())

dict_keys(['setosa', 'versicolor', 'virginica'])
dict_values([50, 50, 50])

Using collections.Counter¶

In [77]:

collections.Counter(spec)

Out[77]:

Counter({'setosa': 50, 'versicolor': 50, 'virginica': 50})

In [85]:

cnt_sl = collections.Counter(iris['sepal_length'])
cnt_sl.most_common(10)

Out[85]:

[(5.0, 10),
 (5.1, 9),
 (6.3, 9),
 (5.7, 8),
 (6.7, 8),
 (5.8, 7),
 (5.5, 7),
 (6.4, 7),
 (4.9, 6),
 (5.4, 6)]
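
As a quick cross-check without pandas, this sketch counts a made-up list of species names with both defaultdict and Counter and confirms that the results agree:

```python
import collections

species = ['setosa', 'virginica', 'setosa', 'versicolor', 'setosa']

by_default = collections.defaultdict(int)
for s in species:
    by_default[s] += 1          # missing keys start at 0

by_counter = collections.Counter(species)

print(dict(by_default) == dict(by_counter))  # True
print(by_counter.most_common(1))             # [('setosa', 3)]
```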

Writing sophisticated functions¶

Command	Access
def func(*args)	for v in args
def func(**kwargs)	for k, v in kwargs.items()
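
A minimal sketch of both forms in one function (describe is a hypothetical helper written for this note): *args arrives as a tuple and **kwargs as a dict.

```python
def describe(*args, **kwargs):
    """Collect positional and keyword arguments into one readable string."""
    parts = [str(v) for v in args]                                  # args is a tuple
    parts += ['{0}={1}'.format(k, v) for k, v in kwargs.items()]    # kwargs is a dict
    return ', '.join(parts)

print(describe(1, 2, name='iris', rows=150))  # 1, 2, name=iris, rows=150

# The same symbols unpack a sequence or dict at the call site
values = (1, 2)
options = {'name': 'iris'}
print(describe(*values, **options))  # 1, 2, name=iris
```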

Using reduce() and filter()¶

Coming soon!
