Python tips and tricks

This is a list of helpful Python tips and tricks for my workflow.

  • [ ] zip
  • [ ] List/Dict comprehensions
  • [ ] Generators
  • [ ] pandas

zip

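z1 = zip(l1,l2) pairs up the items of two lists (here l1 and l2) position by position, producing tuples. This is useful when you have separate lists that belong together (e.g., names, phone numbers, addresses) and need to group them quickly. In Python 3, zip returns a lazy zip object (an iterator), not a list, and it stops at the shortest input: in the example below, 'f' is dropped because l1 has only five items.

list(z1) converts the zip object into a list of tuples.

*z1 unpacks the zip object, for example into the arguments of print(). Because a zip object is an iterator, unpacking consumes it: afterwards z1 yields no more items, which is why the next cell calls zip(l1,l2) again.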

In [2]:
l1 = range(0,5)
l2 = ['a','b','c','d','e','f']
In [10]:
z1 = zip(l1,l2)
print(type(z1))
print(*z1)
<class 'zip'>
(0, 'a') (1, 'b') (2, 'c') (3, 'd') (4, 'e')
In [12]:
z1 = zip(l1,l2)
listZ = list(z1)
print(listZ)
[(0, 'a'), (1, 'b'), (2, 'c'), (3, 'd'), (4, 'e')]
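Two zip behaviors are easy to miss: it silently stops at the shortest input, and zip(*pairs) reverses the operation. A minimal sketch with made-up name and phone lists (the sample data is hypothetical):

```python
from itertools import zip_longest

names = ['ann', 'bob', 'cy']
phones = ['555-0100', '555-0101']  # hypothetical data: one phone number is missing

# zip stops at the shortest input, so 'cy' is silently dropped
pairs = list(zip(names, phones))
print(pairs)

# zip_longest keeps going and fills the gaps with fillvalue
padded = list(zip_longest(names, phones, fillvalue=None))
print(padded)

# zip(*pairs) "unzips" a list of tuples back into separate tuples
unzipped_names, unzipped_phones = zip(*pairs)
print(unzipped_names, unzipped_phones)
```

On Python 3.10+, zip(names, phones, strict=True) raises a ValueError instead of silently truncating.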

List/Dict comprehensions

  • [ ] List comprehension.
  • [ ] List comprehension with if conditional.
  • [ ] List comprehension with nested (chained) if conditionals.
  • [ ] List comprehension with if-else conditional.
  • [ ] Nested List comprehensions.
  • [ ] Dict comprehension.
List comprehension
In [40]:
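x = ['alpha','beta','gamma','theta']
# Build a new list instead of using a comprehension only for side effects like print(),
# which would also overwrite x with [None, None, None, None]
x_upper = [val.upper() for val in x]
print(x_upper)
['ALPHA', 'BETA', 'GAMMA', 'THETA']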
List comprehension (if-conditional)

A filtering if clause comes after the for clause.

In [36]:
x = list(range(0,100,5))
print('x:{0}'.format(x))
cond_x = [val for val in x if val > 50]
print('y:{0}'.format(cond_x))
x:[0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95]
y:[55, 60, 65, 70, 75, 80, 85, 90, 95]
List comprehension (nested if-conditional)
In [37]:
x = list(range(0,100,5))
print('x:{0}'.format(x))
cond_x = [val for val in x if val > 50 if val % 2]
print('y:{0}'.format(cond_x))
x:[0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95]
y:[55, 65, 75, 85, 95]
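Chained if clauses are equivalent to joining the conditions with and. A quick check on the same x as above:

```python
x = list(range(0, 100, 5))

# Two chained if clauses...
chained = [val for val in x if val > 50 if val % 2]
# ...keep exactly the values that pass both tests, same as a single `and`
combined = [val for val in x if val > 50 and val % 2 == 1]

print(chained == combined)  # True
print(combined)             # [55, 65, 75, 85, 95]
```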
List comprehension (if-else conditional)

An if-else expression comes before the for clause, because it computes the value of each item rather than filtering items out.

In [38]:
x = list(range(0,100,5))
print('x:{0}'.format(x))
cond_x = [val if val > 50 else 0 for val in x]
print('y:{0}'.format(cond_x))
x:[0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95]
y:[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 55, 60, 65, 70, 75, 80, 85, 90, 95]
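The two forms can be combined in one comprehension: the if-else expression sits before the for clause and the filtering if after it. A small sketch (the 'big'/'small' labels and the multiple-of-10 filter are just illustrative):

```python
x = list(range(0, 100, 5))

# The if-else expression (before `for`) transforms each kept value;
# the trailing if clause (after `for`) decides which values are kept at all.
labels = ['big' if val > 50 else 'small' for val in x if val % 10 == 0]
print(labels)
```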

Nested List comprehensions

Nested for loops can also be written as a single list comprehension. First, here is the version using ordinary for loops:

In [14]:
my_list = []

for x in [10, 25, 50]:
    for y in [1, 3, 5]:
        my_list.append(x * y)

print(my_list)
[10, 30, 50, 25, 75, 125, 50, 150, 250]

Below is the equivalent list comprehension; the for clauses appear in the same order as the nested loops:

In [16]:
nested_my_list = [ x*y for x in [10,25,50] for y in [1,3,5] ]
print(nested_my_list)
[10, 30, 50, 25, 75, 125, 50, 150, 250]
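The same for-clause ordering is what flattens a list of lists. Putting a comprehension inside another comprehension instead gives a nested result, e.g. a transpose. A sketch on a made-up 2x3 matrix:

```python
matrix = [[1, 2, 3], [4, 5, 6]]

# Flatten: the outer loop (rows) is written first, the inner loop (values) second
flat = [val for row in matrix for val in row]
print(flat)  # [1, 2, 3, 4, 5, 6]

# A comprehension *inside* a comprehension builds a nested result instead,
# here the transpose of the matrix
transposed = [[row[i] for row in matrix] for i in range(len(matrix[0]))]
print(transposed)  # [[1, 4], [2, 5], [3, 6]]
```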

Dict comprehension

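A dict comprehension uses the same idea with curly braces and a key: value pair, e.g. {name: len(name) for name in names}.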
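A minimal sketch of dict comprehensions using a few sample names and hypothetical phone numbers:

```python
names = ['tracy', 'clarissa', 'tom']
phones = ['555-0100', '555-0101', '555-0102']  # hypothetical data

# {key: value for ...} builds a dict in one expression
name_len = {name: len(name) for name in names}
print(name_len)  # {'tracy': 5, 'clarissa': 8, 'tom': 3}

# Combined with zip, two parallel lists become a lookup table
phone_book = {name: phone for name, phone in zip(names, phones)}
print(phone_book['tom'])  # 555-0102

# Conditions work the same way as in list comprehensions
long_names = {k: v for k, v in name_len.items() if v > 4}
print(long_names)  # {'tracy': 5, 'clarissa': 8}
```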

Generator expressions

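Generator expressions have the same syntax as list comprehensions, except that they use () instead of []. Instead of building the whole list in memory, a generator produces its items lazily, one at a time. You access the items by iterating over the generator (e.g., with a for loop) or by calling next(gen_obj) to get one item at a time. A generator can only be consumed once.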

In [70]:
names = ['tracy','clarissa','tom','hyacinth','sowhattoo'] # don't call this 'list': that shadows the built-in list()

gen_expr = ( len(name) for name in names )
print('item 1:{0}'.format(next(gen_expr))) # prints first item
print('item 2:{0}'.format(next(gen_expr))) # prints second item

print(*gen_expr) # unpacking consumes all remaining items

print(list(gen_expr)) # empty list: the generator was already exhausted by *gen_expr

print(','.join(names))
item 1:5
item 2:8
3 8 9
[]
tracy,clarissa,tom,hyacinth,sowhattoo
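Generator functions written with yield behave the same way as generator expressions: lazy and single-use. A sketch (name_lengths is a hypothetical helper) that also shows passing a generator expression straight into sum():

```python
def name_lengths(names):
    """Yield the length of each name lazily, one at a time."""
    for name in names:
        yield len(name)

names = ['tracy', 'clarissa', 'tom', 'hyacinth', 'sowhattoo']

# Aggregations can consume a generator expression directly, without building a list
total = sum(len(n) for n in names)
print(total)  # 33

# A generator is single-use: the second list() call sees an exhausted iterator
gen = name_lengths(names)
print(list(gen))  # [5, 8, 3, 8, 9]
print(list(gen))  # []
```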

Pandas

  1. Dataframe manipulation
  2. Filtering
In [28]:
import os
import seaborn as sns
import pandas as pd
import numpy as np
In [21]:
iris = sns.load_dataset('iris')
print(iris.head())
iris.info()
   sepal_length  sepal_width  petal_length  petal_width species
0           5.1          3.5           1.4          0.2  setosa
1           4.9          3.0           1.4          0.2  setosa
2           4.7          3.2           1.3          0.2  setosa
3           4.6          3.1           1.5          0.2  setosa
4           5.0          3.6           1.4          0.2  setosa
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
sepal_length    150 non-null float64
sepal_width     150 non-null float64
petal_length    150 non-null float64
petal_width     150 non-null float64
species         150 non-null object
dtypes: float64(4), object(1)
memory usage: 5.9+ KB
In [3]:
iris.describe()
Out[3]:
       sepal_length  sepal_width  petal_length  petal_width
count    150.000000   150.000000    150.000000   150.000000
mean       5.843333     3.057333      3.758000     1.199333
std        0.828066     0.435866      1.765298     0.762238
min        4.300000     2.000000      1.000000     0.100000
25%        5.100000     2.800000      1.600000     0.300000
50%        5.800000     3.000000      4.350000     1.300000
75%        6.400000     3.300000      5.100000     1.800000
max        7.900000     4.400000      6.900000     2.500000

DataFrame extraction: Selecting a column with single brackets (df['col']) returns a Series. Passing a list of columns with double brackets (df[['col']]) returns a DataFrame, even when the list has only one column.

In [9]:
sepal_data = iris['sepal_length']
print(type(sepal_data))

sepal_data = iris[['sepal_length']]
print(type(sepal_data))
<class 'pandas.core.series.Series'>
<class 'pandas.core.frame.DataFrame'>

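Filtering data: When filtering on multiple criteria, combine the boolean conditions with np.logical_and or np.logical_or (or with the element-wise operators & and |, wrapping each condition in parentheses). Python's and/or don't work here: they raise a ValueError because the truth value of a Series is ambiguous.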

In [23]:
filter_data1 = iris[iris['sepal_length']>5]
filter_data2 = iris[np.logical_and(iris['sepal_length']>5,iris['petal_width']>0.3)]
In [25]:
filter_data1.head()
Out[25]:
    sepal_length  sepal_width  petal_length  petal_width species
0            5.1          3.5           1.4          0.2  setosa
5            5.4          3.9           1.7          0.4  setosa
10           5.4          3.7           1.5          0.2  setosa
14           5.8          4.0           1.2          0.2  setosa
15           5.7          4.4           1.5          0.4  setosa
In [26]:
filter_data2.head()
Out[26]:
    sepal_length  sepal_width  petal_length  petal_width species
5            5.4          3.9           1.7          0.4  setosa
15           5.7          4.4           1.5          0.4  setosa
16           5.4          3.9           1.3          0.4  setosa
21           5.1          3.7           1.5          0.4  setosa
23           5.1          3.3           1.7          0.5  setosa
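The & and | operators do the same job as np.logical_and and np.logical_or, and DataFrame.query expresses the filter as a string. A sketch on a small hypothetical DataFrame that borrows two iris column names (so it runs without seaborn):

```python
import pandas as pd

# A small hypothetical frame with the same column names as iris
df = pd.DataFrame({
    'sepal_length': [5.1, 4.9, 5.4, 5.7, 4.6],
    'petal_width': [0.2, 0.2, 0.4, 0.4, 0.2],
})

# & and | work element-wise on boolean Series; the parentheses are required
# because & binds more tightly than > in Python
both = df[(df['sepal_length'] > 5) & (df['petal_width'] > 0.3)]
either = df[(df['sepal_length'] > 5) | (df['petal_width'] > 0.3)]

# DataFrame.query expresses the same filter as a string
both_q = df.query('sepal_length > 5 and petal_width > 0.3')

print(both.index.tolist())    # [2, 3]
print(either.index.tolist())  # [0, 2, 3]
```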

Reading csv: When we read csv files, pandas sometimes cannot recognize the date format in the index column. We have two options:

  1. Read in the csv file and convert the index afterwards
  2. Write a date parser and pass it to pd.read_csv
In [46]:
# os.path.join builds the path with the right separator on any OS
fileName = os.path.join(os.getcwd(), 'inputs', 'FF_10_Industry_Portfolios.CSV')

df_10indus_m = pd.read_csv(fileName,skiprows=11,nrows=1107,index_col=0,parse_dates=True)
df_10indus_m.head()
Out[46]:
        NoDur  Durbl  Manuf  Enrgy  HiTec  Telcm  Shops   Hlth  Utils  Other
192607   1.45  15.55   4.69  -1.18   2.90   0.83   0.11   1.77   7.04   2.16
192608   3.97   3.68   2.81   3.47   2.66   2.17  -0.71   4.25  -1.69   4.38
192609   1.14   4.80   1.15  -3.39  -0.38   2.41   0.21   0.69   2.04   0.29
192610  -1.24  -8.23  -3.63  -0.78  -4.58  -0.11  -2.29  -0.57  -2.63  -2.85
192611   5.21  -0.19   4.10   0.01   4.71   1.63   6.43   5.42   3.71   2.11

Option 1: convert the index to datetimes after reading the file

In [47]:
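timeformat = '%Y%m' # Can be as complex as '%Y-%m-%d %H:%M'
# Cast the YYYYMM labels to stripped strings so the format string can be applied
df_10indus_m.index = pd.to_datetime(df_10indus_m.index.astype(str).str.strip(),format=timeformat)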
df_10indus_m.head()
Out[47]:
            NoDur  Durbl  Manuf  Enrgy  HiTec  Telcm  Shops   Hlth  Utils  Other
1926-07-01   1.45  15.55   4.69  -1.18   2.90   0.83   0.11   1.77   7.04   2.16
1926-08-01   3.97   3.68   2.81   3.47   2.66   2.17  -0.71   4.25  -1.69   4.38
1926-09-01   1.14   4.80   1.15  -3.39  -0.38   2.41   0.21   0.69   2.04   0.29
1926-10-01  -1.24  -8.23  -3.63  -0.78  -4.58  -0.11  -2.29  -0.57  -2.63  -2.85
1926-11-01   5.21  -0.19   4.10   0.01   4.71   1.63   6.43   5.42   3.71   2.11

Option 2: write a date parser and pass it to pd.read_csv, as shown below

In [40]:
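from datetime import datetime

# pd.datetime was removed in pandas 2.0, so use the standard-library datetime
dateparser = lambda x: datetime.strptime(str(x).strip(),'%Y%m')
dateparser('192004') # quick check: returns datetime(1920, 4, 1, 0, 0)

# Note: date_parser is deprecated from pandas 2.0 on; there, date_format='%Y%m' does the same job
df_10indus_m = pd.read_csv(fileName,skiprows=11,nrows=1107,index_col=0,parse_dates=True,date_parser=dateparser)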
df_10indus_m.head()
Out[40]:
            NoDur  Durbl  Manuf  Enrgy  HiTec  Telcm  Shops   Hlth  Utils  Other
1926-07-01   1.45  15.55   4.69  -1.18   2.90   0.83   0.11   1.77   7.04   2.16
1926-08-01   3.97   3.68   2.81   3.47   2.66   2.17  -0.71   4.25  -1.69   4.38
1926-09-01   1.14   4.80   1.15  -3.39  -0.38   2.41   0.21   0.69   2.04   0.29
1926-10-01  -1.24  -8.23  -3.63  -0.78  -4.58  -0.11  -2.29  -0.57  -2.63  -2.85
1926-11-01   5.21  -0.19   4.10   0.01   4.71   1.63   6.43   5.42   3.71   2.11
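A self-contained sketch of the first option, assuming a tiny in-memory CSV in place of the Fama-French file (the two columns and values are copied from the output above; the file layout is otherwise hypothetical):

```python
import io
import pandas as pd

# Hypothetical stand-in for the Fama-French file: a YYYYMM index plus two columns
csv_text = """,NoDur,Durbl
192607,1.45,15.55
192608,3.97,3.68
"""

# The YYYYMM labels are read as plain integers, not dates
df = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Option 1: convert after reading; cast to str so the format string can be applied
df.index = pd.to_datetime(df.index.astype(str), format='%Y%m')
print(df.index[0])  # 1926-07-01 00:00:00
```

On pandas 2.0+, passing date_format='%Y%m' together with parse_dates to pd.read_csv should cover the second option without a custom parser; it is worth checking against the real file.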

Looping

  1. List: for idx, val in enumerate(my_list): yields each position and value of the list
  2. Dictionary: for key, val in my_dict.items(): yields each key and value of the dict
  3. 2D array: for item in np.nditer(arr_2d): visits every element of a 2D (or any n-dimensional) NumPy array
  4. DataFrame: for idx, row in df.iterrows(): yields each row's index label together with that row's data as a Series
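A runnable sketch of the four loop patterns above, with small made-up data (the variable names are illustrative):

```python
import numpy as np
import pandas as pd

fruits = ['apple', 'pear']
for idx, val in enumerate(fruits):
    print(idx, val)  # 0 apple, then 1 pear

prices = {'apple': 1.0, 'pear': 1.5}
for key, val in prices.items():
    print(key, val)

arr_2d = np.array([[1, 2], [3, 4]])
# nditer visits every element of a multi-dimensional array (row by row here)
flat = [int(item) for item in np.nditer(arr_2d)]
print(flat)  # [1, 2, 3, 4]

df = pd.DataFrame({'fruit': fruits, 'price': [1.0, 1.5]})
for idx, row in df.iterrows():
    # row is a Series indexed by the column names
    print(idx, row['fruit'], row['price'])
```

iterrows is convenient but slow on large DataFrames; vectorized operations or df.itertuples() are usually faster.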

Map vs. apply vs. applymap

  • map: applies a function to each element of a Series.
    Example: df["col1"].map(lambda x: 5+x) adds 5 to each element of col1; df["col1"].map(lambda x: "BNE"+x) prepends "BNE" to each element of col1 (col1 must contain strings).
  • apply: applies a function along an axis of a DataFrame (to each column by default, or to each row with axis=1).
    Example: df[["col1","col2"]].apply(sum) returns the sum of col1 and the sum of col2 as a Series.
  • applymap: applies a function to each element of a DataFrame (renamed to DataFrame.map in pandas 2.1).
    Example: func = lambda x: x+2, then df.applymap(func) adds 2 to each element of the DataFrame (all columns must be numeric).
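A sketch of the three methods on a tiny hypothetical DataFrame. Since applymap was renamed to DataFrame.map in pandas 2.1, the code picks whichever name the installed version provides:

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [10, 20, 30]})

# Series.map: element-wise on a single column
plus_five = df['col1'].map(lambda x: 5 + x)
print(plus_five.tolist())  # [6, 7, 8]

# DataFrame.apply: the function receives each column (axis=0) as a Series
col_sums = df[['col1', 'col2']].apply(sum)
print(col_sums.to_dict())  # {'col1': 6, 'col2': 60}

# Element-wise over the whole DataFrame: DataFrame.map on pandas >= 2.1,
# applymap on older versions
elementwise = df.map if hasattr(df, 'map') else df.applymap
plus_two = elementwise(lambda x: x + 2)
print(plus_two.values.tolist())  # [[3, 12], [4, 22], [5, 32]]
```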

Counting items

  1. collections.defaultdict(int) is handy for counting: a missing key starts at 0, so you can write counts[key] += 1 without first checking whether the key exists.
  2. collections.Counter works on any iterable (including a Series) and returns a dict-like mapping of each value to its count; .most_common(n) gives a list of (value, count) tuples.
  3. df["col1"].value_counts() is the pandas way to count every value in a column.
In [79]:
iris.head()
Out[79]:
   sepal_length  sepal_width  petal_length  petal_width species
0           5.1          3.5           1.4          0.2  setosa
1           4.9          3.0           1.4          0.2  setosa
2           4.7          3.2           1.3          0.2  setosa
3           4.6          3.1           1.5          0.2  setosa
4           5.0          3.6           1.4          0.2  setosa

Using value_counts

In [92]:
iris['species'].value_counts()
Out[92]:
virginica     50
versicolor    50
setosa        50
Name: species, dtype: int64
In [93]:
iris['sepal_length'].value_counts()
Out[93]:
5.0    10
6.3     9
5.1     9
6.7     8
5.7     8
5.5     7
5.8     7
6.4     7
6.0     6
4.9     6
6.1     6
5.4     6
5.6     6
6.5     5
4.8     5
7.7     4
6.9     4
5.2     4
6.2     4
4.6     4
7.2     3
6.8     3
4.4     3
5.9     3
6.6     2
4.7     2
7.6     1
7.4     1
4.3     1
7.9     1
7.3     1
7.0     1
4.5     1
5.3     1
7.1     1
Name: sepal_length, dtype: int64

Using defaultdict

In [95]:
import collections

spec_cnt = collections.defaultdict(int)

spec = iris['species']

# Missing keys start at int() == 0, so no "if key in dict" check is needed
for s in spec:
    spec_cnt[s] += 1

print(spec_cnt.keys())
print(spec_cnt.values())
dict_keys(['setosa', 'versicolor', 'virginica'])
dict_values([50, 50, 50])

Using collections.Counter

In [77]:
collections.Counter(spec)
Out[77]:
Counter({'setosa': 50, 'versicolor': 50, 'virginica': 50})
In [85]:
cnt_sl = collections.Counter(iris['sepal_length'])
cnt_sl.most_common(10)
Out[85]:
[(5.0, 10),
 (5.1, 9),
 (6.3, 9),
 (5.7, 8),
 (6.7, 8),
 (5.8, 7),
 (5.5, 7),
 (6.4, 7),
 (4.9, 6),
 (5.4, 6)]
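A few more Counter features, shown on a plain made-up list so it runs without the iris data:

```python
import collections

words = ['setosa', 'virginica', 'setosa', 'versicolor', 'setosa']

counts = collections.Counter(words)
print(counts['setosa'])       # 3
print(counts['unknown'])      # 0: missing keys count as zero, no KeyError
print(counts.most_common(1))  # [('setosa', 3)]

# Counters can be updated with more data and combined with +
counts.update(['virginica'])
more = collections.Counter(['versicolor'])
total = counts + more
print(total['virginica'], total['versicolor'])  # 2 2
```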

Writing sophisticated functions

Command               Access
def func(*args)       for v in args
def func(**kwargs)    for k, v in kwargs.items()
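A sketch of a function taking both *args and **kwargs (describe is a hypothetical name), plus the matching unpacking syntax at the call site:

```python
def describe(*args, **kwargs):
    """Collect positional args into a tuple and keyword args into a dict."""
    total = 0
    for v in args:               # args is a tuple of positional arguments
        total += v
    opts = []
    for k, v in kwargs.items():  # kwargs is a dict of keyword arguments
        opts.append(f'{k}={v}')
    return total, opts

print(describe(1, 2, 3, unit='cm', precision=2))
# (6, ['unit=cm', 'precision=2'])

# The same * and ** syntax unpacks sequences and dicts at the call site
values = [4, 5]
settings = {'unit': 'mm'}
print(describe(*values, **settings))  # (9, ['unit=mm'])
```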

Using reduce() and filter()

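filter() keeps the items of an iterable for which a function returns a truthy value, and functools.reduce() (no longer a built-in in Python 3) combines the items into a single value by applying a two-argument function cumulatively.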
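A minimal sketch of both functions on a made-up list of numbers:

```python
from functools import reduce

values = [3, 8, 1, 9, 4]

# filter(function, iterable) returns a lazy iterator of the items where function is truthy
evens = list(filter(lambda v: v % 2 == 0, values))
print(evens)  # [8, 4]

# reduce(function, iterable[, initializer]) folds the items pairwise from left to right
total = reduce(lambda acc, v: acc + v, values, 0)
largest = reduce(lambda a, b: a if a >= b else b, values)
print(total, largest)  # 25 9
```

For simple cases, a comprehension such as [v for v in values if v % 2 == 0] or the built-ins sum() and max() are usually more readable than filter() and reduce().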
