CS229 Spring 2020 Python Tutorial

Some content is inspired by the CS231N Python tutorial notebook.

In this tutorial we cover:

Basic Python
Simple Python data types
- String
- List
- Tuple
- Set/Dictionary
Numpy
- Dimension manipulation and broadcasting
- Vectorization
Plotting
Pandas

Basic Python

If Statement

code = 230

if code == 229:
    print('Hello CS229!')
elif code == 230:
    print('That\'s deep learning!')
elif code < 200:
    print('That is some undergraduate class')
else:
    print('Wrong class!')

That's deep learning!

Python doesn't have a "switch" statement.
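A common stand-in for a switch is a dictionary lookup with a fallback value (Python 3.10 and later also offer a match statement). The sketch below reproduces the if/elif chain above; the function name describe_class is hypothetical.

```python
def describe_class(code):
    # Each known case maps to its message; dict.get supplies the "default" branch
    messages = {
        229: 'Hello CS229!',
        230: "That's deep learning!",
    }
    # Range checks don't fit a lookup table, so keep them as a plain if
    if code < 200:
        return 'That is some undergraduate class'
    return messages.get(code, 'Wrong class!')

print(describe_class(230))  # That's deep learning!
print(describe_class(101))  # That is some undergraduate class
print(describe_class(231))  # Wrong class!
```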

Python Operators

Logical operators

true = True
false = False
if true:
    print("It's true!")
if not false:
    print("It's still true!")
if true and not false:
    print("Anyhow, it's true!")
if false or not true:
    print("True?")
else:
    print("Okay, it's false now....")

It's true!
It's still true!
Anyhow, it's true!
Okay, it's false now....

&, | and ~ are bitwise operators, not logical ones: for boolean logic, use and, or and not.
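The difference matters because & and | work bit by bit and ~ is bitwise NOT, not logical negation. A small sketch on plain integers and bools:

```python
# On plain bools, & and | happen to give the same answers as and/or ...
print(True & False, True | False)   # False True

# ... but ~ is bitwise NOT on integers: ~x == -x - 1, so ~1 is -2.
# True behaves like the integer 1, which is why ~True is -2, not False
# (Python 3.12+ deprecates ~ on bools for exactly this reason).
print(~1)         # -2
print(not 1)      # False, logical negation

# On integers the difference is obvious: & and | combine bits
print(6 & 3)      # 2   (110 & 011 = 010)
print(6 | 3)      # 7   (110 | 011 = 111)
print(6 and 3)    # 3   (and returns the last operand it evaluated)
```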

Arithmetic operators.

print(5 / 2) # true (floating-point) division
print(5 % 2) # remainder (modulo)
print(5 ** 2) # exponentiation
print(5 // 2) # floor division (rounds down)

2.5
1
25
2

^ means bitwise XOR in Python, not exponentiation (use ** for powers).
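Two arithmetic surprises worth seeing once: ^ versus **, and how // and % behave with negative numbers (// rounds toward negative infinity, unlike truncating with int()).

```python
# ^ is XOR, not power: 010 ^ 011 = 001
print(2 ^ 3)         # 1
print(2 ** 3)        # 8

# // floors toward negative infinity, and % follows the same convention,
# so (a // b) * b + (a % b) == a always holds
print(-5 // 2)       # -3
print(-5 % 2)        # 1
print(int(-5 / 2))   # -2, int() truncates toward zero instead

# divmod returns the quotient and remainder together
q, r = divmod(7, 2)
print(q, r)          # 3 1
```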

Loop

We typically use range and enumerate for iteration. You can loop over any iterable.

for i in range(5):
    print(i)

a = 5
while a > 0:
    print(a)
    a -= 1

0
1
2
3
4
5
4
3
2
1

Python doesn't have increment/decrement operators like "a++" or "a--"; use a += 1 and a -= 1 instead.
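The loop examples above only use range and while. Here is a minimal sketch of enumerate, plus zip for walking several iterables in lockstep; the lists are made up for illustration.

```python
names = ['alice', 'bob', 'carol']
scores = [90, 85, 77]

# enumerate yields (index, item) pairs; start= shifts the first index
for rank, name in enumerate(names, start=1):
    print(rank, name)

# zip pairs up items position by position and stops at the shortest iterable
for name, score in zip(names, scores):
    print(f'{name}: {score}')
```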

Function

Python functions can take default arguments, but parameters with defaults have to come after those without. Be VERY careful: forgetting that a function has a default argument can make debugging much harder.

def power(v, p=2):
    return v ** p # How to return multiple values?

print(power(10))
print(power(10, 3))

100
1000
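To answer the question in the comment above: a function "returns multiple values" by returning a tuple, which the caller can unpack. The sketch also shows a classic default-argument trap, a mutable default that is shared across calls. All function names here are illustrative.

```python
def power_and_root(v, p=2):
    # Returning several values really returns one tuple
    return v ** p, v ** (1 / p)

squared, root = power_and_root(16)
print(squared, root)  # 256 4.0

def append_bad(item, bucket=[]):
    # The default list is created ONCE, when def runs,
    # so every call that omits bucket mutates the same list
    bucket.append(item)
    return bucket

def append_good(item, bucket=None):
    # Use None as a sentinel and build a fresh list on each call
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket

print(append_bad(1), append_bad(2))    # [1, 2] [1, 2]
print(append_good(1), append_good(2))  # [1] [2]
```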

Functions can also accept extra positional (*args) and keyword (**kwargs) arguments. You can pass them on to another function, or use them directly.

def func2(*args, **kwargs):
    print(args)
    print(kwargs)
    
def func1(v, *args, **kwargs):
    
    func2(*args, **kwargs)
    
    if 'power' in kwargs:
        return v ** kwargs['power']
    else:
        return v

print(func1(10, 'extra 1', 'extra 2', power=3))
print('--------------')
print(func1(10, 5))

('extra 1', 'extra 2')
{'power': 3}
1000
--------------
(5,)
{}
10

Simple Python data types

String

See the Python documentation on string methods.

cs_class_code = 'CS-229'

print('I like ' + str(cs_class_code) + ' a lot!')
print(f'I like {cs_class_code} a lot!')

print('I love CS229. (upper)'.upper())
print('I love CS229. (rjust 50)'.rjust(50))
print('we love CS229. (capitalize)'.capitalize())
print('       I love CS229. (strip)        '.strip())

I like CS-229 a lot!
I like CS-229 a lot!
I LOVE CS229. (UPPER)
                          I love CS229. (rjust 50)
We love cs229. (capitalize)
I love CS229. (strip)

f-strings (formatted string literals, new in Python 3.6) let you embed values, and even whole expressions, using { }.

print(f'{print} (print a function)')
print(f'{type(229)} (print a type)')

<built-in function print> (print a function)
<class 'int'> (print a type)
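f-strings also accept the same format specifications as str.format after a colon inside the braces. A few common ones, with made-up values:

```python
pi = 3.14159
count = 42

print(f'{pi:.2f}')     # 3.14   (2 decimal places)
print(f'{count:>5}')   #    42  (right-aligned in a field of width 5)
print(f'{count:05d}')  # 00042  (zero-padded to width 5)
print(f'{0.256:.1%}')  # 25.6%  (percentage with 1 decimal place)
print(f'{count * 2}')  # 84     (any expression works inside the braces)
```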

For reference, here is the older str.format style. It is still handy when you want more control, e.g. filling fields by index.

print('Old school formatting: {2}, {1}, {0:10.2F}'.format(1.358, 'b', 'c'))
# Fields are filled by index (2, 1, 0). For the number, use a field width of 10 and round to 2 decimal places

Old school formatting: c, b,       1.36

List

In general, documentation for Python's built-in data structures can be found in the "Data Structures" chapter of the official Python tutorial.

list_1 = ['one', 'two', 'three']
list_2 = [1, 2, 3]

list_2.append(4)
list_2.insert(0, 'ZERO')

List concatenation is just addition (+), while extend appends another list in place.

print(list_1 + list_2)

list_1_temp = ['a', 'b']
list_1_temp.extend(list_2)

print(list_1_temp)

['one', 'two', 'three', 'ZERO', 1, 2, 3, 4]
['a', 'b', 'ZERO', 1, 2, 3, 4]

But be VERY careful when you multiply a list; we will explain why later.

print(list_1 * 3 + list_2)
print([list_1] * 3 + list_2)

['one', 'two', 'three', 'one', 'two', 'three', 'one', 'two', 'three', 'ZERO', 1, 2, 3, 4]
[['one', 'two', 'three'], ['one', 'two', 'three'], ['one', 'two', 'three'], 'ZERO', 1, 2, 3, 4]

pprint is your friend

import pprint as pp

pp.pprint([list_1] * 5 + list_2)
pp.pprint([list_1] * 2 + [list_2] * 3)

[['one', 'two', 'three'],
 ['one', 'two', 'three'],
 ['one', 'two', 'three'],
 ['one', 'two', 'three'],
 ['one', 'two', 'three'],
 'ZERO',
 1,
 2,
 3,
 4]
[['one', 'two', 'three'],
 ['one', 'two', 'three'],
 ['ZERO', 1, 2, 3, 4],
 ['ZERO', 1, 2, 3, 4],
 ['ZERO', 1, 2, 3, 4]]

List comprehensions can save a lot of lines

long_list = [i for i in range(9)]
long_long_list = [(i, j) for i in range(3) for j in range(5)]
long_list_list = [[i for i in range(3)] for _ in range(5)]

pp.pprint(long_list)
pp.pprint(long_long_list)
pp.pprint(long_list_list)

[0, 1, 2, 3, 4, 5, 6, 7, 8]
[(0, 0),
 (0, 1),
 (0, 2),
 (0, 3),
 (0, 4),
 (1, 0),
 (1, 1),
 (1, 2),
 (1, 3),
 (1, 4),
 (2, 0),
 (2, 1),
 (2, 2),
 (2, 3),
 (2, 4)]
[[0, 1, 2], [0, 1, 2], [0, 1, 2], [0, 1, 2], [0, 1, 2]]
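Comprehensions can also filter or transform elements. Where the if goes changes its meaning, as this small sketch shows:

```python
# A trailing if filters: only matching elements are kept
evens = [i for i in range(10) if i % 2 == 0]
print(evens)   # [0, 2, 4, 6, 8]

# An if/else before the for transforms every element instead
labels = ['even' if i % 2 == 0 else 'odd' for i in range(4)]
print(labels)  # ['even', 'odd', 'even', 'odd']

# Multiple fors nest like loops: the first for is the outer loop,
# and the inner loop may depend on the outer variable
pairs = [(i, j) for i in range(3) for j in range(i)]
print(pairs)   # [(1, 0), (2, 0), (2, 1)]
```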

Lists are iterable!

string_list = ['a', 'b', 'c']
for s in string_list:
    print(s)
for i, s in enumerate(string_list):
    print(f'{i}, {s}')

a
b
c
0, a
1, b
2, c

Slicing. With numpy arrays (covered later), you can slice multi-dimensional ones as well.

print(long_list[:5])
print(long_list[:-1])
print(long_list[4:-1])

long_list[3:5] = [-1, -2]
print(long_list)

long_list.pop()
print(long_list)

[0, 1, 2, 3, 4]
[0, 1, 2, 3, 4, 5, 6, 7]
[4, 5, 6, 7]
[0, 1, 2, -1, -2, 5, 6, 7, 8]
[0, 1, 2, -1, -2, 5, 6, 7]
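Slices also take an optional step (start:stop:step), and a negative step walks backwards. Note that slicing a list returns a new list, so modifying the slice leaves the original alone.

```python
nums = list(range(10))

print(nums[::2])    # [0, 2, 4, 6, 8]   every 2nd element
print(nums[1:7:3])  # [1, 4]            start at 1, stop before 7, step 3
print(nums[::-1])   # [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]  reversed
print(nums[-3:])    # [7, 8, 9]         the last three

# A slice is a new (shallow) copy of the list
tail = nums[-3:]
tail[0] = 100
print(tail, nums[-3:])  # [100, 8, 9] [7, 8, 9]
```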

Sorting a list (but remember that sorting can be costly: O(n log n) comparisons). See the Python "Sorting HOW TO" for details.

random_list = [3, 12, 5, 6, 8, 2]
print(sorted(random_list))

random_list_2 = [(3, 'z'), (12, 'r'), (5, 'a'), (6, 'e'), (8, 'c'), (2, 'g')]
print(sorted(random_list_2, key=lambda x: x[1]))

[2, 3, 5, 6, 8, 12]
[(5, 'a'), (8, 'c'), (6, 'e'), (2, 'g'), (12, 'r'), (3, 'z')]
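sorted returns a new list, while list.sort works in place. A short sketch of in-place sorting, reversing, sorting on several keys, and stability (equal keys keep their original order), using made-up data:

```python
random_list = [3, 12, 5, 6, 8, 2]
random_list.sort(reverse=True)   # sorts in place and returns None
print(random_list)  # [12, 8, 6, 5, 3, 2]

pairs = [(2, 'b'), (1, 'z'), (2, 'a'), (1, 'c')]
# A tuple key sorts by the first field, then breaks ties with the second
print(sorted(pairs, key=lambda x: (x[0], x[1])))
# [(1, 'c'), (1, 'z'), (2, 'a'), (2, 'b')]

# Python's sort is stable: items with equal keys stay in their original order
print(sorted(pairs, key=lambda x: x[0]))
# [(1, 'z'), (1, 'c'), (2, 'b'), (2, 'a')]
```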

Think before copying: assignment copies by reference, not by value. See the copy module documentation for more about copying.

orig_list = [[1, 2], [3, 4]]
dup_list = orig_list

dup_list[0][1] = 'okay'
pp.pprint(orig_list)
pp.pprint(dup_list)

[[1, 'okay'], [3, 4]]
[[1, 'okay'], [3, 4]]

a = [[1, 2, 3]]*3
b = [[1, 2, 3] for i in range(3)]
a[0][1] = 4
b[0][1] = 4
print(a)
print(b)

[[1, 4, 3], [1, 4, 3], [1, 4, 3]]
[[1, 4, 3], [1, 2, 3], [1, 2, 3]]

import copy

orig_list = [[1, 2], [3, 4]]
dup_list = copy.deepcopy(orig_list)

dup_list[0][1] = 'okay'
pp.pprint(orig_list)
pp.pprint(dup_list)

[[1, 2], [3, 4]]
[[1, 'okay'], [3, 4]]
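Between plain assignment and deepcopy sits the shallow copy (copy.copy, list.copy() or [:]). It creates a new outer list but shares the inner lists, so nested edits still leak through:

```python
import copy

orig_list = [[1, 2], [3, 4]]
shallow = copy.copy(orig_list)   # same effect as orig_list.copy() or orig_list[:]

shallow.append([5, 6])    # the outer list is new, so orig_list does not grow
shallow[0][1] = 'okay'    # but the inner lists are shared, so this shows up in both
print(orig_list)  # [[1, 'okay'], [3, 4]]
print(shallow)    # [[1, 'okay'], [3, 4], [5, 6]]
```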

Tuple

A tuple is like a list that you cannot edit: it is immutable.

my_tuple = (10, 20, 30)
my_tuple[0] = 40

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-48-a4317678f4cc> in <module>
      1 my_tuple = (10, 20, 30)
----> 2 my_tuple[0] = 40

TypeError: 'tuple' object does not support item assignment

Unpacking assignment makes your code shorter (it also works for lists).

a, b, c = my_tuple
print(f"a={a}, b={b}, c={c}")
for obj in enumerate(my_tuple):
    print(obj)

a=10, b=20, c=30
(0, 10)
(1, 20)
(2, 30)
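A few more unpacking idioms: a starred name collects the leftovers, and tuple packing/unpacking swaps two variables without a temporary. Because tuples are hashable (when their items are), they can also serve as dictionary keys, which lists cannot.

```python
my_tuple = (10, 20, 30)
first, *rest = my_tuple          # *rest collects the remaining items into a list
print(first, rest)  # 10 [20, 30]

a, b = 1, 2
a, b = b, a                      # swap via a temporary tuple on the right-hand side
print(a, b)  # 2 1

grid = {(0, 0): 'origin', (1, 2): 'point'}   # tuple keys
print(grid[(1, 2)])  # point
```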

Dictionary/Set

Again, see the "Data Structures" chapter of the Python tutorial for documentation.

my_set = {i ** 2 % 3 for i in range(10)}
my_dict = {(5 - i): i ** 2 for i in range(10)}

print(my_set)
print(my_dict)

print(my_dict.keys())

{0, 1}
{5: 0, 4: 1, 3: 4, 2: 9, 1: 16, 0: 25, -1: 36, -2: 49, -3: 64, -4: 81}
dict_keys([5, 4, 3, 2, 1, 0, -1, -2, -3, -4])
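Sets support the usual set algebra through operators, and dictionaries offer get for a lookup with a fallback instead of a KeyError. A small sketch with made-up values:

```python
a = {1, 2, 3}
b = {2, 3, 4}

print(a | b)   # union: {1, 2, 3, 4}
print(a & b)   # intersection: {2, 3}
print(a - b)   # difference: {1}
print(a ^ b)   # symmetric difference: {1, 4}
print(3 in a)  # True, membership tests on a set are fast (hash-based)

d = {'a': 1}
print(d.get('b', 0))  # 0: the fallback is returned instead of raising KeyError
```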

Updating and/or adding content to a dictionary

second_dict = {'a': 10, 'b': 11}
my_dict.update(second_dict)

pp.pprint(my_dict)

my_dict['new'] = 10
pp.pprint(my_dict)

{-4: 81,
 -3: 64,
 -2: 49,
 -1: 36,
 0: 25,
 1: 16,
 2: 9,
 3: 4,
 4: 1,
 5: 0,
 'a': 10,
 'b': 11}
{-4: 81,
 -3: 64,
 -2: 49,
 -1: 36,
 0: 25,
 1: 16,
 2: 9,
 3: 4,
 4: 1,
 5: 0,
 'a': 10,
 'b': 11,
 'new': 10}

Here is how to iterate through a dictionary. Remember that a dictionary is NOT sorted by key: since Python 3.7, iteration follows insertion order.

for k, it in my_dict.items(): # similar to for loop over enumerate(list)
    print(k, it)

5 0
4 1
3 4
2 9
1 16
0 25
-1 36
-2 49
-3 64
-4 81
a 10
b 11
new 10

# Sorting keys by string order
for k, it in sorted(my_dict.items(), key=lambda x: str(x[0])):
    print(k, it)

-1 36
-2 49
-3 64
-4 81
0 25
1 16
2 9
3 4
4 1
5 0
a 10
b 11
new 10

For defaultdict, Counter and OrderedDict, see the collections module documentation.
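Two collections helpers come up constantly: defaultdict creates missing values on first access, and Counter tallies hashable items. A minimal sketch with made-up words:

```python
from collections import Counter, defaultdict

# defaultdict(list) calls list() to create a value the first time a key is used
groups = defaultdict(list)
for word in ['apple', 'avocado', 'banana', 'blueberry', 'cherry']:
    groups[word[0]].append(word)
print(dict(groups))
# {'a': ['apple', 'avocado'], 'b': ['banana', 'blueberry'], 'c': ['cherry']}

# Counter counts occurrences; ties in most_common keep first-seen order
counts = Counter('abracadabra')
print(counts.most_common(2))  # [('a', 5), ('b', 2)]
```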

Numpy

Numpy is a nice vector and matrix manipulation package.

import numpy as np

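Array initialization

Initialize from an existing list. If the element types are not consistent, numpy converts everything to a common type, which can give weird results (below, mixing numbers with a string turns every element into a string).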

from_list = np.array([1, 2, 3])
from_list_2d = np.array([[1, 2, 3.0], [4, 5, 6]])
from_list_bad_type = np.array([1, 2, 3, 'a'])
                               
pp.pprint(from_list)
print(f'\t Data type of integer is {from_list.dtype}')
pp.pprint(from_list_2d)
print(f'\t Data type of float is {from_list_2d.dtype}')
pp.pprint(from_list_bad_type)

array([1, 2, 3])
	 Data type of integer is int64
array([[1., 2., 3.],
       [4., 5., 6.]])
	 Data type of float is float64
array(['1', '2', '3', 'a'], dtype='<U21')
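To avoid surprises, you can state the dtype explicitly, or convert later with astype (which returns a converted copy). A short sketch:

```python
import numpy as np

# Ask for a dtype up front instead of relying on inference
ints_as_float = np.array([1, 2, 3], dtype=np.float32)
print(ints_as_float.dtype)      # float32

# astype returns a converted copy; float -> int truncates toward zero
floats = np.array([1.7, -1.7, 2.5])
print(floats.astype(np.int64))  # [ 1 -1  2]

# Mixing ints and floats upcasts everything to float
print(np.array([1, 2.5]).dtype) # float64
```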

Initialize with ones, zeros, or as an identity matrix

print(np.ones(3))
print(np.ones((3, 3)))

print(np.zeros(3))
print(np.zeros((3, 3)))

print(np.eye(3))

[1. 1. 1.]
[[1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]]
[0. 0. 0.]
[[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]
[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]
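A few other constructors are handy for ranges and constant arrays. For floating-point ranges, the NumPy docs recommend linspace over arange, since with a fractional step the number of elements arange produces can be hard to predict.

```python
import numpy as np

steps = np.arange(0, 1, 0.25)    # evenly spaced by step; the stop value is excluded
points = np.linspace(0, 1, 5)    # 5 evenly spaced points; the stop value is included
sevens = np.full((2, 3), 7)      # 2 x 3 array filled with 7
template = np.ones((2, 2))
zeros = np.zeros_like(template)  # zeros with the same shape and dtype as template

print(steps)
print(points)
print(sevens)
print(zeros)
```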

Sampling from the uniform distribution on $[0, 1)$.

print(np.random.random(3))
print(np.random.random((2, 2)))

[0.32901573 0.175129   0.34132364]
[[0.32287561 0.60218408]
 [0.17216162 0.42272833]]

Sampling from the standard normal distribution.

print(np.random.randn(3, 3))

[[-0.44315124 -1.21745661  0.20513334]
 [ 1.40976472  1.80851604 -0.72227264]
 [-0.70184302 -0.75835938 -0.08404159]]

Numpy has built-in samplers of a lot of other common (and some not so common) distributions.
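For reproducible experiments, it helps to seed the generator. A minimal sketch using the Generator API (np.random.default_rng, available since NumPy 1.17); the seed value 229 is arbitrary:

```python
import numpy as np

# A seeded Generator makes "random" draws reproducible across runs
rng = np.random.default_rng(seed=229)
uniform = rng.random((2, 2))              # uniform on [0, 1), like np.random.random
normal = rng.normal(0.0, 1.0, size=3)     # mean 0, standard deviation 1
ints = rng.integers(0, 10, size=5)        # integers in [0, 10)
picks = rng.choice(['a', 'b', 'c'], size=4)

# Re-creating the generator with the same seed reproduces the same draws
rng_again = np.random.default_rng(seed=229)
print(np.array_equal(uniform, rng_again.random((2, 2))))  # True
```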

Array shape

Shape/reshape and multi-dimensional arrays

array_1d = np.array([1, 2, 3, 4])
array_1by4 = np.array([[1, 2, 3, 4]])
array_2by4 = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

print(array_1d.shape)
print(array_1by4.shape)

print(array_1d.reshape(-1, 4).shape)

print(array_2by4.size)

(4,)
(1, 4)
(1, 4)
8

large_array = np.array([i for i in range(400)])
large_array = large_array.reshape((20, 20))

print(large_array[:, 5])

large_3d_array = np.array([i for i in range(1000)])
large_3d_array = large_3d_array.reshape((10, 10, 10))

print(large_3d_array[:, 1, 1])
print(large_3d_array[2, :, 1])
print(large_3d_array[2, 3, :])

print(large_3d_array[1, :, :])

[  5  25  45  65  85 105 125 145 165 185 205 225 245 265 285 305 325 345
 365 385]
[ 11 111 211 311 411 511 611 711 811 911]
[201 211 221 231 241 251 261 271 281 291]
[230 231 232 233 234 235 236 237 238 239]
[[100 101 102 103 104 105 106 107 108 109]
 [110 111 112 113 114 115 116 117 118 119]
 [120 121 122 123 124 125 126 127 128 129]
 [130 131 132 133 134 135 136 137 138 139]
 [140 141 142 143 144 145 146 147 148 149]
 [150 151 152 153 154 155 156 157 158 159]
 [160 161 162 163 164 165 166 167 168 169]
 [170 171 172 173 174 175 176 177 178 179]
 [180 181 182 183 184 185 186 187 188 189]
 [190 191 192 193 194 195 196 197 198 199]]
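Besides slices, numpy arrays can be indexed with boolean masks and with lists of integer indices ("fancy" indexing). Note that, unlike slices, these return copies. A sketch reusing the 20 x 20 array from above:

```python
import numpy as np

large_array = np.arange(400).reshape((20, 20))

# A comparison yields a boolean array of the same shape ...
mask = large_array % 7 == 0
# ... and indexing with it returns the selected elements as a 1-D array
print(large_array[mask][:5])    # [ 0  7 14 21 28]

# Masks also work on the left-hand side of an assignment
clipped = large_array.copy()
clipped[clipped > 100] = 100
print(clipped.max())            # 100

# Integer lists pick rows in any order: rows 3 and 0, first two columns
print(large_array[[3, 0], :2])  # [[60 61]
                                #  [ 0  1]]
```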

Think about the element order you need before using reshape: order='C' (the default) fills row by row, while order='F' fills column by column.

small_array = np.arange(4)
print(np.reshape(small_array, (2, 2), order='C')) # Default order
print(np.reshape(small_array, (2, 2), order='F'))

[[0 1]
 [2 3]]
[[0 2]
 [1 3]]

Numpy math

These element-wise functions also include sin, cos, tanh, etc.

array_1 = np.array([1, 2, 3, 4])

print(array_1 + 5)
print(array_1 * 5)
print(np.sqrt(array_1))
print(np.power(array_1, 2))
print(np.exp(array_1))
print(np.log(array_1))

[6 7 8 9]
[ 5 10 15 20]
[1.         1.41421356 1.73205081 2.        ]
[ 1  4  9 16]
[ 2.71828183  7.3890561  20.08553692 54.59815003]
[0.         0.69314718 1.09861229 1.38629436]

For sum, mean, std, var, etc., you can perform the operation along a chosen axis (or a tuple of axes).

array_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

pp.pprint(array_2d)
print(f'shape={array_2d.shape}')
print(np.sum(array_2d))
print(np.sum(array_2d, axis=0))
print(np.sum(array_2d, axis=1))

array_3d = np.array([i for i in range(8)]).reshape((2, 2, 2))
pp.pprint(array_3d)

print(np.sum(array_3d, axis=0))
print(np.sum(array_3d, axis=1))
print(np.sum(array_3d, axis=(1, 2)))

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
shape=(3, 3)
45
[12 15 18]
[ 6 15 24]
array([[[0, 1],
        [2, 3]],

       [[4, 5],
        [6, 7]]])
[[ 4  6]
 [ 8 10]]
[[ 2  4]
 [10 12]]
[ 6 22]
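Reductions drop the reduced axis by default. Passing keepdims=True keeps it with size 1, which makes the result broadcast back against the original array, e.g. to normalize each row. A small sketch:

```python
import numpy as np

counts = np.array([[1, 2, 3], [4, 5, 6]], dtype=float)

row_sums = np.sum(counts, axis=1)                      # shape (2,)
row_sums_kept = np.sum(counts, axis=1, keepdims=True)  # shape (2, 1)
print(row_sums.shape, row_sums_kept.shape)  # (2,) (2, 1)

# (2, 3) / (2, 1) broadcasts, so each row is divided by its own sum
normalized = counts / row_sums_kept
print(normalized.sum(axis=1))  # [1. 1.]

# (2, 3) / (2,) does not: the trailing dimensions 3 and 2 differ
try:
    counts / row_sums
except ValueError:
    print('shapes (2, 3) and (2,) do not broadcast')
```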

Numpy tends to do things element-wise. But be VERY CAREFUL when dimensions don't match; we will cover this in the broadcasting section. Actually, just be careful with the dimensions of arrays in general.

array_1 = np.array([1, 2, 3, 4])
array_2 = np.array([3, 4, 5, 6])

print(array_1 * array_2)
print(array_1 * array_2.reshape(4, -1)) # Come back to this later

[ 3  8 15 24]
[[ 3  6  9 12]
 [ 4  8 12 16]
 [ 5 10 15 20]
 [ 6 12 18 24]]

The dot product can be written in 3 ways

print(array_1 @ array_2)
print(array_1.dot(array_2))
print(np.dot(array_1, array_2))

print(array_1.shape)

50
50
50
(4,)

Here, the dot product fails because the shapes are not aligned. It did not complain just now because those were 1-D arrays, which have no row/column orientation. Check the shapes!

array_1 = np.array([[1, 2, 3, 4]])
array_2 = np.array([[3, 4, 5, 6]])

print(array_1.shape)

print(array_1 * array_2)
print(array_1.dot(array_2))

(1, 4)
[[ 3  8 15 24]]

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-67-853ea17e99b1> in <module>
      5 
      6 print(array_1 * array_2)
----> 7 print(array_1.dot(array_2))

ValueError: shapes (1,4) and (1,4) not aligned: 4 (dim 1) != 1 (dim 0)

With proper handling of shapes, things work. For 2-D arrays, dot is just matrix multiplication, so you might prefer to write matmul (or @) to keep things consistent and to be SURE that you have the correct shapes.

# T for transpose

print(array_1.dot(array_2.T))
print(array_1.T.dot(array_2))

print(np.matmul(array_1, array_2.T))
print(np.matmul(array_1.T, array_2))

[[50]]
[[ 3  4  5  6]
 [ 6  8 10 12]
 [ 9 12 15 18]
 [12 16 20 24]]
[[50]]
[[ 3  4  5  6]
 [ 6  8 10 12]
 [ 9 12 15 18]
 [12 16 20 24]]

weight_matrix = np.array([1, 2, 3, 4]).reshape(2, 2)
sample = np.array([[50, 60]]).T

np.matmul(weight_matrix, sample)

array([[170],
       [390]])

And of course, we typically use matmul for 2-D matrix multiplication. For arrays with more than 2 dimensions, numpy treats them as stacks of matrices stored in the last two axes. See the matmul documentation.

mat1 = np.array([[1, 2], [3, 4]])
mat2 = np.array([[5, 6], [7, 8]])

print(np.matmul(mat1, mat2))

[[19 22]
 [43 50]]
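Here is what the "stack of matrices" behavior looks like in practice: matmul multiplies matching pairs in the stack, and the leading (stack) dimensions broadcast like any other array. Array contents are placeholder ones:

```python
import numpy as np

# A stack of 10 matrices of shape 2 x 3, times a stack of 10 matrices of shape 3 x 4
batch_a = np.ones((10, 2, 3))
batch_b = np.ones((10, 3, 4))

result = np.matmul(batch_a, batch_b)   # same as batch_a @ batch_b
print(result.shape)     # (10, 2, 4): one 2 x 4 product per pair
print(result[0, 0, 0])  # 3.0: each entry sums three products of 1 * 1

# The stack dimension broadcasts too: one 3 x 4 matrix applied to all 10
shared = np.ones((3, 4))
print((batch_a @ shared).shape)  # (10, 2, 4)
```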

Notice that np.multiply is element-wise multiplication, NOT proper matrix multiplication.

a = np.array([i for i in range(10)]).reshape(2, 5)

print(a * a)
print(np.multiply(a, a))
print(np.multiply(a, 10))

[[ 0  1  4  9 16]
 [25 36 49 64 81]]
[[ 0  1  4  9 16]
 [25 36 49 64 81]]
[[ 0 10 20 30 40]
 [50 60 70 80 90]]

Broadcasting and dimension manipulation

Numpy can perform operations on arrays with different shapes, inferring/expanding dimensions as needed. Here are some examples taken from SciPy's documentation on numpy broadcasting:

A      (4d array):  8 x 1 x 6 x 1
B      (3d array):      7 x 1 x 5
Result (4d array):  8 x 7 x 6 x 5

A      (2d array):  5 x 4
B      (1d array):      1
Result (2d array):  5 x 4

A      (2d array):  5 x 4
B      (1d array):      4
Result (2d array):  5 x 4

A      (3d array):  15 x 3 x 5
B      (3d array):  15 x 1 x 5
Result (3d array):  15 x 3 x 5

A      (3d array):  15 x 3 x 5
B      (2d array):       3 x 5
Result (3d array):  15 x 3 x 5

A      (3d array):  15 x 3 x 5
B      (2d array):       3 x 1
Result (3d array):  15 x 3 x 5

Shapes are compared starting from the trailing (rightmost) dimension. Two dimensions are compatible when they are equal or when one of them is 1, in which case it is "expanded" to match the other. If one array has fewer dimensions, dimensions of size 1 are prepended (added on the left), never appended. For example, the following would NOT work, even though you might think we could add another dimension at the end of B.

A      (3d array):  15 x 3 x 5
B      (2d array):       1 x 3
Result:             error (last dimensions 5 and 3 do not match)
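The rule above is mechanical enough to write down. A sketch of it as a small function (broadcast_shape is a hypothetical helper; NumPy 1.20+ ships np.broadcast_shapes for real use), checked against the examples and against numpy itself:

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Shape produced by broadcasting two shapes; ValueError if incompatible."""
    # Prepend 1s to the shorter shape so both have the same length
    ndim = max(len(shape_a), len(shape_b))
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)
    result = []
    for dim_a, dim_b in zip(a, b):
        # Compatible if equal, or if one side is 1 (that side gets stretched)
        if dim_a == dim_b or dim_b == 1:
            result.append(dim_a)
        elif dim_a == 1:
            result.append(dim_b)
        else:
            raise ValueError(f'incompatible shapes {shape_a} and {shape_b}')
    return tuple(result)

print(broadcast_shape((8, 1, 6, 1), (7, 1, 5)))  # (8, 7, 6, 5)
print(broadcast_shape((15, 3, 5), (3, 1)))        # (15, 3, 5)
# Cross-check against numpy's own broadcasting
print((np.zeros((15, 3, 5)) + np.zeros((3, 1))).shape)  # (15, 3, 5)

try:
    broadcast_shape((15, 3, 5), (1, 3))
except ValueError as err:
    print(err)  # incompatible shapes (15, 3, 5) and (1, 3)
```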

op1 = np.array([i for i in range(9)]).reshape(3, 3)
op2 = np.array([[1, 2, 3]])
op3 = np.array([1, 2, 3])

pp.pprint(op1)
pp.pprint(op2)

# Notice that the result here is DIFFERENT!
print(op2.shape)
pp.pprint(op1 + op2)
pp.pprint(op1 + op2.T)

# Notice that the results here are THE SAME! (.T does nothing to a 1-D array)
print(op3.shape)
pp.pprint(op1 + op3)
pp.pprint(op1 + op3.T)

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
array([[1, 2, 3]])
(1, 3)
array([[ 1,  3,  5],
       [ 4,  6,  8],
       [ 7,  9, 11]])
array([[ 1,  2,  3],
       [ 5,  6,  7],
       [ 9, 10, 11]])
(3,)
array([[ 1,  3,  5],
       [ 4,  6,  8],
       [ 7,  9, 11]])
array([[ 1,  3,  5],
       [ 4,  6,  8],
       [ 7,  9, 11]])

Here, broadcasting won't work for 15 x 3 x 5 with 1 x 3, because dimensions are only prepended.

But it WILL work for 15 x 3 x 5 with 3 x 1.

op1 = np.array([i for i in range(225)]).reshape(15, 3, 5)
op2 = np.array([[1, 2, 3]])

# This does not work
# print(op1 + op2)

# This works
print(op1 + op2.T)

[[[  1   2   3   4   5]
  [  7   8   9  10  11]
  [ 13  14  15  16  17]]

 [[ 16  17  18  19  20]
  [ 22  23  24  25  26]
  [ 28  29  30  31  32]]

 [[ 31  32  33  34  35]
  [ 37  38  39  40  41]
  [ 43  44  45  46  47]]

 [[ 46  47  48  49  50]
  [ 52  53  54  55  56]
  [ 58  59  60  61  62]]

 [[ 61  62  63  64  65]
  [ 67  68  69  70  71]
  [ 73  74  75  76  77]]

 [[ 76  77  78  79  80]
  [ 82  83  84  85  86]
  [ 88  89  90  91  92]]

 [[ 91  92  93  94  95]
  [ 97  98  99 100 101]
  [103 104 105 106 107]]

 [[106 107 108 109 110]
  [112 113 114 115 116]
  [118 119 120 121 122]]

 [[121 122 123 124 125]
  [127 128 129 130 131]
  [133 134 135 136 137]]

 [[136 137 138 139 140]
  [142 143 144 145 146]
  [148 149 150 151 152]]

 [[151 152 153 154 155]
  [157 158 159 160 161]
  [163 164 165 166 167]]

 [[166 167 168 169 170]
  [172 173 174 175 176]
  [178 179 180 181 182]]

 [[181 182 183 184 185]
  [187 188 189 190 191]
  [193 194 195 196 197]]

 [[196 197 198 199 200]
  [202 203 204 205 206]
  [208 209 210 211 212]]

 [[211 212 213 214 215]
  [217 218 219 220 221]
  [223 224 225 226 227]]]

Tile

You can think of broadcasting as tiling the lower-dimensional array to match the size of the "more complex" array (numpy does this implicitly, without actually copying the data).

array = np.array([1, 2, 3])

# np.tile(array, shape)
print(np.tile(array, 2))
print(np.tile(array, (2, 3)))

[1 2 3 1 2 3]
[[1 2 3 1 2 3 1 2 3]
 [1 2 3 1 2 3 1 2 3]]

Observe how the tiled result differs with the transpose. op2 originally has shape 1 x 3, so:

- Tiling it by (1 x 5) repeats the 2nd dimension 5 times, yielding (1 x 15).
- Tiling its transpose (3 x 1) by (1 x 5) also repeats the 2nd dimension 5 times, yielding (3 x 5).

op1 = np.array([i for i in range(225)]).reshape(15, 3, 5)
op2 = np.array([[1, 2, 3]])

op_tiled = np.tile(op2, (1, 5))
print(op_tiled.shape)

op_tiled = np.tile(op2.T, (1, 5))
print(op_tiled.shape)

(1, 15)
(3, 5)

Expand/Squeeze

Add or remove dimensions of size 1. Here we massage op2 (shape=(1, 3)) into shape (15, 3, 5).

op_expanded = np.expand_dims(op2, axis=2)
print(op_expanded.shape)

op_tiled_2 = np.tile(op_expanded, (15, 1, 5))
print(op_tiled_2.shape)

(1, 3, 1)
(15, 3, 5)
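To connect this back to broadcasting: the explicitly tiled array gives exactly the same sum as letting numpy broadcast op2.T, and np.broadcast_to produces the stretched array as a read-only view without copying. A quick check:

```python
import numpy as np

op1 = np.arange(225).reshape(15, 3, 5)
op2 = np.array([[1, 2, 3]])

# Explicit route: expand to (1, 3, 1), then physically copy up to (15, 3, 5)
op_tiled_2 = np.tile(np.expand_dims(op2, axis=2), (15, 1, 5))

# Broadcasting route: op2.T has shape (3, 1) and is stretched implicitly
print(np.array_equal(op1 + op_tiled_2, op1 + op2.T))  # True

# broadcast_to returns a read-only view with the stretched shape
view = np.broadcast_to(op2.T, (15, 3, 5))
print(view.shape, np.array_equal(view, op_tiled_2))   # (15, 3, 5) True
```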

You can get the same effect with np.newaxis (an alias for None).

op3 = np.array([i for i in range(9)]).reshape(3, 3)

op_na = op3[np.newaxis, :]
print(op_na)
print(op_na.shape)

op_na2 = op3[:, np.newaxis, :]
print(op_na2)
print(op_na2.shape)

[[[0 1 2]
  [3 4 5]
  [6 7 8]]]
(1, 3, 3)
[[[0 1 2]]

 [[3 4 5]]

 [[6 7 8]]]
(3, 1, 3)

Squeeze removes size 1 dimensions

print(op_expanded)
print(op_expanded.shape)

op_squeezed = np.squeeze(op_expanded)

print(op_squeezed)

[[[1]
  [2]
  [3]]]
(1, 3, 1)
[1 2 3]
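Without an axis argument, squeeze removes every size-1 dimension, which can silently drop one you meant to keep (for example a batch axis of size 1). Passing axis removes only the one you name:

```python
import numpy as np

x = np.zeros((1, 3, 1))
print(np.squeeze(x).shape)          # (3,)    all size-1 axes removed
print(np.squeeze(x, axis=0).shape)  # (3, 1)  only axis 0 removed

# Pitfall: a "batch" holding one 1-element sample collapses to a 0-d array
single_sample = np.zeros((1, 1))
print(np.squeeze(single_sample).shape)  # ()
```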

Pairwise distance

Here are 3 ways to compute pairwise distances:

- "Naive" method through tile expansion
- Convert the tile/expansion to broadcasting
- SciPy one-liner (scipy.spatial.distance.cdist)

samples = np.random.random((15, 5))
print(samples.shape)
print(samples)

# Without broadcasting
expanded1 = np.expand_dims(samples, axis=1)
tile1 = np.tile(expanded1, (1, samples.shape[0], 1))
#print(expanded1.shape)
#print(tile1.shape)
#print(tile1)

expanded2 = np.expand_dims(samples, axis=0)
tile2 = np.tile(expanded2, (samples.shape[0], 1, 1))
#print(expanded2.shape)
#print(tile2.shape)
#print(tile2)

diff = tile2 - tile1
distances = np.linalg.norm(diff, axis=-1)
# print(distances)
print(np.mean(distances))
##################################


# With broadcasting
diff = samples[:, np.newaxis, :] - samples[np.newaxis, :, :]
distances = np.linalg.norm(diff, axis=-1)
# print(distances)
print(np.mean(distances))


# With scipy
import scipy.spatial
distances = scipy.spatial.distance.cdist(samples, samples)
# print(distances)
# print(len(distances))
print(np.mean(distances))

(15, 5)
[[0.45767142 0.56489308 0.37910783 0.1012638  0.63895657]
 [0.72033823 0.28494664 0.86460006 0.81522924 0.05615894]
 [0.72889278 0.88609119 0.04580975 0.81831563 0.24520082]
 [0.68200685 0.6404537  0.70349505 0.58704715 0.58236006]
 [0.11619128 0.48050658 0.74821419 0.43276056 0.24725844]
 [0.95417451 0.95489342 0.07671449 0.86527711 0.64929007]
 [0.18535464 0.92787863 0.7322276  0.00184351 0.90755884]
 [0.89479318 0.99133381 0.23356447 0.30061149 0.93226858]
 [0.98611507 0.03185917 0.24049277 0.63320623 0.89291318]
 [0.76912372 0.3582217  0.22339368 0.50746419 0.51563737]
 [0.45958078 0.3723447  0.99481086 0.28386613 0.75707502]
 [0.27449411 0.56054339 0.91572132 0.97952258 0.35366246]
 [0.51649077 0.49313818 0.58891696 0.04172703 0.56133593]
 [0.24170496 0.01170604 0.82451557 0.34265237 0.42497829]
 [0.82304983 0.96870729 0.04454417 0.77944192 0.68369793]]
0.8702615298788167
0.8702615298788167
0.8702615298788167
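The broadcasting version builds an (n, n, d) difference array, which gets large quickly. A common alternative (not one of the three above) expands the squared distance as ||x - y||^2 = ||x||^2 + ||y||^2 - 2 x·y, so the biggest temporary is the (n, m) result. A sketch, where pairwise_distances is a hypothetical helper name, checked against the broadcasting result:

```python
import numpy as np

def pairwise_distances(X, Y):
    """Euclidean distances between the rows of X (n x d) and the rows of Y (m x d)."""
    # ||x - y||^2 = ||x||^2 + ||y||^2 - 2 x.y, evaluated for all pairs at once
    sq_x = np.sum(X ** 2, axis=1)[:, np.newaxis]   # shape (n, 1)
    sq_y = np.sum(Y ** 2, axis=1)[np.newaxis, :]   # shape (1, m)
    sq_dist = sq_x + sq_y - 2 * X @ Y.T            # shape (n, m) via broadcasting
    # Round-off can push tiny entries slightly below zero; clip before sqrt
    return np.sqrt(np.maximum(sq_dist, 0.0))

samples = np.random.random((15, 5))
fast = pairwise_distances(samples, samples)

# Reference: the broadcasting version from above
diff = samples[:, np.newaxis, :] - samples[np.newaxis, :, :]
reference = np.linalg.norm(diff, axis=-1)
print(np.allclose(fast, reference, atol=1e-6))  # True
```

The trade-off: the expansion uses much less memory and leans on fast matrix multiplication, but it is slightly less precise for nearly identical points, hence the clipping and the looser tolerance.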

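Vectorization

tqdm is a nice package for you to track progress, or just kill time.

# time.time() gets wall-clock time; time.perf_counter() is a high-resolution
# clock for timing intervals. For CPU time, use time.process_time()
# (time.clock() was removed in Python 3.8).
import time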
from tqdm import tqdm

Dot Product

Numpy is roughly 40 times faster than the loop here (the exact speedup depends on your machine).

a = np.random.random(500000)
b = np.random.random(500000)

p_tic = time.perf_counter()
tic = time.time()

dot = 0.0
for i in tqdm(range(len(a))):
    dot += a[i] * b[i]

print(dot)

toc = time.time()
p_toc = time.perf_counter()

print(f'Result: {dot}')
print(f'Compute time (wall): {round(1000 * (toc - tic), 6)}ms')
print(f'Compute time (cpu) : {round(1000 * (p_toc - p_tic), 6)}ms\n')

#####################################################################

p_tic = time.perf_counter()
tic = time.time()

dot_vec = np.array(a).dot(np.array(b))
print(dot_vec)

toc = time.time()
p_toc = time.perf_counter()

print(f'(vectorized) Result: {dot_vec}')
print(f'(vectorized) Compute time: {round(1000 * (toc - tic), 6)}ms')
print(f'(vectorized) Compute time (cpu) : {round(1000 * (p_toc - p_tic), 6)}ms')

100%|██████████| 500000/500000 [00:00<00:00, 513976.60it/s]

125037.9051522837
Result: 125037.9051522837
Compute time (wall): 985.734701ms
Compute time (cpu) : 985.8469ms

125037.90515228486
(vectorized) Result: 125037.90515228486
(vectorized) Compute time: 23.534298ms
(vectorized) Compute time (cpu) : 23.6601ms

Matrix multiplication (2D)

Numpy is more than TWO THOUSAND times faster than loops here.

Matrix multiplication is an O(n^3) operation if implemented naively.

def matrix_mul(X, Y):
    # allocate the output: one row per row of X, one column per column of Y
    result = np.zeros((len(X), len(Y[0])))
    # iterate through rows of X
    for i in range(len(X)):
        # iterate through columns of Y
        for j in range(len(Y[0])):
            # iterate through rows of Y
            for k in range(len(Y)):
                result[i][j] += X[i][k] * Y[k][j]
    return result

X = np.random.random((200, 200))
Y = np.random.random((200, 200))

result = np.zeros((200, 200))

p_tic = time.perf_counter()
tic = time.time()

# iterate through rows of X
for i in tqdm(range(len(X))):
    # iterate through columns of Y
    for j in range(len(Y[0])):
        # iterate through rows of Y
        for k in range(len(Y)):
            result[i][j] += X[i][k] * Y[k][j]

s = np.sum(result)

toc = time.time()
p_toc = time.perf_counter()

print(f'Result: {s}')
print(f'Compute time (wall): {round(1000 * (toc - tic), 6)}ms')
print(f'Compute time (cpu) : {round(1000 * (p_toc - p_tic), 6)}ms\n')

#####################################################################

p_tic = time.perf_counter()
tic = time.time()

result = np.matmul(X, Y)
s = np.sum(result)

toc = time.time()
p_toc = time.perf_counter()

print(f'(vectorized) Result: {s}')
print(f'(vectorized) Compute time: {round(1000 * (toc - tic), 6)}ms')
print(f'(vectorized) Compute time (cpu) : {round(1000 * (p_toc - p_tic), 6)}ms')

100%|██████████| 200/200 [00:14<00:00, 14.05it/s]

Result: 1999185.355486207
Compute time (wall): 14240.134239ms
Compute time (cpu) : 14240.2964ms

(vectorized) Result: 1999185.355486207
(vectorized) Compute time: 5.803108ms
(vectorized) Compute time (cpu) : 5.93ms

Pairwise distance, again

Again, numpy is much faster: roughly 70 times in the run below (machine-dependent).

samples = np.random.random((100, 5))

p_tic = time.perf_counter()
tic = time.time()

total_dist = []
for s1 in samples:
    for s2 in samples:
        d = np.linalg.norm(s1 - s2)
        total_dist.append(d)
        
avg_dist = np.mean(total_dist)

toc = time.time()
p_toc = time.perf_counter()

print(f'Result: {avg_dist}')
print(f'Compute time (wall): {round(1000 * (toc - tic), 6)}ms')
print(f'Compute time (cpu) : {round(1000 * (p_toc - p_tic), 6)}ms\n')


#####################################################################

p_tic = time.perf_counter()
tic = time.time()

diff = samples[:, np.newaxis, :] - samples[np.newaxis, :, :]
distances = np.linalg.norm(diff, axis=-1)
avg_dist = np.mean(distances)

toc = time.time()
p_toc = time.perf_counter()

print(f'Result: {avg_dist}')
print(f'Compute time (wall): {round(1000 * (toc - tic), 6)}ms')
print(f'Compute time (cpu) : {round(1000 * (p_toc - p_tic), 6)}ms\n')

Result: 0.8657805911523523
Compute time (wall): 172.189951ms
Compute time (cpu) : 172.335ms

Result: 0.8657805911523523
Compute time (wall): 2.529621ms
Compute time (cpu) : 2.6422ms

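You might want to make sure that numpy is linked against an optimized BLAS ("Basic Linear Algebra Subprograms") library such as OpenBLAS or Intel MKL, which speeds up numpy's linear algebra. np.show_config() shows which one your installation uses; the output below comes from an MKL build.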

np.show_config()

mkl_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/home/jingboyang/anaconda3/envs/common/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['/home/jingboyang/anaconda3/envs/common/include']
blas_mkl_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/home/jingboyang/anaconda3/envs/common/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['/home/jingboyang/anaconda3/envs/common/include']
blas_opt_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/home/jingboyang/anaconda3/envs/common/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['/home/jingboyang/anaconda3/envs/common/include']
lapack_mkl_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/home/jingboyang/anaconda3/envs/common/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['/home/jingboyang/anaconda3/envs/common/include']
lapack_opt_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/home/jingboyang/anaconda3/envs/common/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['/home/jingboyang/anaconda3/envs/common/include']

Matplotlib

Simple plotting

We want to plot with proper labels, a series legend, and even markers.

# If you are using a headless environment (very important when running on a server),
# select a non-interactive backend before importing pyplot:
# import matplotlib
# matplotlib.use('Agg')

import matplotlib.pyplot as plt

def draw_simple_sin_cos(x_values):
    
    y1_values = np.sin(x_values * np.pi)
    y2_values = np.cos(x_values * np.pi)

    plt.plot(x_values, y1_values, label='Sine')
    plt.plot(x_values, y2_values, label='Cosine')

    plt.legend()
    plt.xlabel('x')
    plt.ylabel('values')
    plt.title(r'Values for sin and cos, scaled by $\pi$')

x_values = np.arange(0, 20, 0.001)

draw_simple_sin_cos(x_values)
plt.show()

Use figsize (in inches) to set the aspect ratio and dpi to set the pixel density; together they determine the resolution of the image.

plt.figure(figsize=(10,3), dpi=100) # 10 x 3 inches at 100 dpi = 1000 x 300 pixels

draw_simple_sin_cos(x_values)

plt.savefig('tutorial_sin.jpg')
plt.show()
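The pixel size is simply inches times dots per inch, and savefig uses the figure's dpi by default (its own dpi argument can override that). A sketch that computes it, using the non-interactive Agg backend so it runs without a display:

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; no window is opened
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(10, 3), dpi=100)
# Pixels = size in inches * dots per inch
width_px, height_px = fig.get_size_inches() * fig.dpi
print(int(width_px), int(height_px))  # 1000 300
plt.close(fig)
```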

Subplots in a grid can share their x or y axes (limits and ticks) through sharex and sharey, as in the commented-out call below.

def draw_subplot_sin_cos(index, x_values, ax):
    
    y1_values = np.sin(x_values * np.pi)
    y2_values = np.cos(x_values * np.pi)

    ax.plot(x_values, y1_values, c='r', label='Sine')
    ax.scatter(x_values, y2_values, s=4, label='Cosine')

    ax.legend()
    ax.set_xlabel('x')
    ax.set_ylabel('values')
    ax.set_title(f'Values for sin and cos (Subplot #{index})')

fig, ax_list = plt.subplots(nrows=2, ncols=2, figsize=(10, 10))
#fig, ax_list = plt.subplots(nrows=2, ncols=2,
#                            sharex='col', sharey='row',
#                            figsize=(10, 10))

i = 0
for r, row in enumerate(ax_list):
    for c, ax in enumerate(row):
        x_values = np.arange(i, i + 10, 0.1)
        draw_subplot_sin_cos(i, x_values, ax)
        i += 1

plt.show()

Confusion matrix

Here we plot a confusion matrix from scratch. For a pre-built one, see scikit-learn's implementation (e.g. sklearn.metrics.ConfusionMatrixDisplay).

fig, ax = plt.subplots(figsize=(10,10))

color='YlGn'

labels = ['Python', 'C++', 'Fortran']

cm = np.array([[0.7, 0.3, 0.2], [0.1, 0.5, 0.4], [0.05, 0.1, 0.85]])
heatmap = ax.pcolor(cm, cmap=color)
fig.colorbar(heatmap)
ax.invert_yaxis()
ax.xaxis.tick_top()

ax.set_title('Confusion Matrix')
ax.set_xlabel('Prediction')
ax.set_ylabel('Ground Truth')

ax.set_xticks(np.arange(cm.shape[1]) + 0.5, minor=False)  # one tick per column (prediction)
ax.set_yticks(np.arange(cm.shape[0]) + 0.5, minor=False)  # one tick per row (ground truth)
ax.set_xticklabels(labels)
ax.set_yticklabels(labels)

plt.show()
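The cm above is hard-coded. A minimal sketch of computing one from label arrays with numpy, using the same convention as sklearn.metrics.confusion_matrix (rows are ground truth, columns are predictions); the helper name and labels are made up:

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes):
    # Rows are ground-truth classes, columns are predicted classes
    cm = np.zeros((num_classes, num_classes), dtype=int)
    # np.add.at accumulates correctly even when an index pair repeats,
    # unlike cm[y_true, y_pred] += 1, which counts each repeated pair only once
    np.add.at(cm, (y_true, y_pred), 1)
    return cm

y_true = np.array([0, 0, 1, 2, 2, 2])
y_pred = np.array([0, 1, 1, 2, 0, 2])
cm = confusion_matrix(y_true, y_pred, num_classes=3)
print(cm)
# [[1 1 0]
#  [0 1 0]
#  [1 0 2]]

# Row-normalize so each row shows how one true class was predicted
cm_normalized = cm / cm.sum(axis=1, keepdims=True)
print(cm_normalized.sum(axis=1))  # [1. 1. 1.]
```

cm_normalized can then be passed to ax.pcolor exactly like the hard-coded matrix.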

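Show image

When showing images, remember to tell matplotlib the range of pixel values through vmin and vmax. Typically pixel values are either in 0-1 or 0-255. (For RGB images, imshow ignores vmin/vmax and expects floats in 0-1 or integers in 0-255.)

img_arr = np.random.random((256, 256))  # values in [0, 1)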
print(img_arr.shape)

plt.imshow(img_arr, cmap='gray', vmin=0.2, vmax=0.25)
plt.show()

(256, 256)

By default, matplotlib's imshow expects channels last, i.e. an array of shape (rows, columns, RGB).

img_arr = np.random.random((256, 256, 3))  # rows, columns, (RGB)
print(img_arr.shape)

plt.imshow(img_arr, vmin=0, vmax=1)
plt.show()

(256, 256, 3)

If your array is channel-first, remember to move the channel axis to the end (e.g. with np.moveaxis) before plotting with imshow.

img_arr = np.random.random((3, 256, 256))  # (RGB), rows, columns
print(img_arr.shape)

img_arr = np.moveaxis(img_arr, 0, -1)
print(img_arr.shape)

plt.imshow(img_arr, vmin=0, vmax=1)
plt.show()

(3, 256, 256)
(256, 256, 3)

import imageio  # in newer imageio releases, imageio.v2.imread keeps this behavior without a deprecation warning

fname = 'sample.jpg'
img = imageio.imread(fname)

pp.pprint(img)
print(img.shape)

Array([[[255, 255, 255],
        [255, 255, 255],
        [255, 255, 255],
        ...,
        [255, 255, 255],
        [255, 255, 255],
        [255, 255, 255]],

       [[255, 255, 255],
        [255, 255, 255],
        [255, 255, 255],
        ...,
        [255, 255, 255],
        [255, 255, 255],
        [255, 255, 255]],

       [[255, 255, 255],
        [255, 255, 255],
        [255, 255, 255],
        ...,
        [255, 255, 255],
        [255, 255, 255],
        [255, 255, 255]],

       ...,

       [[255, 255, 255],
        [255, 255, 255],
        [255, 255, 255],
        ...,
        [255, 255, 255],
        [255, 255, 255],
        [255, 255, 255]],

       [[255, 255, 255],
        [255, 255, 255],
        [255, 255, 255],
        ...,
        [255, 255, 255],
        [255, 255, 255],
        [255, 255, 255]],

       [[255, 255, 255],
        [255, 255, 255],
        [255, 255, 255],
        ...,
        [255, 255, 255],
        [255, 255, 255],
        [255, 255, 255]]], dtype=uint8)
(211, 862, 3)

plt.figure(dpi=250)   # a higher dpi (e.g. 500) renders the image larger
plt.imshow(img, interpolation='bilinear')  # vmin/vmax are ignored for RGB images, so they are omitted
plt.show()

Pandas

Pandas is a great data-processing library for table/database-like data. It is excellent for data that comes in, or that you wish to output, as CSV/Excel. Part of the following content is inspired by a Pandas tutorial online.

File operations

import pandas as pd

data = pd.read_csv('train.csv')

data_short = data[:20]

data_short
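Slicing a DataFrame with [:n] selects the first n rows by position, which matches data.head(n). Since train.csv may not be at hand, this sketch reads a small hypothetical CSV from memory with io.StringIO; the column names mirror train.csv, but the values are made up for illustration:

```python
import io

import pandas as pd

# Hypothetical in-memory CSV with the same column names as train.csv
csv_text = """x_1,x_2,x_3,x_4,y
1.0,0.0,2.97,0.65,10
0.0,1.0,1.41,0.74,12
0.0,1.0,1.04,1.29,7
"""
data_demo = pd.read_csv(io.StringIO(csv_text))

# [:2] takes the first two rows by position, the same as head(2)
first_two = data_demo[:2]
print(first_two.equals(data_demo.head(2)))  # True
print(data_demo.shape)                      # (3, 5)
```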

You can get basic statistics with very little effort:

print(data['x_1'].describe())     # For one column

data.describe()                   # For the entire dataframe

count    2500.000000
mean        0.286400
std         0.452169
min         0.000000
25%         0.000000
50%         0.000000
75%         1.000000
max         1.000000
Name: x_1, dtype: float64
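Each row of the describe() output can be reproduced with an individual Series method; note that std is the sample standard deviation (ddof=1). A small check on a hypothetical four-element column:

```python
import pandas as pd

s = pd.Series([0.0, 0.0, 1.0, 1.0], name='x_1')
summary = s.describe()

print(summary['count'], summary['mean'])  # 4.0 0.5
# describe() uses the sample standard deviation, the same as s.std(ddof=1)
print(abs(summary['std'] - s.std(ddof=1)) < 1e-12)  # True
# The 50% row is the median
print(summary['50%'] == s.median())       # True
```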

See the Pandas documentation for the parameters of to_csv. Here, index=False keeps the row index from being written as an extra column.

data_short.to_csv('data_short.csv', index=False)
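To see what index=False changes without writing a file, you can omit the path: to_csv then returns the CSV text as a string. A sketch on a hypothetical two-row frame, including a round trip back through read_csv:

```python
import io

import pandas as pd

df = pd.DataFrame({'x_1': [1.0, 0.0], 'y': [10, 12]})

# With no path given, to_csv returns the CSV text instead of writing a file
with_index = df.to_csv()
without_index = df.to_csv(index=False)
print(with_index)     # first column is the (unnamed) row index
print(without_index)  # only the x_1 and y columns

# Reading back the index=False output recovers the same frame
restored = pd.read_csv(io.StringIO(without_index))
print(restored.equals(df))  # True
```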

Manipulations¶

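Columns can be selected by name, and rows can be filtered by value. When combining conditions, use the element-wise operators & and | rather than and/or, and wrap every condition in parentheses: & and | bind more tightly than comparison operators such as > and <, so leaving out the parentheses changes how the expression is parsed.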

data_short[['x_1', 'y']]

data_short[(data_short['y'] > 5) & (data_short['x_3'] < 1.5)]       # Use & and | instead of and/or; parenthesize each condition
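A self-contained sketch of the same filtering rules on a small hypothetical frame: a parenthesized & mask, negation with ~, what happens if you use Python's and, and DataFrame.query as an alternative in which and/or are allowed inside the string expression:

```python
import pandas as pd

df = pd.DataFrame({'x_3': [1.2, 2.0, 0.5, 1.4],
                   'y':   [7, 12, 4, 9]})

# Parenthesize each comparison: & and | bind more tightly than > and <
mask = (df['y'] > 5) & (df['x_3'] < 1.5)
filtered = df[mask]
print(filtered.index.tolist())     # [0, 3]

# ~ negates a boolean mask element-wise
print(df[~mask].index.tolist())    # [1, 2]

# Python's `and` needs a single True/False, and pandas refuses to reduce a Series to one
try:
    df[(df['y'] > 5) and (df['x_3'] < 1.5)]
    and_failed = False
except ValueError:
    and_failed = True
print(and_failed)                  # True

# DataFrame.query parses a string expression in which `and`/`or` are allowed
filtered_q = df.query('y > 5 and x_3 < 1.5')
print(filtered.equals(filtered_q))  # True
```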

You can add a new column by applying a function to each row with apply(..., axis=1). The same pattern can be used to apply a trained model and store its predictions as a new column.

def filter_func(row):
    
    if row['x_1'] == 1.0 and row['x_2'] == 0.0:
        return row['y'] * 10
    
    return -1

data_short['new_column'] = data_short[['x_1', 'x_2', 'y']].apply(filter_func, axis=1)

data_short

/home/jingboyang/anaconda3/envs/common/lib/python3.7/site-packages/ipykernel_launcher.py:8: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
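The warning appears because data_short was created as a slice of data, so pandas cannot tell whether the new column should also affect data. A sketch of two common remedies on a small hypothetical frame: take an explicit .copy() of the slice, and (optionally) replace the row-wise apply with a vectorized np.where, which is usually faster on large frames. The condition mirrors filter_func above:

```python
import numpy as np
import pandas as pd

data_demo = pd.DataFrame({'x_1': [1.0, 0.0, 1.0],
                          'x_2': [0.0, 1.0, 0.0],
                          'y':   [10, 12, 15]})

# An explicit copy makes the subset independent of data_demo,
# so adding a column to it does not raise SettingWithCopyWarning
subset = data_demo[:2].copy()

# Vectorized equivalent of filter_func: y * 10 where x_1 == 1 and x_2 == 0, else -1
cond = (subset['x_1'] == 1.0) & (subset['x_2'] == 0.0)
subset['new_column'] = np.where(cond, subset['y'] * 10, -1)

print(subset['new_column'].tolist())      # [100, -1]
print('new_column' in data_demo.columns)  # False: the original is untouched
```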

Rows can be iterated with iterrows(), which yields (index, row) pairs; each row is a Series that you can index like a dictionary. You can also add a column directly from a list of values, as long as its length matches the number of rows.

col2 = []
for i, row in data_short.iterrows():
    print(f'Row {i}: y-value: {row["y"]}')
    col2.append(row['y'] ** 2)

data_short['col_2'] = col2
data_short

Row 0: y-value: 10.0
Row 1: y-value: 12.0
Row 2: y-value: 7.0
Row 3: y-value: 15.0
Row 4: y-value: 9.0
Row 5: y-value: 8.0
Row 6: y-value: 9.0
Row 7: y-value: 8.0
Row 8: y-value: 4.0
Row 9: y-value: 16.0
Row 10: y-value: 14.0
Row 11: y-value: 7.0
Row 12: y-value: 8.0
Row 13: y-value: 10.0
Row 14: y-value: 4.0
Row 15: y-value: 19.0
Row 16: y-value: 15.0
Row 17: y-value: 21.0
Row 18: y-value: 5.0
Row 19: y-value: 11.0

/home/jingboyang/anaconda3/envs/common/lib/python3.7/site-packages/ipykernel_launcher.py:6: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
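Because iterrows builds a Series for every row, values are upcast to a common dtype across columns (which is why y prints as 10.0 above), and the loop is slow on large frames. A sketch of two alternatives on a hypothetical one-column frame: a vectorized column operation, and itertuples, which yields namedtuples:

```python
import pandas as pd

df = pd.DataFrame({'y': [10, 12, 7]})

# Vectorized: square the whole column at once, no Python loop needed
df['col_2'] = df['y'] ** 2

# itertuples yields namedtuples; the index is available as row.Index
ys = []
for row in df.itertuples():
    print(f'Row {row.Index}: y-value: {row.y}')
    ys.append(row.y)
```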

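loc selects by index label, while iloc selects by integer position. For example, iloc[-1] returns the last row, but loc[-1] raises a KeyError because -1 is not a label. This dataset is not a great example, since with the default integer index the labels and positions coincide: loc[19] and iloc[19] return the same row.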

print(data.loc[19])
print(data.iloc[-1])

x_1     1.000000
x_2     0.000000
x_3     1.275172
x_4     0.756409
y      11.000000
Name: 19, dtype: float64
x_1     0.000000
x_2     1.000000
x_3     1.825617
x_4     0.059309
y      11.000000
Name: 2499, dtype: float64
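The difference is easier to see when the labels are not 0, 1, 2, .... A sketch with a hypothetical frame whose index labels are 10, 20, 30:

```python
import pandas as pd

# Index labels that differ from positions
df = pd.DataFrame({'y': [5, 6, 7]}, index=[10, 20, 30])

print(df.loc[20, 'y'])    # 6  (label 20)
print(df.iloc[1]['y'])    # 6  (position 1)
print(df.iloc[-1]['y'])   # 7  (last row by position)

# -1 is not one of the labels, so loc raises KeyError
try:
    df.loc[-1]
    loc_failed = False
except KeyError:
    loc_failed = True
print(loc_failed)         # True
```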

Create dataframe¶

You can create a DataFrame from a list of dictionaries, one dictionary per row. Keys that are missing from some rows are filled with NaN.

data_list = [{'a': i, 'b': i + 1} for i in range(15)]
data_list[5] = {'a': 10, 'b': 9, 'c': -1}

df = pd.DataFrame(data_list)
df
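You can check the NaN filling directly. Since only row 5 has a 'c' key, the other 14 rows get NaN, and the column is stored as float to accommodate it. fillna can replace the gaps if a default value makes sense for your data (0 here is an arbitrary choice for illustration):

```python
import pandas as pd

data_list = [{'a': i, 'b': i + 1} for i in range(15)]
data_list[5] = {'a': 10, 'b': 9, 'c': -1}
df = pd.DataFrame(data_list)

# Only row 5 has 'c'; every other row gets NaN, so 'c' is float
print(df['c'].isna().sum())   # 14
print(df['c'].dtype)          # float64

# Fill the missing values in column 'c' with a chosen default
df_filled = df.fillna({'c': 0})
print(df_filled['c'].isna().sum())  # 0
```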

A DataFrame can also be created from a 2D array. Naming the rows and columns is good practice.

data_2d = np.arange(50).reshape(5, 10)

df = pd.DataFrame(data_2d, columns=[f'col {i}' for i in range(10)], index=[f'row {i}' for i in range(5)])
df
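One payoff of naming rows and columns is that cells can be addressed by label instead of by position. A short check using the same construction as above:

```python
import numpy as np
import pandas as pd

data_2d = np.arange(50).reshape(5, 10)
df = pd.DataFrame(data_2d,
                  columns=[f'col {i}' for i in range(10)],
                  index=[f'row {i}' for i in range(5)])

# The same cell, by label and by position
print(df.loc['row 2', 'col 3'])   # 23
print(df.iloc[2, 3])              # 23

# The underlying values are unchanged
print(np.array_equal(df.to_numpy(), data_2d))  # True
```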

Similarly, you can create a DataFrame directly from a dictionary. The orient parameter controls whether the dictionary keys become column labels (orient='columns', the default) or row labels (orient='index').

data_dict = {'col 1': [3, 2, 1, 0],
             'col 2': ['a', 'b', 'c', 'd']}

df = pd.DataFrame.from_dict(data_dict)
df

df = pd.DataFrame.from_dict(data_dict, orient='index')
df
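The two orientations produce transposed shapes. With orient='index', each list becomes a row and the columns are numbered 0, 1, 2, ...:

```python
import pandas as pd

data_dict = {'col 1': [3, 2, 1, 0],
             'col 2': ['a', 'b', 'c', 'd']}

df_cols = pd.DataFrame.from_dict(data_dict)                  # keys -> columns
df_rows = pd.DataFrame.from_dict(data_dict, orient='index')  # keys -> rows

print(df_cols.shape)             # (4, 2)
print(df_rows.shape)             # (2, 4)
print(df_rows.loc['col 2', 3])   # d
```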

Simple plotting¶

Pandas also supports plotting. It draws with Matplotlib under the hood, so the figures share Matplotlib's style. Pandas plotting is a quick way to visualize data, but for more formal plots that need more flexibility you may still need to use Matplotlib directly.

data.plot(kind='scatter', x='x_3', y='y', title='Plot of Data');

data['y'].plot(kind='hist', title='Y');

data.boxplot(column='x_3', by='y');
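A middle ground between the two: DataFrame.plot returns a Matplotlib Axes, so you can start with a quick pandas plot and then refine it with ordinary Matplotlib calls. A sketch on a hypothetical three-point frame; the axis label text is an arbitrary example:

```python
import pandas as pd

df = pd.DataFrame({'x_3': [0.5, 1.5, 2.5], 'y': [4, 9, 15]})

# DataFrame.plot returns a Matplotlib Axes object
ax = df.plot(kind='scatter', x='x_3', y='y', title='Plot of Data')

# Regular Matplotlib Axes methods work on it
ax.set_xlabel('x_3 (feature)')
ax.grid(True)
```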

data.to_numpy()

array([[ 1.        ,  0.        ,  2.97614241,  0.65148205, 10.        ],
       [ 0.        ,  1.        ,  1.4113903 ,  0.74373156, 12.        ],
       [ 0.        ,  1.        ,  1.03989184,  1.2905879 ,  7.        ],
       ...,
       [ 0.        ,  1.        ,  1.49124324,  0.84115559,  7.        ],
       [ 0.        ,  1.        ,  2.8631773 ,  1.13793409, 12.        ],
       [ 0.        ,  1.        ,  1.82561719,  0.05930945, 11.        ]])
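As the output above shows, to_numpy upcasts mixed integer and float columns to a common dtype (y appears as 10., 12., ...). A typical pattern before training a model is to extract the features and labels separately; a sketch on a hypothetical two-row frame:

```python
import pandas as pd

df = pd.DataFrame({'x_1': [1.0, 0.0],
                   'x_3': [2.98, 1.41],
                   'y':   [10, 12]})

# Mixed int/float columns are upcast to a common dtype (float64 here)
all_values = df.to_numpy()

# Feature matrix X and label vector y
X = df[['x_1', 'x_3']].to_numpy()
y = df['y'].to_numpy()

print(all_values.dtype)   # float64
print(X.shape, y.shape)   # (2, 2) (2,)
```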

plt.scatter(data['x_1'], data['y'])

<matplotlib.collections.PathCollection at 0x7f36ce2ccfd0>

	x_1	x_2	x_3	x_4	y
0	1.0	0.0	2.976142	0.651482	10
1	0.0	1.0	1.411390	0.743732	12
2	0.0	1.0	1.039892	1.290588	7
3	1.0	0.0	2.338679	0.973942	15
4	0.0	1.0	2.385257	0.297921	9
5	0.0	1.0	2.912910	0.244489	8
6	1.0	0.0	2.585491	0.133044	9
7	1.0	0.0	2.961107	0.338565	8
8	0.0	1.0	0.161944	0.481609	4
9	0.0	1.0	2.512621	1.118481	16
10	1.0	0.0	2.711287	0.463432	14
11	1.0	0.0	1.479011	0.860247	7
12	1.0	0.0	0.223923	1.030258	8
13	1.0	0.0	2.918245	0.409249	10
14	0.0	1.0	1.447071	0.061543	4
15	1.0	0.0	2.269534	1.754568	19
16	0.0	1.0	2.804809	1.114212	15
17	0.0	1.0	2.539715	1.850662	21
18	0.0	1.0	1.300125	1.178924	5
19	1.0	0.0	1.275172	0.756409	11

	x_1	x_2	x_3	x_4	y
count	2500.000000	2500.000000	2500.000000	2500.000000	2500.000000
mean	0.286400	0.713600	1.506160	0.974763	9.764000
std	0.452169	0.452169	0.862699	0.577296	4.559893
min	0.000000	0.000000	0.004021	0.001589	1.000000
25%	0.000000	0.000000	0.753142	0.481975	6.000000
50%	0.000000	1.000000	1.488827	0.969344	9.000000
75%	1.000000	1.000000	2.262212	1.473104	13.000000
max	1.000000	1.000000	2.998699	1.997793	29.000000

	x_1	y
0	1.0	10
1	0.0	12
2	0.0	7
3	1.0	15
4	0.0	9
5	0.0	8
6	1.0	9
7	1.0	8
8	0.0	4
9	0.0	16
10	1.0	14
11	1.0	7
12	1.0	8
13	1.0	10
14	0.0	4
15	1.0	19
16	0.0	15
17	0.0	21
18	0.0	5
19	1.0	11

	x_1	x_2	x_3	x_4	y	new_column
0	1.0	0.0	2.976142	0.651482	10	100.0
1	0.0	1.0	1.411390	0.743732	12	-1.0
2	0.0	1.0	1.039892	1.290588	7	-1.0
3	1.0	0.0	2.338679	0.973942	15	150.0
4	0.0	1.0	2.385257	0.297921	9	-1.0
5	0.0	1.0	2.912910	0.244489	8	-1.0
6	1.0	0.0	2.585491	0.133044	9	90.0
7	1.0	0.0	2.961107	0.338565	8	80.0
8	0.0	1.0	0.161944	0.481609	4	-1.0
9	0.0	1.0	2.512621	1.118481	16	-1.0
10	1.0	0.0	2.711287	0.463432	14	140.0
11	1.0	0.0	1.479011	0.860247	7	70.0
12	1.0	0.0	0.223923	1.030258	8	80.0
13	1.0	0.0	2.918245	0.409249	10	100.0
14	0.0	1.0	1.447071	0.061543	4	-1.0
15	1.0	0.0	2.269534	1.754568	19	190.0
16	0.0	1.0	2.804809	1.114212	15	-1.0
17	0.0	1.0	2.539715	1.850662	21	-1.0
18	0.0	1.0	1.300125	1.178924	5	-1.0
19	1.0	0.0	1.275172	0.756409	11	110.0

	col 0	col 1	col 2	col 3	col 4	col 5	col 6	col 7	col 8	col 9
row 0	0	1	2	3	4	5	6	7	8	9
row 1	10	11	12	13	14	15	16	17	18	19
row 2	20	21	22	23	24	25	26	27	28	29
row 3	30	31	32	33	34	35	36	37	38	39
row 4	40	41	42	43	44	45	46	47	48	49

	a	b	c
0	0	1	NaN
1	1	2	NaN
2	2	3	NaN
3	3	4	NaN
4	4	5	NaN
5	10	9	-1.0
6	6	7	NaN
7	7	8	NaN
8	8	9	NaN
9	9	10	NaN
10	10	11	NaN
11	11	12	NaN
12	12	13	NaN
13	13	14	NaN
14	14	15	NaN