Some of the content is inspired by the CS231N Python tutorial notebook.
In this tutorial we cover Python basics, NumPy, Matplotlib, and Pandas.
code = 230
if code == 229:
    print('Hello CS229!')
elif code == 230:
    print('That\'s deep learning!')
elif code < 200:
    print('That is some undergraduate class')
else:
    print('Wrong class!')
Python doesn't have "switch" statement.
Logical operators
true = True
false = False
if true:
    print("It's true!")
if not false:
    print("It's still true!")
if true and not false:
    print("Anyhow, it's true!")
if false or not true:
    print("True?")
else:
    print("Okay, it's false now....")
&, | and ~ are all bitwise operators.
Arithmetic operators.
print(5 / 2) # floating number division
print(5 % 2) # remainder
print(5 ** 2) # exponentiation
print(5 // 2) # integer division
^ means bitwise XOR in Python.
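For example:
print(5 ^ 2)   # 7: 0b101 XOR 0b010 = 0b111
print(5 ** 2)  # 25: exponentiation uses **, not ^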
We typically use range and enumerate for iteration. You can loop over any iterable.
for i in range(5):
    print(i)

a = 5
while a > 0:
    print(a)
    a -= 1
Python doesn't have increment/decrement operators like "a++" or "a--"; use augmented assignment instead, as shown below.
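A tiny illustrative snippet:
a = 5
a += 1  # increment
a -= 2  # decrement
print(a)  # 4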
Python functions can take default arguments; they have to come after the non-default arguments. Be VERY careful, because forgetting that you have a default argument can make debugging much harder.
def power(v, p=2):
    return v ** p  # How to return multiple values?
print(power(10))
print(power(10, 3))
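To answer the comment above: return a tuple and unpack it. A minimal sketch (power_and_root is a made-up helper):
def power_and_root(v, p=2):
    return v ** p, v ** (1 / p)  # returning two values packs them into a tuple
squared, root = power_and_root(10)
print(squared, root)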
Functions can accept extra positional (*args) and keyword (**kwargs) arguments. You can pass them on to another function, or use them directly.
def func2(*args, **kwargs):
    print(args)
    print(kwargs)

def func1(v, *args, **kwargs):
    func2(*args, **kwargs)
    if 'power' in kwargs:
        return v ** kwargs['power']
    else:
        return v
print(func1(10, 'extra 1', 'extra 2', power=3))
print('--------------')
print(func1(10, 5))
cs_class_code = 'CS-229'
print('I like ' + str(cs_class_code) + ' a lot!')
print(f'I like {cs_class_code} a lot!')
print('I love CS229. (upper)'.upper())
print('I love CS229. (rjust 50)'.rjust(50))
print('we love CS229. (capitalize)'.capitalize())
print(' I love CS229. (strip) '.strip())
"f"-string (f for formatting?) is new since Python 3.6. Embed values using { }
print(f'{print} (print a function)')
print(f'{type(229)} (print a type)')
For reference, here is how people used to do things, and what you can use when you want more control.
print('Old school formatting: {2}, {1}, {0:10.2F}'.format(1.358, 'b', 'c'))
# Fill in order of 2, 1, 0. For the decimal number, fix at length of 10, round to 2 decimal places
list_1 = ['one', 'two', 'three']
list_2 = [1, 2, 3]
list_2.append(4)
list_2.insert(0, 'ZERO')
Concatenating lists is just addition; extend() appends another list in place.
print(list_1 + list_2)
list_1_temp = ['a', 'b']
list_1_temp.extend(list_2)
print(list_1_temp)
But be VERY careful when you multiply a list; we will explain why later.
print(list_1 * 3 + list_2)
print([list_1] * 3 + list_2)
pprint is your friend
import pprint as pp
pp.pprint([list_1] * 5 + list_2)
pp.pprint([list_1] * 2 + [list_2] * 3)
List comprehension can save a lot of lines
long_list = [i for i in range(9)]
long_long_list = [(i, j) for i in range(3) for j in range(5)]
long_list_list = [[i for i in range(3)] for _ in range(5)]
pp.pprint(long_list)
pp.pprint(long_long_list)
pp.pprint(long_list_list)
Lists are iterable!
string_list = ['a', 'b', 'c']
for s in string_list:
    print(s)

for i, s in enumerate(string_list):
    print(f'{i}, {s}')
Slicing. With numpy arrays (covered later), you can do this with multi-dimensional ones as well.
print(long_list[:5])
print(long_list[:-1])
print(long_list[4:-1])
long_list[3:5] = [-1, -2]
print(long_list)
long_list.pop()
print(long_list)
Sorting a list (but remember that sorting can be costly). Documentation for sorting is here
random_list = [3, 12, 5, 6, 8, 2]
print(sorted(random_list))
random_list_2 = [(3, 'z'), (12, 'r'), (5, 'a'), (6, 'e'), (8, 'c'), (2, 'g')]
print(sorted(random_list_2, key=lambda x: x[1]))
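You can also sort in place with list.sort(), optionally in reverse order (an illustrative follow-up using the lists above):
random_list.sort(reverse=True)  # sorts the list in place, largest first
print(random_list)
print(sorted(random_list_2, key=lambda x: x[0], reverse=True))  # sorted() returns a new list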
Think first before copying: assignment copies by reference, not by value. More about copying here.
orig_list = [[1, 2], [3, 4]]
dup_list = orig_list
dup_list[0][1] = 'okay'
pp.pprint(orig_list)
pp.pprint(dup_list)
a = [[1, 2, 3]]*3
b = [[1, 2, 3] for i in range(3)]
a[0][1] = 4
b[0][1] = 4
print(a)
print(b)
import copy
orig_list = [[1, 2], [3, 4]]
dup_list = copy.deepcopy(orig_list)
dup_list[0][1] = 'okay'
pp.pprint(orig_list)
pp.pprint(dup_list)
Tuples are lists that you cannot edit (they are immutable).
my_tuple = (10, 20, 30)
my_tuple[0] = 40  # Raises TypeError: tuples do not support item assignment
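If you want the cell to keep running past the error, you can catch it (a minimal sketch):
try:
    my_tuple[0] = 40
except TypeError as e:
    print(f'Tuples are immutable: {e}')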
Unpacking assignment makes your code shorter (it also works for lists).
a, b, c = my_tuple
print(f"a={a}, b={b}, c={c}")
for obj in enumerate(my_tuple):
    print(obj)
my_set = {i ** 2 % 3 for i in range(10)}
my_dict = {(5 - i): i ** 2 for i in range(10)}
print(my_set)
print(my_dict)
print(my_dict.keys())
Updating and/or adding content to a dictionary
second_dict = {'a': 10, 'b': 11}
my_dict.update(second_dict)
pp.pprint(my_dict)
my_dict['new'] = 10
pp.pprint(my_dict)
Here is how to iterate through a dictionary. Remember that a dictionary is NOT sorted by key.
for k, it in my_dict.items():  # similar to for loop over enumerate(list)
    print(k, it)

# Sorting keys by string order
for k, it in sorted(my_dict.items(), key=lambda x: str(x[0])):
    print(k, it)
For defaultdict and sorted dictionary, see the collections documentation
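For example, defaultdict saves you from checking whether a key already exists (a small illustrative sketch):
from collections import defaultdict
letter_counts = defaultdict(int)  # missing keys default to int(), i.e. 0
for letter in 'deep learning':
    letter_counts[letter] += 1
print(dict(letter_counts))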
import numpy as np
Initialize from an existing list. If the element types are not consistent, numpy will give you a surprising result (everything is cast to a common type).
from_list = np.array([1, 2, 3])
from_list_2d = np.array([[1, 2, 3.0], [4, 5, 6]])
from_list_bad_type = np.array([1, 2, 3, 'a'])
pp.pprint(from_list)
print(f'\t Data type of integer is {from_list.dtype}')
pp.pprint(from_list_2d)
print(f'\t Data type of float is {from_list_2d.dtype}')
pp.pprint(from_list_bad_type)
Initialize with ones, zeros, or as identity matrix
print(np.ones(3))
print(np.ones((3, 3)))
print(np.zeros(3))
print(np.zeros((3, 3)))
print(np.eye(3))
Sampling from the uniform distribution on $[0, 1)$.
print(np.random.random(3))
print(np.random.random((2, 2)))
Sampling from the standard normal distribution.
print(np.random.randn(3, 3))
Numpy has built-in samplers of a lot of other common (and some not so common) distributions.
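For instance (the parameter values here are arbitrary):
print(np.random.normal(loc=1.0, scale=2.0, size=3))  # Gaussian with mean 1, std 2
print(np.random.binomial(n=10, p=0.5, size=3))       # Binomial
print(np.random.choice(['a', 'b', 'c'], size=5))     # Sampling from a discrete set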
Shape/reshape and multi-dimensional arrays
array_1d = np.array([1, 2, 3, 4])
array_1by4 = np.array([[1, 2, 3, 4]])
array_2by4 = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
print(array_1d.shape)
print(array_1by4.shape)
print(array_1d.reshape(-1, 4).shape)
print(array_2by4.size)
large_array = np.array([i for i in range(400)])
large_array = large_array.reshape((20, 20))
print(large_array[:, 5])
large_3d_array = np.array([i for i in range(1000)])
large_3d_array = large_3d_array.reshape((10, 10, 10))
print(large_3d_array[:, 1, 1])
print(large_3d_array[2, :, 1])
print(large_3d_array[2, 3, :])
print(large_3d_array[1, :, :])
Think about the order you need before using reshape.
small_array = np.arange(4)
print(np.reshape(small_array, (2, 2), order='C')) # Default order
print(np.reshape(small_array, (2, 2), order='F'))
Arithmetic on numpy arrays is element-wise. This also works for sin, cos, tanh, etc.
array_1 = np.array([1, 2, 3, 4])
print(array_1 + 5)
print(array_1 * 5)
print(np.sqrt(array_1))
print(np.power(array_1, 2))
print(np.exp(array_1))
print(np.log(array_1))
For sum, mean, std, var, etc., you can perform the operation along a chosen axis.
array_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
pp.pprint(array_2d)
print(f'shape={array_2d.shape}')
print(np.sum(array_2d))
print(np.sum(array_2d, axis=0))
print(np.sum(array_2d, axis=1))
array_3d = np.array([i for i in range(8)]).reshape((2, 2, 2))
pp.pprint(array_3d)
print(np.sum(array_3d, axis=0))
print(np.sum(array_3d, axis=1))
print(np.sum(array_3d, axis=(1, 2)))
Numpy tends to do things element-wise. But be VERY CAREFUL when dimensions don't match; we will cover this in broadcasting. Actually, just be careful with the dimensions of arrays in general.
array_1 = np.array([1, 2, 3, 4])
array_2 = np.array([3, 4, 5, 6])
print(array_1 * array_2)
print(array_1 * array_2.reshape(4, -1)) # Come back to this later
Dot product can be written in 3 ways
print(array_1 @ array_2)
print(array_1.dot(array_2))
print(np.dot(array_1, array_2))
print(array_1.shape)
Here, dot fails when the dimensions don't line up, even though the element-wise operations just now did not complain. Check the shapes!
array_1 = np.array([[1, 2, 3, 4]])
array_2 = np.array([[3, 4, 5, 6]])
print(array_1.shape)
print(array_1 * array_2)
print(array_1.dot(array_2))  # Raises ValueError: shapes (1, 4) and (1, 4) not aligned
With proper handling of shapes, things work. For 2-D arrays, dot is just matrix multiplication. You might just want to write matmul to keep things consistent and be SURE that you have the correct shapes.
# T for transpose
print(array_1.dot(array_2.T))
print(array_1.T.dot(array_2))
print(np.matmul(array_1, array_2.T))
print(np.matmul(array_1.T, array_2))
weight_matrix = np.array([1, 2, 3, 4]).reshape(2, 2)
sample = np.array([[50, 60]]).T
np.matmul(weight_matrix, sample)
And of course, we typically use matmul for 2D matrix multiplication. For more than two dimensions, Numpy treats the inputs as stacks of matrices. See the matmul documentation.
mat1 = np.array([[1, 2], [3, 4]])
mat2 = np.array([[5, 6], [7, 8]])
print(np.matmul(mat1, mat2))
Notice that np.multiply is element-wise multiplication, NOT proper matrix multiplication.
a = np.array([i for i in range(10)]).reshape(2, 5)
print(a * a)
print(np.multiply(a, a))
print(np.multiply(a, 10))
Numpy can perform operations on arrays of different shapes, inferring/expanding dimensions as needed. Taking examples from the NumPy broadcasting documentation, some valid combinations are:
A (4d array): 8 x 1 x 6 x 1
B (3d array): 7 x 1 x 5
Result (4d array): 8 x 7 x 6 x 5
A (2d array): 5 x 4
B (1d array): 1
Result (2d array): 5 x 4
A (2d array): 5 x 4
B (1d array): 4
Result (2d array): 5 x 4
A (3d array): 15 x 3 x 5
B (3d array): 15 x 1 x 5
Result (3d array): 15 x 3 x 5
A (3d array): 15 x 3 x 5
B (2d array): 3 x 5
Result (3d array): 15 x 3 x 5
A (3d array): 15 x 3 x 5
B (2d array): 3 x 1
Result (3d array): 15 x 3 x 5
Essentially, all dimensions of size 1 can be "overlooked" or "expanded" to match the corresponding dimension of the other operand, but the alignment must match: dimensions of size 1 are only prepended, not appended. For example, the following would not work, though you might think we could add another dimension at the end of B.
A (3d array): 15 x 3 x 5
B (2d array): 1 x 3
Result: error (the trailing dimensions 5 and 3 are incompatible)
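If you are unsure whether two shapes broadcast, np.broadcast_shapes (available in NumPy 1.20+) can check without building the arrays. A small sketch:
print(np.broadcast_shapes((15, 3, 5), (3, 1)))  # (15, 3, 5): compatible
try:
    np.broadcast_shapes((15, 3, 5), (1, 3))     # incompatible: trailing 5 vs 3
except ValueError as e:
    print(f'Broadcast error: {e}')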
op1 = np.array([i for i in range(9)]).reshape(3, 3)
op2 = np.array([[1, 2, 3]])
op3 = np.array([1, 2, 3])
pp.pprint(op1)
pp.pprint(op2)
# Notice that the result here is DIFFERENT!
print(op2.shape)
pp.pprint(op1 + op2)
pp.pprint(op1 + op2.T)
# Notice that the result here are THE SAME!
print(op3.shape)
pp.pprint(op1 + op3)
pp.pprint(op1 + op3.T)
Here, broadcasting won't work for 15 x 3 x 5 with 1 x 3, because dimensions are only prepended. But it WILL work for 15 x 3 x 5 with 3 x 1.
op1 = np.array([i for i in range(225)]).reshape(15, 3, 5)
op2 = np.array([[1, 2, 3]])
# This does not work
# print(op1 + op2)
# This works
print(op1 + op2.T)
# BTW you can contract the cells by clicking on the left
Treat broadcasting as tiling the lower-dimensional array to match the size of the "more complex" array.
array = np.array([1, 2, 3])
# np.tile(array, shape)
print(np.tile(array, 2))
print(np.tile(array, (2, 3)))
Observe how, with transpose, the tiled result is different. op2 originally has shape 1 x 3, so:
Tiling it by (1, 5) repeats the 2nd dimension 5 times, yielding shape 1 x 15.
Tiling the transpose (shape 3 x 1) by (1, 5) repeats the 2nd dimension 5 times, yielding shape 3 x 5.
op1 = np.array([i for i in range(225)]).reshape(15, 3, 5)
op2 = np.array([[1, 2, 3]])
op_tiled = np.tile(op2, (1, 5))
print(op_tiled.shape)
op_tiled = np.tile(op2.T, (1, 5))
print(op_tiled.shape)
Add or remove a dimension of size 1. Here we massage op2 (shape=(1, 3)) into shape (15, 3, 5).
op_expanded = np.expand_dims(op2, axis=2)
print(op_expanded.shape)
op_tiled_2 = np.tile(op_expanded, (15, 1, 5))
print(op_tiled_2.shape)
Same effect with np.newaxis
op3 = np.array([i for i in range(9)]).reshape(3, 3)
op_na = op3[np.newaxis, :]
print(op_na)
print(op_na.shape)
op_na2 = op3[:, np.newaxis, :]
print(op_na2)
print(op_na2.shape)
Squeeze removes size 1 dimensions
print(op_expanded)
print(op_expanded.shape)
op_squeezed = np.squeeze(op_expanded)
print(op_squeezed)
Here are 3 ways to compute pairwise distances.
samples = np.random.random((15, 5))
print(samples.shape)
print(samples)
# Without broadcasting
expanded1 = np.expand_dims(samples, axis=1)
tile1 = np.tile(expanded1, (1, samples.shape[0], 1))
#print(expanded1.shape)
#print(tile1.shape)
#print(tile1)
expanded2 = np.expand_dims(samples, axis=0)
tile2 = np.tile(expanded2, (samples.shape[0], 1, 1))
#print(expanded2.shape)
#print(tile2.shape)
#print(tile2)
diff = tile2 - tile1
distances = np.linalg.norm(diff, axis=-1)
# print(distances)
print(np.mean(distances))
##################################
# With broadcasting
diff = samples[:, np.newaxis, :] - samples[np.newaxis, :, :]
distances = np.linalg.norm(diff, axis=-1)
# print(distances)
print(np.mean(distances))
# With scipy
import scipy.spatial
distances = scipy.spatial.distance.cdist(samples, samples)
# print(distances)
# print(len(distances))
print(np.mean(distances))
tqdm is a nice package for you to track progress, or just kill time.
import time  # time.time() gets wall time, time.perf_counter() gets a high-resolution timer (time.clock() was removed in Python 3.8)
from tqdm import tqdm
Numpy is 25 times faster than loops here.
a = np.random.random(500000)
b = np.random.random(500000)
p_tic = time.perf_counter()
tic = time.time()
dot = 0.0
for i in tqdm(range(len(a))):
    dot += a[i] * b[i]
print(dot)
toc = time.time()
p_toc = time.perf_counter()
print(f'Result: {dot}')
print(f'Compute time (wall): {round(1000 * (toc - tic), 6)}ms')
print(f'Compute time (cpu) : {round(1000 * (p_toc - p_tic), 6)}ms\n')
#####################################################################
p_tic = time.perf_counter()
tic = time.time()
dot_vec = a.dot(b)  # a and b are already numpy arrays
toc = time.time()
p_toc = time.perf_counter()
print(f'(vectorized) Result: {dot_vec}')
print(f'(vectorized) Compute time (wall): {round(1000 * (toc - tic), 6)}ms')
print(f'(vectorized) Compute time (cpu) : {round(1000 * (p_toc - p_tic), 6)}ms')
Numpy is more than TWO THOUSAND times faster than loops here.
Matrix multiplication is an O(n^3) operation if implemented naively.
def matrix_mul(X, Y):
    # Naive triple loop; result must be initialized inside the function
    result = np.zeros((len(X), len(Y[0])))
    # iterate through rows of X
    for i in range(len(X)):
        # iterate through columns of Y
        for j in range(len(Y[0])):
            # iterate through rows of Y
            for k in range(len(Y)):
                result[i][j] += X[i][k] * Y[k][j]
    return result
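Before timing, a quick sanity check on small matrices to confirm the naive loop matches numpy (sizes chosen arbitrarily):
X_small = np.random.random((4, 3))
Y_small = np.random.random((3, 5))
print(np.allclose(matrix_mul(X_small, Y_small), np.matmul(X_small, Y_small)))  # should print True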
X = np.random.random((200, 200))
Y = np.random.random((200, 200))
result = np.zeros((200, 200))
p_tic = time.perf_counter()
tic = time.time()
# iterate through rows of X
for i in tqdm(range(len(X))):
    # iterate through columns of Y
    for j in range(len(Y[0])):
        # iterate through rows of Y
        for k in range(len(Y)):
            result[i][j] += X[i][k] * Y[k][j]
s = np.sum(result)
toc = time.time()
p_toc = time.perf_counter()
print(f'Result: {s}')
print(f'Compute time (wall): {round(1000 * (toc - tic), 6)}ms')
print(f'Compute time (cpu) : {round(1000 * (p_toc - p_tic), 6)}ms\n')
#####################################################################
p_tic = time.perf_counter()
tic = time.time()
result = np.matmul(X, Y)
s = np.sum(result)
toc = time.time()
p_toc = time.perf_counter()
print(f'(vectorized) Result: {s}')
print(f'(vectorized) Compute time (wall): {round(1000 * (toc - tic), 6)}ms')
print(f'(vectorized) Compute time (cpu) : {round(1000 * (p_toc - p_tic), 6)}ms')
Again, numpy is 30 times faster
samples = np.random.random((100, 5))
p_tic = time.perf_counter()
tic = time.time()
total_dist = []
for s1 in samples:
    for s2 in samples:
        d = np.linalg.norm(s1 - s2)
        total_dist.append(d)
avg_dist = np.mean(total_dist)
toc = time.time()
p_toc = time.perf_counter()
print(f'Result: {avg_dist}')
print(f'Compute time (wall): {round(1000 * (toc - tic), 6)}ms')
print(f'Compute time (cpu) : {round(1000 * (p_toc - p_tic), 6)}ms\n')
#####################################################################
p_tic = time.perf_counter()
tic = time.time()
diff = samples[:, np.newaxis, :] - samples[np.newaxis, :, :]
distances = np.linalg.norm(diff, axis=-1)
avg_dist = np.mean(distances)
toc = time.time()
p_toc = time.perf_counter()
print(f'Result: {avg_dist}')
print(f'Compute time (wall): {round(1000 * (toc - tic), 6)}ms')
print(f'Compute time (cpu) : {round(1000 * (p_toc - p_tic), 6)}ms\n')
You might want to make sure that OpenBLAS is installed. OpenBLAS is a "Basic Linear Algebra Subprograms" (BLAS) implementation that speeds up linear algebra for numpy.
np.show_config()
We want to plot with proper labels, a series legend, and even markers.
# If you are using a headless environment. Very important if running on server
# import matplotlib
# matplotlib.use('Agg')
import matplotlib.pyplot as plt
def draw_simple_sin_cos(x_values):
    y1_values = np.sin(x_values * np.pi)
    y2_values = np.cos(x_values * np.pi)
    plt.plot(x_values, y1_values, label='Sine')
    plt.plot(x_values, y2_values, label='Cosine')
    plt.legend()
    plt.xlabel('x')
    plt.ylabel('values')
    plt.title(r'Values for sin and cos, scaled by $\phi_i$')
x_values = np.arange(0, 20, 0.001)
draw_simple_sin_cos(x_values)
plt.show()
You can adjust the figure size for aspect ratio and the DPI for pixel density. Combined, these give you the resolution of the image.
plt.figure(figsize=(10, 3), dpi=100)  # 1000 x 300 pixels
draw_simple_sin_cos(x_values)
plt.savefig('tutorial_sin.jpg')
plt.show()
Subplots in a grid can share axes through sharex and sharey.
def draw_subplot_sin_cos(index, x_values, ax):
    y1_values = np.sin(x_values * np.pi)
    y2_values = np.cos(x_values * np.pi)
    ax.plot(x_values, y1_values, c='r', label='Sine')
    ax.scatter(x_values, y2_values, s=4, label='Cosine')
    ax.legend()
    ax.set_xlabel('x')
    ax.set_ylabel('values')
    ax.set_title(f'Values for sin and cos (Subplot #{index})')
fig, ax_list = plt.subplots(nrows=2, ncols=2, figsize=(10, 10))
#fig, ax_list = plt.subplots(nrows=2, ncols=2,
# sharex='col', sharey='row',
# figsize=(10, 10))
i = 0
for r, row in enumerate(ax_list):
    for c, ax in enumerate(row):
        x_values = np.arange(i, i + 10, 0.1)
        draw_subplot_sin_cos(i, x_values, ax)
        i += 1
plt.show()
Here we show how to plot a confusion matrix from scratch. For a pre-built one, see the implementation in scikit-learn (a sketch follows after the from-scratch version).
fig, ax = plt.subplots(figsize=(10,10))
color='YlGn'
labels = ['Python', 'C++', 'Fortran']
cm = np.array([[0.7, 0.3, 0.2], [0.1, 0.5, 0.4], [0.05, 0.1, 0.85]])
heatmap = ax.pcolor(cm, cmap=color)
fig.colorbar(heatmap)
ax.invert_yaxis()
ax.xaxis.tick_top()
ax.set_title('Confusion Matrix')
ax.set_xlabel('Prediction')
ax.set_ylabel('Ground Truth')
ax.set_xticks(np.arange(cm.shape[0]) + 0.5, minor=False)
ax.set_yticks(np.arange(cm.shape[1]) + 0.5, minor=False)
ax.set_xticklabels(labels)
ax.set_yticklabels(labels)
plt.show()
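For the pre-built scikit-learn version mentioned above, something like this sketch should work (assuming scikit-learn is installed; cm, labels, and color are reused from the cell above):
from sklearn.metrics import ConfusionMatrixDisplay
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=labels)
disp.plot(cmap=color)  # reuse the 'YlGn' colormap
plt.show()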
When showing images, remember to tell matplotlib the range of pixel values (vmin/vmax). Typically pixel values are either 0-1 or 0-255.
img_arr = np.random.random((256, 256))# 0 -> 1
print(img_arr.shape)
plt.imshow(img_arr, cmap='gray', vmin=0.2, vmax=0.25)
plt.show()
Color images are typically stored channel-last, i.e. (rows, cols, RGB), which is what imshow expects by default.
img_arr = np.random.random((256, 256, 3))# R, C, (RGB)
print(img_arr.shape)
plt.imshow(img_arr, vmin=0, vmax=1)
plt.show()
If your array is channel-first, remember to move the axis around before using the default plotting tool.
img_arr = np.random.random((3, 256, 256))# (RGB) R C
print(img_arr.shape)
img_arr = np.moveaxis(img_arr, 0, -1)
print(img_arr.shape)
plt.imshow(img_arr, vmin=0, vmax=1)
plt.show()
import imageio
fname = 'sample.jpg'
img = imageio.imread(fname)
pp.pprint(img)
print(img.shape)
plt.figure(dpi=250) # dpi=500 -> larger
plt.imshow(img, vmin=0, vmax=10000, interpolation='bilinear')
plt.show()
Pandas is a great data processing library for table/database-like data. It is excellent for data that comes in as, or that you want to output as, CSV/Excel. Part of the following content is inspired by a Pandas tutorial online.
import pandas as pd
data = pd.read_csv('train.csv')
data_short = data[:20]
data_short
You can get basic statistics with little effort, like this
print(data['x_1'].describe()) # For one column
data.describe() # For the entire dataframe
See Pandas documentation for parameters of to_csv function.
data_short.to_csv('data_short.csv', index=False)
Columns can be selected and filtered based on value/name. Be careful with the binary operators used for filtering: & and | bind more tightly than comparisons, so put parentheses around each condition.
data_short[['x_1', 'y']]
data_short[(data_short['y'] > 5) & (data_short['x_3'] < 1.5)] # Use & | instead of and/or. Put brackets around
A function can be applied row-wise to generate a new column (you can use this to apply a trained model and store its predictions). Here we add a column based on a filter/condition.
def filter_func(row):
    if row['x_1'] == 1.0 and row['x_2'] == 0.0:
        return row['y'] * 10
    return -1
data_short['new_column'] = data_short[['x_1', 'x_2', 'y']].apply(filter_func, axis=1)
data_short
Iterating through Pandas rows can be done as follows. Each row behaves like a dictionary. Adding data directly via a list of values is also valid.
col2 = []
for i, row in data_short.iterrows():
    print(f'Row {i}: y-value: {row["y"]}')
    col2.append(row['y'] ** 2)
data_short['col_2'] = col2
data_short
Not a great example here, but loc indexes by label while iloc indexes by position. For example, you can use iloc with -1, but NOT loc with -1 (unless -1 is an actual label).
print(data.loc[19])
print(data.iloc[-1])
You can create a dataframe from a list of dictionaries (one per row). Notice that "extra" keys will be filled in with NaN for the other rows.
data_list = [{'a': i, 'b': i + 1} for i in range(15)]
data_list[5] = {'a': 10, 'b': 9, 'c': -1}
df = pd.DataFrame(data_list)
df
A dataframe can also be created from a 2D array. Naming the rows and columns is good practice.
data_2d = np.array([i for i in range(50)]).reshape(5, 10)
df = pd.DataFrame(data_2d, columns=[f'col {i}' for i in range(10)], index=[f'row {i}' for i in range(5)])
df
Similarly, you can create a dataframe directly from a dictionary. The orient argument controls whether the dictionary keys become row or column indices.
data_dict = {'col 1': [3, 2, 1, 0],
             'col 2': ['a', 'b', 'c', 'd']}
df = pd.DataFrame.from_dict(data_dict)
df
df = pd.DataFrame.from_dict(data_dict, orient='index')
df
Pandas also supports plotting. The figures it generates are in the same style as Matplotlib's. Pandas plotting provides a quick way to visualize data, though you may still need Matplotlib for more formal plots with higher flexibility.
data.plot(kind='scatter', x='x_3', y='y', title='Plot of Data');
data['y'].plot(kind='hist', title='Y');
data.boxplot(column='x_3', by='y');
data.to_numpy()
plt.scatter(data['x_1'], data['y'])