NumPy Tutorial 05: Statistics, Linear Algebra, and Data I/O


import numpy as np

1. Descriptive statistics by axis

data = np.array([
    [2.0, 4.0, 6.0],
    [1.0, 3.0, 5.0],
    [7.0, 8.0, 9.0]
])

print('mean axis=0:', data.mean(axis=0))
print('mean axis=1:', data.mean(axis=1))
print('std axis=0:', np.round(data.std(axis=0), 4))
print('percentiles (25,50,75):\n', np.percentile(data, [25, 50, 75]))  # no axis: computed over all 9 values
mean axis=0: [3.33333333 5.         6.66666667]
mean axis=1: [4. 3. 8.]
std axis=0: [2.6247 2.1602 1.6997]
percentiles (25,50,75):
 [3. 5. 7.]
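Two parameters come up constantly with axis-wise statistics: keepdims=True, which keeps the reduced axis so the result broadcasts back against the original array, and ddof, which switches std from the population formula (divide by N, the default) to the sample formula (divide by N - 1). The sketch below reuses the same data array. Centering rows is just an illustrative use of keepdims.

```python
import numpy as np

data = np.array([
    [2.0, 4.0, 6.0],
    [1.0, 3.0, 5.0],
    [7.0, 8.0, 9.0]
])

# axis=0 collapses the rows -> one value per column, shape (3,)
col_means = data.mean(axis=0)

# keepdims=True keeps the reduced axis with length 1 -> shape (3, 1),
# so the result broadcasts against the (3, 3) array
row_means = data.mean(axis=1, keepdims=True)
centered_rows = data - row_means

# std defaults to ddof=0 (population, divide by N);
# ddof=1 gives the sample standard deviation (divide by N - 1)
pop_std = data.std(axis=0)
sample_std = data.std(axis=0, ddof=1)

print('row_means shape:', row_means.shape)
print('centered row means:', centered_rows.mean(axis=1))
print('population std:', np.round(pop_std, 4))
print('sample std:', np.round(sample_std, 4))
```

Without keepdims, data - data.mean(axis=1) would broadcast the (3,) result across rows instead of columns and silently subtract the wrong values.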

2. Handling missing values (NaN)

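Standard reductions propagate NaN: a single missing value makes the whole result NaN. The nan* variants (np.nanmean, np.nansum, and others) skip NaN entries instead.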

x = np.array([1.0, np.nan, 3.0, np.nan, 5.0])
print('mean (normal):', np.mean(x))
print('mean (nanmean):', np.nanmean(x))
print('sum (nansum):', np.nansum(x))
mean (normal): nan
mean (nanmean): 3.0
sum (nansum): 9.0
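The same pattern extends to other reductions, and np.isnan gives a boolean mask for counting, dropping, or filling missing entries. A small sketch on the same array; filling with 0.0 is only an illustrative choice, not a recommendation for real data.

```python
import numpy as np

x = np.array([1.0, np.nan, 3.0, np.nan, 5.0])

# Boolean mask of missing entries
missing = np.isnan(x)
print('missing count:', missing.sum())

# Other nan-aware reductions follow the same naming pattern
print('nanstd:', np.nanstd(x))
print('nanmax:', np.nanmax(x))
print('nanmedian:', np.nanmedian(x))

# Equivalent to nanmean: drop NaN explicitly, then use the normal reduction
clean = x[~missing]
print('mean of clean values:', clean.mean())

# Replacing NaN with a fill value instead of skipping it
filled = np.where(missing, 0.0, x)
print('filled:', filled)
```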

3. Linear algebra best practice

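For a linear system Ax = b, prefer np.linalg.solve(A, b) over np.linalg.inv(A) @ b. solve factorizes A once (LU decomposition with partial pivoting) and solves directly, which is faster and numerically more accurate than forming the explicit inverse.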

A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])

x = np.linalg.solve(A, b)
x_inv = np.linalg.inv(A) @ b

print('solve result:', x)
print('inv(A) @ b:', x_inv)
print('close?', np.allclose(x, x_inv))
print('det(A):', np.linalg.det(A))  # computed via LU, so tiny rounding error is normal
solve result: [2. 3.]
inv(A) @ b: [2. 3.]
close? True
det(A): 5.000000000000001
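A few practical habits around np.linalg.solve, sketched with the same A: pass several right-hand sides as columns of one matrix, check the residual, look at the condition number, and catch LinAlgError for singular matrices. The second right-hand side [1, 0] and the singular matrix S are made-up values for illustration.

```python
import numpy as np

A = np.array([[3.0, 1.0], [1.0, 2.0]])

# solve accepts several right-hand sides at once: each column of B is one b
B = np.array([[9.0, 1.0],
              [8.0, 0.0]])
X = np.linalg.solve(A, B)

# Check the residual A @ X - B rather than trusting the result blindly
print('solutions (one per column):\n', X)
print('residual ok?', np.allclose(A @ X, B))

# The condition number indicates how much rounding error can be amplified;
# large values mean the system is ill-conditioned
cond_A = np.linalg.cond(A)
print('cond(A):', cond_A)

# A singular matrix has no unique solution; solve raises LinAlgError
S = np.array([[1.0, 2.0], [2.0, 4.0]])
raised = False
try:
    np.linalg.solve(S, np.array([1.0, 2.0]))
except np.linalg.LinAlgError as err:
    raised = True
    print('LinAlgError:', err)
```

For non-square or rank-deficient systems, np.linalg.lstsq is the usual alternative to solve.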

4. Eigen decomposition and SVD (intro)

M = np.array([[2.0, 0.0], [0.0, 1.0]])
vals, vecs = np.linalg.eig(M)
U, S, VT = np.linalg.svd(np.array([[1.0, 2.0], [3.0, 4.0]]))

print('eigenvalues:', vals)
print('eigenvectors:\n', vecs)
print('singular values:', S)
eigenvalues: [2. 1.]
eigenvectors:
 [[1. 0.]
 [0. 1.]]
singular values: [5.4649857  0.36596619]
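Both decompositions can be sanity-checked from their defining equations: each eigenpair satisfies M @ v = λv, and the SVD factors multiply back to the original matrix. The sketch below also shows np.linalg.eigh, the routine meant for symmetric (Hermitian) matrices such as M, which returns real eigenvalues in ascending order.

```python
import numpy as np

M = np.array([[2.0, 0.0], [0.0, 1.0]])
vals, vecs = np.linalg.eig(M)

# Each column vecs[:, i] should satisfy M @ v == vals[i] * v
all_eig_ok = True
for i in range(len(vals)):
    v = vecs[:, i]
    ok = np.allclose(M @ v, vals[i] * v)
    all_eig_ok = all_eig_ok and ok
    print(f'lambda={vals[i]}: M @ v == lambda * v?', ok)

# eigh is specialised for symmetric matrices (ascending eigenvalues)
w, Q = np.linalg.eigh(M)
print('eigh eigenvalues:', w)

# SVD: X = U @ diag(S) @ VT; rebuilding X checks the factorization
X = np.array([[1.0, 2.0], [3.0, 4.0]])
U, S, VT = np.linalg.svd(X)
X_rebuilt = U @ np.diag(S) @ VT
print('SVD reconstruction ok?', np.allclose(X, X_rebuilt))

# Singular values are the square roots of the eigenvalues of X.T @ X
# (eigvalsh returns them ascending, so reverse to match S)
sv_from_eig = np.sqrt(np.linalg.eigvalsh(X.T @ X))[::-1]
print('sqrt(eig(X.T @ X)):', sv_from_eig)
```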

5. Persist arrays to disk (.npy, .npz, .txt)

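.npy stores a single array in an efficient binary format. .npz (np.savez) is a zip archive holding several named arrays, and np.savez_compressed writes the same container with zip compression, trading CPU time for smaller files.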

arr = np.arange(10).reshape(2, 5)
arr2 = np.linspace(0, 1, 6).reshape(2, 3)

# 1) Single-array binary format
np.save('demo.npy', arr)
loaded_npy = np.load('demo.npy')

# 2) Multi-array container (.npz)
np.savez('demo_bundle.npz', first=arr, second=arr2)
bundle = np.load('demo_bundle.npz')

# 3) Compressed multi-array container (.npz)
np.savez_compressed('demo_bundle_compressed.npz', first=arr, second=arr2)
bundle_compressed = np.load('demo_bundle_compressed.npz')

# 4) Text format (human-readable, larger/slower)
np.savetxt('demo.txt', arr, fmt='%d', delimiter=',')
loaded_txt = np.loadtxt('demo.txt', delimiter=',')

print('loaded npy:\n', loaded_npy)
print('npz keys:', bundle.files)
print('npz first:\n', bundle['first'])
print('compressed npz keys:', bundle_compressed.files)
print('loaded txt:\n', loaded_txt)
loaded npy:
 [[0 1 2 3 4]
 [5 6 7 8 9]]
npz keys: ['first', 'second']
npz first:
 [[0 1 2 3 4]
 [5 6 7 8 9]]
compressed npz keys: ['first', 'second']
loaded txt:
 [[0. 1. 2. 3. 4.]
 [5. 6. 7. 8. 9.]]
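Three follow-ups to the example above, sketched in a temporary directory so no files are left behind: the size difference between savez and savez_compressed on highly repetitive data (an array of zeros, chosen because it compresses well; real data will compress less), using np.load on an .npz as a context manager so the underlying file is closed, and passing dtype to np.loadtxt, which otherwise returns floats as seen in the output above.

```python
import os
import tempfile
import numpy as np

arr = np.arange(10).reshape(2, 5)
zeros = np.zeros((1000, 100))  # repetitive data, compresses well

with tempfile.TemporaryDirectory() as tmp:
    plain_path = os.path.join(tmp, 'plain.npz')
    comp_path = os.path.join(tmp, 'compressed.npz')
    txt_path = os.path.join(tmp, 'arr.txt')

    # Same array, with and without zip compression
    np.savez(plain_path, zeros=zeros)
    np.savez_compressed(comp_path, zeros=zeros)
    plain_size = os.path.getsize(plain_path)
    comp_size = os.path.getsize(comp_path)
    print('savez bytes:', plain_size)
    print('savez_compressed bytes:', comp_size)

    # np.load on an .npz returns an NpzFile; the with-block closes it
    with np.load(comp_path) as bundle:
        restored = bundle['zeros']  # array is read into memory here
    print('round-trip equal?', np.array_equal(zeros, restored))

    # loadtxt returns float64 by default; pass dtype to keep integers
    np.savetxt(txt_path, arr, fmt='%d', delimiter=',')
    loaded_int = np.loadtxt(txt_path, delimiter=',', dtype=int)
    print('loadtxt dtype:', loaded_int.dtype)
```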