{ "cells": [ { "cell_type": "markdown", "id": "209bb12c", "metadata": {}, "source": [ "# NumPy Tutorial 05: Statistics, Linear Algebra, and Data I/O" ] }, { "cell_type": "markdown", "id": "78ae2363", "metadata": {}, "source": [ "## Download Notebook\n", "\n", "{download}`Download this notebook <05_statistics_linear_algebra_and_io.ipynb>`" ] }, { "cell_type": "code", "execution_count": null, "id": "b4b8694e", "metadata": {}, "outputs": [], "source": [ "import numpy as np" ] }, { "cell_type": "markdown", "id": "003b83e9", "metadata": {}, "source": [ "## 1. Descriptive statistics by axis" ] }, { "cell_type": "code", "execution_count": null, "id": "6c8b56ae", "metadata": {}, "outputs": [], "source": [ "data = np.array([\n", "    [2.0, 4.0, 6.0],\n", "    [1.0, 3.0, 5.0],\n", "    [7.0, 8.0, 9.0]\n", "])\n", "\n", "# axis=0 reduces over rows (one result per column);\n", "# axis=1 reduces over columns (one result per row).\n", "print('mean axis=0:', data.mean(axis=0))\n", "print('mean axis=1:', data.mean(axis=1))\n", "# std defaults to the population standard deviation (ddof=0).\n", "print('std axis=0:', np.round(data.std(axis=0), 4))\n", "# With no axis argument, percentiles are taken over the flattened array.\n", "print('percentiles (25,50,75):\\n', np.percentile(data, [25, 50, 75]))" ] }, { "cell_type": "markdown", "id": "1db36e09", "metadata": {}, "source": [ "## 2. Handling missing values (NaN)\n", "\n", "Ordinary reductions such as `np.mean` return `nan` when any element is `nan`. Use the `nan*` variants (for example `np.nanmean` and `np.nansum`) to ignore missing values." ] }, { "cell_type": "code", "execution_count": null, "id": "e8b9dde5", "metadata": {}, "outputs": [], "source": [ "x = np.array([1.0, np.nan, 3.0, np.nan, 5.0])\n", "print('mean (normal):', np.mean(x))\n", "print('mean (nanmean):', np.nanmean(x))\n", "print('sum (nansum):', np.nansum(x))" ] }, { "cell_type": "markdown", "id": "43fd0e39", "metadata": {}, "source": [ "## 3. Linear algebra best practice\n", "\n", "For a linear system `A @ x = b`, prefer `np.linalg.solve(A, b)` over computing `inv(A) @ b`: it avoids forming the explicit inverse, which is generally faster and more numerically accurate."
] }, { "cell_type": "code", "execution_count": null, "id": "293b62a0", "metadata": {}, "outputs": [], "source": [ "A = np.array([[3.0, 1.0], [1.0, 2.0]])\n", "b = np.array([9.0, 8.0])\n", "\n", "# Direct solve (preferred).\n", "x = np.linalg.solve(A, b)\n", "# Explicit inverse, computed here only for comparison.\n", "x_inv = np.linalg.inv(A) @ b\n", "\n", "print('solve result:', x)\n", "print('inv(A) @ b:', x_inv)\n", "print('close?', np.allclose(x, x_inv))\n", "print('det(A):', np.linalg.det(A))" ] }, { "cell_type": "markdown", "id": "694437a0", "metadata": {}, "source": [ "## 4. Eigendecomposition and SVD (intro)" ] }, { "cell_type": "code", "execution_count": null, "id": "cb09b293", "metadata": {}, "outputs": [], "source": [ "M = np.array([[2.0, 0.0], [0.0, 1.0]])\n", "# eig returns the eigenvalues and a matrix whose columns are the eigenvectors.\n", "vals, vecs = np.linalg.eig(M)\n", "# svd returns U, the singular values S (in descending order), and V transposed.\n", "U, S, VT = np.linalg.svd(np.array([[1.0, 2.0], [3.0, 4.0]]))\n", "\n", "print('eigenvalues:', vals)\n", "print('eigenvectors:\\n', vecs)\n", "print('singular values:', S)" ] }, { "cell_type": "markdown", "id": "420a79b0", "metadata": {}, "source": [ "## 5. Persist arrays to disk (`.npy`, `.npz`, `.txt`)\n", "\n", "`.npy` is an efficient binary format for a single array, and `.npz` is a zip container for multiple arrays. A compressed `.npz` (written with `np.savez_compressed`) trades CPU time for a smaller file. Plain text (`.txt`) is human-readable but larger and slower to read and write."
] }, { "cell_type": "code", "execution_count": null, "id": "2e9eb4cc", "metadata": {}, "outputs": [], "source": [ "arr = np.arange(10).reshape(2, 5)\n", "arr2 = np.linspace(0, 1, 6).reshape(2, 3)\n", "\n", "# 1) Single-array binary format\n", "np.save('demo.npy', arr)\n", "loaded_npy = np.load('demo.npy')\n", "\n", "# 2) Multi-array container (.npz)\n", "np.savez('demo_bundle.npz', first=arr, second=arr2)\n", "bundle = np.load('demo_bundle.npz')\n", "\n", "# 3) Compressed multi-array container (.npz)\n", "np.savez_compressed('demo_bundle_compressed.npz', first=arr, second=arr2)\n", "bundle_compressed = np.load('demo_bundle_compressed.npz')\n", "\n", "# 4) Text format (human-readable, larger/slower)\n", "np.savetxt('demo.txt', arr, fmt='%d', delimiter=',')\n", "# loadtxt returns float64 by default; dtype=int recovers the saved integers.\n", "loaded_txt = np.loadtxt('demo.txt', delimiter=',', dtype=int)\n", "\n", "print('loaded npy:\\n', loaded_npy)\n", "print('npz keys:', bundle.files)\n", "print('npz first:\\n', bundle['first'])\n", "print('compressed npz keys:', bundle_compressed.files)\n", "print('loaded txt:\\n', loaded_txt)" ] } ], "metadata": { "language_info": { "name": "python" } }, "nbformat": 4, "nbformat_minor": 5 }