{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "209bb12c",
   "metadata": {},
   "source": [
    "# NumPy Tutorial 05: Statistics, Linear Algebra, and Data I/O"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "78ae2363",
   "metadata": {},
   "source": [
    "## Download Notebook\n",
    "\n",
    "{download}`Download this notebook <05_statistics_linear_algebra_and_io.ipynb>`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b4b8694e",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "003b83e9",
   "metadata": {},
   "source": [
    "## 1. Descriptive statistics by axis"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6c8b56ae",
   "metadata": {},
   "outputs": [],
   "source": [
    "data = np.array([\n",
    "    [2.0, 4.0, 6.0],\n",
    "    [1.0, 3.0, 5.0],\n",
    "    [7.0, 8.0, 9.0]\n",
    "])\n",
    "\n",
    "print('mean axis=0:', data.mean(axis=0))\n",
    "print('mean axis=1:', data.mean(axis=1))\n",
    "print('std axis=0:', np.round(data.std(axis=0), 4))\n",
    "print('percentiles (25, 50, 75) over all elements:\\n', np.percentile(data, [25, 50, 75]))"
   ]
  },
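  {
   "cell_type": "markdown",
   "id": "5a1f0c3e",
   "metadata": {},
   "source": [
    "The `axis` argument also combines well with broadcasting. The cell below is a minimal sketch rather than part of the tutorial's data: `sample`, `col_mean`, `col_std`, and `z` are illustrative names. It standardizes each column, using `keepdims=True` so the reduced axis stays in place for broadcasting, and requests percentiles per column by passing `axis=0`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9d27b6a4",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "# Illustrative array (same values as `data`, redefined so the cell runs on its own)\n",
    "sample = np.array([\n",
    "    [2.0, 4.0, 6.0],\n",
    "    [1.0, 3.0, 5.0],\n",
    "    [7.0, 8.0, 9.0]\n",
    "])\n",
    "\n",
    "# axis=0 reduces over rows, giving one value per column; keepdims keeps shape (1, 3)\n",
    "col_mean = sample.mean(axis=0, keepdims=True)\n",
    "col_std = sample.std(axis=0, keepdims=True)  # population std (ddof=0), NumPy's default\n",
    "\n",
    "# Broadcasting subtracts and divides column by column\n",
    "z = (sample - col_mean) / col_std\n",
    "\n",
    "print('column means after scaling:', np.round(z.mean(axis=0), 4))\n",
    "print('column stds after scaling:', np.round(z.std(axis=0), 4))\n",
    "print('percentiles per column (rows are 25, 50, 75):\\n', np.percentile(sample, [25, 50, 75], axis=0))"
   ]
  },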
  {
   "cell_type": "markdown",
   "id": "1db36e09",
   "metadata": {},
   "source": [
    "## 2. Handling missing values (NaN)\n",
    "\n",
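    "Ordinary reductions such as `np.mean` return `nan` as soon as one element is `NaN`. The `nan*` variants (`np.nanmean`, `np.nansum`, and so on) skip `NaN` entries instead."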
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e8b9dde5",
   "metadata": {},
   "outputs": [],
   "source": [
    "x = np.array([1.0, np.nan, 3.0, np.nan, 5.0])\n",
    "print('mean (normal):', np.mean(x))\n",
    "print('mean (nanmean):', np.nanmean(x))\n",
    "print('sum (nansum):', np.nansum(x))"
   ]
  },
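  {
   "cell_type": "markdown",
   "id": "c38e4f19",
   "metadata": {},
   "source": [
    "The same idea extends to 2-D data. The cell below is a small sketch with made-up values (`readings` and `valid` are illustrative names): it counts the usable entries per column with a boolean mask and takes `np.nanmean` along an axis. Note that a slice containing only `NaN` makes `np.nanmean` return `nan` and emit a `RuntimeWarning`, so counting valid entries first is often worthwhile."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e7b05d82",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "# Made-up measurements with gaps; the middle column has no valid entry\n",
    "readings = np.array([\n",
    "    [1.0, np.nan, 3.0],\n",
    "    [np.nan, np.nan, 6.0]\n",
    "])\n",
    "\n",
    "# Boolean mask: True where a value is present\n",
    "valid = ~np.isnan(readings)\n",
    "\n",
    "print('valid count per column:', valid.sum(axis=0))\n",
    "# Every row has at least one valid value, so no all-NaN warning is triggered here\n",
    "print('nanmean per row:', np.nanmean(readings, axis=1))\n",
    "# Equivalent manual approach: select valid entries, then reduce\n",
    "print('mean of all valid entries:', readings[valid].mean())"
   ]
  },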
  {
   "cell_type": "markdown",
   "id": "43fd0e39",
   "metadata": {},
   "source": [
    "## 3. Linear algebra best practice\n",
    "\n",
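    "For linear systems, prefer `np.linalg.solve(A, b)` over computing `inv(A) @ b`: `solve` factorizes `A` and never forms the inverse explicitly, which is faster and usually more accurate."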
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "293b62a0",
   "metadata": {},
   "outputs": [],
   "source": [
    "A = np.array([[3.0, 1.0], [1.0, 2.0]])\n",
    "b = np.array([9.0, 8.0])\n",
    "\n",
    "x = np.linalg.solve(A, b)  # solves A @ x = b directly\n",
    "x_inv = np.linalg.inv(A) @ b  # explicit inverse, shown only for comparison\n",
    "\n",
    "print('solve result:', x)\n",
    "print('inv(A) @ b:', x_inv)\n",
    "print('close?', np.allclose(x, x_inv))\n",
    "print('det(A):', np.linalg.det(A))"
   ]
  },
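  {
   "cell_type": "markdown",
   "id": "1f6a9c47",
   "metadata": {},
   "source": [
    "`np.linalg.solve` also accepts several right-hand sides at once, stacked as columns. The following sketch uses illustrative names (`A_demo`, `B_demo`, `X`) and a made-up second right-hand side; it checks the result through the residual `A @ X - B` and prints the condition number, which gives a rough sense of how much input error can be amplified."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a4d3e82b",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "A_demo = np.array([[3.0, 1.0], [1.0, 2.0]])\n",
    "# Two right-hand sides stored as columns (the second one is made up for illustration)\n",
    "B_demo = np.array([[9.0, 1.0], [8.0, 0.0]])\n",
    "\n",
    "# One call solves A_demo @ X[:, j] = B_demo[:, j] for every column j\n",
    "X = np.linalg.solve(A_demo, B_demo)\n",
    "\n",
    "# The residual should be close to zero for a well-conditioned system\n",
    "residual = A_demo @ X - B_demo\n",
    "\n",
    "print('solutions (one per column):\\n', X)\n",
    "print('max abs residual:', np.abs(residual).max())\n",
    "print('condition number of A_demo:', np.linalg.cond(A_demo))"
   ]
  },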
  {
   "cell_type": "markdown",
   "id": "694437a0",
   "metadata": {},
   "source": [
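    "## 4. Eigendecomposition and SVD (intro)\n",
    "\n",
    "`np.linalg.eig` returns the eigenvalues and a matrix whose columns are the matching eigenvectors. `np.linalg.svd` returns `U`, the singular values, and the transpose of `V`."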
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cb09b293",
   "metadata": {},
   "outputs": [],
   "source": [
    "M = np.array([[2.0, 0.0], [0.0, 1.0]])\n",
    "vals, vecs = np.linalg.eig(M)\n",
    "U, S, VT = np.linalg.svd(np.array([[1.0, 2.0], [3.0, 4.0]]))\n",
    "\n",
    "print('eigenvalues:', vals)\n",
    "print('eigenvectors:\\n', vecs)\n",
    "print('singular values:', S)"
   ]
  },
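  {
   "cell_type": "markdown",
   "id": "7c92f5b0",
   "metadata": {},
   "source": [
    "A quick way to check a decomposition is to verify its defining identity. The sketch below uses illustrative names and matrices (`sym`, `mat`, and the `_demo` variables). For the symmetric example it swaps in `np.linalg.eigh`, the routine intended for symmetric or Hermitian input, instead of `np.linalg.eig`; that choice assumes the matrix really is symmetric."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d05b8e36",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "# Symmetric example matrix (illustrative values)\n",
    "sym = np.array([[2.0, 1.0], [1.0, 3.0]])\n",
    "w_demo, V_demo = np.linalg.eigh(sym)  # eigenvalues in ascending order, eigenvectors as columns\n",
    "\n",
    "# Defining identity: sym @ v_j == w_j * v_j for every column v_j\n",
    "print('eigen identity holds?', np.allclose(sym @ V_demo, V_demo * w_demo))\n",
    "\n",
    "# SVD reconstruction: mat == U @ diag(s) @ Vt\n",
    "mat = np.array([[1.0, 2.0], [3.0, 4.0]])\n",
    "U_demo, s_demo, Vt_demo = np.linalg.svd(mat)\n",
    "print('SVD reconstructs mat?', np.allclose(U_demo @ np.diag(s_demo) @ Vt_demo, mat))"
   ]
  },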
  {
   "cell_type": "markdown",
   "id": "420a79b0",
   "metadata": {},
   "source": [
    "## 5. Persist arrays to disk (`.npy`, `.npz`, `.txt`)\n",
    "\n",
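    "`.npy` stores a single array in an efficient binary format, and `.npz` is a zip container for several named arrays. `np.savez_compressed` also writes `.npz` but compresses it, trading CPU time for a smaller file. Plain text via `np.savetxt` is human-readable but larger and slower, and `np.loadtxt` returns floats unless you pass `dtype`."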
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2e9eb4cc",
   "metadata": {},
   "outputs": [],
   "source": [
    "arr = np.arange(10).reshape(2, 5)\n",
    "arr2 = np.linspace(0, 1, 6).reshape(2, 3)\n",
    "\n",
    "# 1) Single-array binary format\n",
    "np.save('demo.npy', arr)\n",
    "loaded_npy = np.load('demo.npy')\n",
    "\n",
    "# 2) Multi-array container (.npz)\n",
    "np.savez('demo_bundle.npz', first=arr, second=arr2)\n",
    "bundle = np.load('demo_bundle.npz')\n",
    "\n",
    "# 3) Compressed multi-array container (.npz)\n",
    "np.savez_compressed('demo_bundle_compressed.npz', first=arr, second=arr2)\n",
    "bundle_compressed = np.load('demo_bundle_compressed.npz')\n",
    "\n",
    "# 4) Text format (human-readable, larger/slower)\n",
    "np.savetxt('demo.txt', arr, fmt='%d', delimiter=',')\n",
    "loaded_txt = np.loadtxt('demo.txt', delimiter=',')\n",
    "\n",
    "print('loaded npy:\\n', loaded_npy)\n",
    "print('npz keys:', bundle.files)\n",
    "print('npz first:\\n', bundle['first'])\n",
    "print('compressed npz keys:', bundle_compressed.files)\n",
    "print('loaded txt:\\n', loaded_txt)"
   ]
  }
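  ,
  {
   "cell_type": "markdown",
   "id": "3e8c1a7f",
   "metadata": {},
   "source": [
    "An object returned by `np.load` for an `.npz` file keeps the archive open until it is closed, so a `with` block is a tidy way to read it. The cell below is a minimal sketch: the file name `demo_roundtrip.npz` and the variable names are illustrative. It saves two arrays, reads them back inside a context manager, and checks that values and dtypes survive the round trip."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b6f4207d",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "first = np.arange(10).reshape(2, 5)\n",
    "second = np.linspace(0, 1, 6).reshape(2, 3)\n",
    "\n",
    "# Hypothetical file name, written to the current working directory\n",
    "np.savez_compressed('demo_roundtrip.npz', first=first, second=second)\n",
    "\n",
    "# The with block closes the underlying zip file once the arrays are copied out\n",
    "with np.load('demo_roundtrip.npz') as archive:\n",
    "    restored = {name: archive[name] for name in archive.files}\n",
    "\n",
    "for name, original in (('first', first), ('second', second)):\n",
    "    same_values = np.array_equal(original, restored[name])\n",
    "    same_dtype = original.dtype == restored[name].dtype\n",
    "    print(name, 'round-trips exactly:', same_values and same_dtype)"
   ]
  }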
 ],
 "metadata": {
  "language_info": {
   "name": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}