{ "cells": [ { "cell_type": "markdown", "id": "2fdd1b78", "metadata": {}, "source": [ "# NumPy Tutorial 01: Foundations and Performance\n", "\n", "This chapter builds a practical mental model of NumPy arrays, performance, and numerical precision." ] }, { "cell_type": "markdown", "id": "1f1b0bfb", "metadata": {}, "source": [ "## Download Notebook\n", "\n", "{download}`Download this notebook <01_getting_started.ipynb>`" ] }, { "cell_type": "code", "execution_count": null, "id": "c2cc82ae", "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "from time import perf_counter" ] }, { "cell_type": "markdown", "id": "4cc56d3e", "metadata": {}, "source": [ "## 1. Why use an `ndarray` instead of a Python list?\n", "\n", "A Python `list` holds references to separate Python objects, so every arithmetic step must look up each element's type at runtime. An `ndarray` instead stores elements of a single fixed-size `dtype` in one memory buffer, laid out contiguously or with regular strides. This uniform layout lets NumPy run compiled loops over the whole buffer, which is what makes vectorized operations fast." ] }, { "cell_type": "code", "execution_count": null, "id": "7cd52554", "metadata": {}, "outputs": [], "source": [ "py_list = list(range(10))\n", "np_array = np.arange(10)\n", "\n", "print('Python list:', py_list[:5], '...')\n", "print('NumPy array:', np_array[:5], '...')\n", "print('Type:', type(np_array))\n", "# Every element shares one dtype with a fixed size in bytes\n", "print('dtype:', np_array.dtype, '| itemsize:', np_array.itemsize, 'bytes')" ] }, { "cell_type": "markdown", "id": "48e1e090", "metadata": {}, "source": [ "## 2. Vectorization benchmark\n", "\n", "We compare a pure Python list comprehension with NumPy vectorization on the same computation: $y = x^2 + 3x + 1$. Absolute timings depend on the machine, and a single `perf_counter` measurement is noisy, so treat the printed numbers as a rough comparison."
] }, { "cell_type": "code", "execution_count": null, "id": "0b41a977", "metadata": {}, "outputs": [], "source": [ "n = 1_000_000\n", "x_list = list(range(n))\n", "x_np = np.arange(n, dtype=np.float64)\n", "\n", "# Pure Python: the interpreter evaluates the expression once per element\n", "t0 = perf_counter()\n", "y_list = [v * v + 3 * v + 1 for v in x_list]\n", "t1 = perf_counter()\n", "\n", "# NumPy: each operator runs one compiled loop over the whole array\n", "t2 = perf_counter()\n", "y_np = x_np * x_np + 3 * x_np + 1\n", "t3 = perf_counter()\n", "\n", "print(f'Python loop time: {t1 - t0:.4f}s')\n", "print(f'NumPy vectorized time: {t3 - t2:.4f}s')\n", "# Verify every element, not only the first few\n", "print('All values equal?', np.allclose(y_np, y_list))" ] }, { "cell_type": "markdown", "id": "75e488fc", "metadata": {}, "source": [ "## 3. Dtype and numerical precision\n", "\n", "Choosing a `dtype` is a tradeoff between precision and memory: `float32` uses 4 bytes per element and keeps roughly 7 significant decimal digits, while `float64` uses 8 bytes and keeps roughly 15 to 16. Machine epsilon, the gap between 1.0 and the next representable value, is a convenient measure of that precision." ] }, { "cell_type": "code", "execution_count": null, "id": "d70cdde1", "metadata": {}, "outputs": [], "source": [ "a32 = np.array([1, 2, 3], dtype=np.float32)\n", "a64 = np.array([1, 2, 3], dtype=np.float64)\n", "\n", "print('float32 itemsize:', a32.itemsize, 'bytes')\n", "print('float64 itemsize:', a64.itemsize, 'bytes')\n", "\n", "big = np.arange(1_000_000, dtype=np.float32)\n", "print('float32 total bytes:', big.nbytes)\n", "print('float64 total bytes:', big.astype(np.float64).nbytes)\n", "\n", "# Machine epsilon: spacing between 1.0 and the next representable value\n", "print('float32 eps:', np.finfo(np.float32).eps)\n", "print('float64 eps:', np.finfo(np.float64).eps)" ] }, { "cell_type": "markdown", "id": "1aa16ce0", "metadata": {}, "source": [ "## 4. Reproducible random numbers\n", "\n", "Use the modern `Generator` API (`np.random.default_rng`) for reproducible experiments: two generators created with the same seed produce the same sequence of draws." ] }, { "cell_type": "code", "execution_count": null, "id": "6cf4699b", "metadata": {}, "outputs": [], "source": [ "# Two independent generators seeded identically\n", "rng1 = np.random.default_rng(42)\n", "rng2 = np.random.default_rng(42)\n", "\n", "s1 = rng1.normal(loc=0.0, scale=1.0, size=5)\n", "s2 = rng2.normal(loc=0.0, scale=1.0, size=5)\n", "\n", "print('Sample 1:', s1)\n", "print('Sample 2:', s2)\n", "# array_equal tests exact equality; allclose would only test closeness within a tolerance\n", "print('Exactly same?', np.array_equal(s1, s2))" ] }, { "cell_type": "markdown", "id": "0e48d1ee", "metadata": {}, "source": [ "## 5. Practice\n", "\n",
"1. Create the array `x = [0, 0.1, 0.2, ..., 9.9]` (100 values).\n", "2. Compute `sin(x) + cos(x)` without loops.\n", "3. Time the vectorized version against an equivalent loop implementation." ] } ], "metadata": { "language_info": { "name": "python" } }, "nbformat": 4, "nbformat_minor": 5 }