Presented in the advanced tutorial track at the EuroSciPy conference in Erlangen in 2016.
Introduction
In scientific computing we often munge large arrays of numbers coming from experiments or simulations. For example, one may try to process time series coming from sensors or summarise assays obtained with high-throughput technologies. These data are often small enough to be stored and analysed on a single machine, but large enough to require CPU- and memory-efficient algorithms.
In this tutorial I will present how to work with multi-dimensional (mainly numerical) arrays in Python. I will focus on NumPy and its ndarray and show how to leverage its implementation to boost performance of data processing. I will also demonstrate how to work with labelled data in NumPy and xarray.
Format
The tutorial will consist of short demos of some advanced NumPy features followed by hands-on problems. The format is inspired by Software Carpentry workshops and will use many of their teaching techniques.
Pre-requisites
- installation of Python 3, Jupyter notebook, NumPy, xarray, Cython, pillow and matplotlib
You can install all dependencies with conda:
conda create -n advanced_numpy python=3 notebook numpy matplotlib xarray pillow cython
This will create the advanced_numpy environment, which you can activate with:
source activate advanced_numpy
- familiarity with basic features of NumPy (creating arrays, indexing, elementwise operations, axis-based reductions)
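As a rough self-check for the second prerequisite, the sketch below touches each of the basic features named there. It is an illustration only, not part of the tutorial materials, and the variable names are arbitrary. If every line reads as familiar, you should be well prepared.

```python
import numpy as np

# Creating arrays: a 3x4 array holding the integers 0..11
a = np.arange(12).reshape(3, 4)

# Indexing: the second row, and a slice with the last two columns
row = a[1]          # shape (4,)
cols = a[:, -2:]    # shape (3, 2)

# Elementwise operation: applied to every element independently
squared = a ** 2

# Axis-based reductions: collapse one axis at a time
col_sums = a.sum(axis=0)    # one value per column, shape (4,)
row_means = a.mean(axis=1)  # one value per row, shape (3,)

print(col_sums)   # [12 15 18 21]
print(row_means)  # [1.5 5.5 9.5]
```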
Tentative syllabus
- broadcasting
- ndarray object, manipulating strides
- dtypes and structured arrays
- labelled arrays with xarray
- interfacing with other libraries through __array_interface__
- ufuncs and generalised ufuncs
- extending NumPy with Cython
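To give a flavour of a few of these topics, here is a minimal sketch using plain NumPy only. It is not taken from the tutorial notebooks. The field names `value` and `sensor` are made up for illustration, and the xarray and Cython topics are left out because they need extra dependencies.

```python
import numpy as np

# Broadcasting: a (3, 1) column and a (4,) row combine into a (3, 4) table
col = np.arange(3).reshape(3, 1)
row = np.arange(4)
table = col * 10 + row          # table[i, j] == 10 * i + j

# Strides: number of bytes to step along each axis.
# Transposing swaps the strides instead of copying the data.
x = np.zeros((3, 4), dtype=np.int64)
strides = x.strides             # (32, 8) for a C-contiguous int64 array
strides_t = x.T.strides         # (8, 32)

# Structured array: named fields, each with its own dtype
# ("value" and "sensor" are hypothetical field names)
samples = np.array([(0.5, 1), (1.5, 2)],
                   dtype=[("value", "f8"), ("sensor", "i4")])
values = samples["value"]       # view on the float field only

# __array_interface__: a dict describing the underlying buffer,
# which other libraries can read to share memory without copying
iface = x.__array_interface__
shape = iface["shape"]          # (3, 4)

# ufuncs carry methods such as reduce; np.add.reduce sums along an axis
total = np.add.reduce(row)      # 0 + 1 + 2 + 3
```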
Materials
All materials are available as Jupyter notebooks on GitHub.