Rust for scientific computing

July 1, 2024
by btel

I am very impressed by Python’s expressiveness, ease of programming and development speed. However, as a dynamically typed language, pure Python suffers from poor performance, which weighs heavily on numerical algorithms. Therefore, many computational tasks are dispatched to binary Python extensions implemented in C, C++ or other languages. Indeed, the ease of interfacing with external languages is another of Python’s superpowers, which has also earned it a reputation as a glue language ¹.

Recently, I have been experimenting with developing Python extensions in Rust, using the amazing pyo3² library (“crate” in the Rust world). Although the learning curve is steep at times (hello borrow checker and lifetime annotations), the official Rust book³ does a great job of explaining complex topics.

The performance improvement from extensions implemented in Rust is mind-blowing. You can easily achieve speed-ups of 10 times or more over pure Python. It’s not surprising that more and more Python libraries are (re)implemented in Rust (ruff, pydantic, polars). However, after spending a few hundred hours programming in Rust, I discovered other advantages of this language. Let’s go through some of them.

Efficient memory use

Rust and its borrow checker allow for very fine-grained control of memory use. You can choose to move, clone, or borrow the original data depending on the lifecycle of your objects and their planned use.

Borrowing saves memory and time when you want to share some data within your program without making expensive copies. This of course comes at a cost: if you keep the reference around, you need to make sure that it remains valid (the borrow checker will shout at you if it does not), which often means that you have to annotate its lifetime.

For many developers starting with Rust (including me), the lifetime is a foreign concept that takes time to get familiar with. In addition, Rust’s strict rules allow either one mutable reference or many immutable references to the same variable at any given time, which drastically limits what you can do with your objects. Fortunately, the standard library offers smart pointers that implement automatic reference counting (Rc, Arc) and copy-on-write (Cow) for your data structures. With Rc and Arc, cloning becomes cheap as it only needs to increment the reference counter behind the scenes, while Cow postpones the copy until the data actually has to be modified.
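To make the smart pointers a bit more concrete, here is a minimal sketch using only the standard library. The function clamp_negatives and the sample values are made up for illustration; the point is only that cloning an Rc shares the allocation and that a Cow allocates only when a change is really needed.

```rust
use std::borrow::Cow;
use std::rc::Rc;

/// Clamp negative readings to zero, copying the data only if a change is needed.
/// (Hypothetical helper, not taken from any library.)
fn clamp_negatives(readings: &[f64]) -> Cow<'_, [f64]> {
    if readings.iter().any(|&x| x < 0.0) {
        // At least one value must change, so allocate an owned copy.
        Cow::Owned(readings.iter().map(|&x| x.max(0.0)).collect())
    } else {
        // Nothing to change: hand back the borrowed slice, no allocation.
        Cow::Borrowed(readings)
    }
}

fn main() {
    // Rc: two owners share a single heap allocation.
    let samples = Rc::new(vec![1.0, 2.0, 3.0]);
    let shared = Rc::clone(&samples); // bumps the counter, does not copy the Vec
    println!("owners: {}", Rc::strong_count(&samples));
    println!("same allocation: {}", Rc::ptr_eq(&samples, &shared));

    // Cow: stays borrowed for clean data, becomes owned for modified data.
    let clean = clamp_negatives(&samples);
    let noisy = [1.0, -0.5];
    let fixed = clamp_negatives(&noisy);
    println!("{:?} {:?}", clean, fixed);
}
```

Arc works the same way as Rc but uses an atomic counter, so it can be shared between threads.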

If you need a polymorphic object (a value that can hold one of several types), Rust offers an original feature called enums with value variants⁴:

enum SensorReading {
    Temperature(f32),
    ParticleCount(i32),
    GateState(bool),
    Offline,
}

fn main() {
    let new_reading = SensorReading::ParticleCount(5);

    // match has to cover every variant, otherwise the code will not compile
    match new_reading {
        SensorReading::Temperature(value) => println!("Temperature is {}", value),
        SensorReading::ParticleCount(value) => println!("Detected {} particles", value),
        SensorReading::GateState(value) => {
            if value {
                println!("Gate is open")
            } else {
                println!("Gate is closed")
            }
        }
        SensorReading::Offline => println!("Sensor is offline"),
    }
}

To allow such enums to be allocated on the stack, every value of the enum has the same size, which is determined by its largest variant (plus a tag that identifies the active variant). This may seem wasteful, but it avoids expensive heap allocations.
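You can check the size yourself with std::mem::size_of. The sketch below repeats the enum from above so that it is self-contained; the exact numbers depend on the compiler and platform, so I do not list them here.

```rust
use std::mem::size_of;

#[allow(dead_code)] // the variants are only inspected for their size
enum SensorReading {
    Temperature(f32),
    ParticleCount(i32),
    GateState(bool),
    Offline,
}

fn main() {
    // The enum must be able to hold its largest payload (f32 or i32 here)...
    println!("f32:           {} bytes", size_of::<f32>());
    println!("bool:          {} bytes", size_of::<bool>());
    // ...plus whatever the compiler needs to remember the active variant.
    println!("SensorReading: {} bytes", size_of::<SensorReading>());
    // Every element has the same size, so an array of readings is one flat block.
    println!("[SensorReading; 100]: {} bytes", size_of::<[SensorReading; 100]>());
}
```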

Another interesting feature is methods that consume (move) the self object. This is useful for implementing processing pipelines in which one object is transformed into another, which in turn is transformed again. Once consumed, the original object can no longer be used, so no stale intermediate data is left behind.
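A small sketch of such a pipeline could look as follows. The types RawSignal and CalibratedSignal and their methods are hypothetical names that I made up for this example.

```rust
/// Raw sensor counts as read from the device (hypothetical type).
struct RawSignal {
    counts: Vec<i32>,
}

/// Signal converted to physical units (hypothetical type).
struct CalibratedSignal {
    volts: Vec<f64>,
}

impl RawSignal {
    /// Takes `self` by value: the raw signal is moved in and cannot be used afterwards.
    fn calibrate(self, gain: f64) -> CalibratedSignal {
        CalibratedSignal {
            volts: self.counts.into_iter().map(|c| f64::from(c) * gain).collect(),
        }
    }
}

impl CalibratedSignal {
    /// Consumes the calibrated signal and reduces it to its mean (None if empty).
    fn mean(self) -> Option<f64> {
        if self.volts.is_empty() {
            return None;
        }
        Some(self.volts.iter().sum::<f64>() / self.volts.len() as f64)
    }
}

fn main() {
    let raw = RawSignal { counts: vec![1, 2, 3] };
    let mean = raw.calibrate(0.5).mean();
    println!("mean = {:?}", mean);
    // `raw` was moved into `calibrate`; using it here would not compile:
    // println!("{}", raw.counts.len());
}
```

Because each step takes ownership of its input, the buffer of the previous stage can be reused or freed as soon as the next stage is built.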

All of this is coupled with memory safety guarantees. What is there not to like?

Helpful type system

The type system in Rust is designed such that it helps you to avoid errors. Therefore, you are encouraged to create custom types for your data structures and Rust will check whether they are used correctly.

For example, you may have a vector of integers encoding the timestamps of your measurements and another vector of integers holding the readings themselves. Even though the underlying data structure is the same, you should not use the same type to represent both, so that the compiler can stop you when you confuse timestamps with measurements⁵:

struct Timestamps(Vec<i32>);
struct Measurements(Vec<i32>);

fn resample(measurements: Measurements, timestamps: Timestamps) -> Measurements {
   // some code transforming measurements
   measurements
}


fn main() {
    let measurements = Measurements(vec![3, 4]);
    let timestamps = Timestamps(vec![1, 2]);

    // this will not compile (mismatched types):
    // let resampled = resample(timestamps, measurements);

    // this will compile
    let resampled = resample(measurements, timestamps);
}

This wrapping of built-in types in a custom structure is called the newtype pattern. The compiler translates the custom types into plain data structures, so there is no performance overhead in using them. This is what Rustaceans call zero-cost abstractions.
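As a quick sanity check of the zero-cost claim, the sketch below compares the size of a newtype with the size of the vector it wraps. The duration method is a hypothetical addition, included only to show how the wrapped data is reached through the .0 field.

```rust
use std::mem::size_of;

struct Timestamps(Vec<i32>);

impl Timestamps {
    /// Time between the first and the last sample, or None for an empty series.
    /// (Hypothetical method, for illustration only.)
    fn duration(&self) -> Option<i32> {
        // `.0` reaches the wrapped Vec
        Some(self.0.last()? - self.0.first()?)
    }
}

fn main() {
    let timestamps = Timestamps(vec![1, 2, 5]);
    println!("duration: {:?}", timestamps.duration());
    // The wrapper should take exactly as much space as the Vec inside it.
    println!(
        "Timestamps: {} bytes, Vec<i32>: {} bytes",
        size_of::<Timestamps>(),
        size_of::<Vec<i32>>()
    );
}
```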

However, don’t be surprised: Rust is not an object-oriented language and there is no inheritance, so you should favor composition for code reuse.

The strict type system also simplifies refactoring and, as much as I hate to admit it, it relieves the developer of some of the effort of writing unit tests. If your program compiles, in ~90% of cases it will run just fine (of course, you still have to test that it does not produce rubbish results).

Easy parallelism

Rust has a reputation for being extremely efficient. Some of the world’s fastest web servers and programming toolchains (including Node’s and Python’s toolchains) are implemented in Rust. What is more, you can further boost this performance with multithreading: since there is no GIL as in the Python world, threads can run in parallel on multiple CPU cores. Rust’s type system and borrow checker will help you avoid data races without hard-to-debug synchronization schemes.
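For illustration, here is a minimal sketch of manual multithreading with scoped threads from the standard library (std::thread::scope, available since Rust 1.63). The helper parallel_sum is a made-up name, and splitting the data into equal chunks is just one possible strategy.

```rust
use std::thread;

/// Sum a slice using up to `n_threads` worker threads (hypothetical helper).
fn parallel_sum(data: &[f64], n_threads: usize) -> f64 {
    if data.is_empty() {
        return 0.0;
    }
    let n_threads = n_threads.max(1);
    // Ceiling division so that every element lands in some chunk.
    let chunk_len = (data.len() + n_threads - 1) / n_threads;
    thread::scope(|scope| {
        // Spawn all workers first; each one only borrows its own chunk.
        let handles: Vec<_> = data
            .chunks(chunk_len)
            .map(|chunk| scope.spawn(move || chunk.iter().sum::<f64>()))
            .collect();
        // Then wait for the partial sums and add them up.
        handles
            .into_iter()
            .map(|handle| handle.join().expect("worker thread panicked"))
            .sum::<f64>()
    })
}

fn main() {
    let data: Vec<f64> = (1..=1000_i32).map(f64::from).collect();
    println!("sum = {}", parallel_sum(&data, 4));
}
```

The workers borrow the data without any locks or copies, and the compiler verifies that none of them can outlive or mutate it.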

This fearless concurrency greatly simplifies parallel programming, but there is no need to spawn and join threads manually – external crates such as rayon will turn your plain iterators into multi-core beasts⁶.

Conclusion

These are some of the features that I found useful and which, in my opinion, set Rust apart from other languages. I am sure that there are other gems that I will discover as my Rust competence grows. In the meantime, if you have your own favorite features, please share them with us.

Through the needle’s eye: Data science in production

March 27, 2019 (updated June 29, 2024)
by admin

Data science is often associated with quick-and-dirty data analysis and machine learning solutions that do not follow software engineering practices. Many regard data science as a kind of Swiss army knife that combines incompatible data and software components in impromptu ways.


Streaming data with Amazon Kinesis

January 11, 2019 (updated June 29, 2024)
by admin

I wrote this blog post when working at Sqreen, a startup that develops Software-as-a-service (SaaS) solutions to protect web applications from cyber attacks. This post summarizes the streaming technology used to analyse the attacks in real time.


Giving presentations with IPython notebook

August 29, 2014 (updated June 5, 2023)
by admin

The IPython notebook has become a very popular tool for programming short scripts in Python, interactive computing, sharing code, teaching or even demonstrations. Its advantage is the possibility to combine Python code with graphics, HTML, videos or even interactive JavaScript objects in one notebook. With this functionality it may also serve as a great presentation tool.


SfN 2012

October 6, 2012
by admin

From October 11th to 18th, I will be traveling to the Society for Neuroscience conference in New Orleans. If you are also there and want to meet, leave me a comment or send an e-mail.

Kiel 2012: Advanced Scientific Programming in Python

September 13, 2012
by admin

I just delivered another talk on data visualization in Python:

All materials including exercises can be found at https://python.g-node.org/wiki/dataviz

6 steps to migrate your scientific scripts to Python 3

June 15, 2012 (updated June 16, 2012)
by admin

Python 3 has been around for some time (the most recent stable version is Python 3.2), but until now it has not been widely adopted by the scientific community. One of the reasons was that basic scientific Python libraries such as NumPy and SciPy had not been ported to Python 3. Since this is no longer the case, there is no reason anymore to resist migrating to Python 3 (you can find the pros and cons on the Python website).

In this guide I am going to describe some tips that I learnt while trying to make my scripts compatible with Python 3. There is nothing to be afraid of – the procedures are actually quite easy and very rewarding (it is like a glimpse into the future of Python!).

SpikeSort 0.12 released!

April 19, 2012
by admin

The first official version of SpikeSort has finally been released. You can find out more about this spike sorting software at: http://spikesort.org

Scientific computing with GAE and PiCloud

February 20, 2012 (updated April 19, 2012)
by admin

Google App Engine (GAE) is a great platform for learning web programming and testing out new ideas. It is free and offers great functionality, such as the Channel API (basically WebSockets). Deployment is as easy as clicking a button (on a Mac) or running a Python script (on Linux). Best of all, you can program in Python and offer an easy end-user web interface without time-consuming installation, dependencies and frayed nerves.

How to run IPython on MacOSX

January 13, 2012 (updated April 19, 2012)
by btel

IPython is a very powerful and convenient Python console (an alternative to the standard Python interpreter) that makes everyday tasks much easier. It also plays well with scientific libraries such as numpy and matplotlib, making it the console of choice for almost every scientist.

Publication-quality figures with matplotlib and svgutils

April 12, 2011 (updated September 27, 2016)
by btel

Matplotlib is a decent Python library for creating publication-quality plots which offers a multitude of different plot types. However, one limitation of matplotlib is that creating complex layouts can be at times complicated.

New spike sorting library in Python

March 11, 2011 (updated April 19, 2012)
by admin

Spike sorting is a common pre-processing step in analysis of single or multi-unit responses. The goal of the procedure is to detect the times at which a single cell generated an action potential based on the extracellular recordings of electric potential close to the cell.

Generating LaTeX tables from CSV files

August 9, 2010 (updated April 19, 2012)
by btel

I am very committed to the idea of reproducibility. The way I understand the term is that there should be a close link between the results presented in the paper and the raw data. It happens all too often that some pre-processing step essential for the results presented in the paper is modified slightly during the preparation of the manuscript, but the figures, tables and statistics are not updated accordingly.

Python Autumn School 2010

July 15, 2010
by btel

Our next school for Advanced Programming in Python will take place in Trento, Italy on October 4th-8th, 2010. Application deadline: August 31st, 2010. Below you will find the detailed program:

Day 0 — Software Carpentry & Advanced Python

Documenting code and using version control
Object-oriented programming, design patterns, and agile programming
Exception handling, lambdas, decorators, context managers, metaclasses

Day 1 — Software Carpentry

Test-driven development, unit testing & Quality Assurance
Debugging, profiling and benchmarking techniques
Data serialization: from pickle to databases

Day 2 — Scientific Tools for Python

Advanced NumPy
The Quest for Speed (intro): Interfacing to C
Programming project

Day 3 — The Quest for Speed

Writing parallel applications in Python
When parallelization does not help: the starving CPUs problem
Programming project

Day 4 — Practical Software Development

Efficient programming in teams
Programming project
The Pac-Man Tournament

Advanced Scientific Programming in Python

September 12, 2009 (updated September 27, 2016)
by admin

I have just finished teaching at a summer school on Advanced Scientific Programming in Python.

The school was a remarkable success, which I hope most of the participants can agree with. Let’s wait for the survey results.

Among other things, the school featured a PacMan competition. More information can be found on the school wiki.

Simple point process models of spike trains

November 25, 2008 (updated November 27, 2008)
by btel

Regarding the stochastic models of neural activity, which are the topic of the lecture and one of our computer classes, I invite you to watch the lecture of Daniel Wojcik (Nencki Institute for Experimental Biology, Warsaw, Poland). The lecture will be broadcast live on Friday 28th at 16:00 GMT and then archived on the following website: www.spiketrain.org

Enjoy!

Bernstein Stammtisch

October 29, 2008
by btel

Today (Wednesday, October 29th) Bernstein Master students, PhD students, postdocs and other people interested in neuroscience are meeting in Buchhandlung at a Stammtisch. This monthly get-together is a great opportunity to get to know people related to the Bernstein Center and exchange some ideas about neuroscience and other current topics. You are all invited!

Buchhandlung
Tucholskystr., near the corner Auguststr.
October 29th, starting 19 hrs

I hope to meet you there.

MNS 2008/09

October 26, 2008
by admin

The Model of Neural Systems programming course will start on Monday, October 27th. It will be given by Robert Schmidt and me. The first programming assignments are available on the course webpage. See you all on Monday!

Bartosz Teleńczuk

freelance data scientist
