Rust for scientific computing

I am very impressed by Python’s expressiveness, ease of programming and development speed. However, as a dynamically typed, interpreted language, pure Python suffers from poor performance, which heavily impacts numerical algorithms. Therefore, computational tasks are often dispatched to binary Python extensions implemented in C, C++ or other languages. Indeed, the ease of interfacing with external languages is another of Python’s superpowers, and it has earned Python its reputation as the glue language [1].

Recently, I have been experimenting with developing Python extensions in Rust, using the amazing PyO3 [2] library (“crate” in the Rust world). Although the learning curve is steep at times (hello borrow checker and lifetime annotations), the official Rust book [3] does a great job of explaining complex topics.

The performance improvement from extensions implemented in Rust is mind-blowing: speed-ups of 10 times or more are easy to achieve. It’s not surprising that more and more Python libraries are (re)implemented in Rust (ruff, pydantic, polars). However, after spending a few hundred hours programming in Rust, I discovered other advantages of this language. Let’s go through some of them.

Efficient memory use

Rust and its borrow checker allow for very fine-grained control of memory use. You can choose to move, clone, or borrow the original data depending on the lifecycle of your objects and their planned use.

Borrowing saves memory and time when you want to share some data within your program without expensive copies. This of course comes at a cost: if you keep the reference around, you need to make sure that it remains valid (the borrow checker will shout at you if it does not), which often means that you have to annotate its lifetime.
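To make the three options concrete, here is a minimal sketch. The names `Window`, `first_half` and `total` are made up for this illustration; the point is only to show a borrow with a lifetime annotation next to an explicit clone and a move.

```rust
/// A view into a slice of samples. The lifetime `'a` ties the view to the
/// data it borrows from, so the view can never outlive that data.
struct Window<'a> {
    samples: &'a [f64],
}

impl<'a> Window<'a> {
    fn mean(&self) -> f64 {
        self.samples.iter().sum::<f64>() / self.samples.len() as f64
    }
}

/// Borrowing: the caller keeps ownership of `data`, nothing is copied.
fn first_half(data: &[f64]) -> Window<'_> {
    Window { samples: &data[..data.len() / 2] }
}

/// Moving: the function takes ownership, the caller can no longer use the vector.
fn total(data: Vec<f64>) -> f64 {
    data.into_iter().sum()
}

fn main() {
    let data = vec![1.0, 3.0, 5.0, 7.0];

    // Borrow: `window` points into `data` without copying it.
    let window = first_half(&data);
    println!("mean of first half: {}", window.mean());

    // Clone: an explicit and possibly expensive copy.
    let backup = data.clone();

    // Move: `data` is gone after this call; using it again would not compile.
    let sum = total(data);
    println!("sum: {}, backup still holds {} samples", sum, backup.len());
}
```

If `window` were used after `total(data)`, the borrow checker would reject the program, because the view would then point to data that has been moved away.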

For many developers starting with Rust (including me), the lifetime is a foreign concept that takes time to get familiar with. In addition, according to Rust’s strict rules you can have either one mutable reference or many immutable references to the same variable at a time, which drastically limits what you can do with your objects. Fortunately, the standard library offers smart pointers that implement automatic reference counting (Rc, Arc) and clone-on-write (Cow) for your data structures. With Rc and Arc, cloning becomes cheap because it only increments the reference counter behind the scenes; with Cow, the copy is postponed until the data actually has to be modified.
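A small sketch of both kinds of smart pointers, using only the standard library. The function `clamp_negative` and the variable names are invented for this example.

```rust
use std::borrow::Cow;
use std::rc::Rc;

/// Replaces negative readings with zero. A new vector is allocated only
/// when something actually has to change; otherwise the input is borrowed.
fn clamp_negative(readings: &[i32]) -> Cow<'_, [i32]> {
    if readings.iter().any(|&r| r < 0) {
        Cow::Owned(readings.iter().map(|&r| r.max(0)).collect())
    } else {
        Cow::Borrowed(readings)
    }
}

fn main() {
    // Rc: cloning copies the pointer and bumps the counter, not the data.
    let calibration = Rc::new(vec![0.5_f64; 1_000_000]);
    let shared = Rc::clone(&calibration);
    println!("owners: {}", Rc::strong_count(&calibration));
    println!("same allocation: {}", Rc::ptr_eq(&calibration, &shared));

    // Cow: borrowed when nothing has to change, owned otherwise.
    let clean = [1, 2, 3];
    let noisy = [1, -2, 3];
    println!("clean copied: {}", matches!(clamp_negative(&clean), Cow::Owned(_)));
    println!("noisy copied: {}", matches!(clamp_negative(&noisy), Cow::Owned(_)));
}
```

Rc is limited to a single thread; Arc offers the same interface with an atomic counter, so it can be shared between threads.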

If you need a polymorphic object (a variable that can hold values of more than one type), Rust offers enums whose variants can carry values [4]:

enum SensorReading {
    Temperature(f32),
    ParticleCount(i32),
    GateState(bool),
    Offline,
}

fn main() {
    let new_reading = SensorReading::ParticleCount(5);

    // `match` has to cover every variant, otherwise the code does not compile
    match new_reading {
        SensorReading::Temperature(value) => println!("Temperature is {}", value),
        SensorReading::ParticleCount(value) => println!("Detected {} particles", value),
        SensorReading::GateState(true) => println!("Gate is open"),
        SensorReading::GateState(false) => println!("Gate is closed"),
        SensorReading::Offline => println!("Sensor is offline"),
    }
}

To allow such enums to be allocated on the stack, every value is as large as the largest variant (plus a small tag that records which variant is active). This may seem inefficient, but it avoids expensive heap allocations.
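You can inspect this trade-off with `std::mem::size_of`. In the sketch below, `BulkyReading` and `BoxedReading` are hypothetical variations of the enum above, added only to show how one large variant inflates every value and how boxing the payload trades this for a heap allocation. The exact sizes depend on the compiler and the platform, so the program prints them rather than assuming them.

```rust
use std::mem::size_of;

#[allow(dead_code)]
enum SensorReading {
    Temperature(f32),
    ParticleCount(i32),
    GateState(bool),
    Offline,
}

/// Hypothetical enum with one large payload stored inline.
#[allow(dead_code)]
enum BulkyReading {
    Spectrum([f32; 256]),
    Offline,
}

/// Same idea, but the large payload lives on the heap behind a `Box`.
#[allow(dead_code)]
enum BoxedReading {
    Spectrum(Box<[f32; 256]>),
    Offline,
}

fn main() {
    println!("f32:           {} bytes", size_of::<f32>());
    println!("SensorReading: {} bytes", size_of::<SensorReading>());
    println!("BulkyReading:  {} bytes", size_of::<BulkyReading>());
    println!("BoxedReading:  {} bytes", size_of::<BoxedReading>());
}
```

Every `BulkyReading` value, including `Offline`, occupies at least the 1024 bytes needed for the spectrum, while `BoxedReading` stays small.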

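Another interesting feature is methods that consume (move) the self object. This is useful for implementing processing pipelines in which one object is transformed into another, which in turn is transformed again. Because each step takes ownership of its input, the compiler guarantees that a stale intermediate object cannot be used by accident.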
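Such a pipeline could look like the sketch below. The stage names (`RawSignal`, `Detrended`, `Summary`) and the processing itself are invented for the illustration.

```rust
/// Hypothetical pipeline stages, named for illustration only.
struct RawSignal(Vec<f64>);
struct Detrended(Vec<f64>);
struct Summary {
    peak: f64,
}

impl RawSignal {
    /// Takes `self` by value: the raw signal is moved into the next stage
    /// and its buffer is reused instead of copied.
    fn detrend(self) -> Detrended {
        let mut samples = self.0;
        let mean = samples.iter().sum::<f64>() / samples.len() as f64;
        for s in samples.iter_mut() {
            *s -= mean;
        }
        Detrended(samples)
    }
}

impl Detrended {
    /// Consumes the detrended signal and keeps only its largest deviation.
    fn summarize(self) -> Summary {
        let peak = self.0.iter().fold(0.0_f64, |acc, s| acc.max(s.abs()));
        Summary { peak }
    }
}

fn main() {
    let raw = RawSignal(vec![1.0, 2.0, 6.0]);
    let summary = raw.detrend().summarize();
    println!("peak deviation from the mean: {}", summary.peak);
    // `raw` was consumed by `detrend`; using it here would be a compile error.
}
```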

All of this is coupled with memory safety guarantees. What is there not to like?

Helpful type system

The type system in Rust is designed to help you avoid errors. Therefore, you are encouraged to create custom types for your data structures, and Rust will check whether they are used correctly.

For example, you may have a vector of integers encoding the timestamps of your measurements and another vector of integers holding the sensor readings. Even though the underlying data structure is the same, you should not use the same type to represent both, so that the compiler can stop you when you confuse timestamps with measurements [5]:

struct Timestamps(Vec<i32>);
struct Measurements(Vec<i32>);

fn resample(measurements: Measurements, timestamps: Timestamps) -> Measurements {
    // some code transforming measurements
    measurements
}

fn main() {
    let measurements = Measurements(vec![3, 4]);
    let timestamps = Timestamps(vec![1, 2]);

    // this will not compile: the arguments are swapped
    // let resampled = resample(timestamps, measurements);

    // this will compile
    let resampled = resample(measurements, timestamps);
}

This wrapping of built-in types in a custom structure is called the newtype pattern. The compiler translates the custom types into plain data structures, so there is no performance overhead in using them. This is what Rustaceans call zero-cost abstractions.

However, don’t be surprised: Rust is not a classical object-oriented language and there is no inheritance, so you should favor composition for code reuse.

The strict type system also simplifies refactoring and, as much as I hate to admit it, it relieves the developer of some of the effort of writing unit tests. If your program compiles, in ~90% of cases it will run just fine (of course, you still have to test that it does not produce rubbish results).
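As a rough sketch of what composition looks like in practice: shared behaviour goes into a trait, and a struct reuses code by embedding another struct and delegating to it. The names `Describe`, `Metadata` and `Thermometer` are made up for this example.

```rust
/// Shared behaviour is expressed as a trait rather than a base class.
trait Describe {
    fn describe(&self) -> String;
}

/// Reusable building block.
struct Metadata {
    name: String,
    unit: String,
}

impl Describe for Metadata {
    fn describe(&self) -> String {
        format!("{} [{}]", self.name, self.unit)
    }
}

/// `Thermometer` has a `Metadata` instead of inheriting from it.
struct Thermometer {
    meta: Metadata,
    last_value: f32,
}

impl Describe for Thermometer {
    fn describe(&self) -> String {
        // Delegate to the embedded component and add our own details.
        format!("{} = {}", self.meta.describe(), self.last_value)
    }
}

fn main() {
    let sensor = Thermometer {
        meta: Metadata { name: "probe-1".into(), unit: "K".into() },
        last_value: 293.5,
    };
    println!("{}", sensor.describe());
}
```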

Easy parallelism

Rust has a reputation for being extremely efficient. Some of the fastest web servers and developer tools (including tools for the Node.js and Python ecosystems) are implemented in Rust. What is more, you can further boost this performance with multithreading: as there is no GIL like the one known from the Python world, threads can run in parallel on multiple CPU cores. Rust’s type system and borrow checker will help you avoid data races without hard-to-debug synchronization schemes.

This fearless concurrency greatly simplifies parallel programming, and there is not even a need to spawn and join threads manually: external crates such as rayon will turn your plain iterators into multi-core beasts [6].
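To show what the compiler checks for you, here is a sketch of the manual approach using only the standard library (scoped threads, available since Rust 1.63). The function name `parallel_sum_of_squares` and the chunking strategy are my own choices for the illustration. With rayon, a reduction like this is typically written as a one-liner on `par_iter()` instead.

```rust
use std::thread;

/// Sums the squares of `data`, splitting the work over up to `n_threads`
/// scoped threads. Each thread only reads its own chunk of the slice.
fn parallel_sum_of_squares(data: &[f64], n_threads: usize) -> f64 {
    let n = n_threads.max(1);
    // Ceiling division; at least 1, because `chunks(0)` would panic.
    let chunk_len = ((data.len() + n - 1) / n).max(1);

    thread::scope(|scope| {
        // Spawn all workers first, then join them.
        let handles: Vec<_> = data
            .chunks(chunk_len)
            .map(|chunk| scope.spawn(move || chunk.iter().map(|x| x * x).sum::<f64>()))
            .collect();

        handles
            .into_iter()
            .map(|handle| handle.join().expect("worker thread panicked"))
            .sum::<f64>()
    })
}

fn main() {
    let data: Vec<f64> = (1..=1000).map(f64::from).collect();
    let total = parallel_sum_of_squares(&data, 4);
    println!("sum of squares: {}", total);
}
```

The threads borrow `data` immutably, so sharing it is allowed. If one of the closures tried to modify the shared vector, the program would be rejected at compile time instead of producing a data race at run time.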

Conclusion

These are some of the features that I have found useful and that, in my opinion, set Rust apart from other languages. I am sure that there are other gems that I will discover as my Rust competence grows. In the meantime, if you have your own favorite features, please share them with us.

  1. https://numpy.org/doc/stable/user/c-info.python-as-glue.html
  2. https://pyo3.rs/
  3. https://doc.rust-lang.org/stable/book/
  4. Run this code on Rust playground
  5. Run this code on Rust playground
  6. Blog post introducing rayon

Generating LaTeX tables from CSV files

I am very committed to the idea of reproducibility. The way I understand the term is that there should be a close link between the results presented in the paper and the raw data. It happens all too often that some pre-processing step essential for the results presented in the paper is modified slightly during the preparation of the manuscript, but the figures, tables and statistics are not updated accordingly.

Python Autumn School 2010

Our next school for Advanced Programming in Python will take place in Trento, Italy on October 4th-8th, 2010. Application deadline: August 31st, 2010. Below you will find the detailed program:

Day 0 — Software Carpentry & Advanced Python

  • Documenting code and using version control
  • Object-oriented programming, design patterns, and agile programming
  • Exception handling, lambdas, decorators, context managers, metaclasses

Day 1 — Software Carpentry

  • Test-driven development, unit testing & Quality Assurance
  • Debugging, profiling and benchmarking techniques
  • Data serialization: from pickle to databases

Day 2 — Scientific Tools for Python

  • Advanced NumPy
  • The Quest for Speed (intro): Interfacing to C
  • Programming project

Day 3 — The Quest for Speed

  • Writing parallel applications in Python
  • When parallelization does not help: the starving CPUs problem
  • Programming project

Day 4 — Practical Software Development

  • Efficient programming in teams
  • Programming project
  • The Pac-Man Tournament

Bernstein Stammtisch

Today (Wednesday, October 29th) Bernstein Master students, PhD students, Postdocs and other people interested in neuroscience are meeting in Buchhandlung for a Stammtisch. This monthly reunion is a great opportunity to get to know people related to the Bernstein Center and to exchange ideas about neuroscience and other current topics. You are all invited!

Buchhandlung
Tucholskystr., near the corner Auguststr.
October 29th, starting 19 hrs

I hope to meet you there.