btel – Bartosz Teleńczuk

I am very impressed by Python’s expressiveness, ease of programming and development speed. However, as a dynamically typed language, pure Python suffers from poor performance, which heavily impacts numerical algorithms. Therefore, many computational tasks are dispatched to binary Python extensions implemented in C, C++ or other languages. Indeed, the ease of interfacing with external languages is another of Python’s superpowers, which also led to its reputation as the glue language¹.

Recently, I have been experimenting with developing Python extensions in Rust, using the amazing pyo3² library (a “crate” in the Rust world). Although the learning curve is steep at times (hello borrow checker and lifetime annotations), the official Rust book³ does a great job of explaining complex topics.

The performance improvement of extensions implemented in Rust is mind-blowing. You can easily achieve speed-ups of 10 times or more over pure Python. It’s not surprising that more and more Python libraries are (re)implemented in Rust (ruff, pydantic, polars). However, after spending a few hundred hours programming in Rust, I discovered other advantages of this language. Let’s go through some of them.

Efficient memory use

Rust and its borrow checker allow very fine-grained control over memory use. You can choose to move, clone, or borrow the original data depending on the lifecycle of your objects and their planned use.
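To make the three options concrete, here is a minimal sketch. The function names `total` and `into_sorted` are hypothetical and chosen only for this illustration: one borrows its input, the other takes ownership of it.

```rust
// Borrows the data: the caller keeps ownership and nothing is copied.
fn total(values: &[i32]) -> i32 {
    values.iter().sum()
}

// Takes ownership: the vector is moved in, sorted in place and moved out.
fn into_sorted(mut values: Vec<i32>) -> Vec<i32> {
    values.sort();
    values
}

fn main() {
    let readings = vec![3, 1, 2];

    let sum = total(&readings); // borrow: `readings` is still usable afterwards
    let backup = readings.clone(); // clone: an explicit, visible deep copy
    let sorted = into_sorted(readings); // move: `readings` cannot be used after this line

    println!("sum = {}, backup = {:?}, sorted = {:?}", sum, backup, sorted);
}
```

The point of the sketch is that every copy is spelled out with `clone()`, so expensive copies never happen silently.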

Borrowing saves memory and time when you want to share data within your program without expensive copies. This of course comes at a cost: if you keep the reference around, you need to make sure that it remains valid (the borrow checker will shout at you if it does not), which often means that you have to annotate its lifetime.
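As a rough sketch of what such a lifetime annotation looks like, consider a struct that keeps a reference to part of a larger buffer. The `Window` type and the `first_window` function are hypothetical names used only for this example.

```rust
// A view into someone else's data. The lifetime `'a` tells the compiler
// that a `Window` must not outlive the slice it points to.
struct Window<'a> {
    samples: &'a [f64],
}

impl<'a> Window<'a> {
    fn mean(&self) -> f64 {
        self.samples.iter().sum::<f64>() / self.samples.len() as f64
    }
}

// Borrows at most the first `n` samples of `data` without copying them.
fn first_window<'a>(data: &'a [f64], n: usize) -> Window<'a> {
    Window {
        samples: &data[..n.min(data.len())],
    }
}

fn main() {
    let data = vec![1.0, 2.0, 3.0, 4.0];
    let window = first_window(&data, 2);
    println!("mean of the first window: {}", window.mean());
    // `data` was only borrowed, so it is still available here.
    println!("total number of samples: {}", data.len());
}
```

If `data` were dropped while `window` was still in use, the program would be rejected at compile time rather than fail at run time.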

For many developers starting with Rust (including me), lifetimes are a foreign concept that takes time to get used to. In addition, Rust’s strict rules allow either one mutable reference or any number of immutable references to the same variable at a time, which drastically limits what you can do with your objects. Fortunately, the standard library offers smart pointers that implement automatic reference counting (Rc, Arc) and copy-on-write (Cow) for your data structures. Cloning an Rc or Arc is cheap, as it only increments the reference counter behind the scenes, and a Cow copies the data only when it actually needs to be modified.
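The following sketch shows both kinds of smart pointers using only the standard library. The helper `clamp_negative` is a hypothetical function invented for this example; it returns the input untouched when no change is needed and allocates a copy only otherwise.

```rust
use std::borrow::Cow;
use std::rc::Rc;

// Replaces negative readings with zero, copying the data only if a
// negative value is actually present.
fn clamp_negative(values: &[i32]) -> Cow<'_, [i32]> {
    if values.iter().any(|&v| v < 0) {
        Cow::Owned(values.iter().map(|&v| v.max(0)).collect())
    } else {
        Cow::Borrowed(values)
    }
}

fn main() {
    // Cloning an Rc copies a pointer and increments a counter;
    // the vector itself is shared, not duplicated.
    let shared = Rc::new(vec![1, 2, 3]);
    let alias = Rc::clone(&shared);
    println!("owners: {}", Rc::strong_count(&shared));
    println!("same allocation: {}", Rc::ptr_eq(&shared, &alias));

    let clean = [1, 2, 3];
    let dirty = [1, -2, 3];
    println!("{:?}", clamp_negative(&clean)); // borrowed, no allocation
    println!("{:?}", clamp_negative(&dirty)); // owned copy with the fix applied
}
```

Arc works the same way as Rc but uses an atomic counter, so it can be shared between threads.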

If you need a polymorphic object (a variable that can hold values of more than one type), Rust offers a powerful feature called enums with value variants⁴:

enum SensorReading {
    Temperature(f32),
    ParticleCount(i32),
    GateState(bool),
    Offline,
}

fn main() {
    let new_reading = SensorReading::ParticleCount(5);

    // `match` has to cover every variant, otherwise the code does not compile
    match new_reading {
        SensorReading::Temperature(value) => println!("Temperature is {}", value),
        SensorReading::ParticleCount(value) => println!("Detected {} particles", value),
        SensorReading::GateState(value) => {
            if value {
                println!("Gate is open")
            } else {
                println!("Gate is closed")
            }
        }
        SensorReading::Offline => println!("Sensor is offline"),
    }
}

To allow such enums to be allocated on the stack, their size is determined by the largest variant (plus a tag that identifies the active variant). This may seem wasteful, but it avoids expensive heap allocations.
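You can inspect this yourself with `std::mem::size_of`. The sketch below repeats the `SensorReading` enum from above; the exact size is chosen by the compiler, so the example prints it instead of assuming a particular number.

```rust
use std::mem::size_of;

#[allow(dead_code)]
enum SensorReading {
    Temperature(f32),
    ParticleCount(i32),
    GateState(bool),
    Offline,
}

fn main() {
    // Every value of the enum occupies the same number of bytes,
    // no matter which variant it currently holds.
    println!("f32: {} bytes", size_of::<f32>());
    println!("bool: {} bytes", size_of::<bool>());
    println!("SensorReading: {} bytes", size_of::<SensorReading>());

    // Because the size is fixed, readings sit side by side in one buffer
    // without a separate heap allocation per element.
    let readings = [SensorReading::Offline, SensorReading::Temperature(21.5)];
    println!("{} readings stored inline", readings.len());
}
```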

Another interesting feature is methods that consume (move) the self object. This is useful for implementing processing pipelines, where one object is transformed into another, which in turn is transformed again.
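A minimal sketch of such a pipeline might look as follows. The stage types `RawSignal`, `Detrended` and `Normalized` and their methods are hypothetical and serve only as an illustration; each method takes `self` by value, so the buffer is handed from one stage to the next without being copied.

```rust
// Hypothetical pipeline stages; each one wraps the same buffer.
struct RawSignal(Vec<f64>);
struct Detrended(Vec<f64>);
struct Normalized(Vec<f64>);

impl RawSignal {
    // Consumes the raw signal and reuses its buffer for the result.
    fn detrend(self) -> Detrended {
        let mut values = self.0;
        let mean = values.iter().sum::<f64>() / values.len() as f64;
        for v in values.iter_mut() {
            *v -= mean;
        }
        Detrended(values)
    }
}

impl Detrended {
    // Consumes the detrended signal and scales it to a peak amplitude of 1.
    fn normalize(self) -> Normalized {
        let mut values = self.0;
        let peak = values.iter().fold(0.0_f64, |m, v| m.max(v.abs()));
        if peak > 0.0 {
            for v in values.iter_mut() {
                *v /= peak;
            }
        }
        Normalized(values)
    }
}

fn main() {
    let raw = RawSignal(vec![1.0, 2.0, 3.0]);
    let result = raw.detrend().normalize();
    // `raw` has been moved; using it here would be a compile-time error.
    println!("{:?}", result.0);
}
```

Because the intermediate types differ, the compiler also rejects calling `normalize` on a signal that has not been detrended first.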

All of this is coupled with memory safety guarantees. What is there not to like?

Helpful type system

The type system in Rust is designed such that it helps you to avoid errors. Therefore, you are encouraged to create custom types for your data structures and Rust will check whether they are used correctly.

For example, you may have a vector of integers encoding the timestamps of your measurements, where each measurement is itself a vector of integers with readings from different sensors. Even though the underlying data structure is the same, you use different types to represent them, so that the compiler reports an error when you confuse timestamps with measurements⁵:

struct Timestamps(Vec<i32>);
struct Measurements(Vec<i32>);

fn resample(measurements: Measurements, timestamps: Timestamps) -> Measurements {
   // some code transforming measurements
   measurements
}


fn main() {
    let measurements = Measurements(vec![3, 4]);
    let timestamps = Timestamps(vec![1, 2]);

    // this will not compile: the arguments are swapped
    // let resampled = resample(timestamps, measurements);

    // this will compile
    let resampled = resample(measurements, timestamps);
}

This wrapping of built-in types in a custom structure is called the newtype pattern. The compiler translates the custom types into plain data structures, so using them carries no performance overhead. This is what Rustaceans call zero-cost abstractions.
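The absence of overhead can be checked with a small sketch. Note two assumptions that are not part of the example above: the `#[repr(transparent)]` attribute, added here to make the identical layout an explicit guarantee, and the hypothetical `span` method, which shows that a newtype can also carry behaviour of its own.

```rust
use std::mem::size_of;

// `repr(transparent)` guarantees that the wrapper has exactly the
// layout of its single field.
#[repr(transparent)]
struct Timestamps(Vec<i32>);

impl Timestamps {
    // Time between the first and the last timestamp, if there are any.
    fn span(&self) -> Option<i32> {
        Some(self.0.last()? - self.0.first()?)
    }
}

fn main() {
    println!("Vec<i32>:   {} bytes", size_of::<Vec<i32>>());
    println!("Timestamps: {} bytes", size_of::<Timestamps>());

    let timestamps = Timestamps(vec![1, 5, 9]);
    println!("span: {:?}", timestamps.span());
}
```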

However, don’t be surprised: Rust is not an object-oriented language and there is no inheritance, so you should favor composition for code reuse.
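As a rough illustration of composition, the sketch below reuses calibration logic by embedding one struct in another and shares an interface through a trait. All names (`Calibration`, `Thermometer`, `Describe`) are hypothetical.

```rust
// Shared interface, implemented separately by every type that needs it.
trait Describe {
    fn describe(&self) -> String;
}

// Reusable piece of logic.
struct Calibration {
    offset: f32,
    gain: f32,
}

impl Calibration {
    fn apply(&self, raw: f32) -> f32 {
        (raw + self.offset) * self.gain
    }
}

// The thermometer *has a* calibration instead of inheriting from one.
struct Thermometer {
    calibration: Calibration,
    raw: f32,
}

impl Thermometer {
    fn read(&self) -> f32 {
        self.calibration.apply(self.raw)
    }
}

impl Describe for Thermometer {
    fn describe(&self) -> String {
        format!("thermometer reading {:.1}", self.read())
    }
}

fn main() {
    let sensor = Thermometer {
        calibration: Calibration { offset: 1.0, gain: 2.0 },
        raw: 4.0,
    };
    println!("{}", sensor.describe());
}
```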

The strict type system also simplifies refactoring and, as much as I hate to admit it, relieves the developer of some of the effort of writing unit tests. If your program compiles, in ~90% of cases it will run just fine (of course, you still have to test that it does not produce rubbish results).

Easy parallelism

Rust has a reputation for being extremely efficient. Some of the fastest web servers and developer toolchains (including tools for the Node.js and Python ecosystems) are implemented in Rust. What is more, you can boost this performance further with multithreading: since Rust has no global interpreter lock (GIL) like the one known from the Python world, threads can run in parallel on multiple CPU cores. Rust’s type system and borrow checker help you avoid data races without hard-to-debug synchronization schemes.

This fearless concurrency greatly simplifies parallel programming, and often there is no need to spawn and join threads manually: external crates such as rayon will turn your plain iterators into multi-core beasts⁶.
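To show what the compiler checks for you, here is a dependency-free sketch that uses scoped threads from the standard library instead of rayon. The function `parallel_sum_of_squares` is a hypothetical example. With rayon, the same computation is usually written as a one-liner along the lines of `data.par_iter().map(|x| x * x).sum()` after `use rayon::prelude::*`.

```rust
use std::thread;

// Sums the squares of `data`, splitting the work over up to `n_threads`
// scoped threads. Each thread only reads its own chunk, which the borrow
// checker verifies at compile time.
fn parallel_sum_of_squares(data: &[i64], n_threads: usize) -> i64 {
    let n = n_threads.max(1);
    // Round up so that all elements are covered; chunks must not be empty.
    let chunk_len = ((data.len() + n - 1) / n).max(1);

    thread::scope(|scope| {
        // Spawn one thread per chunk before joining any of them.
        let handles: Vec<_> = data
            .chunks(chunk_len)
            .map(|chunk| scope.spawn(move || chunk.iter().map(|x| x * x).sum::<i64>()))
            .collect();

        handles
            .into_iter()
            .map(|handle| handle.join().unwrap())
            .sum::<i64>()
    })
}

fn main() {
    let data: Vec<i64> = (1..=100).collect();
    println!("sum of squares: {}", parallel_sum_of_squares(&data, 4));
}
```

Scoped threads may borrow `data` directly because the scope guarantees that all threads finish before the function returns.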

Conclusion

These are some of the features that I found useful and that, in my opinion, set Rust apart from other languages. I am sure there are other gems that I will discover as my Rust competence grows. In the meantime, if you have favorite features of your own, please share them with us.

Our next school for Advanced Programming in Python will take place in Trento, Italy, on October 4th-8th, 2010. Application deadline: August 31st, 2010. Below you will find the detailed program:

Day 0 — Software Carpentry & Advanced Python

Documenting code and using version control
Object-oriented programming, design patterns, and agile programming
Exception handling, lambdas, decorators, context managers, metaclasses

Day 1 — Software Carpentry

Test-driven development, unit testing & Quality Assurance
Debugging, profiling and benchmarking techniques
Data serialization: from pickle to databases

Day 2 — Scientific Tools for Python

Advanced NumPy
The Quest for Speed (intro): Interfacing to C
Programming project

Day 3 — The Quest for Speed

Writing parallel applications in Python
When parallelization does not help: the starving CPUs problem
Programming project

Day 4 — Practical Software Development

Efficient programming in teams
Programming project
The Pac-Man Tournament

Bartosz Teleńczuk

freelance data scientist

Author: btel

Rust for scientific computing

