Rust for scientific computing

I am very impressed by Python’s expressiveness, ease of programming and development speed. However, as a dynamically typed language, pure Python suffers from poor performance, which heavily impacts numerical algorithms. Therefore, many computational tasks are dispatched to binary Python extensions implemented in C, C++ or other languages. Indeed, the ease of interfacing with external languages is another of Python’s superpowers, which has also earned it a reputation as a glue language [1].

Recently, I have been experimenting with developing Python extensions in Rust, using the amazing PyO3 library [2] (a “crate” in the Rust world). Although the learning curve is steep at times (hello borrow checker and lifetime annotations), the official Rust book [3] does a great job of explaining complex topics.

The performance improvement of extensions implemented in Rust is mind-blowing. You can easily achieve speed-ups of 10 times or more. It’s not surprising that more and more Python libraries are (re)implemented in Rust (ruff, pydantic, polars). However, after spending a few hundred hours programming in Rust, I discovered other advantages of this language. Let’s go through some of them.

Efficient memory use

Rust and its borrow checker allow for very fine-grained control over memory use. You can choose to move, clone, or borrow the original data depending on the lifecycle of your objects and their planned use.

Borrowing saves memory and time when you want to share some data within your program without expensive copies. This of course comes at a cost: if you keep a reference around, you need to make sure that it remains valid (the borrow checker will shout at you if it does not), which often means that you have to annotate its lifetime.
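To make the three options concrete, here is a minimal sketch. The names `mean`, `center` and `Window` are made up for this illustration and do not come from any library; the struct shows where a lifetime annotation becomes necessary.

```rust
// Borrow: read the samples without taking ownership or copying them.
// (Assumes a non-empty slice; an empty one would yield NaN.)
fn mean(samples: &[f64]) -> f64 {
    samples.iter().sum::<f64>() / samples.len() as f64
}

// Move: take ownership of the vector, modify it in place, and hand it back.
fn center(mut samples: Vec<f64>) -> Vec<f64> {
    let m = mean(&samples);
    for s in samples.iter_mut() {
        *s -= m;
    }
    samples
}

// A struct that stores a reference has to name the lifetime of the borrowed data.
struct Window<'a> {
    samples: &'a [f64],
}

impl<'a> Window<'a> {
    // The returned slice borrows from the original data, not from the Window.
    fn first_half(&self) -> &'a [f64] {
        &self.samples[..self.samples.len() / 2]
    }
}

fn main() {
    let raw = vec![1.0, 2.0, 3.0, 6.0];
    let avg = mean(&raw); // borrowed: `raw` stays usable
    let head = Window { samples: &raw }.first_half().to_vec();
    let backup = raw.clone(); // explicit, visible deep copy
    let centered = center(raw); // moved: `raw` cannot be used after this line
    println!("{} {:?} {:?} {:?}", avg, head, backup, centered);
}
```

The only copy of the data happens where `clone()` is spelled out, which makes expensive operations easy to spot in a code review.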

For many developers starting with Rust (including me), lifetimes are a foreign concept that takes time to get familiar with. In addition, according to Rust’s strict rules you can have either one mutable reference or many immutable references to the same variable, which drastically limits what you can do with your objects. Fortunately, the standard library offers smart pointers that implement automatic reference counting (Rc, Arc) and copy-on-write (Cow) for your data structures. With the reference-counted pointers, cloning becomes cheap as it only needs to increment the reference counter behind the scenes.
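A small sketch of both kinds of smart pointer, using only the standard library. The helper `clip_negative` is a hypothetical example of mine: it returns the input untouched unless it really has to modify it.

```rust
use std::borrow::Cow;
use std::rc::Rc;

// Copy-on-write: borrow the input if nothing changes, allocate only when
// a negative value actually has to be clipped to zero.
fn clip_negative(values: &[i32]) -> Cow<'_, [i32]> {
    if values.iter().any(|&v| v < 0) {
        Cow::Owned(values.iter().map(|&v| v.max(0)).collect::<Vec<i32>>())
    } else {
        Cow::Borrowed(values)
    }
}

fn main() {
    // Reference counting: both handles point to the same vector.
    let shared = Rc::new(vec![1, 2, 3]);
    let alias = Rc::clone(&shared); // increments a counter, copies no data
    println!("owners: {}", Rc::strong_count(&shared));
    println!("same allocation: {}", Rc::ptr_eq(&shared, &alias));

    let clean = [1, 2, 3];
    let dirty = [-1, 2, 3];
    println!("{:?}", clip_negative(&clean)); // borrowed, no allocation
    println!("{:?}", clip_negative(&dirty)); // owned copy with the fix applied
}
```

Rc is limited to a single thread; Arc offers the same interface with atomic counting when the data has to be shared across threads.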

If you need polymorphic objects (values that can hold more than one type), Rust offers a feature called enums with value variants [4]:

enum SensorReading {
    Temperature(f32),
    ParticleCount(i32),
    GateState(bool),
    Offline,
}

fn main() {
    let new_reading = SensorReading::ParticleCount(5);

    // `match` must cover every variant, otherwise the code does not compile
    match new_reading {
        SensorReading::Temperature(value) => println!("Temperature is {}", value),
        SensorReading::ParticleCount(value) => println!("Detected {} particles", value),
        SensorReading::GateState(true) => println!("Gate is open"),
        SensorReading::GateState(false) => println!("Gate is closed"),
        SensorReading::Offline => println!("Sensor is offline"),
    }
}

    

To allow such enums to be allocated on the stack, every value is as large as the largest variant (plus a tag identifying the active variant). This may seem inefficient, but it avoids expensive heap allocations.
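You can check the footprint yourself with `std::mem::size_of`. The sketch below repeats the enum so that it is self-contained; the exact numbers depend on the compiler and target, so I only print them instead of promising specific values.

```rust
use std::mem::size_of;

#[allow(dead_code)]
enum SensorReading {
    Temperature(f32),
    ParticleCount(i32),
    GateState(bool),
    Offline,
}

fn main() {
    // The enum has to fit its largest payload plus the variant tag.
    println!("f32:           {} bytes", size_of::<f32>());
    println!("bool:          {} bytes", size_of::<bool>());
    println!("SensorReading: {} bytes", size_of::<SensorReading>());

    // A vector of readings stores the values inline, one slot per reading,
    // without a separate heap allocation for each element.
    let readings = vec![SensorReading::Offline, SensorReading::GateState(true)];
    println!("{} readings stored contiguously", readings.len());
}
```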

Another interesting feature is methods that consume (move) the self object. This is useful for implementing processing pipelines where one object is transformed into another, which in turn is transformed again.
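Here is a minimal sketch of such a pipeline. The types `RawSignal`, `FilteredSignal` and `Summary` are hypothetical and exist only for this example. Each stage takes `self` by value, so the previous stage cannot be reused by accident and its buffer can be recycled without copying.

```rust
struct RawSignal {
    samples: Vec<f64>,
}

struct FilteredSignal {
    samples: Vec<f64>,
}

struct Summary {
    peak: f64,
}

impl RawSignal {
    // Consumes the raw signal and reuses its buffer for the filtered one.
    fn filter(self, threshold: f64) -> FilteredSignal {
        let mut samples = self.samples;
        samples.retain(|s| s.abs() >= threshold);
        FilteredSignal { samples }
    }
}

impl FilteredSignal {
    // Consumes the filtered signal and reduces it to a single number.
    fn summarize(self) -> Summary {
        let peak = self
            .samples
            .iter()
            .cloned()
            .fold(f64::NEG_INFINITY, f64::max);
        Summary { peak }
    }
}

fn main() {
    let raw = RawSignal { samples: vec![0.1, -2.0, 3.5, 0.2] };
    let summary = raw.filter(1.0).summarize();
    // raw.filter(1.0); // would not compile: `raw` was moved by the first call
    println!("peak = {}", summary.peak);
}
```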

All of this is coupled with memory safety guarantees. What is there not to like?

Helpful type system

The type system in Rust is designed to help you avoid errors. Therefore, you are encouraged to create custom types for your data structures and Rust will check whether they are used correctly.

For example, you may have a vector of integers encoding the timestamps of your measurements and another vector of integers holding the sensor readings. Even though the underlying data structure is the same, you should not use the same type to represent them, so that the compiler stops you when you confuse timestamps with measurements [5]:

struct Timestamps(Vec<i32>);
struct Measurements(Vec<i32>);

fn resample(measurements: Measurements, _timestamps: Timestamps) -> Measurements {
    // some code transforming measurements
    measurements
}

fn main() {
    let measurements = Measurements(vec![3, 4]);
    let timestamps = Timestamps(vec![1, 2]);

    // this will not compile: the arguments are swapped (mismatched types)
    // let resampled = resample(timestamps, measurements);

    // this will compile
    let resampled = resample(measurements, timestamps);
    println!("{:?}", resampled.0);
}

This wrapping of built-in types in a custom structure is called the newtype pattern. The compiler will translate the custom types into plain data structures, so there is no performance overhead in using them. This is what Rustaceans call zero-cost abstractions.
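A quick way to convince yourself that the wrapper costs nothing in memory is to compare sizes. In this sketch the `#[repr(transparent)]` attribute and the `as_slice` helper are my additions, not part of the example above; the attribute guarantees that the wrapper has exactly the layout of its single field.

```rust
use std::mem::size_of;

// `repr(transparent)` (an addition for this sketch) guarantees that the
// newtype is laid out exactly like the `Vec<i32>` it wraps.
#[repr(transparent)]
struct Timestamps(Vec<i32>);

impl Timestamps {
    // Hypothetical helper: hand out the wrapped data as a plain slice.
    fn as_slice(&self) -> &[i32] {
        &self.0
    }
}

fn main() {
    let timestamps = Timestamps(vec![1, 2, 3]);
    println!("Vec<i32>:   {} bytes", size_of::<Vec<i32>>());
    println!("Timestamps: {} bytes", size_of::<Timestamps>());
    println!("{:?}", timestamps.as_slice());
}
```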

However, don’t be surprised: Rust is not an object-oriented language and there is no inheritance, so you should favor composition for code reuse.
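As a rough sketch of what composition looks like in practice: shared behavior goes into a trait, and a struct that wants to reuse it owns a value implementing that trait. All names below (`Sensor`, `Thermometer`, `Logger`) are invented for this example.

```rust
// Shared behavior is described by a trait instead of a base class.
trait Sensor {
    fn read(&self) -> f64;
}

struct Thermometer {
    offset: f64,
}

impl Sensor for Thermometer {
    fn read(&self) -> f64 {
        20.0 + self.offset // stand-in for a real hardware readout
    }
}

// Composition: the logger owns a sensor rather than inheriting from one.
struct Logger<S: Sensor> {
    sensor: S,
    history: Vec<f64>,
}

impl<S: Sensor> Logger<S> {
    fn new(sensor: S) -> Self {
        Logger { sensor, history: Vec::new() }
    }

    // Reads the wrapped sensor and remembers the value.
    fn record(&mut self) -> f64 {
        let value = self.sensor.read();
        self.history.push(value);
        value
    }
}

fn main() {
    let mut logger = Logger::new(Thermometer { offset: 1.5 });
    logger.record();
    logger.record();
    println!("{:?}", logger.history);
}
```

Because `Logger` is generic over any `Sensor`, it works with new sensor types without being modified.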

The strict type system also simplifies refactoring and, as much as I hate to admit it, it relieves the developer of some of the effort of implementing unit tests. If your program compiles, in ~90% of cases it will run just fine (of course, you still have to test that it does not produce rubbish results).

Easy parallelism

Rust has a reputation for being extremely efficient. Some of the fastest web servers and developer tools (including tooling for Node.js and Python) are implemented in Rust. What is more, you can further boost this performance with multithreading: as there is no GIL as in the Python world, threads can run in parallel on multiple CPU cores. Rust’s type system and borrow checker rule out data races at compile time, without hard-to-debug synchronization schemes.

This fearless concurrency greatly simplifies parallel programming, and you rarely need to spawn and join threads manually: external crates such as rayon will turn your plain iterators into multi-core beasts [6].
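To show what the compiler checks for you, here is a sketch that uses only the standard library (scoped threads, available since Rust 1.63) instead of rayon. The function `parallel_sum_of_squares` is a made-up example; the worker threads borrow disjoint chunks of the same slice, and the compiler verifies that the borrowed data outlives them.

```rust
use std::thread;

// Sums the squares of `data` using up to `n_threads` scoped worker threads.
fn parallel_sum_of_squares(data: &[f64], n_threads: usize) -> f64 {
    let n = n_threads.max(1);
    // Ceiling division, and at least 1 so that `chunks` never gets a zero size.
    let chunk_len = ((data.len() + n - 1) / n).max(1);

    thread::scope(|scope| {
        // Spawn one worker per chunk; each only reads its own part of the slice.
        let handles: Vec<_> = data
            .chunks(chunk_len)
            .map(|chunk| scope.spawn(move || chunk.iter().map(|x| x * x).sum::<f64>()))
            .collect();

        // Collect the partial sums; all threads are joined before the scope ends.
        handles
            .into_iter()
            .map(|handle| handle.join().expect("worker thread panicked"))
            .sum::<f64>()
    })
}

fn main() {
    let data: Vec<f64> = (1..=1000).map(f64::from).collect();
    println!("{}", parallel_sum_of_squares(&data, 4));
}
```

With rayon, the same computation is usually expressed by swapping `iter()` for `par_iter()` on the collection, which leaves the chunking and thread management to the library.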

Conclusion

These are some of the features that I found useful and which, in my opinion, set Rust apart from other languages. I am sure that there are other gems that I will discover as my Rust competence grows. In the meantime, if you have favorite features of your own, please share them with us.

  1. https://numpy.org/doc/stable/user/c-info.python-as-glue.html
  2. https://pyo3.rs/
  3. https://doc.rust-lang.org/stable/book/
  4. Run this code on Rust playground
  5. Run this code on Rust playground
  6. Blog post introducing rayon

Giving presentations with IPython notebook

The IPython notebook has become a very popular tool for writing short Python scripts, interactive computing, sharing code, teaching or even demonstrations. Its advantage is the possibility to combine Python code with graphics, HTML, videos or even interactive JavaScript objects in one notebook. With this functionality it may also serve as a great presentation tool.


6 steps to migrate your scientific scripts to Python 3

Python 3 has been around for some time (the most recent stable version is Python 3.2), but until now it has not been widely adopted by the scientific community. One of the reasons was that the basic scientific Python libraries such as NumPy and SciPy had not been ported to Python 3. Since this is no longer the case, there is no reason anymore to resist migrating to Python 3 (you can find the pros and cons on the Python website).

In this guide I am going to describe some tips that I learnt while trying to make my scripts compatible with Python 3. There is nothing to be afraid of – the procedures are actually quite easy and very rewarding (it is like a glimpse into the future of Python!).

Scientific computing with GAE and PiCloud

Google App Engine (GAE) is a great platform for learning web programming and testing out new ideas. It is free and offers great functionality, such as the Channel API (basically WebSockets). Deployment is as easy as clicking a button (on a Mac) or running a Python script (on Linux). Best of all, you can program in Python and offer an easy end-user web interface without time-consuming installation, dependency problems and frayed nerves.

Generating LaTeX tables from CSV files

I am very committed to the idea of reproducibility. The way I understand the term is that there should be a close link between the results presented in the paper and the raw data. It happens all too often that some pre-processing step essential for the results presented in the paper is modified slightly during the preparation of the manuscript, but the figures, tables and statistics are not updated accordingly.

Python Autumn School 2010

Our next school for Advanced Programming in Python will take place in Trento, Italy on October 4th-8th, 2010. Application deadline: August 31st, 2010. Below you will find the detailed program:

Day 0 — Software Carpentry & Advanced Python

  • Documenting code and using version control
  • Object-oriented programming, design patterns, and agile programming
  • Exception handling, lambdas, decorators, context managers, metaclasses

Day 1 — Software Carpentry

  • Test-driven development, unit testing & Quality Assurance
  • Debugging, profiling and benchmarking techniques
  • Data serialization: from pickle to databases

Day 2 — Scientific Tools for Python

  • Advanced NumPy
  • The Quest for Speed (intro): Interfacing to C
  • Programming project

Day 3 — The Quest for Speed

  • Writing parallel applications in Python
  • When parallelization does not help: the starving CPUs problem
  • Programming project

Day 4 — Practical Software Development

  • Efficient programming in teams
  • Programming project
  • The Pac-Man Tournament

Bernstein Stammtisch

Today (Wednesday, October 29th) Bernstein Master students, PhD students, Postdocs and other people interested in neuroscience are meeting in Buchhandlung at a Stammtisch. This monthly reunion is a great opportunity to get to know people related to the Bernstein Center and exchange some ideas about neuroscience and other current topics. You are all invited!

Buchhandlung
Tucholskystr., near the corner Auguststr.
October 29th, starting 19 hrs

I hope to meet you there.

MNS 2008/09

The Model of Neural Systems programming course will start on Monday, October 27th. It will be given by Robert Schmidt and me. The first programming assignments are available on the course webpage. See you all on Monday!