=> home
I started learning Rust recently, as one of the other maintainers of Drogon tried it and likes it. And C++ has its own pile of problems. I'm not saying I don't like C++ anymore, just that I'm trying to learn something new. In the process I found a few things I dislike about Rust, especially from the point of view of an HPC programmer.
One thing we do a lot in highly concurrent environments is use a shared atomic variable to communicate between threads. For example, in my search engine there's an atomic integer that counts the active connections, and each worker decides whether to spawn more crawlers based on it. In C++ it's easy:
```
// In Crawler. activeConnections_ is a std::atomic<size_t> member.
// dispatchCraw() is called from multiple threads.
void Crawler::dispatchCraw() {
    size_t activeConnections = activeConnections_.fetch_add(1, std::memory_order_acq_rel);
    if (activeConnections < maxConnections_) {
        // spawn async task
        dispatchCraw();
        return;
    }
    activeConnections_.fetch_sub(1, std::memory_order_acq_rel);
}

// To start crawling. Pseudo code.
void start() {
    static Crawler crawler;
    for (int i = 0; i < numThreads_; i++) {
        std::thread t(&Crawler::dispatchCraw, &crawler);
        t.detach();
    }
}
```
However, Rust's borrow checker won't let a plain reference to the crawler cross a thread::spawn boundary unless it has a 'static lifetime. The usual solution is to wrap the crawler in an Arc (atomically reference counted, basically C++'s shared_ptr, whose reference count is already atomic), clone it, and pass the clones to the other threads. That wastes cycles which could be perfectly avoided by using a known-good static-lifetime variable.
```
fn dispatch_craw(crawler: &Arc<Crawler>) {
    let active_connections = crawler.active_connections.fetch_add(1, Ordering::AcqRel);
    if active_connections < crawler.max_connections {
        // spawn async task
        dispatch_craw(crawler);
        return;
    }
    crawler.active_connections.fetch_sub(1, Ordering::AcqRel);
}

fn start() {
    // !!! Forced to create an Arc for the crawler, even though in C++ it's not necessary
    let crawler = Arc::new(Crawler::new());
    for _ in 0..crawler.num_threads {
        let crawler = crawler.clone(); // atomic refcount increment, not free
        thread::spawn(move || dispatch_craw(&crawler));
    }
}
```
The same goes for every multiple-reader or multiple-writer structure: a concurrent queue, a concurrent hash map, and so on. It's really not ideal.
In C++, OpenMP does a very good job of hiding the details of parallelism. The reduction clause combines each thread's private copy of the variable at the end of the parallel region. The following program calculates Pi by integrating a quarter circle in parallel; it's the "Hello World" of parallel programming, more or less. This should be easy for any language to implement.
```
double sum = 0;
size_t steps = 1000000000;
double step = 1.0 / (double)steps;
#pragma omp parallel for reduction(+:sum)
for (size_t i = 0; i < steps; i++) {
    double x = (i + 0.5) * step;
    sum += 4.0 / (1.0 + x * x);
}
double pi = step * sum;
```
It's still easy with TBB, which doesn't require any special compiler support. We just have to use a vector to store the per-thread partial sums.
```
size_t steps = 1000000000;
double step_size = 1.0 / (double)steps;
std::vector<double> partial_sums(tbb::this_task_arena::max_concurrency());
tbb::parallel_for(size_t(0), steps, [&](size_t i) {
    auto& partial_sum = partial_sums[tbb::this_task_arena::current_thread_index()];
    double x = (i + 0.5) * step_size;
    partial_sum += 4.0 / (1.0 + x * x);
});
double sum = std::accumulate(partial_sums.begin(), partial_sums.end(), 0.0);
double pi = step_size * sum;
```
In Rust, the best answer I got when I asked my local Rust community is the following. It's clean, but they gave up on solving the false sharing problem.
```
use rayon::prelude::*;

let steps = 1_000_000_000u64;
let step = 1.0 / steps as f64;
let sum: f64 = (0..steps)
    .into_par_iter()
    .fold(
        || 0.0,
        |acc, i| {
            let x = (i as f64 + 0.5) * step;
            acc + 4.0 / (1.0 + x * x)
        },
    )
    .sum();
let pi = step * sum;
```
Sigh
Anyway, rant over. I feel Rust is doing a lot of things correctly. But I don't like that it costs performance exactly when I'm asking it for the most. Gonna keep learning it and hope to find solutions. (No, it'll not be unsafe.)