this post was submitted on 09 Dec 2023

[email protected] 5 points 11 months ago

It's usually easier imo to separate them into different processes, perhaps running on different machines, and communicate via some form of message passing (I like RabbitMQ).

For example, at my company, we have a web application that does some batch processing, so we run two separate services: our web server and our batch processing service. The web server handles requests, does database calls, etc., and if there's a larger operation, it publishes a message on RabbitMQ for our batch processing consumer to pick up. That consumer manages its own set of worker threads and schedules work as needed.
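To make the split concrete, here's a rough sketch of the hand-off. It is not our actual setup: a plain stdin/stdout pipe stands in for RabbitMQ so it runs without a broker, and `CONSUMER_SOURCE`, `run_batch`, and the message fields `id` and `n` are all made up for illustration.

```python
import json
import subprocess
import sys

# Hypothetical batch consumer, run as its own process. It reads one JSON
# message per line from stdin and writes one JSON result per line to stdout.
# With a real broker, this loop would consume from a queue instead.
CONSUMER_SOURCE = """
import json, sys
for line in sys.stdin:
    job = json.loads(line)
    total = sum(i * i for i in range(job["n"]))  # stand-in for CPU-heavy work
    print(json.dumps({"id": job["id"], "result": total}), flush=True)
"""


def run_batch(sizes):
    """Plays the web server role: hand jobs to the consumer, collect results."""
    messages = "".join(
        json.dumps({"id": i, "n": n}) + "\n" for i, n in enumerate(sizes)
    )
    done = subprocess.run(
        [sys.executable, "-c", CONSUMER_SOURCE],
        input=messages,
        capture_output=True,
        text=True,
        check=True,
    )
    return [json.loads(line)["result"] for line in done.stdout.splitlines()]


if __name__ == "__main__":
    print(run_batch([10, 100]))
```

The point is only that the two sides share nothing but messages, so either one can be moved to another machine or scaled without touching the other.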

This gives us a few benefits:

  • the web server can focus on I/O-heavy tasks, which means either lots of threads or an async loop
  • the consumer can focus on CPU-heavy tasks, which means matching the number of threads to the number of CPU cores

If you host them on the same machine, you can pin the CPU-heavy threads to certain cores to minimize context switching costs, and leave the remaining cores for switching between the web threads.
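For the pinning part, a minimal sketch in Python, assuming Linux: `os.sched_setaffinity` isn't available on every platform, so the helper checks for it first. `pin_to_cores` is a made-up name, and on Linux passing `0` targets the caller.

```python
import os


def pin_to_cores(cores):
    """Restrict the caller to the given cores where the platform supports it.

    Returns the resulting affinity set, or None if the call is unavailable.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    allowed = os.sched_getaffinity(0)
    target = set(cores) & allowed  # never ask for cores we are not allowed to use
    if target:
        os.sched_setaffinity(0, target)
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    # Example: keep the batch consumer on cores 2 and 3, if they exist.
    print(pin_to_cores([2, 3]))
```

Which cores to reserve depends on the machine, so treat the numbers as placeholders.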

I find this works pretty well and it's pretty easy to scale up for production. I hope this helps.

lysdexic 2 points 11 months ago

It's usually easier imo to separate them into different processes (...)

I don't think your comment applies to the discussion. One of the thread pools mentioned is for IO-bound applications, which means things like sending HTTP requests.

Even if somehow you think it's a good idea to move this class of tasks to a separate process, you will still have a very specific thread pool that can easily overcommit because most tasks end up idling while waiting for data to arrive.

The main takeaway is that there are at least two classes of background tasks with very distinct requirements and usage patterns. It's important to handle them in separate thread pools that are sized and behave differently. Some frameworks already do that for you out of the box. Nevertheless, it's important to be mindful of how distinct their usage is.
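A minimal sketch of the two-pool idea inside a single process, using only the standard library. The oversubscription factor of 8, `fake_request`, `checksum`, and `run_mixed` are assumptions for illustration. Note that in CPython the GIL limits CPU-bound threads, so a process pool is often swapped in for the compute side.

```python
import os
import time
from concurrent.futures import ThreadPoolExecutor

CORES = os.cpu_count() or 1

# I/O-bound tasks mostly wait, so this pool is deliberately oversubscribed.
# The factor of 8 is an arbitrary assumption, not a recommendation.
io_pool = ThreadPoolExecutor(max_workers=CORES * 8, thread_name_prefix="io")

# CPU-bound tasks keep a core busy, so this pool is capped at the core count.
cpu_pool = ThreadPoolExecutor(max_workers=CORES, thread_name_prefix="cpu")


def fake_request(delay: float) -> float:
    """Stand-in for an HTTP call: the thread just idles while 'waiting'."""
    time.sleep(delay)
    return delay


def checksum(n: int) -> int:
    """Stand-in for a compute-heavy task."""
    return sum(i * i for i in range(n))


def run_mixed(delays, sizes):
    """Submit each class of task to its own pool and gather the results."""
    io_futures = [io_pool.submit(fake_request, d) for d in delays]
    cpu_futures = [cpu_pool.submit(checksum, n) for n in sizes]
    return [f.result() for f in io_futures], [f.result() for f in cpu_futures]


if __name__ == "__main__":
    print(run_mixed([0.01, 0.02], [10, 100]))
```

Keeping the pools separate means a burst of slow requests can't starve the compute tasks of workers, and vice versa.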

[email protected] 3 points 11 months ago (last edited 11 months ago)

That's precisely what I was talking about. I basically said it's better to split an application by type of parallelism (CPU bound vs I/O bound) than to mix them.

An I/O-heavy service benefits from having lots of available threads mapped to a smaller number of CPU cores, whereas a calculation-heavy service benefits from pinning threads to cores to limit context switching. So scaling each will be quite different.

If you split them into separate processes (one for I/O and one for compute), it's much easier to scale them independently (more machines and whatnot). If I combine them, I'd need to continually balance how cores are split between the two concerns, and I wouldn't have as much control over the types of cores (I/O is happy with lots of generic cores, whereas compute would benefit from specialized instructions).

So that's my practical application of the "separate thread pools" idea: splitting thread pools at the process boundary is usually useful as an application grows in complexity. The message passing adds some latency, but it enables other types of tuning.