this post was submitted on 09 Dec 2023
It's usually easier imo to separate them into different processes, perhaps running on different machines, and communicate via some form of message passing (I like RabbitMQ).
For example, at my company, we have a web application that does some batch processing, so we have two separate services: our web server and our batch processing service. The web server handles requests, does database calls, etc., and if there's a larger operation, it sends a message over RabbitMQ for our batch processor consumer to handle. That consumer manages some number of threads and schedules work as needed.
This gives us a few benefits:
If you host them on the same machine, you can pin the CPU-heavy threads to certain cores to minimize context-switching costs, and use the remaining cores for switching between the web threads.
I find this works pretty well and it's pretty easy to scale up for production. I hope this helps.
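To make the web server / batch consumer split above concrete, here is a minimal sketch. It is not the actual setup: an in-process `queue.Queue` stands in for the RabbitMQ queue so it runs without a broker, and `handle_request`, `process_job`, and `run_consumer` are hypothetical names. In the real arrangement the consumer would be a separate service.

```python
import json
import queue
import threading
from concurrent.futures import ThreadPoolExecutor

# Stand-in for the broker queue (assumption: with RabbitMQ this would be a
# named queue the web server publishes to and the batch service consumes from).
broker = queue.Queue()
STOP = None  # sentinel that tells the consumer to shut down


def handle_request(job_id, payload):
    """Web-server side: enqueue the heavy work and return right away."""
    broker.put(json.dumps({"job_id": job_id, "payload": payload}))
    return {"job_id": job_id, "status": "queued"}


def process_job(message):
    """Batch side: hypothetical heavy operation (here, a sum of squares)."""
    job = json.loads(message)
    return job["job_id"], sum(n * n for n in job["payload"])


def run_consumer(results, workers=2):
    """Pull messages until the sentinel arrives, handing each to a worker pool."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = []
        while (message := broker.get()) is not STOP:
            futures.append(pool.submit(process_job, message))
        for future in futures:
            job_id, value = future.result()
            results[job_id] = value


results = {}
consumer = threading.Thread(target=run_consumer, args=(results,))
consumer.start()

# The "web server" only enqueues; it never does the heavy work itself.
acks = [handle_request(i, list(range(i + 1))) for i in range(3)]

broker.put(STOP)
consumer.join()
```

The request handler never blocks on the heavy operation; it only serializes a message, which is what lets the two sides be scaled and tuned independently.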
I don't think your comment applies to the discussion. One of the thread pools mentioned is for IO-bound applications, which means things like sending HTTP requests.
Even if you somehow think it's a good idea to move this class of tasks to a separate process, you will still need a dedicated thread pool for it, one that can easily be overcommitted because most of its tasks end up idling while waiting for data to arrive.
The main takeaway is that there are at least two classes of background tasks with very distinct requirements and usage patterns. It's important to handle them in separate thread pools that behave differently. Some frameworks already do that for you out of the box. Nevertheless, it's important to be mindful of how distinct their usage is.
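As a rough illustration of the two-pool idea, here is a sketch with one oversubscribed pool for I/O-bound tasks and one pool sized to the core count for CPU-bound tasks. The factor of 8 and the helpers `fake_fetch` and `checksum` are assumptions made for the example, not recommendations. In standard CPython, pure-Python CPU-bound work is also limited by the GIL, so a `ProcessPoolExecutor` is often the more practical choice for that second pool.

```python
import os
import time
from concurrent.futures import ThreadPoolExecutor

CPU_WORKERS = os.cpu_count() or 1
IO_WORKERS = CPU_WORKERS * 8  # hypothetical oversubscription factor

# I/O pool: many threads, because most of them are idle waiting on data.
io_pool = ThreadPoolExecutor(max_workers=IO_WORKERS, thread_name_prefix="io")
# CPU pool: about one thread per core, since extra threads only add switching.
cpu_pool = ThreadPoolExecutor(max_workers=CPU_WORKERS, thread_name_prefix="cpu")


def fake_fetch(delay):
    """Stand-in for an HTTP request: the thread just waits."""
    time.sleep(delay)
    return delay


def checksum(data):
    """Stand-in for CPU-bound work."""
    return sum(data) % 251


# Each class of task goes to its own pool, so a burst of slow requests
# cannot starve the compute tasks (and vice versa).
io_futures = [io_pool.submit(fake_fetch, 0.01) for _ in range(20)]
cpu_futures = [cpu_pool.submit(checksum, range(1000)) for _ in range(4)]

io_results = [f.result() for f in io_futures]
cpu_results = [f.result() for f in cpu_futures]

io_pool.shutdown()
cpu_pool.shutdown()
```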
That's precisely what I was talking about. I basically said it's better to split an application by type of parallelism (CPU-bound vs. I/O-bound) than to mix them.
An I/O-heavy service benefits from having lots of available threads mapped to a smaller number of CPU cores, whereas a calculation-heavy service benefits from pinning threads to cores to limit context switching. So scaling each will be quite different.
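For the pinning part, a minimal sketch might look like the following. It assumes Linux, where `os.sched_setaffinity` is available and a pid of `0` applies to the calling thread; elsewhere the helper does nothing. `pin_current_thread` and `compute_worker` are hypothetical names.

```python
import os
import threading


def pin_current_thread(cores):
    """Restrict the calling thread to `cores`.

    Returns the resulting affinity set, or None where the call is
    unavailable (e.g. macOS or Windows).
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    os.sched_setaffinity(0, cores)  # 0 = the caller (assumption: Linux semantics)
    return os.sched_getaffinity(0)


def compute_worker(core, out):
    out[core] = pin_current_thread({core})
    # The CPU-heavy work would run here, staying on `core`.


# Only pick cores this process is actually allowed to use.
if hasattr(os, "sched_getaffinity"):
    allowed = sorted(os.sched_getaffinity(0))
else:
    allowed = []
compute_cores = allowed[:2]  # hypothetical: reserve up to two cores for compute

pinned = {}
threads = [
    threading.Thread(target=compute_worker, args=(core, pinned))
    for core in compute_cores
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The I/O threads are deliberately left unpinned so the scheduler can spread them across whatever cores remain.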
If you split them into separate processes (one for I/O and one for compute), it's much easier to scale them independently (more machines and whatnot). If I combine them, I'd need to continually balance how cores are split between the two concerns, and I wouldn't have as much control over the types of cores (I/O is happy with lots of generic cores, whereas compute would benefit from specialized instructions).
So that's my practical application of the "separate thread pools" idea: splitting thread pools at the process boundary is usually useful as an application grows in complexity. Crossing that boundary adds latency, but it enables other types of tuning.