this post was submitted on 14 May 2024
311 points (91.2% liked)

Programmer Humor


Post funny things about programming here! (Or just rant about your favourite programming language.)


Explanation: Python is a programming language. NumPy is a library for Python that makes it possible to run large computations much faster than in native Python. To make that possible, it keeps its own set of data types, separate from Python's native ones, which means you now have two different bool types and two different sets of True and False. Lovely.

Mypy is a type checker for Python (Python supports static type annotations, but doesn't enforce them at runtime). Mypy treats NumPy's bool_ and Python's native bool as incompatible types, leading to the asinine error message above. Mypy is "technically" correct, since they are two completely different classes. But in practice, there is little functional difference between bool and bool_. So you have to do dumb workarounds like declaring every bool value as bool | np.bool_ or casting bool_ down to bool. Ugh. Both NumPy and mypy declared this issue a WONTFIX. Lovely.
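For anyone who hasn't run into this, here is a minimal sketch of the two workarounds mentioned above. The function names are made up for illustration, and whether mypy actually complains about the uncast version depends on the NumPy stubs and mypy version you have installed, so treat it as a sketch rather than a reproduction of the exact error.

```python
from __future__ import annotations

import numpy as np

arr = np.array([1, 2, 3])

# Comparing a NumPy scalar gives back np.bool_, not the built-in bool.
flag = arr[0] > 0


def any_positive(values: np.ndarray) -> bool:
    # Workaround 1: cast np.bool_ down to the built-in bool.
    return bool((values > 0).any())


def first_is_positive(values: np.ndarray) -> bool | np.bool_:
    # Workaround 2: widen the annotation so both types are accepted.
    return values[0] > 0
```

At runtime both versions behave the same in an `if`; the difference only matters to the type checker.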

[–] acannan 14 points 6 months ago (1 children)

In my experience, mypy + pydantic is a recipe for success, especially for large python projects

[–] [email protected] 6 points 6 months ago (1 children)

I wholeheartedly agree. The ability to describe (in code) and validate all data, from config files to each and every message being exchanged is invaluable.

I'm actively looking for alternatives in other languages now.

[–] expr 9 points 6 months ago (2 children)

You're just describing parsing in statically-typed languages, to be honest. Adding all of this stuff to Python is just (poorly) reinventing the wheel.

Python's a great language for writing small scripts (one of my favorites for the task, in fact), but it's not really suitable for serious, large-scale production usage.

[–] [email protected] 2 points 6 months ago (1 children)

I'm not talking about type checking, I'm talking about data validation using pydantic. I just consider mypy / pyright etc. another linting step, that's not even remotely interesting.

In an environment where a lot of data is being exchanged by various sources, it really has become quite valuable. Give it a try if you haven't.

[–] expr 4 points 6 months ago (1 children)

I understand what you're saying. My point is that data validation is precisely the purpose of parsers (or deserialization) in statically-typed languages. Type-checking is data validation, and parsing is the process of turning untyped, unvalidated data into typed, validated data. What's more, you can often get this functionality for free without having to write any code other than your type (if the validation is simple enough, anyway). Pydantic exists to solve a problem of Python's own making and to reproduce what's standard in statically-typed languages.

In the case of config files, it's even possible to do this at compile time, depending on the language. Or in other words, you can statically guarantee that a config file exists at a particular location and deserialize it/validate it into a native data structure all without ever running your actual program. At my day job, all of our app's configuration lives in Dhall files which get imported and validated into our codebase as a compile-time step, meaning that misconfiguration is a compiler error.

[–] [email protected] 1 points 6 months ago* (last edited 6 months ago)

I am aware of what you are saying, however, I do not agree with your conclusions. Just for the sake of providing context for our discussion, I wrote plenty of code in statically typed languages, starting in a professional capacity some 33 years ago when switching from pure TASM to AT&T C++ 2, so there is no need to convince me of the benefits :)

That being said, I think we're talking about different use cases here. When I'm talking configuration, I'm talking runtime settings provided by a customer, or service tech in the field - that hardly maps to a compiler error as you mentioned. It's also better (more flexible / higher abstraction) than simply checking a JSON schema, and I'm personally encountering multiple new, custom JSON documents every week where it has proven to be a real timesaver.

I also do not believe that all data validation can be boiled down to simple type checking - libraries like pydantic handle complex validation cases with interdependencies between attributes, initialization order, and fields that need to be checked by a finite automaton, regex or even custom code. Sure, you can graft that on after the fact, but what the library does is provide a standardized way of handling these cases with (IMHO) minimal clutter. I know you basically made that point, but the example you gave is oversimplified - at least in what I do, I rarely encounter data that can be properly validated by simple type checking. If business logic and domain knowledge have to be part of the validation, I can save a ton of boilerplate code by writing my validations using pydantic.
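To make that concrete, here is a rough sketch of the kind of thing I mean, assuming pydantic v2 (the v1 decorators are named differently). The `SensorConfig` model, its fields and the ID format are invented for this example and not taken from any real project.

```python
from pydantic import BaseModel, Field, ValidationError, model_validator


class SensorConfig(BaseModel):
    # Hypothetical fields, made up for illustration.
    device_id: str = Field(pattern=r"^[A-Z]{2}-\d{4}$")  # regex check
    min_temp: float
    max_temp: float

    @model_validator(mode="after")
    def check_range(self) -> "SensorConfig":
        # Interdependency between attributes: runs after the
        # individual fields have been validated.
        if self.min_temp >= self.max_temp:
            raise ValueError("min_temp must be below max_temp")
        return self


def is_valid(raw: dict) -> bool:
    # True if the raw dict (e.g. a parsed JSON document) passes validation.
    try:
        SensorConfig.model_validate(raw)
    except ValidationError:
        return False
    return True
```

The cross-field rule lives next to the fields it constrains, which is most of what I mean by minimal clutter.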

Type annotations are a completely orthogonal case and I'll be the first to admit that Python's type situation is not ideal.

[–] [email protected] 2 points 6 months ago

Gradual typing isn't reinventing the wheel, it's a new paradigm. Statically typed code is harder to write and easier to debug. Dynamically typed code is easier to write, but harder to debug. With gradual typing, the idea is that you can first write dynamic code (easier to write), and then -- wait for it -- GRADUALLY turn it into static code by adding type hints (easier to debug). It separates the typing away from the writing, meaning that the programmer doesn't have to multitask as much. If you know what you're doing, mypy really does let you eat your cake and keep it too.
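A small sketch of what that workflow can look like. The function and the `LineItem` type are made up for illustration; the point is only that the second version is the first one with hints added afterwards, without touching the logic.

```python
from typing import TypedDict


# Step 1: quick dynamic draft, no annotations at all.
def total_price(items):
    return sum(item["price"] * item["qty"] for item in items)


# Step 2: later, describe the data and annotate the signature.
class LineItem(TypedDict):
    price: float
    qty: int


def total_price_typed(items: list[LineItem]) -> float:
    # Same body as the draft; only the hints are new.
    return sum(item["price"] * item["qty"] for item in items)
```

Both versions run identically; the typed one just gives mypy enough information to catch things like a misspelled key before the code runs.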