this post was submitted on 15 Sep 2023
21 points (100.0% liked)

Python

qwop 3 points 1 year ago (1 child)

UTF-8 is an encoding for Unicode, which means it's a way of representing a Unicode string as actual bytes on a computer.

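It is variable length (one to four bytes per character) and works by using the leading bits of each byte: those of the first byte indicate how many bytes are needed to represent the current character, and each following byte is marked as a continuation byte.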
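To make that concrete, here is a minimal Python sketch that reads the length of a UTF-8 sequence from the leading bits of its first byte. The function name `utf8_length_from_lead_byte` and the sample characters are my own choices for illustration, and it only inspects bit patterns rather than fully validating the input.

```python
def utf8_length_from_lead_byte(lead: int) -> int:
    """Return how many bytes a UTF-8 sequence has, judging by its first byte.

    Hypothetical helper for illustration; it checks bit prefixes only.
    """
    if lead >> 7 == 0b0:        # 0xxxxxxx -> 1 byte (ASCII)
        return 1
    if lead >> 5 == 0b110:      # 110xxxxx -> 2 bytes
        return 2
    if lead >> 4 == 0b1110:     # 1110xxxx -> 3 bytes
        return 3
    if lead >> 3 == 0b11110:    # 11110xxx -> 4 bytes
        return 4
    # 10xxxxxx is a continuation byte, anything else is not a valid lead byte
    raise ValueError(f"not a valid lead byte: {lead:08b}")


# Sample characters written as escapes: A, e-acute, euro sign, snake emoji
SAMPLES = ["A", "\u00e9", "\u20ac", "\U0001F40D"]

for ch in SAMPLES:
    encoded = ch.encode("utf-8")
    bits = " ".join(f"{b:08b}" for b in encoded)
    print(f"U+{ord(ch):04X}: {bits} (length {utf8_length_from_lead_byte(encoded[0])})")
```

In the printed bit patterns, every byte after the first starts with `10`, which is how a decoder can tell continuation bytes apart from the start of a new character.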

Python also uses an internal representation, as you describe in the article, but it's different from UTF-8. Unlike UTF-8, all characters in Python's representation of a given Unicode string use the same number of bytes (1, 2, or 4 in CPython), which is the maximum that any individual character in that string needs.
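You can see this for yourself with a rough sketch like the one below. It assumes CPython (other implementations such as PyPy store strings differently), and `bytes_per_char` is a hypothetical helper of mine: it compares the sizes of a short and a long repetition of the same text so that the fixed object header cancels out.

```python
import sys


def bytes_per_char(unit: str) -> int:
    """Estimate storage per character for strings built from `unit`.

    Hypothetical helper; relies on CPython's sys.getsizeof behaviour.
    The fixed header cancels out in the subtraction.
    """
    short = unit * 2
    long_ = unit * 1002
    extra_chars = len(long_) - len(short)
    return (sys.getsizeof(long_) - sys.getsizeof(short)) // extra_chars


print(bytes_per_char("a"))             # ASCII only
print(bytes_per_char("\u00e9"))        # e-acute, code point below 256
print(bytes_per_char("\u20ac"))        # euro sign, needs 2 bytes
print(bytes_per_char("\U0001F40D"))    # snake emoji, needs 4 bytes
print(bytes_per_char("a\U0001F40D"))   # mixed: the "a" is widened as well
```

The last line is the interesting one: as soon as a single character in the string needs 4 bytes, every character in that string, including plain ASCII ones, is stored at that width.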

I'd probably mess up a more detailed explanation of UTF-8 or Python's representation, so I'll let you look into how they work in more detail if you're interested.