This post was submitted on 04 Apr 2025 to the Python community on the programming.dev Lemmy instance.
The most bulletproof way to do this seems to be to escape the `#` characters before running the document through Markdown. It might suffice to use a regex. (Insert regex and two problems joke here.) That seems promising because headings always match `#\s+`, whereas tags match `#[^\s]`. I hope someone has an even better idea, but this ought to work.
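To make the distinction concrete, here is a minimal Python sketch that lists what each pattern finds in a small sample document. The sample text is made up for illustration.

```python
import re

# Made-up sample Markdown, for illustration only.
doc = "# Notes\nSome text about #python and #regex.\n"

# '#' followed by one or more whitespace characters: a heading marker.
headings = re.findall(r"#\s+", doc)

# '#' followed by one non-whitespace character: the start of a tag.
# This only flags where a tag begins; it does not capture the whole tag.
tag_starts = re.findall(r"#[^\s]", doc)

print(headings)    # ['# ']
print(tag_starts)  # ['#p', '#r']
```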
I'm so bad at regex. Would you mind explaining what's going on with these ones?
I don't mind at all. Beyond my explanation, you might like to try to use an online regular expression checker to explore small changes to the regex to see how it matches what it matches.
Headings always match `#\s+`, because that's the character `#` followed by whitespace (`\s`) one or more times (`+`). Other text matches this too, so not all matches are headings, but all headings match. (You might have `# blah` in the middle of the text, which would match. If that's a problem, then you can change the regex to `^#\s+`, where `^` means "from the beginning of a line".)

Tags always match `#[^\s]`, which means the character `#` followed by one non-whitespace character. Be careful: tags match this regex, but this regex doesn't match the entire tag. It only says "there is a tag here".

Fortunately, that doesn't hurt, because your Python code could match `#[^\s]` and then turn that `#` into `\#`, and thereby successfully avoid escaping the `#`s at the beginning of headings. You could even use regex to do this by capturing the non-whitespace character at the beginning of the tag and "putting it back" using regex search and replace.

Replace `#([^\s])` with `\#\1`. The parentheses capture the matching characters (the first character of the tag) and `\1` echoes back the captured characters. It would replace `#a` with `\#a` and so on.

I hope I explained this clearly enough. I see the other folks also tried, so I hope that together, you found an explanation that works well enough for you.
Peace.
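A sketch of that search-and-replace with Python's `re.sub`, using `#([^\s])`. In the replacement string, `\\` produces one literal backslash and `\1` puts the captured character back. In Python, `^` only means "start of a line" when the `re.MULTILINE` flag is passed; otherwise it matches only at the very start of the string. The sample strings are made up for illustration.

```python
import re

# Made-up sample Markdown, for illustration only.
doc = "# Title\nText with #tag1 and #tag2.\n"

# Capture the first character after '#' and put it back after an escaped '#'.
# The heading is left alone because its '#' is followed by whitespace.
escaped = re.sub(r"#([^\s])", r"\\#\1", doc)
print(escaped)

# '#\s+' also matches a '#' in mid-line; '^' with re.MULTILINE restricts
# matches to the start of each line.
text = "# Real heading\nthis # is not a heading\n"
print(re.findall(r"#\s+", text))                 # ['# ', '# ']
print(re.findall(r"^#\s+", text, re.MULTILINE))  # ['# ']

# Caveat: '#' is itself non-whitespace, so '## Sub' gets escaped too.
# Excluding '#' from the class leaves multi-'#' headings untouched.
print(re.sub(r"#([^\s])", r"\\#\1", "## Sub"))   # \## Sub
print(re.sub(r"#([^\s#])", r"\\#\1", "## Sub"))  # ## Sub
```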
I found a regex checker and it helped so much, thank you for the suggestion! I think I better understand what's going on, and I was able to use that to modify it to work closer to how I want. Currently I have `r"#[^\s#][^\s" + string.punctuation + "#]*"`.

So what I think is going on is that it looks for `#` followed by a character that is neither whitespace nor another `#` (before, it was matching on headers with multiple pound signs). Then it keeps going for as many characters as needed, until it runs into whitespace, punctuation, or another `#` (in the case of multiple tags).

My use case is to be able to turn the tags into links to pages that list every page including that tag. What I do is blindly replace the tag with a link to where its page should exist, log the tag, and later gather up all the found tags to make the pages with lists. The punctuation was because I had some tags in weird places, like the end of sentences, that were picking up a period or comma and so creating a separate, unique tag (like `#Homeworld` at the top of a file vs. `Find better tag than #Homeworld.` as a note).

Excellent! Indeed, I'd completely forgotten about H2, H3, and so on, so I'm glad you found it comfortable to figure that out!
I read Mastering Regular Expressions about 25 years ago and it's one of the best and simplest investments I ever made in my own programming practice. Regex never goes out of style.
Enjoy!
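A minimal sketch of the tag-to-link workflow described above, using the tag pattern from the thread. The `link_tags` helper, the `tags/<name>.md` link target, and the `notes.md` page name are hypothetical. `re.escape` is swapped in around `string.punctuation` so characters like `]`, backslash, and `^` can never change the meaning of the character class. Note that `string.punctuation` includes `_` and `-`, so a tag like `#my_tag` is cut short at the underscore.

```python
import re
import string
from collections import defaultdict

# Tag pattern from the thread, with re.escape() so that characters such as
# ']', backslash and '^' from string.punctuation are always taken literally.
TAG_RE = re.compile(r"#[^\s#][^\s" + re.escape(string.punctuation) + r"#]*")


def link_tags(markdown, page, index):
    """Replace each tag with a Markdown link and record it in index.

    Hypothetical helper: index maps tag name -> set of pages using it.
    """
    def to_link(match):
        tag = match.group(0)[1:]             # drop the leading '#'
        index[tag].add(page)                 # remember which page used it
        return f"[#{tag}](tags/{tag}.md)"    # hypothetical link target

    return TAG_RE.sub(to_link, markdown)


index = defaultdict(set)
page = "# Homeworld\n#Homeworld\nFind better tag than #Homeworld.\n"
print(link_tags(page, "notes.md", index))
print(dict(index))                 # {'Homeworld': {'notes.md'}}
print(TAG_RE.findall("#my_tag"))   # ['#my']
```

Because the trailing period is excluded, both occurrences of `#Homeworld` end up under the same key, and the `# Homeworld` heading is left untouched.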
`#\s+` is:
- `#`: a literal `#`
- `\s`: any whitespace character (space, tab, newline, etc.)
- `+`: the previous thing (here the whitespace), one or more times

In words: "a hash followed by at least one whitespace character".
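A small check in Python of what `#\s+` does and doesn't match (inputs made up for illustration). Because `\s` also covers the newline, a lone `#` at the end of a line matches as well, and the greedy `+` takes every whitespace character in a row.

```python
import re

# '\s' covers spaces, tabs and newlines, and '+' is greedy.
print(re.findall(r"#\s+", "#  Title"))             # ['#  ']
print(re.findall(r"#\s+", "#\tTitle"))             # ['#\t']
print(re.findall(r"#\s+", "a lone #\nnext line"))  # ['#\n']
print(re.findall(r"#\s+", "#tag"))                 # []
```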
`#[^\s].` is:
- `#`: a literal `#`
- `[^\s]`: a negated character class. This matches anything other than the set of characters after the `^`. `\s` has the same meaning as before, any whitespace character.
- `.`: matches any single character (except a newline, by default)

In words: "a hash followed by any character other than a whitespace character, then any character".
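A quick Python comparison (inputs made up for illustration) showing what the trailing `.` changes: it demands one more character after the one checked by `[^\s]`, so a one-letter tag at the very end of the text is missed, while `#[^\s]` alone still flags it.

```python
import re

text = "#ab #c"

# '#', one non-whitespace character, then any one character.
print(re.findall(r"#[^\s].", text))    # ['#ab']

# Without the trailing '.', every tag start is found.
print(re.findall(r"#[^\s]", text))     # ['#a', '#c']

# '.' does not match a newline by default, so '#c\n' is not matched either.
print(re.findall(r"#[^\s].", "#c\n"))  # []
```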
https://regex101.com/ is really good for explaining regex