this post was submitted on 04 Apr 2025
15 points (94.1% liked)

Hello! If anyone knows of a way to get python-markdown to behave the way I'd like, or of an alternative way to do it, I'd love some help. My use case: I'm converting .md files made with Obsidian into HTML files. Obsidian tags are a pound sign followed by the tag name (like "#TagName"). When a tag is the first item on a line, the pound sign is mistaken for a heading marker, even though there is no space after it.

Is there a way to avoid this, so that a line is only read as a heading if there is a space between the pound sign and the next word? I'm even considering some kind of find/replace step, run before the Markdown-to-HTML conversion, that swaps each tag for something like a link to a page listing all the pages with that tag.
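The find/replace idea could look roughly like the sketch below. It is only an illustration: the tag pattern is a simplified guess at what counts as an Obsidian tag, and `link_tags`, the `tags/` directory, and the `.html` page naming are hypothetical names chosen for the example.

```python
import re

# Simplified tag pattern (an assumption, not Obsidian's exact rules):
# a "#" that is not preceded by a word character, "/" or another "#",
# followed by a letter or underscore, then letters, digits, "_", "-" or "/".
TAG = re.compile(r"(?<![\w/#])#([A-Za-z_][\w/-]*)")


def link_tags(text, base="tags/"):
    """Replace each #Tag with a Markdown link to a hypothetical per-tag page."""

    def to_link(match):
        name = match.group(1)
        return f"[#{name}]({base}{name}.html)"

    return TAG.sub(to_link, text)


source = "#TagName is the first thing on this line\n# A real heading\n"
converted = link_tags(source)
```

Because the replacement starts with `[` rather than `#`, the line no longer looks like a heading, and `# A real heading` is left alone since a space follows the pound sign. Tags inside code blocks would be rewritten too, so real notes may need extra handling.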

[–] logging_strict 2 points 1 day ago* (last edited 1 day ago)

Look into Sphinx, which can mix .rst and .md files. myst-parser is the package that handles the .md files.

With a little extra effort you get variables similar to Obsidian tags that don't cause problems when rendered to an HTML website or a PDF file.
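For reference, enabling myst-parser in a Sphinx project is a small change to conf.py. This is a minimal sketch; the project name is a placeholder, and everything else about the Sphinx setup is assumed to exist already.

```python
# conf.py (Sphinx configuration), minimal sketch
project = "notes"  # placeholder project name, not from the thread

# Registering the extension lets Sphinx parse .md sources alongside .rst ones.
extensions = ["myst_parser"]
```

As far as I know, MyST follows CommonMark, which requires a space after the # of a heading, so a line starting with "#TagName" should come out as ordinary text. That is worth checking against your own notes.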

[–] [email protected] 5 points 1 day ago (1 children)

The most bulletproof way to do this seems to be to escape the # characters before running the document through Markdown. A regex might suffice. (Insert regex and two problems joke here.) That seems promising because a heading marker is followed by whitespace and so matches #\s+, whereas a tag matches #[^\s] (a hash followed by a non-whitespace character).

I hope someone has an even better idea, but this ought to work.
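A rough sketch of that escaping step, using only the standard `re` module. The function name `escape_leading_tags` and the allowance for up to three leading spaces are assumptions made for the example.

```python
import re

# A single "#" at the start of a line (after at most three spaces, an
# assumption), immediately followed by a character that is neither
# whitespace nor another "#".
TAG_AT_LINE_START = re.compile(r"^([ ]{0,3})(#)(?=[^\s#])", re.MULTILINE)


def escape_leading_tags(text):
    """Put a backslash in front of a line-leading "#" that starts a tag."""
    return TAG_AT_LINE_START.sub(r"\1\\\2", text)


source = "#TagName some note text\n# A real heading\n"
escaped = escape_leading_tags(source)
```

The escaped text can then be passed to `markdown.markdown(escaped)`; Markdown treats `\#` as a literal pound sign. One caveat: lines inside code blocks that start with something like `#include` get escaped too, so this may need refining for notes that contain code.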

[–] [email protected] 2 points 23 hours ago (1 children)

I'm so bad at regex, would you mind explaining what's going on with these ones?

[–] [email protected] 4 points 17 hours ago* (last edited 16 hours ago)

#\s+ is:

  • #: a literal #

  • \s: any whitespace character (space, tab etc)

  • +: the previous thing (here the whitespace), one or more times

In words: "a hash followed by at least one whitespace character"

#[^\s] is:

  • #: a literal #

  • [^\s]: a negated character class. It matches any single character that is not in the set listed after the ^. \s has the same meaning as before (any whitespace character), so this matches any non-whitespace character

In words: "a hash followed by any character other than a whitespace character".
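To try the two patterns, #\s+ and #[^\s], against the start of a line in Python, something like this small sketch works. The `classify` helper is made up for the example.

```python
import re

HEADING = re.compile(r"#\s+")   # a hash followed by at least one whitespace character
TAG = re.compile(r"#[^\s]")     # a hash followed by one non-whitespace character


def classify(line):
    """Say whether a line starts like a heading, like a tag, or like neither."""
    # Pattern.match only looks at the beginning of the string.
    if HEADING.match(line):
        return "heading"
    if TAG.match(line):
        return "tag"
    return "other"


for line in ["# Title", "#TagName", "plain text"]:
    print(f"{line!r}: {classify(line)}")
```

Note that, as written, these patterns only deal with a single #: a line such as "## Subheading" is classified as a tag here because the second # counts as a non-whitespace character. A pattern like #+\s+ would cover multi-level headings.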

https://regex101.com/ is really good for explaining regex

[–] [email protected] 3 points 1 day ago (1 children)
[–] logging_strict 1 points 1 day ago (1 children)

The OP is discussing one step before pandoc.

[–] moonpiedumplings 1 points 21 hours ago

I just did a quick test with Quarto, which uses pandoc Markdown and pandoc for conversions, and it looks like pandoc doesn't recognize #nospace as a header (although this could be a Quarto-specific thing).

From a quick look at the Python library OP is using, it seems that is what they use to convert to HTML, rather than pandoc.

[–] ulterno 0 points 20 hours ago

I like kramdown, though I'm not sure it fixes this particular problem. It does have another nice little way to add tags^[Try converting your HTML with headings to a kramdown document and you will see what I mean.].

Also, discount is lovely and small.