this post was submitted on 29 Jan 2025

Edit

My question was very badly written, but the new title reflects the actual question. Thanks to 3 very friendly and dedicated users (@harsh3466 @tuna @learnbyexample) I was able to find a solution for my files, so thank you guys !!!

For those who randomly come across this post, here are 3 possible ways to achieve the desired results.

Solution 1 (https://lemmy.ml/post/25346014/16383487)

#! /bin/bash
files="/home/USER/projects/test.md"

# Grab everything from "](" to the last ")" on a line, for links that do not start with https
mdlinks="$(grep -Po ']\((?!https).*\)' "$files")"
# Keep only the part from the first # onwards (the heading anchor)
mdlinks2="$(grep -Po '#.*' <<<"$mdlinks")"

while IFS= read -r line; do
	# Converts 1.2 to 1-2 (a third-level heading number such as 1.2.3 needs an additional [0-9] group)
	dashlink="$(echo "$line" | sed -r 's|(.+[0-9]+)\.([0-9]+.+\))|\1-\2|')"
	sed -i "s/$line/${dashlink}/" "$files"

	# Lowercases everything after the hashtag
	lowercaselink="$(echo "$dashlink" | sed -r 's|#.+\)|\L&|')"
	sed -i "s/$dashlink/${lowercaselink}/" "$files"

	# Replaces encoded spaces (%20) with - after the hashtag
	spacelink="$(echo "$lowercaselink" | sed 's|%20|-|g')"
	sed -i "s/$lowercaselink/${spacelink}/" "$files"

done <<<"$mdlinks2"

Solution 2 (https://lemmy.ml/post/25346014/16453351)

sed -E ':l;s/(\[[^]]*\]\()([^)#]*#[^)]*\))/\1\n\2/;Te;H;g;s/\n//;s/\n.*//;x;s/.*\n//;/^https?:/!{:h;s/^([^#]*#[^)]*)(%20|\.)([^)]*\))/\1-\3/;th;s/(#[^)]*\))/\L\1/;};tl;:e;H;z;x;s/\n//;'

Solution 3 (https://lemmy.ml/post/25346014/16453161)

perl -pe 's/\[[^]]+\]\((?!https?)[^#]*#\K[^)]+(?=\))/lc $&=~s:%20|\d\K\.(?=\d):-:gr/ge'

Relevant links

https://mike.bailey.net.au/notes/software/apps/obsidian/issues/markdown-heading-anchors/#background


Hi everyone !

I'm in need of some assistance with string manipulation using sed and regex. I spent a whole day on trial & error and looking around the web for a solution, but it's way beyond my capabilities. Maybe there are some sed/regex gurus here who are willing to give me a helping hand !

From everything I gathered around the web, it seems to be a rather complicated regex and sed substitution, so here we go !

What am I trying to achieve?

I have a lot of markdown guides I want to host on a self-hosted Forgejo-based git instance. However, the classic markdown links are not the same as the ones on GitHub/Forgejo...

Convert the following string:

[Some text](#Header%20Linking%20MARKDOWN.md)

Into

[Some text](#header-linking-markdown.md)

As you can see, these are the requirements:

  • Pattern: [Some text](#link%20to%20header.md)
  • Only edit what's between parentheses
  • Replace space (%20) with -
  • Everything as lowercase
  • Links are sometimes in nested parentheses
    • e.g. (look here [Some text](#link%20to%20header.md))
  • Do not change a line that begins with https (external links)

While everything is probably a bit complex as a whole the trickiest part is probably the nested parentheses :/

What I tried

The furthest I got was the following:

sed -Ei 's|\(([^\)]+)\)|\L&|g' test3.md #make everything between parentheses lowercase

sed -i '/https/ ! s/%20/-/g' test3.md #change every %20 occurrence to -

I put these sed/regex substitutions together while roaming the web, but they have a lot of flaws and don't work with nested parentheses. The second one would also change every %20 occurrence in the file (on any line that doesn't contain https).
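A minimal sketch of that last problem, using a made-up sample line (not taken from the real files) and a hypothetical wrapper name `naive_dash`. The `%20` in the plain text is rewritten along with the one inside the link:

```shell
# Hypothetical sample: one %20 in plain text, one inside an internal link.
sample='Keep%20this [Some text](#Header%20Linking)'

# Same substitution as above, but reading stdin instead of editing a file in place.
naive_dash() {
  sed '/https/ ! s/%20/-/g'
}

result="$(printf '%s\n' "$sample" | naive_dash)"
printf '%s\n' "$result"
# prints: Keep-this [Some text](#Header-Linking)
```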

The closest solution I found on Stack Overflow looks similar, but I wasn't able to adapt it to my needs. My lack of regex/sed understanding makes it impossible to fit it to my requirements.


I would appreciate any help, even if a change of tool is needed. However, I'm more interested in the learning process, so a script or CLI alternative is very much appreciated :) Actually, any help is appreciated :D !

Thanks in advance.

top 50 comments
[–] [email protected] 9 points 6 days ago* (last edited 6 days ago) (2 children)

This is more of a general suggestion: if you use Regular Expression, use https://regex101.com. It provides syntax highlighting, explains the syntax and allows you to test your regexes.

Additionally, I think that sd is way more intuitive than sed.

[–] [email protected] 4 points 5 days ago

Bad advice for sed. regex101 doesn't support POSIX regexes, so you can't get the same results as with sed.

[–] [email protected] 4 points 6 days ago

Hello :) Thanks for your reply !

That's exactly what I did and how I came to my "final" result, but it doesn't work as expected... because of my lack of knowledge and understanding !

Will give sd a try and see if I can come up with something ! Thanks for the pointer !

[–] [email protected] 4 points 5 days ago (1 children)

Obligatory regex was a mistake post

[–] [email protected] 2 points 2 days ago

Yeah, bare-bones regex was probably a mistake. However, a friendly user gave me a step-by-step guide on how to achieve my goal; the resulting bash script is the one listed as Solution 1 at the top of the post.

If you know a better way to achieve similar results, I'm very open to new leads and to learning something new !

[–] [email protected] 5 points 5 days ago (2 children)

This is very close

sed ':loop;/\[[^]]*\](http/! s/\(\[[^]]*\]\)\(([^)]*\)%20\([^)]*)\)/\1\2-\3/g;t loop;/\[[^]]*\](http/! s/\(\[[^]]*\]\)\(([^)]*)\)/\1\L\2/g'

example file

[Some text](#Header%20Linking%20MARKDOWN.md)
(#Should%20stay%20as%20is.md)
Text surrounding [a link](readme.md#Other%20Page). Cool
Multiple [links](#Links.md) in (%20) [a](#An%20A.md) SINGLE [line](#Lines.md)
Do [NOT](https://example.com/URL%20Should%20Be%20Untouched.html) CHANGE%20 [hyperlinks](http://example.com/No%20Touchy.html)

but it doesn't work if you have a http link and markdown link in the same line, and doesn't work with [escaped \] square brackets](#and-escaped-\)-parenthesis) in the link

but!! it was fun!

[–] [email protected] 2 points 2 days ago (2 children)

Hello :) Sorry for the very late response !

In effect your regex is very close for a one-liner, I'm pretty impressed ! :0 However, I forgot to mention something in my post (I only thought about it after working on it with another user in the comments...). There are 2 things missing from your beautiful and complex regex:

  1. Numbering with dots also needs a dash in between (actually I think every special character, like a space or a dot, is converted to a dash)
FROM
---------------
[Link with numbers](readme.md#1.3%20this%20is%20another%20test)

TO
---------------
[Link with numbers](readme.md#1-3-this-is-another-test)
  2. The part before the hashtag needs to keep its original form (it links to a real file)
FROM
---------------
[Link with numbers](Another%20file%20to%20readme.md#1.3%20this%20is%20another%20test.md)

TO
---------------
[Link with numbers](Another%20file%20to%20readme.md#1-3-this-is-another-test.md)

Sorry for the trouble, I wasn't aware of all the GitHub-Flavored Markdown syntax :/. Together with another user I got a very cool script that works perfectly (the bash script listed as Solution 1 at the top of the post), but if you want to modify your regex and try to solve the issue in pure regex, feel free :) I'm very curious what it could look like (god, regex is so obscure and at the same time it has some beauty in it !)

[–] [email protected] 2 points 2 days ago* (last edited 2 days ago) (1 children)

I did it!! It also handles the case where an external link and internal link are on the same line :D

sed -E ':l;s/(\[[^]]*\]\()([^)#]*#[^)]*\))/\1\n\2/;Te;H;g;s/\n//;s/\n.*//;x;s/.*\n//;/^https?:/!{:h;s/^([^#]*#[^)]*)(%20|\.)([^)]*\))/\1-\3/;th;s/(#[^)]*\))/\L\1/;};tl;:e;H;z;x;s/\n//;'

Here is my annotated file

# Begin loop
:l;

# Bisect first link in pattern space into pattern space and append to hold space
# Example: `text [label](file#fragment)'
#   Pattern space: `file#fragment)'
#   Hold space: `text [label]('
# Steps:
#   1. Strategically insert \n
#       1a. If this fails, branch out
#   2. Append to hold space (this creates two \n's. It feels weird for the
#      first iteration, but that's ok)
#   3. Copy hold space to pattern space, remove first \n, then trim off
#      everything past the second \n
#   4. Swap pattern/hold, and trim off everything up to and incl the last \n
s/(\[[^]]*\]\()([^)#]*#[^)]*\))/\1\n\2/;
Te;
H;
g; s/\n//; s/\n.*//;
x; s/.*\n//;

# Modify only if it is an internal link
/^https?:/! {
    # Add hyphens
    :h;
    s/^([^#]*#[^)]*)(%20|\.)([^)]*\))/\1-\3/;
    th;
    # Make lowercase
    s/(#[^)]*\))/\L\1/;
};

# "conditional" branch so it checks the next conditional again
tl;

# Exit: join pattern space to hold space, then move to pattern space.
# Since the loop uses H instead of h, have to make sure hold space is empty
:e;
H;
z;
x; s/\n//;
[–] [email protected] 2 points 2 days ago (1 children)

Wow ! Thank you ! I did a quick test on a test-file.md

[Just a test](#just-a-test)
[Just a link](https://mylink/%20with%20space.com)
[External link](readme.md#just-a-test)
[Link with numbers](readme.md#1-3-this-is-another-test)
[Link with numbers](Another%20file%20to%20readme.md#1-3-this-is-another-test)

Great job ! Thank you very much !!! I'm really impressed by what someone with proper knowledge can do ! However, I really do not want to mess around with your regex... That would only invite disaster xD ! I will carefully keep your regex and annotated file in my knowledge base. I'm sure some time in the future I will come back to it and try to break it down as a learning process.

Thank you very much !!! 👍

[–] [email protected] 1 points 1 day ago

No problem. I think this is a great "final boss" question for learning sed, because it turns out it is deceptively hard!! You have to understand not only a lot about regex, but about sed to get it right. I learned a lot about sed just by tackling this problem!

I really do not want to mess around with your regex

It is very delicate for sure, but one part you can for sure change is at the # Add hyphens part. In the regex you can see (%20|\.). These are a list of "characters" which get converted to hyphens. For example, you could modify it to (%20|\.|\+) and it will convert +s to -s as well!
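As a rough illustration of that change, the sketch below is the same one-liner with `\+` added to the hyphen list. It assumes GNU sed (the `T` and `z` commands and `\L` are GNU extensions); the wrapper name `fix_links` and the sample line are made up for the example.

```shell
# Same program as the one-liner above, with \+ added to the (%20|\.) group.
# Assumes GNU sed: T, z and \L are GNU extensions.
fix_links() {
  sed -E ':l;s/(\[[^]]*\]\()([^)#]*#[^)]*\))/\1\n\2/;Te;H;g;s/\n//;s/\n.*//;x;s/.*\n//;/^https?:/!{:h;s/^([^#]*#[^)]*)(%20|\.|\+)([^)]*\))/\1-\3/;th;s/(#[^)]*\))/\L\1/;};tl;:e;H;z;x;s/\n//;'
}

# Hypothetical input: a fragment that uses both + and %20 as separators.
result="$(printf '%s\n' '[Plus link](notes.md#Some+Heading%20Here)' | fix_links)"
printf '%s\n' "$result"
# prints: [Plus link](notes.md#some-heading-here)
```

The file part before the `#` is left alone, including its dot, because the hyphen substitution only looks at text after the first `#`.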

Still it is not perfect:

  • If the link spans multiple lines, the regex won't match
  • If the link contains escaped characters like \\\\\[LINK](#LINK) or [LINK\]\\\\](#LINK), it won't be handled correctly
  • If the link is inside a code block ``` it will get changed (which may or may not be intended)

But for a sed-only solution this is about as good as it will get I'm afraid.

Overall I'm very happy with it. Someday I would like to make a video that goes into depth about sed, since it is tricky to learn just from the docs.

[–] [email protected] 1 points 2 days ago

I'll give another go at it :)

[–] [email protected] 4 points 5 days ago (1 children)

Annotated, it works like this:

# use a loop to iteratively replace the %20 with -, since doing s/%20/-/g would replace too much. we loop until it cant substitute any more

# label for looping
:loop;
# skip the following substitute command if the line contains an http link in markdown format
/\[[^]]*\](http/!
# capture each part of the link, and join it together with -
s/\(\[[^]]*\]\)\(([^)]*\)%20\([^)]*)\)/\1\2-\3/g;
# if the substitution made a change, loop again, otherwise break
t loop;

# convert all insides to the link lowercase if the line doesnt contain an http link
/\[[^]]*\](http/!
# this is outside the loop rather than in the s command above because if the link doesnt contain %20 at all then it won't convert to lowercase
s/\(\[[^]]*\]\)\(([^)]*)\)/\1\L\2/g
[–] [email protected] 2 points 5 days ago* (last edited 5 days ago) (1 children)

skip the following substitute command if the line contains an http link in markdown format

Why do you assume there's only one link in the line?

Also, you perform substitutions in the whole URL instead of only the fragment component.

[–] [email protected] 3 points 5 days ago* (last edited 5 days ago)

Why do you assume there's only one link in the line?

They did not want external (http) links to be modified as that would break it:

  • [Example](https://example.com/#Some%20Link)
  • [Example](https://example.com/#some-link)

I compromised by thinking that it might be unlikely enough to have an external http link AND internal link within the same line. You could probably still do it, my first thought was [^h][^t][^t][^p] but that would cause issues for #ttp and #A so i just gave up. Instead I think you'd want a different approach, like breaking each link onto their own line, do the same external/internal check before the substitution, and join the lines afterward.

Also, you perform substitutions in the whole URL instead of the fragment component

That requirement i missed. I just assumed the filename would be replaced the same way too Lol. Not too hard to fix tho :)

[–] [email protected] 8 points 6 days ago* (last edited 6 days ago) (1 children)

Honestly, I'd be looking at doing this in any other language that has a Markdown library to parse these. You're doing this on "hard mode" with sed. There are probably already a ton of Python tools out there that do this.

Have a look at this. Seems it could do the job: https://github.com/Wenzil/mdx_bleach

[–] [email protected] 3 points 6 days ago

Hello,

I have thought of a python script and looked a bit around but couldn't find something satisfactory. Also I'm a tiny bit more versed in bash/CLI than with python... Even though that's very arguable !

I looked through the Github repo and at first glance I have no idea how this could do the job, again I probably have to dig a bit deeper and understand what this is actually doing !

Thanks for the pointer will give it a try :)

[–] [email protected] 4 points 5 days ago* (last edited 5 days ago) (2 children)

As I see, you've already got an answer on how to convert text to lower case, so I'll just tell you how to replace all occurrences of %20 with -. You need to repeat the substitution until no more matches are found. For that kind of iteration you need to branch to a label. Below is a sed script with comments.

:subst                                         # label
s/(\[[^]]+\]\([^)#]*#[^)]*)%20([^)]*\))/\1-\2/ # replace the first occurrence of `%20` in the URL fragment
t subst                                        # go to the `subst` label if the substitution took place

However, there are some cases where this script will fail, e.g. if there is an escaped ] character in the link text. You cannot avoid such mistakes using only simple regexps; you need a full-featured markdown parser for this.
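A minimal sketch of how the script above could be run from the command line. It groups with unescaped parentheses, so extended-regex mode (`-E`) is assumed; the wrapper name `dash_fragment` and the sample line are invented for the example.

```shell
# Run the label/branch loop; -E because the script groups with bare ( ).
dash_fragment() {
  sed -E \
    -e ':subst' \
    -e 's/(\[[^]]+\]\([^)#]*#[^)]*)%20([^)]*\))/\1-\2/' \
    -e 't subst'
}

# Hypothetical input: %20 appears both in the file part and in the fragment.
result="$(printf '%s\n' '[Guide](My%20File.md#First%20Real%20Step)' | dash_fragment)"
printf '%s\n' "$result"
# prints: [Guide](My%20File.md#First-Real-Step)
```

This only handles `%20`. Lowercasing and skipping external links still have to be done separately.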

[–] [email protected] 1 points 2 days ago* (last edited 2 days ago)

Thank you very much for taking your time and trying to help me with comments and all !

you need a full featured markdown parser for this.

Do you mean something like pandoc? Someone pointed me to it and it seems it can convert to GitHub-Flavored Markdown ! Thanks for the pointer, I will give it a try to see how it works out with my actual script :)

Sorry for the very late response !! The working bash script another user helped me put together is the one listed as Solution 1 at the top of the post.

[–] [email protected] 2 points 5 days ago

NB: global substitution s///g is not applicable here because you need to perform new substitutions in already substituted text. Neither sed regexp syntax (basic or extended) supports lookarounds, which could solve this issue.
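The difference can be seen with a small sketch (assuming GNU sed with `-E`; the sample link and variable names are made up). With the `g` flag the first match already covers the whole link, so only one `%20` gets replaced, while the loop retries until nothing is left:

```shell
link='[a](#x%20y%20z)'
subst_cmd='s/(\[[^]]+\]\([^)#]*#[^)]*)%20([^)]*\))/\1-\2/'

# Single pass with the g flag: the match consumes the entire link once.
with_g="$(printf '%s\n' "$link" | sed -E "${subst_cmd}g")"

# Label/branch loop: the substitution is retried until it no longer matches.
with_loop="$(printf '%s\n' "$link" | sed -E -e ':subst' -e "$subst_cmd" -e 't subst')"

printf 'g flag: %s\nloop:   %s\n' "$with_g" "$with_loop"
```

The `g` result still contains a `%20`, whereas the loop result is `[a](#x-y-z)`.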

[–] [email protected] 4 points 6 days ago (19 children)

I've got a sed regex that should work, just writing up a breakdown of the whole command so anyone interested can follow what it does. Will post in a bit.

[–] learnbyexample 2 points 5 days ago* (last edited 5 days ago) (2 children)

Here's a solution with perl (assuming you don't want to change http/https after the start of ( instead of start of a line):

perl -pe 's/\[[^]]+\]\(\K(?!https?)[^)]+(?=\))/lc $&=~s|%20|-|gr/ge' ip.txt
  • e flag allows you to use Perl code in the substitution portion.
  • \[[^]]+\]\(\K match square brackets and use \K to mark the start of matching portion (text before that won't be part of $&)
  • (?!https?) don't match if http or https is found
  • [^)]+(?=\)) match non ) characters and assert that ) is present after those characters
  • $&=~s|%20|-|gr change %20 to - for the matching portion found, the r flag is used to return the modified string instead of change $& itself
  • lc is a function to change text to lowercase
[–] [email protected] 2 points 2 days ago (1 children)

Sorry for the late response... I was busy with another user :S My English is so bad I'm not able to respond to everyone at the same time... Whatever...

I tried your Perl regex substitution and it does indeed do what I asked for in my post, so thank you very much for your help ! However, I missed a few use cases where your regex breaks... But that's on me, your command works as expected !!!

[Link with numbers](Another%20Markdown%20file.md#1.3%20this%20is%20another%20test.md)

The part before the hashtag needs to keep its original form (even with %20) because it links to a markdown file directly and not a header (hope that's comprehensible?). It took me a lot of time with another user, and we ended up with a script that does everything (the bash script listed as Solution 1 at the top of the post).

If you are motivated, you can still improve your regex if you want :) I'm kinda curious whether it's possible with a one-liner ! Thanks again for your help and sorry for the late response !!

[–] learnbyexample 2 points 2 days ago (1 children)

This might work, but I think it is best to not tinker further if you already have a working script (especially one that you understand and can modify further if needed).

perl -pe 's/\[[^]]+\]\((?!https?)[^#]*#\K[^)]+(?=\))/lc $&=~s:%20|\d\K\.(?=\d):-:gr/ge'
[–] [email protected] 2 points 2 days ago (1 children)

Thank you ! It actually ticks every use case (for my files), looks pretty rad !

This might work, but I think it is best to not tinker further if you already have a working script (especially one that you understand and can modify further if needed).

I totally agree, but I will keep your regex as a reference. In the near future I will try to break down your regex as a learning process, but it looks rather complex !

Another user came up with the following solution:

sed -E ':l;s/(\[[^]]*\]\()([^)#]*#[^)]*\))/\1\n\2/;Te;H;g;s/\n//;s/\n.*//;x;s/.*\n//;/^https?:/!{:h;s/^([^#]*#[^)]*)(%20|\.)([^)]*\))/\1-\3/;th;s/(#[^)]*\))/\L\1/;};tl;:e;H;z;x;s/\n//;'

Just as a little experiment, if you want to spend some time and give me an answer, what do you think? It's another way to achieve the same kind of result, but the two are significantly different. I know there are a thousand ways to achieve the same results, but I'm kinda curious how it looks through an expert's eyes :).

Thanks again for your help and the time you took to write up a complex regex for my use case ! 👍

[–] learnbyexample 1 points 3 hours ago

Well, I'm not going to even try understanding the various features used in that sed command. I do know how to use basic loops with labels, but I never bothered with all the buffer manipulation stuff. I'd rather use awk/perl/python for those cases.

[–] [email protected] 2 points 5 days ago (1 children)

I didn't test this, but it will change the whole URL while changes are only needed in its fragment component (after the first #).

[–] learnbyexample 1 points 5 days ago (1 children)

Hmm, OP mentioned "Only edit what’s between parentheses" - don't see anywhere that whole URL shouldn't be changed...

[–] [email protected] 1 points 5 days ago

Paths are constant, only anchors are generated by forgejo.

[–] [email protected] 1 points 4 days ago (1 children)

Not home so I can't try it but do you need to be so specific to match the whole markdown syntax?

You might be able to get away with

s/#(\w+%20)*\w+\.\w{2,3}/\L&/g; /#(\w+%20)*\w+\.\w{2,3}/ s/%20/-/g

basically, matching #this%20is%20LIKELY%20a%20link.md as opposed to matching whole markdown link

lowercasing that entire match, then on a search matching stuff that looks like that, replace the %20 with a hyphen (combined into a single sed command). this only fails when an http link falls within the same line as a markdown hyperlink
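
The expression as posted uses bare `( )` groups and `{2,3}`, which GNU sed treats as operators only in extended mode, so the sketch below assumes it was meant to be run with `-E` (the comment does not say how it was run). The wrapper name `loose_fix` and the sample line are made up.

```shell
# The two commands from above, run in extended-regex mode (an assumption).
loose_fix() {
  sed -E 's/#(\w+%20)*\w+\.\w{2,3}/\L&/g; /#(\w+%20)*\w+\.\w{2,3}/ s/%20/-/g'
}

result="$(printf '%s\n' '[Some text](#Header%20Linking%20MARKDOWN.md)' | loose_fix)"
printf '%s\n' "$result"
# prints: [Some text](#header-linking-markdown.md)
```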

[–] [email protected] 1 points 2 days ago

Hello :) Sorry for the late response !!! I was busy working it out with another user ! Out of curiosity I gave your sed regex a try, but there seems to be a missing ( somewhere ! I tried to fix the issue, but your regex is way beyond my capabilities ! If you are a sed/regex fanatic and want to give it another try, feel free :). In the meantime I found a solution with another user that works great; the script in question is the bash script listed as Solution 1 at the top of the post, if you are interested.

It's not very elegant but it does the job... While working on it with another very friendly user, I came across other things I hadn't thought of, like:

  • Converting 1.2 to 1-2 (e.g. [Just a placeholder](#1.2%20Just%20a%20link%20to%20header))
  • Linking to another markdown file (e.g. [Just a placeholder](Another%20File.md#1.2%20Just%20a%20link%20to%20header))
  • The link to the file before the # needs to keep its original form (e.g. [Just a placeholder](Another%20File.md#1-2-just-a-link-to-header))

Well, I think that bare-bones sed/regex wasn't the right tool, but in a bash script it does exactly what I'm expecting :)

Thanks for your help and pointers !

[–] [email protected] 1 points 5 days ago (2 children)
[–] [email protected] 1 points 2 days ago (1 children)

Hello :) Sorry to ping you, I just gave pandoc a try but it doesn't work, and I had to dig a bit further into the web to find out why !

Links to headings with spaces are not specified by CommonMark, and each tool implements a different approach... Most replace spaces with hyphens, others use URL encoding (%20). So even though pandoc looks awesome, it doesn't work for my use case (or did I miss something? Feel free to comment).

You can give it a try on https://pandoc.org/try/ with commonmark to gfm:

[Just a test](#Just a test)
[Just a link](https://mylink/%20with%20space.com)
[External link](Readme.md#JUST%20a%20test)
[Link with numbers](readme.md#1.3%20this%20is%20another%20test)
[Link with numbers](Another%20file%20to%20readme.md#1.3%20this%20is%20another%20test)

If you prefer a CLI version:

pandoc --from=commonmark_x --to=gfm+gfm_auto_identifiers "/home/user/Documents/test.md" -o "pandoc_test.md"
[–] [email protected] 2 points 2 days ago

Hey I just did a quick web search and found this. I haven't used the tool specifically before. However I recommend either searching the web for a similar tool or using a chatgpt like tool to create a python script that'll achieve your end result. Sed and regex are cool and useful, but they're only going to make it more difficult to achieve what you need.

[–] [email protected] 1 points 2 days ago

Thanks for the pointer, I wasn't aware pandoc was able to do that :/ It seems it can convert to GitHub-Flavored Markdown !! I have to give it a try :) Still, I learned a lot from another user about regex/sed and Perl :) !
