Edit
My question was very badly written but the new title reflect the actual question. Thanks to 3 very friendly and dedicated users (@harsh3466 @tuna @learnbyexample) I was able to find a solution for my files, so thank you guys !!!
For those who will randomly come across this post here are 3 possible ways to achieve the desired results.
Solution 1 (https://lemmy.ml/post/25346014/16383487)
#! /bin/bash
files="/home/USER/projects/test.md"
mdlinks="$(grep -Po ']\((?!https).*\)' "$files")"
mdlinks2="$(grep -Po '#.*' <<<$mdlinks)"
while IFS= read -r line; do
#Converts 1.2 to 1-2 (For a third level heading needs to add a supplementary [0-9])
dashlink="$(echo "$line" | sed -r 's|(.+[0-9]+)\.([0-9]+.+\))|\1-\2|')"
sed -i "s/$line/${dashlink}/" "$files"
#Puts everything to lowercase after a hashtag
lowercaselink="$(echo "$dashlink" | sed -r 's|#.+\)|\L&|')"
sed -i "s/$dashlink/${lowercaselink}/" "$files"
#Removes spaces (%20) from markdown links after a hashtag
spacelink="$(echo "$lowercaselink" | sed 's|%20|-|g')"
sed -i "s/$lowercaselink/${spacelink}/" "$files"
done <<<"$mdlinks2"
Solution 2 (https://lemmy.ml/post/25346014/16453351)
sed -E ':l;s/(\[[^]]*\]\()([^)#]*#[^)]*\))/\1\n\2/;Te;H;g;s/\n//;s/\n.*//;x;s/.*\n//;/^https?:/!{:h;s/^([^#]*#[^)]*)(%20|\.)([^)]*\))/\1-\3/;th;s/(#[^)]*\))/\L\1/;};tl;:e;H;z;x;s/\n//;'
Solution 3 (https://lemmy.ml/post/25346014/16453161)
perl -pe 's/\[[^]]+\]\((?!https?)[^#]*#\K[^)]+(?=\))/lc $&=~s:%20|\d\K\.(?=\d):-:gr/ge'
Relevant links
https://mike.bailey.net.au/notes/software/apps/obsidian/issues/markdown-heading-anchors/#background
Hi everyone !
I'm in need for some assistance for string manipulation with sed and regex. I tried a whole day to trial & error and look around the web to find a solution however it's way over my capabilities and maybe here are some sed/regex gurus who are willing to give me a helping hand !
With everything I gathered around the web, It seems it's rather a complicated regex and sed substitution, here we go !
What Am I trying to achieve?
I have a lot of markdown guides I want to host on a self-hosted forgejo based git markdown. However the classic markdown links are not the same as one github/forgejo...
Convert the following string:
[Some text](#Header%20Linking%20MARKDOWN.md)
Into
[Some text](#header-linking-markdown.md)
As you can see those are the following requirement:
- Pattern:
[Some text](#link%20to%20header.md)
- Only edit what's between parentheses
- Replace
space (%20)
with-
- Everything as lowercase
- Links are sometimes in nested parentheses
- e.g. (look here
[Some text](#link%20to%20header.md)
)
- e.g. (look here
- Do not change a line that begins with
https
(external links)
While everything is probably a bit complex as a whole the trickiest part is probably the nested parentheses :/
What I tried
The furthest I got was the following:
sed -Ei 's|\(([^\)]+)\)|\L&|g' test3.md #make everything between parentheses lowercase
sed -i '/https/ ! s/%20/-/g' test3.md #change every %20 occurrence to -
These sed/regx substitution are what I put together while roaming the web, but it has a lot a flaws and doesn't work with nested parentheses. Also this would change every %20
occurrence in the file.
The closest solution I found on stackoverflow looks similar but wasn't able to fit to my needs. Actually my lack of regex/sed understanding makes it impossible to adapt to my requirements.
I would appreciate any help even if a change of tool is needed, however I'm more into a learning processes, so a script or CLI alternative is very appreciated :) actually any help is appreciated :D !
Thanks in advance.
Hello :) Sorry for the very late response !
Effectively your regex is very close as a one line, I'm pretty impress ! :0 However I missed to mention something In my post (I only though about it after working on it with another user in the comments...). There a 2 things missing on your beautiful and complex regex:
Sorry for the trouble I wasn't aware of all the GitHub-Flavored Markdown syntax :/. I got a a very cool working script that works perfectly with another user but If you want to modify your regex and try to solve the issue in pure regex feel free :) I'm very curious how It could look like (god regex is so obscure and at the same time it has some beauty in it !)
I did it!! It also handles the case where an external link and internal link are on the same line :D
Here is my annotated file
Wow ! Thank you ! It did a rapid test on a test-file.md
Great job ! Thank you very much !!! I'm really impressed what someone with proper knowledge can do ! However, I really do not want to mess around with your regex... This will only call for disaster xD ! I will keep preciously your regex and annotated file in my knowledge base, I'm sure some time in the future I will come back to it and try to break it down as learning process.
Thank you very much !!! 👍
No problem. I think this is a great "final boss" question for learning sed, because it turns out it is deceptively hard!! You have to understand not only a lot about regex, but about sed to get it right. I learned a lot about sed just by tackling this problem!
It is very delicate for sure, but one part you can for sure change is at the
# Add hyphens
part. In the regex you can see(%20|\.)
. These are a list of "characters" which get converted to hyphens. For example, you could modify it to(%20|\.|\+)
and it will convert+
s to-
s as well!Still it is not perfect:
\\\\\[LINK](#LINK)
or[LINK\]\\\\](#LINK)
But for a sed-only solution this is about as good as it will get I'm afraid.
Overall I'm very happy with it. Someday I would like to make a video that goes into depth about sed, since it is tricky to learn just from the docs.
I'll give another go at it :)