For those who randomly come across this post, here are 3 possible ways to achieve the desired result.
Bash script (edits the markdown file in place):
```bash
#!/bin/bash
# Assumed usage: ./script.sh guide.md  (the file to fix is the first argument)
files="$1"
# Collect every link target that is not an external https link
mdlinks="$(grep -Po ']\((?!https).*\)' "$files")"
# Keep only the part from the hashtag onward
mdlinks2="$(grep -Po '#.*' <<<"$mdlinks")"
while IFS= read -r line; do
  # Converts 1.2 to 1-2 (a third-level heading needs a supplementary [0-9])
  dashlink="$(echo "$line" | sed -r 's|(.+[0-9]+)\.([0-9]+.+\))|\1-\2|')"
  sed -i "s/$line/${dashlink}/" "$files"
  # Puts everything after the hashtag in lowercase
  lowercaselink="$(echo "$dashlink" | sed -r 's|#.+\)|\L&|')"
  sed -i "s/$dashlink/${lowercaselink}/" "$files"
  # Replaces spaces (%20) with dashes after the hashtag
  spacelink="$(echo "$lowercaselink" | sed 's|%20|-|g')"
  sed -i "s/$lowercaselink/${spacelink}/" "$files"
done <<<"$mdlinks2"
```
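GNU sed one-liner (`T`, `z` and `\L` are GNU extensions; prints the result to standard output):
```bash
sed -E ':l;s/(\[[^]]*\]\()([^)#]*#[^)]*\))/\1\n\2/;Te;H;g;s/\n//;s/\n.*//;x;s/.*\n//;/^https?:/!{:h;s/^([^#]*#[^)]*)(%20|\.)([^)]*\))/\1-\3/;th;s/(#[^)]*\))/\L\1/;};tl;:e;H;z;x;s/\n//;'
```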
Perl one-liner (prints the result to standard output):
```bash
perl -pe 's/\[[^]]+\]\((?!https?)[^#]*#\K[^)]+(?=\))/lc $&=~s:%20|\d\K\.(?=\d):-:gr/ge'
```
Hi everyone!
I need some help with string manipulation using sed and regex. I spent a whole day on trial and error and searching the web for a solution, but it's way beyond my capabilities, so maybe some sed/regex gurus here are willing to give me a helping hand!
From everything I gathered around the web, it seems this needs a rather complicated regex and sed substitution, so here we go!
What am I trying to achieve?
I have a lot of markdown guides that I want to host on a self-hosted Forgejo git instance. However, classic markdown header links are not in the same format as the ones GitHub/Forgejo use...
Convert the following string:
```markdown
[Some text](#Header%20Linking%20MARKDOWN.md)
```
into:
```markdown
[Some text](#header-linking-markdown.md)
```
As you can see, these are the requirements:
- Pattern: `[Some text](#link%20to%20header.md)`
- Only edit what's between the parentheses
- Replace spaces (`%20`) with dashes (`-`)
- Everything in lowercase
- Links are sometimes inside nested parentheses
  - e.g. `(look here [Some text](#link%20to%20header.md))`
- Do not change a link that begins with `https` (external links)
The whole thing is a bit complex, but the trickiest part is probably the nested parentheses :/
What I tried
The furthest I got was the following:
```bash
sed -Ei 's|\(([^\)]+)\)|\L&|g' test3.md # make everything between parentheses lowercase
sed -i '/https/ ! s/%20/-/g' test3.md   # change every %20 occurrence to -
```
I put these sed/regex substitutions together while roaming the web, but they have a lot of flaws and don't work with nested parentheses. Also, the second one changes every `%20` occurrence in the file (on any line without `https`), not just the ones inside links.
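Replaying the two commands from above on a few sample lines makes the flaws concrete. This sketch assumes GNU sed (`\L` is a GNU extension), reads standard input instead of `test3.md`, and uses made-up sample lines; `tried` is a hypothetical function name.

```bash
#!/usr/bin/env bash
# Replays the two "What I tried" commands on made-up sample lines.
# Assumes GNU sed (\L is a GNU extension); reads stdin instead of test3.md.
tried() {
  sed -E 's|\(([^\)]+)\)|\L&|g' | sed '/https/ ! s/%20/-/g'
}

printf '%s\n' \
  'Go to (My Notes) or [Some text](#Header%20One) or raw%20text' \
  '(Look Here [Some Text](#A%20B))' \
  '[Docs](https://example.com/A%20B) and [x](#C%20D)' | tried
# Go to (my notes) or [Some text](#header-one) or raw-text
#   -> plain text in parentheses and %20 outside links are changed too
# (look here [some text](#a-b))
#   -> nested parentheses: the link text itself gets lowercased
# [Docs](https://example.com/a%20b) and [x](#c%20d)
#   -> the external link is lowercased, and the internal %20 is kept
```

The root cause is that both commands work on whole parenthesized groups or whole lines instead of on the link fragment after `](#`.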
The closest solution I found on Stack Overflow looks similar, but my limited regex/sed understanding made it impossible for me to adapt it to my requirements.
I would appreciate any help, even if a change of tool is needed. However, I'm more into a learning process, so a script or CLI alternative is very appreciated :) Actually, any help is appreciated :D!
Thanks in advance.
As far as I can see, you've already got an answer on how to convert text to lower case, so I'll just tell you how to replace all occurrences of `%20`. You need to repeat the substitution until no matches are found, and for that kind of iteration you need to branch to a label. Below is a sed script with comments.
However, there are some cases where this script will fail, e.g. if there is an escaped character in the link text. You cannot avoid such mistakes using only simple regexps; you need a full-featured markdown parser for this.
NB: global substitution (the `g` flag) is not applicable here, because you need to perform new substitutions in already substituted text. Neither of sed's regexp syntaxes (basic and extended) supports lookarounds, which could solve this issue.
Thank you very much for taking your time and trying to help me with comments and all!
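A minimal sketch of the label-and-branch loop described in the answer (an illustration, not the answerer's own script), assuming GNU sed for `\L` and assuming internal links are always written as `[text](#fragment)`. `fix_fragments` is a hypothetical function name.

```bash
#!/usr/bin/env bash
# Sketch of the label/branch loop (GNU sed assumed for \L).
# fix_fragments is a hypothetical name; it only handles links written
# as [text](#fragment), so external https links are never touched.
fix_fragments() {
  sed -E '
    # Replace one %20 inside a [text](#...) fragment per pass; "ta" jumps
    # back to label a while the last substitution succeeded.
    :a
    s/(\[[^]]*\]\(#[^)]*)%20/\1-/
    ta
    # Lowercase each (#...) fragment that follows "]"; the g flag is safe
    # here because these matches never overlap.
    s/(\[[^]]*\]\()(#[^)]*\))/\1\L\2/g
  ' "$@"
}

printf '%s\n' '(look here [Some text](#Header%20Linking%20MARKDOWN.md))' | fix_fragments
# (look here [Some text](#header-linking-markdown.md))
```

Each pass consumes the `[text](#` prefix together with one `%20`, which is why a single `s///g` cannot reach the next `%20` in the same link and the loop is needed. Unlike the perl version, this sketch does not turn `1.2` into `1-2` and ignores links that have a file name before the hashtag.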
Do you mean something like pandoc? Someone pointed me to it, and it seems it can convert to GitHub-Flavored Markdown! Thanks for the pointer, I'll give it a try and see how it works out with my actual script :)
Sorry for the very late response!! Here is the working bash script another user helped me put together: