this post was submitted on 12 Aug 2023
83 points (97.7% liked)

Explain Like I'm Five


I feel like whenever I see the ampersand on this website, it’s followed with “amp;”. I’ve noticed it other places on the internet also. Why does this happen? Is it some programming thing?

Just for a test: &

all 20 comments
[–] [email protected] 52 points 1 year ago* (last edited 1 year ago) (1 children)

Yes, it's a programming thing. If you think about all of the symbols you can see in a website, there are many more than what's available on your keyboard, like the Copyright symbol, ©. Since programmers deal with the same limited set of keyboard keys, everything that can be displayed in text is given a unique number. Lowercase a is 97, the Copyright symbol is 169. Some commonly used ones are also given a slightly more readable name in addition to the number.
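
If you want to see those numbers for yourself, here is a small sketch in Python (the language is just my pick for illustration; most languages have an equivalent). `ord` gives the number behind a character and `chr` goes the other way:

```python
# Every character has a number behind it (its Unicode code point).
code_a = ord("a")          # lowercase a
code_copyright = ord("©")  # the Copyright symbol

# chr goes the other way: number -> character.
symbol = chr(169)

print(code_a, code_copyright, symbol)
```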

Now that there are numbers or names associated with everything, the programmers need a way to tell the browser that this number or text should be used to refer to a symbol. To do this, they use an Escape Character, a symbol that tells the browser to treat what follows as a lookup. That escape character in HTML is the ampersand, so & yen; will display as the symbol for Japanese Yen, ¥. The semicolon denotes the end of the Escape Sequence.

So with Ampersand having a special meaning, to show an actual ampersand you would typically use & amp; and the browser would turn it back into &. However, if you're not looking at the text in a browser, or if the area where the text is displayed doesn't understand the & amp; notation, then you will see it exactly as you described.
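
A minimal sketch of that lookup, using Python's standard `html` module as a stand-in for what a browser does when it displays text (that stand-in is an assumption for illustration; a real browser does a lot more):

```python
import html

# The browser-style lookup: escape sequence -> symbol.
yen = html.unescape("&yen;")           # named reference
copyright_ = html.unescape("&#169;")   # numeric reference, same idea
ampersand = html.unescape("&amp;")     # the ampersand's own escape sequence

# Going the other way: make a plain ampersand safe to put into HTML.
escaped = html.escape("&")

print(yen, copyright_, ampersand, escaped)
```

If nothing runs the `unescape` step before the text reaches your eyes, you see the raw `&amp;`.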

[–] [email protected] 7 points 1 year ago (1 children)

But, shouldn't this have been a solved problem like, back in the 90s? Why is it that modern software like Lemmy still has issues with it?

[–] [email protected] 6 points 1 year ago

There are good reasons why software may wish to ignore escape characters, but this likely comes down to human error. There are many programming problems that have been solved for decades, but occasionally you'll still see them appear in newer software for that reason.

From my own work, I certainly have code that isn't 100% right, but it works well enough that I instead spend my time in other areas.

[–] [email protected] 33 points 1 year ago* (last edited 1 year ago) (2 children)

& amp; is the HTML code for the symbol &. It just isn't being parsed correctly, so you are seeing the code.

(I had to add a space so it would show it)

[–] [email protected] 16 points 1 year ago (1 children)
[–] [email protected] 3 points 1 year ago (2 children)
[–] [email protected] 1 points 1 year ago (1 children)
[–] [email protected] 1 points 1 year ago (1 children)

Ok, what is this wizardry?

[–] [email protected] 1 points 1 year ago
[–] [email protected] 1 points 1 year ago

hmm what if you use the escape code to write the escape code :P

&amp;

written as &amp;amp;

... written as &amp;amp;amp;

[–] [email protected] 22 points 1 year ago* (last edited 1 year ago)

It's because some part of the post is being sanitized. Sanitizing reduces the chance that someone can type in something that gets executed by the server or your web browser in an unexpected way.

https://github.com/LemmyNet/lemmy/blob/main/RELEASES.md#major-changes-1

In terms of security, Lemmy now performs HTML sanitization on all messages which are submitted through the API or received via federation. Together with the tightened content-security-policy from 0.18.2, cross-site scripting attacks are now much more difficult.

The & symbol is however incorrectly parsed by the sanitizer, which will eventually be patched by the devs.
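
For a rough idea of what sanitizing means, here is a sketch in Python. This is not Lemmy's actual code (Lemmy is written in Rust and its sanitizer is more involved), and `render_comment` is a made-up helper:

```python
import html

def render_comment(user_text: str) -> str:
    """Hypothetical helper: wrap user text in a paragraph, escaping it first."""
    # Escaping turns <, > and & into harmless text, so the browser
    # displays the characters instead of treating them as markup.
    return "<p>" + html.escape(user_text) + "</p>"

page = render_comment("<script>alert('hi')</script>")
plain = render_comment("R&D")

print(page)
print(plain)
```

The same step that defuses `<script>` also turns & into `&amp;`, which is harmless as long as it gets decoded exactly once on the way back out.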

[–] [email protected] 13 points 1 year ago* (last edited 1 year ago) (1 children)

There's not enough symbols on my keyboard, so let's invent a code so we can write other symbols

  1. let's say & means start of code
  2. and say ; means end of code
  3. Between the start and end is the code

Now let's make some real symbols

  • &cent; can be ¢
  • &copy; can be ©
  • &divide; can be ÷

I want to tell other people how to use our new code, but if I tell them to "just write &divide;" it'll turn my message into "just write ÷" !! So how can we fix this?

What if we make & its own code?

  • &amp; -> &
  • &amp;divide; -> &divide; ???

Yes! That'll work :)
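
Here is that trick checked in Python, with `html.unescape` standing in for whatever decodes the codes (my assumption, just for illustration):

```python
import html

# Decoding the plain code gives the symbol...
decoded_symbol = html.unescape("&divide;")

# ...but writing the & as its own code protects it: only the &amp;
# part is decoded, so the reader sees the code itself.
decoded_code = html.unescape("&amp;divide;")

print(decoded_symbol, decoded_code)
```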

This is how &amp; came to be, and it's specifically used in HTML as a way to write those symbols above (and to escape a few other symbols, for reasons similar to the one we had for &)

As for why & shows up as &amp;, there are 2 main places I can see this happening:

  1. The editor you use to write it automatically converts & -> &amp;. But the user typed in &amp; (making it &amp;amp;). I think this is most likely. I'm guessing the titles of posts automatically do the conversion, but the post body and comments do not, because they use a raw markdown editor
  2. In some contexts the &amp; specifically doesn't get converted? Like how you can write `&amp;` to get &amp; as opposed to seeing
[–] [email protected] 3 points 1 year ago* (last edited 1 year ago)

Lol, ok there might be a little more than what meets the eye, cuz when I type `&` (without amp;) it converted it to &amp; !!

new challenge: try to get it to render a & instead of &amp; inside of `` (or ``` ``` for bulk testing)

tried:

  • `&`
  • `&`
  • `&`
  • `&;`
  • `&<invisible character>`
  • `&divide;` still becomes `÷` weirdly enough

I kinda cheated cuz it's not the same character... but I got it to show by using the Japanese fullwidth ampersand (＆)

test:

[–] [email protected] 9 points 1 year ago* (last edited 1 year ago)

Let's see if I can give a trimmed-down explanation of what "character escapement" is, because others have covered that & in web-land is an escape character.

The simplest type of escapement is probably quotes.

var myString = "This is a string";

This little line of pseudo-code is roughly what you would write (depending on language) to make a program write the text This is a string into some location in memory, that is, a sequence of numbers that are the standard numbers for representing those letters.

Here, the double-quote character is serving a special purpose, to designate that the characters within the set of quotes represent not instructions for things that the program should do, but instead just bits of data that the program should load.

Now consider: what if the character data that you want to load into memory has an actual double-quote character within it? How does the compiler (the program that turns your code into its own program) know the difference between a double-quote character that's supposed to serve the special purpose, and a double-quote character that's just supposed to be a piece of data like the other characters? The answer is escapement.

var myString = "This is a \"string\"";

Here, the backslash character serves its own special purpose of escaping other characters. When the compiler is reading this code, it knows that whatever character follows the backslash is supposed to be interpreted specially: in this case, the double-quote should not be interpreted as the end of the string, as usual, but as just a character to be put within the string. The backslash doesn't end up in memory with the other characters, but it tells the compiler how to interpret things.
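
The same idea as a runnable sketch in Python (the pseudo-code isn't Python, but the backslash works the same way there):

```python
# The backslashes tell Python the inner quotes are data,
# not the end of the string.
my_string = "This is a \"string\""

# What ends up in memory: the quotes are there, the backslashes are not.
has_backslash = "\\" in my_string
quote_count = my_string.count('"')

print(my_string)
print(has_backslash, quote_count)
```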

In web-land, ampersand is an escape character. If you want to embed plain text to be displayed on the screen, within HTML, you need to "escape" special characters that have a non-text purpose normally, in order to get those characters to display as text. Ampersand is the escape character in HTML, and by extension, it also has its OWN escape sequence, which is &amp;.

The reason you see &amp; in places across Lemmy is likely just due to a bug of some kind. Somewhere between when the user enters this text and when it later gets displayed, there's code that escapes the text one more time than necessary.
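
A sketch of how that kind of bug could look, in Python. This is a guess at the general shape of the problem, not Lemmy's actual code:

```python
import html

typed = "Tom & Jerry"

stored = html.escape(typed)         # escaped once: correct
stored_twice = html.escape(stored)  # escaped again by mistake

# The browser decodes exactly once when it displays the text.
shown_ok = html.unescape(stored)
shown_buggy = html.unescape(stored_twice)

print(shown_ok)
print(shown_buggy)
```

One decode undoes one escape, so the second escape is what leaks through to the reader.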

[–] [email protected] 3 points 1 year ago* (last edited 1 year ago)

The reason is that a programmer at some point decided that & should indicate the start of a special symbol in HTML. In programming parlance this is a means of “escaping” characters which are reserved.

For example, in HTML, things look something like this:

<p>Hello, World!</p>

The p between the less-than and greater-than symbols means “paragraph”, and the closing version with the slash means “the paragraph is done”.

However, there’s a problem. What if you wanted to actually show <p> to the end-user and have it not be treated as HTML? You use the ampersand syntax to write < by using &lt; and > by using &gt;.

<p>&lt;p&gt;</p>

Yet another problem: if we use & as a special character in HTML, we also need a way to display it. The answer is &amp;
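
If you wrote the escaping by hand, it might look like this Python sketch. `escape_html` is a made-up name, and real code would normally use a library function instead:

```python
def escape_html(text: str) -> str:
    """Hypothetical hand-rolled escaper for the three characters discussed here."""
    # Replace & first. If it went last, the & inside a freshly written
    # "&lt;" would itself get escaped again.
    text = text.replace("&", "&amp;")
    text = text.replace("<", "&lt;")
    text = text.replace(">", "&gt;")
    return text

tag = escape_html("<p>")
amp = escape_html("&")
mixed = escape_html("a < b & c")

print(tag, amp, mixed)
```

Doing the & replacement in the wrong order is one easy way to end up with stray `&amp;` text on a page.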

[–] [email protected] 1 points 1 year ago (2 children)

Ok so it didn’t do it. I don’t get it.

[–] [email protected] 3 points 1 year ago

In my internet experience the &amp; is a bug that pops up sometimes. On some sites I've been on, it is a problem that comes and goes.