this post was submitted on 07 Aug 2023
57 points (96.7% liked)

Lemmy.ca Support / Questions


Support / Questions specific to lemmy.ca.

For support / questions related to the lemmy software itself, go to [email protected]


Right now, robots.txt on lemmy.ca is configured this way:

User-Agent: *
  Disallow: /login
  Disallow: /login_reset
  Disallow: /settings
  Disallow: /create_community
  Disallow: /create_post
  Disallow: /create_private_message
  Disallow: /inbox
  Disallow: /setup
  Disallow: /admin
  Disallow: /password_change
  Disallow: /search/
  Disallow: /modlog

Would it be a good idea, privacy-wise, to block GPTBot from scraping content from the server?

User-agent: GPTBot
Disallow: /

Thanks!

[–] [email protected] 18 points 1 year ago (5 children)

Yes. Ban them.

if ($http_user_agent = "GPTBot") {
  return 403;
}
[–] [email protected] 6 points 1 year ago (3 children)

Probably want == instead, or else we will all be forbidden

[–] [email protected] 3 points 1 year ago* (last edited 1 year ago)

I would have thought so too, but == failed the syntax check:

2023/08/07 15:36:59 [emerg] 2315181#2315181: unexpected "==" in condition in /etc/nginx/sites-enabled/lemmy.ca.conf:50

You actually want ~ though, because GPTBot is only part of the user agent, not the full string. With = the condition only matches when the whole header is exactly "GPTBot".
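
A minimal sketch of the regex version, assuming it goes in the same server block as the original snippet (the placement is an assumption, not the actual lemmy.ca config):

```nginx
# ~ is a case-sensitive regex match, so this fires whenever "GPTBot"
# appears anywhere in the User-Agent header, not only when the header
# is exactly "GPTBot".
if ($http_user_agent ~ "GPTBot") {
    return 403;
}
```

Using ~* instead of ~ would make the match case-insensitive.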

[–] [email protected] 2 points 1 year ago

Strangely, in nginx = is the equality comparison, doing what == does elsewhere. It's a very odd config format...

https://nginx.org/en/docs/http/ngx_http_rewrite_module.html#if

[–] quesomodo 1 points 1 year ago

Look at me! I'm the GPTBot now!
