this post was submitted on 09 Oct 2024

Django

[email protected], 1 month ago (last edited 1 month ago)

I rather like the directory structure change of placing the project "app" separate from the other apps, and am a big supporter of using just or task over make for a variety of reasons, but I'm going to push way back on two fronts: multiple configurations & settings files and secrets in the repo.

Your configuration & settings should be identical between environments. Otherwise you're injecting surprises into your project that only happen in certain environments. If your local development operates differently from production, how exactly do you expect to find & fix problems that only occur in production? I wrote a whole rant on this a couple weeks ago, so my frustration on this front is rather fresh. I see it all the time, and it's always a problem I have to un-break when I start a new job. Your environment can change, but the code executing within it should not. See the Twelve-Factor App for further discussion on this topic.
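To make the "environment changes, code doesn't" idea concrete, here is a minimal sketch of a settings loader in which every environment executes the same code and only the inputs differ. The variable names (ALLOWED_HOSTS, DEBUG, DATABASE_URL) and the load_settings helper are illustrative assumptions, not something either commenter posted.

```python
from collections.abc import Mapping


def load_settings(env: Mapping[str, str]) -> dict:
    """Build Django-style settings from an environment mapping.

    Every environment runs this exact code; only the values in env
    differ. There is no branch on the environment's name.
    """
    return {
        # Comma-separated list, e.g. "example.com,www.example.com"
        "ALLOWED_HOSTS": [h for h in env.get("ALLOWED_HOSTS", "").split(",") if h],
        # Only the string "true" (any case) enables debug mode
        "DEBUG": env.get("DEBUG", "false").lower() == "true",
        # Required: a missing value raises KeyError instead of silently defaulting
        "DATABASE_URL": env["DATABASE_URL"],
    }


if __name__ == "__main__":
    # Stand-in values; from settings.py you would pass os.environ instead.
    print(load_settings({"DATABASE_URL": "postgres://localhost/app", "DEBUG": "true"}))
```

Whether to route settings through a helper like this or read os.environ inline in settings.py is a style choice; the point is only that no code path depends on which environment is running it.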

As for secrets in the repo (encrypted or not) I've seen this before at a previous job and while it works rather handily, I really don't recommend it. It introduces a needless amount of complexity, can potentially leak production credentials, and requires a code change (+CI run, +deploy) to change any of the values.

Think about all the places your source code goes:

  • Your (likely 3rd-party external) git host
  • Your (also likely 3rd-party external) CI runner
  • Every developer's company laptop
  • Potentially many developers' personal devices when they "just want to work on that thing"
  • Any copies they might keep for personal reference after they quit or are fired
  • Any computer or phone that was used to look at the code in a browser
    • ...and any plugins that browser might be running

Now consider how many of them likely have had access to the keys to decrypt certain values over time, how they might have stored the keys in plain text on their machine, or even been Super Careful with everything but were nonetheless compromised by a virus/hack because their kid used their computer that one time.

All that might be acceptable if the benefits were high, but they aren't. Now, instead of just one environment to configure, you have potentially dozens: production, staging, testing, and one for every developer who's ever worked there, each with different values, only some of which have been updated. When Steve gets fired, how many files do you have to decrypt, edit, and re-encrypt to rotate the secrets? This replaces a small headache with a migraine.

I address this in the post above as well, but the TL;DR is that you can bake known, insecure values into your project (in my case, in compose.yaml) and share any remaining actual secrets required for development sparingly via back-channels. When in staging or production, Compose isn't in use, so your project will die unless these values are set in the environment -- values which should be provided by whatever means that environment favours. Personally, I'm rather fond of tools like Secrets Manager & Vault because they offer an audit trail and a means to alter values without a code change, but a lot of companies prefer to use things like Kubernetes secrets. Whatever it is, it should not be a file in the repo.
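The "die unless these values are set" behaviour described above might look something like this in a settings module. This is a sketch under assumptions: require and MissingSettingError are hypothetical names, and a real Django project might raise django.core.exceptions.ImproperlyConfigured instead. Development values would come from compose.yaml, so the code itself carries no fallback.

```python
from collections.abc import Mapping


class MissingSettingError(RuntimeError):
    """Raised when a required value is absent from the environment."""


def require(env: Mapping[str, str], name: str) -> str:
    """Return env[name], refusing to start if it is missing or empty.

    There is deliberately no default here: development values are
    injected by compose.yaml, and staging/production must provide
    their own (e.g. from a secrets manager).
    """
    value = env.get(name, "")
    if not value:
        raise MissingSettingError(f"{name} must be set in the environment")
    return value


if __name__ == "__main__":
    # Stand-in for os.environ as Compose would populate it in development.
    dev_env = {"SECRET_KEY": "insecure-dev-only-key"}
    print(require(dev_env, "SECRET_KEY"))
```

In settings.py the call would be along the lines of SECRET_KEY = require(os.environ, "SECRET_KEY"), so a deploy that forgets to inject the value fails at startup instead of running with a baked-in secret.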

One last note about the license: I love the GPL, but you should know that under this license, anyone using your template in their project must license that project under the GPL. Your description suggests that it's enough to say that any modifications to the template must be shared, but that's not how the GPL works. If I were to use your template in a project about cats, I'd be creating a "derivative work" based on your template, which, under the terms of the license, means my cat project must also be GPL-licensed.

What you perhaps could get away with would be treating this like documentation, and licensing it under something like the Creative Commons Attribution-ShareAlike license, where you provide it as a sort of guideline for other projects, but not as code upon which you'd base a project of your own.

[email protected], 3 weeks ago (last edited 1 week ago)

Hi @[email protected], author here.

Thanks for taking the time for such an in-depth comment 👍. I'm gonna try to answer point by point.

On different configurations and settings

Although I strongly agree with your take (from your recent blog post) that if ENVIRONMENT == "prod": should never happen, I'm really surprised about this:

Your configuration & settings should be identical between environments

I think there's a misunderstanding here. Here's how I see it:

  • A "setting" is a constant your application needs at runtime to make a decision about its behavior
  • A "configuration" is a way of running your application, with some "settings" preset and some "settings" required from the "environment"
  • An "environment" is a place where your application runs under a certain "configuration" that you've chosen, and this "environment" provides the values for the "settings" your chosen "configuration" requires

This way of thinking enables you to have multiple environments running the same configuration with different values for some settings.
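The three definitions above can be modelled in a few lines. This is a hypothetical sketch of the vocabulary, not code from the template: a Configuration carries some preset settings plus the names of the settings it requires, and each environment supplies values for those names.

```python
from collections.abc import Mapping
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Configuration:
    """A way of running the app: some settings preset, some required."""

    name: str
    preset: Mapping[str, object] = field(default_factory=dict)
    required: tuple[str, ...] = ()

    def resolve(self, environment: Mapping[str, str]) -> dict:
        """Merge the presets with the values this environment provides."""
        missing = [key for key in self.required if key not in environment]
        if missing:
            raise KeyError(f"{self.name} needs: {', '.join(missing)}")
        settings = dict(self.preset)
        settings.update({key: environment[key] for key in self.required})
        return settings


# One configuration, two environments supplying different values.
deployed = Configuration(
    name="deployed",
    preset={"DEBUG": False},
    required=("SECRET_KEY", "ALLOWED_HOSTS"),
)
staging = deployed.resolve({"SECRET_KEY": "s-key", "ALLOWED_HOSTS": "staging.example.com"})
production = deployed.resolve({"SECRET_KEY": "p-key", "ALLOWED_HOSTS": "example.com"})
```

Here staging and production share one configuration (same presets, same code) and differ only in the values their environments provide.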

As for:

how exactly do you expect to find & fix problems that only occur in production?

From my experience, 99% of the time, problems that only occur in production are all about data and edge cases, not about configuration. And I've learned to avoid making the whole structure of my application constrained by 1% of the cases.

On secrets-in-repo

To be honest I have no experience with secrets-in-repo, and I am seeking feedback, but:

secrets in the repo (...) requires a code change (+CI run, +deploy) to change any of the values.

How is changing a .env file considered a code change? Of course it triggers a deploy, but it shouldn't trigger a CI run, since the application's behavior didn't change, only some envvar in some environment. Changing a .env file is exactly like changing an envvar through your cloud provider's console, the only difference being that the envvars are yours, which is pretty important to me.

instead of just one environment to configure, you have potentially dozens: production, staging, testing, and one for every developer who’s ever worked there

This has nothing to do with storing secrets in the repo, but everything to do with how many environments you have. Many teams like to have at least one staging environment, don't you think?

how they might have stored the keys in plain text on their machine

I don't see any difference from the infamous ~/.aws/credentials file that we all have on our work computers, which allows an attacker to fetch all secrets from all environments with the right AWS API calls. Yes, a developer who gets their laptop stolen and who did not encrypt the disk represents a huge risk to their company.

When Steve gets fired, how many files do you have to decrypt, edit, and re-encrypt to rotate the secrets?

Just the non-development ones, typically production and staging. Steve did not have the keys to the other files; that is the whole point of the strategy I chose of having two keys (one individual, one shared). To be honest, even for not-so-small teams, this would not be a long task. Also, when I join or build a team, I don't expect people to get fired very often 😅
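The rotation question reduces to a set intersection: which encrypted files were encrypted to a key the departing person held? A sketch under a purely hypothetical layout (the file names and key labels are invented for illustration; the template's actual encryption tooling is not shown).

```python
# Hypothetical layout: which keys can decrypt each encrypted env file.
# "shared" is the team key; the other labels are individual keys.
RECIPIENTS: dict[str, set[str]] = {
    ".env.production": {"shared"},
    ".env.staging": {"shared"},
    ".env.dev.alice": {"alice"},
    ".env.dev.steve": {"steve"},
}


def exposed_files(recipients: dict[str, set[str]], held_keys: set[str]) -> list[str]:
    """Return the files a departing member could decrypt, sorted by name.

    A file is exposed if any key it is encrypted to was held by that person.
    """
    return sorted(path for path, keys in recipients.items() if keys & held_keys)


if __name__ == "__main__":
    # Steve held his own key plus the shared one.
    print(exposed_files(RECIPIENTS, {"steve", "shared"}))
    # ['.env.dev.steve', '.env.production', '.env.staging']
```

Under this layout, Steve's own development file would simply be deleted, leaving production and staging as the files to re-encrypt.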

All that might be acceptable if the benefits were high, but they aren’t.

I guess that's subjective then, because IMO being independent from cloud providers when it comes to handling my applications' secrets is a great benefit. Maybe I'm wrong about the risk/benefit ratio between independence and security, but nothing in your comment convinces me that storing secrets in the Git repo is less safe than storing them in a vault that I access with a key that I have on my laptop.

On licensing

I'm sorry, but unless you point me to legal sources, I think you're wrong about what the GPLv3 means; I studied this topic carefully before choosing that licence, because this is my first piece of open source code.

No, using a GPLv3 web project template as base of a new web project doesn't make the larger project a "derivative work" that must also be published under GPLv3, for two reasons:

  • What makes the GPLv3 license "viral" is when a program "links" to a GPLv3-licensed library; using a project template does not "link" (in a software sense) the newly created project to the template (i.e. there will be no from mytemplate import something in your Django project, so your project won't be considered a "derivative work")
  • It's the "distribution" of the covered software (modified or unmodified) that triggers the obligation to publish the source under GPLv3; remote execution (e.g. a web application) is not "distribution", which is why the AGPL exists. Django reusable apps published under GPLv3 do exist (e.g. jazzband/django-invitations; imagine if everybody using it had to release their own website under GPLv3?)

In contrast, what would be considered a "derivative work" is someone enriching my Django project template in order to sell a standardized Django experience as part of their consultancy. They would have to release the changes (fixes and new features) they made to my template under GPLv3.

I like your suggestion to look at Creative Commons licenses though, maybe it would make sense, thanks.

[email protected], 3 weeks ago

So let's get the licensing bit out of the way first. I am 100% confident that you're wrong on this. The GPL is a copyright license and like all copyright licenses, it applies to the work and your rights to copy it. If you choose to copy the contents of a GPL project's code into your own project, the license dictates that you must license your project under the GPL. For example, if you were to build a new kernel for your own special operating system and copy out significant portions of the Linux kernel to do it, your new kernel will be covered by the GPL.

You may be confusing the GPL with the LGPL here, which specifically has an exemption for linking. Under that license, you can link to an LGPL-licensed library (it's not clear whether a Python import would qualify, as this was originally written with external modules in C projects in mind) without your project being covered by the GPL.

You're also misunderstanding "distribution" here. While it's true that there's a distinction between the GPL and AGPL in how this word is defined, it does not affect how the license applies. To use another example, the fuzzywuzzy project is GPL-licensed, so if you were to use it in your Django project, it would necessarily make your Django project GPL. However, as "distribution" under the GPL applies only to sharing copies of the project with others and not to services provided over the web, your project will be GPL, but you'll be under no obligation to share the source with anyone unless you were to copy the project onto someone's laptop. So long as your project is just a webserver sending HTML to the user, you're under no obligation to share the source code for your server.

The AGPL, on the other hand, includes accessing the software over a network under its definition of "distribution", so if fuzzywuzzy were AGPL-licensed, this would require you to publish your Django project's source publicly.

Source: I too have been reading heavily on this front for about 23 years, so much so that I married a copyright lawyer. We talk about this stuff a lot.

Regarding secrets in the repo, I'm not going to fight you on this. In my experience it's a Great Big Pain In The Ass to manage these things, and I think you may be overlooking just how many of the devs on your team may need the rights to read/write production values.

As for making the distinction between settings and configuration, again I think you're going to live to regret this, as every company I've started at that employs this pattern has. You simply can't have your development, testing, and production environments running different middleware classes (as your example suggests) and not be due for a surprise in production. No, your settings should be as close to production in all environments as possible, and breaking your settings up like this is just begging for deviation.

As for the claim that 99% of production-only problems are data-related, that too is not my experience with such systems. If you're talking to S3 in production and local folders in development, or SQS in production and synchronous execution in development, you will have problems, and you won't be able to detect them, let alone debug and fix them, in an environment that doesn't match the place you're deploying to.
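One way to act on this for the S3 example is to keep the same storage backend in every environment and let only the endpoint vary, pointing development at a local S3-compatible service such as MinIO. This sketch assumes Django 4.2+ (the STORAGES setting) and django-storages' S3 backend; the option names should be checked against the django-storages documentation, and build_storages is a hypothetical helper rather than either commenter's code.

```python
from collections.abc import Mapping


def build_storages(env: Mapping[str, str]) -> dict:
    """Return a Django STORAGES setting that uses S3 in every environment.

    Only the bucket and endpoint vary. If AWS_S3_ENDPOINT_URL is unset
    (assumed to be the production case), the default AWS endpoint is used;
    in development it can point at a local S3-compatible service.
    """
    options = {"bucket_name": env["AWS_STORAGE_BUCKET_NAME"]}
    endpoint = env.get("AWS_S3_ENDPOINT_URL")
    if endpoint:
        options["endpoint_url"] = endpoint
    return {
        "default": {
            # django-storages' S3 backend (module path as of django-storages 1.14)
            "BACKEND": "storages.backends.s3.S3Storage",
            "OPTIONS": options,
        },
        "staticfiles": {
            "BACKEND": "django.contrib.staticfiles.storage.StaticFilesStorage",
        },
    }


if __name__ == "__main__":
    # Development: same backend, local endpoint.
    print(build_storages({
        "AWS_STORAGE_BUCKET_NAME": "media",
        "AWS_S3_ENDPOINT_URL": "http://localhost:9000",
    }))
```

The code path that talks to storage is then identical everywhere, so S3-specific behaviour (permissions, URL signing, eventual consistency quirks) can surface in development rather than only in production.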

[email protected], 1 week ago (last edited 1 week ago)

Thank you for the clarifications about how the GPL license applies, and what the consequences are.

If I'm following you correctly:

  • Having a GPL-licensed dependency in my Django project makes my Django project GPL-licensed
  • But since my Django project is never going to run anywhere other than my servers, I won't ever need to share a copy of it with anyone, so the fact that my project is GPL-licensed has no implications whatsoever; it can live in my private Git repository like any proprietary piece of software

So in the case of my Django project template:

  • The project template is GPL-licensed
  • This makes any Django project based on it GPL-licensed
  • But since real-life Django projects run only on a server, offering services through the web, their being GPL-licensed has no practical implications
  • Which means that people can safely use my template as it is, with the GPL, without having to worry about it

Thanks for the discussion, I've learned a lot!