Tracking Sqlite binary file with git? (lemmy.world)

submitted 2 years ago by [email protected] to c/programming

I have a project in development, and I frequently switch between two computers. I'm including my SQLite file in git, and so far it's been fine, but I've heard that git doesn't do well with binary files. Has anyone actually had issues doing this?

I decided to perform a dump just in case, so I don't have to start from scratch if something does go wrong.
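
For reference, a dump like this can be produced with Python's built-in sqlite3 module. This is only a minimal sketch: the function name and the file names in the usage note are made up for illustration, and it is not necessarily how the dump above was taken.

```python
import sqlite3
from pathlib import Path


def dump_to_sql(db_path: str, out_path: str) -> int:
    """Write a plain-text SQL dump of db_path to out_path; return the line count."""
    conn = sqlite3.connect(db_path)
    try:
        # iterdump() yields the SQL statements needed to rebuild the database.
        lines = list(conn.iterdump())
    finally:
        conn.close()
    Path(out_path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)
```

Calling `dump_to_sql("app.db", "app.sql")` (hypothetical file names) gives a text file that diffs line by line, which is friendlier to git than the binary database.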


atheken 3 points 2 years ago (last edited 2 years ago)

It sounds like you might be developing an app with an evolving schema.

You should consider adopting a db migrations framework and having a task that can apply them to a dev database to bootstrap/upgrade the DB. If you take this route, you won’t need to even commit the db file, and you will be able to easily seed/replicate the DB schema when you deploy it.
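
A real migrations framework does more than this, but the core idea can be sketched in a few lines. This is a minimal sketch, assuming the schema version is tracked in SQLite's `PRAGMA user_version`; the `MIGRATIONS` list, the `users` table, and the `migrate` function are hypothetical names for illustration.

```python
import sqlite3

# Hypothetical migrations, in order. Position + 1 is the schema version
# the database has after that statement is applied.
MIGRATIONS = [
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)",
    "ALTER TABLE users ADD COLUMN email TEXT",
]


def migrate(conn: sqlite3.Connection, migrations=MIGRATIONS) -> int:
    """Apply migrations newer than the database's user_version; return the new version."""
    current = conn.execute("PRAGMA user_version").fetchone()[0]
    for version, statement in enumerate(migrations, start=1):
        if version <= current:
            continue  # already applied
        with conn:  # commit on success, roll back if a statement raises
            conn.execute("BEGIN")
            conn.execute(statement)
            conn.execute(f"PRAGMA user_version = {version:d}")
    return conn.execute("PRAGMA user_version").fetchone()[0]
```

Running `migrate(sqlite3.connect("dev.db"))` on a fresh or outdated file (the file name is an assumption) brings it up to the latest schema, so only the migration list needs to live in git.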

Additionally, SQLite is awesome: if you are actually storing some data, you can create virtual tables that are backed by structured text files (like CSV), so there are ways to keep the data in text while still getting the benefits of a SQL interface.
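
SQLite's CSV-backed tables come from a loadable extension that Python's bundled sqlite3 may not have available, so the sketch below swaps in a simpler technique: keep the CSV in git as the source of truth and load it into a regular table at startup. The function, table, and column names are hypothetical, every column is treated as TEXT, and the identifier quoting is naive.

```python
import csv
import io
import sqlite3


def load_csv(conn: sqlite3.Connection, table: str, csv_text: str) -> int:
    """Create `table` from CSV text whose first row is the header; return the row count."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    cols = ", ".join(f'"{name}" TEXT' for name in header)
    marks = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({cols})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({marks})', data)
    conn.commit()
    return len(data)
```

After `load_csv(conn, "people", open("people.csv").read())` (hypothetical file), the data can be queried with ordinary SQL while the version-controlled file stays plain text.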

Large binary files will start to expand the git repo, but if they’re relatively small, and the update frequency is somewhat limited, it won’t really be an issue. If you are concerned about it, you can look into git-lfs, but it might not matter much.

EDIT: Also, since "git is bad with binary files" is such a pervasive myth, I decided to check into it a little bit. A couple of things:

Git uses "delta compression" when packing/storing/transmitting files. This allows common chunks to be stored once and then reassembled when you check out a file. It does this for "normal" files regardless of whether they are text or binary, until they are considered "big", at which point they are stored whole in the pack file, without delta compression. What's "big"? By default, 512 MiB (the core.bigFileThreshold setting).

You can go pretty deep on the internals of the way that packfiles are constructed in git, but more than likely, a file that's a few MB is still going to work fine, and you will get some storage reduction when you commit it.

You should configure automatic gc to periodically repack stuff so that the actual .git repo doesn't balloon, but again, even if you're talking about a few GB, it's still not much on modern systems.

o11c 1 points 2 years ago

git is, however, bad with files that don't have meaningful small binary diffs. And the page size for SQLite database files is small enough that this is in fact a problem (though it is not nearly as bad as with already-compressed files).

If you disable VACUUM that can give a rough idea of what git actually has to deal with. But you really shouldn't.
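
To get a rough feel for how much of the file actually changes between two commits, one option is to compare two copies of the database page by page. This is a minimal sketch with hypothetical function names; it assumes a well-formed SQLite file, where the page size is stored big-endian at bytes 16-17 of the header and a stored value of 1 means 65536.

```python
import struct


def page_size(db_bytes: bytes) -> int:
    """Read the page size from the SQLite file header (bytes 16-17, big-endian)."""
    (size,) = struct.unpack(">H", db_bytes[16:18])
    return 65536 if size == 1 else size


def changed_pages(old: bytes, new: bytes) -> tuple[int, int]:
    """Return (pages in `new` that differ from `old` or are new, total pages in `new`)."""
    size = page_size(new)
    total = (len(new) + size - 1) // size
    changed = sum(
        1
        for i in range(total)
        if old[i * size:(i + 1) * size] != new[i * size:(i + 1) * size]
    )
    return changed, total
```

Feeding it the bytes of two versions of the file, for example `changed_pages(Path("old.db").read_bytes(), Path("new.db").read_bytes())` with `pathlib.Path` and hypothetical file names, shows what fraction of pages were touched. It says nothing about how well git's delta compression copes with those changes.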