Technology

62782 readers

2961 users here now

This is a most excellent place for technology news and articles.

Our Rules

Follow the lemmy.world rules.
Only tech related content.
Be excellent to each other!
Mod approved content bots can post up to 10 articles per day.
Threads asking for personal tech support may be deleted.
Politics threads may be removed.
No memes allowed as posts, OK to post as comments.
Only approved bots from the list below, to ask if your bot can be added please contact us.
Check for duplicates before posting, duplicates may be removed
Accounts 7 days and younger will have their posts automatically removed.

Approved Bots

founded 2 years ago

MODERATORS

[email protected]

223

Deleted GitHub data is forever accessible to anyone, researchers claim | Cybernews (cybernews.com)

submitted 6 months ago by AnActOfCreation to c/[email protected]

14 comments fedilink hide all child comments

“The implication here is that any code committed to a public repository may be accessible forever as long as there is at least one fork of that repository,” the report’s authors claim.

Am I dumb or is this exactly the purpose of forks? I feel like I'm missing something.

you are viewing a single comment's thread
view the rest of the comments

[–] [email protected] 7 points 6 months ago (1 children)

I think Github keeps all the commits of forks in a single pool. So if someone commits a secret to one fork, that commit could be looked up in any of them, even if the one that was committed to was private/is deleted/no references exist to the commit.

The big issue is discovery. If no-one has pulled the leaky commit onto a fork, then the only way to access it is to guess the commit hash. Github makes this easier for you:

What’s more, Ayrey explained, you don’t even need the full identifying hash to access the commit. “If you know the first four characters of the identifier, GitHub will almost auto-complete the rest of the identifier for you,” he said, noting that with just sixty-five thousand possible combinations for those characters, that’s a small enough number to test all the possibilities.

I think all GitHub should do is prune orphaned commits from the auto-suggestion list. If someone grabbed the complete commit ID then they probably grabbed the content already anyway.

[–] [email protected] 2 points 6 months ago (1 children)

Thanks, I think that explains it a bit more. It is unexpected to me, as a non-git expert, and I'm sure many others.

[–] [email protected] 2 points 6 months ago

I guess the funny thing is that each Git commit is internally just a file. Branches and tags are just links to specific commit files and of course commits link to their parents. If a branch gets deleted or jumped back to a previous commit, the orphaned commits are still left in the filesystem. Various Git actions can trigger a garbage collection, but unless you generate huge diffs, they usually stick around for a really long time. Determining if a commit is orphaned is work that Git usually doesn't bother doing. There's also a reflog that can let you recover lost commits if you make a mistake.