this post was submitted on 15 Sep 2024
20 points (100.0% liked)

linux4noobs

So, I am looking for any ideas, help or head(s) smarter than mine.

This has been my main problem for some time now. The easiest way to reproduce it is to run anything Proton-related. After that, some apps just keep spitting this out in dmesg:

[ 1403.146954] eartag[5309]: segfault at 30 ip 000077f7d796491c sp 00007ffe20a35210 error 6 in libgtk-4.so.1.1400.4[56491c,77f7d74c7000+4c2000] likely on CPU 7 (core 3, socket 0)
[ 1403.146964] Code: c2 4c 8d 4d c0 48 8d 35 02 75 4a 00 45 31 c0 0f b6 d2 e8 c7 4c 01 00 f3 0f 10 03 48 8b 55 c0 4c 89 fe f3 41 0f 58 07 4c 89 f7 <f3> 0f 11 42 30 f3 0f 10 43 04 f3 41 0f 58 47 04 f3 0f 11 42 34 f3
[ 1408.104856] gnome-system-mo[5360]: segfault at 30 ip 00007c02762ec29e sp 00007ffc399829d0 error 6 in libgtk-4.so.1.1400.5[4ec29e,7c0275e81000+4b5000] likely on CPU 1 (core 1, socket 0)
[ 1408.104871] Code: 48 8d 35 c5 6f 62 00 45 31 c0 e8 fd 0f 01 00 48 8b 45 b0 48 8b 5d c0 4c 89 ff f3 41 0f 10 04 24 f3 0f 58 00 48 89 da 48 89 c6 <f3> 0f 11 43 30 f3 41 0f 10 44 24 04 f3 0f 58 40 04 f3 0f 11 43 34
[ 1434.535310] blackbox[5476]: segfault at 30 ip 00007d7ba0f6491c sp 00007ffcc9c2bfb0 error 6 in libgtk-4.so.1.1400.4[56491c,7d7ba0ac7000+4c2000] likely on CPU 4 (core 0, socket 0)
[ 1434.535318] Code: c2 4c 8d 4d c0 48 8d 35 02 75 4a 00 45 31 c0 0f b6 d2 e8 c7 4c 01 00 f3 0f 10 03 48 8b 55 c0 4c 89 fe f3 41 0f 58 07 4c 89 f7 <f3> 0f 11 42 30 f3 0f 10 43 04 f3 41 0f 58 47 04 f3 0f 11 42 34 f3
[ 1519.054496] tidal-hifi[5241]: segfault at 0 ip 0000776c9650bccc sp 00007ffce741f1a0 error 6 in libnvidia-glcore.so.560.35.03[b0bccc,776c95e00000+c00000] likely on CPU 1 (core 1, socket 0)
[ 1519.054505] Code: 41 0f 7e ce 89 fd 48 8b 80 80 00 00 00 c1 e2 12 66 41 0f 7e d5 48 b9 72 0e 05 a0 04 00 00 00 81 ca 00 0e 00 80 66 41 0f 7e dc <89> 10 48 83 c0 1c 48 89 48 e8 66 0f 7e 40 f0 66 0f 7e 48 f4 66 0f

After a restart, all apps run perfectly fine of course and I can do whatever I need to do, until they stop working again.

  • On one forum, someone suggested a memory problem, so I memtested it hard: everything is fine.

  • On another, the idea was to reinstall the intel-ucode package; unfortunately, that didn't help either.

  • Finally, I tried sudo pacman -Qnq | sudo pacman -S - (reinstalling all native packages), because that was suggested somewhere too; no change after that.

My boat:

Arch Linux x86_64 / 6.10.9-arch1-2
Gnome 46.4 / Mutter (X11)
@
ASUS Z97-PRO (Wi-Fi ac)
INTEL i7-4790 + NVIDIA GTX 1070 Ti

Big, fat thanks for your time!

Update: Idea from jrgd did the trick! I am still testing it, but so far, I can't force anything to fail and everything seems to be working fine!

top 10 comments
[email protected] 7 points 1 month ago

Well, I don't think there's going to be a way to narrow it down just from that, but I can maybe suggest some things to try.

Segfaults happen when an app tries to access memory that it shouldn't (in these logs, a write). They tend to indicate either a bug (which shouldn't be the case at the application level, given that multiple applications are doing it) or corrupt memory.
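To see what the kernel is actually reporting in those dmesg lines, here's a minimal Python sketch (an illustration, not an existing tool; decode_segfault and its regex are my own names) that pulls out the fault address and decodes the low bits of the x86 page-fault error code. Bit 0 set means a protection violation (clear: the page wasn't mapped at all), bit 1 means a write, and bit 2 means user mode, so "error 6" is a user-mode write to an unmapped page. A fault address of 30 (hex) is almost certainly a NULL pointer plus a field offset; by my reading, the <f3> 0f 11 42 30 bytes in the GTK "Code:" lines are a movss store to [rdx+0x30], which fits rdx being NULL.

```python
import re

# Low bits of the x86 page-fault error code (see arch/x86/include/asm/trap_pf.h):
# bit -> (meaning when set, meaning when clear)
PF_BITS = {
    0: ("protection violation", "page not present"),
    1: ("write", "read"),
    2: ("user mode", "kernel mode"),
}

# Matches lines like: "eartag[5309]: segfault at 30 ip ... sp ... error 6 in libgtk-4.so...["
SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip [0-9a-f]+ sp [0-9a-f]+ error (?P<err>[0-9a-f]+) in (?P<obj>[^\[\s]+)\["
)


def decode_segfault(line):
    """Decode one kernel 'segfault at ...' line; return None if it doesn't match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    err = int(m["err"], 16)    # the kernel prints the error code in hex
    addr = int(m["addr"], 16)  # faulting address, also hex without a 0x prefix
    access = [PF_BITS[bit][0] if err & (1 << bit) else PF_BITS[bit][1]
              for bit in sorted(PF_BITS)]
    return {
        "process": m["comm"],
        "pid": int(m["pid"]),
        "fault_addr": addr,
        # Heuristic: addresses inside the first 4 KiB page are usually NULL + offset.
        "near_null": addr < 0x1000,
        "access": access,
        "object": m["obj"],
    }
```

Fed the eartag line from the original post, it reports a user-mode write to an unmapped page at address 0x30 inside libgtk-4.so.1.1400.4; the tidal-hifi line comes out as the same kind of write, at address 0, inside libnvidia-glcore.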

INTEL i7-4790

Well, it's not the recent 13th/14th-gen Intel hardware problem; I had two processors produce corrupt memory via that route recently. That's a ten-year-old processor, though.

Memory would be an obvious thing to blame, but memtest not hitting it is an argument against that.

Proton-related stuff (Windows games running under Steam) might use a fair bit of memory. Do you have swap space, maybe?

$ grep ^SwapTotal /proc/meminfo 
SwapTotal:        999420 kB
$

I can imagine a swap device corrupting memory. If you're using any swap, I'd probably do a fresh boot, use sudo swapoff -a to disable swap space, and then try your repro case with proton.
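The grep above only shows how much swap is configured. A minimal Python sketch (swap_usage_kb is a made-up helper, not an existing tool) that also works out how much is actually in use, as SwapTotal minus SwapFree from /proc/meminfo, could look like this:

```python
import os


def swap_usage_kb(meminfo_text):
    """Return (total, used) swap in kB from the text of /proc/meminfo."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[key.strip()] = int(parts[0])  # values are reported in kB
    total = fields.get("SwapTotal", 0)
    return total, total - fields.get("SwapFree", 0)


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        total, used = swap_usage_kb(f.read())
    print(f"swap: {total} kB configured, {used} kB in use")
```

If "in use" stays at 0 through the repro, swap probably isn't involved on that boot; otherwise sudo swapoff -a pulls the swapped pages back into RAM (so it needs enough free memory) before you retry.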

It's possible that you're tripping some kind of kernel bug relating to memory allocation, I guess, but you're running an up-to-date Arch, so I assume you've got a recent kernel, and you say this has been going on for "some time". A kernel bug would be consistent with the problems continuing once they start on a given boot and a reboot making them go away, I suppose.

I guess it could hypothetically be a problem with the Nvidia drivers; if the Proton games you're playing are 3D games, that might be what's tripping it.

A game might also be stressing the CPU, triggering temperature or other issues. The stress utility can generate sustained load on the CPU cores (--cpu N); maybe see if that reproduces your problem.
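Since temperature is one of the suspects, here's a rough Python sketch for watching CPU temperatures while stress runs. read_zone_temps is a hypothetical helper; it reads the kernel's generic /sys/class/thermal/thermal_zone*/ files, which report millidegrees Celsius, and which zones exist varies by machine.

```python
import glob
import os


def millidegrees_to_c(raw):
    """Thermal-zone temp files hold integer millidegrees Celsius, e.g. '45000'."""
    return int(raw.strip()) / 1000.0


def read_zone_temps(base="/sys/class/thermal"):
    """Map 'thermal_zoneN:type' to degrees Celsius for every readable zone under base."""
    temps = {}
    for zone in sorted(glob.glob(os.path.join(base, "thermal_zone*"))):
        try:
            with open(os.path.join(zone, "type")) as f:
                kind = f.read().strip()
            with open(os.path.join(zone, "temp")) as f:
                temps[f"{os.path.basename(zone)}:{kind}"] = millidegrees_to_c(f.read())
        except (OSError, ValueError):
            continue  # some zones refuse reads; skip them
    return temps
```

For example, start stress --cpu 8 --timeout 600 in one terminal (the i7-4790 has 8 threads) and call read_zone_temps() every few seconds from another. If no useful zones show up, the coretemp readings under /sys/class/hwmon, or the sensors command from lm_sensors, are alternatives.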

Proton games might be loading the GPU, too. I don't know of a good way to artificially generate load there, unfortunately.

You could try checking the kernel log for any errors preceding the segfaults, maybe while you're running that Proton game; any issues there might give a clue.
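A small Python sketch of that, assuming you dump the log to a file first, e.g. journalctl -k -b > kernel.log (kernel messages from the current boot) or dmesg > kernel.log. context_before_segfaults is only an illustrative helper:

```python
from collections import deque


def context_before_segfaults(lines, n=5):
    """Return (preceding_lines, segfault_line) pairs with up to n lines of context each."""
    window = deque(maxlen=n)  # sliding window of the most recent lines
    hits = []
    for line in lines:
        line = line.rstrip("\n")
        if "segfault at" in line:
            hits.append((list(window), line))
        window.append(line)
    return hits
```

Passing it an open file (context_before_segfaults(open("kernel.log"))) pairs each segfault with the lines logged just before it; NVRM Xid messages or DRM/GPU errors in those windows would be worth a closer look.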

[email protected] 3 points 1 month ago

Okay, I tested this just now: disabling swap makes no difference, but thanks for a new command for the future!

[email protected] 2 points 1 month ago

Which memtests have you tried? They all function a little differently, and passing one doesn't mean it will pass another. My rig passed OCCT and TM5 with flying colors, but it would fail every time on prime95 (until I eventually got it stable).

[email protected] 2 points 1 month ago

I just went with memtest86 and let it do its thing. Should I do some more testing? I really feel like this is not about memory, at least not in a “physically damaged” way.

[email protected] 1 points 1 month ago

It might not be a memory thing. If you run out of options and are down to trying memory again, take a look at the MemTestHelper test recommendations. You shouldn't have to run tests for more than ≈0.5–1.5 hours at a time (the 8hr+ testing regimen is pointless).

https://github.com/integralfx/MemTestHelper/blob/oc-guide/DDR4%20OC%20Guide.md#memory-testing-software

[email protected] 3 points 1 month ago

A potential common cause points toward the GPU drivers (note the games in Proton, the libgtk4 segfaults, and the libnvidia-glcore segfaults). What NVIDIA driver version is in use? A quick search found a rough match to the symptoms shown; it is recent and matches the hardware (NVIDIA Pascal desktop GPUs). Perhaps the driver version in use shows a similar regression for such GPUs?
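To answer the version question: the loaded kernel module reports its version in /proc/driver/nvidia/version, and the dmesg lines above already show libnvidia-glcore.so.560.35.03. A rough sketch (the helper names and the version pattern are my own assumptions) that turns either string into a comparable tuple:

```python
import re

# Assumed NVIDIA version shape: three-digit major, then .minor and optional .patch
VERSION_RE = re.compile(r"\b(\d{3}\.\d+(?:\.\d+)?)\b")


def nvidia_driver_version(text):
    """Return the first NVIDIA-style version in text as a tuple of ints, or None."""
    m = VERSION_RE.search(text)
    return tuple(int(part) for part in m.group(1).split(".")) if m else None


def installed_driver_version(path="/proc/driver/nvidia/version"):
    """Read the loaded kernel module's version; None if the file is missing."""
    try:
        with open(path) as f:
            return nvidia_driver_version(f.read())
    except OSError:
        return None
```

Comparing tuples (e.g. installed_driver_version() >= (560, 0)) avoids string-comparison surprises such as "560.9" sorting after "560.35". nvidia-smi --query-gpu=driver_version --format=csv,noheader gives the same number from the command line.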

[–] [email protected] 2 points 1 month ago

Hey, I just met you, and this is crazy, 
but so far — this is working,
so thank you, maybe?

[email protected] 1 points 1 month ago* (last edited 1 month ago)

Have you tried reinstalling proton, or installing an older version of proton?

One of a few things is happening.

When a game calls on Proton, this error is triggered, and your game crashes, it was because either:

  1. Proton (or some affiliated process) reached out to memory at an address it doesn't have permission to play with.

  2. Proton (or some affiliated process) reached out to memory at an address that makes no sense.

  3. Some kind of overflow error occurred.

It was either a misconfiguration on your part, a compatibility issue with a new version, or a bug causing something in this stack to reach out to a nonsense memory location or to overflow a stack or buffer, something like that.

Things I'd try:

Check Proton's compatibility with your current kernel version.

Check the Proton/Steam logs for more info on what might be going on.

If this problem is new and you recently upgraded, consider rolling back.
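For the log check: Proton writes a debug log when a game is launched with PROTON_LOG=1 %command% in its Steam launch options, by default to ~/steam-<appid>.log. A rough Python sketch (my own helper names, nothing official) that pulls out the Wine "err:" lines from those files:

```python
import glob
import os


def wine_errors(lines):
    """Keep Wine debug lines whose class is 'err' (e.g. '...:0024:err:module:...')."""
    return [line.rstrip("\n") for line in lines if ":err:" in line]


def scan_proton_logs(log_dir=os.path.expanduser("~")):
    """Return {path: err lines} for each steam-<appid>.log found in log_dir."""
    found = {}
    for path in sorted(glob.glob(os.path.join(log_dir, "steam-*.log"))):
        with open(path, errors="replace") as f:
            errs = wine_errors(f)
        if errs:
            found[path] = errs
    return found
```

The sketch deliberately skips "fixme:" lines, which Wine prints in large numbers and which are usually harmless.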

[email protected] 3 points 1 month ago

No, anything Proton-related works fine; other apps start crashing while something using Proton runs in the background, or after it has been running.

This problem keeps popping up all the time, through multiple updates since I finished installing everything.

I will check the logs.

[email protected] 2 points 1 month ago

Ah sorry, I misunderstood.