I found that basic worktree functionality fails with submodules. The worktree doesn't know about submodules, and again and again it messes up the links to them. Pulling, switching branches, and so on all frequently fail because the link to the submodule is broken. I ended up creating the submodules as worktrees of a separate checkout of the submodule repo, and recreating these submodule worktrees over and over. I pretty much stopped using worktrees at that point.
Have you tried setting submodule.recurse in your global git config, so that commands recurse into submodules by default?
Nope, fingers crossed it helps for you ;) Unrelated to worktrees, but: in the end I like submodules in theory but found them to be absolutely terrible in practice, and that's without even factoring in worktrees. So we went back to a monorepo.
Such gains from limiting included headers are surprising to me, as it's the first thing anyone would suggest doing. Clang-tidy hints in Qt Creator show warnings for includes that are not used. For me this works pretty well to keep header-related build times under control. I wonder: if reducing the number of included headers already yields such significant gains, what other gains can be had, and how many lines of code are we talking about? I've seen dramatic improvements from using precompiled headers (PCH), for instance, or from isolating Boost usage.
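To make "isolating" a heavy dependency concrete, here is a minimal sketch of the pimpl approach I have in mind. Everything in it is hypothetical: the WordCounter class is made up for illustration, and std::regex stands in for a heavy Boost header so the example stays self-contained. In a real project the first part would live in a header and the second part in a single .cpp file, so only that one translation unit pays for the expensive include.

```cpp
// --- would be word_counter.h: no heavy include is visible to users ---
#include <cstddef>
#include <memory>
#include <string>

class WordCounter {  // hypothetical class, for illustration only
public:
    WordCounter();
    ~WordCounter();  // defined where Impl is a complete type
    std::size_t count(const std::string& text) const;

private:
    struct Impl;                  // forward declaration only
    std::unique_ptr<Impl> impl_;  // hides the heavy type from the header
};

// --- would be word_counter.cpp: the only place the heavy header is included ---
#include <iterator>
#include <regex>  // stand-in for an expensive header such as a Boost one

struct WordCounter::Impl {
    std::regex word{R"(\w+)"};  // matches runs of word characters
};

WordCounter::WordCounter() : impl_(std::make_unique<Impl>()) {}
WordCounter::~WordCounter() = default;

std::size_t WordCounter::count(const std::string& text) const {
    // Count regex matches by measuring the distance between iterators.
    auto first = std::sregex_iterator(text.begin(), text.end(), impl_->word);
    auto last = std::sregex_iterator();
    return static_cast<std::size_t>(std::distance(first, last));
}
```

The trade-off is an extra heap allocation and a pointer indirection per object, which is usually fine for classes that are not on a hot path.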