The compiler is getting more and more parallel, but there are still a few bottlenecks. The frontend (parsing, macro expansion, trait resolution, HIR lowering) is still single-threaded on stable, although there's a parallel implementation on nightly.
The optimal core count really depends on the project you're compiling. Within a single crate, the compiler splits the code into codegen units that LLVM can process in parallel. The default is currently 16 codegen units for release builds and 256 for debug builds.
In theory, this means you could keep seeing performance gains up to 256 cores in debug builds, but in practice other bottlenecks will get in the way first.
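To make that arithmetic concrete, here's a minimal sketch of the ceiling for a single crate. It only models the backend: at most one worker per codegen unit can be busy, so the useful worker count is the smaller of the two numbers. The constants are the defaults quoted above, and `max_busy_workers` is a made-up helper name, not anything from rustc or Cargo.

```rust
use std::thread;

// Default codegen-unit counts quoted above (assumed current defaults).
const RELEASE_CODEGEN_UNITS: usize = 16;
const DEBUG_CODEGEN_UNITS: usize = 256;

/// Hypothetical helper: upper bound on backend workers that can be busy
/// for one crate. Extra cores beyond the codegen-unit count sit idle,
/// and extra codegen units beyond the core count just queue up.
fn max_busy_workers(codegen_units: usize, cores: usize) -> usize {
    codegen_units.min(cores)
}

fn main() {
    // Number of hardware threads the OS reports; fall back to 1 on error.
    let cores = thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(1);

    println!("cores available: {cores}");
    println!(
        "release ceiling: {}",
        max_busy_workers(RELEASE_CODEGEN_UNITS, cores)
    );
    println!(
        "debug ceiling:   {}",
        max_busy_workers(DEBUG_CODEGEN_UNITS, cores)
    );
}
```

This is only a ceiling. It ignores the single-threaded frontend and everything else that keeps real builds from scaling this cleanly.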
Compilation is also very memory- and disk-I/O-intensive. A fast SSD and plenty of spare memory that the OS can use for caching files will help. You may also see a benefit from a processor with a large L3 cache, like AMD's X3D variants.
Across a project, it depends on how many dependencies can be compiled in parallel. A crate's dependencies have to be compiled before the crate itself, so the upper limit to parallelism here is set by the shape of your dependency graph. But this really only matters for fresh builds, since later builds reuse the dependencies that are already compiled.
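If you want a rough feel for how a dependency graph limits parallelism, here's a small sketch. It groups crates into "waves": a crate with no dependencies is in wave 0, and every other crate sits one wave after its deepest dependency. This is a simplified model, not how Cargo actually schedules work (it doesn't wait for a whole wave to finish), and the crate names and helpers (`depth`, `wave_sizes`, `example_graph`) are all made up for illustration.

```rust
use std::collections::HashMap;

/// Depth of `krate`: 0 if it has no dependencies, otherwise one more
/// than its deepest dependency. Assumes the graph has no cycles.
fn depth<'a>(
    krate: &'a str,
    deps: &HashMap<&'a str, Vec<&'a str>>,
    memo: &mut HashMap<&'a str, usize>,
) -> usize {
    if let Some(&d) = memo.get(krate) {
        return d;
    }
    let mut d: usize = 0;
    if let Some(ds) = deps.get(krate) {
        for &dep in ds {
            d = d.max(depth(dep, deps, memo) + 1);
        }
    }
    memo.insert(krate, d);
    d
}

/// Number of crates in each wave. Every crate must appear as a key,
/// with an empty list if it has no dependencies.
fn wave_sizes<'a>(deps: &HashMap<&'a str, Vec<&'a str>>) -> Vec<usize> {
    let mut memo = HashMap::new();
    let mut sizes = Vec::new();
    for &krate in deps.keys() {
        let d = depth(krate, deps, &mut memo);
        if sizes.len() <= d {
            sizes.resize(d + 1, 0);
        }
        sizes[d] += 1;
    }
    sizes
}

/// Hypothetical five-crate project used as sample input.
fn example_graph() -> HashMap<&'static str, Vec<&'static str>> {
    HashMap::from([
        ("app", vec!["parser", "net"]),
        ("parser", vec!["util"]),
        ("net", vec!["util", "tls"]),
        ("util", vec![]),
        ("tls", vec![]),
    ])
}

fn main() {
    let sizes = wave_sizes(&example_graph());
    // Number of waves = length of the longest dependency chain.
    println!("crates per wave: {:?}", sizes);
    println!("widest wave: {}", sizes.iter().max().copied().unwrap_or(0));
}
```

In this toy graph no wave has more than two crates, so extra cores only help through the codegen units inside each crate. A wide, shallow graph is where a high core count pays off on a fresh build.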