As discussed previously, browsers are quite complex, so adding a new feature (subtitles) really means adding several features, on top of an existing feature (the video player) that arguably isn't core to the web experience.
(I think olds like me want to believe the web is still "for" text and static images, but the majority of users today are (allegedly) all-in on video.)
Anyway, what sub-features make up "simple" subtitles? Oh, the usual: Where are they sourced? What format? What language? What encoding? (UTF-8, one can only pray.) Left-to-right and right-to-left support? Asian character support? What font are you using? System fonts? Are they widely supported? Does any of it work on mobile? Who holds the relevant patents? Etc.
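To make that a bit more concrete, here is a rough sketch of how even one subtitle cue runs into three of those questions at once (encoding, format, text direction). Everything in it is an assumption for illustration: the SRT-style sample cue, the `parse_cue` and `to_seconds` helper names, and the choice to hard-fail on anything that isn't UTF-8. It is not how any particular browser does it.

```python
import re
import unicodedata

# Hypothetical sample: a single SRT-style cue, delivered as raw bytes.
RAW_CUE = "1\n00:00:01,500 --> 00:00:04,000\nשלום, world\n".encode("utf-8")

# Accepts "HH:MM:SS,mmm" (SRT style) or "HH:MM:SS.mmm" (WebVTT style).
TIMESTAMP = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


def to_seconds(stamp: str) -> float:
    """Convert one timestamp string to seconds."""
    match = TIMESTAMP.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"unrecognized timestamp: {stamp!r}")
    hours, minutes, seconds, millis = (int(g) for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds + millis / 1000


def parse_cue(raw: bytes) -> dict:
    """Parse one cue: index line, timing line, then one or more text lines."""
    # Encoding question: this sketch assumes UTF-8 and raises
    # UnicodeDecodeError on anything else.
    text = raw.decode("utf-8")
    lines = text.strip().splitlines()

    # Format question: line 2 is "start --> end".
    start, end = lines[1].split("-->")
    body = "\n".join(lines[2:])

    # Direction question: flag the cue if any character has a strong
    # right-to-left bidi class (R = Hebrew etc., AL = Arabic letters).
    has_rtl = any(unicodedata.bidirectional(ch) in ("R", "AL") for ch in body)

    return {
        "start": to_seconds(start),
        "end": to_seconds(end),
        "text": body,
        "has_rtl": has_rtl,
    }


cue = parse_cue(RAW_CUE)
print(cue["start"], cue["end"], cue["has_rtl"])
```

And that still says nothing about fonts, mobile, or patents, which is sort of the point.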