Audio engineer here. I'm not sure whether Ardour can open video, but it's a capable, open source DAW. Reaper is closed source, but it can open (and even render) pretty much any video format. To actually separate a single voice, though, you need additional plugins no matter which DAW you're using.
I think iZotope RX could do it, but it is fairly expensive. I haven't seen any open source audio tools that can do this at all. It is pretty much guaranteed to require some kind of machine learning, as parametrically separating by EQ or phase won't work if you have only one source signal (even with two or more microphones, it would be really, really hard).
A very good spectral editor might technically work. However, it would take several days of manually deleting selected frequencies at an almost single-sample level, and it would still sound bad, especially if the noise is at nearly the same level as the signal.
I would recommend RX as well, but it is pricey if you can't wait for a sale (and if you're only using it once, it's expensive even on sale).
Assuming your footage isn't super long, I'd be happy to try running it through RX for you. Feel free to send me a message.
The technique you linked is, even at a couple of years old, pretty cutting-edge. You aren’t going to find it or anything similar built into any video editing software. Maybe someone’s made a plugin for one if you are lucky.
However, there are a lot of free tools that make it easy to split or rejoin audio and video, and convert it between different formats.
I’d recommend:
- Audacity if you want a GUI
- FFmpeg if you want a command line tool
- VLC can also do a lot of the conversions FFmpeg does if you dig through its features (it uses FFmpeg’s libraries for much of its decoding and encoding)
Any AI solution you find is probably going to be a command line or Python tool, and it is going to require some debugging of your Python environment and dependencies to get it working. And that means yes, you will need to separate the audio and video tracks and then recombine them. For that kind of work, I'm only familiar with Linux tools. I've used a tool called Vidcutter that is buggy but powerful, and it has a semi-intuitive GUI.
That said, the results from those AI tools can be a powerful game changer if you can figure them out.
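For the split-and-recombine step, here is a rough sketch of how it could look with FFmpeg driven from Python. The file names are hypothetical, the choice of 16-bit PCM WAV for the extracted track and AAC for the final audio are my assumptions, and the actual voice separation step is left to whatever tool you end up using.

```python
import subprocess

def extract_audio_cmd(video_in, wav_out):
    """Build an ffmpeg command that writes the audio track as 16-bit PCM WAV."""
    return [
        "ffmpeg", "-y",          # -y: overwrite the output file without asking
        "-i", video_in,
        "-vn",                   # drop the video stream
        "-c:a", "pcm_s16le",     # uncompressed 16-bit PCM (assumed format)
        wav_out,
    ]

def remux_cmd(video_in, audio_in, video_out):
    """Build an ffmpeg command that pairs the original video with new audio."""
    return [
        "ffmpeg", "-y",
        "-i", video_in,          # input 0: original video
        "-i", audio_in,          # input 1: processed audio
        "-map", "0:v:0",         # take the first video stream from input 0
        "-map", "1:a:0",         # take the first audio stream from input 1
        "-c:v", "copy",          # copy video as-is, no re-encode
        "-c:a", "aac",           # encode audio for the container (assumed codec)
        "-shortest",             # stop at the end of the shorter stream
        video_out,
    ]

def run(cmd):
    """Run a command and raise CalledProcessError if ffmpeg fails."""
    subprocess.run(cmd, check=True)

# Example usage (hypothetical file names):
# run(extract_audio_cmd("clip.mp4", "clip.wav"))
# ... process clip.wav into clip_clean.wav with the separation tool ...
# run(remux_cmd("clip.mp4", "clip_clean.wav", "clip_clean.mp4"))
```

Copying the video stream instead of re-encoding it keeps the picture quality untouched and makes the remux fast, which is why only the audio gets encoded here.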
It's possible that there's a reason it requires lossless audio, in that it needs a signal that has never been through lossy compression to work. For instance, if the ML model was trained on uncompressed data, it may need audio that has never been lossily compressed.
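If you want to check what codec your audio track currently uses before feeding it to such a tool, something like the following might help. It's only a sketch: the codec lists are a few FFmpeg codec names I'm sure of and are not exhaustive, the function names are made up for illustration, and it can't tell you whether a lossless file was converted from a lossy one at some earlier point.

```python
import subprocess

# A few FFmpeg codec names; these lists are illustrative, not exhaustive.
LOSSLESS_CODECS = {"flac", "alac", "wavpack", "ape"}
LOSSY_CODECS = {"mp3", "aac", "vorbis", "opus", "ac3", "wmav2"}

def classify_codec(name):
    """Return 'lossless', 'lossy', or 'unknown' for an FFmpeg codec name."""
    name = name.strip().lower()
    # Raw PCM variants are named pcm_s16le, pcm_s24le, pcm_f32le, and so on.
    if name.startswith("pcm_") or name in LOSSLESS_CODECS:
        return "lossless"
    if name in LOSSY_CODECS:
        return "lossy"
    return "unknown"

def probe_audio_codec(path):
    """Ask ffprobe for the codec name of the first audio stream in a file."""
    result = subprocess.run(
        [
            "ffprobe", "-v", "error",
            "-select_streams", "a:0",
            "-show_entries", "stream=codec_name",
            "-of", "default=noprint_wrappers=1:nokey=1",
            path,
        ],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()

# Example usage (hypothetical file name):
# print(classify_codec(probe_audio_codec("clip.mp4")))
```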