Hard to say from the article alone, but if it is like the status quo in the EU and USA, then only the training data can be illegally obtained. If I have an AI that can recite the script of Bee Movie verbatim, I will be sued.
Google Books had a similar issue. They scanned pretty much all the books in existence and indexed them. Small issue: they did not obtain the consent of the copyright holders before doing this. Google was sued and won. You can use copyrighted data as long as you do not provide access to it.