A study done this past December gives a sense of how possible this is: "Comparing scientific abstracts generated by ChatGPT to original abstracts using an artificial intelligence output detector, plagiarism detector, and blinded human reviewers" – Catherine Gao, et al. (2022). Blinded human reviewers were given a mix of real abstracts from 5 of the highest-impact medical journals and ChatGPT-generated abstracts.
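To make the setup concrete, here is a minimal sketch of that blinded-review protocol, not the study's actual code: real and generated abstracts are pooled, shuffled, and judged one artefact at a time with no surrounding context. The abstract texts and the `reviewer_guess` function are purely illustrative stand-ins.

```python
# A minimal sketch (not Gao et al.'s code) of a blinded-review protocol:
# mix real and ChatGPT-generated abstracts, present each one in isolation,
# and score how often the reviewer's guess matches the true label.

import random

# Placeholder texts; the study used abstracts tied to real journal articles.
real_abstracts = [
    "BACKGROUND: We conducted a randomized trial of ...",
    "OBJECTIVE: To assess the association between ...",
]
generated_abstracts = [
    "BACKGROUND: This study investigates the impact of ...",
    "OBJECTIVE: The aim of this research is to evaluate ...",
]

# Label each abstract, then shuffle so the reviewer sees them blinded,
# with no context beyond the text itself.
pool = ([(text, "real") for text in real_abstracts]
        + [(text, "generated") for text in generated_abstracts])
random.shuffle(pool)

def reviewer_guess(text: str) -> str:
    """Stand-in for a blinded human judgment made from the text alone."""
    return random.choice(["real", "generated"])

correct = sum(reviewer_guess(text) == label for text, label in pool)
print(f"Reviewer identified {correct}/{len(pool)} abstracts correctly")
```

Note what the protocol itself enforces: the only input to each judgment is the single text, which is exactly the narrowing of focus criticized below.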
I think these types of tests can only end up showing humans failing at them, because the test is reduced to judging a single artefact as a thing in itself, stripped of all context. That is the basic element of every con: make you focus narrowly on the spot where the facade is, not on where you would find out it's fake.

The Turing test isn't about whether something is human, but whether we can be made to believe it is human. And humans can be made to believe a lot. Turing needs to keep you from looking behind the curtain / into the room for the test to work, even in its shape as a thought experiment. The study (judging by the sentences here) is a Turing test run in the real world. Why would you not look behind the curtain?

This is the equivalent of MIT's tedious trolley-problem fixation being called the ethics of technology, without ever realising that the way out of those false dilemmas is acknowledging that nothing is ever a di-lemma but always a multi-lemma: there are always myriad options to go for.