You should use the Wilcoxon signed-rank test for comparing EFFSAFE1, EFFSAFE2, EFFSAFE3, and EFFSAFE4
I didn't see the Wilcoxon signed-rank test in the code. I noticed that the first plot makes a paired comparison of EFFSAFE1, EFFSAFE2, EFFSAFE3, and EFFSAFE4, so I assume the Wilcoxon signed-rank test should have been used.
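
To make the suggestion concrete, here is a rough sketch of what I have in mind, not a drop-in replacement for your analysis. I don't know which language or libraries your code uses, so this is plain Python with a hand-written signed-rank test based on the normal approximation (no continuity correction). The response values are made up for illustration; only the variable names EFFSAFE1 to EFFSAFE4 come from your work. Since the signed-rank test compares two paired samples at a time, the sketch runs it on each of the six pairs and adds a Bonferroni adjustment, which is my own assumption about how you might want to handle multiple comparisons.

```python
import math
from itertools import combinations


def wilcoxon_signed_rank(x, y):
    """Two-sided Wilcoxon signed-rank test (normal approximation).

    Returns (W, p), where W is the smaller of the positive and negative
    rank sums. Pairs with a zero difference are dropped before ranking.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank the absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        t = j - i + 1
        tie_term += t**3 - t  # used for the tie correction of the variance
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)

    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w - mean) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return w, min(p, 1.0)


# Hypothetical responses from 10 participants (illustrative values only).
data = {
    "EFFSAFE1": [4, 5, 3, 4, 5, 4, 3, 5, 4, 4],
    "EFFSAFE2": [3, 4, 3, 3, 4, 4, 2, 4, 3, 3],
    "EFFSAFE3": [2, 3, 2, 3, 3, 2, 2, 3, 2, 3],
    "EFFSAFE4": [4, 4, 3, 5, 4, 3, 3, 5, 5, 4],
}

pairs = list(combinations(data, 2))  # 6 pairwise comparisons
results = {}
for a, b in pairs:
    w, p = wilcoxon_signed_rank(data[a], data[b])
    p_adj = min(p * len(pairs), 1.0)  # Bonferroni adjustment (my assumption)
    results[(a, b)] = (w, p, p_adj)
    print(f"{a} vs {b}: W={w:.1f}, p={p:.4f}, adjusted p={p_adj:.4f}")
```

With samples this small, an exact p-value is preferable to the normal approximation. If you are working in Python, `scipy.stats.wilcoxon` provides the test directly, so you would not need the hand-written function.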