OpenGL UBO performance issue XOR fence sync problem

This topic covers two separate problems:

- updating a UBO with glBufferSubData is extremely slow

- sync objects do not seem to guarantee that the buffer I would like to reuse is no longer in use by the GPU (the same thing happens on Intel, but with somewhat different effects)

First of all, screenshots of the expected and actual results:

1.png 2.png

And some measurements on two different cards (the methods are referred to by number):

Intel HD 4600:

1 - 23 fps

2 - 23 fps

3 - 7 fps

4 - 23 fps, with the bug shown above (moving the camera reveals that the missing meshes are drawn at the same position as the visible ones)

AMD R7 360:

1 - 103 fps

2 - 5 fps

3 - 46 fps

4 - 96 fps, with the bug shown above (the missing objects flicker)

Repro source code: Dropbox - FOR_AMD_4.zip

I have two questions:

- why do methods 2 and 3 behave so differently on Intel (23 vs. 7 fps) and AMD (5 vs. 46 fps)?
- why are the other teapots missing in method 4, unless I orphan the buffer (which drops performance again)?
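
To make the method 4 symptom concrete, here is a toy model in plain C++ with no OpenGL in it. It only illustrates what "missing meshes drawn at the same place as the visible ones" would look like if the buffer contents were overwritten while earlier draws were still pending. This is an unconfirmed guess at the mechanism, not a diagnosis of the repro, and `Draw`, `renderedPositions` and `slotCount` are hypothetical names that do not appear in the repro code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model, no OpenGL involved: draws are recorded first and executed later,
// the way a GPU consumes commands after the CPU has already moved on.
struct Draw {
    std::size_t slot;  // which region of the pretend "uniform buffer" the draw reads
};

// Queues one draw per object and writes each object's x position into the buffer.
// slotCount == 1: every write lands in the same region (reuse without waiting).
// slotCount == positions.size(): every draw gets its own region.
// Returns the position each draw actually sees when it finally executes.
std::vector<float> renderedPositions(const std::vector<float>& positions,
                                     std::size_t slotCount) {
    assert(slotCount > 0);
    std::vector<float> buffer(slotCount, 0.0f);
    std::vector<Draw> pending;

    for (std::size_t i = 0; i < positions.size(); ++i) {
        const std::size_t slot = i % slotCount;
        buffer[slot] = positions[i];  // CPU write, no waiting on earlier draws
        pending.push_back({slot});    // the draw is queued, not executed yet
    }

    // The "GPU" runs the queued draws only now and reads whatever the buffer holds.
    std::vector<float> seen;
    for (const Draw& d : pending) {
        seen.push_back(buffer[d.slot]);
    }
    return seen;
}
```

With three teapots at x = -2, 0 and 2, `renderedPositions({-2.0f, 0.0f, 2.0f}, 1)` returns 2, 2, 2: all three draws land on top of the last teapot. With three slots it returns -2, 0, 2. In real OpenGL the second case corresponds to giving every draw its own buffer region, or to waiting until the earlier draws have finished before overwriting a region.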