OpenCL -O2 makes program incorrect: Help!

Hello AMD Forums,

 

This is my first post, and I'm also new to OpenCL, so hopefully my mistake here isn't too "n00b". At its core, my program works perfectly well at the -O0 optimization level, but as soon as I go to -O1 or -O2, the output is completely wrong. The program is an OpenCL-based PuyoPuyo AI. In short, when you match four of the same color together, they pop and the "puyos" above them fall down, which can create a chain reaction. This GIF is a good example of how these chain reactions can occur: https://i.imgur.com/jWBs2JI.gif . Generally speaking, the bigger your chain, the more damage you do to the opponent.

 

There's a lot going on here (falling and scoring, for example), but I've simplified the code down to the "pop" routine only. I've removed as much code as possible while still keeping what demonstrates the problem. The attached .zip file contains my simplified Visual Studio 2015 project. Please let me know if there are any issues with this .zip file or the project.

 

My hardware is an R9 290X and my drivers are "Radeon version 17.10.3".

 

-------------

 

The output includes a ton of "printf" statements, but the first few are all you need to see the difference between -O2 and -O0. In -O2, the first few lines are:

 

Err is: 0
A 80000000, 80000000 00000000 00000000
Color Table: e0800000 04001500 e5600020
B 80000000, 80000000 00000000 00000000
C 80000000, 80000000 80000000 80000000

 

Without going into how the algorithm works, the difference between line B and line C is intriguing. The printf calls in the OpenCL "pop.cl" file really show the mystery.

 

          printf("B %08x, %08x %08x %08x", pickedBit, groupTable[0], groupTable[1], groupTable[2]);
          while (didGrow) {
               didGrow = false;

               // Why is "Printout C" wrong when compiled with -O2?
               printf("C %08x, %08x %08x %08x", pickedBit, groupTable[0], groupTable[1], groupTable[2]);

 

This is simple enough: nothing writes to "groupTable[1]" or "groupTable[2]" between "printf B" and "printf C". So why would these values change with nothing but the start of a "while" loop in between them?

 

Again, this doesn't happen with -O0. My only guess at the moment is that this might be a compiler bug, but I'm wondering whether anything in my own code could have produced this output.

 

Thank you for your time. If there are any questions on how the code is supposed to work, please ask. I would like to get to the bottom of this if at all possible. In particular, the full code uses zero scratch registers at -O2, so I would rather use the optimized -O2 build. But if -O2 is wrong, I guess I'll be forced to keep my code at -O0.

 

---------

 

Edit: some notes: This GIF: https://i.imgur.com/jWBs2JI.gif is represented by

struct bitboard bb = toBitboard("BGRYBYBGRYBYBGRYBYGRYBGGGYRGYGYRBBBRGYRGGRGYRBBGBRGGGRR0GRBRG000BB0000G0000000");

 

And I've verified that they are the same. "B" for Blue, "G" for Green, "R" for Red, and "Y" for Yellow. Within the code however, I refer to these colors as "A, B, C, and D". Red is A, Green is B, Blue is C, and Yellow is D.

 

The algorithm starts looking for groups of 4 in the bottom left, then works "up" and "over". The "color table" is a local copy, as a bitblock, of the color that it's currently trying to match. ColorTable[0] == "e0800000" represents column 1 and column 2. The leading hex digit E is binary 1110, which corresponds to Blue-Blue-Blue in column 1 (starting at the bottom).
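For anyone who wants to check the color table by hand, here is a minimal host-side C sketch. It is not the project's toBitboard(); the function name and the bit layout are assumptions inferred from the printed values: the string is read row by row from the bottom, 6 cells per row, each column takes 16 bits, two columns share one 32-bit word, and the most significant bit of each 16-bit half is the bottom row.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BOARD_COLS 6   /* assumed board width */
#define BOARD_ROWS 13  /* assumed height: the string has 78 = 6 * 13 cells */

/* The board string from the post (bottom row first). */
static const char *const GIF_BOARD =
    "BGRYBYBGRYBYBGRYBYGRYBGGGYRGYGYRBBBRGYRGGRGYRBBGBRGGGRR0GRBRG000BB0000G0000000";

/* Hypothetical helper, not the project's code: builds the bit mask of one
 * color letter. Assumed layout: 16 bits per column, two columns per word,
 * MSB of each 16-bit half = bottom row of that column. */
static void color_mask_sketch(const char *board, char color, uint32_t table[3])
{
    size_t len = strlen(board);
    table[0] = table[1] = table[2] = 0;
    for (size_t i = 0; i < len && i < (size_t)(BOARD_COLS * BOARD_ROWS); ++i) {
        if (board[i] != color)
            continue;
        unsigned col = (unsigned)(i % BOARD_COLS); /* 0 = leftmost column */
        unsigned row = (unsigned)(i / BOARD_COLS); /* 0 = bottom row */
        /* Even columns use the high half of the word, odd columns the low half. */
        unsigned bit = (15u - row) + ((col % 2u == 0u) ? 16u : 0u);
        table[col / 2u] |= (uint32_t)1 << bit;
    }
}
```

Under those assumptions, calling color_mask_sketch(GIF_BOARD, 'B', t) gives e0800000 04001500 e5600020, the same three words as the "Color Table" line in the output above.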

 

Hopefully this edit will help anybody who is trying to understand what the code is doing. What I expect is for "groupTable[0]" to be 0x80000000, while groupTable[1] and groupTable[2] should both be 0x00000000, at least in the first iteration of the loop. Ultimately, groupTable[] should end up as (0xE0000000, 0x0, 0x0), representing the 3 blues in the bottom left (which isn't a large enough group to pop).
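To make the expected values concrete, here is a small C sketch of the "grow until nothing changes" idea, restricted to a single 32-bit word (columns 1 and 2). It is not the code from pop.cl: the function names are hypothetical, it assumes the same layout as the printed values suggest (16 bits per column, most significant bit = bottom row), and it ignores neighbors that live in the other two words.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical single-word flood fill, not the kernel from pop.cl.
 * Assumes 16 bits per column with the MSB of each half as the bottom row. */
static uint32_t grow_group_sketch(uint32_t seed, uint32_t color)
{
    uint32_t group = seed & color;
    bool didGrow = true;
    while (didGrow) {
        /* One row up or down; the masks stop bits leaking between columns. */
        uint32_t up   = (group >> 1) & 0x7FFF7FFFu;
        uint32_t down = (group << 1) & 0xFFFEFFFEu;
        /* Same row in the other column stored in this word. */
        uint32_t side = (group >> 16) | (group << 16);
        uint32_t next = (group | up | down | side) & color;
        didGrow = (next != group);
        group = next;
    }
    return group;
}

/* Counts set bits, i.e. the number of puyos in the group. */
static int popcount32_sketch(uint32_t x)
{
    int n = 0;
    while (x != 0) {
        x &= x - 1; /* clear the lowest set bit */
        ++n;
    }
    return n;
}
```

Starting from 0x80000000 with the color word 0xe0800000, this goes 0x80000000, 0xC0000000, 0xE0000000 and then stops. The final group has 3 bits set, which is below the 4 needed to pop.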