GCN: 4 work-items per compute unit, each doing float16 math

Normally I use 256 work-items per workgroup, and the 64 cores of a compute unit process those work-items. What if I instead launch just 4 work-items that each do float16 (16-component vector) math? Does the compiler, in any driver or GCN version, map each work-item onto a whole 16-wide SIMD?
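For reference, here is the lane arithmetic behind the question as a small Python model. It assumes the conventional GCN mapping: one work-item per lane of a 64-wide wavefront, with each wavefront executing on a single one of the compute unit's four 16-lane SIMDs. The constant and function names are made up for illustration. Whether any driver deviates from this mapping for `float16` is exactly what is being asked, so treat this as the baseline, not an answer.

```python
# Assumed GCN parameters (conventional one-work-item-per-lane model):
WAVEFRONT_SIZE = 64   # work-items (lanes) per wavefront
SIMDS_PER_CU = 4      # SIMD units per compute unit
LANES_PER_SIMD = 16   # physical lanes per SIMD (4 x 16 = 64 "cores" per CU)


def wavefronts_for(workgroup_size: int) -> int:
    """Wavefronts needed to hold one workgroup (ceiling division)."""
    return -(-workgroup_size // WAVEFRONT_SIZE)


def active_lane_fraction(workgroup_size: int) -> float:
    """Share of wavefront lanes occupied by a real work-item."""
    return workgroup_size / (wavefronts_for(workgroup_size) * WAVEFRONT_SIZE)


def max_simds_used(workgroup_size: int) -> int:
    """Upper bound on SIMDs this workgroup can occupy: a wavefront runs on one SIMD."""
    return min(wavefronts_for(workgroup_size), SIMDS_PER_CU)


if __name__ == "__main__":
    for n in (256, 4):
        print(n, wavefronts_for(n), active_lane_fraction(n), max_simds_used(n))
    # The intuition behind the question: 4 work-items x 16 components = 64 lanes.
    print(4 * 16 == SIMDS_PER_CU * LANES_PER_SIMD)
```

Under this model, 256 work-items fill 4 wavefronts across all 4 SIMDs. A 4-work-item workgroup fits in one wavefront with only 4 of its 64 lanes active, confined to a single SIMD. The idea of spreading each `float16` across a whole SIMD would only help if the compiler departed from the one-lane-per-work-item mapping.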