Performance Optimization of Memory-Bound Programs on Data Parallel Accelerators