Practical Mechanisms for Reducing Processor–Memory Data Movement in Modern Workloads