Partitioning data across multiple, network connected FPGAs with high bandwidth memory to accelerate non-streaming applications