Computing Reviews

Understanding co-running behaviors on integrated CPU/GPU architectures
Zhang F., Zhai J., He B., Zhang S., Chen W. IEEE Transactions on Parallel and Distributed Systems 28(3): 905-918, 2017. Type: Article
Date Reviewed: 08/08/17

Graphics processing units (GPUs) are used in many applications that are not necessarily related to graphics, and with the rise of big data and machine learning they have gained even more importance. GPUs come in two different “flavors”: discrete (where the GPU card is plugged into the motherboard through the peripheral component interconnect express (PCIe) bus) or integrated (on the same chip as the multicore processor). This paper discusses the latter. Both AMD and Intel ship GPUs integrated with the multicore; the paper uses the AMD Kaveri processor and the Intel Haswell processor for the study at hand. The study tries to answer one question: if I have a parallel application to execute on such a chip, should I assign all of the threads to the GPU, all of the threads to the central processing unit (CPU), or split them across both at the same time?

The authors conducted extensive experiments with programs drawn from several benchmark suites. Based on these experiments, they classify each program into one of three categories: co-run-friendly programs (those that benefit from using the CPU and GPU simultaneously), GPU-dominant programs (those that get the best performance from the GPU alone), and CPU-dominant programs (those that get the best performance from the CPU alone). A small fraction of programs are ratio-oblivious; that is, they show similar performance no matter the ratio of threads running on the GPU versus the CPU. The paper then tries to understand the factors determining each category.
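To make the categorization concrete, here is a minimal sketch of how one might classify a program from three wall-clock measurements. The 5 percent tolerance and the function itself are my illustration, not the authors' actual methodology:

```python
def categorize(t_cpu, t_gpu, t_corun, tol=0.05):
    """Classify a program by its best-performing device assignment.

    t_cpu, t_gpu, t_corun: measured runtimes (seconds) for CPU-only,
    GPU-only, and the best co-run configuration, respectively.
    Hypothetical criterion: co-run friendly if co-running beats the
    better single device by more than `tol` (default 5%).
    """
    best_single = min(t_cpu, t_gpu)
    if t_corun < best_single * (1 - tol):
        return "co-run friendly"
    # Otherwise, the faster single device dominates.
    return "GPU-dominant" if t_gpu < t_cpu else "CPU-dominant"

# Example: GPU-only takes 4 s, CPU-only 10 s, best co-run 3 s.
print(categorize(10.0, 4.0, 3.0))  # co-run friendly
print(categorize(10.0, 4.0, 4.1))  # GPU-dominant
print(categorize(3.0, 9.0, 3.1))   # CPU-dominant
```

A real tool would of course average repeated runs and sweep many CPU/GPU thread ratios before picking the "best co-run" time.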

Most programs were not co-run friendly, for several reasons. First, CPUs and GPUs have very different architectures. For example, a GPU has on-chip local memory that the programmer can use to reduce global memory accesses; a CPU lacks this memory, so the programming libraries must emulate it, reducing performance. Second, CPUs require less memory bandwidth and get better performance from locality (that is, from cache-friendly applications), while GPUs demand much higher bandwidth because of the massive parallelism they use to hide global memory latency.

The co-run-friendly programs require limited bandwidth and do not have a very high degree of parallelism. The paper goes one step further by building a black-box machine-learning tool that predicts whether a program is co-run friendly. If it is, the authors propose an additional analytical tool that determines the ratio of work to run on the CPU versus the GPU.
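To illustrate the flavor of such a ratio calculation, here is a simple throughput-proportional split: each device gets work in proportion to its processing rate, so both finish at the same time. This is my own illustrative stand-in, assuming known per-device rates, not the authors' actual analytical model:

```python
def corun_split(rate_cpu, rate_gpu, total_work):
    """Split `total_work` items between CPU and GPU so that both
    devices finish at roughly the same time.

    rate_cpu, rate_gpu: items processed per second on each device
    (hypothetical inputs; a real tool would estimate these).
    Returns (cpu_work, gpu_work).
    """
    gpu_share = rate_gpu / (rate_cpu + rate_gpu)
    gpu_work = round(total_work * gpu_share)
    return total_work - gpu_work, gpu_work

# Example: the GPU is 3x faster than the CPU, 100 work items total.
print(corun_split(1.0, 3.0, 100))  # (25, 75)
```

With this split, the CPU's 25 items at 1 item/s and the GPU's 75 items at 3 items/s both take 25 seconds, so neither device idles while the other finishes.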

Finally, the paper discusses an important topic: power/energy efficiency. The authors found that not all integrated architectures (that is, the ones with multicore plus GPU) are more energy-efficient than discrete ones.

Overall, this is a good paper on an important topic; it provides solid insights into the behavior of parallel programs and into when they benefit from multicores versus GPUs.

Reviewer:  Mohamed Zahran Review #: CR145464 (1710-0660)
