ABSTRACT
Large-scale graph processing suffers from poor locality, limited scalability, random access patterns, and heavy data conflicts. Several characteristics of FPGAs make them a promising solution for accelerating various applications; for example, on-chip block RAMs can provide high throughput for random data access. However, large-scale processing on a single FPGA chip is constrained by limited on-chip memory resources and off-chip bandwidth. A multi-FPGA architecture can alleviate these problems to some extent, but the data partitioning and communication schemes must be designed to preserve locality and reduce data conflicts. In this paper, we propose ForeGraph, a large-scale graph processing framework based on the multi-FPGA architecture. In ForeGraph, each FPGA board stores only one partition of the entire graph in its off-chip memory. Communication among partitions is reduced. Vertices and edges are loaded sequentially onto the FPGA chip and processed there. Under our scheduling scheme, the FPGA chips process the graph in parallel without conflicts. We also analyze the impact of system parameters on the performance of ForeGraph. Our experimental results on the Xilinx Virtex UltraScale XCVU190 chip show that ForeGraph outperforms state-of-the-art FPGA-based large-scale graph processing systems by 4.54x when executing PageRank on the Twitter graph (1.4 billion edges). The average throughput of our design is over 900 MTEPS, 2.03x higher than that of previous work.
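The abstract does not spell out how the graph is partitioned or scheduled, so the following is only a minimal software sketch of one way the stated goals (one partition per worker, sequential edge streaming, parallel processing without write conflicts) could fit together. It assumes vertices are split into equal-sized intervals and edges are grouped into blocks by source and destination interval. The function names, the round-robin schedule, and the damping factor are all hypothetical and are not taken from ForeGraph.

```python
from collections import defaultdict


def partition_edges(edges, num_vertices, num_intervals):
    """Group edges into blocks keyed by (source interval, destination interval).

    Assumption: vertices are split into contiguous, equal-sized intervals.
    """
    size = -(-num_vertices // num_intervals)  # ceiling division
    blocks = defaultdict(list)
    for src, dst in edges:
        blocks[(src // size, dst // size)].append((src, dst))
    return blocks


def conflict_free_rounds(num_intervals):
    """Yield one list of block keys per round.

    In round r, hypothetical worker w handles block (w, (w + r) % P), so the
    blocks of one round all write to different destination intervals.
    """
    for r in range(num_intervals):
        yield [(w, (w + r) % num_intervals) for w in range(num_intervals)]


def pagerank_step(edges, ranks, out_degree, num_intervals, damping=0.85):
    """Run one PageRank iteration by streaming edge blocks round by round."""
    n = len(ranks)
    blocks = partition_edges(edges, n, num_intervals)
    acc = [0.0] * n
    for round_blocks in conflict_free_rounds(num_intervals):
        # The blocks in round_blocks touch disjoint destination intervals, so
        # on real hardware they could be processed by separate workers at once.
        # Here they are simply visited one after another.
        for key in round_blocks:
            for src, dst in blocks.get(key, []):  # sequential edge stream
                acc[dst] += ranks[src] / out_degree[src]
    return [(1.0 - damping) / n + damping * a for a in acc]


# Tiny example graph in which every vertex has at least one outgoing edge.
edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 0)]
out_degree = [1, 1, 2, 1]
ranks = [0.25] * 4
new_ranks = pagerank_step(edges, ranks, out_degree, num_intervals=2)
```

Because each round pairs every worker with a distinct destination interval, no two workers update the same vertex in the same round. That is the property this sketch is meant to illustrate; the actual scheduling in the paper may differ.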
ForeGraph: Exploring Large-scale Graph Processing on Multi-FPGA Architecture