DOI: 10.1145/3020078.3021739
research-article

ForeGraph: Exploring Large-scale Graph Processing on Multi-FPGA Architecture

Published: 22 February 2017

ABSTRACT

The performance of large-scale graph processing suffers from poor locality, lack of scalability, random access patterns, and heavy data conflicts. Some characteristics of FPGAs make them a promising platform for accelerating various applications. For example, on-chip block RAMs can provide high throughput for random data access. However, large-scale processing on a single FPGA chip is constrained by limited on-chip memory resources and off-chip bandwidth. A multi-FPGA architecture may alleviate these problems to some extent, but the data partitioning and communication schemes must be designed to preserve locality and reduce data conflicts. In this paper, we propose ForeGraph, a large-scale graph processing framework based on the multi-FPGA architecture. In ForeGraph, each FPGA board stores only one partition of the entire graph in its off-chip memory, and communication between partitions is reduced. Vertices and edges are sequentially loaded onto the FPGA chip and processed. Under our scheduling scheme, each FPGA chip performs graph processing in parallel without conflicts. We also analyze the impact of system parameters on the performance of ForeGraph. Our experimental results on Xilinx Virtex UltraScale XCVU190 chips show that ForeGraph outperforms state-of-the-art FPGA-based large-scale graph processing systems by 4.54x when executing PageRank on the Twitter graph (1.4 billion edges). The average throughput of our design is over 900 MTEPS, 2.03x higher than that of previous work.
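The abstract does not spell out the partitioning scheme, so the following is only a minimal sketch of one way that conflict-free parallel processing across boards could be arranged. It assumes an interval-based grid partitioning, in which each board owns the edges that write to a disjoint set of destination vertices. The function names, the round-robin board assignment, and the toy graph are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def interval_of(vertex, num_vertices, num_intervals):
    """Map a vertex id to the index of its contiguous vertex interval."""
    size = -(-num_vertices // num_intervals)  # ceiling division
    return vertex // size


def partition_edges(edges, num_vertices, num_intervals):
    """Group edges into blocks keyed by (source interval, destination interval)."""
    blocks = defaultdict(list)
    for src, dst in edges:
        key = (interval_of(src, num_vertices, num_intervals),
               interval_of(dst, num_vertices, num_intervals))
        blocks[key].append((src, dst))
    return blocks


def assign_to_boards(blocks, num_boards):
    """Assumed policy: a board receives every block whose destination
    interval it owns, so no two boards update the same vertex."""
    boards = [[] for _ in range(num_boards)]
    for (_, dst_iv), block in sorted(blocks.items()):
        boards[dst_iv % num_boards].extend(block)
    return boards


# Toy graph with 8 vertices, 4 vertex intervals, and 2 hypothetical boards.
edges = [(0, 5), (1, 2), (3, 7), (6, 1), (4, 4), (7, 0)]
blocks = partition_edges(edges, num_vertices=8, num_intervals=4)
boards = assign_to_boards(blocks, num_boards=2)

for board_id, board_edges in enumerate(boards):
    print(board_id, board_edges)
```

Because each destination interval belongs to exactly one board in this sketch, the boards can apply updates in parallel without write conflicts; how source vertex values reach each board is left out.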


Published in

FPGA '17: Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays
February 2017
312 pages
ISBN: 9781450343541
DOI: 10.1145/3020078

Copyright © 2017 ACM

Publisher

Association for Computing Machinery
New York, NY, United States

Acceptance Rates

FPGA '17 Paper Acceptance Rate: 25 of 101 submissions, 25%
Overall Acceptance Rate: 125 of 627 submissions, 20%
