Publications

The following publications are related to Ookami:

Papers

  1. VPIC 2.0: Next Generation Particle-in-Cell Simulations; Bird, Tan, Luedtke, Harrell, Taufer, Albright; 2021
  2. Ookami: Deployment and Initial Experiences; Burford, Calder, Carlson, Chapman, Coskun, Curtis, Feldman, Harrison, Kang, Michalowicz, Raut, Siegmann, Wood, Deleon, Jones, Simakov, White, Oryspayev; PEARC '21
  3. Comparing the behavior of OpenMP Implementations with various Applications on two different Fujitsu A64FX platforms; Michalowicz, Raut, Kang, Curtis, Oryspayev, Chapman; PEARC '21
  4. MoB2 under Pressure: Superconducting Mo Enhanced by Boron; Quan, Lee, Pickett; 2021
  5. A64FX performance: experience on Ookami; Shahneous Bari, Chapman, Curtis, Harrison, Siegmann, Simakov, Jones; 2021
  6. Porting and Evaluation of a Distributed Task-driven Stencil-based Application; Raut, Anderson, Araya-Polo, Meng; PMAM 2021
  7. Comparing OpenMP Implementations with Applications Across A64FX Platforms; Michalowicz, Raut, Kang, Curtis, Chapman, Oryspayev; IWOMP 2021
  8. Educating HPC users in the use of advanced computing technology; Siegmann, Calder, Feldman, Harrison; SC'21 EduHPC
  9. Experiences with Porting the FLASH Code to Ookami, an HPE Apollo 80 A64FX Platform; Feldman, Michalowicz, Siegmann, Curtis, Calder, Harrison; HPC Asia 2022
  10. OpenSHMEM Active Message Extension for Task-Based Programming; Lu, Curtis, Chapman; 2022
  11. Analysis of Vector Particle-In-Cell (VPIC) memory usage optimizations on cutting-edge computer architectures; Tan, Bird, Chen, Luedtke, Albright, Taufer; Journal of Computational Science; 2022
  12. Dirac lines and loop at the Fermi level in the time-reversal symmetry breaking superconductor LaNiGa2; Badger, Quan, Staab, Sumita, Rossi, Devlin, Neubauer, Shulman, Fettinger, Klavins, Kauzlarich, Aoki, Vishik, Pickett, Taufour; Communications Physics; 2022
  13. Parthenon – a performance portable block-structured adaptive mesh refinement framework; Grete, Dolence, Miller, Brown, Ryan, Gaspar, Glines, Swaminarayan, Lippuner, Solomon, Shipman, Junghans, Holladay, Stone; 2022
  14. Towards Architecture-aware Hierarchical Communication Trees on Modern HPC Systems; Ramesh, Hashmi, Xu, Shafi, Ghazimirsaeed, Bayatpour, Subramoni, Panda; IEEE 28th International Conference on High Performance Computing, Data, and Analytics (HiPC) 2021
  15. Quantum calcium-ion affective influences measured by EEG; Ingber; 2022
  16. Hybrid Classical-Quantum Computing: Applications to Statistical Mechanics of Neocortical Interactions; Ingber; 2021
  17. Exploring Source-to-Source Compiler Transformation of OpenMP SIMD Constructs for Intel AVX and Arm SVE Vector Architectures; Flynn, Yi, Yan; The 13th International Workshop on Programming Models and Applications for Multicores and Manycores, held in conjunction with PPoPP 2022
  18. FOURST: A code generator for FFT-based fast stencil computations; Ahmad, Javanmard, Croisdale, Gregory, Ganapathi, Pouchet, Chowdhury; 2022 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp. 99-108
  19. Friends and foes: Sinophobia was viral in Chinese language communities on Twitter during the early COVID-19 pandemic; Zhang, Lin, Wang, Fan; 2022
  20. Developing Accurate Slurm Simulator; Simakov, Deleon, Lin, Hoffmann, Mathias; PEARC'22
  21. On Using Linux Kernel Huge Pages with FLASH, an Astrophysical Simulation Code; Calder, Feldman, Siegmann, Dey, Curtis, Chheda, Harrison; IEEE Cluster - EAHPC Workshop 2022
  22. Performance of an Astrophysical Radiation Hydrodynamics Code under Scalable Vector Extension Optimization; Smolarski, Swesty, Calder; IEEE Cluster, EAHPC Workshop 2022
  23. Bring the BitCODE - Moving Compute and Data in Distributed Heterogeneous Systems; Lu, Pena, Shamis, Churavy, Chapman, Poole; 2022
  24. From Merging Frameworks to Merging Stars: Experiences using HPX, Kokkos and SIMD Types; Daiß, Singanaboina, Diehl, Kaiser, Pflüger; 2022
  25. Assessing the State of Autovectorization Support based on SVE; Brank, Pleiter; IEEE Cluster, EAHPC Workshop 2022
  26. Improved Distributed-memory Triangle Counting by Exploiting the Graph Structure; Ghosh; IEEE 2022
  27. OpenMP Advisor: A Compiler Tool for Heterogeneous Architectures; Mishra, Malik, Lin, Chapman; 2023
  28. Modern server ARM processors for supercomputers: A64FX and others. Initial data of benchmarks; Kuzminsky; 2022
  29. Examining the Connectivity of Antarctic Krill on the West Antarctic Peninsula: Implications for Pygoscelis Penguin Biogeography and Population Dynamics; Gallagher, Dinniman, Lynch; 2023
  30. Are we ready for broader adoption of ARM in the HPC community: Performance and Energy Efficiency Analysis of Benchmarks and Applications Executed on High-End ARM Systems; Simakov, Deleon, White, Jones, Furlani, Siegmann, Harrison; HPC Asia 2023
  31. Performance Study on CPU-based Machine Learning with PyTorch; Chheda, Curtis, Siegmann, Chapman; HPC Asia 2023
  32. Shared memory parallelism in Modern C++ and HPX; Diehl, Brandt, Kaiser; 2023
  33. Interoperable PGAS Programming Models for Exascale Supercomputing; Lu; 2023
  34. Program Transformation for Automatic GPU-Offloading using OpenMP; Mishra; 2023
  35. HEAT: A Highly Efficient and Affordable Training System for Collaborative Filtering Based Recommendation on CPUs; Zhang, Smith, Sun, Tian, Soifer, Yu, Song, He, Tao; 2023
  36. Simulating Stellar Merger using HPX/Kokkos on A64FX on Supercomputer Fugaku; Diehl, Daiß, Huck, Marcello, Shiber, Kaiser, Pflüger; 2023
  37. Asynchronous Many-Task Systems and Applications: First International Workshop; Diehl, Thoman, Kaiser, Kale; 2023
  38. CPU Architecture Modelling and Design; Brank, Pleiter; 2023
  39. Cyberinfrastructure for Sustainability Sciences; Song, Merwade, Wang, Witt, Kumar, Irwin, Zhao, Walton; 2023
  40. Human mobility patterns are associated with experienced partisan segregation in US metropolitan areas; Zhang, Cheng, Li, Jiang; 2023
  41. Quantifying Antarctic krill connectivity across the West Antarctic Peninsula and its role in large-scale Pygoscelis penguin population dynamics; Gallagher, Dinniman, Lynch; 2023
  42. Sinophobia was popular in Chinese language communities on Twitter during the early COVID-19 pandemic; Zhang, Lin, Wang, Fan; 2023
  43. Efficient Auto-Vectorization for Control-flow Dependent Loops through Data Permutation; Paktinatkeleshteri; 2023
  44. From Molecular Dynamics to Oceanography - Ookami Graduate Students Porting and Tuning Science Codes for A64FX; Kaushik, Wang, Ma, Carlson, Curtis, Harrison, Siegmann; 2023
  45. A Further Study of Linux Kernel Hugepages on A64FX with FLASH, an Astrophysical Simulation Code; Feldman, Chheda, Dey, Siegmann, Curtis, Harrison; 2023
  46. LM4HPC: Towards Effective Language Model Application in High-Performance Computing; Emani, de Supinski; 2023
  47. Evaluating HPX and Kokkos on RISC-V using an Astrophysics Application Octo-Tiger; Diehl, Daiß, Brandt, Kheirkhahan, Kaiser, Taylor, Leidel; 2023
  48. The General Atomic and Molecular Electronic Structure System (GAMESS): Novel Methods on Novel Architectures; Zahariev, Xu, Westheimer, Webb, Vallejo, Tiwari, Sundriyal, Sosonkina, Shen, Schoendorff, Schlinsog, Sattasathuchana, Ruedenberg, Roskop, Rendell, Poole, Piecuch, Pham, Mironov, Mato, Leonard, Leang, Ivanic, Hayes, Harville, Gururangan, Guidez, Gerasimov, Friedl, Ferreras, Elliott, Datta, Cruz, Carrington, Bertoni, Barca, Alkan, Gordon; 2023
  49. Efficient Auto-Vectorization for Control-flow Dependent Loops through Data Permutation; Paktinatkeleshteri, de Carvalho, Amiri, Amaral; 2023
  50. Parameterization of Quantum Interactions; Ingber; 2023
  51. Ookami: An A64FX Computing Resource; Calder, Siegmann, Feldman, Chheda, Smolarski, Swesty, Curtis, Dey, Carlson, Michalowicz, Harrison; 2023
  52. Cross-Feature Transfer Learning For Efficient Tensor Program Generation; Verma, Raskar, Emani, Chapman; 2024
  53. Impact of Write-Allocate Elimination on Fujitsu A64FX; Kang, Ghosh, Kandemir, Marquez; 2024
  54. First Impressions of the NVIDIA Grace CPU Superchip and NVIDIA Grace Hopper Superchip for Scientific Workloads; Simakov, Jones, Furlani, Siegmann, Harrison; 2024
  55. Parallel C++ Efficient and Scalable High-Performance Parallel Programming Using HPX; Diehl, Brandt, Kaiser; 2024
  56. Explore as a Storm, Exploit as a Droplet: A Unified Search Technique for the Ansor Optimizer; Canesche, Verma, Quintao Pereira; 2024
  57. Benchmarking the Parallel 1D Heat Equation Solver in Chapel, Charm++, C++, HPX, Go, Julia, Python, Rust, Swift, and Java; Diehl, Brandt, Morris, Gupta, Kaiser; 2023
  58. Quantifying potential marine debris sources and potential threats to penguins on the West Antarctic Peninsula; Gallagher, Cimino, Dinniman, Lynch; 2024
  59. Anti-Coulomb ion-ion interactions: A theoretical and computational study; Wills, Mannino, Losada, Mayo, Soler, Fernandez-Serra; 2024
  60. Parallel assembly of finite element matrices on multicore computers; Krysl; 2024
  61. First Impressions of the Sapphire Rapids Processor with HBM for Scientific Workloads; Siegmann, Harrison, Carlson, Chheda, Curtis, Coskun, Gonzalez, Wood, Simakov; 2024
  62. Performance-Portable Tensor Transpositions in MLIR; Lakshminarasimhan, Hall, Sadayappan; 2024
  63. A64FX Enables Engine Decarbonization Using Deep Learning; Ristow Hadlich, Verma, Curtis, Siegmann, Assanis; 2024
  64. From array expressions to predictable portable high-performance: foundations for no-code HPC on arrays; Mullin, Hains; 2024
  65. Explore as a Storm, Exploit as a Droplet: A Unified Search Technique for the MetaSchedule; Canesche, Verma, Quintao Pereira; 2024
  66. Accelerating LULESH using HPX – the C++ Standard Library for Parallelism and Concurrency; Singanaboina, Wei, Seiras, Syskakis, Richardson, Cook, Kaiser; 2024
  67. Hardware-Software Co-design of Efficient and Scalable Deep Learning; Zhang; 2024
  68. Dynamics of Jet Expansion and Impingement Across a Spectrum of Nozzle Pressure Ratios; Martinus, Tumuklu; 2024
  69. Benchmarking with Supernovae: A Performance Study of the FLASH Code; Martin, Feldman, Calder, Curtis, Siegmann, Carlson, Gonzalez, Wood, Harrison, Coskun; 2024
  70. Exploring Processor Micro-architectures Optimised for BLAS3 Micro-kernels; Nassyr, Pleiter; 2024
  71. Enhancing Code Portability, Problem Scale, and Storage Efficiency in Exascale Applications; Tan; 2024
  72. Towards a Scalable and Efficient PGAS-based Distributed OpenMP; Shan, Araya-Polo, Chapman; 2024
  73. On the Scalability of Computing Genomic Diversity Using SparkLeBLAST: A Feasibility Study; Prabhu, Moussad, Youssef, Vatai, Feng; 2024
  74. Benchmarking and Continuous Performance Monitoring of Ookami, an ARM Fujitsu A64FX Testbed Cluster; Simakov, White, Jones, Siegmann, Wood, Coskun, Harrison; 2024
  75. From Saline to Solids: Studies of Ionic Solvation and Machine Learning for Ab Initio Calculations; Wills; 2024
  76. Xphase3d: Memory-Distributed Phase Retrieval for Reconstructing Large-Scale 3D Density Maps of Biological Macromolecules; Zhao, Miyashita, Nakano, Tama; 2024
  77. Improving Polyhedral-Based Optimizations With Dynamic Coordinate Descent; Verma, Canesche, Chapman, Quintao Pereira; 2024
  78. Evaluating Tuning Opportunities of the LLVM/OpenMP Runtime; Chheda, Verma, Tian, Chapman, Doerfert; 2024
  79. Studying CPU and memory utilization of applications on Fujitsu A64FX and Nvidia Grace Superchip; Kang, Ghosh, Kandemir, Marquez; 2024
  80. Asynchronous-Many-Task Systems: Challenges and Opportunities -- Scaling an AMR Astrophysics Code on Exascale machines using Kokkos and HPX; Daiß, Diehl, Yan, Holmen, Gayatri, Junghans, Straub, Hammond, Marcello, Tsuji, Pflüger, Kaiser; 2024
  81. Preparing for HPC on RISC-V: Examining Vectorization and Distributed Performance of an Astrophysics Application with HPX and Kokkos; Diehl, Syskakis, Daiß, Brandt, Kheirkhahan, Yadav Singanaboina, Marcello, Taylor, Leidel, Kaiser; 2024
  82. The General Atomic and Molecular Electronic Structure System (GAMESS): Novel Methods on Novel Architectures; Zahariev, Xu, Westheimer, Webb, Galvez Vallejo, Tiwari, Sundriyal, Sosonkina, Shen, Schoendorff, Schlinsog, Sattasathuchana, Ruedenberg, Roskop, Rendell, Poole, Piecuch, Pham, Mironov, Mato, Leonard, Leang, Ivanic, Hayes, Harville, Gururangan, Guidez, Gerasimov, Friedl, Ferreras, Elliott, Datta, Del Angel Cruz, Carrington, Bertoni, Barca, Alkan, Gordon; 2023
  83. Optimizing and Scaling Machine Learning Models for Scientific Applications on Exascale Supercomputers; Gutta; 2025
  84. Closing a Source Complexity Gap between Chapel and HPX; Atre, Taylor, Diehl, Kaiser; 2025
  85. Generalized Ideal Point Models for Robust Measurement with Dirty Data in the Social Sciences; Kubinec; 2025
  86. The AmpereOne A192-32X in Perspective: Benchmarking a New Standard; Carlson, Simakov, Ristow Hadlich, Curtis, Martin, Verma, Chheda, Coskun, Gonzalez, Wood, Zhang, Harrison, Siegmann; 2025
  87. Simulating Stellar Merger using HPX/Kokkos on A64FX on Supercomputer Fugaku; Diehl, Daiß, Huck, Marcello, Shiber, Kaiser, Pflüger; 2024

 

Posters

  1. Pure Deflagrations of Hybrid CONe White Dwarf Progenitors; C. Feldman, D. Willcox, D. Townsley, A. Calder; AAS; 2021
  2. Stalls and Memory Analysis on Fujitsu A64FX and NVIDIA Grace; Kang, Ghosh, Kandemir, Marquez; 2024

 

Other publications

  1. Kernel module for the A64FX hardware barrier
  2. PEARC 2022 - Birds of a feather session: NSF innovative computing technology testbed community exchange