High-Performance Computing and Parallel Algorithms

swPredictor: A data-driven performance model for distributed data parallelism training on large-scale HPC clusters


HPC, Distributed, Degree Recognition B, CCF B

Category and Summary

Focuses on large-scale parallel algorithms, supercomputing platforms, and performance optimization; a key entry in the HPC area.

Evidence excerpts: …llelism training on large-scale HPC clusters. Xianyu Zhu a, Ruohan Wu a, Junshi Chen a,b,∗, Hong An a,b. a University of Science and Technology of China, Hefei, Anhui, China; b Laoshan Laboratory, Qingdao, China. ARTICLE INFO. Keywords: High-performance computing; Performance modeling; Deep learning; Distributed training. ABSTRACT. Given the complexity of heterogeneous architectures and multi-node col… || …pported by the Chinese Academy of Sciences (Strategic Priority Research Program, China), Grant (XDB0500102). And this work is financially supported by Laoshan Laboratory (China) (LSKJ202300305). We would like to thank Ziyue You from the University of Edinburgh, Shengtao Xue from the Northwest A&F University and Jiahao Huang from the National University of Singapore, for their valuable feedback and assistanc…

Citation

Zhu, Xianyu; Wu, Ruohan; Chen, Junshi; An, Hong. swPredictor: A data-driven performance model for distributed data parallelism training on large-scale HPC clusters. Performance Evaluation, vol. 170, November 2025 (CCF B)

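@article{acsa2025_4,
  author = {Zhu, Xianyu and Wu, Ruohan and Chen, Junshi and An, Hong},
  title = {swPredictor: A data-driven performance model for distributed data parallelism training on large-scale HPC clusters},
  journal = {Performance Evaluation},
  volume = {170},
  year = {2025},
  doi = {10.1016/j.peva.2025.102530}
}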
title swPredictor: A data-driven performance model for distributed data parallelism training on large-scale HPC clusters
title_zh To be added
abstract To be added
abstract_zh To be added
keywords HPC, Distributed, Degree Recognition B, CCF B
year 2025
published_date To be added
online_date To be added
paper_type Journal
publication_status Published
volume 170
issue To be added
pages To be added
article_number To be added
publisher To be added
doi 10.1016/j.peva.2025.102530
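research_area High-Performance Computing and Parallel Algorithms
tags HPC, Distributed, Degree Recognition B, CCF B
category High-Performance Computing and Parallel Algorithms
summary Focuses on large-scale parallel algorithms, supercomputing platforms, and performance optimization; a key entry in the HPC area.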
authors Zhu, Xianyu; Wu, Ruohan; Chen, Junshi; An, Hong
corresponding_authors
affiliations University of Science and Technology of China, Hefei, Anhui, China; Laoshan Laboratory, Qingdao, China
funding Laoshan Laboratory project (LSKJ202300305)