Research directions in the literature
Benchmarking: performance measurements on a purpose-built real system
- DirectCXL
Gouk D, Lee S, Kwon M, et al. Direct access, High-Performance memory disaggregation with DirectCXL[C]//2022 USENIX Annual Technical Conference (USENIX ATC 22). 2022: 287-294.
- Characterizes CXL performance on a state-of-the-art experimental platform and offers guidelines for programmers
Sun Y, Yuan Y, Yu Z, et al. Demystifying CXL Memory with Genuine CXL-Ready Systems and Devices[J]. arXiv preprint arXiv:2303.15375, 2023.
- Heterogeneous computing
Cabrera, A.M., A.R. Young, and J.S. Vetter. “Design and Analysis of CXL Performance Models for Tightly-Coupled Heterogeneous Computing,” 2022. https://doi.org/10.1145/3529336.3530817.
CXL software stack
- SK hynix HMSDK
S. Ryu et al., “System Optimization of Data Analytics Platforms using Compute Express Link (CXL) Memory,” presented at the Proceedings - 2023 IEEE International Conference on Big Data and Smart Computing, BigComp 2023, 2023, pp. 9–12. doi: 10.1109/BigComp57234.2023.00011.
- Samsung SMDK
K. Kim et al., “SMT: Software-Defined Memory Tiering for Heterogeneous Computing Systems With CXL Memory Expander,” IEEE Micro, vol. 43, no. 2, pp. 20–29, 2023, doi: 10.1109/MM.2023.3240774.
Memory pooling
- DIRECTCXL
Gouk, D., S. Lee, M. Kwon, and M. Jung. “Direct Access, High-Performance Memory Disaggregation with DIRECTCXL,” 287–94, 2022. https://www.scopus.com/inward/record.uri?eid=2-s2.0-85138473074&partnerID=40&md5=4605596d72e2054c690576e6e0e93e86.
- Memory pooling for cloud services
Li, H., D.S. Berger, L. Hsu, D. Ernst, P. Zardoshti, S. Novakovic, M. Shah, et al. “Pond: CXL-Based Memory Pooling Systems for Cloud Platforms,” 2:574–87, 2023. https://doi.org/10.1145/3575693.3578835.
- Memory pool with NDP (near-data processing)
Wahlgren, Jacob, Maya Gokhale, and Ivy B. Peng. “Evaluating Emerging CXL-Enabled Memory Pooling for HPC Systems.” In 2022 IEEE/ACM Workshop on Memory Centric High Performance Computing (MCHPC), 11–20, 2022. https://doi.org/10.1109/MCHPC56545.2022.00007.
- Memory pools for composable and scale-out architectures (written by an Intel author)
Sharma, D.D. “Novel Composable and Scaleout Architectures Using Compute Express Link.” IEEE Micro 43, no. 2 (2023): 9–19. https://doi.org/10.1109/MM.2023.3235972.
- CXL-based dynamic capacity expansion
Ha, M., J. Ryu, J. Choi, K. Ko, S. Kim, S. Hyun, D. Moon, et al. “Dynamic Capacity Service for Improving CXL Pooled Memory Efficiency.” IEEE Micro 43, no. 2 (2023): 39–47. https://doi.org/10.1109/MM.2023.3237756.
- Discusses how cloud service providers should design CXL memory pools
Berger, D.S., D. Ernst, H. Li, P. Zardoshti, M. Shah, S. Rajadnya, S. Lee, et al. “Design Tradeoffs in CXL-Based Memory Pools for Public Cloud Platforms.” IEEE Micro 43, no. 2 (2023): 30–38. https://doi.org/10.1109/MM.2023.3241586.
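Disaggregated memory
CXL is limited to rack scale. This work proposes CXL over Ethernet, which lets a host access remote disaggregated memory attached over Ethernet through load/store memory semantics.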
Wang C, He K, Fan R, et al. CXL over Ethernet: A Novel FPGA-based Memory Disaggregation Design in Data Centers[C]//2023 IEEE 31st Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM). IEEE, 2023: 75-82.
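Security
A survey of CXL security issues
Stark, S.W., A.T. Markettos, and S.W. Moore. “How Flexible Is CXL’s Memory Protection?” Queue 21, no. 3 (2023): 54–64. https://doi.org/10.1145/3606014.
Data consistency under the FAM architecture
Alwadi, M., R. Wang, D. Mohaisen, C. Hughes, S.D. Hammond, and A. Awad. “Minerva: Rethinking Secure Architectures for the Era of Fabric-Attached Memory Architectures,” 258–68, 2022. https://doi.org/10.1109/IPDPS53621.2022.00033.
Security issues when multiple hosts share device memory
GFAM: security issues when multiple devices share memory
Security issues of the peer-to-peer communication provided by UIO
There seems to be little work dedicated specifically to CXL security; most papers only mention it in passing.
Fabric Attached Memory (FAM) architecture: also called memory-centric architectures; it allows multiple PEs (processing elements) to connect to a shared memory pool.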
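Tiered Memory
- This paper proposes Transparent Page Placement (TPP), a new OS-level, application-transparent page placement mechanism. It uses lightweight mechanisms to identify hot and cold pages and place them in the appropriate memory tier, and it supports proactively demoting pages from local memory to CXL memory.
H. A. Maruf et al., “TPP: Transparent Page Placement for CXL-Enabled Tiered-Memory,” presented at the International Conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS, 2023, pp. 742–755. doi: 10.1145/3582016.3582063.
- Addresses the unequal latencies of tiered memory by adjusting cache allocation sizes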
H. Lee, S. Lee, Y. Jung, and D. Kim, “T-CAT: Dynamic Cache Allocation for Tiered Memory Systems With Memory Interleaving,” IEEE Computer Architecture Letters, vol. 22, no. 2, pp. 73–76, 2023, doi: 10.1109/LCA.2023.3290197.
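To make the hot/cold placement idea in the TPP note more concrete, here is a minimal user-space sketch in Python. It is not the TPP implementation (TPP lives in the OS kernel); the class name `TieredMemory`, the per-page access counters, and the `demote_watermark` and `hot_threshold` parameters are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TieredMemory:
    """Toy two-tier page placement: a local DRAM tier and a CXL tier.

    All names and policies here are hypothetical; they only illustrate
    proactive demotion of cold pages and promotion of hot pages.
    """
    local_capacity: int      # pages the local tier can hold
    demote_watermark: int    # keep at least this many local pages free
    local: dict = field(default_factory=dict)  # page id -> access count
    cxl: dict = field(default_factory=dict)    # page id -> access count

    def touch(self, page):
        """Record one access; new pages are allocated in local memory first."""
        if page in self.local:
            self.local[page] += 1
        elif page in self.cxl:
            self.cxl[page] += 1
        else:
            self.local[page] = 1
        self.demote_cold_pages()

    def demote_cold_pages(self):
        """Move the coldest local pages to CXL until enough headroom is free."""
        while self.local and self.local_capacity - len(self.local) < self.demote_watermark:
            coldest = min(self.local, key=self.local.get)
            self.cxl[coldest] = self.local.pop(coldest)

    def promote_hot_pages(self, hot_threshold):
        """Move CXL pages with enough recorded accesses back to local memory."""
        hot = [p for p, count in self.cxl.items() if count >= hot_threshold]
        for page in hot:
            self.local[page] = self.cxl.pop(page)
        self.demote_cold_pages()
```

Access counts stand in for whatever hotness signal a real system would use; the point of the sketch is only that demotion runs ahead of allocation pressure so the local tier keeps free headroom.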
Applications
Software-hardware co-design
- Jang, Junhyeok, Hanjin Choi, Hanyeoreum Bae, and Seungjun Lee. “CXL-ANNS: Software-Hardware Collaborative Memory Disaggregation and Computation for Billion-Scale Approximate Nearest Neighbor Search,” n.d.
- Huangfu, W., K.T. Malladi, A. Chang, and Y. Xie. “BEACON: Scalable Near-Data-Processing Accelerators for Genome Analysis near Memory Pool with the CXL Support,” 2022-October:727–43, 2022. https://doi.org/10.1109/MICRO56248.2022.00057.
- “Partial Failure Resilient Memory Management System for (CXL-Based) Distributed Shared Memory,” n.d.
Graph processing systems
Zhang, X., Y. Chang, T. Lu, K. Zhang, and M. Chen. “Rethinking Design Paradigm of Graph Processing System with a CXL-like Memory Semantic Fabric,” 25–35, 2023. https://doi.org/10.1109/CCGrid57682.2023.00013.
Distributed deep learning
Arif, M., K. Assogba, M.M. Rafique, and S. Vazhkudai. “Exploiting CXL-Based Memory for Distributed Deep Learning,” 2022. https://doi.org/10.1145/3545008.3545054.
FaaS
A CXL-based, object-granular memory interface for maintaining FaaS objects; the key innovation is a fault-tolerant coherence protocol.
A. Patil, V. Nagarajan, N. Nikoleris, and N. Oswald, “Āpta: Fault-tolerant object-granular CXL disaggregated memory for accelerating FaaS,” presented at the Proceedings - 2023 53rd Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2023, 2023, pp. 201–215. doi: 10.1109/DSN58367.2023.00030.
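Cloud computing
VM snapshots are mainly used for data recovery and are normally transferred at 4 KiB (page-size) granularity; this work uses CXL to capture memory modifications at 64 B granularity instead.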
D. Waddington, M. Hershcovitch, S. Sundararaman, and C. Dickey, “A case for using cache line deltas for high frequency VM snapshotting,” presented at the SoCC 2022 - Proceedings of the 13th Symposium on Cloud Computing, 2022, pp. 526–539. doi: 10.1145/3542929.3563481.
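The granularity argument in the snapshot note can be illustrated with a small Python sketch that compares two images of one 4 KiB page and keeps only the 64 B lines that changed. This is a software diff written for illustration; how the paper's system actually detects modified lines through CXL is not reproduced here, and the function names `cache_line_delta` and `apply_delta` are hypothetical.

```python
PAGE_SIZE = 4096   # bytes per page (4 KiB)
CACHE_LINE = 64    # bytes per cache line


def cache_line_delta(old_page: bytes, new_page: bytes):
    """Return (offset, data) for every 64 B line that differs between two page images."""
    if len(old_page) != PAGE_SIZE or len(new_page) != PAGE_SIZE:
        raise ValueError("both images must be exactly one page long")
    delta = []
    for offset in range(0, PAGE_SIZE, CACHE_LINE):
        new_line = new_page[offset:offset + CACHE_LINE]
        if new_line != old_page[offset:offset + CACHE_LINE]:
            delta.append((offset, new_line))
    return delta


def apply_delta(base_page: bytes, delta):
    """Rebuild the newer page image from a base image plus its cache-line delta."""
    page = bytearray(base_page)
    for offset, line in delta:
        page[offset:offset + CACHE_LINE] = line
    return bytes(page)
```

If only two bytes in different cache lines change, the delta carries 128 B of payload rather than the full 4096 B page.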
CXL-SSD
- Jung, M. “Hello Bytes, Bye Blocks: PCIe Storage Meets Compute Express Link for Memory Expansion (CXL-SSD),” 45–51, 2022. https://doi.org/10.1145/3538643.3539745.
- Kwon, M., S. Lee, and M. Jung. “Cache in Hand: Expander-Driven CXL Prefetcher for Next Generation CXL-SSD,” 24–30, 2023. https://doi.org/10.1145/3599691.3603406.
- Yang, Shao-Peng, Minjae Kim, Sanghyun Nam, Juhyung Park, Jin-yong Choi, Eyee Hyun Nam, Eunji Lee, Sungjin Lee, and Bryan S Kim. “Overcoming the Memory Wall with CXL-Enabled SSDs,” n.d.
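Using CXL's features to solve problems
DRAM energy consumption is high in data centers with disaggregated memory. Low-power modes can save energy, but because DRAM is interleaved to raise bandwidth, using them requires intrusive changes to the operating system, the memory controller, or even the DRAM itself.
This paper proposes the DRAM Translation Layer (DTL), a mechanism for flexible address mapping and data migration inside CXL-based memory devices. DTL introduces a level of indirection into the address translation from host physical address (HPA) to DRAM device physical address (DPA).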
W. Jin, J. Lee, W. Jang, S. Kim, H. Park, and J. W. Lee, “DRAM Translation Layer: Software-Transparent DRAM Power Savings for Disaggregated Memory,” presented at the Proceedings - International Symposium on Computer Architecture, 2023, pp. 217–229. doi: 10.1145/3579371.3589051.
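The indirection that DTL adds between HPA and DPA can be sketched as a remappable segment table. The Python sketch below is a toy model under stated assumptions: the segment granularity, the swap-based `migrate` policy, and the class name `DramTranslationLayer` are hypothetical and are not taken from the paper, and the actual data copy during migration is omitted.

```python
class DramTranslationLayer:
    """Toy HPA -> DPA translation with one table entry per fixed-size segment."""

    def __init__(self, num_segments: int, segment_size: int):
        self.segment_size = segment_size
        # Start with an identity mapping: host segment i -> device segment i.
        self.table = list(range(num_segments))

    def translate(self, hpa: int) -> int:
        """Translate a host physical address to a device physical address."""
        segment, offset = divmod(hpa, self.segment_size)
        return self.table[segment] * self.segment_size + offset

    def migrate(self, host_segment: int, new_device_segment: int) -> None:
        """Remap host_segment onto new_device_segment by swapping two entries.

        The host segment that previously owned new_device_segment takes over
        the vacated device segment, so the mapping stays one-to-one. Copying
        the data itself is left out of this sketch.
        """
        other = self.table.index(new_device_segment)
        self.table[host_segment], self.table[other] = (
            self.table[other],
            self.table[host_segment],
        )
```

Because the host only ever sees HPAs, remapping segments this way is invisible to software, which is what would allow a device to consolidate data and leave some DRAM regions idle.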
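Evaluation methods
- Real hardware:
  - Custom-built CXL-capable CPU and FPGA (soft IP core, CXL 3.0)
  - CPUs and FPGAs released by Intel and AMD (hard IP core, CXL 1.1)
- NUMA emulation: adequate for emulating memory-pool latency, but unable to emulate more complex software-hardware co-design
- Simulator: gem5 simulation
- Other simulation methods
  - Python scripts: described only vaguely in "Accelerating Performance of GPU-based Workloads Using CXL"
  - Ethernet + on-chip interconnect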
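For the NUMA emulation option, one common setup is to run the workload's threads on one NUMA node while binding its memory to another node that stands in for CXL memory. The Python sketch below only builds such a `numactl` command line and does not execute it; the node numbers and the helper name `numa_emulation_command` are assumptions, and which node plays the CXL role depends on the machine.

```python
import shlex


def numa_emulation_command(workload, cpu_node=0, cxl_node=1):
    """Build a numactl command that runs `workload` on cpu_node's CPUs
    while allocating all of its memory from cxl_node (the emulated CXL tier).

    The default node numbers are placeholders, not a recommendation.
    """
    return [
        "numactl",
        f"--cpunodebind={cpu_node}",
        f"--membind={cxl_node}",
        *workload,
    ]


if __name__ == "__main__":
    # "./bench" is a hypothetical benchmark binary.
    print(shlex.join(numa_emulation_command(["./bench", "--size", "1G"])))
```

The resulting list can be passed to `subprocess.run` on a Linux host where `numactl` is installed and the chosen nodes exist.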
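Current challenges
How to run experiments: no hardware is available, so how should it be simulated?
Choosing a suitable scenario:
- Type 2: what does heterogeneous computing gain from cache coherence? How can it be simulated?
- Type 3:
- CXL-based disaggregated memory pool + optimization for a specific application
- Software-hardware co-design: memory pool with a DSA (NDP), near-memory computing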