Research Directions of the Papers

Benchmarking: papers that build a real system and measure its performance

DirectCXL
Gouk D, Lee S, Kwon M, et al. Direct access, High-Performance memory disaggregation with DirectCXL[C]//2022 USENIX Annual Technical Conference (USENIX ATC 22). 2022: 287-294.
Characterizes CXL performance on a state-of-the-art experimental platform and offers some guidelines for programmers
Sun Y, Yuan Y, Yu Z, et al. Demystifying CXL Memory with Genuine CXL-Ready Systems and Devices[J]. arXiv preprint arXiv:2303.15375, 2023.
Heterogeneous computing
Cabrera, A.M., A.R. Young, and J.S. Vetter. “Design and Analysis of CXL Performance Models for Tightly-Coupled Heterogeneous Computing,” 2022. https://doi.org/10.1145/3529336.3530817.

CXL Software Stack

SK hynix HMSDK
S. Ryu et al., “System Optimization of Data Analytics Platforms using Compute Express Link (CXL) Memory,” presented at the Proceedings - 2023 IEEE International Conference on Big Data and Smart Computing, BigComp 2023, 2023, pp. 9–12. doi: 10.1109/BigComp57234.2023.00011.
Samsung SMDK
K. Kim et al., “SMT: Software-Defined Memory Tiering for Heterogeneous Computing Systems With CXL Memory Expander,” IEEE Micro, vol. 43, no. 2, pp. 20–29, 2023, doi: 10.1109/MM.2023.3240774.

Memory Pooling

DIRECTCXL
Gouk, D., S. Lee, M. Kwon, and M. Jung. “Direct Access, High-Performance Memory Disaggregation with DIRECTCXL,” 287–94, 2022. https://www.scopus.com/inward/record.uri?eid=2-s2.0-85138473074&partnerID=40&md5=4605596d72e2054c690576e6e0e93e86.
Memory pooling for cloud services
Li, H., D.S. Berger, L. Hsu, D. Ernst, P. Zardoshti, S. Novakovic, M. Shah, et al. “Pond: CXL-Based Memory Pooling Systems for Cloud Platforms,” 2:574–87, 2023. https://doi.org/10.1145/3575693.3578835.
Memory pooling with near-data processing (NDP)
Wahlgren, Jacob, Maya Gokhale, and Ivy B. Peng. “Evaluating Emerging CXL-Enabled Memory Pooling for HPC Systems.” In 2022 IEEE/ACM Workshop on Memory Centric High Performance Computing (MCHPC), 11–20, 2022. https://doi.org/10.1109/MCHPC56545.2022.00007.
Memory pooling in composable and scale-out architectures (written by an Intel author)
Sharma, D.D. “Novel Composable and Scaleout Architectures Using Compute Express Link.” IEEE Micro 43, no. 2 (2023): 9–19. https://doi.org/10.1109/MM.2023.3235972.
CXL-based dynamic capacity expansion
Ha, M., J. Ryu, J. Choi, K. Ko, S. Kim, S. Hyun, D. Moon, et al. “Dynamic Capacity Service for Improving CXL Pooled Memory Efficiency.” IEEE Micro 43, no. 2 (2023): 39–47. https://doi.org/10.1109/MM.2023.3237756.
Discusses how cloud providers should design CXL memory pools
Berger, D.S., D. Ernst, H. Li, P. Zardoshti, M. Shah, S. Rajadnya, S. Lee, et al. “Design Tradeoffs in CXL-Based Memory Pools for Public Cloud Platforms.” IEEE Micro 43, no. 2 (2023): 30–38. https://doi.org/10.1109/MM.2023.3241586.

Disaggregated Memory

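CXL is limited to rack scale. This paper proposes CXL over Ethernet, which lets a host access remote disaggregated memory attached over Ethernet using load/store memory semantics.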
Wang C, He K, Fan R, et al. CXL over Ethernet: A Novel FPGA-based Memory Disaggregation Design in Data Centers[C]//2023 IEEE 31st Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM). IEEE, 2023: 75-82.

Security

A survey of CXL security issues
Stark, S.W., A.T. Markettos, and S.W. Moore. “How Flexible Is CXL’s Memory Protection?” Queue 21, no. 3 (2023): 54–64. https://doi.org/10.1145/3606014.
Data consistency under the FAM architecture
Alwadi, M., R. Wang, D. Mohaisen, C. Hughes, S.D. Hammond, and A. Awad. “Minerva: Rethinking Secure Architectures for the Era of Fabric-Attached Memory Architectures,” 258–68, 2022. https://doi.org/10.1109/IPDPS53621.2022.00033.
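Security issues when multiple hosts share device memory
GFAM (Global Fabric-Attached Memory): security issues when multiple devices share memory
Security issues of the peer-to-peer communication provided by UIO (Unordered I/O)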

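There seems to be little work dedicated specifically to CXL security; most papers only mention it in passing.
Fabric Attached Memory (FAM) architecture: also called memory-centric architecture; it allows multiple processing elements (PEs) to connect to a shared memory pool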

Tiered Memory

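This paper proposes Transparent Page Placement (TPP), a new OS-level, application-transparent page placement mechanism. It uses lightweight mechanisms to identify hot and cold pages and place them in the appropriate memory tier, and it supports proactively demoting pages from local memory to CXL memory.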
H. A. Maruf et al., “TPP: Transparent Page Placement for CXL-Enabled Tiered-Memory,” presented at the International Conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS, 2023, pp. 742–755. doi: 10.1145/3582016.3582063.
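The hot/cold placement idea in the TPP note can be illustrated with a toy model. This is only a sketch under stated assumptions, not TPP's actual kernel mechanism: ranking pages by raw access count, the function name `place_pages`, and the `local_capacity` parameter are all hypothetical stand-ins chosen for illustration.

```python
from collections import Counter

def place_pages(access_counts, local_capacity):
    """Toy placement: keep the hottest pages in local memory, demote the rest.

    access_counts maps page id -> number of accesses (an assumed hotness
    metric; the real system identifies hot/cold pages differently).
    """
    # Sort by descending access count, breaking ties by page id.
    ranked = sorted(access_counts, key=lambda p: (-access_counts[p], p))
    local = set(ranked[:local_capacity])   # hot pages stay in local DRAM
    cxl = set(ranked[local_capacity:])     # cold pages are demoted to CXL memory
    return local, cxl

# Hypothetical access trace of page ids.
trace = [0, 1, 0, 2, 0, 1, 3, 0, 1, 4]
counts = Counter(trace)
local, cxl = place_pages(counts, local_capacity=2)
print(sorted(local), sorted(cxl))  # [0, 1] [2, 3, 4]
```

Pages 0 and 1 are accessed most often and stay local; the rest would be candidates for demotion to the CXL tier.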
Addresses the latency mismatch between memory tiers by adjusting cache allocation size
H. Lee, S. Lee, Y. Jung, and D. Kim, “T-CAT: Dynamic Cache Allocation for Tiered Memory Systems With Memory Interleaving,” IEEE Computer Architecture Letters, vol. 22, no. 2, pp. 73–76, 2023, doi: 10.1109/LCA.2023.3290197.

Applications

Hardware-software co-design

Jang, Junhyeok, Hanjin Choi, Hanyeoreum Bae, and Seungjun Lee. “CXL-ANNS: Software-Hardware Collaborative Memory Disaggregation and Computation for Billion-Scale Approximate Nearest Neighbor Search,” n.d.
Huangfu, W., K.T. Malladi, A. Chang, and Y. Xie. “BEACON: Scalable Near-Data-Processing Accelerators for Genome Analysis near Memory Pool with the CXL Support,” 2022-October:727–43, 2022. https://doi.org/10.1109/MICRO56248.2022.00057.
“Partial Failure Resilient Memory Management System for (CXL-Based) Distributed Shared Memory,” n.d.

Graph processing systems

Zhang, X., Y. Chang, T. Lu, K. Zhang, and M. Chen. “Rethinking Design Paradigm of Graph Processing System with a CXL-like Memory Semantic Fabric,” 25–35, 2023. https://doi.org/10.1109/CCGrid57682.2023.00013.

Distributed deep learning

Arif, M., K. Assogba, M.M. Rafique, and S. Vazhkudai. “Exploiting CXL-Based Memory for Distributed Deep Learning,” 2022. https://doi.org/10.1145/3545008.3545054.

FaaS

A CXL-based, object-granular memory interface for maintaining FaaS objects; the key innovation is a fault-tolerant coherence protocol
A. Patil, V. Nagarajan, N. Nikoleris, and N. Oswald, “Āpta: Fault-tolerant object-granular CXL disaggregated memory for accelerating FaaS,” presented at the Proceedings - 2023 53rd Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2023, 2023, pp. 201–215. doi: 10.1109/DSN58367.2023.00030.

Cloud computing

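VM snapshots are mainly used for data recovery, and they transfer data at 4 KiB (page-size) granularity; this paper uses CXL to capture memory modifications at 64 B (cache-line) granularity.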
D. Waddington, M. Hershcovitch, S. Sundararaman, and C. Dickey, “A case for using cache line deltas for high frequency VM snapshotting,” presented at the SoCC 2022 - Proceedings of the 13th Symposium on Cloud Computing, 2022, pp. 526–539. doi: 10.1145/3542929.3563481.
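The granularity argument can be made concrete with a little arithmetic. The sketch below is illustrative only: it finds changed 64 B lines by comparing two page images, which is an assumption for demonstration and not how the paper tracks modifications; `dirty_lines` is a hypothetical helper.

```python
PAGE_SIZE = 4096  # 4 KiB page
LINE_SIZE = 64    # 64 B cache line

def dirty_lines(old, new):
    """Return the indices of 64-byte lines that differ between two page images."""
    assert len(old) == len(new) == PAGE_SIZE
    return [
        offset // LINE_SIZE
        for offset in range(0, PAGE_SIZE, LINE_SIZE)
        if old[offset:offset + LINE_SIZE] != new[offset:offset + LINE_SIZE]
    ]

old_page = bytes(PAGE_SIZE)          # all-zero page
new_page = bytearray(old_page)
new_page[10] = 1                     # falls in line 0
new_page[200] = 2                    # falls in line 3 (200 // 64 == 3)

lines = dirty_lines(old_page, bytes(new_page))
page_bytes = PAGE_SIZE               # bytes sent by a page-granular snapshot
delta_bytes = len(lines) * LINE_SIZE # bytes sent by a line-granular delta
print(lines, page_bytes, delta_bytes)  # [0, 3] 4096 128
```

With two modified bytes in different cache lines, a page-granular snapshot moves 4096 bytes while a line-granular delta moves 128.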

CXL-SSD

Jung, M. “Hello Bytes, Bye Blocks: PCIe Storage Meets Compute Express Link for Memory Expansion (CXL-SSD),” 45–51, 2022. https://doi.org/10.1145/3538643.3539745.
Kwon, M., S. Lee, and M. Jung. “Cache in Hand: Expander-Driven CXL Prefetcher for Next Generation CXL-SSD,” 24–30, 2023. https://doi.org/10.1145/3599691.3603406.
Yang, Shao-Peng, Minjae Kim, Sanghyun Nam, Juhyung Park, Jin-yong Choi, Eyee Hyun Nam, Eunji Lee, Sungjin Lee, and Bryan S Kim. “Overcoming the Memory Wall with CXL-Enabled SSDs,” n.d.

Using CXL's Features to Solve Problems

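DRAM in disaggregated-memory data centers consumes a lot of energy. Low-power modes can save energy, but because DRAM is interleaved to increase bandwidth, exploiting them requires intrusive changes to the OS, the memory controller, or even the DRAM itself.
This paper proposes the DRAM Translation Layer (DTL), a mechanism for flexible address mapping and data migration inside a CXL-based memory device. DTL introduces a level of indirection into the translation from host physical address (HPA) to DRAM device physical address (DPA).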
W. Jin, J. Lee, W. Jang, S. Kim, H. Park, and J. W. Lee, “DRAM Translation Layer: Software-Transparent DRAM Power Savings for Disaggregated Memory,” presented at the Proceedings - International Symposium on Computer Architecture, 2023, pp. 217–229. doi: 10.1145/3579371.3589051.
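The HPA-to-DPA indirection can be sketched as a small remapping table. This is a minimal model under assumptions, not the paper's design: the 2 MiB `SEGMENT` granularity, the `TranslationLayer` class, and its `swap` method are hypothetical, and the data copy that would accompany a real migration is not modelled.

```python
SEGMENT = 2 * 1024 * 1024  # hypothetical remapping granularity (assumption)

class TranslationLayer:
    """Toy table mapping host segments (HPA) to device segments (DPA)."""

    def __init__(self, num_segments):
        # Start with an identity mapping: HPA segment i -> DPA segment i.
        self.table = list(range(num_segments))

    def translate(self, hpa):
        """Translate a host physical address to a device physical address."""
        segment, offset = divmod(hpa, SEGMENT)
        return self.table[segment] * SEGMENT + offset

    def swap(self, seg_a, seg_b):
        """Exchange the device locations of two host segments.

        Only the mapping changes here; copying the data is not modelled.
        """
        self.table[seg_a], self.table[seg_b] = self.table[seg_b], self.table[seg_a]

dtl = TranslationLayer(num_segments=4)
hpa = 3 * SEGMENT + 0x40
before = dtl.translate(hpa)   # identity mapping: DPA equals HPA
dtl.swap(3, 0)                # migrate host segment 3 to device segment 0
after = dtl.translate(hpa)    # same HPA now resolves to a different DPA
print(hex(before), hex(after))  # 0x600040 0x40
```

The host keeps using the same HPA before and after the migration, which is why such a layer can stay transparent to software.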

Evaluation Methods

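Real hardware:
1. Custom-built CXL-capable CPU and FPGA (soft IP core, CXL 3.0)
2. CPUs and FPGAs released by Intel and AMD (hard IP core, CXL 1.1)
NUMA emulation: adequate for emulating memory-pool latency, but unable to emulate more complex hardware-software co-designs
Simulator: gem5
Other simulation methods:
1. Python script: described only vaguely in “Accelerating Performance of GPU-based Workloads Using CXL”
2. Ethernet + on-chip interconnect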
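The NUMA emulation approach listed above is commonly set up by running the workload on one NUMA node's CPUs while binding its memory to another node, so that every access pays remote-node latency. The sketch below only builds such a `numactl` command line; the node numbers and the `./benchmark` program are hypothetical, and whether a remote node approximates CXL latency on a given machine is an assumption to check.

```python
import shlex

def numactl_command(app_argv, cpu_node, mem_node):
    """Build a numactl invocation that runs on cpu_node with memory from mem_node."""
    return [
        "numactl",
        f"--cpunodebind={cpu_node}",  # run threads on this node's CPUs
        f"--membind={mem_node}",      # allocate memory only from this node
        *app_argv,
    ]

# Hypothetical workload: CPUs on node 0, memory on node 1 as the emulated CXL tier.
cmd = numactl_command(["./benchmark", "--size", "1G"], cpu_node=0, mem_node=1)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run` on a machine with at least two NUMA nodes.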