In this talk, I will introduce the principles and technologies behind analog in-memory computing (AIMC), highlighting memory options ranging from volatile to non-volatile devices. I will present recent integrated test chips developed by my group, which leverage high-density phase-change memory (PCM) and employ careful design–technology co-optimization (DTCO) to achieve both energy efficiency and the inference accuracy required by convolutional neural networks (CNNs). Finally, I will extend the discussion to transformer-based large language models (LLMs), which rely on matrix–matrix multiplications with rapidly changing weights and activations. I will outline architectural opportunities for accelerating LLMs through the use of dynamic random-access memory (DRAM) and advanced 3D transistor arrays.
Prof. Daniele Ielmini (Politecnico di Milano, Italy)