A neural processing unit (NPU), also called an "intelligent processing unit" (IPU), is a set of circuits or a separate chip that accelerates the execution of AI applications, a process known as "inference." See
AI training vs. inference.
The NPU accelerates matrix multiplication, the core arithmetic of neural networks, especially in client devices such as phones and PCs, which have limited computing power compared to datacenter servers. See
AI PC.
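To illustrate the matrix multiplication an NPU accelerates, here is a minimal sketch in plain Python. It is not how any particular NPU is programmed; the function names matmul and mac_count and the sample values are hypothetical and chosen only for illustration. Each pass through the inner loop is one multiply-accumulate (MAC) step, which NPU hardware typically carries out many at a time in parallel.

```python
def matmul(a, b):
    """Multiply an m x k matrix by a k x n matrix (both lists of rows)."""
    m, k, n = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions do not match")
    out = [[0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0
            for p in range(k):
                acc += a[i][p] * b[p][j]  # one multiply-accumulate (MAC)
            out[i][j] = acc
    return out


def mac_count(m, k, n):
    """Number of MAC steps needed for an (m x k) by (k x n) product."""
    return m * k * n


# Hypothetical example: one input vector with 3 features passed through
# a layer with 2 outputs (weights stored as a 3 x 2 matrix).
inputs = [[1, 2, 3]]
weights = [[1, 0],
           [0, 1],
           [2, 2]]
outputs = matmul(inputs, weights)
print(outputs, mac_count(1, 3, 2))
```

The MAC count grows as the product of the three dimensions, which is why even a modest neural network layer can require millions of these steps per inference and why dedicated hardware helps on devices with limited computing power.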
In 2016, Qualcomm introduced its Zeroth Machine Intelligence Platform and Snapdragon Neural Processing Engine, which enabled mobile devices to execute AI applications locally rather than sending the work to a datacenter in the cloud. See
neural network and
neuromorphic computing.