The difference between predicted and expected values when training a machine learning model. This difference is measured by a "cost function" (also called a "loss function"), and the "gradient descent" optimization algorithm is used to minimize it by repeatedly adjusting the model's weights and biases. The gradients that guide each adjustment are computed by "backpropagation," which works backward from the output layer toward the input layer, determining how much each weight and bias contributed to the error. See AI weights and biases and large language model.
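
The following is a minimal illustrative sketch (not from the source) of gradient descent on a one-weight model, where the prediction is w * x and the loss is the squared difference between the predicted and expected values. The data point, initial weight, and learning rate below are arbitrary assumptions chosen so the example converges.

    # Hypothetical example: minimize the squared-error loss for a
    # single-weight model, prediction = w * x.

    def loss(w, x, y):
        """Squared difference between predicted and expected values."""
        return (w * x - y) ** 2

    def gradient(w, x, y):
        """Derivative of the loss with respect to the weight w."""
        return 2 * (w * x - y) * x

    x, y = 2.0, 10.0     # one training example; the ideal weight is 5
    w = 0.0              # initial weight (assumed starting point)
    learning_rate = 0.05

    for step in range(50):
        # Step against the gradient to reduce the loss.
        w -= learning_rate * gradient(w, x, y)

    print(f"w = {w:.4f}, loss = {loss(w, x, y):.6f}")  # w approaches 5

In a multi-layer network, the same update rule applies to every weight and bias; backpropagation supplies the per-parameter gradients by applying the chain rule backward through the layers.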