Instead of scanning over all of the training data to compute the average gradient at each step, mini-batch gradient descent computes the gradient on a small random subset of examples (for example, 1,000 examples).
Each step is therefore much less computationally expensive, but because the gradient is estimated from only a subset of the data it is noisier, so progress toward a local minimum is less steady than with the original (full-batch) gradient descent.
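To make the idea concrete, here is a minimal sketch of mini-batch gradient descent for a linear-regression model with mean-squared-error loss. The function name, batch size, learning rate, and synthetic data are all illustrative assumptions, not part of the original notes; the key point is that each update averages the gradient over one mini-batch rather than the full dataset.

```python
import numpy as np

def minibatch_gradient_descent(X, y, batch_size=32, lr=0.01, epochs=100):
    """Fit y ~ X @ w + b with mini-batch gradient descent (MSE loss)."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(epochs):
        # Shuffle once per epoch so each mini-batch is a random subset.
        idx = np.random.permutation(n_samples)
        for start in range(0, n_samples, batch_size):
            batch = idx[start:start + batch_size]
            Xb, yb = X[batch], y[batch]
            # Gradient of the MSE loss averaged over this mini-batch only,
            # not over the full training set: cheaper but noisier.
            error = Xb @ w + b - yb
            grad_w = Xb.T @ error / len(batch)
            grad_b = error.mean()
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b

# Illustrative usage on synthetic data (hypothetical example).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 0.3
w, b = minibatch_gradient_descent(X, y)
```

Full-batch gradient descent corresponds to setting `batch_size = n_samples`; shrinking the batch trades gradient accuracy per step for more, cheaper steps.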