Why isn't the correlation divided by the number of samples?
It depends on how you define the correlation equation. Strictly speaking, you can normalize by the number of samples (i.e. divide by N) to align with the equations. However, in most practical applications you are looking for the peak, or more precisely the index where the peak occurs. The actual magnitude of the peak is usually (but not always) immaterial. You'll find a similar thing with the FFT when you get to it: some people don't divide by anything, some divide by N, some by sqrt(N). You can make arguments to support all of these, but at the end of the day it is just a constant positive scaling applied to all values, so it does not change where the peak is.
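To make that concrete, here is a minimal sketch in Python with NumPy. The test signal is my own illustration, not something from the question: a 13-chip Barker code placed at a hypothetical offset of 20 in an otherwise silent buffer. It correlates the buffer against the code three ways (no scaling, divided by N, divided by sqrt(N)) and checks where each peak lands.

```python
import numpy as np

# Illustrative template: the Barker-13 code (autocorrelation peak of 13, small sidelobes).
template = np.array([1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1], dtype=float)
n = template.size

# Hypothetical received buffer: silence with the template embedded at a known offset.
true_offset = 20
signal = np.zeros(64)
signal[true_offset:true_offset + n] = template

# Unnormalized sliding correlation; output index k corresponds to a lag of k samples.
raw = np.correlate(signal, template, mode="valid")

# The same result under two common scaling conventions.
by_n = raw / n
by_sqrt_n = raw / np.sqrt(n)

# The peak index is the same for every convention; only the peak height differs.
peak_raw = int(np.argmax(raw))
peak_by_n = int(np.argmax(by_n))
peak_by_sqrt_n = int(np.argmax(by_sqrt_n))

print(peak_raw, peak_by_n, peak_by_sqrt_n)
print(raw[peak_raw], by_n[peak_by_n], by_sqrt_n[peak_by_sqrt_n])
```

All three versions report the same peak index, which is the part most applications care about. The scaling only starts to matter when you compare the peak height against a fixed threshold, or against correlations computed with a different N.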