About contraction. There are indeed 8 summands in the full expression \( \delta_{rm} \frac {\partial x^m} {\partial y^r} \frac {\partial x^n} {\partial y^s}\, dy^r dy^s \), but the Kronecker delta kills the cross terms, so for each pair \( (r, s) \) the sum over the repeated indices collapses to two summands, schematically \( \partial x^1 \partial x^1 + \partial x^2 \partial x^2 \). Thus, contracting over every index of a tensor leaves a scalar and reduces the number of terms in the implied summation. These are just the rules of Einstein summation: repeated indices are contracted (summed over) and disappear from the expression, while free indices survive. Contraction requires that of each repeated pair, one index is up (on the \( dy \) terms) and the other is down (in the denominator of the gradient terms). This bookkeeping is what enforces covariance; the principle of relativity means the laws of physics take the same form in every coordinate system. Put differently, gradient operators like \( \frac {\partial} {\partial x^n} \) transform covariantly, while displacements, like ordinary vectors, transform contravariantly.
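A minimal numeric sketch of this bookkeeping (my own illustration, not from the original; the Jacobian entries are made-up numbers) using NumPy's einsum, where repeated index letters are summed over and dropped from the output, exactly as in the summation convention:

```python
import numpy as np

# delta_{mn}, the Kronecker delta in a 2-D space.
delta = np.eye(2)

# Hypothetical Jacobian entries J[m, r] = dx^m/dy^r for some change of coordinates.
J = np.array([[1.0, 2.0],
              [3.0, 4.0]])

# T_{rs} = delta_{mn} (dx^m/dy^r)(dx^n/dy^s): the repeated indices m, n
# are summed over and disappear; r, s remain free, so T has rank 2.
T = np.einsum('mn,mr,ns->rs', delta, J, J)

# Contracting the remaining indices against dy^r dy^s leaves no free
# index at all: a fully contracted expression is a scalar (rank 0).
dy = np.array([0.1, 0.2])
ds2 = np.einsum('rs,r,s->', T, dy, dy)
print(T.shape)   # (2, 2)
print(ds2)       # a plain number, no indices left
```

Note how the einsum subscript string mirrors the index rules: letters appearing twice on the left of `->` are contracted, and only the letters after `->` survive as free indices.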
Sorry, typo: \( \delta_{rm} \frac {\partial x^m} {\partial y^r} \frac {\partial x^n} {\partial y^s} \) should be \( \delta_{mn} \frac {\partial x^m} {\partial y^r} \frac {\partial x^n} {\partial y^s} \). I notice Susskind starts with \( g_{mn}\, dx^m dx^n \), which in flat Cartesian coordinates means \( g_{mn} = \delta_{mn} \). When the indices on the metric differ from those on whatever it is 'multiplying', that implies a change-of-coordinates expression like the one I just corrected. Since the metric is covariant, does that mean its indices label the \( \partial y \) components? Susskind does say several times that lower (covariant) indices sit in the denominator, so I'm assuming that's how to interpret it. In two dimensions \( g_{mn} \) can be written as a 2x2 matrix (one row and one column per index).
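To see the corrected formula produce a familiar answer, here is a symbolic sketch (my own worked example, not from Susskind) using SymPy and the standard Cartesian-to-polar change of coordinates \( x^1 = r\cos\theta,\ x^2 = r\sin\theta \), with \( y^1 = r,\ y^2 = \theta \):

```python
import sympy as sp

# Cartesian x^m written as functions of polar coordinates y^r = (r, theta).
r, theta = sp.symbols('r theta', positive=True)
x = [r * sp.cos(theta), r * sp.sin(theta)]   # x^1, x^2
y = [r, theta]

# g'_{rs} = delta_{mn} (dx^m/dy^r)(dx^n/dy^s): the covariant indices r, s
# sit "in the denominator" of the partial derivatives, as Susskind says.
g = sp.zeros(2, 2)
for R in range(2):
    for S in range(2):
        g[R, S] = sp.simplify(sum(sp.diff(x[m], y[R]) * sp.diff(x[m], y[S])
                                  for m in range(2)))
print(g)   # expect the diagonal polar metric: diag(1, r**2)
```

Starting from \( g_{mn} = \delta_{mn} \) in Cartesian coordinates, the formula hands back the familiar polar-coordinates line element \( ds^2 = dr^2 + r^2\, d\theta^2 \).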
Well, actually for each pair \( (r, s) \) what you have is \( [g_{11}\frac {\partial x^1} {\partial y^r} \frac {\partial x^1}{\partial y^s} + g_{22}\frac {\partial x^2} {\partial y^r} \frac {\partial x^2}{\partial y^s}]\). This just multiplies the non-zero components of \( g_{mn} \) (after accounting for the \( dx^i \)) by a set of partial-derivative factors which can be collected into a 2x2 matrix. So schematically you have \( \delta_{mn}\,\partial {x^m}\,\partial {x^n} \otimes \frac {1} {\partial y^s\, \partial y^r} \), something like a Kronecker matrix product. Although you don't really gain much by using matrices, it seems, beyond the fact that \( \delta_{ab} = (I)_{ab} \), the identity matrix.
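One place matrices do pay off (my own aside, with illustrative numbers): collecting the factors \( \frac{\partial x^m}{\partial y^r} \) into a Jacobian matrix \( J \), the component formula \( g'_{rs} = g_{mn} \frac{\partial x^m}{\partial y^r} \frac{\partial x^n}{\partial y^s} \) is exactly the matrix congruence \( g' = J^{\mathsf T} g J \):

```python
import numpy as np

# Jacobian J[m, r] = dx^m/dy^r at some point (the numbers are illustrative).
J = np.array([[0.5, -1.0],
              [1.0,  0.5]])
g = np.eye(2)   # g_{mn} = delta_{mn} = (I)_{mn}, the identity matrix

# Component (index) form: g'_{rs} = g_{mn} J^m_r J^n_s ...
g_prime_einsum = np.einsum('mn,mr,ns->rs', g, J, J)

# ... agrees with the matrix form J^T g J.
g_prime_matrix = J.T @ g @ J

print(np.allclose(g_prime_einsum, g_prime_matrix))   # True
```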
I like this page. The author indicates we can talk about \( s \), the invariant interval, as a smooth function: \( ds = \frac {\partial s} {\partial x^1} dx^1 + \frac {\partial s} {\partial x^2} dx^2 + \dots + \frac {\partial s} {\partial x^n} dx^n \), where \( s \) is a function of the \( x^i \). Moreover we can easily see that \( ds \) can be defined as the inner product of two vectors: one whose components are gradients, \( \mathbf {g} = \Bigl ( \frac {\partial s} {\partial x^1}, \frac {\partial s} {\partial x^2}, \dots, \frac {\partial s} {\partial x^n}\Bigr ) \), and one whose components are displacements, \( \mathbf {d} = \bigl ( dx^1, dx^2, \dots, dx^n \bigr ) \). So \( ds = \mathbf g \cdot \mathbf d \), and \( s \) (or \( ds \)) is a scalar field on the manifold (a field which depends only on position). The components of \( \mathbf g \) are gradients, such that \( \mathbf g = \nabla s \); the components of \( \mathbf d \) are displacements from a nominal point at \( (x^1, x^2, \dots, x^n) \). Hence \( \mathbf g \) is a gradient vector (a rank-1 tensor) which transforms covariantly, while \( \mathbf d \) is a displacement vector which transforms contravariantly. I think of contravariant vectors as "ordinary" vectors: when you change to \( Y \) from \( X \), the components of such a vector change against the coordinates (they contra-vary as the coordinates vary, although this is not at all rigorous). Another view: if the coordinate system is scaled, the components of an ordinary vector will shrink if the coordinates scale up, and grow if they scale down. Gradients (the gradient of the interval \( s \) or \( ds \), for example) don't do this; instead they vary "as" the coordinates. Draw a diagram or two and see for yourself.
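Besides drawing a diagram, you can check this numerically. Here is a small sketch (my own, with a made-up scalar field) of a pure rescaling \( y^i = c\, x^i \): displacement components pick up a factor of \( c \) (contravariant), gradient components pick up \( 1/c \) (covariant), and their inner product \( ds = \mathbf g \cdot \mathbf d \) comes out the same in both coordinate systems, as a scalar must:

```python
import numpy as np

# An illustrative scalar field s(x) = (x^1)^2 + 3 x^2 and a point on the manifold.
point = np.array([1.0, 2.0])
grad = np.array([2.0 * point[0], 3.0])   # g = grad s = (ds/dx^1, ds/dx^2)
dx = np.array([0.01, -0.02])             # d = (dx^1, dx^2), a small displacement

ds = grad @ dx                           # ds = g . d in the x coordinates

# Rescale the coordinates: y^i = c * x^i.
c = 2.0
dy = c * dx        # contravariant: dy^i = (dy^i/dx^j) dx^j picks up a factor c
grad_y = grad / c  # covariant: ds/dy^i = (dx^j/dy^i) ds/dx^j picks up 1/c

print(np.isclose(grad_y @ dy, ds))   # True: ds is invariant under the change
```

The factors of \( c \) and \( 1/c \) cancel in the inner product, which is the whole point of pairing one up index with one down index.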