
First, note that the smallest L2-norm vector that fits the training data for the core model is \([2,0,0]\). On the other hand, in the presence of the spurious feature, the full model can fit the training data perfectly with a smaller norm by assigning weight \(1\) to the feature \(s\) (the core model's squared norm is \(\|[2,0,0]\|_2^2 = 4\) while…
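The norm comparison above can be checked on a small toy case. The sketch below is illustrative only: the single training example (`z`, `s`, `y`) and the helper names are assumptions chosen so that the core model's minimum-norm fit is \([2,0,0]\); they are not taken from the source. It relies on the fact that, for a single example \(x\) with label \(y\), the minimum L2-norm interpolant is \(x \, y / \|x\|_2^2\).

```python
# Hypothetical toy data (an assumption, not the source's dataset):
# one training example with three core features and one spurious feature.
z = [1.0, 0.0, 0.0]   # core features
s = 2.0               # spurious feature; equals the label on this example
y = 2.0               # label


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(ai * bi for ai, bi in zip(a, b))


def sq_norm(w):
    """Squared L2 norm of a weight vector."""
    return dot(w, w)


def min_norm_fit(x, target):
    """Minimum L2-norm w with dot(w, x) == target, for a single example."""
    scale = target / dot(x, x)
    return [scale * xi for xi in x]


# Core model: only the core features are available.
core_w = min_norm_fit(z, y)

# Full model: core features plus the spurious feature s.
full_x = z + [s]
candidate_w = [0.0, 0.0, 0.0, 1.0]   # weight 1 on s, as described in the text
full_w = min_norm_fit(full_x, y)     # exact minimum-norm interpolant in this toy

print("core model weights:", core_w, "squared norm:", sq_norm(core_w))
print("weight-1-on-s prediction:", dot(candidate_w, full_x),
      "squared norm:", sq_norm(candidate_w))
print("full model min-norm weights:", full_w, "squared norm:", sq_norm(full_w))
```

In this toy setting, putting weight \(1\) on \(s\) fits the label with squared norm \(1\), below the core model's \(4\). The exact minimum-norm interpolant of the full model spreads weight across the first core feature and \(s\) and is smaller still (roughly \(0.8\)). The source's own training set may yield different full-model weights.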
