Big Difference In Accuracy After Training The Model And After Loading That Model
Solution 1:
It is difficult to answer that without your actual data. But there is a smoking gun, raising suspicions that your validation data might be (very) different from your training & test ones; and it comes from your previous question on this:
"If i use fit_transform on my [validation set] features, I do not get an error, but I get accuracy of 52%, and that's terrible (because I had 89.1 %)."
Although using fit_transform on the validation data is indeed wrong methodology (the correct one being what you do here), in practice it should not lead to such a large discrepancy in accuracy.
In other words, I have seen many cases where people erroneously apply such fit_transform approaches to their validation/deployment data without ever realizing the mistake, simply because they don't get any performance discrepancy, so nothing alerts them. And that outcome is expected if all these data are indeed qualitatively similar.
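To make this concrete, here is a minimal sketch on synthetic data (the arrays and the helper name scaling_gap are assumptions made up for illustration, not your actual data or code). It measures how far the erroneous fit_transform output drifts from the correct transform output, once for validation data drawn from the same distribution as the training data and once for validation data with a different mean and spread.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler


def scaling_gap(X_train, X_val):
    """Largest absolute difference between correct and erroneous scaling of X_val."""
    # Correct: statistics are learned on the training data and reused.
    correct = StandardScaler().fit(X_train).transform(X_val)
    # Erroneous: statistics are re-estimated on the validation data itself.
    wrong = StandardScaler().fit_transform(X_val)
    return float(np.abs(correct - wrong).max())


rng = np.random.default_rng(0)
# Synthetic stand-ins for real features (assumption).
X_train = rng.normal(0.0, 1.0, size=(2000, 3))
X_val_similar = rng.normal(0.0, 1.0, size=(2000, 3))  # same distribution as training
X_val_shifted = rng.normal(5.0, 3.0, size=(2000, 3))  # different mean and spread

gap_similar = scaling_gap(X_train, X_val_similar)
gap_shifted = scaling_gap(X_train, X_val_shifted)
print(f"similar validation data: max gap = {gap_similar:.3f}")
print(f"shifted validation data: max gap = {gap_shifted:.3f}")
```

With similar data the two approaches produce nearly the same features, so the mistake goes unnoticed; with shifted data the model receives inputs very different from what it was trained on.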
But discrepancies such as yours here lead to strong suspicions that your validation data are actually (very) different from your training & test ones. If this is the case, such performance discrepancies are to be expected: the whole ML practice is founded upon the (often implicit) assumption that our data (training, validation, test, real-world deployment ones etc) do not change qualitatively, and they all come from the same statistical distribution.
So, the next step here is to perform an exploratory analysis on both your training & validation data to investigate this (actually, this is always assumed to be step #0 in any predictive task). I guess that even elementary measures (mean, max/min values etc.) will show whether there are strong differences between them, as I suspect.
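One possible way to run such an elementary check is sketched below with pandas. The DataFrames, the column names f0 and f1, and the helper compare_summaries are all hypothetical placeholders; you would substitute your own training and validation features.

```python
import numpy as np
import pandas as pd


def compare_summaries(train_df, val_df):
    """Per-feature mean, std, min and max for both sets, side by side."""
    stats = ["mean", "std", "min", "max"]
    return pd.concat(
        {"train": train_df.agg(stats), "validation": val_df.agg(stats)},
        axis=1,
    )


rng = np.random.default_rng(0)
# Synthetic placeholder data (assumption): f1 is shifted in the validation set.
train_df = pd.DataFrame({
    "f0": rng.normal(0.0, 1.0, 1000),
    "f1": rng.normal(10.0, 2.0, 1000),
})
val_df = pd.DataFrame({
    "f0": rng.normal(0.0, 1.0, 300),
    "f1": rng.normal(25.0, 2.0, 300),
})

summary = compare_summaries(train_df, val_df)
print(summary.round(2))
```

Features whose mean or range differs markedly between the two column groups (f1 in this made-up example) are the first candidates to investigate.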
In particular, scikit-learn's StandardScaler uses z = (x - u) / s for the transformation, where u is the mean and s the standard deviation of the data the scaler was fitted on. If these values differ significantly between your training and validation sets, the performance discrepancy is to be expected.
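As a small worked example of the formula, the sketch below applies z = (x - u) / s by hand and checks it against StandardScaler. The toy arrays are invented for illustration only.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy data (assumption): the validation values lie far from the training values.
X_train = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
X_val = np.array([[11.0], [12.0], [13.0]])

scaler = StandardScaler().fit(X_train)

u = X_train.mean(axis=0)  # training mean
s = X_train.std(axis=0)   # population standard deviation (ddof=0), as StandardScaler uses

z_manual = (X_val - u) / s            # formula applied by hand
z_sklearn = scaler.transform(X_val)   # same computation done by scikit-learn

print("u =", scaler.mean_, "s =", scaler.scale_)
print(z_sklearn.ravel())
```

Here the scaled validation values end up several standard deviations away from zero, that is, well outside the range the model saw during training.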