Big Difference In Accuracy After Training The Model And After Loading That Model

December 18, 2023

I made a Keras NN model for fake news detection. My features are avg length of the words, avg length of the sentence, number of punctuation signs, number of capital words, number of

Solution 1:

It is difficult to answer this without your actual data. But there is a smoking gun that raises the suspicion that your validation data might be (very) different from your training & test data, and it comes from your previous question on this:

If i use fit_transform on my [validation set] features, I do not get an error, but I get accuracy of 52%, and that's terrible (because I had 89.1 %).

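Although using fit_transform on the validation data is indeed the wrong methodology (the correct one being what you do here, i.e. fitting the scaler on the training data only and then applying transform to the validation data), in practice it should not lead to such a large discrepancy in accuracy.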
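To make the distinction concrete, here is a minimal sketch using scikit-learn's StandardScaler. The arrays X_train and X_val are synthetic stand-ins drawn from the same distribution (an assumption for illustration, not the asker's real features), and max_gap is just a hypothetical name for the largest difference between the two scaled versions.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-ins for the real feature matrices (assumption):
# both sets are drawn from the SAME distribution.
X_train = rng.normal(loc=5.0, scale=2.0, size=(1000, 3))
X_val = rng.normal(loc=5.0, scale=2.0, size=(200, 3))

# Correct: learn the mean and standard deviation from the training data only...
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
# ...and reuse those training statistics on the validation data.
X_val_scaled = scaler.transform(X_val)

# Incorrect: refit a new scaler on the validation data itself.
X_val_refit = StandardScaler().fit_transform(X_val)

# Largest element-wise difference between the two scaled versions.
max_gap = np.abs(X_val_scaled - X_val_refit).max()
print("largest difference between correct and refit scaling:", max_gap)
```

When the two sets really are similar, the two scaled versions differ only slightly, which is why the mistake can go unnoticed for so long.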

In other words, I have actually seen many cases where people erroneously apply such fit_transform approaches to their validation/deployment data without ever realizing the mistake, simply because they don't get any performance discrepancy, hence they are not alerted. And such a situation is expected, if indeed all these data are qualitatively similar.

But discrepancies such as yours here lead to strong suspicions that your validation data are actually (very) different from your training & test ones. If this is the case, such performance discrepancies are to be expected: the whole ML practice is founded upon the (often implicit) assumption that our data (training, validation, test, real-world deployment ones etc) do not change qualitatively, and they all come from the same statistical distribution.

So, the next step here is to perform an exploratory analysis of both your training & validation data to investigate this (actually, this is always assumed to be step #0 in any predictive task). I guess that even elementary measures (mean & max/min values etc) will show if there are strong differences between them, as I suspect.
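Such an elementary comparison could look like the sketch below. The helper names summarize and mean_shift_in_train_stds are hypothetical, and the data are synthetic, with the second validation feature deliberately shifted to show what a mismatch looks like.

```python
import numpy as np

def summarize(X):
    """Per-feature mean, std, min and max of a 2-D feature matrix."""
    X = np.asarray(X, dtype=float)
    return {
        "mean": X.mean(axis=0),
        "std": X.std(axis=0),
        "min": X.min(axis=0),
        "max": X.max(axis=0),
    }

def mean_shift_in_train_stds(X_train, X_val):
    """Distance of each validation mean from the training mean, in training std units."""
    train, val = summarize(X_train), summarize(X_val)
    return (val["mean"] - train["mean"]) / train["std"]

rng = np.random.default_rng(0)
# Synthetic data (assumption): feature 0 matches, feature 1 is shifted in validation.
X_train = rng.normal(loc=[5.0, 20.0], scale=[1.0, 4.0], size=(1000, 2))
X_val = rng.normal(loc=[5.0, 32.0], scale=[1.0, 4.0], size=(200, 2))

for name, X in (("train", X_train), ("validation", X_val)):
    print(name, summarize(X))

shift = mean_shift_in_train_stds(X_train, X_val)
print("mean shift per feature (in training stds):", shift)
```

A shift near zero suggests the feature looks alike in both sets. A shift of several training standard deviations, as for the second feature here, is the kind of strong difference worth investigating.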

In particular, scikit-learn's StandardScaler uses

z = (x - u) / s

for the transformation, where u is the mean value and s the standard deviation of the data the scaler was fitted on. If these values are significantly different between your training and validation sets, the performance discrepancy is to be expected.
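A small worked example of the formula, with toy numbers chosen purely for illustration: the validation values have the same spread as the training values, but their mean is shifted by 10. The fitted u and s are read from the scaler's mean_ and scale_ attributes.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy single-feature data (assumption, for illustration only).
X_train = np.array([[1.0], [3.0], [5.0], [7.0]])      # u = 4, s = sqrt(5)
X_val = np.array([[11.0], [13.0], [15.0], [17.0]])    # same spread, mean shifted by 10

scaler = StandardScaler().fit(X_train)
u, s = scaler.mean_, scaler.scale_   # statistics learned from the training data

# z = (x - u) / s computed by hand matches what transform() returns.
z_manual = (X_val - u) / s
z_sklearn = scaler.transform(X_val)

# Refitting on the validation data uses the validation u and s instead,
# so the same raw values receive completely different z-scores.
z_refit = StandardScaler().fit_transform(X_val)

print("scaled with training statistics:  ", z_manual.ravel())
print("scaled with validation statistics:", z_refit.ravel())
```

With the training statistics, every validation point lands more than three standard deviations above zero, a region the model barely saw during training. With refitting, the values are centered on zero again and the shift is hidden from the scaled data, though not from the model, which was trained on a different mapping from raw values to z-scores.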

