K-fold cross validation increases the certainty and trust in the model's accuracy score.

K-fold cross validation requires retraining the model k times, so it comes with a significant computational cost.

When to use

Use it when doubts appear, for example if reruns of the model sometimes yield bad accuracy scores without an apparent reason.

How it works

The dataset is split into k equally sized folds. The model is then trained k times: each run holds out a different fold as the test set and trains on the remaining k - 1 folds. The k resulting accuracy scores are averaged; their spread shows how stable the score is.

Pasted image 20250319133042.png
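
A minimal sketch of the splitting logic using sklearn's KFold, on a toy array chosen just for illustration:

from sklearn.model_selection import KFold
import numpy as np

X = np.arange(10)  # toy dataset with 10 datapoints

kf = KFold(n_splits=5, shuffle=True)
for train_idx, test_idx in kf.split(X):
    # every datapoint lands in the test set exactly once
    print("train:", train_idx, "test:", test_idx)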

How to choose K:

Just make sure that the test dataset is still large enough. Don't overthink it.

Example: 100 datapoints and a test dataset of about 20% => k = 5 would be appropriate (each test fold then contains 100 / 5 = 20 datapoints).
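
A quick sanity check of the fold size, using the numbers from the example above:

n = 100        # total datapoints
k = 5          # number of folds
print(n // k)  # 20 -> each test fold holds about 20% of the data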

Implementation

Let's say the data is in arrays or a dataframe:

Notice that we actually repeat the k-fold cross validation 10 times to get an even more reliable score. This is obviously only possible if the training cost is manageable.

from sklearn.model_selection import cross_val_score, RepeatedKFold
from sklearn.linear_model import LogisticRegression

X, y = get_dataset(...)

estimator = LogisticRegression()

# 5 folds, repeated 10 times with fresh random splits => 50 trainings in total
rkf = RepeatedKFold(n_splits=5, n_repeats=10)
scores = cross_val_score(estimator, X, y, cv=rkf, n_jobs=-1)

print("Average Score:", scores.mean())
print("Standard Deviation:", scores.std())

cross_val_score

Input:
- estimator: an untrained model instance; it is cloned, trained and evaluated for every fold
- X, y: The full dataset
- cv=rkf: The cross validation strategy. Here it means we split into 5 folds and repeat the process 10 times, with new random splits each time. Therefore we train and evaluate 50 times in total.

Output: An array with the accuracy score of each individual trained model (here: 50 values).
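
By default, cross_val_score uses the estimator's default scorer, which is mean accuracy for classifiers. A different metric can be requested via the scoring parameter; a small sketch, assuming a binary classification target:

# same setup as above, but scored with F1 instead of accuracy
scores_f1 = cross_val_score(estimator, X, y, cv=rkf, scoring="f1", n_jobs=-1)
print("Average F1:", scores_f1.mean())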

Wait, this is way too high-level, it doesn't work for my use-case:

I agree, that's not really coding anymore, it's way too simple. You can simply write your own loop(s):

import numpy as np

# split on the patient level so the same patient never ends up in both sets
indices = np.arange(len(patients))

k = 5  # number of folds
fold_size = len(patients) // k
fold_accuracy_scores = []

repeats = 10
for repeat in range(repeats):
    print(f"Repeat {repeat + 1}/{repeats}")
    # reshuffle every repeat so each repeat uses different random folds
    np.random.shuffle(indices)
    for fold in range(k):
        start = fold * fold_size
        # the last fold also takes the leftover datapoints
        end = start + fold_size if fold != k - 1 else len(patients)

        test_indices = indices[start:end]
        train_indices = np.concatenate([indices[:start], indices[end:]])

        train_patients = patients[train_indices]
        test_patients = patients[test_indices]

        X_train, y_train = get_dataset(train_patients)
        X_test, y_test = get_dataset(test_patients)

        fold_accuracy_scores.append(train_log_regression(X_train, y_train, X_test, y_test))

avg_accuracy = np.mean(fold_accuracy_scores)
std_accuracy = np.std(fold_accuracy_scores)

print("\nAverage accuracy:", avg_accuracy)
print("Standard Deviation:", std_accuracy)

Consequences for the final model

K-fold cross validation is a tool to get a reliable accuracy score during development.
The architecture, hyperparameters etc. found by investigating and improving that score are then used in a final training on the entire dataset, without held-out test data.
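
A minimal sketch of that final step, reusing the names from the implementation above:

# final training on the entire dataset, no held-out test data
final_model = LogisticRegression()
final_model.fit(X, y)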

After deployment, the accuracy in practice should be monitored. If it deviates too much from the score found via cross validation, that deviation should be investigated.

Related subjects

Nested cross validation