Some notions (like kernel, or this one) seem to reappear everywhere and mean nothing or everything. Inference, however, has a clearly defined meaning.

Inference is the name given to the process of using a trained model to make predictions: the application of the model once training has finished.

Once a model has been trained, it enters the inference phase. The model does not update its weights or learn during this phase.

Inference efficiency is a separate concern from training efficiency: different methods can be applied at inference time, each with a different impact on the model (see the AMLS course; a quantization sketch follows the example below).

import torch
from my_model import TrainedModel

# Load trained model
model = TrainedModel()
model.load_state_dict(torch.load('model_weights.pth'))

# Set the model to evaluation mode (disables dropout, makes batch norm use running statistics)
model.eval()

# Sample input data: a batch with one sample of 3 features
input_data = torch.tensor([[0.5, 0.3, 0.2]])

with torch.no_grad():  # Disable gradient tracking; no backpropagation is needed once training is done
    # --- Inference Phase ---
    output = model(input_data)
    predicted_class = torch.argmax(output, dim=1)  # Index of the highest-scoring class

print(f"Predicted Class: {predicted_class}")

Similar notions

Batch inference: predictions made on a large dataset (batch) at once (see the sketch below).
Real-time inference: predictions made immediately as each new sample arrives.
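
A minimal sketch of batch inference, reusing the hypothetical TrainedModel from above and random placeholder data with the same 3 features:

import torch
from torch.utils.data import DataLoader, TensorDataset
from my_model import TrainedModel  # hypothetical module, as in the example above

model = TrainedModel()
model.load_state_dict(torch.load('model_weights.pth'))
model.eval()

# 1000 placeholder samples with 3 features each, served in chunks of 64
dataset = TensorDataset(torch.rand(1000, 3))
loader = DataLoader(dataset, batch_size=64)

predictions = []
with torch.no_grad():
    for (batch,) in loader:
        output = model(batch)
        predictions.append(torch.argmax(output, dim=1))
predictions = torch.cat(predictions)  # one predicted class per sample

Real-time inference, by contrast, would run the model on a single sample as soon as it arrives, exactly as in the first example above.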