The report below shows each intent’s performance using several measures. Each measure gives insight into why an intent performs the way it does.

Types of performance measurements

1. Precision

Precision shows how much an intent is confused with other intents. Of all the samples the model assigns to an intent, it is the percentage that actually belong to that intent. For example, if precision is 100%, there is no confusion between the intent’s samples and other intents. This measure gives you insight into confusion between two or more intents.
Assume you have two intents, and one of them contains samples that should be in the other. The model is now confused and cannot tell which intent those samples belong to. For example, if one intent has 20 samples that are all accurate, and another intent has 10 samples, 5 of which belong to the first intent, then the precision of the first intent drops.
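
The two-intent example above can be sketched in a few lines of Python. This is only an illustration using the standard definition of precision (correct predictions for an intent divided by all predictions for that intent); the intent names, the sample data, and the `precision` helper are hypothetical, and it assumes the model assigns the 5 misplaced samples to the first intent. The report itself may calculate the value differently.

```python
def precision(true_labels, predicted_labels, intent):
    """Share of samples predicted as `intent` that really belong to it."""
    # True labels of every sample the model assigned to this intent.
    assigned = [t for t, p in zip(true_labels, predicted_labels) if p == intent]
    if not assigned:
        return 0.0  # nothing was predicted as this intent
    correct = sum(1 for t in assigned if t == intent)
    return correct / len(assigned)


# Hypothetical data: 20 samples stored under intent_a, 10 under intent_b.
true_labels = ["intent_a"] * 20 + ["intent_b"] * 10
# Assumption: the model assigns 5 of the intent_b samples to intent_a.
predicted_labels = ["intent_a"] * 20 + ["intent_a"] * 5 + ["intent_b"] * 5

precision_a = precision(true_labels, predicted_labels, "intent_a")  # 20 / 25
precision_b = precision(true_labels, predicted_labels, "intent_b")  # 5 / 5

print(f"intent_a precision: {precision_a:.0%}")
print(f"intent_b precision: {precision_b:.0%}")
```

Under these assumptions, the 5 misplaced samples count against the first intent, so its precision falls below 100% while the second intent’s precision is unaffected.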

What to do if intent precision is low

ℹ️ The graph shows intents with lower precision in yellow and those with higher precision in blue.

2. Recall

Recall shows how many of an intent’s samples the model classifies correctly. For example, if an intent has 20 samples, 10 of which are classified correctly and 10 incorrectly, the intent’s recall is 50%. It answers the question: of the samples that belong to this intent, how many are correctly classified?
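
The 20-sample example above can be reproduced with a short Python sketch. It uses the standard definition of recall (correctly classified samples of an intent divided by all samples that belong to that intent); the intent names, the data, and the `recall` helper are hypothetical and only for illustration.

```python
def recall(true_labels, predicted_labels, intent):
    """Share of the samples belonging to `intent` that were classified as it."""
    # Predictions the model made for samples that truly belong to this intent.
    predictions = [p for t, p in zip(true_labels, predicted_labels) if t == intent]
    if not predictions:
        return 0.0  # the intent has no samples
    correct = sum(1 for p in predictions if p == intent)
    return correct / len(predictions)


# Hypothetical data: an intent with 20 samples, 10 classified correctly
# and 10 classified as some other intent.
true_labels = ["book_flight"] * 20
predicted_labels = ["book_flight"] * 10 + ["other_intent"] * 10

recall_book_flight = recall(true_labels, predicted_labels, "book_flight")  # 10 / 20

print(f"book_flight recall: {recall_book_flight:.0%}")
```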

What to do if an intent recall is low

ℹ️ If an intent’s recall is low, add more samples to that intent.

ℹ️ The graph shows intents with lower recall in red and those with higher recall in blue.

3. F1 Score

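The F1 score combines precision and recall into a single number: their harmonic mean. Because it is high only when both measures are high, it reflects the fact that you should consider both precision and recall to fully evaluate the effectiveness of the model.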
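
As a rough illustration, the sketch below computes F1 under its standard definition, the harmonic mean of precision and recall. The `f1_score` helper and the input values (80% precision, 50% recall) are hypothetical, and the report may round or aggregate its figures differently.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both measures are 0
    return 2 * precision * recall / (precision + recall)


# Hypothetical values for a single intent.
precision_value = 0.8
recall_value = 0.5

f1 = f1_score(precision_value, recall_value)
simple_average = (precision_value + recall_value) / 2

print(f"F1 score:       {f1:.3f}")
print(f"Simple average: {simple_average:.3f}")
```

The harmonic mean is pulled toward the lower of the two values, so an intent with high precision but weak recall (or the reverse) still gets a modest F1 score.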

What to do if F1-score is low