Improving Confusion Matrix Interpretability: FP and FN vectors should be switched to align with Predicted and True axis #2071
👋 Hello @rbavery, thank you for your interest in 🚀 YOLOv5! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.

If this is a 🐛 Bug Report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we can not help you.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset images, training logs, screenshots, and a public link to online W&B logging if available.

For business inquiries or professional support requests please visit https://www.ultralytics.com or email Glenn Jocher at [email protected].

Requirements
Python 3.8 or later with all requirements.txt dependencies installed: $ pip install -r requirements.txt

Environments
YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled).

Status
If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training (train.py), testing (test.py), inference (detect.py) and export (export.py) on MacOS, Windows, and Ubuntu every 24 hours and on every commit.
Additionally, it'd be nice to provide the option to normalize the confusion matrix along a row instead of along a column. This would show class-specific precision along the diagonal. I should also mention that I'm happy to submit these changes as PRs if the maintainers are open to them.
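For illustration, here is a minimal numpy sketch of the two normalization directions. This is not the YOLOv5 implementation; the rows-as-predicted layout, the background-as-last-index convention, and the counts are all made up for the example:

```python
import numpy as np

# Toy (nc+1) x (nc+1) count matrix: rows = predicted class, cols = true class,
# last index = background (hypothetical layout and counts, for illustration only).
cm = np.array([[50.,  5., 10.],
               [ 4., 60.,  8.],
               [ 6.,  3.,  0.]])

col_norm = cm / cm.sum(axis=0, keepdims=True)  # each column sums to 1 -> diagonal ~ per-class recall
row_norm = cm / cm.sum(axis=1, keepdims=True)  # each row sums to 1    -> diagonal ~ per-class precision

print(np.diag(col_norm))  # recall-like proportions
print(np.diag(row_norm))  # precision-like proportions
```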
@rbavery thanks for the feedback. Normalization is the default as it allows for better results comparison across datasets. I'm not sure I understand your other suggestions, but I do agree the confusion matrix is challenging to interpret due to the background, which may dominate at lower confidence thresholds. This is also probably due to the fact that people are much more used to seeing classification confusion matrices. Perhaps an option may be to leave the background class out of the plot.

In any case, be advised there was a recent PR with a confusion matrix bug fix: #2046
Hi @glenn-jocher, thanks a bunch for the quick response. I updated the issue to show what the swapped FN and FP background vectors look like in the new confusion matrix, in units of counts, so you can compare with the non-swapped count matrix.
I agree proportions are easier to compare across datasets; it's something I'm doing with this and other models. My concern is about what the proportion represents when we consider a single confusion matrix on its own. I think swapping the bottom row for the rightmost column allows us to convey more meaningful proportions when the confusion matrix is normalized. To state my concern a little more clearly, some questions:

- What does a proportion represent in the current implementation?
- For a true positive cell, shouldn't the proportion represent either recall or precision, depending on the axis the confusion matrix is normalized along?
- If a TP proportion cell shouldn't represent recall or precision, what should this proportion represent if background is to be included?
I also think it could be a bit too dangerous/tempting to offer an option to exclude the background class. Object detection models often have trouble either missing objects (predicting background) or predicting a detection where there is only background. Taking this component out of the visualization would make the above example look a lot better than it actually performed, and folks who are more used to image classification matrices (as you mentioned) might be especially tempted to remove the background class from the evaluation. This would lead to users of their models being very surprised when the model misses a lot of groundtruth or makes a lot of bad detections on background.
At the core of my interpretation is that background is just another class in object detection. So on the Predicted axis (rows), we should have predicted background (false negatives). Along the True axis (columns), we should have the truth being that there was only background and no object of another class (false positives). And I think it makes sense that the bottom-right corner remains empty, because there would be far too many cases where background was correctly predicted as background.
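As a toy illustration of that convention (the class names and counts below are invented, not taken from the matrices in this issue), the proposed layout would look like:

```python
import numpy as np

names = ["boma", "building", "background"]  # hypothetical class names
# rows = Predicted, cols = True; last index = background
cm = np.array([
    [40,  2, 15],   # predicted "boma":     last cell = FPs (true background detected as boma)
    [ 3, 55,  9],   # predicted "building": last cell = FPs (true background detected as building)
    [12,  6,  0],   # predicted background: FNs (missed groundtruth); corner intentionally left empty
])
```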
@rbavery the confusion matrix is just an intersection of data, with the units as probabilities (when normalized). Recall and Precision are a separate topic, and are computed within a single class, never across classes.
By class-specific precision I mean the precision computed for a single class: TP for that class divided by all predictions of that class (TP + FP).
Apologies if I'm being creative with the terms. But I think you've stated my point: each cell should be the probability that a given groundtruth class (column) is predicted as a given class (row).

In the current implementation, this rule does not hold for the background row, the last row. If you swap the columns and rows, the rule does hold. For example, we have a column for "boma" and a row for "boma". The probability should be the probability that groundtruth "boma" (col) is predicted as "boma" (row). BUT for the last row, we instead have the probability that background (row label) was predicted as "boma" (column label), i.e. the roles of the axes flip for that row. This is not as intuitive to me as switching it so that the rule you stated holds for all columns and rows. If there's a justification for not switching, I'm curious to hear it.
In fact, the current normalization means you aren't summing over all groundtruth along the column, since the bottom-left background/boma cell contains no groundtruth "boma", yet it is included in the column sum that is used to normalize by the total amount of groundtruth for class "boma".
@rbavery I've produced updated confusion matrices here to try to understand this better. The first is at 0.25 conf (default), the second at 0.9 conf, which should increase the 'background' row, but it does not. So perhaps a transpose is in order. Can you apply your proposed changes with the two COCO commands below to compare?
[Confusion matrix plots: 0.25 conf (default) and 0.9 conf]
Yes, happily. I can take care of that over the weekend; I'm working toward a deadline right now. I appreciate you testing this and the feedback.
@rbavery agree: as confidence is increased, more objects are missed, so the bottom row should increase. As confidence decreases, FPs should dominate more, which would show up as the last column increasing. Sure, submit a PR and I will experiment with this there. Thanks!
Can someone explain what the "background" column and row in the confusion matrix are, and how to interpret them?
@SaddamHosyn
🚀 Feature
The current code to generate the confusion matrix produces something like this:
If the confusion matrix is normalized so that counts become proportions, the proportions do not reflect standard metrics like class-specific recall.
For example, if we focus on the "boma" class and the proportion in its TP cell, ideally we would want to show an interpretable proportion like recall: TP for boma / (TP for boma + FN for boma).
Instead, the proportion currently reflects TP for boma / (TP for boma + FN for boma - FN_background for boma + FP_background for boma), which is difficult to interpret.
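A small worked example of how the two denominators differ (the counts below are invented for illustration and are not taken from the matrices in this issue):

```python
# Hypothetical counts for the "boma" column
tp_boma = 4              # groundtruth boma predicted as boma
fn_boma_other = 10       # groundtruth boma predicted as some other class
fn_boma_background = 80  # groundtruth boma missed (predicted as background)
fp_background_boma = 16  # background wrongly detected as boma

# Interpretable proportion (recall): TP over all groundtruth boma
recall = tp_boma / (tp_boma + fn_boma_other + fn_boma_background)

# Proportion produced when the background cell in the column holds FPs instead of FNs
current = tp_boma / (tp_boma + fn_boma_other + fp_background_boma)

print(round(recall, 3), round(current, 3))  # 0.043 vs 0.133
```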
I think this is a simple fix: switch the bottom row and its label with the rightmost column and its label. This way the "True" and "Predicted" axis labels act as axes for all the cells in the confusion matrix, including the false-positive background and false-negative background cells.
For example, this confusion matrix shows more clearly that 4% of the "boma" class are correctly detected out of all groundtruth.
This is the count matrix that swaps the FP and FN background vectors to show what we did to produce the proportion matrix above.
Motivation
This will make normalized confusion matrices more interpretable. I'm not sure how folks interpret the current implementation of the confusion matrix when it is normalized, and I would appreciate guidance if there is some reasoning behind the current implementation. However, I think the change makes interpretation easier.
Pitch
Take the false-positive background row and swap it with the false-negative background column, so that the "Predicted" axis reflects the predicted category for every row and the "True" axis reflects the groundtruth for every column.
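A minimal sketch of that swap (not YOLOv5's actual ConfusionMatrix code; the (nc+1) x (nc+1) layout with background as the last index is an assumption):

```python
import numpy as np

def swap_background_vectors(matrix: np.ndarray) -> np.ndarray:
    """Swap the background row with the background column of an (nc+1) x (nc+1) count matrix."""
    m = matrix.copy()
    last = m.shape[0] - 1
    row = m[last, :last].copy()      # current bottom (background) row
    m[last, :last] = m[:last, last]  # move rightmost column into the bottom row
    m[:last, last] = row             # move old bottom row into the rightmost column
    return m
```

Normalizing the swapped matrix column-wise would then put a recall-like proportion on the diagonal.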
Alternatives
Don't normalize the matrix and just use counts. But even in this case, I think swapping the row and column data and labels makes sense and is easier to interpret, since the axis labels then apply to all rows and columns in the matrix.
Additional context
I spent some time discussing this with @alkalait to confirm that the current implementation is not very interpretable when the matrix is normalized, and that swapping the row and column so the axis labels hold for all rows and columns would be an improvement. Hopefully my explanation above isn't too confusing; I'm happy to clarify.
Thanks for considering this issue and for open sourcing this awesome project!