Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Patch to fix uconn timestamp storage #744

Merged
merged 5 commits into from
Aug 18, 2022
Merged

Conversation

lisaSW
Copy link
Contributor

@lisaSW lisaSW commented Aug 17, 2022

This is a patch to store every timestamp in the uconn ts list rather than a unique set. This will not impact any of the scoring algorithms, which already gather the timestamps with $addToSet

@lisaSW lisaSW requested a review from Zalgo2462 August 17, 2022 17:23
Copy link
Contributor

@Zalgo2462 Zalgo2462 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code looks good. I will give it a run here in a bit.

We might want to consider using TsListFull instead of TsList when calling getTsHistogramScore on line 218. This would keep the timestamp list in line with the timeline/ histogram scores.

It would be convenient if we could also use TsListFull for the rest of the scores. Below, I've written a bit about using TsListFull for the ts skew and dispersion scores. If you see any reasons which would prevent us from doing so that I have not addressed below, please let me know.

From a first glance, it looks like the Bowley skew would actually work with tsListFull. If we use TsListFull, then tsLow, tsMid, and tsHigh might all be 0. If just tsLow is 0, then it doesn't look like we will run into any issues. In the case that both tsLow and tsMid are 0, we have a special case that makes sure that tsSkew gets set to 0 on line 137. The skew score is defined as 1 - tsSkew, so a beacon with all of its delta times within half a second would score high on skew. I think that is a reasonable expectation.

It looks like the issue with the dispersion score hits when we go to create the tsMadmScore on line 183. We divide by tsMid here which could be 0 if we use tsListFull. I believe the only distributions where we'd see tsMid get set to 0 would be distributions with a huge spike at 0 and a tail of smaller bins out to the right. In these cases, the median deviation from the median should also be near 0. So, I think there is good reason to add a special case here where we just set the tsMadmScore to 1 if tsMid is 0.

Copy link
Contributor

@Zalgo2462 Zalgo2462 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed using TsListFull for IP beacon scoring, but that can be implemented in a separate PR if the testing goes well. I tested this PR as it stands and it works well.

@Zalgo2462 Zalgo2462 merged commit a589de4 into master Aug 18, 2022
@Zalgo2462 Zalgo2462 deleted the timestamp-list-patch branch August 18, 2022 02:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants