Added storage size based retention method and new metrics #343
Conversation
Prometheus PR: prometheus/prometheus#4230
I just left comments on things that stood out to me; don't treat it as an exhaustive review though, since I'm not a TSDB expert :) (could you update the PR description to say …)
@juliusv Hey Julius, thanks for looking over it. I made a few of the changes you (and simonpasquier) recommended. There are still one or two things I'm not sure about; I know you said you're not a TSDB expert, so who would it be best for me to talk to?
I am not an expert either, but I can have a look in a few days.
(Force-pushed 2555b08 to ba05566)
I rebased and changed my code to better match how deletions are being done now. I fixed one or two things in GitHub's editor to resolve conflicts and didn't add a signature to those commits. I'm not sure how to fix that now, because it's still complaining about them.
@mknapphrt No need to worry about it now; you can squash the commits once the PR is ready to go.
So it turns out that @juliusv was right before: even though ULIDs are lexicographically sortable identifiers, as was pointed out, the time they are generated doesn't coincide with the time range of the block itself, because when blocks are compacted they get a new, more recent ULID. I changed it so that it first sorts the blocks by their time ranges and then decides which blocks to delete.
Any opinions? @brian-brazil Is this more what you were thinking of instead of walking over the directory? |
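The point above can be demonstrated with a small sketch. `BlockMeta` here is a hypothetical, trimmed-down stand-in for the tsdb block metadata, not the real type: a compacted block can carry old data under a brand-new ULID, so ordering must come from the block's time range, not its identifier.

```go
package main

import (
	"fmt"
	"sort"
)

// BlockMeta is a hypothetical stand-in for the tsdb block metadata.
type BlockMeta struct {
	ULID    string // lexicographically sortable, but reassigned on compaction
	MinTime int64
	MaxTime int64
}

// sortByTimeRange orders blocks newest-first by their actual time range,
// rather than by ULID, since compacted blocks receive fresh (newer) ULIDs.
func sortByTimeRange(blocks []BlockMeta) {
	sort.Slice(blocks, func(i, j int) bool {
		return blocks[i].MaxTime > blocks[j].MaxTime
	})
}

func main() {
	blocks := []BlockMeta{
		{ULID: "01C...NEW", MinTime: 0, MaxTime: 100},   // old data, recently compacted
		{ULID: "01A...OLD", MinTime: 200, MaxTime: 300}, // newer data, older ULID
	}
	sortByTimeRange(blocks)
	fmt.Println(blocks[0].ULID) // → 01A...OLD (newest samples first, despite the older ULID)
}
```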
The general approach seems fine, I'll leave the detailed review to the maintainers of this repo. Thanks for your work!
Thanks for all the feedback, guys, I really appreciate it. I've gotten a couple of responses saying they're not the TSDB expert, so I was wondering who the TSDB expert is that I could bother to look over this? @juliusv @krasi-georgiev Thanks again
@mknapphrt The biggest experts here would be @fabxc and @gouthamve, according to https://github.com/prometheus/tsdb/graphs/contributors
It doesn't seem there's been much activity around here recently, but I just wanted to try nudging @fabxc and @gouthamve to see if you guys could take a look over this. Thanks
Maybe @gouthamve has more time after https://twitter.com/putadent/status/1026013754251653120 now :)
@mknapphrt I can have a look again after PromCon, as I already have a use case for this, so it would be great to see it merged.
Congrats @gouthamve! And thanks @krasi-georgiev, I appreciate it
Yep, not a bad idea.
A warning should be added in the Prometheus code, right? Done at the time of parsing the flags or on creation of the tsdb, I would think. Or do you think it would fit better in the tsdb?
Aah, yeah, definitely in Prometheus. I don't want to delay this one any longer; it would be great if @gouthamve or @fabxc could have a final look, but if not I'll wait a few more days and merge.
OK, are we still going to be doing any benchmarking, or at this point should I take out the tsdb changes in the Prometheus PR?
Let's merge, and we'll move to Prometheus next.
I'm confused by the whole corrupted-block handling at the moment. I left some comments, but other than that, the actual size-based retention bits look good. I'll take another look once the handling of corrupted blocks is fixed.
db.go (Outdated)
    continue

for ulid, err := range corrupted {
    if _, ok := deletable[ulid]; !ok {
        return errors.Wrap(err, "unexpected corrupted block")
This will always trigger an error, no? deletable := db.deletableBlocks(loadable) means the deletable list will only have blocks from the loadable list, which doesn't contain any corrupted blocks?
Yes: when there is a corrupted block that is not set for deletion (by size/date retention, or replaced by a parent after a compaction), we should return an error.
I guess an extra comment here to clarify would be good.
  sort.Slice(blocks, func(i, j int) bool {
-     return blocks[i].Meta().MinTime < blocks[j].Meta().MinTime
+     return blocks[i].Meta().MaxTime > blocks[j].Meta().MaxTime
Why was this changed? Curious if it changes anything...
I really can't remember, but reverting it makes the TimeRetention test fail.
for ulid, block := range blocks {
    if block != nil {
        if err := block.Close(); err != nil {
            level.Warn(db.logger).Log("msg", "closing block failed", "err", err)
Should we return here?
This was the default behaviour before the refactoring; I'm not sure why.
Maybe it at least allows deleting the block under Linux even when closing fails.
Signed-off-by: Krasi Georgiev <[email protected]>
One final thing: I still don't think the handling of corrupted blocks is right. This block expects that … but from here, the … I think the best way to handle …
I changed the logic a bit. We still want to ignore corrupted blocks replaced by parents (a crash during compaction), and possibly, in a future PR, also ignore corrupted blocks that would fall outside the retention policy anyway, but for now this is good enough, as I can't think of a nice way to implement this.
(Force-pushed 843d846 to 33f2846)
@mknapphrt Thanks for the great work, and sorry for hijacking the last few commits; we just wanted to have this ready for the Prometheus 2.7 release.
@krasi-georgiev No worries! I'm happy to see it got finished up. I've been following along quietly but haven't had the time to do much in the last few days, and you definitely seemed to have a handle on it. Is there an expected release date for 2.7? I'd be happy to work on the Prometheus side of this; I don't think it should take quite as long as this change haha
After merging #374 it will be time to implement it in Prometheus. 2.7 should happen in the next 1-2 days, so it would be perfect if you want to update tsdb and implement it in Prometheus.
@mknapphrt With #374 and #501 merged, you may now go ahead and open a PR for this in Prometheus.
I'm actually in the process of doing a PR based on prometheus/prometheus#4230. Sorry for hijacking it, but there is some time pressure to get Prometheus 2.7 out today.
Thanks for all the help, guys! Glad to see this finally got somewhere. I'd be happy to help in the future, as I'm assuming WAL size is a goal down the road.
(Merged as …-junkyard#343)
Sure, I'll ping you if we need to get to that. Thanks for your help as well.
Added the methods needed to retain data based on a byte limit rather than time. The limit is only applied if the flag is set (it defaults to 0). Both blocks that are older than the retention period and blocks that make the storage too large are removed.
Two new metrics keep track of the size of the local storage folder and the number of times data has been deleted because the size restriction was exceeded.
This PR is in conjunction with a PR in prometheus/prometheus. The idea is to allow a user to set a maximum number of bytes that the local storage can take up, in conjunction with the retention period; whenever that limit is exceeded, the oldest block is deleted to regain space.
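The deletion idea described above can be sketched as follows. This is a simplified illustration, not the PR's actual code: `block` and `beyondSizeRetention` are hypothetical names, and the real implementation works on tsdb block handles and metadata. Blocks are walked newest-first; once the running total exceeds the byte limit, that block and every older one are marked for deletion.

```go
package main

import (
	"fmt"
	"sort"
)

// block is a hypothetical stand-in carrying only what the sketch needs.
type block struct {
	ulid    string
	maxTime int64
	sizeB   int64
}

// beyondSizeRetention walks blocks newest-first, accumulating their sizes;
// once the running total exceeds maxBytes, that block and all older blocks
// are marked deletable. maxBytes <= 0 disables the limit, matching the
// flag's default of 0 described above.
func beyondSizeRetention(blocks []block, maxBytes int64) map[string]bool {
	deletable := map[string]bool{}
	if maxBytes <= 0 {
		return deletable
	}
	sort.Slice(blocks, func(i, j int) bool {
		return blocks[i].maxTime > blocks[j].maxTime // newest first
	})
	var total int64
	for i, b := range blocks {
		total += b.sizeB
		if total > maxBytes {
			// Everything from this block onward is too old to keep.
			for _, old := range blocks[i:] {
				deletable[old.ulid] = true
			}
			break
		}
	}
	return deletable
}

func main() {
	blocks := []block{
		{"old", 100, 400},
		{"mid", 200, 400},
		{"new", 300, 400},
	}
	// 3 x 400 bytes against a 1000-byte limit: only the oldest block goes.
	fmt.Println(beyondSizeRetention(blocks, 1000)) // → map[old:true]
}
```

Deleting whole blocks (rather than trimming within one) keeps the operation cheap, at the cost of reclaiming space in block-sized steps.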
closes #124
closes prometheus/prometheus#3684
Signed-off-by: Mark Knapp [email protected]