Journal files not erased as they roll over #124
Sorry for the delay -- I was out of town for a while. There are two scripts built with kestrel that should help here:
qdump: This will dump out a (sort of) human-readable list of the operations in a journal file, without the associated data. It might be useful for figuring out what's messed up, although if the journal is "hundreds of gigabytes", it might be too time-consuming to matter.
qpack: This takes a set of journal files and packs them into their minimal operations (usually one ADD for each item that's currently in the queue). You should take the bad server out of rotation, run this script on its journal files, and then start it back up again with only the new journal file. (You can delete the old ones after it's finished packing the new one.) It might be helpful to "qdump" the new file before starting, in case there's something wonky.
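For context, here is a minimal sketch (in Python, purely illustrative -- not kestrel's actual Scala code) of what "packing into minimal operations" means: replay every ADD/REM in order, and whatever is left in the queue at the end is written out as one ADD per surviving item.

```python
from collections import deque

# Illustrative sketch of journal packing: replay ADD/REM operations and
# keep only the items still in the queue, so the packed journal needs
# just one ADD per surviving item. Op encoding here is invented.

def pack(ops):
    """ops is a list of ("ADD", size) or ("REM",) tuples."""
    queue = deque()
    for op in ops:
        if op[0] == "ADD":
            queue.append(op)
        else:                # "REM"
            queue.popleft()  # raises IndexError if the journal is corrupt
    return list(queue)

# Three ADDs and two REMs pack down to a single ADD.
print(pack([("ADD", 2601), ("ADD", 2641), ("REM",), ("REM",), ("ADD", 2597)]))
# [('ADD', 2597)]
```

This is also why packing a 154G pile of journals can yield a tiny file: nearly all of the ADD/REM pairs cancel out.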
Hi, I've tried qdump on some of the files and I am getting something like:
qdump.sh my_queue.1373865238490
00000000 ADD 2601
00000a3e ADD 2641
000014a4 REM
000014a5 REM
000014a6 ADD 2597
00001ee0 ADD 2616
0000292d ADD 2437
000032c7 REM
000032c8 ADD 2572
00003ce9 ADD 2624
...............
000199cf REM
000199d0 REM
000199d1 ADD 2649
0001a43f ADD 2649
0001aead ADD 2631
0001b909 REM
0001b90a REM
0001b90b REM
0001b90c REM
Exception in thread "main" java.util.NoSuchElementException: queue empty
at scala.collection.mutable.Queue.dequeue(Queue.scala:65)
at net.lag.kestrel.tools.QueueDumper.dumpItem(QDumper.scala:102)
at net.lag.kestrel.tools.QueueDumper$$anonfun$apply$2.apply(QDumper.scala:47)
at net.lag.kestrel.tools.QueueDumper$$anonfun$apply$2.apply(QDumper.scala:45)
at scala.collection.Iterator$class.foreach(Iterator.scala:772)
at scala.collection.Iterator$$anon$22.foreach(Iterator.scala:451)
at net.lag.kestrel.tools.QueueDumper.apply(QDumper.scala:45)
at net.lag.kestrel.tools.QDumper$$anonfun$main$1.apply(QDumper.scala:201)
at net.lag.kestrel.tools.QDumper$$anonfun$main$1.apply(QDumper.scala:199)
at scala.collection.LinearSeqOptimized$class.foreach(LinearSeqOptimized.scala:59)
at scala.collection.immutable.List.foreach(List.scala:76)
at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:30)
at scala.collection.mutable.ListBuffer.foreach(ListBuffer.scala:44)
at net.lag.kestrel.tools.QDumper$.main(QDumper.scala:199)
at net.lag.kestrel.tools.QDumper.main(QDumper.scala)
Other files in the same queue seem not to have a problem with qdump, i.e. I am getting a full list of ops and qdump exits without exceptions. Additionally, the overall queue + journal size was reduced from 154G to 500M just after we started pushing new data to the queue. Thanks,
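The "queue empty" exception above is consistent with a journal whose tail contains more REMs than there are items to remove. A tiny Python sketch (illustrative only, mirroring what a dumper's replay loop does, not kestrel's code) reproduces the same failure mode:

```python
from collections import deque

def replay(ops):
    """Replay ADD/REM ops the way a dumper would; fail on an orphan REM."""
    queue = deque()
    for op in ops:
        if op == "ADD":
            queue.append(object())
        else:
            queue.popleft()  # IndexError here mirrors the "queue empty" crash
    return queue

# Like the tail of the qdump output above: one ADD followed by too many REMs.
try:
    replay(["ADD", "REM", "REM"])
except IndexError:
    print("queue empty -- the journal has a REM with no matching ADD")
```

So the file isn't just truncated; its operation stream is internally inconsistent.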
We've just had the same situation occur that @kperi described above. I'm trying to rescue the situation without losing any data, but when I try to run qpack.sh on the out-of-control journal files, I get errors like these:
This suggests one possible reason for the journals not getting packed automatically: the journal packer thread is failing on corrupt journal files because of this exception. I couldn't find any direct evidence of this in the log files, but then, the disks were full due to this issue, so maybe the logger couldn't write the appropriate stacktrace. A bit speculative, but it's all I've got so far...
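If that theory is right, a tolerant replay that counts and skips orphan REMs, rather than crashing, would at least let the surviving items be recovered. A hypothetical workaround sketch (this is not kestrel's actual packer, and whether orphan REMs are truly safe to drop depends on how the corruption happened):

```python
from collections import deque

# Hypothetical tolerant packer: skip REMs with no matching ADD instead of
# raising, so a corrupt journal can still be reduced to its surviving ADDs.

def pack_tolerant(ops):
    queue = deque()
    orphan_rems = 0
    for op in ops:
        if op[0] == "ADD":
            queue.append(op)
        elif queue:
            queue.popleft()
        else:
            orphan_rems += 1  # REM with no matching ADD: count it, keep going
    return list(queue), orphan_rems

items, orphans = pack_tolerant([("ADD", 2649), ("REM",), ("REM",), ("ADD", 2631)])
print(items, orphans)  # [('ADD', 2631)] 1
```

Reporting the orphan count separately makes it obvious when a journal was corrupt, instead of silently producing a smaller file.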
Note: there were a lot of timestamped journal files with size 0, but I removed all of these first before trying to run qpack.sh.
Quoting @andrewclegg and @robey from this post: #16
"We have queues where there are hundreds of gigabytes of old journal files for queues that currently have items=0. Obviously this takes up a lot of disk space, but it also means Kestrel takes ages to start up, as it replays all those old operations.
How can we force Kestrel to compact them?"