This repository has been archived by the owner on Sep 18, 2021. It is now read-only.

Journal files not erased as they roll over #124

Open

kperi opened this issue Jul 3, 2013 · 4 comments

Comments


kperi commented Jul 3, 2013

Quoting @andrewclegg and @robey from this post: #16

"We have queues where there are hundreds of gigabytes of old journal files for queues that currently have items=0. Obviously this takes up a lot of disk space, but it also means Kestrel takes ages to start up, as it replays all those old operations.

How can we force Kestrel to compact them?"


robey commented Jul 9, 2013

Sorry for the delay -- I was out of town for a while.

There are two scripts built with kestrel that should help here:

$ ./dist/kestrel/scripts/qdump.sh --help

That will dump out a (sort of) human-readable list of the operations in a journal file, without the associated data. It might be useful for figuring out what's messed up, although if the journal is "hundreds of gigabytes", it might be too time-consuming to matter.

$ ./dist/kestrel/scripts/qpack.sh --help

This takes a set of journal files and packs them into their minimal operations (usually one ADD for each item that's currently in the queue). You should take the bad server out of rotation, run this script on its journal files, and then start it back up again with only the new journal file. (You can delete the old ones after it's finished packing the new one.) It might be helpful to "qdump" the new file before starting, in case there's something wonky.
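As a rough sketch of that recovery pass (the journal directory /var/spool/kestrel and the queue name my_queue are hypothetical here, and the argument order for qpack.sh is an assumption; verify it against --help before running):

$ # with kestrel stopped on the bad server:
$ ./dist/kestrel/scripts/qpack.sh --help   # confirm the real usage first
$ ./dist/kestrel/scripts/qpack.sh /var/spool/kestrel/my_queue.* /var/spool/kestrel/my_queue.packed
$ # sanity-check the packed file before restarting, in case something's wonky:
$ ./dist/kestrel/scripts/qdump.sh /var/spool/kestrel/my_queue.packed | tail
$ # once it looks sane, delete the old journals and restart with only the new file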


kperi commented Jul 15, 2013

Hi,
Thanks for the reply.

I've tried qdump on some of the files and I am getting something like:

$ qdump.sh my_queue.1373865238490
00000000 ADD 2601
00000a3e ADD 2641
000014a4 REM
000014a5 REM
000014a6 ADD 2597
00001ee0 ADD 2616
0000292d ADD 2437
000032c7 REM
000032c8 ADD 2572
00003ce9 ADD 2624
...............
0199cf REM
000199d0 REM
000199d1 ADD 2649
0001a43f ADD 2649
0001aead ADD 2631
0001b909 REM
0001b90a REM
0001b90b REM
0001b90c REM
Exception in thread "main" java.util.NoSuchElementException: queue empty
        at scala.collection.mutable.Queue.dequeue(Queue.scala:65)
        at net.lag.kestrel.tools.QueueDumper.dumpItem(QDumper.scala:102)
        at net.lag.kestrel.tools.QueueDumper$$anonfun$apply$2.apply(QDumper.scala:47)
        at net.lag.kestrel.tools.QueueDumper$$anonfun$apply$2.apply(QDumper.scala:45)
        at scala.collection.Iterator$class.foreach(Iterator.scala:772)
        at scala.collection.Iterator$$anon$22.foreach(Iterator.scala:451)
        at net.lag.kestrel.tools.QueueDumper.apply(QDumper.scala:45)
        at net.lag.kestrel.tools.QDumper$$anonfun$main$1.apply(QDumper.scala:201)
        at net.lag.kestrel.tools.QDumper$$anonfun$main$1.apply(QDumper.scala:199)
        at scala.collection.LinearSeqOptimized$class.foreach(LinearSeqOptimized.scala:59)
        at scala.collection.immutable.List.foreach(List.scala:76)
        at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:30)
        at scala.collection.mutable.ListBuffer.foreach(ListBuffer.scala:44)
        at net.lag.kestrel.tools.QDumper$.main(QDumper.scala:199)
        at net.lag.kestrel.tools.QDumper.main(QDumper.scala)

Other files in the same queue don't seem to have a problem with qdump, i.e. I get a full list of ops and qdump exits without exceptions.
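A quick way to separate the files that dump cleanly from the ones that throw (paths and queue name here are hypothetical, and this assumes the script propagates the JVM's exit status) is to check qdump's exit status in a loop:

$ for f in /var/spool/kestrel/my_queue.*; do
>   ./dist/kestrel/scripts/qdump.sh "$f" > /dev/null 2>&1 || echo "broken: $f"
> done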


Additionally, the overall queue + journal size was reduced from 154G to 500M just after I started pushing new data to the queue.

Thanks,
Kostas

@andrewclegg

We've just had the same situation occur that @kperi described above.

I'm trying to rescue the situation without losing any data, but when I try to run qpack.sh on the out-of-control journal files, I get errors like these:

Packing journals...
Packing: 2.2M   2.2M
Exception in thread "main" net.lag.kestrel.BrokenItemException: java.io.IOException: Unexpected EOF
        at net.lag.kestrel.Journal.readJournalEntry(Journal.scala:414)
        at net.lag.kestrel.Journal.next$1(Journal.scala:422)
        at net.lag.kestrel.Journal$$anonfun$next$1$1.apply(Journal.scala:427)
        at net.lag.kestrel.Journal$$anonfun$next$1$1.apply(Journal.scala:427)
        at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1060)
        at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1052)
        at scala.collection.immutable.StreamIterator$$anonfun$next$1.apply(Stream.scala:952)
        at scala.collection.immutable.StreamIterator$$anonfun$next$1.apply(Stream.scala:952)
        at scala.collection.immutable.StreamIterator$LazyCell.v(Stream.scala:941)
        at scala.collection.immutable.StreamIterator.hasNext(Stream.scala:946)
        at scala.collection.TraversableOnce$FlattenOps$$anon$1.hasNext(TraversableOnce.scala:391)
        at scala.collection.Iterator$$anon$22.hasNext(Iterator.scala:457)
        at scala.collection.Iterator$class.foreach(Iterator.scala:772)
        at scala.collection.Iterator$$anon$22.foreach(Iterator.scala:451)
        at net.lag.kestrel.JournalPacker.apply(JournalPacker.scala:73)
        at net.lag.kestrel.tools.QPacker$.main(QPacker.scala:65)
        at net.lag.kestrel.tools.QPacker.main(QPacker.scala)
Caused by: java.io.IOException: Unexpected EOF
        at net.lag.kestrel.Journal.readBlock(Journal.scala:443)
        at net.lag.kestrel.Journal.readJournalEntry(Journal.scala:382)
        ... 16 more

This suggests one possible reason for the journals not getting packed automatically: the journal packer thread is failing on corrupt journal files because of this exception.

I couldn't find any direct evidence of this in the log files, but then, the disks were full due to this issue, so maybe the logger couldn't write the appropriate stacktrace. A bit speculative, but it's all I've got so far...
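One way to test that theory once the disk has free space again (the log path is an assumption; it depends on your deployment) would be to grep the kestrel log for the packer's exception:

$ grep -B1 'BrokenItemException' /var/log/kestrel/kestrel.log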

@andrewclegg

Note: there were a lot of timestamped journal files with size 0, but I removed all of these before trying to run qpack.sh.
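For anyone else in the same spot, the zero-byte journals can be listed and removed with find (directory and queue name are hypothetical):

$ find /var/spool/kestrel -name 'my_queue.*' -empty -print     # list the empty journal files
$ find /var/spool/kestrel -name 'my_queue.*' -empty -delete    # then remove them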
