How to generate vector tiles from RAWR tiles ? #25

Closed
KAI10 opened this issue Jul 2, 2018 · 7 comments

@KAI10

KAI10 commented Jul 2, 2018

I was able to generate RAWR tiles by running rawr-process from the tilequeue repository. However, I am not sure how to use these RAWR tiles to generate vector tiles. Do I use code from this repository, or is there an example showing how to do that?

@zerebubuth
Member

You can use the meta-tile command or the tile command from tilequeue to generate either a "meta tile" (a package of several adjacent tiles) or a single tile, respectively. Your config.yaml needs to be set up to use RAWR tiles: set use-rawr-tiles: true, and make the rawr.source section point to where the RAWR tiles are stored. (See the source code for more details on configuring the rawr.source section.)

Hope that helps!

@KAI10
Author

KAI10 commented Jul 3, 2018

Thanks, I was able to generate a vector tile using the tile command. However, generating a vector tile (in JSON format) from a RAWR tile takes about 36 seconds, whereas generating it from the PostgreSQL database takes around 3 seconds. The tile coordinate was 10/768/442. Is this normal?

@zerebubuth
Member

Yes, generating a single tile from a RAWR tile can be very slow, especially for RAWR tiles with lots of data in them. If you want to generate a small number of individual tiles, or generate tiles on-demand, then I'd recommend not using RAWR tiles and using the database directly.

The reason is that RAWR tiles were designed to be plain data and are not indexed by the min_zoom of the individual features. The idea was that the min_zoom logic could change without needing to regenerate the RAWR tiles. When a RAWR tile is loaded, tilequeue calculates a min_zoom for every feature and spatially indexes it. Subsequent operations on that RAWR tile can then be very fast, but there's an up-front cost to be paid. So the first tile might take 36 seconds, but the second should take much less, hopefully less than 3!

PostgreSQL already paid this min_zoom calculation and indexing cost when we imported the data, so it can return results without that delay.

Generally, when we use RAWR tiles it's for large-scale batch generation and we group jobs by the RAWR tile they're part of, so we can amortize the indexing cost over many tiles (85 per z10 RAWR tile in our current configuration). This means we can run all those jobs concurrently without overloading the (small cluster of) PostgreSQL databases.
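The figure of 85 tiles per z10 RAWR tile is consistent with a pyramid spanning four zoom levels (z10 to z13): 1 + 4 + 16 + 64 = 85. That zoom range is inferred from the arithmetic, not stated in the thread. The Python sketch below enumerates such a pyramid and expresses the amortization as a formula; `pyramid` and `amortized_seconds_per_tile` are hypothetical helpers, and any timings you pass in are your own parameters, not measurements.

```python
def pyramid(z, x, y, max_zoom):
    """Yield every tile coordinate under (z, x, y), from zoom z to max_zoom inclusive."""
    for zoom in range(z, max_zoom + 1):
        scale = 1 << (zoom - z)  # each zoom level doubles the tile count per axis
        for dx in range(scale):
            for dy in range(scale):
                yield (zoom, x * scale + dx, y * scale + dy)


def amortized_seconds_per_tile(index_cost, per_tile_cost, n_tiles):
    """Average cost per tile when one up-front index build is shared by n_tiles."""
    return index_cost / n_tiles + per_tile_cost


tiles = list(pyramid(10, 768, 442, 13))
print(len(tiles))  # 85
```

The larger the group of tiles sharing one RAWR tile, the closer the per-tile cost gets to `per_tile_cost` alone.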

@nvkelso
Member

nvkelso commented Jul 3, 2018

Triaged some of this into the wiki; please follow up with additional questions.

@nvkelso nvkelso closed this as completed Jul 3, 2018
@KAI10
Author

KAI10 commented Jul 8, 2018

Thanks for the detailed explanation. As a next step, how do I run the process so that tiles belonging to a single RAWR tile are generated together and share the indexing cost? Here's what I tried:

I used the enqueue-tiles-of-interest-pyramids command to create a queue (of type file). The queue seemed to contain tile coordinates grouped hierarchically by zoom level. I copied one such group into a file, tile_queue_file_group.txt, and used it as the queue. Then I ran the process command with use-rawr-tiles set to true.

However, each tile seems to take a long time to generate, whereas I expected generation to get fast from the second tile onward.

@zerebubuth
Member

That sounds like it should work, but there are a couple of things that might be complicating the process.

  1. To keep tiles together with other tiles generated from the same RAWR tile, the code enqueues groups of tile coordinates in the same message payload. For the file type of queue, each payload is a separate line and this means that the file should have multiple coordinates (written z/x/y) per line, separated by commas. If you only see one coordinate per line, then you might need to switch to the multiple message marshaller in the message-marshall section of the config.
  2. Lower zoom tiles generally take longer to process than higher zoom ones because there's more data in them, and there's much more processing done on that data. To test this, you could try deleting the lower zoom tile coordinates in the tile_queue_file_group.txt and seeing if the higher ones are generated as fast as you think they should be.
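Point 1 says a file queue should carry one group per line, with z/x/y coordinates separated by commas. The Python sketch below reads and writes lines in that shape, groups coordinates by the z10 RAWR tile that contains them, and drops low zooms for the experiment in point 2. The helper names are hypothetical, and the exact output of tilequeue's multiple message marshaller (for example, whether it ever adds spaces) is an assumption here, so check it against a file the marshaller actually produced.

```python
from collections import defaultdict


def parse_queue_line(line):
    """Parse one queue-file line such as '10/768/442,11/1536/884' into (z, x, y) tuples."""
    coords = []
    for part in line.strip().split(","):
        part = part.strip()
        if not part:
            continue  # tolerate a trailing comma
        z, x, y = (int(v) for v in part.split("/"))
        coords.append((z, x, y))
    return coords


def format_queue_line(coords):
    """Join coordinates into a single comma-separated line (one message payload)."""
    return ",".join(f"{z}/{x}/{y}" for z, x, y in coords)


def group_by_rawr_tile(coords, rawr_zoom=10):
    """Group tile coordinates by the RAWR tile at rawr_zoom that contains them."""
    groups = defaultdict(list)
    for z, x, y in coords:
        if z < rawr_zoom:
            raise ValueError(f"{z}/{x}/{y} is above the RAWR zoom {rawr_zoom}")
        shift = z - rawr_zoom
        groups[(rawr_zoom, x >> shift, y >> shift)].append((z, x, y))
    return dict(groups)


def drop_low_zooms(coords, min_zoom):
    """Keep only coordinates at or above min_zoom (the test suggested in point 2)."""
    return [c for c in coords if c[0] >= min_zoom]
```

If every line of a queue file parses to a single coordinate, the single-message marshaller is probably in use, and tiles from the same RAWR tile will not be processed together.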

The queueing system in tilequeue is what we used to use, but we've since switched to AWS Batch, which doesn't use the same queueing system. Instead of grouping messages in queues, we group whole pyramids of tiles together into a "job". We generally group at z7, and each job is run using the tilequeue meta-tile command. To use this to generate all the tiles from a z10 RAWR tile, I would set the batch.queue-zoom config setting to 10 and run tilequeue meta-tile --config config.yaml --tile z/x/y with the z/x/y of the RAWR tile. That should ensure that the RAWR tile is indexed only once.
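The last step can be scripted. The Python sketch below finds the z10 RAWR tile containing any given tile and builds the `tilequeue meta-tile --config config.yaml --tile z/x/y` command quoted above. It assumes batch.queue-zoom is already set to 10 in config.yaml and that the `tilequeue` executable is on PATH; `rawr_parent` and `meta_tile_command` are hypothetical helper names.

```python
import shlex


def rawr_parent(z, x, y, rawr_zoom=10):
    """Return the coordinate of the RAWR tile (at rawr_zoom) that contains z/x/y."""
    if z < rawr_zoom:
        raise ValueError("tile must be at or below the RAWR zoom")
    shift = z - rawr_zoom
    return rawr_zoom, x >> shift, y >> shift


def meta_tile_command(config_path, z, x, y):
    """Build the argv for one meta-tile run covering the whole RAWR tile."""
    rz, rx, ry = rawr_parent(z, x, y)
    return ["tilequeue", "meta-tile", "--config", config_path,
            "--tile", f"{rz}/{rx}/{ry}"]


if __name__ == "__main__":
    cmd = meta_tile_command("config.yaml", 12, 3073, 1769)
    print(shlex.join(cmd))
    # prints: tilequeue meta-tile --config config.yaml --tile 10/768/442
```

To actually launch the job, pass the list to `subprocess.run(cmd, check=True)`; building an argument list rather than a shell string avoids quoting problems.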

If it's still slow, you could try putting a breakpoint or print statement on the RAWR tile index generating code to check if it's being called multiple times.
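One way to add that print statement without editing the function body is a small counting decorator. The thread does not name the function that builds the RAWR index, so `build_rawr_index` below is a hypothetical placeholder; in practice you would wrap (or monkeypatch) whichever function your tilequeue version uses for that step.

```python
import functools
import time


def count_calls(func):
    """Wrap func so that each call is counted, timed, and printed."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            elapsed = time.perf_counter() - start
            print(f"{func.__name__} call #{wrapper.calls} took {elapsed:.3f}s")
    wrapper.calls = 0
    return wrapper


@count_calls
def build_rawr_index(rawr_coord):
    """Hypothetical placeholder for the real RAWR indexing function."""
    return {"coord": rawr_coord}
```

If the call count keeps climbing for tiles that share one RAWR tile, the index is being rebuilt each time instead of reused.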

Hope that helps!

@KAI10
Author

KAI10 commented Jul 12, 2018

Successfully batch-generated vector tiles from RAWR tiles with a reasonable run time. I had the issue you mentioned in point 1.


3 participants