Adjust tile rank for transit stations per railway type #506

Closed
nvkelso opened this issue Jan 22, 2016 · 26 comments

@nvkelso
Member

nvkelso commented Jan 22, 2016

In London, kind_tile_rank is sometimes biased towards lower-capacity services like light rail and trams, to the detriment of larger stations served by "heavy" rail. The same thing sometimes happens in New York: compare Manhattan, where only heavy rail stations appear, with New Jersey, where many light rail stations suddenly appear. There are probably two factors at work here: tile versus metatile, and railway type. This issue focuses only on railway type.

Since OSM doesn't have trip-frequency data, route relations act as a good proxy, but more physical infrastructure investment should also act as a signal that there are probably more trips passing through a station.

On railway type, from @zerebubuth:

If we want to count subway routes separately from overground routes, then we can do that. Many (although not all) route relations are tagged to say whether they're surface, subway, light rail or tram lines. It just seems like this makes comparing stations harder (e.g.: which is more important, the one with 4 surface lines and 2 subway lines, or the one with 2 surface lines, 1 light rail line and 3 tram lines?).

The proposed solution is to determine the "preferred" / "best" / "dominant" transit type for each station and also sort on it as a tie-breaker (maybe weighting it, or bringing zoom levels into the mix), so that the order is:

  • heavy rail station A with 2 lines, then light rail station B with 2 lines

not

  • light rail station B with 2 lines, then heavy rail station A with 2 lines.
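A minimal sketch of that tie-break, assuming each station already has a route count and a single "dominant" type. The `Station` class, the `dominant_type` field, and the `TYPE_PRIORITY` ordering (using OSM's route=train for heavy/mainline rail) are hypothetical illustrations, not part of the existing code.

```python
from dataclasses import dataclass

# Hypothetical priority: lower number = more "dominant" service type.
TYPE_PRIORITY = {"train": 0, "subway": 1, "light_rail": 2, "tram": 3}


@dataclass
class Station:
    name: str
    num_routes: int
    dominant_type: str  # hypothetical: the station's "best" service type


def sort_stations(stations):
    """More routes first; among equal counts, the more dominant type first."""
    unknown = len(TYPE_PRIORITY)  # unlisted types sort last
    return sorted(
        stations,
        key=lambda s: (-s.num_routes, TYPE_PRIORITY.get(s.dominant_type, unknown)),
    )


ordered = sort_stations([Station("B", 2, "light_rail"), Station("A", 2, "train")])
print([s.name for s in ordered])  # ['A', 'B']
```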

London, 12/51.5093/-0.0760 (zoom/lat/lon)

[Screenshot: 2016-01-21 16:12:18]

@zerebubuth
Member

One possible way to do it, which feels like it might be quite intuitive, is to take:

  1. The number of mainline rail routes, up to a maximum of 9 and multiply by 100.
  2. The number of subway plus light rail routes, up to a total maximum of 9 and multiply by 10.
  3. The number of tram / other routes, up to a maximum of 9.

Sum these, and we'll get a score for each station between 0 and 999 (it would be interesting to see if any station in the world gets 999). That can then be the value used for sorting to produce the final rank.
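The scoring rule above, written out as a sketch. The function names `transit_score` and `rank_stations` are hypothetical; the real implementation was to be ported to SQL. Each count is clamped to 0-9 so it occupies one decimal digit, which is what makes the sum fall between 0 and 999.

```python
from __future__ import annotations


def _cap(n: int) -> int:
    """Clamp a route count to a single decimal digit (0-9)."""
    return max(0, min(n, 9))


def transit_score(mainline: int, subway_light_rail: int, tram_other: int) -> int:
    """Pack three capped route counts into one 0-999 score.

    Hundreds digit: mainline rail routes; tens digit: subway + light rail
    routes; units digit: tram and other routes.
    """
    return 100 * _cap(mainline) + 10 * _cap(subway_light_rail) + _cap(tram_other)


def rank_stations(counts: dict[str, tuple[int, int, int]]) -> list[tuple[int, str, int]]:
    """Return (rank, name, score) tuples, highest score first (rank 1)."""
    scored = [(name, transit_score(*c)) for name, c in counts.items()]
    scored.sort(key=lambda item: item[1], reverse=True)  # stable for ties
    return [(i + 1, name, score) for i, (name, score) in enumerate(scored)]


# Hypothetical stations: one mainline route beats nine light rail routes.
print(rank_stations({"light rail hub": (0, 9, 9), "mainline stop": (1, 0, 0)}))
# [(1, 'mainline stop', 100), (2, 'light rail hub', 99)]
```

Because each category lives in its own digit, no number of lower-category routes can outweigh a single higher-category route; that is the property the next paragraph questions.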

Alternatively, if we think that something with 9 light rail routes should be more important than a single mainline route, then we could add them together after scaling, say 6 * mainline + 2 * light + others or something to that effect.
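A sketch of the linear alternative, using the example weights from the comment (6, 2, 1). The function name and default-weight parameters are hypothetical. Because the scaled counts are summed rather than packed into digits, enough light rail routes can overtake a mainline route.

```python
def weighted_score(mainline: int, light: int, others: int,
                   w_mainline: int = 6, w_light: int = 2, w_others: int = 1) -> int:
    """Linear score: weighted route counts are simply added together."""
    return w_mainline * mainline + w_light * light + w_others * others


# 9 light rail routes vs. a single mainline route:
print(weighted_score(0, 9, 0), weighted_score(1, 0, 0))  # 18 6
```

Under the digit-packed score the same comparison is 90 vs. 100, so the choice between the two schemes is really a choice about whether categories should be strictly ordered.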

Finally, a few observations about London:

  1. What's that "tear" near West Silvertown near the river to the east?
  2. Almost no stations are showing in the centre of London - are they being suppressed by the road labels? Even if the correct stations aren't showing, I'd expect some stations to show.
  3. I forgot that in production we use the "flat nodes" functionality of osm2pgsql, so raw nodes aren't stored in the database (I don't use that function on my local machine). I'll need to patch the function to try joining on the planet_osm_point table instead. Tracking that in #507 ("Use points and nodes when exploring transit relations").

@nvkelso
Member Author

nvkelso commented Jan 22, 2016

I dig the proposed solution, let's take it for a spin.

Digging into your comments about London:

  1. The "tear" is, I think, an areal way with some Tangram rendering artifacts. Brett is already on it (we saw similar in Hong Kong a couple weeks ago).
  2. Road labels don't interact with station labels right now (station labels are "on top of" road labels). It looks like it's being compounded by a filter that worked in the USA but not in London: if the station doesn't have an "area" then it's considered less important. Maybe that should be a contributing factor server side influencing the kind_tile_rank, but it doesn't seem useful by itself client side?

Some images below, with kind_tile_rank labeled and stations below the threshold shown as tiny icons (and with the area filter removed, which restores Waterloo and a couple more stations).

Right now, neighbouring stations that share a couple of lines for a short run pop out and get a better rank. This is true for both subway and light rail, and because there are several light rail routes, light rail stations are blowing out nearby subway stations that have just 1 or 2 routes.

[Screenshots: 2016-01-22 09:42:45, 09:42:36, 09:42:28, 09:42:05]

@zerebubuth
Member

I think the fix in #508 will have a big effect here.

Re: areas, I think that's a good tie-break rule, and it'll be more effective once #348 is done. For the moment, it might do more harm than good, as there are a lot of major stations mapped as nodes (Penn Station in NYC, for example).

@nvkelso nvkelso removed the ready label Feb 4, 2016
@nvkelso nvkelso modified the milestones: v0.9.0, v0.8.0 Feb 12, 2016
@nvkelso
Member Author

nvkelso commented Mar 8, 2016

Looks like things are better (thanks to #508), but big rail stations are still missing or ranked too low. Let's pursue the options @zerebubuth describes above for this milestone.

Testing scene file: tangram-skin-and-bones.zip

The same screenshot areas as above:

[Screenshots: 2016-03-08 15:35:43, 15:35:02, 15:34:47, 15:34:27]

@nvkelso
Member Author

nvkelso commented Mar 8, 2016

We had been talking in another issue about making the kind more specific (subway, light_rail, etc., not just station). The lookup process seems like it would be the same, so should we store a new "service" property on the feature as part of the kind_tile_rank calculation?

@nvkelso nvkelso added the ready label Mar 8, 2016
@zerebubuth
Copy link
Member

A couple of points here:

London isn't a great place to be testing public transit stuff in OSM. When I was exploring the data, it wasn't one of the cities with the most well-mapped public transit relations. Paris or Berlin would be better, and perhaps we can set aside some time to fix the data in London / NYC?

Public transit relations often span the overground and underground (subway) stations, as well as having light rail and/or international services. This means that a single kind or service tag doesn't fully represent the services at a station. A couple of possible solutions to that:

  • a service hierarchy (which would limit styling options in the case that the user prioritises subway / light rail over international services)
  • or a services=[...] list (which isn't supported by all formats)

@zerebubuth
Member

A third option would be to add boolean flags, e.g. is_light_rail, is_subway, is_international. The downside is that it's quite verbose, but it is supported by all formats, and railway stations are fairly rare features.
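To compare the three options side by side, here is a sketch that derives all of them from the OSM route=* values of the routes serving a station. The `HIERARCHY` order and the `station_properties` function are hypothetical; `is_international` is passed in because it can't be inferred from the route type alone.

```python
# Hypothetical order for the "service hierarchy" option (first match wins).
HIERARCHY = ["train", "subway", "light_rail", "tram"]


def station_properties(route_types, is_international=False):
    """route_types: OSM route=* values of the routes serving one station."""
    present = set(route_types)
    return {
        # Option 1: a single service picked from a fixed hierarchy.
        "service": next((r for r in HIERARCHY if r in present), None),
        # Option 2: a list of services (not supported by every output format).
        "services": sorted(present),
        # Option 3: boolean flags, verbose but representable in any format.
        "is_subway": "subway" in present,
        "is_light_rail": "light_rail" in present,
        "is_international": is_international,  # needs data beyond route type
    }


print(station_properties(["train", "subway", "subway", "light_rail"]))
```

Option 1 loses information (the station above reports only "train"), which is the styling limitation described for the hierarchy approach.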

@zerebubuth zerebubuth added in review and removed ready labels Mar 9, 2016
@zerebubuth
Copy link
Member

Here are the top 20 stations (or halts, tram stops, aerialway stations) in London according to the scoring system set out in my earlier comment:

933 n3662847634 station "London St. Pancras"
933 n3637436408 station "London Liverpool Street"
933 n119274464  station "King's Cross St. Pancras"
631 n441947584  station "Victoria"
631 n3768010552 station "London Victoria"
610 w140162700  station "Stratford"
610 w140162700  station "Stratford"
600 w170130337  station "West Ham"
510 w72570259   station "Queen's Park"
510 n3663368461 station "Euston"
510 n3663368460 station "London Euston"
500 w25025418   station "Canning Town"
431 w302026559  station "London Paddington"
431 r204439     station "London Paddington"
431 n1264482797 station "Paddington (District, Circle, and Bakerloo Lines)"
420 w289193423  station "Whitechapel"
420 n2426825788 station "Whitechapel"
410 w198353814  station "Highbury & Islington"
401 n3757075887 station "Wandsworth Common"
400 w41599363   station "Canada Water"

For comparison, here are the ones I wrote down before starting that I'd expect to see at the top (in no particular order). Those not in the top 20 above are followed by their position in the full list:

  • Paddington
  • Waterloo (114th / 612)
  • Victoria
  • London Bridge (233rd / 612)
  • Liverpool Street
  • King's Cross / St. Pancras
  • Euston
  • Marylebone (59th / 612)
  • Farringdon (560th / 612)

The remaining issues seem to be mostly data:

  1. Only one of Waterloo's mainline rail routes is detected, because only one is mapped (the yellow highlight on ÖPNVKarte).
  2. None of London Bridge's mainline routes are detected, for a similar reason.
  3. Marylebone detects a couple of mainline routes and a subway route, and comes in at 59th. It probably just needs more of its routes mapped, as I'm pretty sure it serves more than 2 routes.
  4. Farringdon gets a score of zero because the transport relations are broken and there's no way to "get to" the routes from the station node.

I've played around with the scoring function a little; my intuition is that an interchange station is more important than one that serves only rail routes or only subway routes. But that seems hard to quantify.

@zerebubuth
Copy link
Member

I should add: there are duplicates in that list, but we'll try to get rid of those "in post" using the site or stop_area relations which connect them.
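One way the "in post" de-duplication could work, sketched under the assumption that each row already carries the id of the site or stop_area relation that connects it (a root relation id, or `None` if there isn't one). The row layout and `dedupe_by_relation` are hypothetical; ties keep the first row seen.

```python
def dedupe_by_relation(rows):
    """rows: (osm_id, score, root_relation_id or None, name) tuples.

    Keeps the highest-scoring row per root relation; rows with no
    relation can't be matched to anything, so they are all kept.
    """
    best = {}
    unmatched = []
    for row in rows:
        relation = row[2]
        if relation is None:
            unmatched.append(row)
        elif relation not in best or row[1] > best[relation][1]:
            best[relation] = row
    return sorted(list(best.values()) + unmatched, key=lambda r: r[1], reverse=True)


rows = [
    (441947584, 961, 1572442, "Victoria"),
    (3768010552, 961, 1572442, "London Victoria"),
    (18036791, 620, None, "Poplar"),
]
for row in dedupe_by_relation(rows):
    print(row)
```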

@nvkelso
Member Author

nvkelso commented Mar 11, 2016

Yes! I was thinking the same thing in #587 regarding interchange stations (and even terminal stations). Double their score?

If one of these is a "transfer" station where two lines intersect (rather than running in parallel through a sequence of shared stations), I could see zoom 12. And if it's the terminal station on the line, then zoom 11.
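The zoom suggestion above, expressed as a sketch. `station_min_zoom` is a hypothetical helper, and `default_zoom=13` is only a placeholder for stations that are neither transfers nor terminals; the thread doesn't specify that value. A terminal that is also a transfer takes the earlier zoom.

```python
def station_min_zoom(is_transfer: bool, is_terminal: bool, default_zoom: int = 13) -> int:
    """Terminal stations from z11, transfer stations from z12, others from default_zoom."""
    if is_terminal:
        return 11
    if is_transfer:
        return 12
    return default_zoom  # placeholder, not from the thread


print(station_min_zoom(is_transfer=True, is_terminal=False))  # 12
```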

@zerebubuth
Member

Here's the top 20 with the first two digits doubled if they're both non-zero:

963     n3662847634     station "London St. Pancras"
963     n3637436408     station "London Liverpool Street"
963     n119274464      station "King's Cross St. Pancras"
961     n441947584      station "Victoria"
961     n3768010552     station "London Victoria"
920     w72570259       station "Queen's Park"
920     w140162700      station "Stratford"
920     w140162700      station "Stratford"
920     n3663368461     station "Euston"
920     n3663368460     station "London Euston"
861     w302026559      station "London Paddington"
861     r204439         station "London Paddington"
861     n1264482797     station "Paddington (District, Circle, and Bakerloo Lines)"
840     w289193423      station "Whitechapel"
840     n2426825788     station "Whitechapel"
820     w198353814      station "Highbury & Islington"
623     n307542330      tram_stop       "Wimbledon"
623     n18089211       station "Wimbledon"
621     r3791635        station "Willesden Junction"
620     n18036791       station "Poplar"

It's very similar to the last one, except we've lost West Ham, Canning Town, Wandsworth Common and Canada Water and gained Wimbledon (x2), Willesden Junction and Poplar. Which, on the whole, seems like an improvement.
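The doubling rule can be written as a function on the packed score. Note that the cap at 9 is inferred from the tables (631 becomes 961, so 6 * 2 is clamped to 9) rather than stated in the thread; `interchange_score` is a hypothetical name.

```python
def interchange_score(score: int) -> int:
    """Double the mainline and subway digits (capped at 9) if both are non-zero."""
    mainline, subway, tram = score // 100, (score // 10) % 10, score % 10
    if mainline and subway:
        mainline, subway = min(2 * mainline, 9), min(2 * subway, 9)
    return 100 * mainline + 10 * subway + tram


for before in (933, 631, 610, 431, 313, 600):
    print(before, interchange_score(before))
```

This reproduces the movements between the two lists: St. Pancras goes 933 to 963, Paddington 431 to 861, and Wimbledon 313 to 623, while West Ham (600) is unchanged because it has no subway digit.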

I'll start porting this to SQL so we can try it out on dev.

@nvkelso
Member Author

nvkelso commented Mar 11, 2016

Related: add osm_site_relation: #590

@nvkelso
Member Author

nvkelso commented Mar 11, 2016

Related: #587.

@zerebubuth
Member

Progress dump.

Note: For the screenshots below, I had to turn off all filtering of stations by area, and drop the min_zoom for all station queries to 10 (see 1a2f07a). This might not be such a bad thing, as stations are relatively rare and we can drop them based on kind_tile_rank before the client ever sees them at lower zooms. Alternatively, we can base the min_zoom calculation on the transit score instead.
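A sketch of dropping stations server-side by kind_tile_rank within a tile, so low-ranked stations never reach the client at low zooms. `kind_tile_rank` here is a hypothetical re-implementation for illustration, and `max_rank` stands in for some per-zoom cap that the thread doesn't define. The example scores come from the London top-20 table below.

```python
def kind_tile_rank(stations):
    """stations: (osm_id, score) pairs in one tile -> {osm_id: rank}, 1 = best."""
    ordered = sorted(stations, key=lambda s: s[1], reverse=True)
    return {osm_id: rank for rank, (osm_id, _) in enumerate(ordered, start=1)}


def drop_low_ranked(stations, max_rank):
    """Keep only stations whose rank is at most max_rank (hypothetical per-zoom cap)."""
    ranks = kind_tile_rank(stations)
    return [s for s in stations if ranks[s[0]] <= max_rank]


tile = [(18036791, 620), (441947584, 961), (18036779, 620)]  # Poplar, Victoria, Westferry
print(drop_low_ranked(tile, max_rank=1))  # [(441947584, 961)]
```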

London, z11

[Screenshot: 506-z11-london-1]

London, z12

[Screenshot: 506-z12-london-1]

Top 20 data:

London

 typ  |   osm_id   | score | root_relation_id |           name           
------+------------+-------+------------------+--------------------------
 node |  441947584 |   961 |          1572442 | Victoria
 node | 3768010552 |   961 |          1572442 | London Victoria
 way  |   72570259 |   920 |          3803730 | Queen's Park
 node | 3663368460 |   920 |          1569858 | London Euston
 node |  963506307 |   920 |           199275 | Wembley Central
 way  |  140162700 |   920 |           297366 | Stratford
 node |   10287073 |   920 |           199275 | Wembley Central
 node | 3663368461 |   920 |          1569858 | Euston
 way  |  289193423 |   840 |          3155934 | Whitechapel
 node | 2426825788 |   840 |          3155934 | Whitechapel
 way  |  198353814 |   820 |           206255 | Highbury & Islington
 node | 3662847634 |   663 |          1569873 | London St. Pancras
 node |  119274464 |   663 |          1569873 | King's Cross St. Pancras
 way  |    4959629 |   663 |          1571791 | Liverpool Street
 node | 3637436408 |   663 |          1571791 | London Liverpool Street
 way  |   -3791635 |   621 |          3791636 | Willesden Junction
 way  |   -3791635 |   621 |          3791636 | Willesden Junction
 node |   18036779 |   620 |                  | Westferry
 node |   18036791 |   620 |                  | Poplar
 node |   18020039 |   620 |                  | Limehouse

A couple of odd things here:

  1. The high rank of Queen's Park is because it has a part-time stop on a national rail route. As far as I can tell, this "part-time" is basically never, so this might be considered a data issue. Wembley Central is similar: the station node is attached to the railway along which these national rail routes run, but they don't actually stop at Wembley. One fix might be to be stricter about the roles through which routes are discovered, but this would have unwanted side effects on other stations.
  2. The high rank of the DLR stations (Westferry, Poplar, Limehouse) is because 3 of the DLR routes were tagged as being rail routes. I've since fixed the data to make them light_rail routes instead.

NYC

 typ  |   osm_id   | score | root_relation_id |                       name                        
------+------------+-------+------------------+---------------------------------------------------
 node |  895371274 |   902 |                  | New York Penn Station
 way  |  265947358 |   302 |          4637816 | Grand Central Terminal
 node | 2024882763 |   260 |                  | 14th Street (1,2,3)
 node |  597928313 |    70 |                  | 59th Street - Columbus Circle (A,B,C,D,1)
 node |  544356374 |    70 |                  | 34th Street-Herald Square (B,D,F,M,N,Q,R)
 node | 2024979388 |    60 |          2917356 | 59th Street (4,5,6)
 node |  597928309 |    60 |                  | West 4th Street-Washington Square (A,B,C,D,E,F,M)
 node |  597928318 |    60 |          2917356 | 59th Street-Lexington Avenue (N,Q,R)
 node |  591998010 |    50 |          2917492 | 53rd Street-Lexington Avenue (E,M)
 node |  597928319 |    50 |          2917492 | 51st Street (6)
 node |  597928317 |    50 |                  | 23rd Street (F,M,PATH)
 node |  591997657 |    50 |                  | 42nd Street-Grand Central (S,4,5,6,7)
 node |  597928308 |    50 |          3420745 | 14th Street (F,M,PATH)
 node |  597928312 |    40 |                  | Delancey Street-Essex Street (F,J,M,Z)
 node |  597928316 |    40 |                  | 42nd Street - Bryant Park (B,D,F,M)
 node |  597928314 |    40 |                  | 47th-50th Streets - Rockefeller Center (B,D,F,M)
 node | 1692394907 |    40 |                  | Fulton Street-Broadway Nassau (A,C,J,Z)
 node | 2013903253 |    40 |          3420790 | 14th Street-8th Avenue (A,C,E,L)
 node |  591995518 |    40 |                  | Broadway-Lafayette Street (B,D,F,M)
 node | 2052618474 |    40 |          3420670 | 14th Street-Union Square (4,5,6)

NYC doesn't have many station relations, so it looks like it's mostly sorting the mainline stations first. The only interchange station is 14th Street, because of a suspect-looking relation. @rmarianski, are there any non-subway routes running through 14th Street?

San Francisco

 typ  |   osm_id   | score | root_relation_id |                  name                   
------+------------+-------+------------------+-----------------------------------------
 way  |   28295229 |   200 |                  | San Francisco 4th & King Street Station
 node | 2160213364 |   100 |                  | 22nd Street
 node | 2160213345 |   100 |                  | Bayshore
 node | 1723738813 |    60 |                  | Embarcadero
 node |  297863037 |    60 |                  | Van Ness Muni
 node |  313885839 |    40 |                  | West Portal Muni
 node |  297863017 |    40 |          1069465 | Castro Muni
 node |  301506011 |    40 |          1069835 | Church St Muni
 node | 1564479547 |    40 |                  | West Portal Muni
 way  |  159782838 |     0 |                  | Balboa Park
 way  |  101546361 |     0 |                  | Daly City
 way  |  136822046 |     0 |                  | Forest Hill
 node | 2150077208 |     0 |                  | 24th Street Mission
 way  |  132246043 |     0 |                  | Glen Park
 node | 2150077196 |     0 |                  | Civic Center / UN Plaza
 node |  301506348 |     0 |                  | Forest Hill Muni
 node | 2150077204 |     0 |                  | Powell Street
 node | 2150077198 |     0 |                  | Embarcadero
 node | 2150077201 |     0 |                  | Montgomery Street
 node | 2150077206 |     0 |                  | 16th Street Mission

There's a lot of opportunity for improvement in San Francisco!

@nvkelso
Member Author

nvkelso commented Mar 24, 2016

Change is looking fantastic in NYC, London, Paris! As you note, SF is so-so but that's because it needs more linking-up on the data side. There aren't any regressions over v0.8 prod so we're good.

One question:

I expected two Waterloo stations in the zoom 16 tile, one for rail and one for subway, but I don't see the subway station (3638795618). Are we de-duplicating features too aggressively? Perhaps they split apart at zoom 17 when this was originally implemented in the earlier release, but now that we're "meta-tiling" at zoom 16 we need to adjust the queries.yaml end_zoom? Starting at zoom 14 I expect to see individual features. Is it as simple as changing queries.yaml, or does other logic need to change?

[Screenshot: 2016-03-24 02:12:43]

@zerebubuth
Copy link
Member

I think zoom 14 is far too early to start showing individual features. At zoom 14, the two Waterloo station points are 14px apart, which is just enough for two confusingly tiny 12px icons to be drawn (but neither labelled). Even if one had a label, it would be confusing as to which point was being labelled.

Zoom 15 is not much better, and I think it's probably best to split the features at z16. I'm not quite sure whether you're saying that's never happening since we stopped generating >z16 tiles or not?
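A back-of-envelope check on the pixel spacing, assuming 256px Web Mercator tiles (with 512px tiles every distance doubles). The ground gap is backed out from the quoted 14px at z14 at London's latitude (about 51.5N) rather than measured; the helper names are hypothetical.

```python
import math

EARTH_CIRCUMFERENCE_M = 40075016.686  # equatorial circumference, Web Mercator sphere


def metres_per_pixel(lat_deg: float, zoom: int, tile_size: int = 256) -> float:
    """Ground resolution at a given latitude and zoom."""
    return EARTH_CIRCUMFERENCE_M * math.cos(math.radians(lat_deg)) / (tile_size * 2 ** zoom)


def pixel_distance(metres: float, lat_deg: float, zoom: int, tile_size: int = 256) -> float:
    return metres / metres_per_pixel(lat_deg, zoom, tile_size)


# Ground gap implied by "14px apart at zoom 14" (roughly 80 m).
gap_m = 14 * metres_per_pixel(51.5, 14)
for z in (14, 15, 16):
    print(z, round(pixel_distance(gap_m, 51.5, z)))  # 14, 28, 56 px
```

Since each zoom level doubles the spacing, z15 gives about 28px and z16 about 56px, which is what separates "two confusingly tiny icons" from room for distinct, labelled icons.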

@nvkelso
Member Author

nvkelso commented Mar 24, 2016

Sure, let's give zoom 15 a try (to stop de-duplicating). There's certainly room to place multiple icons by then, and it looks weird / broken not to show them when everyone else does (labeling is labeling, shrug). It looks like the same problem is happening at Euston?

[Screenshots: 2016-03-24 08:18:54, 08:16:14, 08:16:05, 08:14:05, 08:13:47, 08:13:34]

@nvkelso
Member Author

nvkelso commented Mar 24, 2016

Everything else looks good, and this isn't causing a regression. Let's pick up the zoom 15+ stuff in the next release with #637.
