feat(parsers) Parser registry #723

mbredif · 2018-04-11T10:56:11Z

Description

This PR introduces a registry of format parsers in the scheduler, similar to the protocol registry that is already there.

Motivation and Context

The goal is to decouple format parsing from the providers, so that protocols, formats and styling may be mixed and matched (instead of current providers that hardcode formats).

(extracted from #705)

zarov

Even if it doesn't really do anything on its own, as you said, I'm interested in having this merged. Thanks for this!

zarov · 2018-04-11T13:58:49Z

src/Core/Scheduler/Scheduler.js

@@ -175,6 +202,18 @@ Scheduler.prototype.getProtocolProvider = function getProtocolProvider(protocol)
    return this.providers[protocol];
 };

+
+Scheduler.prototype.addFormatParser = function addParser(format, parser) {


Why not take an array of formats, so we could register many formats for one parser?

No strong opinion on this... the parser is provided as a class rather than an instance, so the change is only cosmetic.

done, parsers now have mimetypes, extensions and format strings that are all used as registration keys
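
To illustrate the registration scheme described above, here is a minimal sketch in which one parser is stored under its format name, its mimetypes and its extensions. `ParserRegistry`, `getFormatParser` and the metadata values of `GeoJsonParser` are assumptions made for the example, not the code of this PR.

```javascript
// Hypothetical stand-in for the Scheduler's parser registry (not the PR's code).
function ParserRegistry() {
    // Map from registration key to the list of parsers registered under it.
    this.parsers = new Map();
}

// Register a parser under its format name, its mimetypes and its extensions.
ParserRegistry.prototype.addFormatParser = function addFormatParser(parser) {
    const register = (key) => {
        if (!this.parsers.has(key)) {
            this.parsers.set(key, []);
        }
        const list = this.parsers.get(key);
        // Avoid duplicates when a format name is also listed as an extension.
        if (list.indexOf(parser) < 0) {
            list.push(parser);
        }
    };
    register(parser.format);
    (parser.mimetypes || []).forEach(register);
    (parser.extensions || []).forEach(register);
};

// Return the first parser registered under a key, or undefined.
ParserRegistry.prototype.getFormatParser = function getFormatParser(key) {
    const candidates = this.parsers.get(key);
    return candidates ? candidates[0] : undefined;
};

// Hypothetical parser description; the metadata values are illustrative.
const GeoJsonParser = {
    format: 'geojson',
    mimetypes: ['application/json'],
    extensions: ['json', 'geojson'],
    parse: text => JSON.parse(text),
};

const registry = new ParserRegistry();
registry.addFormatParser(GeoJsonParser);
```

With this shape, `registry.getFormatParser('json')` and `registry.getFormatParser('application/json')` resolve to the same parser object.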

zarov · 2018-04-12T08:02:01Z

src/Core/Scheduler/Scheduler.js

+import B3dmParser from '../../Parser/B3dmParser';
+import PotreeBinParser from '../../Parser/PotreeBinParser';
+import PotreeCinParser from '../../Parser/PotreeCinParser';
+// import GLTFLoader from '../../Parser/GLTFLoader';


Do you plan to support it before merging ? If not, please remove those comments.

Ok, I will remove them

autra · 2018-04-12T08:11:38Z

What I meant by "parser needs more work" in the other PR: as we create a registry, we should use it in other providers instead of referencing parsers directly. This will make it possible to use different parsers in a provider according to the needed output format. This work should be done in this PR to avoid an inconsistent state in the codebase (I don't expect it to be too complicated anyway). Thanks!

zarov · 2018-04-12T08:16:39Z

we should use it in other providers instead of referencing parsers directly.

Even better: why not move them out of providers, and have something like this:

return provider.executeCommand(cmd)
    .then(blob => parser.parse(blob, options))
    .then((result) => {
...

The problem here could be that we don't have anything to parse for textures, but a "fake" parser for images could be created: parser = { parse: blob => blob };

I can also see (later on) having the Fetcher getting introduced between the first and second line ;)
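
A self-contained sketch of that chain, with the identity parser used as a fallback, could look as follows. The `execute` function, the `parsers` lookup table, the `fakeProvider` object and the shape of `cmd` are hypothetical and only serve the illustration.

```javascript
// Identity parser for data that needs no parsing (e.g. an image used as a texture).
const identityParser = { parse: blob => blob };

// Hypothetical lookup table keyed by format name.
const parsers = new Map([
    ['json', { parse: text => JSON.parse(text) }],
]);

// Run the provider, then hand its raw result to the parser selected by the layer format.
function execute(provider, cmd) {
    const parser = parsers.get(cmd.layer.format) || identityParser;
    return provider.executeCommand(cmd)
        .then(blob => parser.parse(blob, cmd.layer.options));
}

// Fake provider resolving with a raw payload, standing in for a real one.
const fakeProvider = {
    executeCommand: cmd => Promise.resolve(cmd.payload),
};

// A layer with a known format is parsed; an unknown format passes through untouched.
const parsed = execute(fakeProvider, { layer: { format: 'json' }, payload: '{"a":1}' });
const untouched = execute(fakeProvider, { layer: { format: 'image/png' }, payload: 'raw-bytes' });
```

Keeping the fallback as a regular parser object means the chain never needs a special case for layers without a parser.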

zarov · 2018-04-12T08:38:55Z

I quickly added my thoughts here zarov@cc623b6

elemoine · 2018-04-12T09:06:07Z

This PR uses the scheduler as registry for parsers, but it doesn't use the scheduler for the parsing itself. That doesn't make too much sense to me.

I quickly added my thoughts here zarov/itowns@cc623b6

That may be just me but I actually don't see the value of doing the parsing in the scheduler. What is the benefit compared to doing it in the provider? I understand that the core thing of the scheduler is the priority queue. Here for the parsing the priority queue is not used, so what's the point of doing the parsing within the scheduler?

zarov · 2018-04-12T09:11:31Z

@elemoine the idea is to further separate the things we currently have in iTowns: instead of adding everything in providers, we could have a simpler chain, managed here in the scheduler.

Provider -> (Fetcher ->) Parser -> Stylizer

I think having those blocks called somewhere will be a pain to maintain later, and moving it all outside of providers makes things easier. See the discussion in #182

elemoine · 2018-04-12T09:23:34Z

I think having those blocks called somewhere will be a pain to maintain later, and moving it all outside of providers makes things easier. See the discussion in #182

Thanks for pointing me to #182. I'll go ahead and read that before anything.

autra · 2018-04-12T12:20:12Z

To add to what @zarov said: parsers are also good candidates for workers. If we start using workers, we'll need to schedule calls to them too. This is a step in this direction.

mbredif · 2018-04-12T17:32:55Z

What I meant by "parser needs more work" in the other PR: as we create a registry, we should use it in other providers instead of referencing parsers directly. This will make it possible to use different parsers in a provider according to the needed output format. This work should be done in this PR to avoid an inconsistent state in the codebase (I don't expect it to be too complicated anyway). Thanks!

As I anticipated, that means it turns into a big PR, but you asked for it...

PointcloudProvider and WFSProvider now use the parser registry
the parsers I updated (Potree CIN/BIN, geojson, gpx, PNTS) have metadata beyond the parse function: strings for format, mimetypes and extensions (which provide a collection of registry keys, as suggested by @zarov), but also a fetchtype to select the Fetcher method
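
As an illustration of how a fetchtype could select the fetching method, here is a minimal sketch. `FakeFetcher`, `fetchAndParse` and both parser objects are hypothetical stand-ins with made-up metadata values; they are not the objects of this PR.

```javascript
// Stand-in for a fetcher exposing one method per response type; a real one
// would perform an HTTP request instead of returning canned values.
const FakeFetcher = {
    json: url => Promise.resolve({ type: 'FeatureCollection', features: [], url }),
    arrayBuffer: () => Promise.resolve(new ArrayBuffer(8)),
};

// Hypothetical parser descriptors: metadata plus a parse function.
const GeoJsonParser = {
    format: 'geojson',
    mimetypes: ['application/json'],
    extensions: ['json', 'geojson'],
    fetchtype: 'json',
    // Toy parse function: returns the number of features.
    parse: json => json.features.length,
};
const BinaryParser = {
    format: 'binary-example',
    extensions: ['bin'],
    fetchtype: 'arrayBuffer',
    // Toy parse function: returns the payload size in bytes.
    parse: buffer => buffer.byteLength,
};

// Fetch with the method named by parser.fetchtype, then parse the result.
function fetchAndParse(url, parser, fetcher) {
    const fetchMethod = fetcher[parser.fetchtype];
    if (typeof fetchMethod !== 'function') {
        return Promise.reject(new Error(`Unsupported fetchtype: ${parser.fetchtype}`));
    }
    return fetchMethod(url).then(data => parser.parse(data));
}

const featureCount = fetchAndParse('features.geojson', GeoJsonParser, FakeFetcher);
const byteCount = fetchAndParse('points.bin', BinaryParser, FakeFetcher);
```

The provider then only needs the parser object: the parser itself declares which kind of response it expects.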

I will probably not have time in the near future to go much further on this PR, so do not ask me to cover all providers and parsers :). In the meantime, I created a branch for using the format registry on 3dtiles : format_registry_3dtiles. I think it works but I have not tested it much and changes are not trivial so it may deserve its own PR.

peppsac · 2018-04-13T07:47:00Z

RasterProvider.js has a lot of parsing code: don't you think it would be a good idea to move all of it out of this provider?
(also RasterProvider implements a gpx parser instead of using GpxParser)

mbredif · 2018-04-13T07:53:30Z

so do not ask me to cover all providers and parsers :)

:)

Yes of course, but I sincerely think the raster provider is obsolete: it contains too much hardcoded stuff, and the existing functionality of rasterizing single files will be superseded by a properly configured FileProvider, in the same way that the new functionality of rasterizing WFS features will be offered by a properly configured WfsProvider. So that will be treated in another PR once this PR and #705 are merged (this PR is already big enough), similar to the upgrade of the 3dTilesProvider.

mbredif

(comments of @zarov have been addressed)


peppsac · 2018-04-13T08:06:33Z

Do we even want to introduce a FileProvider?

At the last codesprint we discussed iTowns' API and the need to move away from the current layer/provider design and instead implement something similar to OpenLayers or vector-tiles, i.e. splitting sources and layers.

So IMHO we should work toward this goal instead of adding new features built on the current broken layer/provider API.

mbredif · 2018-04-13T10:25:25Z

We definitely agree that legacy providers have to be removed/reworked/replaced and that the provider/layer API should be redesigned.

To stay on topic here, I encourage you to use more relevant PR/issues :

Split providers into protocols, formatters and symbolizers #695 (or a new one) to discuss the (admittedly needed) reworking of the provider/layer API. Be sure I will join this discussion.
FileProvider #705 : FileProvider discussion.

Putting that aside, do you have any thoughts on the scope of this PR, which is introducing a parser registry, normalizing the parser interface and using it in the providers of geometry layers ? (putting aside 3dtiles to limit the size of the PR, and the TileProvider which is not reading from any source as far as I understand)

zarov

@mbredif if you don't have time to work on this, I'll gladly take over, in another branch and PR (as I said, I'm really interested in seeing more progress on this). It needs more work here, particularly on documentation.

zarov · 2018-04-24T13:52:40Z

src/Core/Scheduler/Scheduler.js

+        that.parsers[format].push(parser);
+    }
+    register(parser.format);
+    parser.extensions.forEach(register);


I think it should be simple and only rely on format. We ditched mimetypes in #597, and I fail to see how we could benefit from having extensions and mimetypes on top of format.

I am not so sure either, but these 3 (extensions, mimetypes, format) seem complementary :

extension may be the only thing you get when you are handed over a file (by url or drag n drop)

mimetype is needed by wfs (in the query string)

mimetype may be provided in the response header of fetches

format is for me the name of the parser given by itowns, with more uniform naming conventions

...

What might still be missing is some kind of accept function that analyses the file content (header?) to determine whether it is an acceptable file to parse (e.g. reading the first 4 bytes of tiles in 3dTiles formats).

extension may be the only thing you get when you are handed over a file (by url or drag n drop)

Extension is not really reliable, and can be confusing: an XML file can be a KML, a GPX or even something else.

mimetype is needed by wfs (in the query string)

It isn't really a mimetype, see for example the use case in GeoServer (http://docs.geoserver.org/stable/en/user/services/wfs/outputformats.html). But I see the necessity here.

mimetype may be provided in the response header of fetches

Yes, but then we should maybe rework on the whole format/mimetype thing.

format is for me the name of the parser given by itowns, with more uniform naming conventions

Agreed, that's why I think it should only be registered with the format option.

I'm with @zarov : I'd rather not have 3 different ways of declaring a format.

What we could have, though, is a format detector: if the format isn't explicitly specified, we can try to deduce it from the url, filename, magic bytes, etc. (something similar to what is done in 3dTileProvider and RasterProvider)
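
A minimal sketch of such a detector is given below. The function names, the lookup tables and the precedence order (explicit format, then magic bytes, then extension) are assumptions for the example; only the `b3dm` and `pnts` magic strings come from the 3D Tiles formats mentioned in this thread.

```javascript
// Magic strings found in the first 4 bytes of some 3D Tiles payloads.
const MAGIC_TO_FORMAT = new Map([['b3dm', 'b3dm'], ['pnts', 'pnts']]);

// Extension hints, limited to extensions that are reasonably unambiguous
// ('xml' is deliberately absent: it could be KML, GPX or something else).
const EXTENSION_TO_FORMAT = new Map([['geojson', 'geojson'], ['gpx', 'gpx'], ['kml', 'kml']]);

// Read the first 4 bytes of an ArrayBuffer as an ASCII string.
function readMagic(buffer) {
    if (!(buffer instanceof ArrayBuffer) || buffer.byteLength < 4) {
        return undefined;
    }
    const bytes = new Uint8Array(buffer, 0, 4);
    return String.fromCharCode(bytes[0], bytes[1], bytes[2], bytes[3]);
}

// Deduce a format name: explicit layer format first, then magic bytes, then extension.
function detectFormat(layer, url, buffer) {
    if (layer && layer.format) {
        return layer.format;
    }
    const magic = readMagic(buffer);
    if (magic && MAGIC_TO_FORMAT.has(magic)) {
        return MAGIC_TO_FORMAT.get(magic);
    }
    // Extract the extension, ignoring any query string or fragment.
    const match = /\.([a-z0-9]+)(?:[?#].*)?$/i.exec(url || '');
    if (match) {
        return EXTENSION_TO_FORMAT.get(match[1].toLowerCase());
    }
    return undefined;
}
```

An explicit `layer.format` always wins here, so detection only runs when the user has not stated the format.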

zarov · 2018-04-24T13:53:29Z

src/Core/Scheduler/Scheduler.js

@@ -256,6 +274,29 @@ Scheduler.prototype.getProtocolProvider = function getProtocolProvider(protocol)
    return this.providers[protocol];
 };

+
+Scheduler.prototype.addFormatParser = function addFormatParser(parser) {
+    // eslint-disable-next-line no-console


To not forget to remove the log before merging, you should remove this comment imho ;)

mbredif · 2018-04-26T10:39:09Z

The Parser normalization part of this PR is compulsory for me, but I must confess that I am no longer entirely sold on a parser registry maintained by the scheduler. In fact I started on the premise that the protocol provider registry was a good thing, but I am not sure of it any more: if we get rid of JSON layers, we could ask the user code to create protocol and parser objects and hand them over in the layer options, which would not require any protocol registry. I will soon open a new issue to discuss this.

For now, a middle ground could be to have each provider have its own registry as a simple array of default parsers in the preprocess function, but leave the opportunity to pass in the layer options a parser or an array of parsers that would override the default array of parsers defined by the provider. This way, we would still have coupling of a provider with its default parsers, but the goal is to make it strictly limited to the import of default parsers and their instantiation in the default array of parsers.
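
This middle ground could be sketched as follows. `ExampleProvider`, its `defaultParsers` array and the `parser`/`parsers` layer options are hypothetical names chosen for the illustration.

```javascript
// Hypothetical parsers; the parse functions are toy implementations.
const GeoJsonParser = { format: 'geojson', parse: text => JSON.parse(text) };
const GpxParser = { format: 'gpx', parse: text => ({ gpx: text }) };

// Hypothetical provider: its only coupling to parsers is the default array.
const ExampleProvider = {
    defaultParsers: [GeoJsonParser, GpxParser],

    // Resolve the parsers of a layer: a parser or an array of parsers given in
    // the layer options overrides the provider defaults.
    preprocess(layer) {
        const custom = layer.parsers || layer.parser;
        if (custom) {
            layer.parsers = Array.isArray(custom) ? custom : [custom];
        } else {
            // Copy the defaults so that a layer cannot mutate the provider's array.
            layer.parsers = this.defaultParsers.slice();
        }
        return layer;
    },
};

// A layer without options gets the defaults; a layer with its own parser keeps it.
const customParser = { format: 'custom', parse: data => data };
const defaultLayer = ExampleProvider.preprocess({});
const customLayer = ExampleProvider.preprocess({ parser: customParser });
```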

WDYT?

I'll gladly take over

Please go ahead 😀 .

zarov · 2018-04-26T11:35:35Z

we could ask the user code to create protocol and parser objects and hand them over in the layer options which would not require any protocol registry

It is an option indeed, and then we wouldn't have to maintain a list of relations between format/mimetype and parsers.

a middle ground could be to have each provider have its own registry as a simple array of default parsers in the preprocess function

I feel that going to this is like the current supportedFormats behavior we have in providers: it would be an improvement, but a tiny one.

a parser registry maintained by the scheduler

To me it is the best solution: everything is assembled in one place, the Scheduler, based on the configuration given by the user. We could drop the list of default providers and let the user decide which one to use with their layer. But I really don't think we will benefit from splitting everything into multiple classes.

mbredif · 2018-04-26T12:35:09Z

Let me rephrase myself to show that I am sure we are already on the same page 😃 .
I do not care much about the registration key (which you propose to be only format instead of what this PR does with multiple registrations). In fact, I am now proposing that this registry may be a simple array of parsers, ordered by priority ? All parsers may then expose relevant metadata so that a generic format detector may be used (mimetype, possible extension, accept function that checks the file header for e.g. magic bytes ... ).
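
A minimal sketch of a registry kept as a priority-ordered array, with selection driven by parser metadata, could look like this. The parser objects, the `hints` object and the matching order inside `matches` are assumptions for the example.

```javascript
// Hypothetical parsers exposing optional metadata used for selection.
const B3dmParser = {
    format: 'b3dm',
    // accept() checks the first 4 bytes for the 'b3dm' magic string.
    accept: data => data instanceof ArrayBuffer && data.byteLength >= 4 &&
        String.fromCharCode(...new Uint8Array(data, 0, 4)) === 'b3dm',
    parse: data => ({ kind: 'b3dm', size: data.byteLength }),
};
const GeoJsonParser = {
    format: 'geojson',
    mimetypes: ['application/json', 'application/geo+json'],
    extensions: ['json', 'geojson'],
    parse: text => JSON.parse(text),
};

// The registry is a plain array; earlier entries have higher priority.
const parsers = [B3dmParser, GeoJsonParser];

// Check one parser against the available hints (format, data, mimetype, extension).
function matches(parser, hints) {
    if (hints.format) {
        // An explicit format is authoritative.
        return parser.format === hints.format;
    }
    if (parser.accept && hints.data !== undefined && parser.accept(hints.data)) {
        return true;
    }
    if (hints.mimetype && (parser.mimetypes || []).includes(hints.mimetype)) {
        return true;
    }
    return Boolean(hints.extension) && (parser.extensions || []).includes(hints.extension);
}

// Return the highest-priority parser matching the hints, or undefined.
function selectParser(registry, hints) {
    return registry.find(parser => matches(parser, hints));
}
```

Because the array order encodes priority, no registration key is needed: a stricter parser is simply placed before a more permissive one.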

I agree that file extension only is not reliable (or even sometimes against the spec as for 3dtiles), but sometimes it works and some other times it may still allow to filter out easily some parsers.

It isn't really a mimetype, see for example the (usecase in GeoServer)

In this case, maybe parsers with mentions in the wfs spec may have an optional 'wfsOutputFormat' string. (maybe with some defaults for others)

I feel that going to this is like the current supportedFormats behavior we have in providers: it would be an improvement, but a tiny one.

I agree, it is a small step to prepare for an upcoming, more drastic decoupling that would require more discussion. But if there is already an agreement on a centralized parser registry, then go for it!

Following iTowns#723 and the model of Providers, a Parser registry has been partially added to the Scheduler. You can now register a Parser using scheduler.addFormatParser(formatName, parser). In the Scheduler queue, the Parser is called right after the Provider: it takes the data blob returned by the Provider, parses it, and returns the result. The parser is selected from the format set in the layer. For layers that have no format, or a format that is not supported, a fake parser that immediately returns the blob is used. This commit introduces three parsers: GeoJson, Gpx and Kml. This allows the RasterProvider to be refactored a bit, in the hope of letting go of it in the future. BREAKING CHANGE: GpxParser doesn't return a THREE.Mesh anymore.

zarov · 2019-01-15T10:24:35Z

#966 covers this, we can close this PR

mbredif mentioned this pull request Apr 11, 2018

FileProvider #705

Closed

mbredif force-pushed the format_registry branch from 099dd2f to e7f21ce Compare April 11, 2018 11:02

mbredif changed the title ~~Parser registry~~ feat(parsers) Parser registry Apr 11, 2018

zarov reviewed Apr 12, 2018

mbredif force-pushed the format_registry branch from fbd4c03 to d4806c1 Compare April 12, 2018 19:35

mbredif added the ready label Apr 12, 2018

mbredif commented Apr 13, 2018

mbredif force-pushed the format_registry branch from d4806c1 to bb44641 Compare April 14, 2018 07:41

mbredif added 4 commits April 16, 2018 14:06

feat(parsers) adds a parser registry to the scheduler

81d8ee9

feat(providers): potree provider usarsers and providers

3780a4e

refactor(parsers): GpxParser output normalization

13bdb68

refactor(parsers): geojson and pnts parser, metadata in parsers

9a9f019

mbredif force-pushed the format_registry branch from bb44641 to 9a9f019 Compare April 16, 2018 12:11

zarov suggested changes Apr 24, 2018

mbredif mentioned this pull request Apr 26, 2018

feat(core): add vector tile loading in TMS #710

Merged

mbredif mentioned this pull request Apr 26, 2018

Role of StaticProvider #739

Closed

zarov mentioned this pull request May 25, 2018

feat(core): add a Parser registry #766

Closed

zarov closed this Jan 15, 2019

zarov deleted the format_registry branch June 18, 2019 09:13
