Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(parsers) Parser registry #723

Closed
wants to merge 4 commits into from
Closed

feat(parsers) Parser registry #723

wants to merge 4 commits into from

Conversation

mbredif
Copy link
Collaborator

@mbredif mbredif commented Apr 11, 2018

Description

This PR introduces a registry of format parsers in the scheduler, similar to the protocol registry that is already there.

Motivation and Context

The goal is to decouple format parsing from the providers, so that protocols, formats and styling may be mixed and matched (instead of current providers that hardcode formats).

(extracted from #705)

@mbredif mbredif mentioned this pull request Apr 11, 2018
@mbredif mbredif changed the title Parser registry feat(parsers) Parser registry Apr 11, 2018
Copy link
Contributor

@zarov zarov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even if it doesn't really do nothing alone, as you said, I'm interested in having this merged. Thanks for this !

@@ -175,6 +202,18 @@ Scheduler.prototype.getProtocolProvider = function getProtocolProvider(protocol)
return this.providers[protocol];
};


Scheduler.prototype.addFormatParser = function addParser(format, parser) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not take an array as format, so we could batch a lot of format into one parser ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No strong opinion on this... parser is provided provided as a class rather than an instance so the change is only cosmetic.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, parsers now have mimetypes, extensions and format strings that are all used as registration keys

import B3dmParser from '../../Parser/B3dmParser';
import PotreeBinParser from '../../Parser/PotreeBinParser';
import PotreeCinParser from '../../Parser/PotreeCinParser';
// import GLTFLoader from '../../Parser/GLTFLoader';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you plan to support it before merging ? If not, please remove those comments.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I will remove them

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@autra
Copy link
Contributor

autra commented Apr 12, 2018

What I meant by "parser needs more work" in the other PR: as we create a registry, we should use it in other providers instead of referencing parsers directly. This will allow to be able to use different parsers in a provider according to the needed output format. This work should be done in this PR to avoid an inconsistent state in the codebase (I don't expect it to be too complicated anyway). Thanks!

@zarov
Copy link
Contributor

zarov commented Apr 12, 2018

we should use it in other providers instead of referencing parsers directly.

Even better: why not moving them out of providers, and have something like this:

return provider.executeCommand(cmd)
    .then(blob => parser.parse(blob, options))
    .then((result) => {
...

The problem here could be that we don't have anything to parse for texture, but a "fake" parser for image could be created: parser = { parse: blob => blob };

I can also see (later on) having the Fetcher getting introduced between the first and second line ;)

@zarov
Copy link
Contributor

zarov commented Apr 12, 2018

I quickly added my thoughts here zarov@cc623b6

@elemoine
Copy link
Contributor

This PR uses the scheduler as registry for parsers, but it doesn't use the scheduler for the parsing itself. That doesn't make too much sense to me.

I quickly added my thoughts here zarov/itowns@cc623b6

That may be just me but I actually don't see the value of doing the parsing in the scheduler. What is the benefit compared to doing it in the provider? I understand that the core thing of the scheduler is the priority queue. Here for the parsing the priority queue is not used, so what's the point of doing the parsing within the scheduler?

@zarov
Copy link
Contributor

zarov commented Apr 12, 2018

@elemoine the idea is to separate more the things we currently have in itowns: instead of adding everything in providers, we could have a simpler chain, managed here in the scheduler.

Provider -> (Fetcher ->) Parser -> Stylizer

I think having those blocks called somewhere will be a pain to maintain later, and moving it all outside of providers makes things easier. See the discussion in #182

@elemoine
Copy link
Contributor

I think having those blocks called somewhere will be a pain to maintain later, and moving it all outside of providers makes things easier. See the discussion in #182

Thanks for pointing me to #182. I'll go ahead and read that before anything.

@autra
Copy link
Contributor

autra commented Apr 12, 2018

To add on what @zarov said: parser are also good candidates for workers. If we start using workers, we'll need to schedule call to them too. This is a step is this direction.

@mbredif
Copy link
Collaborator Author

mbredif commented Apr 12, 2018

What I meant by "parser needs more work" in the other PR: as we create a registry, we should use it in other providers instead of referencing parsers directly. This will allow to be able to use different parsers in a provider according to the needed output format. This work should be done in this PR to avoid an inconsistent state in the codebase (I don't expect it to be too complicated anyway). Thanks!

As I anticipated, that means it turns into a big PR, but you asked for it...

  • PointcloudProvider and WFSProvider now use the parser registry
  • the parsers I updated (Potree CIN/BIN, geojson, gpx, PNTS) have metadata beyond the parse function : strings for format, mimetypes, extensions (that provides an collection of registry keys as suggested by @zarov) but also a fetchtype to select the Fetcher method)

I will probably not have time in the near future to go much further on this PR, so do not ask me to cover all providers and parsers :). In the meantime, I created a branch for using the format registry on 3dtiles : format_registry_3dtiles. I think it works but I have not tested it much and changes are not trivial so it may deserve its own PR.

@peppsac
Copy link
Contributor

peppsac commented Apr 13, 2018

RasterProvider.js has a lot of parsing code: don't you think it would be a good idea to move all of it out of this provider?
(also RasterProvider implements a gpx parser instead of using GpxParser)

@mbredif
Copy link
Collaborator Author

mbredif commented Apr 13, 2018

so do not ask me to cover all providers and parsers :)

:)

Yes of course, but I sincerely think raster provider is obsolete as it contains too much hardcoded stuff and that the existing functionnality of rasterizing single files will be superseded by a properly configured FileProvider in the same way that the new functionnality of rasterizing WFS features will be offered by a properly configured WfsProvider . So that will be treated in another PR once this PR and #705 are merged. (this PR is already big enough), similar to the upgrade of the 3dTilesProvider.

Copy link
Collaborator Author

@mbredif mbredif left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(comments of @zarov have been addressed)

import B3dmParser from '../../Parser/B3dmParser';
import PotreeBinParser from '../../Parser/PotreeBinParser';
import PotreeCinParser from '../../Parser/PotreeCinParser';
// import GLTFLoader from '../../Parser/GLTFLoader';
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -175,6 +202,18 @@ Scheduler.prototype.getProtocolProvider = function getProtocolProvider(protocol)
return this.providers[protocol];
};


Scheduler.prototype.addFormatParser = function addParser(format, parser) {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, parsers now have mimetypes, extensions and format strings that are all used as registration keys

@peppsac
Copy link
Contributor

peppsac commented Apr 13, 2018

Do we even want to introduce a FileProvider?

At the last codesprint we exchanged on iTowns' API and the need of moving away from the current layer/provider design and instead implement something similar to OpenLayers or vector-tiles = splitting sources and layers.

So IMHO we should work toward this goal instead of adding new features built on the current broken layer/provider API.

@mbredif
Copy link
Collaborator Author

mbredif commented Apr 13, 2018

We definitely agree that legacy providers have to be removed/reworked/replaced and that the provider/layer API should be redesigned.

To stay on topic here, I encourage you to use more relevant PR/issues :

Putting that aside, do you have any thoughts on the scope of this PR, which is introducing a parser registry, normalizing the parser interface and using it in the providers of geometry layers ? (putting aside 3dtiles to limit the size of the PR, and the TileProvider which is not reading from any source as far as I understand)

Copy link
Contributor

@zarov zarov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mbredif if you don't have time to work on this, I'll gladly take over, in another branch and PR (as I'm really interested to see more advance on this as I said). It needs more work here, in particularly in documentation.

that.parsers[format].push(parser);
}
register(parser.format);
parser.extensions.forEach(register);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should be simple and only rely on format. We ditched mimetypes in #597, and I fail to see how we could benefit by having extensions and mimetypes on top of format.

Copy link
Collaborator Author

@mbredif mbredif Apr 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not so sure either, but these 3 (extensions, mimetypes, format) seem complementary :

  • extension may be the only thing you get when you are handed over a file (by url or drag n drop)
  • mimetype is needed by wfs (in the query string)
  • mimetype may be provided in the response header of fetches
  • format is for me the name of the parser given by itowns, with more uniform naming conventions
  • ...

What might still be missing is some kind of accept function that analyse the file content (header?) to determine whether it is an acceptable file to parse (eg reading the first 4 bytes of tiles in 3dTiles formats) .

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extension may be the only thing you get when you are handed over a file (by url or drag n drop)

Extension is not really reliable, and can be confusing: a xml file can be a KML, GPX or even something else.

mimetype is needed by wfs (in the query string)

It isn't really a mimetype, see for example the (usecase in GeoServer)[http://docs.geoserver.org/stable/en/user/services/wfs/outputformats.html]. But I see the necessity here.

mimetype may be provided in the response header of fetches

Yes, but then we should maybe rework on the whole format/mimetype thing.

format is for me the name of the parser given by itowns, with more uniform naming conventions

Agreed, that's why I think it should only be registered with the format option.

Copy link
Contributor

@peppsac peppsac Apr 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm with @zarov : I'd rather not have 3 different ways of declaring a format.

What we could have, though, is a format detector: if the format isn't explicitely specified, we can try to deduce from: url, filename, magic bytes, etc (something similar to what is done in 3dTileProvider and RasterProvider)

@@ -256,6 +274,29 @@ Scheduler.prototype.getProtocolProvider = function getProtocolProvider(protocol)
return this.providers[protocol];
};


Scheduler.prototype.addFormatParser = function addFormatParser(parser) {
// eslint-disable-next-line no-console
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To not forget to remove the log before merging, you should remove this comment imho ;)

@mbredif
Copy link
Collaborator Author

mbredif commented Apr 26, 2018

The Parser normalization part of this PR is for me compulsory, but I must confess that I am now not entirely sold at a parser registry maintained by the scheduler. In fact I started on the premise that the protocol provider registry was a good thing, but I am not sure of it any more : if we get rid of JSON layers, we could ask the user code to create protocol and parser objects and hand them over in the layer options which would not require any protocol registry. I will open soon a new issue to discuss this.

For now, a middle ground could be to have each provider have its own registry as a simple array of default parsers in the preprocess function, but leave the opportunity to pass in the layer options a parser or an array of parsers that would override the default array of parsers defined by the provider. This way, we would still have coupling of a provider with its default parsers, but the goal is to make it strictly limited to the import of default parsers and their instantiation in the default array of parsers.

WDYT?

I'll gladly take over

Please go ahead 😀 .

@zarov
Copy link
Contributor

zarov commented Apr 26, 2018

we could ask the user code to create protocol and parser objects and hand them over in the layer options which would not require any protocol registry

It is an option indeed, and then we wouldn't have to maintain a list of relation between format/mimetype and parsers.

a middle ground could be to have each provider have its own registry as a simple array of default parsers in the preprocess function

I feel that going to this is like the current supportedFormats behavior we have in providers: it would be an improvement, but a tiny one.

a parser registry maintained by the scheduler

It is to me the best solution, everything assembled in one place, the Scheduler, based on the configuration given by the user. We could drop the list of default providers and let the user decide which one to use with their layer. But I really don't think we will benefit by spliting everything in multiple class.

@mbredif
Copy link
Collaborator Author

mbredif commented Apr 26, 2018

Let me rephrase myself to show that I am sure we are already on the same page 😃 .
I do not care much about the registration key (which you propose to be only format instead of what this PR does with multiple registrations). In fact, I am now proposing that this registry may be a simple array of parsers, ordered by priority ? All parsers may then expose relevant metadata so that a generic format detector may be used (mimetype, possible extension, accept function that checks the file header for e.g. magic bytes ... ).

I agree that file extension only is not reliable (or even sometimes against the spec as for 3dtiles), but sometimes it works and some other times it may still allow to filter out easily some parsers.

It isn't really a mimetype, see for example the (usecase in GeoServer)

In this case, maybe parsers with mentions in the wfs spec may have an optional 'wfsOutputFormat' string. (maybe with some defaults for others)

I feel that going to this is like the current supportedFormats behavior we have in providers: it would be an improvement, but a tiny one. it would be an improvement, but a tiny one.

I agree, it is a small step to prepare for an upcoming more drastic decoupling that would require more discussion. but if there is already an agreement for a centralized parser registry, then go for it !

@mbredif mbredif mentioned this pull request Apr 26, 2018
zarov added a commit to zarov/itowns that referenced this pull request May 25, 2018
Following iTowns#723 and the model of Providers, a Parser registry has been
partially added to the Scheduler. You can now register a Parser using
scheduler.addFormatParser(formatName, parser).

In the Scheduler queue, the Parser will be called right after the
Provider. It takes the datablob returned by the Provider, parses it, and
return it.

The parser selection relies on the format setted in the layer. For
layers that don't have a format, or have a format that is not supported,
a fake parser, returning immediatly the blob, is setted.

This commit introduces three parsers: GeoJson, Gpx and Kml. This allows
to refactor a bit the RasterProvider, in hope of letting go of it in the
future.
zarov added a commit to zarov/itowns that referenced this pull request May 25, 2018
Following iTowns#723 and the model of Providers, a Parser registry has been
partially added to the Scheduler. You can now register a Parser using
scheduler.addFormatParser(formatName, parser).

In the Scheduler queue, the Parser will be called right after the
Provider. It takes the datablob returned by the Provider, parses it, and
return it.

The parser selection relies on the format setted in the layer. For
layers that don't have a format, or have a format that is not supported,
a fake parser, returning immediatly the blob, is setted.

This commit introduces three parsers: GeoJson, Gpx and Kml. This allows
to refactor a bit the RasterProvider, in hope of letting go of it in the
future.

BREAKING CHANGE: GpxParser doesn't return a THREE.Mesh anymore.
zarov added a commit to zarov/itowns that referenced this pull request Jun 11, 2018
Following iTowns#723 and the model of Providers, a Parser registry has been
partially added to the Scheduler. You can now register a Parser using
scheduler.addFormatParser(formatName, parser).

In the Scheduler queue, the Parser will be called right after the
Provider. It takes the datablob returned by the Provider, parses it, and
return it.

The parser selection relies on the format setted in the layer. For
layers that don't have a format, or have a format that is not supported,
a fake parser, returning immediatly the blob, is setted.

This commit introduces three parsers: GeoJson, Gpx and Kml. This allows
to refactor a bit the RasterProvider, in hope of letting go of it in the
future.

BREAKING CHANGE: GpxParser doesn't return a THREE.Mesh anymore.
zarov added a commit to zarov/itowns that referenced this pull request Jun 11, 2018
Following iTowns#723 and the model of Providers, a Parser registry has been
partially added to the Scheduler. You can now register a Parser using
scheduler.addFormatParser(formatName, parser).

In the Scheduler queue, the Parser will be called right after the
Provider. It takes the datablob returned by the Provider, parses it, and
return it.

The parser selection relies on the format setted in the layer. For
layers that don't have a format, or have a format that is not supported,
a fake parser, returning immediatly the blob, is setted.

This commit introduces three parsers: GeoJson, Gpx and Kml. This allows
to refactor a bit the RasterProvider, in hope of letting go of it in the
future.

BREAKING CHANGE: GpxParser doesn't return a THREE.Mesh anymore.
zarov added a commit to zarov/itowns that referenced this pull request Jun 29, 2018
Following iTowns#723 and the model of Providers, a Parser registry has been
partially added to the Scheduler. You can now register a Parser using
scheduler.addFormatParser(formatName, parser).

In the Scheduler queue, the Parser will be called right after the
Provider. It takes the datablob returned by the Provider, parses it, and
return it.

The parser selection relies on the format setted in the layer. For
layers that don't have a format, or have a format that is not supported,
a fake parser, returning immediatly the blob, is setted.

This commit introduces three parsers: GeoJson, Gpx and Kml. This allows
to refactor a bit the RasterProvider, in hope of letting go of it in the
future.

BREAKING CHANGE: GpxParser doesn't return a THREE.Mesh anymore.
@zarov
Copy link
Contributor

zarov commented Jan 15, 2019

#966 covers this, we can close this PR

@zarov zarov closed this Jan 15, 2019
@zarov zarov deleted the format_registry branch June 18, 2019 09:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants