Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FileLoader: text honors mimetype's charset as encoding #23292

Merged
merged 9 commits into from
Jan 24, 2022
Merged

Conversation

ycw
Copy link
Contributor

@ycw ycw commented Jan 21, 2022

Related issue: #23287

src/loaders/FileLoader.js Outdated Show resolved Hide resolved
src/loaders/FileLoader.js Outdated Show resolved Hide resolved
@Mugen87 Mugen87 changed the title FileLoader: honor encoding FileLoader: Honor mimeType. Jan 21, 2022
src/loaders/FileLoader.js Outdated Show resolved Hide resolved
@Mugen87 Mugen87 requested a review from gkjohnson January 21, 2022 14:25
@Mugen87 Mugen87 added this to the r137 milestone Jan 21, 2022
Copy link
Collaborator

@gkjohnson gkjohnson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change looks great! I didn't know about this feature of TextDecoder. I've added some comments above. Also I found these links useful for understanding the mimetype format:

https://stackoverflow.com/questions/63909392/content-type-multiple-parameters

https://www.w3.org/Protocols/rfc1341/4_Content-Type.html

Edit: Also do you have a file to recommend trying testing this with?

src/loaders/FileLoader.js Outdated Show resolved Hide resolved
src/loaders/FileLoader.js Outdated Show resolved Hide resolved
src/loaders/FileLoader.js Outdated Show resolved Hide resolved
src/loaders/FileLoader.js Outdated Show resolved Hide resolved
@ycw ycw changed the title FileLoader: Honor mimeType. FileLoader: Support encoding for textual response Jan 21, 2022
@ycw
Copy link
Contributor Author

ycw commented Jan 22, 2022

I updated fileloader .setResponseType to accept encoding for text ( / mimetype for document ), and deprecated .setMimeType(), s.t. users are required to config the response ( type and "parse-method" ) in one place.

// for text 
loader.setResponseType( 'text', 'shift_jis' )

// for doc
loader.setResponseType( 'document', 'text/xml' )

// deprecated
loader.setResponseType( 'document' ).setMimeType( 'text/xml' )

I make this change because the old api is misleading, setmimetype is irrelevant to 'text', instead, mimetype was just used to config the domparser for 'document'.. you can see that mmdloader author also got confused:

loadVPD( url, isUnicode, onLoad, onProgress, onError ) {
const parser = this._getParser();
this.loader
.setMimeType( isUnicode ? undefined : 'text/plain; charset=shift_jis' )
.setPath( this.animationPath )
.setResponseType( 'text' )
.setRequestHeader( this.requestHeader )
.setWithCredentials( this.withCredentials )
.load( url, function ( text ) {
onLoad( parser.parseVpd( text, true ) );
}, onProgress, onError );
}

( he attempted to set encoding 'shift_jis' for a 'text' respond using mimetype... )

summon @gkjohnson @Mugen87

@ycw
Copy link
Contributor Author

ycw commented Jan 22, 2022

This change looks great! I didn't know about this feature of TextDecoder. I've added some comments above. Also I found these links useful for understanding the mimetype format:

https://stackoverflow.com/questions/63909392/content-type-multiple-parameters

https://www.w3.org/Protocols/rfc1341/4_Content-Type.html

Edit: Also do you have a file to recommend trying testing this with?

@gkjohnson https://jsfiddle.net/saue56qm/ ( http/1.1 https://httpwg.org/specs/rfc7231.html#representation.metadata )

@gkjohnson
Copy link
Collaborator

I updated fileloader .setResponseType to accept encoding for text ( / mimetype for document ), and deprecated .setMimeType(),

I'm not sure if fully follow the motivation behind the change and the code is a bit unclear to me with the difference in behavior between document and text in the function. Perhaps @mrdoob or @Mugen87 can chime in on the API change? I think I'd prefer to avoid them for this fix.

@takahirox
Copy link
Collaborator

@ycw

I make this change because the old api is misleading, setmimetype is irrelevant to 'text', instead, mimetype was just used to config the domparser for 'document'.. you can see that mmdloader author also got confused:

If I'm right, mimetype has an effect to text type response according to the spec...

https://xhr.spec.whatwg.org/#response-body

@Mugen87
Copy link
Collaborator

Mugen87 commented Jan 22, 2022

I vote for a solution which does not change the API.

@ycw
Copy link
Contributor Author

ycw commented Jan 22, 2022

@ycw

I make this change because the old api is misleading, setmimetype is irrelevant to 'text', instead, mimetype was just used to config the domparser for 'document'.. you can see that mmdloader author also got confused:

If I'm right, mimetype has an effect to text type response according to the spec...

https://xhr.spec.whatwg.org/#response-body

3js had changed to use fetch, and since then, response's mimetype is not handled at all

@takahirox
Copy link
Collaborator

takahirox commented Jan 22, 2022

3js had changed to use fetch, and since then, response's mimetype is not handled at all

Ah, I think I understand what you meant now.

you can see that mmdloader author also got confused: ( he attempted to set encoding 'shift_jis' for a 'text' respond using mimetype... )

I, MMDLoader author, just haven't updated the code yet since FileLoader started to use fetch.

@ycw
Copy link
Contributor Author

ycw commented Jan 22, 2022

I updated fileloader .setResponseType to accept encoding for text ( / mimetype for document ), and deprecated .setMimeType(),

I'm not sure if fully follow the motivation behind the change and the code is a bit unclear to me with the difference in behavior between document and text in the function. Perhaps @mrdoob or @Mugen87 can chime in on the API change? I think I'd prefer to avoid them for this fix.

Response kind should come w/ "pares method" (if needed), they should come in one go, e.g.

loader.setResponseType( 'text', 'shift_jis' ) // jp text
loader.setResponseType( 'document', 'image/svg+xml' ) // xmldocument
loader.setResponseType( 'blob', 'video/mp4' )  // mp4 blob
loader.setResponseType( 'arraybuffer' ) 
loader.setResponseType( 'json' )

current api is misleading to me, consider:

loader.setResponseType( 'arraybuffer' ).setMimeType( 'video/mp4' ) // blob? / ab ?
loader.setResponseType( 'text' ).setMimeType( 'image/svg+xml' ) // xmldocument? / plain text?

@takahirox
Copy link
Collaborator

takahirox commented Jan 22, 2022

By the way, do we really want mimeType and textEncoding configuration?

IIRC, the reason why .setMimeType() was introduced to FileLoader was MMDLoader needed it for loading VPD file which can be encoded as Shift_JIS. MMDLoader is a kind of an old loader. When it was first written, TextDecoder was not available on some major browsers like Chrome. Using mimeType was an only solution without external libraries.

But now TextDecoder is available on major borwsers. And decoding can be done in user side like

loader 
  .setResponseType('arraybuffer')
  .load(url, buffer => {
    const decoder = new TextDecoder(encodingType);
    const text = decoder.decode(buffer);
  });

or

loader 
  .setResponseType('text')
  .load(url, text => {
    const parser = new DOMParser();
    const doc = parser.parseFromString(text, mimeType);
  });

That means, regarding #23287, the problem can be fixed in MMDLoader.

And if I'm right, only using UTF-8 encoded resource is encouraged in JavaScript, so textEncoding should be rarely used.

Actually it was suggested that we may be able to remove setMimeType() and see if someone complains in #22510 (comment).

@ycw
Copy link
Contributor Author

ycw commented Jan 22, 2022

By the way, do we really want mimeType and textEncoding configuration?

IIRC, the reason why .setMimeType() was introduced to FileLoader was MMDLoader needed it for loading VPD file which can be encoded as Shift_JIS. MMDLoader is a kind of an old loader. When it was first written, TextDecoder was not available some major browsers like Chrome. Using mimeType was an only solution without external libraries.

But now TextDecoder is available on major borwsers. And decoding can be done in user side like

loader 
  .setResponseType('arraybuffer')
  .load(url, buffer => {
    const decoder = new TextDecoder(encodingType);
    const text = decoder.decode(buffer);
  });

or

loader 
  .setResponseType('text')
  .load(url, text => {
    const parser = new DOMParser();
    const doc = parser.parseFromString(text, mimeType);
  });

That means, regarding #23287, the problem can be fixed in MMDLoader.

And if I'm right, only using UTF-8 encoded resource is encouraged in JavaScript, so textEncoding should be rarely used.

Actually it was suggested that we may be able to remove setMimeType() and see if someone complains in #22510 (comment).

@takahirox nonutf8 text is so rare.. I agree that this should be handled by specific loaders in the future.

for now, in this pr, i will only focus on sniffing encoding from mimetype, and make 'text' kind response honors the encoding.
s.t. any existing loaders ( like mmdloader ) using 'setMimeType' will still work without updates.

@ycw ycw changed the title FileLoader: Support encoding for textual response FileLoader: text honors mimetype's charset as encoding Jan 22, 2022
@ycw ycw requested a review from Mugen87 January 22, 2022 15:54
@ycw ycw requested a review from gkjohnson January 22, 2022 15:54
@gkjohnson
Copy link
Collaborator

Code looks good to me but it would be good if @takahirox could verify since I'm unfamiliar with what exactly was breaking with MMDLoader and how to test it.

@ycw
Copy link
Contributor Author

ycw commented Jan 22, 2022

@gkjohnson Thanks for review. The mmdloader broke due to the fact that current fileloader doesn't take encoding into account for text kind response, it blindly parses received payload by utf8; whereas mmd format is encoded in shift_jis, i.e. the parsed text is gibberish; from mmdloader POV, the text is holding a malformed mmd file.

@takahirox plesae help checking if vpds are parsed correctly, thanks.

https://raw.githack.com/ycw/three.js/fileloader-sniff-encoding/examples/?q=mmd#webgl_loader_mmd_pose

poses test

@takahirox
Copy link
Collaborator

@ycw Yes, that example looks good, thanks.

@mrdoob mrdoob merged commit 55d4f24 into mrdoob:dev Jan 24, 2022
@mrdoob
Copy link
Owner

mrdoob commented Jan 24, 2022

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants