Skip to content
This repository has been archived by the owner on Jun 26, 2020. It is now read-only.

Prevent of bolding entire content pasted from google docs #62

Merged
merged 48 commits into from
Jul 29, 2019
Merged

Conversation

msamsel
Copy link
Contributor

@msamsel msamsel commented Jul 16, 2019

Suggested merge commit message (convention)

Feature: Prevent of bolding entire content pasted from google docs. Closes ckeditor/ckeditor5#2491 .


Additional information

  • Refactoring of the automatic tests. After this change there will be mocked up entire clipboard event in normalization tests, not only fired normalization function for MS Word.
  • There are also small changes in the description of unit tests for better separation and readability. It is really had to figure out from which file which tests are run as there is way too much of subgroups and indention which start to mess up. Mocha does not combine subgroups Group up descriptions with same names mochajs/mocha#1413 from different files.

@msamsel msamsel marked this pull request as ready for review July 16, 2019 12:50
@msamsel msamsel requested a review from jodator July 16, 2019 12:50
@msamsel
Copy link
Contributor Author

msamsel commented Jul 17, 2019

@Mgsy maybe you will have a moment to click over this PR and find some strangely defined Google Docs document which still will be pasted with bold.

Just please be aware that PR fixes only the situation when the entire pasted text became bolded. Support for the basic styles, lists and other feature will be introduced later in some following PRs.

@mlewand mlewand requested review from mlewand and Mgsy and removed request for mlewand July 17, 2019 13:43
@Mgsy
Copy link
Member

Mgsy commented Jul 18, 2019

Unfortunately, this fix doesn't work on Windows.

@msamsel
Copy link
Contributor Author

msamsel commented Jul 18, 2019

@Mgsy can you take a look one more time. It should work now.

Copy link
Contributor

@jodator jodator left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was reviewing this for some time an tried to remove the need for static methods - they are artificial IMO for test purposes only and I think that we can spend some more time to clean some things here a bit. And probably make PFO more maintainable for the future (pasting from other office suites or editor types - ie excel).

In order to start this, we need to know the API for that - namely, what must be passed do the normalize method? In the _inputTransformationListener() method there's either html & dataTransfer passed or data.content. This part needs to be a bit unified - it would be nice to have at most 2 params (where the first in the data which need to be processed and the latter additional data (like dataTransfer for Word).

I'm thinking about private API entirely for now - just to refactor the switch and make the PFO tastable better.

The basic idea is to create an interface - from what I see ATM something basic like:

interface Normalizer {
    isActive( html ); // bad name - anyway t
    normalize( html, dataTransfer )
}

This way we will be able to:

  1. do not expose private methods for testing (ie by creating stub with the same API as Normalizer that will check if proper normalizer is called and if it is called only once)
  2. test normalizers independently (or just as integration test as now - doesn't really matter)

* Listener fired during {@link module:clipboard/clipboard~Clipboard#event:inputTransformation `inputTransformation` event}.
* Detects if content comes from a recognized source and normalize it.
*
* **Note**: this function was exposed mainly for testing purposes and should not be called directly.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* **Note**: this function was exposed mainly for testing purposes and should not be called directly.
* **Note**: this function is exposed mainly for testing purposes and should not be called directly.

data.content = PasteFromOffice._normalizeWordInput( html, data.dataTransfer );
break;
case 'gdocs':
data.content = PasteFromOffice._normalizeGoogleDocsInput( data.content );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is here data.content taken directly and not the html as for word input? At least some explanation is needed.

src/filters/common.js Outdated Show resolved Hide resolved
} )
};

PasteFromOffice._inputTransformationListener( null, data );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔥 can be used as I checked that:

editor.plugins.get( 'Clipboard' ).fire( 'inputTransformation', data );

* @param {module:utils/eventinfo~EventInfo} evt
* @param {Object} data same structure like {@link module:clipboard/clipboard~Clipboard#event:inputTransformation input transformation}
*/
static _inputTransformationListener( evt, data ) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Usually, we do not use such static methods. Anyway, this method was exported only for test purposes and wasn't needed to be exposed at all, so let's try to fix this (check this: https://github.com/ckeditor/ckeditor5-paste-from-office/pull/62/files#diff-01566d953bd66051289510b52b899e4bR166)

Copy link
Member

@Mgsy Mgsy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@mlewand mlewand requested a review from Mgsy July 29, 2019 07:58
@mlewand
Copy link
Contributor

mlewand commented Jul 29, 2019

Since there has been quite a few changes, @Mgsy can I ask you to verify once again the fix?

@msamsel
Copy link
Contributor Author

msamsel commented Jul 29, 2019

@Mgsy please also check if nothing went wrong with MS Word, as there was also changed way how MS Word filters start to be fired.

Copy link
Contributor

@jodator jodator left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're almost there ;) Please take a look at the comments, most importantly:

  • There is too much [].forEach() IMO used in test - some of them are redundant and prevents us from writing a clear explanation of what this particular test tests other than case #.
  • Some corrections to the docs are needed.
  • The code to be moved around (namespaces names).

I' malso thinking about common API for filters but I think that we can live with a current state of things. II'll create a follow up for that (or update a current one if existing)>

*/

/**
* @module paste-from-office/filters/common
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if this is a common filter - so maybe we should just move it to removeboldtagwrapper.js.

*/
export function removeBoldTagWrapper( { documentFragment, writer } ) {
for ( const childWithWrapper of documentFragment.getChildren() ) {
if ( childWithWrapper.is( 'b' ) && childWithWrapper.getStyle( 'font-weight' ) === 'normal' ) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

child would be enough - better to read here :)

for ( const childWithWrapper of documentFragment.getChildren() ) {
if ( childWithWrapper.is( 'b' ) && childWithWrapper.getStyle( 'font-weight' ) === 'normal' ) {
const childIndex = documentFragment.getChildIndex( childWithWrapper );
const removedElement = writer.remove( childWithWrapper )[ 0 ];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd avoid such constructs if possible:

writer.remove( child );

writer.insertChild( index, child.getChildren(), docuemntFragment );

also will work.

/**
* Removes `<b>` tag wrapper added by Google Docs to a copied content.
*
* @param {module:engine/view/documentfragment~DocumentFragment} documentFragment
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrong parameters in the docs.

/**
* Method applies normalization to given data.
*
* @method #exec
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might forgot about it - it should be full form: execute()

{
'text/html': '<meta name=Generator content="Microsoft Word 15"><p class="MsoNormal">Hello world<o:p></o:p></p>'
}
].forEach( ( inputData, index ) => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as above (helper test function vs forEach()

describe( 'isActive()', () => {
describe( 'correct data set', () => {
[
'<meta name=Generator content="Microsoft Word 15"><p>Foo bar</p>',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does not detect the other option - only one form of compatible content.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Compare readability of the tests:

describe( 'isActive()', () => {
	it( 'should return true for MS Word content', () => {
		expect( normalizer.isActive( '<meta name=Generator content="Microsoft Word 15"><p>Foo bar</p>' ) ).to.be.true;
	} );

	it( 'should return true for MS Word content - in Safari', () => {
		expect( normalizer.isActive( '<meta name=Generator content="Microsoft Word 15"><p>Foo bar</p>' ) ).to.be.true;
	} );

	it( 'should return false for non-compatible content', () => {
		expect( normalizer.isActive( '<p>Foo bar</p>' ) ).to.be.false;
	} );

	it( 'should return false for content from other source', () => {
		expect( normalizer.isActive( '<p id="docs-internal-guid-12345678-1234-1234-1234-1234567890ab"></p>' ) ).to.be.false;
	} );
} );

marking every test with numbers provides little value for others - I have to check what the # was and why it might fail.

Those tests are simple enough and doesn't have to be run in a loop.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does not detect the other option - only one form of compatible content.

It wasn't present such check in original data. Now it's added :)

const normalizer = new GoogleDocsNormalizer();

describe( 'isActive()', () => {
describe( 'correct data set', () => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check the notes about running tests in MSWordNormalizer tests.

tests/filters/space.js Outdated Show resolved Hide resolved
tests/filters/parse.js Outdated Show resolved Hide resolved
Copy link
Member

@Mgsy Mgsy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've checked it once again and everything works fine. I didn't find any new unexpected behaviour regarding pasting from Word.

@msamsel msamsel requested a review from jodator July 29, 2019 14:08
// @param {Boolean} shouldBeProcessed determines if data should be marked as processed with isTransformedWithPasteFromOffice flag
// @param {Boolean} [isAlreadyProcessed=false] apply flag before paste from office plugin will transform the data object
function checkDataProcessing( inputString, shouldBeProcessed, isAlreadyProcessed = false ) {
// const htmlDataProcessor = new HtmlDataProcessor();
Copy link
Contributor

@jodator jodator Jul 29, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove - left over comment.

*/

/**
* @module paste-from-office/filters/removeboldtagwrapper
Copy link
Contributor

@jodator jodator Jul 29, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wrong module = should be removeboldwrapper.

Copy link
Contributor

@jodator jodator left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for two out-of-order comments I forgot to start a review with them.

Finishing touches and we are good to go :)


clipboard.fire( 'inputTransformation', data );

if ( shouldBeProcessed ) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those two methods of testing could be preserved - now the test is a bit too mangled ;) Also two booleans in method parameters are too much.

@msamsel msamsel requested a review from jodator July 29, 2019 15:00
@jodator jodator merged commit 8102de3 into master Jul 29, 2019
@jodator jodator deleted the t/61 branch July 29, 2019 15:12
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[PFGD] Prevent of inserting bolded content from GDocs
4 participants