Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add the stdlib_diff tool to compare gno and go standard libraries #2869

Open
wants to merge 41 commits into
base: master
Choose a base branch
from

Conversation

Villaquiranm
Copy link
Contributor

@Villaquiranm Villaquiranm commented Sep 29, 2024

continuing the work started on #1425
thanks FloRichardAloeCorp for the great job on this issue 👍
closes #1310

Contributors' checklist...
  • Added new tests, or not needed, or not feasible
  • Provided an example (e.g. screenshot) to aid review or the PR is self-explanatory
  • Updated the official documentation or not needed
  • No breaking changes were made, or a BREAKING CHANGE: xxx message was included in the description
  • Added references to related issues and PRs
  • Provided any useful hints for running manual tests
  • Added new benchmarks to generated graphs, if any. More info here.

@Villaquiranm Villaquiranm requested a review from a team as a code owner September 29, 2024 10:54
@Villaquiranm Villaquiranm requested review from ajnavarro and zivkovicmilos and removed request for a team September 29, 2024 10:54
@Villaquiranm Villaquiranm changed the title Feat/stdlib diff feat: Add the stdlib_diff tool to compare gno and go standard libraries #1425 Sep 29, 2024
@Villaquiranm Villaquiranm changed the title feat: Add the stdlib_diff tool to compare gno and go standard libraries #1425 feat: Add the stdlib_diff tool to compare gno and go standard libraries Sep 29, 2024
Copy link

codecov bot commented Sep 29, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

📢 Thoughts on this report? Let us know!

@Villaquiranm
Copy link
Contributor Author

@thehowl I took a look of this, the current status:
1- Not done yet
2- I think it is already done, I can select the code and the + & - are not selected
3- I was thinking on implementing this using semaphores ? like that I could have the entire code on the same file as it is right now. If you think is necessary to use channels I could do that too.

4 and 5 are fixed on eaf6a87
Only for 4 I think there are some folder we should avoid ? For exemple cmd & vendor as they have a lot of files and I think they are not related to standard lib ?

Thank you for your contribution!

Sorry for the wait, this is an initial to improve. Nice work, but you can improve.

Some functionality missing on the generated report:

1- Formatting is still very funky. Use CSS to render tabs correctly; they should render as 4 spaces.

image

2- Ideally, for the added/deleted lines, make the + and - symbols non selectable. This makes it so that if I select the lines in either side of the page I can copy and paste the code without copying the symbols.

3- I believe the report's performance could be improved. Consider using the library I mention in the comments (diffmatchpatch) and parallelizing execution. You can parallelize execution by having a goroutine trying to first of all find all of the directories in the source and destination, and then sending off to a channel individual package names.

4- The tool does not currently perform diffing recursively. Ie. creating the report on my stdlibs it does not generate a diff for package regexp/syntax. Consider searching using something like filepath.WalkDir.

5- On the report, please make the lines which are == greyed out (putting css opacity: 0.75 probably works)

@thehowl
Copy link
Member

thehowl commented Oct 2, 2024

@Villaquiranm

  1. Formatting: you're right, + / - are not selectable. Can you add line numbers as well, and align each column? (line number, + / - / no sign, code line), making sure that only the code is selectable?
  2. For the diff'ing algo and performance, I actually found this library when I was working on a side project some time ago. I believe it's the best for this use case. diffmatchpatch, which I linked in the original PR, is not ideal as it is for "character-by-character" diff (ie. what Google uses for google docs); not for the kind of diffs we use in programming.
    There are also likely some other "easy" performance wins. As you mentioned, we should skip directories which entirely don't exist in their Gno counterpart (in the index, we can just show cmd in red and say missing in gno or missing in go for libs like std).
    Parallelism may not be necessary; let's see what single-thread performance we can get, and if it's still slow after my suggestions use pprof to find out what's taking the most time. I think we can get <5s performance on most machines pretty easily even without it.
  3. Can you make sure the index lists sub-directories as well?

@Kouteki Kouteki added the review/triage-pending PRs opened by external contributors that are waiting for the 1st review label Oct 3, 2024
@jefft0 jefft0 removed the review/triage-pending PRs opened by external contributors that are waiting for the 1st review label Oct 4, 2024
@jefft0
Copy link
Contributor

jefft0 commented Oct 4, 2024

Removed the "review team" label because this is already reviewed by thehowl.

@Villaquiranm
Copy link
Contributor Author

@Villaquiranm

  1. Formatting: you're right, + / - are not selectable. Can you add line numbers as well, and align each column? (line number, + / - / no sign, code line), making sure that only the code is selectable?
  2. For the diff'ing algo and performance, I actually found this library when I was working on a side project some time ago. I believe it's the best for this use case. diffmatchpatch, which I linked in the original PR, is not ideal as it is for "character-by-character" diff (ie. what Google uses for google docs); not for the kind of diffs we use in programming.
    There are also likely some other "easy" performance wins. As you mentioned, we should skip directories which entirely don't exist in their Gno counterpart (in the index, we can just show cmd in red and say missing in gno or missing in go for libs like std).
    Parallelism may not be necessary; let's see what single-thread performance we can get, and if it's still slow after my suggestions use pprof to find out what's taking the most time. I think we can get <5s performance on most machines pretty easily even without it.
  3. Can you make sure the index lists sub-directories as well?

Hey thanks a lot again for the review 👍 @thehowl .

For 1 I added the line numbers and they are not selectable

2- Thanks a lot for this library tip, I was actually taking a lot of time trying to figure this line diff algorithm but It was pretty straightforward with the one you recommended. The only downside is that the library just return the part of the file that have some changes, so to print the entire file I needed to add some specific code to accomplish it.

3- subdirectories are listed on the index, if a library have subdirectories it will appear like a toggle list.
For performance I think we have something decent / usable the entire diff of the standard library takes around 2 seconds on my machine.

I think there is still some improvements to do to the overall code, Just wanted to make sure this is the behaviour we'll like before doing some cleaning .

Thanks

@thehowl
Copy link
Member

thehowl commented Oct 5, 2024

I suggest you not to rebase your PRs, but instead to merge in master; it makes my work when reviewing easier. And possibly yours as well, in managing conflicts. (There are none here I think, but this is good general guidance)

@thehowl
Copy link
Member

thehowl commented Oct 5, 2024

How are you running the thing? I'm running with go run . -src /usr/lib/go/src -dst ~/oc/gno/gnovm/stdlibs/ -out $(mktemp -d | tee /dev/stderr) but it turns out an empty report.

@Villaquiranm
Copy link
Contributor Author

I suggest you not to rebase your PRs, but instead to merge in master; it makes my work when reviewing easier. And possibly yours as well, in managing conflicts. (There are none here I think, but this is good general guidance)

Sorry about that I'll keep it on mind

How are you running the thing? I'm running with go run . -src /usr/lib/go/src -dst ~/oc/gno/gnovm/stdlibs/ -out $(mktemp -d | tee /dev/stderr) but it turns out an empty report.

I run it like this and for me it's working fine:

 go run . -src /opt/homebrew/Cellar/go/1.23.0/libexec/src -dst ../../gnovm/stdlibs -out ~/gnoreport/
2024/10/05 23:55:08 Building report...
2024/10/05 23:55:09 Report generation done!

@Villaquiranm
Copy link
Contributor Author

@thehowl
For some reason if I put
-dst ../../gnovm/stdlibs/ instead of -dst ../../gnovm/stdlibs is not working I'll check :(

@thehowl
Copy link
Member

thehowl commented Oct 5, 2024

Okay, figured it out.

My src directory is a symlink. It works when I use its realpath (it's a unix command). Can you make sure symlinks are handled? (It's pretty common on linux).

The performance is satisfactory now. Thank you.

I have a bunch of comments to make on some of the looks. Mostly things about UX / confusing more than design wise, just to make sure this thing is decently usable. Tell me if you want 'em your way, or I can wait if you already have ideas.

  • Can we make the toggles of differing files open by default?
  • Can we make files that are missing in one or the other closed by default, but with a visually impactful note? (like red/bold ...)
  • The split view doesn't align. Try to see it on github for what I mean (screenshot). Lines which are the same should be on the same level, ...

image

  • Can we make sure to add also files and dirs that don't exist in go's stdlib? like std. This way, it's not just a compatibility document, but a general way for users to see what changes between the two standard libraries.
  • Maybe it's best if you work your way from a Unified diff in the gotextdiff lib? I haven't examined your code yet, just spitballing based also on what I remember from using it.

@Villaquiranm
Copy link
Contributor Author

Villaquiranm commented Oct 9, 2024

@thehowl Thanks again for this new round of review :)
fixes on 3550f0b

My src directory is a symlink. It works when I use its realpath (it's a unix command). Can you make sure symlinks are handled? (It's pretty common on linux).

  • handled symlinks
  • Can we make the toggles of differing files open by default?
  • now the different files are open by default, but equal or one side missing files are not
  • Can we make files that are missing in one or the other closed by default, but with a visually impactful note? (like red/bold ...)
  • Missing files are printed in colors Red if missing in destination and green if missing in source
  • The split view doesn't align. Try to see it on github for what I mean (screenshot). Lines which are the same should be on the same level, ...
    I tried to align the best I could with the lib, There are still some lines that could be on the same line but in overall is better than before
  • Can we make sure to add also files and dirs that don't exist in go's stdlib? like std. This way, it's not just a compatibility document, but a general way for users to see what changes between the two standard libraries.
  • I'm still missing this I'll work on it now
  • Maybe it's best if you work your way from a Unified diff in the gotextdiff lib? I haven't examined your code yet, just spitballing based also on what I remember from using it.

Yep I'm using unified diff but seems like it only keeps the lines that have some differences, so entire parts of the file could be omited

edits := myers.ComputeEdits(span.URIFromPath(f.Src), f.srcContent, f.dstContent)
	unified := gotextdiff.ToUnified(f.Src, f.Dst, f.srcContent, edits)

Some sort of what github does(you can only see from line 8-12:
image

Copy link
Member

@thehowl thehowl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are some more improvements overall, but I think with these few changes we're good to merge and iterate. Thanks, and sorry for the delay in the review!

@@ -0,0 +1,31 @@
# Stdlibs_diff
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# Stdlibs_diff
# stdlib_diff

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed on 2ff4f9e

@@ -0,0 +1,31 @@
# Stdlibs_diff

Stdlibs_diff is a tool that generates an html report indicating differences between gno standard libraries and go standrad libraries
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Stdlibs_diff is a tool that generates an html report indicating differences between gno standard libraries and go standrad libraries
stdlib_diff is a tool that generates an html report indicating differences between gno standard libraries and go standrad libraries

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed on 2ff4f9e

| src | Directory containing packages that will be compared to destination | None |
| dst | Directory containing packages; used to compare src packages | None |
| out | Directory where the report will be created | None |
| src_is_gno | Indicates if the src parameters is the gno standard library | false |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though, IMO, this shouldn't really matter: flipping src and dst should yield mostly the same results, with the green/red lines swapped.

The result is that we should also show, in green, at the package level, libraries like "std" and "crypto/chacha20": they can be an aid to quickly illustrate what's new compared to go.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the flag src_is_gno and I changed the way to obtain what each directory is suppose to contain gno or go files. on 2ff4f9e

Comment on lines 277 to 294
// getRealPath will check if the directory is a symbolic link and resolve if path before returning it
func getRealPath(path string) (string, error) {
info, err := os.Lstat(path)
if err != nil {
return "", err
}

if info.Mode()&fs.ModeSymlink != 0 {
// File is symbolic link, no need to resolve
link, err := os.Readlink(path)
if err != nil {
return "", fmt.Errorf("can't resolve symbolic link: %w", err)
}
return link, nil
}

return path, nil
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't actually work, because it doesn't understand symlinks in intermediate parts of the path: you're looking for filepath.EvalSymlinks

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed to use filepath.EvalSymlinks on 2ff4f9e

}
return "", err
}
return strings.ReplaceAll(string(data), "\t", " "), nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is my way to convert the tabs to 4 spaces. is is working maybe is not the best way ?
Maybe I misunderstood

1- Formatting is still very funky. Use CSS to render tabs correctly; they should render as 4 spaces.

This line is doing the spaces to have the width equivalent of 4 spaces but when selected it is still a tab. I thought we had to convert it to real 4 spaces.

 p {
            white-space-collapse: preserve;
            margin: 0;
            font-size: 1rem;
            font-weight: 400;
            tab-size: 4;
        }

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you make the view take up 100% of the width?

image

(note it's not using all the available space on the right)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed on 2ff4f9e

@thehowl
Copy link
Member

thehowl commented Dec 9, 2024

btw, can you modify the gh-pages workflow? (.github/workflows/gh-pages)

so it creates the report automatically on each push. Maybe put the report in the subdir stdlib_diff, so it eventually shows up on https://gnolang.github.io/gno/stdlib_diff

I suggest you also try it out on a fork and link to a sample output, so we can check the workflow works correctly ahead of the merge.

@Gno2D2
Copy link
Collaborator

Gno2D2 commented Dec 9, 2024

🛠 PR Checks Summary

🔴 Maintainers must be able to edit this pull request (more info)

Manual Checks (for Reviewers):
  • SKIP: Do not block the CI for this PR
Read More

🤖 This bot helps streamline PR reviews by verifying automated checks and providing guidance for contributors and reviewers.

✅ Automated Checks (for Contributors):

🔴 Maintainers must be able to edit this pull request (more info)
🟢 The pull request head branch must be up-to-date with its base (more info)

☑️ Contributor Actions:
  1. Fix any issues flagged by automated checks.
  2. Follow the Contributor Checklist to ensure your PR is ready for review.
    • Add new tests, or document why they are unnecessary.
    • Provide clear examples/screenshots, if necessary.
    • Update documentation, if required.
    • Ensure no breaking changes, or include BREAKING CHANGE notes.
    • Link related issues/PRs, where applicable.
☑️ Reviewer Actions:
  1. Complete manual checks for the PR, including the guidelines and additional checks if applicable.
📚 Resources:
Debug
Automated Checks
Maintainers must be able to edit this pull request (more info)

If

🟢 Condition met
└── 🟢 The pull request was created from a fork (head branch repo: TERITORI/gno)

Then

🔴 Requirement not satisfied
└── 🔴 Maintainer can modify this pull request

The pull request head branch must be up-to-date with its base (more info)

If

🟢 Condition met
└── 🟢 On every pull request

Then

🟢 Requirement satisfied
└── 🟢 Head branch (TERITORI:feat/stdlib_diff) is up to date with base (master): behind by 0 / ahead by 41

Manual Checks
**SKIP**: Do not block the CI for this PR

If

🟢 Condition met
└── 🟢 On every pull request

Can be checked by

  • Any user with comment edit permission

@github-actions github-actions bot added the 🧾 package/realm Tag used for new Realms or Packages. label Dec 13, 2024
@Villaquiranm
Copy link
Contributor Author

btw, can you modify the gh-pages workflow? (.github/workflows/gh-pages)

so it creates the report automatically on each push. Maybe put the report in the subdir stdlib_diff, so it eventually shows up on https://gnolang.github.io/gno/stdlib_diff

I suggest you also try it out on a fork and link to a sample output, so we can check the workflow works correctly ahead of the merge.

@thehowl I followed your recommendations, here is the result :) thanks
https://villaquiranm.github.io/gno/stdlib_diff/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🧾 package/realm Tag used for new Realms or Packages.
Projects
Status: In Progress
Status: In Review
Development

Successfully merging this pull request may close these issues.

Tool for comparing source code of standard libaries between Go and Gno
7 participants