Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proforma: reuse existing files on import, if possible #2538

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

kkoehn
Copy link
Collaborator

@kkoehn kkoehn commented Sep 18, 2024

Use xml_id_path to recreate proforma structure with tests and model_solutions containing multiple files

also sets xml_id_path for exported files, if it is not set. (similiar to uuid)

@kkoehn kkoehn self-assigned this Sep 18, 2024
Copy link

codecov bot commented Sep 18, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 69.91%. Comparing base (35c601c) to head (7e9974e).
Report is 17 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2538      +/-   ##
==========================================
+ Coverage   69.80%   69.91%   +0.11%     
==========================================
  Files         203      203              
  Lines        6488     6512      +24     
==========================================
+ Hits         4529     4553      +24     
  Misses       1959     1959              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@kkoehn kkoehn force-pushed the proforma-reuse-existing-file-on-import branch 2 times, most recently from 218f9b9 to a7affb4 Compare September 23, 2024 20:53
@kkoehn kkoehn added enhancement ruby Pull requests that update Ruby code labels Sep 23, 2024
@kkoehn kkoehn requested a review from MrSerth September 23, 2024 21:13
@kkoehn kkoehn marked this pull request as ready for review September 23, 2024 21:13
Copy link
Member

@MrSerth MrSerth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've done a first round of my review, during which I found a few questions. Let's check these first and I'll continue with a second round.

While reviewing, I also got one extra question: Given the changes in CodeOcean to the XML Path ID and the updated metadata, would you recommend to reexport all exercises to CodeHarbor after merging this PR?

db/schema.rb Outdated Show resolved Hide resolved
app/services/proforma_service/convert_exercise_to_task.rb Outdated Show resolved Hide resolved
app/services/proforma_service/convert_exercise_to_task.rb Outdated Show resolved Hide resolved
app/services/proforma_service/convert_exercise_to_task.rb Outdated Show resolved Hide resolved
app/services/proforma_service/convert_task_to_exercise.rb Outdated Show resolved Hide resolved
app/services/proforma_service/convert_task_to_exercise.rb Outdated Show resolved Hide resolved
app/services/proforma_service/convert_exercise_to_task.rb Outdated Show resolved Hide resolved
@kkoehn
Copy link
Collaborator Author

kkoehn commented Sep 24, 2024

While reviewing, I also got one extra question: Given the changes in CodeOcean to the XML Path ID and the updated metadata, would you recommend to reexport all exercises to CodeHarbor after merging this PR?

Yes that would absolutely make sense!

@kkoehn kkoehn force-pushed the proforma-reuse-existing-file-on-import branch from a7affb4 to 4d3198e Compare September 24, 2024 20:07
@MrSerth

This comment was marked as resolved.

@kkoehn kkoehn force-pushed the proforma-reuse-existing-file-on-import branch from 4d3198e to ac0fe42 Compare September 24, 2024 21:09
@kkoehn

This comment was marked as resolved.

@kkoehn kkoehn force-pushed the proforma-reuse-existing-file-on-import branch from 38f45b0 to 20a2705 Compare September 24, 2024 22:45
@kkoehn kkoehn requested a review from MrSerth September 24, 2024 22:47
@MrSerth

This comment was marked as resolved.

Copy link
Member

@MrSerth MrSerth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nicely done, thank you! The current code changes look good and I've understand them.

Unfortunately, I still had issues verifying the code changes. Once again, I was using the already well-known file changes from here:

First of all, when the new version was in CodeHarbor and I tried uploading the old version, I got an exception there (the other direction from old to new works). Then, even when resolving this in CodeHarbor, I still couldn't import the old version in CodeOcean if the new version was stored already.

The error I got was: Validation failed: Files feedback message must be blank. The problematic file is:

#<CodeOcean::File id: 67823922, content: "# actual tests here (in functions)\r\nfrom z_helpers...", context_id: 1101, context_type: "Exercise", file_id: nil, file_type_id: 8, hidden: true, name: "p51test", read_only: true, created_at: "2024-09-25 19:28:50.331358000 +0000", updated_at: "2024-09-25 19:29:34.363568000 +0000", native_file: nil, role: "regular_file", hashed_content: "70721d58fcbac5c9570e63633e778d9c", feedback_message: "Feedback", weight: nil, path: nil, file_template_id: nil, hidden_feedback: false, xml_id_path: ["63332861"]>

Despite the listed issue, I also found an issue with the proglang method. Below is the fixed entry that also works if no execution_environment has been selected:

    def proglang
      regex = %r{^openhpi/co_execenv_(?<language>[^:]*):(?<version>[^-]*)(?>-.*)?$}
      match = regex.match @exercise&.execution_environment&.docker_image
      match ? {name: match[:language], version: match[:version]} : nil
    end

Could you include that, too, please (ideally with a test)?

app/services/proforma_service/convert_exercise_to_task.rb Outdated Show resolved Hide resolved
@kkoehn kkoehn force-pushed the proforma-reuse-existing-file-on-import branch 6 times, most recently from 708bb5d to ebef98c Compare October 8, 2024 19:30
@kkoehn
Copy link
Collaborator Author

kkoehn commented Oct 8, 2024

please note that import_spec.rb:129 is flaky right now - will look into it tomorrow

Copy link
Member

@MrSerth MrSerth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I started reviewing tonight, but noticed another issue: Some files can be left unreferenced.

Specifically, consider an update form the "old" to the "new" version of task 845 (ZIP files attached above). As part of the update, I see the following query in CodeOcean:

  CodeOcean::File Update All (1.0ms)  UPDATE "files" SET "context_id" = $1, "context_type" = $2 WHERE ("files"."id") IN (SELECT "files"."id" FROM "files" WHERE "files"."context_id" = $3 AND "files"."context_type" = $4 AND ("files"."id" = $5 OR "files"."id" = $6) ORDER BY "files"."path" ASC, "files"."name" ASC)  [["context_id", nil], ["context_type", nil], ["context_id", 1101], ["context_type", "Exercise"], ["id", 67824363], ["id", 67824368]]

And indeed, as expected after this query, some files were updated to have a context_id and context_type of NULL. This shouldn't be the case, I think, and files no longer referenced should be deleted completely.

Otherwise, the functionality seems to work fine. I'll continue my review once the functionality is fixed.

@kkoehn kkoehn force-pushed the proforma-reuse-existing-file-on-import branch 2 times, most recently from e6a3f23 to 0ad75ef Compare November 26, 2024 21:57
Use xml_id_path to recreate proforma structure with tests and model_solutions containing multiple files
@kkoehn kkoehn force-pushed the proforma-reuse-existing-file-on-import branch from 0ad75ef to fefb056 Compare November 26, 2024 22:12
@kkoehn
Copy link
Collaborator Author

kkoehn commented Nov 26, 2024

I started reviewing tonight, but noticed another issue: Some files can be left unreferenced.

Specifically, consider an update form the "old" to the "new" version of task 845 (ZIP files attached above). As part of the update, I see the following query in CodeOcean:

  CodeOcean::File Update All (1.0ms)  UPDATE "files" SET "context_id" = $1, "context_type" = $2 WHERE ("files"."id") IN (SELECT "files"."id" FROM "files" WHERE "files"."context_id" = $3 AND "files"."context_type" = $4 AND ("files"."id" = $5 OR "files"."id" = $6) ORDER BY "files"."path" ASC, "files"."name" ASC)  [["context_id", nil], ["context_type", nil], ["context_id", 1101], ["context_type", "Exercise"], ["id", 67824363], ["id", 67824368]]

And indeed, as expected after this query, some files were updated to have a context_id and context_type of NULL. This shouldn't be the case, I think, and files no longer referenced should be deleted completely.

Otherwise, the functionality seems to work fine. I'll continue my review once the functionality is fixed.

sorry for the delay - should be fixed now

Copy link
Member

@MrSerth MrSerth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the continuous work on this topic. When checking the import with the logs on my machine, I (still) noticed the following (845 old to new):

  CodeOcean::File Update All (4.7ms)  UPDATE "files" SET "context_id" = $1, "context_type" = $2 WHERE ("files"."id") IN (SELECT "files"."id" FROM "files" WHERE "files"."context_id" = $3 AND "files"."context_type" = $4 AND ("files"."id" = $5 OR "files"."id" = $6) ORDER BY "files"."path" ASC, "files"."name" ASC)  [["context_id", nil], ["context_type", nil], ["context_id", 1101], ["context_type", "Exercise"], ["id", 67825538], ["id", 67825539]]
  ↳ app/services/proforma_service/convert_task_to_exercise.rb:21:in `import_task'
  CodeOcean::File Load (0.6ms)  SELECT "files".* FROM "files" WHERE "files"."id" IN ($1, $2) ORDER BY "files"."path" ASC, "files"."name" ASC  [["id", 67825538], ["id", 67825539]]
  ↳ app/services/proforma_service/convert_task_to_exercise.rb:34:in `import_task'
  CodeOcean::File Destroy (1.0ms)  DELETE FROM "files" WHERE "files"."id" = $1  [["id", 67825538]]
  ↳ app/services/proforma_service/convert_task_to_exercise.rb:34:in `import_task'
  CodeOcean::File Destroy (0.7ms)  DELETE FROM "files" WHERE "files"."id" = $1  [["id", 67825539]]

Especially, I notice two things:

  1. The two files are first updated and then removed (same IDs).
  2. During the update, they are still unreferenced from their context.

Furthermore, the UPDATE query looks a bit weird (regarding the WHERE clause). Nevertheless, I would tend to accept this, since the functionality is working now. Two questions still remain:

  • Can we switch the order to avoid the unnecessary UPDATE?
  • Should we opt to have a dedicated transaction (or potentially a SELECT FOR UPDATE / lock)?
<ActiveRecord object(s)>.with_lock do
  # Stuff
end

<ActiveRecord class>.transaction do
  # Stuff
end

Otherwise, please find some domain-level comments attached.

app/services/proforma_service/convert_exercise_to_task.rb Outdated Show resolved Hide resolved
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We still have an issue when exercises get duplicated.

ExercisesController#clone should be:

    exercise = @exercise.duplicate(public: false, token: nil, user: current_user, uuid: nil)

(since otherwise duplicating breaks due to the UUID uniqueness check).


Furthermore, when duplicating an exercise, the newly introduced xml_id_path on files is not reset (and hence simply kept). When the XML path ID is set through a ProFormA import, this might be okay-ish, but when it is set due to a previous export, we might mingle the IDs, don't we?

Copy link
Collaborator Author

@kkoehn kkoehn Dec 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed the first part - I can't follow the second part. If the xml_id_path is valid for one Exercise/Task context it should be valid for a copied one as well, since we are not using CodeOcean-ids for the path, but the xml_id right?

Edit:
We are using the id to generate the xml_id_path, but I still don't see it as a problem. The important thing is, that the used ids are unique in context of the given exercise. That would still be true.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing the first part 👍

Your considerations on the second part are understandable too. And indeed, since the xml_id_path is only checked in the context of an exercise, this shouldn't be a problem in a technical sense.

By the way:

  • Why aren't we checking the uniqueness with something like:

      validates :xml_id_path, uniqueness: {scope: %I[context_id context_type]}
  • Why aren't we checking that the :xml_id_path is blank if context_type != 'Exercise' (This requires a custom validation).

That having said, I also see advantages when the xml_id_path stays unchanged for duplicated exercises (since it allows easier tracing of changes, ...). Maybe we simply add a single comment (to above validations or the duplicate method?) to explain why we decided to keep it and that it might get out-of-sync with the id (which is fine and intended).

@kkoehn kkoehn force-pushed the proforma-reuse-existing-file-on-import branch from fefb056 to d3e3727 Compare December 4, 2024 20:43
@kkoehn kkoehn force-pushed the proforma-reuse-existing-file-on-import branch from d3e3727 to 7e9974e Compare December 4, 2024 20:44
Copy link
Member

@MrSerth MrSerth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome! I did another round of review and only have another marginal comment. Let's get this done this week, if you like 💪.

Comment on lines +109 to +110
# checking the last element of xml_id_path array for file.id
codeocean_file = @exercise.files.where('array_length(xml_id_path, 1) IS NOT NULL AND xml_id_path[array_length(xml_id_path, 1)] = ?', file.id).first_or_initialize
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This SQL query is executed per file, thus creating an N+1 query. Rather than using SQL here, it would have been more beneficial to use in-memory filtering with Ruby. That way, we would load all files at once (using @exercise.files) and then simply select the matching file. I would expect that this improves performance (less SQL) and potentially readability (easier access to last array element). Since we access @exercise.files in destroy_old_files already, this would even skip the first SQL query (to get all files).

I see this part as optimization only and, given the infrequent use of the endpoint and the time spent already, would be fine to skip this optimization here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement ruby Pull requests that update Ruby code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants