Proforma: reuse existing files on import, if possible #2538

kkoehn · 2024-09-18T20:26:54Z

Use xml_id_path to recreate proforma structure with tests and model_solutions containing multiple files

also sets xml_id_path for exported files, if it is not set. (similiar to uuid)

codecov · 2024-09-18T20:35:06Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 69.91%. Comparing base (35c601c) to head (7e9974e).
Report is 17 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2538      +/-   ##
==========================================
+ Coverage   69.80%   69.91%   +0.11%     
==========================================
  Files         203      203              
  Lines        6488     6512      +24     
==========================================
+ Hits         4529     4553      +24     
  Misses       1959     1959

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

MrSerth

I've done a first round of my review, during which I found a few questions. Let's check these first and I'll continue with a second round.

While reviewing, I also got one extra question: Given the changes in CodeOcean to the XML Path ID and the updated metadata, would you recommend to reexport all exercises to CodeHarbor after merging this PR?

db/schema.rb

app/services/proforma_service/convert_exercise_to_task.rb

app/services/proforma_service/convert_task_to_exercise.rb

app/services/proforma_service/convert_exercise_to_task.rb

kkoehn · 2024-09-24T19:43:34Z

While reviewing, I also got one extra question: Given the changes in CodeOcean to the XML Path ID and the updated metadata, would you recommend to reexport all exercises to CodeHarbor after merging this PR?

Yes that would absolutely make sense!

MrSerth

Nicely done, thank you! The current code changes look good and I've understand them.

Unfortunately, I still had issues verifying the code changes. Once again, I was using the already well-known file changes from here:

First of all, when the new version was in CodeHarbor and I tried uploading the old version, I got an exception there (the other direction from old to new works). Then, even when resolving this in CodeHarbor, I still couldn't import the old version in CodeOcean if the new version was stored already.

The error I got was: Validation failed: Files feedback message must be blank. The problematic file is:

#<CodeOcean::File id: 67823922, content: "# actual tests here (in functions)\r\nfrom z_helpers...", context_id: 1101, context_type: "Exercise", file_id: nil, file_type_id: 8, hidden: true, name: "p51test", read_only: true, created_at: "2024-09-25 19:28:50.331358000 +0000", updated_at: "2024-09-25 19:29:34.363568000 +0000", native_file: nil, role: "regular_file", hashed_content: "70721d58fcbac5c9570e63633e778d9c", feedback_message: "Feedback", weight: nil, path: nil, file_template_id: nil, hidden_feedback: false, xml_id_path: ["63332861"]>

Despite the listed issue, I also found an issue with the proglang method. Below is the fixed entry that also works if no execution_environment has been selected:

    def proglang
      regex = %r{^openhpi/co_execenv_(?<language>[^:]*):(?<version>[^-]*)(?>-.*)?$}
      match = regex.match @exercise&.execution_environment&.docker_image
      match ? {name: match[:language], version: match[:version]} : nil
    end

Could you include that, too, please (ideally with a test)?

app/services/proforma_service/convert_exercise_to_task.rb

kkoehn · 2024-10-08T19:31:47Z

~~please note that import_spec.rb:129 is flaky right now - will look into it tomorrow~~

MrSerth

I started reviewing tonight, but noticed another issue: Some files can be left unreferenced.

Specifically, consider an update form the "old" to the "new" version of task 845 (ZIP files attached above). As part of the update, I see the following query in CodeOcean:

  CodeOcean::File Update All (1.0ms)  UPDATE "files" SET "context_id" = $1, "context_type" = $2 WHERE ("files"."id") IN (SELECT "files"."id" FROM "files" WHERE "files"."context_id" = $3 AND "files"."context_type" = $4 AND ("files"."id" = $5 OR "files"."id" = $6) ORDER BY "files"."path" ASC, "files"."name" ASC)  [["context_id", nil], ["context_type", nil], ["context_id", 1101], ["context_type", "Exercise"], ["id", 67824363], ["id", 67824368]]

And indeed, as expected after this query, some files were updated to have a context_id and context_type of NULL. This shouldn't be the case, I think, and files no longer referenced should be deleted completely.

Otherwise, the functionality seems to work fine. I'll continue my review once the functionality is fixed.

Use xml_id_path to recreate proforma structure with tests and model_solutions containing multiple files

kkoehn · 2024-11-26T22:24:03Z

I started reviewing tonight, but noticed another issue: Some files can be left unreferenced.

Specifically, consider an update form the "old" to the "new" version of task 845 (ZIP files attached above). As part of the update, I see the following query in CodeOcean:
  CodeOcean::File Update All (1.0ms)  UPDATE "files" SET "context_id" = $1, "context_type" = $2 WHERE ("files"."id") IN (SELECT "files"."id" FROM "files" WHERE "files"."context_id" = $3 AND "files"."context_type" = $4 AND ("files"."id" = $5 OR "files"."id" = $6) ORDER BY "files"."path" ASC, "files"."name" ASC)  [["context_id", nil], ["context_type", nil], ["context_id", 1101], ["context_type", "Exercise"], ["id", 67824363], ["id", 67824368]]
And indeed, as expected after this query, some files were updated to have a context_id and context_type of NULL. This shouldn't be the case, I think, and files no longer referenced should be deleted completely.

Otherwise, the functionality seems to work fine. I'll continue my review once the functionality is fixed.

sorry for the delay - should be fixed now

MrSerth

Thank you for the continuous work on this topic. When checking the import with the logs on my machine, I (still) noticed the following (845 old to new):

  CodeOcean::File Update All (4.7ms)  UPDATE "files" SET "context_id" = $1, "context_type" = $2 WHERE ("files"."id") IN (SELECT "files"."id" FROM "files" WHERE "files"."context_id" = $3 AND "files"."context_type" = $4 AND ("files"."id" = $5 OR "files"."id" = $6) ORDER BY "files"."path" ASC, "files"."name" ASC)  [["context_id", nil], ["context_type", nil], ["context_id", 1101], ["context_type", "Exercise"], ["id", 67825538], ["id", 67825539]]
  ↳ app/services/proforma_service/convert_task_to_exercise.rb:21:in `import_task'
  CodeOcean::File Load (0.6ms)  SELECT "files".* FROM "files" WHERE "files"."id" IN ($1, $2) ORDER BY "files"."path" ASC, "files"."name" ASC  [["id", 67825538], ["id", 67825539]]
  ↳ app/services/proforma_service/convert_task_to_exercise.rb:34:in `import_task'
  CodeOcean::File Destroy (1.0ms)  DELETE FROM "files" WHERE "files"."id" = $1  [["id", 67825538]]
  ↳ app/services/proforma_service/convert_task_to_exercise.rb:34:in `import_task'
  CodeOcean::File Destroy (0.7ms)  DELETE FROM "files" WHERE "files"."id" = $1  [["id", 67825539]]

Especially, I notice two things:

The two files are first updated and then removed (same IDs).
During the update, they are still unreferenced from their context.

Furthermore, the UPDATE query looks a bit weird (regarding the WHERE clause). Nevertheless, I would tend to accept this, since the functionality is working now. Two questions still remain:

Can we switch the order to avoid the unnecessary UPDATE?
Should we opt to have a dedicated transaction (or potentially a SELECT FOR UPDATE / lock)?

<ActiveRecord object(s)>.with_lock do
  # Stuff
end

<ActiveRecord class>.transaction do
  # Stuff
end

Otherwise, please find some domain-level comments attached.

app/services/proforma_service/convert_exercise_to_task.rb

MrSerth · 2024-12-02T14:45:27Z

app/services/proforma_service/convert_exercise_to_task.rb

We still have an issue when exercises get duplicated.

ExercisesController#clone should be:

exercise = @exercise.duplicate(public: false, token: nil, user: current_user, uuid: nil)

(since otherwise duplicating breaks due to the UUID uniqueness check).

Furthermore, when duplicating an exercise, the newly introduced xml_id_path on files is not reset (and hence simply kept). When the XML path ID is set through a ProFormA import, this might be okay-ish, but when it is set due to a previous export, we might mingle the IDs, don't we?

Fixed the first part - I can't follow the second part. If the xml_id_path is valid for one Exercise/Task context it should be valid for a copied one as well, ~~since we are not using CodeOcean-ids for the path, but the xml_id right?~~

Edit:
We are using the id to generate the xml_id_path, but I still don't see it as a problem. The important thing is, that the used ids are unique in context of the given exercise. That would still be true.

Thanks for fixing the first part 👍

Your considerations on the second part are understandable too. And indeed, since the xml_id_path is only checked in the context of an exercise, this shouldn't be a problem in a technical sense.

By the way:

Why aren't we checking the uniqueness with something like:

validates :xml_id_path, uniqueness: {scope: %I[context_id context_type]}

Why aren't we checking that the :xml_id_path is blank if context_type != 'Exercise' (This requires a custom validation).

That having said, I also see advantages when the xml_id_path stays unchanged for duplicated exercises (since it allows easier tracing of changes, ...). Maybe we simply add a single comment (to above validations or the duplicate method?) to explain why we decided to keep it and that it might get out-of-sync with the id (which is fine and intended).

app/services/proforma_service/convert_task_to_exercise.rb

app/services/proforma_service/convert_exercise_to_task.rb

Co-authored-by: Sebastian Serth <[email protected]>

MrSerth

Awesome! I did another round of review and only have another marginal comment. Let's get this done this week, if you like 💪.

MrSerth · 2024-12-08T20:16:56Z

app/services/proforma_service/convert_task_to_exercise.rb

+      # checking the last element of xml_id_path array for file.id
+      codeocean_file = @exercise.files.where('array_length(xml_id_path, 1) IS NOT NULL AND xml_id_path[array_length(xml_id_path, 1)] = ?', file.id).first_or_initialize


This SQL query is executed per file, thus creating an N+1 query. Rather than using SQL here, it would have been more beneficial to use in-memory filtering with Ruby. That way, we would load all files at once (using @exercise.files) and then simply select the matching file. I would expect that this improves performance (less SQL) and potentially readability (easier access to last array element). Since we access @exercise.files in destroy_old_files already, this would even skip the first SQL query (to get all files).

I see this part as optimization only and, given the infrequent use of the endpoint and the time spent already, would be fine to skip this optimization here.

kkoehn self-assigned this Sep 18, 2024

kkoehn force-pushed the proforma-reuse-existing-file-on-import branch 2 times, most recently from 218f9b9 to a7affb4 Compare September 23, 2024 20:53

kkoehn added enhancement ruby Pull requests that update Ruby code labels Sep 23, 2024

kkoehn requested a review from MrSerth September 23, 2024 21:13

kkoehn marked this pull request as ready for review September 23, 2024 21:13

MrSerth reviewed Sep 23, 2024

View reviewed changes

kkoehn force-pushed the proforma-reuse-existing-file-on-import branch from a7affb4 to 4d3198e Compare September 24, 2024 20:07

This comment was marked as resolved.

Sign in to view

kkoehn force-pushed the proforma-reuse-existing-file-on-import branch from 4d3198e to ac0fe42 Compare September 24, 2024 21:09

This comment was marked as resolved.

Sign in to view

kkoehn force-pushed the proforma-reuse-existing-file-on-import branch from 38f45b0 to 20a2705 Compare September 24, 2024 22:45

kkoehn requested a review from MrSerth September 24, 2024 22:47

This comment was marked as resolved.

Sign in to view

MrSerth reviewed Sep 25, 2024

View reviewed changes

app/services/proforma_service/convert_exercise_to_task.rb Outdated Show resolved Hide resolved

MrSerth mentioned this pull request Sep 25, 2024

Check task overwriting through ProFormA openHPI/codeharbor#1320

Open

18 tasks

kkoehn force-pushed the proforma-reuse-existing-file-on-import branch 6 times, most recently from 708bb5d to ebef98c Compare October 8, 2024 19:30

MrSerth reviewed Oct 8, 2024

View reviewed changes

kkoehn force-pushed the proforma-reuse-existing-file-on-import branch 2 times, most recently from e6a3f23 to 0ad75ef Compare November 26, 2024 21:57

basic functionality of reusing files based on xml_id.

3d54adb

Use xml_id_path to recreate proforma structure with tests and model_solutions containing multiple files

kkoehn force-pushed the proforma-reuse-existing-file-on-import branch from 0ad75ef to fefb056 Compare November 26, 2024 22:12

MrSerth reviewed Dec 2, 2024

View reviewed changes

kkoehn force-pushed the proforma-reuse-existing-file-on-import branch from fefb056 to d3e3727 Compare December 4, 2024 20:43

kkoehn and others added 3 commits December 4, 2024 21:44

refactor xml_id_path to use array

1f9d9a9

Update app/services/proforma_service/convert_exercise_to_task.rb

af14474

Co-authored-by: Sebastian Serth <[email protected]>

fix exercise#duplicate when xml_id_path is set

7e9974e

kkoehn force-pushed the proforma-reuse-existing-file-on-import branch from d3e3727 to 7e9974e Compare December 4, 2024 20:44

MrSerth reviewed Dec 8, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Proforma: reuse existing files on import, if possible #2538

Proforma: reuse existing files on import, if possible #2538

kkoehn commented Sep 18, 2024 •

edited

Loading

codecov bot commented Sep 18, 2024 •

edited

Loading

MrSerth left a comment

kkoehn commented Sep 24, 2024

This comment was marked as resolved.

This comment was marked as resolved.

This comment was marked as resolved.

MrSerth left a comment

kkoehn commented Oct 8, 2024 •

edited

Loading

MrSerth left a comment

kkoehn commented Nov 26, 2024

MrSerth left a comment

MrSerth Dec 2, 2024

kkoehn Dec 4, 2024 •

edited

Loading

MrSerth Dec 8, 2024

MrSerth left a comment

MrSerth Dec 8, 2024

		# checking the last element of xml_id_path array for file.id
		codeocean_file = @exercise.files.where('array_length(xml_id_path, 1) IS NOT NULL AND xml_id_path[array_length(xml_id_path, 1)] = ?', file.id).first_or_initialize

Proforma: reuse existing files on import, if possible #2538

Are you sure you want to change the base?

Proforma: reuse existing files on import, if possible #2538

Conversation

kkoehn commented Sep 18, 2024 • edited Loading

codecov bot commented Sep 18, 2024 • edited Loading

Codecov Report

MrSerth left a comment

Choose a reason for hiding this comment

kkoehn commented Sep 24, 2024

This comment was marked as resolved.

This comment was marked as resolved.

This comment was marked as resolved.

MrSerth left a comment

Choose a reason for hiding this comment

kkoehn commented Oct 8, 2024 • edited Loading

MrSerth left a comment

Choose a reason for hiding this comment

kkoehn commented Nov 26, 2024

MrSerth left a comment

Choose a reason for hiding this comment

MrSerth Dec 2, 2024

Choose a reason for hiding this comment

kkoehn Dec 4, 2024 • edited Loading

Choose a reason for hiding this comment

MrSerth Dec 8, 2024

Choose a reason for hiding this comment

MrSerth left a comment

Choose a reason for hiding this comment

MrSerth Dec 8, 2024

Choose a reason for hiding this comment

kkoehn commented Sep 18, 2024 •

edited

Loading

codecov bot commented Sep 18, 2024 •

edited

Loading

kkoehn commented Oct 8, 2024 •

edited

Loading

kkoehn Dec 4, 2024 •

edited

Loading