-
Notifications
You must be signed in to change notification settings - Fork 28
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Proforma: reuse existing files on import, if possible #2538
base: main
Are you sure you want to change the base?
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #2538 +/- ##
==========================================
+ Coverage 69.80% 69.91% +0.11%
==========================================
Files 203 203
Lines 6488 6512 +24
==========================================
+ Hits 4529 4553 +24
Misses 1959 1959 ☔ View full report in Codecov by Sentry. |
218f9b9
to
a7affb4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've done a first round of my review, during which I found a few questions. Let's check these first and I'll continue with a second round.
While reviewing, I also got one extra question: Given the changes in CodeOcean to the XML Path ID and the updated metadata, would you recommend to reexport all exercises to CodeHarbor after merging this PR?
Yes that would absolutely make sense! |
a7affb4
to
4d3198e
Compare
This comment was marked as resolved.
This comment was marked as resolved.
4d3198e
to
ac0fe42
Compare
This comment was marked as resolved.
This comment was marked as resolved.
38f45b0
to
20a2705
Compare
This comment was marked as resolved.
This comment was marked as resolved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nicely done, thank you! The current code changes look good and I've understand them.
Unfortunately, I still had issues verifying the code changes. Once again, I was using the already well-known file changes from here:
First of all, when the new version was in CodeHarbor and I tried uploading the old version, I got an exception there (the other direction from old to new works). Then, even when resolving this in CodeHarbor, I still couldn't import the old version in CodeOcean if the new version was stored already.
The error I got was: Validation failed: Files feedback message must be blank
. The problematic file is:
#<CodeOcean::File id: 67823922, content: "# actual tests here (in functions)\r\nfrom z_helpers...", context_id: 1101, context_type: "Exercise", file_id: nil, file_type_id: 8, hidden: true, name: "p51test", read_only: true, created_at: "2024-09-25 19:28:50.331358000 +0000", updated_at: "2024-09-25 19:29:34.363568000 +0000", native_file: nil, role: "regular_file", hashed_content: "70721d58fcbac5c9570e63633e778d9c", feedback_message: "Feedback", weight: nil, path: nil, file_template_id: nil, hidden_feedback: false, xml_id_path: ["63332861"]>
Despite the listed issue, I also found an issue with the proglang
method. Below is the fixed entry that also works if no execution_environment
has been selected:
def proglang
regex = %r{^openhpi/co_execenv_(?<language>[^:]*):(?<version>[^-]*)(?>-.*)?$}
match = regex.match @exercise&.execution_environment&.docker_image
match ? {name: match[:language], version: match[:version]} : nil
end
Could you include that, too, please (ideally with a test)?
708bb5d
to
ebef98c
Compare
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I started reviewing tonight, but noticed another issue: Some files can be left unreferenced.
Specifically, consider an update form the "old" to the "new" version of task 845 (ZIP files attached above). As part of the update, I see the following query in CodeOcean:
CodeOcean::File Update All (1.0ms) UPDATE "files" SET "context_id" = $1, "context_type" = $2 WHERE ("files"."id") IN (SELECT "files"."id" FROM "files" WHERE "files"."context_id" = $3 AND "files"."context_type" = $4 AND ("files"."id" = $5 OR "files"."id" = $6) ORDER BY "files"."path" ASC, "files"."name" ASC) [["context_id", nil], ["context_type", nil], ["context_id", 1101], ["context_type", "Exercise"], ["id", 67824363], ["id", 67824368]]
And indeed, as expected after this query, some files were updated to have a context_id
and context_type
of NULL
. This shouldn't be the case, I think, and files no longer referenced should be deleted completely.
Otherwise, the functionality seems to work fine. I'll continue my review once the functionality is fixed.
e6a3f23
to
0ad75ef
Compare
Use xml_id_path to recreate proforma structure with tests and model_solutions containing multiple files
0ad75ef
to
fefb056
Compare
sorry for the delay - should be fixed now |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the continuous work on this topic. When checking the import with the logs on my machine, I (still) noticed the following (845 old to new):
CodeOcean::File Update All (4.7ms) UPDATE "files" SET "context_id" = $1, "context_type" = $2 WHERE ("files"."id") IN (SELECT "files"."id" FROM "files" WHERE "files"."context_id" = $3 AND "files"."context_type" = $4 AND ("files"."id" = $5 OR "files"."id" = $6) ORDER BY "files"."path" ASC, "files"."name" ASC) [["context_id", nil], ["context_type", nil], ["context_id", 1101], ["context_type", "Exercise"], ["id", 67825538], ["id", 67825539]]
↳ app/services/proforma_service/convert_task_to_exercise.rb:21:in `import_task'
CodeOcean::File Load (0.6ms) SELECT "files".* FROM "files" WHERE "files"."id" IN ($1, $2) ORDER BY "files"."path" ASC, "files"."name" ASC [["id", 67825538], ["id", 67825539]]
↳ app/services/proforma_service/convert_task_to_exercise.rb:34:in `import_task'
CodeOcean::File Destroy (1.0ms) DELETE FROM "files" WHERE "files"."id" = $1 [["id", 67825538]]
↳ app/services/proforma_service/convert_task_to_exercise.rb:34:in `import_task'
CodeOcean::File Destroy (0.7ms) DELETE FROM "files" WHERE "files"."id" = $1 [["id", 67825539]]
Especially, I notice two things:
- The two files are first updated and then removed (same IDs).
- During the update, they are still unreferenced from their context.
Furthermore, the UPDATE
query looks a bit weird (regarding the WHERE
clause). Nevertheless, I would tend to accept this, since the functionality is working now. Two questions still remain:
- Can we switch the order to avoid the unnecessary
UPDATE
? - Should we opt to have a dedicated transaction (or potentially a
SELECT FOR UPDATE
/ lock)?
<ActiveRecord object(s)>.with_lock do
# Stuff
end
<ActiveRecord class>.transaction do
# Stuff
end
Otherwise, please find some domain-level comments attached.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We still have an issue when exercises get duplicated.
ExercisesController#clone
should be:
exercise = @exercise.duplicate(public: false, token: nil, user: current_user, uuid: nil)
(since otherwise duplicating breaks due to the UUID uniqueness check).
Furthermore, when duplicating an exercise, the newly introduced xml_id_path
on files is not reset (and hence simply kept). When the XML path ID is set through a ProFormA import, this might be okay-ish, but when it is set due to a previous export, we might mingle the IDs, don't we?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed the first part - I can't follow the second part. If the xml_id_path is valid for one Exercise/Task context it should be valid for a copied one as well, since we are not using CodeOcean-ids for the path, but the xml_id right?
Edit:
We are using the id to generate the xml_id_path
, but I still don't see it as a problem. The important thing is, that the used ids are unique in context of the given exercise. That would still be true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for fixing the first part 👍
Your considerations on the second part are understandable too. And indeed, since the xml_id_path
is only checked in the context of an exercise, this shouldn't be a problem in a technical sense.
By the way:
-
Why aren't we checking the uniqueness with something like:
validates :xml_id_path, uniqueness: {scope: %I[context_id context_type]}
-
Why aren't we checking that the :xml_id_path is blank if
context_type != 'Exercise'
(This requires a custom validation).
That having said, I also see advantages when the xml_id_path
stays unchanged for duplicated exercises (since it allows easier tracing of changes, ...). Maybe we simply add a single comment (to above validations or the duplicate
method?) to explain why we decided to keep it and that it might get out-of-sync with the id
(which is fine and intended).
fefb056
to
d3e3727
Compare
d3e3727
to
7e9974e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome! I did another round of review and only have another marginal comment. Let's get this done this week, if you like 💪.
# checking the last element of xml_id_path array for file.id | ||
codeocean_file = @exercise.files.where('array_length(xml_id_path, 1) IS NOT NULL AND xml_id_path[array_length(xml_id_path, 1)] = ?', file.id).first_or_initialize |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This SQL query is executed per file, thus creating an N+1 query. Rather than using SQL here, it would have been more beneficial to use in-memory filtering with Ruby. That way, we would load all files at once (using @exercise.files
) and then simply select the matching file. I would expect that this improves performance (less SQL) and potentially readability (easier access to last array element). Since we access @exercise.files
in destroy_old_files
already, this would even skip the first SQL query (to get all files).
I see this part as optimization only and, given the infrequent use of the endpoint and the time spent already, would be fine to skip this optimization here.
Use
xml_id_path
to recreate proforma structure with tests and model_solutions containing multiple filesalso sets
xml_id_path
for exported files, if it is not set. (similiar to uuid)