Model collections #163
Comments
Dear @apdavison, this is an important use case, and there is a similar one for datasets (although with much larger collections, I suppose). We have received multiple requests from users and reviewers to "hide" versions in the KG Search and only present them on direct request. I had something like the following in mind for this, but have not yet fully discussed it with @olinux or the development team (I think it matches what you suggest, but maybe not completely?):
If this would not work for your use case, could you maybe give a more concrete example? @olinux, your thoughts on this?
@lzehl This is close to the model collection use case; the main difference is that each of the component single neuron models has both a
@apdavison I think I do not understand the structure of this completely... Let me ask a couple of questions (to see where I might have the wrong assumption):
@lzehl Let me try to restate the problem:
The problem is that, because there are so many, they would dominate the KG Search results, so we would like a single entry per collection in the results, which gives access to all the Models/ModelVersions in the collection. (Note that we have several such collections.) Possible solutions: 2. create a Model (with one or more associated ModelVersions) to represent the collection.
Thanks for explaining again @apdavison. Here are my thoughts: Normally I would say that a Project should typically be used for grouping related models. But that of course does not solve the problem that you do not want to flood the KG Search results. For solution 1: I still have some questions here: why is a ModelCollectionVersion needed? Would it not be sufficient to group all Models (and with that all their versions) into one collection? Is there any other metadata you would like to capture for a modelCollection, besides using it to group related Models? For solution 2: Assuming there is only one Collection and no CollectionVersion is needed, I would maybe redefine the ResearchProduct / ResearchProductVersion schemas by moving "hasSupplementVersion" to the ResearchProduct (that would work for datasets as well, and I suppose for software too, @jagru20?). I would then define one Model for the whole collection, define one ModelVersion for each related model, and list them in hasSupplementVersion. I would leave hasVersions in the Model blank. For each version of the related models I would again define ModelVersions and connect those via hasNewVersion/hasAlternativeVersion with the corresponding ones in hasSupplementVersion. Note: "hasSupplementVersion" could also be renamed to something else if needed.
For example, more models might be added to the collection.
@apdavison I see. That case could then be covered in solution 2 via a Project: each ModelCollectionVersion would be one Model (the rest is the same as stated above) and these Models are grouped into a Project. Would that work? Solution 3 (similar to 2 but from a different angle): Leave the schemas as they are, but add an optional property "isPartOfCollection" to a ResearchProduct that can link to another ResearchProduct of the same type, in your case a Model. The referenced Model(s) (entered in "isPartOfCollection") represent(s) the CollectionVersion(s), which can be grouped into one Project. Each Model that lists another Model in "isPartOfCollection" does not need to be directly visualized in the KG Search (including its ModelVersions); only the ones that are referenced in "isPartOfCollection" are grouped in a Project.
I need to think this through more... @apdavison could you let me know in which use case a collection version is really needed (e.g., should they always get a DOI)? I first understood that the feature you're missing is mainly for visualization purposes and not because this structure is needed for referencing. I'm just asking again because the versioning makes this problem much more difficult to solve cleanly... Or, asking differently, is it necessary that the overall collection is citable (meaning that it gets a DOI)?
Yes, the collection needs to be citable. It is less important that the individual members be citable. If we add the property "isPartOfCollection" to ResearchProductVersion rather than to ResearchProduct, I think that solves the versioning problem. Then any ResearchProductVersion for which "isPartOfCollection" is not empty should be "hidden", and any ResearchProduct for which all its versions are hidden should also be hidden.
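The hiding rule described in this comment can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the classes `ResearchProductVersion` and `ResearchProduct`, the field `is_part_of_collection`, and both helper functions are hypothetical stand-ins and not the openMINDS schemas or any KG Search API. The treatment of a product with no versions (kept visible) is also an assumption, since the thread does not cover that case.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ResearchProductVersion:
    identifier: str
    # Hypothetical field mirroring the proposed "isPartOfCollection" property:
    # identifiers of the collection versions this version belongs to.
    is_part_of_collection: List[str] = field(default_factory=list)


@dataclass
class ResearchProduct:
    name: str
    versions: List[ResearchProductVersion] = field(default_factory=list)


def version_is_hidden(version: ResearchProductVersion) -> bool:
    # A version is hidden as soon as it is part of at least one collection.
    return len(version.is_part_of_collection) > 0


def product_is_hidden(product: ResearchProduct) -> bool:
    # Assumption: a product without any versions stays visible.
    if not product.versions:
        return False
    # A product is hidden only if every one of its versions is hidden.
    return all(version_is_hidden(v) for v in product.versions)


collection = ResearchProduct("collection", [ResearchProductVersion("collection-v1")])
member = ResearchProduct(
    "neuron-a", [ResearchProductVersion("neuron-a-v1", ["collection-v1"])]
)
mixed = ResearchProduct(
    "neuron-b",
    [
        ResearchProductVersion("neuron-b-v1", ["collection-v1"]),
        ResearchProductVersion("neuron-b-v2"),  # released on its own
    ],
)
visible = [p.name for p in (collection, member, mixed) if not product_is_hidden(p)]
```

In this sketch a product such as `neuron-b`, with one version inside a collection and one released on its own, stays visible because not all of its versions are hidden.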
@apdavison sorry for all the spam on this issue today. Would the previous sketch of the model satisfy your use case?
@lzehl any possibility you could share the diagram with me in an editable form? Then I can show what I have in mind.
@apdavison Of course, I'll send you an email.
Hi all, I am not sure if I can give much input to the discussion, but to answer the question on the location of hasSupplementVersion: I am not sure that property makes sense in researchProduct, because it would mean that one software entity can link to a softwareVersion. In my opinion, this possibility should be located, for software, at researchProductVersion, as it is theoretically possible that different versions of the same software have different supplementVersions (i.e. the former components). However, if hasSupplementVersion needs to be moved, it might be sensible to amend softwareVersion such that it holds a property "hasComponents", which would also be a bit more straightforward for software.
@jagru20 yes. I thought that you might mention "hasComponents". Let's wait for @apdavison's feedback on the sketch. Please continue following this discussion so that we can solve this issue sufficiently for all research products (considering all adoptions of the concept of an additional grouping for research products and/or research product versions).
@apdavison looks good I think. The ResearchProducts on the right side and the ResearchProductVersions of the different model versions in the middle should not show up in the KG Search, correct? Some points / questions:
Hi, IMHO, we should (as discussed above):
Based on @apdavison's example structure and @olinux's comments, I'd like you to have a look at the following drawing.
I picked up the following aspects from the suggested approaches:
Although quite complex in structure, this seems to be the most consistent way of capturing such cases in the graph database. USE CASE ONE: Model collections (@apdavison does this still fit?) @olinux does this still fit with your thoughts as well?
Hi Lyuba,
@olinux & @jagru20 for "hasComponents", to be honest, it does not really make sense to me to allow pointing from a version to a concept. Would it not be sufficient to allow "only" the registration of the conceptual collection (white shade) with its Research Product components (gray shaded) in the metadata model depicted above? Meaning that, for such a case, the colored collection versions could be left out if it does not make sense to capture them explicitly.
@lzehl this is actually not the same use-case: What I had in mind for software was that you're registering your software - let's say "Knowledge Graph" with its version "v3" -> now "Knowledge Graph v3" depends on a component called "ArangoDB". So what I would do is to register "ArangoDB" as another Software. There's plenty of different versions for ArangoDB and the "Knowledge Graph" is trying to upgrade regularly to them. The question is now which granularity you would like to track. You could state:
The first is obviously the most generic but also the one which needs the least maintenance. Here, you would need to point to the "concept" for "ArangoDB", which is version independent. The second is the most practical if we want to improve granularity to a version level (and therefore disallow linking ResearchProductVersions to a ResearchProduct), since at least the metadata wouldn't need to be updated for every minor version. Nevertheless, if we decide to migrate to ArangoDB 4 (which is possible without changing the Knowledge Graph version number since it's an internal dependency), our metadata entry would still need to be updated. Here the question arises of who would actually do it and how the software team would be notified about the upgrade. The third approach would only be realistic if we automatically ingested dependency trees (based on existing mechanisms like Maven / Gradle / npm / ...), which, imho, is not the purpose of openMINDS. It would definitely not be possible to manage it.
@olinux thanks for providing this hands-on example. It helps a lot to organize my thoughts better. I think the key point for software is that we are talking about dependencies of a software product which has frequent sub-releases that might not all be captured in the KG. The components in such software were (most likely) not built to serve that software but were produced as independent products, similar to the models in a collection. The difference from the model collection: all software dependencies are needed in order for the main software to work, while in a model collection a single model could also be left out without affecting the overall functionality of the model collection (in most cases, I guess). My question here is clearly: should software dependencies at that level really be captured within the graph database, or is it sufficient, or even better, to document such dependencies within the software repository in the version-specific software specifications? That does not mean that we may not want to capture the dependencies directly in a few cases, but for those I would still think the coarse level you suggest would be sufficient. I'm asking this for two reasons: on the one hand, that level of detail seems to me to be at tier-3 level or even beyond (since changes might happen frequently); on the other hand, I think we do not aim to register all software out there within the KG in order to cover all possible dependencies of all software products. From your comment above I think you argue in the same direction, correct? What could be done for software to "outsource" this issue is to allow pointing to a "dependency file" for a specific registered software version, and to better capture that the repository link of a software product points to the overall repository and not necessarily to the registered version (e.g. the official release of that version).
As far as I understood, the purpose of the current components attribute in software was not to capture all possible dependencies of a software, but rather to point to other neuroscience-related software that this software uses as a component in order to function. What we have considered neuroscience-related until now is software that either is already part of the KG or is to be integrated into it (i.e., no commonly known libraries or services, but other specialized software or libraries). I am not totally sure, but I think this is also a question about what information we want to deliver in defining another software as a component. Do we want to
In the first case, the first of @olinux's granularity examples is totally sufficient IMO. Maybe @bweyers could briefly explain the initial intention behind the Components entry?
@apdavison , @olinux , @jagru20 , @UlrikeS91 , @skoehnen , @bweyers I've made the following changes now (within the PR #168):
All properties discussed above are of course not required.
This issue seems to be solved for now, so I am closing it. Let's see if it holds up in the use cases.
A common scenario in modelling is that we have a large number of similar/related single neuron models. Each model needs to have a separate representation in the KG, as they can be used individually, and may have different validation reference data, etc. However, we don't want to flood the KG Search results with hundreds of such models; rather, the user should retrieve a model collection, with links to the individual component models.
Currently we achieve this by keeping the individual models in the Model Catalog, and releasing a uniminds ModelInstance to represent the collection.
This is a hack, and for openMINDS / KG v3 I'd like to do things more cleanly.
What I propose is to use the "hasSupplementVersion" property of ModelVersion to hold the links from the ModelVersion representing the collection to the list of component single neuron models.
The question remains: how to hide the individual component models in the KG Search? If this is the only use case for "hasSupplementVersion", then I guess the KG UI logic could exclude models which are "supplements" to another from the search results, or have a checkbox to allow the option of including such models. An alternative approach would be to add a ModelCollection schema to openMINDS.
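As a rough illustration of the UI logic suggested above (excluding "supplements" by default, with a checkbox to include them), here is a minimal Python sketch. It assumes search entries are plain dictionaries with an "id" key and an optional "hasSupplementVersion" list of ids. The function name `filter_search_results` and the `include_supplements` flag are hypothetical and do not correspond to any existing KG Search code.

```python
from typing import Dict, List


def filter_search_results(
    entries: List[Dict], include_supplements: bool = False
) -> List[Dict]:
    # The hypothetical "checkbox": when ticked, nothing is filtered out.
    if include_supplements:
        return list(entries)
    # Collect the ids of every entry that is listed as a supplement of another.
    supplement_ids = {
        supplement_id
        for entry in entries
        for supplement_id in entry.get("hasSupplementVersion", [])
    }
    # Keep only entries that are not a supplement of any other entry.
    return [entry for entry in entries if entry["id"] not in supplement_ids]


entries = [
    {"id": "collection-v1", "hasSupplementVersion": ["neuron-a-v1", "neuron-b-v1"]},
    {"id": "neuron-a-v1"},
    {"id": "neuron-b-v1"},
    {"id": "standalone-v1"},
]
default_ids = [e["id"] for e in filter_search_results(entries)]
all_ids = [e["id"] for e in filter_search_results(entries, include_supplements=True)]
```

With the default setting only the collection entry and the standalone model remain in the results; the component models are still reachable through the collection's "hasSupplementVersion" links.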