Replace data space with workspace in docstrings #743

Open
cbkerr opened this issue Apr 14, 2022 · 17 comments · Fixed by #685

Comments

@cbkerr
Member

cbkerr commented Apr 14, 2022

New focus is on pinning down what "data space" means

#743 (comment)

Original issue description

Summary

Consider the following analogy: "The directory of the job's workspace is to job as the directory of the project's workspace is to project."
It is currently false!! Fixing this would break things, making it a good candidate for 2.0.

The fix would make the following analogies true:

  • "Job workspace is to job as project workspace is to project."
  • "Job directory is to job as project directory is to project"

Problem Details

What we have now is: "Directory of the job's workspace is to job as directory of the project is to project."
(If your head is spinning like mine was, read those again after reading the rest of the issue)

Some example usages in the documentation:

Here is an illustration of the problem. When developing dashboard, to display an image from a job or project, you need to get the job or project directory in a general way.
The way I found to do this was job_or_project.fn("") because currently the separate syntax is job.workspace() or project.root_directory(). Both are aliased to .path().

Solution

  1. Replace "job workspace", "job workspace directory", and job.workspace() with "job directory", or job.path() (also accessible with job.fn("")). The job directory is a directory containing files associated with a signac job.
    Currently job.workspace() is an alias for the job path. I prefer writing "job directory" rather than "job path" in the documentation, even if you would write job.path() in the code, because a directory is a container, which is a distinct concept from the path that identifies the container. This deprecation is announced in Additional deprecations #685.
  2. Reserve the word "workspace" for a directory that contains directories; the term applies to entities that act like a signac project.

Schematic

# Current
project/                                       <-- the project directory
project/workspace/[jobid]/                     <-- the default, a job directory contained in project workspace


# What this enables in the future (aligning with what Vyas and Simon have brought up)
project/workspace/[jobid]/workspace/           <-- after upgrading a job to a project
project/workspace/[jobid]/workspace/[jobid]    <-- adding some new jobs in the sub project
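
A minimal sketch of the schematic above, using only `pathlib` rather than signac itself. The helper `job_directory` and the ids `jobid_a` and `jobid_b` are placeholders invented for illustration; the only thing taken from the schematic is the `<project directory>/workspace/<job id>` nesting.

```python
import tempfile
from pathlib import Path


def job_directory(project_directory: Path, job_id: str) -> Path:
    """Return <project directory>/workspace/<job id>, as in the schematic."""
    return project_directory / "workspace" / job_id


with tempfile.TemporaryDirectory() as tmp:
    project_dir = Path(tmp) / "project"

    # Current layout: a job directory contained in the project workspace.
    outer_job = job_directory(project_dir, "jobid_a")  # placeholder id
    outer_job.mkdir(parents=True)

    # Future idea: treat the job directory as a project directory,
    # so it gets its own workspace holding its own job directories.
    inner_job = job_directory(outer_job, "jobid_b")  # placeholder id
    inner_job.mkdir(parents=True)

    # Collect every directory created, relative to the temporary root.
    layout = sorted(p.relative_to(tmp).as_posix() for p in Path(tmp).rglob("*"))

print("\n".join(layout))
```

Because the same helper is applied at both levels, "workspace" always means "the directory that holds job directories", and "job directory" never needs the word workspace.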

Benefits

  • Makes these analogies true!
    • "Directory of the job workspace is to job as directory of the project workspace is to project."
    • "Job directory is to job as project directory is to project"
  • Fewer things to explain in the future glossary.
  • Saying "project workspace" parallels nicely with "project data space", the abstract concept we often use when talking about signac projects.
  • Job directory describes exactly what it is.
  • Project workspace describes exactly what this directory is called, especially after removing the ability to customize it in Remove configurable workspace directory  #714.
  • We can unambiguously refer to the project directory and project workspace as two separate things.

Signac roadmap for context

I then realized that @vyasr already mentioned this idea in the tentative signac roadmap coming at it from a different angle. I think that means it's a good time to open a focused discussion on it. He suggested:

Using path instead of Project.root_directory and Job.workspace to facilitate a unified Directory interface for working with arbitrary filesystem layouts

Does this writeup capture your idea @vyasr?

@joaander
Member

What are the differences in semantics between "project workspace" and "project data space"? Or are they synonyms?

@vyasr
Contributor

vyasr commented Apr 14, 2022

Yeah, my proposed change is intended to address this problem in a slightly different way. Essentially, both a Job and a Project are directories. A directory has a path. Therefore, both of them should have a path, which fixes the analogy.

The concept of a workspace is a little more specific, relating to the exact directory layout currently used by signac. The data model can be roughly described as "A root directory, which we call a Project, contains a subdirectory called its workspace. That workspace directory in turn contains one subdirectory per data point, each of which is called a Job." A Job therefore does not have a workspace.

In fact, the solution that you proposed (allowing jobs to themselves contain workspaces) is precisely what @csadorf and I were trying to get at when we discussed the long-term roadmap and I made the case for both Job and Project subclassing a generic Directory! Both Job and Project are Directories, so they have a path, and that is independent of a particular layout. A given Project needs to have a well-defined layout, which is a higher-level concept that currently encompasses the workspace as well. By encoding that layout in a standalone "data model" concept, we would allow users to define different data layouts such as the nesting that you proposed. The project/workspace/job/ hierarchy is a specific data model that just happens to be our default.
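
To make the idea concrete, here is a rough sketch of what "both Job and Project subclass a generic Directory, with the layout encoded separately" could look like. None of this is signac's actual API: the class bodies, the `layout` argument, and the `default_layout` function are assumptions made up for illustration, and the job id is a placeholder.

```python
from pathlib import Path


class Directory:
    """Hypothetical base class: anything that lives at a path on disk."""

    def __init__(self, path):
        self.path = Path(path)

    def fn(self, filename):
        """Join a file name onto this directory's path."""
        return self.path / filename


class Job(Directory):
    """A Job is-a Directory: it has a path, but no workspace of its own."""


def default_layout(project_path, job_id):
    """The default data model: <project>/workspace/<job id>."""
    return Path(project_path) / "workspace" / job_id


class Project(Directory):
    """A Project is-a Directory whose layout is supplied separately."""

    def __init__(self, path, layout=default_layout):
        super().__init__(path)
        self.layout = layout

    def open_job(self, job_id):
        # The layout, not the Project class, decides where a job lives.
        return Job(self.layout(self.path, job_id))


project = Project("project")
job = project.open_job("abc123")  # placeholder id

# The same accessors work for both kinds of object.
for entity in (project, job):
    print(type(entity).__name__, entity.path.as_posix(), entity.fn("plot.png").as_posix())
```

Swapping in a different layout function, for example `lambda root, job_id: Path(root) / job_id`, would give a flat layout without touching `Job` or `Project`, which is the sense in which the default hierarchy is just one data model among several.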

@cbkerr
Member Author

cbkerr commented Apr 15, 2022

What are the differences in semantics between "project workspace" and "project data space"? Or are they synonyms?

@joaander I've found 2 definitions of "data space" in the docs:

@cbkerr
Member Author

cbkerr commented Apr 15, 2022

the solution that you proposed (allowing jobs to themselves contain workspaces) is precisely what @csadorf and I were trying to get at when we discussed the long-term roadmap

@vyasr I made the connection between fixing the double meaning of workspace and your future "data model" after thinking about how to clarify the definition of workspace. I wrote out the future directory structure to show myself how clarifying the definition helps resolve some of my confusion around your idea. I will clarify my initial example that I was applying "my proposal" to the idea I had heard you discuss.

I think we are mostly on the same page!
(edit: the following is a misconception corrected below). However, I don't think a Job is a directory. For instance: job = project.open_job({"a": 1}) creates a job but not a directory until you call job.init(). I don't have as clear an example, but I don't think a Project is a directory either.
I would be comfortable saying that in the future, Job and Project both inherit from Directory, but not that they are directories. It also feels like we need to distinguish between the concept of a project (or job) and how it shows up on the file system.
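
A small sketch of that distinction between the object and what is on disk, written with a stand-in class rather than signac. `LazyJob` is hypothetical, and deriving the id from a SHA-256 hash of the JSON-encoded state point is an assumption made for this example only.

```python
import hashlib
import json
import tempfile
from pathlib import Path


class LazyJob:
    """Hypothetical stand-in for a job: it knows its path before the directory exists."""

    def __init__(self, workspace, statepoint):
        # Assumption for this sketch: the id is a hash of the JSON-encoded state point.
        encoded = json.dumps(statepoint, sort_keys=True).encode()
        self.id = hashlib.sha256(encoded).hexdigest()
        self.path = Path(workspace) / self.id

    def init(self):
        """Create the job directory on disk."""
        self.path.mkdir(parents=True, exist_ok=True)
        return self


with tempfile.TemporaryDirectory() as tmp:
    job = LazyJob(Path(tmp) / "workspace", {"a": 1})
    existed_before_init = job.path.is_dir()  # the object exists, the directory does not
    job.init()
    existed_after_init = job.path.is_dir()

print(existed_before_init, existed_after_init)
```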

By encoding that layout in a standalone "data model" concept, we would allow users to define different data layouts such as the nesting that you proposed.

What's a "data model"? I prefer the other term you use "data layout". But I could see other options too like "project structure/template/layout" or "file/directory layout". You use "file layout" in the roadmap.

@bdice
Member

bdice commented Apr 15, 2022

Inheritance relationships like class Project(Directory) are usually described in programming with the terminology “is a,” as opposed to composition patterns that use “has a.” Not to get too deep into ontology but that word choice is common in CS. https://en.wikipedia.org/wiki/Is-a

In that sense, a Project or Job “is-a” Directory under the proposed class hierarchy.

@bdice bdice closed this as completed Apr 15, 2022
@bdice
Member

bdice commented Apr 15, 2022

Fumbled buttons on my phone. Reopening.

edit: … twice.

@bdice bdice reopened this Apr 15, 2022
@cbkerr
Member Author

cbkerr commented Apr 15, 2022

Thank you for clarifying that!! I'll add a note about it to my comment but preserve my expressed confusion.

@joaander
Member

What are the differences in semantics between "project workspace" and "project data space"? Or are they synonyms?

@joaander I've found 2 definitions of "data space" in the docs:

I brought this up as it became an issue when writing the workflow tutorial for hoomd: https://hoomd-blue.readthedocs.io/en/v3.0.1/tutorial/05-Organizing-and-Executing-Simulations/01-Organizing-Data.html

The signac tutorials use the word "data space" a lot, so I introduced that concept first. But then signac mandates the directory name is "workspace". It is confusing for users (especially new users) when more than one word describes the same thing. If they are the same, it would be good to only use one - workspace since that is the required directory name. If they are different, then they need to be defined clearly and used consistently.

@cbkerr
Member Author

cbkerr commented Apr 15, 2022

It is confusing for users (especially new users) when more than one word describes the same thing.

Totally agree!

Issue tracking glossary: glotzerlab/signac-docs#59
Google doc on defining terms: https://docs.google.com/document/d/1_merhcK3ohas4IloE616yL7gypMRFcaQLh2oChExC7o/edit?usp=sharing

@vyasr
Contributor

vyasr commented Apr 20, 2022

@cbkerr could you update this issue in case there were any important/useful/relevant points made in the meeting today that you think would help contribute to this discussion?

@cbkerr cbkerr linked a pull request May 7, 2022 that will close this issue
12 tasks
@cbkerr
Member Author

cbkerr commented May 7, 2022

With #685 and #752, I think all that is left of this issue is to phase out the use of "data space".

@stale

stale bot commented Jul 31, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@cbkerr
Member Author

cbkerr commented Aug 15, 2022

@stale-bot this is not ready to be closed. This should remain open because "job workspace" still returns many hits in the next branch.

./signac/__main__.py:912:        help="Print the job's workspace path instead of the job id.",
./signac/contrib/import_export.py:741:    """Copy the source to job's workspace.
./signac/contrib/import_export.py:771:    """Copy the source to job's workspace when the source is a directory.
./signac/contrib/import_export.py:872:    """Copy the source to job's workspace when the source is a zipfile.
./signac/contrib/import_export.py:1006:    """Copy the source to job's workspace when the source is a tarfile.
./signac/contrib/import_export.py:1209:    data space paths that can be imported as a job workspace into project.
./signac/contrib/job.py:610:        """Initialize the job's workspace directory.
./signac/contrib/job.py:705:        """Remove the job's workspace including the job document.
./signac/contrib/job.py:853:        """Enter the job's workspace directory.
./signac/contrib/job.py:863:        Opening the context will switch into the job's workspace,
./signac/sync.py:210:    """Synchronize two job workspaces file by file, following the provided strategy."""
./signac/sync.py:298:        The src job, data will be copied from this job's workspace.
./signac/sync.py:300:        The dst job, data will be copied to this job's workspace.
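
For anyone who wants to re-run this kind of search, here is a rough Python equivalent. It is only a sketch: the command that produced the list above is not shown in this thread, and the function name `find_phrases` and the two search phrases are my own choices. The demo runs on a throwaway file so it works without a signac checkout.

```python
import tempfile
from pathlib import Path

PHRASES = ("job's workspace", "job workspace")


def find_phrases(root, phrases=PHRASES):
    """Yield (relative path, line number, stripped line) for each matching line."""
    root = Path(root)
    for path in sorted(root.rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if any(phrase in line for phrase in phrases):
                yield path.relative_to(root).as_posix(), lineno, line.strip()


# Demo on a temporary file; point find_phrases at a real source tree instead.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "job.py"
    sample.write_text(
        'def init():\n    """Initialize the job\'s workspace directory."""\n',
        encoding="utf-8",
    )
    hits = list(find_phrases(tmp))

for rel_path, lineno, line in hits:
    print(f"./{rel_path}:{lineno}:    {line}")
```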

I made a more specific issue to track usage of "data space": #809

@stale stale bot removed the stale label Aug 15, 2022
@vyasr
Contributor

vyasr commented Aug 28, 2022

@cbkerr any activity (including your comment) will cause stalebot to remove the stale label, but it will reapply it again as soon as the issue goes inactive again. If you want to keep an issue open permanently, you need to add the pinned label (left to you as an exercise if you think it's worth keeping this open indefinitely even if nobody puts in the effort to fix it 😉).

@vyasr
Contributor

vyasr commented Nov 2, 2022

@cbkerr could you revisit this now and see what you would like to change? IIUC the remaining action item is to remove all references to a job's "workspace" in docs in favor of a job's "directory" or the "path to a job" or something along those lines, is that correct? Would you be able to make that change?

@stale

stale bot commented Jan 7, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Jan 7, 2023
@cbkerr
Member Author

cbkerr commented Mar 30, 2023

All references to job workspace will be gone after glotzerlab/signac-docs#185.

I'm changing the name of the issue to better track that we need to resolve this comment: #743 (comment)

@stale stale bot removed the stale label Mar 30, 2023
@cbkerr cbkerr changed the title from "Fix double meaning of "workspace"" to "Replace data space with workspace in docstrings" Mar 30, 2023
4 participants