Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Comprehensive documentation improvements #4031

Open
wants to merge 27 commits into
base: develop
Choose a base branch
from
Open

Conversation

seancolsen
Copy link
Contributor

@seancolsen seancolsen commented Nov 19, 2024

Overview

This PR:

  • Adds a lot of new user-focused docs content to our docs guide
  • Cleans up a bunch of other stuff in the docs guide too
  • Makes changes in the UI to make some things easier to document and reference
  • Adds lots of small help bubbles in the UI, many of which link out to the new content in the docs guide
  • Adjusts terminology in the UI to make things more consistent and clear
  • Makes some small UX adjustments in a few places too

High level demo of docs content changes

2024-12-03_14-57-18.mp4

Demo of UI changes

2024-12-03_11-31-33.mp4

Notes

  • Docs: commits are docs content
  • Other commits are UI changes

Checklist

  • My pull request has a descriptive title (not a vague title like Update index.md).
  • My pull request targets the develop branch of the repository
  • My commit messages follow best practices.
  • My code follows the established code style of the repository.
  • I added tests for the changes I made (if applicable).
  • I added or updated documentation (if applicable).
  • I tried running the project locally and verified that there are no visible errors.

Developer Certificate of Origin

Developer Certificate of Origin
Developer Certificate of Origin
Version 1.1

Copyright (C) 2004, 2006 The Linux Foundation and its contributors.
1 Letterman Drive
Suite D4700
San Francisco, CA, 94129

Everyone is permitted to copy and distribute verbatim copies of this
license document, but changing it is not allowed.


Developer's Certificate of Origin 1.1

By making a contribution to this project, I certify that:

(a) The contribution was created in whole or in part by me and I
have the right to submit it under the open source license
indicated in the file; or

(b) The contribution is based upon previous work that, to the best
of my knowledge, is covered under an appropriate open source
license and I have the right under that license to submit that
work with modifications, whether created in whole or in part
by me, under the same open source license (unless I am
permitted to submit under a different license), as indicated
in the file; or

(c) The contribution was provided directly to me by some other
person who certified (a), (b) or (c) and I have not modified
it.

(d) I understand and agree that this project and the contribution
are public and that a record of the contribution (including all
personal information I submit with it, including my sign-off) is
maintained indefinitely and may be redistributed consistent with
this project or the open source license(s) involved.

@seancolsen seancolsen added this to the v0.2.0 (beta release) milestone Nov 19, 2024
docs/docs/index.md Outdated Show resolved Hide resolved
docs/docs/user-guide/access-control.md Outdated Show resolved Hide resolved
docs/docs/user-guide/collaborators.md Outdated Show resolved Hide resolved
@seancolsen
Copy link
Contributor Author

seancolsen commented Nov 21, 2024

Thanks for the proofreading, @TheGiraffe3. I addressed your feedback in 71f92e8. (This commit will later be squashed, FYI, so you might not see it.)

@seancolsen seancolsen marked this pull request as ready for review December 3, 2024 20:28
@seancolsen seancolsen added the pr-status: review A PR awaiting review label Dec 3, 2024
docs/docs/administration/backup-restore.md Show resolved Hide resolved
docs/docs/api/index.md Outdated Show resolved Hide resolved
Copy link
Contributor

@mathemancer mathemancer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thoughts while watching docs changes video

Big Picture: I like the new structure better than the old, with some caveats. I got distracted with some content concerns while watching, which are noted below (as well as structure caveats).

  • Administration should be below User Guide in nav as far as daily importance goes, but the "install" sections seem like they should be towards the top. This is a weakness of wrapping installation into broader section

  • Too much editorializing ("nascent and somewhat rough", "rudimentary support", etc.). Just ask for feedback (if anything). Perhaps a little section at the bottom of these pages inviting a user to request extra querying features, etc.

  • Overall, the prose is not very information-dense. We should spend some time tightening that up. For example, on the "Mathesar 0.2.0" page, you have:

    We do not support upgrading from previous versions to 0.2.0. You will need to install a fresh instance of Mathesar to use this version.
    Instead, consider:
    We do not support upgrading from previous versions to 0.2.0. You must reinstall Mathesar to use this version.

  • Generally, we should change the passive voice to active where possible. E.g., in the "Mathesar User Guide" section, the first sentence could start with "Mathesar gives you a spreadsheet-like...". If we really want to include the information about the fact that Mathesar is a web application, we could write "The Mathesar web application gives you a spreadsheet-like..." or even "The Mathesar web UI gives you a spreadsheet-like..."

  • I think "Collaborators" is a confusing term for the sidebar, since it's non-standard jargon. Maybe "Sharing PostgreSQL roles" for a header instead?

  • One of our most unique features is UI-driven schema (and data!!) migrations. This is not mentioned anywhere, or at least it's not visible in the nav bar anywhere

  • "Relationship", "Relational", "Related", etc. are overloaded terms, the way they're being used in the docs (and elsewhere). I.e., the docs overload them. A "Relation" in an RDBMS refers to a table, a view, a sequence, etc. Basically, anything whose OID is stored in the pg_class table in PostgreSQL. It does not refer to a foreign key or the mapping between two tables defined by entries in a foreign key column. This leads to never ending confusion. Consider the sentence from the PostgreSQL docs:

    Referential integrity
    A means of restricting data in one relation by a foreign key so that it must have matching data in another relation.

    Also see the definition of relation from the PostgreSQL Glossary:

    Relation
    The generic term for all objects in a database that have a name and a list of attributes defined in a specific order. Tables, sequences, views, foreign tables, materialized views, composite types, and indexes are all relations.

    More generically, a relation is a set of tuples; for example, the result of a query is also a relation.

    Normally, I detest appeals to authority (e.g., the authority of the PostgreSQL docs), but I think it's crucial to get this right, and align it with those docs, since:

    • An advanced user will reduce their estimation of our credibility if we misuse the term (especially since this is a kind of common "gotcha"-inducing mistake), and
    • A naive user will end up extremely confused if they're trying to cross-reference between Mathesar's docs (or UI) and the PostgreSQL docs when solving some problem.

    With all that said, I completely understand that the colloquial uses of the words "relation", "relationship", "related", etc. make it an easy term to use for foreign key references. I'd argue that's precisely the problem, since the colloquial meaning is a confusing red herring once you're reading other literature about RDBMSs (including the PostgreSQL docs).

    To soften my point a bit: While the PostgreSQL docs are quite precise w.r.t. the words "relation" and "relational" (these refer only to the "proper", technical concept), they're a bit looser with the words "relationship" and "related", which they treat as non-technical terms. We could go that route, but I think it would be a mistake, since it's really confusing to have such related words refer to such unrelated concepts. Even in the PostgreSQL docs this leads to some inconsistencies and confusion: See their section on parent-child relationships in the table inheritance. I also note that if you click through documentation versions for PostgreSQL in the Foreign key section, they seem to be moving away from "related" and "relationship" when discussing foreign key constraints.

My suggestion is that we try (yet again) to develop terminology for foreign key constraints that does not lean on "relationships" as a term, and maybe doesn't even lean on the concept. This will also have the benefit of letting us generalize to actually use the technical term "relation" when we have views, sequences, etc. represented in Mathesar.

Thoughts watching the product walkthrough video

  • "PostgreSQL Role Name": I definitely agree we need to change this away from "username", but why not just "role name"?
  • Connect/Read + Create : I think this is an improvement
  • Database Permissions page: Love the changes, definitely a net improvement
  • Same major concern about "relationship" terminology

Thoughts while reading through the code are noted line-by-line. I will probably submit another review with more line-by-line comments, but I'm a bit too fatigued at the moment to keep going on that.


## Exploration and access controls

- The Data Explorer will now allow you to modify any data. It is a read-only reporting tool.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- The Data Explorer will now allow you to modify any data. It is a read-only reporting tool.
- The Data Explorer will not allow you to modify any data. It is a read-only reporting tool.


- The Data Explorer will now allow you to modify any data. It is a read-only reporting tool.

- Your ability to view data in the Data Explorer is determined by the privileges of your PostgreSQL [role](./roles.md).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sentence (and others like it) is slightly misleading in a situation where there isn't a bijection between users and roles.


A short-term consequence of this development strategy is that, for the time being, Mathesar _only_ works with PostgreSQL databases. However we are hopeful that in the future we'll have the opportunity to leverage PostgreSQL's powerful [Foreign Data Wrapper](https://www.postgresql.org/docs/current/postgres-fdw.html) functionality to connect to other kinds of databases such as MySQL, SQLite, Oracle, MongoDB, and more.

Mathesar's tight integration with PostgreSQL also means that, in order to function, Mathesar needs to install a few [Mathesar-specific schemas](./schemas.md#internal) upon connecting to your PostgreSQL database.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider reframing as:

Mathesar needs to install some SQL and PL/pgSQL functions upon connecting to your database. For clarity and security, these are kept in Mathesar-specific schemas.


- The **Internal Server:** Most Mathesar installations have an internal PostgreSQL server which the Mathesar application controls and utilizes for storage of application-specific metadata.
The biggest superpower of databases (specifically _relational_ databases like PostgreSQL) is the ability for cells to reference records from another table. In PostgreSQL, this concept is called foreign key constraints, and Mathesar leverages it so you can model your data with [relationships](./relationships.md). If you've ever used `VLOOKUP` in a spreadsheet, you'll love using relationships in Mathesar!
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an example of the overloaded and confusing nature of the "relation*" terminology.

@@ -20,11 +28,86 @@ If you're starting your database from scratch with Mathesar you can either:

- Use another tool to create your database on an external server and then connect Mathesar to it. You can administer that external server yourself, or choose from a variety of hosted PostgreSQL solutions such as [Amazon RDS](https://aws.amazon.com/rds/postgresql/pricing/), [Google Cloud SQL](https://cloud.google.com/sql/postgresql), [Supabase](https://supabase.com/database), and others.

## Connecting a database
## Database Permissions {:#permissions}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider using "Privileges" for consistency with the way they're referred to in the PostgreSQL docs, as well as these docs.

@@ -0,0 +1,25 @@
# Mathesar Metadata

Mathesar's philosophy is to keep as much of your data as possible inside your connected PostgreSQL database, structured consistently with the way it appears in the Mathesar interface. However, some of the customization that Mathesar offers doesn't fit neatly into the PostgreSQL model, so Mathesar stores a thin layer of metadata in its [internal database](./databases.md#internal) to support these features.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This point is repeated on another page. Is that intentional?

@@ -0,0 +1,190 @@
# Relationships
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed in main review.

@@ -0,0 +1,190 @@
# Relationships

Relationships allow a single cell in one table to reference a row in another table. When one table references another in this manner, the two tables are said to be "related". This is a core feature of relational databases, and it allows us to model complex data structures using multiple tables.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This paragraph in particular mixes up the terminology in a way that may trip up users if they're doing any supplemental self-education.


### Limitations of Mathesar's reference columns

- Mathesar does not support supports "composite" foreign keys — foreign keys that reference _multiple_ columns in the referenced table at once.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- Mathesar does not support supports "composite" foreign keys — foreign keys that reference _multiple_ columns in the referenced table at once.
- Mathesar does not yet support "composite" foreign keys — foreign keys that reference _multiple_ columns in the referenced table at once.


## What is a role

PostgreSQL role system is elegant and powerful, if albeit somewhat idiosyncratic.
Copy link
Contributor

@mathemancer mathemancer Dec 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
PostgreSQL role system is elegant and powerful, if albeit somewhat idiosyncratic.
PostgreSQL's role system is elegant and powerful, albeit somewhat idiosyncratic.

Copy link
Contributor

@zackkrida zackkrida left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a significant improvement over the existing documentation and UI language. Really nice work! In this round of review I focused on the documentation. I think the overall hierarchy makes sense.

I did, however, run out of time, specifically for reviewing the "Your data in PostgreSQL" section and its documents. I'll take another look at those tomorrow but I wanted to make sure you had my existing comments first.

docs/docs/releases/0.1.4.md Show resolved Hide resolved
docs/docs/api/index.md Outdated Show resolved Hide resolved

!!! example

To call function `tables.list` from the Tables section of this page, you'd send something like:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a discrepancy in calling the RPC methods "functions" on this page, but then referring to them as "Methods" in the sidebar. It would be nice to unify that language.

Additionally, it would be nice if this page referenced the methods page and gave some context, for example: "See the methods page for avaliable RPC methods".

To put it another way: How do we help users translate this formatting from the python methods page:

add(*, database_id, user_id, configured_role_id, **kwargs)

into the equivalent rpc call formatting:

{
  "jsonrpc": "2.0",
  "id": 234,
  "method": "tables.list",
  "params": {
    "schema_oid": 47324,
    "database_id": 1
  }
}

perhaps one hand-picked example on the index page would be enough?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. Can you implement this? I would like to stick with "methods" because that's what the RPC spec uses.

docs/docs/user-guide/access-control.md Show resolved Hide resolved
@@ -0,0 +1,16 @@
# Overview of Access Control in Mathesar

To manage access to data, Mathesar utilizes PostgreSQL's powerful role-based permissions system. Each user accesses a database through a specific PostgreSQL role, and the user's access is determined by the role's privileges within PostgreSQL.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm struggling a bit with the second sentence. I'm uncertain if this is even better, but how about something like:

Mathesar manages data access using PostgreSQL's powerful role-based permissions system. Users interact with the database through designated PostgreSQL roles, with their access determined by the privileges assigned to those roles.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. Can you implement this?

docs/docs/user-guide/roles.md Outdated Show resolved Hide resolved

The owner generally can do anything directly to the object, but not necessarily other objects contained within it. For example, a role might own a schema but not have access to certain tables within the schema.

## Privileges
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might we need to explain "permissions" vs. "privileges" somewhere?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. Can you implement this?

docs/docs/user-guide/schemas.md Outdated Show resolved Hide resolved
@@ -0,0 +1,21 @@
# Stored Role Passwords

Mathesar stores passwords for any [roles](./roles.md) that you would like to use when authenticating with PostgreSQL to work with data. Roles with stored passwords can then be assigned to [users](./users.md) via [collaborators](./collaborators.md)
Copy link
Contributor

@zackkrida zackkrida Dec 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The phrase "authenticating with PostgreSQL to work with data." of course makes sense, but may be difficult to connect to the act of using Mathesar for most users. When does "authenticating with PostgreSQL to work with data." happen? When I connect a database? The phrase describes an "under the hood" mechanism of how Mathesar works.

I do suspect the ultimate fix here is that "Stored Role Passwords" are ultimately a field in the "add database connection" and "add collaborator" forms and don't even need to be a "top level concept" for users (I described this a bit in a general document about permissions, I'll add that background below), but there's probably an incremental improvement to be made here while the concept still exists.

Expanded thoughts on "stored role passwords"

Could storing role passwords in Mathesar simply be added to the "add collaborator" flow?

In what scenario would someone want to store a role password in Mathesar without an associated Mathesar user connected to the role?

In this new approach, the role password would be stored as it is now, but in the UI there would be a password input in the modal after clicking "add collaborator". If the role password was already stored from a previous collaborator, we could disable the input and show a "this password is already stored" message and/or let the input be editable so they could change the password if desired.

The table listing collaborators would then have an "edit collaborator" button for each collaborator, which allows for changing the associated role and/or changing the stored password.

I think this could smplify things quite a bit. The concept of a "Collaborator" as a link between a Mathesar user to a DB postgres role that gets authenticated via the role password seems to be a simpler mental model.

A small side benefit: we could potentially even remove the "In Mathesar" vs. "On The Server" concept from the settings sidebar.

Alternatively, an approach similar to the above could be taken, but added to the singular "Roles" table and settings page instead. So the table headers would be "Role", "Login", "Saved Password", "Child Roles", and "Actions". Then, a singular "edit" modal from the pencil icon could be used to add/remove child roles and have a checkbox to choose to store the password, with an input to add/update the password itself.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to leave this to you to address however you see fit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, I'm not really sure what to make of this. It seems like you're suggesting changes to the app. For example:

Could storing role passwords in Mathesar simply be added to the "add collaborator" flow?

This is a great suggestion. It's just that it's pretty far out of scope for this PR. In this PR the goals are: (1) docs content in the guide, and (2) minor UX and copy adjustments in the app to further align the UI and the docs with one another.

I see significant opportunity to improve the UX and iron out some of the kinks you bumped into on our call. But I suspect we probably won't take up that work right away. (That's really a question for Kriti.) And when we do, it'll involve a number of higher-level conversations between various people who were more deeply involved in all this design work, especially Pavish and Ghislaine. If I'm understanding your suggestions correctly here, I think we'll really need to wait on considering it until we open up the "access control" design can of worms again.

In what scenario would someone want to store a role password in Mathesar without an associated Mathesar user connected to the role?

Yeah, good question. I don't think it's necessarily that someone would intentionally want this state as an end goal. But it's more so that this state is possible given our data model, and may happen to occur temporarily as a user is going about other work. For example: Say I have a user alice assigned to the role reporting, and no other users assigned to that role. Alice has left the team and now I'm onboarding her replacement Bob. First I delete the user alice. That in turn deletes the collaborator associated with her user. Then I create Bob's user, bob and assign him to the reporting role. Mathesar keeps the stored password for reporting throughout this process even though there is a point where no collaborators are assigned to it.

For now, if there are docs changes you'd like to make without changing the structure of the UI, then I'd support that. But larger structural changes to the app will need to wait until after beta.

docs/docs/user-guide/tables.md Outdated Show resolved Hide resolved
@seancolsen
Copy link
Contributor Author

@mathemancer I committed the changes from our pairing session via 4911484. I also made ed5d39d with another suggestion you had.

@seancolsen
Copy link
Contributor Author

@zackkrida I think I've responded to all your feedback now. Everything you've suggested sounds great, with the exception of some comments I left on "Access Control" vs "Roles & Permissions". I implemented some of your suggestions. But for most of them, I've left them for you to implement. I'll leave this PR open for you to push changes. Let me know when you're satisfied.

And for any remaining content that you still have to review, I think it would be fastest for both of us if you could please push more commits directly. I can always look over your commits, which is faster for me than reading critique and applying changes. After seeing your critique thus far, I'm confident in the chances being very high that I'll agree with your changes. Just to reiterate: this "pile on" sort of process is not how we typically collaborate on code, but I think it's appropriate for docs content where (in my opinion) the need for speed sometimes outweighs the need for consensus. I also feel like, in your new role here, you ought to have more ownership over the docs content than anyone else on our team. So I would hope we can get to the point soon where we all defer to you on these docs decisions.

@zackkrida
Copy link
Contributor

@seancolsen that sounds good, I'll go ahead and push some commits later for the areas you've replied to with "Sounds good. Can you implement this?" and any of the documents I've yet to read.

Any hesitancy on my part to directly make changes was accounting for:

  • My relative newness to the project (It's my ~8th day as a maintainer!)
  • My absence from past discussions about terminology, conceptual models, etc.

I appreciate the reassurance that you think my direct edits should be okay!

The one area I'd like to push a bit for your insights on is this comment. It's one of those thoughts that opens a much bigger "can of worms" question about the approach to permissions that I don't feel I'm informed enough to weigh in on definitively. If, however, the concern there is a matter of your time / expediency, feel free to pass. I'm pretty content with leaving the existing language in place for now.

@zackkrida zackkrida self-assigned this Dec 12, 2024
@zackkrida
Copy link
Contributor

Leaving a note for myself: we should make sure it is clear in various places that the docker compose installation method is the "official" production installation method.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pr-status: review A PR awaiting review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants