First of all, thanks for taking the time to contribute! 🎉👍
We follow the GitHub Flow: all code contributions are submitted via a pull request towards the main
branch.
Opening a pull request means you want that code to be merged. If you want to only discuss it, send a link to your branch along with your questions through whichever communication channel you prefer.
All pull requests must be reviewed by at least one person who is not their original author.
To help reviewers, make sure to describe your pull request with a clear text explanation of your changes.
For pull requests of new contributors, if errors or areas for improvement are identified in their contributions, Open Terms Archive team can initially handle them. This is intended to speed up the delivery process and help the contributor acclimatise to the project. As they become more involved and learn more, they will be encouraged to take on more responsibility by implementing the changes themselves. The aim is to support growth and confidence in the contribution to the project while promoting quick delivery.
GitHub Actions is used to release the package on every merge to the main branch.
Branch protection is active, meaning that a merge to the main branch can only take place once all tests pass in CI, and that the peer review policy has been fulfilled.
We strive to follow this recommendation to write our commit messages, which contains the following rules:
- Separate subject from body with a blank line.
- Limit the subject line to 50 characters.
- Capitalize the subject line.
- Do not end the subject line with a period.
- Use the imperative mood in the subject line.
- Use the body to explain what and why vs. how.
Except this one:
Wrap the body at 72 characters: As URLs might be used for references in commit messages, the body text is not wrapped to 72 characters to ensure links are clickable in Git user interfaces.
We add these additional rules:
- Do not rely on GitHub issue reference numbers in commit messages, as we have no guarantee the host system and its autolinking will be stable in time. Make sure the context is self-explanatory. If an external reference is given, use its full URL.
When opening a pull request, it is required to fill in the changelog. It must be determined whether the changes made to the codebase impact users or not. These two cases are mutually exclusive and have different implications.
All changes to the codebase that impact users must be documented in the CHANGELOG.md
file, following the Common Changelog format with these additional specifications:
-
The
unreleased
section of Keep a Changelog must be added in the changelog with the addition of a tag to specify which type of release should be published and to foster discussions about it inside pull requests. This tag should be one of the names mandated by SemVer, within brackets:[patch]
,[minor]
or[major]
. For example:## Unreleased [minor]
.
Changes that require an adjustment in the infrastructure, they are considered as a breaking change in order to notify Collection operators about the need to update their deployment dependency accordingly. -
Each listed change must provide an actionable way to adapt the user’s codebase, either directly in the changelog or through instructions or links.
-
Changes should be a single sentence without punctuation, following Common Changelog examples.
-
Since each release is produced automatically from a single pull request, the notice links to the source pull request rather than references, which would always reference the same pull request. References can link to relevant parts of an RFC, decision record, or diff. This notice is automatically generated by the CI during the release process and should not be added manually.
-
The notice is also used to present sponsor information and it is required. Since the development of this project is funded by different actors, and following discussions with sponsors, financial contributions are acknowledged in the changelog itself. The format of the notice thus diverges from the Common Changelog specification in that it is not “a single-sentence paragraph”. Sponsor information is in quote format, starts with “Development of this release was supported by <funding_from>”, and provides the name and link to the sponsor, as well as information on the specific funding instrument, as specified by the sponsor itself or as required by law. A short message from the sponsor might also be added, as long as it abides by the community’s Code of Conduct and aligns with the project’s goals. For volunteer contributions, the sentence should start with: “Development of this release was made on a volunteer basis by <contributor_name>”
For non-functional changes (e.g., documentation, CI workflows) that do not impact users and should not trigger a release, it must be clearly indicated that documenting these changes in the changelog is unnecessary by adding the following content in its entirety to the changelog:
## Unreleased [no-release]
_Modifications made in this changeset do not add, remove or alter any behavior, dependency, API or functionality of the software. They only change non-functional parts of the repository, such as the README file or CI workflows._
This content will be automatically deleted by the CI after merging.
Avoid “you” and “we” (who is “we” in a common anyway?). Use neutral or passive wordings.
- You can find federated public instances on GitHub.
+ Federated public instances can be found on GitHub.
Use snake case for multiple word options:
- ota validate --schemaOnly
+ ota validate --schema-only
For command-line examples and documentation, we follow the docopt usage patterns syntax. Quick recap of the main points:
- mandatory arguments are given between
<
and>
; - optional elements are given between
[
and]
; - mutually exclusive elements are given between
(
and)
and separated by|
.
- ota track --services [ $service_id ] [, $service_id, ...]
+ ota track [--services <service_id>...]
In order to improve the understandability of commands, we document all CLI options and examples with the long version of the options.
- ota track -s $service_id -r
+ ota track --services <service_id> --extract-only
An “instance” of Open Terms Archive is comprised of a server running Open Terms Archive and up to three repositories. An instance has a name describing the scope of services it aims at tracking. This scope is defined by one or several dimensions: jurisdiction, language, industry…
For example, the
france
instance tracks documents in the French jurisdiction and French language, while thedating
instance tracks services from the dating industry.
The instance name is written in lowercase and is made of one word for each dimension it focuses on, separated by dashes.
For example, the
france-elections
instance tracks services in the French jurisdiction and French language that could impact the French electoral processes.
This name is used consistently in all communication, written references, and in the inventory of instances that are managed automatically. It is also used as the base for naming the database repositories, by suffixing it with each type:
- The repository containing the declarations of services to be tracked is named
<instance_name>-declarations
. You can create it from a template. - The repository containing the snapshots of the tracked documents (unless the instance is storing them in an alternative database) is named
<instance_name>-snapshots
. You can create it from a template. - The repository containing the versions of the tracked documents (unless the instance is storing them in an alternative database) is named
<instance_name>-versions
. You can create it from a template.
We deploy identifiers for packages and namespaces across different universes: package managers, social networks, URLs… In order to unify these names across constraints, we reserve everywhere the name opentermsarchive
, with no space, no dash, no underscore, no capital.
For example, NPM does not allow uppercase and spaces; Ansible does not allow dashes and spaces; Twitter does not allow spaces. The name
ota
is too unlikely to be available everywhere.
First of all it's important to distinguish two fundamentally different kinds of errors: operational and programmer errors.
-
Operational errors represent run-time problems experienced by correctly-written programs. These are not bugs in the program. These are usually problems with something else: the system itself (e.g. out of memory), the system’s configuration (e.g. no route to a remote host), the network (e.g. socket hang-up), or a remote service (e.g. a 500 error).
-
Programmer errors are bugs in the program. These are things that can always be avoided by changing the code. They can never be handled properly, since by definition the code in question is broken (e.g. tried to read property of
undefined
, or forget toawait
an asynchronous function).
So the very important distinction is that operational errors are part of the normal operation of a program whereas programmer errors are bugs.
Also noteworthy, failure to handle an operational error is itself a programmer error.
There are five ways to handle operational errors:
- Deal with the failure directly. For example, create directory if it's missing.
- Propagate the failure. If you don’t know how to deal with the error, the simplest thing to do is to abort whatever operation you’re trying to do, clean up whatever you’ve started, and propagate the error.
- Retry the operation. For example, try to reconnect if the connection is lost.
- Log the error — and do nothing else. If it's a minor error and there’s nothing you can do about, and there is no reason to stop the whole process.
- Crash immediately. If the error cannot be handled and can affect data integrity.
In our case, we consider all fetch
-related and extract
-related errors as expected, so as operational errors and we handle them by logging but we do not stop the whole process. We handle errors related to the notifier
in the same way.
In contrast, we consider errors from the recorder
module as fatal, and we crash immediately.
The best way to handle programmer errors is to crash immediately. Indeed, it is not recommended to attempt to recover from programmer errors — that is, allow the current operation to fail, but keep handling requests. Consider that a programmer error is a case that you didn’t think about when you wrote the original code. How can you be sure that the problem won’t affect the program itself and the data integrity?
This section is highly inspired, and in part extracted, from this error handling guide.
We acknowledge the efforts of our contributors by listing them on our website and this is made possible by the use of the All Contributors bot.
All Contributors enables adding a contributor with a comment on an issue or pull request, without writing code. To do this, please use the dedicated issue on this repository.
Please read the following contributing guide.