diff --git a/.gitignore b/.gitignore
index 19b86974..7be5ad36 100644
--- a/.gitignore
+++ b/.gitignore
@@ -1,9 +1,10 @@
 *.pyc
 *~
 .DS_Store
+.idea
 .ipynb_checkpoints
 .sass-cache
 __pycache__
 _site
 .Rproj.user
-.jekyll-cache/
\ No newline at end of file
+.jekyll-cache/
diff --git a/_episodes/01-package-setup.md b/_episodes/01-package-setup.md
index 5e44ac49..3a633979 100644
--- a/_episodes/01-package-setup.md
+++ b/_episodes/01-package-setup.md
@@ -14,9 +14,10 @@ keypoints:
 - "You can use the CMS CookieCutter to quickly create the layout for a Python package"
 ---

-For this workshop, we are going to create a Python package that performs analysis and creates visualizations for molecules. We will start from a Jupyter notebook which has some functions and analysis, which you should download on the [setup].
+*TODO: Define "package". Distinguish from "module". Consider distinguishing w.r.t distribution, archive, source, installed...*
+For this workshop, we are going to create a Python package that performs analysis and creates visualizations for molecules. We will start from a Jupyter notebook which has some functions and analysis, which you should download on the [setup]. *<- wording?*

-The idea is that we would like to take this Jupyter notebook and convert the functions we have created into a Python package. That way, if anyone (a labmate, for example) would like to use our functions, they can do so by installing the package and importing it into their own scripts.
+The idea is that we would like to take this Jupyter notebook and convert the functions we have created into a Python package. That way, if anyone (a lab-mate, for example) would like to use our functions, they can do so by installing the package and importing it into their own scripts.

 To start, we will first use a tool called [CookieCutter](https://cookiecutter.readthedocs.io/en/latest/) which will set up a Python package structure and several tools we will use during the workshop.
@@ -42,9 +43,9 @@ $ cookiecutter gh:molssi/cookiecutter-cms
 ~~~
 {: .language-bash}

-This command runs the cookiecutter software (`cookiecutter` in the command) and tells cookiecutter to look at GitHub (`gh`) n the repository under `molssi/cookiecutter-cms`. This repository contains a template which cookiecutter uses to create your project, once you have provided some starting information.
+This command runs the cookiecutter software (`cookiecutter` in the command) and tells cookiecutter to look at GitHub (`gh`) in the repository under `molssi/cookiecutter-cms`. This repository contains a template that cookiecutter uses to create your project, once you have provided some starting information.

-You will see an interactive prompt which asks questions about your project. Here, the prompt is given first, followed by the default value in square brackets. The first question will be on your project name. You have very cleverly decided to give it the name `molecool` (it's like molecule, but with `cool` instead, because of your cool visualizations - get it?)
+You will see an interactive prompt which asks questions about your project. Here, the prompt appears first, followed by the default value in square brackets. The first question will be on your project name. You have very cleverly decided to give it the name `molecool` (it's like molecule, but with `cool` instead, because of your cool visualizations - get it?)

 Answer the questions according to the following. If nothing is given after the colon (`:`), hit enter to use the default value.

@@ -82,10 +83,10 @@ The first two questions are for the project and repository name. The project nam

 The next choice is about the first module name. Modules are the `.py` files which contain python code. The default for this is the `repo_name`, but we will change this to avoid confusion (the module `molecool.py` in a folder named `molecool` in a folder named `molecool`??).
For now, we'll just name our first module `functions`, and this is where we will put all of our starting functions.

-Another thing the CookieCutter checks for is your email address. Be sure to provide a valid email address to the cookiecutter (it must have an `@` symbol followed by a domain name, or the cookiecutter will fail.). Note that your email address is not recorded or kept by the software. Your email is asked for insertion into created files so that people using your software will have contact information for you.
+Another thing that CookieCutter checks for is your email address. Be sure to provide a valid email address to `cookiecutter` (it must have an `@` symbol followed by a domain name, or `cookiecutter` will fail.). Note that your email address is not recorded or kept by the CookieCutter software, itself. `cookiecutter` inserts your email address into generated files so that people using your software will have contact information for you.

 #### License Choice
-Choosing which license to use is often confusing for new developers. The MIT license (option 1) is a very common license and the default on GitHub. It allows for anyone to use, modify, or redistribute your work with no restrictions (and also no warranty).
+Choosing which license to use is often confusing for new developers. The MIT license (option 1) is a very common license, and the default on GitHub. It allows for anyone to use, modify, or redistribute your work with no restrictions (and also no warranty).

 Here, we have chosen the `BSD-3-Clause`. The `BSD-3-Clause` license is an open-source, permissive license (meaning that few requirements are placed on developers of derivative works), similar to the MIT license. However, it adds a copyright notice with your name and requires redistributors of the code to keep the notice. It also prohibits others from using the name of the project or its contributors to promote derived products without written consent.
@@ -95,7 +96,7 @@ You can see more detailed information on each license at [choosealicense.com](ht
 1. [LGPLv3](https://choosealicense.com/licenses/gpl-3.0/)
 1. Not Open Source - In this case, the cookiecutter will not generate a license. You can add a custom license, or choose to not add a license. If there is no license in a repository, you should assume that the project is **not** open source, and [you cannot modify or redistribute the software](https://choosealicense.com/no-permission/).

-For most of your projects, it is likely that the license you choose will not matter a great deal. However, remember that if you ever want to change a license, you may have to get permission of all contributors. So, if you ever start a project that becomes popular or has contributors, be sure to decide your license early!
+For most of your projects, it is likely that the license you choose won't matter a great deal. However, remember that if you ever want to change a license, you may have to get permission of all contributors. So, if you ever start a project that becomes popular or has contributors, be sure to decide your license early!

 > ## Types of Open-Source Licenses
 >
@@ -105,10 +106,10 @@ For most of your projects, it is likely that the license you choose will not mat
 {: .callout}

 #### Dependency Source
-This determines some things in set-up for what will be used to install dependencies for testing. This mostly has consequence for the section on Continuous Integration. We have chosen to install dependencies from anaconda with pip fallback. Don't worry too much about this choice for now.
+This determines some things in set-up for what will be used to install dependencies for testing. This mostly has consequence for the section on [Continuous Integration]. We have chosen to install dependencies from anaconda with pip fallback. Don't worry too much about this choice for now.
 #### Support for ReadTheDocs
-This option is to choose whether you would like files associated with the documentation hosting service [ReadTheDocs](https://readthedocs.org/). Choose yes for this workshop.
+This option is to choose whether you would like files associated with the documentation hosting service [ReadTheDocs](https://readthedocs.org/). Choose "yes" for this workshop.

 ### Reviewing directory contents
 Now we can examine the project layout the CookieCutter has set up for us. Navigate to the newly created `molecool` directory. You should see the following directory structure.
@@ -164,9 +165,9 @@ Now we can examine the project layout the CookieCutter has set up for us. Naviga
 ```
 {: .output}

-To visualize your project like above you will use "tree". If you do not have tree you can get using `sudo apt-get install tree` on linux, or `brew install tree` on Mac. Note - tree will not show you the helpful labels after '<-' (those were added by us).
+To visualize your project like above you will use *tree*. If you do not have *tree*, you can get it using `sudo apt-get install tree` on Linux, or `brew install tree` on Mac. Note - `tree` will not show you the helpful labels after `<-` (those were added by us).

-CookieCutter has created a lot of files! This can be thought of as three sections. In the top level of our project we have a folder for tools related to development (`devtools`), documentation (`docs`) and to the package itself (`molecool`). We will first be working in the `molecool` folder to build our package, and adding more things later.
+CookieCutter has created a lot of files! They can be thought of as three sections. In the top level of our project we have a folder for tools related to development (`devtools`), documentation (`docs`) and to the package itself (`molecool`). We will first be working in the `molecool` folder to build our package, and adding more things later.

 ~~~
 ...
@@ -183,10 +184,11 @@ CookieCutter has created a lot of files!
This can be thought of as three section
 ~~~
 {: .output}

-This the only folder we actually have to work with to build our package. The other folders relate to "best practices", which do not technically have to be used in order for your package to be working (but you should do them, and we will talk about them later). You could build this directory structure by hand, but we have just used cookiecutter to set it up for us. This directory will contain all of our python code for our project, as well as sample data (in the `data` folder), and tests (in the `tests` folder.)
+This the only folder we actually have to work with to build our package. The other folders relate to "best practices", which do not technically have to be used in order for your package to be working (but you should do them, and we will talk about them later). You could build this directory structure by hand, but we have just used `cookiecutter` to set it up for us. This directory will contain all of our Python code for our project, as well as sample data (in the `data` folder), and tests (in the `tests` folder.)

 > ## Packages and modules
-> 
+> *TODO: Rewrite. Separate discussion of packages vs. modules from discussion of importable entities and scoping.*
+>
 > What 'packages' or 'modules' are in Python may be confusing.
 > In general, 'module' refers to a single `.py` file containing Python definitions and statements. It may be imported for use in another module or script. The module name is determined by the file name. A function defined in a module is used (once the module is imported) using the syntax `module_name.function_name()`.
 > 'Package' refers to a collection of Python modules. The package may also have an `__init__.py` file.
@@ -205,11 +207,14 @@ $ cd molecool

 ### The `__init__.py` file
 The `__init__.py` file is a special file recognized by the Python interpreter which makes a directory into a package.
This file can be blank in some cases, however, we will use it to define how the user interacts with the functions in our package.
+*TODO: Cite section on defining the interface, where we can also mention `__all__` and `_` prefixed names.*
+Contents of `molecool/molecool/__init__.py`:

 ~~~
 """
-molecool
-A Python package for analyzing and visualizing xyz files. For MolSSI Workshop.
+Analyze and visualize xyz files.
+
+For MolSSI Workshop.
 """

 # Add imports here
@@ -224,7 +229,7 @@ del get_versions, versions
 ~~~
 {: .language-python}

-The very first section of this file contains a string opened and closed with three quotations. This is a docstring, and has a short description of the file.
+The very first section of this file contains a string opened and closed with three quotations. This is a [docstring](https://www.python.org/dev/peps/pep-0257/), and has a short description of the file.

 The section we will be concerned with is under `# Add imports here`. This is how we define the way functions from modules are used.

@@ -235,44 +240,51 @@ from .functions import *
 ~~~
 {: .language}

-goes to the `molecool.py` file, and brings everything that is defined there into the file. When we use our function defined in `functions.py`, that means we will be able to just say `molecool.canvas()` instead of giving the full path `molecool.functions.canvas()`. If that's confusing, don't worry too much for now. We will be returning to this file in a few minutes. For now, just note that it exists and makes our directory into a package.
+goes to the `functions.py` file, and brings everything that is defined there into the file. When we use our function defined in `functions.py`, that means we will be able to just say `molecool.canvas()` instead of giving the full path `molecool.functions.canvas()`. If that's confusing, don't worry too much for now. We will be returning to `__init__.py` in a few minutes. For now, just note that it exists and makes our directory into a package.
 ### Our first module
-Once inside of the `molecool` folder (`molecool/molecool`), examine the files that are there. View the first module (`functions.py`) in a text editor. We see a few things about this file. The top begins with a description of this module surrounded by three quotations (`"""`). Right now, that is the file name, followed by our short description, then the sentence "Handles the primary functions". We will change this to be more descriptive later. CookieCutter has also created a placeholder function in called `canvas`. At the start of the `canvas` function, we have a `docstring` (more about this in [documentation]), which describes the function.
+Once inside the `molecool` folder (`molecool/molecool`), examine the files that are there. View the module (`functions.py`) in a text editor. We see a few things about this file. The top begins with a description of this module surrounded by three quotations (`"""`). Right now, that is the file name, followed by our short description, then the sentence "Handles the primary functions". We will change this to be more descriptive later. CookieCutter has also created a placeholder function called `canvas`. At the start of the `canvas` function, we have a `docstring` (more about this in [documentation]), which describes the function.
+
+We will be moving all of the functions we defined in the Jupyter notebook into python modules (`.py` files) like these.

-We will be moving all of the functions we defined in the jupyter notebook into python modules (`.py` files) like these.
+### Installing from local source.

-### Python local installs
+You may be accustomed to `pip` automatically retrieving packages from the internet. You can also install packages from local sources that contain a `setup.py` file.

-To develop this package, we will want to something called a developmental install so that we can try out our functions and package as we develop it.
+To develop this package, we will want to use what is called "development mode" or an "editable install" so that we can try out our functions and package as we develop it. We access development mode using the `develop` command to `setup.py`, or the `-e` option to `pip`.
+
+*TODO: Note that "editable" install is not (yet) standard and may even go away in the future.*

 #### Reviewing `setup.py`
 Return to the top directory (`molecool`). One of the files CookieCutter generated is a `setup.py` file. `setup.py` is the build script for [setuptools]. It tells setuptools about your package (such as the name and version) as well as which code files to include. We'll be using this file in the next section.

 #### Installing your package
-A developer install will allow you to import your package and use it from anywhere on your computer. You will then be able to import your package into scripts in the same way you import `matplotlib` or `numpy`.
+A development install will allow you to import your package and use it from anywhere on your computer. You will then be able to import your package into scripts in the same way you import `matplotlib` or `numpy`.

-A local install uses the `setup.py` file to install your package by inserting a link to your new project into your Python site-packages folder. To find the location of your site packages folder, you can check your Python path. Open Python (type `python` into your terminal window), and type
+A development installation uses the `setup.py` file to install your package by inserting a link to your new project into your Python site-packages folder. To find the location of your site-packages folder, you can check your Python path. Open Python (type `python` into your terminal window), and type

+*TODO: update.*
 ~~~
 >>> import sys
 >>> sys.path
 ~~~
 {: .language-python}

-This will give a list of locations python looks for packages when you do an import. One of the locations should end with `python3.7/site_packages`.
The site packages folder is where all of your installed packages for a particular environment are located.
+This will give a list of locations python looks for packages when you do an import. One of the locations should end with `python3.7/site-packages`. The site packages folder is where all of your installed packages for a particular environment are located.

-To do a local install, type
+To do a development mode install, type

 ~~~
 $ pip install -e .
 ~~~
 {: .language-bash}

-Here, the `-e` indicates that we are installing this project in 'editable' mode (i.e. setuptools "develop mode"), while `.` indicates to install from the local directory (you could also specify a path here). Now, if you examine the contents of your site packages folder, you should see a link to `molecool` (`molecool.egg-link`). The folder has also been added to your path (check `sys.path` again.)
+Here, the `-e` indicates that we are installing this project in *editable* mode (i.e. setuptools [*development mode*](https://setuptools.readthedocs.io/en/latest/userguide/commands.html#develop-deploy-the-project-source-in-development-mode)), while `.` indicates to install from the local directory (you could also specify a path here). Now, if you examine the contents of your site packages folder, you should see a link to `molecool` (`molecool.egg-link`). The folder has also been added to your path (check `sys.path` again.)

 Now, we can use our package from any directory, similar to how we can use other installed packages like `numpy`. Open Python, and type

+*TODO: Consider using doctest-compliant examples (with expected output).*
+
 ~~~
 >>> import molecool
 >>> molecool.canvas()
@@ -295,6 +307,8 @@ This should work from anywhere on your computer.
 > {: .solution}
 {: .challenge}

+*TODO: Consider removing, move to a separate lesson, mention in the context of an existing package, or just cite Python Packaging Guide for optional components.*
+
 Optional dependencies can be installed as well with

 `pip install -e .[docs,tests]`
diff --git a/_episodes/02-git.md b/_episodes/02-git.md
index 09233633..e99e215d 100644
--- a/_episodes/02-git.md
+++ b/_episodes/02-git.md
@@ -24,9 +24,9 @@ keypoints:

 Version control keeps a complete history of your work on a given project. It
 facilitates collaboration on projects where everyone can work freely on a part
-of the project without overriding others’ changes. You can move between past
+of the project without overwriting others’ changes. You can move between past
 versions and rollback when needed. Also, you can review the
-history of your project through commit messages that describe changes on the source code
+history of your project through commit messages that describe changes on the source code,
 and see what exactly has been modified in any given commit. You can see who made
 the changes and when it happened.

@@ -36,26 +36,26 @@ team.
 > ## git vs. GitHub
 >
 > `git` is the software used for version control, while GitHub is a hosting service. You can use `git` locally (without using an online hosting service), or you can use it with other hosting services such as GitLab or BitBucket.
-> Other examples of version control software include SVN and Mercurial.
+> Other examples of version control software include Subversion (`svn`) and Mercurial (`hg`).
 >
 {: .callout}

-MolSSI recommends using the software `git` for version control, and [GitHub] as a hosting service, though there are other options.
+MolSSI recommends using `git` for version control, and [GitHub] as a hosting service, though there are other options.
 Recommended Hosting Service: [GitHub]
 Other hosting Services: [GitLab], [BitBucket]

 ## Making Commits

-You should have git installed and configured from the [setup] instructions.
+You should have `git` installed and configured from the [setup] instructions.

 In this section, we are going to edit files in the Python package that we created earlier, and use `git` to track those changes.

 First, use a terminal to `cd` into the top directory of the local repository.

-In order for git to keep track of your project, or any changes in your project, you must first tell it that you want it to do this. You must manually create check-points in your project if you wish to have points to return to. If you were not using the CookieCutter, you would first have to initialize your project (ie tell git that you were working on a project) using the command `git init`.
+In order for `git` to keep track of your project, or any changes in your project, you must first tell it that you want it to do this. You must manually create check-points in your project if you wish to have points to return to. If you were not using the CookieCutter, you would first have to initialize your project (i.e. tell `git` that you were working on a project) using the command `git init`.

-When we ran the CMS CookieCutter, it actually initialized the use of `git` for us, added our files, and made a commit (how convenient!). We can see this by typing the following into the terminal on Linux or Mac
+When we ran the CMS CookieCutter, it actually initialized `git` for us, added our files, and made a commit (how convenient!). We can see this by typing the following into the terminal on Linux or Mac

 ~~~
 $ ls -la
@@ -78,7 +78,7 @@ If you are on Windows and using the Anaconda PowerShell Prompt:
 ~~~
 {: .bash}

-You should an output called `.git`, `.git` is a directory where `git` stores the repository data. This is one way that we are in a git repository.
+You should see `.git` in the output.
`.git` is a directory where `git` stores the repository data. This is one way to see that we are in a git repository.

 Next, type

@@ -101,7 +101,7 @@ $ git log
 ~~~
 {: .bash}

-You will get an output resembling the following. This is something called your git commit log. Whenever you make a version, or checkpoint, of your project, you will be able to see information about that checkpoint using the `git log` command. The cookie cutter has already made a commit and written a message for you, and that is what we see for this first commit in the log.
+You will get an output resembling the following. This is something called your git *commit log*. Whenever you make a version, or checkpoint, of your project, you will be able to see information about that checkpoint using the `git log` command. The cookie cutter has already made a commit and written a message for you, and that is what we see for this first commit in the log.

 ~~~
 commit 25ab1f1a066f68e433a17454c66531e5a86c112d (HEAD -> master, tag: 0.0.0)
@@ -112,21 +112,21 @@ Date: Mon Feb 4 10:45:26 2019 -0500
 ~~~
 {: .output}

-Each line of this log tells you something important about the commit, or check point that exists for the project. On the first line,
+Each line of this log tells you something important about the commit, or check point, that exists for the project. On the first line,

 ~~~
 commit 25ab1f1a066f68e433a17454c66531e5a86c112d (HEAD -> master, tag: 0.0.0)
 ~~~

-You have a unique identifier for the commit (25ab1...). You can use this number to reference this checkpoint.
+You have a unique identifier for the commit (25ab1...). You can use this hexadecimal number to reference this checkpoint.

-Then, git records the name of the author who made the change.
+Then, `git` records the name of the author who made the change.

 ~~~
 Author: Your Name
 ~~~

-This should be your information. This way, anyone who downloads this project can see who made each commit.
Note that this name and email address matches what you specified when you configured git in the setup, with the name and email address you specified in the cookiecutter having no effect.
+This should be your information. This way, anyone who downloads this project can see who made each commit. Note that this name and email address matches what you specified when you configured `git` in the setup, with the name and email address you specified to `cookiecutter` having no effect.

 ~~~
 Date: Mon Feb 4 10:45:26 2019 -0500
@@ -137,16 +137,20 @@ Next, it lists the date and time the commit was made.
 Initial commit after CMS Cookiecutter creation, version 1.0
 ~~~

-Finally, there will be a blank line followed by a commit message. The commit message is a message whoever made the commit chose to write, but should describe the change that took place when the commit was made. This commit message was written by the cookiecutter for you.
+Finally, there will be a blank line followed by a commit message. The commit message is a message that whoever made the commit chose to write, but should describe the change that took place when the commit was made. This commit message was written by `cookiecutter` for you.

 When we have more commits (or versions) of our code, `git log` will show a history of these commits, and they will all have the same format discussed above. Right now, we have only one commit - the one created by the CMS CookieCutter.

 ## The 3 steps of a commit

-Now, we will change some files and use `git` to track those changes. Let's edit our README. Open `README.md` in your text editor of choice. On line 8, you should see the description of the repository we typed when running the CookieCutter. Add the following sentence to your `README` under the initial description and save the file.
+Now, we will change some files and use `git` to track those changes.
+
+Let's edit our README. Open `README.md` in your text editor of choice.
On line 8, you should see the description of the repository we typed when running `cookiecutter`.
+
+Add the following sentence to your `README.md` under the initial description and save the file.

 ~~~
-This repository is currently under development. To do a developmental install, download this repository and type
+This repository is currently under development. To do a development install, download this repository and type

 `pip install -e .`

@@ -173,7 +177,7 @@ no changes added to commit (use "git add" and/or "git commit -a")
 ~~~
 {: .output}

-Git even tells us to use `git add` to include what will be committed. Let's follow the instructions and tell `git` that we want to create a checkpoint with the current version of `README.md`
+Git even tells us to use `git add` to include what will be committed. Let's follow the instructions and tell `git` that we want to create a checkpoint with the current version of `README.md`.

 ~~~
 $ git add README.md
@@ -194,7 +198,7 @@ Changes to be committed:
 ~~~
 {: .output}

-We are now on the second step of creating a commit. We have `added` our files to the staging area. In our case, we only have one file in the staging area, but we could add more if we needed.
+We are now on the second step of creating a commit. We have added our files to the staging area. In our case, we only have one file in the staging area, but we could add more if we needed.

 To create the checkpoint, or commit, we will now use the `git commit` command. We add a `-m` after the command for "message." Whenever you create a commit, you should write a message about what the commit does.

@@ -206,7 +210,9 @@ $ git commit -m "update readme to have instructions for developmental install"

 Now when we look at our log using `git log`, we see the commit we just made along with information about the author and the date of the commit.

-Let's continue to edit this readme to include more information. This is a file which will describe what is in this directory.
Open `README.md` in your text editor of choice and add the following to the end
+If you neglect the `-m` option, and you configured an editor during set-up, `git` will open the editor for you to compose your commit message.
+
+Let's continue to edit this readme to include more information. This is a file which will describe what is in this directory. Open `README.md` in your text editor of choice and add the following to the end.

 ~~~
 This package requires the following:
@@ -214,7 +220,7 @@ This package requires the following:
 - matplotlib
 ~~~

-This file is using a language called [markdown](https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet).
+This file is using a language called [Markdown](https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet).

 > ## Check your understanding
 > Create a commit for these changes to your repository.
@@ -261,22 +267,23 @@ $ git log
 We now have a log with three commits. This means there are three versions of the repository we are working in.

 `git log` lists all commits made to a repository in reverse chronological order.
-The listing for each commit includes the commit's full identifier,the commit's author, when it was created, and the commit title.
+The listing for each commit includes the commit's full identifier, the commit's author, when the commit was created, and the commit title.

-We can see differences in files between commits using git diff.
+We can see differences in files between commits using `git diff`.

 ~~~
 $ git diff HEAD~1
 ~~~
 {: .language-bash}

-Here HEAD refers to the point in our commit history (and current branch). When we use `~1`, we are asking git to show us the different of the current point minus one commit.
+The argument to `git diff` refers to the comparison point in our commit history.
+`HEAD` is an alias for the commit at the tip of our checked-out branch. `~1` is a modifier that refers to the given commit minus 1.
We are asking git to show us the difference between the current files and the second-most-recent commit.

 Lines that have been added are indicated in green with a plus sign next to them ('+'), while lines that have been deleted are indicated in red with a minus sign next to them ('-')

-## Viewing out previous versions
+## Viewing our previous versions

-If you need to check out a previous version
+If you need to check out a previous version,

 ~~~
 $ git checkout COMMIT_ID
@@ -299,17 +306,18 @@ d857c74 (HEAD -> master) add information about dependencies to readme

 In this log, the commit ID is the first number on the left.

-To revert to the version of the repository where we first edited the readme, use the git checkout command with the appropriate commit id.
+To revert to the version of the repository where we first edited the readme, use the `git checkout` command with the appropriate commit ID.

 ~~~
 $ git checkout 3c0e1c6
 ~~~
 {: .language-bash}

-If you now view your readme, it is the previous version of the file.
+If you now view your `README.md`, it has reverted to the previous version of the file.

 To return to the most recent point,

+*TODO: Confirm that the example shell commands below render correctly.*
 ~~~
 $ git checkout master
 ~~~
@@ -334,16 +342,16 @@ $ git commit -m "Initial commit after CMS Cookiecutter creation, version 1.0"

 ## Creating new features - using branches

-When you are working on a project to implement new features, it is a good practices to isolate the the changes you are making and work on one particular topic at a time. To do this, you can use something called a **branch** in git. Working on branches allows you to isolate particular changes. If you make sure that your code works before merging to your main or **master** branch, you will ensure that you always have a working version of code on your main branch.
+When you are working on a project to implement new features, it is a good practice to isolate the changes you are making and work on one particular topic at a time. To do this, you can use something called a **branch** in git. Working on branches allows you to isolate particular changes. If you make sure that your code works before merging to your main or **master** branch, you will ensure that you always have a working version of code on your `master` branch. -By default, you are typically in the master branch. To create a new branch and move to it, you can use the command +By default, you are typically in the `master` branch. To create a new branch and move to it, you can use the command ~~~ $ git checkout -b new_branch_name ~~~ {: .language-bash} -The command `git checkout` switches branches when followed by a branch name. When you use the `-b` option, git will create the branch and switch to it. For this exercise, we will add a new feature - we are going to add another function to print the [Zen of Python](https://www.python.org/dev/peps/pep-0020/). +The command `git checkout` switches branches when followed by a branch name. When you use the `-b` option, git will create the branch and switch to it. For this exercise, we will add a new feature: We are going to add a function to print the [Zen of Python](https://www.python.org/dev/peps/pep-0020/). First, we'll create a new branch: @@ -359,7 +367,7 @@ import this ~~~ {: .language-python} -into the interactive Python prompt +into the interactive Python prompt. ~~~ def zen(with_attribution=True): @@ -398,29 +406,29 @@ $ git commit -m "add function to print Zen of Python ~~~ {: .language-bash} -Let's switch back to the master branch to see what it is like. You can see a list of all the branches in your repo by using the command +Let's switch back to the `master` branch to see what it is like. 
You can see a list of branches in your repo by using the command ~~~ $ git branch ~~~ {: .language-bash} -This will list all of your branches. The active branch, or the branch you are on will be noted with an asterisk (`*`). +This will list your local branches. The active branch, or the branch you are on will be noted with an asterisk (`*`). -To switch back to the master branch, +To switch back to the `master` branch, ~~~ $ git checkout master ~~~ {: .language-bash} -When you look at the `functions.py` module on the master branch, you should not see your most recent changes. +When you look at the `functions.py` module on the `master` branch, you should not see your most recent changes. You can verify this by using the `git log` command. Consider that at the same time we have some changes or features we'd like to implement. Let's make a branch to do a documentation update. -Create a new branch +Create a new branch. ~~~ $ git checkout -b doc_update @@ -442,7 +450,7 @@ To switch to an existing branch, use Save and commit this change. -To incorporate these changes in master, you will need to do a `git merge`. When you do a merge, you should be on the branch you would like to merge into. In this case, we will first merge the changes from our `doc_update` branch, then our `zen` branch, so we should be on our `master` branch. Next we will use the `git merge` command. +To incorporate these changes in `master`, you will need to do a `git merge`. When you do a merge, you should be on the branch you would like to merge into. In this case, we will first merge the changes from our `doc_update` branch, then our `zen` branch, so we should be on our `master` branch. Next we will use the `git merge` command. The syntax for this command is @@ -453,16 +461,16 @@ $ git merge branch_name where `branch_name` is the name of the branch you would like to merge. 
-We can merge our `doc_update` branch to get changes on our master branch: +We can merge our `doc_update` branch to get changes on our `master` branch: ~~~ $ git merge doc_update ~~~ {: .language-bash} -Now our changes from the branch are on master. +Now our changes from the branch are on `master`. -We can merge our `zen` branch to get our changes on master: +We can merge our `zen` branch to get our changes on `master`: ~~~ $ git merge zen @@ -475,19 +483,23 @@ This time, you will see a different message, and a text editor will open for a m Merge made by the 'recursive' strategy. ~~~ -This is because `master` and `zen` had development histories which have diverged. Git had to do some work in this case to merge the branches. A merge commit was created. +This is because `master` and `zen` have development histories which have diverged. `git` had to do some work in this case to merge the branches. A *merge commit* was created. Merge commits create a branched git history. We can visualize the history of our project by adding `--graph` to `git log`. There are other workflows you can use to make the commit history more linear, but we will not discuss them in this course. -Once we are done with a feature branch, we can delete it: +Once we are done with a feature branch, we can delete it: ~~~ $ git branch -d zen ~~~ {: .language-bash} +*TODO: Check excerpted docstrings with source material.* + +*TODO: Check example code for PEP-8/PEP-257 compliance.* + > ## Using Branches - Exercise -> For this exercise, you will be adding all the functions from your Jupyter notebook to the package. Create a branch to add your functions. +> For this exercise, you will be adding all the functions from your Jupyter notebook to the package. Create a branch to add your functions.
Add all of the functions from your Jupyter notebook to the module `functions.py` in your package. Verify that you can use your functions. Once the functions are added and working, merge into your `master` branch. >> ## Solution >> >> First, create a new branch in your repository @@ -722,7 +734,7 @@ $ git branch -d zen >> ~~~ >> {: .language-bash} >> ->> Next, switch back to your master branch to merge: +>> Next, switch back to your `master` branch to merge: >> ~~~ >> $ git checkout master >> $ git merge add-functions diff --git a/_episodes/03-github.md b/_episodes/03-github.md index 98813fc2..30985bf3 100644 --- a/_episodes/03-github.md +++ b/_episodes/03-github.md @@ -7,16 +7,18 @@ questions: objectives: - "Explain reasons to use GitHub." -keypoints: +key points: - "You can use GitHub to store your project online where you or others can access it from a central repository." -- "You can use GitHub to store your projects so you can work on them from multiple computers." +- "You can use GitHub to store your projects so that you can work on them from multiple computers." --- +*TODO: Consider how to handle redundancy with lesson 2.* + ## Putting your repository on GitHub. -Now, let's put this project on GitHub so that we can share it with others. In your browser, navigate to `github.com`. Log in to you account if you are not already logged in. On the left side of the page, click the green button that says `New` to create a new repository. Give the repository the name `molecool`. +Now, let's put this project on GitHub so that we can share it with others. In your browser, navigate to [github.com](https://github.com/). Log in to your account if you are not already logged in. On the left side of the page, click the green button that says `New` to create a new repository. Give the repository the name `molecool`. -Note for the last question, "Initialize this repository with a README".
We will leave this unchecked in our case because we have an existing repository (as described by GitHub, "This will let you immediately clone the repository to your computer. Skip this step if you’re importing an existing repository."). If you were creating the repository on GitHub, you would select this. There are also options for adding a `.gitignore` file or a license. However, since cookiecutter created these for us, we will not add them. +Note for the last question, "Initialize this repository with a README". We will leave this unchecked in our case because we have an existing repository (as described by GitHub, "This will let you immediately clone the repository to your computer. Skip this step if you’re importing an existing repository."). If you were creating the repository on GitHub, you would select this. There are also options for adding a `.gitignore` file or a license. However, since `cookiecutter` created these for us, we will not add them. Click `Create repository`. @@ -33,7 +35,7 @@ $ git remote -v You should see no output. Now, follow the instructions on GitHub under "...or push an existing repository from the command line" ~~~ $ git remote add origin https://github.com/YOUR_GITHUB_USERNAME/molecool.git -dgit branch -M main +git branch -M main git push -u origin main ~~~ {: .language-bash} @@ -50,7 +52,7 @@ Now if you refresh the GitHub webpage you should be able to see all of the new f One of the most potentially frustrating problems in software development is keeping track of all the different copies of the code. For example, we might start a project on a local desktop computer, switch to working on a laptop during a conference, and then do performance optimization on a supercomputer. -In ye olden days, switching between computers was typically accomplished by copying files via a USB drive, or with ssh, or by emailing things to oneself. 
+In ye olden days, switching between computers was typically accomplished by copying files via a USB drive, or with `ssh`, or by emailing things to oneself. After copying the files, it was very easy to make an important change on one computer, forget about it, and go back to working on the original version of the code on another computer. Of course, when collaborating with other people these problems get dramatically worse. @@ -74,6 +76,7 @@ If you do not get this message, do `cd ../` until you see it. Next, make another copy of your repository. We'll use this to simulate working on another computer. +*TODO: use `git://` URL or discuss other authentication methods.* ~~~ $ git clone https://github.com/YOUR_GITHUB_USERNAME/molecool.git molecool_friend $ cd molecool_friend @@ -81,7 +84,7 @@ $ cd molecool_friend {: .bash} Check the remote on this repository. Notice that when you clone a repository from GitHub, it automatically has that repository listed as `origin`, and you do not have to add -the remote the way we did when we did not clone the repository. +the remote the way we did when we created the repository locally. ~~~ $ git remote -v diff --git a/_episodes/04-function-style.md b/_episodes/04-function-style.md index 65b65f92..2c30b8cc 100644 --- a/_episodes/04-function-style.md +++ b/_episodes/04-function-style.md @@ -1,15 +1,20 @@ --- title: "Python Coding Style" + teaching: 30 + exercises: 15 + questions: - "How can I write python code that is readable?" + objectives: -- "Learn how to raise exceptions" +- "Learn how to raise exceptions." - "Understand how to follow PEP8 style for Python." -- "Understand what docstrings are and their importance." -- "Learn to write docstrings in numpy style" -keypoints: +- "Understand what docstrings are and why they are important." +- "Learn to write docstrings in numpy style." + +key points: - "Your code should adhere to standards outlined in PEP8 so that it easily readable by others." 
- "All functions and modules should be documented with docstrings." --- @@ -19,7 +24,7 @@ keypoints: > - Completed Using Branches - Exercise in Episode 2. {: .prereq} -# Editing function to our package +# Editing a function in our package Let's look at one of the functions in our package. Open your `molecool/functions.py` module in a text editor. The function `open_pdb` reads coordinates and atom symbols from a pdb file. ~~~ @@ -38,9 +43,9 @@ def open_pdb(f_loc): ~~~ {: .language-python} -If we want to test our function, we require a pdb file. The workshop materials downloaded during the setup include a set of pdb examples. These are found in `molssi_beter_practices/starting_material/data/pdb/`. We want to store these files in our molecool directory. Luckily, cookicutter created a folder designed specifically for that purpose. The folder is in `molecool/data/`. This folder can contain any data useful for testing of the basic functionality of our code. Be mindful given that this folder is also downloaded when installing our package, so do not include data whose size is significant. +If we want to test our function, we require a pdb file. The workshop materials downloaded during the setup include a set of pdb examples. These are found in `molssi_beter_practices/starting_material/data/pdb/`. We want to store these files in our `molecool` directory. Luckily, `cookiecutter` created a folder designed specifically for that purpose. The folder is in `molecool/data/`. This folder can contain any data useful for testing of the basic functionality of our code. Be mindful that this folder is also downloaded when installing our package, so do not include data whose size is significant. -Go ahead and copy the pdb files in a new folder `pdb` inside the data folder. With the files in our molecool folder, we can access the function when we execute it in the interactive Python interpreter. 
Test this by opening the interactive Python interpreter and typing the following +Go ahead and copy the pdb files to a new folder `pdb` inside the data folder. With the files in our `molecool` folder, we can access the function when we execute it in the interactive Python interpreter. Test this by opening the interactive Python interpreter and typing the following. ~~~ >>> import os @@ -56,7 +61,7 @@ Go ahead and copy the pdb files in a new folder `pdb` inside the data folder. Wi ~~~ {: .output} -You should get a list of atomic symbols of the water molecule from executing this code. +You should get a list of atomic symbols of the water molecule by executing this code. You can also see the atomic coordinates by executing: ~~~ >>> coords @@ -72,9 +77,9 @@ array([[ 9.626, 6.787, 12.673], ~~~ {: .output} -Hooray! It seems like this function works! This should come as no surprise since we are the authors of the function and we know its internal structure. This is not necessarily true for someone editing our code and specially not true for someone just using our code. There are instances where even though the code is executed correctly, i.e., there where no syntax errors, an unwanted expected behavior occurs. In these cases, our code should be able to stop itself to prevent a malfunction. +Hooray! It seems like this function works! This should come as no surprise since we are the authors of the function, and we know its internal structure. This is not necessarily true for someone editing our code, and especially not true for someone just using our code. There are instances where unwanted behavior occurs, even though the code executes (i.e., there are no syntax errors). In these cases, our code should be able to stop itself to prevent further malfunction. -## Raising Errors +## Raising Exceptions Take for example the division by zero.
If we try to calculate ~~~ @@ -89,9 +94,9 @@ ZeroDivisionError: division by zero ~~~ {: .output} -In this example, the code was smart enough to identify the division by zero and halted. This type of feedback is much more helpful than just throwing an ugly `NaN`. This is called an exception error. There are several built-in exception such as the "ZeroDivisionError". You can choose to raise errors yourself when you think a function should fail (instead of the function not failing, or running until it hit a failure.) +In this example, the code was smart enough to identify the division by zero and halt. This type of feedback is much more helpful than silently returning an ugly `NaN`. This kind of error is called an *exception*. There are several built-in exceptions, such as `ZeroDivisionError`. You can choose to raise exceptions yourself when you think a function should fail (instead of the function not failing, or running until it hits some other failure). -Consider our function `write_xyz` +Consider our function `write_xyz`. ~~~ def write_xyz(file_location, symbols, coordinates): @@ -109,7 +114,7 @@ def write_xyz(file_location, symbols, coordinates): ~~~ {: .language-python} -When examining this function, you may see a few opportunities for failure. For example, a user could supply `symbols` and `coordinates` with different lengths. If the `coordinates` argument is the longer one, we will not see an error raised. The function will simply ignore the last coordinate. If `symbols` is the longer argument, we will not have enough `coordinates` and an error will occur. Neither of these is our intention, and one of them would complete without us knowing (some errors are silent)! +When examining this function, you may see a few opportunities for failure. For example, a user could supply `symbols` and `coordinates` with different lengths. If the `coordinates` argument is the longer one, we will not see an error. The function will simply ignore the extra coordinates.
If `symbols` is the longer argument, we will not have enough `coordinates` and an error will occur. Neither of these is our intended behavior, and the first would complete without us knowing (some errors are silent)! Let's try this out. In a python interpreter, try the following: @@ -124,7 +129,7 @@ Let's try this out. In a python interpreter, try the following: You will see that no error occurs. If we open the written XYZ file, the last coordinate point has been discarded. -We probably intend for these variables to have the same number of elements. When they don't, there's no way to tell what the user wanted, or if they have accidentally passed us incorrect data. We should check the length of theses and raise an appropriate exception to halt the program if necessary. +We probably intend for these variables to have the same number of elements. When they don't, there's no way to tell what the user wanted, or if they have accidentally passed us incorrect data. We should check the lengths of these arguments and raise an appropriate exception to halt the program if necessary. ~~~ def write_xyz(file_location, symbols, coordinates): @@ -161,17 +166,17 @@ ValueError: write_xyz : the number of symbols (2) and number of coordinates (3) ~~~ {: .output} -The already built-in exceptions include errors that are common while programming. For example, our function requires explicit use of numpy arrays. Nevertheless, a user may be tempted to use a list of length 3 to describe the position of two atoms. We know that it is not possible to perform arithmetic between full lists. In this case we might use the exception type `TypeError`. +The built-in exceptions already include errors that are common while programming. For example, our function requires explicit use of [numpy] arrays. Nevertheless, a user may be tempted to use a list of length 3 to describe the position of two atoms. We know that it is not possible to perform element-wise arithmetic on plain Python lists.
In this case we might use the exception type `TypeError`. -Other types of common exceptions include variables not being defined (`NameError`) or asserting that two numbers are the same (assert). The latter will be particularly useful when we want to automatize testing within our package. +Other types of common exceptions include undefined variables (`NameError`) and failed assertions that two numbers are the same (`AssertionError`). The latter will be particularly useful when we want to automate testing within our package. ## Coding Style -Our functions are now smarter and will better guide users while using them. However, our function still might be hard to read and understand for others so we might want to consider styling it properly. +Our functions are now smarter and will better guide users while using them. However, our function still might be hard to read and understand for others, so we might want to consider styling it better. As a developer, you spend a lot of time thinking about writing your code. However, code is read much more often than it is written. Following a style guide will help others (and perhaps you in the future!) to read your code. -For Python, the common convention for code style is called [PEP8]. PEP8 is a document that gives guidelines for best practices in Python coding style. PEP8 is a recommendation, not rule. However, you should follow this convention when possible. +For Python, the common convention for code style is called [PEP8]. PEP8 is a document that gives guidelines for best practices in Python coding style. PEP8 is a recommendation, not a rule. However, you should follow this convention when possible. > ## Python PEP > @@ -189,7 +194,7 @@ PEP8 recommends that > Function names should be lowercase, with words separated by underscores as necessary to improve readability. 
-Though not specifically reference in PEP8, we also recommend making all variable names descriptive so that someone reading your code can easily understand what the variable is. +Though not specifically referenced in PEP8, we also recommend making all variable names descriptive so that someone reading your code can easily understand what the variable is. Consider a few variables we have defined in our function (`c`, `sym`, `c2`, `l`). Is it clear what these are or mean? We can change them to be more descriptive and readable. @@ -209,7 +214,7 @@ def open_pdb(file_location): ~~~ {: .language-python} -For this rewrite of the function, we have made the following changes in variable names +For this rewrite of the function, we have made the following changes in variable names. - `f_loc` ---> `file_location` - `c` ---> `coordinates` @@ -221,7 +226,7 @@ These variable names follow PEP8 convention and are much more descriptive and re ## Indentation -PEP8 indicates that indentation should be 4 spaces per indentation level. Our code meets this criteria. +PEP8 indicates that indentation should be 4 spaces per indentation level. Our code meets this criterion. ## Whitespace @@ -266,7 +271,7 @@ $ git push origin main {: .bash} > ## Exercise -> Below is the `calculate_distance` function that takes two points in 3D space and returns the distance between them. Even though it works just fine, the variable names are not very clear and it doesn't follow PEP8 styling. Take a couple of minutes to reformat this function in `molecool/functions.py` module. +> Below is the `calculate_distance` function that takes two points in 3D space and returns the distance between them. Even though it works just fine, the variable names are not very clear, and it doesn't follow PEP8 styling. Take a couple of minutes to reformat this function in the `molecool/functions.py` module.
> ~~~ > def calculate_distance(rA, rB): > d=(rA-rB) @@ -274,7 +279,7 @@ $ git push origin main > return dist > ~~~ >> ## Solution ->> Here is a better formatted version of `calculate_distance` which is easier to read and understand. +>> Here is a better formatted version of `calculate_distance`, which is easier to read and understand. >> >> ~~~ >> def calculate_distance(rA, rB): @@ -323,7 +328,7 @@ canvas(with_attribution=True) If we try the same thing on our `calculate_distance` function, we don't get a helpful message. -We will want to write a docstring for our new `calculate_distance` function. This way, it will be clear to new developers who use our code what the function does, and be accessible to any users using the function interactively. Returning, to the `functions.py` module file, edit your `calculate_distance` function to look like the following: +We will want to write a docstring for our new `calculate_distance` function. This way, it will be clear to new developers who use our code what the function does, and be accessible to anyone using the function interactively. Returning to the `functions.py` module file, edit your `calculate_distance` function to look like the following. ~~~ def calculate_distance(rA, rB): @@ -356,9 +361,11 @@ def calculate_distance(rA, rB): ## Docstrings We've now added a multi-line comment (called a `docstring`, short for "documentation string"), to the beginning of our function. Docstrings **are the first statement after a function or module definition** and are opened and closed with three quotes. - The docstring should explain what the function or module does (and not how it is done). +[PEP257] provides very basic guidelines for docstrings. +There are many ways you could format a docstring (different styles/conventions). We recommend using [numpy style docstrings], and this is what the example above and `calculate_distance` function are written in. 
+ > ## The `__doc__` attribute > > When you add a docstring to a function or module, python automatically adds this to the `__doc__` attribute of the object. @@ -367,8 +374,6 @@ The docstring should explain what the function or module does (and not how it is {: .callout} ### Sections of a Docstring -There are many ways you could format this docstring (different styles/conventions). We recommend using [numpy style docstrings], and this is what the example above and `calculate_distance` function are written in. - Each docstring has a number of sections which are separated by headings. Headings should be underlined with hyphens (`-----`). There are many options for sections, we will only cover the most relevant here. If you would like to see a full list, check out the documentation for [numpy style docstrings]. #### 1. Short summary @@ -389,7 +394,7 @@ We do not have an extended summary in our `calculate_distance` function, since i #### 3. Parameters This section contains a description of the function arguments - keywords and expected types. -The parameters for our `calculate_distance` function is shown below: +The parameters for our `calculate_distance` function are shown below. ~~~ """ @@ -401,10 +406,10 @@ rA, rB : np.ndarray ~~~ {: .language-python} -Here, you can see that the parameter section begins with the section title ("Parameters"), followed by a line of hypens ("----"). On the next line, we have the argument names (`rA, rB`), then a colon followed by the input type of the argument. This line says that the arguments `rA` and `rB` should be of type `np.ndarray`. The next line gives a more detailed description of the variable. When the input parameters are of different type or they aren't related to each other they should be written on a new line. +Here, you can see that the parameter section begins with the section title ("Parameters"), followed by a line of hyphens ("----"). 
On the next line, we have the argument names (`rA, rB`), then a colon followed by the input type of the argument. This line says that the arguments `rA` and `rB` should be of type `np.ndarray`. The next line gives a more detailed description of the parameter. When the input parameters are of different type, or they aren't related to each other, they should be written on separate lines. #### 4. Returns -This section is very similar to the `Parameters` section above. In contrast to the `Parameters` section, each returned value does not have to be named, but the type of the return value is required. +This section is very similar to the `Parameters` section above. In contrast to the `Parameters` section, each returned value does not have to be named, but the type of the returned value is required. For our `calculate_distance` function, our `Returns` section looks like the following. @@ -435,9 +440,9 @@ Examples ~~~ {: .language-python} -It is important that your Examples in docstrings be working Python. We will see in the `testing` lesson how we can run automatic tests on our docstrings, and in the `documentation` lesson, we will see how we can display examples in documentation to our users. +It is important that your examples in docstrings are working Python. We will see in the `testing` lesson how we can run automatic tests on our docstrings, and in the `documentation` lesson, we will see how we can display examples in documentation to our users. -We have three lines of code for our example. In examples, lines of code begin with `>>>`. The first two lines define numpy arrays that are used in our `calculate_distance` function. Note that `r1` and `r2` must be numpy arrays (as indicated by our `Parameters` section), or our Example will not give valid Python code (our function would error if we ran it). On the last line, you give the output (with no `>>>` in front.) +We have three lines of code for our example. In examples, lines of code begin with `>>>`. 
The first two lines define numpy arrays that are used in our `calculate_distance` function. Note that `r1` and `r2` must be numpy arrays (as indicated by our `Parameters` section), or our example will not give valid Python code (our function would error if we ran it). On the last line, you give the output (with no `>>>` in front.) Now that we've written a function in our project, we should commit our changes and push to GitHub. @@ -512,27 +517,27 @@ $ git push origin main ## More on Coding Style -If you look at PEP8, you will see that it is quite long. While you should definitely read it if you spend a lot of time programming in Python, there are luckily tools which will help us make sure our code is following PEP8 convention or other styling guidelines. There are autoformattign tools such as`yapf` and `Black` and static code "linters" such as `pylint` or `flake8`. +If you look at PEP8, you will see that it is quite long. While you should definitely read it if you spend a lot of time programming in Python, there are luckily tools which will help us make sure our code is following PEP8 convention or other styling guidelines. There are auto-formatting tools such as `yapf` and `Black`, and static code "linters" such as `pylint` or `flake8`. -Automatic code formatters will parse over your python files and format them according to standards defined by that code formatter. It is usually a good idea to use a formatter (of your choice) when working on a python project. In particular, [Black](https://github.com/psf/black) has gained popularity lately. +Automatic code formatters will parse your python files and format them according to standards defined by that code formatter. It is usually a good idea to use a formatter (of your choice) when working on a python project. In particular, [Black](https://github.com/psf/black) has gained popularity lately. -We will use [Black](https://github.com/psf/black) in this workshop. 
Black is an autoformatter which is almost entirely non customizable, ensuring all of your files will be uniform. +We will use [Black](https://github.com/psf/black) in this workshop. Black is an auto-formatter which is almost entirely non-customizable, ensuring all of your files will be uniform. -Install black using pip. In your terminal, type +Install `black` using `pip`. In your terminal, type ~~~ $ pip install black ~~~ {: .language-bash} -Now we can use black on our python files. +Now we can use `black` on our python files. ~~~ $ black molecool/functions.py ~~~ {: .language-bash} -You can see the changes to the `write_xyz` function, for example. You'll notice that Black also has some rules which are in addition to PEP8 formatting. For example, strings are all normalized to use double quotes. Note that `black` does not always follow PEP8. For example, PEP8 recommends that line lengths be no more than 79 characters. This is a convention which is most often not followed. Black defaults to 88 characters per line instead. When you are working on a project, the exact style you use may vary - however, it is important to define a style. This will make your code much cleaner and easier to read. +You can see the changes to the `write_xyz` function, for example. You'll notice that Black also has some rules which are in addition to PEP8 formatting. For example, strings are all normalized to use double quotes. Note that `black` does not always follow PEP8. For example, PEP8 recommends that line lengths be no more than 79 characters. This is a convention which is often not followed. Black defaults to 88 characters per line instead. When you are working on a project, the exact style you use may be different - however, it is important to choose a consistent style. This will make your code much cleaner and easier to read. Now that we've changed and formatted some functions in our project, we should commit our changes and push to GitHub. 
@@ -543,28 +548,28 @@ $ git push origin master ~~~ {: .bash} -There are other tools, such as [pylint](https://www.pylint.org/) or [flake8](https://flake8.pycqa.org/en/latest/) which are not automatic formatters, but will check your code for adherence to the PEP8 standard. Pylint, for example, will find your variables which are not `snake_case`, functions which do not have `docstrings`, simple stylistic changes, unused variables, etc. Flake8 is a little less strict in general. We will try `flake8` out here. If you would like to try `flake8`, first install it +There are other tools, such as [pylint](https://www.pylint.org/) and [flake8](https://flake8.pycqa.org/en/latest/) that are not automatic formatters, but will check your code for adherence to the PEP8 standard. Pylint, for example, will find your variables which are not `snake_case`, functions which do not have `docstrings`, simple stylistic changes, unused variables, etc. Flake8 is a little less strict in general. We will try `flake8` out here. If you would like to try `flake8`, first install it. ~~~ $ pip install flake8 ~~~ {: .language-bash} -You can run it on our `functions` module: +You can run it on our `functions` module. ~~~ $ flake8 molecool/functions.py ~~~ {: .language-bash} -To see any errors still left in the module. Let's examine one of these +shows any errors still left in the module. Let's examine one of these. ~~~ molecool/functions.py:1:1: F401 'os' imported but unused ~~~ {: .language-output} -This tells us it is looking at the line `molecool/functions.py` on line `1` (your line number may vary). `F401` is an error code which you can look up. Here, we are importing `os`, but never using it. We should remove this from our file. +This tells us it is looking at line 1 of `molecool/functions.py` (your line number may vary). `F401` is an error code which you can look up. Here, we are importing `os`, but never using it. We should remove this from our file. 
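The idea behind an "imported but unused" report such as `F401` can be illustrated with a few lines of standard-library Python. This is only a simplified sketch of the concept, not how `flake8` is implemented; the function name `find_unused_imports` and the `example` string are hypothetical and exist only for this illustration.

```python
import ast


def find_unused_imports(source):
    """Return sorted (line, name) pairs for imported names never referenced in `source`."""
    tree = ast.parse(source)
    imported = {}  # bound name -> line number of the import statement
    used = set()   # every bare name that appears elsewhere in the code
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                if alias.name == "*":
                    continue  # star imports bind unknown names; skipped in this sketch
                # `import a.b` binds `a`; `import a as b` binds `b`
                bound = alias.asname or alias.name.split(".")[0]
                imported[bound] = node.lineno
        elif isinstance(node, ast.Name):
            used.add(node.id)
    return sorted((line, name) for name, line in imported.items() if name not in used)


# Hypothetical module text: `os` is imported but never used, `np` is used.
example = "import os\nimport numpy as np\n\nprint(np.zeros(3))\n"
print(find_unused_imports(example))  # [(1, 'os')]
```

The source is only parsed, never executed, so the modules it mentions do not need to be installed. A real linter handles many cases this sketch ignores, such as names used only inside strings, `__all__`, or re-exports.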
You will also see a second "unused import" error: @@ -573,7 +578,7 @@ molecool/functions.py:5:1: F401 'mpl_toolkits.mplot3d.Axes3D' imported but unuse ~~~ {: .language-python} -Althought it appears this isn't used, this import is actually necessary for our 3D plot. We can tell `flake8` to ignore this problem by adding a special comment: +Although it appears this isn't used, this import is actually necessary for our 3D plot. We can tell `flake8` to ignore this problem by adding a special comment: ~~~ from mpl_toolkits.mplot3d import Axes3D # noqa: F401 diff --git a/_episodes/05-package-structure.md b/_episodes/05-package-structure.md index 081dc0a4..1ccd72fb 100755 --- a/_episodes/05-package-structure.md +++ b/_episodes/05-package-structure.md @@ -1,7 +1,7 @@ --- title: "Deciding Package Structure" teaching: 30 -exercises: 0 +exercises: 10 questions: - "How should I break my code into modules?" - "How can I handle imports in my package?" @@ -13,12 +13,12 @@ keypoints: - "You can use the __init__.py file to define what packages are imported with your package, and how the user interacts with it." --- -As new features are implemented in codes, it is natural for new functions and objects to be added. In many projects, this often leads to a large number of functionalities defined within a single module. For small, single developer codes, this is not a major issue, but it can still make it difficult to work with. With large or multi-developer codes, this can slow development progress to a crawl as it is difficult to both understand and work with the code. +As new features are implemented in codes, it is natural for new functions and objects to be added. In many projects, this often leads to a large number of functionalities defined within a single module. For small, single developer codes, this is not a major issue, but it can still make code difficult to work with. 
With large or multi-developer codes, this can slow development progress to a crawl as it is difficult both to understand and work with the code. -In this lesson, we will simulate a developing code by starting with a single python module containing all the methods we have developed, and converting it into a well structured package. +In this lesson, we will simulate a developing code by starting with a single python module containing all the methods we have developed, and converting it into a well-structured package. ## Package Structure -Lets start by reviewing the package structure provided to us by the [CMS CookieCutter]. We have a directory containing our project with a number of additional features. Under our package directory, `molecool`, we can see our current python module `functions.py`. For a more detailed explanation of the rest of the package structure, please review the [package setup] section of the lessons. +Let's start by reviewing the package structure provided to us by the [CMS CookieCutter]. We have a directory containing our project with a number of additional features. Under our package directory, `molecool`, we can see our current python module `functions.py`. For a more detailed explanation of the rest of the package structure, please review the [package setup] section of the lessons. ``` . ├── CODE_OF_CONDUCT.md <- Code of Conduct for developers and users @@ -70,8 +70,8 @@ Lets start by reviewing the package structure provided to us by the [CMS CookieC ``` {: .output} -The easiest way to start is to see what we currently have and try and decide what is related to one another. Looking through the `functions.py` file, we see a number of different functions, and for the sake of simplicity we abbreviate and rearrange them here: -``` +The easiest way to start is to see what we currently have and try to decide which parts are related to one another. 
Looking through the `functions.py` file, we see a number of different functions, and for the sake of simplicity we abbreviate and rearrange them here: +```python atomic_weights = { 'H': 1.00784, 'C': 12.0107, @@ -96,34 +96,44 @@ atom_colors = { } def open_pdb(file_location): + ... def open_xyz(file_location): + ... def write_xyz(file_location, symbols, coordinates): + ... def calculate_distance(rA, rB): + ... def calculate_angle(rA, rB, rC, degrees=False): + ... def draw_molecule(coordinates, symbols, draw_bonds=None, save_location=None, dpi=300): + ... def bond_histogram(bond_list, save_location=None, dpi=300, graph_min=0, graph_max=2): + ... def build_bond_list(coordinates, max_bond=1.5, min_bond=0): + ... def calculate_molecular_mass(symbols): + ... def calculate_center_of_mass(symbols, coordinates): + ... ``` {: .language-python} Right at the start we can see two dictionaries of atom data. Clearly these are related and should probably be grouped together. Looking at the functions, we see two functions that handle opening files, `open_pdb` and `open_xyz`, and a function that writes a file, `write_xyz`. It may make sense to group these three together in a module based on input and output. -Lets start making new modules to place our related functions into. +Let's start making new modules to place our related functions into. ### Atom Data We will take the `atomic_weights` and `atom_colors` dictionaries and move them into a separate module called `atom_data.py`. This is enclosing the constant data that our system is using in a single place. This allows all of the new modules we create to access the data from a single location, avoiding the need to copy the dictionaries to each module that needs them. If we have any other data, related to atoms, used by many of our functions, adding them to this module would be a good idea. -``` +```python """ Data used for the rest of the package. 
""" @@ -165,10 +175,10 @@ atom_colors = { {: .challenge} ### Measure -Our `functions.py` file contains two functions that handle taking measurements, `calculate_distance` and `calculate_angle`. Simliar to `atom_data`, we will simply place these in a module within the main package. Since both functions are taking measurements, we will call it `measure.py`. +Our `functions.py` file contains two functions that handle taking measurements, `calculate_distance` and `calculate_angle`. Similar to `atom_data`, we will simply place these in a module within the main package. Since both functions are taking measurements, we will call it `measure.py`. ``` """ -This module is for functions which perform measurements. +This module is for functions that perform measurements. """ def calculate_distance(rA, rB): dist_vec = (rA - rB) @@ -188,7 +198,7 @@ def calculate_angle(rA, rB, rC, degrees=False): {: .language-python} ### Visualize -Similarly, we have two functions that handle visulaization of molecules. We will place them into a module called `visualize.py`. +Similarly, we have two functions that handle visualization of molecules. We will place them into a module called `visualize.py`. ``` """ Functions for visualization of molecules @@ -251,7 +261,7 @@ def bond_histogram(bond_list, save_location=None, dpi=300, graph_min=0, graph_ma ### Molecule -Our last function is `build_bond_list` which is not particularly related to any of our other functions (docstring added). The name `functions.py` does not really give a lot of information about what is available in the module. We can rename the module to something more fitting, say `molecule.py`. +Our last function is `build_bond_list`, which is not particularly related to any of our other functions. The name `functions.py` does not really give a lot of information about what is available in the module. We can rename the module to something more fitting, say `molecule.py`. We also add a docstring. 
``` def build_bond_list(coordinates, max_bond=1.5, min_bond=0): """ @@ -290,7 +300,7 @@ def build_bond_list(coordinates, max_bond=1.5, min_bond=0): ### I/O Package -When looking at the three I/O functions, it may be easy to jump ahead and create an I/O module, as mentioned previously, however, what we really have is two distinct groups of functions that are related. More specifically, we have two functions that handle the input and output of a `.xyz` file and another function that handles the input of a `.pdb`. Each group is handling input and output, but are still somewhat unrelated because of their file type. Instead of making a single module, we are going to create a subpackage to handle i/o and place a module for each group within it. +When looking at the three I/O functions, it may be easy to jump ahead and create an I/O module, as mentioned previously. However, what we really have is two distinct groups of functions that are related. More specifically, we have two functions that handle the input and output of a `.xyz` file and another function that handles the input of a `.pdb`. Each group is handling input and output, but are still somewhat unrelated because of their file type. Instead of making a single module, we are going to create a subpackage to handle i/o and place a module for each group within it. Create a new directory called io within the package and create two new files `pdb.py` and `xyz.py`: @@ -346,10 +356,10 @@ def write_xyz(file_location, symbols, coordinates): coordinates[i,0], coordinates[i,1], coordinates[i,2])) ``` {: .language-python} -Now any module that needs to handle input and output can import the needed module from the `io` package. Since these are currently small modules, it would not be a big deal to import all of them, but consider a large I/O suite contianing a large number of file types and functionalities, it will quickly create inefficiencies to leave them in one module. 
+Now any module that needs to handle input and output can import the needed module from the `io` package. Since these are currently small modules, it would not be a big deal to import all of them, but consider a large I/O suite containing a large number of file types and functionalities, it will quickly create inefficiencies to leave them in one module. ## Fixing Imports -When we first copied the functions from the Jupyter Notebook into `functions.py`, we were able to import `molecool` package and access the functions within `functions.py`. After we extracted the functions from that file, we won't be able to import those functions in the same way. In fact we won't be able to access them at all. Every time we restructure our code or create new folders we have to be careful and modify the init accordingly. Let us then add the new functions into the `__init__` +When we first copied the functions from the Jupyter Notebook into `functions.py`, we were able to import `molecool` package and access the functions within `functions.py`. After we extracted the functions from that file, we won't be able to import those functions in the same way. In fact, we won't be able to access them at all. Every time we restructure our code or create new folders we have to be careful and modify the init accordingly. Let us then add the new functions into the `__init__` ~~~ # Add imports here @@ -368,7 +378,8 @@ In this way, we should be able to call each of the functions after importing our ~~~ {: .language-python} -Even with the imports fixed, if you try and run some of these functions, you may find yourself with an `ImportError`. This is because the functions can only see the code that has been "loaded" into the module. Each set of functions now exists as standalones within their module. +Even with the imports fixed, if you try and run some of these functions, you may find yourself with an `ImportError`. 
This is because the functions can only see the code that has been "loaded" into the module. Each set of functions now exists as stand-alones within their module. +*TODO: rephrase or clarify "stand-alones"? If we look at our original `functions.py` module, we will see that we had a number of import statements at the top of the file: ``` @@ -377,7 +388,7 @@ import numpy as np import matplotlib.pyplot as plt ``` {: .language-python} -These are modules that some of the functions need to run. Now that we have moved the functions into separate modules, we need to add in the import statements into each file where they are needed. Lets start by looking at `measure.py`. Looking through the functions, we can see that each of them has a reference to `np`, which is what +These are modules that are needed by some of the functions. Now that we have moved the functions into separate modules, we need to add the `import` statements into each file where they are needed. Let's start by looking at `measure.py`. Looking through the functions, we can see that each of them has a reference to `np`, which is what we imported `numpy` as in `functions.py`. Besides visual inspection, you could have also seen these missing imports by using `flake8` on the modules. @@ -387,9 +398,9 @@ $ flake8 measure.py ~~~ {: .language-bash} -You will see a message which says "undefined name np" +You will see a message which says "undefined name np". -In order to make these functions work again, we need to add the import statement +In order to make these functions work again, we need to add the following import statement. ``` import numpy as np ``` @@ -520,7 +531,7 @@ This will work, however, the main reason we broke up the modules within the `io` We can, of course, edit our `__init__.py` file to make this simpler. At this point, the way we actually do this import is going to be stylistic - how do you want people to interact with your package? 
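The effect of `__init__.py` on how users reach your functions can be tried out safely on a throwaway package. The sketch below is an illustration under stated assumptions: the package name `demo_pkg` and the stand-in `open_pdb` body are hypothetical and are not part of `molecool`. It writes a tiny package with an `io` subpackage to a temporary directory and imports it.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical throwaway package written to a temporary directory.
root = Path(tempfile.mkdtemp())
pkg = root / "demo_pkg"
(pkg / "io").mkdir(parents=True)

# A submodule holding one stand-in function.
(pkg / "io" / "pdb.py").write_text("def open_pdb():\n    return 'pdb opened'\n")
# The subpackage __init__.py re-exports the function with a relative import.
(pkg / "io" / "__init__.py").write_text("from .pdb import open_pdb\n")
# The top-level __init__.py exposes the subpackage and also the function itself.
(pkg / "__init__.py").write_text("from . import io\nfrom .io import open_pdb\n")

# Make the temporary directory importable, then import the package.
sys.path.insert(0, str(root))
demo_pkg = importlib.import_module("demo_pkg")

print(demo_pkg.io.open_pdb())  # reached through the subpackage
print(demo_pkg.open_pdb())     # reached directly from the top level
```

Both calls resolve to the same function; which spelling you expose is a design choice about how much of the internal layout users should have to know.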
-The goal we are going to go for is to call an IO function using: +The goal we are going to go for is to call an IO function using ~~~ molecool.io.IO_FUNCTION @@ -529,28 +540,28 @@ molecool.io.IO_FUNCTION where `IO_FUNCTION` is any function relating to IO. -Within the `io` directory, create a new file called `__init__.py`. Open that file within your desired editor and add the following two lines: +Within the `io` directory, create a new file called `__init__.py`. Open that file with your desired editor and add the following two lines. ``` from .pdb import open_pdb from .xyz import open_xyz, write_xyz ``` {: .language-python} -These lines are relative import statements to the functions within the `io` package. Think of them as pointers to the functions, i.e. when we look at the `io` package, it directs us to the location of the underlying functions, so we do not need to look within each submodule. This allows us to use the following import statement to our top level `__init__.py` to access the functions: +These lines are relative import statements to the functions within the `io` package. Think of them as pointers to the functions, i.e. when we look at the `io` package, it directs us to the location of the underlying functions, so we do not need to look within each submodule. This allows us to use the following `import` statement to our top level `__init__.py` to access the functions: ``` from . import io ``` {: .language-python} -We can now call our IO functions using our target syntax. +We can now call our I/O functions using our target syntax. ~~~ >>> molecool.io.open_pdb() ~~~ {: .language-python} -If we wanted the io functions to mimic the imports from the rest of the modules, we could modify our top level `__init__.py` file to reflect that. +If we wanted the I/O functions to mimic the imports from the rest of the modules, we could modify our top level `__init__.py` file to reflect that. 
~~~ from .functions import * @@ -562,9 +573,9 @@ from .io import open_pdb, open_xyz, write_xyz {: .language-python} -And we could even make these functions more accessible by removing the need for the `io` +We could even make these functions more accessible by removing the need for the `io` module. -Which would allow us to call functions by simply typing +This would allow us to call functions by simply typing. ~~~ >>> molecool.open_pdb() ~~~ diff --git a/setup.md b/setup.md index a1df19c1..f52be7a2 100644 --- a/setup.md +++ b/setup.md @@ -4,7 +4,7 @@ title: "Setup" This setup tutorial will walk you through installing the software you will need for this workshop. -For this workshop, you will need to have Python installed. We recommend and assume you will have Python installed using Anaconda, and will be talking about package management using conda and Anaconda (see instructions below). You will also need to download workshop materials, and configure git. +For this workshop, you will need to have Python installed. We recommend and assume you will have Python installed using Anaconda, and will be talking about package management using `conda` and Anaconda (see instructions below). You will also need to download workshop materials, and configure `git`. We will cover the following topics. Click on a particular topic to skip to that section. @@ -41,7 +41,7 @@ In this workshop, we will be moving code from a Jupyter notebook into a Python p ## Installing Python Using Anaconda -If you already have a Python 3 verison Anaconda or MiniConda installed, you can skip this step. +If you already have a Python 3 version of Anaconda or MiniConda installed, you can skip this step. [Python][python] is a popular language for scientific computing, and great for general-purpose programming as well. We recommend using Python with the conda package manager. 
@@ -51,19 +51,19 @@ The installer for Anaconda can be found on [this page](https://www.anaconda.com/ Throughout the rest of this set-up, we will assume that you are using the `conda` package manager. -Please set up your python environment at +Please set up your python environment as far in advance as possible. If you encounter problems with the -installation procedure, ask your workshop organizers via e-mail for assistance so +installation procedure, ask your workshop organizers via e-mail for assistance, so you are ready to go as soon as the workshop begins. ## Choosing a text editor -You will need an editor for Python files for this workshop. If you do not have a prefered text editor, we recommend [Visual Studio Code](https://code.visualstudio.com/). If you installed a recent version of Anaconda, you will have VSCode installed already. You should be able to open it from the Anaconda Navigator window. If you have another prefered text editor, you should use that for the workshop. +You will need an editor for Python files for this workshop. If you do not have a preferred text editor, we recommend [Visual Studio Code](https://code.visualstudio.com/). If you installed a recent version of Anaconda, you will have VSCode installed already. You should be able to open it from the Anaconda Navigator window. If you have another preferred text editor, you should use that for the workshop. ## Using Anaconda and conda Use of [Anaconda] with its package manager, `conda`, greatly simplifies package installation and environment management. -`conda` is a general package manager, meaning that it can install dependencies and packages in languages besides Python, unlike `pip` (which is Python's package manager). Both `pip` and `conda` can be used to install packages. +`conda` is a general package manager, meaning that it can install dependencies and packages in languages besides Python (unlike `pip`, which is Python's package manager). 
Both `pip` and `conda` can be used to install packages in a conda environment. ### Python environments A `conda` environment contains a specific collection of packages you have installed. This means that packages are isolated, and installed only for a specific environment -- you can have several environments each with different installed packages, or different versions of installed packages in different environments. @@ -88,9 +88,9 @@ $ conda activate molssi_best_practices ~~~ {: .language-bash} -Once you've activated an environment, the name of the environment will be in parenthesis at the front of your command line prompt. +Once you've activated an environment, the name of the environment will be in parentheses at the front of your command line prompt. -If you wanted to create an environment for testing your code in Python 3.5, for example, you could use the command (Do not execute this, it's just an example.) +If you wanted to create an environment for testing your code in Python 3.5, for example, you could use the following command. (Do not execute this, it's just an example.) ~~~ $ conda create --name molssi_35 python=3.5 @@ -99,7 +99,7 @@ $ conda create --name molssi_35 python=3.5 When this environment is activated, Python 3.5 will be used instead of Python 3.7. -To see a list of all your environments +To see a list of all your environments, ~~~ $ conda info --envs @@ -142,6 +142,8 @@ $ conda install numpy=1.15 ~~~ {: .language-bash} +*TODO: mention `-c` syntax and `conda-forge` before using below.* + For this workshop, you will need to install the following packages into your environment - NumPy - Matplotlib @@ -195,6 +197,7 @@ This helps repository maintainers coordinate the efforts of all the people who c Most importantly, it makes it easier to figure out who to blame when something goes wrong. 
You can provide git your name and contact information with the following commands: +*TODO: Since angle brackets appear in the commit log wrt standard email address representation, add real example for clarity.* ~~~ $ git config --global user.name " " $ git config --global user.email ""