Editorial pass on content (apache#29094)
rszper authored Oct 20, 2023
1 parent e7a6405 commit 6b92742
Showing 4 changed files with 82 additions and 63 deletions.

{{< paragraph class="language-py" >}}
***Note:*** `BigQuerySource()` is deprecated as of Beam SDK 2.25.0. Before 2.25.0, to read from
a BigQuery table using the Beam SDK, apply a `Read` transform on a `BigQuerySource`. For example,
`beam.io.Read(beam.io.BigQuerySource(table_spec))`.
{{< /paragraph >}}
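{{< paragraph class="language-py" >}}
As a minimal sketch of the replacement pattern (the `table_spec` value is a placeholder), the non-deprecated `ReadFromBigQuery` transform is applied directly:
{{< /paragraph >}}

{{< highlight py >}}
# Sketch only: read table rows with ReadFromBigQuery instead of the
# deprecated BigQuerySource. table_spec is a placeholder such as
# 'project:dataset.table_name'.
import apache_beam as beam

table_spec = 'project:dataset.table_name'

with beam.Pipeline() as pipeline:
    rows = pipeline | 'ReadTable' >> beam.io.ReadFromBigQuery(table=table_spec)
{{< /highlight >}}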

When you apply a write transform, you must provide the following information
for the destination table(s):

* The destination table's create disposition. The create disposition specifies
  whether the destination table must exist or can be created by the write
  operation.
* The destination table's write disposition. The write disposition specifies
whether the data you write replaces an existing table, appends rows to an
existing table, or writes only to an empty table.

In addition, if your write operation creates a new BigQuery table, you must also
supply a table schema for the destination table.
To create a table schema in Java, you can either use a `TableSchema` object, or
use a string that contains a JSON-serialized `TableSchema` object.
To create a table schema in Python, you can either use a `TableSchema` object,
or use a string that defines a list of fields. Single string based schemas do
not support nested fields, repeated fields, or specifying a BigQuery mode for
fields (the mode is always set to `NULLABLE`).
{{< /paragraph >}}
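{{< paragraph class="language-py" >}}
For example, the following sketch defines a two-field table as a single string; the field names are illustrative:
{{< /paragraph >}}

{{< highlight py >}}
# Each field is written as 'name:TYPE'. With a string schema, every field's
# mode is NULLABLE.
table_schema = 'source:STRING, quote:STRING'
{{< /highlight >}}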

#### Using a TableSchema
To create and use a table schema as a `TableSchema` object, follow these steps.

1. Create a `TableSchema` object.

2. Create and append a `TableFieldSchema` object for each field in your table.

3. Use the `schema` parameter to provide your table schema when you apply
a write transform. Set the parameter’s value to the `TableSchema` object.
{{< /paragraph >}}
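{{< paragraph class="language-py" >}}
The following sketch shows these steps in Python, assuming the `bigquery` helper classes from `apache_beam.io.gcp.internal.clients` and an illustrative two-field table:
{{< /paragraph >}}

{{< highlight py >}}
from apache_beam.io.gcp.internal.clients import bigquery

# Step 1: Create the TableSchema object.
table_schema = bigquery.TableSchema()

# Step 2: Create and append a TableFieldSchema object for each field.
source_field = bigquery.TableFieldSchema()
source_field.name = 'source'
source_field.type = 'STRING'
source_field.mode = 'NULLABLE'
table_schema.fields.append(source_field)

quote_field = bigquery.TableFieldSchema()
quote_field.name = 'quote'
quote_field.type = 'STRING'
quote_field.mode = 'REQUIRED'
table_schema.fields.append(quote_field)

# Step 3: Pass the schema when you apply the write transform, for example:
# beam.io.WriteToBigQuery(table_spec, schema=table_schema)
{{< /highlight >}}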

The following examples use this `PCollection` that contains quotes.
The `writeTableRows` method writes a `PCollection` of BigQuery `TableRow`
objects to a BigQuery table. Each element in the `PCollection` represents a
single row in the table. This example uses `writeTableRows` to write elements of a
`PCollection<TableRow>`. The write operation creates a table if needed. If the
table already exists, it is replaced.
{{< /paragraph >}}

{{< highlight java >}}
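// A sketch standing in for the snippet elided from this diff: write the
// quotes PCollection<TableRow>, creating the table if needed and replacing
// it if it already exists. tableSpec and tableSchema are placeholders.
quotes.apply(
    BigQueryIO.writeTableRows()
        .to(tableSpec)
        .withSchema(tableSchema)
        .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
        .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_TRUNCATE));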
{{< /highlight >}}
{{< paragraph class="language-py" >}}
The following example code shows how to apply a `WriteToBigQuery` transform to
write a `PCollection` of dictionaries to a BigQuery table. The write operation
creates a table if needed. If the table already exists, it is replaced.
{{< /paragraph >}}

{{< highlight py >}}
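# A sketch standing in for the snippet elided from this diff: write a
# PCollection of dictionaries, creating the table if needed and replacing it
# if it already exists. table_spec and table_schema are placeholders.
quotes | beam.io.WriteToBigQuery(
    table_spec,
    schema=table_schema,
    write_disposition=beam.io.BigQueryDisposition.WRITE_TRUNCATE,
    create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED)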
{{< /highlight >}}

{{< paragraph class="language-java" >}}
The `write` transform writes a `PCollection` of custom typed objects to a BigQuery
table. Use `.withFormatFunction(SerializableFunction)` to provide a formatting
function that converts each input element in the `PCollection` into a
`TableRow`. This example uses `write` to write a `PCollection<String>`. The
write operation creates a table if needed. If the table already exists, it is
replaced.
{{< /paragraph >}}

{{< highlight java >}}
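// A sketch standing in for the snippet elided from this diff: write a
// PCollection<String>, formatting each element into a TableRow. The
// quoteStrings collection and the field values are illustrative.
quoteStrings.apply(
    BigQueryIO.<String>write()
        .to(tableSpec)
        .withSchema(tableSchema)
        .withFormatFunction(
            (String quote) ->
                new TableRow().set("source", "manual").set("quote", quote))
        .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
        .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_TRUNCATE));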
{{< /highlight >}}

{{< paragraph class="language-py" >}}
BigQuery Storage Write API for Python SDK currently has some limitations on supported data types.
{{< /paragraph >}}

{{< paragraph class="language-py" >}}
**Note:** If you want to run WriteToBigQuery with Storage Write API from the source code, you need to run `./gradlew :sdks:java:io:google-cloud-platform:expansion-service:build` to build the expansion-service jar. If you are running from a released Beam SDK, the jar is already included.

**Note:** Auto sharding is not currently supported for Python's Storage Write API exactly-once mode on DataflowRunner.

Similar to streaming inserts, `STORAGE_WRITE_API` supports dynamically determining
the number of parallel streams to write to BigQuery (starting with Beam 2.42.0). You can
explicitly enable this using [`withAutoSharding`](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.Write.html#withAutoSharding--).

`STORAGE_WRITE_API` defaults to dynamic sharding when
`numStorageWriteApiStreams` is set to 0 or is unspecified.

***Note:*** Auto sharding with `STORAGE_WRITE_API` is supported by Dataflow, but **not** on Runner v2.
{{< /paragraph >}}
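For example, the following sketch opts in to auto sharding for an unbounded input; `tableSpec` and `tableSchema` are placeholders, and the triggering frequency is illustrative:

{{< highlight java >}}
// Storage Write API with dynamic sharding: leave numStorageWriteApiStreams
// unset (or set to 0) and enable auto sharding explicitly.
// Duration is org.joda.time.Duration.
quotes.apply(
    BigQueryIO.writeTableRows()
        .to(tableSpec)
        .withSchema(tableSchema)
        .withMethod(BigQueryIO.Write.Method.STORAGE_WRITE_API)
        .withTriggeringFrequency(Duration.standardSeconds(5))
        .withAutoSharding());
{{< /highlight >}}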

When using `STORAGE_WRITE_API`, the `PCollection` returned by
[`WriteResult.getFailedStorageApiInserts`](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/gcp/bigquery/WriteResult.html#getFailedStorageApiInserts--)
contains the rows that failed to be written to the Storage Write API sink.
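For instance, the following sketch turns the failed rows into strings for inspection. It assumes the `getRow` and `getErrorMessage` accessors on `BigQueryStorageApiInsertError`, and the step name is illustrative:

{{< highlight java >}}
WriteResult result =
    quotes.apply(
        BigQueryIO.writeTableRows()
            .to(tableSpec)
            .withSchema(tableSchema)
            .withMethod(BigQueryIO.Write.Method.STORAGE_WRITE_API));

// Inspect the rows that the Storage Write API sink could not write.
result
    .getFailedStorageApiInserts()
    .apply(
        "StringifyFailedRows",
        MapElements.into(TypeDescriptors.strings())
            .via(error -> error.getRow() + ": " + error.getErrorMessage()));
{{< /highlight >}}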

#### At-least-once semantics

If your use case allows for potential duplicate records in the target table, you
can use the
[`STORAGE_API_AT_LEAST_ONCE`](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.Write.Method.html#STORAGE_API_AT_LEAST_ONCE)
method. This method doesn’t persist the records to be written to
BigQuery into its shuffle storage, which is needed to provide the exactly-once semantics
of the `STORAGE_WRITE_API` method. Therefore, for most pipelines, using this method is often
less expensive and results in lower latency.
If you use `STORAGE_API_AT_LEAST_ONCE`, you don’t need to
specify the number of streams, and you can’t specify the triggering frequency.
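A minimal sketch of selecting this method, with the same placeholder `tableSpec` and `tableSchema` as above:

{{< highlight java >}}
quotes.apply(
    BigQueryIO.writeTableRows()
        .to(tableSpec)
        .withSchema(tableSchema)
        .withMethod(BigQueryIO.Write.Method.STORAGE_API_AT_LEAST_ONCE));
{{< /highlight >}}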

Auto sharding is not applicable for `STORAGE_API_AT_LEAST_ONCE`.

When using `STORAGE_API_AT_LEAST_ONCE`, the `PCollection` returned by
[`WriteResult.getFailedStorageApiInserts`](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/gcp/bigquery/WriteResult.html#getFailedStorageApiInserts--)
contains the rows that failed to be written to the Storage Write API sink.

#### Quotas


# Unrecoverable errors in Beam Python

Unrecoverable errors are issues that occur at job start-up time and
prevent jobs from ever running successfully. The problem usually stems
from a misconfiguration. This page provides context about
common errors and troubleshooting information.

## Job submission or Python runtime version mismatch {#python-version-mismatch}

If the Python version that you use to submit your job doesn't match the
Python version used to build the worker container, the job doesn't run.
The job fails immediately after job submission.

To resolve this issue, ensure that the Python version used to submit the job
matches the Python container version.
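For example, the following sketch prints the version used in the submission environment so that you can compare it with the Python version in the worker container image:

```python
# Print the major.minor Python version used to submit the job. It must match
# the Python version in the worker container.
import sys

print(f"Submission Python: {sys.version_info.major}.{sys.version_info.minor}")
```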

## Dependency resolution failures with pip {#dependency-resolution-failures}

During worker start-up, the worker might fail and, depending on the
runner, try to restart.

Before workers accept work, dependencies are checked and installed in
the worker container. If a pipeline requires
dependencies not already present in the runtime environment,
they are installed at this time.
When a problem occurs during this process, you might encounter
dependency resolution failures.

Examples of problems include the following:

- A dependency version can't be found.
- A worker can't connect to PyPI.

To resolve this issue, before submitting your job, ensure that the
dependency versions provided in your `requirements.txt` file exist
and that you can install them locally.
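For example, the following sketch (not an official Beam tool) reproduces the worker's dependency installation locally:

```python
# Install the same requirements file the pipeline uses. A failure here
# usually reproduces the worker start-up failure.
import subprocess
import sys

subprocess.check_call(
    [sys.executable, "-m", "pip", "install", "-r", "requirements.txt"]
)
```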

## Dependency version mismatches {#dependency-version}

When your pipeline has dependency version mismatches, you might
see `ModuleNotFoundError` or `AttributeError` messages.

- The `ModuleNotFoundError` errors occur when additional dependencies,
such as `torch` and `transformers`, are neither specified in a
`requirements_file` nor preinstalled in a custom container.
In this case, the worker might fail to deserialize (unpickle) the user code.

- Your pipeline might have `AttributeError` messages when dependencies
are installed but their versions don't match the versions in the submission environment.

To resolve these problems, ensure that the required dependencies and their versions are the same
at runtime and in the submission environment. To help you identify these issues,
in Apache Beam 2.52.0 and later versions, debug logs specify the dependencies at both stages.
For more information, see
[Control the dependencies the pipeline uses](https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/#control-dependencies).
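For example, the following sketch supplies dependencies through a requirements file so that workers install the same versions used at submission; the file name and pipeline contents are placeholders:

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# requirements.txt pins the extra dependencies, such as torch and
# transformers, that are not in the default worker environment.
options = PipelineOptions(requirements_file="requirements.txt")

with beam.Pipeline(options=options) as pipeline:
    _ = pipeline | beam.Create(["hello"]) | beam.Map(print)
```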
`website/www/site/content/en/get-started/_index.md`

# Get Started with Apache Beam

Learn how to use Beam to create data processing pipelines that run on supported processing back-ends.

## Tour of Beam

[Learn Beam with an interactive tour](https://tour.beam.apache.org).
Topics include core Beam concepts, from simple to advanced.
You can try examples, do exercises, and solve challenges along the learning journey.

## Beam Overview

Read the [Apache Beam Overview](/get-started/beam-overview) to learn about the Beam model,
the currently available Beam SDKs and runners, and Beam's native I/O connectors.

## Quickstarts for Java, Python, Go, and TypeScript

## Example Walkthroughs

See detailed walkthroughs of complete Beam pipelines.
- [WordCount](/get-started/wordcount-example): Simple example pipelines that demonstrate basic Beam programming, including debugging and testing
- [Mobile Gaming](/get-started/mobile-gaming-example): A series of more advanced pipelines that demonstrate use cases in the mobile gaming domain

## Downloads and Releases

Find download links and information about the latest Beam releases, including versioning and release notes,
on the [Apache Beam Downloads](/get-started/downloads) page.

## Support

- Find resources to help you use Beam, such as mailing lists and issue tracking,
on the [Support](/get-started/support) page.
- Ask questions and discuss topics on
[Stack Overflow](https://stackoverflow.com/questions/tagged/apache-beam)
or in the Beam [Slack Channel](https://apachebeam.slack.com).
`website/www/site/content/en/get-started/downloads.md`

# Apache Beam<sup>®</sup> Downloads

> Beam SDK {{< param release_latest >}} is the latest released version.
