From f391565d6f5f48fe3110682d8434ed823be95d67 Mon Sep 17 00:00:00 2001 From: "weiyang.god" Date: Tue, 15 Mar 2022 17:16:25 +0800 Subject: [PATCH 01/49] modify readme of examples --- examples/inference/python/README.md | 124 ++++-------------- .../fairseq/ls_fs_transformer_export.py | 2 +- .../fairseq/ls_fs_transformer_ptq_export.py | 5 +- .../ls_torch_fs_quant_transformer_export.py | 3 +- .../fairseq/ls_torch_fs_transformer_export.py | 3 +- .../ls_torch_fs_transformer_ptq_export.py | 3 +- .../native_fs_transformer_ptq_export.py | 2 +- .../python/export/ls_transformer_export.py | 68 ++++------ .../export/ls_transformer_ptq_export.py | 24 +++- examples/inference/python/test/ls_fairseq.sh | 2 +- examples/training/fairseq/README.md | 12 +- 11 files changed, 88 insertions(+), 160 deletions(-) diff --git a/examples/inference/python/README.md b/examples/inference/python/README.md index a2ea4130..6d52236b 100644 --- a/examples/inference/python/README.md +++ b/examples/inference/python/README.md @@ -1,105 +1,32 @@ -# Examples of exporting models for LightSeq inference +# Model export and LightSeq inference +This repo contains examples of exporting models (LightSeq, Fairseq based, Hugging Face, etc.) to protobuf/hdf5 format, and then use LightSeq for fast inference. For each model, we provide normal float model export, quantized model export (QAT, quantization aware training) and PTQ (post training quantization) model export. -## Switch to the current directory +Before doing anything, you need to switch to the current directory: ```shell cd examples/inference/python ``` -## Export models -### Hugging Face -1. Hugging Face BART - -Export Hugging Face BART models to protobuf/hdf5 format. -```shell -python export/huggingface/hf_bart_export.py -``` -2. Hugging Face BERT - -Export Hugging Face BERT models to hdf5 format. -```shell -python export/huggingface/hf_bert_export.py -``` -3. Hugging Face GPT2 - -Export Hugging Face GPT2 models to hdf5 format. -```shell -python export/huggingface/hf_gpt2_export.py -``` -### Native Fairseq -1. Native Fairseq Transformer - -Export native Fairseq Transformer models to protobuf/hdf5 format. Refer to the `examples/training/fairseq` directory for more training details. -```shell -python export/fairseq/native_fs_transformer_export.py -m checkpoint_best.pt -``` - -2. Native Fairseq Transformer using PTQ - -Export native Fairseq Transformer models using PTQ to protobuf/hdf5 format. Refer to the `examples/training/fairseq` directory for more training details. -```shell -python export/fairseq/native_fs_transformer_export.py -m checkpoint_best.pt -``` - -3. Native Fairseq MoE Transformer - -Export Fairseq MoE models to protobuf/hdf5 format. -```shell -python export/fairseq/fs_moe_export.py -``` - -### Fairseq Transformer + LightSeq -1. Fairseq Transformer using LightSeq training library - -Export Fairseq Transformer models training with LightSeq to protobuf/hdf5 format. Refer to the `examples/training/fairseq` directory for more training details. -```shell -python export/fairseq/ls_fs_transformer_export.py -m checkpoint_best.pt -``` - -2. Fairseq Transformer using LightSeq training library with PTQ - -Export Fairseq Transformer models training with LightSeq to protobuf format, and then using PTQ to speedup inference. Refer to the `examples/training/fairseq` directory for more training details. -```shell -python export/fairseq/ls_fs_transformer_ptq_export.py -m checkpoint_best.pt -``` - -### LightSeq Transformer - -1. 
LightSeq Transformer - -Export LightSeq Transformer models to protobuf/hdf5 format. Refer to the `examples/training/custom` directory for more training details. -```shell -python export/ls_transformer_export.py -``` -2. LightSeq Transformer using PTQ - -Export LightSeq fp16/fp32 Transformer models to int8 protobuf format, and then using PTQ to speedup inference. Refer to the `examples/training/custom` directory for more training details. Note that in this example, we do not need to finetune the models using fake-quantization. -```shell -python export/ls_transformer_ptq_export.py -``` - -### Fairseq Transformer + custom Torch layers -1. Fairseq Transformer using custom Torch layers - -Export Fairseq Transformer models training using custom Torch layers to protobuf/hdf5 format. Refer to the `examples/training/fairseq` directory for more training details. -```shell -python export/fairseq/ls_torch_fs_transformer_export.py -m checkpoint_best.pt -``` - -2. Fairseq Transformer using custom Torch layers and PTQ - -Export PTQ Fairseq Transformer models training using custom Torch layers to protobuf/hdf5 format. Refer to the `examples/training/fairseq` directory for more training details. -```shell -python export/fairseq/ls_torch_fs_transformer_ptq_export.py -m checkpoint_best.pt -``` - -3. Quantized Fairseq Transformer using custom Torch layers - -Export quantized Fairseq Transformer models training using custom Torch layers to protobuf/hdf5 format. Refer to the `examples/training/fairseq` directory for more training details. -```shell -python export/fairseq/ls_torch_fs_quant_transformer_export.py -m checkpoint_best.pt -``` - -## Inference using LightSeq +## Model export +We provide the following export examples. All Fairseq based models are trained using the scripts in [examples/training/fairseq](../../../examples/training/fairseq). The first two LightSeq Transformer models are trained using the scripts in [examples/training/custom](../../../examples/training/custom). + +| Model | Type | Command | Resource | Description | +|--------------------------------------|-------|-------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------| +| LightSeq Transformer | Float | python export/ls_transformer_export.py -m ckpt_ls_custom.pt | http://lf3-nlp-opensource.bytetos.com/obj/nlp-opensource/lightseq/example_model/ckpt_ls_custom.pt | Export LightSeq Transformer models to protobuf format. | +| LightSeq Transformer + PTQ | Int8 | python export/ls_transformer_ptq_export.py -m ckpt_ls_custom.pt | http://lf3-nlp-opensource.bytetos.com/obj/nlp-opensource/lightseq/example_model/ckpt_ls_custom.pt | Export LightSeq Transformer models to int8 protobuf format using post training quantization. | +| Hugging Face BART | Float | python export/huggingface/hf_bart_export.py | / | Export Hugging Face BART models to protobuf/hdf5 format. | +| Hugging Face BERT | Float | python export/huggingface/hf_bert_export.py | / | Export Hugging Face BERT models to hdf5 format. | +| Hugging Face GPT2 | Float | python export/huggingface/hf_gpt2_export.py | / | Export Hugging Face GPT2 models to hdf5 format. 
| +| Native Fairseq Transformer | Float | python export/fairseq/native_fs_transformer_export.py -m ckpt_native_fairseq_31.06.pt | http://lf3-nlp-opensource.bytetos.com/obj/nlp-opensource/lightseq/example_model/fairseq/ckpt_native_fairseq_31.06.pt | Export native Fairseq Transformer models to protobuf/hdf5 format. | +| Native Fairseq Transformer + PTQ | Int8 | python export/fairseq/native_fs_transformer_export.py -m ckpt_native_fairseq_31.06.pt | http://lf3-nlp-opensource.bytetos.com/obj/nlp-opensource/lightseq/example_model/fairseq/ckpt_native_fairseq_31.06.pt | Export native Fairseq Transformer models to int8 protobuf format using post training quantization. | +| Fairseq + LightSeq Transformer | Float | python export/fairseq/ls_fs_transformer_export.py -m ckpt_ls_fairseq_31.17.pt | http://lf3-nlp-opensource.bytetos.com/obj/nlp-opensource/lightseq/example_model/fairseq/ckpt_ls_fairseq_31.17.pt | Export Fairseq Transformer models training with LightSeq modules to protobuf/hdf5 format. | +| Fairseq + LightSeq Transformer + PTQ | Int8 | python export/fairseq/ls_fs_transformer_ptq_export.py -m ckpt_ls_fairseq_31.17.pt | http://lf3-nlp-opensource.bytetos.com/obj/nlp-opensource/lightseq/example_model/fairseq/ckpt_ls_fairseq_31.17.pt | Export Fairseq Transformer models training with LightSeq modules to int8 protobuf format using post training quantization. | +| Fairseq + custom Torch layer | Float | python export/fairseq/ls_torch_fs_transformer_export.py -m ckpt_ls_torch_fairseq_31.16.pt | http://lf3-nlp-opensource.bytetos.com/obj/nlp-opensource/lightseq/example_model/fairseq/ckpt_ls_torch_fairseq_31.16.pt | Export Fairseq Transformer models training with custom Torch layers and other LightSeq modules to protobuf format. | +| Fairseq + custom Torch layer + PTQ | Int8 | python export/fairseq/ls_torch_fs_transformer_ptq_export.py -m ckpt_ls_torch_fairseq_31.16.pt | http://lf3-nlp-opensource.bytetos.com/obj/nlp-opensource/lightseq/example_model/fairseq/ckpt_ls_torch_fairseq_31.16.pt | Export Fairseq Transformer models training with custom Torch layers and other LightSeq modules to int8 protobuf format using post training quantization. | +| Fairseq + custom Torch layer + QAT | Int8 | python export/fairseq/ls_torch_fs_quant_transformer_export.py -m ckpt_ls_torch_fairseq_quant_31.09.pt | http://lf3-nlp-opensource.bytetos.com/obj/nlp-opensource/lightseq/example_model/fairseq/ckpt_ls_torch_fairseq_quant_31.09.pt | Export quantized Fairseq Transformer models training with custom Torch layers and other LightSeq modules to int8 protobuf format. | +| Native Fairseq MoE Transformer | Float | python export/fairseq/native_fs_moe_transformer_export.py | / | Export Fairseq MoE Transformer models to protobuf/hdf5 format. | + +## LightSeq inference +### Hugging Face models 1. BART ```shell python test/ls_bart.py @@ -113,7 +40,8 @@ python test/ls_bert.py python test/ls_gpt2.py ``` -4. 
Fairseq based models using LightSeq inference
+### Fairseq based models
+After exporting the Fairseq based models to protobuf/hdf5 format using the above scripts, we can use the following script for fast LightSeq inference on the wmt14 en2de dataset, compatible with both fp16 and int8 models:
 ```shell
 bash test/ls_fairseq.sh --model ${model_path}
 ```
diff --git a/examples/inference/python/export/fairseq/ls_fs_transformer_export.py b/examples/inference/python/export/fairseq/ls_fs_transformer_export.py
index 5993f79a..ff4b7704 100644
--- a/examples/inference/python/export/fairseq/ls_fs_transformer_export.py
+++ b/examples/inference/python/export/fairseq/ls_fs_transformer_export.py
@@ -1,5 +1,5 @@
 """
-Export Fairseq Transformer models training with LightSeq to protobuf/hdf5 format.
+Export Fairseq Transformer models training with LightSeq modules to protobuf/hdf5 format.
 Refer to the `examples/training/fairseq` directory for more training details.
 """
 import argparse
diff --git a/examples/inference/python/export/fairseq/ls_fs_transformer_ptq_export.py b/examples/inference/python/export/fairseq/ls_fs_transformer_ptq_export.py
index 2aeeba23..98c28ca7 100644
--- a/examples/inference/python/export/fairseq/ls_fs_transformer_ptq_export.py
+++ b/examples/inference/python/export/fairseq/ls_fs_transformer_ptq_export.py
@@ -1,11 +1,10 @@
 """
-Export Fairseq Transformer models training with LightSeq to protobuf format,
-and then using int8 quantization to speedup inference.
+Export Fairseq Transformer models training with LightSeq modules
+to int8 protobuf format using post training quantization.
 Refer to the `examples/training/fairseq` directory for more training details.
 """
 import argparse
 import torch
-import h5py
 from export.proto.quant_transformer_pb2 import QuantTransformer
 from lightseq.training import (
     export_ls_config,
diff --git a/examples/inference/python/export/fairseq/ls_torch_fs_quant_transformer_export.py b/examples/inference/python/export/fairseq/ls_torch_fs_quant_transformer_export.py
index 6a05cecb..3a5702a7 100644
--- a/examples/inference/python/export/fairseq/ls_torch_fs_quant_transformer_export.py
+++ b/examples/inference/python/export/fairseq/ls_torch_fs_quant_transformer_export.py
@@ -1,5 +1,6 @@
 """
-Export quantized Fairseq Transformer models training using custom Torch layers to protobuf/hdf5 format.
+Export quantized Fairseq Transformer models training with custom Torch layers
+and other LightSeq modules to int8 protobuf format.
 Refer to the `examples/training/fairseq` directory for more training details.
 """
 from collections import OrderedDict
diff --git a/examples/inference/python/export/fairseq/ls_torch_fs_transformer_export.py b/examples/inference/python/export/fairseq/ls_torch_fs_transformer_export.py
index 4f9d8267..37373098 100644
--- a/examples/inference/python/export/fairseq/ls_torch_fs_transformer_export.py
+++ b/examples/inference/python/export/fairseq/ls_torch_fs_transformer_export.py
@@ -1,5 +1,6 @@
 """
-Export Fairseq Transformer models training using custom Torch layers to protobuf/hdf5 format.
+Export Fairseq Transformer models training with custom Torch layers
+and other LightSeq modules to protobuf format.
 Refer to the `examples/training/fairseq` directory for more training details.
""" from collections import OrderedDict diff --git a/examples/inference/python/export/fairseq/ls_torch_fs_transformer_ptq_export.py b/examples/inference/python/export/fairseq/ls_torch_fs_transformer_ptq_export.py index c6498893..9e706409 100644 --- a/examples/inference/python/export/fairseq/ls_torch_fs_transformer_ptq_export.py +++ b/examples/inference/python/export/fairseq/ls_torch_fs_transformer_ptq_export.py @@ -1,5 +1,6 @@ """ -Export PTQ Fairseq Transformer models training using custom Torch layers to protobuf/hdf5 format. +Export Fairseq Transformer models training with custom Torch layers +and other LightSeq modules to int8 protobuf format using post training quantization. Refer to the `examples/training/fairseq` directory for more training details. """ from collections import OrderedDict diff --git a/examples/inference/python/export/fairseq/native_fs_transformer_ptq_export.py b/examples/inference/python/export/fairseq/native_fs_transformer_ptq_export.py index 446605f9..7d9d7b1d 100644 --- a/examples/inference/python/export/fairseq/native_fs_transformer_ptq_export.py +++ b/examples/inference/python/export/fairseq/native_fs_transformer_ptq_export.py @@ -1,5 +1,5 @@ """ -Export PTQ native Fairseq Transformer models to protobuf/hdf5 format. +Export native Fairseq Transformer models to int8 protobuf format using post training quantization. Refer to the `examples/training/fairseq` directory for more training details. """ from collections import OrderedDict diff --git a/examples/inference/python/export/ls_transformer_export.py b/examples/inference/python/export/ls_transformer_export.py index 4f549e81..49b50820 100644 --- a/examples/inference/python/export/ls_transformer_export.py +++ b/examples/inference/python/export/ls_transformer_export.py @@ -1,7 +1,8 @@ """ -Export LightSeq Transformer models to protobuf/hdf5 format. +Export LightSeq Transformer models to protobuf format. Refer to the `examples/training/custom` directory for more training details. """ +import argparse import time import numpy as np import torch @@ -142,7 +143,7 @@ def create_data(): ) -def create_model(vocab_size): +def create_config(vocab_size): transformer_config = LSTransformer.get_config( model="transformer-base", max_batch_tokens=2048, @@ -154,29 +155,7 @@ def create_model(vocab_size): fp16=True, local_rank=0, ) - model = LSTransformer(transformer_config) - model.to(dtype=torch.half, device=torch.device("cuda:0")) - return model - - -def ls_train_predict(ls_train_model, src_tokens, trg_tokens, batch_size): - """ - NOTE: We do not use beam search here for implementation simplicity. 
- """ - torch.cuda.synchronize() - start_time = time.perf_counter() - encoder_out, encoder_padding_mask = ls_train_model.encoder(src_tokens) - predict_tokens = trg_tokens[:, :1] - cache = {} - for _ in range(trg_seq_len - 1): - output = ls_train_model.decoder( - predict_tokens[:, -1:], encoder_out, encoder_padding_mask, cache - ) - output = torch.reshape(torch.argmax(output, dim=-1), (batch_size, -1)) - predict_tokens = torch.cat([predict_tokens, output], dim=-1) - torch.cuda.synchronize() - end_time = time.perf_counter() - return predict_tokens, end_time - start_time + return transformer_config def ls_predict(ls_infer_model, src_tokens): @@ -188,6 +167,19 @@ def ls_predict(ls_infer_model, src_tokens): return ls_output, end_time - start_time +def parse_args(): + parser = argparse.ArgumentParser(description="export LightSeq checkpoint", usage="") + parser.add_argument( + "--model", + "-m", + type=str, + default="checkpoint_best.pt", + help="path of LightSeq checkpoint", + ) + args = parser.parse_args() + return args + + if __name__ == "__main__": ( tokenizer, @@ -205,34 +197,23 @@ def ls_predict(ls_infer_model, src_tokens): trg_seq_len, ) = create_data() - ckpt_path = "checkpoint.pt" - pb_path = "transformer.pb" + args = parse_args() + model_name = ".".join(args.model.split(".")[:-1]) + pb_path = f"{model_name}.pb" - with open(ckpt_path, "rb") as fin: + with open(args.model, "rb") as fin: state_dict = torch.load(fin, map_location=torch.device("cpu")) - ls_train_model = create_model(vocab_size) - ls_train_model.load_state_dict(state_dict) - ls_train_model.eval() - print("torch model loaded.") + config = create_config(vocab_size) - export_pb(state_dict, pb_path, pad_id, start_id, end_id, ls_train_model.config) + export_pb(state_dict, pb_path, pad_id, start_id, end_id, config) ls_infer_model = lsi.Transformer(pb_path, 8) src_tokens_np = np.array(src_tokens.cpu()) print("========================WARM UP========================") - ls_train_predict(ls_train_model, src_tokens, trg_tokens, batch_size) ls_predict(ls_infer_model, src_tokens_np) - print("========================TORCH TEST========================") - predict_tokens, ls_train_time = ls_train_predict( - ls_train_model, src_tokens, trg_tokens, batch_size - ) - mask = torch.cumsum(torch.eq(predict_tokens, end_id).int(), dim=1) - predict_tokens = predict_tokens.masked_fill(mask > 0, end_id) - predict_text = tokenizer.batch_decode(predict_tokens, skip_special_tokens=True) - print("========================LIGHTSEQ TEST========================") ls_output, ls_time = ls_predict(ls_infer_model, src_tokens_np) ls_output = [ids[0] for ids in ls_output[0]] @@ -242,9 +223,6 @@ def ls_predict(ls_infer_model, src_tokens): print("\n".join(src_text)) print(">>>>> target text") print("\n".join(trg_text)) - print(">>>>> lightseq (train) predict text") - print("\n".join(predict_text)) print(">>>>> lightseq (infer) predict text") print("\n".join(ls_predict_text)) - print("lightseq (train) predict time: {}ms".format(ls_train_time * 1000)) print("lightseq (infer) predict time: {}ms".format(ls_time * 1000)) diff --git a/examples/inference/python/export/ls_transformer_ptq_export.py b/examples/inference/python/export/ls_transformer_ptq_export.py index ac4c77b0..6d0e1471 100644 --- a/examples/inference/python/export/ls_transformer_ptq_export.py +++ b/examples/inference/python/export/ls_transformer_ptq_export.py @@ -1,8 +1,8 @@ """ -Export LightSeq fp16/fp32 Transformer models to int8 protobuf format, -and then using int8 quantization to speedup inference. 
+Export LightSeq Transformer models to int8 protobuf format using post training quantization.
 Refer to the `examples/training/custom` directory for more training details.
 """
+import argparse
 import time
 import numpy as np
 import torch
@@ -183,6 +183,19 @@ def ls_predict(ls_infer_model, src_tokens):
     return ls_output, end_time - start_time


+def parse_args():
+    parser = argparse.ArgumentParser(description="export LightSeq checkpoint", usage="")
+    parser.add_argument(
+        "--model",
+        "-m",
+        type=str,
+        default="checkpoint_best.pt",
+        help="path of LightSeq checkpoint",
+    )
+    args = parser.parse_args()
+    return args
+
+
 if __name__ == "__main__":
     (
         tokenizer,
@@ -200,10 +213,11 @@ def ls_predict(ls_infer_model, src_tokens):
         trg_seq_len,
     ) = create_data()

-    ckpt_path = "checkpoint.pt"
-    pb_path = "quant_transformer.pb"
+    args = parse_args()
+    model_name = ".".join(args.model.split(".")[:-1])
+    pb_path = f"{model_name}_ptq.pb"

-    with open(ckpt_path, "rb") as fin:
+    with open(args.model, "rb") as fin:
         state_dict = torch.load(fin, map_location=torch.device("cpu"))

     config = create_config(vocab_size)
diff --git a/examples/inference/python/test/ls_fairseq.sh b/examples/inference/python/test/ls_fairseq.sh
index bf6b4d75..9ff1a6d7 100644
--- a/examples/inference/python/test/ls_fairseq.sh
+++ b/examples/inference/python/test/ls_fairseq.sh
@@ -3,7 +3,7 @@
 until [[ -z "$1" ]]
 do
     case $1 in
-        --model)
+        -m)
            shift; MODEL=$1; shift;;
        *)
diff --git a/examples/training/fairseq/README.md b/examples/training/fairseq/README.md
index 623ad511..221bd0dd 100644
--- a/examples/training/fairseq/README.md
+++ b/examples/training/fairseq/README.md
@@ -1,5 +1,5 @@
 # LightSeq for Fairseq
-This repo contains an example for how to use LightSeq to accerate the training of translation task in [Fairseq](https://github.com/pytorch/fairseq).
+This repo contains examples of how to use LightSeq to accelerate the training of translation tasks in [Fairseq](https://github.com/pytorch/fairseq).

 First you should install these requirements.
 ```shell
 pip install lightseq fairseq sacremoses
 ```

 ## Train
-Then you can train a translation task on wmt14 en2de dataset by running the following script:
+Then you can train a translation task on the wmt14 en2de dataset using LightSeq by running the following script:
 ```shell
 sh examples/training/fairseq/ls_fairseq_wmt14en2de.sh
 ```

 Or you can use LightSeq modules like `--arch ls_transformer_wmt_en_de_big_t2t`,
 by adding `--user-dir=${LIGHTSEQ_DIR}/lightseq/training/cli/fs_modules` to `fairseq-train`.

+You can use `--use-torch-layer` to replace LightSeq layers with custom Torch layers based on native Fairseq layers.
+
+You can use `--enable-quant` and `--quant-mode qat` to run quantization aware training for subsequent LightSeq fast int8 inference.
+
 This script first downloads the dataset and then runs the native Fairseq training script
 using the optimized model and optimizer.
 The `lightseq-train` command is just an easy-to-use wrapper of `fairseq-train` that adds LightSeq to `--user-dir`.

+We also provide other training scripts to support custom Torch layers and quantization. All model files have been publicly released. **Refer to [examples/inference/python/README.md](../../../examples/inference/python/README.md) for more training, export and inference details.**
+
 LightSeq can achieve about 1.47x speedup using batch size 4096 on 8 V100 GPUs,
 compared with the original Fairseq implementation.
 You can delete the `ls` prefix in parameters to switch to fairseq modules.
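Putting the flags above together, a single training command might look like the sketch below. This is illustrative only: the databin path and `--max-tokens` value are assumptions, and the complete recipe (data download, optimizer, learning-rate schedule, etc.) lives in `ls_fairseq_wmt14en2de.sh`.

```shell
# Minimal sketch: LightSeq arch plus quantization aware training.
# The databin path and token budget here are illustrative assumptions.
lightseq-train /tmp/wmt14_en_de/ \
    --task translation \
    --arch ls_transformer_wmt_en_de_big_t2t \
    --max-tokens 4096 \
    --fp16 \
    --use-torch-layer \
    --enable-quant --quant-mode qat
```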
@@ -45,7 +51,7 @@ lightseq-generate /tmp/wmt14_en_de/ \ --gen-subset test \ --path checkpoints/checkpoint_best.pt \ --task translation \ - --max-tokens 8192 \ + --batch-size 128 \ --beam 4 \ --lenpen 0.6 \ --fp16 \ From 5ed629ad09a37d506981284f12a372746d7187e6 Mon Sep 17 00:00:00 2001 From: "weiyang.god" Date: Tue, 15 Mar 2022 17:24:21 +0800 Subject: [PATCH 02/49] modify table in example readme --- examples/inference/python/README.md | 30 ++++++++++++++--------------- 1 file changed, 15 insertions(+), 15 deletions(-) diff --git a/examples/inference/python/README.md b/examples/inference/python/README.md index 6d52236b..febe0f31 100644 --- a/examples/inference/python/README.md +++ b/examples/inference/python/README.md @@ -9,21 +9,21 @@ cd examples/inference/python ## Model export We provide the following export examples. All Fairseq based models are trained using the scripts in [examples/training/fairseq](../../../examples/training/fairseq). The first two LightSeq Transformer models are trained using the scripts in [examples/training/custom](../../../examples/training/custom). -| Model | Type | Command | Resource | Description | -|--------------------------------------|-------|-------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------| -| LightSeq Transformer | Float | python export/ls_transformer_export.py -m ckpt_ls_custom.pt | http://lf3-nlp-opensource.bytetos.com/obj/nlp-opensource/lightseq/example_model/ckpt_ls_custom.pt | Export LightSeq Transformer models to protobuf format. | -| LightSeq Transformer + PTQ | Int8 | python export/ls_transformer_ptq_export.py -m ckpt_ls_custom.pt | http://lf3-nlp-opensource.bytetos.com/obj/nlp-opensource/lightseq/example_model/ckpt_ls_custom.pt | Export LightSeq Transformer models to int8 protobuf format using post training quantization. | -| Hugging Face BART | Float | python export/huggingface/hf_bart_export.py | / | Export Hugging Face BART models to protobuf/hdf5 format. | -| Hugging Face BERT | Float | python export/huggingface/hf_bert_export.py | / | Export Hugging Face BERT models to hdf5 format. | -| Hugging Face GPT2 | Float | python export/huggingface/hf_gpt2_export.py | / | Export Hugging Face GPT2 models to hdf5 format. | -| Native Fairseq Transformer | Float | python export/fairseq/native_fs_transformer_export.py -m ckpt_native_fairseq_31.06.pt | http://lf3-nlp-opensource.bytetos.com/obj/nlp-opensource/lightseq/example_model/fairseq/ckpt_native_fairseq_31.06.pt | Export native Fairseq Transformer models to protobuf/hdf5 format. | -| Native Fairseq Transformer + PTQ | Int8 | python export/fairseq/native_fs_transformer_export.py -m ckpt_native_fairseq_31.06.pt | http://lf3-nlp-opensource.bytetos.com/obj/nlp-opensource/lightseq/example_model/fairseq/ckpt_native_fairseq_31.06.pt | Export native Fairseq Transformer models to int8 protobuf format using post training quantization. | -| Fairseq + LightSeq Transformer | Float | python export/fairseq/ls_fs_transformer_export.py -m ckpt_ls_fairseq_31.17.pt | http://lf3-nlp-opensource.bytetos.com/obj/nlp-opensource/lightseq/example_model/fairseq/ckpt_ls_fairseq_31.17.pt | Export Fairseq Transformer models training with LightSeq modules to protobuf/hdf5 format. 
| -| Fairseq + LightSeq Transformer + PTQ | Int8 | python export/fairseq/ls_fs_transformer_ptq_export.py -m ckpt_ls_fairseq_31.17.pt | http://lf3-nlp-opensource.bytetos.com/obj/nlp-opensource/lightseq/example_model/fairseq/ckpt_ls_fairseq_31.17.pt | Export Fairseq Transformer models training with LightSeq modules to int8 protobuf format using post training quantization. | -| Fairseq + custom Torch layer | Float | python export/fairseq/ls_torch_fs_transformer_export.py -m ckpt_ls_torch_fairseq_31.16.pt | http://lf3-nlp-opensource.bytetos.com/obj/nlp-opensource/lightseq/example_model/fairseq/ckpt_ls_torch_fairseq_31.16.pt | Export Fairseq Transformer models training with custom Torch layers and other LightSeq modules to protobuf format. | -| Fairseq + custom Torch layer + PTQ | Int8 | python export/fairseq/ls_torch_fs_transformer_ptq_export.py -m ckpt_ls_torch_fairseq_31.16.pt | http://lf3-nlp-opensource.bytetos.com/obj/nlp-opensource/lightseq/example_model/fairseq/ckpt_ls_torch_fairseq_31.16.pt | Export Fairseq Transformer models training with custom Torch layers and other LightSeq modules to int8 protobuf format using post training quantization. | -| Fairseq + custom Torch layer + QAT | Int8 | python export/fairseq/ls_torch_fs_quant_transformer_export.py -m ckpt_ls_torch_fairseq_quant_31.09.pt | http://lf3-nlp-opensource.bytetos.com/obj/nlp-opensource/lightseq/example_model/fairseq/ckpt_ls_torch_fairseq_quant_31.09.pt | Export quantized Fairseq Transformer models training with custom Torch layers and other LightSeq modules to int8 protobuf format. | -| Native Fairseq MoE Transformer | Float | python export/fairseq/native_fs_moe_transformer_export.py | / | Export Fairseq MoE Transformer models to protobuf/hdf5 format. | +| Model | Type | Command | Resource | Description | +|--------------------------------------|-------|-------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------| +| LightSeq Transformer | Float | python export/ls_transformer_export.py -m ckpt_ls_custom.pt | [link](http://lf3-nlp-opensource.bytetos.com/obj/nlp-opensource/lightseq/example_model/ckpt_ls_custom.pt) | Export LightSeq Transformer models to protobuf format. | +| LightSeq Transformer + PTQ | Int8 | python export/ls_transformer_ptq_export.py -m ckpt_ls_custom.pt | [link](http://lf3-nlp-opensource.bytetos.com/obj/nlp-opensource/lightseq/example_model/ckpt_ls_custom.pt) | Export LightSeq Transformer models to int8 protobuf format using post training quantization. | +| Hugging Face BART | Float | python export/huggingface/hf_bart_export.py | / | Export Hugging Face BART models to protobuf/hdf5 format. | +| Hugging Face BERT | Float | python export/huggingface/hf_bert_export.py | / | Export Hugging Face BERT models to hdf5 format. | +| Hugging Face GPT2 | Float | python export/huggingface/hf_gpt2_export.py | / | Export Hugging Face GPT2 models to hdf5 format. | +| Native Fairseq Transformer | Float | python export/fairseq/native_fs_transformer_export.py -m ckpt_native_fairseq_31.06.pt | [link](http://lf3-nlp-opensource.bytetos.com/obj/nlp-opensource/lightseq/example_model/fairseq/ckpt_native_fairseq_31.06.pt) | Export native Fairseq Transformer models to protobuf/hdf5 format. 
| +| Native Fairseq Transformer + PTQ | Int8 | python export/fairseq/native_fs_transformer_export.py -m ckpt_native_fairseq_31.06.pt | [link](http://lf3-nlp-opensource.bytetos.com/obj/nlp-opensource/lightseq/example_model/fairseq/ckpt_native_fairseq_31.06.pt) | Export native Fairseq Transformer models to int8 protobuf format using post training quantization. | +| Fairseq + LightSeq Transformer | Float | python export/fairseq/ls_fs_transformer_export.py -m ckpt_ls_fairseq_31.17.pt | [link](http://lf3-nlp-opensource.bytetos.com/obj/nlp-opensource/lightseq/example_model/fairseq/ckpt_ls_fairseq_31.17.pt) | Export Fairseq Transformer models training with LightSeq modules to protobuf/hdf5 format. | +| Fairseq + LightSeq Transformer + PTQ | Int8 | python export/fairseq/ls_fs_transformer_ptq_export.py -m ckpt_ls_fairseq_31.17.pt | [link](http://lf3-nlp-opensource.bytetos.com/obj/nlp-opensource/lightseq/example_model/fairseq/ckpt_ls_fairseq_31.17.pt) | Export Fairseq Transformer models training with LightSeq modules to int8 protobuf format using post training quantization. | +| Fairseq + custom Torch layer | Float | python export/fairseq/ls_torch_fs_transformer_export.py -m ckpt_ls_torch_fairseq_31.16.pt | [link](http://lf3-nlp-opensource.bytetos.com/obj/nlp-opensource/lightseq/example_model/fairseq/ckpt_ls_torch_fairseq_31.16.pt) | Export Fairseq Transformer models training with custom Torch layers and other LightSeq modules to protobuf format. | +| Fairseq + custom Torch layer + PTQ | Int8 | python export/fairseq/ls_torch_fs_transformer_ptq_export.py -m ckpt_ls_torch_fairseq_31.16.pt | [link](http://lf3-nlp-opensource.bytetos.com/obj/nlp-opensource/lightseq/example_model/fairseq/ckpt_ls_torch_fairseq_31.16.pt) | Export Fairseq Transformer models training with custom Torch layers and other LightSeq modules to int8 protobuf format using post training quantization. | +| Fairseq + custom Torch layer + QAT | Int8 | python export/fairseq/ls_torch_fs_quant_transformer_export.py -m ckpt_ls_torch_fairseq_quant_31.09.pt | [link](http://lf3-nlp-opensource.bytetos.com/obj/nlp-opensource/lightseq/example_model/fairseq/ckpt_ls_torch_fairseq_quant_31.09.pt) | Export quantized Fairseq Transformer models training with custom Torch layers and other LightSeq modules to int8 protobuf format. | +| Native Fairseq MoE Transformer | Float | python export/fairseq/native_fs_moe_transformer_export.py | / | Export Fairseq MoE Transformer models to protobuf/hdf5 format. 
| ## LightSeq inference ### Hugging Face models From 35adcfde74ab54bffec067994ccce5c4fb02deba Mon Sep 17 00:00:00 2001 From: "weiyang.god" Date: Wed, 16 Mar 2022 13:27:31 +0800 Subject: [PATCH 03/49] add cpp example of quant_transformer --- examples/inference/cpp/CMakeLists.txt | 3 + .../cpp/quant_transformer_example.cc | 72 +++++++++++++++++++ examples/inference/cpp/transformer_example.cc | 17 +++-- 3 files changed, 85 insertions(+), 7 deletions(-) create mode 100644 examples/inference/cpp/quant_transformer_example.cc diff --git a/examples/inference/cpp/CMakeLists.txt b/examples/inference/cpp/CMakeLists.txt index 64cec769..a9a14d4f 100644 --- a/examples/inference/cpp/CMakeLists.txt +++ b/examples/inference/cpp/CMakeLists.txt @@ -3,6 +3,9 @@ cmake_minimum_required(VERSION 3.18) add_executable(transformer_example transformer_example.cc) target_link_libraries(transformer_example PUBLIC liblightseq) +add_executable(quant_transformer_example quant_transformer_example.cc) +target_link_libraries(quant_transformer_example PUBLIC liblightseq) + add_executable(bert_example bert_example.cc) target_link_libraries(bert_example PUBLIC liblightseq) diff --git a/examples/inference/cpp/quant_transformer_example.cc b/examples/inference/cpp/quant_transformer_example.cc new file mode 100644 index 00000000..08930deb --- /dev/null +++ b/examples/inference/cpp/quant_transformer_example.cc @@ -0,0 +1,72 @@ +#include "model_base.h" +#include "util.h" + +/** +@file +Example of how to run quantized transformer inference using our implementation. +*/ + +int main(int argc, char* argv[]) { + std::string model_weights_path = argv[1]; + int max_batch_size = 8; + + auto model = lightseq::cuda::LSModelFactory::GetInstance().CreateModel( + "QuantTransformer", model_weights_path, max_batch_size); + + int batch_size = 1; + int batch_seq_len = 13; + std::vector host_input = {63, 47, 65, 1507, 88, 74, 10, + 2057, 362, 9, 284, 6, 2}; + + void* d_input; + lightseq::cuda::CHECK_GPU_ERROR( + cudaMalloc(&d_input, sizeof(int) * batch_size * batch_seq_len)); + lightseq::cuda::CHECK_GPU_ERROR(cudaMemcpy( + d_input, host_input.data(), sizeof(int) * batch_size * batch_seq_len, + cudaMemcpyHostToDevice)); + + model->set_input_ptr(0, d_input); + model->set_input_shape(0, {batch_size, batch_seq_len}); + + for (int i = 0; i < model->get_output_size(); i++) { + void* d_output; + std::vector shape = model->get_output_max_shape(i); + int total_size = 1; + for (int j = 0; j < shape.size(); j++) { + total_size *= shape[j]; + } + lightseq::cuda::CHECK_GPU_ERROR( + cudaMalloc(&d_output, total_size * sizeof(int))); + model->set_output_ptr(i, d_output); + } + lightseq::cuda::CHECK_GPU_ERROR(cudaStreamSynchronize(0)); + std::cout << "infer preprocessing finished" << std::endl; + + /* ---step5. 
infer and log--- */ + for (int i = 0; i < 20; i++) { + auto start = std::chrono::high_resolution_clock::now(); + model->Infer(); + lightseq::cuda::print_time_duration(start, "one infer time", 0); + } + + for (int i = 0; i < model->get_output_size(); i++) { + const void* d_output; + d_output = static_cast(model->get_output_ptr(i)); + std::vector shape = model->get_output_shape(i); + std::cout << "output shape: "; + for (int j = 0; j < shape.size(); j++) { + std::cout << shape[j] << " "; + } + std::cout << std::endl; + + if (!i) + lightseq::cuda::print_vec((int*)d_output, "output", 15); + else + lightseq::cuda::print_vec((float*)d_output, "output", 5); + } + + // const int* res = model.get_result_ptr(); + // const float* res_score = model.get_score_ptr(); + // lightseq::cuda::print_vec(res_score, "res score", 5); + return 0; +} diff --git a/examples/inference/cpp/transformer_example.cc b/examples/inference/cpp/transformer_example.cc index 6998064a..79413bb0 100644 --- a/examples/inference/cpp/transformer_example.cc +++ b/examples/inference/cpp/transformer_example.cc @@ -8,15 +8,15 @@ Example of how to run transformer inference using our implementation. int main(int argc, char* argv[]) { std::string model_weights_path = argv[1]; - int max_batch_size = 128; + int max_batch_size = 8; auto model = lightseq::cuda::LSModelFactory::GetInstance().CreateModel( "Transformer", model_weights_path, max_batch_size); int batch_size = 1; - int batch_seq_len = 14; - std::vector host_input = {0, 100, 657, 14, 1816, 6, 53, - 50264, 473, 45, 50264, 162, 4, 2}; + int batch_seq_len = 13; + std::vector host_input = {63, 47, 65, 1507, 88, 74, 10, + 2057, 362, 9, 284, 6, 2}; void* d_input; lightseq::cuda::CHECK_GPU_ERROR( @@ -43,14 +43,14 @@ int main(int argc, char* argv[]) { std::cout << "infer preprocessing finished" << std::endl; /* ---step5. 
infer and log--- */ - for (int i = 0; i < 10; i++) { + for (int i = 0; i < 20; i++) { auto start = std::chrono::high_resolution_clock::now(); model->Infer(); lightseq::cuda::print_time_duration(start, "one infer time", 0); } for (int i = 0; i < model->get_output_size(); i++) { - const float* d_output; + const void* d_output; d_output = static_cast(model->get_output_ptr(i)); std::vector shape = model->get_output_shape(i); std::cout << "output shape: "; @@ -59,7 +59,10 @@ int main(int argc, char* argv[]) { } std::cout << std::endl; - lightseq::cuda::print_vec(d_output, "output", 5); + if (!i) + lightseq::cuda::print_vec((int*)d_output, "output", 15); + else + lightseq::cuda::print_vec((float*)d_output, "output", 5); } // const int* res = model.get_result_ptr(); From e8fa612bedb562c26c4011f53b4397dd392ee5ef Mon Sep 17 00:00:00 2001 From: "weiyang.god" Date: Thu, 17 Mar 2022 01:14:04 +0800 Subject: [PATCH 04/49] support huggingface bert ptq (stage 1) --- .../ls_hf_transformer_encoder_layer.py | 77 ++- examples/training/huggingface/run_glue.py | 6 +- examples/training/huggingface/run_glue.sh | 7 +- examples/training/huggingface/run_ner.py | 6 +- examples/training/huggingface/run_ner.sh | 14 +- .../huggingface/run_ner_no_trainer.py | 618 ------------------ .../huggingface/run_ner_no_trainer.sh | 26 - lightseq/training/ops/pytorch/quantization.py | 8 +- .../ops/pytorch/torch_transformer_layers.py | 58 +- .../ops/pytorch/transformer_decoder_layer.py | 2 +- .../ops/pytorch/transformer_encoder_layer.py | 2 +- 11 files changed, 126 insertions(+), 698 deletions(-) delete mode 100644 examples/training/huggingface/run_ner_no_trainer.py delete mode 100644 examples/training/huggingface/run_ner_no_trainer.sh diff --git a/examples/training/huggingface/ls_hf_transformer_encoder_layer.py b/examples/training/huggingface/ls_hf_transformer_encoder_layer.py index 38db61fe..19e29ac7 100644 --- a/examples/training/huggingface/ls_hf_transformer_encoder_layer.py +++ b/examples/training/huggingface/ls_hf_transformer_encoder_layer.py @@ -1,39 +1,14 @@ import random -from lightseq.training.ops.pytorch.transformer_encoder_layer import ( - LSTransformerEncoderLayer, +from lightseq.training.ops.pytorch.quantization import ( + TensorQuantizer, + enable_quant, + disable_quant, + qat_mode, + ptq_mode, ) -class LSHFTransformerEncoderLayer(LSTransformerEncoderLayer): - def __init__(self, *args, **kwargs): - super(LSHFTransformerEncoderLayer, self).__init__(*args, **kwargs) - - def forward(self, hidden_states, encoder_padding_mask, *args, **kwargs): - ls_encoder_padding_mask = encoder_padding_mask / -10000.0 - ls_encoder_padding_mask = ls_encoder_padding_mask.squeeze() - output = super().forward(hidden_states, ls_encoder_padding_mask) - return (output, None, None, None) - - -def gen_bert_config(training_args, config): - bert_config = LSTransformerEncoderLayer.get_config( - max_batch_tokens=4096, - max_seq_len=config.max_position_embeddings, - hidden_size=config.hidden_size, - intermediate_size=config.intermediate_size, - nhead=config.num_attention_heads, - attn_prob_dropout_ratio=config.attention_probs_dropout_prob, - activation_dropout_ratio=config.hidden_dropout_prob, - hidden_dropout_ratio=config.hidden_dropout_prob, - pre_layer_norm=False, - fp16=training_args.fp16, - local_rank=training_args.local_rank, - activation_fn="gelu", - ) - return bert_config - - def get_hf_bert_enc_layer_params(layer): init_ws = [] init_bs = [] @@ -59,10 +34,48 @@ def get_hf_bert_enc_layer_params(layer): return init_ws, init_bs -def 
inject_ls_enc_layer(model, training_args, config): +def inject_ls_enc_layer(model, training_args, config, enable_quant=False): + if enable_quant: + from lightseq.training.ops.pytorch.torch_transformer_layers import ( + TransformerEncoderLayer, + ) + else: + from lightseq.training.ops.pytorch.transformer_encoder_layer import ( + LSTransformerEncoderLayer as TransformerEncoderLayer, + ) + + class LSHFTransformerEncoderLayer(TransformerEncoderLayer): + def __init__(self, *args, **kwargs): + super(LSHFTransformerEncoderLayer, self).__init__(*args, **kwargs) + + def forward(self, hidden_states, encoder_padding_mask, *args, **kwargs): + ls_encoder_padding_mask = encoder_padding_mask / -10000.0 + ls_encoder_padding_mask = ls_encoder_padding_mask.squeeze() + output = super().forward(hidden_states, ls_encoder_padding_mask) + return (output, None, None, None) + + def gen_bert_config(training_args, config): + bert_config = TransformerEncoderLayer.get_config( + max_batch_tokens=4096, + max_seq_len=config.max_position_embeddings, + hidden_size=config.hidden_size, + intermediate_size=config.intermediate_size, + nhead=config.num_attention_heads, + attn_prob_dropout_ratio=config.attention_probs_dropout_prob, + activation_dropout_ratio=config.hidden_dropout_prob, + hidden_dropout_ratio=config.hidden_dropout_prob, + pre_layer_norm=False, + fp16=training_args.fp16, + local_rank=training_args.local_rank, + activation_fn="gelu", + ) + return bert_config + for i in range(config.num_hidden_layers): bert_config = gen_bert_config(training_args, config) init_ws, init_bs = get_hf_bert_enc_layer_params(model.bert.encoder.layer[i]) model.bert.encoder.layer[i] = LSHFTransformerEncoderLayer( bert_config, init_ws, init_bs ).cuda() + if enable_quant: + model.bert.encoder.layer[i].apply(disable_quant) diff --git a/examples/training/huggingface/run_glue.py b/examples/training/huggingface/run_glue.py index 1a2274da..0c07c916 100644 --- a/examples/training/huggingface/run_glue.py +++ b/examples/training/huggingface/run_glue.py @@ -228,6 +228,10 @@ class ModelArguments: default=True, metadata={"help": "Whether to use lightseq TransformerEncoder"}, ) + enable_quant: bool = field( + default=False, + metadata={"help": "Whether to enable quantization"}, + ) def main(): @@ -411,7 +415,7 @@ def main(): # Replace with LightSeq encoder layers. 
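+    # When model_args.enable_quant is set, inject_ls_enc_layer swaps in the
+    # quantization-aware Torch encoder layers and initially applies
+    # disable_quant to them (see ls_hf_transformer_encoder_layer.py above).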
if model_args.with_lightseq: - inject_ls_enc_layer(model, training_args, config) + inject_ls_enc_layer(model, training_args, config, model_args.enable_quant) # Preprocessing the datasets if data_args.task_name is not None: diff --git a/examples/training/huggingface/run_glue.sh b/examples/training/huggingface/run_glue.sh index 84fa3c38..3a7cc33e 100644 --- a/examples/training/huggingface/run_glue.sh +++ b/examples/training/huggingface/run_glue.sh @@ -18,7 +18,7 @@ THIS_DIR=$(dirname $(readlink -f $0)) export TASK_NAME=stsb python3 -m torch.distributed.launch \ - --nproc_per_node=1 \ + --nproc_per_node=8 \ $THIS_DIR/run_glue.py \ --model_name_or_path bert-large-cased \ --task_name $TASK_NAME \ @@ -27,10 +27,11 @@ python3 -m torch.distributed.launch \ --max_seq_length 128 \ --per_device_train_batch_size 32 \ --learning_rate 2e-5 \ - --num_train_epochs 3 \ + --num_train_epochs 50 \ --output_dir /tmp/$TASK_NAME/ \ --overwrite_output_dir \ - --with_lightseq true \ --fp16 \ --seed 1234 \ --logging_steps 10 \ + --with_lightseq true \ + --enable_quant true diff --git a/examples/training/huggingface/run_ner.py b/examples/training/huggingface/run_ner.py index 1f287bfd..eea1da87 100644 --- a/examples/training/huggingface/run_ner.py +++ b/examples/training/huggingface/run_ner.py @@ -98,6 +98,10 @@ class ModelArguments: default=True, metadata={"help": "Whether to use lightseq TransformerEncoder"}, ) + enable_quant: bool = field( + default=False, + metadata={"help": "Whether to enable quantization"}, + ) @dataclass @@ -370,7 +374,7 @@ def get_label_list(labels): # Replace with LightSeq encoder layers. if model_args.with_lightseq: - inject_ls_enc_layer(model, training_args, config) + inject_ls_enc_layer(model, training_args, config, model_args.enable_quant) # Tokenizer check: this script requires a fast tokenizer. if not isinstance(tokenizer, PreTrainedTokenizerFast): diff --git a/examples/training/huggingface/run_ner.sh b/examples/training/huggingface/run_ner.sh index e37695d1..fc089afe 100644 --- a/examples/training/huggingface/run_ner.sh +++ b/examples/training/huggingface/run_ner.sh @@ -19,14 +19,18 @@ if [ -d "/tmp/test-ner/" ]; then fi python3 -m torch.distributed.launch \ - --nproc_per_node=1 \ + --nproc_per_node=8 \ $THIS_DIR/run_ner.py \ --model_name_or_path bert-large-uncased \ - --per_device_train_batch_size 16 \ --dataset_name conll2003 \ - --output_dir /tmp/test-ner \ --do_train \ --do_eval \ - --num_train_epochs 1 \ - --with_lightseq true \ + --per_device_train_batch_size 16 \ + --num_train_epochs 3 \ + --output_dir /tmp/test-ner \ + --overwrite_output_dir \ --fp16 \ + --seed 1234 \ + --logging_steps 10 \ + --with_lightseq true \ + --enable_quant true diff --git a/examples/training/huggingface/run_ner_no_trainer.py b/examples/training/huggingface/run_ner_no_trainer.py deleted file mode 100644 index 88db653b..00000000 --- a/examples/training/huggingface/run_ner_no_trainer.py +++ /dev/null @@ -1,618 +0,0 @@ -#!/usr/bin/env python -# coding=utf-8 -# Copyright 2021 The HuggingFace Inc. team. All rights reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
-# See the License for the specific language governing permissions and -# limitations under the License. -""" -Fine-tuning a 🤗 Transformers model on token classification tasks (NER, POS, CHUNKS) relying on the accelerate library -without using a Trainer. -""" - -import argparse -import logging -import math -import os -import random - -import datasets -import torch -from datasets import ClassLabel, load_dataset, load_metric -from torch.utils.data.dataloader import DataLoader -from tqdm.auto import tqdm - -import transformers -from accelerate import Accelerator -from transformers import ( - CONFIG_MAPPING, - MODEL_MAPPING, - AdamW, - AutoConfig, - AutoModelForTokenClassification, - AutoTokenizer, - DataCollatorForTokenClassification, - SchedulerType, - default_data_collator, - get_scheduler, - set_seed, -) -from ls_hf_transformer_encoder_layer import inject_ls_enc_layer - -logger = logging.getLogger(__name__) -# You should update this to your particular problem to have better documentation of `model_type` -MODEL_CONFIG_CLASSES = list(MODEL_MAPPING.keys()) -MODEL_TYPES = tuple(conf.model_type for conf in MODEL_CONFIG_CLASSES) - - -def parse_args(): - parser = argparse.ArgumentParser( - description="Finetune a transformers model on a text classification task (NER) with accelerate library" - ) - parser.add_argument( - "--dataset_name", - type=str, - default=None, - help="The name of the dataset to use (via the datasets library).", - ) - parser.add_argument( - "--dataset_config_name", - type=str, - default=None, - help="The configuration name of the dataset to use (via the datasets library).", - ) - parser.add_argument( - "--train_file", - type=str, - default=None, - help="A csv or a json file containing the training data.", - ) - parser.add_argument( - "--validation_file", - type=str, - default=None, - help="A csv or a json file containing the validation data.", - ) - parser.add_argument( - "--max_length", - type=int, - default=128, - help=( - "The maximum total input sequence length after tokenization. Sequences longer than this will be truncated," - " sequences shorter will be padded if `--pad_to_max_lenght` is passed." - ), - ) - parser.add_argument( - "--pad_to_max_length", - action="store_true", - help="If passed, pad all samples to `max_length`. Otherwise, dynamic padding is used.", - ) - parser.add_argument( - "--model_name_or_path", - type=str, - help="Path to pretrained model or model identifier from huggingface.co/models.", - required=True, - ) - parser.add_argument( - "--config_name", - type=str, - default=None, - help="Pretrained config name or path if not the same as model_name", - ) - parser.add_argument( - "--tokenizer_name", - type=str, - default=None, - help="Pretrained tokenizer name or path if not the same as model_name", - ) - parser.add_argument( - "--per_device_train_batch_size", - type=int, - default=8, - help="Batch size (per device) for the training dataloader.", - ) - parser.add_argument( - "--per_device_eval_batch_size", - type=int, - default=8, - help="Batch size (per device) for the evaluation dataloader.", - ) - parser.add_argument( - "--learning_rate", - type=float, - default=5e-5, - help="Initial learning rate (after the potential warmup period) to use.", - ) - parser.add_argument( - "--weight_decay", type=float, default=0.0, help="Weight decay to use." 
- ) - parser.add_argument( - "--num_train_epochs", - type=int, - default=3, - help="Total number of training epochs to perform.", - ) - parser.add_argument( - "--max_train_steps", - type=int, - default=None, - help="Total number of training steps to perform. If provided, overrides num_train_epochs.", - ) - parser.add_argument( - "--gradient_accumulation_steps", - type=int, - default=1, - help="Number of updates steps to accumulate before performing a backward/update pass.", - ) - parser.add_argument( - "--lr_scheduler_type", - type=SchedulerType, - default="linear", - help="The scheduler type to use.", - choices=[ - "linear", - "cosine", - "cosine_with_restarts", - "polynomial", - "constant", - "constant_with_warmup", - ], - ) - parser.add_argument( - "--num_warmup_steps", - type=int, - default=0, - help="Number of steps for the warmup in the lr scheduler.", - ) - parser.add_argument( - "--output_dir", type=str, default=None, help="Where to store the final model." - ) - parser.add_argument( - "--seed", type=int, default=None, help="A seed for reproducible training." - ) - parser.add_argument( - "--model_type", - type=str, - default=None, - help="Model type to use if training from scratch.", - choices=MODEL_TYPES, - ) - parser.add_argument( - "--label_all_tokens", - action="store_true", - help="Setting labels of all special tokens to -100 and thus PyTorch will ignore them.", - ) - parser.add_argument( - "--return_entity_level_metrics", - action="store_true", - help="Indication whether entity level metrics are to be returner.", - ) - parser.add_argument( - "--task_name", - type=str, - default="ner", - choices=["ner", "pos", "chunk"], - help="The name of the task.", - ) - parser.add_argument( - "--debug", - action="store_true", - help="Activate debug mode and run training only with a subset of data.", - ) - args = parser.parse_args() - - # Sanity checks - if ( - args.task_name is None - and args.train_file is None - and args.validation_file is None - ): - raise ValueError("Need either a task name or a training/validation file.") - else: - if args.train_file is not None: - extension = args.train_file.split(".")[-1] - assert extension in [ - "csv", - "json", - ], "`train_file` should be a csv or a json file." - if args.validation_file is not None: - extension = args.validation_file.split(".")[-1] - assert extension in [ - "csv", - "json", - ], "`validation_file` should be a csv or a json file." - - if args.output_dir is not None: - os.makedirs(args.output_dir, exist_ok=True) - - return args - - -def main(): - args = parse_args() - - # Initialize the accelerator. We will let the accelerator handle device placement for us in this example. - accelerator = Accelerator() - # Make one log on every process with the configuration for debugging. - logging.basicConfig( - format="%(asctime)s - %(levelname)s - %(name)s - %(message)s", - datefmt="%m/%d/%Y %H:%M:%S", - level=logging.INFO, - ) - logger.info(accelerator.state) - - # Setup logging, we only want one process per machine to log things on the screen. - # accelerator.is_local_main_process is only True for one process per machine. - logger.setLevel( - logging.INFO if accelerator.is_local_main_process else logging.ERROR - ) - if accelerator.is_local_main_process: - datasets.utils.logging.set_verbosity_warning() - transformers.utils.logging.set_verbosity_info() - else: - datasets.utils.logging.set_verbosity_error() - transformers.utils.logging.set_verbosity_error() - - # If passed along, set the training seed now. 
- if args.seed is not None: - set_seed(args.seed) - - # Get the datasets: you can either provide your own CSV/JSON/TXT training and evaluation files (see below) - # or just provide the name of one of the public datasets for token classification task available on the hub at https://huggingface.co/datasets/ - # (the dataset will be downloaded automatically from the datasets Hub). - # - # For CSV/JSON files, this script will use the column called 'tokens' or the first column if no column called - # 'tokens' is found. You can easily tweak this behavior (see below). - # - # In distributed training, the load_dataset function guarantee that only one local process can concurrently - # download the dataset. - if args.dataset_name is not None: - # Downloading and loading a dataset from the hub. - raw_datasets = load_dataset(args.dataset_name, args.dataset_config_name) - else: - data_files = {} - if args.train_file is not None: - data_files["train"] = args.train_file - if args.validation_file is not None: - data_files["validation"] = args.validation_file - extension = args.train_file.split(".")[-1] - raw_datasets = load_dataset(extension, data_files=data_files) - # Trim a number of training examples - if args.debug: - for split in raw_datasets.keys(): - raw_datasets[split] = raw_datasets[split].select(range(100)) - # See more about loading any type of standard or custom dataset (from files, python dict, pandas DataFrame, etc) at - # https://huggingface.co/docs/datasets/loading_datasets.html. - - if raw_datasets["train"] is not None: - column_names = raw_datasets["train"].column_names - features = raw_datasets["train"].features - else: - column_names = raw_datasets["validation"].column_names - features = raw_datasets["validation"].features - text_column_name = "tokens" if "tokens" in column_names else column_names[0] - label_column_name = ( - f"{args.task_name}_tags" - if f"{args.task_name}_tags" in column_names - else column_names[1] - ) - - # In the event the labels are not a `Sequence[ClassLabel]`, we will need to go through the dataset to get the - # unique labels. - def get_label_list(labels): - unique_labels = set() - for label in labels: - unique_labels = unique_labels | set(label) - label_list = list(unique_labels) - label_list.sort() - return label_list - - if isinstance(features[label_column_name].feature, ClassLabel): - label_list = features[label_column_name].feature.names - # No need to convert the labels since they are already ints. - label_to_id = {i: i for i in range(len(label_list))} - else: - label_list = get_label_list(raw_datasets["train"][label_column_name]) - label_to_id = {l: i for i, l in enumerate(label_list)} - num_labels = len(label_list) - - # Load pretrained model and tokenizer - # - # In distributed training, the .from_pretrained methods guarantee that only one local process can concurrently - # download model & vocab. 
- if args.config_name: - config = AutoConfig.from_pretrained(args.config_name, num_labels=num_labels) - elif args.model_name_or_path: - config = AutoConfig.from_pretrained( - args.model_name_or_path, num_labels=num_labels - ) - else: - config = CONFIG_MAPPING[args.model_type]() - logger.warning("You are instantiating a new config instance from scratch.") - - if args.tokenizer_name: - tokenizer = AutoTokenizer.from_pretrained(args.tokenizer_name, use_fast=True) - elif args.model_name_or_path: - tokenizer = AutoTokenizer.from_pretrained( - args.model_name_or_path, use_fast=True - ) - else: - raise ValueError( - "You are instantiating a new tokenizer from scratch. This is not supported by this script." - "You can do it from another script, save it, and load it from here, using --tokenizer_name." - ) - - if args.model_name_or_path: - model = AutoModelForTokenClassification.from_pretrained( - args.model_name_or_path, - from_tf=bool(".ckpt" in args.model_name_or_path), - config=config, - ) - else: - logger.info("Training new model from scratch") - model = AutoModelForTokenClassification.from_config(config) - - model.resize_token_embeddings(len(tokenizer)) - - # Replace with LightSeq encoder layers. - args.local_rank = accelerator.local_process_index - args.fp16 = accelerator.use_fp16 - inject_ls_enc_layer(model, args, config) - - # Preprocessing the raw_datasets. - # First we tokenize all the texts. - padding = "max_length" if args.pad_to_max_length else False - - # Tokenize all texts and align the labels with them. - - def tokenize_and_align_labels(examples): - tokenized_inputs = tokenizer( - examples[text_column_name], - max_length=args.max_length, - padding=padding, - truncation=True, - # We use this argument because the texts in our dataset are lists of words (with a label for each word). - is_split_into_words=True, - ) - - labels = [] - for i, label in enumerate(examples[label_column_name]): - word_ids = tokenized_inputs.word_ids(batch_index=i) - previous_word_idx = None - label_ids = [] - for word_idx in word_ids: - # Special tokens have a word id that is None. We set the label to -100 so they are automatically - # ignored in the loss function. - if word_idx is None: - label_ids.append(-100) - # We set the label for the first token of each word. - elif word_idx != previous_word_idx: - label_ids.append(label_to_id[label[word_idx]]) - # For the other tokens in a word, we set the label to either the current label or -100, depending on - # the label_all_tokens flag. - else: - label_ids.append( - label_to_id[label[word_idx]] if args.label_all_tokens else -100 - ) - previous_word_idx = word_idx - - labels.append(label_ids) - tokenized_inputs["labels"] = labels - return tokenized_inputs - - processed_raw_datasets = raw_datasets.map( - tokenize_and_align_labels, - batched=True, - remove_columns=raw_datasets["train"].column_names, - ) - - train_dataset = processed_raw_datasets["train"] - eval_dataset = processed_raw_datasets["validation"] - - # DataLoaders creation: - if args.pad_to_max_length: - # If padding was already done ot max length, we use the default data collator that will just convert everything - # to tensors. - data_collator = default_data_collator - else: - # Otherwise, `DataCollatorForTokenClassification` will apply dynamic padding for us (by padding to the maximum length of - # the samples passed). 
When using mixed precision, we add `pad_to_multiple_of=8` to pad all tensors to multiple - # of 8s, which will enable the use of Tensor Cores on NVIDIA hardware with compute capability >= 7.5 (Volta). - data_collator = DataCollatorForTokenClassification( - tokenizer, pad_to_multiple_of=(8 if accelerator.use_fp16 else None) - ) - - train_dataloader = DataLoader( - train_dataset, - shuffle=True, - collate_fn=data_collator, - batch_size=args.per_device_train_batch_size, - ) - eval_dataloader = DataLoader( - eval_dataset, - collate_fn=data_collator, - batch_size=args.per_device_eval_batch_size, - ) - - # Optimizer - # Split weights in two groups, one with weight decay and the other not. - no_decay = ["bias", "LayerNorm.weight"] - optimizer_grouped_parameters = [ - { - "params": [ - p - for n, p in model.named_parameters() - if not any(nd in n for nd in no_decay) - ], - "weight_decay": args.weight_decay, - }, - { - "params": [ - p - for n, p in model.named_parameters() - if any(nd in n for nd in no_decay) - ], - "weight_decay": 0.0, - }, - ] - optimizer = AdamW(optimizer_grouped_parameters, lr=args.learning_rate) - - # Use the device given by the `accelerator` object. - device = accelerator.device - model.to(device) - - # Prepare everything with our `accelerator`. - model, optimizer, train_dataloader, eval_dataloader = accelerator.prepare( - model, optimizer, train_dataloader, eval_dataloader - ) - - # Note -> the training dataloader needs to be prepared before we grab his length below (cause its length will be - # shorter in multiprocess) - - # Scheduler and math around the number of training steps. - num_update_steps_per_epoch = math.ceil( - len(train_dataloader) / args.gradient_accumulation_steps - ) - if args.max_train_steps is None: - args.max_train_steps = args.num_train_epochs * num_update_steps_per_epoch - else: - args.num_train_epochs = math.ceil( - args.max_train_steps / num_update_steps_per_epoch - ) - - lr_scheduler = get_scheduler( - name=args.lr_scheduler_type, - optimizer=optimizer, - num_warmup_steps=args.num_warmup_steps, - num_training_steps=args.max_train_steps, - ) - - # Metrics - metric = load_metric("seqeval") - - def get_labels(predictions, references): - # Transform predictions and references tensos to numpy arrays - if device.type == "cpu": - y_pred = predictions.detach().clone().numpy() - y_true = references.detach().clone().numpy() - else: - y_pred = predictions.detach().cpu().clone().numpy() - y_true = references.detach().cpu().clone().numpy() - - # Remove ignored index (special tokens) - true_predictions = [ - [label_list[p] for (p, l) in zip(pred, gold_label) if l != -100] - for pred, gold_label in zip(y_pred, y_true) - ] - true_labels = [ - [label_list[l] for (p, l) in zip(pred, gold_label) if l != -100] - for pred, gold_label in zip(y_pred, y_true) - ] - return true_predictions, true_labels - - def compute_metrics(): - results = metric.compute() - if args.return_entity_level_metrics: - # Unpack nested dictionaries - final_results = {} - for key, value in results.items(): - if isinstance(value, dict): - for n, v in value.items(): - final_results[f"{key}_{n}"] = v - else: - final_results[key] = value - return final_results - else: - return { - "precision": results["overall_precision"], - "recall": results["overall_recall"], - "f1": results["overall_f1"], - "accuracy": results["overall_accuracy"], - } - - # Train! 
- total_batch_size = ( - args.per_device_train_batch_size - * accelerator.num_processes - * args.gradient_accumulation_steps - ) - - logger.info("***** Running training *****") - logger.info(f" Num examples = {len(train_dataset)}") - logger.info(f" Num Epochs = {args.num_train_epochs}") - logger.info( - f" Instantaneous batch size per device = {args.per_device_train_batch_size}" - ) - logger.info( - f" Total train batch size (w. parallel, distributed & accumulation) = {total_batch_size}" - ) - logger.info(f" Gradient Accumulation steps = {args.gradient_accumulation_steps}") - logger.info(f" Total optimization steps = {args.max_train_steps}") - # Only show the progress bar once on each machine. - progress_bar = tqdm( - range(args.max_train_steps), disable=not accelerator.is_local_main_process - ) - completed_steps = 0 - - for epoch in range(args.num_train_epochs): - model.train() - for step, batch in enumerate(train_dataloader): - outputs = model(**batch) - loss = outputs.loss - loss = loss / args.gradient_accumulation_steps - accelerator.backward(loss) - if ( - step % args.gradient_accumulation_steps == 0 - or step == len(train_dataloader) - 1 - ): - optimizer.step() - lr_scheduler.step() - optimizer.zero_grad() - progress_bar.update(1) - completed_steps += 1 - - if completed_steps >= args.max_train_steps: - break - - model.eval() - for step, batch in enumerate(eval_dataloader): - with torch.no_grad(): - outputs = model(**batch) - predictions = outputs.logits.argmax(dim=-1) - labels = batch["labels"] - if ( - not args.pad_to_max_length - ): # necessary to pad predictions and labels for being gathered - predictions = accelerator.pad_across_processes( - predictions, dim=1, pad_index=-100 - ) - labels = accelerator.pad_across_processes(labels, dim=1, pad_index=-100) - - predictions_gathered = accelerator.gather(predictions) - labels_gathered = accelerator.gather(labels) - preds, refs = get_labels(predictions_gathered, labels_gathered) - metric.add_batch( - predictions=preds, - references=refs, - ) # predictions and preferences are expected to be a nested list of labels, not label_ids - - eval_metric = metric.compute() - # eval_metric = compute_metrics() - accelerator.print(f"epoch {epoch}:", eval_metric) - - if args.output_dir is not None: - accelerator.wait_for_everyone() - unwrapped_model = accelerator.unwrap_model(model) - unwrapped_model.save_pretrained(args.output_dir, save_function=accelerator.save) - - -if __name__ == "__main__": - main() diff --git a/examples/training/huggingface/run_ner_no_trainer.sh b/examples/training/huggingface/run_ner_no_trainer.sh deleted file mode 100644 index 278aa9cc..00000000 --- a/examples/training/huggingface/run_ner_no_trainer.sh +++ /dev/null @@ -1,26 +0,0 @@ -# Copyright 2020 The HuggingFace Team. All rights reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -THIS_DIR=$(dirname $(readlink -f $0)) - -if [ -d "/tmp/test-ner/" ]; then - rm -rf /tmp/test-ner/ -fi - -accelerate launch $THIS_DIR/run_ner_no_trainer.py \ - --model_name_or_path bert-large-uncased \ - --dataset_name conll2003 \ - --output_dir /tmp/test-ner \ - --task_name ner \ - --num_train_epochs 1 diff --git a/lightseq/training/ops/pytorch/quantization.py b/lightseq/training/ops/pytorch/quantization.py index f3676603..64f53440 100644 --- a/lightseq/training/ops/pytorch/quantization.py +++ b/lightseq/training/ops/pytorch/quantization.py @@ -29,14 +29,10 @@ class QuantLinear(Linear): def __init__(self, in_features, out_features, pre_activation=None, *args, **kwargs): super(QuantLinear, self).__init__(in_features, out_features, *args, **kwargs) - if pre_activation is None or pre_activation == "encoder_out": - input_quant_config = act_quant_config - elif pre_activation == "relu": + if pre_activation == "relu": input_quant_config = relu_quant_config else: - raise NotImplementedError( - f"pre_activation {pre_activation} is not supported" - ) + input_quant_config = act_quant_config self.input_quant = None if pre_activation != "encoder_out": diff --git a/lightseq/training/ops/pytorch/torch_transformer_layers.py b/lightseq/training/ops/pytorch/torch_transformer_layers.py index f704c6f3..1f59b61b 100644 --- a/lightseq/training/ops/pytorch/torch_transformer_layers.py +++ b/lightseq/training/ops/pytorch/torch_transformer_layers.py @@ -535,11 +535,30 @@ def __init__(self, config, initial_weights=None, initial_biases=None): config.intermediate_size, ) self.fc2 = QuantLinear( - config.intermediate_size, self.embed_dim, pre_activation="relu" + config.intermediate_size, + self.embed_dim, + pre_activation=config.activation_fn, ) self.final_layer_norm = LayerNorm(self.embed_dim) + if initial_weights is None or initial_biases is None: + return + + # load initial weights + self.self_attn.qkv_proj.weight = nn.Parameter(torch.cat(initial_weights[:3], 0)) + self.self_attn.qkv_proj.bias = nn.Parameter(torch.cat(initial_biases[:3], 0)) + self.self_attn.out_proj.weight = nn.Parameter(initial_weights[3]) + self.self_attn.out_proj.bias = nn.Parameter(initial_biases[3]) + self.self_attn_layer_norm.weight = nn.Parameter(initial_weights[4]) + self.self_attn_layer_norm.bias = nn.Parameter(initial_biases[4]) + self.fc1.weight = nn.Parameter(initial_weights[5]) + self.fc1.bias = nn.Parameter(initial_biases[5]) + self.fc2.weight = nn.Parameter(initial_weights[6]) + self.fc2.bias = nn.Parameter(initial_biases[6]) + self.final_layer_norm.weight = nn.Parameter(initial_weights[7]) + self.final_layer_norm.bias = nn.Parameter(initial_biases[7]) + def build_self_attention(self, embed_dim, nhead, attn_dropout): return MultiheadAttention( embed_dim, @@ -654,7 +673,7 @@ def __init__(self, config, initial_weights=None, initial_biases=None): self.fc2 = QuantLinear( config.intermediate_size, self.embed_dim, - pre_activation="relu", + pre_activation=config.activation_fn, ) self.final_layer_norm = LayerNorm(self.embed_dim) @@ -662,6 +681,33 @@ def __init__(self, config, initial_weights=None, initial_biases=None): self.onnx_trace = False + if initial_weights is None or initial_biases is None: + return + + # load initial weights + self.self_attn.qkv_proj.weight = nn.Parameter(torch.cat(initial_weights[:3], 0)) + self.self_attn.qkv_proj.bias = nn.Parameter(torch.cat(initial_biases[:3], 0)) + self.self_attn.out_proj.weight = nn.Parameter(initial_weights[3]) + self.self_attn.out_proj.bias = nn.Parameter(initial_biases[3]) + 
self.self_attn_layer_norm.weight = nn.Parameter(initial_weights[4]) + self.self_attn_layer_norm.bias = nn.Parameter(initial_biases[4]) + self.encoder_attn.q_proj.weight = nn.Parameter(initial_weights[5]) + self.encoder_attn.q_proj.bias = nn.Parameter(initial_weights[5]) + self.encoder_attn.k_proj.weight = nn.Parameter(initial_weights[6]) + self.encoder_attn.k_proj.bias = nn.Parameter(initial_weights[6]) + self.encoder_attn.v_proj.weight = nn.Parameter(initial_weights[7]) + self.encoder_attn.v_proj.bias = nn.Parameter(initial_weights[7]) + self.encoder_attn.out_proj.weight = nn.Parameter(initial_weights[8]) + self.encoder_attn.out_proj.bias = nn.Parameter(initial_biases[8]) + self.encoder_attn_layer_norm.weight = nn.Parameter(initial_weights[9]) + self.encoder_attn_layer_norm.bias = nn.Parameter(initial_biases[9]) + self.fc1.weight = nn.Parameter(initial_weights[10]) + self.fc1.bias = nn.Parameter(initial_biases[10]) + self.fc2.weight = nn.Parameter(initial_weights[11]) + self.fc2.bias = nn.Parameter(initial_biases[11]) + self.final_layer_norm.weight = nn.Parameter(initial_weights[12]) + self.final_layer_norm.bias = nn.Parameter(initial_biases[12]) + def build_self_attention( self, embed_dim, nhead, attn_dropout, add_bias_kv=False, add_zero_attn=False ): @@ -847,7 +893,7 @@ def make_generation_fast_(self, need_attn: bool = False, **kwargs): class TransformerEmbeddingLayer(TransformerEmbeddingLayerBase): - def __init__(self, config): + def __init__(self, config, initial_embeddings=None): super().__init__() self.emb_lookup = nn.Embedding( @@ -855,9 +901,13 @@ def __init__(self, config): ) self.emb_lookup.to(dtype=(torch.half if config.fp16 else torch.float)) self.embeddings = self.emb_lookup.weight - nn.init.normal_(self.embeddings, mean=0, std=config.embedding_dim ** -0.5) nn.init.constant_(self.embeddings[config.padding_idx], 0) + + # load initial weights + if initial_embeddings is not None: + self.emb_lookup.weight = nn.Parameter(initial_embeddings) + self.embed_positions = SinusoidalPositionalEmbedding( config.embedding_dim, config.padding_idx, config.max_seq_len, config.fp16 ) diff --git a/lightseq/training/ops/pytorch/transformer_decoder_layer.py b/lightseq/training/ops/pytorch/transformer_decoder_layer.py index 08fd08c3..d2d7ac8a 100644 --- a/lightseq/training/ops/pytorch/transformer_decoder_layer.py +++ b/lightseq/training/ops/pytorch/transformer_decoder_layer.py @@ -160,7 +160,7 @@ def __init__(self, config, initial_weights=None, initial_biases=None): self.para_offset = self.para_offset[:-2] self.para = nn.Parameter(torch.Tensor(self.para_offset[-1])) - if initial_weights is None and initial_biases is None: + if initial_weights is None or initial_biases is None: # enc-dec kv weights and bias self.init_transformer_weights() return diff --git a/lightseq/training/ops/pytorch/transformer_encoder_layer.py b/lightseq/training/ops/pytorch/transformer_encoder_layer.py index 6d94587d..fea35d6c 100644 --- a/lightseq/training/ops/pytorch/transformer_encoder_layer.py +++ b/lightseq/training/ops/pytorch/transformer_encoder_layer.py @@ -109,7 +109,7 @@ def __init__(self, config, initial_weights=None, initial_biases=None): self.para_offset = LSTransformerEncoderLayer.gen_offset(hs, ims) self.para = nn.Parameter(torch.Tensor(self.para_offset[-1])) - if initial_weights is None and initial_biases is None: + if initial_weights is None or initial_biases is None: self.init_transformer_weights() return From 8c72f26ca7ecf807c26f75b6f2b684a88521fdb6 Mon Sep 17 00:00:00 2001 From: "weiyang.god" Date: Thu, 17 
Mar 2022 14:29:20 +0800 Subject: [PATCH 05/49] fix huggingface bert weight loading fp16 bug --- .../ls_hf_transformer_encoder_layer.py | 12 +- examples/training/huggingface/run_ner.sh | 2 +- .../training/ops/pytorch/gpt_encoder_layer.py | 7 - lightseq/training/ops/pytorch/quantization.py | 4 +- .../ops/pytorch/torch_transformer_layers.py | 132 +++++++++++++----- .../ops/pytorch/transformer_decoder_layer.py | 4 - .../pytorch/transformer_embedding_layer.py | 4 - .../ops/pytorch/transformer_encoder_layer.py | 31 ---- 8 files changed, 98 insertions(+), 98 deletions(-) diff --git a/examples/training/huggingface/ls_hf_transformer_encoder_layer.py b/examples/training/huggingface/ls_hf_transformer_encoder_layer.py index 19e29ac7..5564bd0c 100644 --- a/examples/training/huggingface/ls_hf_transformer_encoder_layer.py +++ b/examples/training/huggingface/ls_hf_transformer_encoder_layer.py @@ -1,12 +1,4 @@ -import random - -from lightseq.training.ops.pytorch.quantization import ( - TensorQuantizer, - enable_quant, - disable_quant, - qat_mode, - ptq_mode, -) +from lightseq.training.ops.pytorch.quantization import qat_mode def get_hf_bert_enc_layer_params(layer): @@ -78,4 +70,4 @@ def gen_bert_config(training_args, config): bert_config, init_ws, init_bs ).cuda() if enable_quant: - model.bert.encoder.layer[i].apply(disable_quant) + model.bert.encoder.layer[i].apply(qat_mode) diff --git a/examples/training/huggingface/run_ner.sh b/examples/training/huggingface/run_ner.sh index fc089afe..01a36104 100644 --- a/examples/training/huggingface/run_ner.sh +++ b/examples/training/huggingface/run_ner.sh @@ -26,7 +26,7 @@ python3 -m torch.distributed.launch \ --do_train \ --do_eval \ --per_device_train_batch_size 16 \ - --num_train_epochs 3 \ + --num_train_epochs 10 \ --output_dir /tmp/test-ner \ --overwrite_output_dir \ --fp16 \ diff --git a/lightseq/training/ops/pytorch/gpt_encoder_layer.py b/lightseq/training/ops/pytorch/gpt_encoder_layer.py index ef6409a0..bfd9a559 100644 --- a/lightseq/training/ops/pytorch/gpt_encoder_layer.py +++ b/lightseq/training/ops/pytorch/gpt_encoder_layer.py @@ -1,16 +1,9 @@ -import math -from dataclasses import dataclass - import torch -from torch import nn -from torch.autograd import Function -from lightseq.training.ops.pytorch.builder.transformer_builder import TransformerBuilder from lightseq.training.ops.pytorch import transformer_cuda_module from lightseq.training.ops.pytorch.transformer_encoder_layer import ( LSTransformerEncoderLayer, ) -from lightseq.training.ops.pytorch.util import copy_para class LSGptEncoderLayer(LSTransformerEncoderLayer): diff --git a/lightseq/training/ops/pytorch/quantization.py b/lightseq/training/ops/pytorch/quantization.py index 64f53440..4fcf062f 100644 --- a/lightseq/training/ops/pytorch/quantization.py +++ b/lightseq/training/ops/pytorch/quantization.py @@ -1,7 +1,5 @@ -from audioop import bias -import torch import torch.nn.functional as F -from torch.nn import Parameter, Linear +from torch.nn import Linear from lightseq.training.pytorch_quantization.tensor_quant import ( QuantDescriptor, QUANT_DESC_8BIT_PER_TENSOR, diff --git a/lightseq/training/ops/pytorch/torch_transformer_layers.py b/lightseq/training/ops/pytorch/torch_transformer_layers.py index 1f59b61b..9a9b8c63 100644 --- a/lightseq/training/ops/pytorch/torch_transformer_layers.py +++ b/lightseq/training/ops/pytorch/torch_transformer_layers.py @@ -6,12 +6,11 @@ import math import uuid -from typing import Dict, Optional, Tuple, List +from typing import Dict, Optional, List import torch 
-import torch.nn.functional as F from torch import Tensor, nn -from torch.nn import Parameter, LayerNorm, Dropout, Linear +from torch.nn import Parameter, LayerNorm, Dropout from lightseq.training.ops.pytorch import util from lightseq.training.ops.pytorch.layer_base import ( @@ -27,6 +26,11 @@ ) +def copy_para(x, fp16): + y = util.copy_para(x) + return y.half() if fp16 else y.float() + + class MultiheadAttention(nn.Module): """Multi-headed attention. @@ -546,18 +550,32 @@ def __init__(self, config, initial_weights=None, initial_biases=None): return # load initial weights - self.self_attn.qkv_proj.weight = nn.Parameter(torch.cat(initial_weights[:3], 0)) - self.self_attn.qkv_proj.bias = nn.Parameter(torch.cat(initial_biases[:3], 0)) - self.self_attn.out_proj.weight = nn.Parameter(initial_weights[3]) - self.self_attn.out_proj.bias = nn.Parameter(initial_biases[3]) - self.self_attn_layer_norm.weight = nn.Parameter(initial_weights[4]) - self.self_attn_layer_norm.bias = nn.Parameter(initial_biases[4]) - self.fc1.weight = nn.Parameter(initial_weights[5]) - self.fc1.bias = nn.Parameter(initial_biases[5]) - self.fc2.weight = nn.Parameter(initial_weights[6]) - self.fc2.bias = nn.Parameter(initial_biases[6]) - self.final_layer_norm.weight = nn.Parameter(initial_weights[7]) - self.final_layer_norm.bias = nn.Parameter(initial_biases[7]) + self.self_attn.qkv_proj.weight.data.copy_( + copy_para(torch.cat(initial_weights[:3], 0), config.fp16) + ) + self.self_attn.qkv_proj.bias.data.copy_( + copy_para(torch.cat(initial_biases[:3], 0), config.fp16) + ) + self.self_attn.out_proj.weight.data.copy_( + copy_para(initial_weights[3], config.fp16) + ) + self.self_attn.out_proj.bias.data.copy_( + copy_para(initial_biases[3], config.fp16) + ) + self.self_attn_layer_norm.weight.data.copy_( + copy_para(initial_weights[4], config.fp16) + ) + self.self_attn_layer_norm.bias.data.copy_( + copy_para(initial_biases[4], config.fp16) + ) + self.fc1.weight.data.copy_(copy_para(initial_weights[5], config.fp16)) + self.fc1.bias.data.copy_(copy_para(initial_biases[5], config.fp16)) + self.fc2.weight.data.copy_(copy_para(initial_weights[6], config.fp16)) + self.fc2.bias.data.copy_(copy_para(initial_biases[6], config.fp16)) + self.final_layer_norm.weight.data.copy_( + copy_para(initial_weights[7], config.fp16) + ) + self.final_layer_norm.bias.data.copy_(copy_para(initial_biases[7], config.fp16)) def build_self_attention(self, embed_dim, nhead, attn_dropout): return MultiheadAttention( @@ -685,28 +703,64 @@ def __init__(self, config, initial_weights=None, initial_biases=None): return # load initial weights - self.self_attn.qkv_proj.weight = nn.Parameter(torch.cat(initial_weights[:3], 0)) - self.self_attn.qkv_proj.bias = nn.Parameter(torch.cat(initial_biases[:3], 0)) - self.self_attn.out_proj.weight = nn.Parameter(initial_weights[3]) - self.self_attn.out_proj.bias = nn.Parameter(initial_biases[3]) - self.self_attn_layer_norm.weight = nn.Parameter(initial_weights[4]) - self.self_attn_layer_norm.bias = nn.Parameter(initial_biases[4]) - self.encoder_attn.q_proj.weight = nn.Parameter(initial_weights[5]) - self.encoder_attn.q_proj.bias = nn.Parameter(initial_weights[5]) - self.encoder_attn.k_proj.weight = nn.Parameter(initial_weights[6]) - self.encoder_attn.k_proj.bias = nn.Parameter(initial_weights[6]) - self.encoder_attn.v_proj.weight = nn.Parameter(initial_weights[7]) - self.encoder_attn.v_proj.bias = nn.Parameter(initial_weights[7]) - self.encoder_attn.out_proj.weight = nn.Parameter(initial_weights[8]) - 
self.encoder_attn.out_proj.bias = nn.Parameter(initial_biases[8]) - self.encoder_attn_layer_norm.weight = nn.Parameter(initial_weights[9]) - self.encoder_attn_layer_norm.bias = nn.Parameter(initial_biases[9]) - self.fc1.weight = nn.Parameter(initial_weights[10]) - self.fc1.bias = nn.Parameter(initial_biases[10]) - self.fc2.weight = nn.Parameter(initial_weights[11]) - self.fc2.bias = nn.Parameter(initial_biases[11]) - self.final_layer_norm.weight = nn.Parameter(initial_weights[12]) - self.final_layer_norm.bias = nn.Parameter(initial_biases[12]) + self.self_attn.qkv_proj.weight.data.copy_( + copy_para(torch.cat(initial_weights[:3], 0), config.fp16) + ) + self.self_attn.qkv_proj.bias.data.copy_( + copy_para(torch.cat(initial_biases[:3], 0), config.fp16) + ) + self.self_attn.out_proj.weight.data.copy_( + copy_para(initial_weights[3], config.fp16) + ) + self.self_attn.out_proj.bias.data.copy_( + copy_para(initial_biases[3], config.fp16) + ) + self.self_attn_layer_norm.weight.data.copy_( + copy_para(initial_weights[4], config.fp16) + ) + self.self_attn_layer_norm.bias.data.copy_( + copy_para(initial_biases[4], config.fp16) + ) + self.encoder_attn.q_proj.weight.data.copy_( + copy_para(initial_weights[5], config.fp16) + ) + self.encoder_attn.q_proj.bias.data.copy_( + copy_para(initial_weights[5], config.fp16) + ) + self.encoder_attn.k_proj.weight.data.copy_( + copy_para(initial_weights[6], config.fp16) + ) + self.encoder_attn.k_proj.bias.data.copy_( + copy_para(initial_weights[6], config.fp16) + ) + self.encoder_attn.v_proj.weight.data.copy_( + copy_para(initial_weights[7], config.fp16) + ) + self.encoder_attn.v_proj.bias.data.copy_( + copy_para(initial_weights[7], config.fp16) + ) + self.encoder_attn.out_proj.weight.data.copy_( + copy_para(initial_weights[8], config.fp16) + ) + self.encoder_attn.out_proj.bias.data.copy_( + copy_para(initial_biases[8], config.fp16) + ) + self.encoder_attn_layer_norm.weight.data.copy_( + copy_para(initial_weights[9], config.fp16) + ) + self.encoder_attn_layer_norm.bias.data.copy_( + copy_para(initial_biases[9], config.fp16) + ) + self.fc1.weight.data.copy_(copy_para(initial_weights[10], config.fp16)) + self.fc1.bias.data.copy_(copy_para(initial_biases[10], config.fp16)) + self.fc2.weight.data.copy_(copy_para(initial_weights[11], config.fp16)) + self.fc2.bias.data.copy_(copy_para(initial_biases[11], config.fp16)) + self.final_layer_norm.weight.data.copy_( + copy_para(initial_weights[12], config.fp16) + ) + self.final_layer_norm.bias.data.copy_( + copy_para(initial_biases[12], config.fp16) + ) def build_self_attention( self, embed_dim, nhead, attn_dropout, add_bias_kv=False, add_zero_attn=False @@ -906,7 +960,9 @@ def __init__(self, config, initial_embeddings=None): # load initial weights if initial_embeddings is not None: - self.emb_lookup.weight = nn.Parameter(initial_embeddings) + self.emb_lookup.weight.data.copy_( + copy_para(initial_embeddings, config.fp16) + ) self.embed_positions = SinusoidalPositionalEmbedding( config.embedding_dim, config.padding_idx, config.max_seq_len, config.fp16 diff --git a/lightseq/training/ops/pytorch/transformer_decoder_layer.py b/lightseq/training/ops/pytorch/transformer_decoder_layer.py index d2d7ac8a..6e722f0d 100644 --- a/lightseq/training/ops/pytorch/transformer_decoder_layer.py +++ b/lightseq/training/ops/pytorch/transformer_decoder_layer.py @@ -1,17 +1,13 @@ import math -from dataclasses import dataclass import torch from torch import nn from torch.autograd import Function from lightseq.training.ops.pytorch import 
transformer_cuda_module -from lightseq.training.ops.pytorch.builder import TransformerBuilder from lightseq.training.ops.pytorch.util import ( copy_para, state_dict, - MODEL_ARCH, - check_config, calc_offset, ) from lightseq.training.ops.pytorch.layer_base import TransformerDecoderLayerBase diff --git a/lightseq/training/ops/pytorch/transformer_embedding_layer.py b/lightseq/training/ops/pytorch/transformer_embedding_layer.py index 757eb57f..1ff65da9 100644 --- a/lightseq/training/ops/pytorch/transformer_embedding_layer.py +++ b/lightseq/training/ops/pytorch/transformer_embedding_layer.py @@ -1,12 +1,8 @@ -import math -from dataclasses import dataclass - import torch from torch import nn from torch.autograd import Function from lightseq.training.ops.pytorch import transformer_cuda_module -from lightseq.training.ops.pytorch.builder import TransformerBuilder from lightseq.training.ops.pytorch.util import state_dict, get_pos_embedding from lightseq.training.ops.pytorch.layer_base import TransformerEmbeddingLayerBase diff --git a/lightseq/training/ops/pytorch/transformer_encoder_layer.py b/lightseq/training/ops/pytorch/transformer_encoder_layer.py index fea35d6c..b249d12c 100644 --- a/lightseq/training/ops/pytorch/transformer_encoder_layer.py +++ b/lightseq/training/ops/pytorch/transformer_encoder_layer.py @@ -1,5 +1,4 @@ import math -from dataclasses import dataclass import torch from torch import nn @@ -7,12 +6,9 @@ from lightseq.training.ops.pytorch.layer_base import TransformerEncoderLayerBase from lightseq.training.ops.pytorch import transformer_cuda_module -from lightseq.training.ops.pytorch.builder import TransformerBuilder from lightseq.training.ops.pytorch.util import ( copy_para, state_dict, - MODEL_ARCH, - check_config, calc_offset, ) @@ -134,33 +130,6 @@ def __init__(self, config, initial_weights=None, initial_biases=None): cur_para.copy_(b.view(-1)) idx += 1 - @staticmethod - def get_config(**kwargs): - @dataclass - class Config: - max_batch_tokens: int # max batch token numbers - max_seq_len: int # max sequence length - hidden_size: int # size of transformer hidden layers - intermediate_size: int # size of ffn inner size - nhead: int # number of heads in attention - attn_prob_dropout_ratio: float # attention score dropout ratio - activation_dropout_ratio: float # ffn activation dropout ratio - hidden_dropout_ratio: float # dropout ration before residual - pre_layer_norm: bool # pre layer norm or post - fp16: bool # fp16 presion - local_rank: int # rank in local node - activation_fn: str = "relu" # relu or gelu - - if "model" in kwargs: - if kwargs["model"] not in MODEL_ARCH: - raise ValueError("{} architecture is not supported.") - MODEL_ARCH[kwargs["model"]](kwargs) - del kwargs["model"] - - config = Config(**kwargs) - check_config(config) - return config - @staticmethod def gen_offset(hidden_size, intermediate_size): hs, ims = hidden_size, intermediate_size From 1d6c37666eab7a36950d81123678b5704a3a983c Mon Sep 17 00:00:00 2001 From: "weiyang.god" Date: Fri, 18 Mar 2022 00:20:21 +0800 Subject: [PATCH 06/49] finetune quant bert from fp16 ckpt --- .../ls_hf_transformer_encoder_layer.py | 17 ++++++--- examples/training/huggingface/run_glue.py | 12 +++--- examples/training/huggingface/run_glue.sh | 5 +-- examples/training/huggingface/run_ner.py | 12 +++--- examples/training/huggingface/run_ner.sh | 8 +--- .../training/huggingface/run_quant_glue.sh | 37 +++++++++++++++++++ .../training/huggingface/run_quant_ner.sh | 33 +++++++++++++++++ 7 files changed, 99 insertions(+), 25 
deletions(-) create mode 100644 examples/training/huggingface/run_quant_glue.sh create mode 100644 examples/training/huggingface/run_quant_ner.sh diff --git a/examples/training/huggingface/ls_hf_transformer_encoder_layer.py b/examples/training/huggingface/ls_hf_transformer_encoder_layer.py index 5564bd0c..2c6da210 100644 --- a/examples/training/huggingface/ls_hf_transformer_encoder_layer.py +++ b/examples/training/huggingface/ls_hf_transformer_encoder_layer.py @@ -1,4 +1,4 @@ -from lightseq.training.ops.pytorch.quantization import qat_mode +from lightseq.training.ops.pytorch.quantization import qat_mode, disable_quant def get_hf_bert_enc_layer_params(layer): @@ -26,15 +26,17 @@ def get_hf_bert_enc_layer_params(layer): return init_ws, init_bs -def inject_ls_enc_layer(model, training_args, config, enable_quant=False): - if enable_quant: +def inject_ls_enc_layer(model, training_args, model_args, config): + if model_args.model_type == 2: from lightseq.training.ops.pytorch.torch_transformer_layers import ( TransformerEncoderLayer, ) - else: + elif model_args.model_type == 1: from lightseq.training.ops.pytorch.transformer_encoder_layer import ( LSTransformerEncoderLayer as TransformerEncoderLayer, ) + else: + raise NotImplementedError class LSHFTransformerEncoderLayer(TransformerEncoderLayer): def __init__(self, *args, **kwargs): @@ -69,5 +71,8 @@ def gen_bert_config(training_args, config): model.bert.encoder.layer[i] = LSHFTransformerEncoderLayer( bert_config, init_ws, init_bs ).cuda() - if enable_quant: - model.bert.encoder.layer[i].apply(qat_mode) + if model_args.model_type == 2: + if model_args.enable_quant: + model.bert.encoder.layer[i].apply(qat_mode) + else: + model.bert.encoder.layer[i].apply(disable_quant) diff --git a/examples/training/huggingface/run_glue.py b/examples/training/huggingface/run_glue.py index 0c07c916..e60dd39b 100644 --- a/examples/training/huggingface/run_glue.py +++ b/examples/training/huggingface/run_glue.py @@ -224,9 +224,11 @@ class ModelArguments: "with private models)." }, ) - with_lightseq: bool = field( - default=True, - metadata={"help": "Whether to use lightseq TransformerEncoder"}, + model_type: int = field( + default=1, + metadata={ + "help": "0: original Hugging Face layer, 1: LightSeq CUDA layer, 2: custom Torch layer" + }, ) enable_quant: bool = field( default=False, @@ -414,8 +416,8 @@ def main(): ) # Replace with LightSeq encoder layers. - if model_args.with_lightseq: - inject_ls_enc_layer(model, training_args, config, model_args.enable_quant) + if model_args.model_type == 1 or model_args.model_type == 2: + inject_ls_enc_layer(model, training_args, model_args, config) # Preprocessing the datasets if data_args.task_name is not None: diff --git a/examples/training/huggingface/run_glue.sh b/examples/training/huggingface/run_glue.sh index 3a7cc33e..63eb2447 100644 --- a/examples/training/huggingface/run_glue.sh +++ b/examples/training/huggingface/run_glue.sh @@ -29,9 +29,8 @@ python3 -m torch.distributed.launch \ --learning_rate 2e-5 \ --num_train_epochs 50 \ --output_dir /tmp/$TASK_NAME/ \ - --overwrite_output_dir \ --fp16 \ --seed 1234 \ --logging_steps 10 \ - --with_lightseq true \ - --enable_quant true + --model_type 2 \ + --enable_quant false diff --git a/examples/training/huggingface/run_ner.py b/examples/training/huggingface/run_ner.py index eea1da87..c2b3fd6f 100644 --- a/examples/training/huggingface/run_ner.py +++ b/examples/training/huggingface/run_ner.py @@ -94,9 +94,11 @@ class ModelArguments: "with private models)." 
}, ) - with_lightseq: bool = field( - default=True, - metadata={"help": "Whether to use lightseq TransformerEncoder"}, + model_type: int = field( + default=1, + metadata={ + "help": "0: original Hugging Face layer, 1: LightSeq CUDA layer, 2: custom Torch layer" + }, ) enable_quant: bool = field( default=False, @@ -373,8 +375,8 @@ def get_label_list(labels): ) # Replace with LightSeq encoder layers. - if model_args.with_lightseq: - inject_ls_enc_layer(model, training_args, config, model_args.enable_quant) + if model_args.model_type == 1 or model_args.model_type == 2: + inject_ls_enc_layer(model, training_args, model_args, config) # Tokenizer check: this script requires a fast tokenizer. if not isinstance(tokenizer, PreTrainedTokenizerFast): diff --git a/examples/training/huggingface/run_ner.sh b/examples/training/huggingface/run_ner.sh index 01a36104..a5de37fa 100644 --- a/examples/training/huggingface/run_ner.sh +++ b/examples/training/huggingface/run_ner.sh @@ -14,10 +14,6 @@ THIS_DIR=$(dirname $(readlink -f $0)) -if [ -d "/tmp/test-ner/" ]; then - rm -rf /tmp/test-ner/ -fi - python3 -m torch.distributed.launch \ --nproc_per_node=8 \ $THIS_DIR/run_ner.py \ @@ -32,5 +28,5 @@ python3 -m torch.distributed.launch \ --fp16 \ --seed 1234 \ --logging_steps 10 \ - --with_lightseq true \ - --enable_quant true + --model_type 2 \ + --enable_quant false diff --git a/examples/training/huggingface/run_quant_glue.sh b/examples/training/huggingface/run_quant_glue.sh new file mode 100644 index 00000000..f923a3df --- /dev/null +++ b/examples/training/huggingface/run_quant_glue.sh @@ -0,0 +1,37 @@ +# Copyright 2021 The LightSeq Team +# Copyright 2020 The HuggingFace Team. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +THIS_DIR=$(dirname $(readlink -f $0)) + +export TASK_NAME=stsb + +python3 -m torch.distributed.launch \ + --nproc_per_node=8 \ + $THIS_DIR/run_glue.py \ + --model_name_or_path bert-large-cased \ + --task_name $TASK_NAME \ + --do_train \ + --do_eval \ + --max_seq_length 128 \ + --per_device_train_batch_size 32 \ + --num_train_epochs 100 \ + --output_dir /tmp/quant/$TASK_NAME/ \ + --overwrite_output_dir \ + --resume_from_checkpoint /tmp/$TASK_NAME/ \ + --fp16 \ + --seed 1234 \ + --logging_steps 10 \ + --model_type 2 \ + --enable_quant true diff --git a/examples/training/huggingface/run_quant_ner.sh b/examples/training/huggingface/run_quant_ner.sh new file mode 100644 index 00000000..e822de30 --- /dev/null +++ b/examples/training/huggingface/run_quant_ner.sh @@ -0,0 +1,33 @@ +# Copyright 2020 The HuggingFace Team. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+# See the License for the specific language governing permissions and +# limitations under the License. + +THIS_DIR=$(dirname $(readlink -f $0)) + +python3 -m torch.distributed.launch \ + --nproc_per_node=8 \ + $THIS_DIR/run_ner.py \ + --model_name_or_path bert-large-uncased \ + --dataset_name conll2003 \ + --do_train \ + --do_eval \ + --per_device_train_batch_size 16 \ + --num_train_epochs 20 \ + --output_dir /tmp/quant/test-ner \ + --overwrite_output_dir \ + --resume_from_checkpoint /tmp/test-ner/ \ + --fp16 \ + --seed 1234 \ + --logging_steps 10 \ + --model_type 2 \ + --enable_quant true From 872b54013e77092b5a8efe453fb5b1daa17b04c1 Mon Sep 17 00:00:00 2001 From: "weiyang.god" Date: Fri, 18 Mar 2022 23:39:02 +0800 Subject: [PATCH 07/49] add emb quant of bert --- examples/training/huggingface/gpt/run_clm.py | 8 +- .../ls_hf_transformer_encoder_layer.py | 56 ++++++++++--- examples/training/huggingface/run_glue.py | 8 +- examples/training/huggingface/run_glue.sh | 7 +- examples/training/huggingface/run_ner.py | 8 +- examples/training/huggingface/run_ner.sh | 2 +- .../training/huggingface/run_quant_glue.sh | 7 +- .../training/huggingface/run_quant_ner.sh | 2 +- .../ops/pytorch/torch_transformer_layers.py | 78 +++++++++++++++++++ 9 files changed, 147 insertions(+), 29 deletions(-) diff --git a/examples/training/huggingface/gpt/run_clm.py b/examples/training/huggingface/gpt/run_clm.py index 90b9dd8d..807d1934 100644 --- a/examples/training/huggingface/gpt/run_clm.py +++ b/examples/training/huggingface/gpt/run_clm.py @@ -66,7 +66,7 @@ MODEL_CONFIG_CLASSES = list(MODEL_FOR_CAUSAL_LM_MAPPING.keys()) -MODEL_TYPES = tuple(conf.model_type for conf in MODEL_CONFIG_CLASSES) +module_typeS = tuple(conf.module_type for conf in MODEL_CONFIG_CLASSES) @dataclass @@ -82,11 +82,11 @@ class ModelArguments: "Don't set if you want to train a model from scratch." 
}, ) - model_type: Optional[str] = field( + module_type: Optional[str] = field( default=None, metadata={ "help": "If training from scratch, pass a model type from the list: " - + ", ".join(MODEL_TYPES) + + ", ".join(module_typeS) }, ) config_overrides: Optional[str] = field( @@ -390,7 +390,7 @@ def main(): model_args.model_name_or_path, **config_kwargs ) else: - config = CONFIG_MAPPING[model_args.model_type]() + config = CONFIG_MAPPING[model_args.module_type]() logger.warning("You are instantiating a new config instance from scratch.") if model_args.config_overrides is not None: logger.info(f"Overriding config: {model_args.config_overrides}") diff --git a/examples/training/huggingface/ls_hf_transformer_encoder_layer.py b/examples/training/huggingface/ls_hf_transformer_encoder_layer.py index 2c6da210..6ad9b8d8 100644 --- a/examples/training/huggingface/ls_hf_transformer_encoder_layer.py +++ b/examples/training/huggingface/ls_hf_transformer_encoder_layer.py @@ -1,4 +1,5 @@ from lightseq.training.ops.pytorch.quantization import qat_mode, disable_quant +from lightseq.training.ops.pytorch.torch_transformer_layers import BertEmbeddingLayer def get_hf_bert_enc_layer_params(layer): @@ -26,18 +27,55 @@ def get_hf_bert_enc_layer_params(layer): return init_ws, init_bs -def inject_ls_enc_layer(model, training_args, model_args, config): - if model_args.model_type == 2: +def get_hf_bert_emb_layer_params(layer): + init_ws = [] + + init_ws.append(layer.word_embeddings.weight.detach().clone()) + init_ws.append(layer.position_embeddings.weight.detach().clone()) + init_ws.append(layer.token_type_embeddings.weight.detach().clone()) + init_ws.append(layer.LayerNorm.weight.detach().clone()) + init_ws.append(layer.LayerNorm.bias.detach().clone()) + + return init_ws + + +def gen_bert_emb_config(training_args, config): + bert_emb_config = BertEmbeddingLayer.get_config( + vocab_size=config.vocab_size, + embedding_dim=config.hidden_size, + max_batch_tokens=4096, + max_seq_len=config.max_position_embeddings, + padding_idx=config.pad_token_id, + dropout=config.hidden_dropout_prob, + fp16=training_args.fp16, + local_rank=training_args.local_rank, + ) + bert_emb_config.type_vocab_size = config.type_vocab_size + bert_emb_config.layer_norm_eps = config.layer_norm_eps + return bert_emb_config + + +def inject_ls_layer(model, training_args, model_args, config): + if model_args.module_type == 2: from lightseq.training.ops.pytorch.torch_transformer_layers import ( TransformerEncoderLayer, ) - elif model_args.model_type == 1: + elif model_args.module_type == 1: from lightseq.training.ops.pytorch.transformer_encoder_layer import ( LSTransformerEncoderLayer as TransformerEncoderLayer, ) else: raise NotImplementedError + if model_args.module_type == 2: + bert_emb_config = gen_bert_emb_config(training_args, config) + init_ws = get_hf_bert_emb_layer_params(model.bert.embeddings) + model.bert.embeddings = BertEmbeddingLayer(bert_emb_config, init_ws) + if model_args.enable_quant: + model.bert.embeddings.apply(qat_mode) + else: + model.bert.embeddings.apply(disable_quant) + class LSHFTransformerEncoderLayer(TransformerEncoderLayer): def __init__(self, *args, **kwargs): super(LSHFTransformerEncoderLayer, self).__init__(*args, **kwargs) @@ -48,8 +86,8 @@ def forward(self, hidden_states, encoder_padding_mask, *args, **kwargs): output = super().forward(hidden_states, ls_encoder_padding_mask) return (output, None, None, None) - def gen_bert_config(training_args, config): - bert_config = TransformerEncoderLayer.get_config( + def 
gen_bert_enc_config(training_args, config): + bert_enc_config = TransformerEncoderLayer.get_config( max_batch_tokens=4096, max_seq_len=config.max_position_embeddings, hidden_size=config.hidden_size, @@ -63,15 +101,15 @@ def gen_bert_config(training_args, config): local_rank=training_args.local_rank, activation_fn="gelu", ) - return bert_config + return bert_enc_config for i in range(config.num_hidden_layers): - bert_config = gen_bert_config(training_args, config) + bert_enc_config = gen_bert_enc_config(training_args, config) init_ws, init_bs = get_hf_bert_enc_layer_params(model.bert.encoder.layer[i]) model.bert.encoder.layer[i] = LSHFTransformerEncoderLayer( - bert_config, init_ws, init_bs + bert_enc_config, init_ws, init_bs ).cuda() - if model_args.model_type == 2: + if model_args.module_type == 2: if model_args.enable_quant: model.bert.encoder.layer[i].apply(qat_mode) else: diff --git a/examples/training/huggingface/run_glue.py b/examples/training/huggingface/run_glue.py index e60dd39b..b1319a9b 100644 --- a/examples/training/huggingface/run_glue.py +++ b/examples/training/huggingface/run_glue.py @@ -45,7 +45,7 @@ from transformers.trainer_utils import get_last_checkpoint from transformers.utils import check_min_version from transformers.utils.versions import require_version -from ls_hf_transformer_encoder_layer import inject_ls_enc_layer +from ls_hf_transformer_encoder_layer import inject_ls_layer # Will error if the minimal version of Transformers is not installed. Remove at your own risks. @@ -224,7 +224,7 @@ class ModelArguments: "with private models)." }, ) - model_type: int = field( + module_type: int = field( default=1, metadata={ "help": "0: original Hugging Face layer, 1: LightSeq CUDA layer, 2: custom Torch layer" @@ -416,8 +416,8 @@ def main(): ) # Replace with LightSeq encoder layers. - if model_args.model_type == 1 or model_args.model_type == 2: - inject_ls_enc_layer(model, training_args, model_args, config) + if model_args.module_type == 1 or model_args.module_type == 2: + inject_ls_layer(model, training_args, model_args, config) # Preprocessing the datasets if data_args.task_name is not None: diff --git a/examples/training/huggingface/run_glue.sh b/examples/training/huggingface/run_glue.sh index 63eb2447..d9ef2525 100644 --- a/examples/training/huggingface/run_glue.sh +++ b/examples/training/huggingface/run_glue.sh @@ -15,7 +15,7 @@ THIS_DIR=$(dirname $(readlink -f $0)) -export TASK_NAME=stsb +export TASK_NAME=sst2 python3 -m torch.distributed.launch \ --nproc_per_node=8 \ @@ -27,10 +27,11 @@ python3 -m torch.distributed.launch \ --max_seq_length 128 \ --per_device_train_batch_size 32 \ --learning_rate 2e-5 \ - --num_train_epochs 50 \ + --num_train_epochs 10 \ --output_dir /tmp/$TASK_NAME/ \ + --overwrite_output_dir \ --fp16 \ --seed 1234 \ --logging_steps 10 \ - --model_type 2 \ + --module_type 2 \ --enable_quant false diff --git a/examples/training/huggingface/run_ner.py b/examples/training/huggingface/run_ner.py index c2b3fd6f..e4729278 100644 --- a/examples/training/huggingface/run_ner.py +++ b/examples/training/huggingface/run_ner.py @@ -43,7 +43,7 @@ ) from transformers.trainer_utils import get_last_checkpoint from transformers.utils import check_min_version -from ls_hf_transformer_encoder_layer import inject_ls_enc_layer +from ls_hf_transformer_encoder_layer import inject_ls_layer # Will error if the minimal version of Transformers is not installed. Remove at your own risks. @@ -94,7 +94,7 @@ class ModelArguments: "with private models)." 
}, ) - model_type: int = field( + module_type: int = field( default=1, metadata={ "help": "0: original Hugging Face layer, 1: LightSeq CUDA layer, 2: custom Torch layer" @@ -375,8 +375,8 @@ def get_label_list(labels): ) # Replace with LightSeq encoder layers. - if model_args.model_type == 1 or model_args.model_type == 2: - inject_ls_enc_layer(model, training_args, model_args, config) + if model_args.module_type == 1 or model_args.module_type == 2: + inject_ls_layer(model, training_args, model_args, config) # Tokenizer check: this script requires a fast tokenizer. if not isinstance(tokenizer, PreTrainedTokenizerFast): diff --git a/examples/training/huggingface/run_ner.sh b/examples/training/huggingface/run_ner.sh index a5de37fa..87bda517 100644 --- a/examples/training/huggingface/run_ner.sh +++ b/examples/training/huggingface/run_ner.sh @@ -28,5 +28,5 @@ python3 -m torch.distributed.launch \ --fp16 \ --seed 1234 \ --logging_steps 10 \ - --model_type 2 \ + --module_type 2 \ --enable_quant false diff --git a/examples/training/huggingface/run_quant_glue.sh b/examples/training/huggingface/run_quant_glue.sh index f923a3df..6d54e6bf 100644 --- a/examples/training/huggingface/run_quant_glue.sh +++ b/examples/training/huggingface/run_quant_glue.sh @@ -15,7 +15,7 @@ THIS_DIR=$(dirname $(readlink -f $0)) -export TASK_NAME=stsb +export TASK_NAME=sst2 python3 -m torch.distributed.launch \ --nproc_per_node=8 \ @@ -26,12 +26,13 @@ python3 -m torch.distributed.launch \ --do_eval \ --max_seq_length 128 \ --per_device_train_batch_size 32 \ - --num_train_epochs 100 \ + --learning_rate 2e-6 \ + --num_train_epochs 20 \ --output_dir /tmp/quant/$TASK_NAME/ \ --overwrite_output_dir \ --resume_from_checkpoint /tmp/$TASK_NAME/ \ --fp16 \ --seed 1234 \ --logging_steps 10 \ - --model_type 2 \ + --module_type 2 \ --enable_quant true diff --git a/examples/training/huggingface/run_quant_ner.sh b/examples/training/huggingface/run_quant_ner.sh index e822de30..ad280815 100644 --- a/examples/training/huggingface/run_quant_ner.sh +++ b/examples/training/huggingface/run_quant_ner.sh @@ -29,5 +29,5 @@ python3 -m torch.distributed.launch \ --fp16 \ --seed 1234 \ --logging_steps 10 \ - --model_type 2 \ + --module_type 2 \ --enable_quant true diff --git a/lightseq/training/ops/pytorch/torch_transformer_layers.py b/lightseq/training/ops/pytorch/torch_transformer_layers.py index 9a9b8c63..b7347bc1 100644 --- a/lightseq/training/ops/pytorch/torch_transformer_layers.py +++ b/lightseq/training/ops/pytorch/torch_transformer_layers.py @@ -1047,3 +1047,81 @@ def forward( .view(bsz, seq_len, -1) * mask ).detach() + + +class BertEmbeddingLayer(TransformerEmbeddingLayerBase): + def __init__(self, config, initial_weights=None): + super().__init__() + self.word_embeddings = nn.Embedding( + config.vocab_size, config.embedding_dim, padding_idx=config.padding_idx + ) + self.position_embeddings = nn.Embedding( + config.max_seq_len, config.embedding_dim + ) + self.token_type_embeddings = nn.Embedding( + config.type_vocab_size, config.embedding_dim + ) + + self.LayerNorm = nn.LayerNorm(config.embedding_dim, eps=config.layer_norm_eps) + self.dropout = nn.Dropout(config.dropout) + + self.register_buffer( + "position_ids", torch.arange(config.max_seq_len).expand((1, -1)) + ) + self.register_buffer( + "token_type_ids", + torch.zeros(self.position_ids.size(), dtype=torch.long), + persistent=False, + ) + + self.emb_quant = TensorQuantizer(weight_quant_config) + self.pos_emb_quant = TensorQuantizer(weight_quant_config) + + if initial_weights is None: + 
return + + # load initial weights + self.word_embeddings.weight.data.copy_( + copy_para(initial_weights[0], config.fp16) + ) + self.position_embeddings.weight.data.copy_( + copy_para(initial_weights[1], config.fp16) + ) + self.token_type_embeddings.weight.data.copy_( + copy_para(initial_weights[2], config.fp16) + ) + self.LayerNorm.weight.data.copy_(copy_para(initial_weights[3], config.fp16)) + self.LayerNorm.bias.data.copy_(copy_para(initial_weights[4], config.fp16)) + + def forward( + self, + input_ids=None, + token_type_ids=None, + position_ids=None, + inputs_embeds=None, + past_key_values_length=0, + ): + assert input_ids is not None + assert position_ids is None + assert inputs_embeds is None + assert torch.all(token_type_ids == 0) + + input_shape = input_ids.size() + seq_length = input_shape[1] + position_ids = self.position_ids[:, :seq_length] + + token_type_ids = self.token_type_ids[:, :seq_length].expand( + input_shape[0], seq_length + ) + + inputs_embeds = self.word_embeddings(input_ids) + token_type_embeddings = self.token_type_embeddings(token_type_ids) + + embeddings = inputs_embeds + token_type_embeddings + embeddings = self.emb_quant(embeddings) + position_embeddings = self.position_embeddings(position_ids) + position_embeddings = self.pos_emb_quant(position_embeddings) + embeddings += position_embeddings + embeddings = self.LayerNorm(embeddings) + embeddings = self.dropout(embeddings) + return embeddings From cf0caa6ebc3f328cf972e5b32fb5c14a56eeb835 Mon Sep 17 00:00:00 2001 From: "weiyang.god" Date: Sat, 19 Mar 2022 00:31:53 +0800 Subject: [PATCH 08/49] add example of hf bert squad training, modify dir of huggingface training --- examples/training/huggingface/README.md | 14 - examples/training/huggingface/bert/README.md | 19 + .../huggingface/{ => bert}/__init__.py | 0 .../huggingface/{ => bert/glue}/run_glue.py | 0 .../huggingface/{ => bert/glue}/run_glue.sh | 2 +- .../{ => bert/glue}/run_quant_glue.sh | 2 +- .../ls_hf_transformer_encoder_layer.py | 0 .../huggingface/{ => bert/ner}/run_ner.py | 0 .../huggingface/{ => bert/ner}/run_ner.sh | 2 +- .../{ => bert/ner}/run_quant_ner.sh | 2 +- .../training/huggingface/bert/qa/run_qa.py | 663 ++++++++++++++++++ .../training/huggingface/bert/qa/run_qa.sh | 35 + .../huggingface/bert/qa/trainer_qa.py | 105 +++ .../training/huggingface/bert/qa/utils_qa.py | 434 ++++++++++++ 14 files changed, 1260 insertions(+), 18 deletions(-) delete mode 100644 examples/training/huggingface/README.md create mode 100644 examples/training/huggingface/bert/README.md rename examples/training/huggingface/{ => bert}/__init__.py (100%) rename examples/training/huggingface/{ => bert/glue}/run_glue.py (100%) rename examples/training/huggingface/{ => bert/glue}/run_glue.sh (96%) rename examples/training/huggingface/{ => bert/glue}/run_quant_glue.sh (96%) rename examples/training/huggingface/{ => bert}/ls_hf_transformer_encoder_layer.py (100%) rename examples/training/huggingface/{ => bert/ner}/run_ner.py (100%) rename examples/training/huggingface/{ => bert/ner}/run_ner.sh (95%) rename examples/training/huggingface/{ => bert/ner}/run_quant_ner.sh (95%) create mode 100644 examples/training/huggingface/bert/qa/run_qa.py create mode 100644 examples/training/huggingface/bert/qa/run_qa.sh create mode 100644 examples/training/huggingface/bert/qa/trainer_qa.py create mode 100644 examples/training/huggingface/bert/qa/utils_qa.py diff --git a/examples/training/huggingface/README.md b/examples/training/huggingface/README.md deleted file mode 100644 index 
d8686202..00000000
--- a/examples/training/huggingface/README.md
+++ /dev/null
@@ -1,14 +0,0 @@
-# LightSeq for HuggingFace
-
-This repo contains an example for how to use LightSeq to accerate the training of BERT in HuggingFace [Transformers](https://github.com/huggingface/transformers).
-
-We modify the token classification [examples](https://github.com/huggingface/transformers/tree/master/examples/pytorch/token-classification) in HuggingFace Transformers by replacing their encoder layers with the fused ones in LightSeq.
-
-First you should install these requirements.
-
-```shell
-pip install torch ninja transformers seqeval datasets
-```
-
-Then you can easily fine-tunes BERT on CoNLL-2003 by running the bash script `run_ner.sh`
-or on GLUE by `run_glue.sh`. From our tests, speedup is about 1.6x .
diff --git a/examples/training/huggingface/bert/README.md b/examples/training/huggingface/bert/README.md
new file mode 100644
index 00000000..d96138ad
--- /dev/null
+++ b/examples/training/huggingface/bert/README.md
@@ -0,0 +1,19 @@
+# LightSeq for HuggingFace BERT
+
+This repo contains an example of how to use LightSeq to accelerate the training of BERT in HuggingFace [Transformers](https://github.com/huggingface/transformers).
+
+We modify examples such as the token classification [examples](https://github.com/huggingface/transformers/tree/master/examples/pytorch/token-classification) in HuggingFace Transformers by replacing their encoder layers with the fused ones in LightSeq.
+
+First you should install these requirements.
+
+```shell
+pip install torch ninja transformers seqeval datasets
+```
+
+Before running any training, you need to switch to the current directory:
+```shell
+cd examples/training/huggingface/bert
+```
+
+Then you can easily fine-tune BERT on different tasks by running the bash script `run_ner.sh`
+or on GLUE by `run_glue.sh`. From our tests, the speedup is about 1.6x.
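The README above refers to replacing the HuggingFace encoder layers with the fused LightSeq ones. The snippet below is only a rough sketch of how the injection helper introduced in this patch series can be driven outside of `run_glue.py`/`run_ner.py`; the `SimpleNamespace` objects stand in for the parsed `ModelArguments`/`TrainingArguments` dataclasses and are an assumption made here for brevity, as is running from a directory where `ls_hf_transformer_encoder_layer` is importable.

```python
from types import SimpleNamespace

from transformers import AutoConfig, AutoModelForSequenceClassification
from ls_hf_transformer_encoder_layer import inject_ls_layer

config = AutoConfig.from_pretrained("bert-base-cased", num_labels=2)
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-cased", config=config
)

# Stand-ins for the parsed argument dataclasses (assumed here, not part of the patch).
# module_type: 1 = LightSeq CUDA layer, 2 = custom Torch layer (supports quantization).
model_args = SimpleNamespace(module_type=2, enable_quant=False)
training_args = SimpleNamespace(fp16=True, local_rank=0)

# Replaces model.bert.encoder.layer[i] (and, for module_type 2, model.bert.embeddings)
# in place with LightSeq layers; a CUDA device is required because the injected
# layers are moved to GPU.
inject_ls_layer(model, training_args, model_args, config)
```

In the real scripts these arguments come from `HfArgumentParser`, and the resulting model is then trained with the standard HuggingFace `Trainer`.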
diff --git a/examples/training/huggingface/__init__.py b/examples/training/huggingface/bert/__init__.py similarity index 100% rename from examples/training/huggingface/__init__.py rename to examples/training/huggingface/bert/__init__.py diff --git a/examples/training/huggingface/run_glue.py b/examples/training/huggingface/bert/glue/run_glue.py similarity index 100% rename from examples/training/huggingface/run_glue.py rename to examples/training/huggingface/bert/glue/run_glue.py diff --git a/examples/training/huggingface/run_glue.sh b/examples/training/huggingface/bert/glue/run_glue.sh similarity index 96% rename from examples/training/huggingface/run_glue.sh rename to examples/training/huggingface/bert/glue/run_glue.sh index d9ef2525..e6a82979 100644 --- a/examples/training/huggingface/run_glue.sh +++ b/examples/training/huggingface/bert/glue/run_glue.sh @@ -20,7 +20,7 @@ export TASK_NAME=sst2 python3 -m torch.distributed.launch \ --nproc_per_node=8 \ $THIS_DIR/run_glue.py \ - --model_name_or_path bert-large-cased \ + --model_name_or_path bert-base-cased \ --task_name $TASK_NAME \ --do_train \ --do_eval \ diff --git a/examples/training/huggingface/run_quant_glue.sh b/examples/training/huggingface/bert/glue/run_quant_glue.sh similarity index 96% rename from examples/training/huggingface/run_quant_glue.sh rename to examples/training/huggingface/bert/glue/run_quant_glue.sh index 6d54e6bf..46f3e58f 100644 --- a/examples/training/huggingface/run_quant_glue.sh +++ b/examples/training/huggingface/bert/glue/run_quant_glue.sh @@ -20,7 +20,7 @@ export TASK_NAME=sst2 python3 -m torch.distributed.launch \ --nproc_per_node=8 \ $THIS_DIR/run_glue.py \ - --model_name_or_path bert-large-cased \ + --model_name_or_path bert-base-cased \ --task_name $TASK_NAME \ --do_train \ --do_eval \ diff --git a/examples/training/huggingface/ls_hf_transformer_encoder_layer.py b/examples/training/huggingface/bert/ls_hf_transformer_encoder_layer.py similarity index 100% rename from examples/training/huggingface/ls_hf_transformer_encoder_layer.py rename to examples/training/huggingface/bert/ls_hf_transformer_encoder_layer.py diff --git a/examples/training/huggingface/run_ner.py b/examples/training/huggingface/bert/ner/run_ner.py similarity index 100% rename from examples/training/huggingface/run_ner.py rename to examples/training/huggingface/bert/ner/run_ner.py diff --git a/examples/training/huggingface/run_ner.sh b/examples/training/huggingface/bert/ner/run_ner.sh similarity index 95% rename from examples/training/huggingface/run_ner.sh rename to examples/training/huggingface/bert/ner/run_ner.sh index 87bda517..a01e7041 100644 --- a/examples/training/huggingface/run_ner.sh +++ b/examples/training/huggingface/bert/ner/run_ner.sh @@ -17,7 +17,7 @@ THIS_DIR=$(dirname $(readlink -f $0)) python3 -m torch.distributed.launch \ --nproc_per_node=8 \ $THIS_DIR/run_ner.py \ - --model_name_or_path bert-large-uncased \ + --model_name_or_path bert-base-uncased \ --dataset_name conll2003 \ --do_train \ --do_eval \ diff --git a/examples/training/huggingface/run_quant_ner.sh b/examples/training/huggingface/bert/ner/run_quant_ner.sh similarity index 95% rename from examples/training/huggingface/run_quant_ner.sh rename to examples/training/huggingface/bert/ner/run_quant_ner.sh index ad280815..6e64c0a6 100644 --- a/examples/training/huggingface/run_quant_ner.sh +++ b/examples/training/huggingface/bert/ner/run_quant_ner.sh @@ -17,7 +17,7 @@ THIS_DIR=$(dirname $(readlink -f $0)) python3 -m torch.distributed.launch \ --nproc_per_node=8 \ 
$THIS_DIR/run_ner.py \ - --model_name_or_path bert-large-uncased \ + --model_name_or_path bert-base-uncased \ --dataset_name conll2003 \ --do_train \ --do_eval \ diff --git a/examples/training/huggingface/bert/qa/run_qa.py b/examples/training/huggingface/bert/qa/run_qa.py new file mode 100644 index 00000000..1657c192 --- /dev/null +++ b/examples/training/huggingface/bert/qa/run_qa.py @@ -0,0 +1,663 @@ +#!/usr/bin/env python +# coding=utf-8 +# Copyright 2021 The LightSeq Team +# Copyright 2020 The HuggingFace Inc. team. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +""" +Fine-tuning the library models for question answering using a slightly adapted version of the 🤗 Trainer. +""" +# You can also adapt this script on your own question answering task. Pointers for this are left as comments. + +import logging +import os +import sys +from dataclasses import dataclass, field +from typing import Optional + +import datasets +from datasets import load_dataset, load_metric + +import transformers +from trainer_qa import QuestionAnsweringTrainer +from transformers import ( + AutoConfig, + AutoModelForQuestionAnswering, + AutoTokenizer, + DataCollatorWithPadding, + EvalPrediction, + HfArgumentParser, + PreTrainedTokenizerFast, + TrainingArguments, + default_data_collator, + set_seed, +) +from transformers.trainer_utils import get_last_checkpoint +from transformers.utils import check_min_version +from transformers.utils.versions import require_version +from utils_qa import postprocess_qa_predictions +from ls_hf_transformer_encoder_layer import inject_ls_layer + + +# Will error if the minimal version of Transformers is not installed. Remove at your own risks. +check_min_version("4.17.0") + +require_version("datasets>=1.8.0", "To fix: pip install -r examples/pytorch/question-answering/requirements.txt") + +logger = logging.getLogger(__name__) + + +@dataclass +class ModelArguments: + """ + Arguments pertaining to which model/config/tokenizer we are going to fine-tune from. + """ + + model_name_or_path: str = field( + metadata={"help": "Path to pretrained model or model identifier from huggingface.co/models"} + ) + config_name: Optional[str] = field( + default=None, metadata={"help": "Pretrained config name or path if not the same as model_name"} + ) + tokenizer_name: Optional[str] = field( + default=None, metadata={"help": "Pretrained tokenizer name or path if not the same as model_name"} + ) + cache_dir: Optional[str] = field( + default=None, + metadata={"help": "Path to directory to store the pretrained models downloaded from huggingface.co"}, + ) + model_revision: str = field( + default="main", + metadata={"help": "The specific model version to use (can be a branch name, tag name or commit id)."}, + ) + use_auth_token: bool = field( + default=False, + metadata={ + "help": "Will use the token generated when running `transformers-cli login` (necessary to use this script " + "with private models)." 
+ }, + ) + module_type: int = field( + default=1, + metadata={ + "help": "0: original Hugging Face layer, 1: LightSeq CUDA layer, 2: custom Torch layer" + }, + ) + enable_quant: bool = field( + default=False, + metadata={"help": "Whether to enable quantization"}, + ) + + +@dataclass +class DataTrainingArguments: + """ + Arguments pertaining to what data we are going to input our model for training and eval. + """ + + dataset_name: Optional[str] = field( + default=None, metadata={"help": "The name of the dataset to use (via the datasets library)."} + ) + dataset_config_name: Optional[str] = field( + default=None, metadata={"help": "The configuration name of the dataset to use (via the datasets library)."} + ) + train_file: Optional[str] = field(default=None, metadata={"help": "The input training data file (a text file)."}) + validation_file: Optional[str] = field( + default=None, + metadata={"help": "An optional input evaluation data file to evaluate the perplexity on (a text file)."}, + ) + test_file: Optional[str] = field( + default=None, + metadata={"help": "An optional input test data file to evaluate the perplexity on (a text file)."}, + ) + overwrite_cache: bool = field( + default=False, metadata={"help": "Overwrite the cached training and evaluation sets"} + ) + preprocessing_num_workers: Optional[int] = field( + default=None, + metadata={"help": "The number of processes to use for the preprocessing."}, + ) + max_seq_length: int = field( + default=384, + metadata={ + "help": "The maximum total input sequence length after tokenization. Sequences longer " + "than this will be truncated, sequences shorter will be padded." + }, + ) + pad_to_max_length: bool = field( + default=True, + metadata={ + "help": "Whether to pad all samples to `max_seq_length`. " + "If False, will pad the samples dynamically when batching to the maximum length in the batch (which can " + "be faster on GPU but will be slower on TPU)." + }, + ) + max_train_samples: Optional[int] = field( + default=None, + metadata={ + "help": "For debugging purposes or quicker training, truncate the number of training examples to this " + "value if set." + }, + ) + max_eval_samples: Optional[int] = field( + default=None, + metadata={ + "help": "For debugging purposes or quicker training, truncate the number of evaluation examples to this " + "value if set." + }, + ) + max_predict_samples: Optional[int] = field( + default=None, + metadata={ + "help": "For debugging purposes or quicker training, truncate the number of prediction examples to this " + "value if set." + }, + ) + version_2_with_negative: bool = field( + default=False, metadata={"help": "If true, some of the examples do not have an answer."} + ) + null_score_diff_threshold: float = field( + default=0.0, + metadata={ + "help": "The threshold used to select the null answer: if the best answer has a score that is less than " + "the score of the null answer minus this threshold, the null answer is selected for this example. " + "Only useful when `version_2_with_negative=True`." + }, + ) + doc_stride: int = field( + default=128, + metadata={"help": "When splitting up a long document into chunks, how much stride to take between chunks."}, + ) + n_best_size: int = field( + default=20, + metadata={"help": "The total number of n-best predictions to generate when looking for an answer."}, + ) + max_answer_length: int = field( + default=30, + metadata={ + "help": "The maximum length of an answer that can be generated. 
This is needed because the start " + "and end predictions are not conditioned on one another." + }, + ) + + def __post_init__(self): + if ( + self.dataset_name is None + and self.train_file is None + and self.validation_file is None + and self.test_file is None + ): + raise ValueError("Need either a dataset name or a training/validation file/test_file.") + else: + if self.train_file is not None: + extension = self.train_file.split(".")[-1] + assert extension in ["csv", "json"], "`train_file` should be a csv or a json file." + if self.validation_file is not None: + extension = self.validation_file.split(".")[-1] + assert extension in ["csv", "json"], "`validation_file` should be a csv or a json file." + if self.test_file is not None: + extension = self.test_file.split(".")[-1] + assert extension in ["csv", "json"], "`test_file` should be a csv or a json file." + + +def main(): + # See all possible arguments in src/transformers/training_args.py + # or by passing the --help flag to this script. + # We now keep distinct sets of args, for a cleaner separation of concerns. + + parser = HfArgumentParser((ModelArguments, DataTrainingArguments, TrainingArguments)) + if len(sys.argv) == 2 and sys.argv[1].endswith(".json"): + # If we pass only one argument to the script and it's the path to a json file, + # let's parse it to get our arguments. + model_args, data_args, training_args = parser.parse_json_file(json_file=os.path.abspath(sys.argv[1])) + else: + model_args, data_args, training_args = parser.parse_args_into_dataclasses() + + # Setup logging + logging.basicConfig( + format="%(asctime)s - %(levelname)s - %(name)s - %(message)s", + datefmt="%m/%d/%Y %H:%M:%S", + handlers=[logging.StreamHandler(sys.stdout)], + ) + + log_level = training_args.get_process_log_level() + logger.setLevel(log_level) + datasets.utils.logging.set_verbosity(log_level) + transformers.utils.logging.set_verbosity(log_level) + transformers.utils.logging.enable_default_handler() + transformers.utils.logging.enable_explicit_format() + + # Log on each process the small summary: + logger.warning( + f"Process rank: {training_args.local_rank}, device: {training_args.device}, n_gpu: {training_args.n_gpu}" + + f"distributed training: {bool(training_args.local_rank != -1)}, 16-bits training: {training_args.fp16}" + ) + logger.info(f"Training/evaluation parameters {training_args}") + + # Detecting last checkpoint. + last_checkpoint = None + if os.path.isdir(training_args.output_dir) and training_args.do_train and not training_args.overwrite_output_dir: + last_checkpoint = get_last_checkpoint(training_args.output_dir) + if last_checkpoint is None and len(os.listdir(training_args.output_dir)) > 0: + raise ValueError( + f"Output directory ({training_args.output_dir}) already exists and is not empty. " + "Use --overwrite_output_dir to overcome." + ) + elif last_checkpoint is not None and training_args.resume_from_checkpoint is None: + logger.info( + f"Checkpoint detected, resuming training at {last_checkpoint}. To avoid this behavior, change " + "the `--output_dir` or add `--overwrite_output_dir` to train from scratch." + ) + + # Set seed before initializing model. + set_seed(training_args.seed) + + # Get the datasets: you can either provide your own CSV/JSON/TXT training and evaluation files (see below) + # or just provide the name of one of the public datasets available on the hub at https://huggingface.co/datasets/ + # (the dataset will be downloaded automatically from the datasets Hub). 
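Note: the resume logic above relies on `get_last_checkpoint` from `transformers.trainer_utils`, which returns the `checkpoint-<step>` folder with the highest step, or `None` if the output directory holds no checkpoints. A small self-contained illustration (directory names are made up):

```python
import os
import tempfile

from transformers.trainer_utils import get_last_checkpoint

# Create two fake checkpoint folders and ask which one training would resume from.
with tempfile.TemporaryDirectory() as output_dir:
    for step in (500, 1000):
        os.makedirs(os.path.join(output_dir, f"checkpoint-{step}"))
    print(os.path.basename(get_last_checkpoint(output_dir)))  # checkpoint-1000
```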
+ # + # For CSV/JSON files, this script will use the column called 'text' or the first column if no column called + # 'text' is found. You can easily tweak this behavior (see below). + # + # In distributed training, the load_dataset function guarantee that only one local process can concurrently + # download the dataset. + if data_args.dataset_name is not None: + # Downloading and loading a dataset from the hub. + raw_datasets = load_dataset( + data_args.dataset_name, data_args.dataset_config_name, cache_dir=model_args.cache_dir + ) + else: + data_files = {} + if data_args.train_file is not None: + data_files["train"] = data_args.train_file + extension = data_args.train_file.split(".")[-1] + + if data_args.validation_file is not None: + data_files["validation"] = data_args.validation_file + extension = data_args.validation_file.split(".")[-1] + if data_args.test_file is not None: + data_files["test"] = data_args.test_file + extension = data_args.test_file.split(".")[-1] + raw_datasets = load_dataset(extension, data_files=data_files, field="data", cache_dir=model_args.cache_dir) + # See more about loading any type of standard or custom dataset (from files, python dict, pandas DataFrame, etc) at + # https://huggingface.co/docs/datasets/loading_datasets.html. + + # Load pretrained model and tokenizer + # + # Distributed training: + # The .from_pretrained methods guarantee that only one local process can concurrently + # download model & vocab. + config = AutoConfig.from_pretrained( + model_args.config_name if model_args.config_name else model_args.model_name_or_path, + cache_dir=model_args.cache_dir, + revision=model_args.model_revision, + use_auth_token=True if model_args.use_auth_token else None, + ) + tokenizer = AutoTokenizer.from_pretrained( + model_args.tokenizer_name if model_args.tokenizer_name else model_args.model_name_or_path, + cache_dir=model_args.cache_dir, + use_fast=True, + revision=model_args.model_revision, + use_auth_token=True if model_args.use_auth_token else None, + ) + model = AutoModelForQuestionAnswering.from_pretrained( + model_args.model_name_or_path, + from_tf=bool(".ckpt" in model_args.model_name_or_path), + config=config, + cache_dir=model_args.cache_dir, + revision=model_args.model_revision, + use_auth_token=True if model_args.use_auth_token else None, + ) + + # Replace with LightSeq encoder layers. + if model_args.module_type == 1 or model_args.module_type == 2: + inject_ls_layer(model, training_args, model_args, config) + + # Tokenizer check: this script requires a fast tokenizer. + if not isinstance(tokenizer, PreTrainedTokenizerFast): + raise ValueError( + "This example script only works for models that have a fast tokenizer. Checkout the big table of models " + "at https://huggingface.co/transformers/index.html#supported-frameworks to find the model types that meet this " + "requirement" + ) + + # Preprocessing the datasets. + # Preprocessing is slighlty different for training and evaluation. + if training_args.do_train: + column_names = raw_datasets["train"].column_names + elif training_args.do_eval: + column_names = raw_datasets["validation"].column_names + else: + column_names = raw_datasets["test"].column_names + question_column_name = "question" if "question" in column_names else column_names[0] + context_column_name = "context" if "context" in column_names else column_names[1] + answer_column_name = "answers" if "answers" in column_names else column_names[2] + + # Padding side determines if we do (question|context) or (context|question). 
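Note: the layer injection above is gated on the `module_type` and `enable_quant` switches added to `ModelArguments`. A pared-down stand-in for those two fields, only to show how they parse from the command line with `HfArgumentParser`; the real fields live in `ModelArguments` in `run_qa.py`:

```python
from dataclasses import dataclass, field

from transformers import HfArgumentParser


# Illustrative stand-in, not the actual ModelArguments dataclass from this patch.
@dataclass
class LsArgs:
    module_type: int = field(default=1)        # 0: Hugging Face layer, 1: LightSeq CUDA layer, 2: custom Torch layer
    enable_quant: bool = field(default=False)  # quantization-aware training on/off


(args,) = HfArgumentParser(LsArgs).parse_args_into_dataclasses(
    ["--module_type", "2", "--enable_quant", "true"]
)
print(args.module_type, args.enable_quant)  # 2 True
```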
+ pad_on_right = tokenizer.padding_side == "right" + + if data_args.max_seq_length > tokenizer.model_max_length: + logger.warning( + f"The max_seq_length passed ({data_args.max_seq_length}) is larger than the maximum length for the" + f"model ({tokenizer.model_max_length}). Using max_seq_length={tokenizer.model_max_length}." + ) + max_seq_length = min(data_args.max_seq_length, tokenizer.model_max_length) + + # Training preprocessing + def prepare_train_features(examples): + # Some of the questions have lots of whitespace on the left, which is not useful and will make the + # truncation of the context fail (the tokenized question will take a lots of space). So we remove that + # left whitespace + examples[question_column_name] = [q.lstrip() for q in examples[question_column_name]] + + # Tokenize our examples with truncation and maybe padding, but keep the overflows using a stride. This results + # in one example possible giving several features when a context is long, each of those features having a + # context that overlaps a bit the context of the previous feature. + tokenized_examples = tokenizer( + examples[question_column_name if pad_on_right else context_column_name], + examples[context_column_name if pad_on_right else question_column_name], + truncation="only_second" if pad_on_right else "only_first", + max_length=max_seq_length, + stride=data_args.doc_stride, + return_overflowing_tokens=True, + return_offsets_mapping=True, + padding="max_length" if data_args.pad_to_max_length else False, + ) + + # Since one example might give us several features if it has a long context, we need a map from a feature to + # its corresponding example. This key gives us just that. + sample_mapping = tokenized_examples.pop("overflow_to_sample_mapping") + # The offset mappings will give us a map from token to character position in the original context. This will + # help us compute the start_positions and end_positions. + offset_mapping = tokenized_examples.pop("offset_mapping") + + # Let's label those examples! + tokenized_examples["start_positions"] = [] + tokenized_examples["end_positions"] = [] + + for i, offsets in enumerate(offset_mapping): + # We will label impossible answers with the index of the CLS token. + input_ids = tokenized_examples["input_ids"][i] + cls_index = input_ids.index(tokenizer.cls_token_id) + + # Grab the sequence corresponding to that example (to know what is the context and what is the question). + sequence_ids = tokenized_examples.sequence_ids(i) + + # One example can give several spans, this is the index of the example containing this span of text. + sample_index = sample_mapping[i] + answers = examples[answer_column_name][sample_index] + # If no answers are given, set the cls_index as answer. + if len(answers["answer_start"]) == 0: + tokenized_examples["start_positions"].append(cls_index) + tokenized_examples["end_positions"].append(cls_index) + else: + # Start/end character index of the answer in the text. + start_char = answers["answer_start"][0] + end_char = start_char + len(answers["text"][0]) + + # Start token index of the current span in the text. + token_start_index = 0 + while sequence_ids[token_start_index] != (1 if pad_on_right else 0): + token_start_index += 1 + + # End token index of the current span in the text. + token_end_index = len(input_ids) - 1 + while sequence_ids[token_end_index] != (1 if pad_on_right else 0): + token_end_index -= 1 + + # Detect if the answer is out of the span (in which case this feature is labeled with the CLS index). 
+ if not (offsets[token_start_index][0] <= start_char and offsets[token_end_index][1] >= end_char): + tokenized_examples["start_positions"].append(cls_index) + tokenized_examples["end_positions"].append(cls_index) + else: + # Otherwise move the token_start_index and token_end_index to the two ends of the answer. + # Note: we could go after the last offset if the answer is the last word (edge case). + while token_start_index < len(offsets) and offsets[token_start_index][0] <= start_char: + token_start_index += 1 + tokenized_examples["start_positions"].append(token_start_index - 1) + while offsets[token_end_index][1] >= end_char: + token_end_index -= 1 + tokenized_examples["end_positions"].append(token_end_index + 1) + + return tokenized_examples + + if training_args.do_train: + if "train" not in raw_datasets: + raise ValueError("--do_train requires a train dataset") + train_dataset = raw_datasets["train"] + if data_args.max_train_samples is not None: + # We will select sample from whole data if argument is specified + train_dataset = train_dataset.select(range(data_args.max_train_samples)) + # Create train feature from dataset + with training_args.main_process_first(desc="train dataset map pre-processing"): + train_dataset = train_dataset.map( + prepare_train_features, + batched=True, + num_proc=data_args.preprocessing_num_workers, + remove_columns=column_names, + load_from_cache_file=not data_args.overwrite_cache, + desc="Running tokenizer on train dataset", + ) + if data_args.max_train_samples is not None: + # Number of samples might increase during Feature Creation, We select only specified max samples + train_dataset = train_dataset.select(range(data_args.max_train_samples)) + + # Validation preprocessing + def prepare_validation_features(examples): + # Some of the questions have lots of whitespace on the left, which is not useful and will make the + # truncation of the context fail (the tokenized question will take a lots of space). So we remove that + # left whitespace + examples[question_column_name] = [q.lstrip() for q in examples[question_column_name]] + + # Tokenize our examples with truncation and maybe padding, but keep the overflows using a stride. This results + # in one example possible giving several features when a context is long, each of those features having a + # context that overlaps a bit the context of the previous feature. + tokenized_examples = tokenizer( + examples[question_column_name if pad_on_right else context_column_name], + examples[context_column_name if pad_on_right else question_column_name], + truncation="only_second" if pad_on_right else "only_first", + max_length=max_seq_length, + stride=data_args.doc_stride, + return_overflowing_tokens=True, + return_offsets_mapping=True, + padding="max_length" if data_args.pad_to_max_length else False, + ) + + # Since one example might give us several features if it has a long context, we need a map from a feature to + # its corresponding example. This key gives us just that. + sample_mapping = tokenized_examples.pop("overflow_to_sample_mapping") + + # For evaluation, we will need to convert our predictions to substrings of the context, so we keep the + # corresponding example_id and we will store the offset mappings. + tokenized_examples["example_id"] = [] + + for i in range(len(tokenized_examples["input_ids"])): + # Grab the sequence corresponding to that example (to know what is the context and what is the question). 
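Note: the span-labeling loop above boils down to two scans over the offset mapping. A tokenizer-free toy of the same arithmetic; the offsets and answer span are invented, and question tokens plus CLS handling are omitted for brevity:

```python
# One (start_char, end_char) pair per token, e.g. for "LightSeq speeds up training".
offsets = [(0, 8), (9, 15), (16, 18), (19, 27)]
start_char, end_char = 9, 15  # gold answer in characters: "speeds"

token_start_index = 0
while token_start_index < len(offsets) and offsets[token_start_index][0] <= start_char:
    token_start_index += 1
start_position = token_start_index - 1  # last token starting at or before start_char

token_end_index = len(offsets) - 1
while offsets[token_end_index][1] >= end_char:
    token_end_index -= 1
end_position = token_end_index + 1  # first token ending at or after end_char

print(start_position, end_position)  # 1 1
```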
+ sequence_ids = tokenized_examples.sequence_ids(i) + context_index = 1 if pad_on_right else 0 + + # One example can give several spans, this is the index of the example containing this span of text. + sample_index = sample_mapping[i] + tokenized_examples["example_id"].append(examples["id"][sample_index]) + + # Set to None the offset_mapping that are not part of the context so it's easy to determine if a token + # position is part of the context or not. + tokenized_examples["offset_mapping"][i] = [ + (o if sequence_ids[k] == context_index else None) + for k, o in enumerate(tokenized_examples["offset_mapping"][i]) + ] + + return tokenized_examples + + if training_args.do_eval: + if "validation" not in raw_datasets: + raise ValueError("--do_eval requires a validation dataset") + eval_examples = raw_datasets["validation"] + if data_args.max_eval_samples is not None: + # We will select sample from whole data + eval_examples = eval_examples.select(range(data_args.max_eval_samples)) + # Validation Feature Creation + with training_args.main_process_first(desc="validation dataset map pre-processing"): + eval_dataset = eval_examples.map( + prepare_validation_features, + batched=True, + num_proc=data_args.preprocessing_num_workers, + remove_columns=column_names, + load_from_cache_file=not data_args.overwrite_cache, + desc="Running tokenizer on validation dataset", + ) + if data_args.max_eval_samples is not None: + # During Feature creation dataset samples might increase, we will select required samples again + eval_dataset = eval_dataset.select(range(data_args.max_eval_samples)) + + if training_args.do_predict: + if "test" not in raw_datasets: + raise ValueError("--do_predict requires a test dataset") + predict_examples = raw_datasets["test"] + if data_args.max_predict_samples is not None: + # We will select sample from whole data + predict_examples = predict_examples.select(range(data_args.max_predict_samples)) + # Predict Feature Creation + with training_args.main_process_first(desc="prediction dataset map pre-processing"): + predict_dataset = predict_examples.map( + prepare_validation_features, + batched=True, + num_proc=data_args.preprocessing_num_workers, + remove_columns=column_names, + load_from_cache_file=not data_args.overwrite_cache, + desc="Running tokenizer on prediction dataset", + ) + if data_args.max_predict_samples is not None: + # During Feature creation dataset samples might increase, we will select required samples again + predict_dataset = predict_dataset.select(range(data_args.max_predict_samples)) + + # Data collator + # We have already padded to max length if the corresponding flag is True, otherwise we need to pad in the data + # collator. + data_collator = ( + default_data_collator + if data_args.pad_to_max_length + else DataCollatorWithPadding(tokenizer, pad_to_multiple_of=8 if training_args.fp16 else None) + ) + + # Post-processing: + def post_processing_function(examples, features, predictions, stage="eval"): + # Post-processing: we match the start logits and end logits to answers in the original context. + predictions = postprocess_qa_predictions( + examples=examples, + features=features, + predictions=predictions, + version_2_with_negative=data_args.version_2_with_negative, + n_best_size=data_args.n_best_size, + max_answer_length=data_args.max_answer_length, + null_score_diff_threshold=data_args.null_score_diff_threshold, + output_dir=training_args.output_dir, + log_level=log_level, + prefix=stage, + ) + # Format the result to the format the metric expects. 
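Note: the post-processing below turns start/end logits into answer strings that are fed to the SQuAD metric. A minimal check of the prediction/reference dict shapes that metric consumes; ids and texts are dummies, and this assumes the `datasets` library can fetch the metric script:

```python
from datasets import load_metric

metric = load_metric("squad")
predictions = [{"id": "0", "prediction_text": "LightSeq"}]
references = [{"id": "0", "answers": {"text": ["LightSeq"], "answer_start": [0]}}]
print(metric.compute(predictions=predictions, references=references))
# {'exact_match': 100.0, 'f1': 100.0}
```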
+ if data_args.version_2_with_negative: + formatted_predictions = [ + {"id": k, "prediction_text": v, "no_answer_probability": 0.0} for k, v in predictions.items() + ] + else: + formatted_predictions = [{"id": k, "prediction_text": v} for k, v in predictions.items()] + + references = [{"id": ex["id"], "answers": ex[answer_column_name]} for ex in examples] + return EvalPrediction(predictions=formatted_predictions, label_ids=references) + + metric = load_metric("squad_v2" if data_args.version_2_with_negative else "squad") + + def compute_metrics(p: EvalPrediction): + return metric.compute(predictions=p.predictions, references=p.label_ids) + + # Initialize our Trainer + trainer = QuestionAnsweringTrainer( + model=model, + args=training_args, + train_dataset=train_dataset if training_args.do_train else None, + eval_dataset=eval_dataset if training_args.do_eval else None, + eval_examples=eval_examples if training_args.do_eval else None, + tokenizer=tokenizer, + data_collator=data_collator, + post_process_function=post_processing_function, + compute_metrics=compute_metrics, + ) + + # Training + if training_args.do_train: + checkpoint = None + if training_args.resume_from_checkpoint is not None: + checkpoint = training_args.resume_from_checkpoint + elif last_checkpoint is not None: + checkpoint = last_checkpoint + train_result = trainer.train(resume_from_checkpoint=checkpoint) + trainer.save_model() # Saves the tokenizer too for easy upload + + metrics = train_result.metrics + max_train_samples = ( + data_args.max_train_samples if data_args.max_train_samples is not None else len(train_dataset) + ) + metrics["train_samples"] = min(max_train_samples, len(train_dataset)) + + trainer.log_metrics("train", metrics) + trainer.save_metrics("train", metrics) + trainer.save_state() + + # Evaluation + if training_args.do_eval: + logger.info("*** Evaluate ***") + metrics = trainer.evaluate() + + max_eval_samples = data_args.max_eval_samples if data_args.max_eval_samples is not None else len(eval_dataset) + metrics["eval_samples"] = min(max_eval_samples, len(eval_dataset)) + + trainer.log_metrics("eval", metrics) + trainer.save_metrics("eval", metrics) + + # Prediction + if training_args.do_predict: + logger.info("*** Predict ***") + results = trainer.predict(predict_dataset, predict_examples) + metrics = results.metrics + + max_predict_samples = ( + data_args.max_predict_samples if data_args.max_predict_samples is not None else len(predict_dataset) + ) + metrics["predict_samples"] = min(max_predict_samples, len(predict_dataset)) + + trainer.log_metrics("predict", metrics) + trainer.save_metrics("predict", metrics) + + kwargs = {"finetuned_from": model_args.model_name_or_path, "tasks": "question-answering"} + if data_args.dataset_name is not None: + kwargs["dataset_tags"] = data_args.dataset_name + if data_args.dataset_config_name is not None: + kwargs["dataset_args"] = data_args.dataset_config_name + kwargs["dataset"] = f"{data_args.dataset_name} {data_args.dataset_config_name}" + else: + kwargs["dataset"] = data_args.dataset_name + + if training_args.push_to_hub: + trainer.push_to_hub(**kwargs) + else: + trainer.create_model_card(**kwargs) + + +def _mp_fn(index): + # For xla_spawn (TPUs) + main() + + +if __name__ == "__main__": + main() diff --git a/examples/training/huggingface/bert/qa/run_qa.sh b/examples/training/huggingface/bert/qa/run_qa.sh new file mode 100644 index 00000000..78e5c390 --- /dev/null +++ b/examples/training/huggingface/bert/qa/run_qa.sh @@ -0,0 +1,35 @@ +# Copyright 2020 The HuggingFace 
Team. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +THIS_DIR=$(dirname $(readlink -f $0)) + +python3 -m torch.distributed.launch \ + --nproc_per_node=8 \ + $THIS_DIR/run_qa.py \ + --model_name_or_path bert-base-uncased \ + --dataset_name squad \ + --do_train \ + --do_eval \ + --max_seq_length 256 \ + --per_device_train_batch_size 16 \ + --doc_stride 128 \ + --learning_rate 3e-5 \ + --num_train_epochs 10 \ + --output_dir /tmp/squad \ + --overwrite_output_dir \ + --fp16 \ + --seed 1234 \ + --logging_steps 10 \ + --module_type 1 \ + --enable_quant false diff --git a/examples/training/huggingface/bert/qa/trainer_qa.py b/examples/training/huggingface/bert/qa/trainer_qa.py new file mode 100644 index 00000000..7f98eba2 --- /dev/null +++ b/examples/training/huggingface/bert/qa/trainer_qa.py @@ -0,0 +1,105 @@ +# coding=utf-8 +# Copyright 2020 The HuggingFace Team All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +""" +A subclass of `Trainer` specific to Question-Answering tasks +""" + +from transformers import Trainer, is_torch_tpu_available +from transformers.trainer_utils import PredictionOutput + + +if is_torch_tpu_available(): + import torch_xla.core.xla_model as xm + import torch_xla.debug.metrics as met + + +class QuestionAnsweringTrainer(Trainer): + def __init__(self, *args, eval_examples=None, post_process_function=None, **kwargs): + super().__init__(*args, **kwargs) + self.eval_examples = eval_examples + self.post_process_function = post_process_function + + def evaluate(self, eval_dataset=None, eval_examples=None, ignore_keys=None, metric_key_prefix: str = "eval"): + eval_dataset = self.eval_dataset if eval_dataset is None else eval_dataset + eval_dataloader = self.get_eval_dataloader(eval_dataset) + eval_examples = self.eval_examples if eval_examples is None else eval_examples + + # Temporarily disable metric computation, we will do it in the loop here. 
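Note: in the `evaluate` body that follows, metrics computed after post-processing are re-keyed with `metric_key_prefix` so they match the Trainer's logging convention. The same step in isolation, with placeholder values:

```python
metric_key_prefix = "eval"
metrics = {"exact_match": 81.2, "f1": 88.6, "eval_samples": 100}
for key in list(metrics.keys()):
    if not key.startswith(f"{metric_key_prefix}_"):
        metrics[f"{metric_key_prefix}_{key}"] = metrics.pop(key)
print(metrics)  # {'eval_samples': 100, 'eval_exact_match': 81.2, 'eval_f1': 88.6}
```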
+ compute_metrics = self.compute_metrics + self.compute_metrics = None + eval_loop = self.prediction_loop if self.args.use_legacy_prediction_loop else self.evaluation_loop + try: + output = eval_loop( + eval_dataloader, + description="Evaluation", + # No point gathering the predictions if there are no metrics, otherwise we defer to + # self.args.prediction_loss_only + prediction_loss_only=True if compute_metrics is None else None, + ignore_keys=ignore_keys, + ) + finally: + self.compute_metrics = compute_metrics + + if self.post_process_function is not None and self.compute_metrics is not None: + eval_preds = self.post_process_function(eval_examples, eval_dataset, output.predictions) + metrics = self.compute_metrics(eval_preds) + + # Prefix all keys with metric_key_prefix + '_' + for key in list(metrics.keys()): + if not key.startswith(f"{metric_key_prefix}_"): + metrics[f"{metric_key_prefix}_{key}"] = metrics.pop(key) + + self.log(metrics) + else: + metrics = {} + + if self.args.tpu_metrics_debug or self.args.debug: + # tpu-comment: Logging debug metrics for PyTorch/XLA (compile, execute times, ops, etc.) + xm.master_print(met.metrics_report()) + + self.control = self.callback_handler.on_evaluate(self.args, self.state, self.control, metrics) + return metrics + + def predict(self, predict_dataset, predict_examples, ignore_keys=None, metric_key_prefix: str = "test"): + predict_dataloader = self.get_test_dataloader(predict_dataset) + + # Temporarily disable metric computation, we will do it in the loop here. + compute_metrics = self.compute_metrics + self.compute_metrics = None + eval_loop = self.prediction_loop if self.args.use_legacy_prediction_loop else self.evaluation_loop + try: + output = eval_loop( + predict_dataloader, + description="Prediction", + # No point gathering the predictions if there are no metrics, otherwise we defer to + # self.args.prediction_loss_only + prediction_loss_only=True if compute_metrics is None else None, + ignore_keys=ignore_keys, + ) + finally: + self.compute_metrics = compute_metrics + + if self.post_process_function is None or self.compute_metrics is None: + return output + + predictions = self.post_process_function(predict_examples, predict_dataset, output.predictions, "predict") + metrics = self.compute_metrics(predictions) + + # Prefix all keys with metric_key_prefix + '_' + for key in list(metrics.keys()): + if not key.startswith(f"{metric_key_prefix}_"): + metrics[f"{metric_key_prefix}_{key}"] = metrics.pop(key) + + return PredictionOutput(predictions=predictions.predictions, label_ids=predictions.label_ids, metrics=metrics) diff --git a/examples/training/huggingface/bert/qa/utils_qa.py b/examples/training/huggingface/bert/qa/utils_qa.py new file mode 100644 index 00000000..fd0bc16f --- /dev/null +++ b/examples/training/huggingface/bert/qa/utils_qa.py @@ -0,0 +1,434 @@ +# coding=utf-8 +# Copyright 2020 The HuggingFace Team All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +""" +Post-processing utilities for question answering. 
+""" +import collections +import json +import logging +import os +from typing import Optional, Tuple + +import numpy as np +from tqdm.auto import tqdm + + +logger = logging.getLogger(__name__) + + +def postprocess_qa_predictions( + examples, + features, + predictions: Tuple[np.ndarray, np.ndarray], + version_2_with_negative: bool = False, + n_best_size: int = 20, + max_answer_length: int = 30, + null_score_diff_threshold: float = 0.0, + output_dir: Optional[str] = None, + prefix: Optional[str] = None, + log_level: Optional[int] = logging.WARNING, +): + """ + Post-processes the predictions of a question-answering model to convert them to answers that are substrings of the + original contexts. This is the base postprocessing functions for models that only return start and end logits. + + Args: + examples: The non-preprocessed dataset (see the main script for more information). + features: The processed dataset (see the main script for more information). + predictions (:obj:`Tuple[np.ndarray, np.ndarray]`): + The predictions of the model: two arrays containing the start logits and the end logits respectively. Its + first dimension must match the number of elements of :obj:`features`. + version_2_with_negative (:obj:`bool`, `optional`, defaults to :obj:`False`): + Whether or not the underlying dataset contains examples with no answers. + n_best_size (:obj:`int`, `optional`, defaults to 20): + The total number of n-best predictions to generate when looking for an answer. + max_answer_length (:obj:`int`, `optional`, defaults to 30): + The maximum length of an answer that can be generated. This is needed because the start and end predictions + are not conditioned on one another. + null_score_diff_threshold (:obj:`float`, `optional`, defaults to 0): + The threshold used to select the null answer: if the best answer has a score that is less than the score of + the null answer minus this threshold, the null answer is selected for this example (note that the score of + the null answer for an example giving several features is the minimum of the scores for the null answer on + each feature: all features must be aligned on the fact they `want` to predict a null answer). + + Only useful when :obj:`version_2_with_negative` is :obj:`True`. + output_dir (:obj:`str`, `optional`): + If provided, the dictionaries of predictions, n_best predictions (with their scores and logits) and, if + :obj:`version_2_with_negative=True`, the dictionary of the scores differences between best and null + answers, are saved in `output_dir`. + prefix (:obj:`str`, `optional`): + If provided, the dictionaries mentioned above are saved with `prefix` added to their names. + log_level (:obj:`int`, `optional`, defaults to ``logging.WARNING``): + ``logging`` log level (e.g., ``logging.WARNING``) + """ + if len(predictions) != 2: + raise ValueError("`predictions` should be a tuple with two elements (start_logits, end_logits).") + all_start_logits, all_end_logits = predictions + + if len(predictions[0]) != len(features): + raise ValueError(f"Got {len(predictions[0])} predictions and {len(features)} features.") + + # Build a map example to its corresponding features. + example_id_to_index = {k: i for i, k in enumerate(examples["id"])} + features_per_example = collections.defaultdict(list) + for i, feature in enumerate(features): + features_per_example[example_id_to_index[feature["example_id"]]].append(i) + + # The dictionaries we have to fill. 
+ all_predictions = collections.OrderedDict() + all_nbest_json = collections.OrderedDict() + if version_2_with_negative: + scores_diff_json = collections.OrderedDict() + + # Logging. + logger.setLevel(log_level) + logger.info(f"Post-processing {len(examples)} example predictions split into {len(features)} features.") + + # Let's loop over all the examples! + for example_index, example in enumerate(tqdm(examples)): + # Those are the indices of the features associated to the current example. + feature_indices = features_per_example[example_index] + + min_null_prediction = None + prelim_predictions = [] + + # Looping through all the features associated to the current example. + for feature_index in feature_indices: + # We grab the predictions of the model for this feature. + start_logits = all_start_logits[feature_index] + end_logits = all_end_logits[feature_index] + # This is what will allow us to map some the positions in our logits to span of texts in the original + # context. + offset_mapping = features[feature_index]["offset_mapping"] + # Optional `token_is_max_context`, if provided we will remove answers that do not have the maximum context + # available in the current feature. + token_is_max_context = features[feature_index].get("token_is_max_context", None) + + # Update minimum null prediction. + feature_null_score = start_logits[0] + end_logits[0] + if min_null_prediction is None or min_null_prediction["score"] > feature_null_score: + min_null_prediction = { + "offsets": (0, 0), + "score": feature_null_score, + "start_logit": start_logits[0], + "end_logit": end_logits[0], + } + + # Go through all possibilities for the `n_best_size` greater start and end logits. + start_indexes = np.argsort(start_logits)[-1 : -n_best_size - 1 : -1].tolist() + end_indexes = np.argsort(end_logits)[-1 : -n_best_size - 1 : -1].tolist() + for start_index in start_indexes: + for end_index in end_indexes: + # Don't consider out-of-scope answers, either because the indices are out of bounds or correspond + # to part of the input_ids that are not in the context. + if ( + start_index >= len(offset_mapping) + or end_index >= len(offset_mapping) + or offset_mapping[start_index] is None + or len(offset_mapping[start_index]) < 2 + or offset_mapping[end_index] is None + or len(offset_mapping[end_index]) < 2 + ): + continue + # Don't consider answers with a length that is either < 0 or > max_answer_length. + if end_index < start_index or end_index - start_index + 1 > max_answer_length: + continue + # Don't consider answer that don't have the maximum context available (if such information is + # provided). + if token_is_max_context is not None and not token_is_max_context.get(str(start_index), False): + continue + + prelim_predictions.append( + { + "offsets": (offset_mapping[start_index][0], offset_mapping[end_index][1]), + "score": start_logits[start_index] + end_logits[end_index], + "start_logit": start_logits[start_index], + "end_logit": end_logits[end_index], + } + ) + if version_2_with_negative: + # Add the minimum null prediction + prelim_predictions.append(min_null_prediction) + null_score = min_null_prediction["score"] + + # Only keep the best `n_best_size` predictions. + predictions = sorted(prelim_predictions, key=lambda x: x["score"], reverse=True)[:n_best_size] + + # Add back the minimum null prediction if it was removed because of its low score. 
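Note: the negative-stride slice used above is a compact top-k. It returns the indices of the `n_best_size` largest logits, best first; a quick check with made-up numbers:

```python
import numpy as np

start_logits = np.array([0.1, 2.3, -0.5, 1.7, 0.9])
n_best_size = 3
best_indexes = np.argsort(start_logits)[-1 : -n_best_size - 1 : -1].tolist()
print(best_indexes)  # [1, 3, 4]
```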
+ if version_2_with_negative and not any(p["offsets"] == (0, 0) for p in predictions): + predictions.append(min_null_prediction) + + # Use the offsets to gather the answer text in the original context. + context = example["context"] + for pred in predictions: + offsets = pred.pop("offsets") + pred["text"] = context[offsets[0] : offsets[1]] + + # In the very rare edge case we have not a single non-null prediction, we create a fake prediction to avoid + # failure. + if len(predictions) == 0 or (len(predictions) == 1 and predictions[0]["text"] == ""): + predictions.insert(0, {"text": "empty", "start_logit": 0.0, "end_logit": 0.0, "score": 0.0}) + + # Compute the softmax of all scores (we do it with numpy to stay independent from torch/tf in this file, using + # the LogSumExp trick). + scores = np.array([pred.pop("score") for pred in predictions]) + exp_scores = np.exp(scores - np.max(scores)) + probs = exp_scores / exp_scores.sum() + + # Include the probabilities in our predictions. + for prob, pred in zip(probs, predictions): + pred["probability"] = prob + + # Pick the best prediction. If the null answer is not possible, this is easy. + if not version_2_with_negative: + all_predictions[example["id"]] = predictions[0]["text"] + else: + # Otherwise we first need to find the best non-empty prediction. + i = 0 + while predictions[i]["text"] == "": + i += 1 + best_non_null_pred = predictions[i] + + # Then we compare to the null prediction using the threshold. + score_diff = null_score - best_non_null_pred["start_logit"] - best_non_null_pred["end_logit"] + scores_diff_json[example["id"]] = float(score_diff) # To be JSON-serializable. + if score_diff > null_score_diff_threshold: + all_predictions[example["id"]] = "" + else: + all_predictions[example["id"]] = best_non_null_pred["text"] + + # Make `predictions` JSON-serializable by casting np.float back to float. + all_nbest_json[example["id"]] = [ + {k: (float(v) if isinstance(v, (np.float16, np.float32, np.float64)) else v) for k, v in pred.items()} + for pred in predictions + ] + + # If we have an output_dir, let's save all those dicts. 
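Note: the max-shifted exponentiation above is the standard numerically stable softmax. With large candidate scores a naive `np.exp(scores)` overflows, while the shifted version does not; scores below are made up:

```python
import numpy as np

scores = np.array([1000.0, 999.0, 995.0])
exp_scores = np.exp(scores - np.max(scores))
probs = exp_scores / exp_scores.sum()
print(probs.round(4))  # [0.7275 0.2676 0.0049]
```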
+ if output_dir is not None: + if not os.path.isdir(output_dir): + raise EnvironmentError(f"{output_dir} is not a directory.") + + prediction_file = os.path.join( + output_dir, "predictions.json" if prefix is None else f"{prefix}_predictions.json" + ) + nbest_file = os.path.join( + output_dir, "nbest_predictions.json" if prefix is None else f"{prefix}_nbest_predictions.json" + ) + if version_2_with_negative: + null_odds_file = os.path.join( + output_dir, "null_odds.json" if prefix is None else f"{prefix}_null_odds.json" + ) + + logger.info(f"Saving predictions to {prediction_file}.") + with open(prediction_file, "w") as writer: + writer.write(json.dumps(all_predictions, indent=4) + "\n") + logger.info(f"Saving nbest_preds to {nbest_file}.") + with open(nbest_file, "w") as writer: + writer.write(json.dumps(all_nbest_json, indent=4) + "\n") + if version_2_with_negative: + logger.info(f"Saving null_odds to {null_odds_file}.") + with open(null_odds_file, "w") as writer: + writer.write(json.dumps(scores_diff_json, indent=4) + "\n") + + return all_predictions + + +def postprocess_qa_predictions_with_beam_search( + examples, + features, + predictions: Tuple[np.ndarray, np.ndarray], + version_2_with_negative: bool = False, + n_best_size: int = 20, + max_answer_length: int = 30, + start_n_top: int = 5, + end_n_top: int = 5, + output_dir: Optional[str] = None, + prefix: Optional[str] = None, + log_level: Optional[int] = logging.WARNING, +): + """ + Post-processes the predictions of a question-answering model with beam search to convert them to answers that are substrings of the + original contexts. This is the postprocessing functions for models that return start and end logits, indices, as well as + cls token predictions. + + Args: + examples: The non-preprocessed dataset (see the main script for more information). + features: The processed dataset (see the main script for more information). + predictions (:obj:`Tuple[np.ndarray, np.ndarray]`): + The predictions of the model: two arrays containing the start logits and the end logits respectively. Its + first dimension must match the number of elements of :obj:`features`. + version_2_with_negative (:obj:`bool`, `optional`, defaults to :obj:`False`): + Whether or not the underlying dataset contains examples with no answers. + n_best_size (:obj:`int`, `optional`, defaults to 20): + The total number of n-best predictions to generate when looking for an answer. + max_answer_length (:obj:`int`, `optional`, defaults to 30): + The maximum length of an answer that can be generated. This is needed because the start and end predictions + are not conditioned on one another. + start_n_top (:obj:`int`, `optional`, defaults to 5): + The number of top start logits too keep when searching for the :obj:`n_best_size` predictions. + end_n_top (:obj:`int`, `optional`, defaults to 5): + The number of top end logits too keep when searching for the :obj:`n_best_size` predictions. + output_dir (:obj:`str`, `optional`): + If provided, the dictionaries of predictions, n_best predictions (with their scores and logits) and, if + :obj:`version_2_with_negative=True`, the dictionary of the scores differences between best and null + answers, are saved in `output_dir`. + prefix (:obj:`str`, `optional`): + If provided, the dictionaries mentioned above are saved with `prefix` added to their names. 
+ log_level (:obj:`int`, `optional`, defaults to ``logging.WARNING``): + ``logging`` log level (e.g., ``logging.WARNING``) + """ + if len(predictions) != 5: + raise ValueError("`predictions` should be a tuple with five elements.") + start_top_log_probs, start_top_index, end_top_log_probs, end_top_index, cls_logits = predictions + + if len(predictions[0]) != len(features): + raise ValueError(f"Got {len(predictions[0])} predictions and {len(features)} features.") + + # Build a map example to its corresponding features. + example_id_to_index = {k: i for i, k in enumerate(examples["id"])} + features_per_example = collections.defaultdict(list) + for i, feature in enumerate(features): + features_per_example[example_id_to_index[feature["example_id"]]].append(i) + + # The dictionaries we have to fill. + all_predictions = collections.OrderedDict() + all_nbest_json = collections.OrderedDict() + scores_diff_json = collections.OrderedDict() if version_2_with_negative else None + + # Logging. + logger.setLevel(log_level) + logger.info(f"Post-processing {len(examples)} example predictions split into {len(features)} features.") + + # Let's loop over all the examples! + for example_index, example in enumerate(tqdm(examples)): + # Those are the indices of the features associated to the current example. + feature_indices = features_per_example[example_index] + + min_null_score = None + prelim_predictions = [] + + # Looping through all the features associated to the current example. + for feature_index in feature_indices: + # We grab the predictions of the model for this feature. + start_log_prob = start_top_log_probs[feature_index] + start_indexes = start_top_index[feature_index] + end_log_prob = end_top_log_probs[feature_index] + end_indexes = end_top_index[feature_index] + feature_null_score = cls_logits[feature_index] + # This is what will allow us to map some the positions in our logits to span of texts in the original + # context. + offset_mapping = features[feature_index]["offset_mapping"] + # Optional `token_is_max_context`, if provided we will remove answers that do not have the maximum context + # available in the current feature. + token_is_max_context = features[feature_index].get("token_is_max_context", None) + + # Update minimum null prediction + if min_null_score is None or feature_null_score < min_null_score: + min_null_score = feature_null_score + + # Go through all possibilities for the `n_start_top`/`n_end_top` greater start and end logits. + for i in range(start_n_top): + for j in range(end_n_top): + start_index = int(start_indexes[i]) + j_index = i * end_n_top + j + end_index = int(end_indexes[j_index]) + # Don't consider out-of-scope answers (last part of the test should be unnecessary because of the + # p_mask but let's not take any risk) + if ( + start_index >= len(offset_mapping) + or end_index >= len(offset_mapping) + or offset_mapping[start_index] is None + or offset_mapping[end_index] is None + ): + continue + # Don't consider answers with a length negative or > max_answer_length. + if end_index < start_index or end_index - start_index + 1 > max_answer_length: + continue + # Don't consider answer that don't have the maximum context available (if such information is + # provided). 
+ if token_is_max_context is not None and not token_is_max_context.get(str(start_index), False): + continue + prelim_predictions.append( + { + "offsets": (offset_mapping[start_index][0], offset_mapping[end_index][1]), + "score": start_log_prob[i] + end_log_prob[j_index], + "start_log_prob": start_log_prob[i], + "end_log_prob": end_log_prob[j_index], + } + ) + + # Only keep the best `n_best_size` predictions. + predictions = sorted(prelim_predictions, key=lambda x: x["score"], reverse=True)[:n_best_size] + + # Use the offsets to gather the answer text in the original context. + context = example["context"] + for pred in predictions: + offsets = pred.pop("offsets") + pred["text"] = context[offsets[0] : offsets[1]] + + # In the very rare edge case we have not a single non-null prediction, we create a fake prediction to avoid + # failure. + if len(predictions) == 0: + predictions.insert(0, {"text": "", "start_logit": -1e-6, "end_logit": -1e-6, "score": -2e-6}) + + # Compute the softmax of all scores (we do it with numpy to stay independent from torch/tf in this file, using + # the LogSumExp trick). + scores = np.array([pred.pop("score") for pred in predictions]) + exp_scores = np.exp(scores - np.max(scores)) + probs = exp_scores / exp_scores.sum() + + # Include the probabilities in our predictions. + for prob, pred in zip(probs, predictions): + pred["probability"] = prob + + # Pick the best prediction and set the probability for the null answer. + all_predictions[example["id"]] = predictions[0]["text"] + if version_2_with_negative: + scores_diff_json[example["id"]] = float(min_null_score) + + # Make `predictions` JSON-serializable by casting np.float back to float. + all_nbest_json[example["id"]] = [ + {k: (float(v) if isinstance(v, (np.float16, np.float32, np.float64)) else v) for k, v in pred.items()} + for pred in predictions + ] + + # If we have an output_dir, let's save all those dicts. 
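Note: in the beam-search loop above, each feature carries `start_n_top * end_n_top` end candidates in one flat array, and `j_index = i * end_n_top + j` picks the j-th end candidate of the i-th start candidate. This is plain row-major indexing, shown here with dummy sizes:

```python
start_n_top, end_n_top = 2, 3
flat_end_candidates = list(range(start_n_top * end_n_top))  # stand-in for end_top_index[feature_index]
for i in range(start_n_top):
    row = [flat_end_candidates[i * end_n_top + j] for j in range(end_n_top)]
    print(i, row)
# 0 [0, 1, 2]
# 1 [3, 4, 5]
```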
+ if output_dir is not None: + if not os.path.isdir(output_dir): + raise EnvironmentError(f"{output_dir} is not a directory.") + + prediction_file = os.path.join( + output_dir, "predictions.json" if prefix is None else f"{prefix}_predictions.json" + ) + nbest_file = os.path.join( + output_dir, "nbest_predictions.json" if prefix is None else f"{prefix}_nbest_predictions.json" + ) + if version_2_with_negative: + null_odds_file = os.path.join( + output_dir, "null_odds.json" if prefix is None else f"{prefix}_null_odds.json" + ) + + logger.info(f"Saving predictions to {prediction_file}.") + with open(prediction_file, "w") as writer: + writer.write(json.dumps(all_predictions, indent=4) + "\n") + logger.info(f"Saving nbest_preds to {nbest_file}.") + with open(nbest_file, "w") as writer: + writer.write(json.dumps(all_nbest_json, indent=4) + "\n") + if version_2_with_negative: + logger.info(f"Saving null_odds to {null_odds_file}.") + with open(null_odds_file, "w") as writer: + writer.write(json.dumps(scores_diff_json, indent=4) + "\n") + + return all_predictions, scores_diff_json From ae3ffbd4ffb49d55ec1f5a21c1b1ce699c4ac0eb Mon Sep 17 00:00:00 2001 From: "weiyang.god" Date: Sat, 19 Mar 2022 01:39:44 +0800 Subject: [PATCH 09/49] format --- .../training/huggingface/bert/qa/run_qa.py | 189 ++++++++++++++---- .../huggingface/bert/qa/trainer_qa.py | 46 ++++- .../training/huggingface/bert/qa/utils_qa.py | 150 +++++++++++--- 3 files changed, 301 insertions(+), 84 deletions(-) diff --git a/examples/training/huggingface/bert/qa/run_qa.py b/examples/training/huggingface/bert/qa/run_qa.py index 1657c192..04055b16 100644 --- a/examples/training/huggingface/bert/qa/run_qa.py +++ b/examples/training/huggingface/bert/qa/run_qa.py @@ -52,7 +52,10 @@ # Will error if the minimal version of Transformers is not installed. Remove at your own risks. check_min_version("4.17.0") -require_version("datasets>=1.8.0", "To fix: pip install -r examples/pytorch/question-answering/requirements.txt") +require_version( + "datasets>=1.8.0", + "To fix: pip install -r examples/pytorch/question-answering/requirements.txt", +) logger = logging.getLogger(__name__) @@ -64,21 +67,33 @@ class ModelArguments: """ model_name_or_path: str = field( - metadata={"help": "Path to pretrained model or model identifier from huggingface.co/models"} + metadata={ + "help": "Path to pretrained model or model identifier from huggingface.co/models" + } ) config_name: Optional[str] = field( - default=None, metadata={"help": "Pretrained config name or path if not the same as model_name"} + default=None, + metadata={ + "help": "Pretrained config name or path if not the same as model_name" + }, ) tokenizer_name: Optional[str] = field( - default=None, metadata={"help": "Pretrained tokenizer name or path if not the same as model_name"} + default=None, + metadata={ + "help": "Pretrained tokenizer name or path if not the same as model_name" + }, ) cache_dir: Optional[str] = field( default=None, - metadata={"help": "Path to directory to store the pretrained models downloaded from huggingface.co"}, + metadata={ + "help": "Path to directory to store the pretrained models downloaded from huggingface.co" + }, ) model_revision: str = field( default="main", - metadata={"help": "The specific model version to use (can be a branch name, tag name or commit id)."}, + metadata={ + "help": "The specific model version to use (can be a branch name, tag name or commit id)." 
+ }, ) use_auth_token: bool = field( default=False, @@ -106,22 +121,33 @@ class DataTrainingArguments: """ dataset_name: Optional[str] = field( - default=None, metadata={"help": "The name of the dataset to use (via the datasets library)."} + default=None, + metadata={"help": "The name of the dataset to use (via the datasets library)."}, ) dataset_config_name: Optional[str] = field( - default=None, metadata={"help": "The configuration name of the dataset to use (via the datasets library)."} + default=None, + metadata={ + "help": "The configuration name of the dataset to use (via the datasets library)." + }, + ) + train_file: Optional[str] = field( + default=None, metadata={"help": "The input training data file (a text file)."} ) - train_file: Optional[str] = field(default=None, metadata={"help": "The input training data file (a text file)."}) validation_file: Optional[str] = field( default=None, - metadata={"help": "An optional input evaluation data file to evaluate the perplexity on (a text file)."}, + metadata={ + "help": "An optional input evaluation data file to evaluate the perplexity on (a text file)." + }, ) test_file: Optional[str] = field( default=None, - metadata={"help": "An optional input test data file to evaluate the perplexity on (a text file)."}, + metadata={ + "help": "An optional input test data file to evaluate the perplexity on (a text file)." + }, ) overwrite_cache: bool = field( - default=False, metadata={"help": "Overwrite the cached training and evaluation sets"} + default=False, + metadata={"help": "Overwrite the cached training and evaluation sets"}, ) preprocessing_num_workers: Optional[int] = field( default=None, @@ -164,7 +190,8 @@ class DataTrainingArguments: }, ) version_2_with_negative: bool = field( - default=False, metadata={"help": "If true, some of the examples do not have an answer."} + default=False, + metadata={"help": "If true, some of the examples do not have an answer."}, ) null_score_diff_threshold: float = field( default=0.0, @@ -176,11 +203,15 @@ class DataTrainingArguments: ) doc_stride: int = field( default=128, - metadata={"help": "When splitting up a long document into chunks, how much stride to take between chunks."}, + metadata={ + "help": "When splitting up a long document into chunks, how much stride to take between chunks." + }, ) n_best_size: int = field( default=20, - metadata={"help": "The total number of n-best predictions to generate when looking for an answer."}, + metadata={ + "help": "The total number of n-best predictions to generate when looking for an answer." + }, ) max_answer_length: int = field( default=30, @@ -197,17 +228,28 @@ def __post_init__(self): and self.validation_file is None and self.test_file is None ): - raise ValueError("Need either a dataset name or a training/validation file/test_file.") + raise ValueError( + "Need either a dataset name or a training/validation file/test_file." + ) else: if self.train_file is not None: extension = self.train_file.split(".")[-1] - assert extension in ["csv", "json"], "`train_file` should be a csv or a json file." + assert extension in [ + "csv", + "json", + ], "`train_file` should be a csv or a json file." if self.validation_file is not None: extension = self.validation_file.split(".")[-1] - assert extension in ["csv", "json"], "`validation_file` should be a csv or a json file." + assert extension in [ + "csv", + "json", + ], "`validation_file` should be a csv or a json file." 
if self.test_file is not None: extension = self.test_file.split(".")[-1] - assert extension in ["csv", "json"], "`test_file` should be a csv or a json file." + assert extension in [ + "csv", + "json", + ], "`test_file` should be a csv or a json file." def main(): @@ -215,11 +257,15 @@ def main(): # or by passing the --help flag to this script. # We now keep distinct sets of args, for a cleaner separation of concerns. - parser = HfArgumentParser((ModelArguments, DataTrainingArguments, TrainingArguments)) + parser = HfArgumentParser( + (ModelArguments, DataTrainingArguments, TrainingArguments) + ) if len(sys.argv) == 2 and sys.argv[1].endswith(".json"): # If we pass only one argument to the script and it's the path to a json file, # let's parse it to get our arguments. - model_args, data_args, training_args = parser.parse_json_file(json_file=os.path.abspath(sys.argv[1])) + model_args, data_args, training_args = parser.parse_json_file( + json_file=os.path.abspath(sys.argv[1]) + ) else: model_args, data_args, training_args = parser.parse_args_into_dataclasses() @@ -246,14 +292,20 @@ def main(): # Detecting last checkpoint. last_checkpoint = None - if os.path.isdir(training_args.output_dir) and training_args.do_train and not training_args.overwrite_output_dir: + if ( + os.path.isdir(training_args.output_dir) + and training_args.do_train + and not training_args.overwrite_output_dir + ): last_checkpoint = get_last_checkpoint(training_args.output_dir) if last_checkpoint is None and len(os.listdir(training_args.output_dir)) > 0: raise ValueError( f"Output directory ({training_args.output_dir}) already exists and is not empty. " "Use --overwrite_output_dir to overcome." ) - elif last_checkpoint is not None and training_args.resume_from_checkpoint is None: + elif ( + last_checkpoint is not None and training_args.resume_from_checkpoint is None + ): logger.info( f"Checkpoint detected, resuming training at {last_checkpoint}. To avoid this behavior, change " "the `--output_dir` or add `--overwrite_output_dir` to train from scratch." @@ -274,7 +326,9 @@ def main(): if data_args.dataset_name is not None: # Downloading and loading a dataset from the hub. raw_datasets = load_dataset( - data_args.dataset_name, data_args.dataset_config_name, cache_dir=model_args.cache_dir + data_args.dataset_name, + data_args.dataset_config_name, + cache_dir=model_args.cache_dir, ) else: data_files = {} @@ -288,7 +342,12 @@ def main(): if data_args.test_file is not None: data_files["test"] = data_args.test_file extension = data_args.test_file.split(".")[-1] - raw_datasets = load_dataset(extension, data_files=data_files, field="data", cache_dir=model_args.cache_dir) + raw_datasets = load_dataset( + extension, + data_files=data_files, + field="data", + cache_dir=model_args.cache_dir, + ) # See more about loading any type of standard or custom dataset (from files, python dict, pandas DataFrame, etc) at # https://huggingface.co/docs/datasets/loading_datasets.html. @@ -298,13 +357,17 @@ def main(): # The .from_pretrained methods guarantee that only one local process can concurrently # download model & vocab. 
config = AutoConfig.from_pretrained( - model_args.config_name if model_args.config_name else model_args.model_name_or_path, + model_args.config_name + if model_args.config_name + else model_args.model_name_or_path, cache_dir=model_args.cache_dir, revision=model_args.model_revision, use_auth_token=True if model_args.use_auth_token else None, ) tokenizer = AutoTokenizer.from_pretrained( - model_args.tokenizer_name if model_args.tokenizer_name else model_args.model_name_or_path, + model_args.tokenizer_name + if model_args.tokenizer_name + else model_args.model_name_or_path, cache_dir=model_args.cache_dir, use_fast=True, revision=model_args.model_revision, @@ -358,7 +421,9 @@ def prepare_train_features(examples): # Some of the questions have lots of whitespace on the left, which is not useful and will make the # truncation of the context fail (the tokenized question will take a lots of space). So we remove that # left whitespace - examples[question_column_name] = [q.lstrip() for q in examples[question_column_name]] + examples[question_column_name] = [ + q.lstrip() for q in examples[question_column_name] + ] # Tokenize our examples with truncation and maybe padding, but keep the overflows using a stride. This results # in one example possible giving several features when a context is long, each of those features having a @@ -416,13 +481,19 @@ def prepare_train_features(examples): token_end_index -= 1 # Detect if the answer is out of the span (in which case this feature is labeled with the CLS index). - if not (offsets[token_start_index][0] <= start_char and offsets[token_end_index][1] >= end_char): + if not ( + offsets[token_start_index][0] <= start_char + and offsets[token_end_index][1] >= end_char + ): tokenized_examples["start_positions"].append(cls_index) tokenized_examples["end_positions"].append(cls_index) else: # Otherwise move the token_start_index and token_end_index to the two ends of the answer. # Note: we could go after the last offset if the answer is the last word (edge case). - while token_start_index < len(offsets) and offsets[token_start_index][0] <= start_char: + while ( + token_start_index < len(offsets) + and offsets[token_start_index][0] <= start_char + ): token_start_index += 1 tokenized_examples["start_positions"].append(token_start_index - 1) while offsets[token_end_index][1] >= end_char: @@ -457,7 +528,9 @@ def prepare_validation_features(examples): # Some of the questions have lots of whitespace on the left, which is not useful and will make the # truncation of the context fail (the tokenized question will take a lots of space). So we remove that # left whitespace - examples[question_column_name] = [q.lstrip() for q in examples[question_column_name]] + examples[question_column_name] = [ + q.lstrip() for q in examples[question_column_name] + ] # Tokenize our examples with truncation and maybe padding, but keep the overflows using a stride. 
This results # in one example possible giving several features when a context is long, each of those features having a @@ -507,7 +580,9 @@ def prepare_validation_features(examples): # We will select sample from whole data eval_examples = eval_examples.select(range(data_args.max_eval_samples)) # Validation Feature Creation - with training_args.main_process_first(desc="validation dataset map pre-processing"): + with training_args.main_process_first( + desc="validation dataset map pre-processing" + ): eval_dataset = eval_examples.map( prepare_validation_features, batched=True, @@ -526,9 +601,13 @@ def prepare_validation_features(examples): predict_examples = raw_datasets["test"] if data_args.max_predict_samples is not None: # We will select sample from whole data - predict_examples = predict_examples.select(range(data_args.max_predict_samples)) + predict_examples = predict_examples.select( + range(data_args.max_predict_samples) + ) # Predict Feature Creation - with training_args.main_process_first(desc="prediction dataset map pre-processing"): + with training_args.main_process_first( + desc="prediction dataset map pre-processing" + ): predict_dataset = predict_examples.map( prepare_validation_features, batched=True, @@ -539,7 +618,9 @@ def prepare_validation_features(examples): ) if data_args.max_predict_samples is not None: # During Feature creation dataset samples might increase, we will select required samples again - predict_dataset = predict_dataset.select(range(data_args.max_predict_samples)) + predict_dataset = predict_dataset.select( + range(data_args.max_predict_samples) + ) # Data collator # We have already padded to max length if the corresponding flag is True, otherwise we need to pad in the data @@ -547,7 +628,9 @@ def prepare_validation_features(examples): data_collator = ( default_data_collator if data_args.pad_to_max_length - else DataCollatorWithPadding(tokenizer, pad_to_multiple_of=8 if training_args.fp16 else None) + else DataCollatorWithPadding( + tokenizer, pad_to_multiple_of=8 if training_args.fp16 else None + ) ) # Post-processing: @@ -568,12 +651,17 @@ def post_processing_function(examples, features, predictions, stage="eval"): # Format the result to the format the metric expects. 
if data_args.version_2_with_negative: formatted_predictions = [ - {"id": k, "prediction_text": v, "no_answer_probability": 0.0} for k, v in predictions.items() + {"id": k, "prediction_text": v, "no_answer_probability": 0.0} + for k, v in predictions.items() ] else: - formatted_predictions = [{"id": k, "prediction_text": v} for k, v in predictions.items()] + formatted_predictions = [ + {"id": k, "prediction_text": v} for k, v in predictions.items() + ] - references = [{"id": ex["id"], "answers": ex[answer_column_name]} for ex in examples] + references = [ + {"id": ex["id"], "answers": ex[answer_column_name]} for ex in examples + ] return EvalPrediction(predictions=formatted_predictions, label_ids=references) metric = load_metric("squad_v2" if data_args.version_2_with_negative else "squad") @@ -606,7 +694,9 @@ def compute_metrics(p: EvalPrediction): metrics = train_result.metrics max_train_samples = ( - data_args.max_train_samples if data_args.max_train_samples is not None else len(train_dataset) + data_args.max_train_samples + if data_args.max_train_samples is not None + else len(train_dataset) ) metrics["train_samples"] = min(max_train_samples, len(train_dataset)) @@ -619,7 +709,11 @@ def compute_metrics(p: EvalPrediction): logger.info("*** Evaluate ***") metrics = trainer.evaluate() - max_eval_samples = data_args.max_eval_samples if data_args.max_eval_samples is not None else len(eval_dataset) + max_eval_samples = ( + data_args.max_eval_samples + if data_args.max_eval_samples is not None + else len(eval_dataset) + ) metrics["eval_samples"] = min(max_eval_samples, len(eval_dataset)) trainer.log_metrics("eval", metrics) @@ -632,19 +726,26 @@ def compute_metrics(p: EvalPrediction): metrics = results.metrics max_predict_samples = ( - data_args.max_predict_samples if data_args.max_predict_samples is not None else len(predict_dataset) + data_args.max_predict_samples + if data_args.max_predict_samples is not None + else len(predict_dataset) ) metrics["predict_samples"] = min(max_predict_samples, len(predict_dataset)) trainer.log_metrics("predict", metrics) trainer.save_metrics("predict", metrics) - kwargs = {"finetuned_from": model_args.model_name_or_path, "tasks": "question-answering"} + kwargs = { + "finetuned_from": model_args.model_name_or_path, + "tasks": "question-answering", + } if data_args.dataset_name is not None: kwargs["dataset_tags"] = data_args.dataset_name if data_args.dataset_config_name is not None: kwargs["dataset_args"] = data_args.dataset_config_name - kwargs["dataset"] = f"{data_args.dataset_name} {data_args.dataset_config_name}" + kwargs[ + "dataset" + ] = f"{data_args.dataset_name} {data_args.dataset_config_name}" else: kwargs["dataset"] = data_args.dataset_name diff --git a/examples/training/huggingface/bert/qa/trainer_qa.py b/examples/training/huggingface/bert/qa/trainer_qa.py index 7f98eba2..c3c2ba01 100644 --- a/examples/training/huggingface/bert/qa/trainer_qa.py +++ b/examples/training/huggingface/bert/qa/trainer_qa.py @@ -31,7 +31,13 @@ def __init__(self, *args, eval_examples=None, post_process_function=None, **kwar self.eval_examples = eval_examples self.post_process_function = post_process_function - def evaluate(self, eval_dataset=None, eval_examples=None, ignore_keys=None, metric_key_prefix: str = "eval"): + def evaluate( + self, + eval_dataset=None, + eval_examples=None, + ignore_keys=None, + metric_key_prefix: str = "eval", + ): eval_dataset = self.eval_dataset if eval_dataset is None else eval_dataset eval_dataloader = self.get_eval_dataloader(eval_dataset) 
eval_examples = self.eval_examples if eval_examples is None else eval_examples @@ -39,7 +45,11 @@ def evaluate(self, eval_dataset=None, eval_examples=None, ignore_keys=None, metr # Temporarily disable metric computation, we will do it in the loop here. compute_metrics = self.compute_metrics self.compute_metrics = None - eval_loop = self.prediction_loop if self.args.use_legacy_prediction_loop else self.evaluation_loop + eval_loop = ( + self.prediction_loop + if self.args.use_legacy_prediction_loop + else self.evaluation_loop + ) try: output = eval_loop( eval_dataloader, @@ -53,7 +63,9 @@ def evaluate(self, eval_dataset=None, eval_examples=None, ignore_keys=None, metr self.compute_metrics = compute_metrics if self.post_process_function is not None and self.compute_metrics is not None: - eval_preds = self.post_process_function(eval_examples, eval_dataset, output.predictions) + eval_preds = self.post_process_function( + eval_examples, eval_dataset, output.predictions + ) metrics = self.compute_metrics(eval_preds) # Prefix all keys with metric_key_prefix + '_' @@ -69,16 +81,28 @@ def evaluate(self, eval_dataset=None, eval_examples=None, ignore_keys=None, metr # tpu-comment: Logging debug metrics for PyTorch/XLA (compile, execute times, ops, etc.) xm.master_print(met.metrics_report()) - self.control = self.callback_handler.on_evaluate(self.args, self.state, self.control, metrics) + self.control = self.callback_handler.on_evaluate( + self.args, self.state, self.control, metrics + ) return metrics - def predict(self, predict_dataset, predict_examples, ignore_keys=None, metric_key_prefix: str = "test"): + def predict( + self, + predict_dataset, + predict_examples, + ignore_keys=None, + metric_key_prefix: str = "test", + ): predict_dataloader = self.get_test_dataloader(predict_dataset) # Temporarily disable metric computation, we will do it in the loop here. 
compute_metrics = self.compute_metrics self.compute_metrics = None - eval_loop = self.prediction_loop if self.args.use_legacy_prediction_loop else self.evaluation_loop + eval_loop = ( + self.prediction_loop + if self.args.use_legacy_prediction_loop + else self.evaluation_loop + ) try: output = eval_loop( predict_dataloader, @@ -94,7 +118,9 @@ def predict(self, predict_dataset, predict_examples, ignore_keys=None, metric_ke if self.post_process_function is None or self.compute_metrics is None: return output - predictions = self.post_process_function(predict_examples, predict_dataset, output.predictions, "predict") + predictions = self.post_process_function( + predict_examples, predict_dataset, output.predictions, "predict" + ) metrics = self.compute_metrics(predictions) # Prefix all keys with metric_key_prefix + '_' @@ -102,4 +128,8 @@ def predict(self, predict_dataset, predict_examples, ignore_keys=None, metric_ke if not key.startswith(f"{metric_key_prefix}_"): metrics[f"{metric_key_prefix}_{key}"] = metrics.pop(key) - return PredictionOutput(predictions=predictions.predictions, label_ids=predictions.label_ids, metrics=metrics) + return PredictionOutput( + predictions=predictions.predictions, + label_ids=predictions.label_ids, + metrics=metrics, + ) diff --git a/examples/training/huggingface/bert/qa/utils_qa.py b/examples/training/huggingface/bert/qa/utils_qa.py index fd0bc16f..c1c5c10b 100644 --- a/examples/training/huggingface/bert/qa/utils_qa.py +++ b/examples/training/huggingface/bert/qa/utils_qa.py @@ -74,11 +74,15 @@ def postprocess_qa_predictions( ``logging`` log level (e.g., ``logging.WARNING``) """ if len(predictions) != 2: - raise ValueError("`predictions` should be a tuple with two elements (start_logits, end_logits).") + raise ValueError( + "`predictions` should be a tuple with two elements (start_logits, end_logits)." + ) all_start_logits, all_end_logits = predictions if len(predictions[0]) != len(features): - raise ValueError(f"Got {len(predictions[0])} predictions and {len(features)} features.") + raise ValueError( + f"Got {len(predictions[0])} predictions and {len(features)} features." + ) # Build a map example to its corresponding features. example_id_to_index = {k: i for i, k in enumerate(examples["id"])} @@ -94,7 +98,9 @@ def postprocess_qa_predictions( # Logging. logger.setLevel(log_level) - logger.info(f"Post-processing {len(examples)} example predictions split into {len(features)} features.") + logger.info( + f"Post-processing {len(examples)} example predictions split into {len(features)} features." + ) # Let's loop over all the examples! for example_index, example in enumerate(tqdm(examples)): @@ -114,11 +120,16 @@ def postprocess_qa_predictions( offset_mapping = features[feature_index]["offset_mapping"] # Optional `token_is_max_context`, if provided we will remove answers that do not have the maximum context # available in the current feature. - token_is_max_context = features[feature_index].get("token_is_max_context", None) + token_is_max_context = features[feature_index].get( + "token_is_max_context", None + ) # Update minimum null prediction. 
feature_null_score = start_logits[0] + end_logits[0] - if min_null_prediction is None or min_null_prediction["score"] > feature_null_score: + if ( + min_null_prediction is None + or min_null_prediction["score"] > feature_null_score + ): min_null_prediction = { "offsets": (0, 0), "score": feature_null_score, @@ -127,7 +138,9 @@ def postprocess_qa_predictions( } # Go through all possibilities for the `n_best_size` greater start and end logits. - start_indexes = np.argsort(start_logits)[-1 : -n_best_size - 1 : -1].tolist() + start_indexes = np.argsort(start_logits)[ + -1 : -n_best_size - 1 : -1 + ].tolist() end_indexes = np.argsort(end_logits)[-1 : -n_best_size - 1 : -1].tolist() for start_index in start_indexes: for end_index in end_indexes: @@ -143,16 +156,25 @@ def postprocess_qa_predictions( ): continue # Don't consider answers with a length that is either < 0 or > max_answer_length. - if end_index < start_index or end_index - start_index + 1 > max_answer_length: + if ( + end_index < start_index + or end_index - start_index + 1 > max_answer_length + ): continue # Don't consider answer that don't have the maximum context available (if such information is # provided). - if token_is_max_context is not None and not token_is_max_context.get(str(start_index), False): + if ( + token_is_max_context is not None + and not token_is_max_context.get(str(start_index), False) + ): continue prelim_predictions.append( { - "offsets": (offset_mapping[start_index][0], offset_mapping[end_index][1]), + "offsets": ( + offset_mapping[start_index][0], + offset_mapping[end_index][1], + ), "score": start_logits[start_index] + end_logits[end_index], "start_logit": start_logits[start_index], "end_logit": end_logits[end_index], @@ -164,10 +186,14 @@ def postprocess_qa_predictions( null_score = min_null_prediction["score"] # Only keep the best `n_best_size` predictions. - predictions = sorted(prelim_predictions, key=lambda x: x["score"], reverse=True)[:n_best_size] + predictions = sorted( + prelim_predictions, key=lambda x: x["score"], reverse=True + )[:n_best_size] # Add back the minimum null prediction if it was removed because of its low score. - if version_2_with_negative and not any(p["offsets"] == (0, 0) for p in predictions): + if version_2_with_negative and not any( + p["offsets"] == (0, 0) for p in predictions + ): predictions.append(min_null_prediction) # Use the offsets to gather the answer text in the original context. @@ -178,8 +204,12 @@ def postprocess_qa_predictions( # In the very rare edge case we have not a single non-null prediction, we create a fake prediction to avoid # failure. - if len(predictions) == 0 or (len(predictions) == 1 and predictions[0]["text"] == ""): - predictions.insert(0, {"text": "empty", "start_logit": 0.0, "end_logit": 0.0, "score": 0.0}) + if len(predictions) == 0 or ( + len(predictions) == 1 and predictions[0]["text"] == "" + ): + predictions.insert( + 0, {"text": "empty", "start_logit": 0.0, "end_logit": 0.0, "score": 0.0} + ) # Compute the softmax of all scores (we do it with numpy to stay independent from torch/tf in this file, using # the LogSumExp trick). @@ -202,8 +232,14 @@ def postprocess_qa_predictions( best_non_null_pred = predictions[i] # Then we compare to the null prediction using the threshold. - score_diff = null_score - best_non_null_pred["start_logit"] - best_non_null_pred["end_logit"] - scores_diff_json[example["id"]] = float(score_diff) # To be JSON-serializable. 
+ score_diff = ( + null_score + - best_non_null_pred["start_logit"] + - best_non_null_pred["end_logit"] + ) + scores_diff_json[example["id"]] = float( + score_diff + ) # To be JSON-serializable. if score_diff > null_score_diff_threshold: all_predictions[example["id"]] = "" else: @@ -211,7 +247,14 @@ def postprocess_qa_predictions( # Make `predictions` JSON-serializable by casting np.float back to float. all_nbest_json[example["id"]] = [ - {k: (float(v) if isinstance(v, (np.float16, np.float32, np.float64)) else v) for k, v in pred.items()} + { + k: ( + float(v) + if isinstance(v, (np.float16, np.float32, np.float64)) + else v + ) + for k, v in pred.items() + } for pred in predictions ] @@ -221,14 +264,19 @@ def postprocess_qa_predictions( raise EnvironmentError(f"{output_dir} is not a directory.") prediction_file = os.path.join( - output_dir, "predictions.json" if prefix is None else f"{prefix}_predictions.json" + output_dir, + "predictions.json" if prefix is None else f"{prefix}_predictions.json", ) nbest_file = os.path.join( - output_dir, "nbest_predictions.json" if prefix is None else f"{prefix}_nbest_predictions.json" + output_dir, + "nbest_predictions.json" + if prefix is None + else f"{prefix}_nbest_predictions.json", ) if version_2_with_negative: null_odds_file = os.path.join( - output_dir, "null_odds.json" if prefix is None else f"{prefix}_null_odds.json" + output_dir, + "null_odds.json" if prefix is None else f"{prefix}_null_odds.json", ) logger.info(f"Saving predictions to {prediction_file}.") @@ -291,10 +339,18 @@ def postprocess_qa_predictions_with_beam_search( """ if len(predictions) != 5: raise ValueError("`predictions` should be a tuple with five elements.") - start_top_log_probs, start_top_index, end_top_log_probs, end_top_index, cls_logits = predictions + ( + start_top_log_probs, + start_top_index, + end_top_log_probs, + end_top_index, + cls_logits, + ) = predictions if len(predictions[0]) != len(features): - raise ValueError(f"Got {len(predictions[0])} predictions and {len(features)} features.") + raise ValueError( + f"Got {len(predictions[0])} predictions and {len(features)} features." + ) # Build a map example to its corresponding features. example_id_to_index = {k: i for i, k in enumerate(examples["id"])} @@ -309,7 +365,9 @@ def postprocess_qa_predictions_with_beam_search( # Logging. logger.setLevel(log_level) - logger.info(f"Post-processing {len(examples)} example predictions split into {len(features)} features.") + logger.info( + f"Post-processing {len(examples)} example predictions split into {len(features)} features." + ) # Let's loop over all the examples! for example_index, example in enumerate(tqdm(examples)): @@ -332,7 +390,9 @@ def postprocess_qa_predictions_with_beam_search( offset_mapping = features[feature_index]["offset_mapping"] # Optional `token_is_max_context`, if provided we will remove answers that do not have the maximum context # available in the current feature. - token_is_max_context = features[feature_index].get("token_is_max_context", None) + token_is_max_context = features[feature_index].get( + "token_is_max_context", None + ) # Update minimum null prediction if min_null_score is None or feature_null_score < min_null_score: @@ -354,15 +414,24 @@ def postprocess_qa_predictions_with_beam_search( ): continue # Don't consider answers with a length negative or > max_answer_length. 
- if end_index < start_index or end_index - start_index + 1 > max_answer_length: + if ( + end_index < start_index + or end_index - start_index + 1 > max_answer_length + ): continue # Don't consider answer that don't have the maximum context available (if such information is # provided). - if token_is_max_context is not None and not token_is_max_context.get(str(start_index), False): + if ( + token_is_max_context is not None + and not token_is_max_context.get(str(start_index), False) + ): continue prelim_predictions.append( { - "offsets": (offset_mapping[start_index][0], offset_mapping[end_index][1]), + "offsets": ( + offset_mapping[start_index][0], + offset_mapping[end_index][1], + ), "score": start_log_prob[i] + end_log_prob[j_index], "start_log_prob": start_log_prob[i], "end_log_prob": end_log_prob[j_index], @@ -370,7 +439,9 @@ def postprocess_qa_predictions_with_beam_search( ) # Only keep the best `n_best_size` predictions. - predictions = sorted(prelim_predictions, key=lambda x: x["score"], reverse=True)[:n_best_size] + predictions = sorted( + prelim_predictions, key=lambda x: x["score"], reverse=True + )[:n_best_size] # Use the offsets to gather the answer text in the original context. context = example["context"] @@ -381,7 +452,10 @@ def postprocess_qa_predictions_with_beam_search( # In the very rare edge case we have not a single non-null prediction, we create a fake prediction to avoid # failure. if len(predictions) == 0: - predictions.insert(0, {"text": "", "start_logit": -1e-6, "end_logit": -1e-6, "score": -2e-6}) + predictions.insert( + 0, + {"text": "", "start_logit": -1e-6, "end_logit": -1e-6, "score": -2e-6}, + ) # Compute the softmax of all scores (we do it with numpy to stay independent from torch/tf in this file, using # the LogSumExp trick). @@ -400,7 +474,14 @@ def postprocess_qa_predictions_with_beam_search( # Make `predictions` JSON-serializable by casting np.float back to float. 
all_nbest_json[example["id"]] = [ - {k: (float(v) if isinstance(v, (np.float16, np.float32, np.float64)) else v) for k, v in pred.items()} + { + k: ( + float(v) + if isinstance(v, (np.float16, np.float32, np.float64)) + else v + ) + for k, v in pred.items() + } for pred in predictions ] @@ -410,14 +491,19 @@ def postprocess_qa_predictions_with_beam_search( raise EnvironmentError(f"{output_dir} is not a directory.") prediction_file = os.path.join( - output_dir, "predictions.json" if prefix is None else f"{prefix}_predictions.json" + output_dir, + "predictions.json" if prefix is None else f"{prefix}_predictions.json", ) nbest_file = os.path.join( - output_dir, "nbest_predictions.json" if prefix is None else f"{prefix}_nbest_predictions.json" + output_dir, + "nbest_predictions.json" + if prefix is None + else f"{prefix}_nbest_predictions.json", ) if version_2_with_negative: null_odds_file = os.path.join( - output_dir, "null_odds.json" if prefix is None else f"{prefix}_null_odds.json" + output_dir, + "null_odds.json" if prefix is None else f"{prefix}_null_odds.json", ) logger.info(f"Saving predictions to {prediction_file}.") From 6657d868bb297fe3e2ff6991d353ac2880c5643f Mon Sep 17 00:00:00 2001 From: "weiyang.god" Date: Sat, 19 Mar 2022 02:46:55 +0800 Subject: [PATCH 10/49] rename huggingface dir to fix conflict with datasets --- examples/training/huggingface/bert/README.md | 4 ++-- .../training/huggingface/bert/{glue => task_glue}/run_glue.py | 0 .../training/huggingface/bert/{glue => task_glue}/run_glue.sh | 0 .../huggingface/bert/{glue => task_glue}/run_quant_glue.sh | 0 .../training/huggingface/bert/{ner => task_ner}/run_ner.py | 0 .../training/huggingface/bert/{ner => task_ner}/run_ner.sh | 0 .../huggingface/bert/{ner => task_ner}/run_quant_ner.sh | 0 examples/training/huggingface/bert/{qa => task_qa}/run_qa.py | 0 examples/training/huggingface/bert/{qa => task_qa}/run_qa.sh | 0 .../training/huggingface/bert/{qa => task_qa}/trainer_qa.py | 0 .../training/huggingface/bert/{qa => task_qa}/utils_qa.py | 0 11 files changed, 2 insertions(+), 2 deletions(-) rename examples/training/huggingface/bert/{glue => task_glue}/run_glue.py (100%) rename examples/training/huggingface/bert/{glue => task_glue}/run_glue.sh (100%) rename examples/training/huggingface/bert/{glue => task_glue}/run_quant_glue.sh (100%) rename examples/training/huggingface/bert/{ner => task_ner}/run_ner.py (100%) rename examples/training/huggingface/bert/{ner => task_ner}/run_ner.sh (100%) rename examples/training/huggingface/bert/{ner => task_ner}/run_quant_ner.sh (100%) rename examples/training/huggingface/bert/{qa => task_qa}/run_qa.py (100%) rename examples/training/huggingface/bert/{qa => task_qa}/run_qa.sh (100%) rename examples/training/huggingface/bert/{qa => task_qa}/trainer_qa.py (100%) rename examples/training/huggingface/bert/{qa => task_qa}/utils_qa.py (100%) diff --git a/examples/training/huggingface/bert/README.md b/examples/training/huggingface/bert/README.md index d96138ad..b3f8a0b3 100644 --- a/examples/training/huggingface/bert/README.md +++ b/examples/training/huggingface/bert/README.md @@ -15,5 +15,5 @@ Before doing next training, you need to switch to the current directory: cd examples/training/huggingface/bert ``` -Then you can easily fine-tunes BERT on different task by running the bash scripts `run_ner.sh` -or on GLUE by `run_glue.sh`. From our tests, speedup is about 1.6x. 
+Then you can easily fine-tunes BERT on different tasks by running the bash scripts `task_ner/run_ner.sh` +, `task_glue/run_glue.sh`, `task_qa/run_qa.sh`, etc. From our tests, speedup is about 1.6x. diff --git a/examples/training/huggingface/bert/glue/run_glue.py b/examples/training/huggingface/bert/task_glue/run_glue.py similarity index 100% rename from examples/training/huggingface/bert/glue/run_glue.py rename to examples/training/huggingface/bert/task_glue/run_glue.py diff --git a/examples/training/huggingface/bert/glue/run_glue.sh b/examples/training/huggingface/bert/task_glue/run_glue.sh similarity index 100% rename from examples/training/huggingface/bert/glue/run_glue.sh rename to examples/training/huggingface/bert/task_glue/run_glue.sh diff --git a/examples/training/huggingface/bert/glue/run_quant_glue.sh b/examples/training/huggingface/bert/task_glue/run_quant_glue.sh similarity index 100% rename from examples/training/huggingface/bert/glue/run_quant_glue.sh rename to examples/training/huggingface/bert/task_glue/run_quant_glue.sh diff --git a/examples/training/huggingface/bert/ner/run_ner.py b/examples/training/huggingface/bert/task_ner/run_ner.py similarity index 100% rename from examples/training/huggingface/bert/ner/run_ner.py rename to examples/training/huggingface/bert/task_ner/run_ner.py diff --git a/examples/training/huggingface/bert/ner/run_ner.sh b/examples/training/huggingface/bert/task_ner/run_ner.sh similarity index 100% rename from examples/training/huggingface/bert/ner/run_ner.sh rename to examples/training/huggingface/bert/task_ner/run_ner.sh diff --git a/examples/training/huggingface/bert/ner/run_quant_ner.sh b/examples/training/huggingface/bert/task_ner/run_quant_ner.sh similarity index 100% rename from examples/training/huggingface/bert/ner/run_quant_ner.sh rename to examples/training/huggingface/bert/task_ner/run_quant_ner.sh diff --git a/examples/training/huggingface/bert/qa/run_qa.py b/examples/training/huggingface/bert/task_qa/run_qa.py similarity index 100% rename from examples/training/huggingface/bert/qa/run_qa.py rename to examples/training/huggingface/bert/task_qa/run_qa.py diff --git a/examples/training/huggingface/bert/qa/run_qa.sh b/examples/training/huggingface/bert/task_qa/run_qa.sh similarity index 100% rename from examples/training/huggingface/bert/qa/run_qa.sh rename to examples/training/huggingface/bert/task_qa/run_qa.sh diff --git a/examples/training/huggingface/bert/qa/trainer_qa.py b/examples/training/huggingface/bert/task_qa/trainer_qa.py similarity index 100% rename from examples/training/huggingface/bert/qa/trainer_qa.py rename to examples/training/huggingface/bert/task_qa/trainer_qa.py diff --git a/examples/training/huggingface/bert/qa/utils_qa.py b/examples/training/huggingface/bert/task_qa/utils_qa.py similarity index 100% rename from examples/training/huggingface/bert/qa/utils_qa.py rename to examples/training/huggingface/bert/task_qa/utils_qa.py From 4a986aee79e35a203073eaa8a453e5197b74dbee Mon Sep 17 00:00:00 2001 From: "weiyang.god" Date: Sat, 19 Mar 2022 02:57:35 +0800 Subject: [PATCH 11/49] fix typo of gpt --- examples/training/huggingface/gpt/run_clm.py | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/examples/training/huggingface/gpt/run_clm.py b/examples/training/huggingface/gpt/run_clm.py index 807d1934..90b9dd8d 100644 --- a/examples/training/huggingface/gpt/run_clm.py +++ b/examples/training/huggingface/gpt/run_clm.py @@ -66,7 +66,7 @@ MODEL_CONFIG_CLASSES = list(MODEL_FOR_CAUSAL_LM_MAPPING.keys()) 
-module_typeS = tuple(conf.module_type for conf in MODEL_CONFIG_CLASSES) +MODEL_TYPES = tuple(conf.model_type for conf in MODEL_CONFIG_CLASSES) @dataclass @@ -82,11 +82,11 @@ class ModelArguments: "Don't set if you want to train a model from scratch." }, ) - module_type: Optional[str] = field( + model_type: Optional[str] = field( default=None, metadata={ "help": "If training from scratch, pass a model type from the list: " - + ", ".join(module_typeS) + + ", ".join(MODEL_TYPES) }, ) config_overrides: Optional[str] = field( @@ -390,7 +390,7 @@ def main(): model_args.model_name_or_path, **config_kwargs ) else: - config = CONFIG_MAPPING[model_args.module_type]() + config = CONFIG_MAPPING[model_args.model_type]() logger.warning("You are instantiating a new config instance from scratch.") if model_args.config_overrides is not None: logger.info(f"Overriding config: {model_args.config_overrides}") From 76aa5d8be10734d2bab5cb9dd583f14766d2ccba Mon Sep 17 00:00:00 2001 From: "weiyang.god" Date: Tue, 22 Mar 2022 14:12:35 +0800 Subject: [PATCH 12/49] export fairseq models to hdf5 --- .../fairseq/ls_fs_transformer_export.py | 83 +++----- .../fairseq/ls_fs_transformer_ptq_export.py | 71 +++---- .../ls_torch_fs_quant_transformer_export.py | 40 ++-- .../fairseq/ls_torch_fs_transformer_export.py | 36 ++-- .../ls_torch_fs_transformer_ptq_export.py | 38 ++-- .../fairseq/native_fs_transformer_export.py | 41 +--- .../native_fs_transformer_ptq_export.py | 32 +-- .../inference/python/export/fairseq/util.py | 46 +++++ .../huggingface/hf_torch_quant_bert_export.py | 182 ++++++++++++++++++ lightseq/training/__init__.py | 3 +- lightseq/training/ops/pytorch/export.py | 2 +- .../{export_ptq.py => export_quant.py} | 162 ++++++++++++++++ 12 files changed, 507 insertions(+), 229 deletions(-) create mode 100644 examples/inference/python/export/fairseq/util.py create mode 100644 examples/inference/python/export/huggingface/hf_torch_quant_bert_export.py rename lightseq/training/ops/pytorch/{export_ptq.py => export_quant.py} (67%) diff --git a/examples/inference/python/export/fairseq/ls_fs_transformer_export.py b/examples/inference/python/export/fairseq/ls_fs_transformer_export.py index ff4b7704..c1930940 100644 --- a/examples/inference/python/export/fairseq/ls_fs_transformer_export.py +++ b/examples/inference/python/export/fairseq/ls_fs_transformer_export.py @@ -2,9 +2,7 @@ Export Fairseq Transformer models training with LightSeq modules to protobuf/hdf5 format. Refer to the `examples/training/fairseq` directory for more training details. 
""" -import argparse import torch -import h5py from export.proto.transformer_pb2 import Transformer from lightseq.training import ( export_ls_config, @@ -13,6 +11,7 @@ export_ls_decoder, ) import lightseq.inference as lsi +from export.fairseq.util import parse_args, save_model def _extract_weight(state_dict): @@ -26,7 +25,7 @@ def _extract_weight(state_dict): return encoder_state_dict, decoder_state_dict -def export_fs_weights(file, state_dict, save_pb=True): +def export_fs_weights(transformer, state_dict): enc_norm_w = state_dict["encoder.layer_norm.weight"].flatten().tolist() enc_norm_b = state_dict["encoder.layer_norm.bias"].flatten().tolist() dec_norm_w = state_dict["decoder.layer_norm.weight"].flatten().tolist() @@ -36,78 +35,52 @@ def export_fs_weights(file, state_dict, save_pb=True): .flatten() .tolist() ) - if save_pb: - file.src_embedding.norm_scale[:] = enc_norm_w - file.src_embedding.norm_bias[:] = enc_norm_b - file.trg_embedding.norm_scale[:] = dec_norm_w - file.trg_embedding.norm_bias[:] = dec_norm_b - file.trg_embedding.shared_bias[:] = dec_shared_b - else: - file.create_dataset("src_embedding/norm_scale", data=enc_norm_w, dtype="f4") - file.create_dataset("src_embedding/norm_bias", data=enc_norm_b, dtype="f4") - file.create_dataset("trg_embedding/norm_scale", data=dec_norm_w, dtype="f4") - file.create_dataset("trg_embedding/norm_bias", data=dec_norm_b, dtype="f4") - file.create_dataset("trg_embedding/shared_bias", data=dec_shared_b, dtype="f4") + transformer.src_embedding.norm_scale[:] = enc_norm_w + transformer.src_embedding.norm_bias[:] = enc_norm_b + transformer.trg_embedding.norm_scale[:] = dec_norm_w + transformer.trg_embedding.norm_bias[:] = dec_norm_b + transformer.trg_embedding.shared_bias[:] = dec_shared_b -def export_ls_fs_transformer(ckpt_path, out_path, save_pb=True): - with open(ckpt_path, "rb") as fin: +def export_ls_fs_transformer(model_path, pb_path, hdf5_path, hdf5): + with open(model_path, "rb") as fin: ckpt_file = torch.load(fin) args = ckpt_file["args"] state_dict = ckpt_file["model"] - if save_pb: - file = Transformer() - else: - file = h5py.File(out_path, "w") + transformer = Transformer() encoder_state_dict, decoder_state_dict = _extract_weight(state_dict) - export_ls_embedding(file, encoder_state_dict, 300, True, save_pb) - export_ls_embedding(file, decoder_state_dict, 300, False, save_pb) + export_ls_embedding(transformer, encoder_state_dict, 300, True, save_pb=True) + export_ls_embedding(transformer, decoder_state_dict, 300, False, save_pb=True) export_ls_encoder( - file, + transformer, encoder_state_dict, args.encoder_embed_dim, args.encoder_ffn_embed_dim, - save_pb, + save_pb=True, ) export_ls_decoder( - file, + transformer, decoder_state_dict, args.decoder_embed_dim, args.decoder_ffn_embed_dim, args.decoder_layers, - save_pb, + save_pb=True, ) - export_fs_weights(file, state_dict, save_pb) + export_fs_weights(transformer, state_dict) export_ls_config( - file, + transformer, args.encoder_attention_heads, 1, 2, 2, args.encoder_layers, args.decoder_layers, - save_pb=save_pb, + save_pb=True, ) - if save_pb: - with open(out_path, "wb") as fout: - fout.write(file.SerializeToString()) - else: - file.close() - - -def parse_args(): - parser = argparse.ArgumentParser(description="export fairseq checkpoint", usage="") - parser.add_argument( - "--model", - "-m", - type=str, - default="checkpoint_best.pt", - help="path of fairseq checkpoint", - ) - args = parser.parse_args() - return args + save_path = save_model(transformer, pb_path, hdf5_path, hdf5) + 
return save_path if __name__ == "__main__": @@ -115,15 +88,9 @@ def parse_args(): model_name = ".".join(args.model.split(".")[:-1]) pb_path = f"{model_name}.pb" hdf5_path = f"{model_name}.hdf5" - print("export to pb model >>>>>>") - export_ls_fs_transformer(args.model, pb_path) - print("export to hdf5 model >>>>>>") - export_ls_fs_transformer(args.model, hdf5_path, save_pb=False) + path = export_ls_fs_transformer(args.model, pb_path, hdf5_path, args.hdf5) src = [[63, 47, 65, 1507, 88, 74, 10, 2057, 362, 9, 284, 6, 2, 1, 1, 1]] - pb_model = lsi.Transformer(pb_path, 8) - pb_output = pb_model.infer(src) - hdf5_model = lsi.Transformer(hdf5_path, 8) - hdf5_output = hdf5_model.infer(src) + model = lsi.Transformer(path, 8) + output = model.infer(src) # Expected result: [23, 550, 34, 118, 148, 2939, 4, 42, 32, 37, 6, 224, 10, 179, 5, 2] - print("pb results:", pb_output) - print("hdf5 results:", hdf5_output) + print("results:", output) diff --git a/examples/inference/python/export/fairseq/ls_fs_transformer_ptq_export.py b/examples/inference/python/export/fairseq/ls_fs_transformer_ptq_export.py index 98c28ca7..5ff9c780 100644 --- a/examples/inference/python/export/fairseq/ls_fs_transformer_ptq_export.py +++ b/examples/inference/python/export/fairseq/ls_fs_transformer_ptq_export.py @@ -3,7 +3,6 @@ to int8 protobuf format using post training quantization. Refer to the `examples/training/fairseq` directory for more training details. """ -import argparse import torch from export.proto.quant_transformer_pb2 import QuantTransformer from lightseq.training import ( @@ -13,6 +12,7 @@ export_ls_decoder_ptq, ) import lightseq.inference as lsi +from export.fairseq.util import parse_args, save_model # adjust this value to achieve better performance @@ -30,7 +30,7 @@ def _extract_weight(state_dict): return encoder_state_dict, decoder_state_dict -def export_fs_weights(file, state_dict, save_pb=True): +def export_fs_weights(transformer, state_dict): enc_norm_w = state_dict["encoder.layer_norm.weight"].flatten().tolist() enc_norm_b = state_dict["encoder.layer_norm.bias"].flatten().tolist() dec_norm_w = state_dict["decoder.layer_norm.weight"].flatten().tolist() @@ -40,89 +40,76 @@ def export_fs_weights(file, state_dict, save_pb=True): .flatten() .tolist() ) - file.src_embedding.norm_scale[:] = enc_norm_w - file.src_embedding.norm_bias[:] = enc_norm_b - file.trg_embedding.norm_scale[:] = dec_norm_w - file.trg_embedding.norm_bias[:] = dec_norm_b - file.trg_embedding.shared_bias[:] = dec_shared_b + transformer.src_embedding.norm_scale[:] = enc_norm_w + transformer.src_embedding.norm_bias[:] = enc_norm_b + transformer.trg_embedding.norm_scale[:] = dec_norm_w + transformer.trg_embedding.norm_bias[:] = dec_norm_b + transformer.trg_embedding.shared_bias[:] = dec_shared_b -def export_ls_fs_transformer_ptq(ckpt_path, out_path, save_pb=True): - with open(ckpt_path, "rb") as fin: +def export_ls_fs_transformer_ptq(model_path, pb_path, hdf5_path, hdf5): + with open(model_path, "rb") as fin: ckpt_file = torch.load(fin) args = ckpt_file["args"] state_dict = ckpt_file["model"] - file = QuantTransformer() + transformer = QuantTransformer() encoder_state_dict, decoder_state_dict = _extract_weight(state_dict) export_ls_embedding_ptq( - file, + transformer, encoder_state_dict, 300, True, - save_pb=save_pb, + save_pb=True, ) export_ls_embedding_ptq( - file, + transformer, decoder_state_dict, 300, False, - save_pb=save_pb, + save_pb=True, ) export_ls_encoder_ptq( - file, + transformer, encoder_state_dict, args.encoder_embed_dim, 
args.encoder_ffn_embed_dim, act_clip_max=global_act_clip_max, - save_pb=save_pb, + save_pb=True, ) export_ls_decoder_ptq( - file, + transformer, decoder_state_dict, args.decoder_embed_dim, args.decoder_ffn_embed_dim, args.decoder_layers, act_clip_max=global_act_clip_max, - save_pb=save_pb, + save_pb=True, ) - export_fs_weights(file, state_dict, save_pb) + export_fs_weights(transformer, state_dict) export_ls_config( - file, + transformer, args.encoder_attention_heads, 1, 2, 2, args.encoder_layers, args.decoder_layers, - save_pb=save_pb, + save_pb=True, ) - with open(out_path, "wb") as fout: - fout.write(file.SerializeToString()) - - -def parse_args(): - parser = argparse.ArgumentParser(description="export fairseq checkpoint", usage="") - parser.add_argument( - "--model", - "-m", - type=str, - default="checkpoint_best.pt", - help="path of fairseq checkpoint", - ) - args = parser.parse_args() - return args + save_path = save_model(transformer, pb_path, hdf5_path, hdf5) + return save_path if __name__ == "__main__": args = parse_args() model_name = ".".join(args.model.split(".")[:-1]) pb_path = f"{model_name}_ptq.pb" - print("export to pb model >>>>>>") - export_ls_fs_transformer_ptq(args.model, pb_path) + hdf5_path = f"{model_name}_ptq.hdf5" + path = export_ls_fs_transformer_ptq(args.model, pb_path, hdf5_path, args.hdf5) src = [[63, 47, 65, 1507, 88, 74, 10, 2057, 362, 9, 284, 6, 2, 1, 1, 1]] - pb_model = lsi.QuantTransformer(pb_path, 8) - pb_output = pb_model.infer(src) - # FP16 result: [23, 550, 34, 118, 148, 2939, 4, 42, 32, 37, 6, 224, 10, 179, 5, 2] - print("pb results:", pb_output) + model = lsi.QuantTransformer(path, 8) + output = model.infer(src) + # Expected result: [23, 550, 34, 118, 148, 2939, 4, 42, 32, 37, 6, 224, 10, 179, 5, 2] + print("results:", output) diff --git a/examples/inference/python/export/fairseq/ls_torch_fs_quant_transformer_export.py b/examples/inference/python/export/fairseq/ls_torch_fs_quant_transformer_export.py index 3a5702a7..f7b7b9c4 100644 --- a/examples/inference/python/export/fairseq/ls_torch_fs_quant_transformer_export.py +++ b/examples/inference/python/export/fairseq/ls_torch_fs_quant_transformer_export.py @@ -4,18 +4,17 @@ Refer to the `examples/training/fairseq` directory for more training details. 
""" from collections import OrderedDict -import argparse import torch -import tensorflow as tf from export.proto.quant_transformer_pb2 import QuantTransformer from lightseq.training.ops.pytorch.export import export_ls_config, apply_rule -from lightseq.training.ops.pytorch.export_ptq import ( +from lightseq.training.ops.pytorch.export_quant import ( gather_quant_token_embedding, quantize, ) from lightseq.training.ops.pytorch.util import get_pos_embedding import lightseq.inference as lsi +from export.fairseq.util import parse_args, save_model enc_layer_mapping_dict = OrderedDict( @@ -148,8 +147,10 @@ def fill_quant_pb_layer(tensor_names, state_dict, layer, mapping_dict): def export_ls_torch_fs_quant_transformer( - model_dir, + model_path, pb_path, + hdf5_path, + hdf5, max_step=300, bos_id=2, eos_id=2, @@ -157,7 +158,7 @@ def export_ls_torch_fs_quant_transformer( ): transformer = QuantTransformer() # load var names - reloaded = torch.load(model_dir, "cpu") + reloaded = torch.load(model_path, "cpu") args = reloaded["args"] model_dict = reloaded["model"] @@ -305,31 +306,20 @@ def export_ls_torch_fs_quant_transformer( save_pb=True, ) - print("Writing to {0}".format(pb_path)) - with tf.io.gfile.GFile(pb_path, "wb") as fout: - fout.write(transformer.SerializeToString()) - - -def parse_args(): - parser = argparse.ArgumentParser(description="export fairseq checkpoint", usage="") - parser.add_argument( - "--model", - "-m", - type=str, - default="checkpoint_best.pt", - help="path of fairseq checkpoint", - ) - args = parser.parse_args() - return args + save_path = save_model(transformer, pb_path, hdf5_path, hdf5) + return save_path if __name__ == "__main__": args = parse_args() model_name = ".".join(args.model.split(".")[:-1]) pb_path = f"{model_name}.pb" - export_ls_torch_fs_quant_transformer(args.model, pb_path) + hdf5_path = f"{model_name}.hdf5" + path = export_ls_torch_fs_quant_transformer( + args.model, pb_path, hdf5_path, args.hdf5 + ) src = [[63, 47, 65, 1507, 88, 74, 10, 2057, 362, 9, 284, 6, 2, 1, 1, 1]] - pb_model = lsi.QuantTransformer(pb_path, 8) - pb_output = pb_model.infer(src) + model = lsi.QuantTransformer(path, 8) + output = model.infer(src) # Expected result: [23, 550, 34, 118, 148, 2939, 4, 42, 32, 37, 6, 224, 10, 179, 5, 2] - print("pb results:", pb_output) + print("results:", output) diff --git a/examples/inference/python/export/fairseq/ls_torch_fs_transformer_export.py b/examples/inference/python/export/fairseq/ls_torch_fs_transformer_export.py index 37373098..cbee3c8d 100644 --- a/examples/inference/python/export/fairseq/ls_torch_fs_transformer_export.py +++ b/examples/inference/python/export/fairseq/ls_torch_fs_transformer_export.py @@ -4,10 +4,8 @@ Refer to the `examples/training/fairseq` directory for more training details. 
""" from collections import OrderedDict -import argparse import torch -import tensorflow as tf from export.proto.transformer_pb2 import Transformer from lightseq.training.ops.pytorch.export import ( gather_token_embedding, @@ -16,6 +14,7 @@ ) from lightseq.training.ops.pytorch.util import get_pos_embedding import lightseq.inference as lsi +from export.fairseq.util import parse_args, save_model enc_layer_mapping_dict = OrderedDict( @@ -92,8 +91,10 @@ def _get_encode_output_mapping_dict(dec_layer_num): def export_ls_torch_fs_transformer( - model_dir, + model_path, pb_path, + hdf5_path, + hdf5, max_step=300, bos_id=2, eos_id=2, @@ -101,7 +102,7 @@ def export_ls_torch_fs_transformer( ): transformer = Transformer() # load var names - reloaded = torch.load(model_dir, "cpu") + reloaded = torch.load(model_path, "cpu") args = reloaded["args"] model_dict = reloaded["model"] @@ -230,31 +231,18 @@ def export_ls_torch_fs_transformer( save_pb=True, ) - print("Writing to {0}".format(pb_path)) - with tf.io.gfile.GFile(pb_path, "wb") as fout: - fout.write(transformer.SerializeToString()) - - -def parse_args(): - parser = argparse.ArgumentParser(description="export fairseq checkpoint", usage="") - parser.add_argument( - "--model", - "-m", - type=str, - default="checkpoint_best.pt", - help="path of fairseq checkpoint", - ) - args = parser.parse_args() - return args + save_path = save_model(transformer, pb_path, hdf5_path, hdf5) + return save_path if __name__ == "__main__": args = parse_args() model_name = ".".join(args.model.split(".")[:-1]) pb_path = f"{model_name}.pb" - export_ls_torch_fs_transformer(args.model, pb_path) + hdf5_path = f"{model_name}.hdf5" + path = export_ls_torch_fs_transformer(args.model, pb_path, hdf5_path, args.hdf5) src = [[63, 47, 65, 1507, 88, 74, 10, 2057, 362, 9, 284, 6, 2, 1, 1, 1]] - pb_model = lsi.Transformer(pb_path, 8) - pb_output = pb_model.infer(src) + model = lsi.Transformer(path, 8) + output = model.infer(src) # Expected result: [23, 550, 34, 118, 148, 2939, 4, 42, 32, 37, 6, 224, 10, 179, 5, 2] - print("pb results:", pb_output) + print("results:", output) diff --git a/examples/inference/python/export/fairseq/ls_torch_fs_transformer_ptq_export.py b/examples/inference/python/export/fairseq/ls_torch_fs_transformer_ptq_export.py index 9e706409..1ed7a0e1 100644 --- a/examples/inference/python/export/fairseq/ls_torch_fs_transformer_ptq_export.py +++ b/examples/inference/python/export/fairseq/ls_torch_fs_transformer_ptq_export.py @@ -4,18 +4,17 @@ Refer to the `examples/training/fairseq` directory for more training details. 
""" from collections import OrderedDict -import argparse import torch -import tensorflow as tf from export.proto.quant_transformer_pb2 import QuantTransformer from lightseq.training.ops.pytorch.export import export_ls_config -from lightseq.training.ops.pytorch.export_ptq import ( +from lightseq.training.ops.pytorch.export_quant import ( gather_quant_token_embedding, fill_quant_pb_layer, ) from lightseq.training.ops.pytorch.util import get_pos_embedding import lightseq.inference as lsi +from export.fairseq.util import parse_args, save_model # adjust this value to achieve better performance @@ -118,8 +117,10 @@ def _get_encode_output_mapping_dict(dec_layer_num): def export_ls_torch_fs_transformer_ptq( - model_dir, + model_path, pb_path, + hdf5_path, + hdf5, max_step=300, bos_id=2, eos_id=2, @@ -127,7 +128,7 @@ def export_ls_torch_fs_transformer_ptq( ): transformer = QuantTransformer() # load var names - reloaded = torch.load(model_dir, "cpu") + reloaded = torch.load(model_path, "cpu") args = reloaded["args"] model_dict = reloaded["model"] @@ -267,31 +268,18 @@ def export_ls_torch_fs_transformer_ptq( save_pb=True, ) - print("Writing to {0}".format(pb_path)) - with tf.io.gfile.GFile(pb_path, "wb") as fout: - fout.write(transformer.SerializeToString()) - - -def parse_args(): - parser = argparse.ArgumentParser(description="export fairseq checkpoint", usage="") - parser.add_argument( - "--model", - "-m", - type=str, - default="checkpoint_best.pt", - help="path of fairseq checkpoint", - ) - args = parser.parse_args() - return args + save_path = save_model(transformer, pb_path, hdf5_path, hdf5) + return save_path if __name__ == "__main__": args = parse_args() model_name = ".".join(args.model.split(".")[:-1]) pb_path = f"{model_name}_ptq.pb" - export_ls_torch_fs_transformer_ptq(args.model, pb_path) + hdf5_path = f"{model_name}_ptq.hdf5" + path = export_ls_torch_fs_transformer_ptq(args.model, pb_path, hdf5_path, args.hdf5) src = [[63, 47, 65, 1507, 88, 74, 10, 2057, 362, 9, 284, 6, 2, 1, 1, 1]] - pb_model = lsi.QuantTransformer(pb_path, 8) - pb_output = pb_model.infer(src) + model = lsi.QuantTransformer(path, 8) + output = model.infer(src) # Expected result: [23, 550, 34, 118, 148, 2939, 4, 42, 32, 37, 6, 224, 10, 179, 5, 2] - print("pb results:", pb_output) + print("results:", output) diff --git a/examples/inference/python/export/fairseq/native_fs_transformer_export.py b/examples/inference/python/export/fairseq/native_fs_transformer_export.py index 0b77fd19..fcc59234 100644 --- a/examples/inference/python/export/fairseq/native_fs_transformer_export.py +++ b/examples/inference/python/export/fairseq/native_fs_transformer_export.py @@ -3,20 +3,17 @@ Refer to the `examples/training/fairseq` directory for more training details. 
""" from collections import OrderedDict -import argparse import torch -import tensorflow as tf -import h5py from export.proto.transformer_pb2 import Transformer from lightseq.training.ops.pytorch.export import ( gather_token_embedding, fill_pb_layer, export_ls_config, - export_pb2hdf5, ) from lightseq.training.ops.pytorch.util import get_pos_embedding import lightseq.inference as lsi +from export.fairseq.util import parse_args, save_model enc_layer_mapping_dict = OrderedDict( @@ -93,9 +90,10 @@ def _get_encode_output_mapping_dict(dec_layer_num): def export_native_fs_transformer( - model_dir, + model_path, pb_path, hdf5_path, + hdf5, max_step=300, bos_id=2, eos_id=2, @@ -103,7 +101,7 @@ def export_native_fs_transformer( ): transformer = Transformer() # load var names - reloaded = torch.load(model_dir, "cpu") + reloaded = torch.load(model_path, "cpu") args = reloaded["args"] model_dict = reloaded["model"] @@ -234,27 +232,8 @@ def export_native_fs_transformer( save_pb=True, ) - print("Writing to {0}".format(pb_path)) - with tf.io.gfile.GFile(pb_path, "wb") as fout: - fout.write(transformer.SerializeToString()) - - print("Writing to {0}".format(hdf5_path)) - f = h5py.File(hdf5_path, "w") - export_pb2hdf5(transformer, f) - f.close() - - -def parse_args(): - parser = argparse.ArgumentParser(description="export fairseq checkpoint", usage="") - parser.add_argument( - "--model", - "-m", - type=str, - default="checkpoint_best.pt", - help="path of fairseq checkpoint", - ) - args = parser.parse_args() - return args + save_path = save_model(transformer, pb_path, hdf5_path, hdf5) + return save_path if __name__ == "__main__": @@ -262,9 +241,9 @@ def parse_args(): model_name = ".".join(args.model.split(".")[:-1]) pb_path = f"{model_name}.pb" hdf5_path = f"{model_name}.hdf5" - export_native_fs_transformer(args.model, pb_path, hdf5_path) + path = export_native_fs_transformer(args.model, pb_path, hdf5_path, args.hdf5) src = [[63, 47, 65, 1507, 88, 74, 10, 2057, 362, 9, 284, 6, 2, 1, 1, 1]] - pb_model = lsi.Transformer(pb_path, 8) - pb_output = pb_model.infer(src) + model = lsi.Transformer(path, 8) + output = model.infer(src) # Expected result: [23, 550, 34, 118, 148, 2939, 4, 42, 32, 37, 6, 224, 10, 179, 5, 2] - print("pb results:", pb_output) + print("results:", output) diff --git a/examples/inference/python/export/fairseq/native_fs_transformer_ptq_export.py b/examples/inference/python/export/fairseq/native_fs_transformer_ptq_export.py index 7d9d7b1d..66749e70 100644 --- a/examples/inference/python/export/fairseq/native_fs_transformer_ptq_export.py +++ b/examples/inference/python/export/fairseq/native_fs_transformer_ptq_export.py @@ -3,18 +3,17 @@ Refer to the `examples/training/fairseq` directory for more training details. 
""" from collections import OrderedDict -import argparse import torch -import tensorflow as tf from export.proto.quant_transformer_pb2 import QuantTransformer from lightseq.training.ops.pytorch.export import export_ls_config -from lightseq.training.ops.pytorch.export_ptq import ( +from lightseq.training.ops.pytorch.export_quant import ( gather_quant_token_embedding, fill_quant_pb_layer, ) from lightseq.training.ops.pytorch.util import get_pos_embedding import lightseq.inference as lsi +from export.fairseq.util import parse_args, save_model # adjust this value to achieve better performance @@ -118,8 +117,10 @@ def _get_encode_output_mapping_dict(dec_layer_num): def export_native_fs_transformer( - model_dir, + model_path, pb_path, + hdf5_path, + hdf5, max_step=300, bos_id=2, eos_id=2, @@ -127,7 +128,7 @@ def export_native_fs_transformer( ): transformer = QuantTransformer() # load var names - reloaded = torch.load(model_dir, "cpu") + reloaded = torch.load(model_path, "cpu") args = reloaded["args"] model_dict = reloaded["model"] @@ -267,29 +268,16 @@ def export_native_fs_transformer( save_pb=True, ) - print("Writing to {0}".format(pb_path)) - with tf.io.gfile.GFile(pb_path, "wb") as fout: - fout.write(transformer.SerializeToString()) - - -def parse_args(): - parser = argparse.ArgumentParser(description="export fairseq checkpoint", usage="") - parser.add_argument( - "--model", - "-m", - type=str, - default="checkpoint_best.pt", - help="path of fairseq checkpoint", - ) - args = parser.parse_args() - return args + save_path = save_model(transformer, pb_path, hdf5_path, hdf5) + return save_path if __name__ == "__main__": args = parse_args() model_name = ".".join(args.model.split(".")[:-1]) pb_path = f"{model_name}_ptq.pb" - export_native_fs_transformer(args.model, pb_path) + hdf5_path = f"{model_name}_ptq.hdf5" + export_native_fs_transformer(args.model, pb_path, hdf5_path, args.hdf5) src = [[63, 47, 65, 1507, 88, 74, 10, 2057, 362, 9, 284, 6, 2, 1, 1, 1]] pb_model = lsi.QuantTransformer(pb_path, 8) pb_output = pb_model.infer(src) diff --git a/examples/inference/python/export/fairseq/util.py b/examples/inference/python/export/fairseq/util.py new file mode 100644 index 00000000..eb962543 --- /dev/null +++ b/examples/inference/python/export/fairseq/util.py @@ -0,0 +1,46 @@ +import argparse +import tensorflow as tf +import h5py + +from export.proto.transformer_pb2 import Transformer +from lightseq.training import export_pb2hdf5 +from lightseq.training import export_quant_pb2hdf5 + +def parse_args(): + parser = argparse.ArgumentParser(description="export fairseq checkpoint", usage="") + parser.add_argument( + "--model", + "-m", + type=str, + default="checkpoint_best.pt", + help="path of fairseq checkpoint", + ) + parser.add_argument( + "--hdf5", + "-hdf5", + action='store_true', + help="whether to store hdf5", + ) + args = parser.parse_args() + return args + + +def save_model(transformer, pb_path, hdf5_path, hdf5): + if not hdf5: + try: + str_model = transformer.SerializeToString() + print("Writing to {0}".format(pb_path)) + with tf.io.gfile.GFile(pb_path, "wb") as fout: + fout.write(str_model) + return pb_path + except: + pass + + print("Writing to {0}".format(hdf5_path)) + f = h5py.File(hdf5_path, "w") + if isinstance(transformer, Transformer): + export_pb2hdf5(transformer, f) + else: + export_quant_pb2hdf5(transformer, f) + f.close() + return hdf5_path \ No newline at end of file diff --git a/examples/inference/python/export/huggingface/hf_torch_quant_bert_export.py 
b/examples/inference/python/export/huggingface/hf_torch_quant_bert_export.py new file mode 100644 index 00000000..569367b2 --- /dev/null +++ b/examples/inference/python/export/huggingface/hf_torch_quant_bert_export.py @@ -0,0 +1,182 @@ +""" +Export Hugging Face BERT models to hdf5 format. +""" +import os +import h5py +import numpy as np +from collections import OrderedDict +from transformers import BertModel +from lightseq.training.ops.pytorch.export import fill_hdf5_layer + +os.environ["CUDA_VISIBLE_DEVICES"] = "-1" + + +""" +For the mapping dictionary: key is the value of the proto parameter, +value is a powerful expression, each && split tensor name of the matching path or expression. + +The sub-pattern of the path is separated by spaces, and the expression starts with a expression_. +You can operate separately on each tensor and support multiple expressions. Multiple matching paths +and the expression will finally be concatenated on axis = -1. +""" +enc_layer_mapping_dict = OrderedDict( + { + # BERT is post_layernorm + # NOTE: add an additional "final" at the beginning for some weight + # to distinguish them from "attention output *" + "multihead_norm_scale": "attention output LayerNorm weight", + "multihead_norm_bias": "attention output LayerNorm bias", + "multihead_project_kernel_qkv": "attention self query weight&&attention self key weight&&attention self value weight&&expression_.transpose(0, 1)", + "multihead_project_bias_qkv": "attention self query bias&&attention self key bias&&attention self value bias", + "multihead_project_kernel_output": "attention output dense weight&&expression_.transpose(0, 1)", + "multihead_project_bias_output": "attention output dense bias", + "ffn_norm_scale": "final output LayerNorm weight", + "ffn_norm_bias": "final output LayerNorm bias", + "ffn_first_kernel": "intermediate dense weight&&expression_.transpose(0, 1)", + "ffn_first_bias": "intermediate dense bias", + "ffn_second_kernel": "final output dense weight&&expression_.transpose(0, 1)", + "ffn_second_bias": "final output dense bias", + } +) + +src_emb_mapping_dict = OrderedDict( + { + "norm_scale": "embeddings LayerNorm weight", + "norm_bias": "embeddings LayerNorm bias", + "position_embedding": "embeddings position_embeddings weight", + # manually process token_embedding due to "token_type_embeddings" + # "token_embedding": "embeddings word_embeddings weight", + } +) + + +def extract_bert_weights( + output_file, + model_dir, + head_num, + pad_id=0, + max_step=50, +): + # load var names + encoder_state_dict = BertModel.from_pretrained(model_dir).state_dict() + + # Insert additional "final" to some weight to prevent ambiguous match + def _insert_final(key): + l = key.split(".") + l.insert(3, "final") + return ".".join(l) + + encoder_state_dict = OrderedDict( + [ + (_insert_final(k), v) + if len(k.split(".")) > 3 and k.split(".")[3] == "output" + else (k, v) + for k, v in encoder_state_dict.items() + ] + ) + + enc_var_name_list = list(encoder_state_dict.keys()) + + # initialize output file + output_file += ".hdf5" + print("Saving model to hdf5...") + print("Writing to {0}".format(output_file)) + hdf5_file = h5py.File(output_file, "w") + + # fill each encoder layer's params + enc_tensor_names = {} + for name in enc_var_name_list: + name_split = name.split(".") + if len(name_split) <= 2 or not name_split[2].isdigit(): + continue + layer_id = int(name_split[2]) + enc_tensor_names.setdefault(layer_id, []).append(name) + + # fill encoder_stack + for layer_id in sorted(enc_tensor_names.keys()): + 
fill_hdf5_layer( + enc_tensor_names[layer_id], + encoder_state_dict, + hdf5_file, + f"encoder_stack/{layer_id}/", + enc_layer_mapping_dict, + ) + + # fill src_embedding - except for position embedding + fill_hdf5_layer( + enc_var_name_list, + encoder_state_dict, + hdf5_file, + "src_embedding/", + src_emb_mapping_dict, + ) + + # handling token_embeddings for BERT + token_embedding = ( + encoder_state_dict["embeddings.word_embeddings.weight"] + + encoder_state_dict["embeddings.token_type_embeddings.weight"][0] + ) + print(f"processed token_embedding, shape: {token_embedding.shape}") + token_embedding = token_embedding.flatten().tolist() + hdf5_file.create_dataset( + "src_embedding/token_embedding", data=token_embedding, dtype="f4" + ) + + # save number of layers metadata + hdf5_file.create_dataset( + "model_conf/n_encoder_stack", data=len(enc_tensor_names), dtype="i4" + ) + # fill in model_conf + hdf5_file.create_dataset("model_conf/head_num", data=head_num, dtype="i4") + hdf5_file.create_dataset("model_conf/src_padding_id", data=pad_id, dtype="i4") + hdf5_file.create_dataset("model_conf/is_post_ln", data=True, dtype="?") + hdf5_file.create_dataset("model_conf/use_gelu", data=True, dtype="?") + + # Move layernorm weights to match layernorm implementation in lightseq + tmp_scale, tmp_bias = ( + hdf5_file["src_embedding/norm_scale"][()], + hdf5_file["src_embedding/norm_bias"][()], + ) + for layer_id in sorted(enc_tensor_names.keys()): + new_tmp_scale = hdf5_file[f"encoder_stack/{layer_id}/multihead_norm_scale"][()] + new_tmp_bias = hdf5_file[f"encoder_stack/{layer_id}/multihead_norm_bias"][()] + hdf5_file[f"encoder_stack/{layer_id}/multihead_norm_scale"][()] = tmp_scale + hdf5_file[f"encoder_stack/{layer_id}/multihead_norm_bias"][()] = tmp_bias + tmp_scale, tmp_bias = new_tmp_scale, new_tmp_bias + + new_tmp_scale = hdf5_file[f"encoder_stack/{layer_id}/ffn_norm_scale"][()] + new_tmp_bias = hdf5_file[f"encoder_stack/{layer_id}/ffn_norm_bias"][()] + hdf5_file[f"encoder_stack/{layer_id}/ffn_norm_scale"][()] = tmp_scale + hdf5_file[f"encoder_stack/{layer_id}/ffn_norm_bias"][()] = tmp_bias + tmp_scale, tmp_bias = new_tmp_scale, new_tmp_bias + hdf5_file["src_embedding/norm_scale"][()] = tmp_scale + hdf5_file["src_embedding/norm_bias"][()] = tmp_bias + + hdf5_file.close() + # read-in again to double check + hdf5_file = h5py.File(output_file, "r") + + def _print_pair(key, value): + if key == "sampling_method": + value = "".join(map(chr, value[()])) + else: + value = value[()] + print(f"{key}: {value}") + + list(map(lambda x: _print_pair(*x), hdf5_file["model_conf"].items())) + + +if __name__ == "__main__": + output_lightseq_model_name = "lightseq_bert_base_uncased" + input_huggingface_bert_model = "bert-base-uncased" + head_number = 12 + + pad_id = 0 + max_step = 50 + extract_bert_weights( + output_lightseq_model_name, + input_huggingface_bert_model, + head_num=head_number, + pad_id=pad_id, + max_step=max_step, + ) diff --git a/lightseq/training/__init__.py b/lightseq/training/__init__.py index 8059c730..16728a3d 100644 --- a/lightseq/training/__init__.py +++ b/lightseq/training/__init__.py @@ -27,8 +27,9 @@ export_pb2hdf5, ) -from lightseq.training.ops.pytorch.export_ptq import ( +from lightseq.training.ops.pytorch.export_quant import ( export_ls_embedding_ptq, export_ls_encoder_ptq, export_ls_decoder_ptq, + export_quant_pb2hdf5, ) diff --git a/lightseq/training/ops/pytorch/export.py b/lightseq/training/ops/pytorch/export.py index 0a692c3c..d8dac8e0 100644 --- a/lightseq/training/ops/pytorch/export.py 
+++ b/lightseq/training/ops/pytorch/export.py @@ -361,7 +361,7 @@ def export_ls_config( def export_pb2hdf5(transformer, f): - """Convert bart protobuf to hdf5 format to support larger weight.""" + """Convert Transformer protobuf to hdf5 format to support larger weight.""" MODEL_CONF_KEYS = [ # model_conf "head_num", diff --git a/lightseq/training/ops/pytorch/export_ptq.py b/lightseq/training/ops/pytorch/export_quant.py similarity index 67% rename from lightseq/training/ops/pytorch/export_ptq.py rename to lightseq/training/ops/pytorch/export_quant.py index 2ae90314..f0f4e7ae 100644 --- a/lightseq/training/ops/pytorch/export_ptq.py +++ b/lightseq/training/ops/pytorch/export_quant.py @@ -318,3 +318,165 @@ def export_ls_decoder_ptq( enc_out_mapping_dict, nlayer, ) + + +def export_quant_pb2hdf5(transformer, f): + """Convert QuantTransformer protobuf to hdf5 format to support larger weight.""" + MODEL_CONF_KEYS = [ + # model_conf + "head_num", + "beam_size", + "extra_decode_length", + "length_penalty", + "src_padding_id", + "trg_start_id", + "diverse_lambda", + "sampling_method", + "topp", + "topk", + "trg_end_id", + "is_post_ln", + "no_scale_embedding", + "use_gelu", + "multilg_type", + ] + + EMBEDDING_KEYS = [ + # src_embedding + # trg_embedding + "token_embedding", + "position_embedding", + "norm_scale", + "norm_bias", + "encode_output_project_kernel_kv", + "encode_output_project_bias_kv", + "shared_bias", + "lang_emb", + "trg_vocab_mask", + "output_ln_clip_max", + "logits_clip_max", + ] + + ENCODER_LAYER_KEYS = [ + # encoder_stack/{i} + "multihead_norm_scale", + "multihead_norm_bias", + "multihead_project_kernel_qkv", + "multihead_project_bias_qkv", + "multihead_project_kernel_output", + "multihead_project_bias_output", + "ffn_norm_scale", + "ffn_norm_bias", + "ffn_first_kernel", + "ffn_first_bias", + "ffn_second_kernel", + "ffn_second_bias", + "multihead_ln_clip_max", + "multihead_project_output_clip_max", + "ffn_ln_clip_max", + "ffn_first_act_clip_max", + "multihead_qkv_dense_clip_max", + "multihead_output_dense_clip_max", + "ffn_first_output_clip_max", + ] + + DECODER_LAYER_KEYS = [ + # decoder_stack/{i} + "self_norm_scale", + "self_norm_bias", + "self_project_kernel_qkv", + "self_project_bias_qkv", + "self_project_kernel_output", + "self_project_bias_output", + "encdec_norm_scale", + "encdec_norm_bias", + "encdec_project_kernel_q", + "encdec_project_bias_q", + "encdec_project_kernel_output", + "encdec_project_bias_output", + "ffn_norm_scale", + "ffn_norm_bias", + "ffn_first_kernel", + "ffn_first_bias", + "ffn_second_kernel", + "ffn_second_bias", + "self_ln_clip_max", + "self_project_output_clip_max", + "encdec_ln_clip_max", + "encdec_project_output_clip_max", + "ffn_ln_clip_max", + "ffn_first_act_clip_max", + "self_qkv_dense_clip_max", + "self_output_dense_clip_max", + "encdec_q_dense_clip_max", + "encdec_output_dense_clip_max", + "ffn_first_output_clip_max", + "self_qkv_bias_out_clip_max", + ] + base_attr_to_keys = { + "src_embedding": EMBEDDING_KEYS, + "trg_embedding": EMBEDDING_KEYS, + "model_conf": MODEL_CONF_KEYS, + } + + from operator import attrgetter + + print(f"start converting protobuf to hdf5 format.") + # load src_embedding, trg_embedding, model_conf + for base_attr, keys in base_attr_to_keys.items(): + for key in keys: + hdf5_key = f"{base_attr}/{key}" + proto_attr = f"{base_attr}.{key}" + + if key not in dir(attrgetter(base_attr)(transformer)): + print(f"key {key} not found in {base_attr}, skipping") + continue + + print(f"loading transformer {proto_attr} -> {hdf5_key}") + 
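            # Each protobuf field is copied into an hdf5 dataset under the same
            # path (e.g. "trg_embedding/token_embedding"). String fields such as
            # sampling_method are stored as int8 ascii arrays, bytes fields (the
            # already-quantized kernels and embeddings) as uint8 arrays, and the
            # float *_clip_max fields are written unchanged so the inference-side
            # loader can dequantize the uint8 weights.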
_data = attrgetter(proto_attr)(transformer) + if type(_data) is str: + print( + f"find type str, explicitly convert string to ascii encoded array." + ) + # explict convert to array of char (int8) to avoid issues on string reading in C + _data = np.array([ord(c) for c in _data]).astype(np.int8) + elif type(_data) is bytes: + print( + f"find type bytes, explicitly convert bytes to unsigned int8 array." + ) + _data = np.array(bytearray(_data)).astype(np.ubyte) + f.create_dataset(hdf5_key, data=_data) + + # save number of layers metadata + f.create_dataset("model_conf/n_encoder_stack", data=len(transformer.encoder_stack)) + f.create_dataset("model_conf/n_decoder_stack", data=len(transformer.decoder_stack)) + + # load encoder_stack + for layer_id, layer in enumerate(transformer.encoder_stack): + for key in ENCODER_LAYER_KEYS: + hdf5_key = f"encoder_stack/{layer_id}/{key}" + proto_attr = key + print(f"loading transformer.encoder_stack {proto_attr} -> {hdf5_key}") + _data = attrgetter(proto_attr)(layer) + if type(_data) is bytes: + print( + f"find type bytes, explicitly convert bytes to unsigned int8 array." + ) + _data = np.array(bytearray(_data)).astype(np.ubyte) + f.create_dataset(hdf5_key, data=_data) + + # load decoder_stack + for layer_id, layer in enumerate(transformer.decoder_stack): + for key in DECODER_LAYER_KEYS: + hdf5_key = f"decoder_stack/{layer_id}/{key}" + proto_attr = key + print(f"loading transformer.decoder_stack {proto_attr} -> {hdf5_key}") + _data = attrgetter(proto_attr)(layer) + if type(_data) is bytes: + print( + f"find type bytes, explicitly convert bytes to unsigned int8 array." + ) + _data = np.array(bytearray(_data)).astype(np.ubyte) + f.create_dataset(hdf5_key, data=_data) + + print(f"proto to hdf5 conversion completed.") From ebe1071affe43236f184fadc1cf9d735e81fe056 Mon Sep 17 00:00:00 2001 From: "weiyang.god" Date: Wed, 23 Mar 2022 20:33:49 +0800 Subject: [PATCH 13/49] quant hdf5 load (stage 1) --- .../native_fs_transformer_ptq_export.py | 9 +-- .../inference/python/export/fairseq/util.py | 5 +- .../proto/quant_transformer_weight.cc | 57 +++++++++++++++++-- lightseq/training/ops/pytorch/export_quant.py | 12 ++++ 4 files changed, 72 insertions(+), 11 deletions(-) diff --git a/examples/inference/python/export/fairseq/native_fs_transformer_ptq_export.py b/examples/inference/python/export/fairseq/native_fs_transformer_ptq_export.py index 66749e70..d97436a3 100644 --- a/examples/inference/python/export/fairseq/native_fs_transformer_ptq_export.py +++ b/examples/inference/python/export/fairseq/native_fs_transformer_ptq_export.py @@ -277,9 +277,10 @@ def export_native_fs_transformer( model_name = ".".join(args.model.split(".")[:-1]) pb_path = f"{model_name}_ptq.pb" hdf5_path = f"{model_name}_ptq.hdf5" - export_native_fs_transformer(args.model, pb_path, hdf5_path, args.hdf5) + path = export_native_fs_transformer(args.model, pb_path, hdf5_path, args.hdf5) src = [[63, 47, 65, 1507, 88, 74, 10, 2057, 362, 9, 284, 6, 2, 1, 1, 1]] - pb_model = lsi.QuantTransformer(pb_path, 8) - pb_output = pb_model.infer(src) + src = [[63, 47, 65, 1507, 88, 74, 10, 2057, 362, 9, 284, 6, 2, 1, 1, 1]] + model = lsi.QuantTransformer(path, 8) + output = model.infer(src) # Expected result: [23, 550, 34, 118, 148, 2939, 4, 42, 32, 37, 6, 224, 10, 179, 5, 2] - print("pb results:", pb_output) + print("results:", output) diff --git a/examples/inference/python/export/fairseq/util.py b/examples/inference/python/export/fairseq/util.py index eb962543..270508b3 100644 --- 
a/examples/inference/python/export/fairseq/util.py +++ b/examples/inference/python/export/fairseq/util.py @@ -6,6 +6,7 @@ from lightseq.training import export_pb2hdf5 from lightseq.training import export_quant_pb2hdf5 + def parse_args(): parser = argparse.ArgumentParser(description="export fairseq checkpoint", usage="") parser.add_argument( @@ -18,7 +19,7 @@ def parse_args(): parser.add_argument( "--hdf5", "-hdf5", - action='store_true', + action="store_true", help="whether to store hdf5", ) args = parser.parse_args() @@ -43,4 +44,4 @@ def save_model(transformer, pb_path, hdf5_path, hdf5): else: export_quant_pb2hdf5(transformer, f) f.close() - return hdf5_path \ No newline at end of file + return hdf5_path diff --git a/lightseq/inference/proto/quant_transformer_weight.cc b/lightseq/inference/proto/quant_transformer_weight.cc index dd03a082..7cf2349f 100644 --- a/lightseq/inference/proto/quant_transformer_weight.cc +++ b/lightseq/inference/proto/quant_transformer_weight.cc @@ -32,6 +32,13 @@ __inline__ float dequantize(unsigned char i, float scale, float clip_max) { return (float(i) - scale) * clip_max / scale; } +void copy_i8_to_float(std::vector &i8, std::vector &f, + float clip_max, float quant_range, int start, int num) { + for (int i = start; i < start + num; ++i) { + f[i] = dequantize(i8[i], quant_range, clip_max); + } +} + /** Read model config stored in custom proto file. */ @@ -649,16 +656,32 @@ void QuantTransformerWeight::hdf5_parse_emb_wei(hid_t hdf5_file, std::vector offset; std::vector value(value_size); // preallocate vector for performance + std::vector value_i8(value_size); std::cout << "loading " << value_size * sizeof(OpType_) / (1024 * 1024) << " MB of embedding weight." << std::endl; int idx = 0; + float clip_max; offset.push_back(idx); read_hdf5_dataset_data( - hdf5_file, dataset_prefix + "/token_embedding", H5T_NATIVE_FLOAT, - value.data() + idx, + hdf5_file, dataset_prefix + "/token_embedding", H5T_NATIVE_UCHAR, + value_i8.data() + idx, [=](int size) { return size != vocab_size * _hidden_size; }, "Wrong token_embedding_size !"); + read_hdf5_dataset_scalar(hdf5_file, dataset_prefix + "/emb_clip_max", + H5T_NATIVE_FLOAT, &clip_max); + copy_i8_to_float(value_i8, value, clip_max, _quant_range, idx, + vocab_size * _hidden_size); + if (source == "src") + _src_emb_clip_max = clip_max; + else { + _trg_emb_clip_max = clip_max; + read_hdf5_dataset_scalar(hdf5_file, dataset_prefix + "/output_ln_clip_max", + H5T_NATIVE_FLOAT, &_output_ln_clip_max); + read_hdf5_dataset_scalar(hdf5_file, dataset_prefix + "/logits_clip_max", + H5T_NATIVE_FLOAT, &_logits_clip_max); + } + idx += vocab_size * _hidden_size; offset.push_back(idx); @@ -695,11 +718,23 @@ void QuantTransformerWeight::hdf5_parse_emb_wei(hid_t hdf5_file, offset.push_back(idx); read_hdf5_dataset_data( hdf5_file, dataset_prefix + "/encode_output_project_kernel_kv", - H5T_NATIVE_FLOAT, value.data() + idx, + H5T_NATIVE_UCHAR, value_i8.data() + idx, [=](int size) { return size != _hidden_size * _hidden_size * 2 * _n_dec_layer; }, "Wrong encode_output_project_kernel_kv_size !"); + read_hdf5_dataset_data( + hdf5_file, dataset_prefix + "/encode_output_project_kernel_kv_clip_max", + H5T_NATIVE_FLOAT, _encode_output_project_kernel_kv_clip_max.data(), + [=](int size) { return size != _n_dec_layer; }, + "Wrong encode_output_project_kernel_kv_clip_max_size !"); + for (int i = 0; i < _n_dec_layer; ++i) { + copy_i8_to_float(value_i8, value, + _encode_output_project_kernel_kv_clip_max[i], + _quant_range, idx + _hidden_size * _hidden_size * 
2 * i, + _hidden_size * _hidden_size * 2); + } + idx += _hidden_size * _hidden_size * 2 * _n_dec_layer; offset.push_back(idx); @@ -763,9 +798,11 @@ void QuantTransformerWeight::hdf5_parse_enc_wei(hid_t hdf5_file) { _n_enc_layer; std::vector offset; std::vector value(value_size); + std::vector value_i8(value_size); std::cout << "loading " << value_size * sizeof(OpType_) / (1024 * 1024) << " MB of encoder weight." << std::endl; + float clip_max; int idx = 0; for (int layer_id = 0; layer_id < _n_enc_layer; ++layer_id) { std::string dataset_prefix = "encoder_stack/" + std::to_string(layer_id); @@ -787,9 +824,14 @@ void QuantTransformerWeight::hdf5_parse_enc_wei(hid_t hdf5_file) { offset.push_back(idx); read_hdf5_dataset_data( hdf5_file, dataset_prefix + "/multihead_project_kernel_qkv", - H5T_NATIVE_FLOAT, value.data() + idx, + H5T_NATIVE_UCHAR, value_i8.data() + idx, [=](int size) { return size != _hidden_size * _hidden_size * 3; }, "Wrong multihead_project_kernel_qkv_size !"); + read_hdf5_dataset_scalar( + hdf5_file, dataset_prefix + "/multihead_project_kernel_qkv_clip_max", + H5T_NATIVE_FLOAT, &clip_max); + copy_i8_to_float(value_i8, value, clip_max, _quant_range, idx, + _hidden_size * _hidden_size * 3); idx += _hidden_size * _hidden_size * 3; offset.push_back(idx); @@ -803,9 +845,14 @@ void QuantTransformerWeight::hdf5_parse_enc_wei(hid_t hdf5_file) { offset.push_back(idx); read_hdf5_dataset_data( hdf5_file, dataset_prefix + "/multihead_project_kernel_output", - H5T_NATIVE_FLOAT, value.data() + idx, + H5T_NATIVE_UCHAR, value_i8.data() + idx, [=](int size) { return size != _hidden_size * _hidden_size; }, "Wrong multihead_project_kernel_output_size !"); + read_hdf5_dataset_scalar( + hdf5_file, dataset_prefix + "/multihead_project_kernel_output_clip_max", + H5T_NATIVE_FLOAT, &clip_max); + copy_i8_to_float(value_i8, value, clip_max, _quant_range, idx, + _hidden_size * _hidden_size); idx += _hidden_size * _hidden_size; offset.push_back(idx); diff --git a/lightseq/training/ops/pytorch/export_quant.py b/lightseq/training/ops/pytorch/export_quant.py index f0f4e7ae..17f56715 100644 --- a/lightseq/training/ops/pytorch/export_quant.py +++ b/lightseq/training/ops/pytorch/export_quant.py @@ -353,6 +353,8 @@ def export_quant_pb2hdf5(transformer, f): "shared_bias", "lang_emb", "trg_vocab_mask", + "emb_clip_max", + "encode_output_project_kernel_kv_clip_max", "output_ln_clip_max", "logits_clip_max", ] @@ -371,6 +373,10 @@ def export_quant_pb2hdf5(transformer, f): "ffn_first_bias", "ffn_second_kernel", "ffn_second_bias", + "multihead_project_kernel_qkv_clip_max", + "multihead_project_kernel_output_clip_max", + "ffn_first_kernel_clip_max", + "ffn_second_kernel_clip_max", "multihead_ln_clip_max", "multihead_project_output_clip_max", "ffn_ln_clip_max", @@ -400,6 +406,12 @@ def export_quant_pb2hdf5(transformer, f): "ffn_first_bias", "ffn_second_kernel", "ffn_second_bias", + "self_project_kernel_qkv_clip_max", + "self_project_kernel_output_clip_max", + "encdec_project_kernel_q_clip_max", + "encdec_project_kernel_output_clip_max", + "ffn_first_kernel_clip_max", + "ffn_second_kernel_clip_max", "self_ln_clip_max", "self_project_output_clip_max", "encdec_ln_clip_max", From ee296bb85035db71d5f7349216ef02fabf161511 Mon Sep 17 00:00:00 2001 From: "weiyang.god" Date: Thu, 24 Mar 2022 01:40:06 +0800 Subject: [PATCH 14/49] quant hdf5 transformer finished --- .../proto/quant_transformer_weight.cc | 151 ++++++++++++++++-- 1 file changed, 139 insertions(+), 12 deletions(-) diff --git 
a/lightseq/inference/proto/quant_transformer_weight.cc b/lightseq/inference/proto/quant_transformer_weight.cc index 7cf2349f..baa819c3 100644 --- a/lightseq/inference/proto/quant_transformer_weight.cc +++ b/lightseq/inference/proto/quant_transformer_weight.cc @@ -723,6 +723,7 @@ void QuantTransformerWeight::hdf5_parse_emb_wei(hid_t hdf5_file, return size != _hidden_size * _hidden_size * 2 * _n_dec_layer; }, "Wrong encode_output_project_kernel_kv_size !"); + _encode_output_project_kernel_kv_clip_max.resize(_n_dec_layer); read_hdf5_dataset_data( hdf5_file, dataset_prefix + "/encode_output_project_kernel_kv_clip_max", H5T_NATIVE_FLOAT, _encode_output_project_kernel_kv_clip_max.data(), @@ -832,6 +833,7 @@ void QuantTransformerWeight::hdf5_parse_enc_wei(hid_t hdf5_file) { H5T_NATIVE_FLOAT, &clip_max); copy_i8_to_float(value_i8, value, clip_max, _quant_range, idx, _hidden_size * _hidden_size * 3); + _enc_clip_max.push_back(clip_max); idx += _hidden_size * _hidden_size * 3; offset.push_back(idx); @@ -853,6 +855,7 @@ void QuantTransformerWeight::hdf5_parse_enc_wei(hid_t hdf5_file) { H5T_NATIVE_FLOAT, &clip_max); copy_i8_to_float(value_i8, value, clip_max, _quant_range, idx, _hidden_size * _hidden_size); + _enc_clip_max.push_back(clip_max); idx += _hidden_size * _hidden_size; offset.push_back(idx); @@ -879,10 +882,16 @@ void QuantTransformerWeight::hdf5_parse_enc_wei(hid_t hdf5_file) { offset.push_back(idx); read_hdf5_dataset_data( - hdf5_file, dataset_prefix + "/ffn_first_kernel", H5T_NATIVE_FLOAT, - value.data() + idx, + hdf5_file, dataset_prefix + "/ffn_first_kernel", H5T_NATIVE_UCHAR, + value_i8.data() + idx, [=](int size) { return size != _hidden_size * _inner_size; }, "Wrong ffn_first_kernel_size !"); + read_hdf5_dataset_scalar(hdf5_file, + dataset_prefix + "/ffn_first_kernel_clip_max", + H5T_NATIVE_FLOAT, &clip_max); + copy_i8_to_float(value_i8, value, clip_max, _quant_range, idx, + _hidden_size * _inner_size); + _enc_clip_max.push_back(clip_max); idx += _hidden_size * _inner_size; offset.push_back(idx); @@ -894,10 +903,16 @@ void QuantTransformerWeight::hdf5_parse_enc_wei(hid_t hdf5_file) { offset.push_back(idx); read_hdf5_dataset_data( - hdf5_file, dataset_prefix + "/ffn_second_kernel", H5T_NATIVE_FLOAT, - value.data() + idx, + hdf5_file, dataset_prefix + "/ffn_second_kernel", H5T_NATIVE_UCHAR, + value_i8.data() + idx, [=](int size) { return size != _hidden_size * _inner_size; }, "Wrong ffn_second_kernel_size !"); + read_hdf5_dataset_scalar(hdf5_file, + dataset_prefix + "/ffn_second_kernel_clip_max", + H5T_NATIVE_FLOAT, &clip_max); + copy_i8_to_float(value_i8, value, clip_max, _quant_range, idx, + _hidden_size * _inner_size); + _enc_clip_max.push_back(clip_max); idx += _hidden_size * _inner_size; offset.push_back(idx); @@ -907,6 +922,34 @@ void QuantTransformerWeight::hdf5_parse_enc_wei(hid_t hdf5_file) { "Wrong ffn_second_bias_size !"); idx += _hidden_size; + read_hdf5_dataset_scalar(hdf5_file, + dataset_prefix + "/multihead_ln_clip_max", + H5T_NATIVE_FLOAT, &clip_max); + _enc_clip_max.push_back(clip_max); + read_hdf5_dataset_scalar( + hdf5_file, dataset_prefix + "/multihead_project_output_clip_max", + H5T_NATIVE_FLOAT, &clip_max); + _enc_clip_max.push_back(clip_max); + read_hdf5_dataset_scalar(hdf5_file, dataset_prefix + "/ffn_ln_clip_max", + H5T_NATIVE_FLOAT, &clip_max); + _enc_clip_max.push_back(clip_max); + read_hdf5_dataset_scalar(hdf5_file, + dataset_prefix + "/ffn_first_act_clip_max", + H5T_NATIVE_FLOAT, &clip_max); + _enc_clip_max.push_back(clip_max); + 
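    // NOTE: in this hdf5 path each quantized kernel is stored as uint8 with a
    // companion float *_clip_max scalar; copy_i8_to_float() restores floats as
    //   value = (uint8_val - _quant_range) * clip_max / _quant_range,
    // and every clip_max (weight and activation alike) is also pushed into
    // _enc_clip_max in a fixed per-layer order, which is assumed to match the
    // order used by the protobuf loading path.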
read_hdf5_dataset_scalar(hdf5_file, + dataset_prefix + "/multihead_qkv_dense_clip_max", + H5T_NATIVE_FLOAT, &clip_max); + _enc_clip_max.push_back(clip_max); + read_hdf5_dataset_scalar( + hdf5_file, dataset_prefix + "/multihead_output_dense_clip_max", + H5T_NATIVE_FLOAT, &clip_max); + _enc_clip_max.push_back(clip_max); + read_hdf5_dataset_scalar(hdf5_file, + dataset_prefix + "/ffn_first_output_clip_max", + H5T_NATIVE_FLOAT, &clip_max); + _enc_clip_max.push_back(clip_max); + _enc_clip_max.push_back(0.0); } // for std::vector<_DataType> raw_value; @@ -933,9 +976,11 @@ void QuantTransformerWeight::hdf5_parse_dec_wei(hid_t hdf5_file) { _n_dec_layer; std::vector offset; std::vector value(value_size); + std::vector value_i8(value_size); std::cout << "loading " << value_size * sizeof(OpType_) / (1024 * 1024) << " MB of decoder weight." << std::endl; int idx = 0; + float clip_max; for (int layer_id = 0; layer_id < _n_dec_layer; ++layer_id) { std::string dataset_prefix = "decoder_stack/" + std::to_string(layer_id); @@ -957,9 +1002,15 @@ void QuantTransformerWeight::hdf5_parse_dec_wei(hid_t hdf5_file) { offset.push_back(idx); read_hdf5_dataset_data( hdf5_file, dataset_prefix + "/self_project_kernel_qkv", - H5T_NATIVE_FLOAT, value.data() + idx, + H5T_NATIVE_UCHAR, value_i8.data() + idx, [=](int size) { return size != _hidden_size * _hidden_size * 3; }, "Wrong self_project_kernel_qkv_size !"); + read_hdf5_dataset_scalar( + hdf5_file, dataset_prefix + "/self_project_kernel_qkv_clip_max", + H5T_NATIVE_FLOAT, &clip_max); + copy_i8_to_float(value_i8, value, clip_max, _quant_range, idx, + _hidden_size * _hidden_size * 3); + _dec_clip_max.push_back(clip_max); idx += _hidden_size * _hidden_size * 3; offset.push_back(idx); @@ -972,9 +1023,15 @@ void QuantTransformerWeight::hdf5_parse_dec_wei(hid_t hdf5_file) { offset.push_back(idx); read_hdf5_dataset_data( hdf5_file, dataset_prefix + "/self_project_kernel_output", - H5T_NATIVE_FLOAT, value.data() + idx, + H5T_NATIVE_UCHAR, value_i8.data() + idx, [=](int size) { return size != _hidden_size * _hidden_size; }, "Wrong self_project_kernel_output_size !"); + read_hdf5_dataset_scalar( + hdf5_file, dataset_prefix + "/self_project_kernel_output_clip_max", + H5T_NATIVE_FLOAT, &clip_max); + copy_i8_to_float(value_i8, value, clip_max, _quant_range, idx, + _hidden_size * _hidden_size); + _dec_clip_max.push_back(clip_max); idx += _hidden_size * _hidden_size; offset.push_back(idx); @@ -1002,9 +1059,15 @@ void QuantTransformerWeight::hdf5_parse_dec_wei(hid_t hdf5_file) { offset.push_back(idx); read_hdf5_dataset_data( hdf5_file, dataset_prefix + "/encdec_project_kernel_q", - H5T_NATIVE_FLOAT, value.data() + idx, + H5T_NATIVE_UCHAR, value_i8.data() + idx, [=](int size) { return size != _hidden_size * _hidden_size; }, "Wrong encdec_project_kernel_q_size !"); + read_hdf5_dataset_scalar( + hdf5_file, dataset_prefix + "/encdec_project_kernel_q_clip_max", + H5T_NATIVE_FLOAT, &clip_max); + copy_i8_to_float(value_i8, value, clip_max, _quant_range, idx, + _hidden_size * _hidden_size); + _dec_clip_max.push_back(clip_max); idx += _hidden_size * _hidden_size; offset.push_back(idx); @@ -1017,9 +1080,15 @@ void QuantTransformerWeight::hdf5_parse_dec_wei(hid_t hdf5_file) { offset.push_back(idx); read_hdf5_dataset_data( hdf5_file, dataset_prefix + "/encdec_project_kernel_output", - H5T_NATIVE_FLOAT, value.data() + idx, + H5T_NATIVE_UCHAR, value_i8.data() + idx, [=](int size) { return size != _hidden_size * _hidden_size; }, "Wrong encdec_project_kernel_output_size !"); + 
read_hdf5_dataset_scalar( + hdf5_file, dataset_prefix + "/encdec_project_kernel_output_clip_max", + H5T_NATIVE_FLOAT, &clip_max); + copy_i8_to_float(value_i8, value, clip_max, _quant_range, idx, + _hidden_size * _hidden_size); + _dec_clip_max.push_back(clip_max); idx += _hidden_size * _hidden_size; offset.push_back(idx); @@ -1046,10 +1115,16 @@ void QuantTransformerWeight::hdf5_parse_dec_wei(hid_t hdf5_file) { offset.push_back(idx); read_hdf5_dataset_data( - hdf5_file, dataset_prefix + "/ffn_first_kernel", H5T_NATIVE_FLOAT, - value.data() + idx, + hdf5_file, dataset_prefix + "/ffn_first_kernel", H5T_NATIVE_UCHAR, + value_i8.data() + idx, [=](int size) { return size != _hidden_size * _inner_size; }, "Wrong ffn_first_kernel_size !"); + read_hdf5_dataset_scalar(hdf5_file, + dataset_prefix + "/ffn_first_kernel_clip_max", + H5T_NATIVE_FLOAT, &clip_max); + copy_i8_to_float(value_i8, value, clip_max, _quant_range, idx, + _hidden_size * _inner_size); + _dec_clip_max.push_back(clip_max); idx += _hidden_size * _inner_size; offset.push_back(idx); @@ -1061,10 +1136,16 @@ void QuantTransformerWeight::hdf5_parse_dec_wei(hid_t hdf5_file) { offset.push_back(idx); read_hdf5_dataset_data( - hdf5_file, dataset_prefix + "/ffn_second_kernel", H5T_NATIVE_FLOAT, - value.data() + idx, + hdf5_file, dataset_prefix + "/ffn_second_kernel", H5T_NATIVE_UCHAR, + value_i8.data() + idx, [=](int size) { return size != _hidden_size * _inner_size; }, "Wrong ffn_second_kernel_size !"); + read_hdf5_dataset_scalar(hdf5_file, + dataset_prefix + "/ffn_second_kernel_clip_max", + H5T_NATIVE_FLOAT, &clip_max); + copy_i8_to_float(value_i8, value, clip_max, _quant_range, idx, + _hidden_size * _inner_size); + _dec_clip_max.push_back(clip_max); idx += _hidden_size * _inner_size; offset.push_back(idx); @@ -1074,6 +1155,52 @@ void QuantTransformerWeight::hdf5_parse_dec_wei(hid_t hdf5_file) { "Wrong ffn_second_bias_size !"); idx += _hidden_size; + read_hdf5_dataset_scalar(hdf5_file, dataset_prefix + "/self_ln_clip_max", + H5T_NATIVE_FLOAT, &clip_max); + _dec_clip_max.push_back(clip_max); + read_hdf5_dataset_scalar(hdf5_file, + dataset_prefix + "/self_project_output_clip_max", + H5T_NATIVE_FLOAT, &clip_max); + _dec_clip_max.push_back(clip_max); + read_hdf5_dataset_scalar(hdf5_file, dataset_prefix + "/encdec_ln_clip_max", + H5T_NATIVE_FLOAT, &clip_max); + _dec_clip_max.push_back(clip_max); + read_hdf5_dataset_scalar(hdf5_file, + dataset_prefix + "/encdec_project_output_clip_max", + H5T_NATIVE_FLOAT, &clip_max); + _dec_clip_max.push_back(clip_max); + read_hdf5_dataset_scalar(hdf5_file, dataset_prefix + "/ffn_ln_clip_max", + H5T_NATIVE_FLOAT, &clip_max); + _dec_clip_max.push_back(clip_max); + read_hdf5_dataset_scalar(hdf5_file, + dataset_prefix + "/ffn_first_act_clip_max", + H5T_NATIVE_FLOAT, &clip_max); + _dec_clip_max.push_back(clip_max); + read_hdf5_dataset_scalar(hdf5_file, + dataset_prefix + "/self_qkv_dense_clip_max", + H5T_NATIVE_FLOAT, &clip_max); + _dec_clip_max.push_back(clip_max); + read_hdf5_dataset_scalar(hdf5_file, + dataset_prefix + "/self_output_dense_clip_max", + H5T_NATIVE_FLOAT, &clip_max); + _dec_clip_max.push_back(clip_max); + read_hdf5_dataset_scalar(hdf5_file, + dataset_prefix + "/encdec_q_dense_clip_max", + H5T_NATIVE_FLOAT, &clip_max); + _dec_clip_max.push_back(clip_max); + read_hdf5_dataset_scalar(hdf5_file, + dataset_prefix + "/encdec_output_dense_clip_max", + H5T_NATIVE_FLOAT, &clip_max); + _dec_clip_max.push_back(clip_max); + read_hdf5_dataset_scalar(hdf5_file, + dataset_prefix + "/ffn_first_output_clip_max", + 
H5T_NATIVE_FLOAT, &clip_max); + _dec_clip_max.push_back(clip_max); + _dec_clip_max.push_back(0.0); + read_hdf5_dataset_scalar(hdf5_file, + dataset_prefix + "/self_qkv_bias_out_clip_max", + H5T_NATIVE_FLOAT, &clip_max); + _dec_clip_max.push_back(clip_max); } // for std::vector<_DataType> raw_value; From b9648320286cf9ad3638d8133207189f86d4b80a Mon Sep 17 00:00:00 2001 From: "weiyang.god" Date: Thu, 24 Mar 2022 02:35:03 +0800 Subject: [PATCH 15/49] fix fairseq infer bug --- lightseq/training/cli/lightseq_infer_cli.py | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/lightseq/training/cli/lightseq_infer_cli.py b/lightseq/training/cli/lightseq_infer_cli.py index 6283f926..52873664 100644 --- a/lightseq/training/cli/lightseq_infer_cli.py +++ b/lightseq/training/cli/lightseq_infer_cli.py @@ -98,10 +98,12 @@ def _main(args, output_file): ) # Initialize LightSeq model + # NOTE: QuantTransformer can not load float models, but Transformer can load int8 models. + # So QuantTransformer must be initialized first. try: - model = lsi.Transformer(args.path, args.batch_size) - except: model = lsi.QuantTransformer(args.path, args.batch_size) + except: + model = lsi.Transformer(args.path, args.batch_size) gen_timer = StopwatchMeter() From 5f52dd6e1b7cbf19222c758455db3159fbce506e Mon Sep 17 00:00:00 2001 From: "weiyang.god" Date: Thu, 24 Mar 2022 16:56:50 +0800 Subject: [PATCH 16/49] export quant beert, delete hf quant pos emb --- ...rt.py => ls_torch_hf_quant_bert_export.py} | 135 +++++++++++------- .../ops/pytorch/torch_transformer_layers.py | 2 - 2 files changed, 81 insertions(+), 56 deletions(-) rename examples/inference/python/export/huggingface/{hf_torch_quant_bert_export.py => ls_torch_hf_quant_bert_export.py} (53%) diff --git a/examples/inference/python/export/huggingface/hf_torch_quant_bert_export.py b/examples/inference/python/export/huggingface/ls_torch_hf_quant_bert_export.py similarity index 53% rename from examples/inference/python/export/huggingface/hf_torch_quant_bert_export.py rename to examples/inference/python/export/huggingface/ls_torch_hf_quant_bert_export.py index 569367b2..6b3419eb 100644 --- a/examples/inference/python/export/huggingface/hf_torch_quant_bert_export.py +++ b/examples/inference/python/export/huggingface/ls_torch_hf_quant_bert_export.py @@ -3,10 +3,13 @@ """ import os import h5py -import numpy as np from collections import OrderedDict -from transformers import BertModel -from lightseq.training.ops.pytorch.export import fill_hdf5_layer +import numpy as np + +import torch +from lightseq.training.ops.pytorch.export import apply_rule +from lightseq.training.ops.pytorch.export_quant import quantize +from export.fairseq.util import parse_args os.environ["CUDA_VISIBLE_DEVICES"] = "-1" @@ -22,20 +25,31 @@ enc_layer_mapping_dict = OrderedDict( { # BERT is post_layernorm - # NOTE: add an additional "final" at the beginning for some weight - # to distinguish them from "attention output *" - "multihead_norm_scale": "attention output LayerNorm weight", - "multihead_norm_bias": "attention output LayerNorm bias", - "multihead_project_kernel_qkv": "attention self query weight&&attention self key weight&&attention self value weight&&expression_.transpose(0, 1)", - "multihead_project_bias_qkv": "attention self query bias&&attention self key bias&&attention self value bias", - "multihead_project_kernel_output": "attention output dense weight&&expression_.transpose(0, 1)", - "multihead_project_bias_output": "attention output dense bias", - "ffn_norm_scale": "final output 
LayerNorm weight", - "ffn_norm_bias": "final output LayerNorm bias", - "ffn_first_kernel": "intermediate dense weight&&expression_.transpose(0, 1)", - "ffn_first_bias": "intermediate dense bias", - "ffn_second_kernel": "final output dense weight&&expression_.transpose(0, 1)", - "ffn_second_bias": "final output dense bias", + "multihead_norm_scale": "self_attn_layer_norm weight", + "multihead_norm_bias": "self_attn_layer_norm bias", + "multihead_project_kernel_qkv": "self_attn qkv_proj weight&&expression_.transpose(0, 1)", + "multihead_project_bias_qkv": "self_attn qkv_proj bias", + "multihead_project_kernel_output": "self_attn out_proj weight&&expression_.transpose(0, 1)", + "multihead_project_bias_output": "self_attn out_proj bias", + "ffn_norm_scale": "final_layer_norm weight", + "ffn_norm_bias": "final_layer_norm bias", + "ffn_first_kernel": "fc1 weight&&expression_.transpose(0, 1)", + "ffn_first_bias": "fc1 bias", + "ffn_second_kernel": "fc2 weight&&expression_.transpose(0, 1)", + "ffn_second_bias": "fc2 bias", + # weight_clip_max + "multihead_project_kernel_qkv_clip_max": "self_attn qkv_proj weight_quant clip_value_max", + "multihead_project_kernel_output_clip_max": "self_attn out_proj weight_quant clip_value_max", + "ffn_first_kernel_clip_max": "fc1 weight_quant clip_value_max", + "ffn_second_kernel_clip_max": "fc2 weight_quant clip_value_max", + # act_clip_max + "multihead_ln_clip_max": "self_attn qkv_proj input_quant clip_value_max", + "multihead_project_output_clip_max": "self_attn out_proj input_quant clip_value_max", + "ffn_ln_clip_max": "fc1 input_quant clip_value_max", + "ffn_first_act_clip_max": "fc2 input_quant clip_value_max", + "multihead_qkv_dense_clip_max": "self_attn qkv_proj output_quant clip_value_max", + "multihead_output_dense_clip_max": "self_attn out_proj output_quant clip_value_max", + "ffn_first_output_clip_max": "fc1 output_quant clip_value_max", } ) @@ -44,12 +58,26 @@ "norm_scale": "embeddings LayerNorm weight", "norm_bias": "embeddings LayerNorm bias", "position_embedding": "embeddings position_embeddings weight", - # manually process token_embedding due to "token_type_embeddings" - # "token_embedding": "embeddings word_embeddings weight", } ) +def fill_quant_hdf5_layer( + tensor_names, state_dict, hdf5_file, hdf5_dataset_prefix, mapping_dict +): + for proto_name, ckpt_rule in mapping_dict.items(): + target_tensor = apply_rule(proto_name, ckpt_rule, tensor_names, state_dict) + if proto_name.endswith("_clip_max"): + hdf5_file.create_dataset( + hdf5_dataset_prefix + proto_name, data=float(target_tensor[0]) + ) + else: + hdf5_file.create_dataset( + hdf5_dataset_prefix + proto_name, + data=target_tensor, + ) + + def extract_bert_weights( output_file, model_dir, @@ -58,54 +86,44 @@ def extract_bert_weights( max_step=50, ): # load var names - encoder_state_dict = BertModel.from_pretrained(model_dir).state_dict() - - # Insert additional "final" to some weight to prevent ambiguous match - def _insert_final(key): - l = key.split(".") - l.insert(3, "final") - return ".".join(l) - - encoder_state_dict = OrderedDict( - [ - (_insert_final(k), v) - if len(k.split(".")) > 3 and k.split(".")[3] == "output" - else (k, v) - for k, v in encoder_state_dict.items() - ] - ) + state_dict = torch.load(model_dir, "cpu") + + var_name_list = list(state_dict.keys()) - enc_var_name_list = list(encoder_state_dict.keys()) + for name in var_name_list: + if name.endswith("weight_quant.clip.clip_value_max"): + state_dict[name[:-26]] = torch.Tensor( + quantize(state_dict[name[:-26]].numpy(), 
127, state_dict[name].numpy()) + ).to(torch.uint8) # initialize output file - output_file += ".hdf5" print("Saving model to hdf5...") print("Writing to {0}".format(output_file)) hdf5_file = h5py.File(output_file, "w") # fill each encoder layer's params enc_tensor_names = {} - for name in enc_var_name_list: + for name in var_name_list: name_split = name.split(".") - if len(name_split) <= 2 or not name_split[2].isdigit(): + if len(name_split) <= 3 or not name_split[3].isdigit(): continue - layer_id = int(name_split[2]) + layer_id = int(name_split[3]) enc_tensor_names.setdefault(layer_id, []).append(name) # fill encoder_stack for layer_id in sorted(enc_tensor_names.keys()): - fill_hdf5_layer( + fill_quant_hdf5_layer( enc_tensor_names[layer_id], - encoder_state_dict, + state_dict, hdf5_file, f"encoder_stack/{layer_id}/", enc_layer_mapping_dict, ) # fill src_embedding - except for position embedding - fill_hdf5_layer( - enc_var_name_list, - encoder_state_dict, + fill_quant_hdf5_layer( + var_name_list, + state_dict, hdf5_file, "src_embedding/", src_emb_mapping_dict, @@ -113,13 +131,21 @@ def _insert_final(key): # handling token_embeddings for BERT token_embedding = ( - encoder_state_dict["embeddings.word_embeddings.weight"] - + encoder_state_dict["embeddings.token_type_embeddings.weight"][0] + state_dict["bert.embeddings.word_embeddings.weight"] + + state_dict["bert.embeddings.token_type_embeddings.weight"][0] + ) + token_embedding = quantize( + token_embedding.numpy(), + 127, + state_dict["bert.embeddings.emb_quant.clip.clip_value_max"].numpy(), ) print(f"processed token_embedding, shape: {token_embedding.shape}") - token_embedding = token_embedding.flatten().tolist() hdf5_file.create_dataset( - "src_embedding/token_embedding", data=token_embedding, dtype="f4" + "src_embedding/token_embedding", data=token_embedding, dtype="uint8" + ) + hdf5_file.create_dataset( + "src_embedding/src_emb_clip_max", + data=state_dict["bert.embeddings.emb_quant.clip.clip_value_max"], ) # save number of layers metadata @@ -167,15 +193,16 @@ def _print_pair(key, value): if __name__ == "__main__": - output_lightseq_model_name = "lightseq_bert_base_uncased" - input_huggingface_bert_model = "bert-base-uncased" - head_number = 12 + args = parse_args() + model_name = ".".join(args.model.split(".")[:-1]) + hdf5_path = f"{model_name}.hdf5" + head_number = 12 pad_id = 0 max_step = 50 extract_bert_weights( - output_lightseq_model_name, - input_huggingface_bert_model, + hdf5_path, + args.model, head_num=head_number, pad_id=pad_id, max_step=max_step, diff --git a/lightseq/training/ops/pytorch/torch_transformer_layers.py b/lightseq/training/ops/pytorch/torch_transformer_layers.py index b7347bc1..e635e6fe 100644 --- a/lightseq/training/ops/pytorch/torch_transformer_layers.py +++ b/lightseq/training/ops/pytorch/torch_transformer_layers.py @@ -1075,7 +1075,6 @@ def __init__(self, config, initial_weights=None): ) self.emb_quant = TensorQuantizer(weight_quant_config) - self.pos_emb_quant = TensorQuantizer(weight_quant_config) if initial_weights is None: return @@ -1120,7 +1119,6 @@ def forward( embeddings = inputs_embeds + token_type_embeddings embeddings = self.emb_quant(embeddings) position_embeddings = self.position_embeddings(position_ids) - position_embeddings = self.pos_emb_quant(position_embeddings) embeddings += position_embeddings embeddings = self.LayerNorm(embeddings) embeddings = self.dropout(embeddings) From 63e90d92c2a07e4be2ed5e174cc1caf98dca2641 Mon Sep 17 00:00:00 2001 From: "weiyang.god" Date: Thu, 24 Mar 2022 
20:32:33 +0800 Subject: [PATCH 17/49] add quant bert files --- lightseq/inference/model/CMakeLists.txt | 10 + .../inference/model/quant_bert_encoder.cc.cu | 307 +++++++++++ lightseq/inference/model/quant_bert_encoder.h | 93 ++++ lightseq/inference/proto/CMakeLists.txt | 7 + lightseq/inference/proto/quant_bert.proto | 66 +++ lightseq/inference/proto/quant_bert_weight.cc | 484 ++++++++++++++++++ lightseq/inference/proto/quant_bert_weight.h | 93 ++++ lightseq/inference/pywrapper/CMakeLists.txt | 5 +- lightseq/inference/pywrapper/quant_bert.cc | 159 ++++++ lightseq/inference/pywrapper/quant_bert.h | 49 ++ lightseq/inference/pywrapper/wrapper.cc | 88 ++++ 11 files changed, 1360 insertions(+), 1 deletion(-) create mode 100644 lightseq/inference/model/quant_bert_encoder.cc.cu create mode 100644 lightseq/inference/model/quant_bert_encoder.h create mode 100644 lightseq/inference/proto/quant_bert.proto create mode 100644 lightseq/inference/proto/quant_bert_weight.cc create mode 100644 lightseq/inference/proto/quant_bert_weight.h create mode 100644 lightseq/inference/pywrapper/quant_bert.cc create mode 100644 lightseq/inference/pywrapper/quant_bert.h diff --git a/lightseq/inference/model/CMakeLists.txt b/lightseq/inference/model/CMakeLists.txt index ba5b7668..e767db9c 100644 --- a/lightseq/inference/model/CMakeLists.txt +++ b/lightseq/inference/model/CMakeLists.txt @@ -52,6 +52,16 @@ else() CUDA::cublasLt_static) endif() +add_library(quant_bert_model STATIC quant_bert_encoder.cc.cu) +target_link_libraries(quant_bert_model PUBLIC cuda_kernels) +target_link_libraries(quant_bert_model PUBLIC quant_bert_weight) +if(DYNAMIC_API) + target_link_libraries(quant_bert_model PRIVATE CUDA::cublas CUDA::cublasLt) +else() + target_link_libraries(quant_bert_model PRIVATE CUDA::cublas_static + CUDA::cublasLt_static) +endif() + set(moe_files moe_decoder.cc.cu moe_encoder.cc.cu) add_library(moe_model STATIC ${moe_files}) target_link_libraries(moe_model PUBLIC cuda_kernels) diff --git a/lightseq/inference/model/quant_bert_encoder.cc.cu b/lightseq/inference/model/quant_bert_encoder.cc.cu new file mode 100644 index 00000000..1fd4fb86 --- /dev/null +++ b/lightseq/inference/model/quant_bert_encoder.cc.cu @@ -0,0 +1,307 @@ +#include "quant_bert_encoder.h" +#include "../kernels/embKernels.h" +#include "../kernels/transformerKernels.h" + +/** +@file +Transformer encoder, composed by gemm lib and + custom cuda kernel function +*/ + +namespace lightseq { +namespace cuda { + +template +QuantBertEncoder::QuantBertEncoder( + int max_batch_size, const int *p_d_token_id, int *p_d_padding_mask, + _DataType *p_d_output, const QuantBertWeight &tw, + cudaStream_t stream, cublasHandle_t hd, const int *p_d_lang_id) + : _max_batch_size(max_batch_size), + _p_d_token_id(p_d_token_id), + _p_d_padding_mask(p_d_padding_mask), + _p_d_output(p_d_output), + _p_d_lang_id(p_d_lang_id), + _tw(tw), + _stream(stream), + _hd(hd), + _p_d_src_emb_wei(tw.get_src_emb_wei()), + _p_d_enc_wei(tw.get_enc_wei()), + _fone((_DataType)1.f), + _fzero((_DataType)0.f), + _atten_scaler((_DataType)sqrt(1.f / tw._dim_per_head)), + _max_batch_dim(max_batch_size * tw._max_step * tw._hidden_size), + _max_thread_per_block(1024) {} + +/** +Compute GPU memory size needed by transformer encoder, + to see how these memory is used, checkout init_buffer() for detail +*/ +template +long QuantBertEncoder::compute_buffer_bytesize() { + long sz1 = _max_batch_dim * 6 + + _max_batch_size * _tw._head_num * _tw._max_step * _tw._max_step; + long sz2 = _max_batch_dim + _max_batch_size * 
_tw._max_step * _tw._inner_size; + return max(sz1, sz2) * sizeof(_DataType); +} + +/** +Init the GPU memory pointer which point to + the memory buffer needed by encoder. +These buffer are used during custom cuda kernel function, + find the corresponding function to see how these buffer are used +*/ +template +void QuantBertEncoder::init_buffer(void *pbuf) { + _DataType *p_d_buf = reinterpret_cast<_DataType *>(pbuf); + _p_d_qkv_projected = p_d_buf; + _p_d_q = _p_d_qkv_projected + _max_batch_dim * 3; + _p_d_k = _p_d_q + _max_batch_dim; + _p_d_v = _p_d_k + _max_batch_dim; + _p_d_c = _p_d_v + _max_batch_dim; + _p_d_ffn_buf1 = p_d_buf; + _p_d_ffn_buf2 = _p_d_ffn_buf1 + _max_batch_dim; + return; +} + +/** +Some requirements needed by custom cuda kernel function +*/ +template +std::string QuantBertEncoder::check() { + // if (_max_thread_per_block < _tw._hidden_size) { + // return "violate hidden_size <= max_thread_per_block"; + // } + if (_tw._inner_size & 1) { + return "violate inner_size % 2 = 0"; + } + if (_tw._dim_per_head & 1) { + return "violate dim_per_head % 2 = 0"; + } + if (_tw._multilg_type == 0 && _p_d_src_emb_wei.size() != 4) { + return "violate p_d_src_emb_wei.size() = 4"; + } + if (_tw._multilg_type != 0 && _p_d_src_emb_wei.size() != 5) { + return "violate p_d_src_emb_wei.size() = 5"; + } + if (_p_d_enc_wei.size() != _tw._weight_per_enc_layer * _tw._n_enc_layer) { + return "violate p_d_enc_wei.size() = weight_per_enc_layer * n_enc_layer"; + } + if (_tw._multilg_type != 0 && _p_d_lang_id == nullptr) { + return "lang id should not be null when multilg"; + } + return ""; +} + +/** +Encoder inference +*/ +template +void QuantBertEncoder::run_one_infer(int batch_size, + int batch_seq_len) { + if (batch_size > _max_batch_size) { + throw std::runtime_error("batch size of input greater than max_batch_size"); + } + if (batch_seq_len > _tw._max_step) { + throw std::runtime_error("seq len of input greater than max_step"); + } + /* ---step1. init--- */ + _batch_size = batch_size; + _batch_seq_len = batch_seq_len; + _batch_token_num = batch_size * batch_seq_len; +#ifdef DEBUG_RESULT + std::cout << "batch_size-" << batch_size << " batch_seq_len-" << batch_seq_len + << std::endl; + print_vec(_p_d_token_id, "batch_token_ids", batch_size * batch_seq_len); +#endif + + /* ---step2. 
encoder feedforward--- */ + launch_enc_emb<_DataType>(_p_d_src_emb_wei[0], _p_d_src_emb_wei[1], + _p_d_token_id, _p_d_output, _p_d_padding_mask, + _tw._padding_id, batch_size, batch_seq_len, + _tw._hidden_size, _stream, _p_d_src_emb_wei[4], + _p_d_lang_id, _tw._multilg_type); +#ifdef DEBUG_RESULT + for (int i = 0; i < _batch_size; i++) { // batch_id + for (int j = 0; j < _batch_seq_len; j++) { // token_id + std::cout << "emb out: token-" << j << std::endl; + print_vec(_p_d_output + i * _batch_seq_len * _tw._hidden_size + + j * _tw._hidden_size, + "emb out", 10); + } + } // not normal +#endif + for (_layer_id = 0; _layer_id < _tw._n_enc_layer; _layer_id++) { + _weight_offset = _layer_id * _tw._weight_per_enc_layer; + self_attention(); + ffn_add_norm(); + } + // last layer norm + ker_norm_layer_launcher<_DataType>( + _batch_token_num, _tw._hidden_size, _stream, _p_d_output, + _p_d_src_emb_wei[2], _p_d_src_emb_wei[3], _max_thread_per_block); + +#ifdef DEBUG_RESULT + for (int i = 0; i < _batch_size; i++) { // batch_id + for (int j = 0; j < _batch_seq_len; j++) { // token_id + std::cout << "encoder output: token-" << j << std::endl; + print_vec(_p_d_output + i * _batch_seq_len * _tw._hidden_size + + j * _tw._hidden_size, + "encoder_output", _tw._dim_per_head); + } + } // not normal +#endif + return; +} + +/** +Encoder self attention +*/ +template +void QuantBertEncoder::self_attention() { + /* ---step 0. layer_norm, add output_bias to "query"--- */ + ker_norm_layer_resual_launcher<_DataType>( + _batch_token_num, _tw._hidden_size, _stream, _p_d_output, _p_d_q, + _p_d_enc_wei[_weight_offset], _p_d_enc_wei[_weight_offset + 1], + _p_d_enc_wei[_weight_offset + 5], _max_thread_per_block, _tw._is_post_ln); + +#ifdef DEBUG_RESULT + print_vec(_p_d_enc_wei[_weight_offset], "layer norm scale(head): ", 5); + print_vec(_p_d_enc_wei[_weight_offset + 1], "layer norm bias(head): ", 5); + print_vec(_p_d_q, "layer norm out(head): ", 5); + print_vec(_p_d_q + _batch_token_num * _tw._hidden_size - 5, + "layer norm out(tail): ", 5); +#endif + + /* ---step 1. qkv = ori_q * qkv_wei + bias, and reshape qkv for multi-head + * gemm--- */ + CHECK_GPU_ERROR(cublasGemmEx( + _hd, CUBLAS_OP_N, CUBLAS_OP_N, _tw._hidden_size * 3, _batch_token_num, + _tw._hidden_size, &_fone, _p_d_enc_wei[_weight_offset + 2], _AType, + _tw._hidden_size * 3, _p_d_q, _BType, _tw._hidden_size, &_fzero, + _p_d_qkv_projected, _CType, _tw._hidden_size * 3, _computeType, + CUBLAS_GEMM_DEFAULT_TENSOR_OP)); + +#ifdef DEBUG_RESULT + print_vec(_p_d_qkv_projected, "self qkv(head): ", 5); + print_vec(_p_d_qkv_projected + _batch_token_num * _tw._hidden_size * 3 - 5, + "self qkv(tail): ", 5); +#endif + + // get q, k, v by split and reshape qkv + ker_arrange_encself_qkv_launcher<_DataType>( + _batch_token_num, _tw._hidden_size, _stream, _p_d_qkv_projected, + _p_d_enc_wei[_weight_offset + 3], _p_d_q, _max_batch_dim, _batch_seq_len, + _tw._dim_per_head, _tw._head_num, _max_thread_per_block); + + /* ---step 2. 
correlation = q * k, perform softmax on correlation--- */ + CHECK_GPU_ERROR(cublasGemmStridedBatchedEx( + _hd, CUBLAS_OP_T, CUBLAS_OP_N, _batch_seq_len, _batch_seq_len, + _tw._dim_per_head, &_atten_scaler, _p_d_k, _AType, _tw._dim_per_head, + _batch_seq_len * _tw._dim_per_head, _p_d_q, _BType, _tw._dim_per_head, + _batch_seq_len * _tw._dim_per_head, &_fzero, _p_d_c, _CType, + _batch_seq_len, _batch_seq_len * _batch_seq_len, + _batch_size * _tw._head_num, _computeType, + CUBLAS_GEMM_DEFAULT_TENSOR_OP)); + ker_correlation_softmax_encself_launcher<_DataType>( + _batch_size, _batch_seq_len, _tw._head_num, _stream, _p_d_c, + _p_d_padding_mask); + +#ifdef DEBUG_RESULT + print_vec(_p_d_c, "self attn correlation(head): ", 5); + print_vec(_p_d_c + _batch_token_num * _tw._head_num * _batch_seq_len - 5, + "self attn correlation(tail): ", 5); +#endif + + /* ---step 3. new_q = correlation * v--- */ + CHECK_GPU_ERROR(cublasGemmStridedBatchedEx( + _hd, CUBLAS_OP_N, CUBLAS_OP_N, _tw._dim_per_head, _batch_seq_len, + _batch_seq_len, &_fone, _p_d_v, _AType, _tw._dim_per_head, + _batch_seq_len * _tw._dim_per_head, _p_d_c, _BType, _batch_seq_len, + _batch_seq_len * _batch_seq_len, &_fzero, _p_d_q, _CType, + _tw._dim_per_head, _batch_seq_len * _tw._dim_per_head, + _batch_size * _tw._head_num, _computeType, + CUBLAS_GEMM_DEFAULT_TENSOR_OP)); + // use v to save reshaped q, since they are in same size and v + // will not be use again before the next multi-head-attention + ker_arrange_atten_output_launcher<_DataType>( + _batch_token_num, _tw._hidden_size, _stream, _p_d_q, _p_d_v, + _batch_seq_len, _tw._dim_per_head, _tw._head_num, _max_thread_per_block); + +#ifdef DEBUG_RESULT + print_vec(_p_d_v, "self attn before ffn(head): ", 5); +#endif + + /* ---step 4. new_q = ori_q + new_q * output_wei--- */ + CHECK_GPU_ERROR(cublasGemmEx( + _hd, CUBLAS_OP_N, CUBLAS_OP_N, _tw._hidden_size, _batch_token_num, + _tw._hidden_size, &_fone, _p_d_enc_wei[_weight_offset + 4], _AType, + _tw._hidden_size, _p_d_v, _BType, _tw._hidden_size, &_fone, _p_d_output, + _CType, _tw._hidden_size, _computeType, CUBLAS_GEMM_DEFAULT_TENSOR_OP)); + +#ifdef DEBUG_RESULT + print_vec(_p_d_output, "self attn ffn out(head): ", 5); + print_vec(_p_d_output + _batch_token_num * _tw._hidden_size - 5, + "self attn ffn out(tail): ", 5); + + print_vec(_p_d_enc_wei[_weight_offset + 4], "enc wei:", 5); +#endif + + return; +} + +template +void QuantBertEncoder::ffn_add_norm() { + /* ---step 0. layer_norm, add output_bias to "query"--- */ + ker_norm_layer_resual_launcher<_DataType>( + _batch_token_num, _tw._hidden_size, _stream, _p_d_output, _p_d_ffn_buf1, + _p_d_enc_wei[_weight_offset + 6], _p_d_enc_wei[_weight_offset + 7], + _p_d_enc_wei[_weight_offset + 11], _max_thread_per_block, + _tw._is_post_ln); + +#ifdef DEBUG_RESULT + print_vec(_p_d_enc_wei[_weight_offset + 6], "layer norm scale(head): ", 5); + print_vec(_p_d_enc_wei[_weight_offset + 7], "layer norm bias(head): ", 5); + print_vec(_p_d_ffn_buf1, "layer norm(head): ", 5); + print_vec(_p_d_ffn_buf1 + _batch_token_num * _tw._hidden_size - 5, + "layer norm(tail): ", 5); +#endif + + /* ---step 1. 
first ffn layer--- */ + CHECK_GPU_ERROR(cublasGemmEx( + _hd, CUBLAS_OP_N, CUBLAS_OP_N, _tw._inner_size, _batch_token_num, + _tw._hidden_size, &_fone, _p_d_enc_wei[_weight_offset + 8], _AType, + _tw._inner_size, _p_d_ffn_buf1, _BType, _tw._hidden_size, &_fzero, + _p_d_ffn_buf2, _CType, _tw._inner_size, _computeType, + CUBLAS_GEMM_DEFAULT_TENSOR_OP)); + + if (_tw._use_gelu) { + ker_bias_gelu_launcher<_DataType>( + _batch_token_num, _max_thread_per_block, _stream, _p_d_ffn_buf2, + _p_d_enc_wei[_weight_offset + 9], _tw._inner_size); + } else { + ker_bias_relu_launcher<_DataType>( + _batch_token_num, _max_thread_per_block, _stream, _p_d_ffn_buf2, + _p_d_enc_wei[_weight_offset + 9], _tw._inner_size); + } + +#ifdef DEBUG_RESULT + print_vec(_p_d_ffn_buf2, "ffn activation(head): ", 5); + print_vec(_p_d_ffn_buf2 + _batch_token_num * _tw._hidden_size - 5, + "ffn activation(tail): ", 5); +#endif + + /* ---step 2. second ffn layer--- */ + CHECK_GPU_ERROR(cublasGemmEx( + _hd, CUBLAS_OP_N, CUBLAS_OP_N, _tw._hidden_size, _batch_token_num, + _tw._inner_size, &_fone, _p_d_enc_wei[_weight_offset + 10], _AType, + _tw._hidden_size, _p_d_ffn_buf2, _BType, _tw._inner_size, &_fone, + _p_d_output, _CType, _tw._hidden_size, _computeType, + CUBLAS_GEMM_DEFAULT_TENSOR_OP)); + return; +} + +template class QuantBertEncoder; +template class QuantBertEncoder; + +} // namespace cuda +} // namespace lightseq diff --git a/lightseq/inference/model/quant_bert_encoder.h b/lightseq/inference/model/quant_bert_encoder.h new file mode 100644 index 00000000..db68d430 --- /dev/null +++ b/lightseq/inference/model/quant_bert_encoder.h @@ -0,0 +1,93 @@ +#pragma once + +#include +#include +#include +#include +#include + +#include +#include +#include +#include +#include + +#include "../proto/quant_bert_weight.h" +#include "../tools/util.h" + +/** +@file +Transformer decoder, composed by gemm lib and + custom cuda kernel function +*/ + +namespace lightseq { +namespace cuda { + +template +class QuantBertEncoder { + private: + typedef OperationTypeTraits _optraits; + typedef typename _optraits::DataType _DataType; + const cudaDataType_t _computeType = _optraits::computeType; + const cudaDataType_t _AType = _optraits::AType; + const cudaDataType_t _BType = _optraits::BType; + const cudaDataType_t _CType = _optraits::CType; + + // private member function + void self_attention(); + void ffn_add_norm(); + + const int _max_batch_size; + int *_p_d_padding_mask; // true sequence length(remove padding), [batch_size] + + const int *_p_d_lang_id; + const QuantBertWeight &_tw; + cudaStream_t _stream; + cublasHandle_t _hd; + const _DataType _fone; + const _DataType _fzero; + const _DataType _atten_scaler; + const int _max_batch_dim; + const int _max_thread_per_block; + + _DataType *_p_d_qkv_projected; + _DataType *_p_d_q; + _DataType *_p_d_k; + _DataType *_p_d_v; + _DataType *_p_d_c; + _DataType *_p_d_ffn_buf1; + _DataType *_p_d_ffn_buf2; + + // {token_emb, pos_emb, norm_scale, norm_bias} + const std::vector &_p_d_src_emb_wei; + // {multihead_norm_scale, multihead_norm_bias, multihead_qkv_kernel, + // multihead_qkv_bias multihead_output_kernel, multihead_output_bias + // ffn_norm_scale, ffn_norm_bias} + // ffn_first_kernel, ffn_first_bias, ffn_second_kernel, ffn_second_bias} * + // encoder_layer_num + const std::vector &_p_d_enc_wei; + + int _batch_size; + int _batch_seq_len; + int _batch_token_num; + int _layer_id; + int _weight_offset; + + public: + const int *_p_d_token_id; // input token id [batch_size, batch_seq_len] + _DataType + *_p_d_output; // 
encoder output, [batch_size, batch_seq_len, hidden_size] + + QuantBertEncoder(int max_batch_size, const int *p_d_token_id, + int *p_d_padding_mask, _DataType *p_d_output, + const QuantBertWeight &tw, cudaStream_t stream, + cublasHandle_t hd, const int *p_d_lang_id = nullptr); + long compute_buffer_bytesize(); + void init_buffer(void *pbuf); + std::string check(); + void run_one_infer(int batch_size, int batch_seq_len); +}; + +} // namespace cuda +} // namespace lightseq diff --git a/lightseq/inference/proto/CMakeLists.txt b/lightseq/inference/proto/CMakeLists.txt index 7d162569..745d8e7f 100644 --- a/lightseq/inference/proto/CMakeLists.txt +++ b/lightseq/inference/proto/CMakeLists.txt @@ -10,6 +10,7 @@ include_directories(${CMAKE_CURRENT_BINARY_DIR}) protobuf_generate_cpp(GPT_PROTO_SRC GPT_PROTO_HEADER gpt.proto) protobuf_generate_cpp(BERT_PROTO_SRC BERT_PROTO_HEADER bert.proto) +protobuf_generate_cpp(Q_BERT_PROTO_SRC Q_BERT_PROTO_HEADER quant_bert.proto) protobuf_generate_cpp(Q_TRANSFORMER_PROTO_SRC Q_TRANSFORMER_PROTO_HEADER quant_transformer.proto) protobuf_generate_cpp(TRANSFORMER_PROTO_SRC TRANSFORMER_PROTO_HEADER @@ -29,6 +30,12 @@ target_link_libraries(bert_weight PUBLIC utils ${Protobuf_LIBRARIES}) target_include_directories(bert_weight PUBLIC ${CMAKE_CURRENT_SOURCE_DIR}) target_include_directories(bert_weight PUBLIC ${CMAKE_CURRENT_BINARY_DIR}) +add_library(quant_bert_weight STATIC quant_bert_weight.cc ${Q_BERT_PROTO_SRC} + ${Q_BERT_PROTO_HEADER}) +target_link_libraries(quant_bert_weight PUBLIC utils ${Protobuf_LIBRARIES}) +target_include_directories(quant_bert_weight PUBLIC ${CMAKE_CURRENT_SOURCE_DIR}) +target_include_directories(quant_bert_weight PUBLIC ${CMAKE_CURRENT_BINARY_DIR}) + add_library( transformer_weight STATIC transformer_weight.cc ${TRANSFORMER_PROTO_SRC} ${TRANSFORMER_PROTO_HEADER}) diff --git a/lightseq/inference/proto/quant_bert.proto b/lightseq/inference/proto/quant_bert.proto new file mode 100644 index 00000000..51b7f727 --- /dev/null +++ b/lightseq/inference/proto/quant_bert.proto @@ -0,0 +1,66 @@ +syntax = "proto3"; +option optimize_for = LITE_RUNTIME; +// all the matrix are stored in row-major order, +// plz see https://en.wikipedia.org/wiki/Row-_and_column-major_order for details + +// the definition of "Multi-Head Attention", "Scaled Dot-Product Attention" and +// "Feed-Forward Networks" +// plz see https://arxiv.org/abs/1706.03762 for details + +message QuantBertEncoderLayer { + // layer norm before "Multi-Head Attention" + repeated float multihead_norm_scale = 1; // [hidden_size] + repeated float multihead_norm_bias = 2; // [hidden_size] + + // "Multi-Head Attention" linearly project weights kernel for query, key, + // value, + // before "Scaled Dot-Product Attention, with shape (hidden_size, + // hidden_size*3) + // is built by numpy.concatenate((query_kernel, key_kernel, value_kernel), + // axis=1) + // perform numpy.dot(input, multihead_project_kernel_qkv) will get the [query, + // key, value] of + // "Scaled Dot-Product Attention" + repeated float multihead_project_kernel_qkv = 3; // [hidden_size, 3, hidden_size] + repeated float multihead_project_bias_qkv = 4; // [3, hidden_size] + repeated float multihead_project_kernel_output = 5; // [hidden_size, hidden_size] + repeated float multihead_project_bias_output = 6; // [hidden_size] + + // layer norm before "Feed-Forward Networks" + repeated float ffn_norm_scale = 7; // [hidden_size] + repeated float ffn_norm_bias = 8; // [hidden_size] + + // "Feed-Forward Networks" + repeated float ffn_first_kernel = 
9; // [hidden_size, inner_size] + repeated float ffn_first_bias = 10; // [inner_size] + repeated float ffn_second_kernel = 11; // [inner_size, hidden_size] + repeated float ffn_second_bias = 12; // [hidden_size] +} + +message QuantBertEmbeddingLayer { + // token embedding table + // look it up directly will get the input token embedding + repeated float token_embedding = 1; // [vocab_size, hidden_size] + repeated float position_embedding = 2; // [max_seq_len, hidden_size] + // the last layer_norm of encoder, + // only for pre layer norm, + repeated float norm_scale = 3; // [hidden_size] + repeated float norm_bias = 4; // [hidden_size] +} + +message QuantBertModelConf { + int32 head_num = 1; + int32 src_padding_id = 2; + bool is_post_ln = 3; // Pre-LN or Post-LN + bool use_gelu = 4; // use gelu for activation otherwise relu + // Multilingual model type, 0 for bilingual + // 1 for token level multilingual, + // 2 for sentence level multilingual + int32 multilg_type = 5; +} + +message QuantBert { + QuantBertEmbeddingLayer src_embedding = 1; + repeated QuantBertEncoderLayer encoder_stack = 2; + QuantBertModelConf model_conf = 3; +} diff --git a/lightseq/inference/proto/quant_bert_weight.cc b/lightseq/inference/proto/quant_bert_weight.cc new file mode 100644 index 00000000..962f8ab3 --- /dev/null +++ b/lightseq/inference/proto/quant_bert_weight.cc @@ -0,0 +1,484 @@ +#include "quant_bert_weight.h" + +#include + +/** +@file +Load the model weights which stored in custom proto file into GPU memory. +Currently, fp16 and fp32 versions are provided. +Weights in proto file will always be in fp32. For fp16, the weights + will be casted from fp32 into fp16 +*/ + +namespace lightseq { +namespace cuda { + +/** +Cast weights into required datatype. +The datatype of weights in custom proto file will always be in fp32. +*/ +template <> +float QuantBertWeight::float2required(float value) { + return value; +} + +/** +fp16 version, cast fp32 into fp16 +*/ +template <> +__half QuantBertWeight::float2required(float value) { + return __float2half_rn(value); +} + +/** +Read model config stored in custom proto file. +*/ +template +void QuantBertWeight::proto_get_model_config(const QuantBert &bert) { + _hidden_size = bert.src_embedding().norm_scale_size(); + _inner_size = bert.encoder_stack()[0].ffn_first_kernel_size() / _hidden_size; + _max_step = bert.src_embedding().position_embedding_size() / _hidden_size; + _src_vocab_size = bert.src_embedding().token_embedding_size() / _hidden_size; + _n_enc_layer = bert.encoder_stack_size(); + _head_num = bert.model_conf().head_num(); + _dim_per_head = _hidden_size / _head_num; + _weight_per_enc_layer = 12; + _padding_id = bert.model_conf().src_padding_id(); + _is_post_ln = bert.model_conf().is_post_ln(); + _use_gelu = bert.model_conf().use_gelu(); + _multilg_type = bert.model_conf().multilg_type(); +} + +/** +Load the weights of embedding layer into GPU memory. 
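+
+[editor's note] The parser below follows the pattern shared by LightSeq's
+other weight loaders: every tensor is appended to one flat host vector
+(`value`) while `offset` records where each tensor starts; the flat vector is
+then copied into a single thrust::device_vector and the public pointer list is
+filled with `device_base + offset[i]`. A minimal sketch of the idea (names
+hypothetical, not part of the patch):
+
+  std::vector<int> offset;         // start index of each tensor
+  std::vector<float> value;        // all tensors, flattened on the host
+  offset.push_back(value.size());  // record the start before appending tensor i
+  // ... append tensor i's elements to value ...
+  // after copying `value` into device memory d_buf:
+  //   pointer to tensor i == d_buf + offset[i]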
+*/ +template +std::string QuantBertWeight::proto_parse_emb_wei( + const QuantBertEmbeddingLayer &layer) { + std::vector offset; + std::vector value; + int idx = 0; + + offset.push_back(idx); + if (layer.token_embedding_size() != _src_vocab_size * _hidden_size) + return "wrong token_embedding_size !"; + for (float ele : layer.token_embedding()) value.push_back(ele); + idx += _src_vocab_size * _hidden_size; + + offset.push_back(idx); + if (layer.position_embedding_size() != _max_step * _hidden_size) + return "wrong position_embedding_size !"; + for (float ele : layer.position_embedding()) value.push_back(ele); + idx += _max_step * _hidden_size; + + offset.push_back(idx); + if (layer.norm_scale_size() != _hidden_size) return "wrong norm_scale_size !"; + for (float ele : layer.norm_scale()) value.push_back(ele); + idx += _hidden_size; + + offset.push_back(idx); + if (layer.norm_bias_size() != _hidden_size) return "wrong norm_bias_size !"; + for (float ele : layer.norm_bias()) value.push_back(ele); + idx += _hidden_size; + + std::vector<_DataType> raw_value; + for (float e : value) raw_value.push_back(float2required(e)); + _d_src_emb_wei = raw_value; + for (int e : offset) + _p_d_src_emb_wei.push_back(thrust::raw_pointer_cast(_d_src_emb_wei.data()) + + e); + + std::cout << "finish initializing emb_wei from host to device" << std::endl; + return ""; +} + +/** +Load the weights of encoder into GPU memory. +*/ +template +std::string QuantBertWeight::proto_parse_enc_wei( + const QuantBert &bert) { + std::vector offset; + std::vector value; + int idx = 0; + + for (auto enc_layer : bert.encoder_stack()) { + offset.push_back(idx); + if (enc_layer.multihead_norm_scale_size() != _hidden_size) + return "wrong multihead_norm_scale_size !"; + for (float ele : enc_layer.multihead_norm_scale()) value.push_back(ele); + idx += _hidden_size; + + offset.push_back(idx); + if (enc_layer.multihead_norm_bias_size() != _hidden_size) + return "wrong multihead_norm_bias_size !"; + for (float ele : enc_layer.multihead_norm_bias()) value.push_back(ele); + idx += _hidden_size; + + offset.push_back(idx); + if (enc_layer.multihead_project_kernel_qkv_size() != + _hidden_size * _hidden_size * 3) + return "wrong multihead_project_kernel_qkv_size !"; + for (float ele : enc_layer.multihead_project_kernel_qkv()) + value.push_back(ele); + idx += _hidden_size * _hidden_size * 3; + + offset.push_back(idx); + if (enc_layer.multihead_project_bias_qkv_size() != _hidden_size * 3) + return "wrong multihead_project_bias_qkv_size !"; + for (float ele : enc_layer.multihead_project_bias_qkv()) + value.push_back(ele); + idx += _hidden_size * 3; + + offset.push_back(idx); + if (enc_layer.multihead_project_kernel_output_size() != + _hidden_size * _hidden_size) + return "wrong multihead_project_kernel_output_size !"; + for (float ele : enc_layer.multihead_project_kernel_output()) + value.push_back(ele); + idx += _hidden_size * _hidden_size; + + offset.push_back(idx); + if (enc_layer.multihead_project_bias_output_size() != _hidden_size) + return "wrong multihead_project_bias_output_size !"; + for (float ele : enc_layer.multihead_project_bias_output()) + value.push_back(ele); + idx += _hidden_size; + + offset.push_back(idx); + if (enc_layer.ffn_norm_scale_size() != _hidden_size) + return "wrong ffn_norm_scale_size !"; + for (float ele : enc_layer.ffn_norm_scale()) value.push_back(ele); + idx += _hidden_size; + + offset.push_back(idx); + if (enc_layer.ffn_norm_bias_size() != _hidden_size) + return "wrong ffn_norm_bias_size !"; + for (float ele : 
enc_layer.ffn_norm_bias()) value.push_back(ele); + idx += _hidden_size; + + offset.push_back(idx); + if (enc_layer.ffn_first_kernel_size() != _hidden_size * _inner_size) + return "wrong ffn_first_kernel_size !"; + for (float ele : enc_layer.ffn_first_kernel()) value.push_back(ele); + idx += _hidden_size * _inner_size; + + offset.push_back(idx); + if (enc_layer.ffn_first_bias_size() != _inner_size) + return "wrong ffn_first_bias_size !"; + for (float ele : enc_layer.ffn_first_bias()) value.push_back(ele); + idx += _inner_size; + + offset.push_back(idx); + if (enc_layer.ffn_second_kernel_size() != _hidden_size * _inner_size) + return "wrong ffn_second_kernel_size !"; + for (float ele : enc_layer.ffn_second_kernel()) value.push_back(ele); + idx += _hidden_size * _inner_size; + + offset.push_back(idx); + if (enc_layer.ffn_second_bias_size() != _hidden_size) + return "wrong ffn_second_bias_size !"; + for (float ele : enc_layer.ffn_second_bias()) value.push_back(ele); + idx += _hidden_size; + + } // for + + std::vector<_DataType> raw_value; + for (float e : value) raw_value.push_back(float2required(e)); + _d_enc_wei = raw_value; + + for (int e : offset) + _p_d_enc_wei.push_back(thrust::raw_pointer_cast(_d_enc_wei.data()) + e); + std::cout << "finish initializing enc_wei from host to device" << std::endl; + return ""; +} + +/** +Read model config stored in custom hdf5 file. +*/ +template +void QuantBertWeight::hdf5_get_model_config(hid_t hdf5_file) { + _hidden_size = get_hdf5_dataset_size(hdf5_file, "src_embedding/norm_scale"); + + _inner_size = + get_hdf5_dataset_size(hdf5_file, "encoder_stack/0/ffn_first_kernel") / + _hidden_size; + + _max_step = + get_hdf5_dataset_size(hdf5_file, "src_embedding/position_embedding") / + _hidden_size; + + _src_vocab_size = + get_hdf5_dataset_size(hdf5_file, "src_embedding/token_embedding") / + _hidden_size; + + read_hdf5_dataset_scalar(hdf5_file, "model_conf/n_encoder_stack", + H5T_NATIVE_INT, &_n_enc_layer); + + read_hdf5_dataset_scalar(hdf5_file, "model_conf/head_num", H5T_NATIVE_INT, + &_head_num); + + _dim_per_head = _hidden_size / _head_num; + _weight_per_enc_layer = 12; + + read_hdf5_dataset_scalar(hdf5_file, "model_conf/src_padding_id", + H5T_NATIVE_INT, &_padding_id); + + read_hdf5_dataset_scalar(hdf5_file, "model_conf/is_post_ln", H5T_NATIVE_HBOOL, + &_is_post_ln); + + read_hdf5_dataset_scalar(hdf5_file, "model_conf/use_gelu", H5T_NATIVE_HBOOL, + &_use_gelu); + + try { + read_hdf5_dataset_scalar(hdf5_file, "model_conf/multilg_type", + H5T_NATIVE_INT, &_multilg_type); + } catch (HDF5DatasetNotFoundError &e) { + // default value + _multilg_type = 0; + } +} + +/** +Load the weights of embedding layer into GPU memory. +*/ +template +void QuantBertWeight::hdf5_parse_emb_wei(hid_t hdf5_file) { + std::string dataset_prefix = "src_embedding"; + + size_t value_size = _src_vocab_size * _hidden_size + + _max_step * _hidden_size + 2 * _hidden_size; + + std::vector offset; + std::vector value(value_size); // preallocate vector for performance + std::cout << "loading " << value_size * sizeof(OpType_) / (1024 * 1024) + << " MB of embedding weight." 
<< std::endl; + int idx = 0; + + offset.push_back(idx); + read_hdf5_dataset_data( + hdf5_file, dataset_prefix + "/token_embedding", H5T_NATIVE_FLOAT, + value.data() + idx, + [=](int size) { return size != _src_vocab_size * _hidden_size; }, + "Wrong token_embedding_size !"); + idx += _src_vocab_size * _hidden_size; + + offset.push_back(idx); + read_hdf5_dataset_data( + hdf5_file, dataset_prefix + "/position_embedding", H5T_NATIVE_FLOAT, + value.data() + idx, + [=](int size) { return size != _max_step * _hidden_size; }, + "Wrong position_embedding_size !"); + idx += _max_step * _hidden_size; + + offset.push_back(idx); + read_hdf5_dataset_data( + hdf5_file, dataset_prefix + "/norm_scale", H5T_NATIVE_FLOAT, + value.data() + idx, [=](int size) { return size != _hidden_size; }, + "Wrong norm_scale_size !"); + idx += _hidden_size; + + offset.push_back(idx); + read_hdf5_dataset_data( + hdf5_file, dataset_prefix + "/norm_bias", H5T_NATIVE_FLOAT, + value.data() + idx, [=](int size) { return size != _hidden_size; }, + "Wrong norm_bias_size !"); + idx += _hidden_size; + + std::vector<_DataType> raw_value; + raw_value.reserve(value.size()); + for (float e : value) raw_value.push_back(float2required(e)); + _d_src_emb_wei = raw_value; + for (int e : offset) + _p_d_src_emb_wei.push_back(thrust::raw_pointer_cast(_d_src_emb_wei.data()) + + e); + + std::cout << "Finish loading src_emb_wei from host to device" << std::endl; +} + +/** +Load the weights of encoder into GPU memory. +*/ +template +void QuantBertWeight::hdf5_parse_enc_wei(hid_t hdf5_file) { + size_t value_size = + (_hidden_size * 2 + _hidden_size * _hidden_size * 3 + _hidden_size * 3 + + _hidden_size * _hidden_size + _hidden_size * 3 + + _hidden_size * _inner_size + _inner_size + _hidden_size * _inner_size + + _hidden_size) * + _n_enc_layer; + std::vector offset; + std::vector value(value_size); + std::cout << "loading " << value_size * sizeof(OpType_) / (1024 * 1024) + << " MB of encoder weight." 
<< std::endl; + + int idx = 0; + for (int layer_id = 0; layer_id < _n_enc_layer; ++layer_id) { + std::string dataset_prefix = "encoder_stack/" + std::to_string(layer_id); + + offset.push_back(idx); + read_hdf5_dataset_data( + hdf5_file, dataset_prefix + "/multihead_norm_scale", H5T_NATIVE_FLOAT, + value.data() + idx, [=](int size) { return size != _hidden_size; }, + "Wrong multihead_norm_scale_size !"); + idx += _hidden_size; + + offset.push_back(idx); + read_hdf5_dataset_data( + hdf5_file, dataset_prefix + "/multihead_norm_bias", H5T_NATIVE_FLOAT, + value.data() + idx, [=](int size) { return size != _hidden_size; }, + "Wrong multihead_norm_bias_size !"); + idx += _hidden_size; + + offset.push_back(idx); + read_hdf5_dataset_data( + hdf5_file, dataset_prefix + "/multihead_project_kernel_qkv", + H5T_NATIVE_FLOAT, value.data() + idx, + [=](int size) { return size != _hidden_size * _hidden_size * 3; }, + "Wrong multihead_project_kernel_qkv_size !"); + idx += _hidden_size * _hidden_size * 3; + + offset.push_back(idx); + read_hdf5_dataset_data( + hdf5_file, dataset_prefix + "/multihead_project_bias_qkv", + H5T_NATIVE_FLOAT, value.data() + idx, + [=](int size) { return size != _hidden_size * 3; }, + "Wrong multihead_project_bias_qkv_size !"); + idx += _hidden_size * 3; + + offset.push_back(idx); + read_hdf5_dataset_data( + hdf5_file, dataset_prefix + "/multihead_project_kernel_output", + H5T_NATIVE_FLOAT, value.data() + idx, + [=](int size) { return size != _hidden_size * _hidden_size; }, + "Wrong multihead_project_kernel_output_size !"); + idx += _hidden_size * _hidden_size; + + offset.push_back(idx); + read_hdf5_dataset_data( + hdf5_file, dataset_prefix + "/multihead_project_bias_output", + H5T_NATIVE_FLOAT, value.data() + idx, + [=](int size) { return size != _hidden_size; }, + "Wrong multihead_project_bias_output_size !"); + idx += _hidden_size; + + offset.push_back(idx); + read_hdf5_dataset_data( + hdf5_file, dataset_prefix + "/ffn_norm_scale", H5T_NATIVE_FLOAT, + value.data() + idx, [=](int size) { return size != _hidden_size; }, + "Wrong ffn_norm_scale_size !"); + idx += _hidden_size; + + offset.push_back(idx); + read_hdf5_dataset_data( + hdf5_file, dataset_prefix + "/ffn_norm_bias", H5T_NATIVE_FLOAT, + value.data() + idx, [=](int size) { return size != _hidden_size; }, + "Wrong ffn_norm_bias_size !"); + idx += _hidden_size; + + offset.push_back(idx); + read_hdf5_dataset_data( + hdf5_file, dataset_prefix + "/ffn_first_kernel", H5T_NATIVE_FLOAT, + value.data() + idx, + [=](int size) { return size != _hidden_size * _inner_size; }, + "Wrong ffn_first_kernel_size !"); + idx += _hidden_size * _inner_size; + + offset.push_back(idx); + read_hdf5_dataset_data( + hdf5_file, dataset_prefix + "/ffn_first_bias", H5T_NATIVE_FLOAT, + value.data() + idx, [=](int size) { return size != _inner_size; }, + "Wrong ffn_first_bias_size !"); + idx += _inner_size; + + offset.push_back(idx); + read_hdf5_dataset_data( + hdf5_file, dataset_prefix + "/ffn_second_kernel", H5T_NATIVE_FLOAT, + value.data() + idx, + [=](int size) { return size != _hidden_size * _inner_size; }, + "Wrong ffn_second_kernel_size !"); + idx += _hidden_size * _inner_size; + + offset.push_back(idx); + read_hdf5_dataset_data( + hdf5_file, dataset_prefix + "/ffn_second_bias", H5T_NATIVE_FLOAT, + value.data() + idx, [=](int size) { return size != _hidden_size; }, + "Wrong ffn_second_bias_size !"); + idx += _hidden_size; + + } // for + + std::vector<_DataType> raw_value; + raw_value.reserve(value.size()); + for (float e : value) 
raw_value.push_back(float2required(e)); + _d_enc_wei = raw_value; + + for (int e : offset) + _p_d_enc_wei.push_back(thrust::raw_pointer_cast(_d_enc_wei.data()) + e); + std::cout << "Finish loading enc_wei from host to device" << std::endl; +} + +/** +Load the proto file into CPU memory and parse it. +*/ +template +std::string QuantBertWeight::initializing(std::string weight_path) { + if (endswith(weight_path, ".pb")) { + std::cout << "Parsing protobuf: " << weight_path << std::endl; + QuantBert bert; + // Verify that the version of the library that we linked against is + // compatible with the version of the headers we compiled against. + GOOGLE_PROTOBUF_VERIFY_VERSION; + + std::fstream raw_input(weight_path, std::ios::in | std::ios::binary); + if (!bert.ParseFromIstream(&raw_input)) { + return "Parse weights from [" + weight_path + "] failed."; + } + + proto_get_model_config(bert); + if (_hidden_size % 4 != 0) { + return "hidden_size should be a multiple of 4 to avoid misaligned " + "address in CUDA"; + } + + std::string res = proto_parse_emb_wei(bert.src_embedding()); + if (!res.empty()) return res; + + res = proto_parse_enc_wei(bert); + if (!res.empty()) return res; + + std::cout << "finish initializing all weight from host to device" + << std::endl; + // Optional: Delete all global objects allocated by libprotobuf. + // google::protobuf::ShutdownProtobufLibrary(); + return ""; + } else if (endswith(weight_path, ".hdf5")) { + std::cout << "Parsing hdf5: " << weight_path << std::endl; + + hid_t hdf5_file = H5Fopen(weight_path.c_str(), H5F_ACC_RDONLY, H5P_DEFAULT); + if (hdf5_file < 0) { + return "Unable to read HDF5 file from " + weight_path; + } + hdf5_get_model_config(hdf5_file); + if (_hidden_size % 4 != 0) { + return "hidden_size should be a multiple of 4 to avoid misaligned " + "address in CUDA"; + } + // hdf5_parse_* would throw std::runtime_error on error + hdf5_parse_emb_wei(hdf5_file); + hdf5_parse_enc_wei(hdf5_file); + H5Fclose(hdf5_file); + + std::cout << "Finish loading all weight from host to device" << std::endl; + return ""; + } else { + return "Unsupported weight extention for [" + weight_path + + "]; Supported extensions: .pb, .hdf5\n"; + } +} + +template class QuantBertWeight; +template class QuantBertWeight; + +} // namespace cuda +} // namespace lightseq diff --git a/lightseq/inference/proto/quant_bert_weight.h b/lightseq/inference/proto/quant_bert_weight.h new file mode 100644 index 00000000..66bda4e5 --- /dev/null +++ b/lightseq/inference/proto/quant_bert_weight.h @@ -0,0 +1,93 @@ +#pragma once + +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include +#include + +#include "quant_bert.pb.h" +#include "../tools/util.h" + +namespace lightseq { +namespace cuda { + +/* +Load the model weights which stored in custom proto file into GPU memory. 
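+
+[editor's note] Typical usage, following pywrapper/quant_bert.cc in this
+patch (a sketch assuming the FP32 instantiation; the file name is
+hypothetical):
+
+  QuantBertWeight<OperationType::FP32> tw;
+  std::string res = tw.initializing("quant_bert_model.pb");  // .pb or .hdf5
+  if (!res.empty()) throw std::runtime_error(res);
+  tw.print_model_config();
+  // tw.get_src_emb_wei() / tw.get_enc_wei() are then handed to QuantBertEncoder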
+*/
+template <OperationType OpType_>
+class QuantBertWeight {
+ private:
+  typedef OperationTypeTraits<OpType_> _optraits;
+  typedef typename _optraits::DataType _DataType;
+  _DataType float2required(float value);
+  void proto_get_model_config(const QuantBert &bert);
+  std::string proto_parse_emb_wei(const QuantBertEmbeddingLayer &layer);
+  std::string proto_parse_enc_wei(const QuantBert &bert);
+
+  void hdf5_get_model_config(hid_t hdf5_file);
+  void hdf5_parse_emb_wei(hid_t hdf5_file);
+  void hdf5_parse_enc_wei(hid_t hdf5_file);
+  // store the weights pointer
+  std::vector<const _DataType *> _p_d_src_emb_wei;  // size: 4
+  std::vector<const _DataType *> _p_d_enc_wei;  // size: 12 * enc_layer_num
+
+  // store the weights on gpu memory
+  thrust::device_vector<_DataType> _d_src_emb_wei;
+  thrust::device_vector<_DataType> _d_enc_wei;
+
+ public:
+  std::string initializing(std::string proto_path);
+
+  const std::vector<const _DataType *> &get_src_emb_wei() const {
+    // {token_emb, pos_emb, norm_scale, norm_bias}
+    return _p_d_src_emb_wei;
+  }
+
+  const std::vector<const _DataType *> &get_enc_wei() const {
+    // {multihead_norm_scale, multihead_norm_bias, multihead_qkv_kernel,
+    // multihead_qkv_bias, multihead_output_kernel, multihead_output_bias,
+    // ffn_norm_scale, ffn_norm_bias,
+    // ffn_first_kernel, ffn_first_bias, ffn_second_kernel, ffn_second_bias} *
+    // encoder_layer_num
+    return _p_d_enc_wei;
+  }
+
+  int _hidden_size;
+  int _inner_size;
+  int _max_step;
+  int _src_vocab_size;
+  int _n_enc_layer;  // number of encoder layer
+  int _dim_per_head;
+  int _weight_per_enc_layer;  // 12
+
+  int _head_num;
+  int _padding_id;  // for src
+  bool _is_post_ln;
+  bool _use_gelu;
+  int _multilg_type;
+
+  void print_model_config() {
+    std::cout << "***model config***" << std::endl;
+    std::cout << "encoder layers: " << _n_enc_layer << std::endl;
+    std::cout << "hidden size: " << _hidden_size << std::endl;
+    std::cout << "inner size: " << _inner_size << std::endl;
+    std::cout << "head number: " << _head_num << std::endl;
+    std::cout << "dim per head: " << _dim_per_head << std::endl;
+    std::cout << "src vocab size: " << _src_vocab_size << std::endl;
+    std::cout << "is_post_ln: " << _is_post_ln << std::endl;
+    std::cout << "use_gelu: " << _use_gelu << std::endl;
+    std::cout << "padding_id: " << _padding_id << std::endl;
+    std::cout << std::endl;
+  }
+};
+
+}  // namespace cuda
+}  // namespace lightseq
diff --git a/lightseq/inference/pywrapper/CMakeLists.txt b/lightseq/inference/pywrapper/CMakeLists.txt
index a1d6cd6d..def6efb9 100644
--- a/lightseq/inference/pywrapper/CMakeLists.txt
+++ b/lightseq/inference/pywrapper/CMakeLists.txt
@@ -8,18 +8,21 @@ pybind11_add_module(
   gpt.cc
   bert.cc
   quant_transformer.cc
+  quant_bert.cc
   moe.cc)
 target_link_libraries(lightseq PUBLIC gpt_model)
 target_link_libraries(lightseq PUBLIC bert_model)
 target_link_libraries(lightseq PUBLIC transformer_model)
 target_link_libraries(lightseq PUBLIC quant_transformer_model)
+target_link_libraries(lightseq PUBLIC quant_bert_model)
 target_link_libraries(lightseq PUBLIC moe_model)
 set_target_properties(lightseq PROPERTIES OUTPUT_NAME inference)
 
 add_library(liblightseq SHARED transformer.cc gpt.cc bert.cc
-            quant_transformer.cc moe.cc)
+            quant_transformer.cc quant_bert.cc moe.cc)
 target_link_libraries(liblightseq PUBLIC transformer_model)
 target_link_libraries(liblightseq PUBLIC quant_transformer_model)
+target_link_libraries(liblightseq PUBLIC quant_bert_model)
 target_link_libraries(liblightseq PUBLIC gpt_model)
 target_link_libraries(liblightseq PUBLIC bert_model)
 target_link_libraries(liblightseq PUBLIC moe_model)
diff --git
a/lightseq/inference/pywrapper/quant_bert.cc b/lightseq/inference/pywrapper/quant_bert.cc new file mode 100644 index 00000000..74327f6c --- /dev/null +++ b/lightseq/inference/pywrapper/quant_bert.cc @@ -0,0 +1,159 @@ +#include "quant_bert.h" + +namespace lightseq { +namespace cuda { + +QuantBert::QuantBert(const std::string weight_path, const int max_batch_size) + : LSModel({"token_ids"}, {"encoder_output"}), + _max_batch_size(max_batch_size) { + /* ---step1. init environment--- */ + CHECK_GPU_ERROR(cudaSetDevice(0)); + CHECK_GPU_ERROR(cudaStreamCreate(&stream_)); + CHECK_GPU_ERROR(cublasCreate(&hd_)); + CHECK_GPU_ERROR(cublasSetStream(hd_, stream_)); + + /* ---step2. load model weights into GPU memory--- */ + + // saved in custom proto file + std::string model_weights_path = weight_path; + std::string res = tw_.initializing(model_weights_path); + if (!res.empty()) { + throw std::runtime_error(res); + } + + tw_.print_model_config(); + + /* + step3. instantiate encoder and decoder, init the gpu memory buffer. + using thrust vector to avoid manage gpu memory by hand + */ + + // register device memory for inputs and outputs + CHECK_GPU_ERROR( + cudaMalloc(&d_input_, _max_batch_size * tw_._max_step * sizeof(int))); + CHECK_GPU_ERROR(cudaMalloc(&d_padding_mask_, + _max_batch_size * tw_._max_step * sizeof(int))); + + CHECK_GPU_ERROR(cudaMalloc( + &d_encoder_output_, _max_batch_size * tw_._max_step * tw_._hidden_size * + sizeof(optraits::DataType))); + + encoder_ = std::make_shared>( + max_batch_size, d_input_, d_padding_mask_, d_encoder_output_, tw_, + stream_, hd_); + res = encoder_->check(); + if (!res.empty()) { + throw std::runtime_error(res); + } + + long buf_bytesize = encoder_->compute_buffer_bytesize(); + std::cout << "Bert buf_bytesize: " << buf_bytesize << std::endl; + + // encoder and decoder use the same buffer to save gpu memory useage + CHECK_GPU_ERROR(cudaMalloc(&d_buf_, (size_t)buf_bytesize)); + encoder_->init_buffer(d_buf_); + CHECK_GPU_ERROR(cudaStreamSynchronize(stream_)); +} + +QuantBert::~QuantBert() { + CHECK_GPU_ERROR(cudaFree(d_input_)); + CHECK_GPU_ERROR(cudaFree(d_padding_mask_)); + CHECK_GPU_ERROR(cudaFree(d_encoder_output_)); + CHECK_GPU_ERROR(cudaFree(d_buf_)); + CHECK_GPU_ERROR(cublasDestroy(hd_)); + CHECK_GPU_ERROR(cudaStreamDestroy(stream_)); +} + +void QuantBert::Infer() { + int batch_size = input_shapes_[0][0], seq_len = input_shapes_[0][1]; + encoder_->run_one_infer(batch_size, seq_len); + CHECK_GPU_ERROR(cudaStreamSynchronize(stream_)); + set_output_shape(0, {batch_size, seq_len, tw_._hidden_size}); +} + +void QuantBert::set_input_ptr(int index, void *input_ptr) { + switch (index) { + case 0: + encoder_->_p_d_token_id = static_cast(input_ptr); + break; + + default: + throw std::runtime_error("invalid input index"); + break; + } +} + +void QuantBert::set_output_ptr(int index, void *output_ptr) { + switch (index) { + case 0: + encoder_->_p_d_output = static_cast(output_ptr); + break; + + default: + throw std::runtime_error("invalid output index"); + break; + } +} + +const void *QuantBert::get_output_ptr(int index) { + switch (index) { + case 0: + return static_cast(encoder_->_p_d_output); + + default: + throw std::runtime_error("invalid output index"); + break; + } +} + +std::vector QuantBert::get_input_max_shape(int index) { + switch (index) { + case 0: + return {_max_batch_size, tw_._max_step}; + + default: + throw std::runtime_error("invalid input index"); + break; + } +} +std::vector QuantBert::get_output_max_shape(int index) { + switch (index) { + case 0: + 
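+      // [editor's note] the single output ("encoder_output") is at most
+      // [max_batch_size, max_step, hidden_size], matching the
+      // d_encoder_output_ buffer allocated in the constructor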
return {_max_batch_size, tw_._max_step, tw_._hidden_size}; + + default: + throw std::runtime_error("invalid output index"); + break; + } +} + +DataType QuantBert::get_input_dtype(int index) { + switch (index) { + case 0: + return DataType::kInt32; + break; + + default: + throw std::runtime_error("invalid input index"); + break; + } +} + +DataType QuantBert::get_output_dtype(int index) { + switch (index) { + case 0: + if (bert_optype == OperationType::FP32) { + return DataType::kFloat32; + } else { + return DataType::kFloat16; + } + break; + + default: + throw std::runtime_error("invalid output index"); + break; + } +} + +} // namespace cuda +} // namespace lightseq diff --git a/lightseq/inference/pywrapper/quant_bert.h b/lightseq/inference/pywrapper/quant_bert.h new file mode 100644 index 00000000..a73e5bd8 --- /dev/null +++ b/lightseq/inference/pywrapper/quant_bert.h @@ -0,0 +1,49 @@ + +#include "model_base.h" +#include "../model/quant_bert_encoder.h" +#include "../proto/quant_bert_weight.h" +#include "../tools/util.h" + +#ifdef FP16_MODE +const lightseq::cuda::OperationType bert_optype = + lightseq::cuda::OperationType::FP16; +#else +const lightseq::cuda::OperationType bert_optype = + lightseq::cuda::OperationType::FP32; +#endif + +namespace lightseq { +namespace cuda { +class QuantBert : public LSModel { + private: + typedef OperationTypeTraits optraits; + std::shared_ptr> encoder_; + + optraits::DataType *d_encoder_output_; + int *d_input_; + int *d_padding_mask_; + int _max_batch_size; + cudaStream_t stream_; + cublasHandle_t hd_; + void *d_buf_; + QuantBertWeight tw_; + + public: + QuantBert(const std::string weight_path, const int max_batch_size); + + ~QuantBert(); + + void Infer() override; + void set_input_ptr(int index, void *input_ptr) override; + void set_output_ptr(int index, void *output_ptr) override; + const void *get_output_ptr(int index) override; + std::vector get_input_max_shape(int index) override; + std::vector get_output_max_shape(int index) override; + DataType get_input_dtype(int index) override; + DataType get_output_dtype(int index) override; +}; + +LSMODEL_REGISTER(QuantBert); + +} // namespace cuda +} // namespace lightseq diff --git a/lightseq/inference/pywrapper/wrapper.cc b/lightseq/inference/pywrapper/wrapper.cc index 7b0dd438..38416130 100644 --- a/lightseq/inference/pywrapper/wrapper.cc +++ b/lightseq/inference/pywrapper/wrapper.cc @@ -237,6 +237,88 @@ class PyBert { } }; +class PyQuantBert { + private: + lightseq::cuda::LSModel *model_; + int *d_input_; + std::vector d_outputs_; + + public: + PyQuantBert(std::string weight_path, int max_batch_size) { + model_ = lightseq::cuda::LSModelFactory::GetInstance().CreateModel( + "QuantBert", weight_path, max_batch_size); + std::vector max_input_shape = model_->get_input_max_shape(0); + int max_size = + std::accumulate(max_input_shape.begin(), max_input_shape.end(), 1, + std::multiplies()); + lightseq::cuda::CHECK_GPU_ERROR( + cudaMalloc(&d_input_, sizeof(int) * max_size)); + + for (int i = 0; i < model_->get_output_size(); i++) { + void *d_output; + std::vector shape = model_->get_output_max_shape(i); + int output_size = std::accumulate(shape.begin(), shape.end(), 1, + std::multiplies()); + lightseq::cuda::CHECK_GPU_ERROR( + cudaMalloc(&d_output, output_size * sizeof(int))); + model_->set_output_ptr(i, d_output); + d_outputs_.push_back(d_output); + } + } + ~PyQuantBert() { + delete model_; + lightseq::cuda::CHECK_GPU_ERROR(cudaFree(d_input_)); + for (auto d_output : d_outputs_) { + 
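+      // free the per-output device buffers allocated in the constructor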
lightseq::cuda::CHECK_GPU_ERROR(cudaFree(d_output)); + } + } + + py::array_t infer( + py::array_t input_seq) { + auto input_seq_out = input_seq.mutable_unchecked<2>(); + const int *input_seq_data = input_seq_out.data(0, 0); + int batch_size = input_seq_out.shape(0); + int batch_seq_len = input_seq_out.shape(1); + + lightseq::cuda::CHECK_GPU_ERROR( + cudaMemcpy(d_input_, input_seq_data, sizeof(int) * input_seq_out.size(), + cudaMemcpyHostToDevice)); + + model_->set_input_ptr(0, d_input_); + model_->set_input_shape(0, {batch_size, batch_seq_len}); + + model_->Infer(); + + std::vector output_shape = model_->get_output_shape(0); + auto output = py::array_t(output_shape); + float *output_data = output.mutable_data(0, 0); + lightseq::cuda::DataType output_type = model_->get_output_dtype(0); + if (output_type == lightseq::cuda::kFloat32) { + const float *d_output = + static_cast(model_->get_output_ptr(0)); + + lightseq::cuda::CHECK_GPU_ERROR(cudaMemcpy(output_data, d_output, + sizeof(float) * output.size(), + cudaMemcpyDeviceToHost)); + } else if (output_type == lightseq::cuda::kFloat16) { + const half *d_output = + static_cast(model_->get_output_ptr(0)); + std::vector h_bert_out(output.size()); + lightseq::cuda::CHECK_GPU_ERROR(cudaMemcpy(h_bert_out.data(), d_output, + sizeof(half) * output.size(), + cudaMemcpyDeviceToHost)); + for (auto i = 0; i < h_bert_out.size(); i++) { + float f_data = __half2float(h_bert_out[i]); + output_data[i] = f_data; + } + } else { + throw std::runtime_error("Not supported output type"); + } + + return output; + } +}; + class PyGpt { private: lightseq::cuda::LSModel *model_; @@ -448,6 +530,12 @@ PYBIND11_MODULE(inference, m) { .def("infer", &PyBert::infer, py::return_value_policy::reference_internal, py::arg("input_seq")); + py::class_(m, "QuantBert") + .def(py::init(), py::arg("weight_path"), + py::arg("max_batch_size")) + .def("infer", &PyQuantBert::infer, + py::return_value_policy::reference_internal, py::arg("input_seq")); + py::class_(m, "Moe") .def(py::init(), py::arg("weight_path"), py::arg("max_batch_size")) From d228cf232eaee68779ecfd8c25847997068aca1c Mon Sep 17 00:00:00 2001 From: "weiyang.god" Date: Tue, 29 Mar 2022 02:12:58 +0800 Subject: [PATCH 18/49] support quant bert inference (not test) --- .../inference/model/quant_bert_encoder.cc.cu | 326 ++++++++++++------ lightseq/inference/model/quant_bert_encoder.h | 20 +- lightseq/inference/model/quant_decoder.cc.cu | 9 - lightseq/inference/model/quant_decoder.h | 1 - lightseq/inference/model/quant_encoder.cc.cu | 13 - lightseq/inference/model/quant_encoder.h | 1 - lightseq/inference/proto/quant_bert.proto | 27 +- lightseq/inference/proto/quant_bert_weight.cc | 145 ++++++-- lightseq/inference/proto/quant_bert_weight.h | 17 +- .../proto/quant_transformer_weight.cc | 35 +- lightseq/inference/pywrapper/quant_bert.cc | 8 +- lightseq/inference/pywrapper/quant_bert.h | 1 - lightseq/inference/tools/util.cc.cu | 11 + lightseq/inference/tools/util.h | 5 + 14 files changed, 422 insertions(+), 197 deletions(-) diff --git a/lightseq/inference/model/quant_bert_encoder.cc.cu b/lightseq/inference/model/quant_bert_encoder.cc.cu index 1fd4fb86..84f218de 100644 --- a/lightseq/inference/model/quant_bert_encoder.cc.cu +++ b/lightseq/inference/model/quant_bert_encoder.cc.cu @@ -1,6 +1,8 @@ #include "quant_bert_encoder.h" -#include "../kernels/embKernels.h" +#include "../kernels/embKernels_int8.h" #include "../kernels/transformerKernels.h" +#include "../kernels/transformerKernels_int8.h" +#include "cublas_helper.h" /** @file @@ 
-28,20 +30,14 @@ QuantBertEncoder::QuantBertEncoder( _p_d_enc_wei(tw.get_enc_wei()), _fone((_DataType)1.f), _fzero((_DataType)0.f), + _src_emb_clip_max(tw.get_src_emb_clip_max()), + _enc_clip_max(tw.get_enc_clip_max()), + _ione((int32_t)1), + _izero((int32_t)0), _atten_scaler((_DataType)sqrt(1.f / tw._dim_per_head)), _max_batch_dim(max_batch_size * tw._max_step * tw._hidden_size), - _max_thread_per_block(1024) {} - -/** -Compute GPU memory size needed by transformer encoder, - to see how these memory is used, checkout init_buffer() for detail -*/ -template -long QuantBertEncoder::compute_buffer_bytesize() { - long sz1 = _max_batch_dim * 6 + - _max_batch_size * _tw._head_num * _tw._max_step * _tw._max_step; - long sz2 = _max_batch_dim + _max_batch_size * _tw._max_step * _tw._inner_size; - return max(sz1, sz2) * sizeof(_DataType); + _max_thread_per_block(1024) { + CHECK_GPU_ERROR(cublasLtCreate(&_cublas_lt_handle)); } /** @@ -51,15 +47,129 @@ These buffer are used during custom cuda kernel function, find the corresponding function to see how these buffer are used */ template -void QuantBertEncoder::init_buffer(void *pbuf) { - _DataType *p_d_buf = reinterpret_cast<_DataType *>(pbuf); - _p_d_qkv_projected = p_d_buf; - _p_d_q = _p_d_qkv_projected + _max_batch_dim * 3; - _p_d_k = _p_d_q + _max_batch_dim; - _p_d_v = _p_d_k + _max_batch_dim; - _p_d_c = _p_d_v + _max_batch_dim; - _p_d_ffn_buf1 = p_d_buf; - _p_d_ffn_buf2 = _p_d_ffn_buf1 + _max_batch_dim; +void QuantBertEncoder::init_buffer() { + std::cout << "encoder buffer init start" << std::endl; + + _DataType *qkv_buf; + CHECK_GPU_ERROR(cudaMalloc(&qkv_buf, 3 * _max_batch_dim * sizeof(_DataType))); + _p_d_q = qkv_buf; + _p_d_k = qkv_buf + _max_batch_dim; + _p_d_v = qkv_buf + 2 * _max_batch_dim; + + CHECK_GPU_ERROR(cudaMalloc(&_p_d_c, _max_batch_size * _tw._head_num * + _tw._max_step * _tw._max_step * + sizeof(_DataType))); + + int max_batch_dim = _max_batch_size * _tw._max_step * + std::max(_tw._inner_size, _tw._hidden_size * 3); + CHECK_GPU_ERROR(cudaMalloc(&_int8_ffn_in_buf, max_batch_dim)); + CHECK_GPU_ERROR( + cudaMalloc(&_int32_ffn_out_buf, max_batch_dim * sizeof(int32_t))); + CHECK_GPU_ERROR( + cudaMalloc(&_int8_ffn_out_buf, max_batch_dim * sizeof(int8_t))); + + CHECK_GPU_ERROR( + cudaMalloc(&_int8_p_d_src_emb_wei, + _tw._src_vocab_size * _tw._hidden_size * sizeof(int8_t))); + quantize_weight(_p_d_src_emb_wei[0], _int8_p_d_src_emb_wei, _tw._hidden_size, + _tw._src_vocab_size, _quant_range / _src_emb_clip_max, + _stream, _cublas_lt_handle, kRowMajor); + + _p_device_emb.push_back(nullptr); + _p_device_emb.push_back( + to_gpu(_p_d_src_emb_wei[1], _tw._max_step * _tw._hidden_size, _stream)); + _p_device_emb.push_back( + to_gpu(_p_d_src_emb_wei[2], _tw._hidden_size, _stream)); + _p_device_emb.push_back( + to_gpu(_p_d_src_emb_wei[3], _tw._hidden_size, _stream)); + if (_tw._multilg_type != 0) { + _p_device_emb.push_back( + to_gpu(_p_d_src_emb_wei[4], _tw._hidden_size, _stream)); + } else { + _p_device_emb.push_back(nullptr); + } + + // prepare gpu memory for weight + _int8_p_d_enc_wei = std::vector(_tw._n_enc_layer * 4); + _scaled_ffn2_colsum = std::vector<_DataType *>(_tw._n_enc_layer); + for (_layer_id = 0; _layer_id < _tw._n_enc_layer; _layer_id++) { + _weight_offset = _layer_id * _tw._weight_per_enc_layer; + CHECK_GPU_ERROR(cudaMalloc(&_int8_p_d_enc_wei[_layer_id * 4], + _tw._hidden_size * 3 * _tw._hidden_size)); + CHECK_GPU_ERROR(cudaMalloc(&_int8_p_d_enc_wei[_layer_id * 4 + 1], + _tw._hidden_size * _tw._hidden_size)); + 
CHECK_GPU_ERROR(cudaMalloc(&_int8_p_d_enc_wei[_layer_id * 4 + 2], + _tw._hidden_size * _tw._inner_size)); + CHECK_GPU_ERROR(cudaMalloc(&_int8_p_d_enc_wei[_layer_id * 4 + 3], + _tw._inner_size * _tw._hidden_size)); + + _p_device_wei.push_back( + to_gpu(_p_d_enc_wei[_weight_offset], _tw._hidden_size, _stream)); + _p_device_wei.push_back( + to_gpu(_p_d_enc_wei[_weight_offset + 1], _tw._hidden_size, _stream)); + _p_device_wei.push_back(nullptr); + _p_device_wei.push_back(to_gpu(_p_d_enc_wei[_weight_offset + 3], + _tw._hidden_size * 3, _stream)); + _p_device_wei.push_back(nullptr); + _p_device_wei.push_back( + to_gpu(_p_d_enc_wei[_weight_offset + 5], _tw._hidden_size, _stream)); + _p_device_wei.push_back( + to_gpu(_p_d_enc_wei[_weight_offset + 6], _tw._hidden_size, _stream)); + _p_device_wei.push_back( + to_gpu(_p_d_enc_wei[_weight_offset + 7], _tw._hidden_size, _stream)); + _p_device_wei.push_back(nullptr); + _p_device_wei.push_back( + to_gpu(_p_d_enc_wei[_weight_offset + 9], _tw._inner_size, _stream)); + _p_device_wei.push_back(nullptr); + _p_device_wei.push_back( + to_gpu(_p_d_enc_wei[_weight_offset + 11], _tw._hidden_size, _stream)); + + quantize_weight(_p_d_enc_wei[_weight_offset + 2], + _int8_p_d_enc_wei[_layer_id * 4], _tw._hidden_size, + _tw._hidden_size * 3, + _quant_range / _enc_clip_max[_layer_id * 12], _stream, + _cublas_lt_handle); + + quantize_weight(_p_d_enc_wei[_weight_offset + 4], + _int8_p_d_enc_wei[_layer_id * 4 + 1], _tw._hidden_size, + _tw._hidden_size, + _quant_range / _enc_clip_max[_layer_id * 12 + 1], _stream, + _cublas_lt_handle); + + quantize_weight(_p_d_enc_wei[_weight_offset + 8], + _int8_p_d_enc_wei[_layer_id * 4 + 2], _tw._hidden_size, + _tw._inner_size, + _quant_range / _enc_clip_max[_layer_id * 12 + 2], _stream, + _cublas_lt_handle); + + quantize_weight(_p_d_enc_wei[_weight_offset + 10], + _int8_p_d_enc_wei[_layer_id * 4 + 3], _tw._inner_size, + _tw._hidden_size, + _quant_range / _enc_clip_max[_layer_id * 12 + 3], _stream, + _cublas_lt_handle); + + if (_tw._use_gelu) { + _scaled_ffn2_colsum[_layer_id] = nullptr; + } else { + CHECK_GPU_ERROR(cudaMalloc(&_scaled_ffn2_colsum[_layer_id], + _tw._hidden_size * sizeof(_DataType))); + float relu_scale = _enc_clip_max[_layer_id * 12 + 7] / 2; + _DataType *temp; + int weight_size = _tw._inner_size * _tw._hidden_size; + + CHECK_GPU_ERROR(cudaMalloc(&temp, weight_size * sizeof(_DataType))); + CHECK_GPU_ERROR(cudaMemcpyAsync(temp, _p_d_enc_wei[_weight_offset + 10], + weight_size * sizeof(_DataType), + cudaMemcpyHostToDevice, _stream)); + launch_scaled_colsum(temp, _scaled_ffn2_colsum[_layer_id], + _tw._inner_size, _tw._hidden_size, relu_scale, + _stream); + + CHECK_GPU_ERROR(cudaGetLastError()); + CHECK_GPU_ERROR(cudaFree(temp)); + } + } + std::cout << "encoder buffer init succeed" << std::endl; return; } @@ -115,11 +225,11 @@ void QuantBertEncoder::run_one_infer(int batch_size, #endif /* ---step2. 
encoder feedforward--- */ - launch_enc_emb<_DataType>(_p_d_src_emb_wei[0], _p_d_src_emb_wei[1], - _p_d_token_id, _p_d_output, _p_d_padding_mask, - _tw._padding_id, batch_size, batch_seq_len, - _tw._hidden_size, _stream, _p_d_src_emb_wei[4], - _p_d_lang_id, _tw._multilg_type); + launch_enc_emb_i8I<_DataType>( + _int8_p_d_src_emb_wei, _p_device_emb[1], _p_d_token_id, _p_d_output, + _p_d_padding_mask, _tw._padding_id, batch_size, batch_seq_len, + _tw._hidden_size, _stream, _p_device_emb[4], _p_d_lang_id, + _tw._multilg_type, _src_emb_clip_max / _quant_range); #ifdef DEBUG_RESULT for (int i = 0; i < _batch_size; i++) { // batch_id for (int j = 0; j < _batch_seq_len; j++) { // token_id @@ -135,10 +245,6 @@ void QuantBertEncoder::run_one_infer(int batch_size, self_attention(); ffn_add_norm(); } - // last layer norm - ker_norm_layer_launcher<_DataType>( - _batch_token_num, _tw._hidden_size, _stream, _p_d_output, - _p_d_src_emb_wei[2], _p_d_src_emb_wei[3], _max_thread_per_block); #ifdef DEBUG_RESULT for (int i = 0; i < _batch_size; i++) { // batch_id @@ -159,10 +265,15 @@ Encoder self attention template void QuantBertEncoder::self_attention() { /* ---step 0. layer_norm, add output_bias to "query"--- */ - ker_norm_layer_resual_launcher<_DataType>( - _batch_token_num, _tw._hidden_size, _stream, _p_d_output, _p_d_q, - _p_d_enc_wei[_weight_offset], _p_d_enc_wei[_weight_offset + 1], - _p_d_enc_wei[_weight_offset + 5], _max_thread_per_block, _tw._is_post_ln); + if (_layer_id == 0) { + ker_norm_layer_resual_i8O_launcher<_DataType>( + _batch_token_num, _tw._hidden_size, _stream, _p_d_output, + _int8_ffn_in_buf, _p_device_wei[_weight_offset], + _p_device_wei[_weight_offset + 1], _p_device_wei[_weight_offset + 5], + _max_thread_per_block, _quant_range / _enc_clip_max[_layer_id * 12 + 4], + _tw._is_post_ln, true); + } + CHECK_GPU_ERROR(cudaGetLastError()); #ifdef DEBUG_RESULT print_vec(_p_d_enc_wei[_weight_offset], "layer norm scale(head): ", 5); @@ -174,12 +285,13 @@ void QuantBertEncoder::self_attention() { /* ---step 1. qkv = ori_q * qkv_wei + bias, and reshape qkv for multi-head * gemm--- */ - CHECK_GPU_ERROR(cublasGemmEx( - _hd, CUBLAS_OP_N, CUBLAS_OP_N, _tw._hidden_size * 3, _batch_token_num, - _tw._hidden_size, &_fone, _p_d_enc_wei[_weight_offset + 2], _AType, - _tw._hidden_size * 3, _p_d_q, _BType, _tw._hidden_size, &_fzero, - _p_d_qkv_projected, _CType, _tw._hidden_size * 3, _computeType, - CUBLAS_GEMM_DEFAULT_TENSOR_OP)); + cublasLtMM_withAlgo_i8IO( + _int8_ffn_out_buf, 1, _batch_token_num, _tw._hidden_size * 3, + _tw._hidden_size, 0, 0, 0, + _enc_clip_max[_layer_id * 12] * _enc_clip_max[_layer_id * 12 + 4] / + (_enc_clip_max[_layer_id * 12 + 8] * _quant_range), + _int8_ffn_in_buf, _int8_p_d_enc_wei[_layer_id * 4], _cublas_lt_handle, + _stream, false); #ifdef DEBUG_RESULT print_vec(_p_d_qkv_projected, "self qkv(head): ", 5); @@ -188,10 +300,11 @@ void QuantBertEncoder::self_attention() { #endif // get q, k, v by split and reshape qkv - ker_arrange_encself_qkv_launcher<_DataType>( - _batch_token_num, _tw._hidden_size, _stream, _p_d_qkv_projected, - _p_d_enc_wei[_weight_offset + 3], _p_d_q, _max_batch_dim, _batch_seq_len, - _tw._dim_per_head, _tw._head_num, _max_thread_per_block); + ker_arrange_encself_qkv_i8I_launcher<_DataType>( + _batch_token_num, _tw._hidden_size, _stream, _int8_ffn_out_buf, + _p_device_wei[_weight_offset + 3], _p_d_q, _max_batch_dim, _batch_seq_len, + _tw._dim_per_head, _tw._head_num, _max_thread_per_block, + _enc_clip_max[_layer_id * 12 + 8] / _quant_range, true); /* ---step 2. 
correlation = q * k, perform softmax on correlation--- */ CHECK_GPU_ERROR(cublasGemmStridedBatchedEx( @@ -223,80 +336,95 @@ void QuantBertEncoder::self_attention() { CUBLAS_GEMM_DEFAULT_TENSOR_OP)); // use v to save reshaped q, since they are in same size and v // will not be use again before the next multi-head-attention - ker_arrange_atten_output_launcher<_DataType>( - _batch_token_num, _tw._hidden_size, _stream, _p_d_q, _p_d_v, - _batch_seq_len, _tw._dim_per_head, _tw._head_num, _max_thread_per_block); + ker_arrange_atten_output_i8O_launcher<_DataType>( + _batch_token_num, _tw._hidden_size, _stream, _p_d_q, _int8_ffn_in_buf, + _batch_seq_len, _tw._dim_per_head, _tw._head_num, _max_thread_per_block, + _quant_range / _enc_clip_max[_layer_id * 12 + 5], true); #ifdef DEBUG_RESULT print_vec(_p_d_v, "self attn before ffn(head): ", 5); #endif /* ---step 4. new_q = ori_q + new_q * output_wei--- */ - CHECK_GPU_ERROR(cublasGemmEx( - _hd, CUBLAS_OP_N, CUBLAS_OP_N, _tw._hidden_size, _batch_token_num, - _tw._hidden_size, &_fone, _p_d_enc_wei[_weight_offset + 4], _AType, - _tw._hidden_size, _p_d_v, _BType, _tw._hidden_size, &_fone, _p_d_output, - _CType, _tw._hidden_size, _computeType, CUBLAS_GEMM_DEFAULT_TENSOR_OP)); + cublasLtMM_withAlgo_i8IO( + _int8_ffn_out_buf, 1, _batch_token_num, _tw._hidden_size, + _tw._hidden_size, 0, 0, 0, + _enc_clip_max[_layer_id * 12 + 1] * _enc_clip_max[_layer_id * 12 + 5] / + (_enc_clip_max[_layer_id * 12 + 9] * _quant_range), + _int8_ffn_in_buf, _int8_p_d_enc_wei[_layer_id * 4 + 1], _cublas_lt_handle, + _stream, false); -#ifdef DEBUG_RESULT - print_vec(_p_d_output, "self attn ffn out(head): ", 5); - print_vec(_p_d_output + _batch_token_num * _tw._hidden_size - 5, - "self attn ffn out(tail): ", 5); - - print_vec(_p_d_enc_wei[_weight_offset + 4], "enc wei:", 5); -#endif + ker_residual_bias_ln_i8I_i8O_launcher<_DataType>( + _int8_ffn_out_buf, _p_device_wei[_weight_offset + 6], + _p_device_wei[_weight_offset + 7], _p_device_wei[_weight_offset + 11], + _int8_ffn_in_buf, _p_d_output, _batch_token_num, _tw._hidden_size, + _enc_clip_max[_layer_id * 12 + 9] / _quant_range, + _quant_range / _enc_clip_max[_layer_id * 12 + 6], _max_thread_per_block, + _stream, _tw._is_post_ln, true); return; } template void QuantBertEncoder::ffn_add_norm() { - /* ---step 0. layer_norm, add output_bias to "query"--- */ - ker_norm_layer_resual_launcher<_DataType>( - _batch_token_num, _tw._hidden_size, _stream, _p_d_output, _p_d_ffn_buf1, - _p_d_enc_wei[_weight_offset + 6], _p_d_enc_wei[_weight_offset + 7], - _p_d_enc_wei[_weight_offset + 11], _max_thread_per_block, - _tw._is_post_ln); - -#ifdef DEBUG_RESULT - print_vec(_p_d_enc_wei[_weight_offset + 6], "layer norm scale(head): ", 5); - print_vec(_p_d_enc_wei[_weight_offset + 7], "layer norm bias(head): ", 5); - print_vec(_p_d_ffn_buf1, "layer norm(head): ", 5); - print_vec(_p_d_ffn_buf1 + _batch_token_num * _tw._hidden_size - 5, - "layer norm(tail): ", 5); -#endif - /* ---step 1. 
first ffn layer--- */ - CHECK_GPU_ERROR(cublasGemmEx( - _hd, CUBLAS_OP_N, CUBLAS_OP_N, _tw._inner_size, _batch_token_num, - _tw._hidden_size, &_fone, _p_d_enc_wei[_weight_offset + 8], _AType, - _tw._inner_size, _p_d_ffn_buf1, _BType, _tw._hidden_size, &_fzero, - _p_d_ffn_buf2, _CType, _tw._inner_size, _computeType, - CUBLAS_GEMM_DEFAULT_TENSOR_OP)); + cublasLtMM_withAlgo_i8IO( + _int8_ffn_out_buf, 1, _batch_token_num, _tw._inner_size, _tw._hidden_size, + 0, 0, 0, + _enc_clip_max[_layer_id * 12 + 2] * _enc_clip_max[_layer_id * 12 + 6] / + (_enc_clip_max[_layer_id * 12 + 10] * _quant_range), + _int8_ffn_in_buf, _int8_p_d_enc_wei[_layer_id * 4 + 2], _cublas_lt_handle, + _stream, false); if (_tw._use_gelu) { - ker_bias_gelu_launcher<_DataType>( - _batch_token_num, _max_thread_per_block, _stream, _p_d_ffn_buf2, - _p_d_enc_wei[_weight_offset + 9], _tw._inner_size); + ker_bias_gelu_i8I_i8O_launcher<_DataType>( + _batch_token_num, _stream, _int8_ffn_out_buf, _int8_ffn_in_buf, + _p_device_wei[_weight_offset + 9], _tw._inner_size, + _enc_clip_max[_layer_id * 12 + 10] / _quant_range, + _quant_range / _enc_clip_max[_layer_id * 12 + 7], true); } else { - ker_bias_relu_launcher<_DataType>( - _batch_token_num, _max_thread_per_block, _stream, _p_d_ffn_buf2, - _p_d_enc_wei[_weight_offset + 9], _tw._inner_size); + ker_bias_relu_i8I_i8O_launcher<_DataType>( + _batch_token_num, _stream, _int8_ffn_out_buf, _int8_ffn_in_buf, + _p_device_wei[_weight_offset + 9], _tw._inner_size, + _enc_clip_max[_layer_id * 12 + 10] / _quant_range, + _quant_range / _enc_clip_max[_layer_id * 12 + 7], + _enc_clip_max[_layer_id * 12 + 7], true, true, true); } -#ifdef DEBUG_RESULT - print_vec(_p_d_ffn_buf2, "ffn activation(head): ", 5); - print_vec(_p_d_ffn_buf2 + _batch_token_num * _tw._hidden_size - 5, - "ffn activation(tail): ", 5); -#endif - /* ---step 2. 
second ffn layer--- */ - CHECK_GPU_ERROR(cublasGemmEx( - _hd, CUBLAS_OP_N, CUBLAS_OP_N, _tw._hidden_size, _batch_token_num, - _tw._inner_size, &_fone, _p_d_enc_wei[_weight_offset + 10], _AType, - _tw._hidden_size, _p_d_ffn_buf2, _BType, _tw._inner_size, &_fone, - _p_d_output, _CType, _tw._hidden_size, _computeType, - CUBLAS_GEMM_DEFAULT_TENSOR_OP)); + cublasLtMM_withAlgo(_int32_ffn_out_buf, 1, _batch_token_num, _tw._hidden_size, + _tw._inner_size, 0, 0, 0, _int8_ffn_in_buf, + _int8_p_d_enc_wei[_layer_id * 4 + 3], _cublas_lt_handle, + _stream, false); + + const _DataType *scale_ptr, *bias_ptr, *res_bias_ptr; + float clip_max; + if (_layer_id == _tw._n_enc_layer - 1) { + scale_ptr = _p_device_emb[2]; + bias_ptr = _p_device_emb[3]; + + ker_residual_bias_ln_i32I_launcher<_DataType>( + _int32_ffn_out_buf, scale_ptr, bias_ptr, _p_d_output, _p_d_output, + _batch_token_num, _tw._hidden_size, + _enc_clip_max[_layer_id * 12 + 3] * _enc_clip_max[_layer_id * 12 + 7] / + (2 * _quant_range * _quant_range), + _max_thread_per_block, _stream, true, _scaled_ffn2_colsum[_layer_id]); + } else { + scale_ptr = _p_device_wei[(_layer_id + 1) * _tw._weight_per_enc_layer]; + bias_ptr = _p_device_wei[(_layer_id + 1) * _tw._weight_per_enc_layer + 1]; + res_bias_ptr = + _p_device_wei[(_layer_id + 1) * _tw._weight_per_enc_layer + 5]; + clip_max = _enc_clip_max[(_layer_id + 1) * 12 + 4]; + + ker_residual_bias_ln_i32I_i8O_launcher<_DataType>( + _int32_ffn_out_buf, scale_ptr, bias_ptr, res_bias_ptr, _int8_ffn_in_buf, + _p_d_output, _batch_token_num, _tw._hidden_size, + _enc_clip_max[_layer_id * 12 + 3] * _enc_clip_max[_layer_id * 12 + 7] / + (2 * _quant_range * _quant_range), + _quant_range / clip_max, _max_thread_per_block, _stream, + _tw._is_post_ln, true, true, _scaled_ffn2_colsum[_layer_id]); + } + return; } diff --git a/lightseq/inference/model/quant_bert_encoder.h b/lightseq/inference/model/quant_bert_encoder.h index db68d430..54a6382a 100644 --- a/lightseq/inference/model/quant_bert_encoder.h +++ b/lightseq/inference/model/quant_bert_encoder.h @@ -5,6 +5,7 @@ #include #include #include +#include #include #include @@ -45,8 +46,11 @@ class QuantBertEncoder { const QuantBertWeight &_tw; cudaStream_t _stream; cublasHandle_t _hd; + cublasLtHandle_t _cublas_lt_handle; const _DataType _fone; const _DataType _fzero; + const int32_t _ione; + const int32_t _izero; const _DataType _atten_scaler; const int _max_batch_dim; const int _max_thread_per_block; @@ -59,6 +63,10 @@ class QuantBertEncoder { _DataType *_p_d_ffn_buf1; _DataType *_p_d_ffn_buf2; + int8_t *_int8_ffn_in_buf; + int32_t *_int32_ffn_out_buf; + int8_t *_int8_ffn_out_buf; + // {token_emb, pos_emb, norm_scale, norm_bias} const std::vector &_p_d_src_emb_wei; // {multihead_norm_scale, multihead_norm_bias, multihead_qkv_kernel, @@ -67,6 +75,15 @@ class QuantBertEncoder { // ffn_first_kernel, ffn_first_bias, ffn_second_kernel, ffn_second_bias} * // encoder_layer_num const std::vector &_p_d_enc_wei; + std::vector _p_device_wei; + std::vector _p_device_emb; + + std::vector _int8_p_d_enc_wei; + int8_t *_int8_p_d_src_emb_wei; + const float _quant_range = 127; + const float _src_emb_clip_max; + const std::vector _enc_clip_max; // size: 12 * enc_layer_num + std::vector<_DataType *> _scaled_ffn2_colsum; int _batch_size; int _batch_seq_len; @@ -83,8 +100,7 @@ class QuantBertEncoder { int *p_d_padding_mask, _DataType *p_d_output, const QuantBertWeight &tw, cudaStream_t stream, cublasHandle_t hd, const int *p_d_lang_id = nullptr); - long compute_buffer_bytesize(); - void 
init_buffer(void *pbuf); + void init_buffer(); std::string check(); void run_one_infer(int batch_size, int batch_seq_len); }; diff --git a/lightseq/inference/model/quant_decoder.cc.cu b/lightseq/inference/model/quant_decoder.cc.cu index 1672d34f..6abbe6f8 100644 --- a/lightseq/inference/model/quant_decoder.cc.cu +++ b/lightseq/inference/model/quant_decoder.cc.cu @@ -70,15 +70,6 @@ QuantDecoder::QuantDecoder(int max_batch_size, return; } -/** -Compute GPU memory size needed by transformer decoder, - to see how these memory is used, checkout init_buffer() for detail -*/ -template -long QuantDecoder::compute_buffer_bytesize() { - return 0; -} - /** Init the GPU memory pointer which point to the memory buffer needed by decoder. diff --git a/lightseq/inference/model/quant_decoder.h b/lightseq/inference/model/quant_decoder.h index 63682766..fb524b0b 100644 --- a/lightseq/inference/model/quant_decoder.h +++ b/lightseq/inference/model/quant_decoder.h @@ -159,7 +159,6 @@ class QuantDecoder { QuantTransformerWeight& tw, cudaStream_t stream, cublasHandle_t hd, bool output_topk = false, const int* p_d_lang_id = nullptr); - long compute_buffer_bytesize(); void init_buffer(); std::string check(); void run_one_infer(int batch_size, int batch_seq_len); diff --git a/lightseq/inference/model/quant_encoder.cc.cu b/lightseq/inference/model/quant_encoder.cc.cu index 3f9d2b9d..7e14d680 100644 --- a/lightseq/inference/model/quant_encoder.cc.cu +++ b/lightseq/inference/model/quant_encoder.cc.cu @@ -45,19 +45,6 @@ QuantEncoder::QuantEncoder(int max_batch_size, int *p_d_token_id, CHECK_GPU_ERROR(cublasLtCreate(&_cublas_lt_handle)); } -/** -Compute GPU memory size needed by transformer encoder, - to see how these memory is used, checkout init_buffer() for detail -*/ -template -long QuantEncoder::compute_buffer_bytesize() { - // long sz1 = _max_batch_dim * 6 + - // _max_batch_size * _tw._head_num * _tw._max_step * _tw._max_step; - // long sz2 = _max_batch_dim + _max_batch_size * _tw._max_step * - // _tw._inner_size; return max(sz1, sz2) * sizeof(_DataType); - return 0; -} - /** Init the GPU memory pointer which point to the memory buffer needed by encoder. 
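
[editor's note] compute_buffer_bytesize() disappears from QuantBertEncoder,
QuantDecoder and QuantEncoder in this patch because their init_buffer()
implementations now cudaMalloc their own workspaces (see the new
QuantBertEncoder::init_buffer() above) instead of carving sub-buffers out of
one externally sized allocation, so callers presumably no longer query a
buffer size (pywrapper/quant_bert.cc is updated in the same patch).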
diff --git a/lightseq/inference/model/quant_encoder.h b/lightseq/inference/model/quant_encoder.h index d14f3fd0..953ea3b6 100644 --- a/lightseq/inference/model/quant_encoder.h +++ b/lightseq/inference/model/quant_encoder.h @@ -99,7 +99,6 @@ class QuantEncoder { _DataType *p_d_output, const QuantTransformerWeight &tw, cudaStream_t stream, cublasHandle_t hd, const int *p_d_lang_id = nullptr); - long compute_buffer_bytesize(); void init_buffer(); std::string check(); void run_one_infer(int batch_size, int batch_seq_len); diff --git a/lightseq/inference/proto/quant_bert.proto b/lightseq/inference/proto/quant_bert.proto index 51b7f727..661de482 100644 --- a/lightseq/inference/proto/quant_bert.proto +++ b/lightseq/inference/proto/quant_bert.proto @@ -21,9 +21,9 @@ message QuantBertEncoderLayer { // perform numpy.dot(input, multihead_project_kernel_qkv) will get the [query, // key, value] of // "Scaled Dot-Product Attention" - repeated float multihead_project_kernel_qkv = 3; // [hidden_size, 3, hidden_size] + bytes multihead_project_kernel_qkv = 3; // [hidden_size, 3, hidden_size] repeated float multihead_project_bias_qkv = 4; // [3, hidden_size] - repeated float multihead_project_kernel_output = 5; // [hidden_size, hidden_size] + bytes multihead_project_kernel_output = 5; // [hidden_size, hidden_size] repeated float multihead_project_bias_output = 6; // [hidden_size] // layer norm before "Feed-Forward Networks" @@ -31,21 +31,38 @@ message QuantBertEncoderLayer { repeated float ffn_norm_bias = 8; // [hidden_size] // "Feed-Forward Networks" - repeated float ffn_first_kernel = 9; // [hidden_size, inner_size] + bytes ffn_first_kernel = 9; // [hidden_size, inner_size] repeated float ffn_first_bias = 10; // [inner_size] - repeated float ffn_second_kernel = 11; // [inner_size, hidden_size] + bytes ffn_second_kernel = 11; // [inner_size, hidden_size] repeated float ffn_second_bias = 12; // [hidden_size] + + // clip max + float multihead_project_kernel_qkv_clip_max = 13; + float multihead_project_kernel_output_clip_max = 14; + float ffn_first_kernel_clip_max = 15; + float ffn_second_kernel_clip_max = 16; + float multihead_ln_clip_max = 17; + float multihead_project_output_clip_max = 18; + float ffn_ln_clip_max = 19; + float ffn_first_act_clip_max = 20; + float multihead_qkv_dense_clip_max = 21; + float multihead_output_dense_clip_max = 22; + float ffn_first_output_clip_max = 23; + float ffn_second_output_clip_max = 24; } message QuantBertEmbeddingLayer { // token embedding table // look it up directly will get the input token embedding - repeated float token_embedding = 1; // [vocab_size, hidden_size] + bytes token_embedding = 1; // [vocab_size, hidden_size] repeated float position_embedding = 2; // [max_seq_len, hidden_size] // the last layer_norm of encoder, // only for pre layer norm, repeated float norm_scale = 3; // [hidden_size] repeated float norm_bias = 4; // [hidden_size] + + // clip max + float emb_clip_max = 5; } message QuantBertModelConf { diff --git a/lightseq/inference/proto/quant_bert_weight.cc b/lightseq/inference/proto/quant_bert_weight.cc index 962f8ab3..6ad5941d 100644 --- a/lightseq/inference/proto/quant_bert_weight.cc +++ b/lightseq/inference/proto/quant_bert_weight.cc @@ -36,9 +36,11 @@ Read model config stored in custom proto file. 
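
// [editor's note] The hunks below switch the big quantized tensors in
// quant_bert.proto from `repeated float` to `bytes` and dequantize them while
// loading, using the per-tensor clip_max fields added above. The
// dequantize()/dequantize_array() helpers presumably come from tools/util,
// which this patch also touches; a plausible sketch of the mapping (the exact
// formula is an assumption, not confirmed by the patch):
//   inline float dequantize(unsigned char q, float quant_range, float clip_max) {
//     // bytes store the weight shifted by quant_range (127); undo the shift, then rescale
//     return (static_cast<float>(q) - quant_range) * clip_max / quant_range;
//   }
// Each encoder layer also records 12 clip_max values in a fixed order (the
// four kernel clip_maxes first, then the layer-norm/activation/output
// clip_maxes), which the encoder later indexes as
// _enc_clip_max[layer_id * 12 + k].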
template void QuantBertWeight::proto_get_model_config(const QuantBert &bert) { _hidden_size = bert.src_embedding().norm_scale_size(); - _inner_size = bert.encoder_stack()[0].ffn_first_kernel_size() / _hidden_size; + _inner_size = + bert.encoder_stack()[0].ffn_first_kernel().size() / _hidden_size; _max_step = bert.src_embedding().position_embedding_size() / _hidden_size; - _src_vocab_size = bert.src_embedding().token_embedding_size() / _hidden_size; + _src_vocab_size = + bert.src_embedding().token_embedding().size() / _hidden_size; _n_enc_layer = bert.encoder_stack_size(); _head_num = bert.model_conf().head_num(); _dim_per_head = _hidden_size / _head_num; @@ -60,10 +62,12 @@ std::string QuantBertWeight::proto_parse_emb_wei( int idx = 0; offset.push_back(idx); - if (layer.token_embedding_size() != _src_vocab_size * _hidden_size) + if (layer.token_embedding().size() != _src_vocab_size * _hidden_size) return "wrong token_embedding_size !"; - for (float ele : layer.token_embedding()) value.push_back(ele); + for (unsigned char ele : layer.token_embedding()) + value.push_back(dequantize(ele, _quant_range, layer.emb_clip_max())); idx += _src_vocab_size * _hidden_size; + _src_emb_clip_max = layer.emb_clip_max(); offset.push_back(idx); if (layer.position_embedding_size() != _max_step * _hidden_size) @@ -84,9 +88,7 @@ std::string QuantBertWeight::proto_parse_emb_wei( std::vector<_DataType> raw_value; for (float e : value) raw_value.push_back(float2required(e)); _d_src_emb_wei = raw_value; - for (int e : offset) - _p_d_src_emb_wei.push_back(thrust::raw_pointer_cast(_d_src_emb_wei.data()) + - e); + for (int e : offset) _p_d_src_emb_wei.push_back(_d_src_emb_wei.data() + e); std::cout << "finish initializing emb_wei from host to device" << std::endl; return ""; @@ -116,11 +118,13 @@ std::string QuantBertWeight::proto_parse_enc_wei( idx += _hidden_size; offset.push_back(idx); - if (enc_layer.multihead_project_kernel_qkv_size() != + if (enc_layer.multihead_project_kernel_qkv().size() != _hidden_size * _hidden_size * 3) return "wrong multihead_project_kernel_qkv_size !"; - for (float ele : enc_layer.multihead_project_kernel_qkv()) - value.push_back(ele); + for (unsigned char ele : enc_layer.multihead_project_kernel_qkv()) + value.push_back( + dequantize(ele, _quant_range, + enc_layer.multihead_project_kernel_qkv_clip_max())); idx += _hidden_size * _hidden_size * 3; offset.push_back(idx); @@ -131,11 +135,13 @@ std::string QuantBertWeight::proto_parse_enc_wei( idx += _hidden_size * 3; offset.push_back(idx); - if (enc_layer.multihead_project_kernel_output_size() != + if (enc_layer.multihead_project_kernel_output().size() != _hidden_size * _hidden_size) return "wrong multihead_project_kernel_output_size !"; - for (float ele : enc_layer.multihead_project_kernel_output()) - value.push_back(ele); + for (unsigned char ele : enc_layer.multihead_project_kernel_output()) + value.push_back( + dequantize(ele, _quant_range, + enc_layer.multihead_project_kernel_output_clip_max())); idx += _hidden_size * _hidden_size; offset.push_back(idx); @@ -158,9 +164,11 @@ std::string QuantBertWeight::proto_parse_enc_wei( idx += _hidden_size; offset.push_back(idx); - if (enc_layer.ffn_first_kernel_size() != _hidden_size * _inner_size) + if (enc_layer.ffn_first_kernel().size() != _hidden_size * _inner_size) return "wrong ffn_first_kernel_size !"; - for (float ele : enc_layer.ffn_first_kernel()) value.push_back(ele); + for (float ele : enc_layer.ffn_first_kernel()) + value.push_back( + dequantize(ele, _quant_range, 
enc_layer.ffn_first_kernel_clip_max())); idx += _hidden_size * _inner_size; offset.push_back(idx); @@ -170,9 +178,11 @@ std::string QuantBertWeight::proto_parse_enc_wei( idx += _inner_size; offset.push_back(idx); - if (enc_layer.ffn_second_kernel_size() != _hidden_size * _inner_size) + if (enc_layer.ffn_second_kernel().size() != _hidden_size * _inner_size) return "wrong ffn_second_kernel_size !"; - for (float ele : enc_layer.ffn_second_kernel()) value.push_back(ele); + for (unsigned char ele : enc_layer.ffn_second_kernel()) + value.push_back(dequantize(ele, _quant_range, + enc_layer.ffn_second_kernel_clip_max())); idx += _hidden_size * _inner_size; offset.push_back(idx); @@ -181,14 +191,27 @@ std::string QuantBertWeight::proto_parse_enc_wei( for (float ele : enc_layer.ffn_second_bias()) value.push_back(ele); idx += _hidden_size; + _enc_clip_max.push_back(enc_layer.multihead_project_kernel_qkv_clip_max()); + _enc_clip_max.push_back( + enc_layer.multihead_project_kernel_output_clip_max()); + _enc_clip_max.push_back(enc_layer.ffn_first_kernel_clip_max()); + _enc_clip_max.push_back(enc_layer.ffn_second_kernel_clip_max()); + _enc_clip_max.push_back(enc_layer.multihead_ln_clip_max()); + _enc_clip_max.push_back(enc_layer.multihead_project_output_clip_max()); + _enc_clip_max.push_back(enc_layer.ffn_ln_clip_max()); + _enc_clip_max.push_back(enc_layer.ffn_first_act_clip_max()); + _enc_clip_max.push_back(enc_layer.multihead_qkv_dense_clip_max()); + _enc_clip_max.push_back(enc_layer.multihead_output_dense_clip_max()); + _enc_clip_max.push_back(enc_layer.ffn_first_output_clip_max()); + _enc_clip_max.push_back(enc_layer.ffn_second_output_clip_max()); + } // for std::vector<_DataType> raw_value; for (float e : value) raw_value.push_back(float2required(e)); _d_enc_wei = raw_value; - for (int e : offset) - _p_d_enc_wei.push_back(thrust::raw_pointer_cast(_d_enc_wei.data()) + e); + for (int e : offset) _p_d_enc_wei.push_back(_d_enc_wei.data() + e); std::cout << "finish initializing enc_wei from host to device" << std::endl; return ""; } @@ -251,16 +274,23 @@ void QuantBertWeight::hdf5_parse_emb_wei(hid_t hdf5_file) { std::vector offset; std::vector value(value_size); // preallocate vector for performance + std::vector value_i8(value_size); std::cout << "loading " << value_size * sizeof(OpType_) / (1024 * 1024) << " MB of embedding weight." 
<< std::endl; int idx = 0; + float clip_max; offset.push_back(idx); read_hdf5_dataset_data( - hdf5_file, dataset_prefix + "/token_embedding", H5T_NATIVE_FLOAT, - value.data() + idx, + hdf5_file, dataset_prefix + "/token_embedding", H5T_NATIVE_UCHAR, + value_i8.data() + idx, [=](int size) { return size != _src_vocab_size * _hidden_size; }, "Wrong token_embedding_size !"); + read_hdf5_dataset_scalar(hdf5_file, dataset_prefix + "/emb_clip_max", + H5T_NATIVE_FLOAT, &clip_max); + dequantize_array(value_i8, value, clip_max, _quant_range, idx, + _src_vocab_size * _hidden_size); + _src_emb_clip_max = clip_max; idx += _src_vocab_size * _hidden_size; offset.push_back(idx); @@ -289,9 +319,7 @@ void QuantBertWeight::hdf5_parse_emb_wei(hid_t hdf5_file) { raw_value.reserve(value.size()); for (float e : value) raw_value.push_back(float2required(e)); _d_src_emb_wei = raw_value; - for (int e : offset) - _p_d_src_emb_wei.push_back(thrust::raw_pointer_cast(_d_src_emb_wei.data()) + - e); + for (int e : offset) _p_d_src_emb_wei.push_back(_d_src_emb_wei.data() + e); std::cout << "Finish loading src_emb_wei from host to device" << std::endl; } @@ -309,9 +337,11 @@ void QuantBertWeight::hdf5_parse_enc_wei(hid_t hdf5_file) { _n_enc_layer; std::vector offset; std::vector value(value_size); + std::vector value_i8(value_size); std::cout << "loading " << value_size * sizeof(OpType_) / (1024 * 1024) << " MB of encoder weight." << std::endl; + float clip_max; int idx = 0; for (int layer_id = 0; layer_id < _n_enc_layer; ++layer_id) { std::string dataset_prefix = "encoder_stack/" + std::to_string(layer_id); @@ -333,7 +363,7 @@ void QuantBertWeight::hdf5_parse_enc_wei(hid_t hdf5_file) { offset.push_back(idx); read_hdf5_dataset_data( hdf5_file, dataset_prefix + "/multihead_project_kernel_qkv", - H5T_NATIVE_FLOAT, value.data() + idx, + H5T_NATIVE_UCHAR, value_i8.data() + idx, [=](int size) { return size != _hidden_size * _hidden_size * 3; }, "Wrong multihead_project_kernel_qkv_size !"); idx += _hidden_size * _hidden_size * 3; @@ -344,14 +374,26 @@ void QuantBertWeight::hdf5_parse_enc_wei(hid_t hdf5_file) { H5T_NATIVE_FLOAT, value.data() + idx, [=](int size) { return size != _hidden_size * 3; }, "Wrong multihead_project_bias_qkv_size !"); + read_hdf5_dataset_scalar( + hdf5_file, dataset_prefix + "/multihead_project_kernel_qkv_clip_max", + H5T_NATIVE_FLOAT, &clip_max); + dequantize_array(value_i8, value, clip_max, _quant_range, idx, + _hidden_size * _hidden_size * 3); + _enc_clip_max.push_back(clip_max); idx += _hidden_size * 3; offset.push_back(idx); read_hdf5_dataset_data( hdf5_file, dataset_prefix + "/multihead_project_kernel_output", - H5T_NATIVE_FLOAT, value.data() + idx, + H5T_NATIVE_UCHAR, value_i8.data() + idx, [=](int size) { return size != _hidden_size * _hidden_size; }, "Wrong multihead_project_kernel_output_size !"); + read_hdf5_dataset_scalar( + hdf5_file, dataset_prefix + "/multihead_project_kernel_output_clip_max", + H5T_NATIVE_FLOAT, &clip_max); + dequantize_array(value_i8, value, clip_max, _quant_range, idx, + _hidden_size * _hidden_size); + _enc_clip_max.push_back(clip_max); idx += _hidden_size * _hidden_size; offset.push_back(idx); @@ -378,10 +420,16 @@ void QuantBertWeight::hdf5_parse_enc_wei(hid_t hdf5_file) { offset.push_back(idx); read_hdf5_dataset_data( - hdf5_file, dataset_prefix + "/ffn_first_kernel", H5T_NATIVE_FLOAT, - value.data() + idx, + hdf5_file, dataset_prefix + "/ffn_first_kernel", H5T_NATIVE_UCHAR, + value_i8.data() + idx, [=](int size) { return size != _hidden_size * _inner_size; }, "Wrong 
ffn_first_kernel_size !"); + read_hdf5_dataset_scalar(hdf5_file, + dataset_prefix + "/ffn_first_kernel_clip_max", + H5T_NATIVE_FLOAT, &clip_max); + dequantize_array(value_i8, value, clip_max, _quant_range, idx, + _hidden_size * _inner_size); + _enc_clip_max.push_back(clip_max); idx += _hidden_size * _inner_size; offset.push_back(idx); @@ -393,10 +441,16 @@ void QuantBertWeight::hdf5_parse_enc_wei(hid_t hdf5_file) { offset.push_back(idx); read_hdf5_dataset_data( - hdf5_file, dataset_prefix + "/ffn_second_kernel", H5T_NATIVE_FLOAT, - value.data() + idx, + hdf5_file, dataset_prefix + "/ffn_second_kernel", H5T_NATIVE_UCHAR, + value_i8.data() + idx, [=](int size) { return size != _hidden_size * _inner_size; }, "Wrong ffn_second_kernel_size !"); + read_hdf5_dataset_scalar(hdf5_file, + dataset_prefix + "/ffn_second_kernel_clip_max", + H5T_NATIVE_FLOAT, &clip_max); + dequantize_array(value_i8, value, clip_max, _quant_range, idx, + _hidden_size * _inner_size); + _enc_clip_max.push_back(clip_max); idx += _hidden_size * _inner_size; offset.push_back(idx); @@ -406,6 +460,34 @@ void QuantBertWeight::hdf5_parse_enc_wei(hid_t hdf5_file) { "Wrong ffn_second_bias_size !"); idx += _hidden_size; + read_hdf5_dataset_scalar(hdf5_file, + dataset_prefix + "/multihead_ln_clip_max", + H5T_NATIVE_FLOAT, &clip_max); + _enc_clip_max.push_back(clip_max); + read_hdf5_dataset_scalar( + hdf5_file, dataset_prefix + "/multihead_project_output_clip_max", + H5T_NATIVE_FLOAT, &clip_max); + _enc_clip_max.push_back(clip_max); + read_hdf5_dataset_scalar(hdf5_file, dataset_prefix + "/ffn_ln_clip_max", + H5T_NATIVE_FLOAT, &clip_max); + _enc_clip_max.push_back(clip_max); + read_hdf5_dataset_scalar(hdf5_file, + dataset_prefix + "/ffn_first_act_clip_max", + H5T_NATIVE_FLOAT, &clip_max); + _enc_clip_max.push_back(clip_max); + read_hdf5_dataset_scalar(hdf5_file, + dataset_prefix + "/multihead_qkv_dense_clip_max", + H5T_NATIVE_FLOAT, &clip_max); + _enc_clip_max.push_back(clip_max); + read_hdf5_dataset_scalar( + hdf5_file, dataset_prefix + "/multihead_output_dense_clip_max", + H5T_NATIVE_FLOAT, &clip_max); + _enc_clip_max.push_back(clip_max); + read_hdf5_dataset_scalar(hdf5_file, + dataset_prefix + "/ffn_first_output_clip_max", + H5T_NATIVE_FLOAT, &clip_max); + _enc_clip_max.push_back(clip_max); + _enc_clip_max.push_back(0.0); } // for std::vector<_DataType> raw_value; @@ -413,8 +495,7 @@ void QuantBertWeight::hdf5_parse_enc_wei(hid_t hdf5_file) { for (float e : value) raw_value.push_back(float2required(e)); _d_enc_wei = raw_value; - for (int e : offset) - _p_d_enc_wei.push_back(thrust::raw_pointer_cast(_d_enc_wei.data()) + e); + for (int e : offset) _p_d_enc_wei.push_back(_d_enc_wei.data() + e); std::cout << "Finish loading enc_wei from host to device" << std::endl; } diff --git a/lightseq/inference/proto/quant_bert_weight.h b/lightseq/inference/proto/quant_bert_weight.h index 66bda4e5..c511f133 100644 --- a/lightseq/inference/proto/quant_bert_weight.h +++ b/lightseq/inference/proto/quant_bert_weight.h @@ -5,7 +5,6 @@ #include #include #include -#include #include #include @@ -39,9 +38,13 @@ class QuantBertWeight { std::vector _p_d_src_emb_wei; // size: 4 std::vector _p_d_enc_wei; // size: 12 * enc_layer_num - // store the weights on gpu memory - thrust::device_vector<_DataType> _d_src_emb_wei; - thrust::device_vector<_DataType> _d_enc_wei; + // store the weights on cpu memory + std::vector<_DataType> _d_src_emb_wei; + std::vector<_DataType> _d_enc_wei; + + // store the clip_max of weights and activations + float _src_emb_clip_max; + 
std::vector _enc_clip_max; // size: 12 * enc_layer_num public: std::string initializing(std::string proto_path); @@ -60,6 +63,10 @@ class QuantBertWeight { return _p_d_enc_wei; } + float get_src_emb_clip_max() const { return _src_emb_clip_max; } + + std::vector get_enc_clip_max() const { return _enc_clip_max; } + int _hidden_size; int _inner_size; int _max_step; @@ -74,6 +81,8 @@ class QuantBertWeight { bool _use_gelu; int _multilg_type; + const float _quant_range = 127; + void print_model_config() { std::cout << "***model config***" << std::endl; std::cout << "encoder layers: " << _n_enc_layer << std::endl; diff --git a/lightseq/inference/proto/quant_transformer_weight.cc b/lightseq/inference/proto/quant_transformer_weight.cc index baa819c3..1c8c8eef 100644 --- a/lightseq/inference/proto/quant_transformer_weight.cc +++ b/lightseq/inference/proto/quant_transformer_weight.cc @@ -28,17 +28,6 @@ __half QuantTransformerWeight::float2required( return __float2half_rn(value); } -__inline__ float dequantize(unsigned char i, float scale, float clip_max) { - return (float(i) - scale) * clip_max / scale; -} - -void copy_i8_to_float(std::vector &i8, std::vector &f, - float clip_max, float quant_range, int start, int num) { - for (int i = start; i < start + num; ++i) { - f[i] = dequantize(i8[i], quant_range, clip_max); - } -} - /** Read model config stored in custom proto file. */ @@ -670,7 +659,7 @@ void QuantTransformerWeight::hdf5_parse_emb_wei(hid_t hdf5_file, "Wrong token_embedding_size !"); read_hdf5_dataset_scalar(hdf5_file, dataset_prefix + "/emb_clip_max", H5T_NATIVE_FLOAT, &clip_max); - copy_i8_to_float(value_i8, value, clip_max, _quant_range, idx, + dequantize_array(value_i8, value, clip_max, _quant_range, idx, vocab_size * _hidden_size); if (source == "src") _src_emb_clip_max = clip_max; @@ -730,7 +719,7 @@ void QuantTransformerWeight::hdf5_parse_emb_wei(hid_t hdf5_file, [=](int size) { return size != _n_dec_layer; }, "Wrong encode_output_project_kernel_kv_clip_max_size !"); for (int i = 0; i < _n_dec_layer; ++i) { - copy_i8_to_float(value_i8, value, + dequantize_array(value_i8, value, _encode_output_project_kernel_kv_clip_max[i], _quant_range, idx + _hidden_size * _hidden_size * 2 * i, _hidden_size * _hidden_size * 2); @@ -831,7 +820,7 @@ void QuantTransformerWeight::hdf5_parse_enc_wei(hid_t hdf5_file) { read_hdf5_dataset_scalar( hdf5_file, dataset_prefix + "/multihead_project_kernel_qkv_clip_max", H5T_NATIVE_FLOAT, &clip_max); - copy_i8_to_float(value_i8, value, clip_max, _quant_range, idx, + dequantize_array(value_i8, value, clip_max, _quant_range, idx, _hidden_size * _hidden_size * 3); _enc_clip_max.push_back(clip_max); idx += _hidden_size * _hidden_size * 3; @@ -853,7 +842,7 @@ void QuantTransformerWeight::hdf5_parse_enc_wei(hid_t hdf5_file) { read_hdf5_dataset_scalar( hdf5_file, dataset_prefix + "/multihead_project_kernel_output_clip_max", H5T_NATIVE_FLOAT, &clip_max); - copy_i8_to_float(value_i8, value, clip_max, _quant_range, idx, + dequantize_array(value_i8, value, clip_max, _quant_range, idx, _hidden_size * _hidden_size); _enc_clip_max.push_back(clip_max); idx += _hidden_size * _hidden_size; @@ -889,7 +878,7 @@ void QuantTransformerWeight::hdf5_parse_enc_wei(hid_t hdf5_file) { read_hdf5_dataset_scalar(hdf5_file, dataset_prefix + "/ffn_first_kernel_clip_max", H5T_NATIVE_FLOAT, &clip_max); - copy_i8_to_float(value_i8, value, clip_max, _quant_range, idx, + dequantize_array(value_i8, value, clip_max, _quant_range, idx, _hidden_size * _inner_size); _enc_clip_max.push_back(clip_max); 
idx += _hidden_size * _inner_size; @@ -910,7 +899,7 @@ void QuantTransformerWeight::hdf5_parse_enc_wei(hid_t hdf5_file) { read_hdf5_dataset_scalar(hdf5_file, dataset_prefix + "/ffn_second_kernel_clip_max", H5T_NATIVE_FLOAT, &clip_max); - copy_i8_to_float(value_i8, value, clip_max, _quant_range, idx, + dequantize_array(value_i8, value, clip_max, _quant_range, idx, _hidden_size * _inner_size); _enc_clip_max.push_back(clip_max); idx += _hidden_size * _inner_size; @@ -1008,7 +997,7 @@ void QuantTransformerWeight::hdf5_parse_dec_wei(hid_t hdf5_file) { read_hdf5_dataset_scalar( hdf5_file, dataset_prefix + "/self_project_kernel_qkv_clip_max", H5T_NATIVE_FLOAT, &clip_max); - copy_i8_to_float(value_i8, value, clip_max, _quant_range, idx, + dequantize_array(value_i8, value, clip_max, _quant_range, idx, _hidden_size * _hidden_size * 3); _dec_clip_max.push_back(clip_max); idx += _hidden_size * _hidden_size * 3; @@ -1029,7 +1018,7 @@ void QuantTransformerWeight::hdf5_parse_dec_wei(hid_t hdf5_file) { read_hdf5_dataset_scalar( hdf5_file, dataset_prefix + "/self_project_kernel_output_clip_max", H5T_NATIVE_FLOAT, &clip_max); - copy_i8_to_float(value_i8, value, clip_max, _quant_range, idx, + dequantize_array(value_i8, value, clip_max, _quant_range, idx, _hidden_size * _hidden_size); _dec_clip_max.push_back(clip_max); idx += _hidden_size * _hidden_size; @@ -1065,7 +1054,7 @@ void QuantTransformerWeight::hdf5_parse_dec_wei(hid_t hdf5_file) { read_hdf5_dataset_scalar( hdf5_file, dataset_prefix + "/encdec_project_kernel_q_clip_max", H5T_NATIVE_FLOAT, &clip_max); - copy_i8_to_float(value_i8, value, clip_max, _quant_range, idx, + dequantize_array(value_i8, value, clip_max, _quant_range, idx, _hidden_size * _hidden_size); _dec_clip_max.push_back(clip_max); idx += _hidden_size * _hidden_size; @@ -1086,7 +1075,7 @@ void QuantTransformerWeight::hdf5_parse_dec_wei(hid_t hdf5_file) { read_hdf5_dataset_scalar( hdf5_file, dataset_prefix + "/encdec_project_kernel_output_clip_max", H5T_NATIVE_FLOAT, &clip_max); - copy_i8_to_float(value_i8, value, clip_max, _quant_range, idx, + dequantize_array(value_i8, value, clip_max, _quant_range, idx, _hidden_size * _hidden_size); _dec_clip_max.push_back(clip_max); idx += _hidden_size * _hidden_size; @@ -1122,7 +1111,7 @@ void QuantTransformerWeight::hdf5_parse_dec_wei(hid_t hdf5_file) { read_hdf5_dataset_scalar(hdf5_file, dataset_prefix + "/ffn_first_kernel_clip_max", H5T_NATIVE_FLOAT, &clip_max); - copy_i8_to_float(value_i8, value, clip_max, _quant_range, idx, + dequantize_array(value_i8, value, clip_max, _quant_range, idx, _hidden_size * _inner_size); _dec_clip_max.push_back(clip_max); idx += _hidden_size * _inner_size; @@ -1143,7 +1132,7 @@ void QuantTransformerWeight::hdf5_parse_dec_wei(hid_t hdf5_file) { read_hdf5_dataset_scalar(hdf5_file, dataset_prefix + "/ffn_second_kernel_clip_max", H5T_NATIVE_FLOAT, &clip_max); - copy_i8_to_float(value_i8, value, clip_max, _quant_range, idx, + dequantize_array(value_i8, value, clip_max, _quant_range, idx, _hidden_size * _inner_size); _dec_clip_max.push_back(clip_max); idx += _hidden_size * _inner_size; diff --git a/lightseq/inference/pywrapper/quant_bert.cc b/lightseq/inference/pywrapper/quant_bert.cc index 74327f6c..a205ca96 100644 --- a/lightseq/inference/pywrapper/quant_bert.cc +++ b/lightseq/inference/pywrapper/quant_bert.cc @@ -46,12 +46,7 @@ QuantBert::QuantBert(const std::string weight_path, const int max_batch_size) throw std::runtime_error(res); } - long buf_bytesize = encoder_->compute_buffer_bytesize(); - std::cout << "Bert 
buf_bytesize: " << buf_bytesize << std::endl; - - // encoder and decoder use the same buffer to save gpu memory useage - CHECK_GPU_ERROR(cudaMalloc(&d_buf_, (size_t)buf_bytesize)); - encoder_->init_buffer(d_buf_); + encoder_->init_buffer(); CHECK_GPU_ERROR(cudaStreamSynchronize(stream_)); } @@ -59,7 +54,6 @@ QuantBert::~QuantBert() { CHECK_GPU_ERROR(cudaFree(d_input_)); CHECK_GPU_ERROR(cudaFree(d_padding_mask_)); CHECK_GPU_ERROR(cudaFree(d_encoder_output_)); - CHECK_GPU_ERROR(cudaFree(d_buf_)); CHECK_GPU_ERROR(cublasDestroy(hd_)); CHECK_GPU_ERROR(cudaStreamDestroy(stream_)); } diff --git a/lightseq/inference/pywrapper/quant_bert.h b/lightseq/inference/pywrapper/quant_bert.h index a73e5bd8..79f6478d 100644 --- a/lightseq/inference/pywrapper/quant_bert.h +++ b/lightseq/inference/pywrapper/quant_bert.h @@ -25,7 +25,6 @@ class QuantBert : public LSModel { int _max_batch_size; cudaStream_t stream_; cublasHandle_t hd_; - void *d_buf_; QuantBertWeight tw_; public: diff --git a/lightseq/inference/tools/util.cc.cu b/lightseq/inference/tools/util.cc.cu index b25b599e..31e247f5 100644 --- a/lightseq/inference/tools/util.cc.cu +++ b/lightseq/inference/tools/util.cc.cu @@ -343,5 +343,16 @@ int read_hdf5_dataset_scalar(hid_t hdf5_file, std::string dataset_name, [](int size) { return size != 1; }, "Expect scalar with shape of 1."); } +float dequantize(unsigned char i, float scale, float clip_max) { + return (float(i) - scale) * clip_max / scale; +} + +void dequantize_array(std::vector& i8, std::vector& f, + float clip_max, float quant_range, int start, int num) { + for (int i = start; i < start + num; ++i) { + f[i] = dequantize(i8[i], quant_range, clip_max); + } +} + } // namespace cuda } // namespace lightseq diff --git a/lightseq/inference/tools/util.h b/lightseq/inference/tools/util.h index 468092a9..310144f5 100644 --- a/lightseq/inference/tools/util.h +++ b/lightseq/inference/tools/util.h @@ -225,5 +225,10 @@ T* to_gpu(const T* host_pointer, int size, cudaStream_t stream) { return gpu_pointer; } +float dequantize(unsigned char i, float scale, float clip_max); + +void dequantize_array(std::vector& i8, std::vector& f, + float clip_max, float quant_range, int start, int num); + } // namespace cuda } // namespace lightseq From 3400a1d92aad025e0936a2526e5e1fdb5b04a569 Mon Sep 17 00:00:00 2001 From: "weiyang.god" Date: Tue, 29 Mar 2022 02:21:59 +0800 Subject: [PATCH 19/49] fix quant bert expoort name bug --- .../python/export/huggingface/ls_torch_hf_quant_bert_export.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/examples/inference/python/export/huggingface/ls_torch_hf_quant_bert_export.py b/examples/inference/python/export/huggingface/ls_torch_hf_quant_bert_export.py index 6b3419eb..18dc2f1a 100644 --- a/examples/inference/python/export/huggingface/ls_torch_hf_quant_bert_export.py +++ b/examples/inference/python/export/huggingface/ls_torch_hf_quant_bert_export.py @@ -144,7 +144,7 @@ def extract_bert_weights( "src_embedding/token_embedding", data=token_embedding, dtype="uint8" ) hdf5_file.create_dataset( - "src_embedding/src_emb_clip_max", + "src_embedding/emb_clip_max", data=state_dict["bert.embeddings.emb_quant.clip.clip_value_max"], ) From 968f9ac780423fa5ea5d99424431426fe542d557 Mon Sep 17 00:00:00 2001 From: "weiyang.god" Date: Wed, 30 Mar 2022 17:23:50 +0800 Subject: [PATCH 20/49] support quant bert inference --- .../huggingface/bert/task_ner/run_ner.py | 7 ++ .../inference/kernels/embKernels_int8.cc.cu | 47 ++++----- lightseq/inference/kernels/embKernels_int8.h | 4 +- 
.../inference/model/quant_bert_encoder.cc.cu | 99 ++++++++++++++----- lightseq/inference/model/quant_decoder.cc.cu | 17 +++- lightseq/inference/model/quant_encoder.cc.cu | 21 ++-- lightseq/inference/proto/quant_bert_weight.cc | 12 +-- 7 files changed, 138 insertions(+), 69 deletions(-) diff --git a/examples/training/huggingface/bert/task_ner/run_ner.py b/examples/training/huggingface/bert/task_ner/run_ner.py index e4729278..b077246f 100644 --- a/examples/training/huggingface/bert/task_ner/run_ner.py +++ b/examples/training/huggingface/bert/task_ner/run_ner.py @@ -28,6 +28,7 @@ import numpy as np from datasets import ClassLabel, load_dataset, load_metric +import torch import transformers from transformers import ( @@ -519,6 +520,12 @@ def compute_metrics(p): compute_metrics=compute_metrics, ) + if not training_args.do_train: + state_dict = torch.load( + training_args.resume_from_checkpoint, map_location="cpu" + ) + trainer._load_state_dict_in_model(state_dict) + # Training if training_args.do_train: checkpoint = None diff --git a/lightseq/inference/kernels/embKernels_int8.cc.cu b/lightseq/inference/kernels/embKernels_int8.cc.cu index bade6241..28251303 100644 --- a/lightseq/inference/kernels/embKernels_int8.cc.cu +++ b/lightseq/inference/kernels/embKernels_int8.cc.cu @@ -14,7 +14,8 @@ template __global__ void ker_enc_emb_i8I(const int8_t *token_emb, const T *pos_emb, const int *tokens, T *output, int *pad_mask, int pad_id, int batch_size, int seq_len, - int hidden_dim, float dequant_scale) { + int hidden_dim, float dequant_scale, + bool scaled) { int idx = blockIdx.x * blockDim.x + threadIdx.x; if (idx >= batch_size * seq_len * hidden_dim) { return; @@ -39,7 +40,8 @@ __global__ void ker_enc_emb_i8I(const int8_t *token_emb, const T *pos_emb, } char4 value_i4 = ((char4 *)token_emb)[token * hidden_dim + dim_idx]; float4 pemb = ((float4 *)pos_emb)[seq_idx * hidden_dim + dim_idx]; - float scale = dequant_scale * sqrtf(hidden_dim << 2); + float scale = dequant_scale; + if (scaled) scale *= sqrtf(hidden_dim << 2); value.x = float(value_i4.x) * scale + pemb.x; value.y = float(value_i4.y) * scale + pemb.y; value.z = float(value_i4.z) * scale + pemb.z; @@ -49,12 +51,10 @@ __global__ void ker_enc_emb_i8I(const int8_t *token_emb, const T *pos_emb, } template <> -__global__ void ker_enc_emb_i8I<__half>(const int8_t *token_emb, - const __half *pos_emb, - const int *tokens, __half *output, - int *pad_mask, int pad_id, - int batch_size, int seq_len, - int hidden_dim, float dequant_scale) { +__global__ void ker_enc_emb_i8I<__half>( + const int8_t *token_emb, const __half *pos_emb, const int *tokens, + __half *output, int *pad_mask, int pad_id, int batch_size, int seq_len, + int hidden_dim, float dequant_scale, bool scaled) { int idx = blockIdx.x * blockDim.x + threadIdx.x; if (idx >= batch_size * seq_len * hidden_dim) { return; @@ -82,7 +82,8 @@ __global__ void ker_enc_emb_i8I<__half>(const int8_t *token_emb, __half2 *value_h2 = (__half2 *)(&value); char2 *value_i2 = (char2 *)(&value_i8); __half2 *pemb_h2 = (__half2 *)(&pemb); - float scale = dequant_scale * sqrtf(hidden_dim << 3); + float scale = dequant_scale; + if (scaled) scale *= sqrtf(hidden_dim << 3); #pragma unroll for (int i = 0; i < 4; i++) { float2 value_f2; @@ -101,7 +102,7 @@ void launch_enc_emb_i8I(const int8_t *token_emb, const T *pos_emb, int batch_size, int seq_len, int hidden_dim, cudaStream_t stream, const T *lang_emb, const int *lang_id, int multilg_type, - float dequant_scale) { + float dequant_scale, bool scaled) { if (hidden_dim % 4 
!= 0) { throw std::runtime_error("violate hidden_dim % 4 = 0"); } @@ -111,7 +112,7 @@ void launch_enc_emb_i8I(const int8_t *token_emb, const T *pos_emb, if (multilg_type == 0) { ker_enc_emb_i8I<<>>( token_emb, pos_emb, tokens, output, pad_mask, pad_id, batch_size, - seq_len, hidden_dim, dequant_scale); + seq_len, hidden_dim, dequant_scale, scaled); } else { throw std::runtime_error("multilingle not supported"); } @@ -124,7 +125,7 @@ void launch_enc_emb_i8I<__half>(const int8_t *token_emb, const __half *pos_emb, int seq_len, int hidden_dim, cudaStream_t stream, const __half *lang_emb, const int *lang_id, int multilg_type, - float dequant_scale) { + float dequant_scale, bool scaled) { if (hidden_dim % 8 != 0) { throw std::runtime_error("violate hidden_dim % 8 = 0"); } @@ -135,7 +136,7 @@ void launch_enc_emb_i8I<__half>(const int8_t *token_emb, const __half *pos_emb, if (multilg_type == 0) { ker_enc_emb_i8I<__half><<>>( token_emb, pos_emb, tokens, output, pad_mask, pad_id, batch_size, - seq_len, hidden_dim, dequant_scale); + seq_len, hidden_dim, dequant_scale, scaled); } else { throw std::runtime_error("multilingle not supported"); } @@ -145,13 +146,13 @@ template void launch_enc_emb_i8I( const int8_t *token_emb, const float *pos_emb, const int *tokens, float *output, int *pad_mask, int pad_id, int batch_size, int seq_len, int hidden_dim, cudaStream_t stream, const float *lang_emb, - const int *lang_id, int multilg_type, float dequant_scale); + const int *lang_id, int multilg_type, float dequant_scale, bool scaled); template void launch_enc_emb_i8I<__half>( const int8_t *token_emb, const __half *pos_emb, const int *tokens, __half *output, int *pad_mask, int pad_id, int batch_size, int seq_len, int hidden_dim, cudaStream_t stream, const __half *lang_emb, - const int *lang_id, int multilg_type, float dequant_scale); + const int *lang_id, int multilg_type, float dequant_scale, bool scaled); template __global__ void ker_dec_emb_i8I(const int8_t *token_emb, const T *pos_emb, @@ -159,7 +160,7 @@ __global__ void ker_dec_emb_i8I(const int8_t *token_emb, const T *pos_emb, const int *lang_id, T *output, int batch_size, int beam_size, int hidden_dim, int vocab_size, int step, int max_step, int multilg_type, - float dequant_scale) { + float dequant_scale, bool scaled) { int idx = blockIdx.x * blockDim.x + threadIdx.x; if (idx >= batch_size * beam_size * hidden_dim) { return; @@ -170,8 +171,10 @@ __global__ void ker_dec_emb_i8I(const int8_t *token_emb, const T *pos_emb, int8_t emb; int token = tokens[flat_3dim(batch_idx, beam_idx, step, beam_size, max_step)]; emb = token_emb[flat_2dim(dim_idx, token, vocab_size)]; - float value = float(emb) * dequant_scale * sqrtf(hidden_dim) + - float(pos_emb[flat_2dim(step, dim_idx, hidden_dim)]); + float scale = dequant_scale; + if (scaled) scale *= sqrtf(hidden_dim); + float value = + float(emb) * scale + float(pos_emb[flat_2dim(step, dim_idx, hidden_dim)]); output[idx] = T(value); } @@ -181,7 +184,7 @@ void launch_dec_emb_i8I(const int8_t *token_emb, const T *pos_emb, int *tokens, int batch_size, int beam_size, int hidden_dim, int vocab_size, int step, int max_step, int multilg_type, cudaStream_t stream, - float dequant_scale) { + float dequant_scale, bool scaled) { if (step >= max_step) { throw std::runtime_error("violate step < max_step"); } @@ -193,19 +196,19 @@ void launch_dec_emb_i8I(const int8_t *token_emb, const T *pos_emb, int *tokens, ker_dec_emb_i8I<<>>( token_emb, pos_emb, tokens, lang_emb, lang_id, output, batch_size, beam_size, hidden_dim, vocab_size, 
step, max_step, multilg_type, - dequant_scale); + dequant_scale, scaled); } template void launch_dec_emb_i8I( const int8_t *token_emb, const float *pos_emb, int *tokens, const float *lang_emb, const int *lang_id, float *output, int batch_size, int beam_size, int hidden_dim, int vocab_size, int step, int max_step, - int multilg_type, cudaStream_t stream, float dequant_scale); + int multilg_type, cudaStream_t stream, float dequant_scale, bool scaled); template void launch_dec_emb_i8I<__half>( const int8_t *token_emb, const __half *pos_emb, int *tokens, const __half *lang_emb, const int *lang_id, __half *output, int batch_size, int beam_size, int hidden_dim, int vocab_size, int step, int max_step, - int multilg_type, cudaStream_t stream, float dequant_scale); + int multilg_type, cudaStream_t stream, float dequant_scale, bool scaled); } // namespace cuda } // namespace lightseq diff --git a/lightseq/inference/kernels/embKernels_int8.h b/lightseq/inference/kernels/embKernels_int8.h index 6ec8fde1..a914f9f1 100644 --- a/lightseq/inference/kernels/embKernels_int8.h +++ b/lightseq/inference/kernels/embKernels_int8.h @@ -11,7 +11,7 @@ void launch_enc_emb_i8I(const int8_t *token_emb, const T *pos_emb, int batch_size, int seq_len, int hidden_dim, cudaStream_t stream, const T *lang_emb, const int *lang_id, int multilg_type, - float dequant_scale); + float dequant_scale, bool scaled = true); template void launch_dec_emb_i8I(const int8_t *token_emb, const T *pos_emb, int *tokens, @@ -19,7 +19,7 @@ void launch_dec_emb_i8I(const int8_t *token_emb, const T *pos_emb, int *tokens, int batch_size, int beam_size, int hidden_dim, int vocab_size, int step, int max_step, int multilg_type, cudaStream_t stream, - float dequant_scale); + float dequant_scale, bool scaled = true); } // namespace cuda } // namespace lightseq diff --git a/lightseq/inference/model/quant_bert_encoder.cc.cu b/lightseq/inference/model/quant_bert_encoder.cc.cu index 84f218de..358eaaa3 100644 --- a/lightseq/inference/model/quant_bert_encoder.cc.cu +++ b/lightseq/inference/model/quant_bert_encoder.cc.cu @@ -229,7 +229,7 @@ void QuantBertEncoder::run_one_infer(int batch_size, _int8_p_d_src_emb_wei, _p_device_emb[1], _p_d_token_id, _p_d_output, _p_d_padding_mask, _tw._padding_id, batch_size, batch_seq_len, _tw._hidden_size, _stream, _p_device_emb[4], _p_d_lang_id, - _tw._multilg_type, _src_emb_clip_max / _quant_range); + _tw._multilg_type, _src_emb_clip_max / _quant_range, false); #ifdef DEBUG_RESULT for (int i = 0; i < _batch_size; i++) { // batch_id for (int j = 0; j < _batch_seq_len; j++) { // token_id @@ -276,11 +276,14 @@ void QuantBertEncoder::self_attention() { CHECK_GPU_ERROR(cudaGetLastError()); #ifdef DEBUG_RESULT - print_vec(_p_d_enc_wei[_weight_offset], "layer norm scale(head): ", 5); - print_vec(_p_d_enc_wei[_weight_offset + 1], "layer norm bias(head): ", 5); - print_vec(_p_d_q, "layer norm out(head): ", 5); - print_vec(_p_d_q + _batch_token_num * _tw._hidden_size - 5, - "layer norm out(tail): ", 5); + for (int i = 0; i < _batch_size; i++) { // batch_id + for (int j = 0; j < _batch_seq_len; j++) { // token_id + std::cout << "qkv_attn input: token-" << j << std::endl; + print_vec(_int8_ffn_in_buf + i * _batch_seq_len * _tw._hidden_size + + j * _tw._hidden_size, + "qkv_attn input", 10); + } + } #endif /* ---step 1. 
qkv = ori_q * qkv_wei + bias, and reshape qkv for multi-head @@ -293,12 +296,6 @@ void QuantBertEncoder::self_attention() { _int8_ffn_in_buf, _int8_p_d_enc_wei[_layer_id * 4], _cublas_lt_handle, _stream, false); -#ifdef DEBUG_RESULT - print_vec(_p_d_qkv_projected, "self qkv(head): ", 5); - print_vec(_p_d_qkv_projected + _batch_token_num * _tw._hidden_size * 3 - 5, - "self qkv(tail): ", 5); -#endif - // get q, k, v by split and reshape qkv ker_arrange_encself_qkv_i8I_launcher<_DataType>( _batch_token_num, _tw._hidden_size, _stream, _int8_ffn_out_buf, @@ -319,12 +316,6 @@ void QuantBertEncoder::self_attention() { _batch_size, _batch_seq_len, _tw._head_num, _stream, _p_d_c, _p_d_padding_mask); -#ifdef DEBUG_RESULT - print_vec(_p_d_c, "self attn correlation(head): ", 5); - print_vec(_p_d_c + _batch_token_num * _tw._head_num * _batch_seq_len - 5, - "self attn correlation(tail): ", 5); -#endif - /* ---step 3. new_q = correlation * v--- */ CHECK_GPU_ERROR(cublasGemmStridedBatchedEx( _hd, CUBLAS_OP_N, CUBLAS_OP_N, _tw._dim_per_head, _batch_seq_len, @@ -342,7 +333,14 @@ void QuantBertEncoder::self_attention() { _quant_range / _enc_clip_max[_layer_id * 12 + 5], true); #ifdef DEBUG_RESULT - print_vec(_p_d_v, "self attn before ffn(head): ", 5); + for (int i = 0; i < _batch_size; i++) { // batch_id + for (int j = 0; j < _batch_seq_len; j++) { // token_id + std::cout << "out_attn input: token-" << j << std::endl; + print_vec(_int8_ffn_in_buf + i * _batch_seq_len * _tw._hidden_size + + j * _tw._hidden_size, + "out_attn input", 10); + } + } #endif /* ---step 4. new_q = ori_q + new_q * output_wei--- */ @@ -354,6 +352,17 @@ void QuantBertEncoder::self_attention() { _int8_ffn_in_buf, _int8_p_d_enc_wei[_layer_id * 4 + 1], _cublas_lt_handle, _stream, false); +#ifdef DEBUG_RESULT + for (int i = 0; i < _batch_size; i++) { // batch_id + for (int j = 0; j < _batch_seq_len; j++) { // token_id + std::cout << "attn_ln input: token-" << j << std::endl; + print_vec(_int8_ffn_in_buf + i * _batch_seq_len * _tw._hidden_size + + j * _tw._hidden_size, + "attn_ln input", 10); + } + } +#endif + ker_residual_bias_ln_i8I_i8O_launcher<_DataType>( _int8_ffn_out_buf, _p_device_wei[_weight_offset + 6], _p_device_wei[_weight_offset + 7], _p_device_wei[_weight_offset + 11], @@ -367,6 +376,17 @@ void QuantBertEncoder::self_attention() { template void QuantBertEncoder::ffn_add_norm() { +#ifdef DEBUG_RESULT + for (int i = 0; i < _batch_size; i++) { // batch_id + for (int j = 0; j < _batch_seq_len; j++) { // token_id + std::cout << "ffn1 input: token-" << j << std::endl; + print_vec(_int8_ffn_out_buf + i * _batch_seq_len * _tw._hidden_size + + j * _tw._hidden_size, + "ffn1 input", 10); + } + } +#endif + /* ---step 1. first ffn layer--- */ cublasLtMM_withAlgo_i8IO( _int8_ffn_out_buf, 1, _batch_token_num, _tw._inner_size, _tw._hidden_size, @@ -391,6 +411,17 @@ void QuantBertEncoder::ffn_add_norm() { _enc_clip_max[_layer_id * 12 + 7], true, true, true); } +#ifdef DEBUG_RESULT + for (int i = 0; i < _batch_size; i++) { // batch_id + for (int j = 0; j < _batch_seq_len; j++) { // token_id + std::cout << "ffn2 input: token-" << j << std::endl; + print_vec(_int8_ffn_in_buf + i * _batch_seq_len * _tw._inner_size + + j * _tw._inner_size, + "ffn2 input", 10); + } + } +#endif + /* ---step 2. 
second ffn layer--- */ cublasLtMM_withAlgo(_int32_ffn_out_buf, 1, _batch_token_num, _tw._hidden_size, _tw._inner_size, 0, 0, 0, _int8_ffn_in_buf, @@ -398,16 +429,23 @@ void QuantBertEncoder::ffn_add_norm() { _stream, false); const _DataType *scale_ptr, *bias_ptr, *res_bias_ptr; - float clip_max; + float clip_max, dequant_scale; + if (_tw._use_gelu) { + dequant_scale = _enc_clip_max[_layer_id * 12 + 3] * + _enc_clip_max[_layer_id * 12 + 7] / + (_quant_range * _quant_range); + } else { + dequant_scale = _enc_clip_max[_layer_id * 12 + 3] * + _enc_clip_max[_layer_id * 12 + 7] / + (2 * _quant_range * _quant_range); + } if (_layer_id == _tw._n_enc_layer - 1) { scale_ptr = _p_device_emb[2]; bias_ptr = _p_device_emb[3]; ker_residual_bias_ln_i32I_launcher<_DataType>( _int32_ffn_out_buf, scale_ptr, bias_ptr, _p_d_output, _p_d_output, - _batch_token_num, _tw._hidden_size, - _enc_clip_max[_layer_id * 12 + 3] * _enc_clip_max[_layer_id * 12 + 7] / - (2 * _quant_range * _quant_range), + _batch_token_num, _tw._hidden_size, dequant_scale, _max_thread_per_block, _stream, true, _scaled_ffn2_colsum[_layer_id]); } else { scale_ptr = _p_device_wei[(_layer_id + 1) * _tw._weight_per_enc_layer]; @@ -418,11 +456,20 @@ void QuantBertEncoder::ffn_add_norm() { ker_residual_bias_ln_i32I_i8O_launcher<_DataType>( _int32_ffn_out_buf, scale_ptr, bias_ptr, res_bias_ptr, _int8_ffn_in_buf, - _p_d_output, _batch_token_num, _tw._hidden_size, - _enc_clip_max[_layer_id * 12 + 3] * _enc_clip_max[_layer_id * 12 + 7] / - (2 * _quant_range * _quant_range), + _p_d_output, _batch_token_num, _tw._hidden_size, dequant_scale, _quant_range / clip_max, _max_thread_per_block, _stream, _tw._is_post_ln, true, true, _scaled_ffn2_colsum[_layer_id]); + +#ifdef DEBUG_RESULT + for (int i = 0; i < _batch_size; i++) { // batch_id + for (int j = 0; j < _batch_seq_len; j++) { // token_id + std::cout << "encoder layer out: token-" << j << std::endl; + print_vec(_int8_ffn_in_buf + i * _batch_seq_len * _tw._hidden_size + + j * _tw._hidden_size, + "encoder layer out", 10); + } + } +#endif } return; diff --git a/lightseq/inference/model/quant_decoder.cc.cu b/lightseq/inference/model/quant_decoder.cc.cu index 6abbe6f8..42ac5402 100644 --- a/lightseq/inference/model/quant_decoder.cc.cu +++ b/lightseq/inference/model/quant_decoder.cc.cu @@ -564,7 +564,7 @@ void QuantDecoder::embedding() { _p_device_emb[7], _p_d_lang_id, _p_d_cur_step_query, _batch_size, _tw._beam_size, _tw._hidden_size, _tw._trg_vocab_size, _cur_step, _tw._max_step, _tw._multilg_type, _stream, - _trg_emb_clip_max / _quant_range); + _trg_emb_clip_max / _quant_range, true); #ifdef DEBUG_RESULT for (int i = 0; i < _batch_size; i++) { // batch_id for (int j = 0; j < _tw._beam_size; j++) { // beam_id @@ -862,7 +862,16 @@ void QuantDecoder::ffn_add_norm() { _tw._inner_size, 0, 0, 0, 1, _cublas_lt_handle, _stream); const _DataType *scale_ptr, *bias_ptr, *res_bias_ptr; - float clip_max; + float clip_max, dequant_scale; + if (_tw._use_gelu) { + dequant_scale = _dec_clip_max[_layer_id * 19 + 5] * + _dec_clip_max[_layer_id * 19 + 11] / + (_quant_range * _quant_range); + } else { + dequant_scale = _dec_clip_max[_layer_id * 19 + 5] * + _dec_clip_max[_layer_id * 19 + 11] / + (2 * _quant_range * _quant_range); + } if (_layer_id == _tw._n_dec_layer - 1) { scale_ptr = _p_device_emb[2]; bias_ptr = _p_device_emb[3]; @@ -878,9 +887,7 @@ void QuantDecoder::ffn_add_norm() { ker_residual_bias_ln_i32I_i8O_launcher<_DataType>( _int32_ffn_out_buf, scale_ptr, bias_ptr, res_bias_ptr, _int8_ffn_in_buf, - 
_p_d_cur_step_query, _step_token_num, _tw._hidden_size, - _dec_clip_max[_layer_id * 19 + 5] * _dec_clip_max[_layer_id * 19 + 11] / - (2 * _quant_range * _quant_range), + _p_d_cur_step_query, _step_token_num, _tw._hidden_size, dequant_scale, _quant_range / clip_max, _max_thread_per_block, _stream, _tw._is_post_ln, false, true, _scaled_ffn2_colsum[_layer_id]); diff --git a/lightseq/inference/model/quant_encoder.cc.cu b/lightseq/inference/model/quant_encoder.cc.cu index 7e14d680..1592e974 100644 --- a/lightseq/inference/model/quant_encoder.cc.cu +++ b/lightseq/inference/model/quant_encoder.cc.cu @@ -234,7 +234,7 @@ void QuantEncoder::run_one_infer(int batch_size, int batch_seq_len) { _int8_p_d_src_emb_wei, _p_device_emb[1], _p_d_token_id, _p_d_output, _p_d_padding_mask, _tw._padding_id, batch_size, batch_seq_len, _tw._hidden_size, _stream, _p_device_emb[4], _p_d_lang_id, - _tw._multilg_type, _src_emb_clip_max / _quant_range); + _tw._multilg_type, _src_emb_clip_max / _quant_range, true); #ifdef DEBUG_RESULT for (int i = 0; i < _batch_size; i++) { // batch_id for (int j = 0; j < _batch_seq_len; j++) { // token_id @@ -380,16 +380,23 @@ void QuantEncoder::ffn_add_norm() { _stream, false); const _DataType *scale_ptr, *bias_ptr, *res_bias_ptr; - float clip_max; + float clip_max, dequant_scale; + if (_tw._use_gelu) { + dequant_scale = _enc_clip_max[_layer_id * 12 + 3] * + _enc_clip_max[_layer_id * 12 + 7] / + (_quant_range * _quant_range); + } else { + dequant_scale = _enc_clip_max[_layer_id * 12 + 3] * + _enc_clip_max[_layer_id * 12 + 7] / + (2 * _quant_range * _quant_range); + } if (_layer_id == _tw._n_enc_layer - 1) { scale_ptr = _p_device_emb[2]; bias_ptr = _p_device_emb[3]; ker_residual_bias_ln_i32I_launcher<_DataType>( _int32_ffn_out_buf, scale_ptr, bias_ptr, _p_d_output, _p_d_output, - _batch_token_num, _tw._hidden_size, - _enc_clip_max[_layer_id * 12 + 3] * _enc_clip_max[_layer_id * 12 + 7] / - (2 * _quant_range * _quant_range), + _batch_token_num, _tw._hidden_size, dequant_scale, _max_thread_per_block, _stream, true, _scaled_ffn2_colsum[_layer_id]); } else { scale_ptr = _p_device_wei[(_layer_id + 1) * _tw._weight_per_enc_layer]; @@ -400,9 +407,7 @@ void QuantEncoder::ffn_add_norm() { ker_residual_bias_ln_i32I_i8O_launcher<_DataType>( _int32_ffn_out_buf, scale_ptr, bias_ptr, res_bias_ptr, _int8_ffn_in_buf, - _p_d_output, _batch_token_num, _tw._hidden_size, - _enc_clip_max[_layer_id * 12 + 3] * _enc_clip_max[_layer_id * 12 + 7] / - (2 * _quant_range * _quant_range), + _p_d_output, _batch_token_num, _tw._hidden_size, dequant_scale, _quant_range / clip_max, _max_thread_per_block, _stream, _tw._is_post_ln, true, true, _scaled_ffn2_colsum[_layer_id]); } diff --git a/lightseq/inference/proto/quant_bert_weight.cc b/lightseq/inference/proto/quant_bert_weight.cc index 6ad5941d..6b0375f3 100644 --- a/lightseq/inference/proto/quant_bert_weight.cc +++ b/lightseq/inference/proto/quant_bert_weight.cc @@ -366,6 +366,12 @@ void QuantBertWeight::hdf5_parse_enc_wei(hid_t hdf5_file) { H5T_NATIVE_UCHAR, value_i8.data() + idx, [=](int size) { return size != _hidden_size * _hidden_size * 3; }, "Wrong multihead_project_kernel_qkv_size !"); + read_hdf5_dataset_scalar( + hdf5_file, dataset_prefix + "/multihead_project_kernel_qkv_clip_max", + H5T_NATIVE_FLOAT, &clip_max); + dequantize_array(value_i8, value, clip_max, _quant_range, idx, + _hidden_size * _hidden_size * 3); + _enc_clip_max.push_back(clip_max); idx += _hidden_size * _hidden_size * 3; offset.push_back(idx); @@ -374,12 +380,6 @@ void 
QuantBertWeight::hdf5_parse_enc_wei(hid_t hdf5_file) { H5T_NATIVE_FLOAT, value.data() + idx, [=](int size) { return size != _hidden_size * 3; }, "Wrong multihead_project_bias_qkv_size !"); - read_hdf5_dataset_scalar( - hdf5_file, dataset_prefix + "/multihead_project_kernel_qkv_clip_max", - H5T_NATIVE_FLOAT, &clip_max); - dequantize_array(value_i8, value, clip_max, _quant_range, idx, - _hidden_size * _hidden_size * 3); - _enc_clip_max.push_back(clip_max); idx += _hidden_size * 3; offset.push_back(idx); From df94d6953d951ec2b18de0f965666fa8f692a831 Mon Sep 17 00:00:00 2001 From: "weiyang.god" Date: Wed, 30 Mar 2022 17:51:55 +0800 Subject: [PATCH 21/49] update black pre-commit version --- .pre-commit-config.yaml | 2 +- lightseq/training/ops/pytorch/torch_transformer_layers.py | 4 ++-- lightseq/training/ops/pytorch/transformer_embedding_layer.py | 2 +- lightseq/training/pytorch_quantization/tensor_quant.py | 2 +- setup.py | 2 +- 5 files changed, 6 insertions(+), 6 deletions(-) diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml index cd71922f..7a1f5781 100644 --- a/.pre-commit-config.yaml +++ b/.pre-commit-config.yaml @@ -12,7 +12,7 @@ repos: args: [-style=file] - repo: https://github.com/psf/black - rev: 21.5b2 + rev: 22.3.0 hooks: - id: black diff --git a/lightseq/training/ops/pytorch/torch_transformer_layers.py b/lightseq/training/ops/pytorch/torch_transformer_layers.py index e635e6fe..e3906e5f 100644 --- a/lightseq/training/ops/pytorch/torch_transformer_layers.py +++ b/lightseq/training/ops/pytorch/torch_transformer_layers.py @@ -64,7 +64,7 @@ def __init__( assert ( self.head_dim * num_heads == self.embed_dim ), "embed_dim must be divisible by num_heads" - self.scaling = self.head_dim ** -0.5 + self.scaling = self.head_dim**-0.5 self.self_attention = self_attention self.encoder_decoder_attention = encoder_decoder_attention @@ -955,7 +955,7 @@ def __init__(self, config, initial_embeddings=None): ) self.emb_lookup.to(dtype=(torch.half if config.fp16 else torch.float)) self.embeddings = self.emb_lookup.weight - nn.init.normal_(self.embeddings, mean=0, std=config.embedding_dim ** -0.5) + nn.init.normal_(self.embeddings, mean=0, std=config.embedding_dim**-0.5) nn.init.constant_(self.embeddings[config.padding_idx], 0) # load initial weights diff --git a/lightseq/training/ops/pytorch/transformer_embedding_layer.py b/lightseq/training/ops/pytorch/transformer_embedding_layer.py index 1ff65da9..58f77618 100644 --- a/lightseq/training/ops/pytorch/transformer_embedding_layer.py +++ b/lightseq/training/ops/pytorch/transformer_embedding_layer.py @@ -110,7 +110,7 @@ def __init__( ) def reset_parameters(self): - nn.init.normal_(self.embeddings, mean=0, std=self.config.embedding_dim ** -0.5) + nn.init.normal_(self.embeddings, mean=0, std=self.config.embedding_dim**-0.5) nn.init.constant_(self.embeddings[self.config.padding_idx], 0) def __assign_layer_weight_grad(self): diff --git a/lightseq/training/pytorch_quantization/tensor_quant.py b/lightseq/training/pytorch_quantization/tensor_quant.py index 6c30bc19..8a05c42b 100644 --- a/lightseq/training/pytorch_quantization/tensor_quant.py +++ b/lightseq/training/pytorch_quantization/tensor_quant.py @@ -441,7 +441,7 @@ def forward(ctx, inputs, min_range, max_range, num_bits=8): ) ctx.save_for_backward(inputs, min_range, max_range) - step_size = (max_range - min_range) / (2.0 ** num_bits - 1) + step_size = (max_range - min_range) / (2.0**num_bits - 1) min_bound = -(2.0 ** (num_bits - 1)) max_bound = 2.0 ** (num_bits - 1) - 1 diff --git a/setup.py
b/setup.py index 3765ca38..815ac8de 100644 --- a/setup.py +++ b/setup.py @@ -65,7 +65,7 @@ def build_extension(self, ext): cmake_args += [ "-DCMAKE_LIBRARY_OUTPUT_DIRECTORY_{}={}".format(cfg.upper(), extdir) ] - if sys.maxsize > 2 ** 32: + if sys.maxsize > 2**32: cmake_args += ["-A", "x64"] build_args += ["--", "/m"] else: From 6d3e74c9a6e6637816ea04fb95eea034a800b635 Mon Sep 17 00:00:00 2001 From: "weiyang.god" Date: Wed, 6 Apr 2022 18:04:03 +0800 Subject: [PATCH 22/49] add quant bert test example --- examples/inference/python/README.md | 31 +-- examples/inference/python/test/ls_bart.py | 1 + examples/inference/python/test/ls_bert.py | 1 + examples/inference/python/test/ls_gpt2.py | 1 + .../inference/python/test/ls_quant_bert.py | 176 ++++++++++++++++++ .../huggingface/bert/task_glue/run_glue.sh | 2 +- .../bert/task_glue/run_quant_glue.sh | 2 +- .../bert/task_ner/predict_quant_ner.sh | 42 +++++ .../huggingface/bert/task_ner/run_ner.sh | 2 +- .../bert/task_ner/run_quant_ner.sh | 2 +- .../huggingface/bert/task_qa/run_qa.sh | 2 +- 11 files changed, 242 insertions(+), 20 deletions(-) create mode 100644 examples/inference/python/test/ls_quant_bert.py create mode 100644 examples/training/huggingface/bert/task_ner/predict_quant_ner.sh diff --git a/examples/inference/python/README.md b/examples/inference/python/README.md index febe0f31..7d1f03ae 100644 --- a/examples/inference/python/README.md +++ b/examples/inference/python/README.md @@ -9,21 +9,22 @@ cd examples/inference/python ## Model export We provide the following export examples. All Fairseq based models are trained using the scripts in [examples/training/fairseq](../../../examples/training/fairseq). The first two LightSeq Transformer models are trained using the scripts in [examples/training/custom](../../../examples/training/custom). -| Model | Type | Command | Resource | Description | -|--------------------------------------|-------|-------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------| -| LightSeq Transformer | Float | python export/ls_transformer_export.py -m ckpt_ls_custom.pt | [link](http://lf3-nlp-opensource.bytetos.com/obj/nlp-opensource/lightseq/example_model/ckpt_ls_custom.pt) | Export LightSeq Transformer models to protobuf format. | -| LightSeq Transformer + PTQ | Int8 | python export/ls_transformer_ptq_export.py -m ckpt_ls_custom.pt | [link](http://lf3-nlp-opensource.bytetos.com/obj/nlp-opensource/lightseq/example_model/ckpt_ls_custom.pt) | Export LightSeq Transformer models to int8 protobuf format using post training quantization. | -| Hugging Face BART | Float | python export/huggingface/hf_bart_export.py | / | Export Hugging Face BART models to protobuf/hdf5 format. | -| Hugging Face BERT | Float | python export/huggingface/hf_bert_export.py | / | Export Hugging Face BERT models to hdf5 format. | -| Hugging Face GPT2 | Float | python export/huggingface/hf_gpt2_export.py | / | Export Hugging Face GPT2 models to hdf5 format. 
| -| Native Fairseq Transformer | Float | python export/fairseq/native_fs_transformer_export.py -m ckpt_native_fairseq_31.06.pt | [link](http://lf3-nlp-opensource.bytetos.com/obj/nlp-opensource/lightseq/example_model/fairseq/ckpt_native_fairseq_31.06.pt) | Export native Fairseq Transformer models to protobuf/hdf5 format. | -| Native Fairseq Transformer + PTQ | Int8 | python export/fairseq/native_fs_transformer_export.py -m ckpt_native_fairseq_31.06.pt | [link](http://lf3-nlp-opensource.bytetos.com/obj/nlp-opensource/lightseq/example_model/fairseq/ckpt_native_fairseq_31.06.pt) | Export native Fairseq Transformer models to int8 protobuf format using post training quantization. | -| Fairseq + LightSeq Transformer | Float | python export/fairseq/ls_fs_transformer_export.py -m ckpt_ls_fairseq_31.17.pt | [link](http://lf3-nlp-opensource.bytetos.com/obj/nlp-opensource/lightseq/example_model/fairseq/ckpt_ls_fairseq_31.17.pt) | Export Fairseq Transformer models training with LightSeq modules to protobuf/hdf5 format. | -| Fairseq + LightSeq Transformer + PTQ | Int8 | python export/fairseq/ls_fs_transformer_ptq_export.py -m ckpt_ls_fairseq_31.17.pt | [link](http://lf3-nlp-opensource.bytetos.com/obj/nlp-opensource/lightseq/example_model/fairseq/ckpt_ls_fairseq_31.17.pt) | Export Fairseq Transformer models training with LightSeq modules to int8 protobuf format using post training quantization. | -| Fairseq + custom Torch layer | Float | python export/fairseq/ls_torch_fs_transformer_export.py -m ckpt_ls_torch_fairseq_31.16.pt | [link](http://lf3-nlp-opensource.bytetos.com/obj/nlp-opensource/lightseq/example_model/fairseq/ckpt_ls_torch_fairseq_31.16.pt) | Export Fairseq Transformer models training with custom Torch layers and other LightSeq modules to protobuf format. | -| Fairseq + custom Torch layer + PTQ | Int8 | python export/fairseq/ls_torch_fs_transformer_ptq_export.py -m ckpt_ls_torch_fairseq_31.16.pt | [link](http://lf3-nlp-opensource.bytetos.com/obj/nlp-opensource/lightseq/example_model/fairseq/ckpt_ls_torch_fairseq_31.16.pt) | Export Fairseq Transformer models training with custom Torch layers and other LightSeq modules to int8 protobuf format using post training quantization. | -| Fairseq + custom Torch layer + QAT | Int8 | python export/fairseq/ls_torch_fs_quant_transformer_export.py -m ckpt_ls_torch_fairseq_quant_31.09.pt | [link](http://lf3-nlp-opensource.bytetos.com/obj/nlp-opensource/lightseq/example_model/fairseq/ckpt_ls_torch_fairseq_quant_31.09.pt) | Export quantized Fairseq Transformer models training with custom Torch layers and other LightSeq modules to int8 protobuf format. | -| Native Fairseq MoE Transformer | Float | python export/fairseq/native_fs_moe_transformer_export.py | / | Export Fairseq MoE Transformer models to protobuf/hdf5 format. 
| +| Model | Type | Command | Resource | Description | +| -------------------------------------------- | ----- | ----------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------ | -------------------------------------------------------------------------------------------------------------------------------------------------------- | +| LightSeq Transformer | Float | python export/ls_transformer_export.py -m ckpt_ls_custom.pt | [link](http://lf3-nlp-opensource.bytetos.com/obj/nlp-opensource/lightseq/example_model/ckpt_ls_custom.pt) | Export LightSeq Transformer models to protobuf format. | +| LightSeq Transformer + PTQ | Int8 | python export/ls_transformer_ptq_export.py -m ckpt_ls_custom.pt | [link](http://lf3-nlp-opensource.bytetos.com/obj/nlp-opensource/lightseq/example_model/ckpt_ls_custom.pt) | Export LightSeq Transformer models to int8 protobuf format using post training quantization. | +| Hugging Face BART | Float | python export/huggingface/hf_bart_export.py | / | Export Hugging Face BART models to protobuf/hdf5 format. | +| Hugging Face BERT | Float | python export/huggingface/hf_bert_export.py | / | Export Hugging Face BERT models to hdf5 format. | +| Hugging Face + custom Torch layer BERT + QAT | Int8 | python export/huggingface/ls_torch_hf_quant_bert_export.py -m ckpt_hf_torch_quant_bert_ner.bin | / | Export Hugging Face BERT training with custom Torch layers to hdf5 format. | +| Hugging Face GPT2 | Float | python export/huggingface/hf_gpt2_export.py | / | Export Hugging Face GPT2 models to hdf5 format. | +| Native Fairseq Transformer | Float | python export/fairseq/native_fs_transformer_export.py -m ckpt_native_fairseq_31.06.pt | [link](http://lf3-nlp-opensource.bytetos.com/obj/nlp-opensource/lightseq/example_model/fairseq/ckpt_native_fairseq_31.06.pt) | Export native Fairseq Transformer models to protobuf/hdf5 format. | +| Native Fairseq Transformer + PTQ | Int8 | python export/fairseq/native_fs_transformer_export.py -m ckpt_native_fairseq_31.06.pt | [link](http://lf3-nlp-opensource.bytetos.com/obj/nlp-opensource/lightseq/example_model/fairseq/ckpt_native_fairseq_31.06.pt) | Export native Fairseq Transformer models to int8 protobuf format using post training quantization. | +| Fairseq + LightSeq Transformer | Float | python export/fairseq/ls_fs_transformer_export.py -m ckpt_ls_fairseq_31.17.pt | [link](http://lf3-nlp-opensource.bytetos.com/obj/nlp-opensource/lightseq/example_model/fairseq/ckpt_ls_fairseq_31.17.pt) | Export Fairseq Transformer models training with LightSeq modules to protobuf/hdf5 format. | +| Fairseq + LightSeq Transformer + PTQ | Int8 | python export/fairseq/ls_fs_transformer_ptq_export.py -m ckpt_ls_fairseq_31.17.pt | [link](http://lf3-nlp-opensource.bytetos.com/obj/nlp-opensource/lightseq/example_model/fairseq/ckpt_ls_fairseq_31.17.pt) | Export Fairseq Transformer models training with LightSeq modules to int8 protobuf format using post training quantization. | +| Fairseq + custom Torch layer | Float | python export/fairseq/ls_torch_fs_transformer_export.py -m ckpt_ls_torch_fairseq_31.16.pt | [link](http://lf3-nlp-opensource.bytetos.com/obj/nlp-opensource/lightseq/example_model/fairseq/ckpt_ls_torch_fairseq_31.16.pt) | Export Fairseq Transformer models training with custom Torch layers and other LightSeq modules to protobuf format. 
| +| Fairseq + custom Torch layer + PTQ | Int8 | python export/fairseq/ls_torch_fs_transformer_ptq_export.py -m ckpt_ls_torch_fairseq_31.16.pt | [link](http://lf3-nlp-opensource.bytetos.com/obj/nlp-opensource/lightseq/example_model/fairseq/ckpt_ls_torch_fairseq_31.16.pt) | Export Fairseq Transformer models training with custom Torch layers and other LightSeq modules to int8 protobuf format using post training quantization. | +| Fairseq + custom Torch layer + QAT | Int8 | python export/fairseq/ls_torch_fs_quant_transformer_export.py -m ckpt_ls_torch_fairseq_quant_31.09.pt | [link](http://lf3-nlp-opensource.bytetos.com/obj/nlp-opensource/lightseq/example_model/fairseq/ckpt_ls_torch_fairseq_quant_31.09.pt) | Export quantized Fairseq Transformer models training with custom Torch layers and other LightSeq modules to int8 protobuf format. | +| Native Fairseq MoE Transformer | Float | python export/fairseq/native_fs_moe_transformer_export.py | / | Export Fairseq MoE Transformer models to protobuf/hdf5 format. | ## LightSeq inference ### Hugging Face models diff --git a/examples/inference/python/test/ls_bart.py b/examples/inference/python/test/ls_bart.py index 7738f49c..2e667c44 100644 --- a/examples/inference/python/test/ls_bart.py +++ b/examples/inference/python/test/ls_bart.py @@ -71,6 +71,7 @@ def main(): # change to "facebook/bart-large" for large model hf_model = BartForConditionalGeneration.from_pretrained("facebook/bart-base") hf_model.to("cuda:0") + hf_model.eval() sentences = [ "I love that girl, but does not me.", diff --git a/examples/inference/python/test/ls_bert.py b/examples/inference/python/test/ls_bert.py index 7e3b0d4f..baa00a3c 100644 --- a/examples/inference/python/test/ls_bert.py +++ b/examples/inference/python/test/ls_bert.py @@ -76,6 +76,7 @@ def main(): print("creating huggingface model...") hf_model = BertForSequenceClassification.from_pretrained("bert-base-uncased") hf_model.to("cuda:0") + hf_model.eval() print("creating lightseq model...") ls_model = LightseqBertClassification("lightseq_bert_base_uncased.hdf5", hf_model) diff --git a/examples/inference/python/test/ls_gpt2.py b/examples/inference/python/test/ls_gpt2.py index bc0f980b..f316d06d 100644 --- a/examples/inference/python/test/ls_gpt2.py +++ b/examples/inference/python/test/ls_gpt2.py @@ -81,6 +81,7 @@ def main(): print("creating huggingface model...") hf_model = GPT2LMHeadModel.from_pretrained("gpt2") hf_model.to("cuda:0") + hf_model.eval() # lightseq gpt perplexity supports batch infer with different lengths, # but sampling doesn't support diff --git a/examples/inference/python/test/ls_quant_bert.py b/examples/inference/python/test/ls_quant_bert.py new file mode 100644 index 00000000..3b1d402e --- /dev/null +++ b/examples/inference/python/test/ls_quant_bert.py @@ -0,0 +1,176 @@ +import time + +import torch +from transformers import BertTokenizer, BertForTokenClassification, BertConfig +import lightseq.inference as lsi +from lightseq.training.ops.pytorch.quantization import qat_mode, disable_quant +from lightseq.training.ops.pytorch.torch_transformer_layers import ( + BertEmbeddingLayer, + TransformerEncoderLayer, +) +from export.fairseq.util import parse_args + + +def ls_bert(model, inputs): + torch.cuda.synchronize() + start_time = time.perf_counter() + ls_output = model.infer(inputs) + torch.cuda.synchronize() + end_time = time.perf_counter() + return ls_output, end_time - start_time + + +def hf_bert(model, inputs, attn_mask): + torch.cuda.synchronize() + start_time = time.perf_counter() + hf_output = 
model(inputs.to("cuda:0"), attention_mask=attn_mask.to("cuda:0")) + torch.cuda.synchronize() + end_time = time.perf_counter() + return hf_output, end_time - start_time + + +def ls_generate(model, inputs_id): + print("=========lightseq=========") + print("lightseq generating...") + ls_output, ls_time = ls_bert(model, inputs_id) + print(f"lightseq time: {ls_time}s") + print("lightseq results (class predictions):") + print(ls_output.argmax(axis=2).detach().cpu().numpy()) + + +def hf_generate(model, inputs_id, attn_mask): + print("=========huggingface=========") + print("huggingface generating...") + hf_output, hf_time = hf_bert(model, inputs_id, attn_mask) + print(f"huggingface time: {hf_time}s") + print("huggingface results (class predictions):") + print(hf_output.logits.argmax(axis=2).detach().cpu().numpy()) + + +def warmup(tokenizer, ls_model, hf_model, sentences): + inputs = tokenizer(sentences, return_tensors="pt", padding=True) + inputs_id = inputs["input_ids"] + attn_mask = inputs["attention_mask"] + + ls_generate(ls_model, inputs_id) + hf_generate(hf_model, inputs_id, attn_mask) + + +class LightseqBertClassification: + def __init__(self, ls_weight_path, hf_model): + self.ls_bert = lsi.QuantBert(ls_weight_path, 8) + self.classifier = hf_model.classifier + + def infer(self, inputs): + last_hidden_states = self.ls_bert.infer(inputs) + last_hidden_states = torch.Tensor(last_hidden_states).float() + logits = self.classifier(last_hidden_states.to("cuda:0")) + return logits + + +def gen_bert_emb_config(config): + bert_emb_config = BertEmbeddingLayer.get_config( + vocab_size=config.vocab_size, + embedding_dim=config.hidden_size, + max_batch_tokens=4096, + max_seq_len=config.max_position_embeddings, + padding_idx=config.pad_token_id, + dropout=config.hidden_dropout_prob, + fp16=True, + local_rank=0, + ) + bert_emb_config.type_vocab_size = config.type_vocab_size + bert_emb_config.layer_norm_eps = config.layer_norm_eps + return bert_emb_config + + +class LSHFTransformerEncoderLayer(TransformerEncoderLayer): + def __init__(self, *args, **kwargs): + super(LSHFTransformerEncoderLayer, self).__init__(*args, **kwargs) + + def forward(self, hidden_states, encoder_padding_mask, *args, **kwargs): + ls_encoder_padding_mask = encoder_padding_mask / -10000.0 + ls_encoder_padding_mask = ls_encoder_padding_mask.squeeze() + output = super().forward(hidden_states, ls_encoder_padding_mask) + return (output, None, None, None) + + +def gen_bert_enc_config(config): + bert_enc_config = TransformerEncoderLayer.get_config( + max_batch_tokens=4096, + max_seq_len=config.max_position_embeddings, + hidden_size=config.hidden_size, + intermediate_size=config.intermediate_size, + nhead=config.num_attention_heads, + attn_prob_dropout_ratio=config.attention_probs_dropout_prob, + activation_dropout_ratio=config.hidden_dropout_prob, + hidden_dropout_ratio=config.hidden_dropout_prob, + pre_layer_norm=False, + fp16=True, + local_rank=0, + activation_fn="gelu", + ) + return bert_enc_config + + +def inject_ls_layer(model, config): + bert_emb_config = gen_bert_emb_config(config) + model.bert.embeddings = BertEmbeddingLayer(bert_emb_config) + model.bert.embeddings.apply(qat_mode) + + for i in range(config.num_hidden_layers): + bert_enc_config = gen_bert_enc_config(config) + model.bert.encoder.layer[i] = LSHFTransformerEncoderLayer( + bert_enc_config + ).cuda() + model.bert.encoder.layer[i].apply(qat_mode) + + +def main(): + args = parse_args() + model_name = ".".join(args.model.split(".")[:-1]) + ckpt_path = f"{model_name}.bin" + + 
print("initializing bert config...") + config = BertConfig.from_pretrained( + "bert-base-uncased", num_labels=9, finetuning_task="ner" + ) + + print("initializing bert tokenizer...") + tokenizer = BertTokenizer.from_pretrained("bert-base-uncased") + + print("creating huggingface model...") + hf_model = BertForTokenClassification.from_pretrained( + "bert-base-uncased", config=config + ) + inject_ls_layer(hf_model, config) + state_dict = torch.load(ckpt_path, map_location="cpu") + hf_model.load_state_dict(state_dict, strict=False) + hf_model.to("cuda:0") + hf_model.eval() + + print("creating lightseq model...") + ls_model = LightseqBertClassification(args.model, hf_model) + + sentences = [ + "EU rejects German call to boycott British lamb .", + "-- Dimitris Kontogiannis , Athens Newsroom +301 3311812-4", + "BayerVB sets C$ 100 million six-year bond .", + "China says time right for Taiwan talks .", + ] + + print("====================START warmup====================") + warmup(tokenizer, ls_model, hf_model, sentences) + print("====================END warmup====================") + + print("tokenizing the sentences...") + inputs = tokenizer(sentences, return_tensors="pt", padding=True) + inputs_id = inputs["input_ids"] + attn_mask = inputs["attention_mask"] + + ls_generate(ls_model, inputs_id) + hf_generate(hf_model, inputs_id, attn_mask) + + +if __name__ == "__main__": + main() diff --git a/examples/training/huggingface/bert/task_glue/run_glue.sh b/examples/training/huggingface/bert/task_glue/run_glue.sh index e6a82979..a7756ab2 100644 --- a/examples/training/huggingface/bert/task_glue/run_glue.sh +++ b/examples/training/huggingface/bert/task_glue/run_glue.sh @@ -18,7 +18,7 @@ THIS_DIR=$(dirname $(readlink -f $0)) export TASK_NAME=sst2 python3 -m torch.distributed.launch \ - --nproc_per_node=8 \ + --nproc_per_node=1 \ $THIS_DIR/run_glue.py \ --model_name_or_path bert-base-cased \ --task_name $TASK_NAME \ diff --git a/examples/training/huggingface/bert/task_glue/run_quant_glue.sh b/examples/training/huggingface/bert/task_glue/run_quant_glue.sh index 46f3e58f..d60e9233 100644 --- a/examples/training/huggingface/bert/task_glue/run_quant_glue.sh +++ b/examples/training/huggingface/bert/task_glue/run_quant_glue.sh @@ -18,7 +18,7 @@ THIS_DIR=$(dirname $(readlink -f $0)) export TASK_NAME=sst2 python3 -m torch.distributed.launch \ - --nproc_per_node=8 \ + --nproc_per_node=1 \ $THIS_DIR/run_glue.py \ --model_name_or_path bert-base-cased \ --task_name $TASK_NAME \ diff --git a/examples/training/huggingface/bert/task_ner/predict_quant_ner.sh b/examples/training/huggingface/bert/task_ner/predict_quant_ner.sh new file mode 100644 index 00000000..df81783e --- /dev/null +++ b/examples/training/huggingface/bert/task_ner/predict_quant_ner.sh @@ -0,0 +1,42 @@ +# Copyright 2020 The HuggingFace Team. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +until [[ -z "$1" ]] +do + case $1 in + -m) + shift; MODEL=$1; + shift;; + *) + shift;; + esac +done + +THIS_DIR=$(dirname $(readlink -f $0)) + +python3 -m torch.distributed.launch \ + --nproc_per_node=1 \ + $THIS_DIR/run_ner.py \ + --model_name_or_path bert-base-uncased \ + --dataset_name conll2003 \ + --do_predict \ + --per_device_train_batch_size 4 \ + --output_dir /tmp/quant/test-ner \ + --overwrite_output_dir \ + --resume_from_checkpoint $MODEL \ + --fp16 \ + --seed 1234 \ + --logging_steps 10 \ + --module_type 2 \ + --enable_quant true diff --git a/examples/training/huggingface/bert/task_ner/run_ner.sh b/examples/training/huggingface/bert/task_ner/run_ner.sh index a01e7041..2664fdbb 100644 --- a/examples/training/huggingface/bert/task_ner/run_ner.sh +++ b/examples/training/huggingface/bert/task_ner/run_ner.sh @@ -15,7 +15,7 @@ THIS_DIR=$(dirname $(readlink -f $0)) python3 -m torch.distributed.launch \ - --nproc_per_node=8 \ + --nproc_per_node=1 \ $THIS_DIR/run_ner.py \ --model_name_or_path bert-base-uncased \ --dataset_name conll2003 \ diff --git a/examples/training/huggingface/bert/task_ner/run_quant_ner.sh b/examples/training/huggingface/bert/task_ner/run_quant_ner.sh index 6e64c0a6..3d962e66 100644 --- a/examples/training/huggingface/bert/task_ner/run_quant_ner.sh +++ b/examples/training/huggingface/bert/task_ner/run_quant_ner.sh @@ -15,7 +15,7 @@ THIS_DIR=$(dirname $(readlink -f $0)) python3 -m torch.distributed.launch \ - --nproc_per_node=8 \ + --nproc_per_node=1 \ $THIS_DIR/run_ner.py \ --model_name_or_path bert-base-uncased \ --dataset_name conll2003 \ diff --git a/examples/training/huggingface/bert/task_qa/run_qa.sh b/examples/training/huggingface/bert/task_qa/run_qa.sh index 78e5c390..61346d8d 100644 --- a/examples/training/huggingface/bert/task_qa/run_qa.sh +++ b/examples/training/huggingface/bert/task_qa/run_qa.sh @@ -15,7 +15,7 @@ THIS_DIR=$(dirname $(readlink -f $0)) python3 -m torch.distributed.launch \ - --nproc_per_node=8 \ + --nproc_per_node=1 \ $THIS_DIR/run_qa.py \ --model_name_or_path bert-base-uncased \ --dataset_name squad \ From 0189252136ed6a73b255cd790edf88fb5d6af10b Mon Sep 17 00:00:00 2001 From: "weiyang.god" Date: Wed, 6 Apr 2022 19:46:43 +0800 Subject: [PATCH 23/49] support cpp quant bert example --- examples/inference/cpp/CMakeLists.txt | 3 + examples/inference/cpp/quant_bert_example.cc | 65 ++++++++++++++++++++ 2 files changed, 68 insertions(+) create mode 100644 examples/inference/cpp/quant_bert_example.cc diff --git a/examples/inference/cpp/CMakeLists.txt b/examples/inference/cpp/CMakeLists.txt index a9a14d4f..e2f1b630 100644 --- a/examples/inference/cpp/CMakeLists.txt +++ b/examples/inference/cpp/CMakeLists.txt @@ -9,6 +9,9 @@ target_link_libraries(quant_transformer_example PUBLIC liblightseq) add_executable(bert_example bert_example.cc) target_link_libraries(bert_example PUBLIC liblightseq) +add_executable(quant_bert_example quant_bert_example.cc) +target_link_libraries(quant_bert_example PUBLIC liblightseq) + add_executable(gpt_example gpt_example.cc) target_link_libraries(gpt_example PUBLIC liblightseq) diff --git a/examples/inference/cpp/quant_bert_example.cc b/examples/inference/cpp/quant_bert_example.cc new file mode 100644 index 00000000..fdebe16d --- /dev/null +++ b/examples/inference/cpp/quant_bert_example.cc @@ -0,0 +1,65 @@ +#include "model_base.h" +#include "util.h" + +/** +@file +Example of how to run QuantBert inference using our implementation. 
+*/ + +int main(int argc, char* argv[]) { + std::string model_weights_path = argv[1]; + int max_batch_size = 128; + + auto model = lightseq::cuda::LSModelFactory::GetInstance().CreateModel( + "QuantBert", model_weights_path, max_batch_size); + + int batch_size = 1; + int batch_seq_len = 10; + std::vector host_input = {101, 2859, 2758, 2051, 2157, 2005, 6629, 7566, 1012, 102}; + + void* d_input; + lightseq::cuda::CHECK_GPU_ERROR( + cudaMalloc(&d_input, sizeof(int) * batch_size * batch_seq_len)); + lightseq::cuda::CHECK_GPU_ERROR(cudaMemcpy( + d_input, host_input.data(), sizeof(int) * batch_size * batch_seq_len, + cudaMemcpyHostToDevice)); + + model->set_input_ptr(0, d_input); + model->set_input_shape(0, {batch_size, batch_seq_len}); + + for (int i = 0; i < model->get_output_size(); i++) { + void* d_output; + std::vector shape = model->get_output_max_shape(i); + int total_size = 1; + for (int j = 0; j < shape.size(); j++) { + total_size *= shape[j]; + } + lightseq::cuda::CHECK_GPU_ERROR( + cudaMalloc(&d_output, total_size * sizeof(int))); + model->set_output_ptr(i, d_output); + } + lightseq::cuda::CHECK_GPU_ERROR(cudaStreamSynchronize(0)); + std::cout << "infer preprocessing finished" << std::endl; + + /* ---step5. infer and log--- */ + for (int i = 0; i < 10; i++) { + auto start = std::chrono::high_resolution_clock::now(); + model->Infer(); + lightseq::cuda::print_time_duration(start, "one infer time", 0); + } + + for (int i = 0; i < model->get_output_size(); i++) { + const float* d_output; + d_output = static_cast(model->get_output_ptr(i)); + std::vector shape = model->get_output_shape(i); + std::cout << "output shape: "; + for (int j = 0; j < shape.size(); j++) { + std::cout << shape[j] << " "; + } + std::cout << std::endl; + + lightseq::cuda::print_vec(d_output, "output", 5); + } + + return 0; +} From 1078cc27968f7dbccaa99e89be03dc61fdab9b23 Mon Sep 17 00:00:00 2001 From: "weiyang.god" Date: Wed, 6 Apr 2022 22:01:02 +0800 Subject: [PATCH 24/49] format --- examples/inference/cpp/quant_bert_example.cc | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/examples/inference/cpp/quant_bert_example.cc b/examples/inference/cpp/quant_bert_example.cc index fdebe16d..a1d96121 100644 --- a/examples/inference/cpp/quant_bert_example.cc +++ b/examples/inference/cpp/quant_bert_example.cc @@ -15,7 +15,8 @@ int main(int argc, char* argv[]) { int batch_size = 1; int batch_seq_len = 10; - std::vector host_input = {101, 2859, 2758, 2051, 2157, 2005, 6629, 7566, 1012, 102}; + std::vector host_input = {101, 2859, 2758, 2051, 2157, + 2005, 6629, 7566, 1012, 102}; void* d_input; lightseq::cuda::CHECK_GPU_ERROR( From 33bb9058318531a4e3bcc06632a65b528d05b0ba Mon Sep 17 00:00:00 2001 From: "weiyang.god" Date: Fri, 8 Apr 2022 17:18:51 +0800 Subject: [PATCH 25/49] modify readme --- README.md | 4 ++-- docs/guide.md | 4 ++-- docs/training/images/single_step.png | Bin 240588 -> 361377 bytes examples/inference/cpp/bert_example.cc | 19 +++++++++++++++--- examples/inference/cpp/quant_bert_example.cc | 20 +++++++++++++++---- lightseq/inference/README.md | 2 +- lightseq/training/README.md | 4 ++-- 7 files changed, 39 insertions(+), 14 deletions(-) diff --git a/README.md b/README.md index 829ef337..02abb282 100644 --- a/README.md +++ b/README.md @@ -41,7 +41,7 @@ The following is a support matrix of LightSeq **inference** library compared wit ## Performance ### [>>> Training](./lightseq/training) -Here we present the experimental results on WMT14 English to German translation task based on Transformer-big 
models. We train Transformer models of different sizes on eight NVIDIA Tesla V100/NVIDIA Ampere A100 GPUs with data parallel and fp16 mixed precision. +Here we present the experimental results on WMT14 English to German translation task based on Transformer-big models. We train Transformer models of different sizes on eight NVIDIA Tesla V100/NVIDIA Tesla A100 GPUs with data parallel and fp16 mixed precision. [Fairseq](https://github.com/pytorch/fairseq) with [Apex](https://github.com/NVIDIA/apex) is choosed as our baseline. @@ -97,7 +97,7 @@ cd examples/inference/python then you can check the performance by simply running following commands. `hf_bart_export.py` is used to transform pytorch weights to LightSeq protobuffer. ```shell -python export/hf_bart_export.py +python export/huggingface/hf_bart_export.py python test/ls_bart.py ``` diff --git a/docs/guide.md b/docs/guide.md index 1fd427c3..651cc616 100644 --- a/docs/guide.md +++ b/docs/guide.md @@ -119,7 +119,7 @@ These functions can export the configuration, embedding, encoder and decoder wei LightSeq provides export examples of native Hugging Face BERT/BART/GPT2, Fairseq trained with LightSeq and LightSeq Transformer. All codes are available [here](../examples/inference/python/export). #### Fairseq -The main code is as follows (some parameters are omitted). Complete code is available [here](../examples/inference/python/export/ls_fs_transformer_export.py). +The main code is as follows (some parameters are omitted). Complete code is available [here](../examples/inference/python/export/fairseq/ls_fs_transformer_export.py). ```python model = Transformer() encoder_state_dict, decoder_state_dict = _extract_weight(state_dict) @@ -136,7 +136,7 @@ First, you need to divide the state dict into two parts of encoder and decoder, The above functions export the checkpoints to protobuf by default. Specify `save_pb=False` to export to hdf5 files. You can use the [Fairseq training example](../examples/training/fairseq) to obtain the trained checkpoints. #### Hugging Face -LightSeq provides three examples of exporting native Hugging Face models ([BERT](../examples/inference/python/export/hf_bert_export.py), [BART](../examples/inference/python/export/hf_bart_export.py) and [GPT2](../examples/inference/python/export/hf_gpt2_export.py)). Because these native models did not use LightSeq modules to pretrain, the users must manually make the export rules. +LightSeq provides three examples of exporting native Hugging Face models ([BERT](../examples/inference/python/export/huggingface/hf_bert_export.py), [BART](../examples/inference/python/export/huggingface/hf_bart_export.py) and [GPT2](../examples/inference/python/export/huggingface/hf_gpt2_export.py)). Because these native models did not use LightSeq modules to pretrain, the users must manually make the export rules. #### LightSeq Transformer LightSeq provide an example of exporting its own Transformer module, which is similar to Fairseq models export. You can use the [custom training example](../examples/training/custom) to obtain the trained checkpoints. This export example can also compare the results and speeds of forward propagation in training library, inference library loading both protobuf and hdf5 files. The results show that the inference library is faster than the forward propagation of training library by about 2x. 
diff --git a/docs/training/images/single_step.png b/docs/training/images/single_step.png
index aae28f40a974364ca5e9ee2877101c0c65170188..ea79e34c69ed2c1498cc9bb49598401626be9c61 100644
GIT binary patch
literal 361377
(base85-encoded binary data for the updated single_step.png figure not reproduced here)
zt}?Z(MNiWV7a~h0m4_%K95KGsrxmD(P)t%p<@e{A*}>Ud1|nZk{H9;uMlqYQ*yp!4 zljR?r#UC8H_#?;d$ERKyLkj`E#D?n?Xus<{HU{TLBJyjJt|sbaY)={hO#z9WT?8QBuZisoD7V}0$N8$B|) z3@B^bWB#i`h#1&KMNhO=L(WJy+4Q`qhuJ~A)U}x&C!1s^8iC*3&ecNUlE3C{inV-D zAq2Q>?<@rSj+U|?)yW5=LZcVuhg$ zCH}nANT(ih_#kUB;D{vVrA*;V^4+V2J2u?UCn{}6{cz>>*%j*`o}~YGW zn_*eXF3-tWNHPkhOw*pV_@>zE9Wn1G%_dFCB?!^S#(3Skkw#DKUhOcYw(4v6xPZ5o zoFs;0TqyzS7VBSc z@d!&5Bw4xAQXhI#S?k|e)w zJaEsL)jI7pkB4||q?7@-1E=@8rRV865MT@>q{uUn;9n5f;fWOe`|gqV_6uuC6>0v7 zj0*UO7IkE`zbf)9S$j741R?JBip)L)S;-^1m^N%OPQC7fy^$topC=X0vkLLJx$A2 zm^AV1(Dv}^_*FB#jgO!kE0%O^euU6SMVi)HX-?k?(RpsAmi5_hx^^{_50gmg61r*< zm-A8HuOH+2XajEChIlj@tj~Yg%vn*@z7g)IlXdyCbUU!;!lYzLTfvqW`#n{eugz%b z)8SH+2~jJ&ixO8jSPsf;C#p)UGxNz9fgO`db54;vt%Z_R?J?0%!LX94_Icy|&jI7} zvPJQOxbwf`8j5|VIn5&}+553Zzi!aC7U^GP6QI*F!704*J48y>&+L`%!rFM>)jx?^ zn*Wh#YPvh`8Z|pM99eqk)YmvQHxUsEOO)OVQ+vqqT|F`rFKC(d+Att_qE3c?kdcaW zPLz@cI*A2cw^DBtjo}&iRTK1I%fsaW_lYHA^EURo8zPbh3etxQ6>U*8mhur~;Rbgd zKgWELLC3%NE(E{4i|2KB#%h)1j2Y%P37_5X_pf?O^Y|V{+SFPEwNN!0kCjzXOz^Rm`!qxQXKTgK zql;d;GXdIJ)>5r5M4$^byZ6gD#nTn;&QO=Uew%*QkK}@Y{kVYX8doi=MGi}y58Ll_ zIigtlJ*#j9Z{6Y>0~cGTx_G*91#;dHgzgriRS+hsGwJr?3Q%wxCbTmem?{(rfHH7l zuu5(=ah^9v26?0&G5*y$!+plcBKOBqdO6?pIEtAggzQh8`YV5UDUOzlz7O)(w8yvF zW9YR%Xz%3{z#;W*xG33md8ngyc;{P3O%v$(16$%!G#|0{D&-ucoF>fQP0BVa?!Qv0 zgF@C^X*-O{CbD`T*zzlf4r)-=$Hr!Q#W#?)SgrNigTsvE*U>uToRyh_JBSOJ^1!G1 zPhsFD@)6jN%Y54vDjrX<58zNlT@0|Jv|36Y1L^K+JLoa_^U$84?lDut`GXU|bK@nQ zA_jGtXT1GW8I>hCvoBl8ZcFX7HPh3K$uSu7lo9x><(chEG+P{0T6u}OY{!(?*TD-6)B>{^`bvSnzjl6>ya?b9e(ek|Fx` z<&LKdJEmKI!0`;n-)t!s%DviwPvE*cbLsny^R^D+j6i zthbUFEZP9LzY2=8_mq8Q4ez<+BPm=dCP-|V%6rU`TvH;DJw8Vl;QDQ(mYk1dKG<({ z@g}J|Ib>2f7))=5aV|zZseG=;zWb+eIzAPUxaf>dEz7wQUNuPzRruX67+Hx&QPhIt zv#Z|Q)1lJmQS}wbX34c*wWN#or?v-lo^>2;zX zygtiAg*=O2KvQbWZ_tFOq4+O`lWFF#KKw}@+$B$Nw>+!2U2Na$;1DQWl|M5LN1?lF z39XZF-mz*w=F^^$ajGWsRNiihcoIad;>Mt@Vs3W%faF3Dk`^jrE&Xl*6m&8V8^87< zIfrh$N)p!-MXY{xxS0!R^DRmE8~SEG^OK+V&)xs|ty`X1B=f8(0s zm3F*1{^V%zW1;L#fP1Aadqo#%-HQ8gP-nDpoO3&pI2WNDu}8f~5?Auv=pebVEdpxL z{af7)3)~e;1NrOe5ut}=@7g50kSb29jCdQE@84fMwe}o?0v zz1DnT=Q7rnF9z^PcCg#aYFrk3tx;0>qc-ZTon;2jAm2#UAtX7Gk-?`f6F(Mpx!Ye@ z=lX5DIT*JDuy@($d;Cj^lFQdj(1B+Xi<`P+pk-OryX& z`Ef_gF?8Sc&B)|N_YF=upVUCi+q6Ml@>QIzjAQo=81UyEAXSo_ zTG02~ldzbGEIfxi83K|UWuC8kOkQ(!eY#O(Ctu{Lv?K6~Ju2Kc&xX^u#wp}OP;TKv zK;!6`v8-s>g)1_AciER7$-PwZsQ`ZcO+2c}{m;^X7$URYU4BEM8!W|0qDAslF=8XG zz*^u8@+F5`1FmVS3 zFke?M-tu|fyqCzSe>Pv`0h_+bcOr>Kec`!6b{`gT5T zeNAq@HDceUrq(}o`|;+!jzkfx7rkJ3V;)k{jfnfd{9qAZA4142BRR^X!p}-5pL(x% z2;KkwQQ-8COf+K56s*r+b7H(pKBZ&$NN&8wZtTeK87sZ4$fgBy@Y%OSkguX@>}rhX zUVU0&EYtA2n*Y0Odi1zxRFX%J*E=#N)jy1}FyS<7thZ)FuH!vp=BAfv8vV2j5N}N) zB%)O3%NQ}st)LyRSYtenirSlt*UTcHHjE?cPY&KyBl=pL%+!-jg)BFn2>+u%9Z5HA zcm<9m{aMwBZU@Zz!f!?kt1)n?^}B|XbLf|TH-4Nd?C1a<-MEyR5wlLIUZ7qJVj+ox zV(a{UV0PYsG|?XZ+X9!1iKlL#tRVy9#~$|FT)G9)=!$kSmg(IgY3wT`;ZC*fVhkBj zxQ~`V&(@k}FJr)(2ZjEph(NB$U#bdM!2DUU^fzdj+=j!u-UMh`JTFWbzB8q}1F6V( zK$h_8PzieOQk zR@JXKC>rV-anlmTpd(84UkE}xauy#Gzt$ZWN4e|O#Za2MzeU`N(%4B{6O@du^`>>4 zk&8})ZBWQQ75)f161Bcc=2@c7J^HG2T$oYqx@@tt19B#D?^fkvUAOOVH;mthYG?0S zwSFp7+SrFLE`8}Fa~?=O5>AYDG^zO=TSc1g2TA+=fDk!6of%k&1BEeg#2T{(#^?x} zs)5HVnAyydHFB_4Iv(yDj2h2RCyCqv51djeB&`F#awGH3Sa06Ipq0jahyr zuLkKL$IF;&egfJTs8DO$;0`i5555ZupFXYZ1EsYB-a5wUFw2Qko~{xL@4^+4rG7;D zju(D;^Cl!2lN%~sP`QYyE`-lzW_xUAy06XT@#0&Yd1_*PVS~-E4j(xa=`ufu8p>VE zk_#`$YB3S78kVS5N?i@0xVFnb!yp|AD z2RK|>c2nzWpyOjxv#6&etOhnbCZF$*fhI{E;-VqPbzC=bj63hXi=<2CPcn>k zf`Io5*Sy-ZL54n3Y;ak>EAR}i8Ub;q?I<{uBvZ%<=PV?~Rr?(s9^;4A59hm*Ut0?B ztxndKi}yxI4xJ6Z47vBp^P+3{(bulnG6-ZlgFybwBPHFe$BMWBYla=@*v2O$n66?|-NBxq1IV 
zo-e#dP1634hRuqiLa^tgZex`7#xeBn2@ugrc-NFaOci%4r6FVg?J9Eg3-B2l*DM3h z)`SN-aamCNEvx=LfsxN}DfAPRv(%Fj*DbaDPaju9$0<~RvGNQ>RMs^1&<>^4X&5QZAQvpE8W%9cT<2C;DQV4|`wJ_#OHi zLWY-uR3rwu3b(M$5Suu7E0C6INeUx&ve$5AxG~hK`bAK1CX^b8X}oaU^?<0%dKQ=c z-PhXd9SFDu5M9&WSfPvLi_%$lVxIiXc`wGHEzrl;$|vTkB#zUle|dTwVv*leK$%ct{UJ}UP$ze)ZsfJwkDE@$Rd!=FTW)3P*9^}Erzy-m zrx$cHUrJvO{>gr}4@28#n9nNU}AKIv!%W%0qs;NU!9Qc?SEf4TLMp5>Q21ZxtLm zdB9V*#1Q{cloNxl@uF85uU4l2btvN0S!~PRUH%v^=0O2z!YPbH)yx~OZu1N$2RR~- z;y0F+aj|oej5W-A_D}jLJ^NYEc7*A?RIUErj^jbSDpRxBbNRh`?ee?o!qqS8=C;#J z+m{c4e~uIJb*F(CmG{BjY}Wd`YK$zrWmwN6#Qn4}K~B3=G^XbgTpcr6uyR-eXi2TG zGR}Ge>HD!O;2NUpR(!0A`1UDuB|5ng)-PEwv2*2OQNo~wJd^OA6QQw zfQIZQ0z(-tAdW^mKXY*O(w9DHi1&a%Nm?9q@l#Ce2)K+O5?$&`oJ*hsHGX|@{*pJ$ zbFuqW@8J;l8v4Yg8tSy1+!3XO)B^2@vszIFpe}OiPe{nV9AEbZrkD1^=thSNj0Phc zr)>qw1}*P*m+cl{AvcwUk&tB$nN;4Z-1lbn2*{9#Qd_UT*`Z!pngSba4z!l{P7WW>(kq;X6$PAB6s)YLO)b8}KBp5IRF6P!^=RO**an9Z| zc=4IWqBB7`_QD`O)*M~{ExFY;4}1PpN%f~?%_&o%cVi8P_B-qI)}MA@mFI{)a%14E zxq*&fGKT<=>I^uA&htagkVJ^5s^O3PlI5TsHF0}t9Z8wOckc4thekd5%0D2JH_X>~ zs&<zDqspbXJdeba1cj;6{uw%Co|s~EKvKZlWI@L-tKq~cRj@P;4j zc|1$x6dUL_pF>ROB1Vh!`$$ZbDT@vzkM=*bMxUqoc0cch<5*csRAGZ`n#9GS5~EQh zTlzhH1W3>QRp%5UEkA+YkvW(8K4KC9CA;$!1xjS4VwHg!t~O_KB_6e}$vU`a_-u_i z)K*=dD&%t4%A!_hC?6EZy{U8N?@jHI?%x_GaY?*!ef2K2EQ^uVz|9w?4aVgHnk75S zB+MXyS|~Os58OjJN8NDkrb$6l@0}r zdzy;VpXTn?PYSe}X{_%I^OdC|M)#7>BrdtN#&SJ7QM1Yxdb7pq z(7Qr>En&u+g!G<|J8TpY!Iw zh#gITPckN}*HK6(ttkr`kIRMrNrQi2RexlG&p&q~Tj^vWT& zn#1@6gICPQNopV${VMcICs$<);Hz;+Y`%j*yGtsHdITU#_TXF!g2fHh>SjM&t7b14 z!(QbyEfXJS@T)UKv*-|!%j^FD*RTYo>j}{2|NAs(Z^2;tlt?rU6JGAks*BpNm89#O zZp9JRMuuG?wxJPhP50ZP&)*@}+^&01it3MH%Z8Kctqat21ksm0>3%XCOp#czU2y+` zc1>)>%6h-zoR)RDj}>v&`*X3+S`tw~jX;K-DuHJ+0hIsxf}|~#Un}QD=dN%+^C$~y zR2w=j<^Cib4a#(d7}V56zTg+g^P-BZ7@u0ei6N+1=VoE!cLIDpL61^(kJ@qufidG; z74m8mXg?chGrJy#hd&Q-P6nNXM(xR^mW93yN0ev(vXDAo1`JFHkPVURC_6u*A~Ns0 zRnEKB)V3ucAkn83ktp{GGLC{b4mWC5nd`#H4bj+~BTyER!8-A57Ru&Jy9mpP z$Cg2lboiBwhC3Kaz9Q$;tLgA8!TH|*OtyH&80n|QY+@X~^5fn=`rIP9H=f$Z^||Tj zxRyfUqD4i*3cc+G%Y4$^fWS7quG@Ow102jSO%i|7Q^~rttB@D;;eokCQL1Hfzxb8A zBO(R|BZ0EPHF&fle0ewI(eZOdwS81NO6ouf#C+_xyMf9_s=vTZj=BP8@kreB7Y>h~ z;_=%R&BW8S66c;pK5FU#IpY}c|ORF~>GQ zg;lKe6%Zw38o_DvwEZ^J*?IT4CFUvyUfz=RtD7**kk`$Zh_suk8(!#satBcqs&43% znJX`%Skom^pm2wJ;JBNx0vWu8$bnS6f7VFi?hnx-6q4D}Q|I-Kmf?$lU z%_FPkXeWc{6Td?jvyAavuVk6)L1Kt(G&QiV!X_-RsS?i0ecg%^OgJr;b|Dw?jjux2 zNxMxm`JHFp7tFsKB`x`YD=>6nKED^IEw&(PmH{a|Xz84O2PLm0yK z_^V7l6;RBQDIb6sJ-FP6=Tv|!+(gAXmoVYn$2KSnY$Xpi<9#5`YNWli{=njm_y|rq)GU#Jt}P_+_ba{_$0JCTl>B|kJ? z?_M)~4c62xiPMCCKU-vV%~_y#&QHR;#_Dm+suq^xwR;>*1y22m4HSv!)SyNrnkh=XaUQ=Ss zzLqL@-E8k8pkHY#Ws;97ljD^{m~_~I9pl7qoX>;QU2hw1U_y7UJWD=u95@!n#KYlI zj1=6Z|7=^}yU>|6dP4_VYw?peF!fV)d~+xPVy^`5_NQ>7LC#dQ-^?rnBhRlFaakuB zUt${SAAzKlt1w>W?=#0=%9I{Kg(3Ai%T;u}h=O+Sw88V8Ar=y|d#jUUpHUw9dz)YH zkXY?Q=c!j)*5mMhAoQ+S?JLcSPyngL&wmXbWToJ`(VuWZp1728-I%n!v-+u$RU&j2 zaZLt(T0W6W&&B)axbv4FY!0b(>8DCqcU zQlN_Ip9@BUt-&$YsF{F}@n2V76S4wY5Nb3LU7mBH`5AeD?~LH!Z60H2m!CKsr0Cp8 zVqDS*qYg;%b2T^7@tc14seD~f^So*UO4oV^5d;MgNFJNp~9K7-d z``-(GW6G+PqeKkNz^4;io_DPNEQde&sR}8eO0B!UzMb}m)I-zhJ7Ge0ZyREMNcfHOxy@TR3t#?lmhD|Lf@&X=!6RpHg&ShlO$4`qt zF^`TS!0#~8YTx#Q$C2WTR*u-hZ*FNG26^eT^vBQmqYFAUX_uY{IQQNMw*Lh5)3fwE zp}A3KJU*R@cvf76ie3Z6*3xD2ADtOe3dHri-dP-2o;4FPwp6eyc;_1O?dPwM-`s{3 zeD>4sz;Z#E{NT;sxYcD?;wHXWkb#S^gD*-f?${J+ae|H2)EaBPx2^&7$7_91a1KE zor)j)+<_X)8a1FKS3vxtSAh2m#_=Z~Uo)L15i~LKthpLd^D0F~)um4X#Gv3#bW8V_ zW<7DVY&}LIF13)I72x0H&}$e6>cA~TO$~(a7Qub!*6=qCmKc=Z)fUelw)_-#EjkZx=DH(jfcTPS>#(HNS70wR+tFV)EMtXfB>@qdl>? 
zatNT4ODp1(`K=!~&MQIPu?_JqQ;6YGYu~gN3FYq}pFbn=qrcbFajo4++>c&}KOr=c zbLO&(`sb~l>K7(AinMjAg(uEX8`eUoV)cS%5Yh$cmClCAKrFxU)QLC+Iwqk_S&+%1 znNDl`0zeF>ZnS^y9NywONx;s=;_FqtQe)BQ{ZhbZy~Q@C21V%SGa6Jdh0JLbk_rnL zA*U=BpdhjmLAwcOA!U(a<@0Z<#gYPrtBjwUtQ(k=utU@K@0na=BIn3Rm%0NnjHMd? zVyx@#x09&@N;shUlV=ya64B9ckdG>^bc05_I#opWjG$@#;qJGmcTxv&|Ls1}C1USa zS!ClMc*>N1z5UkD|$Ahe={Zd#_(w`WkqncYP5Vk*;|dMAi~Xsd;Vl&1u34I0?^=HqSqZB*vV+WJ8D^ zg(dZ$Gxp2WS0)@pA9Vj)A_cY0+h>GPoQ^fqXOln4MdFi#=F5`W`33KB8}OgCxP}F^ zj%;9<$jUcIE^;a^B609OKA;ZFQ;Vq*^M3 z@;BOvRyRh?{LZIU0X8nOh3PwE@bxq$;Jv)8EvE64fbrpvuLD&DqwN&|i`ur|{2>f# zC#`(^{ke~Mg+`?%n4bT7RE}ine{ByXdURCkwf8EouL7A%%6kA9FMB{jH&!9*x$ZUH z3XG=}=-MdLbKFB(YJdG69V>Dtvgc!;{sf~d`=+DA*UIi|033`%&@TZJnqH<c#&gX-q0~Z5Y&z(n!p? z6D`^bHPJ1gtE?qwuL59q;~R);Chn=7*z;0=M*NI_v$4=Kfgdn=8P?idDIoiP)q8a> zAqn}i3$KKhmbEKjk|!E$wnk~gwG%eP^_1qgS&8If2>Rq=q?H9EwBPIsTs0Xea!ntS zBNp`blQjcVr8iz>wV>w1eq9+o5g1XJ0l=c`so>M1UxeC3OiJ`FTf#aL*;KFPI88Ud zJ?8?4hV`oRzYR1&qHKnrpNaAW52pvS061d6 zbB0a?W6q;-9Nj%h%o2mb`XAP52Qy>?;~H!Vceg^T(jeF#a@DzTX?2T7aS;#$u@Z=6P%UQ(cS|5 z-3j<}qTjJrq~nM?SS8dI%PvUU1H6#lA}kYHUT9=aqrAJYW#i`%R^tJLptt3qFLvPy z#MLI~g?<_`BMYq8Tvkg_A;^Me32g+7q;4~kQ?D2+9XOYX!0%xfS^PVJM`BqeRy*3e zHpu(;Z@nape0V23!>3c~2a58_f5ZDAeatMo#-KG}^?%Sppss!sfaP`_e#g%T8gnW{(^`|5ZVx}?)Gw`o zRzM%NaQ!j0z4y#)IBU96!B<2FI0e}~OEN*QT?sT^te|GN%-b7NBTsMvHP+AnH_ZaV zMNA3?Xvg>vJoek<_S)=+dpg!@_1gx*w{rJ}FK0yq#) z^LF3QC0x^ZWlr%TXzv~WKLOKUY5`lRVgj0YG%(glJ^jiUVqWnYjyIEFl*u&?$>5;A zf!0tG$-eZf6Nzhu8k5+5|A`c6=vV?I{&K|sxxdqBxw}Y0M0(pBN?3$@_yOk)NHYwe zt-?Km;r!8BCVs$x_&6mTBPT&weVeRxNotUZ|2|#{qqMk+;*V)Xe|-_Kw}8_l77+5D z0-1Nf?voMJOl`JT12ji;ePKWKJ)n{Gj+7WJ^2ocqz)hYj4839+z&$DDIF*F);< zhU`C;!QWY(NC3TTh=^q!l710#9wQHWSsj=>uW!Hfx7O}Y z=6NAtO7Y2>iVNY%n?7;>$3Mn&VJp?Z91-CO5W1Tqc$Bg?{VN3s4Y#+{EMO%rufnN& z#bNAmqSNVrKL{F`Qkk*&N4sCI{JUWOd9r`LB%oduDOL5J>lOx`O5idW1RN$^%K+zu zl0b@%8cionF1uOqa|p+L89Zr&d-6H{*G~zk zm-#Z`m^3DQIq9ygo4CmbKBI5>sQ>v-+g~H41wTxLd=2(bw9_yJg)l(j(}7=9I#P}( zCBNx@()EEm-YVb$AheBbH(zsu$Se~bUUs2nbYg&BrSed0pU z4HyCU?Q@e~XLHV-|NUbBw?Cz$Mj%k2wGM&n)ZH_<6csQmWEh5h*+MVKrDDqo|34QL zCi4ZF1Csm+>dO_6gwZltAvHv6*IbkS+Otf340_sc0;L zxpZPGakte95bV`(ubUQcK=#2ofk6$ECh$#-3bg1aV0iR{jZK2X{SD1)ewRof?aNTz zi(9%3u!?gjBV=pkNSh6zeFKj}<4as@70a$f=ZTHTL>T~kze`YwVG*Dw7X4W({F6%# zg&hFoJJD)OLH&*Aa3=NjU$N);%F=V9{8Z9G7iH*Is%lg{$G}Qx4}gTz9%O`PdT7C4 z6|qTp>*0wBg?kxzW#{U1^&CJ6BJM1o5~-~rKzsz1$VcEbnFCqkV~HGOVRk^29I0`h zomzE9`}eicxnd=78;yWhFy>-rtXM-9>&a)=Zh+pE%v%lC7o80iP$`dq5qO&YDaRAL zwEY93i(VU4<+Z?`D^_x3;BQ$P$S%*hI%d+*8bz}L<49hD=(Wvz7~rcFutQwGdT^bbe7k`(+pCSpuv}HEy|t)kiJNz#Y+15+Jd%DdAI~1 zI(sFnbvWA}+wIi3{cwQp>?=ea*6NcOSxdhwg`8;%VAP(SS28ryF>R7<46Zmw;ptyWxS% ziT70-Dg`BMF;r3a!CPViN@z>ekO(j@QJR=m?EgOOQy6jnZ=iE#4yb=QsCmaAEla@Q z&cZ(NSsvkUBs>bsmy3*96DZ~&g{}Z>+W>SJ#d6^!%AwPc&tZ)Z0S+oKgXbsP90h0U z4S+W$b74NOyu=~o7Zr6qeCySY%@YXA`WX+mJ@DxFo3SO*pRPzKg78zP9`J zhMy|n{3r){Xa&|K@|t~u2IyQBW3)&!_BAFQ13(Fj+wPcx_^-}|P858IiUB~m58~O6 zV1Hr*osmj-Fs0j#Wq*>kG z1Tqmy2(_`;R}JX80iZ|^bYYUcPV_$PiPZ`O)0`RmtUBM2FkwhusMVC%%Ve4c1nVtC z=)`~w;wmv~l3+KHp7|Bvb>bST|WvJroQ+;FS;Hxm4*ki#(vF^)> zh^&*~A5;h0+gGZTkd2xd0rg$|Yc?{@k`p*g5nCw^Cu8JkgaGmX==$!s9Q*$LBq>D7 zDyh&A3hkm?6rxF_C8J&1Nok8R(~|b0vPyeMi!`-qFC(FemXdzQ$9>av|DMo4b`DH1K*7;wjq7H|#q62iSGoS}BpR*=YyD4k zu_~X0EjuKuqIVzIy?KGfd#VJ?J^p31=BmpN zhiJjV-Vy(^rBUrdkiiWij>bg0CI-m5Ls&jK5Djd$>j_)RlVAeURyF#M6L%Ek1QbQD zate0gy8#OOn%i)<13r|d*9stzbijfuspVH>v&rh+Cgegewv2Q6hlV!ww_?dlyZ77ocQ*hU?VRo*)c`6DLm=KqUavBv zJyvVFvJ-R~lYr&K=H8CXgpSgwjNi;vP+pa-LCmC$t^4C@^GjocqVQSfW~kY4Yz z-(aInl`Tk|f66TGiRFIpbw!(VJDLMO32K-vk8b(^mZ>=7mW*3BHXnOP;+F0;`FMv` 
z_*Rfc06sCSTCXF?j-ui*m`WK**mzrJ!RP|QnB^VnQZv4MR7IQkb#7i-zi#CR=T)a~n}3+if5)VIL0#jSgu=wu5iFTZ;7KU1Xv zL0T4rX+IqWQ5)~VlbL46$Re8o{4j?*&zIU4+!#M}^4+k2VFk|Dx`N1$n&QnYMvqSx zb|r;yK)Go5MM3?_Z%>PVZl7lm_Xrpct;p#UIoqxZCAYs{eV7U{rc^SM(G4D8=NF0P zkB&cyGBqR{WW%=?VYQR=rY$?)8!SnB9B3_Hvjz;bYy{L6m<{w?T9sxNtawFUz^xAS zv)Y}j@RlCRCcEiWpBf(Jes%Oruw!h6QRZ#xyG%)AW>_5FyX%e~2P&66{v+wdFWr~z zzp-X{1lh>FqcH|`c2lxL_Y)pAPfok{_CCC^#$RXA+&%_U2NPul31I-7FsjLkc8QmaCf zYzdw51WbuFV96kQOCU*6vVm1+GVQG`Xw|}Q?xk$TEm-yYR<+X&>|CKirLgPNDwPlH zw1*~AU8fx+9DK#amB)P*g1*9B=BZJ#UcPka$7Joy&717NzbDWdAUh#IApRWqKze7D)UKy5z5x_5T=eels zsY{`#z(Um_JQDmaJ{vFjY8HuQT(U0NO}98@+OSamSl>Wado@pP9r9odFEM^3pt|I( zU4Dw)VA`+N1%vt=8T*Y}fo=HfHM^(@Ge?;;!<~9ZSD=!q*KGgqCWzwAXpl$QI4}_N ztTvP}*Lc&0my1Imm|Mx(K~kWFe14G+qd_5H-#}zjt+iRBSH0}MmAUMWI(98lMF}y+ zvyiNOsNwUpK>MU5v>eq|`bf^sv_1@$Q;BCMA%~Lb;sz`Jm$?mB9X&76tpT{F{QB!G z!u}u(8%4Ht1*gcpdtdG+x$zax^0nqtHYsyZR+% zs7lFl2Qr%-8Cw0N0^Ngf$NHotqmTL!3IgBu1rE-OU;Q-|TqyT71HCEhF;79d8S}B3 zfIxzAh(92G=O${fkCMEe+vYj0b||9d`MddyvG1jx5G(WI8R#qd4Bf?+a)N{+P<`bc zJag0KmP(a~x((Ie^&#X^j1&wSjcMZ#AJql*T^uG`=Q(~Hkd%r0qH*H$ekZlT9 zczBy=ROepM+yIZ#H;1eUj;n(-o&*(i|f6r_%3MDqloGD>2hxXSxs=OG&#EWOJN8pTggw8T3rdEl{1TWHgv_5muk*|9CqdV;bWhU7w(udk}~aqP#Y!oDnp8Ol9Ks zp$JaGh-e&fuoViZGh~Z$#u?|Cu#)ArbfKcA0cg6aML#F$TfFttFU6j1+OnG?GVVW7 zLruv^6mvQ}SZpaT<##vo7W%m97x$g7%AtIZ{@8`mLDM4Cn_;e{;&x&z-=SP%fprba z_x){!3Q(OW?bVC_jtu;S^@8tSs+-6>LQt8zsJasvPrn5DDI|7R4bns!er}h{mg5Xb zAytFxZ)e`p6ga6O)w|*(PwcXV>BhBL*4F|FZ^JQ)nltg5n>+O)*V;QMWf`4#GyEha znsS6YzGDGQMWoq{$j9s7ens1+yf?^k3k%k?D-xG7lz}^pj%U29_xYkM`rNSb{pKMb*!zsS;gldF zesB5=RLSRVWY7+RrJO3ycRBM=G8)V*{j#MKBu7QnmDr`t8j|WVi#IT>wF z_c|M3N-J8uvm(Yd_WH9Z!`o44o1wGbepQ%|!pu?pu*$nY(%Y0_s?PuMN>eDB_$@gc zRsk}bMhm^HTQz;p{|eobob$X3l1EV;@Ceo4*WeFIm-VxOASZ)g_3 zDiPGL8u;Xhpept#gm`F}u|84qedFI@Jp(HSMg7_i#M1lKve+g581=H}zOF4#o}D4K z!>`lA%A4+GE>mWp?v)=L=pp06Na=6o8zVnS-IdvT?BUUoa=em8AW1OP1iq!v1~uzB zSBu4~#qJ&YPPr$d3zrC`xSM1~%P1wf$_>&vrKR>DU+aZgURo#0Go@kqRlY+I&Go9+ z>M5s+dqVzm_STPpGz%2(`w1TsImr_D>RVjEUD>nmf;wfdZ+bQA_Flg~f#7#!+|OW; z=jF|Q_o`Zw0~j^I=HVJ~m{C?snU_D;j6G%xlcL};_-yssj6a%Rb$d%6pBaYJCyxuY z{#pa=N0MghhE5{^u2_whAAW*k$Bv)q1zl-NU)C~Z=_=uE+6p=A4YS{&loF<6uis=d z2V}u$MK(UNyGIb=&^PyUCZwf8p5rP5k$$5reL$*`g`~MTobYspqO{KWwq|) z1ou{jR+J0}g97vvN>WiL#AxRD{(sZNFpp9f+k5IJ&#*eW2Oxw(nP~H1=jWgAUdTt? 
zr6Tmu(8+Tj;1C_%aCIT}aW8fTNiPZV9=@_0qkC)C5?K*i3cma}#_r=!{eY{RWq-rg z(gW5OXj1ZeId@@&WrGs!^mDQpI)R5>u6z#bG&ZQ z%4gnlGQ;HX$!g1KqCDnwn>-yZ4i5n1sFF*zo!}>a8EGTo!NPF-AZk><7ly$NOUcBVqHGZ zILDTg>aZx7(n|5wG4Eq!Ar#s?SX`XN#^RbEdo*m@L{C3vR7+xboiNIeKL2d+3xJ`y zOdy0>=D#ny)je3y;D71(=gW&tUPd>ho4|;w#*ZRzx&G^3-}Ttro=Vm^$9wu4>Y;q; z;Cp1)D(Z6=!F$JdE1xQxC3ASL-%iHKARfDquPTl%Oi&7C*@cxyJ`2h!thrBLx6i#7SMW<`HwSw>$0t#>3Ll{qA9pCWy+8SVKNv%26-PQ@oE5 z9v9sqb-1>FXEG78i2A#LAlwc@4r>Rf>*MWKyUEt_fiUyrgPRf^)ENJh$?)g+2 znRYk|`7qLL>N8`uJAQ~blJxNcL{%YUhv&=sD8(a_;w~J+Z-~}Ati3)eaMKjliQ4tT zkIV$Hu3eb%2G!>m%^BleIlqEh7(JO?tf*_?PXLiHzr}k0yoLFMI@c zkOuY9Q+xhHPI2eU4du5?@r>7PmZL8BG8th$Qw$l)sW7!6j6JGDcDz7h2~tbs_3c6c z*s`{2rBLj(vzt1n_-@UPQi~J4>Ll^85ACu@hwC4cI%1 zd-`glyE9l0>o{BrYNKwb zs@Duz6?yg%Z$gs@%W+)u`{J1q&j@jl^ZSKvV~DPub(XE5ntH%_joD9f^2$cv6kyTq zWEbqJ?uvNH`ZaJ>RBTTA2DCvt?iTMcsWF*|sC|o>^ch3%is{DLxHi@Gu?TK@Evfza zOhydpvxMF_W%T7xO-xQ@)plJ>F>e;?qg&s66%x`sM$IHzUA)Z%XL5d|^=$QA2~R_N zP_YM4TpFHZ&baHZAAJr;S(s`bIX@Z11?RtXLYWSVy9=bjQ&KLjn`maQtq=&XpJ}}0 zh18}AqOW5TUjcv^NI_$~ku9InIi1KTZP&cyk;l`Nm7g!MS}nN#C8R*6ghpDiUQR%P=mJ<`&?5<$WJU;$ zwyxJ8JZSL?qb6m_8*Jj8{p)(6iL%|Wm?yO^dqileok5$mbWzVOBIvm=#M>nufa^q=wa^S)j}CCU1}tLDIc9& zyVN=t-F$;>T>TB61fMajq~@#mc`apUk3mK6ssWV)Num1JL`Lt#&5NTiPqPVLkp4}L zx+H?Tid5MW#zmIU=MWEprwY)-7()<}1SGOkxKn*1h+K*b02n^p+*{)g2_9NB6H2^b`I8``jz%`~@gI?sT@V z_Q;7Br>$7(!zZhFd$VEX*9go*vKLtctM%lT3rw8-QR_}vyV=+mEWfbQUReUWwtdi?z?#&&Djb(=Fc6|fJvmwC3VJx> zVG3$8vTBuaIon-~t{mXGJtQs}xJj<5G;JN3I(OcCP-D0wao#$rK?6YEpLaa4m%QT6 zq4mxNjC}CJ`*8~Ga26;OVMo>v9gIL8O;Cg8Cnp%*v;E>&e_$sUR1MDT zfc=v5FMY5r4&*i_QYNixD^9twq z@?pCTd}ekol#qH|A-EyCM%g@ShUlL-5NnKaVb`m8z9r|y=-K! zu7Oky)g7feKL7qUlvfSWp)uGjHrGio>}bS=FAoT_f5~|pERQS}K@UedOuD{$!p7>L z#C##y{$8{__vc(Vlo}pGsil*KTOW zDo1`kn^X=J<}GrBdo7`CerbL(d?ehwz8AW}!Sq_$bi~R3j70nr(NR(KL0Mz;AVva` zDx*CBzq5ccw<9O%c72wQzB`vzki?GMChij_q$$0=!85jxHyycmh46Ep!aVeGsQ&Hu z(tmRh$#{Mj=m5aJZOFcFj4?;zu_SFV^++;mh|^c!o*2kl~HWFPKhzKKx$x z&%buMOvoFU_;aOnr0O5Aj_{v9+Y#%-8RlQM2`=z+C_~W35j5|b5Nw~tsrfu^fwFcS zu_s)kvXgj*z=Go5^pXYC&)IJ^BOp#L((mjvmX?y5MM%SR$4AOoI=AsyvTiFO7ns)( z`mf6UB>LtD>Sbh+vr?ujGK=qJcTFW0#@rtWj8!XdHy-MedG*H&fOvBLxw4Uya=TROa*NNX$zi2m;HLO|;iR_VA+T-?p zIWN6^1PKqm=FT{Z(N1|gUDgvV1I3et5{m1CcJ&5qO}bV2`=Oi~W`zerUM2|0%*KSj zLCnx-)L6j|rGc+N{?)yEZ-QJb+o;TxxcxZq)s$;4S_bRO@d0KS>(?14z)iHnw)>mMO6bZ zm%pG3`ZWyZw8}=wI*zQ5Pmrd}2>x~gWra!yho1E&i995+up`U(^s&l!z2+#U&HF&J znI69%>VI*GYmEZW^WF{pS6m9fDd9`>9T9i?Iaz*YuD`3sWPFJn55j&=~&WoOCGeq6*&<;|@6r`3oIKk0s*h>!fWcTlF^dwOVV z>bAP^fqlDv+pwHx2@y8Dra14DPB+Yr`h0)O4+W}!Ve7Ml{s>Ng_iq(yHlb_GAk&gM zBT*nPTh$UyW+eQZm)h0J5QCo1yL|huZP`$(f2;r`OwDYEJUvj!JCCFcAQ|S}eoy-+ z-Y_dHAY7462&n8DmA%9$BRf8Kd9MNVagezt?#ul?dnf3Cs+R#(mx$Kw5XC8UC2H^# z=ty?Rb-s4gdZ3oUq87`jL&wgxSK%~?2U-f;(KE8sfm)eDSh(QC&?BETUdd#jDmT=` z;_9r|-~2@NekB|T1lU*~UYY}mxfhVx>Ba@Q^}K(B5u(Y5O`e1%JONlwKz;bONaQUL z27bu>11tEwpmU3qZU|m79G+w)^R0AsQcFF*asSTy1a_fV@^Zf|g4e3XCiG=6d?k3}X3tVWZ64j{>wHLta|^s>)^!}wmcOdD!nR}Nxn*R3x*Rc` zLWGvi#AC^Iz2JVSO5n|+{{~DvfI1qXopM`fVG}hsq`D_OPuZ*n;TAmjpa+(gUC&x& z5>wNUETL|qnai)!A%n5n>+Gf5roG^zO<&spgCD9=*Y z5D?9!@w=fSJBzi53CG-ln3k&@Bjb!MPewO6hxtg$ZAI|`J_nWk*3lhf#clK@M>RB973%5>PWdhj_aMhol>8qxsH3a2jizsDc1@i#h8IL(A# zT8S)|jxxvaVJ*=))RI*g>b1tra- zGfZr6Im2;4N&`xSkYUg^l&n?v+R?q6xAYGK6Rp+yx*m60>ZkmHmX>TGVKoHyFU2Qu z12eNcm)5ThjFs;IjTV!|6OCUnj55cDpy~zr@mK#7E;(tTu1!g=k*?y|;CJN**2YxU zUFQk;1@23S2uam->8&RuH$IU6Jt;b1s#C(FarEKQ3Kf~LqkSI}g;74GVH3B!U;(UO ziO-s~>O0%NuX;MXpxR$qj@hof_EfdD^4MUt;LUw8{yM)BOsMgeD83Z5^dQGY6G>jv zqvP8Mr<+(KH=%1MUP~2GT0i2%);~p5)q&c&LP{T1o^QBDNyX*&sL!rfq(V!ZP@(nr zCzCt$K#5|>jKH4E%CT$e)7_eN8~lH5wm1Mm;2?nC?eS8%lkcvxI&I~xgqPk%W)rN2 
z79sl!G;xv}M=dapAnToTgKj*vd42dmDBZ`uRl+RqNNsXtj0eq9_Mf|{2q^G-r%<;1 z+|U?^4Z)p^VmwU>+k#m3={&;aU*S!*qGiY@a7#rdr@ zq>CXrC>>}YXiXm^G$&kb%Tn3NQ+-8;?+Ul6Fi`i>rMY?Nsf8C}vTQ#iBB;&@A@^YESN*@7_XM@{VO0hQgx zOQw+#;bWAj7%81;vTA3wN;W1}T0kKM)`svLWcmF1LEu1Fe!EM@+Y;1PpAIcHE4myk z^Z>L7;MjXH`~Hvs3@l<(3OsvLNgD)^qgj3j`_O%^$+Dg{{{bf zgFsBLqUcR)d-2L}`9vZ8jUF_#4BX_G$?HWu#a@QUV6h2ygKc8!d$+&dBeAnhHQnM; zvuOH$@b;eff2*tIRq&$#EHL=Ee0Rb5SrJeAr2FAhpZR%scwR|ky1WrjHUNKqNve!o z#P2JinHO8aID6plkK*>Aa5KD4KA|%zC-?7P_~);39wkoRhuQbe--$|}mz-i~87Y{u z@vj!r-5z=%h`Mz5F&8BntsZjwHCMRRUz39jv@#m;HNp zaLFEvk3P~+PBW@zw|G>?!d2X5e-lw;d-SQ3Tl)Wg4Hp55hxaxyw#8Imp3Y@M%-Ca) zg}1^}XOT!(SH81yr+Jcjk|+C9EtA*dsM5VXThv`ht8?wi)_Bf9UN8y_kH6_IrJr5oKlbh}d5ZF<}y@Zk?+h1FK2 zH$F*7-THG*ejk>(Uvd^P$GqMAT=wt9;qUDd^#r?wQh;?o(-VLn`4L(I{rs4vTqaW( z+v+R?oxi~VKmmx(K*5lB@K&u2u!hf4g~Dk+62V3|e(unctcjA@0*?=WNt<^evfVD< zZz@6UGE_@X{e$K%gS%(Tyh9O_hXQQo?g(B1Pr8hba>xLy*RXH@3Me{+dr9-2Kwd>Y zu$hIUucV!Tu}59OGrUZ3hKC~rEjw}&GAaN(8-h0QF-Y$&+pbX#ChA8?_30zGY38vP z1Q>A49dkNnM{Z99qT?AQSF->?=Q7n;&&=EM{c^ueXE$Vq;L&$PG`MMu_(=79iw29i z3Mp1O!B@t4yc}KM=3|dE&V7CrpbS62QQ*RZyFYwiq6m(Wu=d0m6du-jcrtVYL)Zg%*myA1M!OzVUIdyho zNBIrj9_1J-#Y0thrVFNf3y(}sKg z;1V7J+&}O=N__9$z4G&ml8DUNyzzZ@dM);<_d?+pget>vpqBA}ubi^LtutGziKA`Z z_BH!NdV||;?fdzd=hUha2(!37*D%(F>~=4R0E-Pj;hnhTdvx=Fvf9;(*;M3|t#ko> z0%P+1o>H5YSu>Vv*EjGLJ{8Lk25s`s=W4wP#Gn#*dG&#^H$G3Xz$##u)3+-L=Dvs2 zH<~MDzCyvlre4U(qriN7eJ(xK;VxvBbQlPM?O~zrUeHi`QQrw$_fXidx9^9}bE$en z^(*wA@bReA_r8wS`5OIs&FT|=Ip=DoYQiO+kpQ~5-%ZI(XAW>)ZzG_M|HQT z*e=nay7&ZV?R5LqdyH|v3caS86Gv*fL8tQz;jtiufqVd=B;V6?82pX`@LGZwF+=GW z7^nIISX)b^7*UDAE9x6IU>q2XSGJ`raxFUdP(tTvt|s|^1yk#l`0`AGWv4TBb8WT4$0x9FV!*ys zyR>9cO^URrWk(C0$){g9kzA6L_jsY3>c+E!*VT`f)vPEVKe$cY{uOAE09osNo`x;B zhi?`jVyr^ammzqvAKxbZ+7D+E9!PKaQl2-Cu;;`HgeUbZalQX|S@cQAw+jR`Q({ta zM7-H_rtuxs!M@x8x#Nf^PZ`xI7*U*OJaVZ^$J~1VyOOKDXu_BRTUBb$e|dcnXTY#Y z7kD+D%x1{6g6xLsCn(NGd~EMZe(3GId<$2ElMm%0I{ zF@M0hJ05iBac1q|r|6VEzWs(nW#VgIeYTo8?sITwCKOvO%bdrGQ|R?gn6JD6*2zs6 zmyC4vPdnN~73Ah?i-vf4)6mv$?3wI7>+)^>Y{43(f2twwro|mtb$nwF{GVUY8JqCu zOElfEC&dlYTxfp=N1VBbx1EyeM8|2j(eC**1w$9}BAe}pyH1S0KZ`_=ON{B9u2>*Y zGkZqv^sT~?yK9jJ;tC&SW1BtO8miDmLab*IGkJj^9ByGjfJn(sB4#?;;e6*(^WV29 z_;0w(SU0lo;VC9R>w?(6$ijLVsS;OXM+Y*tZ{;vIgPCUTLam65fKC69}Qmn=j8LP(W**nn-4qU6(qd ze{0Hq7ya+Mz+mYK=%u}PzS#Db6OCyccr9-O=Q`*x+$D(oBsS4X8h?{gY4@|qp_@fh z^3WP^CO(`fMpzPV?ac_JH~%z~NxxsGS}H+A^M&;A=k^~G*X@?5&Pe(G9~GJ77Uscu z>_`wH zu3`jog$M zfvd`vJ8t)UblkFX(yD1!3%Ahm+3uJ&p#v@Z{)zkf9`t(m4hHTQ^`KPIJ`HZFdnC%d zXE+-k1y;-AJQq*>z$n=OeCBUJ;@kUv8on5lodxlKrQq5%dIU!`@vFp>RLaIc-Fs1; zLb!O6Yp&7rI&RaxZoct@cbJUnDa}r~_J*tfo(S$$3?@IGvhu5x!+=M*^~+KCwOl-w zF8?T^J-lGmJDB*z>CZiIo5lykIH1Jak*dd z*r;50_MzZU@`r*powjUv5q@;AZDO{p12e5hVa?P+IJe#!~YKJL=pt>|{MMePsrQ(IK^~*GaH#G$UI4idAv`&1R_p-_$e;T#;%~$uGVNUYh+h5xXODF%bIb7)Gdj15az<;WD<`~)ww-S{_?we|eHz#-& z5=3R0dBT$X;ijS`f=+Q0AsAhfE~siDm&>V%?>8M0b4!+N4{c4vQ5$n4uqzuqktTcmhV^6;~es*5VzZSzHs4ghSE zu=-(I*$ey0kX!0UMD)qhQm68+rcH*Aq|D)l$~itk2;2gW-)WBboAy=SRvaJhwn`@3 zd_riL`a+1iolHR@ldH3u*Qv+Zyki$=MR%FnaQu0M3`(8|5}{XDZ`D-Gu#Yltoc;)0 zF7!oMs|DyDalDfmXpyUS?P`ko3{m9t2@mg@R5Z&D9^NaD#+J#53F_E>`*3hNBkvop zHoYP8R6;Fqm0iE8PcyySDstcRjrQKlRL{YM7Fk2=dV_Ih%|J=hA+hpY{IsUzA5Ao; zs&f*ym7CK00{m`&wrTZ4)}8;}TIP+kYb7{-Nhl7yS@T2iLhk$(mwr z=0RSS6%R0c`E8doc5U^xC>Qhy?~)y&W$oWE7+HkLY*dZfWm)E^xiItRHMhqJy$a7v zYupwt<6LsC0%hd~C>FlG8k6@@> zytF38R}6LO*`c<~Duad;-|4FC;FGCgvfG=_+PxKHO2}~-4C^HA$#_=g=_IWQpb4AS zuV3HvV0-rku3w;ix0K;UHgpb0TktQ6;Mv2XiQ#~jAH8ti;>t**P0d=1u^-q1SI2z@ zwgw- z!!8CLtJ?gL=ZjBs3g_i-qF-iqFC#tI*}T@NtIc8Dj^6Q;lPhyl=>Eqag{)sn_x)UI zrDVLjv*~FY$mdLBsa@svgH>}qw}(4Jgsf|u5)V`fmi+hPzh`{T&(6!HiEQoi<&^p2 
zaKpzb1$EUCG!7yQ@@qLE0xkP&vi&8hzk%;zgU?TFKM#U*$s&{b>9c~v?;cb~NU1+| zK2xpsZpoUj_DobY5Una@nD-XfZcDzQX9paTOD>qI-2xXG*f!^O;v(yTu1aIG7#sy@VKQC)SeTY6|} zU6;0QnRVy+byaQWec9^8!hSu4-BeS&%N$Nc=izZ%@c9WNM&;tOw=1^5RU4^=@71*c zK4mkm>z~}}ht|AUi|RZG7BV3x$<7O25H`O-0(v>@B}#~UIPY;{g_)nVM`LsI65?r4 z7@8#}{;GBVrC<6O3Gn_90U$g-H0=>A{3O6!`sD?6Oj!h`OPAXoAz*K(SH3= z{jQnk+|ACtra2`jkaPM4ec<62(lvAciiWm}p~S?)TY}Yg+=hOgS|(`Dj$Hb)^ewrIonod`FT^H=$=z9R&0q zfJ~$(`r-=v=Sfm2qDT+7Z-VHanW-)%tz{*`4{^X#eK8lBDBML09>21Gy5xMX@@SO0 z2(x*kb3M?qbOP-KCPPhNZ%@i&xc9mNxIu zorrwzlRe0xe)ri6b2#|~0YKPw^bVRS?>6Iwes%kX0w-%q z2!aBk%fS-JV9|gXa|XThq<-V{XMlw&871X#dD%B?1AC0KvO^!+Ap>C(HSx<+AK&2` z^Q%#HcLnOVtvCTI^N%`}OhV!S#`uS0<1_7L`1YB^o=JQ&Qu=USR=oGr;OGSuVXxBGz z{c2<3SJ_JXz*jJ0eas61Yiz>9iK?e=uGX>!<$h`tHL=pt4))bhHUwbmdkAY$IKHqm zX+D5~3HG&hI2u(1ytLsB|C1GIXp8w`LR%lds@7=T)yHmJ8)Y$kp;EJ|1Ct-M z#w9If44O(#{9vgLPKe)=k!y5hvB;HZ4o5uoeY?X>t2AS;rB?mIo~hI>%^R~G?|G@tAIoi=G<{YI)f(3);Eey5;QBHJ6nX=S}1zWLbZbPv+J7U^}? z_Z39ZijEI;noqIDoL)0Y1@`jLo!JU_tE@VD_eoh$ZM85S0qK3<18j_s_T7 zGS~@JhJ>F9{sE(YOb654XWgciz=#;IFu!Y3ALrmvie@$G1!v(KVi-N{bY8!WzjY3$ zX3(4`I8D0pRO6k5uUr3n4HgNG{Ytp^` zhBM|}v{S313#t>jtir#$)wi2}^7;E`fZ7-Fz#kNHw#^tIvNo^&Ae!O??13}-t1mjK zlusfSuJyiialnIOKo=*2gCsD4t#VQ7D1enh1CgX#%V?eC)DN1ET%td0;(Auut2xvB zc(!7LcJox$-8Z5~+E0*?;Zdklg%*gQWBoyfmgmF@rJy;T)&;X^z(V{_4f7|Y{XKBc z7nv~TgW^~Lu*LvHLfen8*uF;`kFEk1FFdedmXkpA0#^ZsXFR)F;G*!~9A3{ELo|bo zjEp<%8p3sM$~IkaD{jcRr+~S}J-`P>2ib4Lh3+6jQ=_XO;fb(`k6xm`{~DuBC9Zx# zMjWKM2|D6IDCa(q>7Pn_HN$5)|NFZhJxcni6#_>}TskHg zTtyKK;(F(sQlE!oq)7Kgby7PBP|NXCF`IUxGseN=5sz;o=xc=wgJFzf^dxDB8`f(u z@4wGFJ7m&yjR1gjhdxQN$JECwu4X;z5A+BD{A$JH*fGYWrb2i3`Xk$-+7!3t^N?JL z2!7S5t4{kZp$M_5i#V9rPk{39TWHNp07Zg-O`oMI3#o=O|NT-PsNEcWnMdk>k^<(| zz4)D+4vilEV5j@+WMXo&wm_RVz9uGCY%wVO+kl*inGT7r(}TZPb=f(N&RNtq%C2(f z`yT7F=g)wr6t*MN<=U?Ty+p3_%IS{wSgh13;7iqj#uj0%eJ}2MFlH`cw}zY>N0SOs zayU2MmZiFt|GjkC6$Dv9|Jl~~hTH3u1z^ZK1W@mPQYb`Lw&f`a?h%Gd6G5fhzb7xu zT>1{~XZKLi2;)>XHyZ6mVJZx20rXDCJEHoy=bAKYMw8BdIDeGq*u4J$*Jpn{UGP2_ z>#k(5um0~v%w63wA%bvVsh^*yX+t^iZv#|7N06QuMvu+Ep$2G7{Tff291q%z=ZeAx z<)_m02z2i-)L|-%yLbm&mI~73Ylrc`4t%-q`vE_s1$E9|+&}U6IJk-sbMlhmp8l7X znytJw=2TgqL1-QmgBw7V4~X~0ncFqy0)$7A91DTaw`ymU*Qg{;XIpiTn~9St=4-TU zj|%Lm<~NiDb~k!VzB(`P4P zQFPrc&@L>hM%hBJsLd^d%^Wk!`=o67~)e2#bXfMTE5@< z&so-6bc0hydnxBGlMCkF(t?r4Vb#6uNy6H{PZ-BtllJ^jhd-&uU$HgqplaoquP~Bn zN6HribDnQx07ecL+9X9xp-I&qAZ&Q(NmCe5`-6Vmp2uB3Uz&S)DBBrHNu;;5m`!}a zTB@s;*FN>3+&}iB$hB31o5{AR){lU5-;Ifwq&4mfi(33AIC^SsBcmhsOK;C&rr=NL z22SOV2EkZT0B)22MUnq+u`-YDB8)OVHyTN4H(h$gRy?|`O8d{p%%eMVqz6d^=Ztxp zAGAhZK=|1cR7sw~Exq8~{=S|YRFyFeZNYK{Y@}pUNUyA)`zW+ZX0%f*J*oO-uU?W! zgDJ;83+`m9gTuXuc1OS-ImNKaxWio~tFPUdi44DYWWcsV`*mi5OWTsU?? 
zi-YP3*RS`3jN7>d)0_5o>afSmN*g43mC?{`4xys)T)5&sJeQ$m*CZqc7y&yL)#v=h zKLF&m2>1c?O0A^lly#AG*pOoO_d$62@KHQUApR5@s!(H;P>By>(l0M(J$JR-40M>A@FJ1# znD9K{%i)h!@Cv;=kgqy}L}viZ`Pzn{oybRq_0=v}e`5zXZ(2NO|_yG=RJo=v4;6hlcKAA6yW5Ew2 zjOEo`eP>bg?Z}&2)r)D;dem`tQATV3-dt{m1!SoKQJ&nt-I-$}hvMHzHdf_Qx;oHR zTG8iU->&%kbSawTg{sT$Ao7qnza#;Z!G4T!$%|=#fk@L^wQh`w?*M8IV=$^tu<36S zeDcN2hyBvJx#vT7krIP`ATGM4B_l`?MiOz)>|N#3kXB>UtPE-*T7O!^S}B1}avU)A zZ&ZGuyYS?NiJ^79-S75NeGb*#qZ9=afQk&um)5|~C=+4r0P?Oe=M~H(I`uugRlv?~ zitACRHAtC>p2gKsr&MP1B)s$R3S+%{VY*3VFoAt9t*Wa zK0i9g%~V(SiUW^O`2Cme&DujBuwdxQ>*2b&u*7ZsTiq7e(??_yHF2Q=fA65jNnq0+ z-c8qR{FKU$Lp^i zPEQP52^)M8o1!_P7|EmHSH!siEr~EH=Uk`nzsV0A!|mDI=CNln|IH$l5k_R}Et9KW zUYomhw0pTN?Q3m;zJ5#+orx4W@l<}Zq}-E=P?1Csv{WSqZC)*%rFwB`JY!Z(f_@y2 zlxTDlEF_8EWgF9b2)OI9q@P{z+`A=!Ez7R59)Rhsz*QDrVIP|4}0gVWWew|_VF{m|>dD;T$D2|5M|EpE~BQ{c4j zNgYB@brASIDPRIn_^|CbBY&@143*kW=Tby@A#iTTB)x-GAHa&;wQIA}8Lm@Q@%!)l zKmw>E#nWV29cnjknU`oGTHF0wQ-n=)^Be?)@6^xr;kXwoB%Q|}o3QyVVG~w}+U4{i z)Huo?ER{o`A9WbHnP8Oz=vCFPTbp8hDJBq3@915AE*WF)RQX=3tEdaw6=AfWkUwY& zf*Jd`fZ%0nfN?IC<999s%-wgqa@TIL?3muc4M-sXXFRX}qbHpX%}N*%4HyIDnEW6 z*n<%G*BRR$0dB#yy=XnqXFbxhN8Pqw-QAZ8wDc*$^nW8_{cZejqYFj?+c0wzEv`Kn zz>~V#37iEfxYV{bZ?&3=or_%D%xs?1^{oE6o6$*sC_g>%3%(Wu7FH6Y)tTUpg$;q- z@oij~D%No`+tX^xrXXrKSbH0VL5Y|`A|t39A8hi`wyf+g{su~z%=q^^tYm+TiFG!; z30Zp!m?=U*RhcV!XZmGiG@j=>AR^kNF)lkeu(f&3+n4r#dq`#m$$5egI?L zx5}Da_kVTI&P0EXTXqPxPn`c5nQ=2`&`w!T!WI#yw>kZ7zGZ;UHT+qULE{a8ILS7m%q6ntaS*ds7rT07uEMU>uIvh z27Mbl>1)WcJAX)=vCx2R18HkzJV(U4TqjO^FVdP8g|7C{xx^Q1qwT2PX1Z%dFu>+U zJlfIthJ`oGR7g0Km*_%CkE&E@#58VgbZum{7i6S|aE@Quk5E-aDZ9(k>6)s~mcOY1M=<4M-4#!Oa36&Kr`(6?y8AYY(6u6QHj$XDt6&pTJVFDsl) zR576CBjw7n)rXm@5C;qgT2dX{J5d&2$Z1vDd(LPMdhUs3Flov{ya?|Gbbil-4Nk)% z;WLPgmv<(j99BR}p=6YYLUQVaM=3CH*ptXIvMQVvJmeK8c7Ja>)(La5Z9nB<4|2%6 z(lMdd@dzoqdmzf&4s}SGD|}D#){ue}&E+tRezsv+#`MOAD)TQN9(u~DcF4+dtW?eF z+o(l8X~Cph!&opWFA%wFe$zzICvJHjkeo~}k(kFYC_j)!s)SaZw(PM4}JxKtkQw`Vxz?qUZTeN)>d5N{lil8Pe$nHgjc~VNxqxY z-YrA^?y-0EonqMi@+as!PSTQ^GE^Rs(hyY@qKt${Bc%r|b!ajU&K-$6B_`20B0h{Y zbsj8&>+WlPiEv-7n}6XLbjIsgW8PNtoEyLlthJa8e@lW*M07gS*Q)cpdDd>hRL8f^ z7g%}6|HWaMAJLrw$7`RMj(1aOQ&6!+p39H4PwmcagdB=!u;f^_s+Gf-Q7%MFO}sei zMK74^Bo4aHIIZdLk@%-YU^q4Jk&pu}^=`(Z{?iZi9&2&_eDJybJmFo09s!;2m5$3Y z>D|ULmlJHqv!te_tDogZxy_xoUxG2knP9Dm1Y&tngl5A#Z-wCeNvH#7r>~3@vh_t_^`{gFVkWJ- znQZI7-r1^kW}ycTq8v)WRYM^W@_+92nY%?Z&kA>>OhGPM1v@5Q!io8$+>MdXu3u)T zu;Q({3rM?v%N4$fv^#j1`F(YFf6eZYbZs%-9rEpKdw0_d3&HLI1#(!~-mi(3-`*1Phfhu$K+;vwjU( zk2epZjj8~Vtn&5Py=Y+Vk6_jLqMZFZ zPX-uN_TN%8RQcyiOvY2t+8Y8qOT(0|{%A>&gvhA{%{5H1_+$ms1gcnxr1v#qio|s^)zGYQx3iTn zro!6fRf;NUVvmC`&uzPU_DS~2`^<+AKHJ3f%VV`^uC-Md{YH1#M8y@ew~b28pb6=u zPwGHdK}fO7$Ql%H#+norsFc*BAdCwX)R`Y&c7tu>%+y#qRee};;dh_G-dxk>@GxrA z;SCj_V*YUa{sjI}8tRL;BO7(uC;ok-fZgojRfDirE9I0Y<5{v;ocNjPJ+T6WmWB;f zjVINY_vh#A7O&QOJ2>Akau(x_0Yl(yJV!megAdhvEZeftTlTE)CW=1l@o~qgkEi5k z25YFAe!3eTw`cb9yWn9d)auONC2rYal1#d}jjs8-Prx(rT%h5LMcVbkd_!%hS@%`3 zn5pk(dd~19AG>h)D00&-P;k3=Za*ozU|IAD6wvsW6?DC&O==0>cc>2RH zQr8w|?ou!V8`x?$RoVtb?b#X03#0b|%9akbLlW$~GeqLuPn|er^{qO}UPm{Wh2_66 z!<)3q>QJ2K%#S&UjA6+}-E9hn2Fed8Ctl-2<>xaH?mmcOH`~vdG44^;dMtK7Nn$>f zb-cAQ6A|sdmVJ*2wy3}1wA>YWe?qM>0z3s*UsN2Z9xb^s zZdwvKoC!o(Cw#8XNY&%se<8v(o* zH!#a#=1U$1YkJjv=ucy(df{tB7jsGaQwzbKPkj7ilRiJMVN%>-G}tcIjY8 zt~wb_+0Yw*dU2SEQjWayJuRlYiEY->AriA%?>_6LMgw#D z6I>ocfeO%L$q?KhlYk4OlIU&-hj?_)1Mz$ZeE)#$r}y4!wx;#`v_jJUS5U2Yt-rKa zZRr{AKkvLNExY}8jB89`s8A}%bTt`}!4;x*z^TYqwS{D?j(s?bLycn&}6+`{K^I z(Em%t26L43pG3VTWD^&kHTM22+J|R$&1d^X%<>Cg;Twe5dD>FaznVG)qs0jqI1U22 ze(tJfVb7_FD-wce;vBLG&$E3{Pz811|5@NhC< zMOBavkMHig7(ozO08+4+U^5M-?mj=zq~Oyk~1mrN@=l`o+R7NyVrS?Ztu`8GyYdD 
za^?tSesna})6@?v*Wl|LaVrE`NrpfISaRst`Ex2lHrgFNC&_3Q5r=$e+*sCueA|2(jWE~z54NSa5`RY zzEOkH3R0pLJy~#Oz;qBq4?&PWg9xY6CvK`KEa6IeW%T_3S=}ayR#EWFoAPy#|L!S; zHVjPnyJIrU_K{vVF4w3kSb4+K9p`*wm%j*0*4{nk9A>N9_{DVOx(Iu#6I1)#>sq|g z`qxjV`lFS}%JZ2=3)FUZi84{@oWD}eYQNhjqN`cmKK&VeNfnB^51^Nc@~(!1^;xFV z1pvd$gaby0yH66V;S0x32D*TvE>)fK-u(W!bCtaNEw49yW%~zh^9)&0H?8OzngX1h z?@=4D9OYb!Ae-Zd^5ykMHUUWtM%jIpUZMZ`D~y^DM+VuHz@hez7ND5lC>4RyYg?I&^VZ>b+f_#KIm(rRgYIhI#o=rAQX`f)Ad(FD$ z;^@mAM^pEoJ78n&Le|_GU|xzMcl!k%C_9cUDrA3}B-C&PzR<-b#PO=$LE(N5Je(_l z`M$L^yQUk)E{lV7&@S7b5S0Qz>t2FhGj}7t{)N}6Nf9o_kE_&c5mGLn0aQEXrMG9J z0Qn$E!pUN8j+Vh^BimZXXXgAB+-J~!0>nJ;i(!*+w>@_t|5esbvdoCcUU&^1ZC z>A4urd)LY;Rk&U=x{Ns0dT(0ZRcnsEyKKI@+}by8lTj`>Ds(rxi0|P|(JeBsf8)*) zF19LfKRDoP7TPAkZY)~8mP_X5uvTJVcF+EAiv#1X2z5jl@G47-=!V^+qu$NVXAg`OU%J1K6Lv_8$u_~&Jx;7!nE z%Nbk;sNkY9E*Q{Or@wHa z=K#Ek^c9Fw?Rnu`;@bD&AQ%MK0`b=T@!jk?C`VPnzLEOPj;XLNn*?*^iRgP!|UK>NSxgrs{T}*IWE4_q8$L2+(WZe!(z`C0aMcn zzf_@8*=?u?Ri0IV#D!*oN=sA{)S@FB#C~Cf6wJ?Hhk3WM8LYP}oi9oh?SxTI7zp+w zZa%8aJTrTULsW6P^{o;mGNMDwED>B05mS5J-bAZ8x&FTpqxXJx5I#kvj$U5qAH@Tn!67TX2LR zovoo>0&-8Y>(d3l;OIFVp`4h2K?U=6G7e-nq%-#lG<3b#Oj`8g5W-#nuj1jyk&}O~ z2uhzMh*|swY~kaC5}WO$IWV+nuMXkl^n%-m0-KMPb)MN7L=CH%a;POjv8$6JE&(gB z!sQZ>hQ^`trVgj-5Ex*3_Yjcl(PYL@LUN|Dd_U-*!(o;g{mU$}5=PijpSiEcW)YZ} z7S1D9qGX9J(*W;U;at$E(E(~T@o?+tR-bj*uVnM1e_3g04W8j_tYD4IsnM^J*(R{) zhW|?&R6Ax$eCAsZO7(8e9~zf?si8nYF#Q5RnFc7@n;b#AbszXX#-Cu~@%pviqsyRP zFxUX7{5%{EVi_WJAJ9-WMimNcnQZI4+JWiEH`876f9pW@ZakrL%b*cd$nMZQh3Bz7 zFM1hP38W^0EL>9$RAPmIK8b$-Zq!^(xk%I8GN`Be+S^g{2uttYr##9>-IrdUA1L2E z^1eUryA9IEdtnffiIttnvwEv<2e ztzgbps6lfc2WP}XEa!o%Cf{pv@Lb5k|Z;JOzdU_A7&3v``EdwI@adeS%zL*CzF z=Pl98;1_fcfGoK5^lhK{d)F3eO%nXYYdsEzZf-{t5=B#ls7v`B@_<*-07R{O6w^)S zFa%efwcGSRS6YT#K{qIruOm?B=E8tr zdFD3Q)4c>#AnMqy%!WGImFM0hE47UwUG=J{-yMQx|!#pG$f!$%Xs6`0)c-8_eH3nV|FCd3Rk0R70I2pHc ztm4h>$a|MWbmOw)?fM+t2JV(Pe-e7oQTtyKT!d$5SZ8kw)62x3Zh%{iHl*)ckYmHizywODowwVzGgXT2QSbA`Xa)H6-T%s4c(t z3^eCa&*-0Y$_QeKJD(1SPXrm0mf9yr%Nu+y0;9k09)c}E!w0?fD=46qrc-jp>L7!h zlJ&w%E(u7Hg$epe5I(HgM~xmFW1^}nXc)QHLI5`O63Y0@$wZdPI1AH0=5wkgx(kKrO@{XU7_QnDHq|%)Q zKo^;m3g}}zubFq`v&Qng(I-v+h&+FZ#hp$XZm>MIWfgE}kY~|9n6I!`;|rNOLRg?& zD~`(KDN%<39xy0ye;m+?pdFdYo&=O%;Smp9{nwCwF9XrwKH!QIjCm=taJe{PaI*g2 zuj@nY-$ustNL~tjrQ3weq2MJIj07fCcMVVo46S_NOztKSceLiy9TPIu#}T$q1jz#t zZ9-xNEt5@#*+`CViF2Eq(po%h&g#ETB<*n|Ohp0i)(H&5r>I ze<U$cp$m2rte%~{=89U3?bs=#}qKsg0(rFMnB+-m`O0H@t9+sXt<3P<4G4`jY}%nMOV+ry4Pp(15BT#B3|Y|Hh(kQN*Fy+@Tl&zq=L>26$}u z&KZjTc?U2=W~UM@Itl@noDnQ(e?>yE+W>M}l`P+sVY2yDvmj)P4I&1IAXcpdE4wtH zlJIDBWZyS|0%l_k#M_3}lA=PnxwlP}y>G_!@WMx|F#2M73CPPbE?sM0Il3S z@D`#(EYCt;`3jUi^}t&yE#GW$hOp3DWQuAuzA=6QP-by$#A4zz;)Uh>reU}Jx( zj}BtsmqX}mkb246kp;=N!x(J49Z0Km>u(O|5>*t~O?i1PtHwR~*6}+a{mb9l5XrTXa7`2e?s4%Z*l=~VIXf>($9s?TQgeE0}hP_*s7iaj{6J1 z{#@O4RUAMAR%ZH40Z1ZHSn$H0&dS| z3g2Haw#Ykk=kufW2edIaU`$3kd>w`%ZG`hku?i7AFuSAiU;l(gZ*k5{aTFr_u z7F;rKKukHBq&YLhDX{=0W22HcCf@l6u^sQnal*>t3!&dAZK?mB8We_T%6%Qw5KV%* zJh|@6KQ7+9A_14<4O{8yq=SdezLbCif&gmKh267fwPqBCrUF&k6M}#*5vdIH8rn!@ zqnwZH&qUv_eaCu=)JdX@m9y#Q-w79(o+TQAvg9_hQd?%>1L9MO*aI$Hg>?vjE*HB$ zUQUTFWd2eLy-sd>h<)Laf+qD4i*`k#tASl{9wJT{xW(1Nk(J&HG{f3N@Ma*B`gaV` zBjRPS-*IWIhZ&v-vVaG1{1~;iJ7~f|r~;X;akUqQk^#o4s~v0OC15&RBlU58aG$OZ#{!O5Nm@Kw<&oTks0PXQB#2`3*9JtVT&fZF; z0!kWg9k3nPK0XGR5GlkOD_uSkB+(vvzogvSd5!nZEdc4fpn|ce71HA+q^>(NvxM^$ zm(>^lAkQhJV|Gu!Xj>Bm=hBSJ`^nF15lU77-}yTqV(yeXLPA4)%BIt=*Fj`hnJJ2x zTLX&1)5jpfI1jf)l=7u8U0gDz9gCEFuohsido6LgG+;e5{oZUw%Dr_Eloeqinl}aF z#qAWK(mD-=v(_b>qapN6yQfZra-=X4++9M{rA+~y{n8eX0;+FCu}|DcDD zXcRkU5*K9YQFPp%5uAdP5w-Gt>r1T6?m<@=4GLi^fJi@L4m)HdgXT%#rG3DNIwvN-m=3!O 
zg%M)T7%*uXe;^QCoS8YNkvqYMQ}1*8>tOasm&|)8>QND?VNAg6##nvGu|3p0nN&3? z!@Ar|(qpfr1hzwQcFUP7&1$=W)6Z*ITqg^>FuUo&(qdCTa#h(3>pC;#c$rdDa@0X9 z2qeGTC|sc&Gxjwgs>|auD%^XsIr_*WQx%tT4V`mAShpRt<|@~moMl_I!WFzv3n|Bc zuMS}`xAgnfna>h?pMD*?s5pVX=^g*$9cRf}o$Lo16?_H+;|CcRg&{S4wupe?60ELs z{D`E?ZKdwC6~K@;!BXiyn6Q1epS}^sqZJy=D5xl;A91Mo0Y#V_eFF7h=z}y-a!f0V zG~KLp3ZZ1Jut(YcOnR*+Fs`zusL|d4jYEsQYLw7>m`pXyYuk5f91LHfMVQ0Unz^bC zwfaRtQ-!=-i=J;fwvc~`X3dox=LqMmlU{5%#vszkg)}rv9L`n5SopP1+c$4D>GhZH zv9mx&6@Q1g!i<2!eeYpDKVGP(j8o?@iKSMxLUzOb#)!6b{0ELkJW83WHN?Cc-Rane z6dF{szsWnd!Ec@&whKMEtj!FLfv;}o$mAslzRgg|*e+M55BmG;hEF7*>3RiBn&)h* zi*-$mNz#Q9o$hSi<+Ub_ca7=b4n0VU%m^&bu1!U-h7q2K7S3TT%aPo5}rt?Tcl-4f@9|G2k^T3>5L)!r} zbxsWgNEgi&4s*^?D1`=BQ%Tniiar5{8E_p|lUHYYC_q`4e}k@NfVj_R( zf~j&VWNv-!z^KZ<6o+5^$H@(vY}zvPf}F?bX>=U0rR|f9nG*Y&v$7_rByojzF3DL`&#?9s#;I~31)wagLdPSI=dE*kWg%vjMRpM zD-<(`Aw?N$`0_z5OSNYx!9)**-in7+KxnMs zO95X$G@pL0^6aMa>0cIYq3RyG-|kq<9WofCTJlA~tSJdxd^sDcp-<&tz9DE`vb|88 z>;X+I$J}Bl3*==kh$W&o+ePV4tHmLU_K2O*nukdy9Se@`y510UaPG^ofBHBEAt5yD z<{ca});FHUvD(o2nHme3c=`=inqjP=yi!meHRB8hVn?%^P=x$lnhsZ!rmIfS!}FbM zofZ0tn0ftqp{RS{H-qB%9F}kar;Mt=2AcJnJM*36*=0;m4}3P`NuZamd@0g)s}ZTY zp}~$0?fh~TX}jveYTSfW!A&-|i9h}-I{{()Fim=pzl=r~Di|b33S!<+GWEKzp|^oT z)Fkh?o$w?W!Cg$C;kthqccIe3LD)BSL2Y1-jI7Bk6(e6Lh0Sokx(Qj1MU87Ns?h?) zU96t-?}rD|4PpglProwZaG;GW8dC-6EsrWS*mLmX79F6S?89t0QJ3F9Ei*C7Cb+W8 z4QrUrXela0m(~4*h!rS<+Dsm|KjVQkoG!2c>PzV@wyJn(*9(`r9D*M5P8^!8V2F?K z>gfQaeF9n8uo;E8%9=9Dg1b>M)6a~g>p6IbbemY{R1yW{DtYiB9fQg&U?v94i4VnA z!$K5LA;7PG{n&~|jD@<& zm6ZPe6g1`tP!G`v{OKknHVS2z?M%m+n_CLm?bCt&$XF1&5?v@~e4sTMR>QH`H4|vW zKm1Xskev`~AX4}H2`g@rAS`T@i5`nGKuQG1E@aTw7bgCdJk3#odn6GZ|k^6z9Te;9$0^C<494~hnzQdm_ z6F!^lwpzoNgU^PeiwHtwJUG;{ul$5EVm=BS?(B*|p+p?BAP<$7aAv-F6~fE;&VD(3hvtII%)zkm@7wvO0vR?S z1+(p}ef*#9fA1eQo)i(XlITXUqa~$@_*^CiGy-M2>CdP2!(S#{J$?Bl`o;_F zRmy6G-+}*)D#iqx2{e$e)ntBEsR)6vxlP1?;DCGs(DN`2vW5qUItnh=wbD+$2^Th3 zM~ooHAzXiUTGaamsS{ZNbRz+Uy;=tW`oEe@6#kFu;Bx>v6t|cxzHf|c@$ko{e#r6A zWk|8Gz$htNaLikq&)uXS>-Sk(g&Z9H5cOeU!@IY0UreOW!CB4_DZIe)N5)P0(2=C`w%U3Xag5 zidV<~N&Eh1kdJ;PYUq#*5krbb24_H_Q)vNmtg8*au3Ea$>F=UU?ZQr=IwnD2k3xQ(5nRM!ZYb)4IOzz zNKRTdF(kTo{9(;J@0|hETk9d;SXHL%1)o?^JTh!~3>6<98CvUN0`QI&U`Y^k~2j^N$W*oMiL#Ay*UOLw|`tQ6DO>g}eJ(cP?rp21Ivqq|P-WKkwfk z`}e0S4u2u#%^0829Qg156-9+D*lPME+DcXA?_WZ1$?Jo(%L!UHCm7~7-u?x(UQ0#9 zb*>|}N%;zgKh?ha3r zKnqYuzygi|ppX6lw}_EAK$S%SMdYIWtf0|1vxkxt#af@5X z>$6IqxbO9uSXk`l>#MiRLv(+iqyIb}xLIO0 zqeQd*>XVUGwxTs|$@2R`haTO(-vUL&hf6N|MlruCa0oO|{ZQqyX;%VKLwe~YJV;%m zmOj~ocWlaxr{BolUsrw1nwif`|D6rwvtrt-Vig0AW-~uPzrPT!ZDkW}@%i$oP_LHZ z!Iy=Xd@Y|yUy5wI4=724Ga^Sw@ZoH>6vE*X$;*IQNRee;Pc;v0r~EXjEDrqe4t<^7 z3PQG*FPgqF{pW)I{?5+=JQ+>z$#py?mVwP;PhQUfyE0NE0O@1q;W1U{Sgk+m&zzxM+sF?Q_|vWFFoQ?hW(jQqbhJJl7>IMzXZ z6KG!qiNC@%42hlK1xtp!;_y-JfyYf7EC}!Cyb@Ey*j4fIBTlfDCp?qG6&A$}fMj6+ zp``KIQ(bAT`adGZmE*06H}EyZUtRR98-l0D$Yhc=B>R!07z38jJd9jS zbdZAMT{)Czbnsk*15r@>^}`>W>o$zxuOt)1>u7d99reE#Dc2Lr?<)|K(xT0hBl`h6 zo9omv=h=kKL9ca$H3Zdl9lm>tQ|xv_$E@`dy}SZy{&675OtP1dNC<0~Dbv}T!wG$? zAJYcgfkfeByw>tbItY+|Zv#6W+=NTMDZ5wY)SB^*R1{-vt&E09o{xCu2%-whPhaU5 zie^99LByjx=ko3H=|Y&@$n0239t45b2qMHF)CyLfIYFyHTut06&1T;!3)|Qh%o_=` z)Bk&GXK2vKS}~XiHLn9_l&I}V;#Y_S&x(2O=-BuI2|qp>9NuB+>UclA`B8e!d(Y{A znZ(#AI>GfMAY+0io+4r#%4XVk*3ZQzu*h;H7Yg3O7)C>X>=W$QEdue35a0QL%T`bR z`-^}77uPZPx?IPKW{gkj1k}QDWmZuf&mW9~tzQ&Tm+k?fp+ezSe7^gk8^n#5pyhYQ z2nuVo?G_jy5WTEpFf+usz#4R6b?;2!gC~L9a))3H^m!}{CKjLTI4RV-;cz!23(;hM zhQ0Gc;u8`jDCCL1yo1|Mu>h_jj$k--rJjblruGNiVg~>R9TD}hHmtW7U^FpU*RX36 z##Q2qdB;amIQO|h2a5KaAMc+g|N9F6hBq!kyrkmte890!)lu7GnX20? 
zCHBiGWM1M;%bJ5D%|wbVMR;AHe52PKZTo2F=W@jc624&6o)1WX78NhzEo7D9r)~b< zCpp7}2C${0(~6+eyxM6jV3v+ zPryWn^ZCVBKJgT43|zn8~y=;e{&fq(EiMhw??{mQw_ zF}`MvhZXqV)nGb%+<#VeDjQs5za|E8Hs=RGj6TAF5=%ffHx2;FWT9lW*$%u4NaTor z7;M9(raPT?_uyS$yN+iZVL;#nYfXz`=qtuh^93)z{%m>y+&Y<(=2)jeMxRY&?X-6m zSE$sAKeCbb%c_f$akQn#6IxnD);TbRY%(Z(iO9-|j6%e#EeprPWxFv*6|g&bVV`&= zY?CRx`QUB+c?W1)s)+Y7D&Z+bM0~%FOha=&$7!Yc4(0`c+YlaDu9&P)G76mJFO5C) z;I`z3ku~fdTkY{)4~<)SVQ=~XnCe+w3`UK(gBey4641d@t8*SYgNp89YmaU1>k``r zL_+6qK=MEm^cv`pM@eM??qZn})a;o46gi)8kk-Z&6JXtM$nyM$;svH4b09vu)@-^o zi%;4kH#ecL_hBWu_cIQlq^T{~j?(On^Lh}I2@+7U&LRG~`4HL)p||Q8fTM?m4ZEO3 zcnzjb5RWA_9;m`MXTN|mm5AV-n=qa%0?1!NKT*@-(^nlB>#rO&JahOqn9AV=$-X79 z-hv}{r1OCf+KbiDt^w5!D#3ORS(XNI&y?8%HXvF1HOe;c}nu?4!cJFC&jxPeFK+I z$KI=E#0j?~M>7-YN7w)YOOVk5f%Puc3nD+0wRSNweJcFbx5!~VNEj$79ODG)P~~D4 zUoFPo`_CapX*k}@mly!+?PpP)R}X#LU3M%%`sL3E+jnnw(W4mJL=*MBj~=_}#qhas zFAB1H0p{**%&M$UBy<)BobKQE*|)30eb^pK2q5LoLg^PM1pNX!0G?jXmneS7`kpg8 zkjyM@iDl^ic@(H7b}U!?#!q}E{*6%5RGhF*>fvUPx6Y~{K=b?Sn8Q%pG4Om*>P}83 ze33EKFkB#_i0I93Bg#!!ER<6#3)^U{is~{;R_d()`{rrx`=4Mp$z63hSQ;Nru`U(i z%`{(O57xH09Jo=c55fWcEk^bubCKIzpr%KsSq?AHT3;ysFi>{32>AFrOQU3EgF^-J z#+r@p(u+}Fa*NFo}YXlQW}_qYTh5R+3NcEvQ=iuXWZ~Ewf1!vOWF~li6*}_IRB2*TY8< z2{{rk?LNF{Ip5{+BcR6Xfty~1$!naH5D%e*2bp8p%7tpudw9`lmrpIn+9b1iJI#xB zRjPhqFUxLfZFe8r_OrUMr_)xJxH@Kbd;VCF{cip4PQyn!7$9EQx4)*I6hE*S7fJO} zd)mb5N8rcefQ9v^xpzsv*b#lTJr%E8|9Gu!d11ZF<0CK0c_}imE?M=wEdL?aq!@O{ z6(hdjFc{~#v74FbXX+R*Lz+$0s&yqVXpi7iWYrkEY~1>Pl>aIY3^Iv;YLk`KaJ#XYfhE7Id)yVK0 zR*QLW(#xAXx6B{tG++*{AZW&hcbyGIQ3OYS!hfR$#va;FBMExpPft#NpeHDHQAi04 zr)~;eUqRd~>=~?aB7~t8Y-_a(c1Y_`oWujBh6&{9TUm-xt zXMT#@wXHj4Rm0*$);=~+2>WAkMJUK(84300Dhe3UpEh7i2E~M@jvWZM(A0!x;d+mz zOI~N$`=M?g_r!ihByrX<=6DT2ku$IvT0MX4g)+hD5^OT$)UVP66P}X+mj6jVkC6Vp zniZXDb8*e-h6Csj@|SL@vrbhc1_#x9k)CIe?s!;!J&(i3C!jK6gsH!pdmT_l(By*9Z=^TG};3=szATu_#e?lzZ%BW!KNA84o1fw6Yb41) zzv?o)OsQQ1gql(PXkj>t=pky1vnKNpU)BsvHqQz{({kULRX!b9&A#P_3vDvAu@Z(% z=)oK@CS6TXvjk3H;e*t6{!1bU2l}537sZIk+IcnbXvoxcY%cbw_P3F;oHF#Fby+oE(dOm34+sA6`5c-WG zKpHxw!S5-+Y~JT^@zLZ5ZGFcu%U!diS6{TouagLBB0{W6DVM3%%BsVS@9*M3h#d!6 zZhSqRUe~_u&5+FnA_rkW6`rKzP}@7V9>RCnkKzD0K`ysppkJ{`R~Y~r-2PpNA!t!) z)^(S?Ry*lmV__i=@4x(2%MqGWAmbP5bJhhenl9j^{b062=fh+6oNpY9;5eUq(4ST_ zdj(3f*#ggH?^npDXyh)`L=V%mj0Tb}#Et_W%c)89n_;D2mx%>wye? 
zH4U$%)i=h>S$KT2W$+G%`NHd>qQs-}dOSv{vA?ea1~fEx?N-t*monf2g=sG>eI=ZL zK2(h0SP7{#I9wy)I<|FZUk&zurRib}Wel5Pe+ZaO;f0+M%I|0TuL$GKFrZ;v;*$%^ zEu#++>-T=6b1-0J9Q%IX$M=0K6gQHo9l40NBlX*D=rY=#q<q=6JFlpv%9YKHK`66#)9g88QQoxso9lK?`(G>KZt_IB=iT)Z+5L8(|IWPA zOqiKqQQcmwR*w7~(b2~5X2id|OuBQcbkNUmGItTO9vj^Y$4mokHp zX*+jzAH7ryWQ@VqxD9!plhu0>w0>vuiuR4#cq1msn|6yN;H=%(0#l$ z^2Q|URO%)yvUH5yd0bkXv46)}x+ddEk&#osK{eydEEuY7XAc;3o_AQcXz7@ztjHd) z?Cj~68<@y^Qp1rt+8tS6S))n$R^A*V$%5#liT&jn*U(^p`!jsUd_B-DMZtqYFw))p3>4wtWvBP*`8(5Ts#P?e)bWxuiyhv`S zB|Adt-)-;MdQfHkZjVhl@p4u&`s-et=b(@@?S+AZmOTlsxs>1b+VoBPlE}wW052>6 zVoNqPZotd`UKfu~k4?s{pwG5}$2|Xqdba0f%9e4!Gpbrs4e-ERWElQD00>jjUF(w~B9x67}h| zOcNE&LUP~M6DibubLr8Ax?v*ASd)Mr;21Oh58;gyWSo~$E&IBK(@E7$H-8qNy>+dR z-Y!5g)snY+?2hC^4QU)qw!7)T?UoKUJ4xPt|=Q_Uij zKlVeBO~YiWW8Zes3JP%xS^T*DB?`w9*I6Mg#L_R`w7nqvu^LyB0XA63OSiaj7;V#LbrGE1fXbqNChd4H!u zfZ&Rt6l}670rKnubRCYO1NJxhF1?ZftG>EssSZ5yLx{ldh#`ohTcnc0<>wGmE@gC| zUiKn*s-7**6aHI%{+q37u;mEN>I@^pawK$f$S!keGbH^Atj8wo$lEl}DGOfGo>0jm zi~Quk(SP0CTM{WbxT=*}LfcKR&r9>8dtv)okgDo&8a zppUG!MN%$W?iVr)+PUbPgW$n^_h0M2Ntw$(oP{CUwZ0@Wjj}7*fyg?BB$->>NQUwZ z(WPNr&u2H#J}4fbFF_k$)CWAA_rVtQfZA)4(=|XLrj46{nMZ;3p)v|$rz8$b6pN>p z)ditszWdDZHqHkSP<&m5N_*7b!}7la0|b;P=g8`~p-kuERtifLBU_<^KX>&e|3_FQ z9_hl3{@)K>ON3DotzowFQH2%il$ErXKI-HG^A-mvSNtR= zB8>XAtxLy2tk{G}w+=WJP9zu04K*!=*5Nwo-xe$)1Op}XT&Ip*6W_^UQ1c9k^jQc> z%Qg;z0+EUJE5-*g%{?F?Yc|g8Jj>OtM}p2eC_4r|SRZYEf`Kk<061@} zr}A#i`(XqcxM|ni=~>eUtYjBRVGE_N6#za>v4b=S;9`g1xWIXG#==)@(v-Ut2J=g4 zl{`RMo^1c8EJ9^-=%3qmSi$3~_}%c7OT7PlyPe8Z5HEov9=e;}xkrx2RI`ssI$C7u{{&JZosck?Jr41d{rn;8Py4bK;65iy<{OUx9Tq3_P&xkyIIRQitCw+$*l18i zi8TcB%RUC6?OT9oM%n!FQ#I29@b6Vr%ugSVvZf+<5<=SEnWcUxaUxWm6a1JWRndVC zs5SI7NUE_l+IL`OQ0^29C_&8#2jjE_jNetSsEsGcWj0IKwemKi=08fPuHm z!kthC&{L$iW1pp{6SR6_C8tzYj(c{AC^7=xVsxD#i0Ilw{hT(?89q@WzO}>oi;6vN z9my|kF0a63NWXch4TH!f$ueZt~fj%J`?(!AZ~@G z<;6t}IfaKFDk~g(ASg$YEsWX5qTDxn1^{7U;l9!YVA5}lk3JJH!1SzEpeYy1Q;460 zVtJYfq5~`Tw*Nb?*s*tES4qeOoUK%^C=ElJCz1Mb8YFZt8S$wHSF+|ViQ!((YQHLp zuX4%UoaYCTbp;d-2WXJhp92cF33C|8D1=Dm?rx6LlZ_V z7Cr2|f=m!#XL)NZFdq`%9-D0Y_&?KE6{J~DHZpMaLwRCg9sDFxZJw~+gc73wn0%7- zTE+I$H1D<{NDTmQZ+dSz)WfG%c6}43>|)yXDrQ|p7& z2&3nA2|Rs%O^Bcakj!X~C}_||7y{f?mZiTyyNCr>U~?8`HY8t^d;%+@mx-{y<8yBs}Cart9DgU za%_VW6sk#=C2*X>5sNwO4-90mJ&H@(J684LFc2!87_c$6KAwxW@94f_@EZn75(T{$ z5?%?WuTzXLTHj#AgRwLHCQRSNxq_CcK~Hmw>ceR*2}P##%sef9Os)7<2%w~GJXsi= zws#wSRH+g;4D<$L#PUa~FIKg1ju9IXk)f`i2dMaeP9#bI$A=rr1Kob4Kq8LcA*z1Gvrevl3o&a%~XfAJ^*z zGY-As+}QA%=2xMfd^DIFU*4Qs+scykmjXlut(3o$XDNl_Gd&FiB8C$Lqi9GW0y3X2 z)O&RT(_$J#Lrm{n0K4vo=oA()c(uSYQlUfvAsIZeK|>POPh`X8kvq0LXvE$M9swP4 z4s+Tiqlr4g@5ykoRH-?ME0#{pyCS72Fk(Q=N&3wiAqybb+Q$5U880aKX=T+6;^r3p zMd=z=zr%bo2Py&;b3(rZqe))M<<}yXx%{NL?!RasJ1d%E`N@3h@zxR|l)EYyj%jZ@ zeKg*2dVLU^IsBT22H@?(A;+_!7?@nrA9*?{6($|Q1`KdUJ)#FI#&n$30QtuBS!6c} z9EX7iuiPjZP4>fBPQfOjuvw5VKf?Soq6r0^a9l1Sl^ zRwmck_^?InYCo;1ZgU^`TojV@tS%OlrA$mq{?#Fl!k@ZNj;Gd?=pWjvJ2U zGpmkTndImq1+1Dg3{(0iILG)BRVXB;*lU~yg9K+*!nKkPz~%&U>WKH`AMd{w!0Tnk zHVnE8jXs44*~_CYKE5VgTL?)+s?J-8kJO(*IFEgV`AYZcIOi`F575HWBxM)rHBTl` zKcV+{bz40$PHZxc!8D_AB1)Em(}&@Ej7>FT#Ug};^W8@$ykfp~ItVaeJAwr?dx@vd zW2{w>7oN-utP}8~f5AFRL4Pd_FONpE-?$>W`%Un9scgQyVRAV!0{Vn5U8ez54=N&< z+ZUIci6aK(*-7kZ0?_zA|zZC?S)x4|N|$E-_i62{b;=lg-~`la1j z?S37!d8~XlmrFR-`WSltsg~J^++bQSTPw^cSq5FIRiZCFNeMp)e9Z~BLoU0YcnDCP zFS7V1nEu`?KrWc&?8qJwXD-o1aLj}W`Mg`rdiif^8? 
zD6v+c0)FHYx*w?Em@q?xx%b80b3RP`lm{U4+6jaPtc5^zs`kUrc=UDdP;h8XCgS7# z*;T$rm!XE_GAm&EJ!qfp`$~p>z>axDvl;@5q#~HZo_V-)DoA+Np7Yd+;R+#sJSGY> zu#`S$RWq3%*_ox<`GsHf1=zp; zpCc2iBdRyr7olC0)v}(ua2+rPnc*u(dxrp zMfc8jzAU6|Y^6bi!2}%uH=z$>j&_~4X*~9mp1~xlgAHepN_cL6d~fSw!ooe`d{8?VcEl7=f!iZI};e`j7yWR-lpTHW9agYA;x(r!{@$mK4ahJFxeRe4O} zezSvA`{fZ>fMJ#pu6^@&-TIRSyP6Jx6%uIo+T-_bEm@TM0;k{zm?x?)H9sQ=JKlPW z$k}A%P406bDki+|wGOtF!Yc_gGk`Q0s@tw%9C@E)(6eNjNq3xnL>d}vAu=2;3xsi} zk@OWlht4a3$q1>3)}Ka*|8m`MHwRcPotr zpharE{92Pw*5VS_@y0HA&j{2QvVFO-=uP{ETM3U^eo+ao#Mytl%#*NkoQ!19Ncd70 zXd%BqP8>X;%E5wE(xd>$IQf&d%meOzcd4Vo7iN!HwWgJVrbkMiSR1bH5AAPz7>LKs z2wti&%n=E?{D{}BJI>2Q=n{7^tD1A3@0a9gD<(s^K9^*q&m7)e4`R4D=|{*ARPw%pzVm{(lGW^CQ=J5ys>S{hgn{sg@0O%a}Cm1DbKpfL;6ChX_#)|5PU6rF0sYJ}6m&{II37cR_=Gj(?Z$WmS=8EUZqcXoOb zz+s=K;<_T-326q#35l4jfvt;J;|s9g;-)m+X9k>lOX+X}f$p^t;(DIsh1E|}H7Kw^ zmE4}oK1w0Ag!wYyG^G0z`cTris~0rpRt@ew{Wfmkij@Fj5rSX(3Kg_>8#GN~>B8$A zq=i4q476^0O(I+!L#(~wI8KJ6F68rE9*2qTS_rk@D3J-3mc~6=3_ek?Q0Y*GC?#l zYw`(Huh%U2?SI6Y;f-ikn#=Xp$i<|H^BRh*9Dhy%oGSMSznmwp@)%kvCLGi1*C3LeHM?1G^ZAXJe2{tK5$>{gT^$hVE7%JWY_B`UN=cPrD{o;!k-69e(+qV}OM{gFJ@aU&)VF#I26%cOFAZx<)~ok zQ&DOdBLyiRr_o{R3g294vqOaPgoAqB9a`o42qOp8$Kknf?m=r^0ebOO5Zjs~0PnmB z`!G{uV#JOh7%S`w>_uu`4$;2ePER7tmLEr!REq^T2(e_HV}-IEeN`#cTb%8?uqZn> zo%st!(_3U$&ZSCIi;cg$`BZ|K0R7uqj)*;$P|q3Jc>XoLu;F~MPgk{3ndW`y!YOUA zR$+#jwgklHuV1hOJ=Jq6HK>FO{2RCu?bBh>xsuonYbuij;!SXK5h*fYRh_wHF#1f$hX-w? z@wnX@nha5%1wvd0=>S~=Upj%vYg$KccoXXDJV;5^*F(_G*<7|` zk*7Y}<*OyAb|EZ%SVFZ*l&X`4ZzMWGLAvJ6`-Qg#fstV4AE>@sLQ=-SJpL;EL($pB z8tD(!`Qp~P+_CW9bJ%c@V!Zq4(!4i5WnPk)0opv)p_911{q-vvPTmhHlAVr3DD$Sx zH?Zvux=QEcC85m4(A4&RIN_K?g{D=L;xD1(>i;GeA`Mz@1FZ#3CT|^Q?7sJ0I0BRe z9zijx4+znz=SlJug2=)Y%3b@_V7u{|_bk83vkA%6`fed@Lhg|}BiEE_&10hZQ1ZNP z@Z^soBTYVXv*3Z8Wb4e^A8?5L+gU@dZ%G>%^i@%%9>wESi_@tR3bERSd6VgN_s)0r zVyn;)g0XCTCZg1QrZ;-*DCo$oy)S%PmO0!MZCN9VNBRO zm-{-C!AToPg2w8CX;bo=(VSJ11V#&jaj$INlzx66O0=9=sK-I#$EE+p@Ej>!)E0fX zml%hq7_)P@!BH%2g3uM2PtIQ7%%Ew)musY|#!_PwZ=v6adUSM)WckWl)yo+S->wD` z2oXGCeUutQk`FcI?Fc_t>{F61d4QtxsrFNBqF%etX69F)N>4Y6GCO|Go>+y$i%v|R z3^#n7=+ifRWevwDU9g3v{egadEvf|Qr5BGA@_niFR8-Dkd*aI(x>RQ7U8G1E`;w4W z*@#Gp11GDmO-azN5Woe&Gr6tbTEh>u~$0Gnaofh!w1Zv(FVFDi~#eYY1onxK}egkew#$Tg912IYV z>l&h-xYd7jvZUCj?%ML!fAk|{l6U_rae;G|*Xz60jRD_e1S&Z#a&K0RIb{g8@^qg@ zF3P^}3u|E98Di-PgG@uU|9bN`81L^*v#VH|X?ppOK=@-0wn)+5k3-)9pxJDJqg_Tk zkj2qFzd|mlKhINT!b-m0jM*Ao0{{QBu*1I#PJwO2?GH+}YnNgAIpNBd=c~XTSs4W{ zc<|O>7S^2=K8<_EUq;|BetJ}ZM1KPi;I_iLf5DmMn z#yifx>QYnp|K7~J#`!M$t#TEZ=6v=&7NP;R#&{V~k!}8XEp_!&O?W!^cTCm6lY1MP zKT{+dR-TR2OlALO*hAia_f+Z1DLMKNy2{5xAMt=c@4EIvg8A_=ZuxqUZ0os3uRJUU z@dAyxu7NoaM=VpQi>XCqI!!2ySfXsNW~AWaAqG*HFnNRhXYw{Es!A}KX+9R0U1nW@ z9lEVd9&^|BBowPl$UHC&nCxa?BPGDlhmgkrJc!owk>CA)hUt}MJpFRbc=o=qUGY(J zxVr&I?J^eaJSaMBhQ0>-wGU(kbYY(ZWM2ZojHTbvl_cG((8zIglIA83&@3S_d+f)0qO1WG{Xt^;q4B(go)KsH&&?ZS66J26Xem0Bj><@l z9Fet)0o>M$9bA?jrOSCHNU{Nd?hL+YBWGY!c|ZfnQ*9}Rl#aUkKt5$jiIU*2V~Pi@ zgjKPe02;aU%nmJFJBRQ$+i*b>O4UelLLGatOm+>#l2MOPxs^xV4=Agjj@oRs=B6Vd;p577Q_Hwid;{l zER$Q_TL~b8);eA^aqWNoJMhGHgu#|=_0*D`D^`J+Ws_PCySLih)GmS3G{*h39qviSVCf za9M4E%P*O%RL8mo+P#2zi>CN6hm8Bv`d^g6N#ef&;JIjW!kk^&<#978paA-#k^4_G($MNaAQB zV|ec2&ySd`?>dppm!i!DP)zGy@cI9F(Tn%|i<_9mR+xADCh=-g&(}|{or{G03%x5w z^lpQD0Sk|Y1Io7hpmIZ`o#oHiHp&@oY}^!5n% zmaWk3;Oep)d3V5fz|PAVEtYnMQgSr@A#I%UpKlB!KOr2)J}#h%mYaS%O8*r!%bCsg zpdKy%`vySj6%MV4ScNZSEKo^b?c+DbETZ`KuAYHruR8qjFZ;k`-(&|Qm3r>o&wr@Z zEV}S^yeG5i9-zOZsY|=%#(EDXi)48B3EEx=%tx|^jO)4b8v2q{iHUJbAC6H)WT?1 zjmfBR-L4C{zCGAN)%C?@+3H6eD_oo*UAiSa{~Gq_Z_Ca(DA`uzW6_G~aIkFbe)_A2 z5wH=$OvlAzhCBuOkfUMooaE!K|+gGn3?d 
zX@@opW!QrRaOEAS#qpQQjG+pqBfm7t9=C+{X;Da-ZZNZ?c4E&sZ%6{=kSa zNm_Wq5-9&Zh#mp*DX^fh`C#+f-r4H*Lb} zLawqH+)K2$YR459NF?7zkKAZd9fbXfk?7;l8@&qHQ>cQT_C- zH`ona5$|k9+H+obQ_KC9RG%sApHLrwldce;6K$8yNX)oq*8*0uZ4Y4oR&gy&A4%Tj z?I;8O$=Q~}`^WHb(jvpYRp^S4Ad7=wysB}w(fiEDTd>-O;p4@4EAVJv`8xIA>vG3;8od>Lf$H!ExBQRo*4QG!p7FTOY|~aZQQ1_4}%a(jb6@AE4Qax z*2ok(J1Lk1Y1XhwOCiU3ZTdE8+k9o)^_j1tw)B}zIR!OXwY-b_P(zfj_jEn|0%qo9 zWB3LNfWXPTZ9hg&*0K^f{3E_7XTC92C>vV%Epm_|vT}MctBS)|SSj1`E09Lx&ao3P zXWBnkZ3xVz31|zFr3o8S81=SM!3Cg~KA)9X{gJBxz`9>Q2M8pLZKI=tGf#ib_o_V7 z*nBhU6??5kAAJH}Xeky%64>1)0rD-c3&s5e@HREDnR|2{d%Xqd8i^jwq=pog=NF|- zuG=I(oWX;6)!rn9*gm0ch5Q2bxuy19Z_{zvwH}z-YErDgf1$q-818syBT9NeoQWh=HBl+4s3+ zW|nymgQfzlKtZ;B^MwEMp9_%Czy)m7=dYAYYDfTr%Wd>lNmZO!a zaS~mjetf|j4Gg&G-n^%kcxj1VW`Z>sL{%zH>~5#2b_@P89-0m-Ej$#`E7#v)P-_U{ z*O7x0Lz6?;6)ceAHab8AmoRpD8`|GNr_P2IO0^;5QBBka+hcd79!9g;lkz5lyTX1J z{C9_DDfPu62zxHgi$)}(HogeA?cF`v5E-F9dtUqWZ7qW`ivY#}gAjDut$6ynmx^6i z=%(8N{_FD=)zSsqP=8)=6 zrwtByfVfO!=G8V+#4@-Yo0tNvyaEM&5W`c+Bl3;4q|_=}j8A2x&yI@s2%O z0$NOhb5qe@`K8|gfJZcy^#XNUT1Sx9NB4i14X&j0*-EoA5;*@1#XB_0-fZGt`T@KuU-%68+q}v2mp3AOf2Qftr)o`8 z)#dCn=F9nW#1}N!R&686>S{o=FZL&5OzW&EIaZ2QB|A*qv_tu(fNPWA@%Hz+tLA@S z>jD`Dcizr7$BAong4dw9-9R23n+j3gxS`LbOOT2+C*6r$Yi^B?0dQLz!!rF;f<*eB zJD=3A!lPppN?@%*$w9q6N}nFtuglFVhBf;iO|~z>(K8Cm>_XBwAqtcqp8Qheu^1r- z;<>7eocX_?YMLwI$#NpD{2!8^_5TG53E;zr)nzFJ$K<{)Q-9)4ao3B?!Mxk$#mA3G zNX!UKpR0)+xj9Mssd}F)d!cOr&YHv!<+o>+@_95Oj2JECxqr{vR_6eV8Q^8nl8hH} z^^%4 zsONn`(?et~gjlo4lS$WZ;@%8~KjnJ0-o+L97>b&173k_scm=KINq|>jxiq>++wD>#TUchMi>XO|@X*k^?`@KBFygUsP=#XZR2BwDHr*Az<_%dz> z`py8W1bfM8OD!b;D3fmYNe{53AM5{|A&TdKt`4_4;aUUA0tVgeIMO-oI#gtZ{1OO` zzW}i%bJJH}9S41YZBJt^&H2CYfl~Q6|DDhjx_<{+XVXA_oKr`?i1uDih<0J-MSesH z|Jwl7U-%s($YT4UlH}hu8JfZ&w&C@%AX>`Yhqq}cOW(D259UP~s>?)@;y(85TV44< z6Fb2!)P7XUs+W9Xj+GzpdG;=~T2X!PJ6O*c+z@63wiw|NrIjB)AIcW$(pUf+q^!P@ z#tl>-Ug}%NUyx@?;@CaeBV799*6C)te?wEwitg}WSQad^c32? 
zu^fRY0)~fjY-T3Pp|h(C$KD$HDy)e2@82U^2Dm9GrN+iaP@k%#MPFUjeTrpc(_ZB* zyZ8?XqXqQBcqsHYcs~?mHBk8gT`YyPf@T-Lc-bLItOE#t`@VmNaF_@{0jC4S$Hc_6 zy*uPZ%l8LunF)Orw<;HM1B5H=8|)Gb_cgVRPQ@8r66;;~{nfO>UKD#+$sX7m&K%X* za~Fc#w5wgb0f7Mf`%+b$^>U3qU%Ya%{5o%Z&&xWxyVm%e&|X5=ft%@{ALJr{DDd~x z2kZV3oe>B;1^;?7Z>s%GREM@mT5C$v_+D~WWn7>rzCE7p5QOmN4312$227J0|MN$a zSqYW~c(!2kui3%A$7nodGm3ZN)sVZ(?->gH{6&E8_!Z-i5TpRu7}umaA4|YEv}H+6 zhr`yuxgl}|31`AMgOP@X$!rhO(wK@mX3unB4i^nAt%Ls6Nx@hiy}!bA>fawF+^iII zw>SOK_0nP9UfE@*;H-o^f83?Zmz!+MwwI5p`9Ab>B0H@slQsx$5CGIki{!7r@t+{2 z^L{*s&3BKnDeK|u0un{;?q^(xkh9~2f)0M|sQf5|{R!{S`oRVcT~5s)Jh{HdhxQ`K z;Mz4}UcY!_0#VL-{CGLA#Enc8uKeVFDZTi==vGYDILC^^8NqZ^3KZ|ofQ!NQ2v7^WM^RS-(6_rPM&Es?2RCfT z@)|F~v-AxN?(}sH^?~nEnV}~4Oh$VBCZnuf2^Y4pMR5s}FPUxoDpbT(`Gk@{7HbxhYBs&KOXbR53c165Iar|&K;J6$H zYlb*iILaX}h0+ZE&zIDZBS1M*M64B}PDx5jL-=1+R>tC(JuK<_w@*SxN2lQwlx>I4 zoask`D~OL`q{nO)Pt8xoRz~yf;p9B!wRG)vDekXD7^U(Vp|E|66n9|2IHbp~D`46J zE;SStC~c7TMO3O->tIyRAOdWjJ^KSF0MO>&&H~hp>}g*yuZ54gwxxw}iQl>g0|NF( zQ7v8mm9-y8^w?MzoMdp;<0u$>=anQQhIN6WM5F{?3YQ7W8!UaeUy7S*W^=MoKjFZh zB?GgEz5}*rdP-h^623{p7m?>>Z9NQ%40g3+F{&c;p!y#|;eexWZ$;lbQc3OPWS5Z0 z*1ngfUYpt$L&mlBth>TRg1L;ktDVhmu2XKm_|TcbyIOTDen2~uFV-luOki?KbI)dQ z2^TURlX%VJYefFwc;CgZ>FnY%(Uuhg3o70obkMv>O*`M!fA|0r15j~o=hDoetV2g0 zuzMsKNS!)$bE@m8sOZjyImj(ASBt&$F_am2S*?sdfKidC*?ONdQ1<&s|QU0l(t7iywRnps8IG73xlWR4asDVP!+ug0RWCd7l-8&2HtD`8x>ZQ!g&P^2xiayZHF*_`G5bAKW%L{XJqT=FMK>vfXiS)9J zx1^FGG{v(-vAm=xJS}1xIievH;mva^eT;`^XS)*>v6A$N*(## zdf*S?X6w1Ogd+osLrYT=z(9jdDH3D7v4L^UpmkLviqs{z8k(D%p=v}ViIr7)PR>Dx zy(^b%2VtTY5Ew_#h4|Fv<>lhS!hQ_%!-`p5_zlvn+Kh{t*%_AyH2*<+5nZJVU1{V9 zl%Fv_=q<%9{_?YsmoswQJ>xlxKz5WdUBdJ`ye8TYQU(#%v+6@-qHNR}d=geYvT;yO zsWY`*vEda7tA#XddH=KNGoJ!aPvk88ypcN^#jPKB;c?xEWX-ZXxy9ed=H}|2we(Bv z-(hm5f6qt$o^JL7nlZLo4E%9EW{FuMCmyvr=)X_&yQ*95>XsRFatu^1Kn0?S4>=^^ zAAX{7joNb@m;jhQuHY|3i$7AUb%|pt|BL1i#{TCNSLH`o_yee!_%Yf6lo$j`V{@M1 zMZvp4lmJT~Gt&u3_Wm21WCf%PFBcw7qQn*S8u|#yNaBqMfjl}r^j7#gL~{!b&8@}p zo#7ucU+gaBQkAk_*sts?en|jMbqGOVW{ZA==g|RsCKLcta2cU2h2CLdb>omuD$)t$ z*`kj?a*(bo01gui5gi!1$cs2J5Pe%zG=R89Fzb+m;-rRXe`;!KM3RxT4(#d1^qU#v zl4aDpmV}{>fT5pASObEA7Z8Ia5E8qJRRURD;f2atm$#@$@zJqE;h1|-QGuCPHki_d zhX5aJ_hRn0`wkl4S-lM1>?cnk36@B1j3wN|HbBpUKs5#RF@+et@Wta}9eWwgS7?hMa;~0Q!2j&2B@&=3?#c71GB4?GA1IWD5 zlVd-iCWMbJ0+?1sC}z0g$cE?_h1#%WDnj)U=^qwGi~1vwfK9^A$=L@x0w{g#4LAs3 zx%nSuw&OSvoW%8ort%Stdk`Jsk7LHj?b~a4dRe+x11JMH+6EV!n@@u3$8{l%as%2D ztLI1*Kr%lcA0LL8c>L}Ahob|^!J`5K04jN6FqsF&HpGTvR17DlgS|ZwABDUT(k`e! 
zAk^w>Z7qS^1a-@oPoM0e%0ayb-2<}OLamKa5kr%VQUxb*ZdO*%L(-q(Wy^9MlW6II zRZgSPhxRf%Hy7vTu6d7C_+#)c(H)}4)#P7*&;ShbJHKri&l11qZ;ZHk=_(N8;k{Z0 zWv6=%43#K}PVX|qr z*yoL9S^`JE_}=<+jaW-_Gi?1z0o?SI7ja!hcQ`;6gi0E-vvhr-@5ZrdiBLJ1dgYh> zdql)5{rm7`(Y=T9T!9$}ATdmf_Xn~*1w`;Qf?kA$2i<3o6l+}xh-#=OPH_C3?t6S_0zWl1VVFZH zhD+i2vu4JI8=0JhWHy>&$47ag-LSTi2I^ox;?*JF1C_Qcj79+Dpv&m%?S(y=tp5Y@ zb0j4_v1%bLMOdg4Kt9y|GjLDX+1>4N#!(2?v>z$A&|0M%;?9MI6iOM~FOefFX?|O` zaN5Fv4D5@J*8er5%wMPPbHdc=aWH|2KKO!b1XHc}2?UbIUlLL!HDBQc=NA;5yxK*7 z>1~6^7ic_=C1JWCfr6uhrh=JshrZ=hHcxGZl!q<#6AP`!tkIZNxIa4~%bfk2qBD)PY?S;9tGe z+bA~&YI$9ro!>G-mck-9g62NPAr~J76Ao5X3P7yJza*#muF;s$z!QwG8?izhHjKsq z{=9$7%OUSXwD%=%x99(7ae=rAm$5GFED#IeK!b0V_EI9h8VlF2U#_b-AmJN=gbw+@ z!$$1v7-kraN*#b=)95_%A<@i17YR^`z7F%fIguzI8>=ulmR0^3aZ~cl(f)Gz2JGrO_$)mQf8OK(Z3m0%8lG^vjBi+eN=q+#Q-@;-CxsX;gKjm3JBU zP5$|VUPOI%t{1m8>e?mfjaXU9H)PQ$;OR(kOe{N+2@pfqA=%}LfoOD|Sg`5oL0PCS zpX`q+3r-*Olb-oC(ZAJHzZ)Qf401?@-rf&VveZy?GOdGAXNZ7h9v0)`es zhPMKk)*FaPOiE(YC|s$S4l@iz>0L1ZDyZTygT4>9G^+ZeNlhaoBe+~OzaBk_kAv5D z%=Gymd}2IMT=HlN+?}1}PCYahqWm_39Y~lWzZZ@f)^PCx+QZ$Bqv`bi87#Y5lDF5NT{kbBF^2F*dUgy;shT~k{2@P)qk^yfEg+{&KCvr96r8zrtW z*UcZ!h>p9IW8m1Vp}d%NxH^H&K+FBxRk`^fd?ms$e3DN)DO4eB636&i0yA8y=(AnW zE1)|i)K~lNea*SjjzQE$aP{%TU>vVaDHw2avKqMDa35Liua&`IC@9Fm{=gl(yWLm5 z+4QCJFzSrJpdbu>b-|!01b(522q>lwsl)sg%t|}EoFg!aLklCdLq@nUF49=2<=t0`Y_HhS=)_)kQHjVb?p-Cqkp@#oqMYp!6(H@nTuKYZ)Y;%$NU^wyc8ux33a~ zu|W3Igpu2vq$^4|+oAIQB~M@p9oiYBVq{`ML%M-m3CIC3h{Mbqm>3hz&qM z%^mHYzCK7yk*?X9cUsrR)AJ=ZkP?rz*aKYLPVj2sDo40PP>p|&d8ksLP&>Ii7#5tA zh_4}9X25KcG1Z}M`igV2_72S~HE5S_1xCWzo3)hoh+@OvKeRg>R5BvAvBA-iS}6uJ z+kM45hjwgs#ehVYRRVDpsPk*%)-YEFKe5kB)KeKeA9GUMGk$7}^38qv6kzoM3Qrv2 zIHH3^v=3e#HVQx0tz6X2ygEXaDeD2s#?~dqN-Mr=#+1hV9lzRB{Rxd4ooc_zGs}9= zN>K=(pMH@ik8OejnS9?DMOi^^@+u(CxXXhT&;-Ieh7t@`PC1QiI^U9LWZc6bOzl8{ z*gvrcT#Oj{dI%FSq^>$ka^<11dw|!6LJx2zK_L}D7f2Ko4_MURIBZeHpn64QJVL3s z8c;aMf(yiw&3Xi?8A3RQ3)r$~=MKg-g6P7i#mq~%^;*;RQi+^)Er5w29`d_9ZGs5} zB_(Wu2yM+dR;T7_Q<~gtw>;H7gJ2kRCp7n&?(Ie>#_W5Ni3%KfuzV4(HZb-$l_;|M zp_0UbP}_Cc)pZ`P2eaJ*8m;$4a?8pAS?OJ=JeuRwXLs%#N`;2Ff{@1rl&Z_^pm6K}vuuSpK+1yK-<|9P|K|l!MON2fie_ zs7&vR#1Kb2VS3cP$Qy2Me}E0)?vvZ}T>7(T`~kQju?O^{We8tKyg`i!edJ-x4x(bC zc&h}mHh7xin>*GkmefINGQH}FQzYkWCf?(k5UBFoHMcl%j;jIqM~wj^4=?jB8^4#l zE3pU%v(Et8jw>6Ig-0=xkwLk8eDcwX-ax~F#u#bom_xn+uMCnRvhwry?B0zy?DdV+ z8~kw-BO^P$&pM-InZe-&6BBY7aLA)Yz(MXwS{+pW{5jDFWAvwja)ikCp>ok>bq7ES z#Wn*>RA9g-Rs`;-Xzxs4UokN?rJ|y$s|Yzc{{7m5XtZ(?SgH%3_y%U>Oy(KXHq?3x zr^0^nJg%vR$s^SGwy6AVshBIBcmEEgeOs1(;uvD|b~*TIjk}DD@Ri?=$-k=x=8u{{ z^p9>~KZb{+EgovVW6dE@M=lQ*1_RxQhF~nWXQvh6*PKRYa(wz#tWa8S}blcAbk)Nlhl%rUF&?xhQ!O% zfI{NJC3^f*Ll+J9wLWiw8>(7oz3`7ap6X>04(Iz)d6r>b3tj*xb-z$%V0 zTf4*M<&;MJxmMF(W3&WN92_AJVP$ym;1*sk&M}>b;%caC2n2$#a4}{^^Qno85uRfx z+lZuviAg^4hd~u{b8|C>D8}*`XCsRY7}4IS<@GNA5l?h0j*dsuz7o$^s69Yr5#WiYVBgs7I0kU#am$+?UkNuEzl#w-&px}- z_0TAgJJN@*W`^zHiGLP^70@D^AX`6;(41DcdJZf8HdtGPVe@D6e{?x5EDY|{kibmv zlUwT~(37uH20hX=HtvCN1*I!ou^=Fb903q)W&;RjUAQ1l7zMWAZz}gRUX_oqa>j3q ztHEWqm9&PYJ*oYo8Fav6@Kxo|pRL+l+Ph{7f&R4HHzj1;0knrX^xlX+WqI!(#9)C! 
zxi|gM-qx1;Nt!cBXvyQm*WK8Qk;5-~`b#HU6MybX8+kQFa5 z-qW-LY-6$6E3$a%rBwh7VDjZ|`~5e-jja1l!{v(?7o?O-Hg66IZ=d5{JL#L%kTpesb3Ww~3m zg$oF-3#HCWpB*L^nKIk#3W^I;Z2|I6ySj!~DXdxMG$>X^{0Ra9{-CTCpXc+R=fO=d zc{T`lCeW}nfCT7iKq&k;g(-i;@!-X8SXqXwP6ZecwdRnGOe1O*9z^^Tj97DVcSrgs zaaur)I@}pO@Wbf(2k2ln6S*zbYwp*C&5Dz2UG7*qkU0O)=83t_AzF)B^mtoVDbO#c zSH3SaXSJ2jx$sYC?R=Q1;i+?r?mH-kTwhsdmUEYOk>jGJu5tSoLXYHu!oGos`OTXL@!`O?ski*5?!8LFXsk&!AN#%#KM~fwDYNP=uAPQv#hlUlrYq-Gys1hfqo%L9&w<_<| zVWr#_f@GZ;SSZ`lfG8Yla$Ssz+===A6sO@=*mf^7NzONNi zU-n79EVZ&S?rYY(wI!ddG(Sf!%azNg|HS@8XJto>I zVwnRvSOS{jg*$1wz-JJCfr@42tjf?T zWE~DOU|)L}GVVv-Q=~{nO7^ZdnkW4FvpgN1=obV;l>iw1l+Q0My}r-wigB{QXHMIl z@way>G2`c8pgl&LBU*vTkmOvtc8&gu)n2ai;0r*E&q^v70U5&_%qey{C3%75a?2`T z{8rel6o#D_%~?hL4}|V#$wSb2ydfv_@*cth2omzLgYGcYegL?Ej2*x2&1IY( zd-m=<$J&YT7u2uf#ztrpw6u;wfW?nk3AnFD z2zBEO3|5GVi~E+N!CoRo67lxv*DMbzF)e{ou6w)igQ3~MV0EBcJw3?;YK)eqUM^&k zNn;AgCj^k_%&&U024yMu)I@!-0bJ~cV*{UI35gyyi)S&p_mZ{s94=|t8&qT^Y4S@- z5IwYeBQ1l^Y@v$Q*OF&4Fu)|`%^bfLyO8OIeqDT{(q7-o7b-73FZC%*rq#Psc%k6y zg+-b^&8v#%0B*T&6qF@CU5kn$$aVbc>T#M{YVh>t-sgSv=S8l8bi=wm_CtrG48^@X1iO z3e=HOu@c8>hgX@CEGBu0*UYi0W({o4c8IwkBBZG;lD@bGZx z7F3qS4a-h+oRkruvBh>Yd+7`#=>bVukI9Zkgqfj5`}dAPez>Bn%mFF|Jg)<@^noG6 zD-3Hl37?FkP3cZNM+B}yhX3}&oEOPG4`fdWh+<-+(h^k%1=vjfIJn2Oo&TT@hoZ#U z?SwhOUgRZ-PotHH=jr|RqdZaLFgcnzpmQxv< zkWjAeLCVDMm?)+eJ4Bia-PPRd@j!+P2?@rLO{0jwGEOE4!fq@dHPjWYDujfr=g+%- z{NQ>%ga|Aa7M3A1x15*&Giel|P+QsIoHYv}+`}P_BFFp4fqW-yET;C8XsK^6@;=<3 zE+u*Dl)`>_z7_K-UvvIAc>)jZU4{4+wKprFx@W6?WbjG`-PKb#&@%kl@-+K1bBM=_ zlV9!h9`NpwJQzq$foY>z5Zi$%9-8wk=DoAFA?LSUYr+xzgM7mi=v=dv2#m**bb^Ex zs1IczObx#1cMEkdeZ@{U`=E9qY^-g@|0k_NMqtE(8iMI^va ziolmZXGA&oX6sW=Z!bW+oQ0KzjMumT-dWNYi? zR83=C-glr!?vpvmg zn7{{Oi8#3qv3MxNkFbUTj6ib^k_UhRoO<~C5H{WiYXMz6`Z*#?QFe*$a|cB_cLuad z>1P+|1KgO&_0FHKMCcnR$SDJoaaKizEXTKyw!}aXx{;aI zj2-m>yU20D5JRIwmBj2@3BL-;?AtOy^}{T*avc(F_k$DnNFf@%J{k9Lfj&1V=i=R3eIq{05cYi&_ySKS@6Y6%|zJL%C_bebBi+ z?yJP`D+UGz)ve_R8tvRP?iY@=*yM+Njg{4k%ROD&*>MTp{5Q|@g1`FK<*TM`lAKTo z1q?ajED34_WhKZ@;+ROh0bQF8%7SmQwG|i$Z~>zVU!20H1!A!QaM8JMH%AW^Tgm_8 z&RJM&Aj*kR1x5`z0kV7|!+;A5FAi}HK$||pxgfL~BD^gS$m(6}@wZ8q{QV=V>xz5_ z5DyQ)@D1YNZT0VFKaF15q2_}4U@j?^`2}`X)@o7J=*`sWsQw|6_H%h42#{sz)K{UV zs2D_`rlSj=?tg3vP$|` zSXyqQzmEP)&oX+E;{@Jh9)<8cjtY27oSe9!1-G;NXe)-q{g~nXBr{b2o1ryW{x<&l z?O`PB7x5)7y9+oDzU^pxQs46S%d2A47S)qTy`iwjNf{S#x+89c0yDXWAqSf zurI3N5m4Ena)jmqu(=I@PQ8jpQd$~rdC5besayM>_}~x)631i&&l4YuvWm(H{8;VK zd-w1Ez^stS5vNs|4_{rA2c2N@!|jHFTMcLR+a?l8~3Yzl?(0Fu!7)|>)yd4t#Dw*sh`v+oYHQ)fp%f2Vm%_D`^n4U2-DLf@+I zN>Km1axV|Dh3_|}U;XCtd-$ZlARo|rJBV{s1=1J-D}D1mJ(w+wC71#q`Kb;UAELp~ z$_z#xxpS!(p%$X_S#9=D@$Zd|J<%0dTl)JbHufh+Ii&wUfC0~YNIO*ap(4jkY)jHq zhTWu|qo1RHzR2tq=HZ~Jg#IA=qt})cZ1yMZTm64LU$v~fP&o5;AySE_vCx&z_BF;i z$}>-P7Y|lmH9C9LzR*w_@pZAWzmdX(Isy{%`=Y(H$EYzmq(UW23R4+J?(%S>gQsfn z@qoq%;l|%PJJ09n;~Eo2uL%b{z#il>@x**;Yz&Q`DljP@gLCN9*|;51TvF}mv`UTt z0Mi&A&S>{t->D@oW>0(V)EiIpo6_?Z|f7MY4 za=0&34Ao3VhgM9siIOJ6#IAg*#QQLoU3}yj& zq{V+OAgKl~1S*yFk9?)7%DfiM#r&J6F|5)TTAVv~$f2HtIvbq12QF9{@=maf<%u=NEn@&*+z94ch6<|yxhD908gyX;10RlS0owR#J&E8UH@#n zH0J)ykk0P-lMy$23a^<~#kgPX_;OkiSX>Tk7Yychz%ohiJU)$rki`Ag@T zYbVy%*Q@?@nL-)^)c`dkBfs_tWQuStG{zn{-1iui3aG0e%Icc@-%ui8&^5QLFGlj9 zx^q2wh9?bg!v#^6?N@8Qf0u^LQD@gSs(q9g!Pd#xr`X6y)GP~3uBs*dx&Ro_1Dw~= z(y@}2RR}&JR4%6Oa}#PZEVeMuYQy-<6zk+4665h=|LduseX<+9GOtA-SFGl?!6B7 z>K6k#GU#N~tXUfTQFW)B@iHv##s_@^OcSN4FQ86*y-#-!HgqG3UknpMQd3`5NrX}o z=Q}{(h6ATq_t;M@h4(FB>*4Y70ZNB4FQG~-92_f+hteQktl3FBa?2Mb8n86{({uR- zj2Xl^U07O(JQ{}g6SNlokrecqZxmHi8f{UT>GOe zbBHSZ!#(ThW`1KK;|#0G6U;AW_!zHfco})!_N1Y>b1PLxO|8vd97%c%ev3zVsmWgX 
z;QOT+ot^;14=r=+gd2=AIIvnBD&gf~qNXMk+MuTg3G{1bM(4ddB)1nXJglnnn`RNS zE0=0%fs!*v{|X));wtCOh|&qAQTvBW^YieteD1%t>i{uB92|G6IC>V1Sg!97pO)!~ zd^If?4&o01gd$wHO#7GTjm!a@*!Z?^AJn3J!Voz1 zKUA)i4qeYC>`$o31ZE2O`T2!>-V^aLJDV@74c8~Cas)jz+u>TlOe`CGTE*`-Ud$eR z0mJKSl%-~7TwfRPo=8baZKEnK-i92M$a7pMWI#eN#3ANyp5t6cPED$-N4Azh>h2n$ zrZA;UoRyE7?evd76EFCoAtR*;DqXWLjJvpo;*Ed4`u)5eXbCD;-bI zhnrm_UYclR-}r66hsGVQ2MD4x`p!f@vA;EKlcLSe%{?|{d zQvBJRQ@~}>aNZ+GKk>k8Wzp3WBt|gff1w57AO4BQ6Z<+SlFuQo(QSRn5l7z-!yi~X zI5L{;%EQCy@(%+M+J{~fy3x3}ECWG2EQijgqj^<-B}F78{-F?n^$38sL+==?3Wr5$O^G1TIKOcQ=yKDXk(PAPpi4g0wVJ(kN2zn&jYel_mcLOrTT36UV3n4n8xG1ptuqnSC)yo}*~n zv(@2Bcm)7iLQz|lVFRq+ac>gFM3#L%M8A8{ zg+IS@JH7pGy+u^X>8V3UHKH(Np@!dyK zpD*oxFQc8&6G*sRO5_dtM@g2-%e->EKah)&bA69;jG(wPINfe0-u%@nx$`TTXN#Qx zF$`j_A0K-(2;Sx9j<%*zm0r3ynK=57YS15hU{R{9BoeMX^Wr5llnkQT`Tsc#p~Yj~ zJQUD*j_gnId863>Q1m~~U&%V2&z0QNBTgHC*kXkc$8e~z zFEhyqTsBR%J;E6;h${A1J(zub`^KnHB8vsS_wBUdT$k8gwd{Hgg z#);X2qklg!k>Ph#3{GU0y_dUEd9=zX9?w?8cbIBBaO4(tQN5w}dQLo)=TAMfIUixF z15*mP5*X0|0{HjK-97N@tJ^2f+~9lOChYS(f_I<~g1jF?{MSMJ6N6hPih7Y7^`*%!rwT*86HU&H4n3wdg4Z8%N4=|aNVR-lB#n%5 z2)sM|Lh$nFkZ|C6R(l=I;!)mb+0uPRrTE{i#SeUevIs{H>BfaiZSu7U5a_`+5dxPW zxLv4Q0H(S$0#=o@Y34UK2|3!U4reXA4z4f#q_pYr(+fT25;u(jyv%bVE>CJh6$Gx_$oqp13ZvV%z z=VceX&DJn}GQG5jx3Oez#5YQ*&&quZl)pY3xzKGFj-F9kdWr~s^w=a*7fw=RN5RHF zJ6=K>4g>KR!Beh&YNT=6^UoVK?-p2Td*y;Qq^|46oc^Yat7&|Tw2}pC6TayOw z!qHl5S|7aOQ#GOS?|G^=teztbj~3g|o>KUplMS{q)rr|?Ri!g$X%2i5LB`sxBR>gY zk$HzQD;QXisG~%@(2J-^K60ZE$lXoMV~)q1(dUX!JSgo z^y%tyYk4^347%uvc2;TTf-hJfjHuR`lCZtamF~U8!TJh2&L5+#zEwsC4@3XO`iS3g zmEJ#&dc*6&RI134ez)cHJtv#VXcsI3N#1mx#Bb%fl3$vv*9@xO6yP(-^)3;a2n;YkZ`hb_v+p!g$z=4LO6})fsF6#TodQMJnudH?^UlwbY;lFV z?`sdvX-1b5&RylPo={rBwuj|b9n6?^BjIR@3-8QW$eqyXX zddMWs&7N_rgji%EA#Iv=Ek0JL7*ks0@TMc_V`tWskhr(7R~>En+U$e2jFY$#8=_vc z%b{0DqEPwmwh3X(#s=Y54Z-!~zYP}oGXE{?>()SlvowE8Mr{O_nQ1!jTK%Jf&$bB| z_hc@V{^gHiRi!m#nuP>u=_rdnQXeHmT*>E^@^TRTIvkdhew1)Lud#XSnv_sdNzas1 ztma`ww$N$Ry^9_kK4mYOK2wO2x3^FG-?b)&A7Y0Dtzu_Ht!WkrxZ{KhQelbd#u9q| z3rKPQC#9ZEY|2fp$*DPR%17NGJkoJh%_uA4_{*d;?&ov2KY2Hw@VoqTc>+`ZiA|5X zF`QIU2tJ%}NduCMyPTi`AMdO5m(mK~tFcbRDd*d>ipKG>>}#n+bU%U`^tJQ95j(Ap zM18@i22AIK;>mJWDk~GQqZ^M%*gkPL+y5ZGa{J)oK(TAoY?v1A_53+Fh3mPyHyCn< zc{&xOcDm(Q)|ymJ^oU>%;k6j}36@&S73X_o1pKYo-QpNtx|~dTd5GtR-fw%(H1D;U zv%4(uV*G_$)#UOilMqq1Ro0~q^WzjtZ0IiQO>athgS<;_El?lbYz3(B+-qr8fro+q zXi4r~dVch!^wPQq{ZDdZTAKsdTIBTHC2Y*5h>23gsqVs7RbqFCgkdow0#j0nb?yD_ z)Yb?=PJA<@>8{;fPbtS$gl7-BUd`gz3I@owVeFBs4UU!?J~}Q1%Rn*)W3#Sv-gmmA z8W{LeHZ-Ny2Jq*{QN(tXZF=|lG!M1U17e4sNbA4HhU25Mul`*1eXrr){Hi|pn%rn` z&E6k8gruaT2(d#3?aH_sfR9WW4|D3*On1K7?MzkSNNz2!NK3_vsA zt6b)5SGu}`6|+{ci;v(yn@)mZl)1*IjzvTVcX2+B!r{jdE=RHMIcEiJ(MhNG**}`t z&UqY_qVB)tllJGo9X;kG!XZt7KycIOU?zGs$rN(?s<*ym*_<2k+H5=dDzm-0G8dwR zWJad2+e3{DaRX^fq)kubG9D&vwQJvc^U-v@ncHsi zv#X}5aua1f6+S)=X$b_^gVbm)`>fSzZP`uL2ZrBEV z0at=7)}3r5v_>XH|8yxK%&##!D02TKQhIPdlM2s)oa$bkg|)Q5#a(N0-(C0H)wgjG z2;EASdRi4E(;@GJT9$sdU=7==qeSaV4YXIIXq~cXIDa3?5#G(C7JGT0vC-JD>`lS2 zyik^v&{nCAz?=K=?!D1y@UEzdS$0Lld_(>)OFTFECS(+?qn`LL1I5E#4`l({&rv? 
zN&mMz%gSGr$vUkB*q7vZ$%9(-#+$bimKuev@51frl!aBuR19W*a`je72uNKzU$>Ji zzQ}6-(7AEv`t)erj$B{{z9~A=KTPj*IwSd%310;xeQ;{_9pSw~tQ5SU9P~iOXMqpN zMdG~k^ea<^3s*5BSgI)97>pl%_VsefKt(B&iEIrg?(Az=^XOpzd6MDP=%p3sYq>7F9-)?#ua@JJaful?2u%1Y zrc)cEHce@|(>ntAuZebu{Kmo;9I;_*z~5cV1+qLWy>vzD@< zi4(VB0mDeRsLnht7Q1?~?!PDNv2<3_ZpWMHvZetT2*f~$e^lb(zTrr0dcc^XKo*G> zTWs9VewvRlB(d0av#5oOhCBZ5T<+9IHmI33DROk^D}P>piB#x`5(0tq*Hh7J+bySf z6t_9ybp;-jU@y%aeQf zl8iPqN_yeFM;_11@Tr^;qg|ridyop8x>lc+#8I@*J$Eom^2g8Ouaa<_+cvg7LiRx+uR=3CRn_L61_Rq$Y- zz58s-c;B6Sq>k@oyKu0%Z^HeM?a?3?jMjS{{nc5GU4e=mp5ZIW5=MzmuQk@qYrU~X zSS>F5L{>g2{(wIi6FXt4I-Xrlo%kF~Ol3Ei9!VVHe4oHU?Et67Td6TsHilelmb7NP zrmT1Nrd$`UTm36Y*(u7O1!7#NFdNg24$JJ7MVkiRNlWYsoB+7@@$Ro$-hSc`TKEiI zfzvxmrDpd|+BgblJh4J<4pR7;M+f}FJlZRMBclWlAOgSFQX${@tCEaU54m11jTy3! zl3;pPOl>+WIgbg^re`^WMui#lj;(Y4P~Yl8gs4dImeDc(Oux=o=Oq0>$Kq|yc!9mS z<*7ZBF>l`nl zvqeq#|K(Nu3~f>+`%E=oEwEDA`)gV>4R!Ynbt6k94C8Sb-*=Uux8Be^KdV|tlD*pv zj8$Q`Nt|$jO`^``b>~dM8yQ5K*CBm?64|{MmYqD-H=3-%lWv?D8@>y&60vESd$Fvu za-(XOzk8HHH;AheLz9BU7EI24P@Iu>Ps_liL<@6#e~o!J=>lUS#w2F8J)y}QWLnDf z=}Mc9I#ik}4O%6|j(7-4;)q)Lv!Owk@BUd_wqKDe{wu9T*U8){#S5BkPnr#%+HVe8Kz=RPFe9c$;5ecf+Kv z+L#CTK}F#AiT^5#3G;a`3z=*l>mViTr<$B|1HMqO+#={9M_5wRtiVhOAnhiwQ)Xum zr3rwEz3hu)u3y%);h)q8eZy@bUfN1-H#?lzMMSv|60x=!PJquwP|HkS53P4^1iMk z5SF=s@q&BUbNFBw8-uV97b45Nz@i_Z$9P)dbpUw*rW6`IqL_2CC-xoWfx{&{!FF$c zeeY+NP!`mY2D?^19W2awa>|QX8RB@?pf=3`=QmJ=!0h?f0>46}btaIc|Ib_UH~dvR z7ydq3VuNom^Y(b6RV2-^Y0c8_pA+$CeQ=Vr=HisRMl0}U!JY#!4G^QE!a|T? zFg)@Bjtda5L2v{F3^syb4r#N_FTw|@e`w1IbK{M;?F4PZ#!i3So-fwhIWdjI!04=E z!9`crgEtIVWH;})7y@t~`R5*C2fQy3aAW7-a0Nfm>TjC$4u^^J&n>7|5-lKadOUou z#!)-a8=W#J!*m6YnApK=(xo}s8pdW-d89pI~>rJ>KVvq zgp}ypHEyweisPYTjt?PbLg@OD`1k+`fdAn~02Wcu>(Dom5rNO}+yj$+0;C(UP|pt4 zy2r1pW2$1$6Ua8%R9Lv}d-x}2z=uJ^g$ies{~>g5xW7BTer#};sBC_ap4daydDEiH zCC+^n^TVrJt9q-f&kY+Uzp6uk{Qu(lqO^Pn{b*llljH%>4B%p4sY zIJL%6a6W8WQBy!;d;!Xxco$RSDoYdWk+u5<4Nxwr;M zm*0cwD1@^814e-?`0KEfO@NFG=FTXdLlC!Wp5O7WZ?KSH-pJ@{=biWG_!w@K9ZhMd zwxU_zM;R{sH?^b^wDB>onFMLb@!n6SEmul@uY=-{EiNdsvjwBRPl(l6a^qKz ze1A&VfR}0l$0+&j)ImEGKgNPjEPI6gA9w(8P{sJ23guHFk81EYVzUt0rVws?togtwEcx=iT-b>ig!Pa5$>3)(U z2<(9R#hs=B`Y-3k>Bwdi2+V2tkB)n*1HUKs=*VQmvOHh7*K3C)c_^vZ#*|@j+kG`e z#U=?i7N!uAHudjedFkT9X##?oyhqt15yHH zqWqj3P--m0$@gTs0fpX=3Sa!6Lh50&ixIdbrrSHe5~U$59wKxN<@4IJs3X37d?q#2?Uu&t2JOiA%e7BP^G{=u&vw_6$Mz*kWvgxaha)w`1{lO&qRTF@B zz&AbwB7^g*x28ydiEj$pC@CrRz>%KkxCEwCuS((H|AaWQ!jNF*MApzXiHq$;vt6{_ zx_X#axmmN<{;+NiZ~yN;ltI0jgJU|r%*okk$(V>*3|1s;LD#D(I_^A~Y`k%+PsH=s zi3KF0H_l*h0%>OlIOe#wFtB(4p#g$;J-}j+$^Tdnc18lihXRCbNoqh^6#zC3<9|&t^t|~LrDNrz;&n%j!MY(MYaO}2JSSN*PTGE#+p>#i6vw6m)ESJ806HV*>Qwy zFHBhoJCeh8yh@$UI+x}~=qI0-)hk%v{jaa-MTO0Up}?IWcXcRw8%X}@@7+tSf5Ql! ztsD4CWCjNa{Dg?(8RAmsbI)$w3M&^~pOgdLO6cwPBy!>|pIQtBQgXC`@ z*g{RyA@)n7q+$OLe?FOs-;}>qWpa#=AF;9X=h*MidFRkarTh?baS`^cg;Gk86oNjt-2D@*RINR7CT85i(&VxsSahqT|e;`7D! 
zXs~uwj!4SkKIR%`Aa!zjx`PPaAYa?uXo|8lI8`=Kmy&cRsIk^xvXiMV{T5NyL&?R7 z(kqQ?AZ9V`rKW@!ERL@{vQ(b@><632k%Lt%R#oqNzoY?KH40OD5ppg^ib1q>@!yT3 zGQerL*)0%%dXol5#<7CwYV2V6VOU{^75EftjIGC9Wz*7^o9gz!)W{=R=hIHws#~k{!<-tyY7r(^>={M zc$0C}Je-c%+4L0_x{*VV=iVPcjO^f-FJ7LWyHG6vdl$g7=T!{O?b>9!oj3t_1DtgA zN-}nYkTd`_P7Lkv+GO{+=Zjn+KhfR$7(4JF-&6_%TY4X$A!cc+7#~MYM(xxgefJmT z!4nmos6{IL=gaH!pI|m|`s-Uag!LuyWZ=ZFc;^AZvwFhiPL?Jl)&1RR90e+HM*rI; zQ)Do6>htbe%zB=(rh7WG%{=S5vvmqx96hw@N);P@RwejCdx3{adNEe;KMu^#=UUVB zD_DUGIY^i!;M42Ti9p@g{yvR<%e0ICQ?_`T`0R-$TeZ}0fGRRv@((Ds3T26aFK z2@d{%8e!zYO$Rsm4CFDD=*>t(~xe)oDxPKX_O(QQ= zllc|P%mG~-mXaiRO`x3Ibh#Kx(|~IPOmpDlfOGWH%8G&aOSZzeU}nu6ag_NqJiQ<_ z^}wUqBH}C7?&Ry&{g8IK=``7S%u+NqEfphgxPO` z?gO@Cl#=AeErMWpkKtz;R87EH#EQ0rF4p4h?R^Y|1kh$dB@MhYP)35wo`av?1sp2w zWket^SOW$nb_k1v#93JSt^G0abmEBkrQZt;y_S9MExUj9R>F;BIbth<>EYI$z(t2CgqfCmMKwR;XzLAiM9(bt98Kd(xD)TA#U1$f`PIBdL4Z!B}a zfd~}CZutXxlyjh}j}^EpN7zrplNS*h3g)*JT;UI)Scd07ps)CyJZW284{9tni`-tWMiACDe#>WyTFB25E%EYSk7V7@3P#$>)!w^|@8 zE2|B3A&`&y55DzqwKqvHKEjTkMGwTn;V1@meoV|TxMf8-BzM7M27Dv%Yl6l)zUhP5 z$?h=Lss~>1YWt0}#*an*mx;zQW0`yW#z;IQ4aepi{kP2(?H1uB0Pu& zT07f)i1*&Xy8-XzB^cI&TV1Mb7Vpb5On2s>6>$(m#@1Wbj>>*^5;3MB?a?WYSjjA~ zI$3TWOZdQAv@|%Fa{F_~li@Qj65ow=1E%+dIV1@0+(b{+tNdpSW%?DMCHj&leZSyZ z{W#)(^eCMyp|z^okLI9%&>HvI`E((@3N#!t)p3!2H)s}Xj2|`6%;8Ad4ELyGTVlnk zyHUFxy)$a09&6rW1y;Z+)QR)Qy2Ec6b#MY*(K}^tf8AdZe*=mkhJG*|pmO?~g!~B) z4!knI;mL)XtoVgOo;AP71ynL53g$JU*8=)1%~K!60@a>G?i|&!@A)%lYN~PDKK&`e z7e}>onJ>>9=zoh|5}O5Wy4tj+ z{?cFNZehpC)azRfSbZ={l%iRY;#Qpn3#>N(^S9sMBP)-*BEXhT9}3NJ z^2|6E@st+~EmnC%+Q}uudKz3L;6`Q93Eu6x~hyNHJN!hY$8<7-T`O z!nZFMT?jT{lFnnrP@S{;YF)T7rv9jWy{R z18_id1#SCGYbHl!&Ohr(t;gkuAI8Io8YUc${9_|oHqu_W!>hs5uBPH9UtR1m!mY!D zxcL!_i(m-{pW0HDpUkibR&T&eO2icce<{$Vkfz8W~0aIJpfKYP=(3wCNuXf2rE1-g|b3Um{NMDsLWDJy) zPo+dWS>~RZ>nv)E^4vi}BLoaqHq`D6pf*NXZI87^wkhR3{Z`nU%yAFKX*Y2)X`aF2 zOL+VuQQ+d_bO7(U_#*I*1w*L8t8}oy1V%V;Z@Pac5?BzfZo}Zod zg)tdygQGkJ50^SmMF`~+1HK!r?iOWBXSLsqN#nL_q$_lA$NNKBbW7R?7r-_Ov_caY zasY+C4WcwcU7{=@Ie_21sT|0pKL2#cB*r#9V310VxdgHdm}0Q^h8+KW&|o(R8IGpu zy)G=3_mK4*^%)7aiK+_PNTm<%hBM~8EKpGNVgs93MzCkrp=Xa@fd;?z*^&Z0*HWCc z5?fF>z_%#lUha3_0Pm84_DAbLSp>Q@?rSLKL7K?R#ug&pXTEHa;f)Rk? 
z3xXz~PrzdgRUV?dwx=%p)$)HoGZx;vTm;~yYn)YU`fH}nlu7PxVhxAMWvyco3miD# zj*gCxhw_cn_h{fAVYj!hFYv=aEPAp1eAODxD|jGi!olY$9xa#|`p-0tK}D!izkdA+ zzBtJdpb-JZnhcmTfqY@a;W_MIP(B^yUtjb?q~@Hcq?1e#EN4smhQ z8siPI!GUg|O9S^e?tQb*!X)UAF8U$M6}+v0y6Gh)3)WV$*B8PP64Q1<6BPz!aL_@k zJTgwr$i-6Ew)X>u+uH|hrf=+OEkxc}RO9EzDY5Pc1vT93D#^xcdC094sFv{4CUaKS z{CxdsnYty|T%#;96J)PU8OXWl_ZbSEx?F|$&c7Uwn}-UJ#bD)XuRb&olu)=Zt}2Uq z%&aeG7WiurIP@qFZ(7>3Hkl8#q!C{!-?;atM&4IC0s({@NLHP}x{h9)g%Z7Wp3}6Z z&nQNV-h{={+vamt?D8K;Zbe7kG-{K$|#)qGBLErhJ8O8Z`@A$U!XR6EmmO=(8ZZYI)BEpQ=OuT zQ-=}qC}FCXjH3?^yMUJoDl5)CI~iGVOL>IL{{s>o79fIMW)3mEW6v z#G-pstC5sI|3WJe(N{|2u?s8`ce&}FolI=TCN4Rj^v<)1G{FBM0T%@T9s26>@N(Jo z47ex?wXA)WnNO{^xbXkQjbo^}5WA#m@r#-+faIK)aw;WAImmN0q(hkuZyBq{t#9Jq zFaZeE&*Yq%-1v# zwfCwOxv-CH@;n_+~C-#(|{KwVDS#IO$$HpC5GCTsSkP4~C^igmWBC=nZo@noQvmr7DB{C~6=Ky`B6P zQzS|rK)*idw^G#|7oZ z`yTI!9@^A0>Sb}v4j#n*=OgfYvH<0&o`g*hsv{r(GXi9M`W`80@q3TgK3dw|e!qZr01|91j0GN@QT**#hc(7=1P% zy9KaAy6m1_RvK~xe`@>0uBHxZRfszPFDn1tafCWDPAW+wREc%^2b=4=H?v=I``p8J zcbr#V*$>q()oI|ukM;_T3VR0uP&qjaW4OPe-Gfex7~oY1%1B7K(f1clYE)Yar3>iL z@sSaTR7CM;#8c0c|AvVOHeUVbs?dbXV>B6Z?tT;ItPX-i*>V~J2=V0`aDt#T8`ovd zlHjEZZxd|UkDr?+vRd=la zuH*UqV%?WD-DXd{MnXw&bic4HQ-Q-b_%e@Gb3qOnsGVWV4%3layhGyls(2eSZYLq*9|dXoXgxg%x|c!0t(~u=qJ>Ls36iF#XS^IVG*4 z(SeF2qcK<5XXNGg17Kk!j$qr@BLd4A)SrUn3~B(dD}Xu)1~BY{aVz54Ba@(VhHFEa zBm?AGFx8ofWE47~t8n1AdnH0m_<}uYf7&cu!ay@ytia;D1g(H!g-fpF$2JrdC^w51 zizoG_kxTsz$bL5rY(h0n%?rrRQCokeu0!h`GBt9~`Nt?blPBe8PWE(nyn}EKydrDl+2njZ5Iub}t=_|s`fjbju6PoEmp`7F?r@zbbN5=70X|p+Rznc!nROa1e$Q~Uc&C; z(o#^U8x(!R#KsPGHr?{38(&nq^QeNVI@xP8YNrGV{KKnD>fQ3%Y5E^||2mmX<$A+0 z1;LE>IlvZhTqn^L_Xtv9T0@HlJNna(ORqf6EIEnLRK5Mm%2%|!SMRf}MA?#X58B+; zY*s$C1P?|EV8JDQWTAZw%SS{f5($328k}&}|+5 zvjM!+Mgki%r5H$wGmXEp&QSdug*k*BwuT^gUJ(c9NSJ2?T%BzI_M;8eJcL>b_CaM1 zg=hz~-Jp)l1v>$wr?-0-I5OS6jK0JcR-V0@L@H7V8$WW*_~NWeWEiTQ(sqs(oN85I zqu5jhDnAratGLyMW^bRq zSjHG{+AO=%#-}PF%J3+sK$;AGWiK)4JP*F9GBv{p3yMt`EWoN47c)q>qr5r5=^Xw^ zFmoWgvC@vObq}w?>O*3op3kZ_^#wm@GgM+K&&NOBDPX43<@651(0tp98~nA=Q3p;+ z&@4k8pTa04H40@U#NjMKvI-_Ca$+iDTYy*JPgrBs-~rQ}6dkQD?mPMabj6%0WbK-E zhUI&7uv3W3GZSn0ptMIR3?d_^VC#jBO?zBq^a$o3u)AkLkRDi(r}|cKN%3-&>&NG7 zRD3UutIE&xu!&mxk&o4GOe+}zv+FL*A;F&gN8_a@|A|_l`hI$8VRD`nD76asKO|${q-Snle58{rL9+BQnK| zc#nL7IQ@kFP>FtNgN$n)#_e7-F9FcZ%bnv7(lZrhe#secyCrJaUp+CJ`{c~_M*3i# zp~JtkTYR_WB_x7Rk6v=$-xtH;zG-JmpsXk3b=YC9aD=Xb+weq9@30ozfIH6mn*|I~ zDe<=}RzB2y8`_^PH1Ar(z*xJPrD;5?{ri^u4Yb0bk+hR3hS-Z~dOEXd+ zpOB)@;HJ7qzwp$jNJ~nphfxRMp1_k?he~GMXTepp_Wrxy(8KG}mwuL;H3**P0mfMB zOX-U!1!#3k-sVZZW)$rF_w(A9+jRd2CO1tmBZR&(NPq!O0ZbU6kDam;I*>xma8dJQ zWO{NQ+kPMDx|!8l{Y!I%=v!OZQs74Iq!GZCPj-w>Vf5;f6L95Q#X>U&hB7Z;TnL;K z)I+@f3`OE;{g0m}0LB(E_h#DpFL!u=SW4RM=s=VHdE+LBVm@g>>X8jj=2RH$7l0)% zfu9KYgird^qbb>}BJiu;e1v1}hZc>) z%!dPKOqx`Ja&MG1Mf;xx@V~kS8%TK7RVCD5OyJHFuB+)n~N zztIRzR;=2a#ozPaA`+Sz?jASo-@dp&Tf?>a0(BRvDTMJXNR!3ER~ptt{qY6R9l$U} zxAsTKlF2ug!^rxb;iv2@yO>3!!)rSeJpUvN-s42U{Vd61RSp#FL(v%kdxa|jvuxce zW8VHFl(iPTQsB}eI7;qaBmRl*!HWv&YB+WAC3%y1FjKyfNNCBv47UPnfwL0~if7=bK^`DP!F=-~QGjv@nrInFK~a+8=JUFzJ=f6e@`Y8hoLGo1QL|50 zV-ncWd(&qkj+k$V3fkl(yTP0UUhDs$Tn8sr*YVOg#Y?zl>B-6Npqx!q3aq}N@6zno z-sJq{?i-tXuM(4_!K)xJzfe7Z{Bgbex4ip@f7{D3<2u z=G5?<4%-x7mmAm2Kw0*hNI$IaH-<1A{y=%W6F@*qsK!d!&`9{aH1n;icV+oL8*ZvU zrhI|(j7_~ONRS~r5o$zz?zEQ2QV?WWVF*OI8{tfWdCo;#Ws zwr@XY%OQxP~cT6Zz5{1JreYL&ErK~71hY67fj$h!I8W$@sl3tq^P1zlUe zqvT-ntA0hR?0e2!L^`;7(QS7btGjaJCjkQi>;R&`cOq~7+S?0$1#qUtz=x<11j`MT zSbp$OC&2_&M>)Q!aeahOy~XPsamAaGM8kFic0Js8{@X6~I%YoFo3%b6=K4c&^ah9| zP}r?r_uz2eFM?MXrfvY_GLB+UQUl@C*Ti5XGTGxFMV3}%DQQ2g37@Ev>b_Z4;wg5j zXC!Y~!*iHWfxtsoxq*KjN?^{E3c$jx{-RxXKjDg=_0k==&+0OTNlT>0N(y@ILZ9$* 
zT$x++u<}N*@>H0w;p>47nGcqC;P}ujfKrPn=@|kJ1bh}aqAh@&;FN3@&SGsDyO#gQ zVC24n$q8wbd9fk>48!H;C_oB=dkL%llamO~{15uC1Ng8X)nXnXcd#0$=zkAc&M_Zt&NcTy-WDFFAwUDDC zDw^J)NZ0}MGzet^a2b`C#T?ZsTSp&VxcG0&=UxxvUA7q7Z%q&Ae{7bvbEkJWzBAwZJ{i$dV4v*8y@DRDAWop zS-`~%P6Wuj(RmftmV}4Vi`}R5-=B%{3d=eT(5ZB}2H33LPEBv`fa0z0LHk9|dDsB9 z?r-Lp|MV+7P_wjfqFLCffUyav3k4nB3zSc7kt%8wS-gRpDx@YW5oIXoH%BfuHAtJK zY5Hc7Z25Sclg5OPELH{gKZHGn&0k zNmoFLK8*blz^5Q}v`DU{nP|)sOHD!-gdh ze{_2K1Wc=eAqFX*1QCaY=(YQwFqi+8u| zktui%jc&cHr&o-)75Ci(uATkr+>msJ9xZZN9iDKO-|KQnmA?%lCQl-#K@ zeu$>HSt%1e@ zAZAb7ga&2v=lAS0hy72K#?_GZJF~qUJv<+Cxt`n%w)#fGm5m$BoTiZt-z4ST(MjM< zwL`}UI5Hb;2$H}7l-JTTD&dCUZ=eH_D3^B1tcW*?9fN)ZP)2bbpmb5WZ_D=ZDG!_dcoVuY#tWipB)ST<^E*0r}Rt zFUtHY5i|7;s@2?q5?jV~st)=HDA$8FeBxJ7Zfb|C>R%NXu@=i_2EJF6<+Uo!9gy4A zGi$QE5_&pao-G-FFB<0_8P{zTQGCl7+#}%#J713BhyS+@up|J6bt!rUKEu(7LqvD1 zQi|VZmREuqv-RD8-<)^zyRyI*sfrq>@MVLM*gF01-!)}JSh7KK(-Ee=Fuw*X9y6eH z!~6iMF-7!G&>_OdLoOMzNgh3qK$#_|U#Mj{JE?N>sKVvrfRmh1R${JwWc*iE?G>i7 z5G!O2^?R$%ILB}=s*_?flxyh_$a7{z$Kf6hs5@xm--=i zGeB}Hqeq>eLhA5{;vL^0oBdz6I-+5@E2doV=>?fTYK z{R+IeDi9zg$a%&HdjlBG!-df(NdZc&>lVKA!fBDH$NSx_ zlCtc}MQ(HrF(oo;r0G8FsL*ESL0<%%IEMZt6PT|6uhrv!g;0SSFw5~vY&^jYE*Wj-AicCz>LC*;d~qhw?;EYoI;UjauY zd7NQhrIsEtsO5(?kFN9E&jS-Gr|1we&xcRvJroqc)-2RAH9n9+uU;2ZPq&^&`4zMN zs$Pj23CSu3rN0e^A2)s^mM)qeW^uT#K5nAryS)XWpHKfw9!?G=Ns=2n%YHcL`uq1n z=;~_DU13L>YlO`^>mv0J9^a&PDf`Rlqkp^wu3Gt$%-L!RWZK=3y+DP^K{Q`7`-kHn zc0$li{eqUYj)ulUbud--jr#NPY#tFEnT?8QWztQLPhCHG4f-YDy{ut)-a1Mdc=+vd z0aO0(uc8F7*gpV+2r<_w-4gxV5|%*j`0fJ!JoxMZ1r5Ih_&?1o>Cmo!>=lhYwM#Z{ z&&hg{D>i?%WX>8*Leo{_afwdDlA4vZ`Ai%(U0(HMW?YRnw$O4z?%4YA%{C54f+s1- zc-*fdZVL8ZGg1Et&WuS^iOA8O9rKw~Zj741CyyCF(BM`RcbH55GMVcCAV$siH;aEPv}14{6q!-~kM+>Aht3 zVfTuj1Z$dzZnxG^itk*ecSq*_j)|e_(A;vLwtB`Xy0aO^rDeo&$Lo)8TJ=1nSFsJFey@A1n9#dmL04V!YFi z#);M|Y>S&HA}tES3tJul7Y8tyJ?Uh2QiU9CNLFzNC&Owq>@;P5I-yvH0oziJs=r>0 z(FW4;&;0h^*iTm3xR6!*VN8&Zta(U+M+?8$!CkO`>s|t7L}1U23*jM#EUKE=m<7EX zG@)}(LTP)ReHoQ~`z4RNME*YVS&T&of-_VEnmvH-U{?T3K&esNSc!OB8t^6e9Zco` z>m>+%Wa;UW>}owHrN8NHei(ncnUToMs;-2ca{lhYI^HM&&dZ2Y;_pgLbBqF3ouI3L z>ECAo!ra-y2^YaW)Vv1XDFB37hJs!e6lci1RK2y*@ZX(~o63d1T2yP?F5AY1?3>`T z>JEtv6+INi+bvOKEzkvlxmVDEjaj-i_>?t3&jc_Tbft=Fm|~yqFE^ZY$2L+w%BS6Z z9K*p{GueEuI{e6XKH@bm)q6_r2`oC6CKwK*SYMmtj=j;8R55F?S%7Q^2wHo~~5JuYB#NnUb>t{x>H#HyvjV&=)AB$B$~;PY2i%=HSusaGu+hG1GmXdOYT+*iG0-wRKCg8ymx2vfh$E zSK2m9V*22xXU}eMAYNq|ymffNUja!6mpq{U0Bs&dpL0-o!M=KlGL?g_MUvUCRfqh$ zT<_!h<9`RWn`K}60s=$?qRX^R%XFUYPT^DI$Q!i=pnYyYvaII${+TnCDRRnu%P7Oa9H$LvtoIMna8efWg zLWaq&%l4H)yY>gv&;1E9|9%13@_&{cVUBq|4sguEhHKvH1vANa=OXeWt3;WmHQ!_| zlij-9->5UxnC&!aZF+f7k~DcV!<=yIeujY%|DD~=_n{ffMp1H;x1IKEgKL- z`Fjp>HZ23r>_2$I>KW+Te$yml+bzBYZ<3VY>Hc69ei`y#6k@h4gxjzp zHC{Uu#z)TpC5Aj|$f4j=*9Rk3DC9sg!%dEvc8Yh+Ww9_-)LA2Ut!!sD(H!BtcK%km zasK>e*9$h*2LJBXu@ZuRCXV$)teb+pCWnafKBrXFlnf+Ba8N>y(UWF#Pss;RvqMI% z=6F{$M$%KU??VyDNOw<#LEZ35$A_YMC<`cD!1_rx)_g$Kd!IlEPpQSd?_KVpp1IykStA_jprkX%Ve> zf~;)(8!!F)s#`QHDy6p7Y=8x?ICu@$8EA?L%|5>-DzAC0Ys9ygVQP&meL`!%b zT=1fs=3X+m>pAlO)cJN^dm<-!1xBT7|mLP%8H_;QEp4KJ2-oiGUp@` zxu#{*2@DOI^kwwY>cURqHUg|dpGbJSc2yM%!OjfI`g{JB%Oob;>jpgRa_!=0|CqyTb?TEqd2(ngNzBEt6^V%)SQDE^Grw8W=kbNbqL%0-81u z$tYzL`}i}4?VHVsutUsiN;4>V298okC3Cg2N2q9MaOLa%*_g5qPyKStY)*Vc^Z0%% z`t;+4a$9kFQnE|8h;t|FKHLm!~DH6#LL4CM^6MoD5-R0^WEqk_E`<mF^_HTlk-2& z=r+&INqKx<3TB3c zEpR+*#WNN5|9><2I=k+k&y0^E-4YE zyF)-ikQR`Tjx%|`ea;U!48|VIjk2Eg#C^{>uWD%#eWX_?`Ni)&CicTYeG}X}Z>=v7 zm`J0WKU*%Wdw}SkHt=>`?TZA;)f4}e-)E>?%}i@{{AMtKb=lTwnk$>y;_d|OWx2i} z-Q5y6I`EP5Iyo)at7o?t+-}BO8%|(f2g4nBPHuEXQxM#F1!EZ)$Jx$)9@ua~Lq~@n zxwnVnb4jQvju6>k6;u5qAe`?0Z2dJZIkHx#-JvxLd1;T{^EiE`D?`<}VN1?jw0m2O 
zSCW1RdN&BtO#b+>yQ<{7VTBfZ6?>iKge5GsZ~{0eYS%?Tia2$Bb=(5|(AvrhHx?lk zl?Y_V(@F*p2lNHAGsfM_VWC)qme{h^9cWWfG=LZi+4h?a?F0xokzJdBhDQ0ps68ckSKvv&XdCKX4OsHz zf~3x}`c`gimto@oRNtn08aVQB1L(VNP*FvWe>y5Z zIc4AopxrCJ6!uf?)JeOFFS2$GeB&(4(%t6X?0S}K+KDn3xRH#G=7tl1$+O7RaPPSTEp-D!0miS?bdEn2i~j>FrmY{Ra_p6EjsSC^H2BS zy-WL;miawCHcczMw)(f%7U>tUrjsVW+Ew#7SZh3PO`xY$yoG51Sc{HY^R<8|6Uy0B zU=H5MZaq);p-_c)geAD_XJHT?oRCD2riS5d0+<4;H4K0Gp^*WBr5>BfNahU`2^cOU z#)7#P7bt3gC=44x_UVdqVuEk-z(K$-%#V6oF8PeR7BpU?gt&~DuWNP|T(t9}?YGaU zXEp+6LxbC)&mKw@sD>NlMq~h81gwqVbPA!(?d4z3HlYEyi;^{-W3ebFD@z823A9o0 zbi&jGC}yVsu)s-@gbz_<$mU!(D+0cvKokoO2?5wqDq+Pu!^;JZD&}N@_T%Rcqeqg@ z$Vu8^4|zWq{76thsBf0oDyq)>$sYnN8v-?2?F39($jFk;v0op~ie-owG=x)ykp-^n zD~93JNii{EO>1mJpJ6WReyVcB`p-I>nLuKl8_ZF2-x32EYv-qS@dOhi85uD?i&9U)84M>Calo3!~GbLD=a({d&NwIHbq<0?2D%KW_)L zB|HQP3FF5a>*iqocN1CMG^gLOfN3Yx%`<=%)czcG=t=S%4_$%eSn3hCz;-r=qe0^S z0ftnEPRN!24Qbw5y+|;AQ#K|(-n3Fn-*fM2WFe`Bs7BxL@AyP97UNaPI zzjYnG_&Ixz6%4_lECeAr0g8Am1CUW<37%HwOuNehwds$5)u$- zGebuK)E}!TSCAIsf?qi_zaWWApWT+mX#U%J2awX6UW!3B+bUq={Uf!7SX<=!m31eJ z=&@+m&U#nI*=(Y!=@>ahN}ThizS3(q#l|=n(-jOvT_6vwfQZOt*!}e3t$&<}5lr6M z%)M0??S#q}kCLIdbZA;-|4**@c6*ynnaTbglBw2ui@dX6&tewm*wa=hb)VGsCo|-whJ*z!E;K3K zQfSi#44;==QB#w+Q|tAP#HS`J*D4C*~wE^V&d`7pyBp(B}TA+kOM4yF>`jZYW1xTJ?kcF#Q zcyBK!BfAb&90EO|UU@qED#5Mc|8g{75))p=W~QBhHwy0_B`x_ZFwV*!7A>NAg2d7n z3;Z7Mk4`H?P2V#cmJVKSarV``2$Vz**Xvy3${)SA z#Lo%s8tkPvsk7sS_-$NrldcnYT#v=FhuXOMVN{cmoGb{T7U^pnAMJ{kAxfb{(+bv+ z*Xnui{v8}VOj2;Ve#MA$+kereDNX6V9!^R0dSMW|ag-+Pz+*xbhvh@Bj8h0wCJwaI z^yLK2b@wx}?T6|=`zWSHFKMBfa@A8=+`AdjtE!HG(|q^IH_{d;tdt54RuT=&NTu`C z;s=^7B&4gQ6HMXFVd+lLR(AOU)L{ z;Y+1wX1=;*ChY-H))uMf2}dhoGCATlB$HbmAG+Afm7P5I3>d=gvhaSAKZ>)b1Y%6h zO;@%)k|Vd&RaG)=eUo($jOmRUlIZ1L#S06W+^_|}5`hTgwBU6NEpPk`ZdmY>;t6-q z2!fw_qbi11NskGXf_?q{H9Doe@?uLbLWVzUk(YY^x!YDFz<&=RK&54iN2siwuk2)V z{^@t`J*lU5^koaE$-xy6oQC`4BojbFn620;C}6d5!8i|QQ8Kc!;O~449p1CG+aSUR z-C^dR4>Gas_CI0000kZmY;l{#$vEF2V-xiZZA3A{8ZJ^67K>-J?qRAuji16?$7~$Rcg@YgZjZPLQlQL~1(DlMs z-?lGn_!5NBLmtOGG^bE2y|iMJFP``Way*^VSR1-PNUXbzt3qFeCH@rL9*QP5;qkgj ztH{5eFmnoFNYfW}rz4sDaWtswAjdPhvEna0TXm{7DloSmreIj{HiRJ(@1hb}BUGBt zU3Ld*aterR1mP&)hkz??6l4He%J-@@5L)Enq7f(}XrTF<7JF8w+yc!MTx&Q?Ac^gy zOr2&AfvP=z#o^ItvO>PUKN9p0kH}y9!ls}Md9>Em)8L@4=R!wY_$GYT=+yxIyNCbU ztZfGp2Q&Ujb);I)`fEE#Hy#y{ROn!zcluG-rTrDBk7kesUU~JjHKb#Kg8sJ~HLX<2 zcT%k2#|xg%?>>!QPE}e9z+oqdg0D>reojz!QGa4$Epg=5Yo=eS?ayN;$L5g zZZGGY98pD}3$7#IDs-=6ul}F56m~dhPoQkFgrSC$SrvmU6w%OZ!o8L`ouXO14FVfG zJ1(!gUTAropr={|JYL^pyLK(zqs+h9p=O;!W8kb)4s7hT*Vn7dJiDwM24{ZoJ>A8?aQo9@5h7$mM;aBPaJ6`Ey89^ zecm?2dCOk+S}7Rbi)C#8);Y3BQ?GQZL@z`aWZkeB>y&D0HjVW3d@bf#XWGcS{X*w5 zKQB*YzndOd_}8Elg2YBuRt(mmU_9`RbLvR5WgvRW4Wkzx9<^;*)NlyipbdmYej|1wm5lg|)1qZ!tT8|=}-dPgr-lM3?8?Ei2%e}>XvE> z-jA8r*3t^agJ7HI(O7>aFoFUD(mxkV^v72QVY9PLGiJ`7SMRr`GI()uC$PLDqe7)r&;nScmtnQ7)Q|VM zhpi_^e%wi6%MSK4um$OVwkGpZS`b^)#Waot>XA1zrr;mb-7Ryr85lD@t{x>l2ErFL zWu*s5^MA@I)Q#sih84szy*I7-J{MhHYszSoX%!?sFsSNJ;Y=4T_AqgxLLCc}chLOt*_nK?-O{6GgiA%Iq>E;*5H0 zw)eM9G7k@Pbn);dpgDTGK1@4P8aH45HlS+7ny2`vt+4SQ*Vlh#_9M47LO8ou zB(m380vsJFw=|QxqN}doHm7EwzsVMl&n``mXS0bVkzj^92|Ox*uH&@B9c3H0ua$QO zhqVLd_#mgPgE?HH!D6v() z$>o-#o!C8xnacTlj7(Z)N>1MAq81-tELazS znE9vk)Ju3s3wT*Ct&0Tw0GtJaK7>u(czCDqTDd?Y0y*LL85kf>Wb{omz65BXfju<^ z*j>Dcw*d?~#qq#%L{mj2Q>_R}AYM>C00j-&{9J2!4VC9^S5(b!V07+$jjmYygsWet zyR-Hlju0KsC|T~fiN}S1cBQgV%d~>ErUfCkjUc%{q+*Ezngfp*D}{9h`}w@b?!TS6 z6zK+-V}c?(nYc&JvhW)z*mhV04fN*r{1rr^1d>IVvbDFjg8~#XJlW{`%fIaR!YX}m zaIj&x3que>5|TQTo>-X8MJ(56XJ^AUK}1XpI>+QypSs;9q)*t%Jv}@Nl@YI}o09z` z-`{iHQ7xVm-QiLu+n%pOCwO8vwhg%x&|*0?=sl`6ChfH5yGhHIy~tNcGOE-ni$8%F 
zqMNqoX5`IQ{sra^i7**v1S6%RY>#yMKxj3=yZ*7qk+tbJH@L{8%h2_jo0}N7*i0z4O^Hb?q4sovtK}1STBaCzz|oUcs=0f=Ld1= z_%v zfEQaRmVp=ML~yiDmeeA3fd9}3LZ<^4ZVHrLGu!2VlE!c$e1atTWCBPfKx3eN0y8i%!BQCh)V(He~?H)Lre_U=@^DVaVaNx1qGl9I&f*aOz`<2_U~Sw z)X_ynm^tAc16ScD4JoCNe1UCa#~Hq={&dzAx2=fHJu6h!fZ%8#1&?^O*n-QB4>ng? zdu&V$QLr zNIO>?SgJMGuN-Q|uq2rI_-=+wAXh(b?yiIC6UM7RB^G(`U>7E9G~jvz__$$C&P>nS zy4)D|#ULx^cR)N!tOsNa_WvMRXq!AUPucsuHCx0CeCmu78SIAlj*ojyBRe%GW|1`Ni*4AGnXPilU<%+Ua`EVUxY{LA?8 zK{AIA+lbT{E@&Wc%h-%>Jg=Gn55Ef94!LSjp%N}oC?Dc%n z;0<~s6dd*1Qnx`y_$}JjVO58<0eApEDW*AReO~_LPYlXc^BsNYI2ZYq@KDm%VD@vv z$h_Ip-~iviSkm47449fgwR#9Q)eM$F&rgfR@Te%rRxEJ#E@FBBc~OVbFpq`7(T1t` ziZ6g)b6`jhi%~M;f{e2Ne4FB5c_%P4FsBk`o^r8lzOye=>W98YERtaS? zh0v|8?beKz;is3KvH-O%Tdk9-Jt_$?qJ^Q0S_b)&g^H> zodDapaoK{(6*Lv==U8&rHy5zcpCo&lwmEkFTueg7s$8v0G81`4t-{&j-~FZGDBg=gHP9hkqs7LQbxgLAO3+AO<2KmqUf9|+lb_14R zS}mUJy8m{dEmit)9KrZ_mdlxMp+N{M!Ply2eH6lOzLT<@zoQQXlLQ5fhOM$Di!CJW z1n`sa;~wnqXFn!e_l$QqrIuc+iSx?beS_ie>|c<~VYAiBYipWX-CnaK@~;_}E;6jm zuU2r$iNEEdUwDUXc0Xt|D30y2(h!sKB*dcMzCjNsqzyt4lUzNoF5C*nra_Klnh&su zkGlHZ?b8=F_Zl_h4zR6Em#ML{dbv&8?jc@ek(`?(MZB?ecMLj88i6UvY*elLuWs=J ztL4-MYtDHZ(#aM-pXg2+aO0olVH3Jhgx{qdht-YzI|ePN7ufqoW3+6>8!Ov~j+JCC z#PZ_$qetE)m1I02K}Vx4K;HN_nlzR`Mwq5sCzgm}$3aB&4XgsH*v-w%MxUU=L<71_ z$Z>-xB?U~uj=b?yv7ph1`6b0lOPf)H@#v%4GlxftXYS&PDzR&{5+~6w73Jr-IozI2 zl-?&mS#mLMPQSZ4{kVjK{9*6a6}iCuJH3W>$QJ!08&1+W+>DGtos5xta3}fUfl@Ml zG$e^Lut_uSd2)Wj3dLk^rwHTWh}a80EF=}2-Ar`8M($50`R<37WonA}`}ttQ3x z#}Y~Js)T$0_7hc=HfE1^Fp_;PHgmcQbG7lUe{c%qv~pfz@$V#`Tg+=Mtx*ur?jgr( zR0S&uS-|!=YrV{GR83o|IoMark^JVmr4GNQ18Idt^y_fpI2cw24ugpVn=C=6D|DT} zgRTcNO}F$HWuqerXg7ne@Fb7nKVOxe_v)m-_}4I1voF$?%II1AxqU+{HqaUEbpI(p z!%vN+1wMO9us^NPB{+FzVOl)4u~WYseFFhNl`K58wz_(w_0TE3p-%SxSpm!n${XpY zPg$_#ZO@X#GLf$lbC(##18tH~wxz!WooHRymiX!Vj;_C&CvmF^`sCN61zY)Nj#}Kk zIgQ^9wVGT?2h5B~S0N7%vTY3kiiY30o+sEy6U*KQ7gNKX387BhiQ#c2T-v9$5S(gC{&qU+&<28Dx5f!o7ht{_Wd~r}W=2 zvY@#TJ^xb;@wtF+?ZU(bw(dEI&oXt916Mwvb%GbjcYDf`NrVO($vf9>9zxPwk4@B; zx1y6#>7kOF_{BtylxjsmZ9ZCNil4s-0?q;75mK_;v)nnG!q1qG3s zaSnfdbjx0t0HtICv*p?>Nl=NuqK-izu!wt+{AFf5ITV5?Fz5LQ-ZZaXy@LG=d=$Oz z|AEsC6GjmxCbTjnf7Z2XU4q$rYN9ru&gMf>Q$+2zFO&xRIs~a&f7Gr$JI)dvZ$7u8 zq3Ec6y<3!;b?aUs9*2E~`(_cCu>OVeQ829xN|&16JZ{B=6(~;zK5_>PI{JyBeCSk?mQ23f8zh_~iGRm5*8jI2w`|E%KJfHC*F;H&OCKrk2^erGk>&*kWM<|jTlbiH-9^315#t=F(_ z@#k5d$8~w~E#wKwXLgl}_pTuhJp zk0yoT@hKVWJ9Pt6N8X`4lv#$gXfBv#%Vh_)%FiWD<=<6($N9X2sPnT;S$Cm=YJ!VH z_kW-coxM<{zB0SndFSzfamMFTFWp|55WbO2BO-Lcg_qNao428x<(lF0N!rQV(<+s) zc|5t%vHWactW${3$DaW+0VSX}v-%>gIpW?PGVi-l;^MxSopqCc-~N-pp;&UG3y*d# zF1cgY+b~@roEzPA)q!iumgDf?-}gzX#m}n$!!y>6-96G?T{(PxiN54Sr`j$Ukec)g zhFNa?%v}XEy+Noj&A6?Ow{9qjqw;W=D%Qz;?&AyWYwcDkDfc?$k@pwEKs}qxb6oNF z>^>D^jBk4TexC(4v)UJ<#BBLt8v{hiwC_mjI@F5v3~WCuWH6`8vxkMm5KcagneZbo z%iVrBX~Bz&#Z!7U%;m*7ThCId3q9vQwS9rvSb^0S>Gn3~;rxGw$%qgtR`E5istd<4 zucxnz8jl%pD2%iNE0(L@v5$3z6(~zkkRrrhiq9IMqHUS?PerB6%o@?A4PJKX_gO9f zME5@EltkI9!0|^S9ltG?Vj=3Yy{vp*dZ~5YtDf&oTB#$PF=`Q*^Q@%0Ao+gt@3VjA zbm6^wb?0AhAr|vllW+!jZC~qskrP@X3&e<(Dw(+5vdatI<=L+$wqhBuj(1_RiO8F5 ze)OHo@fP9TNJ^ZNF?rcF_J>RjU(Z_W8n6n5G7)ICs7KQrERPh9M^Rd8X|~e6@l!5B zC;T-m+pBiGTD0^j%#)so^E>?Q-NXBbK-6MU9b;hdD5T_M@P%Mv(cuVmoaRJcP>qI0l%U+2rRHpEkJ@&JfEJP;D(#4yMqq ziRqR!qQaX}5G8(}=Nl=VAz$lC(JXm!+>DDK#!sbHe&3~OIpR9G>&YA4TB*i3%`aRZ zyXWOdTWJuPn#|P4F{e{;>Ric0*n84_Ww-3ocL`oh3v`}$w+Q?V6nIU|Hv>2D7E z>5ohRkk(2*yQiADH{9SHVS46@Ib$@zsDcQXxs!O<)T%X+7h*rHb2G>3Dr;Yn^7jR_ z?s3;wZ|9MZsz>i8d)yk*mqp_3>|bz@5~Cr;UDMAdg{TDCTe&~?vh`aH%Sf?mr>rktpO=pPWlk4$=Y(2Vm#$8W31}S-j*gvU6@51p$5`$N*M6ky$5=2_uxS4Mo>)*!&;#l?QL6*8SLNTig}E{M#+Uc-6z&hs?(r~_8U 
zuXE;@pYZN}%N$e@oFi;S39-na%4szBJ?I-sUGQQw&o@r*VkM%JYW&E=l;(I{flZ;P z=fpnSD6-JVG1VwSs6*p;5pSN;Vb?*=09O89aSGO}=i0=dHA-qBAClMCdJzG9Sk>u$Si?|5~F zUVYgVB)J`NU-iRiPL~P=GQuV**0uZXA4|Q)&a+9q<9QLSVtUQuj}FwbbP}6viGRgP zkx{t{jXiq}`KsP~k7E71#1*RZW}y~g$Y+AbK8#MY7{8AoMT6g^-RnfTVDKACh~0qB z-)+GPj@iHV92Fz#gOsmBV-dca?>PM)o^IS9Ff>jPe4@veXK*y$7%Rij^Wn(NhrMgXfon~(p>?VB^%NJ)@ z_(Zkv6`5knKO*`mmM$e-d?%L*7Bb?~!$t-aH|rhy3KX>XDds-fPWi!i)a!nfkhp3cQfUqZVnRjs(0DThkU)8_SbnN@ zvpn0X3Ax+*m>lm8K*25CP#F4yz1@uMN#{nc#$J3uCuya%y&HRYRRabQF>^})%N{mv zX>#mI3ex6V6j)m3f5+Rzzt7(mAVo)ec+2mHVY@r>;h9yP;!Z)Qnmm|TYy*e~M z2+z_|h}l;>ZB58Kg4;;zvntjsv-I4qw-Il|+E6&=eJGV>%n9=%``TZ1QNGj0u}@7+ zqJ61Acy#GsBgiQj9AWjncDK9EvP(+JDC(&9KwfR{BUk%#gjlUw6?Lxq2iB4icYiGv znFd@s&ii{fIAgR%4TUm;Lz3zH{4aWk{{8%{8`k#AtC}zbS=MOJMxkD7Pjh>I|KnXE zIn>jR;%H0>tj>Cq$4t>}XuFFQJ)u!d)`!MONXgJ$I>mV4`_w*;{ zZ?|1(Cvalm>eFk`3+0YwP}5RTqwbSQ5Z?P>%-u(u(7_(zdGnWlA zqW%@0$R+`^@2&de-7lu5J!EZMudv7A`*P<#U$GPIQ(E|o1NbxSt3PS}jQ zi{u>^p}*Ao&#MFvIq8*xv}DX6@I;;XVgRu?(@RA>Nl{N++CW^&`>xl)=Cjjk4YX}b zMTAR9C~8__D38qqay1FRZx?+S%J{x;`3@7yj9mI)Tj+1;tUbpJSJtb%6q&YYIfO6y zFFDyr`CVkaPm&WFX{nV3{uB+;%jXp0Uy1s53dK{t3*|0Y0+jp9`!6@Ay3EqJ( zUv^K95F-`N7VE9shKMU#Wz82nhvE^h0xOh;cwcw&zRo8T)*I}JjUBqvg>&cGRjgWU zZhxQCZ5rTOijM}Fu$+F`>UyIfc5jwB!D=LVcR0w3vP+tmwYrYWdiNHhkmE|+O`)tY zjN6RVgv%&kud&>oaF7tT>Nn`9T!?EVj$p488T$4>;B94juI)(OT^TolPyF{-{MX6C z{))VV_}2o0BhC->d?~KKAMEY6q^3OP`A1=AKX9t~ra?%kT6F~8{N__{#uP76a5|2e zH&<58xu3p+vx_K0h7VcQ;XN=@uZ^lvo`K=w?XahPt-nT;c=2SN}U*XBzz6A%5K-5#iLs!U?r+LRi zUb9545+8dyGUnsZTp)d{ zu>0yPRQB>&nVURcKIUq@+?zSKiKJl)u#o!SuOEM@fH6drI@oNRD#1HW{oA zz#xVPP=Hsvw?`V5Y1VcJnb_a`SVNKkX|ZZdpL+yEZ0QEGe9cmO=q@rzQStVjV#$L5#8d``k&z=6Hn?ML~K~PXq0-sQrV%kiVQE^!GJqet@-mY6HGiXWgR3w)M z9}<9=&w+(?PTYwYS6V3}*0idxU)9rdJnUX}@Gf&@5|)@6*ob50eVjYS2@j|#^UFb>75k;!^k)C4lV+5X=GlwuMR^r{px%(~ zkSIyH+mNO9s#%pKj4TI}OmSXqdc7zHTe?$Mn0{9zbUEVApPBj;?kET12vYzy0KgW2 zR3PxVg%O!Ck=M-Z9_lT`x2T*-HgHh{ zsXIWq4S1^6LOTSLxC8o&KgEj0!L@wnvn9YewS za~Riog9eG|L)dct#l^)<>=I~Uf5ubGQ7XvDoWKxL2wX4m@a~1vrD@5SlOm~v#OZVB zeaRspBrVyTA#~o>_Wa0zUl&n7*S*bTL*mdN%vjkFsugZhout*@uE(Go9y%fShtH@Q z-(wHkqz+%4nVO7g-XW2=d0vgoQ z9PaHEmUwLG-DDF&5K#jnn!{>eejnr)dZ7tAplly-!?qE&&a~^ z8vZhn8!=&;7e>>@^@u^A%@_sYOQ9Kr$`$1VZfjq6zn%k#cK9oVhEu0TCQt3WO>(1%F6%)D zgf2$Ms^%PItc6nZQsDo@MMOsGRhub-ikh8R9_R-AJ7@Os)CWI|-!%inP^sVp=N=eT z1@n_$p(4hm9x)ujEFDlvFj$-ht0LeVj$8h7rja3RffI7W5QsUsh4`csPx9?X%SHq2 zKDlVy;rDm}-+gbmTSSy-iD=?4f9ql;R&SN>zuM3r(8}v%<>fimxHyzbXhy9_8-Cn$ zJYdE_t-+aX+&Aw(ZM2urg!9|M-KIV@ox9!Np%$ZT4WX(?i zAp)Ezj**v}+wH{POkkp&@0LZd((83EkRLj4al#fj+EH9wuz=}_ks%ZXvu}Huz;;}O z-SeN}oWqbs&(0>R#z-SEbqRcMb`FlH|K&n~V=0T3d|aDF>WUk+w7giF=GofyObgCe z*?`^c;*47lu_h^E=Izp^;!Wu=<^?8*cRf-HWIj)S>5Cl*#bJV5cbZ-IOJ3D%6eXVt z*DouBiBAnuswSkJiV@enzuca%u~`B83Iq;t{J{4F1A#xA24}SFe#nUNyK+mRD`_l2~*W?~n9uyU52rGV3Rm06HTKV~2nk z@@KmG^5Vi%GHb#T$a3H+x}V33a8BOcHAHQBH}&a*Jcqm)X^AF`FHq6+4~i`4Ic25% znP*F(8O&~S{R~db23N1+a+>u)qFC>|r5Vyji_r?d7o-)bz-c;{7D(({vE#>+zzIPn zc!Cg%vTF-l0GXxqtikKr8PGayMm?U#z{dWE9mbcUh!^ zx@dq@wr+_Z{0b1$iyH>4`llPg0f59@$51!*4abmz_6aP5VBI@AKL^ugV9Kt+KW_jX z4#4?4LcC@`F$G$F)wDHOlvMDMBJA2ey#P}EG^`(x*^vKJ02TuvSD%0z2Y{MR@b>X( zF=!yu0oo~07jEM0W;zje^6_bJU+(N0G)(BsU1{XKXJuuzgz%N#G>}EThPVY__D{l5 z74xrcN|9eBAY2A?JfYac<=!YKHzUzlh zy$(bIFp~%RB7^Lb+tufEAw*{WCt}I`+*}Rz4*OI+2|Sd5?|%UdGmHe)%9;>@fSjBh zxIR==ibC>|Q25tMcKOLSPsrwFQ}$KWY(%j4-H2;~!IBsF~k=jj_U1E6e&=?jbZe; zo!B+EW_>WjmMdM&+$-I0Hi1(2s8Mt2g}BV2;ssY{EJKn4)6CVkW3ICr&cr)Vw0vt( zU7`O*YMj1C7M%4^W8AV9d{r3$#cR^zT%m>>on4eOtproJ+~(Q|zz52fR z*G5i+@}-)^KQ)WgzmD|S)XiG+F}}{xy&A=kwuSZb4EPK1T;}Ma09ODpu5$CxuEUL` 
zMyr}&3W;h?g7G0IOx~1TmFyrR)|Ze0N3$aHU{?&p$?>NOM1GOJxAzq+oL-Q9ba?0# zxd+4&h{W;v^MiZk(>(maPNCs4)G!YLzUS1;O)Z}DPr#Jx#H=;uVS4LjG3EeB>85lFR0TF$l+kU~~F=6otxc{I$1c|#=i?;3h_pKq9k=1(C^=J@*ONY6s)bmG^{g>t-=&K|&2aS8to< zfRWb#pKF6J0jMIw8%~F+Hgz7rDFd+^)C*v52YhpQ4XCg-(3yc0!Qc(^qcW(D+&jwP zEW$5Z=8qmc%-Is->I724VFi$1PEJn1Rbxba8SEWyA{gKh<>loa`6)pi3YySHcthA| zam4{nC>;(3>fbSl4v2|}MtX*vzC-#2DALrUlfZ)M)vGK}sCq4e7(D|P44@J=W@h@X zor1yvsGRSD#s9BlnF9GXRClaRx5grTtJpz+5l2l-O?@>evDgjkCcz+&_B z^JlF2KJ+*Nk!Akvf)^a||MqkVBI0k4%xd>OEmnZGKrvld+YJM{6rinOeLh7+s6T$x zONJHvKiax6SLavVCfY!9_#uIs2<#2g^jg3ggHQ(b@2`Mn1frJbPfWBwH@zroy9Zms zr|!~qq?b0;PZlf_qA|~1p8egvB}sozy#?WGS^QIMjbd{BmK^2t=mPZNS-cg1f4&XK zU1XlYyS`_A;LAqXDwpS#_aqPlD*Dwjigh_Em7_M+C-BkhpWfVq>W&al{NGomk}neJ zf3L;T5YXoT-*#q2F!9G<|BHQwyc2kU|c(>R5E|G$qW zT_}mlMKz+RAe!}@g0vq`>bA9;Ktf3y3p#0kp<7h)Sl~wq3bc;#V_Q3Or^I=V-TQ&e zY=K_#1`?qPChoZGwEvwjU-B#*qHh?qvIz=0EeNle=XMye_d;!gU%6^_RHp9+Jh#*9 zR#Y00WvyAk-94)*KuGgQvvUMIcNlC!G0%z7!Y=P1t^TNy{yDdjILnj=3;(~1BF3v; zciS-d)hf2Gw#kt4gZj8QXFlHJpDPzNk^b`9t5_kkN?p!wN~OPeIp}6s-_c2Rew_Q; zC)C1lfk!sZEPRhAUw=&cpAp5$v*6Qb;|F&FvPt07)$?LB|M`PDCkU@Eo;w#KIx$je zyqQW3OjMlKIz{j#{MQF-&DL}s#`d4bRt@HxWDM|`=dUvGV!nySqZ(txYLlyqOGq?w zxtl&DC;Y+Q;kbwjgI2WECx(`x`|b=EKhtM@JtrA!k(LVge6BPdgH2~sZ3!ph+B&6R z?J;;$U$riG8~tU+dIe=UEl&t;UR)^DMXeah7+<>HOJ(gBrRx{XHZ>7)_nlUaQ?+B^ zDSZF7{o_BJ>!p(H$hbd_{tR~PA^59W>84tV>vDa{ih}~jFGjs1E5kL4_$NMIjRn7T z(_yOGrLhivRW67wWuo2LanA8;3H#$W4trC2i;Vt=D~e&Eb;jGIS_nj(27!pDtw9(A zmYE!pSzvcA|JU=M22Ar);Q~@mh}c=Ya9_%*^A(8+_fCvt$fM$uef5}j%#t~t4wRDiak#<5)$=H9@Gee z%UaZTc8*}OemnL=s~%PR&80{Tk^+$}mj^>Ez1Gz8`KryG@5Q4c$v1qEHuk#1i{z*#con<}b6+ywp53P+=Ppvh;Q; zl;Mf+(*^9bV1hSg?w@&_mpHV1CCk(=b6ZDVWO(e=?lgomN7#&T$oVseWi#-jqzst; z`y#->S1_^qdFJJ3BwhUg-@;cfrAJ8eND|*tFPzQq5V_QGbmLX6CJnVB(|HPMBEiKZ zDt1J*V==u$DkxE?SD2r{V-bnFa9X&A`8?(?C4#OsEh*KS252eHo`W5@1{{ozcR zc)cRa(YK>*NC^m67sUFu z`lzhq!aS|^Q#SJ%7SzzT`e+3IvJSeHc}A37mS(%XDdQCqxsGjnQ!O~4SunB5Gy76A zVoU;WZ|sbs*86G(ku&+A8;_ELysPrd?nBqXtk$)H>v)lgDi-JM=dAMi$0I%xr0VDs z?HZLG=gVOcK7!>Jw9Mzlo4Wh5RSa_3_Zmu-s;5=8OddQE3h$4WNL>6apfSbN4Cf)< zNFnAYe8_C}A~)=zPhYA7$0_@`Kkh{nJ&arnOR8G#oQmRme5gUcU@%nWY~5p6JM^zP zmR_}pOLwQ(mHL5f)!G0@BS0V%2XNx&jwE!wUByh`LA2n?UC6e@wLM!MCdp>_@iN_5 zN;c-p&^?a~DJBBC?xja>!?(F>4xXpCd}6%+pwA(l7v0#}Q5&`A5u66S%Kyerxjzz_ zR(?xJH1D}$TS3rC6-;hTp!s+iwk`hX&RoB>jrO=`;pZ{WXElyED)f2+?LIbEixs`I zlh_G$BG0@$gLOI8Q`;O#do)d^uWVV0WU?$z*cRSLTeZ|PWAfDv?9|M>)GVG|?a9{D zOzAUoOnSrV;i4*SZ87%iWkyy2+M&n!+%JwTmkuj+oy5i5Hk?>?B4t16Kx0?la%q$x zbSWFx4GPgTQ4z;Td-v}hwm3f2g}8i{7m+$2RF}<1ld;l|y~*Q=4bmhNmt;@M$~HGh zd`w9jW-j*FYNazz%#yUvOnqE?hhlB7Nl-S!t6Gd!6zVw&zOr31_%9xf7tYHHKC&>m zt2b}|$LDnNC!NQu_s!p)9}0>PmEUVX9obGmQ6-SlR;m2N`0hHR0Kth#jym}T^?GRD z(f66^8<98W=lRDtSs~a_GQj#Ros!Ol2xqP`|F+b#M$PaF&jQ^wZ!me z;6m!M6DT!w3;vbrAmP_;N>u7lg+tl(Z;}Vy*jB}dV$sD<*=i)J#YTn^ywh%v#hS-y z^V;KGb8X)_9+fEGR(Xl1$S6w4KmF}4dFE{k6<@2{L{)d|aek)0|ArZ*m-F6=4mn%s zzRU5BsiRiPGX^5EJALdO!b6e|oc_l4*o^D;IYtcXrcDy7%4Q?=FB74&r*#YEYn(hq z?PPr&KlM#xBlzp3HLua5c)mt_ww=Df4YdZg@rINpOguiVB!{(tT1@|-${8xnlrc+F zcAQR|intJTDaK&l#6}!V8l6fGFM3XD{gvO8N}U@fR7!*YdMcFk7g=I?bg*&jsQ;Qw zjXb7CYWp4c^ADG5E>=I;U2)8h-|kRW$lLMGY75=3AP+)25};nT#~Nbk*T3rXk%^_V zm!o~&-Qjavjy91+!@=2CaM*xoz5F(Y3Ocs?qIQ%XTN-7wxhcaoS&LS{>=GvC}!#-s=2ON(pf0*J^_QkTnI^d(~CDG>w@Np{_qErMYt) zadH2ojbd@ouLh~&94YvmE7k4oT@-Yyx*r3?Aw5f6Dcfbr&GK0sQ@VXp1E0UN>y^Y6 znc;fg=VQK=?PrJ8Qr1#sxxn*bj(H#Zv3whM_R<|rdQ%3aa2x%G!)W#)@;7MJb0+57 zNV}?D-7+knHv5 z6LEz`d(rFIG~3H0NA-fy9;uM(RELcpqdpFDO0nt(p}9DiX?Z`W`RjzvXI`TlKNg7O zk~eGW+2j(E63O_Ln))pENjI&Rq74Dm~e>tVA`%j@p$}oyT1uTUhag0+^>S$b`CfV3Vt&T!H;Q1jI(vh 
zRN0)q&kVQ76F4LPL)4F%>{@&H8~MEK4g;RErVlcrzPY$;N+V0LkfX({ z-4_cpBz>l)(1eLQkjWyz)4wgy?07ly!1V6+uLCjV#m#O(;{Xapch~$3Rs^Q^MxwVt z<=dinu0%Zy6~ZKim`NwYPN6}55iff8HUGTq-J@i{n#b^?wejrYRuC^ntDzw$y!(zn zYuaY!Y|5R@I1l3!on3c(Hlg4%)b@J{TG5IvPCq+pFnb5eIx@-e3~1NU zx-b>+vICYRoquUG&!C;8NCY71>Bi)GhcEte5$nD0`5GS-?;m&N6`lrlS~Uc%Js?i+ zd(g+hO+-X-N6`g28oz@0GVY3lezwuLpd-G#E^a_yC~2j>+n2OfLK6S((?%1Fy@*IU zQ=Vgz&5gLVwhOwa!RzGTmbi9>4jy(n@`Mx54)cu81<~m_#H|O}y>?f<#Z4ZB!%cpR z5G%n!e^J(-?xl4Hqp?QK{lBbZFWI6di|f~ar}BkpaDHGjk=xisAKYnFi7_?amQI;B zdHC^sOztTh{|>40ZGaqj;XjLTz>= z`;Ra&Jsgr+H)}8!uS)I2t?mjIx6#p?2s3rFkUVja%gTMJrA2&E!>lT}%XA{mL4s3?{4zrFwW>UziTdV4&+-_Pfq z``qW8`@S9!_@G)>K09rWiz$d)%Z=m4d$Y(8{-4_Q#9p-1WnV2tw|pGx$80OsR^C3O zIM^+9)$=>^l+g6H&2woYRjVcw#q8JFme;zvzuI0ad*YYXJTIXtB)pUCs?m0?Izu)4 zK*i*<7jFpWloVn<*$^GqKKF=Q>g5T}W%GZ;D}-xf*B7`LrDq`GD!349R3fBzK)Mi< zAQWBPN&8;Y9e66Xa4Pvo*_HsYt+sx@`WgLpGSRBkCN8nPIi5*p59MAq2kEpP^@b15@OM_Q8c^YnBcyNx+{aD`q2DiYjCNSX1m;s2ZuynJ zpU?OIWA2ynAW!3`h;sG;KJn>FO&^;|nvGkGXVz{EzHe2oZYoTxt*c2_QKoDZO>z<~ zDbQI8V%04!H&#yCFT={sw7=j_uqKDl&Va0|s=|Ro+@&^LU1|k)?fg5wW?SZe${Zdq zCi?Hnaf)-l5-uK1xIW=pM8|#D*YNDV=A_w^cu{)>WrNOt^YmVOx*w9{&agh*WTrW;}NiGIku6*s0{z5BL?rq?qO>36M>whwC)YY`6-e$Ms|@>DNZ>x#)3SD?QEM& z&c`>9?yy$Uo*Yy%vMx_A&+scYDtYQ-yFryD!{DFQpj$Yce~YT}<{X!X1(9J26HT8F zomq2-ykTl=JGI~&dB=K#NV$FN|1$j9ru$O4yg7w$ju|D|QjX>(SnmA!x^s=zE7s_W zD)$lle)0z6a-Y8^+jYi8_NQoa`c+S@w+ z7N%sqavU47+nEpitCYL+x$I$f4((xUucu;wlqTbVVDU~?zMNDHx-$jQDm50A}ILfy9{H>g^ zE~!iZGSB0-DwpOw^jEX#*kPtZ>^|W}6+PdrNI#)* zAx>8M{*kD5DoaP|?GM5fWF22>8j3WDMc>6pMgHfadYbImfaEc~b9wPcD>y@v^ykF& z#ru_WN$G93-z(2Pn1AZ>0mmZ{AQnR1%Qy#6IcEE$$C@XejeCiyTGK1>u51_-~VOt?!|X_J~KMCyz` zDk1?<6wci2E!qLv^U5#2^To0LDn118AZ4w0g~95fpqy{VJefX~@^R)bZXRT6Lvvhm zg>IW;di~jtGn-n%luqgJ2kiH{wb4NgWp8PI!`Olm2MR34pGM~l&KH; zH@E9_E1QhU402U2vOA~oF4d>#+ubR8O4cLoo}r&VCQ)Nue9YvWxtyz6RjT_-n#2Ul za~UeP%?$MJZ>4mPl%Mq2r08gS^I&zsb0PJ)T+fW7-_^eRbstz-rL`q`3Y6n~b`0u* z6Hgb&vPdi4vdZSD(!OT#(p|G7N~Sm6#%MrRl&-hWfV)fO?cNjFHJ_fseJwQ3Dd*pc zX03MV$>ka&jVk=-?_cF40Y(u}*kE8C^4C$-SSqE_3_Tz9Zy>w&q@Lc}+IKOH_JZ@Y zUSY&y%3NVf-TZRoT>evs4WsL=g74>TCoErJk=pha(PD996sAVnm1Uq!VPoLuk1jHN z^TNt-RP+HYm;QXwl$UlY+p()7mTJ;*w-2`G#E|@DZz;Y`Ffdd(kYs#jU0^Sj^<}wf z4YM~7r_D<0Zr+faBkfgNP$#60_fWl8&`S}slA4~(wu~`3KD}Nyd-%~T*>w@aoKXj< zY5D9im;FW=j=4(+wjJ}(i%_YChpcKD#%B z+p>V2&bEpM_t zpdt}SZKw+@$@elL-6G4|{`9-z@hjuTk~Sf#>@-)$atp2jaxJ#w@_coNJY@@~4&c%^ zM>G>`CDg(`cu>e^^G2BqbwpKOK1~**I!7f(CZbjT-h*nGTqk4pKn1l#nzsJH=j+ek z%WH9V>yW&RB3^M!v^~lcn{6i1-5!6ZpZBcQ5p4r!y(2#puF_mFIkqfwOOmXWg`J@{ zr}D9H@cx6n_m{Z~XuSpAUA-Br8Fn+G!#6HaELQ0T^T9pxA4=)m_S`ZyZ{6trO>Ufd zfMk-ku{hhTn?g*po@TF)qknV$)dc!eH^-%FH=aL~FZg@H(^Zj8kX5Sl-Lxf1@Y{k- zVoqrpgRraRXYX@%tXJOoXt9y{hV}FHs?LZRQ~g;1N_v|0J(P@hulIHw?bEx?6M25* zz>~4)d;#`!hyK0&50#%~w_gpkv^rKP)9EME>woNp7IXUNGfkX693w9^| zxJ#38iEcBHM;%T1&Q@JRlXJj!>9n$zqjH>ofBE%}drM>-%F6bF=@jnKCv7%PAJX@h z-T!7|c0KCg@^7KWpCL)Kba3>f_CD`z8Q5wW3-j~g`1fV6aG8;%YvMx?3?{+-KW7Hi&dH!H~73r08 z6+dKmJ`O2KHIz7u&hx)8=M(>YeRa+3=Hkl9b-zfx^Ftr^to>88`IyT2>&_Q{;T*cN7({oQpE@qfIDWY>iTl>S%$A@> z!O<0}zUhlr;!BQiWs>GXUVNx|N^vk$CL}Y#vM@hTDk7BtM*^x;Cfl8>_x3=jqB|zV zIUznLfAgk7EC#QmZqLrh{!u*@mLjHGxbgku?~mrhmk<|6-yQ$>;KZ_9f>@6WtEukD ze$yhE1Iis?oh`LW4jJU$%~YSyB!{r~FdIZ0Q+b+=|G7`*!*}=1no`|1JMFC+A|>z;SLU=+$WPWjDKD^?e12J~*`t$| z{S4dMBNzU;lk}mUq|;JNTu#*=>0c=vDy~2AgYQU~3bjnUV0Pa!9idPnA@B2b`X9Xs zX)V+w^Ud9RQiOc8n6laTIFa8VW9FPLb2XnL3M80}`VH8L7R}$Cu9Y=2N)@@ICvvrG zhVY$PT?{mzHr{GNkaz8juYGjDy8w;e*<(AQLchxIm6EV95$s6~l^8`g*Iv)hNo}f6 z)G(6~i=P*M@=7rO`_9w*6+!=e=_n5ptz>e|Y!6<&L;gO6gFAL=XwK}N-hsv2K5C|{ zB1+_OA3HhK|1>C1MNd_AHmzTs7q_y>GYE3`8D^)iBJ6Kgl(k`2Bg-+ 
zACL17GB)T4e^^4XHPvNj5qXZ+CAstdzhh7A?^5nHyezvVc|dtCYj)}FDSI{H`aSnX zi$9IV`xr+~9%2jN{35W~;V+$Y{baiE+a>7)3q`w##*MRI6Vm)eu81EH)YnOURP%&% z@L4gJoB?fcLhmK{FT<))2d5IXgyK^nW!yEdJoLT&jYRaQM1ThrwP3KSllo*xB`h4e|rt6PE4~Abv zuEc*Y=gYDOt5+4-6*Sb)s+}%Oi?$Hp!+eHCQ?hCgr`jmdZQDju& z(s0L&RWf>I{87Z6$t0EHHro=@5fAp&QM(Mkt@+>P!ycE7nTx2kx=FtuxVQAaZA_(V z|Fn%S6~TfB&hMp%-HZlBdA^S_aU3Vs>?nhEkUF*oGK%?pP; z*i`^7PAu0rI|W5#v>p2>P{xqNXcBPkqR>?ecWwKy`PIhP4$a9@zW)W^yMH>vZ#;2J z*Zadsca^&L=l9)}I+)&fuVa6RwytIX{)k?gk%+$XOF8`*Q|z6D&0qJa*LZm{pT6V! za`=rKl!`gfLrzzC2`t) z+Y+X<(F~d_GyQ>iy$Bg=DmE>X(5(#L7KfWx$=~-7N!&Bq_TFYIqh{!i{BkKxv(YB z#K`gWj^7LWOLBK;kwr0>zt@*rZ&1Ebf8mOZh2Y62$)9ZFo1|S5xRYNTndm4NYE@5K z_?4g$lC<>FImBi7ME)P%n0Iw5GR{#PNsCtYvxMtQA9#wsl$NdEuWvm<{&dEc`NX%a zXu=_hzIeW?e7W|e4Aac*-dh0|9j~?i3eB$%_Pa{AQXl(!UFfvq34Q&ObpGn+jZL>d zk-mF8Y2y09ET-z&XZb>sCmivwS87a^sY_gcd|*-a%PR1`BOaVPF0q|nV4&MKzs8e4l^^;uTu5HIa(capd3ALE zjMP5K=C^H_l1zxzMABiOdrr@F z1UOn&vRFQ(Wdxtq4NJWK%Ox|z({QHj!1{v-v1wm&D=CKPn*TgGUACq7?ya+ufxj%x z<#~p}katm;_IjbP$+Q0bfwT3TKybi@ERk0C{_eS#=Y_ANo8Nsv(k$D-9@h1Il*8Gu z_8sFF>e+(}YxD;NuB{0VCAL#b{P&4pKf9P}XBiQ`;97G#h(u|&M=ZzLg{ZWT4Y;PO zSt|+|uuzSA)9=*3NIseMseeJ#qC|Y!$f78eWxHGX`b`nT(y>#vww2~mn3DN4 zD|>@5_j~!Bm(N1=2j>54LR?1cXZDVXBr24X&3({ioG!MDckAl#iSy#&X8LP9pgh7%Y)DOF zXnN@$kw&0qo|I`Rjp?A0RF+Zkuu&JiJl&W?K=KoT6@LabgIhlF%kN%x?;-tGCpN;v z_s(x3eH-jqbh_v(%XN%_ils@)%bON|J52Pb@lT_Fi23QTHuHns)a9ae2|+XStoJg) zNm=)0++)M)f=7n5fSpu#>l+0r>cWOo zI^U&IW=BSryyjLdzJ8(d8n-SuwbS0@mw(G#ZtIP;X`@1>h{40@TYDKFalU-7%x9}D zbHWJ z{nL;Rg(z65$sGzik3p7-3CFiP-#hl@Nx66)?1;!)tRv>hv+vqt9(gFKP{IQ7!I z6d7y=j7uUQAH?jRf75O=5Nz21gmJ_-*~PF5DOcwSv6>@Lu%o%6DsXw-6T z6fJZ4;brvS&Ge(2bDH?Hb^R|ikp>qa=M5pbQAJED4h{|ms^mOqui_;-n=EVJ9Q@%~ zmZYR4)bKPQVuv|Dzuy}l;kgH?!B<%$7Uq0a9N}?u<7IRuoX!?%ZsU6{{@h@)x%jkn zd^Kf=!VQmNj41lu5rK;}F8wT3mZ)?(9Q%fb9z>sP4{Un-Z`yyz6OtcrC&1j=8%$Ya znF0!8t!^Vi>Fl=tie9di=Oqn1h`?)Ryk?H zymW&kKq@eBC%oHvmfd3X9)-itO6INZSG;O&O&#;EiRev}iOAaEgHUPT@*8$bYb`(R zO{|2H7N96I1xq_jShxLNsgOKbavru0eJO&t7aedGn;~o|X_6|od&a9S6H|2TO00iT zJ;By%;ZGxB2qBwGP>EB?@q&W9+K_6_vCGnYE0MCI_*O0|MwP!*ALqFwbw#Rl{jKx(JyJ?-X*hWf^rhHK6A{-`n9+K#4D<%Y~-IB`9-&BquU?JOs;uME<%xq zO%FobMoeT`#&J%Wckent`s4}DKQx}Vwt`=>WZp3*b9c|${rY94L`QQAcBR=L_%;Yf zlC^of<#+suASZq-+YRDoR#sd|3xus&&ne4dGt5Fe^sUYbZV?}CmS)6}878{Gq*>iC zX!h=H8HruRd-|9T$H%WV|LJU+LmauSZ3nVBr*4wl@hFS+`eL4Qad8novg1;draIRO zBoO``GR3HR3HB45*-rk>TtohrUu>Ajq6#GVdb#7u9nzjXlBv8AmUwiNKt&>{U0ad7 zpnSXJ+O=z*2d3GhZf+9Q%8m|*mTE%|4Su4B5A)Y&W5eXm zbLFI4H*a!q?rxUEIY~{4`0$lV+~b#3C)4C7(|S1_?X#nK@d$stLyLavrH5X0T~;?U z1{XtH_&;Ba(9n!_s9ht!o%Q?!=5R5fAG1Qc2l|vQE-n}meSJPoML{`V}CUOF>=l-|3W1s4@?IXVW`X^{Odk?9Y{7Bpruxj$!Z9qjp1= zDOChd>h<|EToj&MaMo0TcWtjP?(tovbC$JE@nl+)=bzyBD=RDWs7n1wZOVpMVlWoq z=;+vza?Y18{vZoKKZ8^DPoBb+sDiRbj}+pZI=i|)^@wz-!TkuP=TaF1p<2&G46UnP z;d78{JZ{~yl^g2s-__T7r`*)x{BT&Y%3NyBv-7m}?UK#385{Q zuzZtoLqdxwsHp|(25M@}GbTJ*OdQvGTomw9$5r1OrYgd&xk5=qq-aL)_=$A;2N#t2 z05~?8f8iS?Ekwt}d?M%7MH|qvLOo!4SQrY?2S4~JQI?KLQjuhb#gb`66Y?qvTm5st zFitPqJCq+rN-E z_VMv)(cwQZUr-=~k@?HoC)Ux4y>m0QN5i z&OCm*Gm?x8E^`xW6ff=^9D4T-MlZ)ZMIsO)kl5~E%zE6TWk;O*LkOG}_BO9;LnG?% z@n>p7V~T&V@NnO@!4IFWM12;~`Pj~V&Ig4InyDgtqc?=8zK)*OfHv$Kh-XhuHBSi> zHT}U)ISds@$RoC0>K}{ThE(PO@0sn0ou3h#v>|z`a1q&aID8A2^TCC_!NqU8=C4o^ z!$wJ^g92->uTk;0T`ff!nQ*wAto)b4MrgwV#MWber%Ulfnf#`N+w@G^RLsMhm@@W) z_7-6)By7b|2Y0Br-(}Ys<#W)Vi>-%@hzr7-8N=5QQ1UQ0Gc)_!DIgK?=f^5e(;Xz-zs?`r#398kR5>rXclB}6%;GNsN9 z3=GKAegFCMh^#xaM=&jfuJaO9e=uEOsUJF&?Yf_dt+{uRk@7@-$S8y60eB`-EJ z91Cb!keYp#-tThZ(EGJk63L@Q?$&a*c6sGd^ldPu<@=cXvwq+nxMA4=hCH1n`{xV<~w1o?)stn{aVS&rCxSNEv8u? 
z%kbK8JLhfap;J6!+=i_PyHq*;Gq%6AVcXj6$Uj0I7Z}+Y45) z&zKx@5kTbWacviBasV!{%9mQaHIL6_3kX2@h2GH&>OkhCX{`h>+l1;Z$1^J5xHv2e zF1OHmiC{k1=HBmE9kh&tL+6=L{t0>^qW6c8+g~<*3q~!_!rNSbj?3szX_C%I_%IW% zv{bu%SYmIU%^F9 zqW-p4qnvyp#TN*rq8p~762^Qa zYpqUs>whN4$CVTm1bt2%L~KCj0}vax-7zi%CZF$ss|cIT6DIwtxY`y57?O2vzNfEr<5>on= z<%)L1(MYtL$PHJz# zA<HeuP}+bG zx1+|7{}#Uev~P)f_irCy{_h!3QvN>T`&Ecl= zO=&coIx@vPi3zGhqTGsUIz~N2p!j2lc!3%|#o`VYQXD2yoa)=!v_1b4=BJ;0?@3T7 zue`A+M!c9(oV3HTS}|+iFs`x(IV56$SwLdSHRZ9o_xGWSUb~n=b*70?zl^H_VvBen zz{9u+Vt4J~ImQi2{Td~8@2Ufi=+z4rPa-C9)|!+-n!L#Y?*=;TUqX!@+VEM#P+si4 ziW`zZp7IdJMa@}`2mMw|Y*?5Z9s^_o>ppg1L-SjS*}dt{+0dIf=|bMDY5=xsU2#bFKR7(Roow82#f2X31w2l_KiD+omQd<;*_6EplkU}?0DT0Ze-03nAk7F0Tb{PbVvct&F`xAj0^RS?W6$R*222V30o;Gx|9TbZEmPA5 zfOeIL>vqKC9uq4oq5^QC3gAf7y7_G83vd)b)J{@a`Q_x_kD?oYCq=#iJd0!JF#w;p z(9qH8i$B}L1EuY&TfZr5TzGa&x_t6z1huYRAJC$1z89Wivy>rV`|;{Th9v5Xkz{;_ zLt}@H|8~woXuyJkgp5n|a?V+{R+hhq9Qp^>?#Ym_^0BcoK>q6h$)7Av1~LaU9ajK! z;MvP2Mcatq=fc|VeG%Zv05tQtkg=wwWUJ`Zi!vfR%y>pW?3?(5vN~=vqx0Xcf9XWU7<@Udr8UMG^!RKx;1cePmgB=hLB}dLz zweAT*@LF*h%JEuj2N)$_b`Q7TNyW;F3T4FlE%0FDPF z&9li5k8Z#q3V^YiXj#zqf!+fc)-bw+%>p&@mpTVd3+(<161}yr>;EOPgih z1Z&)7(#0L#Nag2_E;s|chmd9r>e@^D(^4fV)y91pj$V$*VLfurt(N-nBSCD85jEJ2*PH z@R0zm$@hP0b|v zkzRHvFxUogL2O?xrU6I;8;yMSv7)Lf`s5W1GQBp@zqYH|IEGZ)YO<;Rr~XjVGQVptrsg0M4y4QP)Q<%Bce1z zP7Mch8r6x9s zxpGBYMrCC)b&ZT<%@QL-eYW88wAe!nc2+=E?|0oznx~FsI@163O6;{UzN)tRX#G;8 ztCo*;u~g{9c`*Yy&kuD74}-^Z?Pu zc0~i1hG!*zuanv=zO;lRB8FM^1t>9;HN1|A{Qs2#hgl5M8rSR@s-%4;&%y>+VXhXQ zo}OUdlphNBfsH(HL2CfOVRLb0L(-Q8WT$BL2NYqkn3kX?_;dxkeAAh{6IB?sYZrbl z=Yj>$$u&4i5*?Px;bvHq&`zqF7={qS;sI$jG5fcWn4M&yIDWFT`Zq_eL;cRS!X-w+ zv^+U-N@mjs1CAPpo7>5Yj#i1CDKY5qSnhf%Jon?q=RY#5hkii`rEv!iVt)WIBnnV_ z^}PZhHOzvF3UXx4!)f7X1*hKLUVX>*P>UtT!BQA!$Fkk@0cvm(z2gMCVPHZNpj?^+ zB`lrMrJ+U)?g3S*xU>{N8gxBY`}}cv07&Z`Wqe9|xDd>kKdu51{TG zd)@)sB}Sk3PKvgxo12C1S1=EOykH0>Sg|dD#mRH#1pZ@GA%I}S0WgfPFAxnp<{+M( z51`sS+t8%4=nuw{E?4Wn6-V9a!Y{^n2KdE~k%%;2$tIb68bG%JRh|BG023{Jq4dD2%!E7xNge=mNEO_E8z=O7nd{eq98+4CVU4v zXQ2E(qXhs*{*UQ5+Rral2RP7DC6FetxX*yl`W+CBl2ii>=Pyt~fE9jb6Z4UL{S_;y z2W*eNQ~_j;JAj(7>NQ1lJgoZt9S4vW07L)@3CJ!!-K{?XuPU-X$pJ{1{YRjJ6oKpw9PKhiKD-mR2JmN17==Vlz^A^raQ6Y^O29z}g{6yUp%wf) z72(TS01;4rBaT4G_f|(**ZmkX97ix49JAZbZ#AuQr-X=0vx^47{?h#Davoyj-3z*p zGlTEmoZUryTcN+sFAE~Ht6>Y)Wu^70?x=@ie#^V@`*&8zf#a_!7ko zS0Mc)hDpq8Ply@{Fq1S&GsqOFFo3mIUqC~U0e?B0#||HmQ@4PTzzu*H0W~qW z2H}lfC<7FRkKo4`i-^dxm04**HF#OTN-+Y!MDa3^pEY4-RKXqO;{H?AdIZA?l+)6F zr$$Fd1Jvy#7#CcJjF12A4M)KezbVTC(6<1138Vt{dP_&|FGd~uVA;ZJzX<5Fzx$Gt zlL5ot9lPX4lB(|yw7#T=?0Ct&tL)2R073-Ynpow?ZK=yaC+2XKY;!_3Yhjxa* z(U@t|!i^9$Z*{ha;a}ItLV@vPu!E!Kzrr~Ksf~glFB9XoyOFZ<6E! z7Dqf(d727dE|DQnIC{BDNx|wctCvqWsXUDgXH1!(WvPz*@d$T5GsG{`vPO=*Z|HdF zvpf$M*Rye4Q&gPtp@pv2f(P;Z{yU(Z_OS$T?rbi)>2kfdo*fK<>=QfTHeU@8W&jkS zx&u0bI}5PmyT`WPdJC|&z!gY6_yzIIv&gQu5Lt)5mexE#|I^47yb>mx1eFpj0JJnU zsYe@D9i4#mZwih077|Q1|6bQ5Y^vf!!wX<}_*VhlEk>!zA6uyrW~0lwWU1v-A|XWa zNAZWMu0V}xRe3#+OF8b8{Hg-z^ji|*vJIb_ zLpufn{~^G9ZlQ`S;91nBa-Xdo84RPHmyo>&=(7*cn+kOXGDf+vzh!^&@OEVp=5tLh z(G9pKC{G!iU4k6&gd={rd_gZUlEy$PXJlaD1X*m0vhXHZ&A=WPk5~%fU%7^~8U-rC zr-IMc$z`Zhnqzew4$SGj>9%G*iLxOi{6&WX(->|nrc|s-M2JvC4s=V#lSbxL`^De= zO{LN|PlHg<#zc9S4vm6lC zdtLuh&fq0*!TzlE;?o-#m^40uUW9)#NT(HSm~dtRqAVMw*1v2=NujwzL>Y?GmU9yB zG4zpYCtRp1gVbl`;s?D_kx>^QdfOsylRVPeeT4kUJsjdM*L z2~;H!7(}J0ib)4>58HBu+qfvf5O( z&S*X(Q}n;rcA2bYUORXmxn?;mS{^o!j`RIuSz2_Adyw6& z9*@hG?a?&|2}NHNetaI{YgyT302uAHMGWP}48eB$FS=$%W}AvE>%n$b_&T z6s5u?WYr=H$Z6zFy$WO@pWeTr6327(!&0k=*IhGs8Cm{g63@_~gpqa=^HT_i$3U=S zt6Mn#go#O55Aqu1zuz0!2#$ndSREKcO_dHxI)puX|5V)Pjm+k;q!)6cgsbtHhe+%? 
z@RgYLQDCDde$In`0Y<{K_2Xh+U|xv0BqX7PLWwHVc9qglCNomV$iDGutQF=4Yi!Al zS^PUWV)IWp8en7&epIQ*K_KwDdA>Jz0g2AUp>a=<3jEE_fltFMPtR~mcgge`af$~U zb#U9IY>SRCqI^rW`}EF?uIT(VWA`4MIPzdyx7WSs<{DF@z%l~T0~h-kzI!wj3+sQc zF#~#20H2iSG;1KmsINdWpc1O$c5g~JhP;>+-6hh%%Id649$^wf;FNd$;cu?B?l+IL z-7lRx7sI0m>g6FAs|F^dBw8SAf*(v|6R>ZJ=2B9AhXyHX;PD_TylH}N7>Hs**$b`D zf8IkV>B!iG|2{6(Tx7o3gJV-$!M6yVRy8@PklOd9RgV74==Q61Uld%8k46#x zyG7egU@|9^9~_{=$VrN60^4?&vXmdLjvaJ=o0mmyJg*y^u33e>ZH+})lf1JO&e_r3 z+N__}k_1I`-?DTXjVdp#O1pRPlm^%dIHcpp|MqttgL}U5=M$J`qx#SIROjk^1-U zJ4k05>m5{V=*STapBUm`v=}-f*Xvh)*QIeM4%yR9zN2;>dj>OK<>)@8?Z0=I7}@}5 zF5sto2N`r=48+P7*uJQwCGbQk#f!e^Mxazf!4a2KwqGyOxTDQEIl2ohIn=r(FjGX< zZGF<0iXvkY=Cl~Fds7qemZ|x6Zru0E(Zg_ATKgxwpzteing241a|W0~H@f{0My_b=WJ?H{RNfAoUn)iqrzbemVmkz!#ne+6z1r6nf zV7*gqa=IQSeN&r}n6uX4=6U_jp~fj?X+lp5<@w=3KOS@yRq@nsjs5tw8*vh~K4L>0 zA)1CU6|P4zQSx2pI7hra6DETd@^cZ6`)8TB7y5AB{4|dly=QTp4}MenS^GE3-_x{@ zb^opAT7*YlhZl&wcH$y!JbesW?Tr2KG?mFPKI10HbNH_AA%YiUS$}QHT@$UV@V^OB zRy=D6tGLyysD9hsKZ1FtYoT8q)^EfnXP>SJPj|14u=)dH_^F~n!IcYLK?91&b%*Nt z;JfLDMWac~$bgn2nOvvgTHGRi-$xG;)8=-*K7Ls7@$NhDXEK#G(hyeeS3tf{UOFwM|T zyUxQj2buI!Fjc`!ezAV^n7JpJ3MCqAKz|*H_<*&(mLyUZW=kkAbV|u))A8@1?b@gH z>jXP`g)hX=u+a%&==4(PQX}WGE4HZd`ya+16ShAFVNORs$Lb2vx!cZv(P%vRcKEb^ zEyG*=v(f$dYswwkHKSg#S-UjDlCawy4O#KQ$VZL4%pyyLE|wlaVzC+&nC^kli6&E= zEmf&q%e6pi(1s2YpbkXgV7FW|IzJC`$g`s5X+n8fy6qatTDt8i&ziRF8KEp!w~WyF zT~2OdHf%VCX*`=z8iz^cI!OX(M5^@(C~Va-s@%+kdP~JLHTW57P!O1v{Ip-;L*%%J z=CK)69N0Eqq@J0^=5wKUvVNF*8uvGtDo}%@@?>0bRd)26gjw*Veqzv%c9FygFQnM! zWb5{ml8eQyr7C9D>@}gSX&?8BHRJ0vc^kKbOaVvdLpq#_^a_aN-{n1CBs8d&FYl&! z4^!|TwsUa`VN45W{LcLc0%o|Li;hlE<2WUWP%TAU=^;S1*90IC3H;_@MaMPIV)6Ov z%{*IUmsTks$d=yU-#K}*8P&U~%`{oYXf*yF`qbhJ@=wOiqj%c)Oyyj&uYTai<|Z)y;&KH|NgN8n8UJJ$NacL9~X*npGsV9 zy%{y0VxcXx;g6z6L)s*KkRuTYi4cT@o)HBXaXYXoY_!k9G3z?%lLF0}=DL7br88I=}@yHFsCpK-@5m8Ng?C?Q%(*a#T;p<=`s7;WS2~!CT=JRu(;NLZ7^j78P#~*&HR5Mu&u6NVQFyt^-PlIK`jp z^{M_^cOyh(^;B_kFOGkMEaE23K2LjFL<(vB%hfiujB8}2 zQPQmC(xo1E^@7lB=N7dz_`2Jri%|Il z-TZBxwI&bFzfI{C{dj00?2-6*W6J+3ujwM6Q7fUO}2xLuce{a%Z^7$cvYwnh&k)N=dq!O65fT z5%kc1ztB=_!#p7~6KBRmh<)tqM@IL(U34CXEO9sHkQU3S=@IlO-Plb4jeHTBZ_PfQqr7gehbOYbb}BPsvl+qTTdEtjL-cWgxvx(1E-ietz&6=T zS+KpL5@$V6RAs;Ay+*dpbH0nW{ZONvjl;!THw;QESpG>rwcz`Su92 zHrS+%WrV!J5I?@S&pK5a9v-)^&?Fj+or}DWjG|A!2IYv5EDQYV9A@Fv^KzUoR5?7< zn|HFqu_w5j%YMOl48wFuNrC628G|Di;DTN*g#_1KEOX@czU!h+WXnrlnKyNnqoRg{ zP0+RMraSD#`cz!)@~)CwX=&%-`r8ea<1BYY?yJgN#`GKuM1jBIUt(3KpjH(I_7)=| zl@3@itWpQ@FePsm<}sE>E2<7`MOowMq2oWe4d>jIxb~~1kr~kNx#qU5RbL`!TN994 zLRu)1I49-Ch;4LU!WVuVE0WfH&>OkEqslVYy@^(EeO+(;`i;&u;TrAYaS)X!(_p`| zrF!8|n9!ewc9y@T1)`gS50$IDDw;9)NN76h?KNM97k<29RVC9LZ8xb=)>V6wP z3Qry4W`SIBII!WS<~hqZaXS>YGaJUrzU2m06r$$NLxPdRD3Wm`OF{q06MrCon6bRh zx;P_5#1Pv=Z526`x?S%pe6?7;yj%S*RZBTwP-t|kxmd-fA1WX}#bc$u&+PFnHCrH$t0Z{X)p!cJ+rYEp44$m5T0WZV&TzgeWOB z*;0kKOjb{5&zSJT(~kkpHfy{mW4q_Z_@ zMqA8@I=t^uwi7;Gg>yS*50^Z0zYWTBp3AIZ>`*_xtq`I_x1?T6CDi9OjIwpz^7Z&x zaWH>rqUe8F&C-!Cm5f~8VYWk5MfF<~*De^kPeCL3g^;8G2aZ&A^wR^ZUD!Oca4EAU zc5K+aYkLV3$tYS?cRjXYOc1$y)y3)AG*#2i_(3S^t28Cu+YSZSwZ(_1;R@FTC!oe)e09~luN?JLmWq0OXp|Z4`x-W^lw`JFn{{G4h_CoN1*)b8FP+Yej(hQ>gBEkyS$)xqboaWd3Gy&lp&*8|mJ7t^Qy)#g1$C!QdFotN|k^{Ne9wjHE2pI0-~95t7w^ zV|yP^rC%^xVwh{b$GG6fC|HIcnkh8?QlLiW|CeZtghg1$X7^!8{$;D}kCWT1bw$r2 z(OuM8qy4-Su)Xl-bg_~9d}BS_Q%DmYn?m+Z&vjpR_#=*mOy88MBwA&ME#?;&dC$V+ z5jC1RSl4%wMG8MFK7za;xfIBF4e zKc<91h~}%Robon*f>{}`G5ZxPaudFN{%%Zohk~1m7*&{A^*Xb}uK!3*(WS)5YoN|s zZolC*6%4DwK{iYayMCHf0znc9&nz8-)>JU#-n!A>I!kLc$jgzB@w$I7q+W>e6Gj(H z_S9iBW8k*hF0M70e@2O@8V0l4R+DSL%8bjD7M?10liSearx0ZxeQrVl6G`6^yr3aN zTB{=yVrw@a6s0!tei_p#*CwehJnp+tmdcI 
zGcc)nF7)W$H4&4mt?**$pUhjy>D#S57N$`~`jTfjs~SJFP?3w8&QA)2wIY{JRA6LkgI(0!4C z9EyI#Qu35fB3rs5fFNNHJ;Uj$(mn%fo#7zn-R)eiiS*<%bx(O2UKaO4r zR7=TnapQ9}Fut<(>SI|wQS>_y?f+18^I6xr`n?frFp*ReX~7KfVUc8$v)HZ6_>c4% z1m_()ZUlBQ$%W?rXps&6;O9_>SbLHoMFa@yE zAS`8%Vz7s+dJnCrf#2z_PwWHR$Tuwf?i;gf<}#6nby?rcYjS-mle@ z?_&@X_c+IWpbhqpDA|lC_|N6gkge3AuyTuz%gf*D_QS4I#z13`<37pgR3@q&zP7iP z=N-1OWCG;@&nTW?bT8rs2yt=!>JDSEACzqxse!LDTmpC6J zl(gcI(HLyLtz;hr!O|2zv5{$It^IT&XCQ$GW6Se2x#kfl17X?JZ~0?s$16-xgf&f0 z&Z1sSEsKL$_ZgXzoL&k`vj~+EbL53MTcmn10iI-sIC_4#DE74PQu6WT*EmwwE=M02 zinb@;H?&!edlSMI6Jyy7DbZ859Ss6Io)>?9PJOG?tCj!W;qtgvu5dl+oStoRJ%5oN zS`c}k(J3-Z)GRDv6ob5vwKbKgmca>w?l)UF;h$S8@yz~{TqY!%djb{8a{@@(^c=LT@?j3U;i~eWP5%1>x3Eq z#`hx`O#7fd%X`DT?Cuo3;CXA_214ed`b-wfU^x}+lFa>n7jdU)B1NCqEDNR_hcO*L z@l5PsJ-@^>*w*W~OXtj&S!|$>FpezyWpq2^SkR(O4?cYt*L6oc?t9XADCdc-BK73v z`N4ccD2Az+NV7Rg?WbF&{n{FahmHJiIL~(iEOoCoS`=- zcA*{AwcK}NJr8dw2S4xglJxx@b(>s$m>bA+{HeQ5yE*C{-L$FMy&iEeogg$J`0gLu zRh*>vtKxFvqPNvz8YK)Y8fnQD3X#d7w-)>e7VYnM_y<>ezAc=*t56|QhPp@QA7-eZO8gs5dV}srdiVp$Yt z;qm#5WX>$7vslI$4eTL{+J@^BAPys+*%!`LzS$|6hY%_vZf(Zn#D_-pZNyVE?8;|L zow%bISToaH+YvZoO`3*u0vvEJ3}tYL&t_)+g4oTk=|b^iRBp3W5P zQ2DISZs^hsE2UO^kcjHgzE%`0tna84q?*stSTfI4`6JmT=G9IM#qpX?>u5NBC7Io- z)@**;s&B2m*9P)O95^f5DmB9S=wkR_lhehn@T&7oL^`s1FA|*d0emzI?$=}U$(XXa zmWhfllSZMgY67B#OM&vDP{Li7>loD6l0P^Jrr+|_@gyzn)1H|p9E|{;qv6(E?WJm^ z49dr;$H1A$Q(b0DEEQ&N_hcd5RStAFT>fh(E=8*!n0#_eT1$P5f7Tdylm61*nOKC2 z*R*}Osl$dOgf}e}uB>`$nGd1CzD9e;f{Lc@i|KBNYhL}e8qw?YutF8jBBiJlpSl#E zb|=M(NppZ^gfhiQZi6q_B$K&h*VUM=&Au!dm27)m-`QCp*aZ1KHqpA;0r_>j&n;iG z^~v@g>sbRWrArkr4&4phyoZ884CAc|+TFUYj%usC;rsTI=t+E8gxK#ljxFih4-Kam z$G;F8zl!Tya%aghy=q#66_S59Vp=C9GdM(-50Z@gbWSQwn@s zeJXCnW5tr`oD&n;x>izuv0m;SCG3z(*rn-w3#02!@w*ikZKEgC{oRov29vv+KA*xG z#X|a-ggAIJLgA(U2+fz>xTmMg&*Gf4A6S}G`D+N50|;vnX=z)LS;0>=T@e~b9KH;p!}uR zx%x(TU!{sS4DWg~0|N3bSM5n#TYjj%v&^@eaT0&VrX8A!g4jpH<=d`kRvqSKi-Acr z<<)nQW^MCjw?hxKu_O+J0!geg2?zw=XA2E@LK+w6K2mMD!CN+>jH%#SaOi&S&m0iX za6x~Yj$V9#{Y6eY1O16Y<1(4&&&Gn3Bg-D`R%ECFM-j|+Ban83NhCPe5hBy?Zs>5M z!k=oksI6h~SZVhibaD8r}rR{v}`624v;H8m|Us#5N#@ohe5 zN!loeT4upUYL<7$oDuLs`0Kjc2h^?}_#KGt;TYC};?@{ zDT^ZE6aW71G#*Khv9UmJ6PfM<&`QmNf=$09wCmpa?i{@ir{Ee;$M+X82-MC!Q&19aanKHMZ z*idVjK+V#((yDgYUcF47QAs(@n_*HfEp~K!i#fY*8?~L1Li?y23#FYBsYdoY4hp|1 zYt$j@8Jv}U5#;LS?{&eV4^+|{SVJ{KvRIIYQ9TKy*N+48jS@M=RU;CriFKKYCd<$v zv`+^wHAciAsHR}V;wLDztrD14Ig7b$pOoKNJhXlmO6x{xRxanU1!d8+Zt;UvcFFmu zTO%Hk?H9&wnlh(Dinv_!xsQKGT^+3q?RNJM8Lu|#Z>hz^i&8M+iN2guE?j6F#fl_N zFf{K-jfjUJkhm846EQrCwaVae)&a`BVAl<7_5k6OtN zsbMwMrpu2C`4e%F{#otZ_2SI+;?}pwzKvvr!TQ>|v9P?S21bOS>|HQ!1J=)W`9jqf z5`)zXk~|iY)Zl+Bi)$|juvcZQc#W66IUi|28~qEO9Xu+i;_{kOh)HYayu{pL62uSz`!ydCRxW2W|# zr=NHOgD@UuD9^|2B+q**wdyY=IVzHZdB`zF{wxn8Z{)0MsEx;%<>s<3NNCe0wE?7^GDT1aJEIv=lmg1&?mxU{`7% zZA4o$i}#6g-yooA0$0g8HAA{9ZjL0SusS_A<|3u(X}!4M%dIAFgrA_j^frgo^Zr_Oju|qir1Bjv)H_VO zL}@VEvJgW3U5-srj!k)3-F!Q3rv#6-UE#}b(%?Vb!9C>24F(!|Rm{=8r{PPcrd<5g z6F;(=Xd5#IOu#}|)(#F)tPSHT`8^kM>e2y$1`)p$COpG-ggJIQ&w&o>=4jB0evcIE zmgTX(VC(go{ML%X0ze%)CNSVIOX{AST-6Fl*QB7 zYtvMzzNjI|`HLr$R zY*qRvp^X55W_vu7XPS41Lf1;Mkt6mi4&z;V=R^U%pEK5RJdX! 
z!~I~yi&so!*p=%SlKZTW%*~z=Q*4Vc@;7{XwG@wrL32R7CcS_@h`gPdEXiE&Z>hRH zEs{azIKG>sP3l^a71tvDyIzTD=EcJ1r0-HAY3&iK29ji~*l(@!v*MYy8cA>LVW)i^ ztP$6uuwyklTz+}tEyO-CKhhwyCCX?HQqKKh!ndm@>y0c`DJ1orjs2i?Ur4uJG1eu8 zN&O|A3;Cgkth!Z&Pu-Hm7E-}F%rrW|F#KK1d^^=5tsF1@lb)Iy^>$=2T=^w`%Ga69 z22Q2&o~lWPaWJL6WMCnE@{q{$`ov_iT7~qP3yh=)-EUZ-z2g#|LGt9 z&AqeyBZT%srUzz{wu1TbpL6GU$?sO@U$i$2*`70Z8y_Y71)o-N%DU&3{|>*^CteO@ z+|_}8p9n5lT51!S^+1BmSzk-@#m^T7Nadk&OEkYxn+r4z4NA0h=+WW^2%JM+8c=A@ zuW0Nl>5?I=1o2JL*_%xM(0)G;k2(iONUN4n{Pd=1KGV7XrQoBT_W^3%TwanjJyoz% z!nR1mw(wrnkJu;N#{|E_KcRAvR#gVLp<*~0JKT7YhgSl@B|5sA1&oYk-)HtAu1Eym z&F!k|BJr5e)Y1O@)qzVSC!!7>-j3&!!-NkuPp-ggyOgkdP`}H+iiUMvD_o(twA856 z+@BZT@KuQ!+e79hYJ7a`ie%K+_~$MNDsFq|fT>LgqCVu*PLH|%XY>r=$WO80)|E;9 zlkfl*HGEx(vU5et!VtD3R1zAYa>b6oTI2f6Pv}q-66Srhw3(x$ zqm)!=X=$krb4}bvsa7SMQCE6a*5RK&>j3W>M4rj55MoaS5or~0doaY7+a*uq{~j!? zx6EB>yCL1oa-=*8A|>5B5|usuB%qFxmshJ?Gm-i|5NZX+-#Wl6JfZfWRXJDa7MObi z8Pe`>pwsmekO-rq6p}1MpG%&`HpO{LLrG@8DvPzwwrl?A>zu)EI{Xn+_Jhb2sm~~- z1T?JE&>7{VsQPlrgNe6+%PtqVFx!b zDFDL%H+6tx1hh)6*Sc<_hOix6>k8vB-OKeeoZ96B-uXVl2_AWidq+-l!Ml=tt~1z$UM(0(D{F*w2{khvni>g0Z-~ZD-?5BLaIdK&%3_VIWx#jHY_cIe%aN z6Zi&H(t$M0vpofn?BCtq&IZnu~zK*j)S zs{yTWLoS`x)lVByi)|mpv#b2rzK2{;aQ$j~@dC=4Z7#V3!6=X`R1EJ&(%b+4kwIS)c_cu+RR_ zmuubzfm%+za4K8q{q*Q)CHVUPIRGV5^8*=M0JA53=hp@#&C77&sVKrLa`=Cqt$*Yx zLmr|(3Z4Bp^)+BVgd4_yi9V8%m9=J1X#jxbBcY4nEgb806lrnf5;r$D!4tg$eqhpt zn`hqSxCPu5-8(QUPoYP3eX*s;i{IH!t4nm45mi!w5}p8pND%pj&jYX^$j-^}z1WBH zW>LsHevhXB4+cEoP66BJ_p#b1OhJ*RzF2}6bqP-vsW-EDCXof?r-c)m0pzE6`1qFQ z=ICTXgeRi1iPXrwLqp*`CM%s@uikx3b-t#={^TuWgfoI;4Mt-YZgEsRzpli(eh;&I zM>etQ)p++(jq$gMO=sZgCJoYP&u99`0aFBc`2L}x(vp&}kdV)=@;`DX_s~MceU=Nyc&0mZt{Djb?Pty-ro4gz4i3(8v)$gt28@RUa**|WRfZ;`lAIg| zaf~8uQZBDCGyY$1p?BW~ZACp7p)xpdM|Qkf4^AE5ggvkt#V*&`(KRaPz{eOKMsfVO z(aDUjK$`?!ju;&gKMH}x%*+g}Dk3WC84T^uj6^Nx%16?y_dvaNWDJ)!%9AyY7`&;A zGYtdO&ztpDL&lcvd9%oofg42?iVe(--&f^QJ8>-7%N2ji#*ERON06hQrD1jqg2iyNA;=2mh%{W$snsEoiRY%(*oGzbwTfG^gr zSr6DD?C;B}+>M#AgKv(98Y*8bPIUh3*DoNy85tD?UKBWEbZ=eR}Ixt+kB1O393mmNeW2rpUMU4|$z}__E0e zS8Dx8V&8upUb?$H0YPFQ{|)8L04nHrfM9Uo1`mvnj5|H60rp;rvYZ_A{q4C$x#rIT zsVG`8F7xYK;Qr(ToL(#&^LSW1UPxbpJIEI|079Tr*}(>gSYQ{(lr)_C6#6k+5KQ1H zkoW|a`zW3{CWt9>4%njhBPQ(Zgy>-2Ju%&*BEd8<)6=(kp8;ENbTl+3CZMns>3V5BMTWR6IsnfL?II60u z&JgAc+B+^nqEz*)L#*oGQF z&Vg?D90&s1ZGY+TDQh9G69Zu2O-e$N%fo_!*x$XoyF0y;R~oZtotZHHs*nBqUARu* zfa#QN^UIenFFXKWa6jxFC3s~40UzLRuabOP8U<`Y;1Li2w^P_ooH!8VH4|XTHJKTS zxB(_TAVL63nE$d3af~H!)C1LfGBf@kB@4^(ho?JZ2`^)fdYM_A{f4x0%Pd9O*GRG( zz9#%|CktzlKc5pRP{@jj1vXenNJt=q{{@jkn4FsW>~Zw_^v3V81aHo>0cgI+cFM*D<~-F?G>k^E8Y9R z@gfuf0fGJdr2(D?ZH$v%V3z*&#@_C(C#d4n)9Q3|bp1166~&GotUy6XnY>f{Q>+nc zcWE9r`N{;^E$_%^PXt?i0PYqlPJxS>HNQbMk7To$IlobuVhnt_RaH0k_ZOQG-@MGc zzrQazti8K|nLunD+;NqnRD(h(m=)qtFc2kx1(&EO%!72LUe7BnCnz*umfBXl7aOj? zIA4{!ecZb5|46#-c&_`eZLbI=S!HA=3EA0%%1V+=l0-(j5oPa9p=7J9viDwzA~UN* zl#z;(WYlwgf6woa`}I8c%gy)u`MlrfT<5y3a}MnnzfL8ktjs*eDhz9vuDeJy!M^vr zp57b&(@LhQ$M5Z_?6O51)ryu#>(k5~bdnlY`@CrO9zE~a$EB#4kui_e&A2)&rll(U}o0^I87w`6RtO!)yBw?O@%4iqXX+_CIm~WJ+Et3_D!!n_^}t zC$YJVdhcHBq0$$=kDJA{`S{6`
    ehrz%$Yp(QaJYTncikTlO``w5JI1d~kzp9cn z6|L-wtq422yJ_$2)mMN0+Qk}R#E{`~MJ_6`K=i>i)YeLN3HkF$Viv{m<0XDEe2TxW zg|1rgCkZjpxN)TP*dc*+%yL~+zOQo℞sP88~G*bkV^D#E)G7+KIeLJ-k%5gn`DP zbeO(inxa)GXLKf7UyzobzT}##%7;*<9lfh!rKS$K`-z_C+;GKAGA)hw$VsfeAO%Pk~WQxkD%Epf$j)OVmyOYoQudL=7pEmkRz`-%tcaP;mMt!Bt5l|P z*v5v?#ij6PQ0KY8Q7>QS)2G!Y4)yjQVrDkJFtaH=VR0(c?CYP83kwTNOA|70f0He_ zaIhmq;Zq83)%SP(Bop%UP1Z8FS5s3{QsQ!KQaAnbP}c4b7ZU2b`y$A?-6GUUL>QY2 zjEi=WnE*Q7IHc41M3{?QJ~SgDU?Je?zz;}2LXm6=USjn>p6!NHQ&B0m8(v7vSZ*$g zl&d>m8~eN(;;r=bbi!UvzaJUTURqq3c6D{7+OtPZSo4(h*s#gP zn7J@sZR~Vp<5S<1{$rzd$#zt#^fRVoU^<+mf&UlYT1DIW5A=6#HxhjsC#;Z2>r=$u zGn|$WKHpC%DJeyDiQB(sk(QQzX*PhZ7oOAJSCJB|xnB48-@v$*9y`;PyLZs^M}W6T zyy$RW`Ow0t^Wx;aeI~?JMwd@dM^yO^IkfT=RWLWk4DZ=G(7b2Z$d86|IM8I2^l70U zWAWTWhw4AK&#kvk4gOo->_1cWYX5q7w$)(6c(miFr>~(p4~dgp&7V$$gX^6{B{rJTFJHeVY|pMV{hI0bcLES$tG4x& zS+%_cm0oby#M#9biLTKbVUDsaO{}|PTiZw9))jK@uc~NvtkFB?6pU?JN>1+>&E@;5 zBm^t#9c_acjP>P7y$yY4m&!UPF_gq7yI(hh+CRFpvl9i@36^Md1`;-@<5Z%>|0G)s zcB@vGlpN>vrJ<%?nyUGZhDq9mYZzIBhaMFfiIt6}H8<}3nD0Rg;>Nf4+t__^;p>ve zh`SC?;@^|gh5{$5oHXlitd>ywNLh>~u};n}mL=pat4$5thgK;x@!MPe3Y)xEWSr=| zQ4vD;uwG~Rz+ofjVQLSXU{V%;MGp-p1W}9=s+`8w&ad|AtfPE=V9EmA4f)v0+bn; zn8-@>N3ZQ~&NnQ``oVoCp#2SAHqs-<-tG?{{;mD=4itN8Bz#GPxk-AxmN0MU{(WyB zAGJ}|bJ!#Wx{#%>%3zh^y?c2^!hD2v zl}c=C4nf91$0r}0^;P8Djp+Y;mVda(=Xa&4GZ+#Zn}O4$-EQf+DT#?WqmG~4>uPE| zFL>9OW>MHzTzYJ{=B2={5}YHL->mkX=DDPEP6d~9gJHq?uU}Q!@y*FtCNVQJqil8N zg1kY2QFC*%_=DnEwZHSk*K->ieKGk4ExKeXU_n+^RxmkbUq>lM!UvH=+oT!^W3T3V zNagKcj{He=BQ&%fC?fy+_AbIORh7H!cKONsI$g7_g;%cYJQhE8tj}7k_R^Z<$KaxJ znXL=-Y#<9krC^idphI5wufa>^2GG2WnPcaW^{lC%hfF@q7}ftc`vprb{q-#oCXAe? zpIo|o_bz6bI9=#uExh8X^Ei=M}WZaNSYiewqclSQBLGtii(*a+_4=?sP zN~N7-qdV0el_N+Iq>{PUgjz0L^u!6qM~$sZLCr=pF<8p8hla-7+#DsM@hv&3zySwI z#0fGLpx3gAl?4?_ivK);!I5AB`~?!UVlnsuyNhW%P#=6*SfE_e4KZ%bHx*&Vb{Z59 zF`P<&n|@CRJqNu|jcw5Bx=+WxECqFd@M!G+e2(3q@?{j5zjRQ@AsmGy2;W?{=*YaW{=rIHzJZ~%{es0-qU$} zmjCD4TA6GVBkR)A63RJ3R8ut^4UDM0t)W5w@{d%IwfF-=Lqq&1s*PX29$##|?K{#L zHhBH>r%zX|T+x1Pi1)|7M`TAIPw(MF6f>rk4hAQMSz@%Yiq*g%At)#aIcI+XxiUkr zWr?1)Hu2g%GO4ZMK{nE*B4MRqP8@Z2_wqr9_20k4Vl@58ETpimT6t-aUGA@ktzhIn zd{;XE3qyKxkvaPI@4Sj*3dJT>jt4qdvK8k(e%$)~o&Kb3 z_QQAn#H8P*o6LS?(~3wiF5KYaWPG_y@u_y7R@7Nc7LtJlM~7--wqAIaWqY-$>`1SNtn ze#ntXmkRZK)bVCz-dl?U+=`0LPp(BqM7Vi+(gXhK=;X9u`+5zk(})pa;C#L_$;$h} zHKP@HMDSrcIxd4R3nyAUcH%@CsIXNL(Uffz;vGUE?}Txht$eIk0b)g*Fre!MJ1c zk*D3<+@x+|-4-_H$dB}32?AGI>)gtK0C`dm(mUDfV_2mnAV4W&i39F<9@EK^l1P(O zId>{6E5Rq>7o2-KJUC&ht{zNhwg(UL%o#Ojpeb6UG$4y9 zZB&NEi8W!(jCZk?8XIr8nwP(S|A$#)rxH%m zzUh1YdIhX3o-eQogY)Wlz<~J7Ms*&n$ckvQR^H(-#t<+Ie<_Nf6JRrEuC41zvh^W{ zE8ilcpU3=*)_Ui6n`TITmMwNY`*1-Ple%nCoWF1M-rs}}F^MS=*n0*m%; z|J45X6p48)k zw}MT9?SCg*{93}o!uq^G(zBYjx9t)JFs-&xa&vQk`}PfGx5KbXCPk19q>5u^)#}~# zMMYn*-2RrF#aw6fL!vj6_$`=HrKYC#{{4H*ws1MZ(mIC#GmSAJTM5sMixL>zu|gIP zbD(Q*%+lu=H+MGwjf!i-{Pgs3rKMt9&##fKF=+tmAVshV>yiw>+|hcqHY^B_Z(> zLm^H_-JE}!;QDd6N38*5=pa58zO)*T^h_A%UH@7Mh8)Z({D)C!3I*h}L91Bg?d9dA zuC9*t@dJNcy}kcn#0P^!Tx6tbH;*JrL`N55nwxq(*A;aTb3!5|m63{| z8YuNM`ODyqn3gJ$9R2q88M}owc1zSoKp#dKqGo$i2cg5zgD{p;${oboj zr6G?E3$%C<$g*9q|5f77mWRg{Y6DhgiS+mBN`B4w6Iu$J*oV0IBWS`2Yp6N>c21O6 z;7O1^sY+ggeC9f5t>#wcOEbyD%&ha+5Y;M(I9<%{Lcq!Z6^|8s2BxNwoo$2-(JXIK z*Nwa2wN=YjB6_th{3MJ@!OO+^tM6Ei2LWMtSmM|*x@W1HE#xNNAT3NxSjZYLINt<0 z#vJu&dAS~ebE*We5Z3J45MuVV?ae?77L|iuWi}<{5;;ZHH#Fsq2VKaMJ*-CQ6G|{qTH+G zbe^KgJUcd%DW!DMhs)>rL7oVB(F;JO^yv6m- z2J^|pz@VIH=infi+AzZu{8U*0xh6ydRONHfU>!s0Vm2hJ*n7-;%#`fo3K-7c-#;jU zVW0|a!I3*g*5TQ*`EgJ9zF?b6jb10V_8T6i3S3=V!(JCX!lbSS7i{g$+9dY$0mw5K zU0Cyil8D5<|51d^IQYUhnMy5JUNL9uV=4A|LqmQ3`w+KIpFSO+ 
zTna)HpNB9)lO?q8k!;0aSl>_oLMOWP0e~X+x_2ENLH%#--(=9yaGyQp_14SBXM215 zG7I+Bn|F#@pT)eLZk!wJuOXZpJO<3KVc`Fr>eLqs6#|4}r?pt8t(%q>)g#OG^9<>h zv=kjKb+X8%a+LolQv;6qB7`|;B0j#;!LHbE_!P*lSH&OTEpjAL;%t6uYG^o5H+q9a zA#7M=cWKNvEI5312b)68N;5%G{KmW$>}xu(`UvA@7A3$bgO?`@MEHC8_;jY*;yi}e zQPz~t87pf)!voDaxPl!`M#6_6=V7K(4L&e|5p(9{ZF^3?Hb6hFXJ8wPGduZmALL__ zi9P=j1_DAMQ!`n}?*k|iqrE;_IEs0s*oQ>%bb7WUjPcEgV->~|wTU@lB`)OW%u_y6 zp*EoZDbvV3%CEjTTzcRzUII=HV!l2{vvx&iVc-YCzF1mz&?|QoJiktcuw9pA)6>)A zPhw?bTb`ML=o5dRVa@Px9H5EpjoP>};ET1u*d+-)40l zZ9Toyc_i9UaG`%7N}>eDw%QQ)j3oYZs-4(}o$zKMj_sr)P$#ZBMgM*V2BM29QthLd zBFD8BtjrZ26XU$T(vdS+eq~Rn4dyzTLLopBhHi^LgJ}cqZ3%x*W2sv<~HkI2k>eS zaQ^)H`emxabc$E-r4>CE2XOcig2HWB%nRFW)-m{2UPdMwR1MEPHE z_LN&*77AE1AT{VRbYgfKEA5Nc0MZ10ZG#ciCStVPOHVTgE-GgPMO}}~!F06H5tmjZ z=Kz~=AQP(RG)x*@FJ9QPNMRM7UC$GGiy*D9VS1fkv+fT&f^Pn(dDMMkO94ndke71=m)Dnu$yEHw zgP#+n+`Gp>b^-y^Q9ipuD*iQ5N%Izr5&oXpg2J-04|sbJfpRJErn|*fAcvohOG6S0 zwE>N<>CAKFh=IVHpw^4WQYy;AFP?dN?lP~Iti1g2@z_OWr(=qW(=Zd@Xp^ZB5~7Fw zl`@kXADZc^ZOBUBx5oPL1X0|(cmKWxqHDJPy_6ImFE5X;lNX9chfexJkNNFl{f}h7 zkG_8V)2CNeRES^yK$c|mA@+YjDn#M%3DofRpFcR8g@8x_0ReH|q>s|l_E;XKLZZ0o z;XzABr$ML!LwS3CPy5hU5lkc}XoF6iUg@_j6J&}kEiFZ8Ltq-6nqt3k*xyV4>&$a{ z*H66%0+huNxVLOZsJP5hahTVK^~g!uU`beY`Nh4t@pWp3Z3B3AKbhOo z5Fb1FChj4b$__%p=mybQc@Yul32p3c0bjm-o6gbL(_|U<==Q)eiT{Ozu|5?O5I<4z zn_F1C>g~-^0qF*g4-sH+X&_wNJ1}2+HFv*p1H=%5j7J?vH2T~$_3V?)O)vf1R@g>~2iL!=mQ31DdrYS8K4&?+;-1{F>L|fD73tWwYD2<;)w8aL9RtbA^M`<~m7g zI20I1O++i{dODKjWUXa=T^%SMLf(cKCU?J-*Q&K>%AP4oq9;4p{jJb^l7VLi%#b|h z<>9ct`s8kMxRsL-E;zcoqgjKRNnUe=9u$pdp?#615~Lr5_yUe1KAu5_5sm;Al>q2X zfb+K5anv3Sup3;=*4O&}9~L9CgdV$8PWJFx)g%B|z*Q9#xlnNHi9_99_4gF87DKC&Bp5U=KJl(;Ic8v&R1E!{b@A)>b^VBa_MyiBD zn%nHRThCTm74I@$sCblZR3unq38sNt^W~Ehl0*m3cmCIB{}bPlDo{&D$I-lrPj1uq zE~Bzt~y{^H3rPxxFK3e zsP)hB57)ZQSr6m=L*_>A`i2p`)?)q@ie6tq^q6D5Cg3~JS7av_JWBDrtgxJj!`yMd z&D!6r1jT%);oe;Eh=k$-(l9WyDkB?HIqV&9(tL#^giu{ELHkY5+IJUmG%E?EI3w8! 
zk461TS{@!AUteF;^Wcr!X0HN-%fucxtByWAOn!}sOL4aA03q@{WaX7y4stj!h zMr6X2gN@B(vpoDs1vp$l6AI8iK~p64j4XNJkAmY_0@6+OLUdQU+t_2bg)=d@qGV?D z>XTETR?(F!qf*bMp7Km8hCi#iHXOrt5zfPb|UY0B&NJxT4e#AgnR_Z zeG%%gegUxX!9VkAb;@>`_dcb>q)|d76-qK*S6Uw_q+&Qfk2gBkq8< zM=+0J+pNbT!^jnbqESqK{{7fuaOyWnCEgJ6 z{%Yf*XUTjy_bk@+tQXoBKw<-Gu)**J#}CKv*w|R0%zD3ncYS>oeBF?jz#!vEYYwjg zAc?a?gS>9DufsxQ9i3hjVH!M)9v_#N!K(ZtnDLr?4qoZ=4JCx$A48p{ePZwil%ho{!D5_QNS!ypRrr@6#+lS86M>6@LJ&6to&p5W_-x$klM9n)R~uC3`yx<^DcB2> z(RP5UXJ%p1QWD2L-}$*Y*j@s+lsp#W)p=OO?Wn@}n4_-Xarr9#gGR*_*lq?*f$G>F zywKLja{_t@sl5AKTwFS7VF3Unh5_gDlWW10$6%X0dh`p7gMhGFOu>GLhq?@8EGafO zAxHtoaVp(4FEDEV6wqTAXm&oR&03P}iufgz)4;dNuiD*qP!EDD0Sh2RW0QL^|Au6= zX0Bm@2YAg}w?Yz5JUEMrL`x)3FtTGdI4rE8p&|DR@m9lSmgLk_6GKCBB>HT9s@KGh zi$Hm+df9@72m|09*CQzIQ@RU|a6&T16Vv)z%r_=|>f^yB{?+f$W|c+MC5C|0!7*^k z|3^#a5~At`o(Z+9KWBjMQ-J)}slfZ6fwKk{05cMP9`O)I-l{XdJJU)Gc_Arld*vpD zB0tw7IV4xmpo2LY7~y*vz7-(SD5e?Pi)U2D#MlxfTR+LmEGphjXP2oiQatp8mz%rI zPFy8~g`=DMy%7LJ>>TJLO{PE|N-ip%lP5==q||!!^L8y8{&R*e2*ZLQr#V*DyZMqW zJzwRDpNGEyJuba$gzJOf(aXW(MI8to`{=P_A;-wPZ-6^QT`1==t>EeT8;0OC%3I&w z!3aWF?T+aD`#Z2riABTP22U5}sL?E&urVuy4MY`bmud5=T8P1fYG==$pnJ0aGtAA@ zImC}To^0IPcrt$tPxjBBKLjc3H5oK`7hDNt=9Q^RVi(eO~YD%Eo z(2NsKp^IFL412d7$;lZwg(4GQeTvuEZxS68MCui-OhkRhImjp>Gt))< z0pjtgOINUA$5zXql`YrUTe`UMynM< zM9ktV=FEKHK$~fx9e2JWaZ*pG8^^wj{x$z3-jMHh5?{W5|9%x>4FQF`tCNtHK4_mb zO~KT)ReEx~A66Or3f4~|-hm@;`#)ktu#xcSnd41l&eoCK84Ca<;ihs?9c~=HGX~0R zzP||&2xt_P#PZTo!Kfpr(kJ@FQk#3O9N71LOhhEuU(Nao9!(O2(1$03WcDWVf&28ct^6R=ME^QGl zsMEn}6ZXCUz-o_`I(&auM_R2Rd4x;_gaS8C=+UEl{wp$lrb*n2D^#F$@ws`xEgvXf ziM)vJ1_HDcxd@p;LPAKMC;QF}A5C#06>RErtQ}&T;CPxdGe6)8 zM!c0H+FOds3Wu1m5ePnQF9Mez6$NHs7@88^;>9$BQvTLg{QUfc$ z{msJze1Gma{{4^M$;9T`(Z1~R@0iU>Yl6f}bwuSSdbfB3{7I?hh~GxbK@Eaw%Q^la zt*fjnv=_K$+FK*LJ|gZx=Sz?7;t#7HbO0j?gLG)%vL3@NLz171%gZ`?)zrc?PG{>Q zDD(SZ*{i&k1anO!}XFTDZ7^p`aX-EDlz;S8JALcdyq9gAdkE)z7&6? 
z(@q@K6pUhAZmgD|(lub+W@g^n-1PSuMC}dPSITij5U4%LgNdgXF;?ovyHp%>HnxWe z32yj;FpXl{iNLx&9f?!&egR$Ie<*A2{7whGzT_cgo?lrx;w05^Se>Rqi}#~@-Apq7 zz6d+Y$vPv^Qpl%hJ&vnQ7*sD<5KFXZ(- zm}j1hMHZbO$xe~r(*6Ooj9kkp<7#bfJ+uG#7xJiQ{LMxtCa{G#yFXQP=PK20Hw=>{ zZ)ykc-r1=vEV7+a5GczF1uM~@1RC|v2i2S)+aQ_Fgf0tn9l&R~xS z8EMA0dhj(Fjpf;NxhgvZJv)HUgSNOiv#{&bUZ-7(a+yM|n4-EQB z8bFsJHxhb(v?7W79IaNlx}W6bDQ7rWOuV+Q;ATEl{9t|CbzB{SC9p5Y-lvS^#Kq&L zPRyktS6`(r|Buyj1MhIudVEyTAUQE)Ee4x148M;kZhNuh2g}wb}5YHSO9N54@xSXcr`c=wrOYn5kWpCO; zrU7=3%hi6n5MkXKYVNn5#Bx`o-zoo;NEOx!p+l&vvlE+(xXt=&KHN=o))V9-9HC-Lmrq1&QtZVM~ZjY`p?wnRu^n{!S5x){F*BISXf6{0VSyw@%KZE zWS^XZ>w0s$Ew><0AZ9yIsl$s<&K?$8g^V%6h#yFrh%H$uaetV^2%EZ z2$u04t4ipp;dLVwK7nYgOy@w*J6=+IS)2LY3J{IYxo_;#a8l#!W3+ufR z9#=#sZls-kXd1)#Q?YaS1So^fX=%Z13CrjsWv8-7=zeAjc%vn)UF+4QHeB9C(as|9DryvKX9xD}LQX&07@|i-7ZhQas}-Q_4CRq-y~@6lzX%8JRNiZ&=(X0VEz}U^fwiSMN-nw!PA^yLD>-czSeEXI&16dek~0P z$jfiTLGuA`2bknYQV)dk3z=0k&e$RX%B&jlFfKDuW{cb03+PkUeG zS!%7~;IOz~lDaJjj-s0=JZmGs;JrY#?Y%nF2E5!+4z21_5OroaMJGEBvR<^VJij zCXV!1N={!f=p)^e6X{;$lc6~+{+v+3BnM}XTxxuxmFS} zA7PG42=oM%2r)E9^Jh1wU*3}^aH|4Wxj?bPnnPmp5=9S^IJmG!zfRS(r9A|X;7!lL zJyYj;&{5*$j5V!7U`{@)CC<3h*a=2WOAGnbN?(5f1p&9kF(=;D89pjo{0V&XL}e~2 zT3YM~IU<-$$7UADMJTdh%SHVP7;_x%NKL$Rw6E^}HS*YvBQ7=;f#gp(BmN~13`^)2 zz!O`fl~OrKOK};$68$*X88?H!blvmr^EbIRQiLr&tE;XsWT8`|20O2?gpA<`U7tUN zhw2gA0X~JZbn4jyQ$%k9rPg&9*~!kP#oP^9*zaK79m77Btw!1T7Ea|?wq;xrzCq(` zm1ti4R(PRE5x~lW(Qk!mkLFE$$dHe-&??P z0Guo*R}YY+zjVk!ZDnuw8X@6=n}w?g{L;B|=bF$>hHRx7%~A%(v!)(pUUa#M2+kAH zCA#HPAZ9(tT3ask*)Y7@qb6kO(o<-Sv5sfc0_p@|?mT?RjSXc!2 zMF1V5@Ox%h!1Z2v?s*C5RurvIVL!)VG&&0HK_3AGk~U78)1FX!4&$1?N5S|zqrv6MN0*Ler^dZ(1myy|o!G(&7il}7(UU3fLVBBl= zL+R^FM1}r!ZCo90+6xa>HMV>wJV>%Ue~_q$w59Ofz1smgY(Y%ARg~s z)bA7zER^dZ$)E+n_3m9lvkU5{f4k*9VnH74r^IKodmjJ%3fs+btL7afza1=Mn!Q3D zh!fXFuuZ&9Wn^WgR$(pG260#$r^uhsO8fiklr&nypzs0-?R})nxGOlhe~{jdxO4+c zO~Y{F5ZC)4MzvXMJvOXW*N82Vi3RTX40;&|aR%}YARLOHZ)oDi$2@X6R?VKS#3yuP4PL0;d;jZ@S?&_8!BCME_-VY=L{dc@UJ zf6&@wTIbFe*Zl?G8C}8%dWRSpiLWHj{w3-=R9pto1UN)!2Sb$8GnVpyOo*&dlHfam z)49@d6w7WcP+rNRd<8o9x33LRHRAp+toBmXX_j|Cv(MBzc4;lJDY?s}KgC3kHo z4@|y@Nhx8UAmu5V=3fxHT6?DvFpAx^9Gc<9 z@PW@OD;%7hR>sB@rYC&&xW7<7@dEudj*@H<>gBM!t!kgq)eS99!(oEz`+3a^_v%7{ z5kg%k#{uyNPVn>-qCPwgFu>v8S}yz{>uoVCcqDOE7kL2M@*1*>I1BBnx7N9H?`z#I zqvsJE)XYQj$0q(6XU?3tAbdbQseKA=6ma)HV6N=#?ZvZukAo@gJn6Bqa%uIg|Mq{d zy7~JP-K&s>5;8F%i?C-N`^DTvEKcXKJN65r(Gi};0MC0pc3^aV{{D>+e;4kSFOr)o zK@bZ-SDab1d#|N#V1!rr z{@pvYY?r{=!okt1eV8MR!4v(DXpEg1Ai1txG8R-of5L6mg6!I1VpXu zF0gEQ@+N^n=CEEH7O1OmuF9Tq9v=r*R@-K9dmmf{+2S*)nr; z=Rk5m`~qD03r2#}rJ!Y%CIJD76Zl;UZdXJhkQmm|yxRL5n|S&7+Dhnb zg_Qr{uET692|oPg4(-!fcz3{sU;xPr5OHG@+)nYxk%XOM@5(NphJzyI)h;BAnMmr7 z`~%b>bDKbZgk%W18Mj%1ZZ|9gYXlY{A;KTFU*iZ|#B(yS!oj4w&>v-DJ4~TRsb(|q zS(7e$$e|H3<(10JO*c2Kgi{#KJVbB(GP2avp$^M!!+^lRcAGc<{Y_ zedvuQ+?SI;riY@se&DgzIF1&=6c$gJ=XUmR=%;~fHH0c-b@fgp20L-qNg>tj)qg8Z zo+cN%wg z-pR?y4}Sle8XC57Tirg28~x#)##fjL^5@zJ&mJEhCOr&t*BIRi1_lMspWkqCxoB*x zsdPj_Eag>~rw;EvLiGnD9)w;{qX1o)=zqYV8{l5+upOSKLjhiitli?@)41a-RIX`A{%6Yte z`}XbF*ugzC?JVQ01b{A(=^J>YLrL`n7b8y@55RDm+$~2KaCLQKi;9jmXFEGP&p%== zw~3vZBmc#|^g;VV_;Q>8C>%LCITQ-{!S{&ycy@hS;x=lmh|V~(YgyjgIZoZL+rL}t zcU-#|AIk#V1>y}%R}qP%4TtFt=f`kxSqlW^XRoG$o#i7|8h4Vy9^vv!KV%@$CFv;= z`l|^vJC24n#zt zy~f6O?d^fSnP@*i8-G>RNyTv#S}^#97`^@qczWH@5guD@1A`ON(kZV#q5Gr#%HBO% zl$2qRyI};Ekf>1kmUGL~6aH%qSm`La&js_&2ff*^U!N5f6+M1@KQlA#cWe@Bm9eq0 zunLkNPp0qCxPL+jdhz0gR#oA%XU_KaZ^y?4>!PGa7)hoJ@Gl%4AHuw2^#q|L&L-bYD@c2zxbijMaaHT9dI!!N$HZVaI4M8c^= zga$3tWNw-i%+!@2K`d$KGRTq~yp@lfQ z78aN}aW=lcUzb;b7u`IBl#dVsT*hcJI6lroEDLvT@wa)x!hhH;xw&N~Q2U8dNlD 
zpK#?s&Hjb6`=!vk__wX8S2~rc6B+jJ65rr$nH(SIisM>qyIf|XMQ?xgDrK(Jkt4b2 z*?93n^VY3fV)gF)H#CGfAhYue*Iyi9Ui&HzASXjioKAUwqej2PblxrE} zEWEh{J%NXE zVLa3#d`&k;I6h;d)-AW9;wM}vFv^RFh#=(x;QU%&=ki1N1mRm(Cp+PCW*o}s|1 zp|8XB>rqT!S`lo~OgW;OlYQrBpERLQxZJJ>23%>U_tw_#hl$i9s>;ie_V8*~Ej0&< z(Ls6Y)T!X-%AMty74nkeKqcG}M;VQ-I6L3Lp@x|7=vF{HoY(QTGY_tB-%UEQ`)qBd z4cyh&uV0UCpZD;nR_TS|o0)z#g7=;yMvAXXb zS;)Ru$*Ur&`hy5v)2=pr-a}k^;Hq#I5P4x82x_anvn{nTU+Vk@T&??Fgjm*WamviS zx?Bv?z5~=?txVb_ha1qSEN-D0e`bljQ&gTM#h@Q9OV2X2HkXd$uuM zojUaTH zd&Bm_(J=iA^*WMl_G-2n%D&4#cqlTfzkQDa_!mw2GB;K9WBI3zo*1Ikq|4X6(jVQvOUyyk;&?i3d47*@kw zg9ryDPh3i>y0DODVh6XQs;Y|dQG`bd^hbm)yYd0Vo2!`6 z`VbPpydFXwaw1CK_}9^N@D8)}kWfBvEG|BaX$<}Sds*ALrhpbva)0;xC(kRJ83HH? zZwsQ$LUJ%$WMcuoM|L)GdBTF~q~FPtCli-By{*vGadoInYdUr{ezOtxzOJ@5-Z)q_ z8?|3-tS-7yqvr4Z{X$Cp6SX_ntDWzSKljFR^c+}Mb9R1)j-w|}>>=zQO}Ys}2yrN} z&V>6lWDX2%0j<|_EBZAEp&#lpv1Vnq8Z9T0A$q?sjgPS3pf;E*^@Z;<({(u{}f!(N2J4 z3@@hEj|zOm$Q%Y{%EfQr{=;=09#*^iJ-auqx$n&zs8PsVjq`fo;sF~_wNrf8eoL5X z1v0a@Sx9JPY|PBml#P|O&7Q^?euShP!AAy7Ottm(?Of}1smSs}=Vn{9i1-U0w_1Ki zOB-L>5muf>DeAr!x8bV0a8ldh;` zBh^vzyyW(9gvL5_nE)9>wuG4(kJ|-Q80sTjB_sE5?_?u$FeSzS5k&+f7)lz_JS$EG zbshsJCnwaOn04df;epk=`m!tx$0;6m;=ikuL7VxH0F(#m2{v$yx8LS-pg@ zseRTJ_^E*fSg3H_7Blgaz}wy3-6_e*NU3wjaGh%^E927<2KM1tv$Ll`Wy3g% zt9EwL?9#D0`DkdPzsnbDJI-U#?CXt2T)=58(glV2@-St+AR5$=Rq@eL(9=@PAqs(b z1d~g#;+v&{PbZExRA6hAbEUVsndu#o%Rs^p4G#W+!xE1^pzIK?G&{R|YrgU)bQ|Lj zKs7?0V=Kl2cD${#ZK=paL=kKe2rl6M)jqI+kCKy_-<{M>tREjY$!kPKCh5s%_S<)1;5%EI_1EqmeE%@kLinWDf8%C`M`3>$oO{ zX1d4sDrVA065Q~TCr_fDyMw+CTn))!8wwjZg1Y9`fIv!r2fG{&C9eL8kp%>CW>*OQ zlaBWGQfk3H+4_q!GuBtHj!jQ1Duhr6B9^x~TtOEIn5w;WbaeFen5uFc^#l&INq$}3 z1lp5ceX?xyDl02Hff+aGltiw}T)n%H5EmEs-~sDBZYF<2d6*e7<_@(AwyHxkI#mhPPm5#xDF)|6r@vbZqQiUl}+^3@k0DhlUd4<7ejQ z)eS~4r$RH7VDwj2Q?rSNHslVn9RMEA>h>BuLwK3=Ye6x`i+{z<45sj084qy1lqzn> zIiNxlePs_uU--u%VWFoJcn361u3+!L07C;k1I3w-;O1&U zS0mGyP4W6Z#Bj&LLWHN`v9Y*_h-=QyF%c1iZ{JEuN_K7$Um z<(QaQd`>=EuYsm*1B1*$Ym8Z0fV8sp0tC6y>qCbh#i7hx!B>Z$16cy#4mTVDOoD|L zE!1RfB!BRK;VQ<*$E*CRII%zMx6EP>>OgMZU(ipXkA`H6^7#P;;_)RU7y#0p zKR=AL1z`8AxLEk;(QW6Zva-REk&*ZBEy}q~9*US%)As6N;)1SXR()fq^gb4r{9G&* zARVwce}4boFan#450aOc2Nwc3Yya0OLZH^_!T2G4;$~`Th1b>9VFW?-N;c(x9v);A z6lj#gt&xlw0W={9pkz1P1`hy2;6{&9%$WkZ~n3Pwh7SWyIBXAgD8E zzMvNozZ3^JyhemM3SWpYMCgAb=RCn|Eas2*k@*mKfl%JW`ZmOKT>ntvKe?yl#Ae_x z!}mf@4T1+7CudyfkDou4&?1Ps6DJ)E8OSTHn*N6G%~2d zbnPO4C1q>QbN9R}tAEwg6QCKAiH=JO7aGI^ulnt+EuF*bbo=s-jYJzUKqcb7Ok{{| z*~3Ha6CI?}sgd~7Zd`B&x zjvn_m;ULx?WUEi7y~bOWLSC=SyYE!gTeL4(8nYUJT-)^Md=(UA8!!BQ|-_35Ww6ap2t-?lOZ~X~g-n=?$auglBuimcMF6Nw)T-KF8Z#++-5RtY;7D!&z| zA(yG8DuA&?D&IKU6S0F)HXoG@6SvSH9$|MN^(8KeSf)2)ItW2_AyL5G!VAX6 zV61B9UoZ!%YqjCvJWZc(e3&}QT|s2vjTvQ5|LoSLOn4vK(TTvOAX)2G8s zL_|eIcu#ahh)clrdzNyDA+8%sds^! 
zHAPZ0elc~tncO+3g82p@R1oZL-Kv7`2f+@#GR3Vut6wpi^30haPZ@I&2CBW;Y>eed zE+oY3yg~PPR5qgaJjlq1oge8GNP4pXe3ax{Czza;<^$sQ`EyYW(~G9~aUH+-&Yh;b zU@BS(_j#?u>Lp}k==vJ)`mvq5ElDA0ikUC?mq>7kaR6>w zLfl8GMKSCa(Y6_rZeJ#=M`EH8e!Y_S4|P<$Q0FkLqAO`XrUAWg@w39WDG!^q&uo zasU{Kceg}~YzB!4EyEyupgI+l1-Afh!?O!iyYltx?(XhCE`jGQBy#fd|34-alq6c> zfq4LtQC;$7)>Kv9_Y#bsfLHO^vu9;-!^fXRsPll0H8Yz^5qzdE_!{!i_3J?uqP&a| z>R@rGHr3K_?&0R00F+=|2$@H;aoo>LglwXzzos$IAl$ zhX;ak0W6en;1wSsAV!QvIy#T@p!1ja&wZie#}PCz5(}^`K4ucX<=&%5ukFN9gSff> zeV7fD1;)60j{AEhBGJmdfP<&f`uMH16SVU3^D$M|eIggsmS0j5W>E%@S4m09h)0q-+-9bx4{`HXHhp(C-Owa+ z{(MqK20?$ZiYz}Cf^&s;a`o!f_Kx_9+}tCeGD=JJ31g?a!{TNCIy*RM>FL20!$3s_ zmpQ(Hp&_Lk{eO9$X-;W5?Z7} zQYk`6mPwYRy#}q;P)&)5Hb$hRT_{HOrBb#Gl_a!0uY2Zsj^p|L|NH(i$M+bG?)&q3 zzpv#y&+EJ<)H1k$bP<(f>tY)zoR;YStGfrASx2(#vn1pIl+Sv}eDuUvGPnw&u;P zexsN3;tO!AN~K}Av> z}TxU5jcS$b&4F59;C1yWshj%&okH0M~% zGqx*^^4GcaDYfGD=IVUO^+`hak!Bs|`u(}4O0sy^#|_yR+jhm16xOC}cZUhfMe<=bEMcgvyxef6lM19R1BxTl1S>Mz~7ijHg%3 z4J^98FS{`VaO|>$=^^iCzZBaz(8BJi zQa`Kkm^-gGZ??3b@@YZO=cqkf0{!z+=AMeT>pj#NX5gUw)J@y+>8{rDq5EV!9;;T5 zNa))=VRiM%Z`u;o4fO$)U3t47Y|c6|rO&FFBMY1^TT53Do}Yd*=eX~9l{)JwUkkg% zkD}?-hi8Pr{P+JT%k-435&r-GhCeE+YfZ2#)$chIHS>wTM_ICXfYF49B_o55xOI0b z_IdunHANwfpK7VO)KshK`8__mx+>di_p-!g2PF>_>h%apNfq;Z66N^0wngr)55L^EG-}~%oz4}`Me943UOcw0##hM^ zXA=Kjuc4m#?rM=2+`U`sOLr-EO%yJwfcnLy8o`{q#<_P_^kfwCfp2#2?&9wEv^!1r z?1tf{cSQe8X7axjc2M85YlN|0`+|_yeLV)Q6PG>ZZ~ie7eHlEu-1zv1Esmc-Y7h#6 z#@ar8=4@Ew$A9SQd6vj)XDiR+>iS@$=WS7X$hx6UbLfMKo*1|^phe)P2!4Sn!Aa5~ z>j|(TIgX8utB1!FyHoyoYBOU0>pRF5uyBjQ{b$u&*S<7ZUhE$f2tGS@T!B5CI&~_) zc=BWrshN}j41^T!2<%RV?F5oQC8m65qDIZLXV;*{BZp`!T1Wq^6qA0FjLdByz6e(t-a=`b?}crWEj)@{8`m~XV(^LIT|gP&E5|e_CK0yVge6! z>e#U#X;pJML*fg|Nb*#H^!uZKzZM)^%7bAD_c?C6z7vUIA`=dBRA8Qjos+e7Cdda| zLy_iQ(yeG-7YlmpQ*FIBnvv_ygqy6Zt(qfHkM;!SKsd2}^XA?hk0jmUHM4JC8f@f9 z`X+K=sC5QI7F6a^eGh?&V*>-U?Ej-_NX&tgjgm&P8+;lgi=R`-i8O{tPm@lZIANsb zlJm4=@1YX)aS>xJQt!HD2?PR&jZZ6Dnsa;%>ZW_pY%hK8x#Zgp54zECk<*FJ}Dsi&`>Fnx=9di>qBqbq&( zF`Fi1C|vd#@CHXSBef?vYpwomQp@=*lGQ#WD-wcjo^xvc{(deqViYYRnj~lM8oI9u zN#jo?X&2q)- zsT&=IywvpJ_j1lmUN_`+QbNLTL1DnSS<2_q)7LXIfguUsx9=>48*%c-Q&!3;IFhY! 
z_j@?s(NFBv;gnf!gAVFH3v3)S{-Ehe&jwToo2h)kd>00b2lgB_?qUhRB$5;R3Pb}Z?2zqoM7F53ZfBm>m?;Wf=RI`RFLlK5ugFd76O)zHU|b?e`NJjK#)mdYWCzbcw{dVdGUnCV%j$XigmGcFY)v{W z{j%dUPT2X2Ir}L+Hom)eQ`WoLBQiR>mn9Cld1+LpLM@)*FJDd+NSC^0MMT`@Rw2*X zX>cknZJdQ7JzF+VQG5d3Vj^V+&Oe z{f?T3YMLH;<3^?u_wM2?i-O-uQ`1z$6g1VqY9BiZwZq z0EiShFzAk8;TaF0jzn+0?aM2}03!^)uOddI7TUWqGV3P>^bZ(40zOg^2!sOZWm@(o zCb#%JFaaEN{2@^ZRpeRk+0%{Y_igp*&Fj}+W43{5o<)o99-95r6YG_3+fkDa$Hid~ zJO|ITv@|pR=;g~jd~8Cgz?3_zDHOUxClme+R5wF-;HWz6fO@p~zbgrVrwVY*h&oXI zhOwwl3UX`-A@xrq=L7{>6s+$lqxj_3=8YSJS7upjFb;B@5gz~@E|SS&F?@DA(VC4N zGv@M@E3i9@P_HAN;zA2dmpV9Xewv*MhLf7=0cwR2($b~haA4Ed(GfF6#Xl3z>C{xy zl`9R^)Li(lh?~O6*Rry<@F5va#Jhd>_m{tmKz`~}i;kPGDan{=$HM#lr%z6qE&=)Z z1}E-1XSoG4z7kt`;o`;FfU0nKuW%+?|L`neE+q2zDxc}o3eYzpX=EH__UbYj#XWUY zt4tuD(fEjPuM6J~O3Wf5(G)F|Xbhi1j}!T3JwjuF_J1AdJ`&qLBGQBtX6B zgcG3grGRxcoul`yb98X%S9++(<$ZHAMkj{8Es5FL4^VCF>}*4r2_MKIhYULkc4vMlSp&q3 z`W#sbj4gyr&#%|Lr}R7`9=i9N$TSQ0mP0;?xk-kUad@q$@C2i$J(H0ft*a{%z<|+e zDOFWhZ^jRa#Rb`M=CVGloQR5z&Ea6vh$-th=lL{2838z>f&d~~#q<;r{PsAZu&b>~ zTa=U6t;5rD7i}nnk$uK;iHy!=ams@SZ%0tg7exe>mL`$}6Er;6t?M6r3#t#gMdV3O zoF}<-;SMSvFZq3fhK6_E3*IX;m?rzUwsbY3PxbT@>O;iIMPtLxl+2OHz0&;*NF7!W z#&j9bO!8Sc+9YewwQEDUS|K4KC7WNnNUl^|{3nN%*%*q8kOLzbHhQxRhyb++EykWb zh8o|{J%Ax_4EbqQwg$ykzQ}aNrC&8%?Da-p!WFgp4^l1J!)gL-s!T#Jht8Ku+CsNo5Xs`uE?X1=6&1MLv$XERgE_@q@~4 zDL1sJsGg(Nr%xZiZsD5R6v+#FsgN*4Xngd@KKl=YJwJ0%xg&j~{DKic2mXaJ543zw zE~PAFyh!7LYtP-v%B-9mbSTUd-)ADi<7G;c8*vvSW8x;M=n00lH*eMv_v793h-5DQ z1miSH-ISD|Oc(q+1cqc_QM>EV#F#I0zglg+5ieAmcaa3-vzDb zd{KCAfRcUt)_&EhLh3ELW49$E6cyXhbWWW%4V414p$iF`YA9NOfwyhI-HSELmn%v>0p##-9jSr}9But;pT6%`A>WecM;h&BVi;=Mpl1B?{m za%8@aN@;!ZQ8`jA>ihg;OSf6%Iu{!|BHdBhY!6r zY$)y!=0$e>)7aR~6s5UgFmH_sMH3V8ivUIhA?inNaRbIIE!41}gQae*j2LM|h>UE{ z{eTJ0Di|5Z55B){?!h{IruE(kcYuBdn83F?O?2xDS2kIvcFn(#FjlOvC(xsOR3RFYC z2JU)J0XHb1zYn%+$d8(uzM&}ihYN&S0EuAD!-r0}o(j2|cV18lXZCG@+S|9C`(f$nj%tZz z0{f@-XQ2is6eAB4V@@Fz3&5-S!i6RAjAN-NEAP`NpgoF6Mo2z)JGEYJmBkB{$XOA9L_PN3BM64p$MwLf& zB&S<-q#|Wu9&6~ZVQFX1H2mqpW&jV5rM9;F#nF!PfhOzd__j6n7j@|Q^IZHkJ;@A6 zJ}dHxAeIuZk%Ar5aotagN9ql4fJtX)+iDZw6WZ}NK6yb~nYb7qG1I$lpL;je2O3=_ z^0YKHM-L&eAU`X0+hS)Y2rRN*vgA5*E_-3^Ycba8=s+rQ_`w;SR(=f7lFK#WDH9ao zbdzK&8ykg-m_c)-{Cs`U%bCdWk5eAnL;Q&esEtgM5{)b;sI*7n^jhx1MBWc zEh#HYKYt$U%IDUH6xMH{UYIw2Py2h2?As8|*f_cmx$o;CD%A%kmf)Q?M0S+p8`rer zSFd__IJWM6**%qH{iLSmrn_GFej+teQZ7K0UH=7*R5LuK?H}VDwLUjDkG>#nXK!yS znBsb?Z;g|c74c9chSZ3-^~}nC3iv-L-K0sA5>J8owTPSA@T})*dj2H;v02Yxr{Z5$ zM|Fx25>i(pW#Hez?p#g^p`wE?G#1;BxPe7es)xoYMQ(97-1>Imi{P;$`CLRY>crG} zrQ*K4d4Xy4lsvoBh-L@%dt5S7@^cM%BUwd8GSKq?qepn?&78SBJEo+Bq2etVuGo$O zP5LSEcFnwbeXl$>+U7FD(C`Lx90ATLVX`w!=sOAFc(sN_Qq&DeR==>=0^uDB_=FX`}~voR@*g?yMmmIaH4Rys-l9@S~Oo0 zQLi4C!ZcKUhr|FSAP$(2-R<|y1g4A}4tadP1p1zU6`n1Ho9RMhW8Bf5MdtczIamd_ zyn2dTeaEeL%k<&zuV)W%@d{KlwNI}m5}r#;+}EV*I)hhs|GqA7LN;`6hgEB;TCW6` z1O*yyg9~zo4^+`sh#BD0(c33(I^xjb+bbsQo%a&sOwmRN1A`}*Hw;n!?JM4Ymj$M- zbFJ0nzy!NIUTN?VB+K~K$ z9|T8=e1M42UtX1pKY8%Lhen9P27s5NYJEO)-=~isxmwJE$NJsD8MAn?nU<0$mm48kX^c61(>OY8S?M$)5yAKngXaeE+A3l1N+``}k zoY8&hqIakLFS6dw(AaNa*GJ=DVud05@+v3^M9W?N9gXMQhd-z=Jp*5si}eH+>kg>!H=H5dZntYjK`4=`ga3?x;lZB&PXZp+v(n} ze%~YJsZTlr`h$22a$nzl!-fI(Oi08NkNSUqE^0;65 z_)ds=QwsC?TM}5I!ZU;wx#;Ze`SA|Ztv9Q0)_Z%a#YRVqJgy3!RHYfe8QBLXn#Ygv zhdK5 z4ydWuq40Bpvqoz2DDc$f^&J;OP$r*+S9sbo!B)%cYFzo7qV~X&(?cRlxjZmZ&~QSh zMH@CWi`H0TUO-dInTVafr@fHM9LcM$GVWrw>6Zz_zH4mKJ3atC&_l%_6Hg-TDkm5m(S0nBCq-2K;2n*q_l< z=GSd(Mk$8Z+{mpS#uXTHG=-3GOn;rCRs@WjNNb9HSy-49Ra)i(Ob+0Sd#Rz30ZrfC zKiI_6Gw5Um{!^AhMbHr3A|Vw6Oi~4NayQZb#PqE`d;a`r#qEX39hUjj4o;IE7ZnBQ zWH4oi;zUQ8gHI%q(vKpo4IfSyO!*4TG@w*&?JyE4T^QxtjCxJk(AbkFQL+5}>=^)t 
zbdSESZc!kiPc{@L{XtEQT2tqI@p-sG_rC7!ffD@2TV=!Rm->3wO`B%l)Wk1?YsujL znk+}k#YeTZ!SN#xmB0D=<4KB<1sNC6lctDsMz8g3P-ktRvsmF3qZ*qo)^;&BY`S&p zSNs5ASJ=_R;4egyghViL3eb$cvS?58>pMz8Ntt`__Sg83vEZt0M`29Bn|^Xf@2d6o zYvh|E!4jBm-?lAL#=rn#FP7Ci@SFZRJt>< zxcXV{IvNr5DN`m-USe-QNn87Mrt5D?ce=$xyf{4O+Un;zEtFzkS`D8thA)@%)C(%p zq5!4Q{%o`5>-f7jY(V2$gNQcc zi0h9j@T(b6_gG2C=Iz%XUQ<^H6)_?;S-kinuj9`@=K?cvy-2M5Wi{u`yTmG0gp-$= zJicF3!=wjKn}8u~sERD#s9qV`x{A(3G7tdL%*#Y0f{zz)Fj5c770MBy<`~jw2BC0H z-~<#+0c)m2qO5FL^I^!`fmi-GU^`^U5#(J^itn15MEXwzBVf0PwJ%DPQjYa82D)>C;yS!#!UIxq2e)64UQKyQ*WGWL)rsnYRjj**uw z5;vFi7H^T$R|bff?x)Z;s{i&?)@IQ{eSns}ql2!5j-?AD`%%JXE<+VGP(}tR4$i@T zP~Mm~_8&BMUlShpg9QS9grlUEcWzvGeO3ZAd~eV9LZBwhTSzFP#a;Yn=o8xvmqTBu z=1`h0#}WeV_n0O7+mYIg13&*FX|X$m3i^E z@7w_#3G`a7mfT5AK_x?WA&ApfS}fSzTC=F6M|SO0$D)|y3JInf0#4HY8Y&~?IwETs z`bSKXfRoTb%KwZ%eq6;=G>+hH&Vx^$mn>HJOTeon@N>OiQvWj(q43Q*yI+eM{pGm% z{_=k$YgDZ9@Q6=HF!0{0?>xqH>95NYF}7;WAqfdi5C}$UG&Dh~OK8WK;tBeo^YaE9 zMu(ew&kCOha*JkZY8wh}IFV%CSlY7PTjq*S_|n{5+%YP&I7vg=c{rM09r1Z9n3HhJ zIx4(cy!Zs*3wm{WI#>=2pn!xC%ng+Z!C(V$3bAFTe}Bu>FQy;eSh&}Chdgg1;p9n# zoC?ayN28+DHZ&I3d-GGE%lbRp+^=3%+ap-t($S$5K92@@Z1(-=)9Y%KRgrJNpuOv* zQ6PdWJ{XLkh9?h*S$~0jA)pP!7+$vShp4I}!(;b%CnO|Xb@~i#RWUg8Mdt5kZhCo_ zsA*C7r6GsbUY2{$T@R83!ARW-V?hB02yi9_;_wku3=9(Pu30@0i+1Fys!B?d%VXVR zT$f0@`Y;$`W>!wz0aZ05rP3IDG+>&;j1j4)mAnxthq(v7wzSlJ{eh-)%(~chhyOHw zGuQ|xUOJsTr@XWZoPECj06-@ya^pA`L?X`d(RhKW$SvBZA+RZN+o&uZ7I;tFWj!%6mKVhVdCR!xtS1p3X?N0Uml#*22%J5+C#Eb)8h7Bz5C{t@@F&(=Z z-7VG?NYoq7C&a|~FftX>vARX8-)HIm{qwNMF}D>eawnt zvJwSkG0%E|-yE8ezj!pHa_;8Ft1raM2*G}4E2>pU% z7OvWNP8pu4bZ5^WR||0vpF^jZlN6M{Aw}_@G$}c`>U;MPY@5uy1AYzL%!I^L`~^N* zqlux2oICfo)+*)v1~Z$udrL$BB)F-ycCbl39ZuQ75!bnreCH>H!`WWW0PHbrU)!Pr zeIcQ^PS~P=iSoLM#fZtU%n2?2!D1;AAD4o!8Qy)B28z5w>XW~`z&?HNqFqMTyhl;L z#$eZ}<%f8 z>Hb~o7Ashg6$#2u-VwMOe zO}h8&nTp&daPC2)sO+aJkCtv{0#d?8L3RswkeRgf^pvxm-!SwD8Xt+aAz}NXg$se> zTrcKua1}?Zq!mz0P+~bvasb>1(5Y(L+gdtAXE=joGP1JsHv+1$)QOsfa9&|$G)^kX zc(s_z{;0;>7%UweaV6NWL354A>eX*JW&>5Sa&sGf@&G`1E>wp|K(C{P2XHN#Ly)0$ zs4&-OHH-3u6^@Sk4C&+6HY&i5iqiE_cdQByfDgU&>gv&mO$AGAXr_O6O_X&CUx*}no%xCtD&E-2U+ZbWKi&mFFW8Hvw zChF8^EHCOMf!yjFe&NmtBUx9*yyl=mv3%lKd#Whn=;}3VByT8$9|Rz_4x|}_>@u|> z#?AWyePAOPtfDeuXxJ;}i3oM$Po^Y`w)f6e9Wml0X?CcxvQY}4yk`gi5QJ@*f+1w# zZat(JSDh|0MX+s-erT7tcmMlz)tQ+$FE%iV;vQ>y-*-ylSM5Ukn+NUSgm?^FX5yqG zS^$=t+xMkBaKrC${mNg`h(fJsSlQSigd+G6xcCb|cIXhoR>%U*oszh7ms60EB6Kq^bJv4} zvTzsZ`LkZ_KRE~<<^pBq_f*X2hXvB2QNCdB&*_&g9l^w_wY86_%kxQN$4XXK6ZDXg zQf(FOtypZKl78{x;)M(GrPkHc^HZuC_10wj_9^Uf*}8R@u3O(thp1V@!*P7@_V)H| z&Y)GwH6O?~;fV82xFarkd~jNMz9!Obq#-UkfaWZiDGKG|a-{GI=oidn2@ou$$lTD; zt3m<2QA%B%X&L_P)vJSp==m40feRHhb6&RDJfS&rUNhjbV#RgTJDlk&yMvnMsYer& z8m>0D2&wf^X%-YXlLv_h#l>@1t?E5Aax`SXo2U~fR#;o_TPNk9#cVus=n_YD8EZ-@ z6_w>IS9f<8Y3HkIYIGP`p?G`r=&-!_p>j8<7bbMtj8W)L950b*fFHB_1?TYoG2Aa z+0{;kGgc@h@8_fP7P|k2SbM>O?95D~d{5s3ri!ThK~}KN<+4FbJATC3g3=CZiZr5d z4ttr`wY9)ak1v`2PJjc%L}{~n^`F!!1R;#47GQ-+$<6N(@e0(ZR3S`s3$vs)t@0u4LB_97mz=on)p$Gk{_dBeQlH%{b(Rm@ zoccPZdcqu|`XgcWmrcsA1DeV0ch5~PT9~`HD!+Z`%uahxs^q{j%aWTBr?o_pi=m;`j*&7k3z3U+GCP4-E}Kne%U_k*0Sj* zC2y!xwBXgR3z<){1wXY?7k{kNUtCPAx3?qm@!#iH?ynK!e-@^T=Up^$+5dk4HD=kj diff --git a/examples/inference/cpp/bert_example.cc b/examples/inference/cpp/bert_example.cc index cdec69a1..d06501e8 100644 --- a/examples/inference/cpp/bert_example.cc +++ b/examples/inference/cpp/bert_example.cc @@ -9,13 +9,26 @@ Example of how to run Bert inference using our implementation. 
 int main(int argc, char* argv[]) {
   std::string model_weights_path = argv[1];
   int max_batch_size = 128;
+  int batch_size = 1;
+  int batch_seq_len = 10;
+
+  if (argc == 4) {
+    batch_size = atoi(argv[2]);
+    batch_seq_len = atoi(argv[3]);
+  }
+  if (batch_size > max_batch_size) {
+    throw std::runtime_error("batch_size exceeds the maximum (128)!");
+  }
 
   auto model = lightseq::cuda::LSModelFactory::GetInstance().CreateModel(
       "Bert", model_weights_path, max_batch_size);
 
-  int batch_size = 1;
-  int batch_seq_len = 8;
-  std::vector<int> host_input = {101, 4931, 1010, 2129, 2024, 2017, 102, 0};
+  std::vector<int> example_input = {2859, 2758, 2051, 2157,
+                                    2005, 6629, 7566, 1012};
+  std::vector<int> host_input;
+  for (int i = 0; i < batch_size * batch_seq_len; ++i) {
+    host_input.push_back(example_input[i % 8]);
+  }
 
   void* d_input;
   lightseq::cuda::CHECK_GPU_ERROR(
diff --git a/examples/inference/cpp/quant_bert_example.cc b/examples/inference/cpp/quant_bert_example.cc
index a1d96121..d58d8bb8 100644
--- a/examples/inference/cpp/quant_bert_example.cc
+++ b/examples/inference/cpp/quant_bert_example.cc
@@ -9,14 +9,26 @@ Example of how to run QuantBert inference using our implementation.
 
 int main(int argc, char* argv[]) {
   std::string model_weights_path = argv[1];
   int max_batch_size = 128;
+  int batch_size = 1;
+  int batch_seq_len = 10;
+
+  if (argc == 4) {
+    batch_size = atoi(argv[2]);
+    batch_seq_len = atoi(argv[3]);
+  }
+  if (batch_size > max_batch_size) {
+    throw std::runtime_error("batch_size exceeds the maximum (128)!");
+  }
 
   auto model = lightseq::cuda::LSModelFactory::GetInstance().CreateModel(
       "QuantBert", model_weights_path, max_batch_size);
 
-  int batch_size = 1;
-  int batch_seq_len = 10;
-  std::vector<int> host_input = {101, 2859, 2758, 2051, 2157,
-                                 2005, 6629, 7566, 1012, 102};
+  std::vector<int> example_input = {2859, 2758, 2051, 2157,
+                                    2005, 6629, 7566, 1012};
+  std::vector<int> host_input;
+  for (int i = 0; i < batch_size * batch_seq_len; ++i) {
+    host_input.push_back(example_input[i % 8]);
+  }
 
   void* d_input;
   lightseq::cuda::CHECK_GPU_ERROR(
diff --git a/lightseq/inference/README.md b/lightseq/inference/README.md
index 24b2bfad..10577b7f 100644
--- a/lightseq/inference/README.md
+++ b/lightseq/inference/README.md
@@ -72,7 +72,7 @@ cd examples/inference/python
 then you can check the performance by simply running following commands. `hf_bart_export.py` is used to transform pytorch weights to LightSeq protobuffer.
 ```shell
-python export/hf_bart_export.py
+python export/huggingface/hf_bart_export.py
 python test/ls_bart.py
 ```
diff --git a/lightseq/training/README.md b/lightseq/training/README.md
index 6147a4d8..fb5dd74d 100644
--- a/lightseq/training/README.md
+++ b/lightseq/training/README.md
@@ -21,7 +21,7 @@ With only a few lines of code, you can enjoy the excellent performance provided
 ## Features
 - **High performance**. In WMT14 English to German dataset, compared to [Fairseq](https://github.com/pytorch/fairseq) with [Apex](https://github.com/NVIDIA/apex),
-LightSeq can provide **1.53** times speedup for transformer big model on NVIDIA Ampere A100 with 4096 batch size.
+LightSeq can provide **1.53** times speedup for transformer big model on NVIDIA Tesla A100 with 4096 batch size.
 - **Comprehensive operators**. LightSeq provides comprehensive efficient custom operators for PyTorch and TensorFlow, including embedding, encoder layer, decoder layer, criterion and optimizer.
 To the best of our knowledge, LightSeq is the first open source project that cover the entire training process for Transformer-based models.
In contrast, [DeepSpeed](https://github.com/microsoft/DeepSpeed) only provides encoder layer. @@ -38,7 +38,7 @@ The following is a support matrix of LightSeq compared with ## Performance Detailed experimental results is available [here](../../docs/training/performance.md). Here are the experimental results on WMT14 English to German task. -We train transformer models of different sizes on eight NVIDIA Tesla V100/NVIDIA Ampere A100 GPUs with data parallel and fp16 mixed precision. +We train transformer models of different sizes on eight NVIDIA Tesla V100/NVIDIA Tesla A100 GPUs with data parallel and fp16 mixed precision. [Fairseq](https://github.com/pytorch/fairseq) with [Apex](https://github.com/NVIDIA/apex) is choosed as our baseline. ### Speedup for single training step From 3bb9f73bb6cf12c9ebe495288eacef98cb728069 Mon Sep 17 00:00:00 2001 From: "weiyang.god" Date: Fri, 8 Apr 2022 20:30:18 +0800 Subject: [PATCH 26/49] do not use ffn2 out quant if using gelu --- lightseq/training/ops/pytorch/quantization.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lightseq/training/ops/pytorch/quantization.py b/lightseq/training/ops/pytorch/quantization.py index 4fcf062f..fe284671 100644 --- a/lightseq/training/ops/pytorch/quantization.py +++ b/lightseq/training/ops/pytorch/quantization.py @@ -36,7 +36,7 @@ def __init__(self, in_features, out_features, pre_activation=None, *args, **kwar if pre_activation != "encoder_out": self.input_quant = TensorQuantizer(input_quant_config) self.output_quant = None - if pre_activation != "relu" and pre_activation != "encoder_out": + if pre_activation is None: self.output_quant = TensorQuantizer(act_quant_config) self.weight_quant = TensorQuantizer(weight_quant_config) From fa7b8cb426da5cc9ba5f9d4d52d13533bba29622 Mon Sep 17 00:00:00 2001 From: "weiyang.god" Date: Mon, 11 Apr 2022 22:22:44 +0800 Subject: [PATCH 27/49] polish gemm test --- tests/cublas/gemm.h | 50 +++++++++++++++++++++++++--------- tests/cublas/test.cpp | 63 ++++++++++++++++++------------------------- tests/cublas/util.h | 39 ++++++++++----------------- 3 files changed, 77 insertions(+), 75 deletions(-) diff --git a/tests/cublas/gemm.h b/tests/cublas/gemm.h index 0e00d593..0aa81b93 100644 --- a/tests/cublas/gemm.h +++ b/tests/cublas/gemm.h @@ -225,7 +225,7 @@ float test_lt_matmul(cublasLtHandle_t handle, int C, int B, int O, int H, T *X, float test_lt_matmul_int8(cublasLtHandle_t handle, int C, int B, int O, int H, int8_t *X, int8_t *W, int32_t *Y, int32_t *alpha, - int32_t *beta, int iteration) { + int32_t *beta, int iteration, int order) { #if CUBLAS_VER_MAJOR == 11 cublasComputeType_t ComputeType = CUBLAS_COMPUTE_32I; cudaDataType_t scaleType = CUDA_R_32I; @@ -240,6 +240,14 @@ float test_lt_matmul_int8(cublasLtHandle_t handle, int C, int B, int O, int H, cublasOperation_t opTrans = CUBLAS_OP_T; cublasLtOrder_t order_COL32 = CUBLASLT_ORDER_COL32; cublasLtOrder_t order_COL4_4R2_8C = CUBLASLT_ORDER_COL4_4R2_8C; + cublasLtOrder_t order_COL32_2R_4R4 = CUBLASLT_ORDER_COL32_2R_4R4; + cublasLtOrder_t order_W; + + if (order == 1) { + order_W = order_COL4_4R2_8C; + } else if (order == 2) { + order_W = order_COL32_2R_4R4; + } cublasLtMatrixLayout_t XDesc, WDesc, YDesc; checkCublasStatus(cublasLtMatrixLayoutCreate(&XDesc, XType, H, B, H)); @@ -272,9 +280,21 @@ float test_lt_matmul_int8(cublasLtHandle_t handle, int C, int B, int O, int H, checkCudaStatus(cudaMalloc(reinterpret_cast(&Ytransform), sizeof(int32_t) * C * B * O)); - int ldXtransform = 32 * B; - int ldWtransform = 32 
* O; - int ldYtransform = 32 * B; + int ldXtransform, ldWtransform, ldYtransform; + if (order == 0) { + ldXtransform = B; + ldWtransform = O; + ldYtransform = B; + } else if (order == 1) { + ldXtransform = 32 * B; + ldWtransform = 32 * round_up(O, 8); + ldYtransform = 32 * B; + } else { + ldXtransform = 32 * B; + ldWtransform = 32 * round_up(O, 32); + ldYtransform = 32 * B; + } + cublasLtMatrixLayout_t XtransformDesc, WtransformDesc, YtransformDesc; checkCublasStatus(cublasLtMatrixLayoutCreate(&XtransformDesc, CUDA_R_8I, B, H, ldXtransform)); @@ -282,15 +302,19 @@ float test_lt_matmul_int8(cublasLtHandle_t handle, int C, int B, int O, int H, ldWtransform)); checkCublasStatus(cublasLtMatrixLayoutCreate(&YtransformDesc, CUDA_R_32I, B, O, ldYtransform)); - checkCublasStatus(cublasLtMatrixLayoutSetAttribute( - YtransformDesc, CUBLASLT_MATRIX_LAYOUT_ORDER, &order_COL32, - sizeof(order_COL32))); - checkCublasStatus(cublasLtMatrixLayoutSetAttribute( - WtransformDesc, CUBLASLT_MATRIX_LAYOUT_ORDER, &order_COL4_4R2_8C, - sizeof(order_COL4_4R2_8C))); - checkCublasStatus(cublasLtMatrixLayoutSetAttribute( - XtransformDesc, CUBLASLT_MATRIX_LAYOUT_ORDER, &order_COL32, - sizeof(order_COL32))); + + if (order > 0) { + checkCublasStatus(cublasLtMatrixLayoutSetAttribute( + YtransformDesc, CUBLASLT_MATRIX_LAYOUT_ORDER, &order_COL32, + sizeof(order_COL32))); + checkCublasStatus(cublasLtMatrixLayoutSetAttribute( + WtransformDesc, CUBLASLT_MATRIX_LAYOUT_ORDER, &order_W, + sizeof(order_W))); + checkCublasStatus(cublasLtMatrixLayoutSetAttribute( + XtransformDesc, CUBLASLT_MATRIX_LAYOUT_ORDER, &order_COL32, + sizeof(order_COL32))); + } + if (C > 1) { checkCublasStatus(cublasLtMatrixLayoutSetAttribute( XtransformDesc, CUBLASLT_MATRIX_LAYOUT_BATCH_COUNT, &C, sizeof(C))); diff --git a/tests/cublas/test.cpp b/tests/cublas/test.cpp index 9b1e7de5..02e6d340 100644 --- a/tests/cublas/test.cpp +++ b/tests/cublas/test.cpp @@ -1,7 +1,7 @@ #include "gemm.h" -vf _main(std::string name, int C, int B, int O, int H, int iteration, - bool debug) { +void _main(std::string name, int C, int B, int O, int H, int iteration, + bool debug) { printf( ">>>>>>>>>>>>>>>>>>>> %s, shape: X(%d, %d, %d), W(%d, %d, %d) " ">>>>>>>>>>>>>>>>>>>>\n", @@ -32,27 +32,39 @@ vf _main(std::string name, int C, int B, int O, int H, int iteration, checkCublasStatus(cublasLtCreate(<_handle)); float cublas_ft = -1, cublas_ht = -1, cublas_it = -1; - float cublaslt_ft = -1, cublaslt_ht = -1, cublaslt_it = -1; + float lt_ft = -1, lt_ht = -1; + float lt_col_it = -1, lt_col4_4r2_8c_it = -1, lt_col32_2r_4r4_it = -1; printf(">>>>> test cublas gemm ex >>>>>\n"); cublas_ft = test_gemm_ex(handle, C, B, O, H, fX, fW, fY, &f_alpha, &f_beta, iteration); + print_res(fY, fY, cublas_ft, C, B, O, H, "cublas fp32", false, debug); cublas_ht = test_gemm_ex(handle, C, B, O, H, hX, hW, hY, &h_alpha, &h_beta, iteration); + print_res(fY, hY, cublas_ht, C, B, O, H, "cublas fp16", false, debug); cublas_it = test_gemm_ex(handle, C, B, O, H, iX, iW, iY, &i_alpha, &i_beta, iteration); - print_res(Y, fY, hY, iY, C, B, O, H, cublas_ft, cublas_ht, cublas_it, debug); + print_res(fY, iY, cublas_it, C, B, O, H, "cublas int8", true, debug); if (C == 1) { printf(">>>>> test cublas lt matmul >>>>>\n"); - cublaslt_ft = test_lt_matmul(lt_handle, C, B, O, H, fX, fW, fY, &f_alpha, - &f_beta, iteration); - cublaslt_ht = test_lt_matmul(lt_handle, C, B, O, H, hX, hW, hY, &h_alpha, - &h_beta, iteration); - cublaslt_it = test_lt_matmul_int8(lt_handle, C, B, O, H, iX, iW, iY, - &i_alpha, &i_beta, iteration); 
- print_res(Y, fY, hY, iY, C, B, O, H, cublaslt_ft, cublaslt_ht, cublaslt_it, - debug); + lt_ft = test_lt_matmul(lt_handle, C, B, O, H, fX, fW, fY, &f_alpha, &f_beta, + iteration); + print_res(fY, fY, lt_ft, C, B, O, H, "lt fp32", false, debug); + lt_ht = test_lt_matmul(lt_handle, C, B, O, H, hX, hW, hY, &h_alpha, &h_beta, + iteration); + print_res(fY, hY, lt_ht, C, B, O, H, "lt fp16", false, debug); + lt_col_it = test_lt_matmul_int8(lt_handle, C, B, O, H, iX, iW, iY, &i_alpha, + &i_beta, iteration, 0); + print_res(fY, iY, lt_col_it, C, B, O, H, "lt_col int8", true, debug); + lt_col4_4r2_8c_it = test_lt_matmul_int8(lt_handle, C, B, O, H, iX, iW, iY, + &i_alpha, &i_beta, iteration, 1); + print_res(fY, iY, lt_col4_4r2_8c_it, C, B, O, H, "lt_col4_4r2_8c int8", + true, debug); + lt_col32_2r_4r4_it = test_lt_matmul_int8(lt_handle, C, B, O, H, iX, iW, iY, + &i_alpha, &i_beta, iteration, 2); + print_res(fY, iY, lt_col32_2r_4r4_it, C, B, O, H, "lt_col32_2r_4r4 int8", + true, debug); } // printf(">>>>> test tvm gemm >>>>>\n"); @@ -66,24 +78,10 @@ vf _main(std::string name, int C, int B, int O, int H, int iteration, // printf(" diff: %.5f\n", ie / C / B / O); // printf(" time: %.3f ms\n", tvm_it); - if (C == 1) - printf("SPEEDUP (cublas fp16 / lt fp16): %.3f\n", - cublas_ht / cublaslt_ht); - printf("SPEEDUP (cublas fp16 / cublas int8): %.3f\n", cublas_ht / cublas_it); - if (C == 1) - printf("SPEEDUP (cublas fp16 / lt int8): %.3f\n", - cublas_ht / cublaslt_it); - free_memory(fX, fW, fY); free_memory(hX, hW, hY); free_memory(iX, iW, iY); if (debug) checkCudaStatus(cudaFree(Y)); - - if (C == 1) - return {cublas_ht / cublaslt_ht, cublas_ht / cublas_it, - cublas_ht / cublaslt_it}; - else - return {0, cublas_ht / cublas_it, 0}; } int main() { @@ -139,19 +137,10 @@ int main() { {batch_beam_size * head_num, head_dim, 1, step}); } - vf speedup = vf(3, 0); for (auto shape : shapes) { - vf su = _main(shape.first, shape.second[0], shape.second[1], - shape.second[2], shape.second[3], iteration, debug); - for (int i = 0; i < 3; ++i) speedup[i] += su[i]; + _main(shape.first, shape.second[0], shape.second[1], shape.second[2], + shape.second[3], iteration, debug); } - printf(">>>>>>>>>>>>>>>>>>>> SUMMARY >>>>>>>>>>>>>>>>>>>>\n"); - printf("AVERAGE SPEEDUP (cublas fp16 / lt fp16): %.3f\n", - speedup[0] / shapes.size()); - printf("AVERAGE SPEEDUP (cublas fp16 / cublas int8): %.3f\n", - speedup[1] / shapes.size()); - printf("AVERAGE SPEEDUP (cublas fp16 / lt int8): %.3f\n", - speedup[2] / shapes.size()); return 0; } diff --git a/tests/cublas/util.h b/tests/cublas/util.h index 7018bfb7..1a4c1411 100644 --- a/tests/cublas/util.h +++ b/tests/cublas/util.h @@ -7,6 +7,8 @@ typedef std::pair psvi; typedef std::vector vpsvi; typedef std::vector vf; +inline int round_up(int v, int d) { return (v + d - 1) / d * d; } + inline void checkCudaStatus(cudaError_t status) { if (status != cudaSuccess) { printf("cuda API failed with status %d: %s\n", status, @@ -76,38 +78,25 @@ void init_data(float *fX, __half *hX, int8_t *iX, float *fW, __half *hW, } } -void print_res(float *Y, float *fY, __half *hY, int32_t *iY, int C, int B, - int O, int H, float ft, float ht, float it, bool debug) { - float fe = 0, he = 0, ie = 0; +template +void print_res(float *oracle, T *res, float time, int C, int B, int O, int H, + std::string name, bool dequant, bool debug) { + float e = 0; if (debug) { printf("oracle:\n"); - for (int i = 0; i < 10; ++i) printf("%.5f%c", Y[i], " \n"[i == 9]); + for (int i = 0; i < 10; ++i) printf("%.5f%c", oracle[i], " \n"[i 
== 9]); } - printf("fp32:\n"); - if (debug) - for (int i = 0; i < 10; ++i) printf("%.5f%c", fY[i], " \n"[i == 9]); - for (int i = 0; i < C * B * O; ++i) - fe += fabs((debug ? Y[i] : fY[i]) - fY[i]); - printf(" diff: %.5f\n", fe / C / B / O); - printf(" time: %.3f ms\n", ft); - - printf("fp16:\n"); - if (debug) - for (int i = 0; i < 10; ++i) printf("%.5f%c", float(hY[i]), " \n"[i == 9]); - for (int i = 0; i < C * B * O; ++i) - he += fabs((debug ? Y[i] : fY[i]) - float(hY[i])); - printf(" diff: %.5f\n", he / C / B / O); - printf(" time: %.3f ms\n", ht); - - printf("int8:\n"); + printf("%s:\n", name.c_str()); if (debug) for (int i = 0; i < 10; ++i) - printf("%.5f%c", float(iY[i]) / 127 / 127, " \n"[i == 9]); + printf("%.5f%c", (dequant ? (float(res[i]) / 127 / 127) : float(res[i])), + " \n"[i == 9]); for (int i = 0; i < C * B * O; ++i) - ie += fabs((debug ? Y[i] : fY[i]) - float(iY[i]) / 127 / 127); - printf(" diff: %.5f\n", ie / C / B / O); - printf(" time: %.3f ms\n", it); + e += fabs(oracle[i] - + (dequant ? (float(res[i]) / 127 / 127) : float(res[i]))); + printf(" diff: %.3f\n", e / (C * B * O)); + printf(" time: %.3f ms\n", time); } void vec_pb(vpsvi &shapes, std::string name, vi shape) { From e8912b7ca69dd8c0a7c105778c3d64f8286a1cd1 Mon Sep 17 00:00:00 2001 From: "weiyang.god" Date: Thu, 14 Apr 2022 17:15:22 +0800 Subject: [PATCH 28/49] fix gemm test lt col bug --- tests/cublas/gemm.h | 166 ++++++++++++++++++++++++++++-------------- tests/cublas/test.cpp | 71 +++++++++--------- tests/cublas/util.h | 14 ++-- 3 files changed, 156 insertions(+), 95 deletions(-) diff --git a/tests/cublas/gemm.h b/tests/cublas/gemm.h index 0aa81b93..75fbe2fa 100644 --- a/tests/cublas/gemm.h +++ b/tests/cublas/gemm.h @@ -101,32 +101,39 @@ int cublas_lt_matmul(cublasLtHandle_t handle, cublasLtMatmulDesc_t matmulDesc, } } +int cublas_lt_matmul_int8(cublasLtHandle_t handle, + cublasLtMatmulDesc_t matmulDesc, + cublasLtMatrixLayout_t XDesc, + cublasLtMatrixLayout_t WDesc, + cublasLtMatrixLayout_t YDesc, int8_t *A, int8_t *B, + int8_t *C, float *alpha, float *beta) { + cublasStatus_t status; + status = cublasLtMatmul(handle, matmulDesc, alpha, A, XDesc, B, WDesc, beta, + C, YDesc, C, YDesc, nullptr, nullptr, 0, 0); + + if (status == CUBLAS_STATUS_SUCCESS) + return 1; + else { + return -1; + } +} + template float test_lt_matmul(cublasLtHandle_t handle, int C, int B, int O, int H, T *X, T *W, S *Y, S *alpha, S *beta, int iteration) { cudaDataType_t XType, WType, YType; -#if CUBLAS_VER_MAJOR == 11 + cublasComputeType_t ComputeType; cudaDataType_t scaleType; -#else - cudaDataType_t ComputeType; -#endif + if (std::is_same::value) { XType = WType = YType = CUDA_R_32F; -#if CUBLAS_VER_MAJOR == 11 ComputeType = CUBLAS_COMPUTE_32F; scaleType = CUDA_R_32F; -#else - ComputeType = CUDA_R_32F; -#endif } else if (std::is_same::value) { XType = WType = YType = CUDA_R_16F; -#if CUBLAS_VER_MAJOR == 11 ComputeType = CUBLAS_COMPUTE_16F; scaleType = CUDA_R_16F; -#else - ComputeType = CUDA_R_16F; -#endif } else { printf("Not supported data type."); return -1; @@ -185,12 +192,8 @@ float test_lt_matmul(cublasLtHandle_t handle, int C, int B, int O, int H, T *X, NULL, Wtransform, WtransformDesc, 0)); cublasLtMatmulDesc_t matmulDesc; -#if CUBLAS_VER_MAJOR == 11 checkCublasStatus( cublasLtMatmulDescCreate(&matmulDesc, ComputeType, scaleType)); -#else - checkCublasStatus(cublasLtMatmulDescCreate(&matmulDesc, ComputeType)); -#endif float total_time = 0; for (int i = 0; i < iteration; ++i) { @@ -223,18 +226,84 @@ float 
test_lt_matmul(cublasLtHandle_t handle, int C, int B, int O, int H, T *X, return total_time > 0 ? total_time / (iteration - 1) : -1; } -float test_lt_matmul_int8(cublasLtHandle_t handle, int C, int B, int O, int H, - int8_t *X, int8_t *W, int32_t *Y, int32_t *alpha, - int32_t *beta, int iteration, int order) { -#if CUBLAS_VER_MAJOR == 11 +float test_lt_matmul_int8_col(cublasLtHandle_t handle, int C, int B, int O, + int H, int8_t *X, int8_t *W, int8_t *Y, + float *alpha, float *beta, int iteration) { + cublasComputeType_t ComputeType = CUBLAS_COMPUTE_32I; + cudaDataType_t scaleType = CUDA_R_32F; + cudaDataType_t XType, WType, YType; + XType = WType = CUDA_R_8I; + YType = CUDA_R_8I; + + int64_t strideX = B * H, strideW = O * H, strideY = B * O; + cublasOperation_t opTrans = CUBLAS_OP_T; + cublasLtOrder_t order_W; + + cublasLtMatrixLayout_t XDesc, WDesc, YDesc; + checkCublasStatus(cublasLtMatrixLayoutCreate(&XDesc, XType, H, B, H)); + checkCublasStatus(cublasLtMatrixLayoutCreate(&WDesc, WType, H, O, H)); + checkCublasStatus(cublasLtMatrixLayoutCreate(&YDesc, YType, O, B, O)); + if (C > 1) { + checkCublasStatus(cublasLtMatrixLayoutSetAttribute( + XDesc, CUBLASLT_MATRIX_LAYOUT_BATCH_COUNT, &C, sizeof(C))); + checkCublasStatus(cublasLtMatrixLayoutSetAttribute( + XDesc, CUBLASLT_MATRIX_LAYOUT_STRIDED_BATCH_OFFSET, &strideX, + sizeof(strideX))); + checkCublasStatus(cublasLtMatrixLayoutSetAttribute( + WDesc, CUBLASLT_MATRIX_LAYOUT_BATCH_COUNT, &C, sizeof(C))); + checkCublasStatus(cublasLtMatrixLayoutSetAttribute( + WDesc, CUBLASLT_MATRIX_LAYOUT_STRIDED_BATCH_OFFSET, &strideW, + sizeof(strideW))); + checkCublasStatus(cublasLtMatrixLayoutSetAttribute( + YDesc, CUBLASLT_MATRIX_LAYOUT_BATCH_COUNT, &C, sizeof(C))); + checkCublasStatus(cublasLtMatrixLayoutSetAttribute( + YDesc, CUBLASLT_MATRIX_LAYOUT_STRIDED_BATCH_OFFSET, &strideY, + sizeof(strideY))); + } + + cublasLtMatmulDesc_t matmulDesc; + checkCublasStatus( + cublasLtMatmulDescCreate(&matmulDesc, ComputeType, scaleType)); + checkCublasStatus(cublasLtMatmulDescSetAttribute( + matmulDesc, CUBLASLT_MATMUL_DESC_TRANSA, &opTrans, sizeof(opTrans))); + + float total_time = 0; + for (int i = 0; i < iteration; ++i) { + cudaEvent_t start, stop; + float time; + cudaEventCreate(&start); + cudaEventCreate(&stop); + + cudaEventRecord(start, 0); + int success = cublas_lt_matmul_int8(handle, matmulDesc, WDesc, XDesc, YDesc, + W, X, Y, alpha, beta); + cudaEventRecord(stop, 0); + cudaEventSynchronize(stop); + + cudaEventElapsedTime(&time, start, stop); + cudaEventDestroy(start); + cudaEventDestroy(stop); + if (success > 0 && i >= 1) total_time += time; + } + + checkCublasStatus(cublasLtMatrixLayoutDestroy(XDesc)); + checkCublasStatus(cublasLtMatrixLayoutDestroy(WDesc)); + checkCublasStatus(cublasLtMatrixLayoutDestroy(YDesc)); + checkCublasStatus(cublasLtMatmulDescDestroy(matmulDesc)); + cudaDeviceSynchronize(); + + return total_time > 0 ? 
total_time / (iteration - 1) : -1; +} + +float test_lt_matmul_int8_col32(cublasLtHandle_t handle, int C, int B, int O, + int H, int8_t *X, int8_t *W, int8_t *Y, + float *alpha, float *beta, int iteration, + int order) { cublasComputeType_t ComputeType = CUBLAS_COMPUTE_32I; - cudaDataType_t scaleType = CUDA_R_32I; -#else - cudaDataType_t ComputeType = CUDA_R_32I; -#endif + cudaDataType_t scaleType = CUDA_R_32F; cudaDataType_t XType, WType, YType; XType = WType = CUDA_R_8I; - YType = CUDA_R_32I; + YType = CUDA_R_8I; int64_t strideX = B * H, strideW = O * H, strideY = B * O; cublasOperation_t opTrans = CUBLAS_OP_T; @@ -243,9 +312,9 @@ float test_lt_matmul_int8(cublasLtHandle_t handle, int C, int B, int O, int H, cublasLtOrder_t order_COL32_2R_4R4 = CUBLASLT_ORDER_COL32_2R_4R4; cublasLtOrder_t order_W; - if (order == 1) { + if (order == 0) { order_W = order_COL4_4R2_8C; - } else if (order == 2) { + } else if (order == 1) { order_W = order_COL32_2R_4R4; } @@ -272,20 +341,16 @@ float test_lt_matmul_int8(cublasLtHandle_t handle, int C, int B, int O, int H, } int8_t *Xtransform, *Wtransform; - int32_t *Ytransform; + int8_t *Ytransform; checkCudaStatus(cudaMalloc(reinterpret_cast(&Xtransform), sizeof(int8_t) * C * B * H)); checkCudaStatus(cudaMalloc(reinterpret_cast(&Wtransform), sizeof(int8_t) * C * O * H)); checkCudaStatus(cudaMalloc(reinterpret_cast(&Ytransform), - sizeof(int32_t) * C * B * O)); + sizeof(int8_t) * C * B * O)); int ldXtransform, ldWtransform, ldYtransform; if (order == 0) { - ldXtransform = B; - ldWtransform = O; - ldYtransform = B; - } else if (order == 1) { ldXtransform = 32 * B; ldWtransform = 32 * round_up(O, 8); ldYtransform = 32 * B; @@ -300,20 +365,17 @@ float test_lt_matmul_int8(cublasLtHandle_t handle, int C, int B, int O, int H, ldXtransform)); checkCublasStatus(cublasLtMatrixLayoutCreate(&WtransformDesc, CUDA_R_8I, O, H, ldWtransform)); - checkCublasStatus(cublasLtMatrixLayoutCreate(&YtransformDesc, CUDA_R_32I, B, - O, ldYtransform)); - - if (order > 0) { - checkCublasStatus(cublasLtMatrixLayoutSetAttribute( - YtransformDesc, CUBLASLT_MATRIX_LAYOUT_ORDER, &order_COL32, - sizeof(order_COL32))); - checkCublasStatus(cublasLtMatrixLayoutSetAttribute( - WtransformDesc, CUBLASLT_MATRIX_LAYOUT_ORDER, &order_W, - sizeof(order_W))); - checkCublasStatus(cublasLtMatrixLayoutSetAttribute( - XtransformDesc, CUBLASLT_MATRIX_LAYOUT_ORDER, &order_COL32, - sizeof(order_COL32))); - } + checkCublasStatus(cublasLtMatrixLayoutCreate(&YtransformDesc, CUDA_R_8I, B, O, + ldYtransform)); + + checkCublasStatus(cublasLtMatrixLayoutSetAttribute( + YtransformDesc, CUBLASLT_MATRIX_LAYOUT_ORDER, &order_COL32, + sizeof(order_COL32))); + checkCublasStatus(cublasLtMatrixLayoutSetAttribute( + WtransformDesc, CUBLASLT_MATRIX_LAYOUT_ORDER, &order_W, sizeof(order_W))); + checkCublasStatus(cublasLtMatrixLayoutSetAttribute( + XtransformDesc, CUBLASLT_MATRIX_LAYOUT_ORDER, &order_COL32, + sizeof(order_COL32))); if (C > 1) { checkCublasStatus(cublasLtMatrixLayoutSetAttribute( @@ -349,12 +411,8 @@ float test_lt_matmul_int8(cublasLtHandle_t handle, int C, int B, int O, int H, NULL, Wtransform, WtransformDesc, 0)); cublasLtMatmulDesc_t matmulDesc; -#if CUBLAS_VER_MAJOR == 11 checkCublasStatus( cublasLtMatmulDescCreate(&matmulDesc, ComputeType, scaleType)); -#else - checkCublasStatus(cublasLtMatmulDescCreate(&matmulDesc, ComputeType)); -#endif checkCublasStatus(cublasLtMatmulDescSetAttribute( matmulDesc, CUBLASLT_MATMUL_DESC_TRANSB, &opTrans, sizeof(opTrans))); @@ -366,9 +424,9 @@ float 
test_lt_matmul_int8(cublasLtHandle_t handle, int C, int B, int O, int H, cudaEventCreate(&stop); cudaEventRecord(start, 0); - int success = cublas_lt_matmul(handle, matmulDesc, XtransformDesc, - WtransformDesc, YtransformDesc, Xtransform, - Wtransform, Ytransform, alpha, beta); + int success = cublas_lt_matmul_int8( + handle, matmulDesc, XtransformDesc, WtransformDesc, YtransformDesc, + Xtransform, Wtransform, Ytransform, alpha, beta); cudaEventRecord(stop, 0); cudaEventSynchronize(stop); diff --git a/tests/cublas/test.cpp b/tests/cublas/test.cpp index 02e6d340..8b3d10a3 100644 --- a/tests/cublas/test.cpp +++ b/tests/cublas/test.cpp @@ -12,15 +12,17 @@ void _main(std::string name, int C, int B, int O, int H, int iteration, float *fX, *fW, *fY; __half *hX, *hW, *hY; - int8_t *iX, *iW; - int32_t *iY; + int8_t *iX, *iW, *i8Y; + int32_t *i32Y; allocate_memory(C, B, O, H, &fX, &fW, &fY); allocate_memory(C, B, O, H, &hX, &hW, &hY); - allocate_memory(C, B, O, H, &iX, &iW, &iY); + allocate_memory(C, B, O, H, &iX, &iW, &i8Y); + checkCudaStatus(cudaMallocManaged(&i32Y, C * B * O * sizeof(int32_t))); float f_alpha = 1, f_beta = 0; __half h_alpha = __float2half_rn(1.0), h_beta = __float2half_rn(0.0); int32_t i_alpha = 1, i_beta = 0; + float i8_out_scale = 1.0 / (127 * H / 2.951); init_data(fX, hX, iX, fW, hW, iW, C, B, O, H); @@ -31,56 +33,53 @@ void _main(std::string name, int C, int B, int O, int H, int iteration, checkCublasStatus(cublasCreate(&handle)); checkCublasStatus(cublasLtCreate(<_handle)); - float cublas_ft = -1, cublas_ht = -1, cublas_it = -1; - float lt_ft = -1, lt_ht = -1; - float lt_col_it = -1, lt_col4_4r2_8c_it = -1, lt_col32_2r_4r4_it = -1; + float t = -1; printf(">>>>> test cublas gemm ex >>>>>\n"); - cublas_ft = test_gemm_ex(handle, C, B, O, H, fX, fW, fY, &f_alpha, &f_beta, - iteration); - print_res(fY, fY, cublas_ft, C, B, O, H, "cublas fp32", false, debug); - cublas_ht = test_gemm_ex(handle, C, B, O, H, hX, hW, hY, &h_alpha, &h_beta, - iteration); - print_res(fY, hY, cublas_ht, C, B, O, H, "cublas fp16", false, debug); - cublas_it = test_gemm_ex(handle, C, B, O, H, iX, iW, iY, &i_alpha, &i_beta, - iteration); - print_res(fY, iY, cublas_it, C, B, O, H, "cublas int8", true, debug); + t = test_gemm_ex(handle, C, B, O, H, fX, fW, fY, &f_alpha, &f_beta, + iteration); + print_res(fY, fY, t, C, B, O, H, "cublas fp32", debug); + t = test_gemm_ex(handle, C, B, O, H, hX, hW, hY, &h_alpha, &h_beta, + iteration); + print_res(fY, hY, t, C, B, O, H, "cublas fp16", debug); + t = test_gemm_ex(handle, C, B, O, H, iX, iW, i32Y, &i_alpha, &i_beta, + iteration); + print_res(fY, i32Y, t, C, B, O, H, "cublas int8", debug); if (C == 1) { printf(">>>>> test cublas lt matmul >>>>>\n"); - lt_ft = test_lt_matmul(lt_handle, C, B, O, H, fX, fW, fY, &f_alpha, &f_beta, - iteration); - print_res(fY, fY, lt_ft, C, B, O, H, "lt fp32", false, debug); - lt_ht = test_lt_matmul(lt_handle, C, B, O, H, hX, hW, hY, &h_alpha, &h_beta, - iteration); - print_res(fY, hY, lt_ht, C, B, O, H, "lt fp16", false, debug); - lt_col_it = test_lt_matmul_int8(lt_handle, C, B, O, H, iX, iW, iY, &i_alpha, - &i_beta, iteration, 0); - print_res(fY, iY, lt_col_it, C, B, O, H, "lt_col int8", true, debug); - lt_col4_4r2_8c_it = test_lt_matmul_int8(lt_handle, C, B, O, H, iX, iW, iY, - &i_alpha, &i_beta, iteration, 1); - print_res(fY, iY, lt_col4_4r2_8c_it, C, B, O, H, "lt_col4_4r2_8c int8", - true, debug); - lt_col32_2r_4r4_it = test_lt_matmul_int8(lt_handle, C, B, O, H, iX, iW, iY, - &i_alpha, &i_beta, iteration, 2); - print_res(fY, iY, 
lt_col32_2r_4r4_it, C, B, O, H, "lt_col32_2r_4r4 int8", - true, debug); + t = test_lt_matmul(lt_handle, C, B, O, H, fX, fW, fY, &f_alpha, &f_beta, + iteration); + print_res(fY, fY, t, C, B, O, H, "lt fp32", debug); + t = test_lt_matmul(lt_handle, C, B, O, H, hX, hW, hY, &h_alpha, &h_beta, + iteration); + print_res(fY, hY, t, C, B, O, H, "lt fp16", debug); + t = test_lt_matmul_int8_col(lt_handle, C, B, O, H, iX, iW, i8Y, + &i8_out_scale, &f_beta, iteration); + print_res(fY, i8Y, t, C, B, O, H, "lt_col int8", debug); + t = test_lt_matmul_int8_col32(lt_handle, C, B, O, H, iX, iW, i8Y, + &i8_out_scale, &f_beta, iteration, 0); + print_res(fY, i8Y, t, C, B, O, H, "lt_col4_4r2_8c int8", debug); + t = test_lt_matmul_int8_col32(lt_handle, C, B, O, H, iX, iW, i8Y, + &i8_out_scale, &f_beta, iteration, 1); + print_res(fY, i8Y, t, C, B, O, H, "lt_col32_2r_4r4 int8", debug); } // printf(">>>>> test tvm gemm >>>>>\n"); - // float tvm_it = test_tvm_gemm(iX, iW, iY, iteration); + // float tvm_it = test_tvm_gemm(iX, iW, i32Y, iteration); // if (debug) // for (int i = 0; i < 10; ++i) - // printf("%.5f%c", float(iY[i]) / 127 / 127, " \n"[i == 9]); + // printf("%.5f%c", float(i32Y[i]) / 127 / 127, " \n"[i == 9]); // float ie = 0; // for (int i = 0; i < C * B * O; ++i) - // ie += fabs((debug ? Y[i] : fY[i]) - float(iY[i]) / 127 / 127); + // ie += fabs((debug ? Y[i] : fY[i]) - float(i32Y[i]) / 127 / 127); // printf(" diff: %.5f\n", ie / C / B / O); // printf(" time: %.3f ms\n", tvm_it); free_memory(fX, fW, fY); free_memory(hX, hW, hY); - free_memory(iX, iW, iY); + free_memory(iX, iW, i8Y); + checkCudaStatus(cudaFree(i32Y)); if (debug) checkCudaStatus(cudaFree(Y)); } diff --git a/tests/cublas/util.h b/tests/cublas/util.h index 1a4c1411..c534036c 100644 --- a/tests/cublas/util.h +++ b/tests/cublas/util.h @@ -80,7 +80,13 @@ void init_data(float *fX, __half *hX, int8_t *iX, float *fW, __half *hW, template void print_res(float *oracle, T *res, float time, int C, int B, int O, int H, - std::string name, bool dequant, bool debug) { + std::string name, bool debug) { + float dequant_scale = 1.0; + if (std::is_same::value) { + dequant_scale /= (127 * 127); + } else if (std::is_same::value) { + dequant_scale /= (127 * 2.951 / H); + } float e = 0; if (debug) { printf("oracle:\n"); @@ -90,11 +96,9 @@ void print_res(float *oracle, T *res, float time, int C, int B, int O, int H, printf("%s:\n", name.c_str()); if (debug) for (int i = 0; i < 10; ++i) - printf("%.5f%c", (dequant ? (float(res[i]) / 127 / 127) : float(res[i])), - " \n"[i == 9]); + printf("%.5f%c", float(res[i]) * dequant_scale, " \n"[i == 9]); for (int i = 0; i < C * B * O; ++i) - e += fabs(oracle[i] - - (dequant ? 
(float(res[i]) / 127 / 127) : float(res[i]))); + e += fabs(oracle[i] - float(res[i]) * dequant_scale); printf(" diff: %.3f\n", e / (C * B * O)); printf(" time: %.3f ms\n", time); } From ff642704c9114d96c5f82d5d8bae5b7a3e28f95b Mon Sep 17 00:00:00 2001 From: "weiyang.god" Date: Mon, 18 Apr 2022 18:09:12 +0800 Subject: [PATCH 29/49] support gpt2 qat --- ...er_layer.py => ls_hf_transformer_layer.py} | 0 .../huggingface/bert/task_glue/run_glue.py | 2 +- .../huggingface/bert/task_ner/run_ner.py | 2 +- .../huggingface/bert/task_qa/run_qa.py | 2 +- examples/training/huggingface/gpt/__init__.py | 0 .../huggingface/gpt/ls_hf_gpt_layer.py | 123 +++++++++++++++ examples/training/huggingface/gpt/run_clm.py | 26 +++- examples/training/huggingface/gpt/run_clm.sh | 7 +- .../training/huggingface/gpt/run_quant_clm.sh | 23 +++ lightseq/training/__init__.py | 4 +- .../{gpt_encoder_layer.py => gpt_layer.py} | 26 ++-- lightseq/training/ops/pytorch/layer_base.py | 1 + .../ops/pytorch/torch_transformer_layers.py | 141 ++++++++---------- 13 files changed, 254 insertions(+), 103 deletions(-) rename examples/training/huggingface/bert/{ls_hf_transformer_encoder_layer.py => ls_hf_transformer_layer.py} (100%) create mode 100644 examples/training/huggingface/gpt/__init__.py create mode 100644 examples/training/huggingface/gpt/ls_hf_gpt_layer.py create mode 100644 examples/training/huggingface/gpt/run_quant_clm.sh rename lightseq/training/ops/pytorch/{gpt_encoder_layer.py => gpt_layer.py} (82%) diff --git a/examples/training/huggingface/bert/ls_hf_transformer_encoder_layer.py b/examples/training/huggingface/bert/ls_hf_transformer_layer.py similarity index 100% rename from examples/training/huggingface/bert/ls_hf_transformer_encoder_layer.py rename to examples/training/huggingface/bert/ls_hf_transformer_layer.py diff --git a/examples/training/huggingface/bert/task_glue/run_glue.py b/examples/training/huggingface/bert/task_glue/run_glue.py index b1319a9b..0b3b62ca 100644 --- a/examples/training/huggingface/bert/task_glue/run_glue.py +++ b/examples/training/huggingface/bert/task_glue/run_glue.py @@ -45,7 +45,7 @@ from transformers.trainer_utils import get_last_checkpoint from transformers.utils import check_min_version from transformers.utils.versions import require_version -from ls_hf_transformer_encoder_layer import inject_ls_layer +from ls_hf_transformer_layer import inject_ls_layer # Will error if the minimal version of Transformers is not installed. Remove at your own risks. diff --git a/examples/training/huggingface/bert/task_ner/run_ner.py b/examples/training/huggingface/bert/task_ner/run_ner.py index b077246f..41db6c1d 100644 --- a/examples/training/huggingface/bert/task_ner/run_ner.py +++ b/examples/training/huggingface/bert/task_ner/run_ner.py @@ -44,7 +44,7 @@ ) from transformers.trainer_utils import get_last_checkpoint from transformers.utils import check_min_version -from ls_hf_transformer_encoder_layer import inject_ls_layer +from ls_hf_transformer_layer import inject_ls_layer # Will error if the minimal version of Transformers is not installed. Remove at your own risks. 
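[Editor's note on the cuBLAS test changes in PATCH 28 above] The refactored `print_res` dequantizes by output type instead of taking an explicit flag: `int32_t` results from the `test_gemm_ex` int8 path are scaled by `1 / (127 * 127)`, while `int8_t` results from the cuBLASLt paths are scaled by `1 / (127 * 2.951 / H)`, matching the `i8_out_scale = 1.0 / (127 * H / 2.951)` alpha set in `test.cpp`. Below is a minimal NumPy sketch of why the two conventions recover the same value; the accumulator value is made up, and the constant 2.951 is assumed to come from the value range used by `init_data`, which is not shown in these hunks.

```python
import numpy as np

H = 1024  # inner (reduction) dimension of the GEMM
i8_out_scale = 1.0 / (127 * H / 2.951)  # alpha used for the cuBLASLt int8 paths

# Made-up int32 accumulator of an int8 x int8 dot product over H elements.
acc_i32 = 12345678

# Path 1: test_gemm_ex keeps the int32 accumulator; print_res divides by 127 * 127.
deq_from_i32 = acc_i32 / (127 * 127)

# Path 2: cublasLt scales the accumulator by alpha into an int8 output
# (rounding and saturation ignored here); print_res then divides by 127 * 2.951 / H.
out_i8 = acc_i32 * i8_out_scale
deq_from_i8 = out_i8 / (127 * 2.951 / H)

# Up to int8 rounding and clipping, both paths recover the same float value.
assert np.isclose(deq_from_i32, deq_from_i8)
```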
diff --git a/examples/training/huggingface/bert/task_qa/run_qa.py b/examples/training/huggingface/bert/task_qa/run_qa.py index 04055b16..83c4fe02 100644 --- a/examples/training/huggingface/bert/task_qa/run_qa.py +++ b/examples/training/huggingface/bert/task_qa/run_qa.py @@ -46,7 +46,7 @@ from transformers.utils import check_min_version from transformers.utils.versions import require_version from utils_qa import postprocess_qa_predictions -from ls_hf_transformer_encoder_layer import inject_ls_layer +from ls_hf_transformer_layer import inject_ls_layer # Will error if the minimal version of Transformers is not installed. Remove at your own risks. diff --git a/examples/training/huggingface/gpt/__init__.py b/examples/training/huggingface/gpt/__init__.py new file mode 100644 index 00000000..e69de29b diff --git a/examples/training/huggingface/gpt/ls_hf_gpt_layer.py b/examples/training/huggingface/gpt/ls_hf_gpt_layer.py new file mode 100644 index 00000000..b8b05437 --- /dev/null +++ b/examples/training/huggingface/gpt/ls_hf_gpt_layer.py @@ -0,0 +1,123 @@ +from torch import nn + +from lightseq.training.ops.pytorch.quantization import ( + qat_mode, + disable_quant, + TensorQuantizer, + weight_quant_config, +) +from lightseq.training.ops.pytorch.torch_transformer_layers import ( + TransformerDecoderLayer, + copy_para, +) + + +def get_hf_gpt_enc_layer_params(layer, config): + init_ws = [] + init_bs = [] + + init_ws.extend( + layer.attn.c_attn.weight.detach().clone().t().split(config.hidden_size, 0) + ) + init_bs.extend(layer.attn.c_attn.bias.detach().clone().split(config.hidden_size, 0)) + + init_ws.append(layer.attn.c_proj.weight.detach().clone().t().reshape(-1)) + init_bs.append(layer.attn.c_proj.bias.detach().clone()) + init_ws.append(layer.ln_1.weight.detach().clone()) + init_bs.append(layer.ln_1.bias.detach().clone()) + + init_ws.append(layer.mlp.c_fc.weight.detach().clone().t().reshape(-1)) + init_bs.append(layer.mlp.c_fc.bias.detach().clone()) + init_ws.append(layer.mlp.c_proj.weight.detach().clone().t().reshape(-1)) + init_bs.append(layer.mlp.c_proj.bias.detach().clone()) + init_ws.append(layer.ln_2.weight.detach().clone()) + init_bs.append(layer.ln_2.bias.detach().clone()) + + return init_ws, init_bs + + +def get_hf_gpt_emb_layer_params(layer): + init_ws = [] + + init_ws.append(layer.wte.weight.detach().clone()) + init_ws.append(layer.wpe.weight.detach().clone()) + + return init_ws + + +def gen_gpt_enc_config(training_args, config): + gpt_enc_config = TransformerDecoderLayer.get_config( + max_batch_tokens=8192, + max_seq_len=config.max_position_embeddings, + hidden_size=config.hidden_size, + intermediate_size=4 * config.hidden_size, + nhead=config.num_attention_heads, + attn_prob_dropout_ratio=config.attn_pdrop, + activation_dropout_ratio=config.resid_pdrop, + hidden_dropout_ratio=config.resid_pdrop, + pre_layer_norm=True, + fp16=training_args.fp16, + local_rank=training_args.local_rank, + nlayer=config.num_hidden_layers, + activation_fn="gelu", + has_cross_attn=False, + ) + return gpt_enc_config + + +class LSHFGptEncoderLayer(TransformerDecoderLayer): + def __init__(self, *args, **kwargs): + super(LSHFGptEncoderLayer, self).__init__(*args, **kwargs) + + def forward( + self, hidden_states, layer_past=None, attention_mask=None, *args, **kwargs + ): + ls_attention_mask = attention_mask / -10000.0 + ls_attention_mask = ls_attention_mask.squeeze() + output = super().forward(hidden_states, ls_attention_mask) + return output + + +class GptEmbedding(nn.Embedding): + def __init__(self, training_args, 
initial_embeddings, *args, **kwargs): + super(GptEmbedding, self).__init__(*args, **kwargs) + self.emb_quant = TensorQuantizer(weight_quant_config) + + if initial_embeddings is not None: + self.weight.data.copy_(copy_para(initial_embeddings, training_args.fp16)) + + def forward(self, input_ids): + x = super(GptEmbedding, self).forward(input_ids) + x = self.emb_quant(x) + return x + + +def inject_ls_layer(model, training_args, model_args, config): + if model_args.module_type == 1: + from lightseq.training import ls_hf_gpt_enc_convert + + ls_hf_gpt_enc_convert(model, training_args, config) + return + + if model_args.module_type != 2: + raise NotImplementedError + + init_ws = get_hf_gpt_emb_layer_params(model.transformer) + model.transformer.wte = GptEmbedding( + training_args, init_ws[0], config.vocab_size, config.hidden_size + ) + if model_args.enable_quant: + model.transformer.wte.apply(qat_mode) + else: + model.transformer.wte.apply(disable_quant) + + for i in range(config.num_hidden_layers): + gpt_enc_config = gen_gpt_enc_config(training_args, config) + init_ws, init_bs = get_hf_gpt_enc_layer_params(model.transformer.h[i], config) + model.transformer.h[i] = LSHFGptEncoderLayer( + gpt_enc_config, init_ws, init_bs + ).cuda() + if model_args.enable_quant: + model.transformer.h[i].apply(qat_mode) + else: + model.transformer.h[i].apply(disable_quant) diff --git a/examples/training/huggingface/gpt/run_clm.py b/examples/training/huggingface/gpt/run_clm.py index 90b9dd8d..52dfc223 100644 --- a/examples/training/huggingface/gpt/run_clm.py +++ b/examples/training/huggingface/gpt/run_clm.py @@ -33,6 +33,7 @@ import datasets from datasets import load_dataset +import torch import transformers from transformers import ( CONFIG_MAPPING, @@ -50,8 +51,7 @@ from transformers.trainer_utils import get_last_checkpoint from transformers.utils import check_min_version from transformers.utils.versions import require_version - -from lightseq.training import ls_hf_gpt_convert +from ls_hf_gpt_layer import inject_ls_layer # Will error if the minimal version of Transformers is not installed. Remove at your own risks. @@ -133,9 +133,15 @@ class ModelArguments: "with private models)." }, ) - with_lightseq: bool = field( - default=True, - metadata={"help": "Whether to use lightseq"}, + module_type: int = field( + default=1, + metadata={ + "help": "0: original Hugging Face layer, 1: LightSeq CUDA layer, 2: custom Torch layer" + }, + ) + enable_quant: bool = field( + default=False, + metadata={"help": "Whether to enable quantization"}, ) def __post_init__(self): @@ -436,8 +442,8 @@ def main(): ) # Replace with LightSeq encoder layers. 
- if model_args.with_lightseq: - ls_hf_gpt_convert(model, training_args, config) + if model_args.module_type == 1 or model_args.module_type == 2: + inject_ls_layer(model, training_args, model_args, config) model.resize_token_embeddings(len(tokenizer)) @@ -548,6 +554,12 @@ def group_texts(examples): data_collator=default_data_collator, ) + if not training_args.do_train: + state_dict = torch.load( + training_args.resume_from_checkpoint, map_location="cpu" + ) + trainer._load_state_dict_in_model(state_dict) + # Training if training_args.do_train: checkpoint = None diff --git a/examples/training/huggingface/gpt/run_clm.sh b/examples/training/huggingface/gpt/run_clm.sh index 863a8b97..30449bc4 100644 --- a/examples/training/huggingface/gpt/run_clm.sh +++ b/examples/training/huggingface/gpt/run_clm.sh @@ -8,12 +8,15 @@ python3 -m torch.distributed.launch \ --model_name_or_path gpt2 \ --dataset_name wikitext \ --dataset_config_name wikitext-103-raw-v1 \ - --per_device_train_batch_size 8 \ + --per_device_train_batch_size 16 \ --per_device_eval_batch_size 8 \ + --num_train_epochs 1 \ --do_train \ --do_eval \ --output_dir /tmp/test-clm \ --overwrite_output_dir \ --fp16 \ --logging_steps 10 \ - --block_size 512 + --block_size 512 \ + --module_type 2 \ + --enable_quant false diff --git a/examples/training/huggingface/gpt/run_quant_clm.sh b/examples/training/huggingface/gpt/run_quant_clm.sh new file mode 100644 index 00000000..e9eb847b --- /dev/null +++ b/examples/training/huggingface/gpt/run_quant_clm.sh @@ -0,0 +1,23 @@ +#! /bin/bash + +THIS_DIR=$(dirname $(readlink -f $0)) + +python3 -m torch.distributed.launch \ + --nproc_per_node=1 \ + $THIS_DIR/run_clm.py \ + --model_name_or_path gpt2 \ + --dataset_name wikitext \ + --dataset_config_name wikitext-103-raw-v1 \ + --per_device_train_batch_size 16 \ + --per_device_eval_batch_size 8 \ + --num_train_epochs 1 \ + --do_train \ + --do_eval \ + --output_dir /tmp/quant/test-clm \ + --overwrite_output_dir \ + --resume_from_checkpoint /tmp/test-clm \ + --fp16 \ + --logging_steps 10 \ + --block_size 512 \ + --module_type 2 \ + --enable_quant true diff --git a/lightseq/training/__init__.py b/lightseq/training/__init__.py index 16728a3d..e7da197b 100644 --- a/lightseq/training/__init__.py +++ b/lightseq/training/__init__.py @@ -7,9 +7,9 @@ from lightseq.training.ops.pytorch.transformer_decoder_layer import ( LSTransformerDecoderLayer, ) -from lightseq.training.ops.pytorch.gpt_encoder_layer import ( +from lightseq.training.ops.pytorch.gpt_layer import ( LSGptEncoderLayer, - ls_hf_gpt_convert, + ls_hf_gpt_enc_convert, ) from lightseq.training.ops.pytorch.transformer import ( LSTransformer, diff --git a/lightseq/training/ops/pytorch/gpt_encoder_layer.py b/lightseq/training/ops/pytorch/gpt_layer.py similarity index 82% rename from lightseq/training/ops/pytorch/gpt_encoder_layer.py rename to lightseq/training/ops/pytorch/gpt_layer.py index bfd9a559..f22c2e54 100644 --- a/lightseq/training/ops/pytorch/gpt_encoder_layer.py +++ b/lightseq/training/ops/pytorch/gpt_layer.py @@ -56,29 +56,29 @@ def create_cpp_layer(self): @staticmethod def from_huggingface(layer, training_args, model_config): - ls_gpt_config = gen_ls_gpt_config(training_args, model_config) - init_ws, init_bs = get_hf_gpt_layer_params(layer, ls_gpt_config) - return LSHFGptLayer(ls_gpt_config, init_ws, init_bs).cuda() + ls_gpt_config = gen_ls_gpt_enc_config(training_args, model_config) + init_ws, init_bs = get_hf_gpt_enc_layer_params(layer, ls_gpt_config) + return LSHFGptEncoderLayer(ls_gpt_config, init_ws, 
init_bs).cuda() -class LSHFGptLayer(LSGptEncoderLayer): +class LSHFGptEncoderLayer(LSGptEncoderLayer): def __init__(self, *args, **kwargs): - super(LSHFGptLayer, self).__init__(*args, **kwargs) + super(LSHFGptEncoderLayer, self).__init__(*args, **kwargs) def forward(self, hidden_states, attention_mask=None, *args, **kwargs): # attention mask from transformers is a tensor. # sizes are[batch_size, 1, 1, to_seq_length] # masked value is -10000.0, unmasked value is 0.0 if attention_mask is not None: - attention_mask = attention_mask.squeeze() - attention_mask = attention_mask / -10000 + ls_attention_mask = attention_mask.squeeze() + ls_attention_mask = ls_attention_mask / -10000 else: - attention_mask = torch.zeros(hidden_states.size()[:2]) - output = super().forward(hidden_states, attention_mask) + ls_attention_mask = torch.zeros(hidden_states.size()[:2]) + output = super().forward(hidden_states, ls_attention_mask) return (output, None, None, None) -def gen_ls_gpt_config(training_args, config): +def gen_ls_gpt_enc_config(training_args, config): gpt_config = LSGptEncoderLayer.get_config( max_batch_tokens=8192, max_seq_len=config.max_position_embeddings, @@ -96,7 +96,7 @@ def gen_ls_gpt_config(training_args, config): return gpt_config -def get_hf_gpt_layer_params(layer, gpt_config): +def get_hf_gpt_enc_layer_params(layer, gpt_config): init_ws = [] init_bs = [] @@ -122,8 +122,8 @@ def get_hf_gpt_layer_params(layer, gpt_config): return init_ws, init_bs -def ls_hf_gpt_convert(model, training_args, config): +def ls_hf_gpt_enc_convert(model, training_args, config): for i in range(config.num_hidden_layers): - model.transformer.h[i] = LSHFGptLayer.from_huggingface( + model.transformer.h[i] = LSHFGptEncoderLayer.from_huggingface( model.transformer.h[i], training_args, config ).cuda() diff --git a/lightseq/training/ops/pytorch/layer_base.py b/lightseq/training/ops/pytorch/layer_base.py index 12ecd07d..e8a009e1 100644 --- a/lightseq/training/ops/pytorch/layer_base.py +++ b/lightseq/training/ops/pytorch/layer_base.py @@ -76,6 +76,7 @@ class Config: local_rank: int # rank in local node nlayer: int # number of layers activation_fn: str = "relu" # relu or gelu + has_cross_attn: bool = True if "model" in kwargs: if kwargs["model"] not in MODEL_ARCH: diff --git a/lightseq/training/ops/pytorch/torch_transformer_layers.py b/lightseq/training/ops/pytorch/torch_transformer_layers.py index e3906e5f..ca83c356 100644 --- a/lightseq/training/ops/pytorch/torch_transformer_layers.py +++ b/lightseq/training/ops/pytorch/torch_transformer_layers.py @@ -662,7 +662,6 @@ def __init__(self, config, initial_weights=None, initial_biases=None): super().__init__() self.embed_dim = config.hidden_size self.dropout_module = Dropout(config.hidden_dropout_ratio) - self.cross_self_attention = False self.self_attn = self.build_self_attention( self.embed_dim, @@ -673,16 +672,18 @@ def __init__(self, config, initial_weights=None, initial_biases=None): self.activation_fn = util.get_activation_fn(activation=config.activation_fn) self.activation_dropout_module = Dropout(float(config.activation_dropout_ratio)) self.normalize_before = config.pre_layer_norm + self.has_cross_attn = config.has_cross_attn self.self_attn_layer_norm = LayerNorm(self.embed_dim) - self.encoder_attn = self.build_encoder_attention( - self.embed_dim, - config.hidden_size, - config.attn_prob_dropout_ratio, - config.nhead, - ) - self.encoder_attn_layer_norm = LayerNorm(self.embed_dim) + if config.has_cross_attn: + self.encoder_attn = self.build_encoder_attention( + 
self.embed_dim, + config.hidden_size, + config.attn_prob_dropout_ratio, + config.nhead, + ) + self.encoder_attn_layer_norm = LayerNorm(self.embed_dim) self.fc1 = QuantLinear( self.embed_dim, @@ -721,46 +722,58 @@ def __init__(self, config, initial_weights=None, initial_biases=None): self.self_attn_layer_norm.bias.data.copy_( copy_para(initial_biases[4], config.fp16) ) - self.encoder_attn.q_proj.weight.data.copy_( - copy_para(initial_weights[5], config.fp16) - ) - self.encoder_attn.q_proj.bias.data.copy_( - copy_para(initial_weights[5], config.fp16) - ) - self.encoder_attn.k_proj.weight.data.copy_( - copy_para(initial_weights[6], config.fp16) - ) - self.encoder_attn.k_proj.bias.data.copy_( - copy_para(initial_weights[6], config.fp16) - ) - self.encoder_attn.v_proj.weight.data.copy_( - copy_para(initial_weights[7], config.fp16) - ) - self.encoder_attn.v_proj.bias.data.copy_( - copy_para(initial_weights[7], config.fp16) - ) - self.encoder_attn.out_proj.weight.data.copy_( - copy_para(initial_weights[8], config.fp16) - ) - self.encoder_attn.out_proj.bias.data.copy_( - copy_para(initial_biases[8], config.fp16) - ) - self.encoder_attn_layer_norm.weight.data.copy_( - copy_para(initial_weights[9], config.fp16) - ) - self.encoder_attn_layer_norm.bias.data.copy_( - copy_para(initial_biases[9], config.fp16) - ) - self.fc1.weight.data.copy_(copy_para(initial_weights[10], config.fp16)) - self.fc1.bias.data.copy_(copy_para(initial_biases[10], config.fp16)) - self.fc2.weight.data.copy_(copy_para(initial_weights[11], config.fp16)) - self.fc2.bias.data.copy_(copy_para(initial_biases[11], config.fp16)) - self.final_layer_norm.weight.data.copy_( - copy_para(initial_weights[12], config.fp16) - ) - self.final_layer_norm.bias.data.copy_( - copy_para(initial_biases[12], config.fp16) - ) + if config.has_cross_attn: + self.encoder_attn.q_proj.weight.data.copy_( + copy_para(initial_weights[5], config.fp16) + ) + self.encoder_attn.q_proj.bias.data.copy_( + copy_para(initial_weights[5], config.fp16) + ) + self.encoder_attn.k_proj.weight.data.copy_( + copy_para(initial_weights[6], config.fp16) + ) + self.encoder_attn.k_proj.bias.data.copy_( + copy_para(initial_weights[6], config.fp16) + ) + self.encoder_attn.v_proj.weight.data.copy_( + copy_para(initial_weights[7], config.fp16) + ) + self.encoder_attn.v_proj.bias.data.copy_( + copy_para(initial_weights[7], config.fp16) + ) + self.encoder_attn.out_proj.weight.data.copy_( + copy_para(initial_weights[8], config.fp16) + ) + self.encoder_attn.out_proj.bias.data.copy_( + copy_para(initial_biases[8], config.fp16) + ) + self.encoder_attn_layer_norm.weight.data.copy_( + copy_para(initial_weights[9], config.fp16) + ) + self.encoder_attn_layer_norm.bias.data.copy_( + copy_para(initial_biases[9], config.fp16) + ) + self.fc1.weight.data.copy_(copy_para(initial_weights[10], config.fp16)) + self.fc1.bias.data.copy_(copy_para(initial_biases[10], config.fp16)) + self.fc2.weight.data.copy_(copy_para(initial_weights[11], config.fp16)) + self.fc2.bias.data.copy_(copy_para(initial_biases[11], config.fp16)) + self.final_layer_norm.weight.data.copy_( + copy_para(initial_weights[12], config.fp16) + ) + self.final_layer_norm.bias.data.copy_( + copy_para(initial_biases[12], config.fp16) + ) + else: + self.fc1.weight.data.copy_(copy_para(initial_weights[5], config.fp16)) + self.fc1.bias.data.copy_(copy_para(initial_biases[5], config.fp16)) + self.fc2.weight.data.copy_(copy_para(initial_weights[6], config.fp16)) + self.fc2.bias.data.copy_(copy_para(initial_biases[6], config.fp16)) + 
self.final_layer_norm.weight.data.copy_( + copy_para(initial_weights[7], config.fp16) + ) + self.final_layer_norm.bias.data.copy_( + copy_para(initial_biases[7], config.fp16) + ) def build_self_attention( self, embed_dim, nhead, attn_dropout, add_bias_kv=False, add_zero_attn=False @@ -771,7 +784,7 @@ def build_self_attention( dropout=attn_dropout, add_bias_kv=add_bias_kv, add_zero_attn=add_zero_attn, - self_attention=not self.cross_self_attention, + self_attention=True, is_decoder=True, ) @@ -837,35 +850,11 @@ def forward( saved_state["prev_key_padding_mask"] = prev_self_attn_state[2] assert incremental_state is not None self.self_attn._set_input_buffer(incremental_state, saved_state) - _self_attn_input_buffer = self.self_attn._get_input_buffer(incremental_state) - if self.cross_self_attention and not ( - incremental_state is not None - and _self_attn_input_buffer is not None - and "prev_key" in _self_attn_input_buffer - ): - if self_attn_mask is not None: - assert encoder_out is not None - self_attn_mask = torch.cat( - (x.new_zeros(x.size(0), encoder_out.size(0)), self_attn_mask), dim=1 - ) - if self_attn_padding_mask is not None: - if encoder_padding_mask is None: - assert encoder_out is not None - encoder_padding_mask = self_attn_padding_mask.new_zeros( - encoder_out.size(1), encoder_out.size(0) - ) - self_attn_padding_mask = torch.cat( - (encoder_padding_mask, self_attn_padding_mask), dim=1 - ) - assert encoder_out is not None - y = torch.cat((encoder_out, x), dim=0) - else: - y = x x, attn = self.self_attn( query=x, - key=y, - value=y, + key=x, + value=x, key_padding_mask=self_attn_padding_mask, incremental_state=incremental_state, need_weights=False, @@ -876,7 +865,7 @@ def forward( if not self.normalize_before: x = self.self_attn_layer_norm(x) - if self.encoder_attn is not None and encoder_out is not None: + if self.has_cross_attn and encoder_out is not None: if ( encoder_out.shape[1] != x.shape[1] and x.shape[1] % encoder_out.shape[1] == 0 From c17fdbb5f3a6b885e5f8f7693725df75839d1a72 Mon Sep 17 00:00:00 2001 From: "weiyang.god" Date: Mon, 18 Apr 2022 22:45:05 +0800 Subject: [PATCH 30/49] add causal mask for gpt encoder --- .../huggingface/gpt/ls_hf_gpt_layer.py | 10 +++---- .../training/huggingface/gpt/run_quant_clm.sh | 2 +- lightseq/training/ops/pytorch/gpt_layer.py | 3 +- .../ops/pytorch/torch_transformer_layers.py | 30 ++++++++++++++++++- 4 files changed, 36 insertions(+), 9 deletions(-) diff --git a/examples/training/huggingface/gpt/ls_hf_gpt_layer.py b/examples/training/huggingface/gpt/ls_hf_gpt_layer.py index b8b05437..b22c7325 100644 --- a/examples/training/huggingface/gpt/ls_hf_gpt_layer.py +++ b/examples/training/huggingface/gpt/ls_hf_gpt_layer.py @@ -69,11 +69,11 @@ class LSHFGptEncoderLayer(TransformerDecoderLayer): def __init__(self, *args, **kwargs): super(LSHFGptEncoderLayer, self).__init__(*args, **kwargs) - def forward( - self, hidden_states, layer_past=None, attention_mask=None, *args, **kwargs - ): - ls_attention_mask = attention_mask / -10000.0 - ls_attention_mask = ls_attention_mask.squeeze() + def forward(self, hidden_states, attention_mask=None, *args, **kwargs): + if attention_mask is not None: + ls_attention_mask = attention_mask.squeeze() + else: + ls_attention_mask = torch.zeros(hidden_states.size()[:2]) output = super().forward(hidden_states, ls_attention_mask) return output diff --git a/examples/training/huggingface/gpt/run_quant_clm.sh b/examples/training/huggingface/gpt/run_quant_clm.sh index e9eb847b..196e6434 100644 --- 
a/examples/training/huggingface/gpt/run_quant_clm.sh +++ b/examples/training/huggingface/gpt/run_quant_clm.sh @@ -10,7 +10,7 @@ python3 -m torch.distributed.launch \ --dataset_config_name wikitext-103-raw-v1 \ --per_device_train_batch_size 16 \ --per_device_eval_batch_size 8 \ - --num_train_epochs 1 \ + --num_train_epochs 2 \ --do_train \ --do_eval \ --output_dir /tmp/quant/test-clm \ diff --git a/lightseq/training/ops/pytorch/gpt_layer.py b/lightseq/training/ops/pytorch/gpt_layer.py index f22c2e54..3a5c8a57 100644 --- a/lightseq/training/ops/pytorch/gpt_layer.py +++ b/lightseq/training/ops/pytorch/gpt_layer.py @@ -71,7 +71,6 @@ def forward(self, hidden_states, attention_mask=None, *args, **kwargs): # masked value is -10000.0, unmasked value is 0.0 if attention_mask is not None: ls_attention_mask = attention_mask.squeeze() - ls_attention_mask = ls_attention_mask / -10000 else: ls_attention_mask = torch.zeros(hidden_states.size()[:2]) output = super().forward(hidden_states, ls_attention_mask) @@ -87,7 +86,7 @@ def gen_ls_gpt_enc_config(training_args, config): nhead=config.num_attention_heads, attn_prob_dropout_ratio=config.attn_pdrop, activation_dropout_ratio=config.resid_pdrop, - hidden_dropout_ratio=config.embd_pdrop, + hidden_dropout_ratio=config.resid_pdrop, pre_layer_norm=True, fp16=training_args.fp16, local_rank=training_args.local_rank, diff --git a/lightseq/training/ops/pytorch/torch_transformer_layers.py b/lightseq/training/ops/pytorch/torch_transformer_layers.py index ca83c356..49376b92 100644 --- a/lightseq/training/ops/pytorch/torch_transformer_layers.py +++ b/lightseq/training/ops/pytorch/torch_transformer_layers.py @@ -50,12 +50,14 @@ def __init__( self_attention=False, encoder_decoder_attention=False, is_decoder=False, + has_causal_mask=False, ): super().__init__() self.embed_dim = embed_dim self.kdim = kdim if kdim is not None else embed_dim self.vdim = vdim if vdim is not None else embed_dim self.qkv_same_dim = self.kdim == embed_dim and self.vdim == embed_dim + self.has_causal_mask = has_causal_mask self.num_heads = num_heads self.dropout_module = Dropout(dropout) @@ -70,6 +72,15 @@ def __init__( self.encoder_decoder_attention = encoder_decoder_attention self.is_decoder = is_decoder + max_positions = 1024 + self.register_buffer( + "bias", + torch.tril( + torch.ones((max_positions, max_positions), dtype=torch.uint8) + ).view(1, max_positions, max_positions), + ) + self.register_buffer("masked_bias", torch.tensor(-1e4)) + assert ( not self.self_attention or self.qkv_same_dim ), "Self-attention requires query, key and value to be of the same size" @@ -320,6 +331,15 @@ def forward( assert list(attn_weights.size()) == [bsz * self.num_heads, tgt_len, src_len] + if self.has_causal_mask: + query_length, key_length = query.size(0), key.size(0) + causal_mask = self.bias[ + :, key_length - query_length : key_length, :key_length + ].bool() + attn_weights = torch.where( + causal_mask, attn_weights, self.masked_bias.to(attn_weights.dtype) + ) + if attn_mask is not None: attn_mask = attn_mask.unsqueeze(0) if self.onnx_trace: @@ -667,6 +687,7 @@ def __init__(self, config, initial_weights=None, initial_biases=None): self.embed_dim, config.nhead, config.attn_prob_dropout_ratio, + has_causal_mask=not config.has_cross_attn, ) self.activation_fn = util.get_activation_fn(activation=config.activation_fn) @@ -776,7 +797,13 @@ def __init__(self, config, initial_weights=None, initial_biases=None): ) def build_self_attention( - self, embed_dim, nhead, attn_dropout, add_bias_kv=False, 
add_zero_attn=False + self, + embed_dim, + nhead, + attn_dropout, + add_bias_kv=False, + add_zero_attn=False, + has_causal_mask=False, ): return MultiheadAttention( embed_dim, @@ -786,6 +813,7 @@ def build_self_attention( add_zero_attn=add_zero_attn, self_attention=True, is_decoder=True, + has_causal_mask=has_causal_mask, ) def build_encoder_attention( From 19dd24ab033a7212f17816ed841cde1d2bafd499 Mon Sep 17 00:00:00 2001 From: "weiyang.god" Date: Tue, 19 Apr 2022 13:10:45 +0800 Subject: [PATCH 31/49] support quant gpt export --- examples/inference/python/README.md | 7 +- .../fairseq/ls_fs_transformer_export.py | 2 +- .../fairseq/ls_fs_transformer_ptq_export.py | 2 +- .../ls_torch_fs_quant_transformer_export.py | 2 +- .../fairseq/ls_torch_fs_transformer_export.py | 2 +- .../ls_torch_fs_transformer_ptq_export.py | 2 +- .../fairseq/native_fs_transformer_export.py | 2 +- .../native_fs_transformer_ptq_export.py | 2 +- .../ls_torch_hf_quant_bert_export.py | 5 +- .../ls_torch_hf_quant_gpt2_export.py | 221 ++++++++++++++++++ .../python/export/{fairseq => }/util.py | 0 .../inference/python/test/ls_quant_bert.py | 2 +- 12 files changed, 235 insertions(+), 14 deletions(-) create mode 100644 examples/inference/python/export/huggingface/ls_torch_hf_quant_gpt2_export.py rename examples/inference/python/export/{fairseq => }/util.py (100%) diff --git a/examples/inference/python/README.md b/examples/inference/python/README.md index 7d1f03ae..23ab1bd3 100644 --- a/examples/inference/python/README.md +++ b/examples/inference/python/README.md @@ -9,14 +9,15 @@ cd examples/inference/python ## Model export We provide the following export examples. All Fairseq based models are trained using the scripts in [examples/training/fairseq](../../../examples/training/fairseq). The first two LightSeq Transformer models are trained using the scripts in [examples/training/custom](../../../examples/training/custom). -| Model | Type | Command | Resource | Description | -| -------------------------------------------- | ----- | ----------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------ | -------------------------------------------------------------------------------------------------------------------------------------------------------- | +| Model | Type | Command | Resource | Description | +|----------------------------------------------|-------|-------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------| | LightSeq Transformer | Float | python export/ls_transformer_export.py -m ckpt_ls_custom.pt | [link](http://lf3-nlp-opensource.bytetos.com/obj/nlp-opensource/lightseq/example_model/ckpt_ls_custom.pt) | Export LightSeq Transformer models to protobuf format. | | LightSeq Transformer + PTQ | Int8 | python export/ls_transformer_ptq_export.py -m ckpt_ls_custom.pt | [link](http://lf3-nlp-opensource.bytetos.com/obj/nlp-opensource/lightseq/example_model/ckpt_ls_custom.pt) | Export LightSeq Transformer models to int8 protobuf format using post training quantization. 
| | Hugging Face BART | Float | python export/huggingface/hf_bart_export.py | / | Export Hugging Face BART models to protobuf/hdf5 format. | | Hugging Face BERT | Float | python export/huggingface/hf_bert_export.py | / | Export Hugging Face BERT models to hdf5 format. | -| Hugging Face + custom Torch layer BERT + QAT | Int8 | python export/huggingface/ls_torch_hf_quant_bert_export.py -m ckpt_hf_torch_quant_bert_ner.bin | / | Export Hugging Face BERT training with custom Torch layers to hdf5 format. | +| Hugging Face + custom Torch layer BERT + QAT | Int8 | python export/huggingface/ls_torch_hf_quant_bert_export.py -m ckpt_ls_torch_hf_quant_bert_ner.bin | / | Export Hugging Face BERT training with custom Torch layers to hdf5 format. | | Hugging Face GPT2 | Float | python export/huggingface/hf_gpt2_export.py | / | Export Hugging Face GPT2 models to hdf5 format. | +| Hugging Face + custom Torch layer GPT2 + QAT | Int8 | python export/huggingface/ls_torch_hf_quant_gpt2_export.py -m ckpt_ls_torch_hf_quant_gpt2_ner.bin | / | Export Hugging Face GPT2 training with custom Torch layers to hdf5 format. | | Native Fairseq Transformer | Float | python export/fairseq/native_fs_transformer_export.py -m ckpt_native_fairseq_31.06.pt | [link](http://lf3-nlp-opensource.bytetos.com/obj/nlp-opensource/lightseq/example_model/fairseq/ckpt_native_fairseq_31.06.pt) | Export native Fairseq Transformer models to protobuf/hdf5 format. | | Native Fairseq Transformer + PTQ | Int8 | python export/fairseq/native_fs_transformer_export.py -m ckpt_native_fairseq_31.06.pt | [link](http://lf3-nlp-opensource.bytetos.com/obj/nlp-opensource/lightseq/example_model/fairseq/ckpt_native_fairseq_31.06.pt) | Export native Fairseq Transformer models to int8 protobuf format using post training quantization. | | Fairseq + LightSeq Transformer | Float | python export/fairseq/ls_fs_transformer_export.py -m ckpt_ls_fairseq_31.17.pt | [link](http://lf3-nlp-opensource.bytetos.com/obj/nlp-opensource/lightseq/example_model/fairseq/ckpt_ls_fairseq_31.17.pt) | Export Fairseq Transformer models training with LightSeq modules to protobuf/hdf5 format. 
| diff --git a/examples/inference/python/export/fairseq/ls_fs_transformer_export.py b/examples/inference/python/export/fairseq/ls_fs_transformer_export.py index c1930940..1b86e7d8 100644 --- a/examples/inference/python/export/fairseq/ls_fs_transformer_export.py +++ b/examples/inference/python/export/fairseq/ls_fs_transformer_export.py @@ -11,7 +11,7 @@ export_ls_decoder, ) import lightseq.inference as lsi -from export.fairseq.util import parse_args, save_model +from export.util import parse_args, save_model def _extract_weight(state_dict): diff --git a/examples/inference/python/export/fairseq/ls_fs_transformer_ptq_export.py b/examples/inference/python/export/fairseq/ls_fs_transformer_ptq_export.py index 5ff9c780..ae093990 100644 --- a/examples/inference/python/export/fairseq/ls_fs_transformer_ptq_export.py +++ b/examples/inference/python/export/fairseq/ls_fs_transformer_ptq_export.py @@ -12,7 +12,7 @@ export_ls_decoder_ptq, ) import lightseq.inference as lsi -from export.fairseq.util import parse_args, save_model +from export.util import parse_args, save_model # adjust this value to achieve better performance diff --git a/examples/inference/python/export/fairseq/ls_torch_fs_quant_transformer_export.py b/examples/inference/python/export/fairseq/ls_torch_fs_quant_transformer_export.py index f7b7b9c4..f10abfcb 100644 --- a/examples/inference/python/export/fairseq/ls_torch_fs_quant_transformer_export.py +++ b/examples/inference/python/export/fairseq/ls_torch_fs_quant_transformer_export.py @@ -14,7 +14,7 @@ ) from lightseq.training.ops.pytorch.util import get_pos_embedding import lightseq.inference as lsi -from export.fairseq.util import parse_args, save_model +from export.util import parse_args, save_model enc_layer_mapping_dict = OrderedDict( diff --git a/examples/inference/python/export/fairseq/ls_torch_fs_transformer_export.py b/examples/inference/python/export/fairseq/ls_torch_fs_transformer_export.py index cbee3c8d..ea223d53 100644 --- a/examples/inference/python/export/fairseq/ls_torch_fs_transformer_export.py +++ b/examples/inference/python/export/fairseq/ls_torch_fs_transformer_export.py @@ -14,7 +14,7 @@ ) from lightseq.training.ops.pytorch.util import get_pos_embedding import lightseq.inference as lsi -from export.fairseq.util import parse_args, save_model +from export.util import parse_args, save_model enc_layer_mapping_dict = OrderedDict( diff --git a/examples/inference/python/export/fairseq/ls_torch_fs_transformer_ptq_export.py b/examples/inference/python/export/fairseq/ls_torch_fs_transformer_ptq_export.py index 1ed7a0e1..2ab259e9 100644 --- a/examples/inference/python/export/fairseq/ls_torch_fs_transformer_ptq_export.py +++ b/examples/inference/python/export/fairseq/ls_torch_fs_transformer_ptq_export.py @@ -14,7 +14,7 @@ ) from lightseq.training.ops.pytorch.util import get_pos_embedding import lightseq.inference as lsi -from export.fairseq.util import parse_args, save_model +from export.util import parse_args, save_model # adjust this value to achieve better performance diff --git a/examples/inference/python/export/fairseq/native_fs_transformer_export.py b/examples/inference/python/export/fairseq/native_fs_transformer_export.py index fcc59234..49e8aab8 100644 --- a/examples/inference/python/export/fairseq/native_fs_transformer_export.py +++ b/examples/inference/python/export/fairseq/native_fs_transformer_export.py @@ -13,7 +13,7 @@ ) from lightseq.training.ops.pytorch.util import get_pos_embedding import lightseq.inference as lsi -from export.fairseq.util import parse_args, 
save_model +from export.util import parse_args, save_model enc_layer_mapping_dict = OrderedDict( diff --git a/examples/inference/python/export/fairseq/native_fs_transformer_ptq_export.py b/examples/inference/python/export/fairseq/native_fs_transformer_ptq_export.py index d97436a3..af704b6f 100644 --- a/examples/inference/python/export/fairseq/native_fs_transformer_ptq_export.py +++ b/examples/inference/python/export/fairseq/native_fs_transformer_ptq_export.py @@ -13,7 +13,7 @@ ) from lightseq.training.ops.pytorch.util import get_pos_embedding import lightseq.inference as lsi -from export.fairseq.util import parse_args, save_model +from export.util import parse_args, save_model # adjust this value to achieve better performance diff --git a/examples/inference/python/export/huggingface/ls_torch_hf_quant_bert_export.py b/examples/inference/python/export/huggingface/ls_torch_hf_quant_bert_export.py index 18dc2f1a..72f18e90 100644 --- a/examples/inference/python/export/huggingface/ls_torch_hf_quant_bert_export.py +++ b/examples/inference/python/export/huggingface/ls_torch_hf_quant_bert_export.py @@ -1,15 +1,14 @@ """ -Export Hugging Face BERT models to hdf5 format. +Export Hugging Face quantized BERT models to hdf5 format. """ import os import h5py from collections import OrderedDict -import numpy as np import torch from lightseq.training.ops.pytorch.export import apply_rule from lightseq.training.ops.pytorch.export_quant import quantize -from export.fairseq.util import parse_args +from export.util import parse_args os.environ["CUDA_VISIBLE_DEVICES"] = "-1" diff --git a/examples/inference/python/export/huggingface/ls_torch_hf_quant_gpt2_export.py b/examples/inference/python/export/huggingface/ls_torch_hf_quant_gpt2_export.py new file mode 100644 index 00000000..b403c981 --- /dev/null +++ b/examples/inference/python/export/huggingface/ls_torch_hf_quant_gpt2_export.py @@ -0,0 +1,221 @@ +""" +Export Hugging Face quantized GPT2 models to hdf5 format. +""" +import os +import h5py +from collections import OrderedDict + +import numpy as np +import torch +from lightseq.training.ops.pytorch.export import apply_rule +from lightseq.training.ops.pytorch.export_quant import quantize +from export.util import parse_args + +os.environ["CUDA_VISIBLE_DEVICES"] = "-1" + + +""" +For the mapping dictionary: key is the value of the proto parameter, +value is a powerful expression, each && split tensor name of the matching path or expression. + +The sub-pattern of the path is separated by spaces, and the expression starts with a expression_. +You can operate separately on each tensor and support multiple expressions. Multiple matching paths +and the expression will finally be concatenated on axis = -1. 
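[Editor's note] As a rough illustration of the rule syntax described above (this sketch is not part of the patch): each `&&`-separated piece of a rule is either a space-separated list of sub-patterns that must all occur in a checkpoint tensor's name, or an `expression_`-prefixed suffix applied to the matched tensor, and the results of multiple pieces are concatenated on the last axis. The resolver below is a simplified stand-in for the real `apply_rule` imported from `lightseq.training.ops.pytorch.export`; the toy `state_dict` names and shapes are invented for the example.

```python
import torch

# Toy checkpoint entries standing in for one GPT2 layer; names and shapes are made up.
state_dict = {
    "transformer.h.0.self_attn.qkv_proj.weight": torch.randn(3072, 1024),
    "transformer.h.0.self_attn_layer_norm.weight": torch.randn(1024),
}


def resolve_rule(rule, state_dict):
    """Simplified stand-in for apply_rule, based only on the docstring above."""
    outputs = []
    for part in rule.split("&&"):
        if part.startswith("expression_"):
            expr = part[len("expression_"):]  # e.g. ".transpose(0, 1)"
            outputs[-1] = eval("outputs[-1]" + expr)
        else:
            keys = [k for k in state_dict if all(p in k for p in part.split())]
            assert len(keys) == 1, f"rule {part!r} matched {keys}"
            outputs.append(state_dict[keys[0]])
    return torch.cat(outputs, dim=-1) if len(outputs) > 1 else outputs[0]


# "self_project_kernel_qkv" resolves the qkv_proj weight and transposes it.
w = resolve_rule("self_attn qkv_proj weight&&expression_.transpose(0, 1)", state_dict)
print(w.shape)  # torch.Size([1024, 3072])

# "self_norm_scale" is a plain path match with no expression.
g = resolve_rule("self_attn_layer_norm weight", state_dict)
print(g.shape)  # torch.Size([1024])
```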
+""" +enc_layer_mapping_dict = OrderedDict( + { + "self_norm_scale": "self_attn_layer_norm weight", + "self_norm_bias": "self_attn_layer_norm bias", + "self_project_kernel_qkv": "self_attn qkv_proj weight&&expression_.transpose(0, 1)", + "self_project_bias_qkv": "self_attn qkv_proj bias", + "self_project_kernel_output": "self_attn out_proj weight&&expression_.transpose(0, 1)", + "self_project_bias_output": "self_attn out_proj bias", + "ffn_norm_scale": "final_layer_norm weight", + "ffn_norm_bias": "final_layer_norm bias", + "ffn_first_kernel": "fc1 weight&&expression_.transpose(0, 1)", + "ffn_first_bias": "fc1 bias", + "ffn_second_kernel": "fc2 weight&&expression_.transpose(0, 1)", + "ffn_second_bias": "fc2 bias", + # weight_clip_max + "self_project_kernel_qkv_clip_max": "self_attn qkv_proj weight_quant clip_value_max", + "self_project_kernel_output_clip_max": "self_attn out_proj weight_quant clip_value_max", + "ffn_first_kernel_clip_max": "fc1 weight_quant clip_value_max", + "ffn_second_kernel_clip_max": "fc2 weight_quant clip_value_max", + # act_clip_max + "self_ln_clip_max": "self_attn qkv_proj input_quant clip_value_max", + "self_project_output_clip_max": "self_attn out_proj input_quant clip_value_max", + "ffn_ln_clip_max": "fc1 input_quant clip_value_max", + "ffn_first_act_clip_max": "fc2 input_quant clip_value_max", + "self_qkv_dense_clip_max": "self_attn qkv_proj output_quant clip_value_max", + "self_output_dense_clip_max": "self_attn out_proj output_quant clip_value_max", + "ffn_first_output_clip_max": "fc1 output_quant clip_value_max", + "self_qkv_bias_out_clip_max": "self_attn attention_quant clip_value_max", + } +) + +src_emb_mapping_dict = OrderedDict( + { + "norm_scale": "ln_f weight", + "norm_bias": "ln_f bias", + } +) + + +def fill_quant_hdf5_layer( + tensor_names, state_dict, hdf5_file, hdf5_dataset_prefix, mapping_dict +): + for proto_name, ckpt_rule in mapping_dict.items(): + target_tensor = apply_rule(proto_name, ckpt_rule, tensor_names, state_dict) + if proto_name.endswith("_clip_max"): + hdf5_file.create_dataset( + hdf5_dataset_prefix + proto_name, data=float(target_tensor[0]) + ) + else: + hdf5_file.create_dataset( + hdf5_dataset_prefix + proto_name, + data=target_tensor, + ) + + +def extract_gpt_weights( + output_file, + model_dir, + head_num, + generation_method, + topk=1, + topp=0.75, + eos_id=50256, + pad_id=50257, + max_step=50, +): + # load var names + state_dict = torch.load(model_dir, "cpu") + + var_name_list = list(state_dict.keys()) + + for name in var_name_list: + if name.endswith("weight_quant.clip.clip_value_max"): + state_dict[name[:-26]] = torch.Tensor( + quantize(state_dict[name[:-26]].numpy(), 127, state_dict[name].numpy()) + ).to(torch.uint8) + + # initialize output file + print("Saving model to hdf5...") + print("Writing to {0}".format(output_file)) + hdf5_file = h5py.File(output_file, "w") + + # fill each encoder layer's params + enc_tensor_names = {} + for name in var_name_list: + name_split = name.split(".") + if len(name_split) <= 2 or not name_split[2].isdigit(): + continue + layer_id = int(name_split[2]) + enc_tensor_names.setdefault(layer_id, []).append(name) + + # fill encoder_stack + for layer_id in sorted(enc_tensor_names.keys()): + fill_quant_hdf5_layer( + enc_tensor_names[layer_id], + state_dict, + hdf5_file, + f"encoder_stack/{layer_id}/", + enc_layer_mapping_dict, + ) + + # fill src_embedding - except for position embedding + fill_quant_hdf5_layer( + var_name_list, + state_dict, + hdf5_file, + "src_embedding/", + src_emb_mapping_dict, 
+    )
+
+    # handling token_embeddings for GPT
+    token_embedding = state_dict["transformer.wte.weight"]
+    token_embedding = quantize(
+        token_embedding.numpy(),
+        127,
+        state_dict["transformer.wte.emb_quant.clip.clip_value_max"].numpy(),
+    )
+    print(f"processed token_embedding, shape: {token_embedding.shape}")
+    hdf5_file.create_dataset(
+        "src_embedding/token_embedding", data=token_embedding, dtype="uint8"
+    )
+    hdf5_file.create_dataset(
+        "src_embedding/emb_clip_max",
+        data=state_dict["transformer.wte.emb_quant.clip.clip_value_max"],
+    )
+
+    # special handling for position embedding
+    position_emb = state_dict["transformer.wpe.weight"]
+    _max_allowed_step, _ = position_emb.shape
+    if max_step > _max_allowed_step:
+        print(f"max_step {max_step} exceeds max allowed step, abort.")
+        return
+    # truncate position embedding for max_step
+    position_emb = position_emb[:max_step, :]
+    print(
+        f"processed position_embedding with max_step constraint, shape: {position_emb.shape}"
+    )
+    position_emb = position_emb.flatten().tolist()
+    hdf5_file.create_dataset(
+        "src_embedding/position_embedding", data=position_emb, dtype="f4"
+    )
+
+    # save number of layers metadata
+    hdf5_file.create_dataset(
+        "model_conf/n_encoder_stack", data=len(enc_tensor_names), dtype="i4"
+    )
+    # fill in model_conf
+    hdf5_file.create_dataset("model_conf/head_num", data=head_num, dtype="i4")
+    hdf5_file.create_dataset("model_conf/src_padding_id", data=pad_id, dtype="i4")
+    hdf5_file.create_dataset(
+        "model_conf/sampling_method",
+        data=np.array([ord(c) for c in generation_method]).astype(np.int8),
+        dtype="i1",
+    )
+    hdf5_file.create_dataset("model_conf/topp", data=topp, dtype="f4")
+    hdf5_file.create_dataset("model_conf/topk", data=topk, dtype="i4")
+    hdf5_file.create_dataset("model_conf/eos_id", data=eos_id, dtype="i4")
+
+    hdf5_file.close()
+    # read-in again to double check
+    hdf5_file = h5py.File(output_file, "r")
+
+    def _print_pair(key, value):
+        if key == "sampling_method":
+            value = "".join(map(chr, value[()]))
+        else:
+            value = value[()]
+        print(f"{key}: {value}")
+
+    list(map(lambda x: _print_pair(*x), hdf5_file["model_conf"].items()))
+
+
+if __name__ == "__main__":
+    args = parse_args()
+    model_name = ".".join(args.model.split(".")[:-1])
+    hdf5_path = f"{model_name}.hdf5"
+
+    head_number = 12  # 20 for "gpt2-large"
+    # generation_method should be "topk" or "topp"
+    generation_method = "topk"
+    topk = 1
+    topp = 0.75
+    # default eos_id from https://huggingface.co/transformers/model_doc/gpt2.html#gpt2lmheadmodel
+    eos_id = 50256
+    pad_id = 50257
+    max_step = 50
+    extract_gpt_weights(
+        hdf5_path,
+        args.model,
+        head_num=head_number,  # number of attention heads
+        generation_method=generation_method,
+        topk=topk,
+        topp=topp,
+        eos_id=eos_id,
+        pad_id=pad_id,
+        max_step=max_step,
+    )
diff --git a/examples/inference/python/export/fairseq/util.py b/examples/inference/python/export/util.py
similarity index 100%
rename from examples/inference/python/export/fairseq/util.py
rename to examples/inference/python/export/util.py
diff --git a/examples/inference/python/test/ls_quant_bert.py b/examples/inference/python/test/ls_quant_bert.py
index 3b1d402e..b58f7728 100644
--- a/examples/inference/python/test/ls_quant_bert.py
+++ b/examples/inference/python/test/ls_quant_bert.py
@@ -8,7 +8,7 @@
     BertEmbeddingLayer,
     TransformerEncoderLayer,
 )
-from export.fairseq.util import parse_args
+from export.util import parse_args


 def ls_bert(model, inputs):
From 88ae1d79a45b127febe382aba430ab3fa2526bfe Mon Sep 17 00:00:00 2001
From: "weiyang.god" 
Date: Tue, 
19 Apr 2022 20:03:12 +0800 Subject: [PATCH 32/49] add quant gpt required files --- lightseq/inference/model/CMakeLists.txt | 12 + lightseq/inference/model/bert_encoder.cc.cu | 2 +- lightseq/inference/model/bert_encoder.h | 2 +- lightseq/inference/model/encoder.h | 2 +- .../inference/model/quant_bert_encoder.cc.cu | 54 +- lightseq/inference/model/quant_bert_encoder.h | 2 +- lightseq/inference/model/quant_decoder.cc.cu | 2 +- lightseq/inference/model/quant_decoder.h | 2 +- lightseq/inference/model/quant_encoder.cc.cu | 2 +- lightseq/inference/model/quant_encoder.h | 2 +- .../inference/model/quant_gpt_encoder.cc.cu | 788 ++++++++++++++++++ lightseq/inference/model/quant_gpt_encoder.h | 109 +++ lightseq/inference/proto/CMakeLists.txt | 8 + lightseq/inference/proto/quant_bert.proto | 1 - lightseq/inference/proto/quant_bert_weight.cc | 2 - lightseq/inference/proto/quant_bert_weight.h | 2 +- lightseq/inference/proto/quant_gpt.proto | 66 ++ lightseq/inference/proto/quant_gpt_weight.cc | 511 ++++++++++++ lightseq/inference/proto/quant_gpt_weight.h | 84 ++ lightseq/inference/pywrapper/CMakeLists.txt | 14 +- lightseq/inference/pywrapper/quant_gpt.cc | 209 +++++ lightseq/inference/pywrapper/quant_gpt.h | 56 ++ lightseq/inference/pywrapper/wrapper.cc | 113 +++ 23 files changed, 2004 insertions(+), 41 deletions(-) create mode 100644 lightseq/inference/model/quant_gpt_encoder.cc.cu create mode 100644 lightseq/inference/model/quant_gpt_encoder.h create mode 100644 lightseq/inference/proto/quant_gpt.proto create mode 100644 lightseq/inference/proto/quant_gpt_weight.cc create mode 100644 lightseq/inference/proto/quant_gpt_weight.h create mode 100644 lightseq/inference/pywrapper/quant_gpt.cc create mode 100644 lightseq/inference/pywrapper/quant_gpt.h diff --git a/lightseq/inference/model/CMakeLists.txt b/lightseq/inference/model/CMakeLists.txt index e767db9c..cdd116ca 100644 --- a/lightseq/inference/model/CMakeLists.txt +++ b/lightseq/inference/model/CMakeLists.txt @@ -42,6 +42,18 @@ endif() target_include_directories(gpt_model PUBLIC ${CMAKE_CURRENT_SOURCE_DIR}) +add_library(quant_gpt_model STATIC quant_gpt_encoder.cc.cu) +target_link_libraries(quant_gpt_model PUBLIC cuda_kernels) +target_link_libraries(quant_gpt_model PUBLIC quant_gpt_weight) +if(DYNAMIC_API) + target_link_libraries(quant_gpt_model PRIVATE CUDA::cublas CUDA::cublasLt) +else() + target_link_libraries(quant_gpt_model PRIVATE CUDA::cublas_static + CUDA::cublasLt_static) +endif() + +target_include_directories(quant_gpt_model PUBLIC ${CMAKE_CURRENT_SOURCE_DIR}) + add_library(bert_model STATIC bert_encoder.cc.cu) target_link_libraries(bert_model PUBLIC cuda_kernels) target_link_libraries(bert_model PUBLIC bert_weight) diff --git a/lightseq/inference/model/bert_encoder.cc.cu b/lightseq/inference/model/bert_encoder.cc.cu index 755267b8..a7160eb3 100644 --- a/lightseq/inference/model/bert_encoder.cc.cu +++ b/lightseq/inference/model/bert_encoder.cc.cu @@ -4,7 +4,7 @@ /** @file -Transformer encoder, composed by gemm lib and +Bert encoder, composed by gemm lib and custom cuda kernel function */ diff --git a/lightseq/inference/model/bert_encoder.h b/lightseq/inference/model/bert_encoder.h index ef7ae6af..4a4a5ca1 100644 --- a/lightseq/inference/model/bert_encoder.h +++ b/lightseq/inference/model/bert_encoder.h @@ -17,7 +17,7 @@ /** @file -Transformer decoder, composed by gemm lib and +Bert encoder, composed by gemm lib and custom cuda kernel function */ diff --git a/lightseq/inference/model/encoder.h b/lightseq/inference/model/encoder.h index 
b54bf6b7..fe204dcb 100644 --- a/lightseq/inference/model/encoder.h +++ b/lightseq/inference/model/encoder.h @@ -17,7 +17,7 @@ /** @file -Transformer decoder, composed by gemm lib and +Transformer encoder, composed by gemm lib and custom cuda kernel function */ diff --git a/lightseq/inference/model/quant_bert_encoder.cc.cu b/lightseq/inference/model/quant_bert_encoder.cc.cu index 358eaaa3..8af3ff9c 100644 --- a/lightseq/inference/model/quant_bert_encoder.cc.cu +++ b/lightseq/inference/model/quant_bert_encoder.cc.cu @@ -6,7 +6,7 @@ /** @file -Transformer encoder, composed by gemm lib and +QuantBert encoder, composed by gemm lib and custom cuda kernel function */ @@ -127,25 +127,25 @@ void QuantBertEncoder::init_buffer() { quantize_weight(_p_d_enc_wei[_weight_offset + 2], _int8_p_d_enc_wei[_layer_id * 4], _tw._hidden_size, _tw._hidden_size * 3, - _quant_range / _enc_clip_max[_layer_id * 12], _stream, + _quant_range / _enc_clip_max[_layer_id * 11], _stream, _cublas_lt_handle); quantize_weight(_p_d_enc_wei[_weight_offset + 4], _int8_p_d_enc_wei[_layer_id * 4 + 1], _tw._hidden_size, _tw._hidden_size, - _quant_range / _enc_clip_max[_layer_id * 12 + 1], _stream, + _quant_range / _enc_clip_max[_layer_id * 11 + 1], _stream, _cublas_lt_handle); quantize_weight(_p_d_enc_wei[_weight_offset + 8], _int8_p_d_enc_wei[_layer_id * 4 + 2], _tw._hidden_size, _tw._inner_size, - _quant_range / _enc_clip_max[_layer_id * 12 + 2], _stream, + _quant_range / _enc_clip_max[_layer_id * 11 + 2], _stream, _cublas_lt_handle); quantize_weight(_p_d_enc_wei[_weight_offset + 10], _int8_p_d_enc_wei[_layer_id * 4 + 3], _tw._inner_size, _tw._hidden_size, - _quant_range / _enc_clip_max[_layer_id * 12 + 3], _stream, + _quant_range / _enc_clip_max[_layer_id * 11 + 3], _stream, _cublas_lt_handle); if (_tw._use_gelu) { @@ -153,7 +153,7 @@ void QuantBertEncoder::init_buffer() { } else { CHECK_GPU_ERROR(cudaMalloc(&_scaled_ffn2_colsum[_layer_id], _tw._hidden_size * sizeof(_DataType))); - float relu_scale = _enc_clip_max[_layer_id * 12 + 7] / 2; + float relu_scale = _enc_clip_max[_layer_id * 11 + 7] / 2; _DataType *temp; int weight_size = _tw._inner_size * _tw._hidden_size; @@ -270,7 +270,7 @@ void QuantBertEncoder::self_attention() { _batch_token_num, _tw._hidden_size, _stream, _p_d_output, _int8_ffn_in_buf, _p_device_wei[_weight_offset], _p_device_wei[_weight_offset + 1], _p_device_wei[_weight_offset + 5], - _max_thread_per_block, _quant_range / _enc_clip_max[_layer_id * 12 + 4], + _max_thread_per_block, _quant_range / _enc_clip_max[_layer_id * 11 + 4], _tw._is_post_ln, true); } CHECK_GPU_ERROR(cudaGetLastError()); @@ -291,8 +291,8 @@ void QuantBertEncoder::self_attention() { cublasLtMM_withAlgo_i8IO( _int8_ffn_out_buf, 1, _batch_token_num, _tw._hidden_size * 3, _tw._hidden_size, 0, 0, 0, - _enc_clip_max[_layer_id * 12] * _enc_clip_max[_layer_id * 12 + 4] / - (_enc_clip_max[_layer_id * 12 + 8] * _quant_range), + _enc_clip_max[_layer_id * 11] * _enc_clip_max[_layer_id * 11 + 4] / + (_enc_clip_max[_layer_id * 11 + 8] * _quant_range), _int8_ffn_in_buf, _int8_p_d_enc_wei[_layer_id * 4], _cublas_lt_handle, _stream, false); @@ -301,7 +301,7 @@ void QuantBertEncoder::self_attention() { _batch_token_num, _tw._hidden_size, _stream, _int8_ffn_out_buf, _p_device_wei[_weight_offset + 3], _p_d_q, _max_batch_dim, _batch_seq_len, _tw._dim_per_head, _tw._head_num, _max_thread_per_block, - _enc_clip_max[_layer_id * 12 + 8] / _quant_range, true); + _enc_clip_max[_layer_id * 11 + 8] / _quant_range, true); /* ---step 2. 
correlation = q * k, perform softmax on correlation--- */ CHECK_GPU_ERROR(cublasGemmStridedBatchedEx( @@ -330,7 +330,7 @@ void QuantBertEncoder::self_attention() { ker_arrange_atten_output_i8O_launcher<_DataType>( _batch_token_num, _tw._hidden_size, _stream, _p_d_q, _int8_ffn_in_buf, _batch_seq_len, _tw._dim_per_head, _tw._head_num, _max_thread_per_block, - _quant_range / _enc_clip_max[_layer_id * 12 + 5], true); + _quant_range / _enc_clip_max[_layer_id * 11 + 5], true); #ifdef DEBUG_RESULT for (int i = 0; i < _batch_size; i++) { // batch_id @@ -347,8 +347,8 @@ void QuantBertEncoder::self_attention() { cublasLtMM_withAlgo_i8IO( _int8_ffn_out_buf, 1, _batch_token_num, _tw._hidden_size, _tw._hidden_size, 0, 0, 0, - _enc_clip_max[_layer_id * 12 + 1] * _enc_clip_max[_layer_id * 12 + 5] / - (_enc_clip_max[_layer_id * 12 + 9] * _quant_range), + _enc_clip_max[_layer_id * 11 + 1] * _enc_clip_max[_layer_id * 11 + 5] / + (_enc_clip_max[_layer_id * 11 + 9] * _quant_range), _int8_ffn_in_buf, _int8_p_d_enc_wei[_layer_id * 4 + 1], _cublas_lt_handle, _stream, false); @@ -367,8 +367,8 @@ void QuantBertEncoder::self_attention() { _int8_ffn_out_buf, _p_device_wei[_weight_offset + 6], _p_device_wei[_weight_offset + 7], _p_device_wei[_weight_offset + 11], _int8_ffn_in_buf, _p_d_output, _batch_token_num, _tw._hidden_size, - _enc_clip_max[_layer_id * 12 + 9] / _quant_range, - _quant_range / _enc_clip_max[_layer_id * 12 + 6], _max_thread_per_block, + _enc_clip_max[_layer_id * 11 + 9] / _quant_range, + _quant_range / _enc_clip_max[_layer_id * 11 + 6], _max_thread_per_block, _stream, _tw._is_post_ln, true); return; @@ -391,8 +391,8 @@ void QuantBertEncoder::ffn_add_norm() { cublasLtMM_withAlgo_i8IO( _int8_ffn_out_buf, 1, _batch_token_num, _tw._inner_size, _tw._hidden_size, 0, 0, 0, - _enc_clip_max[_layer_id * 12 + 2] * _enc_clip_max[_layer_id * 12 + 6] / - (_enc_clip_max[_layer_id * 12 + 10] * _quant_range), + _enc_clip_max[_layer_id * 11 + 2] * _enc_clip_max[_layer_id * 11 + 6] / + (_enc_clip_max[_layer_id * 11 + 10] * _quant_range), _int8_ffn_in_buf, _int8_p_d_enc_wei[_layer_id * 4 + 2], _cublas_lt_handle, _stream, false); @@ -400,15 +400,15 @@ void QuantBertEncoder::ffn_add_norm() { ker_bias_gelu_i8I_i8O_launcher<_DataType>( _batch_token_num, _stream, _int8_ffn_out_buf, _int8_ffn_in_buf, _p_device_wei[_weight_offset + 9], _tw._inner_size, - _enc_clip_max[_layer_id * 12 + 10] / _quant_range, - _quant_range / _enc_clip_max[_layer_id * 12 + 7], true); + _enc_clip_max[_layer_id * 11 + 10] / _quant_range, + _quant_range / _enc_clip_max[_layer_id * 11 + 7], true); } else { ker_bias_relu_i8I_i8O_launcher<_DataType>( _batch_token_num, _stream, _int8_ffn_out_buf, _int8_ffn_in_buf, _p_device_wei[_weight_offset + 9], _tw._inner_size, - _enc_clip_max[_layer_id * 12 + 10] / _quant_range, - _quant_range / _enc_clip_max[_layer_id * 12 + 7], - _enc_clip_max[_layer_id * 12 + 7], true, true, true); + _enc_clip_max[_layer_id * 11 + 10] / _quant_range, + _quant_range / _enc_clip_max[_layer_id * 11 + 7], + _enc_clip_max[_layer_id * 11 + 7], true, true, true); } #ifdef DEBUG_RESULT @@ -431,12 +431,12 @@ void QuantBertEncoder::ffn_add_norm() { const _DataType *scale_ptr, *bias_ptr, *res_bias_ptr; float clip_max, dequant_scale; if (_tw._use_gelu) { - dequant_scale = _enc_clip_max[_layer_id * 12 + 3] * - _enc_clip_max[_layer_id * 12 + 7] / + dequant_scale = _enc_clip_max[_layer_id * 11 + 3] * + _enc_clip_max[_layer_id * 11 + 7] / (_quant_range * _quant_range); } else { - dequant_scale = _enc_clip_max[_layer_id * 12 + 3] * - 
_enc_clip_max[_layer_id * 12 + 7] / + dequant_scale = _enc_clip_max[_layer_id * 11 + 3] * + _enc_clip_max[_layer_id * 11 + 7] / (2 * _quant_range * _quant_range); } if (_layer_id == _tw._n_enc_layer - 1) { @@ -452,7 +452,7 @@ void QuantBertEncoder::ffn_add_norm() { bias_ptr = _p_device_wei[(_layer_id + 1) * _tw._weight_per_enc_layer + 1]; res_bias_ptr = _p_device_wei[(_layer_id + 1) * _tw._weight_per_enc_layer + 5]; - clip_max = _enc_clip_max[(_layer_id + 1) * 12 + 4]; + clip_max = _enc_clip_max[(_layer_id + 1) * 11 + 4]; ker_residual_bias_ln_i32I_i8O_launcher<_DataType>( _int32_ffn_out_buf, scale_ptr, bias_ptr, res_bias_ptr, _int8_ffn_in_buf, diff --git a/lightseq/inference/model/quant_bert_encoder.h b/lightseq/inference/model/quant_bert_encoder.h index 54a6382a..55c26702 100644 --- a/lightseq/inference/model/quant_bert_encoder.h +++ b/lightseq/inference/model/quant_bert_encoder.h @@ -18,7 +18,7 @@ /** @file -Transformer decoder, composed by gemm lib and +QuantBert encoder, composed by gemm lib and custom cuda kernel function */ diff --git a/lightseq/inference/model/quant_decoder.cc.cu b/lightseq/inference/model/quant_decoder.cc.cu index 42ac5402..b9ab65dc 100644 --- a/lightseq/inference/model/quant_decoder.cc.cu +++ b/lightseq/inference/model/quant_decoder.cc.cu @@ -7,7 +7,7 @@ /** @file -Transformer decoder, composed by gemm lib and +QuantTransformer decoder, composed by gemm lib and custom cuda kernel function */ diff --git a/lightseq/inference/model/quant_decoder.h b/lightseq/inference/model/quant_decoder.h index fb524b0b..e31b8a05 100644 --- a/lightseq/inference/model/quant_decoder.h +++ b/lightseq/inference/model/quant_decoder.h @@ -20,7 +20,7 @@ /** @file -Transformer decoder, composed by gemm lib and +QuantTransformer decoder, composed by gemm lib and custom cuda kernel function */ namespace lightseq { diff --git a/lightseq/inference/model/quant_encoder.cc.cu b/lightseq/inference/model/quant_encoder.cc.cu index 1592e974..4f736848 100644 --- a/lightseq/inference/model/quant_encoder.cc.cu +++ b/lightseq/inference/model/quant_encoder.cc.cu @@ -7,7 +7,7 @@ /** @file -Transformer encoder, composed by gemm lib and +QuantTransformer encoder, composed by gemm lib and custom cuda kernel function */ diff --git a/lightseq/inference/model/quant_encoder.h b/lightseq/inference/model/quant_encoder.h index 953ea3b6..0d77114b 100644 --- a/lightseq/inference/model/quant_encoder.h +++ b/lightseq/inference/model/quant_encoder.h @@ -18,7 +18,7 @@ /** @file -Transformer decoder, composed by gemm lib and +QuantTransformer encoder, composed by gemm lib and custom cuda kernel function */ diff --git a/lightseq/inference/model/quant_gpt_encoder.cc.cu b/lightseq/inference/model/quant_gpt_encoder.cc.cu new file mode 100644 index 00000000..2c3ef050 --- /dev/null +++ b/lightseq/inference/model/quant_gpt_encoder.cc.cu @@ -0,0 +1,788 @@ +#include "../kernels/gptKernels.h" +#include "../kernels/transformerKernels.h" +#include "../kernels/transformerKernels_int8.h" +#include "quant_gpt_encoder.h" +#include "cublas_helper.h" + +/** +@file +QuantGPT encoder, composed by gemm lib and + custom cuda kernel function +*/ + +// #define DEBUG_RESULT + +namespace lightseq { +namespace cuda { + +template +QuantGptEncoder::QuantGptEncoder( + int max_batch_size, const int *p_d_token_id, float *p_d_ppl, + int *p_d_sample_id, const QuantGptWeight &tw, cudaStream_t stream, + cudaStream_t cache_stream, cublasHandle_t hd) + : _max_batch_size(max_batch_size), + _p_d_token_id(p_d_token_id), + _p_d_ppl(p_d_ppl), + 
_p_d_sample_id(p_d_sample_id), + _tw(tw), + _stream(stream), + _cache_stream(cache_stream), + _hd(hd), + _p_d_src_emb_wei(tw.get_src_emb_wei()), + _p_d_enc_wei(tw.get_enc_wei()), + _fone((_DataType)1.f), + _fzero((_DataType)0.f), + _atten_scaler((_DataType)sqrt(1.f / tw._dim_per_head)), + _max_batch_dim(max_batch_size * tw._max_step * tw._hidden_size), + _max_thread_per_block(1024), + _h_real_seq_len(max_batch_size, 0), + _h_ppl(max_batch_size, 0.f), + _h_sample_id(max_batch_size * tw._max_step, 0), + _h_unfinished(1) {} + +/** +Compute GPU memory size needed by gpt encoder, + to see how these memory is used, checkout init_buffer() for detail +*/ +template +size_t QuantGptEncoder::compute_buffer_bytesize() { + int si = _max_batch_size; + size_t sz0 = (size_t)_max_batch_dim; + sz0 += 2 * (size_t)_max_batch_dim * (size_t)_tw._n_enc_layer; + long long sz1 = (size_t)_max_batch_dim * 6 + + (size_t)_max_batch_size * (size_t)_tw._head_num * + (size_t)_tw._max_step * (size_t)_tw._max_step; + long long sz2 = (size_t)_max_batch_dim + (size_t)_max_batch_size * + (size_t)_tw._max_step * + (size_t)_tw._inner_size; + long long sz3 = (size_t)_max_batch_size * (size_t)_tw._max_step * + (size_t)_tw._src_vocab_size; + return (sz0 + max(max(sz1, sz2), sz3)) * sizeof(_DataType) + si * sizeof(int); +} + +/** +Init the GPU memory pointer which point to + the memory buffer needed by encoder. +These buffer are used during custom cuda kernel function, + find the corresponding function to see how these buffer are used +*/ +template +void QuantGptEncoder::init_buffer(void *pbuf) { + // int buffer + int *p_d_int = reinterpret_cast(pbuf); + _p_d_real_seq_len = p_d_int; + p_d_int += _max_batch_size; + + // datatype buffer + _DataType *p_d_datatype = reinterpret_cast<_DataType *>(p_d_int); + _p_d_query = p_d_datatype; + _p_d_k_cache = _p_d_query + _max_batch_dim; + _p_d_v_cache = _p_d_k_cache + _max_batch_dim * _tw._n_enc_layer; + p_d_datatype = _p_d_v_cache + _max_batch_dim * _tw._n_enc_layer; + // reuse 1 --------------------- + _p_d_qkv_projected = p_d_datatype; + _p_d_q = _p_d_qkv_projected + _max_batch_dim * 3; + _p_d_k = _p_d_q + _max_batch_dim; + _p_d_v = _p_d_k + _max_batch_dim; + // _max_batch_size * _tw._head_num * + // _tw._max_step * _tw._max_step + _p_d_c = _p_d_v + _max_batch_dim; + // reuse 2 --------------------- + _p_d_ffn_buf1 = p_d_datatype; + // _max_batch_size * _tw._max_step * _tw._inner_size + _p_d_ffn_buf2 = _p_d_ffn_buf1 + _max_batch_dim; + // reuse 3 --------------------- + // _max_batch_size * _tw._max_step * _tw._src_vocab_size + _p_d_logit = p_d_datatype; + CHECK_GPU_ERROR(cudaMalloc((void **)&_p_d_curandstate, + _max_batch_size * sizeof(curandState))); + CHECK_GPU_ERROR(cudaMalloc((void **)&_p_d_sample_id_buf, + _max_batch_size * _tw._max_step * sizeof(int))); + CHECK_GPU_ERROR(cudaMalloc((void **)&_p_d_unfinished, sizeof(int))); + ker_curand_setup<<<_max_batch_size, 1, 0, _stream>>>(_p_d_curandstate); + return; +} + +/** +Some requirements needed by custom cuda kernel function +*/ +template +std::string QuantGptEncoder::check() { + // if (_max_thread_per_block < _tw._hidden_size) { + // return "violate hidden_size <= max_thread_per_block"; + // } + if (_tw._inner_size & 1) { + return "violate inner_size % 2 = 0"; + } + if (_tw._dim_per_head & 1) { + return "violate dim_per_head % 2 = 0"; + } + if (_p_d_src_emb_wei.size() != 4) { + return "violate p_d_src_emb_wei.size() = 4"; + } + if (_p_d_enc_wei.size() != _tw._weight_per_enc_layer * _tw._n_enc_layer) { + return "violate 
p_d_enc_wei.size() = weight_per_enc_layer * n_enc_layer"; + } + std::string sampling_method = _tw._sampling_method; + if (kSamplingMethods.find(sampling_method) == kSamplingMethods.end()) { + return std::string("unsupported sampling_method: ") + sampling_method; + } + + if (_tw._topk <= 0) { + return "topk must be positive"; + } + if (_tw._topp <= 0 && _tw._topp >= 1.0) { + return "topp must be in (0, 1)"; + } + + return ""; +} + +template +void QuantGptEncoder::run_one_infer(int batch_size, + int batch_seq_len) { + if (batch_size > _max_batch_size) { + throw std::runtime_error("batch size of input greater than max_batch_size"); + } + if (batch_seq_len > _tw._max_step) { + throw std::runtime_error("seq len of input greater than max_step"); + } + _batch_size = batch_size; + _batch_seq_len = batch_seq_len; + _batch_token_num = batch_size * batch_seq_len; + CHECK_GPU_ERROR(cudaMemcpyAsync(_p_d_real_seq_len, _h_real_seq_len.data(), + sizeof(int) * _batch_size, + cudaMemcpyHostToDevice, _stream)); + CHECK_GPU_ERROR(cudaMemcpyAsync(_p_d_ppl, _h_ppl.data(), + sizeof(float) * _batch_size, + cudaMemcpyHostToDevice, _stream)); + +#ifdef DEBUG_RESULT + std::cout << "batch_size-" << batch_size << " batch_seq_len-" << batch_seq_len + << std::endl; + print_vec(_p_d_token_id, "batch_token_ids", batch_size * batch_seq_len); +#endif + + // token embedding, add position embedding and layer_norm + ker_gpt_embedding_launcher<_DataType>( + batch_size, batch_seq_len, _tw._hidden_size, _stream, _p_d_src_emb_wei[0], + _p_d_src_emb_wei[1], _p_d_token_id, _p_d_query, _p_d_real_seq_len, + _tw._padding_id, 0); + +#ifdef DEBUG_RESULT + print_vec(_p_d_query, "input embeddings", + _batch_token_num * _tw._hidden_size - 5, + _batch_token_num * _tw._hidden_size); +#endif + + for (_layer_id = 0; _layer_id < _tw._n_enc_layer; _layer_id++) { + _weight_offset = _layer_id * _tw._weight_per_enc_layer; + self_attention(); + ffn_add_norm(); + } + + // last layer norm + ker_norm_layer_launcher<_DataType>( + _batch_token_num, _tw._hidden_size, _stream, _p_d_query, + _p_d_src_emb_wei[2], _p_d_src_emb_wei[3], _max_thread_per_block); + + compute_ppl(); + + return; +} + +template +int QuantGptEncoder::run_one_sample(int batch_size, + int batch_seq_len) { + if (batch_size > _max_batch_size) { + throw std::runtime_error("batch size of input greater than max_batch_size"); + } + if (batch_seq_len > _tw._max_step) { + throw std::runtime_error("seq len of input greater than max_step"); + } + _batch_size = batch_size; + _batch_seq_len = batch_seq_len; + _batch_token_num = batch_size * batch_seq_len; + + CHECK_GPU_ERROR(cudaMemcpyAsync(_p_d_real_seq_len, _h_real_seq_len.data(), + sizeof(int) * _batch_size, + cudaMemcpyHostToDevice, _stream)); + CHECK_GPU_ERROR(cudaMemcpyAsync(_p_d_ppl, _h_ppl.data(), + sizeof(float) * _batch_size, + cudaMemcpyHostToDevice, _stream)); + CHECK_GPU_ERROR(cudaMemcpyAsync(_p_d_sample_id, _p_d_token_id, + sizeof(int) * _batch_size * _batch_seq_len, + cudaMemcpyDeviceToDevice, _stream)); +#ifdef DEBUG_RESULT + std::cout << "batch_size-" << batch_size << " batch_seq_len-" << batch_seq_len + << std::endl; + std::cout << "Sample with " << _tw._sampling_method << std::endl; + std::cout << "padding_id: " << _tw._padding_id << std::endl; + std::cout << "vocab_size: " << _tw._src_vocab_size << std::endl; + print_vec(_p_d_sample_id, "batch_token_ids", batch_size * batch_seq_len); +#endif + + // token embedding, add position embedding and layer_norm + ker_gpt_embedding_launcher<_DataType>( + _batch_size, _batch_seq_len, 
_tw._hidden_size, _stream, + _p_d_src_emb_wei[0], _p_d_src_emb_wei[1], _p_d_sample_id, _p_d_query, + _p_d_real_seq_len, _tw._padding_id, 0); + +#ifdef DEBUG_RESULT + print_vec(_p_d_query, "embedding", _batch_token_num * _tw._hidden_size - 10, + _batch_token_num * _tw._hidden_size); +#endif + + for (_layer_id = 0; _layer_id < _tw._n_enc_layer; _layer_id++) { + _weight_offset = _layer_id * _tw._weight_per_enc_layer; + self_attention(true); + ffn_add_norm(); + } + + // last layer norm + ker_norm_layer_launcher<_DataType>( + _batch_token_num, _tw._hidden_size, _stream, _p_d_query, + _p_d_src_emb_wei[2], _p_d_src_emb_wei[3], _max_thread_per_block); + if (sample_one_token() == 0 || _batch_seq_len >= _tw._max_step) { + CHECK_GPU_ERROR(cudaMemcpyAsync(_p_d_sample_id_buf, _p_d_sample_id, + _batch_token_num * sizeof(int), + cudaMemcpyDeviceToDevice, _stream)); + CHECK_GPU_ERROR(cudaStreamSynchronize(_stream)); + return _batch_seq_len; + } + + while (1) { +#ifdef DEBUG_RESULT + std::cout << "before sample:batch_size-" << _batch_size << " batch_seq_len-" + << _batch_seq_len << std::endl; + print_vec(_p_d_sample_id, "batch_token_ids", _batch_token_num); +#endif + + // token embedding, add position embedding and layer_norm + ker_gpt_embedding_launcher<_DataType>( + _batch_size, 1, _tw._hidden_size, _stream, _p_d_src_emb_wei[0], + _p_d_src_emb_wei[1], _p_d_last_sample_id, _p_d_query, _p_d_real_seq_len, + _tw._padding_id, _batch_seq_len - 1); +#ifdef DEBUG_RESULT + print_vec(_p_d_query, "embedding", _batch_size * _tw._hidden_size - 10, + _batch_size * _tw._hidden_size); +#endif + for (_layer_id = 0; _layer_id < _tw._n_enc_layer; _layer_id++) { + _weight_offset = _layer_id * _tw._weight_per_enc_layer; + self_attention_with_cache(); + ffn_add_norm_with_cache(); + } + + // last layer norm + ker_norm_layer_launcher<_DataType>( + _batch_size, _tw._hidden_size, _stream, _p_d_query, _p_d_src_emb_wei[2], + _p_d_src_emb_wei[3], _max_thread_per_block); +#ifdef DEBUG_RESULT + + print_vec(_p_d_query, "_p_d_query before logits", + _batch_size * _tw._hidden_size - 10, + _batch_size * _tw._hidden_size); + if (sample_one_token_with_cache() == 0 || _batch_seq_len >= _tw._max_step) + break; +#else + if (sample_one_token_with_cache() == 0 || _batch_seq_len >= _tw._max_step) + break; +#endif + } + + CHECK_GPU_ERROR(cudaMemcpyAsync(_p_d_sample_id_buf, _p_d_sample_id, + _batch_token_num * sizeof(int), + cudaMemcpyDeviceToDevice, _stream)); + CHECK_GPU_ERROR(cudaStreamSynchronize(_stream)); + + return _batch_seq_len; +} + +template +int QuantGptEncoder::sample_one_token() { + /* ---step 1. project hidden states to vocab logits--- */ + CHECK_GPU_ERROR(cublasGemmEx( + _hd, CUBLAS_OP_T, CUBLAS_OP_N, _tw._src_vocab_size, _batch_token_num, + _tw._hidden_size, &_fone, _p_d_src_emb_wei[0], _AType, _tw._hidden_size, + _p_d_query, _BType, _tw._hidden_size, &_fzero, _p_d_logit, _CType, + _tw._src_vocab_size, _computeType, CUBLAS_GEMM_DEFAULT_TENSOR_OP)); +#ifdef DEBUG_RESULT + print_vec(_p_d_logit, "logits", _batch_token_num * _tw._src_vocab_size - 10, + _batch_token_num * _tw._src_vocab_size); +#endif + CHECK_GPU_ERROR(cudaMemsetAsync(_p_d_unfinished, 0, sizeof(int), _stream)); + /* ---step 2. 
sample new tokens from logits */ + if (_tw._sampling_method == "topk") { +#ifdef DEBUG_RESULT + std::cout << "sampling using topk\n"; +#endif + ker_topk_sample_launcher<_DataType>( + _batch_size, _batch_seq_len, _batch_seq_len, _max_thread_per_block, + _stream, _p_d_logit, _p_d_sample_id, _p_d_sample_id_buf, + _p_d_real_seq_len, _tw._src_vocab_size, _tw._topk, _p_d_unfinished, + _p_d_curandstate, _tw._eos_id); + } else { +#ifdef DEBUG_RESULT + std::cout << "sampling using topp\n"; +#endif + ker_topp_sample_launcher<_DataType>( + _batch_size, _batch_seq_len, _batch_seq_len, _max_thread_per_block, + _stream, _p_d_logit, _p_d_sample_id, _p_d_sample_id_buf, + _p_d_real_seq_len, _tw._src_vocab_size, _tw._topp, _p_d_unfinished, + _p_d_curandstate, _tw._eos_id); + } + int *temp = _p_d_sample_id; + _p_d_sample_id = _p_d_sample_id_buf; + _p_d_sample_id_buf = temp; + CHECK_GPU_ERROR(cudaMemcpyAsync(&_h_unfinished, _p_d_unfinished, sizeof(int), + cudaMemcpyDeviceToHost, _stream)); + CHECK_GPU_ERROR(cudaStreamSynchronize(_stream)); + _p_d_last_sample_id = _p_d_sample_id_buf + _batch_token_num; + _batch_seq_len++; + _batch_token_num += _batch_size; + return _h_unfinished; +} + +template +int QuantGptEncoder::sample_one_token_with_cache() { + /* ---step 1. project hidden states to vocab logits--- */ + CHECK_GPU_ERROR(cublasGemmEx( + _hd, CUBLAS_OP_T, CUBLAS_OP_N, _tw._src_vocab_size, _batch_size, + _tw._hidden_size, &_fone, _p_d_src_emb_wei[0], _AType, _tw._hidden_size, + _p_d_query, _BType, _tw._hidden_size, &_fzero, _p_d_logit, _CType, + _tw._src_vocab_size, _computeType, CUBLAS_GEMM_DEFAULT_TENSOR_OP)); + +#ifdef DEBUG_RESULT + print_vec(_p_d_logit, "sampling-logits", + _batch_size * _tw._src_vocab_size - 5, + _batch_size * _tw._src_vocab_size); +#endif + + CHECK_GPU_ERROR(cudaMemsetAsync(_p_d_unfinished, 0, sizeof(int), _stream)); + // /* ---step 2. sample new tokens from logits */ + if (_tw._sampling_method == "topk") { +#ifdef DEBUG_RESULT + std::cout << "sampling using topk\n"; +#endif + ker_topk_sample_launcher<_DataType>( + _batch_size, _batch_seq_len, 1, _max_thread_per_block, _stream, + _p_d_logit, _p_d_sample_id, _p_d_sample_id_buf, _p_d_real_seq_len, + _tw._src_vocab_size, _tw._topk, _p_d_unfinished, _p_d_curandstate, + _tw._eos_id); + } else { +#ifdef DEBUG_RESULT + std::cout << "sampling using topp\n"; +#endif + ker_topp_sample_launcher<_DataType>( + _batch_size, _batch_seq_len, 1, _max_thread_per_block, _stream, + _p_d_logit, _p_d_sample_id, _p_d_sample_id_buf, _p_d_real_seq_len, + _tw._src_vocab_size, _tw._topp, _p_d_unfinished, _p_d_curandstate, + _tw._eos_id); + } + int *temp = _p_d_sample_id; + _p_d_sample_id = _p_d_sample_id_buf; + _p_d_sample_id_buf = temp; + CHECK_GPU_ERROR(cudaMemcpyAsync(&_h_unfinished, _p_d_unfinished, sizeof(int), + cudaMemcpyDeviceToHost, _stream)); + CHECK_GPU_ERROR(cudaStreamSynchronize(_stream)); + _p_d_last_sample_id = _p_d_sample_id_buf + _batch_token_num; + _batch_seq_len++; + _batch_token_num += _batch_size; + return _h_unfinished; +} + +template +void QuantGptEncoder::self_attention(bool cache) { + /* ---step 0. 
layer_norm, add output_bias to "query"--- */ + ker_norm_layer_resual_launcher<_DataType>( + _batch_token_num, _tw._hidden_size, _stream, _p_d_query, _p_d_q, + _p_d_enc_wei[_weight_offset], _p_d_enc_wei[_weight_offset + 1], + _p_d_enc_wei[_weight_offset + 5], _max_thread_per_block); + +#ifdef DEBUG_RESULT + if (_layer_id == 0) { + print_vec(_p_d_query, "input with bias", + _batch_token_num * _tw._hidden_size - 5, + _batch_token_num * _tw._hidden_size); + print_vec(_p_d_q, "first ln output", + _batch_token_num * _tw._hidden_size - 5, + _batch_token_num * _tw._hidden_size); + } +#endif + + /* ---step 1. qkv = ori_q * qkv_wei + bias, and reshape qkv for multi-head + * gemm--- */ + CHECK_GPU_ERROR(cublasGemmEx( + _hd, CUBLAS_OP_N, CUBLAS_OP_N, _tw._hidden_size * 3, _batch_token_num, + _tw._hidden_size, &_fone, _p_d_enc_wei[_weight_offset + 2], _AType, + _tw._hidden_size * 3, _p_d_q, _BType, _tw._hidden_size, &_fzero, + _p_d_qkv_projected, _CType, _tw._hidden_size * 3, _computeType, + CUBLAS_GEMM_DEFAULT_TENSOR_OP)); + +#ifdef DEBUG_RESULT + if (_layer_id == 0) { + std::cout << "hidden_size: " << _tw._hidden_size << std::endl; + std::cout << "_batch_token_num: " << _batch_token_num << std::endl; + std::cout << "_dim_per_head: " << _tw._dim_per_head << std::endl; + std::cout << "_head_num: " << _tw._head_num << std::endl; + + print_vec(_p_d_enc_wei[_weight_offset + 2], "qkv_weight_mat", + _tw._hidden_size * _tw._hidden_size * 3 - 5, + _tw._hidden_size * _tw._hidden_size * 3); + print_vec(_p_d_qkv_projected, "_p_d_qkv_projected", + _batch_token_num * _tw._hidden_size * 3 - 5, + _batch_token_num * _tw._hidden_size * 3); + } +#endif + // get q, k, v by split and reshape qkv + ker_arrange_encself_qkv_launcher<_DataType>( + _batch_token_num, _tw._hidden_size, _stream, _p_d_qkv_projected, + _p_d_enc_wei[_weight_offset + 3], _p_d_q, _max_batch_dim, _batch_seq_len, + _tw._dim_per_head, _tw._head_num, _max_thread_per_block); + + if (cache) { + cudaStream_t stream; + if (_batch_token_num > 360) { + stream = _cache_stream; + CHECK_GPU_ERROR(cudaStreamSynchronize(_stream)); + } else { + stream = _stream; + } + CHECK_GPU_ERROR( + cudaMemcpyAsync(_p_d_k_cache + _layer_id * _max_batch_dim, _p_d_k, + _batch_token_num * _tw._hidden_size * sizeof(_DataType), + cudaMemcpyDeviceToDevice, stream)); + CHECK_GPU_ERROR( + cudaMemcpyAsync(_p_d_v_cache + _layer_id * _max_batch_dim, _p_d_v, + _batch_token_num * _tw._hidden_size * sizeof(_DataType), + cudaMemcpyDeviceToDevice, stream)); + } + +#ifdef DEBUG_RESULT + if (_layer_id == 0) { + print_vec(_p_d_q, "_p_d_q", _batch_token_num * _tw._hidden_size - 5, + _batch_token_num * _tw._hidden_size); + print_vec(_p_d_k, "_p_d_k", _batch_token_num * _tw._hidden_size - 5, + _batch_token_num * _tw._hidden_size); + print_vec(_p_d_v, "_p_d_v", _batch_token_num * _tw._hidden_size - 5, + _batch_token_num * _tw._hidden_size); + } +#endif + + /* ---step 2. 
correlation = q * k, perform softmax on correlation--- */ + CHECK_GPU_ERROR(cublasGemmStridedBatchedEx( + _hd, CUBLAS_OP_T, CUBLAS_OP_N, _batch_seq_len, _batch_seq_len, + _tw._dim_per_head, &_atten_scaler, _p_d_k, _AType, _tw._dim_per_head, + _batch_seq_len * _tw._dim_per_head, _p_d_q, _BType, _tw._dim_per_head, + _batch_seq_len * _tw._dim_per_head, &_fzero, _p_d_c, _CType, + _batch_seq_len, _batch_seq_len * _batch_seq_len, + _batch_size * _tw._head_num, _computeType, + CUBLAS_GEMM_DEFAULT_TENSOR_OP)); + +#ifdef DEBUG_RESULT + if (_layer_id == 0) { + print_vec(_p_d_c, "q*k", + _batch_token_num * _batch_seq_len * _tw._head_num - 5, + _batch_token_num * _batch_seq_len * _tw._head_num); + } +#endif + + ker_correlation_softmax_gpt_launcher<_DataType>(_batch_size, _batch_seq_len, + _tw._head_num, _stream, + _p_d_c, _p_d_real_seq_len); + +#ifdef DEBUG_RESULT + if (_layer_id == 0) { + print_vec(_p_d_c, "mask weights", + _batch_token_num * _batch_seq_len * _tw._head_num - 5, + _batch_token_num * _batch_seq_len * _tw._head_num); + } +#endif + + /* ---step 3. new_q = correlation * v--- */ + CHECK_GPU_ERROR(cublasGemmStridedBatchedEx( + _hd, CUBLAS_OP_N, CUBLAS_OP_N, _tw._dim_per_head, _batch_seq_len, + _batch_seq_len, &_fone, _p_d_v, _AType, _tw._dim_per_head, + _batch_seq_len * _tw._dim_per_head, _p_d_c, _BType, _batch_seq_len, + _batch_seq_len * _batch_seq_len, &_fzero, _p_d_q, _CType, + _tw._dim_per_head, _batch_seq_len * _tw._dim_per_head, + _batch_size * _tw._head_num, _computeType, + CUBLAS_GEMM_DEFAULT_TENSOR_OP)); + +#ifdef DEBUG_RESULT + if (_layer_id == 0) { + print_vec(_p_d_q, "value after attention", + _batch_token_num * _tw._hidden_size - 5, + _batch_token_num * _tw._hidden_size); + } +#endif + + // use v to save reshaped q, since they are in same size and v + // will not be use again before the next multi-head-attention + ker_arrange_atten_output_launcher<_DataType>( + _batch_token_num, _tw._hidden_size, _stream, _p_d_q, _p_d_v, + _batch_seq_len, _tw._dim_per_head, _tw._head_num, _max_thread_per_block); + +#ifdef DEBUG_RESULT + if (_layer_id == 0) { + print_vec(_p_d_v, "reshaped value after attention", 0, 5); + print_vec(_p_d_query, "attention input with output bias", 0, 5); + } +#endif + + /* ---step 4. new_q = ori_q + new_q * output_wei--- */ + CHECK_GPU_ERROR(cublasGemmEx( + _hd, CUBLAS_OP_N, CUBLAS_OP_N, _tw._hidden_size, _batch_token_num, + _tw._hidden_size, &_fone, _p_d_enc_wei[_weight_offset + 4], _AType, + _tw._hidden_size, _p_d_v, _BType, _tw._hidden_size, &_fone, _p_d_query, + _CType, _tw._hidden_size, _computeType, CUBLAS_GEMM_DEFAULT_TENSOR_OP)); + +#ifdef DEBUG_RESULT + if (_layer_id == 0) { + print_vec(_p_d_enc_wei[_weight_offset + 4], "attn out kernel", 0, 5); + print_vec(_p_d_query, "attention output", 0, 5); + } +#endif + return; +} + +template +void QuantGptEncoder::self_attention_with_cache() { + _DataType *_p_d_k_cache_cur_layer = _p_d_k_cache + _layer_id * _max_batch_dim; + _DataType *_p_d_v_cache_cur_layer = _p_d_v_cache + _layer_id * _max_batch_dim; + +#ifdef DEBUG_RESULT + if (_layer_id == 0) { + print_vec(_p_d_k_cache_cur_layer, "_p_d_k_cache_cur_layer", + _batch_size * (_batch_seq_len - 1) * _tw._hidden_size - 5, + _batch_size * (_batch_seq_len - 1) * _tw._hidden_size); + print_vec(_p_d_v_cache_cur_layer, "_p_d_v_cache_cur_layer", + _batch_size * (_batch_seq_len - 1) * _tw._hidden_size - 5, + _batch_size * (_batch_seq_len - 1) * _tw._hidden_size); + } +#endif + + /* ---step 0. 
layer_norm, add output_bias to "query"--- */ + ker_norm_layer_resual_launcher<_DataType>( + _batch_size, _tw._hidden_size, _stream, _p_d_query, _p_d_q, + _p_d_enc_wei[_weight_offset], _p_d_enc_wei[_weight_offset + 1], + _p_d_enc_wei[_weight_offset + 5], _max_thread_per_block); + +#ifdef DEBUG_RESULT + if (_layer_id == 0) { + print_vec(_p_d_query, "input with bias", _batch_size * _tw._hidden_size - 5, + _batch_size * _tw._hidden_size); + print_vec(_p_d_q, "first ln output", _batch_size * _tw._hidden_size - 5, + _batch_size * _tw._hidden_size); + } +#endif + + /* ---step 1. qkv = ori_q * qkv_wei + bias, and reshape qkv for multi-head + * gemm--- */ + CHECK_GPU_ERROR(cublasGemmEx( + _hd, CUBLAS_OP_N, CUBLAS_OP_N, _tw._hidden_size * 3, _batch_size, + _tw._hidden_size, &_fone, _p_d_enc_wei[_weight_offset + 2], _AType, + _tw._hidden_size * 3, _p_d_q, _BType, _tw._hidden_size, &_fzero, + _p_d_qkv_projected, _CType, _tw._hidden_size * 3, _computeType, + CUBLAS_GEMM_DEFAULT_TENSOR_OP)); + +#ifdef DEBUG_RESULT + if (_layer_id == 0) { + print_vec(_p_d_qkv_projected, "_p_d_qkv_projected", + _batch_size * _tw._hidden_size * 3 - 5, + _batch_size * _tw._hidden_size * 3); + } +#endif + // get q, k, v by split and reshape qkv + ker_arrange_qkv_with_cache_launcher<_DataType>( + _batch_token_num, _tw._hidden_size, _stream, _p_d_qkv_projected, + _p_d_enc_wei[_weight_offset + 3], _p_d_q, _p_d_k, _p_d_k_cache_cur_layer, + _p_d_v, _p_d_v_cache_cur_layer, _max_batch_dim, _batch_seq_len, + _tw._dim_per_head, _tw._head_num); + + // copy new k and v to cache + cudaStream_t stream; + if (_batch_token_num > 360) { + stream = _cache_stream; + CHECK_GPU_ERROR(cudaStreamSynchronize(_stream)); + } else { + stream = _stream; + } + CHECK_GPU_ERROR( + cudaMemcpyAsync(_p_d_k_cache_cur_layer, _p_d_k, + _batch_token_num * _tw._hidden_size * sizeof(_DataType), + cudaMemcpyDeviceToDevice, stream)); + CHECK_GPU_ERROR( + cudaMemcpyAsync(_p_d_v_cache_cur_layer, _p_d_v, + _batch_token_num * _tw._hidden_size * sizeof(_DataType), + cudaMemcpyDeviceToDevice, stream)); +#ifdef DEBUG_RESULT + if (_layer_id == 0) { + print_vec(_p_d_q, "_p_d_q", _batch_size * _tw._hidden_size - 5, + _batch_size * _tw._hidden_size); + print_vec(_p_d_k, "_p_d_k", _batch_token_num * _tw._hidden_size - 5, + _batch_token_num * _tw._hidden_size); + print_vec(_p_d_v, "_p_d_v", _batch_token_num * _tw._hidden_size - 5, + _batch_token_num * _tw._hidden_size); + } +#endif + + /* ---step 2. correlation = q * k, perform softmax on correlation + correlation: [batch_size, heads_num, 1, batch_seq_len]--- */ + CHECK_GPU_ERROR(cublasGemmStridedBatchedEx( + _hd, CUBLAS_OP_T, CUBLAS_OP_N, _batch_seq_len, 1, _tw._dim_per_head, + &_atten_scaler, _p_d_k, _AType, _tw._dim_per_head, + _batch_seq_len * _tw._dim_per_head, _p_d_q, _BType, _tw._dim_per_head, + _tw._dim_per_head, &_fzero, _p_d_c, _CType, _batch_seq_len, + _batch_seq_len, _batch_size * _tw._head_num, _computeType, + CUBLAS_GEMM_DEFAULT_TENSOR_OP)); + +#ifdef DEBUG_RESULT + if (_layer_id == 0) { + print_vec(_p_d_c, "q*k", _batch_size * _batch_seq_len * _tw._head_num - 5, + _batch_size * _batch_seq_len * _tw._head_num); + } +#endif + ker_attention_mask_weights_launcher<_DataType>(_batch_size, 1, _batch_seq_len, + _tw._head_num, _stream, _p_d_c, + _p_d_real_seq_len); + +#ifdef DEBUG_RESULT + if (_layer_id == 0) { + print_vec(_p_d_c, "mask weights", + _batch_size * _batch_seq_len * _tw._head_num - 5, + _batch_size * _batch_seq_len * _tw._head_num); + } +#endif + + /* ---step 3. 
new_q = correlation * v--- */ + CHECK_GPU_ERROR(cublasGemmStridedBatchedEx( + _hd, CUBLAS_OP_N, CUBLAS_OP_N, _tw._dim_per_head, 1, _batch_seq_len, + &_fone, _p_d_v, _AType, _tw._dim_per_head, + _batch_seq_len * _tw._dim_per_head, _p_d_c, _BType, _batch_seq_len, + _batch_seq_len, &_fzero, _p_d_q, _CType, _tw._dim_per_head, + _tw._dim_per_head, _batch_size * _tw._head_num, _computeType, + CUBLAS_GEMM_DEFAULT_TENSOR_OP)); + +#ifdef DEBUG_RESULT + if (_layer_id == 0) { + print_vec(_p_d_q, "value after attention", + _batch_size * _tw._hidden_size - 5, + _batch_size * _tw._hidden_size); + } +#endif + // use v to save reshaped q, since they are in same size and v + // will not be use again before the next multi-head-attention + ker_arrange_atten_output_launcher<_DataType>( + _batch_size, _tw._hidden_size, _stream, _p_d_q, _p_d_v, 1, + _tw._dim_per_head, _tw._head_num, _max_thread_per_block); + +#ifdef DEBUG_RESULT + if (_layer_id == 0) { + print_vec(_p_d_v, "reshaped value after attention", 0, 5); + print_vec(_p_d_query, "attention input with output bias", 0, 5); + } +#endif + + /* ---step 4. new_q = ori_q + new_q * output_wei--- */ + CHECK_GPU_ERROR(cublasGemmEx( + _hd, CUBLAS_OP_N, CUBLAS_OP_N, _tw._hidden_size, _batch_size, + _tw._hidden_size, &_fone, _p_d_enc_wei[_weight_offset + 4], _AType, + _tw._hidden_size, _p_d_v, _BType, _tw._hidden_size, &_fone, _p_d_query, + _CType, _tw._hidden_size, _computeType, CUBLAS_GEMM_DEFAULT_TENSOR_OP)); + +#ifdef DEBUG_RESULT + if (_layer_id == 0) { + print_vec(_p_d_enc_wei[_weight_offset + 4], "attn out kernel", 0, 5); + print_vec(_p_d_query, "attention output", 0, 5); + } +#endif + return; +} + +template +void QuantGptEncoder::ffn_add_norm() { + /* ---step 0. layer_norm, add output_bias to "query"--- */ + ker_norm_layer_resual_launcher<_DataType>( + _batch_token_num, _tw._hidden_size, _stream, _p_d_query, _p_d_ffn_buf1, + _p_d_enc_wei[_weight_offset + 6], _p_d_enc_wei[_weight_offset + 7], + _p_d_enc_wei[_weight_offset + 11], _max_thread_per_block); + + /* ---step 1. first ffn layer--- */ + CHECK_GPU_ERROR(cublasGemmEx( + _hd, CUBLAS_OP_N, CUBLAS_OP_N, _tw._inner_size, _batch_token_num, + _tw._hidden_size, &_fone, _p_d_enc_wei[_weight_offset + 8], _AType, + _tw._inner_size, _p_d_ffn_buf1, _BType, _tw._hidden_size, &_fzero, + _p_d_ffn_buf2, _CType, _tw._inner_size, _computeType, + CUBLAS_GEMM_DEFAULT_TENSOR_OP)); + ker_bias_gelu_launcher<_DataType>( + _batch_token_num, _max_thread_per_block, _stream, _p_d_ffn_buf2, + _p_d_enc_wei[_weight_offset + 9], _tw._inner_size); + + /* ---step 2. second ffn layer--- */ + CHECK_GPU_ERROR(cublasGemmEx( + _hd, CUBLAS_OP_N, CUBLAS_OP_N, _tw._hidden_size, _batch_token_num, + _tw._inner_size, &_fone, _p_d_enc_wei[_weight_offset + 10], _AType, + _tw._hidden_size, _p_d_ffn_buf2, _BType, _tw._inner_size, &_fone, + _p_d_query, _CType, _tw._hidden_size, _computeType, + CUBLAS_GEMM_DEFAULT_TENSOR_OP)); + return; +} + +template +void QuantGptEncoder::ffn_add_norm_with_cache() { + /* ---step 0. layer_norm, add output_bias to "query"--- */ + ker_norm_layer_resual_launcher<_DataType>( + _batch_size, _tw._hidden_size, _stream, _p_d_query, _p_d_ffn_buf1, + _p_d_enc_wei[_weight_offset + 6], _p_d_enc_wei[_weight_offset + 7], + _p_d_enc_wei[_weight_offset + 11], _max_thread_per_block); + + /* ---step 1. 
first ffn layer--- */ + CHECK_GPU_ERROR(cublasGemmEx( + _hd, CUBLAS_OP_N, CUBLAS_OP_N, _tw._inner_size, _batch_size, + _tw._hidden_size, &_fone, _p_d_enc_wei[_weight_offset + 8], _AType, + _tw._inner_size, _p_d_ffn_buf1, _BType, _tw._hidden_size, &_fzero, + _p_d_ffn_buf2, _CType, _tw._inner_size, _computeType, + CUBLAS_GEMM_DEFAULT_TENSOR_OP)); + ker_bias_gelu_launcher<_DataType>( + _batch_size, _max_thread_per_block, _stream, _p_d_ffn_buf2, + _p_d_enc_wei[_weight_offset + 9], _tw._inner_size); + + /* ---step 2. second ffn layer--- */ + CHECK_GPU_ERROR(cublasGemmEx( + _hd, CUBLAS_OP_N, CUBLAS_OP_N, _tw._hidden_size, _batch_size, + _tw._inner_size, &_fone, _p_d_enc_wei[_weight_offset + 10], _AType, + _tw._hidden_size, _p_d_ffn_buf2, _BType, _tw._inner_size, &_fone, + _p_d_query, _CType, _tw._hidden_size, _computeType, + CUBLAS_GEMM_DEFAULT_TENSOR_OP)); + return; +} + +/** +Compute ppl from encoder output +*/ +template +void QuantGptEncoder::compute_ppl() { + /* ---step 1. project hidden states to vocab logits--- */ + CHECK_GPU_ERROR(cublasGemmEx( + _hd, CUBLAS_OP_T, CUBLAS_OP_N, _tw._src_vocab_size, _batch_token_num, + _tw._hidden_size, &_fone, _p_d_src_emb_wei[0], _AType, _tw._hidden_size, + _p_d_query, _BType, _tw._hidden_size, &_fzero, _p_d_logit, _CType, + _tw._src_vocab_size, _computeType, CUBLAS_GEMM_DEFAULT_TENSOR_OP)); + +#ifdef DEBUG_RESULT + print_vec(_p_d_logit, "logits", _batch_token_num * _tw._src_vocab_size - 5, + _batch_token_num * _tw._src_vocab_size); +#endif + + /* ---step 2. compute language model ppl--- */ + ker_ppl_launcher<_DataType>( + _batch_size, _batch_seq_len, _max_thread_per_block, _stream, _p_d_logit, + _p_d_token_id, _p_d_real_seq_len, _p_d_ppl, _tw._src_vocab_size); +} + +template class QuantGptEncoder; +template class QuantGptEncoder; + +} // namespace cuda +} // namespace lightseq diff --git a/lightseq/inference/model/quant_gpt_encoder.h b/lightseq/inference/model/quant_gpt_encoder.h new file mode 100644 index 00000000..d18579d8 --- /dev/null +++ b/lightseq/inference/model/quant_gpt_encoder.h @@ -0,0 +1,109 @@ +#pragma once + +#include +#include +#include +#include +#include +#include + +#include +#include +#include +#include +#include + +#include "../proto/quant_gpt_weight.h" +#include "../tools/util.h" + +namespace lightseq { +namespace cuda { + +template +class QuantGptEncoder { + private: + typedef OperationTypeTraits _optraits; + typedef typename _optraits::DataType _DataType; + const cudaDataType_t _computeType = _optraits::computeType; + const cudaDataType_t _AType = _optraits::AType; + const cudaDataType_t _BType = _optraits::BType; + const cudaDataType_t _CType = _optraits::CType; + + // private member function + void self_attention(bool cache = false); + void self_attention_with_cache(); + void ffn_add_norm(); + void ffn_add_norm_with_cache(); + int sample_one_token(); + int sample_one_token_with_cache(); + + const int _max_batch_size; + + const QuantGptWeight &_tw; + cudaStream_t _stream; + cudaStream_t _cache_stream; + cublasHandle_t _hd; + const _DataType _fone; + const _DataType _fzero; + const _DataType _atten_scaler; + const int _max_batch_dim; + const int _max_thread_per_block; + std::vector _h_real_seq_len; + std::vector _h_ppl; + std::vector _h_sample_id; + int _h_unfinished; + + // gpu memeory buffer + _DataType *_p_d_query; + _DataType *_p_d_k_cache; + _DataType *_p_d_v_cache; + _DataType *_p_d_qkv_projected; + _DataType *_p_d_q; + _DataType *_p_d_k; + _DataType *_p_d_v; + _DataType *_p_d_c; + _DataType *_p_d_ffn_buf1; + _DataType 
*_p_d_ffn_buf2; + _DataType *_p_d_logit; + int *_p_d_real_seq_len; // [batch_size] + int *_p_d_sample_id_buf; // [batch_size, max_step] + int *_p_d_last_sample_id; + int *_p_d_unfinished; + curandState *_p_d_curandstate; //[batch_size] + + // {token_emb, pos_emb, norm_scale, norm_bias} + const std::vector &_p_d_src_emb_wei; + // {multihead_norm_scale, multihead_norm_bias, multihead_qkv_kernel, + // multihead_qkv_bias multihead_output_kernel, multihead_output_bias + // ffn_norm_scale, ffn_norm_bias} + // ffn_first_kernel, ffn_first_bias, ffn_second_kernel, ffn_second_bias} * + // encoder_layer_num + const std::vector &_p_d_enc_wei; + + int _batch_size; + int _batch_token_num; + int _layer_id; + int _weight_offset; + + const std::set kSamplingMethods = {"topk", "topp", "ppl"}; + + public: + int _batch_seq_len; + const int *_p_d_token_id; // input token id, [batch_size, batch_seq_len] + float *_p_d_ppl; // ppl for every seq, [batch_size] + int *_p_d_sample_id; + + QuantGptEncoder(int max_batch_size, const int *p_d_token_id, float *p_d_ppl, + int *p_d_sample_id, const QuantGptWeight &tw, + cudaStream_t stream, cudaStream_t cache_stream, + cublasHandle_t hd); + size_t compute_buffer_bytesize(); + void init_buffer(void *pbuf); + std::string check(); + void run_one_infer(int batch_size, int batch_seq_len); + int run_one_sample(int batch_size, int batch_seq_len); + void compute_ppl(); +}; + +} // namespace cuda +} // namespace lightseq diff --git a/lightseq/inference/proto/CMakeLists.txt b/lightseq/inference/proto/CMakeLists.txt index 745d8e7f..1cd46f8d 100644 --- a/lightseq/inference/proto/CMakeLists.txt +++ b/lightseq/inference/proto/CMakeLists.txt @@ -9,6 +9,7 @@ include_directories(${Protobuf_INCLUDE_DIRS}) include_directories(${CMAKE_CURRENT_BINARY_DIR}) protobuf_generate_cpp(GPT_PROTO_SRC GPT_PROTO_HEADER gpt.proto) +protobuf_generate_cpp(Q_GPT_PROTO_SRC Q_GPT_PROTO_HEADER quant_gpt.proto) protobuf_generate_cpp(BERT_PROTO_SRC BERT_PROTO_HEADER bert.proto) protobuf_generate_cpp(Q_BERT_PROTO_SRC Q_BERT_PROTO_HEADER quant_bert.proto) protobuf_generate_cpp(Q_TRANSFORMER_PROTO_SRC Q_TRANSFORMER_PROTO_HEADER @@ -24,6 +25,13 @@ target_link_libraries(gpt_weight PUBLIC utils ${Protobuf_LIBRARIES} target_include_directories(gpt_weight PUBLIC ${CMAKE_CURRENT_SOURCE_DIR}) target_include_directories(gpt_weight PUBLIC ${CMAKE_CURRENT_BINARY_DIR}) +add_library(quant_gpt_weight STATIC quant_gpt_weight.cc ${Q_GPT_PROTO_SRC} + ${Q_GPT_PROTO_HEADER}) +target_link_libraries(quant_gpt_weight PUBLIC utils ${Protobuf_LIBRARIES} + ${HDF5_LIBRARIES}) +target_include_directories(quant_gpt_weight PUBLIC ${CMAKE_CURRENT_SOURCE_DIR}) +target_include_directories(quant_gpt_weight PUBLIC ${CMAKE_CURRENT_BINARY_DIR}) + add_library(bert_weight STATIC bert_weight.cc ${BERT_PROTO_SRC} ${BERT_PROTO_HEADER}) target_link_libraries(bert_weight PUBLIC utils ${Protobuf_LIBRARIES}) diff --git a/lightseq/inference/proto/quant_bert.proto b/lightseq/inference/proto/quant_bert.proto index 661de482..255ef4d2 100644 --- a/lightseq/inference/proto/quant_bert.proto +++ b/lightseq/inference/proto/quant_bert.proto @@ -48,7 +48,6 @@ message QuantBertEncoderLayer { float multihead_qkv_dense_clip_max = 21; float multihead_output_dense_clip_max = 22; float ffn_first_output_clip_max = 23; - float ffn_second_output_clip_max = 24; } message QuantBertEmbeddingLayer { diff --git a/lightseq/inference/proto/quant_bert_weight.cc b/lightseq/inference/proto/quant_bert_weight.cc index 6b0375f3..a69999a8 100644 --- 
a/lightseq/inference/proto/quant_bert_weight.cc +++ b/lightseq/inference/proto/quant_bert_weight.cc @@ -203,7 +203,6 @@ std::string QuantBertWeight::proto_parse_enc_wei( _enc_clip_max.push_back(enc_layer.multihead_qkv_dense_clip_max()); _enc_clip_max.push_back(enc_layer.multihead_output_dense_clip_max()); _enc_clip_max.push_back(enc_layer.ffn_first_output_clip_max()); - _enc_clip_max.push_back(enc_layer.ffn_second_output_clip_max()); } // for @@ -487,7 +486,6 @@ void QuantBertWeight::hdf5_parse_enc_wei(hid_t hdf5_file) { dataset_prefix + "/ffn_first_output_clip_max", H5T_NATIVE_FLOAT, &clip_max); _enc_clip_max.push_back(clip_max); - _enc_clip_max.push_back(0.0); } // for std::vector<_DataType> raw_value; diff --git a/lightseq/inference/proto/quant_bert_weight.h b/lightseq/inference/proto/quant_bert_weight.h index c511f133..aa92ae16 100644 --- a/lightseq/inference/proto/quant_bert_weight.h +++ b/lightseq/inference/proto/quant_bert_weight.h @@ -44,7 +44,7 @@ class QuantBertWeight { // store the clip_max of weights and activations float _src_emb_clip_max; - std::vector _enc_clip_max; // size: 12 * enc_layer_num + std::vector _enc_clip_max; // size: 11 * enc_layer_num public: std::string initializing(std::string proto_path); diff --git a/lightseq/inference/proto/quant_gpt.proto b/lightseq/inference/proto/quant_gpt.proto new file mode 100644 index 00000000..aa9d4c4b --- /dev/null +++ b/lightseq/inference/proto/quant_gpt.proto @@ -0,0 +1,66 @@ +syntax = "proto3"; +option optimize_for = LITE_RUNTIME; +// all the matrix are stored in row-major order, +// plz see https://en.wikipedia.org/wiki/Row-_and_column-major_order for details + +// the definition of "Multi-Head Attention", "Scaled Dot-Product Attention" and +// "Feed-Forward Networks" +// plz see https://arxiv.org/abs/1706.03762 for details + +message QuantGptEncoderLayer { + // layer norm before "Multi-Head Attention" + repeated float multihead_norm_scale = 1; + repeated float multihead_norm_bias = 2; + + // "Multi-Head Attention" linearly project weights kernel for query, key, + // value, + // before "Scaled Dot-Product Attention, with shape (hidden_size, + // hidden_size*3) + // is built by numpy.concatenate((query_kernel, key_kernel, value_kernel), + // axis=1) + // perform numpy.dot(input, multihead_project_kernel_qkv) will get the [query, + // key, value] of + // "Scaled Dot-Product Attention" + repeated float multihead_project_kernel_qkv = 3; + repeated float multihead_project_bias_qkv = 4; + // "Multi-Head Attention" linearly project weights kernel for output + // after "Scaled Dot-Product Attention", with shape (hidden_size, hidden_size) + repeated float multihead_project_kernel_output = 5; + repeated float multihead_project_bias_output = 6; + + // layer norm before "Feed-Forward Networks" + repeated float ffn_norm_scale = 7; + repeated float ffn_norm_bias = 8; + + // "Feed-Forward Networks" + repeated float ffn_first_kernel = 9; + repeated float ffn_first_bias = 10; + repeated float ffn_second_kernel = 11; + repeated float ffn_second_bias = 12; +} + +message QuantGptEmbeddingLayer { + // token embedding table + // for encoder, it is in [src_vocab_size, hidden_size] + // so, look it up directly will get the input token embedding + repeated float token_embedding = 1; + repeated float position_embedding = 2; + // the last layer_norm of encoder + repeated float norm_scale = 3; + repeated float norm_bias = 4; +} + +message QuantGptModelConf { + int32 head_num = 1; + int32 src_padding_id = 2; + string sampling_method = 3; + float topp = 
4; + int32 topk = 5; + int32 eos_id = 6; +} + +message QuantGpt { + QuantGptEmbeddingLayer src_embedding = 1; + repeated QuantGptEncoderLayer encoder_stack = 2; + QuantGptModelConf model_conf = 3; +} diff --git a/lightseq/inference/proto/quant_gpt_weight.cc b/lightseq/inference/proto/quant_gpt_weight.cc new file mode 100644 index 00000000..959d0fc5 --- /dev/null +++ b/lightseq/inference/proto/quant_gpt_weight.cc @@ -0,0 +1,511 @@ +#include "quant_gpt_weight.h" + +#include + +/** +@file +Load the model weights which stored in custom proto file into GPU memory. +Currently, fp16 and fp32 versions are provided. +Weights in proto file will always be in fp32. For fp16, the weights + will be casted from fp32 into fp16 +*/ + +namespace lightseq { +namespace cuda { + +/** +Cast weights into required datatype. +The datatype of weights in custom proto file will always be in fp32. +*/ +template <> +float QuantGptWeight::float2required(float value) { + return value; +} + +/** +fp16 version, cast fp32 into fp16 +*/ +template <> +__half QuantGptWeight::float2required(float value) { + return __float2half_rn(value); +} + +/** +Read model config stored in custom proto file. +*/ +template +void QuantGptWeight::proto_get_model_config(const QuantGpt &gpt) { + _hidden_size = gpt.src_embedding().norm_scale_size(); + _inner_size = gpt.encoder_stack()[0].ffn_first_kernel_size() / _hidden_size; + _max_step = gpt.src_embedding().position_embedding_size() / _hidden_size; + _src_vocab_size = gpt.src_embedding().token_embedding_size() / _hidden_size; + _n_enc_layer = gpt.encoder_stack_size(); + _head_num = gpt.model_conf().head_num(); + if (_hidden_size % _head_num != 0) { + throw std::runtime_error("Wrong head_num: hidden_size " + + std::to_string(_hidden_size) + " % head_num " + + std::to_string(_head_num) + " != 0."); + } + _dim_per_head = _hidden_size / _head_num; + _weight_per_enc_layer = 12; + _padding_id = gpt.model_conf().src_padding_id(); + if (gpt.model_conf().sampling_method() != "") { + _sampling_method = gpt.model_conf().sampling_method(); + } + if (gpt.model_conf().topk() != 0) { + _topk = gpt.model_conf().topk(); + } + if (gpt.model_conf().topp() != 0.0) { + _topp = gpt.model_conf().topp(); + } + if (gpt.model_conf().eos_id() != 0) { + _eos_id = gpt.model_conf().eos_id(); + } +} + +/** +Load the weights of embedding layer into GPU memory. 
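+The embedding layer is expected to provide four tensors, in this order:
+  token_embedding    [src_vocab_size, hidden_size]
+  position_embedding [max_step, hidden_size]
+  norm_scale         [hidden_size]
+  norm_bias          [hidden_size]
+The element offset of each tensor within one flat buffer is recorded, so the
+device pointers in _p_d_src_emb_wei all point into a single allocation.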
+*/ +template +std::string QuantGptWeight::proto_parse_emb_wei( + const QuantGptEmbeddingLayer &layer) { + std::vector offset; + std::vector value; + int idx = 0; + + offset.push_back(idx); + if (layer.token_embedding_size() != _src_vocab_size * _hidden_size) + return "wrong token_embedding_size !"; + for (float ele : layer.token_embedding()) value.push_back(ele); + idx += _src_vocab_size * _hidden_size; + + offset.push_back(idx); + if (layer.position_embedding_size() != _max_step * _hidden_size) + return "wrong position_embedding_size !"; + for (float ele : layer.position_embedding()) value.push_back(ele); + idx += _max_step * _hidden_size; + + offset.push_back(idx); + if (layer.norm_scale_size() != _hidden_size) return "wrong norm_scale_size !"; + for (float ele : layer.norm_scale()) value.push_back(ele); + idx += _hidden_size; + + offset.push_back(idx); + if (layer.norm_bias_size() != _hidden_size) return "wrong norm_bias_size !"; + for (float ele : layer.norm_bias()) value.push_back(ele); + idx += _hidden_size; + + std::vector<_DataType> raw_value; + for (float e : value) raw_value.push_back(float2required(e)); + _d_src_emb_wei = raw_value; + for (int e : offset) + _p_d_src_emb_wei.push_back(thrust::raw_pointer_cast(_d_src_emb_wei.data()) + + e); + + std::cout << "finish initializing emb_wei from host to device" << std::endl; + return ""; +} + +/** +Load the weights of encoder into GPU memory. +*/ +template +std::string QuantGptWeight::proto_parse_enc_wei(const QuantGpt &gpt) { + std::vector offset; + std::vector value; + int idx = 0; + + for (auto enc_layer : gpt.encoder_stack()) { + offset.push_back(idx); + if (enc_layer.multihead_norm_scale_size() != _hidden_size) + return "wrong multihead_norm_scale_size !"; + for (float ele : enc_layer.multihead_norm_scale()) value.push_back(ele); + idx += _hidden_size; + + offset.push_back(idx); + if (enc_layer.multihead_norm_bias_size() != _hidden_size) + return "wrong multihead_norm_bias_size !"; + for (float ele : enc_layer.multihead_norm_bias()) value.push_back(ele); + idx += _hidden_size; + + offset.push_back(idx); + if (enc_layer.multihead_project_kernel_qkv_size() != + _hidden_size * _hidden_size * 3) + return "wrong multihead_project_kernel_qkv_size !"; + for (float ele : enc_layer.multihead_project_kernel_qkv()) + value.push_back(ele); + idx += _hidden_size * _hidden_size * 3; + + offset.push_back(idx); + if (enc_layer.multihead_project_bias_qkv_size() != _hidden_size * 3) + return "wrong multihead_project_bias_qkv_size !"; + for (float ele : enc_layer.multihead_project_bias_qkv()) + value.push_back(ele); + idx += _hidden_size * 3; + + offset.push_back(idx); + if (enc_layer.multihead_project_kernel_output_size() != + _hidden_size * _hidden_size) + return "wrong multihead_project_kernel_output_size !"; + for (float ele : enc_layer.multihead_project_kernel_output()) + value.push_back(ele); + idx += _hidden_size * _hidden_size; + + offset.push_back(idx); + if (enc_layer.multihead_project_bias_output_size() != _hidden_size) + return "wrong multihead_project_bias_output_size !"; + for (float ele : enc_layer.multihead_project_bias_output()) + value.push_back(ele); + idx += _hidden_size; + + offset.push_back(idx); + if (enc_layer.ffn_norm_scale_size() != _hidden_size) + return "wrong ffn_norm_scale_size !"; + for (float ele : enc_layer.ffn_norm_scale()) value.push_back(ele); + idx += _hidden_size; + + offset.push_back(idx); + if (enc_layer.ffn_norm_bias_size() != _hidden_size) + return "wrong ffn_norm_bias_size !"; + for (float ele : 
enc_layer.ffn_norm_bias()) value.push_back(ele); + idx += _hidden_size; + + offset.push_back(idx); + if (enc_layer.ffn_first_kernel_size() != _hidden_size * _inner_size) + return "wrong ffn_first_kernel_size !"; + for (float ele : enc_layer.ffn_first_kernel()) value.push_back(ele); + idx += _hidden_size * _inner_size; + + offset.push_back(idx); + if (enc_layer.ffn_first_bias_size() != _inner_size) + return "wrong ffn_first_bias_size !"; + for (float ele : enc_layer.ffn_first_bias()) value.push_back(ele); + idx += _inner_size; + + offset.push_back(idx); + if (enc_layer.ffn_second_kernel_size() != _hidden_size * _inner_size) + return "wrong ffn_second_kernel_size !"; + for (float ele : enc_layer.ffn_second_kernel()) value.push_back(ele); + idx += _hidden_size * _inner_size; + + offset.push_back(idx); + if (enc_layer.ffn_second_bias_size() != _hidden_size) + return "wrong ffn_second_bias_size !"; + for (float ele : enc_layer.ffn_second_bias()) value.push_back(ele); + idx += _hidden_size; + + } // for + + std::vector<_DataType> raw_value; + for (float e : value) raw_value.push_back(float2required(e)); + _d_enc_wei = raw_value; + + for (int e : offset) + _p_d_enc_wei.push_back(thrust::raw_pointer_cast(_d_enc_wei.data()) + e); + std::cout << "finish initializing enc_wei from host to device" << std::endl; + return ""; +} + +/** +Read model config stored in custom hdf5 file. +*/ +template +void QuantGptWeight::hdf5_get_model_config(hid_t hdf5_file) { + _hidden_size = get_hdf5_dataset_size(hdf5_file, "src_embedding/norm_scale"); + + _inner_size = + get_hdf5_dataset_size(hdf5_file, "encoder_stack/0/ffn_first_kernel") / + _hidden_size; + + _max_step = + get_hdf5_dataset_size(hdf5_file, "src_embedding/position_embedding") / + _hidden_size; + + _src_vocab_size = + get_hdf5_dataset_size(hdf5_file, "src_embedding/token_embedding") / + _hidden_size; + + read_hdf5_dataset_scalar(hdf5_file, "model_conf/n_encoder_stack", + H5T_NATIVE_INT, &_n_enc_layer); + + read_hdf5_dataset_scalar(hdf5_file, "model_conf/head_num", H5T_NATIVE_INT, + &_head_num); + + _dim_per_head = _hidden_size / _head_num; + + _weight_per_enc_layer = 12; + + read_hdf5_dataset_scalar(hdf5_file, "model_conf/src_padding_id", + H5T_NATIVE_INT, &_padding_id); + + // special handling for string reading + // string were converted to numpy array of np.int8 in python + // hence needed to be read as an char array here + char _sampling_method_buf[128]; // get 128 character for sampling method + int _sampling_method_strlen = read_hdf5_dataset_data( + hdf5_file, "model_conf/sampling_method", H5T_NATIVE_CHAR, + _sampling_method_buf, [](int size) { return size > 128; }, + "Expect model_conf/sampling_method to have less than 128 characters."); + std::string _sampling_method_read = + std::string(_sampling_method_buf, _sampling_method_strlen); + if (_sampling_method_read != "") { + _sampling_method = _sampling_method_read; + } + + int _topk_read; + read_hdf5_dataset_scalar(hdf5_file, "model_conf/topk", H5T_NATIVE_INT, + &_topk_read); + if (_topk_read != 0) { + _topk = _topk_read; + } + + float _topp_read; + read_hdf5_dataset_scalar(hdf5_file, "model_conf/topp", H5T_NATIVE_FLOAT, + &_topp_read); + if (_topp_read != 0.0) { + _topp = _topp_read; + } + + int _eos_id_read; + read_hdf5_dataset_scalar(hdf5_file, "model_conf/eos_id", H5T_NATIVE_INT, + &_eos_id_read); + if (_eos_id_read != 0) { + _eos_id = _eos_id_read; + } +} + +/** +Load the weights of embedding layer into GPU memory. 
+*/ +template +void QuantGptWeight::hdf5_parse_emb_wei(hid_t hdf5_file) { + std::string dataset_prefix = "src_embedding"; + size_t value_size = _src_vocab_size * _hidden_size + + _max_step * _hidden_size + _hidden_size * 2; + + std::vector offset; + std::vector value(value_size); // preallocate vector for performance + std::cout << "loading " << value_size * sizeof(OpType_) / (1024 * 1024) + << " MB of embedding weight." << std::endl; + int idx = 0; + + offset.push_back(idx); + read_hdf5_dataset_data( + hdf5_file, dataset_prefix + "/token_embedding", H5T_NATIVE_FLOAT, + value.data() + idx, + [=](int size) { return size != _src_vocab_size * _hidden_size; }, + "Wrong token_embedding_size !"); + idx += _src_vocab_size * _hidden_size; + + offset.push_back(idx); + read_hdf5_dataset_data( + hdf5_file, dataset_prefix + "/position_embedding", H5T_NATIVE_FLOAT, + value.data() + idx, + [=](int size) { return size != _max_step * _hidden_size; }, + "Wrong position_embedding_size !"); + idx += _max_step * _hidden_size; + + offset.push_back(idx); + read_hdf5_dataset_data( + hdf5_file, dataset_prefix + "/norm_scale", H5T_NATIVE_FLOAT, + value.data() + idx, [=](int size) { return size != _hidden_size; }, + "Wrong norm_scale_size !"); + idx += _hidden_size; + + offset.push_back(idx); + read_hdf5_dataset_data( + hdf5_file, dataset_prefix + "/norm_bias", H5T_NATIVE_FLOAT, + value.data() + idx, [=](int size) { return size != _hidden_size; }, + "Wrong norm_bias_size !"); + idx += _hidden_size; + + std::vector<_DataType> raw_value; + raw_value.reserve(value.size()); + for (float e : value) raw_value.push_back(float2required(e)); + _d_src_emb_wei = raw_value; + for (int e : offset) + _p_d_src_emb_wei.push_back(thrust::raw_pointer_cast(_d_src_emb_wei.data()) + + e); + + std::cout << "finish initializing emb_wei from host to device" << std::endl; +} + +/** +Load the weights of encoder into GPU memory. +*/ +template +void QuantGptWeight::hdf5_parse_enc_wei(hid_t hdf5_file) { + size_t value_size = + (_hidden_size * 2 + _hidden_size * _hidden_size * 3 + _hidden_size * 3 + + _hidden_size * _hidden_size + _hidden_size * 3 + + _hidden_size * _inner_size + _inner_size + _hidden_size * _inner_size + + _hidden_size) * + _n_enc_layer; + std::vector offset; + std::vector value(value_size); + std::cout << "loading " << value_size * sizeof(OpType_) / (1024 * 1024) + << " MB of encoder weight." 
<< std::endl; + + int idx = 0; + for (int layer_id = 0; layer_id < _n_enc_layer; ++layer_id) { + std::string dataset_prefix = "encoder_stack/" + std::to_string(layer_id); + + offset.push_back(idx); + read_hdf5_dataset_data( + hdf5_file, dataset_prefix + "/multihead_norm_scale", H5T_NATIVE_FLOAT, + value.data() + idx, [=](int size) { return size != _hidden_size; }, + "Wrong multihead_norm_scale_size !"); + idx += _hidden_size; + + offset.push_back(idx); + read_hdf5_dataset_data( + hdf5_file, dataset_prefix + "/multihead_norm_bias", H5T_NATIVE_FLOAT, + value.data() + idx, [=](int size) { return size != _hidden_size; }, + "Wrong multihead_norm_bias_size !"); + idx += _hidden_size; + + offset.push_back(idx); + read_hdf5_dataset_data( + hdf5_file, dataset_prefix + "/multihead_project_kernel_qkv", + H5T_NATIVE_FLOAT, value.data() + idx, + [=](int size) { return size != _hidden_size * _hidden_size * 3; }, + "Wrong multihead_project_kernel_qkv_size !"); + idx += _hidden_size * _hidden_size * 3; + + offset.push_back(idx); + + read_hdf5_dataset_data( + hdf5_file, dataset_prefix + "/multihead_project_bias_qkv", + H5T_NATIVE_FLOAT, value.data() + idx, + [=](int size) { return size != _hidden_size * 3; }, + "Wrong multihead_project_bias_qkv_size !"); + idx += _hidden_size * 3; + + offset.push_back(idx); + read_hdf5_dataset_data( + hdf5_file, dataset_prefix + "/multihead_project_kernel_output", + H5T_NATIVE_FLOAT, value.data() + idx, + [=](int size) { return size != _hidden_size * _hidden_size; }, + "Wrong multihead_project_kernel_output_size !"); + idx += _hidden_size * _hidden_size; + + offset.push_back(idx); + read_hdf5_dataset_data( + hdf5_file, dataset_prefix + "/multihead_project_bias_output", + H5T_NATIVE_FLOAT, value.data() + idx, + [=](int size) { return size != _hidden_size; }, + "Wrong multihead_project_bias_output_size !"); + idx += _hidden_size; + + offset.push_back(idx); + read_hdf5_dataset_data( + hdf5_file, dataset_prefix + "/ffn_norm_scale", H5T_NATIVE_FLOAT, + value.data() + idx, [=](int size) { return size != _hidden_size; }, + "Wrong ffn_norm_scale_size !"); + idx += _hidden_size; + + offset.push_back(idx); + read_hdf5_dataset_data( + hdf5_file, dataset_prefix + "/ffn_norm_bias", H5T_NATIVE_FLOAT, + value.data() + idx, [=](int size) { return size != _hidden_size; }, + "Wrong ffn_norm_bias_size !"); + idx += _hidden_size; + + offset.push_back(idx); + read_hdf5_dataset_data( + hdf5_file, dataset_prefix + "/ffn_first_kernel", H5T_NATIVE_FLOAT, + value.data() + idx, + [=](int size) { return size != _hidden_size * _inner_size; }, + "Wrong ffn_first_kernel_size !"); + idx += _hidden_size * _inner_size; + + offset.push_back(idx); + read_hdf5_dataset_data( + hdf5_file, dataset_prefix + "/ffn_first_bias", H5T_NATIVE_FLOAT, + value.data() + idx, [=](int size) { return size != _inner_size; }, + "Wrong ffn_first_bias_size !"); + idx += _inner_size; + + offset.push_back(idx); + read_hdf5_dataset_data( + hdf5_file, dataset_prefix + "/ffn_second_kernel", H5T_NATIVE_FLOAT, + value.data() + idx, + [=](int size) { return size != _hidden_size * _inner_size; }, + "Wrong ffn_second_kernel_size !"); + idx += _hidden_size * _inner_size; + + offset.push_back(idx); + read_hdf5_dataset_data( + hdf5_file, dataset_prefix + "/ffn_second_bias", H5T_NATIVE_FLOAT, + value.data() + idx, [=](int size) { return size != _hidden_size; }, + "Wrong ffn_second_bias_size !"); + idx += _hidden_size; + } // for + + std::vector<_DataType> raw_value; + for (float e : value) raw_value.push_back(float2required(e)); + _d_enc_wei = 
raw_value; + + for (int e : offset) + _p_d_enc_wei.push_back(thrust::raw_pointer_cast(_d_enc_wei.data()) + e); + std::cout << "finish initializing enc_wei from host to device" << std::endl; +} + +/** +Load the proto file into CPU memory and parse it. +*/ +template +std::string QuantGptWeight::initializing(std::string weight_path) { + // If weight is of type pb, parse using proto parser. + if (endswith(weight_path, ".pb")) { + std::cout << "Parsing protobuf: " << weight_path << std::endl; + QuantGpt gpt; + // Verify that the version of the library that we linked against is + // compatible with the version of the headers we compiled against. + GOOGLE_PROTOBUF_VERIFY_VERSION; + + std::fstream raw_input(weight_path, std::ios::in | std::ios::binary); + if (!gpt.ParseFromIstream(&raw_input)) { + return "Parse weights from [" + weight_path + "] failed."; + } + + proto_get_model_config(gpt); + + std::string res = proto_parse_emb_wei(gpt.src_embedding()); + if (!res.empty()) return res; + + res = proto_parse_enc_wei(gpt); + if (!res.empty()) return res; + + std::cout << "finish initializing all weight from host to device" + << std::endl; + // Optional: Delete all global objects allocated by libprotobuf. + // google::protobuf::ShutdownProtobufLibrary(); + return ""; + } else if (endswith(weight_path, ".hdf5")) { + std::cout << "Parsing hdf5: " << weight_path << std::endl; + + hid_t hdf5_file = H5Fopen(weight_path.c_str(), H5F_ACC_RDONLY, H5P_DEFAULT); + if (hdf5_file < 0) { + return "Unable to read HDF5 file from " + weight_path; + } + hdf5_get_model_config(hdf5_file); + + // hdf5_parse_* would throw std::runtime_error on error + hdf5_parse_emb_wei(hdf5_file); + hdf5_parse_enc_wei(hdf5_file); + H5Fclose(hdf5_file); + + std::cout << "Finish loading all weight from host to device" << std::endl; + return ""; + } else { + return "Unsupported weight extention for [" + weight_path + + "]; Supported extensions: .pb, .hdf5\n"; + } +} + +template class QuantGptWeight; +template class QuantGptWeight; + +} // namespace cuda +} // namespace lightseq diff --git a/lightseq/inference/proto/quant_gpt_weight.h b/lightseq/inference/proto/quant_gpt_weight.h new file mode 100644 index 00000000..35d69aa7 --- /dev/null +++ b/lightseq/inference/proto/quant_gpt_weight.h @@ -0,0 +1,84 @@ +#pragma once + +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include +#include + +#include "quant_gpt.pb.h" +#include "../tools/util.h" + +namespace lightseq { +namespace cuda { + +/* +Load the model weights which stored in custom proto file into GPU memory. 
+*/ +template +class QuantGptWeight { + private: + typedef OperationTypeTraits _optraits; + typedef typename _optraits::DataType _DataType; + + _DataType float2required(float value); + + void proto_get_model_config(const QuantGpt &gpt); + std::string proto_parse_emb_wei(const QuantGptEmbeddingLayer &layer); + std::string proto_parse_enc_wei(const QuantGpt &gpt); + + // parsing function for hdf5 + void hdf5_get_model_config(hid_t hdf5_file); + void hdf5_parse_emb_wei(hid_t hdf5_file); + void hdf5_parse_enc_wei(hid_t hdf5_file); + + // store the weights pointer + std::vector _p_d_src_emb_wei; // size: 4 + std::vector _p_d_enc_wei; // size: 12 * enc_layer_num + + // store the weights on gpu memory + thrust::device_vector<_DataType> _d_src_emb_wei; + thrust::device_vector<_DataType> _d_enc_wei; + + public: + std::string initializing(std::string weight_path); + + const std::vector &get_src_emb_wei() const { + // {token_emb, pos_emb, norm_scale, norm_bias} + return _p_d_src_emb_wei; + } + + const std::vector &get_enc_wei() const { + // {multihead_norm_scale, multihead_norm_bias, multihead_qkv_kernel, + // multihead_qkv_bias multihead_output_kernel, multihead_output_bias + // ffn_norm_scale, ffn_norm_bias} + // ffn_first_kernel, ffn_first_bias, ffn_second_kernel, ffn_second_bias} * + // encoder_layer_num + return _p_d_enc_wei; + } + + int _hidden_size; + int _inner_size; + int _max_step; + int _src_vocab_size; + int _n_enc_layer; // number of encoder layer + int _dim_per_head; + int _weight_per_enc_layer; // 12 + + int _head_num; + int _padding_id; // for src + std::string _sampling_method = "topk"; + int _topk = 4; + float _topp = 0.75; + int _eos_id; +}; + +} // namespace cuda +} // namespace lightseq diff --git a/lightseq/inference/pywrapper/CMakeLists.txt b/lightseq/inference/pywrapper/CMakeLists.txt index def6efb9..8edb0863 100644 --- a/lightseq/inference/pywrapper/CMakeLists.txt +++ b/lightseq/inference/pywrapper/CMakeLists.txt @@ -9,20 +9,30 @@ pybind11_add_module( bert.cc quant_transformer.cc quant_bert.cc + quant_gpt.cc moe.cc) target_link_libraries(lightseq PUBLIC gpt_model) target_link_libraries(lightseq PUBLIC bert_model) target_link_libraries(lightseq PUBLIC transformer_model) target_link_libraries(lightseq PUBLIC quant_transformer_model) target_link_libraries(lightseq PUBLIC quant_bert_model) +target_link_libraries(lightseq PUBLIC quant_gpt_model) target_link_libraries(lightseq PUBLIC moe_model) set_target_properties(lightseq PROPERTIES OUTPUT_NAME inference) -add_library(liblightseq SHARED transformer.cc gpt.cc bert.cc - quant_transformer.cc quant_bert.cc moe.cc) +add_library( + liblightseq SHARED + transformer.cc + gpt.cc + bert.cc + quant_transformer.cc + quant_bert.cc + quant_gpt.cc + moe.cc) target_link_libraries(liblightseq PUBLIC transformer_model) target_link_libraries(liblightseq PUBLIC quant_transformer_model) target_link_libraries(liblightseq PUBLIC quant_bert_model) +target_link_libraries(liblightseq PUBLIC quant_gpt_model) target_link_libraries(liblightseq PUBLIC gpt_model) target_link_libraries(liblightseq PUBLIC bert_model) target_link_libraries(liblightseq PUBLIC moe_model) diff --git a/lightseq/inference/pywrapper/quant_gpt.cc b/lightseq/inference/pywrapper/quant_gpt.cc new file mode 100644 index 00000000..332e2db1 --- /dev/null +++ b/lightseq/inference/pywrapper/quant_gpt.cc @@ -0,0 +1,209 @@ +#include "quant_gpt.h" + +namespace lightseq { +namespace cuda { + +QuantGpt::QuantGpt(const std::string weight_path, const int max_batch_size) + : LSModel({"token_ids"}, 
{"result"}), + stream_(nullptr), + hd_(nullptr), + encoder_(nullptr), + _max_batch_size(max_batch_size) { + /* ---step1. init environment--- */ + CHECK_GPU_ERROR(cudaSetDevice(0)); + CHECK_GPU_ERROR(cudaStreamCreate(&stream_)); + CHECK_GPU_ERROR(cudaStreamCreate(&cache_stream_)); + CHECK_GPU_ERROR(cublasCreate(&hd_)); + CHECK_GPU_ERROR(cublasSetStream(hd_, stream_)); + + /* ---step2. load model weights into GPU memory--- */ + + // saved in custom proto file + std::string model_weights_path = weight_path; + std::string res = tw_.initializing(model_weights_path); + if (!res.empty()) { + throw std::runtime_error(res); + } + + /* + step3. instantiate gpt encoder, init the gpu memory buffer. + using thrust vector to avoid manage gpu memory by hand + */ + + // register device memory for inputs and outputs + CHECK_GPU_ERROR( + cudaMalloc(&d_input_, _max_batch_size * tw_._max_step * sizeof(int))); + CHECK_GPU_ERROR( + cudaMalloc(&d_sample_id, _max_batch_size * tw_._max_step * sizeof(int))); + CHECK_GPU_ERROR(cudaMalloc(&d_ppl, _max_batch_size * sizeof(float))); + + encoder_ = std::make_shared>( + max_batch_size, d_input_, d_ppl, d_sample_id, tw_, stream_, cache_stream_, + hd_); + res = encoder_->check(); + if (!res.empty()) { + throw std::runtime_error(res); + } + + size_t buf_bytesize = encoder_->compute_buffer_bytesize(); + std::cout << "Allocated " << buf_bytesize / (1024 * 1024) + << "MB GPU buffer for GPT2" << std::endl; + + // encoder and decoder use the same buffer to save gpu memory useage + CHECK_GPU_ERROR(cudaMalloc((void**)&d_buf_, (size_t)buf_bytesize)); + encoder_->init_buffer(d_buf_); + CHECK_GPU_ERROR(cudaStreamSynchronize(stream_)); +} + +QuantGpt::~QuantGpt() { + CHECK_GPU_ERROR(cudaFree(d_input_)); + CHECK_GPU_ERROR(cudaFree(d_sample_id)); + CHECK_GPU_ERROR(cudaFree(d_ppl)); + CHECK_GPU_ERROR(cudaFree(d_buf_)); + CHECK_GPU_ERROR(cudaStreamDestroy(stream_)); + CHECK_GPU_ERROR(cudaStreamDestroy(cache_stream_)); + CHECK_GPU_ERROR(cublasDestroy(hd_)); +} + +const int* QuantGpt::get_result_ptr() { return d_sample_id; } +const float* QuantGpt::get_score_ptr() { return d_ppl; } + +void QuantGpt::Infer() { + int batch_size = input_shapes_[0][0], seq_len = input_shapes_[0][1]; + + if (tw_._sampling_method == "ppl") { + encoder_->run_one_infer(batch_size, seq_len); + CHECK_GPU_ERROR(cudaStreamSynchronize(stream_)); + set_output_shape(0, {batch_size}); + } else if (tw_._sampling_method == "topk" || tw_._sampling_method == "topp") { + int sampled_seq_len = encoder_->run_one_sample(batch_size, seq_len); + CHECK_GPU_ERROR(cudaStreamSynchronize(stream_)); + set_output_shape(0, {batch_size, sampled_seq_len}); + } else { + throw std::runtime_error("Unsupported sampling_method"); + } +} + +void QuantGpt::set_input_ptr(int index, void* input_ptr) { + switch (index) { + case 0: + encoder_->_p_d_token_id = static_cast(input_ptr); + break; + + default: + throw std::runtime_error("invalid input index"); + break; + } +} + +void QuantGpt::set_output_ptr(int index, void* output_ptr) { + switch (index) { + case 0: + if (tw_._sampling_method == "ppl") { + encoder_->_p_d_ppl = static_cast(output_ptr); + break; + } else if (tw_._sampling_method == "topk" || + tw_._sampling_method == "topp") { + encoder_->_p_d_sample_id = static_cast(output_ptr); + break; + + } else { + throw std::runtime_error("Unsupported sampling_method"); + break; + } + + default: + throw std::runtime_error("invalid output index"); + break; + } +} + +const void* QuantGpt::get_output_ptr(int index) { + switch (index) { + case 0: + if 
(tw_._sampling_method == "ppl") { + return static_cast(encoder_->_p_d_ppl); + break; + } else if (tw_._sampling_method == "topk" || + tw_._sampling_method == "topp") { + return static_cast(encoder_->_p_d_sample_id); + break; + } else { + throw std::runtime_error("Unsupported sampling_method"); + break; + } + + default: + throw std::runtime_error("invalid output index"); + break; + } +} + +std::vector QuantGpt::get_input_max_shape(int index) { + switch (index) { + case 0: + return {_max_batch_size, tw_._max_step}; + + default: + throw std::runtime_error("invalid input index"); + break; + } +} + +std::vector QuantGpt::get_output_max_shape(int index) { + switch (index) { + case 0: + + if (tw_._sampling_method == "ppl") { + return {_max_batch_size}; + break; + } else if (tw_._sampling_method == "topk" || + tw_._sampling_method == "topp") { + return {_max_batch_size, tw_._max_step}; + break; + } else { + throw std::runtime_error("Unsupported sampling_method"); + break; + } + + default: + throw std::runtime_error("invalid output index"); + break; + } +} + +DataType QuantGpt::get_input_dtype(int index) { + switch (index) { + case 0: + return DataType::kInt32; + break; + + default: + throw std::runtime_error("invalid input index"); + break; + } +} + +DataType QuantGpt::get_output_dtype(int index) { + switch (index) { + case 0: + if (tw_._sampling_method == "ppl") { + return DataType::kFloat32; + break; + } else if (tw_._sampling_method == "topk" || + tw_._sampling_method == "topp") { + return DataType::kInt32; + break; + } else { + throw std::runtime_error("Unsupported sampling_method"); + break; + } + + default: + throw std::runtime_error("invalid output index"); + break; + } +} + +} // namespace cuda +} // namespace lightseq diff --git a/lightseq/inference/pywrapper/quant_gpt.h b/lightseq/inference/pywrapper/quant_gpt.h new file mode 100644 index 00000000..4c87f884 --- /dev/null +++ b/lightseq/inference/pywrapper/quant_gpt.h @@ -0,0 +1,56 @@ + +#include "model_base.h" +#include "../model/quant_gpt_encoder.h" +#include "../proto/quant_gpt_weight.h" +#include "../tools/util.h" + +#ifdef FP16_MODE +const lightseq::cuda::OperationType gpt_optype = + lightseq::cuda::OperationType::FP16; +#else +const lightseq::cuda::OperationType gpt_optype = + lightseq::cuda::OperationType::FP32; +#endif + +namespace lightseq { +namespace cuda { +class QuantGpt : public LSModel { + private: + typedef lightseq::cuda::OperationTypeTraits optraits; + std::shared_ptr> encoder_; + + int* d_input_; + int* d_sample_id; + float* d_ppl; + void* d_buf_; + + int _max_batch_size; + cudaStream_t stream_; + cudaStream_t cache_stream_; + cublasHandle_t hd_; + lightseq::cuda::QuantGptWeight tw_; + std::set available_sampling_methods = {"topk", "topp"}; + + public: + QuantGpt(const std::string weight_path, const int max_batch_size); + + ~QuantGpt(); + + const int* get_result_ptr(); + const float* get_score_ptr(); + const int get_max_step() { return tw_._max_step; } + + void Infer() override; + void set_input_ptr(int index, void* input_ptr) override; + void set_output_ptr(int index, void* output_ptr) override; + const void* get_output_ptr(int index) override; + std::vector get_input_max_shape(int index) override; + std::vector get_output_max_shape(int index) override; + DataType get_input_dtype(int index) override; + DataType get_output_dtype(int index) override; +}; + +LSMODEL_REGISTER(QuantGpt); + +} // namespace cuda +} // namespace lightseq diff --git a/lightseq/inference/pywrapper/wrapper.cc 
b/lightseq/inference/pywrapper/wrapper.cc index 38416130..039e7c77 100644 --- a/lightseq/inference/pywrapper/wrapper.cc +++ b/lightseq/inference/pywrapper/wrapper.cc @@ -424,6 +424,111 @@ class PyGpt { } }; +class PyQuantGpt { + private: + lightseq::cuda::LSModel *model_; + int *d_input_; + std::vector d_outputs_; + + public: + PyQuantGpt(std::string weight_path, int max_batch_size) { + model_ = lightseq::cuda::LSModelFactory::GetInstance().CreateModel( + "QuantGpt", weight_path, max_batch_size); + std::vector max_input_shape = model_->get_input_max_shape(0); + int max_size = + std::accumulate(max_input_shape.begin(), max_input_shape.end(), 1, + std::multiplies()); + lightseq::cuda::CHECK_GPU_ERROR( + cudaMalloc(&d_input_, sizeof(int) * max_size)); + + for (int i = 0; i < model_->get_output_size(); i++) { + void *d_output; + std::vector shape = model_->get_output_max_shape(i); + int output_size = std::accumulate(shape.begin(), shape.end(), 1, + std::multiplies()); + lightseq::cuda::CHECK_GPU_ERROR( + cudaMalloc(&d_output, output_size * sizeof(int))); + model_->set_output_ptr(i, d_output); + d_outputs_.push_back(d_output); + } + } + ~PyQuantGpt() { + delete model_; + lightseq::cuda::CHECK_GPU_ERROR(cudaFree(d_input_)); + for (auto d_output : d_outputs_) { + lightseq::cuda::CHECK_GPU_ERROR(cudaFree(d_output)); + } + } + + py::array_t sample( + py::array_t input_seq) { + auto input_seq_out = input_seq.mutable_unchecked<2>(); + const int *input_seq_data = input_seq_out.data(0, 0); + int batch_size = input_seq_out.shape(0); + int batch_seq_len = input_seq_out.shape(1); + if (model_->get_output_dtype(0) != lightseq::cuda::DataType::kInt32) { + throw std::runtime_error( + "This model is not for sample, maybe you have set the " + "sampling_method to " + "ppl"); + } + + lightseq::cuda::CHECK_GPU_ERROR( + cudaMemcpy(d_input_, input_seq_data, sizeof(int) * input_seq_out.size(), + cudaMemcpyHostToDevice)); + + model_->set_input_ptr(0, d_input_); + model_->set_input_shape(0, {batch_size, batch_seq_len}); + + model_->Infer(); + + std::vector output_shape = model_->get_output_shape(0); + auto output = py::array_t(output_shape); + int *output_data = output.mutable_data(0, 0); + const int *d_output = static_cast(model_->get_output_ptr(0)); + lightseq::cuda::CHECK_GPU_ERROR(cudaMemcpy(output_data, d_output, + sizeof(int) * output.size(), + cudaMemcpyDeviceToHost)); + + return output; + } + + py::array_t ppl( + py::array_t input_seq) { + auto input_seq_out = input_seq.mutable_unchecked<2>(); + const int *input_seq_data = input_seq_out.data(0, 0); + int batch_size = input_seq_out.shape(0); + int batch_seq_len = input_seq_out.shape(1); + + if (model_->get_output_dtype(0) != lightseq::cuda::DataType::kFloat32) { + throw std::runtime_error( + "This model is not for ppl, you should set the sampling_method to " + "ppl"); + } + + lightseq::cuda::CHECK_GPU_ERROR( + cudaMemcpy(d_input_, input_seq_data, sizeof(int) * input_seq_out.size(), + cudaMemcpyHostToDevice)); + + model_->set_input_ptr(0, d_input_); + model_->set_input_shape(0, {batch_size, batch_seq_len}); + + model_->Infer(); + + std::vector output_shape = model_->get_output_shape(0); + + auto output = py::array_t(output_shape); + float *output_data = output.mutable_data(0, 0); + const float *d_output = + static_cast(model_->get_output_ptr(0)); + lightseq::cuda::CHECK_GPU_ERROR(cudaMemcpy(output_data, d_output, + sizeof(float) * output.size(), + cudaMemcpyDeviceToHost)); + + return output; + } +}; + class PyMoe { private: lightseq::cuda::LSModel *model_; @@ 
-524,6 +629,14 @@ PYBIND11_MODULE(inference, m) { .def("sample", &PyGpt::sample, py::return_value_policy::reference_internal, py::arg("input_seq")); + py::class_(m, "QuantGpt") + .def(py::init(), py::arg("weight_path"), + py::arg("max_batch_size")) + .def("ppl", &PyQuantGpt::ppl, py::return_value_policy::reference_internal, + py::arg("input_seq")) + .def("sample", &PyQuantGpt::sample, + py::return_value_policy::reference_internal, py::arg("input_seq")); + py::class_(m, "Bert") .def(py::init(), py::arg("weight_path"), py::arg("max_batch_size")) From 61cb0c42a3f0a6ad139e0b3863942812ba223045 Mon Sep 17 00:00:00 2001 From: "weiyang.god" Date: Wed, 20 Apr 2022 18:44:16 +0800 Subject: [PATCH 33/49] support quant gpt inference (stage 1) --- .../ls_torch_hf_quant_gpt2_export.py | 24 +-- lightseq/inference/model/quant_bert_encoder.h | 2 +- lightseq/inference/proto/quant_bert_weight.cc | 2 +- lightseq/inference/proto/quant_gpt.proto | 59 +++---- lightseq/inference/proto/quant_gpt_weight.cc | 148 ++++++++++++++---- lightseq/inference/proto/quant_gpt_weight.h | 15 +- 6 files changed, 172 insertions(+), 78 deletions(-) diff --git a/examples/inference/python/export/huggingface/ls_torch_hf_quant_gpt2_export.py b/examples/inference/python/export/huggingface/ls_torch_hf_quant_gpt2_export.py index b403c981..64c5b2a2 100644 --- a/examples/inference/python/export/huggingface/ls_torch_hf_quant_gpt2_export.py +++ b/examples/inference/python/export/huggingface/ls_torch_hf_quant_gpt2_export.py @@ -24,12 +24,12 @@ """ enc_layer_mapping_dict = OrderedDict( { - "self_norm_scale": "self_attn_layer_norm weight", - "self_norm_bias": "self_attn_layer_norm bias", - "self_project_kernel_qkv": "self_attn qkv_proj weight&&expression_.transpose(0, 1)", - "self_project_bias_qkv": "self_attn qkv_proj bias", - "self_project_kernel_output": "self_attn out_proj weight&&expression_.transpose(0, 1)", - "self_project_bias_output": "self_attn out_proj bias", + "multihead_norm_scale": "self_attn_layer_norm weight", + "multihead_norm_bias": "self_attn_layer_norm bias", + "multihead_project_kernel_qkv": "self_attn qkv_proj weight&&expression_.transpose(0, 1)", + "multihead_project_bias_qkv": "self_attn qkv_proj bias", + "multihead_project_kernel_output": "self_attn out_proj weight&&expression_.transpose(0, 1)", + "multihead_project_bias_output": "self_attn out_proj bias", "ffn_norm_scale": "final_layer_norm weight", "ffn_norm_bias": "final_layer_norm bias", "ffn_first_kernel": "fc1 weight&&expression_.transpose(0, 1)", @@ -37,17 +37,17 @@ "ffn_second_kernel": "fc2 weight&&expression_.transpose(0, 1)", "ffn_second_bias": "fc2 bias", # weight_clip_max - "self_project_kernel_qkv_clip_max": "self_attn qkv_proj weight_quant clip_value_max", - "self_project_kernel_output_clip_max": "self_attn out_proj weight_quant clip_value_max", + "multihead_project_kernel_qkv_clip_max": "self_attn qkv_proj weight_quant clip_value_max", + "multihead_project_kernel_output_clip_max": "self_attn out_proj weight_quant clip_value_max", "ffn_first_kernel_clip_max": "fc1 weight_quant clip_value_max", "ffn_second_kernel_clip_max": "fc2 weight_quant clip_value_max", # act_clip_max - "self_ln_clip_max": "self_attn qkv_proj input_quant clip_value_max", - "self_project_output_clip_max": "self_attn out_proj input_quant clip_value_max", + "multihead_ln_clip_max": "self_attn qkv_proj input_quant clip_value_max", + "multihead_project_output_clip_max": "self_attn out_proj input_quant clip_value_max", "ffn_ln_clip_max": "fc1 input_quant clip_value_max", 
"ffn_first_act_clip_max": "fc2 input_quant clip_value_max", - "self_qkv_dense_clip_max": "self_attn qkv_proj output_quant clip_value_max", - "self_output_dense_clip_max": "self_attn out_proj output_quant clip_value_max", + "multihead_qkv_dense_clip_max": "self_attn qkv_proj output_quant clip_value_max", + "multihead_output_dense_clip_max": "self_attn out_proj output_quant clip_value_max", "ffn_first_output_clip_max": "fc1 output_quant clip_value_max", "self_qkv_bias_out_clip_max": "self_attn attention_quant clip_value_max", } diff --git a/lightseq/inference/model/quant_bert_encoder.h b/lightseq/inference/model/quant_bert_encoder.h index 55c26702..e8432a9d 100644 --- a/lightseq/inference/model/quant_bert_encoder.h +++ b/lightseq/inference/model/quant_bert_encoder.h @@ -82,7 +82,7 @@ class QuantBertEncoder { int8_t *_int8_p_d_src_emb_wei; const float _quant_range = 127; const float _src_emb_clip_max; - const std::vector _enc_clip_max; // size: 12 * enc_layer_num + const std::vector _enc_clip_max; // size: 11 * enc_layer_num std::vector<_DataType *> _scaled_ffn2_colsum; int _batch_size; diff --git a/lightseq/inference/proto/quant_bert_weight.cc b/lightseq/inference/proto/quant_bert_weight.cc index a69999a8..c64250b0 100644 --- a/lightseq/inference/proto/quant_bert_weight.cc +++ b/lightseq/inference/proto/quant_bert_weight.cc @@ -166,7 +166,7 @@ std::string QuantBertWeight::proto_parse_enc_wei( offset.push_back(idx); if (enc_layer.ffn_first_kernel().size() != _hidden_size * _inner_size) return "wrong ffn_first_kernel_size !"; - for (float ele : enc_layer.ffn_first_kernel()) + for (unsigned char ele : enc_layer.ffn_first_kernel()) value.push_back( dequantize(ele, _quant_range, enc_layer.ffn_first_kernel_clip_max())); idx += _hidden_size * _inner_size; diff --git a/lightseq/inference/proto/quant_gpt.proto b/lightseq/inference/proto/quant_gpt.proto index aa9d4c4b..319ab5ff 100644 --- a/lightseq/inference/proto/quant_gpt.proto +++ b/lightseq/inference/proto/quant_gpt.proto @@ -8,46 +8,49 @@ option optimize_for = LITE_RUNTIME; // plz see https://arxiv.org/abs/1706.03762 for details message QuantGptEncoderLayer { - // layer norm before "Multi-Head Attention" - repeated float multihead_norm_scale = 1; - repeated float multihead_norm_bias = 2; - - // "Multi-Head Attention" linearly project weights kernel for query, key, - // value, - // before "Scaled Dot-Product Attention, with shape (hidden_size, - // hidden_size*3) - // is built by numpy.concatenate((query_kernel, key_kernel, value_kernel), - // axis=1) - // perform numpy.dot(input, multihead_project_kernel_qkv) will get the [query, - // key, value] of - // "Scaled Dot-Product Attention" - repeated float multihead_project_kernel_qkv = 3; - repeated float multihead_project_bias_qkv = 4; - // "Multi-Head Attention" linearly project weights kernel for output - // after "Scaled Dot-Product Attention", with shape (hidden_size, hidden_size) - repeated float multihead_project_kernel_output = 5; - repeated float multihead_project_bias_output = 6; - - // layer norm before "Feed-Forward Networks" - repeated float ffn_norm_scale = 7; - repeated float ffn_norm_bias = 8; + // decoder-self-attention + repeated float multihead_norm_scale = 1; // [hidden_size] + repeated float multihead_norm_bias = 2; // [hidden_size] + bytes multihead_project_kernel_qkv = 3; // [hidden_size, 3, hidden_size] + repeated float multihead_project_bias_qkv = 4; // [3, hidden_size] + bytes multihead_project_kernel_output = 5; // [hidden_size, hidden_size] + repeated float 
multihead_project_bias_output = 6; // [hidden_size] // "Feed-Forward Networks" - repeated float ffn_first_kernel = 9; - repeated float ffn_first_bias = 10; - repeated float ffn_second_kernel = 11; - repeated float ffn_second_bias = 12; + repeated float ffn_norm_scale = 7; // [hidden_size] + repeated float ffn_norm_bias = 8; // [hidden_size] + bytes ffn_first_kernel = 9; // [hidden_size, inner_size] + repeated float ffn_first_bias = 10; // [inner_size] + bytes ffn_second_kernel = 11; // [inner_size, hidden_size] + repeated float ffn_second_bias = 12; // [hidden_size] + + // clip max + float multihead_project_kernel_qkv_clip_max = 13; + float multihead_project_kernel_output_clip_max = 14; + float ffn_first_kernel_clip_max = 15; + float ffn_second_kernel_clip_max = 16; + float multihead_ln_clip_max = 17; + float multihead_project_output_clip_max = 18; + float ffn_ln_clip_max = 19; + float ffn_first_act_clip_max = 20; + float multihead_qkv_dense_clip_max = 21; + float multihead_output_dense_clip_max = 22; + float ffn_first_output_clip_max = 23; + float self_qkv_bias_out_clip_max = 24; } message QuantGptEmbeddingLayer { // token embedding table // for encoder, it is in [src_vocab_size, hidden_size] // so, look it up directly will get the input token embedding - repeated float token_embedding = 1; + bytes token_embedding = 1; repeated float position_embedding = 2; // the last layer_norm of encoder repeated float norm_scale = 3; repeated float norm_bias = 4; + + // clip max + float emb_clip_max = 5; } message QuantGptModelConf { diff --git a/lightseq/inference/proto/quant_gpt_weight.cc b/lightseq/inference/proto/quant_gpt_weight.cc index 959d0fc5..4fd9b736 100644 --- a/lightseq/inference/proto/quant_gpt_weight.cc +++ b/lightseq/inference/proto/quant_gpt_weight.cc @@ -36,9 +36,9 @@ Read model config stored in custom proto file. 
template void QuantGptWeight::proto_get_model_config(const QuantGpt &gpt) { _hidden_size = gpt.src_embedding().norm_scale_size(); - _inner_size = gpt.encoder_stack()[0].ffn_first_kernel_size() / _hidden_size; + _inner_size = gpt.encoder_stack()[0].ffn_first_kernel().size() / _hidden_size; _max_step = gpt.src_embedding().position_embedding_size() / _hidden_size; - _src_vocab_size = gpt.src_embedding().token_embedding_size() / _hidden_size; + _src_vocab_size = gpt.src_embedding().token_embedding().size() / _hidden_size; _n_enc_layer = gpt.encoder_stack_size(); _head_num = gpt.model_conf().head_num(); if (_hidden_size % _head_num != 0) { @@ -74,10 +74,12 @@ std::string QuantGptWeight::proto_parse_emb_wei( int idx = 0; offset.push_back(idx); - if (layer.token_embedding_size() != _src_vocab_size * _hidden_size) + if (layer.token_embedding().size() != _src_vocab_size * _hidden_size) return "wrong token_embedding_size !"; - for (float ele : layer.token_embedding()) value.push_back(ele); + for (unsigned char ele : layer.token_embedding()) + value.push_back(dequantize(ele, _quant_range, layer.emb_clip_max())); idx += _src_vocab_size * _hidden_size; + _src_emb_clip_max = layer.emb_clip_max(); offset.push_back(idx); if (layer.position_embedding_size() != _max_step * _hidden_size) @@ -98,9 +100,7 @@ std::string QuantGptWeight::proto_parse_emb_wei( std::vector<_DataType> raw_value; for (float e : value) raw_value.push_back(float2required(e)); _d_src_emb_wei = raw_value; - for (int e : offset) - _p_d_src_emb_wei.push_back(thrust::raw_pointer_cast(_d_src_emb_wei.data()) + - e); + for (int e : offset) _p_d_src_emb_wei.push_back(_d_src_emb_wei.data() + e); std::cout << "finish initializing emb_wei from host to device" << std::endl; return ""; @@ -129,11 +129,13 @@ std::string QuantGptWeight::proto_parse_enc_wei(const QuantGpt &gpt) { idx += _hidden_size; offset.push_back(idx); - if (enc_layer.multihead_project_kernel_qkv_size() != + if (enc_layer.multihead_project_kernel_qkv().size() != _hidden_size * _hidden_size * 3) return "wrong multihead_project_kernel_qkv_size !"; - for (float ele : enc_layer.multihead_project_kernel_qkv()) - value.push_back(ele); + for (unsigned char ele : enc_layer.multihead_project_kernel_qkv()) + value.push_back( + dequantize(ele, _quant_range, + enc_layer.multihead_project_kernel_qkv_clip_max())); idx += _hidden_size * _hidden_size * 3; offset.push_back(idx); @@ -144,11 +146,13 @@ std::string QuantGptWeight::proto_parse_enc_wei(const QuantGpt &gpt) { idx += _hidden_size * 3; offset.push_back(idx); - if (enc_layer.multihead_project_kernel_output_size() != + if (enc_layer.multihead_project_kernel_output().size() != _hidden_size * _hidden_size) return "wrong multihead_project_kernel_output_size !"; - for (float ele : enc_layer.multihead_project_kernel_output()) - value.push_back(ele); + for (unsigned char ele : enc_layer.multihead_project_kernel_output()) + value.push_back( + dequantize(ele, _quant_range, + enc_layer.multihead_project_kernel_output_clip_max())); idx += _hidden_size * _hidden_size; offset.push_back(idx); @@ -171,9 +175,11 @@ std::string QuantGptWeight::proto_parse_enc_wei(const QuantGpt &gpt) { idx += _hidden_size; offset.push_back(idx); - if (enc_layer.ffn_first_kernel_size() != _hidden_size * _inner_size) + if (enc_layer.ffn_first_kernel().size() != _hidden_size * _inner_size) return "wrong ffn_first_kernel_size !"; - for (float ele : enc_layer.ffn_first_kernel()) value.push_back(ele); + for (unsigned char ele : enc_layer.ffn_first_kernel()) + value.push_back( + 
dequantize(ele, _quant_range, enc_layer.ffn_first_kernel_clip_max())); idx += _hidden_size * _inner_size; offset.push_back(idx); @@ -183,9 +189,11 @@ std::string QuantGptWeight::proto_parse_enc_wei(const QuantGpt &gpt) { idx += _inner_size; offset.push_back(idx); - if (enc_layer.ffn_second_kernel_size() != _hidden_size * _inner_size) + if (enc_layer.ffn_second_kernel().size() != _hidden_size * _inner_size) return "wrong ffn_second_kernel_size !"; - for (float ele : enc_layer.ffn_second_kernel()) value.push_back(ele); + for (unsigned char ele : enc_layer.ffn_second_kernel()) + value.push_back(dequantize(ele, _quant_range, + enc_layer.ffn_second_kernel_clip_max())); idx += _hidden_size * _inner_size; offset.push_back(idx); @@ -194,14 +202,27 @@ std::string QuantGptWeight::proto_parse_enc_wei(const QuantGpt &gpt) { for (float ele : enc_layer.ffn_second_bias()) value.push_back(ele); idx += _hidden_size; + _enc_clip_max.push_back(enc_layer.multihead_project_kernel_qkv_clip_max()); + _enc_clip_max.push_back( + enc_layer.multihead_project_kernel_output_clip_max()); + _enc_clip_max.push_back(enc_layer.ffn_first_kernel_clip_max()); + _enc_clip_max.push_back(enc_layer.ffn_second_kernel_clip_max()); + _enc_clip_max.push_back(enc_layer.multihead_ln_clip_max()); + _enc_clip_max.push_back(enc_layer.multihead_project_output_clip_max()); + _enc_clip_max.push_back(enc_layer.ffn_ln_clip_max()); + _enc_clip_max.push_back(enc_layer.ffn_first_act_clip_max()); + _enc_clip_max.push_back(enc_layer.multihead_qkv_dense_clip_max()); + _enc_clip_max.push_back(enc_layer.multihead_output_dense_clip_max()); + _enc_clip_max.push_back(enc_layer.ffn_first_output_clip_max()); + _enc_clip_max.push_back(enc_layer.self_qkv_bias_out_clip_max()); + } // for std::vector<_DataType> raw_value; for (float e : value) raw_value.push_back(float2required(e)); _d_enc_wei = raw_value; - for (int e : offset) - _p_d_enc_wei.push_back(thrust::raw_pointer_cast(_d_enc_wei.data()) + e); + for (int e : offset) _p_d_enc_wei.push_back(_d_enc_wei.data() + e); std::cout << "finish initializing enc_wei from host to device" << std::endl; return ""; } @@ -285,16 +306,23 @@ void QuantGptWeight::hdf5_parse_emb_wei(hid_t hdf5_file) { std::vector offset; std::vector value(value_size); // preallocate vector for performance + std::vector value_i8(value_size); std::cout << "loading " << value_size * sizeof(OpType_) / (1024 * 1024) << " MB of embedding weight." 
<< std::endl; int idx = 0; + float clip_max; offset.push_back(idx); read_hdf5_dataset_data( - hdf5_file, dataset_prefix + "/token_embedding", H5T_NATIVE_FLOAT, - value.data() + idx, + hdf5_file, dataset_prefix + "/token_embedding", H5T_NATIVE_UCHAR, + value_i8.data() + idx, [=](int size) { return size != _src_vocab_size * _hidden_size; }, "Wrong token_embedding_size !"); + read_hdf5_dataset_scalar(hdf5_file, dataset_prefix + "/emb_clip_max", + H5T_NATIVE_FLOAT, &clip_max); + dequantize_array(value_i8, value, clip_max, _quant_range, idx, + _src_vocab_size * _hidden_size); + _src_emb_clip_max = clip_max; idx += _src_vocab_size * _hidden_size; offset.push_back(idx); @@ -323,9 +351,7 @@ void QuantGptWeight::hdf5_parse_emb_wei(hid_t hdf5_file) { raw_value.reserve(value.size()); for (float e : value) raw_value.push_back(float2required(e)); _d_src_emb_wei = raw_value; - for (int e : offset) - _p_d_src_emb_wei.push_back(thrust::raw_pointer_cast(_d_src_emb_wei.data()) + - e); + for (int e : offset) _p_d_src_emb_wei.push_back(_d_src_emb_wei.data() + e); std::cout << "finish initializing emb_wei from host to device" << std::endl; } @@ -343,9 +369,11 @@ void QuantGptWeight::hdf5_parse_enc_wei(hid_t hdf5_file) { _n_enc_layer; std::vector offset; std::vector value(value_size); + std::vector value_i8(value_size); std::cout << "loading " << value_size * sizeof(OpType_) / (1024 * 1024) << " MB of encoder weight." << std::endl; + float clip_max; int idx = 0; for (int layer_id = 0; layer_id < _n_enc_layer; ++layer_id) { std::string dataset_prefix = "encoder_stack/" + std::to_string(layer_id); @@ -367,13 +395,18 @@ void QuantGptWeight::hdf5_parse_enc_wei(hid_t hdf5_file) { offset.push_back(idx); read_hdf5_dataset_data( hdf5_file, dataset_prefix + "/multihead_project_kernel_qkv", - H5T_NATIVE_FLOAT, value.data() + idx, + H5T_NATIVE_UCHAR, value_i8.data() + idx, [=](int size) { return size != _hidden_size * _hidden_size * 3; }, "Wrong multihead_project_kernel_qkv_size !"); + read_hdf5_dataset_scalar( + hdf5_file, dataset_prefix + "/multihead_project_kernel_qkv_clip_max", + H5T_NATIVE_FLOAT, &clip_max); + dequantize_array(value_i8, value, clip_max, _quant_range, idx, + _hidden_size * _hidden_size * 3); + _enc_clip_max.push_back(clip_max); idx += _hidden_size * _hidden_size * 3; offset.push_back(idx); - read_hdf5_dataset_data( hdf5_file, dataset_prefix + "/multihead_project_bias_qkv", H5T_NATIVE_FLOAT, value.data() + idx, @@ -384,9 +417,15 @@ void QuantGptWeight::hdf5_parse_enc_wei(hid_t hdf5_file) { offset.push_back(idx); read_hdf5_dataset_data( hdf5_file, dataset_prefix + "/multihead_project_kernel_output", - H5T_NATIVE_FLOAT, value.data() + idx, + H5T_NATIVE_UCHAR, value_i8.data() + idx, [=](int size) { return size != _hidden_size * _hidden_size; }, "Wrong multihead_project_kernel_output_size !"); + read_hdf5_dataset_scalar( + hdf5_file, dataset_prefix + "/multihead_project_kernel_output_clip_max", + H5T_NATIVE_FLOAT, &clip_max); + dequantize_array(value_i8, value, clip_max, _quant_range, idx, + _hidden_size * _hidden_size); + _enc_clip_max.push_back(clip_max); idx += _hidden_size * _hidden_size; offset.push_back(idx); @@ -413,10 +452,16 @@ void QuantGptWeight::hdf5_parse_enc_wei(hid_t hdf5_file) { offset.push_back(idx); read_hdf5_dataset_data( - hdf5_file, dataset_prefix + "/ffn_first_kernel", H5T_NATIVE_FLOAT, - value.data() + idx, + hdf5_file, dataset_prefix + "/ffn_first_kernel", H5T_NATIVE_UCHAR, + value_i8.data() + idx, [=](int size) { return size != _hidden_size * _inner_size; }, "Wrong 
ffn_first_kernel_size !"); + read_hdf5_dataset_scalar(hdf5_file, + dataset_prefix + "/ffn_first_kernel_clip_max", + H5T_NATIVE_FLOAT, &clip_max); + dequantize_array(value_i8, value, clip_max, _quant_range, idx, + _hidden_size * _inner_size); + _enc_clip_max.push_back(clip_max); idx += _hidden_size * _inner_size; offset.push_back(idx); @@ -428,10 +473,16 @@ void QuantGptWeight::hdf5_parse_enc_wei(hid_t hdf5_file) { offset.push_back(idx); read_hdf5_dataset_data( - hdf5_file, dataset_prefix + "/ffn_second_kernel", H5T_NATIVE_FLOAT, - value.data() + idx, + hdf5_file, dataset_prefix + "/ffn_second_kernel", H5T_NATIVE_UCHAR, + value_i8.data() + idx, [=](int size) { return size != _hidden_size * _inner_size; }, "Wrong ffn_second_kernel_size !"); + read_hdf5_dataset_scalar(hdf5_file, + dataset_prefix + "/ffn_second_kernel_clip_max", + H5T_NATIVE_FLOAT, &clip_max); + dequantize_array(value_i8, value, clip_max, _quant_range, idx, + _hidden_size * _inner_size); + _enc_clip_max.push_back(clip_max); idx += _hidden_size * _inner_size; offset.push_back(idx); @@ -440,14 +491,45 @@ void QuantGptWeight::hdf5_parse_enc_wei(hid_t hdf5_file) { value.data() + idx, [=](int size) { return size != _hidden_size; }, "Wrong ffn_second_bias_size !"); idx += _hidden_size; + + read_hdf5_dataset_scalar(hdf5_file, + dataset_prefix + "/multihead_ln_clip_max", + H5T_NATIVE_FLOAT, &clip_max); + _enc_clip_max.push_back(clip_max); + read_hdf5_dataset_scalar( + hdf5_file, dataset_prefix + "/multihead_project_output_clip_max", + H5T_NATIVE_FLOAT, &clip_max); + _enc_clip_max.push_back(clip_max); + read_hdf5_dataset_scalar(hdf5_file, dataset_prefix + "/ffn_ln_clip_max", + H5T_NATIVE_FLOAT, &clip_max); + _enc_clip_max.push_back(clip_max); + read_hdf5_dataset_scalar(hdf5_file, + dataset_prefix + "/ffn_first_act_clip_max", + H5T_NATIVE_FLOAT, &clip_max); + _enc_clip_max.push_back(clip_max); + read_hdf5_dataset_scalar(hdf5_file, + dataset_prefix + "/multihead_qkv_dense_clip_max", + H5T_NATIVE_FLOAT, &clip_max); + _enc_clip_max.push_back(clip_max); + read_hdf5_dataset_scalar( + hdf5_file, dataset_prefix + "/multihead_output_dense_clip_max", + H5T_NATIVE_FLOAT, &clip_max); + _enc_clip_max.push_back(clip_max); + read_hdf5_dataset_scalar(hdf5_file, + dataset_prefix + "/ffn_first_output_clip_max", + H5T_NATIVE_FLOAT, &clip_max); + _enc_clip_max.push_back(clip_max); + read_hdf5_dataset_scalar(hdf5_file, + dataset_prefix + "/self_qkv_bias_out_clip_max", + H5T_NATIVE_FLOAT, &clip_max); + _enc_clip_max.push_back(clip_max); } // for std::vector<_DataType> raw_value; for (float e : value) raw_value.push_back(float2required(e)); _d_enc_wei = raw_value; - for (int e : offset) - _p_d_enc_wei.push_back(thrust::raw_pointer_cast(_d_enc_wei.data()) + e); + for (int e : offset) _p_d_enc_wei.push_back(_d_enc_wei.data() + e); std::cout << "finish initializing enc_wei from host to device" << std::endl; } diff --git a/lightseq/inference/proto/quant_gpt_weight.h b/lightseq/inference/proto/quant_gpt_weight.h index 35d69aa7..7b5f5631 100644 --- a/lightseq/inference/proto/quant_gpt_weight.h +++ b/lightseq/inference/proto/quant_gpt_weight.h @@ -5,7 +5,6 @@ #include #include #include -#include #include #include @@ -44,8 +43,12 @@ class QuantGptWeight { std::vector _p_d_enc_wei; // size: 12 * enc_layer_num // store the weights on gpu memory - thrust::device_vector<_DataType> _d_src_emb_wei; - thrust::device_vector<_DataType> _d_enc_wei; + std::vector<_DataType> _d_src_emb_wei; + std::vector<_DataType> _d_enc_wei; + + // store the clip_max of weights and 
activations + float _src_emb_clip_max; + std::vector _enc_clip_max; // size: 11 * enc_layer_num public: std::string initializing(std::string weight_path); @@ -64,6 +67,12 @@ class QuantGptWeight { return _p_d_enc_wei; } + float get_src_emb_clip_max() const { return _src_emb_clip_max; } + + std::vector get_enc_clip_max() const { return _enc_clip_max; } + + const float _quant_range = 127; + int _hidden_size; int _inner_size; int _max_step; From c5f6aa2b014c15135bed44f37bcb1658cf85bf47 Mon Sep 17 00:00:00 2001 From: "weiyang.god" Date: Thu, 21 Apr 2022 17:09:37 +0800 Subject: [PATCH 34/49] add fake quant for logits gemm --- examples/inference/python/README.md | 9 +++++++++ .../ls_torch_hf_quant_gpt2_export.py | 2 ++ .../huggingface/gpt/ls_hf_gpt_layer.py | 6 ++++++ lightseq/inference/model/quant_gpt_encoder.h | 20 +++++++++++++++++++ lightseq/inference/proto/quant_gpt.proto | 2 ++ lightseq/inference/proto/quant_gpt_weight.cc | 12 +++++++---- lightseq/inference/proto/quant_gpt_weight.h | 2 ++ 7 files changed, 49 insertions(+), 4 deletions(-) diff --git a/examples/inference/python/README.md b/examples/inference/python/README.md index 449e4ae9..595fa688 100644 --- a/examples/inference/python/README.md +++ b/examples/inference/python/README.md @@ -18,6 +18,7 @@ We provide the following export examples. All Fairseq based models are trained u | Hugging Face + custom Torch layer BERT + QAT | Int8 | python export/huggingface/ls_torch_hf_quant_bert_export.py -m ckpt_ls_torch_hf_quant_bert_ner.bin | / | Export Hugging Face BERT training with custom Torch layers to hdf5 format. | | Hugging Face GPT2 | Float | python export/huggingface/hf_gpt2_export.py | / | Export Hugging Face GPT2 models to hdf5 format. | | Hugging Face + custom Torch layer GPT2 + QAT | Int8 | python export/huggingface/ls_torch_hf_quant_gpt2_export.py -m ckpt_ls_torch_hf_quant_gpt2_ner.bin | / | Export Hugging Face GPT2 training with custom Torch layers to hdf5 format. | +| Hugging Face ViT | Float | python export/huggingface/hf_vit_export.py | / | Export Hugging Face ViT models to hdf5 format. | | Native Fairseq Transformer | Float | python export/fairseq/native_fs_transformer_export.py -m ckpt_native_fairseq_31.06.pt | [link](http://lf3-nlp-opensource.bytetos.com/obj/nlp-opensource/lightseq/example_model/fairseq/ckpt_native_fairseq_31.06.pt) | Export native Fairseq Transformer models to protobuf/hdf5 format. | | Native Fairseq Transformer + PTQ | Int8 | python export/fairseq/native_fs_transformer_export.py -m ckpt_native_fairseq_31.06.pt | [link](http://lf3-nlp-opensource.bytetos.com/obj/nlp-opensource/lightseq/example_model/fairseq/ckpt_native_fairseq_31.06.pt) | Export native Fairseq Transformer models to int8 protobuf format using post training quantization. | | Fairseq + LightSeq Transformer | Float | python export/fairseq/ls_fs_transformer_export.py -m ckpt_ls_fairseq_31.17.pt | [link](http://lf3-nlp-opensource.bytetos.com/obj/nlp-opensource/lightseq/example_model/fairseq/ckpt_ls_fairseq_31.17.pt) | Export Fairseq Transformer models training with LightSeq modules to protobuf/hdf5 format. | @@ -45,6 +46,14 @@ python test/ls_gpt2.py ```shell python test/ls_vit.py ``` +5. Quantized BERT +```shell +python test/ls_quant_bert.py +``` +6. 
Quantized GPT2 +```shell +python test/ls_quant_gpt.py +``` ### Fairseq based models After exporting the Fairseq based models to protobuf/hdf5 format using above scripts, we can use the following script for fast LightSeq inference on wmt14 en2de dateset, compatible with fp16 and int8 models: diff --git a/examples/inference/python/export/huggingface/ls_torch_hf_quant_gpt2_export.py b/examples/inference/python/export/huggingface/ls_torch_hf_quant_gpt2_export.py index 64c5b2a2..f1925318 100644 --- a/examples/inference/python/export/huggingface/ls_torch_hf_quant_gpt2_export.py +++ b/examples/inference/python/export/huggingface/ls_torch_hf_quant_gpt2_export.py @@ -57,6 +57,8 @@ { "norm_scale": "ln_f weight", "norm_bias": "ln_f bias", + "output_ln_clip_max": "lm_head input_quant clip_value_max", + "logits_clip_max": "lm_head output_quant clip_value_max", } ) diff --git a/examples/training/huggingface/gpt/ls_hf_gpt_layer.py b/examples/training/huggingface/gpt/ls_hf_gpt_layer.py index b22c7325..45ab743c 100644 --- a/examples/training/huggingface/gpt/ls_hf_gpt_layer.py +++ b/examples/training/huggingface/gpt/ls_hf_gpt_layer.py @@ -3,6 +3,7 @@ from lightseq.training.ops.pytorch.quantization import ( qat_mode, disable_quant, + QuantLinear, TensorQuantizer, weight_quant_config, ) @@ -121,3 +122,8 @@ def inject_ls_layer(model, training_args, model_args, config): model.transformer.h[i].apply(qat_mode) else: model.transformer.h[i].apply(disable_quant) + + q_lm_head = QuantLinear(config.n_embd, config.vocab_size, bias=False) + q_lm_head.weight = model.transformer.wte.weight + q_lm_head.weight_quant = model.transformer.wte.emb_quant + model.lm_head = q_lm_head diff --git a/lightseq/inference/model/quant_gpt_encoder.h b/lightseq/inference/model/quant_gpt_encoder.h index d18579d8..cbb7ba50 100644 --- a/lightseq/inference/model/quant_gpt_encoder.h +++ b/lightseq/inference/model/quant_gpt_encoder.h @@ -6,6 +6,7 @@ #include #include #include +#include #include #include @@ -43,8 +44,11 @@ class QuantGptEncoder { cudaStream_t _stream; cudaStream_t _cache_stream; cublasHandle_t _hd; + // cublasLtHandle_t _cublas_lt_handle; const _DataType _fone; const _DataType _fzero; + // const int32_t _ione; + // const int32_t _izero; const _DataType _atten_scaler; const int _max_batch_dim; const int _max_thread_per_block; @@ -71,6 +75,10 @@ class QuantGptEncoder { int *_p_d_unfinished; curandState *_p_d_curandstate; //[batch_size] + // int8_t *_int8_ffn_in_buf; + // int32_t *_int32_ffn_out_buf; + // int8_t *_int8_ffn_out_buf; + // {token_emb, pos_emb, norm_scale, norm_bias} const std::vector &_p_d_src_emb_wei; // {multihead_norm_scale, multihead_norm_bias, multihead_qkv_kernel, @@ -79,6 +87,18 @@ class QuantGptEncoder { // ffn_first_kernel, ffn_first_bias, ffn_second_kernel, ffn_second_bias} * // encoder_layer_num const std::vector &_p_d_enc_wei; + // std::vector _p_device_wei; + // std::vector _p_device_emb; + + // std::vector _int8_p_d_enc_wei; + // int8_t *_int8_p_d_src_emb_wei; + // int8_t *_int8_p_d_src_emb_bottom_wei; + // const float _quant_range = 127; + // const float _src_emb_clip_max; + // const float _output_ln_clip_max; + // const float _logits_clip_max; + // const std::vector _enc_clip_max; // size: 12 * enc_layer_num + // std::vector<_DataType *> _scaled_ffn2_colsum; int _batch_size; int _batch_token_num; diff --git a/lightseq/inference/proto/quant_gpt.proto b/lightseq/inference/proto/quant_gpt.proto index 319ab5ff..ba2c63b7 100644 --- a/lightseq/inference/proto/quant_gpt.proto +++ 
b/lightseq/inference/proto/quant_gpt.proto @@ -51,6 +51,8 @@ message QuantGptEmbeddingLayer { // clip max float emb_clip_max = 5; + float output_ln_clip_max = 6; + float logits_clip_max = 7; } message QuantGptModelConf { diff --git a/lightseq/inference/proto/quant_gpt_weight.cc b/lightseq/inference/proto/quant_gpt_weight.cc index 4fd9b736..9da21402 100644 --- a/lightseq/inference/proto/quant_gpt_weight.cc +++ b/lightseq/inference/proto/quant_gpt_weight.cc @@ -80,6 +80,8 @@ std::string QuantGptWeight::proto_parse_emb_wei( value.push_back(dequantize(ele, _quant_range, layer.emb_clip_max())); idx += _src_vocab_size * _hidden_size; _src_emb_clip_max = layer.emb_clip_max(); + _output_ln_clip_max = layer.output_ln_clip_max(); + _logits_clip_max = layer.logits_clip_max(); offset.push_back(idx); if (layer.position_embedding_size() != _max_step * _hidden_size) @@ -310,7 +312,6 @@ void QuantGptWeight::hdf5_parse_emb_wei(hid_t hdf5_file) { std::cout << "loading " << value_size * sizeof(OpType_) / (1024 * 1024) << " MB of embedding weight." << std::endl; int idx = 0; - float clip_max; offset.push_back(idx); read_hdf5_dataset_data( @@ -319,10 +320,13 @@ void QuantGptWeight::hdf5_parse_emb_wei(hid_t hdf5_file) { [=](int size) { return size != _src_vocab_size * _hidden_size; }, "Wrong token_embedding_size !"); read_hdf5_dataset_scalar(hdf5_file, dataset_prefix + "/emb_clip_max", - H5T_NATIVE_FLOAT, &clip_max); - dequantize_array(value_i8, value, clip_max, _quant_range, idx, + H5T_NATIVE_FLOAT, &_src_emb_clip_max); + dequantize_array(value_i8, value, _src_emb_clip_max, _quant_range, idx, _src_vocab_size * _hidden_size); - _src_emb_clip_max = clip_max; + read_hdf5_dataset_scalar(hdf5_file, dataset_prefix + "/output_ln_clip_max", + H5T_NATIVE_FLOAT, &_output_ln_clip_max); + read_hdf5_dataset_scalar(hdf5_file, dataset_prefix + "/logits_clip_max", + H5T_NATIVE_FLOAT, &_logits_clip_max); idx += _src_vocab_size * _hidden_size; offset.push_back(idx); diff --git a/lightseq/inference/proto/quant_gpt_weight.h b/lightseq/inference/proto/quant_gpt_weight.h index 7b5f5631..b9370c7d 100644 --- a/lightseq/inference/proto/quant_gpt_weight.h +++ b/lightseq/inference/proto/quant_gpt_weight.h @@ -48,6 +48,8 @@ class QuantGptWeight { // store the clip_max of weights and activations float _src_emb_clip_max; + float _output_ln_clip_max; + float _logits_clip_max; std::vector _enc_clip_max; // size: 11 * enc_layer_num public: From 292cc3c9ce020d4664e6e612d132b54e911795d7 Mon Sep 17 00:00:00 2001 From: "weiyang.god" Date: Thu, 21 Apr 2022 19:11:44 +0800 Subject: [PATCH 35/49] support quant gpt inference (stage 2) --- .../inference/model/quant_gpt_encoder.cc.cu | 26 ++++---------- lightseq/inference/model/quant_gpt_encoder.h | 34 +++++++++---------- lightseq/inference/proto/quant_gpt_weight.h | 4 +++ lightseq/training/ops/pytorch/export.py | 4 +-- 4 files changed, 29 insertions(+), 39 deletions(-) diff --git a/lightseq/inference/model/quant_gpt_encoder.cc.cu b/lightseq/inference/model/quant_gpt_encoder.cc.cu index 2c3ef050..d7e09ea2 100644 --- a/lightseq/inference/model/quant_gpt_encoder.cc.cu +++ b/lightseq/inference/model/quant_gpt_encoder.cc.cu @@ -32,6 +32,12 @@ QuantGptEncoder::QuantGptEncoder( _p_d_enc_wei(tw.get_enc_wei()), _fone((_DataType)1.f), _fzero((_DataType)0.f), + _src_emb_clip_max(tw.get_src_emb_clip_max()), + _output_ln_clip_max(tw.get_output_ln_clip_max()), + _logits_clip_max(tw.get_logits_clip_max()), + _enc_clip_max(tw.get_enc_clip_max()), + _ione((int32_t)1), + _izero((int32_t)0), 
_atten_scaler((_DataType)sqrt(1.f / tw._dim_per_head)), _max_batch_dim(max_batch_size * tw._max_step * tw._hidden_size), _max_thread_per_block(1024), @@ -40,26 +46,6 @@ QuantGptEncoder::QuantGptEncoder( _h_sample_id(max_batch_size * tw._max_step, 0), _h_unfinished(1) {} -/** -Compute GPU memory size needed by gpt encoder, - to see how these memory is used, checkout init_buffer() for detail -*/ -template -size_t QuantGptEncoder::compute_buffer_bytesize() { - int si = _max_batch_size; - size_t sz0 = (size_t)_max_batch_dim; - sz0 += 2 * (size_t)_max_batch_dim * (size_t)_tw._n_enc_layer; - long long sz1 = (size_t)_max_batch_dim * 6 + - (size_t)_max_batch_size * (size_t)_tw._head_num * - (size_t)_tw._max_step * (size_t)_tw._max_step; - long long sz2 = (size_t)_max_batch_dim + (size_t)_max_batch_size * - (size_t)_tw._max_step * - (size_t)_tw._inner_size; - long long sz3 = (size_t)_max_batch_size * (size_t)_tw._max_step * - (size_t)_tw._src_vocab_size; - return (sz0 + max(max(sz1, sz2), sz3)) * sizeof(_DataType) + si * sizeof(int); -} - /** Init the GPU memory pointer which point to the memory buffer needed by encoder. diff --git a/lightseq/inference/model/quant_gpt_encoder.h b/lightseq/inference/model/quant_gpt_encoder.h index cbb7ba50..7adcbd7c 100644 --- a/lightseq/inference/model/quant_gpt_encoder.h +++ b/lightseq/inference/model/quant_gpt_encoder.h @@ -47,8 +47,8 @@ class QuantGptEncoder { // cublasLtHandle_t _cublas_lt_handle; const _DataType _fone; const _DataType _fzero; - // const int32_t _ione; - // const int32_t _izero; + const int32_t _ione; + const int32_t _izero; const _DataType _atten_scaler; const int _max_batch_dim; const int _max_thread_per_block; @@ -75,9 +75,9 @@ class QuantGptEncoder { int *_p_d_unfinished; curandState *_p_d_curandstate; //[batch_size] - // int8_t *_int8_ffn_in_buf; - // int32_t *_int32_ffn_out_buf; - // int8_t *_int8_ffn_out_buf; + int8_t *_int8_ffn_in_buf; + int32_t *_int32_ffn_out_buf; + int8_t *_int8_ffn_out_buf; // {token_emb, pos_emb, norm_scale, norm_bias} const std::vector &_p_d_src_emb_wei; @@ -87,18 +87,18 @@ class QuantGptEncoder { // ffn_first_kernel, ffn_first_bias, ffn_second_kernel, ffn_second_bias} * // encoder_layer_num const std::vector &_p_d_enc_wei; - // std::vector _p_device_wei; - // std::vector _p_device_emb; - - // std::vector _int8_p_d_enc_wei; - // int8_t *_int8_p_d_src_emb_wei; - // int8_t *_int8_p_d_src_emb_bottom_wei; - // const float _quant_range = 127; - // const float _src_emb_clip_max; - // const float _output_ln_clip_max; - // const float _logits_clip_max; - // const std::vector _enc_clip_max; // size: 12 * enc_layer_num - // std::vector<_DataType *> _scaled_ffn2_colsum; + std::vector _p_device_wei; + std::vector _p_device_emb; + + std::vector _int8_p_d_enc_wei; + int8_t *_int8_p_d_src_emb_wei; + int8_t *_int8_p_d_src_emb_bottom_wei; + const float _quant_range = 127; + const float _src_emb_clip_max; + const float _output_ln_clip_max; + const float _logits_clip_max; + const std::vector _enc_clip_max; // size: 12 * enc_layer_num + std::vector<_DataType *> _scaled_ffn2_colsum; int _batch_size; int _batch_token_num; diff --git a/lightseq/inference/proto/quant_gpt_weight.h b/lightseq/inference/proto/quant_gpt_weight.h index b9370c7d..748b566b 100644 --- a/lightseq/inference/proto/quant_gpt_weight.h +++ b/lightseq/inference/proto/quant_gpt_weight.h @@ -71,6 +71,10 @@ class QuantGptWeight { float get_src_emb_clip_max() const { return _src_emb_clip_max; } + float get_output_ln_clip_max() const { return _output_ln_clip_max; } + + 
float get_logits_clip_max() const { return _logits_clip_max; } + std::vector get_enc_clip_max() const { return _enc_clip_max; } const float _quant_range = 127; diff --git a/lightseq/training/ops/pytorch/export.py b/lightseq/training/ops/pytorch/export.py index ef485a03..d8dac8e0 100644 --- a/lightseq/training/ops/pytorch/export.py +++ b/lightseq/training/ops/pytorch/export.py @@ -73,8 +73,8 @@ def check_rule(tensor_name, rule): except: target_tensor = tt["save"] print( - "%s -> %s, shape: %s, convert finished!" - % (target_tn if target_tn else "created", proto_name, target_tensor.shape) + "%s -> %s, convert finished!" + % (target_tn if target_tn else "created", proto_name) ) return target_tensor From 7ba1c6ac072bc45782c4587cab331cf9f9953eea Mon Sep 17 00:00:00 2001 From: "weiyang.god" Date: Sun, 24 Apr 2022 22:56:02 +0800 Subject: [PATCH 36/49] support quant gpt inference (stage 3) --- lightseq/inference/kernels/CMakeLists.txt | 1 + .../inference/kernels/gptKernels_int8.cc.cu | 123 +++++++++++ lightseq/inference/kernels/gptKernels_int8.h | 19 ++ .../inference/model/quant_gpt_encoder.cc.cu | 193 +++++++++++++++--- lightseq/inference/pywrapper/quant_gpt.cc | 5 +- lightseq/inference/pywrapper/quant_gpt.h | 1 - lightseq/inference/pywrapper/wrapper.cc | 2 +- 7 files changed, 309 insertions(+), 35 deletions(-) create mode 100644 lightseq/inference/kernels/gptKernels_int8.cc.cu create mode 100644 lightseq/inference/kernels/gptKernels_int8.h diff --git a/lightseq/inference/kernels/CMakeLists.txt b/lightseq/inference/kernels/CMakeLists.txt index 5f647bcd..b9cebb32 100644 --- a/lightseq/inference/kernels/CMakeLists.txt +++ b/lightseq/inference/kernels/CMakeLists.txt @@ -2,6 +2,7 @@ cmake_minimum_required(VERSION 3.18) set(cuda_kernel_files gptKernels.cc.cu + gptKernels_int8.cc.cu transformerKernels.cc.cu multilgKernels.cc.cu embKernels.cc.cu diff --git a/lightseq/inference/kernels/gptKernels_int8.cc.cu b/lightseq/inference/kernels/gptKernels_int8.cc.cu new file mode 100644 index 00000000..6f57cf7c --- /dev/null +++ b/lightseq/inference/kernels/gptKernels_int8.cc.cu @@ -0,0 +1,123 @@ +#include + +#include "common.h" +#include "gptKernels_int8.h" +#include "transformerKernels.h" +/** +@file +Implemented the cuda kernel function and its launcher +that required by GPT model. 
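The int8 path here reads quantized tensors directly: ker_gpt_embedding_int8
looks up int8 token embeddings, rescales them by dequant_scale
(clip_max / 127) and adds the position embeddings on the fly.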
+Currently, fp16 and fp32 versions are provided +*/ +namespace lightseq { +namespace cuda { + +/** +@brief: ker_gpt_embedding_int8 +for encoder, look up token embedding, add position embedding + +@thread +gridDim.x = batch_size +gridDim.y = token_seq_len +blockDim.x = hidden_size + +@param +token_emb: [vocab_size, hidden_size] +pos_emb: [max_step, hidden_size] +token_id: input token id, [batch_size, token_seq_len] +output: result, [batch_size, token_seq_len, hidden_size] +real_seq_len: record seq len exclude padding, [batch_size] +padding_id, the padding_id, default 0 +pos_offset: get real pos when decoding which gridDim.y=1 +*/ +template +__global__ void ker_gpt_embedding_int8(const int8_t* token_emb, const T* pos_emb, + const int* token_id, T* output, + int* real_seq_len, int padding_id, + int pos_offset, float dequant_scale) { + int target_pos = blockIdx.x * gridDim.y + blockIdx.y; + int tid = token_id[target_pos]; + if (tid == padding_id) { + // for padding id + output[target_pos * blockDim.x + threadIdx.x] = 0.f; + return; + } + if (threadIdx.x == 0) { + atomicAdd(real_seq_len + blockIdx.x, 1); + } + output[target_pos * blockDim.x + threadIdx.x] = + T(token_emb[tid * blockDim.x + threadIdx.x]) * dequant_scale + + pos_emb[(blockIdx.y + pos_offset) * blockDim.x + threadIdx.x]; +} + +/* fp16 version */ +template <> +__global__ void ker_gpt_embedding_int8<__half>(const int8_t* token_emb, + const __half* pos_emb, + const int* token_id, __half* output, + int* real_seq_len, int padding_id, + int pos_offset, float dequant_scale) { + int target_pos = blockIdx.x * gridDim.y + blockIdx.y; + int tid = token_id[target_pos]; + half2* output_h = (half2*)output; + + if (tid == padding_id) { + // for padding id + output_h[target_pos * blockDim.x + threadIdx.x] = __float2half2_rn(0.f); + return; + } + if (threadIdx.x == 0) { + atomicAdd(real_seq_len + blockIdx.x, 1); + } + + float2 te; + char2 cte = ((const char2*)token_emb)[tid * blockDim.x + threadIdx.x]; + float2 pe = __half22float2( + ((const half2*) + pos_emb)[(blockIdx.y + pos_offset) * blockDim.x + threadIdx.x]); + te.x = float(cte.x) + pe.x; + te.y = float(cte.y) + pe.y; + output_h[target_pos * blockDim.x + threadIdx.x] = __float22half2_rn(te); +} + +template +void ker_gpt_embedding_int8_launcher(int batch_size, int batch_seq_len, + int hidden_size, cudaStream_t stream, + const int8_t* token_emb, const T* pos_emb, + const int* token_id, T* output, + int* real_seq_len, int padding_id, + int pos_offset, float dequant_scale) { + ker_gpt_embedding_int8 + <<>>( + token_emb, pos_emb, token_id, output, real_seq_len, padding_id, + pos_offset, dequant_scale); +} + +template <> +void ker_gpt_embedding_int8_launcher<__half>(int batch_size, int batch_seq_len, + int hidden_size, cudaStream_t stream, + const int8_t* token_emb, + const __half* pos_emb, + const int* token_id, __half* output, + int* real_seq_len, int padding_id, + int pos_offset, float dequant_scale) { + ker_gpt_embedding_int8<__half> + <<>>( + token_emb, pos_emb, token_id, output, real_seq_len, padding_id, + pos_offset, dequant_scale); +} + +template void ker_gpt_embedding_int8_launcher( + int batch_size, int batch_seq_len, int hidden_size, cudaStream_t stream, + const int8_t* token_emb, const float* pos_emb, const int* token_id, + float* output, int* real_seq_len, int padding_id, int pos_offset, + float dequant_scale); + +template void ker_gpt_embedding_int8_launcher<__half>( + int batch_size, int batch_seq_len, int hidden_size, cudaStream_t stream, + const int8_t* token_emb, const __half* 
pos_emb, const int* token_id, + __half* output, int* real_seq_len, int padding_id, int pos_offset, + float dequant_scale); + +} // namespace cuda +} // namespace lightseq diff --git a/lightseq/inference/kernels/gptKernels_int8.h b/lightseq/inference/kernels/gptKernels_int8.h new file mode 100644 index 00000000..a59edff7 --- /dev/null +++ b/lightseq/inference/kernels/gptKernels_int8.h @@ -0,0 +1,19 @@ +#pragma once +#include +#include +#include +#include + +namespace lightseq { +namespace cuda { + +template +void ker_gpt_embedding_int8_launcher(int batch_size, int batch_seq_len, + int hidden_size, cudaStream_t stream, + const int8_t* token_emb, const T* pos_emb, + const int* token_id, T* output, + int* real_seq_len, int padding_id, + int pos_offset, float dequant_scale); + +} // namespace cuda +} // namespace lightseq diff --git a/lightseq/inference/model/quant_gpt_encoder.cc.cu b/lightseq/inference/model/quant_gpt_encoder.cc.cu index d7e09ea2..efa75aeb 100644 --- a/lightseq/inference/model/quant_gpt_encoder.cc.cu +++ b/lightseq/inference/model/quant_gpt_encoder.cc.cu @@ -1,4 +1,5 @@ #include "../kernels/gptKernels.h" +#include "../kernels/gptKernels_int8.h" #include "../kernels/transformerKernels.h" #include "../kernels/transformerKernels_int8.h" #include "quant_gpt_encoder.h" @@ -53,39 +54,173 @@ These buffer are used during custom cuda kernel function, find the corresponding function to see how these buffer are used */ template -void QuantGptEncoder::init_buffer(void *pbuf) { - // int buffer - int *p_d_int = reinterpret_cast(pbuf); - _p_d_real_seq_len = p_d_int; - p_d_int += _max_batch_size; - - // datatype buffer - _DataType *p_d_datatype = reinterpret_cast<_DataType *>(p_d_int); - _p_d_query = p_d_datatype; - _p_d_k_cache = _p_d_query + _max_batch_dim; - _p_d_v_cache = _p_d_k_cache + _max_batch_dim * _tw._n_enc_layer; - p_d_datatype = _p_d_v_cache + _max_batch_dim * _tw._n_enc_layer; - // reuse 1 --------------------- - _p_d_qkv_projected = p_d_datatype; - _p_d_q = _p_d_qkv_projected + _max_batch_dim * 3; - _p_d_k = _p_d_q + _max_batch_dim; - _p_d_v = _p_d_k + _max_batch_dim; - // _max_batch_size * _tw._head_num * - // _tw._max_step * _tw._max_step - _p_d_c = _p_d_v + _max_batch_dim; - // reuse 2 --------------------- - _p_d_ffn_buf1 = p_d_datatype; - // _max_batch_size * _tw._max_step * _tw._inner_size - _p_d_ffn_buf2 = _p_d_ffn_buf1 + _max_batch_dim; - // reuse 3 --------------------- - // _max_batch_size * _tw._max_step * _tw._src_vocab_size - _p_d_logit = p_d_datatype; +void QuantGptEncoder::init_buffer() { + CHECK_GPU_ERROR( + cudaMalloc(&_p_d_real_seq_len, _max_batch_size * sizeof(int))); + CHECK_GPU_ERROR(cudaMalloc(&_p_d_query, _max_batch_dim * sizeof(_DataType))); + CHECK_GPU_ERROR(cudaMalloc(&_p_d_c, _max_batch_size * _tw._head_num * + _tw._max_step * _tw._max_step * + sizeof(_DataType))); CHECK_GPU_ERROR(cudaMalloc((void **)&_p_d_curandstate, _max_batch_size * sizeof(curandState))); CHECK_GPU_ERROR(cudaMalloc((void **)&_p_d_sample_id_buf, _max_batch_size * _tw._max_step * sizeof(int))); CHECK_GPU_ERROR(cudaMalloc((void **)&_p_d_unfinished, sizeof(int))); ker_curand_setup<<<_max_batch_size, 1, 0, _stream>>>(_p_d_curandstate); + + int max_batch_dim = + _max_batch_size * _tw._beam_size * + round_up(std::max(_tw._inner_size, _tw._hidden_size * 3), 32); + CHECK_GPU_ERROR( + cudaMalloc(&_int8_ffn_in_buf, max_batch_dim * sizeof(int8_t))); + CHECK_GPU_ERROR(cudaMalloc( + &_int32_ffn_out_buf, + std::max(std::max(max_batch_dim, _max_batch_size * _tw._beam_size * + _tw._head_num * 
_tw._max_step), + round_up(_tw._src_vocab_size, 32) * _tw._beam_size * + _max_batch_size) * + sizeof(int32_t))); + CHECK_GPU_ERROR( + cudaMalloc(&_int8_ffn_out_buf, + std::max(max_batch_dim, round_up(_tw._src_vocab_size, 32) * + _tw._beam_size * _max_batch_size) * + sizeof(int8_t))); + + // malloc embeddings + CHECK_GPU_ERROR( + cudaMalloc(&_int8_p_d_src_emb_wei, + _tw._src_vocab_size * _tw._hidden_size * sizeof(int8_t))); + quantize_weight(_p_d_src_emb_wei[0], _int8_p_d_src_emb_wei, _tw._hidden_size, + _tw._src_vocab_size, _quant_range / _src_emb_clip_max, + _stream, _cublas_lt_handle); + CHECK_GPU_ERROR( + cudaMalloc(&_int8_p_d_src_emb_bottom_wei, + _tw._src_vocab_size * _tw._hidden_size * sizeof(int8_t))); + quantize_weight(_p_d_src_emb_wei[0], _int8_p_d_src_emb_bottom_wei, + _tw._hidden_size, _tw._src_vocab_size, + _quant_range / _src_emb_clip_max, _stream, _cublas_lt_handle, + kRowMajor); + _p_device_emb.push_back(nullptr); + _p_device_emb.push_back( + to_gpu(_p_d_src_emb_wei[1], _tw._max_step * _tw._hidden_size, _stream)); + _p_device_emb.push_back( + to_gpu(_p_d_src_emb_wei[2], _tw._hidden_size, _stream)); + _p_device_emb.push_back( + to_gpu(_p_d_src_emb_wei[3], _tw._hidden_size, _stream)); + + // malloc reused kv cache max size: _tw._hidden_size * 2 * _tw._n_enc_layer * + // _max_batch_size * _max_step * sizeof(T) + int8_t *self_kv_cache_buffer; + int8_t *sliding_p; + CHECK_GPU_ERROR( + cudaMalloc(&self_kv_cache_buffer, + _layer_size_self_k * _tw._n_enc_layer * 4 * sizeof(int8_t))); + + sliding_p = self_kv_cache_buffer; + for (int i = 0; i < _tw._n_enc_layer * 2; i++) { + _p_d_self_k_cache.push_back(sliding_p); + sliding_p += _layer_size_self_k; + } + for (int i = 0; i < _tw._n_enc_layer * 2; i++) { + _p_d_self_v_cache.push_back(sliding_p); + sliding_p += _layer_size_self_k; + } + _p_d_self_k_cache1 = _p_d_self_k_cache.data(); + _p_d_self_k_cache2 = _p_d_self_k_cache.data() + _tw._n_enc_layer; + _p_d_self_v_cache1 = _p_d_self_v_cache.data(); + _p_d_self_v_cache2 = _p_d_self_v_cache.data() + _tw._n_enc_layer; + + // malloc weights + _int8_p_d_enc_wei = std::vector(_tw._n_enc_layer * 4); + _scaled_ffn2_colsum = std::vector<_DataType *>(_tw._n_enc_layer); + for (_layer_id = 0; _layer_id < _tw._n_enc_layer; _layer_id++) { + _weight_offset = _layer_id * _tw._weight_per_enc_layer; + // malloc quantized weights + CHECK_GPU_ERROR( + cudaMalloc(&_int8_p_d_enc_wei[_layer_id * 4], + _tw._hidden_size * 3 * _tw._hidden_size * sizeof(int8_t))); + CHECK_GPU_ERROR( + cudaMalloc(&_int8_p_d_enc_wei[_layer_id * 4 + 1], + _tw._hidden_size * _tw._hidden_size * sizeof(int8_t))); + CHECK_GPU_ERROR( + cudaMalloc(&_int8_p_d_enc_wei[_layer_id * 4 + 2], + _tw._hidden_size * _tw._inner_size * sizeof(int8_t))); + CHECK_GPU_ERROR( + cudaMalloc(&_int8_p_d_enc_wei[_layer_id * 4 + 3], + _tw._inner_size * _tw._hidden_size * sizeof(int8_t))); + + // malloc unquantized weights + _p_device_wei.push_back( + to_gpu(_p_d_enc_wei[_weight_offset], _tw._hidden_size, _stream)); + _p_device_wei.push_back( + to_gpu(_p_d_enc_wei[_weight_offset + 1], _tw._hidden_size, _stream)); + _p_device_wei.push_back(nullptr); + _p_device_wei.push_back(to_gpu(_p_d_enc_wei[_weight_offset + 3], + _tw._hidden_size * 3, _stream)); + _p_device_wei.push_back(nullptr); + _p_device_wei.push_back( + to_gpu(_p_d_enc_wei[_weight_offset + 5], _tw._hidden_size, _stream)); + _p_device_wei.push_back( + to_gpu(_p_d_enc_wei[_weight_offset + 6], _tw._hidden_size, _stream)); + _p_device_wei.push_back( + to_gpu(_p_d_enc_wei[_weight_offset + 7], 
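/* The nullptr entries pushed into _p_device_wei stand in for the qkv,
   attention-output and FFN kernels, which are kept only as int8 copies in
   _int8_p_d_enc_wei (filled by the quantize_weight calls below); only the
   layer-norm parameters and biases stay on the device in float. */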
_tw._hidden_size, _stream)); + _p_device_wei.push_back(nullptr); + _p_device_wei.push_back( + to_gpu(_p_d_enc_wei[_weight_offset + 9], _tw._inner_size, _stream)); + _p_device_wei.push_back(nullptr); + _p_device_wei.push_back( + to_gpu(_p_d_enc_wei[_weight_offset + 11], _tw._hidden_size, _stream)); + + quantize_weight(_p_d_enc_wei[_weight_offset + 2], + _int8_p_d_enc_wei[_layer_id * 4], _tw._hidden_size, + _tw._hidden_size * 3, + _quant_range / _enc_clip_max[_layer_id * 12], _stream, + _cublas_lt_handle); + + quantize_weight(_p_d_enc_wei[_weight_offset + 4], + _int8_p_d_enc_wei[_layer_id * 4 + 1], _tw._hidden_size, + _tw._hidden_size, + _quant_range / _enc_clip_max[_layer_id * 12 + 1], _stream, + _cublas_lt_handle, kColMajor); + + quantize_weight(_p_d_enc_wei[_weight_offset + 8], + _int8_p_d_enc_wei[_layer_id * 4 + 2], _tw._hidden_size, + _tw._inner_size, + _quant_range / _enc_clip_max[_layer_id * 12 + 2], _stream, + _cublas_lt_handle); + + quantize_weight(_p_d_enc_wei[_weight_offset + 10], + _int8_p_d_enc_wei[_layer_id * 4 + 3], _tw._inner_size, + _tw._hidden_size, + _quant_range / _enc_clip_max[_layer_id * 12 + 3], _stream, + _cublas_lt_handle, kColMajor); + + if (_tw._use_gelu) { + _scaled_ffn2_colsum[_layer_id] = nullptr; + } else { + CHECK_GPU_ERROR(cudaMalloc(&_scaled_ffn2_colsum[_layer_id], + _tw._hidden_size * sizeof(_DataType))); + float relu_scale = _enc_clip_max[_layer_id * 12 + 7] / 2; + + _DataType *temp; + int weight_size = _tw._inner_size * _tw._hidden_size; + + CHECK_GPU_ERROR(cudaMalloc(&temp, weight_size * sizeof(_DataType))); + CHECK_GPU_ERROR(cudaMemcpyAsync(temp, _p_d_enc_wei[_weight_offset + 10], + weight_size * sizeof(_DataType), + cudaMemcpyHostToDevice, _stream)); + launch_scaled_colsum(temp, _scaled_ffn2_colsum[_layer_id], + _tw._inner_size, _tw._hidden_size, relu_scale, + _stream); + CHECK_GPU_ERROR(cudaGetLastError()); + CHECK_GPU_ERROR(cudaFree(temp)); + } + } + + CHECK_GPU_ERROR(cudaStreamSynchronize(_stream)); + CHECK_GPU_ERROR(cudaGetLastError()); + std::cout << "quantized encoder buffer init succeed" << std::endl; + return; } @@ -150,8 +285,8 @@ void QuantGptEncoder::run_one_infer(int batch_size, #endif // token embedding, add position embedding and layer_norm - ker_gpt_embedding_launcher<_DataType>( - batch_size, batch_seq_len, _tw._hidden_size, _stream, _p_d_src_emb_wei[0], + ker_gpt_embedding_int8_launcher<_DataType>( + batch_size, batch_seq_len, _tw._hidden_size, _stream, _int8_p_d_src_emb_bottom_wei, _p_d_src_emb_wei[1], _p_d_token_id, _p_d_query, _p_d_real_seq_len, _tw._padding_id, 0); diff --git a/lightseq/inference/pywrapper/quant_gpt.cc b/lightseq/inference/pywrapper/quant_gpt.cc index 332e2db1..6c836d9e 100644 --- a/lightseq/inference/pywrapper/quant_gpt.cc +++ b/lightseq/inference/pywrapper/quant_gpt.cc @@ -49,9 +49,7 @@ QuantGpt::QuantGpt(const std::string weight_path, const int max_batch_size) std::cout << "Allocated " << buf_bytesize / (1024 * 1024) << "MB GPU buffer for GPT2" << std::endl; - // encoder and decoder use the same buffer to save gpu memory useage - CHECK_GPU_ERROR(cudaMalloc((void**)&d_buf_, (size_t)buf_bytesize)); - encoder_->init_buffer(d_buf_); + encoder_->init_buffer(); CHECK_GPU_ERROR(cudaStreamSynchronize(stream_)); } @@ -59,7 +57,6 @@ QuantGpt::~QuantGpt() { CHECK_GPU_ERROR(cudaFree(d_input_)); CHECK_GPU_ERROR(cudaFree(d_sample_id)); CHECK_GPU_ERROR(cudaFree(d_ppl)); - CHECK_GPU_ERROR(cudaFree(d_buf_)); CHECK_GPU_ERROR(cudaStreamDestroy(stream_)); CHECK_GPU_ERROR(cudaStreamDestroy(cache_stream_)); 
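// d_buf_ is no longer freed here: QuantGptEncoder::init_buffer() now
// cudaMallocs its own int8/int32 activation buffers and quantized weight
// copies, so the wrapper has no shared scratch buffer left to manage.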
CHECK_GPU_ERROR(cublasDestroy(hd_)); diff --git a/lightseq/inference/pywrapper/quant_gpt.h b/lightseq/inference/pywrapper/quant_gpt.h index 4c87f884..6032b580 100644 --- a/lightseq/inference/pywrapper/quant_gpt.h +++ b/lightseq/inference/pywrapper/quant_gpt.h @@ -22,7 +22,6 @@ class QuantGpt : public LSModel { int* d_input_; int* d_sample_id; float* d_ppl; - void* d_buf_; int _max_batch_size; cudaStream_t stream_; diff --git a/lightseq/inference/pywrapper/wrapper.cc b/lightseq/inference/pywrapper/wrapper.cc index 2d4bace4..ab9b4f36 100644 --- a/lightseq/inference/pywrapper/wrapper.cc +++ b/lightseq/inference/pywrapper/wrapper.cc @@ -398,7 +398,7 @@ class PyGpt { if (model_->get_output_dtype(0) != lightseq::cuda::DataType::kFloat32) { throw std::runtime_error( "This model is not for ppl, you should set the sampling_method to " - "ppl"); + "topk or topp"); } lightseq::cuda::CHECK_GPU_ERROR( From 1ab6bfc62b87c29ba5d05b6101620733740e5154 Mon Sep 17 00:00:00 2001 From: "weiyang.god" Date: Mon, 25 Apr 2022 18:59:15 +0800 Subject: [PATCH 37/49] support quant gpt inference (ppl) --- .../inference/python/test/ls_quant_gpt2.py | 120 ++++++ .../inference/kernels/gptKernels_int8.cc.cu | 152 +++++--- lightseq/inference/kernels/gptKernels_int8.h | 18 +- .../inference/model/quant_gpt_encoder.cc.cu | 357 ++++++++---------- lightseq/inference/model/quant_gpt_encoder.h | 5 +- lightseq/inference/pywrapper/quant_gpt.cc | 4 - lightseq/inference/pywrapper/wrapper.cc | 4 +- 7 files changed, 393 insertions(+), 267 deletions(-) create mode 100644 examples/inference/python/test/ls_quant_gpt2.py diff --git a/examples/inference/python/test/ls_quant_gpt2.py b/examples/inference/python/test/ls_quant_gpt2.py new file mode 100644 index 00000000..74acefc4 --- /dev/null +++ b/examples/inference/python/test/ls_quant_gpt2.py @@ -0,0 +1,120 @@ +import time +import argparse + +import torch +import numpy as np +import lightseq.inference as lsi +from transformers import GPT2Tokenizer, GPT2LMHeadModel + + +def ls_gpt2(model, inputs): + torch.cuda.synchronize() + start_time = time.perf_counter() + generated_ids = model.sample(inputs) + torch.cuda.synchronize() + end_time = time.perf_counter() + return generated_ids, end_time - start_time + + +def hf_gpt2(model, inputs, tokenizer): + inputs = inputs.to("cuda:0") + torch.cuda.synchronize() + start_time = time.perf_counter() + generated_ids = model.generate( + inputs, max_length=50, pad_token_id=tokenizer.eos_token_id + ) + torch.cuda.synchronize() + end_time = time.perf_counter() + return generated_ids, end_time - start_time + + +def ls_generate(model, tokenizer, inputs): + print("=========lightseq=========") + print("lightseq generating...") + ls_res_ids, ls_time = ls_gpt2(model, inputs) + ls_res = tokenizer.batch_decode(ls_res_ids, skip_special_tokens=True) + print(f"lightseq time: {ls_time}s") + print("lightseq results:") + for sent in ls_res: + print(sent) + + +def hf_generate(model, tokenizer, inputs): + print("=========huggingface=========") + print("huggingface generating...") + hf_res_ids, hf_time = hf_gpt2(model, inputs, tokenizer) + hf_res = tokenizer.batch_decode(hf_res_ids, skip_special_tokens=True) + print(f"huggingface time: {hf_time}s") + print("huggingface results:") + for sent in hf_res: + print(sent) + + +def warmup(ls_tokenizer, hf_tokenizer, ls_model, hf_model, sentences): + ls_inputs = ls_tokenizer(sentences, return_tensors="pt", padding=True)["input_ids"] + hf_inputs = hf_tokenizer(sentences, return_tensors="pt", padding=True)["input_ids"] + + 
ls_generate(ls_model, ls_tokenizer, ls_inputs) + hf_generate(hf_model, hf_tokenizer, hf_inputs) + + +def main(): + parser = argparse.ArgumentParser() + parser.add_argument("--user_input", action="store_true") + args = parser.parse_args() + + print("initializing gpt tokenizer...") + + ls_tokenizer = GPT2Tokenizer.from_pretrained("gpt2") + # lightseq use len(tokenizer) as pad_token in default + ls_tokenizer.add_special_tokens({"pad_token": "[PAD]"}) + print(f"lightseq tokenizer pad token id: {ls_tokenizer.pad_token_id}") + + hf_tokenizer = GPT2Tokenizer.from_pretrained("gpt2") + # use EOS as PAD for huggingface to avoid warning according to https://huggingface.co/blog/how-to-generate while avoid reshaping the model embedding + hf_tokenizer.pad_token = hf_tokenizer.eos_token + print(f"huggingface tokenizer pad token id: {hf_tokenizer.pad_token_id}") + + print("creating lightseq model...") + ls_model = lsi.QuantGpt("lightseq_gpt2_base.hdf5", max_batch_size=16) + + print("creating huggingface model...") + hf_model = GPT2LMHeadModel.from_pretrained("gpt2") + hf_model.to("cuda:0") + hf_model.eval() + + # lightseq gpt perplexity supports batch infer with different lengths, + # but sampling doesn't support + sentences = [ + "My name is GPT", + "My name is GPT", + "My name is GPT", + "My name is GPT", + ] + + print("====================START warmup====================") + warmup(ls_tokenizer, hf_tokenizer, ls_model, hf_model, sentences) + print("====================END warmup====================") + + while True: + if args.user_input: + sentences = [input("input the masked sentence:\n")] + + print("tokenizing the sentences...") + + ls_inputs = ls_tokenizer(sentences, return_tensors="pt", padding=True)[ + "input_ids" + ] + hf_inputs = hf_tokenizer(sentences, return_tensors="pt", padding=True)[ + "input_ids" + ] + + ls_generate(ls_model, ls_tokenizer, ls_inputs) + hf_generate(hf_model, hf_tokenizer, hf_inputs) + + if not args.user_input: + break + + +if __name__ == "__main__": + main() diff --git a/lightseq/inference/kernels/gptKernels_int8.cc.cu b/lightseq/inference/kernels/gptKernels_int8.cc.cu index 6f57cf7c..74b64af1 100644 --- a/lightseq/inference/kernels/gptKernels_int8.cc.cu +++ b/lightseq/inference/kernels/gptKernels_int8.cc.cu @@ -12,29 +12,12 @@ Currently, fp16 and fp32 versions are provided namespace lightseq { namespace cuda { -/** -@brief: ker_gpt_embedding_int8 -for encoder, look up token embedding, add position embedding - -@thread -gridDim.x = batch_size -gridDim.y = token_seq_len -blockDim.x = hidden_size - -@param -token_emb: [vocab_size, hidden_size] -pos_emb: [max_step, hidden_size] -token_id: input token id, [batch_size, token_seq_len] -output: result, [batch_size, token_seq_len, hidden_size] -real_seq_len: record seq len exclude padding, [batch_size] -padding_id, the padding_id, default 0 -pos_offset: get real pos when decoding which gridDim.y=1 -*/ template -__global__ void ker_gpt_embedding_int8(const int8_t* token_emb, const T* pos_emb, - const int* token_id, T* output, - int* real_seq_len, int padding_id, - int pos_offset, float dequant_scale) { +__global__ void ker_gpt_embedding_int8(const int8_t* token_emb, + const T* pos_emb, const int* token_id, + T* output, int* real_seq_len, + int padding_id, int pos_offset, + float dequant_scale) { int target_pos = blockIdx.x * gridDim.y + blockIdx.y; int tid = token_id[target_pos]; if (tid == padding_id) { @@ -52,11 +35,10 @@ __global__ void ker_gpt_embedding_int8(const int8_t* token_emb, const T* pos_emb /* fp16 version */ template 
<> -__global__ void ker_gpt_embedding_int8<__half>(const int8_t* token_emb, - const __half* pos_emb, - const int* token_id, __half* output, - int* real_seq_len, int padding_id, - int pos_offset, float dequant_scale) { +__global__ void ker_gpt_embedding_int8<__half>( + const int8_t* token_emb, const __half* pos_emb, const int* token_id, + __half* output, int* real_seq_len, int padding_id, int pos_offset, + float dequant_scale) { int target_pos = blockIdx.x * gridDim.y + blockIdx.y; int tid = token_id[target_pos]; half2* output_h = (half2*)output; @@ -75,18 +57,18 @@ __global__ void ker_gpt_embedding_int8<__half>(const int8_t* token_emb, float2 pe = __half22float2( ((const half2*) pos_emb)[(blockIdx.y + pos_offset) * blockDim.x + threadIdx.x]); - te.x = float(cte.x) + pe.x; - te.y = float(cte.y) + pe.y; + te.x = float(cte.x) * dequant_scale + pe.x; + te.y = float(cte.y) * dequant_scale + pe.y; output_h[target_pos * blockDim.x + threadIdx.x] = __float22half2_rn(te); } template -void ker_gpt_embedding_int8_launcher(int batch_size, int batch_seq_len, - int hidden_size, cudaStream_t stream, - const int8_t* token_emb, const T* pos_emb, - const int* token_id, T* output, - int* real_seq_len, int padding_id, - int pos_offset, float dequant_scale) { +void ker_gpt_embedding_i8I_launcher(int batch_size, int batch_seq_len, + int hidden_size, cudaStream_t stream, + const int8_t* token_emb, const T* pos_emb, + const int* token_id, T* output, + int* real_seq_len, int padding_id, + int pos_offset, float dequant_scale) { ker_gpt_embedding_int8 <<>>( token_emb, pos_emb, token_id, output, real_seq_len, padding_id, @@ -94,30 +76,112 @@ void ker_gpt_embedding_int8_launcher(int batch_size, int batch_seq_len, } template <> -void ker_gpt_embedding_int8_launcher<__half>(int batch_size, int batch_seq_len, - int hidden_size, cudaStream_t stream, - const int8_t* token_emb, - const __half* pos_emb, - const int* token_id, __half* output, - int* real_seq_len, int padding_id, - int pos_offset, float dequant_scale) { +void ker_gpt_embedding_i8I_launcher<__half>( + int batch_size, int batch_seq_len, int hidden_size, cudaStream_t stream, + const int8_t* token_emb, const __half* pos_emb, const int* token_id, + __half* output, int* real_seq_len, int padding_id, int pos_offset, + float dequant_scale) { ker_gpt_embedding_int8<__half> <<>>( token_emb, pos_emb, token_id, output, real_seq_len, padding_id, pos_offset, dequant_scale); } -template void ker_gpt_embedding_int8_launcher( +template void ker_gpt_embedding_i8I_launcher( int batch_size, int batch_seq_len, int hidden_size, cudaStream_t stream, const int8_t* token_emb, const float* pos_emb, const int* token_id, float* output, int* real_seq_len, int padding_id, int pos_offset, float dequant_scale); -template void ker_gpt_embedding_int8_launcher<__half>( +template void ker_gpt_embedding_i8I_launcher<__half>( int batch_size, int batch_seq_len, int hidden_size, cudaStream_t stream, const int8_t* token_emb, const __half* pos_emb, const int* token_id, __half* output, int* real_seq_len, int padding_id, int pos_offset, float dequant_scale); +__global__ void ker_ppl_i8I(const int8_t* logits, const int* input_ids, + const int* real_seq_len, float* ppl, int vocab_size, + float dequant_scale, bool in_col32) { + int seq_len = real_seq_len[blockIdx.x]; // remove "eos" + if (blockIdx.y >= seq_len - 1) { + // will not contribute to ppl + return; + } + + int token_idx_in_batch = blockIdx.x * gridDim.y + blockIdx.y; + int left_logit_idx = token_idx_in_batch * vocab_size + threadIdx.x; + int 
right_logit_idx = (token_idx_in_batch + 1) * vocab_size; + /* + step 1. find max logit over the whole vocab + */ + float max_logit = CUDA_FLOAT_INF_NEG; + for (int idx = left_logit_idx; idx < right_logit_idx; idx += blockDim.x) { + int logits_idx; + if (in_col32) { + int row_id = token_idx_in_batch; + int col_id = idx - token_idx_in_batch * vocab_size; + logits_idx = row_major2flat_col32(row_id, col_id, gridDim.x * gridDim.y, + vocab_size); + } else { + logits_idx = idx; + } + max_logit = fmaxf(max_logit, (float)logits[logits_idx] * dequant_scale); + } + max_logit = blockReduceMax(max_logit); + __shared__ float s_max_logit; + if (threadIdx.x == 0) { + s_max_logit = max_logit; + } + __syncthreads(); + + /* + step 2. compute the log probability for the given token, + add it to the sequence's ppl + */ + float sum_exp_logit = 0.f; + for (int idx = left_logit_idx; idx < right_logit_idx; idx += blockDim.x) { + int logits_idx; + if (in_col32) { + int row_id = token_idx_in_batch; + int col_id = idx - token_idx_in_batch * vocab_size; + logits_idx = row_major2flat_col32(row_id, col_id, gridDim.x * gridDim.y, + vocab_size); + } else { + logits_idx = idx; + } + float lgt = fmaxf((float)logits[logits_idx] * dequant_scale - s_max_logit, + logit_thresh_min); + sum_exp_logit += expf(lgt); + } + sum_exp_logit = blockReduceSum(sum_exp_logit); + + if (threadIdx.x == 0) { + int token_id = input_ids[token_idx_in_batch + 1]; + int logits_idx; + if (in_col32) { + int row_id = token_idx_in_batch; + int col_id = token_id; + logits_idx = row_major2flat_col32(row_id, col_id, gridDim.x * gridDim.y, + vocab_size); + } else { + logits_idx = token_idx_in_batch * vocab_size + token_id; + } + float log_prob = ((float)logits[logits_idx] * dequant_scale - s_max_logit - + logf(sum_exp_logit)) / + (float)(seq_len - 1); + atomicAdd(ppl + blockIdx.x, -log_prob); + } +} + +void ker_ppl_i8I_launcher(int batch_size, int batch_seq_len, + int max_thread_per_block, cudaStream_t stream, + const int8_t* logits, const int* input_ids, + const int* real_seq_len, float* ppl, int vocab_size, + float dequant_scale, bool in_col32) { + ker_ppl_i8I<<>>(logits, input_ids, real_seq_len, ppl, vocab_size, + dequant_scale, in_col32); +} + } // namespace cuda } // namespace lightseq diff --git a/lightseq/inference/kernels/gptKernels_int8.h b/lightseq/inference/kernels/gptKernels_int8.h index a59edff7..b2b7f8c0 100644 --- a/lightseq/inference/kernels/gptKernels_int8.h +++ b/lightseq/inference/kernels/gptKernels_int8.h @@ -8,12 +8,18 @@ namespace lightseq { namespace cuda { template -void ker_gpt_embedding_int8_launcher(int batch_size, int batch_seq_len, - int hidden_size, cudaStream_t stream, - const int8_t* token_emb, const T* pos_emb, - const int* token_id, T* output, - int* real_seq_len, int padding_id, - int pos_offset, float dequant_scale); +void ker_gpt_embedding_i8I_launcher(int batch_size, int batch_seq_len, + int hidden_size, cudaStream_t stream, + const int8_t* token_emb, const T* pos_emb, + const int* token_id, T* output, + int* real_seq_len, int padding_id, + int pos_offset, float dequant_scale); + +void ker_ppl_i8I_launcher(int batch_size, int batch_seq_len, + int max_thread_per_block, cudaStream_t stream, + const int8_t* logits, const int* input_ids, + const int* real_seq_len, float* ppl, int vocab_size, + float dequant_scale, bool in_col32 = false); } // namespace cuda } // namespace lightseq diff --git a/lightseq/inference/model/quant_gpt_encoder.cc.cu b/lightseq/inference/model/quant_gpt_encoder.cc.cu index efa75aeb..51117534 100644 
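/* Notes on the int8 kernels above: the fp16 ker_gpt_embedding_int8
   specialization scales both half2 lanes of the int8 token embedding by
   dequant_scale before adding the position embedding, and ker_ppl_i8I
   computes perplexity directly from int8 logits — for each position it
   takes the log-softmax of the next token's dequantized logit
   (logit * dequant_scale) and atomically accumulates
   -log_prob / (seq_len - 1) into the per-sequence ppl, with optional
   COL32 addressing via row_major2flat_col32 when in_col32 is set. */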
--- a/lightseq/inference/model/quant_gpt_encoder.cc.cu +++ b/lightseq/inference/model/quant_gpt_encoder.cc.cu @@ -45,7 +45,9 @@ QuantGptEncoder::QuantGptEncoder( _h_real_seq_len(max_batch_size, 0), _h_ppl(max_batch_size, 0.f), _h_sample_id(max_batch_size * tw._max_step, 0), - _h_unfinished(1) {} + _h_unfinished(1) { + CHECK_GPU_ERROR(cublasLtCreate(&_cublas_lt_handle)); +} /** Init the GPU memory pointer which point to @@ -58,9 +60,6 @@ void QuantGptEncoder::init_buffer() { CHECK_GPU_ERROR( cudaMalloc(&_p_d_real_seq_len, _max_batch_size * sizeof(int))); CHECK_GPU_ERROR(cudaMalloc(&_p_d_query, _max_batch_dim * sizeof(_DataType))); - CHECK_GPU_ERROR(cudaMalloc(&_p_d_c, _max_batch_size * _tw._head_num * - _tw._max_step * _tw._max_step * - sizeof(_DataType))); CHECK_GPU_ERROR(cudaMalloc((void **)&_p_d_curandstate, _max_batch_size * sizeof(curandState))); CHECK_GPU_ERROR(cudaMalloc((void **)&_p_d_sample_id_buf, @@ -68,22 +67,27 @@ void QuantGptEncoder::init_buffer() { CHECK_GPU_ERROR(cudaMalloc((void **)&_p_d_unfinished, sizeof(int))); ker_curand_setup<<<_max_batch_size, 1, 0, _stream>>>(_p_d_curandstate); + _DataType *qkv_buf; + CHECK_GPU_ERROR(cudaMalloc(&qkv_buf, 3 * _max_batch_dim * sizeof(_DataType))); + _p_d_q = qkv_buf; + _p_d_k = qkv_buf + _max_batch_dim; + _p_d_v = qkv_buf + 2 * _max_batch_dim; + + CHECK_GPU_ERROR(cudaMalloc(&_p_d_c, _max_batch_size * _tw._head_num * + _tw._max_step * _tw._max_step * + sizeof(_DataType))); + int max_batch_dim = - _max_batch_size * _tw._beam_size * + _max_batch_size * _tw._max_step * round_up(std::max(_tw._inner_size, _tw._hidden_size * 3), 32); CHECK_GPU_ERROR( cudaMalloc(&_int8_ffn_in_buf, max_batch_dim * sizeof(int8_t))); - CHECK_GPU_ERROR(cudaMalloc( - &_int32_ffn_out_buf, - std::max(std::max(max_batch_dim, _max_batch_size * _tw._beam_size * - _tw._head_num * _tw._max_step), - round_up(_tw._src_vocab_size, 32) * _tw._beam_size * - _max_batch_size) * - sizeof(int32_t))); + CHECK_GPU_ERROR( + cudaMalloc(&_int32_ffn_out_buf, max_batch_dim * sizeof(int32_t))); CHECK_GPU_ERROR( cudaMalloc(&_int8_ffn_out_buf, std::max(max_batch_dim, round_up(_tw._src_vocab_size, 32) * - _tw._beam_size * _max_batch_size) * + _tw._max_step * _max_batch_size) * sizeof(int8_t))); // malloc embeddings @@ -110,25 +114,26 @@ void QuantGptEncoder::init_buffer() { // malloc reused kv cache max size: _tw._hidden_size * 2 * _tw._n_enc_layer * // _max_batch_size * _max_step * sizeof(T) - int8_t *self_kv_cache_buffer; - int8_t *sliding_p; - CHECK_GPU_ERROR( - cudaMalloc(&self_kv_cache_buffer, - _layer_size_self_k * _tw._n_enc_layer * 4 * sizeof(int8_t))); - - sliding_p = self_kv_cache_buffer; - for (int i = 0; i < _tw._n_enc_layer * 2; i++) { - _p_d_self_k_cache.push_back(sliding_p); - sliding_p += _layer_size_self_k; - } - for (int i = 0; i < _tw._n_enc_layer * 2; i++) { - _p_d_self_v_cache.push_back(sliding_p); - sliding_p += _layer_size_self_k; - } - _p_d_self_k_cache1 = _p_d_self_k_cache.data(); - _p_d_self_k_cache2 = _p_d_self_k_cache.data() + _tw._n_enc_layer; - _p_d_self_v_cache1 = _p_d_self_v_cache.data(); - _p_d_self_v_cache2 = _p_d_self_v_cache.data() + _tw._n_enc_layer; + // int8_t *self_kv_cache_buffer; + // int8_t *sliding_p; + // CHECK_GPU_ERROR( + // cudaMalloc(&self_kv_cache_buffer, + // _layer_size_self_k * _tw._n_enc_layer * 4 * + // sizeof(int8_t))); + + // sliding_p = self_kv_cache_buffer; + // for (int i = 0; i < _tw._n_enc_layer * 2; i++) { + // _p_d_self_k_cache.push_back(sliding_p); + // sliding_p += _layer_size_self_k; + // } + // for (int i = 0; i < 
_tw._n_enc_layer * 2; i++) { + // _p_d_self_v_cache.push_back(sliding_p); + // sliding_p += _layer_size_self_k; + // } + // _p_d_self_k_cache1 = _p_d_self_k_cache.data(); + // _p_d_self_k_cache2 = _p_d_self_k_cache.data() + _tw._n_enc_layer; + // _p_d_self_v_cache1 = _p_d_self_v_cache.data(); + // _p_d_self_v_cache2 = _p_d_self_v_cache.data() + _tw._n_enc_layer; // malloc weights _int8_p_d_enc_wei = std::vector(_tw._n_enc_layer * 4); @@ -195,26 +200,7 @@ void QuantGptEncoder::init_buffer() { _quant_range / _enc_clip_max[_layer_id * 12 + 3], _stream, _cublas_lt_handle, kColMajor); - if (_tw._use_gelu) { - _scaled_ffn2_colsum[_layer_id] = nullptr; - } else { - CHECK_GPU_ERROR(cudaMalloc(&_scaled_ffn2_colsum[_layer_id], - _tw._hidden_size * sizeof(_DataType))); - float relu_scale = _enc_clip_max[_layer_id * 12 + 7] / 2; - - _DataType *temp; - int weight_size = _tw._inner_size * _tw._hidden_size; - - CHECK_GPU_ERROR(cudaMalloc(&temp, weight_size * sizeof(_DataType))); - CHECK_GPU_ERROR(cudaMemcpyAsync(temp, _p_d_enc_wei[_weight_offset + 10], - weight_size * sizeof(_DataType), - cudaMemcpyHostToDevice, _stream)); - launch_scaled_colsum(temp, _scaled_ffn2_colsum[_layer_id], - _tw._inner_size, _tw._hidden_size, relu_scale, - _stream); - CHECK_GPU_ERROR(cudaGetLastError()); - CHECK_GPU_ERROR(cudaFree(temp)); - } + _scaled_ffn2_colsum[_layer_id] = nullptr; } CHECK_GPU_ERROR(cudaStreamSynchronize(_stream)); @@ -285,15 +271,13 @@ void QuantGptEncoder::run_one_infer(int batch_size, #endif // token embedding, add position embedding and layer_norm - ker_gpt_embedding_int8_launcher<_DataType>( - batch_size, batch_seq_len, _tw._hidden_size, _stream, _int8_p_d_src_emb_bottom_wei, - _p_d_src_emb_wei[1], _p_d_token_id, _p_d_query, _p_d_real_seq_len, - _tw._padding_id, 0); + ker_gpt_embedding_i8I_launcher<_DataType>( + batch_size, batch_seq_len, _tw._hidden_size, _stream, + _int8_p_d_src_emb_bottom_wei, _p_device_emb[1], _p_d_token_id, _p_d_query, + _p_d_real_seq_len, _tw._padding_id, 0, _src_emb_clip_max / _quant_range); #ifdef DEBUG_RESULT - print_vec(_p_d_query, "input embeddings", - _batch_token_num * _tw._hidden_size - 5, - _batch_token_num * _tw._hidden_size); + print_vec(_p_d_query, "input embeddings", 10); #endif for (_layer_id = 0; _layer_id < _tw._n_enc_layer; _layer_id++) { @@ -302,13 +286,7 @@ void QuantGptEncoder::run_one_infer(int batch_size, ffn_add_norm(); } - // last layer norm - ker_norm_layer_launcher<_DataType>( - _batch_token_num, _tw._hidden_size, _stream, _p_d_query, - _p_d_src_emb_wei[2], _p_d_src_emb_wei[3], _max_thread_per_block); - compute_ppl(); - return; } @@ -345,9 +323,9 @@ int QuantGptEncoder::run_one_sample(int batch_size, // token embedding, add position embedding and layer_norm ker_gpt_embedding_launcher<_DataType>( - _batch_size, _batch_seq_len, _tw._hidden_size, _stream, - _p_d_src_emb_wei[0], _p_d_src_emb_wei[1], _p_d_sample_id, _p_d_query, - _p_d_real_seq_len, _tw._padding_id, 0); + _batch_size, _batch_seq_len, _tw._hidden_size, _stream, _p_device_emb[0], + _p_device_emb[1], _p_d_sample_id, _p_d_query, _p_d_real_seq_len, + _tw._padding_id, 0); #ifdef DEBUG_RESULT print_vec(_p_d_query, "embedding", _batch_token_num * _tw._hidden_size - 10, @@ -361,9 +339,9 @@ int QuantGptEncoder::run_one_sample(int batch_size, } // last layer norm - ker_norm_layer_launcher<_DataType>( - _batch_token_num, _tw._hidden_size, _stream, _p_d_query, - _p_d_src_emb_wei[2], _p_d_src_emb_wei[3], _max_thread_per_block); + ker_norm_layer_launcher<_DataType>(_batch_token_num, _tw._hidden_size, + 
_stream, _p_d_query, _p_device_emb[2], + _p_device_emb[3], _max_thread_per_block); if (sample_one_token() == 0 || _batch_seq_len >= _tw._max_step) { CHECK_GPU_ERROR(cudaMemcpyAsync(_p_d_sample_id_buf, _p_d_sample_id, _batch_token_num * sizeof(int), @@ -381,8 +359,8 @@ int QuantGptEncoder::run_one_sample(int batch_size, // token embedding, add position embedding and layer_norm ker_gpt_embedding_launcher<_DataType>( - _batch_size, 1, _tw._hidden_size, _stream, _p_d_src_emb_wei[0], - _p_d_src_emb_wei[1], _p_d_last_sample_id, _p_d_query, _p_d_real_seq_len, + _batch_size, 1, _tw._hidden_size, _stream, _p_device_emb[0], + _p_device_emb[1], _p_d_last_sample_id, _p_d_query, _p_d_real_seq_len, _tw._padding_id, _batch_seq_len - 1); #ifdef DEBUG_RESULT print_vec(_p_d_query, "embedding", _batch_size * _tw._hidden_size - 10, @@ -395,9 +373,9 @@ int QuantGptEncoder::run_one_sample(int batch_size, } // last layer norm - ker_norm_layer_launcher<_DataType>( - _batch_size, _tw._hidden_size, _stream, _p_d_query, _p_d_src_emb_wei[2], - _p_d_src_emb_wei[3], _max_thread_per_block); + ker_norm_layer_launcher<_DataType>(_batch_size, _tw._hidden_size, _stream, + _p_d_query, _p_device_emb[2], + _p_device_emb[3], _max_thread_per_block); #ifdef DEBUG_RESULT print_vec(_p_d_query, "_p_d_query before logits", @@ -424,7 +402,7 @@ int QuantGptEncoder::sample_one_token() { /* ---step 1. project hidden states to vocab logits--- */ CHECK_GPU_ERROR(cublasGemmEx( _hd, CUBLAS_OP_T, CUBLAS_OP_N, _tw._src_vocab_size, _batch_token_num, - _tw._hidden_size, &_fone, _p_d_src_emb_wei[0], _AType, _tw._hidden_size, + _tw._hidden_size, &_fone, _p_device_emb[0], _AType, _tw._hidden_size, _p_d_query, _BType, _tw._hidden_size, &_fzero, _p_d_logit, _CType, _tw._src_vocab_size, _computeType, CUBLAS_GEMM_DEFAULT_TENSOR_OP)); #ifdef DEBUG_RESULT @@ -469,7 +447,7 @@ int QuantGptEncoder::sample_one_token_with_cache() { /* ---step 1. project hidden states to vocab logits--- */ CHECK_GPU_ERROR(cublasGemmEx( _hd, CUBLAS_OP_T, CUBLAS_OP_N, _tw._src_vocab_size, _batch_size, - _tw._hidden_size, &_fone, _p_d_src_emb_wei[0], _AType, _tw._hidden_size, + _tw._hidden_size, &_fone, _p_device_emb[0], _AType, _tw._hidden_size, _p_d_query, _BType, _tw._hidden_size, &_fzero, _p_d_logit, _CType, _tw._src_vocab_size, _computeType, CUBLAS_GEMM_DEFAULT_TENSOR_OP)); @@ -515,53 +493,33 @@ int QuantGptEncoder::sample_one_token_with_cache() { template void QuantGptEncoder::self_attention(bool cache) { /* ---step 0. layer_norm, add output_bias to "query"--- */ - ker_norm_layer_resual_launcher<_DataType>( - _batch_token_num, _tw._hidden_size, _stream, _p_d_query, _p_d_q, - _p_d_enc_wei[_weight_offset], _p_d_enc_wei[_weight_offset + 1], - _p_d_enc_wei[_weight_offset + 5], _max_thread_per_block); - -#ifdef DEBUG_RESULT if (_layer_id == 0) { - print_vec(_p_d_query, "input with bias", - _batch_token_num * _tw._hidden_size - 5, - _batch_token_num * _tw._hidden_size); - print_vec(_p_d_q, "first ln output", - _batch_token_num * _tw._hidden_size - 5, - _batch_token_num * _tw._hidden_size); + ker_norm_layer_resual_i8O_launcher<_DataType>( + _batch_token_num, _tw._hidden_size, _stream, _p_d_query, + _int8_ffn_in_buf, _p_device_wei[_weight_offset], + _p_device_wei[_weight_offset + 1], _p_device_wei[_weight_offset + 5], + _max_thread_per_block, _quant_range / _enc_clip_max[_layer_id * 12 + 4], + false, true); } -#endif + CHECK_GPU_ERROR(cudaGetLastError()); - /* ---step 1. 
qkv = ori_q * qkv_wei + bias, and reshape qkv for multi-head - * gemm--- */ - CHECK_GPU_ERROR(cublasGemmEx( - _hd, CUBLAS_OP_N, CUBLAS_OP_N, _tw._hidden_size * 3, _batch_token_num, - _tw._hidden_size, &_fone, _p_d_enc_wei[_weight_offset + 2], _AType, - _tw._hidden_size * 3, _p_d_q, _BType, _tw._hidden_size, &_fzero, - _p_d_qkv_projected, _CType, _tw._hidden_size * 3, _computeType, - CUBLAS_GEMM_DEFAULT_TENSOR_OP)); + cublasLtMM_withAlgo_i8IO( + _int8_ffn_out_buf, 1, _batch_token_num, _tw._hidden_size * 3, + _tw._hidden_size, 0, 0, 0, + _enc_clip_max[_layer_id * 12] * _enc_clip_max[_layer_id * 12 + 4] / + (_enc_clip_max[_layer_id * 12 + 8] * _quant_range), + _int8_ffn_in_buf, _int8_p_d_enc_wei[_layer_id * 4], _cublas_lt_handle, + _stream, false); -#ifdef DEBUG_RESULT - if (_layer_id == 0) { - std::cout << "hidden_size: " << _tw._hidden_size << std::endl; - std::cout << "_batch_token_num: " << _batch_token_num << std::endl; - std::cout << "_dim_per_head: " << _tw._dim_per_head << std::endl; - std::cout << "_head_num: " << _tw._head_num << std::endl; - - print_vec(_p_d_enc_wei[_weight_offset + 2], "qkv_weight_mat", - _tw._hidden_size * _tw._hidden_size * 3 - 5, - _tw._hidden_size * _tw._hidden_size * 3); - print_vec(_p_d_qkv_projected, "_p_d_qkv_projected", - _batch_token_num * _tw._hidden_size * 3 - 5, - _batch_token_num * _tw._hidden_size * 3); - } -#endif // get q, k, v by split and reshape qkv - ker_arrange_encself_qkv_launcher<_DataType>( - _batch_token_num, _tw._hidden_size, _stream, _p_d_qkv_projected, - _p_d_enc_wei[_weight_offset + 3], _p_d_q, _max_batch_dim, _batch_seq_len, - _tw._dim_per_head, _tw._head_num, _max_thread_per_block); + ker_arrange_encself_qkv_i8I_launcher<_DataType>( + _batch_token_num, _tw._hidden_size, _stream, _int8_ffn_out_buf, + _p_device_wei[_weight_offset + 3], _p_d_q, _max_batch_dim, _batch_seq_len, + _tw._dim_per_head, _tw._head_num, _max_thread_per_block, + _enc_clip_max[_layer_id * 12 + 8] / _quant_range, true); if (cache) { + throw std::runtime_error("QuantGpt sample() not implemented"); cudaStream_t stream; if (_batch_token_num > 360) { stream = _cache_stream; @@ -579,17 +537,6 @@ void QuantGptEncoder::self_attention(bool cache) { cudaMemcpyDeviceToDevice, stream)); } -#ifdef DEBUG_RESULT - if (_layer_id == 0) { - print_vec(_p_d_q, "_p_d_q", _batch_token_num * _tw._hidden_size - 5, - _batch_token_num * _tw._hidden_size); - print_vec(_p_d_k, "_p_d_k", _batch_token_num * _tw._hidden_size - 5, - _batch_token_num * _tw._hidden_size); - print_vec(_p_d_v, "_p_d_v", _batch_token_num * _tw._hidden_size - 5, - _batch_token_num * _tw._hidden_size); - } -#endif - /* ---step 2. correlation = q * k, perform softmax on correlation--- */ CHECK_GPU_ERROR(cublasGemmStridedBatchedEx( _hd, CUBLAS_OP_T, CUBLAS_OP_N, _batch_seq_len, _batch_seq_len, @@ -600,26 +547,10 @@ void QuantGptEncoder::self_attention(bool cache) { _batch_size * _tw._head_num, _computeType, CUBLAS_GEMM_DEFAULT_TENSOR_OP)); -#ifdef DEBUG_RESULT - if (_layer_id == 0) { - print_vec(_p_d_c, "q*k", - _batch_token_num * _batch_seq_len * _tw._head_num - 5, - _batch_token_num * _batch_seq_len * _tw._head_num); - } -#endif - ker_correlation_softmax_gpt_launcher<_DataType>(_batch_size, _batch_seq_len, _tw._head_num, _stream, _p_d_c, _p_d_real_seq_len); -#ifdef DEBUG_RESULT - if (_layer_id == 0) { - print_vec(_p_d_c, "mask weights", - _batch_token_num * _batch_seq_len * _tw._head_num - 5, - _batch_token_num * _batch_seq_len * _tw._head_num); - } -#endif - /* ---step 3. 
new_q = correlation * v--- */ CHECK_GPU_ERROR(cublasGemmStridedBatchedEx( _hd, CUBLAS_OP_N, CUBLAS_OP_N, _tw._dim_per_head, _batch_seq_len, @@ -630,40 +561,31 @@ void QuantGptEncoder::self_attention(bool cache) { _batch_size * _tw._head_num, _computeType, CUBLAS_GEMM_DEFAULT_TENSOR_OP)); -#ifdef DEBUG_RESULT - if (_layer_id == 0) { - print_vec(_p_d_q, "value after attention", - _batch_token_num * _tw._hidden_size - 5, - _batch_token_num * _tw._hidden_size); - } -#endif - // use v to save reshaped q, since they are in same size and v // will not be use again before the next multi-head-attention - ker_arrange_atten_output_launcher<_DataType>( - _batch_token_num, _tw._hidden_size, _stream, _p_d_q, _p_d_v, - _batch_seq_len, _tw._dim_per_head, _tw._head_num, _max_thread_per_block); - -#ifdef DEBUG_RESULT - if (_layer_id == 0) { - print_vec(_p_d_v, "reshaped value after attention", 0, 5); - print_vec(_p_d_query, "attention input with output bias", 0, 5); - } -#endif + ker_arrange_atten_output_i8O_launcher<_DataType>( + _batch_token_num, _tw._hidden_size, _stream, _p_d_q, _int8_ffn_in_buf, + _batch_seq_len, _tw._dim_per_head, _tw._head_num, _max_thread_per_block, + _quant_range / _enc_clip_max[_layer_id * 12 + 5], true); /* ---step 4. new_q = ori_q + new_q * output_wei--- */ - CHECK_GPU_ERROR(cublasGemmEx( - _hd, CUBLAS_OP_N, CUBLAS_OP_N, _tw._hidden_size, _batch_token_num, - _tw._hidden_size, &_fone, _p_d_enc_wei[_weight_offset + 4], _AType, - _tw._hidden_size, _p_d_v, _BType, _tw._hidden_size, &_fone, _p_d_query, - _CType, _tw._hidden_size, _computeType, CUBLAS_GEMM_DEFAULT_TENSOR_OP)); -#ifdef DEBUG_RESULT - if (_layer_id == 0) { - print_vec(_p_d_enc_wei[_weight_offset + 4], "attn out kernel", 0, 5); - print_vec(_p_d_query, "attention output", 0, 5); - } -#endif + cublasLtMM_withAlgo_i8IO( + _int8_ffn_out_buf, 1, _batch_token_num, _tw._hidden_size, + _tw._hidden_size, 0, 0, 0, + _enc_clip_max[_layer_id * 12 + 1] * _enc_clip_max[_layer_id * 12 + 5] / + (_enc_clip_max[_layer_id * 12 + 9] * _quant_range), + _int8_ffn_in_buf, _int8_p_d_enc_wei[_layer_id * 4 + 1], _cublas_lt_handle, + _stream, false); + + ker_residual_bias_ln_i8I_i8O_launcher<_DataType>( + _int8_ffn_out_buf, _p_device_wei[_weight_offset + 6], + _p_device_wei[_weight_offset + 7], _p_device_wei[_weight_offset + 11], + _int8_ffn_in_buf, _p_d_query, _batch_token_num, _tw._hidden_size, + _enc_clip_max[_layer_id * 12 + 9] / _quant_range, + _quant_range / _enc_clip_max[_layer_id * 12 + 6], _max_thread_per_block, + _stream, false, true); + return; } @@ -823,30 +745,52 @@ void QuantGptEncoder::self_attention_with_cache() { template void QuantGptEncoder::ffn_add_norm() { - /* ---step 0. layer_norm, add output_bias to "query"--- */ - ker_norm_layer_resual_launcher<_DataType>( - _batch_token_num, _tw._hidden_size, _stream, _p_d_query, _p_d_ffn_buf1, - _p_d_enc_wei[_weight_offset + 6], _p_d_enc_wei[_weight_offset + 7], - _p_d_enc_wei[_weight_offset + 11], _max_thread_per_block); - /* ---step 1. 
first ffn layer--- */ - CHECK_GPU_ERROR(cublasGemmEx( - _hd, CUBLAS_OP_N, CUBLAS_OP_N, _tw._inner_size, _batch_token_num, - _tw._hidden_size, &_fone, _p_d_enc_wei[_weight_offset + 8], _AType, - _tw._inner_size, _p_d_ffn_buf1, _BType, _tw._hidden_size, &_fzero, - _p_d_ffn_buf2, _CType, _tw._inner_size, _computeType, - CUBLAS_GEMM_DEFAULT_TENSOR_OP)); - ker_bias_gelu_launcher<_DataType>( - _batch_token_num, _max_thread_per_block, _stream, _p_d_ffn_buf2, - _p_d_enc_wei[_weight_offset + 9], _tw._inner_size); + cublasLtMM_withAlgo_i8IO( + _int8_ffn_out_buf, 1, _batch_token_num, _tw._inner_size, _tw._hidden_size, + 0, 0, 0, + _enc_clip_max[_layer_id * 12 + 2] * _enc_clip_max[_layer_id * 12 + 6] / + (_enc_clip_max[_layer_id * 12 + 10] * _quant_range), + _int8_ffn_in_buf, _int8_p_d_enc_wei[_layer_id * 4 + 2], _cublas_lt_handle, + _stream, false); + + ker_bias_gelu_i8I_i8O_launcher<_DataType>( + _batch_token_num, _stream, _int8_ffn_out_buf, _int8_ffn_in_buf, + _p_device_wei[_weight_offset + 9], _tw._inner_size, + _enc_clip_max[_layer_id * 12 + 10] / _quant_range, + _quant_range / _enc_clip_max[_layer_id * 12 + 7], true); /* ---step 2. second ffn layer--- */ - CHECK_GPU_ERROR(cublasGemmEx( - _hd, CUBLAS_OP_N, CUBLAS_OP_N, _tw._hidden_size, _batch_token_num, - _tw._inner_size, &_fone, _p_d_enc_wei[_weight_offset + 10], _AType, - _tw._hidden_size, _p_d_ffn_buf2, _BType, _tw._inner_size, &_fone, - _p_d_query, _CType, _tw._hidden_size, _computeType, - CUBLAS_GEMM_DEFAULT_TENSOR_OP)); + + cublasLtMM_withAlgo(_int32_ffn_out_buf, 1, _batch_token_num, _tw._hidden_size, + _tw._inner_size, 0, 0, 0, _int8_ffn_in_buf, + _int8_p_d_enc_wei[_layer_id * 4 + 3], _cublas_lt_handle, + _stream, false); + + const _DataType *scale_ptr, *bias_ptr, *res_bias_ptr; + float clip_max, dequant_scale; + dequant_scale = _enc_clip_max[_layer_id * 12 + 3] * + _enc_clip_max[_layer_id * 12 + 7] / + (_quant_range * _quant_range); + if (_layer_id == _tw._n_enc_layer - 1) { + scale_ptr = _p_device_emb[2]; + bias_ptr = _p_device_emb[3]; + res_bias_ptr = nullptr; + clip_max = _output_ln_clip_max; + } else { + scale_ptr = _p_device_wei[(_layer_id + 1) * _tw._weight_per_enc_layer]; + bias_ptr = _p_device_wei[(_layer_id + 1) * _tw._weight_per_enc_layer + 1]; + res_bias_ptr = + _p_device_wei[(_layer_id + 1) * _tw._weight_per_enc_layer + 5]; + clip_max = _enc_clip_max[(_layer_id + 1) * 12 + 4]; + } + + ker_residual_bias_ln_i32I_i8O_launcher<_DataType>( + _int32_ffn_out_buf, scale_ptr, bias_ptr, res_bias_ptr, _int8_ffn_in_buf, + _p_d_query, _batch_token_num, _tw._hidden_size, dequant_scale, + _quant_range / clip_max, _max_thread_per_block, _stream, false, true, + true, _scaled_ffn2_colsum[_layer_id]); + return; } @@ -885,21 +829,18 @@ Compute ppl from encoder output template void QuantGptEncoder::compute_ppl() { /* ---step 1. 
project hidden states to vocab logits--- */ - CHECK_GPU_ERROR(cublasGemmEx( - _hd, CUBLAS_OP_T, CUBLAS_OP_N, _tw._src_vocab_size, _batch_token_num, - _tw._hidden_size, &_fone, _p_d_src_emb_wei[0], _AType, _tw._hidden_size, - _p_d_query, _BType, _tw._hidden_size, &_fzero, _p_d_logit, _CType, - _tw._src_vocab_size, _computeType, CUBLAS_GEMM_DEFAULT_TENSOR_OP)); - -#ifdef DEBUG_RESULT - print_vec(_p_d_logit, "logits", _batch_token_num * _tw._src_vocab_size - 5, - _batch_token_num * _tw._src_vocab_size); -#endif + cublasLtMM_withAlgo_i8IO(_int8_ffn_out_buf, 1, _batch_token_num, + _tw._src_vocab_size, _tw._hidden_size, 0, 0, 0, + _output_ln_clip_max * _src_emb_clip_max / + (_logits_clip_max * _quant_range), + _int8_ffn_in_buf, _int8_p_d_src_emb_wei, + _cublas_lt_handle, _stream, false); /* ---step 2. compute language model ppl--- */ - ker_ppl_launcher<_DataType>( - _batch_size, _batch_seq_len, _max_thread_per_block, _stream, _p_d_logit, - _p_d_token_id, _p_d_real_seq_len, _p_d_ppl, _tw._src_vocab_size); + ker_ppl_i8I_launcher(_batch_size, _batch_seq_len, _max_thread_per_block, + _stream, _int8_ffn_out_buf, _p_d_token_id, + _p_d_real_seq_len, _p_d_ppl, _tw._src_vocab_size, + _logits_clip_max / _quant_range, true); } template class QuantGptEncoder; diff --git a/lightseq/inference/model/quant_gpt_encoder.h b/lightseq/inference/model/quant_gpt_encoder.h index 7adcbd7c..1d2ad883 100644 --- a/lightseq/inference/model/quant_gpt_encoder.h +++ b/lightseq/inference/model/quant_gpt_encoder.h @@ -44,7 +44,7 @@ class QuantGptEncoder { cudaStream_t _stream; cudaStream_t _cache_stream; cublasHandle_t _hd; - // cublasLtHandle_t _cublas_lt_handle; + cublasLtHandle_t _cublas_lt_handle; const _DataType _fone; const _DataType _fzero; const int32_t _ione; @@ -117,8 +117,7 @@ class QuantGptEncoder { int *p_d_sample_id, const QuantGptWeight &tw, cudaStream_t stream, cudaStream_t cache_stream, cublasHandle_t hd); - size_t compute_buffer_bytesize(); - void init_buffer(void *pbuf); + void init_buffer(); std::string check(); void run_one_infer(int batch_size, int batch_seq_len); int run_one_sample(int batch_size, int batch_seq_len); diff --git a/lightseq/inference/pywrapper/quant_gpt.cc b/lightseq/inference/pywrapper/quant_gpt.cc index 6c836d9e..dba7ccc2 100644 --- a/lightseq/inference/pywrapper/quant_gpt.cc +++ b/lightseq/inference/pywrapper/quant_gpt.cc @@ -45,10 +45,6 @@ QuantGpt::QuantGpt(const std::string weight_path, const int max_batch_size) throw std::runtime_error(res); } - size_t buf_bytesize = encoder_->compute_buffer_bytesize(); - std::cout << "Allocated " << buf_bytesize / (1024 * 1024) - << "MB GPU buffer for GPT2" << std::endl; - encoder_->init_buffer(); CHECK_GPU_ERROR(cudaStreamSynchronize(stream_)); } diff --git a/lightseq/inference/pywrapper/wrapper.cc b/lightseq/inference/pywrapper/wrapper.cc index ab9b4f36..ca5cbb2f 100644 --- a/lightseq/inference/pywrapper/wrapper.cc +++ b/lightseq/inference/pywrapper/wrapper.cc @@ -413,7 +413,7 @@ class PyGpt { std::vector output_shape = model_->get_output_shape(0); auto output = py::array_t(output_shape); - float *output_data = output.mutable_data(0, 0); + float *output_data = output.mutable_data(); const float *d_output = static_cast(model_->get_output_ptr(0)); lightseq::cuda::CHECK_GPU_ERROR(cudaMemcpy(output_data, d_output, @@ -518,7 +518,7 @@ class PyQuantGpt { std::vector output_shape = model_->get_output_shape(0); auto output = py::array_t(output_shape); - float *output_data = output.mutable_data(0, 0); + float *output_data = output.mutable_data(); const 
float *d_output = static_cast(model_->get_output_ptr(0)); lightseq::cuda::CHECK_GPU_ERROR(cudaMemcpy(output_data, d_output, From ca9739b4214f9b03975e608e22d5f19a1e837af6 Mon Sep 17 00:00:00 2001 From: "weiyang.god" Date: Tue, 26 Apr 2022 02:02:07 +0800 Subject: [PATCH 38/49] support quant gpt inference (TODO: fix qkv bias out clip_max, sampling) --- .../export/huggingface/hf_bart_export.py | 7 +- .../export/huggingface/hf_gpt2_export.py | 7 +- .../ls_torch_hf_quant_gpt2_export.py | 5 +- examples/inference/python/export/util.py | 8 + examples/inference/python/test/ls_gpt2.py | 105 +++++++-- .../inference/python/test/ls_quant_bert.py | 2 +- .../inference/python/test/ls_quant_gpt2.py | 209 ++++++++++++++---- .../huggingface/gpt/ls_hf_gpt_layer.py | 2 +- 8 files changed, 279 insertions(+), 66 deletions(-) diff --git a/examples/inference/python/export/huggingface/hf_bart_export.py b/examples/inference/python/export/huggingface/hf_bart_export.py index 82a0effd..5da8102f 100644 --- a/examples/inference/python/export/huggingface/hf_bart_export.py +++ b/examples/inference/python/export/huggingface/hf_bart_export.py @@ -11,6 +11,7 @@ from lightseq.training.ops.pytorch.export import gather_token_embedding, fill_pb_layer from export.proto.transformer_pb2 import Transformer from transformers import BartForConditionalGeneration +from export.util import parse_args os.environ["CUDA_VISIBLE_DEVICES"] = "-1" @@ -512,14 +513,14 @@ def _print_pair(key, value): if __name__ == "__main__": + args = parse_args() + assert args.generation_method in ["beam_search", "topk", "topp", "topk_greedy"] # if save_proto is True, extension .pb will be added, otherwise .hdf5 is added output_lightseq_model_name = "lightseq_bart_base" # you can rename it to "lightseq_bart_large" for large model input_huggingface_bart_model = ( "facebook/bart-base" # Example: you can try "facebook/bart-large" as well ) head_number = 12 # change this to 16 for "bart-large" model - # in order to get score, we should use `beam_search` inference method - generation_method = "beam_search" beam_size = 4 max_step = 50 # max step for generation, it decides GPU memory occupancy # maximum_generation_length = min(src_length + extra_decode_length, max_step) @@ -529,7 +530,7 @@ def _print_pair(key, value): output_lightseq_model_name, input_huggingface_bart_model, head_num=head_number, # layer number - generation_method=generation_method, + generation_method=args.generation_method, beam_size=beam_size, max_step=max_step, extra_decode_length=extra_decode_length, diff --git a/examples/inference/python/export/huggingface/hf_gpt2_export.py b/examples/inference/python/export/huggingface/hf_gpt2_export.py index 89a12b42..de6ff483 100644 --- a/examples/inference/python/export/huggingface/hf_gpt2_export.py +++ b/examples/inference/python/export/huggingface/hf_gpt2_export.py @@ -7,6 +7,7 @@ from collections import OrderedDict from transformers import GPT2LMHeadModel from lightseq.training.ops.pytorch.export import fill_hdf5_layer +from export.util import parse_args os.environ["CUDA_VISIBLE_DEVICES"] = "-1" @@ -146,11 +147,11 @@ def _print_pair(key, value): if __name__ == "__main__": + args = parse_args() + assert args.generation_method in ["topk", "topp", "ppl"] output_lightseq_model_name = "lightseq_gpt2_base" # or "lightseq_gpt2_large" input_huggingface_gpt_model = "gpt2" # or "gpt2-large" head_number = 12 # 20 for "gpt2-large" - # generation_method should be "topk" or "topp" - generation_method = "topk" topk = 1 topp = 0.75 # default eos_id from 
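As a note on the generation_method values wired through parse_args here: topk keeps only the k highest-scoring tokens before sampling, while topp (nucleus sampling) keeps the smallest prefix of the probability-sorted vocabulary whose cumulative mass reaches p. A rough NumPy sketch of just the filtering step (sampling from the surviving tokens is omitted, and the helper name is made up for illustration):

import numpy as np

def filter_logits(logits, k=0, p=0.0):
    # logits: 1-D array of next-token scores over the vocabulary
    logits = logits.astype(np.float64).copy()
    if k > 0:
        kth_best = np.sort(logits)[-k]
        logits[logits < kth_best] = -np.inf      # top-k: drop everything below the k-th best
    if p > 0.0:
        order = np.argsort(-logits)
        probs = np.exp(logits[order] - logits[order].max())
        probs /= probs.sum()
        drop = np.cumsum(probs) > p              # tokens outside the nucleus
        drop[1:] = drop[:-1].copy()              # keep the token that crosses p
        drop[0] = False                          # always keep the single best token
        logits[order[drop]] = -np.inf
    return logits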
https://huggingface.co/transformers/model_doc/gpt2.html#gpt2lmheadmodel @@ -161,7 +162,7 @@ def _print_pair(key, value): output_lightseq_model_name, input_huggingface_gpt_model, head_num=head_number, # layer number - generation_method=generation_method, + generation_method=args.generation_method, topk=topk, topp=topp, eos_id=eos_id, diff --git a/examples/inference/python/export/huggingface/ls_torch_hf_quant_gpt2_export.py b/examples/inference/python/export/huggingface/ls_torch_hf_quant_gpt2_export.py index f1925318..78bce095 100644 --- a/examples/inference/python/export/huggingface/ls_torch_hf_quant_gpt2_export.py +++ b/examples/inference/python/export/huggingface/ls_torch_hf_quant_gpt2_export.py @@ -198,12 +198,11 @@ def _print_pair(key, value): if __name__ == "__main__": args = parse_args() + assert args.generation_method in ["topk", "topp", "ppl"] model_name = ".".join(args.model.split(".")[:-1]) hdf5_path = f"{model_name}.hdf5" head_number = 12 # 20 for "gpt2-large" - # generation_method should be "topk" or "topp" - generation_method = "topk" topk = 1 topp = 0.75 # default eos_id from https://huggingface.co/transformers/model_doc/gpt2.html#gpt2lmheadmodel @@ -214,7 +213,7 @@ def _print_pair(key, value): hdf5_path, args.model, head_num=head_number, # layer number - generation_method=generation_method, + generation_method=args.generation_method, topk=topk, topp=topp, eos_id=eos_id, diff --git a/examples/inference/python/export/util.py b/examples/inference/python/export/util.py index 270508b3..7ec3ac24 100644 --- a/examples/inference/python/export/util.py +++ b/examples/inference/python/export/util.py @@ -22,6 +22,14 @@ def parse_args(): action="store_true", help="whether to store hdf5", ) + parser.add_argument( + "--generation_method", + "-g", + type=str, + default="beam_search", + choices=["beam_search", "topk_greedy", "topk", "topp", "ppl"], + help="generation method", + ) args = parser.parse_args() return args diff --git a/examples/inference/python/test/ls_gpt2.py b/examples/inference/python/test/ls_gpt2.py index f316d06d..20167aef 100644 --- a/examples/inference/python/test/ls_gpt2.py +++ b/examples/inference/python/test/ls_gpt2.py @@ -2,30 +2,62 @@ import argparse import torch -import numpy as np import lightseq.inference as lsi from transformers import GPT2Tokenizer, GPT2LMHeadModel -def ls_gpt2(model, inputs): +def ls_gpt2(model, inputs, generation_method="topk"): torch.cuda.synchronize() start_time = time.perf_counter() - generated_ids = model.sample(inputs) + results = None + if generation_method == "topk" or generation_method == "topp": + results = model.sample(inputs) + elif generation_method == "ppl": + results = model.ppl(inputs)[0] torch.cuda.synchronize() end_time = time.perf_counter() - return generated_ids, end_time - start_time + return results, end_time - start_time -def hf_gpt2(model, inputs, tokenizer): +def compute_hf_ppl(model, inputs): + max_length = 512 + stride = 512 + end_loc = 0 + + nlls = [] + for i in range(0, inputs.size(1), stride): + begin_loc = max(i + stride - max_length, 0) + end_loc = min(i + stride, inputs.size(1)) + trg_len = end_loc - i + input_ids = inputs[:, begin_loc:end_loc].to("cuda:0") + target_ids = input_ids.clone() + target_ids[:, :-trg_len] = -100 + + with torch.no_grad(): + outputs = model(input_ids, labels=target_ids) + neg_log_likelihood = outputs[0] * trg_len + + nlls.append(neg_log_likelihood) + + ppl = torch.stack(nlls).sum() / end_loc + return ppl.cpu().numpy() + + +def hf_gpt2(model, inputs, tokenizer, generation_method="topk"): 
inputs = inputs.to("cuda:0") torch.cuda.synchronize() start_time = time.perf_counter() - generated_ids = model.generate( - inputs, max_length=50, pad_token_id=tokenizer.eos_token_id - ) + results = None + if generation_method == "topk" or generation_method == "topp": + results = model.generate( + inputs, max_length=50, pad_token_id=tokenizer.eos_token_id + ) + elif generation_method == "ppl": + results = compute_hf_ppl(model, inputs) + torch.cuda.synchronize() end_time = time.perf_counter() - return generated_ids, end_time - start_time + return results, end_time - start_time def ls_generate(model, tokenizer, inputs): @@ -50,17 +82,49 @@ def hf_generate(model, tokenizer, inputs): print(sent) -def warmup(ls_tokenizer, hf_tokenizer, ls_model, hf_model, sentences): +def ls_ppl(model, tokenizer, inputs): + print("=========lightseq=========") + print("lightseq calculating ppl...") + ls_ppl, ls_time = ls_gpt2(model, inputs, "ppl") + print(f"lightseq time: {ls_time}s") + print("lightseq results:") + print(ls_ppl) + + +def hf_ppl(model, tokenizer, inputs): + print("=========huggingface=========") + print("huggingface calculating ppl...") + hf_ppl, hf_time = hf_gpt2(model, inputs, tokenizer, "ppl") + print(f"huggingface time: {hf_time}s") + print("huggingface results:") + print(hf_ppl) + + +def warmup( + ls_tokenizer, hf_tokenizer, ls_model, hf_model, sentences, generation_method +): ls_inputs = ls_tokenizer(sentences, return_tensors="pt", padding=True)["input_ids"] hf_inputs = hf_tokenizer(sentences, return_tensors="pt", padding=True)["input_ids"] - ls_generate(ls_model, ls_tokenizer, ls_inputs) - hf_generate(hf_model, hf_tokenizer, hf_inputs) + if generation_method == "topk" or generation_method == "topp": + ls_generate(ls_model, ls_tokenizer, ls_inputs) + hf_generate(hf_model, hf_tokenizer, hf_inputs) + elif generation_method == "ppl": + ls_ppl(ls_model, ls_tokenizer, ls_inputs) + hf_ppl(hf_model, hf_tokenizer, hf_inputs) def main(): parser = argparse.ArgumentParser() parser.add_argument("--user_input", action="store_true") + parser.add_argument( + "--generation_method", + "-g", + type=str, + default="topk", + choices=["topk", "topp", "ppl"], + help="generation method", + ) args = parser.parse_args() print("initializing gpt tokenizer...") @@ -93,7 +157,14 @@ def main(): ] print("====================START warmup====================") - warmup(ls_tokenizer, hf_tokenizer, ls_model, hf_model, sentences) + warmup( + ls_tokenizer, + hf_tokenizer, + ls_model, + hf_model, + sentences, + args.generation_method, + ) print("====================END warmup====================") while True: @@ -109,8 +180,12 @@ def main(): "input_ids" ] - ls_generate(ls_model, ls_tokenizer, ls_inputs) - hf_generate(hf_model, hf_tokenizer, hf_inputs) + if args.generation_method == "topk" or args.generation_method == "topp": + ls_generate(ls_model, ls_tokenizer, ls_inputs) + hf_generate(hf_model, hf_tokenizer, hf_inputs) + elif args.generation_method == "ppl": + ls_ppl(ls_model, ls_tokenizer, ls_inputs) + hf_ppl(hf_model, hf_tokenizer, hf_inputs) if not args.user_input: break diff --git a/examples/inference/python/test/ls_quant_bert.py b/examples/inference/python/test/ls_quant_bert.py index b58f7728..29046866 100644 --- a/examples/inference/python/test/ls_quant_bert.py +++ b/examples/inference/python/test/ls_quant_bert.py @@ -3,7 +3,7 @@ import torch from transformers import BertTokenizer, BertForTokenClassification, BertConfig import lightseq.inference as lsi -from lightseq.training.ops.pytorch.quantization import qat_mode, 
disable_quant +from lightseq.training.ops.pytorch.quantization import qat_mode from lightseq.training.ops.pytorch.torch_transformer_layers import ( BertEmbeddingLayer, TransformerEncoderLayer, diff --git a/examples/inference/python/test/ls_quant_gpt2.py b/examples/inference/python/test/ls_quant_gpt2.py index 74acefc4..80428f04 100644 --- a/examples/inference/python/test/ls_quant_gpt2.py +++ b/examples/inference/python/test/ls_quant_gpt2.py @@ -1,31 +1,73 @@ import time -import argparse import torch -import numpy as np +from torch import nn +from transformers import GPT2Tokenizer, GPT2LMHeadModel, GPT2Config import lightseq.inference as lsi -from transformers import GPT2Tokenizer, GPT2LMHeadModel - - -def ls_gpt2(model, inputs): +from lightseq.training.ops.pytorch.quantization import ( + qat_mode, + QuantLinear, + TensorQuantizer, + weight_quant_config, +) +from lightseq.training.ops.pytorch.torch_transformer_layers import ( + TransformerDecoderLayer, +) +from export.util import parse_args + + +def ls_gpt2(model, inputs, generation_method="topk"): torch.cuda.synchronize() start_time = time.perf_counter() - generated_ids = model.sample(inputs) + results = None + if generation_method == "topk" or generation_method == "topp": + results = model.sample(inputs) + elif generation_method == "ppl": + results = model.ppl(inputs)[0] torch.cuda.synchronize() end_time = time.perf_counter() - return generated_ids, end_time - start_time + return results, end_time - start_time + + +def compute_hf_ppl(model, inputs): + max_length = 512 + stride = 512 + end_loc = 0 + + nlls = [] + for i in range(0, inputs.size(1), stride): + begin_loc = max(i + stride - max_length, 0) + end_loc = min(i + stride, inputs.size(1)) + trg_len = end_loc - i + input_ids = inputs[:, begin_loc:end_loc].to("cuda:0") + target_ids = input_ids.clone() + target_ids[:, :-trg_len] = -100 + with torch.no_grad(): + outputs = model(input_ids, labels=target_ids) + neg_log_likelihood = outputs[0] * trg_len -def hf_gpt2(model, inputs, tokenizer): + nlls.append(neg_log_likelihood) + + ppl = torch.stack(nlls).sum() / end_loc + return ppl.cpu().numpy() + + +def hf_gpt2(model, inputs, tokenizer, generation_method="topk"): inputs = inputs.to("cuda:0") torch.cuda.synchronize() start_time = time.perf_counter() - generated_ids = model.generate( - inputs, max_length=50, pad_token_id=tokenizer.eos_token_id - ) + results = None + if generation_method == "topk" or generation_method == "topp": + results = model.generate( + inputs, max_length=50, pad_token_id=tokenizer.eos_token_id + ) + elif generation_method == "ppl": + results = compute_hf_ppl(model, inputs) + torch.cuda.synchronize() end_time = time.perf_counter() - return generated_ids, end_time - start_time + return results, end_time - start_time def ls_generate(model, tokenizer, inputs): @@ -50,21 +92,106 @@ def hf_generate(model, tokenizer, inputs): print(sent) -def warmup(ls_tokenizer, hf_tokenizer, ls_model, hf_model, sentences): +def ls_ppl(model, tokenizer, inputs): + print("=========lightseq=========") + print("lightseq calculating ppl...") + ls_ppl, ls_time = ls_gpt2(model, inputs, "ppl") + print(f"lightseq time: {ls_time}s") + print("lightseq results:") + print(ls_ppl) + + +def hf_ppl(model, tokenizer, inputs): + print("=========huggingface=========") + print("huggingface calculating ppl...") + hf_ppl, hf_time = hf_gpt2(model, inputs, tokenizer, "ppl") + print(f"huggingface time: {hf_time}s") + print("huggingface results:") + print(hf_ppl) + + +def warmup( + ls_tokenizer, hf_tokenizer, ls_model, 
hf_model, sentences, generation_method +): ls_inputs = ls_tokenizer(sentences, return_tensors="pt", padding=True)["input_ids"] hf_inputs = hf_tokenizer(sentences, return_tensors="pt", padding=True)["input_ids"] - ls_generate(ls_model, ls_tokenizer, ls_inputs) - hf_generate(hf_model, hf_tokenizer, hf_inputs) + if generation_method == "topk" or generation_method == "topp": + ls_generate(ls_model, ls_tokenizer, ls_inputs) + hf_generate(hf_model, hf_tokenizer, hf_inputs) + elif generation_method == "ppl": + ls_ppl(ls_model, ls_tokenizer, ls_inputs) + hf_ppl(hf_model, hf_tokenizer, hf_inputs) + + +class GptEmbedding(nn.Embedding): + def __init__(self, *args, **kwargs): + super(GptEmbedding, self).__init__(*args, **kwargs) + self.emb_quant = TensorQuantizer(weight_quant_config) + + def forward(self, input_ids): + x = super(GptEmbedding, self).forward(input_ids) + x = self.emb_quant(x) + return x + + +def gen_gpt_enc_config(config): + gpt_enc_config = TransformerDecoderLayer.get_config( + max_batch_tokens=8192, + max_seq_len=config.max_position_embeddings, + hidden_size=config.hidden_size, + intermediate_size=4 * config.hidden_size, + nhead=config.num_attention_heads, + attn_prob_dropout_ratio=config.attn_pdrop, + activation_dropout_ratio=config.resid_pdrop, + hidden_dropout_ratio=config.resid_pdrop, + pre_layer_norm=True, + fp16=True, + local_rank=0, + nlayer=config.num_hidden_layers, + activation_fn="gelu", + has_cross_attn=False, + ) + return gpt_enc_config + + +class LSHFGptEncoderLayer(TransformerDecoderLayer): + def __init__(self, *args, **kwargs): + super(LSHFGptEncoderLayer, self).__init__(*args, **kwargs) + + def forward(self, hidden_states, attention_mask=None, *args, **kwargs): + if attention_mask is not None: + ls_attention_mask = attention_mask.squeeze() + else: + ls_attention_mask = torch.zeros(hidden_states.size()[:2]) + output = super().forward(hidden_states, ls_attention_mask) + return output + + +def inject_ls_layer(model, config): + model.transformer.wte = GptEmbedding(config.vocab_size, config.hidden_size) + model.transformer.wte.apply(qat_mode) + + for i in range(config.num_hidden_layers): + gpt_enc_config = gen_gpt_enc_config(config) + model.transformer.h[i] = LSHFGptEncoderLayer(gpt_enc_config).cuda() + model.transformer.h[i].apply(qat_mode) + + q_lm_head = QuantLinear(config.n_embd, config.vocab_size, bias=False) + q_lm_head.weight = model.transformer.wte.weight + q_lm_head.weight_quant = model.transformer.wte.emb_quant + model.lm_head = q_lm_head def main(): - parser = argparse.ArgumentParser() - parser.add_argument("--user_input", action="store_true") - args = parser.parse_args() + args = parse_args() + model_name = ".".join(args.model.split(".")[:-1]) + ckpt_path = f"{model_name}.bin" - print("initializing gpt tokenizer...") + print("initializing gpt2 config...") + config = GPT2Config.from_pretrained("gpt2") + print("initializing gpt2 tokenizer...") ls_tokenizer = GPT2Tokenizer.from_pretrained("gpt2") # lightseq use len(tokenizer) as pad_token in default ls_tokenizer.add_special_tokens({"pad_token": "[PAD]"}) @@ -75,14 +202,17 @@ def main(): hf_tokenizer.pad_token = hf_tokenizer.eos_token print(f"huggingface tokenizer pad token id: {hf_tokenizer.pad_token_id}") - print("creating lightseq model...") - ls_model = lsi.QuantGpt("lightseq_gpt2_base.hdf5", max_batch_size=16) - print("creating huggingface model...") - hf_model = GPT2LMHeadModel.from_pretrained("gpt2") + hf_model = GPT2LMHeadModel.from_pretrained("gpt2", config=config) + inject_ls_layer(hf_model, config) + 
state_dict = torch.load(ckpt_path, map_location="cpu") + hf_model.load_state_dict(state_dict, strict=False) hf_model.to("cuda:0") hf_model.eval() + print("creating lightseq model...") + ls_model = lsi.QuantGpt(args.model, 8) + # lightseq gpt perplexity supports batch infer with different lengths, # but sampling doesn't support sentences = [ @@ -93,27 +223,26 @@ def main(): ] print("====================START warmup====================") - warmup(ls_tokenizer, hf_tokenizer, ls_model, hf_model, sentences) + warmup( + ls_tokenizer, + hf_tokenizer, + ls_model, + hf_model, + sentences, + args.generation_method, + ) print("====================END warmup====================") - while True: - if args.user_input: - sentences = [input("input the masked sentence:\n")] - - print("tokenizing the sentences...") - - ls_inputs = ls_tokenizer(sentences, return_tensors="pt", padding=True)[ - "input_ids" - ] - hf_inputs = hf_tokenizer(sentences, return_tensors="pt", padding=True)[ - "input_ids" - ] + print("tokenizing the sentences...") + ls_inputs = ls_tokenizer(sentences, return_tensors="pt", padding=True)["input_ids"] + hf_inputs = hf_tokenizer(sentences, return_tensors="pt", padding=True)["input_ids"] + if args.generation_method == "topk" or args.generation_method == "topp": ls_generate(ls_model, ls_tokenizer, ls_inputs) hf_generate(hf_model, hf_tokenizer, hf_inputs) - - if not args.user_input: - break + elif args.generation_method == "ppl": + ls_ppl(ls_model, ls_tokenizer, ls_inputs) + hf_ppl(hf_model, hf_tokenizer, hf_inputs) if __name__ == "__main__": diff --git a/examples/training/huggingface/gpt/ls_hf_gpt_layer.py b/examples/training/huggingface/gpt/ls_hf_gpt_layer.py index 45ab743c..90061766 100644 --- a/examples/training/huggingface/gpt/ls_hf_gpt_layer.py +++ b/examples/training/huggingface/gpt/ls_hf_gpt_layer.py @@ -80,7 +80,7 @@ def forward(self, hidden_states, attention_mask=None, *args, **kwargs): class GptEmbedding(nn.Embedding): - def __init__(self, training_args, initial_embeddings, *args, **kwargs): + def __init__(self, training_args, initial_embeddings=None, *args, **kwargs): super(GptEmbedding, self).__init__(*args, **kwargs) self.emb_quant = TensorQuantizer(weight_quant_config) From 56eb950d84a8b2b6e5c1df6d1850889a52a9a6b8 Mon Sep 17 00:00:00 2001 From: "weiyang.god" Date: Wed, 27 Apr 2022 01:11:17 +0800 Subject: [PATCH 39/49] support quant gpt inference (ppl) --- .../ls_torch_hf_quant_gpt2_export.py | 2 +- .../inference/python/test/ls_quant_gpt2.py | 2 +- examples/triton_backend/README.md | 6 +- .../inference/kernels/gptKernels_int8.cc.cu | 60 +++++++ lightseq/inference/kernels/gptKernels_int8.h | 6 + .../kernels/transformerKernels.cc.cu | 4 +- .../kernels/transformerKernels_int8.cc.cu | 152 +++++++++++++++--- .../kernels/transformerKernels_int8.h | 14 +- .../inference/model/quant_bert_encoder.cc.cu | 11 +- lightseq/inference/model/quant_decoder.cc.cu | 4 +- lightseq/inference/model/quant_encoder.cc.cu | 7 +- .../inference/model/quant_gpt_encoder.cc.cu | 135 +++++++++------- lightseq/inference/model/quant_gpt_encoder.h | 6 + .../ops/pytorch/torch_transformer_layers.py | 2 +- 14 files changed, 318 insertions(+), 93 deletions(-) diff --git a/examples/inference/python/export/huggingface/ls_torch_hf_quant_gpt2_export.py b/examples/inference/python/export/huggingface/ls_torch_hf_quant_gpt2_export.py index 78bce095..b2547a37 100644 --- a/examples/inference/python/export/huggingface/ls_torch_hf_quant_gpt2_export.py +++ 
b/examples/inference/python/export/huggingface/ls_torch_hf_quant_gpt2_export.py @@ -140,7 +140,7 @@ def extract_gpt_weights( token_embedding.numpy(), 127, state_dict["transformer.wte.emb_quant.clip.clip_value_max"].numpy(), - ) + ).transpose() print(f"processed token_embedding, shape: {token_embedding.shape}") hdf5_file.create_dataset( "src_embedding/token_embedding", data=token_embedding, dtype="uint8" diff --git a/examples/inference/python/test/ls_quant_gpt2.py b/examples/inference/python/test/ls_quant_gpt2.py index 80428f04..33b863e1 100644 --- a/examples/inference/python/test/ls_quant_gpt2.py +++ b/examples/inference/python/test/ls_quant_gpt2.py @@ -211,7 +211,7 @@ def main(): hf_model.eval() print("creating lightseq model...") - ls_model = lsi.QuantGpt(args.model, 8) + ls_model = lsi.QuantGpt(args.model, max_batch_size=16) # lightseq gpt perplexity supports batch infer with different lengths, # but sampling doesn't support diff --git a/examples/triton_backend/README.md b/examples/triton_backend/README.md index c4eed7d1..2ab191da 100644 --- a/examples/triton_backend/README.md +++ b/examples/triton_backend/README.md @@ -21,11 +21,11 @@ - The meaning of parameters in config.pbtxt, more information you can find in [Model config of tritonbackend](https://github.com/triton-inference-server/common/blob/main/protobuf/model_config.proto) - > ${name}: name of model,**which should be same with ** + > ${name}: name of model, **which should be same with ** > - > ${backend}: **fixed value - "lightseq"**,which is used to recognize the dynamic link library of tritonbackend, libtriton_lightseq.so + > ${backend}: **fixed value - "lightseq"**, which is used to recognize the dynamic link library of tritonbackend, libtriton_lightseq.so > - > ${default_model_filename}: name of model file,**which should be same with ** + > ${default_model_filename}: name of model file, **which should be same with ** > > ${parameters - value - string_value}: the type of model, which should be supported by lightseq. You can choose `Transformer`|`QuantTransformer`|`Bert`|`Gpt`|`Moe` diff --git a/lightseq/inference/kernels/gptKernels_int8.cc.cu b/lightseq/inference/kernels/gptKernels_int8.cc.cu index 74b64af1..319fcffa 100644 --- a/lightseq/inference/kernels/gptKernels_int8.cc.cu +++ b/lightseq/inference/kernels/gptKernels_int8.cc.cu @@ -183,5 +183,65 @@ void ker_ppl_i8I_launcher(int batch_size, int batch_seq_len, dequant_scale, in_col32); } +template +__global__ void ker_correlation_softmax_gpt_i32I( + int32_t* correlation, T* output, const int* real_seq_len, + const int batch_seq_len, float attn_scale, float dequant_scale) { + int query_token_pos = blockIdx.y % batch_seq_len; + if (query_token_pos >= real_seq_len[blockIdx.x]) { + return; + } + + int mask = 0; // can see the token when mask=0 + if (threadIdx.x > query_token_pos || threadIdx.x >= batch_seq_len) { + mask = 1; // Can only see the token on the left side of it + } + + int idx = (blockIdx.x * gridDim.y + blockIdx.y) * batch_seq_len + threadIdx.x; + float val = threadIdx.x < batch_seq_len + ? ((float)correlation[idx] * attn_scale * dequant_scale * + dequant_scale) + : CUDA_FLOAT_INF_NEG; + float max_val = blockReduceMax(mask ? CUDA_FLOAT_INF_NEG : val); + __shared__ float smax; + if (threadIdx.x == 0) smax = max_val; + __syncthreads(); + + val = mask ? 
0.f : expf(val - smax); + float rsum = blockReduceSum(val); + __shared__ float ssum; + if (threadIdx.x == 0) ssum = rsum; + __syncthreads(); + + if (threadIdx.x < batch_seq_len) output[idx] = (T)(val / ssum); +} + +template +void ker_correlation_softmax_gpt_i32I_launcher( + int batch_size, int batch_seq_len, int head_num, cudaStream_t stream, + int32_t* correlation, T* output, const int* real_seq_len, float attn_scale, + float dequant_scale) { + int block_dim = batch_seq_len; + if (batch_seq_len < 1024) { + block_dim = (batch_seq_len + 31) >> 5; + block_dim *= 32; + } + + ker_correlation_softmax_gpt_i32I + <<>>( + correlation, output, real_seq_len, batch_seq_len, attn_scale, + dequant_scale); +} + +template void ker_correlation_softmax_gpt_i32I_launcher( + int batch_size, int batch_seq_len, int head_num, cudaStream_t stream, + int32_t* correlation, float* output, const int* real_seq_len, + float attn_scale, float dequant_scale); + +template void ker_correlation_softmax_gpt_i32I_launcher<__half>( + int batch_size, int batch_seq_len, int head_num, cudaStream_t stream, + int32_t* correlation, __half* output, const int* real_seq_len, + float attn_scale, float dequant_scale); + } // namespace cuda } // namespace lightseq diff --git a/lightseq/inference/kernels/gptKernels_int8.h b/lightseq/inference/kernels/gptKernels_int8.h index b2b7f8c0..aaf363f3 100644 --- a/lightseq/inference/kernels/gptKernels_int8.h +++ b/lightseq/inference/kernels/gptKernels_int8.h @@ -21,5 +21,11 @@ void ker_ppl_i8I_launcher(int batch_size, int batch_seq_len, const int* real_seq_len, float* ppl, int vocab_size, float dequant_scale, bool in_col32 = false); +template +void ker_correlation_softmax_gpt_i32I_launcher( + int batch_size, int batch_seq_len, int head_num, cudaStream_t stream, + int32_t* correlation, T* output, const int* real_seq_len, float attn_scale, + float dequant_scale); + } // namespace cuda } // namespace lightseq diff --git a/lightseq/inference/kernels/transformerKernels.cc.cu b/lightseq/inference/kernels/transformerKernels.cc.cu index c8794312..05a22094 100644 --- a/lightseq/inference/kernels/transformerKernels.cc.cu +++ b/lightseq/inference/kernels/transformerKernels.cc.cu @@ -810,7 +810,7 @@ __global__ void ker_arrange_decself_qkv(const T* ori_qkv, const T* qkv_bias, T val = ori_qkv[(blockIdx.x * gridDim.y + blockIdx.y) * hidden_size + i] + __ldg(&qkv_bias[blockIdx.y * hidden_size + i]); int seq_id = - blockIdx.x; // obvious, seq_id = batch_id * beam_size + beam_id + blockIdx.x; // obvious, seq_id = batch_id * beam_size + beam_id if (blockIdx.y == 0) { // for query new_q[seq_id * hidden_size + i] = val; @@ -841,7 +841,7 @@ __global__ void ker_arrange_decself_qkv<__half>( half2 val = __hadd2( p_qkv[(blockIdx.x * gridDim.y + blockIdx.y) * half_hidden_size + i], __ldg(&p_bias[blockIdx.y * half_hidden_size + i])); - // obvious,seq_id = batch_id * beam_size + beam_id + // obvious, seq_id = batch_id * beam_size + beam_id int seq_id = blockIdx.x; if (blockIdx.y == 0) { // for query diff --git a/lightseq/inference/kernels/transformerKernels_int8.cc.cu b/lightseq/inference/kernels/transformerKernels_int8.cc.cu index 67406048..49c2955a 100644 --- a/lightseq/inference/kernels/transformerKernels_int8.cc.cu +++ b/lightseq/inference/kernels/transformerKernels_int8.cc.cu @@ -1199,6 +1199,122 @@ template void ker_arrange_encself_qkv_i8I_launcher<__half>( int max_batch_dim, int batch_seq_len, int dim_per_head, int head_num, int max_thread_per_block, float dequant_scale, bool in_col32); +template +__global__ void 
ker_arrange_encself_qkv_i8I_i8O( + const int8_t *ori_qkv, const T *qkv_bias, int8_t *new_q, int8_t *new_k, + int8_t *new_v, T *d_v, int batch_seq_len, int dim_per_head, int head_num, + float dequant_scale, float quant_scale, bool in_col32) { + int hidden_size = dim_per_head * head_num; + int batch_id = blockIdx.x / batch_seq_len; + int token_id = blockIdx.x % batch_seq_len; + for (std::size_t i = threadIdx.x; i < hidden_size; i += blockDim.x) { + int head_id = i / dim_per_head; + int dim_id = i % dim_per_head; + int target_id = targetid_4dim(batch_id, head_id, token_id, dim_id, head_num, + batch_seq_len, dim_per_head); + int qkv_index; + if (in_col32) { + int row_id = blockIdx.x; + int col_id = blockIdx.y * hidden_size + i; + qkv_index = row_major2flat_col32(row_id, col_id, gridDim.x, + gridDim.y * hidden_size); + } else { + qkv_index = (blockIdx.x * gridDim.y + blockIdx.y) * hidden_size + i; + } + + float val = float(ori_qkv[qkv_index]) * dequant_scale + + __ldg(&qkv_bias[blockIdx.y * hidden_size + i]); + int8_t quant_val = float2int8(val, quant_scale); + + if (blockIdx.y == 0) { + new_q[target_id] = quant_val; + } else if (blockIdx.y == 1) { + new_k[target_id] = quant_val; + } else { + new_v[target_id] = quant_val; + d_v[target_id] = float(quant_val) / quant_scale; + } + } +} + +template <> +__global__ void ker_arrange_encself_qkv_i8I_i8O<__half>( + const int8_t *ori_qkv, const __half *qkv_bias, int8_t *new_q, int8_t *new_k, + int8_t *new_v, __half *d_v, int batch_seq_len, int dim_per_head, + int head_num, float dequant_scale, float quant_scale, bool in_col32) { + int hidden_size = dim_per_head * head_num; + int batch_id = blockIdx.x / batch_seq_len; + int token_id = blockIdx.x % batch_seq_len; + for (std::size_t i = threadIdx.x; i < hidden_size; i += blockDim.x) { + int head_id = i / dim_per_head; + int dim_id = i % dim_per_head; + int target_id = targetid_4dim(batch_id, head_id, token_id, dim_id, head_num, + batch_seq_len, dim_per_head); + int qkv_index; + if (in_col32) { + int row_id = blockIdx.x; + int col_id = blockIdx.y * hidden_size + i; + qkv_index = row_major2flat_col32(row_id, col_id, gridDim.x, + gridDim.y * hidden_size); + } else { + qkv_index = (blockIdx.x * gridDim.y + blockIdx.y) * hidden_size + i; + } + + float val = float(ori_qkv[qkv_index]) * dequant_scale + + __half2float(__ldg(&qkv_bias[blockIdx.y * hidden_size + i])); + int8_t quant_val = float2int8(val, quant_scale); + + if (blockIdx.y == 0) { + new_q[target_id] = quant_val; + } else if (blockIdx.y == 1) { + new_k[target_id] = quant_val; + } else { + new_v[target_id] = quant_val; + d_v[target_id] = __float2half(float(quant_val) / quant_scale); + } + } +} + +template +void ker_arrange_encself_qkv_i8I_i8O_launcher( + int batch_token_num, int hidden_size, cudaStream_t stream, + const int8_t *ori_qkv, const T *qkv_bias, int8_t *new_q, int8_t *new_k, + int8_t *new_v, T *d_v, int batch_seq_len, int dim_per_head, int head_num, + int max_thread_per_block, float dequant_scale, float quant_scale, + bool in_col32) { + ker_arrange_encself_qkv_i8I_i8O + <<>>( + ori_qkv, qkv_bias, new_q, new_k, new_v, d_v, batch_seq_len, + dim_per_head, head_num, dequant_scale, quant_scale, in_col32); +} + +template <> +void ker_arrange_encself_qkv_i8I_i8O_launcher<__half>( + int batch_token_num, int hidden_size, cudaStream_t stream, + const int8_t *ori_qkv, const __half *qkv_bias, int8_t *new_q, int8_t *new_k, + int8_t *new_v, __half *d_v, int batch_seq_len, int dim_per_head, + int head_num, int max_thread_per_block, float dequant_scale, + float 
quant_scale, bool in_col32) { + ker_arrange_encself_qkv_i8I_i8O<__half> + <<>>( + ori_qkv, qkv_bias, new_q, new_k, new_v, d_v, batch_seq_len, + dim_per_head, head_num, dequant_scale, quant_scale, in_col32); +} + +template void ker_arrange_encself_qkv_i8I_i8O_launcher( + int batch_token_num, int hidden_size, cudaStream_t stream, + const int8_t *ori_qkv, const float *qkv_bias, int8_t *new_q, int8_t *new_k, + int8_t *new_v, float *d_v, int batch_seq_len, int dim_per_head, + int head_num, int max_thread_per_block, float dequant_scale, + float quant_scale, bool in_col32); + +template void ker_arrange_encself_qkv_i8I_i8O_launcher<__half>( + int batch_token_num, int hidden_size, cudaStream_t stream, + const int8_t *ori_qkv, const __half *qkv_bias, int8_t *new_q, int8_t *new_k, + int8_t *new_v, __half *d_v, int batch_seq_len, int dim_per_head, + int head_num, int max_thread_per_block, float dequant_scale, + float quant_scale, bool in_col32); + template __global__ void ker_arrange_atten_output_i8O(const T *ori_q, int8_t *new_q, int beam_size, int dim_per_head, @@ -1294,7 +1410,7 @@ template void ker_arrange_atten_output_i8O_launcher<__half>( int head_num, int max_thread_per_block, float quant_scale, bool out_col32); template -__global__ void ker_arrange_decself_qkv_i8I( +__global__ void ker_arrange_decself_qkv_i8I_i8O( const int8_t *ori_qkv, const T *qkv_bias, int8_t *new_q, int8_t *new_k, int8_t *new_v, int head_num, int dim_per_head, int max_step, int step_id, float dequant_scale, float quant_scale, bool in_col32) { @@ -1313,7 +1429,7 @@ __global__ void ker_arrange_decself_qkv_i8I( __ldg(&qkv_bias[blockIdx.y * hidden_size + i]); int8_t quant_val = float2int8(val, quant_scale); int seq_id = - blockIdx.x; // obvious, seq_id = batch_id * beam_size + beam_id + blockIdx.x; // obvious, seq_id = batch_id * beam_size + beam_id if (blockIdx.y == 0) { // for query new_q[seq_id * hidden_size + i] = quant_val; @@ -1334,7 +1450,7 @@ __global__ void ker_arrange_decself_qkv_i8I( } template <> -__global__ void ker_arrange_decself_qkv_i8I<__half>( +__global__ void ker_arrange_decself_qkv_i8I_i8O<__half>( const int8_t *ori_qkv, const __half *qkv_bias, int8_t *new_q, int8_t *new_k, int8_t *new_v, int head_num, int dim_per_head, int max_step, int step_id, float dequant_scale, float quant_scale, bool in_col32) { @@ -1353,7 +1469,7 @@ __global__ void ker_arrange_decself_qkv_i8I<__half>( __half2float(__ldg(&qkv_bias[blockIdx.y * hidden_size + i])); int8_t quant_val = float2int8(val, quant_scale); int seq_id = - blockIdx.x; // obvious, seq_id = batch_id * beam_size + beam_id + blockIdx.x; // obvious, seq_id = batch_id * beam_size + beam_id if (blockIdx.y == 0) { // for query new_q[seq_id * hidden_size + i] = quant_val; @@ -1374,39 +1490,39 @@ __global__ void ker_arrange_decself_qkv_i8I<__half>( } template -void ker_arrange_decself_qkv_i8I_launcher( +void ker_arrange_decself_qkv_i8I_i8O_launcher( int step_token_num, int hidden_size, cudaStream_t stream, const int8_t *ori_qkv, const T *qkv_bias, int8_t *new_q, int8_t *new_k, int8_t *new_v, int head_num, int dim_per_head, int max_step, int step_id, int max_thread_per_block, float dequant_scale, float quant_scale, bool in_col32) { - ker_arrange_decself_qkv_i8I + ker_arrange_decself_qkv_i8I_i8O <<>>( ori_qkv, qkv_bias, new_q, new_k, new_v, head_num, dim_per_head, max_step, step_id, dequant_scale, quant_scale, in_col32); } // template <> -// void ker_arrange_decself_qkv_i8I_launcher<__half>( +// void ker_arrange_decself_qkv_i8I_i8O_launcher<__half>( // int step_token_num, int 
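The in_col32/out_col32 flags taken by these launchers refer to the CUBLASLT_ORDER_COL32 activation layout used by the int8 GEMMs: columns are grouped in tiles of 32, and elements inside a tile are stored row by row. Assuming that standard tiling, the index arithmetic performed by row_major2flat_col32 can be sketched in a few lines (a toy check, not the device code):

def row_major2flat_col32(row_id, col_id, row_size):
    # flat offset of element (row_id, col_id) of a row_size-by-col_size matrix stored in COL32
    return (col_id // 32) * (row_size * 32) + row_id * 32 + (col_id % 32)

rows, cols = 4, 64                       # column count is a multiple of 32 here
offsets = [row_major2flat_col32(r, c, rows) for r in range(rows) for c in range(cols)]
assert sorted(offsets) == list(range(rows * cols))   # the mapping permutes the whole buffer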
hidden_size, cudaStream_t stream, // const int8_t *ori_qkv, const __half *qkv_bias, int8_t *new_q, int8_t // *new_k, int8_t *new_v, int head_num, int dim_per_head, int max_step, int // step_id, int max_thread_per_block, float dequant_scale, float // quant_scale, bool in_col32) { -// ker_arrange_decself_qkv_i8I<__half> +// ker_arrange_decself_qkv_i8I_i8O<__half> // <<>>( // ori_qkv, qkv_bias, new_q, new_k, new_v, head_num, dim_per_head, // max_step, step_id, dequant_scale, quant_scale, in_col32); // } -template void ker_arrange_decself_qkv_i8I_launcher( +template void ker_arrange_decself_qkv_i8I_i8O_launcher( int step_token_num, int hidden_size, cudaStream_t stream, const int8_t *ori_qkv, const float *qkv_bias, int8_t *new_q, int8_t *new_k, int8_t *new_v, int head_num, int dim_per_head, int max_step, int step_id, int max_thread_per_block, float dequant_scale, float quant_scale, bool in_col32); -template void ker_arrange_decself_qkv_i8I_launcher<__half>( +template void ker_arrange_decself_qkv_i8I_i8O_launcher<__half>( int step_token_num, int hidden_size, cudaStream_t stream, const int8_t *ori_qkv, const __half *qkv_bias, int8_t *new_q, int8_t *new_k, int8_t *new_v, int head_num, int dim_per_head, int max_step, int step_id, @@ -1414,7 +1530,7 @@ template void ker_arrange_decself_qkv_i8I_launcher<__half>( bool in_col32); /** -@brief: ker_fuse_softmax_new_value_int8 +@brief: ker_fuse_softmax_new_value_i32I_i8O fused query-key correlation softmax and new_value for decoder self attention @thread @@ -1424,10 +1540,10 @@ blockDim.x = first multiple of WARP_SIZE greater than cur_step + 1 @param correlation: [batch_size, beam_size, head_num, cur_step + 1] */ -__global__ void ker_fuse_softmax_new_value_int8( +__global__ void ker_fuse_softmax_new_value_i32I_i8O( const int32_t *logits, const int8_t *v, int8_t *new_v, int step_num, int max_step, int head_num, int dim_per_head, float attn_scale, - float dequant_scale, float quant_scale, bool col32_out) { + float dequant_scale, float quant_scale, bool out_col32) { int idx = blockIdx.x * max_step + threadIdx.x; float val = threadIdx.x < step_num ? 
float(logits[idx]) * dequant_scale * dequant_scale * attn_scale @@ -1470,28 +1586,28 @@ __global__ void ker_fuse_softmax_new_value_int8( int col = head_idx * dim_per_head + i; int col_size = head_num * dim_per_head; int new_v_idx = row * col_size + col; - if (col32_out) { + if (out_col32) { new_v_idx = row_major2flat_col32(row, col, row_size, col_size); } new_v[new_v_idx] = float2int8(block_new_value[i], quant_scale); } } -void ker_fuse_softmax_new_value_int8_launcher( +void ker_fuse_softmax_new_value_i32I_i8O_launcher( const int32_t *correlation, const int8_t *v, int8_t *new_v, int batch_head_num, int step_num, int max_step, int head_num, int dim_per_head, float attn_scale, float dequant_scale, float quant_scale, - bool col32_out, cudaStream_t stream) { + bool out_col32, cudaStream_t stream) { int block_dim = step_num; if (step_num < 1024) { block_dim = (step_num + 31) >> 5; block_dim *= 32; } - ker_fuse_softmax_new_value_int8<<< + ker_fuse_softmax_new_value_i32I_i8O<<< batch_head_num, block_dim, dim_per_head * sizeof(float) + step_num * sizeof(float), stream>>>( correlation, v, new_v, step_num, max_step, head_num, dim_per_head, - attn_scale, dequant_scale, quant_scale, col32_out); + attn_scale, dequant_scale, quant_scale, out_col32); } template diff --git a/lightseq/inference/kernels/transformerKernels_int8.h b/lightseq/inference/kernels/transformerKernels_int8.h index 247943ed..ce8ac1d8 100644 --- a/lightseq/inference/kernels/transformerKernels_int8.h +++ b/lightseq/inference/kernels/transformerKernels_int8.h @@ -72,6 +72,14 @@ void ker_arrange_encself_qkv_i8I_launcher( int batch_seq_len, int dim_per_head, int head_num, int max_thread_per_block, float dequant_scale, bool in_col32 = false); +template +void ker_arrange_encself_qkv_i8I_i8O_launcher( + int batch_token_num, int hidden_size, cudaStream_t stream, + const int8_t *ori_qkv, const T *qkv_bias, int8_t *new_q, int8_t *new_k, + int8_t *new_v, T *d_v, int batch_seq_len, int dim_per_head, int head_num, + int max_thread_per_block, float dequant_scale, float quant_scale, + bool in_col32 = false); + template void ker_arrange_atten_output_i8O_launcher( int batch_token_num, int hidden_size, cudaStream_t stream, const T *ori_q, @@ -79,17 +87,17 @@ void ker_arrange_atten_output_i8O_launcher( int max_thread_per_block, float quant_scale, bool out_col32 = false); template -void ker_arrange_decself_qkv_i8I_launcher( +void ker_arrange_decself_qkv_i8I_i8O_launcher( int step_token_num, int hidden_size, cudaStream_t stream, const int8_t *ori_qkv, const T *qkv_bias, int8_t *new_q, int8_t *new_k, int8_t *new_v, int head_num, int dim_per_head, int max_step, int step_id, int max_thread_per_block, float dequant_scale, float quant_scale, bool in_col32 = false); -void ker_fuse_softmax_new_value_int8_launcher( +void ker_fuse_softmax_new_value_i32I_i8O_launcher( const int32_t *correlation, const int8_t *v, int8_t *new_v, int batch_head_num, int step_num, int max_step, int head_num, int head_dim, - float attn_scale, float dequant_scale, float quant_scale, bool col32_out, + float attn_scale, float dequant_scale, float quant_scale, bool out_col32, cudaStream_t stream); template diff --git a/lightseq/inference/model/quant_bert_encoder.cc.cu b/lightseq/inference/model/quant_bert_encoder.cc.cu index 8af3ff9c..c4dec5f5 100644 --- a/lightseq/inference/model/quant_bert_encoder.cc.cu +++ b/lightseq/inference/model/quant_bert_encoder.cc.cu @@ -71,9 +71,10 @@ void QuantBertEncoder::init_buffer() { CHECK_GPU_ERROR( cudaMalloc(&_int8_p_d_src_emb_wei, _tw._src_vocab_size * 
_tw._hidden_size * sizeof(int8_t))); - quantize_weight(_p_d_src_emb_wei[0], _int8_p_d_src_emb_wei, _tw._hidden_size, - _tw._src_vocab_size, _quant_range / _src_emb_clip_max, - _stream, _cublas_lt_handle, kRowMajor); + quantize_weight(_p_d_src_emb_wei[0], _int8_p_d_src_emb_wei, + _tw._src_vocab_size, _tw._hidden_size, + _quant_range / _src_emb_clip_max, _stream, _cublas_lt_handle, + kRowMajor); _p_device_emb.push_back(nullptr); _p_device_emb.push_back( @@ -356,7 +357,7 @@ void QuantBertEncoder::self_attention() { for (int i = 0; i < _batch_size; i++) { // batch_id for (int j = 0; j < _batch_seq_len; j++) { // token_id std::cout << "attn_ln input: token-" << j << std::endl; - print_vec(_int8_ffn_in_buf + i * _batch_seq_len * _tw._hidden_size + + print_vec(_int8_ffn_out_buf + i * _batch_seq_len * _tw._hidden_size + j * _tw._hidden_size, "attn_ln input", 10); } @@ -380,7 +381,7 @@ void QuantBertEncoder::ffn_add_norm() { for (int i = 0; i < _batch_size; i++) { // batch_id for (int j = 0; j < _batch_seq_len; j++) { // token_id std::cout << "ffn1 input: token-" << j << std::endl; - print_vec(_int8_ffn_out_buf + i * _batch_seq_len * _tw._hidden_size + + print_vec(_int8_ffn_in_buf + i * _batch_seq_len * _tw._hidden_size + j * _tw._hidden_size, "ffn1 input", 10); } diff --git a/lightseq/inference/model/quant_decoder.cc.cu b/lightseq/inference/model/quant_decoder.cc.cu index b9ab65dc..604d6713 100644 --- a/lightseq/inference/model/quant_decoder.cc.cu +++ b/lightseq/inference/model/quant_decoder.cc.cu @@ -638,7 +638,7 @@ void QuantDecoder::self_attention() { // get q, k, v by split and reshape qkv - ker_arrange_decself_qkv_i8I_launcher<_DataType>( + ker_arrange_decself_qkv_i8I_i8O_launcher<_DataType>( _step_token_num, _tw._hidden_size, _stream, _int8_ffn_out_buf, _p_device_wei[_weight_offset + 3], _int8_ffn_in_buf, _p_d_self_k_cache1[_layer_id], _p_d_self_v_cache1[_layer_id], @@ -671,7 +671,7 @@ void QuantDecoder::self_attention() { CHECK_GPU_ERROR(cudaGetLastError()); #endif - ker_fuse_softmax_new_value_int8_launcher( + ker_fuse_softmax_new_value_i32I_i8O_launcher( _int32_ffn_out_buf, _p_d_self_v_cache1[_layer_id], _int8_ffn_in_buf, _step_token_num * _tw._head_num, _cur_step + 1, _tw._max_step, _tw._head_num, _tw._dim_per_head, float(_atten_scaler), diff --git a/lightseq/inference/model/quant_encoder.cc.cu b/lightseq/inference/model/quant_encoder.cc.cu index 4f736848..c8cb8159 100644 --- a/lightseq/inference/model/quant_encoder.cc.cu +++ b/lightseq/inference/model/quant_encoder.cc.cu @@ -76,9 +76,10 @@ void QuantEncoder::init_buffer() { CHECK_GPU_ERROR( cudaMalloc(&_int8_p_d_src_emb_wei, _tw._src_vocab_size * _tw._hidden_size * sizeof(int8_t))); - quantize_weight(_p_d_src_emb_wei[0], _int8_p_d_src_emb_wei, _tw._hidden_size, - _tw._src_vocab_size, _quant_range / _src_emb_clip_max, - _stream, _cublas_lt_handle, kRowMajor); + quantize_weight(_p_d_src_emb_wei[0], _int8_p_d_src_emb_wei, + _tw._src_vocab_size, _tw._hidden_size, + _quant_range / _src_emb_clip_max, _stream, _cublas_lt_handle, + kRowMajor); _p_device_emb.push_back(nullptr); _p_device_emb.push_back( diff --git a/lightseq/inference/model/quant_gpt_encoder.cc.cu b/lightseq/inference/model/quant_gpt_encoder.cc.cu index 51117534..6499d112 100644 --- a/lightseq/inference/model/quant_gpt_encoder.cc.cu +++ b/lightseq/inference/model/quant_gpt_encoder.cc.cu @@ -73,17 +73,19 @@ void QuantGptEncoder::init_buffer() { _p_d_k = qkv_buf + _max_batch_dim; _p_d_v = qkv_buf + 2 * _max_batch_dim; - CHECK_GPU_ERROR(cudaMalloc(&_p_d_c, _max_batch_size * 
_tw._head_num * - _tw._max_step * _tw._max_step * - sizeof(_DataType))); + int max_attn_score_dim = round_up( + _max_batch_size * _tw._head_num * _tw._max_step * _tw._max_step, 32); + + CHECK_GPU_ERROR(cudaMalloc(&_p_d_c, max_attn_score_dim * sizeof(_DataType))); int max_batch_dim = _max_batch_size * _tw._max_step * round_up(std::max(_tw._inner_size, _tw._hidden_size * 3), 32); CHECK_GPU_ERROR( cudaMalloc(&_int8_ffn_in_buf, max_batch_dim * sizeof(int8_t))); - CHECK_GPU_ERROR( - cudaMalloc(&_int32_ffn_out_buf, max_batch_dim * sizeof(int32_t))); + CHECK_GPU_ERROR(cudaMalloc( + &_int32_ffn_out_buf, + std::max(max_batch_dim, max_attn_score_dim) * sizeof(int32_t))); CHECK_GPU_ERROR( cudaMalloc(&_int8_ffn_out_buf, std::max(max_batch_dim, round_up(_tw._src_vocab_size, 32) * @@ -103,7 +105,7 @@ void QuantGptEncoder::init_buffer() { quantize_weight(_p_d_src_emb_wei[0], _int8_p_d_src_emb_bottom_wei, _tw._hidden_size, _tw._src_vocab_size, _quant_range / _src_emb_clip_max, _stream, _cublas_lt_handle, - kRowMajor); + kColMajor); _p_device_emb.push_back(nullptr); _p_device_emb.push_back( to_gpu(_p_d_src_emb_wei[1], _tw._max_step * _tw._hidden_size, _stream)); @@ -114,26 +116,25 @@ void QuantGptEncoder::init_buffer() { // malloc reused kv cache max size: _tw._hidden_size * 2 * _tw._n_enc_layer * // _max_batch_size * _max_step * sizeof(T) - // int8_t *self_kv_cache_buffer; - // int8_t *sliding_p; - // CHECK_GPU_ERROR( - // cudaMalloc(&self_kv_cache_buffer, - // _layer_size_self_k * _tw._n_enc_layer * 4 * - // sizeof(int8_t))); - - // sliding_p = self_kv_cache_buffer; - // for (int i = 0; i < _tw._n_enc_layer * 2; i++) { - // _p_d_self_k_cache.push_back(sliding_p); - // sliding_p += _layer_size_self_k; - // } - // for (int i = 0; i < _tw._n_enc_layer * 2; i++) { - // _p_d_self_v_cache.push_back(sliding_p); - // sliding_p += _layer_size_self_k; - // } - // _p_d_self_k_cache1 = _p_d_self_k_cache.data(); - // _p_d_self_k_cache2 = _p_d_self_k_cache.data() + _tw._n_enc_layer; - // _p_d_self_v_cache1 = _p_d_self_v_cache.data(); - // _p_d_self_v_cache2 = _p_d_self_v_cache.data() + _tw._n_enc_layer; + int8_t *self_kv_cache_buffer; + int8_t *sliding_p; + CHECK_GPU_ERROR( + cudaMalloc(&self_kv_cache_buffer, + _max_batch_dim * _tw._n_enc_layer * 4 * sizeof(int8_t))); + + sliding_p = self_kv_cache_buffer; + for (int i = 0; i < _tw._n_enc_layer * 2; i++) { + _p_d_self_k_cache.push_back(sliding_p); + sliding_p += _max_batch_dim; + } + for (int i = 0; i < _tw._n_enc_layer * 2; i++) { + _p_d_self_v_cache.push_back(sliding_p); + sliding_p += _max_batch_dim; + } + _p_d_self_k_cache1 = _p_d_self_k_cache.data(); + _p_d_self_k_cache2 = _p_d_self_k_cache.data() + _tw._n_enc_layer; + _p_d_self_v_cache1 = _p_d_self_v_cache.data(); + _p_d_self_v_cache2 = _p_d_self_v_cache.data() + _tw._n_enc_layer; // malloc weights _int8_p_d_enc_wei = std::vector(_tw._n_enc_layer * 4); @@ -186,7 +187,7 @@ void QuantGptEncoder::init_buffer() { _int8_p_d_enc_wei[_layer_id * 4 + 1], _tw._hidden_size, _tw._hidden_size, _quant_range / _enc_clip_max[_layer_id * 12 + 1], _stream, - _cublas_lt_handle, kColMajor); + _cublas_lt_handle); quantize_weight(_p_d_enc_wei[_weight_offset + 8], _int8_p_d_enc_wei[_layer_id * 4 + 2], _tw._hidden_size, @@ -198,7 +199,7 @@ void QuantGptEncoder::init_buffer() { _int8_p_d_enc_wei[_layer_id * 4 + 3], _tw._inner_size, _tw._hidden_size, _quant_range / _enc_clip_max[_layer_id * 12 + 3], _stream, - _cublas_lt_handle, kColMajor); + _cublas_lt_handle); _scaled_ffn2_colsum[_layer_id] = nullptr; } @@ -276,10 +277,6 @@ void 
QuantGptEncoder::run_one_infer(int batch_size, _int8_p_d_src_emb_bottom_wei, _p_device_emb[1], _p_d_token_id, _p_d_query, _p_d_real_seq_len, _tw._padding_id, 0, _src_emb_clip_max / _quant_range); -#ifdef DEBUG_RESULT - print_vec(_p_d_query, "input embeddings", 10); -#endif - for (_layer_id = 0; _layer_id < _tw._n_enc_layer; _layer_id++) { _weight_offset = _layer_id * _tw._weight_per_enc_layer; self_attention(); @@ -501,7 +498,6 @@ void QuantGptEncoder::self_attention(bool cache) { _max_thread_per_block, _quant_range / _enc_clip_max[_layer_id * 12 + 4], false, true); } - CHECK_GPU_ERROR(cudaGetLastError()); cublasLtMM_withAlgo_i8IO( _int8_ffn_out_buf, 1, _batch_token_num, _tw._hidden_size * 3, @@ -511,15 +507,22 @@ void QuantGptEncoder::self_attention(bool cache) { _int8_ffn_in_buf, _int8_p_d_enc_wei[_layer_id * 4], _cublas_lt_handle, _stream, false); +#ifdef DEBUG_RESULT + print_vec(_int8_ffn_in_buf, "attn qkv in", 20); + print_vec(_int8_p_d_enc_wei[_layer_id * 4], "attn qkv w", 20); + print_vec(_int8_ffn_out_buf, "attn qkv out", 20); +#endif + // get q, k, v by split and reshape qkv - ker_arrange_encself_qkv_i8I_launcher<_DataType>( + ker_arrange_encself_qkv_i8I_i8O_launcher<_DataType>( _batch_token_num, _tw._hidden_size, _stream, _int8_ffn_out_buf, - _p_device_wei[_weight_offset + 3], _p_d_q, _max_batch_dim, _batch_seq_len, - _tw._dim_per_head, _tw._head_num, _max_thread_per_block, - _enc_clip_max[_layer_id * 12 + 8] / _quant_range, true); + _p_device_wei[_weight_offset + 3], _int8_ffn_in_buf, + _p_d_self_k_cache1[_layer_id], _p_d_self_v_cache1[_layer_id], _p_d_v, + _batch_seq_len, _tw._dim_per_head, _tw._head_num, _max_thread_per_block, + _enc_clip_max[_layer_id * 12 + 8] / _quant_range, + _quant_range / _enc_clip_max[_layer_id * 12 + 11], true); if (cache) { - throw std::runtime_error("QuantGpt sample() not implemented"); cudaStream_t stream; if (_batch_token_num > 360) { stream = _cache_stream; @@ -527,29 +530,30 @@ void QuantGptEncoder::self_attention(bool cache) { } else { stream = _stream; } - CHECK_GPU_ERROR( - cudaMemcpyAsync(_p_d_k_cache + _layer_id * _max_batch_dim, _p_d_k, - _batch_token_num * _tw._hidden_size * sizeof(_DataType), - cudaMemcpyDeviceToDevice, stream)); - CHECK_GPU_ERROR( - cudaMemcpyAsync(_p_d_v_cache + _layer_id * _max_batch_dim, _p_d_v, - _batch_token_num * _tw._hidden_size * sizeof(_DataType), - cudaMemcpyDeviceToDevice, stream)); + CHECK_GPU_ERROR(cudaMemcpyAsync( + _p_d_self_k_cache2[_layer_id], _p_d_self_k_cache1[_layer_id], + _batch_token_num * _tw._hidden_size * sizeof(_DataType), + cudaMemcpyDeviceToDevice, stream)); + CHECK_GPU_ERROR(cudaMemcpyAsync( + _p_d_self_v_cache2[_layer_id], _p_d_self_v_cache1[_layer_id], + _batch_token_num * _tw._hidden_size * sizeof(_DataType), + cudaMemcpyDeviceToDevice, stream)); } /* ---step 2. 
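In the correlation step that follows, both Q and the cached K are already int8 (re-quantized with the clip value stored at _enc_clip_max[_layer_id * 12 + 11]), so the batched GEMM produces raw int32 scores and ker_correlation_softmax_gpt_i32I has to fold the two dequantization factors plus the attention scale back in before the masked softmax. A NumPy sketch of that computation, assuming a single head, a 1/sqrt(dim_per_head) attention scale, and toy shapes:

import numpy as np

def int8_masked_softmax(q_i8, k_i8, clip_qk, d_head):
    # q_i8, k_i8: [seq, d_head] int8 tensors sharing one activation clip value
    dequant = clip_qk / 127.0
    scores = q_i8.astype(np.int32) @ k_i8.astype(np.int32).T          # int32 accumulator
    scores = scores.astype(np.float32) * dequant * dequant / np.sqrt(d_head)
    seq = scores.shape[0]
    future = np.triu(np.ones((seq, seq), dtype=bool), k=1)            # GPT only attends leftwards
    scores[future] = -np.inf
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)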
correlation = q * k, perform softmax on correlation--- */ CHECK_GPU_ERROR(cublasGemmStridedBatchedEx( _hd, CUBLAS_OP_T, CUBLAS_OP_N, _batch_seq_len, _batch_seq_len, - _tw._dim_per_head, &_atten_scaler, _p_d_k, _AType, _tw._dim_per_head, - _batch_seq_len * _tw._dim_per_head, _p_d_q, _BType, _tw._dim_per_head, - _batch_seq_len * _tw._dim_per_head, &_fzero, _p_d_c, _CType, - _batch_seq_len, _batch_seq_len * _batch_seq_len, - _batch_size * _tw._head_num, _computeType, + _tw._dim_per_head, &_ione, _p_d_self_k_cache1[_layer_id], CUDA_R_8I, + _tw._dim_per_head, _batch_seq_len * _tw._dim_per_head, _int8_ffn_in_buf, + CUDA_R_8I, _tw._dim_per_head, _batch_seq_len * _tw._dim_per_head, &_izero, + _int32_ffn_out_buf, CUDA_R_32I, _batch_seq_len, + _batch_seq_len * _batch_seq_len, _batch_size * _tw._head_num, CUDA_R_32I, CUBLAS_GEMM_DEFAULT_TENSOR_OP)); - ker_correlation_softmax_gpt_launcher<_DataType>(_batch_size, _batch_seq_len, - _tw._head_num, _stream, - _p_d_c, _p_d_real_seq_len); + ker_correlation_softmax_gpt_i32I_launcher<_DataType>( + _batch_size, _batch_seq_len, _tw._head_num, _stream, _int32_ffn_out_buf, + _p_d_c, _p_d_real_seq_len, _atten_scaler, + _enc_clip_max[_layer_id * 12 + 11] / _quant_range); /* ---step 3. new_q = correlation * v--- */ CHECK_GPU_ERROR(cublasGemmStridedBatchedEx( @@ -578,6 +582,12 @@ void QuantGptEncoder::self_attention(bool cache) { _int8_ffn_in_buf, _int8_p_d_enc_wei[_layer_id * 4 + 1], _cublas_lt_handle, _stream, false); +#ifdef DEBUG_RESULT + print_vec(_int8_ffn_in_buf, "attn out in", 20); + print_vec(_int8_p_d_enc_wei[_layer_id * 4 + 1], "attn out w", 20); + print_vec(_int8_ffn_out_buf, "attn out out", 20); +#endif + ker_residual_bias_ln_i8I_i8O_launcher<_DataType>( _int8_ffn_out_buf, _p_device_wei[_weight_offset + 6], _p_device_wei[_weight_offset + 7], _p_device_wei[_weight_offset + 11], @@ -754,6 +764,12 @@ void QuantGptEncoder::ffn_add_norm() { _int8_ffn_in_buf, _int8_p_d_enc_wei[_layer_id * 4 + 2], _cublas_lt_handle, _stream, false); +#ifdef DEBUG_RESULT + print_vec(_int8_ffn_in_buf, "ffn1 in", 20); + print_vec(_int8_p_d_enc_wei[_layer_id * 4 + 2], "ffn1 w", 20); + print_vec(_int8_ffn_out_buf, "ffn1 out", 20); +#endif + ker_bias_gelu_i8I_i8O_launcher<_DataType>( _batch_token_num, _stream, _int8_ffn_out_buf, _int8_ffn_in_buf, _p_device_wei[_weight_offset + 9], _tw._inner_size, @@ -767,6 +783,12 @@ void QuantGptEncoder::ffn_add_norm() { _int8_p_d_enc_wei[_layer_id * 4 + 3], _cublas_lt_handle, _stream, false); +#ifdef DEBUG_RESULT + print_vec(_int8_ffn_in_buf, "ffn2 in", 20); + print_vec(_int8_p_d_enc_wei[_layer_id * 4 + 3], "ffn2 w", 20); + print_vec(_int32_ffn_out_buf, "ffn2 out", 20); +#endif + const _DataType *scale_ptr, *bias_ptr, *res_bias_ptr; float clip_max, dequant_scale; dequant_scale = _enc_clip_max[_layer_id * 12 + 3] * @@ -835,6 +857,11 @@ void QuantGptEncoder::compute_ppl() { (_logits_clip_max * _quant_range), _int8_ffn_in_buf, _int8_p_d_src_emb_wei, _cublas_lt_handle, _stream, false); +#ifdef DEBUG_RESULT + print_vec(_int8_ffn_in_buf, "logits in", 20); + print_vec(_int8_p_d_src_emb_wei, "logits w", 20); + print_vec(_int8_ffn_out_buf, "logits out", 20); +#endif /* ---step 2. 
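The perplexity path picked up below works on int8 logits, so compared with a float model the only extra step is dequantizing by _logits_clip_max / _quant_range before the log-softmax. A sketch of the per-sentence quantity (average next-token negative log-likelihood; exponentiating it gives the conventional perplexity), with hypothetical inputs:

import numpy as np

def nll_from_int8_logits(logits_i8, token_ids, logits_clip_max):
    # logits_i8: [seq, vocab] int8 output of the final vocabulary projection
    # token_ids: [seq] ids of the sentence being scored
    logits = logits_i8.astype(np.float32) * logits_clip_max / 127.0   # dequantize
    logits -= logits.max(axis=-1, keepdims=True)
    logprobs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    nll = -logprobs[np.arange(len(token_ids) - 1), token_ids[1:]]     # score each next token
    return nll.mean()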
compute language model ppl--- */ ker_ppl_i8I_launcher(_batch_size, _batch_seq_len, _max_thread_per_block, diff --git a/lightseq/inference/model/quant_gpt_encoder.h b/lightseq/inference/model/quant_gpt_encoder.h index 1d2ad883..a84cc830 100644 --- a/lightseq/inference/model/quant_gpt_encoder.h +++ b/lightseq/inference/model/quant_gpt_encoder.h @@ -78,6 +78,12 @@ class QuantGptEncoder { int8_t *_int8_ffn_in_buf; int32_t *_int32_ffn_out_buf; int8_t *_int8_ffn_out_buf; + std::vector _p_d_self_k_cache; + std::vector _p_d_self_v_cache; + int8_t **_p_d_self_k_cache1; + int8_t **_p_d_self_k_cache2; + int8_t **_p_d_self_v_cache1; + int8_t **_p_d_self_v_cache2; // {token_emb, pos_emb, norm_scale, norm_bias} const std::vector &_p_d_src_emb_wei; diff --git a/lightseq/training/ops/pytorch/torch_transformer_layers.py b/lightseq/training/ops/pytorch/torch_transformer_layers.py index 49376b92..9f141d8d 100644 --- a/lightseq/training/ops/pytorch/torch_transformer_layers.py +++ b/lightseq/training/ops/pytorch/torch_transformer_layers.py @@ -379,6 +379,7 @@ def forward( else: attn = attn.transpose(0, 1).contiguous().view(tgt_len, bsz, embed_dim) attn = self.out_proj(attn) + attn_weights: Optional[Tensor] = None if need_weights: attn_weights = attn_weights_float.view( @@ -861,7 +862,6 @@ def forward( Returns: encoded output of shape `(seq_len, batch, embed_dim)` """ - if need_head_weights: need_attn = True x = x.transpose(0, 1) From d3a58073d81bb07c8012ab00aef1ee5165928e8f Mon Sep 17 00:00:00 2001 From: "weiyang.god" Date: Wed, 27 Apr 2022 20:22:55 +0800 Subject: [PATCH 40/49] support quant gpt inference (sampling) --- examples/inference/cpp/CMakeLists.txt | 3 + examples/inference/cpp/gpt_example.cc | 2 +- examples/inference/cpp/quant_gpt_example.cc | 65 ++ .../export/huggingface/hf_bart_export.py | 3 +- .../export/huggingface/hf_gpt2_export.py | 3 +- .../ls_torch_hf_quant_gpt2_export.py | 3 +- examples/inference/python/test/ls_gpt2.py | 8 +- .../inference/python/test/ls_quant_gpt2.py | 14 +- .../inference/kernels/gptKernels_int8.cc.cu | 608 ++++++++++++++++++ lightseq/inference/kernels/gptKernels_int8.h | 32 + .../inference/model/quant_gpt_encoder.cc.cu | 337 ++++------ 11 files changed, 862 insertions(+), 216 deletions(-) create mode 100644 examples/inference/cpp/quant_gpt_example.cc diff --git a/examples/inference/cpp/CMakeLists.txt b/examples/inference/cpp/CMakeLists.txt index e2f1b630..dbf92330 100644 --- a/examples/inference/cpp/CMakeLists.txt +++ b/examples/inference/cpp/CMakeLists.txt @@ -15,5 +15,8 @@ target_link_libraries(quant_bert_example PUBLIC liblightseq) add_executable(gpt_example gpt_example.cc) target_link_libraries(gpt_example PUBLIC liblightseq) +add_executable(quant_gpt_example quant_gpt_example.cc) +target_link_libraries(quant_gpt_example PUBLIC liblightseq) + add_executable(transformer_decoder_example decoder_example.cc.cu) target_link_libraries(transformer_decoder_example PUBLIC transformer_model) diff --git a/examples/inference/cpp/gpt_example.cc b/examples/inference/cpp/gpt_example.cc index c1defe1a..79e86e8c 100644 --- a/examples/inference/cpp/gpt_example.cc +++ b/examples/inference/cpp/gpt_example.cc @@ -58,7 +58,7 @@ int main(int argc, char* argv[]) { } std::cout << std::endl; - lightseq::cuda::print_vec(d_output, "output", 5); + lightseq::cuda::print_vec(d_output, "output", 10); } return 0; diff --git a/examples/inference/cpp/quant_gpt_example.cc b/examples/inference/cpp/quant_gpt_example.cc new file mode 100644 index 00000000..cec015a8 --- /dev/null +++ 
b/examples/inference/cpp/quant_gpt_example.cc @@ -0,0 +1,65 @@ +#include "model_base.h" +#include "gpt.h" + +/** +@file +Example of how to run gpt inference using our implementation. +*/ + +int main(int argc, char* argv[]) { + std::string model_weights_path = argv[1]; + int max_batch_size = 128; + + auto model = lightseq::cuda::LSModelFactory::GetInstance().CreateModel( + "QuantGpt", model_weights_path, max_batch_size); + + int batch_size = 1; + int batch_seq_len = 5; + std::vector host_input = {3666, 1438, 318, 402, 11571}; + + void* d_input; + lightseq::cuda::CHECK_GPU_ERROR( + cudaMalloc(&d_input, sizeof(int) * batch_size * batch_seq_len)); + lightseq::cuda::CHECK_GPU_ERROR(cudaMemcpy( + d_input, host_input.data(), sizeof(int) * batch_size * batch_seq_len, + cudaMemcpyHostToDevice)); + + model->set_input_ptr(0, d_input); + model->set_input_shape(0, {batch_size, batch_seq_len}); + + for (int i = 0; i < model->get_output_size(); i++) { + void* d_output; + std::vector shape = model->get_output_max_shape(i); + int total_size = 1; + for (int j = 0; j < shape.size(); j++) { + total_size *= shape[j]; + } + lightseq::cuda::CHECK_GPU_ERROR( + cudaMalloc(&d_output, total_size * sizeof(int))); + model->set_output_ptr(i, d_output); + } + lightseq::cuda::CHECK_GPU_ERROR(cudaStreamSynchronize(0)); + std::cout << "infer preprocessing finished" << std::endl; + + /* ---step5. infer and log--- */ + for (int i = 0; i < 10; i++) { + auto start = std::chrono::high_resolution_clock::now(); + model->Infer(); + lightseq::cuda::print_time_duration(start, "one infer time", 0); + } + + for (int i = 0; i < model->get_output_size(); i++) { + const int* d_output; + d_output = static_cast(model->get_output_ptr(i)); + std::vector shape = model->get_output_shape(i); + std::cout << "output shape: "; + for (int j = 0; j < shape.size(); j++) { + std::cout << shape[j] << " "; + } + std::cout << std::endl; + + lightseq::cuda::print_vec(d_output, "output", 10); + } + + return 0; +} diff --git a/examples/inference/python/export/huggingface/hf_bart_export.py b/examples/inference/python/export/huggingface/hf_bart_export.py index 5da8102f..d4f6e519 100644 --- a/examples/inference/python/export/huggingface/hf_bart_export.py +++ b/examples/inference/python/export/huggingface/hf_bart_export.py @@ -514,7 +514,8 @@ def _print_pair(key, value): if __name__ == "__main__": args = parse_args() - assert args.generation_method in ["beam_search", "topk", "topp", "topk_greedy"] + if args.generation_method not in ["beam_search", "topk", "topp", "topk_greedy"]: + args.generation_method = "beam_search" # if save_proto is True, extension .pb will be added, otherwise .hdf5 is added output_lightseq_model_name = "lightseq_bart_base" # you can rename it to "lightseq_bart_large" for large model input_huggingface_bart_model = ( diff --git a/examples/inference/python/export/huggingface/hf_gpt2_export.py b/examples/inference/python/export/huggingface/hf_gpt2_export.py index de6ff483..aa559a10 100644 --- a/examples/inference/python/export/huggingface/hf_gpt2_export.py +++ b/examples/inference/python/export/huggingface/hf_gpt2_export.py @@ -148,7 +148,8 @@ def _print_pair(key, value): if __name__ == "__main__": args = parse_args() - assert args.generation_method in ["topk", "topp", "ppl"] + if args.generation_method not in ["topk", "topp", "ppl"]: + args.generation_method = "topk" output_lightseq_model_name = "lightseq_gpt2_base" # or "lightseq_gpt2_large" input_huggingface_gpt_model = "gpt2" # or "gpt2-large" head_number = 12 # 20 for "gpt2-large" diff 
--git a/examples/inference/python/export/huggingface/ls_torch_hf_quant_gpt2_export.py b/examples/inference/python/export/huggingface/ls_torch_hf_quant_gpt2_export.py index b2547a37..b42bb3c8 100644 --- a/examples/inference/python/export/huggingface/ls_torch_hf_quant_gpt2_export.py +++ b/examples/inference/python/export/huggingface/ls_torch_hf_quant_gpt2_export.py @@ -198,7 +198,8 @@ def _print_pair(key, value): if __name__ == "__main__": args = parse_args() - assert args.generation_method in ["topk", "topp", "ppl"] + if args.generation_method not in ["topk", "topp", "ppl"]: + args.generation_method = "topk" model_name = ".".join(args.model.split(".")[:-1]) hdf5_path = f"{model_name}.hdf5" diff --git a/examples/inference/python/test/ls_gpt2.py b/examples/inference/python/test/ls_gpt2.py index 20167aef..dd595e38 100644 --- a/examples/inference/python/test/ls_gpt2.py +++ b/examples/inference/python/test/ls_gpt2.py @@ -150,10 +150,10 @@ def main(): # lightseq gpt perplexity supports batch infer with different lengths, # but sampling doesn't support sentences = [ - "My name is GPT", - "My name is GPT", - "My name is GPT", - "My name is GPT", + "I love you, but you", + "I love you, but you", + "I love you, but you", + "I love you, but you", ] print("====================START warmup====================") diff --git a/examples/inference/python/test/ls_quant_gpt2.py b/examples/inference/python/test/ls_quant_gpt2.py index 33b863e1..e65dc89f 100644 --- a/examples/inference/python/test/ls_quant_gpt2.py +++ b/examples/inference/python/test/ls_quant_gpt2.py @@ -118,7 +118,7 @@ def warmup( if generation_method == "topk" or generation_method == "topp": ls_generate(ls_model, ls_tokenizer, ls_inputs) - hf_generate(hf_model, hf_tokenizer, hf_inputs) + # hf_generate(hf_model, hf_tokenizer, hf_inputs) elif generation_method == "ppl": ls_ppl(ls_model, ls_tokenizer, ls_inputs) hf_ppl(hf_model, hf_tokenizer, hf_inputs) @@ -185,6 +185,8 @@ def inject_ls_layer(model, config): def main(): args = parse_args() + if args.generation_method not in ["topk", "topp", "ppl"]: + args.generation_method = "topk" model_name = ".".join(args.model.split(".")[:-1]) ckpt_path = f"{model_name}.bin" @@ -216,10 +218,10 @@ def main(): # lightseq gpt perplexity supports batch infer with different lengths, # but sampling doesn't support sentences = [ - "My name is GPT", - "My name is GPT", - "My name is GPT", - "My name is GPT", + "I love you, but you", + "I love you, but you", + "I love you, but you", + "I love you, but you", ] print("====================START warmup====================") @@ -239,7 +241,7 @@ def main(): if args.generation_method == "topk" or args.generation_method == "topp": ls_generate(ls_model, ls_tokenizer, ls_inputs) - hf_generate(hf_model, hf_tokenizer, hf_inputs) + # hf_generate(hf_model, hf_tokenizer, hf_inputs) elif args.generation_method == "ppl": ls_ppl(ls_model, ls_tokenizer, ls_inputs) hf_ppl(hf_model, hf_tokenizer, hf_inputs) diff --git a/lightseq/inference/kernels/gptKernels_int8.cc.cu b/lightseq/inference/kernels/gptKernels_int8.cc.cu index 319fcffa..6fc89d93 100644 --- a/lightseq/inference/kernels/gptKernels_int8.cc.cu +++ b/lightseq/inference/kernels/gptKernels_int8.cc.cu @@ -11,6 +11,12 @@ Currently, fp16 and fp32 versions are provided */ namespace lightseq { namespace cuda { +__forceinline__ __device__ int8_t float2int8(float x, float quant_scale) { + float i8_f = x * quant_scale; + int32_t i8 = floorf(i8_f + 0.5); + i8 = i8 < -127 ? -127 : (i8 > 127 ? 
127 : i8); + return int8_t(i8); +} template __global__ void ker_gpt_embedding_int8(const int8_t* token_emb, @@ -243,5 +249,607 @@ template void ker_correlation_softmax_gpt_i32I_launcher<__half>( int32_t* correlation, __half* output, const int* real_seq_len, float attn_scale, float dequant_scale); +template +__global__ void ker_topk_sample_i8I(const int8_t* logits, int* old_input_ids, + int* new_input_ids, const int* real_seq_len, + const int vocab_size, + const int batch_seq_len, int logits_seq_len, + int* unfinished, curandState* curandstate, + int eos_id, float dequant_scale, + bool in_col32) { + int last_token_idx_in_batch = blockIdx.x * batch_seq_len + batch_seq_len - 1; + + /* add EOS to end if last token is EOS */ + if (old_input_ids[last_token_idx_in_batch] == eos_id) { + int left_token_idx = blockIdx.x * batch_seq_len + threadIdx.x; + int right_token_idx = (blockIdx.x + 1) * batch_seq_len; + for (int idx = left_token_idx; idx < right_token_idx; idx += blockDim.x) { + int new_idx = idx + blockIdx.x; + new_input_ids[new_idx] = old_input_ids[idx]; + } + if (threadIdx.x == 0) { + // blockIdx.x * (batch_seq_len+1) + batch_seq_len + new_input_ids[(blockIdx.x + 1) * (batch_seq_len + 1) - 1] = eos_id; + old_input_ids[gridDim.x * batch_seq_len + blockIdx.x] = eos_id; + } + return; + } + int logits_token_idx_in_batch = + blockIdx.x * logits_seq_len + logits_seq_len - 1; + int left_logit_idx = logits_token_idx_in_batch * vocab_size + threadIdx.x; + int right_logit_idx = (logits_token_idx_in_batch + 1) * vocab_size; + + /* + step1. find max logit and rough Kth logit over the whole vocab + */ + __shared__ float s_max_logit, s_topk_logit; + float rough_top_kth_logit = CUDA_FLOAT_INF_NEG; + for (int idx = left_logit_idx; idx < right_logit_idx; idx += blockDim.x) { + int logits_idx; + if (in_col32) { + int row_id = logits_token_idx_in_batch; + int col_id = idx - logits_token_idx_in_batch * vocab_size; + logits_idx = row_major2flat_col32(row_id, col_id, + gridDim.x * logits_seq_len, vocab_size); + } else { + logits_idx = idx; + } + rough_top_kth_logit = + fmaxf(rough_top_kth_logit, (float)logits[logits_idx] * dequant_scale); + } + float max_logit = blockReduceMax(rough_top_kth_logit); + rough_top_kth_logit = blockRoughTopK(rough_top_kth_logit); + if (threadIdx.x == 0) { + s_topk_logit = rough_top_kth_logit; + s_max_logit = max_logit; + } + __syncthreads(); + + __shared__ int s_tid; + + if (k != 1) { + /* step2 hold one logit per thread which larger than Kth logit and sample + * from them */ + float topk_exp_sum, topk_exp = CUDA_FLOAT_INF_NEG; + int topk_tid = vocab_size; + int test_num = 0; + __shared__ float s_topk_exp_sum; + for (int idx = left_logit_idx; idx < right_logit_idx; idx += blockDim.x) { + int logits_idx; + if (in_col32) { + int row_id = logits_token_idx_in_batch; + int col_id = idx - logits_token_idx_in_batch * vocab_size; + logits_idx = row_major2flat_col32( + row_id, col_id, gridDim.x * logits_seq_len, vocab_size); + } else { + logits_idx = idx; + } + float logit = (float)logits[logits_idx] * dequant_scale; + float logit_exp = expf(fmaxf(logit - s_max_logit, logit_thresh_min)); + if (logit >= s_topk_logit) test_num++; + if (logit >= s_topk_logit && logit_exp > topk_exp) { + topk_exp = logit_exp; + topk_tid = idx - left_logit_idx + threadIdx.x; + } + } + + test_num = blockReduceSum(test_num); + + if (topk_tid == vocab_size) topk_exp = 0; + topk_exp_sum = blockReduceSum(topk_exp); + if (threadIdx.x == 0) { + s_topk_exp_sum = topk_exp_sum; + } + __syncthreads(); + + /* calculate 
cumulative probability */ + float topk_prob = topk_exp / s_topk_exp_sum; + float prefix_sum_prob; + typedef cub::BlockScan BlockScan; + __shared__ typename BlockScan::TempStorage temp_storage; + BlockScan(temp_storage).InclusiveSum(topk_prob, prefix_sum_prob); + + __shared__ float random_x; + if (threadIdx.x == 0) { + random_x = curand_uniform(curandstate + blockIdx.x); + } + __syncthreads(); + + if (threadIdx.x == 0) { + s_tid = vocab_size; + } + __syncthreads(); + + int threadID = threadIdx.x; + __shared__ int s_threadID; + __shared__ float s_max_prob; + if (random_x > prefix_sum_prob) threadID = blockDim.x; + threadID = blockReduceMin(threadID); + float max_prob = blockReduceMax(topk_prob); + if (threadIdx.x == 0) { + s_threadID = threadID; + s_max_prob = max_prob; + } + __syncthreads(); + if (threadIdx.x == s_threadID) { + s_tid = topk_tid; + } + __syncthreads(); + + if (s_tid == vocab_size && topk_prob == s_max_prob) { + s_tid = topk_tid; + } + __syncthreads(); + } else { + s_tid = vocab_size; + for (int idx = left_logit_idx; idx < right_logit_idx; idx += blockDim.x) { + int logits_idx; + if (in_col32) { + int row_id = logits_token_idx_in_batch; + int col_id = idx - logits_token_idx_in_batch * vocab_size; + logits_idx = row_major2flat_col32( + row_id, col_id, gridDim.x * logits_seq_len, vocab_size); + } else { + logits_idx = idx; + } + float logit = (float)logits[logits_idx] * dequant_scale; + if (logit == s_max_logit) { + s_tid = idx - left_logit_idx + threadIdx.x; + } + } + __syncthreads(); + } + + /* if new sampled tid is not EOS, set unfinish TRUE */ + if (threadIdx.x == 0) { + if (s_tid != eos_id) unfinished[0] = 1; + } + + /* step3 copy old_input_ids to new_input_ids and add new sampled ids */ + int left_token_idx = blockIdx.x * batch_seq_len + threadIdx.x; + int right_token_idx = (blockIdx.x + 1) * batch_seq_len; + for (int idx = left_token_idx; idx < right_token_idx; idx += blockDim.x) { + int new_idx = idx + blockIdx.x; + new_input_ids[new_idx] = old_input_ids[idx]; + } + if (threadIdx.x == 0) { + new_input_ids[(blockIdx.x + 1) * (batch_seq_len + 1) - 1] = s_tid; + // save the newly sampled ids to old_input_ids for next step inputs + old_input_ids[gridDim.x * batch_seq_len + blockIdx.x] = s_tid; + } +} + +void ker_topk_sample_i8I_launcher(int batch_size, int batch_seq_len, + int logits_seq_len, int max_thread_per_block, + cudaStream_t stream, const int8_t* logits, + int* old_input_ids, int* new_input_ids, + const int* real_seq_len, const int vocab_size, + const int k, int* unfinished, + curandState* curandstate, int eos_id, + float dequant_scale, bool in_col32) { + if (k == 1) + ker_topk_sample_i8I<1><<>>( + logits, old_input_ids, new_input_ids, real_seq_len, vocab_size, + batch_seq_len, logits_seq_len, unfinished, curandstate, eos_id, + dequant_scale, in_col32); + else if (k == 2) + ker_topk_sample_i8I<2><<>>( + logits, old_input_ids, new_input_ids, real_seq_len, vocab_size, + batch_seq_len, logits_seq_len, unfinished, curandstate, eos_id, + dequant_scale, in_col32); + else if (k == 4) + ker_topk_sample_i8I<4><<>>( + logits, old_input_ids, new_input_ids, real_seq_len, vocab_size, + batch_seq_len, logits_seq_len, unfinished, curandstate, eos_id, + dequant_scale, in_col32); + else if (k == 8) + ker_topk_sample_i8I<8><<>>( + logits, old_input_ids, new_input_ids, real_seq_len, vocab_size, + batch_seq_len, logits_seq_len, unfinished, curandstate, eos_id, + dequant_scale, in_col32); + else if (k == 16) + ker_topk_sample_i8I<16><<>>( + logits, old_input_ids, new_input_ids, 
real_seq_len, vocab_size, + batch_seq_len, logits_seq_len, unfinished, curandstate, eos_id, + dequant_scale, in_col32); + else if (k == 32) + ker_topk_sample_i8I<32><<>>( + logits, old_input_ids, new_input_ids, real_seq_len, vocab_size, + batch_seq_len, logits_seq_len, unfinished, curandstate, eos_id, + dequant_scale, in_col32); + else { + throw std::invalid_argument("topk argument should be in [1,2,4,8,16,32]"); + } +} + +__global__ void ker_topp_sample_i8I(const int8_t* logits, int* old_input_ids, + int* new_input_ids, const int* real_seq_len, + const int vocab_size, + const int batch_seq_len, int logits_seq_len, + int* unfinished, float p, + curandState* curandstate, int eos_id, + float dequant_scale, bool in_col32) { + int token_idx_in_batch = blockIdx.x * batch_seq_len + batch_seq_len - 1; + + /* add EOS to end if last token is EOS */ + if (old_input_ids[token_idx_in_batch] == eos_id) { + int left_token_idx = blockIdx.x * batch_seq_len + threadIdx.x; + int right_token_idx = (blockIdx.x + 1) * batch_seq_len; + for (int idx = left_token_idx; idx < right_token_idx; idx += blockDim.x) { + int new_idx = idx + blockIdx.x; + new_input_ids[new_idx] = old_input_ids[idx]; + } + if (threadIdx.x == 0) { + new_input_ids[(blockIdx.x + 1) * (batch_seq_len + 1) - 1] = eos_id; + old_input_ids[gridDim.x * batch_seq_len + blockIdx.x] = eos_id; + } + return; + } + int logits_token_idx_in_batch = + blockIdx.x * logits_seq_len + logits_seq_len - 1; + int left_logit_idx = logits_token_idx_in_batch * vocab_size + threadIdx.x; + int right_logit_idx = (logits_token_idx_in_batch + 1) * vocab_size; + + /* + step1. find max logit in each thread and sample from these probs with nucleus + sampling + */ + __shared__ float s_max_logit; + float max_logit = CUDA_FLOAT_INF_NEG; + for (int idx = left_logit_idx; idx < right_logit_idx; idx += blockDim.x) { + int logits_idx; + if (in_col32) { + int row_id = logits_token_idx_in_batch; + int col_id = idx - logits_token_idx_in_batch * vocab_size; + logits_idx = row_major2flat_col32(row_id, col_id, + gridDim.x * logits_seq_len, vocab_size); + } else { + logits_idx = idx; + } + max_logit = fmaxf(max_logit, (float)logits[logits_idx] * dequant_scale); + } + float max_logit_array[1]; + max_logit_array[0] = max_logit; + typedef cub::BlockRadixSort BlockRadixSort; + __shared__ typename BlockRadixSort::TempStorage sort_temp_storage; + BlockRadixSort(sort_temp_storage).SortDescending(max_logit_array); + float presum_max_logit_exp; + max_logit = max_logit_array[0]; + + float block_max_logit = blockReduceMax(max_logit); + if (threadIdx.x == 0) { + s_max_logit = block_max_logit; + } + __syncthreads(); + + float biased_logit_exp = + expf(fmaxf(max_logit - s_max_logit, logit_thresh_min)); + + typedef cub::BlockScan BlockScan; + __shared__ typename BlockScan::TempStorage presum_temp_storage; + BlockScan(presum_temp_storage) + .InclusiveSum(biased_logit_exp, presum_max_logit_exp); + + float topp_exp_threshold; + if (threadIdx.x == blockDim.x - 1) { + topp_exp_threshold = p * presum_max_logit_exp; + } + __shared__ float s_presum_logit_exp_threshold; + if (presum_max_logit_exp > topp_exp_threshold) { + presum_max_logit_exp = CUDA_FLOAT_INF_NEG; + } + float logit_exp_threshold = blockReduceMax(presum_max_logit_exp); + if (threadIdx.x == 0) { + s_presum_logit_exp_threshold = logit_exp_threshold; + } + __syncthreads(); + + __shared__ float s_logit_threshold; + if (presum_max_logit_exp == s_presum_logit_exp_threshold) { + s_logit_threshold = max_logit; + } + __syncthreads(); + + /* step2 hold one 
logit per thread and sample + * from them */ + float topk_exp_sum, topk_exp = CUDA_FLOAT_INF_NEG; + int topk_tid = vocab_size; + int test_num = 0; + __shared__ float s_topk_exp_sum; + for (int idx = left_logit_idx; idx < right_logit_idx; idx += blockDim.x) { + int logits_idx; + if (in_col32) { + int row_id = logits_token_idx_in_batch; + int col_id = idx - logits_token_idx_in_batch * vocab_size; + logits_idx = row_major2flat_col32(row_id, col_id, + gridDim.x * logits_seq_len, vocab_size); + } else { + logits_idx = idx; + } + float logit = (float)logits[logits_idx] * dequant_scale; + float logit_exp = expf(fmaxf(logit - s_max_logit, logit_thresh_min)); + if (logit >= s_logit_threshold) test_num++; + if (logit >= s_logit_threshold && logit_exp > topk_exp) { + topk_exp = logit_exp; + topk_tid = idx - left_logit_idx + threadIdx.x; + } + } + + test_num = blockReduceSum(test_num); + + if (topk_tid == vocab_size) topk_exp = 0; + topk_exp_sum = blockReduceSum(topk_exp); + if (threadIdx.x == 0) { + s_topk_exp_sum = topk_exp_sum; + } + __syncthreads(); + + /* calculate cumulative probability */ + float topk_prob = topk_exp / s_topk_exp_sum; + float prefix_sum_prob; + BlockScan(presum_temp_storage).InclusiveSum(topk_prob, prefix_sum_prob); + + __shared__ float random_x; + if (threadIdx.x == 0) { + random_x = curand_uniform(curandstate + blockIdx.x); + } + __syncthreads(); + + __shared__ int s_tid; + if (threadIdx.x == 0) { + s_tid = vocab_size; + } + __syncthreads(); + + int threadID = threadIdx.x; + __shared__ int s_threadID; + __shared__ float s_max_prob; + if (random_x > prefix_sum_prob) threadID = blockDim.x; + threadID = blockReduceMin(threadID); + float max_prob = blockReduceMax(topk_prob); + if (threadIdx.x == 0) { + s_threadID = threadID; + s_max_prob = max_prob; + } + __syncthreads(); + if (threadIdx.x == s_threadID) { + s_tid = topk_tid; + } + __syncthreads(); + + if (s_tid == vocab_size && topk_prob == s_max_prob) { + s_tid = topk_tid; + } + __syncthreads(); + + /* if new sampled tid is not EOS, set unfinish TRUE */ + if (threadIdx.x == 0) { + if (s_tid != eos_id) unfinished[0] = 1; + } + + /* step3 copy old_input_ids to new_input_ids and add new sampled ids */ + int left_token_idx = blockIdx.x * batch_seq_len + threadIdx.x; + int right_token_idx = (blockIdx.x + 1) * batch_seq_len; + for (int idx = left_token_idx; idx < right_token_idx; idx += blockDim.x) { + int new_idx = idx + blockIdx.x; + new_input_ids[new_idx] = old_input_ids[idx]; + } + if (threadIdx.x == 0) { + new_input_ids[(blockIdx.x + 1) * (batch_seq_len + 1) - 1] = s_tid; + // save the newly sampled ids to old_input_ids for next step inputs + old_input_ids[gridDim.x * batch_seq_len + blockIdx.x] = s_tid; + } +} + +void ker_topp_sample_i8I_launcher(int batch_size, int batch_seq_len, + int logits_seq_len, int max_thread_per_block, + cudaStream_t stream, const int8_t* logits, + int* old_input_ids, int* new_input_ids, + const int* real_seq_len, const int vocab_size, + const float p, int* unfinished, + curandState* curandstate, int eos_id, + float dequant_scale, bool in_col32) { + ker_topp_sample_i8I<<>>( + logits, old_input_ids, new_input_ids, real_seq_len, vocab_size, + batch_seq_len, logits_seq_len, unfinished, p, curandstate, eos_id, + dequant_scale, in_col32); +} + +template +__global__ void ker_arrange_qkv_with_cache_i8I_i8O( + const int8_t* ori_qkv, const T* qkv_bias, int8_t* new_q, int8_t* new_k, + int8_t* k_cache, int8_t* new_v, int8_t* v_cache, T* d_v, int batch_seq_len, + int dim_per_head, int head_num, float 
dequant_scale, float quant_scale, + bool in_col32) { + int hidden_size = head_num * dim_per_head; + int batch_size = gridDim.x / batch_seq_len; + int batch_id = blockIdx.x / batch_seq_len; + int token_id = blockIdx.x % batch_seq_len; + int head_id = threadIdx.x / dim_per_head; + int dim_id = threadIdx.x % dim_per_head; + int target_id = targetid_4dim(batch_id, head_id, token_id, dim_id, head_num, + batch_seq_len, dim_per_head); + int8_t new_val; + + if (token_id < batch_seq_len - 1) { + int old_target_id = + targetid_4dim(batch_id, head_id, token_id, dim_id, head_num, + batch_seq_len - 1, dim_per_head); + if (blockIdx.y == 0) return; + if (blockIdx.y == 1) new_val = k_cache[old_target_id]; + if (blockIdx.y == 2) new_val = v_cache[old_target_id]; + } else { + int qkv_index; + if (in_col32) { + int row_id = batch_id; + int col_id = blockIdx.y * hidden_size + threadIdx.x; + qkv_index = row_major2flat_col32(row_id, col_id, batch_size, + gridDim.y * hidden_size); + } else { + qkv_index = + (batch_id * gridDim.y + blockIdx.y) * hidden_size + threadIdx.x; + } + float tmp_val = float(ori_qkv[qkv_index]) * dequant_scale + + __ldg(&qkv_bias[blockIdx.y * hidden_size + threadIdx.x]); + new_val = float2int8(tmp_val, quant_scale); + if (blockIdx.y == 0) { + target_id = targetid_4dim(batch_id, head_id, 0, dim_id, head_num, 1, + dim_per_head); + } + } + + if (blockIdx.y == 0) new_q[target_id] = new_val; + if (blockIdx.y == 1) new_k[target_id] = new_val; + if (blockIdx.y == 2) { + new_v[target_id] = new_val; + d_v[target_id] = float(new_val) / quant_scale; + } +} + +template <> +__global__ void ker_arrange_qkv_with_cache_i8I_i8O<__half>( + const int8_t* ori_qkv, const __half* qkv_bias, int8_t* new_q, int8_t* new_k, + int8_t* k_cache, int8_t* new_v, int8_t* v_cache, __half* d_v, + int batch_seq_len, int dim_per_head, int head_num, float dequant_scale, + float quant_scale, bool in_col32) { + int hidden_size = head_num * dim_per_head; + int batch_size = gridDim.x / batch_seq_len; + int batch_id = blockIdx.x / batch_seq_len; + int token_id = blockIdx.x % batch_seq_len; + int head_id = threadIdx.x / dim_per_head; + int dim_id = threadIdx.x % dim_per_head; + int target_id = targetid_4dim(batch_id, head_id, token_id, dim_id, head_num, + batch_seq_len, dim_per_head); + int8_t new_val; + + if (token_id < batch_seq_len - 1) { + int old_target_id = + targetid_4dim(batch_id, head_id, token_id, dim_id, head_num, + batch_seq_len - 1, dim_per_head); + if (blockIdx.y == 0) return; + if (blockIdx.y == 1) new_val = k_cache[old_target_id]; + if (blockIdx.y == 2) new_val = v_cache[old_target_id]; + } else { + int qkv_index; + if (in_col32) { + int row_id = batch_id; + int col_id = blockIdx.y * hidden_size + threadIdx.x; + qkv_index = row_major2flat_col32(row_id, col_id, batch_size, + gridDim.y * hidden_size); + } else { + qkv_index = + (batch_id * gridDim.y + blockIdx.y) * hidden_size + threadIdx.x; + } + float tmp_val = + float(ori_qkv[qkv_index]) * dequant_scale + + __half2float(__ldg(&qkv_bias[blockIdx.y * hidden_size + threadIdx.x])); + new_val = float2int8(tmp_val, quant_scale); + if (blockIdx.y == 0) { + target_id = targetid_4dim(batch_id, head_id, 0, dim_id, head_num, 1, + dim_per_head); + } + } + + if (blockIdx.y == 0) new_q[target_id] = new_val; + if (blockIdx.y == 1) new_k[target_id] = new_val; + if (blockIdx.y == 2) { + new_v[target_id] = new_val; + d_v[target_id] = __float2half(float(new_val) / quant_scale); + } +} + +template +void ker_arrange_qkv_with_cache_i8I_i8O_launcher( + int batch_token_num, int 
hidden_size, cudaStream_t stream, + const int8_t* ori_qkv, const T* qkv_bias, int8_t* new_q, int8_t* new_k, + int8_t* k_cache, int8_t* new_v, int8_t* v_cache, T* d_v, int batch_seq_len, + int dim_per_head, int head_num, float dequant_scale, float quant_scale, + bool in_col32) { + ker_arrange_qkv_with_cache_i8I_i8O + <<>>( + ori_qkv, qkv_bias, new_q, new_k, k_cache, new_v, v_cache, d_v, + batch_seq_len, dim_per_head, head_num, dequant_scale, quant_scale, + in_col32); +} + +template <> +void ker_arrange_qkv_with_cache_i8I_i8O_launcher<__half>( + int batch_token_num, int hidden_size, cudaStream_t stream, + const int8_t* ori_qkv, const __half* qkv_bias, int8_t* new_q, int8_t* new_k, + int8_t* k_cache, int8_t* new_v, int8_t* v_cache, __half* d_v, + int batch_seq_len, int dim_per_head, int head_num, float dequant_scale, + float quant_scale, bool in_col32) { + ker_arrange_qkv_with_cache_i8I_i8O<__half> + <<>>( + ori_qkv, qkv_bias, new_q, new_k, k_cache, new_v, v_cache, d_v, + batch_seq_len, dim_per_head, head_num, dequant_scale, quant_scale, + in_col32); +} + +template void ker_arrange_qkv_with_cache_i8I_i8O_launcher( + int batch_token_num, int hidden_size, cudaStream_t stream, + const int8_t* ori_qkv, const float* qkv_bias, int8_t* new_q, int8_t* new_k, + int8_t* k_cache, int8_t* new_v, int8_t* v_cache, float* d_v, + int batch_seq_len, int dim_per_head, int head_num, float dequant_scale, + float quant_scale, bool in_col32); + +template void ker_arrange_qkv_with_cache_i8I_i8O_launcher<__half>( + int batch_token_num, int hidden_size, cudaStream_t stream, + const int8_t* ori_qkv, const __half* qkv_bias, int8_t* new_q, int8_t* new_k, + int8_t* k_cache, int8_t* new_v, int8_t* v_cache, __half* d_v, + int batch_seq_len, int dim_per_head, int head_num, float dequant_scale, + float quant_scale, bool in_col32); + +template +__global__ void ker_attention_mask_weights_i32I( + int32_t* correlation, T* output, const int* real_seq_len, int dst_seq_len, + int src_seq_len, float attn_scale, float dequant_scale) { + int query_token_pos = blockIdx.y % dst_seq_len + src_seq_len - dst_seq_len; + if (query_token_pos >= real_seq_len[blockIdx.x]) { + return; + } + int mask = 0; // can see the token when mask=0 + if (threadIdx.x > query_token_pos) { + mask = 1; // Can only see the token on the left side of it + } + + int idx = (blockIdx.x * gridDim.y + blockIdx.y) * blockDim.x + threadIdx.x; + float val = + (float)correlation[idx] * attn_scale * dequant_scale * dequant_scale; + float max_val = blockReduceMax(mask ? CUDA_FLOAT_INF_NEG : val); + __shared__ float smax; + if (threadIdx.x == 0) smax = max_val; + __syncthreads(); + + val = mask ? 
0.f : expf(fmaxf(logit_thresh_min, val - smax)); + float rsum = blockReduceSum(val); + __shared__ float ssum; + if (threadIdx.x == 0) ssum = rsum; + __syncthreads(); + + output[idx] = (T)(val / (ssum + epsilon)); +} + +template +void ker_attention_mask_weights_i32I_launcher( + int batch_size, int dst_seq_len, int src_seq_len, int head_num, + cudaStream_t stream, int32_t* correlation, T* output, + const int* real_seq_len, float attn_scale, float dequant_scale) { + ker_attention_mask_weights_i32I + <<>>( + correlation, output, real_seq_len, dst_seq_len, src_seq_len, + attn_scale, dequant_scale); +} + +template void ker_attention_mask_weights_i32I_launcher( + int batch_size, int dst_seq_len, int src_seq_len, int head_num, + cudaStream_t stream, int32_t* correlation, float* output, + const int* real_seq_len, float attn_scale, float dequant_scale); + +template void ker_attention_mask_weights_i32I_launcher<__half>( + int batch_size, int dst_seq_len, int src_seq_len, int head_num, + cudaStream_t stream, int32_t* correlation, __half* output, + const int* real_seq_len, float attn_scale, float dequant_scale); + } // namespace cuda } // namespace lightseq diff --git a/lightseq/inference/kernels/gptKernels_int8.h b/lightseq/inference/kernels/gptKernels_int8.h index aaf363f3..1e1822e0 100644 --- a/lightseq/inference/kernels/gptKernels_int8.h +++ b/lightseq/inference/kernels/gptKernels_int8.h @@ -27,5 +27,37 @@ void ker_correlation_softmax_gpt_i32I_launcher( int32_t* correlation, T* output, const int* real_seq_len, float attn_scale, float dequant_scale); +void ker_topk_sample_i8I_launcher(int batch_size, int batch_seq_len, + int logits_seq_len, int max_thread_per_block, + cudaStream_t stream, const int8_t* logits, + int* old_input_ids, int* new_input_ids, + const int* real_seq_len, const int vocab_size, + const int k, int* all_finished, + curandState* curandstate, int eos_id, + float dequant_scale, bool in_col32 = false); + +void ker_topp_sample_i8I_launcher(int batch_size, int batch_seq_len, + int logits_seq_len, int max_thread_per_block, + cudaStream_t stream, const int8_t* logits, + int* old_input_ids, int* new_input_ids, + const int* real_seq_len, const int vocab_size, + const float p, int* unfinished, + curandState* curandstate, int eos_id, + float dequant_scale, bool in_col32 = false); + +template +void ker_arrange_qkv_with_cache_i8I_i8O_launcher( + int batch_token_num, int hidden_size, cudaStream_t stream, + const int8_t* ori_qkv, const T* qkv_bias, int8_t* new_q, int8_t* new_k, + int8_t* k_cache, int8_t* new_v, int8_t* v_cache, T* d_v, int batch_seq_len, + int dim_per_head, int head_num, float dequant_scale, float quant_scale, + bool in_col32 = false); + +template +void ker_attention_mask_weights_i32I_launcher( + int batch_size, int dst_seq_len, int src_seq_len, int head_num, + cudaStream_t stream, int32_t* correlation, T* output, + const int* real_seq_len, float attn_scale, float dequant_scale); + } // namespace cuda } // namespace lightseq diff --git a/lightseq/inference/model/quant_gpt_encoder.cc.cu b/lightseq/inference/model/quant_gpt_encoder.cc.cu index 6499d112..7333165f 100644 --- a/lightseq/inference/model/quant_gpt_encoder.cc.cu +++ b/lightseq/inference/model/quant_gpt_encoder.cc.cu @@ -319,15 +319,11 @@ int QuantGptEncoder::run_one_sample(int batch_size, #endif // token embedding, add position embedding and layer_norm - ker_gpt_embedding_launcher<_DataType>( - _batch_size, _batch_seq_len, _tw._hidden_size, _stream, _p_device_emb[0], - _p_device_emb[1], _p_d_sample_id, _p_d_query, 
_p_d_real_seq_len, - _tw._padding_id, 0); - -#ifdef DEBUG_RESULT - print_vec(_p_d_query, "embedding", _batch_token_num * _tw._hidden_size - 10, - _batch_token_num * _tw._hidden_size); -#endif + ker_gpt_embedding_i8I_launcher<_DataType>( + _batch_size, _batch_seq_len, _tw._hidden_size, _stream, + _int8_p_d_src_emb_bottom_wei, _p_device_emb[1], _p_d_sample_id, + _p_d_query, _p_d_real_seq_len, _tw._padding_id, 0, + _src_emb_clip_max / _quant_range); for (_layer_id = 0; _layer_id < _tw._n_enc_layer; _layer_id++) { _weight_offset = _layer_id * _tw._weight_per_enc_layer; @@ -335,10 +331,6 @@ int QuantGptEncoder::run_one_sample(int batch_size, ffn_add_norm(); } - // last layer norm - ker_norm_layer_launcher<_DataType>(_batch_token_num, _tw._hidden_size, - _stream, _p_d_query, _p_device_emb[2], - _p_device_emb[3], _max_thread_per_block); if (sample_one_token() == 0 || _batch_seq_len >= _tw._max_step) { CHECK_GPU_ERROR(cudaMemcpyAsync(_p_d_sample_id_buf, _p_d_sample_id, _batch_token_num * sizeof(int), @@ -355,35 +347,19 @@ int QuantGptEncoder::run_one_sample(int batch_size, #endif // token embedding, add position embedding and layer_norm - ker_gpt_embedding_launcher<_DataType>( - _batch_size, 1, _tw._hidden_size, _stream, _p_device_emb[0], + ker_gpt_embedding_i8I_launcher<_DataType>( + batch_size, 1, _tw._hidden_size, _stream, _int8_p_d_src_emb_bottom_wei, _p_device_emb[1], _p_d_last_sample_id, _p_d_query, _p_d_real_seq_len, - _tw._padding_id, _batch_seq_len - 1); -#ifdef DEBUG_RESULT - print_vec(_p_d_query, "embedding", _batch_size * _tw._hidden_size - 10, - _batch_size * _tw._hidden_size); -#endif + _tw._padding_id, _batch_seq_len - 1, _src_emb_clip_max / _quant_range); + for (_layer_id = 0; _layer_id < _tw._n_enc_layer; _layer_id++) { _weight_offset = _layer_id * _tw._weight_per_enc_layer; self_attention_with_cache(); ffn_add_norm_with_cache(); } - // last layer norm - ker_norm_layer_launcher<_DataType>(_batch_size, _tw._hidden_size, _stream, - _p_d_query, _p_device_emb[2], - _p_device_emb[3], _max_thread_per_block); -#ifdef DEBUG_RESULT - - print_vec(_p_d_query, "_p_d_query before logits", - _batch_size * _tw._hidden_size - 10, - _batch_size * _tw._hidden_size); if (sample_one_token_with_cache() == 0 || _batch_seq_len >= _tw._max_step) break; -#else - if (sample_one_token_with_cache() == 0 || _batch_seq_len >= _tw._max_step) - break; -#endif } CHECK_GPU_ERROR(cudaMemcpyAsync(_p_d_sample_id_buf, _p_d_sample_id, @@ -397,35 +373,32 @@ int QuantGptEncoder::run_one_sample(int batch_size, template int QuantGptEncoder::sample_one_token() { /* ---step 1. project hidden states to vocab logits--- */ - CHECK_GPU_ERROR(cublasGemmEx( - _hd, CUBLAS_OP_T, CUBLAS_OP_N, _tw._src_vocab_size, _batch_token_num, - _tw._hidden_size, &_fone, _p_device_emb[0], _AType, _tw._hidden_size, - _p_d_query, _BType, _tw._hidden_size, &_fzero, _p_d_logit, _CType, - _tw._src_vocab_size, _computeType, CUBLAS_GEMM_DEFAULT_TENSOR_OP)); -#ifdef DEBUG_RESULT - print_vec(_p_d_logit, "logits", _batch_token_num * _tw._src_vocab_size - 10, - _batch_token_num * _tw._src_vocab_size); -#endif + cublasLtMM_withAlgo_i8IO(_int8_ffn_out_buf, 1, _batch_token_num, + _tw._src_vocab_size, _tw._hidden_size, 0, 0, 0, + _output_ln_clip_max * _src_emb_clip_max / + (_logits_clip_max * _quant_range), + _int8_ffn_in_buf, _int8_p_d_src_emb_wei, + _cublas_lt_handle, _stream, false); CHECK_GPU_ERROR(cudaMemsetAsync(_p_d_unfinished, 0, sizeof(int), _stream)); /* ---step 2. 
sample new tokens from logits */ if (_tw._sampling_method == "topk") { #ifdef DEBUG_RESULT std::cout << "sampling using topk\n"; #endif - ker_topk_sample_launcher<_DataType>( + ker_topk_sample_i8I_launcher( _batch_size, _batch_seq_len, _batch_seq_len, _max_thread_per_block, - _stream, _p_d_logit, _p_d_sample_id, _p_d_sample_id_buf, + _stream, _int8_ffn_out_buf, _p_d_sample_id, _p_d_sample_id_buf, _p_d_real_seq_len, _tw._src_vocab_size, _tw._topk, _p_d_unfinished, - _p_d_curandstate, _tw._eos_id); + _p_d_curandstate, _tw._eos_id, _logits_clip_max / _quant_range, true); } else { #ifdef DEBUG_RESULT std::cout << "sampling using topp\n"; #endif - ker_topp_sample_launcher<_DataType>( + ker_topp_sample_i8I_launcher( _batch_size, _batch_seq_len, _batch_seq_len, _max_thread_per_block, - _stream, _p_d_logit, _p_d_sample_id, _p_d_sample_id_buf, + _stream, _int8_ffn_out_buf, _p_d_sample_id, _p_d_sample_id_buf, _p_d_real_seq_len, _tw._src_vocab_size, _tw._topp, _p_d_unfinished, - _p_d_curandstate, _tw._eos_id); + _p_d_curandstate, _tw._eos_id, _logits_clip_max / _quant_range, true); } int *temp = _p_d_sample_id; _p_d_sample_id = _p_d_sample_id_buf; @@ -442,17 +415,12 @@ int QuantGptEncoder::sample_one_token() { template int QuantGptEncoder::sample_one_token_with_cache() { /* ---step 1. project hidden states to vocab logits--- */ - CHECK_GPU_ERROR(cublasGemmEx( - _hd, CUBLAS_OP_T, CUBLAS_OP_N, _tw._src_vocab_size, _batch_size, - _tw._hidden_size, &_fone, _p_device_emb[0], _AType, _tw._hidden_size, - _p_d_query, _BType, _tw._hidden_size, &_fzero, _p_d_logit, _CType, - _tw._src_vocab_size, _computeType, CUBLAS_GEMM_DEFAULT_TENSOR_OP)); - -#ifdef DEBUG_RESULT - print_vec(_p_d_logit, "sampling-logits", - _batch_size * _tw._src_vocab_size - 5, - _batch_size * _tw._src_vocab_size); -#endif + cublasLtMM_withAlgo_i8IO(_int8_ffn_out_buf, 1, _batch_size, + _tw._src_vocab_size, _tw._hidden_size, 0, 0, 0, + _output_ln_clip_max * _src_emb_clip_max / + (_logits_clip_max * _quant_range), + _int8_ffn_in_buf, _int8_p_d_src_emb_wei, + _cublas_lt_handle, _stream, false); CHECK_GPU_ERROR(cudaMemsetAsync(_p_d_unfinished, 0, sizeof(int), _stream)); // /* ---step 2. 
sample new tokens from logits */ @@ -460,20 +428,20 @@ int QuantGptEncoder::sample_one_token_with_cache() { #ifdef DEBUG_RESULT std::cout << "sampling using topk\n"; #endif - ker_topk_sample_launcher<_DataType>( + ker_topk_sample_i8I_launcher( _batch_size, _batch_seq_len, 1, _max_thread_per_block, _stream, - _p_d_logit, _p_d_sample_id, _p_d_sample_id_buf, _p_d_real_seq_len, - _tw._src_vocab_size, _tw._topk, _p_d_unfinished, _p_d_curandstate, - _tw._eos_id); + _int8_ffn_out_buf, _p_d_sample_id, _p_d_sample_id_buf, + _p_d_real_seq_len, _tw._src_vocab_size, _tw._topk, _p_d_unfinished, + _p_d_curandstate, _tw._eos_id, _logits_clip_max / _quant_range, true); } else { #ifdef DEBUG_RESULT std::cout << "sampling using topp\n"; #endif - ker_topp_sample_launcher<_DataType>( + ker_topp_sample_i8I_launcher( _batch_size, _batch_seq_len, 1, _max_thread_per_block, _stream, - _p_d_logit, _p_d_sample_id, _p_d_sample_id_buf, _p_d_real_seq_len, - _tw._src_vocab_size, _tw._topp, _p_d_unfinished, _p_d_curandstate, - _tw._eos_id); + _int8_ffn_out_buf, _p_d_sample_id, _p_d_sample_id_buf, + _p_d_real_seq_len, _tw._src_vocab_size, _tw._topp, _p_d_unfinished, + _p_d_curandstate, _tw._eos_id, _logits_clip_max / _quant_range, true); } int *temp = _p_d_sample_id; _p_d_sample_id = _p_d_sample_id_buf; @@ -532,11 +500,11 @@ void QuantGptEncoder::self_attention(bool cache) { } CHECK_GPU_ERROR(cudaMemcpyAsync( _p_d_self_k_cache2[_layer_id], _p_d_self_k_cache1[_layer_id], - _batch_token_num * _tw._hidden_size * sizeof(_DataType), + _batch_token_num * _tw._hidden_size * sizeof(int8_t), cudaMemcpyDeviceToDevice, stream)); CHECK_GPU_ERROR(cudaMemcpyAsync( _p_d_self_v_cache2[_layer_id], _p_d_self_v_cache1[_layer_id], - _batch_token_num * _tw._hidden_size * sizeof(_DataType), + _batch_token_num * _tw._hidden_size * sizeof(int8_t), cudaMemcpyDeviceToDevice, stream)); } @@ -601,57 +569,34 @@ void QuantGptEncoder::self_attention(bool cache) { template void QuantGptEncoder::self_attention_with_cache() { - _DataType *_p_d_k_cache_cur_layer = _p_d_k_cache + _layer_id * _max_batch_dim; - _DataType *_p_d_v_cache_cur_layer = _p_d_v_cache + _layer_id * _max_batch_dim; - -#ifdef DEBUG_RESULT - if (_layer_id == 0) { - print_vec(_p_d_k_cache_cur_layer, "_p_d_k_cache_cur_layer", - _batch_size * (_batch_seq_len - 1) * _tw._hidden_size - 5, - _batch_size * (_batch_seq_len - 1) * _tw._hidden_size); - print_vec(_p_d_v_cache_cur_layer, "_p_d_v_cache_cur_layer", - _batch_size * (_batch_seq_len - 1) * _tw._hidden_size - 5, - _batch_size * (_batch_seq_len - 1) * _tw._hidden_size); - } -#endif - /* ---step 0. layer_norm, add output_bias to "query"--- */ - ker_norm_layer_resual_launcher<_DataType>( - _batch_size, _tw._hidden_size, _stream, _p_d_query, _p_d_q, - _p_d_enc_wei[_weight_offset], _p_d_enc_wei[_weight_offset + 1], - _p_d_enc_wei[_weight_offset + 5], _max_thread_per_block); - -#ifdef DEBUG_RESULT if (_layer_id == 0) { - print_vec(_p_d_query, "input with bias", _batch_size * _tw._hidden_size - 5, - _batch_size * _tw._hidden_size); - print_vec(_p_d_q, "first ln output", _batch_size * _tw._hidden_size - 5, - _batch_size * _tw._hidden_size); + ker_norm_layer_resual_i8O_launcher<_DataType>( + _batch_size, _tw._hidden_size, _stream, _p_d_query, _int8_ffn_in_buf, + _p_device_wei[_weight_offset], _p_device_wei[_weight_offset + 1], + _p_device_wei[_weight_offset + 5], _max_thread_per_block, + _quant_range / _enc_clip_max[_layer_id * 12 + 4], false, true); } -#endif /* ---step 1. 
qkv = ori_q * qkv_wei + bias, and reshape qkv for multi-head * gemm--- */ - CHECK_GPU_ERROR(cublasGemmEx( - _hd, CUBLAS_OP_N, CUBLAS_OP_N, _tw._hidden_size * 3, _batch_size, - _tw._hidden_size, &_fone, _p_d_enc_wei[_weight_offset + 2], _AType, - _tw._hidden_size * 3, _p_d_q, _BType, _tw._hidden_size, &_fzero, - _p_d_qkv_projected, _CType, _tw._hidden_size * 3, _computeType, - CUBLAS_GEMM_DEFAULT_TENSOR_OP)); + cublasLtMM_withAlgo_i8IO( + _int8_ffn_out_buf, 1, _batch_size, _tw._hidden_size * 3, _tw._hidden_size, + 0, 0, 0, + _enc_clip_max[_layer_id * 12] * _enc_clip_max[_layer_id * 12 + 4] / + (_enc_clip_max[_layer_id * 12 + 8] * _quant_range), + _int8_ffn_in_buf, _int8_p_d_enc_wei[_layer_id * 4], _cublas_lt_handle, + _stream, false); -#ifdef DEBUG_RESULT - if (_layer_id == 0) { - print_vec(_p_d_qkv_projected, "_p_d_qkv_projected", - _batch_size * _tw._hidden_size * 3 - 5, - _batch_size * _tw._hidden_size * 3); - } -#endif // get q, k, v by split and reshape qkv - ker_arrange_qkv_with_cache_launcher<_DataType>( - _batch_token_num, _tw._hidden_size, _stream, _p_d_qkv_projected, - _p_d_enc_wei[_weight_offset + 3], _p_d_q, _p_d_k, _p_d_k_cache_cur_layer, - _p_d_v, _p_d_v_cache_cur_layer, _max_batch_dim, _batch_seq_len, - _tw._dim_per_head, _tw._head_num); + ker_arrange_qkv_with_cache_i8I_i8O_launcher<_DataType>( + _batch_token_num, _tw._hidden_size, _stream, _int8_ffn_out_buf, + _p_device_wei[_weight_offset + 3], _int8_ffn_in_buf, + _p_d_self_k_cache1[_layer_id], _p_d_self_k_cache2[_layer_id], + _p_d_self_v_cache1[_layer_id], _p_d_self_v_cache2[_layer_id], _p_d_v, + _batch_seq_len, _tw._dim_per_head, _tw._head_num, + _enc_clip_max[_layer_id * 12 + 8] / _quant_range, + _quant_range / _enc_clip_max[_layer_id * 12 + 11], true); // copy new k and v to cache cudaStream_t stream; @@ -661,52 +606,29 @@ void QuantGptEncoder::self_attention_with_cache() { } else { stream = _stream; } - CHECK_GPU_ERROR( - cudaMemcpyAsync(_p_d_k_cache_cur_layer, _p_d_k, - _batch_token_num * _tw._hidden_size * sizeof(_DataType), - cudaMemcpyDeviceToDevice, stream)); - CHECK_GPU_ERROR( - cudaMemcpyAsync(_p_d_v_cache_cur_layer, _p_d_v, - _batch_token_num * _tw._hidden_size * sizeof(_DataType), - cudaMemcpyDeviceToDevice, stream)); -#ifdef DEBUG_RESULT - if (_layer_id == 0) { - print_vec(_p_d_q, "_p_d_q", _batch_size * _tw._hidden_size - 5, - _batch_size * _tw._hidden_size); - print_vec(_p_d_k, "_p_d_k", _batch_token_num * _tw._hidden_size - 5, - _batch_token_num * _tw._hidden_size); - print_vec(_p_d_v, "_p_d_v", _batch_token_num * _tw._hidden_size - 5, - _batch_token_num * _tw._hidden_size); - } -#endif + CHECK_GPU_ERROR(cudaMemcpyAsync( + _p_d_self_k_cache2[_layer_id], _p_d_self_k_cache1[_layer_id], + _batch_token_num * _tw._hidden_size * sizeof(int8_t), + cudaMemcpyDeviceToDevice, stream)); + CHECK_GPU_ERROR(cudaMemcpyAsync( + _p_d_self_v_cache2[_layer_id], _p_d_self_v_cache1[_layer_id], + _batch_token_num * _tw._hidden_size * sizeof(int8_t), + cudaMemcpyDeviceToDevice, stream)); /* ---step 2. 
correlation = q * k, perform softmax on correlation correlation: [batch_size, heads_num, 1, batch_seq_len]--- */ CHECK_GPU_ERROR(cublasGemmStridedBatchedEx( _hd, CUBLAS_OP_T, CUBLAS_OP_N, _batch_seq_len, 1, _tw._dim_per_head, - &_atten_scaler, _p_d_k, _AType, _tw._dim_per_head, - _batch_seq_len * _tw._dim_per_head, _p_d_q, _BType, _tw._dim_per_head, - _tw._dim_per_head, &_fzero, _p_d_c, _CType, _batch_seq_len, - _batch_seq_len, _batch_size * _tw._head_num, _computeType, - CUBLAS_GEMM_DEFAULT_TENSOR_OP)); - -#ifdef DEBUG_RESULT - if (_layer_id == 0) { - print_vec(_p_d_c, "q*k", _batch_size * _batch_seq_len * _tw._head_num - 5, - _batch_size * _batch_seq_len * _tw._head_num); - } -#endif - ker_attention_mask_weights_launcher<_DataType>(_batch_size, 1, _batch_seq_len, - _tw._head_num, _stream, _p_d_c, - _p_d_real_seq_len); - -#ifdef DEBUG_RESULT - if (_layer_id == 0) { - print_vec(_p_d_c, "mask weights", - _batch_size * _batch_seq_len * _tw._head_num - 5, - _batch_size * _batch_seq_len * _tw._head_num); - } -#endif + &_ione, _p_d_self_k_cache1[_layer_id], CUDA_R_8I, _tw._dim_per_head, + _batch_seq_len * _tw._dim_per_head, _int8_ffn_in_buf, CUDA_R_8I, + _tw._dim_per_head, _tw._dim_per_head, &_izero, _int32_ffn_out_buf, + CUDA_R_32I, _batch_seq_len, _batch_seq_len, _batch_size * _tw._head_num, + CUDA_R_32I, CUBLAS_GEMM_DEFAULT_TENSOR_OP)); + + ker_attention_mask_weights_i32I_launcher<_DataType>( + _batch_size, 1, _batch_seq_len, _tw._head_num, _stream, + _int32_ffn_out_buf, _p_d_c, _p_d_real_seq_len, _atten_scaler, + _enc_clip_max[_layer_id * 12 + 11] / _quant_range); /* ---step 3. new_q = correlation * v--- */ CHECK_GPU_ERROR(cublasGemmStridedBatchedEx( @@ -717,39 +639,29 @@ void QuantGptEncoder::self_attention_with_cache() { _tw._dim_per_head, _batch_size * _tw._head_num, _computeType, CUBLAS_GEMM_DEFAULT_TENSOR_OP)); -#ifdef DEBUG_RESULT - if (_layer_id == 0) { - print_vec(_p_d_q, "value after attention", - _batch_size * _tw._hidden_size - 5, - _batch_size * _tw._hidden_size); - } -#endif // use v to save reshaped q, since they are in same size and v // will not be use again before the next multi-head-attention - ker_arrange_atten_output_launcher<_DataType>( - _batch_size, _tw._hidden_size, _stream, _p_d_q, _p_d_v, 1, - _tw._dim_per_head, _tw._head_num, _max_thread_per_block); - -#ifdef DEBUG_RESULT - if (_layer_id == 0) { - print_vec(_p_d_v, "reshaped value after attention", 0, 5); - print_vec(_p_d_query, "attention input with output bias", 0, 5); - } -#endif + ker_arrange_atten_output_i8O_launcher<_DataType>( + _batch_size, _tw._hidden_size, _stream, _p_d_q, _int8_ffn_in_buf, 1, + _tw._dim_per_head, _tw._head_num, _max_thread_per_block, + _quant_range / _enc_clip_max[_layer_id * 12 + 5], true); /* ---step 4. 
new_q = ori_q + new_q * output_wei--- */ - CHECK_GPU_ERROR(cublasGemmEx( - _hd, CUBLAS_OP_N, CUBLAS_OP_N, _tw._hidden_size, _batch_size, - _tw._hidden_size, &_fone, _p_d_enc_wei[_weight_offset + 4], _AType, - _tw._hidden_size, _p_d_v, _BType, _tw._hidden_size, &_fone, _p_d_query, - _CType, _tw._hidden_size, _computeType, CUBLAS_GEMM_DEFAULT_TENSOR_OP)); + cublasLtMM_withAlgo_i8IO( + _int8_ffn_out_buf, 1, _batch_size, _tw._hidden_size, _tw._hidden_size, 0, + 0, 0, + _enc_clip_max[_layer_id * 12 + 1] * _enc_clip_max[_layer_id * 12 + 5] / + (_enc_clip_max[_layer_id * 12 + 9] * _quant_range), + _int8_ffn_in_buf, _int8_p_d_enc_wei[_layer_id * 4 + 1], _cublas_lt_handle, + _stream, false); -#ifdef DEBUG_RESULT - if (_layer_id == 0) { - print_vec(_p_d_enc_wei[_weight_offset + 4], "attn out kernel", 0, 5); - print_vec(_p_d_query, "attention output", 0, 5); - } -#endif + ker_residual_bias_ln_i8I_i8O_launcher<_DataType>( + _int8_ffn_out_buf, _p_device_wei[_weight_offset + 6], + _p_device_wei[_weight_offset + 7], _p_device_wei[_weight_offset + 11], + _int8_ffn_in_buf, _p_d_query, _batch_size, _tw._hidden_size, + _enc_clip_max[_layer_id * 12 + 9] / _quant_range, + _quant_range / _enc_clip_max[_layer_id * 12 + 6], _max_thread_per_block, + _stream, false, true); return; } @@ -818,30 +730,51 @@ void QuantGptEncoder::ffn_add_norm() { template void QuantGptEncoder::ffn_add_norm_with_cache() { - /* ---step 0. layer_norm, add output_bias to "query"--- */ - ker_norm_layer_resual_launcher<_DataType>( - _batch_size, _tw._hidden_size, _stream, _p_d_query, _p_d_ffn_buf1, - _p_d_enc_wei[_weight_offset + 6], _p_d_enc_wei[_weight_offset + 7], - _p_d_enc_wei[_weight_offset + 11], _max_thread_per_block); - /* ---step 1. first ffn layer--- */ - CHECK_GPU_ERROR(cublasGemmEx( - _hd, CUBLAS_OP_N, CUBLAS_OP_N, _tw._inner_size, _batch_size, - _tw._hidden_size, &_fone, _p_d_enc_wei[_weight_offset + 8], _AType, - _tw._inner_size, _p_d_ffn_buf1, _BType, _tw._hidden_size, &_fzero, - _p_d_ffn_buf2, _CType, _tw._inner_size, _computeType, - CUBLAS_GEMM_DEFAULT_TENSOR_OP)); - ker_bias_gelu_launcher<_DataType>( - _batch_size, _max_thread_per_block, _stream, _p_d_ffn_buf2, - _p_d_enc_wei[_weight_offset + 9], _tw._inner_size); + cublasLtMM_withAlgo_i8IO( + _int8_ffn_out_buf, 1, _batch_size, _tw._inner_size, _tw._hidden_size, 0, + 0, 0, + _enc_clip_max[_layer_id * 12 + 2] * _enc_clip_max[_layer_id * 12 + 6] / + (_enc_clip_max[_layer_id * 12 + 10] * _quant_range), + _int8_ffn_in_buf, _int8_p_d_enc_wei[_layer_id * 4 + 2], _cublas_lt_handle, + _stream, false); + + ker_bias_gelu_i8I_i8O_launcher<_DataType>( + _batch_size, _stream, _int8_ffn_out_buf, _int8_ffn_in_buf, + _p_device_wei[_weight_offset + 9], _tw._inner_size, + _enc_clip_max[_layer_id * 12 + 10] / _quant_range, + _quant_range / _enc_clip_max[_layer_id * 12 + 7], true); /* ---step 2. 
second ffn layer--- */ - CHECK_GPU_ERROR(cublasGemmEx( - _hd, CUBLAS_OP_N, CUBLAS_OP_N, _tw._hidden_size, _batch_size, - _tw._inner_size, &_fone, _p_d_enc_wei[_weight_offset + 10], _AType, - _tw._hidden_size, _p_d_ffn_buf2, _BType, _tw._inner_size, &_fone, - _p_d_query, _CType, _tw._hidden_size, _computeType, - CUBLAS_GEMM_DEFAULT_TENSOR_OP)); + cublasLtMM_withAlgo(_int32_ffn_out_buf, 1, _batch_size, _tw._hidden_size, + _tw._inner_size, 0, 0, 0, _int8_ffn_in_buf, + _int8_p_d_enc_wei[_layer_id * 4 + 3], _cublas_lt_handle, + _stream, false); + + const _DataType *scale_ptr, *bias_ptr, *res_bias_ptr; + float clip_max, dequant_scale; + dequant_scale = _enc_clip_max[_layer_id * 12 + 3] * + _enc_clip_max[_layer_id * 12 + 7] / + (_quant_range * _quant_range); + if (_layer_id == _tw._n_enc_layer - 1) { + scale_ptr = _p_device_emb[2]; + bias_ptr = _p_device_emb[3]; + res_bias_ptr = nullptr; + clip_max = _output_ln_clip_max; + } else { + scale_ptr = _p_device_wei[(_layer_id + 1) * _tw._weight_per_enc_layer]; + bias_ptr = _p_device_wei[(_layer_id + 1) * _tw._weight_per_enc_layer + 1]; + res_bias_ptr = + _p_device_wei[(_layer_id + 1) * _tw._weight_per_enc_layer + 5]; + clip_max = _enc_clip_max[(_layer_id + 1) * 12 + 4]; + } + + ker_residual_bias_ln_i32I_i8O_launcher<_DataType>( + _int32_ffn_out_buf, scale_ptr, bias_ptr, res_bias_ptr, _int8_ffn_in_buf, + _p_d_query, _batch_size, _tw._hidden_size, dequant_scale, + _quant_range / clip_max, _max_thread_per_block, _stream, false, true, + true, _scaled_ffn2_colsum[_layer_id]); + return; } From a37e20fd5a475f93ab7e3f4271872a38e23bd09b Mon Sep 17 00:00:00 2001 From: "weiyang.god" Date: Thu, 28 Apr 2022 00:28:21 +0800 Subject: [PATCH 41/49] support quant decoder sampling --- .../kernels/transformerKernels_int8.cc.cu | 415 ++++++++++++++++++ .../kernels/transformerKernels_int8.h | 23 + lightseq/inference/model/quant_decoder.cc.cu | 22 +- lightseq/inference/model/quant_decoder.h | 1 - 4 files changed, 449 insertions(+), 12 deletions(-) diff --git a/lightseq/inference/kernels/transformerKernels_int8.cc.cu b/lightseq/inference/kernels/transformerKernels_int8.cc.cu index 49c2955a..8028d1b2 100644 --- a/lightseq/inference/kernels/transformerKernels_int8.cc.cu +++ b/lightseq/inference/kernels/transformerKernels_int8.cc.cu @@ -1922,5 +1922,420 @@ template void select_beam_rough_topk_i8I_launcher<__half>( int max_thread_per_block, cudaStream_t stream, int beam_size, float diverse_lambda, int end_id, bool in_col32); +template +__global__ void ker_topk_sample_i8I(const int8_t *logits, const T *logit_bias, + int *old_input_ids, int *new_input_ids, + const int vocab_size, const int max_step, + const int batch_seq_len, int logits_seq_len, + int *unfinished, curandState *curandstate, + int eos_id, float dequant_scale, + bool in_col32) { + int last_token_idx_in_batch = blockIdx.x * max_step + batch_seq_len - 1; + + /* add EOS to end if last token is EOS */ + if (batch_seq_len > 1 && old_input_ids[last_token_idx_in_batch] == eos_id) { + if (threadIdx.x == 0) { + old_input_ids[last_token_idx_in_batch + 1] = eos_id; + } + return; + } + int logits_token_idx_in_batch = + blockIdx.x * logits_seq_len + logits_seq_len - 1; + int left_logit_idx = logits_token_idx_in_batch * vocab_size + threadIdx.x; + int right_logit_idx = (logits_token_idx_in_batch + 1) * vocab_size; + + /* + step1. 
find max logit and rough Kth logit over the whole vocab + */ + __shared__ float s_max_logit, s_topk_logit; + float rough_top_kth_logit = CUDA_FLOAT_INF_NEG; + for (int idx = left_logit_idx; idx < right_logit_idx; idx += blockDim.x) { + int logits_idx; + if (in_col32) { + int row_id = logits_token_idx_in_batch; + int col_id = idx - logits_token_idx_in_batch * vocab_size; + logits_idx = row_major2flat_col32(row_id, col_id, + gridDim.x * logits_seq_len, vocab_size); + } else { + logits_idx = idx; + } + rough_top_kth_logit = fmaxf( + rough_top_kth_logit, + (float)(logits[logits_idx]) * dequant_scale + + (float)__ldg(&logit_bias[idx - left_logit_idx + threadIdx.x])); + } + float max_logit = blockReduceMax(rough_top_kth_logit); + rough_top_kth_logit = blockRoughTopK(rough_top_kth_logit); + if (threadIdx.x == 0) { + s_topk_logit = rough_top_kth_logit; + s_max_logit = max_logit; + } + __syncthreads(); + + __shared__ int s_tid; + + if (k != 1) { + /* step2 hold one logit per thread which larger than Kth logit and sample + * from them */ + float topk_exp_sum, topk_exp = CUDA_FLOAT_INF_NEG; + int topk_tid = vocab_size; + // int test_num = 0; + __shared__ float s_topk_exp_sum; + for (int idx = left_logit_idx; idx < right_logit_idx; idx += blockDim.x) { + int logits_idx; + if (in_col32) { + int row_id = logits_token_idx_in_batch; + int col_id = idx - logits_token_idx_in_batch * vocab_size; + logits_idx = row_major2flat_col32( + row_id, col_id, gridDim.x * logits_seq_len, vocab_size); + } else { + logits_idx = idx; + } + float logit = + (float)logits[logits_idx] * dequant_scale + + (float)__ldg(&logit_bias[idx - left_logit_idx + threadIdx.x]); + float logit_exp = expf(fmaxf(logit - s_max_logit, logit_thresh_min)); + // if (logit >= s_topk_logit) test_num++; + if (logit >= s_topk_logit && logit_exp > topk_exp) { + topk_exp = logit_exp; + topk_tid = idx - left_logit_idx + threadIdx.x; + } + } + + // test_num = blockReduceSum(test_num); + // __shared__ int s_test_num; + // if (threadIdx.x == 0) { + // s_test_num = test_num; + // if (s_test_num != 1) printf("sample from top %d\n", s_test_num); + // // printf("sample from top %s", test_num); + // } + // __syncthreads(); + + if (topk_tid == vocab_size) topk_exp = 0; + topk_exp_sum = blockReduceSum(topk_exp); + if (threadIdx.x == 0) { + s_topk_exp_sum = topk_exp_sum; + } + __syncthreads(); + + /* calculate cumulative probability */ + float topk_prob = topk_exp / s_topk_exp_sum; + float prefix_sum_prob; + typedef cub::BlockScan BlockScan; + __shared__ typename BlockScan::TempStorage temp_storage; + BlockScan(temp_storage).InclusiveSum(topk_prob, prefix_sum_prob); + + __shared__ float random_x; + if (threadIdx.x == 0) { + random_x = curand_uniform(curandstate + blockIdx.x); + } + __syncthreads(); + + if (threadIdx.x == 0) { + s_tid = vocab_size; + } + __syncthreads(); + + int threadID = threadIdx.x; + __shared__ int s_threadID; + __shared__ float s_max_prob; + if (random_x > prefix_sum_prob) threadID = blockDim.x; + threadID = blockReduceMin(threadID); + float max_prob = blockReduceMax(topk_prob); + if (threadIdx.x == 0) { + s_threadID = threadID; + s_max_prob = max_prob; + } + __syncthreads(); + if (threadIdx.x == s_threadID) { + s_tid = topk_tid; + } + __syncthreads(); + + if (s_tid == vocab_size && topk_prob == s_max_prob) { + s_tid = topk_tid; + } + __syncthreads(); + } else { + s_tid = vocab_size; + for (int idx = left_logit_idx; idx < right_logit_idx; idx += blockDim.x) { + int logits_idx; + if (in_col32) { + int row_id = logits_token_idx_in_batch; + int 
col_id = idx - logits_token_idx_in_batch * vocab_size; + logits_idx = row_major2flat_col32( + row_id, col_id, gridDim.x * logits_seq_len, vocab_size); + } else { + logits_idx = idx; + } + float logit = + (float)logits[logits_idx] * dequant_scale + + (float)__ldg(&logit_bias[idx - left_logit_idx + threadIdx.x]); + if (logit == s_max_logit) { + s_tid = idx - left_logit_idx + threadIdx.x; + } + } + __syncthreads(); + } + + /* if new sampled tid is not EOS, set unfinish TRUE */ + if (threadIdx.x == 0) { + if (s_tid != eos_id) unfinished[0] = 1; + } + + /* step3 write back new sampled ids */ + if (threadIdx.x == 0) { + old_input_ids[last_token_idx_in_batch + 1] = s_tid; + } +} + +template +void ker_topk_sample_i8I_launcher( + int batch_size, int batch_seq_len, const int max_step, int logits_seq_len, + int max_thread_per_block, cudaStream_t stream, const int8_t *logits, + const T *logit_bias, int *old_input_ids, int *new_input_ids, + const int vocab_size, const int k, int *unfinished, + curandState *curandstate, int eos_id, float dequant_scale, bool in_col32) { + if (k == 1) + ker_topk_sample_i8I<<>>( + logits, logit_bias, old_input_ids, new_input_ids, vocab_size, max_step, + batch_seq_len, logits_seq_len, unfinished, curandstate, eos_id, + dequant_scale, in_col32); + else if (k == 2) + ker_topk_sample_i8I<<>>( + logits, logit_bias, old_input_ids, new_input_ids, vocab_size, max_step, + batch_seq_len, logits_seq_len, unfinished, curandstate, eos_id, + dequant_scale, in_col32); + else if (k == 4) + ker_topk_sample_i8I<<>>( + logits, logit_bias, old_input_ids, new_input_ids, vocab_size, max_step, + batch_seq_len, logits_seq_len, unfinished, curandstate, eos_id, + dequant_scale, in_col32); + else if (k == 8) + ker_topk_sample_i8I<<>>( + logits, logit_bias, old_input_ids, new_input_ids, vocab_size, max_step, + batch_seq_len, logits_seq_len, unfinished, curandstate, eos_id, + dequant_scale, in_col32); + else if (k == 16) + ker_topk_sample_i8I<<>>( + logits, logit_bias, old_input_ids, new_input_ids, vocab_size, max_step, + batch_seq_len, logits_seq_len, unfinished, curandstate, eos_id, + dequant_scale, in_col32); + else if (k == 32) + ker_topk_sample_i8I<<>>( + logits, logit_bias, old_input_ids, new_input_ids, vocab_size, max_step, + batch_seq_len, logits_seq_len, unfinished, curandstate, eos_id, + dequant_scale, in_col32); + else { + throw std::invalid_argument("topk argument should be in [1,2,4,8,16,32]"); + } +} + +template void ker_topk_sample_i8I_launcher( + int batch_size, int batch_seq_len, const int max_step, int logits_seq_len, + int max_thread_per_block, cudaStream_t stream, const int8_t *logits, + const float *logit_bias, int *old_input_ids, int *new_input_idx, + const int vocab_size, const int k, int *unfinished, + curandState *curandstate, int eos_id, float dequant_scale, bool in_col32); + +template void ker_topk_sample_i8I_launcher<__half>( + int batch_size, int batch_seq_len, const int max_step, int logits_seq_len, + int max_thread_per_block, cudaStream_t stream, const int8_t *logits, + const __half *logit_bias, int *old_input_ids, int *new_input_idx, + const int vocab_size, const int k, int *unfinished, + curandState *curandstate, int eos_id, float dequant_scale, bool in_col32); + +template +__global__ void ker_topp_sample_i8I(const int8_t *logits, const T *logit_bias, + int *old_input_ids, int *new_input_ids, + const int vocab_size, const int max_step, + const int batch_seq_len, int logits_seq_len, + int *unfinished, float p, + curandState *curandstate, int eos_id, + float 
dequant_scale, bool in_col32) { + int token_idx_in_batch = blockIdx.x * max_step + batch_seq_len - 1; + + /* add EOS to end if last token is EOS */ + if (batch_seq_len > 1 && old_input_ids[token_idx_in_batch] == eos_id) { + if (threadIdx.x == 0) { + old_input_ids[token_idx_in_batch + 1] = eos_id; + } + return; + } + int logits_token_idx_in_batch = + blockIdx.x * logits_seq_len + logits_seq_len - 1; + int left_logit_idx = logits_token_idx_in_batch * vocab_size + threadIdx.x; + int right_logit_idx = (logits_token_idx_in_batch + 1) * vocab_size; + + /* step1. find max logit in each thread and sample from these probs with + * nucleus sampling */ + __shared__ float s_max_logit; + float max_logit = CUDA_FLOAT_INF_NEG; + for (int idx = left_logit_idx; idx < right_logit_idx; idx += blockDim.x) { + int logits_idx; + if (in_col32) { + int row_id = logits_token_idx_in_batch; + int col_id = idx - logits_token_idx_in_batch * vocab_size; + logits_idx = row_major2flat_col32(row_id, col_id, + gridDim.x * logits_seq_len, vocab_size); + } else { + logits_idx = idx; + } + max_logit = fmaxf(max_logit, (float)logits[logits_idx] * dequant_scale) + + (float)__ldg(&logit_bias[idx - left_logit_idx + threadIdx.x]); + } + float max_logit_array[1]; + max_logit_array[0] = max_logit; + typedef cub::BlockRadixSort BlockRadixSort; + __shared__ typename BlockRadixSort::TempStorage sort_temp_storage; + BlockRadixSort(sort_temp_storage).SortDescending(max_logit_array); + float presum_max_logit_exp; + max_logit = max_logit_array[0]; + + float block_max_logit = blockReduceMax(max_logit); + if (threadIdx.x == 0) { + s_max_logit = block_max_logit; + } + __syncthreads(); + + float biased_logit_exp = + expf(fmaxf(max_logit - s_max_logit, logit_thresh_min)); + + typedef cub::BlockScan BlockScan; + __shared__ typename BlockScan::TempStorage presum_temp_storage; + BlockScan(presum_temp_storage) + .InclusiveSum(biased_logit_exp, presum_max_logit_exp); + + float topp_exp_threshold; + if (threadIdx.x == blockDim.x - 1) { + topp_exp_threshold = p * presum_max_logit_exp; + } + __shared__ float s_presum_logit_exp_threshold; + if (presum_max_logit_exp > topp_exp_threshold) { + presum_max_logit_exp = CUDA_FLOAT_INF_NEG; + } + float logit_exp_threshold = blockReduceMax(presum_max_logit_exp); + if (threadIdx.x == 0) { + s_presum_logit_exp_threshold = logit_exp_threshold; + } + __syncthreads(); + + __shared__ float s_logit_threshold; + if (presum_max_logit_exp == s_presum_logit_exp_threshold) { + s_logit_threshold = max_logit; + } + __syncthreads(); + + /* step2 hold one logit per thread which larger than Kth logit and sample + * from them */ + float topk_exp_sum, topk_exp = CUDA_FLOAT_INF_NEG; + int topk_tid = vocab_size; + int test_num = 0; + __shared__ float s_topk_exp_sum; + for (int idx = left_logit_idx; idx < right_logit_idx; idx += blockDim.x) { + int logits_idx; + if (in_col32) { + int row_id = logits_token_idx_in_batch; + int col_id = idx - logits_token_idx_in_batch * vocab_size; + logits_idx = row_major2flat_col32(row_id, col_id, + gridDim.x * logits_seq_len, vocab_size); + } else { + logits_idx = idx; + } + float logit = (float)logits[logits_idx] * dequant_scale + + (float)__ldg(&logit_bias[idx - left_logit_idx + threadIdx.x]); + float logit_exp = expf(fmaxf(logit - s_max_logit, logit_thresh_min)); + if (logit >= s_logit_threshold) test_num++; + if (logit >= s_logit_threshold && logit_exp > topk_exp) { + topk_exp = logit_exp; + topk_tid = idx - left_logit_idx + threadIdx.x; + } + } + + test_num = blockReduceSum(test_num); + + if 
(topk_tid == vocab_size) topk_exp = 0; + topk_exp_sum = blockReduceSum(topk_exp); + if (threadIdx.x == 0) { + s_topk_exp_sum = topk_exp_sum; + } + __syncthreads(); + + /* calculate cumulative probability */ + float topk_prob = topk_exp / s_topk_exp_sum; + float prefix_sum_prob; + BlockScan(presum_temp_storage).InclusiveSum(topk_prob, prefix_sum_prob); + + __shared__ float random_x; + if (threadIdx.x == 0) { + random_x = curand_uniform(curandstate + blockIdx.x); + } + __syncthreads(); + + __shared__ int s_tid; + if (threadIdx.x == 0) { + s_tid = vocab_size; + } + __syncthreads(); + + int threadID = threadIdx.x; + __shared__ int s_threadID; + __shared__ float s_max_prob; + if (random_x > prefix_sum_prob) threadID = blockDim.x; + threadID = blockReduceMin(threadID); + float max_prob = blockReduceMax(topk_prob); + if (threadIdx.x == 0) { + s_threadID = threadID; + s_max_prob = max_prob; + } + __syncthreads(); + if (threadIdx.x == s_threadID) { + s_tid = topk_tid; + } + __syncthreads(); + + if (s_tid == vocab_size && topk_prob == s_max_prob) { + s_tid = topk_tid; + } + __syncthreads(); + + /* if new sampled tid is not EOS, set unfinish TRUE */ + if (threadIdx.x == 0) { + if (s_tid != eos_id) unfinished[0] = 1; + } + + /* step3 write back new sampled ids */ + if (threadIdx.x == 0) { + old_input_ids[token_idx_in_batch + 1] = s_tid; + } +} + +template +void ker_topp_sample_i8I_launcher( + int batch_size, int batch_seq_len, const int max_step, int logits_seq_len, + int max_thread_per_block, cudaStream_t stream, const int8_t *logits, + const T *logit_bias, int *old_input_ids, int *new_input_ids, + const int vocab_size, const float p, int *unfinished, + curandState *curandstate, int eos_id, float dequant_scale, bool in_col32) { + ker_topp_sample_i8I<<>>( + logits, logit_bias, old_input_ids, new_input_ids, vocab_size, max_step, + batch_seq_len, logits_seq_len, unfinished, p, curandstate, eos_id, + dequant_scale, in_col32); +} + +template void ker_topp_sample_i8I_launcher( + int batch_size, int batch_seq_len, const int max_step, int logits_seq_len, + int max_thread_per_block, cudaStream_t stream, const int8_t *logits, + const float *logit_bias, int *old_input_ids, int *new_input_idx, + const int vocab_size, const float p, int *unfinished, + curandState *curandstate, int eos_id, float dequant_scale, bool in_col32); + +template void ker_topp_sample_i8I_launcher<__half>( + int batch_size, int batch_seq_len, const int max_step, int logits_seq_len, + int max_thread_per_block, cudaStream_t stream, const int8_t *logits, + const __half *logit_bias, int *old_input_ids, int *new_input_idx, + const int vocab_size, const float p, int *unfinished, + curandState *curandstate, int eos_id, float dequant_scale, bool in_col32); + } // namespace cuda } // namespace lightseq diff --git a/lightseq/inference/kernels/transformerKernels_int8.h b/lightseq/inference/kernels/transformerKernels_int8.h index ce8ac1d8..3913973a 100644 --- a/lightseq/inference/kernels/transformerKernels_int8.h +++ b/lightseq/inference/kernels/transformerKernels_int8.h @@ -2,6 +2,7 @@ #include #include +#include #include namespace lightseq { @@ -118,5 +119,27 @@ void select_beam_rough_topk_i8I_launcher( int max_thread_per_block, cudaStream_t stream, int beam_size, float diverse_lambda, int end_id, bool in_col32 = false); +template +void ker_topk_sample_i8I_launcher(int batch_size, int batch_seq_len, + const int max_step, int logits_seq_len, + int max_thread_per_block, cudaStream_t stream, + const int8_t *logits, const T *logit_bias, + int 
*old_input_ids, int *new_input_ids, + const int vocab_size, const int k, + int *all_finished, curandState *curandstate, + int eos_id, float dequant_scale, + bool in_col32 = false); + +template +void ker_topp_sample_i8I_launcher(int batch_size, int batch_seq_len, + const int max_step, int logits_seq_len, + int max_thread_per_block, cudaStream_t stream, + const int8_t *logits, const T *logit_bias, + int *old_input_ids, int *new_input_ids, + const int vocab_size, const float p, + int *unfinished, curandState *curandstate, + int eos_id, float dequant_scale, + bool in_col32 = false); + } // namespace cuda } // namespace lightseq diff --git a/lightseq/inference/model/quant_decoder.cc.cu b/lightseq/inference/model/quant_decoder.cc.cu index 604d6713..9b366f07 100644 --- a/lightseq/inference/model/quant_decoder.cc.cu +++ b/lightseq/inference/model/quant_decoder.cc.cu @@ -904,22 +904,23 @@ void QuantDecoder::ffn_add_norm() { template bool QuantDecoder::sample() { - throw std::runtime_error("QuantDecoder sample() not implemented"); CHECK_GPU_ERROR( cudaMemsetAsync(_p_d_sample_unfinished, 0, sizeof(int), _stream)); /* --- Sample new tokens from logits --- */ if (_tw._sampling_method == "topk") { - ker_topk_sample_launcher<_DataType>( + ker_topk_sample_i8I_launcher<_DataType>( _batch_size, (_cur_step + 1), _tw._max_step, 1, _max_thread_per_block, - _stream, _p_d_logit_buf, _p_device_emb[6], _p_d_alive_seq, + _stream, _int8_ffn_out_buf, _p_device_emb[6], _p_d_alive_seq, _p_d_alive_seq_buf, _tw._trg_vocab_size, _tw._topk, - _p_d_sample_unfinished, _p_d_curandstate, _tw._end_id); + _p_d_sample_unfinished, _p_d_curandstate, _tw._end_id, + _logits_clip_max / _quant_range, true); } else { - ker_topp_sample_launcher<_DataType>( + ker_topp_sample_i8I_launcher<_DataType>( _batch_size, (_cur_step + 1), _tw._max_step, 1, _max_thread_per_block, - _stream, _p_d_logit_buf, _p_device_emb[6], _p_d_alive_seq, + _stream, _int8_ffn_out_buf, _p_device_emb[6], _p_d_alive_seq, _p_d_alive_seq_buf, _tw._trg_vocab_size, _tw._topp, - _p_d_sample_unfinished, _p_d_curandstate, _tw._end_id); + _p_d_sample_unfinished, _p_d_curandstate, _tw._end_id, + _logits_clip_max / _quant_range, true); } #ifdef DEBUG_RESULT print_vec(_p_d_sample_unfinished, "unfinished flag", 1); @@ -1052,7 +1053,6 @@ void QuantDecoder::update_new_seq_probs() { template bool QuantDecoder::topk_greedy_search() { - throw std::runtime_error("QuantDecoder topk_greedy_search() not implemented"); _tw._diverse_lambda = 0; if (_cur_step == 0) { return beam_search(); @@ -1061,11 +1061,11 @@ bool QuantDecoder::topk_greedy_search() { CHECK_GPU_ERROR( cudaMemsetAsync(_p_d_sample_unfinished, 0, sizeof(int), _stream)); /* --- Sample new tokens from logits --- */ - ker_topk_sample_launcher<_DataType>( + ker_topk_sample_i8I_launcher<_DataType>( _step_token_num, (_cur_step + 1), _tw._max_step, 1, _max_thread_per_block, - _stream, _p_d_logit_buf, _p_device_emb[6], _p_d_alive_seq, + _stream, _int8_ffn_out_buf, _p_device_emb[6], _p_d_alive_seq, _p_d_alive_seq_buf, _tw._trg_vocab_size, 1, _p_d_sample_unfinished, - _p_d_curandstate, _tw._end_id); + _p_d_curandstate, _tw._end_id, _logits_clip_max / _quant_range, true); #ifdef DEBUG_RESULT print_vec(_p_d_sample_unfinished, "unfinished flag", 1); diff --git a/lightseq/inference/model/quant_decoder.h b/lightseq/inference/model/quant_decoder.h index e31b8a05..9274e0fb 100644 --- a/lightseq/inference/model/quant_decoder.h +++ b/lightseq/inference/model/quant_decoder.h @@ -101,7 +101,6 @@ class QuantDecoder { _DataType* _p_d_query_buf2; 
_DataType* _p_d_c; _DataType* _p_d_encoder_out_buf; - _DataType* _p_d_logit_buf; int8_t* _int8_ffn_in_buf; int32_t* _int32_ffn_out_buf; From 305ef739550c6fe36fef4e1c0850aff6095c3819 Mon Sep 17 00:00:00 2001 From: "weiyang.god" Date: Thu, 28 Apr 2022 00:44:49 +0800 Subject: [PATCH 42/49] modify readme (add install command) --- README.md | 20 +++++++-- examples/inference/python/README.md | 16 +++---- examples/training/custom/README.md | 2 +- examples/training/deepspeed/README.md | 4 +- examples/training/fairseq/README.md | 8 ++-- examples/training/huggingface/bert/README.md | 4 +- examples/training/huggingface/gpt/README.md | 4 +- examples/training/neurst/README.md | 18 ++++---- lightseq/inference/README.md | 44 ++++++++++---------- lightseq/training/README.md | 10 ++--- 10 files changed, 71 insertions(+), 59 deletions(-) diff --git a/README.md b/README.md index e9e43b7a..741e9046 100644 --- a/README.md +++ b/README.md @@ -66,6 +66,20 @@ More results is available [here](./docs/inference/performance.md). ## Quick Start Complete user guide is available [here](docs/guide.md). +### Installation +You can install LightSeq from PyPI: +```shell +$ pip install lightseq +``` + +LightSeq installation from PyPi only supports Python 3.6 to 3.8 on Linux for now. Consider compiling from source if you have other environments: +```shell +$ PATH=/usr/local/hdf5/:$PATH ENABLE_FP32=0 ENABLE_DEBUG=0 pip install -e $PROJECT_DIR +``` + +Detailed building introduction is available [here](docs/inference/build.md). + + ### Fast training from Fairseq You can experience lightning fast training by running following commands, @@ -97,12 +111,10 @@ $ cd examples/inference/python then you can check the performance by simply running following commands. `hf_bart_export.py` is used to transform pytorch weights to LightSeq protobuffer. ```shell -python export/huggingface/hf_bart_export.py -python test/ls_bart.py +$ python export/huggingface/hf_bart_export.py +$ python test/ls_bart.py ``` -LightSeq installation from pypi only supports python 3.6 to 3.8 on Linux for now. Consider compiling from source if you have other environments. - More usage is available [here](./lightseq/inference/README.md). ### Fast deploy inference server diff --git a/examples/inference/python/README.md b/examples/inference/python/README.md index 595fa688..da721458 100644 --- a/examples/inference/python/README.md +++ b/examples/inference/python/README.md @@ -3,7 +3,7 @@ This repo contains examples of exporting models (LightSeq, Fairseq based, Huggin Before doing anything, you need to switch to the current directory: ```shell -cd examples/inference/python +$ cd examples/inference/python ``` ## Model export @@ -32,31 +32,31 @@ We provide the following export examples. All Fairseq based models are trained u ### Hugging Face models 1. BART ```shell -python test/ls_bart.py +$ python test/ls_bart.py ``` 2. BERT ```shell -python test/ls_bert.py +$ python test/ls_bert.py ``` 3. GPT2 ```shell -python test/ls_gpt2.py +$ python test/ls_gpt2.py ``` 4. ViT ```shell -python test/ls_vit.py +$ python test/ls_vit.py ``` 5. Quantized BERT ```shell -python test/ls_quant_bert.py +$ python test/ls_quant_bert.py ``` 6. 
Quantized GPT2 ```shell -python test/ls_quant_gpt.py +$ python test/ls_quant_gpt.py ``` ### Fairseq based models After exporting the Fairseq based models to protobuf/hdf5 format using above scripts, we can use the following script for fast LightSeq inference on wmt14 en2de dateset, compatible with fp16 and int8 models: ```shell -bash test/ls_fairseq.sh --model ${model_path} +$ bash test/ls_fairseq.sh --model ${model_path} ``` diff --git a/examples/training/custom/README.md b/examples/training/custom/README.md index be38ed7d..27494f33 100644 --- a/examples/training/custom/README.md +++ b/examples/training/custom/README.md @@ -6,7 +6,7 @@ The source inputs of the encoder are batch of sentences and the target outputs o You can run the example simplely by: ```shell -python examples/training/custom/run.py +$ python examples/training/custom/run.py ``` If it runs successfully, you will see the following output: diff --git a/examples/training/deepspeed/README.md b/examples/training/deepspeed/README.md index ac078949..63c88ff3 100644 --- a/examples/training/deepspeed/README.md +++ b/examples/training/deepspeed/README.md @@ -3,12 +3,12 @@ This repo contains an example for how to use LightSeq to accerate the training o First you should install these requirements. ```shell -pip install torch ninja fairseq deepspeed +$ pip install torch ninja fairseq deepspeed ``` Then you can train a translation task on wmt14 en2de dataset by running the following script: ```shell -sh examples/training/deepspeed/ds_fairseq_wmt14en2de.sh +$ sh examples/training/deepspeed/ds_fairseq_wmt14en2de.sh ``` This script firstly download the dataset, and then run native Fairseq training script using DeepSpeed launcher without any other parameter modifications. diff --git a/examples/training/fairseq/README.md b/examples/training/fairseq/README.md index 221bd0dd..093ddfe7 100644 --- a/examples/training/fairseq/README.md +++ b/examples/training/fairseq/README.md @@ -3,13 +3,13 @@ This repo contains examples for how to use LightSeq to accerate the training of First you should install these requirements. ```shell -pip install lightseq fairseq sacremoses +$ pip install lightseq fairseq sacremoses ``` ## Train Then you can train a translation task on wmt14 en2de dataset using LightSeq by running the following script: ```shell -sh examples/training/fairseq/ls_fairseq_wmt14en2de.sh +$ sh examples/training/fairseq/ls_fairseq_wmt14en2de.sh ``` Or you can use LightSeq modules like `--arch ls_transformer_wmt_en_de_big_t2t`, @@ -34,7 +34,7 @@ to switch to fairseq modules. ## Evaluation Then you can evaluate on wmt14 en2de dataset by running the following command: ```shell -lightseq-validate /tmp/wmt14_en_de/ \ +$ lightseq-validate /tmp/wmt14_en_de/ \ --valid-subset valid \ --path checkpoints/checkpoint_best.pt \ --task translation \ @@ -47,7 +47,7 @@ lightseq-validate /tmp/wmt14_en_de/ \ ## Generate You can also generate on wmt14 en2de dataset by running the following command: ```shell -lightseq-generate /tmp/wmt14_en_de/ \ +$ lightseq-generate /tmp/wmt14_en_de/ \ --gen-subset test \ --path checkpoints/checkpoint_best.pt \ --task translation \ diff --git a/examples/training/huggingface/bert/README.md b/examples/training/huggingface/bert/README.md index b3f8a0b3..77dde9aa 100644 --- a/examples/training/huggingface/bert/README.md +++ b/examples/training/huggingface/bert/README.md @@ -7,12 +7,12 @@ We modify the examples like token classification [examples](https://github.com/h First you should install these requirements. 
```shell -pip install torch ninja transformers seqeval datasets +$ pip install torch ninja transformers seqeval datasets ``` Before doing next training, you need to switch to the current directory: ```shell -cd examples/training/huggingface/bert +$ cd examples/training/huggingface/bert ``` Then you can easily fine-tunes BERT on different tasks by running the bash scripts `task_ner/run_ner.sh` diff --git a/examples/training/huggingface/gpt/README.md b/examples/training/huggingface/gpt/README.md index 76bfc84a..fe80f415 100644 --- a/examples/training/huggingface/gpt/README.md +++ b/examples/training/huggingface/gpt/README.md @@ -7,8 +7,8 @@ We modify the language modeling [examples](https://github.com/huggingface/transf First you should install these requirements. ```shell -pip install -r requirements.txt -bash run_clm.sh +$ pip install -r requirements.txt +$ bash run_clm.sh ``` Before running the script.make sure your pytorch worksfine with cuda, lightseq doesn't support pytorch cpu mode. You can verify your pytorch on CUDA by the following code. diff --git a/examples/training/neurst/README.md b/examples/training/neurst/README.md index bb3ded25..d755155e 100644 --- a/examples/training/neurst/README.md +++ b/examples/training/neurst/README.md @@ -3,27 +3,27 @@ This repo contains an example for how to use LightSeq to accerate the training o First you should install these requirements. ```shell -pip install subword-nmt pyyaml sacrebleu sacremoses -git clone https://github.com/moses-smt/mosesdecoder.git +$ pip install subword-nmt pyyaml sacrebleu sacremoses +$ git clone https://github.com/moses-smt/mosesdecoder.git ``` Then clone NeurST and switch to lightseq branch. ```shell -git clone https://github.com/bytedance/neurst.git -cd neurst/ -git checkout lightseq -pip install -e . +$ git clone https://github.com/bytedance/neurst.git +$ cd neurst/ +$ git checkout lightseq +$ pip install -e . ``` Install lightseq ```shell -pip install http://sf3-ttcdn-tos.pstatp.com/obj/nlp-opensource/lightseq/tensorflow/lightseq_tf-2.0.1-cp37-cp37m-linux_x86_64.whl +$ pip install http://sf3-ttcdn-tos.pstatp.com/obj/nlp-opensource/lightseq/tensorflow/lightseq_tf-2.0.1-cp37-cp37m-linux_x86_64.whl ``` Download and preprocess data ```shell -./examples/translation/prepare-wmt14en2de-bpe.sh ../mosesdecoder +$ ./examples/translation/prepare-wmt14en2de-bpe.sh ../mosesdecoder ``` Traing the model ```shell -python3 -m neurst.cli.run_exp \ +$ python3 -m neurst.cli.run_exp \ --config_paths wmt14_en_de/training_args.yml,wmt14_en_de/translation_bpe.yml \ --hparams_set transformer_base \ --model_dir wmt14_en_de/benchmark_base \ diff --git a/lightseq/inference/README.md b/lightseq/inference/README.md index 10577b7f..53f1b27e 100644 --- a/lightseq/inference/README.md +++ b/lightseq/inference/README.md @@ -65,15 +65,15 @@ More results is available [here](../../docs/inference/performance.md). We provide an end2end bart-base example to see how fast Lightseq is compared to HuggingFace. First you should install these requirements. ```shell -pip install torch tensorflow transformers lightseq -cd examples/inference/python +$ pip install torch tensorflow transformers lightseq +$ cd examples/inference/python ``` then you can check the performance by simply running following commands. `hf_bart_export.py` is used to transform pytorch weights to LightSeq protobuffer. 
```shell -python export/huggingface/hf_bart_export.py -python test/ls_bart.py +$ python export/huggingface/hf_bart_export.py +$ python test/ls_bart.py ``` on our Tesla V100 we can get following output, 10x speedup have been obtained from running LightSeq rather than HuggingFace. @@ -108,8 +108,8 @@ We provide python api to call lightseq, all you need is to install `lightseq` wi And check these files `lightseq/inference/proto/*.proto` to prepare your model weights. We provide an example weight file for you to test. ```shell -curl -OL https://github.com/bytedance/lightseq/releases/download/v0.0.1/transformer_weight.tar.gz -tar -zxvf transformer_weight.tar.gz +$ curl -OL https://github.com/bytedance/lightseq/releases/download/v0.0.1/transformer_weight.tar.gz +$ tar -zxvf transformer_weight.tar.gz ``` Finally you can run lightseq in only a few lines! @@ -138,12 +138,12 @@ To avoid problems caused by inconsistent environments, you can use the pre-built [nvidia-docker](https://github.com/NVIDIA/nvidia-docker) and make your GPU driver version >= 410.48 ```shell -docker pull nvcr.io/nvidia/tensorrtserver:19.05-py3 +$ docker pull nvcr.io/nvidia/tensorrtserver:19.05-py3 # -docker run --gpus '"device=0"' -it --rm -p8000:8000 -p8001:8001 -p8002:8002 -v +$ docker run --gpus '"device=0"' -it --rm -p8000:8000 -p8001:8001 -p8002:8002 -v /${current}/${path}:/quick_start nvcr.io/nvidia/tensorrtserver:19.05-py3 /bin/bash # inside container -cd /quick_start +$ cd /quick_start ``` ### Use our pre-build lib @@ -154,8 +154,8 @@ version, we will upload binary executable example and dynamic link library of mo custom backend of TRTIS. ```shell -wget https://github.com/bytedance/lightseq/releases/download/${VERSION}/${VERSION}_libs.tar.gz -tar -zxvf ${VERSION}_libs.tar.gz +$ wget https://github.com/bytedance/lightseq/releases/download/${VERSION}/${VERSION}_libs.tar.gz +$ tar -zxvf ${VERSION}_libs.tar.gz ``` ### Run local inference demo @@ -164,12 +164,12 @@ To run local inference demo, you need to prepare model weights saved in custom p LightSeq and input token ids. 
We provide a GPT-LM model and its corresponding input token ids: ```shell -wget https://github.com/bytedance/lightseq/releases/download/v0.0.1/v0.0.1_gptlm.pkg.tar.gz -tar -zxvf v0.0.1_gptlm.pkg.tar.gz +$ wget https://github.com/bytedance/lightseq/releases/download/v0.0.1/v0.0.1_gptlm.pkg.tar.gz +$ tar -zxvf v0.0.1_gptlm.pkg.tar.gz # fp32 example -./{VERSION}_libs/gptlm_example.fp32 ./v0.0.1_gptlm.pkg/gpt.pb ./v0.0.1_gptlm.pkg/test_case +$ ./{VERSION}_libs/gptlm_example.fp32 ./v0.0.1_gptlm.pkg/gpt.pb ./v0.0.1_gptlm.pkg/test_case # fp16 example -./{VERSION}_libs/gptlm_example.fp16 ./v0.0.1_gptlm.pkg/gpt.pb ./v0.0.1_gptlm.pkg/test_case +$ ./{VERSION}_libs/gptlm_example.fp16 ./v0.0.1_gptlm.pkg/gpt.pb ./v0.0.1_gptlm.pkg/test_case ``` To run the end-to-end model server based on TRTIS, you need to prepare a custom backend [model @@ -187,15 +187,15 @@ models/ With the pre-built libraries and example weights mentioned above, you can easily run a server: ```shell -mkdir -p ./model_zoo/gptlm/1 -wget https://github.com/bytedance/lightseq/releases/download/v0.0.1/v0.0.1_gptlm.config.pbtxt -mv v0.0.1_gptlm.config.pbtxt model_zoo/gptlm/config.pbtxt -cp ./v0.0.1_gptlm.pkg/gpt.pb model_zoo/gptlm/gpt.pb -cp ./{VERSION}_libs/libgptlm.so.fp32 model_zoo/gptlm/1/libgptlm.so +$ mkdir -p ./model_zoo/gptlm/1 +$ wget https://github.com/bytedance/lightseq/releases/download/v0.0.1/v0.0.1_gptlm.config.pbtxt +$ mv v0.0.1_gptlm.config.pbtxt model_zoo/gptlm/config.pbtxt +$ cp ./v0.0.1_gptlm.pkg/gpt.pb model_zoo/gptlm/gpt.pb +$ cp ./{VERSION}_libs/libgptlm.so.fp32 model_zoo/gptlm/1/libgptlm.so # or fp16 server # cp ./{VERSION}_libs/libgptlm.so.fp16 model_zoo/gptlm/1/libgptlm.so -export MODEL_ZOO="/quick_start/model_zoo" -trtserver --model-store=${MODEL_ZOO} +$ export MODEL_ZOO="/quick_start/model_zoo" +$ trtserver --model-store=${MODEL_ZOO} ``` After starting server, Invoking the [TRTIS diff --git a/lightseq/training/README.md b/lightseq/training/README.md index fb5dd74d..65656a31 100644 --- a/lightseq/training/README.md +++ b/lightseq/training/README.md @@ -58,15 +58,15 @@ We compute speedup on different batch size using the WPS (real words per second) To install LightSeq training library, ```shell -pip install lightseq +$ pip install lightseq ``` or install in develop mode, ```shell -git clone https://github.com/bytedance/lightseq.git -cd lightseq -pip install -e . +$ git clone https://github.com/bytedance/lightseq.git +$ cd lightseq +$ pip install -e . ``` ### TensorFlow @@ -75,7 +75,7 @@ pip install -e . 
- Cuda version = 11.0 - To install LightSeq training library: ```shell -pip install http://sf3-ttcdn-tos.pstatp.com/obj/nlp-opensource/lightseq/tensorflow/lightseq_tf-2.0.1-cp37-cp37m-linux_x86_64.whl +$ pip install http://sf3-ttcdn-tos.pstatp.com/obj/nlp-opensource/lightseq/tensorflow/lightseq_tf-2.0.1-cp37-cp37m-linux_x86_64.whl ``` ## Usage From c1141d846b2e89a7f43d282538722706c4abb019 Mon Sep 17 00:00:00 2001 From: "weiyang.god" Date: Thu, 28 Apr 2022 01:56:21 +0800 Subject: [PATCH 43/49] optimizer quant gpt gemm, fix gelu bug --- .../kernels/transformerKernels_int8.cc.cu | 149 ++++++++---------- .../kernels/transformerKernels_int8.h | 15 +- .../inference/model/quant_bert_encoder.cc.cu | 4 +- lightseq/inference/model/quant_decoder.cc.cu | 2 +- lightseq/inference/model/quant_encoder.cc.cu | 4 +- .../inference/model/quant_gpt_encoder.cc.cu | 54 +++---- 6 files changed, 103 insertions(+), 125 deletions(-) diff --git a/lightseq/inference/kernels/transformerKernels_int8.cc.cu b/lightseq/inference/kernels/transformerKernels_int8.cc.cu index 8028d1b2..85c5d736 100644 --- a/lightseq/inference/kernels/transformerKernels_int8.cc.cu +++ b/lightseq/inference/kernels/transformerKernels_int8.cc.cu @@ -864,85 +864,72 @@ template void ker_residual_bias_ln_i32I_launcher( template __global__ void ker_bias_gelu_i8I_i8O(int8_t *input, int8_t *output, - const T *bias, int total_count, - int feature_dim, float dequant_scale, - float quant_scale, bool in_out_col32) { - int i = blockIdx.x * blockDim.x + threadIdx.x; - - if (i * 4 >= total_count) return; + const T *bias, int feature_dim, + float dequant_scale, float quant_scale, + bool in_col32, bool out_col32) { + int block_start = blockIdx.x * feature_dim; + int start = block_start + threadIdx.x; + int end = block_start + feature_dim; + for (int i = start; i < end; i += blockDim.x) { + int input_index; + if (in_col32) { + int row_id = blockIdx.x; + int col_id = i - block_start; + input_index = + row_major2flat_col32(row_id, col_id, gridDim.x, feature_dim); + } else { + input_index = i; + } - char4 *out4 = reinterpret_cast(output); - const char4 *data4 = reinterpret_cast(input); - const float4 *bias4 = reinterpret_cast(bias); + float fout = gelu(float(input[input_index]) * dequant_scale + + __ldg(&bias[i - block_start])); - int bias_i; - if (in_out_col32) { - int row_size = total_count / feature_dim; - int flat_i = i << 2; - int col_id = (flat_i / (row_size * 32)) * 32 + (flat_i & 31); - bias_i = col_id >> 2; - } else { - bias_i = i % (feature_dim >> 2); + int output_index; + if (out_col32) { + int row_id = blockIdx.x; + int col_id = i - block_start; + output_index = + row_major2flat_col32(row_id, col_id, gridDim.x, feature_dim); + } else { + output_index = i; + } + output[output_index] = float2int8(fout, quant_scale); } - - const char4 input4 = data4[i]; - const float4 b4 = __ldg(&bias4[bias_i]); - float4 output4; - - output4.x = gelu(float(input4.x) * dequant_scale + b4.x); - output4.y = gelu(float(input4.y) * dequant_scale + b4.y); - output4.z = gelu(float(input4.z) * dequant_scale + b4.z); - output4.w = gelu(float(input4.w) * dequant_scale + b4.w); - - char4 out_i4; - out_i4.x = float2int8(output4.x, quant_scale); - out_i4.y = float2int8(output4.y, quant_scale); - out_i4.z = float2int8(output4.z, quant_scale); - out_i4.w = float2int8(output4.w, quant_scale); - out4[i] = out_i4; } /* fp16 version */ template <> -__global__ void ker_bias_gelu_i8I_i8O<__half>(int8_t *input, int8_t *output, - const __half *bias, - int total_count, int feature_dim, - float 
dequant_scale, - float quant_scale, - bool in_out_col32) { - int i = blockIdx.x * blockDim.x + threadIdx.x; - - if (i * 8 >= total_count) return; - - const int2 *vals_int2 = reinterpret_cast(input); - int64_t *outs_i8 = reinterpret_cast(output); - const float4 *bias4 = reinterpret_cast(bias); - - int bias_i; - if (in_out_col32) { - int row_size = total_count / feature_dim; - int flat_i = i << 3; - int col_id = (flat_i / (row_size * 32)) * 32 + (flat_i & 31); - bias_i = col_id >> 3; - } else { - bias_i = i % (feature_dim >> 3); - } +__global__ void ker_bias_gelu_i8I_i8O<__half>( + int8_t *input, int8_t *output, const __half *bias, int feature_dim, + float dequant_scale, float quant_scale, bool in_col32, bool out_col32) { + int block_start = blockIdx.x * feature_dim; + int start = block_start + threadIdx.x; + int end = block_start + feature_dim; + for (int i = start; i < end; i += blockDim.x) { + int input_index; + if (in_col32) { + int row_id = blockIdx.x; + int col_id = i - block_start; + input_index = + row_major2flat_col32(row_id, col_id, gridDim.x, feature_dim); + } else { + input_index = i; + } - int2 val_int2 = vals_int2[i]; - int8_t *val1 = reinterpret_cast(&val_int2); - const float4 b4 = __ldg(&bias4[bias_i]); - const __half *b_half = reinterpret_cast(&b4); - int64_t out_i8; - int8_t *out_i1 = reinterpret_cast(&out_i8); + float fout = gelu(float(input[input_index]) * dequant_scale + + __half2float(__ldg(&bias[i - block_start]))); -#pragma unroll - for (int j = 0; j < 8; ++j) { - float out_f; - out_f = - gelu(float(val1[j]) * dequant_scale + __half2float(b_half[j])); - out_i1[j] = float2int8(out_f, quant_scale); + int output_index; + if (out_col32) { + int row_id = blockIdx.x; + int col_id = i - block_start; + output_index = + row_major2flat_col32(row_id, col_id, gridDim.x, feature_dim); + } else { + output_index = i; + } + output[output_index] = float2int8(fout, quant_scale); } - outs_i8[i] = out_i8; } template @@ -950,35 +937,31 @@ void ker_bias_gelu_i8I_i8O_launcher(int batch_token_num, cudaStream_t stream, int8_t *input, int8_t *output, const T *bias, int feature_dim, float dequant_scale, float quant_scale, - bool in_out_col32) { - int total_count = batch_token_num * feature_dim; - int grid_dim = total_count >> 10; - ker_bias_gelu_i8I_i8O<<>>( - input, output, bias, total_count, feature_dim, dequant_scale, quant_scale, - in_out_col32); + bool in_col32, bool out_col32) { + ker_bias_gelu_i8I_i8O<<>>( + input, output, bias, feature_dim, dequant_scale, quant_scale, in_col32, + out_col32); } template <> void ker_bias_gelu_i8I_i8O_launcher<__half>( int batch_token_num, cudaStream_t stream, int8_t *input, int8_t *output, const __half *bias, int feature_dim, float dequant_scale, float quant_scale, - bool in_out_col32) { - int total_count = batch_token_num * feature_dim; - int grid_dim = total_count >> 11; - ker_bias_gelu_i8I_i8O<__half><<>>( - input, output, bias, total_count, feature_dim, dequant_scale, quant_scale, - in_out_col32); + bool in_col32, bool out_col32) { + ker_bias_gelu_i8I_i8O<__half><<>>( + input, output, bias, feature_dim, dequant_scale, quant_scale, in_col32, + out_col32); } template void ker_bias_gelu_i8I_i8O_launcher( int batch_token_num, cudaStream_t stream, int8_t *input, int8_t *output, const float *bias, int feature_dim, float dequant_scale, float quant_scale, - bool in_out_col32); + bool in_col32, bool out_col32); template void ker_bias_gelu_i8I_i8O_launcher<__half>( int batch_token_num, cudaStream_t stream, int8_t *input, int8_t *output, const __half *bias, int 
feature_dim, float dequant_scale, float quant_scale, - bool in_out_col32); + bool in_col32, bool out_col32); template __global__ void ker_bias_relu_i8I_i8O(int8_t *input, int8_t *output, diff --git a/lightseq/inference/kernels/transformerKernels_int8.h b/lightseq/inference/kernels/transformerKernels_int8.h index 3913973a..cfe7690a 100644 --- a/lightseq/inference/kernels/transformerKernels_int8.h +++ b/lightseq/inference/kernels/transformerKernels_int8.h @@ -31,7 +31,8 @@ void ker_bias_gelu_i8I_i8O_launcher(int batch_token_num, cudaStream_t stream, int8_t *input, int8_t *output, const T *bias, int feature_dim, float dequant_scale, float quant_scale, - bool in_out_col32 = false); + bool in_col32 = false, + bool out_col32 = false); // TODO: remove clip_max template @@ -39,8 +40,8 @@ void ker_bias_relu_i8I_i8O_launcher(int batch_token_num, cudaStream_t stream, int8_t *input, int8_t *output, const T *bias, int feature_dim, float dequant_scale, float quant_scale, - float clip_max, bool in_col32 = true, - bool out_col32 = true, + float clip_max, bool in_col32 = false, + bool out_col32 = false, bool narrow_clip = false); template @@ -48,16 +49,16 @@ void ker_residual_bias_ln_i32I_i8O_launcher( const int32_t *input, const T *scale, const T *bias, const T *residual_bias, int8_t *output, T *residual, int batch_tokens, int hidden_size, float dequant_scale, float quant_scale, int max_thread_per_block, - cudaStream_t stream, bool is_post_ln = false, bool in_col32 = true, - bool out_col32 = true, const T *colsum = nullptr); + cudaStream_t stream, bool is_post_ln = false, bool in_col32 = false, + bool out_col32 = false, const T *colsum = nullptr); template void ker_residual_bias_ln_i8I_i8O_launcher( const int8_t *input, const T *scale, const T *bias, const T *residual_bias, int8_t *output, T *residual, int batch_tokens, int hidden_size, float dequant_scale, float quant_scale, int max_thread_per_block, - cudaStream_t stream, bool is_post_ln = false, bool in_col32 = true, - bool out_col32 = true, const T *colsum = nullptr); + cudaStream_t stream, bool is_post_ln = false, bool in_col32 = false, + bool out_col32 = false, const T *colsum = nullptr); template void ker_residual_bias_ln_i32I_launcher( diff --git a/lightseq/inference/model/quant_bert_encoder.cc.cu b/lightseq/inference/model/quant_bert_encoder.cc.cu index c4dec5f5..c02b90ea 100644 --- a/lightseq/inference/model/quant_bert_encoder.cc.cu +++ b/lightseq/inference/model/quant_bert_encoder.cc.cu @@ -370,7 +370,7 @@ void QuantBertEncoder::self_attention() { _int8_ffn_in_buf, _p_d_output, _batch_token_num, _tw._hidden_size, _enc_clip_max[_layer_id * 11 + 9] / _quant_range, _quant_range / _enc_clip_max[_layer_id * 11 + 6], _max_thread_per_block, - _stream, _tw._is_post_ln, true); + _stream, _tw._is_post_ln, true, true); return; } @@ -402,7 +402,7 @@ void QuantBertEncoder::ffn_add_norm() { _batch_token_num, _stream, _int8_ffn_out_buf, _int8_ffn_in_buf, _p_device_wei[_weight_offset + 9], _tw._inner_size, _enc_clip_max[_layer_id * 11 + 10] / _quant_range, - _quant_range / _enc_clip_max[_layer_id * 11 + 7], true); + _quant_range / _enc_clip_max[_layer_id * 11 + 7], true, true); } else { ker_bias_relu_i8I_i8O_launcher<_DataType>( _batch_token_num, _stream, _int8_ffn_out_buf, _int8_ffn_in_buf, diff --git a/lightseq/inference/model/quant_decoder.cc.cu b/lightseq/inference/model/quant_decoder.cc.cu index 9b366f07..9bc833ad 100644 --- a/lightseq/inference/model/quant_decoder.cc.cu +++ b/lightseq/inference/model/quant_decoder.cc.cu @@ -840,7 +840,7 @@ void 
QuantDecoder::ffn_add_norm() { _step_token_num, _stream, _int8_ffn_out_buf, _int8_ffn_in_buf, _p_device_wei[_weight_offset + 15], _tw._inner_size, _dec_clip_max[_layer_id * 19 + 16] / _quant_range, - _quant_range / _dec_clip_max[_layer_id * 19 + 11], true); + _quant_range / _dec_clip_max[_layer_id * 19 + 11], true, false); } else { ker_bias_relu_i8I_i8O_launcher<_DataType>( _step_token_num, _stream, _int8_ffn_out_buf, _int8_ffn_in_buf, diff --git a/lightseq/inference/model/quant_encoder.cc.cu b/lightseq/inference/model/quant_encoder.cc.cu index c8cb8159..075bccf9 100644 --- a/lightseq/inference/model/quant_encoder.cc.cu +++ b/lightseq/inference/model/quant_encoder.cc.cu @@ -344,7 +344,7 @@ void QuantEncoder::self_attention() { _int8_ffn_in_buf, _p_d_output, _batch_token_num, _tw._hidden_size, _enc_clip_max[_layer_id * 12 + 9] / _quant_range, _quant_range / _enc_clip_max[_layer_id * 12 + 6], _max_thread_per_block, - _stream, _tw._is_post_ln, true); + _stream, _tw._is_post_ln, true, true); return; } @@ -364,7 +364,7 @@ void QuantEncoder::ffn_add_norm() { _batch_token_num, _stream, _int8_ffn_out_buf, _int8_ffn_in_buf, _p_device_wei[_weight_offset + 9], _tw._inner_size, _enc_clip_max[_layer_id * 12 + 10] / _quant_range, - _quant_range / _enc_clip_max[_layer_id * 12 + 7], true); + _quant_range / _enc_clip_max[_layer_id * 12 + 7], true, true); } else { ker_bias_relu_i8I_i8O_launcher<_DataType>( _batch_token_num, _stream, _int8_ffn_out_buf, _int8_ffn_in_buf, diff --git a/lightseq/inference/model/quant_gpt_encoder.cc.cu b/lightseq/inference/model/quant_gpt_encoder.cc.cu index 7333165f..87bb14b7 100644 --- a/lightseq/inference/model/quant_gpt_encoder.cc.cu +++ b/lightseq/inference/model/quant_gpt_encoder.cc.cu @@ -187,7 +187,7 @@ void QuantGptEncoder::init_buffer() { _int8_p_d_enc_wei[_layer_id * 4 + 1], _tw._hidden_size, _tw._hidden_size, _quant_range / _enc_clip_max[_layer_id * 12 + 1], _stream, - _cublas_lt_handle); + _cublas_lt_handle, kColMajor); quantize_weight(_p_d_enc_wei[_weight_offset + 8], _int8_p_d_enc_wei[_layer_id * 4 + 2], _tw._hidden_size, @@ -199,7 +199,7 @@ void QuantGptEncoder::init_buffer() { _int8_p_d_enc_wei[_layer_id * 4 + 3], _tw._inner_size, _tw._hidden_size, _quant_range / _enc_clip_max[_layer_id * 12 + 3], _stream, - _cublas_lt_handle); + _cublas_lt_handle, kColMajor); _scaled_ffn2_colsum[_layer_id] = nullptr; } @@ -538,17 +538,15 @@ void QuantGptEncoder::self_attention(bool cache) { ker_arrange_atten_output_i8O_launcher<_DataType>( _batch_token_num, _tw._hidden_size, _stream, _p_d_q, _int8_ffn_in_buf, _batch_seq_len, _tw._dim_per_head, _tw._head_num, _max_thread_per_block, - _quant_range / _enc_clip_max[_layer_id * 12 + 5], true); + _quant_range / _enc_clip_max[_layer_id * 12 + 5], false); /* ---step 4. 
new_q = ori_q + new_q * output_wei--- */ - - cublasLtMM_withAlgo_i8IO( - _int8_ffn_out_buf, 1, _batch_token_num, _tw._hidden_size, - _tw._hidden_size, 0, 0, 0, + cublaslt_gemm( + _int8_p_d_enc_wei[_layer_id * 4 + 1], _int8_ffn_in_buf, _int8_ffn_out_buf, + 1, _tw._hidden_size, _batch_token_num, _tw._hidden_size, 0, 0, 0, _enc_clip_max[_layer_id * 12 + 1] * _enc_clip_max[_layer_id * 12 + 5] / (_enc_clip_max[_layer_id * 12 + 9] * _quant_range), - _int8_ffn_in_buf, _int8_p_d_enc_wei[_layer_id * 4 + 1], _cublas_lt_handle, - _stream, false); + _cublas_lt_handle, _stream); #ifdef DEBUG_RESULT print_vec(_int8_ffn_in_buf, "attn out in", 20); @@ -562,7 +560,7 @@ void QuantGptEncoder::self_attention(bool cache) { _int8_ffn_in_buf, _p_d_query, _batch_token_num, _tw._hidden_size, _enc_clip_max[_layer_id * 12 + 9] / _quant_range, _quant_range / _enc_clip_max[_layer_id * 12 + 6], _max_thread_per_block, - _stream, false, true); + _stream, false, false, true); return; } @@ -644,16 +642,15 @@ void QuantGptEncoder::self_attention_with_cache() { ker_arrange_atten_output_i8O_launcher<_DataType>( _batch_size, _tw._hidden_size, _stream, _p_d_q, _int8_ffn_in_buf, 1, _tw._dim_per_head, _tw._head_num, _max_thread_per_block, - _quant_range / _enc_clip_max[_layer_id * 12 + 5], true); + _quant_range / _enc_clip_max[_layer_id * 12 + 5], false); /* ---step 4. new_q = ori_q + new_q * output_wei--- */ - cublasLtMM_withAlgo_i8IO( - _int8_ffn_out_buf, 1, _batch_size, _tw._hidden_size, _tw._hidden_size, 0, - 0, 0, + cublaslt_gemm( + _int8_p_d_enc_wei[_layer_id * 4 + 1], _int8_ffn_in_buf, _int8_ffn_out_buf, + 1, _tw._hidden_size, _batch_size, _tw._hidden_size, 0, 0, 0, _enc_clip_max[_layer_id * 12 + 1] * _enc_clip_max[_layer_id * 12 + 5] / (_enc_clip_max[_layer_id * 12 + 9] * _quant_range), - _int8_ffn_in_buf, _int8_p_d_enc_wei[_layer_id * 4 + 1], _cublas_lt_handle, - _stream, false); + _cublas_lt_handle, _stream); ker_residual_bias_ln_i8I_i8O_launcher<_DataType>( _int8_ffn_out_buf, _p_device_wei[_weight_offset + 6], @@ -661,7 +658,7 @@ void QuantGptEncoder::self_attention_with_cache() { _int8_ffn_in_buf, _p_d_query, _batch_size, _tw._hidden_size, _enc_clip_max[_layer_id * 12 + 9] / _quant_range, _quant_range / _enc_clip_max[_layer_id * 12 + 6], _max_thread_per_block, - _stream, false, true); + _stream, false, false, true); return; } @@ -686,14 +683,12 @@ void QuantGptEncoder::ffn_add_norm() { _batch_token_num, _stream, _int8_ffn_out_buf, _int8_ffn_in_buf, _p_device_wei[_weight_offset + 9], _tw._inner_size, _enc_clip_max[_layer_id * 12 + 10] / _quant_range, - _quant_range / _enc_clip_max[_layer_id * 12 + 7], true); + _quant_range / _enc_clip_max[_layer_id * 12 + 7], true, false); /* ---step 2. 
second ffn layer--- */ - - cublasLtMM_withAlgo(_int32_ffn_out_buf, 1, _batch_token_num, _tw._hidden_size, - _tw._inner_size, 0, 0, 0, _int8_ffn_in_buf, - _int8_p_d_enc_wei[_layer_id * 4 + 3], _cublas_lt_handle, - _stream, false); + cublaslt_gemm(_int8_p_d_enc_wei[_layer_id * 4 + 3], _int8_ffn_in_buf, + _int32_ffn_out_buf, 1, _tw._hidden_size, _batch_token_num, + _tw._inner_size, 0, 0, 0, 1, _cublas_lt_handle, _stream); #ifdef DEBUG_RESULT print_vec(_int8_ffn_in_buf, "ffn2 in", 20); @@ -722,7 +717,7 @@ void QuantGptEncoder::ffn_add_norm() { ker_residual_bias_ln_i32I_i8O_launcher<_DataType>( _int32_ffn_out_buf, scale_ptr, bias_ptr, res_bias_ptr, _int8_ffn_in_buf, _p_d_query, _batch_token_num, _tw._hidden_size, dequant_scale, - _quant_range / clip_max, _max_thread_per_block, _stream, false, true, + _quant_range / clip_max, _max_thread_per_block, _stream, false, false, true, _scaled_ffn2_colsum[_layer_id]); return; @@ -743,13 +738,12 @@ void QuantGptEncoder::ffn_add_norm_with_cache() { _batch_size, _stream, _int8_ffn_out_buf, _int8_ffn_in_buf, _p_device_wei[_weight_offset + 9], _tw._inner_size, _enc_clip_max[_layer_id * 12 + 10] / _quant_range, - _quant_range / _enc_clip_max[_layer_id * 12 + 7], true); + _quant_range / _enc_clip_max[_layer_id * 12 + 7], true, false); /* ---step 2. second ffn layer--- */ - cublasLtMM_withAlgo(_int32_ffn_out_buf, 1, _batch_size, _tw._hidden_size, - _tw._inner_size, 0, 0, 0, _int8_ffn_in_buf, - _int8_p_d_enc_wei[_layer_id * 4 + 3], _cublas_lt_handle, - _stream, false); + cublaslt_gemm(_int8_p_d_enc_wei[_layer_id * 4 + 3], _int8_ffn_in_buf, + _int32_ffn_out_buf, 1, _tw._hidden_size, _batch_size, + _tw._inner_size, 0, 0, 0, 1, _cublas_lt_handle, _stream); const _DataType *scale_ptr, *bias_ptr, *res_bias_ptr; float clip_max, dequant_scale; @@ -772,7 +766,7 @@ void QuantGptEncoder::ffn_add_norm_with_cache() { ker_residual_bias_ln_i32I_i8O_launcher<_DataType>( _int32_ffn_out_buf, scale_ptr, bias_ptr, res_bias_ptr, _int8_ffn_in_buf, _p_d_query, _batch_size, _tw._hidden_size, dequant_scale, - _quant_range / clip_max, _max_thread_per_block, _stream, false, true, + _quant_range / clip_max, _max_thread_per_block, _stream, false, false, true, _scaled_ffn2_colsum[_layer_id]); return; From 7ff1fc44650ad791ab5e216f80afde09bf3f34e4 Mon Sep 17 00:00:00 2001 From: "weiyang.god" Date: Thu, 28 Apr 2022 02:44:27 +0800 Subject: [PATCH 44/49] optimize cpp example --- examples/inference/cpp/gpt_example.cc | 18 +++++++++++++++--- examples/inference/cpp/quant_gpt_example.cc | 18 +++++++++++++++--- examples/inference/python/test/ls_gpt2.py | 8 ++++---- .../inference/python/test/ls_quant_gpt2.py | 8 ++++---- 4 files changed, 38 insertions(+), 14 deletions(-) diff --git a/examples/inference/cpp/gpt_example.cc b/examples/inference/cpp/gpt_example.cc index 79e86e8c..60883eb1 100644 --- a/examples/inference/cpp/gpt_example.cc +++ b/examples/inference/cpp/gpt_example.cc @@ -9,13 +9,25 @@ Example of how to run gpt inference using our implementation. 
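/* A possible way to invoke this example once built (the binary name is
   assumed from the source file name and may differ depending on the build
   target):
     ./gpt_example /path/to/gpt.pb          defaults to batch_size=1, batch_seq_len=10
     ./gpt_example /path/to/gpt.pb 8 10     runs batch_size=8, batch_seq_len=10
   argv[1] is the LightSeq weight file; when argv[2] and argv[3] are both
   given they override batch_size and batch_seq_len, and batch_size must not
   exceed the maximum of 128. */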
int main(int argc, char* argv[]) { std::string model_weights_path = argv[1]; int max_batch_size = 128; + int batch_size = 1; + int batch_seq_len = 10; + + if (argc == 4) { + batch_size = atoi(argv[2]); + batch_seq_len = atoi(argv[3]); + } + if (batch_size > max_batch_size) { + throw std::runtime_error("batch_size exceeds the maximum (128)!"); + } auto model = lightseq::cuda::LSModelFactory::GetInstance().CreateModel( "Gpt", model_weights_path, max_batch_size); - int batch_size = 1; - int batch_seq_len = 5; - std::vector host_input = {3666, 1438, 318, 402, 11571}; + std::vector example_input = {40, 1842, 345, 11, 475, 345, 910, 326}; + std::vector host_input; + for (int i = 0; i < batch_size * batch_seq_len; ++i) { + host_input.push_back(example_input[i % 8]); + } void* d_input; lightseq::cuda::CHECK_GPU_ERROR( diff --git a/examples/inference/cpp/quant_gpt_example.cc b/examples/inference/cpp/quant_gpt_example.cc index cec015a8..a915dbc1 100644 --- a/examples/inference/cpp/quant_gpt_example.cc +++ b/examples/inference/cpp/quant_gpt_example.cc @@ -9,13 +9,25 @@ Example of how to run gpt inference using our implementation. int main(int argc, char* argv[]) { std::string model_weights_path = argv[1]; int max_batch_size = 128; + int batch_size = 1; + int batch_seq_len = 10; + + if (argc == 4) { + batch_size = atoi(argv[2]); + batch_seq_len = atoi(argv[3]); + } + if (batch_size > max_batch_size) { + throw std::runtime_error("batch_size exceeds the maximum (128)!"); + } auto model = lightseq::cuda::LSModelFactory::GetInstance().CreateModel( "QuantGpt", model_weights_path, max_batch_size); - int batch_size = 1; - int batch_seq_len = 5; - std::vector host_input = {3666, 1438, 318, 402, 11571}; + std::vector example_input = {40, 1842, 345, 11, 475, 345, 910, 326}; + std::vector host_input; + for (int i = 0; i < batch_size * batch_seq_len; ++i) { + host_input.push_back(example_input[i % 8]); + } void* d_input; lightseq::cuda::CHECK_GPU_ERROR( diff --git a/examples/inference/python/test/ls_gpt2.py b/examples/inference/python/test/ls_gpt2.py index dd595e38..abbd78a6 100644 --- a/examples/inference/python/test/ls_gpt2.py +++ b/examples/inference/python/test/ls_gpt2.py @@ -150,10 +150,10 @@ def main(): # lightseq gpt perplexity supports batch infer with different lengths, # but sampling doesn't support sentences = [ - "I love you, but you", - "I love you, but you", - "I love you, but you", - "I love you, but you", + "I love you, but you say that", + "I love you, but you say that", + "I love you, but you say that", + "I love you, but you say that", ] print("====================START warmup====================") diff --git a/examples/inference/python/test/ls_quant_gpt2.py b/examples/inference/python/test/ls_quant_gpt2.py index e65dc89f..033ac5b4 100644 --- a/examples/inference/python/test/ls_quant_gpt2.py +++ b/examples/inference/python/test/ls_quant_gpt2.py @@ -218,10 +218,10 @@ def main(): # lightseq gpt perplexity supports batch infer with different lengths, # but sampling doesn't support sentences = [ - "I love you, but you", - "I love you, but you", - "I love you, but you", - "I love you, but you", + "I love you, but you say that", + "I love you, but you say that", + "I love you, but you say that", + "I love you, but you say that", ] print("====================START warmup====================") From 6a4c705b0bb9806dc59641255543e47bfa975aa8 Mon Sep 17 00:00:00 2001 From: "weiyang.god" Date: Thu, 28 Apr 2022 14:47:32 +0800 Subject: [PATCH 45/49] replace quant gpt cache memcpy with pointer wsitch --- 
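Note on this patch: the per-step `cudaMemcpyAsync` calls that duplicated each
layer's self-attention key/value cache (and the extra `_cache_stream`
synchronization used for large batches) are removed; instead, the two cache
buffers trade roles after every decoding step by swapping the
`_p_d_self_k_cache1/2` and `_p_d_self_v_cache1/2` pointers. A minimal sketch of
the pattern with simplified names, not the actual class layout:

```cpp
#include <cstdint>
#include <utility>

// Kernels write the current step's keys/values through the "cur" pointers and
// read the previous step's cache through the "prev" pointers. Swapping after
// the step makes the freshly written cache the next step's read cache, so no
// device-to-device copy is required.
inline void swap_kv_cache(int8_t**& k_cur, int8_t**& k_prev,
                          int8_t**& v_cur, int8_t**& v_prev) {
  std::swap(k_cur, k_prev);
  std::swap(v_cur, v_prev);
}
```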
.../inference/model/quant_gpt_encoder.cc.cu | 53 ++++++------------- lightseq/inference/model/quant_gpt_encoder.h | 2 +- 2 files changed, 17 insertions(+), 38 deletions(-) diff --git a/lightseq/inference/model/quant_gpt_encoder.cc.cu b/lightseq/inference/model/quant_gpt_encoder.cc.cu index 87bb14b7..82d2cae8 100644 --- a/lightseq/inference/model/quant_gpt_encoder.cc.cu +++ b/lightseq/inference/model/quant_gpt_encoder.cc.cu @@ -327,10 +327,17 @@ int QuantGptEncoder::run_one_sample(int batch_size, for (_layer_id = 0; _layer_id < _tw._n_enc_layer; _layer_id++) { _weight_offset = _layer_id * _tw._weight_per_enc_layer; - self_attention(true); + self_attention(); ffn_add_norm(); } + int8_t **ftmp = _p_d_self_k_cache2; + _p_d_self_k_cache2 = _p_d_self_k_cache1; + _p_d_self_k_cache1 = ftmp; + ftmp = _p_d_self_v_cache2; + _p_d_self_v_cache2 = _p_d_self_v_cache1; + _p_d_self_v_cache1 = ftmp; + if (sample_one_token() == 0 || _batch_seq_len >= _tw._max_step) { CHECK_GPU_ERROR(cudaMemcpyAsync(_p_d_sample_id_buf, _p_d_sample_id, _batch_token_num * sizeof(int), @@ -358,6 +365,13 @@ int QuantGptEncoder::run_one_sample(int batch_size, ffn_add_norm_with_cache(); } + int8_t **ftmp = _p_d_self_k_cache2; + _p_d_self_k_cache2 = _p_d_self_k_cache1; + _p_d_self_k_cache1 = ftmp; + ftmp = _p_d_self_v_cache2; + _p_d_self_v_cache2 = _p_d_self_v_cache1; + _p_d_self_v_cache1 = ftmp; + if (sample_one_token_with_cache() == 0 || _batch_seq_len >= _tw._max_step) break; } @@ -456,7 +470,7 @@ int QuantGptEncoder::sample_one_token_with_cache() { } template -void QuantGptEncoder::self_attention(bool cache) { +void QuantGptEncoder::self_attention() { /* ---step 0. layer_norm, add output_bias to "query"--- */ if (_layer_id == 0) { ker_norm_layer_resual_i8O_launcher<_DataType>( @@ -490,24 +504,6 @@ void QuantGptEncoder::self_attention(bool cache) { _enc_clip_max[_layer_id * 12 + 8] / _quant_range, _quant_range / _enc_clip_max[_layer_id * 12 + 11], true); - if (cache) { - cudaStream_t stream; - if (_batch_token_num > 360) { - stream = _cache_stream; - CHECK_GPU_ERROR(cudaStreamSynchronize(_stream)); - } else { - stream = _stream; - } - CHECK_GPU_ERROR(cudaMemcpyAsync( - _p_d_self_k_cache2[_layer_id], _p_d_self_k_cache1[_layer_id], - _batch_token_num * _tw._hidden_size * sizeof(int8_t), - cudaMemcpyDeviceToDevice, stream)); - CHECK_GPU_ERROR(cudaMemcpyAsync( - _p_d_self_v_cache2[_layer_id], _p_d_self_v_cache1[_layer_id], - _batch_token_num * _tw._hidden_size * sizeof(int8_t), - cudaMemcpyDeviceToDevice, stream)); - } - /* ---step 2. correlation = q * k, perform softmax on correlation--- */ CHECK_GPU_ERROR(cublasGemmStridedBatchedEx( _hd, CUBLAS_OP_T, CUBLAS_OP_N, _batch_seq_len, _batch_seq_len, @@ -596,23 +592,6 @@ void QuantGptEncoder::self_attention_with_cache() { _enc_clip_max[_layer_id * 12 + 8] / _quant_range, _quant_range / _enc_clip_max[_layer_id * 12 + 11], true); - // copy new k and v to cache - cudaStream_t stream; - if (_batch_token_num > 360) { - stream = _cache_stream; - CHECK_GPU_ERROR(cudaStreamSynchronize(_stream)); - } else { - stream = _stream; - } - CHECK_GPU_ERROR(cudaMemcpyAsync( - _p_d_self_k_cache2[_layer_id], _p_d_self_k_cache1[_layer_id], - _batch_token_num * _tw._hidden_size * sizeof(int8_t), - cudaMemcpyDeviceToDevice, stream)); - CHECK_GPU_ERROR(cudaMemcpyAsync( - _p_d_self_v_cache2[_layer_id], _p_d_self_v_cache1[_layer_id], - _batch_token_num * _tw._hidden_size * sizeof(int8_t), - cudaMemcpyDeviceToDevice, stream)); - /* ---step 2. 
correlation = q * k, perform softmax on correlation correlation: [batch_size, heads_num, 1, batch_seq_len]--- */ CHECK_GPU_ERROR(cublasGemmStridedBatchedEx( diff --git a/lightseq/inference/model/quant_gpt_encoder.h b/lightseq/inference/model/quant_gpt_encoder.h index a84cc830..b0327214 100644 --- a/lightseq/inference/model/quant_gpt_encoder.h +++ b/lightseq/inference/model/quant_gpt_encoder.h @@ -31,7 +31,7 @@ class QuantGptEncoder { const cudaDataType_t _CType = _optraits::CType; // private member function - void self_attention(bool cache = false); + void self_attention(); void self_attention_with_cache(); void ffn_add_norm(); void ffn_add_norm_with_cache(); From e68bf8f5a885b4254ac315ec9cf5a6588385fa25 Mon Sep 17 00:00:00 2001 From: "weiyang.god" Date: Thu, 28 Apr 2022 19:20:58 +0800 Subject: [PATCH 46/49] fuse quant gpt softmax kernel --- examples/inference/cpp/bert_example.cc | 19 ++++++----- examples/inference/cpp/gpt_example.cc | 17 ++++++---- examples/inference/cpp/quant_bert_example.cc | 19 ++++++----- examples/inference/cpp/quant_gpt_example.cc | 17 ++++++---- .../cpp/quant_transformer_example.cc | 28 +++++++++++---- examples/inference/cpp/transformer_example.cc | 28 +++++++++++---- .../inference/kernels/gptKernels_int8.cc.cu | 34 +++++++++---------- lightseq/inference/kernels/gptKernels_int8.h | 2 +- .../inference/model/quant_gpt_encoder.cc.cu | 28 ++++----------- 9 files changed, 110 insertions(+), 82 deletions(-) diff --git a/examples/inference/cpp/bert_example.cc b/examples/inference/cpp/bert_example.cc index d06501e8..22c08bb7 100644 --- a/examples/inference/cpp/bert_example.cc +++ b/examples/inference/cpp/bert_example.cc @@ -8,9 +8,12 @@ Example of how to run Bert inference using our implementation. int main(int argc, char* argv[]) { std::string model_weights_path = argv[1]; + std::vector example_input = {2859, 2758, 2051, 2157, + 2005, 6629, 7566, 1012}; + int eg_seq_len = example_input.size(); int max_batch_size = 128; int batch_size = 1; - int batch_seq_len = 10; + int batch_seq_len = eg_seq_len; if (argc == 4) { batch_size = atoi(argv[2]); @@ -20,16 +23,16 @@ int main(int argc, char* argv[]) { throw std::runtime_error("batch_size exceeds the maximum (128)!"); } - auto model = lightseq::cuda::LSModelFactory::GetInstance().CreateModel( - "Bert", model_weights_path, max_batch_size); - - std::vector example_input = {2859, 2758, 2051, 2157, - 2005, 6629, 7566, 1012}; std::vector host_input; - for (int i = 0; i < batch_size * batch_seq_len; ++i) { - host_input.push_back(example_input[i % 8]); + for (int i = 0; i < batch_size; ++i) { + for (int j = 0; j < batch_seq_len; ++j) { + host_input.push_back(example_input[j % eg_seq_len]); + } } + auto model = lightseq::cuda::LSModelFactory::GetInstance().CreateModel( + "Bert", model_weights_path, max_batch_size); + void* d_input; lightseq::cuda::CHECK_GPU_ERROR( cudaMalloc(&d_input, sizeof(int) * batch_size * batch_seq_len)); diff --git a/examples/inference/cpp/gpt_example.cc b/examples/inference/cpp/gpt_example.cc index 60883eb1..bc07d90e 100644 --- a/examples/inference/cpp/gpt_example.cc +++ b/examples/inference/cpp/gpt_example.cc @@ -8,9 +8,11 @@ Example of how to run gpt inference using our implementation. 
int main(int argc, char* argv[]) { std::string model_weights_path = argv[1]; + std::vector example_input = {40, 1842, 345, 11, 475, 345, 910, 326}; + int eg_seq_len = example_input.size(); int max_batch_size = 128; int batch_size = 1; - int batch_seq_len = 10; + int batch_seq_len = eg_seq_len; if (argc == 4) { batch_size = atoi(argv[2]); @@ -20,15 +22,16 @@ int main(int argc, char* argv[]) { throw std::runtime_error("batch_size exceeds the maximum (128)!"); } - auto model = lightseq::cuda::LSModelFactory::GetInstance().CreateModel( - "Gpt", model_weights_path, max_batch_size); - - std::vector example_input = {40, 1842, 345, 11, 475, 345, 910, 326}; std::vector host_input; - for (int i = 0; i < batch_size * batch_seq_len; ++i) { - host_input.push_back(example_input[i % 8]); + for (int i = 0; i < batch_size; ++i) { + for (int j = 0; j < batch_seq_len; ++j) { + host_input.push_back(example_input[j % eg_seq_len]); + } } + auto model = lightseq::cuda::LSModelFactory::GetInstance().CreateModel( + "Gpt", model_weights_path, max_batch_size); + void* d_input; lightseq::cuda::CHECK_GPU_ERROR( cudaMalloc(&d_input, sizeof(int) * batch_size * batch_seq_len)); diff --git a/examples/inference/cpp/quant_bert_example.cc b/examples/inference/cpp/quant_bert_example.cc index d58d8bb8..54ff5c14 100644 --- a/examples/inference/cpp/quant_bert_example.cc +++ b/examples/inference/cpp/quant_bert_example.cc @@ -8,9 +8,12 @@ Example of how to run QuantBert inference using our implementation. int main(int argc, char* argv[]) { std::string model_weights_path = argv[1]; + std::vector example_input = {2859, 2758, 2051, 2157, + 2005, 6629, 7566, 1012}; + int eg_seq_len = example_input.size(); int max_batch_size = 128; int batch_size = 1; - int batch_seq_len = 10; + int batch_seq_len = eg_seq_len; if (argc == 4) { batch_size = atoi(argv[2]); @@ -20,16 +23,16 @@ int main(int argc, char* argv[]) { throw std::runtime_error("batch_size exceeds the maximum (128)!"); } - auto model = lightseq::cuda::LSModelFactory::GetInstance().CreateModel( - "QuantBert", model_weights_path, max_batch_size); - - std::vector example_input = {2859, 2758, 2051, 2157, - 2005, 6629, 7566, 1012}; std::vector host_input; - for (int i = 0; i < batch_size * batch_seq_len; ++i) { - host_input.push_back(example_input[i % 8]); + for (int i = 0; i < batch_size; ++i) { + for (int j = 0; j < batch_seq_len; ++j) { + host_input.push_back(example_input[j % eg_seq_len]); + } } + auto model = lightseq::cuda::LSModelFactory::GetInstance().CreateModel( + "QuantBert", model_weights_path, max_batch_size); + void* d_input; lightseq::cuda::CHECK_GPU_ERROR( cudaMalloc(&d_input, sizeof(int) * batch_size * batch_seq_len)); diff --git a/examples/inference/cpp/quant_gpt_example.cc b/examples/inference/cpp/quant_gpt_example.cc index a915dbc1..6a3dce42 100644 --- a/examples/inference/cpp/quant_gpt_example.cc +++ b/examples/inference/cpp/quant_gpt_example.cc @@ -8,9 +8,11 @@ Example of how to run gpt inference using our implementation. 
int main(int argc, char* argv[]) { std::string model_weights_path = argv[1]; + std::vector example_input = {40, 1842, 345, 11, 475, 345, 910, 326}; + int eg_seq_len = example_input.size(); int max_batch_size = 128; int batch_size = 1; - int batch_seq_len = 10; + int batch_seq_len = eg_seq_len; if (argc == 4) { batch_size = atoi(argv[2]); @@ -20,15 +22,16 @@ int main(int argc, char* argv[]) { throw std::runtime_error("batch_size exceeds the maximum (128)!"); } - auto model = lightseq::cuda::LSModelFactory::GetInstance().CreateModel( - "QuantGpt", model_weights_path, max_batch_size); - - std::vector example_input = {40, 1842, 345, 11, 475, 345, 910, 326}; std::vector host_input; - for (int i = 0; i < batch_size * batch_seq_len; ++i) { - host_input.push_back(example_input[i % 8]); + for (int i = 0; i < batch_size; ++i) { + for (int j = 0; j < batch_seq_len; ++j) { + host_input.push_back(example_input[j % eg_seq_len]); + } } + auto model = lightseq::cuda::LSModelFactory::GetInstance().CreateModel( + "QuantGpt", model_weights_path, max_batch_size); + void* d_input; lightseq::cuda::CHECK_GPU_ERROR( cudaMalloc(&d_input, sizeof(int) * batch_size * batch_seq_len)); diff --git a/examples/inference/cpp/quant_transformer_example.cc b/examples/inference/cpp/quant_transformer_example.cc index 08930deb..4073b8a3 100644 --- a/examples/inference/cpp/quant_transformer_example.cc +++ b/examples/inference/cpp/quant_transformer_example.cc @@ -8,16 +8,32 @@ Example of how to run quantized transformer inference using our implementation. int main(int argc, char* argv[]) { std::string model_weights_path = argv[1]; - int max_batch_size = 8; + + std::vector example_input = {63, 47, 65, 1507, 88, 74, + 10, 2057, 362, 9, 284, 6}; + int eg_seq_len = example_input.size(); + int max_batch_size = 128; + int batch_size = 1; + int batch_seq_len = eg_seq_len; + + if (argc == 4) { + batch_size = atoi(argv[2]); + batch_seq_len = atoi(argv[3]); + } + if (batch_size > max_batch_size) { + throw std::runtime_error("batch_size exceeds the maximum (128)!"); + } + + std::vector host_input; + for (int i = 0; i < batch_size; ++i) { + for (int j = 0; j < batch_seq_len; ++j) { + host_input.push_back(example_input[j % eg_seq_len]); + } + } auto model = lightseq::cuda::LSModelFactory::GetInstance().CreateModel( "QuantTransformer", model_weights_path, max_batch_size); - int batch_size = 1; - int batch_seq_len = 13; - std::vector host_input = {63, 47, 65, 1507, 88, 74, 10, - 2057, 362, 9, 284, 6, 2}; - void* d_input; lightseq::cuda::CHECK_GPU_ERROR( cudaMalloc(&d_input, sizeof(int) * batch_size * batch_seq_len)); diff --git a/examples/inference/cpp/transformer_example.cc b/examples/inference/cpp/transformer_example.cc index 79413bb0..68f2f101 100644 --- a/examples/inference/cpp/transformer_example.cc +++ b/examples/inference/cpp/transformer_example.cc @@ -8,16 +8,32 @@ Example of how to run transformer inference using our implementation. 
int main(int argc, char* argv[]) { std::string model_weights_path = argv[1]; - int max_batch_size = 8; + + std::vector example_input = {63, 47, 65, 1507, 88, 74, + 10, 2057, 362, 9, 284, 6}; + int eg_seq_len = example_input.size(); + int max_batch_size = 128; + int batch_size = 1; + int batch_seq_len = eg_seq_len; + + if (argc == 4) { + batch_size = atoi(argv[2]); + batch_seq_len = atoi(argv[3]); + } + if (batch_size > max_batch_size) { + throw std::runtime_error("batch_size exceeds the maximum (128)!"); + } + + std::vector host_input; + for (int i = 0; i < batch_size; ++i) { + for (int j = 0; j < batch_seq_len; ++j) { + host_input.push_back(example_input[j % eg_seq_len]); + } + } auto model = lightseq::cuda::LSModelFactory::GetInstance().CreateModel( "Transformer", model_weights_path, max_batch_size); - int batch_size = 1; - int batch_seq_len = 13; - std::vector host_input = {63, 47, 65, 1507, 88, 74, 10, - 2057, 362, 9, 284, 6, 2}; - void* d_input; lightseq::cuda::CHECK_GPU_ERROR( cudaMalloc(&d_input, sizeof(int) * batch_size * batch_seq_len)); diff --git a/lightseq/inference/kernels/gptKernels_int8.cc.cu b/lightseq/inference/kernels/gptKernels_int8.cc.cu index 6fc89d93..c0c24611 100644 --- a/lightseq/inference/kernels/gptKernels_int8.cc.cu +++ b/lightseq/inference/kernels/gptKernels_int8.cc.cu @@ -658,7 +658,7 @@ void ker_topp_sample_i8I_launcher(int batch_size, int batch_seq_len, template __global__ void ker_arrange_qkv_with_cache_i8I_i8O( const int8_t* ori_qkv, const T* qkv_bias, int8_t* new_q, int8_t* new_k, - int8_t* k_cache, int8_t* new_v, int8_t* v_cache, T* d_v, int batch_seq_len, + int8_t* k_cache, int8_t* new_v, int8_t* v_cache, int batch_seq_len, int dim_per_head, int head_num, float dequant_scale, float quant_scale, bool in_col32) { int hidden_size = head_num * dim_per_head; @@ -702,16 +702,15 @@ __global__ void ker_arrange_qkv_with_cache_i8I_i8O( if (blockIdx.y == 1) new_k[target_id] = new_val; if (blockIdx.y == 2) { new_v[target_id] = new_val; - d_v[target_id] = float(new_val) / quant_scale; } } template <> __global__ void ker_arrange_qkv_with_cache_i8I_i8O<__half>( const int8_t* ori_qkv, const __half* qkv_bias, int8_t* new_q, int8_t* new_k, - int8_t* k_cache, int8_t* new_v, int8_t* v_cache, __half* d_v, - int batch_seq_len, int dim_per_head, int head_num, float dequant_scale, - float quant_scale, bool in_col32) { + int8_t* k_cache, int8_t* new_v, int8_t* v_cache, int batch_seq_len, + int dim_per_head, int head_num, float dequant_scale, float quant_scale, + bool in_col32) { int hidden_size = head_num * dim_per_head; int batch_size = gridDim.x / batch_seq_len; int batch_id = blockIdx.x / batch_seq_len; @@ -754,7 +753,6 @@ __global__ void ker_arrange_qkv_with_cache_i8I_i8O<__half>( if (blockIdx.y == 1) new_k[target_id] = new_val; if (blockIdx.y == 2) { new_v[target_id] = new_val; - d_v[target_id] = __float2half(float(new_val) / quant_scale); } } @@ -762,12 +760,12 @@ template void ker_arrange_qkv_with_cache_i8I_i8O_launcher( int batch_token_num, int hidden_size, cudaStream_t stream, const int8_t* ori_qkv, const T* qkv_bias, int8_t* new_q, int8_t* new_k, - int8_t* k_cache, int8_t* new_v, int8_t* v_cache, T* d_v, int batch_seq_len, + int8_t* k_cache, int8_t* new_v, int8_t* v_cache, int batch_seq_len, int dim_per_head, int head_num, float dequant_scale, float quant_scale, bool in_col32) { ker_arrange_qkv_with_cache_i8I_i8O <<>>( - ori_qkv, qkv_bias, new_q, new_k, k_cache, new_v, v_cache, d_v, + ori_qkv, qkv_bias, new_q, new_k, k_cache, new_v, v_cache, batch_seq_len, 
dim_per_head, head_num, dequant_scale, quant_scale, in_col32); } @@ -776,12 +774,12 @@ template <> void ker_arrange_qkv_with_cache_i8I_i8O_launcher<__half>( int batch_token_num, int hidden_size, cudaStream_t stream, const int8_t* ori_qkv, const __half* qkv_bias, int8_t* new_q, int8_t* new_k, - int8_t* k_cache, int8_t* new_v, int8_t* v_cache, __half* d_v, - int batch_seq_len, int dim_per_head, int head_num, float dequant_scale, - float quant_scale, bool in_col32) { + int8_t* k_cache, int8_t* new_v, int8_t* v_cache, int batch_seq_len, + int dim_per_head, int head_num, float dequant_scale, float quant_scale, + bool in_col32) { ker_arrange_qkv_with_cache_i8I_i8O<__half> <<>>( - ori_qkv, qkv_bias, new_q, new_k, k_cache, new_v, v_cache, d_v, + ori_qkv, qkv_bias, new_q, new_k, k_cache, new_v, v_cache, batch_seq_len, dim_per_head, head_num, dequant_scale, quant_scale, in_col32); } @@ -789,16 +787,16 @@ void ker_arrange_qkv_with_cache_i8I_i8O_launcher<__half>( template void ker_arrange_qkv_with_cache_i8I_i8O_launcher( int batch_token_num, int hidden_size, cudaStream_t stream, const int8_t* ori_qkv, const float* qkv_bias, int8_t* new_q, int8_t* new_k, - int8_t* k_cache, int8_t* new_v, int8_t* v_cache, float* d_v, - int batch_seq_len, int dim_per_head, int head_num, float dequant_scale, - float quant_scale, bool in_col32); + int8_t* k_cache, int8_t* new_v, int8_t* v_cache, int batch_seq_len, + int dim_per_head, int head_num, float dequant_scale, float quant_scale, + bool in_col32); template void ker_arrange_qkv_with_cache_i8I_i8O_launcher<__half>( int batch_token_num, int hidden_size, cudaStream_t stream, const int8_t* ori_qkv, const __half* qkv_bias, int8_t* new_q, int8_t* new_k, - int8_t* k_cache, int8_t* new_v, int8_t* v_cache, __half* d_v, - int batch_seq_len, int dim_per_head, int head_num, float dequant_scale, - float quant_scale, bool in_col32); + int8_t* k_cache, int8_t* new_v, int8_t* v_cache, int batch_seq_len, + int dim_per_head, int head_num, float dequant_scale, float quant_scale, + bool in_col32); template __global__ void ker_attention_mask_weights_i32I( diff --git a/lightseq/inference/kernels/gptKernels_int8.h b/lightseq/inference/kernels/gptKernels_int8.h index 1e1822e0..007e8e9a 100644 --- a/lightseq/inference/kernels/gptKernels_int8.h +++ b/lightseq/inference/kernels/gptKernels_int8.h @@ -49,7 +49,7 @@ template void ker_arrange_qkv_with_cache_i8I_i8O_launcher( int batch_token_num, int hidden_size, cudaStream_t stream, const int8_t* ori_qkv, const T* qkv_bias, int8_t* new_q, int8_t* new_k, - int8_t* k_cache, int8_t* new_v, int8_t* v_cache, T* d_v, int batch_seq_len, + int8_t* k_cache, int8_t* new_v, int8_t* v_cache, int batch_seq_len, int dim_per_head, int head_num, float dequant_scale, float quant_scale, bool in_col32 = false); diff --git a/lightseq/inference/model/quant_gpt_encoder.cc.cu b/lightseq/inference/model/quant_gpt_encoder.cc.cu index 82d2cae8..26f1b5e8 100644 --- a/lightseq/inference/model/quant_gpt_encoder.cc.cu +++ b/lightseq/inference/model/quant_gpt_encoder.cc.cu @@ -587,7 +587,7 @@ void QuantGptEncoder::self_attention_with_cache() { _batch_token_num, _tw._hidden_size, _stream, _int8_ffn_out_buf, _p_device_wei[_weight_offset + 3], _int8_ffn_in_buf, _p_d_self_k_cache1[_layer_id], _p_d_self_k_cache2[_layer_id], - _p_d_self_v_cache1[_layer_id], _p_d_self_v_cache2[_layer_id], _p_d_v, + _p_d_self_v_cache1[_layer_id], _p_d_self_v_cache2[_layer_id], _batch_seq_len, _tw._dim_per_head, _tw._head_num, _enc_clip_max[_layer_id * 12 + 8] / _quant_range, _quant_range / 
_enc_clip_max[_layer_id * 12 + 11], true); @@ -602,26 +602,12 @@ void QuantGptEncoder::self_attention_with_cache() { CUDA_R_32I, _batch_seq_len, _batch_seq_len, _batch_size * _tw._head_num, CUDA_R_32I, CUBLAS_GEMM_DEFAULT_TENSOR_OP)); - ker_attention_mask_weights_i32I_launcher<_DataType>( - _batch_size, 1, _batch_seq_len, _tw._head_num, _stream, - _int32_ffn_out_buf, _p_d_c, _p_d_real_seq_len, _atten_scaler, - _enc_clip_max[_layer_id * 12 + 11] / _quant_range); - - /* ---step 3. new_q = correlation * v--- */ - CHECK_GPU_ERROR(cublasGemmStridedBatchedEx( - _hd, CUBLAS_OP_N, CUBLAS_OP_N, _tw._dim_per_head, 1, _batch_seq_len, - &_fone, _p_d_v, _AType, _tw._dim_per_head, - _batch_seq_len * _tw._dim_per_head, _p_d_c, _BType, _batch_seq_len, - _batch_seq_len, &_fzero, _p_d_q, _CType, _tw._dim_per_head, - _tw._dim_per_head, _batch_size * _tw._head_num, _computeType, - CUBLAS_GEMM_DEFAULT_TENSOR_OP)); - - // use v to save reshaped q, since they are in same size and v - // will not be use again before the next multi-head-attention - ker_arrange_atten_output_i8O_launcher<_DataType>( - _batch_size, _tw._hidden_size, _stream, _p_d_q, _int8_ffn_in_buf, 1, - _tw._dim_per_head, _tw._head_num, _max_thread_per_block, - _quant_range / _enc_clip_max[_layer_id * 12 + 5], false); + ker_fuse_softmax_new_value_i32I_i8O_launcher( + _int32_ffn_out_buf, _p_d_self_v_cache1[_layer_id], _int8_ffn_in_buf, + _batch_size * _tw._head_num, _batch_seq_len, _batch_seq_len, + _tw._head_num, _tw._dim_per_head, float(_atten_scaler), + _enc_clip_max[_layer_id * 12 + 11] / _quant_range, + _quant_range / _enc_clip_max[_layer_id * 12 + 5], false, _stream); /* ---step 4. new_q = ori_q + new_q * output_wei--- */ cublaslt_gemm( From 9e4003753da95c2a3b82d8c010b74ec29037b938 Mon Sep 17 00:00:00 2001 From: "weiyang.god" Date: Fri, 29 Apr 2022 00:24:58 +0800 Subject: [PATCH 47/49] optimize quant gpt arrange-qkv kernel --- .../inference/kernels/gptKernels_int8.cc.cu | 47 ++++++++++++------- 1 file changed, 30 insertions(+), 17 deletions(-) diff --git a/lightseq/inference/kernels/gptKernels_int8.cc.cu b/lightseq/inference/kernels/gptKernels_int8.cc.cu index c0c24611..286193f2 100644 --- a/lightseq/inference/kernels/gptKernels_int8.cc.cu +++ b/lightseq/inference/kernels/gptKernels_int8.cc.cu @@ -719,41 +719,54 @@ __global__ void ker_arrange_qkv_with_cache_i8I_i8O<__half>( int dim_id = threadIdx.x % dim_per_head; int target_id = targetid_4dim(batch_id, head_id, token_id, dim_id, head_num, batch_seq_len, dim_per_head); - int8_t new_val; + int2 new_val; + int8_t* p_new_val = (int8_t*)(&new_val); + const int2* p_ori_qkv = (const int2*)ori_qkv; + const float4* p_bias = (const float4*)qkv_bias; + const int2* p_k_cache = (const int2*)k_cache; + const int2* p_v_cache = (const int2*)v_cache; + int2* p_new_q = (int2*)new_q; + int2* p_new_k = (int2*)new_k; + int2* p_new_v = (int2*)new_v; if (token_id < batch_seq_len - 1) { int old_target_id = targetid_4dim(batch_id, head_id, token_id, dim_id, head_num, batch_seq_len - 1, dim_per_head); if (blockIdx.y == 0) return; - if (blockIdx.y == 1) new_val = k_cache[old_target_id]; - if (blockIdx.y == 2) new_val = v_cache[old_target_id]; + if (blockIdx.y == 1) new_val = p_k_cache[old_target_id]; + if (blockIdx.y == 2) new_val = p_v_cache[old_target_id]; } else { int qkv_index; if (in_col32) { int row_id = batch_id; - int col_id = blockIdx.y * hidden_size + threadIdx.x; + int col_id = (blockIdx.y * hidden_size + threadIdx.x) << 3; qkv_index = row_major2flat_col32(row_id, col_id, batch_size, - gridDim.y * 
hidden_size); + (gridDim.y * hidden_size) << 3) >> + 3; } else { qkv_index = - (batch_id * gridDim.y + blockIdx.y) * hidden_size + threadIdx.x; + (batch_id * gridDim.y + blockIdx.y) * blockDim.x + threadIdx.x; + } + int2 ori_qkv8 = p_ori_qkv[qkv_index]; + float4 bias8 = __ldg(&p_bias[blockIdx.y * blockDim.x + threadIdx.x]); + int8_t* p_ori_qkv8 = (int8_t*)(&ori_qkv8); + __half* p_bias8 = (__half*)(&bias8); +#pragma unroll + for (int i = 0; i < 8; ++i) { + p_new_val[i] = + float2int8(float(p_ori_qkv8[i]) * dequant_scale + float(p_bias8[i]), + quant_scale); } - float tmp_val = - float(ori_qkv[qkv_index]) * dequant_scale + - __half2float(__ldg(&qkv_bias[blockIdx.y * hidden_size + threadIdx.x])); - new_val = float2int8(tmp_val, quant_scale); if (blockIdx.y == 0) { target_id = targetid_4dim(batch_id, head_id, 0, dim_id, head_num, 1, dim_per_head); } } - if (blockIdx.y == 0) new_q[target_id] = new_val; - if (blockIdx.y == 1) new_k[target_id] = new_val; - if (blockIdx.y == 2) { - new_v[target_id] = new_val; - } + if (blockIdx.y == 0) p_new_q[target_id] = new_val; + if (blockIdx.y == 1) p_new_k[target_id] = new_val; + if (blockIdx.y == 2) p_new_v[target_id] = new_val; } template @@ -778,9 +791,9 @@ void ker_arrange_qkv_with_cache_i8I_i8O_launcher<__half>( int dim_per_head, int head_num, float dequant_scale, float quant_scale, bool in_col32) { ker_arrange_qkv_with_cache_i8I_i8O<__half> - <<>>( + <<>>( ori_qkv, qkv_bias, new_q, new_k, k_cache, new_v, v_cache, - batch_seq_len, dim_per_head, head_num, dequant_scale, quant_scale, + batch_seq_len, dim_per_head / 8, head_num, dequant_scale, quant_scale, in_col32); } From dd71c87bd77e48cc9a6327055b9751cf53be9e9d Mon Sep 17 00:00:00 2001 From: "weiyang.god" Date: Thu, 5 May 2022 12:50:16 +0800 Subject: [PATCH 48/49] modify PiPI spelling --- README.md | 2 +- docker/README.md | 2 +- lightseq/inference/README.md | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/README.md b/README.md index 741e9046..f5a5a20b 100644 --- a/README.md +++ b/README.md @@ -72,7 +72,7 @@ You can install LightSeq from PyPI: $ pip install lightseq ``` -LightSeq installation from PyPi only supports Python 3.6 to 3.8 on Linux for now. Consider compiling from source if you have other environments: +LightSeq installation from PyPI only supports Python 3.6 to 3.8 on Linux for now. Consider compiling from source if you have other environments: ```shell $ PATH=/usr/local/hdf5/:$PATH ENABLE_FP32=0 ENABLE_DEBUG=0 pip install -e $PROJECT_DIR ``` diff --git a/docker/README.md b/docker/README.md index f29df5c6..375f5f4e 100644 --- a/docker/README.md +++ b/docker/README.md @@ -1,5 +1,5 @@ ## Dockerfiles of lightseq -Pypi: for publish python package. +PyPI: for publish python package. Tritonserver: for publish tritonserver diff --git a/lightseq/inference/README.md b/lightseq/inference/README.md index 53f1b27e..0819db21 100644 --- a/lightseq/inference/README.md +++ b/lightseq/inference/README.md @@ -97,7 +97,7 @@ Nothing's gonna change my love for you. Drop everything now. Meet me in the pouring rain. Kiss me on the sidewalk. ``` -LightSeq installation from pypi only supports python 3.6 to 3.8 on Linux for now. Consider compiling from source if you have other environments. +LightSeq installation from PyPI only supports python 3.6 to 3.8 on Linux for now. Consider compiling from source if you have other environments. And there is also a quick start for huggingface GPT in examples. 
From 8c4b81e24be5332858084f8d3088141db34fae08 Mon Sep 17 00:00:00 2001 From: "weiyang.god" Date: Thu, 5 May 2022 13:24:11 +0800 Subject: [PATCH 49/49] fix gpt memory spelling --- lightseq/inference/model/gpt_encoder.h | 2 +- lightseq/inference/model/quant_gpt_encoder.h | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/lightseq/inference/model/gpt_encoder.h b/lightseq/inference/model/gpt_encoder.h index 3ea74f6a..8ca2856f 100644 --- a/lightseq/inference/model/gpt_encoder.h +++ b/lightseq/inference/model/gpt_encoder.h @@ -53,7 +53,7 @@ class GptEncoder { std::vector _h_sample_id; int _h_unfinished; - // gpu memeory buffer + // gpu memory buffer _DataType *_p_d_query; _DataType *_p_d_k_cache; _DataType *_p_d_v_cache; diff --git a/lightseq/inference/model/quant_gpt_encoder.h b/lightseq/inference/model/quant_gpt_encoder.h index b0327214..e399d1bd 100644 --- a/lightseq/inference/model/quant_gpt_encoder.h +++ b/lightseq/inference/model/quant_gpt_encoder.h @@ -57,7 +57,7 @@ class QuantGptEncoder { std::vector _h_sample_id; int _h_unfinished; - // gpu memeory buffer + // gpu memory buffer _DataType *_p_d_query; _DataType *_p_d_k_cache; _DataType *_p_d_v_cache;
z5aK+M?s^;Am2PG}gP+)nlpfa-dBLdU%T!v<_iJqtR@(rep)Z^@SiTcC!;Le?Dx^lg zyUp%YkH{8tojNDyyMAMMQsMdWxs_A4f_C#p$_YGeN8}$H#7ov4;(EfFStz<*pT!kq zT8U9K=cU5f_vhYa;J}?M<%nAI?0$EczKMF`8X0Ht&ocCGl4-xB=Z;g`UOJA*n-ZtB zFCZmR^`Uw}Z1(i~xa_Ij62Xenw(agaIg;sA;(x6h?r}nSX^tQhTONArT@W$dRA;_W z0FuwuR;k#N-f&QHYtj%Awul~SCYGL@<7lM^gkAws@V>9<(F(Ih_={!LQGuI`zCmuF z7P7TCD9Qw6a;)llXid^_ra3E6ou0~Ow9c{GeC#rE-yW72??g_XCGZPu*eK)dXGvyz zBEThbN1KOQntpPr4yrya~!*J3$61e zq6ACAPERiIrCmy9S~;IdK?`$C#HevIY)aftQ>e-p?G*kvzoI6J!XP8CVE^^_DP{X6 zBqUgX>r8gbtGGkwB~~z>=_QhAE`zX8qgnBZR-A)n8vo|GxQ|#h+?%CJcShO^5%gc! zFfS+hcPWl(Pj|k4a6{cp2*- zURp|TM~4Xtb8v-6WYO%pMV^IxIn(GFCyG}xl^u|IHaq!jmQW%g+@8Spf;^wyo_fT= z*Mcha&PUxR%>ZKURPJB8C#@U4@#DO}_MGAg8Ns zFJ%8h3KafjWp_j(1n8sfdEbef)+?9CN-jC$0j|9na5>#pBo(`C8#?blbF z4YACtf7B=n6s9Fh)C}{r{}I<`*T`NFI_Dby)V%AuR^9W^P#d~Kp+!I3gtt7Dd(_XJ z%l+TA=c{gv47Zsg(uuS!$&C8Bxma}L1lM{kq}ZjnO$)rczCzmmvoCu9t99jpXA0D| z@VQHNAuo#`MODn44BsNwzhs5sv7q2}nVBSIv+#blO`EKf1unKHOXa4vzI$Qtof~a< zyVbn`iF=%@m-Evb*{VPMa4nX;>wQy@I319Xl&0+sBZY2p4PQ9R?^jhU6TEr+*EZJw zUQ*a<59m1>@w3`z8$Re9?YHr1s$P*2b2Z)Yxee2CBHENk{J6RlzaESd z#d>)7?3aFg`r6jC_EA6rU%xZ_U$oO6?>B5c`9pI1h2a{G{Fu`Mb?NGNB=@I{hy0;s z)Nd3)(n)%ZfmcBd(4`}u*ASH!MxTji zpIhC(=Y?|^?J5#KtMV|B$T&Wt%M*euV&83k8l)-|C4%JJ5OGGRLnziW+D7kmtU_wK5Ol zvSQ6FQZDn)?_R^~ifTtH=k$-)^BKAXbyJO>Q%aXXrFFek+ac$*$!tmGoW!n7mv-CH`-PyOkGQ`?1$fMPzR-+;#pX~0c@6|F zY?X0O;B)_s8ao?@#HR5>V2w>dMP{L_ZoQwdX!R9(^KK`Ob5Ys_=j@mdk8A{3F)%Tt z6m##Zsvt$-Y&Hfz^uIhMucs$ZRRh6TaaUvjotT5u8Kd*;ExH+B7&^_KGFL{O_ndrJ z`RJE;%JN9oFojng*+eQD{>n>jc;7_54L?=D7$>`!(Kf)5JZv4_fIEYAF9HW1elsMYdhg|04GUO(+I+meHGI0@cm(v~gn zX+~TNk@pc37SH2!ekO(=+?PyA_~7I#Y8k^(TAQciv3NsOEkUiSq1#R<<((IM`&u+E z^fi{T)@(6tiuujU}cR=8~a+g*MN;(7!5+(fT-9Wp9>?)9n`$d=;_2(m!p^0vd3VoM_5^R@S$RiC3?awDo^=y=rf zu&B&w7@qPEZL27Vy2I`R;EsSzaRhfE3C8;MalhpiF{wpzvk~9;Y-Z`YacJ|NC&o%$ zq*o^FSw8y{P__v%4n~Z<+TFjvOJMgJjZ+caTZXU1^7Trl)gfNx&PGMk{o>jZqhyTy zC}Vh>;Bt||p+nJ)Vg)~#o*&R#LH{KF#xTgUeyl}D9-He=Gbt%-TbQa5kpVnJx9&;q znwiNej{Gs^@Q725Ob;tLLirqTsa^t+WI_S@dAPXf@MEogI<)X941V9scWYVmnl~n& zcotd4f8L);-*v#v_Gnd&v%0_5se{+vIzOm^d!s;>3Kjy!IC>;0KL;aXYewEG+6|-k zc>+I(;x^GAYnj+!N^fh41Za}S#!otRw?LKIo(G4nqtCOSCIfH<&a&~1l}23pjY@dL z(}7$%xpTRi|0zpB2GjDJuYYP4Y~a)s+f2K1H-Q!(?fC)Jbo&Q3RrLfIYM<9?Jvi_v z_UQR9%XB(SkeV z5Mx`_9N|ux;27+ce^teClY3x$PHMuURZ#gz$UNIzr&UsY8gx%OK)Vb113RVSXk~+L zPLQ08TrV-~NS)0iPGmYoa zkXV8j6@HLml=ptWk8|4pu8rKkP&G1b6)8NO#bv?a7=~e_q1W}PMxXO8cit^2MxbA+K)PEGvjJ;ln6KaG^ zvG$Z5QaXmNFbSkv?4A=;m&{8R7`y-rHk(Fj@+~nvv|pqjNRuSJ#CL>Kw$#8M&$W$G zDli_+E}Z*0kgn<_zgc|bV)_g57M02{EK4uD_|~)VK_}1Ovp0{e+BvE2$l+ea#t4g> z^k3hGni;ug%q+=;AekcPDZ?!`}uJov2H+j}1fMmna@ z>}p#Hd1S7j)7k|ywU3-X5~bnT3WTQ9+}8shD)w!ta>rpF z8Rz#fYlc04G9z{oi6JxR-vu>kvw--NZkDc?86|9H*PKGc#DRuZTy(;Xrde*TJ-o+y zK&A^POw+{hwArJk3hskht|HF456fdCffFvkat7{x$8VQ` z56pFMaX5Qxw(u@)mAV!)(B`TUC=K+nlTig&yH=lxnKVM_Blp>pS$HtJ)N0GGqiAxc zO2K1}WY67b2-UKPTV+hd;7DewRj^YDqLuzITKb*pEwM1um^!V1YB@9yR5#LScM$zb z67E<>?9{x=bwHS8Iy6qpk06fSS4y~MULk!pOh{6t1m$!)D!;UKc;f!7r8rbtL5nxI zA|8uez+_0BlX-iQhc!(uL`=6OZH(&Uih^+dH-P&5K63Wi&4E44Rr>DQT$s!K1zrY{ zp8Ip6r7E9`e=fylKO2J>#e3=;MQWq}zPWI0I!YYne(h74DK%_PHmo{n`e zXD2wExpTz+ij(i0=$VF2G|Zjz25!l~_NUpFjU6gZilN(e^){}fm)J0NlVuGsxuq-J zKP^7YWJ<63+-AanSIgb|rGkV46Ch$@>CouG2&( zGWlu74x!(_!91d7?jI$*+rR(hnVRmSJIbM}j? 
zT~~UhRB@uBpD!%JF<=SO)=s^8|0NhO9H4h&Ys`g)Thql|2||$OynNdH;cTz)M^618 zzYM61P63;_r?QaWSX!$nJH3fM)F|V6?0NaRRa|a2#<~QIp9@HKuhE>h_%9En>`y$X zCs5BI?`tPSosL>JoZU6)Qd%o&X_HUwzIALp{|<&ZVJ2_DCVIG!Oe(^n9Lkv19Xvpk z7|l(7#axGsvuTCMWD#ShhJNT@@_+J?>?EQ`G99dK+gz*-oeCEs$0MxPpSz9{8^JNP zvB|05pBb#F<#LIAb@ujp`{)NDM)Pm|Uj8PFJRrMilBhp&0^W~>dU+c1wG~EPrq4rV zUGg5m`N{v%hF^!x(nzn!_-IVO^0x^DZad7j$=im1o>CvY_?Im4c+bx=+QYA;^JsyL zY0==iGldc5uvHp343BuVIhzv+V*-6rywRR&{&jZ=&Kbt-UU|e2z^ffk_z4EJ=V$3nEbdt*$ zWn-)cP-N9cxgkk;l5PTR$dyX=*~sKk5(Q(kpb`+r+efRKTf^}GicP!hrUDLUf2TS& zS?Bj(vVliVMCOz9@omMZeeZQ={PqY~H#hXwr`W)JW`81hpF-6eXFbsd&&IFsdH5Pi zcIs3O8~sXeKR@~7+2OC%0%!5stCj3$S_dn;{vgN~9sK&Yf3At80LQKQdO5EB zut57bEDpAf-;lZWU9_He^pr0=Yqs}&Gn^*;?w04wo^M(wCPbvz=KpbOD%}c_6T`a8)D8Qx2^9Lihjwfh-G!ST08WY)Ct? zcERBkqZUluYrjD#ozEf0U7&BRis$(mvnboHZXz#FpJT;zd5IhU0?QqycV@~S)a%|z zjO*(fKZdo!Vry{wt)crkDI{HSbM4#VU;L-E!c&bggS>A^0}ie}h}^Rf zzLw`-v+%d*3uwfv+_P(RlFWsk62o8Rw!h#Ko>W;kZAI%YXa{C<^^57D=1YejyWAcT z0l&I0KL<3H?^38661JRey8KZ2OPG+R1sGcs+QzGtP`INCl09PxoyO8Ih}wxG$Cr zccofMHTmu;j#>%YZ(E^y2`F9}HSmn$LF))0Hdg^1ayH20z@*sai{Z9DMSy$e24%~8;Ft1BnOwE?g5KAWb_=)(sH;Ww4t?dDBQmgRTY2nB z$Q|zM>FF^pVBD!~51;K1iV|^!{V0NH;PLjpEFp=vcC~9bCQWH)` za}wUUG&hR7d3UkE#2BfxeIP3A*t^MR)qZLezsIm0FkJdx9+&qmC&7`c(NyD{_-Qkl zQOo9*p;Pn=#ib>L`!mv1m=`ZjB95X+UNebc!EtPWM3jftZb;G^*x4*ch98i|5O_-i zQZZ3%Xn*n!4|Fw6cLfWmrZ*&vo?Ua~8Zs$rTiURmz$Usr!7uR4h%Wic>5AF5SQKhI2HJ#E?NpYr zlp>s&R!|sOrvP{oE>%2aS73JfZSco?Guk->wba|jb{b%bot{{bokA0Y02y1nUP9dW zHQoNX+-7J6k{3NlA8D~IUHoI|lR46B(4X=~%v@Qocm)}oh!W#1!i!y=!oij2KWY;N z*ts=!_II{f(H$Jr&-{NwA^%-xqR~ZoU(HtFCdY-xq#rRrFgK)eaL37iw(da!F%;HA ztgc7b`sD1Xi^+BS1192;!&}awn2zXUO(j2^R-tvx%Zvk?C7kGwZCUskp!ONlLqzE& zISTn~ztK~zkwtSqmU7PH7F%RggpSLO4+SJnO72Tw<)177KAB%gz%DfkPPZmzO@htc za+XeHTQCRdiw3%MsM4Pl%s7f9jld1(56eoid#?{JVc}2YeY04yoCSBoaA{JVyOK= zF@m(3rqwb!zxM0ujqmvz+k`u@l8*2cSPa{PcO29k_>I0VUiWc3^aaFhz~u^C$DRwT z{!M3lCwG1md*9{dMz#{&f#>sfl0J9@=V@3ON7%KwlNoV6<}gUQP2*Ixjovt)pw!)n z_ezjUY9YYB{JgVMKd3WO1B~c7uD?Uo<8+~7aF=nBd%Vps>ZB|H3VA=A`3_dqQGeOM zbp+>tFi96VKK(GV?aLK*v&3x5KBTAo8lCIBi6)ncq+t{9yVEyB3Z&u?Bz*hbEib_L z^pion#Efr;Y{E1WF!O&n%+7(hcq_0rz8^|Ii4LJzp!CRDHZ%bF#+AV9k5M#|9p|Z{ z11P-k2eNIRNW)CWCZzpZ;eOd))8y>D)fMBoz3N}jrEcKr({6`e^a+6Ym*}OB!$6k? zj4yxB@!+MqK!cq*xTb%tf_DQ@%s%2g0jV@$ghq2)Fc+0qoHyVmz19e%-mbq{@Vn~r zZ$8=}$sC7qRouJ05!@=o<(LXm?;dUB2hjK?fN}%nZcuOcXks{5g4Vkw^CDynu@kNp zzN~5&fD_>^UKP4=KW8nM7xg<3O{XCE!V(dL#H!L%cp;lm2`JD%%ZBgAO;Pw(stqE~ zI~ZTJR-I?(pYs)K#6-&j7i%NF=-Ugf(4^jHc39#pj%qIFTNccX)VSc&QZO-Amw>JfP5R`%TML*J4OlTH=!20Uhz!pPa|g=>B&GR*N3SnBk@@4=PD zC>10g_c@S3=z!_#U9>ib&M98dBh0cMVCaI~aIfLx&m<<|Zf{WW4XVFEv8Ov8F0a5` zqB4t+zdO0B8gX-Ne;B~L%q5Wv?0pPz41xyx${7~?a(uRNC;K@8pILvm{Qk+uHS3c2 zqv$st(huIgK~XD4a8F~?hAFn2LeaZxj{9#g^$$|s^Da6g(xvTN-FTJRl-V2UQOnWR zGXpZO&5TkmL`YYOiB-LsJYRX{>)r7;qYx@rUC;14uk21Y_Ydz{#-hw}L`R!+S4aKC zOzbiez;YBN#X~4RCcZ&U>|W!5;sTH-uZ*RfnbL=MQ2}G+n`21_09hn_ z0T=Du29x+_xFWJ%tz7%}`{NHDCY!cGPu-Y$Od*z~C}n%%hiw7#80Y?>h_atURh4<6 z|IOAADZ6M@*pbEdY!#vYD{O3Kq8DeF?bhfzU@&?aw!(Zi%r|ebhY0=PVwe z1{J{2-~P1f*xC*RS_fL(-0Lk+bbSMb_U}J-`+4`%uZnyR5jsxC3*xCbMO2_>r>_%l zpP-9!o1GE0|E#>mzsn8Qx%wF7YZ`-Xb{}NfpW%S&seHZY?=MIG5b%X| zF)xqAsi30!j3uZ|`H{m>{R7YvNj-hg`EJUo<2j=qE?o-WVtHkx%6*1r^PQV*QsBE( zh&9M~I}&zr2)L7U=XJ3dTKhdT3e9TO+Y{`>QvmaZ^gNSqO^I)E8^%HMyVs)BZ!Sfh5Eb_nsP=<_UUR;I& z_z(jG%<^i;3~N=7z&WHt4oWhxPTdJiXc@^57hk$G@SklE&F+4I{dys{w7sIphD*!? 
zXbDBkpUA;uor}5epMgT`Xo06ZMo3}7+vi@?UwT8k0`EFnKg^a;7VUVQx91w?PaNO- zA?teRIz6Fp+pZ}$`2+g9_ZEFO7AYFnj00$WO{r|xtM>2-<$Mt4J$xMxcOQ}6(A@!0auC95bcOS}*OAf67oqK=mFoZjP;Y%= z7q@>agTSp<)GNE=4h07#{qz7TgwxqhEIr{EjA#$)aKefs@jVH3{fYLqw-XZKK+t%n zm~C9eY&ZY2BR4S8T$cb8sCP{)H6R6Ue*J}rxm@;)j`yp3g8afAECb+#t%e(Hjt^}3 zXYKIPty@>^{)JgLpv=4>_e|WP)lyr{9dq`0_^xML3tQ45IGy^p(Iy*1InC09**_iimTIFDe{f{B0GH5y~z&a?3as>|2Lx6=NEGh4`8o@S@$tLz)^g zA7Xt%0ZQiXAtPQ2@wW$VAy;5`#T^lw+O50iPK^x#x^(TmOQx-T#d+r`@vOQnPNveN zc<)}(3>U{;SXNsxZ{)Dh;M@32sg#m&>h0rB$L3Cf*F9s;j+|Jfwme88^JcHhW^<7^ zbD?%A1o5}}rL}AhKUbnwBE`6Yj>iAgp57~3(&ze)u07-^&hJ!t1 z)#dr92O`OR`382icctEZDPGLH8=$(w-DIixlW3m!Ff^y$wB(J%^VPNweqwHE$JY5& zua)A-=H5Wt(Sa8Bbhb8{9@`#C)Du>~GZ*Fj^g1_xdatgpf_vHW527SeY7W#^9|x$|`(6y~boHyig+ zRWX5gDT;KfY+v2HehmNGPjGP!v894Wsq^G6qx5CX*^GgGg~%gRCV7^@^s-CQ4VjK! z=4WTJ)@^jZB9n8~qa+`y#qDzIsn4D(`yzn=kuj5H2uON7>CrvEEY8shZ=A<08YgSL|dT+U|)Sp#BdBa|`E(ckdxJC^! zx>rv3aM=_b^nZfP3Wa)}l5TROC%oL&#;YAbeXGPWDk?^@)(Txe{)N|D>io|bSp}5P zpOJ9G?3{&zUIm&TQygJCS|GrwSt$M#Q)mjA*cjVa(Oc_z(uXv8NuSQyBTu=@VNSRH z<%zj(7a&Dq?*&k!1;8bE1QrdAX`edO&_@vIC4l?>M(9iKosO!|Y=tLJAPzwZi4^RP z;ec?V`g63lk*{X};OLr*6yNmi!N{`-+`uXec!^Yi1nPMAjo1KYM6AhLB3_<5>r1EBN zl?vCVzh;+R z8DDdrC52YL{VpOjdXwskd@k&PBd;9bb*-U%&gv~VH%irqXlgzV^9lwO@pCsId2}DfHzl;Z49RGwzI&f}wMJ5rqx9tPl)->upTlYBYM`WsVYJ{^22 zLY$-0aP-j&YpkF?YdY=}VY>^;O${n|a}9!^-j8a}^D7SqZMX0GcSW%Xdk8KwPt?y; zS?c{17#S^3t;cTo*M=TvA{3#)qn2Xv+SNhPd+34b$1gTRr=q{&d?q{{r*Zx0@C8jqZ4ay`j1j%N6Fid_`QZ zQcN)c)V=`o+e3no$}xbeMF*Mw=g}_HKYj6OJ4>B7#vrlY1$e4fy714jLA6Bw)?|vF zl-rEr#{4S;YLC$Dj$}Ywu?A$y8-bbEdXW(CB|+i_qg@rlJhF>Ed$Tt?Si9b;Z|RDa zcr|TRs^(L`$H!L#ql)+o5=dt;V&_v9Ub^7(owwiCc}JT+^<)*eM*nwdZs7I+7NCN` zJJBDvx*czc58ukgd^E%SX8Tea&t7EG6z>l^_L$+$=5t3$A)T?z50NPq081XVM23j} zyAqdsAbLmztY@^o#ZX%(pRrwpfcA~JyeG+X3npL--Hhf6#k4&rfizh*=NyFWO$ac0T{$$_0w7u+l40g49~Nk8J6ahsuV9hA#NNCl zHMH&X*-_INRZaCsA%=+QyM-^^#s9ggN7&H#dbRtn6{R!It-6G5y&$vF$B8{hg3O7> z7eZ%Ph9pc?M{DXUr{nrO&e@Hp-B~(83*1JFs>mr~B#7Immexct{8Fghkjq+QGBe## zhI>gI#{QJyv@Y3qsul@yg62HUn@ldv9T|bXl=t+jFzYM;EsTL3gLdDs$R|*8eZ1!- zaym!$JOn`*5#|fGFIg?cOWI;I3I}ehiOCj&-B;APkvBt0#UK>V7RefgnpHWvYi~&-=!q7ox_Z!jI+In z92iULfZUzlO;>BY;e>yAq`a32xR(xn$B{dB*e5?ppvz7U+oJ$*_^KBw`z-joJp+R& z=T7p7%A3K~mwu*8-EcP?@I&rhAnrSL!aNkW?u2okC+Q`bD^j_@RCsFlD}6Gc+bS4N zW*e@hoOZRiB!ty!-rVl_s-Y?MPg@c|Sdt z^CBy3i2iMf;)`@HS&40<+LTysjjHWvBkHHkG#BeSc+(S+$v&WNUxGB%xB1Xr+vu~D zC;3%pzq-=jFM~fstMK-6iX&(1F_*hKuOcVZd&~*)5F=LiR7rQO_GnrfbOx`=J6JM- zjaW&;dkJ;&!e?%Fnm%aBv}sFZY4?L!B;X#VPk z%+q&}M72+$5Q<5<=xalCi=wkmeNdUKAW3~Sw{qi9vt-A2|x&#svn`Jyz~ zHZH`Y@!MWWed<$=v$^XY!!=Lq95;w|ewEn(<_p^(G#nY(! 
z&aFX;Sp@A>r2@8}|9nYR7I&5tNH_b5>(1OxDN^Owu{9Hsh~T-}Gt;`>e-~RkqtWCw z4!(8hQQmR(+RlsS7&2WuXiWVQe7QY>x|dC-EG!OM%BrB`xspR7)=%t^QpQi?Wv#(F zMi$p1wp-ovo4f{8tYyG^c+q;JFX@z;5L`h-@P-_zk^5tXEH@*bm*vJLQV2bwqBU>3dn)O7v&Mmlib;)Of zfRWgqP#TX45>6tQe-r-3CjZ!HmQHk6a=x#LO7y2h621C`>EdZ)=gQ?V{JRvX`dk)M zTHVzPjtv%$wvtajbrHg0g3>9Y<57t9v(H?wy{kFMcx@CT3q_22;Va&uXwk3IO}m4h zsl^jFWo-V&m;CgiCD)G#dIJY)(f)>?d5e&(u8!+*0R$5n;XsMXOA3v^>b37FMOVu} zk5tF`rE>f!rYCyu=dg5m+@gvCx#(iwhpe>7Y$S z7ItQW@w^?{^FO*DzNUM(dkI_U#POqW!3{r_t*3|VZ~ipXIv`ML+9I5eX70W{KAnT0 z0cc|;oJK(?Bnn+X1yy1F>EpX)K6FCv3#c!}biX2;cnL{s72aFYPI#`UC6mL>woPCl zTmS-V?v*-Eg5KDNTarGRfqiX`%l88V`Rq}JmQ-ucMA9QZPpY_R!^M_kyepVGNgE!j zvn2c7QkkKE?MdE4x*JzqaS!%-I&jEXwDA&BWvImibQ)<@IE)EU(!0-?8%X+AmSd;%`E{gHru+RlGaXWt97E8_am9)~qOnvG-FDK?4h2m(%p-lL1w*bwo zxhBY`y@Yu!`&_}d%_E*kyl9w!-2%h#uczolwIKJRZucd*5#4?gQ1VS~wT}Sn=PQcc zP1z~ihgZiZf)QkzX`Z*tj@N^sMlDS0;5~7Rb$H7JkUss)MK{y;@O7j;viZthe`K;H z^;5!*_r@X&HExih>OH-fJL2L;WB&Paxr}b5E*kazU8P5P6P=3WBC}!@_uKV@wdgU=5qrG;X{mrmAbzjlYe-%#& z!2~2K);<%&qn{wPV|LFPz;hzmTCl>pLrem>=&RFj2|KR?D4hgU!+F3=s&Si>69mog z!IJvI)}b(PmGNp4OkR;C{&^;7Uq`1q|-P zD0l1U=bv#yxmS^Mn~vwqRYsSgxwcmJjfUIdw5bnd3HP?J3&@z)%3~6zb|FVi3)p1c zGTbmdpU!MD4#>y~Jt7akrFJT=$YrOPSZOpLoE61-ZGW9RlXQ53QdV?$>Z#O|P0mWq z?`c=k8~Ttvv&7w_x7p%opX7+LjCxPqBgnJ8*$?ul9|UhkzYF|Aj+39^4hnyL#b$1T zi3Lz5K4&9d=L43`$k6Tf@D^;}Fg)97i82ZKQif z;B~^^`-wo<#cL5|r6D&U9=Z(V#O9^gU?#mC*x6ApT|r0l2fP5;_pwjzkvAS@pUedP z=w*ivOAroXO9zvD`^>L|m$yz>h^ZtVXvCU12<@h+Og?ko(l@~4SCAeRiDAKKNRNcz zOKvKX_J`kD+$J3&eicIdB(+XOSE6avy_A5Vn;&0ZL06F)Fp_}jqBmqvZ@cMyKj?By zj3Z-I{|z3EdCQ}pJxX2wwV6lC5}GLj3{byz%>}yE_yF~>Equ>1{E1z_HgRX@Y5Uv=r$kpYeM5(PT*Knca?~7>8a!JM zd2i}JPSDzqFT!<cHyDvCEd&gWxz zo`;$LR|$YIp_SNm%n*5;mDG`<$TeGoN)N{ zWNUMB!<6RI!-9qw2sC}#+o(DUYPiR3TcHjhdXFLwa=~fD0v++XhDG4tO-OJUXS@T7 zN{vF5GS%cJk$l;`@`Q=|?*M9N0NG|N#7J**_kG>njy+Q-%JlDIDE<@*$fi%kV2#!- zJ@PK=b0qePgxI)z(;X44puaA(swL%r)nQ1m}O6v3$GCt6# z%Vzi0@Z0ESR@Mq``dLfYBC@EX$b2V#2n78akBY%K9Bgdic^75Ax)G}7c3VYwMDsvN zhXEcRe^-h!yD?hIB){Q^dKKbYl?{NvpImr;CN95A@C3(JBJ|AKFsdLIG%Tojzr9AN z&+=PX;i!8V0bYsY5_f)xXCJN7g~;q&<>-^m%*cc-Q}?RV)j*s4r~__FLCwYfI$?A? zP7V8R&RJ*2qj`{F^Xl#vBagM<*?%!S!Dv2LNYHRR`@ZzY4tSY$AcJlifqZK!AkwuE z>Byd}S8UPVBXeRfL?$Qxg3I=JXcNr^;JU?SBRMzF!Q*B}~S56DDH3v;-!r4BYmYL`kyMHgUhb z_pUA?L+~mvc6b~iVl&kOn?N2U=p}$C%)I7i5rVQI^Ls8(mBXY{F{^EV6~!ey(Jvr% z+d*JZ5grZA7m|%j%%3i4NaaSdW6QSE=bD0gLJ-6O|TMFk2J>UaX^2|GK_m4{yU_nY5aRn1ocVX(#?83qxIKaa+|6SU-2MejnZ zH+!g&xE&E2fF9w_#c-~>vV~;ATBNmXZ z)fbR_`1UTNh&^Zqx4J3sn!qB6sX6_S>-OGh94J2NcbT7FT)Lrj^#dm>Sn;!YZw3J( zSz9%BS4|Q{u+@)u%e!ncef zcumL*)S_Cj%S}TLND_j)0b1|L9_a&Ex;R4QLaEXTDxvGFu2aw%i;{lYhy6SSG%lv5 z43k&LWV#g2l+VQ~zan?N@)r5hZYY7cqe*stm6zY(bZ1sGwBFg1cj^CGL*r*Tg;7+3 zu|#Qnzl$(cSzu-m0(^o>?o{u;3rnOA!dTJ|l-h(I!-Nmh$YE)HpgWU-Ntm^4L>=BE zZV|Cha&u>uPnr<2pGxR^s0yAxVv#-WQ3LP!l#eIFz>_(a|aH8Ftq}^50J{@`6?lJ)Vw6 zg~G{&2o~%#NUjo~NuhLI8NLtyZw%3wqJVg=_$k(SH!7a!5}TqS3~dLn^FHkPjxV&q z8;}PQwV_e%GxK*(`=q{Tr~IJSRFsEKk(%6~PlLjsauwof>HU6G4!b16(jDrKj>jm{ zoxFA8Lfww-g~|)n=bx#%f7whf`?;ei`DdfTtdgl)X+N5~OMM*(!W>hl`?`M}Z@gG? 
zJhwVn4Og>9+Yc7}$g+G*h7TtXd%qFYzXTPh--Q4GY(wf@QRs16&kYhX)4IJar(M#s z+@&{7AxP{UxtJbAcjOWub*Y1!(TP}t7Hl)nlXvki>}anRzOR(eayxVF(224bWrIJT z;v*j|WAGWzg07x}yVt=r`R~_d=@?*O0JzlH-&F91EfkSm-3LXZ`X{|d^&V~<|D0RR zJcNf%OfElv@&2#Hb&$9Fuh-*h*&tlLR%pJ#OTO^}khj1%Jy2XQiJEaDS$fE$1`4Xe zQ4vrMYXT?91H$QhfSc$*|MYdPpoFPP$L1igwPXN?!oFoy%3rIX>>(axp@>JGu1FJr z@u?wu4|SVTiGvSIE*I_hKp8xv&^h&A-?}-BDFJ6@Uj|?e)XyJ-=)^OK|KaX133S;Q zo_4X556&LAle8-6@%_Opn%S=MY+%xEtyx^|z*-%6k3Wwv#JD5@Z*%5N2M85Xkconm z(`A?g|0=kdFFv74&#v>|7ykd#3r<-w!)!XX{@{v(*W>^CS__NvbW6Z9r|2C-y(4d+ zkh0tplCX1P1R-@rz0=O-uUknwk3M(qoIRknbS0Oe!r-yur#PsB(YlVJW4LPR*1~uW zQgHM{U_|&Gy_lkdZ^9WY%84T!gGF??AQ-R-W4)gzfc}^=bb=~Zb zAR6_#CO!Y(Qj)(u)&G3J{`*hNbvSK7aZ)5>a0XR}Rul+Wx8L)KP3lT%p#SF`4(JyC z_dmqIeGh$gdWXSqGWwI+_ViNz7|+oCMj1AZaz&;y2cCuhe%t^3)5X658d~d%Qr%L>0W(FCAYi}_ z+^Xitr!+>se>XROZuQ}-yua6Cs{5S3yfLuyx3N;6mzlhGlkRw2_&+rY<|aKNLLZO& zTbrcC>4DCK-2E;_{K#LIfYt_e_^LXKU=)Qu-lGSUDMx8CBQ^hHle4{z&&Po&Nt^K5t#%*V?Z$j&J_QuhQbYO`f5vBql)pGs=a^AS2RcQ}p3q zyU@W$$`(gJd_>Zf&W!l4#~$8KOFTHGHBiLw_fef2m7aM)uC5O}^RE22X6RoZC^(wV zOnFD@l$%dbtCS{Z|KMZp`E*6p1zWm*U4!^|UEdEBFT|;gu(hl)m}%ZZU@E zBWu}TdH`Np5p?Y0D0@Js*NYyrzJKF*sqj&?{g($l|MR2%$JNogj2iQ=JrlU&W>?7t z+2r?4le33bACmH(f=lnuEmvc;iAEuf1KLePob5tymSw$A(v1_!|KDI=(+cm?z` z;t`eW*6Q~uaiPD&LGhR#hLaHU2PvZZhzYMfP5KG=FZCc%Ruo_k?OU?wgWm#!nxIFd zV(f`l(1`=za~6S(p&v^I^yIb$`an;HIY&=+J)KL)%|7UH{pJjt#T^h?XoHme=Qqqf z=%ohOx8OoJ|HUxDt_l)G1bu|;#3It@Zp9Vd?+~z6eq+pP{9Gv@KkkrY)B(NCov6R> zD<>HtnY0yv^+MKlZ!%y%?a9%gKsc9N*tub@klQtg_(Js8TI2`$wst)@M_6k1ir1pS z%xeJ?f$7R_Yh7uC0tZ10|K1=p2m*dU_(vxsh$FJ%YpIx2NthH1pvb-J1EpN=+!p@+ z_w=+#GO6dnh&lX}01UI!4sD)Veha+d^OhrE8`kp3P&oJys>8i(T|jwbs^Ci5e15O@ zZH7if`XBz}Pn(Nkz~kKjB1A%dUMHJuA&bRX0 z67L6{wy6AZS8(iMP5;xah5G^9EA10xzeX$9o*q8$?{VV`hf$q?+|Ov?v7Ou15E;N(9S*6N@gWZ{TY>vkG4Huhu3@0wWkFYoma8M=-(B1w;a8&VK;baaJ_BJ z#{9ST;wi8*vw!qs{+9J5lj_;FcSkYVmm0x!+-vVw3u6rM7h~a9mD$c2@0<9?WpB}u zcbe{q2JI(W-<&Dndgh>`nX7xa?8=}{d#%fZ^NR5nI#13l3~u+kuZeCu88pv*XG%)H zB{jLe;63(^q5B?D84tWx94zXH`MP+-tSf812#XaFHJ#`1j`|sl2xiX-{ye82X0KlLp&&jiiugqao%_!RCU1H zAj(ulqdv9`dYoRnqtBi_L!{%5%N{39a-2^eP)uM#7@!kq6)RO}Ae7WN$)$A6vXK6I zUt!vq?IYLx@25!L@syn6;ATF?Se-Ljl{;G+W5-EXe9Ln~nPK8$tFxQYE!pPdagip9 z^6M_@1uWg+8sEQ6zvRliaksf-8xE6c80PhgDgdM_Z<<+WFwi`gbNqzo;`=4oqu(H& zNkrtE820yC({eV#i7s8p;9vswxVaon| zmjwdTB)Z}y?h((cj?m=U1NL$4A{E6EXiyU%q9*UueJzDjsbApQxCa}Fz`2*UImfFa zB&sDL%e6M_Fagovi;N0-!OylqKiLV>8SS2&BR|JL*#(+-L)crB zIFH{gy1e>h(Aa=8j!jOiQkZFC=%}KP#GHvdx1|FUGY=X>J;D(*`%Za>LYrN!=>-w~ z7a>23D)7fT065HjCq=5$z5CZ$5-*Bg(vOo=VSpeGSq8nAN}wJ>2h0P{c7fx8gbj-L z$TdMX(*N#P!3u=j#Ns-U53W-U&wT+Dz(({Q4W(_8+Zg`#fS*qJTSxL)=n6|(1n+#v zG;<{KKg*4u)-M||FmLA&p~c!nTz{;l1j>GYS#{E)!(x_ZH7yghY{&4lR<_|mIV*ar zzcs!sV4ra=0my1ji`Z@4g6MXb@S-nYR4qdneJ*=~=I~z!3-7UD8s!a)ex~BHVZ?O+ zv*A5Lq%dQ{WG9qi1{P#BYBh>eQXUFB!1Cj{XfH)fyuZUj6^8*AhI1ouyu&`Fkq@X7 ztNwz*c;kPdv8U_+#bZ(f&!w}XA^Luw?gr33e3lL6?M;7+(|EZFS7Y>Cx4SN9Emi{9 z*>{Lrlv@lGeRAXn!J!`dKPw>so|Ee_FfjMVR0B7-jh`q%I+Fw2mH$|B$;kA%X*dQd zH<1L@Wei=Fle7&KA7wCredXSNJkJ+VNKlZo-^txMm@lIR*pxH7T>W2Y39k^;mbO0{ zYeE}AHuN%oD&&F(5c$7BX1H)r(1)z|XgoBX*Q0D4iXoxkw?{TvmrPB=@ut!4gIX34 zLs1Y$W&-poK5RLr{8DM0!HpORaa%jp5%DXp zAoiqJ$I-J#-s=%gYt=RHhn$lwSxF3`hFoo3Zv{8vsqxrAF9C zQU|+DwQ2(Z``FkgAq8|a7A0fPc-!cfXJ9<7&N>^nl>`NEFq(7xHKC|cj=I_Lka!+w zuZJ{crJ$k^xvIR;qwF>rW_l#m6-i&kvi&s7v1f08ILz?5H~U!IxnDXu)fbJFx<4MR zKJn6cg*llaJWg}XQj_{vLei(8fTx;k>ifXkbDv7a2)^Ep4t=iIX5f<(_%QaHSZ;~& zoe|l^4c{8w7#_L^>@{^ z0yhKTKtB~&_GY!p4%x}hF>=mzA1`U$^go$-%Pe5um{)V~NP;`>)*_wqlUE!K@=xZB z&usf%F_;xdxX`Vt@Lq4+tKUQfxD($V`C0Q)@dnUG-dZx-svoy=O!r$|_;xl<|ou967fq!5iaq 
zQkN-sBIYa|7&6H!lnJA*4U|4YoD#3DgDh4G#=PN*!mNJ+%r`#;l;M3?YntGMW0X*$ z`Kyh=NYUap7C?FOSx*UEWG!A4EQPxu3+Y2qub^M{8-4U6DbEdsH^7c+h6vEbyMP_&zwaC!IW&(=@b`hz z%K;Xo6exsvuetflf-@*h3PO$3T8@0TnY=$Co>iY5 z3L>s&^|{&=H$c!>X4gi|6*W-zV~IG9klE_VK%M@~CTzAMNELkDRQ8Tj#3BQ#*%xYK zJzM(akDiu~?E`=Fc??fYQODF?+7Yr>UO;a(L@@2p_npr~2%R8fIhgo{CH2 z!Ck8bLLVX8nt>n?B(Z1` z{tNZM^f3U2nXU9499rNV_zJwNJNVcipcnN2;$yTp)5tdtp>6WGGD=}C?t`vZlzzAa z3O+4WzDX$FfkpwDH=g~BwLYvW@j!RSX-*EvN}KGl916Rk&kL&zZoPO# z2|fH)aPy^uOG{gG9GPC{d~>_&<`V<-{qM*Sz4c76<^3gpGE{dd*gD+U1)dD~FKkfpI!2DEb-b+mx1Zg|&4aCgoBC?*;58%=M z*)-@sBkPX>GcMBIp4_3-O*@VrSMK#9PCA}EuB}FoP*2gv6z+FFj?XQVYY=*pkyB!f zOr(3ot>~wP+E4}(N8DbXADYM|e((@6IE|i)Prf482GL^Epv}2u3Yug~&yN}lHm#2_ z*>FZfyLwMBsf&9m4TBJ{9K?%ll)U5g$V|z@Lg$)CTTYU{O}mo$EQ>Y6FLv);yjl_e ziu(Qu#Xm9~^$3C;D<%+fxKHPN5tR=!cRWEh{=^$<4`dY$)(6#t9qHteCpKxwMUG!Z zFVyb``cB>vI6+D&jI?=z47y>+STOl_0H5-Kd5bbV^FT2Nco8Qn*$xCf_dp`U=qKn- z4_Oj{LroUj3qf1L07yJl4g@*%-wJ0R}{#a0b|2ocVbEQ(VRy%fK>SkSp!Lk-T=uSfaSl( z;Bn(M*?3(5^gv$`CK@&^io-u7mhW8Pg~LP7)Li#R79Mt>iaep0I&DG13o%TDJ7-)5 zN>3wEj=M((fk7+RZ*kY=ypBRr@bnad_WYgG-Y+cO*#YIHNya^x@cJEOlLm(Z50VI+ z!C!e@5%KrD=T)Uvb)fQ&(4u4Q6x+kVq4!a}`XG#K2>8&rGX}*)wgdJ7!~_ouDCY3- zi{}lV+qzdk&Qi}KHBvA`n@7_7oa{kkFMo&TE)jA)oILJ;h%a_kD4|#}(z3K{1H$o> z5f2&QlVXUpH8TSkr~Am(BeD4|UJ?Z&ai$F?sM`vleAJ+*)zvP|Uj{KqdfZ5UC?1@W zG76Da$V6%(*wxM#Tbe&`Qi1aQD?nAWM-H7O{Xoed&N%~+VQ>F9I5$(^rp|rSLiD1r zvHk)J-nmbyrz^Ye`=Vy0$1~M&k+b;^ToOP>l_#_TfeJk}-ZyH-A~8Ry|FS@6ao!IPA zBZ{Z>ML~pL3V2@6fp_RwxvTC?HC<9;=vv}GqwDtf{En-JVTcG-o1x(~y;y$rkp>&B zkZN>|>x626BGkUnxzuX`Ex&eU?=Oal5HKV_=YH$Oy*P!ArfJxk19OBJ%Qu|HYp^A0 zK(K>Dlv7u#q&p;}4FZ#Upj?SG%x0m7OT>^FkyxHNsuF0Kh4o06|xgPzL9& zM1eS@i>(&qe#TbSYt`RTSpwR|ATuah!=HtsSN?JVUGNjYz6SMNeccXQ#Vv+&fg$R7PN;u9w{txtb ze4iZb*{T*#L!&Gifd0#ca5;BC@?W04VH4v8xMRrFUw!62&}CC9Owspnvf(rg0SKjd z3@;9|&~L6wxQ=5aMM@=858SM;#u0-id(e(s4F%H^*W)Z&oC;(}8o|Z!X^}}O zui!kvs*I4O$p2ue@G8L=3IScmNEsMHz={kn5c(izSao0nivSXUn(~Y`fr#JoFRUua zB~~5B07p%~WM!edCv&Yw9TJ!e*VP7s$<1!h(&0OR|Jc98?lVpp{FGpl5UqR5E>E^F zLn4fS3HC4Y!a7c<*-l%82NzHU*?>xI==wCs$5tm`%%(wc)#RZd?donW31~>2?~vAD z_A^sI~J8e4DjFv8r*@H6Rc14uplL$6H)M zNFa6fL$>I1J^!rB&=^z_f>*n@PwNMD7May^h@!BqImga0P5Fp?JW!oLh37W;!zdvX z9sa8d$*pmB(n%PScKO8|5c6|3E-9~A`@TO zKK%0#mE9$1UI&WcbC6|o;)PGX?El)pwJcGoQwji}_e8Vy5?W4v>AY@ELN?x#1AB=S zNYs77KNNabT6YR!$J^3uK=RAYSM5fOq zIdZyJ+LJ8XfYp!KB-cX!dtr!V)|>*=an(h4U^1Nx z|C+?)njK2c97XcVqWmfsl85#N3l3tLS@#$T(l=IZJ(ZIq1i1I#GN^Jyg zL}wQjz1!%a;44G}NO&Yl&Is!TmcqGhmg`&zYFE0fs~}7q9hAC6BT&_SCHLF&#{H)g zzQ{##>^up^u71+zrX^w^kwhY<-h!Z#WtEkxQ-eQ@r>y!%S?8)ZX*~tmlBPc#oWx2I zLRrwt1b^ki?Qx^4H-5H^few#Y>^YWy8*m54RYKOuI0E&6jbpBtVr*QC!A`&2pNcM& z6paQ^^w}X{-Y<_vad-BSoweQqJ(pGeoi$F3&_4oci6KkRCY=F3h%Ryvln_&1qj%H% zM4lZN{q4c#vCt3Xi(gnsuZ_O02uun7O!#NRh~z`DoV^5wf&kD2SDamMm^kzcMj52o zTRdev+t2@HE@V$>#nV4pc-G>;MjUv`g?{X^7k?|Qh9sJ zO=+tVtoXZ<#w~M*eTD4vWRc&B z{nkMNpWZWw6Bo3bcEk8}MMsT_Y1`t^nhPexig3&QUZ>{;mU^WjF^8T3s-3)pF zF3`<>_I4ya?LZ}4ozA19W#8vWKg<{tonhzWCrkYX3*;3H3D{l3eW#zm-6CImO3oQa zZ$BW*R=j`&_fZa6uDLs{tix#E{0nz&;^+l@k~o{#y*n?6j4A3=VS5EGs^5uFgq(qu z;hRmT$qN-7vC`anrj0~?Vtm;vxS%pp2mbe=i$PEQb|5kBmmrrB#+J5QA9kJA+biv=ztkSKabsTE`beOR=OU;)T zT)3wS=*Z11R-=-zJ&)B_uNB8t>t9BO^Z6xVg$ae44H6N)zyN@iUP}>~YcocPTKc+298Cl-w;6ciE5I;jU)J8kJ?dN7(k1b1%27mP784dX?N%!=tmI0V7}#wO=t zsao1~l?Qr-Z>IoSJ5J+qY{_o^hLmAZ$U2r=A%kEdf;Il!oKA_TBjY_Kfo2lNwfO#J z^*!93L#`%bCq#+-P41J2$Xb}+C{?_-ha9$W0QZW-xsy&9-PiZLLf%iK8lz6F^YD8^ zRA}{$7Z;d_6(ORgTC@7ovfw<0Ra=M`=jRTh2fv~D=o{b+LC(0*#mu&tsf+wWoh4GQ zei+>se=}qS{H+pe1a(cxRiZbccWyV<(tJ#lzU~$$;D-?y*nNj$tKe?!g|1?h(1b}- zRQzboaw{8=e%Kcc8k;#dFatwW z7x!S~D%xL`PvUu%%SA}HTC4EfE|@hHVzzJ_E%qsXQZ7_{i@JNbx7(gPaG 
zTf3KeF2SL*Iog>YMSbs$Uy(kY!Vfcr3-=sM(1&*WQc- zwo-mH>cYs1oxAnUrVIT=Y{6$w1@+}gCP3TnWXK;oQ8*(Fg3J6k#|d|oSySp}%h z-B*4muT7?UhJL18Z(}@mi?T|soX{=&m-P=L<-DG!fPAsPe}2<*rc36+OiArPBYniF z+s;zEPt}YfWg#}Cy;GP-Ug;6E5e#oc-(j;2s=?69i!EwqRG1iQL=hIU39bueLVbGT z(cur!Q3oye<~ z?9+D}A6tHyQ?7FN&2RHv*7evXEDuRLW-1!^{#t7ymX&TwedbF;DyNz|t>MDP8+9w6 z$_`-h9S)uSM z^M`A$q|cYjpl7Pj*FGe6x#F;BOBIU8EPAQJBm^Z9`Rz4H2!=$wWtm;#>Ggq}6-Hf* z44~tkRKAT~4=3V2Lek<7tZXrZNhGfaj+R=)u7UC41UB)3YrjJ=YY#xh(p2)1qS3v|KX9M~8I#qq#j4y6)6yQ7-S1;v4%Va^L#{38`k=q-b z1ayVf?^z-}wvAZcIJ~`ug7k^9e)R4IYojs?p4Drkd*8u>_LKdxA_W;X`OzlzC^Y`& zu4gj2Em5UOSBt4Ced>tI98oI^W|R~8G_sfJU5@=%VquTkIP8$i7YAeayd+FpS)RBb zcXrQFQYudrcLsUY>f%wREzFKn;0kA@LDKTn*P7?wog`=MySyTz)U64tEQVmVEw6p{X zgP1Z7->>(u7bYa3%0u%Ji0FW9ueUzNhaa1#_mn{Lxo@mZ<4L+Ljz>coWNkLbDf_AF zzn7kwdCk#Ohe=Gk^WnFda}xq))Sq>=pck$RcnPUKVLI=p=y-8^DUuAa-!kjHW63gO zHS!YiY7jP9f0p(H5Br=qdBAn=9BHN9Z=NPG?TFwKFHRVpe8j2x&!h{Z_<57pP&Wm3 zPV3VTfK=`hs|rBkqivejV{hthPBW?5GT;@I3)j+}7H+&qIZHq>Z|U@--7)QpL&Do- zWKzKwwsPL}v1ENCYZemv$WtgCNr4XQWCWY})O_*7*0U5FqeE6keMIV13wP5cD!+d| zM9j7ihpjpl!BBTv4ALItbTxjr{ghkqlo6YDT*_rdL=Kh1KW7gSl{x&Y8oN~_&J?i) z4VVWruo094>t8XXp5$o3@p?RjVuz2;nDprReDyq#5xPSx07daT*sF!-G{H1DcjQEtDg{adI01FaML!Z7+divQdr?5AYe&j98 zBj@dyjzF~6IbE!`0tBvrSwn}6zHlQ|CEsK(>SDXoJnJmGGgo^ZISd32OJ}8qr$4b=^BP+^gsk@ zgRIhNDyvSQ@x7SVR0SQSxReGNq>B9FM7T^o$O=dCdYWV9 zcG9WyY&48UCikz?^NGH6LtxB=2;31CrV3%pPr@RngU*f$i+r; z*eFr*cf33K)v0LpEP~das!q=Pkdg4b1eIe6)0p z-Tp+ZkpNTj#*G;IfhhNP$KjAPc-xBAKW*Ms`dzj5is15#Koxb8E=D1_x3c0qcQgq+ zUh-E>wO_U_bJ;)UYtYyCa#T_i3wb9wJ6<}_)mD5^EgE3XD?MD%!+G^m1Y~G9R+Jxx z#1N5(iv-yhUrc)`N^`VEv)a8ZxS%D=-c47bs+r^~gd`g=$&N>?Ld(NEO`?`W4~&6? z;S-a#I~Qf>C1Mr{r$d?+q!MvABMj8jD(-{eZN4^`-zVAH$}MQah{Hk0`dfqjPh?V( zv14CQv-($8KHK0(E%WOi5+7jmb$6o9rRCSCp2>bBT200;PitvJudY9#ZzXt^$~I8Y zjRvbaJVwA!3?|>&uQK;?b|RkCEZlpKBMi*z`n+b+=dQll`w3M7f-9>r(gt;E2o8%g z^@3GAy*4;T`sTyy31oU}=#$GT>S}cjwVDBOY26Dl&7CE5-ls+0&7i-6K^3!Cl$~QhSn5QHS)O7{3ZztJJ^NA{G ziM#J(*G~>bx9c*f=4r@Bh8NfVzL=;HwD{SPZJU+O^l8{Px`lhLsAdoT)OmYl5Ab); zW}!bhk{OBpXbzZ(a3tqX)F4u(M$aNwE^|a#f-CrkutDt-YimYm1Go1&?lRjkrtd2DhNLmsx zfFQ~~hwxQ#cGJUL^3-5bQNq&Dt1MBZOUICIUA@|a{>|9rHj{Abx_NGB_B(65pH$c> zU?cj5;X&Oz+Zu41ESsjF&}Ht$Q#f?Zkc`h-Ebs||w4yLLx-{~%x$EamRNnIFVr=M= zcUQ_Eb60Kz!I$}DzZ*&}1)`R>A9_K~NY2Ou?UPf|f}ehdMjWZ9s-f3S{fRD(WVF*| zNikXAA*Sd66f>=~xo%JR#@0^9kpip@TZls49Y3*wk&t?+YVmC{tl|;7Sg^4@w|aWV zJJ!%Ha6y5@k$q0d4s@3&LNe4=d+M+au$VAht#5aV&86zJgi6E}l9{ljI)?WrJx(sl zq&{f&O|o!?$Pb4Pfj&PHbgsNRtVlxt)&vDUo|R}Ax?;DJ^vr^#v*qX5wI>f}j2Jvw zi+OgY(tdsFD56Z6x_q97%sHQV8fM9JEd?aaCiBL(WHjBNgU`S zUTohNy-6-_CKGU83Q*=dT)BHr#>Z~3Q6|@qT87LnPLf`k&_U)yx{#dWXHLftdjvY( zm(+5)uUV>XDM)|$9CMaL#%EESe0`IPKzhve#g(n@nXRf}D8M8H{SFr}l?*K2ExSD9 z#n1Pve?ee5_Q!pS6hTMT=1^}#NM%W?YLNyE_rw@aioloZ&v!qJ6X#9M!c17llK$p9 z488XfwUL4W2zZvJ3lvWjb$Bml3Fzpu$tXHXrYObHhoAYWD0*@Wy|hR2iowzSvz76> z9F1Evq0mI_H=ydjKc}^1;#8Tj(ZxAgdixve{d%A_(t0u&9%qB3=yws&~Tdg}J+Kk?bm7^YJr=N}Yh>&DVj0rPf7} zG-r?E?y7NONv0$QzuyLcYj8!#iGG2r-p%3SP`;_lsh#}gQDjo-u;Nn%|GQT-@loLl zr=#^Bo^$dWKij(``6F3_qMoSF=R3aExu+J;1g-};=j&u^+e!CF>KYgiabBtlyqz4* zXi>Mgwx*<6t~_q@dR3k&j< zzYARi?>YpJ(is+Tc&dEy-+w2o)Sa?gd4#BYMiJbr1~+cz(_jb|EoMWHR6eTgV0;M^z<$%aZ&$S#F!p%)%M;ON-&zlx62Ejvt!DKU^_kg1QF)&vV($9> zFSRtT%48o|cOZn8b8Ox74a21G>~<#g{d=_6us>RA<&vVT_LHan_z1}`1Qb`h)T_3; zXuJ#gFfUGftKppAhZDO~E^Ch)R0tZH+g0@USoUHprges2LU38>5sHS#tfp(n)|+j) zJ*ai@Qq81|1~a!PErDKMI5N{P=9A7bEfLr|Plo>kqFWms~X$az`u9FrxA@mT*h2zBN= znX{7HbQHWXZ~d2-Fs;w6rhA$f35?EIL=j?4Ye^ZeUPkec=^-_uR73tKWj4YI0y0~BOExjoX(SANtie}`uS5wPd@%8QVE^aZ9 zLV0@E4*S~cIX?ZI;u^gv7C{^y) zwbjxW&=d>Hj-4#)Rl(};APgXW5KU~2Ndt_6()^}A5NPBfXX<%>h9=&ilSqqkX$ 
z&7TrDlxW|m>;q717wG#J1};d3g>u6guUZ8l>f#BFmv7Q~`dHrrd9F}x?&?@inO(;Vr#=?hdkn``p zdfE>3vb#aD8Erf4P$>UZ0mV7PfxM27i|Fl$tw9nG`iTiVn zj88@6lD6bjH<|VAe5WA(wX~jN)`tuWe zsCyrOD$N;uWne^&=8HWd^##+A8GYsF4m8e#5iY}pzHT$xXcl)`{!La@pJMG<%EiRt8?6H*f9RsHvjhz3uSRb^=%1uop=@HR z|D1}8JL4*__R@*a=)_R6u7j=U&@q)3rCxyWyAdqm^cZnpV6q=W$tFWCx#{6WvB{O6 z-uR(XzY<~Q+V(JIvEjRJHeEL3uZ2=ltop^1lKrBYs}}iu>YKT*n{Biz-CieOHF&2K zypEYxHR}F8KAIS^G*Io3w5{Uz(6!d}Vd>JSXnxyKp&9e8isnj49J%&oTQ6$l)Dbhg z)2G0LP`)X-M@2D9o7co0QBA71S$ZU#xbqGZxb(_Jj*$Lr2p#c%QQ9(Ru9*FJlenq5 z_SBWjqy#po-Xs~ApMz^&4n>2Fvb_w)`f`b&izw|5)^mm{bD0^-{`t}+6a}ph*_9Jm z0->X~^ZolhE88QARL+VNe^<&i3bfOcG_MQaW;C`Q@h+YO?K7b&Gs&Sc>zb z`95LWCI^c94^HR2K|WHa{YKeN>PrUO{X>nCY~jHci2~^b4zW|`hA6qNSrTUX+sQyqZh^E4u20IJsr?;wkoz*^)Tv*irSkKQ&rC>_;?hg0|y=eK$7?xRl;iYTX#qZ9NqNZt#?`8{{ zYcTS)KjS=aQQod%ig!IO4kHE(INhjBCTjXwM7o;DS(;A|?pZq7Ojr4dhtK(id(_#V zwKHK8+W%k`;^eQ{$U4$|1Wik)09p>4$zE%m;XB5& zuN*&jQ0r`;<#5c}&YYev<3Df34#7ZLRwx$h){}+X!s@Nkr-hD0N!nuh?-+3U# z)pI%08AV%`J|A93JHy`f4Z;#`7b=RDV1v`z6ElxzI-7lJiLXWbf4eV1!+5p0qoOR# zDQ1}~sNpB68jkQ|Y_%Nrx#`%<0poZQ1gooYC{hdabsV894dAqeM%h~ z4)V1D*_@(BJ0Baph&n!gwyDD>zAg82HJ>FhL8-l=6TOuE>!rKOg2jpg>XE*L%4AiR zrMG<5n7=~h)0*vb#1pnvgvYQ=ony}}Q+2#aqw#X_Lb^M#lNZx}o#bL(xPX#z|I&+={TJT+OIv!M_s8A^>ILTH z`rnhH5aL=%%BpTiF;vevcQUHJy>a4X}qkiu_-a{Vc%HS0qIQ$}= z#FX+(b!oW2sHEX0t#Q0fdUV_%$z@q7LaI{)tv3@frKR(BQdQcV$*z}f>c$TI+UyPr zq%6qzRjRP;8hR4PJ+XaQ166X;pm`9+)0F9P!r?fXeBzt0G}UR*ui|4()s=<&MITgv`2u4wm1faI~K#oBYKr&*TRVuS0Cl z{6#HKpxJ7~mtW5)Rx1{jhq^vA<|`)_Pm#SH!?C~7Mr?)c=MB5J8aiv~V)jP*Cu_F2 zv*F13R|4&lT=5KB9ZTPij$is+K7MIoJ2*bwc{pC-NhLWUmjnyi$9l3BFIS??X)+ZP zFr91~E2Q(4*ss2nSAHNn>_(Bsjqz=~+$AYD^arYjmp@O22QIxn{QuDO6@E?k-`j!+ zL%L*0cejFcjV@`BW~4B>J4SbRBS`mQ$57yw!Bq|2CAznq9u*X! zZipN766ki85erM|&WG=(cldO2XQ5+s{Of3;=(f|W3IGJn$iB+PbuVq(p9n*KeD5xFOlQ>b z6M7bMfYPPZeY8mNa|xg?C3e>3KUKYj8vg3m`V`A99U_0dc%*b@9p%|;-ie1BS&~R# z8}b`HkuaM`a1unDyEj<|A&@dj2QtTZ(Y!_7945D8B$C$;!nB9Ex-wlClJ$njkxwBq z&!V2NS1VcY4dE`82LAwm2*FdfS|wt}LU{bu!iuxg>5zM|hf&p`l=@5lRoCR&uPcIE zv}c5i7NTYb7E(VXP-3Z%QN&zk1Q zf?g*kT1y~8%2QVceQfX+U|>}8=4cq9krrTe7wH5(Letv=V&|>WoXfc4yjVCW!o^& zEQri|zffA=dFG9J;XRIX(r|_x#r7AegWia7pV7%dsy7CYM0p?cIBC4O_NmQ~%7f2c z2SRFWK5O6jLq{wi(C=I5WXh0G&3+N^unv1~#0Xsk(jJ2)g>T|GexL3z-ZG9rhhLOXVsq$NVTL;o$R8PFG({V_{msfgxYfZ{7n=WBI z=Qa2YHh+a;B83?u?z>Dd>%Q|hCAxXZ>QI-TNSa*YiPzExW94DLJS9KzJCumTLd=X;{7VtGX>4hjr_#10~S?h(6d|#kJAiuz`lelr77<|87_G=$CcoWu8*arM^y>d z`(WheJN5WKout$J>AdSaQ^vcjQ2+5U_ver=8kwWG+ArCY(7IKbADjqw*6Odb7<3X0 z#Om^PDA(I`=;@^2-X#6(5`Hna5B6$oyP#O~+WdznOE&X%vH@%vH`0hu{l|IXOs~7a z{DtaU9(ef$6lj8nm~M74+ut!ToP-<~zbfG3x)@c=h`{o^%5|NU7g^LK;AX+~>Q(JP z#3wTKkM^uDkB|2Q7+|sVO5&pG9f8hKF0GT1FAP>>g?ly_kUsGlY2VW_hS2fDH~axV z4FOCEuZ|ZHRE^b&oD_PtYeYxfQrp(5^Vukt*&YI;CrAMiTOWcIclw+lqfBXvSzs!p zZP5&-FUMAwbC?b8g-X^^$0j^$%SY)uME zTK6OKI!u(+s*tlBK5M_!wbU{2XAq_8j)D6xr1b;_$F8F6Swl^@G25%U8zG$8t`zNTVPi?c^|wo9~6}p#__Eh|XR3N`+y5>bU=OF@+_3OBlVQ zx;V6Z3ck!%9s5Nz?3G&IGjQthSK9te32y~Hxy4=F>~gP zxjq7dzHc_-6&1LMxKPSggn#tSyv4s|Qm+8X*;~o?B5cOi+&;n|-gqnByMLrw$c;}}(C%g@`TLQCoElZntEN>+vs}8IvF9^^nKZ9nGJWq33HIM5ysi&MtUNUE{%hn9*tBNXAperb*!g< z?ON0tSP21-ihs}FmtM0?lF-*da(PU*3jWT3Yw+XD1rh`@gIM+zDtsSD+;2=JP`#!I zMlCl?Ga)wf7}}SmyA$qU&1@Jh`EZ;?-QZJstfwm4;`olwiq9)am>I(MDPg{fo$Uzl z{+D!(#2OK(rGCp(g?jfQ_YB|bTRCKz69p>Hi`=0YFL?OJo&DK%FNn@SI;$kq*JoOcY04a>?_z&R znNHm`SPjqde8Uv0+dwRlziU!XduX7m-gC^WK4BoIl7&5!(;*R!87Hza_V0C>82jm) zxPBMsQjBCY3qyeg)s=1R94JYN=hi-uX@ZUK$b<2c95%- z!-M@HT)lmLT+z(#3E-^R0(rWlRCehVq~@mCv7 zmJsVBfF#IVC`?l*$sWMci;)_0k3dLe ztX48u_do`s`T3Lur+lzuXY&3>eUwq;pB@$f7HS*5XQQ! 
zlfCA6GpNf7OILoOMA~u$(GaO&n24&O07H1D_4Px(h)RNf8-U=}T^Mr^CON4o6c;&i ziDFVi(bu=k4GP;Y`{#Uv{gpIo2$j_Cb5kPhlDhW^XA?us;4|CUP(8vRqvP@sJ@e~A zjO40J&miV9S#)|OF@wxWYhfp};Wmt4h>}e&BMRsPA>D9-10X#@$15&V${T+24n1g1 zHtI0t)g2zb=|7^uugoBjzUY^R1~xeKVQ1yPr#E8@bt6)7A8d<}i1>}kq!t$6#xJzf z?#fkLK>v8Dxu)u;dt-e1md6^f3I`@l6)AUyt7avNrsxJO|O`g7JF9TM(HsmZRATYLSlvLf{0!C065yu}wd20GPvV z@d-`D=#mLJ&u&=L&HEw8yq1Q0!k9m^mNaXmg;*Lz zXqos||h$;jpo&CX@z!I&WRt<@3wlMHs;)8{0 zh#L`wkyx*aE;(mi?RnnDpw>GEYBkr4=^SiYLYZ5hqpVRiF#$}DozWkcZujeT6(Xmt z#0{JdN1p48PHZk}6jfF1ei_AW=UD%3u<^1PkD`w5c zZvT!!m*C!|vBH5SB1#HknNH%z?@?eRp; zUiWa&(*S|sh&HN?$N%>NXkU~8zS4g&c4Q_nS^lNQu@#9&a3jjGuzJgHqf?dNyMxH0 z8=-h>4^3@ssM~CRYxAMQiSTmSJW_`(+qv8fDn1G>DuRGgC7QGE8u`1#-xabO-}e!J z6^S{Toa#J8gDJ%{q?|w_quMaq|-uu^`YZ6;^|+)(O+VHxX zwPe%)M0p*^neY6nRXTdBgpDKus6J+yu7QRIS!c&)C;8Uo_W9j0p9S>Jk_9nQo8#}I zLLK+AD)N#MCi0E@s$+>^Wn_Bs{#Rgyj+aA*^~LZl@ep#mVDfEWKCLY2NK+SN06-T z_H66+fICxyR)ODTyeM85Du;l~GL`_iZu0@~`g|^d*q0A_$qIRdf@-o^wz)rveROh- z`brE_<9Hi=oF4V#rP3D;BG$4AhLrp#!e84Z$0bw_!!l{f*X5(3R!?J!lUsXw$MCO; ziu!Vz+9>U5CsN^TNA)|8H$w5RTr9$9<1)3t_LNR|9n@hTBc-3sCTrOIF=s7j*WmeZ zeErTw+hp<|D_u)3q|R|mi3j;CjOsbAfBB=l7A}1#^zrXvR;_SF4xuUOY0mJY@%WS;xzqhE;PqS0k*GtNJ6Ud(BY zp^9TF>`d!x*$9nsR@Ut0$$7>2u1S&gGaPd-n}?BenQm5B7f_i_)De>n!T5ry`((YS z@MUxV>{xKRF~qyb2pdiQi9^b}?WxFRxfo;Iu+Hvl!Fw6StZ8aKe5A$#!hB1zFvZ+E zcbIT2z#0Op*azI7f4NLy``IndrA2NxH{D`=jg^dK5b-H_*rA20^(>tZrCI-+w>M(!xN4dW%Nx&wp=)v>ko9%H%lX*nk`eeDVBQx!)pli2Ff~U zkfn2#&~Cd#{BeVv%yEPF+B>Dr#qC)|bG$h&0a1M0oRy@Kva3CdK9I}5|66YX!1-yN zRcFpyQ=Xb&j~0?QpBz%-e(kR0twi5B>rGc8sB$?ew^OdU<#hLHa9*u$1fg}kNHwxb zhf(x|NKkAj!yg&}wdl>0+uG`N8Cld7N+i%)bWA5^vhjSHo=e z=Nck_P%%e0kjTTm znl1$IS>8LbHh%H3o@CpMQ4%=Lm zcSY3?5wD(73&NZ510K=_N)q%&p5szZtG(_Y;`+<%`K|y5b!yMYpDm%Hm#R>m>BB4K zKBfc#2ij8wXa3P6TKoxL6k&nNW3o>qI>Q4w9wjyebJ#|&(>l;b8d#Ya9;Ue_F^zn3TOJUZ%(pQ83vu_IR2}ZCjNPTHkY_sklCF@MV`(nDJ^et z_LIL=9w(z{h1lNr>{>InX2jfp7*d|q@-x?p!Hz%z%y2ymXqgfABZMtF0Hwi^rZFl>24kj?@h7_#rJj;L@|ro7)g3+Rq0tu?tG-SzR!WKY$43ADupce{5*4TD;8@i z6;ZM#8A+#+*6LL3k0nOJMXw8yU#mwqov>1(CU;FLm*7E0dDEyfw&lz*e@yI4K9oM$ z;x1GRBJM&U8M5)qWqR}-AY_Xn*A&<4c4qaFUQ+V(BQ0=`_+s ztW2;G@}ujZ=CxUgp@Y@{n=NhfnT-D~Pba{;p~@50#)g)rXFIj-<#X@_wyI<*%B*^~1W?=glOK=5&&F@|V7k3FOr=)><8KED$O;>4V`H zbHND)P;O@LVQj`w@@s#irRK@{t@vQ@MQ-_ziz<$-*hkZoJC`%bN0#LCzwME7OptCT zVHc=2KbzjBlM;yLyjU!PYcIAJR3PcNU$-^J9Oj#?$C#RtO0QlSDQ$mq7nCY>Hu-J3 z9^6cbsK_j-#0SWHYUDq15jEV*>l5!CFE9j6TCUTQ2`D)X_{S&HwS~9nEPouE^E(%r zpexycg9>(zJ$v=NhSU&&2p)SfcpH<4f#$zt9C-%#6bNk?emxA$N5%(Ad%uzANAZ1Y zmSdF-9Wk(+D|U!nHl$xmY-3R_!iy=l{jZxC@E)=Ud9w=P+(%@jgj;JeIO*zBNj}Lf zl|+86T+kyg{z1p{2HeiK`Gm^joO`Yko?|~H#78{DY!orENqeuB#*GRU&U2w*;-0+O z^4$IE`$4&jvDJ$>?(U4uZlRTLt`s4eYOrR6U8v0^OHZGBZIak3b!8%T9C!whwW3#| zSeBAZdo4)@nm7~V`0{otC|D4C&T(}IA}S9?Cl&h)b_|!qVUC%; z2NwB>3{&{nWzSW{AQ}`Y=qrL?e;yka$ui#Zopo|IBN467UVgbFlqEMNMcrh(Ym-ve zIzxaE22AZ{2VD5f0UST%`N4OrpRXK?gczv8mQ4eBaH)^O!<7OIH`rbJHf%+{nl>+J zZeY7;YIK*>2BK~}6vqE3ixlm<`z2fa-L^RYZ?&SWiCe*xS0{|+H>!-_lY8=fnF5z5 zY2FS!3KA!9uCuEb7`qpH_XcDVkzexWdcb^HFw^GL_i4GxrPk|GJ1!!ra=0vwjWpF| zhkEk<^I9vD|9*8x^4O2Pp)QbM<9^cGt1mm;BW{n8Sd$XQTPRKld3Krq<(d3xYH0*p z>rqY&B<3#TUMhbuXsfx_5Au~C{JV57YRh@cnC93kCtJ?;rR^NScABG{65OKuCtMdh z;_^8C|1Zd@=6~(VgE&3hFjcU2Y^trowtJiFSJQ0H9 z77DNvCy9T;_vHxH-ni9R|_m@k2t3aOeKGhnHyjK1pnGzg2XTC&1 zVd^xV4xP4TbNC-FjWH8as}9tlAg=}k^X--9`w``Y>MRs1z1rkmb@MB%bG7U0Ddpkj zb+*hYa24{*l0)sIsCZOZ1lZ!iej}zYhm5aB75M5BoPp!8&t!IKyI zCKuu&T{s_%JzQ;pRYqW_ymnmm7n#chx4%xA8zmnJt*efNM1P$s@M9ihOsdQ-j~z)B z_vg&F8|31yG|qIFRhx+mT3oaG@bMC&m@!Oopl_jdktWX!c)O&+#VdXpmIwmk77pm^ 
zA9A4t{F_+f_K)lnP5kgiYoYW6QOYP`AtTUtj|ZasZ-A)R{xplKe2*RkVaOdfmS5y77>s7FrllVv?RM)s5!HjfTk>7pL6!{e zR>y{D=2{}I<-;uxjFt5Ei{bVwj16=-^In4zY0ScHWBadLi0o9IKlARsJH@&C!d5u?fRphyRTTP?^Cp-8JUFyI|D=|- zLxQ8pSvcPRuMNC40N8&S7sy5PAG*+ld#<)KR*sq(Nc)?2-EEDYP2NEmFMAb4ZX@S& zcmmK8{M3e966-IwSp2*HMkmDHmbt3V?ZLtSIWhk9CNUc>Q+JgJt0PDg**uy&np07= z{6&1~L>5waH!raByVxPmNs@gy)lUjLr(c zLX%!}A+vcZ4N5)8HArP%8kvak;~l1oR&ehmsZeD~!R5u8iz{KDaC&Y0`cMz&p3L=( zlu7h_WcS&_y-uhJ>%zvFV$8W#@TJY2tykpcCQ(9#V5wQV)hI{HigT)(I#+472ZhX5 zq*9OR`AvpvE*04E3xm0?-!;908PT}=bOuka8U*cOx^4Dp1JI8x!2V&4_1~^& z6)%N-!zjbx9emHKrd=lgCH{(An;j#OeK9<|M0vBoRvh!EG^Ea5il#d1yTx8$#_Gw}H(`ZV&c&W2+nIG;&BH|oVc)q^m8uk7NN8&hnQyafz zeIAbQv_v}13j~(H{2Q0mk`z;e;rrKnNDRNI%y!nMMCm^Z5S2G4zKjZ#F1q3PY-_h; zvm>6Q3?Q;-1=+gJ8rC9!S{0lQLxd?`vj6$vKbSl@!7u~k$iE$LiFCV;%XMF7{w6^x zBT{hkQAJuH=R4JRA6XWz?eh&(CKcK*P4 zF@U-w4HBn~=KaxqPL0wwjl1>{OJb<5-hz~-m2w#7)*y8UbEz9}Vq{#Q+`mZ}1kQ;w zc>jA^&!2srM%%10zhE~hl(>l1!gth)1cn73B(Lqbj5Z>_!eLnk!WnBdy)%FTfD zgsaOIv}Wd9G-gGnaqi;dTyZ&n2c|xZifY`ibLZwu3^Ag~_3lci{q}pe6gw%hB)x%r zm}^e+Lp6$otdwiz%+g^~<)KAySUcq>?%!IL0zZ=TCf{yV+uo@P2onu)!_j;wVUW}w z(Oc6n_^#YFs$I;67IN|&Yhe|^0g@Ek9>qIT^Z6leU8=s7&fA}axjKi*!|wSu8PAvT*)kldTKGepPMOs;<^OAzuu!PZmf8`H z8aW=FCwGc#qkQNqZ_pXqeMPn#A*?$IH@=j>Dgp(6hOTsAyUF?OBXqblXRHRDQI*Ng zhYf_G8U>|7!1XK3%a#r8%DMfzVP~9exp(h(+Gg%%*u5;NnzAV@G7^SL#&ieh!cu8L zZu)wN%@V!!y_oGC>N`$`M>N&)oO*&HW`djfupCB%R(t=e7*q2>K*Sy=o+cIOXEaK2 zPF_+aSP)hyw7PN_2L@({NyGiI1jYV-NjoKL;|XdJ2pE5aZGHv)OTq$3GDh6^U$W-X-zo2jg^n zh9`qJ1(=P@__?#s-_AvVx$MNAKrhJKmr!(4%iVDR)f!@gpJ#ow`^O3n`Op0SoyC$i z>}YTKCm&!WAtJIV2rILd)m69Cm_iDZkx)@iqt8x&6Zun*+VhtnR(?8{P;-(PPApH z-%GVk8+(3f=Z)_DEg$`erB@u;3B{89$Sr29LNE*>WEylrK{w!#%q5`UsVRgS-_AhI zS({3k*3qjBIc->{(cn5~BjEaRC%p8X!ne&r%k*jy^r>f-#X~)rK+Pz-+XP}v!PMF!u3M$6v}n@OIY@ zj}9+ZH1AxG86_h~(px!bnzas!J=d3-8=-#|w0y0%InD-a_JWQrLF`@mJ9~2r!=193 zYSAl)wQ5I*H}eDS9}Sgz5AriPfWJ~vp{>tldTKvxT5Gzvvk_H=)`I#Y`g(4|Io1G0 z%$j7|*W-gEcR&GHqNlE_@tbN(+(~t8e_rkk!MFOBW_qqXry&w)q3L*Mwf18&*fsNd z7EQNlyGF@tsT%)LUEJV`RUBMS;P^(5&-J_Uk=1Rx{;{|u( z>+;<>FcY-;6gwl2)!dHj?C_;3nui!)=VhvHHSiKKxyk4@K+FHR1?2ZWBiGs z-Z<0lErp3Zi1zU{axWj;E_qTyDB~}XulKVlWY2GOfX*+F!IMPD?y08aLJCoi!l8{eBMEzm(AjDv{ z1Wo9jnd#|MQf1{a(UQ3zwyWb&A2E59X!dDZUf~`i9>I)>_euQ^A zHL?&D4re=DBGh&3Z?hWgwfF|`C>9?>>D`L^C9FAA1Jmb5@v3bJcMyz6<&3~LTPm)f z`@fIc6vptY03c9+t5O%-YdH?R^6-|xQwzFU0S+Yz~Z7Wp~#RdXMV^TDRA}L%|;TVmgCaJ zQ}+yf&p+-b{T7|lsl!#Zjp7_64W%NqP3=n%axZnrChq(MnavJTfk#@Qf-F0B*9Hr8 zuo+SLQs4+8S~L!Ap_;53o40tXvqTe-2Day}sTF}eRE0VS|QpE)0`& zh&u;>gsY1%%bN4Sap)-hE}zhY@cPs6<)n^uBf>nj_iv;2$uLRsw*utvcC7?7BXFBR1;!+bAI&{LYe9Hmqw`L^#@<@BD#D+-oWz z+Dd}jqr$70A|PJgJ_F#cQiL{!HjC@ho1pw`d5j_3PCW;0%dHz(U~w%q#wr7peXz-h zy^N~@4AISZ^U}sFyDaH5UyS7PhI=MIy_IYNA-U=n5VSHO^(qWy%XJJ!r?gItWLgo4mfNQ?jG6b@E$BUbeof^Dn<%0+^i^#iF3_TdI_;l=Mg>VNn?aJ=$lRPy3@L6J0-X#=YBuKFlVzP zZsODAdRb1moUff#&TuQqofdkDn;5Lt1yo%j1s|q0*+x}#Sp8N@J$ip3YTg6egb++H zUek6o{{586tqu2T^q=Hs8ICC6W)TtCT)2pJ@TCn6;|~p@M@?MW{uLoLK$t%|IGsSQJoKL~a;_+aF@-_U zUMW23*c`}qGM}B=m~&xwbysE^D6-M#`Q`U|}&Z|$HrZ_76Nl13vN0*fhKgFhx3 zohw)*L|@D;jG}nthMTv(WM0Er9T6m!y&t%%FJHJo2*7ey<^kOMk$7n4Fr*Ht-oD5* z|5*U7)re>bIo`fKjHPy?&0YL2b&0;dz-|UO>WVuv`AWBd166MAb)KdOuyb@P0uOZu zj+2A1uKM{Ueh_w|AwW%qB7-oxebDjqaM6iKPwB+a+m)o=ivyDi?lf~NKvi3|k0f!# zhX+VWRIkd)GW0S0z?bju-27YH*04JsC_iF@csLRH^t(<*c3m;X zjLP}VsTN`fw7!1*KU7fgYOr|T_I>Ij2%~QJ6-l`+(Zg8=D;JzEd(`LLeRsx!`>%6y z^rI_1s)xv)W-31pq$biyK3UgQddwJMN1{B~_~e(OxF#EdAK|{J+kvf8LLP~Ld?gFm zkwFg!n@z!8lTQg^Ya)9lZL%AuSGQ!8O_1Cw(LOlounX_YN3?8yO zI3RJ%WyGL`7&R$ajvh?I{4>}y=D}kAu0N6$nX=~-YJiBgVnFP9#8paro z2RYVF0}gPTU2o6zGYkDgUAx10_ 
z7BUcm?qYkEMxZKOf5FuyvE6tB)fCF3$7?!gsZ^An_e{cLex*&3%^C18UR^s;hEovF zRRhC2`xdtWQ-H0y!;2vM_@YH24!o#eA~Fm|a<%qyCQq~mMk@FuGCq)vZ3I1Y0~M>u z)=!ek?!Pl^@37@gEvj%KaRA<0*ZuHv1w;EHnai1}o)O!GvMOmm|E@4sS#?J;a zx0=H?YU3p*EgMpScfkavqYelj>YA30;cuyhAx{N51))5P^nG$I$osSteY<*TCrF3t zA+#%K9|bk$-`HA?+wh|wv9Yg{(~Zh~6By)3V8Yql`$s>A$N z&4nL3Q;(9IhvLr|v&MpNCDg+-hjJp#%>ehBw4d5Dbm=r7Ny*z)s9qydi$z!F^G-Ct zi5v|IbwXUF+wt54us}pap5uf8<{ktrR3SKQ2j@WS%A*VFE+aw}pCCyT zqHT>&3HU{oCWO5ad&PM7q7vvoF2`8gwS~6Ep9~y<&yPf+aYr@n_P73MR|tZ(^2=vZjR%(kh5h!!0Xl?WO|vGX}jG)FgixL~zkREIy$ zS1Xvf6er!%Z-DPGP%_E6+9U zq^HhK_hFxE2R&A>?9I?r9#nJB=5Qm_&UDZ6uOXx8VJtaV(N#q_Q`V+0#3RX&5|kq7 zoD^&5dDcWfyj^E0-1^@*d~eWXMT^Q#nV!AT4xb)a_1lxfl2)H@l*eTg*sj%6$(9Gtuj{a*xa0W(PdE}6_!ebTR%{>z zXKNYXn1%X%siY=A^4ePNXzQ(wRFJrNdFVd5&fmB^eQFhJ;%<1=0Qx1a)OGm!%WVsF zm!?!s4(3b&shRaLu7)BsIql0EpIf{RZbLXr>@P3n&E%&G4t|cs_B!nF@Hhk~05W zx+na8Vb=fDA}=7>BHfU0;PVW%H~wPjF%SM5$(X+6!?*)4$!i&m89884A>V+bJ+%r| zD6o*u=a|WETf~M3Vb-T#Z7B1mVv&-OAnN8)!F(>hSB$6qHSph|PWRhHAOTG;A`d3x zpzos{LRmg&aef*-W^&=fqg|9_s5#zg%=m|lRwWd;DLl}`a6nN;=?Pn$6dDv0a&xoE z|KB5qjr7mHf_1nHINQL~=ab?=v>pbe=k#N}KUEx}IS=FSC|tzV0Eb$CX7*pGE+CL@ za8RRDc^_uY#_X?CJ%B!BmTt_+Hlw7Rh*)fr$yqxME4);$-EDv8|mh;dh| zO4s=j4`y072k{$iGh@6W50S@7?Sp1gJTh6AWgb9A_TnI{p8W0HQ+aJ&n@%z?SIOX zuazfr>i?CZTzY`($`J68K5!Cys*~yM~zD|7H^`Bd3B`vKN_kgR1 zv|ctTGvXsWI}xFGoBQPZ6-JBi=AfaU^hl>4z^+TU8WFM3+$rO|x|(4?G_?%7%Fk$K zuHk*y4qvfQ*7HYvuh|vlR;AOHZpy#jHB~TWc4WSq$R(7aiNv7@``*W>t~fOw7SeYb z1K}AQjf<4U2PF;?^@V1G@PreC)*@eU-IeEtNGNuEER|pqRPOw|y?G7V!_=ti3gN{; z622OdCC37GN?{Aoee(obsNRhpQVVMqscUhqM=*VUH%mCu-xi~AIUPsmweQS zI~Xq5T#G}kg3<~?2>3vg{jX;nW%s@&sRlgGNqhClD^}putt_<4(eC$#37s{xIiajs zrd?%%<%kvtn>pXBML!~J;kFKChD1slh3@}^Xf%-ut653CZ&+yGnu+$XIzxU7C!G;a~dXJptbYRclc#sxXgfp z0L%7?k!Gid&>J@WWE83edtXNG&L^x}=)BX-KOxAw*6U0(j4ed2$~3s}`u#nvb?(aK z{Vi=#!M3Gcr*XL?r5!o_h-bIbGIlbLAYQxI<>Cl-Ukp-)Y~y9glYMdnG=jxF=F#Ve zJ#L=VBZFW=oUJaMA>p%jIxOP#fsWWqRdfo3@ls8&f(G(MZ#KK263iGeIn?kBY#6aq{7FwoQh33Qd0B)IFE@ri^Mn=oLH(}Y#!S85CB5%?jY9`k zF)w56PhKy(8OB|-E{=WstuI9qBftn-EJl4}R3@?WSy%eYI87a*XCc0K=Hx?v!1cqH zM7>Eeddqtv^{<(P>?T@2#$WB4Bw1u^b&)ekPzA9KBxrDH1qsg>x)zF6CP?WJ@M`eA z?IW&$Z@b)e0$Uut$g7fAL29w8+~{jeE$63`mW5Kf?}GMxc6Y69*>crk;Zp|~FTkF| zfp8Ya)kwPdR~xbXvU#s&D$1?QMV_0<~79{M{Pj~uRz&*MVlq{sJ0AaDJk_A zI`i@o+ul9;j|OQbwqpi;T9()s2jYn-#zhMDvb=YGoQdL?7CY@e~i3?cVEvyAUN@57iw;dvfF z^xbMG4Mg#OWH1-p&;Dz?mRMs<+oE~6K^`mwQfKdG1jo?^2|c8}@`erQ6yhWTc-4`1 zsA%LW@Jx_j(R3v;QvI3A72@~uE;5#?qWbbCF|cRu(G6ihb+#eDCXFrj_k7cHh!62E zE6(U>bK>d$M}Y(cnSufxNU@b}yDFmnxnfRqvMcbYvfe>SkKh?mm2HkTIP3DNhoyNe z4b&>3+*lHVIPPDITIItik;w2**7hZr|1B?c@yU456w%))>P?qL+*|G~BB%)C($_xk z@qQrctEjF~X-BDa%PNOw4+MJqZGPz2LTAQurw1I;5^E!}HI?CB!ZalmG(s%UOJVBr zgY4##e5l-F>-yH)=08s8}ptvyuS3I%XL>7o@+ z9mk!?mRri3QjOE2yBq0EdEbNfg;G)Camc4k?5=f6*~sm`RaF=994c(g${S1CXG%H+ zBm_@59xHt5V>**py%tEGk=YNUQhbZ)@MBLeMU&P9ZzN;7_VOK@5PKDD`KvUgSq}%s zv$!|Gk=1om##5)PCJ72pFCzO}SIPHt4Kt?W16kt*|5@RG)zrpY8)UQy7J-wdxuN~_ zYZqebUkx|L_VPm4c5V`y7XQts-lE9WyKX1S-sasll@XnBF^m!R+OtgOx4GP-Un=vj zHqAxOxKVK|x5T3B&~!=)O9M5;M{fB_pKsx{a{`*4kGSUt7431qh4J#AUi+m#U@+WJ zBPQ2(kL(1`eA+o(tn-;7$hemKLPM8`7HK&lYsS?!4sMn2H$)z`6&7FDcc1({KprkK>`9Id)8c4P98 z70l|tY5Mjg#eXlUvk`e$^*`$@S|@nM=|b)i`FfW7zp7InHVQz=k|DYa8Q=3Mfwa@G*hZ(?4IIzQ2o7y?h)V+y}jo@3n&=^lE| zCg-Jeo$YW-a1enOu71v6dN%5bxqZwDCRz7VE&9eW@&?hO?*pg!VFYRzlGP>ek;Vr* z`rfjdV{k7?M*)u-FY#$DI66L2hq2Fwh@tvr?u|LR(p#$^(oneVrd++5el&ZY&8PC{{;~=zdSmQU} z$$tXwUB4F-u17D`43YPJy(o_dvenf^I_+BRG3-5-GM`uBx1GEMlGAsyQr~8W2uv5k(4wcM& zOZW3hr$fLKe3eD}Kn0-bHRx7ho<`J#=wXMMO<&e5$Sra|PykWsyij~mh}X(+HrHsuJrH7Qe7KO6p)iTEM|2v;OnpEP}igG?3*%?3%(koAXZtL(5;(1?2w`m;_ 
zP=D+U|5GN2ZleCKjV<7c8ei&9%Ju&-_SRuluUpr!gc5?Zbhk(;-QC@tN_U5Vbax{S zqI7o(NJ>k$bc1w%_j2!j&UwD~ywCOixh@uK{nnjxjydKS!B65Tf@p&#y!$Z%7#QPO zO7)VqpNnwD+m|$Ex%v3wa>mPc9PH^odd!L0SVz%@noZLi3D^(=qyxs{V_Y<$;6{J^w0GPF zJ@i8K47UG=fB^^#7;jF2RnS54*tYu2%fU-S+T=`MrhY#%T7{Hg}Q-LNIJdJ#d4zLth%b3#k}ln-4H+AxheO@t!-^kV51= ztH^uyT1GKiL|d;tZS7jtR}-XiHL$xh(h)}JZnok{o*_i28;dvekk}HO5LYi305-NZ zj0X2$2pWV9fmovaa=87|8bc*i0|}xDl%Q>_igF`UzRYk)N z?2X7pgtT|bZ(gBujh}YE#gc>~mHUYKzE&wx;D;rlQBAxGiC7LUd)+W0E`s7t>-$@r z-tK;xTTT;*wNx<9(^uAcKE@%(EOJ}gFGE?32X=!Y8L9ynS_sBh^ZgO-(%1Imd?Ij_ zsMH$55x-QHN|(t41e_|6z-jB!l$)ro$0_pLW~;bs3UfjYvx%aK@&@6C;=;n~kxQ7% z8cw8BR>NcGvoJ+Q5KilxNVrtjZFL;^IUI5^dH7h=q(kIQ_6B~6u3!PD8hGYkvKBaN zS1C)pk>V^9uKnF;9V^(aRnR?|wbOddOc!QKp1)>_f(k`R-kb|Z zJc-3@Ed=6Q{Mtx^1Z3o#y722Ssagj**YS=*tGZj(eM1Q%24tRlUg>o`sw(_ETMT4`_$r#J)^jwO&BdzuPUk|M=+tw3@PVprcm2} zSsY%5ZWaEu0n4s+5>#Hn_zb@}b)eIYs(jCq*@s^=R&M+%O=|5FSOeliqAV7<-j`L~ z=+Z1x<Sa3#>6{()*7^^#`N!gVMjOG?c5o! z;Or2ip2~CWH>5PDb#4JK#xX2=VPqZyZ!ibUvst`OXG+K$1JuuAa}&w+L_7h#xs* z%^tw%CN(A*q8Cta64fBMj-1j@X8fb=R!sdhY!EIKPZ#dbK~x@|Rwoc)1*RzFn2jd4 zoqb;a933@s8MyrfjDTp^2Qmi%sELG}%?Eh8ghaxJ+OH! z{(?^;LOGOP=0s(v+bb&mQoP_A{1x1`>;xT=m-k-XI0TqeBIt=UM+pRA5-gc{L?ZAL zQ-T{_vF$Y(jv|}gacl$wH1J|K2rD{GZoi|9Z71~=e#ZD5P00vx!^swPGCE`@ZcMO4 z{L`sA*2H*F$`Zvn`>-FI%PasMyTB6t6qak;%Tn;bR64>0D>9m5NLa9m2nD%qAg8!; z_!BYA@pL9PA>g)BkI6<9-MW0Y$u4RP`UC4K#--@6+H1-FdaF?Ob7(h3^f_WuN(3>@ zh#Fw!W~+#tBggzhvfvq1+kH5Fod(J4*WEbqsR6`nq$RI2rBUCG!pOjL%f7?w2kuPu zJA|$au~b_K^AYN}iX)jP5vCE`JHVDQr6t}PE}t4HBd%ZkBaJss3Mm_T%zFu34NCpT zo9TG=Yw~EGxG(}>r~%a(^rzMcxhubt2o%j5NAt4jz8F&rwF()2EVcEbOM%%0{==#w zYB-5|Q_c~hD**dQTc8r~bSE}PzP9UQfubVbV9Y||l9TM>>H5H!6-h@rN6l_^Yffq82kTy-f z2zDyk=oopgUF9`7B;PgQ0SB6v2;R*7JwX>}0_>zH1B5?JTh{h9|IfJ6>@4*e0H|5<5}1W8fo zNyaifUOp>OB72ITu#@H8JS7eXqIE{1DlCs_r1Tq|7#5 zt`Tl(>g}gCp7j`E^#1e^To5xE9DEm$pzb6;f`=dFKneSy^L$p}=z|pt%g-T|wLc5% zui;^;g*Tvi=)`V|c_t2NNpFbfKQNbs34 zx-&+oF<8a2v2{I|qtU+P6{o}8^=0uf9U?O%?QgqNlUw)Smqx383nhQDBd;?u2xCLG(jJI-&N3*2e(B zHhu}*a_{rvu8FLNvmL9V;Xg42D)d`SC?tm?&*j_Br`S5E568j$5zUC>8ONo-^Pinfxv6Ar)$%=p~R2&-B*PsZdG&zNUuta%mD*a{+39B)g&*#Bt-$w z<&4;+yF|xe=2Y(v>tZ| zA|vtZK|0`I)&6@SfKZ757G63kxggQ0M>P+|sa5-T< zn|q(6X+k!+Hv>f?5b&&Jen2P}nYP8tu14zTlkXUT|BB{O%l6X9of^$>ev$v27I9a8MFm`I12`Yk{{0rL|L1 zmrpNyp`X8AakbP1I|@xi05%eBUqv?#vOfNt2TaVcYQW#KBO5{J83q$j+WOu7>gU)$ zXN45#SoS@m%I{v$7H;stHIpWhT^j*T1uX7&6R_j2SMKHct%17xYN)(Q-HiD@UWP{y zxxlMPX?oR|vt&Sgxk&t)51#Wa5#+M{fV~U-uYKw(NPZlye@DhVCut68CHU)?;KB>Q zcKgdco9h>F{pI`sS}$bQaYJxp%r-$)GE}_??_d89Sr#T(&|oLpxq%G56Lem(6qjjZ zYe#lKVhNju)gX}vJ+NU!$IFAHJOer;K#tWT;abiB@I=Tz|Bp}lUti!ngC*ieid4Y= zU*Gl5Z2XT;W*PvTIp%@ZVoQv_f8zi6^MAa^maCa~@@v%ZZHQgxM&N|G$3OKi}-DCNF$=^ag{=RSr|@?`obm<#%~tQxBae74ofbW}j7Cinh36^5e*aRpoIw>lvZTNYAjrOe#FExS zILO#QhxE9_l-F_!R~|V9*oX0z0Nj0N3vh5SL2UGDp4Bo5A_F=Dnr|o059^n`R{`K~ zT+(>a-!{4y&CdrMMEN>hs(jV%|CtUdTwm_P2J*Yx<;OCf=OO`5Uuux-OjWy&yMdb< z5fWKx@#DPQjULa-45Yf5H$7JF<}+IBxUMCQ-F=cUGlYP zLj~tv+iAv80AN*As48dPty2dBMPurH5NNKlmb=TYT&f_3aV{zJ8HLloWNI0DDcIQZ z_;5S1QC2nZ%BXlaDQepJ-%D40Z#<4xC&6aWF*h{NxuKsVn4WGA?czWfD~$)Z2H>&L@C-! zzmqBeiNB7^Bqp#86wM*R6Q7$?SYU(e#moSjR+f1g{uv1ivFs<;Q$ne zSVOi*0&+2oLi<8uwAJ0>@)wZKe8ev4km5qjh&~91=7xJGKfd7i&j}#f5Y1Z$`h(md zDiQzQz$R?aAS6`h05!s{=Rkku_&cMwy%b{*jYom{QO$&$vd7^L-z zv56W6VqO3K2qr;TZx9a!gVdishU>`M)>AuWx8oF!&`BXjUkkGwL2o>Q^TQy0z07nQ zXdWL)VJpsotUBLNvTS%f3{iz%K)zsl;R`IqoOJLoen&ufQ9C0-gN6!AAO!3Lpf#q1 zFXY0__JKWK_^`#jocan#5ZB(8ONjq_vV{FR>gH{@8Gg+};(%iy$Uw6*38XFb-w?$! 
z;zFO*x1E=Y_z|T&R}c>9KoVU!FZb~xQ9=}wBK;>$08dekrU3}PoePbQy*jXHtKI4&@wf5UBh5lH2u> z2CJYm7!VWNbwI6>6JJYkG3NNn1H0jh5x!fb7Q!M0_#afTknpx6>z`~5v>_0}>fO#w z2TivgIwxW~4Y5vB*j9@ZxldcYCYLRCz%#`LR3I4Od-a`8V=S;p9 zAT(KOwRhxw*<3LuhL2TPDH;76kQzu|H*M!8_L@NurXl#4I#;nu~taaUi}K4ekU7)$d$FV1c4nS%&q`7yNTj%#l_A+$;zP+?CMWK?tA=K9Bd` zZA<_E_t0oWM>S9!vIAVqAZor7#VnIta~mSP%Ggchl_;SdeT3)8!DP_3u|wws^KU4R<~#zZiU7Eu)nNol(3G)~4(g z069{ks_#*F&lE_8q%ay^y5HxM_Ji?SGPg-@h3P?^!lioZWJwBAr$Pc70)V{^h1k%sp4jWj~s=^E+ zH>;wI9*T&~L#&UNK98=AR{=h6za4`kq4BFJkcmZ-2P-FhxlSkxqAVwSANu;Ef2#?WhAlQTA1-@)7UlLj27p9RHOdqHPNR6#@Y@j`ibY zcAyJJTDoT#{~_{l*S+W`61I)9ESDIrZ2+oUTZ~s{o|ia9*b-+x0jG5K@=psd;M$84 z{<=!ST5}Hh@XnXq`Wf{@CJ@w}A}Y@KXBq=Kx=~3wdjOE8qvDr&celmgewCpuXxjt| zLGgUSa=}E_9;{8u882?Y6tM2l=4&wvs*mg>?SEP7$;9gg+bmKo_CoFsfjPo= zmET`JhhPX3T$fZ{UBtIL_&vN4TU5eG`eznBST+88o3?-{`<(H^>T^1IOJf?u^&XN4 zG#ol7tcrD_a`Ya`2v%gDo=OlnSapVZ<{i!yan*uE1?sP*ZWjD;79D~;e1T~>R8fUt ztAy+(sCzgjC!g%2=#;uLjkNS?$LsEFH zPqt-p`ytyuw{5ofSNtF9eudFZ?Sk6=zf0Zk+B{tHBejIrH|`+j$0k#2yau!y|Wr^3C<2jQnLnfpLxV z<{3dHzVgNNSMZH+{YZ{w?V6X)XtKV3lWhy9B>GNk6aNAzLhTXg5$x78pW zZ7oE|5v|~Uch(_<;8fvjMtvgvyP$K!dI*Z}%MOust=tW~WXUEG0Y@MSJ^#(?tIzefMPa-Z%XP!)8(le$_#aN9_fM=M{J;FVGINsK$Q_>w51Kb>p6vY% z3G$A<=0)v_zOShoX5h|MS`2rm_f(V|$s8Fb>=)i~@0;`jnLSqTWfda(phz{_9x((e~}OQSleNXQO%)sPFb@_a0<#U_IpI$gXSa!66Z4?%7yo=CpSJ@Ec7+ ze!Pn?<2=D~Z0ahyb5*BAP^_-Uj+|;Ns9AcMmx#Iby1w=fpZ5**b_Vl?an4p!^xCDC zW$9#i(8I#&_d1B~o2d3^;2s6uu4hE@z7F$mDU=@re#83H)GHD`*)UW6J@~dd>A*m> z`~ZX?K6{{VSnoTVWiY+|{jU0DLo^&^1Szhq=Dq5mnZyxuQlBKtHg<2XWR$&Vu3*!Q zzN(MSEr__7${6PDfX?~_nc?Ruz04tS6?wIC%o+uo(f%u5_(^hHz}{MTRW=N0*#+yy z&7)oW_(T)wJau_BE}N;Q7e+;^L~M$5S=u($Lu{=a%=%n6m08=(jcX3`kZ7OBC0je3 zB37)~D4plX8nj!tf~d8N@Opcf$a7z@W)(!%DpjuDD=@@gYc566of*-kdnPB|!`x^c z;uY74jt_uDVKNIe9JWdb6qe9&MFGJ?i8&nCmgQqm*r@qR9;b9y#=Q5#Q>1qy_cfOx z1w#aFSM)Z`Y}ZSC$$8P)-?gG}9eOhIoQE3i-$rtn+I^Y$(GbpOJ#evECU^F1wYHW# zrkj@%tyVX{%>yCu$G2q_M^}DM{GO?o1kt+q?oBss(g;F;gyL7;Z`J02E1t!k*JQEY zWHSiw!;NdeM#S^AU{AV&{VPz559|QouZBwGv`XTK^J0Buh&Fmp1x={kNA)bzl%8nw z8Gu4}HiN{YsEDeD7Y|EVWnFn4C1pr26GTGMhR9t?Z2$-S5WDX|kpSDNcL-fsLoNsV42F3(tjTZMFX?d_O{y}cbn_;XcRq;WG1YNVt; z26FpK@?ZTA-XJ9TDP21xYm7LL=DuLye zDpzmkBkt!M(!II<+2SClbj7x_8W#c2FvEFD5)H<>IqBbbv;nxIQ>X2Bt)|8lUx}Eu zFr(Fuz7LIA*#I{5l4A=k!-tphP8k>IXxUckujj=>-Hw4O4LxX4kZvEff>K^2B+0Ms zB?5XOUar~#aE~`uYOMBegvn(T7wRxx#f&HnrH?@!XCH>-n>~!2U1^7YNjxrcHbhnj% zYE>*{p#e$E8)w4sGPuynL-1qQAj81ng&ojyl0{=VMhcTM`eqP8JS3%%d-8^v#jrrn zX-t&2rTw)96puE5y*pQ;~D~`{?-h zYoqNdy~HiUYl6mQCc(xw;Qrh6^)$se3fFPS<$ePmMJzr} z4`3HrfgRdl{#v2T|K~unF)9>L%0tiWO)rjOhcZ6uf* zU-&qJK{sr}2pmaS-kv};2Dvx+UN=CE^zXb_g?yCNuLq(Y`oS{PO~-UCDnCR|Mt?k9 zBxnh2wdCAJP3a0@@!W$Hnja$Xh9`qxM0un3H-2v%;o2)H@z#v`C&P?y!R$h@=lQglCzS{97{zGi;4gBsl*6EdG-ens{3iS{`?1NEj0T&1MLzLsb`Of_ zp{&_+S@o&|@KF(WvC$v$8*I+P0gugNm%l739y`OMm z?onYr=xx{?-MPPt@TA~keQ69cCzjtL@fz20aXX0^V%ipn>=-!$NOGb%AI@$W4U z*1XfcW>af>i69D^e!sABe7n6g^C{G*YS!e_IoUUU&jGL?f9Td!YB~~aDwoJ|@@CRR zMpTy|DN0oo&g(q8bf9>3PUhQ617pKOjZVrm3BMBh7q~71@B2p3zlIRU>HsA$q<j#B6?F+?$tu~6mvhR_Xtbh2EmLikAOqeD$ zL}0tpQ7us8;#@rt-cox_6#V0 zUHorf_nmc|hSGE9roXQ{MX9VH;MMr|KfF4jK#mB;zzRWeIf!YShX~GigGe+TlBD#DZ zxsb4tW=xLhlPyo@g9kRg`@36(=(dV03i<(*u8;Nkb|Tu_ZBUa*7+p2$Uxsxxxb^7d8} zPY|Zr8sGQHTE<|C<5zU6pHq50Y#ij9`BB+ufs@n%eQQOF>PM1jg|Cb-lh)ECvZVUV%C zMKRX;IVrJ0RylT%wOkWA%fPvi!?yYWEHcl`E@(G4grFSLSk|@Wbb>G@HMD&-d6H<{ z7c<7;7YraF(u{6;<@YQeLC?37BrW`%5%z9m+q#Gq`T$N6r;%K0wr`v|HS?LHBC*VQ zXbAJ-H!jl;w%V5>2zWFFKp)lGUKdHmF`nmvuizjibQkvN991-q*+!qVbFbx}CKw3% zHXHPj+AQTkss2O1v|e1x{BDGgE_1*WqpgzhT8Q$d2kBa_|2bV|wlDjqXg;;Fc%69{ zUL5-3h~SO2B5zkpv+T8Nc=sg!rXug$XE%N5jUoQ3u*gnLkN3-u^kvwL;$)602;InC 
z&%9a&Ld<*pKcju#=^{APQxB#=?k|vHvbpsna`0*Y<$MOP;?`x0aeE9(v(exOB1%_R zxsW8oEA~Zff}`GQ)X#rNkAISL0aobDY{TXGCd&Bb8Ou-$W*X98RkP};73Fbdu*e~W zYK+qf6WGPWs!N9mcryou%~GS5q4_2p=#vJyM$=hweuN5U>3v_Fy;~5=X z3wF?4? z-n6$#GIN?pC9l|5nyD`qfjrhH{t0r}i9b={5Np3gA=-qPGb*LxoKuKE%UQe6<7&3k zGP=CW{35&eeZf^4U zq~%uv^AG1ry?S(Et}i?b%o&ES#B8=9MqS;ZcLQBUyAei^%%hFjJvE%r=IQIP=~%O$ zNU;j57g73=nULxRbID{w)LrcTn zmCAq6U1x3wT=wyT?J-;si(OtV&Ynwb!*5UbL6QJe1cq6MIhuc6_DAfB zq>NrApi06PHsR26bW5)Isu1SUV_yuVWHJ~O69ia+sx>K>pqiR$-8b&|Bm`DeA$_fSyeC1He~%!fS1|Nl>-3z(2lU2IXuS5uo>0> z10BYqt7B#eF*gcZHc6&uOHebaY0TeCW{bXVgt$SqQ|GEHht8IaX&P%KzRV7B` ziz8{6o{QeK8iYf{$%Gpni~S462dC)jYpwwc!hrG_=5t6POIa5ukNGUE+s?QSkIcAM zBdbs99Wi!rIoMRBn=y3W^6w}&ZHMK&Z@G-i+J8}pt*AT~la0suJ*Y;Pj3c7>%lK=v z3RM9if^4th=zHiVLU2_8j10zk;1mh6xyiWQO8g8sN$rZWxl!A#FO5V(PI=M-X`2!KVaAAdY9m5!`6F7PffD>(e~|W85IJz z6<_mEMXy%@Tkyi=JC~@7sUm&2zl{Srjr=~|l8Hw#)Z^-UC90|+EGS1r8;iuFl4m7^ z@6`Vd$3u}S2z@--!e74X8J~(FfFE;B{kYj2rW?mkbzYg6Q_qIrfz!l!A$GR+<{L~i zhu@3g#){|iDrGId2XYbjL;d5}n�TNz?6yftQn;!WSFW5y09=M6p30$}e>H%mmcm8p)IDP+!9UkzkYwF82yzo=7z zJGeAFjO;JCC$aiP=5H`BsvJX6~(axtC3%~nuO z%Vvza-An{p+knQ_uFjG`S~R^0_gdMS6YaAum+}n^%jKEivx+3^#@x2E^VE|8)$ee?&+y( z(gax}Gdl~t(K7rf+}l%u#LP-9v;B+pSc@h4dgW%h$`+LLE4X*e$Rl?>(O zvb8pg<$BiGfjL>BkFUI9BtHT^b8*P{w}z){?1ajGuOGhVxh5%x&?Xd9OO1hxll^DYsfd>#4nDi zPp;n1j=JbzoQ7jhvaIT3{tWx4%A*&JhQ@nZ6+A0@?`NDRt&?VQf%H2fA zi@yb&v$>otoU>RPN0Am(OMb^87s&q@lYw#blbfLBddOBzQQqPGJzI|^mF$%^Cw*W? zHiYn?AW=jFPC+g5k3yKI6Q3o0Fr^YSF*TplZqRj3yf98<=gH|q&kYN8!hEgFb`_#U z@oau`Z`YJCaDf7wYw*HZ*Qf>Gn%GzNWHgn_eC6);hB{=>MS_XPrA*hW>ZtdLb9J+qh~a~cDukR9DL3N-}Kaj z6~x}gsJdNL(QrB1#`2enpTEqR@J`-j^v)=SRFsx5-a2eJN5@VN4Q^anuOm%W#@CTT z9?wPA{0XgolHo*grC@v%{-;{Bnz3@poIc_axO~sdSl)}7JPBIOwlZ&T4EK!FIj`ZX zo-ZF?Pw(`y?puEHSeynMg0a|{lnEZf>46O^5(Y8MFnmn2Tf8hAt%K*ckFPOUt}3dA z`f!=F0lDHb0iE2`j-upkP92qWf%0{FiKT^hIWo;$(ia^Jy?i#(4S z@$-l|ALqp`kDcpIs?Ij3Ju^l(0;X?!*xO+xaFxJ? 