Error in poetry poe run-digital-data-etl #15

Open
Jeferson100 opened this issue Nov 14, 2024 · 1 comment
@Jeferson100

The following error occurred when I ran the command poetry poe run-digital-data-etl from the command line.

(LLM-Engineers-Handbook) PS C:\Users\jefer\Documents\Livros\LLMs\LLM-Engineers-Handbook> poetry poe run-digital-data-etl                                                                                           
Poe => poetry run python -m tools.run --run-etl --no-cache --etl-config-filename digital_data_etl_maxime_labonne.yaml
2024-11-13 21:05:06.001 | INFO     | llm_engineering.settings:load_settings:94 - Loading settings from the ZenML secret store.
Your ZenML client version (0.67.0) does not match the server version (0.68.1). This version mismatch might lead to errors or unexpected behavior. 
To disable this warning message, set the environment variable ZENML_DISABLE_CLIENT_SERVER_MISMATCH_WARNING=True
2024-11-13 21:05:08.831 | WARNING  | llm_engineering.settings:load_settings:99 - Failed to load settings from the ZenML secret store. Defaulting to loading the settings from the '.env' file.
2024-11-13 21:05:08.929 | INFO     | llm_engineering.infrastructure.db.mongo:__new__:20 - Connection to MongoDB with URI successful: mongodb://llm_engineering:[email protected]:27017
PyTorch version 2.4.0 available.
2024-11-13 21:05:12.004 | INFO     | llm_engineering.infrastructure.db.qdrant:__new__:29 - Connection to Qdrant DB with URI successful: localhost:6333
Chromedriver is already installed.
USER_AGENT environment variable not set, consider setting it to identify your requests.
sagemaker.config INFO - Not applying SDK defaults from location: C:\ProgramData\sagemaker\sagemaker\config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: C:\Users\jefer\AppData\Local\sagemaker\sagemaker\config.yaml
Load pretrained SentenceTransformer: sentence-transformers/all-MiniLM-L6-v2
C:\Users\jefer\Documents\Livros\LLMs\LLM-Engineers-Handbook\.venv\Lib\site-packages\transformers\tokenization_utils_base.py:1601: FutureWarning: `clean_up_tokenization_spaces` was not set. It will be set to `True` by default. This behavior will be depracted in transformers v4.45, and will be then set to `False` by default. For more details check this issue: https://github.com/huggingface/transformers/issues/31884      
  warnings.warn(
Initiating a new run for the pipeline: digital_data_etl.
Not including stack component settings with key orchestrator.sagemaker.
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ in _run_module_as_main:198                                                                       │
│ in _run_code:88                                                                                  │
│                                                                                                  │
│ C:\Users\jefer\Documents\Livros\LLMs\LLM-Engineers-Handbook\tools\run.py:200 in <module>         │
│                                                                                                  │
│   197                                                                                            │
│   198                                                                                            │
│   199 if __name__ == "__main__":                                                                 │
│ ❱ 200 │   main()                                                                                 │
│   201                                                                                            │
│                                                                                                  │
│ C:\Users\jefer\Documents\Livros\LLMs\LLM-Engineers-Handbook\.venv\Lib\site-packages\click\core.p │
│ y:1130 in __call__                                                                               │
│                                                                                                  │
│   1127 │                                                                                         │
│   1128 │   def __call__(self, *args: t.Any, **kwargs: t.Any) -> t.Any:                           │
│   1129 │   │   """Alias for :meth:`main`."""                                                     │
│ ❱ 1130 │   │   return self.main(*args, **kwargs)                                                 │
│   1131                                                                                           │
│   1132                                                                                           │
│   1133 class Command(BaseCommand):                                                               │
│                                                                                                  │
│ C:\Users\jefer\Documents\Livros\LLMs\LLM-Engineers-Handbook\.venv\Lib\site-packages\click\core.p │
│ y:1055 in main                                                                                   │
│                                                                                                  │
│   1052 │   │   try:                                                                              │
│   1053 │   │   │   try:                                                                          │
│   1054 │   │   │   │   with self.make_context(prog_name, args, **extra) as ctx:                  │
│ ❱ 1055 │   │   │   │   │   rv = self.invoke(ctx)                                                 │
│   1056 │   │   │   │   │   if not standalone_mode:                                               │
│   1057 │   │   │   │   │   │   return rv                                                         │
│   1058 │   │   │   │   │   # it's not safe to `ctx.exit(rv)` here!                               │
│                                                                                                  │
│ C:\Users\jefer\Documents\Livros\LLMs\LLM-Engineers-Handbook\.venv\Lib\site-packages\click\core.p │
│ y:1404 in invoke                                                                                 │
│                                                                                                  │
│   1401 │   │   │   echo(style(message, fg="red"), err=True)                                      │
│   1402 │   │                                                                                     │
│   1403 │   │   if self.callback is not None:                                                     │
│ ❱ 1404 │   │   │   return ctx.invoke(self.callback, **ctx.params)                                │
│   1405 │                                                                                         │
│   1406 │   def shell_complete(self, ctx: Context, incomplete: str) -> t.List["CompletionItem"]:  │
│   1407 │   │   """Return a list of completions for the incomplete value. Looks                   │
│                                                                                                  │
│ C:\Users\jefer\Documents\Livros\LLMs\LLM-Engineers-Handbook\.venv\Lib\site-packages\click\core.p │
│ y:760 in invoke                                                                                  │
│                                                                                                  │
│    757 │   │                                                                                     │
│    758 │   │   with augment_usage_errors(__self):                                                │
│    759 │   │   │   with ctx:                                                                     │
│ ❱  760 │   │   │   │   return __callback(*args, **kwargs)                                        │
│    761 │                                                                                         │
│    762 │   def forward(                                                                          │
│    763 │   │   __self, __cmd: "Command", *args: t.Any, **kwargs: t.Any  # noqa: B902             │
│                                                                                                  │
│ C:\Users\jefer\Documents\Livros\LLMs\LLM-Engineers-Handbook\tools\run.py:159 in main             │
│                                                                                                  │
│   156 │   │   pipeline_args["config_path"] = root_dir / "configs" / etl_config_filename          │
│   157 │   │   assert pipeline_args["config_path"].exists(), f"Config file not found: {pipeline   │
│   158 │   │   pipeline_args["run_name"] = f"digital_data_etl_run_{dt.now().strftime('%Y_%m_%d_   │
│ ❱ 159 │   │   digital_data_etl.with_options(**pipeline_args)(**run_args_etl)                     │
│   160 │                                                                                          │
│   161 │   if run_export_artifact_to_json:                                                        │
│   162 │   │   run_args_etl = {}                                                                  │
│                                                                                                  │
│ C:\Users\jefer\Documents\Livros\LLMs\LLM-Engineers-Handbook\.venv\Lib\site-packages\zenml\new\pi │
│ pelines\pipeline.py:1386 in __call__                                                             │
│                                                                                                  │
│   1383 │   │   │   return self.entrypoint(*args, **kwargs)                                       │
│   1384 │   │                                                                                     │
│   1385 │   │   self.prepare(*args, **kwargs)                                                     │
│ ❱ 1386 │   │   return self._run(**self._run_args)                                                │
│   1387 │                                                                                         │
│   1388 │   def _call_entrypoint(self, *args: Any, **kwargs: Any) -> None:                        │
│   1389 │   │   """Calls the pipeline entrypoint function with the given arguments.               │
│                                                                                                  │
│ C:\Users\jefer\Documents\Livros\LLMs\LLM-Engineers-Handbook\.venv\Lib\site-packages\zenml\new\pi │
│ pelines\pipeline.py:748 in _run                                                                  │
│                                                                                                  │
│    745 │   │   │   │   code_path=code_path,                                                      │
│    746 │   │   │   │   **deployment.model_dump(),                                                │
│    747 │   │   │   )                                                                             │
│ ❱  748 │   │   │   deployment_model = Client().zen_store.create_deployment(                      │
│    749 │   │   │   │   deployment=deployment_request                                             │
│    750 │   │   │   )                                                                             │
│    751                                                                                           │
│                                                                                                  │
│ C:\Users\jefer\Documents\Livros\LLMs\LLM-Engineers-Handbook\.venv\Lib\site-packages\zenml\zen_st │
│ ores\rest_zen_store.py:1544 in create_deployment                                                 │
│                                                                                                  │
│   1541 │   │   Returns:                                                                          │
│   1542 │   │   │   The newly created deployment.                                                 │
│   1543 │   │   """                                                                               │
│ ❱ 1544 │   │   return self._create_workspace_scoped_resource(                                    │
│   1545 │   │   │   resource=deployment,                                                          │
│   1546 │   │   │   route=PIPELINE_DEPLOYMENTS,                                                   │
│   1547 │   │   │   response_model=PipelineDeploymentResponse,                                    │
│                                                                                                  │
│ C:\Users\jefer\Documents\Livros\LLMs\LLM-Engineers-Handbook\.venv\Lib\site-packages\zenml\zen_st │
│ ores\rest_zen_store.py:4362 in _create_workspace_scoped_resource                                 │
│                                                                                                  │
│   4359 │   │   Returns:                                                                          │
│   4360 │   │   │   The created resource.                                                         │
│   4361 │   │   """                                                                               │
│ ❱ 4362 │   │   return self._create_resource(                                                     │
│   4363 │   │   │   resource=resource,                                                            │
│   4364 │   │   │   response_model=response_model,                                                │
│   4365 │   │   │   route=f"{WORKSPACES}/{str(resource.workspace)}{route}",                       │
│                                                                                                  │
│ C:\Users\jefer\Documents\Livros\LLMs\LLM-Engineers-Handbook\.venv\Lib\site-packages\zenml\zen_st │
│ ores\rest_zen_store.py:4341 in _create_resource                                                  │
│                                                                                                  │
│   4338 │   │   """                                                                               │
│   4339 │   │   response_body = self.post(f"{route}", body=resource, params=params)               │
│   4340 │   │                                                                                     │
│ ❱ 4341 │   │   return response_model.model_validate(response_body)                               │
│   4342 │                                                                                         │
│   4343 │   def _create_workspace_scoped_resource(                                                │
│   4344 │   │   self,                                                                             │
│                                                                                                  │
│ C:\Users\jefer\Documents\Livros\LLMs\LLM-Engineers-Handbook\.venv\Lib\site-packages\pydantic\mai │
│ n.py:568 in model_validate                                                                       │
│                                                                                                  │
│    565 │   │   """                                                                               │
│    566 │   │   # `__tracebackhide__` tells pytest and some other tools to omit this function fr  │
│    567 │   │   __tracebackhide__ = True                                                          │
│ ❱  568 │   │   return cls.__pydantic_validator__.validate_python(                                │
│    569 │   │   │   obj, strict=strict, from_attributes=from_attributes, context=context          │
│    570 │   │   )                                                                                 │
│    571                                                                                           │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
ValidationError: 2 validation errors for PipelineDeploymentResponse
metadata.step_configurations.get_or_create_user.config.outputs.user.artifact_config
  Extra inputs are not permitted [type=extra_forbidden, input_value=None, input_type=NoneType]
    For further information visit https://errors.pydantic.dev/2.8/v/extra_forbidden
metadata.step_configurations.crawl_links.config.outputs.crawled_links.artifact_config
  Extra inputs are not permitted [type=extra_forbidden, input_value=None, input_type=NoneType]
    For further information visit https://errors.pydantic.dev/2.8/v/extra_forbidden
Error: Sequence aborted after failed subtask 'run-digital-data-etl-maxime'
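
The log above reports a ZenML client version (0.67.0) that differs from the server version (0.68.1) and warns that the mismatch might lead to errors. Whether that is what triggers the ValidationError is not confirmed in this thread, but it is cheap to rule out. Below is a minimal sketch of such a comparison. The helper names `parse_version`, `versions_match`, and `installed_version` are hypothetical, and the server version is typed in by hand because the way to query it is not shown here.

```python
from importlib import metadata
from typing import Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a plain dotted version such as '0.67.0' into (0, 67, 0)."""
    return tuple(int(part) for part in text.strip().split("."))


def versions_match(client: str, server: str) -> bool:
    """Return True only when both dotted versions are identical."""
    return parse_version(client) == parse_version(server)


def installed_version(package: str) -> Optional[str]:
    """Version of an installed distribution, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Values copied from the warning in the log above.
    print(versions_match("0.67.0", "0.68.1"))
    # Shows which zenml version this interpreter actually sees, if any.
    print(installed_version("zenml"))
```

Running this inside the project's virtual environment (for example through `poetry run python`) shows the version that the failing command actually imports, which may differ from a globally installed one.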
Python: 3.11.8
System: Windows
OS version: 10.0.22631
Release name: 10
Architecture: AMD64
Full version: Windows-10-10.0.22631-SP0
Package            Version
------------------ ---------
alembic            1.8.1
annotated-types    0.7.0
asttokens          2.4.1
bcrypt             4.0.1
certifi            2024.8.30
charset-normalizer 3.4.0
click              8.1.3
cloudpickle        2.2.1
colorama           0.4.6
comm               0.2.1
debugpy            1.8.0
decorator          5.1.1
distro             1.9.0
docker             7.1.0
executing          2.0.1
gitdb              4.0.11
GitPython          3.1.43
greenlet           3.1.1
idna               3.10
ipykernel          6.29.0
ipython            8.20.0
ipywidgets         8.1.5
jedi               0.19.1
jupyter_client     8.6.0
jupyter_core       5.7.1
jupyterlab_widgets 3.0.13
Mako               1.3.6
markdown-it-py     3.0.0
MarkupSafe         3.0.2
matplotlib-inline  0.1.6
mdurl              0.1.2
mysqlclient        2.2.0
nest-asyncio       1.6.0
packaging          24.2
parso              0.8.3
passlib            1.7.4
pip                24.0
platformdirs       4.1.0
prompt-toolkit     3.0.43
psutil             5.9.8
pure-eval          0.2.2
pydantic           2.8.2
pydantic_core      2.20.1
pydantic-settings  2.6.1
Pygments           2.17.2
PyMySQL            1.1.1
python-dateutil    2.8.2
python-dotenv      1.0.1
pywin32            306
PyYAML             6.0.2
pyzmq              25.1.2
requests           2.32.3
rich               13.9.4
setuptools         65.5.0
six                1.16.0
smmap              5.0.1
SQLAlchemy         2.0.35
SQLAlchemy-Utils   0.41.2
sqlmodel           0.0.18
stack-data         0.6.3
tornado            6.4
traitlets          5.14.1
typing_extensions  4.12.2
urllib3            2.2.3
wcwidth            0.2.13
widgetsnbextension 4.0.13
zenml              0.68.1
@rkaunismaa

Did you run 'poetry poe local-infrastructure-up' before attempting to run 'poetry poe run-digital-data-etl'?

Are you able to see a local instance of MongoDB running on your system?
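
One quick way to check is to test whether anything accepts TCP connections on the MongoDB and Qdrant ports that appear in the log (27017 and 6333). This is a minimal sketch, assuming the services listen on 127.0.0.1. `is_port_open` is a hypothetical helper, and an open port only shows that something is listening there, not that it is MongoDB.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Ports taken from the log in this issue; the host is an assumption.
    for name, port in (("MongoDB", 27017), ("Qdrant", 6333)):
        state = "reachable" if is_port_open("127.0.0.1", port) else "not reachable"
        print(f"{name} on port {port}: {state}")
```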
