
Listing filenames of produced distributions in the CLI #198

Open
layday opened this issue Dec 7, 2020 · 24 comments
Labels
enhancement New feature or request

Comments

@layday
Member

layday commented Dec 7, 2020

Source distribution and wheel filenames are variable; they encode a variety of information (e.g. platform, Python version, distribution version) which might vary between build invocations. We might want to think about offering a way to retrieve these from the CLI (e.g. through a new option which would create a manifest) for scripts and automation tools to refer to which would provide a minor convenience over globbing.

@layday layday added the enhancement New feature or request label Dec 7, 2020
@FFY00
Member

FFY00 commented Dec 7, 2020

IMO this is out of scope. Maybe if this information weren't in the file name, I would agree to having a manifest with extra information.

I think it is fairly simple to write a Python module that just parses the wheel name and outputs JSON or something like that.

import argparse
import json
import os.path
import re

from typing import Dict, Optional


_WHEEL_NAME_REGEX = re.compile(
    r'(?P<distribution>.+?)-(?P<version>.+?)'
    r'(-(?P<build_tag>\d[^-]*))?-(?P<python_tag>[^-]+)'
    r'-(?P<abi_tag>[^-]+)-(?P<platform_tag>[^-]+)\.whl$'
)


def parse(name: str) -> Optional[Dict[str, str]]:
    if m := _WHEEL_NAME_REGEX.match(name):
        return m.groupdict()
    return None


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument(
        'file',
        type=str,
        help='wheel file',
    )
    args = parser.parse_args()

    if info := parse(os.path.basename(args.file)):
        print(json.dumps(info, indent=4, sort_keys=True))
    else:
        print('Invalid wheel name, see https://www.python.org/dev/peps/pep-0427/#file-name-convention')
$ python -m wheel2json ~/Downloads/packaging-20.7-py2.py3-none-any.whl
{
    "abi_tag": "none",
    "build_tag": null,
    "distribution": "packaging",
    "platform_tag": "any",
    "python_tag": "py2.py3",
    "version": "20.7"
}

@FFY00
Member

FFY00 commented Dec 7, 2020

Perhaps something for https://github.com/pypa/wheel? python -m wheel info ~/Downloads/packaging-20.7-py2.py3-none-any.whl?

@layday
Member Author

layday commented Dec 7, 2020

I think you're misunderstanding - I don't want to parse the wheel filename into its constituent tags. I want build to simply emit the filename of the distribution file the backend has produced, in a machine-readable format. It could then be piped into a hypothetical installer or another tool which might expect an sdist or a wheel. For instance, if we were to produce a JSON file:

$ python -m build --write-manifest-to manifest.json
$ python -m install $(jq .wheel manifest.json)
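For readers without jq at hand, the same lookup is a few lines of stdlib Python. The manifest field names below are purely illustrative, since no schema was ever agreed:

```python
import json

# Hypothetical contents of the manifest that the proposed
# --write-manifest-to option might produce (illustrative field names).
manifest_text = '{"sdist": "dist/pkg-1.0.tar.gz", "wheel": "dist/pkg-1.0-py3-none-any.whl"}'
manifest = json.loads(manifest_text)

# A downstream installer would consume just the path it needs:
print(manifest["wheel"])
```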

@FFY00
Member

FFY00 commented Dec 7, 2020

Hmm, I'm still not convinced. I would like to keep the CLI fairly simple. As you said:

would provide a minor convenience over globbing

Unless there is a reasonable use case that would be blocked or made significantly harder without this, I am -1.

@gaborbernat
Contributor

I think this could be achieved if we print the generated files on stdout and forward other output to sys.stderr:

twine upload $(python -m build 2>/dev/null)

Though, in general, I'm tempted not to support this. I can see a valid use case for this in the sense that currently, if you want to build a package and then upload it, you need something like:

rm -r dist && python -m build . && twine upload dist/*

Because otherwise, dist might contain previous builds you don't want to upload with simple globbing. I'm a solid -1 on the JSON part, but I'm -0.5 on the stderr/stdout part... 🤔

@FFY00
Member

FFY00 commented Dec 7, 2020

Because otherwise, dist might contain previous builds you don't want to upload with simple globbing.

But you can simply choose another dist folder. If that were not possible I would agree with this feature, but you can... I think it is trivial to output to another folder if you need to separate things, and then move the files afterwards if you want to have everything in the same folder at the end.

@gaborbernat
Contributor

How do you select a folder name that's guaranteed to be empty/not existing without invoking an rm first?
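For what it's worth, the stdlib can answer that question: tempfile.mkdtemp creates a uniquely named directory that is guaranteed to be new and empty. This is only a sketch of the pattern, not anything build itself offers:

```python
import glob
import os.path
import tempfile

# A fresh, uniquely named, guaranteed-empty output directory: no rm needed.
outdir = tempfile.mkdtemp(prefix="dist-")

# You would then build into it, e.g.:
#   python -m build --outdir <outdir> .
# and every file matched below would be from this invocation, by construction.
artifacts = glob.glob(os.path.join(outdir, "*"))
print(artifacts)  # empty until a build has actually run
```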

@kpfleming

As I've been working on a project with @gaborbernat, I've run into this exact problem as well. The only reason this works properly today for twine is that twine apparently does its own glob expansion.

In a CI job I'm running python -m build --sdist --wheel with no prior knowledge of what the generated file names will be, and then I want to do an install-test of those packages. python -m pip install foo/* does not do glob expansion, so it is necessary to either know the filenames of the sdist/wheel, or to use external glob expansion to get their names.

As best I can tell PEP 517 requires the build backend to return the basename of the thing it built, and ProjectBuilder in this tool returns the full path to the thing that got built. The requested information is already available, so would a change to emit it on request be a significant effort?

@uranusjr
Member

FWIW if you know the package name in advance, you can do pip install --find-links foo <project> instead.

@gaborbernat
Contributor

gaborbernat commented Jan 20, 2021

Calling pip just to find the names is, IMHO, way too heavyweight a solution.

@uranusjr
Member

uranusjr commented Jan 20, 2021

I was responding to kpfleming’s comment, which was talking about actually installing the built wheels. If you know the project names in advance, you can pass those to pip to install the packages (instead of using path, which needs either glob expansion or knowing the dynamically-generated file names).

I made no mention of using pip just to find out the names. pip cannot even do that (without you manually parsing its debug output).

@FFY00
Member

FFY00 commented Jan 20, 2021

Okay, I see the need, though I don't think adding a new option for this would be the best solution. python -m build is a command for users; it is designed to be fairly simple and as intuitive as we can make it. Building from an automated script, without user interaction, is a different use case. I think adding a new option here would increase the complexity of the command line, and still not be enough to solve the kinds of issues that may arise from this use case, putting more pressure on us to add more options and make the CLI more complex still. For this reason, I believe this use case should be handled by a different command (maybe python -m build.machine?).

The idea would be that this new command would be able to output the build information that automated tooling may need. Usually, I'd say to just use the Python module to write your customized payload, but that could be annoying, and given how common this use case is, having a ready-to-use interface would make sense. Having this as a separate command also opens the possibility of later splitting it into a different package if it becomes increasingly complex or starts needing external dependencies.

How I'd propose this command behave is to simply output JSON based on a defined JSON schema. It would then have the option to output only a specific field instead.

So, the usage in this case would be something like:

$ python -m build.machine --output-field build.artifacts
dist/my_package-1.0.0.tar.gz
dist/my_package-1.0.0-py3-none-any.whl

TLDR: Keep python -m build a simple user-focused command and introduce a new command for use in automated scripts and that could appropriately address the requirements of that more complex use case.
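To make the shape of that output concrete, here is a guess at what the JSON from the hypothetical python -m build.machine command might look like; neither the command nor this schema exists, and every field name is illustrative:

```python
import json

# Illustrative payload only: a sketch of what a machine-oriented build
# command might emit. None of these field names are an agreed schema.
payload = json.loads("""
{
    "build": {
        "artifacts": [
            "dist/my_package-1.0.0.tar.gz",
            "dist/my_package-1.0.0-py3-none-any.whl"
        ]
    }
}
""")

# `--output-field build.artifacts` would then reduce to printing this list:
for path in payload["build"]["artifacts"]:
    print(path)
```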

@gaborbernat
Contributor

gaborbernat commented Jan 20, 2021

TLDR: Keep python -m build a simple user-focused command and introduce a new command for use in automated scripts and that could appropriately address the requirements of that more complex use case.

I'm personally -1 on this proposal. It would confuse more than help to have to maintain and use two separate entry points depending on your use case. But considering twine accepts glob expressions, I'm personally not too fussed about this at the moment, so I have no strong feelings about a solution, and I feel introducing two entry points is more confusing/pricey than its benefit...

PS. Your proposal also goes against the UNIX design philosophy; I haven't seen this duality in other tools. For example, there is no ls and ls.machine. My 2c.

@layday
Member Author

layday commented Jan 21, 2021

Perhaps this is something that we could roll into #192 - if the output format were customisable - say, if build could grow a provisional --output-format=(human|json) option, and all non-build output redirected to stderr as suggested, that'd probably meet users' needs. Imagine:

$ python -m build -w 2>&-
Built foo.whl
$ python -m build -w --output-format=json 2>&-
{"type": "build_success", "path": "foo.whl"}

This would mean we'd have to cook up some kind of JSON schema, and we'd need to think about whether this would be more generally useful - would people care about other types of messages being emitted in a machine-readable format?

@uranusjr
Member

say, if build could grow a provisional --output-format=(human|json) option, and all non-build output redirected to stderr as suggested, that'd probably meet users' needs.

I was going to suggest something similar as well; if build is going to do this, it should have a global flag similar to Git’s --porcelain=.

Another solution to this would be to offer a programmatic API that maps exactly one-to-one to the command line, and that either returns or passes to hooks structured data containing the relevant information. The stdlib venv.EnvBuilder is an example; its init parameters map exactly to python -m venv arguments, and the context argument in its various hooks contains information about the environment being created. This way, people looking for machine-readable output can write a Python script with that API and "bring their own serialisation" for data exchange.
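The venv analogy is concrete enough to demonstrate. EnvBuilder's constructor parameters mirror the python -m venv CLI, and ensure_directories hands back a structured context object rather than printing anything; a sketch, not anything build provides:

```python
import tempfile
import venv

# Constructor parameters map one-to-one onto `python -m venv` options.
builder = venv.EnvBuilder(with_pip=False, symlinks=False)

# The returned context carries structured data about the environment,
# the kind of thing a machine-readable `build` API could likewise return.
context = builder.ensure_directories(tempfile.mkdtemp())
print(context.env_dir)
print(context.env_exe)  # where the interpreter will be copied/linked
```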

@FFY00
Member

FFY00 commented Apr 20, 2021

I think I am okay with going with @uranusjr's proposal of a programmatic API, though it is not the cleanest solution for this. It would still be useful on its own.

What about a python -m build.json command that would behave just like python -m build but output JSON? There we could have all the required options to curate the data output.

I would really like to keep this separate from the main command for two reasons: 1) it makes the command much simpler, and 2) it becomes easy to move the command to another package if needed. I would like to keep things simple and fairly modularised, given that this is a critical package for bootstrapping Python environments; I want to be able to easily drop functionality, especially runtime requirements, if we run into any issues.

@henryiii
Contributor

How about having --output-format=default (or similar), and if you pass something else, like --output-format=json, it looks up an entry point? Then build-json could provide a build.output-format:json entry point. Or you could even include it in the same package, but that would still make it easier to pull it out or for people to write more.

I like the idea of a programmatic API, though it's a bit of work to document, it would be nice to have (and sometimes cleans up the internals a little).

What about python -m build.json

I don't like this, personally. First, what do you do with pyproject-build? pyproject-build.json? Second, this is much harder to use with pipx run, which is a fantastic way to run build, especially in CI - you'd be forced into a --spec build pyproject-build.json (or whatever it was called). Third, it's not discoverable; python -m build -h won't naturally show it as an option. Finally, it's not a different command, just a different output option; if it were a different command, you'd have to duplicate all the options, like --wheel, etc. That's not good API design, generally; it's not composable, and it turns something that is fundamentally an "option" into a command. Now, if the JSON form had a completely different set of options, this would be better/correct. But not if it just changes the output.

If it's only JSON, then JSON is pretty easy to handle with stdlib utils, so I don't think it would hurt bootstrapping.

@gwerbin

gwerbin commented Aug 23, 2022

Sorry to bump an old thread, but here's another use case: generating the names of the sdist and wheel files before actually building them (when possible, e.g. when a dynamic setup.py is not present), for later use in a Makefile.

@henryiii
Contributor

This can't be done ahead of time, as it's up to the build backend to decide the outputs. build can't know whether the backend is going to produce a pure Python wheel, a compiled wheel, or something in between (like one that doesn't depend on the Python version but does depend on the OS).

@layday
Member Author

layday commented Apr 20, 2024

I've been writing built dist filenames to a JSON file in the dist folder, but that ends up being picked up by twine with dist/*, which attempts to upload it to PyPI and fails. Prepending a dot to the filename would work on *nix, but that would also make it less discoverable, and I assume it would have no effect on Windows. If we do decide to write filenames to a file next to the dists, we might have to coordinate excluding it from twine.

@henryiii
Contributor

henryiii commented Apr 24, 2024

What I think would work is a --json-output= option (bikeshedding over the name is fine) that would write the filename (and a bit more info, if anything else makes sense) to a file the user specifies. That way it doesn't have to be in dist unless the user wants it there, you could save multiple runs, etc. For tools like cibuildwheel, this could be a temporary dir.

This can't be done with stdout/stderr, since the build backend is allowed to write there and the info is important/useful. uv has this same problem and works around it by writing to a file.

@layday
Member Author

layday commented Apr 24, 2024

How would I reproduce this in uv? It doesn't expose a build command.

@henryiii
Contributor

When uv is building (e.g. for uv pip install, or the experimental build command, which requires a dev build), it needs to communicate between processes. Stdout/stderr can't be used, so uv moved to using a temporary output file to communicate between the Python and Rust processes. See astral-sh/uv#2314
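That temp-file IPC pattern is simple enough to sketch in a few lines; here a toy child process stands in for a build backend, and the filename it reports is made up:

```python
import json
import subprocess
import sys
import tempfile

# Frontend/backend IPC via a file: the parent hands the child a path,
# the child writes structured results there, and stdout stays free for
# the backend's ordinary, unstructured output.
with tempfile.TemporaryDirectory() as tmp:
    result_path = f"{tmp}/result.json"
    child_code = (
        "import json, sys\n"
        "print('arbitrary backend chatter on stdout')\n"
        "with open(sys.argv[1], 'w') as f:\n"
        "    json.dump({'wheel': 'foo-1.0-py3-none-any.whl'}, f)\n"
    )
    subprocess.run([sys.executable, "-c", child_code, result_path], check=True)
    with open(result_path) as f:
        result = json.load(f)

# The structured result survives regardless of what the child printed.
print(result["wheel"])
```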

@layday
Member Author

layday commented Apr 24, 2024

Ah, but that’s for IPC with the build backend. pyproject-hooks works exactly the same way.
