Get rid of autoimports via new namespaces #1407

Closed
gilch opened this issue Aug 29, 2017 · 11 comments

@gilch
Member

gilch commented Aug 29, 2017

Namespaces are one honking great idea -- let's do more of those!
-- The Zen of Python

Autoimports are causing us headaches. #1367, #791.

Macros are also brittle due to lack of namespacing and hygiene. #277. Even some compiler builtins have this problem--

=> ((fn [list] (eval `(print ~@list))) ["Hy" "world!"])
from hy.core.language import eval
from hy import HyExpression, HySymbol
(lambda list: eval(HyExpression((([] + [HySymbol('print')]) + list((list or []))))))(['Hy', 'world!'])
Traceback (most recent call last):
  File "c:\users\me\documents\github\hy\hy\importer.py", line 201, in hy_eval
    return eval(ast_compile(expr, "<eval>", "eval"), namespace)
  File "<eval>", line 1, in <module>
  File "<eval>", line 1, in <lambda>
TypeError: 'list' object is not callable

It seems like that should work. I didn't even use any macros. Can you spot the error in the Python expansion?

One might be tempted to scoff and say, "You should never use a builtin name as a local!". That might work for Python, which has a fairly small number of builtins, but for Clojure, which has ~600 symbols in core, that's an unreasonable cognitive burden on the programmer. Hy isn't quite there yet, but it's still got a much bigger core than Python's builtins. The name builtin is already a problem. #525 Maybe restricting the names of special forms is okay, but we really need to be able to shadow the other core names with a local.

Clojure's namespacing may hold the answer. If the compiler had used __import__("builtins").list instead of just list in its quasiquote expansion, the above would have worked properly. Clojure automatically namespaces symbols in its syntax-quote forms, so they work more like that.
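A minimal Python sketch of the failure and the proposed fix, written by hand to mirror the expansion above (plain strings stand in for HySymbol and HyExpression, and the function names `broken` and `fixed` are made up for illustration):

```python
def broken(list):
    # Mirrors the generated lambda: the parameter named `list` shadows
    # the builtin, so the inner call `list(...)` calls the argument itself.
    return [] + ["print"] + list(list or [])


def fixed(list):
    # Reaching the builtin through its module bypasses the local name.
    return [] + ["print"] + __import__("builtins").list(list or [])
```

Calling `broken(["Hy", "world!"])` raises the same TypeError as the traceback, while `fixed` returns the spliced list.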

An inline import is not a big deal in Python, since modules are cached. It's basically just an extra dict lookup--a fast action that you do all the time in Python code.

Using __import__ directly is frowned upon in Python--__import__ is kind of an implementation detail (but we could say the same of the AST itself). The recommendation is to use the import statement, like we're doing now. But that's harder to use than an expression, and requires an extra gensym, and doesn't play nice with __future__. So if you need a more portable import expression, you're supposed to use the one from importlib. But how do we get to that without an autoimport? Is __import__("importlib").import_module("foo").bar any better than __import__("foo").bar? Chicken/egg.
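As a rough illustration of the two points above (caching, and the importlib alternative), here is a small Python check. It uses the standard-library module json only as a convenient example; nothing here is specific to Hy.

```python
import importlib
import sys

# A repeated import of the same module is served from the sys.modules cache,
# so both calls return the very same module object.
first = __import__("json")
second = __import__("json")
cached = sys.modules["json"]

# importlib.import_module is the documented expression-level alternative.
via_importlib = importlib.import_module("json")
```

For a top-level module all four names refer to one object, so the choice between the two spellings is mostly about style and portability.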

The solution to a chicken/egg problem is bootstrapping.

step 1 new builtin

As a start, I propose adding a _# object to Python's builtins module upon import of Hy. The _ marks it "private" and the # makes it unlikely to interfere with any other Python use of the module, even by other libraries, since # is not allowed in Python identifiers. This object's class will override __getattribute__ so that a dot access returns a module. Now you can refer to (e.g.) list as _#.builtins.list in macros, and it will work properly from any module, even in a context where list is shadowed.
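A minimal Python sketch of what step 1 might look like. The class name `ModuleNamespace` and the helper `shadowing_example` are hypothetical, and the sketch uses the `__getattr__` fallback hook rather than a full `__getattribute__` override, which is an assumption made to keep it short. It only handles top-level module names.

```python
import builtins
import importlib


class ModuleNamespace:
    """Hypothetical stand-in for the proposed `_#` object."""

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails, so every
        # dotted access on this object is treated as a module name.
        return importlib.import_module(name)


# '#' is not valid in a Python identifier, so the name can only be
# installed and fetched with setattr/getattr from the Python side.
setattr(builtins, "_#", ModuleNamespace())


def shadowing_example(list):
    # The parameter shadows the builtin `list`, but the namespaced
    # lookup still reaches the real one.
    ns = getattr(builtins, "_#")
    return ns.builtins.list(list or [])
```

On the Hy side the symbol `_#.builtins.list` would have to compile to an equivalent lookup; how that is spelled in the generated Python is left open here.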

step 2 no more autoimports

We can get rid of all autoimports in the compiler once this is available and we rewrite the compiler and macros to use these namespaced symbols instead of builtins or anything from core we're autoimporting now. This is a bit less tedious than rewriting them to (e.g.) __import__("builtins").list or (. ~g!builtins list) after adding (import [builtins :as g!builtins]) to every macro. Unlike some complicated import expression, _#.foo.bar is just a HySymbol, and can be treated like one by macros.

step 3 upgrade quasiquote to syntax-quote

But it could be automated further like Clojure does in its syntax quote. This way, the compiler would insert the namespace prefix for you. You don't have to write it, but it comes out in the expansion anyway.

There could be some namespace macro that alters the compiler state, maybe by setting a _#.hy.core.ns object, which would map abbreviations to their expansions, e.g. {"list": HySymbol("_#.builtins.list"), ...}. We could make the def form add the mapping to the current namespace. #911. This way you don't have to build the mapping either. You just set the namespace, and def builds it for you as you go. setv wouldn't alter the namespace, so adding the abbreviation is optional. There would also be a way to include all the mappings from one namespace in another. You'd often include builtins and hy.core in a custom namespace, for example.
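To make the mapping idea concrete, here is a hedged Python sketch. Forms are modelled as nested lists and symbols as plain strings, which is a simplification; `define`, `refer`, and `syntax_quote` are hypothetical names, not existing Hy functions.

```python
def define(namespace, module, name):
    # What a namespace-aware `def` might record: abbreviation -> full name.
    namespace[name] = "_#.{}.{}".format(module, name)


def refer(target, source):
    # Include every mapping from one namespace in another.
    target.update(source)


def syntax_quote(form, namespace):
    # Replace known abbreviations with their prefixed expansions and
    # leave every other symbol untouched.
    if isinstance(form, list):
        return [syntax_quote(item, namespace) for item in form]
    if isinstance(form, str):
        return namespace.get(form, form)
    return form


builtins_ns = {}
define(builtins_ns, "builtins", "list")
define(builtins_ns, "builtins", "print")

user_ns = {}
refer(user_ns, builtins_ns)
expanded = syntax_quote(["print", ["list", "xs"]], user_ns)
```

Here `xs` has no mapping, so it comes out unprefixed, while `print` and `list` gain the `_#.builtins.` prefix.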

step 4 special variables

We could also use the namespace system to help implement special variables. hylang/hyrule#51. We'd update the def form to make special variables instead of the normal kind. A dot lookup would go through a special __getattr__ method that dereferences a dynamic variable, instead of returning the Var object itself. This means that you'd have to spell out the prefix when using them outside of a syntax quote.
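One possible shape for this, sketched in Python under several assumptions: `Var` and `VarNamespace` are hypothetical classes, bindings are kept on a simple per-variable stack, and thread safety is ignored.

```python
from contextlib import contextmanager


class Var:
    """Hypothetical dynamic variable with a stack of bindings."""

    def __init__(self, root):
        self._stack = [root]

    def deref(self):
        return self._stack[-1]

    @contextmanager
    def binding(self, value):
        # Push a temporary value; always restore it on exit.
        self._stack.append(value)
        try:
            yield
        finally:
            self._stack.pop()


class VarNamespace:
    """Dot access returns the current value, not the Var object."""

    def __init__(self, **roots):
        self._vars = {name: Var(root) for name, root in roots.items()}

    def __getattr__(self, name):
        try:
            return self._vars[name].deref()
        except KeyError:
            raise AttributeError(name) from None

    def var(self, name):
        return self._vars[name]


repl = VarNamespace(print_length=10)


def show():
    # Reads whatever binding is active at call time (dynamic scope).
    return repl.print_length


with repl.var("print_length").binding(3):
    inside = show()
outside = show()
```

The function `show` sees the temporary value only while the binding is active, which is the behaviour one would want for REPL printing options.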

We can start using special variables to configure things like other Lisps do, like changing how things print at the repl.

step 5 module-level namespacing

Making the compiler add the prefix to an abbreviation outside of a syntax quote seems like a bad idea. There'd be no way to shadow it with a local. That's not how special variables work in Common Lisp or Clojure.

Since we want to be able to shadow abbreviations, they need to act like globals, not symbol macros. For most things, it's enough to simply add it to the module dict (e.g. (import [foo [*]])). But that's not good enough for a dynamic variable, which needs the dereference magic that lives in the dot access. At the repl, you could replace the __builtins__ module with an object that does the dereference magic, since if a value is not found in the globals() dict, Python will look for it in __builtins__. But this doesn't work in modules, which are executed all at once on import, instead of incrementally. By the time you try to set __builtins__ it's already fixed.

The way around this is to exec the code or bytecode on module load using a custom globals object that has the magic in its __getitem__. This way, an ordinary global lookup can dereference the special variable, even without a dot, even outside a syntax quote. And it can still be shadowed by a local.
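A small Python sketch of that idea, assuming CPython's behaviour of honouring `__getitem__` on a dict subclass for name lookups in exec'd module-level code. `Var` and `DerefGlobals` are hypothetical names, and the source string stands in for a compiled Hy module.

```python
class Var:
    """Hypothetical special variable holding a current value."""

    def __init__(self, value):
        self.value = value


class DerefGlobals(dict):
    """Globals mapping that dereferences Var objects on lookup."""

    def __getitem__(self, key):
        value = super().__getitem__(key)
        if isinstance(value, Var):
            return value.value
        return value


SOURCE = """
result = greeting + '!'

def shadow(greeting):
    # A local with the same name still wins over the global.
    return greeting

shadowed = shadow('local')
"""

module_globals = DerefGlobals(greeting=Var("Hy"))
exec(compile(SOURCE, "<sketch>", "exec"), module_globals)
```

The undotted name `greeting` is dereferenced at module level, and the parameter of `shadow` shadows it as an ordinary local would.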

step 6 finally, a sensible let alternative

Make an _#.auto object that overrides __setattr__ to create a Var object when you attempt to bind one with that prefix that doesn't exist yet.

The binding form implemented in step 4 requires a prefixed name. In step 5 it would also accept an abbreviation from the currently active namespace. Now, if there's no prefix, and no abbreviation, _#.auto is assumed instead. Now you can use bindings like a dynamic-scoped let. These binding forms seem to do most of what we want from let short of lexical closures. The bindings get released at the end of the last form. You can nest them. You can use short names, like x and y. You can pass them to functions, if the call is made before the binding ends, and they'll get a lexical argument.

conclusion

That's a big plan with a lot of steps. Some of the details are tentative. I'm not going to put all of this in one PR. That's too much. But you might not like the changes until they're finished, so you need to know where I'm going with this.

I'd also like to get general approval of this plan before I start putting in the actual work. Does this seem like a good direction? Any parts that aren't clear? Is there anything you'd want changed?

@Kodiologist
Member

600 builtins is too many. Perl has around 200 (depending on how you count), which is also too many (many of them are no longer very useful, and I suspect they predated the implementation of modules in Perl). Arguably, Hy already has too many. There's no shame in moving things to hy.extra or so.

@refi64
Contributor

refi64 commented Aug 29, 2017

Steps 1 and 2 sound great, though I personally don't think using __import__ here would really be a problem.

I got totally lost at step 3 though...

@Kodiologist
Member

The most straightforward solution to auto-importing woes is to replace the current auto-importing magic with from hy.core.language import * at the start of the REPL and the top of each source file. Then Hy core functions would work very much like Python builtins.

Things like dynamically scoped variables and protection against shadowing of core function names in macro expansions would be nice. But it's hard for me to guess in advance whether your plan would work and what the side-effects would be.

@gilch
Member Author

gilch commented Aug 29, 2017

I got totally lost at step 3 though

@kirbyfan64, I was assuming knowledge of Clojure. It might help if you play with Clojure's syntax quote.

user=> `x
user/x

Notice the prompt. That indicates that the current namespace is user. In the first example, the compiler expands `x to user/x, which is a symbol with a namespace prefix. This serves the same purpose as the __import__("builtins").list trick. So x in the macroexpansion will refer to the x in the current namespace (user), without conflicting with any local in the expansion context.

If you need to insert a symbol without the prefix (like it in our anaphoric macros), you can do that too, by unquoting any form that evaluates to a symbol, instead of just writing the symbol.

user=> `~'x
x

The current namespace lives in clojure.core/*ns*

user=> clojure.core/*ns*
#<Namespace user>

We can change it with the in-ns function. Notice that the prompt changes to reflect this.

user=> (in-ns 'foo)
#<Namespace foo>
foo=> `x
foo/x

And now symbols get a prefix for the current namespace.

foo=> (def x "Foo!")
#'foo/x
foo=> x
"Foo!"
foo=> (in-ns 'user)
#<Namespace user>
user=> x
CompilerException java.lang.RuntimeException: Unable to resolve symbol: x in this context, compiling:(NO_SOURCE_PATH:0:0)
user=> foo/x
"Foo!"

Here we defined a foo/x and it's not available in user unless you use the full name. But you can add the mappings from the foo namespace to the abbreviations available in user by referring foo into user.

user=> (refer 'foo)
nil
user=> x
"Foo!"
user=> (eval `x)
"Foo!"
user=> `x
foo/x

See also, the discussion in #911. Does that help?

@gilch
Member Author

gilch commented Aug 29, 2017

The most straightforward solution to auto-importing woes is to replace the current auto-importing magic with from hy.core.language import * at the start of the REPL and the top of each source file. Then Hy core functions would work very much like Python builtins.

I did think of that possibility, but it doesn't work well.

Importing everything at the top doesn't help with #1367. And it creates a new problem: module docstrings have to be the first statement, but now that doesn't work in Hy. (I guess you could explicitly (setv __doc__ ...).) And you'd still somehow have to get the __future__ import to the top of the file. The only statements allowed before a __future__ import are other __future__ imports, and the module docstring.
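The ordering constraints are easy to check from Python itself. The sketch below uses three made-up source strings and a hypothetical helper `compiles`; it relies only on the standard `compile` and `ast` machinery.

```python
import ast


def compiles(source):
    # True if the source compiles as a module, False on a SyntaxError.
    try:
        compile(source, "<sketch>", "exec")
        return True
    except SyntaxError:
        return False


DOCSTRING_FIRST = '"""Module docs."""\nfrom __future__ import division\nimport os\n'
IMPORT_FIRST = 'import os\n"""Module docs."""\n'
FUTURE_LATE = "import os\nfrom __future__ import division\n"

# A string only counts as the module docstring if it is the first statement.
docstring_kept = ast.get_docstring(ast.parse(DOCSTRING_FIRST))
docstring_lost = ast.get_docstring(ast.parse(IMPORT_FIRST))
```

An injected import at the very top therefore silently drops the docstring, and a `__future__` import placed after it is rejected at compile time.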

I'd also like to make Hy's core more discoverable. With _# or something like it, new users could find the core functions with (dir) (dir __builtins__) (dir _#). If we just dump everything into every Hy module, then the user code is harder to sort through with dir. (Maybe you'd always have to use [s for s in dir() if s not in dir(hy.core.language)] at the repl.) Similarly, user-created .hy modules imported in the Python side would be harder to sort through with dir().

Furthermore, all .hy modules will export all of Hy's core to Python when they're imported with *, unless you explicitly set __all__ to something else.

It's more trouble than what we're currently doing.

Things like dynamically scoped variables and protection against shadowing of core function names in macro expansions would be nice. But it's hard for me to guess in advance whether your plan would work and what the side-effects would be.

I am also worried about some of this. Some of these steps could be done in a different order, or in slightly different ways.

If we did the lookup magic from step 5 first, we could hook anything we please into the globals().__getitem__ lookup. Besides dynamic variables, we could make our own objects behave like __builtins__, but only for Hy modules. This way we wouldn't have to put _# in builtins. In fact, we wouldn't even need that first part of the prefix, since we could make all of our namespaced objects act like a builtin. So _#.builtins.list could be builtins/list or something.

I'd also like Hy to support mypy for static typing, if possible. But I'm worried that the abbreviated special variables from steps 5 and 6 would confuse it. Those accessed through . should be fine though. Similarly, using the globals().__getitem__() hook for anything else could have the same problem, hence putting _# in __builtins__ might be the better option. But maybe that would confuse mypy too, since mutating builtins simply isn't done by well-behaved Python modules.

Another option would be to use symbol macros, but restrict them to names containing a /. So the compiler would treat them differently than normal symbols, just like it does for names containing a .. (/ and // would be special cases, just like . is.) For example, the compiler would expand builtins/list to __import__("builtins").list and hy.models/HyExpression to __import__("hy.models").HyExpression, etc.
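One detail worth keeping in mind for dotted module names: a bare `__import__` call returns the top-level package, not the submodule. The Python sketch below shows this with the standard-library package json, chosen only as an example; an expansion for a name like hy.models would need to account for it, for instance with importlib.

```python
import importlib

# With a dotted name and no fromlist, __import__ returns the top-level package.
top_level = __import__("json.decoder")

# importlib.import_module returns the named submodule itself.
submodule = importlib.import_module("json.decoder")

# A non-empty fromlist makes __import__ return the submodule as well.
also_submodule = __import__("json.decoder", fromlist=["JSONDecoder"])
```

So an attribute lookup on `__import__("a.b")` finds names defined in `a`, and only reaches names in `a.b` if `a` happens to re-export them.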

Another option would be to autoimport _# in every Hy file (and at the start of the repl), instead of putting it in __builtins__. This would still be discoverable via (dir) (dir _#). It wouldn't automatically import from Hy modules, since it starts with _. It wouldn't pollute every Hy module's globals() dict with hundreds of symbols you might not even use (just one). But we still have to make sure it comes after the docstring and __future__ imports. Since it's no longer in builtins, we could use any name that starts with an underscore, not just invalid Python identifiers, like _#. Maybe _hy could work. The users might want to type these, so we don't want it to be too long.

@Kodiologist
Member

Importing everything at the top doesn't help with #1367.

Yes, Hy will still need to correctly position future statements, module docstring, and core imports.

@gilch
Member Author

gilch commented Aug 29, 2017

Some more concerns. We can shadow special forms.

=> (setv + 42)
  File "<input>", line 1, column 7

  (setv + 42)
        ^^
HyTypeError: b"Can't assign to a builtin: `+'"

=> ((fn [+] (+ + +)) 21)
from hy.core.shadow import +
(lambda +: (+ + +))(21)
42

You wouldn't expect this to work, but it does. Maybe it shouldn't, but then, how are the shadow functions supposed to work? Maybe shadowing special forms should be allowed in general instead of partially disallowed like now. But then, how should a Hy syntax-quote expand `+? Like hy.core.shadow/+ or _#.hy.core.shadow.+?

Let's look at Clojure--you actually can assign to special form symbols.

user=> (def do 42)
#'user/do
user=> `do
do
user=> (do do)
42
user=> `(do do)
(do do)

and, as you can see, it doesn't expand them in syntax quotes. But, like a macro, it takes priority over a function with the same name, like how shadows work in Hy now. This is probably the right way to handle it. You could still explicitly use the prefixed form, when that's your intent. But it should be explicitly documented, because Hy has a lot of special forms compared to most Lisps.

@Kodiologist
Member

You wouldn't expect this to work, but it does. Maybe it shouldn't

Presumably, it's not supposed to, and whoever wrote the "Can't assign to a builtin" feature forgot about function parameters etc. as ways to change those names. We could conceivably ban assignments to + everywhere (more precisely, everywhere except Hy core) or nowhere, but the status quo makes no sense.

@gilch
Member Author

gilch commented Aug 30, 2017

Considering how Clojure works, no ban anywhere makes more sense. And it would be easier to implement too. Special form names would take priority in the function position, even when they're shadowed, like Clojure.

@gilch
Member Author

gilch commented Aug 30, 2017

I personally don't think using __import__ here would really be a problem.

I worry that other implementations don't have it. I'd like to support IronPython3 and Jython3 when (if) they get released. They both appear to have active repositories. I'm also thinking about supporting Stackless, but PyPy3 might make that obsolete. It doesn't run on Windows yet though.

I suppose we should check out how those implementations do it. Maybe we could special case them somehow if they don't use __import__.

Another concern is the issue of creating hidden dependencies. I'm not sure how big a deal this is, but normally, you want all of your imports at the top of the file, so you know what it needs to run properly just from looking at the head, instead of searching through possibly thousands of lines. Except for __future__, Python doesn't enforce this at all. You can put import statements anywhere statements are allowed. But doing this is considered poor style in most cases, because it hides dependencies.

One "advantage" of using something like _# instead of __import__ is that we could check the module cache, but raise an error if it isn't there, instead of importing it. This would require a module to be imported normally at least once (or put in the cache at least). But importing the module anywhere in the program would put it in the cache, so I'm not sure how much this helps.
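A hedged Python sketch of the cache-only variant. `CachedModules` is a hypothetical class name, and raising ImportError (rather than some other exception) is an arbitrary choice made for illustration.

```python
import sys


class CachedModules:
    """Hypothetical namespace that never triggers an import itself."""

    def __getattr__(self, name):
        # Only modules already present in the cache are visible.
        try:
            return sys.modules[name]
        except KeyError:
            raise ImportError(
                "module {!r} has not been imported yet".format(name)
            ) from None


import json  # an ordinary, visible import somewhere in the program

cache_only = CachedModules()
found = cache_only.json
```

As noted above, any import anywhere in the program fills the cache, so this only catches modules that nothing has imported at all.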

@Kodiologist
Member

Autoimport per se is gone as of #2141, except for the hy module itself, which is both a user convenience and important for things like hy.models.Expression in the compilation of quote.
