Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[builtin/declare] Support -p #671

Merged
merged 7 commits into from
Mar 21, 2020

Conversation

akinomyoga
Copy link
Collaborator

Support declare and declare -p (cf #647, but declare -f, trap, trap -p are not yet supported)

  • Note: Bash quotes the values with double quotations, but this patch uses single quotations provided by string_opt.ShellQuote.
  • Note: Bash ignores the flag -g when -p flag is specified, but this patch supports it.
  • Note: Bash ignores the flags -nrxaA when -p flag and variable names are specified, but this patch supports them.
  • Please see the changes in spec/assign-extended.test.sh for the detailed behavior

@akinomyoga
Copy link
Collaborator Author

Also, there is a problem in implementing declare -p for an array that have unset elements. In Bash, arrays have a literal of the form ([index]=value), so unset elements can be skipped in the literals. However, Oil does not support the form ([index]=value) for indexed arrays, so I don't know how to represent the array with unset elements. Current implementation outputs an empty string '' for a value None as follows:

$ bash -c 'arr=(); arr[3]=foo; declare -p arr'
declare -a arr=([3]="foo")
$ osh -c 'arr=(); arr[3]=foo; declare -p arr'
declare -a arr=('' '' '' 'foo')

So the question is:

Q. Is there any literal to represent an array with unset elements in Oil?

If Oil does not support such a literal, maybe the output of declare could be changed to the following form. Do you have any other ideas?

declare -a arr; arr[3]='foo'

Copy link
Contributor

@andychu andychu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing this!

Yes the solution you describe seems good, if the intention is to have declare -p output understandable by eval. I guess it can switch to that form only if None is in the array?

I mentioned a few places where we will likely have to change things to translate to C++... so the important part is the tests. (there are some threads on Zulip about how I translate to C++)

They look good pretty exhausitive to me, but I didn't look at every line. As mentioned, I think at least one test should do eval.

[local]
test_var5='555'
## END
## OK bash STDOUT:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe instead of listing exact output of OSH and bash, you can eval the output of declare -p?

I think that's the property you want, right?

It makes the test a little more meaningful if both bash and Oil pass the exact same way.


Not all tests have to be like that , but I think at least one should no?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe instead of listing exact output of OSH and bash, you can eval the output of declare -p?

Thank you for the suggestion! That's a good idea! I added such a test d1c85b4, but it is still a very simple test.

Actually, the problem is "how to check the results" of eval & declare: The existence, the flags, and the values of each variable should be checked (i.e., printed for the test). In the case of indexed/associative array variables, the existence and the value of each element, and non-existence of redundant elements should also be checked (i.e., printed). If one wants to test everything in the exact same way in Bash and Oil, it seems that one needs to implement another declare -p in pure shell scripts (which may involve other bugs/divergences on the detailed behavior of each shell), so I gave up.

I think that's the property you want, right?

Yes! Thank you.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that's true, it's tricky. One possibility is to sed to different variable names, and then check for equality. Something like:

eval -- $(declare -p | grep ^myarray | sed s/^myarray=/restored=/)
# now check myarray == restored

arrays_equal ...   # I wrote a function that takes two lengths and then @A and @B concatenated

But yes this is a lot of trouble ...

Actually in Oil we may be able to help with this. We simply have

if (A == B) {  # are string, arrays, associative arrays equal?
}

but that's in the future

@@ -25,10 +26,108 @@
from frontend import arg_def


def _PrintVariables(mem, cmd_val, arg, print_flags, readonly = False, exported = False):
flag_g = getattr(arg, 'g', None)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These can all just be arg.g, arg.n, etc. right? And then you don't really need the temporary

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function _PrintVariables is shared by class Export, Readonly, and NewVar. Each has its own definition for arg. Export, Readonly, and NewVar supports flags -n, -aA, and -gnrxaA, respectively. For example, some have flag g and the others don't have the flag g. So, one first needs to check whether each attribute is defined in arg or not, or otherwise, I get AttributeError.

I thought about one possibility to create a dictionary at the caller side something like _PrintVariables(..., arg={'g': None, 'n': None, 'r': None, 'x': None, 'a': arg.a, 'A': arg.A}, ...). But it looks like redundant codes. Also, we need to keep consistency between this function call and the definitions of arg of each classes; every time one updates the set of flags in arg, one needs to update this function call accordingly, which is not so maintainable.

Another possibility may be to support the same set of flags for export and readonly as declare.

Do you have an idea of a more clever way to implement this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah OK I see. This is OK for now but could use a comment saying that.

(I think I have an idea of how it will translate to C++ ... the dynamic types of arg.X are an issue I've been thinking about.)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK!

This is OK for now but could use a comment saying that.

It has been already merged, so could you write a comment for this? Thank you!

result[name] = cell
return result

def IsGlobalScope(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should have # type: () -> bool

TODO: I think we can set up the continuous build to alert of that, since type checking is running there now

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! Updated 9e94f90.

cells = {}
for pair in cmd_val.pairs:
name = pair.lval.name
if pair.rval and pair.rval.tag == value_e.Str:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's better to use tag_() though it's unfortunately uglier. That's because it translates to C++ better -- C++ doesn't have "virtual" fields like Python, only methods. I plan to get rid of .tag and make it .tag(), but for now it's .tag_()

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK! Updated 0bd3c4a.

flags += 'A'
if flags == '-': flags += '-'

decl = 'declare ' + flags + ' ' + name
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

generally speaking I use the style of creating a parts = [] array and appending to it, then print(''.join(parts)).

But that is not super important now, if it passes the tests it seems fine

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! I changed to use that strategy in d3dd9c9.

core/state.py Outdated
@@ -1497,6 +1497,31 @@ def GetAllVars(self):
result[name] = str_val.s
return result

def GetAllCells(self, lookup_mode = scope_e.Dynamic):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also I use the style of foo='default' (no space).

I suppose at some point we should do something about issue #1 (autoformatting for Python). If enough contributors want it, I will be motivated to :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, thank you! Hm, PEP 8 (mentioned in #1) says that spaces are not recommended for keyword arguments and default arguments.

PEP-8 - Other Recommendations

Don't use spaces around the = sign when used to indicate a keyword argument, or when used to indicate a default value for an unannotated function parameter.

Updated e2c89a0.

@akinomyoga akinomyoga force-pushed the declare-print-definitions branch from ef68ded to d3dd9c9 Compare March 21, 2020 12:44
@akinomyoga
Copy link
Collaborator Author

Thank you for your fast review and useful comments!

I guess it can switch to that form only if None is in the array?

I implemented 3a5eb65.

@@ -25,10 +26,108 @@
from frontend import arg_def


def _PrintVariables(mem, cmd_val, arg, print_flags, readonly = False, exported = False):
flag_g = getattr(arg, 'g', None)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah OK I see. This is OK for now but could use a comment saying that.

(I think I have an idea of how it will translate to C++ ... the dynamic types of arg.X are an issue I've been thinking about.)

[local]
test_var5='555'
## END
## OK bash STDOUT:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that's true, it's tricky. One possibility is to sed to different variable names, and then check for equality. Something like:

eval -- $(declare -p | grep ^myarray | sed s/^myarray=/restored=/)
# now check myarray == restored

arrays_equal ...   # I wrote a function that takes two lengths and then @A and @B concatenated

But yes this is a lot of trouble ...

Actually in Oil we may be able to help with this. We simply have

if (A == B) {  # are string, arrays, associative arrays equal?
}

but that's in the future

flags.append('A')
if len(flags) == 0: flags.append('-')

decl.extend(["declare -", ''.join(flags), " ", name])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like you don't need ''.join(flags) or ''.join(body) below, because we join it at the end anyway. But this is a minor detail

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for curiosity, how can I write in this case? The reason why I maintained other lists flags and body is because I wanted to use len(flags) and len(body). Ah, OK. Maybe I could write something like the following, but I just didn't want to split a simple line into many lines.

decl.append("declare -")
decl.extend(flags)
decl.extend([" ", name])

Or this? But the following code seems even less efficient because it induces addtional list generations.

decl.extend(["declare -"] + flags + [" ", name])

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I usually check guard space separation in a loop with i != 0 , and then you don't need to check len(body). And I think you can check len(assoc_val.d) for the last check.

But those are minor details and I think it's fine. There is one more string allocation for ''.join(flags) and one more for ''.join(body), but like you say it's easy to introduce other allocations elsewhere. After translating to C++ the previous style of += would introduce a lot more allocations.

I actually did go back and optimize allocations in C++ generated from Python for the parser! e.g. the last step of http://www.oilshell.org/blog/2020/01/parser-benchmarks.html

But I think this is pretty close to optimal, and it's not a hot path anyway, i.e. declare -p is not that common.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK! I see, it is aimed to be translated to C++. Thank you!

@akinomyoga akinomyoga deleted the declare-print-definitions branch March 21, 2020 18:03
@andychu
Copy link
Contributor

andychu commented Mar 21, 2020

BTW the place where I expect it to matter was osh/word_eval.py. It initally built up a lot of intermediate lists and used + to concatenate them, but now it uses the .append() or .write() style. There are a lot of tiny objects in this algorithm, as opposed to the escape codes in strings that other shells use:

https://github.com/oilshell/oil/wiki/OSH-Word-Evaluation-Algorithm

But I expect we can make it pretty fast ...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants