Language Design Principles
andychu edited this page Jun 7, 2024 · 95 revisions
- Syntax and Semantics Should Correspond
- The same semantics should use the same syntax
- Different semantics should use different syntax
- e.g. discussion in The Five Meanings of #
- e.g. `find -type f` and the "I'm too lazy to write a lexer" pattern is BANNED!
- Users don't read the manual, so syntax should be "guessable" based on common, established behavior from Python, JavaScript, C, JSON, etc.
- The common behavior should be the default behavior. The short thing should be the right thing.
- For example, simple word evaluation makes it so that you can use `$var` instead of `"$var"`. That's almost always what you want.
- `read -r` should have been the default in bash -- i.e. it inhibits backslash processing, which most people didn't intend with `read`
- Note that `bin/ysh` has all the right defaults with `shopt --set ysh:all`. `bin/osh` is compatible.
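The `read -r` point can be checked in any POSIX shell. This is a minimal sketch, not taken from the Oils docs; the sample string and variable names are made up for illustration.

```sh
# Hypothetical input containing backslashes (e.g. a Windows-style path).
line='C:\temp\new'

# Without -r, read treats each backslash as an escape character and drops it.
IFS= read plain <<EOF
$line
EOF

# With -r, backslashes are kept literally.
IFS= read -r raw <<EOF
$line
EOF

printf 'read:    %s\n' "$plain"   # read:    C:tempnew
printf 'read -r: %s\n' "$raw"     # read -r: C:\temp\new
```

The short form silently corrupts the data, which is the opposite of "the short thing should be the right thing".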
- Failures should not be ignored.
- Example: in bash, when evaluating `strftime` in printf strings like `%(%Y)T`, if the result overflows a 128 byte buffer, it's silently truncated!
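Another well-known case of an ignored failure (not the `strftime` one above, and not from the Oils docs) is `local` masking the exit status of a command substitution. A small sketch, assuming a shell that supports `local`, such as bash or dash; the function names are hypothetical.

```sh
# The exit status of `local` itself (0) replaces the status of $(false).
masked() {
  local out=$(false)
  echo "$?"
}

# Declaring first and assigning separately keeps the failure visible.
visible() {
  local out
  out=$(false)
  echo "$?"
}

masked_status=$(masked)     # 0: the failure was silently dropped
visible_status=$(visible)   # 1: the failure is reported

echo "masked=$masked_status visible=$visible_status"
```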
- Every feature should have Predictable, Linear Performance (extended globs break this rule with backtracking, so they're not in YSH)
- Minimize the use of global options (`shopt`)
- YSH started out with many such options, but I eliminated them over time because it got unwieldy to explain and document.
- There are still many of them and they should be used sparingly. But note that the `strict_` ones don't really have any cost, because they abort your program on disallowed behavior. They don't silently change the semantics.
- Rationale: Global state makes code harder to read. It's a "hidden mode".
- They should mostly be hidden under groups like `ysh:upgrade`
- Counterexample: `simple_word_eval` is probably the most important one that silently changes behavior, and I think it's justified in that case.
- Borrowed from the Zen of Python (`import this`)
- If the implementation is hard to explain, it's a bad idea.
- If the implementation is easy to explain, it may be a good idea.
- Oils Is Exterior-First (Code, Text, and Structured Data)
- This means interior features are less important, e.g. Lisp-like macros (as opposed to Unix-like code gen) and closures.
YSH is less constrained by compatibility, although there is still some consideration for it.
- It should be a smooth upgrade from OSH. Avoid "wild" breakage.
- We keep all the good concepts and throw out some bad ones.
- It should be explainable as a clean-slate language! This principle is heavily in conflict with the first, but surprisingly few compromises were necessary!
- Avoid inventing syntax that doesn't exist in any other language. Most of YSH should look familiar to programmers and shell users.
- `@` has precedent in Perl, PowerShell, etc.
- The expression syntax comes from Python, JavaScript, etc.
- However, a corollary of the principle above is: If YSH has completely new semantics, then inventing a new syntax is justified.
- See YSH Language Influences
- YSH should be familiar to Python and JavaScript users. Common features like assignment should behave similarly.
- This principle has "leaked" into OSH when omitting `declare -i`. Also, our initial reluctance to implement `$a == ${a[0]}` is shaped by this.
- Conversely, if our syntax looks like JavaScript or Python, it should behave like JavaScript or Python, unless we're fixing a wart.
- e.g. See #language-design > Things Oils Shipped Without
- This is a corollary of "syntax and semantics should correspond", but across languages
- Don't break the interactive shell / top level / examples printed in books
- e.g. We don't break redirect syntax, and we don't break `PYTHONPATH=. foo.py`
- There Should Only Be One Kind of Expression
- Shell has 3 to 4 recursive expression languages: arith, bool, word. And bash has regexes.
- In contrast, YSH has just one expression language. Note that eggexes are "first class".
- Exception: Globs are still a separate expression language. (But they're unchanged in YSH, inherited from POSIX)
- Avoid single-letter flags and names. This was OK in the 70's but no longer scales!
- For example, `shopt --set` is better than `shopt -s`; `test --file` is better than `test -f`
- Arrays are first class
- In particular, no silent splitting and joining, as happens with unquoted substitutions, `$@`, `echo` and `eval`, etc.
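The silent splitting and joining mentioned above is easy to reproduce in a POSIX shell. This sketch is illustrative only; `count_args` is a hypothetical helper, and the default `IFS` is assumed.

```sh
# Hypothetical helper: prints how many arguments it received.
count_args() { echo "$#"; }

# Two arguments; the first one contains a space.
set -- 'my file.txt' 'notes.md'

unquoted=$(count_args $@)      # 3: the space silently splits the first argument
quoted=$(count_args "$@")      # 2: argument boundaries are preserved

# Joining loses the boundaries for good: re-splitting gives 3 words, not 2.
joined="$*"
resplit=$(count_args $joined)  # 3

echo "unquoted=$unquoted quoted=$quoted resplit=$resplit"
```

With first-class arrays, the list of two elements stays a list of two elements unless you ask for something else.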
- YSH has reference semantics in general, but value semantics for everything that shell does
- Making copies of List
- Passing List as ARGV
- But for Python and JS stuff, you have reference semantics
- It's OK to make common things look pretty, even if they are slightly inconsistent
- `if is-main` is nicer than `if (_is_main())`, even though it conflates success/fail and true/false. (It is also a builtin, so it doesn't have errexit pitfalls.)
- Avoid syntax with confusing corner cases -- e.g. What does ${####} Mean? and Shell WTFs
- Avoid adding syntax that will be used rarely.
- Example: All of these are valid in YSH, and will be common: `--flag foo`, `--flag 'foo'`, `--flag $mystr`, `--flag=$mystr`, `--flag u'\n'`, etc.
- There is a corner case for `--flag=u''`: the `u` is not significant. But so far, all proposed "cures" are worse than the disease.
- Don't take on problems you can't solve correctly
- A major example of this is that we don't assume we know the syntax of external commands like `cp`, `ls`, etc., for both completion and linting
- No implicit serialization / deserialization from typed data to strings
- e.g. flags, env vars, or J8 notation
- Conversions are always explicit. This is mainly because they always involve the possibility of errors, and we don't hide errors.
- Don't be afraid to be low-level
- For example, arguably a "high level" language would use some kind of `Decimal` number type, since that would match text and JSON.
- But we use double-precision floating point numbers, where `0.1 + 0.2 != 0.3`, etc.
- This is because doubles are familiar to low-level programmers. We encourage learning base 2, and the details of floating point numbers -- they are essential low level concepts!
- In general, I like shell because it's an interesting mix of high level (short code) and low level (close to the system). YSH preserves that flavor.
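The `0.1 + 0.2 != 0.3` behavior is a property of doubles in general. As a stand-in for YSH (an assumption made so the sketch runs in any POSIX environment), this uses `awk`, which also does its arithmetic in double precision.

```sh
# awk stores numbers as doubles, so it shows the same base-2 rounding.
verdict=$(awk 'BEGIN {
  r = (0.1 + 0.2 == 0.3) ? "equal" : "not equal"
  print r
}')

# 17 significant digits are enough to show the exact double that was computed.
sum=$(awk 'BEGIN { printf "%.17g\n", 0.1 + 0.2 }')

echo "$verdict"   # not equal
echo "$sum"       # 0.30000000000000004
```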
OSH is a "cleaned up shell/bash" and heavily constrained by compatibility. But there are edge cases where we have to make choices. The spec tests have uncovered dozens of cases where existing shells disagree, so we have to make a choice!
- Avoid complex "line noise" syntax. We won't add more syntax that looks like `${x@P}`, `${x^^}`, `cat <<< 'hi'`, or `exec 2>&-`. It's too elaborate and unfamiliar.
- The Common Subset Principle -- In general, OSH shouldn't introduce incompatible semantics for the same syntax, and it should be highly compatible with legacy shells. It might not run every last bash script. However, in those cases, you should be able to make small modifications to allow your script to run under both OSH and bash. Most often these changes are to improve clarity.
- Example: In bash, `echo X > @(*.py)` means the same thing as `echo X > '@(*.py)'` (yes really). OSH disallows the former for clarity, but the latter is in the common subset of OSH and bash.
- Example: The meaning of `()` in `declare -A assoc=()` is changed to obey the common subset principle. It means empty assoc array rather than empty indexed array because the context is clear, and because in bash `declare -A dict` means something different.
- Static Parsing
- Dynamic Parsing (parsing at runtime) Confuses Code and Data.
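To make "confuses code and data" concrete, here is a small POSIX shell sketch. It is an illustration with made-up values, not an example from the Oils docs. In both cases a string that was meant to be data is substituted first and parsed afterwards, so its contents change the structure of the code.

```sh
# Hypothetical "number" that is really an expression fragment.
x='1 + 2'

# The text of $x is pasted in before the arithmetic is parsed,
# so this evaluates 1 + 2 * 2, not (1 + 2) * 2.
result=$(( $x * 2 ))
echo "$result"   # 5

# Hypothetical "name" that carries a second command.
name='hello; echo INJECTED'

# eval re-parses the string at runtime, so the data becomes two commands.
out=$(eval "echo $name")
echo "$out"      # prints "hello" and then "INJECTED" on separate lines
```

A statically parsed language decides the shape of the expression before any variable is read, so neither surprise can happen.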
- Consider Interactions Between Language Features (bash doesn't do this, e.g. extended globs)
- Minimize the combined OSH+YSH language size to the degree possible.
- Where YSH duplicates functionality from OSH (like arithmetic), it has to be significantly better.
- This partly explains why we keep OSH string literals in YSH, and why bash `declare -a/-A` behave differently in YSH, and why `declare -i` isn't supported.
- It also explains some constraints on the syntax, i.e. that we only have a `ShCommand` lexer mode, and no `YshCommand` lexer mode
- Don't Silently Change What Code Means. Instead, choose a new syntax.
- Early on, I wanted to take over `set` for assignment (leaving all options for `shopt`). But now it's `setvar`. It was tempting to take it over, but a bad idea.
- `cols` could have been `select`, but `select` is already taken by a rarely used shell feature.
- An exception is `shopt -s simple_word_eval`, which does (silently) change the meaning of unquoted `$x`. But most newcomers and even some long-time shell users are surprised by the splitting; that is, many shell scripts actually only operate correctly on names without spaces. So in many cases this option will silently fix bugs, but will require adding an explicit `split()` where looping over unquoted variables.
- Local reasoning about code. You shouldn't have to look at the top of the file constantly to figure out how code behaves.
- Blocks like `shopt --set errexit { }` allow local reasoning, rather than setting the global permanently
- `redefine_proc` prevents distant definitions from clobbering your code
- TODO: tag procs with `ysh:all`? issue 1147
Blog: HOW OSH Is Designed / Why OSH Isn't Bash
- You should be able to express arbitrary byte strings. Everything should be "8-bit clean" by default.
- UTF-8 is an optional (but common) layer on top. (Ditto for other encodings.)
- You should be able to use existing Unix tools with new protocols. (e.g. `grep` still works with lines of QSN. In contrast, the `\0` delimited format of `find -print0` doesn't work with `grep`.)
- This is a narrow waist argument -- conforming to the waist enables code reuse
(referring to: CSTR Proposal and TSV2 Proposal. And the deferred Shellac Protocol Proposal, and Coprocess Protocol Proposal)