Add infix operators for relational algebra #8036

Merged on Oct 22, 2015 (3 commits)

jiahao
Member

@jiahao jiahao commented Aug 17, 2014

This PR includes yet moar Unicode; this time, primarily to support relational algebra.

  1. Allows all Unicode characters in the geometric shapes block as valid characters.
  2. Defines joins (⨝ ⟕ ⟖ ⟗) and antijoin (▷) as valid infix operators. (Note that ▷ is semantically a shape, but I couldn't find a separate antijoin character.)

@johnmyleswhite
Member

I admire your steadfast commitment to exploring the final frontiers of Unicode, @jiahao.

@jiahao
Member Author

jiahao commented Aug 17, 2014

These are the voyages of the flagship Julia. Its five-year mission: to explore strange new symbols, to seek out new codepoints and new characters, to boldly go where no language has gone before.

@jiahao jiahao added unicode and removed unicode labels Aug 17, 2014
@StefanKarpinski
Member

I'm loling on a bus over this exchange. I do like all these Unicode symbols.

@elextr

elextr commented Aug 18, 2014

The Julia language boldly exploring new frontiers of untypeability ;)

@StefanKarpinski
Member

Except that we've simultaneously expanded the frontiers of typeability.

@jiahao
Member Author

jiahao commented Aug 18, 2014

That's actually a good point. How do people feel about \join, \leftjoin, \rightjoin, \leftrightjoin and \whitetriangleright as LaTeX-like tab-completion sequences?

(Note that \bowtie tab-completes to ⋈, the bowtie character (U+22C8), not to be confused with the aforementioned join operator ⨝ (U+2A1D). The former is still left as an invalid identifier character in Julia. It's amusing that it is possible to tab-complete an invalid character... @stevengj)
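
To make the distinction concrete, here is a small sketch that prints the code points of the two look-alike characters. It assumes Julia 1.x, and the variable names are illustrative only.

```julia
# Two visually similar characters with different code points.
bowtie  = '⋈'   # what \bowtie tab-completes to
join_op = '⨝'   # the join operator proposed in this PR

for c in (bowtie, join_op)
    # codepoint returns the character's Unicode code point as a UInt32.
    println(c, " => U+", uppercase(string(codepoint(c), base = 16)))
end
```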

@stevengj
Member

Shouldn't join have the same precedence as union, i.e. have + precedence?

The standard LaTeX name seems to be \Join, but probably it is just uppercase because \join is used for something else in TeX? I'd rather have \antijoin than \whitetriangleright if we are going to use \join.

Since the tab completion is mostly autogenerated from unicode.xml, it's inevitable that it won't overlap exactly with the set of allowed identifier chars; I don't see this as a problem.

@stevengj
Member

Aren't the unimath names for these symbols \Join, \leftouterjoin, \rightouterjoin, \fullouterjoin, \triangleleft, \triangleright, etcetera? I'd prefer to stick with these names rather than choosing our own.

In general, we might think about importing unicode-math-table.tex. See #8044.

@StefanKarpinski
Member

+1 for using unimath names. Except why is \Join the only one capitalized?

@jiahao jiahao force-pushed the cjh/relational-algebra branch from 298cf34 to 14c55d7 Compare August 21, 2014 02:17
@jiahao
Member Author

jiahao commented Aug 21, 2014

I wasn't aware of unimath; that makes things easy.

I've updated this PR with a better choice of precedence class (multiplication; table joins are essentially equivalent to matrix products)
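
One way to see what the multiplication precedence class means in practice is to inspect how the parser groups a mixed expression. The sketch below assumes Julia 1.x, where `Meta.parse` and the unexported helper `Base.operator_precedence` are available. It does not define ⨝; it only looks at the parse tree.

```julia
# Parse, but do not evaluate, an expression mixing + and ⨝.
ex = Meta.parse("a + b ⨝ c")

# If ⨝ sits in the multiplication class, it binds tighter than +,
# so the tree matches the explicitly parenthesized form.
println(ex == :(a + (b ⨝ c)))

# Compare precedence levels directly (unexported helper, may change).
println(Base.operator_precedence(:⨝) == Base.operator_precedence(:*))
```

Had the operator been given addition precedence instead, the same input would group as `(a + b) ⨝ c`.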

@stevengj
Member

Can you also update the LaTeX table by re-running the unimath script in base/latex_symbols.jl?

@@ -72,6 +71,7 @@ static int is_wc_cat_id_start(uint32_t wc, utf8proc_propval_t cat)
(wc >= 0x2a00 && wc <= 0x2a06) || // ⨀, ⨁, ⨂, ⨃, ⨄, ⨅, ⨆
(wc >= 0x2a09 && wc <= 0x2a16) || // ⨉, ⨊, ⨋, ⨌, ⨍, ⨎, ⨏, ⨐, ⨑, ⨒, ⨓, ⨔, ⨕, ⨖
wc == 0x2a1b || wc == 0x2a1c)))) || // ⨛, ⨜
wc == 0x2a1d || (wc >= 0x27d5 && wc <= 0x27d7) || //joins: ⨝ ⟕ ⟖ ⟗
Member

If joins are infix operators, then they don't also need to go in is_cat_id_start. Normally we put them in one place or the other.

Member

Bump @jiahao.

Member Author

Will remove this change.

Contributor

Bump. Also, please use spaces, not tabs.

@jiahao jiahao force-pushed the cjh/relational-algebra branch from 14c55d7 to 3f35762 Compare September 21, 2014 23:43
@jiahao
Member Author

jiahao commented Sep 21, 2014

Rebased and updated with all comments.

I combined and partially rewrote the scripts in the comment block of latex_symbols.jl so that only a single script needed to be run.

@nolta
Member

nolta commented Sep 22, 2014

Is the travis failure related?

@stevengj
Member

Are some symbols gone? e.g. I can't find \alpha in this list; I think it got replaced with \upalpha, which seems non-ideal.

@JeffBezanson
Member

Bump. I think we need \alpha!!

@jiahao jiahao force-pushed the cjh/relational-algebra branch from 3f35762 to f1e884d Compare October 20, 2015 19:05
@jiahao
Member Author

jiahao commented Oct 20, 2015

Rebased and updated.

@@ -99,6 +99,9 @@ static int is_wc_cat_id_start(uint32_t wc, utf8proc_propval_t cat)
(wc >= 0x2220 && wc <= 0x2222) || // ∠, ∡, ∢
(wc >= 0x299b && wc <= 0x29af) || // ⦛, ⦜, ⦝, ⦞, ⦟, ⦠, ⦡, ⦢, ⦣, ⦤, ⦥, ⦦, ⦧, ⦨, ⦩, ⦪, ⦫, ⦬, ⦭, ⦮, ⦯

// geometric shapes
(wc >= 0x25a0 && wc <= 0x25ff) ||

Member

Aren't many of these already covered by the cat == UTF8PROC_CATEGORY_SO check? It seems like the only ones in category Sm (which have to be special-cased) are U+25F8 to U+25FF.

Member

(It's useful to do the minimal amount of special-casing here for people in other contexts trying to write Julia lexers with regexes etc., e.g. for syntax highlighting in editors.)

Member Author

Yes, you are right. I just tried some simple function definitions without this commit and it works. I'll take it out.

Member

Ah yes, you don't need it at all, because there is already wc >= 0x25F8 && wc <= 0x25ff above in addition to UTF8PROC_CATEGORY_SO.

@jiahao jiahao force-pushed the cjh/relational-algebra branch from f1e884d to f326c25 Compare October 20, 2015 20:10
"\\rightouterjoin" => "⟖", # right outer join
"\\fullouterjoin" => "⟗", # full outer join
"\\Join" => "⨝", # join
"\\mathunderbar" => "̲", # combining low line
Member

Maybe just \underbar? math is kind of redundant here.

Member Author

> math is kind of redundant here.

True; that was the name in unicode-math-table.tex. I can change it manually.

@jiahao jiahao force-pushed the cjh/relational-algebra branch from 2d09759 to ead4141 Compare October 21, 2015 18:07
@jiahao
Member Author

jiahao commented Oct 21, 2015

I've modified the latex symbol parsing script to strip math- out of character names in unicode-math-table.tex.
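
As a rough illustration of that renaming step, a prefix-stripping helper might look like the sketch below. The function name `strip_math_prefix` is hypothetical, and the rule shown (drop only a leading `math` right after the backslash) is an assumption; the actual generator script may apply a different rule.

```julia
# Hypothetical helper: drop a leading "math" from a LaTeX-style name.
function strip_math_prefix(name::AbstractString)
    prefix = "\\math"
    startswith(name, prefix) || return name
    # Keep the backslash, discard the ASCII "math" that follows it.
    return "\\" * name[ncodeunits(prefix)+1:end]
end

println(strip_math_prefix("\\mathunderbar"))
println(strip_math_prefix("\\Join"))
```

Names without the prefix pass through unchanged, so the helper can be mapped over the whole table.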

@jiahao jiahao force-pushed the cjh/relational-algebra branch from ead4141 to ec96896 Compare October 21, 2015 20:55
Note: ▷ is semantically a geometric shape, but I could not find a separate character for antijoin
@stevengj
Member

LGTM once commits are squashed and tests are green.

- Minor clean up of latex_symbols generator script
- Update list of latex symbols
@jiahao jiahao force-pushed the cjh/relational-algebra branch from ec96896 to 93668d1 Compare October 21, 2015 21:11
jiahao added a commit that referenced this pull request Oct 22, 2015
Add infix operators for relational algebra
@jiahao jiahao merged commit e372b06 into master Oct 22, 2015
@jiahao jiahao deleted the cjh/relational-algebra branch October 22, 2015 02:57
@AzamatB
Contributor

AzamatB commented Jan 15, 2016

I'm a bit confused: does this mean that these symbols are now defined as relational algebra operators and can be used to perform joins in Julia (since relational algebra was brought up), or are we only talking about adding these symbols to the set of valid Julia characters?

@tkelman
Contributor

tkelman commented Jan 15, 2016

These are recognized by the parser as valid infix operators, but no default implementation of them is given in base. Packages and user code are free to do so. (Though some coordination is called for if packages are defining methods for them on Base types.)
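
As a minimal sketch of what such user code could look like, the example below defines ⨝ as a natural join on a toy type. It assumes Julia 1.x. The `Relation` type and its `rows` field are hypothetical and exist only for this illustration; wrapping the data in its own type avoids defining the operator on Base types.

```julia
# Hypothetical toy type; wrapping the rows avoids defining ⨝ on Base types.
struct Relation
    rows::Vector{NamedTuple}
end

# Natural join: combine rows that agree on every shared column name.
function ⨝(a::Relation, b::Relation)
    out = NamedTuple[]
    for x in a.rows, y in b.rows
        shared = [k for k in keys(x) if haskey(y, k)]
        if all(k -> x[k] == y[k], shared)
            push!(out, merge(x, y))
        end
    end
    return Relation(out)
end

employees = Relation([(id = 1, dept = "eng"), (id = 2, dept = "ops")])
depts     = Relation([(dept = "eng", floor = 3)])

joined = employees ⨝ depts
println(joined.rows)
```

The outer joins (⟕ ⟖ ⟗) and the antijoin (▷) could be given methods in the same way; none of them has a definition until a package or user supplies one.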

Labels
unicode Related to unicode characters and encodings