add coauthor annotation #175

Merged: 8 commits merged into master on Jan 12, 2021


ziruizhuang
Contributor

@gkthiruvathukal, I have added co-author annotation using your suggested flat format. Since author-name parsing from BibTeX is handled upstream by jekyll-scholar, only the first and last names are processed here.

The code is not very elegant, but since it only runs when the static web pages are generated, it might be okay for now.

And @alshedivat, do you have any suggestions?

We can have more discussion on the implementation here.

Related issue #100

@gkthiruvathukal

@ziruizhuang Awesome! I'm going to take a closer look and make some comments shortly.

@gkthiruvathukal

@ziruizhuang I think we're on the right track here! I looked at your code and think it is fairly clean/elegant. It is always tricky business to deal with names, and the perfect information model for it might not exist (without a tree-structured design, which also adds complexity).

Names are always a challenge, and if I am understanding correctly, you'd be able to handle a case like Carl Philipp Emanuel Bach or Carl P. E. Bach by having various first name combinations with middle name components (including any optional ones):

For example:

- lastname: [Bach]
  firstname: [C. P. E., Carl Philipp Emanuel] # I'm assuming other combinations / optional middle names could be elaborated as needed, but most of the time it is names spelled out or initials!
  url: https://en.wikipedia.org/wiki/Carl_Philipp_Emanuel_Bach
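As an illustration only (the PR itself does this in Liquid inside bib.html), here is a minimal Python sketch of matching against the flat format above. It assumes each coauthors.yml entry has been loaded into a dict with lastname, firstname, and url keys; the J. S. Bach entry is added purely as a second example.

```python
from typing import Optional

# Assumed in-memory form of the flat coauthors.yml format sketched above:
# one record per co-author, with arrays of accepted last and first names.
COAUTHORS = [
    {
        "lastname": ["Bach"],
        "firstname": ["C. P. E.", "Carl Philipp Emanuel"],
        "url": "https://en.wikipedia.org/wiki/Carl_Philipp_Emanuel_Bach",
    },
    {
        "lastname": ["Bach"],
        "firstname": ["J. S.", "Johann Sebastian"],
        "url": "https://en.wikipedia.org/wiki/Johann_Sebastian_Bach",
    },
]


def find_coauthor_url(first: str, last: str) -> Optional[str]:
    """Linear scan: return the URL of the first record whose name arrays
    contain both the given last name and first name, else None."""
    for record in COAUTHORS:
        if last in record["lastname"] and first in record["firstname"]:
            return record["url"]
    return None


print(find_coauthor_url("C. P. E.", "Bach"))
print(find_coauthor_url("Johann Sebastian", "Bach"))
print(find_coauthor_url("Carl", "Bach"))  # None: "Carl" alone is not listed
```

Every spelling that should match has to be listed explicitly, which is the trade-off of the flat format: simple to read, but each lookup scans the whole list.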

I'll do some testing and let you know, but it looks like this would be good to go as a big step forward!

@ziruizhuang
Contributor Author

@gkthiruvathukal Thank you for your feedback, and yes, the current implementation can handle such combinations by adding the middle names or initials to the firstname array.

As for the code, there are two major issues that I haven't found a good way to mitigate.

  1. Currently, the information in coauthors.yml is processed and converted to arrays in the bib.html template. My understanding is that this template is called for every bib entry, so this is not very efficient. A better approach would be to process the information once in an upper-level template and pass the processed arrays as parameters into bib.html. However, I haven't been able to identify which template that is. For the time being, since this only runs while the static web pages are generated, it might be okay.
  2. The format of the co-author identification key is hard-coded and duplicated in two parts of bib.html: one that processes coauthors.yml, and one that processes each entry's author information from the BibTeX. Ideally, we would have a single function that generates the co-author identification key and call it wherever it is needed, but I am not familiar with Ruby or Jekyll, and I haven't learned a way to define functions in templates.
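Both issues could be addressed along the following lines, sketched in Python rather than Liquid and under stated assumptions: the key format last|first and the helper names coauthor_key, build_index, and annotate are hypothetical, and each author is assumed to expose first and last fields (as author.first does in the template). The index is built once per site build instead of once per bib entry, and one function defines the key format for both sides.

```python
from typing import Dict, Iterable, List


def coauthor_key(first: str, last: str) -> str:
    """Hypothetical key format, defined in exactly one place."""
    return f"{last.strip()}|{first.strip()}"


def build_index(coauthors: Iterable[dict]) -> Dict[str, str]:
    """Run once (not per bib entry): expand every first/last name
    combination of every co-author into a key -> url mapping."""
    index: Dict[str, str] = {}
    for record in coauthors:
        for last in record["lastname"]:
            for first in record["firstname"]:
                index[coauthor_key(first, last)] = record["url"]
    return index


def annotate(authors: List[dict], index: Dict[str, str]) -> List[str]:
    """Per bib entry: reuse the same key function, then do dict lookups."""
    out = []
    for a in authors:
        url = index.get(coauthor_key(a["first"], a["last"]))
        name = f'{a["first"]} {a["last"]}'
        out.append(f'<a href="{url}">{name}</a>' if url else name)
    return out


# Usage with a placeholder URL (example.org is not a real co-author page).
coauthors = [{"lastname": ["Bach"],
              "firstname": ["C. P. E.", "Carl Philipp Emanuel"],
              "url": "https://example.org/cpe-bach"}]
index = build_index(coauthors)
result = annotate([{"first": "C. P. E.", "last": "Bach"},
                   {"first": "Jane", "last": "Doe"}], index)
print(result)
# ['<a href="https://example.org/cpe-bach">C. P. E. Bach</a>', 'Jane Doe']
```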

@gkthiruvathukal

@ziruizhuang I did a bit of testing on a fairly lengthy list of co-authors. If you want, you can take a look at my site at https://github.com/gkthiruvathukal/gkthiruvathukal.github.io, which deploys to https://thiruvathukal.com. I do agree it is a bit less than optimal to recompute the same information again and again (yes, I'm pretty sure this template is called for each bib item). But in my testing, the build time did not go up dramatically. The entire build for my site takes about 5 seconds on my MacBook Air.

There is one minor thing. Before this change, my own name was underlined whenever it appeared in a publication's author list, but now I am not seeing the underline. I thought I might need to add myself to coauthors.yml, but that produces a full hyperlink instead (which is how I have left things for now). I didn't see anything in your code that would cause this, but I will look at the diffs more carefully to see whether the code that underlines the site author's name somehow got deleted or disabled.

In any event, I think going to the flat format is the right way to go, and it's pretty amazing you were able to build this based on the way I described it earlier. I hope this pull request can be approved by @alshedivat, since it will help many users. Even though I didn't write the code, it was nice helping to figure out a good information model for it. Thanks for doing the work, because I was super busy getting ready for the upcoming semester.

@gkthiruvathukal

@ziruizhuang Update: I was missing scholar.first_name. I only had site.first_name. So disregard the "minor thing" I reported above.

I do wonder, however, whether this line of code has potential to cause trouble:

{% if site.scholar.first_name contains author.first %}

Is contains accurate here? I would think that there should be a match from the beginning of the string.

@ziruizhuang
Contributor Author

@gkthiruvathukal Thank you for your effort in testing the code.

As for contains, I'm not very familiar with Ruby/Jekyll/Liquid, so I am not quite sure. I did find a reference, though, in "Operators – Liquid template language": "contains can also check for the presence of a string in an array of strings." So I suppose it should work, though it is somewhat counter-intuitive. We will need someone more familiar with these languages to confirm it.
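If the Liquid documentation quoted above is accurate (contains tests for a substring when the left side is a string, and for an exact element when it is an array of strings), Python's in operator is a close analogy, and it shows why the type of scholar.first_name matters. This is an illustration only, not Liquid code.

```python
site_first_string = "George K."             # scalar, as in an existing config
site_first_array = ["George", "George K."]  # array, as this PR expects

# String on the left: substring test, so partial names also match.
print("George" in site_first_string)  # True
print("K." in site_first_string)      # True: a possible false positive
print("Geo" in site_first_string)     # True

# Array on the left: element test, so only exact entries match.
print("George" in site_first_array)   # True
print("Geo" in site_first_array)      # False
```

With a string, the match is neither anchored at the start nor at a word boundary, which is the concern raised about contains; with an array, each listed spelling must match exactly.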

@ziruizhuang
Contributor Author

@gkthiruvathukal I see that the code in your repo uses a string for scholar.first_name, while the code in this pull request expects an array of strings. We may need to add documentation or a tutorial to clarify this for users.

@gkthiruvathukal

gkthiruvathukal commented Jan 7, 2021

@ziruizhuang I follow. That makes complete sense, actually. So scholar.first_name can have all the first names that match. In my case, it could be [George, George K.]?

This is what I have right now:

scholar:
  last_name: Thiruvathukal
  first_name: George K.

So first_name should be an array, even if just an array of one?

scholar:

  # Note: This is how you get "cite" to point to a page within the site.
  relative: /publications/index.html

  last_name: Thiruvathukal
  first_name: [George, George K.]

This would actually be good. I think I have at least one publication where my middle name is not listed....
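A small Python sketch of the site-author underline check with first_name as an array. It assumes the template compares the last name exactly and then looks the first name up in the array; is_site_author is a hypothetical name, and the values mirror the config shown above.

```python
from typing import List

# Assumed mirror of the scholar settings in _config.yml shown above.
SCHOLAR_LAST_NAME = "Thiruvathukal"
SCHOLAR_FIRST_NAMES: List[str] = ["George", "George K."]


def is_site_author(first: str, last: str) -> bool:
    """True when a publication author should be underlined as the site
    owner: exact last-name match plus an exact entry in the first-name array."""
    return last == SCHOLAR_LAST_NAME and first in SCHOLAR_FIRST_NAMES


print(is_site_author("George K.", "Thiruvathukal"))  # True
print(is_site_author("George", "Thiruvathukal"))     # True: no middle initial
print(is_site_author("K.", "Thiruvathukal"))         # False
```

Listing both spellings covers publications where the middle initial is missing, without relying on substring matching.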

Happy to help update the docs a bit, once we get this PR finalized.

@alshedivat
Owner

@ziruizhuang, @gkthiruvathukal, this is awesome! Thanks for drafting and discussing the best way to go about co-author annotation! I looked through the code changes and your discussion and have a couple of thoughts:

  1. Keep in mind that co-authors with identical last names are rare. Most last names (likely 99%+) will correspond to 1 co-author. I assumed that in my original implementation. I would recommend changing the matching logic to the following (essentially, "lazy matching"):
    a. first match the last name of the publication author and an entry in the co-authors,
    b. if there is only 1 co-author with such last name, use them for annotation, otherwise, try matching the first name(s).

  2. Perhaps a better data structure would be a semi-hierarchical one, as given below, which would allow a quick last-name lookup and require linear-time matching only when several co-authors share the same last name.

<lastname1>:
  - firstname: [<firstname_string>, ...]
    url: <url_string>
  - firstname: [<firstname_string>, ...]
    url: <url_string>
<lastname2>:
  - firstname: [<firstname_string>, ...]
    url: <url_string>
  - firstname: [<firstname_string>, ...]
    url: <url_string>

The above solution would also avoid preprocessing coauthors.yml at the beginning of bib.html.
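A minimal Python sketch of the lazy-matching proposal over the semi-hierarchical structure. It illustrates points 1 and 2 above, not the code that was eventually merged; the O'Connell entry (first name Jane) and the example.org URLs are placeholders.

```python
from typing import Dict, List, Optional

# Assumed in-memory form of the structure above:
# last name -> list of {"firstname": [...], "url": ...}.
Coauthors = Dict[str, List[dict]]

COAUTHORS: Coauthors = {
    "Bach": [
        {"firstname": ["C. P. E.", "Carl Philipp Emanuel"], "url": "https://example.org/cpe-bach"},
        {"firstname": ["J. S.", "Johann Sebastian"], "url": "https://example.org/js-bach"},
    ],
    "O'Connell": [
        {"firstname": ["Jane"], "url": "https://example.org/oconnell"},
    ],
}


def lazy_match(first: str, last: str, coauthors: Coauthors) -> Optional[str]:
    """(a) Look up the last name in O(1); (b) if exactly one co-author has
    it, use that entry without checking first names; otherwise scan the
    (usually tiny) group for a matching first name."""
    group = coauthors.get(last)
    if not group:
        return None
    if len(group) == 1:
        return group[0]["url"]
    for entry in group:
        if first in entry["firstname"]:
            return entry["url"]
    return None


print(lazy_match("J. O.", "O'Connell", COAUTHORS))  # single entry: any first name
print(lazy_match("J. S.", "Bach", COAUTHORS))       # ambiguous: first name decides
print(lazy_match("W. F.", "Bach", COAUTHORS))       # None: not listed
```

In the single-entry case any first name is accepted, so an unlisted namesake would be linked to the known co-author; in fields with long author lists, full-name matching may be the safer default.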

@gkthiruvathukal

@alshedivat I like this proposed structure. In fact, it kind of brings my two original thoughts together all in one! :)

Just so you know, I agree it's rare, but in some fields it's becoming less rare. This is especially true when there's a large author list (a phenomenon known as hyperauthorship!). See https://www.nature.com/news/physics-paper-sets-record-with-more-than-5-000-authors-1.17567.

This doesn't apply to me, but I have a couple of papers where there are many authors. I also have a collaborator with last name Lu and, sure enough, we had someone work with us on a paper who also had Lu as a last name. :)

There is kind of a "pain point", though, when it comes to certain names as keys, and I think we need to document it. One of my collaborators has the last name "O'Connell". To make this work, I had to put quotes around it, if I recall correctly. With the new syntax where last_name is a key/value pair, it's easier to manage!!

@alshedivat
Owner

@gkthiruvathukal, got it, makes sense. I'm definitely in favor of supporting proper co-author identification based on full names.

Re: having to put quotation marks around last names with special characters -- how about we put all last names in quotation marks as a safe default that should work for all last names?

@gkthiruvathukal

@alshedivat Yes, I think that would be a good idea. This helps in many situations. I have co-authors with hyphenated last names, special characters (e.g. Läufer), and the one I already mentioned (O'Connell).

Looks like we are converging on something awesome!!

I also like your amendment because it will mostly preserve the "simplicity" of the original syntax. If there is no ambiguity, you basically use the last-name key and a hash containing just the URL.

Coauthors are grouped by their last names. Within each group, using flat format (array of {firstnames, url}).
@ziruizhuang
Contributor Author

@gkthiruvathukal @alshedivat Wonderful. I updated the implementation based on your discussion. Please have a look.

@gkthiruvathukal

@ziruizhuang I was about to ping you to see if you wanted me to work on it! I was finally able to spend some time reading the Liquid docs and feel more confident with it now. Thanks for working on this update! I'm going to test it now but might not report back until morning my time (I'm based in Chicago, USA).

@gkthiruvathukal

@ziruizhuang, everything is working nicely! You can see the build status here: https://travis-ci.com/github/gkthiruvathukal/gkthiruvathukal.github.io. I manually copied in bib.html for now. I will fully test the common-last-name case tomorrow, but at least my current bibliography is working with our new desired format!

I think we should declare this PR good to go, especially since you've already tested using the "Bach" example I provided earlier, with J.S. included for good measure! 👍🏾.

@ziruizhuang
Contributor Author

@gkthiruvathukal, if everything goes well, I think it's time to check whether the docs/tutorial are easy to follow. (P.S. I traveled to Chicago a few years back, and I do love the city :) )
@alshedivat, I will mark this PR as ready for review now.

And thank you all for the discussion, ideas, code reviewing, and testing.

@ziruizhuang marked this pull request as ready for review on January 8, 2021, 12:45
@gkthiruvathukal

I think this PR is a great example of how iterating toward a good solution benefits from addressing requirements, coding, documentation, and testing! I'm actually glad I wasn't coding so I could independently test the code.

By the way, @ziruizhuang, you'll be pleased to know that the last name support also works correctly on my end with two different authors.

@alshedivat I think this PR is good to go, pending your final approval.

@alshedivat self-requested a review on January 9, 2021, 04:26
Owner

@alshedivat left a comment


Looks great! Thanks again for this contribution.

I suggested a few stylistic changes. Please double check that everything works again, and we'll be ready to merge.

Suggested changes on _layouts/bib.html (4, all resolved)
@gkthiruvathukal

gkthiruvathukal commented Jan 9, 2021

@alshedivat For the changes you suggested, are those in your repo or @ziruizhuang's fork? It looks like they are approved already but I would like to do some final testing on my site.

ziruizhuang and others added 4 commits January 10, 2021 09:05
stylistic changes

Co-authored-by: Maruan <[email protected]>
stylistic changes

Co-authored-by: Maruan <[email protected]>
stylistic changes

Co-authored-by: Maruan <[email protected]>
stylistic changes

Co-authored-by: Maruan <[email protected]>
@ziruizhuang
Contributor Author

@gkthiruvathukal I committed the suggestions from @alshedivat just now. The code should be available at https://github.com/ziruizhuang/al-folio/tree/dev-author-annotation

@gkthiruvathukal

@ziruizhuang Thanks! I will incorporate these changes on my site and follow up.

@alshedivat
Owner

@gkthiruvathukal, did you get a chance to test the final changes? If all looks good, I'll go ahead and merge.

@gkthiruvathukal

@alshedivat Yes, I have incorporated the latest version of _layouts/bib.html on my site, and it is working perfectly. I've also done testing on the common last name feature (what we were going after in the first place) and it's working nicely. I think you can safely merge this PR.

Nice working with you and @ziruizhuang on this PR. I'll try to help out with future changes. (I actually used a bit of Liquid in my regular page content, so I know how to use it now.)

@alshedivat merged commit 6b28f90 into alshedivat:master on Jan 12, 2021
github-actions bot added a commit that referenced this pull request Jan 12, 2021
* add coauthor annotation

* fix typo in coauthors.yml

* add brief author annotation tutorial in README.md

* change to combined data structure

Coauthors are grouped by their last names. Within each group, using flat format (array of {firstnames, url}).

* Update _layouts/bib.html

stylistic changes

Co-authored-by: Maruan <[email protected]>

* Update _layouts/bib.html

stylistic changes

Co-authored-by: Maruan <[email protected]>

* Update _layouts/bib.html

stylistic changes

Co-authored-by: Maruan <[email protected]>

* Update _layouts/bib.html

stylistic changes

Co-authored-by: Maruan <[email protected]>

Co-authored-by: Maruan <[email protected]> [ci skip]
song-qun pushed a commit to song-qun/song-qun.github.io that referenced this pull request Mar 19, 2021
MichaelHilton pushed a commit to MichaelHilton/MichaelHilton.github.io that referenced this pull request Jul 22, 2021
antchristou pushed a commit to antchristou/antchristou.github.io that referenced this pull request Nov 20, 2023
siantonelli pushed a commit to siantonelli/siantonelli.github.io that referenced this pull request Oct 26, 2024
3 participants