Using polyglot notebooks (dotnet interactive) as input #806
Comments
Hello, Thank you for bringing up this compelling feature request. While I am interested in supporting this idea, I would like to gather more information to better understand its implementation. As I do not have prior experience with notebooks, I would appreciate it if you could provide more details about the technical aspects of the proposed feature. For instance, would this feature require a new file format or would it be embedded within an existing one? I would also appreciate it if you could describe the input and output process for an end user. This would enable me to gain a better understanding of the technical side of the feature and provide better feedback. Thank you, and I look forward to hearing more about this feature request.
Sure, i'll try my best.

What are notebooks?

Notebooks are interactive documents popularized by the python package jupyter. The file extension is .ipynb. Notebooks are rendered as an interactive document that can contain code cells and markdown. Code cells can be executed, and the last object in a code cell is usually displayed as formatted output below the code cell. Markdown cells can be used to add formatted text annotations to contextualize code.

You can maybe already see that there are many parallels between notebooks and literate scripts with embedded output via fsdocs. The only real difference is that notebooks are interactive, meaning you have a "play" button next to each code cell, while fsdocs is usually a tool that you run once and then host the output somewhere.

An example

For an example of what such a document looks like, take a look at jupyter's official example here or an F# notebook generated by fsdocs for plotly.net (a data visualization library that i maintain) docs here (note that you have to execute the cells in that notebook to get interactive output).

The file format
{
  "cells": [
    {
      "cell_type": "code",
      "execution_count": 1,
      "metadata": {
        "dotnet_interactive": {
          "language": "fsharp"
        },
        "polyglot_notebook": {
          "kernelName": "fsharp"
        }
      },
      "outputs": [
        {
          "data": {
            "text/html": [
              "<div class=\"dni-plaintext\"><pre>42</pre></div><style>\r\n",
              ".dni-code-hint {\r\n",
              " font-style: italic;\r\n",
              " overflow: hidden;\r\n",
              " white-space: nowrap;\r\n",
              "}\r\n",
              ".dni-treeview {\r\n",
              " white-space: nowrap;\r\n",
              "}\r\n",
              ".dni-treeview td {\r\n",
              " vertical-align: top;\r\n",
              " text-align: start;\r\n",
              "}\r\n",
              "details.dni-treeview {\r\n",
              " padding-left: 1em;\r\n",
              "}\r\n",
              "table td {\r\n",
              " text-align: start;\r\n",
              "}\r\n",
              "table tr { \r\n",
              " vertical-align: top; \r\n",
              " margin: 0em 0px;\r\n",
              "}\r\n",
              "table tr td pre \r\n",
              "{ \r\n",
              " vertical-align: top !important; \r\n",
              " margin: 0em 0px !important;\r\n",
              "} \r\n",
              "table th {\r\n",
              " text-align: start;\r\n",
              "}\r\n",
              "</style>"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        }
      ],
      "source": [
        "let a = 42\n",
        "\n",
        "a"
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": ".NET (F#)",
      "language": "F#",
      "name": ".net-fsharp"
    },
    "polyglot_notebook": {
      "kernelInfo": {
        "defaultKernelName": "fsharp",
        "items": [
          {
            "aliases": [
              "f#",
              "F#"
            ],
            "languageName": "F#",
            "name": "fsharp"
          },
          {
            "aliases": [
              "frontend"
            ],
            "languageName": null,
            "name": "vscode"
          }
        ]
      }
    }
  },
  "nbformat": 4,
  "nbformat_minor": 2
}
This notebook is rendered by vscode like this:

Notebooks in .NET

Jupyter can use any kind of kernel, which is basically a program that tells jupyter how to run code of a specific language. The official .NET kernel that can run F#, C#, and many more is contained in dotnet interactive.

dotnet interactive allows library authors to distribute custom renderers, which can take a .NET object and transform it into output in the notebook. We have this for example for plotly.net:

Answering your question
Since i am not sure how fsdocs works internally, i cannot answer this for certain. My personal use case would be converting

For further demonstration, here is a simple page that i work on for fun, where i convert F# notebooks to html via

It would be awesome if i were able to do that using only .NET tools instead of relying on a python tool for that conversion.

There are many similarities between this literate file using fsdocs fsi evaluation:

let a = 42
(***include-value: a***)

and this cell and output combination in a notebook:

So my naive approach would be just parsing code cell content and output and emitting that into html?
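To make the naive approach concrete, here is a minimal F# sketch. It is not how fsdocs works or would have to work. It reads the cells array with System.Text.Json, wraps each code cell's source in a pre/code element, and passes any text/html output through unmodified. The names joinLines, tryProp, cellToHtml and notebookToHtml are made up for illustration. Markdown cells are only escaped and wrapped in a div here, as a placeholder for a real markdown renderer.

```fsharp
open System.Net
open System.Text.Json

/// nbformat stores text either as one string or as an array of line strings.
let joinLines (el: JsonElement) =
    match el.ValueKind with
    | JsonValueKind.String -> el.GetString()
    | JsonValueKind.Array ->
        el.EnumerateArray()
        |> Seq.map (fun line -> line.GetString())
        |> String.concat ""
    | _ -> ""

/// Some value if `el` is an object that has the property, otherwise None.
let tryProp (name: string) (el: JsonElement) =
    if el.ValueKind <> JsonValueKind.Object then
        None
    else
        match el.TryGetProperty(name) with
        | true, value -> Some value
        | _ -> None

/// Hypothetical helper: renders one cell as an html fragment.
let cellToHtml (cell: JsonElement) =
    let source =
        tryProp "source" cell |> Option.map joinLines |> Option.defaultValue ""
    let cellType =
        tryProp "cell_type" cell
        |> Option.map (fun t -> t.GetString())
        |> Option.defaultValue ""
    match cellType with
    | "code" ->
        // Only text/html outputs are kept; they are embedded as-is.
        let outputs =
            match tryProp "outputs" cell with
            | Some outs ->
                outs.EnumerateArray()
                |> Seq.choose (fun o ->
                    tryProp "data" o
                    |> Option.bind (tryProp "text/html")
                    |> Option.map joinLines)
                |> String.concat "\n"
            | None -> ""
        sprintf "<pre><code>%s</code></pre>\n%s" (WebUtility.HtmlEncode source) outputs
    | _ ->
        // Placeholder: a real converter would render the markdown here.
        sprintf "<div class=\"markdown\">%s</div>" (WebUtility.HtmlEncode source)

/// Hypothetical entry point: notebook json in, html fragment out.
let notebookToHtml (json: string) =
    use doc = JsonDocument.Parse(json)
    match tryProp "cells" doc.RootElement with
    | Some cells ->
        cells.EnumerateArray() |> Seq.map cellToHtml |> String.concat "\n"
    | None -> ""
```

Calling notebookToHtml on the text of an .ipynb file returns one html fragment per cell, joined by newlines. Syntax highlighting and page layout are left out on purpose.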
Thank you very much for this elaborate response. Much appreciated! However, I'm a bit lost about the execution part of things. Otherwise, I believe we can add support for a new input format. We would need to take a good look at what needs to be done to have parity with scripts for example. We'll definitely need to refine things. And last but not least, I'd like to know if my fellow maintainers would be on board with this as well.
Hi @kMutagene, this is a feature that dsyme requested to be added by the F# data science community back when he rebooted fslab and gave your team admin control (2020?). It's in that old "discussion" thread in the fslab repo where Don is replying to you fslaborg/FsLab#3 (comment), also fslaborg/FsLab#3 (reply in thread). So, Don approves also, and I would really like it, and it's been on a mental "todo" of mine for this library. Getting closer to nbdev functionality would be nice. At a high level, the notebook is json with code and markdown cells denoted by metadata, so it's even easier than parsing an .fsx or .md file. We get the markdown and code paragraphs for free. Here is ParseScript.fs, we'd want a ParseIpynb. I had a recent discussion related to this on the dotnet/interactive repo, they have a parsing library that they suggested we could use. The issue would be whether we want to support full polyglot notebooks with potential .js, powershell, C#, F# code ... and magic variables, etc. I kinda feel like that's out of scope and we should parse it ourselves and restrict it for now to parsing pure F# notebooks that only have the same features as .fsx files (i.e., no fancy polyglot notebook features).
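One way to restrict input to pure F# notebooks could be a check like the F# sketch below. It assumes that the per-cell metadata field polyglot_notebook.kernelName, as seen in the example notebook earlier in this thread, is what identifies a cell's language. That is an assumption, not something confirmed by the dotnet interactive docs. The names tryProp, cellKernel and isPureFSharp are hypothetical.

```fsharp
open System.Text.Json

/// Some value if `el` is an object that has the property, otherwise None.
let tryProp (name: string) (el: JsonElement) =
    if el.ValueKind <> JsonValueKind.Object then
        None
    else
        match el.TryGetProperty(name) with
        | true, value -> Some value
        | _ -> None

/// Kernel name recorded in a cell's metadata, if any (assumed field layout).
let cellKernel (cell: JsonElement) =
    tryProp "metadata" cell
    |> Option.bind (tryProp "polyglot_notebook")
    |> Option.bind (tryProp "kernelName")
    |> Option.map (fun k -> k.GetString())

/// True when every code cell targets "fsharp" or records no kernel at all.
let isPureFSharp (json: string) =
    use doc = JsonDocument.Parse(json)
    match tryProp "cells" doc.RootElement with
    | None -> true
    | Some cells ->
        cells.EnumerateArray()
        |> Seq.filter (fun c ->
            match tryProp "cell_type" c with
            | Some t -> t.GetString() = "code"
            | None -> false)
        |> Seq.forall (fun c ->
            match cellKernel c with
            | Some "fsharp"
            | None -> true
            | Some _ -> false)
```

A parser could call isPureFSharp first and reject, or skip the non-F# cells of, anything that returns false. Magic commands inside an F# cell would not be caught by this check.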
@nojaf I think the idea is right now by adding

That notebook is generated by
The interactive part is only relevant while working with the notebook. Once saved, the notebook keeps the execution results saved in the
exactly, however direct conversion to html skipping the translation into the literate model would be enough for me personally if that's easier to implement
Not sure if any magic in the notebook matters at all - ultimately the notebook can be executed and saved by the respective tools that support the magic stuff. If the notebook environment takes care of all the execution, the only thing fsdocs would need to do is parse code blocks and output and embed them 'as-is' - but maybe that would go more in the direction of a standalone tool that does not need to care about integrating into an existing codebase. However, if execution is needed there might also be the option of using https://github.com/jonsequitur/dotnet-repl#-run-a-notebook-script-or-code-file-and-then-exit for that.
So i had a first go at a library parsing

I would love it to be a very granular nuget package, but I am not sure about the namespace yet. However, please let me know if that modelling of
Hi @kMutagene, nice work. Is your objective with the library only to help with FSharp.Formatting, or are you working on your parser for other purposes too? Have you seen https://www.nuget.org/packages/Microsoft.DotNet.Interactive.Documents? See description here dotnet/interactive#2685 (comment). If we rely on an external nuget library for ‘ParseIpynb.fs’, it might make sense to use the one mentioned above that ‘dotnet/interactive’ already provides because it would presumably incorporate the special dotnet interactive features. But it might be better to not take an external dependency (previously dsyme did not want to take an external dependency for an html dsl).
I have not seen that, might have just used that instead of writing things from scratch, but here we are 💀
That's the reason why I was initially not sure about the namespace - I would have no problem with just committing it into this library (meaning it would become

I think we can expect the main way of writing F# notebooks being polyglot notebooks, and therefore it might make sense to just use their parser/writer library. I think i'll stick to the

Another reason why I'd like this kind of nuget package is that I would like to create a .NET tool equivalent of
I think I should however add for the sake of completeness that I have not found a simple way of just parsing a notebook file using the API surface of

It seems like the way to use that library is creating a parsing server which reacts to Parse requests - something that totally makes sense for the dotnet interactive tool, but might not be what we need here. At least for my purposes this seems kind of convoluted when i could just have a simple 'ipynb in, document model out' function.
It's a good use case. The original version of this library that tomasp made was more in this spirit, writing .fsx and then the tools converted to html, latex, pdf. The code for this is still here, but FYI, not sure if you've tried pandoc (I believe it's what nbconvert uses under the hood), e.g.,
hey @nojaf @nhirschey I just wanted to try to summarize what we talked about at the Data Science In F# on this topic, if i forgot/misremembered something feel free to add to this.
In general, i am working on a .NET port of a subset of

Any thoughts/additions on this?
Hi @kMutagene, this is more or less how I remember it. I think the
If that is the case, i think we are already pretty far along in solving this. The basic parsing and converting is implemented at NBFormat.NET. However that lib currently uses prism.js for syntax highlighting. I think FSharp.Formatting has custom syntax highlighting, is there a way to incorporate this or apply it to a string post-conversion?
On another note, NBFormat.NET also uses Markdig for markdown conversion. I think FSharp.Formatting also has a markdown parser/converter. So it might be more reasonable to actually include the notebook conversion in the codebase here, to prevent those unnecessary duplications of markdown and syntax highlighting pipelines.
Hi @kMutagene,
Yeah, the simplest thing (and, I believe, the way nbconvert does it) is to transform the ipynb to markdown, then FSharp.Formatting ingests the notebook as a markdown file. I'm part way to testing this, let me check a few things and get back to you soon.
what happens to the cell output when doing it this way?
This is quick and dirty, but taking the fslab blog post cytoscape example, ipynb -> md and then using the md as input is shown below. I’m passing through the ipynb html outputs “as is” (i.e., markdown convention is pass html through unmodified). There's more work to do it properly, but it's at least a proof of concept?
Yeah that looks good actually. If i understand correctly, you do the conversion to md via pandoc though? So it looks to me like we can just parse the notebook model via NBFormat.NET, and create a markdown output with it. That would just mean leaving markdown cells as-is, putting code cells into markdown code tags with the correct language, and leaving cell output as-is. Does that input have to be an actual markdown file, or could it be done in-memory? The advantage of this would be no non-.NET dependencies.
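The cell-to-markdown mapping described here could look roughly like the F# sketch below. It works on a small, made-up Cell type instead of the NBFormat.NET or dotnet interactive document model, so everything named here (Cell, cellToMarkdown, notebookToMarkdown) is hypothetical. Markdown cells are kept as-is, code cells go into a fenced block tagged with the language, and html outputs follow unmodified.

```fsharp
/// Simplified stand-in for a parsed notebook cell (not a real library type).
type Cell =
    | Markdown of source: string
    | Code of language: string * source: string * htmlOutputs: string list

/// Three backticks, built at runtime so this snippet can itself sit in a fence.
let fence = String.replicate 3 "`"

/// Converts one cell to markdown text.
let cellToMarkdown (cell: Cell) =
    match cell with
    | Markdown source -> source
    | Code (language, source, htmlOutputs) ->
        // Fenced code block first, then each html output as its own block.
        let codeBlock = sprintf "%s%s\n%s\n%s" fence language source fence
        String.concat "\n\n" (codeBlock :: htmlOutputs)

/// Converts all cells and separates them with a blank line.
let notebookToMarkdown (cells: Cell list) =
    cells |> List.map cellToMarkdown |> String.concat "\n\n"
```

The result is an in-memory string. If the downstream tooling only accepts files, it could be written out with System.IO.File.WriteAllText first.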
Great!
FSharp.Formatting expects everything to be files. Even Literate.ParseMarkdownString writes the string to a file before proceeding. So to minimize "surgery" on core functions, it's easy to write to a temp file and send the temp file down the path.
I didn't use pandoc. I used ~130 lines of F# code to parse and convert to

You could also use

For reference, to parse using dotnet interactive see code below. I'm working on mapping this version to my above-linked gist converter so we can compare.

#r "nuget: Microsoft.DotNet.Interactive.Documents, *-*"
open Microsoft.DotNet.Interactive.Documents
let nb = Jupyter.Notebook.Parse(System.IO.File.ReadAllText("post.ipynb"))
the last time i looked into this, it seemed to me like it is expected to create a daemon-like service that can be sent notebooks to parse, which was the reason i decided to write my own lib in the end. Looking at your code sample, it seems like i just did not find the correct API, so i guess using that one is ideal because it will keep up with changes to the polyglot notebook format as you said. They are doing some slightly unexpected stuff (e.g. how languages are named in the code cells), so having their original model seems like the way to go here. So to summarize, it looks to me like the pipeline would be
looks like almost everything is there, once the mapping of the
Emitting notebooks with this tool is a killer feature. Since there is already compatibility with the internal model and the ipynb format, i suggest another notebook-based feature: using notebooks as input. I would love to help implement this, but would need some pointers on how to navigate the code base.
This has huge advantages over working with literate scripts:
This would be similar to python's nbconvert, and i think such a tool is a critical part missing in the .NET notebook landscape. Maybe it would be better to make a standalone tool for this, but that is up for debate, i focused on this repo since it can already generate notebooks.
Note that you can already use nbconvert to convert polyglot notebooks to html (which is what i currently do), but that means you have to install and maintain a python environment instead of being able to use a dotnet tool.