-
Sounds like a topological sort to me. You start by constructing a DAG of symbols. Symbols with no dependencies can be scheduled by a master thread for resolution in parallel. The master thread then updates the DAG and reduces it to contain only the nodes that are still unresolved. The step is repeated on the updated DAG, and eventually every node will have been resolved and the graph will be empty. The DAG construction process will exclude any symbols that are not needed during the rendering process. I need to read more on this to see how parallel topological sorting works.
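For what it's worth, here is a minimal sketch of that layer-by-layer idea in Rust, with invented `Name`/`resolve` placeholders rather than actual fastn types: the master thread repeatedly peels off the symbols whose dependencies are all already resolved and hands that layer to worker threads.

```rust
use std::collections::HashMap;

// Hypothetical symbol table: each symbol maps to the symbols it depends on.
type Name = String;

fn resolve(_symbol: &Name) {
    // placeholder for the real per-symbol resolution work (type checking etc.)
}

fn resolve_in_layers(mut deps: HashMap<Name, Vec<Name>>) {
    while !deps.is_empty() {
        // A symbol whose dependencies are all already resolved (i.e. no longer in
        // `deps`) belongs to the next layer; layer members are mutually independent.
        let layer: Vec<Name> = deps
            .iter()
            .filter(|(_, ds)| ds.iter().all(|d| !deps.contains_key(d)))
            .map(|(name, _)| name.clone())
            .collect();

        if layer.is_empty() {
            panic!("dependency cycle among the remaining symbols");
        }

        // Resolve the whole layer in parallel.
        std::thread::scope(|s| {
            for name in &layer {
                s.spawn(move || resolve(name));
            }
        });

        // The master thread shrinks the graph to the still-unresolved symbols.
        for name in &layer {
            deps.remove(name);
        }
    }
}
```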
-
Summary of Call Today
Async vs Continuation
We have renamed our current fastn 0.5 to fastn-lang, as it is probably going to be a full compiler. We can call it a compiler as well. We want our compiler to be agnostic of where the files are read from. A lot of compilers are written with direct access to the file system; we want to create an indirection, so files can come from the file system or a DB (and other logic, like access control, can happen outside of the compiler's concern). We have a continuation/state-machine based approach, where we call continue on the compiler in an infinite loop till the compiler says all done; otherwise it keeps asking us for more files etc. We discussed that we could also use an async trait. Since the very beginning, 2020 or so, we have been following this (continuation based) approach, and at that time async was not in as good a shape as it is right now. So we can revisit this using a trait (we will call it the document store trait). A rough sketch of both shapes is at the end of this comment.
fastn-lang vs fastn-resolver vs other crates
I was also thinking about our fastn-core etc. crates in 0.4, and wondering how things would be in future, or how we can simplify things. Came up with the idea of … Other crates wise, we probably want some fastn-terminal for rendering to the terminal, and I jokingly suggested (secretly hoping it happens) that we consider the terminal our primary target, so we can start using the terminal renderer UI as the spec.
File Source
So the data structures we are creating right now extensively use … Or we make this discussion moot and …
SQLite
So we have resolved and unresolved symbols in bags, and we definitely want to keep track of the symbol dependency graph so we can do incremental compilation; it would mean we have to persist these data structures and incrementally update them on every file change. One possibility: we create a bunch of Rust types, and let … We can let the fastn compiler etc. work off of a sqlite db, and create a …
SQLite As Our Package Format
Since after each … We have considered an elaborate partial package, downloading only the specific files that are needed instead of the whole package etc., but I feel this may be better. This removes the cleverness and gives us a more standard version management problem: solve for dependency versioning, manage sqlite files, etc. We would be a lot more like Python/Rust in that sense; we only have to figure out if we are more Python like, where only one version of any package can be added as a dependency (direct or transitive), or Rust/Node like, where multiple versions of the same package can live in the dependency tree. Which is anyway an independent question.
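A rough sketch of the two shapes we discussed, with made-up type and method names (this is not the actual fastn-lang API): a continuation-style loop driven by the caller, and the async document store trait alternative.

```rust
// Invented shapes, not the actual fastn-lang API: the compiler never touches the file
// system itself; the caller keeps calling `continue_` until the compiler is done.
struct CompiledDocument;
struct Compiler;

enum CompilerOutput {
    // the compiler needs the source of these documents before it can proceed
    StuckOnDocuments(Vec<String>),
    Done(CompiledDocument),
}

impl Compiler {
    fn continue_(&mut self, _new_sources: Vec<(String, String)>) -> CompilerOutput {
        // ...ingest the supplied sources, make progress, report what is still missing...
        CompilerOutput::Done(CompiledDocument)
    }
}

// The indirection lives in `read`: it can hit the file system, a DB, or apply access
// control, all outside the compiler's concern.
fn drive(mut compiler: Compiler, read: impl Fn(&str) -> String) -> CompiledDocument {
    let mut sources = Vec::new();
    loop {
        match compiler.continue_(sources) {
            CompilerOutput::Done(doc) => return doc,
            CompilerOutput::StuckOnDocuments(names) => {
                sources = names.iter().map(|n| (n.clone(), read(n.as_str()))).collect();
            }
        }
    }
}

// The async alternative would instead hand the compiler a "document store" trait:
trait DocumentStore {
    async fn read(&self, name: &str) -> Option<String>;
}
```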
-
Yesterday I had a call with Arpita and discussed a few ideas.
Concurrent Compilation
In 0.4 we did not do any concurrent processing, but I feel we have opportunities to do so in 0.5.
This is how our parsing plan looks right now. We first create Sections, as described by our "p1" grammar. We are going to rename it to "fastn section grammar" from now on, as p1 is not a great name. We do very little work during section parsing, and it cannot be made trivially concurrent: we would have to partition the source `&str` or `Vec<char>` into multiple parts, but without parsing we cannot cleanly divide it into sections.
Given this input:
We get an array of the following (we also get errors and spans so we can do powerful syntax highlighting etc., but ignoring all that for simplicity):
The next phase is converting these raw sections into this (ignoring some errors, comments, etc):
We go through each section, and first check if it is an "import", "definition" or "content".
`Definition` is:
Now this pass can be done in parallel. We can go through each section and classify them concurrently. The output of any classification does not depend on any other classification. Each classification action can result in zero or more errors, so our "classification coordinator" can accept the vec from each of them and join them together at the end. This way we can also avoid any locks. This is what we are going for: opportunities for concurrency without locks, as these are typically obviously superior to a non-concurrent implementation. If locks/atomics/syncing gets involved then it's no longer obviously superior, and we should maybe try both implementations to see which performs better; in the interest of getting the first launch out, we will prefer a non-concurrent approach.
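A lock-free sketch of this classification pass, with invented `Section`/`Classified`/`Error` placeholders: each worker returns its own vec of classifications and errors, and the coordinator simply concatenates them, so no shared mutable state is needed.

```rust
// Invented placeholder types; the real ones live in the fastn section parser.
struct Section;
enum Classified { Import, Definition, Content }
struct Error;

fn classify(_s: &Section) -> (Classified, Vec<Error>) {
    // ...decide import / definition / content, reporting any errors found...
    (Classified::Content, Vec::new())
}

fn classify_all(sections: &[Section]) -> (Vec<Classified>, Vec<Error>) {
    let n_workers = 4;
    let chunk = sections.len().div_ceil(n_workers).max(1);

    std::thread::scope(|s| {
        // Each worker owns its output; no shared mutable state, hence no locks.
        let handles: Vec<_> = sections
            .chunks(chunk)
            .map(|part| s.spawn(move || part.iter().map(classify).collect::<Vec<_>>()))
            .collect();

        // The coordinator joins the per-worker vecs together at the end.
        let mut classified = Vec::new();
        let mut errors = Vec::new();
        for handle in handles {
            for (c, mut e) in handle.join().unwrap() {
                classified.push(c);
                errors.append(&mut e);
            }
        }
        (classified, errors)
    })
}
```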
Resolved Vs UnResolved
The output of the last pass is actually un-resolved, meaning, for example, that if the type of some field is "foo.bar", we have not yet resolved whether "foo.bar" exists. In the unresolved state we only do minimal syntax parsing: the Section is not fully parsed, and the unresolved document parser leaves a lot of strings un-analysed.
This brings us to the third (planned) phase of our compilation: we will take the unresolved stuff, try to type-check everything, and create resolved stuff.
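To illustrate the difference, a tiny sketch (names invented, not the real fastn types) of what an unresolved vs resolved field type could look like:

```rust
// Illustrative only: in the unresolved state a field type is just the text the user
// wrote; the resolution phase later checks that "foo.bar" actually names something.
struct ResolvedType; // placeholder for a fully type-checked reference

enum FieldType {
    Unresolved(String),     // e.g. "foo.bar", straight from the source
    Resolved(ResolvedType), // verified against the resolved bag
}
```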
Content vs definition
When the main document is being rendered, the "content" of that document is rendered. Content is component invocations. All the data that is referenced, directly or indirectly, by the content of the main document is also instantiated. Other documents can provide definitions, but their top-level content is ignored.
When the parser starts, it creates two "bags" (bags are maps from names, from now on called symbols, to the corresponding `Definition`): the unresolved bag and the resolved bag. It then loads the main document and does the section-level parsing. Then it creates unresolved symbols. The second step can happen concurrently. The second phase returns a list of (unresolved) content and (unresolved) definitions. For each unresolved definition, we prepend the document name, `<document-name>#<symbol-name>`, and put them in the unresolved bag.
Then we start resolving by going through the unresolved content of the main document. We also keep track of import aliases for each document, and keep a set of "used symbols", which is a set of every symbol we have used so far while processing the resolved content of the main document.
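A sketch of the two bags and the symbol naming, with guessed type shapes (the real definitions carry the parsed sections, spans, and so on):

```rust
use std::collections::{HashMap, HashSet};

// Guessed placeholder shapes; the real definitions hold the parsed sections.
struct UnresolvedDefinition;
struct ResolvedDefinition;

// A symbol is the document name plus the definition name: "<document-name>#<symbol-name>".
type Symbol = String;

struct CompilerState {
    unresolved_bag: HashMap<Symbol, UnresolvedDefinition>,
    resolved_bag: HashMap<Symbol, ResolvedDefinition>, // kernel components are pre-seeded here
    // import aliases, per document: alias used in the source -> full module name
    aliases: HashMap<String, HashMap<String, String>>,
    // every symbol used so far while processing the main document's content
    used_symbols: HashSet<Symbol>,
}

fn symbol(document_name: &str, symbol_name: &str) -> Symbol {
    format!("{document_name}#{symbol_name}")
}
```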
When resolving, we go to each unresolved content; each content is basically a component invocation. We first resolve the component name itself. All the built-in components, the kernel components, are added to resolved-symbols at the beginning of the process. If a non-kernel component is found, we first do the alias analysis (by looking at imports) and find the full name of the component being referred to; this is the component name resolution step. Once we have the full component name, we see if it is already resolved or not. If it is not already resolved, we try to find the component definition in the unresolved bucket.
So we first resolve the component definition, which entails going through the type of every argument of the component and resolving them (if they are not already resolved); we then resolve the variables referred to as default values, and then we go through the body of the component, find all the components used there, and resolve them.
Continuation Based Concurrent Resolution Algorithm
If we want to resolve concurrently, then since the process of resolution requires us to mutate the resolved bag, we can pass a sync version of the resolved bag to all threads. We can do this concurrently as each unresolved content can technically be resolved independently. But if both section 1 and section 2 find, say, an unresolved component `foo`, both may try to resolve it, and waste work.
So we use a continuation based concurrent resolution algorithm. We spawn n threads to do resolution, pass each section to a thread, and let each return either success or a "stuck at". We turn the unresolved content tree into a resolved/unresolved enum based one, i.e. each item of the tree is wrapped inside an enum with two variants; initially the entire tree will be unresolved.
The coordinator will try to resolve the tree by passing each branch to a resolver thread, and all threads will have a read-only reference to all the bags. The response can be either a resolved component, a "stuck on symbol", or a "stuck on document".
The coordinator will update the tree wherever a resolved component was found, and go through every stuck-on document, request its source, and do the first two phases of parsing described above, creating a bunch of new symbols to be added to the unresolved symbols bag.
It will then look at the stuck-on symbols list and try to resolve them concurrently, by passing them the read-only version of all the bags. Each thread will return a resolved symbol, or stuck on another symbol, or stuck on a document. Whatever is resolved will be added to the resolved bucket, and whatever document is needed is parsed and added to the unresolved bucket.
We will then try to resolve the content once again, and this time some of the stuck-on symbols will have been resolved and some more of the documents will have been processed.
This process will happen in a loop till the document is fully resolved.
Note that at each step we will also get errors, and the coordinator will keep collecting all the errors as well.
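Putting the above together, a condensed sketch of the coordinator loop, with invented names throughout; the concurrent pieces are stubbed out, but the shape of the enum responses and of the retry loop is the point.

```rust
// Invented placeholder types and helpers; the real resolver also carries spans and errors.
struct UnresolvedInvocation;
struct ResolvedComponent;

// Each item of the content tree is wrapped in a two-variant enum; initially everything
// is Unresolved.
enum Node {
    Unresolved(UnresolvedInvocation),
    Resolved(ResolvedComponent),
}

// What a resolver thread reports back for one branch.
enum Outcome {
    Resolved(ResolvedComponent),
    StuckOnSymbol(String),   // needs this symbol resolved first
    StuckOnDocument(String), // needs this document's source first
}

// Stubs standing in for the concurrent phases described above.
fn resolve_branches_concurrently(_tree: &[Node]) -> Vec<(usize, Outcome)> { Vec::new() }
fn parse_into_unresolved_bag(_document: &str) {}
fn resolve_symbol_concurrently(_symbol: &str) {}

fn resolve_document(tree: &mut [Node]) {
    loop {
        // Hand every still-unresolved branch to a resolver thread; the threads only get
        // read-only references to the bags, so this step needs no locks.
        let outcomes = resolve_branches_concurrently(tree);

        let mut stuck = false;
        for (index, outcome) in outcomes {
            match outcome {
                // Only the coordinator mutates the tree, writing successes back in.
                Outcome::Resolved(component) => tree[index] = Node::Resolved(component),
                // Request the missing document's source, run the first two phases on it,
                // and add its definitions to the unresolved bag.
                Outcome::StuckOnDocument(doc) => {
                    stuck = true;
                    parse_into_unresolved_bag(&doc);
                }
                // Try to resolve the missing symbol (again concurrently, with read-only
                // bags); it may itself get stuck on further symbols or documents.
                Outcome::StuckOnSymbol(symbol) => {
                    stuck = true;
                    resolve_symbol_concurrently(&symbol);
                }
            }
        }

        // Each pass resolves a bit more of the document; the coordinator also collects
        // the errors reported at every step. Stop once nothing is stuck.
        if !stuck {
            return;
        }
    }
}
```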
Fully Resolved Document
At the end of the previous process, we will have a fully resolved document tree, and we will have a list of symbols that were referred to when resolving that document. We will also have the resolved and unresolved bags. Each resolved component invocation and each symbol used can then be converted to corresponding JS code, and this is the final JS.
The Three JS Files
We are hopeful that this process is fast enough that it can analyse all the documents of a fastn package in one go, and create a list of symbols used across all documents in the fastn package. At this point, everything in the resolved bag is there because it was needed at some point, and the JS for each of them will be created in a `deps.js` file. We will have `fastn.js`, the fastn runtime, which only changes when a new fastn release is created. And for each document we will have a `<doc>.js`, which will contain the content of that document.
File Watcher / Incremental Build
We can write some sort of file watcher, which will keep updating the `deps.js`; actually it would be `deps-<hash>.js`, a content-hashed version, so that if a document is updated while another document is being requested in parallel, we do not clobber `deps.js`.
We can keep the resolved and unresolved symbols in the `.fastn` folder, and update them instead of parsing everything from scratch, to achieve faster incremental builds.
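A minimal sketch of the content-hashed file name idea, using std's `DefaultHasher` purely for illustration (a real build would want a stable, stronger hash):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Name the generated file after its own content so an in-flight request for the old
// deps file is never clobbered by a rebuild; illustration only, not the real scheme.
fn deps_file_name(deps_js: &str) -> String {
    let mut hasher = DefaultHasher::new();
    deps_js.hash(&mut hasher);
    format!("deps-{:016x}.js", hasher.finish())
}
```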