[Do Not Merge] ipsl: compiler and mesh for ipsl structure #33

Closed
wants to merge 426 commits into from

Jorropo
Contributor

@Jorropo Jorropo commented Jan 18, 2023

This still requires lots of work.

This includes a mostly complete compiler (minus a few types and literals). More nodes and scopes will be required.

TODO (for later PRs):

  • number support
  • matcher support
  • rewrite how AstNode collects scopes (stop wrapping; probably make Serialize() return (AstNode, Scopes, error), or similar).

Depends on (and includes) #36
Fixes ipfs#3

Kubuxu and others added 30 commits March 24, 2017 01:02
License: MIT
Signed-off-by: Jakub Sztandera <[email protected]>


This commit was moved from ipfs/go-unixfs@8bf6483
License: MIT
Signed-off-by: Jeromy <[email protected]>


This commit was moved from ipfs/go-unixfs@5e8347c
License: MIT
Signed-off-by: Jeromy <[email protected]>


This commit was moved from ipfs/go-unixfs@1f3adf3
License: MIT
Signed-off-by: Kevin Atkinson <[email protected]>


This commit was moved from ipfs/go-unixfs@123b0fd
…n nil

License: MIT
Signed-off-by: Kevin Atkinson <[email protected]>


This commit was moved from ipfs/go-unixfs@c1e9ccd
License: MIT
Signed-off-by: Kevin Atkinson <[email protected]>


This commit was moved from ipfs/go-unixfs@957da2c
License: MIT
Signed-off-by: Kevin Atkinson <[email protected]>


This commit was moved from ipfs/go-unixfs@e19161f
License: MIT
Signed-off-by: Jeromy <[email protected]>


This commit was moved from ipfs/go-unixfs@fb78dc2
License: MIT
Signed-off-by: Jeromy <[email protected]>


This commit was moved from ipfs/go-unixfs@4f4ba29
License: MIT
Signed-off-by: Zach Ramsay <[email protected]>


This commit was moved from ipfs/go-unixfs@ee18aaa
License: MIT
Signed-off-by: Zach Ramsay <[email protected]>


This commit was moved from ipfs/go-unixfs@427d991
License: MIT
Signed-off-by: Zach Ramsay <[email protected]>


This commit was moved from ipfs/go-unixfs@6b9f909
License: MIT
Signed-off-by: Kevin Atkinson <[email protected]>


This commit was moved from ipfs/go-unixfs@ec9e96f
License: MIT
Signed-off-by: Kevin Atkinson <[email protected]>


This commit was moved from ipfs/go-unixfs@db7c21c
And updated related dependencies.

License: MIT
Signed-off-by: Steven Allen <[email protected]>


This commit was moved from ipfs/go-unixfs@ee13c79
License: MIT
Signed-off-by: Łukasz Magiera <[email protected]>


This commit was moved from ipfs/go-unixfs@4597735
License: MIT
Signed-off-by: Jeromy <[email protected]>


This commit was moved from ipfs/go-unixfs@3da5638
License: MIT
Signed-off-by: Steven Allen <[email protected]>


This commit was moved from ipfs/go-unixfs@ef6144d
License: MIT
Signed-off-by: Łukasz Magiera <[email protected]>


This commit was moved from ipfs/go-unixfs@6d45943
License: MIT
Signed-off-by: Łukasz Magiera <[email protected]>


This commit was moved from ipfs/go-unixfs@f98d19a
License: MIT
Signed-off-by: Kevin Atkinson <[email protected]>


This commit was moved from ipfs/go-unixfs@5a0548a
License: MIT
Signed-off-by: Kevin Atkinson <[email protected]>


This commit was moved from ipfs/go-unixfs@e3ad5de
License: MIT
Signed-off-by: Kevin Atkinson <[email protected]>


This commit was moved from ipfs/go-unixfs@109d198
License: MIT
Signed-off-by: Kevin Atkinson <[email protected]>


This commit was moved from ipfs/go-unixfs@9501516
License: MIT
Signed-off-by: Kevin Atkinson <[email protected]>


This commit was moved from ipfs/go-unixfs@d0d21fa
License: MIT
Signed-off-by: Kevin Atkinson <[email protected]>


This commit was moved from ipfs/go-unixfs@e94381a
gateway: fix seeker can't seek on specific files

This commit was moved from ipfs/go-unixfs@a1739b5
@codecov

codecov bot commented Jan 26, 2023

Codecov Report

Merging #33 (8b03955) into main (7346505) will increase coverage by 13.95%.
The diff coverage is 62.03%.


@@             Coverage Diff             @@
##             main      ipfs/go-libipfs#33       +/-   ##
===========================================
+ Coverage   17.95%   31.91%   +13.95%     
===========================================
  Files          94      101        +7     
  Lines       10368    11841     +1473     
===========================================
+ Hits         1862     3779     +1917     
+ Misses       8245     7544      -701     
- Partials      261      518      +257     
Impacted Files Coverage Δ
unixfs/ipld-merkledag/file/unixfile.go 0.00% <0.00%> (ø)
unixfs/ipld-merkledag/io/resolve.go 0.00% <0.00%> (ø)
ipsl/ipsl.go 25.00% <25.00%> (ø)
unixfs/ipld-merkledag/importer/importer.go 33.33% <33.33%> (ø)
ipsl/helpers/helpers.go 37.50% <37.50%> (ø)
unixfs/ipld-merkledag/test/utils.go 47.82% <47.82%> (ø)
...ixfs/ipld-merkledag/importer/trickle/trickledag.go 53.58% <53.58%> (ø)
unixfs/pb/unixfs.pb.go 56.47% <56.47%> (ø)
ipsl/unixfs/unixfs.go 57.14% <57.14%> (ø)
unixfs/ipld-merkledag/mod/dagmodifier.go 57.21% <57.21%> (ø)
... and 23 more

@Jorropo Jorropo marked this pull request as ready for review January 30, 2023 11:31
@willscott
Contributor

Is there a definition of what the IPSL is?

@Jorropo
Contributor Author

Jorropo commented Jan 30, 2023

@willscott I have an incomplete and partly incorrect spec draft here: https://hackmd.io/@UV0H7uWJTQ6Wm8jsq8w8mQ/BydHhrN5s

TL;DR: This is a low-development-cost, extensible alternative to selectors.
Selectors imply far too much to be easy to implement.
IPSL is a one-pass compilable, Turing-incomplete language that will (not yet, and optionally) support adding more modules at runtime.
IPLD Selectors require you to buy into the IPLD data model and the reification business, which is a lot of complex code to write. IPSL's interface is struct{cid.Cid, []byte}, which all IPFS implementations probably already implement. Instead, the burden is on clients to implement whatever features they need (and send them as wasm).

For example, you could run selectors on top of IPSL by implementing them in wasm, and then use selectors on a remote node that does not implement them.

@willscott
Contributor

IPSL's interface is struct{cid.Cid, []byte}, which all IPFS implementations probably already implement.

  • Presumably no implementations support this right now?
    • we haven't gotten any commitment or standard agreement - e.g. this is not on the roadmaps of boost, iroh, lotus etc.
  • How would you know if a given 'selector' is able to be understood by a remote node? is there any 'common core' that could be relied upon?

I would encourage getting review/alignment with current/former IPLD members that have thought about selector expressiveness and goals - e.g. @rvagg , @RangerMauve, @warpfork

@Jorropo
Contributor Author

Jorropo commented Jan 30, 2023

@willscott you are starting the politics earlier than I meant to; I don't have a proper RFC ready 😅

IPSL's interface is struct{cid.Cid, []byte}, which all IPFS implementations probably already implement.

We have had the CID/bytes pair in Kubo since the beginning; it's so old it predates CIDs:

In case this was unclear, "which all IPFS implementations probably already implement" referred only to struct{cid.Cid, []byte}. I really believe this is the canonical way to represent a block (it's some bytes, addressed by a hash).
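Because a block is just some bytes addressed by a hash, anyone holding the pair can check it with nothing but the hash function. A minimal sketch, assuming the address is a bare SHA-256 digest; real CIDs also carry a version, a codec and a multihash prefix, which this hypothetical Block type leaves out.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// Block is a simplified stand-in for struct{cid.Cid, []byte}: the "CID" here
// is only the raw SHA-256 digest of the data.
type Block struct {
	Digest [sha256.Size]byte
	Data   []byte
}

// NewBlock addresses data by its hash.
func NewBlock(data []byte) Block {
	return Block{Digest: sha256.Sum256(data), Data: data}
}

// Valid reports whether the data still hashes to the address it claims.
// Arrays are comparable in Go, so the digests can be compared directly.
func (b Block) Valid() bool {
	return sha256.Sum256(b.Data) == b.Digest
}

func main() {
	b := NewBlock([]byte("hello"))
	fmt.Println(b.Valid()) // true
	b.Data = []byte("tampered")
	fmt.Println(b.Valid()) // false
}
```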

As for the rest, there is one partial alpha implementation (this PR).
The goal is for IPSL to be the internal API used by Go code making requests to Kubo, eventually looking somewhat like this:

traversal := ipsl.NewDepthLimit(10, ipsl.NewAll(
    unixfs.Name(match.Wildcard("file1*"), unixfs.Everything()),
    unixfs.Name(match.Eq("file2"), unixfs.Range(0xFF1024, 0x2FF0555, unixfs.Everything())),
))

This corresponds to the following request:

(some unixfs folder)
+ -> file1*
|    + -> *
+ -> file2
     + -> 0xFF1024 / 0x2FF0555
          + -> *

I've tried using selectors for this, but the API is unreadable and they are hard to use (the current implementation is not comparable, and is hard to make so due to reification and hidden internal state).

This is needed because it gives RAPIDE (the new download client) a "full view" of the request, instead of sequential calls to GetBlock().
Having a full view of the download request will allow us to write much faster download clients and support more protocols (like CAR files over the gateway, graphsync, ...).

Goals

  1. At "worst":
    We have a somewhat lazy tree data structure which represents unixfs requests inside Kubo's internals.

  2. Ideally (these two parts are optional and will be ironed out later, or not):

    • You want to transmit these tree requests in order to do things like 1-RTT nested directory lookups.
    • unixfs is not the only data format and you want to support others; ideally you also solve the imbalance in data format development cost (as a user I shouldn't have to rely on a remote server owned by another party to implement my data format in order to download my data from them, as long as my data is multihash addressed).

Design

Features are meant to be implemented in four layers:

  1. The language spec. This includes:
    • The syntax (LISP-like); a program is a tree.
    • The type system, which is divided into two kinds of types:
      • lazily evaluated types, which return objects that will be evaluated later:
        • scope: a map[string]NodeCompiler mapping that can be bound using a scope node [].
          The NodeCompiler type is an implementation detail of the language; in this implementation it takes in some arguments and returns some other type (intended to be implemented natively; it could also be done in wasm if we add constructor support to the wasm code, but I don't see value in that yet).
        • traversal node, signature: func(struct{cid.Cid, []byte}) ([]CidTraversalPair, error). It takes in some raw block data and the CID associated with it, and returns a list of child traversals to follow; each entry contains a CID and another traversal node (implemented natively or with wasm).
        • matcher node: a generic type that accepts most literal types (string, CID and number) and returns true or false (note that boolean is an internal type which can be used by traversal or filter nodes, but it isn't part of the language). (Implemented natively or in wasm.)
      • literal types (computed while compiling):
        • string (some bytes)
        • CID (a CID)
        • number (a uint64)
        • none (nothing)
  2. Builtin nodes. The features you care about (unixfs, IPLD selectors, cool new dataformat 3000) are not present in the language by default (because that would force everyone to implement them, which I don't want). Instead, this layer includes the plumbing required to load features (called "scopes"):
    • load-builtin-scope: takes in a string matcher as argument and returns a scope if the IPSL implementation supports a matching builtin scope (see 3. for what builtin scopes are), else returns none.
    • load-wasm-scope: takes in a CID and may try to load it if the IPSL implementation supports wasm, else returns none.
    • pick: a magic node with a variadic number of arguments, which must all be of the same type or none. It returns one of the non-none arguments; which one is undefined. This is intended for fallback strategies. Take for example:
      (pick (load-builtin-scope (wildcard "/unixfs/v1.*")) (load-wasm-scope $Qmfoo {v1.1.0}) (load-wasm-scope $Qmbar {v1.2.0})). First, assume the server does not implement any of /unixfs/v1.* but does implement wasm, so pick receives as arguments: none, a wasm scope, a wasm scope. The server is then free to use whatever strategy it likes to choose between the two wasm scopes (for example, it might choose $Qmbar because it already has it cached).
    • all: takes a variadic number of traversals and concatenates them.
    • wildcard: takes in a string and returns a matcher[string] that applies some wildcard specification to its input (I haven't decided which one yet).
    • eq: takes a matchable literal as argument and returns a matcher that returns true if its input is equal to the argument, else false.
    • not: takes a matcher as argument and returns a matcher of the same type that returns the opposite result.
    • and: takes a variadic number of matchers as arguments and returns true if all of them return true.
    • or: takes a variadic number of matchers as arguments and returns true if any of them returns true.
    • filter: takes a matcher (cid, or string?) and a traversal; every time the traversal is executed, its results are first passed through the matcher and removed if the matcher returns false.
    • depth-limit: takes a number and a traversal and binds a filter that nulls out results when the depth limit is reached.
    • has-codec: takes a number and returns a matcher[cid] that returns true if the CID's codec matches.
    • ... (that's all for now, but I'm not yet sure this list is final)
  3. Default builtin scopes: these are all optional and are intended for people who don't want to run wasm ("I have no wasm runtime in my language", "wasm is too slow", ...).
    If you use IPSL as a local API (as I want to in Kubo) they are great; for remote requests, however, they have the drawback that you rely on the remote peer implementing your data format and request.
    • /unixfs/v<some version> (not finalised, still pre-alpha)
      • everything: recursively traverses all links.
      • all: traverses all links one level deep.
      • range: takes two numbers and a traversal as arguments, and runs the traversal on all links within that file range.
      • range-recursive: same as range, but recurses until it finds blocks that are completely within the range, or leaves (this behaves differently if your range falls between two links at some point), and executes the passed traversal on those.
      • name: takes a matcher[string] and a traversal as arguments and runs the child traversal on all matching names in the directory. For HAMT directories it also follows all HAMT roots (because of how HAMTs work, you can't run a wildcard without traversing everything); if we add reflection to the wasm, we could add ad-hoc rules for things like eq.
      • ... some other unixfs traversal?
    • ... some other data format?
  4. Wasm-loaded scopes: I haven't worked on this part yet; I know what I want to do but I haven't investigated how I'm going to do it.
    The goal is to give it the CID of some wasm blob (probably encoded as a unixfs file?), load that CID, and have the wasm blob implement some matchers and traversals. Imagine implementing a Bloom filter to avoid sending blocks you already have, the git protocol, or IPLD selectors.
    Details such as the API and runtime requirements don't exist yet; it's a very long-term plan.
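The spec describes the syntax as LISP-like, so a program is a tree of nested forms. Below is a minimal reader sketch, assuming only parentheses, double-quoted strings without escapes, and bare atoms; tokens such as $Qmfoo and {v1.1.0} from the pick example are kept as opaque atoms because the thread does not define their lexical meaning. The Node type and Parse function are hypothetical, not the PR's actual AST.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Node is a hypothetical AST node: either a leaf (atom or string) or a list.
type Node struct {
	Atom     string // leaf text
	IsString bool   // leaf was a double-quoted string literal
	IsList   bool   // node is a (...) form
	List     []Node // children of a (...) form
}

// Parse reads exactly one s-expression from src.
func Parse(src string) (Node, error) {
	n, rest, err := parse(strings.TrimSpace(src))
	if err != nil {
		return Node{}, err
	}
	if strings.TrimSpace(rest) != "" {
		return Node{}, fmt.Errorf("trailing input: %q", rest)
	}
	return n, nil
}

// parse consumes one form from the front of s and returns the unread rest.
func parse(s string) (Node, string, error) {
	s = strings.TrimLeft(s, " \t\n")
	if s == "" {
		return Node{}, "", errors.New("unexpected end of input")
	}
	switch s[0] {
	case '(':
		n := Node{IsList: true}
		s = s[1:]
		for {
			s = strings.TrimLeft(s, " \t\n")
			if s == "" {
				return Node{}, "", errors.New("missing )")
			}
			if s[0] == ')' {
				return n, s[1:], nil
			}
			child, rest, err := parse(s)
			if err != nil {
				return Node{}, "", err
			}
			n.List = append(n.List, child)
			s = rest
		}
	case ')':
		return Node{}, "", errors.New("unexpected )")
	case '"':
		end := strings.IndexByte(s[1:], '"')
		if end < 0 {
			return Node{}, "", errors.New("unterminated string")
		}
		return Node{Atom: s[1 : 1+end], IsString: true}, s[2+end:], nil
	default:
		// An atom runs until whitespace or a parenthesis.
		end := strings.IndexAny(s, " \t\n()")
		if end < 0 {
			end = len(s)
		}
		return Node{Atom: s[:end]}, s[end:], nil
	}
}

// String prints the tree back in canonical single-space form.
func (n Node) String() string {
	if !n.IsList {
		if n.IsString {
			return "\"" + n.Atom + "\""
		}
		return n.Atom
	}
	parts := make([]string, len(n.List))
	for i, c := range n.List {
		parts[i] = c.String()
	}
	return "(" + strings.Join(parts, " ") + ")"
}

func main() {
	n, err := Parse(`(pick (load-builtin-scope (wildcard "/unixfs/v1.*")) (load-wasm-scope $Qmfoo {v1.1.0}))`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(n.List[0].Atom, len(n.List)) // pick 3
	fmt.Println(n)
}
```

A compiler would then walk this tree, resolving each head atom (pick, wildcard, ...) against the scopes currently bound.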
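The matcher builtins (eq, not, and, or, wildcard) compose as ordinary predicates. The sketch below models a matcher as a plain Go function; the Matcher type and constructor names are hypothetical, and since the wildcard specification is still undecided, it borrows the standard library's path.Match glob syntax purely as a placeholder.

```go
package main

import (
	"fmt"
	"path"
)

// Matcher is a hypothetical model of matcher[T]: a predicate over a literal.
type Matcher[T any] func(T) bool

// Eq matches values equal to want.
func Eq[T comparable](want T) Matcher[T] {
	return func(v T) bool { return v == want }
}

// Not inverts a matcher.
func Not[T any](m Matcher[T]) Matcher[T] {
	return func(v T) bool { return !m(v) }
}

// And is true only if every matcher is true (vacuously true with no matchers).
func And[T any](ms ...Matcher[T]) Matcher[T] {
	return func(v T) bool {
		for _, m := range ms {
			if !m(v) {
				return false
			}
		}
		return true
	}
}

// Or is true if any matcher is true (false with no matchers).
func Or[T any](ms ...Matcher[T]) Matcher[T] {
	return func(v T) bool {
		for _, m := range ms {
			if m(v) {
				return true
			}
		}
		return false
	}
}

// Wildcard uses path.Match glob syntax as a stand-in spec;
// a malformed pattern never matches.
func Wildcard(pattern string) Matcher[string] {
	return func(s string) bool {
		ok, err := path.Match(pattern, s)
		return err == nil && ok
	}
}

func main() {
	m := Or(Wildcard("file1*"), Eq("file2"))
	for _, name := range []string{"file1.txt", "file2", "file3"} {
		fmt.Println(name, m(name))
	}
	fmt.Println(And(Wildcard("file*"), Not(Eq("file2")))("file2")) // false
}
```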
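Because pick leaves the choice among non-none arguments undefined, each server can plug in its own strategy. A minimal sketch of one such strategy, assuming none is modeled as a nil pointer and that the server prefers an option it already has cached, falling back to the first non-none one; the Scope type, the Cached field and the prefer callback are hypothetical.

```go
package main

import "fmt"

// Scope is a hypothetical placeholder for whatever a scope loader returns.
type Scope struct {
	Name   string
	Cached bool
}

// Pick returns one non-none (non-nil) argument, or nil if all are none.
// Options accepted by prefer win; otherwise the first non-none one is used.
// The spec leaves the choice undefined, so this ordering is only one strategy.
func Pick[T any](prefer func(*T) bool, opts ...*T) *T {
	var fallback *T
	for _, o := range opts {
		if o == nil {
			continue // none
		}
		if prefer(o) {
			return o
		}
		if fallback == nil {
			fallback = o
		}
	}
	return fallback
}

func main() {
	// No builtin unixfs scope (none), plus two wasm scopes, one already cached.
	chosen := Pick[Scope](func(s *Scope) bool { return s.Cached },
		nil,
		&Scope{Name: "wasm $Qmfoo"},
		&Scope{Name: "wasm $Qmbar", Cached: true},
	)
	fmt.Println(chosen.Name) // wasm $Qmbar
}
```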
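For the unixfs range traversal, selecting "all links within that file range" amounts to turning per-child sizes into byte offsets and keeping the children that overlap the requested range. A minimal sketch, assuming the child sizes are already known (in unixfs file nodes they come from the blocksizes field) and assuming a half-open range [start, end); the thread does not pin down the exact boundary semantics, and ChildrenInRange is a hypothetical helper.

```go
package main

import "fmt"

// ChildrenInRange returns the indices of children whose byte span overlaps
// the half-open range [start, end). sizes[i] is the length of child i.
func ChildrenInRange(sizes []uint64, start, end uint64) []int {
	if start >= end {
		return nil // empty range selects nothing
	}
	var out []int
	var offset uint64
	for i, size := range sizes {
		childStart, childEnd := offset, offset+size
		offset = childEnd
		if childEnd <= start {
			continue // entirely before the range
		}
		if childStart >= end {
			break // this and all later children are after the range
		}
		out = append(out, i)
	}
	return out
}

func main() {
	// Four 256 KiB chunks; ask for bytes [300000, 600000).
	sizes := []uint64{262144, 262144, 262144, 262144}
	fmt.Println(ChildrenInRange(sizes, 300000, 600000)) // [1 2]
}
```

range-recursive would apply the same selection again inside each selected child until the selected blocks lie completely within the range or are leaves.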

@willscott
Contributor

Thanks for the detailed response!

I think the core of my push previously is that I don't fully understand what it means for something to be merged into libipfs, and whether that means it's considered blessed as a core spec. If we're asking for review to merge this as an approved part of what ipfs is, that seems like it's ready for review? It seems like you're saying this is at an experimental level, but I'm not sure how we distinguish between experimental and important/maintained parts of what's living in this repo.

@Jorropo
Copy link
Contributor Author

Jorropo commented Jan 30, 2023

@willscott I see what you mean now. There are no plans to merge this into libipfs yet; I've switched it from draft because it now has a minimal PoC people can play with, and I wanted to start gathering reviews from the rest of the team.

We have not yet decided whether we will follow through with this project. 🙂

@Jorropo Jorropo changed the title ipsl: compiler and mesh for ipsl structure [Do Not Merge] ipsl: compiler and mesh for ipsl structure Jan 30, 2023
This removes the fake scope nodes we had in the middle of the tree.
Now Serialize returns BoundScope objects, which are used by Uncompile to
rebuild the AST.
@Jorropo
Contributor Author

Jorropo commented Feb 3, 2023

Superseded by #148

Successfully merging this pull request may close these issues.

IPSL Compiler And Basic Unixfs builtin scope