Exposing graph information for modules. #140

Closed
samccone opened this issue May 24, 2020 · 5 comments

Comments

@samccone

👋 Evan,

As part of a long-running project to better understand JavaScript bundles, I have been working with the authors of Rollup, Webpack, Rome, and Parcel to expose a new graph file for compiled code. Think of it as an enabler to answer the question: why is X in my bundle?

I would like to make a feature request for esbuild to take a flag (--graph or --bundle-buddy), or read from an env variable, and write out the following information to a .json file.

Data format

type BundleBuddyGraph = Array<{
  // current src file
  source: string;
  // The source file imports the following file
  target: string;
}>

An example of what this looks like:

zap.js


import {cool} from './doge';

foo.js


import {wow} from './zap';
import {amazing} from '../cool';

wow(amazing());

Expected graph info:

[
    {
        source: 'app/foo.js',
        target: 'app/zap.js',
    },
    {
        source: 'app/foo.js',
        target: 'cool.js',
    },
    {
        source: 'app/zap.js',
        target: 'app/doge.js',
    },
]
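
To make the "why is X in my bundle?" use case concrete, here is a minimal sketch of how an edge list in this format could be queried. The helper name `whyIsItBundled` is hypothetical and is not part of bundle-buddy or esbuild; it simply runs a breadth-first search from an entry file and reports the shortest import chain to the file in question.

```typescript
// Edge list in the format proposed above: `source` imports `target`.
type ImportEdges = Array<{ source: string; target: string }>;

// Hypothetical helper (not part of bundle-buddy or esbuild): returns the
// shortest chain of imports leading from `entry` to `wanted`, or null if
// `wanted` is not reachable from `entry`.
function whyIsItBundled(
  graph: ImportEdges,
  entry: string,
  wanted: string
): string[] | null {
  // Adjacency list: file -> files it imports.
  const importsOf: { [file: string]: string[] } = Object.create(null);
  for (const edge of graph) {
    if (!importsOf[edge.source]) importsOf[edge.source] = [];
    importsOf[edge.source].push(edge.target);
  }

  // Breadth-first search, remembering how each file was first reached.
  const reachedFrom: { [file: string]: string | null } = Object.create(null);
  reachedFrom[entry] = null;
  const queue: string[] = [entry];
  while (queue.length > 0) {
    const file = queue.shift() as string;
    if (file === wanted) {
      // Walk the parent links back to the entry point.
      const chain: string[] = [];
      let current: string | null = file;
      while (current !== null) {
        chain.unshift(current);
        current = reachedFrom[current];
      }
      return chain;
    }
    for (const next of importsOf[file] || []) {
      if (!(next in reachedFrom)) {
        reachedFrom[next] = file;
        queue.push(next);
      }
    }
  }
  return null;
}

// The three edges from the example above.
const exampleGraph: ImportEdges = [
  { source: "app/foo.js", target: "app/zap.js" },
  { source: "app/foo.js", target: "cool.js" },
  { source: "app/zap.js", target: "app/doge.js" },
];

// Why is app/doge.js in the bundle built from app/foo.js?
const chain = whyIsItBundled(exampleGraph, "app/foo.js", "app/doge.js");
// chain is ["app/foo.js", "app/zap.js", "app/doge.js"]
```

The edge list carries no size information, so a query like this can explain how a file got pulled in but not how much it costs.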

Project design doc: https://docs.google.com/document/d/1ycGVBJmwIVs34yhC0oTqv_WH5f0fs2SAFmzyTiBK99k/edit#

Previous work with the other bundlers to enable exporting the graph file:

==

This will be integrated with the https://github.com/samccone/bundle-buddy project to empower people to better explore and tune their bundles.

Let me know if you have any questions or thoughts. Thanks!

@evanw
Owner

evanw commented May 25, 2020

Cool! Thanks for the pointers. I already have a note on my list to check out bundle buddy because I have come across it in the past. For reference, this issue is similar to a previous one: #15. I'm already considering generating an output graph in JSON format for that issue.

I was thinking about including a bit more information than just the graph though. Don't you want to know sizes too so you can prioritize the display of the information from the bundle? I see you're using source maps but it's not clear to me how source maps would let you get that information. Or at least, it doesn't seem like it'd be very accurate since source maps only map locations, not spans. A source file may also end up split apart in the final bundle, both in different pieces within a chunk and even across chunks.

@samccone
Author

samccone commented May 25, 2020

Hey Evan, thanks for asking these questions.

TL;DR:

  • File size is calculated from the sourcemaps. As you said, it is not perfect, but in testing it is typically within ~7% of the actual size of the file. This delta is uniform across the entire bundle, so the relative sizing remains intact. (I have detailed the approach below.)

  • As you stated, the per-bundle output of a single file can be quite different; that is one of the reasons why I use the per-bundle sourcemap as the source of truth.

RE: outputting sizing information in the graph:

I could imagine a project dumping a file like:

type BundleBuddyGraph = Array<{
  // current src file
  source: string;
  // The source file imports the following file
  target: string;
}>

interface ProjectStats {
   chunks: {
       [chunkName: string]: {
          // total bytes of the chunk
          totalBytes: number,
          graph: BundleBuddyGraph,
          chunkDeps: {
             [file: string]: {
               // bytes included in this chunk
               totalBytes: number;
             }
          }
       }
    }
}

If you could expose this information, that would be spectacular 🎉

Expanded:

How does bundle-buddy work?

bundle-buddy creates an association between the dependency graph of your project and the size of each node in the graph by connecting two discrete pieces of information:

  • A graph of your project’s file dependencies (what file imports what file).
  • The sourcemap source files (what is in your bundle).

The project graph alone provides us with this information:
[image]

The graph generation comes from the project-specific dependency graph output. (This is true for all tools except webpack, which already exposes this information in stats.json; bundle-buddy uses stats.json to extract the graph.)

Then, from the sourcemap information, we compute node size (details), which gives us the following level of information.

[image]

Through the combination of both pieces of this data we then have enough raw data to start solving for the core pain points of the project.

Source map size extraction:

Since bundle-buddy does not require your source to determine the size of your files, we compute the size of each dependency from the sourcemap file. From this information we can then determine the size of each file included in the bundles. [Accuracy in the tested projects has been within 7% of actual size.]

This approach is beneficial because, with tree shaking and dead code elimination at compile time, we count only the file size after unused code has been stripped, resulting in much more accurate data.

Additionally, applying this approach at a per-bundle level, rather than to the SUM of all bundles, can provide more robust insights into the per-bundle size of each file.

Pseudo code

* Def span: A range of text that is determined by calculating the number of characters between column and line entries in the sourcemap.

* For each sourcemap
  * Iterate over each sourcemapping entry
    * Determine the number of characters in each span (assume 1 byte per char)
    * Sum the number of bytes in each span into a dictionary whose key is the source file that the spans belong to
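
The steps above can be sketched roughly as follows. This is one possible reading of the pseudo code, not bundle-buddy's actual implementation. It assumes the sourcemap "mappings" string has already been decoded into a list of segments (the `DecodedSegment` shape and the `bytesPerSource` name are made up for this sketch), and it treats a span as running from one segment to the next segment on the same line, or to the end of the line.

```typescript
// One decoded sourcemap entry. Decoding the VLQ "mappings" string is left
// out; this shape is an assumption made for the sketch.
interface DecodedSegment {
  generatedLine: number;   // 0-based line in the bundle
  generatedColumn: number; // 0-based column in that line
  source: string | null;   // original file, or null for unmapped code
}

// Hypothetical helper: sums span lengths per original source file.
function bytesPerSource(
  bundleCode: string,
  segments: DecodedSegment[]
): { [source: string]: number } {
  const lines = bundleCode.split("\n");
  // Sort by position so each span ends where the next segment starts.
  const sorted = segments.slice().sort(
    (a, b) =>
      a.generatedLine - b.generatedLine ||
      a.generatedColumn - b.generatedColumn
  );

  const totals: { [source: string]: number } = Object.create(null);
  for (let i = 0; i < sorted.length; i++) {
    const segment = sorted[i];
    const next = sorted[i + 1];
    const line = lines[segment.generatedLine] || "";
    // A span runs to the next segment on the same line, else to end of line.
    const end =
      next !== undefined && next.generatedLine === segment.generatedLine
        ? next.generatedColumn
        : line.length;
    const spanLength = Math.max(0, end - segment.generatedColumn);
    if (segment.source !== null) {
      // Assume 1 byte per character, as in the pseudo code.
      totals[segment.source] = (totals[segment.source] || 0) + spanLength;
    }
  }
  return totals;
}

// A two-line bundle with made-up contents and mappings.
const sizes = bytesPerSource("var a=1;var b=22;\nf(a,b);", [
  { generatedLine: 0, generatedColumn: 0, source: "app/a.js" },
  { generatedLine: 0, generatedColumn: 8, source: "app/b.js" },
  { generatedLine: 1, generatedColumn: 0, source: "app/main.js" },
]);
// sizes is { "app/a.js": 8, "app/b.js": 9, "app/main.js": 7 }
```

Text before the first segment on a line is attributed to no file, which is one reason an estimate like this can drift from the real sizes.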

Thanks again for your time and for working through this with me.

@evanw
Owner

evanw commented May 27, 2020

I'm thinking of providing a slightly different format for the information, given that there are likely many use cases for it. I'm also thinking of separating out source information from chunks, otherwise you'd potentially end up repeating a lot of information and unnecessarily making the graph output bigger.

Would it be ok for your use case to use a slightly different format as long as the information you need is there? Right now I'm thinking of something like this:

interface Graph {
  sources: {
    [path: string]: {
      byteSize: number
      imports: {
        kind: 'statement' | 'require' | 'dynamic'
        path: string | null
      }[]
    }
  }
  chunks: {
    [path: string]: {
      kind: 'entry'
      byteSize: number
      sources: {
        [path: string]: {
          byteSizeInChunk: number
        }
      }
    }
  }
}

@samccone
Author

Hey Evan, yes, this format will work just fine. 👍

I would like, if possible, to ensure that the path names you use for your keys are consistent between the sourcemap file paths and the internal Graph. If consistency is not possible, an invariant that the sourcemap file paths are a subset of the graph's file paths would work as well; otherwise things get quite tricky to align in an automated fashion.
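
As a rough illustration of the subset invariant, a consumer could run a check like the one below. The helper name `pathsMissingFromGraph` and the two trimmed-down interfaces are assumptions for this sketch; only the `sources` array of the sourcemap and the keys of the graph's `sources` object are used, and paths are compared as exact strings with no normalization.

```typescript
// Only the field this check needs. `sources` is the standard array of
// original file paths in a version 3 sourcemap.
interface SourceMapLike {
  sources: string[];
}

// Only the keys matter here; the values are whatever the bundler reports.
interface GraphLike {
  sources: { [path: string]: unknown };
}

// Hypothetical helper: lists sourcemap paths that have no matching key in
// the graph. An empty result means the subset invariant holds.
function pathsMissingFromGraph(
  map: SourceMapLike,
  graph: GraphLike
): string[] {
  const missing: string[] = [];
  for (const path of map.sources) {
    // Exact string comparison: no normalization of "./" or "../" prefixes.
    if (!Object.prototype.hasOwnProperty.call(graph.sources, path)) {
      missing.push(path);
    }
  }
  return missing;
}

// Made-up data: the second sourcemap path is spelled differently from the
// graph key for the same file, so it is reported as missing.
const missingPaths = pathsMissingFromGraph(
  { sources: ["app/foo.js", "../app/zap.js"] },
  { sources: { "app/foo.js": {}, "app/zap.js": {} } }
);
// missingPaths is ["../app/zap.js"]
```

A mismatch like the one in this example is exactly the case that is hard to repair automatically, since the two spellings may or may not refer to the same file.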

Thanks again.

@evanw evanw closed this as completed in 11e9d5b Jun 5, 2020
@evanw
Owner

evanw commented Jun 5, 2020

This will be in the next release. The format I went with is a little different, but pretty much what I described above:

interface Metadata {
  inputs: {
    [path: string]: {
      bytes: number
      imports: {
        path: string
      }[]
    }
  }
  outputs: {
    [path: string]: {
      bytes: number
      inputs: {
        [path: string]: {
          bytesInOutput: number
        }
      }
    }
  }
}

It can easily be transformed into the format you want like this:

const edges = []
for (const source in json.inputs)
  for (const { path: target } of json.inputs[source].imports)
    edges.push({ source, target })

I think you shouldn't need the source map because of the extra information in the metadata, although of course you can continue to require the source map if you'd like. Let me know how I can follow along with the integration.
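
As a sketch of how the sizes could be read directly from the metadata instead of a source map, the snippet below ranks the inputs of one output file by the bytes they contribute. The `Metadata` interface is copied from the comment above; the helper name `largestInputs`, the `InputShare` type, and all paths and byte counts in the example are made up for illustration.

```typescript
// Same shape as the Metadata interface described above.
interface Metadata {
  inputs: {
    [path: string]: { bytes: number; imports: { path: string }[] };
  };
  outputs: {
    [path: string]: {
      bytes: number;
      inputs: { [path: string]: { bytesInOutput: number } };
    };
  };
}

interface InputShare {
  path: string;
  bytesInOutput: number;
  share: number; // fraction of the output file, between 0 and 1
}

// Hypothetical helper: inputs of one output, largest contribution first.
function largestInputs(meta: Metadata, outputPath: string): InputShare[] {
  const output = meta.outputs[outputPath];
  if (output === undefined) return [];
  const rows: InputShare[] = [];
  for (const path in output.inputs) {
    const bytesInOutput = output.inputs[path].bytesInOutput;
    rows.push({
      path,
      bytesInOutput,
      share: output.bytes > 0 ? bytesInOutput / output.bytes : 0,
    });
  }
  rows.sort((a, b) => b.bytesInOutput - a.bytesInOutput);
  return rows;
}

// Made-up numbers for the foo/zap/doge example from earlier in the thread.
const exampleMetadata: Metadata = {
  inputs: {
    "app/foo.js": { bytes: 120, imports: [{ path: "app/zap.js" }, { path: "cool.js" }] },
    "app/zap.js": { bytes: 60, imports: [{ path: "app/doge.js" }] },
    "app/doge.js": { bytes: 40, imports: [] },
    "cool.js": { bytes: 30, imports: [] },
  },
  outputs: {
    "out.js": {
      bytes: 100,
      inputs: {
        "app/doge.js": { bytesInOutput: 10 },
        "cool.js": { bytesInOutput: 15 },
        "app/zap.js": { bytesInOutput: 25 },
        "app/foo.js": { bytesInOutput: 50 },
      },
    },
  },
};

const ranked = largestInputs(exampleMetadata, "out.js");
// ranked[0] is { path: "app/foo.js", bytesInOutput: 50, share: 0.5 }
```

Because `bytesInOutput` is reported per output file, the same input can show different sizes in different outputs, which matches the per-bundle view discussed earlier in the thread.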
