Exposing graph information for modules. #140

samccone · 2020-05-24T20:49:41Z

👋 Evan,

As part of a long-running project to better understand JavaScript bundles, I have been working with the authors of Rollup, Webpack, Rome, and Parcel to expose a new graph file for compiled code. Think of it as a way to answer the question "why is X in my bundle?"

I would like to request a feature: have esbuild accept a flag (--graph or --bundle-buddy), or read an environment variable, that makes it write the following information to a .json file.

Data format

type BundleBuddyGraph = Array<{
  // current src file
  source: string;
  // The source file imports the following file
  target: string;
}>

An example of what this looks like:

app/zap.js

import {cool} from './doge';

app/foo.js

import {wow} from './zap';
import {amazing} from '../cool';

wow(amazing());

Expected graph info:

[
  {
    source: 'app/foo.js',
    target: 'app/zap.js',
  },
  {
    source: 'app/foo.js',
    target: 'cool.js',
  },
  {
    source: 'app/zap.js',
    target: 'app/doge.js',
  },
]
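To illustrate how a flat edge list like this can answer "why is X in my bundle?", here is a minimal sketch that finds the shortest import chain from an entry file to a target file with a breadth-first search. The function name `importChain` and the choice of entry file are hypothetical; nothing here is part of any bundler's API.

```typescript
// Hypothetical helper (not part of any bundler API): find the shortest
// import chain from `entry` to `target` using only source -> target edges.
type BundleBuddyGraph = Array<{ source: string; target: string }>;

function importChain(
  graph: BundleBuddyGraph,
  entry: string,
  target: string,
): string[] | null {
  // Adjacency list: file -> files it imports.
  const adjacency = new Map<string, string[]>();
  for (const edge of graph) {
    const deps = adjacency.get(edge.source) ?? [];
    deps.push(edge.target);
    adjacency.set(edge.source, deps);
  }

  // Breadth-first search, remembering which file first reached each file.
  const previous = new Map<string, string | null>();
  previous.set(entry, null);
  const queue: string[] = [entry];
  while (queue.length > 0) {
    const file = queue.shift()!;
    if (file === target) {
      // Walk back from the target to the entry to rebuild the chain.
      const chain: string[] = [];
      let step: string | null | undefined = file;
      while (step != null) {
        chain.unshift(step);
        step = previous.get(step);
      }
      return chain;
    }
    for (const dep of adjacency.get(file) ?? []) {
      if (!previous.has(dep)) {
        previous.set(dep, file);
        queue.push(dep);
      }
    }
  }
  return null; // target is not reachable from entry
}

// The expected graph info from the example above.
const exampleGraph: BundleBuddyGraph = [
  { source: "app/foo.js", target: "app/zap.js" },
  { source: "app/foo.js", target: "cool.js" },
  { source: "app/zap.js", target: "app/doge.js" },
];

const dogeChain = importChain(exampleGraph, "app/foo.js", "app/doge.js");
// dogeChain is ["app/foo.js", "app/zap.js", "app/doge.js"]
```

Breadth-first search is used so that the reported chain is a shortest one; a real tool might want to list every path instead.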

Project design doc: https://docs.google.com/document/d/1ycGVBJmwIVs34yhC0oTqv_WH5f0fs2SAFmzyTiBK99k/edit#

Previous work with the other bundlers to enable exporting the graph file:


This will be integrated with the https://github.com/samccone/bundle-buddy project to empower people to better explore and tune their bundles.

Let me know if you have any questions or thoughts. Thanks!


evanw · 2020-05-25T06:49:07Z

Cool! Thanks for the pointers. I already have a note on my list to check out bundle buddy because I have come across it in the past. For reference, this issue is similar to a previous one: #15. I'm already considering generating an output graph in JSON format for that issue.

I was thinking about including a bit more information than just the graph though. Don't you want to know sizes too so you can prioritize the display of the information from the bundle? I see you're using source maps but it's not clear to me how source maps would let you get that information. Or at least, it doesn't seem like it'd be very accurate since source maps only map locations, not spans. A source file may also end up split apart in the final bundle, both in different pieces within a chunk and even across chunks.

samccone · 2020-05-25T17:15:53Z

Hey Evan, thanks for asking these questions.

TL;DR

File size is calculated from the source maps. As you said, it is not perfect, but in testing it is typically within ~7% of the actual file size, and this delta is uniform across the entire bundle, so relative sizing remains intact. (I have detailed the approach below.)
As you stated, the per-bundle output of a single file can be quite different; that is one of the reasons I use each bundle's source map as the source of truth.

RE: outputting size information in the graph:

I could imagine a project dumping a file like:

type BundleBuddyGraph = Array<{
  // current src file
  source: string;
  // The source file imports the following file
  target: string;
}>

interface ProjectStats {
  chunks: {
    [chunkName: string]: {
      // total bytes of the chunk
      totalBytes: number;
      graph: BundleBuddyGraph;
      chunkDeps: {
        [file: string]: {
          // bytes included in this chunk
          totalBytes: number;
        };
      };
    };
  };
}

If you could expose this information, that would be spectacular 🎉

Expanded:

How does bundle-buddy work?

bundle-buddy associates your project's dependency graph with the size of each node in that graph by connecting two separate pieces of information:

A graph of your project’s file dependencies (what file imports what file)
The sourcemap source files (what is in your bundle).

The project graph alone provides us with this information:

The graph comes from each tool's project-specific dependency graph output. (This is true for all tools except webpack, which already exposes this information in stats.json; bundle-buddy extracts the graph from stats.json.)

Then, from the source map information, we compute each node's size (details), which gives us the following level of information.

Combining both pieces of data gives us enough raw data to start solving the core pain points of the project.

Source map size extraction:

Since bundle-buddy does not require your source code, it determines the size of each dependency by looking at the source map instead. From this we can determine the size of each file included in each bundle. (In the projects we tested, accuracy has been within 7% of the actual size.)

This approach has an advantage: because tree shaking and dead code elimination run at compile time, we count each file's size only after unused code has been stripped, which gives much more accurate data.

Additionally, when this approach is applied per bundle rather than summed across all bundles, it gives more robust insight into each file's size within each bundle.

Pseudo code

* Def. span: a range of text whose length is the number of characters between consecutive line/column entries in the source map.

* For each source map
  * Iterate over each mapping entry
    * Determine the number of characters in each span (assume 1 byte per char)
    * Sum the bytes of each span into a dictionary keyed by the source file that the span belongs to

(source)
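The pseudocode above can be sketched in TypeScript roughly as follows. This is a minimal sketch, not bundle-buddy's actual implementation: it assumes a parsed Source Map v3 object (not an index map with "sections") plus the generated code the map describes, decodes the Base64 VLQ `mappings` string, and treats each span as running from one mapping to the next on the same line, or to the end of the line for the last mapping. The function names `decodeVlq` and `bytesPerSource` are hypothetical.

```typescript
// Only the Source Map v3 fields this sketch needs.
interface SourceMapV3 {
  version: number;
  sources: string[];
  mappings: string;
}

const BASE64 =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Decode one segment of Base64 VLQ values, e.g. "GCAA" -> [3, 1, 0, 0].
function decodeVlq(segment: string): number[] {
  const values: number[] = [];
  let value = 0;
  let shift = 0;
  for (let i = 0; i < segment.length; i++) {
    const digit = BASE64.indexOf(segment[i]);
    if (digit < 0) throw new Error(`invalid VLQ character: ${segment[i]}`);
    value += (digit & 31) << shift; // low 5 bits carry data
    if (digit & 32) {
      shift += 5; // continuation bit set: more digits follow
    } else {
      // The lowest bit of the assembled value is the sign.
      values.push(value & 1 ? -(value >>> 1) : value >>> 1);
      value = 0;
      shift = 0;
    }
  }
  return values;
}

// Sum generated characters (1 char = 1 byte) attributed to each source file.
function bytesPerSource(map: SourceMapV3, code: string): Record<string, number> {
  const sizes: Record<string, number> = Object.create(null);
  const codeLines = code.split("\n");
  let sourceIndex = 0; // the source index delta carries across lines
  map.mappings.split(";").forEach((lineMappings, lineNo) => {
    const lineLength = (codeLines[lineNo] ?? "").length;
    // [generated column, source index or -1 for unmapped output]
    const segments: Array<[number, number]> = [];
    let column = 0; // the generated column resets on every line
    for (const raw of lineMappings.split(",")) {
      if (raw === "") continue;
      const fields = decodeVlq(raw);
      column += fields[0];
      if (fields.length >= 4) {
        sourceIndex += fields[1];
        segments.push([column, sourceIndex]);
      } else {
        segments.push([column, -1]); // 1-field segment: maps to no source
      }
    }
    // Each span runs to the next segment, or to the end of the line.
    for (let i = 0; i < segments.length; i++) {
      const [start, src] = segments[i];
      const end = i + 1 < segments.length ? segments[i + 1][0] : lineLength;
      if (src < 0 || end <= start) continue;
      const name = map.sources[src];
      sizes[name] = (sizes[name] ?? 0) + (end - start);
    }
  });
  return sizes;
}
```

For example, with `sources: ["a.js", "b.js"]`, `mappings: "AAAA,GCAA;ADAA"` and generated code `"abcdef\nxyz"`, `a.js` is credited with 6 characters and `b.js` with 3. Columns here are JavaScript string indices (UTF-16 code units), so non-ASCII output will not match byte counts exactly, in line with the "1 byte per char" assumption.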

Thanks again for your time and for working through this with me.

evanw · 2020-05-27T20:10:52Z

I'm thinking of providing a slightly different format for the information, given that there are likely many use cases for it. I'm also thinking of separating source information from chunks; otherwise you'd potentially end up repeating a lot of information and unnecessarily making the graph output bigger.

Would it be ok for your use case to use a slightly different format as long as the information you need is there? Right now I'm thinking of something like this:

interface Graph {
  sources: {
    [path: string]: {
      byteSize: number
      imports: {
        kind: 'statement' | 'require' | 'dynamic'
        path: string | null
      }[]
    }
  }
  chunks: {
    [path: string]: {
      kind: 'entry'
      byteSize: number
      sources: {
        [path: string]: {
          byteSizeInChunk: number
        }
      }
    }
  }
}

samccone · 2020-05-28T19:02:19Z

Hey Evan, yes this format will work just fine. 👍

If possible, I would like the path names you use as keys to be consistent between the source map file paths and the internal Graph. If full consistency is not possible, an invariant that the source map file paths are a subset of the graph's paths would work as well; otherwise things get quite tricky to align in an automated fashion.

Thanks again.
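The subset invariant described above is easy to check mechanically. Here is a minimal sketch; the function name `pathsMissingFromGraph` is hypothetical, and with the proposed Graph interface the second argument would be something like `Object.keys(graph.sources)`. Note that per the Source Map v3 spec, if a map sets `sourceRoot`, it is prepended to each entry in `sources`, so paths should be resolved that way before comparing.

```typescript
// Hypothetical check: return every source map path that the graph lacks.
// An empty result means the "subset" invariant holds.
function pathsMissingFromGraph(
  sourceMapSources: string[],
  graphPaths: string[],
): string[] {
  const known = new Set(graphPaths);
  return sourceMapSources.filter((path) => !known.has(path));
}

const missing = pathsMissingFromGraph(
  ["app/foo.js", "app/zap.js", "node_modules/lib/index.js"],
  ["app/foo.js", "app/zap.js"],
);
// missing is ["node_modules/lib/index.js"]
```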

evanw · 2020-06-05T20:15:33Z

This will be in the next release. The format I went with is a little different, but pretty much what I described above:

interface Metadata {
  inputs: {
    [path: string]: {
      bytes: number
      imports: {
        path: string
      }[]
    }
  }
  outputs: {
    [path: string]: {
      bytes: number
      inputs: {
        [path: string]: {
          bytesInOutput: number
        }
      }
    }
  }
}

It can easily be transformed into the format you want like this:

// `json` is the parsed metadata object
const edges = []
for (const source in json.inputs) {
  for (const { path: target } of json.inputs[source].imports) {
    edges.push({ source, target })
  }
}

I think you shouldn't need the source map because of the extra information in the metadata, although of course you can continue to require the source map if you'd like. Let me know how I can follow along with the integration.
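As a sketch of how `bytesInOutput` can stand in for the source-map-derived sizes, the following sums each input's bytes across every output and records which outputs contain it. The types mirror only the fields of the Metadata interface above that are used here; the function name `summarizeInputs` and the `InputUsage` shape are hypothetical.

```typescript
// Only the Metadata fields this sketch needs.
interface Metadata {
  inputs: { [path: string]: { bytes: number; imports: { path: string }[] } };
  outputs: {
    [path: string]: {
      bytes: number;
      inputs: { [path: string]: { bytesInOutput: number } };
    };
  };
}

interface InputUsage {
  totalBytesInOutputs: number; // summed over every output containing the input
  outputs: string[]; // paths of the outputs that contain the input
}

function summarizeInputs(meta: Metadata): Record<string, InputUsage> {
  // Prototype-less object so paths like "constructor" are safe keys.
  const usage: Record<string, InputUsage> = Object.create(null);
  for (const outPath of Object.keys(meta.outputs)) {
    const out = meta.outputs[outPath];
    for (const inPath of Object.keys(out.inputs)) {
      const entry =
        usage[inPath] ?? (usage[inPath] = { totalBytesInOutputs: 0, outputs: [] });
      entry.totalBytesInOutputs += out.inputs[inPath].bytesInOutput;
      entry.outputs.push(outPath);
    }
  }
  return usage;
}
```

An input listed under more than one output is one possible sign of code duplicated across bundles, which is the kind of thing bundle-buddy aims to surface.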

evanw closed this as completed in 11e9d5b Jun 5, 2020

evanw mentioned this issue Jun 5, 2020

[Feature] Output bundle analysis #15 (closed)

eulores mentioned this issue Aug 27, 2020

Include Metadata as part of BuildResult #352 (closed)
