Consider using downloads in the last N days, instead of all time. #101

Open
shepmaster opened this issue Feb 11, 2017 · 17 comments
Labels
enhancement: Something new the playground could do
help wanted: Not immediately going to be prioritized; ask for mentoring instructions!

Comments

@shepmaster
Member

shepmaster commented Feb 11, 2017

TL;DR: We use the top 100 crates based on all time downloads to avoid being arbiters of what crates are available. If you think that a "popular" crate should be present on the playground, consider enhancing crates.io to have an API that would prioritize crates downloaded in a more recent interval.

Rationale

See discussion started in #82

I'm hesitant to start cherry-picking crates for inclusion based on my own whims. If I were to do that, I'd certainly blacklist xml-rs in favor of my own XML parser 😈

Additionally, I think about how I'd explain to someone why a given crate is available or not. If it's "someone submitted a PR", then I fear we'd be compiling 1000s of crates every night. If it's "the maintainer picked it", then I fear the backlash directed towards me on public (and private?) forums.

And #113

We've also toyed around with the idea of having "sponsored crates", where someone contributes money towards server costs (which are paid out of our own pocket at the moment), but strangely no one seems excited to pay for it ;-) If you are, I'm sure we can get in contact.

@shepmaster shepmaster mentioned this issue Feb 11, 2017
@sgrif

sgrif commented Feb 11, 2017

All I'm saying is... If you made libsqlite3 available in the sandbox, we would find it super helpful for bug reports. :)

@ArtemGr

ArtemGr commented Feb 11, 2017

Will the "last N days" selection help libsqlite3? How would one check that?

@gsingh93

Only somewhat related to this issue and the discussion in #82, but it seems kind of strange (if I'm understanding this correctly) that updating the top crates list could cause code that used to work to stop working. For example, the 100th crate in the list may drop to 101, the service is redeployed, and now any code snippets that relied on that crate no longer work. That being said, I don't know what the best solution would be.

@shepmaster
Member Author

> 100th crate in the list may drop to 101

Even easier, a crate can simply release a new backwards-incompatible version. Since we always use the more-or-less-latest version, this is quite possible.

> any code snippets that relied on that crate no longer work

This is a good point, but I don't see any good solution unless some benefactor decides to provide financial contribution ;-).

The Big Solution I can think of is to allow arbitrary versions of crates (and we might as well throw in arbitrary Rust versions at that point). Then we are one step away from being a (poor facsimile of a) web-based IDE.

> no longer work

Beneficially, any code is still available, even if it's not executable.

@shepmaster
Member Author

shepmaster commented Feb 28, 2017

> code is still available

Which is part of the reason that I avoid adding the shortlink button and prefer the Gist button — I'm betting that GitHub and Gists will last a long time!

@shepmaster shepmaster added enhancement Something new the playground could do help wanted Not immediately going to be prioritized — ask for mentoring instructions! labels Apr 30, 2017
@shepmaster
Member Author

shepmaster commented May 11, 2017

Looks like the last 90 days right now would be

  id  |          name           | downloads 
------+-------------------------+-----------
  463 | serde                   |   1387484
  795 | libc                    |    880648
  793 | bitflags                |    713088
  363 | lazy_static             |    541543
  524 | rustc-serialize         |    536186
  429 | winapi                  |    519486
  547 | log                     |    500864
 2164 | winapi-build            |    487439
 1339 | rand                    |    487066
  811 | kernel32-sys            |    486698
   35 | gcc                     |    451679
 4746 | num-traits              |    430125
 4697 | thread_local            |    427619
 2233 | regex-syntax            |    423146
 4446 | thread-id               |    413968
 2365 | aho-corasick            |    412091
 2364 | memchr                  |    403324
 1592 | num_cpus                |    400180
  545 | regex                   |    399375
 3231 | utf8-ranges             |    391706
   90 | time                    |    368111
    7 | semver                  |    352012
 2782 | serde_json              |    348980
  109 | url                     |    331922
   11 | toml                    |    312885
 5486 | itoa                    |    310237
 5512 | dtoa                    |    307166
 1368 | strsim                  |    306820
  456 | matches                 |    295491
 1873 | unicode-normalization   |    293407
 1306 | env_logger              |    291379
 1964 | unicode-xid             |    288507
   34 | pkg-config              |    284810
 1343 | byteorder               |    276103
 2361 | unicode-bidi            |    272374
 4747 | num-integer             |    270981
 6274 | syn                     |    268109
 4577 | idna                    |    267905
 4749 | num-iter                |    259747
 1369 | void                    |    259128
 1498 | clap                    |    251366
 6224 | quote                   |    248422
 2906 | rustc_version           |    246079
    9 | num                     |    243416
 1872 | unicode-segmentation    |    241820
 5544 | serde_codegen_internals |    227500
  399 | ansi_term               |    227338
 2016 | vec_map                 |    226511
 5535 | term_size               |    216202
 2556 | cfg-if                  |    216098
 1761 | traitobject             |    213957
 1869 | unicode-width           |    212723
 2291 | unreachable             |    209161
 6169 | serde_derive            |    208523
   41 | openssl-sys             |    196498
  327 | hyper                   |    194419
 1432 | httparse                |    186600
  121 | mime                    |    185423
  657 | term                    |    179560
  749 | unicase                 |    173259
  836 | language-tags           |    171383
   13 | typeable                |    165404
  120 | chrono                  |    163677
 8587 | synom                   |    160192
 1447 | tempdir                 |    156973
  231 | openssl                 |    151621
  546 | getopts                 |    148962
  835 | syntex_syntax           |    141851
 2725 | cmake                   |    134404
 4751 | num-rational            |    130499
   10 | uuid                    |    128982
 5571 | syntex_pos              |    127034
 5572 | syntex_errors           |    126549
 3019 | quick-error             |    116168
 2418 | net2                    |    115828
 4748 | num-complex             |    115238
 4750 | num-bigint              |    114431
   39 | libz-sys                |    113267
 2934 | crossbeam               |    112450
 2028 | filetime                |    110147
 2392 | slab                    |    110006
   37 | flate2                  |    109621
  314 | itertools               |    108840
    8 | glob                    |    107591
   56 | nix                     |    106651
   62 | mio                     |    105288
 7118 | redox_syscall           |    104466
   36 | miniz-sys               |    103141
 2303 | pulldown-cmark          |    103108
  174 | docopt                  |    102082
  326 | cookie                  |    100972
 3608 | rayon                   |     99444
  534 | deque                   |     97841
 4891 | error-chain             |     95444
 3119 | walkdir                 |     95152
 2572 | atty                    |     93841
 5045 | semver-parser           |     86274
  538 | curl-sys                |     84881
 2349 | backtrace-sys           |     84820
 5060 | rustc-demangle          |     83222
(100 rows)
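
A minimal sketch of how a "last N days" ranking like the one above could be computed, assuming download counts are stored per crate per day. The row shape `(crate_name, day, downloads)`, the function name `top_recent`, and the sample numbers are all invented for illustration; this is not the crates.io schema, nor the query that produced the table.

```python
from collections import Counter
from datetime import date, timedelta


def top_recent(daily_rows, today, days=90, limit=100):
    """Rank crates by downloads in the `days` days ending on `today`.

    `daily_rows` is an iterable of (crate_name, day, downloads) tuples.
    That shape is an assumption made for this sketch.
    """
    cutoff = today - timedelta(days=days)
    totals = Counter()
    for name, day, downloads in daily_rows:
        # Keep only rows inside the half-open window (cutoff, today].
        if cutoff < day <= today:
            totals[name] += downloads
    # Sort by downloads (descending), then by name so ties are deterministic.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))[:limit]


# Invented sample data, not real download counts.
rows = [
    ("serde", date(2017, 5, 10), 15000),
    ("serde", date(2017, 1, 1), 90000),  # older than 90 days, so ignored
    ("libc", date(2017, 5, 9), 9000),
    ("libc", date(2017, 4, 1), 8000),
]
recent = top_recent(rows, today=date(2017, 5, 11))
```

With these rows, `libc` ranks above `serde` in the recent window even though `serde` has more downloads overall, which is the kind of reordering a windowed ranking produces.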

@mitchhentges

I'm a little nervous about using "last N days" as the factor to decide which crates to make available, because it will favour recently-updated crates. Of course, incentivizing actively-developed libraries is good, but I'm worried that stable, rock-solid crates that are used frequently will drop off, which is :(

I'm assuming that the underlying goal is to make the "100 most popular/used libraries" available. However, a crate can be used by a ton of projects, but if it isn't updated, then each project using the dependency won't download it again until the next release. A frequently-updated project, on the other hand, will rack up downloads more quickly, as devs will be re-downloading the package to get the new version.

For example, I consider itertools to be a useful, useful-for-many-projects dependency (even if I'm bad at using it 😉), but due to its slower release cycle, it could easily fall off the crates.io list.

(At the same time, perhaps incentivizing frequently-updated crates will encourage new libraries to be developed and increase competition, but I'm not sure the Rust playground should be the platform to push for that type of objective)
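
The release-cadence concern can be put into rough numbers. This is a deliberately crude model that assumes every active user downloads each new release exactly once and nothing else (no CI, no fresh checkouts); the function name and the figures are made up for illustration.

```python
def window_downloads(active_users, releases_in_window):
    """Downloads in a window if every user fetches each new release exactly once.

    This is an assumed toy model, not how crates.io traffic actually behaves.
    """
    return active_users * releases_in_window


# A widely used crate with a single release in the window...
stable = window_downloads(active_users=10_000, releases_in_window=1)
# ...versus a crate with a fifth of the users but eight releases.
busy = window_downloads(active_users=2_000, releases_in_window=8)
```

Under these assumptions the less widely used crate records 16,000 downloads against 10,000 for the stable one, so it would rank higher in a "last N days" list.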

@devonhollowood

Maybe it makes sense to allow the use of the union of "top crates of all time" and "top crates from the past 90 days"? That way you can cover crates that are very popular but aren't updated frequently, and also allow people to experiment with and share code from the newest, hottest crates. And there's going to be a lot of overlap, so you won't be hosting too many crates.
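
Covering both groups means taking every crate that appears in either top list, that is, a set union. A small sketch of that selection, assuming both rankings are already available as ordered lists of crate names; the function name `playground_crates` and the sample lists are hypothetical.

```python
def playground_crates(all_time_ranked, recent_ranked, limit=100):
    """Return crates in the top `limit` of either ranking, all-time order first."""
    selected = list(all_time_ranked[:limit])
    seen = set(selected)
    for name in recent_ranked[:limit]:
        # Only crates missing from the all-time list add to the total.
        if name not in seen:
            selected.append(name)
            seen.add(name)
    return selected


# Invented rankings, shortened to keep the example readable.
all_time = ["libc", "serde", "itertools", "rustc-serialize"]
recent = ["serde", "libc", "syn", "quote"]
combined = playground_crates(all_time, recent, limit=3)
```

The result has between `limit` and `2 * limit` entries; the more the two rankings overlap, the closer it stays to `limit`.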

@sgrif

sgrif commented Mar 9, 2018

Downloads will always be an imperfect metric that over-emphasizes crates which are likely to be dependencies of other crates, regardless of whether it's all-time or recent.

For example, Rocket is easily one of the most popular crates out there -- or at least one of the most talked about. Nearly a third of the front page posts on /r/rust mention it in the comments at least once. However, it is only ranked 511 on all time downloads, and 443 in recent downloads.

Or to put it from another perspective -- Diesel has a handful of dependencies (only one is actually mandatory, but in practice anybody relying on Diesel will have at least 5). Let's assume for the moment that all users of Diesel have the chrono feature enabled (which is actually mostly true). Since all users of Diesel will also be using chrono, no matter how many people are using Diesel, it will never be able to beat chrono on the download rankings (even if some people are only using chrono because of Diesel). So at best it can rank #2. However, it doesn't stop there. Chrono has 3 dependencies. It will never rank higher than any of those three crates. Of those 3 second level dependencies, there are a total of 6 third level dependencies. I stopped counting there, but Cargo.lock says it's 18 crates total for the one crate.

My point being, if a crate simply chooses to allow integration with chrono (or even just uses it internally for something), that crate will now at absolute best be ranked 19th in downloads, regardless of whether it's ranked by recent or all time.
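
The ceiling described above can be reproduced with a toy simulation. It assumes every fresh install of a crate also downloads all of its transitive dependencies exactly once; the dependency graph is a hypothetical, heavily trimmed stand-in for the real one, and the install counts are invented.

```python
from collections import Counter


def transitive_deps(crate, deps):
    """Return every crate reachable from `crate` through `deps`, excluding direct self-reference."""
    found = set()
    stack = list(deps.get(crate, ()))
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(deps.get(current, ()))
    return found


def count_downloads(installs, deps):
    """Count one download per crate for each fresh install of a top-level crate."""
    counts = Counter()
    for crate, times in installs.items():
        counts[crate] += times
        for dep in transitive_deps(crate, deps):
            counts[dep] += times
    return counts


# Hypothetical graph: the real dependency trees are larger than this.
deps = {"diesel": ["chrono"], "chrono": ["num", "time"], "time": ["libc"]}
# Invented numbers: many direct users of diesel, few direct users of chrono.
installs = {"diesel": 1000, "chrono": 50}
counts = count_downloads(installs, deps)
```

Even with twenty times more direct users, `diesel` ends up with 1,000 downloads while `chrono` and everything beneath it end up with 1,050, so `diesel` cannot outrank any of them.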

@spease

spease commented Jul 10, 2018

I was checking to see if nom was supported. It looks like it has enough downloads to be in that top 100 list above, but I guess that list above is out-of-date now?

It also seems like a pretty well-known crate for it to be excluded.

I'd wonder if there's a way of working categories into this - maybe the top 10 crates in each category should be included - although then it raises the issue of what the categories are, and who defines them. :) That way you get an even spread instead of favoring.

Another metric I could see factoring into this that would be harder to track - you could track how often somebody tries to include a crate and fails, or how often a crate actually gets included. This would then over time start to favor stuff that people often try to use the playground for. You couldn't directly compare failures and successes (once a crate started to work, it would likely start getting used more often) but maybe there's a multiplier that could be used to approximate a ranking of all currently-included/currently-excluded crates.
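
One way the multiplier idea could look, assuming the playground recorded how often each crate was used successfully and how often an attempt to use it failed. No such tracking is described in this thread; the function name, the `failure_weight` value, and the counts are all invented for illustration.

```python
from collections import Counter


def demand_ranking(successes, failures, failure_weight=3.0):
    """Rank crates by successful uses plus weighted failed attempts.

    `failure_weight` is a made-up multiplier standing in for "a failed
    attempt is worth this many successful uses".
    """
    scores = Counter()
    for name, count in successes.items():
        scores[name] += count
    for name, count in failures.items():
        scores[name] += failure_weight * count
    # Highest score first; names break ties so the order is deterministic.
    return sorted(scores, key=lambda name: (-scores[name], name))


# Invented counts: included crates succeed, excluded crates only fail.
successes = {"serde": 900, "rand": 400}
failures = {"nom": 200, "rocket": 120}
ranking = demand_ranking(successes, failures)
```

With a weight of 3, the excluded `nom` (200 failures, score 600) lands between `serde` and `rand`. Choosing the weight is the hard part, since it decides how quickly an excluded crate can displace an included one.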

@sollyucko

> Downloads will always be an imperfect metric that over-emphasizes crates which are likely to be dependencies of other crates, regardless of whether it's all-time or recent.

Hmm... If we assume that every time a crate is downloaded, all of its recursive dependencies are downloaded with it (including optional dependencies?), could there be a metric where they are subtracted to avoid double-counting?

@shepmaster
Member Author

@sollyucko I'm not weighing in on whether that would be better or worse, but it's not as straightforward as stated due to changing dependencies over time. For example, if crate A v1 depends on B v1 and then A v2 drops the dependency, you need to account for that.

> assume that every time a crate is downloaded, all of its recursive dependencies are downloaded

It's unclear if it matters, but this assumption isn't true. If crates A and B both depend on Z and I install A, I'll download (A, Z); when I then install B, I'll only download (B). Removing the counts of both A and B from Z would lead to double subtracting.
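
The A/B/Z case can be worked through in a few lines. This sketch models a single user whose local cache keeps every crate once fetched, then applies the naive "subtract each dependent from its dependencies" adjustment; both function names are hypothetical and only direct dependencies are modeled.

```python
from collections import Counter


def simulate_installs(order, deps):
    """Count downloads for one user whose local cache keeps fetched crates."""
    cache = set()
    counts = Counter()
    for crate in order:
        for name in [crate, *deps.get(crate, [])]:
            # A crate already in the cache is not downloaded again.
            if name not in cache:
                cache.add(name)
                counts[name] += 1
    return counts


def naive_adjusted(counts, deps):
    """Subtract each dependent's downloads from its direct dependencies."""
    adjusted = Counter(counts)
    for crate, crate_deps in deps.items():
        for dep in crate_deps:
            adjusted[dep] -= counts[crate]
    return adjusted


deps = {"A": ["Z"], "B": ["Z"]}
counts = simulate_installs(["A", "B"], deps)
adjusted = naive_adjusted(counts, deps)
```

Each of A, B, and Z is downloaded once, but the adjustment takes one away from Z for A and another for B, leaving Z at -1.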

@greggman

greggman commented Jan 3, 2024

I have very little background here but ... I recently ran into an issue where all the examples in these docs for wasm-bindgen no longer run. If I understand correctly, the issue is that at one point wasm-bindgen was in this list and was then automatically removed when it fell off the list. That process in general seems undesirable as then any page using the playground could break at any time it updates.

If an algorithm for keeping things on the list forever is unacceptable, maybe it would be better if the rust-playground didn't include any crates, period, and the docs were updated to explain how to add a list of crates for a specific deployment. Then, each user of the playground could include the crates they need with their build (maybe this is already documented? This is my first brush with rust-playground)

update

I forgot that for most people they're relying on this one global server setup which only has one set of crates. That complicates the issue. Still, currently you write some code that requires a crate and tomorrow that code no longer runs because the crate fell off the list. That seems like a problem.
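
To make the breakage and the "keep things on the list forever" alternative concrete, here is a small sketch that compares two snapshots of the crate list. The function names and the sample lists are hypothetical, and it assumes each snapshot is a list of unique crate names.

```python
def dropped_crates(previous, current):
    """Crates that were available before the update but are missing after it."""
    current_set = set(current)
    return [name for name in previous if name not in current_set]


def sticky_update(previous, current):
    """Keep everything that was ever available and append new arrivals."""
    previous_set = set(previous)
    return list(previous) + [name for name in current if name not in previous_set]


# Invented snapshots of the available crates before and after an update.
before = ["serde", "libc", "wasm-bindgen"]
after = ["serde", "libc", "syn"]

broken = dropped_crates(before, after)
sticky = sticky_update(before, after)
```

Here `broken` is `["wasm-bindgen"]`, and the sticky list keeps it alongside the newcomer `syn`. The cost is that a sticky list only ever grows, and it does not protect against a crate releasing a backwards-incompatible version.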

@shepmaster
Member Author

I'm pretty sure this is the first time I've heard the argument that since sometimes old code won't work, we should remove all crates for everyone, forever. Feels a lot like Solomon cutting the baby in half.

There's nothing stopping people from running their own deployments of the playground, but it's certainly not the case we've optimized for.

However, your point doesn't really have much to do with this issue: the issue is about changing the algorithm, while your suggestion is to abandon the algorithm completely. It'd be better to discuss it in a separate issue.

@greggman

greggman commented Jan 4, 2024

My point was that the current algorithm is guaranteed to break stuff. People post on the playground and reference it, then others go to look and the sample no longer runs. That's super confusing and not good for Rust's reputation relative to other languages' playgrounds.

@spease

spease commented Jan 4, 2024 via email

@BenjaminBrienen

It should include the top 1200 most downloaded crates of all time so that test_case would be included. 😉

10 participants