Skip to content
This repository has been archived by the owner on Dec 1, 2024. It is now read-only.

Napi rewrite #540

Merged
merged 42 commits into from
Dec 15, 2018
Merged

Napi rewrite #540

merged 42 commits into from
Dec 15, 2018

Conversation

ralphtheninja
Copy link
Member

@ralphtheninja ralphtheninja commented Dec 9, 2018

Why?

N-API is the latest interface for native addons in node.js. It's a set of c functions that lie between your native addon and v8. Main benefit is that your addon is independent of the v8 version, which means it will be compatible with future versions of node.js.

So, as long as your native code doesn't change, it doesn't need to be recompiled as new versions of node.js are released. This helps greatly with both maintenance and also when delivering code to the users of your module.

The downside is a loss of performance since you're no longer interfacing directly with v8.

Porting to N-API

There's been some work done already to port leveldown to N-API but I decided to give it a go myself. Mainly because accepting a huge PR as a maintainer means a lot of work, it's not much more work to actually do it yourself.

Also, personally, I've always felt that the v8 "lingo" and abstractions were difficult to understand. Basically, you need to understand how v8 works (to some extent) in order to write good native addons. Maybe it was just over my head.

Design

However, I think "workers" as a concept is easy to understand, so I've tried to keep the code similar to the nan approach in this sense.

But why all these C++ classes/structs? Why not use c structs? They are simpler!

LevelDB is a C++ library and the glue code was already written in C++. Also, I personally like virtual methods, which really come in handy when creating workers. You can abuse virtual methods as well, but if you keep it to a minimum (no multiple inheritance) it can be quite elegant.

Another reason for this was for simplicity during rewrite. The main focus when migrating from nan to N-API should be at the intersection of js and c++ land. The code in between should really not have to change that much.

Why one binding.cc? The file has so many lines!

Correct. At first I wanted to keep a single file to make it easier during rewrite, but the more I thought of it, the more it made sense to keep everything in one amalgamated file. I'd prefer to keep it this way. It's so much easier to find things. Each async method has its worker right next to it.

Feedback

How can we help you review this?

Since this PR is quite big, I'd prefer to keep review at a bigger level. Please let me know if there's something in the code that's a big no no, e.g. if I'm using the wrong abstractions.

Input such as "Did you know you can do X instead of Y and then you don't have to Z?" is also very helpful.

Another important thing is that when you rewrite things and there are many details to keep in your head, you often forget things. So if you can find something that I have forgotten, that'd be great.

One thing that would be greatly needed is also help with finding bugs. It would be really helpful if you can make the code segfault. Also running benchmarks to compare with the previous implementation would be nice.

Some stats

I tried running test-for -n 100 for pure tests (e.g. only the tape tests, without standard etc) and got the same average for both the master and the napi branch (~3.8s/test). Timing tests is most likely not a good measure, but it's a indication at least.

Total lines of native code in src/:

$ wc -l src/*
   32 src/async.h
   17 src/batch_async.cc
   26 src/batch_async.h
  148 src/batch.cc
   40 src/batch.h
   33 src/common.h
  290 src/database_async.cc
  168 src/database_async.h
  514 src/database.cc
  108 src/database.h
   86 src/iterator_async.cc
   43 src/iterator_async.h
  613 src/iterator.cc
   94 src/iterator.h
   38 src/leveldown_async.cc
   34 src/leveldown_async.h
   70 src/leveldown.cc
  117 src/leveldown.h
 2471 total

Compared to binding.cc:

$ wc -l binding.cc
1772

@@ -37,8 +37,8 @@ test('test destroy non-existent directory', function (t) {
rimraf(location, { glob: false }, function (err) {
t.ifError(err, 'no error from rimraf()')

leveldown.destroy(location, function () {
t.is(arguments.length, 0, 'no arguments returned on callback')
leveldown.destroy(location, function (err) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This behavior has changed. The callback now return null.

@@ -55,8 +55,8 @@ test('test destroy non-existent parent directory', function (t) {

t.notOk(fs.existsSync(parent), 'parent does not exist before')

leveldown.destroy(location, function () {
t.is(arguments.length, 0, 'no arguments returned on callback')
leveldown.destroy(location, function (err) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here.

/**
* Returns a context object for a database.
*/
NAPI_METHOD(db_init) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed the model a bit. Either a NAPI_METHOD returns a context, which all methods ending with _init does, or a method takes a context. This is different from the previous implementation, where you had methods on objects, sort of like a js object if you will. This is more c style.

/**
* Puts a key and a value to a database.
*/
NAPI_METHOD(db_put) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All NAPI_METHOD's have their corresponding worker next to it. So it's easy to find.

}

virtual void DoExecute () {
SetStatus(database_->Get(options_, key_, value_));
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here I mimic the behavior of the AsyncWorker class in nan. We store status and if status is ok, we call back will null as a default action. In case a method wants to call back with other data, e.g. in NAPI_METHOD(db_get), its corresponding worker (GetWorker) will override HandleOKCallback(), see below.

/**
* Worker class for deleting a value from a database.
*/
struct DelWorker : public BaseWorker {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you wonder why I'm using struct and not class here. A struct in C++ is a class. The only difference is default access. On a struct everything is public by default. On a class it's private. I picked struct for simplicity at first and I don't really think it's a big deal. I know some C++ devs will probably object. This is more js style I guess 😄

@@ -39,17 +39,10 @@
"<(module_root_dir)/deps/leveldb/leveldb.gyp:leveldb"
],
"include_dirs" : [
"<!(node -e \"require('nan')\")"
"<!(node -e \"require('napi-macros')\")"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️ @mafintosh


function ChainedBatch (db) {
AbstractChainedBatch.call(this, db)
this.binding = db.binding.batch()
this.context = binding.batch_init(db.context)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's how we do it, regardless of where we are. We use an _init method to get a context.

@@ -43,7 +44,7 @@
"verify-travis-appveyor": "^3.0.0"
},
"scripts": {
"install": "prebuild-install || node-gyp rebuild",
"install": "node-gyp-build",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

node-gyp-build checks if the module can be require()'d, if it fails it will recompile.

@ralphtheninja
Copy link
Member Author

Picking some random people for review. It's fine if you don't have time or energy to review 😄

@ralphtheninja ralphtheninja mentioned this pull request Dec 9, 2018
1 task
@vweevers vweevers mentioned this pull request Dec 9, 2018
@vweevers
Copy link
Member

vweevers commented Dec 9, 2018

Why one binding.cc? The file has so many lines!

Correct. At first I wanted to keep a single file to make it easier during rewrite, but the more I thought of it, the more it made sense to keep everything in one amalgamated file. I'd prefer to keep it this way. It's so much easier to find things. Each async method has its worker right next to it.

I opened #542 for further discussion after this PR; personally I don't agree.

binding.cc Outdated Show resolved Hide resolved
@juliangruber
Copy link
Member

Wow, what a herculean effort! ❤️

@vweevers
Copy link
Member

vweevers commented Dec 9, 2018

There's a memory leak: running node --expose-gc test/leak-tester.js for a few minutes shows it growing to 1000% of initial memory usage.

getCount = 0 , putCount =  0 , rss = 101% 26M ["0","0","0","0","0","0","0"]
getCount = 1000 , putCount =  1000 , rss = 119% 31M ["0","0","0","0","0","0","0"]
..
getCount = 569000 , putCount =  553100 , rss = 1001% 262M ["4","13","126","423","0","0","0"]

Same with node --expose-gc test/leak-tester-batch.js, but at a slower rate.

@ralphtheninja ralphtheninja changed the title Napi rewrite [wip] Napi rewrite Dec 9, 2018
@ralphtheninja
Copy link
Member Author

There's a memory leak: running node --expose-gc test/leak-tester.js for a few minutes shows it growing to 1000% of initial memory usage.

I'll have a look. Any tools we can use to track stuff down? This is one of my Achilles' heels.

@ralphtheninja
Copy link
Member Author

@vweevers Which version of node are you running? Just making sure we use the same setup.

@vweevers
Copy link
Member

vweevers commented Dec 9, 2018

It was 8.6.0 x64 on Windows. I also tried 8.14.0, doesn't exhibit the same growth rate. I'm running 10.14.1 now, currently at 3 million gets but holding steady at ~400% memory usage.

Edit: still steady at 6 million gets, I stopped it. Not sure how to interpret these results.

@ralphtheninja
Copy link
Member Author

Hmm this is strange. I'm on v10.14.1 and I'm running up to 1000% even on the master branch.

@vweevers
Copy link
Member

vweevers commented Dec 9, 2018

Hmm this is strange. I'm on v10.14.1 and I'm running up to 1000% even on the master branch.

Ai.

@ralphtheninja
Copy link
Member Author

And some more plots. Difficult to see because it's a lot of data (four files, each ~60Mb). The last one looks a bit funny in scale because of an extreme outlier.

write-random-20181215-12 59 03
write-sorted-20181215-12 54 09

@ralphtheninja
Copy link
Member Author

@vweevers Question is now, do we squash this or make a merge commit? I'd personally like to merge and keep the rewrite history.

@vweevers
Copy link
Member

I'd personally like to merge and keep the rewrite history.

Sure 👍

@vweevers
Copy link
Member

@ralphtheninja I think it's time for our perfect Neil Patrick Harris gif. I'll let you do the honors.

@ralphtheninja
Copy link
Member Author

done

@ralphtheninja ralphtheninja merged commit c2fefcf into Level:master Dec 15, 2018
@ralphtheninja ralphtheninja deleted the napi branch December 15, 2018 13:54
@ralphtheninja
Copy link
Member Author

@vweevers I'm curious how the napi stuff works on node 6, e.g. the build https://travis-ci.org/Level/leveldown/jobs/468365583

Has napi been backported to node 6?

@vweevers
Copy link
Member

Has napi been backported to node 6?

Yes, but I think it's still experimental, see https://nodejs.org/api/n-api.html#n_api_n_api_version_matrix

@ralphtheninja
Copy link
Member Author

Yes, but I think it's still experimental, see https://nodejs.org/api/n-api.html#n_api_n_api_version_matrix

Aaah ✔️

But apparently, it does work :)

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants