
Data writing latency question #87

Open
polygan opened this issue Dec 12, 2011 · 4 comments

Comments


polygan commented Dec 12, 2011

Hi,

I have set up BigCouch on four servers to form a cluster, and I use another server as the client (running YCSB), which sends a large volume of requests to the BigCouch cluster. The client runs 10 threads. I set q=4, n=2, and r=w=1.
For the load balancer, I use Nginx to forward the requests.
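
A minimal sketch (not from the thread; the host, port, and database name are placeholders) of how those parameters map onto BigCouch's HTTP API: q and n are fixed when the database is created, while r and w are per-request quorum overrides.

import requests

BASE = "http://nginx-lb:5984"   # Nginx front end balancing the four nodes

# q (shards) and n (replicas) are set once, at database creation.
requests.put(BASE + "/benchmark", params={"q": 4, "n": 2})

# r and w are per-request quorum parameters.
doc = {"jsonType": "usertable", "MO_ID": 42}
requests.put(BASE + "/benchmark/user-0001", json=doc, params={"w": 1})
requests.get(BASE + "/benchmark/user-0001", params={"r": 1})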

In my experiment, I only got an average latency of 12 milliseconds (ms) and a throughput of 1000 records/second.
I would like to know how I can get lower write latency.

Thanks.

mlmiller commented Jan 31, 2012

Hi Polygan,

Would you be willing to share the tool that you used to profile? I will also point you to a simple python script I have that can drive about 6k writes/second from a few threads. Additionally, what type of hardware are you running this on? By default we flush every write to disk, although this is configurable on a per-transaction basis by passing in a custom HTTP header. If you are running on slow spinning disks (e.g. EC2 ephemeral or EBS), then each seek generally takes 10 ms, which sets the minimum latency for a write. Your options are to move to better spinning disks (such as our Softlayer based clusters on cloudant.com), SSDs, or to flush to disk asynchronously by adding the custom header in your write requests. However, before we go down those paths I'd like to take a look at the script you use to generate the load.
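
For illustration, a minimal sketch (not from the thread; the URL and document are placeholders) of the per-request flush control described above: CouchDB/BigCouch honor an X-Couch-Full-Commit header, so setting it to "false" asks the server not to fsync that particular write.

import requests

resp = requests.put(
    "http://nginx-lb:5984/benchmark/user-0002",
    json={"jsonType": "usertable"},
    headers={"X-Couch-Full-Commit": "false"},  # skip the per-write fsync
)
print(resp.status_code, resp.json())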


polygan commented Jan 31, 2012

Hi Miller,

I use YCSB (Yahoo! Cloud Serving Benchmark) to profile BigCouch write and query performance. In this setup, 10 threads per client are used to generate load, and I use Nginx as the load balancer.

The hardware is as follows:
4-node cluster
CPU: quad-core 2.4 GHz Xeon per server
Mem: 4 GB per server
HD: 3 × 1 TB per server, 7200 RPM

My YCSB-BigCouch package is too large, so I am just sending you the insert function.


// Snippet from the YCSB workload driver. `table`, `_random`, and
// `fragmentnames` are fields of the enclosing class; org.json and
// YCSB's DB class are assumed to be on the classpath.
public boolean doInsert(DB db, Object threadstate) {

    UUID uid = UUID.randomUUID();

    String dbkey = "user" + uid.toString();
    HashMap<String, String> values = new HashMap<String, String>();
    JSONObject base = new JSONObject();

    JSONArray fragments = new JSONArray();
    JSONObject fragment = new JSONObject();
    try {
        base.put("jsonType", table);
        byte[] temp = new byte[30];
        _random.nextBytes(temp);
        // byte[].toString() returns the array's identity hash, not its
        // contents, so hex-encode the random bytes instead.
        fragment.put("DN", "Company-G4S/SUNTIOVehicle-CTJ178/S"
                + new java.math.BigInteger(1, temp).toString(16));
        _random.nextBytes(temp);
        fragment.put("name", "Telematics Data "
                + new java.math.BigInteger(1, temp).toString(16));
        fragment.put("Id", "CEENG1100400511_" + _random.nextLong());
        fragment.put("type", "com.nsn.telematics.cumulocity_agent.UDPStreamerWithUDP");

        JSONArray valuesArray = new JSONArray();
        for (int i = 0; i < 7; ++i) {
            JSONObject value = new JSONObject();

            value.put("UNIT", JSONObject.NULL);
            value.put("NAME", fragmentnames[i]);
            if (i % 2 == 0)
                value.put("VALUE", Math.abs(_random.nextLong()));
            else
                value.put("VALUE", _random.nextDouble()); // already in [0, 1)
            value.put("QUANTITY", JSONObject.NULL);
            valuesArray.put(value);
        }
        fragment.put("values", valuesArray);
        fragments.put(fragment);
        base.put("fragments", fragments);
        base.put("MO_ID", Math.abs(_random.nextInt(3000)));
    } catch (JSONException e) {
        e.printStackTrace();
        return false; // don't insert a half-built document
    }

    values.put(dbkey, base.toString());

    // YCSB's DB.insert() returns 0 on success.
    return db.insert(table, dbkey, values) == 0;
}


polygan commented Feb 7, 2012

Hi,

Is there any follow-up? Thanks.

Regards,
Ke-yan



mlmiller commented Feb 7, 2012

Hi Ke-yan,

Sorry for the slow reply. Again, there are many things you can do to speed up the response. Here's a Python tool that demonstrates how to use bulk requests and stale=ok to achieve at least 5k writes/sec on a 3-node cluster backed by EC2 ephemeral disks. You can get a further performance bump by relaxing the fsync to once per second instead of once per request by adding the X-Couch-Full-Commit: false header, but that's a bit more dangerous -- if your server shuts down abruptly you could lose a full second's worth of data.

Take a look at: https://github.com/cloudant/public-examples/tree/master/importer for a simple importer benchmarking tool.
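
For illustration, a minimal sketch (assumptions: host, port, and database name) of the batching described above: _bulk_docs amortizes HTTP and fsync overhead across many documents, and the X-Couch-Full-Commit: false header relaxes the per-request fsync, at the durability cost noted above.

import requests

docs = [{"jsonType": "usertable", "MO_ID": i} for i in range(1000)]
resp = requests.post(
    "http://nginx-lb:5984/benchmark/_bulk_docs",
    json={"docs": docs},
    headers={"X-Couch-Full-Commit": "false"},  # fsync ~once/sec, not per request
)
resp.raise_for_status()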
