Additional test types #133

bhauer · 2013-04-11T15:01:29Z

bhauer
Apr 11, 2013

We plan to add new test types over time. The following is a summary of tests we have presently and those we plan to specify and implement in the future.

Present: JSON Serialization, in which a trivial newly-instantiated object is serialized to JSON.
Present: Single database query, in which a single random row is fetched (via the framework's ORM) from a simple database table containing 10,000 rows and then serialized to JSON.
Present: Multiple database queries, which is similar to the previous test but allowing the number of random rows to be specified as a URL parameter with the results rendered as a JSON list.
Present: Server-side template and collections test. This involves retrieving a small number of rows from the database, sorting within the application code (not within the database), and rendering to HTML via server-side templates. No external assets will be referenced by the templates. This is detailed in issue Test 4: Server-side templates and collections #134.
Present: Database update test. This is a variation of test 2. A single row will be fetched via the ORM, some trivial math will be applied to the random number field of the row, and then the object will be persisted using the ORM. This is intended to exercise the ORM's ability to persist rows, so the trivial math isn't applied directly to the row using SQL. This is detailed in issue Test 5: Updates #263.
Present: Small plaintext responses. This is detailed in issue New test type: Plaintext with pipelining #290.
Future: Caching test. Testing caching might begin with a variation of test 2 using the framework's caching capability, but we will also want to test caching results of more complex query operations. See Proposal for next test type: cached multiple database queries #374. This is likely to be the next test type.
Future: Server-side templates with assets. This will extend test 4 and add to-be-determined assets, at least composed of a style-sheet (CSS), but possibly also including JavaScript. Performance-wise, this likely won't differ much from test 4. However, it will be an opportunity for readers to dig into the code and observe the frameworks' variety of approaches for handling assets.
Future: Compression tests. Add gzip or deflate compression to one or more tests.
Future: SSL tests. Add SSL to one or more tests. This is detailed in issue TLS/SSL test type(s) #3290.
Future: WebSocket enabled tests. (High concurrency is desirable here.)
Future: Tests that exercise requests made to external services and therefore must go idle until the external service provides a response. (High concurrency is desirable here.)
Future: JSON responses with larger workloads (complex data structure serialization).
Future: Transactional update test. See Question: Why is Update test not transactional? #326.
Future: Large plaintext responses.
Future: Complex routing map test. Require a given number of routes to be present to exercise the overhead of a larger routing map/table/tree.
Future: Heavy model test, involving a larger number of entity objects and classes, as suggested by @methane in comments below.
Future: CSRF protection and form processing test as suggested by @michaelhixson below. Note that @wg of Wrk fame has made a special version that selects from a list of requests and might allow us to run this test with Wrk.
Future: Large static response test as suggested by @weltermann17 below.
Future: Static file serving, to exercise the performance of the web-server. To be clear, this test would be expected to bypass the framework where applicable and be served directly by the web server or application server, whichever is available and best suited.
Future: Penetration test(s). This would require additional client-side testing tools (beyond the load generator we use today), but would validate the security of the platform and framework combination.
Future: TCP-heavy test. This test would mirror the Plaintext test but eliminate both pipelining and keep-alive—each request would need to be connected via a TCP socket and disconnected. It may be the case that such a test runs into networking-layer limits in the Linux TCP stack, so we'll need to be prepared to do some tuning there.
Future: HTTP/2 pipelined tests. This would likely be a variant of one of the present-day simple tests (JSON Serialization or Plaintext or both), since the presumed intent of such a test would be to measure the overhead impact/benefit of switching from HTTP v1 to v2.
Future: File upload. See New benchmark type - file upload #4829.

For the time being, we're still interested in relatively simple tests that exercise various components of the frameworks. But we're also interested in hearing your thoughts on more tests for the long term. If you have any ideas, please post them here.
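To make the present database tests more concrete, here is a minimal sketch of the single-query, multiple-query, and update tests. It is an illustration only: it uses Python's built-in sqlite3 module in place of a framework's ORM, and the table name `world`, the column `random_number`, and the "add one" step are assumptions made for the example, not part of the official requirements.

```python
import json
import random
import sqlite3

ROW_COUNT = 10_000  # table size stated in the test description


def make_db():
    """Create an in-memory table with 10,000 rows (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE world (id INTEGER PRIMARY KEY, random_number INTEGER)"
    )
    conn.executemany(
        "INSERT INTO world (id, random_number) VALUES (?, ?)",
        ((i, random.randint(1, ROW_COUNT)) for i in range(1, ROW_COUNT + 1)),
    )
    conn.commit()
    return conn


def fetch_random_row(conn):
    """Fetch one randomly chosen row and return it as a dict."""
    row_id = random.randint(1, ROW_COUNT)
    found_id, number = conn.execute(
        "SELECT id, random_number FROM world WHERE id = ?", (row_id,)
    ).fetchone()
    return {"id": found_id, "randomNumber": number}


def single_query(conn):
    """Single database query test: one random row, serialized to JSON."""
    return json.dumps(fetch_random_row(conn))


def multiple_queries(conn, count):
    """Multiple queries test: `count` random rows rendered as a JSON list."""
    return json.dumps([fetch_random_row(conn) for _ in range(count)])


def update(conn):
    """Update test: read a row, change it in application code, persist it."""
    row = fetch_random_row(conn)
    row["randomNumber"] += 1  # trivial math done in the app, not in SQL
    conn.execute(
        "UPDATE world SET random_number = ? WHERE id = ?",
        (row["randomNumber"], row["id"]),
    )
    conn.commit()
    return json.dumps(row)
```

A real implementation would sit behind an HTTP handler and read `count` from the URL parameter; that wiring is framework-specific and is left out here.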

robertmeta · 2013-04-12T03:34:13Z

robertmeta
Apr 12, 2013

Any chance of getting more significant levels of concurrency being tested? At least 1,000+ concurrency, and ideally 10,000+. Concurrency seems to be exceptionally under-represented in these tests.

2 replies

forgaoqiang Jul 21, 2021

Such high concurrency seems useless for a web framework; maybe 100~200 is the max.

boazsegev Jul 21, 2021

Such high concurrency seems useless for a web framework; maybe 100~200 is the max.

Respectfully, I disagree.

High concurrency can easily end up flooding a web framework, especially where pub/sub patterns are implemented, as well as when long polling or WebSocket reconnection patterns are required.

The level of concurrency translates to money (operating costs) and deployment considerations when choosing a framework, since a framework that handles higher concurrency needs fewer server instances (e.g., dynos on Heroku).

I would love to see a 10K analysis, as it would definitely have an impact on the choices people make.

bhauer · 2013-04-12T04:03:32Z

bhauer
Apr 12, 2013
Author

Hi @robertmeta. Thanks for the input! Would you mind reviewing the thread on issue #49 regarding concurrency levels and perhaps adding to that conversation? It's my opinion that concurrency levels higher than what we have provided here would be useful if we were ready to benchmark high-connection, low-utilization WebSockets. But presently we are testing high-traffic traditional HTTP, where responding to requests as quickly as possible is the paramount objective.

As with anything though, I'm prepared to be proven wrong. :)

0 replies

bitemyapp · 2013-04-12T04:35:31Z

bitemyapp
Apr 12, 2013

I'm with bhauer, this isn't, "how many users can we serve per server on our chat service".

0 replies

drewcrawford · 2013-04-12T07:43:54Z

drewcrawford
Apr 12, 2013

Some things I would like to see in the future:

msgpack tests. Msgpack is rapidly becoming an alternative to JSON particularly with non-browser clients.
Multirow reads/writes, perhaps computing a mathematical function from a table or updating a hundred rows. Almost all of the requests I serve are multirow reads or writes, and frameworks usually have some per-row overhead.
Making an "onward" request to another server. This tests the outbound HTTP stack.
A relationship (join) test, where you are using the ORM to relate two or more entities in a parent/child configuration. Frameworks take different approaches for eager vs lazy loading; the results may be interesting. Maybe construct a loop of entities and then check it for cycles.

0 replies

bhauer · 2013-04-12T14:27:09Z

bhauer
Apr 12, 2013
Author

Hi @drewcrawford. Thanks for the ideas!

No rush, since this is just long-term planning, but I am curious about your second idea concerning multi-row reads and writes. In my head I conceive of that as executing a single UPDATE but I am probably misunderstanding you. Would you be able to draft up some quick pseudo-code to allow me to visualize what you mean?

Your third idea was echoed by another reader, so that's got "high demand" from my perspective. :)

A test of relationships is a great idea too.

0 replies

bhauer · 2013-04-12T14:28:16Z

bhauer
Apr 12, 2013
Author

A commenter on HN named Terretta suggested the following. I'm just copying this here for easy future reference.

Exercising a randomized mix of reading and writing. I think you already said you were planning a CRUD test. Consider a tunable ratio here, something like 10000 R to 100 U to 10 C to 1 D.
Exercising synchronous web service (JSONP) calls in two modes: (a) to some web service that is consistently fast and low latency, say, the initial JSON example from this test suite running in servlet mode, and (b) to a web service written in the same framework as the one being tested, again using the initial JSON example. (The idea here is that many frameworks fall on their faces when confronted with latency. This is why synthetic tests are usually so poorly predictive of real world behavior -- people forget that latency causes backlogs and backlogs cause all parts of the stack to misbehave in interesting ways.)
Test async ability if the framework has it, with a system call (sleep?) that takes a randomized 0 - 60 seconds to return. Would help understand when a framework is likely to blow up calling out to a credit card processor, doing server side image processing, etc.
Exercising authentication (standardize on bcrypt, but only create passwords on 1 in 10K requests), authorization, and session state, if offered.
Exercising any built-in support for caching, where 1 in rand(X) requests invalidates the DB query cache, 1 in rand(X) requests invalidates the WS call cache, 1 in rand(X) requests invalidates the long term async system call cache, and 1 in rand(Y) requests blows away the whole cache.
For the enterprise legacy integrators, it would also be interesting to test XML as well (in particular, SOAP), anywhere we're testing JSON.
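The tunable read/write ratio in the first suggestion could be driven by a weighted random choice. The sketch below is one possible way to do that, assuming the load generator (or the test handler) picks an operation per request; the function name `pick_operations` and the seed parameter are hypothetical and exist only for this example.

```python
import random
from collections import Counter

# Ratio from the suggestion: 10000 reads : 100 updates : 10 creates : 1 delete.
OPERATION_WEIGHTS = {"read": 10_000, "update": 100, "create": 10, "delete": 1}


def pick_operations(n, weights, seed=None):
    """Return n operation names drawn at random in proportion to `weights`."""
    rng = random.Random(seed)  # a fixed seed makes a run repeatable
    names = list(weights)
    return rng.choices(names, weights=[weights[name] for name in names], k=n)


if __name__ == "__main__":
    # Show how often each operation is selected over 100,000 requests.
    print(Counter(pick_operations(100_000, OPERATION_WEIGHTS, seed=1)))
```

Changing the numbers in `OPERATION_WEIGHTS` is all it takes to tune the mix, which is the property the suggestion asks for.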

0 replies

bhauer · 2013-04-12T14:49:16Z

bhauer
Apr 12, 2013
Author

Another HN commenter named kbenson suggested a goal of defining a simple blog-style application. This is ambitious but we've already passed the threshold at which we require community contributions in order to move forward (even adding one more simple test will require several pull requests from the community to see that test implemented in more than a small sampling of the frameworks).

With that in mind, I think it's a great item to have on the long-term plan. If we keep the requirements simple, it could be done.

0 replies

drewcrawford · 2013-04-12T18:38:40Z

drewcrawford
Apr 12, 2013

Would you be able to draft up some quick pseudo-code to allow me to visualize what you mean?

a = 0
b = 1
for i from 1 to 100
    insert into table values(a+b)
    c = b
    b = a + b
    a = c

or

i = 0
not_quite_sum = 0
for row in table
    if i is even:
        not_quite_sum += row.field
    else:
        not_quite_sum -= row.field
    i = i + 1

The key insight being

there's a for loop
each pass of the for loop operates on one row
the overall operation is simple, but not so simple that it's natural to do in a SQL one-liner

The interesting thing about this test is that it does reads/writes in the same connection. Whereas in the single row access case the dominating factor might be setting up the connection or acquiring it from a shared pool, here the test is about how quick the ORM bindings are once they're in place and how fast you can move memory between the DB process and the application process.
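For readers who want to try the idea, the two pseudo-code fragments above might look roughly like this in runnable form. This is a sketch under stated assumptions: it uses Python's sqlite3 module rather than a framework ORM, the table `numbers` and column `value` are invented for the example, and the insert count defaults to 50 rather than 100 because the larger values would overflow SQLite's 64-bit integers.

```python
import sqlite3


def make_table():
    """Create an empty in-memory table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE numbers (id INTEGER PRIMARY KEY AUTOINCREMENT, value INTEGER)"
    )
    return conn


def insert_fibonacci(conn, count=50):
    """Multi-row write: one INSERT per loop pass, all on the same connection."""
    a, b = 0, 1
    for _ in range(count):
        conn.execute("INSERT INTO numbers (value) VALUES (?)", (a + b,))
        a, b = b, a + b
    conn.commit()


def alternating_sum(conn):
    """Multi-row read: add even-indexed rows, subtract odd-indexed rows."""
    total = 0
    rows = conn.execute("SELECT value FROM numbers ORDER BY id")
    for i, (value,) in enumerate(rows):
        if i % 2 == 0:
            total += value
        else:
            total -= value
    return total
```

Both functions keep the per-row work in application code, so the time spent is dominated by moving rows between the database and the application rather than by connection setup.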

0 replies

bhauer · 2013-04-12T21:40:32Z

bhauer
Apr 12, 2013
Author

@drewcrawford, thanks! I understand what you have in mind now. For some reason, I read your original statement to imply something much more complicated.

You simply mean a test that operates over multiple rows in a single result set, in the case of reading, with per-row functionality occurring within the application rather than within SQL functions. Your pseudo-code illustrates the idea well.

0 replies

bhauer · 2013-05-09T18:50:37Z

bhauer
May 9, 2013
Author

Note that requirements for each test type are now posted at the results web site: http://www.techempower.com/benchmarks/#section=code

0 replies

bhauer · 2013-05-18T14:31:45Z

bhauer
May 18, 2013
Author

I just edited this issue to indicate the updates test is "present," and to add quick notes about the need to implement plaintext tests (both small and large payloads) and a larger work-load JSON test (something involving a complex and large data structure).

0 replies

michaelhixson · 2013-07-12T21:19:28Z

michaelhixson
Jul 12, 2013

A test that exercises form rendering, validation, and CSRF protection could be interesting. I'm pretty sure most of the full stack frameworks have utilities for those. Maybe the test would have three parts? One server-side implementation, but three sets of wrk parameters: (a) GET the form, (b) POST the form with errors, (c) POST the form successfully.
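The three request shapes could be exercised against a handler along these lines. This is only a sketch of the general technique (an HMAC-derived CSRF token tied to a session identifier), not how any particular framework does it; `SECRET_KEY`, the `name` field, and the status codes chosen for each outcome are assumptions for illustration.

```python
import hashlib
import hmac
import secrets

SECRET_KEY = secrets.token_bytes(32)  # hypothetical per-application secret


def issue_csrf_token(session_id):
    """(a) GET the form: derive the token embedded in the rendered form."""
    return hmac.new(SECRET_KEY, session_id.encode(), hashlib.sha256).hexdigest()


def handle_post(session_id, form):
    """(b)/(c) POST the form: check the token, then validate the fields."""
    supplied = form.get("csrf_token", "").encode()
    expected = issue_csrf_token(session_id).encode()
    # Constant-time comparison avoids leaking the token through timing.
    if not hmac.compare_digest(supplied, expected):
        return 403, "invalid CSRF token"
    name = form.get("name", "").strip()
    if not name:
        return 422, "name is required"  # (b) POST with errors
    return 200, "hello, " + name  # (c) successful POST
```

A load generator would first fetch the form to obtain the token, then replay it in the POST body for cases (b) and (c).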

0 replies

bhauer · 2013-07-26T21:03:14Z

bhauer
Jul 26, 2013
Author

I've added test type 16, a more complex routing table/map/tree based on Christopher Lord's comment on the Google Group:

https://groups.google.com/d/msg/framework-benchmarks/r0B3tPaCMPs/_PG1_p1McbwJ
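As a rough picture of what a larger routing table exercises, here is a minimal segment-based routing tree. It is a sketch, not a model of any benchmarked framework: the `:name` parameter syntax and the class names are assumptions, and the matcher does no backtracking.

```python
class RouteNode:
    """One path segment in the routing tree."""

    def __init__(self):
        self.children = {}       # literal segment -> RouteNode
        self.param_child = None  # node used for a ":param" segment
        self.param_name = None
        self.handler = None


class Router:
    def __init__(self):
        self.root = RouteNode()

    def add(self, pattern, handler):
        """Register a pattern such as "/users/:id/posts"."""
        node = self.root
        for segment in pattern.strip("/").split("/"):
            if segment.startswith(":"):
                if node.param_child is None:
                    node.param_child = RouteNode()
                    node.param_name = segment[1:]
                node = node.param_child
            else:
                node = node.children.setdefault(segment, RouteNode())
        node.handler = handler

    def match(self, path):
        """Return (handler, params) for a path, or None if nothing matches."""
        node = self.root
        params = {}
        for segment in path.strip("/").split("/"):
            if segment in node.children:
                node = node.children[segment]
            elif node.param_child is not None:
                params[node.param_name] = segment
                node = node.param_child
            else:
                return None
        if node.handler is None:
            return None
        return node.handler, params
```

A test along the lines described could register a required number of routes (say, 1,000) and then request paths spread across them, so the lookup cost of the larger structure shows up in the results.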

0 replies

methane · 2013-07-28T02:59:09Z

methane
Jul 28, 2013

Heavy model test:

Make 10 ActiveRecord or RowGateway classes. Instantiate them from each table.
Make an additional 100 classes with a single method each. Instantiate and call them during the request.

This test may reveal the cost of class loading [1], method calling and GC.

[1] Some languages, like PHP, load classes on each request.
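The second part of this proposal might be sketched as follows. The class names and the `run` method are hypothetical, and because Python defines classes once per process, this version only exercises instantiation, method calls, and garbage collection, not the per-request class loading mentioned in the footnote.

```python
def make_service_classes(count=100):
    """Build `count` small classes, each with a single method."""
    classes = []
    for i in range(count):
        # The default argument captures the current value of i for this class.
        def run(self, value, _offset=i):
            return value + _offset

        classes.append(type("Service%d" % i, (), {"run": run}))
    return classes


def handle_request(classes, value=0):
    """Instantiate every class and call its method once, as one request would."""
    return sum(cls().run(value) for cls in classes)
```

In a benchmark, `make_service_classes` would run at startup and `handle_request` would run inside the request handler.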

0 replies

weltermann17 · 2013-07-30T19:15:21Z

weltermann17
Jul 30, 2013

I think a pretty simple additional test would be serving static content of different sizes (100k, 1m, 10m, 100m?). Frameworks that perform similarly to (maybe even better than) Apache httpd in this domain could make life a lot easier for full-blown web applications than those that serve small content extremely well but degrade significantly when content sizes get large. With our framework PLAIN, for instance, we generate dynamic content of 3D data (JT, 3DXML, CATIA) that quickly reaches sizes >100m. Streaming those to files and then serving them with an httpd or the like would be a big drawback in terms of performance and complexity.

0 replies

benaadams · 2018-01-20T10:33:57Z

benaadams
Jan 20, 2018

but are they really framework specific? Or are they generic practices across frameworks?

They may not be implemented, or the code path might be a neglected one, or carry unnecessary overhead in a particular framework, etc.; I don't know, as I haven't seen benchmarks 😄

Why I'd most like to see TLS is because the encryption itself on a pre-negotiated connection is really fast on modern CPUs; however, the integration of the crypto library, whether OpenSSL, BoringSSL, libsodium, LibreSSL, etc. (do not roll your own crypto), can be pretty crufty. There can be too many allocations, too much copying, too granular, not granular enough, all sorts of things.

The TechEmpower benchmarks have done wonders for frameworks' HTTP performance by having a set of standardized measurements done independently, and I'd like to see that effect replicated for HTTPS, whichever framework you are into. It's good for the web ecosystem as a whole.

0 replies

RX14 · 2018-01-20T11:03:27Z

RX14
Jan 20, 2018

I agree that HTTPS between microservices, or between the load balancer and web frontend, is a good idea in general. Benchmarking this would bring visibility to the performance overhead of frameworks in this area (an aim of the test would be to be directly comparable to an HTTP test so the HTTPS overhead could be measured for each framework).

0 replies

boazsegev · 2018-01-20T15:17:33Z

boazsegev
Jan 20, 2018

In production, many environments have TLS between layers and in service-to-service calls. In fact, some industries (especially heavily regulated ones) require secure communication between these layers...

I think Redis is a great use case example. Communication is performed securely using a secure tunnel (such as spiped, Ghostunnel, etc.).

The separation of concerns between security and services is important. It allows security patches to be easily implemented across the board without patching each framework / service separately. Also, the security implementation is often superior to a BYO solution.

What's the best reverse proxy setup to use for frontend server Y and framework X, what do they support: localhost vs named pipes vs unix sockets vs shared memory etc.

I would love to have benchmarks about TLS/SSL, but I'm not sure the benchmarks should include any frameworks. Once we use a framework's internal TLS/SSL in a benchmark, we're telling developers that this is a production-grade setup.

I'm not even sure I want to know how many applications in the wild use security implementations with known vulnerabilities... it's a source of worry for me on SO whenever someone asks about implementing TLS/SSL in a backend application server.

Then again, I'll probably implement TLS/SSL and let security be damned: the pressure to add TLS/SSL is too great, and (re)explaining why security concerns should be separated from the framework is getting tiring.

0 replies

Drawaes · 2018-01-20T19:24:09Z

Drawaes
Jan 20, 2018

Right, an article on a site dedicated to edge services (F5) ... Also, most of your "edge" services will use the exact same underlying lib, e.g. nginx using OpenSSL. And if we take OpenSSL, it's often part of the OS.

So the big issue is ... How did they implement it? Did they do a lot of copying and allocations? Are there added bottlenecks?

As someone who works in a very secure environment, we are required to have a high-security edge layer (no one said I was ditching that, and we actually have many layers there), but we are also required to have SSL/TLS "inside" that security ring. To assume that a security layer means you are done is dangerous in banking, health, government, and education, and in some cases illegal.

0 replies

boazsegev · 2018-01-20T19:33:46Z

boazsegev
Jan 20, 2018

Also, most of your "edge" services will use the exact same underlying lib, e.g. nginx using OpenSSL. And if we take OpenSSL, it's often part of the OS.

And... if memory serves, OpenSSL uses insecure defaults... but maybe that changed.

As someone who works in a very secure environment...

In your experience, do you use a framework's SSL/TLS implementation for production (to secure internal network communication), or do you tunnel the communication through secure channeling (i.e., spiped)?

I agree that if secure production environments rely on the framework's SSL/TLS layer, then it's important to both test and benchmark that option.

0 replies

hiyelbaz · 2018-01-24T17:19:28Z

hiyelbaz
Jan 24, 2018

websockets +1

0 replies

bhauer · 2018-02-14T22:03:04Z

bhauer
Feb 14, 2018
Author

Commenters above who have thoughts about TLS/SSL tests, please see and weigh in on the stand-alone issue (#3290).

0 replies

tmds · 2018-12-05T08:01:18Z

tmds
Dec 5, 2018

Many microservices are running in 'tiny' containers. It would be interesting to see how the stacks behave in such an environment. For example: allocated 1 CPU and 500MB RAM.

0 replies

tmds · 2019-04-05T07:12:03Z

tmds
Apr 5, 2019

For some time I've been running some of the TechEmpower benchmarks and trying to make sense of the results. One thought that keeps coming back is that these benchmarks are stressing the frameworks at 100% cpu. That should not be the normal operating mode of a server. I'd be interested to see latency vs throughput numbers. wrk2 can be used for such benchmarks.
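To illustrate why a fixed request rate changes the picture, here is a small simulation of latency under a constant-rate schedule. It is a simplified model under explicit assumptions: a single worker that serves requests in order, fixed service times supplied by the caller, and latency measured from the moment each request was scheduled to be sent. It is not a reimplementation of wrk2.

```python
import math


def open_loop_latencies(service_times, rate_per_sec):
    """Latency of each request when requests are scheduled at a constant rate.

    Latency is counted from the intended send time, so time spent queueing
    behind earlier requests is included.
    """
    interval = 1.0 / rate_per_sec
    latencies = []
    free_at = 0.0  # time at which the single worker next becomes idle
    for i, service in enumerate(service_times):
        intended_start = i * interval
        actual_start = max(intended_start, free_at)
        free_at = actual_start + service
        latencies.append(free_at - intended_start)
    return latencies


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    index = max(0, math.ceil(pct / 100 * len(ordered)) - 1)
    return ordered[index]
```

With a 10 ms service time, a rate of 50 requests per second leaves every latency at about 10 ms, while 200 requests per second makes each successive request wait longer. Plotting a high percentile against the offered rate gives the latency-versus-throughput curve asked for here.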

0 replies

bhauer · 2019-04-25T15:08:29Z

bhauer
Apr 25, 2019
Author

Added a future HTTP/2 test type based on the following comment: #2978 (comment)

1 reply

zugazagoitia Feb 21, 2023

h2load could be a great option for benchmarking both HTTP/2 and SSL endpoints.

abnud1 · 2020-06-19T07:37:59Z

abnud1
Jun 19, 2020

HTTP2 +1

0 replies

corentinway · 2021-08-17T09:07:36Z

corentinway
Aug 17, 2021

It seems the only performance you measure is response time.
Can you also measure

CPU
Energy
Memory

as was done in this article: https://jaxenter.com/energy-efficient-programming-languages-137264.html

GitHub repo: https://github.com/greensoftwarelab/Energy-Languages

The only issue with the greensoftwarelab tests is that they used the Debian Benchmarks Game, which only applies to a few use cases far removed from web use.

0 replies

ShreckYe · 2024-11-08T10:24:30Z

ShreckYe
Nov 8, 2024

I am also interested in the performance of HTTP/3. Though there are few libraries supporting it now and HTTP/2 is still on the waiting list, adding this test type could give the library contributors a motive and speed up its support, especially in the more popular web frameworks. There is a WRK variant supporting HTTP/3 now so the tool should be ready.

0 replies

ShreckYe · 2024-11-08T10:36:07Z

ShreckYe
Nov 8, 2024

Vert.x supports connecting to the database with domain sockets, with both PostgreSQL and MySQL. I am wondering how much of a performance improvement this brings, especially in IO-intensive workloads. Therefore, as far as I'm concerned, adding a domain socket variant of the "single database query" test would be good.
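One way to get a first feel for the transport difference, independent of any database driver, is to time small round trips over a Unix domain socket and over TCP loopback. The sketch below does only that, on a Unix-like system; it says nothing about PostgreSQL, MySQL, or Vert.x themselves, and the helper names are hypothetical.

```python
import socket
import threading
import time


def echo(conn):
    """Send back whatever arrives until the peer closes the connection."""
    with conn:
        while True:
            data = conn.recv(4096)
            if not data:
                break
            conn.sendall(data)


def unix_pair():
    """A connected pair of Unix domain stream sockets."""
    return socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)


def tcp_pair():
    """A connected pair of TCP sockets over the loopback interface."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
        listener.listen(1)
        client = socket.create_connection(listener.getsockname())
        server, _ = listener.accept()
    return client, server


def time_round_trips(pair_factory, count=1000, payload=b"x" * 64):
    """Return the seconds taken for `count` request/response round trips."""
    client, server = pair_factory()
    threading.Thread(target=echo, args=(server,), daemon=True).start()
    with client:
        start = time.perf_counter()
        for _ in range(count):
            client.sendall(payload)
            received = 0
            while received < len(payload):
                chunk = client.recv(4096)
                if not chunk:
                    raise ConnectionError("peer closed the connection")
                received += len(chunk)
        return time.perf_counter() - start
```

Calling `time_round_trips(unix_pair)` and `time_round_trips(tcp_pair)` on the same machine gives two numbers to compare. A real database test would add query parsing and execution on top, so any gap measured here is an upper bound on what the transport alone can contribute.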

2 replies

kbrock Nov 19, 2024

I would think all frameworks let you easily point to a domain socket instead of using tcp/ip.

Having said that, from what I've seen, most web database apps have 3 layers, with multiple app servers. Since this can't scale with domain sockets, not sure what this provides.

ShreckYe Nov 22, 2024

I would think all frameworks let you easily point to a domain socket instead of using tcp/ip.

So you mean that, as done here with the Vert.x PostgreSQL Client, you can just pass the domain socket file location to the host parameter? I was not aware of this. Thanks for pointing that out.

Having said that, from what I've seen, most web database apps have 3 layers, with multiple app servers. Since this can't scale with domain sockets, not sure what this provides.

Just a personal opinion here. I think the monolithic architecture has been coming back in recent years with the hardware manufacturers' efforts to increase core count, memory size, and transmission speed. The new AMD EPYC™ 9965 is 192c384t with support for terabytes of memory, for example. And Stack Overflow is based on this kind of architecture, as illustrated in this video. Although in that case the server app and the database are not deployed on the same server, I believe this can be done with a machine that's powerful enough. So to sum up, I think a monolithic server with an adequate hardware architecture can satisfy the scaling needs of most small-to-medium-sized apps, and reduce some overhead compared to a microservice architecture.

ShreckYe · 2024-11-19T00:18:22Z

ShreckYe
Nov 19, 2024

Building on "13. Future: JSON responses with larger workloads (complex data structure serialization).", I'd like to propose a variant using Protobuf, as complex data structure serialization is where Protobuf shines.

0 replies
