Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve toString Performance: Use StringBuilderWriter for toString methods #867

Conversation

Simulant87
Copy link
Contributor

@Simulant87 Simulant87 commented Feb 23, 2024

Implementation for #863

Using a StringBuilder wrapped in a new StringBuilderWriter class to improve the performance for the default toString methods.

using the not synchronised StringBuilder instead of the StringBuffer results in a ~37% faster toString method, which is mostly used for serialisation, so this improvement is valuable for any service communication receiving JSON from this library.

I validated the performance with this project: https://github.com/fabienrenaud/java-json-benchmark
./run ser --libs ORGJSON

Here is the result for the benchmark on my local machine (thoughput measured in operations per second, higher is better):

currently latest library version 20240205:

Benchmark               Mode  Cnt       Score      Error  Units
Serialization.orgjson  thrpt   20  741310,103 ▒ 6250,938  ops/s

Improved implementation from this PR:

Benchmark               Mode  Cnt        Score      Error  Units
Serialization.orgjson  thrpt   20  1019863,958 ▒ 2538,416  ops/s

@stleary
Copy link
Owner

stleary commented Mar 3, 2024

@Simulant87 The benchmarks look good, but I tried testing against a 30m JSON file. The improvement was less than 2% between the latest master branch code (943 ms) and this PR (927 ms). I don't think that is enough to justify accepting this PR. If you want to run the test case yourself, I used the file train-v1.1.json from https://www.kaggle.com/datasets/stanfordu/stanford-question-answering-dataset

@Simulant87
Copy link
Contributor Author

Simulant87 commented Mar 3, 2024

@stleary Thank you for having a look at my PR and sharing your source file for your performance test. I repeated the performance test with your input file and wrote this "poor mans performance test" (see below). It was just faster to implement than a correct JMH setup and it should still give an approximate feelingabout the difference. I executed it on master and my branch with the following result:

master branch:
84.533ms for 200 iterations of toString. 422ms per toString on average.
my branch:
54.706ms for 200 iterations of toString. 273ms per toString on average.

the improvement is comparable to the java-json-benchmark I used before with ~35% improvement.
I would be interested in the execution result in your environment.

package org.json;

import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.InputStream;

public class Main {

    public static void main(String[] args) throws FileNotFoundException {
        // TODO: update path to 30mb input file train-v1.1.json from https://www.kaggle.com/datasets/stanfordu/stanford-question-answering-dataset
        InputStream inputStream = new FileInputStream("/path/to/train-v1.1.json");
        JSONTokener tokener = new JSONTokener(inputStream);
        JSONObject object = new JSONObject(tokener);

        String output = null;
        // JVM warm up: 100 iterations of toString
        for (int i = 0; i < 100; i++) {
            output = object.toString();
            System.out.println(output.charAt(0));
        }

        // measured run: 200 iterations of toString
        long start = System.currentTimeMillis();
        int iterations = 200;
        for (int i = 0; i < iterations; i++) {
            output = object.toString();
            // access the to string output, so it does not get optimised away
            System.out.println(output.charAt(0));
        }
        long end = System.currentTimeMillis();

        long totalTime = end - start;
        System.out.println(totalTime + "ms for " + iterations + " iterations of toString. " + totalTime / iterations + "ms per toString on average.");
    }
}

@stleary
Copy link
Owner

stleary commented Mar 4, 2024

@Simulant87 I think that measuring how fast toString() runs in isolation does not provide useful information for users' applications. The test I am starting with is: "What is the performance improvement when parsing a moderately large/complex JSON doc?".

@Test
public void foo() {
    final InputStream resourceAsStream = JSONObjectTest.class.getClassLoader().getResourceAsStream(
            "train-v1.1.json");
    JSONTokener tokener = new JSONTokener(resourceAsStream);
    long startTime = System.currentTimeMillis();
    JSONObject jsonObject = new JSONObject(tokener);
    long endTime = System.currentTimeMillis();
    System.out.println("Time to execute in ms: " + Long.toString(endTime-startTime));
}

@Simulant87
Copy link
Contributor Author

@stleary You are right, in my performance improvement I only focused on the toString() method, with a noticeable improvement. My use case it that my application creates JSONObjects as internal model and serialises them as response to a REST request. The time to create JSONObject will be the same, but the time to write the response will be improved, as there the toString() method is used (at least in my context).

replacing a switch-case statement with few branches
by if-else cases
@Simulant87
Copy link
Contributor Author

Simulant87 commented Mar 5, 2024

@stleary I also took a look into the deserialisation parsing a JSONObject from String. I was able to pull of a ~30% improvement by replacing switch-case statements with if-else statements and don't use the back() method for the JSONObject parsing.

Performance checks I did with java-json-benchmark
./run deser --libs ORGJSON

Here is the result for the benchmark on my local machine (thoughput measured in operations per second, higher is better):
currently latest library version 20240205:

Benchmark                 Mode  Cnt       Score       Error  Units
Deserialization.orgjson  thrpt   20  747134,406 ▒ 14445,770  ops/s

improvement on this branch:

Benchmark               Mode  Cnt        Score       Error  Units
Serialization.orgjson  thrpt   20  1062998,974 ▒ 12164,706  ops/s

Also doing my simple performance test on your large JSON input file shows an improvement, although it is not that large as on the benchmark above:

on master branch:
76.403ms for 200 iterations of parsing JSONObject. 382ms per parsing on average.

Improvement on my branch:
70.433ms for 200 iterations of parsing JSONObject. 352ms per parsing on average.

    public static void main(String[] args) throws FileNotFoundException {
        // TODO: update path to 30mb input file train-v1.1.json from https://www.kaggle.com/datasets/stanfordu/stanford-question-answering-dataset
        InputStream inputStream = new FileInputStream("/path/to/train-v1.1.json");
        JSONTokener tokener = new JSONTokener(inputStream);
        JSONObject object = new JSONObject(tokener);

        // JVM warm up: 100 iterations of toString
        for (int i = 0; i < 100; i++) {
            inputStream = new FileInputStream("/path/to/train-v1.1.json");
            tokener = new JSONTokener(inputStream);
            object = new JSONObject(tokener);
        }

        // measured run: 200 iterations of toString
        long start = System.currentTimeMillis();
        int iterations = 200;
        for (int i = 0; i < iterations; i++) {
            inputStream = new FileInputStream("/path/to/train-v1.1.json");
            tokener = new JSONTokener(inputStream);
            object = new JSONObject(tokener);
        }
        long end = System.currentTimeMillis();

        long totalTime = end - start;
        System.out.println(totalTime + "ms for " + iterations + " iterations of parsing JSONObject. " + totalTime / iterations + "ms per parsing on average.");
    }

@stleary stleary changed the title #863 Improve toString Performance: Use StringBuilderWriter for toString methods Improve toString Performance: Use StringBuilderWriter for toString methods Mar 10, 2024
@stleary
Copy link
Owner

stleary commented Mar 17, 2024

What problem does this code solve?
This is a performance optimization. StringBuilderWriter (uses a StringBuilder) replaces the built-in class StringWriter (uses a StringBuffer). Large JSON docs that contain many strings can be stringified in about half the time with this enhancement.

Does the code still compile with Java6?
Yes

Risks
Low

Changes to the API?
No

Will this require a new release?
No

Should the documentation be updated?
No

Does it break the unit tests?
No, new unit tests were added.

Was any code refactored in this commit?
No

Review status
APPROVED

Starting 3-day comment window

Copy link
Contributor

@rikkarth rikkarth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed and tested locally, ok.

@stleary stleary merged commit 45dede4 into stleary:master Mar 20, 2024
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants