Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cut memory usage in half by iterating arrays on the fly #66

Open
daniel-rusu opened this issue Oct 20, 2023 · 5 comments
Open

Cut memory usage in half by iterating arrays on the fly #66

daniel-rusu opened this issue Oct 20, 2023 · 5 comments

Comments

@daniel-rusu
Copy link
Contributor

daniel-rusu commented Oct 20, 2023

Instead of copying collections by pushing each array element individually onto the stack (in the MeasurementStack), we could create a pushArray method in the MeasurementStack that keeps track of the entire array reference instead of copying all the elements from the array. The measurement stack could keep track of the index into the array so when the pop method is called and an array is present then fetch that element from the array at that index and decrement the index. When the index becomes negative then actually pop the array itself.

This will make analyzing collections much more efficient in both time and space.

@daniel-rusu
Copy link
Contributor Author

daniel-rusu commented Oct 20, 2023

An alternative approach would be to have 2 measurement stacks, one for arrays and the other for regular objects so the logic would look something like this:

while (arrayStack.isNotEmpty() || objectStack.isNotEmpty) {
    while (arrayStack.isNotEmpty()) {
        val array = arrayStack.pop()
        for (element in array) {
            if (!shouldSkip(element)) {
              measureObject(element)
            }
        }
    }
    while (objectStack.isNotEmpty()) {
        measureObject(objectStack.pop())
    }
}

@daniel-rusu
Copy link
Contributor Author

daniel-rusu commented Oct 22, 2023

To clarify, when encountering an array with N eligible elements, Jamm currently makes 2N copies. Inside the MeasurementStack, all the array element references get copied to the tracker, and another copy of all the references gets added to the stack.

This enhancement cuts the memory usage in half since the references only need to be lazily added to the tracker without needing to copy them to the stack.

I tried an implementation using a variation of the first proposal above so the rest of the code can remain mostly unchanged. The MeasurementStack contained 2 stacks, objectStack & arrayStack, and found that this improved performance by about 5% while cutting the memory consumption in half since both stacks remained tiny and only the tracker grew to keep track of all the elements.

Since the process of measuring an array element can end up adding additional arrays, we need to keep track of where we left off so the arrayStack stored an ArrayWrapper that contained a reference to the array along with the next index to be processed. When fetching the current array element, the index would be decremented until finding the next eligible element and the wrapper is removed from the stack when it contains no more elements. The isEmpty function returns true when both stacks are empty.

@daniel-rusu daniel-rusu changed the title Iterate arrays instead of copying each element into the MeasurementStack Cut memory usage in half by iterating arrays on the fly Oct 22, 2023
@blerer
Copy link
Collaborator

blerer commented Oct 24, 2023

@daniel-rusu I am not sure that I understand your way of computing memory usage. Considering the way Java deal with memory when we add an array to the stack we only copy the reference to the array not its value. Therefore, the impact on memory is near neglectable.

@daniel-rusu
Copy link
Contributor Author

@blerer there seems to be a misunderstanding here as this has nothing to do with the Java memory model. Yes, when first encountering an array field, a reference to that array is added to the stack and this is negligible as you say. However, when popping that reference and discovering that it's pointing to an array then every single eligible array element reference is copied to the stack. See this section of the code for details:

addArrayElements((Object[]) current, stack);

So this does actually cut the memory usage of the Jamm library in half when analyzing a large array of fairly shallow objects using the first proposal assuming we prioritize popping from the object stack. That's because each array element is popped directly from the array and analyzing that element would add a small number of references to the stack to be analyzed and emptied before popping the next array element. So instead of blowing up the size of the stack by copying all the array elements, the stack would instead remain tiny since only direct field references would be copied there and cleared before moving to the next array element.

@blerer
Copy link
Collaborator

blerer commented Oct 26, 2023

Sorry, I did not read your 3rd comment before. I now understand your idea and it make sense. Fill free to propose a patch :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants