Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Big integer Karatsuba algorithm #108

Closed
oscbyspro opened this issue Nov 5, 2023 · 13 comments
Closed

Big integer Karatsuba algorithm #108

oscbyspro opened this issue Nov 5, 2023 · 13 comments
Labels
brrr such code, much wow
Milestone

Comments

@oscbyspro
Copy link
Owner

oscbyspro commented Nov 5, 2023

I'm adding Karatsuba multiplication to UIntXL (#33).


Computing the 10,000,000th Fibonacci element now takes 2.6 seconds on my M1 MacBook Pro. With long multiplication only, the same calculation takes 34.4 seconds. The threshold for using the Karatsuba algorithm is 40 words. Edit: I have reduced it to 2.3 seconds and 20 words, with some improvements.

/// https://www.wolframalpha.com/input?i2d=true&i=fibonacci+10000000
///
/// - Note: The 10,000,000th element contains 2,089,877 decimal digits.
///
func test10000000() {
    NBK.blackHole(NBKFibonacciXL(NBK.blackHoleIdentity(10_000_000)))
}
@oscbyspro oscbyspro added the brrr such code, much wow label Nov 5, 2023
@oscbyspro oscbyspro added this to the v0.16.0 milestone Nov 5, 2023
@oscbyspro
Copy link
Owner Author

oscbyspro commented Nov 5, 2023

GNU says the threshold can be reduced to 10 words, in case I feel like optimizing it.

@oscbyspro
Copy link
Owner Author

oscbyspro commented Nov 5, 2023

Hm. I think I could do a bit better if I had composable (perhaps generic) methods like:

public static func increment(
_ base: inout UnsafeMutableBufferPointer<UInt>,
by rhs: UnsafeBufferPointer<UInt>, 
times lhs: UnsafeBufferPointer<UInt>) { 
    if  lhs.count < 40 || rhs.count < 40 { 
        ...
    }   else {
        ... 
    }
}

public static func incrementByLongAlgorithm(
_ base: inout UnsafeMutableBufferPointer<UInt>,
by rhs: UnsafeBufferPointer<UInt>, 
times lhs: UnsafeBufferPointer<UInt>) { 
    ...
}

public static func incrementByKaratsubaAlgorithm(
_ base: inout UnsafeMutableBufferPointer<UInt>,
by rhs: UnsafeBufferPointer<UInt>, 
times lhs: UnsafeBufferPointer<UInt>) { 
    ...
}

By that I mean, I wrote something quite neat and it'll take a day to clean up and push.


Edit: initialize(_:to:times:) seems more appropriate for multiplication algorithms.

@oscbyspro
Copy link
Owner Author

oscbyspro commented Nov 5, 2023

I'll be doing fewer allocations, so it appears that I can lower the threshold to 20 words.

@oscbyspro
Copy link
Owner Author

oscbyspro commented Nov 6, 2023

I can remove an additional allocation and some additions if I make a decrementing version too. Hm. Dunno whether it's worth it, but it'd be neat insofar as every level of recursion allocating 2(lhs.count + rhs.count) contiguous memory.

@oscbyspro
Copy link
Owner Author

oscbyspro commented Nov 6, 2023

Welp. It's possible to add the middle product in-place, but when I tried it I got random results. So I suppose there might be some finicky life-time thing to think about when doing it. Or maybe the recursion breaks exclusive access. It's not an issue I want to tackle at the moment, however.

@oscbyspro
Copy link
Owner Author

I want to eventually end up with a non-recursive algorithm, but I'll towards it incrementally.

@oscbyspro
Copy link
Owner Author

oscbyspro commented Nov 6, 2023

Hm. The middle product can reuse the 1st product's allocation. It's at least something 🤷‍♂️

square: 3 * a.count, non-square: 2 * (a.count + b.count)

@oscbyspro
Copy link
Owner Author

oscbyspro commented Nov 6, 2023

I chose to normalize each subsequence because I saw this in 10_000_000_000 ^ 1_000:

[silly string] - [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...]

@oscbyspro
Copy link
Owner Author

I suppose there is still some work to do. Zzz.

  • 10_000_000_000 ^ 1_000 works
  • 10 ^ 10_000 triggers a debug assertion

@oscbyspro oscbyspro reopened this Nov 6, 2023
@oscbyspro
Copy link
Owner Author

oscbyspro commented Nov 6, 2023

It's late. I'll fix it tomorrow. Also, I think I can write z0 snd z1 directly to the product and then allocate z2.

@oscbyspro
Copy link
Owner Author

oscbyspro commented Nov 6, 2023

Observation. It works if I floor the partition, not ceil it. Is that a coincidence? Feels like it. Maybe not? Zzz.

@oscbyspro
Copy link
Owner Author

oscbyspro commented Nov 6, 2023

Hm. I rather confident that flooring solves it, because then the high part is larger and it must be trimmable to fit with a half shift because otherwise it would not fit with a full shift, which it must because that's just z1. I already have an assertion for it. floor(_:) good, ceil(_:) bad, on floor can sleep.

@oscbyspro
Copy link
Owner Author

Test Case '-[NBKFlexibleWidthKitBenchmarks.NBKFibonacciXLBenchmarks testNoLoop10000000]' passed (2.290 seconds).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
brrr such code, much wow
Projects
None yet
Development

No branches or pull requests

1 participant