lightning might stuck for hours when importing parquet files from cloud storage #56104

Closed
D3Hunter opened this issue Sep 18, 2024 · 5 comments · Fixed by #56205
Labels
affects-6.1 This bug affects the 6.1.x(LTS) versions. affects-6.5 This bug affects the 6.5.x(LTS) versions. affects-7.1 This bug affects the 7.1.x(LTS) versions. affects-7.5 This bug affects the 7.5.x(LTS) versions. affects-8.1 This bug affects the 8.1.x(LTS) versions. affects-8.5 This bug affects the 8.5.x(LTS) versions. component/lightning This issue is related to Lightning of TiDB. severity/major type/bug The issue is confirmed as a bug.

Comments

@D3Hunter
Contributor

D3Hunter commented Sep 18, 2024

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

After #46984 (v7.1.3+ and v7.5.x or later), Lightning always samples parquet files, and it does so serially before the import starts. This is very slow and might take hours if the user has a large number of parquet files; the time spent sampling the files might even exceed the time spent on the real import work.

We only need this size to display progress more accurately and as a reference when splitting engines, so slowing the import down this much is unacceptable.

2. What did you expect to see? (Required)

The import starts quickly.

3. What did you see instead (Required)

It might take hours before Lightning starts doing any work.

4. What is your TiDB version? (Required)

@D3Hunter D3Hunter added type/bug The issue is confirmed as a bug. affects-7.1 This bug affects the 7.1.x(LTS) versions. affects-7.5 This bug affects the 7.5.x(LTS) versions. affects-8.1 This bug affects the 8.1.x(LTS) versions. severity/major labels Sep 18, 2024
@ti-chi-bot ti-chi-bot bot added may-affects-5.4 This bug maybe affects 5.4.x versions. may-affects-6.1 may-affects-6.5 labels Sep 18, 2024
@D3Hunter
Contributor Author

It's an enhancement type of bug, and the fix needs to be picked back to the affected versions.

@zeminzhou
Contributor

How about sampling just one file? Sample the first file to calculate a compression ratio, then use that ratio to estimate the sizes of the remaining files.
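To make the proposal concrete, here is a minimal sketch of the idea, not Lightning's actual code. The `ParquetFile` type, the `estimate_sizes` function, and the `sample` callback (standing in for the expensive read that estimates one file's uncompressed size) are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ParquetFile:
    path: str
    compressed_size: int  # size in bytes as reported by cloud storage


def estimate_sizes(
    files: List[ParquetFile],
    sample: Callable[[ParquetFile], int],
) -> Dict[str, int]:
    """Sample only the first file and reuse its ratio for all files.

    `sample` is a hypothetical, expensive callback that returns the
    estimated uncompressed size of one file.
    """
    if not files:
        return {}
    first = files[0]
    if first.compressed_size > 0:
        ratio = sample(first) / first.compressed_size
    else:
        ratio = 1.0  # assumption: fall back to "no compression"
    # Every other file is estimated from metadata only, with no extra reads.
    return {f.path: int(f.compressed_size * ratio) for f in files}


# Usage: count how many times the expensive sampling step runs.
calls = []


def fake_sample(f: ParquetFile) -> int:
    calls.append(f.path)
    return 300  # pretend the first file expands to 300 bytes


demo_files = [ParquetFile("a.parquet", 100), ParquetFile("b.parquet", 50)]
demo_estimates = estimate_sizes(demo_files, fake_sample)
```

With this shape the sampling cost no longer grows with the number of files, at the price of assuming that all files compress about as well as the first one.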

@D3Hunter
Contributor Author

lgtm

@lance6716 lance6716 added affects-6.1 This bug affects the 6.1.x(LTS) versions. affects-6.5 This bug affects the 6.5.x(LTS) versions. labels Sep 24, 2024
@lance6716 lance6716 removed the may-affects-5.4 This bug maybe affects 5.4.x versions. label Sep 24, 2024
@lance6716
Contributor

lance6716 commented Sep 25, 2024

How about sampling just one file

@zeminzhou Do you have real data to support this idea? I'm afraid that different table structures may have different compression ratios, so IMO we should sample at least once for every table. But performance will not improve if every table has only one parquet file.
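The per-table variant suggested here could look roughly like the sketch below. It is an illustration under assumptions, not the change made in #56205: the `(table, path, compressed_size)` tuple layout, the function name `estimate_per_table`, and the `sample` callback are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical record layout: (table name, file path, compressed size in bytes)
FileInfo = Tuple[str, str, int]


def estimate_per_table(
    files: List[FileInfo],
    sample: Callable[[str], int],
) -> Dict[str, int]:
    """Sample the first file seen for each table and reuse that ratio.

    `sample` is a hypothetical, expensive callback that returns the
    estimated uncompressed size of the file at `path`.
    """
    ratios: Dict[str, float] = {}
    estimates: Dict[str, int] = {}
    for table, path, size in files:
        if table not in ratios:
            # One expensive sample per table, not per file.
            ratios[table] = sample(path) / size if size > 0 else 1.0
        estimates[path] = int(size * ratios[table])
    return estimates


# Usage: two tables with different compression ratios.
sampled_paths = []
fake_uncompressed = {"t1/a.parquet": 400, "t2/c.parquet": 150}


def fake_sample(path: str) -> int:
    sampled_paths.append(path)
    return fake_uncompressed[path]


table_files = [
    ("t1", "t1/a.parquet", 100),
    ("t1", "t1/b.parquet", 200),
    ("t2", "t2/c.parquet", 100),
]
table_estimates = estimate_per_table(table_files, fake_sample)
```

The number of sampling calls equals the number of tables, so when every table has exactly one parquet file this does as much sampling work as sampling every file, which is the limitation noted above.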

@zeminzhou
Contributor

Good catch. Different table structures may have different compression ratios. I will fix #56205 and test with different tables.

@jebter jebter added the component/lightning This issue is related to Lightning of TiDB. label Oct 23, 2024
@ti-chi-bot ti-chi-bot bot added the affects-8.5 This bug affects the 8.5.x(LTS) versions. label Nov 1, 2024
@ti-chi-bot ti-chi-bot bot closed this as completed in 0a9a231 Nov 18, 2024
ti-chi-bot bot pushed a commit that referenced this issue Dec 6, 2024