Callback error when exporting conversation #234

Closed
natea opened this issue Sep 9, 2023 · 11 comments · Fixed by #235

Comments


natea commented Sep 9, 2023

Describe the bug

Callback error when exporting a conversation. Any idea why this error occurred?

To Reproduce
Steps to reproduce the behavior:

  1. Run slackdump like this: ./slackdump -export my-slack-export -download -export-type standard

Expected behavior

I would have expected the command to complete without giving an error.

Output

This is the error that I observed:

2023/09/08 18:49:50 application error: export error: channels: error: error exporting conversation 
CAHFFCHPG: failed to dump "questions" (CAHFFCHPG): callback error: failed to dump channel:thread 
CAHFFCHPG:1560791995.077000: Post "https://slack.com/api/conversations.replies": 
read tcp 192.168.1.167:50726->34.204.109.226:443: read: connection reset by peer

Desktop (please complete the following information):

  • OS: MacOS Ventura

Additional context
Add any other context about the problem here.

@natea natea changed the title Callback error when exporting conversatio Callback error when exporting conversation Sep 10, 2023
Author

natea commented Sep 10, 2023

I'm wondering if I can just run the slackdump command again and it will pick up where it left off, or if it's going to do everything all over again.

I guess I could run the command again and exclude all the channels that it already downloaded, but would those channels then still be listed in channels.json so that I can open the entire workspace in slack-export-viewer?

Owner

rusq commented Sep 11, 2023

Hey @natea, if you export just this one isolated thread, CAHFFCHPG:1560791995.077000, do you still get this error?

slackdump CAHFFCHPG:1560791995.077000

Also, this is quite strange:

> curl 34.204.109.226:443
curl: (56) Recv failure: Connection reset by peer

> nslookup 34.204.109.226
Non-authoritative answer:
226.109.204.34.in-addr.arpa	name = ec2-34-204-109-226.compute-1.amazonaws.com.

Looks like an EC2 instance, and now I'm wondering why it was trying to fetch from there. It may have been an intermittent failure.

Author

natea commented Sep 19, 2023

@rusq I ran that command and didn't get an error:

$ ./slackdump CAHFFCHPG:1560791995.077000
Slackdump 2.4.1 (commit: be0e57febccf8d705f43da2474e07c384414fe25) built on: 2023-08-15T09:48:09Z
2023/09/19 15:53:07 > checking user cache...
2023/09/19 15:53:07   cache expired: it will be recreated.
2023/09/19 15:53:07   thread request #    1, fetched:   17, total:       17, process results:  (speed: 203.38/sec, avg: 203.38/sec)
2023/09/19 15:53:07   thread fetch complete, total: 17
2023/09/19 15:53:07 dumped 1 item(s)
2023/09/19 15:53:07 completed, time taken: 425.293417ms

> looks like a EC2 instance, and now I'm wondering why was it trying to fetch from there, it may have been an intermittent failure

Are you saying that it was trying to retrieve a file that was hosted on an EC2 instance, that perhaps that EC2 instance is no longer running, which is why the file retrieval failed?

Does Slackdump have a timeout mechanism, so that if it can't retrieve a file from a remote site within a certain duration, it skips that file and moves on to the next one?

What do you suggest I do at this point? Do I try running the slackdump command again, or will that reset from the beginning and re-download everything?

Owner

rusq commented Sep 20, 2023

Most likely there was an internet issue at that moment, or Slack was doing something obscene with its cluster nodes.

I suspect if you retry the full export it will complete successfully.

Let me know how it goes.

Author

natea commented Sep 21, 2023

@rusq I ran a full export again last night, and got another similar error but on a different channel:

2023/09/20 22:30:56 messages request #   63, fetched:  200 (threads: 3, files: 3), total:    12600 (speed: 186.90/sec, avg:  15.84/sec)
2023/09/20 22:34:35 file "FME1BQXD5-Screenshot 2019-08-14 at 14.34.22.png" saved to dashboard/attachments: 32828 bytes written
2023/09/20 22:34:39 file "FM8HMDQQZ-Screen Shot 2019-08-16 at 9.02.19 AM.png" saved to dashboard/attachments: 73300 bytes written
2023/09/20 22:39:30 file "FM0JN02SF-OnPaste.20190814-111620.png" saved to dashboard/attachments: 95585 bytes written
2023/09/20 22:39:30 application error: export error: channels: error: error exporting conversation C0DLX9U86: failed to dump "dashboard" (C0DLX9U86): callback error: failed to dump channel C0DLX9U86: read tcp 192.168.4.78:60563->34.205.195.66:443: read: connection reset by peer

Rather than doing a full export, would it be better to name all the channels explicitly? That would give me more control over which ones are downloaded: if there's an error, I could remove the completed channels from the list and re-run the export with just the ones that haven't been downloaded yet.

I'm concerned that I could run this export multiple times and it would keep getting hung up on some channel, requiring me to do a full export again each time. The current export directory is approximately 12 GB, so it's a non-trivial amount of data to download, especially with all the attachments.

Alternatively, could some error handling be added so that, when the tool encounters an error like this, it skips that channel or attachment and tries it again later instead of aborting the entire operation?

Owner

rusq commented Sep 23, 2023

Hey Nate, I have introduced retry logic on network errors in #235; you can try v2.4.2 from the Releases page. By default it retries 3 times with exponential backoff, waiting 1, 2, and 4 seconds. Let me know if it works for you.

Note to self: ported to cli-remake too

Author

natea commented Sep 24, 2023

That worked! It appears to have downloaded all 615 channels.

2023/09/24 09:21:07 channels request #    9, fetched:   56, total:      615 (speed:   0.02/sec, avg:   0.02/sec)
2023/09/24 09:21:07 channels fetch complete, total: 615 channels
2023/09/24 09:21:07   out of which exported:  615

It looks like it got the private channels too, and the multi-person DMs (signified by mpdm- prefix?).

Any idea why slack-export-viewer has no problem displaying the multi-person DMs and the private 1-1 DMs, but SlackLogViewer does not show any Direct messages or Group messages?

Author

natea commented Sep 25, 2023

By the way, is there a way to speed up the export, perhaps by increasing the number of download threads? I'm on a 1 Gigabit connection, if that makes a difference.

I'm going to do another export using -r text so that I have the conversations in plain-text format, and I won't bother downloading the attachments this time, so I'm wondering whether I can speed up the download of the Slack conversations.

Author

natea commented Sep 25, 2023

Strangely, the Mattermost export took 9 hours while the normal export took 8 hours.

2023/09/25 08:50:23 channels fetch complete, total: 615 channels
2023/09/25 08:50:23   out of which exported:  615
2023/09/25 08:50:23 completed, time taken: 9h12m35.023321709s

Owner

rusq commented Sep 26, 2023

Hey Nate, thanks for the feedback, glad to hear that it worked. The difference must have been Slack server latency or rate limiting, because the Mattermost and standard export formats treat messages and threads in exactly the same way. The only difference is where they put the file attachments.

Regarding the connection speed, the short answer: you can experiment with the rate limiting settings in the slackdump CLI. By default they are set to safe values to avoid hitting the rate limit error from Slack.

Long answer: there are several factors that affect that, from "affects the most" to "affects the least":

  1. Rate limiting. There are four throttling tiers in the Slack API. If one exceeds the rate limit defined for a particular endpoint, the client receives a 429 error and has to wait the number of seconds returned by the server before retrying. If one adheres to the limits, the number of 429 errors is lower. Slack allows short bursts, but it's really a gamble. On short dumps, one might get away with exceeding these limits for a short period of time.
  2. Number of threads in the channel. Imagine we download a chunk of 100 messages. If none of these messages contains a thread, we move on to the next chunk. If every message in the chunk has a thread attached to it, this results in 100+ additional API calls to fetch the thread contents, basically reducing the speed 100+ times.
  3. Chunk size is not guaranteed. Requesting 100 messages in a chunk does not guarantee that Slack will return 100 messages in response to the API call. Depending on its internal logic, it might return fewer. When testing on a huge channel that had messages from 2015, I saw it return just 1 message per batch instead of the requested 100, so instead of, say, 10 requests to fetch 1000 messages with a chunk size of 100, slackdump had to make 1000 requests, because Slack was being unreasonable.

Owner

rusq commented Sep 26, 2023

Re SlackLogViewer, I saw you opened an issue with it (thayakawa-gh/SlackLogViewer#19). That's exactly what I'd do.
