TCP overeager retransmission (IDFGH-14130) #14934

Closed
3 tasks done
bryghtlabs-richard opened this issue Nov 25, 2024 · 4 comments

bryghtlabs-richard commented Nov 25, 2024

Answers checklist.

  • I have read the documentation ESP-IDF Programming Guide and the issue is not addressed there.
  • I have updated my IDF branch (master or release) to the latest version and checked that the issue is present there.
  • I have searched the issue tracker for a similar issue and not found a similar issue.

General issue report

Sometimes (usually every 45-75 seconds, but occasionally within a few seconds) during a WebSocket connection on ESP-IDF 5.1.2, the ESP32's TCP stack sends a TCP retransmission very quickly, well before the retransmission timeout should have fired. This causes the server to emit an unnecessary duplicate acknowledgement and increases data transfer overhead:
image

Here are the relevant LwIP configurations I thought might be important:

CONFIG_LWIP_TIMERS_ONDEMAND=y
CONFIG_LWIP_TCP_RTO_TIME=500
CONFIG_LWIP_TCP_TMR_INTERVAL=250

I attach the others here: sdkconfig_lwip.txt

espressif-bot added the "Status: Opened" label Nov 25, 2024
github-actions bot changed the title from "TCP overeager retransmission" to "TCP overeager retransmission (IDFGH-14130)" Nov 25, 2024

bryghtlabs-richard commented

Our RTT for this connection is approximately 35ms.

I haven't touched this layer of networking in a long time. It looks like the per-connection recommendation is now:

RTTVAR <- (1 - beta) * RTTVAR + beta * |SRTT - R'| //Smooth abs(deviation from SRTT) into RTTVAR
SRTT <- (1 - alpha) * SRTT + alpha * R' //Smooth the RTT value into SRTT
RTO <- SRTT + max (G, K*RTTVAR) //Where K=4, G=ClockGranularity so RTO should be SmoothedRTT, plus max(clock-Granularity, 4x RTTVAR).

I think it's on me to record this again and try to compute values for RTTVAR and SRTT - perhaps our RTT is remarkably stable, so RTTVAR is close to zero?

bryghtlabs-richard commented

After rereading the RTO section of the RFC, I see that I hadn't realized newer TCP stacks could compute an RTO this low, so perhaps these retransmissions are expected?

image

I'm sorry to have bothered you about this, my knowledge was quite out of date.

bryghtlabs-richard closed this as not planned Dec 2, 2024

bryghtlabs-richard commented Dec 2, 2024

Hmm, updating sys_now() to use high-resolution time instead of OS ticks seems to help somewhat. More testing needed.

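#include "esp_timer.h"  // declares esp_timer_get_time()

u32_t sys_now(void) {
    // Microseconds since boot, converted to milliseconds.
    return (u32_t)(esp_timer_get_time() / 1000);
    // Previous tick-based version, limited to OS tick resolution:
    // return xTaskGetTickCount() * portTICK_PERIOD_MS;
}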
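As a rough illustration of why the clock source could matter, the sketch below models a millisecond clock that only advances once per OS tick and compares it with one derived from a microsecond counter. The helper names are hypothetical, and the 10 ms tick used in the example is an assumption rather than a value taken from this project's sdkconfig. It does not model how LwIP itself consumes sys_now().

```c
#include <stdint.h>

/* Hypothetical model of a tick-based clock: the returned millisecond
 * value only changes when a whole tick of tick_ms has elapsed. */
static uint32_t tick_clock_ms(int64_t now_us, uint32_t tick_ms)
{
    int64_t ticks = now_us / ((int64_t)tick_ms * 1000);
    return (uint32_t)(ticks * tick_ms);
}

/* Millisecond clock derived directly from a microsecond counter. */
static uint32_t highres_clock_ms(int64_t now_us)
{
    return (uint32_t)(now_us / 1000);
}

/* Elapsed time between two instants as seen through the tick clock. */
static uint32_t tick_elapsed_ms(int64_t start_us, int64_t end_us,
                                uint32_t tick_ms)
{
    return tick_clock_ms(end_us, tick_ms) - tick_clock_ms(start_us, tick_ms);
}

/* Elapsed time between two instants as seen through the high-res clock. */
static uint32_t highres_elapsed_ms(int64_t start_us, int64_t end_us)
{
    return highres_clock_ms(end_us) - highres_clock_ms(start_us);
}
```

Under the assumed 10 ms tick, the same 35 ms interval reads as 30 ms when it runs from 1 ms to 36 ms and as 40 ms when it runs from 9 ms to 44 ms, while the microsecond-based clock reports 35 ms in both cases.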


rsaxvc commented Dec 14, 2024

Perhaps an LwIP issue, or an issue with ESP's integration of LwIP. Oddly, I can't find how portTICK_PERIOD_MS is mapped to G in the ESP-IDF/components/lwip code, but that should only affect RTO once RTTVAR < portTICK_PERIOD_MS / 4, which shouldn't be the case here.
