Models using OpenAI endpoint have caching enabled #1589

Open

nsarrazin opened this issue Nov 25, 2024 · 0 comments
Labels
huggingchat For issues related to HuggingChat specifically

Comments

@nsarrazin
Collaborator

Models that currently use the OpenAI endpoint type on HuggingChat (Nemotron, Llama 3.2, Qwen Coder) seem to have caching enabled.

This means retrying a message just reloads the previous response almost instantly. This is not the intended behaviour and does not match what happens when using the TGI endpoint.
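A minimal sketch of one possible fix, assuming the OpenAI endpoint type is built on the openai Node SDK and the backing service is Hugging Face's Inference API, which honors the `x-use-cache` request header; the base URL, model id, and env variable below are placeholders, not chat-ui's actual configuration:

```ts
import OpenAI from "openai";

// Opt out of server-side response caching by sending `x-use-cache: false`
// on every request, so a retry triggers a fresh generation instead of
// replaying the cached completion.
const client = new OpenAI({
  apiKey: process.env.HF_TOKEN, // placeholder env variable
  baseURL: "https://api-inference.huggingface.co/v1", // placeholder endpoint
  defaultHeaders: {
    "x-use-cache": "false", // disable Hugging Face inference caching
  },
});

const completion = await client.chat.completions.create({
  model: "meta-llama/Llama-3.2-3B-Instruct", // placeholder model id
  messages: [{ role: "user", content: "Hello!" }],
});
console.log(completion.choices[0].message.content);
```

If the header is instead only needed per request, the same value can be passed via the SDK's per-call `headers` request option rather than on the client constructor.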

nsarrazin added the huggingchat (For issues related to HuggingChat specifically) label on Nov 25, 2024