This will patch the safetensors Python module, which vLLM uses to load models in safetensors format. The patch enables automatic detection and loading of zipnn-compressed models. If you use vLLM ...
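To illustrate the general idea of patching a loader so that compressed inputs are detected and handled transparently, here is a minimal, self-contained sketch. It does not use the real safetensors or zipnn APIs: `fake_loader`, `MAGIC`, `compress`, and `patch_loader` are hypothetical names invented for this example, and `zlib` stands in for the actual compression codec.

```python
import json
import types
import zlib

# Stand-in for a loader module. This is NOT the real safetensors API.
fake_loader = types.SimpleNamespace()


def _load_plain(data: bytes) -> dict:
    """Original loader: parses an uncompressed JSON payload."""
    return json.loads(data.decode("utf-8"))


fake_loader.load = _load_plain

# Hypothetical marker that identifies a compressed payload.
MAGIC = b"ZN"


def compress(data: bytes) -> bytes:
    """Compress a payload and prefix it with the marker (zlib as a stand-in codec)."""
    return MAGIC + zlib.compress(data)


def patch_loader(module) -> None:
    """Replace module.load with a wrapper that decompresses marked payloads first."""
    original = module.load
    if getattr(original, "_patched", False):
        return  # already patched; applying the patch twice is a no-op

    def load_with_detection(data: bytes) -> dict:
        # Automatic detection: only payloads carrying the marker are decompressed.
        if data.startswith(MAGIC):
            data = zlib.decompress(data[len(MAGIC):])
        return original(data)

    load_with_detection._patched = True
    module.load = load_with_detection


patch_loader(fake_loader)

payload = json.dumps({"weight": [1, 2, 3]}).encode("utf-8")
# Callers keep using the same entry point for both kinds of input.
assert fake_loader.load(payload) == {"weight": [1, 2, 3]}
assert fake_loader.load(compress(payload)) == {"weight": [1, 2, 3]}
```

Because the wrapper replaces the function on the module itself, code that calls the loader does not need to change, which is why a patch of this kind can work without modifying the caller.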
This will build a new image, vllm-openai:zipnn, that you can use to run vLLM in a container with zipnn support. Alternatively, you can use the pre-built zipnn/vllm-openai:latest image. Note that ...