I tested this patch with llama.cpp while adding xtheadvector support. Surprisingly, this bug did not prevent the LLM from generating plausible output, though the model's responses became noticeably less coherent.

Tested-by: Xiongchuan Tan <tanxiongchuan at isrc.iscas.ac.cn>