
Add OLMo November 2024 model #10394

Merged: 4 commits, merged Nov 19, 2024.

2015aroras (Contributor) commented:

This PR implements the architecture for the soon-to-be-released OLMo November 2024 model. It was tested with a checkpoint that has completed a large portion of pretraining. Fixes #10316.

Architecture differences from the original OLMo model:

  • RMSNorm is used instead of standard layer norm.
  • Norm is applied to attention queries and keys.
  • Norm is applied after attention/feedforward rather than before.
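The three differences above can be illustrated in pure Python on a single token vector. This is a minimal sketch under stated assumptions, not the llama.cpp compute graph. The helper names are hypothetical, attention and the feed-forward network are replaced by toy callables, and "norm applied after" is read as normalizing each sublayer's output before it is added back to the residual stream.

```python
import math
from functools import partial
from typing import Callable, List

Vec = List[float]
Sublayer = Callable[[Vec], Vec]


def layer_norm(x: Vec, weight: Vec, bias: Vec, eps: float = 1e-5) -> Vec:
    """Standard LayerNorm: subtract the mean, divide by the std, then scale and shift."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    inv = 1.0 / math.sqrt(var + eps)
    return [(v - mean) * inv * w + b for v, w, b in zip(x, weight, bias)]


def rms_norm(x: Vec, weight: Vec, eps: float = 1e-6) -> Vec:
    """RMSNorm: divide by the root mean square only (no mean subtraction, no bias)."""
    inv = 1.0 / math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v * inv * w for v, w in zip(x, weight)]


def add(a: Vec, b: Vec) -> Vec:
    return [p + q for p, q in zip(a, b)]


def qk_norm_score(q: Vec, k: Vec, q_weight: Vec, k_weight: Vec) -> float:
    """Scaled dot-product score with RMSNorm applied to the query and key first."""
    qn, kn = rms_norm(q, q_weight), rms_norm(k, k_weight)
    return sum(p * r for p, r in zip(qn, kn)) / math.sqrt(len(q))


def pre_norm_block(x: Vec, attn: Sublayer, ffn: Sublayer, norm1: Sublayer, norm2: Sublayer) -> Vec:
    # Original OLMo ordering: normalize each sublayer's input.
    h = add(x, attn(norm1(x)))
    return add(h, ffn(norm2(h)))


def post_norm_block(x: Vec, attn: Sublayer, ffn: Sublayer, norm1: Sublayer, norm2: Sublayer) -> Vec:
    # Assumed reading of the OLMo November 2024 ordering: normalize each
    # sublayer's output, then add it to the residual stream.
    h = add(x, norm1(attn(x)))
    return add(h, norm2(ffn(h)))


if __name__ == "__main__":
    ones = [1.0] * 4
    norm = partial(rms_norm, weight=ones)

    def toy_attn(v: Vec) -> Vec:  # stand-in for self-attention
        return [2.0 * a for a in v]

    def toy_ffn(v: Vec) -> Vec:  # stand-in for the feed-forward network
        return list(v)

    x = [1.0] * 4
    print(pre_norm_block(x, toy_attn, toy_ffn, norm, norm))   # ≈ [4, 4, 4, 4]
    print(post_norm_block(x, toy_attn, toy_ffn, norm, norm))  # ≈ [3, 3, 3, 3]
```

With these toy sublayers the two orderings give different outputs, and with unit norm weights the QK-norm score stays bounded by sqrt(d) no matter how large q and k grow.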

Test (base model):

# Download the OLMo November 2024 weights to models/olmo-nov-hf
python3 convert_hf_to_gguf.py models/olmo-nov-hf
./llama-quantize models/olmo-nov-hf/OLMo-7B-1124-hf-7.3B-F16.gguf models/olmo-nov-hf/OLMo-7B-1124-hf-7.3B-Q4_K_M.gguf Q4_K_M
./llama-cli -m models/olmo-nov-hf/OLMo-7B-1124-hf-7.3B-Q4_K_M.gguf -p "Bitcoin is" -n 128

Output:

Bitcoin is not only a currency, but also a technology and a movement. It has been heralded as the currency of the future, but it has also been called a bubble. Despite its volatility, it is an asset that is highly sought after, and it is a technology that has been praised by many. But what is it that makes Bitcoin so special?

Bitcoin is a decentralized digital currency that was created in 2009. It is not issued by any government or central bank, and it is not backed by any physical asset. Bitcoin is a virtual currency that can be used to purchase goods and services, and it can also be exchanged for traditional

github-actions bot added the "python" (python script changes) label on Nov 18, 2024.
@@ -3040,6 +3040,11 @@ def modify_tensors(self, data_torch: Tensor, name: str, bid: int | None) -> Iter
        return [(self.map_tensor_name(name), data_torch)]


@Model.register("Olmo1124ForCausalLM")
class Olmo1124Model(Model):
    model_arch = gguf.MODEL_ARCH.OLMO_1124
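The `@Model.register("Olmo1124ForCausalLM")` decorator is what ties the new converter class to a checkpoint. Below is a minimal sketch of that registry pattern, assuming the lookup key is the `architectures` entry in the checkpoint's `config.json`. The names `_model_classes` and `from_model_architecture`, and the string stand-in for `gguf.MODEL_ARCH.OLMO_1124`, are illustrative here and not copied from convert_hf_to_gguf.py.

```python
from typing import Callable, Dict, Type


class Model:
    # Hypothetical registry: Hugging Face architecture name -> converter class.
    _model_classes: Dict[str, Type["Model"]] = {}
    model_arch: str = ""

    @classmethod
    def register(cls, *names: str) -> Callable[[Type["Model"]], Type["Model"]]:
        """Class decorator that records the decorated class under each given name."""
        if not names:
            raise ValueError("register() needs at least one architecture name")

        def decorator(modelcls: Type["Model"]) -> Type["Model"]:
            for name in names:
                cls._model_classes[name] = modelcls
            return modelcls

        return decorator

    @classmethod
    def from_model_architecture(cls, arch: str) -> Type["Model"]:
        """Look up the converter class for an architecture string from config.json."""
        try:
            return cls._model_classes[arch]
        except KeyError:
            raise NotImplementedError(f"Architecture {arch!r} not supported") from None


@Model.register("Olmo1124ForCausalLM")
class Olmo1124Model(Model):
    model_arch = "OLMO_1124"  # stand-in for gguf.MODEL_ARCH.OLMO_1124


if __name__ == "__main__":
    # A checkpoint's config.json lists its architecture(s), for example:
    hparams = {"architectures": ["Olmo1124ForCausalLM"]}
    model_class = Model.from_model_architecture(hparams["architectures"][0])
    print(model_class.__name__, model_class.model_arch)  # Olmo1124Model OLMO_1124
```

Because registration happens at class-definition time, adding a new architecture only requires defining and decorating a subclass; the lookup code does not change.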
2015aroras (Contributor Author) commented on Nov 18, 2024:

This class is much simpler than OlmoModel because that model uses the wrong RoPE type, and OlmoModel compensates for it in modify_tensors.
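One way to see how converter-side tensor changes can make up for a RoPE-type mismatch: RoPE can rotate adjacent element pairs (the layout llama.cpp calls NORM) or pairs split across the two halves of a head (NEOX, the rotate-half form used by Hugging Face's Llama implementation). The pure-Python sketch below, with hypothetical helper names and a single head, shows that permuting q and k into interleaved order makes the adjacent-pair rotation reproduce the rotate-half attention scores. It illustrates the general idea only and is not the code in OlmoModel.modify_tensors.

```python
import math
from typing import List

Vec = List[float]


def _angle(pos: int, pair: int, dim: int, base: float) -> float:
    # Standard RoPE angle for rotation pair `pair`: pos * base^(-2*pair/dim).
    return pos * base ** (-2.0 * pair / dim)


def rope_adjacent(x: Vec, pos: int, base: float = 10000.0) -> Vec:
    """Rotate adjacent pairs (x[2i], x[2i+1]): the NORM layout."""
    d, out = len(x), list(x)
    for i in range(d // 2):
        theta = _angle(pos, i, d, base)
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[2 * i], x[2 * i + 1]
        out[2 * i], out[2 * i + 1] = a * c - b * s, a * s + b * c
    return out


def rope_halves(x: Vec, pos: int, base: float = 10000.0) -> Vec:
    """Rotate pairs (x[i], x[i + d/2]): the NEOX / rotate-half layout."""
    d, out = len(x), list(x)
    h = d // 2
    for i in range(h):
        theta = _angle(pos, i, d, base)
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[i], x[i + h]
        out[i], out[i + h] = a * c - b * s, a * s + b * c
    return out


def halves_to_adjacent(x: Vec) -> Vec:
    """Permute [x0..x(h-1), xh..x(d-1)] into [x0, xh, x1, x(h+1), ...]."""
    h = len(x) // 2
    return [v for i in range(h) for v in (x[i], x[i + h])]


def dot(a: Vec, b: Vec) -> float:
    return sum(p * q for p, q in zip(a, b))


if __name__ == "__main__":
    q = [0.3, -1.2, 0.7, 2.0, -0.5, 1.1]
    k = [1.0, 0.4, -0.9, 0.2, 0.6, -1.3]
    # Score under rotate-half RoPE (query at position 5, key at position 2).
    expected = dot(rope_halves(q, 5), rope_halves(k, 2))
    # Same score from adjacent-pair RoPE after permuting q and k identically.
    permuted = dot(rope_adjacent(halves_to_adjacent(q), 5),
                   rope_adjacent(halves_to_adjacent(k), 2))
    # Adjacent-pair RoPE on unpermuted vectors: the mismatched case.
    mismatched = dot(rope_adjacent(q, 5), rope_adjacent(k, 2))
    print(expected, permuted, mismatched)
```

The permutation does not change attention scores on its own, because applying the same reordering to q and k leaves their dot product unchanged; it only changes which elements RoPE pairs up.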

ggerganov merged commit a88ad00 into ggerganov:master on Nov 19, 2024 (56 checks passed).
2015aroras deleted the olmo-nov-24-final branch on November 19, 2024 at 18:23.
arthw pushed a commit to arthw/llama.cpp that referenced this pull request on Dec 20, 2024, with the commit message:
* Add OLMo November 2024 constants

* Add OLMo November 2024 converter

* Add loading of OLMo November 2024 tensors and hyper parameters

* Add building of OLMo November 2024 model
Labels: python (python script changes)
Issue that merging this pull request may close: Feature Request: Add OLMo November 2024
Participants: 3