speculative : refactor and add a simpler example #10362

Merged · 14 commits, merged Nov 25, 2024

ggerganov (Owner) commented Nov 17, 2024

cont #10290

  • Refactor the speculative decoding implementation into common/speculative. For now, this supports only basic greedy speculation with a single sequence.
  • Add a simpler speculative-simple example that uses the new API.
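To illustrate what "basic greedy speculation with a single sequence" means, here is a minimal sketch of the acceptance step only. The draft model proposes a few tokens, the target model evaluates them in one batch, and the draft is kept up to the first position where the target's greedy (argmax) choice disagrees. The function `accept_greedy` and the `token` alias are hypothetical stand-ins, not the `common/speculative` API, and the target's argmax tokens are assumed to be already computed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using token = int;  // hypothetical stand-in for llama_token

// Greedy acceptance for one sequence (illustrative sketch, not the PR's code).
// draft:         tokens proposed by the draft model
// target_argmax: the target model's greedy pick at each draft position, plus
//                one extra pick after the last draft token (size = draft + 1)
// Returns the tokens to append to the sequence.
std::vector<token> accept_greedy(const std::vector<token> & draft,
                                 const std::vector<token> & target_argmax) {
    assert(target_argmax.size() == draft.size() + 1);

    std::vector<token> accepted;
    for (std::size_t i = 0; i < target_argmax.size(); ++i) {
        // The target's choice is always kept: it either equals the draft
        // token or replaces the first rejected draft token.
        accepted.push_back(target_argmax[i]);
        if (i == draft.size() || target_argmax[i] != draft[i]) {
            break; // first mismatch, or the bonus token after a full match
        }
    }
    return accepted;
}
```

For example, with a draft of {5, 7, 9} and target picks {5, 7, 2, 4}, the function returns {5, 7, 2}: two draft tokens are accepted and the target's own token replaces the rejected third one. Because every returned token is the target's greedy choice, the output matches plain greedy decoding with the target model; the draft only changes how many tokens each target pass yields.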

For llama-server support, see #10455.

Sample usage:

```sh
./bin/llama-speculative-simple \
    -m  ../models/qwen2.5-32b-coder-instruct/ggml-model-q8_0.gguf \
    -md ../models/qwen2.5-0.5b-coder-instruct/ggml-model-q4_0.gguf \
    -f ../../test.txt -c 0 -ngl 99 -ngld 99 --draft-max 16 --draft-min 5 --color \
    --sampling-seq k --top-k 1 -fa
```
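The `--draft-max` and `--draft-min` values in the command above bound each speculation round. The sketch below assumes that `--draft-max` caps how many tokens the draft model proposes, and that a draft shorter than `--draft-min` is dropped so the target model simply decodes one token normally. The early-stop threshold `p_min`, the `draft_step_fn` callback, and `make_draft` are hypothetical names chosen for illustration, not flags or functions taken from this PR.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

using token = int;  // hypothetical stand-in for llama_token

// Hypothetical draft-model step: given the tokens so far, return the draft
// model's greedy next token and the probability it assigned to that token.
using draft_step_fn =
    std::function<std::pair<token, float>(const std::vector<token> &)>;

// Build one draft bounded by n_draft_max / n_draft_min (illustrative sketch).
// p_min is an assumed confidence cut-off for stopping early.
std::vector<token> make_draft(const draft_step_fn & step,
                              std::vector<token> ctx,
                              int n_draft_max, int n_draft_min, float p_min) {
    assert(n_draft_min >= 0 && n_draft_max >= n_draft_min);

    std::vector<token> draft;
    while (static_cast<int>(draft.size()) < n_draft_max) {
        const std::pair<token, float> next = step(ctx);
        if (next.second < p_min) {
            break; // draft model is unsure: stop proposing tokens
        }
        draft.push_back(next.first);
        ctx.push_back(next.first); // extend the draft model's own context
    }

    if (static_cast<int>(draft.size()) < n_draft_min) {
        draft.clear(); // too short: skip speculation this round
    }
    return draft;
}
```

Under these assumptions, with `--draft-max 16 --draft-min 5`, a round in which the draft model stays confident for only 3 tokens yields an empty draft, so no target batch is spent verifying a very short draft.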

ggerganov added the demo label (Demonstrate some concept or idea, not intended to be merged) on Nov 17, 2024
ggerganov force-pushed the gg/speculative-refactor branch 4 times, most recently from 96b1c3b to 74221ef, on Nov 17, 2024 17:23
ggerganov force-pushed the gg/speculative-refactor branch from 74221ef to fe043ff on Nov 21, 2024 14:09
ggerganov force-pushed the gg/speculative-refactor branch from dbc7ac5 to e4c122b on Nov 22, 2024 09:08
github-actions bot added the testing label (Everything test related) on Nov 22, 2024
ggerganov marked this pull request as ready for review on Nov 22, 2024 11:56
ggerganov removed the demo label on Nov 24, 2024
ggerganov merged commit d9d54e4 into master on Nov 25, 2024 (62 checks passed)
ggerganov deleted the gg/speculative-refactor branch on Nov 25, 2024 07:58
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Dec 20, 2024
* speculative : refactor and add a simpler example

ggml-ci

* speculative : clean-up and add comments and TODOs [no ci]

* speculative : manage context in common_speculative

ggml-ci

* speculative : simplify

ggml-ci

* speculative : simplify (cont)

ggml-ci

* speculative : add --draft-min CLI arg

* speculative : minor fixup

* make : build fixes

* speculative : do not redraft previous drafts

ggml-ci

* speculative : fix the draft sampling

ggml-ci

* speculative : fix compile warning

* common : refactor args

ggml-ci

* common : change defaults [no ci]

* common : final touches

ggml-ci
Labels: examples, server, testing