Would it be possible to give at least one known-working example for using Gemma.cpp as an API server, as described in https://github.com/google/gemma.cpp/blob/main/API_SERVER_README.md?
To wit, building and running Gemma with one of the Gemma.cpp-specific models from Kaggle yields this:

```sh
./bazel-bin/gemma_api_server --weights $HOME/Downloads/gemma-3/2b-pt.sbs --tokenizer $HOME/Downloads/gemma-3/tokenizer.spm --model gemma3
```

```
Loading model... 5012344832 blob bytes (100.00%) of bf16
normalizer.cc(52) LOG(INFO) precompiled_charsmap is empty. use identity normalization.
Abort at weights.cc:753: Tensor post_att_ns_0 is required but not found in file.
/usr/bin/bash: line 3: 2931544 Aborted (core dumped) ( ./bazel-bin/gemma_api_server --weights $HOME/Downloads/gemma-3/2b-pt.sbs --tokenizer $HOME/Downloads/gemma-3/tokenizer.spm --model gemma3 )
/usr/bin/bash: line 4: /tmp/gemini-shell-WIkCYY/pgrep.tmp: No such file or directory
```
Specifically, in the Usage section, it would be helpful to indicate which of the several identical-seeming weight files is expected to work with which `--model` flag.