@ggerganov ggerganov commented Mar 1, 2024

I ran some anecdotal tests and this change seems to improve the results. I will cross-check against the reference implementation to confirm that it is correct.
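One way such a cross-check could be sketched is to compare token ID sequences from the two implementations and report the first position where they differ. With `--temp 0` generation is expected to be greedy, so the sequences should be comparable run to run. The helper `first_divergence` and the placeholder `reference_ids` are hypothetical names introduced for illustration; the llama.cpp IDs are the 26 prompt tokens printed by `--verbose-prompt` in the log in this comment, and the reference IDs would have to come from the reference tokenizer, which is not shown here.

```python
from itertools import zip_longest
from typing import Optional, Sequence


def first_divergence(a: Sequence[int], b: Sequence[int]) -> Optional[int]:
    """Return the first index where the two sequences differ, or None if equal.

    A length mismatch counts as a divergence at the end of the shorter sequence.
    """
    for i, (x, y) in enumerate(zip_longest(a, b)):
        if x != y:
            return i
    return None


# Prompt token IDs as printed by llama.cpp with --verbose-prompt.
LLAMA_CPP_PROMPT_IDS = [
    40, 2980, 1361, 456, 17505, 1378, 293, 2580, 387, 451, 3161, 100, 1055,
    222, 610, 3161, 100, 1055, 45, 865, 49, 1780, 100, 1055, 731, 222,
]

# Placeholder (assumption): replace with the IDs produced by the reference
# implementation's tokenizer for the same prompt.
reference_ids = list(LLAMA_CPP_PROMPT_IDS)

mismatch = first_divergence(LLAMA_CPP_PROMPT_IDS, reference_ids)
if mismatch is None:
    print("token sequences match")
else:
    print(f"first mismatch at position {mismatch}")
```

The same helper can be applied to the generated tokens instead of the prompt tokens, which would also exercise the model weights and not only the tokenizer.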

make -j && ./main -m models/starcoder2-3b/ggml-model-f16.gguf -p "#python code for efficient implemetation of two_sum\ndef two_sum(arr, target_sum):\n" -n 256 -e --temp 0 -ngl 99 --verbose-prompt
system_info: n_threads = 16 / 24 | AVX = 0 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | SSSE3 = 0 | VSX = 0 | MATMUL_INT8 = 0 | 

main: prompt: '#python code for efficient implemetation of two_sum
def two_sum(arr, target_sum):
'
main: number of tokens in prompt = 26
    40 -> '#'
  2980 -> 'python'
  1361 -> ' code'
   456 -> ' for'
 17505 -> ' efficient'
  1378 -> ' imp'
   293 -> 'le'
  2580 -> 'met'
   387 -> 'ation'
   451 -> ' of'
  3161 -> ' two'
   100 -> '_'
  1055 -> 'sum'
   222 -> '
'
   610 -> 'def'
  3161 -> ' two'
   100 -> '_'
  1055 -> 'sum'
    45 -> '('
   865 -> 'arr'
    49 -> ','
  1780 -> ' target'
   100 -> '_'
  1055 -> 'sum'
   731 -> '):'
   222 -> '
'

sampling: 
	repeat_last_n = 64, repeat_penalty = 1.100, frequency_penalty = 0.000, presence_penalty = 0.000
	top_k = 40, tfs_z = 1.000, top_p = 0.950, min_p = 0.050, typical_p = 1.000, temp = 0.000
	mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
sampling order: 
CFG -> Penalties -> top_k -> tfs_z -> typical_p -> top_p -> min_p -> temperature 
generate: n_ctx = 512, n_batch = 512, n_predict = 256, n_keep = 0


#python code for efficient implemetation of two_sum
def two_sum(arr, target_sum):
	length = len(arr)
	for i in range (0, length-1):
		for j in range (i+1, length):
			if arr[i] + arr[j] == target_sum:
				return [i, j]
	return [-1,-1]

#python code for efficient implemetation of three_sum
def three_sum(arr, target_sum):
	length = len(arr)
	for i in range (0, length-2):
		for j in range (i+1, length-1):
			for k in range (j+1, length):
				if arr[i] + arr[j] + arr[k] == target_sum:
					return [i, j, k]
	return [-1,-1,-1]

#python code for efficient implemetation of four_sum
def four_sum(arr, target_sum):
	length = len(arr)
	for i in range (0, length-3):
		for j in range (i+1, length-2):
			for k in range (j+
llama_print_timings:        load time =     224.24 ms
llama_print_timings:      sample time =      33.05 ms /   256 runs   (    0.13 ms per token,  7745.84 tokens per second)
llama_print_timings: prompt eval time =      41.90 ms /    26 tokens (    1.61 ms per token,   620.51 tokens per second)
llama_print_timings:        eval time =    4009.60 ms /   255 runs   (   15.72 ms per token,    63.60 tokens per second)
llama_print_timings:       total time =    4118.97 ms /   281 tokens
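The derived figures in the timing lines follow from the token counts and elapsed times. As a minimal sketch, the two helper functions below (hypothetical names, not part of llama.cpp) recompute them from the printed values. Note that the total of 281 tokens is the 26 prompt tokens plus the 255 eval runs.

```python
def tokens_per_second(n_tokens: int, elapsed_ms: float) -> float:
    """Throughput in tokens per second for n_tokens processed in elapsed_ms."""
    return n_tokens / (elapsed_ms / 1000.0)


def ms_per_token(n_tokens: int, elapsed_ms: float) -> float:
    """Average latency in milliseconds per token."""
    return elapsed_ms / n_tokens


# Values copied from the llama_print_timings lines above.
timings = {
    "sample": (256, 33.05),
    "prompt eval": (26, 41.90),
    "eval": (255, 4009.60),
}

for name, (n, ms) in timings.items():
    print(f"{name}: {ms_per_token(n, ms):.2f} ms per token, "
          f"{tokens_per_second(n, ms):.2f} tokens per second")
```

Recomputing the prompt eval rate from the printed 41.90 ms gives a value slightly different from the logged 620.51 tokens per second, presumably because the printed millisecond figure is rounded.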

@pacman100 pacman100 merged commit 15f233b into pacman100:smangrul/add-starcoder2-support Mar 1, 2024
@pacman100
Owner

Thank you @ggerganov! 😄
