Local AI using llama.cpp on a Trashcan (Mac Pro 2013)

Intro

I recently acquired a Trashcan Mac Pro to use as a standalone dev machine. It has dual AMD FirePro D700 GPUs, and since I’ve been itching to try out locally hosted models, I ran llama.cpp to see what the performance looked like. There have been very few posts about it so I figured I’d make my own.

I got these results back in March, but never got around to writing about it. This is my first blog post after we welcomed our daughter, Aloura. Even though parenthood has been a ride, it’s been amazing seeing her grow and reach her milestones (like giggling).

Getting this to work required a bit of tweaking, especially with the AMD drivers on Linux, but the results were surprisingly interesting.

Pre-requisites

For this setup, I used Ubuntu 24.04 with the 6.19.10-zabbly kernel to ensure the AMD GPUs were properly recognized and utilized. At the time of writing, kernel 7+ has been released (and is the default kernel for zabbly installs).

To get the graphics cards working, I had to apply some specific kernel parameters. You can add these to your GRUB configuration:

radeon.cik_support=0 radeon.si_support=0 amdgpu.cik_support=1 amdgpu.si_support=1 amdgpu.dc=1

These parameters disable the older radeon driver and enable the newer amdgpu driver. This allows the D700s to be used for inference using Vulkan. The llama.cpp version used was b8610 (with Vulkan).

Performance Results

Once the environment was set up, I ran several tests using llama-bench to measure the performance across different models with varying context lengths.

Here are the results:

GPT-OSS 20B (Q4_K_S)

model	test	t/s	peak t/s	ttfr (ms)	est_ppt (ms)	e2e_ttft (ms)
gpt-oss-20b-Q4_K_S.gguf	pp2048	157.85 ± 1.25		11331.73 ± 308.50	11328.95 ± 308.50	11532.27 ± 310.15
gpt-oss-20b-Q4_K_S.gguf	tg32	15.14 ± 0.03	16.00 ± 0.00
gpt-oss-20b-Q4_K_S.gguf	pp2048 @ d4096	148.99 ± 0.90		37101.57 ± 805.79	37098.79 ± 805.79	37305.29 ± 806.57
gpt-oss-20b-Q4_K_S.gguf	tg32 @ d4096	15.17 ± 0.17	15.67 ± 0.47
gpt-oss-20b-Q4_K_S.gguf	pp2048 @ d8192	140.44 ± 0.16		64457.52 ± 312.34	64454.73 ± 312.34	64662.92 ± 315.27
gpt-oss-20b-Q4_K_S.gguf	tg32 @ d8192	14.55 ± 0.01	15.00 ± 0.00
gpt-oss-20b-Q4_K_S.gguf	pp2048 @ d16384	126.07 ± 0.24		128336.23 ± 662.56	128333.45 ± 662.56	128546.71 ± 663.86
gpt-oss-20b-Q4_K_S.gguf	tg32 @ d16384	14.03 ± 0.02	15.00 ± 0.00
gpt-oss-20b-Q4_K_S.gguf	pp2048 @ d32768	103.57 ± 0.01		295724.27 ± 688.69	295721.49 ± 688.69	295956.26 ± 686.78
gpt-oss-20b-Q4_K_S.gguf	tg32 @ d32768	13.15 ± 0.12	14.00 ± 0.00

Qwen 3.5 9B (Q8_0)

model	test	t/s	peak t/s	ttfr (ms)	est_ppt (ms)	e2e_ttft (ms)
Qwen3.5-9B-Q8_0.gguf	pp2048	178.50 ± 1.67		10277.55 ± 32.49	10274.80 ± 32.49	10277.64 ± 32.50
Qwen3.5-9B-Q8_0.gguf	tg32	9.87 ± 0.00	10.00 ± 0.00
Qwen3.5-9B-Q8_0.gguf	pp2048 @ d4096	221.78 ± 3.31		25134.81 ± 33.85	25132.06 ± 33.85	25134.89 ± 33.85
Qwen3.5-9B-Q8_0.gguf	tg32 @ d4096	9.78 ± 0.03	10.00 ± 0.00
Qwen3.5-9B-Q8_0.gguf	pp2048 @ d8192	221.03 ± 1.37		41972.27 ± 705.33	41969.51 ± 705.33	41972.36 ± 705.34
Qwen3.5-9B-Q8_0.gguf	tg32 @ d8192	9.69 ± 0.02	10.00 ± 0.00
Qwen3.5-9B-Q8_0.gguf	pp2048 @ d16384	221.15 ± 2.72		75485.43 ± 670.50	75482.67 ± 670.50	75485.53 ± 670.52
Qwen3.5-9B-Q8_0.gguf	tg32 @ d16384	9.50 ± 0.03	10.00 ± 0.00
Qwen3.5-9B-Q8_0.gguf	pp2048 @ d32768	175.21 ± 5.48		180395.08 ± 6403.07	180392.32 ± 6403.07	180395.14 ± 6403.07
Qwen3.5-9B-Q8_0.gguf	tg32 @ d32768	8.63 ± 0.69	9.33 ± 0.94

Qwen 3.5 9B (UD-Q4_K_XL)

model	test	t/s	peak t/s	ttfr (ms)	est_ppt (ms)	e2e_ttft (ms)
Qwen3.5-9B-UD-Q4_K_XL.gguf	pp2048	168.27 ± 2.52		11207.62 ± 180.70	11204.39 ± 180.70	11207.70 ± 180.70
Qwen3.5-9B-UD-Q4_K_XL.gguf	tg32	12.52 ± 0.03	13.00 ± 0.00
Qwen3.5-9B-UD-Q4_K_XL.gguf	pp2048 @ d4096	211.82 ± 5.74		26052.21 ± 775.94	26048.97 ± 775.94	26052.29 ± 775.93
Qwen3.5-9B-UD-Q4_K_XL.gguf	tg32 @ d4096	12.39 ± 0.04	13.00 ± 0.00
Qwen3.5-9B-UD-Q4_K_XL.gguf	pp2048 @ d8192	212.78 ± 3.01		42807.85 ± 749.67	42804.62 ± 749.67	42807.92 ± 749.67
Qwen3.5-9B-UD-Q4_K_XL.gguf	tg32 @ d8192	12.17 ± 0.05	13.00 ± 0.00
Qwen3.5-9B-UD-Q4_K_XL.gguf	pp2048 @ d16384	201.80 ± 7.98		83759.83 ± 3471.76	83756.60 ± 3471.76	83759.89 ± 3471.76
Qwen3.5-9B-UD-Q4_K_XL.gguf	tg32 @ d16384	10.17 ± 1.22	10.67 ± 0.94
Qwen3.5-9B-UD-Q4_K_XL.gguf	pp2048 @ d32768	164.12 ± 1.94		193022.98 ± 1597.32	193019.74 ± 1597.32	193023.05 ± 1597.33
Qwen3.5-9B-UD-Q4_K_XL.gguf	tg32 @ d32768	11.25 ± 0.04	12.00 ± 0.00

Gemma 4 26B-A4B (UD-IQ4_XS)

model	test	t/s	peak t/s	ttfr (ms)	est_ppt (ms)	e2e_ttft (ms)
gemma-4-26B-A4B-it-UD-IQ4_XS.gguf	pp2048	52.22 ± 1.12		36465.95 ± 895.39	36464.93 ± 895.39	36821.64 ± 909.16
gemma-4-26B-A4B-it-UD-IQ4_XS.gguf	tg32	8.84 ± 0.08	9.33 ± 0.47
gemma-4-26B-A4B-it-UD-IQ4_XS.gguf	pp2048 @ d4096	49.57 ± 1.18		111344.48 ± 3712.67	111343.46 ± 3712.67	111689.86 ± 3712.22
gemma-4-26B-A4B-it-UD-IQ4_XS.gguf	tg32 @ d4096	8.89 ± 0.06	9.00 ± 0.00
gemma-4-26B-A4B-it-UD-IQ4_XS.gguf	pp2048 @ d8192	45.81 ± 0.12		203701.92 ± 4154.70	203700.89 ± 4154.70	204047.89 ± 4153.98
gemma-4-26B-A4B-it-UD-IQ4_XS.gguf	tg32 @ d8192	8.62 ± 0.16	9.00 ± 0.00
gemma-4-26B-A4B-it-UD-IQ4_XS.gguf	pp2048 @ d16384	40.72 ± 0.32		413226.91 ± 6268.73	413225.89 ± 6268.73	413670.64 ± 6279.82
gemma-4-26B-A4B-it-UD-IQ4_XS.gguf	tg32 @ d16384	8.55 ± 0.05	9.00 ± 0.00

Device Lost Error

At times during inference, particularly under heavy loads or extended runs, the process would crash unexpectedly.

I kept encountering this Vulkan-related device lost error:

[Inferior 1 (process 2890) detached]
terminate called after throwing an instance of 'vk::DeviceLostError'
  what():  vk::Queue::submit: ErrorDeviceLost
Aborted (core dumped)

This indicates that the GPU stopped responding to Vulkan command submissions. It could be due to memory limits, power delivery, or driver stability on these older AMD FirePro cards.

Done!

Despite the occasional crash, it was an incredibly fun experiment to see a 2013 machine pushing modern local AI models. The performance on some of these quantized models isn’t half bad given the age of the hardware, pulling respectable tokens per second (let’s not talk about power consumption LOL). If you have a Trashcan Mac Pro lying around, it’s definitely worth a weekend project to set it up!