Local AI using llama.cpp on a Trashcan (Mac Pro 2013)

Kovasky Buezo | Jun 19, 2026

Intro

I recently acquired a Trashcan Mac Pro to use as a standalone dev machine. It has dual AMD FirePro D700 GPUs, and since I’ve been itching to try out locally hosted models, I ran llama.cpp to see what the performance looked like. There have been very few posts about it so I figured I’d make my own.

I got these results back in March, but never got around to writing about them. This is my first blog post after we welcomed our daughter, Aloura. Even though parenthood has been a ride, it’s been amazing seeing her grow and reach her milestones (like giggling).

Getting this to work required a bit of tweaking, especially with the AMD drivers on Linux, but the results were surprisingly interesting.

Pre-requisites

For this setup, I used Ubuntu 24.04 with the 6.19.10-zabbly kernel to ensure the AMD GPUs were properly recognized and utilized. At the time of writing, kernel 7+ has been released (and is the default kernel for zabbly installs).

To get the graphics cards working, I had to apply some specific kernel parameters. You can add these to your GRUB configuration:

radeon.cik_support=0 radeon.si_support=0 amdgpu.cik_support=1 amdgpu.si_support=1 amdgpu.dc=1

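These parameters stop the older radeon driver from claiming Southern Islands (SI) and Sea Islands (CIK) cards and let the newer amdgpu driver take them instead; amdgpu.dc=1 turns on amdgpu's Display Core display stack. The D700s are SI-generation (Tahiti) chips, so once amdgpu is bound to them they can be used for inference through Vulkan. The llama.cpp version used was b8610 (with Vulkan).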
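If you want to confirm the parameters actually took effect after a reboot, here is a minimal sketch in Python that compares them against the running kernel's command line. It assumes a Linux system where /proc/cmdline is readable; the helper name missing_params is made up for this example, and the parameter list is copied from above.

```python
from pathlib import Path

# Kernel parameters quoted in this post.
REQUIRED_PARAMS = [
    "radeon.cik_support=0",
    "radeon.si_support=0",
    "amdgpu.cik_support=1",
    "amdgpu.si_support=1",
    "amdgpu.dc=1",
]


def missing_params(cmdline: str, required=REQUIRED_PARAMS) -> list[str]:
    """Return the required parameters that are absent from a kernel command line."""
    present = set(cmdline.split())
    return [param for param in required if param not in present]


if __name__ == "__main__":
    proc_cmdline = Path("/proc/cmdline")  # Linux-only: parameters of the booted kernel
    if proc_cmdline.exists():
        missing = missing_params(proc_cmdline.read_text())
        if missing:
            print("Missing kernel parameters:", " ".join(missing))
        else:
            print("All kernel parameters are active.")
    else:
        print("/proc/cmdline not found; this check only works on Linux.")
```

On Ubuntu the parameters usually go into GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, followed by sudo update-grub and a reboot. If the script still reports anything missing after that, the edit probably did not make it into the generated GRUB config.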

Performance Results

Once the environment was set up, I ran several tests using llama-bench to measure the performance across different models with varying context lengths.

Here are the results:

GPT-OSS 20B (Q4_K_S)

| model | test | t/s | peak t/s | ttfr (ms) | est_ppt (ms) | e2e_ttft (ms) |
| --- | --- | --- | --- | --- | --- | --- |
| gpt-oss-20b-Q4_K_S.gguf | pp2048 | 157.85 ± 1.25 | | 11331.73 ± 308.50 | 11328.95 ± 308.50 | 11532.27 ± 310.15 |
| gpt-oss-20b-Q4_K_S.gguf | tg32 | 15.14 ± 0.03 | 16.00 ± 0.00 | | | |
| gpt-oss-20b-Q4_K_S.gguf | pp2048 @ d4096 | 148.99 ± 0.90 | | 37101.57 ± 805.79 | 37098.79 ± 805.79 | 37305.29 ± 806.57 |
| gpt-oss-20b-Q4_K_S.gguf | tg32 @ d4096 | 15.17 ± 0.17 | 15.67 ± 0.47 | | | |
| gpt-oss-20b-Q4_K_S.gguf | pp2048 @ d8192 | 140.44 ± 0.16 | | 64457.52 ± 312.34 | 64454.73 ± 312.34 | 64662.92 ± 315.27 |
| gpt-oss-20b-Q4_K_S.gguf | tg32 @ d8192 | 14.55 ± 0.01 | 15.00 ± 0.00 | | | |
| gpt-oss-20b-Q4_K_S.gguf | pp2048 @ d16384 | 126.07 ± 0.24 | | 128336.23 ± 662.56 | 128333.45 ± 662.56 | 128546.71 ± 663.86 |
| gpt-oss-20b-Q4_K_S.gguf | tg32 @ d16384 | 14.03 ± 0.02 | 15.00 ± 0.00 | | | |
| gpt-oss-20b-Q4_K_S.gguf | pp2048 @ d32768 | 103.57 ± 0.01 | | 295724.27 ± 688.69 | 295721.49 ± 688.69 | 295956.26 ± 686.78 |
| gpt-oss-20b-Q4_K_S.gguf | tg32 @ d32768 | 13.15 ± 0.12 | 14.00 ± 0.00 | | | |

Qwen 3.5 9B (Q8_0)

| model | test | t/s | peak t/s | ttfr (ms) | est_ppt (ms) | e2e_ttft (ms) |
| --- | --- | --- | --- | --- | --- | --- |
| Qwen3.5-9B-Q8_0.gguf | pp2048 | 178.50 ± 1.67 | | 10277.55 ± 32.49 | 10274.80 ± 32.49 | 10277.64 ± 32.50 |
| Qwen3.5-9B-Q8_0.gguf | tg32 | 9.87 ± 0.00 | 10.00 ± 0.00 | | | |
| Qwen3.5-9B-Q8_0.gguf | pp2048 @ d4096 | 221.78 ± 3.31 | | 25134.81 ± 33.85 | 25132.06 ± 33.85 | 25134.89 ± 33.85 |
| Qwen3.5-9B-Q8_0.gguf | tg32 @ d4096 | 9.78 ± 0.03 | 10.00 ± 0.00 | | | |
| Qwen3.5-9B-Q8_0.gguf | pp2048 @ d8192 | 221.03 ± 1.37 | | 41972.27 ± 705.33 | 41969.51 ± 705.33 | 41972.36 ± 705.34 |
| Qwen3.5-9B-Q8_0.gguf | tg32 @ d8192 | 9.69 ± 0.02 | 10.00 ± 0.00 | | | |
| Qwen3.5-9B-Q8_0.gguf | pp2048 @ d16384 | 221.15 ± 2.72 | | 75485.43 ± 670.50 | 75482.67 ± 670.50 | 75485.53 ± 670.52 |
| Qwen3.5-9B-Q8_0.gguf | tg32 @ d16384 | 9.50 ± 0.03 | 10.00 ± 0.00 | | | |
| Qwen3.5-9B-Q8_0.gguf | pp2048 @ d32768 | 175.21 ± 5.48 | | 180395.08 ± 6403.07 | 180392.32 ± 6403.07 | 180395.14 ± 6403.07 |
| Qwen3.5-9B-Q8_0.gguf | tg32 @ d32768 | 8.63 ± 0.69 | 9.33 ± 0.94 | | | |

Qwen 3.5 9B (UD-Q4_K_XL)

| model | test | t/s | peak t/s | ttfr (ms) | est_ppt (ms) | e2e_ttft (ms) |
| --- | --- | --- | --- | --- | --- | --- |
| Qwen3.5-9B-UD-Q4_K_XL.gguf | pp2048 | 168.27 ± 2.52 | | 11207.62 ± 180.70 | 11204.39 ± 180.70 | 11207.70 ± 180.70 |
| Qwen3.5-9B-UD-Q4_K_XL.gguf | tg32 | 12.52 ± 0.03 | 13.00 ± 0.00 | | | |
| Qwen3.5-9B-UD-Q4_K_XL.gguf | pp2048 @ d4096 | 211.82 ± 5.74 | | 26052.21 ± 775.94 | 26048.97 ± 775.94 | 26052.29 ± 775.93 |
| Qwen3.5-9B-UD-Q4_K_XL.gguf | tg32 @ d4096 | 12.39 ± 0.04 | 13.00 ± 0.00 | | | |
| Qwen3.5-9B-UD-Q4_K_XL.gguf | pp2048 @ d8192 | 212.78 ± 3.01 | | 42807.85 ± 749.67 | 42804.62 ± 749.67 | 42807.92 ± 749.67 |
| Qwen3.5-9B-UD-Q4_K_XL.gguf | tg32 @ d8192 | 12.17 ± 0.05 | 13.00 ± 0.00 | | | |
| Qwen3.5-9B-UD-Q4_K_XL.gguf | pp2048 @ d16384 | 201.80 ± 7.98 | | 83759.83 ± 3471.76 | 83756.60 ± 3471.76 | 83759.89 ± 3471.76 |
| Qwen3.5-9B-UD-Q4_K_XL.gguf | tg32 @ d16384 | 10.17 ± 1.22 | 10.67 ± 0.94 | | | |
| Qwen3.5-9B-UD-Q4_K_XL.gguf | pp2048 @ d32768 | 164.12 ± 1.94 | | 193022.98 ± 1597.32 | 193019.74 ± 1597.32 | 193023.05 ± 1597.33 |
| Qwen3.5-9B-UD-Q4_K_XL.gguf | tg32 @ d32768 | 11.25 ± 0.04 | 12.00 ± 0.00 | | | |

Gemma 4 26B-A4B (UD-IQ4_XS)

| model | test | t/s | peak t/s | ttfr (ms) | est_ppt (ms) | e2e_ttft (ms) |
| --- | --- | --- | --- | --- | --- | --- |
| gemma-4-26B-A4B-it-UD-IQ4_XS.gguf | pp2048 | 52.22 ± 1.12 | | 36465.95 ± 895.39 | 36464.93 ± 895.39 | 36821.64 ± 909.16 |
| gemma-4-26B-A4B-it-UD-IQ4_XS.gguf | tg32 | 8.84 ± 0.08 | 9.33 ± 0.47 | | | |
| gemma-4-26B-A4B-it-UD-IQ4_XS.gguf | pp2048 @ d4096 | 49.57 ± 1.18 | | 111344.48 ± 3712.67 | 111343.46 ± 3712.67 | 111689.86 ± 3712.22 |
| gemma-4-26B-A4B-it-UD-IQ4_XS.gguf | tg32 @ d4096 | 8.89 ± 0.06 | 9.00 ± 0.00 | | | |
| gemma-4-26B-A4B-it-UD-IQ4_XS.gguf | pp2048 @ d8192 | 45.81 ± 0.12 | | 203701.92 ± 4154.70 | 203700.89 ± 4154.70 | 204047.89 ± 4153.98 |
| gemma-4-26B-A4B-it-UD-IQ4_XS.gguf | tg32 @ d8192 | 8.62 ± 0.16 | 9.00 ± 0.00 | | | |
| gemma-4-26B-A4B-it-UD-IQ4_XS.gguf | pp2048 @ d16384 | 40.72 ± 0.32 | | 413226.91 ± 6268.73 | 413225.89 ± 6268.73 | 413670.64 ± 6279.82 |
| gemma-4-26B-A4B-it-UD-IQ4_XS.gguf | tg32 @ d16384 | 8.55 ± 0.05 | 9.00 ± 0.00 | | | |
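One way to read these numbers is to ask how much generation speed survives as the context fills up. The sketch below is just arithmetic on the tg32 means from the results above (depth 0 versus the deepest depth tested for each model); the helper name retained_pct and the dictionary layout are my own choices for illustration, and the ± spreads are ignored.

```python
# tg32 mean t/s at depth 0 and at the deepest depth tested, copied from the results above.
# Gemma was only run up to d16384; the other three go to d32768.
TG_TPS = {
    "gpt-oss-20b Q4_K_S": (15.14, 13.15),
    "Qwen3.5-9B Q8_0": (9.87, 8.63),
    "Qwen3.5-9B UD-Q4_K_XL": (12.52, 11.25),
    "gemma-4-26B-A4B UD-IQ4_XS": (8.84, 8.55),
}


def retained_pct(base_tps: float, deep_tps: float) -> float:
    """Percentage of the depth-0 throughput that remains at the deeper context."""
    return 100.0 * deep_tps / base_tps


if __name__ == "__main__":
    for model, (base, deep) in TG_TPS.items():
        print(f"{model}: {retained_pct(base, deep):.1f}% of depth-0 generation speed retained")
```

Since Gemma stops at a shallower depth than the others, its figure is not directly comparable to the other three.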

Device Lost Error

At times during inference, particularly under heavy loads or extended runs, the process would crash unexpectedly.

I kept encountering this Vulkan-related device lost error:

[Inferior 1 (process 2890) detached]
terminate called after throwing an instance of 'vk::DeviceLostError'
  what():  vk::Queue::submit: ErrorDeviceLost
Aborted (core dumped)

This indicates that the GPU stopped responding to Vulkan command submissions. It could be due to memory limits, power delivery, or driver stability on these older AMD FirePro cards.
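I did not find a real fix, but for long benchmark runs a crude workaround is to rerun the command when it dies with this specific error. Here is a minimal sketch in Python, assuming the crash message reaches stderr as shown above. The function name run_with_retry is hypothetical, and the command to run is whatever you pass in, so nothing here depends on particular llama.cpp flags.

```python
import subprocess
import sys

# Substring taken from the crash message above.
DEVICE_LOST_MARKER = "ErrorDeviceLost"


def run_with_retry(cmd: list[str], max_attempts: int = 3) -> subprocess.CompletedProcess:
    """Run cmd, retrying only when it fails with the device-lost marker on stderr."""
    for attempt in range(1, max_attempts + 1):
        result = subprocess.run(cmd, capture_output=True, text=True)
        device_lost = result.returncode != 0 and DEVICE_LOST_MARKER in result.stderr
        # Return on success, on unrelated failures, or once the attempts are used up.
        if not device_lost or attempt == max_attempts:
            return result
        print(f"attempt {attempt}: device lost, retrying", file=sys.stderr)
    raise ValueError("max_attempts must be at least 1")


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere; swap in your own benchmark command.
    outcome = run_with_retry([sys.executable, "-c", "print('benchmark finished')"])
    print(outcome.stdout.strip(), "| exit code:", outcome.returncode)
```

This only papers over the problem. If the GPU is left in a bad state after the error, a retry may fail the same way until the driver recovers or the machine is rebooted.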

Done!

Despite the occasional crash, it was an incredibly fun experiment to see a 2013 machine pushing modern local AI models. The performance on some of these quantized models isn’t half bad given the age of the hardware: token generation landed between roughly 8.5 and 15 tokens per second across the models and context depths I tested (let’s not talk about power consumption LOL). If you have a Trashcan Mac Pro lying around, it’s definitely worth a weekend project to set it up!