4 GPUs, 120 GB VRAM, 60W idle — how a tiny MiniPC became a 235B inference server
The AOOSTAR GEM 10 MiniPC was originally bought as a simple home server. Then the GPU addiction started.
AOOSTAR AG01 eGPU adapter, Tesla P40, connected via OCuLink. Worked immediately. Running 30B models.
I was not done.
AOOSTAR AG02 eGPU adapter with another P40 via USB4. Also worked immediately. The MiniPC handles both OCuLink and USB4 simultaneously — they don't share lanes. Before buying, AOOSTAR support confirmed this would work.
M.2-to-OCuLink adapter (K49SQBK, PCIe 5.0, active chip) plugged into a free internal M.2 slot. To get the cable out: sawed a slot into the fan grille on the side panel. Not pretty, but it works. Connected another AG01 + P40.
AOOSTAR support said M.2-to-OCuLink should work in principle. It did.
Bought a Quadro RTX 8000 (48 GB). It would NOT work over OCuLink — wouldn't even complete POST. Hung at the handshake. P40s worked fine in the same slot.
Tried different BIOS settings, tried the Smokeless BIOS tool to access hidden UEFI variables — nothing helped. Moved it to the AG02 (USB4) where it worked, but that meant losing a P40 slot. Days of frustration.
The problem: GEM 10's BIOS doesn't expose Resizable BAR settings, and the RTX 8000 needs a BAR larger than 256 MB to work over OCuLink. P40s are older and don't care.
ReBarState, the companion tool from the ReBarUEFI project, writes the BAR size directly into UEFI NVRAM. Set it to 4 GB, rebooted, and the RTX 8000 worked everywhere: OCuLink, M.2 adapter, AG01. Nearly fell off my chair.
Don't bother with the Smokeless BIOS tool if you need ReBAR — go straight to ReBarUEFI.
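For reference, this is roughly what the check and the fix look like from Linux. A sketch, not a full guide: the PCI address is a placeholder, and the power-of-two convention (12 for 4 GB) is my reading of the ReBarUEFI README, so verify against the current docs before flashing anything.

```bash
# Placeholder bus address; find the real one with `lspci | grep -i nvidia`.
# Before the fix, BAR1 (the VRAM aperture) is capped at 256M:
sudo lspci -vv -s 01:00.0 | grep -i "region 1"
#   Region 1: Memory at ... [size=256M]

# ReBarState (from the ReBarUEFI project) prompts interactively for the
# BAR size as a power-of-two exponent in MB: 12 -> 2^12 MB = 4 GB.
sudo ./ReBarState

# After a reboot, Region 1 should report [size=4G].
sudo lspci -vv -s 01:00.0 | grep -i "region 1"
```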
One more AG01 adapter + M.2-to-OCuLink adapter (second sawed slot in the fan grille). Each connection: PCIe x4, not shared, measured and verified.
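"Measured and verified" here just means asking the driver: nvidia-smi reports the negotiated link per GPU (field names per current drivers, older ones may differ slightly).

```bash
# Every GPU should show a dedicated x4 link, not a shared or degraded one.
nvidia-smi --query-gpu=index,name,pcie.link.width.current,pcie.link.gen.current \
           --format=csv
```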
| GPU | VRAM | Connection | Adapter |
|---|---|---|---|
| Tesla P40 #1 | 24 GB | OCuLink (external port) | AG01 |
| Tesla P40 #2 | 24 GB | M.2 → OCuLink (sawed grille) | AG01 |
| Tesla P40 #3 | 24 GB | M.2 → OCuLink (sawed grille) | AG01 |
| RTX 8000 | 48 GB | USB4 (external port) | AG02 |
| Total | 120 GB (~115 GB usable) | | |
The MiniPC with OCuLink cables running to AG01 adapters and USB4 to the AG02. The two yellow cables are Ethernet — one for LAN, one for direct point-to-point RPC to the development machine.
The complete "server rack" — a wooden shelf with 3x AG01 + 1x AG02 eGPU adapters, each holding a GPU. The desk fan is for the operator, not the GPUs.
The P40s and RTX 8000 are server/workstation cards: passive or blower-style coolers designed for chassis airflow that doesn't exist on an open shelf. Solution: 3D-printed fan adapters holding BFB1012HH blowers, driven by PWM fan controllers with temperature probes.
Initially tried higher-CFM fans (BFB1012VH) — unbearably loud and didn't cool any better. The BFB1012HH are the sweet spot: quiet enough to live with, even at full speed. Even at 100% GPU load, nvidia-smi rarely shows temperatures above 50°C.
The eGPU adapters have small built-in fans, but they rarely spin up.
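Temperatures are easy to keep an eye on from the host. One way, polling every 5 seconds:

```bash
# Poll GPU temperature, power and load every 5 s; the BFB1012HH blowers are
# external, so only the on-die sensors are visible here.
nvidia-smi --query-gpu=index,name,temperature.gpu,power.draw,utilization.gpu \
           --format=csv -l 5
```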
| Component | Price | Source |
|---|---|---|
| AOOSTAR GEM 10 MiniPC | ~€450 | New |
| Tesla P40 #1 + #2 | ~€190 each | AliExpress (+ customs) |
| Tesla P40 #3 | ~€200 | AliExpress (+ customs) |
| RTX 8000 | ~€1,200 | Used, Germany |
| AG01 eGPU adapter (x3) | ~€155 each | AOOSTAR |
| AG02 eGPU adapter (x1) | ~€210 | AOOSTAR |
| M.2-to-OCuLink (x2, K49SQBK, PCIe 5.0) | ~€45-50 each + customs | AliExpress |
| BFB1012HH fans (x4) | ~€10 each | AliExpress |
| PWM fan controllers (x4) | ~€10 each | AliExpress |
| 3D-printed fan adapters | Free | Self-printed |
| Total | ~€3,200 | |
| Component | Idle Power |
|---|---|
| Tesla P40 (x3) | ~9-10W each = ~30W |
| RTX 8000 | ~20W |
| MiniPC | ~7-10W |
| Total | ~60W |
A 120 GB VRAM inference server at 60W idle. Try that with a proper server rack.
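The GPU share of that figure can be checked straight from the driver; the MiniPC's own draw needs a wall meter. A quick one-liner:

```bash
# Sum the per-GPU idle draw reported by nvidia-smi (watts, no units/header).
nvidia-smi --query-gpu=power.draw --format=csv,noheader,nounits \
  | awk '{sum += $1} END {printf "GPUs idle: %.0f W\n", sum}'
```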
| Model | Size | Quant | GPUs | Tensor Split | Context | KV Cache | TG tok/s |
|---|---|---|---|---|---|---|---|
| Qwen3-4B Instruct | 4B | Q8_0 | 1 (RTX 8000) | — | 262K | f16 | ~30 |
| Qwen3-14B Base | 14B | Q4_K_M | 1 (RTX 8000) | — | 41K | f16 | ~25 |
| Qwen3-30B-A3B Instruct | 30B MoE | Q8_0 | 2 | — | 262K | f16 | ~35 |
| Qwen3-VL-30B-A3B (Vision) | 30B MoE | Q8_0 | 2 | — | 262K | f16 | ~30 |
| GPT-OSS-120B-A5B | 120B MoE | Q8_K_XL | 4 | 2:1:1:1 | 131K | f16 | ~50 |
| Qwen3-Next-80B-A3B | 80B MoE | Q8_K_XL | 4 | 22:9:9:8 | 262K | f16 | ~35 |
| Qwen3.5-122B-A10B | 122B MoE | Q5_K_XL | 4 | 2:1:1:1 | 262K | f16 | ~21 |
| Nemotron-3-Super-120B | 120B NAS-MoE | Q5_K_XL | 4 | 2:1:1:1 | 874K | f16 | ~17 |
| Qwen3-235B-A22B Instruct | 235B MoE | Q3_K_XL | 4 | 2:1:1:1 | 112K | q8_0 | ~11 |
All models GPU-only (ngl=99), flash-attn, Direct-IO, mlock. Context sizes auto-calibrated by AIfred to maximize available VRAM. The 2:1:1:1 tensor split gives RTX 8000 twice as many layers as each P40 (proportional to VRAM: 48:24:24:24). Qwen3-Next-80B uses a custom 22:9:9:8 split optimized by AIfred's calibration algorithm.
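For illustration, a launch for the 235B row might look roughly like this with stock llama.cpp flags. A sketch, not the exact AIfred-generated command: the model filename is an example, flag spellings vary across llama.cpp versions, and the Direct-IO option is left out because its availability depends on the build.

```bash
# All layers on GPU, 2:1:1:1 split (RTX 8000 takes double a P40's share),
# flash attention, mlock'd weights, 112K context, q8_0 KV cache.
./llama-server \
  -m models/Qwen3-235B-A22B-Instruct-Q3_K_XL.gguf \
  -ngl 99 \
  --tensor-split 2,1,1,1 \
  --flash-attn \
  --mlock \
  -c 114688 \
  --cache-type-k q8_0 --cache-type-v q8_0
```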
Model lifecycle managed by llama-swap — models auto-swap on request, Direct-IO makes loading near-instant.
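In practice the client never thinks about loading: llama-swap fronts an OpenAI-compatible endpoint, and the `model` field of a request decides what gets swapped in. Host, port and model alias below are placeholders, not defaults.

```bash
# Requesting a model that isn't loaded makes llama-swap swap it in first.
curl -s http://minipc.lan:8080/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
        "model": "qwen3-235b-a22b",
        "messages": [{"role": "user", "content": "Say hello."}]
      }'
```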
If another RTX 8000 shows up at a reasonable price, a P40 gets swapped. The dream of 4x RTX 8000 = 192 GB VRAM is alive — ReBAR is sorted, just need the cards.
For €3,200 you could probably get a 128 GB unified memory MiniPC and call it a day. But I didn't know where this was going when I started. One GPU became two, two became four, and suddenly I'm sawing fan grilles. That's how hobbies work. And honestly, the building was half the fun.