And of course I am writing this as my first real post here on my developer site. I find myself with so many ongoing projects that it would genuinely help to have a more centralized place to document them all. This post is more akin to a status update, but do expect more content in the form of guides on this site at some point.
I recently began working with LLMs (the technology behind ChatGPT, but run on your own computer). And to be honest, I quickly came to realize the staggering costs of even running an LLM, let alone figuring out why you would do so in the first place. An LLM runs incredibly slowly on a personal laptop that lacks a desktop-class GPU. So I began taking a more serious look at older GPUs, such as NVIDIA's Tesla series, which now sell for incredibly low prices because they are so difficult to install and use: they were designed to be installed in dedicated server racks that simply churn numbers all day.
I am thrilled to present my significant contribution to the enhancement of the public access chat streaming API, https://t.co/ExIk8Csqhe which provides a session-based chat queries and websocket management, has seen a performance increase of over 100% thanks to @NVIDIATesla
— Mr. Ziping Liu 子平六 (@TheFakeLiu) October 19, 2023
External GPU cases became quite popular a few years back, when Thunderbolt 3 laptops began to support connecting devices through their USB-C ports at 40 gigabits per second. This allowed for peripherals that provided more functionality than simple memory devices such as USB flash drives. External GPU enclosures became a new kind of peripheral that could give a laptop almost desktop-like GPU capability. Unfortunately the trend has died down in popularity, owing to the difficulty of setting up a laptop to interface properly with an external GPU.
Connecting the NVIDIA Tesla GPU to my External GPU Adapter
I wanted to see if I could get away with buying a GPU without building an entire desktop around it. After days of trying every method I could muster to get the NVIDIA Tesla GPU working with my external GPU adapter (which is designed for gaming GPUs), I finally got a result that exceeded my expectations.
As a quick summary of how I connected my Thunderbolt 3 enabled MacBook Pro to Tesla-series (K80 and P40) GPUs, the configuration steps below are what worked after trying many different approaches. After a bit of struggle, I finally had a setup that recognized the NVIDIA Tesla GPU through the USB-C Thunderbolt 3 connection via an external GPU adapter, the AKiTiO Node Pro.
Operating System Setup
Windows 10 was the only OS that proved feasible. I ended up choosing build version 1903 (V1), which can be downloaded via archive.org. I tried later versions of Windows, but they gave me the dreaded Code 12 error. macOS is out of the question, as it does not support NVIDIA GPUs. On Linux, I tried Ubuntu 20.04 but could not get the Thunderbolt connection to recognize the GPU fully. (I do think it is feasible, since the card did show up as a device, but it was recognized as some sort of USB peripheral and I could not find a way to attach it properly as a display device.)
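If you want to confirm which build you are on before going further, a quick Python check inside Windows will tell you (build 18362 is the 1903 release):

```python
# Confirm the Windows build before attempting the eGPU connection;
# build 18362 corresponds to Windows 10 1903.
import sys

build = sys.getwindowsversion().build
print(f"Windows build: {build}")
print("On 1903, good to go" if build == 18362 else "Not 1903: Code 12 risk")
```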
Power Connection Setup
The Tesla GPU is installed in the AKiTiO Node Pro Thunderbolt 3 external PCIe GPU enclosure, which offers a standard PCIe x16 slot. In terms of power, the Tesla requires a non-traditional 8-pin ATX input. This minor inconvenience is easily rectified with an ATX 8-pin to PCIe 8-pin power cable: the PCIe 8-pin end connects to the power supply unit, and the ATX 8-pin end to the NVIDIA Tesla, adequately meeting the GPU's power requirements. At minimum, the power cable needs to provide 75 watts (assuming the PCIe x16 slot provides another 75 watts), given that the NVIDIA Tesla GPUs require at least 150 watts to function.

I changed the default power connection setup that the Node Pro uses by doing the following:
- I used a very basic power connector: an 8-pin ATX to 6+2-pin PCIe power cable. The 8-pin ATX end connects to the NVIDIA Tesla GPU, while the PCIe end connects to an 8-pin PCIe power output on the PSU.
- The PSU that the AKiTiO Node uses has two 8-pin PCIe outputs, and both are used in the default configuration with PCIe connectors intended to power a typical gaming GPU. That configuration obviously does not work with the Tesla, which takes its power through an 8-pin ATX connector.
- I replaced one of the PCIe outputs with my PCIe-to-ATX power cable, allowing the Tesla GPU to connect to the PSU through its ATX input.
- I did not use any power splitters, since I had no need for more power than the minimum my connector provides. (The Tesla-series GPUs can draw up to 300 watts.)
- The PCIe-to-ATX power cable that I use to power the Tesla provides only 75 watts to the GPU.
- Fortunately, the AKiTiO Node Pro's PCIe x16 slot that the Tesla sits in provides an additional 75 watts.
- With this configuration I can supply the Tesla GPU with 150 watts, the minimum it needs to run, and in my case that has been more than sufficient for running inference from, and even training, language models. (A quick power-budget sketch follows this list.)
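Spelled out, the power budget for this setup looks like the following; the wattages are those of my cabling, and the 150 W floor and 300 W ceiling are the card limits mentioned above:

```python
# Back-of-the-envelope power budget for the Tesla in the AKiTiO Node Pro.
SLOT_WATTS = 75     # delivered through the enclosure's PCIe x16 slot
CABLE_WATTS = 75    # delivered by the PCIe-to-ATX cable from the PSU
MIN_REQUIRED = 150  # minimum for the Tesla to function
MAX_DRAW = 300      # maximum the Tesla series can pull

available = SLOT_WATTS + CABLE_WATTS
print(f"Available: {available} W, minimum met: {available >= MIN_REQUIRED}")
print(f"Shortfall at the full {MAX_DRAW} W draw: {MAX_DRAW - available} W")
```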
Thunderbolt Connection to MacBook Setup
The GPU is connected to the MacBook by hot plugging the AKiTiO Node Pro. In general, with eGPU setups on MacBooks, a connection is only possible via hot plugging the GPU.
This means the eGPU connection must happen only after the MacBook Pro has fully booted into Windows. In my case, I simply boot the MacBook through Boot Camp into Windows 10 and log in to my Windows user account as normal. I make sure, however, that no other devices are connected to the MacBook's USB-C/Thunderbolt ports, as they can cause issues with the GPU connecting to the MacBook. (The GPU connection requires a lot of resources and can be affected by other USB connections.)
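Once the enclosure is hot plugged, you can check from inside Windows whether the card enumerated cleanly. Here is a minimal sketch using the third-party wmi Python package (any WMI browser works just as well); a ConfigManagerErrorCode of 12 is exactly the Code 12 resource error mentioned earlier:

```python
# List display adapters and their Device Manager status codes.
# Requires: pip install wmi. Code 12 == "cannot find enough free resources".
import wmi

for gpu in wmi.WMI().Win32_VideoController():
    print(f"{gpu.Name}: ConfigManagerErrorCode={gpu.ConfigManagerErrorCode}")
```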

Once Windows 10 finally recognized the connection as a proper display device, I simply followed the standard instructions for installing NVIDIA's CUDA libraries and drivers, and then got PyTorch working. As mentioned above, I found the performance to be remarkably similar to my GPU server, which uses a newer NVIDIA Titan RTX.
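With the drivers and PyTorch in place, a sanity check like the one below confirms the card is visible and can actually run a model end to end. Hugging Face transformers and GPT-2 are stand-ins here, chosen only because they make for a quick test; they are not necessarily the stack behind the comparisons above:

```python
# Verify CUDA sees the Tesla over the Thunderbolt link, then run a
# tiny text generation as an end-to-end smoke test.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

assert torch.cuda.is_available(), "No CUDA device detected"
print(torch.cuda.get_device_name(0))  # e.g. "Tesla P40"

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").to("cuda")

inputs = tokenizer("External GPUs are", return_tensors="pt").to("cuda")
output = model.generate(**inputs, max_new_tokens=50, do_sample=True,
                        pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```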
Inference using the NVIDIA Tesla P40 via the Thunderbolt eGPU setup remarkably showcases performance on par with desktops running modern GPUs, even though eGPU connections over Thunderbolt run at less than half the bandwidth of an actual PCIe connection to the motherboard.

As for my own uses of the GPU, I began training models, centered on language models (although I also fine-tuned Stable Diffusion models with great success). I found language models far more interesting, at least for me as a writer and author. Like a lot of creators and artists, I had begun to think that AI would simply be a new competitor of mine as a writer, or even as a coder. But having now seen the true abilities and functionality of this new generative AI movement, I beg to differ.
I can sum this up in a simple rule I've realized about generative AI (and machine-learning-based programs in general), a realization regarding its inherent limitations and potential:
In its most sublime form, it does not merely serve as a reflection or approximation of the artist upon whose work it is modeled, but rather, it exists as a mere shadow of the artist’s original creation.
Or, put in more mathematical terms:
This reflection, no matter how abstract, can never truly attain the full magnitude of the original artist's capacity. The AI's decisions are predicated on a simplified representation of the artist's work, which it employs as a form of abstraction to generate further creations, and that process necessitates an initial spark of inspiration from which to germinate. It is much like a Taylor series, which rebuilds a function from local information about it:

\[ f(a) + \frac{f'(a)}{1!}(x-a) + \frac{f''(a)}{2!}(x-a)^{2} + \frac{f'''(a)}{3!}(x-a)^{3} + \cdots \]
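Taylor's theorem even quantifies what a truncation at any finite order n leaves out, via the remainder term

\[ R_n(x) = \frac{f^{(n+1)}(\xi)}{(n+1)!}\,(x-a)^{n+1}, \qquad \text{for some } \xi \text{ between } a \text{ and } x, \]

and for a fixed order that remainder vanishes everywhere only when f is itself a polynomial: the reflection never quite becomes the thing reflected.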
As a passionate writer, I have come to realize that I possess the ability to further enhance the capabilities of generative AI, rather than be pushed aside by it. This realization has not only given me a remarkable avenue for personal growth as a writer, but has also led me to devise highly specialized language models that are incredibly insightful in responding to specific data inputs.
In recent times, you may have noticed a surge in companies adopting "AI" chat assistants. Regrettably, these tools often fall short, providing only general responses and failing to address matters specific to their intended purpose as a business's support agent. This is primarily because training a language model to be genuinely useful on a specialized set of data is a far more complex and involved process than simply feeding in as much data as possible to create a giant language model, as in the case of ChatGPT. While these large language models can answer general queries, their performance on specialized knowledge is less than satisfactory, and their understanding of how to support a company's internal business services is virtually non-existent.
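For contrast, the naive baseline, roughly what many of these bolt-on assistants amount to, is plain fine-tuning of a small base model on whatever text is at hand. Here is a minimal sketch with Hugging Face transformers; the model name, data file, and hyperparameters are placeholders, and this is emphatically not the specialized workflow I mention next:

```python
# Naive baseline: generic causal-LM fine-tuning on a pile of domain text.
# "support_docs.txt" is a hypothetical file with one training example per line.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "gpt2"  # placeholder base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 ships without a pad token
model = AutoModelForCausalLM.from_pretrained(model_name)

dataset = load_dataset("text", data_files={"train": "support_docs.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset["train"].map(tokenize, batched=True,
                                 remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetune-out",
                           num_train_epochs=3,
                           per_device_train_batch_size=2),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```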
However, I am pleased to share that I may have discovered a workflow for the specialized training of language models, as demonstrated by my examples. I am hopeful that I will be able to compile a guide on the subject soon, so do stay tuned for further updates.
Have something to say? Leave a comment, make yourself heard.