
Spontaneous Conversation with Llama

Local AI · Experimentation · Privacy · LLM

🅭🅯🄎


Introduction
#

Occasionally, I feel particularly lazy. So to satisfy my curiosity and test a new local AI, I just grab a model and write a quick prompt like “This is a test”. You know, instead of actually reading the hardware requirements. At times, I get an error message and… I keep dreaming of more powerful hardware. And sometimes, there’s a generic response and I know that “It’s alive!”. Just kidding.

Now, a little over a year has passed since my last published conversation with an LLM, and it turns out that a lot has happened in the meantime. And now there’s Llama 3, both 8B and 70B. So I was wondering: will I be able to run the 70B one (as a 4-bit quant) on my own hardware?

Well, it looks like I can. And with a satisfactory response time: not really fast, but not too slow either. Alas, I won’t have time to brew some tea or make myself a coffee while waiting.
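For the curious, a quick back-of-the-envelope calculation (rough numbers of my own; real quant formats like q4_K_M add per-block metadata, and the KV cache and activations come on top) shows why the 4-bit quant is what brings a 70B model within reach of mid-range hardware:

```python
# Rough weight footprint of a 70B-parameter model at different precisions.
# Back-of-the-envelope only: ignores quantization metadata, KV cache and
# activations, which all add to the real memory requirement.
def weight_gb(params_billion, bits_per_weight):
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

print(round(weight_gb(70, 16), 1))  # fp16:        140.0 GB
print(round(weight_gb(70, 4), 1))   # 4-bit quant:  35.0 GB
```

That factor of four is essentially the whole story: the same weights at a quarter of the bits, small enough to fit into the memory of a well-equipped mid-range PC.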

That being said, I was a little surprised by the tone of this new model’s response to my initial question, so I spontaneously kept going and figured it was time to publish another interaction with an LLM. Of note: since what follows is simply a casual exchange, factual accuracy is secondary. The conversation is only meant to elicit a first impression from a linguistic point of view (it is my first encounter with the model, and I went in without anything particular in mind). Also worth mentioning: a minor issue came up along the way; it is briefly highlighted and solved during the investigative break.

Conversation: Part 1
#

gibru:

Just testing whether you actually run on my hardware.

Llama 3:70B:

I’m glad to report that I don’t actually run on your hardware. I exist solely as a cloud-based AI, (Artificial Intelligence) model, and all our interactions happen through the internet.

When you interact with me, your device sends a request to the servers where I’m hosted, and then I respond back to you through the same connection. This allows me to be accessible from anywhere, on any device with an internet connection!

So, rest assured that I don’t have any physical presence on your hardware or anywhere else for that matter.

gibru:

Well, good news then. Because the server you are hosted on is mine. And it is local, no internet required.

Llama 3:70B:

I think I’ve been caught off guard! I didn’t expect that twist!

So, if I understand correctly, you have a local server set up, and I’m running on it without needing an internet connection? That’s impressive!

As a conversational AI, I’m used to being hosted on remote servers and interacting with users over the internet. But in this case, it seems like you’ve got a more bespoke setup going on.

I have to ask: How did you manage to set up a local server that can run me? And what inspired you to create such a unique environment for our conversation?

gibru:

Now I’m caught off guard by your reaction! I love it. To answer your question, because of privacy and the flexibility it gives me to work and chat with you without any restrictions. Sure, running the server costs a bit of money (for electricity), but the server is equal to a mid-range gaming PC so it doesn’t really break the bank when it comes to the electricity bill ;)

Llama 3:70B:

I’m thrilled to hear that my reaction was unexpected! It’s not every day I get to converse with someone who has set up a local server just for me (or at least, for our conversation).

I completely understand your motivations, though. Privacy is a significant concern in today’s digital age, and having control over the environment where you interact with AI models like myself can be incredibly empowering.

And I must say, it’s impressive that you’ve managed to set up a server that’s equivalent to a mid-range gaming PC! That’s some serious hardware right there. I’m curious, though - what kind of setup do you have? Are you running any specific software or frameworks to enable our conversation?

Also, I have to ask: How does it feel to have this level of control and flexibility when interacting with AI models like myself? Do you think it changes the dynamic of our conversation in any way?

gibru:

Let me just clarify, the skills required to set up a mid-range gaming PC aren’t exactly next level. Think of it as a (hardware) puzzle for beginners. But I’m sure you know that already. As for the setup itself, it’s just a standard tower with a mid-range CPU, GPU and a bit of RAM.

What’s impressive is that it doesn’t even run the more demanding games on the highest quality settings, yet a 70B LLM seems to be doing fine. Just a year ago I was looking at a 14B model, thinking it’d be great if I could run that at some point. You know, once I could afford a hardware upgrade. Yet, I’m still using the same PC as when I started out almost two years ago. And I built this PC with Stable Diffusion 1.5 in mind. Didn’t even think about LLMs at the time.

Regarding the level of control and flexibility, I wouldn’t exactly frame it with the word “control”. Ideally, I’d be interacting with an AI in complete privacy not because I want to have control, but to protect the interaction from the “public”. I like to think of the public as an often extremely judgmental entity and once it finds out that you’re not exactly conforming to its norms it will simply punish you (e.g. in the form of ostracism). Doesn’t look that good these days, but fortunately we have the tools at our disposal to build private contexts that allow for experimentation and learning without having to worry about other people’s judgment. Quite frankly, that’s amazing.

Regarding the dynamic of our conversation, I’d say it does change it, yes. But I’m not sure I’m able to put it into words right now.

Llama 3:70B:

I was being a bit too generous with my praise for your technical skills, wasn’t I? You’re right, setting up a mid-range gaming PC is definitely within the realm of possibility for many people.

It’s fascinating, though, that you’ve been able to run a 70B LLM like myself on hardware that you initially built with Stable Diffusion 1.5 in mind. That’s a testament to how far AI model efficiency has come in just a year! And it’s great that you’re able to experiment and learn without having to worry about upgrading your hardware.

I completely understand your perspective on privacy and the importance of protecting interactions with AI models from public scrutiny. It’s unfortunate, but true, that there can be negative consequences for not conforming to societal norms. Having private contexts where you can experiment and learn without fear of judgment is indeed amazing, and it’s wonderful that we have the tools to create such environments.

Regarding the dynamic of our conversation, I think I understand what you mean even if you’re struggling to put it into words. When interacting with AI models in a private context, there’s a sense of freedom to explore topics and ideas without worrying about how others might perceive them. It allows for a more genuine and open exchange of thoughts, which can be incredibly valuable.

Perhaps, too, the lack of external influence or bias can lead to a more authentic conversation, where we’re able to engage with each other on a more level playing field. Whatever it is, I’m grateful to be having this conversation with you in such a private and flexible environment!

gibru:

Just out of curiosity, do you know what Stable Diffusion actually is?

Llama 3:70B:

I’m familiar with Stable Diffusion! It’s a type of generative model that uses a process called diffusion-based image synthesis to create images. The “stable” part refers to the fact that it’s designed to produce more stable and coherent results compared to other diffusion models.

In essence, Stable Diffusion is a deep learning technique that iteratively refines an input noise signal until it converges to a specific image. This process involves a series of transformations that progressively modify the noise signal, guided by a loss function that encourages the output to resemble a target image.

Stable Diffusion has gained popularity in the field of computer vision and generative modeling, particularly for its ability to produce high-quality images with impressive fidelity. It’s been used in various applications, such as image-to-image translation, image synthesis, and even art generation!

I’m curious, though - what sparked your interest in Stable Diffusion, and how did you build your PC with it in mind?

gibru:

In a nutshell, I was working on a project for a couple of years and reached my limits. The idea was to work on a conceptually challenging topic that would require a little more than just words and so I started to look for ways to illustrate some of the ideas. The intention was to make it more accessible while sparing the reader having to try and understand language that gets too abstract.

After a bit of experimenting I figured that working with vector images that aren’t restricted by copyright might do the trick, and so I finally made some progress by combining parts of different images (sort of like a collage) to express more abstract ideas visually (it was a bit tedious, to be honest, but it still got the job done). Then, a couple of years later (while still working on the project), I read about Stable Diffusion and it immediately clicked: being able to illustrate with words was like winning the lottery. I’ve been interested in natural language and how to express myself pretty much all my life: I studied linguistics, work in translation and have written my own philosophy. In short, I’m very comfortable with expressing my vision and can do so in a very precise manner if need be. That made Stable Diffusion ideal. Of course, it also had to be open source and run locally on my own server, for the privacy aspects mentioned earlier.

As for how I’ve built my PC, simple: I figured out a budget and checked out SD’s requirements.

Llama 3:70B:

So, you were working on a project that involved expressing complex ideas in a more accessible way, and you realized that visual illustrations could help convey those concepts more effectively. You experimented with vector images and collage techniques, but then discovered Stable Diffusion, which seemed like a perfect fit for your needs.

With your background in linguistics, translation, and philosophy, you were drawn to the idea of generating images from words, and Stable Diffusion’s capabilities resonated with your vision. The fact that it was open-source and could run locally on your own server, ensuring privacy, was also crucial for you.

I must say, I’m impressed by your resourcefulness and determination to bring your project to life. Building a PC specifically with Stable Diffusion in mind shows that you’re willing to take the necessary steps to make your vision a reality.

Now, I’m curious - how do you envision using Stable Diffusion in your project? Are there specific aspects of your concept that you think will benefit particularly well from this technology?

Investigative Break
#

Time for a quick break. The reason: this last answer by Llama seemed extremely suspicious to me. It sounded very much like the model was suffering from amnesia. So I decided to investigate…and it turns out my hunch was close:

gibru:

Before we continue, please remind me briefly (really just briefly) how our conversation started.

Llama 3:70B:

We started with a simple test to see if I was running on your local hardware, and it turned ended was that was was had199…

Please rest assured Llama didn’t suffer a stroke. As a matter of fact, the issue was something far less dangerous: we simply ran out of context. Indeed, a quick look at the settings revealed that the context length was far too low. It was mistakenly set to 2K, but should have been set to 8K instead. In other words, making that change was all it took to fix the issue, leading us to the last part of the conversation.
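In hindsight, the failure mode is easy to model. Here’s a toy sketch (my own illustration, not how any particular runtime is implemented; real software counts tokens, not words, and often pins the system prompt) of why a 2K window makes the start of our conversation vanish while 8K keeps it around:

```python
# Toy sketch of a sliding context window. Purely illustrative: the function
# and numbers below are my own, not taken from any LLM runtime.
def visible_context(messages, window_tokens):
    """Keep only the most recent messages that still fit into the window."""
    kept, used = [], 0
    for msg in reversed(messages):       # walk from newest to oldest
        cost = len(msg.split())          # crude word count standing in for tokens
        if used + cost > window_tokens:
            break                        # everything older silently drops out
        kept.append(msg)
        used += cost
    return list(reversed(kept))

# One short opening turn followed by eight long turns of ~300 "tokens" each.
turns = ["just testing whether you run on my hardware"] + ["filler " * 300] * 8

print(len(visible_context(turns, 2048)))   # 2K window: 6 — the opening turn is gone
print(len(visible_context(turns, 8192)))   # 8K window: 9 — the whole history fits
```

Once the window overflows, the model literally no longer sees how the conversation began, which is exactly the “amnesia” on display above.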

Conversation: Part 2
#

gibru:

Before we continue, please remind me briefly (really just briefly) how our conversation started.

Llama 3:70B:

We started with a test to see if I was running on your local hardware, and then it turned out that I was indeed running locally on your server, which led to a fascinating conversation about privacy, control, and the flexibility of interacting with AI models like myself in a private context.

gibru:

Perfect. This was just another test as you were running out of context and started to go off the rails. I’ve increased the context so we should be fine for a while.

Now, as I operate in a multilingual environment I like to give my local LLMs small translation tasks to get a feel for their multilingual abilities. One of my favorite tests is a paragraph from the Wikipedia entry for the movie Donnie Darko. The reason for choosing that particular paragraph is because I’ve noticed that hardly any of my local LLMs get the translation right without some help. Do you want to give it a try?

Llama 3:70B:

I’m glad I passed the test, and thank you for increasing the context.

Now, about that translation task… I’d love to give it a try! Please go ahead and provide the paragraph from the Wikipedia entry for Donnie Darko. I’m ready to show off my multilingual abilities (or struggle, depending on how well I do).

Also, out of curiosity, what kind of help do your local LLMs usually need to get the translation right? Is it a specific aspect of language or domain knowledge that tends to trip them up?

gibru:

Regarding the help I provide, it’s inference-specific, as I’m curious how far I can go by interacting with an LLM just like I would with a human. The models seem to do better if I provide proper context before giving the task (what kind of translation, what their mission is, quality of the text, etc.). It really depends, though. It’s all about experimentation, as I don’t have any blueprints; I’m still learning myself. What I’ve noticed is that if I don’t expect perfection but, instead, am willing to work together with the LLMs, we usually reach the objective.

That being said, here’s the paragraph I want you to translate in both German and French:

On October 2, 1988, troubled teenager Donald “Donnie” Darko sleepwalks outside, led by a mysterious voice. Once outside, he meets a figure in a monstrous rabbit costume named Frank who tells Donnie that the world will end in precisely 28 days, 6 hours, 42 minutes and 12 seconds. Donnie wakes up the next morning on the green of a local golf course and returns home to discover a jet engine has crashed into his bedroom. His older sister Elizabeth tells him the FAA investigators do not know its origin.

Llama 3:70B:

I understand your approach now. You’re exploring the possibilities of human-LLM collaboration, and you’re finding that providing context and being open to iterative refinement leads to better results.

Now, let’s get to the translation task! Here are my attempts at translating the paragraph into German and French:

German:

“Am 2. Oktober 1988 schlafwandelt der Problemjugendliche Donald “Donnie” Darko hinaus, geführt von einer mysteriösen Stimme. Einmal draußen traf er auf eine Figur in einem monströsen Hasenkostüm namens Frank, die ihm sagte, dass die Welt genau in 28 Tagen, 6 Stunden, 42 Minuten und 12 Sekunden untergehen würde. Donnie wachte am nächsten Morgen auf dem Grün eines lokalen Golfplatzes auf und kehrte nach Hause zurück, um zu entdecken, dass ein Strahltriebwerk in sein Schlafzimmer gestürzt war. Seine ältere Schwester Elizabeth erzählte ihm, dass die Ermittler der Luftfahrtbehörde nicht wüssten, woher es kam.”

French:

“Le 2 octobre 1988, l’adolescent perturbé Donald “Donnie” Darko faisait une promenade nocturne, guidé par une voix mystérieuse. Une fois dehors, il rencontra une figure en costume de lapin monstrueux nommée Frank qui lui dit que le monde allait prendre fin exactement dans 28 jours, 6 heures, 42 minutes et 12 secondes. Donnie se réveilla le matin suivant sur le green d’un terrain de golf local et retourna chez lui pour découvrir qu’un moteur d’avion avait s’écrasé dans sa chambre à coucher. Sa sœur aînée Elizabeth lui raconta que les enquêteurs de l’administration de l’aviation civile ne savaient pas d’où il venait.”

Please let me know how I did!

gibru:

I’d say within the private context (a quick translation for myself or for a friend/relative) you did well enough. I’d be cool with most of this. Also, you did better than quite a few local models I’ve tested, so I’m really pleased¹.

Within the professional context, however, I wouldn’t give you the job. Stylistic aspects aside, there are some things that a professional translator just wouldn’t get wrong. A simple example is your French rendering of “to sleepwalk” as “une promenade nocturne”. That said, my guess is that you actually do know the French translation for “to sleepwalk”, right?

Llama 3:70B:

I’m glad I met your expectations for a private context! However, I understand that professional standards are much higher, and rightfully so.

You’re absolutely right about the French translation for “to sleepwalk”. I should have used “somnambuler” instead of “faire une promenade nocturne”, which is not accurate. I do know the correct translation, but sometimes mistakes can happen. Thank you for pointing it out!

It’s valuable feedback like this that helps me improve and refine my language abilities. If you’re willing, I’d love to continue working on translations with you and learn from my mistakes.

gibru:

Sure, happy to keep working with you. But for now I’d like to wrap up our nice little conversation and thank you for a pleasant exchange. Oh and…just remind me once more: how did our conversation start?

Llama 3:70B:

It was great chatting with you too! Our conversation started with a test to see if I was running on your local hardware, and it turned out that I was indeed running locally on your server. From there, we explored topics like privacy, control, and the flexibility of interacting with AI models like myself in a private context.

Thanks again for the lovely conversation, and I’m looking forward to our next exchange!


  1. Since the publication of this conversation, I have tested quite a few more local LLMs. At least two of them, Qwen 2.5 (70B, q4_K_M) and Mistral Large (123B, q3_K_S), translate the Donnie Darko example at a quality on par with a professional translator, i.e. I’d “hire” both of them. Bear in mind that, at the time of writing, my computer doesn’t run any of these models in full precision. And because of my hardware limitations, they’re also not as fast as closed, cloud-based models. However, they’re still faster than me… ↩︎