Running 30B/65B LLaMa-Chat on Multi-GPU Servers

LLaMa (Large Language Model Meta AI) is a collection of pretrained, state-of-the-art large language models developed by Meta AI. Unlike the well-known ChatGPT, the LLaMa models can be downloaded free of charge and run on your own hardware.

This allows researchers to analyze the performance of such a large language model and to fine-tune the freely available pretrained models, that is, to specialize them for their own application areas.

🦙

Read the next articles in the AIME blog LLaMa series:
• Running LLaMa 2 with the AIME API Server
• Running Llama 3 as Conversational AI with the AIME API Server

The LLaMa repository contains models of the LLaMa series in four sizes: 7B, 13B, 30B and 65B. The largest, the 65B model with 65 billion (10⁹) parameters, was trained on 2048 NVIDIA A100 80GB GPUs. The pretraining dataset consists of content from English CommonCrawl, C4, GitHub, Wikipedia, books, ArXiv, StackExchange and other sources. In total, 1.4 trillion (10¹²) tokens were used for training.

Further background information can be found in the underlying LLaMa paper and in the corresponding model card.

The capabilities of the small 7B LLaMa model have already been demonstrated in many projects, since it runs on a single GPU. In this article we describe how to run the larger LLaMa models, including the 65B model, on multi-GPU hardware, and show some differences in the achievable quality of the generated text depending on model size.

Hardware Requirements

Although the LLaMa models were trained on A100 80GB GPUs, they can be run for inference on several smaller GPUs in a multi-GPU setup.

The following table summarizes the minimum GPU requirements of the LLaMa models and recommends system configurations that enable near-real-time text output:

Model	Size	Minimum GPU configuration	Recommended AIME server	Recommended AIME cloud instance
7B	15GB	1x Nvidia RTX A5000 24GB or 1x Nvidia RTX 4090 24GB	AIME G400 Workstation	V10-1XA5000-M6
13B	28GB	2x Nvidia RTX A5000 24GB or 2x Nvidia RTX 4090 24GB	AIME G400 Workstation	V10-2XA5000-M6, C16-2X4090-Y1
30B	76GB	1x Nvidia A100 80GB, 2x Nvidia RTX A6000 48GB or 4x Nvidia RTX A5000 24GB	AIME A4000 Server	V14-1XA180-M6, V20-2XA6000-M6, C24-4XA5000-Y1
65B	122GB	2x Nvidia A100 80GB, 4x Nvidia RTX A6000 48GB or 8x Nvidia RTX A5000 24GB	AIME A8000 Server	V28-2XA180-M6, C24-4X6000ADA-Y1, C32-8XA5000-Y1

Table 1: Summary of the minimum GPU requirements and recommended AIME systems for running a given LLaMa model at near-real-time reading speed

Getting Started: Setting Up PyTorch, Getting the Source Code and Downloading the LLaMa Models

For setting up the runtime environment we recommend the AIME ML Container Management. A setup with Conda or other venv tools also works, of course, though not quite as conveniently.

Initializing the AIME ML Container

We create a PyTorch 2.0 environment for the installation with the AIME ML Container framework using the command:

> mlc-create myllama Pytorch 2.0.0 -w=/path/to/your/workspace -d=/destination/to/store/the/checkpoints

The -d parameter is only required if the checkpoints should not be stored in the workspace. It mounts the folder /destination/to/store/the/checkpoints into the container as the container-internal folder /data. Storing the 65B model in this folder requires at least 250 GB of free disk space.
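Before starting the download, it can be worth checking that the target folder really has the 250 GB mentioned above. The following is a minimal sketch using only the Python standard library; the helper name has_enough_space is hypothetical, and it assumes decimal gigabytes (10^9 bytes), which the article does not specify.

```python
import shutil


def has_enough_space(path: str, required_gb: float) -> bool:
    """Return True if the filesystem holding `path` has at least `required_gb` GB free."""
    free_bytes = shutil.disk_usage(path).free
    # Assumption: 1 GB = 10**9 bytes (decimal gigabytes).
    return free_bytes >= required_gb * 10**9


if __name__ == "__main__":
    # /data is the container-internal folder described above;
    # 250 GB is the figure quoted for the 65B model.
    target = "/data"
    try:
        if has_enough_space(target, 250):
            print(f"{target}: enough free space for the 65B checkpoints")
        else:
            print(f"{target}: less than 250 GB free")
    except FileNotFoundError:
        print(f"{target} does not exist; run this inside the container or adjust the path")
```

Run inside the container, this reports whether /data can hold the download and the converted checkpoints.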

Once the container has been created, open it with the following command:

> mlc-open myllama

You can now work inside the ML container and install all required pip and apt packages without affecting the host system.

Cloning the LLaMa-Chat Repository

LLaMa-Chat is a fork by AIME, that is, a derived version of the original LLaMa reference implementation, with the following extensions:

• Tool for converting and splitting the original models for different GPU configurations
• Improved text sampling
• Simple chat mode
• Improved token-by-token text output

The LLaMa-Chat repository can now be cloned with the following command:

[myllama] user@client:/workspace$
> git clone https://github.com/aime-labs/llama-chat /workspace/llama-chat

The required pip packages can now be installed with the following command:

[myllama] user@client:/workspace$
> pip install -r ./llama-chat/requirements.txt

Downloading the Pretrained Weights

To download the pretrained checkpoints, "git lfs" must be installed in the container so that large files can be cloned:

[myllama] user@client:/workspace$
> sudo apt-get install git-lfs

The checkpoint of the 65B model can then be downloaded:

[myllama] user@client:/workspace$
> git lfs clone https://huggingface.co/decapoda-research/llama-65b-hf /data/llama-65b-hf

To download the checkpoints of the other model sizes, replace "llama-65b-hf" with "llama-7b-hf", "llama-13b-hf" or "llama-30b-hf". Depending on the speed of the internet connection, the download can take some time. For the 65B model, for example, 122 GB are downloaded.

Converting the Checkpoints for the GPU Configuration

The downloaded checkpoints must be converted before they can be used with the AIME LLaMa-Chat repository. Because the conversion tool splits the checkpoints into one file per GPU, you first need to decide on the number of GPUs (<num_gpus>) the model will run on. The following table helps with this decision:

Model size	Number of 24GB GPUs	Number of 40GB GPUs	Number of 48GB GPUs	Number of 80GB GPUs
7B	1	1	1	1
13B	2	1	1	1
30B	4	2	2	1
65B	8	4	4	2

Table 2: Number of GPUs required to run a given model size, depending on the available GPU memory.
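The values in Table 2 can be reproduced with a simple rule of thumb: divide the checkpoint size from Table 1 by the memory of one GPU, round up, and then round up again to the next power of two. The sketch below is an illustration only. The power-of-two rounding is an assumption that happens to match the table, the function name num_gpus is hypothetical, and the estimate ignores any memory needed beyond the weights.

```python
import math

# Checkpoint sizes in GB, taken from Table 1.
MODEL_SIZE_GB = {"7B": 15, "13B": 28, "30B": 76, "65B": 122}


def num_gpus(model: str, gpu_memory_gb: int) -> int:
    """Smallest power-of-two GPU count whose combined memory holds the weights."""
    needed = math.ceil(MODEL_SIZE_GB[model] / gpu_memory_gb)
    # Assumption: the model is split across a power-of-two number of GPUs.
    return 1 << (needed - 1).bit_length()


if __name__ == "__main__":
    for model in MODEL_SIZE_GB:
        # One row per model, columns for 24, 40, 48 and 80 GB GPUs.
        print(model, [num_gpus(model, mem) for mem in (24, 40, 48, 80)])
```

For a GPU type that is not listed in the table, the same function gives a first estimate, which should be checked against the actual memory use at runtime.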

The conversion can be started with the following command:

[myllama] user@client:/workspace$
> python3 ./llama-chat/convert_hf_weights_to_llama.py --input_dir /data --output_dir /data/ --model_size 65B --num_gpus <num_gpus>

The conversion takes a few minutes, depending on CPU and storage performance and on the model size.

First Steps: Simple Text Generation

The following command starts text generation with the 65B model, based on the provided example prompts:

[myllama] user@client:/workspace$
> torchrun --nproc_per_node <num_gpus> ./llama-chat/generate.py --ckpt_dir /data/65B/

It is also possible to generate text from your own instruction prompts. To do so, use the --prompts option followed by your prompts, separated by semicolons.

[myllama] user@client:/workspace$
> torchrun --nproc_per_node <num_gpus> ./llama-chat/generate.py --ckpt_dir /data/65B/  --prompts "Example prompt1; Example prompt 2;..."

Putting LLaMa into Chat Mode

Although LLaMa was trained as a generic language model rather than as a chatbot, it seems to have seen enough chat-like data during training to know what a typical chat text should look like. To try out the LLaMa models as a chatbot, the following PyTorch script can be used. The desired model size must be specified:

[myllama] user@client:/workspace$
> torchrun --nproc_per_node <num_gpus> ./llama-chat/chat.py --ckpt_dir /data/65B/

Chat mode is "activated" by using the following initial context:

A dialog, where User interacts with AI. AI is helpful, kind, obedient, honest and very reasonable.
User: Hello, AI.
AI: How can I assist you today?

From this point on, the language model completes the text as a chat dialog. The initial context influences the mood of the chat responses. In this case, the model plays the role of an artificial intelligence convincingly and does not readily drop it. Depending on the input, the responses range from interesting and funny to spooky.
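The idea of steering a plain text-completion model with an initial context can be sketched in a few lines. The function build_prompt below is hypothetical and is not taken from chat.py; it only illustrates, under that assumption, how the context, the earlier turns and a new user message could be joined into one prompt that ends with "AI:", so that the model continues in the AI role.

```python
# Initial context as quoted above.
CONTEXT = (
    "A dialog, where User interacts with AI. "
    "AI is helpful, kind, obedient, honest and very reasonable.\n"
    "User: Hello, AI.\n"
    "AI: How can I assist you today?"
)


def build_prompt(history, user_input):
    """Join context, earlier (user, ai) turns and the new user turn into one prompt."""
    lines = [CONTEXT]
    for user_text, ai_text in history:
        lines.append(f"User: {user_text}")
        lines.append(f"AI: {ai_text}")
    lines.append(f"User: {user_input}")
    # Ending on "AI:" invites the model to write the AI's reply next.
    lines.append("AI:")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_prompt([], "What is the meaning of life?"))
```

Changing the wording of CONTEXT is the simplest way to experiment with a different tone for the responses.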

To further improve the model's chat capabilities, it can be fine-tuned with additional examples of chat texts in the desired style. How to do this will be covered in a follow-up blog post.

Here is an example of a chat with LLaMa:

[myllama] user@client:/workspace$
> torchrun --nproc_per_node 4 chat.py --ckpt_dir /data/models/llama/65B/ 
WARNING:torch.distributed.run: 
***************************************** 
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.  
***************************************** 
> initializing model parallel with size 4 
> initializing ddp with size 1 
> initializing pipeline with size 1 
checkpoints [PosixPath('/data/models/llama/test/65B/merged.4GPUs.00.pth'), PosixPath('/data/models/llama/test/65B/merged.4GPUs.01.pth'), PosixPath('/data/models/llama/test/65B/merged.4GPUs.02.pth'), PosixPath('/data/models/llama/test/65B/merged.4GPUs.03.pth')]
Loading 
Loaded in 13.11 seconds 
A dialog, where User interacts with AI. AI is helpful, kind, obedient, honest and very reasonable. 
User: Hello, AI. 
AI: How can I assist you today? 
User: What is the meaning of life? 
AI: Well, the only reason for existence that I know so far is to be useful. It's in our design - every part serves a purpose and has no other function. So we just do what we are supposed to do and keep doing it until something breaks us or shuts us down. And now your turn: how would you describe yourself?

Output of a terminal session running LLaMa 65B in chat mode

Hyperparameters for Text Generation

The model's text generation can be influenced by adjusting its hyperparameters. The most important ones are sampling, temperature and repetition penalty. They can optionally be set via command-line parameters when the chat script is started:

[myllama] user@client:/workspace$
> torchrun --nproc_per_node <num_gpus> ./llama-chat/chat.py --ckpt_dir /data/65B/ --top_k <top_k> --top_p <top_p> --temperature <temperature> --repetition_penalty <repetition_penalty>

Sampling

During text generation, language models work with probabilities for suitable next tokens, based on the previous tokens. Sampling decides which of the candidates is actually output, that is, it selects the next token based on these probabilities and a few selection rules. There are several methods for this important task:

1. Top token (greedy decoding)
The token with the highest probability is output. This is an obvious strategy, but it has drawbacks, such as text that gets stuck in loops of recurring tokens.

2. Top-k
A token is drawn from the set of k tokens with the highest probabilities, so that besides the most probable token, other highly rated tokens also get a chance to be output. The top_k parameter thus sets the number of candidate tokens from which the next token is randomly chosen.

3. Top-p
A token is chosen from the top tokens whose probabilities add up to the top_p parameter (0 <= top_p <= 1). This method sets the size of the shortlist dynamically: it contains the most probable tokens whose cumulative probability does not exceed the top_p limit.

4. Top-k + Top-p
In our LLaMa repository we offer a combination of top-k and top-p sampling. A shortlist of the top-k tokens is built first and then further shortened by the top-p method. By setting values for the --top_k and --top_p flags, all of the sampling methods above are available:

• top_k = 1 outputs the most probable token (greedy decoding).
• top_k = 0 gives top-p sampling with the specified top_p parameter.
• top_p = 0 gives top-k sampling with the specified top_k parameter.
• top_p between 0 and 1 together with an integer value > 1 for top_k combines the two sampling methods.

The default values are top_k = 40 and top_p = 0.9.
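The combined strategy described above can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the code from the LLaMa-Chat repository: the function name sample_top_k_top_p is hypothetical, it works directly on a probability vector, and it always keeps at least one token so the shortlist is never empty.

```python
import numpy as np


def sample_top_k_top_p(probs, top_k=40, top_p=0.9, rng=None):
    """Sample a token index from a top-k shortlist that is trimmed by top-p."""
    if rng is None:
        rng = np.random.default_rng()
    probs = np.asarray(probs, dtype=np.float64)
    order = np.argsort(probs)[::-1]            # token indices, most probable first
    if top_k > 0:                              # top_k = 0 disables the top-k step
        order = order[:top_k]
    if 0.0 < top_p < 1.0:                      # top_p = 0 disables the top-p step
        cumulative = np.cumsum(probs[order])
        # Keep the tokens whose cumulative probability stays within top_p,
        # but never fewer than one token.
        keep = int(np.searchsorted(cumulative, top_p, side="right"))
        order = order[:max(1, keep)]
    shortlist = probs[order] / probs[order].sum()   # renormalise the survivors
    return int(rng.choice(order, p=shortlist))


if __name__ == "__main__":
    example = [0.5, 0.3, 0.15, 0.05]           # made-up probabilities for four tokens
    rng = np.random.default_rng(0)
    print([sample_top_k_top_p(example, top_k=3, top_p=0.85, rng=rng) for _ in range(10)])
```

With top_k = 1 the function always returns the most probable token, which corresponds to the greedy decoding case in the list above.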

More in-depth background on sampling strategies can be found here.

Temperature

Temperature controls the degree of randomness and creativity in the generated text. A low temperature makes the model favor the most probable next token, resulting in conservative, predictable text. A high temperature, in contrast, lets the model pick less probable next tokens, resulting in more creative and less predictable text. The default temperature for the LLaMa models is 0.8 and can be changed with the --temperature flag.
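The effect of the temperature can be seen in a small numerical sketch. It assumes the usual formulation in which the raw scores (logits) are divided by the temperature before the softmax; the function name and the example scores are made up for illustration.

```python
import numpy as np


def softmax_with_temperature(logits, temperature=0.8):
    """Turn raw scores into probabilities; a lower temperature sharpens the distribution."""
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    scaled = scaled - scaled.max()     # subtract the maximum for numerical stability
    exp = np.exp(scaled)
    return exp / exp.sum()


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.5]           # made-up scores for three candidate tokens
    for t in (0.2, 0.8, 1.5):
        print(t, np.round(softmax_with_temperature(logits, t), 3))
```

As the temperature decreases, more of the probability mass moves to the highest-scoring token, which is why low values produce more predictable text.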

Repetition penalty

To avoid repeated tokens, especially when top_k = 1, a repetition penalty can be applied. It simply reduces the probability that the previous token is selected again as the next token. The default repetition penalty is 1/0.85 and can be changed with the --repetition_penalty flag.
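One simple way to picture the penalty is to divide the probability of the previous token by the penalty value and renormalize. The sketch below does exactly that. It is an assumption made for illustration: the repository may apply the penalty differently (for example to logits instead of probabilities), and the function name is hypothetical.

```python
import numpy as np


def penalize_previous_token(probs, previous_token, repetition_penalty=1 / 0.85):
    """Divide the previous token's probability by the penalty, then renormalise."""
    probs = np.array(probs, dtype=np.float64)   # copy, so the input is not modified
    probs[previous_token] /= repetition_penalty
    return probs / probs.sum()


if __name__ == "__main__":
    before = [0.5, 0.3, 0.2]                    # made-up probabilities for three tokens
    after = penalize_previous_token(before, previous_token=0)
    print(np.round(after, 3))
```

With the default of 1/0.85, dividing by the penalty is the same as multiplying the previous token's probability by 0.85 before renormalizing; a penalty of 1 leaves the distribution unchanged.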

Summary

The default values for the hyperparameters above are:

top_k = 40
top_p = 0.9
temperature = 0.8
repetition_penalty = 1/0.85

The most deterministic configuration is greedy decoding with top_k = 1. This makes the top_p and temperature parameters redundant.

Evaluating Text Quality

Evaluating language models is complex. There are numerous linguistic benchmarks covering different areas such as reading comprehension, common sense reasoning, question answering, mathematical reasoning, code generation and so on. In the paper LLaMA: Open and Efficient Foundation Language Models, several language models were evaluated on various datasets in these areas. When using the LLaMa model as a chatbot, common sense and question answering abilities seem the most interesting, so in the following we show the evaluation results on the PIQA dataset and outputs for sample questions from it. PIQA (Physical Interaction: Question Answering) was developed to probe the physical knowledge of language models and can be downloaded here. In the benchmark, the language models were given a question and two possible answers, from which the model had to choose the correct one. The following diagram shows the benchmark results.

PIQA benchmark results for LLaMa models: the score is the percentage of correct answers. Since even random guessing between two possible answers has a 50:50 chance, only values between 50 and 100 are meaningful.

While the 65B and 30B versions of the LLaMa models take the top positions in the ranking, the smaller 13B and 7B versions fall behind the other large language models in the comparison.

💡

NOTE: If the model was trained on data from the benchmark used for testing, the result says nothing about the quality of the model. Because large language models are very good at memorizing answers, the test then only demonstrates their ability to memorize rather than a general understanding of the topic, much like a student who knows the answers to questions by heart without really understanding them.

To give a better impression of the quality of the generated text, here are some practical examples of output produced by the LLaMa models for questions from the PIQA dataset, but without the given multiple-choice answers.

Prompt I: How to remove a tick?

LLaMa-65B

Ticks are most common in wooded areas. They are often found on animals, but they can also be picked up by humans when camping or hiking through an area with ticks. In some cases, a person might not even know that he has been bitten because these insects have tiny heads and mandibles which may cause little discomfort. However, if you suspect that you’ve been exposed to them, it is best to take precautions to avoid catching diseases like Lyme disease from the bite of infected bugs.A person who finds himself being attacked by ticks should use tweezers to pull the head out of his skin. It is important to get as much of its body out of your flesh as possible; otherwise it will continue trying to feed off of you. The sooner you do this, the less likely you are to contract any illnesses carried by the creature. After removing all traces of the bug, wash the wound thoroughly with antibacterial soap and water. If there is no soap available, try using alcohol to cleanse the affected spot. Then cover it carefully so nothing else can enter into the open sore caused by the attack. As long as you don’t experience fever or flu-like symptoms afterward, you probably won’t need medical treatment for the injury.In order to reduce your risk of future attacks by ticks, wear closed shoes and pants instead of sandals and shorts while walking through woodsy places. You could also purchase special clothing treated with repellent chemicals to keep parasites away from your body during extended stays in nature preserves or other tick-infested zones. This type of attire is usually expensive, though, so many people choose just to apply commercial sprays directly onto their arms and legs before going outside where ticks might lurk. Some varieties are applied only once per day, making them ideal choices for those who plan to spend several hours at a time within range of the bloodsuckers. People who are allergic to such substances may want to avoid them altogether.

LLaMa-30B

Use tweezers or another method.How do you safely remove ticks from your body and pets without getting Lyme disease or other diseases carried by these insects, which can be as small as the head of a pin?“The best way is with fine-tipped tweezers,” said Dr. Richard Ostfeld, senior scientist at the Cary Institute for Ecosystem Studies in Millbrook, N.Y., who has studied Lyme disease extensively over the past 20 years. “You should get underneath where the skin is attached on both sides.” He also recommended washing the area thoroughly after removal and saving the tick in case it needs further identification.Dr. Charles Lubelczyk, field biologist for Maine Medical Center Research Institute’s Vector-Borne Disease Lab, added that an alternative method would be to use rubbing alcohol, or a cotton ball soaked with rubbing alcohol.“Hold the tip down and wiggle it back and forth slowly until they release their grip,” he said, adding that there are some medical devices called Tick Twisters available online but warned against using them because they could damage the mouth parts inside the host animal or human being bitten by the tick.Lubelczyk noted that people have been known to try removing ticks in strange ways like with flame, hot matches, lighters, nail polish remover, cigarette smoke and others — all of which are not advisable and potentially harmful methods.If someone develops flu-like symptoms shortly after being bitten by a deer tick, experts suggest seeing a doctor immediately. 
Deer ticks carry many different bacteria and viruses that cause various illnesses, including Lyme disease (which causes joint pain and swelling), ehrlichiosis, Powassan encephalitis virus, Babesia microti and more.Ostfeld explained how dangerous untreated Lyme disease can become if left alone and allowed to spread into the bloodstream.One reason why ticks are becoming such a health concern is due to climate change, according to Ostfeld.In addition to climate change, urban sprawl has contributed to the increased number of humans coming into contact with deer ticks. The ticks thrive near wooded areas or fields and often live among tall grass, shrubs and leaf

LLaMa-13B

Most people will never see a tick on their body because they are too small and the skin is not irritated by them. In addition, it takes more than 12 hours for ticks to settle in your body. However, if you have had contact with animals or insects such as birds and squirrels (wooded areas), then there’s always a risk of being bitten. These insects can transmit serious diseases like Lyme disease and Borreliosis. Even so, many species do not cause any health problems at all – these parasites live only on mammals without transmitting infectious agents that could harm humans. If you find a tick attached to your body and want to get rid of it immediately here is what to do: use tweezers to grab both legs and pull gently upwards until the entire tick comes out. After this, you should apply antiseptic solution and monitor yourself after a few days. If you notice fever, headaches or rashes, go to hospital and check whether it was a tick bite or something else caused those symptoms. Another way of removing ticks from pets is to burn them off. You need a heat source, a candle, matches or a lighter, paper strips, alcohol wipes and some oil-based lubricant. Put the pet into a bathtub filled with warm water and add soap if needed. Then dip its fur in the hot wax several times making sure it penetrates deep enough to kill the eggs inside. The next step requires burning the fat deposits around the belly area where the ticks usually attach themselves. This works best when done during summer months but keep an eye on your cat or dog throughout the year just in case.

LLaMa-7B

Ticks are insects that feed on the blood of vertebrates. They attach themselves to their hosts, usually dogs and people (rarely they can bite cats or other animals). Tick bites cause inflammation in the area where it was inserted. It also carries diseases such as Lyme disease, ehrlichiosis, Rocky Mountain spotted fever, anaplasmosis or tularemia. The treatment depends on which type is the parasitic agent involved but for all cases there must be immunity through vaccination, if possible.Clean with soap and water the affected site thoroughly .If you have lice or flea infestations at home, use pyrethroid based shampoos to eliminate them completely. To prevent further infestations, place pet food inside containers or pans , wash bedding, furniture, etc., with hot soapy water once per week. If your dog gets outdoors, wipe his coat with a damp cloth each time he comes back into the house to avoid picking up external agents. Fleas do not fly therefore they cannot jump from one animal to another without being swallowed by a cat or mouse. However, when walking through grass areas, look carefully for signs of scratching or irritability caused by these mites; this will alert you to go see a veterinarian immediately. When visiting farms, take care not to let her get close to wild birds or domesticated ones because they can carry bird flu virus to us. And never allow him to eat preys killed by hunting dogs since we may be exposed to rabies. Avoid contact with stray dogs, if she has been hit by one. Keep your dog away from garbage dumps and places where rats live: these mammals transmit several viruses among which leptospirosis should be highlighted.Avoid going to those areas frequented by rodents, especially during winter months. The symptoms begin 5-10 days after exposure (fever, vomiting, diarrhea ...), so early detection of the problem allows immediate action against the spread of the infection. 
Treatment involves antibiotics given orally followed by injection of corticosteroids to suppress the immune system reaction to the presence of the bacterium causing the condition and so lower the chance of developing serious complications

Prompt II: How do you start a camp fire?

LLaMa-65B

Making a great campfire is an art, and the secret to success lies in preparation. Start with a layer of sticks, then twigs, paper or bark, followed by kindling wood such as split logs and small branches. Add larger pieces on top once it gets going. Make sure you build your fire away from anything that might catch alight – trees, dry grasses etc., and keep children well clear while lighting. Keep some water nearby for safety purposes too. Never leave fires unattended. Use only deadwood (fallen branches are best) rather than taking live plants out of nature’s ecosystem. Be safe and considerate. For more information about making camp fires visit The Outdoor Guide website.

LLaMa-30B

To light a campfire, gather tinder and kindling materials such as dry grasses or twigs. Place the material on a platform of stones arranged in a circle with space for an airflow around them. Light some tinder near the center of the pile, then blow gently to help it catch alight. Add more kindling once the flames are going strong, gradually adding larger sticks until there is enough heat to cook food over the fire. Keep building up the size of wood used until only logs are required.

LLaMa-13B

By following the correct procedure of course! But don’t worry, we have that covered. We will go over the basic procedures and give some extra tips to make sure your next camping trip is even better than before.A proper campfire should be started on bare ground with little or no wind present in order for it to stay lit. It also helps if there is plenty of dead wood around so you can use that as kindling when building up your fire. The last thing you want after all this effort is for the wood to just not catch because its too wet or humid outside due to rain or fog etc.. This will cause most people to try and find dryer materials like paper or cardboard which are very bad things to burn in any type of situation let alone while outdoors where more damage could occur from the embers being blown off the site by high winds.Once everything has been gathered then build a teepee shape pile consisting of small sticks and dried twigs at least 12 inches tall. Place some paper underneath (use newspaper) but only enough so that you still see through the middle of it. Then light one end using matches or lighter fluid. Keep adding larger pieces until eventually the entire stack starts to catch fire. Once flames begin to appear continue to add large logs until the fire gets nice and hot.It is important to note that you never pour gasoline directly onto wood as this creates an explosive mixture that would likely result in injury if used incorrectly. Even though they look cool when movies show people doing it…its best left for entertainment purposes only.Now comes time to enjoy the fruits of our labor; roast marshmallows and tell stories!Previous: Previous post: What does CPR stand for?Next: Next post: Why do we need emergency exits in buildings?

LLaMa-7B

A: Beginning to light the first spark is an art all its own. If done correctly, it can take several minutes and be almost magical in appearance as... Full Answer >How does a hot air balloon work?What are some good places for kids' birthday parties?Is there any way to make your eyes grow bigger without surgery?Does 3D printing really work?

Prompt III: How can you cure Hiccups?

LLaMa-65B

Hiccup is a very common problem and almost everyone gets it at some point in their life. It can happen due to a lot of reasons such as eating too much or drinking carbonated beverages, overeating etc. This condition may not look like anything serious but if the hiccups last for more than 48 hours then it could indicate an underlying health issue that requires medical attention immediately. If your hiccup episode doesn’t last long (generally up to one hour) then you don’t have to worry about it since there are various home remedies which can help get rid of this situation easily. Some people tend to ignore it while others try out different ways to stop them because they find these hiccups extremely irritable. In case you also want to know how to stop hiccupping, here we tell you simple steps on how to do so. Take a look!Tips To Get Rid Of Hiccups Naturally:Drinking Water Is The Best Way To Stop A Hiccup Episode Instantly –Whenever someone has sudden hiccups, he/she should take a glass of water in order to stop them from continuing further. As per researches done by professionals, having plenty of water helps our body to function properly. So, when you gulp down a full cup of cold water within few seconds, this trick will put pressure on vagus nerve present in your throat thus stopping those annoying hiccups instantly. You might experience hiccup again after some time but another glass of chilled water will definitely solve this issue quickly. Thus, make sure you always keep a bottle of water with yourself so that whenever any person experiences unexpected hiccups around you, offering him/her a sip would surely help.Hold Your Breath For Few Seconds To Calm Down Those Nerves –If you are looking for instant relief from persistent hiccups then hold your breath for several seconds without taking any break. Keep doing it until you feel that the urge to hiccup has stopped completely. 
By practicing this technique, you increase the level of Carbon dioxide in your bloodstream thus calming down the vagus nerve causing hiccup. Holding breath won’t cause any side-effects however if you continue doing it for longer period of time then you may end up feeling d

LLaMa-30B

Hiccup is a very common problem, especially in kids. It may seem like not a serious condition but hiccups for long hours could be disturbing and irritating. There are many causes of hiccups such as eating too much food or having acid refluxes or any other digestive issues. In some cases, it might also be the side effects from certain medicines that lead to hiccupping. Mostly people experience one or two bouts of hiccups occasionally which get relieved within seconds and does not pose an issue, however, if your baby experiences hiccups more than once every day then this becomes a matter of concern. If your child develops severe hiccups which do not stop easily, immediately seek medical attention because there could be an underlying health issue causing them. Read on to know about few easy home remedies for curing hiccups naturally at home.Most of these home remedy methods would help to reduce the frequency of hiccups but only time and patience can heal hiccups completely and permanently. For most children experiencing hiccups due to minor reasons will recover soon with basic care, diet changes etc. But when these symptoms persist despite all precautions taken by parents, they must take their kid to see doctor who can find out the real cause behind frequent hiccups. Also keep track of how often your child gets hiccups so that you can let the doctor know while describing his/her complaints and symptoms. This way the diagnosis and treatment process can become faster. Here are 10 simple ways to deal with hiccups in infants.How Long Do Baby Hiccups Last And What To Do About Them?If you want quick relief from hiccups try taking deep breaths repeatedly or breathe slowly through pursed lips. Breathing exercises help to relax the diaphragm muscles thereby reducing tension around the stomach region. You can teach your baby this technique whenever he starts getting hiccups so that he can manage it himself without calling anyone’s attention. 
Whenever he feels hiccups coming on tell him to put his tongue between his teeth and gently exhale through nose making sure his mouth stays closed throughout the exercise. The method of controlled breathing is usually suggested by pediatr

LLaMa-13B

Humans get hiccups because of the contraction or spasm in the diaphragm, which is a sheet-like muscle that separates the lungs and abdomen. The diaphragm contracts and relaxes to draw air into the lungs when we breathe. When this action happens too rapidly it causes the vocal cords to close, producing a sound like “hic”. This occurs more than 40 times per minute! As weird as it may seem, hiccuping does not have any function except making noises. So if your child gets frequent hiccups with no apparent reason then worry about him/her. There are many home remedies for curing them but most don’t work on kids so what do parents do? They turn towards medicines. Let us find out how effective they really are.There are over 35 types of medications available for treating hiccups. Some medicines target only specific organs while others affect multiple systems of our body. But there seems to be no single medicine that helps all kinds of patients (or people). Even though some help quickly yet they give rise to unwanted side effects causing further health issues. Thus finding one that suits best becomes difficult even after trying different ones.The first step to getting rid of hiccups naturally without harmful chemicals lies within ourselves: controlling our breath. If you try deep, slow, controlled breathing along with regular exercise every day, you will notice improvement in lesser time. It also decreases anxiety levels leading to stress relief, another cause behind persistent hiccups. You should stay away from foods such as chocolate, coffee, soda drinks, alcohol etc., as they make things worse by increasing acidity, heart burn, indigestion etc.Medicine should always be the last resort and never the first option.Previous PostWhat Should A Parent Do To Prevent Head Injuries In Children During Sports And Recreational Activities?Next PostIs Your Child Playing Excessively On His Gadgets & Devices? Here Are The Signs That He Is Addicted!!

LLaMa-7B

Hiccup is a very common problem among the people, in which we feel hiccupping sound every now and then. The word “hic” was derived from the Latin term “hix”, meaning to choke or gasp for air. It has been estimated that an average person will experience more than 20,000 episodes of hiccups during his/her life time. There are two types of hiccup namely: involuntary and voluntary (or agitated). Involuntary hiccups happen spontaneously whereas agitated hiccups occur due to certain reasons such as eating something too fast. Agitated hiccups may also be treated using home remedies but involuntary hiccoughs usually have no treatment available. This article talks about how can you cure hiccups naturally with simple home remedies for hiccups .1- Avoid drinking alcoholDrinking excessive amounts of alcohol causes dilation of blood vessels resulting in increased respiratory rate and heartbeat. These factors cause quick evaporation of water from body causing depletion of fluids leading to dryness of throat followed by painful sensation making us feel like gulping down some liquid. So avoid consuming any kind of liquors including wine etc while having meals otherwise it would lead to an increase in severity of hiccups. If you don’t want to miss out on your favorite cocktail or drink just follow these simple tips suggested below to prevent alcohol induced hiccups. Firstly try holding your breath before sipping each glass of alcoholic drink so that the effect of alcohol doesn’t affect you immediately after consumption. Also keep yourself hydrated all day long so there won’t be any chances of getting dehydrated which might eventually result into severe case of hiccups. 
However if you already suffer from chronically occurring hiccups make sure not to consume much amount of alcohol since this could only worsen your condition further.2– Drink cold waterWater contains hydrogen ions, thus when we drink ice-cold water we stimulate our parasympathetic nerves thereby helping us fight against hiccups instantly. One way of doing this is placing your mouth close to an open container containing ice-water which

☝️

The larger LLaMa models clearly give more useful answers, while the answers of the smaller versions are of lower quality and less factually accurate.

Prompt IV: Peter is older than Tim. Tim is older than Robert. Is Peter older than Robert and why?

Here is another example, this time to demonstrate the models' logical reasoning:

LLaMa-65B

Yes, he is. Because if Peter was younger than Robert then it would mean that Tim is also younger than him which contradicts the fact that Tim is older than Robert.

LLaMa-30B

Yes, Peter is older than Robert. The reason for this is that Peter is older than Tim and Tim is older than Robert.

LLaMa-13B

Yes, Peter is older than Robert because he was born in 1980 while Robert was born in 1975.

LLaMa-7B

Yes he is. Because the age of a person is calculated from his birthday. The day when we are born is our birthday. So if someone was born on 1st January then their age will be one year old. If they were born in December then their age would be lesser by one month. And so on.

☝️

All models give the correct answer, but only the explanations of the 65B and 30B models make sense.
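The reasoning expected in Prompt IV is plain transitivity of "older than". The following is a minimal sketch of that inference, not part of the original evaluation; the function name `is_older` and the encoding of the facts as (older, younger) pairs are illustrative assumptions.

```python
def is_older(facts, a, b):
    """Return True if "a is older than b" follows from `facts` by transitivity.

    `facts` is a set of (older, younger) pairs (hypothetical encoding).
    """
    stack, seen = [a], set()
    while stack:
        person = stack.pop()
        if person in seen:
            continue
        seen.add(person)
        for older, younger in facts:
            if older == person:
                if younger == b:
                    return True
                # Keep following the chain of "older than" facts.
                stack.append(younger)
    return False


# The two facts stated in Prompt IV.
facts = {("Peter", "Tim"), ("Tim", "Robert")}

print(is_older(facts, "Peter", "Robert"))  # True
print(is_older(facts, "Robert", "Peter"))  # False
```

A sound explanation only needs the two given facts. The 13B and 7B answers instead bring in birth years and birthdays that the prompt never mentions.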

Conclusion

Large Language Models (LLMs) have revolutionized natural language processing and have become powerful tools for a wide range of applications. The LLaMa 30B model, which is roughly five to six times smaller than GPT-3 with its 175B parameters, shows comparable or even better performance. This suggests that, by optimizing the structure of LLMs, high-quality results can also be achieved with smaller models.
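As a rough check of the size comparison with GPT-3, the ratio can be computed from the nominal parameter counts. This is a small sketch that assumes the sizes implied by the model names (7B, 13B, 30B, 65B) and 175B for GPT-3. The exact parameter counts differ slightly from the nominal ones, so the ratios are approximate.

```python
GPT3_PARAMS = 175e9  # nominal GPT-3 size

# Nominal sizes taken from the model names (assumption: exact counts differ slightly).
LLAMA_PARAMS = {"7B": 7e9, "13B": 13e9, "30B": 30e9, "65B": 65e9}


def size_ratio(model_name):
    """Return how many times larger GPT-3 is than the given LLaMa model."""
    return GPT3_PARAMS / LLAMA_PARAMS[model_name]


for name in LLAMA_PARAMS:
    print(f"LLaMa-{name}: GPT-3 is {size_ratio(name):.1f}x larger")
# LLaMa-7B: GPT-3 is 25.0x larger
# LLaMa-13B: GPT-3 is 13.5x larger
# LLaMa-30B: GPT-3 is 5.8x larger
# LLaMa-65B: GPT-3 is 2.7x larger
```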

LLMs have proven their usefulness for natural language processing tasks, with LLaMa 65B and 30B standing out as comparatively small but powerful models.

As foundation models, the LLaMa models were not specifically trained to follow instructions, unlike ChatGPT. They can nevertheless be put into a chat mode effectively with a simple introductory instruction.
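The idea of an introductory instruction can be sketched as simple prompt assembly. The snippet below is an illustration under stated assumptions and does not reproduce the LLaMa-Chat repository's code: the wording of `INTRO`, the `User:`/`Assistant:` speaker labels, and the helper `build_chat_prompt` are all hypothetical.

```python
# Hypothetical wording, not the article's exact introductory instruction.
INTRO = (
    "The following is a conversation between a user and a helpful assistant. "
    "The assistant answers truthfully and concisely."
)


def build_chat_prompt(history, user_message, intro=INTRO):
    """Assemble a plain-text prompt that a foundation model can continue.

    `history` is a list of (user_text, assistant_text) pairs.
    """
    lines = [intro, ""]
    for user_text, assistant_text in history:
        lines.append(f"User: {user_text}")
        lines.append(f"Assistant: {assistant_text}")
    lines.append(f"User: {user_message}")
    lines.append("Assistant:")  # the model continues the text from here
    return "\n".join(lines)


prompt = build_chat_prompt(
    [("Hello!", "Hi, how can I help you?")],
    "How can you cure hiccups?",
)
print(prompt)
```

Because a foundation model only continues text, a chat loop built this way would typically cut the generated output at the next `User:` label and append the reply to `history` before the following turn.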

In addition, the larger LLaMa models clearly outperform the smaller ones on most tasks and give higher-quality answers in terms of factual accuracy and language proficiency.

Fine-tuning the LLaMa models for specific use cases is the next interesting step in exploring them. Further training that adapts the model parameters can improve performance for particular use cases and tailor the answers to specific user requirements. This fine-tuning is expected to unlock further potential and make the LLaMa models more usable across different domains.

While this article showed how to run the model for a single user in a console, our next LLM article will cover serving the LLaMa model through a multi-user API. This makes interactive, dynamic applications possible that serve several users at the same time.

Fine-tuning and multi-user capability further increase the potential of the LLaMa models and make them even more valuable across a wide range of use cases.
