How to Run Qwen2.5-Coder-7B-Instruct Model in the Cloud?

Ayush Kumar

Oct 4, 2024 — 7 min read

Qwen2.5-Coder is Alibaba Cloud's latest series of code-specific Qwen large language models. Compared to previous versions of the Qwen series, it is a more robust model that performs well across coding, mathematics, reasoning, and instruction-following tasks.

Qwen2.5-Coder is released as base and instruction-tuned language models in three sizes: 1.5, 7, and 32 billion parameters (the 32B model is coming soon). All models are available under the Apache 2.0 License.

The Qwen2.5-Coder-7B-Instruct model supports a context length of up to 128K tokens and can generate up to 8K tokens of text in a single response. It is reported to perform strongly against Llama models in areas such as coding and mathematics.
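The limits above can be turned into a quick budget check before sending a long prompt. This is a minimal sketch, assuming "128K" means 131,072 tokens and "8K" means 8,192 tokens; the function name and constants are hypothetical helpers, not part of any library.

```python
# Assumed limits: "128K" read as 131,072 tokens and "8K" as 8,192 tokens.
CONTEXT_LIMIT = 128 * 1024
MAX_GENERATION = 8 * 1024


def fits_in_context(prompt_tokens: int, max_new_tokens: int) -> bool:
    """Return True if the prompt plus the requested output fit the assumed limits."""
    if max_new_tokens > MAX_GENERATION:
        return False
    return prompt_tokens + max_new_tokens <= CONTEXT_LIMIT


print(fits_in_context(prompt_tokens=2_000, max_new_tokens=512))      # True
print(fits_in_context(prompt_tokens=130_000, max_new_tokens=4_096))  # False
```

In practice, the prompt token count would come from the tokenizer rather than being typed in by hand.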

Use Cases of the Qwen2.5-Coder-7B-Instruct Model

There are several use cases for the Qwen2.5-Coder-7B-Instruct model:

Code Generation: Generate and solve Data Structures & Algorithms questions.
AI Applications: Facilitate the development of chatbots and virtual assistants.
Code Reasoning: Answer multiple-choice questions (MCQs) about code.
Math Reasoning: Generate and work through math problems, such as theorems and sums.
Debugging: Support code generation, debugging, and automation.

Model Inputs and Outputs

Inputs

A text prompt containing a coding question, a theorem, a yes/no question, instructions, and so on.

Outputs

The primary output of the Qwen2.5-Coder-7B-Instruct model is long-form text, such as code, theorems, and explanations.

Qwen2.5-Coder-Instruct excels in several key areas:

Multi-language Programming Expertise

Code Reasoning

Math Reasoning

Model	Math	GSM8K	GaoKao2023en	OlympiadBench	CollegeMath	AIME24
DeepSeek-Coder-V2-Lite-Instruct	61.0	87.6	56.1	26.4	39.8	6.7
Qwen2.5-Coder-7B-Instruct	66.8	86.7	60.5	29.8	43.5	10.0

Basic capabilities

Model	AMC23	MMLU	MMLU-Pro	IFEval	CEval	GPQA
DeepSeek-Coder-V2-Lite-Instruct	40.4	42.5	60.6	38.6	60.1	27.6
Qwen2.5-Coder-7B-Instruct	42.5	45.6	68.7	58.6	61.4	35.6

In this blog, you'll learn:

About the Qwen2.5-Coder-7B-Instruct model
How to set up a GPU-powered Virtual Machine offered by NodeShift
How to run the Qwen2.5-Coder-7B-Instruct model in the NodeShift Cloud

Step-by-Step Process to Run Qwen2.5-Coder-7B-Instruct Model in the Cloud

For this tutorial, we will use a GPU-powered Virtual Machine offered by NodeShift; however, you can replicate the same steps with any other cloud provider of your choice. NodeShift offers affordable Virtual Machines at scale that meet GDPR, SOC2, and ISO27001 requirements.

Step 1: Sign Up and Set Up a NodeShift Cloud Account

Visit the NodeShift Platform and create an account. Once you've signed up, log into your account.

Follow the account setup process and provide the necessary details and information.

Step 2: Create a GPU Node (Virtual Machine)

GPU Nodes are NodeShift's GPU Virtual Machines: on-demand resources equipped with a range of GPUs, from H100s to A100s. These GPU-powered VMs give you more control over the environment, letting you adjust the GPU, CPU, RAM, and storage configuration to your requirements.

Navigate to the menu on the left side and select the GPU Nodes option. In the Dashboard, click the Create GPU Node button to create your first Virtual Machine deployment.

Step 3: Select a Model, Region, and Storage

In the "GPU Nodes" tab, select a GPU Model and Storage according to your needs and the geographical region where you want to launch your model.

We will use 1x RTX A6000 GPU for this tutorial; its 48 GB of VRAM leaves ample headroom for the 7B model. However, you can choose a more affordable GPU with less VRAM if that better suits your requirements.
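To judge whether a smaller GPU would do, it helps to estimate how much memory the weights alone occupy. The sketch below is back-of-the-envelope arithmetic, assuming roughly 7.6 billion parameters for the 7B model; it ignores activations, the KV cache, and framework overhead, so real usage will be higher. The function name is a hypothetical helper.

```python
def weights_vram_gib(num_params: float, bytes_per_param: int) -> float:
    """Approximate memory needed for the model weights alone, in GiB."""
    return num_params * bytes_per_param / 1024**3


# Assumption: roughly 7.6 billion parameters for the 7B model.
params = 7.6e9

for label, nbytes in [("float32", 4), ("bfloat16/float16", 2)]:
    print(f"{label}: ~{weights_vram_gib(params, nbytes):.1f} GiB")
```

Under these assumptions, half-precision weights come to about 14 GiB, so a 16 GB card would be tight and a 24 GB card more comfortable.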

Step 4: Select Authentication Method

There are two authentication methods available: Password and SSH Key. SSH keys are a more secure option. To create them, please refer to our official documentation.

Step 5: Choose an Image

Next, you will need to choose an image for your Virtual Machine. We will deploy the Qwen2.5-Coder-7B-Instruct Model on a Jupyter Virtual Machine. This open-source platform will allow you to install and run the Qwen2.5-Coder-7B-Instruct Model on your GPU node. By running this model on a Jupyter Notebook, we avoid using the terminal, simplifying the process and reducing the setup time. This allows you to configure the model in just a few steps and minutes.

Note: NodeShift provides multiple image template options, such as TensorFlow, PyTorch, NVIDIA CUDA, Deepo, Whisper ASR Webservice, and Jupyter Notebook. With these options, you don’t need to install additional libraries or packages to run Jupyter Notebook. You can start Jupyter Notebook in just a few simple clicks.

After choosing the image, click the 'Create' button, and your Virtual Machine will be deployed.

Step 6: Virtual Machine Successfully Deployed

You will get visual confirmation that your node is up and running.

Step 7: Connect to Jupyter Notebook

Once your GPU VM deployment is successfully created and has reached the 'RUNNING' status, you can navigate to the page of your GPU Deployment Instance. Then, click the 'Connect' Button in the top right corner.

After clicking the 'Connect' button, you can view the Jupyter Notebook.

Now open a Python 3 (ipykernel) notebook.

Next, if you want to check the GPU details, run the following command in a Jupyter Notebook cell:

!nvidia-smi
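If you would rather capture the GPU details in Python than read the full table, the same tool can be queried for specific fields. This is a minimal sketch that assumes nvidia-smi is on the PATH; the function name is a hypothetical helper, and it returns an empty list when no NVIDIA driver is found.

```python
import shutil
import subprocess
from typing import List


def gpu_summary() -> List[str]:
    """Return one 'name, total memory' line per GPU, or [] if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


print(gpu_summary())
```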

Step 8: Install the Torch Library

PyTorch, installed as the torch package, is an open-source machine learning library for Python. It provides tensor computation with GPU acceleration and automatic differentiation for building and running deep learning models.

PyTorch is designed with performance in mind and relies on highly optimized libraries such as CUDA, BLAS, and LAPACK for numerical computation.

Run the following command in a Jupyter Notebook cell to install it:

%pip install torch

Step 9: Install Transformers from GitHub

To install Transformers, run the following command in a Jupyter Notebook cell:

%pip install git+https://github.com/huggingface/transformers

Transformers provides APIs and tools to download, run, and fine-tune pre-trained models.

Step 10: Install Accelerate from GitHub

To install Accelerate, run the following command in a Jupyter Notebook cell:

%pip install git+https://github.com/huggingface/accelerate

Accelerate is a library that enables the same PyTorch code to run across any distributed configuration by adding just four lines of code.

Step 11: Install Huggingface Hub

To install the Hugging Face Hub client, run the following command in a Jupyter Notebook cell:

%pip install huggingface_hub

Hugging Face Hub is the go-to place for sharing machine learning models, demos, datasets, and metrics. The huggingface_hub library helps you interact with the Hub without leaving your development environment.
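Before loading the model, it can be worth confirming that the four packages from Steps 8 to 11 actually landed in the notebook's environment. The sketch below uses only the standard library; the function name is a hypothetical helper, and a package that is missing is reported rather than raising an error.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


required = ["torch", "transformers", "accelerate", "huggingface_hub"]
for name, version in installed_versions(required).items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```

If a package shows as not installed right after a successful install, restarting the kernel usually resolves it.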

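Step 12: Load the Qwen2.5-Coder-7B-Instruct Model and Tokenizer

from transformers import AutoModelForCausalLM, AutoTokenizer

# Hugging Face Hub ID of the model to download
model_name = "Qwen/Qwen2.5-Coder-7B-Instruct"

# Load the weights; "auto" takes the dtype from the checkpoint and
# places the model on the available GPU(s)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
# Load the matching tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)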

Step 13: Run the Prompt and Print the Output

prompt = "Given a sorted array arr[] of N integers and a number K, the task is to write the C program to find the upper_bound() and lower_bound() of K in the given array."
messages = [
    {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
# Render the conversation into the model's chat format as a plain string
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
# Tokenize the prompt and move the tensors to the model's device
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Generate up to 512 new tokens
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
# Keep only the newly generated tokens by dropping the prompt tokens
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

# Decode the token IDs back into text
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]

print(response)
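The slicing step in the code above is easy to misread. For a decoder-only model, generate returns the prompt tokens followed by the new tokens, so the prompt has to be cut off before decoding. The toy example below shows the same slice on plain Python lists; the token IDs are invented for illustration and the function name is a hypothetical helper.

```python
def strip_prompt(input_ids, output_ids):
    """Drop the echoed prompt tokens from the front of a generated sequence."""
    return output_ids[len(input_ids):]


# Toy token IDs, invented for illustration only.
prompt_ids = [101, 7, 42]
full_output = [101, 7, 42, 900, 901, 902]

new_tokens = strip_prompt(prompt_ids, full_output)
print(new_tokens)  # [900, 901, 902]
```

Without this step, the decoded response would begin with the system and user messages before the model's answer.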

Conclusion

Qwen2.5-Coder-7B-Instruct is an open-source model from Alibaba Cloud that brings strong coding, math, and reasoning capabilities to developers and researchers. By following this guide, you can quickly deploy Qwen2.5-Coder-7B-Instruct on a GPU-powered Virtual Machine with NodeShift. NodeShift provides an accessible, secure, and affordable platform to run your AI models efficiently, making it a good choice for those experimenting with Qwen2.5-Coder-7B-Instruct and other AI tools.
