Deploy LLM applications on Azure Container Apps with Code Interpreter dynamic sessions

An implementation with LangChain and Azure OpenAI

Valentina Alto
11 min read · Jul 6, 2024


One of the most prominent patterns in LLM applications today is the AI agent. Agents can be seen as specialized entities powered by:

  • Tools to interact with the surrounding ecosystem
  • Instructions to follow
  • LLMs acting as the brain and orchestrator of the tools

Among those tools, one with a particularly great impact is the Code Interpreter. The primary idea behind the code interpreter is to enhance the capabilities of the AI by enabling it to perform calculations, analyze data, generate visualizations, and interact with code directly. It is designed to generate and execute code snippets within a conversational context.

Existing code interpreter implementations include:

  • OpenAI and Azure OpenAI Assistants, where you can find the Code interpreter as an integrated tool
  • Open-source AI orchestrators such as LangChain, which offer libraries that emulate the code interpreter (for example, PythonREPL).

However, both solutions have limitations: in the first case, the code interpreter tool is tightly integrated with the OpenAI ecosystem, meaning that you don’t have the flexibility to invoke it with other LLMs (Llama, Phi-3…); in the second case, the solution doesn’t comply with enterprise-level security requirements, since there is no guarantee that the code won’t affect the local environment once the agent invokes the tool (for example, by deleting files).

An interesting solution to both scenarios is a recent feature (still in preview), announced in May 2024 and released in Azure Container Apps (ACA): dynamic sessions.

Azure Container Apps and Dynamic Sessions

Azure Container Apps (ACA) is a fully managed, serverless container service provided by Microsoft Azure, designed to streamline the deployment and scaling of containerized applications.

Azure Container Apps (ACA) is particularly beneficial for LLM applications for several key reasons:

  1. Seamless Scalability: LLM applications require significant and often unpredictable computational resources for inference requests (think about a shopping assistant chatbot on Black Friday). ACA’s auto-scaling capabilities ensure that the necessary resources are available to handle varying loads.
  2. Cost Efficiency: ACA provides a serverless environment where resources are allocated dynamically based on demand, with the possibility of scaling down to zero allocated resources. The pay-as-you-go pricing model reduces the need for over-provisioning and minimizes operational costs.
  3. Enhanced Developer Productivity: ACA abstracts away the complexities of container orchestration and infrastructure management typical of AKS. This abstraction accelerates the development lifecycle and speeds up time-to-market for LLM applications. Plus, it integrates with Azure Pipelines and GitHub Actions to provide CI/CD for your application lifecycle.
  4. Security and Compliance: Azure provides robust security features and compliance certifications, ensuring that sensitive data used in AI models is protected.

On top of that, it now offers dynamic sessions, which are sandboxed environments that are ideal for running code or applications that require strong isolation from other workloads. They offer several benefits, including:

  1. Secure Isolation: These sessions run in their own protected environment, like a sandbox.
  2. Easy Access: You can use a unique identifier to access each session via a REST API.
  3. Managed Lifecycle: Azure handles session management, cleaning up when sessions are no longer needed.
  4. Quick Startup: New sessions are allocated in milliseconds from a pool of ready but unallocated sessions.
  5. Scalability: You can run many sessions concurrently, making it flexible for your needs.

There are two types of sessions:

  1. Custom Container Sessions: You can run your own container images in these secure sandboxes.
  2. Code Interpreter Sessions: These are great for running untrusted code (which might be the case of an LLM-generated snippet) and can be easily integrated with your LLM-powered application.
Source: New: Secure Sandboxes at Scale with Azure Container Apps Dynamic Sessions — Microsoft Community Hub

The code interpreter session pool comes with a set of pre-installed libraries and can be easily integrated with the most popular AI orchestrators, including LangChain, LlamaIndex, Semantic Kernel, and AutoGen.
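
Under the hood, every orchestrator integration talks to the pool’s management endpoint over REST. As a rough sketch of what a direct call looks like (the endpoint path, api-version, and payload field names here reflect the public preview and should be double-checked against the current documentation):

import requests
from azure.identity import DefaultAzureCredential

# Placeholder pool management endpoint, copied from the session pool's overview blade
pool_management_endpoint = "https://<region>.dynamicsessions.io/subscriptions/<sub-id>/resourceGroups/<rg>/sessionPools/<pool-name>"

# Dynamic sessions authenticate with Entra ID tokens scoped to dynamicsessions.io
token = DefaultAzureCredential().get_token("https://dynamicsessions.io/.default").token

response = requests.post(
    f"{pool_management_endpoint}/code/execute",
    params={"api-version": "2024-02-02-preview", "identifier": "my-session-001"},
    headers={"Authorization": f"Bearer {token}"},
    json={
        "properties": {
            "codeInputType": "inline",
            "executionType": "synchronous",
            "code": "print(2 + 2)",
        }
    },
)
print(response.json())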

In this article, we are going to see an implementation of the Code Interpreter dynamic session with LangChain.

Building a Data Analyst copilot with LangChain and Python REPL

To build your code interpreter in LangChain, you can leverage the Python REPL tool and pass it into an agent. Note that, as stated in the official documentation provided by LangChain, “Python REPL can execute arbitrary code on the host machine (e.g., delete files, make network requests).”

from azure.identity import DefaultAzureCredential
from langchain import agents, hub
from langchain_azure_dynamic_sessions import SessionsPythonREPLTool
from langchain_core.tools import Tool
from langchain_experimental.utilities import PythonREPL
from langchain_openai import AzureChatOpenAI

I’m going to use the Azure OpenAI GPT-4-turbo model:

llm = AzureChatOpenAI(
    api_key="xxx",
    azure_endpoint="https://xxx.openai.azure.com/",  # your Azure OpenAI resource endpoint
    azure_deployment="gpt-4-turbo",
    openai_api_version="2024-02-01",
    openai_api_type="azure_ad",
    temperature=0,
)
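
As a side note, the DefaultAzureCredential import above is typically used for keyless (Entra ID) authentication. If you prefer that over an API key, a minimal sketch looks like this (the endpoint is a placeholder):

from azure.identity import DefaultAzureCredential, get_bearer_token_provider

# Entra ID (keyless) authentication: exchange your Azure identity for Azure OpenAI tokens
token_provider = get_bearer_token_provider(
    DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"
)

llm = AzureChatOpenAI(
    azure_endpoint="https://xxx.openai.azure.com/",  # placeholder endpoint
    azure_deployment="gpt-4-turbo",
    openai_api_version="2024-02-01",
    azure_ad_token_provider=token_provider,
    temperature=0,
)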

Let’s now instantiate the tool and pass it to our agent:

python_repl = PythonREPL()
repl_tool = Tool(
    name="python_repl",
    description="A Python shell. Use this to execute python commands. Input should be a valid python command. If you want to see the output of a value, you should print it out with `print(...)`. You are able to show charts that you generate.",
    func=python_repl.run,
)
tools = [repl_tool]
prompt = hub.pull("hwchase17/openai-functions-agent")
agent = agents.create_tool_calling_agent(llm, tools, prompt)

agent_executor = agents.AgentExecutor(
    agent=agent, tools=tools, verbose=True, handle_parsing_errors=True
)

response = agent_executor.invoke({"input": "tell me the sum of 2 and 2"})

Note: I used a pre-written prompt and pulled it directly from the LangChain Hub. Here you can find plenty of prompts to leverage when building your apps.

Here is the agent execution:

> Entering new AgentExecutor chain...

Invoking: `python_repl` with `print(2 + 2)`


4
The sum of 2 and 2 is 4.

> Finished chain.

Great, now let’s say we want to build an Optimization Copilot capable of solving complex optimization problems. To do so, we will leverage the Code Interpreter and slightly modify the prompt (you can do that directly in the LangChain Hub), then update the hub pull accordingly:

prompt = hub.pull("vale/pythonrepl_optimizer")
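
If you’d rather keep the prompt in your codebase instead of pulling it from the hub, you can define an equivalent one locally. The system message below is purely illustrative (it is not the actual vale/pythonrepl_optimizer prompt); the only structural requirement for create_tool_calling_agent is the agent_scratchpad placeholder:

from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

# Illustrative system message, not the actual hub prompt
prompt = ChatPromptTemplate.from_messages([
    ("system",
     "You are an optimization assistant. Translate the user's problem into a "
     "mathematical program, solve it by writing and executing Python code with "
     "the provided tool, and report the optimal variables and objective value."),
    ("human", "{input}"),
    MessagesPlaceholder("agent_scratchpad"),
])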

Let’s test it with the following problem:

problem_statement = """

You are managing the production for a small factory that produces two types of products: Product A and Product B. Your goal is to maximize the total profit while adhering to the production constraints.

### Given Data:

1. **Profit per unit:**
- Product A: $20
- Product B: $30

2. **Resource Requirements:**
- Product A requires 2 hours of machine time and 3 hours of labor.
- Product B requires 4 hours of machine time and 2 hours of labor.

3. **Available Resources:**
- Machine time: 100 hours
- Labor: 80 hours

### Objective:

Maximize the total profit while ensuring that the production of Product A and Product B does not exceed the available machine time and labor hours.

"""
agent_executor.invoke({"input": f'what is the solution of the following problem?  {problem_statement}'})

Below is the result:

> Entering new AgentExecutor chain...

Invoking: `python_repl` with `import pulp

# Define the problem
prob = pulp.LpProblem('MaximizeProfit', pulp.LpMaximize)

# Decision variables
x1 = pulp.LpVariable('Product_A', lowBound=0, cat='Continuous')
# Product A
x2 = pulp.LpVariable('Product_B', lowBound=0, cat='Continuous')
# Product B

# Objective function
prob += 20 * x1 + 30 * x2, 'Total Profit'

# Constraints
prob += 2 * x1 + 4 * x2 <= 100, 'Machine Time'
prob += 3 * x1 + 2 * x2 <= 80, 'Labor'

# Solve the problem
prob.solve()
...
{'Product_A': 15.0, 'Product_B': 17.5, 'Total_Profit': 825.0}
The optimal solution to maximize the total profit, given the constraints, is to produce 15 units of Product A and 17.5 units of Product B. This will result in a total profit of $825.

Great! Now let’s see how to integrate it with the session pool in ACA.

Create a Code Interpreter dynamic session in ACA

To create a dynamic session pool for your code interpreter, you first need to enable the containerapp CLI extension, which is still in public preview. To do so, you can open the Azure CLI directly from your Azure Portal (Cloud Shell) and run the following commands:

az upgrade
az extension add --name containerapp --upgrade --allow-preview true -y

Then, to create your session pool, run the following:

az containerapp sessionpool create -n yoursessionpool -g yourResourceGroup --container-type PythonLTS --max-sessions 30 --ready-sessions 10 --location yourlocation

If the deployment is successful, the command returns the properties of the new session pool as JSON output.

You can find a full tutorial here and a list with all the CLI commands for sessionpool here.
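
One thing to keep in mind: the identity used by your application (for example, the one resolved by DefaultAzureCredential) needs permission to execute code in the pool. At the time of writing this is granted through the Azure ContainerApps Session Executor role; a sketch of the assignment, with placeholder IDs:

az role assignment create \
  --role "Azure ContainerApps Session Executor" \
  --assignee <object-id-of-your-user-or-managed-identity> \
  --scope /subscriptions/<sub-id>/resourceGroups/yourResourceGroup/providers/Microsoft.App/sessionPools/yoursessionpool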

Now that we have our session pool up and running, we can navigate it directly through the Azure Portal:

In the overview tab, you can see the pool management endpoint that we can consume in our LLM-powered application. To do so, the only thing we need to do is replace the Python REPL tool as follows:

pool_management_endpoint = "yourendpoint"
repl = SessionsPythonREPLTool(pool_management_endpoint=pool_management_endpoint)
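
Before handing the tool to the agent, you can sanity-check that the endpoint and your permissions are in order by invoking it directly with a trivial snippet:

# Runs inside a remote sandboxed session, not on the host machine
print(repl.invoke("print(1 + 1)"))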

Then, the agent configuration remains the same:

tools = [repl]
prompt = hub.pull("vale/pythonrepl_optimizer")
agent = agents.create_tool_calling_agent(llm, tools, prompt)

agent_executor = agents.AgentExecutor(
    agent=agent, tools=tools, verbose=True, handle_parsing_errors=True
)

agent_executor.invoke({"input": f'what is the solution of the following problem? {problem_statement}'})
> Entering new AgentExecutor chain...

Invoking: `Python_REPL` with `from pulp import *

# Define the problem
prob = LpProblem('Maximize_Profit', LpMaximize)

# Decision variables
x1 = LpVariable('Product_A', lowBound=0, cat='Continuous')
# Product A
x2 = LpVariable('Product_B', lowBound=0, cat='Continuous')
# Product B

# Objective function
prob += 20 * x1 + 30 * x2, 'Total Profit'

# Constraints
prob += 2 * x1 + 4 * x2 <= 100, 'Machine Time'
prob += 3 * x1 + 2 * x2 <= 80, 'Labor'

# Solve the problem
prob.solve()

# Print the results
result = {'Product_A': x1.varValue, 'Product_B': x2.varValue, 'Total_Profit': value(prob.objective)}
result`


{
"result": null,
"stdout": "",
"stderr": "No module named 'pulp'---------------------------------------------------------------------------\nModuleNotFoundError Traceback (most recent call last)\nCell In[663], line 1\n----> 1 from pulp import *\n 3 # Define the problem\n 4 prob = LpProblem('Maximize_Profit', LpMaximize)\n\nModuleNotFoundError: No module named 'pulp'\n"
}
Invoking: `Python_REPL` with `from scipy.optimize import linprog

# Coefficients of the objective function (negative because linprog does minimization by default)
c = [-20, -30] # Profit per unit for Product A and Product B

# Inequality constraints (Ax <= b)
A = [[2, 4], # Machine time coefficients
[3, 2]] # Labor coefficients
b = [100, 80] # Available machine time and labor

# Bounds for each variable
x0_bounds = (0, None) # Product A can't be negative
x1_bounds = (0, None) # Product B can't be negative

# Solve the problem
res = linprog(c, A_ub=A, b_ub=b, bounds=[x0_bounds, x1_bounds], method='highs')

# Results
solution = {'Product_A': res.x[0], 'Product_B': res.x[1], 'Total_Profit': -res.fun}
solution`
responded: It seems that the `pulp` module is not installed in the environment I am using. However, I can still solve the optimization problem using Python's built-in capabilities. Let me do that for you.

{
"result": "{'Product_A': 15.0, 'Product_B': 17.5, 'Total_Profit': 825.0}",
"stdout": "",
"stderr": ""
}The solution to the optimization problem is as follows:

- Produce 15 units of Product A.
- Produce 17.5 units of Product B.
- The total profit will be $825.

> Finished chain.
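
Notice how the agent hit a ModuleNotFoundError for pulp: code interpreter sessions ship with a fixed set of pre-installed libraries, so the model fell back to scipy. If you want to check upfront which modules are available in the sandbox, you can run a quick standard-library probe through the tool:

# List (a slice of) the importable top-level modules inside the remote session
print(repl.invoke(
    "import pkgutil; print(sorted({m.name for m in pkgutil.iter_modules()})[:60])"
))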

As you can see, the result is the same. Now let’s build a UI with Streamlit as follows:

from langchain_core.tools import Tool
from langchain_experimental.utilities import PythonREPL
import dotenv
from azure.identity import DefaultAzureCredential
from langchain import agents, hub
from langchain_azure_dynamic_sessions import SessionsPythonREPLTool
from langchain_openai import AzureChatOpenAI
from langchain.callbacks import StreamlitCallbackHandler

import streamlit as st


st.image("optimizer.jpg", width=500)
st.title('🔢Welcome to your AI Optimizer!')

st.sidebar.title("📖OptimizerGPT!🌐")
st.sidebar.caption("Made by [Valentina Alto](https://www.linkedin.com/in/valentina-alto-6a0590148/)")
st.sidebar.info("""
Note: Be aware that the content provided by this application is generated by AI. While we strive to provide accurate and up-to-date information, there may be instances where the AI-generated content is incorrect, outdated, or incomplete.
We recommend that users treat this information as a guide and perform their own due diligence to verify the accuracy of the information provided. We are not responsible for any decisions made based on the information provided by this application.
"""
)

llm = AzureChatOpenAI(
    api_key="xxx",
    azure_endpoint="https://xxx.openai.azure.com/",  # your Azure OpenAI resource endpoint
    azure_deployment="gpt-4-turbo",
    openai_api_version="2024-02-01",
    openai_api_type="azure_ad",
    temperature=0,
)

pool_management_endpoint = "xxx"
repl = SessionsPythonREPLTool(pool_management_endpoint=pool_management_endpoint)
prompt = hub.pull("vale/pythonrepl_optimizer")
tools = [repl]

agent = agents.create_tool_calling_agent(llm, tools, prompt)

agent_executor = agents.AgentExecutor(
    agent=agent, tools=tools, verbose=True, handle_parsing_errors=True
)


if user_query := st.chat_input():
    st.chat_message("user").write(user_query)
    with st.chat_message("assistant"):
        st_callback = StreamlitCallbackHandler(st.container())
        response = agent_executor.invoke({"input": user_query}, {"callbacks": [st_callback]})
        st.write(response)
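
You can test the app locally before containerizing it (assuming the script is saved as optimizer.py, the name referenced later in the Dockerfile):

streamlit run optimizer.py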

The final result looks as follows:

Now the last step will be deploying our app in an Azure Container App instance.

Deploying the app on Azure Container Apps

To deploy your app on ACA, you can follow these steps:

  • Create your Dockerfile to build the image
# Use an official Python runtime as a parent image
FROM python:3.8-slim

# Set the working directory in the container
WORKDIR /usr/src/app

# Copy the current directory contents into the container at /usr/src/app
COPY . .

# Install any needed packages specified in requirements.txt
RUN pip install --no-cache-dir -r requirements.txt

# Make port 8501 available to the world outside this container
EXPOSE 8501

# Run app.py when the container launches
ENTRYPOINT ["streamlit", "run", "optimizer.py", "--server.port=8501", "--server.address=0.0.0.0"]
  • Create your requirements.txt file. Make sure to include all the necessary libraries.
streamlit
pandas
numpy
langchain_core
langchain_experimental
langchain_azure_dynamic_sessions
langchain
langchain_openai
azure-identity
python-dotenv
  • Build your image:
docker build -t optimizer.py .
  • Tag and push your image into a container registry of your choice. In my case, I’ll use Azure Container Registry (ACR).
# Step 1: Log in to Azure
az login

# Step 2: Log in to your Azure Container Registry
az acr login --name acivaalt

# Step 3: Tag your Docker image
docker tag optimizer.py:latest acivaalt.azurecr.io/optimizer.py:v1

# Step 4: Push your Docker image to ACR
docker push acivaalt.azurecr.io/optimizer.py:v1

Tip: make sure to use all lowercase to avoid authentication errors!

You can now see your image living within your ACR directly from the Azure Portal under the tab “Repositories”:

Great! Now let’s create an Azure Container App from this image. You can easily create one from the Azure Portal by clicking “New Resource” and searching for Container App. Make sure to set “Deployment Source” to “Container Image”, so that you can directly mount the image in your registry.

Finally, make sure to enable ingress traffic so that your app will be reachable via HTTPS/TCP protocol.
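
If you prefer the CLI to the portal, a roughly equivalent command looks like the following (the app and environment names are placeholders, and you also need to grant the app access to the registry, for example with a managed identity or admin credentials):

az containerapp create \
  --name optimizer-app \
  --resource-group yourResourceGroup \
  --environment yourContainerAppsEnvironment \
  --image acivaalt.azurecr.io/optimizer.py:v1 \
  --target-port 8501 \
  --ingress external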

Once the deployment is ready, you can see your application URL in the overview of your resource:

And that’s it! Now you have your Optimizer up and running. You can also modify both the application and the infrastructure allocation anytime, under the “Containers” tab. For example, if you have multiple versions of your image, you can point the running app to a new image version from there.
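
The same update can also be scripted with the CLI; for instance, pointing the app to a new tag (names are the placeholders used above):

az containerapp update \
  --name optimizer-app \
  --resource-group yourResourceGroup \
  --image acivaalt.azurecr.io/optimizer.py:v2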

Last but not least, updates and revisions can be carried out in a CI/CD fashion thanks to the ACA integrations with Azure Pipelines and GitHub Actions. This way, as commits are pushed to your Azure DevOps or GitHub repo, a pipeline or action is triggered that updates the container image in the container registry to the latest version (you can read more about CI/CD and ACA here).

Conclusions

Overall, Azure Container Apps offers a great managed platform to host and enhance your LLM applications in a secure and scalable environment. With the addition of dynamic sessions, it becomes even more interesting for GenAI applications, since it provides isolated and secure environments to run AI-generated code.

This makes Azure Container Apps a great building block in an enterprise-scale AI landing zone, and I’m curious to see what future developments will bring.

