Ever wished you had a personal tutor to help you solve tricky math problems? In this article, we'll explore how to build a math problem solver chat app using LangChain, Gemma 9b, Llama 3.2 Vision, and Streamlit. Our app will not only understand and solve text-based math problems but will also be able to solve image-based questions. Let's look at the problem statement and explore how to approach and solve this problem step by step.
Learning Outcomes
- Learn to create a powerful, interactive chat app using LangChain to integrate external tools and solve tasks.
- Master the process of building a chat app with LangChain that can efficiently solve complex math problems.
- Explore the use of APIs and environment variables to securely interact with large language models.
- Gain hands-on experience in designing a user-friendly web app with dynamic question-solving capabilities.
- Discover techniques for seamless interaction between frontend interfaces and backend AI models.
This article was published as a part of the Data Science Blogathon.
Defining the Problem: Business Case and Objectives
We are an EdTech company looking to develop an innovative AI-powered application that can solve both text-based and image-based math problems in real time. The app should provide solutions with step-by-step explanations to enhance learning and engagement for students, educators, and independent learners.
We are tasking you to design and build this application using the latest AI technologies. The app must be scalable, user-friendly, and capable of processing both textual inputs and images with a seamless experience.
Proposed Solution: Approach and Implementation Strategy
We will now discuss the components of the proposed solution below:
Gemma2-9B IT
It is an open-source large language model from Google designed to process and generate human-like text with remarkable accuracy. In this application:
- Role: It serves as the "brain" for solving math problems presented in text format.
- How It Works: When a user inputs a text-based math problem, Gemma2-9B understands the question, applies the required mathematical logic, and generates a solution (a minimal invocation sketch follows below).
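As a quick taste of the text model, here is a minimal sketch of calling Gemma 2 9B IT through LangChain's ChatGroq wrapper. This is an illustration under the assumption that GROQ_API_KEY is already set in your environment; the full setup, including loading it from a .env file, appears later in the article, and the question string is just an example.

# Minimal sketch: ask Gemma 2 9B IT a math question via Groq.
# Assumes GROQ_API_KEY is set in the environment.
from langchain_groq import ChatGroq

llm = ChatGroq(model="gemma2-9b-it")
result = llm.invoke("Solve step by step: what is 15% of 240?")
print(result.content)  # the worked answer should come out to 36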
Llama 3.2 Vision
It is an open-source model from Meta AI, capable of processing and analyzing images, including handwritten or printed math problems.
- Role: Enables the app to "see" and interpret math problems provided in image format and generate the response.
- How It Works: When users upload an image, the Llama 3.2 Vision model identifies the mathematical expressions or questions within it and converts them into a format suitable for problem-solving.
LangChain
It is a framework specifically designed for building applications that involve interactions between language models and external systems.
- Role: Acts as the intermediary between the app's interface and the AI models, managing the flow of information.
- How It Works:
- It coordinates how the user's input (text or image) is processed.
- It ensures the smooth exchange of data between Gemma2-9B, the Llama 3.2 Vision model, and the app interface.
Streamlit
It is an open-source Python library for creating interactive web applications quickly and easily.
- Role: It is used to write the frontend in Python.
- How It Works:
- Developers can use Streamlit to design and deploy a web interface where users enter text or upload images.
- The interface interacts seamlessly with LangChain and the underlying AI models to display results (see the minimal sketch after this list).
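To give a feel for how little code a Streamlit frontend needs, here is a minimal, self-contained sketch. It is not part of the final app; the echoed reply is a placeholder standing in for where the LLM call would go.

# Minimal Streamlit sketch: a text box wired to a placeholder "solver".
# Run with: streamlit run sketch.py
import streamlit as st

st.title("Mini Math Chat")
question = st.text_input("Ask a math question:")
if question:
    # In the full app, this is where LangChain would route the question to an LLM.
    st.write(f"You asked: {question}")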
Visualizing the Approach: Flow Diagram of the Solution
The process begins by setting up the environment, checking the Groq API key, and configuring the Streamlit page settings. It then initializes the text LLM (ChatGroq) and integrates tools like Wikipedia and a calculator to enhance the text agent's capabilities. A welcome message and sidebar navigation guide the user through the interface, where they can enter either text or image-based queries. The text section collects user questions and processes them using the text agent, which uses the LLM and external tools to generate answers. Similarly, for image queries, the image section allows users to upload images, which are then processed by the image-specific LLM (ChatGroq).
Once the text or image query is processed, the respective agent generates and displays the appropriate answers. The system ensures smooth interaction by alternating between handling text and image queries. After displaying the answers, the process concludes, and the system is ready for the next query. This flow creates an intuitive, multi-modal experience where users can ask both text and image-based questions, with the system providing accurate and efficient responses.
Setting Up the Foundation
Setting up the foundation is a crucial step in ensuring seamless integration of tools and processes, laying the groundwork for the successful operation of the system.
Environment Setup
First things first, set up your development environment. Make sure you have Python installed and create a virtual environment to keep your project dependencies organized.
# Create an environment
python -m venv env

# Activate it on Windows
.\env\Scripts\activate

# Activate it on macOS/Linux
source env/bin/activate
Install Dependencies
Install the required libraries using:
pip install -r https://raw.githubusercontent.com/Gouravlohar/Math-Solver/refs/heads/master/requirements.txt
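If you would rather maintain a local requirements file, the imports used later in this article imply roughly the following package list. This is a hedged reconstruction from the imports, not the exact contents of the repository's requirements.txt:

# requirements.txt (reconstructed from the imports below; versions unpinned)
streamlit
python-dotenv
langchain
langchain-groq
langchain-community
groq
wikipedia
numexpr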
Get the Groq API Key
- To access the Llama and Gemma models, we will use Groq.
- Get your free API key from here.
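Store the key in a .env file at the project root so the app can load it securely later. The variable name GROQ_API_KEY matches what the code below expects; the value shown is a placeholder.

# .env
GROQ_API_KEY=your_groq_api_key_here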
Import Necessary Libraries
import streamlit as st
import os
import base64
from dotenv import load_dotenv
from langchain_groq import ChatGroq
from langchain.chains import LLMMathChain, LLMChain
from langchain.prompts import PromptTemplate
from langchain_community.utilities import WikipediaAPIWrapper
from langchain.agents.agent_types import AgentType
from langchain.agents import Tool, initialize_agent
from langchain_community.callbacks.streamlit import StreamlitCallbackHandler
from groq import Groq
These imports collectively set up the required libraries and modules to create a Streamlit web application that interacts with language models for solving mathematical problems and answering questions based on text and image inputs.
Load Environment Variables
load_dotenv()
groq_api_key = os.getenv("GROQ_API_KEY")
if not groq_api_key:
    st.error("Groq API Key not found in .env file")
    st.stop()
This section of the code is responsible for loading the environment variables and ensuring that the required Groq API key is available.
Set Up Both LLMs
st.set_page_config(page_title="Math Solver", page_icon="👨‍🔬")
st.title("Math Solver")

llm_text = ChatGroq(model="gemma2-9b-it", groq_api_key=groq_api_key)
llm_image = ChatGroq(model="llama-3.2-90b-vision-preview", groq_api_key=groq_api_key)
This section of the code sets up the Streamlit application by configuring its page title and icon. It then initializes two different language models (LLMs): llm_text for handling text-based questions using the "gemma2-9b-it" model, and llm_image for handling questions that include images using the "llama-3.2-90b-vision-preview" model. Both models are authenticated using the previously retrieved Groq API key.
Initialize Tools and Prompt Template
wikipedia_wrapper = WikipediaAPIWrapper()
wikipedia_tool = Tool(
    name="Wikipedia",
    func=wikipedia_wrapper.run,
    description="A tool for searching the Internet to find various information on the topics mentioned."
)

math_chain = LLMMathChain.from_llm(llm=llm_text)
calculator = Tool(
    name="Calculator",
    func=math_chain.run,
    description="A tool for solving mathematical problems. Provide only the mathematical expressions."
)

prompt = """
You are a mathematical problem-solving assistant tasked with helping users solve their questions. Arrive at the solution logically, providing a clear and step-by-step explanation. Present your response in a structured point-wise format for better understanding.
Question: {question}
Answer:
"""

prompt_template = PromptTemplate(
    input_variables=["question"],
    template=prompt
)

# Combine all the tools into a chain for text questions
chain = LLMChain(llm=llm_text, prompt=prompt_template)

reasoning_tool = Tool(
    name="Reasoning Tool",
    func=chain.run,
    description="A tool for answering logic-based and reasoning questions."
)

# Initialize the agent for text questions
assistant_agent_text = initialize_agent(
    tools=[wikipedia_tool, calculator, reasoning_tool],
    llm=llm_text,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    verbose=False,
    handle_parsing_errors=True
)
This part of the code initializes the various tools and configurations required to handle text-based questions in the Streamlit application. It sets up a Wikipedia search tool using WikipediaAPIWrapper, which allows the application to fetch information from the web, and initializes a calculator tool using the LLMMathChain class, which uses the llm_text model to evaluate mathematical expressions. It also defines a prompt template that structures questions and expected answers in a clear, step-by-step manner. This template guides the language model to generate a logical and well-explained response to each user query. A quick standalone test of the agent follows below.
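Before wiring the agent into the UI, it can be worth exercising it in isolation. A hedged sketch, assuming the agent was initialized as shown above; the question string and expected printout are illustrative only:

# Quick standalone check of the text agent outside Streamlit.
answer = assistant_agent_text.run(
    "A train travels 120 km in 2 hours. What is its average speed?"
)
print(answer)  # expected: a step-by-step explanation ending in 60 km/h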
Streamlit Session State
if "messages" not in st.session_state:
    st.session_state["messages"] = [
        {"role": "assistant", "content": "Welcome! I am your Assistant. How can I help you today?"}
    ]

for msg in st.session_state.messages:
    if msg["role"] == "user" and "image" in msg:
        st.chat_message(msg["role"]).write(msg["content"])
        st.image(msg["image"], caption="Uploaded Image", use_column_width=True)
    else:
        st.chat_message(msg["role"]).write(msg["content"])
The code initializes the chat messages in the session state if they don't exist, starting with a default welcome message from the assistant. It then loops through the messages in st.session_state and renders each one in the chat interface. For a message that comes from a user and carries an image, the text content and the uploaded image are rendered together, with a caption on the image. If the message does not contain an image, only the text content is displayed. This keeps the full chat history, including any uploaded images, visible in the chat interface.
Sidebar and Response Cleaning
st.sidebar.header("Navigation")
if st.sidebar.button("Text Question"):
    st.session_state["section"] = "text"
if st.sidebar.button("Image Question"):
    st.session_state["section"] = "image"
if "section" not in st.session_state:
    st.session_state["section"] = "text"

def clean_response(response):
    if "```" in response:
        response = response.split("```")[1].strip()
    return response
This section of the code builds the sidebar with the Text Question and Image Question navigation buttons, and the clean_response function strips code fences from the LLM's responses (a small demonstration follows below).
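For example, if the model wraps its answer in a fenced code block, clean_response extracts just the inner text. A small illustrative check, not part of the app itself:

# Illustrative check of clean_response with a fenced model reply.
raw = "Here is the solution:\n```\nStep 1: Subtract the 2 eaten apples.\nAnswer: 6 fruits\n```"
print(clean_response(raw))
# -> Step 1: Subtract the 2 eaten apples.
#    Answer: 6 fruits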
Processing Text-Based Inquiries
Processing text-based inquiries focuses on handling and addressing user questions in text form, using language models to generate precise responses based on the input provided.
if st.session_state["section"] == "text":
    st.header("Text Question")
    st.write("Please enter your mathematical question below, and I'll provide a detailed solution.")
    question = st.text_area("Your Question:", "Example: I have 5 apples and 3 oranges. If I eat 2 apples, how many fruits do I have left?")

    if st.button("Get Answer"):
        if question:
            with st.spinner("Generating response..."):
                st.session_state.messages.append({"role": "user", "content": question})
                st.chat_message("user").write(question)

                st_cb = StreamlitCallbackHandler(st.container(), expand_new_thoughts=False)
                try:
                    response = assistant_agent_text.run(st.session_state.messages, callbacks=[st_cb])
                    cleaned_response = clean_response(response)
                    st.session_state.messages.append({"role": "assistant", "content": cleaned_response})
                    st.write("### Response:")
                    st.success(cleaned_response)
                except ValueError as e:
                    st.error(f"An error occurred: {e}")
        else:
            st.warning("Please enter a question to get an answer.")
This section of the code handles the functionality of the "Text Question" section in the Streamlit application. When the section is active, it provides a header and a text area for entering any mathematics-related question. On clicking the "Get Answer" button, if a question has been entered in the text area, a spinner indicates that a response is being generated. The question entered by the user is added to the session state messages and rendered in the chat interface, and the agent's cleaned answer is displayed below a "Response" header.
Processing Image-Based Inquiries
Processing image-based inquiries involves analyzing and interpreting images uploaded by users, using advanced models to generate accurate responses or insights based on the visual content.
elif st.session_state["section"] == "image":
    st.header("Image Question")
    st.write("Please enter your question below and upload an image. I'll provide a detailed solution.")
    question = st.text_area("Your Question:", "Example: What will be the answer?")
    uploaded_file = st.file_uploader("Upload an image", type=["jpg", "jpeg", "png"])

    if st.button("Get Answer"):
        if question and uploaded_file is not None:
            with st.spinner("Generating response..."):
                image_data = uploaded_file.read()
                image_data_url = f"data:image/jpeg;base64,{base64.b64encode(image_data).decode()}"
                st.session_state.messages.append({"role": "user", "content": question, "image": image_data})
                st.chat_message("user").write(question)
                st.image(image_data, caption="Uploaded Image", use_column_width=True)
This section of the code handles the "Image Question" functionality in the Streamlit application. When the "Image Question" section is active, it displays a header, a text area for users to enter their questions, and an option to upload an image. Upon clicking the "Get Answer" button, if both a question and an image are provided, it shows a spinner indicating that a response is being generated. The uploaded image is read and encoded in base64 format as a data URL. The user's question and the image data are appended to the session state messages and displayed in the chat interface, with the image shown alongside the question. This setup ensures that both the text and image inputs are correctly captured and displayed for further processing.
Initialize the Groq Client for the Llama 3.2 Vision Model
client = Groq()
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": question
            },
            {
                "type": "image_url",
                "image_url": {
                    "url": image_data_url
                }
            }
        ]
    }
]
This section prepares the multimodal message, combining the question text and the image data URL, for the Llama vision model.
Groq API Call
try:
    completion = client.chat.completions.create(
        model="llama-3.2-90b-vision-preview",
        messages=messages,
        temperature=1,
        max_tokens=1024,
        top_p=1,
        stream=False,
        stop=None,
    )
This setup sends the user's question and image to the Groq API, which processes the inputs using the specified model and returns a generated response.
Response from the Image Model
    response = completion.choices[0].message.content
    cleaned_response = clean_response(response)
    st.session_state.messages.append({"role": "assistant", "content": cleaned_response})
    st.write("### Response:")
    st.success(cleaned_response)
except ValueError as e:
    st.error(f"An error occurred: {e}")
else:
    st.warning("Please enter a question and upload an image to get an answer.")
This section of the code processes the response from the Groq API after generating a completion. It extracts the content of the response from the first choice in the completion result and cleans it using the clean_response function. The system appends the cleaned response to the session state messages with the role of "assistant" and displays it in the chat interface. The response appears below a "Response" header with a success message. If a ValueError occurs, the system displays an error message. If either the question or the image is not provided, a warning prompts the user to supply both to get an answer.
Check the full code in the GitHub repo here.
Output
Input for the Text Section
A tank has three pipes attached to it. Pipe A can fill the tank in 4 hours, Pipe B can fill it in 6 hours, and Pipe C can empty the tank in 3 hours. If all three pipes are opened together, how long will it take to fill the tank completely?
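For reference, the expected answer can be verified by hand: Pipe A fills 1/4 of the tank per hour, Pipe B fills 1/6, and Pipe C drains 1/3, so the net rate is 1/4 + 1/6 - 1/3 = 3/12 + 2/12 - 4/12 = 1/12 of the tank per hour. The tank therefore fills in 12 hours, and the app's step-by-step response should arrive at the same result.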
Input for the Image Section
Conclusion
By combining the powers of Gemma 9b, Llama 3.2 Vision, LangChain, and Streamlit, it's possible to create a powerful and user-friendly math problem-solving app that can revolutionize how students learn and engage with mathematics, providing step-by-step solutions and real-time feedback. This not only helps learners work through complex mathematical concepts but, more importantly, offers a scalable and accessible solution for learners at all levels.
This is one example of the many ways such large language models and AI can be used in education. As we continue to develop these technologies, even more creative and impactful applications will emerge to change how we learn and teach.
What do you think of such a concept? Have you ever tried to develop AI-based edutainment applications? Share your experiences and ideas in the comments below!
Key Takeaways
- You can build a powerful math problem solver using advanced AI models like Gemma 2 9b and Llama 3.2.
- Combine text and image processing to create an app that can handle various kinds of math problems.
- Learn how to integrate LangChain with various tools to create a powerful Math Problem Solver chat app that enhances user experience.
- Leverage Groq acceleration to ensure your app delivers quick responses.
- Streamlit makes it easy to build an intuitive and engaging user interface.
- Consider the ethical implications and design your app to promote learning and understanding.
Frequently Asked Questions
Q. What is Gemma 2 9b?
A. Gemma 2 9b is a powerful language model developed by Google, capable of understanding and solving complex math problems presented in text form.
Q. How does the app handle image-based math problems?
A. The app uses the Meta Llama 3.2 Vision model to interpret math problems in images. It then extracts the problem and generates the response.
Q. Can the app show step-by-step solutions?
A. Yes, you can design the app to display the steps involved in solving a problem, which can be a helpful learning tool for users.
Q. What ethical considerations apply to an app like this?
A. It's important to ensure the app is used responsibly and doesn't facilitate cheating or hinder genuine learning. Design features that promote understanding and encourage users to engage with the problem-solving process.
Q. Where can I learn more about the technologies used?
A. You can find more information about Gemma 2 9b, Llama 3.2, Groq, LangChain, and Streamlit on Analytics Vidhya and on their respective official websites and documentation pages.
The media shown in this article is not owned by Analytics Vidhya and is used at the Author's discretion.