
Automate Blog To Twitter Thread


In today’s digital landscape, content repurposing has become essential for maximizing reach and engagement. One effective strategy is transforming long-form content like blog posts into engaging Twitter threads. However, manually creating these threads can be time-consuming and challenging. In this article, we’ll explore how to build an application that automates blog-to-Twitter-thread creation using Google’s Gemini-2.0 LLM, ChromaDB, and Streamlit.


Learning Objectives

  • Automate blog-to-Twitter-thread transformation using Google’s Gemini-2.0, ChromaDB, and Streamlit for efficient content repurposing.
  • Understand the capabilities of Google’s Gemini-2.0 LLM for automated content transformation.
  • Explore the integration of ChromaDB for efficient semantic text retrieval.
  • Build a Streamlit-based web application for seamless PDF-to-Twitter-thread conversion.
  • Gain hands-on experience with embedding models and prompt engineering for content generation.

This article was published as a part of the Data Science Blogathon.

What’s Gemini-2.0?

Gemini-2.0 is Google’s latest multimodal Large Language Model (LLM), representing a significant advancement in AI capabilities. It is now available as the gemini-2.0-flash-exp API in Vertex AI Studio. It offers improved performance in areas such as:

  • Multimodal understanding, coding, complex instruction following, and function calling in natural language.
  • Context-aware content creation.
  • Complex reasoning and analysis.
  • Native image generation, image editing, and controllable text-to-speech generation.
  • Low-latency responses with the Flash variant.

For our project, we are specifically using the gemini-2.0-flash-exp model API, which is optimized for quick responses while maintaining high-quality output.
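As a quick illustration (a minimal sketch, assuming the langchain-google-genai package from the setup below and a GOOGLE_API_KEY in your environment), a single call to the model looks like this:

from langchain_google_genai import ChatGoogleGenerativeAI

# Minimal sketch: assumes GOOGLE_API_KEY is set in the environment
llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash-exp", temperature=0.7)
response = llm.invoke("Summarize retrieval-augmented generation in one tweet.")
print(response.content)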

What’s the ChromaDB Vector Database?

ChromaDB is an open-source embedding database that excels at storing and retrieving vector embeddings. It is a high-performance database designed for efficiently storing, searching, and managing embeddings generated by AI models. It enables similarity search by indexing and comparing vectors based on their proximity to other similar vectors in multidimensional space. Its key features include:

  • Efficient similarity search capabilities
  • Easy integration with popular embedding models
  • Local storage and persistence
  • Flexible querying options
  • Lightweight deployment

In our application, ChromaDB is the backbone for storing and retrieving relevant chunks of text based on semantic similarity, enabling more contextual and accurate thread generation.
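To get a feel for this in isolation, here is a minimal sketch that reuses the same GoogleGenerativeAIEmbeddings model used later in this article; the demo texts and the directory name are made up:

from langchain_community.vectorstores import Chroma
from langchain_google_genai import GoogleGenerativeAIEmbeddings

# Minimal sketch: embed a few sample chunks and run a similarity search
embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
vectordb = Chroma.from_texts(
    texts=[
        "Gemini-2.0 is a multimodal LLM.",
        "ChromaDB stores and searches vector embeddings.",
        "Streamlit builds data apps in pure Python.",
    ],
    embedding=embeddings,
    persist_directory="./data/chroma_demo",  # hypothetical demo directory
)
results = vectordb.similarity_search("Which database stores embeddings?", k=1)
print(results[0].page_content)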

What’s Streamlit UI?

Streamlit is an open-source Python library designed to quickly build interactive, data-driven web applications for AI/ML projects. Its focus on simplicity enables developers to create visually appealing and functional apps with minimal effort.

Key Features:

  • Ease of Use: Developers can turn Python scripts into web apps with a few lines of code.
  • Widgets: It offers a wide range of input widgets (sliders, dropdowns, text inputs) to make applications interactive.
  • Data Visualization: It supports integration with popular Python libraries like Matplotlib, Plotly, and Altair for dynamic visualizations.
  • Real-time Updates: Apps automatically rerun when code or inputs change, providing a seamless user experience.
  • No Web Development Required: It removes the need to learn HTML, CSS, or JavaScript.

Applications of Streamlit

Streamlit is widely used for building dashboards, exploratory data analysis tools, and AI/ML application prototypes. Its simplicity and interactivity make it ideal for rapid prototyping and for sharing insights with non-technical stakeholders. We are using Streamlit to design the interface for our application.
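As a tiny illustration of that simplicity (a minimal sketch, separate from our project files), a complete Streamlit app can be just a few lines:

import streamlit as st

# Minimal sketch: save as hello.py and run with `streamlit run hello.py`
st.title("Hello, Streamlit!")
name = st.text_input("Your name")
if name:
    st.write(f"Nice to meet you, {name}!")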

Motivation for Tweet Generation Automation

The primary motivations behind automating tweet thread generation include:

  • Time efficiency: Reducing the manual effort required to create engaging Twitter threads.
  • Consistency: Maintaining a consistent voice and format across all threads.
  • Scalability: Processing multiple articles quickly and efficiently.
  • Enhanced engagement: Leveraging AI to create more compelling and shareable content.
  • Content optimization: Using data-driven approaches to structure threads effectively.

Project Environment Setup Using Conda

To set up the project environment, follow these steps:

# create a new conda env
conda create -n tweet-gen python=3.11
conda activate tweet-gen

Install the required packages:

pip install langchain langchain-community langchain-google-genai
pip install chromadb streamlit python-dotenv pypdf pydantic

Now create a project folder named BlogToTweet (or whatever you prefer).

Also, create a .env file in your project root. Get your GOOGLE_API_KEY from here and put it in the .env file.

GOOGLE_API_KEY="<your API KEY>"
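Optionally, you can verify that the key loads correctly (a minimal sketch using the python-dotenv package installed above):

import os
from dotenv import load_dotenv

# Minimal sketch: confirm the .env file is picked up
load_dotenv()
print("Key loaded:", os.getenv("GOOGLE_API_KEY") is not None)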

We are all set to dive into the main implementation part.

Project Implementation

In our project, there are four important files, each with its own functionality, to keep development clean.

  • services.py: Holds all the important services.
  • models.py: Contains all the important Pydantic data models.
  • main.py: For testing the automation in the terminal.
  • app.py: For the Streamlit UI implementation.

Implementing Models

We will start by implementing the Pydantic data models in the models.py file. What is Pydantic? Read this.

from typing import Optional, List
from pydantic import BaseModel

class ArticleContent(BaseModel):
    title: str
    content: str
    author: Optional[str]
    url: str

class TwitterThread(BaseModel):
    tweets: List[str]
    hashtags: List[str]

These are simple yet important models that give the article content and all the tweets a consistent structure.
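A quick usage sketch (with made-up values) shows how these models validate and structure the data:

# Minimal sketch with hypothetical values: the models validate types and keep the structure consistent
article = ArticleContent(
    title="Build an LLM-Powered App",
    content="Full article text goes here...",
    author=None,
    url="data/build_llm_powered_app.pdf",
)
thread = TwitterThread(
    tweets=["1/3 Hook tweet...", "2/3 Key detail...", "3/3 Call to action..."],
    hashtags=["#AI", "#NLP"],
)
print(article.title, len(thread.tweets))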

Implementing Services

The ContentRepurposer class handles the core functionality of the application. Here is the skeletal structure of that class.

# services.py
import os
from dotenv import load_dotenv
from typing import List
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_google_genai import GoogleGenerativeAIEmbeddings, ChatGoogleGenerativeAI
from langchain_community.vectorstores import Chroma
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

from models import ArticleContent, TwitterThread

class ContentRepurposer:
    def __init__(self):
        pass

    def process_pdf(self, pdf_path: str) -> ArticleContent:
        pass

    def get_relevant_chunks(self, query: str, k: int = 3) -> List[str]:
        pass

    def generate_twitter_thread(self, article: ArticleContent) -> TwitterThread:
        pass

    def process_article(self, pdf_path: str) -> TwitterThread:
        pass

In the __init__ method, we set up all the important parameters of the class.

def __init__(self):
        from pydantic import SecretStr

        google_api_key = os.getenv("GOOGLE_API_KEY")
        if google_api_key is None:
            raise ValueError("GOOGLE_API_KEY environment variable is not set")
        _google_api_key = SecretStr(google_api_key)
        
        # Initialize Gemini model and embeddings
        self.embeddings = GoogleGenerativeAIEmbeddings(
            model="models/embedding-001",
        )
        self.llm = ChatGoogleGenerativeAI(
            model="gemini-2.0-flash-exp",
            temperature=0.7)
        
        # Initialize text splitter
        self.text_splitter = RecursiveCharacterTextSplitter(
            chunk_size=1000,
            chunk_overlap=200,
            separators=["\n\n", "\n", " ", ""]
        )

Here, we use Pydantic’s SecretStr for secure handling of the API key. For embedding our articles we use the GoogleGenerativeAIEmbeddings class with the embedding-001 model. To create the tweets from the article we use the ChatGoogleGenerativeAI class with the latest gemini-2.0-flash-exp model. RecursiveCharacterTextSplitter is used for splitting a large document into parts; here we split the document into chunks of 1000 characters with a 200-character overlap.
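To see the effect of these splitter settings in isolation, here is a minimal sketch on dummy text:

from langchain.text_splitter import RecursiveCharacterTextSplitter

# Minimal sketch: the same splitter settings applied to ~5000 characters of dummy text
splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
    separators=["\n\n", "\n", " ", ""]
)
chunks = splitter.split_text("word " * 1000)
print(len(chunks), len(chunks[0]))  # several chunks, each at most 1000 characters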

Processing the PDF

The system processes PDFs using PyPDFLoader from LangChain and implements text chunking.

def process_pdf(self, pdf_path: str) -> ArticleContent:
        """Process local PDF and create embeddings"""
        # Load PDF
        loader = PyPDFLoader(pdf_path)
        pages = loader.load()
        
        # Extract text
        text = " ".join(page.page_content for page in pages)
        
        # Split text into chunks
        chunks = self.text_splitter.split_text(text)
        
        # Create and store embeddings in Chroma
        self.vectordb = Chroma.from_texts(
            texts=chunks,
            embedding=self.embeddings,
            persist_directory="./data/chroma_db"
        )
        
        # Extract title and author
        lines = [line.strip() for line in text.split("\n") if line.strip()]
        title = lines[0] if lines else "Untitled"
        author = lines[1] if len(lines) > 1 else None
        
        return ArticleContent(
            title=title,
            content=text,
            author=author,
            url=pdf_path
        )

In the above code, we implement the PDF processing functionality of the application (a short usage sketch follows the list below).

  • Load and Extract PDF Text: PyPDFLoader reads the PDF file and extracts the text content from all pages, concatenating it into a single string.
  • Split Text into Chunks: The text is divided into smaller chunks using the text_splitter for better processing and embedding creation.
  • Generate Embeddings: Chroma creates vector embeddings from the text chunks and stores them in a persistent database directory.
  • Extract Title and Author: The first non-empty line is used as the title, and the second as the author.
  • Return Article Content: An ArticleContent object is constructed containing the title, full text, author, and file path.
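As the promised usage sketch, calling process_pdf on a hypothetical PDF path looks like this:

# Minimal sketch: the PDF path under data/ is hypothetical
repurposer = ContentRepurposer()
article = repurposer.process_pdf("data/sample_article.pdf")
print(article.title, article.author)
print(len(article.content), "characters extracted")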

Getting the Relevant Chunks

def get_relevant_chunks(self, query: str, k: int = 3) -> List[str]:
        """Retrieve relevant chunks from the vector database"""
        results = self.vectordb.similarity_search(query, k=k)
        return [doc.page_content for doc in results]

This method retrieves the top k (default 3) most relevant text chunks from the vector database, based on their similarity to the given query.
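For example (continuing the sketch above, and assuming process_pdf has already populated the vector store):

# Minimal sketch: assumes process_pdf has already built self.vectordb
top_chunks = repurposer.get_relevant_chunks("conclusion and key takeaways", k=2)
for chunk in top_chunks:
    print(chunk[:80], "...")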

Generating the Tweet Thread from the Article

This method is the most important one, because here we bring the generative AI model, the embeddings, and the prompts together to generate the thread from the client’s PDF file.

def generate_twitter_thread(self, article: ArticleContent) -> TwitterThread:
        """Generate Twitter thread using Gemini"""
        # First, get the most relevant chunks for different aspects of the article
        intro_chunks = self.get_relevant_chunks("introduction and main points")
        technical_chunks = self.get_relevant_chunks("technical details and implementation")
        conclusion_chunks = self.get_relevant_chunks("conclusion and key takeaways")

        thread_prompt = PromptTemplate(
            input_variables=["title", "intro", "technical", "conclusion"],
            template="""
            Write an engaging Twitter thread (8-10 tweets) summarizing this technical article in an approachable and human-like style.

            Title: {title}

            Introduction Context:
            {intro}

            Technical Details:
            {technical}

            Key Takeaways:
            {conclusion}

            Guidelines:
            1. Start with a hook that grabs attention (e.g., a surprising fact, bold statement, or thought-provoking question).
            2. Use a conversational tone and explain complex details simply, without jargon.
            3. Keep tweets concise and under 280 characters, following the 1/n numbering format.
            4. Break down the key insights logically, and make each tweet build curiosity for the next one.
            5. Include relevant examples, analogies, or comparisons to aid understanding.
            6. End the thread with a strong conclusion and a call to action (e.g., "Read the full article," "Follow for more insights").
            7. Make it relatable, educational, and engaging.

            Output format:
            - A numbered list of tweets, with each tweet on a new line.
            - After the tweets, suggest 3-5 hashtags that summarize the thread, starting with #.
            """
        )
        
        chain = LLMChain(llm=self.llm, prompt=thread_prompt)
        result = chain.run({
            "title": article.title,
            "intro": "\n".join(intro_chunks),
            "technical": "\n".join(technical_chunks),
            "conclusion": "\n".join(conclusion_chunks)
        })
        
        # Parse the result into tweets and hashtags
        lines = result.split("\n")
        tweets = [line.strip() for line in lines if line.strip() and not line.strip().startswith("#")]
        hashtags = [tag.strip() for tag in lines if tag.strip().startswith("#")]
        
        # Ensure we have at least one tweet and hashtag
        if not tweets:
            tweets = ["Thread about " + article.title]
        if not hashtags:
            hashtags = ["#AI", "#TechNews"]
            
        return TwitterThread(tweets=tweets, hashtags=hashtags)

Let’s understand what is happening in the above code, step by step:

  • Retrieve Relevant Chunks: The method first extracts relevant chunks of text for the introduction, technical details, and conclusion using the get_relevant_chunks method.
  • Prepare a Prompt: A PromptTemplate is created with instructions to write an engaging Twitter thread summarizing the article, including details on tone, structure, and formatting guidelines.
  • Run the LLM Chain: An LLMChain combines the LLM and the prompt to generate a thread based on the article’s title and extracted chunks (an alternative invocation sketch follows this list).
  • Parse Results: The generated output is split into tweets and hashtags, ensuring proper formatting and extracting the required parts.
  • Return Twitter Thread: The method returns a TwitterThread object containing the formatted tweets and hashtags.
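Note that newer LangChain releases favor the runnable pipe syntax over LLMChain. A minimal sketch of an equivalent call, replacing only the chain lines inside generate_twitter_thread (an alternative on my part, not the original project code), would be:

# Minimal sketch: drop-in replacement for the LLMChain lines above (newer LangChain style)
chain = thread_prompt | self.llm
result = chain.invoke({
    "title": article.title,
    "intro": "\n".join(intro_chunks),
    "technical": "\n".join(technical_chunks),
    "conclusion": "\n".join(conclusion_chunks),
}).content  # ChatGoogleGenerativeAI returns an AIMessage; .content holds the generated text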

Processing the Article

This method processes a PDF file to extract its content and generates a Twitter thread summarizing it. Finally, it returns a TwitterThread.

def process_article(self, pdf_path: str) -> TwitterThread:
        """Main method to process an article and generate content"""
        try:
            article = self.process_pdf(pdf_path)
            thread = self.generate_twitter_thread(article)
            return thread
        except Exception as e:
            print(f"Error processing article: {str(e)}")
            raise

Up to this point, we have implemented all the code this project requires. Now there are two ways we can proceed:

  • Implementing the main file for testing in the terminal, and
  • Implementing the Streamlit application for the web interface.

If you don’t want to test the application in terminal mode, you can skip the main file implementation and go directly to the Streamlit application implementation.

Implementing the Main File for Testing

Now, we put all the modules together to test the application.

import os
from dotenv import load_dotenv
from services import ContentRepurposer


def main():
    # Load environment variables
    load_dotenv()
    google_api_key = os.getenv("GOOGLE_API_KEY")

    if not google_api_key:
        raise ValueError("GOOGLE_API_KEY environment variable not found")

    # Initialize repurposer
    repurposer = ContentRepurposer()

    # Path to your local PDF
    # pdf_path = "data/guide_to_jax.pdf"
    pdf_path = "data/build_llm_powered_app.pdf"

    try:
        thread = repurposer.process_article(pdf_path)

        print("Generated Twitter Thread:")
        for i, tweet in enumerate(thread.tweets, 1):
            print(f"\nTweet {i}/{len(thread.tweets)}:")
            print(tweet)

        print("\nSuggested Hashtags:")
        print(" ".join(thread.hashtags))

    except Exception as e:
        print(f"Failed to process article: {str(e)}")


if __name__ == "__main__":
    main()

Here, you can see that it simply imports all the modules, checks that GOOGLE_API_KEY is available, initializes the ContentRepurposer class, and then, inside the try block, creates a thread by calling the process_article() method on the repurposer object. At the end, there are some print statements for writing the tweets to the terminal, plus the exception handling.

To test the application, create a folder named data in your project root and put your downloaded PDF there. To download an article from Analytics Vidhya, go to any article, click the download button, and download it.

Now, in your terminal:

python main.py

Example Blog 1 Output

Example Blog 2 Output

I think you get the idea of how nice the application is! Let’s make it more aesthetically pleasing.

Implementing the Streamlit App

Now we will do pretty much the same as above, in a more UI-centric way.

Importing Libraries and Env Configuration

import os
import streamlit as st
from dotenv import load_dotenv
from services import ContentRepurposer
import pyperclip
from pathlib import Path

# Load environment variables
load_dotenv()

# Set page configuration
st.set_page_config(page_title="Content Repurposer", page_icon="🐦", layout="wide")

Custom CSS

# Custom CSS
st.markdown(
    """
<style>
    .tweet-box {
        background-color: #181211;
        border: 1px solid #e1e8ed;
        border-radius: 10px;
        padding: 15px;
        margin: 10px 0;
    }
    .copy-button {
        background-color: #1DA1F2;
        color: white;
        border: none;
        border-radius: 5px;
        padding: 5px 10px;
        cursor: pointer;
    }
    .main-header {
        color: #1DA1F2;
        text-align: center;
    }
    .hashtag {
        color: #1DA1F2;
        background-color: #E8F5FE;
        padding: 5px 10px;
        border-radius: 15px;
        margin: 5px;
        display: inline-block;
    }
</style>
""",
    unsafe_allow_html=True,
)

Here, we have added some CSS styling for the web page elements (tweets, copy buttons, hashtags). Is CSS confusing to you? Visit W3Schools.

Some Important Functions

def create_temp_pdf(uploaded_file):
    """Create a temporary PDF file from uploaded content"""
    temp_dir = Path("temp")
    temp_dir.mkdir(exist_ok=True)

    temp_path = temp_dir / "uploaded_pdf.pdf"
    with open(temp_path, "wb") as f:
        f.write(uploaded_file.getvalue())

    return str(temp_path)


def initialize_session_state():
    """Initialize session state variables"""
    if "tweets" not in st.session_state:
        st.session_state.tweets = None
    if "hashtags" not in st.session_state:
        st.session_state.hashtags = None


def copy_text_and_show_success(text, success_key):
    """Copy text to clipboard and show a success message"""
    try:
        pyperclip.copy(text)
        st.success("Copied to clipboard!", icon="✅")
    except Exception as e:
        st.error(f"Failed to copy: {str(e)}")

Here, the create_temp_pdf() function creates a temp directory in the project folder and puts the uploaded PDF there for the duration of the process.

The initialize_session_state() function checks whether the tweets and hashtags are already stored in the Streamlit session state.

The copy_text_and_show_success() function uses the Pyperclip library to copy the tweets or hashtags to the clipboard and show that the copy was successful.

Main Function

def main():
    initialize_session_state()

    # Header
    st.markdown(
        '<h1 class="main-header">📄 Content to Twitter Thread 🐦</h1>',
        unsafe_allow_html=True,
    )

    # Create two columns for layout
    col1, col2 = st.columns([1, 1])

    with col1:
        st.markdown("### Upload PDF")
        uploaded_file = st.file_uploader("Drop your PDF here", type=["pdf"])

        if uploaded_file:
            st.success("PDF uploaded successfully!")

            if st.button("Generate Twitter Thread", key="generate"):
                with st.spinner("Generating Twitter thread..."):
                    try:
                        # Get Google API key
                        google_api_key = os.getenv("GOOGLE_API_KEY")
                        if not google_api_key:
                            st.error(
                                "Google API key not found. Please check your .env file."
                            )
                            return

                        # Save uploaded file
                        pdf_path = create_temp_pdf(uploaded_file)

                        # Process PDF and generate thread
                        repurposer = ContentRepurposer()
                        thread = repurposer.process_article(pdf_path)

                        # Store results in session state
                        st.session_state.tweets = thread.tweets
                        st.session_state.hashtags = thread.hashtags

                        # Clean up temporary file
                        os.remove(pdf_path)

                    except Exception as e:
                        st.error(f"Error generating thread: {str(e)}")

    with col2:
        if st.session_state.tweets:
            st.markdown("### Generated Twitter Thread")

            # Copy entire thread section
            st.markdown("#### Copy Full Thread")
            all_tweets = "\n\n".join(st.session_state.tweets)
            if st.button("📋 Copy Entire Thread"):
                copy_text_and_show_success(all_tweets, "thread")

            # Display individual tweets
            st.markdown("#### Individual Tweets")
            for i, tweet in enumerate(st.session_state.tweets, 1):
                tweet_col1, tweet_col2 = st.columns([4, 1])

                with tweet_col1:
                    st.markdown(
                        f"""
                    <div class="tweet-box">
                        <p>{tweet}</p>
                    </div>
                    """,
                        unsafe_allow_html=True,
                    )

                with tweet_col2:
                    if st.button("📋", key=f"tweet_{i}"):
                        copy_text_and_show_success(tweet, f"tweet_{i}")

            # Display hashtags
            if st.session_state.hashtags:
                st.markdown("### Suggested Hashtags")

                # Display hashtags with a copy button
                hashtags_text = " ".join(st.session_state.hashtags)
                hashtags_col1, hashtags_col2 = st.columns([4, 1])

                with hashtags_col1:
                    hashtags_html = " ".join(
                        [
                            f'<span class="hashtag">{hashtag}</span>'
                            for hashtag in st.session_state.hashtags
                        ]
                    )
                    st.markdown(hashtags_html, unsafe_allow_html=True)

                with hashtags_col2:
                    if st.button("📋 Copy Tags"):
                        copy_text_and_show_success(hashtags_text, "hashtags")


if __name__ == "__main__":
    main()

If you read this code closely, you will see that Streamlit creates two columns: one for the PDF uploader and the other for displaying the generated tweets.

In the first column, we do pretty much the same as in the earlier main.py, with some additional markdown and buttons for uploading the PDF and generating the thread using Streamlit widgets.

In the second column, Streamlit iterates over the generated tweet list, puts each tweet in a tweet box with its own copy button, and finally shows all the hashtags along with their copy button.

Now the fun part!

Open your terminal and type:

streamlit run app.py

If everything is done right, it will start the Streamlit application in your default browser.


Now, drag and drop your downloaded PDF into the box. It will automatically upload the PDF to the system; then click the Generate Twitter Thread button to generate the tweets.


You can copy the full thread or individual tweets using the respective copy buttons.

I hope doing hands-on projects like this helps you learn many practical concepts in generative AI, Python libraries, and programming. Happy coding, stay healthy.

All the code used in this article is available here.

Conclusion

This project demonstrates the power of combining modern AI technologies to automate content repurposing. By leveraging Gemini-2.0 and ChromaDB, we have created a system that not only saves time but also maintains high-quality output. The modular architecture ensures easy maintenance and extensibility, while the Streamlit interface makes it accessible to non-technical users.

Key Takeaways

  • The project demonstrates a successful integration of cutting-edge AI tools for practical content automation.
  • The architecture’s modularity allows for easy maintenance and future enhancements, making it a sustainable solution for content repurposing.
  • The Streamlit interface makes the tool accessible to content creators without technical expertise, bridging the gap between complex AI technology and practical usage.
  • The implementation can handle various content types and volumes, making it suitable for both individual content creators and large organizations.

Frequently Asked Questions

Q1. How does the system handle long articles?

A. The system uses RecursiveCharacterTextSplitter to break long articles into manageable chunks, which are then embedded and stored in ChromaDB. When generating threads, it retrieves the most relevant chunks using similarity search.

Q2. What is the optimal temperature setting for Gemini-2.0 in this application?

A. We used a temperature of 0.7, which provides a balance between creativity and coherence. You can adjust this setting based on your specific needs, with higher values (>0.7) producing more creative output and lower values (<0.7) producing more focused content.

Q3. How does the system ensure tweet length compliance?

A. The prompt template explicitly specifies the 280-character limit, and the LLM is instructed to respect this constraint. You can also add validation to ensure compliance programmatically, as in the sketch below.
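For instance, a small post-processing helper (a hedged sketch, not part of the project code above) could truncate or flag over-length tweets before the thread is returned:

def enforce_tweet_limit(tweets, limit=280):
    """Minimal sketch: truncate any tweet that exceeds the character limit."""
    checked = []
    for tweet in tweets:
        if len(tweet) > limit:
            tweet = tweet[: limit - 1].rstrip() + "…"  # truncate and mark the cut
        checked.append(tweet)
    return checked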

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.

A self-taught, project-driven learner, I love to work on complex projects in deep learning, computer vision, and NLP. I always try to gain a deep understanding of a topic, whether it is deep learning, machine learning, or physics. I love creating content about what I learn and try to share my understanding with the world.
