privateGPT Walkthrough

NLP
Deep Learning
LLMs
Author

Aayush Agrawal

Published

May 22, 2023

A code walkthrough of privateGPT repo on how to build your own offline GPT Q&A system.

Large Language Models (LLMs) have surged in popularity, pushing the boundaries of natural language processing. OpenAI’s GPT-3.5 is a prime example, revolutionizing our technology interactions and sparking innovation. Particularly, LLMs excel in building Question Answering applications on knowledge bases. In this blog, we delve into the top trending GitHub repository for this week: the PrivateGPT repository and do a code walkthrough.

Fig. 1: Private GPT on GitHub’s top trending chart

1 What is privateGPT?

One of the primary concerns associated with employing online interfaces like OpenAI chatGPT or other Large Language Model systems pertains to data privacy, data control, and potential data leakage. The privateGPT repository presents a fully offline alternative for engaging with personal documents. It is constructed using open source tools and technology, thereby enabling the utilization of LLMs capabilities without compromising data privacy or encountering data leakage issues.

Fig.2: privateGPT on GitHub. At the time of writing repo had 19K+ stars and 2k+ forks.

2 Running privateGPT locally

To run privateGPT locally, users need to install the necessary packages, configure specific variables, and provide their knowledge base for question-answering purposes. Additional information on the installation process and usage can be found in the repository documentation or by referring to a dedicated blog post on the topic.

Essentially you can run it by calling the privateGPT.py file like -

python privateGPT.py
Fig.3: Invoking privateGPT locally and asking a question.

And get a response that also mention the sources it looked up for context.

Fig.4: privateGPT response.

3 Code Walkthrough

privateGPT code comprises two pipelines:

  1. Ingestion Pipeline: This pipeline is responsible for converting and storing your documents, as well as generating embeddings for them. The documents are stored in a suitable format, and their embeddings are stored in an embedding database.

  2. Q&A Interface: This interface accepts user prompts, the embedding database, and an open-source Language Model (LM) model as inputs. It utilizes these inputs to generate responses to the user’s queries.

3.1 Ingestion Pipeline

Let’s delve into the ingestion pipeline for a closer examination. The ingestion pipeline encompasses the following steps:

  1. Identifying files with various extensions and retrieving all the knowledge base from the source directory.

  2. Splitting the documents into smaller chunks based on the parameters of chunk_size and chunk_overlap.

  3. Initializing the Huggingfaceembeddings module of langchain. This involves loading a pre-trained language model from the sentence_transformers library.

  4. Initializing the Chroma database from langchain.vectorstores. This step involves taking the chunked text and the initialized embedding model and saving it in the embedding database on disk.

Fig.5: Ingestion Pipeline

Let’s look at these steps one by one.

3.1.1 Identifying and loading files from the source directory

First, we import the required libraries and various text loaders from langchain.document_loaders.

import os
import glob
from typing import List
from multiprocessing import Pool
from tqdm import tqdm
from langchain.document_loaders import (
    CSVLoader,
    EverNoteLoader,
    PDFMinerLoader,
    TextLoader,
    UnstructuredEmailLoader,
    UnstructuredEPubLoader,
    UnstructuredHTMLLoader,
    UnstructuredMarkdownLoader,
    UnstructuredODTLoader,
    UnstructuredPowerPointLoader,
    UnstructuredWordDocumentLoader,
)
from langchain.docstore.document import Document

Next, we define the mapping b/w each extension and their respective langchain document loader. You can read document loader documentation for more available loaders.

# Map file extensions to document loaders and their arguments
LOADER_MAPPING = {
    ".csv": (CSVLoader, {}),
    ".doc": (UnstructuredWordDocumentLoader, {}),
    ".docx": (UnstructuredWordDocumentLoader, {}),
    ".enex": (EverNoteLoader, {}),
    ".epub": (UnstructuredEPubLoader, {}),
    ".html": (UnstructuredHTMLLoader, {}),
    ".md": (UnstructuredMarkdownLoader, {}),
    ".odt": (UnstructuredODTLoader, {}),
    ".pdf": (PDFMinerLoader, {}),
    ".ppt": (UnstructuredPowerPointLoader, {}),
    ".pptx": (UnstructuredPowerPointLoader, {}),
    ".txt": (TextLoader, {"encoding": "utf8"}),
}

Next, we define our single document loader.

def load_single_document(file_path: str) -> Document:
    ## Find extension of the file
    ext = "." + file_path.rsplit(".", 1)[-1] 
    if ext in LOADER_MAPPING: 
        # Find the appropriate loader class and arguments
        loader_class, loader_args = LOADER_MAPPING[ext] 
        # Invoke the instance of document loader
        loader = loader_class(file_path, **loader_args) 
        ## Return the loaded document
        return loader.load()[0] 
    raise ValueError(f"Unsupported file extension '{ext}'")
    
git_dir = "../../../../privateGPT/"
loaded_document = load_single_document(git_dir+'source_documents/state_of_the_union.txt')
print(f'Type of loaded document {type(loaded_document)}')
loaded_document
Type of loaded document <class 'langchain.schema.Document'>
Document(page_content='Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Court. My fellow Americans.  \n\nLast year COVID-19 kept us apart. This year we are finally together again. \n\nTonight, we meet as Democrats Republicans and Independents. But most importantly as Americans. \n\nWith a duty to one another to the American people to the Constitution. \n\nAnd with an unwavering resolve that freedom will always triumph over tyranny. \n\nSix days ago, Russia’s Vladimir Putin sought to shake the foundations of the free world thinking he could make it bend to his menacing ways. But he badly miscalculated. \n\nHe thought he could roll into Ukraine and the world would roll over. Instead he met a wall of strength he never imagined. \n\nHe met the Ukrainian people. \n\nFrom President Zelenskyy to every Ukrainian, their fearlessness, their courage, their determination, inspires the world. \n\nGroups of citizens blocking tanks with their bodies. Everyone from students to retirees teachers turned soldiers defending their homeland. \n\nIn this struggle as President Zelenskyy said in his speech to the European Parliament “Light will win over darkness.” The Ukrainian Ambassador to the United States is here tonight. \n\nLet each of us here tonight in this Chamber send an unmistakable signal to Ukraine and to the world. \n\nPlease rise if you are able and show that, Yes, we the United States of America stand with the Ukrainian people. \n\nThroughout our history we’ve learned this lesson when dictators do not pay a price for their aggression they cause more chaos.   \n\nThey keep moving.   \n\nAnd the costs and the threats to America and the world keep rising.   \n\nThat’s why the NATO Alliance was created to secure peace and stability in Europe after World War 2. \n\nThe United States is a member along with 29 other nations. \n\nIt matters. American diplomacy matters. American resolve matters. \n\nPutin’s latest attack on Ukraine was premeditated and unprovoked. \n\nHe rejected repeated efforts at diplomacy. \n\nHe thought the West and NATO wouldn’t respond. And he thought he could divide us at home. Putin was wrong. We were ready.  Here is what we did.   \n\nWe prepared extensively and carefully. \n\nWe spent months building a coalition of other freedom-loving nations from Europe and the Americas to Asia and Africa to confront Putin. \n\nI spent countless hours unifying our European allies. We shared with the world in advance what we knew Putin was planning and precisely how he would try to falsely justify his aggression.  \n\nWe countered Russia’s lies with truth.   \n\nAnd now that he has acted the free world is holding him accountable. \n\nAlong with twenty-seven members of the European Union including France, Germany, Italy, as well as countries like the United Kingdom, Canada, Japan, Korea, Australia, New Zealand, and many others, even Switzerland. \n\nWe are inflicting pain on Russia and supporting the people of Ukraine. Putin is now isolated from the world more than ever. \n\nTogether with our allies –we are right now enforcing powerful economic sanctions. \n\nWe are cutting off Russia’s largest banks from the international financial system.  \n\nPreventing Russia’s central bank from defending the Russian Ruble making Putin’s $630 Billion “war fund” worthless.   \n\nWe are choking off Russia’s access to technology that will sap its economic strength and weaken its military for years to come.  \n\nTonight I say to the Russian oligarchs and corrupt leaders who have bilked billions of dollars off this violent regime no more. \n\nThe U.S. Department of Justice is assembling a dedicated task force to go after the crimes of Russian oligarchs.  \n\nWe are joining with our European allies to find and seize your yachts your luxury apartments your private jets. We are coming for your ill-begotten gains. \n\nAnd tonight I am announcing that we will join our allies in closing off American air space to all Russian flights – further isolating Russia – and adding an additional squeeze –on their economy. The Ruble has lost 30% of its value. \n\nThe Russian stock market has lost 40% of its value and trading remains suspended. Russia’s economy is reeling and Putin alone is to blame. \n\nTogether with our allies we are providing support to the Ukrainians in their fight for freedom. Military assistance. Economic assistance. Humanitarian assistance. \n\nWe are giving more than $1 Billion in direct assistance to Ukraine. \n\nAnd we will continue to aid the Ukrainian people as they defend their country and to help ease their suffering.  \n\nLet me be clear, our forces are not engaged and will not engage in conflict with Russian forces in Ukraine.  \n\nOur forces are not going to Europe to fight in Ukraine, but to defend our NATO Allies – in the event that Putin decides to keep moving west.  \n\nFor that purpose we’ve mobilized American ground forces, air squadrons, and ship deployments to protect NATO countries including Poland, Romania, Latvia, Lithuania, and Estonia. \n\nAs I have made crystal clear the United States and our Allies will defend every inch of territory of NATO countries with the full force of our collective power.  \n\nAnd we remain clear-eyed. The Ukrainians are fighting back with pure courage. But the next few days weeks, months, will be hard on them.  \n\nPutin has unleashed violence and chaos.  But while he may make gains on the battlefield – he will pay a continuing high price over the long run. \n\nAnd a proud Ukrainian people, who have known 30 years  of independence, have repeatedly shown that they will not tolerate anyone who tries to take their country backwards.  \n\nTo all Americans, I will be honest with you, as I’ve always promised. A Russian dictator, invading a foreign country, has costs around the world. \n\nAnd I’m taking robust action to make sure the pain of our sanctions  is targeted at Russia’s economy. And I will use every tool at our disposal to protect American businesses and consumers. \n\nTonight, I can announce that the United States has worked with 30 other countries to release 60 Million barrels of oil from reserves around the world.  \n\nAmerica will lead that effort, releasing 30 Million barrels from our own Strategic Petroleum Reserve. And we stand ready to do more if necessary, unified with our allies.  \n\nThese steps will help blunt gas prices here at home. And I know the news about what’s happening can seem alarming. \n\nBut I want you to know that we are going to be okay. \n\nWhen the history of this era is written Putin’s war on Ukraine will have left Russia weaker and the rest of the world stronger. \n\nWhile it shouldn’t have taken something so terrible for people around the world to see what’s at stake now everyone sees it clearly. \n\nWe see the unity among leaders of nations and a more unified Europe a more unified West. And we see unity among the people who are gathering in cities in large crowds around the world even in Russia to demonstrate their support for Ukraine.  \n\nIn the battle between democracy and autocracy, democracies are rising to the moment, and the world is clearly choosing the side of peace and security. \n\nThis is a real test. It’s going to take time. So let us continue to draw inspiration from the iron will of the Ukrainian people. \n\nTo our fellow Ukrainian Americans who forge a deep bond that connects our two nations we stand with you. \n\nPutin may circle Kyiv with tanks, but he will never gain the hearts and souls of the Ukrainian people. \n\nHe will never extinguish their love of freedom. He will never weaken the resolve of the free world. \n\nWe meet tonight in an America that has lived through two of the hardest years this nation has ever faced. \n\nThe pandemic has been punishing. \n\nAnd so many families are living paycheck to paycheck, struggling to keep up with the rising cost of food, gas, housing, and so much more. \n\nI understand. \n\nI remember when my Dad had to leave our home in Scranton, Pennsylvania to find work. I grew up in a family where if the price of food went up, you felt it. \n\nThat’s why one of the first things I did as President was fight to pass the American Rescue Plan.  \n\nBecause people were hurting. We needed to act, and we did. \n\nFew pieces of legislation have done more in a critical moment in our history to lift us out of crisis. \n\nIt fueled our efforts to vaccinate the nation and combat COVID-19. It delivered immediate economic relief for tens of millions of Americans.  \n\nHelped put food on their table, keep a roof over their heads, and cut the cost of health insurance. \n\nAnd as my Dad used to say, it gave people a little breathing room. \n\nAnd unlike the $2 Trillion tax cut passed in the previous administration that benefitted the top 1% of Americans, the American Rescue Plan helped working people—and left no one behind. \n\nAnd it worked. It created jobs. Lots of jobs. \n\nIn fact—our economy created over 6.5 Million new jobs just last year, more jobs created in one year  \nthan ever before in the history of America. \n\nOur economy grew at a rate of 5.7% last year, the strongest growth in nearly 40 years, the first step in bringing fundamental change to an economy that hasn’t worked for the working people of this nation for too long.  \n\nFor the past 40 years we were told that if we gave tax breaks to those at the very top, the benefits would trickle down to everyone else. \n\nBut that trickle-down theory led to weaker economic growth, lower wages, bigger deficits, and the widest gap between those at the top and everyone else in nearly a century. \n\nVice President Harris and I ran for office with a new economic vision for America. \n\nInvest in America. Educate Americans. Grow the workforce. Build the economy from the bottom up  \nand the middle out, not from the top down.  \n\nBecause we know that when the middle class grows, the poor have a ladder up and the wealthy do very well. \n\nAmerica used to have the best roads, bridges, and airports on Earth. \n\nNow our infrastructure is ranked 13th in the world. \n\nWe won’t be able to compete for the jobs of the 21st Century if we don’t fix that. \n\nThat’s why it was so important to pass the Bipartisan Infrastructure Law—the most sweeping investment to rebuild America in history. \n\nThis was a bipartisan effort, and I want to thank the members of both parties who worked to make it happen. \n\nWe’re done talking about infrastructure weeks. \n\nWe’re going to have an infrastructure decade. \n\nIt is going to transform America and put us on a path to win the economic competition of the 21st Century that we face with the rest of the world—particularly with China.  \n\nAs I’ve told Xi Jinping, it is never a good bet to bet against the American people. \n\nWe’ll create good jobs for millions of Americans, modernizing roads, airports, ports, and waterways all across America. \n\nAnd we’ll do it all to withstand the devastating effects of the climate crisis and promote environmental justice. \n\nWe’ll build a national network of 500,000 electric vehicle charging stations, begin to replace poisonous lead pipes—so every child—and every American—has clean water to drink at home and at school, provide affordable high-speed internet for every American—urban, suburban, rural, and tribal communities. \n\n4,000 projects have already been announced. \n\nAnd tonight, I’m announcing that this year we will start fixing over 65,000 miles of highway and 1,500 bridges in disrepair. \n\nWhen we use taxpayer dollars to rebuild America – we are going to Buy American: buy American products to support American jobs. \n\nThe federal government spends about $600 Billion a year to keep the country safe and secure. \n\nThere’s been a law on the books for almost a century \nto make sure taxpayers’ dollars support American jobs and businesses. \n\nEvery Administration says they’ll do it, but we are actually doing it. \n\nWe will buy American to make sure everything from the deck of an aircraft carrier to the steel on highway guardrails are made in America. \n\nBut to compete for the best jobs of the future, we also need to level the playing field with China and other competitors. \n\nThat’s why it is so important to pass the Bipartisan Innovation Act sitting in Congress that will make record investments in emerging technologies and American manufacturing. \n\nLet me give you one example of why it’s so important to pass it. \n\nIf you travel 20 miles east of Columbus, Ohio, you’ll find 1,000 empty acres of land. \n\nIt won’t look like much, but if you stop and look closely, you’ll see a “Field of dreams,” the ground on which America’s future will be built. \n\nThis is where Intel, the American company that helped build Silicon Valley, is going to build its $20 billion semiconductor “mega site”. \n\nUp to eight state-of-the-art factories in one place. 10,000 new good-paying jobs. \n\nSome of the most sophisticated manufacturing in the world to make computer chips the size of a fingertip that power the world and our everyday lives. \n\nSmartphones. The Internet. Technology we have yet to invent. \n\nBut that’s just the beginning. \n\nIntel’s CEO, Pat Gelsinger, who is here tonight, told me they are ready to increase their investment from  \n$20 billion to $100 billion. \n\nThat would be one of the biggest investments in manufacturing in American history. \n\nAnd all they’re waiting for is for you to pass this bill. \n\nSo let’s not wait any longer. Send it to my desk. I’ll sign it.  \n\nAnd we will really take off. \n\nAnd Intel is not alone. \n\nThere’s something happening in America. \n\nJust look around and you’ll see an amazing story. \n\nThe rebirth of the pride that comes from stamping products “Made In America.” The revitalization of American manufacturing.   \n\nCompanies are choosing to build new factories here, when just a few years ago, they would have built them overseas. \n\nThat’s what is happening. Ford is investing $11 billion to build electric vehicles, creating 11,000 jobs across the country. \n\nGM is making the largest investment in its history—$7 billion to build electric vehicles, creating 4,000 jobs in Michigan. \n\nAll told, we created 369,000 new manufacturing jobs in America just last year. \n\nPowered by people I’ve met like JoJo Burgess, from generations of union steelworkers from Pittsburgh, who’s here with us tonight. \n\nAs Ohio Senator Sherrod Brown says, “It’s time to bury the label “Rust Belt.” \n\nIt’s time. \n\nBut with all the bright spots in our economy, record job growth and higher wages, too many families are struggling to keep up with the bills.  \n\nInflation is robbing them of the gains they might otherwise feel. \n\nI get it. That’s why my top priority is getting prices under control. \n\nLook, our economy roared back faster than most predicted, but the pandemic meant that businesses had a hard time hiring enough workers to keep up production in their factories. \n\nThe pandemic also disrupted global supply chains. \n\nWhen factories close, it takes longer to make goods and get them from the warehouse to the store, and prices go up. \n\nLook at cars. \n\nLast year, there weren’t enough semiconductors to make all the cars that people wanted to buy. \n\nAnd guess what, prices of automobiles went up. \n\nSo—we have a choice. \n\nOne way to fight inflation is to drive down wages and make Americans poorer.  \n\nI have a better plan to fight inflation. \n\nLower your costs, not your wages. \n\nMake more cars and semiconductors in America. \n\nMore infrastructure and innovation in America. \n\nMore goods moving faster and cheaper in America. \n\nMore jobs where you can earn a good living in America. \n\nAnd instead of relying on foreign supply chains, let’s make it in America. \n\nEconomists call it “increasing the productive capacity of our economy.” \n\nI call it building a better America. \n\nMy plan to fight inflation will lower your costs and lower the deficit. \n\n17 Nobel laureates in economics say my plan will ease long-term inflationary pressures. Top business leaders and most Americans support my plan. And here’s the plan: \n\nFirst – cut the cost of prescription drugs. Just look at insulin. One in ten Americans has diabetes. In Virginia, I met a 13-year-old boy named Joshua Davis.  \n\nHe and his Dad both have Type 1 diabetes, which means they need insulin every day. Insulin costs about $10 a vial to make.  \n\nBut drug companies charge families like Joshua and his Dad up to 30 times more. I spoke with Joshua’s mom. \n\nImagine what it’s like to look at your child who needs insulin and have no idea how you’re going to pay for it.  \n\nWhat it does to your dignity, your ability to look your child in the eye, to be the parent you expect to be. \n\nJoshua is here with us tonight. Yesterday was his birthday. Happy birthday, buddy.  \n\nFor Joshua, and for the 200,000 other young people with Type 1 diabetes, let’s cap the cost of insulin at $35 a month so everyone can afford it.  \n\nDrug companies will still do very well. And while we’re at it let Medicare negotiate lower prices for prescription drugs, like the VA already does. \n\nLook, the American Rescue Plan is helping millions of families on Affordable Care Act plans save $2,400 a year on their health care premiums. Let’s close the coverage gap and make those savings permanent. \n\nSecond – cut energy costs for families an average of $500 a year by combatting climate change.  \n\nLet’s provide investments and tax credits to weatherize your homes and businesses to be energy efficient and you get a tax credit; double America’s clean energy production in solar, wind, and so much more;  lower the price of electric vehicles, saving you another $80 a month because you’ll never have to pay at the gas pump again. \n\nThird – cut the cost of child care. Many families pay up to $14,000 a year for child care per child.  \n\nMiddle-class and working families shouldn’t have to pay more than 7% of their income for care of young children.  \n\nMy plan will cut the cost in half for most families and help parents, including millions of women, who left the workforce during the pandemic because they couldn’t afford child care, to be able to get back to work. \n\nMy plan doesn’t stop there. It also includes home and long-term care. More affordable housing. And Pre-K for every 3- and 4-year-old.  \n\nAll of these will lower costs. \n\nAnd under my plan, nobody earning less than $400,000 a year will pay an additional penny in new taxes. Nobody.  \n\nThe one thing all Americans agree on is that the tax system is not fair. We have to fix it.  \n\nI’m not looking to punish anyone. But let’s make sure corporations and the wealthiest Americans start paying their fair share. \n\nJust last year, 55 Fortune 500 corporations earned $40 billion in profits and paid zero dollars in federal income tax.  \n\nThat’s simply not fair. That’s why I’ve proposed a 15% minimum tax rate for corporations. \n\nWe got more than 130 countries to agree on a global minimum tax rate so companies can’t get out of paying their taxes at home by shipping jobs and factories overseas. \n\nThat’s why I’ve proposed closing loopholes so the very wealthy don’t pay a lower tax rate than a teacher or a firefighter.  \n\nSo that’s my plan. It will grow the economy and lower costs for families. \n\nSo what are we waiting for? Let’s get this done. And while you’re at it, confirm my nominees to the Federal Reserve, which plays a critical role in fighting inflation.  \n\nMy plan will not only lower costs to give families a fair shot, it will lower the deficit. \n\nThe previous Administration not only ballooned the deficit with tax cuts for the very wealthy and corporations, it undermined the watchdogs whose job was to keep pandemic relief funds from being wasted. \n\nBut in my administration, the watchdogs have been welcomed back. \n\nWe’re going after the criminals who stole billions in relief money meant for small businesses and millions of Americans.  \n\nAnd tonight, I’m announcing that the Justice Department will name a chief prosecutor for pandemic fraud. \n\nBy the end of this year, the deficit will be down to less than half what it was before I took office.  \n\nThe only president ever to cut the deficit by more than one trillion dollars in a single year. \n\nLowering your costs also means demanding more competition. \n\nI’m a capitalist, but capitalism without competition isn’t capitalism. \n\nIt’s exploitation—and it drives up prices. \n\nWhen corporations don’t have to compete, their profits go up, your prices go up, and small businesses and family farmers and ranchers go under. \n\nWe see it happening with ocean carriers moving goods in and out of America. \n\nDuring the pandemic, these foreign-owned companies raised prices by as much as 1,000% and made record profits. \n\nTonight, I’m announcing a crackdown on these companies overcharging American businesses and consumers. \n\nAnd as Wall Street firms take over more nursing homes, quality in those homes has gone down and costs have gone up.  \n\nThat ends on my watch. \n\nMedicare is going to set higher standards for nursing homes and make sure your loved ones get the care they deserve and expect. \n\nWe’ll also cut costs and keep the economy going strong by giving workers a fair shot, provide more training and apprenticeships, hire them based on their skills not degrees. \n\nLet’s pass the Paycheck Fairness Act and paid leave.  \n\nRaise the minimum wage to $15 an hour and extend the Child Tax Credit, so no one has to raise a family in poverty. \n\nLet’s increase Pell Grants and increase our historic support of HBCUs, and invest in what Jill—our First Lady who teaches full-time—calls America’s best-kept secret: community colleges. \n\nAnd let’s pass the PRO Act when a majority of workers want to form a union—they shouldn’t be stopped.  \n\nWhen we invest in our workers, when we build the economy from the bottom up and the middle out together, we can do something we haven’t done in a long time: build a better America. \n\nFor more than two years, COVID-19 has impacted every decision in our lives and the life of the nation. \n\nAnd I know you’re tired, frustrated, and exhausted. \n\nBut I also know this. \n\nBecause of the progress we’ve made, because of your resilience and the tools we have, tonight I can say  \nwe are moving forward safely, back to more normal routines.  \n\nWe’ve reached a new moment in the fight against COVID-19, with severe cases down to a level not seen since last July.  \n\nJust a few days ago, the Centers for Disease Control and Prevention—the CDC—issued new mask guidelines. \n\nUnder these new guidelines, most Americans in most of the country can now be mask free.   \n\nAnd based on the projections, more of the country will reach that point across the next couple of weeks. \n\nThanks to the progress we have made this past year, COVID-19 need no longer control our lives.  \n\nI know some are talking about “living with COVID-19”. Tonight – I say that we will never just accept living with COVID-19. \n\nWe will continue to combat the virus as we do other diseases. And because this is a virus that mutates and spreads, we will stay on guard. \n\nHere are four common sense steps as we move forward safely.  \n\nFirst, stay protected with vaccines and treatments. We know how incredibly effective vaccines are. If you’re vaccinated and boosted you have the highest degree of protection. \n\nWe will never give up on vaccinating more Americans. Now, I know parents with kids under 5 are eager to see a vaccine authorized for their children. \n\nThe scientists are working hard to get that done and we’ll be ready with plenty of vaccines when they do. \n\nWe’re also ready with anti-viral treatments. If you get COVID-19, the Pfizer pill reduces your chances of ending up in the hospital by 90%.  \n\nWe’ve ordered more of these pills than anyone in the world. And Pfizer is working overtime to get us 1 Million pills this month and more than double that next month.  \n\nAnd we’re launching the “Test to Treat” initiative so people can get tested at a pharmacy, and if they’re positive, receive antiviral pills on the spot at no cost.  \n\nIf you’re immunocompromised or have some other vulnerability, we have treatments and free high-quality masks. \n\nWe’re leaving no one behind or ignoring anyone’s needs as we move forward. \n\nAnd on testing, we have made hundreds of millions of tests available for you to order for free.   \n\nEven if you already ordered free tests tonight, I am announcing that you can order more from covidtests.gov starting next week. \n\nSecond – we must prepare for new variants. Over the past year, we’ve gotten much better at detecting new variants. \n\nIf necessary, we’ll be able to deploy new vaccines within 100 days instead of many more months or years.  \n\nAnd, if Congress provides the funds we need, we’ll have new stockpiles of tests, masks, and pills ready if needed. \n\nI cannot promise a new variant won’t come. But I can promise you we’ll do everything within our power to be ready if it does.  \n\nThird – we can end the shutdown of schools and businesses. We have the tools we need. \n\nIt’s time for Americans to get back to work and fill our great downtowns again.  People working from home can feel safe to begin to return to the office.   \n\nWe’re doing that here in the federal government. The vast majority of federal workers will once again work in person. \n\nOur schools are open. Let’s keep it that way. Our kids need to be in school. \n\nAnd with 75% of adult Americans fully vaccinated and hospitalizations down by 77%, most Americans can remove their masks, return to work, stay in the classroom, and move forward safely. \n\nWe achieved this because we provided free vaccines, treatments, tests, and masks. \n\nOf course, continuing this costs money. \n\nI will soon send Congress a request. \n\nThe vast majority of Americans have used these tools and may want to again, so I expect Congress to pass it quickly.   \n\nFourth, we will continue vaccinating the world.     \n\nWe’ve sent 475 Million vaccine doses to 112 countries, more than any other nation. \n\nAnd we won’t stop. \n\nWe have lost so much to COVID-19. Time with one another. And worst of all, so much loss of life. \n\nLet’s use this moment to reset. Let’s stop looking at COVID-19 as a partisan dividing line and see it for what it is: A God-awful disease.  \n\nLet’s stop seeing each other as enemies, and start seeing each other for who we really are: Fellow Americans.  \n\nWe can’t change how divided we’ve been. But we can change how we move forward—on COVID-19 and other issues we must face together. \n\nI recently visited the New York City Police Department days after the funerals of Officer Wilbert Mora and his partner, Officer Jason Rivera. \n\nThey were responding to a 9-1-1 call when a man shot and killed them with a stolen gun. \n\nOfficer Mora was 27 years old. \n\nOfficer Rivera was 22. \n\nBoth Dominican Americans who’d grown up on the same streets they later chose to patrol as police officers. \n\nI spoke with their families and told them that we are forever in debt for their sacrifice, and we will carry on their mission to restore the trust and safety every community deserves. \n\nI’ve worked on these issues a long time. \n\nI know what works: Investing in crime preventionand community police officers who’ll walk the beat, who’ll know the neighborhood, and who can restore trust and safety. \n\nSo let’s not abandon our streets. Or choose between safety and equal justice. \n\nLet’s come together to protect our communities, restore trust, and hold law enforcement accountable. \n\nThat’s why the Justice Department required body cameras, banned chokeholds, and restricted no-knock warrants for its officers. \n\nThat’s why the American Rescue Plan provided $350 Billion that cities, states, and counties can use to hire more police and invest in proven strategies like community violence interruption—trusted messengers breaking the cycle of violence and trauma and giving young people hope.  \n\nWe should all agree: The answer is not to Defund the police. The answer is to FUND the police with the resources and training they need to protect our communities. \n\nI ask Democrats and Republicans alike: Pass my budget and keep our neighborhoods safe.  \n\nAnd I will keep doing everything in my power to crack down on gun trafficking and ghost guns you can buy online and make at home—they have no serial numbers and can’t be traced. \n\nAnd I ask Congress to pass proven measures to reduce gun violence. Pass universal background checks. Why should anyone on a terrorist list be able to purchase a weapon? \n\nBan assault weapons and high-capacity magazines. \n\nRepeal the liability shield that makes gun manufacturers the only industry in America that can’t be sued. \n\nThese laws don’t infringe on the Second Amendment. They save lives. \n\nThe most fundamental right in America is the right to vote – and to have it counted. And it’s under assault. \n\nIn state after state, new laws have been passed, not only to suppress the vote, but to subvert entire elections. \n\nWe cannot let this happen. \n\nTonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \n\nTonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence. \n\nA former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since she’s been nominated, she’s received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans. \n\nAnd if we are to advance liberty and justice, we need to secure the Border and fix the immigration system. \n\nWe can do both. At our border, we’ve installed new technology like cutting-edge scanners to better detect drug smuggling.  \n\nWe’ve set up joint patrols with Mexico and Guatemala to catch more human traffickers.  \n\nWe’re putting in place dedicated immigration judges so families fleeing persecution and violence can have their cases heard faster. \n\nWe’re securing commitments and supporting partners in South and Central America to host more refugees and secure their own borders. \n\nWe can do all this while keeping lit the torch of liberty that has led generations of immigrants to this land—my forefathers and so many of yours. \n\nProvide a pathway to citizenship for Dreamers, those on temporary status, farm workers, and essential workers. \n\nRevise our laws so businesses have the workers they need and families don’t wait decades to reunite. \n\nIt’s not only the right thing to do—it’s the economically smart thing to do. \n\nThat’s why immigration reform is supported by everyone from labor unions to religious leaders to the U.S. Chamber of Commerce. \n\nLet’s get it done once and for all. \n\nAdvancing liberty and justice also requires protecting the rights of women. \n\nThe constitutional right affirmed in Roe v. Wade—standing precedent for half a century—is under attack as never before. \n\nIf we want to go forward—not backward—we must protect access to health care. Preserve a woman’s right to choose. And let’s continue to advance maternal health care in America. \n\nAnd for our LGBTQ+ Americans, let’s finally get the bipartisan Equality Act to my desk. The onslaught of state laws targeting transgender Americans and their families is wrong. \n\nAs I said last year, especially to our younger transgender Americans, I will always have your back as your President, so you can be yourself and reach your God-given potential. \n\nWhile it often appears that we never agree, that isn’t true. I signed 80 bipartisan bills into law last year. From preventing government shutdowns to protecting Asian-Americans from still-too-common hate crimes to reforming military justice. \n\nAnd soon, we’ll strengthen the Violence Against Women Act that I first wrote three decades ago. It is important for us to show the nation that we can come together and do big things. \n\nSo tonight I’m offering a Unity Agenda for the Nation. Four big things we can do together.  \n\nFirst, beat the opioid epidemic. \n\nThere is so much we can do. Increase funding for prevention, treatment, harm reduction, and recovery.  \n\nGet rid of outdated rules that stop doctors from prescribing treatments. And stop the flow of illicit drugs by working with state and local law enforcement to go after traffickers. \n\nIf you’re suffering from addiction, know you are not alone. I believe in recovery, and I celebrate the 23 million Americans in recovery. \n\nSecond, let’s take on mental health. Especially among our children, whose lives and education have been turned upside down.  \n\nThe American Rescue Plan gave schools money to hire teachers and help students make up for lost learning.  \n\nI urge every parent to make sure your school does just that. And we can all play a part—sign up to be a tutor or a mentor. \n\nChildren were also struggling before the pandemic. Bullying, violence, trauma, and the harms of social media. \n\nAs Frances Haugen, who is here with us tonight, has shown, we must hold social media platforms accountable for the national experiment they’re conducting on our children for profit. \n\nIt’s time to strengthen privacy protections, ban targeted advertising to children, demand tech companies stop collecting personal data on our children. \n\nAnd let’s get all Americans the mental health services they need. More people they can turn to for help, and full parity between physical and mental health care. \n\nThird, support our veterans. \n\nVeterans are the best of us. \n\nI’ve always believed that we have a sacred obligation to equip all those we send to war and care for them and their families when they come home. \n\nMy administration is providing assistance with job training and housing, and now helping lower-income veterans get VA care debt-free.  \n\nOur troops in Iraq and Afghanistan faced many dangers. \n\nOne was stationed at bases and breathing in toxic smoke from “burn pits” that incinerated wastes of war—medical and hazard material, jet fuel, and more. \n\nWhen they came home, many of the world’s fittest and best trained warriors were never the same. \n\nHeadaches. Numbness. Dizziness. \n\nA cancer that would put them in a flag-draped coffin. \n\nI know. \n\nOne of those soldiers was my son Major Beau Biden. \n\nWe don’t know for sure if a burn pit was the cause of his brain cancer, or the diseases of so many of our troops. \n\nBut I’m committed to finding out everything we can. \n\nCommitted to military families like Danielle Robinson from Ohio. \n\nThe widow of Sergeant First Class Heath Robinson.  \n\nHe was born a soldier. Army National Guard. Combat medic in Kosovo and Iraq. \n\nStationed near Baghdad, just yards from burn pits the size of football fields. \n\nHeath’s widow Danielle is here with us tonight. They loved going to Ohio State football games. He loved building Legos with their daughter. \n\nBut cancer from prolonged exposure to burn pits ravaged Heath’s lungs and body. \n\nDanielle says Heath was a fighter to the very end. \n\nHe didn’t know how to stop fighting, and neither did she. \n\nThrough her pain she found purpose to demand we do better. \n\nTonight, Danielle—we are. \n\nThe VA is pioneering new ways of linking toxic exposures to diseases, already helping more veterans get benefits. \n\nAnd tonight, I’m announcing we’re expanding eligibility to veterans suffering from nine respiratory cancers. \n\nI’m also calling on Congress: pass a law to make sure veterans devastated by toxic exposures in Iraq and Afghanistan finally get the benefits and comprehensive health care they deserve. \n\nAnd fourth, let’s end cancer as we know it. \n\nThis is personal to me and Jill, to Kamala, and to so many of you. \n\nCancer is the #2 cause of death in America–second only to heart disease. \n\nLast month, I announced our plan to supercharge  \nthe Cancer Moonshot that President Obama asked me to lead six years ago. \n\nOur goal is to cut the cancer death rate by at least 50% over the next 25 years, turn more cancers from death sentences into treatable diseases.  \n\nMore support for patients and families. \n\nTo get there, I call on Congress to fund ARPA-H, the Advanced Research Projects Agency for Health. \n\nIt’s based on DARPA—the Defense Department project that led to the Internet, GPS, and so much more.  \n\nARPA-H will have a singular purpose—to drive breakthroughs in cancer, Alzheimer’s, diabetes, and more. \n\nA unity agenda for the nation. \n\nWe can do this. \n\nMy fellow Americans—tonight , we have gathered in a sacred space—the citadel of our democracy. \n\nIn this Capitol, generation after generation, Americans have debated great questions amid great strife, and have done great things. \n\nWe have fought for freedom, expanded liberty, defeated totalitarianism and terror. \n\nAnd built the strongest, freest, and most prosperous nation the world has ever known. \n\nNow is the hour. \n\nOur moment of responsibility. \n\nOur test of resolve and conscience, of history itself. \n\nIt is in this moment that our character is formed. Our purpose is found. Our future is forged. \n\nWell I know this nation.  \n\nWe will meet the test. \n\nTo protect freedom and liberty, to expand fairness and opportunity. \n\nWe will save democracy. \n\nAs hard as these times have been, I am more optimistic about America today than I have been my whole life. \n\nBecause I see the future that is within our grasp. \n\nBecause I know there is simply nothing beyond our capacity. \n\nWe are the only nation on Earth that has always turned every crisis we have faced into an opportunity. \n\nThe only nation that can be defined by a single word: possibilities. \n\nSo on this night, in our 245th year as a nation, I have come to report on the State of the Union. \n\nAnd my report is this: the State of the Union is strong—because you, the American people, are strong. \n\nWe are stronger today than we were a year ago. \n\nAnd we will be stronger a year from now than we are today. \n\nNow is our moment to meet and overcome the challenges of our time. \n\nAnd we will, as one people. \n\nOne America. \n\nThe United States of America. \n\nMay God bless you all. May God protect our troops.', metadata={'source': '../../../../privateGPT/source_documents/state_of_the_union.txt'})

The load_single_document function accomplishes the following steps:

  1. Extracts the file extension from the given file path.
  2. Retrieves the corresponding document loader and its arguments from the previously defined LOADER_MAPPING dictionary.
  3. Creates an instance of the appropriate document loader.
  4. Loads the document using the instantiated loader.
  5. Returns the loaded document.

We can see that load_single_document returns a document of type langchain.schema.Document. Which according to the documentation consists of page_content (the content of the data) and metadata (auxiliary pieces of information describing attributes of the data).

def load_documents(source_dir: str, ignored_files: List[str] = []) -> List[Document]:
    """
    Loads all documents from the source documents directory, ignoring specified files
    """
    all_files = []
    for ext in LOADER_MAPPING:
        #Find all the files within source documents which matches the extensions in Loader_Mapping file
        all_files.extend(
            glob.glob(os.path.join(source_dir, f"**/*{ext}"), recursive=True)
        )
    
    ## Filtering files from all_files if its in ignored_files
    filtered_files = [file_path for file_path in all_files if file_path not in ignored_files]
    
    ## Spinning up resource pool
    with Pool(processes=os.cpu_count()) as pool:
        results = []
        with tqdm(total=len(filtered_files), desc='Loading new documents', ncols=80) as pbar:
            # Load each document from filtered files list using load_single_document function
            for i, doc in enumerate(pool.imap_unordered(load_single_document, filtered_files)):
                results.append(doc)
                pbar.update()
    
    return results

The load_single_documents function carries out the following steps: 1. Initializes an empty dictionary called all_files.
2. For each extension in the LOADER_MAPPING dictionary, it searches for all the files with that extension in the source directory and adds them to the all_files list.
3. Creates a new list named filtered_files by removing the files listed in the ignored_files list from the all_files list.
4. Executes a parallel loading operation on all the files in the filtered_files list using the load_single_document function, and appends the results to the results list.
5. Returns the list of loaded documents.

loaded_documents = load_documents(git_dir+'source_documents')
print(f"Length of loaded documents: {len(loaded_documents)}")
loaded_documents[0]
Loading new documents: 100%|█████████████████████| 1/1 [00:00<00:00, 246.69it/s]
Length of loaded documents: 1
Document(page_content='Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Court. My fellow Americans.  \n\nLast year COVID-19 kept us apart. This year we are finally together again. \n\nTonight, we meet as Democrats Republicans and Independents. But most importantly as Americans. \n\nWith a duty to one another to the American people to the Constitution. \n\nAnd with an unwavering resolve that freedom will always triumph over tyranny. \n\nSix days ago, Russia’s Vladimir Putin sought to shake the foundations of the free world thinking he could make it bend to his menacing ways. But he badly miscalculated. \n\nHe thought he could roll into Ukraine and the world would roll over. Instead he met a wall of strength he never imagined. \n\nHe met the Ukrainian people. \n\nFrom President Zelenskyy to every Ukrainian, their fearlessness, their courage, their determination, inspires the world. \n\nGroups of citizens blocking tanks with their bodies. Everyone from students to retirees teachers turned soldiers defending their homeland. \n\nIn this struggle as President Zelenskyy said in his speech to the European Parliament “Light will win over darkness.” The Ukrainian Ambassador to the United States is here tonight. \n\nLet each of us here tonight in this Chamber send an unmistakable signal to Ukraine and to the world. \n\nPlease rise if you are able and show that, Yes, we the United States of America stand with the Ukrainian people. \n\nThroughout our history we’ve learned this lesson when dictators do not pay a price for their aggression they cause more chaos.   \n\nThey keep moving.   \n\nAnd the costs and the threats to America and the world keep rising.   \n\nThat’s why the NATO Alliance was created to secure peace and stability in Europe after World War 2. \n\nThe United States is a member along with 29 other nations. \n\nIt matters. American diplomacy matters. American resolve matters. \n\nPutin’s latest attack on Ukraine was premeditated and unprovoked. \n\nHe rejected repeated efforts at diplomacy. \n\nHe thought the West and NATO wouldn’t respond. And he thought he could divide us at home. Putin was wrong. We were ready.  Here is what we did.   \n\nWe prepared extensively and carefully. \n\nWe spent months building a coalition of other freedom-loving nations from Europe and the Americas to Asia and Africa to confront Putin. \n\nI spent countless hours unifying our European allies. We shared with the world in advance what we knew Putin was planning and precisely how he would try to falsely justify his aggression.  \n\nWe countered Russia’s lies with truth.   \n\nAnd now that he has acted the free world is holding him accountable. \n\nAlong with twenty-seven members of the European Union including France, Germany, Italy, as well as countries like the United Kingdom, Canada, Japan, Korea, Australia, New Zealand, and many others, even Switzerland. \n\nWe are inflicting pain on Russia and supporting the people of Ukraine. Putin is now isolated from the world more than ever. \n\nTogether with our allies –we are right now enforcing powerful economic sanctions. \n\nWe are cutting off Russia’s largest banks from the international financial system.  \n\nPreventing Russia’s central bank from defending the Russian Ruble making Putin’s $630 Billion “war fund” worthless.   \n\nWe are choking off Russia’s access to technology that will sap its economic strength and weaken its military for years to come.  \n\nTonight I say to the Russian oligarchs and corrupt leaders who have bilked billions of dollars off this violent regime no more. \n\nThe U.S. Department of Justice is assembling a dedicated task force to go after the crimes of Russian oligarchs.  \n\nWe are joining with our European allies to find and seize your yachts your luxury apartments your private jets. We are coming for your ill-begotten gains. \n\nAnd tonight I am announcing that we will join our allies in closing off American air space to all Russian flights – further isolating Russia – and adding an additional squeeze –on their economy. The Ruble has lost 30% of its value. \n\nThe Russian stock market has lost 40% of its value and trading remains suspended. Russia’s economy is reeling and Putin alone is to blame. \n\nTogether with our allies we are providing support to the Ukrainians in their fight for freedom. Military assistance. Economic assistance. Humanitarian assistance. \n\nWe are giving more than $1 Billion in direct assistance to Ukraine. \n\nAnd we will continue to aid the Ukrainian people as they defend their country and to help ease their suffering.  \n\nLet me be clear, our forces are not engaged and will not engage in conflict with Russian forces in Ukraine.  \n\nOur forces are not going to Europe to fight in Ukraine, but to defend our NATO Allies – in the event that Putin decides to keep moving west.  \n\nFor that purpose we’ve mobilized American ground forces, air squadrons, and ship deployments to protect NATO countries including Poland, Romania, Latvia, Lithuania, and Estonia. \n\nAs I have made crystal clear the United States and our Allies will defend every inch of territory of NATO countries with the full force of our collective power.  \n\nAnd we remain clear-eyed. The Ukrainians are fighting back with pure courage. But the next few days weeks, months, will be hard on them.  \n\nPutin has unleashed violence and chaos.  But while he may make gains on the battlefield – he will pay a continuing high price over the long run. \n\nAnd a proud Ukrainian people, who have known 30 years  of independence, have repeatedly shown that they will not tolerate anyone who tries to take their country backwards.  \n\nTo all Americans, I will be honest with you, as I’ve always promised. A Russian dictator, invading a foreign country, has costs around the world. \n\nAnd I’m taking robust action to make sure the pain of our sanctions  is targeted at Russia’s economy. And I will use every tool at our disposal to protect American businesses and consumers. \n\nTonight, I can announce that the United States has worked with 30 other countries to release 60 Million barrels of oil from reserves around the world.  \n\nAmerica will lead that effort, releasing 30 Million barrels from our own Strategic Petroleum Reserve. And we stand ready to do more if necessary, unified with our allies.  \n\nThese steps will help blunt gas prices here at home. And I know the news about what’s happening can seem alarming. \n\nBut I want you to know that we are going to be okay. \n\nWhen the history of this era is written Putin’s war on Ukraine will have left Russia weaker and the rest of the world stronger. \n\nWhile it shouldn’t have taken something so terrible for people around the world to see what’s at stake now everyone sees it clearly. \n\nWe see the unity among leaders of nations and a more unified Europe a more unified West. And we see unity among the people who are gathering in cities in large crowds around the world even in Russia to demonstrate their support for Ukraine.  \n\nIn the battle between democracy and autocracy, democracies are rising to the moment, and the world is clearly choosing the side of peace and security. \n\nThis is a real test. It’s going to take time. So let us continue to draw inspiration from the iron will of the Ukrainian people. \n\nTo our fellow Ukrainian Americans who forge a deep bond that connects our two nations we stand with you. \n\nPutin may circle Kyiv with tanks, but he will never gain the hearts and souls of the Ukrainian people. \n\nHe will never extinguish their love of freedom. He will never weaken the resolve of the free world. \n\nWe meet tonight in an America that has lived through two of the hardest years this nation has ever faced. \n\nThe pandemic has been punishing. \n\nAnd so many families are living paycheck to paycheck, struggling to keep up with the rising cost of food, gas, housing, and so much more. \n\nI understand. \n\nI remember when my Dad had to leave our home in Scranton, Pennsylvania to find work. I grew up in a family where if the price of food went up, you felt it. \n\nThat’s why one of the first things I did as President was fight to pass the American Rescue Plan.  \n\nBecause people were hurting. We needed to act, and we did. \n\nFew pieces of legislation have done more in a critical moment in our history to lift us out of crisis. \n\nIt fueled our efforts to vaccinate the nation and combat COVID-19. It delivered immediate economic relief for tens of millions of Americans.  \n\nHelped put food on their table, keep a roof over their heads, and cut the cost of health insurance. \n\nAnd as my Dad used to say, it gave people a little breathing room. \n\nAnd unlike the $2 Trillion tax cut passed in the previous administration that benefitted the top 1% of Americans, the American Rescue Plan helped working people—and left no one behind. \n\nAnd it worked. It created jobs. Lots of jobs. \n\nIn fact—our economy created over 6.5 Million new jobs just last year, more jobs created in one year  \nthan ever before in the history of America. \n\nOur economy grew at a rate of 5.7% last year, the strongest growth in nearly 40 years, the first step in bringing fundamental change to an economy that hasn’t worked for the working people of this nation for too long.  \n\nFor the past 40 years we were told that if we gave tax breaks to those at the very top, the benefits would trickle down to everyone else. \n\nBut that trickle-down theory led to weaker economic growth, lower wages, bigger deficits, and the widest gap between those at the top and everyone else in nearly a century. \n\nVice President Harris and I ran for office with a new economic vision for America. \n\nInvest in America. Educate Americans. Grow the workforce. Build the economy from the bottom up  \nand the middle out, not from the top down.  \n\nBecause we know that when the middle class grows, the poor have a ladder up and the wealthy do very well. \n\nAmerica used to have the best roads, bridges, and airports on Earth. \n\nNow our infrastructure is ranked 13th in the world. \n\nWe won’t be able to compete for the jobs of the 21st Century if we don’t fix that. \n\nThat’s why it was so important to pass the Bipartisan Infrastructure Law—the most sweeping investment to rebuild America in history. \n\nThis was a bipartisan effort, and I want to thank the members of both parties who worked to make it happen. \n\nWe’re done talking about infrastructure weeks. \n\nWe’re going to have an infrastructure decade. \n\nIt is going to transform America and put us on a path to win the economic competition of the 21st Century that we face with the rest of the world—particularly with China.  \n\nAs I’ve told Xi Jinping, it is never a good bet to bet against the American people. \n\nWe’ll create good jobs for millions of Americans, modernizing roads, airports, ports, and waterways all across America. \n\nAnd we’ll do it all to withstand the devastating effects of the climate crisis and promote environmental justice. \n\nWe’ll build a national network of 500,000 electric vehicle charging stations, begin to replace poisonous lead pipes—so every child—and every American—has clean water to drink at home and at school, provide affordable high-speed internet for every American—urban, suburban, rural, and tribal communities. \n\n4,000 projects have already been announced. \n\nAnd tonight, I’m announcing that this year we will start fixing over 65,000 miles of highway and 1,500 bridges in disrepair. \n\nWhen we use taxpayer dollars to rebuild America – we are going to Buy American: buy American products to support American jobs. \n\nThe federal government spends about $600 Billion a year to keep the country safe and secure. \n\nThere’s been a law on the books for almost a century \nto make sure taxpayers’ dollars support American jobs and businesses. \n\nEvery Administration says they’ll do it, but we are actually doing it. \n\nWe will buy American to make sure everything from the deck of an aircraft carrier to the steel on highway guardrails are made in America. \n\nBut to compete for the best jobs of the future, we also need to level the playing field with China and other competitors. \n\nThat’s why it is so important to pass the Bipartisan Innovation Act sitting in Congress that will make record investments in emerging technologies and American manufacturing. \n\nLet me give you one example of why it’s so important to pass it. \n\nIf you travel 20 miles east of Columbus, Ohio, you’ll find 1,000 empty acres of land. \n\nIt won’t look like much, but if you stop and look closely, you’ll see a “Field of dreams,” the ground on which America’s future will be built. \n\nThis is where Intel, the American company that helped build Silicon Valley, is going to build its $20 billion semiconductor “mega site”. \n\nUp to eight state-of-the-art factories in one place. 10,000 new good-paying jobs. \n\nSome of the most sophisticated manufacturing in the world to make computer chips the size of a fingertip that power the world and our everyday lives. \n\nSmartphones. The Internet. Technology we have yet to invent. \n\nBut that’s just the beginning. \n\nIntel’s CEO, Pat Gelsinger, who is here tonight, told me they are ready to increase their investment from  \n$20 billion to $100 billion. \n\nThat would be one of the biggest investments in manufacturing in American history. \n\nAnd all they’re waiting for is for you to pass this bill. \n\nSo let’s not wait any longer. Send it to my desk. I’ll sign it.  \n\nAnd we will really take off. \n\nAnd Intel is not alone. \n\nThere’s something happening in America. \n\nJust look around and you’ll see an amazing story. \n\nThe rebirth of the pride that comes from stamping products “Made In America.” The revitalization of American manufacturing.   \n\nCompanies are choosing to build new factories here, when just a few years ago, they would have built them overseas. \n\nThat’s what is happening. Ford is investing $11 billion to build electric vehicles, creating 11,000 jobs across the country. \n\nGM is making the largest investment in its history—$7 billion to build electric vehicles, creating 4,000 jobs in Michigan. \n\nAll told, we created 369,000 new manufacturing jobs in America just last year. \n\nPowered by people I’ve met like JoJo Burgess, from generations of union steelworkers from Pittsburgh, who’s here with us tonight. \n\nAs Ohio Senator Sherrod Brown says, “It’s time to bury the label “Rust Belt.” \n\nIt’s time. \n\nBut with all the bright spots in our economy, record job growth and higher wages, too many families are struggling to keep up with the bills.  \n\nInflation is robbing them of the gains they might otherwise feel. \n\nI get it. That’s why my top priority is getting prices under control. \n\nLook, our economy roared back faster than most predicted, but the pandemic meant that businesses had a hard time hiring enough workers to keep up production in their factories. \n\nThe pandemic also disrupted global supply chains. \n\nWhen factories close, it takes longer to make goods and get them from the warehouse to the store, and prices go up. \n\nLook at cars. \n\nLast year, there weren’t enough semiconductors to make all the cars that people wanted to buy. \n\nAnd guess what, prices of automobiles went up. \n\nSo—we have a choice. \n\nOne way to fight inflation is to drive down wages and make Americans poorer.  \n\nI have a better plan to fight inflation. \n\nLower your costs, not your wages. \n\nMake more cars and semiconductors in America. \n\nMore infrastructure and innovation in America. \n\nMore goods moving faster and cheaper in America. \n\nMore jobs where you can earn a good living in America. \n\nAnd instead of relying on foreign supply chains, let’s make it in America. \n\nEconomists call it “increasing the productive capacity of our economy.” \n\nI call it building a better America. \n\nMy plan to fight inflation will lower your costs and lower the deficit. \n\n17 Nobel laureates in economics say my plan will ease long-term inflationary pressures. Top business leaders and most Americans support my plan. And here’s the plan: \n\nFirst – cut the cost of prescription drugs. Just look at insulin. One in ten Americans has diabetes. In Virginia, I met a 13-year-old boy named Joshua Davis.  \n\nHe and his Dad both have Type 1 diabetes, which means they need insulin every day. Insulin costs about $10 a vial to make.  \n\nBut drug companies charge families like Joshua and his Dad up to 30 times more. I spoke with Joshua’s mom. \n\nImagine what it’s like to look at your child who needs insulin and have no idea how you’re going to pay for it.  \n\nWhat it does to your dignity, your ability to look your child in the eye, to be the parent you expect to be. \n\nJoshua is here with us tonight. Yesterday was his birthday. Happy birthday, buddy.  \n\nFor Joshua, and for the 200,000 other young people with Type 1 diabetes, let’s cap the cost of insulin at $35 a month so everyone can afford it.  \n\nDrug companies will still do very well. And while we’re at it let Medicare negotiate lower prices for prescription drugs, like the VA already does. \n\nLook, the American Rescue Plan is helping millions of families on Affordable Care Act plans save $2,400 a year on their health care premiums. Let’s close the coverage gap and make those savings permanent. \n\nSecond – cut energy costs for families an average of $500 a year by combatting climate change.  \n\nLet’s provide investments and tax credits to weatherize your homes and businesses to be energy efficient and you get a tax credit; double America’s clean energy production in solar, wind, and so much more;  lower the price of electric vehicles, saving you another $80 a month because you’ll never have to pay at the gas pump again. \n\nThird – cut the cost of child care. Many families pay up to $14,000 a year for child care per child.  \n\nMiddle-class and working families shouldn’t have to pay more than 7% of their income for care of young children.  \n\nMy plan will cut the cost in half for most families and help parents, including millions of women, who left the workforce during the pandemic because they couldn’t afford child care, to be able to get back to work. \n\nMy plan doesn’t stop there. It also includes home and long-term care. More affordable housing. And Pre-K for every 3- and 4-year-old.  \n\nAll of these will lower costs. \n\nAnd under my plan, nobody earning less than $400,000 a year will pay an additional penny in new taxes. Nobody.  \n\nThe one thing all Americans agree on is that the tax system is not fair. We have to fix it.  \n\nI’m not looking to punish anyone. But let’s make sure corporations and the wealthiest Americans start paying their fair share. \n\nJust last year, 55 Fortune 500 corporations earned $40 billion in profits and paid zero dollars in federal income tax.  \n\nThat’s simply not fair. That’s why I’ve proposed a 15% minimum tax rate for corporations. \n\nWe got more than 130 countries to agree on a global minimum tax rate so companies can’t get out of paying their taxes at home by shipping jobs and factories overseas. \n\nThat’s why I’ve proposed closing loopholes so the very wealthy don’t pay a lower tax rate than a teacher or a firefighter.  \n\nSo that’s my plan. It will grow the economy and lower costs for families. \n\nSo what are we waiting for? Let’s get this done. And while you’re at it, confirm my nominees to the Federal Reserve, which plays a critical role in fighting inflation.  \n\nMy plan will not only lower costs to give families a fair shot, it will lower the deficit. \n\nThe previous Administration not only ballooned the deficit with tax cuts for the very wealthy and corporations, it undermined the watchdogs whose job was to keep pandemic relief funds from being wasted. \n\nBut in my administration, the watchdogs have been welcomed back. \n\nWe’re going after the criminals who stole billions in relief money meant for small businesses and millions of Americans.  \n\nAnd tonight, I’m announcing that the Justice Department will name a chief prosecutor for pandemic fraud. \n\nBy the end of this year, the deficit will be down to less than half what it was before I took office.  \n\nThe only president ever to cut the deficit by more than one trillion dollars in a single year. \n\nLowering your costs also means demanding more competition. \n\nI’m a capitalist, but capitalism without competition isn’t capitalism. \n\nIt’s exploitation—and it drives up prices. \n\nWhen corporations don’t have to compete, their profits go up, your prices go up, and small businesses and family farmers and ranchers go under. \n\nWe see it happening with ocean carriers moving goods in and out of America. \n\nDuring the pandemic, these foreign-owned companies raised prices by as much as 1,000% and made record profits. \n\nTonight, I’m announcing a crackdown on these companies overcharging American businesses and consumers. \n\nAnd as Wall Street firms take over more nursing homes, quality in those homes has gone down and costs have gone up.  \n\nThat ends on my watch. \n\nMedicare is going to set higher standards for nursing homes and make sure your loved ones get the care they deserve and expect. \n\nWe’ll also cut costs and keep the economy going strong by giving workers a fair shot, provide more training and apprenticeships, hire them based on their skills not degrees. \n\nLet’s pass the Paycheck Fairness Act and paid leave.  \n\nRaise the minimum wage to $15 an hour and extend the Child Tax Credit, so no one has to raise a family in poverty. \n\nLet’s increase Pell Grants and increase our historic support of HBCUs, and invest in what Jill—our First Lady who teaches full-time—calls America’s best-kept secret: community colleges. \n\nAnd let’s pass the PRO Act when a majority of workers want to form a union—they shouldn’t be stopped.  \n\nWhen we invest in our workers, when we build the economy from the bottom up and the middle out together, we can do something we haven’t done in a long time: build a better America. \n\nFor more than two years, COVID-19 has impacted every decision in our lives and the life of the nation. \n\nAnd I know you’re tired, frustrated, and exhausted. \n\nBut I also know this. \n\nBecause of the progress we’ve made, because of your resilience and the tools we have, tonight I can say  \nwe are moving forward safely, back to more normal routines.  \n\nWe’ve reached a new moment in the fight against COVID-19, with severe cases down to a level not seen since last July.  \n\nJust a few days ago, the Centers for Disease Control and Prevention—the CDC—issued new mask guidelines. \n\nUnder these new guidelines, most Americans in most of the country can now be mask free.   \n\nAnd based on the projections, more of the country will reach that point across the next couple of weeks. \n\nThanks to the progress we have made this past year, COVID-19 need no longer control our lives.  \n\nI know some are talking about “living with COVID-19”. Tonight – I say that we will never just accept living with COVID-19. \n\nWe will continue to combat the virus as we do other diseases. And because this is a virus that mutates and spreads, we will stay on guard. \n\nHere are four common sense steps as we move forward safely.  \n\nFirst, stay protected with vaccines and treatments. We know how incredibly effective vaccines are. If you’re vaccinated and boosted you have the highest degree of protection. \n\nWe will never give up on vaccinating more Americans. Now, I know parents with kids under 5 are eager to see a vaccine authorized for their children. \n\nThe scientists are working hard to get that done and we’ll be ready with plenty of vaccines when they do. \n\nWe’re also ready with anti-viral treatments. If you get COVID-19, the Pfizer pill reduces your chances of ending up in the hospital by 90%.  \n\nWe’ve ordered more of these pills than anyone in the world. And Pfizer is working overtime to get us 1 Million pills this month and more than double that next month.  \n\nAnd we’re launching the “Test to Treat” initiative so people can get tested at a pharmacy, and if they’re positive, receive antiviral pills on the spot at no cost.  \n\nIf you’re immunocompromised or have some other vulnerability, we have treatments and free high-quality masks. \n\nWe’re leaving no one behind or ignoring anyone’s needs as we move forward. \n\nAnd on testing, we have made hundreds of millions of tests available for you to order for free.   \n\nEven if you already ordered free tests tonight, I am announcing that you can order more from covidtests.gov starting next week. \n\nSecond – we must prepare for new variants. Over the past year, we’ve gotten much better at detecting new variants. \n\nIf necessary, we’ll be able to deploy new vaccines within 100 days instead of many more months or years.  \n\nAnd, if Congress provides the funds we need, we’ll have new stockpiles of tests, masks, and pills ready if needed. \n\nI cannot promise a new variant won’t come. But I can promise you we’ll do everything within our power to be ready if it does.  \n\nThird – we can end the shutdown of schools and businesses. We have the tools we need. \n\nIt’s time for Americans to get back to work and fill our great downtowns again.  People working from home can feel safe to begin to return to the office.   \n\nWe’re doing that here in the federal government. The vast majority of federal workers will once again work in person. \n\nOur schools are open. Let’s keep it that way. Our kids need to be in school. \n\nAnd with 75% of adult Americans fully vaccinated and hospitalizations down by 77%, most Americans can remove their masks, return to work, stay in the classroom, and move forward safely. \n\nWe achieved this because we provided free vaccines, treatments, tests, and masks. \n\nOf course, continuing this costs money. \n\nI will soon send Congress a request. \n\nThe vast majority of Americans have used these tools and may want to again, so I expect Congress to pass it quickly.   \n\nFourth, we will continue vaccinating the world.     \n\nWe’ve sent 475 Million vaccine doses to 112 countries, more than any other nation. \n\nAnd we won’t stop. \n\nWe have lost so much to COVID-19. Time with one another. And worst of all, so much loss of life. \n\nLet’s use this moment to reset. Let’s stop looking at COVID-19 as a partisan dividing line and see it for what it is: A God-awful disease.  \n\nLet’s stop seeing each other as enemies, and start seeing each other for who we really are: Fellow Americans.  \n\nWe can’t change how divided we’ve been. But we can change how we move forward—on COVID-19 and other issues we must face together. \n\nI recently visited the New York City Police Department days after the funerals of Officer Wilbert Mora and his partner, Officer Jason Rivera. \n\nThey were responding to a 9-1-1 call when a man shot and killed them with a stolen gun. \n\nOfficer Mora was 27 years old. \n\nOfficer Rivera was 22. \n\nBoth Dominican Americans who’d grown up on the same streets they later chose to patrol as police officers. \n\nI spoke with their families and told them that we are forever in debt for their sacrifice, and we will carry on their mission to restore the trust and safety every community deserves. \n\nI’ve worked on these issues a long time. \n\nI know what works: Investing in crime preventionand community police officers who’ll walk the beat, who’ll know the neighborhood, and who can restore trust and safety. \n\nSo let’s not abandon our streets. Or choose between safety and equal justice. \n\nLet’s come together to protect our communities, restore trust, and hold law enforcement accountable. \n\nThat’s why the Justice Department required body cameras, banned chokeholds, and restricted no-knock warrants for its officers. \n\nThat’s why the American Rescue Plan provided $350 Billion that cities, states, and counties can use to hire more police and invest in proven strategies like community violence interruption—trusted messengers breaking the cycle of violence and trauma and giving young people hope.  \n\nWe should all agree: The answer is not to Defund the police. The answer is to FUND the police with the resources and training they need to protect our communities. \n\nI ask Democrats and Republicans alike: Pass my budget and keep our neighborhoods safe.  \n\nAnd I will keep doing everything in my power to crack down on gun trafficking and ghost guns you can buy online and make at home—they have no serial numbers and can’t be traced. \n\nAnd I ask Congress to pass proven measures to reduce gun violence. Pass universal background checks. Why should anyone on a terrorist list be able to purchase a weapon? \n\nBan assault weapons and high-capacity magazines. \n\nRepeal the liability shield that makes gun manufacturers the only industry in America that can’t be sued. \n\nThese laws don’t infringe on the Second Amendment. They save lives. \n\nThe most fundamental right in America is the right to vote – and to have it counted. And it’s under assault. \n\nIn state after state, new laws have been passed, not only to suppress the vote, but to subvert entire elections. \n\nWe cannot let this happen. \n\nTonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \n\nTonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence. \n\nA former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since she’s been nominated, she’s received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans. \n\nAnd if we are to advance liberty and justice, we need to secure the Border and fix the immigration system. \n\nWe can do both. At our border, we’ve installed new technology like cutting-edge scanners to better detect drug smuggling.  \n\nWe’ve set up joint patrols with Mexico and Guatemala to catch more human traffickers.  \n\nWe’re putting in place dedicated immigration judges so families fleeing persecution and violence can have their cases heard faster. \n\nWe’re securing commitments and supporting partners in South and Central America to host more refugees and secure their own borders. \n\nWe can do all this while keeping lit the torch of liberty that has led generations of immigrants to this land—my forefathers and so many of yours. \n\nProvide a pathway to citizenship for Dreamers, those on temporary status, farm workers, and essential workers. \n\nRevise our laws so businesses have the workers they need and families don’t wait decades to reunite. \n\nIt’s not only the right thing to do—it’s the economically smart thing to do. \n\nThat’s why immigration reform is supported by everyone from labor unions to religious leaders to the U.S. Chamber of Commerce. \n\nLet’s get it done once and for all. \n\nAdvancing liberty and justice also requires protecting the rights of women. \n\nThe constitutional right affirmed in Roe v. Wade—standing precedent for half a century—is under attack as never before. \n\nIf we want to go forward—not backward—we must protect access to health care. Preserve a woman’s right to choose. And let’s continue to advance maternal health care in America. \n\nAnd for our LGBTQ+ Americans, let’s finally get the bipartisan Equality Act to my desk. The onslaught of state laws targeting transgender Americans and their families is wrong. \n\nAs I said last year, especially to our younger transgender Americans, I will always have your back as your President, so you can be yourself and reach your God-given potential. \n\nWhile it often appears that we never agree, that isn’t true. I signed 80 bipartisan bills into law last year. From preventing government shutdowns to protecting Asian-Americans from still-too-common hate crimes to reforming military justice. \n\nAnd soon, we’ll strengthen the Violence Against Women Act that I first wrote three decades ago. It is important for us to show the nation that we can come together and do big things. \n\nSo tonight I’m offering a Unity Agenda for the Nation. Four big things we can do together.  \n\nFirst, beat the opioid epidemic. \n\nThere is so much we can do. Increase funding for prevention, treatment, harm reduction, and recovery.  \n\nGet rid of outdated rules that stop doctors from prescribing treatments. And stop the flow of illicit drugs by working with state and local law enforcement to go after traffickers. \n\nIf you’re suffering from addiction, know you are not alone. I believe in recovery, and I celebrate the 23 million Americans in recovery. \n\nSecond, let’s take on mental health. Especially among our children, whose lives and education have been turned upside down.  \n\nThe American Rescue Plan gave schools money to hire teachers and help students make up for lost learning.  \n\nI urge every parent to make sure your school does just that. And we can all play a part—sign up to be a tutor or a mentor. \n\nChildren were also struggling before the pandemic. Bullying, violence, trauma, and the harms of social media. \n\nAs Frances Haugen, who is here with us tonight, has shown, we must hold social media platforms accountable for the national experiment they’re conducting on our children for profit. \n\nIt’s time to strengthen privacy protections, ban targeted advertising to children, demand tech companies stop collecting personal data on our children. \n\nAnd let’s get all Americans the mental health services they need. More people they can turn to for help, and full parity between physical and mental health care. \n\nThird, support our veterans. \n\nVeterans are the best of us. \n\nI’ve always believed that we have a sacred obligation to equip all those we send to war and care for them and their families when they come home. \n\nMy administration is providing assistance with job training and housing, and now helping lower-income veterans get VA care debt-free.  \n\nOur troops in Iraq and Afghanistan faced many dangers. \n\nOne was stationed at bases and breathing in toxic smoke from “burn pits” that incinerated wastes of war—medical and hazard material, jet fuel, and more. \n\nWhen they came home, many of the world’s fittest and best trained warriors were never the same. \n\nHeadaches. Numbness. Dizziness. \n\nA cancer that would put them in a flag-draped coffin. \n\nI know. \n\nOne of those soldiers was my son Major Beau Biden. \n\nWe don’t know for sure if a burn pit was the cause of his brain cancer, or the diseases of so many of our troops. \n\nBut I’m committed to finding out everything we can. \n\nCommitted to military families like Danielle Robinson from Ohio. \n\nThe widow of Sergeant First Class Heath Robinson.  \n\nHe was born a soldier. Army National Guard. Combat medic in Kosovo and Iraq. \n\nStationed near Baghdad, just yards from burn pits the size of football fields. \n\nHeath’s widow Danielle is here with us tonight. They loved going to Ohio State football games. He loved building Legos with their daughter. \n\nBut cancer from prolonged exposure to burn pits ravaged Heath’s lungs and body. \n\nDanielle says Heath was a fighter to the very end. \n\nHe didn’t know how to stop fighting, and neither did she. \n\nThrough her pain she found purpose to demand we do better. \n\nTonight, Danielle—we are. \n\nThe VA is pioneering new ways of linking toxic exposures to diseases, already helping more veterans get benefits. \n\nAnd tonight, I’m announcing we’re expanding eligibility to veterans suffering from nine respiratory cancers. \n\nI’m also calling on Congress: pass a law to make sure veterans devastated by toxic exposures in Iraq and Afghanistan finally get the benefits and comprehensive health care they deserve. \n\nAnd fourth, let’s end cancer as we know it. \n\nThis is personal to me and Jill, to Kamala, and to so many of you. \n\nCancer is the #2 cause of death in America–second only to heart disease. \n\nLast month, I announced our plan to supercharge  \nthe Cancer Moonshot that President Obama asked me to lead six years ago. \n\nOur goal is to cut the cancer death rate by at least 50% over the next 25 years, turn more cancers from death sentences into treatable diseases.  \n\nMore support for patients and families. \n\nTo get there, I call on Congress to fund ARPA-H, the Advanced Research Projects Agency for Health. \n\nIt’s based on DARPA—the Defense Department project that led to the Internet, GPS, and so much more.  \n\nARPA-H will have a singular purpose—to drive breakthroughs in cancer, Alzheimer’s, diabetes, and more. \n\nA unity agenda for the nation. \n\nWe can do this. \n\nMy fellow Americans—tonight , we have gathered in a sacred space—the citadel of our democracy. \n\nIn this Capitol, generation after generation, Americans have debated great questions amid great strife, and have done great things. \n\nWe have fought for freedom, expanded liberty, defeated totalitarianism and terror. \n\nAnd built the strongest, freest, and most prosperous nation the world has ever known. \n\nNow is the hour. \n\nOur moment of responsibility. \n\nOur test of resolve and conscience, of history itself. \n\nIt is in this moment that our character is formed. Our purpose is found. Our future is forged. \n\nWell I know this nation.  \n\nWe will meet the test. \n\nTo protect freedom and liberty, to expand fairness and opportunity. \n\nWe will save democracy. \n\nAs hard as these times have been, I am more optimistic about America today than I have been my whole life. \n\nBecause I see the future that is within our grasp. \n\nBecause I know there is simply nothing beyond our capacity. \n\nWe are the only nation on Earth that has always turned every crisis we have faced into an opportunity. \n\nThe only nation that can be defined by a single word: possibilities. \n\nSo on this night, in our 245th year as a nation, I have come to report on the State of the Union. \n\nAnd my report is this: the State of the Union is strong—because you, the American people, are strong. \n\nWe are stronger today than we were a year ago. \n\nAnd we will be stronger a year from now than we are today. \n\nNow is our moment to meet and overcome the challenges of our time. \n\nAnd we will, as one people. \n\nOne America. \n\nThe United States of America. \n\nMay God bless you all. May God protect our troops.', metadata={'source': '../../../../privateGPT/source_documents/state_of_the_union.txt'})

You can see we have loaded the state_of_the_union.txt file from the privateGPT repo. As this is the only file in that directory the length of loaded documents is one.

3.1.2 Splitting the documents into smaller chunks

Now we have seen how we can load multiple documents of different extensions using the load_documents function. The next step is to look at process_document function which loads and splits large documents into smaller chunks.

from langchain.text_splitter import RecursiveCharacterTextSplitter

chunk_size = 500
chunk_overlap = 50

def process_documents(source_dir: str, ignored_files: List[str] = []) -> List[Document]:
    """
    Load documents and split in chunks
    """
    print(f"Loading documents from {source_dir}")
    documents = load_documents(source_dir, ignored_files)
    if not documents:
        print("No new documents to load")
        exit(0)
    print(f"Loaded {len(documents)} new documents from {source_dir}")
    ## Load text splitter
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=chunk_size, chunk_overlap=chunk_overlap)
    ## Split text
    texts = text_splitter.split_documents(documents)
    print(f"Split into {len(texts)} chunks of text (max. {chunk_size} tokens each)")
    return texts

processed_documents = process_documents(git_dir+'source_documents')
Loading documents from ../../../../privateGPT/source_documents
Loading new documents: 100%|█████████████████████| 1/1 [00:00<00:00, 315.74it/s]
Loaded 1 new documents from ../../../../privateGPT/source_documents
Split into 90 chunks of text (max. 500 tokens each)

The process_documents function performs the following steps:

  1. Loads all the documents from the source_dir directory using the load_documents function.
  2. Initializes an instance of RecursiveCharacterTextSplitter from the langchain.text_splitter module, providing the chunk_size and chunk_overlap parameters. This class is responsible for splitting a list of documents into smaller overlapping chunks. [RecursiveCharacterTextSplitter documentation].
  3. Uses the split_documents method of the RecursiveCharacterTextSplitter instance to split the loaded documents into smaller chunks.
  4. Returns the resulting list of the smaller document chunks.

3.1.3 Initializing the embedding model

Next, we load our embedding module which converts the smaller document chunks from previous steps to embeddings.

from langchain.embeddings import HuggingFaceEmbeddings
EMBEDDINGS_MODEL_NAME = "all-MiniLM-L6-v2"
embeddings = HuggingFaceEmbeddings(model_name=EMBEDDINGS_MODEL_NAME)

print("Testing on a single query.")
embedded_vector = embeddings.embed_query("What is your name?")
print(f"Size of embedded vector: {len(embedded_vector)}")
Testing on a single query.
Size of embedded vector: 384

The given code snippet carries out the following steps:

  1. Imports the HuggingFaceEmbeddings function from the langchain.embeddings module. This function is responsible for loading and encapsulating the SentenceTransformers embeddings, which are used for generating dense vector representations of sentences. You can refer to the HuggingFaceEmbeddings documentation for more details.
  2. Loads the all-MiniLM-L6-v2 model from the sentence_transformers library. This model is specifically designed to map sentences and paragraphs into a 384-dimensional dense vector space. It is commonly utilized for tasks such as semantic search and similarity analysis.

We can see that our embedded vector on a sample query returns a 384 dimension vector.

3.1.4 Embed smaller text and save it in the vector database

The next step involves utilizing the document chunks and the embedding model to store the documents and their corresponding embeddings in a vector database.

from chromadb.config import Settings
from langchain.vectorstores import Chroma

PERSIST_DIRECTORY= git_dir+"db"
# Define the Chroma settings
CHROMA_SETTINGS = Settings(
        chroma_db_impl='duckdb+parquet',
        persist_directory=PERSIST_DIRECTORY,
        anonymized_telemetry=False
)
## Create the embedding database
db = Chroma.from_documents(processed_documents, embeddings, persist_directory=PERSIST_DIRECTORY, client_settings=CHROMA_SETTINGS)
db.persist()
Using embedded DuckDB with persistence: data will be stored in: ../../../../privateGPT/db

The given code snippet performs the following operations:

  1. It imports the Settings class from the chromadb.config module and the Chroma class from the langchain.vectorstores module.
  2. It creates an instance of the Settings class named CHROMA_SETTINGS, providing several configuration parameters:
    • chroma_db_impl is set to 'duckdb+parquet', specifying the implementation to be used for the Chroma vector database.
    • persist_directory is set to the PERSIST_DIRECTORY variable defined earlier, indicating the directory where the vector database will be saved.
    • anonymized_telemetry is set to False, indicating whether anonymized telemetry data should be collected.
  3. It creates a vector database by calling the Chroma.from_documents() method. This method takes the following arguments:
    • processed_documents: The list of processed documents obtained from the previous step.
    • embeddings: The embeddings object/model used to generate the document embeddings.
    • persist_directory: The directory where the vector database will be persisted, specified by the PERSIST_DIRECTORY variable.
    • client_settings: The settings object (CHROMA_SETTINGS) containing configuration parameters for the vector database.
  4. We use db.persist() to store the index for future retrieval task
## Test the semantic retrieval 
db.similarity_search(query="What is the American Rescue Plan?", k= 4)
[Document(page_content='The American Rescue Plan gave schools money to hire teachers and help students make up for lost learning.  \n\nI urge every parent to make sure your school does just that. And we can all play a part—sign up to be a tutor or a mentor. \n\nChildren were also struggling before the pandemic. Bullying, violence, trauma, and the harms of social media.', metadata={'source': '../../../../privateGPT/source_documents/state_of_the_union.txt'}),
 Document(page_content='It fueled our efforts to vaccinate the nation and combat COVID-19. It delivered immediate economic relief for tens of millions of Americans.  \n\nHelped put food on their table, keep a roof over their heads, and cut the cost of health insurance. \n\nAnd as my Dad used to say, it gave people a little breathing room. \n\nAnd unlike the $2 Trillion tax cut passed in the previous administration that benefitted the top 1% of Americans, the American Rescue Plan helped working people—and left no one behind.', metadata={'source': '../../../../privateGPT/source_documents/state_of_the_union.txt'}),
 Document(page_content='Look, the American Rescue Plan is helping millions of families on Affordable Care Act plans save $2,400 a year on their health care premiums. Let’s close the coverage gap and make those savings permanent. \n\nSecond – cut energy costs for families an average of $500 a year by combatting climate change.', metadata={'source': '../../../../privateGPT/source_documents/state_of_the_union.txt'}),
 Document(page_content='That’s why the Justice Department required body cameras, banned chokeholds, and restricted no-knock warrants for its officers. \n\nThat’s why the American Rescue Plan provided $350 Billion that cities, states, and counties can use to hire more police and invest in proven strategies like community violence interruption—trusted messengers breaking the cycle of violence and trauma and giving young people hope.', metadata={'source': '../../../../privateGPT/source_documents/state_of_the_union.txt'})]
db = None

To test the retrieval of semantic similarity, we can use the similarity_search function. similarity_search function takes a text query as input and returns the top k=4 document chunks from the vector database.

3.2 Question & Answer Interface

Let’s explore the Q&A interface in more detail. The Q&A interface consists of the following steps:

  1. Load the vector database and prepare it for the retrieval task.
  2. Load a pre-trained Large language model from LlamaCpp or GPT4ALL.
  3. Prompt the user with a query and generate a response using the RetrievalQA pipeline from langchain.chains.
Fig.6: Question Answering Pipeline

Let’s look at these steps one by one.

3.2.1 Load the vector database

First, we import the required libraries.

from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import Chroma
from chromadb.config import Settings
git_dir = "../../../../privateGPT/"
PERSIST_DIRECTORY= git_dir+"db"
EMBEDDINGS_MODEL_NAME = "all-MiniLM-L6-v2"

# Define the Chroma settings
CHROMA_SETTINGS = Settings(
        chroma_db_impl='duckdb+parquet',
        persist_directory=PERSIST_DIRECTORY,
        anonymized_telemetry=False
)

embeddings = HuggingFaceEmbeddings(model_name=EMBEDDINGS_MODEL_NAME)
db = Chroma(persist_directory=PERSIST_DIRECTORY, embedding_function=embeddings, client_settings=CHROMA_SETTINGS)
retriever = db.as_retriever()
Using embedded DuckDB with persistence: data will be stored in: ../../../../privateGPT/db

The given code snippet carries out the following steps:

  1. Loads the embeddings using the HuggingFaceEmbeddings function, which was previously used to create the embedding store.
  2. Instantiates a Chroma vector database that was created earlier.
  3. Sets the vector database in retrieval mode.
## Testing retriever
retriever.vectorstore.similarity_search(query = "What is Amercian rescue plan?")
[Document(page_content='The American Rescue Plan gave schools money to hire teachers and help students make up for lost learning.  \n\nI urge every parent to make sure your school does just that. And we can all play a part—sign up to be a tutor or a mentor. \n\nChildren were also struggling before the pandemic. Bullying, violence, trauma, and the harms of social media.', metadata={'source': '../../../../privateGPT/source_documents/state_of_the_union.txt'}),
 Document(page_content='It fueled our efforts to vaccinate the nation and combat COVID-19. It delivered immediate economic relief for tens of millions of Americans.  \n\nHelped put food on their table, keep a roof over their heads, and cut the cost of health insurance. \n\nAnd as my Dad used to say, it gave people a little breathing room. \n\nAnd unlike the $2 Trillion tax cut passed in the previous administration that benefitted the top 1% of Americans, the American Rescue Plan helped working people—and left no one behind.', metadata={'source': '../../../../privateGPT/source_documents/state_of_the_union.txt'}),
 Document(page_content='That’s why the Justice Department required body cameras, banned chokeholds, and restricted no-knock warrants for its officers. \n\nThat’s why the American Rescue Plan provided $350 Billion that cities, states, and counties can use to hire more police and invest in proven strategies like community violence interruption—trusted messengers breaking the cycle of violence and trauma and giving young people hope.', metadata={'source': '../../../../privateGPT/source_documents/state_of_the_union.txt'}),
 Document(page_content='Look, the American Rescue Plan is helping millions of families on Affordable Care Act plans save $2,400 a year on their health care premiums. Let’s close the coverage gap and make those savings permanent. \n\nSecond – cut energy costs for families an average of $500 a year by combatting climate change.', metadata={'source': '../../../../privateGPT/source_documents/state_of_the_union.txt'})]

3.2.2 Load a pre-trained Large language model.

from langchain.llms import GPT4All

MODEL_PATH = git_dir+"models/ggml-gpt4all-j-v1.3-groovy.bin" 
MODEL_N_CTX=1000

# Prepare the LLM
llm = GPT4All(model=MODEL_PATH, n_ctx=MODEL_N_CTX, backend='gptj', callbacks=None, verbose=False)
gptj_model_load: loading model from '../../../../privateGPT/models/ggml-gpt4all-j-v1.3-groovy.bin' - please wait ...
gptj_model_load: n_vocab = 50400
gptj_model_load: n_ctx   = 2048
gptj_model_load: n_embd  = 4096
gptj_model_load: n_head  = 16
gptj_model_load: n_layer = 28
gptj_model_load: n_rot   = 64
gptj_model_load: f16     = 2
gptj_model_load: ggml ctx size = 4505.45 MB
gptj_model_load: memory_size =   896.00 MB, n_mem = 57344
gptj_model_load: ................................... done
gptj_model_load: model size =  3609.38 MB / num tensors = 285

The code snippet above create an instance of the GPT4All class named llm, which represents the Language Model (LLM) using the GPT-4All model. The constructor of GPT4All takes the following arguments:
- model: The path to the GPT-4All model file specified by the MODEL_PATH variable.
- n_ctx: The context size or maximum length of input sequences specified by the MODEL_N_CTX variable.
- backend: The backend to use for the LLM. In this case, it is set to ‘gptj’.
- callbacks: The callbacks to be used during the LLM execution. In this case, it is set to None.
- verbose: A boolean flag indicating whether to print verbose output during LLM execution. In this case, it is set to False.

3.2.3 Prompt the user with a query and generate a response

from langchain.chains import RetrievalQA
qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=retriever, return_source_documents=True)
query = "What is American rescue plan?"
res = qa(query)
answer, docs = res['result'], res['source_documents']

# Get the answer from the chain
# Print the result
print("\n\n> Question:")
print(query)
print("\n> Answer:")
print(answer)

# Print the relevant sources used for the answer
for document in docs:
    print("\n> " + document.metadata["source"] + ":")
    print(document.page_content)


> Question:
What is American rescue plan?

> Answer:
 The American Rescue Plan is a program that provides funding to schools to hire teachers and help students make up for lost learning due to the COVID-19 pandemic. It also provides economic relief for tens of millions of Americans by helping them put food on their table, keep a roof over their heads, and cut the cost of health insurance. The plan also helps working people by providing breathing room and giving them a little breathing room. It is a program that helps millions of families on Affordable Care Act plans save $2,400 a year on their health care premiums and combat climate change by cutting energy costs for families an average of $500 a year.

> ../../../../privateGPT/source_documents/state_of_the_union.txt:
The American Rescue Plan gave schools money to hire teachers and help students make up for lost learning.  

I urge every parent to make sure your school does just that. And we can all play a part—sign up to be a tutor or a mentor. 

Children were also struggling before the pandemic. Bullying, violence, trauma, and the harms of social media.

> ../../../../privateGPT/source_documents/state_of_the_union.txt:
It fueled our efforts to vaccinate the nation and combat COVID-19. It delivered immediate economic relief for tens of millions of Americans.  

Helped put food on their table, keep a roof over their heads, and cut the cost of health insurance. 

And as my Dad used to say, it gave people a little breathing room. 

And unlike the $2 Trillion tax cut passed in the previous administration that benefitted the top 1% of Americans, the American Rescue Plan helped working people—and left no one behind.

> ../../../../privateGPT/source_documents/state_of_the_union.txt:
Look, the American Rescue Plan is helping millions of families on Affordable Care Act plans save $2,400 a year on their health care premiums. Let’s close the coverage gap and make those savings permanent. 

Second – cut energy costs for families an average of $500 a year by combatting climate change.

> ../../../../privateGPT/source_documents/state_of_the_union.txt:
That’s why the Justice Department required body cameras, banned chokeholds, and restricted no-knock warrants for its officers. 

That’s why the American Rescue Plan provided $350 Billion that cities, states, and counties can use to hire more police and invest in proven strategies like community violence interruption—trusted messengers breaking the cycle of violence and trauma and giving young people hope.

Firstly, an instance of the RetrievalQA class named qa is created using the from_chain_type method. The RetrievalQA class is a chain specifically designed for question-answering tasks over an index. Please refer to the documentation for further details. The from_chain_type method takes the following arguments:

  • llm: The Language Model instance (llm) that was created previously.
  • chain_type: A string representing the type of chain to be used. In this case, it is set to "stuff". There may be other available chain types specific to the question-answering scenario. Please consult the documentation for more information.
  • retriever: An instance of a Chroma database used to retrieve relevant documents for the given query.
  • return_source_documents: A boolean flag indicating whether to return the source documents along with the answer. In this case, it is set to True.

Next, the qa instance is used to process a query. The Language Model (LLM) within the qa instance generates a response that includes the query, the answer, and the source documents used as context for generating the answer.

Finally, the answer and source documents are printed out for display.

3.3 Conclusion

In this blog post, we explored privateGPT, its implementation, and the code walkthrough for its ingestion pipeline and q&A interface. I hope this blog post has been valuable in understanding privateGPT and its implementation. I recommend my readers to try privateGPT on your own knowledge base.

I hope you enjoyed reading it. If there is any feedback on the code or just the blog post, feel free to comment below or reach out on LinkedIn.