Entity Relationship Extraction
This guide explains the default implementation of entity relationship extraction.
The component can be customized in several ways, up to and including full replacement by an implementation that follows the same protocol.
In [1]:
import os
from dotenv import load_dotenv
# Load environment variables from .env file
load_dotenv()
Out[1]:
True
Load Sample TextUnits DataFrame
In [2]:
import pandas as pd
df_text_units = pd.read_parquet("sample-data/base_text_units.parquet")
# let's work only with a subset of the data
# for this guide to avoid any unnecessary LLM cost
df_text_units = df_text_units[0:3]
df_text_units.head()
Out[2]:
| | id | document_id | text_unit |
|---|---|---|---|
| 0 | f28e49bc-5b67-46b3-b971-6d6cb2832790 | a0192baf-d76a-40d4-bcd3-437127eef568 | A CHRISTMAS CAROL\n\n [Illustration: _"How... |
| 1 | 6fae26d7-9b26-4f79-ac78-970e69fcab95 | a0192baf-d76a-40d4-bcd3-437127eef568 | at the grindstone, Scrooge! a\nsqueezing, wre... |
| 2 | c93ae0c0-c8c3-49a9-beb0-a1e3b74efa0a | a0192baf-d76a-40d4-bcd3-437127eef568 | dismal? What reason have you to be morose? You... |
The default implementation
In [3]:
from langchain_graphrag.indexing.graph_generation import EntityRelationshipExtractor
We first need to create an LLM to pass to EntityRelationshipExtractor.
In [4]:
from langchain_openai import ChatOpenAI
from langchain_community.cache import SQLiteCache
openai_api_key = os.getenv("LANGCHAIN_GRAPHRAG_OPENAI_CHAT_API_KEY", None)
if openai_api_key is None:
    raise ValueError("Please set the LANGCHAIN_GRAPHRAG_OPENAI_CHAT_API_KEY environment variable")
er_llm = ChatOpenAI(
    model="gpt-4o-mini",
    temperature=0.0,
    api_key=openai_api_key,
    cache=SQLiteCache("openai_cache.db"),  # caching LLM responses is always a good idea
)
# A static method is provided to build the default extractor
extractor = EntityRelationshipExtractor.build_default(llm=er_llm)
We now run the extractor on the DataFrame.
In [5]:
text_unit_graphs = extractor.invoke(df_text_units)
Extracting entities and relationships ...: 100%|██████████| 3/3 [00:00<00:00, 20.16it/s]
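The progress bar reports about 20 text units per second, which is much faster than live LLM calls usually are; that speed is consistent with responses being served from the SQLiteCache configured earlier. The sketch below illustrates the idea of a prompt cache using only the standard library. It is not how LangChain implements its cache: `make_cached` and `fake_llm` are hypothetical names, and the real cache also keys on the model configuration, which this sketch omits.

```python
import sqlite3


def make_cached(llm_fn, conn):
    """Wrap llm_fn so that repeated prompts are answered from a SQLite table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cache (prompt TEXT PRIMARY KEY, response TEXT)"
    )

    def cached(prompt):
        row = conn.execute(
            "SELECT response FROM cache WHERE prompt = ?", (prompt,)
        ).fetchone()
        if row is not None:
            return row[0]  # cache hit: no call to the model
        response = llm_fn(prompt)
        conn.execute(
            "INSERT INTO cache (prompt, response) VALUES (?, ?)", (prompt, response)
        )
        conn.commit()
        return response

    return cached


calls = []  # records every prompt that actually reaches the "model"


def fake_llm(prompt):
    # Hypothetical stand-in for a paid LLM call.
    calls.append(prompt)
    return prompt.upper()


conn = sqlite3.connect(":memory:")
ask = make_cached(fake_llm, conn)
first = ask("extract entities from: Scrooge met Marley")
second = ask("extract entities from: Scrooge met Marley")
print(first == second, len(calls))  # True 1
```

The second call returns the stored response without invoking the model, which is why re-running a notebook like this one against an existing cache file costs nothing extra.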
Let's see how many nodes and edges we got for each text unit.
In [6]:
for index, g in enumerate(text_unit_graphs):
    print("---------------------------------")
    print(f"Graph: {index}")
    print(f"Number of nodes - {len(g.nodes)}")
    print(f"Number of edges - {len(g.edges)}")
    print(g.nodes())
    print(g.edges())
    print("---------------------------------")
---------------------------------
Graph: 0
Number of nodes - 16
Number of edges - 9
['A CHRISTMAS CAROL', 'CHARLES DICKENS', 'EBENEZER SCROOGE', 'MARLEY', 'BOB CRATCHIT', 'TIM CRATCHIT', 'MR. FEZZIWIG', 'FRED', 'GHOST OF CHRISTMAS PAST', 'GHOST OF CHRISTMAS PRESENT', 'GHOST OF CHRISTMAS YET TO COME', 'JACOB MARLEY', 'MRS. CRATCHIT', 'BELLE', 'DICK WILKINS', 'MRS. FEZZIWIG']
[('EBENEZER SCROOGE', 'MARLEY'), ('EBENEZER SCROOGE', 'FRED'), ('EBENEZER SCROOGE', 'BOB CRATCHIT'), ('EBENEZER SCROOGE', 'GHOST OF CHRISTMAS PAST'), ('EBENEZER SCROOGE', 'GHOST OF CHRISTMAS PRESENT'), ('EBENEZER SCROOGE', 'GHOST OF CHRISTMAS YET TO COME'), ('EBENEZER SCROOGE', 'MR. FEZZIWIG'), ('EBENEZER SCROOGE', 'BELLE'), ('BOB CRATCHIT', 'TIM CRATCHIT')]
---------------------------------
---------------------------------
Graph: 1
Number of nodes - 4
Number of edges - 4
['SCROOGE', "SCROOGE'S NEPHEW", 'CHRISTMAS', 'COUNTING-HOUSE']
[('SCROOGE', "SCROOGE'S NEPHEW"), ('SCROOGE', 'CHRISTMAS'), ('SCROOGE', 'COUNTING-HOUSE'), ("SCROOGE'S NEPHEW", 'CHRISTMAS')]
---------------------------------
---------------------------------
Graph: 2
Number of nodes - 5
Number of edges - 5
['SCROOGE', "SCROOGE'S NEPHEW", 'CHRISTMAS', 'MARLEY', 'CLERK']
[('SCROOGE', "SCROOGE'S NEPHEW"), ('SCROOGE', 'CHRISTMAS'), ('SCROOGE', 'MARLEY'), ('SCROOGE', 'CLERK'), ("SCROOGE'S NEPHEW", 'CHRISTMAS')]
---------------------------------
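Each text unit produces its own graph, so the same entity can show up in several of them. As a small illustration, the sketch below counts how many graphs each node name appears in. To stay self-contained it copies the node lists from the output above into a plain list; with the real objects, `nodes_per_graph = [list(g.nodes) for g in text_unit_graphs]` should give the same input.

```python
from collections import Counter

# Node names copied from the printed output above, one list per text unit graph.
nodes_per_graph = [
    ["A CHRISTMAS CAROL", "CHARLES DICKENS", "EBENEZER SCROOGE", "MARLEY",
     "BOB CRATCHIT", "TIM CRATCHIT", "MR. FEZZIWIG", "FRED",
     "GHOST OF CHRISTMAS PAST", "GHOST OF CHRISTMAS PRESENT",
     "GHOST OF CHRISTMAS YET TO COME", "JACOB MARLEY", "MRS. CRATCHIT",
     "BELLE", "DICK WILKINS", "MRS. FEZZIWIG"],
    ["SCROOGE", "SCROOGE'S NEPHEW", "CHRISTMAS", "COUNTING-HOUSE"],
    ["SCROOGE", "SCROOGE'S NEPHEW", "CHRISTMAS", "MARLEY", "CLERK"],
]

# Count each name once per graph, then keep names seen in more than one graph.
counts = Counter(name for nodes in nodes_per_graph for name in set(nodes))
shared = sorted(name for name, n in counts.items() if n > 1)
print(shared)
# ['CHRISTMAS', 'MARLEY', 'SCROOGE', "SCROOGE'S NEPHEW"]
```

Note that the match is on exact names: 'EBENEZER SCROOGE' in graph 0 and 'SCROOGE' in graphs 1 and 2 are counted as different nodes, as are 'MARLEY' and 'JACOB MARLEY'.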
Let's look at the data attached to some nodes and edges.
In [7]:
# You will see that every node has `type`, `description` and `text_unit_ids` as attributes
text_unit_graphs[0].nodes["EBENEZER SCROOGE"]
Out[7]:
{'type': 'PERSON',
'description': ['Ebenezer Scrooge is the main character in A Christmas Carol, depicted as a miserly old man who undergoes a profound transformation.'],
'text_unit_ids': ['f28e49bc-5b67-46b3-b971-6d6cb2832790']}
In [8]:
# You will see that every edge has `weight`, `description` and `text_unit_ids` as attributes
text_unit_graphs[0].edges[('EBENEZER SCROOGE', 'MARLEY')]
Out[8]:
{'weight': 2.0,
'description': ["Marley is the ghost of Scrooge's former business partner, who warns him about his selfish ways and the consequences of his actions",
'Marley warns Scrooge about the chains he will wear if he does not change his ways, establishing a direct connection between their fates'],
'text_unit_ids': ['f28e49bc-5b67-46b3-b971-6d6cb2832790']}
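This edge carries a weight of 2.0 and two descriptions from a single text unit. The guide does not show how the extractor arrives at these values; one plausible reading is that the same pair was mentioned twice and the mentions were merged into one edge. The sketch below illustrates that kind of accumulation with a plain dictionary standing in for the graph. It is an assumption for illustration, not the library's code: `add_mention` is a hypothetical helper, the per-mention strength of 1.0 is assumed, and the descriptions are shortened from Out[8].

```python
def add_mention(edges, source, target, description, text_unit_id, strength=1.0):
    """Record one relationship mention, merging it into an existing edge if present."""
    key = tuple(sorted((source, target)))  # undirected: (A, B) and (B, A) share a key
    attrs = edges.setdefault(
        key, {"weight": 0.0, "description": [], "text_unit_ids": []}
    )
    attrs["weight"] += strength  # assumed: weights of repeated mentions add up
    attrs["description"].append(description)
    if text_unit_id not in attrs["text_unit_ids"]:
        attrs["text_unit_ids"].append(text_unit_id)


edges = {}
unit_id = "f28e49bc-5b67-46b3-b971-6d6cb2832790"
add_mention(edges, "EBENEZER SCROOGE", "MARLEY",
            "Marley is the ghost of Scrooge's former business partner", unit_id)
add_mention(edges, "MARLEY", "EBENEZER SCROOGE",
            "Marley warns Scrooge about the chains he will wear", unit_id)

edge = edges[("EBENEZER SCROOGE", "MARLEY")]
print(edge["weight"], len(edge["description"]), edge["text_unit_ids"])
# 2.0 2 ['f28e49bc-5b67-46b3-b971-6d6cb2832790']
```

Under these assumptions the merged edge has the same shape as the output above: one summed weight, a list of descriptions, and a de-duplicated list of text unit ids.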