Parquet Downloads
KG Registry data in an efficient columnar format

Download KG Registry Data as Parquet Files

Parquet is a columnar storage file format designed for efficient use with big data processing frameworks. Query engines such as DuckDB and Apache Spark can query these files directly without loading the entire dataset into memory. Pandas reads a file into memory, but the read can be restricted to the columns you need.

Available Files
File | Description | Download
resources.parquet | Main resources table containing all knowledge graph registry entries | Download
resource_domains.parquet | Resource-domain relationships for better querying by domain | Download
resource_products.parquet | Products associated with each resource | Download
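
For scripted or repeated downloads, the three files can be fetched with the Python standard library. This is a sketch under a loud assumption: `BASE_URL` is a placeholder, not the registry's real download location, so replace it with the address behind the Download links. The names `build_url` and `download_all` are likewise illustrative.

```python
import shutil
from pathlib import Path
from urllib.parse import urljoin
from urllib.request import urlopen

# The three files listed on this page.
PARQUET_FILES = [
    "resources.parquet",
    "resource_domains.parquet",
    "resource_products.parquet",
]

# ASSUMPTION: placeholder URL, not the registry's actual download location.
BASE_URL = "https://example.org/kg-registry/parquet/"


def build_url(base_url, filename):
    """Join the base URL and a file name, tolerating a missing trailing slash."""
    return urljoin(base_url.rstrip("/") + "/", filename)


def download_all(base_url=BASE_URL, dest_dir="data"):
    """Download every listed file into dest_dir and return the local paths."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    saved = []
    for name in PARQUET_FILES:
        target = dest / name
        # Stream the response to disk rather than holding it in memory.
        with urlopen(build_url(base_url, name)) as response, open(target, "wb") as out:
            shutil.copyfileobj(response, out)
        saved.append(target)
    return saved
```

Calling `download_all()` after setting the real base URL leaves the files in a local `data` directory, where the query examples on this page can read them by path.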

Using Parquet Files

These Parquet files can be queried using various tools:

Python (pandas):

import pandas as pd

# Read Parquet file into DataFrame
resources_df = pd.read_parquet('resources.parquet')

# Query data
active_resources = resources_df[resources_df['activity_status'] == 'active']
print(f"Total active resources: {len(active_resources)}")

Python (DuckDB):

import duckdb

# Connect to in-memory database
conn = duckdb.connect(':memory:')

# Query directly from Parquet files
result = conn.execute("""
SELECT category, COUNT(*) as count
FROM 'resources.parquet'
GROUP BY category
ORDER BY count DESC
""").fetchall()

for category, count in result:
    print(f"{category}: {count} resources")

R (arrow):

library(arrow)

# Read Parquet file
resources <- read_parquet("resources.parquet")

# Explore data
summary(resources)

# Filter active resources (which() drops rows where activity_status is NA)
active <- resources[which(resources$activity_status == "active"), ]
print(paste("Total active resources:", nrow(active)))
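
The examples above each read a single file. Since resource_domains.parquet holds resource-domain relationships, a natural next step is to join it to the main table. The sketch below does this in pandas. The join and grouping column names (`id`, `resource_id`, `domain`) are assumptions, not taken from the published schema, so check the actual columns before using it. Only `activity_status` comes from the examples on this page, and the function name is illustrative.

```python
import pandas as pd


def active_resources_per_domain(resources, resource_domains,
                                id_col="id",
                                link_col="resource_id",
                                domain_col="domain"):
    """Count active resources in each domain, largest count first.

    ASSUMPTION: id_col, link_col and domain_col are hypothetical column
    names; pass the real ones from the downloaded files.
    """
    # Keep only rows marked active, as in the single-file example.
    active = resources[resources["activity_status"] == "active"]
    # Inner join: one output row per (active resource, domain) pair.
    joined = active.merge(resource_domains, left_on=id_col, right_on=link_col)
    return joined.groupby(domain_col).size().sort_values(ascending=False)
```

With the downloaded files, this could be called as `active_resources_per_domain(pd.read_parquet('resources.parquet'), pd.read_parquet('resource_domains.parquet'))`, passing different column names if the schema differs.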

For more information about using these Parquet files with the KG Registry, see our Parquet backend documentation.

Try the Advanced Search to query these files directly in your browser using SQL.