36 changes: 31 additions & 5 deletions AGENTS.md
@@ -26,7 +26,8 @@ pathrex/
│ │ └── inmemory.rs # InMemory marker, InMemoryBuilder, InMemoryGraph
│ └── formats/
│ ├── mod.rs # FormatError enum, re-exports
│ └── csv.rs # Csv<R> — CSV → Edge iterator (CsvConfig, ColumnSpec)
│ ├── csv.rs # Csv<R> — CSV → Edge iterator (CsvConfig, ColumnSpec)
│ └── nt.rs # NTriples<R> — N-Triples → Edge iterator (LabelExtraction)
├── tests/
│ └── inmemory_tests.rs # Integration tests for InMemoryBuilder / InMemoryGraph
├── deps/
@@ -187,9 +188,13 @@ into a single graph.

### Format parsers

[`Csv<R>`](src/formats/csv.rs:52) is the only built-in parser. It yields
`Iterator<Item = Result<Edge, FormatError>>` and is directly pluggable into
`GraphBuilder::load()` via its `GraphSource<InMemoryBuilder>` impl.
Two built-in parsers are available, both yielding
`Iterator<Item = Result<Edge, FormatError>>` and pluggable into
`GraphBuilder::load()` via their `GraphSource<InMemoryBuilder>` impls.

#### `Csv<R>`

[`Csv<R>`](src/formats/csv.rs:52) parses delimiter-separated edge files.

Configuration is via [`CsvConfig`](src/formats/csv.rs:17):

@@ -204,6 +209,27 @@ Configuration is via [`CsvConfig`](src/formats/csv.rs:17):
[`ColumnSpec`](src/formats/csv.rs:11) is either `Index(usize)` or `Name(String)`.
Name-based lookup requires `has_header: true`.

#### `NTriples<R>`

[`NTriples<R>`](src/formats/nt.rs:57) parses [W3C N-Triples](https://www.w3.org/TR/n-triples/)
RDF files using `oxttl`. Each triple `(subject, predicate, object)` becomes an
[`Edge`](src/graph/mod.rs:154) where:

- `source` — subject IRI or blank-node ID (`_:label`).
- `target` — object IRI or blank-node ID; triples whose object is an RDF
literal yield `Err(FormatError::LiteralAsNode)` (callers may filter these out).
- `label` — predicate IRI, transformed by [`LabelExtraction`](src/formats/nt.rs:36):

| Variant | Behaviour |
|---|---|
| `LocalName` (default) | Fragment (`#name`) or last path segment of the predicate IRI |
| `FullIri` | Full predicate IRI string |

Constructors:

- [`NTriples::new(reader)`](src/formats/nt.rs:72) — uses `LabelExtraction::LocalName`.
- [`NTriples::with_label_extraction(reader, strategy)`](src/formats/nt.rs:76) — explicit strategy.
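
The label rules in the table above can be sketched in isolation. The block below is a minimal, standalone approximation: it redefines a stand-in `LabelExtraction` enum and an `extract_label` helper so that it compiles without `pathrex`, and it is meant to mirror the behaviour described here rather than reproduce the crate's code.

```rust
/// Stand-in for the crate's `LabelExtraction`, redefined so the sketch is self-contained.
#[derive(Debug, Clone, Default)]
enum LabelExtraction {
    #[default]
    LocalName,
    FullIri,
}

/// Fragment (`#name`) takes priority, then the last path segment,
/// and an IRI with neither separator is returned unchanged.
fn extract_label(iri: &str, strategy: &LabelExtraction) -> String {
    match strategy {
        LabelExtraction::FullIri => iri.to_owned(),
        LabelExtraction::LocalName => match iri.rfind('#').or_else(|| iri.rfind('/')) {
            Some(pos) => iri[pos + 1..].to_owned(),
            None => iri.to_owned(),
        },
    }
}

fn main() {
    let local = LabelExtraction::default(); // LocalName
    for iri in [
        "http://example.org/ns#knows",
        "http://example.org/ns/knows",
        "urn:knows",
    ] {
        println!("{iri} -> {:?}", extract_label(iri, &local));
    }
    println!(
        "{}",
        extract_label("http://example.org/ns/knows", &LabelExtraction::FullIri)
    );
}
```

One consequence of this rule is that a predicate IRI ending in `/` or `#` produces an empty label under `LocalName`, so `FullIri` may be the safer choice for vocabularies with such IRIs.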

### FFI layer

[`lagraph_sys`](src/lagraph_sys.rs) exposes raw C bindings for GraphBLAS and
@@ -254,7 +280,7 @@ Tests in `src/graph/mod.rs` use `CountingBuilder` / `CountOutput` / `VecSource`
[`src/utils.rs`](src/utils.rs) — these do **not** call into GraphBLAS and run without
native libraries.

Tests in `src/formats/csv.rs` are pure Rust and need no native dependencies.
Tests in `src/formats/csv.rs` and `src/formats/nt.rs` are pure Rust and need no native dependencies.

Tests in `src/graph/inmemory.rs` and [`tests/inmemory_tests.rs`](tests/inmemory_tests.rs)
call real GraphBLAS/LAGraph and require the native libraries to be present.
18 changes: 17 additions & 1 deletion src/formats/mod.rs
@@ -4,18 +4,25 @@
//!
//! ```no_run
//! use pathrex::graph::{Graph, InMemory, GraphDecomposition};
//! use pathrex::formats::Csv;
//! use pathrex::formats::{Csv, NTriples};
//! use std::fs::File;
//!
//! // Build from CSV in one line
//! let g = Graph::<InMemory>::try_from(
//! Csv::from_reader(File::open("edges.csv").unwrap()).unwrap()
//! ).unwrap();
//!
//! // Build from N-Triples in one line
//! let g2 = Graph::<InMemory>::try_from(
//! NTriples::new(File::open("data.nt").unwrap())
//! ).unwrap();
//! ```

pub mod csv;
pub mod nt;

pub use csv::Csv;
pub use nt::NTriples;

use thiserror::Error;

@@ -33,4 +40,13 @@ pub enum FormatError {
/// An I/O error occurred while reading the data source.
#[error("I/O error: {0}")]
Io(#[from] std::io::Error),

/// An error produced by the N-Triples parser.
#[error("N-Triples parse error: {0}")]
NTriples(String),

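/// An RDF literal appeared as the object of a triple where a node IRI or
/// blank node was expected. The parser returns this error for the triple
/// and does not skip it; callers decide whether to filter it out.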
#[error("RDF literal cannot be used as a graph node (triple skipped)")]
LiteralAsNode,
}
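
Because `LiteralAsNode` is reported as an error and not silently dropped, callers that want to ignore literal-valued triples need to filter them explicitly. The sketch below illustrates that pattern with stand-in `FormatError` and `Edge` types and hypothetical `edge` and `skip_literals` helpers, all defined locally so the block compiles without `pathrex`; only the two variants added in this diff are modelled.

```rust
/// Stand-in for the crate's `FormatError`, reduced to the two variants added in this diff.
#[derive(Debug, PartialEq)]
enum FormatError {
    NTriples(String),
    LiteralAsNode,
}

/// Stand-in for the crate's `Edge`.
#[derive(Debug, PartialEq)]
struct Edge {
    source: String,
    target: String,
    label: String,
}

/// Hypothetical convenience constructor used only by this sketch.
fn edge(source: &str, target: &str, label: &str) -> Edge {
    Edge {
        source: source.to_owned(),
        target: target.to_owned(),
        label: label.to_owned(),
    }
}

/// Drops `LiteralAsNode` items and keeps everything else,
/// including genuine parse errors, which still reach the caller.
fn skip_literals<I>(items: I) -> impl Iterator<Item = Result<Edge, FormatError>>
where
    I: Iterator<Item = Result<Edge, FormatError>>,
{
    items.filter(|r| !matches!(r, Err(FormatError::LiteralAsNode)))
}

fn main() {
    let items = vec![
        Ok(edge("Alice", "Bob", "knows")),
        Err(FormatError::LiteralAsNode),
        Err(FormatError::NTriples("unexpected end of line".to_owned())),
    ];
    for item in skip_literals(items.into_iter()) {
        match item {
            Ok(e) => println!("{} --{}--> {}", e.source, e.label, e.target),
            Err(e) => println!("error kept: {e:?}"),
        }
    }
}
```

Filtering on the specific variant, instead of discarding every `Err`, keeps malformed input visible while still tolerating literal objects.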
235 changes: 235 additions & 0 deletions src/formats/nt.rs
@@ -0,0 +1,235 @@
//! N-Triples edge iterator for the formats layer.
//!
//! ```no_run
//! use pathrex::formats::NTriples;
//! use pathrex::formats::FormatError;
//!
//! # let reader = std::io::empty();
//! let iter = NTriples::new(reader)
//! .filter_map(|r| match r {
//! Err(FormatError::LiteralAsNode) => None, // skip
//! other => Some(other),
//! });
//! ```
//!
//! To load into a graph:
//!
//! ```no_run
//! use pathrex::graph::{Graph, InMemory, GraphDecomposition};
//! use pathrex::formats::NTriples;
//! use std::fs::File;
//!
//! let graph = Graph::<InMemory>::try_from(
//! NTriples::new(File::open("data.nt").unwrap())
//! ).unwrap();
//! ```

use std::io::Read;

use oxrdf::{NamedOrBlankNode, Term};
use oxttl::NTriplesParser;
use oxttl::ntriples::ReaderNTriplesParser;

use crate::formats::FormatError;
use crate::graph::Edge;

/// Controls how predicate IRIs are converted to edge label strings.
#[derive(Debug, Clone, Default)]
pub enum LabelExtraction {
/// Use only the local name: the fragment (`#name`) or last path segment.
/// For example, `http://example.org/ns/knows` → `"knows"`.
/// This is the default.
#[default]
LocalName,
/// Use the full IRI string as the label.
/// For example, `http://example.org/ns/knows` → `"http://example.org/ns/knows"`.
FullIri,
}

/// An iterator that reads N-Triples and yields `Result<Edge, FormatError>`.
///
/// # Example
///
/// ```no_run
/// use pathrex::formats::nt::NTriples;
/// use std::fs::File;
///
/// let file = File::open("data.nt").unwrap();
/// let iter = NTriples::new(file);
/// for result in iter {
/// let edge = result.unwrap();
/// println!("{} --{}--> {}", edge.source, edge.label, edge.target);
/// }
/// ```
pub struct NTriples<R: Read> {
inner: ReaderNTriplesParser<R>,
label_extraction: LabelExtraction,
}

impl<R: Read> NTriples<R> {
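/// Creates a parser over `reader` that uses `LabelExtraction::LocalName`.
pub fn new(reader: R) -> Self {
Self::with_label_extraction(reader, LabelExtraction::LocalName)
}

/// Creates a parser over `reader` with an explicit label extraction strategy.
pub fn with_label_extraction(reader: R, label_extraction: LabelExtraction) -> Self {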
Self {
inner: NTriplesParser::new().for_reader(reader),
label_extraction,
}
}

fn subject_to_node_id(subject: NamedOrBlankNode) -> String {
match subject {
NamedOrBlankNode::NamedNode(n) => n.into_string(),
NamedOrBlankNode::BlankNode(b) => format!("_:{}", b.as_str()),
}
}

fn object_to_node_id(object: Term) -> Result<String, FormatError> {
match object {
Term::NamedNode(n) => Ok(n.into_string()),
Term::BlankNode(b) => Ok(format!("_:{}", b.as_str())),
Term::Literal(_) => Err(FormatError::LiteralAsNode),
}
}

fn extract_label(iri: &str, strategy: &LabelExtraction) -> String {
match strategy {
LabelExtraction::FullIri => iri.to_owned(),
LabelExtraction::LocalName => {
// Fragment takes priority, then last path segment.
if let Some(pos) = iri.rfind('#') {
iri[pos + 1..].to_owned()
} else if let Some(pos) = iri.rfind('/') {
iri[pos + 1..].to_owned()
} else {
iri.to_owned()
}
}
}
}
}

impl<R: Read> Iterator for NTriples<R> {
type Item = Result<Edge, FormatError>;

fn next(&mut self) -> Option<Self::Item> {
let triple = match self.inner.next()? {
Ok(t) => t,
Err(e) => return Some(Err(FormatError::NTriples(e.to_string()))),
};

let source = Self::subject_to_node_id(triple.subject.into());
let label = Self::extract_label(triple.predicate.as_str(), &self.label_extraction);
let target = match Self::object_to_node_id(triple.object) {
Ok(t) => t,
Err(e) => return Some(Err(e)),
};

Some(Ok(Edge {
source,
target,
label,
}))
}
}

#[cfg(test)]
mod tests {
use super::*;

fn parse(nt: &str) -> Vec<Result<Edge, FormatError>> {
NTriples::new(nt.as_bytes()).collect()
}

#[test]
fn test_basic_ntriples() {
let nt = "<http://example.org/Alice> <http://example.org/knows> <http://example.org/Bob> .\n\
<http://example.org/Bob> <http://example.org/likes> <http://example.org/Charlie> .\n";
let edges = parse(nt);
assert_eq!(edges.len(), 2);

let e0 = edges[0].as_ref().unwrap();
assert_eq!(e0.source, "http://example.org/Alice");
assert_eq!(e0.target, "http://example.org/Bob");
assert_eq!(e0.label, "knows");

let e1 = edges[1].as_ref().unwrap();
assert_eq!(e1.source, "http://example.org/Bob");
assert_eq!(e1.target, "http://example.org/Charlie");
assert_eq!(e1.label, "likes");
}

#[test]
fn test_full_iri_label_extraction() {
let nt =
"<http://example.org/Alice> <http://example.org/knows> <http://example.org/Bob> .\n";
let edges: Vec<_> =
NTriples::with_label_extraction(nt.as_bytes(), LabelExtraction::FullIri).collect();

assert_eq!(edges.len(), 1);
assert_eq!(edges[0].as_ref().unwrap().label, "http://example.org/knows");
}

#[test]
fn test_blank_node_subject_and_object() {
let nt = "_:b1 <http://example.org/knows> _:b2 .\n";
let edges = parse(nt);
assert_eq!(edges.len(), 1);

let e = edges[0].as_ref().unwrap();
assert_eq!(e.source, "_:b1");
assert_eq!(e.target, "_:b2");
}

#[test]
fn test_literal_object_yields_error() {
let nt = "<http://example.org/Alice> <http://example.org/name> \"Alice\" .\n";
let edges = parse(nt);
assert_eq!(edges.len(), 1);
assert!(
matches!(edges[0], Err(FormatError::LiteralAsNode)),
"literal object should yield LiteralAsNode error"
);
}

#[test]
fn test_caller_can_skip_literal_triples() {
let nt = "<http://example.org/Alice> <http://example.org/knows> <http://example.org/Bob> .\n\
<http://example.org/Alice> <http://example.org/name> \"Alice\" .\n\
<http://example.org/Bob> <http://example.org/knows> <http://example.org/Charlie> .\n";
let edges: Vec<_> = NTriples::new(nt.as_bytes())
.filter_map(|r| match r {
Err(FormatError::LiteralAsNode) => None,
other => Some(other),
})
.collect();

assert_eq!(edges.len(), 2, "literal triple should be skipped");
assert!(edges.iter().all(|r| r.is_ok()));
}

#[test]
fn test_fragment_iri_local_name() {
let nt =
"<http://example.org/Alice> <http://example.org/ns#knows> <http://example.org/Bob> .\n";
let edges = parse(nt);
assert_eq!(edges[0].as_ref().unwrap().label, "knows");
}

#[test]
fn test_ntriples_graph_source() {
use crate::graph::{GraphBuilder, GraphDecomposition, InMemoryBuilder};

let nt = "<http://example.org/A> <http://example.org/knows> <http://example.org/B> .\n\
<http://example.org/B> <http://example.org/knows> <http://example.org/C> .\n";
let iter = NTriples::new(nt.as_bytes());

let graph = InMemoryBuilder::default()
.load(iter)
.expect("load should succeed")
.build()
.expect("build should succeed");
assert_eq!(graph.num_nodes(), 3);
}
}
30 changes: 29 additions & 1 deletion src/graph/inmemory.rs
@@ -1,7 +1,7 @@
use std::sync::Arc;
use std::{collections::HashMap, io::Read};

use crate::formats::Csv;
use crate::formats::{Csv, NTriples};
use crate::{
graph::GraphSource,
lagraph_sys::{GrB_Index, GrB_Matrix, GrB_Matrix_free, LAGraph_Kind},
@@ -191,6 +191,15 @@ impl<R: Read> GraphSource<InMemoryBuilder> for Csv<R> {
}
}

impl<R: Read> GraphSource<InMemoryBuilder> for NTriples<R> {
fn apply_to(self, mut builder: InMemoryBuilder) -> Result<InMemoryBuilder, GraphError> {
for item in self {
builder.push_edge(item?)?;
}
Ok(builder)
}
}
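
Note on the loop above: `item?` returns on the first `Err`, so a single literal-valued triple aborts the whole load unless the caller filters the iterator beforehand. The following standalone sketch shows that fail-fast behaviour. The `GraphError::Format` variant, the tuple used as an edge, and the `load_all` helper are assumptions made for illustration; the diff only implies that some `From<FormatError>` conversion into `GraphError` exists.

```rust
/// Stand-in for the crate's `FormatError`, reduced to one variant.
#[derive(Debug, PartialEq)]
enum FormatError {
    LiteralAsNode,
}

/// Stand-in for the crate's `GraphError`; the `Format` variant name is hypothetical.
#[derive(Debug, PartialEq)]
enum GraphError {
    Format(FormatError),
}

/// The conversion that lets `?` turn a `FormatError` into a `GraphError`.
impl From<FormatError> for GraphError {
    fn from(e: FormatError) -> Self {
        GraphError::Format(e)
    }
}

/// Hypothetical edge representation for this sketch: (source, label, target).
type Triple = (&'static str, &'static str, &'static str);

/// Counts the edges consumed, stopping at the first error like the `apply_to` loop.
fn load_all<I>(items: I) -> Result<usize, GraphError>
where
    I: IntoIterator<Item = Result<Triple, FormatError>>,
{
    let mut loaded = 0;
    for item in items {
        let _edge = item?; // early return on the first Err
        loaded += 1;
    }
    Ok(loaded)
}

fn main() {
    let with_literal = vec![
        Ok(("A", "knows", "B")),
        Err(FormatError::LiteralAsNode),
        Ok(("B", "knows", "C")),
    ];
    println!("{:?}", load_all(with_literal));

    let clean = vec![Ok(("A", "knows", "B")), Ok(("B", "knows", "C"))];
    println!("{:?}", load_all(clean));
}
```

For RDF data that mixes object properties with literal-valued properties, filtering `LiteralAsNode` before calling `load` is probably the expected usage.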

#[cfg(test)]
mod tests {
use super::*;
@@ -278,4 +287,23 @@ mod tests {
assert!(graph.get_graph("knows").is_ok());
assert!(graph.get_graph("likes").is_ok());
}

#[test]
fn test_with_stream_from_ntriples() {
use crate::formats::nt::NTriples;

let nt = "<http://example.org/A> <http://example.org/knows> <http://example.org/B> .\n\
<http://example.org/B> <http://example.org/knows> <http://example.org/C> .\n\
<http://example.org/A> <http://example.org/likes> <http://example.org/C> .\n";

let graph = InMemoryBuilder::new()
.load(NTriples::new(nt.as_bytes()))
.expect("load should succeed")
.build()
.expect("build should succeed");

assert_eq!(graph.num_nodes(), 3);
assert!(graph.get_graph("knows").is_ok());
assert!(graph.get_graph("likes").is_ok());
}
}
2 changes: 1 addition & 1 deletion src/graph/mod.rs
@@ -109,7 +109,7 @@ impl LagraphGraph {
Self::new(matrix, kind)
}

pub fn check_graph(&self) -> Result<(), GraphError> {
pub fn check_graph(&self) -> Result<(), GraphError> {
la_ok!(LAGraph_CheckGraph(self.inner))
}
}