Name	Name	Last commit message	Last commit date
parent directory ..
examples	examples
fuzz	fuzz
src	src
tests	tests
CHANGELOG.md	CHANGELOG.md
Cargo.toml	Cargo.toml
README.md	README.md
coverage.py	coverage.py

WebGraph

A Rust implementation of the WebGraph framework for graph compression.

WebGraph is a framework for graph compression aimed at studying web graphs, but currently being applied to several other types of graphs. It provides simple ways to manage very large graphs, exploiting modern compression techniques. More precisely, it is currently made of:

Algorithms for compressing web graphs that exploit gap compression and differential compression (à la LINK), intervalization, and ζ codes to provide a high compression ratio (see our datasets). The algorithms are controlled by several parameters, which provide different tradeoffs between access speed and compression ratio.
Algorithms for accessing a compressed graph without actually decompressing it, using lazy techniques that delay the decompression until it is actually necessary.
Algorithms for analyzing very large graphs, such as HyperBall, which has been used to show that Facebook has just four degrees of separation.
A Java implementation of the algorithms above, now in maintenance mode.
This crate, providing a complete, documented implementation of the algorithms above in Rust. It is free software distributed under either the GNU Lesser General Public License 2.1+ or the Apache Software License 2.0.
Data sets for large graphs (e.g., billions of links).

Users of WebGraph

Citation

You are welcome to use and improve WebGraph for your research work! If you find our software useful for research, please cite the following papers in your own:

"WebGraph: The Next Generation (Is in Rust)", by Tommaso Fontana, Sebastiano Vigna, and Stefano Zacchiroli, in WWW '24: Companion Proceedings of the ACM on Web Conference 2024, pages 686–689. DOI 10.1145/3589335.3651581.
"The WebGraph Framework I: Compression Techniques", by Paolo Boldi and Sebastiano Vigna, in Proc. of the 13th international conference on World Wide Web, WWW 2004, pages 595–602, ACM. DOI 10.1145/988672.988752.

Loading a compressed graph

A graph in BV format consists of a BASENAME.graph file (the compressed bitstream), a BASENAME.properties file (metadata), and a BASENAME.offsets file (pointers into the bitstream). You can download graphs from the LAW web site or the Common Crawl web site.

For random access to the successors of a node, you also need a BASENAME.ef file containing an Elias–Fano representation of the offsets. You can build it with the command-line interface (webgraph build ef BASENAME) or programmatically with store_ef.

To load a graph with random access:

let graph = BvGraph::with_basename("BASENAME").load()?;

BvGraph::with_basename returns a LoadConfig that can be further customized, selecting endianness, type of memory access (memory mapping, full loading, etc.), and so on. By default you will get big endianness, memory mapping, and dynamic code dispatch.

If you only need sequential access (e.g., scanning all arcs), use BvGraphSeq, which does not require the .ef file:

let graph = BvGraphSeq::with_basename("BASENAME").load()?;

You can retrieve the successors of a node or iterate on the whole graph. In particular, using the handy for_ macro:

for_![(src, succ) in graph {
    for dst in succ {
        [do something with the arc src -> dst]
    }
}];

Note that on Windows memory mapping requires that the length of the graph file is a multiple of the internal bit buffer. You can use the CLI command run pad u32 to ensure that your graph file is properly padded.

In-memory graph representations

Several structures are available for building graphs in memory:

VecGraph and LabeledVecGraph: mutable graphs backed by vectors; arcs must be added in increasing successor order. Serializable with ε-serde.
BTreeGraph and LabeledBTreeGraph: mutable graphs backed by B-tree maps; arcs can be added in any order.
CsrGraph: a classical immutable Compressed Sparse Row representation, useful for algorithms that need fast random access without compression overhead. Serializable with ε-serde.

All these types (except ArcListGraph) can also be serialized with serde using the feature gate serde.

Importing your data

The command-line interface provides a from arcs subcommand that reads tab-separated arcs from standard input and compresses them directly into BV format.

From code, you have several options depending on how your data is organized:

If you can generate arcs in sorted order (by source, then by target), wrap your iterator in an ArcListGraph and pass it directly to BvCompConfig::comp_graph; no intermediate storage is needed.
If your arcs are unsorted, ParSortedGraph::from_pairs accepts an iterator on (usize, usize) pairs, sorts them, and returns a graph ready for parallel compression.
If you can produce arcs in parallel as a Rayon parallel iterator, ParSortedGraph::par_from_pairs does the same with parallel sorting.

In all cases the result implements IntoParLenders, so it can be passed to BvCompConfig::par_comp for parallel compression. For full control over deduplication, memory budget, and progress logging, use the ParSortedGraphConf builder obtained via ParSortedGraph::config().

All of the above also works for labeled graphs: just use ((usize, usize), L) pairs instead of (usize, usize) and the corresponding ParSortedLabeledGraph methods.

Compressing graphs

The entry point for compression is BvComp::with_basename, which returns a BvCompConfig builder. Compression parameters (window size, codes for different components, etc.) are controlled by CompFlags. compression paramters are described in detail in the bvgraph module documentation.

To compress sequentially a graph that implements SequentialGraph:

BvComp::with_basename("BASENAME")
    .comp_graph::<BE>(graph)?;

For better compression ratios at the cost of a longer compression time, use BvCompConfig::with_bvgraphz, which enables Zuckerli-like dynamic-programming reference selection.

Parallel compression

Any graph implementing IntoParLenders can be compressed in parallel:

BvComp::with_basename("BASENAME")
    .par_comp::<BE, _>(&graph)?;

All common graph types implement IntoParLenders automatically via SplitLabeling. Transparent wrappers can adjust the splitting:

ParGraph: overrides the number of parallel parts.
ParDcfGraph: uses a precomputed degree cumulative function to balance work by arcs rather than by nodes.

Labeled compression

Graphs with labels can be compressed with BvCompConfig::comp_labeled_graph (sequential) or BvCompConfig::par_comp_labeled (parallel). Label storage is controlled by a StoreLabelsConfig implementation; BitStreamStoreLabelsConfig provides bitstream-based storage with optional Zstandard compression.

Graph transforms

The transform module provides common graph operations, each available in sequential and parallel (SplitLabeling-based) variants:

Transpose: reverse all arcs (transpose_split for parallel, transpose_labeled for labeled graphs).
Symmetrize: add missing reverse arcs, optionally removing self-loops (symmetrize_split for parallel).
Permute: renumber nodes according to a permutation (permute_split for parallel).
Map: renumber nodes through an arbitrary function, with deduplication (map_split for parallel).

Graph and data wrappers

Lightweight wrappers combine or modify data without copying:

PermutedGraph: applies a node permutation lazily.
UnionGraph: merges arcs from two graphs.
NoSelfLoopsGraph: filters out self-loops.
JavaPermutation: reads Java WebGraph permutation files.
ArcListGraph: a lazy sequential graph backed by an iterator of arcs, useful for feeding data directly into the compressor without materializing the graph in memory.

Labels

Graphs are a special case of labelings: a SequentialLabeling assigns a sequence of labels to each node, and a graph is simply a labeling whose labels are usize successors. You can attach additional labels to a graph by zipping it with a labeling.

Bitstream-based labelings are provided by BitStreamLabeling (random access) and BitStreamLabelingSeq (sequential). Custom label (de)serializers implement the BitDeserializer/BitSerializer traits. The Left and Right projections extract one component from a zipped labeling.

Graph traversals

The visits module provides breadth-first and depth-first traversals, both sequential and parallel:

BFS: breadth_first::Seq, breadth_first::ParFair (fair division of work) and breadth_first::ParLowMem (memory-efficient).
DFS: depth_first::SeqIter.

Visits use a callback-based API with event types (EventPred/EventNoPred) that carry predecessor/parent information when requested.

Graph iterators

BfsOrder and DfsOrder wrap the visit machinery into standard iterators that yield nodes in BFS or DFS order. BfsOrderFromRoots starts the visit from a given set of roots.

Random graphs

ErdosRenyi generates Erdős–Rényi random graphs with a given number of nodes and edge probability, useful for testing and benchmarking.

Utilities

par_map_fold: parallel map-fold over iterators, an alternative to Rayon's ParallelBridge that avoids some of its bottlenecks.
graph::eq, graph::eq_labeled, labels::eq_sorted: equality checks for graphs and labelings, useful in tests.
labels::check_impl: verifies that the sequential and random-access implementations of a labeling are consistent.

Command-line interface

We provide a command-line interface to perform various operations on unlabeled graphs (compression, transformation, analysis, etc.).

Acknowledgments

This software has been partially supported by project SERICS (PE00000014) under the NRRP MUR program funded by the EU - NGEU, and by project ANR COREGRAPHIE, grant ANR-20-CE23-0002 of the French Agence Nationale de la Recherche. Views and opinions expressed are however those of the authors only and do not necessarily reflect those of the European Union or the Italian MUR. Neither the European Union nor the Italian MUR can be held responsible for them.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

WebGraph

Users of WebGraph

Citation

Loading a compressed graph

In-memory graph representations

Importing your data

Compressing graphs

Parallel compression

Labeled compression

Graph transforms

Graph and data wrappers

Labels

Graph traversals

Graph iterators

Random graphs

Utilities

Command-line interface

Acknowledgments

FilesExpand file tree

webgraph

Directory actions

More options

Directory actions

More options

Latest commit

History

webgraph

Folders and files

parent directory

README.md

WebGraph

Users of WebGraph

Citation

Loading a compressed graph

In-memory graph representations

Importing your data

Compressing graphs

Parallel compression

Labeled compression

Graph transforms

Graph and data wrappers

Labels

Graph traversals

Graph iterators

Random graphs

Utilities

Command-line interface

Acknowledgments