Sifter

Sifter is an Extract Transform Load (ETL) engine. It can Extract from a number of different data sources, including TSV files, SQLDump files and external databases. It includes a pipeline description language for defining a set of Transform steps that create object messages, which can then be validated against a JSON schema.

Finally, Sifter has a loader module that takes JSON message streams and loads them into a property graph using rules described by JsonHyperSchema.

Example Extract/Transform Playbook

class: sifter
name: census_2010

params:
  census: 
    type: File
    default: ../data/census_2010_byzip.json
  date: 
    type: string
    default: "2010-01-01"
  schema: 
    type: path
    default: ../covid19_datadictionary/gdcdictionary/schemas/

inputs:
  censusData:
    json:
      path: "{{params.census}}"

outputs:
  validated:
    json:
      path: census_data.ndjson

pipelines:
  transform:
    - from: censusData
    - map:
        #fix weird formatting of zip code
        gpython: >
          def f(x):
            x['zipcode'] = "%05d" % int(x['zipcode'])
            return x
        method: f
    - project:
        mapping:
          submitter_id: "{{row.geo_id}}:{{params.date}}"
          type: census_report
          date: "{{params.date}}"
          summary_location: "{{row.zipcode}}"
    - objectValidate:
        title: census_report
        schema: "{{params.schema}}"
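
To make the effect of the map and project steps concrete, the following standalone Python sketch applies the same logic to one made-up record outside of Sifter. The helpers `render` and `project` are hypothetical stand-ins for Sifter's templating and project step, not its actual implementation; the sketch also assumes that projected fields are merged into the existing row, and the sample `geo_id` and `zipcode` values are invented for illustration.

```python
import re


def fix_zipcode(row):
    """Same logic as the playbook's map step: zero-pad the zip code to five digits."""
    row = dict(row)  # work on a copy so the caller's record is untouched
    row["zipcode"] = "%05d" % int(row["zipcode"])
    return row


def render(template, row, params):
    """Hypothetical stand-in for {{...}} templating; resolves only row.X and params.X."""
    scope = {"row": row, "params": params}

    def lookup(match):
        namespace, key = match.group(1).split(".", 1)
        return str(scope[namespace][key])

    return re.sub(r"\{\{\s*([\w.]+)\s*\}\}", lookup, template)


def project(row, mapping, params):
    """Assumed behavior: render each template and merge the result into the row."""
    out = dict(row)
    for field, template in mapping.items():
        out[field] = render(template, row, params)
    return out


if __name__ == "__main__":
    params = {"date": "2010-01-01"}
    mapping = {
        "submitter_id": "{{row.geo_id}}:{{params.date}}",
        "type": "census_report",
        "date": "{{params.date}}",
        "summary_location": "{{row.zipcode}}",
    }
    sample = {"geo_id": "8600000US02134", "zipcode": 2134}  # invented sample record
    print(project(fix_zipcode(sample), mapping, params))
```

Running the script prints the record as it would look just before the objectValidate step, under the assumptions above.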

Running Sifter

sifter run examples/genome.yaml

Python Exec

Sifter can run Python code. For this to work, the Python environment needs to have gRPC installed. To install it, run:

pip install grpcio-tools
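
Before running a playbook that embeds Python, it can help to confirm that the interpreter actually sees the gRPC packages. The following is a minimal sketch: the helper name `missing_modules` is hypothetical, and checking for the `grpc` and `grpc_tools` modules (which the `grpcio-tools` install provides) is an assumption about what Sifter needs at runtime.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumption: these are the modules that matter for Sifter's Python support.
    missing = missing_modules(["grpc", "grpc_tools"])
    if missing:
        print("Missing modules:", ", ".join(missing))
        print("Try: pip install grpcio-tools")
        sys.exit(1)
    print("gRPC modules found in", sys.executable)
```

Run the check with the same interpreter that Sifter will invoke, since a different virtual environment may have a different set of packages.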

Go Tests

Run the Go tests with:

go clean -testcache
go test ./test/... -v
