Mismo

The SQL/Ibis-powered sklearn of record linkage.

Mismo is still in alpha. Breaking changes will happen frequently and without warning. Once things stabilize, I will publish a stability policy. Suggestions about what you want the API to look like would be greatly appreciated.

Installation

I have claimed mismo on PyPI, but I won't update it often until this is more stable. Until then, install from source:

python -m pip install "mismo[viz] @ git+https://github.com/NickCrews/mismo@<SOME-SHA-OR-BRANCH>"

Goals

Mismo tries to be the sklearn of record linkage, backed by the scalability and power of SQL and Ibis. It is made of many small data structures and functions, each with a well-defined and standard API that allows them to be composed together and extended easily. None of the other record linkage packages I have seen, such as Splink, Dedupe, or Record Linkage Toolkit, had all of these properties, so I decided to make my own.
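To make "small functions with a standard API that compose" concrete, here is a minimal pure-Python sketch. The names `block_on_key`, `block_on`, and `union`, and the list-of-dicts record format, are hypothetical and chosen only for illustration. Mismo itself works on Ibis tables, and its real API may look quite different. The idea shown is that every blocker shares one signature (records in, candidate pairs out), so blockers can be combined, and user-written ones plug in without any changes to the rest of the pipeline.

```python
from itertools import combinations
from typing import Callable

# Hypothetical types for illustration only; this is NOT Mismo's API.
Record = dict
Pair = tuple[int, int]
Blocker = Callable[[list[Record]], set[Pair]]


def block_on_key(key: Callable[[Record], object]) -> Blocker:
    """Build a blocker that pairs records whose key(record) values are equal."""

    def blocker(records: list[Record]) -> set[Pair]:
        pairs: set[Pair] = set()
        for left, right in combinations(records, 2):
            k = key(left)
            if k is not None and k == key(right):
                # Store each pair with the smaller id first so (1, 2) == (2, 1).
                lo, hi = sorted((left["record_id"], right["record_id"]))
                pairs.add((lo, hi))
        return pairs

    return blocker


def block_on(column: str) -> Blocker:
    """Convenience wrapper: block on exact equality of a single column."""
    return block_on_key(lambda record: record[column])


def union(*blockers: Blocker) -> Blocker:
    """Compose blockers: a pair is a candidate if any blocker proposes it."""

    def blocker(records: list[Record]) -> set[Pair]:
        result: set[Pair] = set()
        for b in blockers:
            result |= b(records)
        return result

    return blocker


records = [
    {"record_id": 1, "name": "Alice Smith", "zip": "99501"},
    {"record_id": 2, "name": "Alicia Smith", "zip": "99501"},
    {"record_id": 3, "name": "Bob Jones", "zip": "99701"},
    {"record_id": 4, "name": "Robert Jones", "zip": "99702"},
]

# A user-defined blocker fits in because it has the same signature.
by_last_name = block_on_key(lambda record: record["name"].split()[-1])
candidates = union(block_on("zip"), by_last_name)(records)
print(sorted(candidates))  # [(1, 2), (3, 4)]
```

Blocking on ZIP alone misses Bob and Robert Jones, whose ZIP codes differ; composing it with a last-name blocker recovers that pair without either blocker knowing about the other.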

See Goals and Alternatives for a more detailed discussion of the goals of Mismo and how it compares to other record linkage packages.

Features

Supports larger-than-memory datasets, executed on powerful SQL engines. Use DuckDB for prototyping and for jobs up to roughly 10M records, and Spark or other distributed backends for larger tasks, without changing your code!
Use Ibis's clean, strongly typed, Pythonic DataFrame API.
Small, modular functions and data structures that are easy to plug together and extend.
Layered API: use the top-level APIs if your task is common enough to be supported out of the box, or drop down to the lower-level building blocks when you need more control.
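The idea behind running record linkage "on SQL engines" is easiest to see with blocking. Below is a minimal sketch that assumes nothing about Mismo's internals: it expresses "block on ZIP code" as a SQL self-join and runs it with Python's built-in `sqlite3` module, standing in for DuckDB or Spark. The `people` table and its columns are hypothetical. In Mismo you would write Ibis expressions rather than raw SQL, and Ibis compiles them for whichever backend you connect to; the query below only illustrates the kind of work the engine does, not the SQL Mismo actually emits.

```python
import sqlite3

# sqlite3 (stdlib) stands in here for a real engine such as DuckDB or Spark.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (record_id INTEGER PRIMARY KEY, name TEXT, zip TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [
        (1, "Alice Smith", "99501"),
        (2, "Alicia Smith", "99501"),
        (3, "Bob Jones", "99701"),
        (4, "Robert Jones", "99702"),
    ],
)

# Blocking as a self-join: the database engine, not a Python loop,
# computes the candidate pairs and chooses how to execute the join.
BLOCK_ON_ZIP = """
SELECT l.record_id AS record_id_l, r.record_id AS record_id_r
FROM people AS l
JOIN people AS r
  ON l.zip = r.zip
 AND l.record_id < r.record_id
ORDER BY record_id_l, record_id_r
"""

pairs = conn.execute(BLOCK_ON_ZIP).fetchall()
print(pairs)  # [(1, 2)]
conn.close()
```

The `l.record_id < r.record_id` condition keeps each unordered pair exactly once and drops self-pairs. Because the whole computation is a query, swapping SQLite for an engine built for large or distributed data changes where it runs, not how it is written.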

Examples

See the example notebook.

Documentation

See the documentation.

Contributing

See the contributing guide.

License

mismo is distributed under the terms of the LGPL-3.0-or-later license.
