Package: autodb 3.2.4.9000

autodb: Automatic Database Normalisation for Data Frames

Automatic normalisation of a data frame to third normal form, with the intention of easing the process of data cleaning. (Usage to design your actual database for you is not advised.) Originally inspired by the 'AutoNormalize' library for 'Python' by 'Alteryx' (<https://github.com/alteryx/autonormalize>), with various changes and improvements. Automatic discovery of functional or approximate dependencies, normalisation based on those, and plotting of the resulting "database" via 'Graphviz', with options to exclude some attributes at discovery time, or remove discovered dependencies at normalisation time.
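To make "functional dependency" concrete: a set of attributes X determines an attribute Y when rows that agree on X always agree on Y. The base-R sketch below checks this for a toy data frame. The helper `holds_fd` and the `measurements` data are hypothetical illustrations, not part of autodb, and this brute-force check is not the search algorithm the package uses.

```r
# Hypothetical helper (not an autodb function): TRUE when every distinct
# combination of values in `determinants` maps to one value of `dependant`.
holds_fd <- function(df, determinants, dependant) {
  n_lhs  <- nrow(unique(df[, determinants, drop = FALSE]))
  n_both <- nrow(unique(df[, c(determinants, dependant), drop = FALSE]))
  n_lhs == n_both
}

# Toy data, invented for illustration only.
measurements <- data.frame(
  chick  = c(1, 1, 2, 2),
  diet   = c("a", "a", "b", "b"),
  time   = c(0, 2, 0, 2),
  weight = c(40, 50, 41, 50)
)

holds_fd(measurements, "chick", "diet")               # TRUE
holds_fd(measurements, "time", "weight")              # FALSE
holds_fd(measurements, c("chick", "time"), "weight")  # TRUE
```

Dependencies like `chick -> diet` are what normalisation uses to split a flat table into smaller relations, here a per-chick table holding the diet and a per-measurement table holding the weight.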

Authors: Mark Webster [aut, cre]

autodb_3.2.4.9000.tar.gz
autodb_3.2.4.9000.zip (r-4.7) | autodb_3.2.4.9000.zip (r-4.6) | autodb_3.2.4.9000.zip (r-4.5)
autodb_3.2.4.9000.tgz (r-4.6-any) | autodb_3.2.4.9000.tgz (r-4.5-any)
autodb_3.2.4.9000.tar.gz (r-4.7-any) | autodb_3.2.4.9000.tar.gz (r-4.6-any)
autodb_3.2.4.9000.tgz (r-4.6-emscripten)
manual.pdf | manual.html
DESCRIPTION | NEWS
card.svg | card.png
autodb/json (API)

# Install 'autodb' in R:
install.packages('autodb', repos = c('https://charnelmouse.r-universe.dev', 'https://cloud.r-project.org'))
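Once the package is installed, a first session might look like the sketch below. It uses only functions named in the export list (`autodb`, `gv`, `rejoin`, `df_equiv`) and R's built-in `ChickWeight` data. Calling them with default arguments is an assumption, so check the help pages for the installed version. The block skips itself if autodb is not available.

```r
# Minimal sketch, assuming default arguments are acceptable for each call.
if (requireNamespace("autodb", quietly = TRUE)) {
  # Discover dependencies and normalise the data frame in one step.
  db <- autodb::autodb(ChickWeight)
  print(db)

  # Generate Graphviz input text describing the resulting "database".
  # Rendering it needs a separate Graphviz tool, which is not shown here.
  graphviz_text <- autodb::gv(db)
  cat(graphviz_text)

  # Join the relations back into a single data frame, then compare it
  # with the original, ignoring row order.
  rejoined <- autodb::rejoin(db)
  print(autodb::df_equiv(rejoined, ChickWeight))
}
```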

Bug tracker: https://github.com/charnelmouse/autodb/issues

Pkgdown/docs site: https://charnelmouse.github.io

Datasets:
  • nudge - Nudge meta-analysis data

functional-dependency-discovery

7.51 score | 10 stars | 37 scripts | 256 downloads | 45 exports | 0 dependencies

Last updated from: 294f88d8ad. Checks: 9 OK. Indexed: yes.

Target | Result | Time
linux-devel-x86_64 | OK | 388
source / vignettes | OK | 238
linux-release-x86_64 | OK | 379
macos-release-arm64 | OK | 246
macos-oldrel-arm64 | OK | 310
windows-devel | OK | 425
windows-release | OK | 419
windows-oldrel | OK | 406
wasm-release | OK | 127

Exports: attrs, attrs_order, attrs_order<-, attrs<-, autodb, autokey, autoref, create, d2, database, database_schema, decompose, dependant, dependant<-, detset, detset<-, df_anyDuplicated, df_duplicated, df_equiv, df_rbind, df_records, df_unique, discover, discover_keys, functional_dependency, gv, insert, keys, keys<-, merge_empty_keys, merge_schemas, normalise, records, records<-, reduce, references, references<-, rejoin, relation, relation_schema, remove_extraneous, rename_attrs, subrelations, subschemas, synthesise

Dependencies: none

Nested data

Last update: 2026-03-15
Started: 2026-03-15

Planned improvements
Views for BCNF and connected reference chains | Boyce-Codd Normal Form | Reference chains | Views as a solution | Proper handling of nullable data | Handling of duplicate records

Last update: 2025-11-08
Started: 2025-10-10

Using autodb
Terminology | Motivation | Database normalisation | For data cleaning | Not for database design | To third normal form | Individual steps | Finding functional dependencies | Normalisation | Adding foreign key references | Decomposing the original relation | Rejoining a database back into a data frame | Tuning detection and normalisation | Avoidable attributes

Last update: 2025-11-08
Started: 2022-11-29

A larger example: the nudge dataset
Initial decomposition | Simplifying the search result | Fixing the data | Simplifying further with search filters | How not to remove spurious structure | Alternative approach: hierarchical limits

Last update: 2025-11-08
Started: 2025-10-10

Handling missing values
Missing values | Decomposing to remove missing values | Structure conditional on value presence

Last update: 2025-11-08
Started: 2025-10-10

Limitations
Meaningful duplicate rows / row order | Value-based constraints | Semantic types | Table merges don't fix issues with merge.data.frame | Synthesis doesn't minimise relation key count | Normal forms aren't all there is to database design

Last update: 2025-11-08
Started: 2025-10-10

Readme and manuals

Help Manual

Help page | Topics
Relational data attributes | attrs attrs<-
Relational data attribute order | attrs_order attrs_order<-
Create a normalised database from a data frame | autodb
Create a relation from a data frame | autokey
Add foreign key references to a normalised database | autoref
Create instance of a schema | create
Generate D2 input text to plot objects | d2
Generate D2 input text to plot a data frame | d2.data.frame
Generate D2 input text to plot databases | d2.database
Generate D2 input text to plot database schemas | d2.database_schema
Generate D2 input text to plot relations | d2.relation
Generate D2 input text to plot relation schemas | d2.relation_schema
Databases | database
Database schemas | database_schema
Decompose a data frame based on given normalised dependencies | decompose
Dependants | dependant dependant<-
Determinant sets | detset detset<-
Determine Duplicate Elements | df_anyDuplicated df_duplicated df_records df_unique
Test data frames for equivalence under row reordering | df_equiv
Combine R Objects by Rows or Columns | df_rbind
Dependency discovery with DFD | discover
Key discovery with MCSS | discover_keys
Functional dependency vectors | functional_dependency
Generate Graphviz input text to plot objects | gv
Generate Graphviz input text to plot a data frame | gv.data.frame
Generate Graphviz input text to plot databases | gv.database
Generate Graphviz input text to plot database schemas | gv.database_schema
Generate Graphviz input text to plot relations | gv.relation
Generate Graphviz input text to plot relation schemas | gv.relation_schema
Insert data | insert
Relational data keys | keys keys<-
Merge relation schemas with empty keys | merge_empty_keys
Merge relation schemas in given pairs | merge_schemas
Create normalised database schemas from functional dependencies | normalise
Nudge meta-analysis data | nudge
Relational data records | records records<-
Remove relations not linked to the main relations | reduce
Remove database relations not linked to the main relations | reduce.database
Remove database schema relations not linked to the given relations | reduce.database_schema
Schema references | references references<-
Join a database into a data frame | rejoin
Relation vectors | relation
Relation schema vectors | relation_schema
Remove extraneous components from functional dependencies | remove_extraneous
Rename relational data attributes | rename_attrs
Database subrelations | subrelations
Schema subschemas | subschemas
Synthesise relation schemas from functional dependencies | synthesise