metadatar

Tools for creating metadata files for reproducible research.

The goal of metadatar is to help produce minimal metadata files that
document datasets in simple formats, which can form the building blocks
of more complex metadata formats (e.g. EML, RDF).

Installation

You can install the development version of metadatar from GitHub with:

# install.packages("devtools")
devtools::install_github("annakrystalli/metadatar")

Example

This is a basic example that shows how to create a metadata table for
the gapminder dataset:

library(gapminder)
library(metadatar)
str(gapminder)
#> Classes 'tbl_df', 'tbl' and 'data.frame':    1704 obs. of  6 variables:
#>  $ country  : Factor w/ 142 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
#>  $ continent: Factor w/ 5 levels "Africa","Americas",..: 3 3 3 3 3 3 3 3 3 3 ...
#>  $ year     : int  1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 ...
#>  $ lifeExp  : num  28.8 30.3 32 34 36.1 ...
#>  $ pop      : int  8425333 9240934 10267083 11537966 13079460 14880372 12881816 13867957 16317921 22227415 ...
#>  $ gdpPercap: num  779 821 853 836 740 ...
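# Build a metadata shell: one row per gapminder column, with
# attributeName and columnClasses pre-filled
meta_shell <- mt_create_meta_shell(gapminder)
knitr::kable(meta_shell)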
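| attributeName | attributeDefinition | columnClasses | numberType | unit | minimum | maximum | formatString | definition | code | levels |
|---|---|---|---|---|---|---|---|---|---|---|
| country | NA | factor | NA | NA | NA | NA | NA | NA | NA | NA |
| continent | NA | factor | NA | NA | NA | NA | NA | NA | NA | NA |
| year | NA | numeric | NA | NA | NA | NA | NA | NA | NA | NA |
| lifeExp | NA | numeric | NA | NA | NA | NA | NA | NA | NA | NA |
| pop | NA | numeric | NA | NA | NA | NA | NA | NA | NA | NA |
| gdpPercap | NA | numeric | NA | NA | NA | NA | NA | NA | NA | NA |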
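To make the structure of the shell concrete, here is a minimal base-R sketch of how such a table could be assembled: one row per column, the class of each column in columnClasses, and every other field left as NA. The helper name make_meta_shell() is hypothetical, and this is not necessarily how mt_create_meta_shell() works; mapping integer columns to "numeric" is an assumption based on the year and pop rows in the table above.

```r
# Hypothetical helper (not metadatar's code): one row per column of df
make_meta_shell <- function(df) {
  # First class of each column, e.g. "factor", "integer", "numeric", "Date"
  classes <- vapply(df, function(x) class(x)[1], character(1))
  # Assumption: integer columns are reported as "numeric", as for year and pop
  classes[classes == "integer"] <- "numeric"
  empty_cols <- c("attributeDefinition", "numberType", "unit", "minimum",
                  "maximum", "formatString", "definition", "code", "levels")
  shell <- data.frame(attributeName = names(df),
                      columnClasses = unname(classes),
                      stringsAsFactors = FALSE)
  # Every remaining field starts out empty (NA) and is filled in by hand
  for (col in empty_cols) shell[[col]] <- NA
  # Reorder to match the column order of the shell shown above
  shell[, c("attributeName", "attributeDefinition", "columnClasses",
            empty_cols[-1])]
}

df <- data.frame(id = 1:3, score = c(2.5, 3.1, 4.0),
                 group = factor(c("a", "b", "a")),
                 note = c("x", "y", "z"), stringsAsFactors = FALSE)
make_meta_shell(df)
```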

I’ve used recognized column headers to make it easier to create an EML
object down the line, and focused on the core required columns;
additional ones can be added.

Attributes associated with all variables:

  • attributeName (required, free text field)
  • attributeDefinition (required, free text field)
  • columnClasses (required; one of "numeric", "character", "factor",
    "ordered" or "Date", case-sensitive)

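A quick way to catch typos in columnClasses is to compare each value against the allowed set. The sketch below uses base R only; invalid_column_classes() is a hypothetical helper, not part of metadatar. Because %in% compares strings exactly, it also enforces the case sensitivity noted above, so "Numeric" is flagged.

```r
# Allowed columnClasses values, as listed above (matching is case-sensitive)
allowed_classes <- c("numeric", "character", "factor", "ordered", "Date")

# Hypothetical helper: returns the attributeName of every row whose
# columnClasses value is missing or not one of the allowed values
invalid_column_classes <- function(meta) {
  ok <- meta$columnClasses %in% allowed_classes
  meta$attributeName[!ok]
}

meta <- data.frame(
  attributeName = c("country", "year", "visit_date"),
  columnClasses = c("factor", "Numeric", "Date"),
  stringsAsFactors = FALSE
)
invalid_column_classes(meta)
#> [1] "year"
```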
columnClasses-dependent attributes

  • For numeric (ratio or interval) data:
    • numberType ("ratio" or "interval")
    • unit, minimum and maximum, where applicable
  • For character (textDomain) data:
    • definition (required)
  • For dateTime data:
    • formatString (required), e.g. for the date 11-03-2001 the
      formatString would be "DD-MM-YYYY"
  • I use the code and levels columns to store information on
    factors, with ";" separating the individual codes and level
    descriptions. These can be extracted later on with the metadatar
    function mt_extract_attr_factors().
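The formatString uses EML-style tokens, whereas R's as.Date() expects strptime codes. As a rough sketch, assuming only the DD, MM and YYYY tokens need handling, a hypothetical eml_to_strptime() helper (not a metadatar function) could translate one into the other and parse the example date:

```r
# Hypothetical helper: translate an EML-style formatString into an R
# strptime format. Only the DD, MM and YYYY tokens are handled here.
eml_to_strptime <- function(format_string) {
  out <- sub("YYYY", "%Y", format_string, fixed = TRUE)
  out <- sub("MM", "%m", out, fixed = TRUE)
  sub("DD", "%d", out, fixed = TRUE)
}

fmt <- eml_to_strptime("DD-MM-YYYY")
fmt
#> [1] "%d-%m-%Y"
# 11-03-2001 is read as 11 March 2001
as.Date("11-03-2001", format = fmt)
#> [1] "2001-03-11"
```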

Complete metadata table

# Read the completed gapminder metadata table that ships with metadatar
meta_df <- readr::read_csv(system.file("extdata", "gapminder_meta.csv", package = "metadatar"))
knitr::kable(meta_df)
| attributeName | attributeDefinition | columnClasses | numberType | unit | minimum | maximum | formatString | definition | code | levels |
|---|---|---|---|---|---|---|---|---|---|---|
| country | Country | factor | NA | NA | NA | NA | NA | NA | Afghanistan;Albania;Algeria;Angola;Argentina;Australia;Austria;Bahrain;Bangladesh;Belgium;Benin;Bolivia;Bosnia and Herzegovina;Botswana;Brazil;Bulgaria;Burkina Faso;Burundi;Cambodia;Cameroon;Canada;Central African Republic;Chad;Chile;China;Colombia;Comoros;Congo, Dem. Rep.;Congo, Rep.;Costa Rica;Cote d’Ivoire;Croatia;Cuba;Czech Republic;Denmark;Djibouti;Dominican Republic;Ecuador;Egypt;El Salvador;Equatorial Guinea;Eritrea;Ethiopia;Finland;France;Gabon;Gambia;Germany;Ghana;Greece;Guatemala;Guinea;Guinea-Bissau;Haiti;Honduras;Hong Kong, China;Hungary;Iceland;India;Indonesia;Iran;Iraq;Ireland;Israel;Italy;Jamaica;Japan;Jordan;Kenya;Korea, Dem. Rep.;Korea, Rep.;Kuwait;Lebanon;Lesotho;Liberia;Libya;Madagascar;Malawi;Malaysia;Mali;Mauritania;Mauritius;Mexico;Mongolia;Montenegro;Morocco;Mozambique;Myanmar;Namibia;Nepal;Netherlands;New Zealand;Nicaragua;Niger;Nigeria;Norway;Oman;Pakistan;Panama;Paraguay;Peru;Philippines;Poland;Portugal;Puerto Rico;Reunion;Romania;Rwanda;Sao Tome and Principe;Saudi Arabia;Senegal;Serbia;Sierra Leone;Singapore;Slovak Republic;Slovenia;Somalia;South Africa;Spain;Sri Lanka;Sudan;Swaziland;Sweden;Switzerland;Syria;Taiwan;Tanzania;Thailand;Togo;Trinidad and Tobago;Tunisia;Turkey;Uganda;United Kingdom;United States;Uruguay;Venezuela;Vietnam;West Bank and Gaza;Yemen, Rep.;Zambia;Zimbabwe | Afghanistan;Albania;Algeria;Angola;Argentina;Australia;Austria;Bahrain;Bangladesh;Belgium;Benin;Bolivia;Bosnia and Herzegovina;Botswana;Brazil;Bulgaria;Burkina Faso;Burundi;Cambodia;Cameroon;Canada;Central African Republic;Chad;Chile;China;Colombia;Comoros;Congo, Dem. Rep.;Congo, Rep.;Costa Rica;Cote d’Ivoire;Croatia;Cuba;Czech Republic;Denmark;Djibouti;Dominican Republic;Ecuador;Egypt;El Salvador;Equatorial Guinea;Eritrea;Ethiopia;Finland;France;Gabon;Gambia;Germany;Ghana;Greece;Guatemala;Guinea;Guinea-Bissau;Haiti;Honduras;Hong Kong, China;Hungary;Iceland;India;Indonesia;Iran;Iraq;Ireland;Israel;Italy;Jamaica;Japan;Jordan;Kenya;Korea, Dem. Rep.;Korea, Rep.;Kuwait;Lebanon;Lesotho;Liberia;Libya;Madagascar;Malawi;Malaysia;Mali;Mauritania;Mauritius;Mexico;Mongolia;Montenegro;Morocco;Mozambique;Myanmar;Namibia;Nepal;Netherlands;New Zealand;Nicaragua;Niger;Nigeria;Norway;Oman;Pakistan;Panama;Paraguay;Peru;Philippines;Poland;Portugal;Puerto Rico;Reunion;Romania;Rwanda;Sao Tome and Principe;Saudi Arabia;Senegal;Serbia;Sierra Leone;Singapore;Slovak Republic;Slovenia;Somalia;South Africa;Spain;Sri Lanka;Sudan;Swaziland;Sweden;Switzerland;Syria;Taiwan;Tanzania;Thailand;Togo;Trinidad and Tobago;Tunisia;Turkey;Uganda;United Kingdom;United States;Uruguay;Venezuela;Vietnam;West Bank and Gaza;Yemen, Rep.;Zambia;Zimbabwe |
| continent | Continent | factor | NA | NA | NA | NA | NA | NA | Africa;Americas;Asia;Europe;Oceania | Africa;Americas;Asia;Europe;Oceania |
| year | Year | numeric | interval | NA | 1952 | 2007 | NA | NA | NA | NA |
| lifeExp | Life Expectancy | numeric | ratio | NA | 0 | NA | NA | NA | NA | NA |
| pop | Population | numeric | ratio | NA | 0 | NA | NA | NA | NA | NA |
| gdpPercap | Per capita Gross Domestic Product | numeric | ratio | International Dollars | NA | NA | NA | NA | NA | NA |

Extracting factors

mt_extract_factors_tbl(meta_df)
#> # A tibble: 147 x 3
#>    attributeName code        definition 
#>    <fct>         <fct>       <fct>      
#>  1 country       Afghanistan Afghanistan
#>  2 country       Albania     Albania    
#>  3 country       Algeria     Algeria    
#>  4 country       Angola      Angola     
#>  5 country       Argentina   Argentina  
#>  6 country       Australia   Australia  
#>  7 country       Austria     Austria    
#>  8 country       Bahrain     Bahrain    
#>  9 country       Bangladesh  Bangladesh 
#> 10 country       Belgium     Belgium    
#> # … with 137 more rows
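The 147 rows correspond to the 142 country levels plus the 5 continent levels. As an illustration of the reshaping involved, the following base-R sketch splits the ";"-separated code and levels columns into one row per level. extract_factors() is a hypothetical helper, not the package's implementation, and it assumes each code has a matching description in the same position.

```r
# Hypothetical base-R sketch (not metadatar's implementation): split the
# ";"-separated code and levels columns into one row per factor level
extract_factors <- function(meta) {
  # Keep only rows that actually document a factor
  fct <- meta[!is.na(meta$code), ]
  rows <- lapply(seq_len(nrow(fct)), function(i) {
    codes <- strsplit(fct$code[i], ";", fixed = TRUE)[[1]]
    defs <- strsplit(fct$levels[i], ";", fixed = TRUE)[[1]]
    # Assumption: codes and descriptions are paired by position
    stopifnot(length(codes) == length(defs))
    data.frame(attributeName = fct$attributeName[i],
               code = codes, definition = defs,
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

meta <- data.frame(
  attributeName = c("continent", "year"),
  code = c("Africa;Americas;Asia;Europe;Oceania", NA),
  levels = c("Africa;Americas;Asia;Europe;Oceania", NA),
  stringsAsFactors = FALSE
)
nrow(extract_factors(meta))
#> [1] 5
```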

Data visualisation utilities

Create more descriptive variable labels for plot axes and titles, or for
tables:

mt_label(meta_df, var = "gdpPercap")
#> [1] "Per capita Gross Domestic Product (International Dollars)"
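The label above appears to combine attributeDefinition with the unit in brackets. Here is a minimal sketch of that behaviour, assuming a missing unit is stored as NA and simply omitted; make_label() is a hypothetical helper, not mt_label() itself.

```r
# Hypothetical helper: attributeDefinition, followed by the unit in
# brackets when one is recorded (assumption: missing units are NA)
make_label <- function(meta, var) {
  row <- meta[meta$attributeName == var, ]
  stopifnot(nrow(row) == 1)
  label <- row$attributeDefinition
  if (!is.na(row$unit)) label <- paste0(label, " (", row$unit, ")")
  label
}

meta <- data.frame(
  attributeName = c("gdpPercap", "pop"),
  attributeDefinition = c("Per capita Gross Domestic Product", "Population"),
  unit = c("International Dollars", NA),
  stringsAsFactors = FALSE
)
make_label(meta, "gdpPercap")
#> [1] "Per capita Gross Domestic Product (International Dollars)"
make_label(meta, "pop")
#> [1] "Population"
```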

Please note that this project is released with a Contributor Code of
Conduct. By participating in this project you agree to abide by its
terms.