Tools for creating metadata files for reproducible research.
The goal of metadatar is to help produce minimal metadata files that
document datasets in simple formats, which can form the building blocks of
more complex metadata formats (e.g. EML, RDF).
You can install the development version of metadatar from GitHub with:
# install.packages("devtools")
devtools::install_github("annakrystalli/metadatar")
This is a basic example which shows you how to create a metadata table
for the gapminder dataset:
library(gapminder)
library(metadatar)
str(gapminder)
#> Classes 'tbl_df', 'tbl' and 'data.frame': 1704 obs. of 6 variables:
#> $ country : Factor w/ 142 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
#> $ continent: Factor w/ 5 levels "Africa","Americas",..: 3 3 3 3 3 3 3 3 3 3 ...
#> $ year : int 1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 ...
#> $ lifeExp : num 28.8 30.3 32 34 36.1 ...
#> $ pop : int 8425333 9240934 10267083 11537966 13079460 14880372 12881816 13867957 16317921 22227415 ...
#> $ gdpPercap: num 779 821 853 836 740 ...
meta_shell <- mt_create_meta_shell(gapminder)
knitr::kable(meta_shell)
attributeName | attributeDefinition | columnClasses | numberType | unit | minimum | maximum | formatString | definition | code | levels |
---|---|---|---|---|---|---|---|---|---|---|
country | NA | factor | NA | NA | NA | NA | NA | NA | NA | NA |
continent | NA | factor | NA | NA | NA | NA | NA | NA | NA | NA |
year | NA | numeric | NA | NA | NA | NA | NA | NA | NA | NA |
lifeExp | NA | numeric | NA | NA | NA | NA | NA | NA | NA | NA |
pop | NA | numeric | NA | NA | NA | NA | NA | NA | NA | NA |
gdpPercap | NA | numeric | NA | NA | NA | NA | NA | NA | NA | NA |
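The shell is meant to be filled in, by hand or in code. As a rough illustration only, the base R sketch below builds a similar (shortened) table for a small toy data frame and fills in one row. The helper `make_shell()` and the toy data are hypothetical and not part of metadatar; `mt_create_meta_shell()` may well work differently.

```r
# Hypothetical helper, NOT part of metadatar: build an empty metadata
# table with one row per column of `df`.
make_shell <- function(df) {
  # Take the first class of each column and report integers as "numeric",
  # mirroring the columnClasses shown in the table above.
  classes <- vapply(df, function(x) class(x)[1], character(1))
  classes[classes == "integer"] <- "numeric"
  data.frame(
    attributeName       = names(df),
    attributeDefinition = NA_character_,
    columnClasses       = unname(classes),
    numberType          = NA_character_,
    unit                = NA_character_,
    stringsAsFactors    = FALSE
  )
}

# Toy data, made up for this example.
toy <- data.frame(
  site   = factor(c("a", "b", "a")),
  year   = c(2001L, 2002L, 2003L),
  height = c(1.2, 3.4, 5.6)
)

shell <- make_shell(toy)

# Fill in the entry for one variable.
is_height <- shell$attributeName == "height"
shell$attributeDefinition[is_height] <- "Plant height"
shell$numberType[is_height] <- "ratio"
shell$unit[is_height] <- "meter"

print(shell)
```

The same pattern of indexing by `attributeName` should carry over to a shell written out to CSV and completed in a spreadsheet instead.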
I’ve focused on recognized column headers, to make it easier to create an
EML object down the line, and on the core columns required, but additional
ones can be added.
The columnClasses column takes one of "numeric", "character", "factor", "ordered", or "Date" (case sensitive).
Which of the remaining attributes apply depends on columnClasses: numeric (ratio or interval) data, character (textDomain) data, or dateTime data. For dateTime data, for example, the date 11-03-2001 has the formatString "DD-MM-YYYY".
The code and levels columns store information on factors, using ";" to separate code and level descriptions. These can be extracted with the metadatar function mt_extract_attr_factors().
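To make the ";" convention concrete, here is a minimal base R sketch that unpacks one such pair of entries into a lookup table. The helper name `split_codes()` and the input strings are made up for illustration; this is not necessarily how metadatar does it internally.

```r
# Hypothetical helper, NOT part of metadatar: split ";"-separated code and
# level strings into a two-column lookup table.
split_codes <- function(code, levels) {
  codes  <- strsplit(code, ";", fixed = TRUE)[[1]]
  labels <- strsplit(levels, ";", fixed = TRUE)[[1]]
  # Each code needs exactly one description.
  stopifnot(length(codes) == length(labels))
  data.frame(code = codes, definition = labels, stringsAsFactors = FALSE)
}

# Made-up codes paired with their descriptions.
lookup <- split_codes("AF;AM;AS", "Africa;Americas;Asia")
print(lookup)
```

One consequence of this convention is that codes and level descriptions cannot themselves contain a ";".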
meta_df <- readr::read_csv(system.file("extdata", "gapminder_meta.csv", package="metadatar"))
knitr::kable(meta_df)
attributeName | attributeDefinition | columnClasses | numberType | unit | minimum | maximum | formatString | definition | code | levels |
---|---|---|---|---|---|---|---|---|---|---|
country | Country | factor | NA | NA | NA | NA | NA | NA | Afghanistan;Albania;Algeria;Angola;Argentina;Australia;Austria;Bahrain;Bangladesh;Belgium;Benin;Bolivia;Bosnia and Herzegovina;Botswana;Brazil;Bulgaria;Burkina Faso;Burundi;Cambodia;Cameroon;Canada;Central African Republic;Chad;Chile;China;Colombia;Comoros;Congo, Dem. Rep.;Congo, Rep.;Costa Rica;Cote d’Ivoire;Croatia;Cuba;Czech Republic;Denmark;Djibouti;Dominican Republic;Ecuador;Egypt;El Salvador;Equatorial Guinea;Eritrea;Ethiopia;Finland;France;Gabon;Gambia;Germany;Ghana;Greece;Guatemala;Guinea;Guinea-Bissau;Haiti;Honduras;Hong Kong, China;Hungary;Iceland;India;Indonesia;Iran;Iraq;Ireland;Israel;Italy;Jamaica;Japan;Jordan;Kenya;Korea, Dem. Rep.;Korea, Rep.;Kuwait;Lebanon;Lesotho;Liberia;Libya;Madagascar;Malawi;Malaysia;Mali;Mauritania;Mauritius;Mexico;Mongolia;Montenegro;Morocco;Mozambique;Myanmar;Namibia;Nepal;Netherlands;New Zealand;Nicaragua;Niger;Nigeria;Norway;Oman;Pakistan;Panama;Paraguay;Peru;Philippines;Poland;Portugal;Puerto Rico;Reunion;Romania;Rwanda;Sao Tome and Principe;Saudi Arabia;Senegal;Serbia;Sierra Leone;Singapore;Slovak Republic;Slovenia;Somalia;South Africa;Spain;Sri Lanka;Sudan;Swaziland;Sweden;Switzerland;Syria;Taiwan;Tanzania;Thailand;Togo;Trinidad and Tobago;Tunisia;Turkey;Uganda;United Kingdom;United States;Uruguay;Venezuela;Vietnam;West Bank and Gaza;Yemen, Rep.;Zambia;Zimbabwe | Afghanistan;Albania;Algeria;Angola;Argentina;Australia;Austria;Bahrain;Bangladesh;Belgium;Benin;Bolivia;Bosnia and Herzegovina;Botswana;Brazil;Bulgaria;Burkina Faso;Burundi;Cambodia;Cameroon;Canada;Central African Republic;Chad;Chile;China;Colombia;Comoros;Congo, Dem. Rep.;Congo, Rep.;Costa Rica;Cote d’Ivoire;Croatia;Cuba;Czech Republic;Denmark;Djibouti;Dominican Republic;Ecuador;Egypt;El Salvador;Equatorial Guinea;Eritrea;Ethiopia;Finland;France;Gabon;Gambia;Germany;Ghana;Greece;Guatemala;Guinea;Guinea-Bissau;Haiti;Honduras;Hong Kong, China;Hungary;Iceland;India;Indonesia;Iran;Iraq;Ireland;Israel;Italy;Jamaica;Japan;Jordan;Kenya;Korea, Dem. Rep.;Korea, Rep.;Kuwait;Lebanon;Lesotho;Liberia;Libya;Madagascar;Malawi;Malaysia;Mali;Mauritania;Mauritius;Mexico;Mongolia;Montenegro;Morocco;Mozambique;Myanmar;Namibia;Nepal;Netherlands;New Zealand;Nicaragua;Niger;Nigeria;Norway;Oman;Pakistan;Panama;Paraguay;Peru;Philippines;Poland;Portugal;Puerto Rico;Reunion;Romania;Rwanda;Sao Tome and Principe;Saudi Arabia;Senegal;Serbia;Sierra Leone;Singapore;Slovak Republic;Slovenia;Somalia;South Africa;Spain;Sri Lanka;Sudan;Swaziland;Sweden;Switzerland;Syria;Taiwan;Tanzania;Thailand;Togo;Trinidad and Tobago;Tunisia;Turkey;Uganda;United Kingdom;United States;Uruguay;Venezuela;Vietnam;West Bank and Gaza;Yemen, Rep.;Zambia;Zimbabwe |
continent | Continent | factor | NA | NA | NA | NA | NA | NA | Africa;Americas;Asia;Europe;Oceania | Africa;Americas;Asia;Europe;Oceania |
year | Year | numeric | interval | NA | 1952 | 2007 | NA | NA | NA | NA |
lifeExp | Life Expectancy | numeric | ratio | NA | 0 | NA | NA | NA | NA | NA |
pop | Population | numeric | ratio | NA | 0 | NA | NA | NA | NA | NA |
gdpPercap | Per capita Gross Domestic Product | numeric | ratio | International Dollars | NA | NA | NA | NA | NA | NA |
mt_extract_factors_tbl(meta_df)
#> # A tibble: 147 x 3
#> attributeName code definition
#> <fct> <fct> <fct>
#> 1 country Afghanistan Afghanistan
#> 2 country Albania Albania
#> 3 country Algeria Algeria
#> 4 country Angola Angola
#> 5 country Argentina Argentina
#> 6 country Australia Australia
#> 7 country Austria Austria
#> 8 country Bahrain Bahrain
#> 9 country Bangladesh Bangladesh
#> 10 country Belgium Belgium
#> # … with 137 more rows
Create more descriptive variable labels for plot axes, titles or tables:
mt_label(meta_df, var = "gdpPercap")
#> [1] "Per capita Gross Domestic Product (International Dollars)"
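The label above combines the attribute definition with its unit. A minimal base R sketch of that idea follows, assuming a metadata table with `attributeName`, `attributeDefinition` and `unit` columns. `make_label()` is a hypothetical stand-in, not the package's `mt_label()`, and dropping the parentheses when the unit is missing is an assumption made here.

```r
# Hypothetical stand-in for mt_label(), NOT the package implementation.
make_label <- function(meta, var) {
  entry <- meta[meta$attributeName == var, ]
  stopifnot(nrow(entry) == 1)
  if (is.na(entry$unit)) {
    # Assumption: with no unit recorded, use the definition alone.
    entry$attributeDefinition
  } else {
    paste0(entry$attributeDefinition, " (", entry$unit, ")")
  }
}

# Two rows copied from the completed metadata table above.
meta <- data.frame(
  attributeName       = c("gdpPercap", "year"),
  attributeDefinition = c("Per capita Gross Domestic Product", "Year"),
  unit                = c("International Dollars", NA),
  stringsAsFactors    = FALSE
)

gdp_label  <- make_label(meta, "gdpPercap")
year_label <- make_label(meta, "year")
print(gdp_label)
print(year_label)
```

Such labels can then be passed to plotting code, for example as the `xlab` or `ylab` argument of base `plot()`.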
Please note that this project is released with a Contributor Code of
Conduct. By participating in this project you agree
to abide by its terms.