Open Knowledge Foundation (OKFN) Labs News
Wanted - Data Curators to Maintain Key Datasets in High-Quality, Easy-to-Use and Open Form
Wanted: volunteers to join a team of “Data Curators” maintaining “core” datasets (like GDP or ISO-codes) in high-quality, easy-to-use and open form.
- What is the project about: Collecting and maintaining important and commonly-used (“core”) datasets in high-quality, standardized and easy-to-use form - in particular, as up-to-date, well-structured Data Packages.
The “Core Datasets” effort is part of the broader Frictionless Data initiative.
- What would you be doing: identifying and locating core (public) datasets, cleaning and standardizing the data and making sure the results are kept up to date and easy to use
- Who can participate: anyone can contribute. Details on the skills needed are below.
- Get involved: read more below or jump straight to the sign-up section.
What is the Core Datasets effort?
Summary: Collect and maintain important and commonly-used (“core”) datasets in high-quality, reliable and easy-to-use form (as Data Packages).
Core = important and commonly-used datasets e.g. reference data (country codes) and indicators (inflation, GDP)
Curate = take existing data and provide it in high-quality, reliable, and easy-to-use form (standardized, structured, open)
- Full details, including a slide deck, are at data.okfn.org/roadmap/core-datasets.
- Live examples: You can find already-packaged core datasets at data.okfn.org/data/ and in “raw” form on GitHub at github.com/datasets/
What Roles and Skills are Needed
We need people for a variety of roles, from identifying new “core” datasets to packaging the data to performing quality control (checking metadata, etc.).
Core Skills - at least one of these skills will be needed:
- Data Wrangling Experience. Many of our source datasets are not complex (just an Excel file or similar) and can be “wrangled” in a Spreadsheet program. What we therefore recommend is at least one of:
- Experience with a spreadsheet application such as Excel or (preferably) Google Docs, including use of formulas and (desirably) macros - you should at least know how to quickly convert a cell containing ‘2014’ to ‘2014-01-01’ across 1,000 rows
- Coding for data processing (especially scraping) in one or more of Python, JavaScript or Bash (see the short sketch after this list)
- Data sleuthing - the ability to dig up data on the web (specific desirable skills: you know how to search by filetype in Google, you know where the developer tools are in Chrome or Firefox, you know how to find the URL a form posts to)
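To make the data-wrangling item concrete, here is a minimal Python sketch of the kind of clean-up a curator does - normalising a bare year column to ISO dates. The file and column names (gdp.csv, Year) are illustrative placeholders, not an actual core dataset.

```python
# Minimal clean-up sketch: convert a bare year ("2014") to an ISO date
# ("2014-01-01") across every row of a CSV. File and column names are
# illustrative placeholders, not a real core dataset.
import csv

with open("gdp.csv", newline="") as src, open("gdp-clean.csv", "w", newline="") as dst:
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        row["Year"] = "%d-01-01" % int(row["Year"])  # normalise to ISO 8601
        writer.writerow(row)
```

The same transformation can of course be done with a spreadsheet formula; the point is simply being comfortable applying it across a whole column.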
Desirable Skills (the more the better!):
- Data vs Metadata: know the difference between data and metadata
- Familiarity with Git (and GitHub)
- Familiarity with a command line (preferably bash)
- Know what JSON is
- Mac or Unix is your default operating system (this will make access to the relevant tools that much easier)
- Knowledge of Web APIs and/or HTML
- Use of curl or similar command line tool for accessing Web APIs or web pages
- Scraping using a command line tool or (even better) by coding yourself
- Know what a Data Package and a Tabular Data Package are (see the sketch after this list)
- Know what a text editor is (e.g. Notepad, TextMate, Vim, Emacs, …) and know how to use it (useful both for working with data and for editing Data Package metadata)
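For the JSON and Data Package items above, here is a rough sketch of the sort of datapackage.json descriptor that sits alongside the data in a Tabular Data Package, generated with nothing but Python’s standard library. The dataset, file and field names are placeholders; the real descriptors in github.com/datasets/ carry more metadata.

```python
# Rough sketch of a Tabular Data Package descriptor (datapackage.json).
# The names, paths and fields below are placeholders for illustration only.
import json

descriptor = {
    "name": "example-country-codes",
    "title": "Example Country Codes",
    "resources": [
        {
            "name": "data",
            "path": "data/data.csv",
            "schema": {
                "fields": [
                    {"name": "name", "type": "string"},
                    {"name": "code", "type": "string"},
                ]
            },
        }
    ],
}

with open("datapackage.json", "w") as f:
    json.dump(descriptor, f, indent=2)
```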
Get Involved - Sign Up Now!
We are looking for volunteer contributors to form a “curation team”.
- Time commitment: Members of the team commit to at least 8-16 hours per month (though this is an average - if you are especially busy with other things one month and do less, that is fine)
- Schedule: There is no fixed schedule, so you can contribute at any time that suits you - evenings, weekends, lunchtimes, etc.
- Location: all activity will be carried out online so you can be based anywhere in the world
- Skills: see above
To register your interest, fill in the following form. If you have any questions, please get in touch directly.
[Embedded sign-up form]
Want to Dive Straight In?
Can’t wait to get started as a Data Curator? You can dive straight in and start packaging the already-selected (but not packaged) core datasets. Full instructions here:
data.okfn.org/roadmap/core-datasets#contribute