
Finno-Ugric Objects Dataset (ERM / MuIS)

Access & allowed use

This dataset is provided for:

  • hackathon research and prototyping
  • visualizations and models
  • presentations and demo publications

Not allowed:

  • commercial reuse
  • re-publishing raw data outside this repository
  • redistributing linked media in bulk outside their original source context
  • treating uncertain metadata as verified fact without qualification

Citation requirement

Estonian National Museum / MuIS open data.
Finno-Ugric collection dataset, 2026.


Data structure

Main file

finno-ugric-objects-dataset.csv (or .xlsx)

This is the main table, containing one row per object metadata record.


Related tables

  • finno-ugric-object-places.csv
  • finno-ugric-object-materials.csv
  • finno-ugric-object-collectors.csv
  • finno_ugric_object_relations.csv

These tables represent one-to-many relationships and are designed for easier analysis.
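A one-to-many table can be folded back onto the main table by grouping its rows per object. The sketch below assumes every related table carries an object_id column that matches the main table. The column name place, the helper names, and the sample rows are hypothetical.

```python
from collections import defaultdict


def group_by_object(rows, value_column):
    """Collect every value of `value_column` under its object_id."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["object_id"]].append(row[value_column])
    return dict(grouped)


def attach(main_rows, grouped, new_key):
    """Return copies of the main rows with the grouped values added as a list."""
    return [{**row, new_key: grouped.get(row["object_id"], [])} for row in main_rows]


# Made-up rows standing in for the main table and the places table.
main_rows = [
    {"object_id": "1001", "title_clean": "belt"},
    {"object_id": "1002", "title_clean": "shirt"},
]
place_rows = [
    {"object_id": "1001", "place": "Place A"},
    {"object_id": "1001", "place": "Place B"},
]

joined = attach(main_rows, group_by_object(place_rows, "place"), "place_list")
print(joined[0]["place_list"])  # ['Place A', 'Place B']
print(joined[1]["place_list"])  # []
```

Objects without a matching row get an empty list rather than being dropped, so gaps in the related tables stay visible.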


Technical details

  • Delimiter (CSV): comma (,) or semicolon (;)
  • Encoding: UTF-8
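Because the delimiter may be either a comma or a semicolon, a loader has to check which one a given file uses. This is a minimal sketch using only the standard library. It assumes the header row holds plain column names with no quoted delimiter characters. The function names and the sample rows (ids included) are made up for illustration.

```python
import csv
import io


def detect_delimiter(header_line):
    """Pick ';' or ',' depending on which occurs more often in the header row."""
    return ";" if header_line.count(";") > header_line.count(",") else ","


def parse_table(text):
    """Parse CSV text into a list of row dictionaries keyed by column name."""
    lines = text.splitlines()
    first_line = lines[0] if lines else ""
    reader = csv.DictReader(io.StringIO(text), delimiter=detect_delimiter(first_line))
    return list(reader)


def load_table(path):
    """Read a UTF-8 CSV file from disk and parse it."""
    # 'utf-8-sig' decodes UTF-8 and drops a leading byte-order mark if present;
    # newline='' lets the csv module handle line breaks inside quoted fields.
    with open(path, encoding="utf-8-sig", newline="") as handle:
        return parse_table(handle.read())


# Made-up sample: semicolon-delimited, with a comma inside a value.
sample = "object_id;title_clean\n1001;belt\n1002;shirt, embroidered\n"
rows = parse_table(sample)
print(rows[1]["title_clean"])  # shirt, embroidered
```

Checking only the header row keeps commas inside free-text fields such as description_text from skewing the choice.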

Row represents

In the main dataset:

→ one metadata record describing one museum object

Not:

  • necessarily one perfectly normalized object record
  • a complete or authoritative museum description
  • a single interpretation of place, people, or date

Main dataset fields

| Field | Meaning | Notes |
|---|---|---|
| object_id | unique object identifier | stable MuIS-based reference |
| object_url | link to object page | public source record |
| image_url | link to media page | availability may vary |
| museaali_number | museum inventory number | collection-specific identifier |
| title_clean | normalized object title | useful for grouping |
| material | raw material field | may contain multiple values |
| material_category | broader material grouping | manually assigned |
| description_type_label | type of description | e.g. comment, legend |
| description_text | free-text description | rich but uneven |
| places | place information | may contain multiple values |
| people_names | ethnonyms / associated groups | simplified field |
| collection_date_raw | original date expression | textual |
| collection_date_start | normalized start date | ISO format |
| collection_date_end | normalized end date | ISO format |
| collectors | collectors / fieldworkers | may contain multiple names |
| related_objects | linked objects | may be sparse |
| relation_type | type of relationship | heterogeneous |

1. Places

finno-ugric-object-places.csv

  • one row = one object + one place
  • extracted from the main places field

Used for:

  • mapping
  • geographic filtering
  • place normalization

2. Materials

finno-ugric-object-materials.csv

  • one row = one object + one material
  • derived from material field

Includes:

  • detailed material values (preserved)
  • original variation retained
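The same one-row-per-value shape can be reproduced from the main table when needed. The sketch below splits a multi-valued field into (object_id, value) pairs. The separator inside the field is not documented here, so the semicolon default is an assumption to check against the data. The helper name and sample rows are hypothetical.

```python
def explode_field(rows, column, separator=";"):
    """Return one (object_id, value) pair per value found in a multi-valued column."""
    pairs = []
    for row in rows:
        for value in (row.get(column) or "").split(separator):
            value = value.strip()
            if value:  # skip empty fragments such as a trailing separator
                pairs.append((row["object_id"], value))
    return pairs


# Made-up rows; the separator is an assumption, not a documented fact.
sample_rows = [
    {"object_id": "1001", "material": "wool; linen"},
    {"object_id": "1002", "material": ""},
]
print(explode_field(sample_rows, "material"))
# [('1001', 'wool'), ('1001', 'linen')]
```

Values are only trimmed, not lowercased or merged, so the original variation stays intact.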

3. Collectors

finno-ugric-object-collectors.csv

  • one row = one object + one collector

Used for:

  • analyzing fieldwork
  • mapping collector activity
  • building networks

4. Object relations

finno_ugric_object_relations.csv

  • one row = one object pair (object ↔ related object)

Used for:

  • identifying object clusters
  • reconstructing sets or collections
  • network analysis

Note:

  • only a small subset of objects contains relations (~56%)
  • relations are likely intentional, not system-generated noise
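Object clusters can be found by treating each row as an undirected link and collecting connected components. The sketch below uses a small union-find and takes plain (object, related object) tuples, since the column names in the relations file are not listed here. The ids are made up.

```python
def relation_clusters(pairs):
    """Group object ids into clusters, treating each pair as an undirected link."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving keeps lookups short
            node = parent[node]
        return node

    for left, right in pairs:
        parent[find(left)] = find(right)  # merge the two clusters

    clusters = {}
    for node in parent:
        clusters.setdefault(find(node), set()).add(node)
    return sorted((sorted(members) for members in clusters.values()), key=lambda c: c[0])


# Made-up pairs: A-B and B-C chain into one cluster, D-E forms another.
pairs = [("A", "B"), ("B", "C"), ("D", "E")]
print(relation_clusters(pairs))  # [['A', 'B', 'C'], ['D', 'E']]
```

Whether relations should be read as symmetric depends on relation_type, so the undirected treatment is an assumption worth revisiting for typed analyses.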

About description fields

The description_text field is one of the richest parts of the dataset. It may include:

  • acquisition narratives
  • collection context
  • fieldwork notes
  • use descriptions
  • provenance information

It is valuable for:

  • text mining
  • qualitative analysis

However, it is:

  • uneven
  • multilingual
  • historically layered
  • not normalized

About place and people fields

The places and people_names fields are simplified analytical extracts.

Important characteristics:

  • multiple values may appear in a single field
  • historical and modern place names may coexist
  • most locations are in present-day Russia
  • names often reflect Soviet-era transliteration
  • values range from broad regions to specific villages

These fields are:

  • useful for exploration
  • not fully standardized authority data

About museaali_number

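The museaali_number (museum inventory number) reflects archival grouping: objects that were acquired together are numbered in sequence under a shared prefix.

Example:

ERM B 193:1 → ERM B 193:13

A run like this, items 1 to 13 under ERM B 193, indicates: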

  • a shared collection or expedition
  • multiple objects belonging to the same acquisition

Useful for:

  • reconstructing collecting events
  • grouping objects by expedition
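Grouping by expedition can start from the inventory number itself. The sketch below assumes, based on the example, that the part before the colon identifies the shared series and the part after it the item. Other number formats may exist in the data. The helper names and the number ERM A 5:2 are made up.

```python
from collections import defaultdict


def split_inventory_number(number):
    """Split 'ERM B 193:1' into ('ERM B 193', '1'); without a colon the item part is empty."""
    series, colon, item = number.strip().partition(":")
    return series.strip(), (item.strip() if colon else "")


def group_by_series(numbers):
    """Map each series prefix to the list of item parts found under it."""
    groups = defaultdict(list)
    for number in numbers:
        series, item = split_inventory_number(number)
        groups[series].append(item)
    return dict(groups)


# The first two numbers come from the example; the third is invented.
numbers = ["ERM B 193:1", "ERM B 193:13", "ERM A 5:2"]
print(group_by_series(numbers))
# {'ERM B 193': ['1', '13'], 'ERM A 5': ['2']}
```

Item parts are kept as strings so that any non-numeric suffixes survive unchanged.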

About dates

The dataset includes both raw and normalized date fields.

  • collection_date_raw → original textual form
  • collection_date_start → normalized start date
  • collection_date_end → normalized end date

Patterns include:

  • full date ranges
  • partial ranges
  • single-year entries

Notes:

  • dates refer to collection, not object creation
  • year is almost always available
  • precision varies
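Since the year is the most reliable part, a year-level summary is a reasonable first step. The sketch below assumes the normalized fields begin with a four-digit year (for example 1925 or 1925-06-14) and may be blank. The function names and sample dates are made up.

```python
def year_of(value):
    """Return the leading four-digit year of an ISO-style string, or None."""
    head = (value or "").strip()[:4]
    return int(head) if len(head) == 4 and head.isdecimal() else None


def describe_span(start, end):
    """Classify a start/end pair as 'unknown', 'same year', 'inconsistent' or an N-year span."""
    first, last = year_of(start), year_of(end)
    # When only one side is filled, treat the entry as a single point in time.
    first, last = (first if first is not None else last), (last if last is not None else first)
    if first is None:
        return "unknown"
    if last < first:
        return "inconsistent"
    return "same year" if first == last else f"{last - first + 1}-year span"


print(describe_span("1911-05-01", "1913-09-30"))  # 3-year span
print(describe_span("1930", ""))                  # same year
print(describe_span("", ""))                      # unknown
```

Rows flagged as unknown or inconsistent are good candidates for checking against collection_date_raw.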

Limitations

  • metadata quality is uneven
  • image links may not always work
  • multiple images per object may exist
  • some fields are missing or partially standardized
  • object titles vary in specificity
  • material categories are analytical simplifications
  • place names may require normalization
  • duplicate object rows may exist
  • description texts may contain historically sensitive terminology
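Duplicate rows are easiest to handle when they are surfaced before anything is dropped. The sketch below counts repeated object_id values and offers a keep-first filter. The helper names and rows are made up, and it assumes object_id is the right key for spotting duplicates.

```python
from collections import Counter


def duplicate_ids(rows):
    """Return {object_id: count} for ids that appear more than once."""
    counts = Counter(row["object_id"] for row in rows)
    return {object_id: n for object_id, n in counts.items() if n > 1}


def keep_first(rows):
    """Keep only the first row seen for each object_id, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        if row["object_id"] not in seen:
            seen.add(row["object_id"])
            unique.append(row)
    return unique


# Made-up rows: the same id appears with two different description types.
records = [
    {"object_id": "1001", "description_type_label": "comment"},
    {"object_id": "1001", "description_type_label": "legend"},
    {"object_id": "1002", "description_type_label": "comment"},
]
print(duplicate_ids(records))    # {'1001': 2}
print(len(keep_first(records)))  # 2
```

Rows that share an id may still differ in other fields, as in the sample, so it is worth inspecting them before applying keep_first.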

General note

This dataset is designed for:

  • exploration
  • experimentation
  • prototyping

It is not a fully normalized museum database, but a research-friendly extract.