Finno-Ugric Objects Dataset (ERM / MuIS)
Access & allowed use
This dataset is provided for:
- hackathon research and prototyping
- visualizations and models
- presentations and demo publications
Not allowed:
- commercial reuse
- re-publishing raw data outside this repository
- redistributing linked media in bulk outside their original source context
- treating uncertain metadata as verified fact without qualification
Citation requirement
Estonian National Museum / MuIS open data.
Finno-Ugric collection dataset, 2026.
Data structure
Main file
finno-ugric-objects-dataset.csv (or .xlsx)
This is the main table, containing one row per object metadata record.
Related tables (linked via object_id)
- finno-ugric-object-places.csv
- finno-ugric-object-materials.csv
- finno-ugric-object-collectors.csv
- finno_ugric_object_relations.csv
These tables represent one-to-many relationships and are designed for easier analysis.
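A minimal sketch of attaching one of these tables back to the main table, using only the Python standard library. It assumes rows have already been read as dictionaries (for example with csv.DictReader) and that the places table stores its value in a column named place. That column name is a hypothetical placeholder, since this README does not list the related tables' columns.

```python
from collections import defaultdict


def group_by_object(rows, value_field):
    """Collect a one-to-many table into {object_id: [value, ...]}."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["object_id"]].append(row[value_field])
    return dict(grouped)


def attach(main_rows, grouped, new_field):
    """Return copies of main-table rows with a list-valued column added.

    Objects without related rows get an empty list instead of being dropped.
    """
    return [{**row, new_field: grouped.get(row["object_id"], [])} for row in main_rows]


# Usage (the column name "place" is hypothetical):
# places_by_object = group_by_object(place_rows, "place")
# objects = attach(main_rows, places_by_object, "place_list")
```

Keeping the related values as a list on each object preserves the one-to-many shape, so no main-table row is duplicated by the join.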
Technical details
- Delimiter (CSV): comma (,) or semicolon (;)
- Encoding: UTF-8
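Because either delimiter may occur, a loader can detect it instead of hard-coding one. The sketch below uses the standard library's csv.Sniffer, restricted to the two documented delimiters. The names read_table and load_csv are illustrative, and sniffing can still guess wrong on unusual files, so it is worth checking the resulting column names.

```python
import csv
import io


def read_table(text: str) -> list[dict[str, str]]:
    """Parse CSV text whose delimiter may be ',' or ';'."""
    # Sniff only a sample, and only consider the two documented delimiters.
    dialect = csv.Sniffer().sniff(text[:4096], delimiters=",;")
    return list(csv.DictReader(io.StringIO(text), dialect=dialect))


def load_csv(path: str) -> list[dict[str, str]]:
    """Read a UTF-8 file from disk and parse it with read_table."""
    # "utf-8-sig" also strips a byte-order mark if the file has one.
    with open(path, encoding="utf-8-sig", newline="") as f:
        return read_table(f.read())


# Usage:
# rows = load_csv("finno-ugric-objects-dataset.csv")
```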
What a row represents
In the main dataset:
→ one metadata record describing one museum object
Not:
- necessarily one perfectly normalized object record
- a complete or authoritative museum description
- a single interpretation of place, people, or date
Main dataset fields
| Field | Meaning | Notes |
|---|---|---|
| object_id | unique object identifier | stable MuIS-based reference |
| object_url | link to object page | public source record |
| image_url | link to media page | availability may vary |
| museaali_number | museum inventory number | collection-specific identifier |
| title_clean | normalized object title | useful for grouping |
| material | raw material field | may contain multiple values |
| material_category | broader material grouping | manually assigned |
| description_type_label | type of description | e.g. comment, legend |
| description_text | free-text description | rich but uneven |
| places | place information | may contain multiple values |
| people_names | ethnonyms / associated groups | simplified field |
| collection_date_raw | original date expression | textual |
| collection_date_start | normalized start date | ISO format |
| collection_date_end | normalized end date | ISO format |
| collectors | collectors / fieldworkers | may contain multiple names |
| related_objects | linked objects | may be sparse |
| relation_type | type of relationship | heterogeneous |
Related tables explained
1. Places
finno-ugric-object-places.csv
- one row = one object + one place
- extracted from the places field of the main table
Used for:
- mapping
- geographic filtering
- place normalization
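As an illustration of geographic filtering, the sketch below counts distinct objects per place value and selects objects by a case-insensitive substring. It assumes a column named place in the places table (hypothetical). Since historical and modern names coexist, substring matching is only a rough first pass before any place normalization.

```python
from collections import Counter


def objects_per_place(place_rows):
    """Count distinct objects per (trimmed) place value."""
    # A set of (object_id, place) pairs means a repeated row is counted once.
    pairs = {(row["object_id"], row["place"].strip()) for row in place_rows}
    return Counter(place for _, place in pairs)


def filter_by_place(place_rows, needle):
    """Return the object_ids whose place value contains `needle`, ignoring case."""
    needle = needle.casefold()
    return {row["object_id"] for row in place_rows if needle in row["place"].casefold()}
```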
2. Materials
finno-ugric-object-materials.csv
- one row = one object + one material
- derived from the material field of the main table
Includes:
- detailed material values (preserved)
- original variation retained
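To inspect the preserved variation, raw material values can be grouped under a light normalization key while every original spelling is kept. The column name material is an assumption about the materials table, and the trim-and-casefold key is an illustrative choice, not the dataset's own material_category grouping.

```python
from collections import Counter, defaultdict


def material_variants(material_rows):
    """Map a normalized key to a Counter of the raw spellings behind it."""
    variants = defaultdict(Counter)
    for row in material_rows:
        raw = row["material"]
        # The key only aids grouping; the raw value is what gets counted.
        key = raw.strip().casefold()
        variants[key][raw] += 1
    return dict(variants)
```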
3. Collectors
finno-ugric-object-collectors.csv
- one row = one object + one collector
Used for:
- analyzing fieldwork
- mapping collector activity
- building networks
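For network building, one common approach is a co-occurrence graph: two collectors are linked when they appear on the same object, and the link is weighted by the number of shared objects. This sketch assumes the collectors table has a collector column (hypothetical) and returns an edge list that could be loaded into any graph tool.

```python
from collections import Counter, defaultdict
from itertools import combinations


def collector_edges(collector_rows):
    """Return {(collector_a, collector_b): number_of_shared_objects}."""
    by_object = defaultdict(set)
    for row in collector_rows:
        name = row["collector"].strip()
        if name:
            by_object[row["object_id"]].add(name)
    edges = Counter()
    for names in by_object.values():
        # Sorting gives each undirected pair one canonical (a, b) key.
        for a, b in combinations(sorted(names), 2):
            edges[(a, b)] += 1
    return edges
```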
4. Object relations
finno_ugric_object_relations.csv
- one row = one object pair (object ↔ related object)
Used for:
- identifying object clusters
- reconstructing sets or collections
- network analysis
Note:
- only a small subset of objects contains relations (~5–6%)
- relations are likely intentional, not system-generated noise
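Object clusters can be recovered as connected components of the relation pairs. The sketch below uses a small union-find. The column names object_id and related_object_id are assumptions about the relations table, and relation_type is ignored, so every relation counts as a link.

```python
def object_clusters(relation_rows):
    """Group related objects into connected components using union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for row in relation_rows:
        a = find(row["object_id"])
        b = find(row["related_object_id"])
        if a != b:
            parent[b] = a  # merge the two components

    clusters = {}
    for node in list(parent):
        clusters.setdefault(find(node), set()).add(node)
    return list(clusters.values())
```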
About description fields
The description_text field is one of the richest parts of the dataset. It may include:
- acquisition narratives
- collection context
- fieldwork notes
- use descriptions
- provenance information
It is valuable for:
- text mining
- qualitative analysis
However, it is:
- uneven
- multilingual
- historically layered
- not normalized
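For a first pass at text mining, a plain keyword search is often enough to locate relevant records. This is a minimal sketch: matching is a case-insensitive substring test, and because the descriptions are multilingual and unnormalized, keyword variants should be supplied for each language of interest.

```python
def search_descriptions(rows, keywords):
    """Return (object_id, matched_keywords) for descriptions mentioning any keyword."""
    wanted = [k.casefold() for k in keywords]
    hits = []
    for row in rows:
        # Missing descriptions are treated as empty text.
        text = (row.get("description_text") or "").casefold()
        found = [k for k in wanted if k in text]
        if found:
            hits.append((row["object_id"], found))
    return hits
```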
About place and people fields
The places and people_names fields are simplified analytical extracts.
Important characteristics:
- multiple values may appear in a single field
- historical and modern place names may coexist
- most locations are in present-day Russia
- names often reflect Soviet-era transliteration
- values range from broad regions to specific villages
These fields are:
- useful for exploration
- not fully standardized authority data
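Where a multi-valued field has no companion table (for example people_names), it can be split in code. The separator is not documented in this README, so the ";" default below is an assumption to verify against the data before use.

```python
def split_multi(value, separator=";"):
    """Split a multi-valued field into trimmed, de-duplicated parts, keeping order."""
    parts = (p.strip() for p in (value or "").split(separator))
    # dict.fromkeys drops repeats while preserving first-seen order.
    return list(dict.fromkeys(p for p in parts if p))
```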
About museaali_number
The museaali_number reflects archival grouping.
Example: the items ERM B 193:1 → ERM B 193:13 share the base number ERM B 193.
This indicates:
- a shared collection or expedition
- multiple objects belonging to the same acquisition
Useful for:
- reconstructing collecting events
- grouping objects by expedition
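To group objects by collecting event, the trailing item number after the final colon can be stripped to obtain a base number (for example ERM B 193). The pattern below is a hypothetical reading of the numbering shown in the example. Numbers that do not match it are kept as their own group instead of being discarded.

```python
import re
from collections import defaultdict

# Hypothetical pattern "<base>:<item>", e.g. "ERM B 193:13" -> base "ERM B 193".
MUSEAALI_RE = re.compile(r"^(?P<base>.+?):(?P<item>\d+)$")


def group_by_base_number(rows):
    """Map each base number to the object_ids that share it."""
    groups = defaultdict(list)
    for row in rows:
        number = row["museaali_number"].strip()
        match = MUSEAALI_RE.match(number)
        base = match["base"] if match else number
        groups[base].append(row["object_id"])
    return dict(groups)
```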
About dates
The dataset includes both raw and normalized date fields.
- collection_date_raw → original textual form
- collection_date_start → normalized start date
- collection_date_end → normalized end date
Patterns include:
- full date ranges
- partial ranges
- single-year entries
Notes:
- dates refer to collection, not object creation
- year is almost always available
- precision varies
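Since precision varies but the year is almost always present, a practical analytic unit is the collection year range. This sketch reads only the leading four-digit year of each normalized field and fills a missing bound from the other one; the function names are illustrative. These are collection years, not creation years.

```python
def _leading_year(value):
    """Return the first four characters as an int if they are all digits."""
    head = (value or "").strip()[:4]
    return int(head) if len(head) == 4 and head.isdigit() else None


def collection_years(row):
    """Return (start_year, end_year) for a main-table row, or None if no year."""
    start = _leading_year(row.get("collection_date_start"))
    end = _leading_year(row.get("collection_date_end"))
    if start is None and end is None:
        return None
    # Partial ranges and single-year entries: reuse whichever bound exists.
    if start is None:
        start = end
    if end is None:
        end = start
    return start, end
```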
Limitations
- metadata quality is uneven
- image links may not always work
- multiple images per object may exist
- some fields are missing or partially standardized
- object titles vary in specificity
- material categories are analytical simplifications
- place names may require normalization
- duplicate object rows may exist
- description texts may contain historically sensitive terminology
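A simple way to handle duplicate rows is to keep the first row per object_id and report how many were dropped. This is a minimal sketch: duplicates may differ in other columns (for instance when an object has several images), so it is worth comparing the dropped rows before discarding them.

```python
def dedupe_by_object_id(rows):
    """Keep the first row for each object_id; return (kept_rows, dropped_count)."""
    first = {}
    for row in rows:
        # setdefault only stores the row the first time an object_id is seen.
        first.setdefault(row["object_id"], row)
    return list(first.values()), len(rows) - len(first)
```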
General note
This dataset is designed for:
- exploration
- experimentation
- prototyping
It is not a fully normalized museum database, but a research-friendly extract.