dataset-buiding-projects
Dataset: Tartu building project register
(19th-century urban construction permits and plans)
1. Access & allowed use
This dataset is provided for:
- ✔ hackathon research and prototyping
- ✔ visualizations and models
- ✔ presentations and demo publications
Not allowed:
- ✘ commercial reuse
- ✘ re-publishing raw data outside this repository
- ✘ identifying private individuals beyond historical context
Citation requirement
National Archives of Estonia. Tartu Architectural Projects dataset, 2026.
2. Data structure
Main file
data/raw/*.csv
- Delimiter: pipe (`|`)
- Encoding: UTF-8
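A minimal loading sketch using only the Python standard library. It assumes the pipe delimiter and UTF-8 encoding listed above and the `data/raw/*.csv` location. The helper names `read_projects` and `load_all` are hypothetical, not part of the dataset.

```python
import csv
import glob
from typing import Dict, Iterable, List


def read_projects(lines: Iterable[str], delimiter: str = "|") -> List[Dict[str, str]]:
    """Parse project records from any iterable of text lines (an open file works).

    Each returned dict maps a column header such as "id" or "Otstarve" to its raw
    string value; blank cells come back as "".
    """
    return [dict(row) for row in csv.DictReader(lines, delimiter=delimiter)]


def load_all(pattern: str = "data/raw/*.csv", delimiter: str = "|") -> List[Dict[str, str]]:
    """Load and concatenate every CSV file matching `pattern`."""
    records: List[Dict[str, str]] = []
    for path in sorted(glob.glob(pattern)):
        # newline="" is what the csv module expects; switch to "utf-8-sig"
        # if a file turns out to start with a byte-order mark.
        with open(path, encoding="utf-8", newline="") as handle:
            records.extend(read_projects(handle, delimiter))
    return records
```

All values stay as strings. Convert numbers and dates explicitly, because many cells are blank by design.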
Row represents
One building project record (permit / plan entry)
Not:
- a single building
- a finished structure
- a full property history
Key fields
| Field | Meaning | Notes |
|---|---|---|
| id | unique identifier | stable |
| Projekti kuupäev | project date | administrative date |
| Linn / Tänav / Maja nr | location | historical naming |
| Otstarve | intended use | planned function |
| Ehituse tüüp | project type | e.g. new construction / extension |
| Välisseina materjal | wall material | planned material |
| Korruseid | floors | planned structure |
| Omanikud | owners / commissioners | JSON-encoded persons |
| Failid | linked drawings | archival references |
Extended fields (full schema)
The dataset contains a large number of fields describing archival metadata, building characteristics, and project details. They are grouped below for clarity.
📁 Archival reference
| Field | Meaning |
|---|---|
| Fondi nimi | Archive collection name |
| Fond | Collection identifier |
| Nimistu | Inventory number |
| Säilik | File / unit |
| Lehekülg | Page |
| refcode | Archival reference code |
| Pages Sequence | Page order within file |
🖼️ Linked materials
| Field | Meaning |
|---|---|
| Failid | Linked image/drawing references |
| Krundiplaan | Site plan |
| Ehitusplaan vanas / uues | Building plan (before / after) |
| Korruseplaan vanas / uues | Floor plan |
| Lõige vanas / uues | Section drawing |
| Tänavapoolne fassaad vanas / uues | Street-facing facade |
| Fassaad2–4 vanas / uues | Additional facades |
| Sisustusplaan vanas / uues | Interior plan |
📍 Location
| Field | Meaning |
|---|---|
| Linn | City |
| Linnaosa | District |
| Tänav | Street (historical) |
| Tänav (uus) | Street (new name, as of 1940) |
| Maja nr. (uus) | House number (new, as of 1940) |
| Kinnistu nr. | Property number |
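Because street names and house numbers changed over time, a rough way to bring together records filed under the same location is to prefer the 1940 address when it is filled in and otherwise fall back to the historical one. This is a sketch under assumptions. The exact header of the historical house-number column is a guess (`HISTORICAL_HOUSE_NR`), and a shared address key does not prove that the records concern the same building.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Column names from the Location table. HISTORICAL_HOUSE_NR is an assumption:
# the README only lists "Maja nr" among the key fields, so check the real header.
HISTORICAL_HOUSE_NR = "Maja nr."

Address = Tuple[str, str, str]  # (city, street, house number)


def _clean(value: Optional[str]) -> str:
    """Trim whitespace and turn missing cells into empty strings."""
    return (value or "").strip()


def address_key(record: Dict[str, str], prefer_new: bool = True) -> Address:
    """Return (city, street, house number), using the 1940 names when both are present."""
    city = _clean(record.get("Linn"))
    old = (_clean(record.get("Tänav")), _clean(record.get(HISTORICAL_HOUSE_NR)))
    new = (_clean(record.get("Tänav (uus)")), _clean(record.get("Maja nr. (uus)")))
    street, number = new if prefer_new and all(new) else old
    return (city, street, number)


def group_by_address(records: List[Dict[str, str]]) -> Dict[Address, List[str]]:
    """Map each address key to the ids of the project records filed under it."""
    groups: Dict[Address, List[str]] = defaultdict(list)
    for record in records:
        groups[address_key(record)].append(_clean(record.get("id")))
    return dict(groups)
```

Pass `prefer_new=False` to group strictly by the historical naming instead.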
🏗️ Building description
| Field | Meaning |
|---|---|
| Otstarve | Intended use |
| Ehituse tüüp | Type of construction |
| Kapitaalsuse grupp | Building category / durability class |
| Välisseina materjal | Outer wall material |
| Vaheseina materjal | Inner wall material |
| Viimistlus | Finishing |
| Katusematerjal | Roof material |
⚙️ Infrastructure & amenities
| Field | Meaning |
|---|---|
| Küte | Heating |
| Vesi | Water supply |
| Soe vesi vanas / uues | Hot water |
| Vann vanas / uues | Bath |
| Kuivkäimla | Dry toilet |
🏢 Structure (before vs after)
| Field | Meaning |
|---|---|
| Korruseid vanas / uues | Number of floors |
| Soklikorrus vanas / uues | Basement level |
| Mansardkorrus vanas / uues | Attic floor |
| Solk vanas / uues | Sewage / drainage |
🏠 Apartment structure
| Field | Meaning |
|---|---|
| 1–10-toalisi kortereid vanas / uues | Number of apartments by size |
| rohkem kui 10-toalisi kortereid vanas / uues | Large apartments |
(Note: some fields may appear duplicated or inconsistently named — this reflects the source structure.)
👥 Ownership
| Field | Meaning |
|---|---|
| Projekti tellija \ Omanikud | Project commissioner / owner(s) |
📅 Project metadata
| Field | Meaning |
|---|---|
| Projekti nr. | Project number |
| Projekti kuupäev | Project date |
| Projekti kuupäeva täpsus | Date precision |
| Projekti kinnitamise kuupäev | Approval date |
| Projekti kinnitamise kuupäeva täpsus | Approval date precision |
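The separate precision columns suggest that not every date is exact to the day. One way to handle this is to keep partial dates partial instead of filling in a default day. The sketch below assumes dates are stored as `YYYY`, `YYYY-MM` or `YYYY-MM-DD`. This format is an assumption, so inspect `Projekti kuupäev` and adjust `DATE_PATTERN` to match the real export.

```python
import re
from typing import Optional, Tuple

# ASSUMPTION: dates are stored as "YYYY", "YYYY-MM" or "YYYY-MM-DD". The actual export
# may use another layout (e.g. "DD.MM.YYYY"); inspect the column and adjust the pattern.
DATE_PATTERN = re.compile(r"^(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?$")

PartialDate = Tuple[int, Optional[int], Optional[int]]  # (year, month, day)


def parse_partial_date(value: Optional[str]) -> Optional[PartialDate]:
    """Parse a possibly incomplete date; parts that are not given become None."""
    match = DATE_PATTERN.match((value or "").strip())
    if match is None:
        return None  # blank or unrecognised: keep the raw string for manual review
    year, month, day = match.groups()
    return (int(year), int(month) if month else None, int(day) if day else None)
```

Read the result together with `Projekti kuupäeva täpsus` rather than assuming day-level precision whenever a day happens to be present.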
📝 Additional notes
| Field | Meaning |
|---|---|
| Märkused | Free-text notes |
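Many of these fields are sparsely populated, so it can help to measure how often each column is actually filled before choosing which ones to analyse. The following minimal sketch works on records loaded as dicts (for example with `csv.DictReader`). The function name `fill_rates` is hypothetical.

```python
from collections import Counter
from typing import Dict, List


def fill_rates(records: List[dict]) -> Dict[str, float]:
    """Share of records (0.0 to 1.0) that have a non-blank value in each column."""
    if not records:
        return {}
    filled: Counter = Counter()
    columns = set()
    for record in records:
        for column, value in record.items():
            if column is None:  # csv.DictReader puts surplus cells under the key None
                continue
            columns.add(column)
            if value is not None and str(value).strip():
                filled[column] += 1
    return {column: filled[column] / len(records) for column in sorted(columns)}
```

A low fill rate describes recording practice, not the buildings themselves: a blank cell is not the same as a zero.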
⚠️ Important notes about the schema
- Many fields exist in pairs (vanas / uues) → they represent the state before the project versus the planned state, not a time series
- Not all fields are filled for all records → a missing value often means the field was not administratively relevant for that project
- Some column names are inconsistent or duplicated → this reflects the historical database export, not errors in the data
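To work with the before/planned pairs, one option is to collect them per record and compare the two sides within that record. This sketch assumes each pair is stored as two columns named `<name> vanas` and `<name> uues` (e.g. `Korruseid vanas` / `Korruseid uues`). Check the real headers, since the note above warns that naming is not always consistent.

```python
from typing import Dict, Optional, Tuple

# ASSUMPTION: each pair is stored as two columns, "<name> vanas" (before) and
# "<name> uues" (planned), e.g. "Korruseid vanas" / "Korruseid uues".
OLD_SUFFIX, NEW_SUFFIX = " vanas", " uues"


def paired_fields(record: dict) -> Dict[str, Tuple[str, str]]:
    """Collect {base name: (before value, planned value)} for every vanas/uues pair."""
    pairs: Dict[str, Tuple[str, str]] = {}
    for column, value in record.items():
        if isinstance(column, str) and column.endswith(OLD_SUFFIX):
            base = column[: -len(OLD_SUFFIX)]
            planned_column = base + NEW_SUFFIX
            if planned_column in record:
                pairs[base] = ((value or "").strip(), (record[planned_column] or "").strip())
    return pairs


def numeric_change(before: str, planned: str) -> Optional[int]:
    """Planned minus existing count (e.g. floors); None if either side is blank or non-numeric.

    A blank "before" value is deliberately not treated as 0: the field may simply
    not have been recorded for that project.
    """
    try:
        return int(planned) - int(before)
    except ValueError:
        return None
```

The difference describes what a single project proposed, not a change observed between two points in time.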
Why this structure matters
The dataset contains:
- archival metadata
- spatial data
- building descriptions
- transformation data (before → after)
- linked visual materials
About the owner field
The Omanikud field contains structured JSON embedded in text, for example:
`[{"Eesnimi": "...", "Perenimi": "..."}]`
This allows reconstruction of:
- ownership networks
- commissioners vs builders
- institutions vs individuals
Parse the field as JSON rather than treating it as free text.
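A minimal parsing sketch with Python's standard `json` module. It relies only on the `Eesnimi` (first name) and `Perenimi` (surname) keys shown in the example above. Institutional commissioners may use other keys, and the tolerance for a single object instead of a list is a defensive assumption, not documented behaviour.

```python
import json
from typing import Dict, List, Optional


def parse_owners(raw: Optional[str]) -> List[Dict[str, str]]:
    """Decode the JSON list stored in an Omanikud cell; return [] for blank or malformed cells."""
    text = (raw or "").strip()
    if not text:
        return []
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return []  # keep the raw string elsewhere if you need to inspect these cases
    if isinstance(data, dict):  # defensive: accept a single object instead of a list
        data = [data]
    if not isinstance(data, list):
        return []
    return [person for person in data if isinstance(person, dict)]


def full_name(person: Dict[str, str]) -> str:
    """Join first name (Eesnimi) and surname (Perenimi), skipping missing parts."""
    parts = [person.get("Eesnimi") or "", person.get("Perenimi") or ""]
    return " ".join(part.strip() for part in parts if part.strip())
```

Applied per record, `parse_owners` yields the person lists from which ownership networks can be built, for example by linking people who appear on the same project.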
About materials and structure fields
These describe the planned building, not necessarily the built one.
They reflect:
- regulatory expectations
- reporting practices
- administrative requirements
Do not interpret them as a direct description of the historical cityscape.
3. Associated image materials (temporary)
Some project records have linked visual materials. These will be provided separately, for the duration of the hackathon only, via a link in this README.
These files are not part of the persistent dataset and should be treated as a temporary research environment.
Relationship to the CSV:
- one record → zero, one, or multiple images
- images correspond to records via identifiers in the Failid field
- absence of an image does not mean absence of a project
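The one-to-many relationship above can be captured as a simple index from image reference to record ids. This is a sketch under a loudly stated assumption. The format of the Failid cell is not documented here, so the separator pattern (`;` or `,`) is hypothetical. If the cell turns out to hold JSON, like Omanikud, parse it with `json.loads` instead.

```python
import re
from collections import defaultdict
from typing import Dict, List

# ASSUMPTION (hypothetical): several references in one Failid cell are separated
# by ";" or ","; inspect the data and adjust this pattern.
FAILID_SEPARATOR = re.compile(r"[;,]")


def image_refs(record: dict) -> List[str]:
    """Split a record's Failid cell into individual references (empty list if none)."""
    raw = record.get("Failid") or ""
    return [ref.strip() for ref in FAILID_SEPARATOR.split(raw) if ref.strip()]


def index_images(records: List[dict]) -> Dict[str, List[str]]:
    """Map each image reference to the ids of all records that point to it."""
    index: Dict[str, List[str]] = defaultdict(list)
    for record in records:
        for ref in image_refs(record):
            index[ref].append(record.get("id", ""))
    return dict(index)
```

Records for which `image_refs` returns an empty list are still valid projects and should stay in the analysis.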
⚠ Important technical note
You should design your workflow so that results do not depend on keeping the image files locally after the hackathon. Extract features, annotations, or measurements instead of building pipelines that require long-term access to the images.
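One way to follow this advice is to store only what you extracted, keyed by record id and image reference, in a small text file that can be kept after the images are deleted. The row layout below (`id`, `image_ref`, `feature`, `value`) and the function name are hypothetical examples, not a required format.

```python
import csv
import io
from typing import Iterable, Tuple

# (record id, image reference, feature name, value); the layout is a hypothetical example
DerivedRow = Tuple[str, str, str, str]


def derived_to_csv(rows: Iterable[DerivedRow]) -> str:
    """Serialize derived observations as CSV text with a header row.

    Only identifiers and extracted values are stored, never pixel data, so the result
    can be kept after the image archive has been deleted.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["id", "image_ref", "feature", "value"])
    writer.writerows(rows)
    return buffer.getvalue()
```

Write the returned text to a file opened with `encoding="utf-8"`. Rerunning your analysis later then needs only this file and the CSV dataset.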
🔒 Temporary access to image materials
A ZIP archive with digitized project drawings is provided.
These files are provided only for the duration of the hackathon as temporary research material.
Allowed during the event
- ✔ viewing the images
- ✔ analysing visual features
- ✔ extracting measurements or annotations
- ✔ using derived data (coordinates, counts, classifications, embeddings, notes)
- ✔ including small illustrative excerpts in presentations
Not allowed
- ✘ storing the images permanently on personal devices
- ✘ uploading images to external services
- ✘ building reusable training datasets that contain the images
- ✘ redistributing the images after the event
- ✘ keeping local copies after the hackathon ends
After the hackathon, participants must delete all local copies of the image files.
Derived results (allowed to keep)
- measurements
- annotations
- model outputs
- statistics
- visualizations
- manually redrawn diagrams
Why this restriction exists
The images are shared under a limited access agreement with the holding institution. We can enable research access, but not public redistribution.
If you are unsure whether something counts as derived data, please ask in the Issues tab before using it.