
dataset-buiding-projects


Dataset: Tartu building project register

(19th-century urban construction permits and plans)


1. Access & allowed use

This dataset is provided for:

  • ✔ hackathon research and prototyping
  • ✔ visualizations and models
  • ✔ presentations and demo publications

Not allowed:

  • ✘ commercial reuse
  • ✘ re-publishing raw data outside this repository
  • ✘ identifying private individuals beyond historical context

Citation requirement

National Archives of Estonia. Tartu Architectural Projects dataset, 2026.


2. Data structure

Main file

data/raw/*.csv
  • Delimiter: |
  • Encoding: UTF-8
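
A minimal sketch for loading a file with Python's standard `csv` module, assuming the first line is a header row. Opening with `utf-8-sig` is a defensive choice (not something this README requires): it reads plain UTF-8 and also strips a byte-order mark if one happens to be present.

```python
import csv


def load_records(path):
    """Read a pipe-delimited UTF-8 CSV into a list of dicts keyed by column name.

    Assumes the first line is a header row.
    """
    # "utf-8-sig" decodes ordinary UTF-8 and also drops a leading byte-order mark,
    # so the first column name does not start with an invisible "\ufeff".
    # newline="" lets the csv module handle line breaks inside quoted fields itself.
    with open(path, encoding="utf-8-sig", newline="") as f:
        reader = csv.DictReader(f, delimiter="|")
        return list(reader)
```

Usage: `records = load_records("data/raw/<file>.csv")`, substituting the actual file name. All values come back as strings, and empty cells come back as `""`.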

What a row represents

One building project record (permit / plan entry).

A row is not:

  • a single building
  • a finished structure
  • a full property history
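
Because one building can appear in several records (for example a permit and a later extension), counting rows is not the same as counting buildings. A rough sketch of grouping record ids by address is shown here. It assumes the location columns are named `Linn`, `Tänav` and `Maja nr`, following the key-field list below; check the real header before relying on it. Historical names are compared as plain strings, so spelling variants are not merged.

```python
from collections import defaultdict

# Assumed column names (unverified); adjust to the actual CSV header.
LOCATION_KEYS = ("Linn", "Tänav", "Maja nr")


def group_by_address(records, keys=LOCATION_KEYS):
    """Map each (city, street, house number) tuple to the ids of its project records."""
    groups = defaultdict(list)
    for rec in records:
        # Only surrounding whitespace is normalised; historical spellings are kept as-is.
        address = tuple((rec.get(k) or "").strip() for k in keys)
        if not any(address):
            continue  # skip records with no location information at all
        groups[address].append(rec.get("id"))
    return dict(groups)
```

Address groups are only a proxy: house numbers were later renumbered (see `Maja nr. (uus)`), so one plot may appear under more than one address.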

Key fields

| Field | Meaning | Notes |
|---|---|---|
| id | unique identifier | stable |
| Projekti kuupäev | project date | administrative date |
| Linn / Tänav / Maja nr | location | historical naming |
| Otstarve | intended use | planned function |
| Ehituse tüüp | project type | e.g. new construction / extension |
| Välisseina materjal | wall material | planned material |
| Korruseid | floors | planned structure |
| Omanikud | owners / commissioners | JSON-encoded persons |
| Failid | linked drawings | archival references |

Extended fields (full schema)

The dataset contains a large number of fields describing archival metadata, building characteristics, and project details. They are grouped below for clarity.


📁 Archival reference

| Field | Meaning |
|---|---|
| Fondi nimi | Archive collection name |
| Fond | Collection identifier |
| Nimistu | Inventory number |
| Säilik | File / unit |
| Lehekülg | Page |
| refcode | Archival reference code |
| Pages Sequence | Page order within file |

🖼️ Linked materials

| Field | Meaning |
|---|---|
| Failid | Linked image/drawing references |
| Krundiplaan | Site plan |
| Ehitusplaan vanas / uues | Building plan (before / after) |
| Korruseplaan vanas / uues | Floor plan |
| Lõige vanas / uues | Section drawing |
| Tänavapoolne fassaad vanas / uues | Street-facing facade |
| Fassaad24 vanas / uues | Additional facades |
| Sisustusplaan vanas / uues | Interior plan |

📍 Location

| Field | Meaning |
|---|---|
| Linn | City |
| Linnaosa | District |
| Tänav | Street (historical name) |
| Tänav (uus) | Street (new name, as of 1940) |
| Maja nr. (uus) | House number (new numbering, as of 1940) |
| Kinnistu nr. | Property number |

🏗️ Building description

| Field | Meaning |
|---|---|
| Otstarve | Intended use |
| Ehituse tüüp | Type of construction |
| Kapitaalsuse grupp | Building category / durability class |
| Välisseina materjal | Outer wall material |
| Vaheseina materjal | Inner wall material |
| Viimistlus | Finishing |
| Katusematerjal | Roof material |

⚙️ Infrastructure & amenities

| Field | Meaning |
|---|---|
| Küte | Heating |
| Vesi | Water supply |
| Soe vesi vanas / uues | Hot water |
| Vann vanas / uues | Bath |
| Kuivkäimla | Dry toilet |

🏢 Structure (before vs after)

| Field | Meaning |
|---|---|
| Korruseid vanas / uues | Number of floors |
| Soklikorrus vanas / uues | Basement level |
| Mansardkorrus vanas / uues | Attic floor |
| Solk vanas / uues | Sewage / drainage |

🏠 Apartment structure

| Field | Meaning |
|---|---|
| 1–10-toalisi kortereid vanas / uues | Number of apartments by size (1 to 10 rooms) |
| rohkem kui 10-toalisi kortereid vanas / uues | Number of apartments with more than 10 rooms |

(Note: some fields may appear duplicated or inconsistently named — this reflects the source structure.)


👥 Ownership

| Field | Meaning |
|---|---|
| Projekti tellija \ Omanikud | Project commissioner / owner(s) |

📅 Project metadata

| Field | Meaning |
|---|---|
| Projekti nr. | Project number |
| Projekti kuupäev | Project date |
| Projekti kuupäeva täpsus | Precision of the project date |
| Projekti kinnitamise kuupäev | Approval date |
| Projekti kinnitamise kuupäeva täpsus | Precision of the approval date |

📝 Additional notes

| Field | Meaning |
|---|---|
| Märkused | Free-text notes |

⚠️ Important notes about the schema

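  • Many fields exist in pairs (vanas / uues, literally "in the old" / "in the new") → they record the existing state vs the planned state within one project, not a time series

  • Not all fields are filled for all records → an empty value often means the item was not administratively relevant to that project, not necessarily that the feature was absent

  • Some column names are inconsistent or duplicated → this reflects the historical database export, not errors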
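
Because each pair shares a common prefix, the pairs can be matched programmatically rather than by hand. A minimal sketch, assuming each pair is stored as two columns whose names end in ` vanas` and ` uues` (e.g. `Korruseid vanas` and `Korruseid uues`); that exact header spelling is an assumption to verify. Blank cells become `None` rather than zero, since an empty value does not necessarily mean the feature was absent.

```python
def paired_fields(header):
    """Return {base_name: (old_column, new_column)} for matching ' vanas' / ' uues' columns.

    Assumes each pair is two columns named '<base> vanas' and '<base> uues' (unverified).
    """
    cols = set(header)
    pairs = {}
    for col in header:
        if col.endswith(" vanas"):
            base = col[: -len(" vanas")]
            new_col = base + " uues"
            if new_col in cols:  # ignore a ' vanas' column that has no partner
                pairs[base] = (col, new_col)
    return pairs


def before_after(record, pairs):
    """Return {base_name: (existing_value, planned_value)}; blank cells become None."""
    out = {}
    for base, (old_col, new_col) in pairs.items():
        old = (record.get(old_col) or "").strip() or None
        new = (record.get(new_col) or "").strip() or None
        out[base] = (old, new)
    return out
```

For example, the two values of the `Korruseid` pair show whether a project planned to change the number of floors; the second value is still the planned state, not a later observation.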


Why this structure matters

The dataset combines several kinds of information in one table:

  • archival metadata
  • spatial data
  • building descriptions
  • transformation data (before → after)
  • linked visual materials

About the owner field

The Omanikud field contains structured JSON embedded in text, for example:

[{"Eesnimi": "...", "Perenimi": "..."}]

(Eesnimi = first name, Perenimi = surname.)

This allows reconstruction of:

  • ownership networks
  • commissioners vs builders
  • institutions vs individuals

Because this structured data is stored as a string, decode it as JSON before analysis instead of treating it as free text.
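
A minimal sketch of decoding the field with the standard `json` module. It assumes a value is either empty, a JSON list of person objects as in the example above, or a single object; values that are not valid JSON are reported rather than silently dropped. Only the `Eesnimi` and `Perenimi` keys come from the example. The co-owner pairing is one possible starting point for an ownership network, not a prescribed method.

```python
import json
from itertools import combinations


def parse_owners(raw):
    """Decode one Omanikud value into a list of person dicts.

    Empty values give []. Text that is not valid JSON, or JSON that is neither
    a list nor a single object, raises ValueError.
    """
    text = (raw or "").strip()
    if not text:
        return []
    data = json.loads(text)  # json.JSONDecodeError subclasses ValueError
    if isinstance(data, dict):
        data = [data]  # tolerate a single object instead of a list
    if not isinstance(data, list):
        raise ValueError("unexpected JSON type: " + type(data).__name__)
    return [person for person in data if isinstance(person, dict)]


def full_name(person):
    """Join Eesnimi (first name) and Perenimi (surname), skipping missing parts."""
    parts = (person.get("Eesnimi"), person.get("Perenimi"))
    return " ".join(p.strip() for p in parts if isinstance(p, str) and p.strip())


def co_owner_edges(records, field="Omanikud"):
    """Return (edges, failed_ids).

    edges: (name_a, name_b, record_id) for each pair of distinct names on one record.
    failed_ids: ids of records whose field could not be decoded.
    """
    edges, failed = [], []
    for rec in records:
        try:
            people = parse_owners(rec.get(field))
        except ValueError:
            failed.append(rec.get("id"))
            continue
        names = sorted({full_name(p) for p in people} - {""})
        for a, b in combinations(names, 2):
            edges.append((a, b, rec.get("id")))
    return edges, failed
```

Names are compared as exact strings, so one person recorded with different spellings appears as separate nodes.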


About materials and structure fields

These describe the planned building, not necessarily the built one.

They reflect:

  • regulatory expectations
  • reporting practices
  • administrative requirements

Do not interpret them as a direct description of the historical cityscape.

3. Associated image materials (temporary)

Some project records have linked visual materials.

These are provided separately, only for the duration of the hackathon, and will be available via a link in this README.

These files are not part of the persistent dataset and should be treated as a temporary research environment.

Relationship to the CSV:

  • one record → zero, one, or multiple images
  • images correspond to records via identifiers in the Failid field
  • absence of an image does not mean absence of a project
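
The exact format of `Failid` values is not documented here, so this is only a sketch under stated assumptions: it accepts either a JSON list of strings or plain text separated by commas or semicolons, and it assumes an image file matches a reference by its full file name or its name without extension. Records without a matching image are kept with an empty list.

```python
import json
import re
from pathlib import Path


def parse_file_refs(raw):
    """Split a Failid value into identifiers.

    Assumed formats (unverified): a JSON list of strings, or plain text
    separated by commas or semicolons.
    """
    text = (raw or "").strip()
    if not text:
        return []
    if text.startswith("["):
        try:
            items = json.loads(text)
            return [s.strip() for s in items if isinstance(s, str) and s.strip()]
        except ValueError:
            pass  # not valid JSON: fall back to plain splitting
    return [part.strip() for part in re.split(r"[;,]", text) if part.strip()]


def link_images(records, image_dir, field="Failid"):
    """Map each record id to the local image files it references (possibly [])."""
    # Assumed naming rule: match a reference by full file name or by name without extension.
    available = {}
    for p in sorted(Path(image_dir).rglob("*")):
        if p.is_file():
            available.setdefault(p.name, p)
            available.setdefault(p.stem, p)
    links = {}
    for rec in records:
        refs = parse_file_refs(rec.get(field))
        # Records with no matching image stay in the result with an empty list.
        links[rec.get("id")] = [available[r] for r in refs if r in available]
    return links
```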

⚠ Important technical note

You should design your workflow so that results do not depend on keeping the image files locally after the hackathon. Extract features, annotations, or measurements instead of building pipelines that require long-term access to the images.
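
One way to follow this advice is to write every extracted result to a small, image-free table keyed by the record `id` and the image identifier, so the analysis can be rerun from that table after the images are deleted. The column layout here is a hypothetical example, not a required format; it reuses the pipe delimiter of the main CSV.

```python
import csv
from dataclasses import asdict, dataclass, fields
from pathlib import Path


@dataclass
class Annotation:
    """One derived observation; holds no pixel data. Field names are hypothetical."""
    record_id: str
    image_ref: str
    label: str
    value: str = ""
    note: str = ""


def append_annotations(path, annotations):
    """Append annotations to a pipe-delimited UTF-8 file, writing the header only once."""
    path = Path(path)
    write_header = not path.exists() or path.stat().st_size == 0
    names = [f.name for f in fields(Annotation)]
    with path.open("a", encoding="utf-8", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=names, delimiter="|")
        if write_header:
            writer.writeheader()
        for ann in annotations:
            writer.writerow(asdict(ann))  # values containing "|" are quoted automatically


def read_annotations(path):
    """Load annotations back without needing the original images."""
    with open(path, encoding="utf-8", newline="") as f:
        return [Annotation(**row) for row in csv.DictReader(f, delimiter="|")]
```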


🔒 Temporary access to image materials

A ZIP archive with digitized project drawings is provided.

These files are provided only for the duration of the hackathon as temporary research material.


Allowed during the event

  • ✔ viewing the images
  • ✔ analysing visual features
  • ✔ extracting measurements or annotations
  • ✔ using derived data (coordinates, counts, classifications, embeddings, notes)
  • ✔ including small illustrative excerpts in presentations

Not allowed

  • ✘ storing the images permanently on personal devices
  • ✘ uploading images to external services
  • ✘ creating reusable datasets (e.g. for model training) that contain the images
  • ✘ redistributing the images after the event
  • ✘ keeping local copies after the hackathon ends

After the hackathon, participants must delete all local copies of the image files.
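
A small helper can make the deletion step harder to forget. This is a sketch only: the folder is whatever location the ZIP was extracted to, the set of file extensions is an assumption to adjust to the actual archive contents, and the default is a dry run that only lists files. It does not reach copies in backups, cloud-sync folders or other locations, which must be removed separately.

```python
from pathlib import Path

# Assumed extensions (unverified); ".zip" covers the downloaded archive itself.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".tif", ".tiff", ".zip"}


def remove_image_copies(folder, dry_run=True, suffixes=IMAGE_SUFFIXES):
    """Return matching files under folder; delete them only when dry_run is False."""
    targets = sorted(
        p for p in Path(folder).rglob("*")
        if p.is_file() and p.suffix.lower() in suffixes
    )
    if not dry_run:
        for p in targets:
            p.unlink()
    return targets
```

Run it once with the default `dry_run=True` to review the list, then again with `dry_run=False`. Derived files such as CSV annotations are left in place.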


Derived results (allowed to keep)

  • measurements
  • annotations
  • model outputs
  • statistics
  • visualizations
  • manually redrawn diagrams

Why this restriction exists

The images are shared under a limited access agreement with the holding institution. We can enable research access, but not public redistribution.

If you are unsure whether something counts as derived data, please ask in the Issues tab before using it.