This dataset contains hourly measurements of PM10 concentrations collected at air quality monitoring stations across Poland. The data is stored in an Excel workbook (`.xlsx`) with multiple sheets: each measurement sheet corresponds to a specific year and pollutant and contains time-series data, while one additional sheet (Stations) contains metadata about the monitoring stations.
In addition, the station coordinates can be spatially joined with land use and land cover data from OpenStreetMap (OSM) to support richer spatial analysis.
The dataset also includes detailed hourly weather observations from airport meteorological stations, sourced from the NOAA Integrated Surface Database (ISD). Each row represents a single observation recorded at a specific date and time. Observations use the FM-12 (SYNOP) and FM-15 (METAR) report formats and include multiple meteorological variables, some of which are stored as compound encoded strings.
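To illustrate the one-observation-per-row layout, the sketch below reads ISD rows with Python's `csv` module and keeps only selected report types. It assumes a CSV export with columns named `STATION`, `DATE`, `REPORT_TYPE` and `WND` (the layout of NOAA's global-hourly CSV files); these column names and the sample rows are hypothetical and should be checked against the actual files.

```python
import csv
import io
from datetime import datetime

# Hypothetical sample in an assumed ISD CSV layout (not real data).
SAMPLE = """STATION,DATE,REPORT_TYPE,WND
12345099999,2023-01-01T00:00:00,FM-12,"250,1,N,0051,1"
12345099999,2023-01-01T00:30:00,FM-15,"240,1,N,0046,1"
12345099999,2023-01-01T01:00:00,FM-15,"999,9,C,0000,1"
"""


def read_observations(text, report_types=("FM-12", "FM-15")):
    """Yield (station, timestamp, row) for each row whose report type is kept."""
    for row in csv.DictReader(io.StringIO(text)):
        # Strip whitespace in case report-type codes are padded.
        if row["REPORT_TYPE"].strip() in report_types:
            yield row["STATION"], datetime.fromisoformat(row["DATE"]), row


# Keep only the FM-15 (METAR) observations.
metar_rows = list(read_observations(SAMPLE, report_types=("FM-15",)))
```

For a real file, `io.StringIO(text)` would be replaced by a file opened with an explicit encoding, and `report_types=("FM-12",)` would select the SYNOP rows instead.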
The Stations sheet provides metadata for every monitoring station that has appeared in the dataset over the years, with one row per station. The columns include:
Each remaining sheet in the Excel workbook contains hourly measurements for a given year and pollutant. The sheet naming convention is as follows:
`<Year>_<Pollutant>_<Frequency>`
Example: `2023_PM10_1h` contains PM10 measurements for 2023, recorded hourly.
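Because the sheet names encode year, pollutant and frequency, they can be parsed to pick out the measurement sheets. A minimal sketch, assuming neither the pollutant nor the frequency part contains an underscore; `parse_sheet_name` is a hypothetical helper:

```python
import re

# <Year>_<Pollutant>_<Frequency>, e.g. "2023_PM10_1h".
SHEET_NAME = re.compile(r"^(?P<year>\d{4})_(?P<pollutant>[^_]+)_(?P<frequency>[^_]+)$")


def parse_sheet_name(name):
    """Return (year, pollutant, frequency), or None for non-measurement sheets."""
    match = SHEET_NAME.match(name)
    if match is None:
        return None  # e.g. the Stations metadata sheet
    return int(match["year"]), match["pollutant"], match["frequency"]


sheet_names = ["Stations", "2022_PM10_1h", "2023_PM10_1h"]
measurement_sheets = {
    name: parse_sheet_name(name) for name in sheet_names if parse_sheet_name(name)
}
```

With pandas, `pandas.read_excel(path, sheet_name=None)` returns a dict of DataFrames keyed by sheet name, so the same function can be applied to its keys to skip the Stations sheet.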
Different years may include different sets of stations, depending on their operational periods.
Missing values may occur due to sensor downtime or data loss.
Station openings and closures are therefore visible in which stations appear in each yearly measurement sheet.
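Since station coverage varies by year, comparing the sets of stations present in consecutive yearly sheets shows which stations were added or dropped. The sketch below assumes the station identifiers have already been extracted from each sheet (the exact sheet layout is not described here) and uses hypothetical station codes. A station missing from a sheet may reflect a data gap rather than a real closure, so results are worth cross-checking against the Stations sheet.

```python
def station_changes(stations_by_year):
    """For each year after the first, list stations added and dropped vs. the previous year."""
    changes = {}
    years = sorted(stations_by_year)
    for previous, current in zip(years, years[1:]):
        before = stations_by_year[previous]
        after = stations_by_year[current]
        changes[current] = {
            "opened": sorted(after - before),  # present now, absent the year before
            "closed": sorted(before - after),  # present the year before, absent now
        }
    return changes


# Hypothetical station codes per yearly sheet.
example = {
    2021: {"StationA", "StationB"},
    2022: {"StationA", "StationB", "StationC"},
    2023: {"StationA", "StationC"},
}
changes = station_changes(example)
```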
The station coordinates (latitude/longitude) in the Stations sheet can be used to spatially enrich the dataset using external geospatial data such as land use information from OpenStreetMap (OSM).
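Format: an OSM extract in `.pbf` (Protocolbuffer Binary Format), readable by GIS tools such as pyrosm, osmium and osmosis (osmnx reads `.osm` XML rather than `.pbf`, so the extract may need converting first, e.g., with osmium).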
Relevant layers/tags:
landuse=* (e.g., residential, industrial, forest, grass)
natural=* (e.g., wood, water)
building=* (e.g., commercial, school)
highway=* (road network, e.g., for computing proximity to roads)
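The enrichment step amounts to a spatial join: find the OSM land-use polygon that contains each station and measure how far the station is from the road network. The dependency-free sketch below assumes the polygons and road vertices have already been extracted from the `.pbf` file (for example with pyrosm) as lists of `(lon, lat)` pairs; all function names are hypothetical. It approximates road proximity by the distance to the nearest road vertex rather than the nearest road segment, and it ignores polygon holes.

```python
import math


def point_in_polygon(lon, lat, ring):
    """Ray-casting test: True if (lon, lat) lies inside a ring of (lon, lat) vertices."""
    inside = False
    for i in range(len(ring)):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % len(ring)]
        # Toggle on each edge that crosses the horizontal line through the point
        # to the east of the point.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres, assuming a spherical Earth of radius 6371 km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * 6_371_000 * math.asin(math.sqrt(a))


def enrich_station(lat, lon, landuse_polygons, road_vertices):
    """Return the land-use tag containing the station and the distance to the nearest road vertex."""
    landuse = next(
        (tag for tag, ring in landuse_polygons if point_in_polygon(lon, lat, ring)),
        None,
    )
    nearest_m = min(
        (haversine_m(lat, lon, v_lat, v_lon) for v_lon, v_lat in road_vertices),
        default=None,
    )
    return {"landuse": landuse, "nearest_road_vertex_m": nearest_m}


# Hypothetical inputs; coordinates are (lon, lat) pairs.
landuse_polygons = [
    ("residential", [(21.00, 52.20), (21.02, 52.20), (21.02, 52.22), (21.00, 52.22)]),
]
road_vertices = [(21.01, 52.21)]
inside = enrich_station(52.21, 21.01, landuse_polygons, road_vertices)
outside = enrich_station(52.30, 21.01, landuse_polygons, road_vertices)
```

In a full pipeline, `geopandas.sjoin` on GeoDataFrames would typically replace the hand-written point-in-polygon test and handle holes and projections more robustly.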
Most of the remaining columns hold compound encoded values: comma-separated subfields that follow specific WMO/NOAA formats, for example WND (wind), CIG (sky ceiling) and VIS (visibility).
Additional columns may hold specialized measurements, such as sky condition, precipitation or runway visibility, depending on availability:
Note: Some fields may contain placeholder values (99999, 99, etc.) indicating missing or unrecorded data.
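As an example of decoding a compound field, the sketch below unpacks `WND`, which (per the NOAA ISD format documentation) holds five subfields: wind direction, direction quality code, observation type code, wind speed and speed quality code. The layout and the sentinels used here (999 for direction, 9999 for speed) should be verified against the ISD documentation version that matches the data; `decode_wnd` is a hypothetical helper.

```python
def decode_wnd(value):
    """Decode an ISD WND value such as "250,1,N,0051,1" into its five subfields."""
    direction, direction_quality, type_code, speed, speed_quality = (
        part.strip() for part in value.split(",")
    )
    return {
        # Degrees clockwise from true north; 999 marks a missing direction.
        "direction_deg": None if direction == "999" else int(direction),
        "direction_quality": direction_quality,
        # Observation type, e.g. N = normal, C = calm, V = variable.
        "type": type_code,
        # Metres per second scaled by 10; 9999 marks a missing speed.
        "speed_ms": None if speed == "9999" else int(speed) / 10,
        "speed_quality": speed_quality,
    }


normal = decode_wnd("250,1,N,0051,1")
calm = decode_wnd("999,9,C,0000,1")
```

For calm observations the direction is typically reported as 999 with type `C`, so a `None` direction there means "no direction" rather than a lost reading.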
Encoding Issues: Some characters in the file may appear corrupted (e.g., 퍎�, �), typically because of an encoding mismatch or locale settings when the file is read. Affected values should be validated or excluded during preprocessing.
Data Format: Compound values must be split and decoded according to the NOAA ISD format documentation.
Missing Data: Fields with 99999 or similar values should be treated as missing.
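The preprocessing rules above can be combined into one row-cleaning step. This is a sketch under stated assumptions: rows are dicts of strings, the caller lists which columns are compound, and any subfield consisting of three or more 9s (optionally signed) is treated as a missing-value sentinel. Single-digit codes such as `9` are left alone because ISD also uses them as quality and type codes, so two-digit placeholders like `99` would need field-specific rules. The encoding check only catches the Unicode replacement character U+FFFD (`�`); other mojibake, such as unexpected Hangul characters, needs separate validation.

```python
import re

# Assumed heuristic: three or more 9s, optionally signed, marks a missing value
# (e.g. 999, 9999, +9999, 99999, 999999).
SENTINEL = re.compile(r"^[+-]?9{3,}$")
# U+FFFD is what Python inserts when decoding with errors="replace".
REPLACEMENT_CHAR = "\ufffd"


def clean_subfields(value):
    """Split a compound field on commas, replacing sentinel subfields with None."""
    parts = [part.strip() for part in value.split(",")]
    return [None if SENTINEL.match(part) else part for part in parts]


def clean_row(row, compound_columns):
    """Return a cleaned copy of the row, or None if any value shows encoding damage."""
    if any(REPLACEMENT_CHAR in str(value) for value in row.values()):
        return None  # exclude corrupted rows rather than guessing their content
    cleaned = dict(row)
    for column in compound_columns:
        if column in cleaned:
            cleaned[column] = clean_subfields(cleaned[column])
    return cleaned


row = {"STATION": "12345099999", "TMP": "+9999,9", "VIS": "010000,1,N,1"}
cleaned = clean_row(row, compound_columns=["TMP", "VIS"])
```

Restricting the split to named compound columns avoids breaking free-text fields, such as station names, that may themselves contain commas.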
Data have been collected from