Raster data management
This is mostly written from the ESRI perspective at the moment...
This page started with a 30m Oregon hillshade that crashed ArcMap, but that was a long time ago (2006). That story is pushed down to the bottom of this page because I have more to say about rasters.
Storing rasters
Some basic formats include
- GRID - ESRI proprietary format
- GeoTIFF - TIFF tagged as geospatial
- MrSID - LizardTech proprietary format
Organizing raster data in ESRI world
Explained, so I don't have to: https://desktop.arcgis.com/en/arcmap/latest/manage-data/raster-and-images/raster-data-organization.htm
Raster dataset - ESRI's words for "a bunch of pixels in one place", can be TIFF or in a geodatabase etc. Proudly they support 70 formats.
Mosaic dataset - A container that references raster datasets. Data can overlap or be disjoint.
Raster catalog - Don't use. This is an old format where you generate a table that has the name/location of each raster in it.
Terrain - Not a raster, it's a TIN built from 3D data. I don't have a copy of the 3D Analyst extension at work so I will write about this another time.
The Oregon HS grid problem
From 2006. I probably still have it somewhere, though I might have tossed it by now.
The first thing ArcMap tries to do when loading a raster is to 'build pyramids' if they don't exist. (ArcCatalog does the same thing when you preview the file.) Pyramids are downsampled copies of the full image. ArcGIS stores them in a file with the extension 'rrd'. You can tell it not to build them, but then it still has to downsample the image to display it. I think it crashes in the downsample, too.
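To make "downsampled copies" concrete, here is a minimal Python sketch of how the pyramid levels for a raster could be sized. The halving at each level, the 512-pixel stopping rule, and the 20000 x 16000 example raster are all assumptions for illustration, not a description of what ArcGIS actually writes into the 'rrd' file.

```python
def pyramid_levels(width, height, min_size=512):
    """Return the (width, height) of each reduced-resolution level.

    Assumption: each level halves both dimensions (rounding up) and we
    stop once the longer side is no more than min_size pixels.
    """
    levels = []
    w, h = width, height
    while max(w, h) > min_size:
        w = (w + 1) // 2
        h = (h + 1) // 2
        levels.append((w, h))
    return levels


# Hypothetical raster size, chosen only for illustration.
full_w, full_h = 20000, 16000
levels = pyramid_levels(full_w, full_h)

for i, (w, h) in enumerate(levels, start=1):
    print(f"level {i}: {w} x {h}")

# Extra pixels stored in the pyramids, as a fraction of the original.
overhead = sum(w * h for w, h in levels) / (full_w * full_h)
print(f"pyramid overhead: {overhead:.1%}")
```

Because each level has about a quarter of the pixels of the one before it, the whole stack adds roughly a third to the size of the original.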
First question: how big is the original raster?
The file size is approximately 30 gigabytes.
The state of Oregon is 600 kilometers wide by 485 kilometers tall.
A pixel in this file is 30 meters. The pixel depth of the original USGS data is one byte.
A digression
One byte to represent height? That's 256 levels, so if the vertical resolution is ten feet per value, you'd only be able to represent 2560 feet! Instead, for a given quad, they establish the minimum and maximum heights and then scale the data between these limits. When you tile two of them together in a map, the tonal values are off. "Black" in quad A might represent 1200 feet and in quad B it might be 1400 feet. This looks bad but fortunately we don't actually need to view DEMs in a finished map. We use them to develop a hillshade or a contour.
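The per-quad scaling can be sketched in a few lines of Python. This is only an illustration of the idea: the function names are mine, and the maximum elevations for the two quads are made up (only the 1200 and 1400 foot minimums come from the example above).

```python
def to_byte(elev_ft, quad_min, quad_max):
    """Scale an elevation into 0..255 using the quad's own min and max."""
    fraction = (elev_ft - quad_min) / (quad_max - quad_min)
    return round(fraction * 255)


def from_byte(value, quad_min, quad_max):
    """Recover the elevation (in feet) that a byte value stands for."""
    return quad_min + (value / 255) * (quad_max - quad_min)


# (min, max) elevation in feet; the max values are hypothetical.
quad_a = (1200.0, 3000.0)
quad_b = (1400.0, 5000.0)

# "Black" (0) means a different height in each quad.
print(from_byte(0, *quad_a), from_byte(0, *quad_b))

# The same 2000 ft elevation gets a different gray value in each quad,
# which is why the tones do not match along the seam.
print(to_byte(2000.0, *quad_a), to_byte(2000.0, *quad_b))
```

To mosaic the quads cleanly you would have to convert each one back to real elevations with its own limits before combining them.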
This project is really about developing a hillshade for the entire state. BUT, in my experience with a smaller area (8 USGS quads) I had to mosaic the DEM data into a single raster before running Spatial Analyst on it to create the final hillshade.
Back to the main story.
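(600 km * 1000 m/km / 30 m/pixel) * (485 km * 1000 m/km / 30 m/pixel) = 20,000 * 16,167 = about 323 million pixels to represent the state
If the grid has elevations scaled into an 8 bit pixel, that is only about 0.3 GB uncompressed, and even three layers (RGB) would come to about 1 GB.
So a 30 meter grid of the state's bounding box does not account for a 30 GB file; that works out to roughly 90 bytes per pixel. Either the cells are much smaller than 30 meters or the file holds a lot more than one band of 8 bit elevations. Either way, I don't want it represented as three bands.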
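As a check on the pixel count, here is the arithmetic as a small Python sketch. It divides each dimension by the cell size separately and multiplies columns by rows; the variable names and the helper function are mine, and the numbers are the ones quoted above.

```python
WIDTH_KM = 600        # Oregon, east to west
HEIGHT_KM = 485       # Oregon, north to south
CELL_M = 30           # cell size in meters
FILE_BYTES = 30e9     # the file is approximately 30 gigabytes

# The cell size divides each dimension once, so it divides the area twice.
cols = round(WIDTH_KM * 1000 / CELL_M)
rows = round(HEIGHT_KM * 1000 / CELL_M)
pixels = cols * rows


def size_gb(pixel_count, bytes_per_pixel):
    """Uncompressed size in gigabytes (10**9 bytes)."""
    return pixel_count * bytes_per_pixel / 1e9


print(f"{cols} columns x {rows} rows = {pixels:,} pixels")
print(f"1 byte per pixel:  {size_gb(pixels, 1):.2f} GB")
print(f"3 bytes per pixel: {size_gb(pixels, 3):.2f} GB")
print(f"bytes per pixel implied by the file size: {FILE_BYTES / pixels:.0f}")
```

This covers the bounding rectangle, so it slightly overestimates the cells that actually fall inside the state.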
I can see the point of having a statewide hillshade for online applications... I want to be able to view the whole state and then zoom in without loss of resolution. But what I need is a selection of several hillshades at appropriate resolutions for the currently displayed scale. That's what we are attempting to do when we build pyramids. Remember pyramids?
I am hearing a voice in my head (it happens) saying SPATIAL DATABASE! I think managing raster data at this level (or higher, see http://seamless.usgs.gov/) requires a smart backend. When the front end (ArcMap) requests some data, the backend has to decide which data to send: for a small scale, send low res data, and as scale increases, select a higher res dataset.
Then when the data arrive at ArcMap, it can decide how to downsample for final display.
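Here is a rough Python sketch of the decision such a backend would make. Everything in it is hypothetical: the list of available cell sizes, the 96 dpi screen assumption, and the rule of picking the coarsest dataset that is still at least as fine as one screen pixel. It is not how ArcMap or any particular server does it.

```python
# Hypothetical set of hillshades the backend has on hand, by cell size in meters.
LEVELS_M = [30, 60, 120, 240, 480, 960]


def ground_res_for_scale(scale_denominator, screen_dpi=96):
    """Ground meters covered by one screen pixel at a map scale of 1:scale_denominator."""
    meters_per_screen_pixel = 0.0254 / screen_dpi
    return scale_denominator * meters_per_screen_pixel


def pick_level(scale_denominator, levels=LEVELS_M):
    """Pick the coarsest cell size that is no coarser than one screen pixel.

    If even the finest dataset is coarser than the screen needs, send the
    finest one and let the client resample it.
    """
    needed = ground_res_for_scale(scale_denominator)
    candidates = [cell for cell in levels if cell <= needed]
    return max(candidates) if candidates else min(levels)


for scale in (24_000, 250_000, 1_000_000, 5_000_000):
    print(f"1:{scale:,} -> send the {pick_level(scale)} m dataset")
```

The client still does the last bit of resampling for display, but it never has to touch the full resolution data just to draw the whole state.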
And that's all I feel like writing at the moment, because I have not had my coffee yet.