Converting a NIMA text file into geojson

From Wildsong
Brian Wilson (talk | contribs)

Revision as of 23:13, 2 June 2012

For the Mapping Vietnam project

First download place names from NIMA: ftp://ftp.nga.mil/pub2/gns_data/vm.zip

Unzip the file, which gives us "vm.txt", a file of tab-delimited data. It has over 51,000 entries, just what I need! A big data pile!
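If you would rather unzip from Python than from the command line, the standard `zipfile` module can do it. This is only a sketch: the helper name `extract_member` and its parameters are made up for illustration, and it assumes the archive really does contain a member called "vm.txt" as described above.

```python
import zipfile


def extract_member(zip_path, member="vm.txt", dest_dir="."):
    """Extract a single member from a zip archive.

    Hypothetical helper: returns the path of the file that was written.
    Raises KeyError if the archive has no member with that name.
    """
    with zipfile.ZipFile(zip_path) as archive:
        return archive.extract(member, path=dest_dir)
```

Calling `extract_member("vm.zip")` in the download directory should leave "vm.txt" next to the archive.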

Field names are on line 1, and they are described here: http://earth-info.nga.mil/gns/html/gis_countryfiles.html
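Because the field names sit on line 1, each later line can be turned into a dictionary keyed by field name. Here is a minimal sketch using the standard `csv` module instead of splitting by hand. The sample text is invented and far narrower than the real file; only the column names FC, LAT, LONG and FULL_NAME_RO are borrowed from the script on this page, and `read_rows` is a hypothetical helper name.

```python
import csv
import io

# Invented sample standing in for vm.txt (the real file has many more columns).
SAMPLE = (
    "FC\tLAT\tLONG\tFULL_NAME_RO\n"
    "P\t21.25\t105.5\tExample Town\n"
    "H\t10.5\t106.5\tExample River\n"
)


def read_rows(handle):
    """Return one dictionary per data line, keyed by the names on line 1."""
    # QUOTE_NONE: treat quote characters in place names as ordinary text.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return list(reader)


rows = read_rows(io.StringIO(SAMPLE))
print(rows[0]["FULL_NAME_RO"])
```

Note that every value comes back as a string, so coordinates still need a `float()` conversion before they go into GeoJSON.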

Pseudo code

Read the file.
Create a features list
For each input line,
 Ignore anything not a populated place
 Get coordinates into a point geometry dictionary
 Get the attributes that we want into attributes dictionary
 Save the geometry and attributes to a feature
 Add the feature to the features list
Encode to GeoJSON
Write to output.
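The per-line steps above boil down to building one small dictionary per place. The sketch below shows that step in isolation. The function name `row_to_feature` and its `keep` parameter are hypothetical, and it assumes a row dictionary that has the FC, LAT and LONG fields plus whatever attributes are listed in `keep`.

```python
import json


def row_to_feature(row, keep=("LAT", "LONG", "SHORT_FORM", "FULL_NAME_RO")):
    """Build a GeoJSON Feature dictionary from one row, or None if skipped."""
    # Ignore anything that is not a populated place.
    if row.get("FC") != "P":
        return None
    # GeoJSON wants numbers in longitude, latitude order.
    geometry = {
        "type": "Point",
        "coordinates": [float(row["LONG"]), float(row["LAT"])],
    }
    properties = {name: row[name] for name in keep}
    return {"type": "Feature", "geometry": geometry, "properties": properties}


# Invented row for illustration only.
example = {"FC": "P", "LAT": "21.25", "LONG": "105.5",
           "SHORT_FORM": "", "FULL_NAME_RO": "Example Town"}
print(json.dumps(row_to_feature(example)))
```

Keeping this step in its own function makes it easy to try on a single row before running the whole file.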

Python code

#!/usr/bin/env python
import sys, os
import re
import json

re_newline = re.compile(r'^(.*?)\r?\n?$') # used to remove either dos or unix newline

filename = 'vm.txt'
output = 'vm.geojson'

try :
    f = open(filename, 'r')
except :
    print("Can't open %s" % filename)
    sys.exit(-1)

# first line contains field names
m = re_newline.match(f.readline())
line = m.group(1)
#print line
fields = line.split("\t")

# scrub the attributes we don't need
savefields = ['LAT','LONG', 'SHORT_FORM', 'FULL_NAME_RO']

linecounter = 0
placecounter = 0
features = []
for txt in f:
    m = re_newline.match(txt)
    line = m.group(1)
    row = line.split("\t")

    # Convert values that we want from row into a dictionary
    allattrib = {}
    i = 0
    for item in fields:
        allattrib[item] = row[i]
        i += 1

    # Currently we only care about populated places
    if allattrib['FC'] == 'P':

        savedattrib = {}
        for item in savefields:
            savedattrib[item] = allattrib[item]
        #print allattrib
        # float causes output w/o quotes. Quoted latlon not allowed in GeoJSON!
        coordinates = [float(savedattrib['LONG']), float(savedattrib['LAT']) ]

        geometry = {'type':'Point', 'coordinates': coordinates}
        features.append({'type':'Feature', 'geometry':geometry, 'properties':savedattrib})

        placecounter += 1

    linecounter += 1
    #if linecounter > 3: break # uncomment for debugging

print("%d lines processed, %d features in output." % (linecounter, placecounter))

print("Encoding..")
rows = { 'type':'FeatureCollection', 'features':features}

# just one line turns the entire dictionary into GeoJSON
# compact option, this takes least amount of space, squeezes out whitespace
encoded = json.dumps(rows, sort_keys=False, separators=(',',':'))

# prettyprint option, this takes time
#encoded = json.dumps(rows, sort_keys=False, indent=4)

print("Writing output..")
f = open(output, 'w')
f.write(encoded)
f.close()

print("Done!")
sys.exit(0)
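Once the output is written, a quick sanity check can catch mistakes such as quoted or swapped coordinates. The following is only a sketch of such a check, not part of the original script: `count_valid_features` is a hypothetical name, and it assumes every feature is a Point, which is all the script above produces.

```python
import json


def count_valid_features(text):
    """Parse GeoJSON text and return the number of Point features.

    Raises ValueError if the top level is not a FeatureCollection or if a
    coordinate pair is not numeric or falls outside the lon/lat ranges.
    """
    data = json.loads(text)
    if data.get("type") != "FeatureCollection":
        raise ValueError("not a FeatureCollection")
    count = 0
    for feature in data["features"]:
        lon, lat = feature["geometry"]["coordinates"]
        if not all(isinstance(v, (int, float)) for v in (lon, lat)):
            raise ValueError("coordinates must be numbers, not strings")
        if not (-180 <= lon <= 180 and -90 <= lat <= 90):
            raise ValueError("coordinate out of range: %r, %r" % (lon, lat))
        count += 1
    return count


# Typical use after running the converter:
#   with open("vm.geojson") as fh:
#       print(count_valid_features(fh.read()))
```

The count it returns should match the "features in output" number printed by the converter.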