author mjfernez <mjf@mjfer.net> 2022-03-05 00:42:18 -0500
committer mjfernez <mjf@mjfer.net> 2022-03-05 00:42:18 -0500
commit 81936e1e2cfbbbf903f7a461a2df61ec3bee127d (patch)
tree 12c5b9bd26ccb50c96ec77c760d3157855c5a5db /README.md
parent 84aeeb597c7ee4859ec22f9542cda4219ba56a5b (diff)
download natural_earth_data_corrections-81936e1e2cfbbbf903f7a461a2df61ec3bee127d.tar.gz
Add bug details, full shapefile, and nonsense
This commit adds more details about the bug to the README, including steps to reproduce. I also realized the fixed shapefile from QGIS was exporting just fine all along, but I was zipping it into a directory and was therefore just entering the wrong path. So obviously... ogr didn't recognize that. There is no issue with QGIS.
Diffstat (limited to 'README.md')
-rw-r--r-- README.md 79
1 files changed, 65 insertions, 14 deletions
diff --git a/README.md b/README.md
index edbf42d..90446eb 100644
--- a/README.md
+++ b/README.md
@@ -6,15 +6,16 @@ working on. I was attempting to import [the
world](https://www.naturalearthdata.com/downloads/10m-cultural-vectors/)
in Elastic, but Elastic has some bug where you can't upload GeoJSON
through the web form, so I had to do it manually, like this:
+
```bash
-NAME,ECONOMY,FORMAL_EN,GDP_MD,ISO_A2
-ogr2ogr -f ElasticSearch -progress \
--select $fields \
--lco NOT_ANALYZED_FIELDS=$fields \
--lco INDEX_NAME=countries \
--lco OVERWRITE_INDEX=YES \
-ES:http://localhost:9200 \
-/vsizip/./ne_10m_admin_0_countries.zip/ne_10m_admin_0_countries.shp
+$ fields=NAME,ECONOMY,FORMAL_EN,GDP_MD,ISO_A2
+$ ogr2ogr -f ElasticSearch -progress \
+ -select $fields \
+ -lco NOT_ANALYZED_FIELDS=$fields \
+ -lco INDEX_NAME=countries \
+ -lco OVERWRITE_INDEX=YES \
+ ES:http://localhost:9200 \
+ /vsizip/./ne_10m_admin_0_countries.zip/ne_10m_admin_0_countries.shp
```
But ogr2ogr yells at you after processing about 170 countries or so. If
@@ -39,15 +40,65 @@ Fortunately, QGIS has a Geometry Checker Plugin, but unfortunately, it's
a bit complicated and was a pain to do. If you don't tune it right, you
end up having to sort through lots of "mistakes" which aren't mistakes.
-Also there's an [unfixed bug](https://github.com/qgis/QGIS/issues/37527)
-in QGIS which doesn't make shape files correctly.
-Don't know how that's possible considering that's
-literally what the software's made for, but I could only get geojson
-input to work correctly.
-
For anyone else who might be down this rabbit hole, Egypt is Object ID
161--I promise that will save you time. Or you could just download my
copy of the file here.
Hoping to use this git repo as part of a bug report, once I read their
process on that.
+
+Included here is an [ESRI
+shapefile](https://www.loc.gov/preservation/digital/formats/fdd/fdd000280.shtml)
+in the `ne_10m_admin_0_countries` directory, as well as the same output
+in GeoJSON, since I think that format is a bit easier to work with.
+
+
+## Steps to reproduce the bug
+
+1. Download the original file from Natural Earth
+
+```bash
+wget https://www.naturalearthdata.com/http//www.naturalearthdata.com/download/10m/cultural/ne_10m_admin_0_countries.zip
+```
+
+2. Try to import the file into Elastic with the series of bash
+ commands given earlier. Alternatively, just run:
+
+```bash
+$ ogr2ogr -f ElasticSearch -progress \
+ -lco NOT_ANALYZED_FIELDS={ALL} \
+ -lco INDEX_NAME=countries \
+ -lco OVERWRITE_INDEX=YES \
+ ES:http://localhost:9200 \
+ /vsizip/./ne_10m_admin_0_countries.zip/ne_10m_admin_0_countries.shp
+```
+
+3. Observe that you receive an error similar to the one in `error.json`.
+
+As a sanity check, you can re-run the same command without the fancy zip
+syntax by manually unzipping:
+
+```bash
+$ mkdir -p ne && unzip ne_10m_admin_0_countries.zip -d ne/
+$ ogr2ogr -f ElasticSearch -progress \
+ -lco NOT_ANALYZED_FIELDS={ALL} \
+ -lco INDEX_NAME=countries \
+ -lco OVERWRITE_INDEX=YES \
+ ES:http://localhost:9200 \
+ ne/ne_10m_admin_0_countries.shp
+```
+
+You will get the same error.
+
+### Notes
+
+Oddly enough, converting to other formats *will not* yield the same
+error. I suspect the Elastic driver performs some check that the
+GeoJSON (and other) drivers skip.
+
+
+```bash
+$ ogr2ogr -progress -f GeoJSON test.geojson /vsizip/./ne_10m_admin_0_countries.zip/ne_10m_admin_0_countries.shp
+```
+
+That command runs just fine.