From 81936e1e2cfbbbf903f7a461a2df61ec3bee127d Mon Sep 17 00:00:00 2001
From: mjfernez
Date: Sat, 5 Mar 2022 00:42:18 -0500
Subject: Add bug details, full shapefile, and nonsense

This commit adds more details about the bug to the README, including
steps to reproduce. I also realized the fixed shapefile from QGIS was
exporting just fine all along, but I was zipping it into a directory and
therefore was just entering the path wrong. So obviously... ogr didn't
recognize that. There is no issue with QGIS
---
 README.md                                          |  79 ++++-
 error.json                                         | 345 +++++++++++++++++++++
 ne_10m_admin_0_countries.zip                       | Bin 0 -> 5012760 bytes
 .../ne_10m_admin_0_countries.cpg                   |   1 +
 .../ne_10m_admin_0_countries.dbf                   | Bin 0 -> 8744936 bytes
 .../ne_10m_admin_0_countries.prj                   |   1 +
 .../ne_10m_admin_0_countries.qmd                   |  26 ++
 .../ne_10m_admin_0_countries.shp                   | Bin 0 -> 8806180 bytes
 .../ne_10m_admin_0_countries.shx                   | Bin 0 -> 2164 bytes
 9 files changed, 438 insertions(+), 14 deletions(-)
 create mode 100644 error.json
 create mode 100644 ne_10m_admin_0_countries.zip
 create mode 100644 ne_10m_admin_0_countries/ne_10m_admin_0_countries.cpg
 create mode 100644 ne_10m_admin_0_countries/ne_10m_admin_0_countries.dbf
 create mode 100644 ne_10m_admin_0_countries/ne_10m_admin_0_countries.prj
 create mode 100644 ne_10m_admin_0_countries/ne_10m_admin_0_countries.qmd
 create mode 100644 ne_10m_admin_0_countries/ne_10m_admin_0_countries.shp
 create mode 100644 ne_10m_admin_0_countries/ne_10m_admin_0_countries.shx

diff --git a/README.md b/README.md
index edbf42d..90446eb 100644
--- a/README.md
+++ b/README.md
@@ -6,15 +6,16 @@ working on.
 I was attempting to import [the world](https://www.naturalearthdata.com/downloads/10m-cultural-vectors/)
 in Elastic, but Elastic has some bug where you can't upload GeoJSON
 through the web form, so I had to do it manually, like this:
+
 ```bash
-NAME,ECONOMY,FORMAL_EN,GDP_MD,ISO_A2
-ogr2ogr -f ElasticSearch -progress \
--select $fields \
--lco NOT_ANALYZED_FIELDS=$fields \
--lco INDEX_NAME=countries \
--lco OVERWRITE_INDEX=YES \
-ES:http://localhost:9200 \
-/vsizip/./ne_10m_admin_0_countries.zip/ne_10m_admin_0_countries.shp
+$ fields=NAME,ECONOMY,FORMAL_EN,GDP_MD,ISO_A2
+$ ogr2ogr -f ElasticSearch -progress \
+    -select $fields \
+    -lco NOT_ANALYZED_FIELDS=$fields \
+    -lco INDEX_NAME=countries \
+    -lco OVERWRITE_INDEX=YES \
+    ES:http://localhost:9200 \
+    /vsizip/./ne_10m_admin_0_countries.zip/ne_10m_admin_0_countries.shp
 ```
 
 But ogr2ogr yells at you after processing about 170 countries or so. If
@@ -39,15 +40,65 @@ Fortunately, QGIS has a Geometry Checker Plugin, but unfortunately, it's
 a bit complicated and was a pain to do. If you don't tune it right, you
 end up having to sort through lots of "mistakes" which aren't mistakes.
 
-Also there's an [unfixed bug](https://github.com/qgis/QGIS/issues/37527)
-in QGIS which doesn't make shape files correctly.
-Don't know how that's possible considering that's
-literally what the software's made for, but I could only get geojson
-input to work correctly.
-
 For anyone else who might be down this rabbit hole, Egypt is Object ID
 161--I promise that will save you time. Or you could just download my
 copy of the file here.
 
 Hoping to use this git repo as part of a bug report, once I read their
 process on that.
+
+Included here is [ESRI shape
+file](https://www.loc.gov/preservation/digital/formats/fdd/fdd000280.shtml)
+in the `ne_10m_admin_0_countries` directory as well as the same output
+in GeoJSON, since I think the format is a bit easier to work with.
+
+
+## Steps to reproduce the bug
+
+1. Download the original file from Natural Earth
+
+```bash
+wget https://www.naturalearthdata.com/http//www.naturalearthdata.com/download/10m/cultural/ne_10m_admin_0_countries.zip
+```
+
+2. Try to import the file into Elastic with the series of bash
+   commands given earlier. Or alternatively, just:
+
+```bash
+$ ogr2ogr -f ElasticSearch -progress \
+    -lco NOT_ANALYZED_FIELDS={ALL} \
+    -lco INDEX_NAME=countries \
+    -lco OVERWRITE_INDEX=YES \
+    ES:http://localhost:9200 \
+    /vsizip/./ne_10m_admin_0_countries.zip/ne_10m_admin_0_countries.shp
+```
+
+3. Observe you receive a similar error as given in `error.json`
+
+As a sanity check, you can re-run the same command without the fancy zip
+syntax by manually unzipping:
+
+```bash
+$ mkdir -p ne && unzip ne_10m_admin_0_countries.zip -d ne/
+$ ogr2ogr -f ElasticSearch -progress \
+    -lco NOT_ANALYZED_FIELDS={ALL} \
+    -lco INDEX_NAME=countries \
+    -lco OVERWRITE_INDEX=YES \
+    ES:http://localhost:9200 \
+    ne/ne_10m_admin_0_countries.shp
+```
+
+You will get the same error
+
+### Notes
+
+Oddly enough, converting to other formats *will not* yield the same
+error. I suspect there is some check that's not done by the GeoJSON
+(and other) drivers that the Elastic one does.
+
+
+``` bash
+$ ogr2ogr -progress -f GeoJSON test.geojson /vsizip/./ne_10m_admin_0_countries.zip/ne_10m_admin_0_countries.shp
+```
+
+^That runs just fine
diff --git a/error.json b/error.json
new file mode 100644
index 0000000..4f2496c
--- /dev/null
+++ b/error.json
@@ -0,0 +1,345 @@
+{
+  "took": 839,
+  "errors": true,
+  "items": [
+    {
+      "index": {
+        "_index": "countries",
+        "_type": "_doc",
+        "_id": "F9aIWH8BuPJM6EPks_66",
+        "_version": 1,
+        "result": "created",
+        "_shards": {
+          "total": 2,
+          "successful": 1,
+          "failed": 0
+        },
+        "_seq_no": 156,
+        "_primary_term": 1,
+        "status": 201
+      }
+    },
+    {
+      "index": {
+        "_index": "countries",
+        "_type": "_doc",
+        "_id": "GNaIWH8BuPJM6EPks_66",
+        "_version": 1,
+        "result": "created",
+        "_shards": {
+          "total": 2,
+          "successful": 1,
+          "failed": 0
+        },
+        "_seq_no": 157,
+        "_primary_term": 1,
+        "status": 201
+      }
+    },
+    {
+      "index": {
+        "_index": "countries",
+        "_type": "_doc",
+        "_id": "GdaIWH8BuPJM6EPks_66",
+        "_version": 1,
+        "result": "created",
+        "_shards": {
+          "total": 2,
+          "successful": 1,
+          "failed": 0
+        },
+        "_seq_no": 158,
+        "_primary_term": 1,
+        "status": 201
+      }
+    },
+    {
+      "index": {
+        "_index": "countries",
+        "_type": "_doc",
+        "_id": "GtaIWH8BuPJM6EPks_66",
+        "_version": 1,
+        "result": "created",
+        "_shards": {
+          "total": 2,
+          "successful": 1,
+          "failed": 0
+        },
+        "_seq_no": 159,
+        "_primary_term": 1,
+        "status": 201
+      }
+    },
+    {
+      "index": {
+        "_index": "countries",
+        "_type": "_doc",
+        "_id": "G9aIWH8BuPJM6EPks_66",
+        "_version": 1,
+        "result": "created",
+        "_shards": {
+          "total": 2,
+          "successful": 1,
+          "failed": 0
+        },
+        "_seq_no": 160,
+        "_primary_term": 1,
+        "status": 201
+      }
+    },
+    {
+      "index": {
+        "_index": "countries",
+        "_type": "_doc",
+        "_id": "HNaIWH8BuPJM6EPks_66",
+        "status": 400,
+        "error": {
+          "type": "mapper_parsing_exception",
+          "reason": "failed to parse field [geometry] of type [geo_shape]",
+          "caused_by": {
+            "type": "illegal_argument_exception",
+            "reason": "Self-intersection at or near point [35.621087106,23.139292914]"
+          }
+        }
+      }
+    },
+    {
+      "index": {
+        "_index": "countries",
+        "_type": "_doc",
+        "_id": "HdaIWH8BuPJM6EPks_66",
+        "_version": 1,
+        "result": "created",
+        "_shards": {
+          "total": 2,
+          "successful": 1,
+          "failed": 0
+        },
+        "_seq_no": 161,
+        "_primary_term": 1,
+        "status": 201
+      }
+    },
+    {
+      "index": {
+        "_index": "countries",
+        "_type": "_doc",
+        "_id": "HtaIWH8BuPJM6EPks_66",
+        "_version": 1,
+        "result": "created",
+        "_shards": {
+          "total": 2,
+          "successful": 1,
+          "failed": 0
+        },
+        "_seq_no": 162,
+        "_primary_term": 1,
+        "status": 201
+      }
+    },
+    {
+      "index": {
+        "_index": "countries",
+        "_type": "_doc",
+        "_id": "H9aIWH8BuPJM6EPks_66",
+        "_version": 1,
+        "result": "created",
+        "_shards": {
+          "total": 2,
+          "successful": 1,
+          "failed": 0
+        },
+        "_seq_no": 163,
+        "_primary_term": 1,
+        "status": 201
+      }
+    },
+    {
+      "index": {
+        "_index": "countries",
+        "_type": "_doc",
+        "_id": "INaIWH8BuPJM6EPks_66",
+        "_version": 1,
+        "result": "created",
+        "_shards": {
+          "total": 2,
+          "successful": 1,
+          "failed": 0
+        },
+        "_seq_no": 164,
+        "_primary_term": 1,
+        "status": 201
+      }
+    },
+    {
+      "index": {
+        "_index": "countries",
+        "_type": "_doc",
+        "_id": "IdaIWH8BuPJM6EPks_66",
+        "_version": 1,
+        "result": "created",
+        "_shards": {
+          "total": 2,
+          "successful": 1,
+          "failed": 0
+        },
+        "_seq_no": 165,
+        "_primary_term": 1,
+        "status": 201
+      }
+    },
+    {
+      "index": {
+        "_index": "countries",
+        "_type": "_doc",
+        "_id": "ItaIWH8BuPJM6EPks_66",
+        "_version": 1,
+        "result": "created",
+        "_shards": {
+          "total": 2,
+          "successful": 1,
+          "failed": 0
+        },
+        "_seq_no": 166,
+        "_primary_term": 1,
+        "status": 201
+      }
+    },
+    {
+      "index": {
+        "_index": "countries",
+        "_type": "_doc",
+        "_id": "I9aIWH8BuPJM6EPks_66",
+        "_version": 1,
+        "result": "created",
+        "_shards": {
+          "total": 2,
+          "successful": 1,
+          "failed": 0
+        },
+        "_seq_no": 167,
+        "_primary_term": 1,
+        "status": 201
+      }
+    },
+    {
+      "index": {
+        "_index": "countries",
+        "_type": "_doc",
+        "_id": "JNaIWH8BuPJM6EPks_66",
+        "_version": 1,
+        "result": "created",
+        "_shards": {
+          "total": 2,
+          "successful": 1,
+          "failed": 0
+        },
+        "_seq_no": 168,
+        "_primary_term": 1,
+        "status": 201
+      }
+    },
+    {
+      "index": {
+        "_index": "countries",
+        "_type": "_doc",
+        "_id": "JdaIWH8BuPJM6EPks_66",
+        "_version": 1,
+        "result": "created",
+        "_shards": {
+          "total": 2,
+          "successful": 1,
+          "failed": 0
+        },
+        "_seq_no": 169,
+        "_primary_term": 1,
+        "status": 201
+      }
+    },
+    {
+      "index": {
+        "_index": "countries",
+        "_type": "_doc",
+        "_id": "JtaIWH8BuPJM6EPks_66",
+        "_version": 1,
+        "result": "created",
+        "_shards": {
+          "total": 2,
+          "successful": 1,
+          "failed": 0
+        },
+        "_seq_no": 170,
+        "_primary_term": 1,
+        "status": 201
+      }
+    },
+    {
+      "index": {
+        "_index": "countries",
+        "_type": "_doc",
+        "_id": "J9aIWH8BuPJM6EPks_66",
+        "_version": 1,
+        "result": "created",
+        "_shards": {
+          "total": 2,
+          "successful": 1,
+          "failed": 0
+        },
+        "_seq_no": 171,
+        "_primary_term": 1,
+        "status": 201
+      }
+    },
+    {
+      "index": {
+        "_index": "countries",
+        "_type": "_doc",
+        "_id": "KNaIWH8BuPJM6EPks_66",
+        "_version": 1,
+        "result": "created",
+        "_shards": {
+          "total": 2,
+          "successful": 1,
+          "failed": 0
+        },
+        "_seq_no": 172,
+        "_primary_term": 1,
+        "status": 201
+      }
+    },
+    {
+      "index": {
+        "_index": "countries",
+        "_type": "_doc",
+        "_id": "KdaIWH8BuPJM6EPks_66",
+        "_version": 1,
+        "result": "created",
+        "_shards": {
+          "total": 2,
+          "successful": 1,
+          "failed": 0
+        },
+        "_seq_no": 173,
+        "_primary_term": 1,
+        "status": 201
+      }
+    },
+    {
+      "index": {
+        "_index": "countries",
+        "_type": "_doc",
+        "_id": "KtaIWH8BuPJM6EPks_66",
+        "_version": 1,
+        "result": "created",
+        "_shards": {
+          "total": 2,
+          "successful": 1,
+          "failed": 0
+        },
+        "_seq_no": 174,
+        "_primary_term": 1,
+        "status": 201
+      }
+    }
+  ]
+}
diff --git a/ne_10m_admin_0_countries.zip b/ne_10m_admin_0_countries.zip
new file mode 100644
index 0000000..39433bb
Binary files /dev/null and b/ne_10m_admin_0_countries.zip differ
diff --git a/ne_10m_admin_0_countries/ne_10m_admin_0_countries.cpg b/ne_10m_admin_0_countries/ne_10m_admin_0_countries.cpg
new file mode 100644
index 0000000..3ad133c
--- /dev/null
+++ b/ne_10m_admin_0_countries/ne_10m_admin_0_countries.cpg
@@ -0,0 +1 @@
+UTF-8
\ No newline at end of file
diff --git a/ne_10m_admin_0_countries/ne_10m_admin_0_countries.dbf b/ne_10m_admin_0_countries/ne_10m_admin_0_countries.dbf
new file mode 100644
index 0000000..3d734c9
Binary files /dev/null and b/ne_10m_admin_0_countries/ne_10m_admin_0_countries.dbf differ
diff --git a/ne_10m_admin_0_countries/ne_10m_admin_0_countries.prj b/ne_10m_admin_0_countries/ne_10m_admin_0_countries.prj
new file mode 100644
index 0000000..f45cbad
--- /dev/null
+++ b/ne_10m_admin_0_countries/ne_10m_admin_0_countries.prj
@@ -0,0 +1 @@
+GEOGCS["GCS_WGS_1984",DATUM["D_WGS_1984",SPHEROID["WGS_1984",6378137.0,298.257223563]],PRIMEM["Greenwich",0.0],UNIT["Degree",0.0174532925199433]]
\ No newline at end of file
diff --git a/ne_10m_admin_0_countries/ne_10m_admin_0_countries.qmd b/ne_10m_admin_0_countries/ne_10m_admin_0_countries.qmd
new file mode 100644
index 0000000..e6f3dba
--- /dev/null
+++ b/ne_10m_admin_0_countries/ne_10m_admin_0_countries.qmd
@@ -0,0 +1,26 @@
+
+
+
+
+
+ dataset
+
+
+
+
+
+
+
+
+
+ 0
+ 0
+
+
+
+
+ false
+
+
+
+
diff --git a/ne_10m_admin_0_countries/ne_10m_admin_0_countries.shp b/ne_10m_admin_0_countries/ne_10m_admin_0_countries.shp
new file mode 100644
index 0000000..28b350d
Binary files /dev/null and b/ne_10m_admin_0_countries/ne_10m_admin_0_countries.shp differ
diff --git a/ne_10m_admin_0_countries/ne_10m_admin_0_countries.shx b/ne_10m_admin_0_countries/ne_10m_admin_0_countries.shx
new file mode 100644
index 0000000..f04365d
Binary files /dev/null and b/ne_10m_admin_0_countries/ne_10m_admin_0_countries.shx differ
-- 
cgit v1.2.3
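
Reviewer-style note, not part of the patch: the `error.json` added above is a long bulk response in which a single item failed. A minimal sketch of how one might pick out the failures with plain `grep` follows. The file `sample_error.json` and its two items are hypothetical stand-ins shaped like the real response, and the pattern assumes pretty-printed JSON with one key per line, so treat it as a quick check rather than a robust parser.

```shell
# Hypothetical miniature of a bulk response, shaped like error.json in this patch.
cat > sample_error.json <<'EOF'
{
  "errors": true,
  "items": [
    { "index": { "_id": "a", "status": 201 } },
    { "index": { "_id": "b", "status": 400,
        "error": { "caused_by": {
          "reason": "Self-intersection at or near point [35.621087106,23.139292914]" } } } }
  ]
}
EOF

# Count the items whose status is a 4xx or 5xx code.
failed=$(grep -Eo '"status": *[45][0-9]{2}' sample_error.json | wc -l | tr -d ' ')
echo "failed items: $failed"

# Print the reason strings so the offending geometry can be located.
grep -o '"reason": *"[^"]*"' sample_error.json
```

Pointing the same two `grep` commands at the real `error.json` should surface the self-intersection entry without scrolling through the successful items.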