author: mjfernez <mjf@mjfer.net> 2022-03-05 00:42:18 -0500
committer: mjfernez <mjf@mjfer.net> 2022-03-05 00:42:18 -0500
commit: 81936e1e2cfbbbf903f7a461a2df61ec3bee127d
tree: 12c5b9bd26ccb50c96ec77c760d3157855c5a5db
parent: 84aeeb597c7ee4859ec22f9542cda4219ba56a5b
Add bug details, full shapefile, and nonsense
This commit adds more details about the bug to the README, including steps to reproduce. I also realized the fixed shapefile exported from QGIS was fine all along: I was zipping it inside a directory and then entering the wrong path, so ogr didn't recognize it. There is no issue with QGIS.
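
The path mix-up described above is easy to check before calling ogr2ogr. The following is a minimal sketch, not part of the commit: it lists the `.shp` members of an archive and builds the matching `/vsizip/` paths. The helper name `vsizip_shp_paths`, the archive name `fixed.zip`, and the in-memory test archive are all assumptions made for illustration.

```python
import io
import zipfile


def vsizip_shp_paths(zip_path, archive):
    """Build a /vsizip/ path for every .shp member of an open ZipFile."""
    return [
        f"/vsizip/{zip_path}/{name}"
        for name in archive.namelist()
        if name.lower().endswith(".shp")
    ]


# Build a tiny in-memory archive that mimics a shapefile zipped inside a
# directory (hypothetical layout; the real archive may differ).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("ne_10m_admin_0_countries/ne_10m_admin_0_countries.shp", b"")
    archive.writestr("ne_10m_admin_0_countries/ne_10m_admin_0_countries.dbf", b"")

# Re-open the archive for reading and derive the paths GDAL would need.
with zipfile.ZipFile(buffer) as archive:
    paths = vsizip_shp_paths("./fixed.zip", archive)

print(paths)
```

With a directory layer in the archive, the inner directory name has to appear in the `/vsizip/` path as well, which is the part that is easy to leave out.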
-rw-r--r-- README.md 79
-rw-r--r-- error.json 345
-rw-r--r-- ne_10m_admin_0_countries.zip bin 0 -> 5012760 bytes
-rw-r--r-- ne_10m_admin_0_countries/ne_10m_admin_0_countries.cpg 1
-rw-r--r-- ne_10m_admin_0_countries/ne_10m_admin_0_countries.dbf bin 0 -> 8744936 bytes
-rw-r--r-- ne_10m_admin_0_countries/ne_10m_admin_0_countries.prj 1
-rw-r--r-- ne_10m_admin_0_countries/ne_10m_admin_0_countries.qmd 26
-rw-r--r-- ne_10m_admin_0_countries/ne_10m_admin_0_countries.shp bin 0 -> 8806180 bytes
-rw-r--r-- ne_10m_admin_0_countries/ne_10m_admin_0_countries.shx bin 0 -> 2164 bytes
9 files changed, 438 insertions, 14 deletions
diff --git a/README.md b/README.md
index edbf42d..90446eb 100644
--- a/README.md
+++ b/README.md
@@ -6,15 +6,16 @@ working on. I was attempting to import [the
world](https://www.naturalearthdata.com/downloads/10m-cultural-vectors/)
in Elastic, but Elastic has some bug where you can't upload GeoJSON
through the web form, so I had to do it manually, like this:
+
```bash
-NAME,ECONOMY,FORMAL_EN,GDP_MD,ISO_A2
-ogr2ogr -f ElasticSearch -progress \
--select $fields \
--lco NOT_ANALYZED_FIELDS=$fields \
--lco INDEX_NAME=countries \
--lco OVERWRITE_INDEX=YES \
-ES:http://localhost:9200 \
-/vsizip/./ne_10m_admin_0_countries.zip/ne_10m_admin_0_countries.shp
+$ fields=NAME,ECONOMY,FORMAL_EN,GDP_MD,ISO_A2
+$ ogr2ogr -f ElasticSearch -progress \
+ -select $fields \
+ -lco NOT_ANALYZED_FIELDS=$fields \
+ -lco INDEX_NAME=countries \
+ -lco OVERWRITE_INDEX=YES \
+ ES:http://localhost:9200 \
+ /vsizip/./ne_10m_admin_0_countries.zip/ne_10m_admin_0_countries.shp
```
But ogr2ogr yells at you after processing about 170 countries or so. If
@@ -39,15 +40,65 @@ Fortunately, QGIS has a Geometry Checker Plugin, but unfortunately, it's
a bit complicated and was a pain to do. If you don't tune it right, you
end up having to sort through lots of "mistakes" which aren't mistakes.
-Also there's an [unfixed bug](https://github.com/qgis/QGIS/issues/37527)
-in QGIS which doesn't make shape files correctly.
-Don't know how that's possible considering that's
-literally what the software's made for, but I could only get geojson
-input to work correctly.
-
For anyone else who might be down this rabbit hole, Egypt is Object ID
161--I promise that will save you time. Or you could just download my
copy of the file here.
Hoping to use this git repo as part of a bug report, once I read their
process on that.
+
+Included here is an [ESRI shape
+file](https://www.loc.gov/preservation/digital/formats/fdd/fdd000280.shtml)
+in the `ne_10m_admin_0_countries` directory as well as the same output
+in GeoJSON, since I think the format is a bit easier to work with.
+
+
+## Steps to reproduce the bug
+
+1. Download the original file from Natural Earth
+
+```bash
+wget https://www.naturalearthdata.com/http//www.naturalearthdata.com/download/10m/cultural/ne_10m_admin_0_countries.zip
+```
+
+2. Try to import the file into Elastic with the series of bash
+ commands given earlier. Or alternatively, just:
+
+```bash
+$ ogr2ogr -f ElasticSearch -progress \
+ -lco NOT_ANALYZED_FIELDS={ALL} \
+ -lco INDEX_NAME=countries \
+ -lco OVERWRITE_INDEX=YES \
+ ES:http://localhost:9200 \
+ /vsizip/./ne_10m_admin_0_countries.zip/ne_10m_admin_0_countries.shp
+```
+
+3. Observe that you receive an error similar to the one in `error.json`.
+
+As a sanity check, you can re-run the same command without the fancy zip
+syntax by manually unzipping:
+
+```bash
+$ mkdir -p ne && unzip ne_10m_admin_0_countries.zip -d ne/
+$ ogr2ogr -f ElasticSearch -progress \
+ -lco NOT_ANALYZED_FIELDS={ALL} \
+ -lco INDEX_NAME=countries \
+ -lco OVERWRITE_INDEX=YES \
+ ES:http://localhost:9200 \
+ ne/ne_10m_admin_0_countries.shp
+```
+
+You will get the same error.
+
+### Notes
+
+Oddly enough, converting to other formats *will not* yield the same
+error. I suspect there is some check that's not done by the GeoJSON
+(and other) drivers that the Elastic one does.
+
+
+``` bash
+$ ogr2ogr -progress -f GeoJSON test.geojson /vsizip/./ne_10m_admin_0_countries.zip/ne_10m_admin_0_countries.shp
+```
+
+^That runs just fine
diff --git a/error.json b/error.json
new file mode 100644
index 0000000..4f2496c
--- /dev/null
+++ b/error.json
@@ -0,0 +1,345 @@
+{
+ "took": 839,
+ "errors": true,
+ "items": [
+ {
+ "index": {
+ "_index": "countries",
+ "_type": "_doc",
+ "_id": "F9aIWH8BuPJM6EPks_66",
+ "_version": 1,
+ "result": "created",
+ "_shards": {
+ "total": 2,
+ "successful": 1,
+ "failed": 0
+ },
+ "_seq_no": 156,
+ "_primary_term": 1,
+ "status": 201
+ }
+ },
+ {
+ "index": {
+ "_index": "countries",
+ "_type": "_doc",
+ "_id": "GNaIWH8BuPJM6EPks_66",
+ "_version": 1,
+ "result": "created",
+ "_shards": {
+ "total": 2,
+ "successful": 1,
+ "failed": 0
+ },
+ "_seq_no": 157,
+ "_primary_term": 1,
+ "status": 201
+ }
+ },
+ {
+ "index": {
+ "_index": "countries",
+ "_type": "_doc",
+ "_id": "GdaIWH8BuPJM6EPks_66",
+ "_version": 1,
+ "result": "created",
+ "_shards": {
+ "total": 2,
+ "successful": 1,
+ "failed": 0
+ },
+ "_seq_no": 158,
+ "_primary_term": 1,
+ "status": 201
+ }
+ },
+ {
+ "index": {
+ "_index": "countries",
+ "_type": "_doc",
+ "_id": "GtaIWH8BuPJM6EPks_66",
+ "_version": 1,
+ "result": "created",
+ "_shards": {
+ "total": 2,
+ "successful": 1,
+ "failed": 0
+ },
+ "_seq_no": 159,
+ "_primary_term": 1,
+ "status": 201
+ }
+ },
+ {
+ "index": {
+ "_index": "countries",
+ "_type": "_doc",
+ "_id": "G9aIWH8BuPJM6EPks_66",
+ "_version": 1,
+ "result": "created",
+ "_shards": {
+ "total": 2,
+ "successful": 1,
+ "failed": 0
+ },
+ "_seq_no": 160,
+ "_primary_term": 1,
+ "status": 201
+ }
+ },
+ {
+ "index": {
+ "_index": "countries",
+ "_type": "_doc",
+ "_id": "HNaIWH8BuPJM6EPks_66",
+ "status": 400,
+ "error": {
+ "type": "mapper_parsing_exception",
+ "reason": "failed to parse field [geometry] of type [geo_shape]",
+ "caused_by": {
+ "type": "illegal_argument_exception",
+ "reason": "Self-intersection at or near point [35.621087106,23.139292914]"
+ }
+ }
+ }
+ },
+ {
+ "index": {
+ "_index": "countries",
+ "_type": "_doc",
+ "_id": "HdaIWH8BuPJM6EPks_66",
+ "_version": 1,
+ "result": "created",
+ "_shards": {
+ "total": 2,
+ "successful": 1,
+ "failed": 0
+ },
+ "_seq_no": 161,
+ "_primary_term": 1,
+ "status": 201
+ }
+ },
+ {
+ "index": {
+ "_index": "countries",
+ "_type": "_doc",
+ "_id": "HtaIWH8BuPJM6EPks_66",
+ "_version": 1,
+ "result": "created",
+ "_shards": {
+ "total": 2,
+ "successful": 1,
+ "failed": 0
+ },
+ "_seq_no": 162,
+ "_primary_term": 1,
+ "status": 201
+ }
+ },
+ {
+ "index": {
+ "_index": "countries",
+ "_type": "_doc",
+ "_id": "H9aIWH8BuPJM6EPks_66",
+ "_version": 1,
+ "result": "created",
+ "_shards": {
+ "total": 2,
+ "successful": 1,
+ "failed": 0
+ },
+ "_seq_no": 163,
+ "_primary_term": 1,
+ "status": 201
+ }
+ },
+ {
+ "index": {
+ "_index": "countries",
+ "_type": "_doc",
+ "_id": "INaIWH8BuPJM6EPks_66",
+ "_version": 1,
+ "result": "created",
+ "_shards": {
+ "total": 2,
+ "successful": 1,
+ "failed": 0
+ },
+ "_seq_no": 164,
+ "_primary_term": 1,
+ "status": 201
+ }
+ },
+ {
+ "index": {
+ "_index": "countries",
+ "_type": "_doc",
+ "_id": "IdaIWH8BuPJM6EPks_66",
+ "_version": 1,
+ "result": "created",
+ "_shards": {
+ "total": 2,
+ "successful": 1,
+ "failed": 0
+ },
+ "_seq_no": 165,
+ "_primary_term": 1,
+ "status": 201
+ }
+ },
+ {
+ "index": {
+ "_index": "countries",
+ "_type": "_doc",
+ "_id": "ItaIWH8BuPJM6EPks_66",
+ "_version": 1,
+ "result": "created",
+ "_shards": {
+ "total": 2,
+ "successful": 1,
+ "failed": 0
+ },
+ "_seq_no": 166,
+ "_primary_term": 1,
+ "status": 201
+ }
+ },
+ {
+ "index": {
+ "_index": "countries",
+ "_type": "_doc",
+ "_id": "I9aIWH8BuPJM6EPks_66",
+ "_version": 1,
+ "result": "created",
+ "_shards": {
+ "total": 2,
+ "successful": 1,
+ "failed": 0
+ },
+ "_seq_no": 167,
+ "_primary_term": 1,
+ "status": 201
+ }
+ },
+ {
+ "index": {
+ "_index": "countries",
+ "_type": "_doc",
+ "_id": "JNaIWH8BuPJM6EPks_66",
+ "_version": 1,
+ "result": "created",
+ "_shards": {
+ "total": 2,
+ "successful": 1,
+ "failed": 0
+ },
+ "_seq_no": 168,
+ "_primary_term": 1,
+ "status": 201
+ }
+ },
+ {
+ "index": {
+ "_index": "countries",
+ "_type": "_doc",
+ "_id": "JdaIWH8BuPJM6EPks_66",
+ "_version": 1,
+ "result": "created",
+ "_shards": {
+ "total": 2,
+ "successful": 1,
+ "failed": 0
+ },
+ "_seq_no": 169,
+ "_primary_term": 1,
+ "status": 201
+ }
+ },
+ {
+ "index": {
+ "_index": "countries",
+ "_type": "_doc",
+ "_id": "JtaIWH8BuPJM6EPks_66",
+ "_version": 1,
+ "result": "created",
+ "_shards": {
+ "total": 2,
+ "successful": 1,
+ "failed": 0
+ },
+ "_seq_no": 170,
+ "_primary_term": 1,
+ "status": 201
+ }
+ },
+ {
+ "index": {
+ "_index": "countries",
+ "_type": "_doc",
+ "_id": "J9aIWH8BuPJM6EPks_66",
+ "_version": 1,
+ "result": "created",
+ "_shards": {
+ "total": 2,
+ "successful": 1,
+ "failed": 0
+ },
+ "_seq_no": 171,
+ "_primary_term": 1,
+ "status": 201
+ }
+ },
+ {
+ "index": {
+ "_index": "countries",
+ "_type": "_doc",
+ "_id": "KNaIWH8BuPJM6EPks_66",
+ "_version": 1,
+ "result": "created",
+ "_shards": {
+ "total": 2,
+ "successful": 1,
+ "failed": 0
+ },
+ "_seq_no": 172,
+ "_primary_term": 1,
+ "status": 201
+ }
+ },
+ {
+ "index": {
+ "_index": "countries",
+ "_type": "_doc",
+ "_id": "KdaIWH8BuPJM6EPks_66",
+ "_version": 1,
+ "result": "created",
+ "_shards": {
+ "total": 2,
+ "successful": 1,
+ "failed": 0
+ },
+ "_seq_no": 173,
+ "_primary_term": 1,
+ "status": 201
+ }
+ },
+ {
+ "index": {
+ "_index": "countries",
+ "_type": "_doc",
+ "_id": "KtaIWH8BuPJM6EPks_66",
+ "_version": 1,
+ "result": "created",
+ "_shards": {
+ "total": 2,
+ "successful": 1,
+ "failed": 0
+ },
+ "_seq_no": 174,
+ "_primary_term": 1,
+ "status": 201
+ }
+ }
+ ]
+}
diff --git a/ne_10m_admin_0_countries.zip b/ne_10m_admin_0_countries.zip
new file mode 100644
index 0000000..39433bb
--- /dev/null
+++ b/ne_10m_admin_0_countries.zip
Binary files differ
diff --git a/ne_10m_admin_0_countries/ne_10m_admin_0_countries.cpg b/ne_10m_admin_0_countries/ne_10m_admin_0_countries.cpg
new file mode 100644
index 0000000..3ad133c
--- /dev/null
+++ b/ne_10m_admin_0_countries/ne_10m_admin_0_countries.cpg
@@ -0,0 +1 @@
+UTF-8
\ No newline at end of file
diff --git a/ne_10m_admin_0_countries/ne_10m_admin_0_countries.dbf b/ne_10m_admin_0_countries/ne_10m_admin_0_countries.dbf
new file mode 100644
index 0000000..3d734c9
--- /dev/null
+++ b/ne_10m_admin_0_countries/ne_10m_admin_0_countries.dbf
Binary files differ
diff --git a/ne_10m_admin_0_countries/ne_10m_admin_0_countries.prj b/ne_10m_admin_0_countries/ne_10m_admin_0_countries.prj
new file mode 100644
index 0000000..f45cbad
--- /dev/null
+++ b/ne_10m_admin_0_countries/ne_10m_admin_0_countries.prj
@@ -0,0 +1 @@
+GEOGCS["GCS_WGS_1984",DATUM["D_WGS_1984",SPHEROID["WGS_1984",6378137.0,298.257223563]],PRIMEM["Greenwich",0.0],UNIT["Degree",0.0174532925199433]]
\ No newline at end of file
diff --git a/ne_10m_admin_0_countries/ne_10m_admin_0_countries.qmd b/ne_10m_admin_0_countries/ne_10m_admin_0_countries.qmd
new file mode 100644
index 0000000..e6f3dba
--- /dev/null
+++ b/ne_10m_admin_0_countries/ne_10m_admin_0_countries.qmd
@@ -0,0 +1,26 @@
+<!DOCTYPE qgis PUBLIC 'http://mrcc.com/qgis.dtd' 'SYSTEM'>
+<qgis version="3.22.4-Białowieża">
+ <identifier></identifier>
+ <parentidentifier></parentidentifier>
+ <language></language>
+ <type>dataset</type>
+ <title></title>
+ <abstract></abstract>
+ <links/>
+ <fees></fees>
+ <encoding></encoding>
+ <crs>
+ <spatialrefsys>
+ <wkt></wkt>
+ <proj4></proj4>
+ <srsid>0</srsid>
+ <srid>0</srid>
+ <authid></authid>
+ <description></description>
+ <projectionacronym></projectionacronym>
+ <ellipsoidacronym></ellipsoidacronym>
+ <geographicflag>false</geographicflag>
+ </spatialrefsys>
+ </crs>
+ <extent/>
+</qgis>
diff --git a/ne_10m_admin_0_countries/ne_10m_admin_0_countries.shp b/ne_10m_admin_0_countries/ne_10m_admin_0_countries.shp
new file mode 100644
index 0000000..28b350d
--- /dev/null
+++ b/ne_10m_admin_0_countries/ne_10m_admin_0_countries.shp
Binary files differ
diff --git a/ne_10m_admin_0_countries/ne_10m_admin_0_countries.shx b/ne_10m_admin_0_countries/ne_10m_admin_0_countries.shx
new file mode 100644
index 0000000..f04365d
--- /dev/null
+++ b/ne_10m_admin_0_countries/ne_10m_admin_0_countries.shx
Binary files differ
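
The `error.json` added in this commit is a bulk response in which one item out of twenty failed. As a rough sketch (not part of the commit), a response like that can be scanned for failed items as shown here. The function name `failed_items` is hypothetical, and the sample is trimmed to two items copied from `error.json`.

```python
import json


def failed_items(bulk_response):
    """Return (position, _id, reason) for each item with an HTTP error status."""
    failures = []
    for position, item in enumerate(bulk_response.get("items", [])):
        # Each item holds a single key naming the action ("index" in error.json).
        for result in item.values():
            if result.get("status", 0) >= 400:
                error = result.get("error", {})
                cause = error.get("caused_by", {})
                # Prefer the root cause; fall back to the outer reason.
                reason = cause.get("reason") or error.get("reason")
                failures.append((position, result.get("_id"), reason))
    return failures


# Two items trimmed from error.json: one success and the failing one.
sample = json.loads("""
{
  "took": 839,
  "errors": true,
  "items": [
    {"index": {"_index": "countries", "_id": "G9aIWH8BuPJM6EPks_66",
               "result": "created", "status": 201}},
    {"index": {"_index": "countries", "_id": "HNaIWH8BuPJM6EPks_66",
               "status": 400,
               "error": {
                 "type": "mapper_parsing_exception",
                 "reason": "failed to parse field [geometry] of type [geo_shape]",
                 "caused_by": {
                   "type": "illegal_argument_exception",
                   "reason": "Self-intersection at or near point [35.621087106,23.139292914]"
                 }
               }}}
  ]
}
""")

failures = failed_items(sample)
for position, doc_id, reason in failures:
    print(position, doc_id, reason)
```

To run it against the full file, replace the inline sample with `json.load(open("error.json"))`. The reported position is the item's index within that one bulk response, not the feature's ID in the shapefile.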