## The problem

I want to create a database using any available RDBMS. It will have a table with approximately 150 columns. The objective is to perform nearest neighbor search against other objects, i.e. NNS in 150-dimensional space.

I already tried some obvious methods like L1 or L2 distances, but of course they take a lot of time on tables with many rows. I also looked at KD-trees (note: I did not test them) and PG-Strom, but neither is a good solution for high-dimensional data.

Can I somehow improve the speed of the described search using mathematical methods (like a KD-tree) or technical methods (like PG-Strom)?

I will try any RDBMS that allows improving the speed of the NNS, but MySQL and PostgreSQL are the most appropriate DBMSs for me.

## How to solve


### Method 1

#### PostgreSQL 9.6 using `cube`

First, install the `cube` extension:

```
CREATE EXTENSION cube;
```

Now we will create an n-dimensional space with 100,000 points in 50 dimensions, and add a GiST index.

```
CREATE TEMP TABLE space_nd AS
  SELECT i, cube(array_agg(random()::float)) AS c
  FROM generate_series(1, 1e5) AS i
  CROSS JOIN LATERAL generate_series(1, 50) AS x
  GROUP BY i;

CREATE INDEX ON space_nd USING gist (c);
ANALYZE space_nd;
```

Now we will generate a single point and use the `<->` operator to find the nearest points by Euclidean distance.

```
WITH points AS (
  SELECT cube(array_agg(random()::float)) AS c
  FROM generate_series(1, 50) AS x
)
SELECT i,
       pg_typeof(space_nd.c),
       pg_typeof(points.c),
       cube_distance(space_nd.c, points.c)
FROM space_nd
CROSS JOIN points
ORDER BY space_nd.c <-> points.c
LIMIT 5;
```

PostgreSQL 9.6+ supports other distance operators over `cube`, all of which can use the GiST index we created. Namely:

| Operator | Returns | Description |
| --- | --- | --- |
| `a <-> b` | `float8` | Euclidean distance between `a` and `b` |
| `a <#> b` | `float8` | Taxicab (L-1 metric) distance between `a` and `b` |
| `a <=> b` | `float8` | Chebyshev (L-inf metric) distance between `a` and `b` |

That said, there is one caveat. From the documentation:

> To make it harder for people to break things, there is a limit of 100 on the number of dimensions of cubes. This is set in cubedata.h if you need something bigger.

You ask for 150 dimensions, which exceeds that limit, so you would need to raise the limit in `cubedata.h` and rebuild the extension.
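As a sketch of what that patch involves (assuming the macro is still spelled `#define CUBE_MAX_DIM (100)` in your PostgreSQL version — verify against your own source tree), the edit is a one-liner. The demonstration below runs the substitution against a stand-in copy of the header so it can be tried anywhere:

```shell
# Hypothetical sketch: raising cube's dimension limit means editing
# contrib/cube/cubedata.h in the PostgreSQL source and rebuilding the
# extension. Here we demonstrate the substitution on a stand-in file.
printf '#define CUBE_MAX_DIM (100)\n' > cubedata.h
sed -i 's/CUBE_MAX_DIM (100)/CUBE_MAX_DIM (256)/' cubedata.h
cat cubedata.h
# Against a real source tree you would then rebuild and reinstall:
#   make -C contrib/cube && sudo make -C contrib/cube install
```

Note that GNU `sed -i` syntax differs from BSD/macOS `sed`, and that a patched extension must be rebuilt for every server you deploy to.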

### Method 2

Consider performing dimensionality reduction first (e.g. Principal Component Analysis).

Then you are doing NN in a small number of dimensions, with higher performance.

You can use PL/R to perform PCA inside PostgreSQL if needed.
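As an illustration of the reduction step only (not PL/R — the function below is our own, and for real data you would use numpy or scikit-learn), here is a minimal pure-Python sketch of PCA via power iteration, projecting each row onto its top principal component:

```python
import math
import random

def pca_top_component(rows, iters=200):
    """Project each row onto the top principal component (power iteration)."""
    n, d = len(rows), len(rows[0])
    # Center the data column-wise.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(row[a] * row[b] for row in centered) / (n - 1)
            for b in range(d)] for a in range(d)]
    # Power iteration converges to the dominant eigenvector of cov.
    v = [random.random() + 0.1 for _ in range(d)]
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # One scalar coordinate per row: the projection onto v.
    return [sum(row[j] * v[j] for j in range(d)) for row in centered]

random.seed(42)
data = [[random.random() for _ in range(10)] for _ in range(200)]
scores = pca_top_component(data)
print(len(scores))  # one reduced coordinate per input row
```

In practice you would keep the top k components (not just one), store the reduced vectors in a `cube` column, and search in k dimensions instead of 150.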

### Method 3

Take a look at https://github.com/a-mma/AquilaDB. It is a vector database for storing feature vectors along with JSON metadata. Keep it alongside your RDBMS and use the metadata to maintain cross-references between the data.

### Method 4

Have a look at FLANN and OpenCV.

Unfortunately, I am not aware of an integration of either into an RDBMS. But there is, for example, an integration of chemical structure information with Postgres, so in principle it can be done.


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.