

Using kNN

Finding optimal k-Value

Testing on the original dataset (80:20 split) showed that the optimal k-value is 3.
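A search of this kind could look roughly like the sketch below. It is an illustration, not the code we actually ran: it assumes scikit-learn, uses the small bundled `load_digits` dataset as a stand-in for MNIST so it runs offline, and the helper name `accuracy_per_k`, the candidate range 1 to 10 and `random_state=0` are arbitrary choices.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Stand-in data (assumption): scikit-learn's bundled 8x8 digits, not MNIST.
X, y = load_digits(return_X_y=True)

# 80:20 split as described above; random_state is an arbitrary choice.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)


def accuracy_per_k(k_values):
    """Return a dict mapping each candidate k to its accuracy on the held-out split."""
    scores = {}
    for k in k_values:
        model = KNeighborsClassifier(n_neighbors=k)
        model.fit(X_train, y_train)
        scores[k] = model.score(X_test, y_test)
    return scores


scores = accuracy_per_k(range(1, 11))
best_k = max(scores, key=scores.get)  # k with the highest held-out accuracy
print(best_k, scores[best_k])
```

The best k on the stand-in data need not be 3; the value depends on the dataset and the split.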

Running kNN on the dataset without any preprocessing yields the following weighted averages (precision, recall, F1-score, support):

weighted avg 0.97 0.97 0.97 56000
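A "weighted avg" row like the one above is what scikit-learn's `classification_report` prints. The following minimal sketch shows how such a row can be produced, again assuming `load_digits` as a stand-in for MNIST and an arbitrary `random_state`, so the numbers will differ from ours.

```python
from sklearn.datasets import load_digits
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Stand-in data (assumption): bundled digits instead of MNIST.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# kNN with k=3 and no preprocessing.
knn = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)
y_pred = knn.predict(X_test)

# Human-readable report; the last row is the "weighted avg" line.
print(classification_report(y_test, y_pred))

# The same numbers as a dict, convenient for logging or comparisons.
report = classification_report(y_test, y_pred, output_dict=True)
weighted = report["weighted avg"]
```

The `support` column is the number of samples the metrics were computed on.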

Dataset optimization

Standardization

Standard

StandardScaler did not appear to change the outcome on the MNIST dataset, so we omitted standardization. The likely reason is that the MNIST dataset has already been optimized for processing.
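One way to check this kind of claim is to score the same classifier with and without a scaling step. The sketch below assumes scikit-learn and the bundled `load_digits` data as a stand-in for MNIST; the variable names are illustrative only.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in data (assumption): bundled digits instead of MNIST.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Baseline: kNN on the raw pixel values.
raw_model = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)

# Same classifier behind a StandardScaler; the pipeline fits the scaler
# on the training split only and reuses those statistics at predict time.
scaled_model = make_pipeline(
    StandardScaler(), KNeighborsClassifier(n_neighbors=3)
).fit(X_train, y_train)

raw_acc = raw_model.score(X_test, y_test)
scaled_acc = scaled_model.score(X_test, y_test)
print(f"raw: {raw_acc:.4f}  standardized: {scaled_acc:.4f}")
```

Wrapping the scaler in a pipeline keeps test data out of the scaling statistics, which makes the comparison fair.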

MinMax

Needs to be updated.

Feature selection

To be tested

Feature reduction

PCA

Testing with PCA and plotting the number of components against the explained variance, we found that 98.64% of the variance can be retained with only 300 components.

Further testing showed that 709 components reach a variance of 99.99999999999992%, the same value as with all 784 components (the original number of features). This means that little or no variance (information) is lost when using 709 components instead of 784.
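The component-versus-variance relationship can be read from the cumulative sum of `explained_variance_ratio_`. The sketch below is a minimal illustration under assumptions: it uses scikit-learn's `PCA` on the bundled `load_digits` data (64 features rather than 784), and `components_for` is a hypothetical helper, not part of any library.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

# Stand-in data (assumption): bundled digits with 64 features, not MNIST.
X, _ = load_digits(return_X_y=True)

# Fit PCA with all components kept, then accumulate the variance ratios.
pca = PCA().fit(X)
cumulative = np.cumsum(pca.explained_variance_ratio_)


def components_for(threshold):
    """Smallest number of components whose cumulative variance ratio reaches threshold."""
    index = int(np.searchsorted(cumulative, threshold))  # first index with value >= threshold
    return min(index + 1, len(cumulative))  # clamp if rounding keeps the total below threshold


for threshold in (0.90, 0.95, 0.99):
    print(threshold, components_for(threshold))
```

Plotting `cumulative` against the component count gives the curve described above; the point where it flattens marks where additional components stop adding variance.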

For now we will simply use n_components=709.
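Combining the reduction with the classifier could be sketched as a pipeline. This is an assumption about how the pieces fit together, not our final code: it uses `load_digits` as a stand-in, so `N_COMPONENTS` is set to 30 because that dataset has only 64 features (on MNIST it would be 709).

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

# Stand-in value (assumption): 709 on MNIST, 30 here because digits has 64 features.
N_COMPONENTS = 30

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# PCA is fitted on the training split only, then kNN runs in the reduced space.
pipeline = make_pipeline(
    PCA(n_components=N_COMPONENTS),
    KNeighborsClassifier(n_neighbors=3),
)
pipeline.fit(X_train, y_train)

accuracy = pipeline.score(X_test, y_test)
print(f"accuracy with {N_COMPONENTS} components: {accuracy:.4f}")
```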

LDA

To be tested

TODO