# Using kNN

## Finding the optimal k-value

Testing on the original dataset (80:20 split), we found that the optimal k-value is 3.
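
One way to run such a search is sketched below, assuming scikit-learn's `KNeighborsClassifier` and an 80:20 `train_test_split`. To stay self-contained and fast, it uses scikit-learn's bundled 8×8 `load_digits` dataset as a stand-in for MNIST. The candidate values of k and the random seed are hypothetical choices, and the best k on this stand-in need not be 3.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Stand-in for MNIST: 1797 8x8 digit images, flattened to 64 pixel features
X, y = load_digits(return_X_y=True)

# 80:20 train/test split; fixed seed for reproducibility,
# stratified so every digit appears in both parts
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Hypothetical candidate range: odd values of k from 1 to 15
candidate_ks = range(1, 16, 2)

scores = {}
for k in candidate_ks:
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    # score() returns the mean accuracy on the held-out 20%
    scores[k] = knn.score(X_test, y_test)

best_k = max(scores, key=scores.get)
print(f"best k = {best_k} (accuracy {scores[best_k]:.4f})")
```

Choosing k on the same split that later reports the final score tends to make that score slightly optimistic; cross-validation on the training part (e.g. scikit-learn's `GridSearchCV`) is a common alternative.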

Running kNN on the dataset without any preprocessing gives the following weighted averages (columns: precision, recall, f1-score, support):

> weighted avg 0.97 0.97 0.97 56000
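
A sketch of how a report like the one above can be produced, assuming it comes from scikit-learn's `classification_report` (the column layout matches). It again uses the `load_digits` stand-in, so its numbers and support differ from the MNIST run.

```python
from sklearn.datasets import load_digits
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_digits(return_X_y=True)  # stand-in for MNIST
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# k = 3 as found above; raw pixel values, no preprocessing
knn = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)
y_pred = knn.predict(X_test)

# Text report: one row per digit plus "accuracy", "macro avg" and "weighted avg"
print(classification_report(y_test, y_pred))

# The same numbers as a dict, e.g. to read the weighted-average row directly
report = classification_report(y_test, y_pred, output_dict=True)
weighted = report["weighted avg"]
print(weighted["precision"], weighted["recall"], weighted["f1-score"], weighted["support"])
```

The `support` column counts the samples the report was computed on, so comparing it with the size of the intended evaluation split is a quick sanity check.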

# Dataset optimization

## Standardization

### Standard

Applying `StandardScaler` to the MNIST dataset did not seem to change the outcome, so we omitted standardization.

This is probably because the MNIST dataset has already been preprocessed for use in machine learning.
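
To check this observation more directly, one could compare kNN with and without `StandardScaler` on the same split. This is a minimal sketch assuming scikit-learn, k = 3 and the `load_digits` stand-in; it measures, but does not decide, whether standardization matters on MNIST.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)  # stand-in for MNIST
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Baseline: kNN directly on raw pixel values
raw = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)

# Same classifier, but each pixel is standardized to zero mean / unit variance.
# The pipeline fits the scaler on the training data only, so no test
# statistics leak into the model.
scaled = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))
scaled.fit(X_train, y_train)

acc_raw = raw.score(X_test, y_test)
acc_scaled = scaled.score(X_test, y_test)
print(f"raw: {acc_raw:.4f}  standardized: {acc_scaled:.4f}")
```

Because kNN relies on Euclidean distances, standardization rescales how much each pixel contributes to the distance, so whether it helps is an empirical question. Running the same comparison on the real MNIST split would give a concrete reference for the corresponding TODO item.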

### MinMax

Needs to be updated.

## Feature selection

To be tested

## Feature reduction

### PCA

Running PCA and plotting the explained variance against the number of components, we found that only 300 components already explain 98.64% of the variance [^1].
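
The variance-per-component numbers can be read off scikit-learn's `PCA.explained_variance_ratio_`. The sketch below (assuming scikit-learn and NumPy, with `load_digits` standing in for MNIST and `components_for` a hypothetical helper) computes the cumulative curve and the smallest number of components that reaches a given threshold.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)  # stand-in for MNIST (64 features instead of 784)

# Fit PCA with all components to get the full explained-variance spectrum
pca = PCA().fit(X)

# Cumulative fraction of the total variance explained by the first n components;
# this is the curve typically plotted as "components vs. variance"
cumulative = np.cumsum(pca.explained_variance_ratio_)


def components_for(threshold: float) -> int:
    """Smallest number of components whose cumulative explained variance reaches threshold."""
    # searchsorted returns the first index i with cumulative[i] >= threshold
    return int(np.searchsorted(cumulative, threshold) + 1)


for threshold in (0.90, 0.95, 0.99):
    print(f"{threshold:.0%} variance -> {components_for(threshold)} components")
```

Plotting `cumulative` against `range(1, len(cumulative) + 1)`, for example with matplotlib, gives the components-vs.-variance curve. A threshold of exactly 1.0 should be avoided with this helper, since floating-point rounding can leave the curve just below 1.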

Testing further, we reached a cumulative explained variance of 99.99999999999992% at 709 components, the same value as with all 784 components (the original number of features, one per pixel of the 28×28 images). This means that using 709 components instead of 784 loses no or only negligible variance/information [^2].
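
One plausible explanation for the plateau at 709 components, stated here as an assumption rather than something verified in these notes, is that some pixels (e.g. near the image borders) have the same value in every image. Constant features carry no variance, so the rank of the centered data is at most the number of non-constant pixels, and every component beyond that rank explains (numerically) zero variance. The sketch below checks this on the `load_digits` stand-in; the same count on the MNIST data would confirm or refute it. It also shows that the final cumulative value can end slightly off from exactly 1 because of floating-point rounding.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)  # stand-in for MNIST

# Pixels whose value never changes across the dataset carry zero variance
n_constant = int(np.sum(X.var(axis=0) == 0))
n_informative = X.shape[1] - n_constant

# Rank of the mean-centered data: an upper bound on the number of
# components that can explain any variance at all
rank = int(np.linalg.matrix_rank(X - X.mean(axis=0)))

cumulative = np.cumsum(PCA().fit(X).explained_variance_ratio_)

print(f"{n_constant} constant pixels, rank of centered data = {rank}")
# After `rank` components the curve is flat: the remaining components add ~0
print(repr(cumulative[rank - 1]), repr(cumulative[-1]))
```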

For now, we simply use `n_components=709`.
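
Putting the chosen settings together, PCA and kNN can be chained in a scikit-learn pipeline so that PCA is fitted on the training data only. This is a sketch assuming scikit-learn; `build_model` is a hypothetical helper, and because the `load_digits` stand-in has only 64 features, it uses 40 components as a purely illustrative value in place of 709.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

X, y = load_digits(return_X_y=True)  # stand-in for MNIST
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)


def build_model(n_components, k=3):
    """Hypothetical helper: PCA followed by kNN, both fitted on the training data only."""
    return make_pipeline(PCA(n_components=n_components), KNeighborsClassifier(n_neighbors=k))


# On MNIST this would be build_model(709); the 64-feature stand-in needs a
# smaller, purely illustrative value.
model = build_model(n_components=40).fit(X_train, y_train)
print(f"accuracy with PCA: {model.score(X_test, y_test):.4f}")
```

Passing a float instead, e.g. `PCA(n_components=0.9864, svd_solver="full")`, lets scikit-learn choose the number of components needed to explain more than that fraction of the variance. Since 709 components keep essentially all of the variance, the kNN distances after PCA should be almost identical to those on the raw pixels, so the gain is more likely in speed (fewer dimensions per distance computation) than in accuracy.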

### LDA

To be tested

# TODO

- [ ] Look up the purpose of the covariance matrix and how it works
  - https://www.youtube.com/watch?v=152tSYtiQbw
  - Probably part of PCA
- [ ] Find a reference for standardization not changing the classifier's results
- [ ] Find a reference for MNIST already being standardized
- [ ] Test standardization methods other than `StandardScaler`
- [ ] Test feature reduction methods other than `PCA` (e.g. LDA (Linear Discriminant Analysis))
  - https://en.wikipedia.org/wiki/Dimensionality_reduction
  - https://towardsdatascience.com/is-lda-a-dimensionality-reduction-technique-or-a-classifier-algorithm-eeed4de9953a
  - https://medium.com/machine-learning-researcher/dimensionality-reduction-pca-and-lda-6be91734f567
  - https://towardsdatascience.com/dimensionality-reduction-does-pca-really-improve-classification-outcome-6e9ba21f0a32
- [ ] Add a feature selection process
  - https://scikit-learn.org/stable/modules/feature_selection.html
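
On the covariance-matrix item: PCA can be computed from the eigendecomposition of the covariance matrix of the centered data, where the eigenvectors are the principal axes and the eigenvalues are the variances along them. The NumPy sketch below (random data as a hypothetical placeholder) does this step by step and checks it against scikit-learn's `PCA`.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Hypothetical data: 200 samples, 5 correlated features
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))

# 1. Center each feature
Xc = X - X.mean(axis=0)

# 2. Sample covariance matrix (features x features), normalized by n - 1
cov = Xc.T @ Xc / (X.shape[0] - 1)  # same as np.cov(X, rowvar=False)

# 3. Eigendecomposition; eigh is for symmetric matrices and returns
#    eigenvalues in ascending order, so reorder to largest first
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# 4. Project the centered data onto the principal axes (columns of eigvecs)
scores = Xc @ eigvecs

# Fraction of the total variance explained by each component
explained_ratio = eigvals / eigvals.sum()

# Compare with scikit-learn
pca = PCA().fit(X)
print(np.allclose(eigvals, pca.explained_variance_))
print(np.allclose(explained_ratio, pca.explained_variance_ratio_))
```

scikit-learn may instead compute an SVD of the centered data (depending on `svd_solver`), which yields the same components up to sign.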

[^1]: https://medium.com/@miat1015/mnist-using-pca-for-dimension-reduction-and-also-t-sne-and-also-3d-visualization-55084e0320b5

[^2]: Could be due to floating-point rounding in Python.