Too many changes to properly fit in a commit msg, changes will be discussed on discord or simply ask me
48
0-pilot-project/Process.md
Normal file
@@ -0,0 +1,48 @@
# Using kNN

## Finding optimal k-Value

Through testing on the original dataset (split 80:20), we found that the optimal k-value is 3.

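A minimal sketch of how such a k-value search could look, assuming scikit-learn is used. The helper name `find_best_k`, the candidate range, and the seed are hypothetical, and the bundled 8x8 digits dataset stands in for MNIST so the example runs without a download. It is not the exact procedure used for the result above.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier


def find_best_k(X, y, candidates=range(1, 12, 2), seed=0):
    """Return (best_k, scores) from a single 80:20 hold-out split.

    `candidates` and `seed` are illustrative defaults, not values from the notes.
    """
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed, stratify=y
    )
    scores = {}
    for k in candidates:
        model = KNeighborsClassifier(n_neighbors=k)
        model.fit(X_train, y_train)
        # score() returns the mean accuracy on the held-out 20%
        scores[k] = model.score(X_test, y_test)
    # Pick the candidate with the highest hold-out accuracy
    best_k = max(scores, key=scores.get)
    return best_k, scores


if __name__ == "__main__":
    # Stand-in data: scikit-learn's small digits set, NOT the full MNIST set
    X, y = load_digits(return_X_y=True)
    best_k, scores = find_best_k(X, y)
    print(best_k, scores)
```

A single hold-out split can be noisy; repeating the search with several seeds or with cross-validation would show whether the chosen k is stable.
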
Running kNN on the dataset without any preprocessing results in (columns: precision, recall, f1-score, support):

> weighted avg 0.97 0.97 0.97 56000

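The "weighted avg" row above has the shape of scikit-learn's `classification_report` output. The following sketch, under that assumption, shows how such a row can be produced and read programmatically. The digits dataset is again a stand-in for MNIST, so the numbers will differ from the ones quoted above.

```python
from sklearn.datasets import load_digits
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Stand-in data (8x8 digits), split 80:20 with no preprocessing
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

knn = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)
y_pred = knn.predict(X_test)

# Human-readable table with per-class rows plus the averaged rows
print(classification_report(y_test, y_pred))

# Same report as a dict, convenient for logging a single row
report = classification_report(y_test, y_pred, output_dict=True)
weighted = report["weighted avg"]  # keys: precision, recall, f1-score, support
print(weighted)
```

The `support` value in the weighted row is the number of samples the report was computed on, which makes it a quick check of which split was evaluated.
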
# Dataset optimization

## Standardization

### Standard

Applying `StandardScaler` to the MNIST dataset did not seem to change the outcome, so we omitted standardization.
The reason is probably that the MNIST dataset was already optimized for processing.

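One way to check the observation above is to score the same classifier with and without standardization on an identical split. This is a sketch assuming scikit-learn; the helper `compare_scaling` is hypothetical, and the digits dataset stands in for MNIST, so it does not reproduce the result reported here.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def compare_scaling(X, y, k=3, seed=0):
    """Return (accuracy_raw, accuracy_standardized) on the same 80:20 split."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed, stratify=y
    )
    raw = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    # The pipeline fits the scaler on the training data only,
    # then applies the same transform to the test data.
    scaled = make_pipeline(
        StandardScaler(), KNeighborsClassifier(n_neighbors=k)
    ).fit(X_train, y_train)
    return raw.score(X_test, y_test), scaled.score(X_test, y_test)


if __name__ == "__main__":
    X, y = load_digits(return_X_y=True)  # stand-in for MNIST
    acc_raw, acc_scaled = compare_scaling(X, y)
    print(f"raw: {acc_raw:.4f}  standardized: {acc_scaled:.4f}")
```

Because kNN is distance-based, per-feature rescaling can change which neighbours are nearest, so comparing both scores directly is more reliable than assuming they match.
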
### MinMax

Needs to be updated.

## Feature selection

To be tested

## Feature reduction

### PCA

Testing with PCA and plotting the number of components against the explained variance, we found that 98.64% of the variance could be achieved with only 300 components [^1].

Testing further, a variance of 99.99999999999992% was achieved at 709 components, the same value as with all 784 components (the original number of features). This means that no or only minimal variance (information) is lost when using 709 components instead of 784 [^2].

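The component-versus-variance analysis described above can be sketched as follows, assuming scikit-learn's `PCA`. The helper `components_for_variance` is hypothetical, and the digits dataset (64 features) stands in for MNIST (784 features), so the component counts will not match the ones reported here.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA


def components_for_variance(X, threshold):
    """Smallest number of components whose cumulative explained
    variance ratio reaches `threshold` (a fraction such as 0.98).

    Returns (n_components, cumulative_ratios).
    """
    pca = PCA().fit(X)  # no n_components given: keep all components
    cumulative = np.cumsum(pca.explained_variance_ratio_)
    # searchsorted gives the first index with cumulative >= threshold
    n = int(np.searchsorted(cumulative, threshold)) + 1
    # Cap at the total count in case rounding keeps the sum just below threshold
    return min(n, len(cumulative)), cumulative


if __name__ == "__main__":
    X, _ = load_digits(return_X_y=True)  # stand-in for MNIST
    n, cumulative = components_for_variance(X, 0.98)
    print(n, cumulative[n - 1])
    # The full sum is 1.0 only up to floating-point rounding
    print(repr(cumulative[-1]))
```

Features that are constant across all samples contribute no variance, which is one possible reason why the cumulative curve can saturate before the full number of components is reached. scikit-learn also accepts a fraction directly, for example `PCA(n_components=0.98)`, to select the component count by explained variance.
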
For now we will simply go with an `n_components` of 709.

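A sketch of how a fixed `n_components` could be combined with kNN, assuming scikit-learn pipelines. The helper `pca_knn_accuracy` is hypothetical. Since the stand-in digits dataset has only 64 features, the example uses an illustrative value of 30 components rather than 709.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline


def pca_knn_accuracy(X, y, n_components, k=3, seed=0):
    """Hold-out accuracy of kNN applied to PCA-reduced features."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed, stratify=y
    )
    # PCA is fitted on the training split only and reused for the test split
    model = make_pipeline(
        PCA(n_components=n_components),
        KNeighborsClassifier(n_neighbors=k),
    )
    model.fit(X_train, y_train)
    return model.score(X_test, y_test)


if __name__ == "__main__":
    X, y = load_digits(return_X_y=True)  # stand-in for MNIST
    # 30 is an illustrative value for the 64-feature stand-in dataset
    print(pca_knn_accuracy(X, y, n_components=30))
```

Running the helper for several component counts makes it possible to see whether a much smaller value than the near-lossless one costs any accuracy.
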
### LDA

To be tested

# TODO

- [ ] Look up the point of the covariance matrix and how it works
  - https://www.youtube.com/watch?v=152tSYtiQbw
  - Probably part of PCA
- [ ] Reference for standardization not changing results of classifier
- [ ] Reference for MNIST already being standardized
- [ ] Test standardization method other than `StandardScaler`
- [ ] Test feature reduction method other than `PCA` (e.g. LDA (Linear Discriminant Analysis))
  - https://en.wikipedia.org/wiki/Dimensionality_reduction
  - https://towardsdatascience.com/is-lda-a-dimensionality-reduction-technique-or-a-classifier-algorithm-eeed4de9953a
  - https://medium.com/machine-learning-researcher/dimensionality-reduction-pca-and-lda-6be91734f567
  - https://towardsdatascience.com/dimensionality-reduction-does-pca-really-improve-classification-outcome-6e9ba21f0a32
- [ ] Add feature selection process
  - https://scikit-learn.org/stable/modules/feature_selection.html

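As a starting point for the covariance matrix item, the following NumPy sketch shows how the covariance matrix relates to PCA: the eigenvectors of the feature covariance matrix are the principal directions, and the eigenvalues are the variances along them. The function name `pca_via_covariance` and the synthetic data are assumptions made for illustration only.

```python
import numpy as np


def pca_via_covariance(X):
    """Eigen-decompose the feature covariance matrix of X.

    Returns (eigenvalues, eigenvectors), sorted by decreasing eigenvalue.
    Column i of `eigenvectors` is the i-th principal direction.
    """
    X_centered = X - X.mean(axis=0)
    # rowvar=False: columns are features, rows are samples
    cov = np.cov(X_centered, rowvar=False)
    # eigh is meant for symmetric matrices; eigenvalues come back ascending
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    order = np.argsort(eigenvalues)[::-1]
    return eigenvalues[order], eigenvectors[:, order]


# Synthetic correlated features, purely for illustration
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))

eigenvalues, eigenvectors = pca_via_covariance(X)
# Share of the total variance captured by each principal direction
explained_ratio = eigenvalues / eigenvalues.sum()
print(explained_ratio)
```

The `explained_ratio` values play the same role as the per-component explained variance ratio used in the PCA section.
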
[^1]: https://medium.com/@miat1015/mnist-using-pca-for-dimension-reduction-and-also-t-sne-and-also-3d-visualization-55084e0320b5
[^2]: Could be due to floating-point rounding in Python