Too many changes to fit properly in a commit message; they will be discussed on Discord, or simply ask me

2021-05-17 01:34:11 +00:00
parent 04d9431cae
commit 62fae7c77b
7 changed files with 1976 additions and 633 deletions
--- a/0-pilot-project/Process.md
+++ b/0-pilot-project/Process.md
@@ -0,0 +1,48 @@
+# Using kNN
+## Finding optimal k-Value
+Testing on the original dataset (80:20 split) showed that the optimal k-value is 3.
+
+Running kNN on the dataset without any preprocessing results in (columns: precision, recall, f1-score, support):
+> weighted avg       0.97      0.97      0.97     56000
+
+# Dataset optimization
+## Standardization
+### Standard
+Applying `StandardScaler` to the MNIST dataset did not seem to change the outcome, so we omitted standardization.
+The probable reason is that the MNIST dataset has already been prepared for processing.
+
+### MinMax
+Needs to be updated.
+
+## Feature selection
+To be tested
+
+## Feature reduction
+### PCA
+Testing with PCA and plotting the number of components against the variance, we found that 98.64% of the variance could be retained with only 300 components [^1].
+
+Further testing showed that 709 components reach a variance of 99.99999999999992%, the same value as with all 784 components (the original number of features). This means that little or no variance (information) is lost when using 709 components instead of 784 [^2].
+
+For now we will simply use `n_components=709`.
+
+### LDA
+To be tested
+
+# TODO
+- [ ] Look up the purpose of the covariance matrix and how it works
+    - https://www.youtube.com/watch?v=152tSYtiQbw
+    - Probably part of PCA
+- [ ] Find a reference for standardization not changing the results of the classifier
+- [ ] Find a reference for MNIST already being standardized
+- [ ] Test a standardization method other than `StandardScaler`
+- [ ] Test a feature reduction method other than `PCA` (e.g. LDA, Linear Discriminant Analysis)
+    - https://en.wikipedia.org/wiki/Dimensionality_reduction
+    - https://towardsdatascience.com/is-lda-a-dimensionality-reduction-technique-or-a-classifier-algorithm-eeed4de9953a
+    - https://medium.com/machine-learning-researcher/dimensionality-reduction-pca-and-lda-6be91734f567
+    - https://towardsdatascience.com/dimensionality-reduction-does-pca-really-improve-classification-outcome-6e9ba21f0a32
+- [ ] Add feature selection process
+    - https://scikit-learn.org/stable/modules/feature_selection.html
+
+
+[^1]: https://medium.com/@miat1015/mnist-using-pca-for-dimension-reduction-and-also-t-sne-and-also-3d-visualization-55084e0320b5
+[^2]: Could be due to floating-point rounding in Python
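The PCA notes in `Process.md` pick component counts (300, 709) by looking at the variance they retain. A minimal sketch of making that choice programmatically is given here, assuming the per-component ratios come from something like scikit-learn's `PCA().fit(X).explained_variance_ratio_`. The helper name `components_for_variance`, its `tol` parameter, and the toy ratios are hypothetical and not part of the project.

```python
from itertools import accumulate


def components_for_variance(explained_variance_ratio, target, tol=1e-12):
    """Return the smallest number of leading components whose cumulative
    explained-variance ratio reaches `target` (a fraction in (0, 1]).

    `tol` absorbs floating-point summation error, which can leave the
    cumulative sum marginally below the value it should reach.
    """
    if not 0.0 < target <= 1.0:
        raise ValueError("target must be in (0, 1]")
    running_totals = accumulate(explained_variance_ratio)
    for count, cumulative in enumerate(running_totals, start=1):
        if cumulative >= target - tol:
            return count
    raise ValueError("target variance is not reached by the given components")


# Toy ratios (hypothetical, chosen to be exactly representable as floats).
toy_ratios = [0.5, 0.25, 0.125, 0.0625, 0.0625]
print(components_for_variance(toy_ratios, 0.90))  # 4 components reach 90%
print(components_for_variance(toy_ratios, 1.00))  # all 5 are needed for 100%
```

With real data, `toy_ratios` would be replaced by the fitted ratios, sorted from largest to smallest as PCA returns them. scikit-learn's `PCA` can also take a fraction directly, e.g. `PCA(n_components=0.98, svd_solver="full")`, which selects the number of components needed to explain that share of the variance.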