shakes76 · sully36 · Oct 13, 2022 · Oct 13, 2022 · Oct 13, 2022 · Oct 13, 2022
diff --git a/.DS_Store b/.DS_Store
diff --git a/.idea/Ass3_Report.iml b/.idea/Ass3_Report.iml
diff --git a/.idea/encodings.xml b/.idea/encodings.xml
diff --git a/.idea/inspectionProfiles/profiles_settings.xml b/.idea/inspectionProfiles/profiles_settings.xml
diff --git a/.idea/misc.xml b/.idea/misc.xml
diff --git a/.idea/modules.xml b/.idea/modules.xml
diff --git a/.idea/workspace.xml b/.idea/workspace.xml
diff --git a/recognition/.DS_Store b/recognition/.DS_Store
diff --git a/recognition/45799930/.DS_Store b/recognition/45799930/.DS_Store
diff --git a/recognition/45799930/README.md b/recognition/45799930/README.md
@@ -0,0 +1,90 @@
+# Segmentation of the ISIC Dataset with the Improved UNet
+
+---
+
+Author: Jessica Sullivan
+
+Student id: 45799930
+
+Assignment: COMP3710 Report Semester 2, 2022
+
+---
+
+## Description of the Algorithm and Problem it Solves
+
+The ISIC dataset contains images of skin lesions created by the International Skin Imaging collection (ISIC). They come out with a new dataset every year, which has become a major machine learning task to help identify skin cancer and to attempt to define whether the skin lesions are malignant (as stated [here](https://pubmed.ncbi.nlm.nih.gov/34852988/#:~:text=The%20International%20Skin%20Imaging%20Collaboration,cancer%20detection%20and%20malignancy%20assessment.)). The task given was to segment this data using the improved UNet and to ensure that all labels have a minimum dice coefficient of 0.8. UNet is an algorithm that segments the data using a convolutional neural network, which means that it categorises parts of an image based on what the algorithm has learned by training itself. The ISIC dataset comes with training data and ground truth training data. The ground truth shows the mask we want the training data to be once it has passed through the neural network. An example of this is:
+
+<img src="images/original.jpg" height="250px" width="250px" title="Original Image"/> <img src="images/mask.png" height="250px" width="250px" title="mask"/>
+
+where in the second image (mask) the black is the background and the white is the skin lesion. Now the minimum dice coefficient represents how accurate our model is. In this case, it would be how much the segmented regions created by the model overlap with the mask (ground truth data). Therefore, the higher the dice coefficient, the greater the accuracy of the model, and a dice coefficient of greater the 0.8 means that more than 80% of the time, the model is correct. The benefit of this model is that being able to detect skin lesions through images accurately would help further advances in medicine, allowing the model to identify the problem areas on the skin.
+
+## How the Algorithm Works
+
+The model is based of the improved unet structure. This structure, based upon the original UNet structure has been updated in an attempt to make the model more accurate. The structure of the  unet model looks like:
+
+<img src="images/unet.png"  title="Original Image"/>
+
+This model demonstrates a 'U' shape, where going down at the beginning of the U is contracting the image and the upside is expanding the image back out (based on [this](https://towardsdatascience.com/unet-line-by-line-explanation-9b191c76baf5)). The contraction of the image reduces the size of the image to classify the pixels. It does this by doing multiple 2D convolutions with a 3x3 kernel. Once two convolutions have been done per layer, a max pooling is added to reduce the dimensions. Once the base layer has been reached, after the two convolutions, an upsampling is done to increase the dimensions back up. Once you have reached the top layer, convolution with a kernel size of 1x1 is done to finish the process. When downsizing the sample (max pooling), information will be lost, so to reduce the information that is lost when you go up a layer (upsampling), the output of the convolutions of the same layer while downsizing the model is concatenated to the current layer before completing the two convolutions.
+
+Whereas the imroved UNet structure is:
+
+<img src="images/improved_unet.png"  title="Original Image"/>
+
+Now, this is very similar. However, there are a couple of key things that have been modified. The first of the major things that have been modified is the localisation module. The localisation model in the improved UNet is how a lower level is manipulated to a higher spatial resolution (referenced from [here](https://arxiv.org/pdf/1802.10508v1.pdf)) by upsampling and then having a convolution model afterwards. The second major difference is the segmentation layers. The purpose of these is to try to retain as. much information as we can from the lower levels while we move upwards. This involved performing a convolution with a 3x3 kernel and upsampling before doing an element-wise sum with the beginning of the segmentation layer above.
+
+---
+
+## Dependencies
+
+### Versions Required:
+
+```commandline
+Tensorflow: 2.10.0
+Matplotlib: 3.5.3
+```
+
+### Address Reproducibility:
+
+To ensure that the code can run, you will need to download the training dataset and the truth training dataset, where you should download the one in the first row, which has the binary masks in PNG format. These can be downloaded from [here](https://challenge.isic-archive.com/data/#2017). Once downloaded, these folders should be moved to the recognition/45799930 directory, keeping the same names made when created. Therefore the directories that should have been added are:
+
+* ISIC-2017_Training_Data
+* ISIC-2017_Training_Part1_GroundTruth
+
+### Running The Code:
+
+Once the dependancies have been set up and the files have been addedas described in Address Reproducibility then the code can be run by running the predict file. Please be aware that when i run the code, about 50% of the time it comes up with an error, however re-running the program will fix it. 
+
+---
+
+## Justification
+
+### Specific Pre-Processing
+
+Some pre-processing was done on the data to ensure that the images were downloaded and processed correctly, that they were the same size and that the colouring was correct. This was all done in the `dataset.py`file. Once the image was read by using the correct pathway if it was decoded depending on the file type (jpeg for images and png for the truth images), this was processed to ensure that all images were of the size (256, 256). After ensuring that we were only dealing with elements of type`tensorflow.float32`, the image was normalised by dividing by 255.
+
+### Training, Validation and Testing Splits
+
+As only the training data from the link provided was downloaded (and its corresponding truth values), the data with split into training, validation and testing sets. The ratio chosen was 80% of the data allocated to training the data, 10% allocated to validating the data, and the final 10% to testing the data. The initial dataset was shuffled (combined with the truth so that they are still at corresponding parts of their respective tensors) so that it was a random 80% of the data selected for the training, and the 10% allocated to the validation and testing data. This ratio was chosen as it is ideal to have as much data as possible to train the model so that the model can become as accurate as possible.
+
+---
+
+## Examples
+
+### Example Input:
+
+No input is needed for this model at the moment as this is to train, validate and test the model, all from the ISIC 2018 dataset, so nothing needs to be input. However, a future adaption of this code could be that once the model is trained and tested, input in the form of an image or a dataset of images could be asked for, which the mask would then be predicted. This would give it a more real-world application and could begin its use within the medical field.
+
+### Example Output:
+
+When running the predict method the output of the epoch was:
+
+<img src="images/epoch_result.png"  title="Original Image"/>
+
+which led to these graphs:
+
+
+<img src="images/accuracy.png" height="350px" width="500px" title="Original Image"/> <img src="images/Loss.png" height="350px" width="500px" title="mask"/>
+
+Now it is quite clear here that the implementation of the Dice Similarity Coefficient is incorrect. It should not be constant for all the epoches. However when looking at the model loss its clearly visible that as the model is more trained, the information that is lost after processing is less and less. 
+
+---
diff --git a/recognition/45799930/dataset.py b/recognition/45799930/dataset.py
@@ -0,0 +1,94 @@
+import glob
+import tensorflow as tf
+from sklearn.utils import shuffle
+from math import floor
+
+"""
+Create a class to download and sort the ISIC data set. We shall download the training, test and validation data as well 
+as all their truth data. 
+"""
+
+
+class DataSet:
+
+    def __init__(self):
+        self.validate = None
+        self.testing = None
+        self.training = None
+        self.download_dataset()
+        self.image_shape = (256, 256)
+
+    def download_dataset(self):
+        """
+        This sets up the datasets we need for training, testing or validating. We need both the dataset and the truth
+        sets so that we know what it should be after processing. They have been combined in a multidimensional
+        tensor.
+
+        :return:
+
+        bool: True if all data sets and their respective truth data sets are the same size, false otherwise.
+
+        """
+        # get all the image paths for the different sets - training testing and validation. Then sort these in order so
+        # that they are in the same order (truth and initial).
+        training_truth_filenames = sorted(glob.glob('./ISIC-2017_Training_Part1_GroundTruth/*.png'))
+        training_filenames = sorted(glob.glob('./ISIC-2017_Training_Data/*.jpg'))
+
+        train, truth = shuffle((training_filenames, training_truth_filenames))
+        length = len(train)
+        training_truth_filenames = truth[:floor(length * 0.8)]
+        training_filenames = train[:floor(length * 0.8)]
+        testing_truth_filenames = truth[floor(length * 0.8):floor(length * 0.9)]
+        testing_filenames = train[floor(length * 0.8):floor(length * 0.9)]
+        validate_truth_filenames = truth[floor(length * 0.9):]
+        validate_filenames = train[floor(length * 0.9):]
+
+        # convert this into tensorflow array
+        self.training = tf.data.Dataset.from_tensor_slices((training_filenames, training_truth_filenames))
+        self.testing = tf.data.Dataset.from_tensor_slices((testing_filenames, testing_truth_filenames))
+        self.validate = tf.data.Dataset.from_tensor_slices((validate_filenames, validate_truth_filenames))
+
+        self.training = self.training.map(self.pre_process)
+        self.testing = self.testing.map(self.pre_process)
+        self.validate = self.validate.map(self.pre_process)
+
+        if len(training_truth_filenames) != len(training_filenames) or len(testing_filenames) != len(
+                testing_truth_filenames) or len(validate_truth_filenames) != len(validate_filenames):
+            return False
+        return True
+
+    def pre_process(self, image, truth_image):
+        """
+        Need to preprocess the data as all we have right now is a location of the image and the truth image. Do this by
+        reading the file, decoding the jpeg or png respectively. We check to ensure that all the images are the same
+        size. Then cast them to make sure in the same form I cast it to a float.
+
+        :param image: the path to the image.
+        :param truth_image: the path to the truth image
+        :return: the processed image and truth image.
+        """
+        image = tf.io.read_file(image)
+        # todo: do i need to change the channels? 0 is the number used in the jpeg
+        image = tf.io.decode_jpeg(image, channels=3)
+        image = tf.image.resize(image, (256, 256))
+        image = tf.cast(image, tf.float32) / 255.
+
+        truth_image = tf.io.read_file(truth_image)
+        # todo: do i need to change the channels? 0 is the number used in the jpeg
+        truth_image = tf.io.decode_png(truth_image, channels=0)
+        truth_image = tf.image.resize(truth_image, (256, 256))
+        truth_image = tf.cast(truth_image, tf.float32) / 255.
+        return image, truth_image
+
+    def split_data(self, data, truths):
+        """
+
+        :param training_data:
+        :return:
+        """
+        length = len(data)
+        train_len = length * 0.8
+        val_test_len = length * 0.1
+        data, truths = tf.random.shuffle(data, truths)
+
+        # train, test, val = data.
diff --git a/recognition/45799930/images/.DS_Store b/recognition/45799930/images/.DS_Store
diff --git a/recognition/45799930/images/Epoch_Result.png b/recognition/45799930/images/Epoch_Result.png
diff --git a/recognition/45799930/images/Improved_UNet.png b/recognition/45799930/images/Improved_UNet.png
diff --git a/recognition/45799930/images/Loss.png b/recognition/45799930/images/Loss.png
diff --git a/recognition/45799930/images/UNet.png b/recognition/45799930/images/UNet.png
diff --git a/recognition/45799930/images/accuracy.png b/recognition/45799930/images/accuracy.png
diff --git a/recognition/45799930/images/mask.png b/recognition/45799930/images/mask.png
diff --git a/recognition/45799930/images/original.jpg b/recognition/45799930/images/original.jpg