Projects

DCGAN for Face Generation

In this project, I defined and trained a DCGAN on a dataset of faces. The goal is to get a generator network to generate new images of faces that look as realistic as possible!

#### Get the Data

I used the [CelebFaces Attributes Dataset (CelebA)](http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html) to train the adversarial networks.

#### Pre-processed Data

Each of the CelebA images has been cropped to remove parts of the image that don't include a face, then resized down to 64x64x3 NumPy images. Some sample data is shown below.

<img src='https://github.com/salehsargolzaee/DCGAN-for-face-generation/blob/main/assets/processed_face_data.png?raw=true' width=60% />

> You can download this data [by clicking here](https://s3.amazonaws.com/video.udacity-data.com/topher/2018/November/5be7eb6f_processed-celeba-small/processed-celeba-small.zip)

This is a zip file that you'll need to extract in the home directory of this notebook for further loading and processing. After extracting the data, you should be left with a directory of data, `processed_celeba_small/`.

### Network Architecture

The architecture used for the generator and the discriminator was inspired by the [original DCGAN paper](https://arxiv.org/pdf/1511.06434.pdf):

<img src="https://github.com/salehsargolzaee/DCGAN-for-face-generation/blob/main/assets/Generator.png?raw=true" alt="Network-architecture" width="620"/>

**I have also used the same hyperparameters mentioned in this paper.**

The loss functions were inspired by the LSGAN paper.

> The binary cross-entropy loss function may lead to the vanishing gradients problem during the learning process. To overcome this problem, I've used a least-squares loss function for the discriminator. This structure is also referred to as a least-squares GAN or LSGAN, and you can [read the original paper on LSGANs, here](https://arxiv.org/pdf/1611.04076.pdf). The authors show that LSGANs are able to generate higher quality images than regular GANs and that this loss type is a bit more stable during training!

Finally, the last layer of the discriminator was inspired by PatchGAN. A PatchGAN discriminator has fewer parameters, runs faster, and classifies each patch of an image as real or fake rather than the image as a whole. You can read more about PatchGAN in this paper: [Image-to-Image Translation with Conditional Adversarial Networks](https://arxiv.org/pdf/1611.07004.pdf).

### Results

You can see the loss values below:

<img src="https://github.com/salehsargolzaee/DCGAN-for-face-generation/blob/main/assets/losses.png?raw=true" alt="losses" width="620"/>

Some of the faces generated after `25 epochs` are shown below. As you can see, they're not the most realistic faces in the world, but I argue that they are fantastic, considering that:

1. I didn't spend much time experimenting with hyperparameters,
2. The model is not very deep,
3. The number of epochs is relatively low,
4. And the initial input is small.

<img src="https://github.com/salehsargolzaee/DCGAN-for-face-generation/blob/main/assets/generated_faces.png?raw=true" alt="generated faces" width="620"/>
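
The least-squares losses mentioned under Network Architecture can be sketched without any framework. This is a minimal illustration, not the notebook's code: the function names, the targets of 1 for real and 0 for fake, and the use of the same least-squares form for the generator are assumptions based on the usual LSGAN setup, with constant factors omitted. The discriminator output is represented as a flat list of per-patch scores.

```python
def _mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


def real_loss(d_out):
    """Least-squares loss for scores that should be classified as real (target 1)."""
    return _mean((score - 1.0) ** 2 for score in d_out)


def fake_loss(d_out):
    """Least-squares loss for scores that should be classified as fake (target 0)."""
    return _mean(score ** 2 for score in d_out)


def discriminator_loss(d_real_out, d_fake_out):
    # The discriminator wants real scores near 1 and fake scores near 0.
    return real_loss(d_real_out) + fake_loss(d_fake_out)


def generator_loss(d_fake_out):
    # The generator wants the discriminator to score its fakes as real.
    return real_loss(d_fake_out)


# Hypothetical per-patch discriminator scores for one real and one generated image.
d_real = [0.9, 0.8, 1.0, 0.7]
d_fake = [0.1, 0.3, 0.0, 0.2]

print("discriminator loss:", discriminator_loss(d_real, d_fake))
print("generator loss:", generator_loss(d_fake))
```

Unlike binary cross-entropy, the squared error keeps growing as a score moves away from its target, so samples that are already on the correct side of the decision boundary still contribute a gradient.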

LSTM for Sentiment Analysis

In this notebook, I implemented a recurrent neural network (long short-term memory, LSTM) using PyTorch that performs sentiment analysis. I used a dataset of Amazon baby product reviews, accompanied by product names and ratings. You can find it [here](https://www.kaggle.com/ronnie3rg/amazon-baby-sentiment-analysis).

#### Network Architecture

The architecture for this network is shown below.

<img src="https://github.com/salehsargolzaee/LSTM-for-Sentiment-Analysis/blob/main/assets/network_raedme.png?raw=true" alt="Network-architecture" width="620"/>

The layers are as follows:

1. An embedding layer that converts our word tokens (integers) into embeddings of a specific size.
2. An LSTM layer defined by a hidden_state size and number of layers.
3. A fully-connected output layer that maps the LSTM layer outputs to a desired output_size.
4. A sigmoid activation layer that maps every output to a value between 0 and 1; only the last sigmoid output is returned as the output of this network.

___

**It is not possible to push the model's `state_dict` here due to its size. If you need it, feel free to [contact](mailto:salehsargolzaee@gmail.com) me.**

#### Dataset

It's a `CSV` file consisting of reviews of Amazon baby products. You can download it from [`Kaggle`](https://www.kaggle.com/datasets/ronnie3rg/amazon-baby-sentiment-analysis?select=amazon_baby.csv). It consists of product names, reviews, and the rating associated with each review.

Below, you can see the dataframe info (3 data columns in total):

| Column | Non-null count | Dtype |
| ----- | ----- | ----- |
| name | 183213 non-null | object |
| review | 182702 non-null | object |
| rating | 183531 non-null | int64 |

Head of the data:

| name | review | rating |
|---|---|---|
| Planetwise Flannel Wipes | These flannel wipes are OK, but in my opinion ... | 3 |
| Planetwise Wipe Pouch | it came early and was not disappointed. i love... | 5 |
| Annas Dream Full Quilt with 2 Shams | Very soft and comfortable and warmer than it l... | 5 |
| Stop Pacifier Sucking without tears with Thumb... | This is a product well worth the purchase. I ... | 5 |
| Stop Pacifier Sucking without tears with Thumb... | All of my kids have cried non-stop when I trie... | 5 |
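
The embedding layer described above expects each review as a sequence of integer tokens. The notebook's own preprocessing is not shown here, so the following is only a sketch of one common approach. The helper names `tokenize`, `build_vocab`, `encode` and `pad_features`, the sequence length, and the toy reviews are all illustrative assumptions. Index 0 is reserved for padding.

```python
from collections import Counter
from string import punctuation


def tokenize(text):
    """Lowercase, strip ASCII punctuation, and split on whitespace."""
    cleaned = "".join(ch for ch in text.lower() if ch not in punctuation)
    return cleaned.split()


def build_vocab(reviews):
    """Map each word to an integer, most frequent first; 0 is kept for padding."""
    counts = Counter(word for review in reviews for word in tokenize(review))
    return {word: idx for idx, (word, _) in enumerate(counts.most_common(), start=1)}


def encode(review, vocab):
    """Convert a review to integer tokens, skipping words missing from the vocab."""
    return [vocab[word] for word in tokenize(review) if word in vocab]


def pad_features(tokens, seq_length):
    """Left-pad with zeros, or truncate, so every sequence has seq_length items."""
    tokens = tokens[:seq_length]
    return [0] * (seq_length - len(tokens)) + tokens


# Toy reviews for illustration only; these are not rows from the dataset.
reviews = ["Very soft and comfortable.", "Not soft, not worth it."]
vocab = build_vocab(reviews)
features = [pad_features(encode(r, vocab), seq_length=6) for r in reviews]
print(features)
```

Padding on the left keeps the real words at the end of each sequence, which suits a network that reads only the last output of the LSTM. Rows with a missing review (the `review` column has fewer non-null entries than `rating`) would need to be dropped or filled before this step.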

Landmark Tagging For Social Media

Photo sharing and photo storage services like to have location data for each photo that is uploaded. With the location data, these services can build advanced features, such as automatic suggestion of relevant tags or automatic photo organization, which help provide a compelling user experience. Although a photo's location can often be obtained by looking at the photo's metadata, many photos uploaded to these services will not have location metadata available. This can happen when, for example, the camera capturing the picture does not have GPS or if a photo's metadata is scrubbed due to privacy concerns.

<br/>

If no location metadata for an image is available, one way to infer the location is to detect and classify a discernible landmark in the image. Given the large number of landmarks across the world and the immense volume of images that are uploaded to photo sharing services, using human judgement to classify these landmarks would not be feasible.

## Sample results

The images below display some sample outputs of my finished project (__the top three predicted probabilities are shown on the left__):

![Sydney_Harbour_Bridge](https://github.com/salehsargolzaee/Landmark-Recognition/blob/main/images/result2.png?raw=true)

![Trevi_Fountain](https://github.com/salehsargolzaee/Landmark-Recognition/blob/main/images/result4.png?raw=true)

![Death_valley2](https://github.com/salehsargolzaee/Landmark-Recognition/blob/main/images/result0.png?raw=true)

![Gateway_of_India](https://github.com/salehsargolzaee/Landmark-Recognition/blob/main/images/result1.png?raw=true)

### Dataset

The landmark images are a subset of the Google Landmarks Dataset v2. It can be downloaded using [this link](https://udacity-dlnfd.s3-us-west-1.amazonaws.com/datasets/landmark_images.zip). You can find license information for the full dataset [on Kaggle](https://www.kaggle.com/competitions/landmark-recognition-challenge/data).
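
The top-three probabilities shown in the sample results can be obtained from a classifier's raw scores with a softmax followed by a sort. The sketch below assumes the model outputs one logit per landmark class; the function name `top_k_landmarks`, the class names, and the logit values are hypothetical and are not taken from the project code.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities, subtracting the max for numerical stability."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_landmarks(logits, class_names, k=3):
    """Return the k most probable (class name, probability) pairs, highest first."""
    probs = softmax(logits)
    ranked = sorted(zip(class_names, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical class names and logits for a single image.
class_names = ["Sydney_Harbour_Bridge", "Trevi_Fountain", "Death_Valley", "Gateway_of_India"]
logits = [2.5, 0.3, -1.0, 1.1]

for name, prob in top_k_landmarks(logits, class_names):
    print(f"{name}: {prob:.3f}")
```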