---
layout: default
title: "Kai Qu | Project"
permalink: /project/
---

<div class="container" id = "main_body">
  <div class="row">
    <div class="col-md-12">
      <div class="page_header">
        <h1 class="page_name">Projects</h1>
        <br>
      </div>
      <div class="body_content">
          <p class = "page_intro">This page lists projects and applications that I developed individually or together with other CS/CSE students.</p>


          <h5>Kaggle Cassava Leaf Disease Image Classification Task<small style="font-size: 0.8rem"> (Spring 2021)</small></h5>
          <span class="tab"><span class="group_members">Individual Project</span></span>
          <br>
          <p class="tab">Cassava Leaf Disease Classification is a multiclass image classification competition hosted on Kaggle 
            <a class="index-page-link" href="https://www.kaggle.com/c/cassava-leaf-disease-classification">(https://www.kaggle.com/c/cassava-leaf-disease-classification)</a>. In this competition, 
            the training dataset includes 21,367 labelled images of cassava leaves, each belonging to one of four disease categories or a healthy category. The task is to classify 
            each image into one of these five categories.
          </p>
          <p class="tab">
            In this project, I built an end-to-end machine learning pipeline and a three-model ensemble classifier trained on a dataset of 30,000 images (21,000 images from the competition dataset
            and 9,000 images from a related competition). The competition dataset was imbalanced, with 61.9% of the training images labelled with the CMD (Cassava Mosaic Disease) category. 
            The dataset was augmented by downsampling the majority class and upsampling the minority classes through various image scaling and color shifting operations.
            To combat the noise in the data, I applied CutMix, dataset augmentation, test-time inference, and various image transformations to make the classifier more robust.
            The ensembled model achieved 90.01% accuracy on the private dataset and ranked 84/3947 (top 3%) on the leaderboard.
          </p>
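          <p class="tab">
            As an illustration of the CutMix step mentioned above, here is a minimal NumPy sketch. It is not taken from the project repository: the function name <code>cutmix</code>, 
            the channels-last batch layout, the one-hot labels, and the <code>alpha</code> default are all assumptions made for this example.
          </p>

```python
import numpy as np


def cutmix(images, labels, alpha=1.0, rng=None):
    """Illustrative CutMix on one batch (hypothetical helper, not the project's code).

    images: float array of shape (N, H, W, C)
    labels: one-hot float array of shape (N, K)
    """
    if rng is None:
        rng = np.random.default_rng()
    n, h, w, _ = images.shape

    # Sample the mixing ratio and a random pairing of images within the batch.
    lam = rng.beta(alpha, alpha)
    perm = rng.permutation(n)

    # Pick a box covering roughly (1 - lam) of the image, clipped to the borders.
    cut_h = int(h * np.sqrt(1.0 - lam))
    cut_w = int(w * np.sqrt(1.0 - lam))
    cy, cx = int(rng.integers(h)), int(rng.integers(w))
    y1, y2 = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, h)
    x1, x2 = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, w)

    # Paste the box from each paired image into the original image.
    mixed = images.copy()
    mixed[:, y1:y2, x1:x2, :] = images[perm, y1:y2, x1:x2, :]

    # Recompute the ratio from the area actually pasted, then mix the labels.
    lam = 1.0 - (y2 - y1) * (x2 - x1) / (h * w)
    mixed_labels = lam * labels + (1.0 - lam) * labels[perm]
    return mixed, mixed_labels
```

          <p class="tab">
            Because the labels are mixed in proportion to the pasted area, each mixed label row still sums to one and can be used directly with a soft-label cross-entropy loss.
          </p>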
          <span class="tab">GitHub link: <a class="index-page-link" href="https://github.com/kqu7/Kaggle-Cassava-Leaf-Disease-Classification">https://github.com/kqu7/Kaggle-Cassava-Leaf-Disease-Classification</a></span>

          <div class="project-images">
            <div class="project-image-wrapper-horizontal">
              <img class="project-image" src="/assets/images/kaggle/augmentation.png" />
            </div>
            <div class="project-image-wrapper-horizontal">
              <img class="project-image" src="/assets/images/kaggle/data_distribution.png" />
            </div>
            <div class="project-image-wrapper-horizontal">
              <img class="project-image" src="/assets/images/kaggle/cassava_model.jpg" />
            </div>
            <div class="project-image-wrapper-horizontal">
              <img class="project-image" src="/assets/images/kaggle/lamba_lr.png" />
            </div>
          </div>

          <hr>

          <h5>Evaluating the Impact of Highway Network Design on Population Dynamics<small style="font-size: 0.8rem"> (Spring 2021)</small></h5>
          <span class="tab"><span class="group_members">Group members: Ge Zhang (gzhang60), Zhe Zheng (zzheng308), Kai Qu (kqu30) [All members were CSE students at Georgia Tech.]</span></span>
          <br>
          <p class="tab">In this project, we aimed to assess the impact of road and highway network development on the population dynamics of animal species under different 
            scenarios, using three models (a single-species model, a predator-prey model, and a multi-species interaction model).
          </p>
          <p class="tab">For each model, we defined the main behaviors of the population as growth and emigration, based on a reaction-diffusion system.
            We assumed three kinds of initial population distributions (uniform, clustered, and even distributions). Over successive generations, we tracked the population density across areas and its growth and dispersal patterns, 
            analyzing the sensitivity of the population to the locations of highways. 
          </p>
          <p class="tab">In conclusion, we observed that the construction site of a highway had a significant impact on the population: a highway built in the center killed several times as much of the population 
            as one built at a peripheral site. In addition, by comparing the population changes across scenarios over generations, we discovered that the competitive relationship among different species 
            was negatively correlated with the death rate caused by highway networks.
          </p>
          <span class="tab">The project report is available
          <a class="index-page-link" href="https://kqu7.github.io/assets/reports/cse6730_report.pdf" target="_blank">here</a>. The full implementation is also available upon request.</span>

          <div class="project-images">

            <div class="project-image-wrapper-vertical">
              <img class="project-image" src="/assets/images/cse-6730/population_density.png" />
            </div>
            <div class="project-image-wrapper-vertical">
              <img class="project-image" src="/assets/images/cse-6730/prey_distribution.png" />
            </div>
            <div class="project-image-wrapper-vertical">
              <img class="project-image" src="/assets/images/cse-6730/eight_subdiagram.png" />
            </div>
          </div>

          <hr>

          <h5>Simulation of Biology<small style="font-size: 0.8rem"> (Spring 2021)</small></h5>
          <span class="tab"><span class="group_members">Individual Project</span></span>
          <br>
          <p class="tab"> I implemented four classical models (Game of Life, SIR, Gray-Scott Reaction-Diffusion and Flocking) that are frequently used in biological observation and analysis, using Processing, a
            graphical library built for the electronic arts, new media art, and visual design communities.
          </p>
          <p class="tab">For the Game of Life simulation, I wrote a cellular automaton simulator. Users can create or modify cell patterns with mouse clicks and switch between single-step 
            and continuous modes with keystrokes. To model the transmission dynamics of SIR, I implemented a grid- and agent-based SIR epidemic simulator, along with a subpopulation graph that 
            tracks the sizes of the different populations (infected, uninfected, recovered) over time. In the reaction-diffusion project, I wrote a PDE simulator using finite differencing on a 2D grid to simulate
            the <a class="index-page-link" href="https://groups.csail.mit.edu/mac/projects/amorphous/GrayScott/">Gray-Scott Reaction-Diffusion system</a>, 
            creating patterns of spots, stripes, or spiral waves by initializing the system with different parameter settings. For the flocking simulation, I implemented a flocking simulator based on 
            <a class="index-page-link" href="https://www.cs.toronto.edu/~dt/siggraph97-course/cwr87/">Craig Reynolds' three rules of interaction</a>. 
          </p>
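          <p class="tab">
            To make the finite-differencing idea concrete, here is a minimal sketch of one explicit Gray-Scott update on a 2D grid. The project itself was written in Processing; this NumPy version 
            is only an illustration, and the periodic boundaries, the function names, and the parameter values (<code>du</code>, <code>dv</code>, <code>feed</code>, <code>kill</code>, <code>dt</code>) are assumptions chosen for the example.
          </p>

```python
import numpy as np


def laplacian(z):
    """Five-point finite-difference Laplacian, unit grid spacing, periodic boundaries (assumed)."""
    return (np.roll(z, 1, axis=0) + np.roll(z, -1, axis=0)
            + np.roll(z, 1, axis=1) + np.roll(z, -1, axis=1) - 4.0 * z)


def gray_scott_step(u, v, du=0.16, dv=0.08, feed=0.035, kill=0.065, dt=1.0):
    """One forward-Euler step of the Gray-Scott equations (illustrative parameters)."""
    uvv = u * v * v  # reaction term: U + 2V becomes 3V
    u_next = u + dt * (du * laplacian(u) - uvv + feed * (1.0 - u))
    v_next = v + dt * (dv * laplacian(v) + uvv - (feed + kill) * v)
    return u_next, v_next


# Start from the trivial state (u = 1, v = 0) and perturb a small square in the center.
n = 64
u = np.ones((n, n))
v = np.zeros((n, n))
u[28:36, 28:36] = 0.5
v[28:36, 28:36] = 0.25

for _ in range(200):
    u, v = gray_scott_step(u, v)
```

          <p class="tab">
            Varying <code>feed</code> and <code>kill</code> is what changes the resulting pattern; with an explicit scheme like this one, the time step must stay small relative to the diffusion rates for the update to remain stable.
          </p>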
          <span class="tab">GitHub link: <a class="index-page-link" href="https://github.com/kqu7/Gatech-CS-7492-Simulation-of-Biology">https://github.com/kqu7/Gatech-CS-7492-Simulation-of-Biology</a></span>

          <div class="project-images">
            <div class="project-image-wrapper-horizontal-one-row">
              <img class="project-image" src="/assets/images/cs-7492/reaction_diffusion_demo.png"/>
            </div>
            <div class="project-image-wrapper-horizontal-one-row">
              <img class="project-image" style="height: 265px" src="/assets/images/cs-7492/sir_epidemic_model_demo.png" />
            </div>
            <div class="project-image-wrapper-horizontal-one-row">
              <img class="project-image" style="height: 290px" src="/assets/images/cs-7492/flocking_simulation_demo.png" />
            </div>
            <div class="project-image-wrapper-horizontal-one-row">
              <img class="project-image" src="/assets/images/cs-7492/life-simulator-demo.png" />
            </div>
          </div>


          <hr>

          <h5>Exploration of Algorithms on Coping with the MVC Problem<small style="font-size: 0.8rem"> (Fall 2020)</small></h5>
          <span class="tab"><span class="group_members">Group members: Tiancheng Ye (tye97), Chenying Liu (cliu662), Kai Qu (kqu30) [All members were CSE students at Georgia Tech.]</span></span>
          <br>
          <p class="tab">We implemented four algorithms (branch and bound, approximation, and two variations of local search) that attempt to find reasonably good solutions
            to the <a class="index-page-link" href="https://en.wikipedia.org/wiki/Vertex_cover">Minimum Vertex Cover (MVC) problem</a>, a well-known NP-hard problem in graph theory, within 
            a limited amount of running time. We measured the performance of the algorithms in terms of their running time and their relative error compared to the ground truth.
          </p>
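          <p class="tab">
            For readers unfamiliar with approximation algorithms for MVC, the sketch below shows the textbook maximal-matching 2-approximation. It is only an illustration and is not necessarily 
            the approximation algorithm used in the project; the function names and the edge-list input format are assumptions made for this example.
          </p>

```python
def approx_vertex_cover(edges):
    """Textbook 2-approximation: take both endpoints of every edge that is still uncovered."""
    cover = set()
    for u, v in edges:
        # An edge with neither endpoint chosen joins the matching; add both endpoints.
        if u not in cover and v not in cover:
            cover.add(u)
            cover.add(v)
    return cover


def is_vertex_cover(edges, cover):
    """Check that every edge has at least one endpoint in the cover."""
    return all(u in cover or v in cover for u, v in edges)


# A path on four vertices: the optimal cover is {1, 2}, of size 2.
path_edges = [(0, 1), (1, 2), (2, 3)]
cover = approx_vertex_cover(path_edges)
```

          <p class="tab">
            The edges picked by the loop form a matching, and any vertex cover must contain at least one endpoint of each matched edge, so the returned cover is at most twice the optimal size.
          </p>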
          <p class="tab">In addition, we analyzed the theoretical time and space complexities of the four algorithms, compared them to the empirically observed
            ones, and provided explanations for some inconsistencies between them on certain graph instances.
          </p>
          <span class="tab">The project report is available
          <a class="index-page-link" href="https://kqu7.github.io/assets/reports/algorithm_prj_report.pdf" target="_blank">here</a>. The full implementation is also available
          upon request.</span>

          <hr>

          <h5>Optimal Review Ranking for Improving Shopper’s Decision Making<small style="font-size: 0.8rem"> (Spring 2020)</small></h5>
          <span class="tab"><span class="group_members">Group members: Akhil Sai Peddireddy (ap3ub), Kai Qu (kq4ff),
          Rakshita Kaulgud Ramesh (rrk7pb), and Haoran Zhu (hz3fr) [All members were CS students at UVA.]</span></span>
          <br>
          <p class="tab">We developed a system that provides a personalized ranking of customer reviews for each shopper. Using
          a cumulative position-weighted score, we demonstrated
          an average 20% increase in user satisfaction with our personalized ranking system compared to the default Amazon
          ranking system.
          </p>
          <p class="tab">We also proposed a method for rating products at the term level based on users’ preferences.
          For example, if a user cares very much about the price of a product, we offer an additional rating of the product
          based solely on the price factor. This method can be applied to recommendation systems to provide more accurate recommendations.
          </p>
          <span class="tab">The project report is available
          <a class="index-page-link" href="https://kqu7.github.io/assets/reports/ir_report.pdf" target="_blank">here</a>. The full implementation is also available
          upon request.</span>

          <!-- <hr>

          <h5>Sarcastic Tweets Detection<small style="font-size: 0.8rem"> (Fall 2019)</small></h5>
          <span class="tab"><span class="group_members">Group members: Yuxin Wu (yw7vv), Issac Li (il5fq),
          Kai Qu (kq4ff) [All members were CS students at UVA.]</span>
          <br>
          <span class="tab">We experimented with different variations of CNN and LSTM models on the problem of sarcastic text detection
            with both the twitter and reddit datasets crawled from Kaggle, comparing the results of various models in different metrics (Precision, Recall, F1, and Accuracy)
          <br>
          <span class="tab">From the result analysis, we discovered that while the precision scores of some variations of LSTM models were high, it showed that all models
            had relatively low recall scores. We suspected that this was because of the inherent difficulty in sarcasm detection, and our models can only deal with the easy cases due to their limited structures
          <br>
          <span class="tab">The project report is available
          <a href="https://kqu7.github.io/assets/reports/sarcasm_detection_report.pdf" target="_blank">here</a>. Full implementation is also available
          upon query  -->

          <!-- <h5>DeepArt: Identify Artist from Painting<small style="font-size: 0.8rem"> (Fall 2019)</small></h5>
          <span class="tab"><span class="group_members">Group members: Daniel Xiao (zx8yz), Kai Qu(kq4ff), and Haoran Zhu (hz3fr)
          [All members were CS students at UVA.]</span>
          <br>
          <span class="tab">We adapted and fine-tune different pretrained models (ResNet50, GoogLeNet, VGGNet11) to a dataset that
          contains artworks of 50 the most influential artists of all time and compare the results to our baseline model (ResNet18).
          <br>
          <span class="tab">Furthermore, we trained another identical ResNet50 Model (call it ResNet50_2). However, during the
          training of ResNet50_2, we freezed deeper layers to only train shallow layers to capture the overall styles of different
          paintings, comparing this result to the one merely by fine-tuning approach.
          <br>
          <span class="tab">Our ResNet 50_2 model outperforms all other pretrained models by at least 1.5% in all the metrics we used.
          This validates our speculation that shallow layers play a more important role in understanding the overall style of
          paintings than deep layers.
          <br>
          <span class="tab">The project report is available
          <a href="https://kqu7.github.io/assets/reports/deepart_report.pdf" target="_blank">here</a>. Full implementation is also available
          upon query.
          <br>
          !-->
      </div>
    </div>
  </div>
</div>

</body>
</html>