Category: Data Mining & Machine Learning

Data mining provides training data for machine learning, and machine learning methods are powerful tools for uncovering patterns in data.

Large Scale Text Processing and Sentiment Analysis Project with MapReduce, Hive, and Spark

This project is based on the final project I did with two teammates for the Cloud Computing and Big Data Application course. Motivation: Apache Hive is data processing software built on top of Apache Hadoop for data query, analysis, integration, and similar applications. Among these applications, sentiment analysis is one…

Alzheimer’s Disease Related Gene Analysis with Data Mining Techniques, Part II

K-Means: Based on the data we have after imputation, I normalized it using two different methods. We need to normalize the data because different sample points are generated in slightly different environments, which may bias the gene expression values. For instance, if two cells have a different amount of…

Alzheimer’s Disease Related Gene Analysis with Data Mining Techniques, Part I

In this project, I will use multiple data mining and machine learning techniques and models to analyze the patterns revealed by comparing two datasets: AD data and control data. It is a real dataset covering more than 8000 genes from 176 patients with Alzheimer’s disease (in the text file case.gex) and 188 age-matched normal people,…

Why we should not over-trust data

Terms like “Big Data” and “Data Mining” are very popular these days. Businesspeople use them for market analysis and investment planning, and researchers use them to expand the boundaries of human knowledge. Even in ‘Sherlock Holmes’, the main character expresses the most intriguing idea of data mining: The world is woven from billions of lives, every…
