This is a programming-oriented, hands-on training course for starting a career in Data Mining and Machine Learning and for acquiring the necessary skills in statistical and inferential thinking. It will enable you to build a career around some of the most sought-after skills and technologies in the market, with applications across almost all industry verticals. We will teach you the most popular tools of the trade used in industry today. After this course, many of the things you read and hear about Data Science, Artificial Intelligence and Machine Learning will make a lot more sense. You can then build on these foundation skills by applying them to your own domain and industry.

Course Objectives

  • Preparation for tackling real-world problems in Data Science through a structured process
  • Ability to relate the techniques learned in this course to your own domain and apply them to practical problems where Data Mining could be used
  • Build the intuition behind contemporary Machine Learning models and algorithms that generate those models
  • Ability to evaluate the accuracy and performance of predictive and analytical techniques

Course Outline

Week 1

  • Introduction to Python, Data Mining and Machine Learning
    • Introduction to Data Mining, Machine Learning and Big Data Analysis
    • Introduction to CRISP-DM as a formal, structured, step-by-step data analysis process
    • Introduction to Python Programming necessary for Data Science
  • Homework
    • Python Programming

Week 2

  • Data Wrangling, cleaning and manipulation in Python
    • Python Programming (cont’d from Week 1)
    • Introduction to Pandas Python Library
    • Pre-processing, cleaning and preparing data with Pandas
  • Homework—Data Preprocessing

Week 3

  • Introduction to Statistics and Visualizations with Python and Pandas
    • Descriptive Statistics
    • Different types of Distributions
    • Inferential statistics and correlation
    • Statistical Hypothesis Testing
    • Data Visualization—Histograms, Scatter-plots and box-plots
  • Homework—Data Visualization and Statistics

Week 4

  • Introduction to Machine Learning Models, Supervised Learning and Performance Evaluation
    • Introduction to Machine Learning Models
    • Information-based ML models and algorithms (Decision Trees)
    • Similarity-based ML models and algorithms (KNN etc.)
    • Introduction to Scikit-learn Python library
    • Model performance evaluation techniques
  • Homework—Modeling exercises using Scikit-learn on different datasets

Week 5

  • Machine Learning Models and Unsupervised Learning
    • Error-based ML models and algorithms (Linear and Logistic Regression)
    • Probability-based models (Naïve Bayes)
    • Unsupervised learning (k-means Clustering)
  • Homework
    • Modeling exercises using Scikit-learn on different datasets

Pre-Requisites

This course is intended for people with knowledge of computer programming in any language, basic knowledge of first-year college-level mathematics and statistics, and a passion for learning data science.

Although we will introduce the core concepts of Python necessary for Data Science, this is a fast-paced course. You will struggle to keep up if you have no prior programming experience in any language. Familiarity with Python helps but is not required. Some knowledge of basic statistics will also help.

Course Benefits

Data Mining, Machine Learning and Artificial Intelligence are among the most sought-after skills and underpin much of today’s technological innovation. The need to gain insights from the terabytes of data produced by the Web and social media networks, and to identify hidden patterns in data produced by corporations and consumers, is growing rapidly. Applications of this field include marketing analysis and forecasting, predicting product demand, making intelligent business decisions, cyber security and threat detection, predicting poll and survey results, and many others. According to many market research reports, there is a large gap between the supply of and demand for Data Scientists today. This course will enable participants to learn the foundation skills through programming in Python, arguably the most popular Data Science language today. We don’t claim to make you an expert by the end of the course, but you will get a kick-start and be able to apply what you learn in your specific domain, be it telecommunications, marketing, business analysis, medicine, information security or finance, to name a few.

Instructor’s Bio

Farhan Zaidi has over 25 years of experience in software architecture, design and development. He has an MS in Computer Science from the University of Southern California, Los Angeles, USA, and a BS in Electrical Engineering from University of Engineering Lahore, Pakistan. He worked in industry as a Software Architect and Senior Engineer for many years and has recently been working as a professional trainer in the latest emerging technologies. He is skilled in architecting and designing networked, distributed software systems and enterprise and carrier-grade systems, implementing network and telecommunication protocol stacks, engineering reliable and fault-tolerant software applications, and implementing middleware, platforms and frameworks. Farhan’s current interests include Data Science, Machine Learning and Blockchain technologies.