Machine Learning Interview Questions

If you are looking to build a career in Data Science, you need to be well-versed in machine learning. Technologies like artificial intelligence and machine learning have become a cornerstone of growth across diverse industries such as banking, finance, healthcare, manufacturing, retail, and more. Companies continue to hire data scientists, machine learning engineers, and similar profiles even during a slump in the global economy. To crack a machine learning interview, you need strong programming and technical skills along with a clear understanding of the underlying concepts. To help you prepare, here is an extensive list of machine learning interview questions:

  1. What Are the Different Types of Machine Learning?
  2. What Are the ‘Training Set’ and ‘Test Set’ in a Machine Learning Model? How Much Data Will You Allocate for Your Training, Validation, and Test Sets?
  3. How Do You Handle Missing or Corrupted Data in a Dataset?
  4. How Can You Choose a Classifier Based on a Training Set Data Size?
  5. Explain the Confusion Matrix with Respect to Machine Learning Algorithms.
  6. What Is a False Positive and False Negative and How Are They Significant?
  7. What Are the Three Stages of Building a Model in Machine Learning?
  8. What is Deep Learning?
  9. Explain the Differences Between Machine Learning and Deep Learning.
  10. Explain the Applications of Supervised Machine Learning in Modern Businesses.
  11. What Are Semi-supervised and Unsupervised Machine Learning Techniques?
  12. What Is the Difference Between Inductive Machine Learning and Deductive Machine Learning?
  13. What’s the difference between K-means and KNN Algorithms?
  14. What Is ‘naive’ in the Naive Bayes Classifier?
  15. Describe how a System Can Play a Game of Chess Using Reinforcement Learning.
  16. How do you know which machine learning algorithm to choose for your Classification Problem?
  17. Explain How Amazon Is Able to Recommend Other Things to Buy. How Does the Recommendation Engine Work?
  18. When Will You Use Classification over Regression?
  19. How Do You Design an Email Spam Filter?
  20. What is a Random Forest?
  21. Considering a Long List of Machine Learning Algorithms, given a Data Set, How Do You Decide Which One to Use?
  22. What Are Bias and Variance in a Machine Learning Model?
  23. What is the Trade-off Between Bias and Variance?
  24. Define Precision and Recall.
  25. What is Decision Tree Classification?
  26. What is Pruning in Decision Trees, and How Is It Done?
  27. Briefly Explain Logistic Regression.
  28. Explain the K Nearest Neighbor Algorithm.
  29. What is a Recommendation System?
  30. What is Kernel SVM?
  31. What Are Some Methods of Reducing Dimensionality?
  32. What is the difference between deep learning and machine learning?
  33. How do you select important variables while working on a data set?
  34. Many machine learning algorithms exist. Given a data set, how can one determine which algorithm to use?
  35. How are covariance and correlation different from one another?
  36. State the differences between causality and correlation.
  37. We look at machine learning software almost all the time. How do we apply Machine Learning to Hardware?
  38. Explain One-hot encoding and Label Encoding. How do they affect the dimensionality of the given dataset?
  39. When does regularization come into play in Machine Learning?
  40. How can we relate standard deviation and variance?
  41. You are given a data set with missing values spread within one standard deviation of the mean. How much of the data would remain untouched?
  42. Is a high variance in data good or bad?
  43. If your dataset is suffering from high variance, how would you handle it?
  44. Explain the handling of missing or corrupted values in the given dataset.
  45. What is a time series?
  46. What is a Box-Cox transformation?
  47. What is the difference between stochastic gradient descent (SGD) and gradient descent (GD)?
  48. What is the exploding gradient problem when using the backpropagation technique?
  49. Can you mention some advantages and disadvantages of decision trees?
  50. Explain the differences between Random Forest and Gradient Boosting machines.
  51. What’s a Fourier transform?
  52. What do you mean by Association Rule Mining (ARM)?
  53. What is Marginalisation? Explain the process.
  54. Explain the phrase “Curse of Dimensionality”.
  55. What is Principal Component Analysis?
  56. Why is rotation of components so important in Principal Component Analysis (PCA)?
  57. What are outliers? Mention three methods to deal with outliers.
  58. What is the difference between regularization and normalization?
  59. Explain the difference between Normalization and Standardization.
  60. List the most popular distribution curves along with scenarios where you will use them in an algorithm.
  61. How do we check the normality of a data set or a feature?
  62. What is Linear Regression?
  63. What is target imbalance? How do we fix it? Consider a scenario where your data has target imbalance. Which metrics and algorithms do you find suitable for this data?
  64. List all assumptions for data to be met before starting with linear regression.
  65. When does the linear regression line stop rotating, that is, find the optimal spot where it fits the data?
  66. Why is logistic regression a type of classification technique and not a regression? Name the function it is derived from.
  67. What could be the issue when the beta value for a certain variable varies too much across subsets when regression is run on different subsets of the given dataset?
  68. What does the term Variance Inflation Factor mean?
  69. Which machine learning algorithm is known as the lazy learner and why is it called so?
  70. Is it possible to use KNN for image processing?
  71. How does the SVM algorithm deal with self-learning?
  72. What is Kernel Trick in an SVM algorithm?
  73. What are ensemble models? Explain how ensemble techniques yield better learning as compared to traditional classification ML algorithms?
  74. What are overfitting and underfitting? Why does the decision tree algorithm often suffer from overfitting?
  75. What is OOB error and how does it occur?
  76. Why is boosting a more stable algorithm compared to other ensemble algorithms?
  77. How do you handle outliers in the data?
  78. List popular cross validation techniques.
  79. Is it possible to test for the probability of improving model accuracy without cross-validation techniques? If yes, please explain.
  80. Name a popular dimensionality reduction algorithm.
  81. How can we use a dataset without the target variable in supervised learning algorithms?
  82. List all types of popular recommendation systems. Name and explain two personalized recommendation systems along with their ease of implementation.
  83. How do we deal with sparsity issues in recommendation systems? How do we measure its effectiveness? Explain.
  84. Name and define techniques used to find similarities in the recommendation system.
  85. State the limitations of Fixed Basis Function.
  86. Define and explain the concept of Inductive Bias with some examples.
  87. Explain the term instance-based learning.
  88. Keeping train and test split criteria in mind, is it good to perform scaling before the split or after the split?
  89. Define precision, recall, and F1 score.
  90. Plot validation score and training score with data set size on the x-axis and another plot with model complexity on the x-axis.
  91. What is Bayes’ Theorem? State at least one use case in the machine learning context.
  92. What is Naive Bayes? Why is it Naive?
  93. Explain how a Naive Bayes Classifier works.
  94. What do the terms prior probability and marginal likelihood mean in the context of Naive Bayes?
  95. Explain the difference between Lasso and Ridge.
  96. What’s the difference between probability and likelihood?
  97. Why would you prune your tree?
  98. Model accuracy or Model performance? Which one will you prefer and why?
  99. List the advantages and limitations of the Temporal Difference Learning Method.
  100. How would you handle an imbalanced dataset?
  101. Mention some EDA techniques.
  102. Mention why feature engineering is important in model building and list out some of the techniques used for feature engineering.
  103. Differentiate between Statistical Modeling and Machine Learning.
  104. Differentiate between Boosting and Bagging.
  105. What is the significance of Gamma and Regularization in SVM?
  106. How does the ROC curve work?
  107. What is the difference between a generative and discriminative model?
  108. What are hyperparameters and how are they different from parameters?
  109. What is shattering a set of points? Explain VC dimension.
  110. What are some differences between a linked list and an array?
  111. What are the meshgrid() method and the contourf() method? State some uses of both.
  112. Describe a hash table.
  113. List the advantages and disadvantages of using neural networks.
  114. You have to train a 12GB dataset using a neural network with a machine which has only 3GB RAM. How would you go about it?
  115. Write simple code to binarize data.
  116. What is an Array?
  117. Explain Eigenvectors and Eigenvalues.
  118. How would you define the number of clusters in a clustering algorithm?
  119. What are the performance metrics that can be used to estimate the efficiency of a linear regression model?
  120. What is the default method of splitting in decision trees?
  121. How is the p-value useful?
  122. Can logistic regression be used for classes more than 2?
  123. What are the hyperparameters of a logistic regression model?
  124. Name a few hyperparameters of decision trees.
  125. How do you deal with multicollinearity?
  126. What is Heteroscedasticity?
  127. Is the ARIMA model a good fit for every time series problem?
  128. How do you deal with the class imbalance in a classification problem?
  129. What is the role of cross-validation?
  130. What is a voting model?
  131. How do you deal with very few data samples? Is it possible to build a model from them?
  132. What are the hyperparameters of an SVM?
  133. What is Pandas Profiling?
  134. What impact does correlation have on PCA?
  135. How is PCA different from LDA?
  136. What distance metrics can be used in KNN?
  137. Which metrics can be used to measure correlation of categorical data?
  138. Which algorithm can be used for value imputation in both categorical and continuous data?
  139. When should ridge regression be preferred over lasso?
  140. Which algorithms can be used for important variable selection?
  141. What ensemble technique is used by Random forests?
  142. What ensemble technique is used by gradient boosting trees?
  143. If we have a high bias error, what does it mean? How do we treat it?
  144. Which type of sampling is better for a classification model and why?
  145. What is a good metric for measuring the level of multicollinearity?
  146. When can a categorical value be treated as a continuous variable, and what effect does doing so have?
  147. What is the role of maximum likelihood in logistic regression?
  148. What is a pipeline?
  149. Which sampling technique is most suitable when working with time-series data?
  150. What are the benefits of pruning?
  151. What is a normal distribution?
  152. What is the 68 per cent rule in a normal distribution?
  153. What is a chi-square test?
  154. What is a random variable?
  155. Which kind of recommendation system is used by Amazon to recommend similar items?
  156. What is the error term composed of in regression?
  157. Which performance metric is better, R2 or adjusted R2?
  158. What’s the difference between Type I and Type II error?
  159. What do you understand by L1 and L2 regularization?
  160. Which one is better, Naive Bayes Algorithm or Decision Trees?
  161. What do you mean by the ROC curve?
  162. What do you mean by AUC (area under the curve)?
  163. What is log likelihood in logistic regression?
  164. How would you evaluate a logistic regression model?
  165. What are the advantages of SVM algorithms?
  166. Why does XGBoost perform better than SVM?
  167. What is the difference between SVM Rank and SVR (Support Vector Regression)?
  168. What is the difference between the normal soft margin SVM and SVM with a linear kernel?
  169. How is a linear classifier relevant to SVM?
  170. What are the advantages of using naive Bayes for classification?
  171. Is Gaussian Naive Bayes the same as binomial Naive Bayes?
  172. What is the difference between the Naive Bayes Classifier and the Bayes classifier?
  173. In what real world applications is Naive Bayes classifier used?
  174. Is naive Bayes supervised or unsupervised?
  175. What do you understand by selection bias in Machine Learning?
  176. What is the difference between Entropy and Information Gain?
  177. What are collinearity and multicollinearity?
