# 54 Data Mining Engineer interview questions shared by candidates

## Top interview questions

A Search and Data Mining Engineer was asked... June 6, 2016

### List the strings that are anagrams from a set of strings?

2 answers

Sorting the strings is not optimal, because each sort is O(N log N), where N is the number of characters in the word. A better solution is to encode each word as a hash table of character frequencies, which is O(N) per word, and then group the words whose tables match.

Sort the characters of each string and compare the results.
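The character-frequency idea from the first answer could be sketched roughly as follows. The function name `group_anagrams` and the choice to return only groups with at least two members are assumptions made for illustration, not part of the original question.

```python
from collections import Counter, defaultdict


def group_anagrams(words):
    """Group words that are anagrams of each other (hypothetical helper)."""
    groups = defaultdict(list)
    for word in words:
        # A frozenset of (character, count) pairs is hashable and does not
        # depend on character order, so all anagrams share the same key.
        key = frozenset(Counter(word).items())
        groups[key].append(word)
    # Keep only groups that actually contain more than one word.
    return [group for group in groups.values() if len(group) > 1]


if __name__ == "__main__":
    print(group_anagrams(["listen", "silent", "enlist", "google", "gogole", "cat"]))
```

Building each key is linear in the word length, which is where the O(N) per word in the answer comes from; the sorting approach would use `"".join(sorted(word))` as the key instead.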

### How would you design a recommendation system (like amazon)?

2 answers

Use collaborative filtering to compare one user's preferences with those of others. If A and B are similar, we can recommend B's preferred items to A.

Why the downvote on the other answer? He/she is right. Collaborative filtering is the most common strategy for recommendation systems. You see that user A bought these things and user B also bought those things, but user B bought this other thing too, so let's show that thing to user A.
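A minimal sketch of the "B also bought this other thing" idea, assuming purchase histories are available as sets of item names. The `recommend` function, the data layout, and the overlap-count scoring are illustrative assumptions, not how any particular retailer does it.

```python
from collections import Counter


def recommend(purchases, user, top_n=3):
    """Suggest items bought by users whose purchases overlap with `user`'s."""
    mine = purchases[user]
    scores = Counter()
    for other, items in purchases.items():
        if other == user:
            continue
        overlap = len(mine & items)  # shared purchases act as a crude similarity
        if overlap == 0:
            continue
        for item in items - mine:    # only items the user does not own yet
            scores[item] += overlap
    return [item for item, _ in scores.most_common(top_n)]


if __name__ == "__main__":
    history = {
        "A": {"book", "lamp"},
        "B": {"book", "lamp", "desk"},
        "C": {"book", "chair"},
        "D": {"sofa"},
    }
    print(recommend(history, "A"))
```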

### Implement a sampling function with nominal distribution.

2 answers

I think you mean Normal distribution! If you are using R, call set.seed() first so the draws are reproducible. You can then use rnorm() with the sample size, mean and SD, e.g. `set.seed(123)` followed by `rnorm(100, 2, 5)`.

I'm the original poster, sorry for my typo. I actually meant multinomial distribution. The advanced follow-up was: if the probabilities are skewed, how would you speed up your algorithm? You can find both answers on Wikipedia. :)
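For the multinomial (categorical) version, one common approach is inverse-CDF sampling: precompute cumulative probabilities and binary-search a uniform draw. The sketch below assumes the probabilities are given as a list of non-negative numbers; `make_sampler` is a hypothetical name. The poster does not say which speed-up was expected for skewed distributions; ordering categories by descending probability for a linear scan, or the alias method, are two options often mentioned, and neither is shown here.

```python
import bisect
import itertools
import random


def make_sampler(probs, rng=random):
    """Return a function that draws one category index per call."""
    cumulative = list(itertools.accumulate(probs))
    total = cumulative[-1]  # tolerates weights that do not sum exactly to 1

    def sample():
        u = rng.random() * total
        # bisect_right finds the first index whose cumulative weight exceeds u.
        index = bisect.bisect_right(cumulative, u)
        return min(index, len(probs) - 1)  # guard against rounding at the top end

    return sample


if __name__ == "__main__":
    draw = make_sampler([0.7, 0.2, 0.1], random.Random(0))
    print([draw() for _ in range(10)])
```

Setup costs O(K) for K categories and each draw costs O(log K).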

### Only one easy/medium leetcode question during the coding module.

1 answer

I got the optimal solution (with a couple of nudges, but with time to spare), yet apparently this was the only module where I did not "meet expectations." It's a shame that some presumably small mistake in my first hour was enough to discount an otherwise very strong 6-hour interview.

### longest palindrome of a string

1 answer

There is an O(n) algorithm, which I think is what they wanted, because I correctly coded both the O(n^2) and O(n^3) solutions and was rejected later.
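The answer does not name the linear-time algorithm; the best-known one is Manacher's algorithm, so the sketch below assumes that is what was meant. It returns the longest palindromic substring and, when several share the maximum length, the leftmost one.

```python
def longest_palindrome(s):
    """Longest palindromic substring via Manacher's algorithm, O(n) time."""
    if not s:
        return ""
    # Interleave separators so even- and odd-length palindromes look alike.
    t = "#" + "#".join(s) + "#"
    n = len(t)
    radius = [0] * n   # radius[i]: palindrome half-width around t[i]
    center = right = 0  # rightmost palindrome found so far spans up to `right`
    for i in range(n):
        if i < right:
            # Reuse the mirrored position's radius, clipped to the known span.
            radius[i] = min(right - i, radius[2 * center - i])
        lo, hi = i - radius[i] - 1, i + radius[i] + 1
        while lo >= 0 and hi < n and t[lo] == t[hi]:
            radius[i] += 1
            lo -= 1
            hi += 1
        if i + radius[i] > right:
            center, right = i, i + radius[i]
    best = max(range(n), key=radius.__getitem__)
    start = (best - radius[best]) // 2  # map back to an index in s
    return s[start:start + radius[best]]


if __name__ == "__main__":
    print(longest_palindrome("forgeeksskeegfor"))
```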

### Hacker Rank: remove 2 or more e's from a string.

1 answer

Just use the string's replace function.
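The question is terse, so the sketch below assumes it means "delete every run of two or more consecutive e's". Under that reading, a plain literal replace of `"ee"` leaves a stray `e` behind for a run of three, so a regular expression is a safer fit. `remove_repeated_e` is a hypothetical name.

```python
import re


def remove_repeated_e(text):
    """Delete each run of two or more consecutive 'e' characters."""
    return re.sub(r"e{2,}", "", text)


if __name__ == "__main__":
    print(remove_repeated_e("beekeeper"))
    print("eee".replace("ee", ""))  # literal replace leaves one 'e' behind
```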

### Difference between l1 and l2 regularization.

1 answer

Mathematically speaking, regularization adds a penalty term to the loss to keep the coefficients from fitting the training data so closely that the model overfits. The difference between L1 and L2 is that L2 penalizes the sum of the squares of the weights, while L1 penalizes the sum of the absolute values of the weights.
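As a small illustration, the two penalty terms can be computed as below. The function names and the strength parameter `lam` are assumptions for this sketch. Note that the L1 term sums absolute values, so negative weights are penalized too.

```python
def l1_penalty(weights, lam=1.0):
    """L1 (lasso-style) penalty: lam * sum of |w|."""
    return lam * sum(abs(w) for w in weights)


def l2_penalty(weights, lam=1.0):
    """L2 (ridge-style) penalty: lam * sum of w squared."""
    return lam * sum(w * w for w in weights)


if __name__ == "__main__":
    w = [1.0, -2.0, 3.0]
    print(l1_penalty(w), l2_penalty(w))
```

In practice, the L1 penalty tends to push some weights exactly to zero, giving sparse models, while the L2 penalty shrinks all weights smoothly toward zero.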

### Design a recommendation system???

1 answer

It depends on the volume of data that we have. Assuming there is a lot of data on hand, it is best to use collaborative filtering. This involves finding users or items similar to the ones we are recommending for, and taking a weighted average of how much they liked the product to decide whether to recommend it. It can be implemented as user-user collaborative filtering, where we find similar users, or as item-item collaborative filtering. If we have less data to work with, a content-based filtering approach is a better idea: we create profiles for the users and recommend products based on the features in those profiles.
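The weighted-average step of user-user collaborative filtering might look roughly like this. The ratings layout (a dict of per-user item ratings), the cosine similarity measure, and the names `cosine` and `predict` are all assumptions for illustration; ratings are assumed to be positive numbers.

```python
import math


def cosine(a, b):
    """Cosine similarity between two users' rating dicts."""
    common = set(a) & set(b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if not common or norm_a == 0 or norm_b == 0:
        return 0.0
    return sum(a[i] * b[i] for i in common) / (norm_a * norm_b)


def predict(ratings, user, item):
    """Similarity-weighted average of other users' ratings for `item`."""
    numerator = denominator = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        sim = cosine(ratings[user], other_ratings)
        numerator += sim * other_ratings[item]
        denominator += sim
    # None signals that no similar user has rated the item.
    return numerator / denominator if denominator else None


if __name__ == "__main__":
    data = {
        "A": {"x": 5, "y": 1},
        "B": {"x": 5, "y": 1, "z": 4},
        "C": {"x": 1, "y": 5, "z": 1},
    }
    print(predict(data, "A", "z"))
```

Here user A's predicted rating for `z` lands closer to B's rating than to C's, because A's existing ratings resemble B's more.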

### What feature you can propose for Odnoklassniki social network?

1 answer

I proposed joint purchasing and joint fitness recommendation services for users of the social network. The interviewers were impressed by this idea.

### Find the point where the sum of distance to all other points is minimized.

1 answer

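The given point closest to the mean of all the points minimizes the sum of squared distances. For the sum of plain (unsquared) distances, the minimizer is the geometric median, which in one dimension is simply the median.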
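Reading the question literally (pick one of the given points, with Euclidean distance), a brute-force O(n^2) check is a simple baseline. The function name is hypothetical, and `math.dist` requires Python 3.8 or later. The example data also shows that the point nearest the mean is not necessarily the minimizer.

```python
import math


def min_total_distance_point(points):
    """Return the given point with the smallest summed distance to the rest."""
    return min(points, key=lambda p: sum(math.dist(p, q) for q in points))


if __name__ == "__main__":
    pts = [(0, 0), (1, 0), (2, 0), (50, 0), (102, 0)]
    # The mean is (31, 0), whose nearest given point is (50, 0),
    # but the summed distance is smallest at (2, 0), the median.
    print(min_total_distance_point(pts))
```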
