Round 1:
1. Introduce yourself - I described my overall and relevant experience, tech stack, and achievements.
2. Explain a project - I explained a project on generating images from text. Not many questions were asked; the interviewer listened patiently and asked me to explain every aspect in depth.
3. Chatbot project - I explained the project objectives. He asked about hallucination, vector databases, the AWS services used, etc.
After this, the rest were all coding questions.
1. Generate 10 numbers and print the square of every 3rd number using list comprehension.
2. Given a number, count the number of 1's in it.
3. Given a number, determine whether its 3rd bit is set (1) or not (0) in binary. Do it without converting the number to binary, using bitwise operations.
4. Given a list of numbers in ascending order, write a binary search to find the index of a given number.
5. Sort a list of numbers without using any built-in functions.
6. Given a nested JSON (or dictionary), write a function that returns the value for a key path of the form a.b.d.f, where a is the top-level key, b is a key inside the value of a, and so on.
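A minimal Python sketch of one way to answer the six coding questions above. The questions left some details open, so these readings are assumptions: "every 3rd number" is taken as the 3rd, 6th and 9th of 1..10, "count the number of 1's" is read as counting 1 bits in the binary form, the "3rd bit" is counted 1-based from the least significant bit, insertion sort stands in for whatever sort was expected, and all function names are illustrative.

```python
from typing import Any


# Q1: squares of every 3rd number (assumed: 3rd, 6th, 9th of 1..n) via a list comprehension.
def squares_of_every_third(n: int = 10) -> list[int]:
    numbers = list(range(1, n + 1))
    return [x * x for x in numbers[2::3]]


# Q2: count the 1 bits of a non-negative integer (assumed reading of "number of 1's").
def count_set_bits(num: int) -> int:
    if num < 0:
        raise ValueError("expects a non-negative integer")
    count = 0
    while num:
        num &= num - 1  # clears the lowest set bit
        count += 1
    return count


# Q3: is the k-th bit set? k is 1-based from the least significant bit (assumption).
def is_kth_bit_set(num: int, k: int = 3) -> bool:
    return ((num >> (k - 1)) & 1) == 1


# Q4: iterative binary search on an ascending list; returns -1 if absent.
def binary_search(nums: list[int], target: int) -> int:
    lo, hi = 0, len(nums) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if nums[mid] == target:
            return mid
        if nums[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1


# Q5: insertion sort, avoiding sorted() and list.sort().
def insertion_sort(nums: list[int]) -> list[int]:
    result = nums[:]  # copy so the caller's list is not mutated
    for i in range(1, len(result)):
        key = result[i]
        j = i - 1
        while j >= 0 and result[j] > key:
            result[j + 1] = result[j]  # shift larger items one slot right
            j -= 1
        result[j + 1] = key
    return result


# Q6: follow a dotted key path such as "a.b.d.f" through nested dicts.
def get_nested(data: dict[str, Any], path: str, default: Any = None) -> Any:
    current: Any = data
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current
```

`count_set_bits` uses the `num & (num - 1)` trick, which drops the lowest set bit on each pass, so it loops once per 1 bit rather than once per bit position.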
Lastly, the interviewer asked whether I had worked with Docker, Kubernetes, Redis, NoSQL databases, etc.
Round 2:
The interviewer asked me what was covered in the first round so that he could avoid those areas.
1. For fine-tuning a Llama model, design the architecture (servers, S3 bucket, SageMaker, vector databases, etc.). He asked why I needed a vector database, and when I said it was to store documents, he asked why I would store them in a database when they were already present in some location. I explained that I would convert the documents into embeddings and store those, since models cannot work on text directly; text has to be converted to numbers, whether via BoW, TF-IDF, embeddings, etc. (This felt a bit bizarre, and at this point I wondered whether the interviewer was perhaps not a data scientist.)
2. He asked me to give a bare-bones structure for a pipeline to fine-tune a Llama model: classes, class variables, the functions that go into the base class, the different stages of the pipeline, etc. The interviewer wanted to write the base class first. (I usually write all the functions first, then convert them into classes and move the common functions into a base class, but the interviewer wanted to start by identifying everything common and begin with the base class.) I did not answer this satisfactorily. At this stage, I knew I had messed up the interview.
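To illustrate the point about storing vectors rather than raw text, here is a toy Python sketch. It assumes plain bag-of-words counts and cosine similarity as a stand-in for a real embedding model and vector database; `ToyVectorStore` and its methods are hypothetical. Documents are vectorized once when added, and each query is then compared numerically against the stored vectors.

```python
import math
from collections import Counter


def bow_vector(text: str) -> Counter[str]:
    # Toy bag-of-words: lowercase whitespace tokens mapped to their counts.
    return Counter(text.lower().split())


def cosine(a: Counter[str], b: Counter[str]) -> float:
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class ToyVectorStore:
    """Hypothetical in-memory stand-in for a vector database."""

    def __init__(self) -> None:
        self._items: list[tuple[str, Counter[str]]] = []

    def add(self, doc: str) -> None:
        # Vectorize once at ingestion time and keep the vector next to the text.
        self._items.append((doc, bow_vector(doc)))

    def query(self, question: str, k: int = 1) -> list[str]:
        # Rank stored documents by cosine similarity to the query vector.
        q = bow_vector(question)
        ranked = sorted(self._items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [doc for doc, _ in ranked[:k]]
```

A real setup would swap `bow_vector` for an embedding model and the list for an indexed store, but the shape is the same: the numeric representation is what gets stored and searched.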
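A bare-bones sketch of the base-class-first structure, under assumed names: `PipelineStage`, the three stage classes, the `context` dict, and the config keys are all hypothetical, and the stage bodies are placeholders rather than real training code. The base class holds what every stage shares (name, config, logging) and declares `run()` as the abstract contract; each concrete stage only fills in `run()`.

```python
from abc import ABC, abstractmethod
from typing import Any


class PipelineStage(ABC):
    """Base class with what every stage shares: a name, config, and a run() contract."""

    def __init__(self, name: str, config: dict[str, Any]) -> None:
        self.name = name
        self.config = config

    def log(self, message: str) -> None:
        # Shared helper so every stage logs in the same format.
        print(f"[{self.name}] {message}")

    @abstractmethod
    def run(self, context: dict[str, Any]) -> dict[str, Any]:
        """Read inputs from the shared context and return it with this stage's outputs."""


class DataIngestion(PipelineStage):
    def run(self, context: dict[str, Any]) -> dict[str, Any]:
        context["dataset_uri"] = self.config["dataset_uri"]  # e.g. an S3 prefix
        self.log(f"using dataset at {context['dataset_uri']}")
        return context


class Preprocessing(PipelineStage):
    def run(self, context: dict[str, Any]) -> dict[str, Any]:
        # Placeholder: tokenization / prompt formatting would happen here.
        context["processed_uri"] = context["dataset_uri"] + "/processed"
        return context


class FineTuning(PipelineStage):
    def run(self, context: dict[str, Any]) -> dict[str, Any]:
        # Placeholder: a real stage would launch the training job here.
        context["model_artifact"] = f"{self.config['base_model']}-finetuned"
        return context


class Pipeline:
    """Runs stages in order, threading one context dict through them."""

    def __init__(self, stages: list[PipelineStage]) -> None:
        self.stages = stages

    def run(self) -> dict[str, Any]:
        context: dict[str, Any] = {}
        for stage in self.stages:
            context = stage.run(context)
        return context
```

For example, with a made-up `config = {"dataset_uri": "s3://my-bucket/data", "base_model": "llama"}`, `Pipeline([DataIngestion("ingest", config), Preprocessing("preprocess", config), FineTuning("finetune", config)]).run()` passes the context through each stage in order. Because `run()` is abstract, `PipelineStage` itself cannot be instantiated.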
Coding question
1. Write a function to calculate 2^7 without using inbuilt operators. I simply looped 7 times. He asked for the time complexity of the solution, which is O(n), and then asked me to do it in O(log n). It is a LeetCode question. I failed to do it.
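A sketch of the O(log n) approach, exponentiation by squaring, assuming multiplication is allowed and only `**`/`pow()` are off-limits. It covers non-negative integer exponents only, and the name `power` is illustrative. Each loop iteration consumes one bit of the exponent, so 2^7 (exponent 111 in binary) takes three iterations rather than seven.

```python
def power(base: int, exp: int) -> int:
    """Exponentiation by squaring: O(log exp) loop iterations, for exp >= 0."""
    if exp < 0:
        raise ValueError("this sketch handles non-negative exponents only")
    result = 1
    while exp > 0:
        if exp & 1:      # lowest bit of exp is set: fold the current base in
            result *= base
        base *= base     # square the base for the next bit position
        exp >>= 1        # move on to the next bit
    return result
```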
The interviewer was friendly. He helped me both with identifying the stages of the pipeline and with getting the coding question done in O(log n), but I still could not manage it.