- Implement cross-entropy loss without using the built-in function in torch
- Implement a data loader for a large JSON text database in PyTorch
- Questions about: the difference between SFT and RL(HF), what consumes the most GPU resources during LLM training, and many other questions about parallel training

The interviewers did not seem very knowledgeable about their own questions and had likely asked an LLM to generate them. The answers we discussed diverged, and none of them convinced me. Both interviewers were very unfriendly and not collaborative during the interview.
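For anyone preparing for the same two coding tasks, here is a minimal sketch of each. It is not what the interviewers expected, only one reasonable answer under stated assumptions. `cross_entropy` computes the mean loss for logits of shape (N, C) and integer class targets by hand, using the log-sum-exp trick instead of `torch.nn.functional.cross_entropy`. `JsonlTextDataset` assumes, hypothetically, that the "large JSON text database" is stored as JSON Lines (one object per line) with a `"text"` field. It streams the file line by line so the whole database never has to fit in memory.

```python
import json

import torch
from torch.utils.data import DataLoader, IterableDataset, get_worker_info


def cross_entropy(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    """Mean cross-entropy for logits (N, C) and integer class targets (N,)."""
    # Subtract the per-row max before exponentiating so exp() cannot overflow.
    row_max = logits.max(dim=1, keepdim=True).values
    shifted = logits - row_max
    # log_softmax(x) = x - log(sum(exp(x))), computed on the shifted logits.
    log_probs = shifted - shifted.exp().sum(dim=1, keepdim=True).log()
    # Negative log-probability of the correct class for each row, then average.
    nll = -log_probs.gather(1, targets.unsqueeze(1)).squeeze(1)
    return nll.mean()


class JsonlTextDataset(IterableDataset):
    """Streams text from a JSON Lines file (assumed format: one object per line)."""

    def __init__(self, path: str, text_key: str = "text"):
        self.path = path
        self.text_key = text_key  # hypothetical field name holding the text

    def __iter__(self):
        # With num_workers > 0, each worker reads the file but keeps only
        # every num_workers-th line, so no sample is yielded twice.
        info = get_worker_info()
        worker_id = info.id if info is not None else 0
        num_workers = info.num_workers if info is not None else 1
        with open(self.path, "r", encoding="utf-8") as f:
            for i, line in enumerate(f):
                if i % num_workers != worker_id:
                    continue
                line = line.strip()
                if not line:
                    continue  # skip blank lines
                yield json.loads(line)[self.text_key]
```

Usage: `DataLoader(JsonlTextDataset("data.jsonl"), batch_size=32, num_workers=4)` yields lists of strings that a tokenizer can then process. An `IterableDataset` is used here rather than a map-style `Dataset` because random access into a huge text file would require building an index of line offsets first; that is a valid alternative if shuffling is needed.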