SQL Interview Technical Question 02

Date: 2021-09-20
Language: SQL
Code Link: Github

Background

Had a technical interview for a IT role and they asked the question below to see how a person would work through and solve the problem
Key items they were looking for:

Ask questions about the problem and seek help for any clarity required
Talk through with them how the problem is being solved while solving it
At the end, explain how the problem was solved
How efficient the solution code was and the Nth value required
Communicatio skills
Coding methodology and process. Whether it follows good practice or just ad-hoc style

Problem

First create the table useing the below SQL.
Please select PostgresSQL v10.0, and write SQL query on the right side We have a table with duplicated rows, for example, there are two records with id 1, and we need to only pick one of them.
The rule is, the latest row (based on the load_datetime) is the valid record.
Please write SELECT query to do the de-dup, and return valid unique rows only

CREATE TABLE customer_sales (
  id INT,
  amount INT,
  customer_id VARCHAR(30),
  load_datetime TIMESTAMP
);
INSERT INTO customer_sales VALUES (1, 200, 'Mike', '2020-01-01 01:00:00');
INSERT INTO customer_sales VALUES (2, 299, 'John', '2020-01-02 01:00:00');
INSERT INTO customer_sales VALUES (1, 201, 'Mike', '2020-01-01 01:13:00');
INSERT INTO customer_sales VALUES (3, 70, 'Andy', '2020-01-01 01:13:00');

Solution

This was the initial solution below which is not good at all and what you should avoid. However, under time pressure it is important to stay cool and take things slowly. In the end if you are qualified you can succeed.

Answer

select
  id
  ,amount
  ,customer_id
  ,load_datetime
from 
  (
    select 
      id
      ,amount
      ,customer_id
      ,load_datetime
      ,row_number() over (PARTITION BY id order by load_datetime desc) as row_number_rank
    from
      customer_sales
  ) a
where
  row_number_rank = 1;