Home » Top 50 SQL Interview Questions Asked in Data Analyst Interviews

Top 50 SQL Interview Questions Asked in Data Analyst Interviews

Top 50 SQL Interview Questions Asked in Data Analyst Interviews

SQL (Structured Query Language) is a fundamental tool for data analysts, crucial for extracting and analyzing data stored in relational databases. Proficiency in SQL is often a key requirement for data analyst roles, and interviews frequently assess candidates SQL knowledge through a series of targeted questions. Here’s an introduction to some of the top 50 SQL interview questions commonly asked in data analyst interviews:

1. Explain order of execution of SQL

The typical order of execution for an SQL query is:

  1. FROM
  2. WHERE
  3. GROUP BY
  4. HAVING
  5. SELECT
  6. DISTINCT
  7. ORDER BY
  8. LIMIT / OFFSET

2. What is difference between where and having?

The main differences between the WHERE and HAVING clauses in SQL are as follows:

Usage with Aggregate Functions:

  • WHERE: The WHERE clause is used to filter rows before any groupings are made. It filters individual rows based on a condition.
  • HAVING: The HAVING clause is used to filter groups of rows after the GROUP BY clause has grouped them. It filters the results of aggregate functions applied to groups.

Placement:

  • WHERE: The WHERE clause is used with the SELECT statement and appears before the GROUP BY clause, if one exists.
  • HAVING: The HAVING clause is used with the GROUP BY clause and appears after it. It filters groups that have been created by the GROUP BY clause.

3. What is the use of group by?

In database management and data analysis, the GROUP BY clause is a powerful tool used to organize and summarize data based on specific categories. It’s like sorting your groceries by type (fruits, vegetables, dairy) instead of having them all mixed in one bag.

Here’s how Group By works:

  1. Data Selection: You first specify the columns you want to retrieve from your database table.
  2. Grouping Criteria: Then, you define one or more columns by which you want to group the data. These columns act as the categories for your summarized results.
  3. Aggregation: Finally, you use aggregate functions like COUNT, SUM, AVG, MIN, or MAX to perform calculations on the grouped data. These functions provide statistical summaries for each group.

4. Explain all types of joins in SQL?

In SQL, joins are used to combine rows from two or more tables based on related columns between them. There are several types of joins available, each serving different purposes depending on how you want to combine the data. Here are the main types of joins in SQL:

  • INNER JOIN is commonly used to fetch rows that have matching values in both tables.
  • LEFT JOIN and RIGHT JOIN are useful for situations where you want to include unmatched rows from one table (left or right, respectively).
  • FULL JOIN is less common but useful when you want to include all rows from both tables, regardless of whether there is a match.
  • CROSS JOIN is used when you need to combine all rows from one table with all rows from another table, which can be used for generating combinations but needs to be used carefully due to the potential for large result sets.

5. What are triggers in SQL?

In SQL, triggers are special database objects that automatically execute a set of predefined SQL statements whenever a specific event occurs on a particular table. They act like automated scripts that respond to insert, update, or delete operations on a table.

6. What is stored procedure in SQL

In SQL, a stored procedure is a precompiled collection of SQL statements and procedural logic that is stored in the database server. It is similar to a function in programming languages but exists within the database environment. Stored procedures can accept input parameters, perform calculations, and return results to the calling application or user. They provide several advantages, including improved performance, security, and modularity of code.

7. Explain all types of window functions?

Window functions are powerful tools in SQL that allow you to perform calculations across a set of table rows related to the current row. Here’s a brief overview of the types of window functions commonly used:

  1. Aggregate Window Functions: These functions compute a single result from a set of rows defined by the window frame. Examples include SUM(), AVG(), MIN(), MAX(), and COUNT().
  2. Ranking Window Functions: These functions assign a rank or row number to each row within a partition of a result set. Common ones include ROW_NUMBER(), RANK(), DENSE_RANK(), and NTILE().
  3. Analytic Window Functions: These functions compute an aggregate value based on a group of rows and return multiple rows for each input row. Examples include LAG(), LEAD(), FIRST_VALUE(), and LAST_VALUE().
  4. Distribution Window Functions: These functions analyze statistical distributions or cumulative distributions within a window frame. Examples include PERCENT_RANK() and CUME_DIST().

8. What is difference between Delete and Truncate?

FeatureDELETETRUNCATE
Operation TypeDML (Data Manipulation Language)DDL (Data Definition Language)
UsageDeletes rows based on a condition or all rowsRemoves all rows from a table (no conditions)
Transaction LogLogs each row deletionDoes not log individual row deletions
Rollback CapabilityCan be rolled back using a transaction rollbackCannot be rolled back (no transaction rollback)
SpeedSlower for large datasets due to individual loggingFaster for large datasets (no individual logging)
TriggersTriggers are fired for each deleted rowTriggers are not fired
LockingCan cause row-level locks depending on isolationObtains a table-level lock during operation
Where ClauseAllows WHERE clause for selective deletionsDoes not accept WHERE clause
Table StructureTable structure remains unchangedDeallocates table data pages, resetting table

9. What is difference between DML, DDL and DCL?

FeatureDML (Data Manipulation Language)DDL (Data Definition Language)DCL (Data Control Language)
PurposeManipulate data within database objects (rows in tables).Define and manage structure of database objects (tables, views, indexes, etc.).Control access to data within the database (permissions, security).
ExamplesSELECT, INSERT, UPDATE, DELETECREATE, ALTER, DROP, TRUNCATEGRANT, REVOKE
FunctionalityRetrieve, modify, insert, delete data from tables.Create, modify, delete database objects and schema.Grant or revoke permissions and control access to database objects.
ScopeOperates on data (rows) within tables.Operates on database objects (tables, views, indexes, etc.).Operates on access control and permissions.
ImpactAffects the contents of the database (data manipulation).Affects the structure of the database (schema definition).Affects user access and permissions (security management).
Examples of UsageSELECT * FROM table_name;CREATE TABLE table_name (column1 datatype, column2 datatype, …);GRANT SELECT ON table_name TO user_name;

10. What are aggregate function and when do we use them? Explain with few example.

Aggregate functions in SQL are functions that perform a calculation on a set of values and return a single value. They are commonly used to summarize data or compute derived values from multiple rows in a table. Aggregate functions operate on a group of rows and return a single result for each group. Aggregate functions in SQL are used whenever you need to compute a single result value from a set of multiple rows.

Example:

Count the number of orders placed by customers in an Orders table

SELECT COUNT(*) AS TotalOrders
FROM Orders;
SQL

Find the minimum age of employees in an Employees table

SELECT MIN(Age) AS MinimumAge
FROM Employees;
SQL

11. Which is faster between CTE and Subquery?

In terms of performance, CTEs (Common Table Expressions) and subqueries generally perform similarly in modern database systems. Both can be optimized by the query optimizer to generate efficient execution plans.

However, there are slight differences in how they are processed:

  • CTEs are typically more readable and can improve query organization by defining named temporary result sets that can be referenced multiple times within a query. They are especially useful for recursive queries.
  • Subqueries, on the other hand, are inline queries embedded within the main query and can sometimes be optimized differently depending on their structure and usage.

The choice between CTEs and subqueries should primarily consider readability, query complexity, and maintenance rather than performance alone. Optimize queries based on these factors to achieve efficient and manageable SQL code.

12. What are constraints and types of Constraints?

Constraints in SQL are rules or conditions that enforce data integrity and consistency in a database. They specify limits or conditions that data stored in tables must conform to, ensuring the accuracy and reliability of the data. Constraints are defined at the time of table creation or alteration and help maintain the quality of data within the database.

Here are the main types of constraints in SQL:

NOT NULL Constraint : Ensures that a column cannot contain NULL values.

UNIQUE Constraint : Ensures that all values in a column (or a group of columns) are unique (no duplicates).

PRIMARY KEY Constraint : Uniquely identifies each record in a table and ensures that the column or group of columns specified is unique and not NULL.

FOREIGN KEY Constraint : Establishes a relationship between two tables. Ensures referential integrity by requiring that the values in a column (or group of columns) from one table exist in another table’s referenced column (usually the primary key).

CHECK Constraint : Specifies a condition that must be true for each row in a table.

CHECK Constraint with Subquery : Used to enforce a rule based on the result of a subquery.

13. Types of Keys?

In SQL databases, keys are used to uniquely identify records (rows) within a table or to establish relationships between tables. There are several types of keys, each serving a specific purpose in database design. Here are the main types of keys in SQL:

Primary Key:

A primary key uniquely identifies each record in a table. Each table can have only one primary key. Values in the primary key column(s) must be unique and cannot be NULL.

Unique Key:

A unique key constraint ensures that all values in a column (or a set of columns) are unique (no duplicates).Unlike a primary key, a table can have multiple unique keys.

Foreign Key:

A foreign key establishes a link between two tables by referencing the primary key or unique key of another table. It enforces referential integrity, ensuring that values in the foreign key column(s) of the child table exist in the referenced column(s) of the parent table.

Composite Key:

A composite key consists of multiple columns that together uniquely identify a record in a table. Each column in a composite key can contain duplicate values individually, but the combination of values in all columns must be unique.

Super Key:

A super key is a set of attributes (columns) that can uniquely identify a record in a table. It can be a combination of one or more columns.

Candidate Key:

A candidate key is a minimal super key, meaning it is a minimal set of columns that can uniquely identify a record in a table.

Alternate Key:

An alternate key is any candidate key that is not chosen as the primary key.

14. Different types of Operators ?

In SQL, operators are used to perform operations on data values and manipulate data within queries. These operators can be categorized into several types based on their functionality and usage. Here are the main types of operators in SQL:

Arithmetic Operators: Perform mathematical operations on numeric data. Examples include + (addition), - (subtraction), * (multiplication), / (division), and % (modulo).

Comparison Operators: Compare values to determine the relationship between them. Examples include = (equal to), <> or != (not equal to), < (less than), > (greater than), <= (less than or equal to), and >= (greater than or equal to).

Logical Operators: Combine conditions to form more complex conditions. Examples include AND, OR, and NOT. These operators are used in WHERE clauses and other conditional expressions.

Concatenation Operator: Combines strings or columns of text data. In SQL, this is typically represented by || (double vertical bars) or CONCAT() function.

Bitwise Operators: Perform operations at the bit level on binary data. Examples include & (bitwise AND), | (bitwise OR), ^ (bitwise XOR), ~ (bitwise NOT), and << (left shift) and >> (right shift).

Assignment Operator: Assigns a value to a variable or column. In SQL, this is typically represented by =.

IN Operator: Tests whether a value matches any value in a list or a subquery. For example, value IN (list).

LIKE Operator: Tests whether a string matches a specified pattern. It’s often used with wildcard characters (% for zero or more characters, _ for a single character).

IS NULL Operator: Tests whether a value is NULL or not. It’s used in conjunction with the IS keyword, e.g., column IS NULL.

15. Difference between Where and Group By?

FeatureWHERE ClauseGROUP BY Clause
PurposeFilters rows based on specified conditions.Groups rows based on one or more columns for aggregation.
UsageUsed to narrow down the result set.Used to aggregate data and summarize information.
Operates onIndividual rows.Entire result set.
Functions withIndividual rows and conditions.Aggregation functions (SUM(), COUNT(), etc.).
ExampleSELECT * FROM employees WHERE department = 'Sales';SELECT department, COUNT(*) AS num_employees FROM employees GROUP BY department;
Order of ExecutionApplied before GROUP BY.Applied after WHERE and before SELECT.

16. What are Views?

In SQL, a view is a virtual table that is based on the result-set of a SELECT statement. It is not stored as a distinct object like a table; instead, it is a precompiled SQL query that is stored in the database with an associated name. Views can simplify complex queries, provide a layer of security, and encapsulate frequently used queries into reusable objects.

17. What is difference between varchar and nvarchar?

FeatureVARCHARNVARCHAR
StorageVariable-length non-Unicode character data.Variable-length Unicode character data.
SizeDefined size (1 to 8,000 bytes).Defined size (1 to 4,000 characters).
Character SetUses the database default character set (e.g., ASCII).Uses Unicode (UTF-16) character set.
Space UsageUses space for actual data length plus 2 bytes.Uses space for actual data length plus 2 bytes per character.
PrefixNone (typically specified without a prefix).N prefix (e.g., NVARCHAR(50)).
ExampleDECLARE @name VARCHAR(50) = 'John';DECLARE @name NVARCHAR(50) = N'John';
UsageSuitable for non-Unicode data like names or descriptions.Suitable for storing multilingual data that includes Unicode characters.

18. Difference between char and nchar?

FeatureCHARNCHAR
StorageFixed-length character data.Fixed-length Unicode character data.
SizeDefined size (1 to 8,000 bytes).Defined size (1 to 4,000 characters).
Character SetUses the database default character set (e.g., ASCII).Uses Unicode (UTF-16) character set.
Space UsageUses space for maximum length even if actual data is shorter.Similar, but accommodates Unicode characters, using 2 bytes per character.
PrefixNone (typically specified without a prefix).N prefix (e.g., NCHAR(50)).
ExampleDECLARE @name CHAR(10) = 'John';DECLARE @name NCHAR(10) = N'John';
UsageSuitable for fixed-length data like codes or IDs.Suitable for storing multilingual data that includes Unicode characters.

19. What is an index? Explain its different types.

An index in SQL is a structure that improves the speed of data retrieval operations on a database table at the cost of additional space and overhead during data modification. Here’s an explanation of different types of indexes commonly used in databases:

Primary Index: Defines the physical order of rows based on the primary key. Unique and automatically created with primary key constraints.

Unique Index: Ensures uniqueness of values in one or more columns, facilitating fast data retrieval and preventing duplicates.

Clustered Index: Dictates the physical order of data rows in a table based on the indexed column(s). Each table can have only one.

Non-Clustered Index: Stores a separate structure from the table data, enabling fast retrieval of data based on indexed columns.

Composite Index: Includes multiple columns in its definition, optimizing queries that filter based on combinations of these columns.

Covering Index: Includes all columns required to satisfy a query, enhancing performance by minimizing access to the actual table data.

Full-Text Index: Specialized for efficient searching of large text data, supporting queries on words and phrases within indexed text columns.

20. Differentiate between UNION and UNION ALL.

FeatureUNIONUNION ALL
PurposeCombines result sets of two or more SELECT queries, removing duplicate rows.Combines result sets of two or more SELECT queries, including all rows (including duplicates).
Duplicate RowsRemoves duplicate rows from the combined result set.Includes all rows from each SELECT query, even if duplicates exist.
PerformanceGenerally slower due to the need to check for and remove duplicates.Generally faster because it does not check for or remove duplicates.
SyntaxSELECT columns FROM table1 UNION SELECT columns FROM table2;SELECT columns FROM table1 UNION ALL SELECT columns FROM table2;

21. How many types of clauses in SQL ?

In SQL, there are several types of clauses that are used to specify conditions, constraints, or actions within queries. Here’s a brief summary of the main types of clauses:

  1. SELECT: Retrieves data from one or more tables or expressions.
  2. FROM: Specifies the tables or views from which to retrieve data.
  3. WHERE: Filters rows based on specified conditions.
  4. GROUP BY: Groups rows that have the same values into summary rows.
  5. HAVING: Filters groups based on specified conditions.
  6. ORDER BY: Sorts the result set in ascending or descending order.
  7. JOIN: Specifies how to combine rows from two or more tables based on a related column.
  8. INNER JOIN: Returns rows that have matching values in both tables.
  9. LEFT JOIN (or LEFT OUTER JOIN): Returns all rows from the left table and matching rows from the right table.
  10. RIGHT JOIN (or RIGHT OUTER JOIN): Returns all rows from the right table and matching rows from the left table.
  11. FULL JOIN (or FULL OUTER JOIN): Returns all rows when there is a match in either left or right table.
  12. UNION: Combines the result sets of two or more SELECT statements, removing duplicate rows.
  13. UNION ALL: Combines the result sets of two or more SELECT statements, including all rows (including duplicates).
  14. LIMIT (or FETCH FIRST): Limits the number of rows returned in the result set.
  15. OFFSET: Skips a specified number of rows before returning the result set.

22. What are the various types of relationships in SQL?

Here are the various types of relationships between tables in SQL, explained briefly:

  1. One-to-One Relationship:
    • Each record in Table A is associated with exactly one record in Table B, and vice versa. This relationship is less common in database design due to its inflexibility.
  2. One-to-Many Relationship:
    • Each record in Table A can be associated with multiple records in Table B, but each record in Table B is associated with only one record in Table A. This is the most common type of relationship.
  3. Many-to-One Relationship:
    • Multiple records in Table A can be associated with a single record in Table B. This is essentially the reverse of a One-to-Many relationship.
  4. Many-to-Many Relationship:
    • Multiple records in Table A can be associated with multiple records in Table B, and vice versa. This type of relationship requires a junction table (or associative table) to manage the many-to-many relationship.

23. Difference between Primary Key and Secondary Key?

FeaturePrimary KeySecondary Key
DefinitionUnique identifier for each row in a table.Indexed column(s) used for efficient querying.
UniquenessMust contain unique values; no duplicates allowed.May or may not contain unique values.
NULL ValuesDoes not allow NULL values.May allow NULL values unless otherwise constrained.
Number per TableExactly one primary key per table.Can have multiple secondary keys per table.
UsageEstablishes entity integrity and relationships.Improves query performance by indexing frequently queried columns.
ConstraintsAutomatically creates a unique constraint.Can include unique keys, foreign keys, or indexed columns.

24. What is the difference between where and having?

FeatureWHERE ClauseHAVING Clause
UsageFilters rows before grouping.Filters groups after grouping.
Applied ToIndividual rows.Groups of rows (result of aggregate functions).
AggregationCannot be used with aggregate functions.Used with aggregate functions (SUM, COUNT, AVG, etc.).

25. Find the second highest salary of an employee?

/*Employees Table*/

CREATE TABLE Employees (
    EmployeeID INT,
    Salary DECIMAL(10, 2)
);

INSERT INTO Employees (EmployeeID, Salary) VALUES
(1, 50000),
(2, 60000),
(3, 45000),
(4, 70000),
(5, 55000),
(6, 65000);
SQL
/*Query for finding second highest salary*/

SELECT MAX(Salary) AS SecondHighestSalary
FROM Employees
WHERE Salary < (
    SELECT MAX(Salary) FROM Employees
);
SQL

26. Write retention query in SQL?

1. Create a Table

CREATE TABLE UserActivity (
    UserID INT,
    ActivityDate DATE
);
SQL

2. Insert Data

INSERT INTO UserActivity (UserID, ActivityDate) VALUES
(1, '2024-01-01'),
(2, '2024-01-02'),
(1, '2024-02-01'),
(3, '2024-02-05'),
(1, '2024-03-01'),
(2, '2024-03-03'),
(4, '2024-03-10'),
(1, '2024-04-01'),
(3, '2024-04-05'),
(1, '2024-05-01'),
(2, '2024-05-02'),
(4, '2024-05-10');
SQL

3. Calculate monthly retention rate

WITH RetentionData AS (
    SELECT
        DATE_FORMAT(ActivityDate, '%Y-%m') AS Month,
        COUNT(DISTINCT CASE WHEN EXTRACT(MONTH FROM ActivityDate) = EXTRACT(MONTH FROM DATE_SUB(ActivityDate, INTERVAL 1 MONTH)) THEN UserID END) AS RetainedUsers,
        COUNT(DISTINCT UserID) AS TotalUsers
    FROM
        UserActivity
    WHERE
        ActivityDate >= '2024-01-01' -- Specify start date
    GROUP BY
        DATE_FORMAT(ActivityDate, '%Y-%m')
)

SELECT
    Month,
    RetainedUsers,
    TotalUsers,
    ROUND((RetainedUsers / TotalUsers) * 100, 2) AS RetentionRate
FROM
    RetentionData
ORDER BY
    Month;
SQL

27. Write a query for cummulative sum in SQL?

1. Create a table

CREATE TABLE Sales (
    SaleDate DATE,
    Amount DECIMAL(10, 2)
);
SQL

2. Insert some data

INSERT INTO Sales (SaleDate, Amount) VALUES
('2024-01-01', 100),
('2024-01-02', 150),
('2024-01-03', 200),
('2024-01-04', 75),
('2024-01-05', 300);
SQL

3. Run a query to calculate cumulative sum

SELECT
    SaleDate,
    Amount,
    SUM(Amount) OVER (ORDER BY SaleDate) AS CumulativeSum
FROM
    Sales
ORDER BY
    SaleDate;
SQL

28. Difference between Function and Store procedure ?

  • Return Value: Functions must return a value, whereas stored procedures may or may not return a value.
  • Usage: Functions are used to compute and return values within SQL queries, while stored procedures are used to execute a series of SQL statements and tasks.
  • Flexibility: Stored procedures offer more flexibility in terms of executing multiple statements, handling transactions, and performing complex logic, whereas functions are limited to returning a single value.
  • Invoking: Functions are typically invoked within SQL statements, while stored procedures are executed by calling their name.

29. Do we use variable in views?

No, SQL views do not typically support the use of variables directly within their definition. Views are essentially stored queries that are executed dynamically when queried against. They are used to simplify complex queries and provide a layer of abstraction over the underlying tables.

Variables, on the other hand, are used within procedural code (like stored procedures or functions) to hold temporary values or perform calculations. They are not directly supported within the definition of SQL views.

If you need to use variables within SQL operations, you would typically use them within stored procedures, functions, or dynamic SQL statements, but not directly within views.

30. What are the limitations of views?

Here are the limitations of views

  1. Performance Impact: Views can impact query performance due to query complexity and lack of materialization.
  2. Limited Updateability: Views may not support updates if they involve complex joins, aggregations, or derived columns.
  3. Index Limitations: Indexed views may not be optimally used, especially with complex view definitions.
  4. Security Constraints: Access to views depends on underlying table permissions, limiting fine-grained access control.
  5. Complexity Restrictions: Some databases restrict the nesting and data types in view definitions.
  6. Platform-Specific: Different database systems have varying syntax support and optimizations for views.
  7. Materialization: Materialized views introduce maintenance and storage considerations.