[2026] Valid DAA-C01 test answers & Snowflake DAA-C01 exam pdf
Verified DAA-C01 dumps Q&As - Pass Guarantee or Full Refund
NEW QUESTION # 11
You have a Snowflake table 'RAW_DATA' containing a VARIANT column called 'json_data'. This column stores JSON objects representing customer orders. The structure includes a nested array of items within each order. You need to create a flattened table called 'ORDER_ITEMS' whose columns include 'order_id' and 'item_id'. However, the 'item_id' field is not directly present in the JSON data. Instead, it needs to be derived by concatenating the 'order_id' with the index (ordinal position) of the item within the 'items' array. The structure looks like this: { "order_id": "ORD-123", "customer_id": "CUST-456", "items": [ { "item_name": "Laptop", "item_price": 1200 }, { "item_name": "Mouse", "item_price": 25 } ] } Which of the following SQL statements correctly creates the 'ORDER_ITEMS' table?
- A. Option A
- B. Option E
- C. Option C
- D. Option B
- E. Option D
Answer: B
Explanation:
Option E correctly uses the 'LATERAL FLATTEN' function to unnest the 'items' array. The key is using 'f.seq' (the sequence number provided by the 'FLATTEN' function), which is the ordinal position of the item in the array (starting from 1), to create the item_id. It concatenates the order_id with the sequence number to generate a unique item_id. Option A is wrong because row_number is an aggregate function and needs GROUP BY to execute. Option B uses f.index, which does not exist in the output of LATERAL FLATTEN. Option C is correct in most respects; however, the 'raw_data' alias is missing, so the statement fails with an error. Option D also uses seq; however, it adds 1, which changes the index to start from 1, which might be wrong.
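The derivation the question asks for (one row per array element, with an item_id built from the order_id and the element's position) can be sketched outside Snowflake. This is plain Python, not Snowflake SQL; the function name `flatten_order_items`, the "-" separator, and the zero-based position are assumptions made for illustration.

```python
import json


def flatten_order_items(raw_json):
    """Return one row per element of the order's 'items' array."""
    order = json.loads(raw_json)
    rows = []
    # enumerate() supplies each element's position in the array (0-based here).
    for position, item in enumerate(order.get("items", [])):
        rows.append({
            "order_id": order["order_id"],
            # Hypothetical format: order_id, a dash, then the position.
            "item_id": f"{order['order_id']}-{position}",
            "item_name": item["item_name"],
            "item_price": item["item_price"],
        })
    return rows


sample = (
    '{"order_id": "ORD-123", "customer_id": "CUST-456", "items": ['
    '{"item_name": "Laptop", "item_price": 1200}, '
    '{"item_name": "Mouse", "item_price": 25}]}'
)
rows = flatten_order_items(sample)
```

Each order contributes as many rows as it has items, and the derived identifier stays unique as long as the order_id itself is unique.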
NEW QUESTION # 12
You have a Snowflake table called 'PRODUCT_SALES' with columns 'PRODUCT_ID' (INT), 'SALE_DATE' (DATE), and 'SALES_AMOUNT'. You want to implement a data integrity rule to prevent duplicate records based on 'PRODUCT_ID' and 'SALE_DATE'. Which of the following methods provides the most effective way to achieve this in Snowflake, and why?
- A. Create a view that filters out duplicate records using 'ROW_NUMBER()' and partitioning by 'PRODUCT_ID' and 'SALE_DATE'. This guarantees data integrity during query execution.
- B. Create a user-defined function (UDF) that checks for the existence of data before insert. If the data already exists, don't insert it.
- C. Create a stored procedure that runs periodically to identify and delete duplicate records based on 'PRODUCT_ID' and 'SALE_DATE'. This approach fixes integrity issues reactively.
- D. Implement data validation within your ETL pipeline before loading data into Snowflake to prevent duplicates from entering the table. This approach keeps the table clean from the start.
- E. Add a composite UNIQUE constraint on 'PRODUCT_ID' and 'SALE_DATE'. Snowflake will automatically prevent the insertion of duplicate rows during data loading or insertion.
Answer: D,E
Explanation:
Options D and E provide the most effective methods. A composite UNIQUE constraint (E) directly prevents duplicate insertions at the table level. Validating in the ETL pipeline (D) prevents duplicates before they even reach the database. A view (A) only masks the issue, and a stored procedure (C) is reactive and doesn't prevent duplicates from being inserted in the first place. A UDF (B) could be helpful but is not the best option for this scenario.
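The pipeline-side validation described above can be sketched as a small pre-load filter. This is a minimal Python illustration, not a Snowflake feature; the function name `drop_duplicate_sales`, the lowercase dictionary keys, and the keep-the-first-row policy are assumptions.

```python
def drop_duplicate_sales(rows):
    """Keep the first row seen for each (product_id, sale_date) pair."""
    seen = set()
    unique_rows = []
    for row in rows:
        key = (row["product_id"], row["sale_date"])
        if key in seen:
            # A row with this key was already accepted; skip the duplicate.
            continue
        seen.add(key)
        unique_rows.append(row)
    return unique_rows


batch = [
    {"product_id": 1, "sale_date": "2024-01-05", "sales_amount": 10.0},
    {"product_id": 1, "sale_date": "2024-01-05", "sales_amount": 12.5},
    {"product_id": 2, "sale_date": "2024-01-05", "sales_amount": 7.0},
]
clean = drop_duplicate_sales(batch)
```

A real pipeline would also need to compare the batch against rows already in the table, which this sketch leaves out.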
NEW QUESTION # 13
How do materialized views differ from secure views in data analysis?
- A. Materialized views restrict data access for improved security.
- B. Secure views provide precomputed snapshots, unlike materialized views.
- C. Secure views precompute data, unlike materialized views.
- D. Materialized views offer enhanced data security while allowing selective data access.
Answer: C
Explanation:
Secure views offer enhanced data security without precomputing data, distinguishing them from materialized views.
NEW QUESTION # 14
How do logging and monitoring solutions contribute to data processing solutions? (Select all that apply)
- A. Ensure real-time data processing
- B. Provide insights into processing status
- C. Automate data processing effectively
- D. Respond to processing failures promptly
Answer: B,D
Explanation:
Logging and monitoring solutions help respond to processing failures promptly and provide insights into processing status.
NEW QUESTION # 15
You have a table 'product_catalog' containing a 'description' column of type TEXT, and a 'tags' column which is a VARIANT containing an array of strings representing tags associated with the product. You need to build an efficient search mechanism that allows users to find products matching specific tags. Considering scalability and performance for large catalogs, which of the following methods using table functions and Snowflake's search capabilities would be most suitable? Choose all that apply.
- A. Create a search optimization service on the 'product_catalog' table including the 'description' and 'tags' columns. Use LATERAL FLATTEN to expand the 'tags' array and then create an index on the flattened 'tag' values.
- B. Use a Java UDF to iterate over the 'tags' array and check if any of the tags match the search terms. Apply this UDF in a WHERE clause along with a CONTAINS() check on the 'description'
- C. Create a search optimization service on the 'product_catalog' table including the 'description' column. When querying, use a combination of CONTAINS() for 'description' and ARRAY_CONTAINS() on the 'tags' column.
- D. Create a search optimization service on the 'product_catalog' table including the 'description' and 'tags' columns. When querying, use a combination of CONTAINS() for 'description' and ARRAY_CONTAINS() on the 'tags' column and a 'SEARCH' clause to filter results.
- E. Create a view that flattens the 'tags' array using LATERAL FLATTEN into a 'tag' column, and then create a full-text index on the 'description' column. Query the view using CONTAINS() or LIKE operator on the 'description' and EQUALS operator on the 'tag' column.
Answer: C,D
Explanation:
The search optimization service in Snowflake is designed to accelerate search queries and is best practice here. Using ARRAY_CONTAINS() on the 'tags' column lets you check directly whether the array contains specific tags, and using CONTAINS() on the 'description' column finds specific search terms in the description. Using a 'SEARCH' clause can improve search performance significantly. Options C and D are both correct, since both use CONTAINS() as well as ARRAY_CONTAINS(), but option D also includes the 'SEARCH' clause, which is more efficient. Option A is incorrect, as indexes are not allowed on flattened data. The UDF in option B will have performance issues. Creating a view and indexing it (option E) is not optimal, as querying directly with CONTAINS on the 'tags' column gives faster results.
NEW QUESTION # 16
What is the primary benefit of using secure views in data analysis?
- A. Secure views offer enhanced data security while allowing selective data access.
- B. They don't impact data security but significantly enhance query performance.
- C. Secure views simplify complex data structures more effectively than materialized views.
- D. They prevent the creation of materialized views.
Answer: A
Explanation:
Secure views enhance data security while allowing selective data access.
NEW QUESTION # 17
A marketing analytics team is building a dashboard to track campaign performance. They have campaign data stored in Snowflake, including cost, impressions, clicks, and conversions. The data is currently stored in a single table, 'CAMPAIGN_DATA', with columns like 'date', 'cost', 'impressions', 'clicks', and 'conversions'. They want to optimize query performance for various aggregations and time-series analysis. Which of the following strategies would be MOST beneficial for improving dashboard responsiveness?
- A. Create a materialized view that pre-aggregates the data by campaign_id and date.
- B. Cluster the 'CAMPAIGN_DATA' table on the 'campaign_id' column.
- C. Create a search optimization on the 'CAMPAIGN_DATA' table on the 'campaign_id' column.
- D. Create a standard view that performs the aggregations on demand.
- E. Partition the 'CAMPAIGN_DATA' table by date.
Answer: A
Explanation:
Materialized views (option A) pre-compute and store the results of the aggregation, making the dashboard queries much faster. Standard views (option D) perform the aggregations every time they are queried. Clustering (option B) can help with filtering but is not as effective as pre-aggregation. Partitioning (option E) is not supported in Snowflake. Search optimization (option C) helps with point lookups but not aggregations over large datasets.
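The idea of pre-aggregating once and reading the stored result can be sketched with Python's built-in sqlite3 module. SQLite has no materialized views, so an ordinary summary table stands in for one here; the table name `campaign_daily` and the sample rows are assumptions, and the 'campaign_id' column is taken from the answer options.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE campaign_data (campaign_id TEXT, "date" TEXT, cost REAL, '
    "impressions INTEGER, clicks INTEGER, conversions INTEGER)"
)
conn.executemany(
    "INSERT INTO campaign_data VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("c1", "2024-05-01", 10.0, 1000, 50, 5),
        ("c1", "2024-05-01", 5.0, 500, 25, 2),
        ("c2", "2024-05-01", 8.0, 800, 40, 4),
    ],
)

# Stand-in for a materialized view: aggregate once and store the result.
conn.execute(
    """
    CREATE TABLE campaign_daily AS
    SELECT campaign_id, "date",
           SUM(cost) AS total_cost,
           SUM(clicks) AS total_clicks
    FROM campaign_data
    GROUP BY campaign_id, "date"
    """
)

# Dashboard queries read the small pre-aggregated table instead of raw rows.
daily = conn.execute(
    'SELECT campaign_id, "date", total_cost, total_clicks '
    'FROM campaign_daily ORDER BY campaign_id, "date"'
).fetchall()
```

Unlike this static summary table, a Snowflake materialized view is kept up to date by the platform when the base table changes.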
NEW QUESTION # 18
You are loading data from a series of CSV files into Snowflake using Snowsight. The files have varying column orders and some files are missing certain columns. You need to ensure that all data is loaded into a consistent table schema, handling missing columns gracefully.
Which of the following strategies is MOST effective in Snowsight to achieve this?
- A. Pre-process the CSV files before loading using a scripting language (e.g., Python) to standardize the column order and add missing columns with NULL values. Then, load the pre-processed files into Snowflake using Snowsight.
- B. Create multiple file formats, one for each unique CSV file structure. Use Snowsight's 'Load Data' wizard to load each set of files with the corresponding file format. Use UNION ALL to combine the data from multiple tables into a single view.
- C. Define a file format with 'SKIP_HEADER = 1' and load all CSV files into a single table with all possible columns defined as VARCHAR. After loading, use SQL queries with conversion functions to convert the data to the appropriate data types.
- D. Define a file format with 'SKIP_HEADER = 1', a 'FIELD_OPTIONALLY_ENCLOSED_BY' setting, and 'NULL_IF = ('', 'NULL')'. Create a single table with all columns defined and use Snowsight's 'Load Data' wizard to load the CSV files. Columns not present in a given CSV file will automatically be populated with NULL.
- E. Load the data into a staging table with a single VARIANT column. Then, use SQL with 'CASE' statements and the 'GET' function to extract data from the VARIANT column into the target table with the desired schema.
Answer: D
Explanation:
Option D is the most effective strategy. With a file format that sets 'SKIP_HEADER = 1', 'FIELD_OPTIONALLY_ENCLOSED_BY', and 'NULL_IF = ('', 'NULL')', Snowflake can handle missing columns by populating them with NULL values during the load process. Creating a single table with all columns defined ensures data consistency. Option C works, but the type conversion after loading is less efficient and more error-prone. Option B requires managing multiple file formats and using UNION ALL, which can be complex. Option E, using VARIANT, will work but adds extra complexity. Option A requires an external preprocessing step, which is less desirable.
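What "a consistent schema with missing columns as NULL" means in practice can be sketched with Python's csv module. This illustrates the pre-processing idea from the options rather than Snowflake's own loader; the column names in `TARGET_COLUMNS` and the two sample files are hypothetical.

```python
import csv
import io

# Hypothetical target schema; the real table's columns are not given.
TARGET_COLUMNS = ["id", "name", "email"]


def normalize_csv(text):
    """Map each CSV row onto TARGET_COLUMNS, using None for missing values."""
    reader = csv.DictReader(io.StringIO(text))
    # DictReader matches values by header name, so column order does not matter.
    return [
        {col: (row.get(col) or None) for col in TARGET_COLUMNS}
        for row in reader
    ]


file_a = "id,name,email\n1,Ann,ann@example.com\n"
file_b = "name,id\nBob,2\n"  # different column order, no email column
rows = normalize_csv(file_a) + normalize_csv(file_b)
```

Empty strings are also mapped to None here, which mirrors treating empty fields as NULL.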
NEW QUESTION # 19
In diagnostic analysis, what significance do demographics and relationships hold when identifying anomalies? (Select all that apply)
- A. Ignoring data relationships for focused analysis
- B. Considering relationships among data variables
- C. Analyzing only recent demographic data for anomalies
- D. Identifying demographic variations linked to anomalies
Answer: B,D
Explanation:
Identifying demographic variations and considering relationships are crucial in identifying anomalies during diagnostic analysis.
NEW QUESTION # 20
You are tasked with performing a descriptive analysis of website traffic data stored in a Snowflake table named 'website_traffic'. The table includes columns such as 'session_id', 'user_id', 'page_url', 'timestamp', and 'device_type'. Which of the following SQL queries would be MOST efficient and accurate for calculating the daily active users (DAU) and their device distribution?
- A.

- B.

- C.

- D.

- E.

Answer: E
Explanation:
Option E is the most efficient and accurate. It correctly uses 'COUNT(DISTINCT user_id)' to calculate DAU, groups by date and device type, and orders the results. Option A is missing the aggregation needed to calculate DAU per device. Option B uses APPROX_COUNT_DISTINCT, which is less accurate. Option C counts all user_id entries, not distinct users. Option D includes user_id in the GROUP BY, causing an incorrect DAU calculation, and calculates total users incorrectly.
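The query shape the explanation describes (distinct users, grouped by day and device, ordered) can be tried with Python's built-in sqlite3 module. SQLite stands in for Snowflake here, so the date function and type names differ from what the actual answer option would use; the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE website_traffic (session_id TEXT, user_id TEXT, "
    'page_url TEXT, "timestamp" TEXT, device_type TEXT)'
)
conn.executemany(
    "INSERT INTO website_traffic VALUES (?, ?, ?, ?, ?)",
    [
        ("s1", "u1", "/home", "2024-01-01 09:00:00", "mobile"),
        ("s2", "u1", "/cart", "2024-01-01 10:00:00", "mobile"),
        ("s3", "u2", "/home", "2024-01-01 11:00:00", "desktop"),
        ("s4", "u2", "/home", "2024-01-02 08:00:00", "desktop"),
        ("s5", "u3", "/home", "2024-01-02 09:00:00", "desktop"),
    ],
)

# COUNT(DISTINCT user_id) counts each user once per day and device,
# no matter how many sessions that user had.
dau = conn.execute(
    """
    SELECT DATE("timestamp") AS activity_date,
           device_type,
           COUNT(DISTINCT user_id) AS daily_active_users
    FROM website_traffic
    GROUP BY DATE("timestamp"), device_type
    ORDER BY activity_date, device_type
    """
).fetchall()
```

User u1 has two mobile sessions on the first day but contributes a single active user, which is the difference between counting distinct users and counting rows.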
NEW QUESTION # 21
When performing a diagnostic analysis, what action aids in identifying demographics and relationships? (Select all that apply)
- A. Ignoring data relationships for focused analysis
- B. Analyzing statistical trends
- C. Collecting related data
- D. Focusing solely on isolated data points
Answer: B,C
Explanation:
Analyzing statistical trends and collecting related data are crucial in identifying demographics and relationships in diagnostic analysis.
NEW QUESTION # 22
How does leveraging clones aid in handling specific use-cases and maintaining data integrity in Snowflake?
- A. Clones restrict data access for specific user roles
- B. Clones enforce data consistency across multiple warehouses
- C. Clones allow isolated testing and analysis without impacting original data
- D. Clones facilitate real-time data updates
Answer: C
Explanation:
Clones in Snowflake enable isolated testing and analysis without affecting original data, supporting specific use-cases while maintaining data integrity by providing a separate environment for manipulation.
NEW QUESTION # 23
You have a Snowsight dashboard that visualizes daily sales trends. Business users complain that the dashboard takes too long to load, especially when filtering by specific product categories. The underlying data resides in a large table partitioned by 'sale_date'. Which of the following actions would BEST improve the dashboard's performance, assuming the filters are appropriately configured in the dashboard and the virtual warehouse size is already appropriately sized?
- A. Implement result caching by setting = TRUE at the session level.
- B. Convert the dashboard into a Streamlit application for improved rendering performance.
- C. Increase the virtual warehouse size used by Snowsight.
- D. Create a materialized view that pre-aggregates the data used by the dashboard, including the dimensions used in the filters.
- E. Use query acceleration on the base table to improve the speed of underlying queries when the filters are applied by users.
Answer: D
Explanation:
Creating a materialized view pre-aggregates the data, significantly reducing query execution time. The materialized view stores the result of a query, and Snowflake automatically refreshes it when the underlying data changes. Since the product categories are used as filters, pre-aggregating along these dimensions directly addresses the slow loading times. Increasing warehouse size (C) only helps if compute resources are the bottleneck, which might not be the primary issue. Converting to Streamlit (B) changes the presentation layer but doesn't inherently improve data retrieval. Query acceleration (E) can help, but only if it is properly sized and configured. Session-level caching (A) might only benefit the same user; when multiple users access the same dashboard, pre-aggregated results in a materialized view are the better approach.
NEW QUESTION # 24
When optimizing query performance in Snowflake, what benefits does result caching provide?
- A. Restricts query optimization
- B. Limits data access for specific user roles
- C. Speeds up query execution by storing intermediate results
- D. Improves schema changes management
Answer: C
Explanation:
Result caching accelerates query execution by storing intermediate results, reducing processing time for repetitive or commonly accessed queries.
NEW QUESTION # 25
You are tasked with enriching a 'SALES_DATA' table in Snowflake with geographic information based on IP addresses. You have access to an external function 'GEO_LOOKUP(ip_address)' that returns a JSON object containing geographical details (city, region, country) for a given IP address. The 'SALES_DATA' table contains 'SALE_ID', 'CUSTOMER_ID', 'IP_ADDRESS', and 'SALE_AMOUNT' columns. You need to enrich the table with city and country information derived from the IP address. Which of the following statements will correctly add 'CITY' and 'COUNTRY' columns to a new table 'ENRICHED_SALES_DATA' based on the external function 'GEO_LOOKUP', correctly handling potential NULL values and ensuring data type consistency?
- A. Option A
- B. Option E
- C. Option D
- D. Option C
- E. Option B
Answer: C
Explanation:
Option D is the most robust solution. It extracts the city and country values from the JSON object returned by the 'GEO_LOOKUP' function and casts them so the data is stored as strings. Critically, it handles cases where 'GEO_LOOKUP' might return NULL (e.g., for invalid IP addresses), preventing errors and providing a default value ('Unknown'). Option A does not handle NULL. Option B's 'GET_PATH' is not a standard Snowflake function for JSON parsing. Option C parses the 'GEO_LOOKUP' output as JSON even when it is not JSON, which can result in 'CITY' and 'COUNTRY' becoming NULL. Option E's 'PARSE_JSON' would throw errors on invalid JSON strings in the IP address.
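The NULL-handling pattern described above (extract, cast to string, fall back to 'Unknown') can be sketched in Python. The `geo_lookup` function below is a hypothetical stand-in for the external function, with a made-up lookup table; only the 'Unknown' default comes from the explanation.

```python
import json


def geo_lookup(ip_address):
    """Hypothetical stand-in: returns JSON text, or None for unknown IPs."""
    known = {
        "203.0.113.7": '{"city": "Lyon", "region": "ARA", "country": "France"}',
    }
    return known.get(ip_address)


def enrich(sale):
    """Add city and country to a sale row, defaulting to 'Unknown'."""
    raw = geo_lookup(sale["ip_address"])
    geo = json.loads(raw) if raw is not None else {}
    return {
        **sale,
        # 'or' covers both a missing key and an explicit JSON null.
        "city": str(geo.get("city") or "Unknown"),
        "country": str(geo.get("country") or "Unknown"),
    }


sales = [
    {"sale_id": 1, "ip_address": "203.0.113.7", "sale_amount": 40.0},
    {"sale_id": 2, "ip_address": "not-an-ip", "sale_amount": 15.0},
]
enriched = [enrich(s) for s in sales]
```

The second row has no lookup result, so both derived columns fall back to the default instead of raising an error or storing a null.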
NEW QUESTION # 26
You are investigating a performance bottleneck in a frequently used Snowflake query. You suspect the bottleneck might be due to data skew in a particular table. Which Snowflake system function and query combination would be the MOST efficient way to collect data to diagnose data skew on the 'orders' table, specifically for the column?
- A.

- B.

- C.

- D.

- E.

Answer: B
Explanation:
Option B directly addresses data skew by grouping by 'order_date' and counting occurrences, then ordering by count in descending order to show the most frequent dates. The LIMIT 10 clause ensures that only the dates with the highest counts are returned, improving efficiency. Option A only provides an approximate distinct count, not showing the distribution. Option C is used to estimate query cost and not identify data skew. Option D identifies dates with counts above average, which doesn't necessarily indicate skew in the TOP most frequent dates. Option E only shows table properties.
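The group, count, order, and limit pattern described above can be tried with Python's built-in sqlite3 module as a stand-in for Snowflake. The sample rows and the `row_count` alias are assumptions; the secondary sort on order_date is added only to make tied counts come back in a stable order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, order_date TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [
        (1, "2024-03-01"),
        (2, "2024-03-01"),
        (3, "2024-03-01"),
        (4, "2024-03-02"),
        (5, "2024-03-03"),
    ],
)

# The most frequent values come first; a heavily skewed column shows
# a few values with far larger counts than the rest.
top_dates = conn.execute(
    """
    SELECT order_date, COUNT(*) AS row_count
    FROM orders
    GROUP BY order_date
    ORDER BY row_count DESC, order_date
    LIMIT 10
    """
).fetchall()
```

Comparing the top counts with the table's total row count gives a rough sense of how concentrated the data is.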
NEW QUESTION # 27
When utilizing materialized views, what benefit do they offer in terms of query performance and data retrieval?
- A. Materialized views provide precomputed snapshots, improving query performance.
- B. They offer real-time updates reflecting instantaneous database changes.
- C. Regular views simplify complex data structures for better query performance.
- D. Materialized views restrict data retrieval for improved security.
Answer: A
Explanation:
Materialized views provide precomputed snapshots, enhancing query performance.
NEW QUESTION # 28
You have a CSV file loaded into a Snowflake table named 'raw_data'. The file contains customer order data, but some rows have missing values in the 'order_date' column. You need to create a new table, 'cleaned_data', that contains only valid records and handles missing 'order_date' values by substituting them with the date '1900-01-01'. Which of the following approaches is the MOST efficient and correct way to achieve this using Snowflake features?
- A.

- B.

- C.

- D.

- E.

Answer: B
Explanation:
Option E is the most efficient and correct. 'COALESCE' efficiently handles NULL replacement, and the replacement value is given the correct data type (DATE). It also explicitly selects all other columns. Option A only filters out rows with a NULL order_date. Options B, C, and D create a new column and do not carry over all the other columns, which would have made them more appropriate.
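The COALESCE substitution can be tried with Python's built-in sqlite3 module as a stand-in for Snowflake. SQLite stores dates as text, so the explicit DATE typing the explanation mentions is not reproduced here; the 'order_id' and 'customer_id' columns and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE raw_data (order_id INTEGER, customer_id TEXT, order_date TEXT)"
)
conn.executemany(
    "INSERT INTO raw_data VALUES (?, ?, ?)",
    [(1, "C1", "2024-02-10"), (2, "C2", None)],
)

# COALESCE returns its first non-NULL argument, so only missing
# dates are replaced by the placeholder value.
conn.execute(
    """
    CREATE TABLE cleaned_data AS
    SELECT order_id,
           customer_id,
           COALESCE(order_date, '1900-01-01') AS order_date
    FROM raw_data
    """
)
cleaned = conn.execute(
    "SELECT order_id, customer_id, order_date FROM cleaned_data ORDER BY order_id"
).fetchall()
```

Rows that already have a date pass through unchanged, and every other column is carried over explicitly.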
NEW QUESTION # 29
When manipulating data in Snowflake, what distinguishes aggregate functions from analytic functions?
- A. Analytic functions operate on individual rows within a partition
- B. Aggregate functions work on entire datasets
- C. Aggregate functions handle distinct value sets only
- D. Analytic functions return single calculated values
Answer: A
Explanation:
Analytic functions perform calculations on individual rows within a partition, while aggregate functions operate on entire datasets, making them distinct in their functionality.
NEW QUESTION # 30
You have identified corrupted data in a production table 'CUSTOMER_DATA'. Before attempting to clean the data directly in the production table, you want to create a safe environment to test your data cleaning scripts. You are also concerned about the impact of your data cleaning efforts on downstream reporting. Which of the following approaches using Snowflake clones is the MOST appropriate for this scenario?
- A. Create a zero-copy clone of 'CUSTOMER_DATA' named 'CUSTOMER_DATA_DEV' for testing. Clean the data in 'CUSTOMER_DATA_DEV'. Create a separate table named 'CLEANED_CUSTOMER_DATA'. Insert the cleaned data from 'CUSTOMER_DATA_DEV' into the new 'CLEANED_CUSTOMER_DATA' table. Update 'CUSTOMER_DATA' with the cleaning logic.
- B. Create a zero-copy clone of 'CUSTOMER_DATA' named 'CUSTOMER_DATA_DEV' for testing. Clean the data in 'CUSTOMER_DATA_DEV'. Create a zero-copy clone of 'CUSTOMER_DATA' named 'CUSTOMER_DATA_REPORTING'. Update 'CUSTOMER_DATA' with the cleaning logic. Point the downstream reporting to 'CUSTOMER_DATA_REPORTING'.
- C. Create a full copy of 'CUSTOMER_DATA' named 'CUSTOMER_DATA_DEV' for testing. Clean the data in 'CUSTOMER_DATA_DEV'. Use a 'MERGE' statement to update 'CUSTOMER_DATA' with the cleaned data from 'CUSTOMER_DATA_DEV'.
- D. Create a zero-copy clone of 'CUSTOMER_DATA' named 'CUSTOMER_DATA_DEV' for testing. Clean the data in 'CUSTOMER_DATA_DEV'. Once satisfied, update the 'CUSTOMER_DATA' table directly with the cleaning logic.
- E. Create a zero-copy clone of 'CUSTOMER_DATA' named 'CUSTOMER_DATA_DEV' for testing. Create another zero-copy clone of 'CUSTOMER_DATA_DEV' named 'CUSTOMER_DATA_REPORTING'. Clean the data in 'CUSTOMER_DATA_DEV'. Point downstream reporting to 'CUSTOMER_DATA_REPORTING'.
Answer: B
Explanation:
Option B is the most appropriate and safely covers all aspects. Cloning to 'CUSTOMER_DATA_DEV' lets you experiment with cleaning. The most important part of the question is handling the downstream reporting, so cloning 'CUSTOMER_DATA' to 'CUSTOMER_DATA_REPORTING' lets you test how your new updates will affect the reports that depend on the data. Updating 'CUSTOMER_DATA' with the cleaning logic lets you apply the tested data cleaning. The other options do not protect the production reporting from potentially breaking changes during the data cleaning process. They may also directly update the production data, increasing risk. In option E, even though you are pointing to the new cloned reporting table, that table is created from the DEV table, so it will already contain changed data, and we want to report on the original, not the one with the dev changes. Option D does not discuss downstream impact on the reports, so it does not fully address all the impacts.
NEW QUESTION # 31
You are tasked with ingesting data from a REST API that provides daily sales reports in JSON format. The API has rate limits (100 requests per minute) and returns a large dataset (approximately 5GB per day). The data needs to be processed within 2 hours of its availability. You want to leverage Snowflake external functions and tasks. Which approach balances efficiency, cost, and adherence to rate limits?
- A. Call API directly from a scheduled task without considering Rate Limit. Persist the data using COPY INTO command.
- B. Create a Snowflake task that triggers an external function which retrieves only the metadata (e.g., total records, page count) from the API. Then, create a dynamic number of child tasks, each responsible for retrieving a subset of the data, using the metadata to respect the API rate limits.
- C. Create a Snowflake external function that directly connects to the API and loads data into a staging table. Use Snowpipe with auto-ingest to continuously load the data as it arrives. Ignore the API Rate limits; assume they will be handled by API itself.
- D. Create a Snowflake task that calls an external function. This external function calls an intermediate service (e.g., AWS Lambda, Azure Function) which is responsible for fetching the data in batches, respecting the API rate limits, and storing the data in cloud storage. Snowpipe then loads the data from cloud storage into Snowflake.
- E. Create a single Snowflake task that calls an external function which iterates through all API pages sequentially in a single execution to retrieve and load all the data. Rely on Snowflake's automatic scaling to handle the load.
Answer: D
Explanation:
Option D is the most balanced approach. It leverages an intermediate service to handle the complexities of API interaction (rate limits, pagination), decouples the data retrieval from Snowflake compute, and uses Snowpipe for efficient bulk loading. This approach addresses both the rate limits and the processing time requirements effectively.
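How an intermediate service might stay within the rate limit can be sketched as a batching plan: split the pages into groups of at most 100 and issue one group per minute. This is a minimal Python sketch, not a complete fetcher; the function name `plan_batches` and the 250-page example are assumptions, and the HTTP calls and waiting between batches are left out.

```python
def plan_batches(total_pages, requests_per_minute=100):
    """Group page numbers into per-minute batches within the rate limit."""
    pages = list(range(1, total_pages + 1))
    return [
        pages[start:start + requests_per_minute]
        for start in range(0, total_pages, requests_per_minute)
    ]


# Hypothetical example: the API reports 250 pages for the day's report.
batches = plan_batches(250)

# The number of batches is also the minimum number of minutes needed,
# which can be checked against the two-hour processing window.
minutes_needed = len(batches)
```

A real service would fetch each batch, write the results to cloud storage, and wait for the next minute before starting the following batch.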
NEW QUESTION # 32
......
DAA-C01 Exam Questions – Valid DAA-C01 Dumps Pdf: https://certification-questions.pdfvce.com/Snowflake/DAA-C01-exam-pdf-dumps.html