Updated BDS-C00 Dumps Prep Materials To Help You Successfully Pass The AWS Certified Big Data – Specialty Exam

Updated BDS-C00 dumps preparation materials are available at passitdump.com – BDS-C00 dumps – to help you pass the important AWS Certified Big Data – Specialty BDS-C00 exam. The questions here have been answered by top experts with extensive experience in the actual BDS-C00 exam, so you can prepare with confidence.

Although these questions come from the BDS-C00 free dumps, their validity and accuracy are fully guaranteed.

Question 1:

A data engineer in a manufacturing company is designing a data processing platform that receives a large volume of unstructured data. The data engineer must populate a well-structured star schema in Amazon Redshift. What is the most efficient architecture strategy for this purpose?

A. Transform the unstructured data using Amazon EMR and generate CSV data. COPY the CSV data into the analysis schema within Redshift.

B. Load the unstructured data into Redshift, and use string parsing functions to extract structured data for inserting into the analysis schema.

C. When the data is saved to Amazon S3, use S3 Event Notifications and AWS Lambda to transform the file contents. Insert the data into the analysis schema on Redshift.

D. Normalize the data using an AWS Marketplace ETL tool, persist the results to Amazon S3, and use AWS Lambda to INSERT the data into Redshift.

Correct Answer: A
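Answer A's final step is a standard Redshift bulk load. As a minimal sketch, assuming a hypothetical bucket, cluster, schema, and IAM role, the COPY can be issued through the Redshift Data API:

```python
# Sketch of option A's final step: COPY the EMR-generated CSV output
# from Amazon S3 into a Redshift table via the Redshift Data API.
# Bucket, cluster, schema, and IAM role names below are hypothetical.
import boto3

redshift_data = boto3.client("redshift-data", region_name="us-east-1")

copy_sql = """
    COPY analysis.fact_events
    FROM 's3://example-bucket/emr-output/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
    CSV;
"""

response = redshift_data.execute_statement(
    ClusterIdentifier="example-cluster",  # hypothetical cluster name
    Database="analytics",
    DbUser="etl_user",
    Sql=copy_sql,
)
print(response["Id"])  # statement ID; poll describe_statement() for status
```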


Question 2:

A data engineer chooses Amazon DynamoDB as a data store for a regulated application. This application must be submitted to regulators for review. The data engineer needs to provide a control framework that lists the security controls, from the process for adding new users down to the physical controls of the data center, including items like security guards and cameras.

How should this control mapping be achieved using AWS?

A. Request AWS third-party audit reports and/or the AWS quality addendum and map the AWS responsibilities to the controls that must be provided.

B. Request temporary auditor access to an AWS data center to verify the control mapping.

C. Request relevant SLAs and security guidelines for Amazon DynamoDB and define these guidelines within the application's architecture to map to the control framework.

D. Request Amazon DynamoDB system architecture designs to determine how to map the AWS responsibilities to the control that must be provided.

Correct Answer: A


Question 3:

A large grocery distributor receives daily depletion reports from the field in the form of gzip archives of CSV files uploaded to Amazon S3. The files range from 500 MB to 5 GB. These files are processed daily by an EMR job. Recently it has been observed that the file sizes vary and the EMR jobs take too long. The distributor needs to tune and optimize the data processing workflow with this limited information to improve the performance of the EMR job. Which recommendation should an administrator provide?

A. Reduce the HDFS block size to increase the number of task processors.

B. Use bzip2 or Snappy rather than gzip for the archives.

C. Decompress the gzip archives and store the data as CSV files.

D. Use Avro rather than gzip for the archives.

Correct Answer: B
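The reasoning behind answer B is that gzip is not a splittable format, so a single 5 GB .gz file is handled by one mapper, while bzip2 archives can be split across many tasks. A minimal stdlib-only sketch of recompressing one archive (file names are hypothetical):

```python
# Recompress a gzip CSV archive as bzip2, which Hadoop can split
# across multiple tasks. File names below are hypothetical.
import bz2
import gzip
import shutil

with gzip.open("depletion-report.csv.gz", "rb") as src, \
        bz2.open("depletion-report.csv.bz2", "wb") as dst:
    shutil.copyfileobj(src, dst)  # stream-copy; avoids loading 5 GB into memory
```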


Question 4:

A web-hosting company is building a web analytics tool to capture clickstream data from all of the websites hosted within its platform and to provide near-real-time business intelligence. This entire system is built on AWS services. The web-hosting company is interested in using Amazon Kinesis to collect this data and perform sliding window analytics.

What is the most reliable and fault-tolerant technique to get each website to send data to Amazon Kinesis with every click?

A. After receiving a request, each web server sends it to Amazon Kinesis using the Amazon Kinesis PutRecord API. Use the sessionID as a partition key and set up a loop to retry until a success response is received.

B. After receiving a request, each web server sends it to Amazon Kinesis using the Amazon Kinesis Producer Library .addRecords method.

C. Each web server buffers the requests until the count reaches 500 and sends them to Amazon Kinesis using the Amazon Kinesis PutRecord API call.

D. After receiving a request, each web server sends it to Amazon Kinesis using the Amazon Kinesis PutRecord API. Use the exponential back-off algorithm for retries until a successful response is received.

Correct Answer: A
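A minimal sketch of answer A, with a hypothetical stream name: each click is sent with PutRecord, partitioned by sessionID, and retried in a loop until a success response is received:

```python
# Send each click to Kinesis with PutRecord, using the sessionID as the
# partition key and a simple retry loop, per option A. The stream name
# is hypothetical.
import json
import time
import boto3
from botocore.exceptions import ClientError

kinesis = boto3.client("kinesis", region_name="us-east-1")

def send_click(event: dict, session_id: str) -> None:
    while True:
        try:
            kinesis.put_record(
                StreamName="clickstream",        # hypothetical stream
                Data=json.dumps(event).encode(),
                PartitionKey=session_id,         # keeps a session on one shard
            )
            return  # success response received
        except ClientError:
            time.sleep(0.1)  # retry until Kinesis acknowledges the record
```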


Question 5:

A customer has an Amazon S3 bucket. Objects are uploaded simultaneously by a cluster of servers from multiple streams of data. The customer maintains a catalog of objects uploaded in Amazon S3 using an Amazon DynamoDB table. This catalog has the following fields: StreamName, TimeStamp, and ServerName, from which ObjectName can be obtained.

The customer needs to define the catalog to support querying for a given stream or server within a defined time range.

Which DynamoDB table schema is most efficient to support these queries?

A. Define a Primary Key with ServerName as Partition Key and TimeStamp as Sort Key. Do NOT define a Local Secondary Index or Global Secondary Index.

B. Define a Primary Key with StreamName as Partition Key and TimeStamp followed by ServerName as Sort Key. Define a Global Secondary Index with ServerName as Partition Key and TimeStamp followed by StreamName as Sort Key.

C. Define a Primary Key with ServerName as Partition Key. Define a Local Secondary Index with StreamName as Partition Key. Define a Global Secondary Index with TimeStamp as Partition Key.

D. Define a Primary Key with ServerName as Partition Key. Define a Local Secondary Index with TimeStamp as Partition Key. Define a Global Secondary Index with StreamName as Partition Key and TimeStamp as Sort Key.

Correct Answer: A
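For reference, the table described in answer A can be created as follows; the table name and capacity values are hypothetical:

```python
# Create the table from answer A: ServerName as the partition key and
# TimeStamp as the sort key, with no secondary indexes. The table name
# and throughput values are hypothetical.
import boto3

dynamodb = boto3.client("dynamodb", region_name="us-east-1")

dynamodb.create_table(
    TableName="ObjectCatalog",
    KeySchema=[
        {"AttributeName": "ServerName", "KeyType": "HASH"},  # partition key
        {"AttributeName": "TimeStamp", "KeyType": "RANGE"},  # sort key
    ],
    AttributeDefinitions=[
        {"AttributeName": "ServerName", "AttributeType": "S"},
        {"AttributeName": "TimeStamp", "AttributeType": "S"},
    ],
    ProvisionedThroughput={"ReadCapacityUnits": 10, "WriteCapacityUnits": 10},
)
```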


Question 6:

A media advertising company handles a large number of messages sourced from over 200 websites in real time. Processing latency must be kept low. Based on calculations, a 60-shard Amazon Kinesis stream is more than sufficient to handle the maximum data throughput, even with traffic spikes. The company also uses an Amazon Kinesis Client Library (KCL) application running on Amazon Elastic Compute Cloud (EC2) instances managed by an Auto Scaling group. Amazon CloudWatch indicates an average of 25% CPU and a modest level of network traffic across all running servers.

The company reports a 150% to 200% increase in latency of processing messages from Amazon Kinesis during peak times. There are NO reports of delay from the sites publishing to Amazon Kinesis.

What is the appropriate solution to address the latency?

A. Increase the number of shards in the Amazon Kinesis stream to 80 for greater concurrency.

B. Increase the size of the Amazon EC2 instances to increase network throughput.

C. Increase the minimum number of instances in the Auto Scaling group.

D. Increase Amazon DynamoDB throughput on the checkpoint table.

Correct Answer: D
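The KCL checkpoints shard progress into a DynamoDB lease table, named after the KCL application by default, and throttling on that table slows every consumer. A minimal sketch of answer D, with a hypothetical table name and capacity values:

```python
# Raise the provisioned throughput on the KCL checkpoint (lease) table.
# The table name and capacity values below are hypothetical.
import boto3

dynamodb = boto3.client("dynamodb", region_name="us-east-1")

dynamodb.update_table(
    TableName="kcl-consumer-app",  # default lease table name = KCL app name
    ProvisionedThroughput={"ReadCapacityUnits": 200, "WriteCapacityUnits": 200},
)
```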


Question 7:

A Redshift data warehouse has different user teams that need to query the same table with very different query types. These user teams are experiencing poor performance. Which action improves performance for the user teams in this situation?

A. Create custom table views.

B. Add interleaved sort keys per team.

C. Maintain team-specific copies of the table.

D. Add support for workload management queue hopping.

Correct Answer: D

Reference: https://docs.aws.amazon.com/redshift/latest/dg/cm-c-implementing-workload-management.html
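Queue hopping is configured through the wlm_json_configuration parameter, typically via a query monitoring rule whose action is "hop". A minimal sketch, assuming a hypothetical parameter group and rule thresholds:

```python
# Apply a WLM configuration with a query monitoring rule that hops
# long-running queries to the next matching queue. The parameter group
# name and the 120-second threshold are hypothetical.
import json
import boto3

redshift = boto3.client("redshift", region_name="us-east-1")

wlm_config = [
    {
        "query_concurrency": 5,
        "rules": [
            {
                "rule_name": "hop_long_queries",
                "predicate": [
                    {"metric_name": "query_execution_time",
                     "operator": ">", "value": 120}
                ],
                "action": "hop",
            }
        ],
    },
    {"query_concurrency": 5},  # queue that hopped queries can land in
]

redshift.modify_cluster_parameter_group(
    ParameterGroupName="analytics-wlm",  # hypothetical parameter group
    Parameters=[
        {"ParameterName": "wlm_json_configuration",
         "ParameterValue": json.dumps(wlm_config)},
    ],
)
```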


Question 8:

A company operates an international business served from a single AWS region. The company wants to expand into a new country. The regulator for that country requires the Data Architect to maintain a log of financial transactions in the country within 24 hours of the product transaction. The production application is latency-insensitive. The new country contains another AWS region.

What is the most cost-effective way to meet this requirement?

A. Use CloudFormation to replicate the production application to the new region.

B. Use Amazon CloudFront to serve application content locally in the country; Amazon CloudFront logs will satisfy the requirement.

C. Continue to serve customers from the existing region while using Amazon Kinesis to stream transaction data to the regulator.

D. Use Amazon S3 cross-region replication to copy and persist production transaction logs to a bucket in the new country's region.

Correct Answer: B


Question 9:

An administrator needs to design the event log storage architecture for events from mobile devices. The event data will be processed by an Amazon EMR cluster daily for aggregated reporting and analytics before being archived. How should the administrator recommend storing the log data?

A. Create an Amazon S3 bucket and write log data into folders by device. Execute the EMR job on the device folders.

B. Create an Amazon DynamoDB table partitioned on device and sorted on date, and write log data to the table. Execute the EMR job on the Amazon DynamoDB table.

C. Create an Amazon S3 bucket and write data into folders by day. Execute the EMR job on the daily folder.

D. Create an Amazon DynamoDB table partitioned on EventID, and write log data to the table. Execute the EMR job on the table.

Correct Answer: A
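As an illustration of the folder-based pattern behind answers A and C, events can be written under a per-day S3 prefix and the daily EMR job submitted against only that prefix. The bucket, cluster ID, and job script are hypothetical:

```python
# Point a daily EMR step at a per-day S3 prefix. The bucket, cluster ID,
# and Spark script location below are hypothetical.
import datetime
import boto3

emr = boto3.client("emr", region_name="us-east-1")

day = datetime.date.today().isoformat()            # e.g. "2024-01-15"
daily_prefix = f"s3://example-logs/events/{day}/"  # folder-per-day layout

emr.add_job_flow_steps(
    JobFlowId="j-EXAMPLE12345",  # hypothetical running cluster
    Steps=[{
        "Name": f"daily-aggregation-{day}",
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "s3://example-code/aggregate.py",
                     "--input", daily_prefix],
        },
    }],
)
```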


Question 10:

A data engineer wants to use Amazon Elastic MapReduce (Amazon EMR) for an application. The data engineer needs to make sure it complies with regulatory requirements. The auditor must be able to confirm at any point which servers are running and which network access controls are deployed.

Which action should the data engineer take to meet this requirement?

A. Provide the auditor with IAM accounts that have the SecurityAudit policy attached to their group.

B. Provide the auditor with SSH keys for access to the Amazon EMR cluster.

C. Provide the auditor with CloudFormation templates.

D. Provide the auditor with access to AWS Direct Connect to use their existing tools.

Correct Answer: C
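The idea behind answer C is that a CloudFormation stack documents both the intended infrastructure, including security groups (the network access controls), and, through its deployed resources, what is actually running. A minimal sketch with a hypothetical stack name:

```python
# Retrieve the stack template (intended infrastructure) and the deployed
# resources (what is running now). The stack name is hypothetical.
import boto3

cloudformation = boto3.client("cloudformation", region_name="us-east-1")

# The template documents the design, including security group rules.
template = cloudformation.get_template(StackName="emr-analytics-stack")

# The resource list shows the servers and controls currently deployed.
resources = cloudformation.describe_stack_resources(
    StackName="emr-analytics-stack")
for r in resources["StackResources"]:
    print(r["ResourceType"], r["PhysicalResourceId"], r["ResourceStatus"])
```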


Question 11:

A social media customer has data from different data sources including RDS running MySQL, Redshift, and Hive on EMR. To support better analysis, the customer needs to be able to analyze data from different data sources and to combine the results.

What is the most cost-effective solution to meet these requirements?

A. Load all data from the different databases/warehouses to S3. Use the Redshift COPY command to copy data to Redshift for analysis.

B. Install Presto on the EMR cluster where Hive sits. Configure the MySQL and PostgreSQL connectors to select from different data sources in a single query.

C. Spin up an Elasticsearch cluster. Load data from all three data sources and use Kibana to analyze.

D. Write a program running on a separate EC2 instance to run queries to three different systems. Aggregate the results after getting the responses from all three systems.

Correct Answer: B
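A minimal sketch of answer B, assuming Presto is running on the EMR master node with Hive and MySQL catalogs already configured, and using the third-party pyhive package; the host, catalogs, schemas, and tables are hypothetical:

```python
# One Presto query joins Hive (EMR) data with MySQL (RDS) data across
# catalogs. Requires the third-party pyhive package
# (pip install 'pyhive[presto]'); all names below are hypothetical.
from pyhive import presto

conn = presto.connect(host="emr-master.example.internal", port=8889)
cursor = conn.cursor()

cursor.execute("""
    SELECT u.user_id, COUNT(e.event_id) AS events
    FROM hive.web.events e
    JOIN mysql.app.users u ON e.user_id = u.user_id
    GROUP BY u.user_id
""")
print(cursor.fetchall())
```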


Question 12:

An Amazon EMR cluster using EMRFS has access to petabytes of data on Amazon S3, originating from multiple unique data sources. The customer needs to query common fields across some of the data sets to be able to perform interactive joins and then display results quickly.

Which technology is most appropriate to enable this capability?

A. Presto

B. MicroStrategy

C. Pig

D. R Studio

Correct Answer: C


Question 13:

A game company needs to properly scale its game application, which is backed by DynamoDB. Amazon Redshift has the past two years of historical data. Game traffic varies throughout the year based on factors such as seasonality, movie releases, and holidays. An administrator needs to calculate how much read and write throughput should be provisioned for the DynamoDB table for each week in advance.

How should the administrator accomplish this task?

A. Feed the data into Amazon Machine Learning and build a regression model.

B. Feed the data into Spark MLlib and build a random forest model.

C. Feed the data into Apache Mahout and build a multi-classification model.

D. Feed the data into Amazon Machine Learning and build a binary classification model.

Correct Answer: B
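A minimal sketch of answer B, assuming the Redshift history has been exported to a CSV with hypothetical feature and label columns: train a random forest regression model with Spark MLlib (pyspark.ml):

```python
# Train a random forest regressor on historical traffic features to
# forecast weekly throughput. The S3 path and column names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.regression import RandomForestRegressor

spark = SparkSession.builder.appName("throughput-forecast").getOrCreate()

df = spark.read.csv("s3://example-bucket/history.csv", header=True,
                    inferSchema=True)

# Assemble hypothetical seasonality features into a single feature vector.
assembler = VectorAssembler(
    inputCols=["week_of_year", "is_holiday", "movie_release"],
    outputCol="features")
train = assembler.transform(df)

model = RandomForestRegressor(featuresCol="features",
                              labelCol="read_throughput").fit(train)
predictions = model.transform(train)  # per-week throughput to provision
```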


Question 14:

A company is using Amazon Machine Learning as part of a medical software application. The application will predict the most likely blood type for a patient based on a variety of other clinical tests that are available when blood type knowledge is unavailable.

What is the appropriate model choice and target attribute combination for this problem?

A. Multi-class classification model with a categorical target attribute.

B. Regression model with a numeric target attribute.

C. Binary Classification with a categorical target attribute.

D. K-Nearest Neighbors model with a multi-class target attribute.

Correct Answer: A
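For reference, a MULTICLASS model in the (now-legacy) Amazon Machine Learning API maps to a categorical target attribute such as blood type. A minimal sketch with hypothetical IDs:

```python
# Create a multi-class classification model whose categorical target is
# the blood type. The model and data source IDs below are hypothetical.
import boto3

ml = boto3.client("machinelearning", region_name="us-east-1")

ml.create_ml_model(
    MLModelId="blood-type-model",
    MLModelName="Blood type predictor",
    MLModelType="MULTICLASS",  # categorical target attribute
    TrainingDataSourceId="clinical-tests-datasource",
)
```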


Question 15:

An Amazon Kinesis stream needs to be encrypted. Which approach should be used to accomplish this task?

A. Perform a client-side encryption of the data before it enters the Amazon Kinesis stream on the producer.

B. Use a partition key to segment the data by MD5 hash functions, which makes it undecipherable while in transit.

C. Perform a client-side encryption of the data before it enters the Amazon Kinesis stream on the consumer.

D. Use a shard to segment the data, which has built-in functionality to make it indecipherable while in transit.

Correct Answer: B

Reference: https://docs.aws.amazon.com/firehose/latest/dev/encryption.html
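Option A describes client-side encryption on the producer. As an illustration only, a minimal sketch using the third-party cryptography package, with hypothetical key handling and stream name:

```python
# Encrypt the payload on the producer before it enters the stream, so
# Kinesis only ever stores ciphertext. Requires the third-party
# cryptography package; the key handling and stream name are hypothetical.
import boto3
from cryptography.fernet import Fernet

key = Fernet.generate_key()  # in practice, manage this key in AWS KMS
fernet = Fernet(key)

kinesis = boto3.client("kinesis", region_name="us-east-1")

ciphertext = fernet.encrypt(b'{"event": "example"}')  # encrypt before sending
kinesis.put_record(
    StreamName="example-stream",
    Data=ciphertext,  # Kinesis stores only the encrypted payload
    PartitionKey="pk-1",
)
```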