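IBM Certified watsonx Data Scientist – Associate: C1000-177
Certification Overview
An Associate Data Scientist has foundational data science skills and knowledge and can use IBM watsonx.ai to solve business problems with machine learning solutions. This includes the ability to connect machine learning solutions to enterprise needs and to recognize when to apply an enterprise AI workflow.
Concepts covered by this associate-level exam include:
Problem scoping and tool selection
Exploratory data analysis
Feature engineering
Model training and selection
Model evaluation
Recommended Skills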
Python
R
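Descriptive statistics
Predictive analytics
Requirements
Exam C1000-177: Foundations of Data Science using IBM watsonx
Exam Objectives
During exam development, subject matter experts (SMEs) define all of the tasks, knowledge, and experience an individual needs to successfully perform a product or solution role. These are represented by the objectives below, and the exam questions are based on these objectives.
Number of questions: 61
Number of questions to pass: 43
Time allowed: 90 minutes
Status: Live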
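Section 1: Evaluate the business problem (16%)
Section 2: Perform exploratory data analysis (21%)
Section 3: Development tools and techniques (13%)
Section 4: Preprocessing and feature engineering (33%)
Section 5: Model selection, training, evaluation, and presentation (17%)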
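The exam figures above can be turned into a rough study plan. The sketch below computes the passing threshold as a percentage and an approximate number of questions per section. It assumes questions are allocated in proportion to the section weights, which the exam description does not state, so the per-section counts are estimates only. The variable names are illustrative.

```python
# Figures taken from the exam description above.
total_questions = 61
questions_to_pass = 43

# Section weights in percent, keyed by section number.
section_weights = {1: 16, 2: 21, 3: 13, 4: 33, 5: 17}

# Fraction of questions that must be answered correctly to pass.
pass_fraction = questions_to_pass / total_questions

# ASSUMPTION: questions are distributed proportionally to the weights.
# The real exam may allocate them differently.
approx_questions = {
    section: total_questions * weight / 100
    for section, weight in section_weights.items()
}

print(f"Passing threshold: {pass_fraction:.1%}")
for section, count in approx_questions.items():
    print(f"Section {section}: about {count:.1f} questions")
```

Under that assumption, Section 4 (preprocessing and feature engineering) carries roughly twice the weight of Section 1 and is the largest single block of questions.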
Sample Test
Exam: C1000-177 Foundations of Data Science using IBM watsonx
1. Which statement describes sample variance and standard deviation?
A. Variance is the square root of the average squared deviations from the mean, while
standard deviation is the sum of these squared deviations.
B. Variance is the average distance of each data point from the mean, while standard
deviation is the total distance of all points from the mean divided by the number of
points.
C. Variance is the average of all squared differences from the mean, while standard
deviation is the square root of the variance, showing how spread out the data is from
the mean.
D. Variance measures the total variability within a dataset without normalization, while
standard deviation is calculated by dividing the variance by the number of data points
minus one.
2. Which statement describes covariance?
A. A measure of model performance.
B. A measure of association between two variables.
C. A measure of the distribution of values within a variable.
D. A measure of overall predictive power of a set of variables.
3. Which statement is true about a categorical feature that has 998 unique values in a
dataset of 1000 records?
A. Unless some categories could be split, this feature should be excluded from the
model since some of its values are not unique.
B. This feature could be included into the model without any transformations since it
has enough categories relative to the total number of records.
C. Unless some categories could be grouped, this feature should be excluded from the
model since it has too many categories relative to the total number of records.
D. This feature could be included into the model since the number of rows with
non-unique values is small, which means they could be safely deleted during
preprocessing.
4. A customer needs a model to identify fraud by flagging transactions with either a 0
or 1. Which type of model should they choose?
A. k-means
B. Linear Regression
C. Logistic Regression
D. Multi-class Decision Tree
5. What is the repository of statistic libraries called that is used by the R programming
language?
A. scikit-learn
B. R Package Manager (RPM)
C. MLlib (Machine Learning library)
D. Comprehensive R Archive Network (CRAN)
6. When is z-score normalization most useful?
A. When the data falls between 0 and 1.
B. When the data contains many outliers.
C. When the original range of the data needs to be retained.
D. When data is normally distributed.
7. The number of television viewers during a popular sporting event can far exceed
the number of viewers at other times of the year. Does this represent an intentional
or unintentional anomaly?
A. Intentional, because it is expected, due to a real-world event.
B. Intentional, because it represents the broadcaster’s desired result.
C. Unintentional, because there is no process designed to create the anomaly.
D. Unintentional, because it arises from noise in the data collection process.
8. What does a stratified train test split ensure?
A. The split is random without any additional conditions.
B. The positive to negative cases ratio is the same in train and test sets.
C. The train and test sets have the same absolute number of positive cases.
D. The positive to negative cases ratio in the train set is as close to 1 as possible.
9. How should k-fold cross validation be performed when data is split into test and
train sets?
A. Split the train set on k folds. Train the model on each fold and validate on test data.
B. Split the train set on k folds. For each fold perform training using other k-1 folds and
validate on this fold.
C. Split the entire dataset on k folds. For each fold train the model on this fold and
validate on the rest k-1 folds.
D. Split the entire dataset on k folds. For each fold perform training using other k-1
folds and validate on the entire dataset.
10. If the goal of the model is to predict whether a user will churn, which feature will
cause data leakage if included in the training set?
A. Subscribe date
B. Unsubscribe date
C. Last payment date
D. Last suspension date
11. What is the standard approach in evaluating the performance of classification
models using supervised machine learning?
A. Regularization
B. Cross-validation
C. Confusion matrix
D. Principal component analysis (PCA)
Answer Key
1. C
2. B
3. C
4. C
5. D
6. D
7. A
8. B
9. B
10. B
11. C
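The statistics behind questions 1, 2, and 6 can be checked numerically. The following is a minimal sketch using only the Python standard library. The data values are hypothetical and chosen so the arithmetic is easy to follow. Note that the plain average of squared differences is the population variance, while the sample variance divides by n - 1 instead of n.

```python
import math
import statistics

# Hypothetical data, for illustration only.
x = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
y = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 8.0, 9.0]
n = len(x)

mean_x = statistics.fmean(x)
mean_y = statistics.fmean(y)

# Sum of squared differences from the mean.
squared_diffs = sum((v - mean_x) ** 2 for v in x)

# Population variance: average of the squared differences.
pop_var = squared_diffs / n
# Sample variance: same sum, divided by n - 1.
sample_var = squared_diffs / (n - 1)

# Standard deviation is the square root of the variance.
pop_std = math.sqrt(pop_var)
sample_std = math.sqrt(sample_var)

# Sample covariance: a measure of association between two variables.
# A positive value means x and y tend to rise together.
cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

# z-score normalization: subtract the mean, divide by the standard
# deviation. The result has mean 0 and standard deviation 1.
z_scores = [(v - mean_x) / pop_std for v in x]

print(pop_var, pop_std)        # 4.0 2.0
print(round(sample_var, 3))    # 4.571
print(round(cov_xy, 3))        # 5.571
```

Because z-scores are built from the mean and standard deviation, a few extreme values can shift both and distort the scaled result, which is one reason the technique suits roughly normal data better than data with many outliers.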