Endoscopic ultrasound routinely guides lymph node evaluation in the staging of known or suspected lung cancer. Characteristics seen on B-mode imaging may help the operator decide which lymph nodes are at risk of malignancy. How nodal size influences the predictive value of these characteristics, and how consistently operators can combine them to predict malignancy risk, remain to be determined. We evaluated (1) whether prospectively scored individual B-mode ultrasound features predict malignancy when lymph nodes are stratified by size and (2) whether observers could reproducibly agree on malignancy risk from still lymph node images. Lymph nodes visualized by endobronchial ultrasound (EBUS) were prospectively scored for B-mode characteristics, and still B-mode images were collected. After collection, a subset of lymph nodes was retrospectively rescored (n = 11 observers). Analysis of 490 lymph nodes showed that short-axis size is an objective measure for stratifying the risk of malignancy (area under the ROC curve 0.78). At a size cutoff of ≥8 mm, 210 of 237 malignant lymph nodes were correctly identified (89% sensitivity, 46% specificity, 61% PPV, and 81% NPV). Adding B-mode features as a secondary criterion in <8-mm nodes had limited value. Retrospective analysis of intra- and interobserver scoring revealed significant disagreement. Lymph nodes of ≥8 mm, and preferably even smaller ones, should be aspirated regardless of other B-mode features. Observer disagreement in scoring both small and large lymph nodes suggests that including subjective features for stratification is infeasible. Future research should focus on (integrating) other (semi)quantitative values to improve prediction.
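For readers who want to check how the four accuracy figures at the ≥8-mm cutoff relate to one another, the arithmetic can be sketched as below. Only the 210 true positives, the 237 malignant nodes, and the total of 490 nodes come from the abstract. The split of the 253 benign nodes into true negatives and false positives is an assumption, back-calculated from the reported 46% specificity, and the function name `diagnostic_metrics` is hypothetical.

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Return sensitivity, specificity, PPV and NPV as fractions of a 2x2 table."""
    return {
        "sensitivity": tp / (tp + fn),  # malignant nodes flagged by the cutoff
        "specificity": tn / (tn + fp),  # benign nodes below the cutoff
        "ppv": tp / (tp + fp),          # flagged nodes that are malignant
        "npv": tn / (tn + fn),          # unflagged nodes that are benign
    }


# Reported in the abstract: 210 of 237 malignant nodes were >= 8 mm.
TP = 210
FN = 237 - TP

# ASSUMPTION: 490 - 237 = 253 benign nodes; the split below is back-calculated
# from the reported 46% specificity and is not stated in the abstract.
TN = 116
FP = (490 - 237) - TN

metrics = diagnostic_metrics(TP, FN, FP, TN)
for name, value in metrics.items():
    print(f"{name}: {value:.0%}")
```

Under this assumed split, the rounded values match the reported 89% sensitivity, 46% specificity, 61% PPV, and 81% NPV. A neighbouring split (for example 117 true negatives) rounds to the same figures, so the exact benign counts cannot be recovered from the abstract alone.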