[Posted]: 2020-09-22 07:59:34
[Problem description]:
I am working on a PySpark project; my project directory structure is shown below.
project_dir/
    src/
        etl/
            __init__.py
            etl_1.py
            spark.py
        config/
            __init__.py
        utils/
            __init__.py
    test/
        test_etl_1.py
    setup.py
    README.md
    requirements.txt
When I run the unit test below, I get:

python test_etl_1.py
Traceback (most recent call last):
  File "test_etl_1.py", line 1, in <module>
    from src.etl.spark import get_spark
ImportError: No module named src.etl.spark
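The error occurs because Python resolves `src.etl.spark` relative to the entries on `sys.path`, and when the test is launched from inside test/, project_dir/ is not on that path. One common workaround (a sketch, not from the original post; the path `/tmp/project_dir/...` is hypothetical) is to prepend the project root to `sys.path` at the top of the test file, before the `from src.etl.spark import ...` line. The path logic looks like this:

```python
import os
import sys

def add_project_root(test_file_path):
    """Insert the directory two levels above the given test file into
    sys.path, so packages under the project root become importable."""
    # e.g. .../project_dir/test/test_etl_1.py -> .../project_dir
    root = os.path.dirname(os.path.dirname(os.path.abspath(test_file_path)))
    if root not in sys.path:
        sys.path.insert(0, root)
    return root

# In a real test file you would pass __file__ instead of a literal path.
root = add_project_root("/tmp/project_dir/test/test_etl_1.py")
print(root)  # → /tmp/project_dir
```

In the actual test file this collapses to two lines using `__file__`; note that on Python 2 (suggested by the "No module named" wording) src/ also needs its own `__init__.py` for `src` to be a package.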
Here is my unit test file:

from src.etl.spark import get_spark
from src.etl.addcol import with_status


class TestAppendCol(object):

    def test_with_status(self):
        source_data = [
            ("p", "w", "pw@sample.com"),
            ("j", "b", "jb@sample.com")
        ]
        source_df = get_spark().createDataFrame(
            source_data,
            ["first_name", "last_name", "email"]
        )
        actual_df = with_status(source_df)
        expected_data = [
            ("p", "w", "pw@sample.com", "added"),
            ("j", "b", "jb@sample.com", "added")
        ]
        expected_df = get_spark().createDataFrame(
            expected_data,
            ["first_name", "last_name", "email", "status"]
        )
        assert(expected_df.collect() == actual_df.collect())
I need to run this file with pytest, but it fails with the module error above. Can you help me fix this error?
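Since the project already has a setup.py, a cleaner fix than path hacking is to make `src` a real package (add a `src/__init__.py`) and install the project in editable mode, so `src.etl.spark` is importable no matter where pytest is launched from. A hypothetical minimal setup.py for this layout might look like:

```python
# setup.py — minimal packaging sketch; assumes src/ gains an __init__.py
# so that find_packages() discovers `src` and its subpackages.
from setuptools import setup, find_packages

setup(
    name="project_dir",          # placeholder name, not from the post
    version="0.1.0",
    packages=find_packages(include=["src", "src.*"]),
)
```

Then, from project_dir/, run `pip install -e .` once and invoke the tests as `python -m pytest test/`; running pytest via `python -m` also prepends the current directory to `sys.path`, which by itself is often enough when started from the project root.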
[Comments]:
-
Does this answer your question? Using pytest with a src layer
Tags: python apache-spark pyspark pytest