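[Posted]: 2020-06-12 23:51:26
[Question]:
Here is some data that I will use to demonstrate my problem. This question spun off from an older question I found here.
OK, here goes. I am implementing this code:
1) Load the data and establish the ranges of valid and invalid data.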
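import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as dt  # assumed alias: dt.date2num is used in step 2

df = pd.read_excel('Downloads/output.xlsx', index_col='date')

# Collect (gauge, start, end) for every unbroken run of non-null values in each column
good_ranges = []
for i in df:
    col = df[i]
    gauge_name = col.name
    # A run starts where a value is present but the previous row is null
    start_mark = (col.notnull() & col.shift().isnull())
    start = col[start_mark].index
    # A run ends where a value is present but the next row is null
    end_mark = (col.notnull() & col.shift(-1).isnull())
    end = col[end_mark].index
    for s, e in zip(start, end):
        good_ranges.append((gauge_name, s, e))
good_ranges = pd.DataFrame(good_ranges, columns=['gauge', 'start', 'end'])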
good_ranges yields:
gauge start end
0 GALISTEO CREEK BELOW GALISTEO DAM, NM 2019-02-06 2019-08-27
1 GALISTEO CREEK BELOW GALISTEO DAM, NM 2019-08-30 2019-10-01
2 GALISTEO CREEK BELOW GALISTEO DAM, NM 2019-10-09 2019-10-19
3 GALISTEO CREEK BELOW GALISTEO DAM, NM 2019-10-22 2019-10-22
4 GALISTEO CREEK BELOW GALISTEO DAM, NM 2019-10-25 2019-10-25
5 GALISTEO CREEK BELOW GALISTEO DAM, NM 2019-10-27 2019-10-31
6 GALISTEO CREEK BELOW GALISTEO DAM, NM 2019-11-05 2019-11-29
7 GALISTEO CREEK BELOW GALISTEO DAM, NM 2019-12-01 2019-12-02
8 GALISTEO CREEK BELOW GALISTEO DAM, NM 2019-12-04 2019-12-29
9 GALISTEO CREEK BELOW GALISTEO DAM, NM 2020-01-01 2020-01-02
10 GALISTEO CREEK BELOW GALISTEO DAM, NM 2020-01-04 2020-01-17
11 GALISTEO CREEK BELOW GALISTEO DAM, NM 2020-01-19 2020-02-04
12 RIO GRANDE AT OTOWI BRIDGE, NM 2019-02-06 2020-02-04
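To make the start/end masks in step 1 easier to follow, here is a minimal sketch on a toy series. The series name, dates, and values are made up for illustration and are not taken from output.xlsx.

```python
import numpy as np
import pandas as pd

# Hypothetical daily series: values on days 1-2 and day 4, gaps on days 3 and 5.
col = pd.Series(
    [1.0, 2.0, np.nan, 3.0, np.nan],
    index=pd.date_range("2019-01-01", periods=5, freq="D"),
    name="TOY GAUGE",
)

# A run starts where a value is present but the previous row is missing.
# shift() places NaN before the first row, so a leading value counts as a start.
start_mark = col.notnull() & col.shift().isnull()

# A run ends where a value is present but the next row is missing.
end_mark = col.notnull() & col.shift(-1).isnull()

# Pair each start with its end, as the loop in step 1 does.
runs = list(zip(col[start_mark].index, col[end_mark].index))
print(runs)
```

A single isolated day shows up as a run whose start and end are the same date, which is why rows such as 2019-10-22 to 2019-10-22 appear in good_ranges.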
2) Plot the data wherever valid data exists.
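fig, ax = plt.subplots(figsize=(14,8))
ax.xaxis_date()  # not reassigned: xaxis_date() returns None
# Draw one horizontal line per valid range; `ax` stays bound to the Axes
lines = ax.hlines(good_ranges['gauge'],
                  dt.date2num(good_ranges['start']),
                  dt.date2num(good_ranges['end']))
fig.tight_layout()
plt.show()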
3) Count the days of valid daily data, and flag the sites with >350 days of valid data.
# Length of each valid range in days
good_ranges['Days'] = good_ranges['end'] - good_ranges['start']
good_ranges['Days'] = good_ranges['Days'].dt.days
# Total valid days per gauge
df_days = good_ranges.filter(['gauge','Days'], axis=1)
df_new = df_days.groupby(df_days['gauge']).sum()
# Day_Con is 'NO' for gauges with more than 350 days, 'YES' otherwise
df_new['Day_Con'] = 'YES'
df_new.loc[df_new['Days'] > 350,'Day_Con'] = 'NO'
#### Sort the gauge so that the list will line up with the list of ylabels
df_new = df_new.sort_values("gauge",ascending=False)
print(df_new)
# Positional indices (row positions in df_new) of each group
yes_idxs = list(np.where(df_new["Day_Con"] == "YES")[0])
print(yes_idxs)
no_idxs = list(np.where(df_new["Day_Con"] == "NO")[0])
print(no_idxs)
This is what df_new contains after flagging the gauge sites with >350 days of data:
gauge Days Day_Con
TESUQUE CREEK ABOVE DIVERSIONS NEAR SANTA FE, NM 310 YES
SANTA FE RIVER NEAR SANTA FE, NM 336 YES
SANTA FE RIVER ABOVE MCCLURE RES, NR SANTA FE, NM 344 YES
SANTA FE RIVER ABOVE COCHITI LAKE, NM 363 NO
SANTA CRUZ RIVER NEAR CUNDIYO, NM 304 YES
RIO TESUQUE BELOW DIVERSIONS NEAR SANTA FE, NM 361 NO
RIO NAMBE BELOW NAMBE FALLS DAM NEAR NAMBE, NM 363 NO
RIO NAMBE ABOVE NAMBE FALLS DAM NEAR NAMBE, NM 267 YES
RIO GRANDE AT OTOWI BRIDGE, NM 363 NO
GALISTEO CREEK BELOW GALISTEO DAM, NM 328 YES
yes_idxs yields:
[0, 1, 2, 4, 7, 9]
no_idxs yields:
[3, 5, 6, 8]
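A note on what these lists contain: np.where returns row positions in the current sort order of df_new, not gauge names. The small sketch below, using a made-up three-row series, shows the difference and how the names could be pulled out instead.

```python
import numpy as np
import pandas as pd

# Hypothetical flags, indexed by made-up gauge names in descending order.
flags = pd.Series(["YES", "NO", "YES"], index=["C", "B", "A"], name="Day_Con")

# Row positions: these change if the series is re-sorted.
yes_pos = np.where(flags == "YES")[0].tolist()
print(yes_pos)    # [0, 2]

# Index labels: these stay tied to the gauge regardless of row order.
yes_names = flags.index[flags == "YES"].tolist()
print(yes_names)  # ['C', 'A']
```

Positions only line up with the tick labels if the plot happens to list the gauges in the same order, so matching by name may be the safer route.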
I thought I could use yes_idxs and no_idxs to set the ylabel colors at the correct label positions. However, the line below only works when it is given a single integer index:
ax.get_yticklabels()[1].set_color("red")
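Essentially, I want to highlight the horizontal lines and/or the ylabel text in red wherever the value does not meet the >350 days criterion. So far I can't seem to find a simple way to do this.
Thanks in advance for your help!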
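One possible direction, sketched under assumptions rather than offered as a confirmed solution: pass a list of colors to hlines (one per row of good_ranges) and loop over the tick labels, matching each label by its gauge name instead of by position. The gauge names and dates below are toy stand-ins, short_gauges, line_colors, and label_colors are hypothetical helper names, and the Agg backend is selected only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; drop it for interactive use
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Toy stand-in for good_ranges (hypothetical gauge names).
good_ranges = pd.DataFrame({
    "gauge": ["GAUGE A", "GAUGE A", "GAUGE B"],
    "start": pd.to_datetime(["2019-02-06", "2019-08-30", "2019-02-06"]),
    "end": pd.to_datetime(["2019-08-27", "2019-10-01", "2020-02-04"]),
})

# Total valid days per gauge, then the gauges that do not exceed 350 days.
days = (good_ranges["end"] - good_ranges["start"]).dt.days
total_days = days.groupby(good_ranges["gauge"]).sum()
short_gauges = set(total_days[total_days <= 350].index)

# One color per row, so every segment belonging to a short gauge is red.
line_colors = ["red" if g in short_gauges else "black"
               for g in good_ranges["gauge"]]

fig, ax = plt.subplots(figsize=(14, 8))
ax.hlines(good_ranges["gauge"],
          mdates.date2num(good_ranges["start"]),
          mdates.date2num(good_ranges["end"]),
          colors=line_colors)
ax.xaxis_date()

# Draw once so the tick labels carry their text, then color them by name.
fig.canvas.draw()
label_colors = {}
for label in ax.get_yticklabels():
    color = "red" if label.get_text() in short_gauges else "black"
    label.set_color(color)
    label_colors[label.get_text()] = color

fig.tight_layout()
```

Matching on label.get_text() avoids having to keep the sort order of df_new in step with the order in which the gauges appear on the y-axis.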
[Comments]:
Tags: python python-3.x pandas matplotlib plot