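For feature analysis and feature extraction, features are mined from two perspectives and targets are distinguished along two dimensions, which yields four categories of features.
First, from the station perspective, the daily passenger flow is broadly the same as on the previous date of the same type, so the question is what causes the fluctuations. Second, from the user perspective, the travel trajectories of most users should be repetitive, which is what passenger flow stability captures. Of the two dimensions, the temporal one distinguishes the effects of holidays and peak periods on passenger flow, and the spatial one distinguishes the effects of different geographic locations and functional attributes. From these two perspectives and two dimensions, four categories of features are derived: basic features, time features, historical time features, and geographic information features.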
To construct features for passenger flow prediction within a machine learning framework, the variation patterns and influencing factors of passenger flow must be analyzed on the basis of a thorough understanding of the data. Features with physical meaning are then constructed from the original, available data so that the model inputs contribute to the prediction of subway passenger flow.
Feature analysis overview
Generally speaking, there is no established theory or standard for feature construction, so a deep understanding of the problem is required. The purpose of feature construction is to build features that are related to the prediction target. Because the constructed features form the candidate set for subsequent feature selection, this step should build many features and include as many factors that affect the prediction result as possible, leaving room for the later selection.
(1) Basic features: the conventional features used for passenger flow forecasting in non-special cases. They are built from the features commonly used for passenger flow prediction in current studies, with the relevant passenger flow data selected on the basis of experience and the relevant qualitative attributes quantified.
(2) Time features: the commuting characteristics of regular passenger flow. They include the date type and the peak period. Date types are weekdays, weekends, and holidays; peak periods are off-peak, peak, and special peak.
(3) Historical passenger flow features: the continuous correlation of passenger flow changes. Passenger flow changes dynamically and continuously over time, and its state at any moment evolves from its state at the previous moment.
(4) Geographic information features: the factors related to geographic information that may affect passenger flow.
Basic feature
The basic information obtained during data preprocessing includes stationID, day, hour, minute, week, timeCut, inNums, and outNums. Here inNums is the number of passengers entering the station during a time period, and outNums is the number of passengers exiting during that period.
(1) Basic feature
The basic information takes several forms. The station, week, day, hour, and minute are used as the basic time information features.
(2) Time period
The variable t is the index of the time period obtained by dividing a day into 10-minute intervals. The prediction target of this paper is the passenger flow in each 10-minute interval. A 10-minute interval is fine enough to reflect changes over time while still giving station operations management time to respond, so it is taken as the granularity of passenger flow prediction. A day is therefore divided into 144 time periods.
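As a minimal sketch of the time period index described above, the following function maps an hour and minute to a slot index t. The 0-based indexing and the names `time_slot`, `SLOT_MINUTES`, and `SLOTS_PER_DAY` are illustrative assumptions, not taken from the original data pipeline.

```python
SLOT_MINUTES = 10                          # prediction granularity from the text
SLOTS_PER_DAY = 24 * 60 // SLOT_MINUTES    # 144 time periods per day


def time_slot(hour: int, minute: int) -> int:
    """Return the 0-based index t of the 10-minute period containing hour:minute."""
    if not (0 <= hour < 24 and 0 <= minute < 60):
        raise ValueError("hour must be in 0..23 and minute in 0..59")
    return (hour * 60 + minute) // SLOT_MINUTES
```

With this convention, 00:00 to 00:09 falls in slot 0 and 23:50 to 23:59 falls in slot 143.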
(3) Weeks
W(t) denotes the day of the week.
Time feature
The passenger flow of urban rail transit is also affected by common time factors. It is therefore necessary to analyze how time factors influence regular passenger flow and to build time features on that basis. The time features are constructed as follows.
(1) Type of date
Dates can be divided into three types: weekdays, weekends, and holidays (Bai et al., 2017). The figure below shows a large difference between weekend passenger flow and weekday passenger flow. Weekends and weekdays should therefore be treated as different features, so that the extracted features better match real-world behavior.
In the figure below, the first day is a holiday, and the holiday passenger flow clearly differs from that of ordinary weekends and ordinary working days.
Therefore, weekdays, weekends, and holidays should be treated as the categories of the date type.
See formula (10) for the definition. Each day is assigned one of three date types (working day, weekend, or holiday), represented by the characteristic variable F(t).
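The date features can be sketched as follows. The integer codes, the 1-to-7 numbering of W(t), and the explicit holiday set are assumptions made for illustration; they are not taken from formula (10), and the holiday in the usage example is hypothetical.

```python
from datetime import date

# Assumed encoding for F(t); the actual values in formula (10) may differ.
WEEKDAY, WEEKEND, HOLIDAY = 0, 1, 2


def week_index(d: date) -> int:
    """W(t): day of the week, 1 = Monday ... 7 = Sunday (assumed numbering)."""
    return d.isoweekday()


def date_type(d: date, holidays: frozenset) -> int:
    """F(t): holidays take precedence, then weekends, then working days."""
    if d in holidays:
        return HOLIDAY
    return WEEKEND if d.isoweekday() >= 6 else WEEKDAY
```

For example, with `holidays = frozenset({date(2019, 1, 1)})`, `date_type(date(2019, 1, 1), holidays)` gives `HOLIDAY` even though that date is a Tuesday, because the holiday check runs first.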
(2) Peak time
Within a day, time periods can be divided into peak and off-peak periods. As the figure below shows, a day generally has two peaks, the morning peak and the evening peak. There is also a further peak between the morning peak and the evening peak, which is treated here as a special peak.
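A peak-period feature of this kind might be encoded as in the sketch below. The text does not state the hour boundaries of the peaks, so the windows here are purely hypothetical placeholders that would need to be read off the actual passenger flow curves; the integer codes are likewise assumed.

```python
OFF_PEAK, PEAK, SPECIAL_PEAK = 0, 1, 2   # assumed encoding

# Hypothetical hour windows [start, end); the source gives no boundaries.
PEAK_HOURS = [(7, 9), (17, 19)]          # morning and evening peaks
SPECIAL_PEAK_HOURS = [(11, 13)]          # peak between morning and evening


def peak_type(hour: int) -> int:
    """Label an hour of the day as off-peak, peak, or special peak."""
    if any(start <= hour < end for start, end in PEAK_HOURS):
        return PEAK
    if any(start <= hour < end for start, end in SPECIAL_PEAK_HOURS):
        return SPECIAL_PEAK
    return OFF_PEAK
```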
Historical passenger flow feature
Historical passenger flow features are time window features. Time window features are powerful because the value at one time is closely related to the value at the preceding time. For time series data, weighted values over short time spans are constructed, and rolling the time window forward generates short-window features for successive times step by step.
(1) Passenger flow in adjacent periods
Because the passenger flow in the current time period is related to the flow in the periods immediately before and after it, the flows of the two preceding and the two following time periods are used as features. The corresponding features are inNums_before1, inNums_before2, inNums_after1, inNums_after2, outNums_before1, outNums_before2, outNums_after1, and outNums_after2.
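The adjacent-period features can be sketched as a simple lookup around slot t. This sketch assumes the flows of one station for one already observed day are stored as a list indexed by time period; the "after" values are not available for a day that is still being predicted, so the list is assumed to come from historical data. The function name and the use of `None` at the day boundaries are illustrative choices.

```python
def adjacent_flow_features(series, t, name="inNums"):
    """Return the flows of the two periods before and after slot t.

    series: per-slot counts for one station on one observed day (assumed layout).
    name:   "inNums" or "outNums", used as the feature-name prefix.
    """
    def at(i):
        # Slots outside the day have no value.
        return series[i] if 0 <= i < len(series) else None

    return {
        f"{name}_before1": at(t - 1),
        f"{name}_before2": at(t - 2),
        f"{name}_after1": at(t + 1),
        f"{name}_after2": at(t + 2),
    }
```

Calling the function once with `name="inNums"` and once with `name="outNums"` yields the eight features listed above.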
(2) Passenger flow of adjacent days
The maximum, mean, and minimum reflect the distribution of the data to a certain extent (Liu et al., 2017), so the maximum, mean, and minimum of the inbound and outbound flows in the same week are added as features.
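A minimal sketch of these summary statistics is given below. It assumes the input is the list of flows for one station and one time period across the days of the same week; that grouping, and the feature names with `_max`, `_mean`, and `_min` suffixes, are assumptions for illustration.

```python
from statistics import mean


def flow_summary(values, name="inNums"):
    """Maximum, mean, and minimum of a group of flow counts."""
    if not values:
        raise ValueError("values must not be empty")
    return {
        f"{name}_max": max(values),
        f"{name}_mean": mean(values),
        f"{name}_min": min(values),
    }
```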
(3) Weekly passenger flow in the previous sequence
The distribution characteristics can also be reflected by the inbound and outbound passenger flow of the same station in the same time period on the same day of the week in the preceding week (Luo, 2017).
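One way to look up this feature is sketched below, under the assumption that "previous sequence" means exactly seven days earlier and that flows are stored in a dictionary keyed by (stationID, date, t). Both the storage layout and the function name are hypothetical.

```python
from datetime import date, timedelta


def previous_week_flow(flows, station_id, day, t):
    """Return (inNums, outNums) for the same station and slot seven days earlier.

    flows: dict mapping (stationID, date, t) -> (inNums, outNums) (assumed layout).
    Returns None when no record exists for the earlier date.
    """
    return flows.get((station_id, day - timedelta(days=7), t))
```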
Geographic information feature
Geographic information features mainly include station type, station attributes, station equipment, and passenger flow stability. Their design takes into account all available information related to geography, covering geographic location, function, attributes, and passenger flow stability.
(1) Station type
Stations are of different types, such as starting stations, transfer stations, and ordinary stations. The number of neighboring stations expresses the station type, so it is counted for each station from the road network diagram. A starting station has only one adjacent station, an ordinary station has two, and a transfer station has more than two.
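The neighbor-count rule above can be sketched as follows, assuming the road network diagram is available as a list of undirected station pairs. The edge-list representation, the function names, and the toy network in the usage note are illustrative assumptions.

```python
from collections import defaultdict


def neighbor_counts(edges):
    """Count distinct adjacent stations for every station in an undirected edge list."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a != b:                 # ignore self-loops
            neighbors[a].add(b)
            neighbors[b].add(a)
    return {station: len(adj) for station, adj in neighbors.items()}


def station_type(n_neighbors: int) -> str:
    """Map a neighbor count to the station type described in the text."""
    if n_neighbors == 1:
        return "starting"
    if n_neighbors == 2:
        return "ordinary"
    if n_neighbors > 2:
        return "transfer"
    raise ValueError("a station in the network needs at least one neighbor")
```

On a toy network with edges `[(0, 1), (1, 2), (2, 3), (2, 4)]`, station 2 has three neighbors and is labeled a transfer station, station 1 is ordinary, and stations 0, 3, and 4 are starting stations.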
(2) Station attributes
Stations also differ in their attributes, which can be divided into commercial, working, and residential. As Figure 17 shows, passenger flow patterns differ greatly between stations. Some stations carry more passengers on weekends than on weekdays, so they are judged to be commercial.
The figure below shows that some stations carry far more passengers on weekdays than on weekends, so they are judged to be residential or working. Besides the difference between weekends and working days, the distribution of entries and exits within a day is also informative. As the following two figures show, some stations have a large inbound flow in the morning and a large outbound flow in the evening, so they are judged to be working stations. Other stations show the opposite pattern (Ni, 2016), with a large outbound flow in the morning and a large inbound flow in the evening, and they are judged to be residential.
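The weekend-versus-weekday comparison can be sketched as below. The text gives no numeric threshold, so this sketch simply compares mean daily totals; the input layout, the label strings, and the function name are assumptions. Separating working from residential stations would additionally need the within-day entry and exit distribution, which is not covered here.

```python
from statistics import mean


def weekend_weekday_attribute(daily_totals):
    """Label a station from its daily passenger totals.

    daily_totals: list of (is_weekend, total_flow) pairs for one station (assumed layout).
    """
    weekend = [flow for is_weekend, flow in daily_totals if is_weekend]
    weekday = [flow for is_weekend, flow in daily_totals if not is_weekend]
    if not weekend or not weekday:
        raise ValueError("need at least one weekend day and one weekday")
    # No threshold is given in the text; a plain comparison of means is assumed.
    if mean(weekend) > mean(weekday):
        return "commercial"
    return "residential_or_working"
```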
(3) Characteristics of station equipment
For the card-swiping devices, the number of devices at each station is related to the passenger flow at that station, and the number of devices in each time period is also related to the flow.
(4) Passenger flow stability
Passenger flow stability is the proportion of passengers entering a station who both enter and exit that same station on the same day. This proportion is relatively stable at each station, reaching 70% at some stations, and its daily fluctuations are fairly regular. Among people who travel by public transport such as the subway, residents living near the station can be considered the main group. In other words, a large share of the entries and exits at a given station each day is made by the same people. This characteristic is expressed as passenger flow stability.
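The stability ratio can be sketched as a set intersection over one day of card records. The record layout `(userID, stationID, status)` with status 1 for entering and 0 for exiting is an assumption about the raw data, and returning 0.0 for a station with no entries is an illustrative choice.

```python
def flow_stability(records, station_id):
    """Share of users entering a station who also exit at it on the same day.

    records: one day of (userID, stationID, status) tuples, where status 1 means
             entering and 0 means exiting (assumed layout).
    """
    entered = {u for u, s, status in records if s == station_id and status == 1}
    exited = {u for u, s, status in records if s == station_id and status == 0}
    if not entered:
        return 0.0
    return len(entered & exited) / len(entered)
```

A value of 0.7 would correspond to the 70% level mentioned above for some stations.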
Summary
For feature analysis and feature extraction, features are mined from two perspectives and targets are distinguished along two dimensions, which yields four types of features.
For the two perspectives, first, from the station perspective, the daily passenger flow is broadly the same as on the preceding day of the same type, so the question is what causes the fluctuations. Second, from the user perspective, the travel trajectories of most users should be repeatable, which is the stability of passenger flow. For the two dimensions, the temporal dimension distinguishes the effects of holidays and peak hours on passenger flow, and the spatial dimension distinguishes the effects of different geographic locations and different functional attributes. From these two perspectives and two dimensions, four types of features are summarized: basic features, time features, historical time features, and geographic information features.