The New York City subway Line 7 (Flushing) modernization project consists in installing a Communication Based Train Control (CBTC) system and updating the existing interlocking system. The CBTC system is designed and installed by THALES Toronto.
The main benefits of this modernization will be:
- Reduced headway;
- Signals and track circuit simplification;
- Extended routing possibilities, for instance in case of track failures.
The New York City Transit authority (NYCT) decided to include formal proofs at system level as part of the safety assessment for this project. ClearSy is responsible for these proofs, using the Event-B method and Atelier-B toolkit. In this text, we discuss the methodology applied to obtain these formal proofs and the conclusions so far. The proof project is currently progressing (June 2012) and is estimated at 60%.
Goals and expected benefits
The main goal is to obtain a formal proof for the main safety properties of the system: no collision and no over-speeding. For instance, the first selected property “no collision” is detailed as follows: a train will never encounter any obstacle or any switch not locked in correct position. So we seek to formulate a set of well defined assumptions, such that this “no collision” property can be obtained from these assumptions by pure logic only.
Thus the final proofs seen by the customer are a set of assumptions that lead to the desired properties through formal reasoning considered as completely safe. Those assumptions cover every relevant aspect of the system, from internal design (for example: what algorithm is used to calculate safe braking) to external conditions (for example the worst turns that motion determination can tolerate). Once the assumptions are checked the system is deemed to fulfill the proven properties.
Of course, the proof of a global property like “no collision” is hierarchically decomposed: the top proof steps rely on assumptions that are properties proved later, and so on. How deep we go is decided when an assumption is considered as terminal; for instance the property “the motion determination algorithm always detects a wheel starting to slip” can be considered as a terminal assumption, or be the subject of a lower level proof relying on wheel and sensor physical properties.
The expected benefits of having such proofs are:
- Revealing all assumptions needed;
- Reaching a “proof level” confidence for the system properties.
Revealing all assumptions needed is particularly useful:
- External and environment assumptions (for instance: train operator behavior), well defined assumptions are the unique opportunity to check if they really hold and to check if they still hold in case of any change (environmental change for instance).
- Design assumptions (for instance: sensor properties, algorithm details, etc.): well defined assumptions are obviously useful at system creation and in case of system evolution.
In particular, assumptions regarding the software are intended to contribute to the elimination of bugs or pitfalls, in spite of our not going inside the code. When we assume a well defined property for a precise piece of code, it is easier to check this property by code review or through dedicated testing (or even future proofs).
Assumptions about the software are thus the outputs from the proof team to the THALES design team: they constitute “verification requests” to apply to the code, by any appropriate means (testing, code review…).
We use two main steps to obtain the desired formal proofs:
- Write a document explaining how the system ensures the desired properties. We call this natural language “proof”, with quotes because it is not a formal proof.
- Write event-B models such that the proof performed is the formal equivalent of the natural language “proof”.
The first step is based on the fact that we do not use Event-B to understand why a property is ensured, but to validate that it is really ensured once the “why” is understood. We want to avoid mixing formal notation issues with domain issues, which are paramount. The problem is that the available documents for a system usually describe how the system works, and not why it is designed so. This is all the difference between knowing an algorithm and knowing why it produces the wanted result…
This natural language “proof” step is a way to stabilize this understanding before writing B models. Otherwise, if the understanding changes often, all time is burned in updating the B models.
Natural language “proofs” are supposed to contain only:
- well defined assumptions,
- only pure logical steps,
- resulting properties.
Of course, at this stage there is no tool to check these qualities. The most difficult part is to formulate well defined assumptions. We use the following criteria:
well defined assumptions: in any possible scenario, it should be possible to state unambiguously if the assumption is true or false in that case.
This forces precise terms: for instance for an assumption about wheel-sliding behavior, we have to define what we call “sliding” (this is not so obvious: linked axles always slide on one wheel in turns).
The second step is normally more straightforward: we use Event-B to construct a proof identical to the natural language “proof”. Through Event-B and Atelier-B, we obtain a powerful validation of the forecasted proofs.
Methodology for natural language « proofs »
Up to now, we have worked mainly on natural language “proofs” (we have only written sample B models to prepare for the next B modeling task). So let’s detail the methodology for these natural language “proofs”. Two pitfalls must be avoided:
- Spending too much time. The system is complex; we must understand only the minimum needed to reach the desired proof in order to be efficient.
- Proving without the system designers. A proof written without the collaboration of the original system designers would be a “naïve proof”, with very little chance to be relevant.
The first point is an optimization constraint: the system is described by a huge number of documents that would take too much time to read, and many details are irrelevant for the proof anyway. To find the simplest proof path without falling into the above pitfalls, we use the following steps:
- Play scenarios trying to violate the wanted property (for instance, at top level try to play a scenario leading to a collision), in a light and fast way, until the reasons why breaking the property is impossible appear. Ask feedback about those scenarios from system designers: as the desired property (for instance no collision at any time) has been their concern, they will quickly explain why the property is ensured in that case (unless there is a real pitfall)!
- Once the reasons why the property is ensured have appeared, explain those reasons, at first informally then more and more rigorously, until we reach a text with only well defined assumptions and pure logical steps. Again, ask feedback about those simplest reasons why the property is ensured from original system designers.
As explained above, assumption writing is a key point. Testing these assumptions using the previous criteria leads to many questions before the formulation is satisfying. Another good test is that if an assumption is required, then we should be able to find a realistic accident scenario if the assumption is not true. Such accident scenarios must be kept with the assumptions; they constitute their best justification.
Conclusions so far about methodology
The natural language proof step is necessary because such rigorous explanation of “why the design ensures this property” is usually not readily available. The task load for this step is not at all negligible and should not be underestimated; but the benefits of having such natural language precursors of proofs are great.
For this reason, some people may ask: once natural language “proofs” are obtained, why use formal methods? The reason is that this is the only reliable test for “only pure logical steps”. And experience at ClearSy shows that there are always flaws that are found only at the B formalization step.
Results so far
Let’s examine now what was established so far about the no-collision property for the Line 7 modernization project.
We have a CBTC system with an external (existing) interlocking, using the notion of movement authority limit (MAL) to denote the end of the space where a train can go without encountering any obstacle or unlocked switch. The system is made of the following parts:
- Trains are equipped with an OBCU (On board computer unit) responsible for train localization and ensuring stop before MALs;
- Zone controllers (ZC) are responsible for train tracking (using location reports from CBTC-equipped trains plus track circuits) and for MAL broadcasting (ensuring that trains remain on locked routes, and ensuring train spacing inside routes);
- Interlocking is responsible for route tracing and locking, and for spacing for unequipped trains.
The selected property (that we named anti-collision) is detailed as: a train will never encounter an obstacle or a switch not locked in correct position.
The interlocking ensures that trains only enter routes that are locked. This is obtained by authorizing routes only after their correct locking has been entirely checked. In NYCT subways, specific devices called “trip stops” ensure that even unequipped trains cannot overrun restrictive signals more than a certain distance. So before opening the entry signal of a route, the interlocking checks that all the space of that route is locked including the possible overrunning, and checks that all protection devices (like trip stops) are ready. Of course this locking is possible only if no other locked route can reach the concerned space. Blocks on routes are freed only if the entry signal is checked restrictive, up to the last train on the route.
Without going into too many details (return zones or route subdivision by the CBTC using virtual signals are special cases included in the proof, but not described here), starting from a 0 train situation we can use these interlocking properties to prove that trains only run on locked routes, in the same direction. This already ensures that a train cannot reach an unlocked switch, and that a train cannot encounter a reverse direction train (unless trains change direction in unauthorized zones or roll back more than the expected maximum).
Note that the overrunning portion beyond a route is not locked for routes authorized only for CBTC trains: such trains are not supposed to overrun signals. One consequence of this is that a failed CBTC train restarting in manual mode is supposed to drive in line-on-sight up to the next signal, without overrunning.
The interlocking also ensures train spacing for unequipped trains: this is done thanks to an intermediate property stating that a manual train always encounters a red signal before the next train, with enough space beyond in case of signal overrunning (2 blocks).
For zone controllers (ZC), the following train tracking intermediate property can be proved: free spaces according to the ZC state variables are actually free (at any time). The train tracking is based on location reports from CBTC trains (extended using the worst possible acceleration of a train to compensate transmission delays) and track circuits. The property is ensured thanks to algorithms that safely remove suspicions of unequipped trains behind or before a CBTC train (at domain entrance or when moving away from an unequipped train). Thanks to this intermediate property, it is possible to prove that MALs calculated by ZCs are such that the space from a train to a MAL will remain free, until a new MAL beyond is provided. (We donot describe direction change or annulations here…).
OBCU aboard CBTC trains must ensure that their position determination is correct, i.e. that they are fully inside their broadcasted position interval. Position determination is done thanks to transponder detection, and to motion measurements between transponders. Proving that motion measurement always matches the real motion is not easy: sensors attached to the wheels are subject to slipping problems (slipping is always a concern for steel rail systems, such that these systems are also called “adhesion railways”). Other sensors (accelerometers, optics, radars) might not detect very slow motions, so long undetected slow motions could lead to an undetected position shift that will cause a collision if the next obstacle is before the next transponder. The THALES system uses a combination or wheel sensors and accelerometers, with an elaborate algorithm. The key point is to prove that a wheel slip is detected before it can cause an undetected position error: this can be done under some assumptions about the physics of wheel slip (related to the grade: the slowest slip start should be distinguishable from the fastest grade change of the track).
The OBCU must also ensure that the train always stops before reaching its current MAL (unless it’s a pulled back MAL, in which case ensuring the previous MAL is safe). This proof is done by comparing the OBCU computations to the physics of kinetic energy: the computation should always be safer. The difficult parts here are related to how the track grade is taken into account, how the train load can affect the inertia center and how possible slides could affect the rotating masses inertia.
From all the above elements about interlocking, ZC and OBCU, one can tell why collisions seem indeed impossible. But difficulties are inside details, and natural language “proofs” written so far demonstrate the gap between global explanations and proofs based on well defined assumptions and logics.
These natural language “proofs” have different shapes. For the ZC train tracking property, the link between the ZC variables and the real position of trains is quite simple but many possible evolutions must be described, and should not break this link. For motion determination (tachometers and accelerometers), there are many stacked laws assumed to hold permanently between physical values (position, speed, acceleration), sensor output, algorithm intermediate variables and finally output vari
variables. All physical variables change in a continuous manner, and the evolution of all computed variables is linked to the OBCU cycle.
For proofs like motion determination, we must take uncertainties into account. For example, even a law linking purely physical variables (defined with infinite precision) like train position and axle angular position must include an uncertainty to account for track turns and conical shape of the wheels. These laws will not be represented as such inside the B models: the Event-B method is dedicated to the logic itself and the mathematical analysis steps must be externalized.
In the development of its own railway products, ClearSy had experiences of system level properties proving that found many pitfalls, even after the product was already coded and testing concluded 0r defects. In many cases, the tests are connected to detailed specifications that already “hide” the concerned bugs; proving properties at higher level is then the only efficient solution.
This is what we are doing for the line 7 project, at a very large scale.
In industrial projects, the efforts to reach the required performance and to obtain the mandatory documents obviously come first. Extra efforts to reach higher levels of confidence (like property proofs) are always a matter of conviction. Let’s hope that global experiences in more and more projects will accumulate evidence for this conviction.
Author: Denis Sabatier [source]