Understanding and predicting the occurrence of void street interfaces

Void street interfaces (VSIs) – building plinths with restricted visual interaction, accessibility, and public use – constitute an urban feature often associated with undermining the public domain, limiting free access and preventing interaction between social groups. Moreover, VSIs have been described as products of inequality designed to segregate and hinder integration between public and private urban spaces. This study assesses VSIs across six cities in Brazil, a country notable for its profound inequality and sociospatial fragmentation. The main aims of this research are: (i) to develop and test a predictive model for VSIs using socioeconomic indicators drawn from open-source ground-truth data; (ii) to identify the variance of VSI within selected case studies. In the development phase of the predictive model, data from the city of Recife are used to build the model. The testing phase involves the analysis of VSIs in the cities of Fortaleza, Salvador, Belo Horizonte, Curitiba and Porto Alegre. The model can potentially assist urban planners in better understanding and locating VSIs and mitigating undesirable outcomes.


Introduction
Urban segregation, inequality and spatial fragmentation are barriers to safety, inclusivity and sustainability. Previous research has shown how urban inequality can lead to segregation and limit integration between public and private urban spaces. This work explores this dynamic further by analysing void street interfaces (VSIs) across six Brazilian cities (see Supplementary Figure S1).
indicators and analysed the sample in detail. The model outcome captures areas with notable VSI presence.
The remainder of this paper is laid out as follows: (1) a review of the literature; (2) a description of our data and methodology; (3) an explanation of our results; (4) a discussion of our approach; and (5) a general conclusion.

Literature review
In the literature on safety and inclusivity in urban design, the plea for vibrant and inclusive streets (e.g. Jacobs, 1961; Gehl et al., 2006; Mehta and Bosson, 2010) is countered by researchers emphasizing defence and territorialism (e.g. Newman, 1972). These conflicting views on urban space are part of a debate over the roles and interaction of different users (residents, passers-by) as mechanisms to achieve or undermine safety in urban environments. For example, Jacobs's (1961) concepts such as 'eyes on the street' and 'sidewalk ballet' prompted pedestrian-centric research to focus on land allocation and block typology. At the same time, Newman's (1972) concept of safety via the presence of inhabitants or proprietors of an area has been related to establishing gated communities.
Latin American cities fruitfully embody this debate. It is clear that the notions of control and safety do not follow Jacobs's notion of safety by maximum interaction or Newman's notion of community development (see Supplementary Figure S6). In the context of development amid social conflict, common in Latin America (Davis, 2020; Mattos, 2006), the quest for safety and control has dominated the actions of real estate and society. As Coy and Pöhler summarize, 'the realisation of a private idyll for the privileged seems to be possible only among equals. The consequence is an increasingly clear trend of self-segregation' (Coy and Pöhler, 2002: p. 356). This debate over control and inclusion is intimately tied to the scholarship on Latin American urban development (Kostenwein, 2021), which focuses on gated communities and the mechanisms behind control and territorialism (Coy, 2006; Coy and Pöhler, 2002). However, residential segregation discourse extends beyond gated communities, looking at both traditional and modern urban division patterns (Sabatini, 2006).
We recognize that VSIs and urban gated communities are inextricably related (Borsdorf and Hidalgo, 2010) and hence present the majority of the elements outlined in the literature on the topic in a single building. Consequently, VSIs may be triggered by the same logic as gated communities in the global South. In this context, luxury and security devices 'were necessary features to convince middle-income urban dwellers to buy property beyond traditional urban boundaries' (Libertun De Duren, 2012: p. 245). However, in the case of VSIs, the security features would attract middle- and high-income residents to live in highly urban parts of cities. Therefore, VSIs can be understood as indicators of shifts in segregation patterns, revealing the limitations of measuring complex sociospatial dynamics (Peach, 2012).
Another notable feature of Brazilian urbanization is the rapid transformation of the built environment. Moreover, this transformation was driven by a development model centred on individual property and guided by conflicting logics of market and state (Abramo, 2007).
As mentioned earlier, regulation and planning of urban development in Brazil (Rocco et al., 2019: 422) have long struggled to promote inclusiveness and prevent segregation. As a result, through territorial growth and building replacement, urban transformation in Brazil continuously produces a highly fragmented urban fabric and built environment that reinforces and reflects a high level of inequality and segregation.
In summary, VSIs in the Latin American context are symptoms of high inequality, low social mobility, the deregulation of land markets and targeted infrastructural development (Thibert and Osorio, 2014). With regard to Brazilian cities, we argue that VSI reflects (1) increased inequality, exacerbating the use of security features, (2) the deregulation of urban transformation, and (3) dependence on private vehicle mobility. These three aspects are evident in (1) the gates and walls, (2) the rapid replacement of the building stock, and (3) the strong presence of building plinths dedicated exclusively to parking.

(Void) street interface
'Street interfaces' are the physical boundaries between private and public spaces (Van Nes and Yamu, 2021), the contact zone between domestic and urban activities (Dovey and Pafka, 2017; Dovey and Wood, 2015). They are sometimes referred to as 'the city at eye level' (Karssenberg et al., 2016). Dovey and Wood (2015, p. 4) categorize street interfaces into five major types (see Supplementary Figure S4). What we propose as VSIs here borrows the main characteristics of the authors' impermeable/blank type. They describe this type as an interface without transparency or pedestrian access, occasionally accompanied by a gate. Pedestrian movement is socially inert even in these instances. According to Dovey and Wood, impermeable/blank interfaces are sometimes transparent, with private portions visible but inaccessible from public space. In this paper, VSIs correspond to building frontages with no public function, limited ground-floor accessibility and restricted visual interaction. VSIs can cover the frontage of a building or urban block fully or partially.
The interface between buildings and public spaces has generally been studied by analyzing the morphological features of buildings alongside the connectivity of the street network. In their analyses, studies have used plots (Bobkova, 2019), block faces (Vialard, 2013), pedestrian perspectives (Araldi and Fusco, 2019) and other resource combinations (Palaiologou and Vaughan, 2014).
Previous studies focused on urban development in Brazil have demonstrated a negative correlation between blind interfaces and the number of pedestrians (Netto, 2012). Additionally, research shows that pedestrian route choices are influenced by the presence or absence of windows and doors (Maciel and Zampieri, 2020). Studies on Recife have also identified negative correlations between closed interfaces and sociability (Roca Muñoz and Monteiro, 2019) and between safety perception and crime occurrence (Monteiro and Cavalcanti, 2015). This paper focuses on the socioeconomic context in which VSIs emerge, aiming to develop a reliable means of assessing their occurrence and informing planning. The novelty of our approach stems from our focus on the relationship between VSIs and socioeconomic indicators. We want to address the complex issue of spatial fragmentation by clarifying what triggers the occurrence of VSIs, following Brenda Scheer's (2010, p. 310) words about the analysis of simple types: 'Since ordinary building types are the most visible building blocks of the urban landscape, planners must study the naturalised conditions under which they arise, flourish, and change to have any hope of transforming them or the urban landscape that contains them.'

Data and methodology
We developed this study over the course of several phases (see Figure 1). First, we developed a machine-learning predictive model, trained it using Recife VSI data and projected results in other cities using Brazilian census data. The model outputs were then refined using optimized geographic methods. Finally, we investigated the model output, comparing VSI characteristics to understand better the environment in which VSIs occur. In addition, we incorporate data on land use and development indicators to complement our analysis (see Results Section).

Building the predictive model
After training our exploratory model with the data from Recife, we applied our classifier to the five cities with no VSI spatial information. We selected these cities through cluster analysis using data from the 5556 municipalities surveyed in the 2010 IBGE Census, including income Gini index, population size and average household income (see results in Supplementary Figure S2). We selected five cities (out of 12) that are part of Recife's cluster (see Note 2). Salvador and Fortaleza are close to Recife geographically and share similar demographic and socioeconomic characteristics. Belo Horizonte, Curitiba and Porto Alegre, in contrast, are located in Brazil's wealthier southern region.
To train the classifier (see Note 3), we used the locations of buildings with VSIs in Recife as target labels. We verified this information using online street imagery and relabelled the data for buildings matching the VSI typology (see Supplementary Table S1 and Figures S3 and S7).
To detect VSIs in the other five cities, we applied the classifier to automatically detect areas with the socioeconomic attributes of urban spaces associated with the presence of VSIs. We sourced these attributes from the 2010 IBGE Census (IBGE, 2010), which covers 3000 indicators, including those pertaining to gender, demographics, ethnicity, family composition, income and literacy.

Feature selection
Before selecting the model with the best results, we went through an iterative process to select the best attributes to incorporate into the model. We tested the attributes in different combinations to understand and select those that were most informative. The final selection includes only attributes derived from the 2010 census. This decision guaranteed the uniformity and coherence of the dataset.
The final list of attributes is as follows: dwelling type (house, apartment or villa), population density, average household income and car ownership. We used this list to train our predictive model. The exploratory character of this research justifies the reduced set of parameters, as a complex model is not our primary objective. Instead, this paper aims to achieve an understanding of whether, when controlling for socioeconomic indicators, buildings in other cities share the same characteristics identified in Recife. The exploratory character is also the main reason for selecting an explainable model, which can provide weights for each attribute and better insights to the analyst.

Model training
We trained two different classifiers: one based on the extra trees algorithm (extremely randomized trees; see Geurts et al., 2006) and one based on the random forest algorithm (Ho, 1995; see Note 4). This allowed us to compare accuracy and feature importance and select the best model. Furthermore, since we are dealing with a dataset with imbalanced target labels (see Note 5), we employed cross-validation and calculated the estimated balanced accuracy (see Note 6) to evaluate the model's performance.
We also calculated each feature's importance to understand how it influenced the model. Information on the training phase for both models is presented in Table 1.
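To make the choice of metric concrete, the following minimal sketch computes balanced accuracy as the average per-class recall, which is the definition given in Note 6. The labels are invented for illustration and are not taken from the Recife dataset; the function name `balanced_accuracy` is our own.

```python
def balanced_accuracy(y_true, y_pred):
    """Average of the recall obtained on each class present in y_true."""
    recalls = []
    for cls in sorted(set(y_true)):
        indices = [i for i, label in enumerate(y_true) if label == cls]
        hits = sum(1 for i in indices if y_pred[i] == cls)
        recalls.append(hits / len(indices))
    return sum(recalls) / len(recalls)


# Illustrative imbalanced labels: 10 areas with a VSI, 90 without.
y_true = [1] * 10 + [0] * 90

# A classifier that never predicts a VSI looks good on plain accuracy...
always_negative = [0] * 100
plain_accuracy = sum(t == p for t, p in zip(y_true, always_negative)) / len(y_true)

# ...but balanced accuracy exposes that it misses every VSI.
balanced_naive = balanced_accuracy(y_true, always_negative)

# A classifier that finds 8 of 10 VSIs and 81 of 90 non-VSI areas.
informative = [1] * 8 + [0] * 2 + [0] * 81 + [1] * 9
balanced_informative = balanced_accuracy(y_true, informative)
```

With imbalanced labels, a trivial classifier can reach high plain accuracy while its balanced accuracy stays at chance level, which is why the latter is the more informative score here.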
After the training phase, we used unseen data from Recife's dataset to test the estimated balanced accuracy of the trained model. This testing phase shows that the random forest model is more sensitive to the income feature (0.408) and the percentage of the dwelling-type apartment (0.185). In contrast, the extra trees model shows more balance in the importance of each feature. Considering the relevant income differences across the cities that we are analyzing, we selected the extra trees model on account of its lower reliance on the income feature. This choice lends more importance to the attributes that are distinct for VSIs (dwelling type and income) without the risk of income-biased results. Finally, we applied the model to each city in the dataset to predict VSI locations based on Census socioeconomic attributes. As already established, we only had information about VSI locations in Recife. No relevant ground-truth data were available for the other cities. Therefore, we employed this model, trained using data from Recife, to predict VSI locations in other cities affected by similar urban policies.
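The workflow described above can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the authors' code: the data are synthetic stand-ins for census tracts, and the feature names, sample sizes, split ratio and hyperparameters are all hypothetical.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import cross_val_score, train_test_split

# Hypothetical column names for the four census-derived attributes.
FEATURES = ["pct_apartments", "population_density",
            "avg_household_income", "car_ownership"]


def make_toy_tracts(n_tracts, seed):
    """Synthetic tracts with imbalanced labels (about 20% positive)."""
    rng = np.random.default_rng(seed)
    X = rng.random((n_tracts, len(FEATURES)))
    signal = 0.6 * X[:, 2] + 0.4 * X[:, 0] + 0.1 * rng.standard_normal(n_tracts)
    y = (signal > np.quantile(signal, 0.8)).astype(int)
    return X, y


# "Training city": labelled tracts, split into training and unseen test data.
X, y = make_toy_tracts(600, seed=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

models = {
    "extra_trees": ExtraTreesClassifier(n_estimators=100, random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

report = {}
for name, model in models.items():
    # Cross-validated balanced accuracy on the training split.
    cv_scores = cross_val_score(model, X_train, y_train, cv=5,
                                scoring="balanced_accuracy")
    model.fit(X_train, y_train)
    report[name] = {
        "cv_balanced_accuracy": float(cv_scores.mean()),
        "test_balanced_accuracy": float(
            balanced_accuracy_score(y_test, model.predict(X_test))),
        "importances": dict(zip(FEATURES, model.feature_importances_)),
    }

# "Other city": tracts with the same attributes but no VSI labels.
X_other, _ = make_toy_tracts(300, seed=1)
predicted_vsi = models["extra_trees"].predict(X_other)
```

Comparing `report["extra_trees"]["importances"]` with `report["random_forest"]["importances"]` mirrors the comparison used to choose between the two models; the values obtained on synthetic data carry no empirical meaning.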

Output refinement and validation
The predictive model used in this study is a-spatial, meaning it does not directly consider the spatial location of each census tract or the influence of neighbouring tracts in the results. Therefore, once we obtained the results, we defined a three-step method to refine and validate the model output.
First, to incorporate a spatial dimension into the model output, we used optimized hotspot analysis on ArcMap (Ord and Getis, 1995). This statistical tool enabled us to identify significant areas and exclude outliers.
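As a rough illustration of the statistic behind hotspot analysis, the sketch below computes the Getis-Ord Gi* z-score for each location using binary distance-band weights. It is a simplified stand-in: the optimized tool in ArcMap also selects the analysis scale and corrects for multiple testing, neither of which is reproduced here. The grid, the values and the 1.96 cutoff are assumptions chosen for the example.

```python
import math


def getis_ord_gi_star(values, coords, threshold):
    """Return one Gi* z-score per location.

    Weights are binary: 1 if two locations lie within `threshold` of each
    other (a location is always its own neighbour), otherwise 0.
    """
    n = len(values)
    mean = sum(values) / n
    variance = sum(v * v for v in values) / n - mean ** 2
    std = math.sqrt(max(variance, 0.0))
    z_scores = []
    for i in range(n):
        weights = [1.0 if math.dist(coords[i], coords[j]) <= threshold else 0.0
                   for j in range(n)]
        sum_w = sum(weights)
        sum_w2 = sum(w * w for w in weights)
        local_sum = sum(w * v for w, v in zip(weights, values))
        denom = std * math.sqrt((n * sum_w2 - sum_w ** 2) / (n - 1))
        # A zero denominator means no variation to test; report 0.0.
        z_scores.append((local_sum - mean * sum_w) / denom if denom > 0 else 0.0)
    return z_scores


# Toy 5 x 5 grid of tract centroids; one corner has high predicted VSI scores.
coords = [(x, y) for x in range(5) for y in range(5)]
values = [0.9 if x <= 1 and y <= 1 else 0.1 for x, y in coords]

z = getis_ord_gi_star(values, coords, threshold=1.0)
# Locations whose z-score exceeds 1.96 are treated as hotspots here.
hotspots = [c for c, score in zip(coords, z) if score > 1.96]
```

In this toy grid only the four corner cells with high values are flagged, while isolated neighbours of the cluster are not, which is the filtering behaviour the refinement step relies on.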
Second, we assessed the relationship between the presence of VSIs and three separate indicators: land use, Gini index and HDI (Human Development Index). We included the percentage of residential land use at this stage, following what we observed in Recife, where VSI presence has a positive relationship with monofunctional residential areas.
Third, we conducted a visual analysis of five per cent of the selected census tracts in the five considered cities. We used the most recent images available on Google Street View, only considering older versions when clarification was required. This visual analysis entailed coding buildings as VSIs (following the criteria detailed in the introduction) and ranking census tracts on VSI prominence from 0 to 5 (0 = total absence; 5 = over 75% of the buildings are VSIs). The classification is determined by dividing the number of VSIs by the total number of buildings.
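The ranking rule can be sketched as below. Only the two end points come from the text (0 = total absence; 5 = over 75% of buildings are VSIs); the cutoffs for scores 1 to 4 are not stated, so the equal-width bounds used here are purely hypothetical and are exposed as a parameter.

```python
from bisect import bisect_left

# ASSUMPTION: equal-width upper bounds for scores 1-4. The source defines
# only score 0 (no VSIs) and score 5 (more than 75% of buildings).
ASSUMED_UPPER_BOUNDS = (0.1875, 0.375, 0.5625, 0.75)


def vsi_prominence(n_vsi, n_buildings, upper_bounds=ASSUMED_UPPER_BOUNDS):
    """Rank a census tract from 0 to 5 by its share of VSI buildings."""
    if n_buildings <= 0:
        raise ValueError("n_buildings must be positive")
    if not 0 <= n_vsi <= n_buildings:
        raise ValueError("n_vsi must lie between 0 and n_buildings")
    if n_vsi == 0:
        return 0
    share = n_vsi / n_buildings
    # bisect_left treats each bound as inclusive, so a share of exactly
    # 0.75 scores 4 and anything above 0.75 scores 5.
    return 1 + bisect_left(upper_bounds, share)


# Example tracts, each with 40 buildings.
scores = [vsi_prominence(k, 40) for k in (0, 3, 20, 30, 31)]
```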

VSI in Brazilian cities
The VSI prominence rankings obtained by the predictive model are shown below. The figures for Recife are included as a reference.
The results show that the prominence of VSIs varies across cities. While Fortaleza, Salvador and Curitiba have rates similar to those of Recife, those of Belo Horizonte and Porto Alegre are substantially higher. It is worth noting that these two cities (together with Curitiba) are fairly affluent relative to the others in the dataset. Also worth pointing out is that Porto Alegre's VSI rate exceeds even that of Belo Horizonte by a decent margin.
These preliminary findings suggest that our methodology is more effective for comprehending spatial fragmentation than a single-factor index such as Gini because our model incorporates several factors' associations.
The differences in the results warranted additional analytical steps, including spatial mapping to evaluate the coherence and plausibility of the VSI locations, comparing each result with empirical knowledge about each city (see Supplementary Figure S12).

Refining and comparing the results
To incorporate a spatial dimension into the predictive model's a-spatial output, we employed optimized hotspot analysis to highlight the areas of VSI concentration (see Figures 2 and 3 and Table 3). We refined the model results with this complementary step, identifying significant areas and excluding potential outliers. As a result, the revised figures lie within the range of Recife, 22% of the total census tracts, bringing Belo Horizonte and Porto Alegre more in line with the other cities.
The most significant areas highlighted by this process are considered in the next step, where we compare the results with land use and development indicators, as seen in Table 2.

Comparing the results with city-level indicators
This step aims to understand how the model results perform at the city scale in terms of land use, income inequality and human development (see Supplementary Table S1).
The results indicate that the areas selected by the model score higher in the three indicators (see Table 4). For example, residential use accounts for 90% of land use or more across all five cities, around 20% higher than the city average. The other indicators also score relatively high values. Moreover, as seen in Supplementary Figure S11, the model-selection figures are concentrated around the country's average relative to the more dispersed city-wide figures.

Verification of predictive model results
Finally, we conducted a visual analysis of the considered census tracts using a classification score (see Section 3.2 for more details). This score is intended to capture the variation and consistency of model outcomes. It is not intended to quantitatively validate the results.
As shown in Table 5, the results demonstrate a coherent distribution, with extreme values being underrepresented and a gradual increase towards more significant VSI prominence.
Of the 155 analyzed census tracts, only eight scored 0, including two in Curitiba, two in Belo Horizonte and four in Porto Alegre. Results in Porto Alegre and Curitiba are predominantly commercial areas with mixed-use and active ground-floor or suburban houses (Curitiba). In Belo Horizonte, the absence of VSIs may be attributed to the high rate of heritage-listed

Discussion
This research demonstrated how to apply innovative analysis techniques to well-known urban issues as a means of providing valuable insights into modern urban development. The principal outcome of this research is that a machine-learning model was capable of detecting VSI prominence. While the spatial outlook observed in the samples may vary from large blind parking garages to fenced-off gardens, most of the analyzed samples denote structures intended to close off and disconnect buildings from the public realm. Our observations, both in Recife and the sample at large, showed that VSI retrofit and adaptation are prohibitively expensive, making improvements impracticable. This is in contrast to prevalent scholarly arguments about adaptation and spatial capacity (Marcus, 2010), which praise the ability of urban spaces to accommodate a variety of activities over time with minimal adaptation. The analysis of the sample (5% of the highest-scoring census tracts) demonstrates some trends and points of similarity. We can illustrate the most significant points regarding VSI and reflect on the contexts in which VSI buildings are constructed.
Fortaleza, Porto Alegre and Curitiba are all, to a certain extent, examples of cities in which the built environment surrounding VSIs has been impacted. In such cases, the introduction of VSIs can be associated with increased safety measures, such as higher walls and barbed wire. Additionally, some solutions used in VSIs, such as electric gates and surveillance cameras, are gradually integrated into pre-existing structures.
Worth noting is that we noticed several vacant buildings and closed shops near VSIs. We do not mean to assert that there is a causal relationship here. VSIs may simply be newer than those in their immediate vicinity. Still, without a temporal assessment, it is impossible to determine whether VSIs aggravate local environments or whether VSIs are simply more likely to be constructed in areas already in decay.
We also identified cases in which VSIs coexist alongside neighbouring interfaces with active ground floors featuring shops, cafes and other public services. This type of coexistence appeared to be most prominent in Curitiba and Porto Alegre; it was seldom found in areas with multi-floor buildings alongside individual houses. This dynamic may reflect the development process of areas previously occupied by detached residences. The remaining houses are no longer attractive for residential use and, eventually, undergo functional shifts.
We may also make some general observations regarding VSI features. Interfaces of older buildings that were once open to the public are gradually closed, with improvised solutions later incorporated. Interfaces of newer buildings incorporate these design strategies to control access. We noticed similarities across the sample in VSI designs. For example, many interfaces featured plinths exclusively used for parking; others featured linear enclosed gardens. Also worth noting is that there was a distinct lack of mixed-use functionality among VSIs. On a different note, our analysis identified some possible coexistence scenarios for active frontages and VSIs. In Curitiba and Belo Horizonte, a correlation between VSIs and road profiles was established. In these two cities, VSIs were typically located alongside local roads, whereas active frontages were typically located alongside main roads or transit corridors. While this apparent association was most pronounced in Curitiba and Belo Horizonte, it was observed in practically all of the cities analyzed.
While relatively infrequent, we detected another VSI variation in Belo Horizonte and Curitiba. We came across several buildings with VSIs on most facades but with active corners housing storefronts in these two cities.
In summary, the study of VSIs allows for a more nuanced understanding of the urban environment's capacity to respond to conflicting demands. As already stated, the study of planning instruments is crucial to understand why and how municipal regulations prevent, allow, or even push VSIs.
One limitation of this work is that it did not achieve an understanding of whether local characteristics respond more to local or global factors. This was notable in the sample observations, where interfaces changed regardless of local context. One potential approach to addressing this limitation would be to compare local characteristics (such as VSI prominence) with global measures, such as centrality (Hillier, 1999). Another potential avenue for future research is to review planning regulations in the six cities studied here to analyze how VSI profiles are normatively addressed at the municipal level.

Conclusion
Despite the extensive literature on spatial fragmentation in Brazil and Latin America, VSIs remain under-analyzed. This research employs a novel approach by processing, analyzing and modelling existing data to address this gap in the literature. This paper presented an exploratory process to detect VSIs across six Brazilian cities using sociospatial data and a machine-learning technique to build a classifier model that can automatically predict VSI prominence in cities for which this spatial information is not available.
This work employed a complementary approach, making it distinct from urban morphometrics studies that use urban form classification to detail the socioeconomic characteristics of cities (e.g. Araldi and Fusco, 2019; Bobkova et al., 2019; Dibble et al., 2019). By using socioeconomic parameters as a starting point, this study contributes to a better understanding of urban form and type emergence. However, one drawback of the methodology is that census data are organized in tracts that do not always correspond spatially to urban morphology features like urban blocks. Future research on VSIs and morphology must address this spatial disconnection.
The detailed study of the samples revealed consistency in the model results, identifying areas with high VSI occurrence. The model also performed well on a city scale, as confirmed by comparing land use data and social indicators.
The approach used here to combine ground-truth and open data in a predictive model could be extended to other cities. Although street-interface research has been conducted in various Brazilian cities (e.g. Alonso De Andrade et al., 2018; Holanda, 2002, 2007; Maciel and Zampieri, 2020; Saboya et al., 2015), existing studies lack a common framework and face severe limitations in terms of ground-truth data collection and observation. Future research should incorporate open-source mapping platforms to develop a shared database of street-interface types. This step could be crucial for future research using predictive models based on open-source data and volunteered geographic information (Milojevic-Dupont et al., 2020).

Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.

Supplemental Material
Supplemental material for this article is available online.

Notes
1. Plano Diretor is an instrument unique to Brazil; it employs city-scale zoning and norms to regulate urban transformation at the parcel level.
2. By selecting cities from the same cluster as Recife, we avoid the model being biased by population size or other characteristics.
3. In this analysis, we employed the scikit-learn package for machine learning in Python to select the suitable classifier model and algorithm; see https://scikit-learn.org/stable/modules/tree.html#classification and Pedregosa et al. (2011).
4. The two algorithms work similarly, building multiple trees and splitting nodes using random feature subsets. There are two main differences between them: first, the Extra Trees algorithm samples without replacement (no bootstrapping); second, the Extra Trees algorithm splits nodes randomly, while the Random Forest algorithm opts for the best split.
5. In the Recife dataset, the number of areas labelled as containing a VSI is small relative to the total number of areas in the city. It is important to address this issue to obtain accurate model output through, for example, sampling techniques: in this case, we applied k-fold cross-validation to the training set obtained by splitting the original dataset.
6. The estimated balanced accuracy is defined as the average recall obtained on each class; it varies from 1 (best value) to zero (worst value; see Brodersen et al., 2010).