Objects, property visibility, and trade-offs

As if on cue, the public vs. private debate has sprung up again within Drupal. The timing is fitting given my last blog post on programming language paradigms. Of course, property visibility is not a new debate, and the PHP community debates this subject from time to time (sometimes humorously).

What I believe is usually missing from these discussions, and what I hope to offer here, is a broader view of the underlying assumptions that lead people to different conclusions about when each kind of visibility is appropriate (if ever).

In short: It's the difference between procedural-think and object-think.

Procedural think

Those coming from a procedural background, I find, tend to think in terms of what is possible in a procedural language. Any procedural language has a concept of a variable (duh), and any procedural language worth using has some way of creating more complex data structures to be used as variables. In C there is the concept of a Struct, for instance. In PHP one can largely emulate the same structure (no pun intended) using PHP's ridiculously flexible arrays and good documentation.

In either case, what one is doing is simply clustering variables together to make them easier to work with. To actually operate on that variable cluster requires a function that understands that structure.

To be sure, one can do an incredible amount with such an approach. In a past life I was a Palm OS developer, and the Palm OS API was entirely procedural. Most functions took some sort of struct as their first parameter and would manipulate that struct in some way. Occasionally you could manipulate the struct directly, but doing so was usually documented as unsupported ("do this and it will probably break"), although the API authors tried hard not to break the struct definitions. In a sense it was a poor man's classes and methods, just without the class or the method.

With this approach, you're at best talking yourself out of using functionality that's right in front of you because it could break (or because a platform vendor says "naughty naughty", as Palm did, or rejects your app, as Apple does). And even though C lets you name your structs, you're still doing nothing more than clustering variables to make the syntax cleaner.

In a procedural approach, the position of pre-created variables within the structure is the contract between caller and callee. The behavior of that structure is completely undefined.
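To make the procedural contract concrete, here is a minimal PHP sketch of the "struct plus functions" style, using an associative array as the struct. The function names customer_create() and customer_full_name() are hypothetical, invented for illustration; the point is that the array's keys are the contract, and any code can reach in and change them.

```php
<?php
// Procedural style: the "struct" is a plain associative array.
// Its keys ARE the contract between caller and callee.
function customer_create(string $first, string $last): array {
  return array('first' => $first, 'last' => $last);
}

// A function that "understands" the structure operates on it.
function customer_full_name(array $customer): string {
  return $customer['first'] . ' ' . $customer['last'];
}

$customer = customer_create('Ada', 'Lovelace');

// Nothing stops calling code from manipulating the struct directly.
// If a later release renamed 'last' to 'surname', this write would
// silently stop affecting customer_full_name().
$customer['last'] = 'Byron';
```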

Object think

As noted in the previous article, in an OOP paradigm one doesn't think about clustering variables. One thinks about each object as a single opaque entity that has behaviors. Instead of thinking in terms of integers and strings, one thinks in terms of Domain objects: Nodes, or Posts, or Users, or Dates, or Customers.

These objects are atomic; they're just as atomic as an int or string would be in a procedural approach. And therein lies the power.

If I'm accessing a string in, say, PHP, I don't know, or care, if it's being stored internally as a character array. If I concatenate something to the string, I don't know or care whether the character array is extended in place, moved to a new, bigger location in memory, or replaced by an entirely new variable holding both strings, with the first string destroyed. That's not my problem. When writing my code, I should not have to care about the memory management that goes on under the hood. If I could access it and tried to manipulate it myself for some reason, I'd be more likely than not to cause a fatal error and segfault my entire PHP process, if not now then as soon as the next minor point release of PHP tweaks the string management code. Don't think that's a problem in practice? Why do you think some people hate C, where you're doing that sort of code back-stabbing all the time? :-)

In an OO model, if I'm accessing a Customer object then I don't know, or care, whether it's being persisted internally to SQL or to a flat file. When I call $customer->getName(), I don't know or care whether that name is stored as a single string within the object, as multiple strings that get concatenated together, or fetched by a new request to a SOAP service. In fact, it's a design flaw if I even need to care which it is. If I start mucking with it directly, then as soon as someone switches from local MongoDB to a remote SOAP service my code falls apart and fatals.

There is no $customer->name variable. As far as I, in calling code, am concerned, it doesn't exist. All that exists is the defined contract of the interface methods.

In an OOP approach, the methods exposed in the interface are the contract between caller and callee. The underlying logic and primitive variables are completely undefined.
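A minimal sketch of that idea, assuming a hypothetical CustomerInterface and two hypothetical implementations: one stores the name as a single string, the other assembles it from parts on request. The calling function works with either, because it relies only on getName() and never touches a property.

```php
<?php
// The interface is the entire contract the caller sees.
interface CustomerInterface {
  public function getName(): string;
}

// Implementation A: the name is stored as a single string.
class SimpleCustomer implements CustomerInterface {
  protected $name;

  public function __construct(string $name) {
    $this->name = $name;
  }

  public function getName(): string {
    return $this->name;
  }
}

// Implementation B: the name is assembled from parts on request.
// (A real implementation might instead call a remote service here.)
class SplitNameCustomer implements CustomerInterface {
  protected $first;
  protected $last;

  public function __construct(string $first, string $last) {
    $this->first = $first;
    $this->last = $last;
  }

  public function getName(): string {
    return $this->first . ' ' . $this->last;
  }
}

// Calling code depends only on the interface, never on properties.
function greet(CustomerInterface $customer): string {
  return 'Hello, ' . $customer->getName();
}
```

Swapping one implementation for the other requires no change to greet().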

Eclipse of the Sun

Once again, I blame Sun for why many people don't "get" that distinction. In Java, there is a concept of a JavaBean. A Bean is a Java class that has a no-parameter constructor, is serializable, and has a getX() and setX() method for every property X. That is, it is an object that is really a Struct and offers little if any advantage from being an object in the first place.

In so many conversations about OOP, I hear people ask "well where are your getters and setters?" That is usually followed by "if you have to have getters and setters then why bother making properties non-public?" The latter is a perfectly valid question, but it is backwards. Why bother having matching getters and setters for every property? That breaks encapsulation, one of the key reasons for using an object approach in the first place, and is one of the reasons why JavaBeans are, in most cases, a horribly bad model to follow. They are a terrible example of OO. They are Naked Objects wearing a see-through negligee. While there are valid use cases for such a design, they are not representative of "good" OO. I encourage most PHP developers to forget Beans exist, as the Bean approach in most cases defeats the entire purpose of using objects.
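The difference is easiest to see side by side. The sketch below uses a hypothetical bank-account example (not taken from any Drupal API): the Bean-style class mirrors its property with a getter and a setter, while the behavior-style class exposes only operations and protects its own invariant.

```php
<?php
// Bean style: a getter and setter for every property. This is a struct
// with extra ceremony; any caller can put the object into any state.
class BeanAccount {
  private $balance = 0;

  public function getBalance(): int { return $this->balance; }
  public function setBalance(int $balance): void { $this->balance = $balance; }
}

// Behavior style: callers say what they want done, and the object
// enforces its own rules (here: positive amounts, no overdrafts).
class Account {
  private $balance = 0;

  public function deposit(int $amount): void {
    if ($amount <= 0) {
      throw new InvalidArgumentException('Deposit must be positive.');
    }
    $this->balance += $amount;
  }

  public function withdraw(int $amount): void {
    if ($amount <= 0 || $amount > $this->balance) {
      throw new InvalidArgumentException('Invalid withdrawal.');
    }
    $this->balance -= $amount;
  }

  public function balance(): int { return $this->balance; }
}

$bean = new BeanAccount();
$bean->setBalance(-500); // Nothing prevents an invalid state.

$account = new Account();
$account->deposit(100);
$account->withdraw(30);
```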

It is quite unfortunate that so many schools teach textbook Java as their primary programming language. Bad API architecture like Beans produces bad OO programmers in so many ways, and those programmers then give good OO design a bad name.

PPP what?

When viewed from that standpoint, it's easy to see where the debate about public vs. private/protected comes from. To a procedural way of thinking, hiding properties doesn't make the slightest bit of sense. Those are the data, let me at the data dagnabbit!

From an OO standpoint, however, exposing internal properties makes no sense at all. You're just begging for someone to break your code, or worse yet open up a security hole, and it means you cannot refactor your code when you need to for fear that someone is relying on the current implementation. You cannot improve the API without breaking it. You may as well ship runkit with every copy of PHP and encourage people to change the language syntax out from under you. (This is a step more evil than eval().) You're making me care about the underlying complexity, stop that, just let me tell you what to do dagnabbit!

Certainly both concepts can be taken to an extreme. I did once work on a procedural system where the entire communication mechanism between different parts of the system was global variables. That was a horrid system, let me tell you, despite being the ultimate in "flexible and bare data". I haven't seen anything quite that horrid in OO code myself, unless you count JavaBeans, but I have faith that it exists.

If you're using your Classes as structs, then public properties make total sense and anything else is silly.

If you're using your Classes as objects, then protected/private properties make total sense and anything else is silly.

Double-think

It has been suggested that Drupal should adopt a "public only" policy, on the grounds that "Drupal is in the business of throwing doors wide open". Both that suggestion and the article it references miss the point; more specifically, the suggestion amounts to using classes as structs, not as objects.

However, that approach also runs afoul of another part of the Drupal business: Being modular. Exposing implementation details of that sort is, as explained above, an inherently non-modular approach. Exposing properties publicly encourages their direct use, which in turn means that we are using classes-as-structs: as little more than arrays that pass funnily. That means losing all of the benefits of encapsulation, abstraction, modularity, and portability that using classes-as-objects offers.

Given the markets that Drupal is moving into, where swappability of components (a mainstay of classes-as-objects) is critical, I believe that to be a very bad trade-off.

There is only one way I can see for such an approach to work, and that would be to adopt and rigorously enforce the following policy:

All public properties are to be treated as internal implementation details and not accessed unless no alternative is available. Accessing a public property is not supported in any circumstance, and the structure, definition, or existence of such properties may change at any time, even in a point-release, without notice. Changes to an object property are never considered an API change.

If we could hold to that, we would essentially be able to have our cake and eat it too: classes-as-objects but with an "out" in case the defined API only does 98% of what we need.

To be perfectly honest, however, I do not think we could pull that off. That's not a slight against Drupal developers but against human beings. The temptation to say "oh, well, it's easier to just grab this property than to file a bug report" is just too high. And then when we refactor something and inevitably break the "unofficial API" (aka those public properties), someone (who may or may not have read the documentation) will come screaming to the issue queues that we broke their site. It doesn't matter if it's core or contrib, it will happen.

What do we say then? "Sorry, you didn't read the docs, go away?" No, we'll find ourselves avoiding changing properties to avoid those sorts of issues. We'll find ourselves saying "Eh, I don't need to think through an API here, people can just grab the property and do what they need". We'll finally give in and, informally, consider properties to be part of the API and try to not change them, even if it would make the code better or provide some new feature.

And we will have gained nothing.

Architect think

Does that leave us with no resolution? No way forward to decide how to build the next generation of Drupal? Hardly. Rather, it puts the onus on us as Drupal developers to decide, situationally, which sort of flexibility we value more.

There are cases where "opaque domain objects" are absolutely the right fit, and we want to think in terms of behavior. The Drupal 7 database layer comes to mind, obviously, but any system where we want to flexibly exchange or chain together disparate components is also a strong fit for classes-as-objects, with all of the benefits and trade-offs that come with it.

There are other cases where we want to turn it inside out, cases where we want to think in terms of raw data. Systems with highly unstructured or irregular data that must be aggregated amorphously over time come to mind, such as the Render API or FAPI. While those systems certainly could be built with a purely OO classes-as-objects approach, I suspect the result would be even more complex than FAPI is now (which is already pretty complicated) and, with all the extra stack calls that would result, much slower. Any sort of info hook is another place where bare data wins, because it's all about definition rather than active behavior.
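As a rough illustration of why info hooks suit bare data, here is a simplified sketch of the pattern. The _widget_info hook, the example_a/example_b modules, and collect_widget_info() are hypothetical stand-ins, loosely in the spirit of how Drupal gathers hook implementations; they are not real Drupal functions.

```php
<?php
// Hypothetical info-hook implementations: plain arrays describing
// what exists, with no behavior attached.
function example_a_widget_info(): array {
  return array(
    'clock' => array('label' => 'Clock', 'weight' => 5),
  );
}

function example_b_widget_info(): array {
  return array(
    'weather' => array('label' => 'Weather', 'weight' => -1),
  );
}

// A simplified collector: call each module's hook if it exists
// and merge the returned definitions into one array.
function collect_widget_info(array $modules): array {
  $info = array();
  foreach ($modules as $module) {
    $function = $module . '_widget_info';
    if (function_exists($function)) {
      $info = array_merge($info, $function());
    }
  }
  // Because it is bare data, any code can freely reshape it,
  // e.g. sort by weight while keeping the keys.
  uasort($info, function (array $a, array $b): int {
    return $a['weight'] <=> $b['weight'];
  });
  return $info;
}
```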

To be sure, that puts a lot of pressure on us as software architects to think through our APIs and figure out the appropriate technique. The right tool for the right job is never an easy decision, but it is from those decisions that really powerful designs are born.

That is a challenge I do believe we are up for.

Comments

I can only repeat myself

I linked from my blogpost and will link from here http://aperiplus.sourceforge.net/visibility.php and I can only repeat what it says: There must be a clear distinction between public interface and private methods. However, this is simply a labelling issue: it doesn't actually need to be enforced.

So first let me counter:

What do we say then? "Sorry, you didn't read the docs, go away?" No, we'll find ourselves avoiding changing properties to avoid those sorts of issues

This is utter nonsense. From one version to the next we happily rewrite whole APIs, so your argument does not apply. And if we need to change implementation details in a minor version, we will. It is valid for the developers to say "go away" if you used a protected property and it broke. As my favorite article says, notionally-private methods were clearly labelled with underscores and nobody seemed to get confused.

Drupal has used underscores for private functions since forever, and again everyone understood that these are not meant to be called, and yet they were called when appropriate.
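The underscore convention described in this comment can be sketched in a few lines. The Mailer class is hypothetical: the leading underscore marks _normalizeAddress() as not meant for callers, but since it is declared public, PHP lets anyone call it.

```php
<?php
class Mailer {
  // Public API.
  public function send(string $to): string {
    return 'To: ' . $this->_normalizeAddress($to);
  }

  // Notionally private: the underscore says "don't call me",
  // but PHP does not enforce it because the method is public.
  public function _normalizeAddress(string $address): string {
    return strtolower(trim($address));
  }
}

$mailer = new Mailer();
// Works, but the caller has opted out of any stability guarantee.
$normalized = $mailer->_normalizeAddress('  Bob@Example.com ');
```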

You can happily code OOP without protected blankies. Setting bear traps beneath the windows is wasted effort.

All your talk does not change the fact that the connection issue already exists (you dispute it, but that does not change the issue at http://drupal.org/node/802514, which two high-profile contributors have already stumbled upon). It proves that sometimes it's just easier to change a private property and move on. With the reflection-magery in PHP 5.3, the PHP team gave in too. I recommend you give in too. Lengthy theoretical posts do not change practice.
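For reference, this is the reflection escape hatch in question: ReflectionProperty, whose setAccessible() method was added in PHP 5.3, lets code read and write a protected property from outside the class. The Counter class is a hypothetical example.

```php
<?php
class Counter {
  protected $count = 0;

  public function increment(): void {
    $this->count++;
  }
}

$counter = new Counter();
$counter->increment();

// Reflection can read and write a protected property from outside.
$property = new ReflectionProperty(Counter::class, 'count');
// Required on PHP 5.3 through 8.0; since PHP 8.1 it has no effect.
if (PHP_VERSION_ID < 80100) {
  $property->setAccessible(true);
}
$before = $property->getValue($counter);
$property->setValue($counter, 42);
```

The visibility keyword still documents intent; reflection simply makes bypassing it a deliberate, visible act.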

More...

If you're using your Classes as objects, then protected/private properties make total sense and anything else is silly.

Yes. Enforcing this, however, is silly too. You are, as I said before, guilty of hubris: you think your API covers every case. It covers most cases, yes. Let me quote my favorite article:

I think that's allowed when you're not acting as a normal user: mechanics should be allowed to rummage around all they like.

You forget that there are millions of Drupal sites out there, and we know from our most complex API (Form API) that there ARE cases we have not thought of before; because of that we provide millions of points where you can act, and even that is not enough! Think of #required and a back button: that's solved only in D7 and broken in 4.7, 5, and 6. You will lock us in similar traps. All I am asking is that you do not lock things down. The developers using those APIs will treat the underscored properties as something not to be touched, and they won't be documented much. But when push comes to shove, nobody is locked in. That's what matters.

To sum up

As the API designer, please (contrary to what you state) use private/protected properties, but please do not lock them in. Please do not believe your API will do everything. And please do not believe your users are idiots. If your API is as good as you believe it is, they will almost never need to touch anything underscored. And that "almost" is what this whole debate turns on.

Extending

When you program procedurally you need access to the code as it sits. Procedural code is final code. With OOP you can extend a class, make your changes, and then register that class for use. In the case of the Drupal database layer, you can create a new class extending an existing class, make changes, and then use this class instead.
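A minimal sketch of that extend-and-register workflow. All names here are hypothetical (QueryBuilder, PrefixedQueryBuilder, and a config-driven query_builder_create() factory); this is not the Drupal 7 database layer's actual API, only the general pattern.

```php
<?php
// Hypothetical base class shipped by a module.
class QueryBuilder {
  protected $table;

  public function __construct(string $table) {
    $this->table = $table;
  }

  public function build(): string {
    return 'SELECT * FROM ' . $this->table;
  }
}

// A site-specific subclass changes behavior without editing the original.
class PrefixedQueryBuilder extends QueryBuilder {
  public function build(): string {
    // $table is protected, so the subclass may use it directly.
    return 'SELECT * FROM site1_' . $this->table;
  }
}

// Hypothetical factory: reads the class name from configuration,
// so swapping implementations is a one-line config change.
function query_builder_create(array $config, string $table): QueryBuilder {
  $class = $config['query_builder_class'] ?? QueryBuilder::class;
  return new $class($table);
}
```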

This is a different way of making changes and having access to the data. Why is this not a valid approach?

One note is that if you

One note is that if you implement an interface or extend an abstract method, you cannot change the visibility in the new method signature.
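The precise rule in PHP is that an overriding or implementing method may keep or widen the parent's visibility, but may not narrow it. A short sketch with hypothetical Shape and Square classes:

```php
<?php
abstract class Shape {
  // Declared protected in the parent.
  abstract protected function area(): float;
}

class Square extends Shape {
  protected $side;

  public function __construct(float $side) {
    $this->side = $side;
  }

  // Widening protected -> public is allowed. Declaring this method
  // private instead would be a fatal error when the class is compiled.
  public function area(): float {
    return $this->side * $this->side;
  }
}
```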

object visibility

You cannot change the signature but you can add getters and setters. Since we are talking about protected properties and not private ones the additional getters and setters will have access to them.

Doing this also provides an interface to the data. If the internal properties change in a point release, the extended class's new methods can deal with those changes.
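A minimal sketch of that approach, with a hypothetical library class Feed that has a protected property and no getter, and a hypothetical site-specific subclass MyFeed that adds one.

```php
<?php
// Hypothetical library class with a protected property and no getter.
class Feed {
  protected $url;

  public function __construct(string $url) {
    $this->url = $url;
  }
}

// A site-specific subclass adds the accessor the library lacks.
// If a later release renames $url, only this method needs updating;
// code that calls getUrl() stays the same.
class MyFeed extends Feed {
  public function getUrl(): string {
    return $this->url;
  }
}
```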


Why change it

If you are defining an interface, everything should be public, so why would you need to change the visibility? If you are changing a public method in an interface to protected or private, you shouldn't be implementing the interface. And I don't know why you would have an interface with a protected or private method. The interface defines the public API, and as such, protected and private methods should be at the discretion of the implementing class.
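A sketch of that division of labor, assuming a hypothetical CacheInterface and MemoryCache (not Drupal's or PSR's cache APIs): the interface declares only public methods, and the implementing class keeps its helpers protected at its own discretion.

```php
<?php
interface CacheInterface {
  // Interface methods are always public; PHP rejects any other
  // visibility in an interface declaration.
  public function get(string $key);
  public function set(string $key, $value): void;
}

class MemoryCache implements CacheInterface {
  private $items = array();

  public function get(string $key) {
    return $this->items[$this->normalizeKey($key)] ?? null;
  }

  public function set(string $key, $value): void {
    $this->items[$this->normalizeKey($key)] = $value;
  }

  // Protected helper: an implementation detail, not part of the contract.
  protected function normalizeKey(string $key): string {
    return strtolower($key);
  }
}
```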

Small off-topic question

chx, sorry to take this off-topic, but I'm not familiar with the back button / #required element fix in the D7 FAPI you're mentioning. I've been running into this problem myself with the Commerce checkout form and am wondering if you can spare a moment to point me to the pertinent issue or doc. Thanks! : )

OO approaches

IMO this whole 'debate' in PHP stems from PHP's typically confused idea of what it actually is. It is a weakly and dynamically typed language intended to be flexible and hackable, yet when it designed an OO architecture for PHP5 it seems to have copied Java's idea of one, which was part of a strongly and statically typed language intended to be very restrictive in order to avoid the supposed C++ pitfalls.

Having private and protected restrictions on OO things is not strictly an example of OO vs. procedural thinking; it is an example of one style of OO thinking, brought on more by popular languages like C++ and Java than by anything inherent to OO, i.e. an implementation detail.

OO in dynamic languages tends to have fewer of these restrictions: e.g. Python (which is arguably more OO than Java) has no enforced restrictions, Smalltalk (which is definitely more OO than Java) has some restrictions on some things but generally encourages enabling rather than restricting, and (as far as I understand) Perl doesn't restrict much.

I tend to agree with chx's opinion that restricting visibility isn't a very 'Drupal' thing to do. Drupal has traditionally been one of the most hackable CMS codebases around.

But my opinion could just stem from my own bias towards Python. Python's attitude is 'We're all consenting adults', i.e. you're welcome to do what you like, but if you break something it is your own problem.

Convert back to serial code

We could completely ban private variables by banning both objects and functions.

Most developers are happy with variables that are private to a function so where is the problem with variables that are private to an object? A lack of understanding is one issue. Private variables in functions are easy to understand. Private variables in objects are harder to understand and there are few examples of well documented variable use in Drupal 6's few classes.

I managed to finish one semester of Java at university but spent most of the time explaining "hello world" examples to fellow students. PHP has the advantage that you can start an Introduction to Programming course by learning programming instead of memorising a mess of language specific oddities. You can then grow your knowledge in any direction you need to build an application.

Drupal needs to grow OO and needs well documented role models for objects. Rules are useless without understandable working examples. D6 was sadly lacking. Are there examples in D7?

DB layer

I'd say the DB layer in Drupal 7 is a good example of Drupal OO done well, but I am admittedly biased as I was its main architect. The cache system and queue system are both now OO, and while their implementations aren't ideal (in the case of caching because it's so low-level) they're still a solid move in the right direction for those systems.

The Butler project aims to be another good, generic example to both build from and model on. It's still in early planning stages, though, but hopefully by the time Drupal 7 is really being used in earnest there will be something there.

For Drupal 6, the Feeds module is another module that, while not perfect, does a lot of things well with regards to OO.

"Most developers are happy

"Most developers are happy with variables that are private to a function so where is the problem with variables that are private to an object?"

How is that fair? Object properties are closer to globals than they are to variables scoped to a single function.

Not true

How do you figure object properties are like globals? That's not even remotely close. They are closer to static variables within a function, but even that is a very poor match. Object properties are part of a meta-variable called an object. They are neither globals nor statics.

Trying to treat them as something they are not is how you get ugly, buggy, unperformant code that you cannot maintain.

Java as a bad influence on PHP

It's hardly surprising that most of the issues related to OO in PHP come from the fact that they just copied Java, which uses a message-passing model and is thus incompatible with the AOP concepts implemented in Drupal. If they had gone another route and implemented a generic function model, then this mess of private/protected/public wouldn't exist. You define the specificity of each method when you declare it. It's a safer model, because instead of relying on a "magic" mechanism of information hiding and encapsulation, it makes the programmer completely responsible for the scope of the method. It's a model at odds with Java, which treats the programmer as a suspect and tries to hold his hand along the right path. The problem is that Java's approach adds a layer of complexity and muddled concepts that in the end result in code that is more difficult to understand and also much more prone to bugs and security issues.

I've had moments when

I've had moments when accessing object properties as if they were public would have made the code base more consistent, especially, perhaps, given the question of whether database rows are returned as arrays or objects by default.

The cumbersome part here for me, is how to get a specific set of an object's properties. Suppose it contains the database row set and some additional properties, but you want to quickly get just the database row properties. Hence it seems that storing all the properties that one might like to make available within an internal protected struct or array would make sense, along with convenience methods to access these properties as if they were public via __get() or offsetGet(), and a getAttributes() method that returns them as an array.

(array) $obj currently doesn't seem to provide a means to quickly return this limited set.
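For comparison, here is a rough sketch of what this comment describes, with a hypothetical Record class: row data held in one protected array, read via __get() or array syntax through PHP's built-in ArrayAccess interface, and returned whole by getAttributes(). Refusing writes is an assumption made to keep the sketch short, not part of the proposal.

```php
<?php
class Record implements ArrayAccess {
  // All row data lives in one protected array.
  protected $attributes;

  public function __construct(array $row) {
    $this->attributes = $row;
  }

  // Property-style reads: $record->title
  public function __get($name) {
    return $this->attributes[$name] ?? null;
  }

  public function __isset($name): bool {
    return isset($this->attributes[$name]);
  }

  // Array-style reads: $record['title']
  public function offsetExists($offset): bool {
    return isset($this->attributes[$offset]);
  }

  #[\ReturnTypeWillChange]
  public function offsetGet($offset) {
    return $this->attributes[$offset] ?? null;
  }

  public function offsetSet($offset, $value): void {
    throw new LogicException('Record is read-only.');
  }

  public function offsetUnset($offset): void {
    throw new LogicException('Record is read-only.');
  }

  // An (array) cast of the object would not give you this: it includes
  // every property, with protected ones under mangled keys.
  public function getAttributes(): array {
    return $this->attributes;
  }
}
```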

Don't exist yet

The properties you're after may not exist yet. That's the point. They may not exist until they're requested, thus saving a lot of time if they're not needed.

Database records in a result set are a perfect example, actually. They are not pulled from the database and queued up into an array in PHP space. Depending on a flag on the connection, they are either pulled into PHP's C-level memory and stored there, or, by default, left on the database server and retrieved only as requested. So there is no $result->arrayOfEverything for you to access. It doesn't exist.

If you want all records as an array, the D7 DB layer provides a single method to do just that: $result->fetchAll(). (That's actually a PDO feature; the Drupal layer adds several other variants of that.) That generates the array you're looking for on the fly and returns it, without wasting time generating it if you don't want it.
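A toy sketch of that distinction, with a hypothetical LazyResult class standing in for a real statement object: iterating pulls rows one at a time from a callback (a stand-in for the database cursor), and only fetchAll() materializes them into an array. This is not how PDO or Drupal implement it internally, only the shape of the idea.

```php
<?php
class LazyResult implements IteratorAggregate {
  private $fetcher;
  private $fetched = 0;

  // $fetcher returns the next row, or null when there are no more.
  public function __construct(callable $fetcher) {
    $this->fetcher = $fetcher;
  }

  // Rows are produced on demand; nothing is buffered in PHP.
  public function getIterator(): Generator {
    while (($row = ($this->fetcher)()) !== null) {
      $this->fetched++;
      yield $row;
    }
  }

  // Builds the full array only when explicitly asked for.
  public function fetchAll(): array {
    return iterator_to_array($this->getIterator(), false);
  }

  // How many rows have been pulled so far (shows the laziness).
  public function fetchedCount(): int {
    return $this->fetched;
  }
}
```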

Hence it seems that storing all the properties that one might like to make available within an internal protected struct or array.

At what cost? The system should calculate 14 different ways of structuring the data, including potentially contacting 3rd party ReST or SOAP services, on the off chance that someone might need them? That's horribly inefficient.

By abstracting that behind a method, you can let the object calculate on-the-fly in the most efficient way what it is you need from it. So the 2 pieces of information you care about can be calculated either early or late (you don't need to worry about that), and the other 12 never need to happen. That saves a lot on performance and allows the author of the class to change how those are calculated and when to improve performance without forcing you to change your code.
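A minimal sketch of computing on demand behind a method, assuming a hypothetical UserProfile class and a loader callback standing in for an expensive lookup (SQL, REST, SOAP). The value is computed only on first request and then cached; callers see the same method either way.

```php
<?php
class UserProfile {
  private $uid;
  private $loader;
  private $postCount = null;
  private $loads = 0;

  public function __construct(int $uid, callable $loader) {
    $this->uid = $uid;
    $this->loader = $loader;
  }

  // Computed lazily on first call, then cached. The class author can
  // later switch to eager loading without any caller changing.
  public function postCount(): int {
    if ($this->postCount === null) {
      $this->loads++;
      $this->postCount = ($this->loader)($this->uid);
    }
    return $this->postCount;
  }

  // How many times the expensive loader actually ran.
  public function loadCount(): int {
    return $this->loads;
  }
}
```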

The cumbersome part here for me, is how to get a specific set of an object's properties.

That's the point. If you as a user of the API are thinking in terms of the object's properties, you're doing it wrong. :-) (OK, not wrong, but class-as-struct, not class-as-object.) You should be thinking in terms of the object's behaviors, its actions. Those are methods, not properties. There are no properties.

(array)$obj wouldn't do what you want, because that casts the (public?) properties of the object into an array. That's what it's documented to do. I don't know why you'd expect that to magically know which subset of internal properties that may or may not even exist you want.

I like your original post - the issue isn't specific to Drupal.

By abstracting that behind a method

It doesn't have to be a method. In PHP, it is possible to pre-define that variable/property with a relational mapper that retrieves the desired info (from a database, web service, etc.) upon first access, e.g. their posts, comments, etc.
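A minimal sketch of that technique, assuming a hypothetical SessionUser class and a loader callback standing in for the database or web-service lookup. Every read of $user->posts goes through the __get() magic method, which loads the data once and caches it.

```php
<?php
class SessionUser {
  private $name;
  private $loader;
  private $cache = array();

  public function __construct(string $name, callable $loader) {
    $this->name = $name;
    $this->loader = $loader;
  }

  // Called for inaccessible properties, so $user->posts lands here.
  public function __get($property) {
    if ($property !== 'posts') {
      throw new LogicException("Undefined property: $property");
    }
    // Load on first access only; later reads come from the cache.
    if (!array_key_exists('posts', $this->cache)) {
      $this->cache['posts'] = ($this->loader)($this->name);
    }
    return $this->cache['posts'];
  }
}
```

Note that although the data is loaded only once, every read still routes through the __get() magic method.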

An example might be a session user object.

How?

Are you referring to the __get() method? That's honestly a hack. It's slower than a normal method, and still in most cases presumes a 1:1 correlation between internal and external properties. If that's not the case, then you may as well just use a method and be done with it. It's faster and clearer what you're doing.

I could not agree more.

I could not agree more. However, I would be happy if we would not need this discussion, and could move on with more interesting stuff.
I miss your "eval is evil" feedback on the "stapler" thing :)

Object Oriented Programming

> Why bother having matching getters and setters for every property? That breaks encapsulation, one of
> the key reasons for using an object approach in the first place, and is one of the reasons why JavaBeans
> are, in most cases, a horribly bad model to follow.

I agree in full with your observation, and this is just typical of those who do not understand OOP and why; in my opinion, they'll never really be web developers... not at a professional level, at least.

There are so many "experts" today, but experts at what? Anyone can turn to the internet and claim to be an expert at one thing or another, yet these "experts" lack the basics of what constitutes a programming language, never mind having any skill at whatever level.

I usually refer to those people as amateurs; damn would-be Jedis make a complete a*se of it and the rest of us have to clean up after them... those responsible for WordPress are a prime example of doing it wrong.

Overly harsh

I think you're being overly harsh. The claim that "you're not really a web developer if you aren't doing complex OOP" is just as closed-minded an approach as "blech, that evil OOP is too inflexible". That's the point I'm trying to make. Both approaches, if used properly, can be extremely powerful and effective, but you need to understand the differences in those approaches in order to leverage them properly. Trying to use one as if it were the other is a great way to end up with spaghetti code.