Sets of continuous booleans

How can we turn these continuous truth values into something we can actually run computations with?

Let's imagine sets of classical booleans. Let's say we're trying to write an algorithm to determine whether an animal is a dog based on various properties. We could define

def is_dog(it):
    # Naive rule: every single criterion must hold.
    return it.has_four_legs and it.has_fur and it.is_pet and it.barks

So this is kind of a naive way to do it. There are animals that are clearly dogs that don't have 4 legs. There are also dogs that aren't pets. This will misclassify many animals. The concept of family resemblance points to a slightly more resilient algorithm.

In an imperative program it would be simple.

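def is_dog(it):
    # Count how many dog-like criteria hold (has_fur carried over from the first version).
    dog_points = 0
    if it.has_four_legs:
        dog_points += 1
    if it.has_fur:
        dog_points += 1
    if it.is_pet:
        dog_points += 1
    if it.barks:
        dog_points += 1
    # Any 3 of the 4 criteria are enough.
    return dog_points >= 3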

This would seem extremely oversimplified at first glance, but the fact that normal curves show up all over nature indicates that this is closer to reality than its simplicity would suggest.

A binomial distribution is what you get when you flip a coin N times and count the heads. As you keep flipping the coin, the distribution approaches a normal curve. In other words, normal curves can be modeled as a series of coin flips. So the fact that height is normally distributed means that we can model the factors contributing to height as a series of coin flips, each contributing a minuscule amount to height. These factors could be genetic or environmental. It is clear that this is an oversimplified view. Some factors contribute more than others. But it really doesn't matter whether they all contribute the same amount: if there are enough independent factors, their sum will still tend toward a normal curve. If there were one factor that vastly overshadowed the others, you would see a bimodal distribution. But since we often see a very well-shaped normal curve, in those instances we can assume there is no 1 factor that is so significant that it outweighs the others, so we can treat the situation as a series of coin flips... i.e. use the binomial.
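The coin-flip picture above is easy to check numerically. This is a minimal sketch rather than anything from the original post: the function name `coin_flip_sums` and the choice of 100 flips and 5000 trials are assumptions made for illustration. For a fair coin flipped N times the head count should have a mean near N/2 and a standard deviation near sqrt(N)/2, and roughly two thirds of the trials should land within one standard deviation of the mean, as a normal curve would predict.

```python
import random
import statistics

def coin_flip_sums(n_flips, n_trials, seed=0):
    """Return the number of heads in each of n_trials runs of n_flips fair coin flips."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [sum(rng.random() < 0.5 for _ in range(n_flips)) for _ in range(n_trials)]

sums = coin_flip_sums(n_flips=100, n_trials=5000)
mean = statistics.fmean(sums)    # expected to be close to 100 / 2
stdev = statistics.pstdev(sums)  # expected to be close to sqrt(100) / 2
# Share of trials within one standard deviation of the mean.
within_one_sd = sum(abs(s - mean) <= stdev for s in sums) / len(sums)
print(mean, stdev, within_one_sd)
```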

In a similar manner, we can imagine different truth values for it.is_pet. We can assign a probability instead of a discrete boolean value. If we do this with enough variables and the probabilities are symmetrically distributed around .5, we should still see a similar distribution that tends towards a normal curve.

The most realistic version would take these factors into account, but with enough bits, we can translate these factors into more coin flips and get a similar curve. In a way this makes sense. Sure, maybe it.has_four_legs isn't resolved enough. It's clear that having 4 legs is more dog-like than having 2, so that should factor in some. But perhaps rather than just saying it.has_four_legs, it's better to say (it.has_four_legs OR (it.has_three_legs AND it.lost_one_leg)). In this case you're expanding the 1 bit, and essentially turning the statement more and more into a continuous boolean, where it can be in many more states than just TRUE and FALSE.

In real life everything has this property if you start looking hard enough. There are always edge cases. We use our human judgement on these, and computers do well with this using neural nets. We can keep zooming in infinitely. So in order to properly model it, we should develop a system of computation that reflects this.

So if we have a set of continuous booleans, say a set of 4 boolean criteria that are each met with a probability of 50%, we can model this very similarly to having 4 bits with 2 on and 2 off. Of course if these probabilities change we may have to allocate more bits to deal with the gradation. However if all 4 of the continuous booleans are changing, they may balance each other out and still be able to be modeled with fewer bits.

Another thing is we can allocate more bits to give a greater resolution when the booleans represent something important, and scale to fewer bits when they should have less influence. For instance if we are running a program that is determining the predicted snowfall this year, we might have a boolean for june.is_record_heat, and that might get allocated a lot of bits. Even if we only have 1 bit of information, it's potentially an important bit. However we might have continuous information about each day's air quality index... how far from average it is. But if we feel this information is less valuable, we can just assign it 2 or 3 bits. Then when we sum up all the bits, we will have an aggregate number that can easily be operated on in the same way that we would operate on any single bit.
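One way to read the bit-allocation idea is sketched below. It is an interpretation, not a method stated in the post: the helper names `quantize` and `aggregate`, the rounding rule, and the example bit counts (8 bits for the record-heat flag, 3 for air quality) are all assumptions. Each truth value in [0, 1] is turned into a count of "on" bits out of however many bits it was allocated, and the counts are summed into a single fraction.

```python
def quantize(p, bits):
    """Map a truth value p in [0, 1] to a count of 'on' bits out of `bits`."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    return round(p * bits)

def aggregate(criteria):
    """criteria is a list of (truth_value, allocated_bits) pairs.

    Returns the fraction of all allocated bits that are on, so criteria
    with more bits pull the result harder.
    """
    on_bits = sum(quantize(p, bits) for p, bits in criteria)
    total_bits = sum(bits for _, bits in criteria)
    return on_bits / total_bits

# Hypothetical inputs: a fully true, heavily weighted flag and a weak, lightly weighted one.
score = aggregate([(1.0, 8), (0.4, 3)])
print(score)
```

The result is itself a number in [0, 1], so it can be fed into later steps the same way a single continuous boolean would be.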

Causality and continuous conditionals

In the discussion of continuous conditionals, I mentioned that if you play around with the formulas, you'll see why they make sense.

In a circuit, if you add 2 resistors in parallel, their total resistance decreases. As you keep adding resistors in parallel, the resistance moves towards 0 but never hits it. When you put them in series, the resistance just adds normally. In parallel, the resistance adds as $R_{total} = \frac{1}{\frac{1}{R_1} + \frac{1}{R_2}}$. The OR function shown in the above post can be rewritten as $1 - (1 - A)(1 - B)$.

While the math isn't exactly the same, there is a hyperbolic aspect to both this addition in parallel and the OR operator defined in the above post. They are both the inversion of a combination of inversions. Since probabilities are a way of mapping an entire space onto the interval [0,1], it kind of makes sense that the math wouldn't be exactly the same. Probabilities and information often have exponential mappings to regular dimensions, and with exponential transformations comes mapping multiplicative operations to additive ones or vice versa.

This isn't a very exact explanation, but I'm not totally sure of exactly why it comes out that way myself. The reason this is still worth discussing is because of causal sequences. A domino can fall over and hit another domino, which can in turn hit another domino. This is causation in series. A domino can also be arranged to fall due to more than 1 domino. In other words, causality may be a directed graph, but it is not a simple chain: an effect can have more than 1 cause feeding into it.

If there is a series causation, then that means each step in the chain must occur in order for the outcome to happen. This is much like an AND gate.

If there is parallel causation, this means that in order for the outcome to happen, at least 1 of the possible causes must happen. In other words 1 OR the other happened.

So the fact that our parallel causation and parallel resistance show some mathematical similarity makes sense on some level. However, it is necessary to dig deeper to see where the differences arise. Perhaps an exploration of how much information is transferred through the AND and OR gates would yield some more insight into these similarities.
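The series and parallel cases can be put side by side with the resistor formula. This is only a sketch: the function names are made up for illustration, and it assumes the individual causes are independent, which the post does not state explicitly. Series causation multiplies probabilities like an AND, parallel causation is one minus the product of the complements like an OR, and parallel resistance is the reciprocal of a sum of reciprocals.

```python
from math import prod

def series_causation(probabilities):
    """Every step in the chain must happen (AND): multiply the probabilities."""
    return prod(probabilities)

def parallel_causation(probabilities):
    """At least one cause must happen (OR): invert, combine, invert again."""
    return 1.0 - prod(1.0 - p for p in probabilities)

def parallel_resistance(resistances):
    """Total resistance of resistors in parallel: also an inversion of combined inversions."""
    return 1.0 / sum(1.0 / r for r in resistances)

print(series_causation([0.9, 0.9]))     # a longer chain is less likely to complete
print(parallel_causation([0.5, 0.5]))   # more alternative causes make the outcome more likely
print(parallel_resistance([2.0, 2.0]))  # more parallel paths lower the resistance
```

Both parallel formulas have the same invert, combine, invert shape, but one combines by multiplying and the other by adding, which is where the two stop matching exactly.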

Continuous conditionals

Boolean logic is the basis of all computing. But these true and false values are assumed to be perfect. In practice our hardware may as well be perfect. There are few misreads and miswrites, and error correction is built in at a low enough level that it does not matter to the programmer. However there are situations where this does not work.

For these situations, we need new tools that do not rely on perfect boolean logic. This is the case for quantum computing, but it is also important for understanding and working with neural networks, because while their outputs are generally ultimately mapped to a true or false, there are many cases where it would be more useful to work with the continuous probabilistic output they give.

For a variety of reasons, the easiest way to deal with this is via probabilities. If this is not convincing, then I can make a detailed case for it, but I doubt most people involved in these topics would have qualms with that statement. So the goal then is to implement control structures via probabilities, where a probability close to 1 is "truthy" and a probability close to 0 is "falsey."

So first thing is that "AND(A,B)" and "OR(A,B)" can be mapped very easily to $A \cdot B$ and $A + B - A \cdot B$ respectively for single values, treating A and B as independent probabilities. "NOT(A)" is just $1 - A$.

If you play around with it you will see why. For the "AND" function, if either A or B is close to 0, the value is close to 0. When the probabilities are 1 or 0, they behave just as the discrete boolean gates would. In this way, the discrete case is a special case of these continuous probability control structures. In this post, I attempt to provide a deeper, intuitive explanation, other than just "the math works".

In boolean logic one way to determine whether a large set of bits contains any data--whether it is "NULL"--is to use NOT OR(all the bits). This allows a mathematical definition of the IF operator as NOT(OR(A,B,...)). This would translate to $(1 - A)(1 - B)\cdots$, which could alternatively be called "AND(NOT(A), NOT(B), ...)."

So now we have at least the ability to do some basic control statements using only math.
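To make the "control statements using only math" idea concrete, here is a minimal sketch of the three gates as plain arithmetic. The function names (`p_not`, `p_and`, `p_or`, `p_none`) are invented for this example, and the product and sum forms assume the inputs are independent probabilities, which is an assumption layered on top of the post.

```python
def p_not(a):
    """Continuous NOT: 1 - a."""
    return 1.0 - a

def p_and(a, b):
    """Continuous AND: the product of the two truth values."""
    return a * b

def p_or(a, b):
    """Continuous OR: a + b - a*b, which equals 1 - (1 - a)(1 - b)."""
    return a + b - a * b

def p_none(*values):
    """NOT(OR(...)) over any number of inputs: the product of the complements."""
    result = 1.0
    for v in values:
        result *= p_not(v)
    return result

# At 0 and 1 these reduce to the ordinary boolean gates.
print(p_and(1, 0), p_or(1, 0), p_not(1))
# In between they return graded truth values.
print(p_and(0.9, 0.8), p_or(0.5, 0.5), p_none(0.2, 0.3))
```

Since `p_none` is just a product of inverted inputs, the whole set can be built from an inverter and a multiplier.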

So from a hardware perspective, as long as we can map all inputs to a signal voltage, we can directly take these inputs and use them to trigger logical chains, assuming we have gates that can perform these basic functions, which so far boil down to $1 - A$ [I'll call this the "inverter"] and $A \cdot B$ [I'll call this the "multiplier"]. Those circuit elements do exist, however I am not sure how accurate they are on computer-type scales. In a quantum computer it seems the inverter would be the Pauli X gate; for the multiplier I can't find an obvious correlate.

Macroscopic experiment to demonstrate EPR Paradox

If you throw a stick and it is spinning around, and it breaks apart in midair, the two parts have angular momenta that are related to the original... If you measure the 2 sub-sticks, they will relate to each other in a predictable way.

If the direction of the rotation vector is uniformly distributed around the sphere, and you measure the rotation of these 2 sticks using cameras on different axes, it should reproduce the EPR paradox statistics.

The rotation is a random vector uniformly distributed around the sphere. Let's give it the value $-4\hat{i} + 3\hat{j} + 2\hat{k}$. Now for the positions of the cameras. Let's choose one on the +y axis and one 90 degrees away on the +x axis. When we read this from the y axis, we will only be able to read the y component. If there is no y component the stick just looks like a line segment in the video (if you do the lighting right). We know the y component is uncorrelated with the x component, so their signs should match 50% of the time. So then we have our second camera on the x axis, which measures the x component. If you start to move this camera in an arc from where it is toward the y position, the measurement will start to depend a bit on the y component, and slowly less on the x component.

The camera we're moving was measuring only the x component, but as it moves along the arc, it begins measuring some combo of the x and y components, depending on where it is in the arc. If the y component was positive and the x component was negative, at the starting point, they will read opposite. At 45 degrees, it will depend on which component was stronger. Given our above vector, the stationary camera on the y axis will be measuring +3. The mobile camera in the starting position will measure -4. As it moves to 45 degrees, it will measure some combo, but it will still be negative as the x component is larger in magnitude. However you can imagine plenty of scenarios where the x and y components of the spin have the same sign. For all these, the cameras will measure the same. You can also imagine plenty of scenarios where the signs differ but the y component is larger in magnitude than the x component, in which case they will also measure the same at 45 degrees.
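The two-camera setup can be simulated directly. The sketch below makes several assumptions that are not in the post: each camera reports only the sign of the spin component along its own axis, the mobile camera's axis sits at an angle `phi` measured from the x axis toward the y axis, and a uniformly random direction is drawn by sampling three independent Gaussians. The function name `agreement_rate` is hypothetical. For a uniformly random direction, the chance that two axes separated by an angle theta see the same sign is 1 - theta/pi, which here works out to 0.5 + phi/180 with phi in degrees.

```python
import math
import random

def agreement_rate(phi_degrees, n_samples=50_000, seed=1):
    """Fraction of random spins for which both cameras read the same sign."""
    rng = random.Random(seed)
    phi = math.radians(phi_degrees)
    cam_x, cam_y = math.cos(phi), math.sin(phi)  # mobile camera axis in the x-y plane
    agree = 0
    for _ in range(n_samples):
        # Three independent Gaussians give a direction uniform over the sphere.
        # z is sampled for completeness; neither camera axis has a z component.
        x, y, z = (rng.gauss(0.0, 1.0) for _ in range(3))
        stationary = y                    # camera fixed on the +y axis
        mobile = x * cam_x + y * cam_y    # projection onto the mobile camera's axis
        if (stationary > 0) == (mobile > 0):
            agree += 1
    return agree / n_samples

for phi in (0, 45, 90):
    # Simulated rate next to the geometric value 0.5 + phi / 180.
    print(phi, agreement_rate(phi), 0.5 + phi / 180)
```

The agreement rate rises in a straight line with the angle, which is the curve to hold up against the statistics the experiment is meant to reproduce.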

2nd law of thermodynamics directly applied to sociology

2 hydrogen atoms bonded together constrain each other along the spatial dimensions. This is a reduction of degrees of freedom. However they gain access to new orbital configurations and potentially nuclear resonance states. If these new states did not contain more degrees of freedom than the spatial degrees of freedom that are now constrained, it would be a violation of the 2nd law of thermodynamics because there would be fewer microstates for the macrostate, meaning a decrease in entropy, which is unstable.

So if this is the case, we can presume that a large stable molecule like DNA has a similar dynamic of increasing access to degrees of freedom, or similarly large groups of those molecules, like life.

OK, now for a leap:

Similarly, if a set of laws constrains human behavior, the resulting structure of those laws must provide access to more degrees of freedom than it is constraining or else it will be unstable.

There is no apparent logical reason why constraints on human behavior would not be considered a reduction of physical entropy. It is a reduction of the number of states that a human can exhibit, which in turn is a reduction of the number of states the molecules which make up a human can exhibit.

Work and Precision II

So in the previous post, we set up the basics of a model to allow us to distinguish between "intentional" and "unintentional" energy. The "unintentional energy" has a lot in common with the internal energy of the system. In a system describing a body moving with overall velocity $v$, the energy is $E = \frac{1}{2}mv^2 + U$, where $U$ is the internal energy. If the body does not move in a straight line, then the "wasted" energy can be lumped into the $U$ term, or separated out from it.

But it also bears some parallels with another energy quantity. In the Gibbs free energy equation we have the term $TS$. This is temperature times entropy... basically energy we can't do anything with because we can't distinguish the microstates. Those 2 terms--average kinetic energy (temperature) and bits (or joules per kelvin, entropy)--combine to account for the kinetic energy of the system we can't use. This is why it is subtracted from the total energy to get the "free" energy.

This entropy is defined by Gibbs' formula as $S = -k_B \sum_i p_i \ln p_i$. This is basically a measure of the number of microstates being ignored per macrostate that we observe. We could define our macrostates as simply "T" for target and "~T" for not target and determine which portion of position microstates fall into each position macrostate to determine the entropy of each macrostate. Note that if each microstate is considered equiprobable, the equation simplifies to $S = k_B \ln N$ where $N$ is the number of microstates covered by the macrostate.
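The entropy of the "T" and "~T" macrostates can be worked out numerically. This is a small sketch under stated assumptions: the function names are invented, and the microstate counts (10 position microstates inside the target, 990 outside) are made-up numbers chosen only to show the calculation. It uses the standard Gibbs form with the sum running over microstate probabilities, and checks that it collapses to k ln N when the microstates are equiprobable.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K

def gibbs_entropy(probabilities, k=K_B):
    """S = -k * sum(p * ln p), skipping zero-probability microstates."""
    return -k * sum(p * math.log(p) for p in probabilities if p > 0)

def equiprobable_entropy(n_microstates, k=K_B):
    """S = k * ln N for N equally likely microstates."""
    return k * math.log(n_microstates)

# Hypothetical split of 1000 position microstates into the two macrostates.
n_target, n_not_target = 10, 990
s_target = equiprobable_entropy(n_target)
s_not_target = equiprobable_entropy(n_not_target)
print(s_target, s_not_target)

# The general formula gives the same answer for a uniform distribution.
uniform = [1.0 / n_target] * n_target
print(gibbs_entropy(uniform), s_target)
```

The smaller the target region, the fewer microstates it covers and the lower its entropy, which is the sense in which a tighter target is a more informative macrostate.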

So we have our particle which moves to position x in time t. In that same time t, rather than going to point x, it could have gone any number of other places. There is a probability that it went to point x instead of those places. We'll call that $P_T$ or "target probability." So $1 - P_T$ is the probability of ending up anywhere else. An object that is intending to move toward a target can sample its position. The more often it samples its position, the more quickly it can correct its course. This means it will be truer to the most efficient path. In this way, more information from sampling and correcting one's trajectory can result in lower waste energy, so a sample rate can translate to energy through that route. So in this way we see that more information can reduce the $TS$ term of an energy equation, reducing entropy. In this case the total energy of the system is reduced. However from a different perspective we could say the energy isn't reduced but made available for work, if we assume that whatever energy was being wasted while driving this particle would eventually be used for something else. So how can we describe that amount of energy that information can make available for work?

If we just set the particle down and it happened to end up at the target at time t due to random collisions with other particles, then work wasn't really done on the system. In order for us to have "done work," we have to actually manipulate the probability. We are changing the target probability $P_T$ by affecting the probability distribution of its motion. There are many ways we can physically accomplish this. We can charge our particle and create an opposite charge where our target is. We can create a hole and have a massive object pull our particle down into that hole. We can create a robot that pushes the particle, etc. Each of these things will affect the probability that our particle ends up at the target state and they all take physical work. In particular the first 2 require imbuing the particle with some potential relative to the target where the target is an equilibrium in that potential space. In the 3rd example, we're imbuing the robot with the potential and it is then dosing that potential out to the particle bit by bit for many sub-targets. The target must somehow be stored as an equilibrium in the robot's logic system: a discrete attractor in some function the robot executes.

So we do work on this particle by changing the probability distribution. If we put a lot of charge on the particle or create a very steep slope to the bottom of the hole, our particle will reach the target faster. The larger the potential we create, the faster the object reaches that equilibrium target. However it may overshoot and then have to come back. It may oscillate about the target rather than actually come to rest within the target area. However it will still spend a larger portion of its time in the target area. In other words, it has a higher probability of being found in the target area than if the potential was not applied. This should also affect the entropy calculation for the 2 macrostates since the probabilities of being found in the different states are different.

Generally work is discussed in the context of thermodynamics or Newtonian physics as either heat energy or kinetic energy. This kinetic energy is often described as giving velocity to an object--moving it within momentum space. However due to the relationship between position and momentum space, this same conception of work/energy should be able to translate to specific movement within position space as well. What I'm trying to get at is that putting a certain particle in a certain position should take a quantifiable amount of work, just like giving a certain particle a certain momentum takes a quantifiable amount of work. The more precise you want to be, the more energy it will take. In other words, the lower your tolerance is, or the smaller your target area, the more work will be required.

Depending on the temperature, a particle won't just be at a position. It will be within a certain range a certain percentage of time. And how will we know whether it is within range? There is no way to measure for sure... Many of the measurements will be inaccurate and will create the same issues of quantum measurement we often see. This means that we will never be 100% sure that it is in our target area. We can only have a certain confidence.

Back to the question of quantifying the amount of energy that information can make available. When we subdivide the state space into more possible configurations, we increase the information. Just by decreasing the number of microstates we describe with each macrostate, we're decreasing the $TS$ term of the energy equation. But the total energy doesn't decrease. Free energy increases. This means we have to describe this information in some other way. If we have a system composed of a container with 3 particles, and consider the rest of the system to be a normal ideal gas, we add the information we have about the 3 particles, and then add the abstracted information from the rest. In this case in our system description, we have "names" for the 3 particles we're considering ourselves to have information on. The rest of the particles are unnamed.

As we observe more about the system, we can name more and more particles. However, rather than name actual particles, we could name sets of particles. This will reduce the amount of entropy for each macrostate. We should also be able to name "portions" of particles. If a given particle has a velocity vector which is made up of various components from various causes we should be able to abstract these out and "name" these contributions in the equation as long as we make a corresponding adjustment in the aggregated/abstracted term. This could be useful for a system with some magnetic particles whose motion is described partially due to magnetic forces as well as other forces. One could identify a contributor to the probability distribution of the particles while maintaining the unknown aspects of the particle as unidentified.

If 1 of these "named" components described the total energy of the system perfectly, we would say that that component is 100% correlated to the distribution described through that "naming." This would mean the mutual information would be exactly equal to the total calculated Gibbs entropy of the system. In that case there would be 0 entropy however, because the "named" components of the system should wholly describe the kinetic energy distribution, so the $TS$ term is 0, and the energy would be a consequence of the energy state described by the hypothesized distribution. This should never happen, so we will always have some of the $TS$ term left.

We can hypothesize a distribution and measure the mutual information between our hypothesized and observed distributions. The mutual information that is created through this observation can be viewed as a form of entanglement with the observer, which causes wavefunction collapse.

Work and Precision I

In chapter 9 of Alicia Juarrero's book "Dynamics in Action", the author describes the idea of "constraints as causes." She discusses these dynamics primarily from a thermodynamic point of view. In her discussion of the relationship between these constraints and neural networks, there are echoes of the idea of ICP's from "Representation and Behavior" by Fred Keijzer. It seems to me that the work done to set these ICP's should be ruled by the laws of thermodynamics, as should the work of the behavior that results from them.

However all discussion of these things that I have come across has been in relatively general terms and has not guessed at a model that would indicate information's effect on individual particles. The goal is to find a model which relates information to the laws of thermodynamics in a way that allows us to pull out useful corollaries that can then be tested and apply to macroscopic systems in a more specific way.

One particular hurdle I have found is that there is not really a way to discuss intentional action to my knowledge in physics. In information theory on the other hand, intention is kind of taken as a given. The conceptions of entropy kind of highlight this. In information theory, entropy is a measure of the amount of information that can be communicated by comparing actual values to expected values and assuming the actual values have some meaning. In physics, it is often described as a measure of "randomness" but perhaps a better way to describe it is as a measure of how many degrees of freedom we are ignoring. We ignore those degrees of freedom because they cancel out, and we assume they are insignificant to what we are trying to find out.

In classical physics, the application of a force can be intentional, but whether it is or not doesn't matter at all to the end result. In information theory, we want to determine the degree to which the outcome matches the intention. This requires a comparison between the intended message and the received message. In information theory, this mechanism of comparison is a given and it is ignored.

However if setting our ICP's and reading from them are subject to thermodynamic constraints, then we need to understand the physics of that mechanism of comparison, or observation. Rather than try to tackle that entire model at once, I'm going to try to simplify the problem and follow intuitions and see where they lead until something sticks.

First I'm going to try to represent the energy required to move a particle from one point at rest to another at rest as the minimum possible energy plus some term representing the mean square displacement from that optimal trajectory.

In determining the most efficient force function we are trying to account for all energy put into the particle. This means that if the particle loses velocity, we count that as energy put into the system, because it takes force to slow down an object. Given an x-y graph where the x coordinate is time and the y coordinate is velocity, if the graph has any minima, this indicates that the particle slowed down and then sped back up. This will always be less energy efficient than if the particle never slowed down and merely sped up or maintained speed, as both the speeding up and slowing down take energy. Any change in velocity takes energy. So our functions will be limited to those with no minima within the trajectory. We are constrained to 1 dimension, but even if we weren't, any deviation from a straight line would inherently be inefficient unless there were some obstacle in the way or an external force acting on the particle. But we are currently counting all forces as part of the total energy so that case is irrelevant, and we are ignoring obstacles.

We are not looking for the total kinetic energy of the particle at a given time, but the change $dE$. This is $m v \frac{dv}{dt} \, dt$, or in other words the amount of kinetic energy that was added to or removed from the particle in a given $dt$. The total energy added or removed then should be $\int_0^T \left| m v \frac{dv}{dt} \right| dt$. We can then subtract from that a term for the minimum energy necessary to get there. If the particle requires time $T$ to get to that other point $S$ meters away, then the average velocity will be $\bar{v} = S/T$ so the minimum energy required to reach that point from the starting point would be $\frac{1}{2} m \left( \frac{S}{T} \right)^2$.
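The accounting rule described here, where speeding up and slowing down both count as energy spent, can be sketched on a sampled velocity profile. Everything specific in this sketch is an assumption for illustration: the function name, the two velocity profiles, and the mass of 2. It sums the absolute change in kinetic energy between consecutive samples, which is a discrete stand-in for integrating the magnitude of the rate of change of kinetic energy over the trip.

```python
def total_energy_exchanged(velocities, mass):
    """Sum of |change in kinetic energy| between consecutive velocity samples."""
    kinetic = [0.5 * mass * v * v for v in velocities]
    return sum(abs(later - earlier) for earlier, later in zip(kinetic, kinetic[1:]))

# Rest to rest with a single rise and fall: no interior minimum.
smooth = [0.0, 1.0, 2.0, 1.0, 0.0]
# Rest to rest with a dip in the middle: the particle slows and speeds back up.
dipped = [0.0, 2.0, 1.0, 2.0, 0.0]

print(total_energy_exchanged(smooth, mass=2.0))
print(total_energy_exchanged(dipped, mass=2.0))
```

The profile with the interior minimum exchanges more energy under this rule, which is the reason for restricting attention to velocity curves without one.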

There is some velocity curve that minimizes the total energy that will allow the particle to reach the desired point at time t with optimally low energy. According to my queries on math forums, these can be calculated using Hermite polynomials, which happen to give rise to eigenstates of the quantum harmonic oscillator. Whatever the function is, we can take the mean square displacement of the force vector from that force curve to give 2 terms--1 indicating the energy spent along the intended trajectory, and 1 indicating what is essentially waste.

So we can view these 2 quantities--the most efficient versus the "waste" energy--as intentional energy and unintentional energy. These quantities should be able to correlate through some model to the information and the entropy of the system. But there is still a lot of groundwork to lay before it is clear.

Explanation of ICP's (Internal Control Parameters)

ICP's are referenced in "Representation and Behavior" as a concept of essentially a form of meta-information. The term Internal Control Parameter was coined in a paper by Meijer and Bongaardt (1996). It is presumably referenced in the context of dynamical systems theory and epigenetics. I say presumably because I can't find the paper, but Keijzer discusses it in the context of epigenetics and here it is in the context of dynamical systems.

Newtonian Physics as Information III

As referenced in the previous posts of this series, this post is dedicated to explaining how an information-theoretical model could tie into the physicality of accepted physics.  Information seems like such an ethereal thing.  Words and opinions are information and they are so subjective.  How could a word be responsible for something physical?  Most of the existing examples of information in physics deal with things that are hard to put in human terms.  For instance we know things about the theoretical limits of information of a black hole, but it's hard to imagine what that could possibly mean.

We know that all matter exhibits wave-particle duality.  Any particle's motion can be described using a wave function.  This is essentially a probability density function that can either be described in terms of the particle's momentum or position.  It doesn't matter.  They're essentially describing the same thing... the movement of the particle.

We also know that every body with a temperature is emitting black body radiation.  This can be viewed as the result of collisions between particles within the body.  Each photon which is emitted as a result of this sort of collision will then carry some information about the motion of the particles which caused its emission.

It doesn't do much good for a mental intuition of the idea to think of that information as being stored as bits like they would be on a hard drive.  Instead it's easier to think of the wave nature of the particles.  As a large body moves through space, the particles which compose it do not occupy the exact same coordinates, just ones which are close.  They vary from the mean constantly.  The motion of larger mass could be viewed as a sinusoidal function with a very large amplitude and period, whereas the motion of the tiny particles making it up could be viewed as an added sine wave with a much smaller amplitude and period.  So the wave function of each tiny particle within the body also has a superposition of the wave function of the body it is contained in.

If many photons are being ejected from that body, then when they reach another body, their average should again show some effect of the wave function of the larger body.  In order for these photons to have any effect on the body, they must interact with the particles of that body in some way.  They either must be absorbed, absorbed and re-emitted, collide, or whatever.  No matter what, whatever particles they interact with will be influenced by the wave function of those photons.  The photons will have transferred some of the signal from the wave function of the original body to the new body in the process of that interaction.  This will cause some degree of correlation of movement between the particle in the new body that absorbs the photon, and the particle in the original body which emits it.

Meanwhile, the same is happening with photons that are emitted from the new body.  What is slowly happening is the particles which make up the 2 bodies are becoming entangled, or sharing signals, or increasing mutual information.

It's hard to see how jostling some photons would cause any behavior other than pushing the bodies away from each other, though.  And each individual particle is going to "forget" the information about the source signal as soon as it collides with something else.  For this we should think of dissipative systems.  A dissipative system is essentially one in which energy is being added to the system faster than it can be removed.  In these sorts of systems, patterns start emerging.  These patterns are correlations between particles, for instance vortices, like a whirlpool or a hurricane.  Another example is convection cells.

As photons are ejected from one body, they can not only affect individual particles in another body, but can churn up organized behavior.  These organized patterns of behavior are a form of memory, and the dynamics increase in magnitude as particles continue to bring information from their source.  Meanwhile, a similar system could be developing in the first body.  If there is some periodic churning going on in each body due to the signal coming from the other, then it is possible that that churning could be precisely aligned along some average phase and frequency of the photons coming from the source in a way that dampens oscillations away, and amplifies oscillations toward the source.  Regardless, the existence of those forms of dissipative organization would form a sort of distributed representation of the source signal.

That is one interpretation of the model I am proposing.  Another interpretation is to ignore the intuitive definition of distance and say that the mutual information is actually the distance.  And that the mass is the "number of switches."  This should work because we are essentially calling particles switches.  They store the transmitted information in their trajectories.  Uncorrelated particles could be moving any which way independently of the photons coming from the source.

A large body would have a large number of switches, and many of those switches would be correlated with other switches on the board.  In terms of particles, when a particle starts exhibiting behavior correlated with an outside source, it will likely soon be knocked off course by an internal particle.  In this way it will take a lot of driving from the original source to eventually create a significant pattern in the body.  This is why a body with more "switches" would have more resistance to an increase of mutual information, or a decrease in distance according to that interpretation.

These correlations in a body from an outside source are essentially a "representation" of that source of information.  As a representation becomes more and more resolved, it should contain more information about the original source.  This will mean that the 2 bodies are closer together, which will mean they are more able to transmit information, which will in turn increase the transmission of information.

Now we're talking about panels of switches and particles, but we could instead think of each body as containing continuous degrees of freedom.  These switches become continuous degrees of freedom in a similar way to how a binomial distribution with infinite trials becomes a gaussian distribution.  The patterns of organization that arise among several particles which I referred to as a sort of representation of the source signal can be viewed as a single degree of freedom where the potential energy that is stored in that pattern of physical behavior is a portion of the internal energy of the object. Each bit of information that arrives is like a trial in a binomial distribution that either contributes to that pattern or detracts from it.

These oscillators contain a certain amount of energy and interfering photons can either interfere constructively or destructively with the pattern of motion of the oscillator.  You can think of a body as having an oscillator representing each body around it, with an energy proportional to its distance.

That oscillator is essentially like a receiving antenna.  It receives signals with a fidelity that is affected by noise levels and the power of the signal.  The degree of entanglement is proportional in some way to the channel capacity.  As the signal continues to be received, the mutual information grows/the oscillator increases in energy/the signal gets amplified, which in turn increases the channel capacity, decreasing the distance between the two bodies.  We can say that the body is entangled with everything in its observable universe to some extent, and that those entanglements are each represented by a channel capacity between the source and the representation within the body.  The energy that is stored in the representation is then proportional to the work that source can cause to be done within the body.
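The post does not say which notion of channel capacity it has in mind. One standard candidate for a receiver limited by signal power and noise is the Shannon-Hartley formula, C = B log2(1 + S/N), and the sketch below assumes that is the relevant one. The function name and the sample numbers are illustrative only.

```python
import math

def channel_capacity(bandwidth_hz, signal_power, noise_power):
    """Shannon-Hartley capacity in bits per second: B * log2(1 + S/N)."""
    if noise_power <= 0:
        raise ValueError("noise_power must be positive")
    return bandwidth_hz * math.log2(1.0 + signal_power / noise_power)

# With fixed noise, a stronger received signal raises the capacity.
for signal in (1.0, 3.0, 15.0):
    print(signal, channel_capacity(bandwidth_hz=1.0, signal_power=signal, noise_power=1.0))
```

Under this reading, an oscillator that gains energy from the incoming signal raises the signal-to-noise ratio, and so raises the capacity of its own channel, which is the feedback loop the paragraph describes.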

Signals from distant sources can thus do work on a body by doing work on these oscillators.  These oscillators can be correlated with each other in such a way that they exhibit chains of causal effects with an amount of work and precision proportional to the energy stored in these oscillators as well as the channel capacity between each oscillator.