This is a general ramble through the most frequently asked questions in the field of 'bitcoin detection', with answers. The reason for posting it is to get Google to make this site more 'sticky' by answering commonly asked questions about LINUX, where people are working with HUGE FILES, which is essentially what Inflection.top is doing.
du . -a | sort -nr | head -n 25
Where '.' is the path; this lists the 25 largest entries under that path ( 'du -a' counts directories as well as files ).
sort -u hashfile.hex > newfile.hex
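Note that in our experience 'sort' comes to a grinding halt once files get bigger than about 10GB, so keep this in mind when sorting files. Setting LC_ALL=C and giving GNU sort a bigger buffer (-S) and a roomy temp directory (-T) helps. When working with huge files exceeding 64GB, it's better to use 'split' and work with manageable pieces.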
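The split-then-sort idea above can be sketched in Python as follows. This is a minimal sketch, not the tooling we ship: the function name external_sort_unique and the lines_per_chunk parameter are made up for illustration, and it assumes one hex string per line with no control characters.

```python
import heapq
import itertools
import os
import tempfile


def external_sort_unique(src_path, dst_path, lines_per_chunk=1_000_000):
    """Sort a big text file and drop duplicate lines using bounded memory."""
    chunk_paths = []
    try:
        with open(src_path) as src:
            while True:
                # Read one manageable piece, like a file produced by 'split'.
                raw = list(itertools.islice(src, lines_per_chunk))
                if not raw:
                    break
                chunk = sorted({line.strip() for line in raw if line.strip()})
                fd, path = tempfile.mkstemp(suffix=".chunk")
                with os.fdopen(fd, "w") as out:
                    out.writelines(line + "\n" for line in chunk)
                chunk_paths.append(path)
        # Merge the sorted pieces; duplicates end up adjacent, so skip repeats.
        files = [open(path) for path in chunk_paths]
        try:
            with open(dst_path, "w") as dst:
                previous = None
                for line in heapq.merge(*files):
                    if line != previous:
                        dst.write(line)
                        previous = line
        finally:
            for handle in files:
                handle.close()
    finally:
        for path in chunk_paths:
            os.remove(path)
```

Usage would be something like external_sort_unique("hashfile.hex", "newfile.hex"). All chunk files are open at once during the merge, so for very large inputs pick lines_per_chunk large enough to stay under the open-file limit.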
xxd -r -p sorted-hash160.hex > sorted-hash160.bin
Converts the sorted list of hex addresses into raw binary ( 20 bytes per hash160 ), which allows 100% verification that an address is found. Bloom filters are probabilistic ( typically 0.0001% false positives ); tries are 100% certain.
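To show why the binary form is handy, here is a minimal Python sketch of the same conversion plus an exact lookup by binary search over fixed 20-byte records. The function names are hypothetical, and it assumes the hex file is sorted in plain byte order with consistent letter case, so that the binary records come out sorted too. It is not the TRIE engine, just the simplest exact check that works on such a file.

```python
RECORD_SIZE = 20  # bytes in one hash160


def hex_file_to_bin(hex_path, bin_path):
    """Rough equivalent of 'xxd -r -p': one hex string per line -> packed bytes."""
    with open(hex_path) as src, open(bin_path, "wb") as dst:
        for line in src:
            line = line.strip()
            if line:
                dst.write(bytes.fromhex(line))


def contains(bin_path, hash160_hex):
    """Exact yes/no lookup by binary search over sorted fixed-width records."""
    target = bytes.fromhex(hash160_hex)
    if len(target) != RECORD_SIZE:
        raise ValueError("expected 40 hex characters")
    with open(bin_path, "rb") as f:
        f.seek(0, 2)  # jump to the end to learn the file size
        lo, hi = 0, f.tell() // RECORD_SIZE
        while lo < hi:
            mid = (lo + hi) // 2
            f.seek(mid * RECORD_SIZE)
            record = f.read(RECORD_SIZE)
            if record == target:
                return True
            if record < target:
                lo = mid + 1
            else:
                hi = mid
    return False
```

For 50 million records this needs roughly 26 seeks per lookup, which is why a sorted binary file answers so much faster than scanning text.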
grep -F -f pristine.hex found.hex
where pristine.hex is a list of high value hex addresses in hash160 format ( 40 hex chars per line ); -F makes grep treat each line as a fixed string rather than a regex.
[ Note: on Inflection we use our TRIE engine to do this type of searching, which gives an instant response ]
The public-key is the point Q=d*P, where Q(x,y) is a pair of numbers x & y. An uncompressed public key is just 0x04+X+Y ( 65 bytes ). A compressed key is 0x02 or 0x03 followed by X ( 33 bytes ); the 02/03 tells you whether y is even or odd, i.e. which of the two possible y values for that x is meant ( loosely, positive or negative ). This is just a means to drop the size of the public-key by about 50%. [ d is the private-key, a 64 hex char number, and P is the secp256k1 base-point, usually written G ]
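The compressed format can be illustrated with a short Python sketch. The function names are hypothetical; the only facts it relies on are the secp256k1 field prime, the curve equation y**2 = x**3 + 7, and the published base-point coordinates, which are used here as the example point.

```python
# secp256k1 field prime.
P_FIELD = 2**256 - 2**32 - 977

# Base-point coordinates, used here only as a convenient example point.
GX = 0x79BE667E_F9DCBBAC_55A06295_CE870B07_029BFCDB_2DCE28D9_59F2815B_16F81798
GY = 0x483ADA77_26A3C465_5DA4FBFC_0E1108A8_FD17B448_A6855419_9C47D08F_FB10D4B8


def compress(x, y):
    """Return the 33-byte form: 0x02 for even y, 0x03 for odd y, then X."""
    prefix = b"\x02" if y % 2 == 0 else b"\x03"
    return prefix + x.to_bytes(32, "big")


def decompress(key):
    """Recover (x, y) from a 33-byte compressed key."""
    if len(key) != 33 or key[0] not in (2, 3):
        raise ValueError("not a compressed public key")
    x = int.from_bytes(key[1:], "big")
    y_squared = (pow(x, 3, P_FIELD) + 7) % P_FIELD
    # Because P_FIELD % 4 == 3, a square root is a single modular power.
    y = pow(y_squared, (P_FIELD + 1) // 4, P_FIELD)
    if y * y % P_FIELD != y_squared:
        raise ValueError("x is not on the curve")
    # Pick the root whose parity matches the prefix.
    if y % 2 != key[0] % 2:
        y = P_FIELD - y
    return x, y
```

For example, decompress(compress(GX, GY)) gives back (GX, GY), so nothing is lost by dropping Y.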
In bitcoin an address is a Hash160 number, that is, a 20 byte integer stored as 40 hex characters. The '160' is the number of bits, where 8*20 = 160. Hash160 refers to the fact that a SHA-256 operation followed by a RIPEMD-160 operation is applied to the public-key to produce a 'hash'.
In bitcoin a user typically works with a base 58 address: a version byte and a 4-byte checksum are added to the ripe160 hash and the result is converted to base 58, which allows almost the entire alphabet. This turns the 40 'hex' chars into a human readable string of about 34 characters. In bitcoin scanning we typically work with the hash160, or the public key, as they are stored on the block-chain.
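The hash160-to-address step can be sketched in Python like this. It is a minimal illustration of Base58Check for classic addresses ( version byte 0x00 ), with hypothetical function names; the hash160 itself is taken as given, since RIPEMD-160 is not available in every Python build.

```python
import hashlib

# Base58 alphabet: digits and letters minus the look-alikes 0, O, I and l.
ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def _checksum(payload):
    """First 4 bytes of a double SHA-256 over the payload."""
    return hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:4]


def base58check_encode(hash160, version=b"\x00"):
    """Turn a 20-byte hash160 into a human readable address string."""
    data = version + hash160 + _checksum(version + hash160)
    number = int.from_bytes(data, "big")
    encoded = ""
    while number > 0:
        number, remainder = divmod(number, 58)
        encoded = ALPHABET[remainder] + encoded
    # Each leading zero byte is written as a literal '1'.
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * leading_zeros + encoded


def base58check_decode(address):
    """Return (version, hash160) after verifying the checksum."""
    number = 0
    for char in address:
        number = number * 58 + ALPHABET.index(char)
    leading_zeros = len(address) - len(address.lstrip("1"))
    body = number.to_bytes((number.bit_length() + 7) // 8, "big")
    data = b"\x00" * leading_zeros + body
    payload, checksum = data[:-4], data[-4:]
    if _checksum(payload) != checksum:
        raise ValueError("bad checksum")
    return payload[:1], payload[1:]
```

Decoding is what a scanner does in reverse: it strips the version byte and checksum from a published address to recover the hash160 that goes into the sorted lists.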
SHA-256 is called a 'hash' and was developed by the NSA. Typically 'hashing' is a method of compressing lots of data into one small fixed-size number that is, for practical purposes, unique.
In the bitcoin block-chain prior to about 2013 many addresses were stored as raw public-keys, which are 130 hex-chars uncompressed ( 04 plus 128 hex-chars of X and Y ); post 2013 bitcoin went to hash-160, where the public-key is stored as a 'hash', i.e. hash160=ripemd160(sha256(public-key)). BITCOIN did this because they feared that giving the public the public-key on the block-chain was giving away too much information. This is TRUE.
In Inflection.Top we prefer these 'pristine' public-keys, as they provide much more information for the problem Q=d*P. Post 2013 what you have is hash160(Q) where Q=d*P; P is known, and d, the 'private-key', is what we're looking for. With Q hashed it is much harder to attack the DLP ( Discrete Log Problem ).
ECDSA is the Elliptic Curve Digital Signature Algorithm; it's a method of 'signing' a message ( in bitcoin, a transaction ) with the private-key so that anyone holding the public-key can verify ownership. ECDSA is of most interest to 'Inflection' because it rests on the calculation Q=d*P, where 'P' is the base-point of secp256k1, one of many standardized curves that can be used with ECDSA. SHA-256 was designed by NSA, and ECDSA was standardized by NIST alongside NSA's DSA; the secp256k1 parameters themselves were published by the SECG consortium ( Certicom ). This leads some people to speculate that NSA created Bitcoin and holds a 'backdoor', on the argument that NSA has a history of shaping public standards such as DSA and DES to its own advantage. In our opinion only a fool would assume the same could not be true of secp256k1 & SHA256.
Well, you can catalogue them, or sweep them into your own account in bitcoin. Typically what people do is sweep, i.e. they run the bitcoin daemon on their computer and import the private-key found, and the funds become theirs.
Ethically, it's most common to try to find the owner and not assume the 'coin' is lost. Just like real life: if you find a bag of money or coin on the street, in most jurisdictions you're required to report the 'discovery', and most governments have rules about how much of the unclaimed money goes to the 'finder' and how much goes to the government.
If it's a large amount you find, then in general you may want to inspect the blockchain and determine when it was 'mined' and by whom; if it was mined by a 'pool' then it's fairly easy to track down the real owner. Most intelligent people are more than happy to pay a finder's fee for 'returning' their 'lost' money.
Well, Bitcoin is now ten years old, and the TECH (NSA) ECDSA-secp256k1/SHA256 is much older; historically, most of these NSA public standards don't have more than a ten year life before they're 'cracked'.
IMHO I would keep my BIG money in a 'coin' that has a 512 bit hash and a 512 bit ECC signature, which should be good until around 2030; I assume that most sha256 now in use will be broken by 2020.
BITCOIN core is a private-club ( a clique, you would call it in high-school ) of guys who have gotten very conservative, and just because you're first doesn't mean anything in the real-world. Lotus 1-2-3 dominated spreadsheets in 1985, and Digital's AltaVista dominated 'search'; today nobody has heard of these products, and the same will be true of bitcoin.
BITCOIN refuses to solve any problem in bitcoin, like privacy or the use of NSA algos, thus in time it will die because of HUBRIS.
The private-key is a scalar number: 64 hex chars, or 32 bytes, i.e. 256 bits ( 8*32 ). That gives 2**256 possible values, or about 1.16e77 in base-10.
It's a HUGE number: 1e77 is within a few orders of magnitude of the estimated number of atoms in the observable universe ( around 1e80 ). Q & P are called "POINTS"; they refer to an (x,y) pair that points to a place on the elliptic-curve. BITCOIN uses the secp256k1 curve, which just means y**2 = x**3 + 7. Quite simple? Right? If you know x you can get y, or vice-versa. The problem is that secp256k1 specifies two more numbers, p and n. These too are huge numbers, and the equation is not just y**2 = f(x), it's y**2 mod p = f(x) mod p, while scalars such as d wrap around n. This means that as d multiplies P the coordinates wrap around p over and over, and nobody knows how many wraps.
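To make Q=d*P and the wrap-around concrete, here is a minimal double-and-add sketch in Python ( 3.8 or later, for pow with a negative exponent ). The names P_FIELD for the field prime, N_ORDER for the group order and G for the base-point are our own labels for the published secp256k1 parameters; this is textbook arithmetic, far too slow for real scanning and not what GPU-FLAYER runs.

```python
# secp256k1 parameters: field prime, group order, base-point.
P_FIELD = 2**256 - 2**32 - 977
N_ORDER = 0xFFFFFFFF_FFFFFFFF_FFFFFFFF_FFFFFFFE_BAAEDCE6_AF48A03B_BFD25E8C_D0364141
G = (0x79BE667E_F9DCBBAC_55A06295_CE870B07_029BFCDB_2DCE28D9_59F2815B_16F81798,
     0x483ADA77_26A3C465_5DA4FBFC_0E1108A8_FD17B448_A6855419_9C47D08F_FB10D4B8)


def on_curve(point):
    """Check y**2 = x**3 + 7 modulo the field prime."""
    x, y = point
    return (y * y - x * x * x - 7) % P_FIELD == 0


def add(a, b):
    """Add two points; None stands for the point at infinity."""
    if a is None:
        return b
    if b is None:
        return a
    (x1, y1), (x2, y2) = a, b
    if x1 == x2 and (y1 + y2) % P_FIELD == 0:
        return None  # a point plus its mirror image
    if a == b:
        slope = 3 * x1 * x1 * pow(2 * y1 % P_FIELD, -1, P_FIELD) % P_FIELD
    else:
        slope = (y2 - y1) * pow((x2 - x1) % P_FIELD, -1, P_FIELD) % P_FIELD
    x3 = (slope * slope - x1 - x2) % P_FIELD
    y3 = (slope * (x1 - x3) - y1) % P_FIELD
    return (x3, y3)


def multiply(d, point):
    """Compute d*point by double-and-add over the bits of d."""
    result = None
    addend = point
    while d:
        if d & 1:
            result = add(result, addend)
        addend = add(addend, addend)
        d >>= 1
    return result
```

With this, multiply(d, G) and multiply(d + N_ORDER, G) give the same point, which is the wrap around n in action, and multiply(N_ORDER, G) comes back as None, the point at infinity.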
But 'd' is what we're looking for, so it's the holy-grail of this business; lucky for us there are lots of relationships like
which means that for every scalar multiply ( our d ) there is some eta that will generate the same point when 'x' is multiplied by eta; if you can find the eta/lambda pair, then your lambda becomes the 'd' we're looking for. These are just properties of the elliptic-curve.
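One well-known relationship of this kind on secp256k1 is the curve endomorphism lambda*(x, y) = (eta*x mod p, y), where eta is a cube root of 1 modulo the field prime and lambda is a cube root of 1 modulo the group order. Whether this is exactly the relationship meant above is an assumption on our part. The Python sketch below ( 3.8 or later ) derives one such pair instead of hard-coding it; all function names are hypothetical.

```python
P_FIELD = 2**256 - 2**32 - 977
N_ORDER = 0xFFFFFFFF_FFFFFFFF_FFFFFFFF_FFFFFFFE_BAAEDCE6_AF48A03B_BFD25E8C_D0364141
G = (0x79BE667E_F9DCBBAC_55A06295_CE870B07_029BFCDB_2DCE28D9_59F2815B_16F81798,
     0x483ADA77_26A3C465_5DA4FBFC_0E1108A8_FD17B448_A6855419_9C47D08F_FB10D4B8)


def add(a, b):
    """Add two curve points; None stands for the point at infinity."""
    if a is None:
        return b
    if b is None:
        return a
    (x1, y1), (x2, y2) = a, b
    if x1 == x2 and (y1 + y2) % P_FIELD == 0:
        return None
    if a == b:
        slope = 3 * x1 * x1 * pow(2 * y1 % P_FIELD, -1, P_FIELD) % P_FIELD
    else:
        slope = (y2 - y1) * pow((x2 - x1) % P_FIELD, -1, P_FIELD) % P_FIELD
    x3 = (slope * slope - x1 - x2) % P_FIELD
    return (x3, (slope * (x1 - x3) - y1) % P_FIELD)


def multiply(d, point):
    """Compute d*point by double-and-add."""
    result, addend = None, point
    while d:
        if d & 1:
            result = add(result, addend)
        addend = add(addend, addend)
        d >>= 1
    return result


def cube_root_of_unity(modulus):
    """Find a root of r**3 = 1 other than 1, for a prime modulus = 1 mod 3."""
    for base in range(2, 100):
        root = pow(base, (modulus - 1) // 3, modulus)
        if root != 1:
            return root
    raise ValueError("no non-trivial cube root found")


def find_eta_lambda():
    """Pair a cube root mod p (eta) with the matching cube root mod n (lambda)."""
    eta = cube_root_of_unity(P_FIELD)
    lam = cube_root_of_unity(N_ORDER)
    for candidate in (lam, lam * lam % N_ORDER):
        if multiply(candidate, G) == (eta * G[0] % P_FIELD, G[1]):
            return eta, candidate
    raise ValueError("no matching pair found")


ETA, LAMBDA = find_eta_lambda()


def endomorphism(point):
    """Multiply a point by LAMBDA using a single field multiplication."""
    x, y = point
    return (ETA * x % P_FIELD, y)
```

For any d, multiply(LAMBDA * d % N_ORDER, G) equals endomorphism(multiply(d, G)). Note that this turns one known point into a related point very cheaply, which is useful for covering more keys per step; by itself it does not reveal d.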
The most direct thing is to go to www.bitaddress.org, put in the private-key and generate the address. Then, using the cmd-line on linux, you can type:
This will return the current value of that address.
Of course if you're dealing with a million keys per second, it's impossible to use 'curl' to find the value. We include cmd-line tools that will convert the private-key to an address, so you don't need to use bitaddress.org.
Using 'Inflection', what we do is first run the found private-keys through bloom-filters, which have a 1e-6 false positive rate, so one in a million. GPU-FLAYER ( our product ) will only alert on keys that match the bloom filter, but note that one in a million false positives is a lot when you're checking 100 million keys/second: 1e8 x 1e-6 works out to roughly 100 false alerts per second, and it takes a tighter filter ( around 1e-9 or better ) to get down to a few 'finds' a minute. These results are then passed to a "TRIE", which is 100% correct. If this generates a 'positive' then you know you have found a high value key.
Then we do the 'curl' operation to determine whether the prospective key still has value. This way we rarely have to test the outside world, say just a few times a day.
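As a back-of-envelope check on how many bloom alerts reach the second stage, the arithmetic is simply keys per second times false positive rate. The helper name below is made up for illustration.

```python
def false_alerts_per_second(keys_per_second, false_positive_rate):
    """Expected number of bloom-filter false alerts per second."""
    return keys_per_second * false_positive_rate


# 100 million keys/second against a one-in-a-million filter.
loose = false_alerts_per_second(100e6, 1e-6)

# The same key rate against a one-in-a-billion filter, per minute.
tight_per_minute = false_alerts_per_second(100e6, 1e-9) * 60
```

The first figure is on the order of a hundred alerts per second, the second on the order of a handful per minute, which is the load the exact second-stage check has to absorb.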
The bloom filter was invented by Burton Bloom in 1970. It's a method of marking binary bits in an array for the occurrence of a hashed number ( just like our hash160 btc addresses :) ). It doesn't just mark one bit; it marks say 20 bits at calculated locations in the bloom-filter, which is a large file, say 2GB, which is 16 Gbit. If we divide 16e9/20 we get 800 million, which gives some kind of an idea of the upper limit on the number of items; in practice we load far fewer, say 50 million, to keep false positives rare. A bloom is called a 'probabilistic filter' because it doesn't tell you 100%, it only tells you 99.9999999%, and it's that other 0.0000001% that is OUR PROBLEM. A 2GB bloom-filter representing say 50 million addresses means we can detect say all bitcoin addresses ever used having over 0.001 BTC in value. Not bad. So instead of looking for 'dormant coin' one at a time, we're looking for 50e6 at a time, which greatly increases the probability of a find when using gpu-flayer with modern ECDLP technique.
But bloom filters are great: they are RAM based and can be run on GPU cards, so if we're running an NVIDIA 1080 Ti with 11GB, we can have an 8GB bloom, which means we can represent 800 million unique bitcoin addresses with a 1e-9 false positive rate, one in a billion.
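The bit-marking scheme described above can be sketched in a few lines of Python. This is a minimal CPU-side illustration with a hypothetical BloomFilter class; deriving the bit positions from one SHA-256 digest by double hashing is our own choice for the sketch, not necessarily how GPU-FLAYER does it. The helper at the end is the standard estimate (1 - exp(-k*n/m))**k for m bits, k marks per item and n items.

```python
import hashlib
import math


class BloomFilter:
    def __init__(self, size_bits, num_hashes):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item):
        """Yield num_hashes bit positions derived from one digest."""
        digest = hashlib.sha256(item).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.size_bits

    def add(self, item):
        """Mark every bit position belonging to this item."""
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        """True means 'probably present'; False means 'definitely absent'."""
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def expected_false_positive_rate(size_bits, num_hashes, num_items):
    """Standard estimate for a bloom filter holding num_items entries."""
    return (1 - math.exp(-num_hashes * num_items / size_bits)) ** num_hashes
```

Plugging in 16e9 bits, 20 marks and 50 million items, or 64e9 bits, 20 marks and 800 million items, gives estimates well below 1e-9, so the rates quoted above look conservative for filters of that size.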
A trie is what we use to determine 100% that a BITCOIN address has been found. The bloom-filter is only a probabilistic filter, but we want to know 100% true, yes or no, and then, only then, do we go online to determine whether that address has current value.
A trie is a way of storing all the digits as a sorted structure in binary rather than ascii; ascii is human readable, but the TRIE file (.BIN) is not for humans, it's for the hard-disk, and is only read by the TRIE-ENGINE. The trie has very fast lookup. Say we have our file of 50 million big numbers; that file is 3-6 gigabytes, and if we tried to use a plain text search such as 'grep' to determine whether the list contained a number it would take a long, long time. But with a TRIE it's almost instantaneous, as only a few lookups are required, and rather than searching through the file a trie can directly pre-calculate the next sector to be read, so only a few disk-reads are required to get back a 100% yes/no.
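For readers who have not met a trie before, here is a minimal in-memory sketch in Python keyed on the raw bytes of a hash160. The ByteTrie class is hypothetical and purely illustrative; the on-disk .BIN layout read by the TRIE-ENGINE is not shown here, and a dict-of-dicts like this would use far too much RAM for 50 million entries.

```python
class ByteTrie:
    """Exact-membership trie: one level per byte of the key."""

    _END = None  # marker key meaning 'a stored key ends at this node'

    def __init__(self):
        self.root = {}

    def add(self, key):
        """Walk (or create) one node per byte, then mark the end."""
        node = self.root
        for byte in key:
            node = node.setdefault(byte, {})
        node[self._END] = True

    def __contains__(self, key):
        """Follow the bytes of the key; any missing branch means 'no'."""
        node = self.root
        for byte in key:
            node = node.get(byte)
            if node is None:
                return False
        return self._END in node
```

A lookup touches at most 20 nodes for a hash160 no matter how many keys are stored, and the answer is an exact yes or no, which is the property the second-stage check relies on.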
Because keeping a bitcoin database up to date with the value of every bitcoin address is a BIG job; it requires a full-time database server, and there are dozens of companies that do this service for free, so why bother, just don't bug them. Most only support one request every second, but that's ok for Inflection, as we typically only query a few keys a day; we don't inquire unless we know 100% that this is a good private-key/address pair.
Ok, good question. We provide a database and a python engine that processes the bitcoin block-chain every 10 minutes and harvests the new addresses by 'value'. Typically it's best to not bother with addresses containing less than 0.001 btc ( $6 USD @ $6k BTC ); actually 0.05 ( $300 USD ) is the best cutoff.
We provide tools to harvest all public-keys to date, and all hash-160 addresses to date. It's important to have processes running every 10 minutes to update the bloom-filters and the address database files by value, and to update their TRIE files.
The GPU-FLAYER depends on having the bloom-filters updated every 10 minutes so that a long time doesn't go by; you don't want to miss a new address. You must assume there are another 100 people out there doing the same thing, thus if your target is new coins you must be FAST.
We also include code to harvest the 'memory pool', but we don't find that to be useful, as many of those addresses are tossed in the /dev/null bucket; it's best to just grab the new addresses as each new block goes live.
If you want the files of all the bitcoin addresses ( or ethereum or whatever ) just tell us and we'll send them to you, but it's best to do it yourself, so you can learn to customize the process.
Yes, we have. We use LSTM-RNN on large databases and 'train' our ML engines to learn the relationships between addresses, private-keys, public-keys, WIFs, and hashes.
The results are interesting for raw 'brain wallet' strings, or raw private-keys that have been used. The relationship of a sha256 hash to a public-key is pretty much zilch, as can be assumed, since there is max entropy, but for low-entropy relationships this is a great way to learn patterns. For instance, learning primes, and teaching the RNN to learn the ECDSA primes for bitcoin ( secp256k1 ); there are lots of areas for research here.
All this ML code is included in our 'inflection system'. Available to all our customers.
Our active research is using RNN-LSTM to feed GPU-FLAYER with a prime-searching algo based on 'Index Calculus'. The ML can learn the patterns of the public-keys and assist in prime decomposition; training data can come from MSIEVE ( state of the art factoring ). What's good about ML is that "on-the-fly" rough ball-park factoring can be determined to feed search patterns to GPU-FLAYER.